CSCI 4551/5551 Parallel and Distributed Systems
Professor Gita Alaghband
Tentative Syllabus,
Academic Calendar fall '19 Deadlines

I may change this somewhat; relevant announcements will be made in class.

Office:

LSC-811

Email:

Gita.Alaghband@ucdenver.edu

Website:

http://cse.ucdenver.edu/~gita

Office Hours:

Tuesday-Thursday: 1:00 to 3:00, by appointment only. Please call the CSE Office at 303-315-1411 (or 303-315-1408) for appointments.
Tuesday-Thursday: 3:00 to 3:30

TA contact information and office hours are provided on the FTP site and emailed directly to students. Note: for all lab questions and help, please contact the TAs.

Description:

Catalog: Examine a range of topics involving parallel and distributed systems to improve computational performance. Topics include parallel and distributed programming languages, architectures, networks, algorithms and applications.

This is a state-of-the-art course in the vital area of parallel and distributed systems.
With advances in computer architecture, all new computers, including laptops, are now multi-core systems. While computer architectures have all moved to multi-core, system software and programming for these computers have not advanced at the same rate. In fact, AMD and Intel, like most computer companies, have announced that they will increase the number of cores per chip in all future processors. For these computers to be used effectively, new system software, programming languages, and applications must be designed with expertise in parallel and distributed systems. Industry is now looking for software designers with training in parallel and distributed systems for all of its new developments.

This course will cover and relate the three main components essential to parallel computation: parallel algorithms, parallel architectures, and parallel languages. The three areas will be described, and the influence of each one's design on the others will be demonstrated.
Students will use our Parallel Distributed Systems (PDS) Laboratory, which houses:

• Heracles: a multi-core cluster consisting of 18 nodes distributed as:

o   1 master node, 2 x Intel Xeon E5-2650v4 Processor with 24 cores
o   16 compute nodes, 2 x Intel Xeon E5-2650v4 with 24 cores (12 cores/processor)
o   a cluster node with Intel Xeon E5-2650v4 Processor hosting 4 x NVIDIA Tesla P100 GPUs
o   Mellanox SwitchX-2 18-Port QSFP FDR Externally Managed Switch (1U)
o   Non-Blocking Switch Capacity of 2Tb/s
o   128GB

• Hydra: a multi-core cluster consisting of 17 nodes distributed as:

o   1 master node (12 cores)
o   16 AMD Opteron 2427 nodes (12 cores each)
o   416 GB RAM
o   ~ 5TB disk space
o   four nodes connected to 8 Tesla Fermi GPUs 2050, PCIE2x16 (1792 CUDA cores each)

• a 64-core AMD Opteron 6274 server with one NVIDIA Kepler GPU (K40c)
• a 16-core Intel Xeon processor with 2 Intel Xeon Phi 7120P Coprocessors (122 cores), equipped with the latest Intel® Parallel Studio XE software.

The PDS Lab supports teaching and research in all areas of parallel and distributed computing: advanced computer architectures, operating systems, parallel programming languages, applications, and high performance computing and networking. For more information on the PDS Lab, please visit: http://PDS.ucdenver.edu

Text:

Fundamentals of Parallel Processing,
Harry Jordan and Gita Alaghband,
Prentice Hall Publication, 2003.
ISBN: 0-13-901158-7
Note: This course uses material outside the textbook as well.

Prerequisites:

Graduate standing in computer science is assumed for all graduate students.

For everyone in CSCI 4551, and for dual BS/MS students applying the course towards the BS and MSCSCI, these prerequisites are strictly enforced: CSCI 3415, CSCI 3453, and MATH 3195 (each with a minimum grade of C-).

Expected Knowledge

At the start of the course: Students must
•    have knowledge of algorithms, their design and implementation, and be able to analyze their complexity
•    be familiar with various programming languages, their characteristics, and their differences
•    have an in-depth understanding of the principles of operating systems
•    understand linear algebra and differential equations, and be able to solve related problems

At the end of the course: Students will have gained
•    knowledge of multiple parallel platforms: shared-memory multicores (MIMD), GPUs (SIMD/SIMT), distributed memory multiprocessors (MIMD), and clusters
•    an understanding of parallel algorithm design, complexity, and performance analysis for various parallel platforms and their characteristics, applied to solving problems with an emphasis on scientific computing applications
•    familiarity and practice with parallel programming languages for each of the platforms (OpenMP: shared memory MIMD; CUDA: GPU; MPI: distributed memory MIMD)
•    an understanding of parallel program constructs (work distribution and scheduling mechanisms: loops, case statements; synchronization constructs: critical sections, locks, barriers; process creation and join constructs) and their efficient implementation for shared memory MIMD
•    an understanding of various communication protocols for message passing (distributed MIMD)
•    an understanding of the underlying interconnection network for various platforms

ABET Criteria

1. Analyze a complex computing problem and apply principles of computing and other relevant disciplines to identify solutions.
6. Apply computer science theory and software development fundamentals to produce computing-based solutions.

Grading:

Component	Students enrolled in CSCI 4551	Students enrolled in CSCI 5551
Homework	15% (individual work)	15% (individual work)
Lab assignments	35% (individual work)	35% (individual work)
In-class assignments	15% (usually team work)	10% (usually team work)
Research & Implementation	25% Team Project Implementation, report & presentation	10% Research Presentation (MS students only); 20% Implementation, report & Presentation
Peer Reviews	10%	10%

Final Grade Assignment

Grade	Total points
A	90-100
B	80-89
C	70-79
D	60-69
F	0-59

Notes:

Research Project (CSCI 5551): The project consists of a research topic, which you will study in depth and present to the class, followed by an implementation based on that research, which serves as your class project. Select a topic that interests you from the list of topics. I recommend that you talk to me about your ideas early, before you submit your proposal.

Project (CSCI 4551): You will complete a team project. A list of CSCI 4551 projects with high-level requirements is provided for you to choose from. By the due date, I will need your proposal, the names of your team members, and details of any variations on the provided projects. You may also discuss other projects of interest with me and, if approved, prepare a detailed proposal.

All projects must be done on the computing facilities available in the PDS Lab.

In-class Assignments: We will discuss and solve some of the homework problems together in class. There will also be some unannounced class assignments that will require you to work either individually or in teams.

Peer Reviews: Students may be involved in grading homework (at times), team assignment reviews (see guide), research presentation reviews, and project reviews. Class discussions and participation are essential components of this course.

All deadlines must be met.

It is important to attend class regularly. Students are responsible for missed classes.

Workload: You should plan to spend an average of 9 hours/week on this course.

No computers during lectures: Please do not use your computers during lecture time. You may print the notes to take additional notes and add clarifications in class, but please do not try to follow the lectures on your computers during class time.

Student Honor Code: We will adhere to the College of Engineering and Applied Science Student Honor Code.

Tentative Schedule

August 20	Classes Begin
October 17	Research & Project Proposals Due (complete with references). Be sure to discuss your project ideas before this date.
October 24	Seminar Presentations Start (we may change this date depending on class size). Reports are due at the time of presentation; in addition to your report, email an electronic copy of your slides (PowerPoint) with annotated notes and complete references. Peer reviews are due the next class period (typed form).
November 8	Project Presentations Start; use the project review guide for your reviews. (We may change this date depending on class size.)
November 25 - December 1	Fall Break
December 10	Make sure all your work has been completed and submitted. No work submission after this date.

Topics Covered

Some adjustments to these topics may be made during the semester to reflect new parallel computer platforms.

Introduction

SIMD
MIMD
SIMD/MIMD pseudo code
SIMD/MIMD code example (Matrix Multiply)

Prefix Algorithms

Sequential
Divide and conquer
Upper/Lower construction
Size and depth analysis
Odd/Even construction
Size and depth analysis
Combination method
Size and depth analysis
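
As a rough illustration of the prefix topics above, the sketch below contrasts a sequential prefix computation with a divide-and-conquer (upper/lower) construction and counts its size (total operations) and depth (longest chain of dependent operations). It is written in Python for brevity rather than in any language used in the course, and the function names are hypothetical, not taken from the textbook.

```python
from itertools import accumulate
from operator import add


def sequential_prefix(values, op=add):
    """Sequential prefix: n - 1 operations, each depending on the previous one."""
    return list(accumulate(values, op))


def upper_lower_prefix(values, op=add):
    """Divide-and-conquer prefix over a list.

    Returns (prefixes, size, depth), where size is the number of
    applications of `op` and depth is the number of dependent steps
    if the two halves are processed at the same time.
    """
    n = len(values)
    if n <= 1:
        return list(values), 0, 0
    mid = n // 2
    # The two halves are independent and could run in parallel.
    lower, size_lo, depth_lo = upper_lower_prefix(values[:mid], op)
    upper, size_hi, depth_hi = upper_lower_prefix(values[mid:], op)
    # Combine: fold the last lower prefix into every upper prefix.
    # These applications are independent of one another (one parallel step).
    carried = lower[-1]
    upper = [op(carried, x) for x in upper]
    size = size_lo + size_hi + len(upper)
    depth = max(depth_lo, depth_hi) + 1
    return lower + upper, size, depth


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    print(sequential_prefix(data))
    print(upper_lower_prefix(data))
```

For an input whose length n is a power of two, this construction uses (n/2) log2 n operations at depth log2 n, compared with n - 1 operations at depth n - 1 for the sequential version: it trades extra work for a shorter critical path.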

Speed-up and efficiency
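
A minimal sketch of the two measures named in this topic, assuming the usual definitions: speed-up S(P) = T(1) / T(P) and efficiency E(P) = S(P) / P, where T(P) is the run time on P processors. The function names and the timings in the usage example are made up for illustration and are not measurements from the PDS Lab.

```python
def speedup(t_serial, t_parallel):
    """Speed-up: how many times faster the parallel run is than the serial run."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Efficiency: speed-up per processor, 1.0 being ideal linear scaling."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup(t_serial, t_parallel) / processors


if __name__ == "__main__":
    # Hypothetical timings: 120 s on one core, 20 s on eight cores.
    print(speedup(120.0, 20.0))
    print(efficiency(120.0, 20.0, 8))
```

With these hypothetical numbers the program runs six times faster on eight cores, so each core is used at 75% of the ideal.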

Example algorithms

vector matrix multiply, Gaussian Elimination
General linear recurrence
Column sweep algorithm and analysis

SIMD architectures

True vs. pipelined SIMD
Memory access organization
Instruction Set Model
Address calculation
PE and CU Instruction Set
Communication instructions
Mask vectors and conditions
Examples

MIMD multiprocessors

Shared memory
Fragmented (distributed memory)
Topology
Examples

Distributed Processing

Introduction
Example code

Programming shared memory multiprocessors

Process management
synchronization
Data oriented
Control oriented
Data sharing
storage classes
Examples, Adaptive Quadrature
OpenMP and Force programming languages

Synchronization/communication in distributed memory

send/receive (blocking vs. non-blocking)
CSP
MPI

Interconnection Networks and Permutations

cyclic
mesh
Perfect and Inverse Perfect Shuffle
Crossbar
Cube
Illiac IV
Benes Network
Omega Network / destination tag method
Examples
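
The destination tag method listed above can be sketched as a small simulation, assuming an Omega network with N = 2**bits lines in which every stage is a perfect shuffle followed by a column of 2x2 switches. At each stage the switch sends the message to its upper or lower output according to one bit of the destination address, most significant bit first. The function names are hypothetical, and the code models only the path of a single message, not contention between messages.

```python
def perfect_shuffle(position, bits):
    """Perfect shuffle: rotate the bits-bit line address left by one."""
    top = (position >> (bits - 1)) & 1
    return ((position << 1) & ((1 << bits) - 1)) | top


def omega_route(source, destination, bits):
    """Return the line a message occupies after each of the `bits` stages."""
    path = []
    position = source
    for stage in range(bits):
        # Wiring between stages.
        position = perfect_shuffle(position, bits)
        # Destination tag: bit `stage` of the destination, MSB first,
        # picks the upper (0) or lower (1) output of the 2x2 switch.
        tag_bit = (destination >> (bits - 1 - stage)) & 1
        position = (position & ~1) | tag_bit
        path.append(position)
    return path


if __name__ == "__main__":
    # Route from line 5 to line 2 in an 8-line (3-stage) network.
    print(omega_route(5, 2, 3))
```

Each stage replaces one bit of the current address with a destination bit, so after all stages the message sits on the destination line regardless of where it started.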

NYU Ultracomputer

Combining network
Fetch and ADD