CSCI 4551/5551 Parallel and Distributed Systems
Professor Gita Alaghband
Tentative Syllabus
Academic Calendar Fall '19 Deadlines

I may change this somewhat; relevant announcements will be made in class.

Office:

LSC-811

Email:

Gita.Alaghband@ucdenver.edu      

WEBSITE:

http://cse.ucdenver.edu/~gita

Office Hours:

Tuesday-Thursday:  1:00 to 3:00      By appointment only. Please call the CSE Office at 303-315-1411 (or 303-315-1408) for appointments.
Tuesday-Thursday:  3:00 to 3:30

TA contact and office hours provided on the FTP site and emailed directly to students. Note: for all Lab questions and help, please contact the TAs.

Description:

Catalog: Examine a range of topics involving parallel and distributed systems to improve computational performance. Topics include parallel and distributed programming languages, architectures, networks, algorithms and applications.


This is a state-of-the-art course in the vital area of parallel and distributed systems.
With advances in the computer architecture field, all new computers, including laptops, are now multi-core systems. While computer architectures have all moved to multi-core, the system software and programming of these computers have not advanced at the same rate. In fact, AMD and Intel have announced that they will increase the number of cores on a chip in all future processors, as have most computer companies. For these computers to be used effectively, new system software, programming languages, and applications must be designed with expertise in parallel and distributed systems. Industry is now looking for software designers trained in parallel and distributed systems for all of their new developments.

This course will cover and relate the three main components essential to parallel computation: parallel algorithms, parallel architectures, and parallel languages. Each of the three areas will be described, and their design influences on one another will be demonstrated.
Students will use our  Parallel Distributed Systems (PDS) Laboratory,  which houses:

•    Heracles: a multi-core cluster consisting of 18 nodes distributed as:

o    1 master node: 2 x Intel Xeon E5-2650v4 processors (24 cores, 12 cores/processor)
o    16 compute nodes: 2 x Intel Xeon E5-2650v4 processors each (24 cores/node, 12 cores/processor)
o    a cluster node with an Intel Xeon E5-2650v4 processor hosting 4 x NVIDIA Tesla P100 GPUs
o    Mellanox SwitchX-2 18-port QSFP FDR externally managed switch (1U)
o    non-blocking switch capacity of 2 Tb/s
o    128 GB RAM

•    Hydra: a multi-core cluster consisting of 17 nodes distributed as:

o    1 master node (12 cores)
o    16 AMD Opteron 2427 nodes (12 cores each)
o    416 GB RAM
o    ~5 TB disk space
o    four nodes connected to 8 Tesla 2050 (Fermi) GPUs, PCIe 2.0 x16 (1792 CUDA cores each)

•    a 64-core AMD Opteron 6274 server with one NVIDIA Kepler GPU (K40c)
•    a 16-core Intel Xeon processor with 2 Intel Xeon Phi 7120P coprocessors (122 cores), equipped with the latest Intel® Parallel Studio XE software.

The PDS Lab supports teaching and research in all areas of parallel and distributed computing: advanced computer architectures, operating systems, parallel programming languages, applications, and high performance computing and networking. For more information on the PDS Lab, please visit: http://PDS.ucdenver.edu

Text:

Fundamentals of Parallel Processing,
Harry Jordan and Gita Alaghband,
Prentice Hall Publication, 2003.
ISBN: 0-13-901158-7                            Note: This course also uses material outside the textbook.

Prerequisites:

Graduate standing in computer science is assumed for all graduate students.

For everyone in CSCI 4551, and for dual BS/MS students applying the course toward both the BS and the MS CSCI, these prerequisites are strictly enforced: CSCI 3415, CSCI 3453, and MATH 3195 (with a minimum grade of C-).

Expected Knowledge
At the start of the course: Students must
•    have knowledge of algorithms, their design and implementation, be able to analyze them for their complexity
•    be familiar with various programming languages, their characteristics, and differences
•    have an in-depth understanding of principles of operating systems
•    understand linear algebra and differential equations, and be able to find solutions to related problems
At the end of the Course: Students will have gained
•    knowledge of multiple parallel platforms: shared-memory multicores (MIMD), GPUs (SIMD/SIMT), distributed memory multiprocessors (MIMD), and clusters
•    an understanding of parallel algorithm design, complexity and performance analysis for various parallel platforms, and their characteristics to solve problems with emphasis on scientific computing applications
•    familiarity and practice with parallel programming languages for each of the platforms (OpenMP: shared memory MIMD; Cuda: GPU; MPI: distributed memory MIMD)
•    an understanding of parallel program constructs (work distribution & scheduling mechanisms: loops, case statements; synchronization constructs: critical sections, locks, barriers, process creation and join constructs) and their efficient implementation for shared-memory MIMD
•    an understanding of various communication protocols for message passing (distributed MIMD)
•    an understanding of the underlying interconnection network for various platforms
ABET Criteria
1. Analyze a complex computing problem and apply principles of computing and other relevant disciplines to identify solutions.
6. Apply computer science theory and software development fundamentals to produce computing-based solutions.

Grading:


                                          CSCI 4551                  CSCI 5551

Homework (individual work)                15%                        15%
Lab assignments (individual work)         35%                        35%
In-class assignments (usually team work)  15%                        10%
Research & Implementation:
    Team project implementation,
      report & presentation               25%                        --
    Research presentation
      (MS students only)                  --                         10%
    Implementation, report
      & presentation                      --                         20%
Peer reviews                              10%                        10%

Final Grade Assignment

Grade        Total points
A            90-100
B            80-89
C            70-79
D            60-69
F            0-59


Notes:

  • Research Project: (CSCI 5551): Consists first of a research topic that you will study in depth and present to the class, followed by a project based on your research that you will implement as your class project. Select a topic of interest from the list of topics. I recommend that you talk to me about your ideas early on, before you submit your proposal. (CSCI 4551): You will complete a team project. A list of CSCI 4551 projects with high-level requirements is provided for you to choose from. By the due date I will need your proposal, your team members, and details discussing any variations to the provided projects. You may discuss other projects of interest with me and, if approved, work on preparing a detailed proposal. All projects must be done on the available PDS computing facilities.
  • In-class Assignments: We will discuss and solve some of the homework problems together in class. There will also be some unannounced class assignments that will require you to work either individually or in teams.
  • Peer Reviews: Students may be involved in grading homework (at times), team assignment reviews (see guide), research presentation reviews, and project reviews. Class discussion and participation are essential components of this course.
  • All deadlines must be met.
  • It is important to attend class regularly. Students are responsible for missed classes.
  • Workload: You should schedule yourself to spend an average of 9 hours/week for this course.
  • No computers during lectures: Please do not use your computers during lecture time. You may print the notes to take additional notes and add clarifications in class, but please do not try to follow the lectures on your computers during class time.
  • Student Honor Code: We will adhere to the College of Engineering and Applied Science Student Honor Code.

Tentative  Schedule

August     20

Classes Begin

October    17

Research & Project Proposals Due (complete with references). Be sure to discuss your project ideas before this date.

October    24

  • Seminar Presentations Start (We may change this date depending on class size)
  • Reports are due at the time of presentation. (In addition to your report, email an electronic copy of your PowerPoint slides with annotated notes and complete references.)

November    8

Project Presentations Start; use the project review guide for your reviews. (We may change this date depending on class size.)

November 25 - December  1

Fall Break

December    10

Make sure all your work has been completed and submitted.
No work will be accepted after this date.

Topics Covered 


Some adjustments to these topics may be made during the semester as new parallel computer platforms emerge.


  • Introduction
    • SIMD 
    • MIMD 
    • SIMD/MIMD pseudo code 
    • SIMD/MIMD code example (Matrix Multiply)

  • Prefix Algorithms
    • Sequential
    • Divide and conquer
    • Upper/Lower construction
    • Size and depth analysis
    • Odd/Even construction
    • Size and depth analysis
    • Combination method
    • Size and depth analysis

  • Speed-up and efficiency
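
The standard definitions, with a small worked example (the timing numbers are illustrative, not measurements):

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}
% Example: a program taking T_1 = 120 s serially and T_8 = 20 s on
% p = 8 processors has
%   S_8 = 120/20 = 6   and   E_8 = 6/8 = 0.75  (75% efficiency).
```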

  • Example algorithms
    • vector matrix multiply, Gaussian Elimination
    • General linear recurrence
    • Column sweep algorithm and analysis

  • SIMD architectures
    • True vs. pipelined SIMD
    • Memory access organization
    • Instruction Set Model
    • Address calculation
    • PE and CU Instruction Set
    • Communication instructions
    • Mask vectors and conditions
    • Examples

  • MIMD multiprocessors
    • Shared memory
    • Fragmented (distributed memory)
    • Topology
    • Examples

  • Distributed Processing
    • Introduction
    • Example code

  • Programming shared memory multiprocessors
    • Process management
    • synchronization
    • Data oriented
    • Control oriented
    • Data sharing
    • storage classes
    • Examples, Adaptive Quadrature
    • OpenMP and Force programming languages

  • Synchronization/communication in distributed memory
    • send/receive (blocking vs. non-blocking)
    • CSP
    • MPI

  • Interconnection Networks and Permutations
    • cyclic
    • mesh
    • Perfect and Inverse Perfect Shuffle
    • Crossbar
    • Cube
    • Illiac IV
    • Benes Network
    • Omega Network/ destination tag method
    • Examples

  • NYU Ultracomputer
    • Combining network
    • Fetch and ADD