CSCI 5551/7551 Parallel and Distributed Systems
Professor Gita Alaghband

Tentative Syllabus
Academic Calendar '17 Deadlines

I may change this syllabus somewhat; relevant announcements will be made in class.
Office: LSC-811
Email: All your emails must have CSCI 5551 or CSCI 7551 in the subject field; otherwise, I may lose your message.      
Office Hours: Subject to change; I will notify you of any change and update this site.
Tuesday-Thursday:  1:00 to 3:00, by appointment only. Please call the CSE Office at 303-315-1411 (or 303-315-1408) for appointments.
Tuesday-Thursday:  3:00 to 3:30
Description: This is a state-of-the-art course in the vital area of parallel and distributed systems.
With advances in computer architecture, all new computers, including laptops, are now multi-core systems. Although architectures have moved to multi-core, the system software and programming of these computers have not advanced at the same rate. In fact, AMD and Intel, like most computer companies, have announced that they will increase the number of cores on a chip in all future processors. For these computers to be used effectively, new system software, programming languages, and applications must be designed with expertise in parallel and distributed systems. Industry is now looking for software designers with training in parallel and distributed systems for its new developments.

This course will cover and relate three main components essential to parallel computation, namely parallel algorithms, parallel architectures, and parallel languages. The three areas will be described, and their influence on each other's design will be demonstrated.
Students will use our Parallel Distributed Systems (PDS) Laboratory, which houses:
    •    Heracles: a multi-core cluster consisting of 18 nodes distributed as:
      o    1 master node, 2 x Intel Xeon E5-2650v4 Processor with 24 cores
      o    16 compute nodes, 2 x Intel Xeon E5-2650v4 with 24 cores (12 cores/processor)
      o    a cluster node with Intel Xeon E5-2650v4 Processor hosting 4 x NVIDIA Tesla P100 GPUs
      o    Mellanox SwitchX-2 18-Port QSFP FDR Externally Managed Switch (1U)
      o    Non-Blocking Switch Capacity of 2Tb/s
      o    128GB
    •    Hydra: a multi-core cluster consisting of 17 nodes distributed as:
      o     1 master node (12 cores)
      o    16 AMD Opteron 2427 nodes (12 cores each)
      o    416 GB RAM
      o    ~ 5TB disk space
      o    four nodes connected to 8 Tesla Fermi GPUs 2050, PCIE2x16 (1792 CUDA cores each)
    •    a 64-core AMD Opteron 6274 server with one NVIDIA Kepler GPU (K40c)
    •    a 16-core Intel Xeon processor with 2 Intel Xeon Phi 7120P coprocessors (122 cores), equipped with the latest Intel® Parallel Studio XE software.

The PDS Lab supports teaching and research in all areas of parallel and distributed computing: advanced computer architectures, operating systems, parallel programming languages, applications, and high performance computing and networking. For more information on the PDS Lab, please visit:
Text: Fundamentals of Parallel Processing,
Harry Jordan and Gita Alaghband
Prentice Hall Publication, 2003.
ISBN: 0-13-901158-7
Prerequisites: Graduate standing in computer science is assumed. If in doubt, please talk to me regarding your background.
Homework/Lab assignments
45% (individual work)
Note: PhD students will conduct more in-depth research projects
In-class assignments

Research & Implementation:
Research Presentation
Implementation Presentation 25%
Peer Reviews/Class participation
Final Grade Assignment
Grade Total points
A 90-100
B 80-89
C 70-79
D 60-69
F 0-59

  • Research Project: You will first study a research topic in depth and present it to the class, then implement a project based on that research as your class project. Select a topic from the list of topics based on your interest. I recommend that you talk to me about your ideas early, before you submit your proposal.
  • In-class Assignments: There will be some unannounced in-class assignments and homework solutions.
  • Peer Reviews/Participation: Students will at times be involved in grading homework, reviewing team assignments (see guide), and reviewing research presentations. Class discussion and participation are essential components of this course.
  • All deadlines must be met.
  • It is important to attend class regularly. Students are responsible for missed classes.
  • Workload: Plan to spend an average of 9 hours/week on this course outside the classroom.
  • No computers during lectures: Please do not use your computers during lecture time. You may print the notes to take additional notes and add clarifications in class, but please do not try to follow the lectures on your computers.
  • Student Honor Code: We will adhere to the College of Engineering and Applied Science Student Honor Code.
Tentative  Schedule
August     21 (22) Classes Begin
October    19 Research  & Project Proposals Due (complete with references). Be sure to discuss your project ideas before this date.
October    26
  • Seminar Presentations Start (We may change this date depending on class size.)
  • Reports are due at the time of presentation. In addition to your report, email an electronic copy of your slides (PowerPoint) with annotated notes and complete references.
November    9
  • Project Presentations Start. Use the project review guide for your reviews. (We may change this date depending on class size.)
November 20 - 26 Fall Break
December    12 Make sure all your work has been completed and submitted.
No work will be accepted after this date.
Topics Covered 
Some adjustments to these topics may be made during the semester to reflect new parallel computer platforms.

  • Introduction
    • SIMD 
    • MIMD 
    • SIMD/MIMD pseudo code 
    • SIMD/MIMD code example

  • Prefix Algorithms
    • Sequential
    • Divide and conquer
    • Upper/Lower construction
    • Size and depth analysis
    • Odd/Even construction
    • Size and depth analysis
    • Combination method
    • Size and depth analysis

  • Speed-up and efficiency

  • Example algorithms
    • Vector matrix multiply
    • General linear recurrence
    • Column sweep algorithm and analysis

  • MIMD multiprocessors
    • Shared memory
    • Fragmented (distributed memory)
    • Topology
    • Examples

  • Distributed Processing
    • Introduction
    • Example code

  • Programming shared memory multiprocessors
    • Process management
    • Synchronization
    • Data oriented
    • Control oriented
    • Data sharing
    • Storage classes
    • Examples
    • OpenMP and Force programming languages

  • Synchronization/communication in distributed memory
    • Send/receive (blocking vs. non-blocking)
    • CSP
    • MPI

  • SIMD architectures
    • True vs. pipelined SIMD
    • Memory access organization
    • Instruction Set Model
    • Address calculation
    • PE and CU Instruction Set
    • Communication instructions
    • Mask vectors and conditions
    • Examples

  • Interconnection Networks and Permutations
    • Cyclic
    • Mesh
    • Perfect and Inverse Perfect Shuffle
    • Crossbar
    • Cube
    • Illiac IV
    • Benes Network
    • Omega Network/destination tag method
    • Examples

  • NYU Ultracomputer
    • Combining network
    • Fetch and ADD