Office: LSC-811
Email: Gita.Alaghband@ucdenver.edu
Website: http://cse.ucdenver.edu/~gita
Office Hours:
Tuesday-Thursday, 1:00 to 3:00: by appointment only. Please call the CSE office at 303-315-1411 (or 303-315-1408) for appointments.
Tuesday-Thursday, 3:00 to 3:30.
TA contact information and office hours are provided on the FTP site and emailed directly to students. Note: for all lab questions and help, please contact the TAs.
|
Description:
|
Catalog:
Examine a range of topics involving parallel and distributed systems to
improve computational performance. Topics include parallel and
distributed programming languages, architectures, networks, algorithms
and applications.
This is
a state-of-the-art course in the vital area of parallel and distributed
systems. With advances in computer architecture, all new computers,
including laptops, are now multi-core systems. While architectures have
moved to multi-core, the system software and programming of these computers
have not advanced at the same pace. In fact, AMD and Intel have announced
that they will increase the number of cores per chip in all future
processors, as have most computer companies. For these computers to be used
effectively, new system software, programming languages, and applications
must be designed with expertise in parallel and distributed systems.
Industry is now looking for software designers with training in parallel
and distributed systems for all of their new developments.
This course will cover and relate the three main components essential to
parallel computation: parallel algorithms, parallel architectures, and
parallel languages. Each of the three areas will be described, and their
design influences on one another will be demonstrated.
Students will use our Parallel Distributed Systems (PDS) Laboratory, which houses:
Heracles: a multi-core cluster consisting of 18 nodes distributed as:
- 1 master node, 2 x Intel Xeon E5-2650v4 processors with 24 cores
- 16 compute nodes, 2 x Intel Xeon E5-2650v4 with 24 cores (12 cores/processor)
- a cluster node with an Intel Xeon E5-2650v4 processor hosting 4 x NVIDIA Tesla P100 GPUs
- Mellanox SwitchX-2 18-port QSFP FDR externally managed switch (1U)
- non-blocking switch capacity of 2 Tb/s
- 128 GB
Hydra: a multi-core cluster consisting of 17 nodes distributed as:
- 1 master node (12 cores)
- 16 AMD Opteron 2427 nodes (12 cores each)
- 416 GB RAM
- ~5 TB disk space
- four nodes connected to 8 Tesla 2050 (Fermi) GPUs, PCIe2 x16 (1792 CUDA cores each)
A 64-core AMD Opteron 6274 server with one NVIDIA Kepler GPU (K40c).
A 16-core Intel Xeon processor with 2 Intel Xeon Phi 7120P coprocessors (122 cores), equipped with the latest Intel® Parallel Studio XE software.
The PDS
Lab supports teaching and research in all areas of parallel and distributed
computing: advanced computer architectures, operating systems, parallel
programming languages, applications, and high performance computing and
networking. For more information on the PDS Lab, please visit: http://PDS.ucdenver.edu
|
Text:
|
Fundamentals of
Parallel Processing,
Harry Jordan and Gita Alaghband,
Prentice Hall Publication, 2003.
ISBN: 0-13-901158-7
Note: This course uses material
outside the textbook as well.
|
Prerequisites:
|
Graduate standing in computer science is assumed for all graduate students.
For everyone in CSCI 4551, and for dual BS/MS students applying the course
toward both the BS and the MS in Computer Science, these prerequisites are
strictly enforced: CSCI 3415 & CSCI 3453 & MATH 3195 (with a minimum grade of C-).
|
Expected Knowledge
|
At the start of the course, students must:
- have knowledge of algorithms and their design and implementation, and be able to analyze their complexity
- be familiar with various programming languages, their characteristics, and their differences
- have an in-depth understanding of the principles of operating systems
- understand linear algebra and differential equations, and be able to find solutions to related problems
|
At the end of the course, students will have gained:
- knowledge of multiple parallel platforms: shared-memory multicores (MIMD), GPUs (SIMD/SIMT), distributed-memory multiprocessors (MIMD), and clusters
- an understanding of parallel algorithm design, complexity and performance analysis for various parallel platforms, and their characteristics for solving problems, with emphasis on scientific computing applications
- familiarity and practice with parallel programming languages for each of the platforms (OpenMP: shared-memory MIMD; CUDA: GPU; MPI: distributed-memory MIMD)
- an understanding of parallel program constructs (work distribution & scheduling mechanisms: loops, case statements; synchronization constructs: critical sections, locks, barriers, process creation and join constructs) and their efficient implementations for shared-memory MIMD
- an understanding of various communication protocols for message passing (distributed MIMD)
- an understanding of the underlying interconnection networks for various platforms
|
ABET Criteria
|
1. Analyze a complex computing problem and apply principles of computing and other relevant disciplines to identify solutions.
6. Apply computer science theory and software development fundamentals to produce computing-based solutions.
|
Grading:
|
| | Students enrolled in CSCI 4551 | Students enrolled in CSCI 5551 |
| Homework | 15% (individual work) | 15% (individual work) |
| Lab assignments | 35% (individual work) | 35% (individual work) |
| In-class assignments | 15% (usually team work) | 10% (usually team work) |
| Research & Implementation | 25% (team project implementation, report & presentation) | |
| Research presentation (MS students only) | | 10% |
| Implementation, report & presentation | | 20% |
| Peer reviews | 10% | 10% |
|
Final Grade Assignment
|
| Grade | Total points |
| A | 90-100 |
| B | 80-89 |
| C | 70-79 |
| D | 60-69 |
| F | 0-59 |
|
|
Notes:
|
- Research Project:
(CSCI 5551): Consists first of a research topic, which you will study in
depth and present to the class, followed by a project based on your
research that you will implement as your class project. Select a topic
based on your interest from the list of topics. I recommend that you talk
to me about your ideas early on, before you submit your proposal.
(CSCI 4551): You will complete a team project. A list of CSCI 4551 projects
with high-level requirements is provided for you to choose from. By the due
date, I will need your proposal, your team members, and details discussing
variations on the provided projects. You may discuss other projects of
interest with me and, if approved, work on preparing a detailed proposal.
All projects must be done on the PDS Lab's computing facilities.
- In-class Assignments:
We will be discussing and solving some of the homework problems together in
class. There will also be some unannounced class assignments that will
require you to work either individually or in teams.
- Peer Reviews:
Students may be involved in grading homework (at times), team assignment
reviews (see guide), research presentation reviews, and project reviews.
Class discussions and participation are essential components of this course.
- All deadlines must be
met.
- It is important to attend class
regularly. Students are responsible for missed
classes.
- Workload: You
should schedule yourself to spend an average of 9 hours/week for this
course.
- No computers during lectures: Please do not use your computers during
lecture time. You may print the notes to take additional notes and add
clarifications in class, but please do not try to follow the lectures on
your computers during class time.
- Student Honor Code: We will
adhere to the College of
Engineering and Applied Science Student Honor Code.
|
Tentative Schedule
|
| August 20 | Classes begin |
| October 17 | Research & project proposals due (complete with references). Be sure to discuss your project ideas before this date. |
| October 24 | Seminar presentations start (we may change this date depending on class size). Reports are due at the time of presentation; in addition to your report, email an electronic copy of your PowerPoint slides with annotated notes and complete references. |
| November 8 | Project presentations start; use the project review guide for your reviews. (We may change this date depending on class size.) |
| November 25 - December 1 | Fall break |
| December 10 | Make sure all your work has been completed and submitted. No work will be accepted after this date. |
|
|
Topics Covered
|
Some adjustments to these topics may be made during the semester as new
parallel computer platforms emerge.
|
|
- Introduction
- SIMD
- MIMD
- SIMD/MIMD pseudo code
- SIMD/MIMD code example (matrix multiply)
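As a quick taste of the matrix multiply example above, here is a minimal C sketch (not the course's official code): each row of the result is independent work, so the outer loop is the natural unit of MIMD-style distribution. The OpenMP pragma is illustrative; compiled without `-fopenmp` the code simply runs serially with the same result.

```c
#define N 3

/* Parallel matrix multiply sketch: rows of C are independent, so the
   outer loop can be split across threads/processes. */
void matmul(const double A[N][N], const double B[N][N], double C[N][N]) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)          /* rows distributed to workers   */
        for (int j = 0; j < N; j++) {
            double sum = 0.0;            /* private partial accumulator   */
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;               /* each (i,j) written by exactly
                                            one worker: no races          */
        }
}
```

Because no two iterations of the outer loop write the same element of C, no synchronization is needed; this is the "embarrassingly parallel" case discussed in class.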
|
|
- Prefix Algorithms
- Sequential
- Divide and conquer
- Upper/lower construction
- Size and depth analysis
- Odd/even construction
- Size and depth analysis
- Combination method
- Size and depth analysis
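To make the upper/lower construction concrete, here is a small C sketch (an illustration, not the textbook's code): split the array in half, form prefixes of each half recursively, then add the last lower-half prefix to every element of the upper half. That final fix-up loop is fully parallel, giving the O(log n) depth analyzed in this unit.

```c
/* Upper/lower prefix-sum construction (recursive sketch).
   After the call, x[i] holds x[0] + x[1] + ... + x[i]. */
void prefix_upper_lower(double x[], int n) {
    if (n <= 1) return;
    int half = n / 2;
    prefix_upper_lower(x, half);             /* prefixes of lower half    */
    prefix_upper_lower(x + half, n - half);  /* prefixes of upper half    */
    double carry = x[half - 1];              /* total of the lower half   */
    for (int i = half; i < n; i++)           /* parallel step: one add    */
        x[i] += carry;                       /* per upper-half element    */
}
```

The two recursive calls are independent and can run concurrently, which is where the divide-and-conquer parallelism comes from.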
|
|
|
|
- Example algorithms
- Vector-matrix multiply, Gaussian elimination
- General linear recurrence
- Column sweep algorithm and analysis
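A minimal C sketch of the column sweep idea for a general linear recurrence x[i] = b[i] + sum over j &lt; i of a[i][j]*x[j] (a hypothetical illustration, not the textbook's code): once x[k] is known, its contribution is folded into every later b[i] in one parallel step, one column of the coefficient matrix per sweep.

```c
#define NN 4

/* Column sweep for x[i] = b[i] + sum_{j<i} a[i][j]*x[j].
   Sweep k broadcasts x[k]; the inner loop over i is fully parallel.
   After sweep k, b[k+1] already holds the final value of x[k+1]. */
void column_sweep(const double a[NN][NN], double b[NN], double x[NN]) {
    x[0] = b[0];
    for (int k = 0; k < NN - 1; k++) {       /* NN-1 sequential sweeps */
        for (int i = k + 1; i < NN; i++)     /* parallel across i      */
            b[i] += a[i][k] * x[k];
        x[k + 1] = b[k + 1];
    }
}
```

The sweep count is inherently sequential, so the depth is O(n), but each sweep exposes up to n-1 independent multiply-adds, which is the tradeoff the analysis in this unit examines.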
|
|
- SIMD architectures
- True vs. pipelined SIMD
- Memory access organization
- Instruction set model
- Address calculation
- PE and CU instruction sets
- Communication instructions
- Mask vectors and conditions
- Examples
|
|
- MIMD multiprocessors
- Shared memory
- Fragmented (distributed) memory
- Topology
- Examples
|
|
- Distributed processing
- Introduction
- Example code
|
|
- Programming shared memory multiprocessors
- Process management
- Synchronization
- Data oriented
- Control oriented
- Data sharing
- Storage classes
- Examples, adaptive quadrature
- OpenMP and Force programming languages
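As a small preview of the shared-memory constructs in this unit, here is a C sketch (illustrative only) combining a work-distributing parallel loop with a critical section guarding a shared accumulator. With OpenMP the pragmas do the work; compiled without `-fopenmp` they are ignored and the code runs serially with the same answer.

```c
/* Shared-memory sketch: the loop iterations are distributed across
   threads, and the critical section serializes updates to the shared
   variable sum so concurrent += operations cannot race. */
double dot_product(const double *a, const double *b, int n) {
    double sum = 0.0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double t = a[i] * b[i];     /* private partial result         */
        #pragma omp critical        /* one thread at a time in here   */
        sum += t;
    }
    return sum;
}
```

In practice a reduction clause would avoid the per-iteration critical section entirely; the version above is written to show the synchronization construct explicitly.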
|
|
- Synchronization/communication in distributed memory
- Send/receive (blocking vs. non-blocking)
- CSP
- MPI
|
|
- Interconnection networks and permutations
- Cyclic
- Mesh
- Perfect and inverse perfect shuffle
- Crossbar
- Cube
- Illiac IV
- Benes network
- Omega network / destination tag method
- Examples
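The destination tag method for the Omega network can be captured in a few lines of C (a sketch, not course-provided code): between stages the perfect-shuffle wiring rotates the message's address left by one bit, and each 2x2 switch then replaces the low bit with the next destination bit, most significant first. After log2(n) stages the message is at its destination regardless of where it started.

```c
/* Destination-tag routing through an Omega network of 1<<stages inputs.
   Returns the message's final position, which is always dest. */
unsigned omega_route(unsigned src, unsigned dest, int stages) {
    unsigned n = 1u << stages;
    unsigned p = src;                        /* current position       */
    for (int s = stages - 1; s >= 0; s--) {
        /* perfect shuffle: rotate the stages-bit address left by one */
        p = ((p << 1) | (p >> (stages - 1))) & (n - 1);
        /* switch setting: low bit becomes destination bit s (MSB first) */
        p = (p & ~1u) | ((dest >> s) & 1u);
    }
    return p;
}
```

Because each stage overwrites one bit with a destination bit, all source bits are shifted out after the last stage, which is why a single tag routes correctly from any input.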
|
|
- NYU Ultracomputer
- Combining network
- Fetch-and-add
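Fetch-and-add, which the Ultracomputer's combining network implements in hardware, has a direct modern analogue in C11 atomics; a minimal sketch (illustrative, not course code): F&A(x, e) atomically returns the old value of x and adds e, so concurrent callers each receive a distinct result, e.g. unique queue slots or loop indices, without locks.

```c
#include <stdatomic.h>

/* Fetch-and-add via C11 atomics: returns the value of *x before the
   add, while *x is incremented by e as one indivisible operation. */
int fetch_and_add(atomic_int *x, int e) {
    return atomic_fetch_add(x, e);
}
```

The combining network generalizes this: when two F&A requests to the same location meet at a switch, they are merged into one, so n simultaneous requests complete in O(log n) network traversals instead of serializing at memory.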
|