Stanford CS149, Winter 2019
PARALLEL COMPUTING
From smartphones, to multi-core CPUs and GPUs, to the world's largest supercomputers and websites, parallel processing is ubiquitous in modern computing. The goal of this course is to provide a deep understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computing systems, as well as to teach the parallel programming techniques necessary to use these machines effectively. Because writing good parallel programs requires an understanding of key machine performance characteristics, this course covers both parallel hardware and software design.
Basic Info
Tues/Thurs 4:30-6:00pm
Room 420-040
Instructors: Kunle Olukotun and Kayvon Fatahalian
See the course info page for more info on course policies and logistics.
Winter 2019 Schedule
Jan 8 | Motivations for parallel chip decisions, challenges of parallelizing code
Jan 10 | Forms of parallelism: multicore, SIMD, threading + understanding latency and bandwidth
Jan 15 | Ways of thinking about parallel programs, their corresponding hardware implementations, ISPC programming
Jan 17 | Thought process of parallelizing a program in data-parallel and shared address space models
Jan 22 | Achieving good work distribution while minimizing overhead, scheduling Cilk programs with work stealing
Jan 24 | Message passing, async vs. blocking sends/receives, pipelining, increasing arithmetic intensity, avoiding contention
Jan 29 | CUDA programming abstractions, and how they are implemented on modern GPUs
Jan 31 | Definition of memory coherence, invalidation-based coherence using MSI and MESI, false sharing
Feb 5 | Consistency vs. coherence, relaxed consistency models and their motivation, acquire/release semantics
Feb 7 | Directory-based coherence, machine-level atomic operations, implementing locks, implementing barriers
Feb 12 | Midterm Exam
Feb 14 | Data-parallel thinking: map, reduce, scan, prefix sum, groupByKey
Feb 19 | Producer-consumer locality, RDD abstraction, Spark implementation and scheduling
Feb 21 | Fine-grained synchronization via locks, basics of lock-free programming: single-reader/writer queues, lock-free stacks, the ABA problem, hazard pointers
Feb 26 | Motivation for transactions, design space of transactional memory implementations, lazy-optimistic HTM
Feb 28 | Energy-efficient computing, motivation for heterogeneous processing, fixed-function processing, FPGAs, mobile SoCs
Mar 5 | Motivation for DSLs, OptiML, Delite, case study on Halide
Mar 7 | GraphLab, Ligra, and GraphChi, streaming graph processing, graph compression
Mar 12 | Scheduling convolutional layers, exploiting precision and sparsity, DNN accelerators (e.g., GPU Tensor Cores, TPU)
Mar 14 | Have a great spring break!
Programming Assignments
Written Assignments