Stanford CS149, Fall 2022
PARALLEL COMPUTING
This page contains lecture slides and recommended readings for the Fall 2022 offering of CS149.
(Challenges of parallelizing code, motivations for parallel chips, processor basics)
(Forms of parallelism: multi-core, SIMD, and multi-threading)
(Finishing up multi-threading and the latency vs. bandwidth distinction; ISPC programming, abstraction vs. implementation)
(Ways of thinking about parallel programs, thought process of parallelizing a program in data parallel and shared address space models)
(Achieving good work distribution while minimizing overhead, scheduling Cilk programs with work stealing)
(Message passing, async vs. blocking sends/receives, pipelining, increasing arithmetic intensity, avoiding contention)
(CUDA programming abstractions, and how they are implemented on modern GPUs)
(Data-parallel operations like map, reduce, scan, prefix sum, groupByKey)
(Producer-consumer locality, RDD abstraction, Spark implementation and scheduling)
(Efficiently scheduling DNN layers, mapping convolutions to matrix multiplication, transformers, layer fusion)
(Definition of memory coherence, invalidation-based coherence using MSI and MESI, false sharing)
(Relaxed consistency models and their motivation, acquire/release semantics)
(Fine-grained synchronization via locks, basics of lock-free programming: single-reader/writer queues, lock-free stacks, the ABA problem, hazard pointers)
(Performance/productivity motivations for DSLs, case studies on several DSLs)
(Motivation for transactions, design space of transactional memory implementations)
(Finishing up transactional memory, focusing on STM and HTM implementations)
(Energy-efficient computing, motivation for heterogeneous processing, fixed-function processing, FPGAs, mobile SoCs)
(How DRAM works, suggestions for post-CS149 topics)