Stanford CS149, Winter 2019
This page contains lecture slides and recommended readings for the Winter 2019 offering of CS149.
(motivations for parallel chip decisions, challenges of parallelizing code)
Further Reading:
- The Future of Microprocessors. by K. Olukotun and L. Hammond, ACM Queue 2005
- Power: A First-Class Architectural Design Constraint. by Trevor Mudge, IEEE Computer 2001
(forms of parallelism: multicore, SIMD, threading + understanding latency and bandwidth)
Further Reading:
- CPU DB: Recording Microprocessor History. A. Danowitz, K. Kelley, J. Mao, J.P. Stevenson, M. Horowitz, ACM Queue 2012. (You can also take a peek at the CPU DB website)
- The Compute Architecture of Intel Processor Graphics. Intel Technical Report, 2015 (a very nice description of a modern throughput processor)
- Intel's Haswell CPU Microarchitecture. D. Kanter, 2013 (article)
- NVIDIA GP100 Pascal Whitepaper. NVIDIA Technical Report 2016
(ways of thinking about parallel programs, and their corresponding hardware implementations)
Further Reading: (some fun systems)
(the thought process of parallelizing a program)
(achieving good work distribution while minimizing overhead, scheduling Cilk programs with work stealing)
Further Reading:
- CilkPlus documentation
- Scheduling Multithreaded Computations by Work Stealing. by Blumofe and Leiserson, JACM 1999
- The Implementation of the Cilk-5 Multithreaded Language. by Frigo et al. PLDI 1998
- Intel Threading Building Blocks
(message passing, async vs. blocking sends/receives, pipelining, increasing arithmetic intensity, avoiding contention)
(CUDA programming abstractions, and how they are implemented on modern GPUs)
Further Reading:
- You may enjoy the free Udacity Course: Intro to Parallel Programming Using CUDA, by Luebke and Owens
- The Thrust Library is a useful library of parallel algorithms and containers for CUDA.
- Rise of the Graphics Processor. D. Blythe, Proceedings of the IEEE 2008 (a nice overview of GPU history)
- NVIDIA GeForce GTX 1080 Whitepaper. NVIDIA Technical Report 2016
- NVIDIA Tesla P100 Whitepaper. NVIDIA Technical Report 2016
- The Compute Architecture of Intel Processor Graphics. Intel Technical Report, 2015 (a very nice description of a modern Intel integrated GPU)
- Pascal Tuning Guide. NVIDIA CUDA Documentation
(definition of memory coherence, invalidation-based coherence using MSI and MESI, maintaining coherence with multi-level caches, false sharing)