
Visual computing tasks such as computational imaging, image/video understanding, and real-time 3D graphics are key responsibilities of modern computer systems, ranging from sensor-rich smartphones and autonomous robots to large datacenters. These workloads demand exceptional system efficiency, and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. The course is intended both for systems students interested in architecting efficient graphics, image processing, and computer vision platforms (new hardware architectures as well as domain-optimized programming frameworks for those platforms) and for graphics, vision, and machine learning students who wish to understand throughput computing principles so they can design algorithms that map efficiently to these machines.
Apr 7 | How superscalar, multi-core, SIMD, and hardware multi-threading are used in CPUs/GPUs; understanding latency and bandwidth constraints
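
To give a flavor of this material, here is a minimal sketch (assuming an OpenMP-capable compiler; the SAXPY kernel and array sizes are illustrative, not course code) of how one loop can exercise multi-core, SIMD, and the memory system at once:

```cpp
#include <cstdio>
#include <vector>

// Multiply-add over large arrays: the outer loop is split across cores and each
// core's chunk is mapped onto SIMD lanes.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const size_t n = x.size();
    #pragma omp parallel for simd          // requires -fopenmp (or equivalent)
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];            // 2 flops, 8 bytes read, 4 bytes written
    }
}

int main() {
    std::vector<float> x(1 << 24, 1.0f), y(1 << 24, 2.0f);
    saxpy(3.0f, x, y);
    std::printf("y[0] = %.1f\n", y[0]);    // expect 5.0
}
```

At roughly two flops per twelve bytes of memory traffic, this loop saturates DRAM bandwidth long before it saturates the cores, which is exactly the latency/bandwidth constraint the lecture examines.
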
Apr 9 | Algorithms for taking raw sensor pixels to an RGB image: demosaicing, sharpening, correcting lens aberrations, multi-shot alignment/merging, image filtering
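
As a flavor of the stencil computations these pipelines are built from, a minimal sketch of a 3x3 sharpening filter (the grayscale image layout, kernel weights, and clamp-to-edge border handling are illustrative assumptions):

```cpp
#include <algorithm>
#include <vector>

// Unsharp-mask-like 3x3 filter: boost the center, subtract the cross neighbors.
std::vector<float> sharpen(const std::vector<float>& img, int w, int h) {
    std::vector<float> out(img.size());
    auto at = [&](int x, int y) {                    // clamp-to-edge border handling
        x = std::clamp(x, 0, w - 1);
        y = std::clamp(y, 0, h - 1);
        return img[y * w + x];
    };
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            out[y * w + x] = 5.0f * at(x, y)
                           - at(x - 1, y) - at(x + 1, y)
                           - at(x, y - 1) - at(x, y + 1);
        }
    }
    return out;
}
```
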
Apr 14 | Multi-scale processing with Gaussian and Laplacian pyramids, HDR/local tone mapping, portrait mode in the Pixel 2 camera
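
A minimal sketch of the pyramid construction idea (a box-filter downsample and nearest-neighbor upsample stand in for the Gaussian filtering a real pipeline would use; even image dimensions are assumed):

```cpp
#include <vector>

struct Image { int w, h; std::vector<float> px; float& at(int x, int y) { return px[size_t(y) * w + x]; } };

// 2x downsample with a 2x2 box filter (stand-in for a Gaussian blur). Assumes even w, h.
Image downsample(Image& in) {
    Image out{in.w / 2, in.h / 2, std::vector<float>(size_t(in.w / 2) * (in.h / 2))};
    for (int y = 0; y < out.h; y++)
        for (int x = 0; x < out.w; x++)
            out.at(x, y) = 0.25f * (in.at(2*x, 2*y) + in.at(2*x+1, 2*y) +
                                    in.at(2*x, 2*y+1) + in.at(2*x+1, 2*y+1));
    return out;
}

// Nearest-neighbor 2x upsample (a real pipeline would interpolate).
Image upsample(Image& in) {
    Image out{in.w * 2, in.h * 2, std::vector<float>(size_t(in.w * 2) * (in.h * 2))};
    for (int y = 0; y < out.h; y++)
        for (int x = 0; x < out.w; x++)
            out.at(x, y) = in.at(x / 2, y / 2);
    return out;
}

// Laplacian level = fine level minus the upsampled coarser level: the detail lost
// by downsampling, which local tone mapping manipulates per scale.
Image laplacian_level(Image& fine, Image& coarse) {
    Image up = upsample(coarse);
    Image out = fine;
    for (size_t i = 0; i < out.px.size(); i++) out.px[i] -= up.px[i];
    return out;
}
```
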
Apr 16 | Balancing locality, parallelism, and work; fusion and tiling; design of the Halide domain-specific language; automatically scheduling image processing pipelines
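
A minimal sketch of Halide's separation of algorithm from schedule (assuming the Halide library is available; the two-stage blur and the tile/vector sizes echo the standard introductory example rather than anything specific to this course):

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    Buffer<float> in(1920, 1080);            // would normally hold real image data
    Func blur_x("blur_x"), blur_y("blur_y");
    Var x("x"), y("y"), xi("xi"), yi("yi");

    // Algorithm: what is computed — a separable 3x3 box blur with clamped borders.
    Func clamped = BoundaryConditions::repeat_edge(in);
    blur_x(x, y) = (clamped(x - 1, y) + clamped(x, y) + clamped(x + 1, y)) / 3.0f;
    blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3.0f;

    // Schedule: how it is computed — tile the output for locality, parallelize
    // across tiles, vectorize within rows, and compute the producer per tile
    // (partial fusion trades a little redundant work for much better locality).
    blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
    blur_x.compute_at(blur_y, x).vectorize(x, 8);

    Buffer<float> out = blur_y.realize({1920, 1080});
}
```

Changing only the schedule lines trades off locality (tile size), parallelism (parallel/vectorize), and redundant work (where blur_x is computed), which is precisely the balancing act this lecture is about.
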
Apr 21 | Popular DNN trunks and topologies, design of MobileNet, challenges of direct implementation, where the compute lies in modern networks, DNN pruning, neural architecture search
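
To see where the compute lies, a back-of-the-envelope multiply-accumulate count comparing a standard convolution with a MobileNet-style depthwise-separable factorization (the layer dimensions are illustrative, not drawn from any particular network):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    int64_t H = 56, W = 56;                 // output spatial size
    int64_t Cin = 128, Cout = 128, K = 3;   // channels and kernel size

    // Standard conv: every output pixel mixes all input channels through a KxK window.
    int64_t standard = H * W * Cout * Cin * K * K;

    // Depthwise separable: a per-channel KxK filter, then a 1x1 "pointwise" conv.
    int64_t depthwise = H * W * Cin * K * K;
    int64_t pointwise = H * W * Cout * Cin;
    int64_t separable = depthwise + pointwise;

    std::printf("standard conv : %lld MACs\n", (long long)standard);
    std::printf("separable conv: %lld MACs (%.1fx fewer)\n",
                (long long)separable, (double)standard / separable);
}
```
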
Apr 23 | GPUs, Google TPU, special instructions for DNN evaluation, choice of precision in arithmetic, recent ISCA/MICRO papers on DNN acceleration, flexibility vs. efficiency trade-offs
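
One way to make the precision question concrete: a minimal sketch of affine int8 quantization of the kind many inference accelerators rely on (the scale/zero-point scheme and value range here are illustrative, not those of the TPU or any specific chip):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Map a float to an int8 code and back, given a scale and zero point.
int8_t quantize(float v, float scale, int zero_point) {
    int q = (int)std::lround(v / scale) + zero_point;
    return (int8_t)std::clamp(q, -128, 127);
}
float dequantize(int8_t q, float scale, int zero_point) {
    return scale * (q - zero_point);
}

int main() {
    float lo = -2.0f, hi = 6.0f;                       // assumed activation range
    float scale = (hi - lo) / 255.0f;                  // one int8 step in real units
    int zero_point = (int)std::lround(-lo / scale) - 128;

    float v = 1.2345f;
    int8_t q = quantize(v, scale, zero_point);
    std::printf("%.4f -> %d -> %.4f\n", v, q, dequantize(q, scale, zero_point));
}
```

The accuracy lost to the rounding step is the price paid for 4x less memory traffic than fp32 and much denser multiply-accumulate hardware, which is the flexibility vs. efficiency trade-off in miniature.
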
Apr 28 | Memory footprint challenges of training, model vs. data parallelism, the asynchronous vs. synchronous training debate, parameter server designs, key systems optimizations for parallel training
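
A minimal sketch of synchronous data-parallel training on one machine (the per-worker gradient computation is a placeholder; across machines, a parameter server or an all-reduce performs the same averaging step):

```cpp
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int workers = 4, dims = 8;
    std::vector<std::vector<float>> grads(workers, std::vector<float>(dims, 0.0f));

    // Each worker computes a gradient on its shard of the batch (placeholder math).
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; w++)
        pool.emplace_back([&, w] {
            for (int d = 0; d < dims; d++) grads[w][d] = float(w + d);
        });
    for (auto& t : pool) t.join();

    // Synchronous step: average the per-worker gradients, then update the model once.
    std::vector<float> avg(dims, 0.0f);
    for (int w = 0; w < workers; w++)
        for (int d = 0; d < dims; d++) avg[d] += grads[w][d] / workers;
    std::printf("avg grad[0] = %.2f\n", avg[0]);
}
```
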
Apr 30 |
May 5 | If the most important step of ML is acquiring training data, why don't we have systems for it?
May 7 | Exploiting temporal coherence in video, specialization to scene and camera viewpoint
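
A minimal sketch of exploiting temporal coherence: skip expensive per-frame work when a cheap frame-difference test says little has changed (the threshold, the difference metric, and the `run_expensive_stage` name are illustrative assumptions):

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Cheap change detector: mean absolute difference between consecutive frames.
bool frame_changed(const std::vector<uint8_t>& prev,
                   const std::vector<uint8_t>& cur,
                   double threshold = 2.0) {
    long long diff = 0;
    for (size_t i = 0; i < cur.size(); i++)
        diff += std::abs(int(cur[i]) - int(prev[i]));
    return double(diff) / cur.size() > threshold;
}

// Usage pattern: only pay for the expensive stage (e.g., a DNN) when the input
// actually changed; otherwise reuse the previous frame's result.
//   if (frame_changed(prev, cur)) result = run_expensive_stage(cur);  // hypothetical stage
```
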
May 12 | H.264 video representation/encoding, parallel encoding, motivations for ASIC acceleration, emerging opportunities for compression when machines, not humans, will observe most images
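
A minimal sketch of the inner loop of block-matching motion estimation, a dominant cost in H.264-style encoding and a key motivation for SIMD and ASIC support (the block size, search radius, and brute-force full search are illustrative; real encoders use much smarter searches, and the caller must keep the block and search window inside both frames):

```cpp
#include <climits>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct MV { int dx, dy; };

// Sum of absolute differences between a block in the current frame and a
// displaced candidate block in the reference frame.
int sad(const std::vector<uint8_t>& cur, const std::vector<uint8_t>& ref,
        int w, int bx, int by, int dx, int dy, int B) {
    int s = 0;
    for (int y = 0; y < B; y++)
        for (int x = 0; x < B; x++)
            s += std::abs(int(cur[(by + y) * w + bx + x]) -
                          int(ref[(by + y + dy) * w + bx + x + dx]));
    return s;
}

// Exhaustive search over a small window for the best-matching motion vector.
MV motion_search(const std::vector<uint8_t>& cur, const std::vector<uint8_t>& ref,
                 int w, int bx, int by, int B = 16, int R = 8) {
    MV best{0, 0};
    int best_sad = INT_MAX;
    for (int dy = -R; dy <= R; dy++)
        for (int dx = -R; dx <= R; dx++) {
            int s = sad(cur, ref, w, bx, by, dx, dy, B);
            if (s < best_sad) { best_sad = s; best = {dx, dy}; }
        }
    return best;
}
```
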
May 14 | Parallel video encoding, video ingest at Facebook, discussion of the ethics of continuous capture
May 19 | The 3D graphics pipeline as a machine architecture (abstraction), pipeline semantics/functionality, contrasting graphics pipeline architecture with compute-mode GPU architecture
May 21 | The 3D graphics pipeline as a machine architecture (abstraction), the Molnar sorting taxonomy, dataflow scheduling under data amplification, tiled rendering for bandwidth efficiency, deferred shading as a scheduling decision
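
A minimal sketch of the binning step behind tiled (sort-middle) rendering: triangles are bucketed by the screen tiles their bounding boxes touch, so each tile's rasterization and shading can later run out of on-chip memory (the tile size and the bounding-box-only overlap test are simplifying assumptions):

```cpp
#include <algorithm>
#include <vector>

struct Tri { float x[3], y[3]; };   // screen-space vertex positions

// bins[ty * tilesX + tx] holds the indices of triangles touching that tile.
std::vector<std::vector<int>> bin_triangles(const std::vector<Tri>& tris,
                                            int width, int height, int tile = 64) {
    int tilesX = (width + tile - 1) / tile, tilesY = (height + tile - 1) / tile;
    std::vector<std::vector<int>> bins(size_t(tilesX) * tilesY);
    for (int i = 0; i < (int)tris.size(); i++) {
        const Tri& t = tris[i];
        float minx = std::min({t.x[0], t.x[1], t.x[2]});
        float maxx = std::max({t.x[0], t.x[1], t.x[2]});
        float miny = std::min({t.y[0], t.y[1], t.y[2]});
        float maxy = std::max({t.y[0], t.y[1], t.y[2]});
        int tx0 = std::max(0, int(minx) / tile), tx1 = std::min(tilesX - 1, int(maxx) / tile);
        int ty0 = std::max(0, int(miny) / tile), ty1 = std::min(tilesY - 1, int(maxy) / tile);
        for (int ty = ty0; ty <= ty1; ty++)
            for (int tx = tx0; tx <= tx1; tx++)
                bins[ty * tilesX + tx].push_back(i);
    }
    return bins;
}
```
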
May 26 | The RenderMan Shading Language and Cg: contrasting two different levels of abstraction for shading languages; Slang
May 28 | The DXR ray tracing APIs, hardware acceleration of ray tracing
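
A minimal sketch of the ray-box "slab" test at the heart of BVH traversal, one of the operations dedicated ray tracing hardware accelerates alongside ray-triangle intersection (the struct layout is illustrative, and zero direction components are ignored for brevity):

```cpp
#include <algorithm>
#include <utility>

struct Ray  { float ox, oy, oz, dx, dy, dz; };            // origin and direction
struct AABB { float minx, miny, minz, maxx, maxy, maxz; };

// Returns true if the ray hits the box within [tmin, tmax].
// Assumes direction components are non-zero (a real kernel handles zeros/infinities).
bool hit(const Ray& r, const AABB& b, float tmin, float tmax) {
    auto slab = [&](float o, float d, float lo, float hi) {
        float t0 = (lo - o) / d, t1 = (hi - o) / d;
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
    };
    slab(r.ox, r.dx, b.minx, b.maxx);
    slab(r.oy, r.dy, b.miny, b.maxy);
    slab(r.oz, r.dz, b.minz, b.maxz);
    return tmin <= tmax;
}
```
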
Jun 2 | Class Discussion
Jun 4 | How might we architect renderers differently to support the needs of training, rather than game engines?
Jun 9 | Project Presentations
Assignments:
Apr 23 | Burst Mode HDR Camera RAW Processing
May 5 | Optimizing a Conv Layer in Halide (Making Students Appreciate cuBLAS)
Jun 9 | Self-Selected Term Project