Visual computing tasks such as computational imaging, image/video understanding, and real-time 3D graphics are key responsibilities of modern computer systems ranging from sensor-rich smart phones, autonomous robots, and large datacenters. These workloads demand exceptional system efficiency and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. This course is intended for systems students interested in architecting efficient graphics, image processing, and computer vision platforms (both new hardware architectures and domain-optimized programming frameworks for these platforms) and for graphics, vision, and machine learning students that wish to understand throughput computing principles to design new algorithms that map efficiently to these machines.
How superscalar, multi-core, SIMD, and hardware multi-threading are used in CPUs/GPUs, understanding latency and bandwidth constraints
Algorithms for taking raw sensor pixels to an RGB image: demosaicing, sharpening, correcting lens aberrations, multi-shot alignment/merging, image filtering
Multi-scale processing with Gaussian and Laplacian pyramids, HDR/local tone mapping, portrait mode in the Pixel 2 camera
Balancing locality, parallelism, and work, fusion and tiling, design of the Halide domain-specific language, automatically scheduling image processing pipelines
popular DNN trunks and topologies, design of MobileNet, challenges of direct implementation, where the compute lies in modern networks, DNN pruning, neural architecture search
GPUs, Google TPU, special instructions for DNN evaluation, choice of precision in arithmetic, recent ISCA/MICRO papers on DNN acceleration, flexibility vs efficiency trade-offs
Footprint challenges of training, model vs. data parallelism, asynchronous vs. synchronous training debate, parameter server designs, key systems optimizations for parallel training
If the most important step of ML is acquiring training data, why don't we have systems for it?
Exploiting temporal coherence in video, specialization to scene and camera viewpoint
H.264 video representation/encoding, parallel encoding, motivations for ASIC acceleration, emerging opportunities for compression when machines, not humans, will observe most images
Parallel video encoding, video ingest at Facebook, discussion of ethics of continuous capture
3D graphics pipeline as a machine architecture (abstraction), pipeline semantics/functionality, contrasting graphics pipeline architecture with compute-mode GPU architecture
3D graphics pipeline as a machine architecture (abstraction), Molnar sorting taxonomy, dataflow scheduling under data amplification, tiled rendering for bandwidth-efficiency, deferred shading as a scheduling decision
Renderman Shading Language and Cg: contrasting two different levels of abstraction for shading languages, Slang
DXR ray tracing APIs, hardware acceleration of raytracing
How might we architects renderers differently to support the needs of training, rather than game engines?
|Apr 23||Burst Mode HDR Camera RAW Processing|
|May 5||Optimizing a Conv Layer in Halide (Making Students Appreciate cuBLAS)||Jun 9||Self-Selected Term Project|