
Visual computing tasks such as computational imaging, image/video understanding, and real-time 3D graphics are key responsibilities of modern computer systems, ranging from sensor-rich smartphones and autonomous robots to large datacenters. These workloads demand exceptional system efficiency, and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. This course is intended for systems students interested in architecting efficient graphics, image processing, and computer vision platforms (both new hardware architectures and domain-optimized programming frameworks for these platforms) and for graphics, vision, and machine learning students who wish to understand throughput computing principles in order to design new algorithms that map efficiently to these machines.
Sep 25 |
How superscalar, multi-core, SIMD, and hardware multi-threading are used in CPUs and
GPUs; understanding latency and bandwidth constraints
|
Sep 27 |
Algorithms for taking raw sensor pixels to an RGB image: demosaicing, sharpening,
correcting lens aberrations, multi-shot alignment/merging, image filtering
|
Oct 2 |
Multi-scale processing with Gaussian and Laplacian pyramids, HDR/local tone mapping,
portrait mode in the Pixel 2 camera
|
Oct 4 |
Balancing locality, parallelism, and work, fusion and tiling,
design of the Halide domain-specific language, automatically scheduling image processing pipelines
|
Oct 9 |
Benefits of fixed-function processing, comparing GPUs, DSPs, Image Signal Processors,
and FPGAs for image processing,
domain-specific languages for hardware synthesis such as
Darkroom/Rigel, compiling Halide to hardware
|
Oct 11 |
JPG compression, H.264 video representation/encoding, parallel encoding,
motivations for ASIC acceleration,
emerging opportunities for compression when machines,
not humans, observe most images
|
Oct 16 |
Light field representation, light-field cameras, computational challenges
of synthesizing video streams for VR output, Google's Jump VR pipeline
|
Oct 18 |
Popular DNN trunks and topologies, design of MobileNet,
challenges of direct implementation, where the compute lies in modern networks,
DNN pruning, neural architecture search
|
Oct 23 |
Footprint challenges of training, model vs. data parallelism,
asynchronous vs. synchronous training debate, parameter server designs,
key optimizations for parallel training
|
Oct 25 |
GPUs, Google TPU, special instructions for DNN evaluation,
choice of precision in arithmetic, recent ISCA/MICRO papers on DNN acceleration
|
Oct 30 |
Neural module networks, discussion on value of modularity vs. end-to-end learning
|
Nov 1 |
Exploiting temporal coherence in video, pipelined networks,
specialization to scene and camera viewpoint,
sharing computations across applications and users
|
Nov 6 |
Facebook SVE/Lumos, Scanner, processing as a service
|
Nov 8 |
3D graphics pipeline as a machine architecture (abstraction),
pipeline semantics/functionality, contrasting graphics pipeline architecture with
compute-mode GPU architecture
|
Nov 13 |
Texture sampling basics, hardware texture compression, depth-and-color buffer compression,
motivations for hardware multi-threading for latency hiding in modern GPUs
|
Nov 15 |
Molnar sorting taxonomy, dataflow scheduling under data amplification,
tiled rendering for bandwidth-efficiency, deferred shading as a scheduling decision
|
Nov 27 |
Guest Lecture: Bill Mark (Google)
Topic: specialized hardware for deep learning and computational photography at Google
|
Nov 29 |
Domain-Specific Languages for Shading (with Tim Foley, NVIDIA)
RenderMan Shading Language and Cg: contrasting two different levels of
abstraction for shading languages, Slang
|
Dec 4 |
Mapping shaders to GPUs, design of platforms for ML computations, rendering concerns of VR
|
Dec 6 |
How deep learning and hardware specialization stand to make real-time ray tracing feasible
|
optional | Optional Assignment 0: Analyzing Parallel Program Performance on a Quad-Core CPU. All CS348K students are encouraged to attempt this assignment during or before the first week of the course to check their background in parallel systems. |
Oct 22 | Assignment 1: Burst Mode HDR Camera RAW Processing for the kPhone 348 |
optional | Optional Assignment 2: Implementing a Separable Conv Layer in Halide |
Dec 11 | Final Project Guidelines: students will complete a substantial term project on a course-relevant topic of their choosing. |