Slide 75 of 116

ayushaga

I believe we need 3 threads in this case for full (100%) utilization, since (12 + 6) / 6 = 3: while one thread waits on its 12-cycle load, the other two threads cover that time with 6 cycles of math each. With a higher ratio of math to memory latency, we need fewer threads to hide latency, because each thread spends more of its time doing useful work on the ALUs and less time waiting on the memory bus for a load to complete.
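Here's a quick sketch to sanity-check the arithmetic. It assumes (my reading of the numbers, not something stated on the slide) that each thread repeatedly does 6 cycles of ALU work and then stalls 12 cycles on a load, and that the core can switch to any ready thread at zero cost. `threads_needed` and `utilization` are hypothetical helpers I made up for illustration: the first applies T = ceil((math + latency) / math), and the second simulates one ALU interleaving the threads and reports the fraction of busy cycles.

```python
import math

def threads_needed(math_cycles: int, latency_cycles: int) -> int:
    # Assumed model: each thread alternates math_cycles of ALU work with a
    # latency_cycles stall. While one thread stalls, the other T - 1 threads
    # must supply at least latency_cycles of math:
    #   (T - 1) * math_cycles >= latency_cycles
    return math.ceil((math_cycles + latency_cycles) / math_cycles)

def utilization(num_threads: int, math_cycles: int, latency_cycles: int,
                total_cycles: int = 10_000) -> float:
    # Simulate a single ALU with free thread switching; return busy fraction.
    ready_at = [0] * num_threads  # cycle at which each thread can run again
    busy = 0
    cycle = 0
    while cycle < total_cycles:
        ready = [t for t in range(num_threads) if ready_at[t] <= cycle]
        if not ready:
            cycle += 1  # every thread is stalled on memory: ALU idles
            continue
        t = ready[0]
        run = min(math_cycles, total_cycles - cycle)  # math phase
        busy += run
        cycle += run
        ready_at[t] = cycle + latency_cycles  # then the load stall begins
    return busy / total_cycles

print(threads_needed(6, 12))   # 3 threads for the slide's numbers
print(utilization(3, 6, 12))   # fully busy with 3 threads
print(utilization(2, 6, 12))   # about 2/3 busy with only 2 threads
print(threads_needed(18, 12))  # more math per load -> fewer threads (2)
```

The last line shows the second point of the comment: tripling the math per load to 18 cycles drops the requirement from 3 threads to 2.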
