Previous | Next --- Slide 15 of 59

gohan2021

During lecture, there was a point that specialized processor has no threads whereas generalized processor has threads. In some sense, I think the parallelism from these specialized processor inherently comes from the Foreach and Reduce primitives provided by the hardware language.

What are ways these (par P) statements gets mapped to underlying hardware? Are they transformed into logic gates with inherent parallelism during instruction execution by the hardware?

shaan0924

Do these templates support both static and dynamic assignment? Upon first glance it seems like these are all static assignment operators so are there other ways we can call dynamic assignment>

fromscratch

Going on a limb here, but could one argue that the distinction between Sequential/Pipe/Stream is not very essential for the programmer? Would it be a decent default to just always go with Pipe or Stream: slight cost in hardware size and generally the best performance? Then this can be fine-tuned later if needed.

sirah

How is Reduce operation parallelized in Spatial implementation? Is it similar to the work-efficient Scan that was discussed a while back in Data-Parallel Thinking lecture, except that we return the last element, instead of the whole scanned array?

jchao01

@sirah my understanding is that it's parallelized through pipelining and essentially duplicating the hardware necessary for the algorithm/controller as in Slide 26

Please log in to leave a comment.