This is probably specific to whichever set of intrinsics you are using, but can SIMD loads and stores only read from or write to a contiguous block of memory?
tim
How often do people actually use SIMD explicitly? Unrolling loops into blocks of 8 iterations sounds cumbersome.
abraoliv
How does one write portable code that supports multiple architectures? Do I need to rewrite my loops from blocks of 8 into blocks of 4 for ARM processors and maintain two separate versions?