Over the past few years, high-end microprocessor designs have added support for a version of SIMD (Single Instruction, Multiple Data) parallel execution that can improve specific multimedia operations, yet can be implemented without completely restructuring the microprocessor design. This version of SIMD, which we call SWAR (SIMD Within A Register), uses most of the existing datapaths of the microprocessor, but allows registers and datapaths to be logically partitioned into fields on which SIMD parallel operations can be performed.
AMD/Cyrix/Intel MMX, Sun SPARC V9 VIS, HP PA-RISC MAX, DEC Alpha MAX, SGI MIPS MDMX, and now Motorola PowerPC AltiVec are all SWAR extensions, but they are all different and somewhat quirky, and they were initially intended to be used only through hand-written assembly-level code. It is also possible to obtain speedup using software SWAR techniques on ordinary processors that have no such extensions. This talk discusses the design of a portable high-level programming model for SWAR and the techniques needed to compile programs written in such a language into efficient implementations on any of these types of processors.
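To make the idea of software SWAR on an ordinary processor concrete, the following is a minimal sketch in C, not code from the talk. It assumes a 32-bit word treated as four unsigned 8-bit fields, and the function name swar_add_u8x4 is hypothetical. The sketch adds the fields pairwise with ordinary integer instructions while keeping carries from crossing field boundaries.

```c
#include <stdint.h>

/* Hypothetical helper (illustration only): treat each 32-bit word as four
 * unsigned 8-bit fields and add them field by field, each sum wrapping
 * modulo 256, using only ordinary 32-bit integer operations. */
static uint32_t swar_add_u8x4(uint32_t x, uint32_t y)
{
    /* Mask selecting the most significant bit of every 8-bit field. */
    const uint32_t high = 0x80808080u;

    /* Add the low 7 bits of each field. Any carry out of bit 6 lands in
     * bit 7 of the same field, so it cannot spill into the next field. */
    uint32_t low_sum = (x & ~high) + (y & ~high);

    /* Fold in the top bit of each field with XOR, which adds it without
     * producing a carry out of the field. */
    return low_sum ^ ((x ^ y) & high);
}
```

For example, swar_add_u8x4(0x01FF7F10, 0x01018001) yields 0x0200FF11: the field 0xFF + 0x01 wraps to 0x00 without disturbing its neighbor. Masking and recombining costs a few extra instructions, which is roughly the overhead that hardware SWAR extensions avoid by partitioning the datapath directly.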