
On Thu, 2009-09-24 at 23:13 +0100, Don Stewart wrote:
It would be beneficial if this wording was applied to all allocation routines - such as mallocForeignPtrBytes, mallocForeignPtrArray, etc. For the curious, this proposal was born from the real-world issue of pulling Word32's from a ByteString in an efficient but portable manner (binary is portable but inefficient, a straight forward unsafePerformIO/peek is efficient but need alignment).
As a side issue, the get/put primitives on Data.Binary should be efficient (though they're about twice as fast when specialized to a strict bytestring... stay tuned for a package in this area).
They are efficient within the constraint of doing byte reads and reconstructing a multi-byte word using bit twiddling. eg: getWord16be :: Get Word16 getWord16be = do s <- readN 2 id return $! (fromIntegral (s `B.index` 0) `shiftl_w16` 8) .|. (fromIntegral (s `B.index` 1)) Where as reading an aligned word directly is rather faster. The problem is that the binary API cannot guarantee alignment so we have to be pessimistic. We could do better on machines that are tolerant of misaligned memory accesses such as x86. We'd need to use cpp to switch between two implementations depending on if the arch supports misaligned memory access and if it's big or little endian. #ifdef ARCH_ALLOWS_MISALIGNED_MEMORY_ACCESS #ifdef ARCH_LITTLE_ENDIAN getWord32le = getWord32host #else getWord32le = ... #endif etc Note also that currently the host order binary ops are not documented as requiring alignment, but they do. They will fail eg on sparc or ppc for misaligned access. Duncan