Strongly Specify Alignment for FFI Allocation

Aside from section 5.7 (Storable) and the comments on 'alignPtr', the only mention of alignment in the FFI addendum is on mallocBytes/allocaBytes:

"The block of memory is sufficiently aligned for any of the basic foreign types (see Section 3.2) that fits into a memory block of the allocated size"

It would be beneficial if this wording were applied to all allocation routines, such as mallocForeignPtrBytes, mallocForeignPtrArray, etc. For the curious, this proposal was born from the real-world issue of pulling Word32s from a ByteString in an efficient but portable manner (binary is portable but inefficient; a straightforward unsafePerformIO/peek is efficient but needs alignment).

If no glaring issue comes up then I'll formalize / make a ticket,
Thomas
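The trade-off described above can be sketched as follows. This is only an illustration, not code from the binary or bytestring packages: the names isAlignedFor32, word32beBytes and peekWord32HostAligned are hypothetical, and the direct read returns the word in host byte order, so it is not a drop-in replacement for the portable path.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)
import Foreign.Ptr (Ptr, plusPtr, ptrToIntPtr)
import Foreign.Storable (alignment, peek)

-- | True when the address meets the alignment Storable reports for Word32.
-- (Hypothetical helper, not part of any library.)
isAlignedFor32 :: Ptr a -> Bool
isAlignedFor32 p =
  ptrToIntPtr p `rem` fromIntegral (alignment (0 :: Word32)) == 0

-- | Portable path: four byte reads assembled as a big-endian word.
-- Works at any offset, at the cost of the bit twiddling.
word32beBytes :: B.ByteString -> Int -> Word32
word32beBytes s i =
      (fromIntegral (s `B.index` i)       `shiftL` 24)
  .|. (fromIntegral (s `B.index` (i + 1)) `shiftL` 16)
  .|. (fromIntegral (s `B.index` (i + 2)) `shiftL` 8)
  .|.  fromIntegral (s `B.index` (i + 3))

-- | Direct path: one host-order peek, attempted only when the address is
-- suitably aligned. Returns Nothing when out of bounds or misaligned.
peekWord32HostAligned :: B.ByteString -> Int -> IO (Maybe Word32)
peekWord32HostAligned s i
  | i < 0 || i + 4 > B.length s = return Nothing
  | otherwise = BU.unsafeUseAsCString s $ \base -> do
      let p = base `plusPtr` i :: Ptr Word32
      if isAlignedFor32 p
        then Just <$> peek p
        else return Nothing
```

Whether the direct path succeeds at offset 0 depends on how the underlying buffer was allocated, which is exactly the guarantee the proposal asks the addendum to spell out.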

thomas.dubuisson:
> Aside from section 5.7 (Storable) and the comments on 'alignPtr', the only mention of alignment in the FFI addendum is on mallocBytes/allocaBytes:
> "The block of memory is sufficiently aligned for any of the basic foreign types (see Section 3.2) that fits into a memory block of the allocated size"
> It would be beneficial if this wording were applied to all allocation routines, such as mallocForeignPtrBytes, mallocForeignPtrArray, etc. For the curious, this proposal was born from the real-world issue of pulling Word32s from a ByteString in an efficient but portable manner (binary is portable but inefficient; a straightforward unsafePerformIO/peek is efficient but needs alignment).
As a side issue, the get/put primitives on Data.Binary should be efficient (though they're about twice as fast when specialized to a strict bytestring... stay tuned for a package in this area).
-- Don

On Thu, 2009-09-24 at 23:13 +0100, Don Stewart wrote:
>> It would be beneficial if this wording were applied to all allocation routines, such as mallocForeignPtrBytes, mallocForeignPtrArray, etc. For the curious, this proposal was born from the real-world issue of pulling Word32s from a ByteString in an efficient but portable manner (binary is portable but inefficient; a straightforward unsafePerformIO/peek is efficient but needs alignment).
> As a side issue, the get/put primitives on Data.Binary should be efficient (though they're about twice as fast when specialized to a strict bytestring... stay tuned for a package in this area).
They are efficient within the constraint of doing byte reads and reconstructing a multi-byte word using bit twiddling, e.g.:

    getWord16be :: Get Word16
    getWord16be = do
        s <- readN 2 id
        return $! (fromIntegral (s `B.index` 0) `shiftl_w16` 8) .|.
                  (fromIntegral (s `B.index` 1))

Reading an aligned word directly, by contrast, is rather faster. The problem is that the binary API cannot guarantee alignment, so we have to be pessimistic. We could do better on machines that are tolerant of misaligned memory accesses, such as x86. We'd need to use cpp to switch between two implementations depending on whether the arch supports misaligned memory access and whether it's big or little endian.

    #ifdef ARCH_ALLOWS_MISALIGNED_MEMORY_ACCESS
    #ifdef ARCH_LITTLE_ENDIAN
    getWord32le = getWord32host
    #else
    getWord32le = ...
    #endif
    etc

Note also that currently the host order binary ops are not documented as requiring alignment, but they do. They will fail, e.g. on sparc or ppc, for misaligned access.
Duncan
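The endianness half of that switch can also be sketched without cpp. The following is an assumption-laden illustration, not the binary package's implementation: it uses targetByteOrder from GHC.ByteOrder (GHC-specific, available in recent versions of base) and byteSwap32 from Data.Word, and the names word32FromLE and word32FromBE are hypothetical. It says nothing about whether a misaligned host-order read is permitted; that part of the switch would still need a per-architecture flag.

```haskell
import Data.Word (Word32, byteSwap32)
import GHC.ByteOrder (ByteOrder (..), targetByteOrder)

-- | Reinterpret a word that was read in host order as little-endian data.
-- On a little-endian host this is the identity; otherwise swap the bytes.
word32FromLE :: Word32 -> Word32
word32FromLE w = case targetByteOrder of
  LittleEndian -> w
  BigEndian    -> byteSwap32 w

-- | Reinterpret a word that was read in host order as big-endian data.
word32FromBE :: Word32 -> Word32
word32FromBE w = case targetByteOrder of
  BigEndian    -> w
  LittleEndian -> byteSwap32 w
```

Because targetByteOrder is a constant for a given build, the case expression can be resolved at compile time, which gives roughly the effect of the #ifdef on ARCH_LITTLE_ENDIAN above.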

On Sep 25, 2009, at 07:54 , Duncan Coutts wrote:
> pessimistic. We could do better on machines that are tolerant of misaligned memory accesses such as x86. We'd need to use cpp to switch
Hm. I thought x86 could be tolerant (depending on a cpu configuration bit) but the result was so slow that it wasn't worth it?
--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

On Sat, 2009-09-26 at 04:20 +0100, Brandon S. Allbery KF8NH wrote:
> On Sep 25, 2009, at 07:54 , Duncan Coutts wrote:
>> pessimistic. We could do better on machines that are tolerant of misaligned memory accesses such as x86. We'd need to use cpp to switch
> Hm. I thought x86 could be tolerant (depending on a cpu configuration bit) but the result was so slow that it wasn't worth it?
It's slow and you would not want to do it much; however, I think it's still comparable in speed to doing a series of byte reads/writes and using bit twiddling to convert to/from the larger word type. It's probably also sometimes faster to do an unaligned operation than to do an alignment test each time and call a special unaligned version.
Duncan

Thomas DuBuisson:
> Aside from section 5.7 (Storable) and the comments on 'alignPtr', the only mention of alignment in the FFI addendum is on mallocBytes/allocaBytes:
> "The block of memory is sufficiently aligned for any of the basic foreign types (see Section 3.2) that fits into a memory block of the allocated size"
> It would be beneficial if this wording were applied to all allocation routines, such as mallocForeignPtrBytes, mallocForeignPtrArray, etc. For the curious, this proposal was born from the real-world issue of pulling Word32s from a ByteString in an efficient but portable manner (binary is portable but inefficient; a straightforward unsafePerformIO/peek is efficient but needs alignment).
I agree that we should be more precise here.
> If no glaring issue comes up then I'll formalize / make a ticket,
Can you please summarise the exact additions that you would like to see as a follow-up email? I will collect all changes that we want to make to the existing FFI Addendum before it goes into the 2009 issue of Haskell'.
Cheers,
Manuel

Manuel,
The verbiage here is borrowed from other parts of the report under the
belief it helps readability.
Option 1: A global note applying to all functions that allocate
memory. Proposed wording for section 5.0:
"All successful results of allocation routines will be sufficiently
aligned for any of the basic foreign types (see Section 3.2) to fit
into a memory block of the allocated size. Array allocation routines
will result in the same alignment as the member elements."
I believe this to be quite clear but if people here are unsatisfied
then a list of example functions could be added at the risk of being
tacky. Or we could avoid specifying functions by mentioning
categories of functions at the risk of muddying the waters:
"This alignment guarantee applies equally to computation-bound
allocations, ForeignPtr references, and Ptr references."
Option 2: Individual notes for all allocation routines
Add to both mallocForeignPtr and mallocForeignPtrBytes:
"The same alignment constraint as for mallocBytes holds."
Add to both mallocForeignPtrArray and mallocForeignPtrArray0:
"The resulting pointer will be suitably aligned to hold values of type b."
Many other functions aren't so deficient as to require extra words -
they contain references to functions that already have alignment
comments, "These functions behave like x, y, z". Still, this hazy
area is why I prefer option 1.
If people actually prefer option 2 then I'll do a more thorough vetting
of the function descriptions and likely come up with a larger list of
alterations.
Cheers,
Thomas
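Until the wording is settled, code that cannot rely on mallocForeignPtrBytes being suitably aligned can over-allocate and align by hand. A minimal sketch of that workaround follows; withAlignedForeignBytes and roundTrip are hypothetical names introduced here, not part of the addendum or of base, and the helper assumes the requested alignment is at least 1.

```haskell
import Data.Word (Word8, Word32)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr, alignPtr, castPtr, minusPtr, nullPtr)
import Foreign.Storable (alignment, peek, poke)

-- | Run an action on an n-byte region aligned to 'align' bytes.
-- Over-allocates by (align - 1) bytes so that alignPtr can always round
-- up to an aligned address inside the block, whatever the allocator did.
withAlignedForeignBytes :: Int -> Int -> (Ptr a -> IO b) -> IO b
withAlignedForeignBytes n align act = do
  fp <- mallocForeignPtrBytes (n + align - 1) :: IO (ForeignPtr Word8)
  withForeignPtr fp $ \raw -> act (castPtr (alignPtr raw align))

-- | Store and reload a Word32 through the aligned region, also reporting
-- whether the address really is a multiple of the type's alignment.
roundTrip :: Word32 -> IO (Word32, Bool)
roundTrip w =
  withAlignedForeignBytes 4 a $ \p -> do
    poke p w
    v <- peek p
    return (v, (p `minusPtr` nullPtr) `rem` a == 0)
  where
    a = alignment w
```

With either option adopted in the addendum, the extra bytes and the alignPtr call would be unnecessary for the basic foreign types.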

What about something like
alloca $ \ptr -> ... where ptr :: Ptr Word8.
Wouldn't ptr only need byte alignment? If the implementation allocates such values on the stack then it may be hard to guarantee otherwise. So, I think the extra verbiage should only apply to routines that allocate based on a size in bytes, rather than when allocating a specific type like with the plain alloca and malloc routines.
John
--
John Meacham - ⑆repetae.net⑆john⑈ - http://notanumber.net/
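The distinction between typed and size-based allocation can be sketched with Storable's own alignment information. This is an illustration under assumptions: allocaTyped and misalignment are hypothetical names, and the sketch relies on allocaBytesAligned from Foreign.Marshal.Alloc in base. A typed allocator only needs the alignment its element type asks for, which for Word8 is a single byte.

```haskell
import Data.Word (Word8, Word32)
import Foreign.Marshal.Alloc (allocaBytesAligned)
import Foreign.Ptr (Ptr, minusPtr, nullPtr)
import Foreign.Storable (Storable, alignment, peek, poke, sizeOf)

-- | Typed temporary allocation: requests exactly the size and alignment
-- that the element type's Storable instance reports, and nothing more.
allocaTyped :: Storable a => a -> (Ptr a -> IO b) -> IO b
allocaTyped proto = allocaBytesAligned (sizeOf proto) (alignment proto)

-- | Distance of a pointer from its type's alignment boundary (0 = aligned).
misalignment :: Storable a => a -> Ptr a -> Int
misalignment proto p = (p `minusPtr` nullPtr) `rem` alignment proto

-- | Alignment that a Ptr Word8 allocation actually needs.
byteAlignment :: Int
byteAlignment = alignment (0 :: Word8)

-- | Store and reload a Word32 in a typed allocation, reporting misalignment.
demo :: IO (Word32, Int)
demo = allocaTyped (0 :: Word32) $ \p -> do
  poke p 7
  v <- peek p
  return (v, misalignment v p)
```

A size-based routine such as allocaBytes has no element type to consult, which is why the stronger "any basic foreign type that fits" wording is attached to that family.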

John Meacham wrote:
> What about something like
> alloca $ \ptr -> ... where ptr :: Ptr Word8.
> Wouldn't ptr only need byte alignment? If the implementation allocates such values on the stack then it may be hard to guarantee otherwise.
> So, I think the extra verbiage should only apply to routines that allocate based on a size in bytes, rather than when allocating a specific type like with the plain alloca and malloc routines.
I had a knee-jerk reaction against this comment, but the more I reflect the more I agree it's probably the right way to go.
Thomas