mapping large structures into memory

I've dabbled in haskell, but am by no means an expert. I was hoping someone here could help me settle this debate so that we can more seriously consider haskell for a next version of an application we're building....

I would like to understand better what its capabilities are for directly mapping and managing memory. For instance, I would like to mmap many large files into memory and mutate their internals directly... without needing to reallocate them (or chunks of them) in the haskell heap, and without resorting to a byte-array and byte-offset representation. Furthermore, I might also like to map intrinsic haskell data structures into this mmap'd memory such that standard library functions can manipulate them (perhaps in a purely functional way, e.g. treating them as haskell arrays of smaller foreign structures).

I understand that the foreign function interface has the ability to marshall/unmarshall C structs, but I'm unsure of the memory implications of using this mechanism. Our application has a very large footprint, and reallocating some or all of these mapped files is a non-starter.

Thanks,
Warren

warrensomebody:
I've dabbled in haskell, but am by no means an expert. I was hoping someone here could help me settle this debate so that we can more seriously consider haskell for a next version of an application we're building....
I would like to understand better what its capabilities are for directly mapping and managing memory. For instance, I would like to mmap many large files into memory and mutate their internals directly... without needing to reallocate them (or chunks of them) in the haskell heap, and without resorting to a byte-array and byte-offset representation. Furthermore, I might also like to map intrinsic haskell data structures into this mmap'd memory such that standard library functions can manipulate them (perhaps in a purely functional way, e.g. treating them as haskell arrays of smaller foreign structures).
I understand that the foreign function interface has the ability to marshall/unmarshall C structs, but I'm unsure of the memory implications of using this mechanism. Our application has a very large footprint, and reallocating some or all of these mapped files is a non-starter. Thanks,
It is entirely possible to use mmap to map structures into memory. Thanks to the foreign function interface, there are well-defined semantics for calling to and from C.
The key questions would be:
* what is the type and representation of the data you wish to map
* what operations on them
-- Don

On Sep 25, 2009, at 12:14 PM, Don Stewart wrote:
It is entirely possible to use mmap to map structures into memory. Thanks to the foreign function interface, there are well-defined semantics for calling to and from C.
The key questions would be:
* what is the type and representation of the data you wish to map
* what operations on them
Right... my question relates more to how well the intrinsic type system integrates with foreign/mapped structures. For instance, I wouldn't want to create my own foreign arrays, and have to replicate all sorts of library code that only works on haskell's intrinsic arrays.
I'm assuming here that all this mapped data is self-contained, and doesn't point to heap-allocated structures, although that's a related question -- is it possible to inform the gc about heap pointers stored (temporarily) in these structures (and later identify them in order to swizzle them out when flushing the mapped file to disk).
Warren

warrensomebody:
On Sep 25, 2009, at 12:14 PM, Don Stewart wrote:
It is entirely possible to use mmap to map structures into memory. Thanks to the foreign function interface, there are well-defined semantics for calling to and from C.
The key questions would be:
* what is the type and representation of the data you wish to map
* what operations on them
Right... my question relates more to how well the intrinsic type system integrates with foreign/mapped structures. For instance, I wouldn't want to create my own foreign arrays, and have to replicate all sorts of library code that only works on haskell's intrinsic arrays.
Well, nothing is really 'intrinsic', but the fundamental distinction is between unpinned, GC-managed memory and pinned memory. The 'array' package illustrates GC-managed memory, while Data.ByteString and the 'carray' and 'hmatrix' libraries illustrate pinned memory that can be manipulated with foreign operations. For your mmapped data, you'll need to assign (coerce) the pointers to that data to a type that describes pinned memory.
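As a rough illustration of giving raw memory a typed view, the sketch below defines a Storable instance for a small fixed-layout record and updates one element in place through a typed pointer. The record type `Entry`, its 8-byte layout, and the helper names are assumptions made up for this example, and a temporary buffer from `allocaBytes` stands in for the mmap'd region, so this is not a description of any particular mmap binding.

```haskell
import Data.Int (Int32)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable (..))

-- Hypothetical fixed-layout record, as it might be stored in a mapped file:
-- two 32-bit integers, 8 bytes in total.
data Entry = Entry { entryKey :: Int32, entryValue :: Int32 }
  deriving (Eq, Show)

instance Storable Entry where
  sizeOf _ = 8
  alignment _ = 4
  peek p = Entry <$> peekByteOff p 0 <*> peekByteOff p 4
  poke p (Entry k v) = pokeByteOff p 0 k >> pokeByteOff p 4 v

-- Increment the value field of the i-th record in place. Only that one
-- record is read into the Haskell heap and written back.
bumpValue :: Ptr Entry -> Int -> IO ()
bumpValue base i = do
  e <- peekElemOff base i
  pokeElemOff base i (e { entryValue = entryValue e + 1 })

-- Stand-in for a mapped region (assumption): a temporary buffer holding
-- three records, filled, updated in place, and read back.
demo :: IO [Entry]
demo = allocaBytes (3 * sizeOf (undefined :: Entry)) $ \base -> do
  mapM_ (\i -> pokeElemOff base i (Entry (fromIntegral i) 10)) [0 .. 2]
  bumpValue base 1
  mapM (peekElemOff base) [0 .. 2]
```

With a real mapping, the pointer returned by mmap would take the place of `base`; the peek and poke calls would then read and write the mapped pages directly.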
I'm assuming here that all this mapped data is self-contained, and doesn't point to heap-allocated structures, although that's a related question -- is it possible to inform the gc about heap pointers stored (temporarily) in these structures (and later identify them in order to swizzle them out when flushing the mapped file to disk).
You can associate a ForeignPtr with mmapped data, and have the GC unmap the data for you once references go out of scope.
Simple example:
- Data.ByteString: a fast Haskell type that can be allocated and manipulated by C or Haskell.
-- Don
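A minimal sketch of the ForeignPtr idea, using only base and bytestring. The helpers `acquireRegion` and `releaseRegion` are hypothetical stand-ins for mmap and munmap (here they just call `mallocBytes` and `free`), and the IORef flag exists only so the example can observe that the finalizer ran. `Foreign.Concurrent.newForeignPtr` attaches a Haskell finalizer to a raw pointer, and `unsafePackCStringFinalizer` wraps a raw buffer as a ByteString without copying it.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafePackCStringFinalizer)
import Data.IORef (IORef, atomicWriteIORef, newIORef, readIORef)
import Data.Word (Word8)
import qualified Foreign.Concurrent as FC
import Foreign.ForeignPtr (ForeignPtr, finalizeForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Marshal.Utils (fillBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Stand-in for mmap (assumption): n zeroed bytes outside the Haskell heap.
acquireRegion :: Int -> IO (Ptr Word8)
acquireRegion n = do
  p <- mallocBytes n
  fillBytes p 0 n
  pure p

-- Stand-in for munmap (assumption): free the buffer and record that we did.
releaseRegion :: IORef Bool -> Ptr Word8 -> IO ()
releaseRegion released p = free p >> atomicWriteIORef released True

-- Tie the region's lifetime to a ForeignPtr; the finalizer runs once the
-- ForeignPtr is unreachable, or earlier if finalized explicitly.
managedRegion :: IORef Bool -> Int -> IO (ForeignPtr Word8)
managedRegion released n = do
  p <- acquireRegion n
  FC.newForeignPtr p (releaseRegion released p)

-- View a region as a ByteString without copying it into the Haskell heap.
regionAsByteString :: IORef Bool -> Int -> IO B.ByteString
regionAsByteString released n = do
  p <- acquireRegion n
  unsafePackCStringFinalizer p n (releaseRegion released p)

-- Mutate one byte in place, read it back, then force the finalizer.
demo :: IO (Word8, Bool)
demo = do
  released <- newIORef False
  fp <- managedRegion released 16
  b <- withForeignPtr fp $ \p -> do
    pokeByteOff p 3 (42 :: Word8)
    peekByteOff p 3
  finalizeForeignPtr fp -- run the finalizer now rather than waiting for GC
  done <- readIORef released
  pure (b, done)
```

One caveat on the ByteString view: ByteString operations assume the underlying bytes never change, so mutating the region while such a value is live can produce inconsistent results. For in-place mutation, working through the ForeignPtr directly is the safer choice.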
participants (2)
- Don Stewart
- Warren Harris