[ANN] Memory mapped files for POSIX and Windows

Hi all,

I just uploaded mmap to hackage:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/mmap

The package provides memory mapping abstractions for both POSIX and Windows systems like Vista, Linux and Mac OSX. This library provides a wrapper to mmap(2) or MapViewOfFile, allowing files or devices to be lazily loaded into memory as strict or lazy ByteStrings, ForeignPtrs or even plain Ptrs, using the virtual memory subsystem to do on-demand loading. Modifications are also supported.

Package includes a cp-like copy utility that uses different mechanisms to copy contents of file. Here come some statistics from using it on 90MB file:

Windows:
  Prelude copy:          10.453 sec
  ByteString copy:        0.625 sec
  ByteString.Lazy copy:   0.516 sec
  MMap copy:              0.281 sec
  MMap copy lazy:         0.250 sec

Linux:
  Prelude copy:           3.332 sec
  ByteString copy:        0.280 sec
  ByteString.Lazy copy:   0.292 sec
  MMap copy:              0.264 sec
  MMap copy lazy:         0.200 sec

Mac OSX ppc Tiger 10.4:
  Prelude copy:           5.719 sec
  ByteString copy:        0.701 sec
  ByteString.Lazy copy:   0.864 sec
  MMap copy:              1.073 sec
  MMap copy lazy:         1.414 sec

Hardware is different on those systems, so only relative comparison makes sense. Memory mapping provides significant advantages on Windows and Linux, not so on Mac OSX. Seems like mmap on Mac is not very well implemented.

Any feedback is welcome!

-- Gracjan
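As a rough illustration of what the "ByteString copy" and "ByteString.Lazy copy" rows in these timings might correspond to, here is a minimal sketch using only the bytestring package. The function names copyStrict and copyLazy are hypothetical and are not taken from the package's copy utility, whose source is not shown in this thread.

```haskell
import qualified Data.ByteString      as BS
import qualified Data.ByteString.Lazy as BL

-- | Strict copy (hypothetical name): reads the whole source file into a
--   single in-memory buffer, then writes that buffer to the destination.
copyStrict :: FilePath -> FilePath -> IO ()
copyStrict src dst = BS.readFile src >>= BS.writeFile dst

-- | Lazy copy (hypothetical name): the source is read chunk by chunk as
--   the writer demands it, so the whole file need not be resident at once.
copyLazy :: FilePath -> FilePath -> IO ()
copyLazy src dst = BL.readFile src >>= BL.writeFile dst
```

An mmap-based variant would presumably replace the read step with a mapping call from the announced package, letting the virtual memory subsystem page the data in on demand; the exact call is not given in this message, so it is left out here.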

On Tue, Mar 18, 2008 at 01:46:28PM +0000, Gracjan Polak wrote:
> I just uploaded mmap to hackage:
> http://hackage.haskell.org/cgi-bin/hackage-scripts/package/mmap
> The package provides memory mapping abstractions for both POSIX and Windows systems like Vista, Linux and Mac OSX. This library provides a wrapper to mmap(2) or MapViewOfFile, allowing files or devices to be lazily loaded into memory as strict or lazy ByteStrings, ForeignPtrs or even plain Ptrs, using the virtual memory subsystem to do on-demand loading. Modifications are also supported.
> Package includes a cp-like copy utility that uses different mechanisms to copy contents of file. Here come some statistics from using it on 90MB file:
Incidentally, you'll probably find that for large files, using mmap on Windows is a huge loser when compared with lazy bytestrings. As far as I understand, on Windows when a file is mmapped, its entire contents are immediately loaded into memory, so if you mmap a file that is larger than your memory, you pay a huge penalty on Windows.

I'm not sure what the target audience for this library is, but I'd be surprised if it turns out to be useful for much more than toy projects (or posix-only projects), just because the Windows file system semantics are so screwed up. :(

--
David Roundy
Department of Physics
Oregon State University

Hello David,

Tuesday, March 18, 2008, 10:15:48 PM, you wrote:

> Incidentally, you'll probably find that for large files, using mmap on Windows is a huge loser when compared with lazy bytestrings. As far as I understand, on Windows when a file is mmapped, its entire contents are immediately loaded into memory, so if you mmap a file that is larger than your memory, you pay a huge penalty on Windows.

it's definitely not true, at least for my own experiments. although i never tried to map entire file to memory, but used only small window into file. mmaping for windows and linux implemented in http://www.haskell.org/library/StreamsBeta.tar.gz where it used just to make i/o faster. at least i tried but found undesirable effects, but not one you have mentioned

--
Best regards,
Bulat
mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin writes:
> into file. mmaping for windows and linux implemented in http://www.haskell.org/library/StreamsBeta.tar.gz where it used just to make i/o faster. at least i tried but found undesirable effects, but not one you have mentioned

I did not know about your package. Surely will look into it.

-- Gracjan

On Tue, Mar 18, 2008 at 11:03:23PM +0300, Bulat Ziganshin wrote:
> Hello David,
> Tuesday, March 18, 2008, 10:15:48 PM, you wrote:
>> Incidentally, you'll probably find that for large files, using mmap on Windows is a huge loser when compared with lazy bytestrings. As far as I understand, on Windows when a file is mmapped, its entire contents are immediately loaded into memory, so if you mmap a file that is larger than your memory, you pay a huge penalty on Windows.
> it's definitely not true, at least for my own experiments. although i never tried to map entire file to memory, but used only small window into file. mmaping for windows and linux implemented in http://www.haskell.org/library/StreamsBeta.tar.gz where it used just to make i/o faster. at least i tried but found undesirable effects, but not one you have mentioned

I've never tried this (since I don't have windows), but it's what I've been told. And no, you wouldn't have noticed unless you tried to mmap a large file in its entirety.

It's a major difference in behavior, though, between posix and windows mmap. The former behaves essentially like a faster, better version of lazy IO (provided you don't try to modify the file in-place) that is a major improvement when under tight memory pressure (because it's file-backed memory). The latter is quite the opposite.

--
David Roundy
Department of Physics
Oregon State University

gracjanpolak:
> Hi all,
> I just uploaded mmap to hackage:
> http://hackage.haskell.org/cgi-bin/hackage-scripts/package/mmap
> The package provides memory mapping abstractions for both POSIX and Windows systems like Vista, Linux and Mac OSX. This library provides a wrapper to mmap(2) or MapViewOfFile, allowing files or devices to be lazily loaded into memory as strict or lazy ByteStrings, ForeignPtrs or even plain Ptrs, using the virtual memory subsystem to do on-demand loading. Modifications are also supported.

Cool. I spent a fair bit of time last year working on getting good performance on unix out of lazy bytestrings with individually mapped chunks.

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-mmap-0...

Can you talk about the relationship between these two packages?

-- Don

Don Stewart writes:
> Can you talk about the relationship between these two packages?

I looked at your package very carefully and they are... different :)

I tried to provide a common API for both POSIX and Windows, so it is the lowest common denominator approach.

Your API is read-only, mine provides both reading and writing to/from Ptrs or ForeignPtrs. ByteStrings are layered on top of this.

My lazy version can mmap files larger than available address space, yours cannot.

Target audience is of course myself and a few others that read files about 100MB in size semi-randomly.

-- Gracjan
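The message above does not say how the lazy version copes with files larger than the address space. One plausible reading, offered only as an assumption, is that the file is visited in bounded windows so that just one window needs address space at a time. The sketch below shows that idea with two hypothetical helpers, chunkRanges and foldWindows; plain hSeek/hGet stands in for the actual mapping call, which is not described in this thread.

```haskell
import Control.Monad (foldM)
import Data.Int (Int64)
import qualified Data.ByteString as BS
import System.IO

-- | Split a file of 'total' bytes into (offset, length) windows of at most
--   'chunk' bytes each. Hypothetical helper, not part of any package.
chunkRanges :: Int64 -> Int64 -> [(Int64, Int64)]
chunkRanges chunk total
  | chunk <= 0 = error "chunkRanges: chunk size must be positive"
  | otherwise  =
      [ (off, min chunk (total - off)) | off <- [0, chunk .. total - 1] ]

-- | Fold over a file one window at a time, so that only about 'chunk'
--   bytes are held at once. Assumption: hSeek/hGet is a stand-in for
--   mapping the window; a real version would map and later unmap it.
foldWindows :: Int64 -> (a -> BS.ByteString -> a) -> a -> FilePath -> IO a
foldWindows chunk step z path =
  withBinaryFile path ReadMode $ \h -> do
    total <- hFileSize h
    let go acc (off, len) = do
          hSeek h AbsoluteSeek (fromIntegral off)
          bytes <- BS.hGet h (fromIntegral len)
          return $! step acc bytes
    foldM go z (chunkRanges chunk (fromIntegral total))
```

With real mappings the window offsets must also be aligned: mmap(2) requires a multiple of the page size, and MapViewOfFile requires a multiple of the system allocation granularity, so the chunk size would have to be chosen accordingly.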

gracjanpolak:
> Don Stewart writes:
>> Can you talk about the relationship between these two packages?
> I looked at your package very carefully and they are... different :)
> I tried to provide common API for both POSIX and Windows, so it is the lowest common denominator approach.
> Your API is read-only, mine provides both reading and writing to/from Ptrs or ForeignPtrs. ByteStrings are layered on top of this.
> My lazy version can mmap files larger than available address space, yours cannot.
> Target audience is of course myself and few others that read files about 100MB in size semi-randomly.

Thanks for the info!

Do you think we should merge these packages? Or are they really for different use cases -- I wouldn't know when to use one over the other, currently.

Do your lazy bytestrings support unmapping individual chunks? How do you support files larger than the address space?

-- Don