ByteString.getContents fails for files >2GB on OS X

Hi,
Data.ByteString.Char8.getContents fails for files >2GB on OS X. Is
there a fix for this?
$ cat getContents.hs
import qualified Data.ByteString.Char8 as B
main = B.getContents
$ ./getContents

Do you have a 32-bit or a 64-bit GHC build? That might have something
to do with it if you're nearing 2^32 (or 2^31) bytes.
Erik
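
One way to check this from inside a program is to print the width of Int, since the length of a strict ByteString is an Int. This is only a minimal sketch: it assumes a GHC recent enough to ship finiteBitSize in Data.Bits (base 4.7 or later), and fitsInInt is a hypothetical helper written for illustration, not a library function.

```haskell
import Data.Bits (finiteBitSize)

-- Hypothetical helper: does a byte count fit in this build's native Int?
fitsInInt :: Integer -> Bool
fitsInInt n = n <= toInteger (maxBound :: Int)

main :: IO ()
main = do
  -- 32 on a 32-bit GHC build, 64 on a 64-bit one.
  putStrLn ("Int width: " ++ show (finiteBitSize (0 :: Int)) ++ " bits")
  putStrLn ("2^31 bytes fit in Int: " ++ show (fitsInInt (2 ^ 31)))
  putStrLn ("2^32 bytes fit in Int: " ++ show (fitsInInt (2 ^ 32)))
```

If both sizes fit, the width of Int is unlikely to be what limits a 2 GB read.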
On Fri, Jun 8, 2012 at 2:25 AM, Shaun Jackman wrote:
Hi,
Data.ByteString.Char8.getContents fails for files >2GB on OS X. Is there a fix for this?
$ cat getContents.hs
import qualified Data.ByteString.Char8 as B
main = B.getContents
$ ./getContents
Mac OS X 10.7.4 64-bit
As a workaround, I used ByteString.Lazy instead of the strict ByteString. That worked, but it was ~4 times slower for my program, so I'd like to get the strict ByteString working with large files.
Cheers, Shaun
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Isn't it more likely to be due to the garbage collector's (copying) strategy?
--Serge

Hi Erik, Serge,
I have a 64-bit build of GHC:
http://www.haskell.org/ghc/dist/7.4.1/ghc-7.4.1-x86_64-apple-darwin.tar.bz2
I think it's fundamentally an OS X issue. The system call read(2)
fails for reads >2 GB with EINVAL, even though I have a 64-bit OS X
kernel. GHC would need to hack around this issue.
Cheers,
Shaun
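
If the failure really does come from a single oversized read(2) request, one possible workaround is to read the input in bounded chunks and concatenate them at the end. The following is only a sketch under that assumption, not what GHC or the bytestring library does internally. hGetContentsChunked is a hypothetical name and the 64 MB chunk size is arbitrary.

```haskell
import qualified Data.ByteString as B
import System.IO

-- Hypothetical helper: read a handle to EOF, asking for at most
-- chunkSize bytes per call. chunkSize must be positive.
hGetContentsChunked :: Int -> Handle -> IO B.ByteString
hGetContentsChunked chunkSize h = go []
  where
    go acc = do
      chunk <- B.hGet h chunkSize
      if B.null chunk
        -- hGet returns an empty ByteString at EOF.
        then return (B.concat (reverse acc))
        else go (chunk : acc)

main :: IO ()
main = do
  hSetBinaryMode stdin True
  contents <- hGetContentsChunked (64 * 1024 * 1024) stdin
  print (B.length contents)
```

Note that B.concat copies every chunk into one new buffer, so peak memory use is roughly twice the size of the input.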

Try http://hackage.haskell.org/package/bytestring-mmap ?
G
--
Gregory Collins


System.IO.MMap (mmapFileByteString) worked like a charm in loading
files larger than 2 GB on OS X. Using mmapFileByteString and strict
ByteString is roughly seven times faster for my program than using
getContents and ByteString.Lazy.
Cheers,
Shaun
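
For readers who want to try the same approach, here is a minimal sketch of loading a file with mmapFileByteString from the mmap package. It is not Shaun's actual program: countLines and the usage message are hypothetical and only stand in for real processing. Passing Nothing as the second argument maps the whole file.

```haskell
import qualified Data.ByteString.Char8 as B
import System.Environment (getArgs)
import System.IO.MMap (mmapFileByteString)

-- Hypothetical stand-in for real work: count newline characters.
countLines :: B.ByteString -> Int
countLines = B.count '\n'

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      -- Nothing maps the entire file as a strict ByteString.
      contents <- mmapFileByteString path Nothing
      print (countLines contents)
    _ -> putStrLn "usage: mmapCount FILE"
```

Unlike getContents, this needs a file path rather than standard input, so the program has to be invoked as `./mmapCount big.txt` instead of `./mmapCount <big.txt`.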
On 11 June 2012 07:08, Gracjan Polak wrote:
Gregory Collins writes:
Or:
http://hackage.haskell.org/package/mmap
-- Gracjan
participants (5)
- Erik Hesselink
- Gracjan Polak
- Gregory Collins
- Serge Le Huitouze
- Shaun Jackman