Re: [Haskell-cafe] Data.Binary stack overflow with Data.Sequence String

On Tue, Mar 3, 2009 at 11:50 PM, Spencer Janssen wrote:
On Tue, Mar 3, 2009 at 10:30 PM, Gwern Branwen wrote:
So recently I've been having issues with Data.Binary & Data.Sequence; I serialize a 'Seq String'.
You can see the file here: http://code.haskell.org/yi/Yi/IReader.hs
The relevant function seems to be:
-- | Read in database from 'dbLocation' and then parse it into an 'ArticleDB'.
readDB :: YiM ArticleDB
readDB = io $ (dbLocation >>= r) `catch` (\_ -> return empty)
    where r x = fmap (decode . BL.fromChunks . return) $ B.readFile x
          -- We read in with strict bytestrings to guarantee the file is closed,
          -- and then we convert it to the lazy bytestring data.binary expects.
          -- This is inefficient, but alas...
My current serialized file is about 9.4M. I originally thought that the issue might be the recent upgrade in Yi to binary 0.5, but I unpulled patches back to past that, and the problem still manifested.
Whenever yi tries to read the articles.db file, it stack overflows. It actually stack-overflowed on even smaller files, but I managed to bump the size upwards, it seems, by the strict-Bytestring trick. Unfortunately, my personal file has since passed whatever that limit was.
I've read the previous threads on Data.Binary and Data.Map stack overflows carefully, but none of them seem to help; hacking some $!s or seqs into readDB makes no difference, and Seq is supposed to be a strict data structure already! Doing things in GHCi has been tedious and hasn't enlightened me much: sometimes things overflow and sometimes they don't. It's all very frustrating, and I'm seriously considering going back to the original read/show code unless anyone knows how to fix this - that approach may be many times slower, but I know it will work.
-- gwern
Have you tried the darcs version of binary? It has a new instance which looks more efficient than the old.
Cheers, Spencer Janssen
I have. It still stack-overflows on my 9.8 meg file. (The magic number seems to be somewhere between 9 and 10 megabytes.)
-- gwern
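For context, the "$!s or seqs" being discussed can be made concrete. Below is a minimal sketch - not from the Yi sources; `forceElems` and `readDB'` are hypothetical names - of the strict-read trick combined with forcing every element of the decoded Seq before returning it, so that any evaluation happens up front rather than piling up thunks:

```haskell
import Data.Binary (decode, encode)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Foldable (foldr')
import qualified Data.Sequence as S

-- Seq is spine-strict, but its elements are not: a decoded 'Seq String'
-- can still be a tree of thunks.  Force every element eagerly so any
-- evaluation work happens here, all at once.
forceElems :: S.Seq String -> S.Seq String
forceElems s = foldr' (\x acc -> length x `seq` acc) () s `seq` s

-- The strict-read trick from readDB, plus the extra forcing.
readDB' :: FilePath -> IO (S.Seq String)
readDB' f = do
    raw <- B.readFile f                   -- strict read: handle is closed here
    let db = decode (BL.fromChunks [raw]) -- binary wants a lazy ByteString
    return $! forceElems db
```

Whether this forcing alone avoids the overflow for a ~10 MB file is exactly what the thread is unsure about; it only makes the evaluation point explicit.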

Hi Gwern,
I get String/Data.Binary issues too. My suggestion would be to change
your Strings to ByteStrings, serialise, and then do the reverse
conversion when reading. Interestingly, a String and a ByteString have
identical Data.Binary representations, but in my experiments the
conversion, even including the cost of BS.unpack, makes the reading
substantially cheaper.
Thanks
Neil
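Neil's suggestion can be sketched roughly as below. `writeDB'` and `loadDB'` are illustrative names, not Yi functions, and whether the on-disk bytes really match the old String format is Neil's claim, not verified here:

```haskell
import Data.Binary (decode, encode)
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import qualified Data.Sequence as S

-- Serialise a 'Seq ByteString' instead of a 'Seq String', and pay for
-- the pack/unpack conversions only at the read/write boundary.
writeDB' :: FilePath -> S.Seq String -> IO ()
writeDB' f = BL.writeFile f . encode . fmap BC.pack

loadDB' :: FilePath -> IO (S.Seq String)
loadDB' f = fmap (fmap BC.unpack . decode) (BL.readFile f)
```

Note that BL.readFile is lazy, so the file handle stays open until decoding finishes; the strict B.readFile + fromChunks trick from readDB can still be layered on top if that matters.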
On Thu, Mar 5, 2009 at 2:33 AM, Gwern Branwen wrote: [...]

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Avoid unpack!

ndmitchell:
[...]

Avoid massive reductions in runtime while maintaining the same API?
I did move to using ByteStrings internally for those bits later on,
but reading Strings from Data.Binary via a ByteString plus unpack went
much more quickly than reading Strings directly.
On Thu, Mar 5, 2009 at 7:35 PM, Don Stewart wrote: Avoid unpack!
[...]

On Thu, Mar 5, 2009 at 2:55 PM, Neil Mitchell wrote: [...]
I wish I could use ByteStrings throughout and avoid packing/unpacking, but unfortunately I don't control the Yi interfaces! If I want to stick something in a buffer, String it is. This is where recent discussions about Stringable classes certainly seem apropos - it would be nice if we didn't have to do all this marshalling and converting by hand, indeed.
-- gwern
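For illustration, the kind of Stringable class being alluded to might look like the following - a hypothetical sketch, not an existing library class:

```haskell
{-# LANGUAGE TypeSynonymInstances, FlexibleInstances #-}
import qualified Data.ByteString.Char8 as BC

-- A hypothetical class centralising the by-hand marshalling: code that
-- only needs "some string-like thing" can be written once against it.
class Stringable a where
    toString   :: a -> String
    fromString :: String -> a

instance Stringable String where          -- needs the pragmas above
    toString   = id
    fromString = id

instance Stringable BC.ByteString where
    toString   = BC.unpack
    fromString = BC.pack
```

With something like this, the Yi-facing boundary could take any Stringable and convert only at the edges.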

On Thu, Mar 5, 2009 at 6:51 AM, Neil Mitchell wrote: [...]
Ah, thanks for the advice. Switching to (strict) ByteString seems to resolve the stack overflow. (And thank goodness too, I need my ireader!) I hadn't realized it was the String that was messing things up by being lazy. Very annoying! (The String code was cleaner - fewer packs/unpacks.)
-- gwern
participants (3)
- Don Stewart
- Gwern Branwen
- Neil Mitchell