
Hi Gwern,
I get String/Data.Binary issues too. My suggestion would be to change
your Strings to ByteStrings, serialise, and then do the reverse
conversion when reading. Interestingly, a String and a ByteString have
identical Data.Binary reps, but in my experiments, reading a ByteString
and converting it (even including the cost of BS.unpack) is
substantially cheaper than reading a String directly.
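
A minimal sketch of that round trip, in case it helps. The names here
(ArticleDB as a plain 'Seq String', encodeDB, decodeDB, writeDB,
readDB') are stand-ins for illustration, not the actual Yi definitions.
It also assumes the strings contain only Latin-1 characters, because
Data.ByteString.Char8 truncates anything above code point 255; real
Unicode text would need a proper UTF-8 encoding step instead.

```haskell
import Data.Binary (decode, encode)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy as BL
import Data.Sequence (Seq)

-- Hypothetical stand-in for Yi's ArticleDB type.
type ArticleDB = Seq String

-- Convert every String to a strict ByteString, then serialise.
-- Assumption: all characters fit in 8 bits (B.pack truncates others).
encodeDB :: ArticleDB -> BL.ByteString
encodeDB = encode . fmap B.pack

-- Deserialise as 'Seq ByteString', then convert back to Strings.
decodeDB :: BL.ByteString -> ArticleDB
decodeDB = fmap B.unpack . decode

-- Write the database to a file.
writeDB :: FilePath -> ArticleDB -> IO ()
writeDB path = BL.writeFile path . encodeDB

-- Read the file strictly (so the handle is closed), wrap it as a
-- single lazy chunk, and decode.
readDB' :: FilePath -> IO ArticleDB
readDB' path = fmap (decodeDB . BL.fromChunks . return) (B.readFile path)
```

I can't promise this cures the stack overflow; it only shows the
conversion on both sides of encode/decode.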
Thanks
Neil
On Thu, Mar 5, 2009 at 2:33 AM, Gwern Branwen wrote:
On Tue, Mar 3, 2009 at 11:50 PM, Spencer Janssen wrote:
On Tue, Mar 3, 2009 at 10:30 PM, Gwern Branwen wrote:
So recently I've been having issues with Data.Binary & Data.Sequence; I serialize a 'Seq String'
You can see the file here: http://code.haskell.org/yi/Yi/IReader.hs
The relevant function seems to be:
-- | Read in database from 'dbLocation' and then parse it into an 'ArticleDB'.
readDB :: YiM ArticleDB
readDB = io $ (dbLocation >>= r) `catch` (\_ -> return empty)
  where r x = fmap (decode . BL.fromChunks . return) $ B.readFile x
        -- We read in with strict bytestrings to guarantee the file is closed,
        -- and then we convert it to the lazy bytestring data.binary expects.
        -- This is inefficient, but alas...
My current serialized file is about 9.4M. I originally thought that the issue might be the recent upgrade in Yi to binary 0.5, but I unpulled patches back to past that, and the problem still manifested.
Whenever yi tries to read the articles.db file, it overflows the stack. It actually overflowed on even smaller files, but I managed to bump the workable size upwards, it seems, with the strict-ByteString trick. Unfortunately, my personal file has since passed whatever that limit was.
I've carefully read the previous threads on Data.Binary and Data.Map stack overflows, but none of them seem to help; hacking some $!s or seqs into readDB seems to make no difference, and Seq is supposed to be a strict data structure already! Doing things in GHCi has been tedious, and hasn't enlightened me much: sometimes things overflow and sometimes they don't. It's all very frustrating and I'm seriously considering going back to using the original read/show code unless anyone knows how to fix this - that approach may be many times slower, but I know it will work.
-- gwern
Have you tried the darcs version of binary? It has a new instance which looks more efficient than the old.
Cheers, Spencer Janssen
I have. It still stack-overflows on my 9.8 meg file. (The magic number seems to be somewhere between 9 and 10 megabytes.)
-- gwern
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe