Re: [Haskell-cafe] Data.Binary stack overflow with Data.Sequence String

On Tue, Mar 3, 2009 at 11:50 PM, Spencer Janssen wrote:
On Tue, Mar 3, 2009 at 10:30 PM, Gwern Branwen wrote:
So recently I've been having issues with Data.Binary & Data.Sequence; I serialize a 'Seq String'.
You can see the file here: http://code.haskell.org/yi/Yi/IReader.hs
The relevant function seems to be:
-- | Read in database from 'dbLocation' and then parse it into an 'ArticleDB'.
readDB :: YiM ArticleDB
readDB = io $ (dbLocation >>= r) `catch` (\_ -> return empty)
    where r x = fmap (decode . BL.fromChunks . return) $ B.readFile x
          -- We read in with strict bytestrings to guarantee the file is closed,
          -- and then we convert it to the lazy bytestring data.binary expects.
          -- This is inefficient, but alas...
My current serialized file is about 9.4M. I originally thought that the issue might be the recent upgrade in Yi to binary 0.5, but I unpulled patches back to past that, and the problem still manifested.
Whenever yi tries to read the articles.db file, it stack overflows. It actually stack-overflowed on even smaller files, but I managed to bump the size upwards, it seems, by the strict-Bytestring trick. Unfortunately, my personal file has since passed whatever that limit was.
I've read the previous threads on Data.Binary and Data.Map stack overflows carefully, but none of them seem to help; hacking some $!s or seqs into readDB makes no difference, and Seq is supposed to be a strict data structure already! Doing things in GHCi has been tedious and hasn't enlightened me much: sometimes things overflow and sometimes they don't. It's all very frustrating, and I'm seriously considering going back to the original read/show code unless anyone knows how to fix this - that approach may be many times slower, but I know it will work.
-- gwern
Have you tried the darcs version of binary? It has a new instance which looks more efficient than the old.
Cheers, Spencer Janssen
I have. It still stack-overflows on my 9.8 meg file. (The magic number seems to be somewhere between 9 and 10 megabytes.)
-- gwern
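For context, the "$!s or seqs" being discussed can be made concrete. Below is a minimal sketch - not from the Yi sources; `forceElems` and `readDB'` are hypothetical names - of the strict-read trick combined with forcing every element of the decoded Seq before returning it, so that any evaluation happens up front rather than piling up thunks:

```haskell
import Data.Binary (decode, encode)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Foldable (foldr')
import qualified Data.Sequence as S

-- Seq is spine-strict, but its elements are not: a decoded 'Seq String'
-- can still be a tree of thunks.  Force every element eagerly so any
-- evaluation work happens here, all at once.
forceElems :: S.Seq String -> S.Seq String
forceElems s = foldr' (\x acc -> length x `seq` acc) () s `seq` s

-- The strict-read trick from readDB, plus the extra forcing.
readDB' :: FilePath -> IO (S.Seq String)
readDB' f = do
    raw <- B.readFile f                   -- strict read: handle is closed here
    let db = decode (BL.fromChunks [raw]) -- binary wants a lazy ByteString
    return $! forceElems db
```

Whether this forcing alone avoids the overflow for a ~10 MB file is exactly what the thread is unsure about; it only makes the evaluation point explicit.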

Hi Gwern,
I get String/Data.Binary issues too. My suggestion would be to change
your Strings to ByteStrings, serialise, and then do the reverse
conversion when reading. Interestingly, a String and a ByteString have
identical Data.Binary representations, but in my experiments the
conversion, even including the cost of BS.unpack, makes the reading
substantially cheaper.
Thanks
Neil
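Neil's suggestion can be sketched roughly as below. `writeDB'` and `loadDB'` are illustrative names, not Yi functions, and whether the on-disk bytes really match the old String format is Neil's claim, not verified here:

```haskell
import Data.Binary (decode, encode)
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import qualified Data.Sequence as S

-- Serialise a 'Seq ByteString' instead of a 'Seq String', and pay for
-- the pack/unpack conversions only at the read/write boundary.
writeDB' :: FilePath -> S.Seq String -> IO ()
writeDB' f = BL.writeFile f . encode . fmap BC.pack

loadDB' :: FilePath -> IO (S.Seq String)
loadDB' f = fmap (fmap BC.unpack . decode) (BL.readFile f)
```

Note that BL.readFile is lazy, so the file handle stays open until decoding finishes; the strict B.readFile + fromChunks trick from readDB can still be layered on top if that matters.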
On Thu, Mar 5, 2009 at 2:33 AM, Gwern Branwen wrote: [...]

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Avoid unpack!

ndmitchell:
[...]

Avoid massive reductions in runtime while maintaining the same API?
I did move to using ByteStrings internally for those bits later on,
but reading Strings from Data.Binary via a ByteString plus unpack went
much more quickly than reading Strings directly.
On Thu, Mar 5, 2009 at 7:35 PM, Don Stewart wrote: Avoid unpack!
[...]

On Thu, Mar 5, 2009 at 2:55 PM, Neil Mitchell wrote: [...]
I wish I could use ByteStrings throughout and avoid packing/unpacking, but unfortunately I don't control the Yi interfaces! If I want to stick something in a buffer, String it is. This is where recent discussions about Stringable classes certainly seem apropos - it would be nice if we didn't have to do all this marshalling and converting by hand, indeed.
-- gwern
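For illustration, the kind of Stringable class being alluded to might look like the following - a hypothetical sketch, not an existing library class:

```haskell
{-# LANGUAGE TypeSynonymInstances, FlexibleInstances #-}
import qualified Data.ByteString.Char8 as BC

-- A hypothetical class centralising the by-hand marshalling: code that
-- only needs "some string-like thing" can be written once against it.
class Stringable a where
    toString   :: a -> String
    fromString :: String -> a

instance Stringable String where          -- needs the pragmas above
    toString   = id
    fromString = id

instance Stringable BC.ByteString where
    toString   = BC.unpack
    fromString = BC.pack
```

With something like this, the Yi-facing boundary could take any Stringable and convert only at the edges.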

On Thu, Mar 5, 2009 at 6:51 AM, Neil Mitchell wrote: [...]
Ah, thanks for the advice. Switching to (strict) ByteString seems to resolve the stack overflow. (And thank goodness too, I need my ireader!) I hadn't realized it was the String that was messing things up by being lazy. Very annoying! (The String code was cleaner - fewer packs/unpacks.)
-- gwern
participants (3)
- Don Stewart
- Gwern Branwen
- Neil Mitchell