performance difference for binary-0.4.3.1 with ghc-6.8.3 and ghc-6.10

Hello, I was experimenting with using ghc-6.10.0.20081007 on a project, and it seems that binary-0.4.3.1 has markedly worse performance in certain cases. With the following simple test:
import qualified Data.ByteString.Lazy as L
import Data.Binary
import Data.Binary.Get
import Control.Monad

main :: IO ()
main = do
  b <- L.readFile "some_binary_file"
  putStrLn $ show $ runGet getter b

getter :: Get [Word16]
getter = replicateM 1000000 getWord16le
running this program compiled with ghc-6.10 takes about 4 times as long (and consumes much more memory) as when compiled with ghc-6.8.3. The extra time appears to be proportional to the number of elements processed in the Get. Running the programs with -hT shows a clear memory difference, which I think is the source of the problem. I've placed pdfs of that output at https://webspace.utexas.edu/latojw/data/

The difference seems to manifest itself only when the elements are actually processed; changing "show $ runGet " to "show $ length $ runGet " is slightly faster in 6.10.

I was working on an Intel Mac with OS 10.4, binary-0.4.3.1, and bytestring-0.9.1.4. Can anyone confirm this, or suggest what might be the difference?

Thank you,
John Lato
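A side note on the "show $ length $ runGet" variant: length only walks the list spine, whereas show has to evaluate every element. The sketch below illustrates that general list behaviour only; it says nothing about how binary or GHC 6.10 behave internally, and the name spineOnly is hypothetical.

```haskell
import Data.Word (Word16)

-- A list whose spine is fully defined but whose elements are all bottom.
-- Hypothetical name, for illustration only.
spineOnly :: [Word16]
spineOnly = replicate 3 undefined

main :: IO ()
main = do
  -- length walks the cons cells and never inspects the elements.
  print (length spineOnly)
  -- sum (or show) must evaluate each element, so this line would throw:
  -- print (sum spineOnly)
```

So a test that only takes the length may never pay the cost of producing the individual Word16 values, which would fit the observation that the slowdown appears only when the elements are consumed.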

On Sun, Oct 26, 2008 at 9:36 AM, John Lato
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
With GHC 6.8.2:

test: too few bytes. Failed reading at byte position 1613914
real 0m27.573s
user 0m12.917s
sys 0m0.087s

With GHC 6.11.20081003:

test: too few bytes. Failed reading at byte position 1613914
real 0m21.528s
user 0m14.759s
sys 0m0.135s

I'm not using the exact same versions as you are, but I seem to be getting different results.

On Mon, Oct 27, 2008 at 12:34 AM, Alexander Dunlap
Hi Alexander,

Thanks for trying this out. Based on your error, I'd guess that the file isn't large enough to read all the requested data. In my case, it was about 70MB. You could try reading from /dev/random, or rather than reading from a file, use something like

test_pure =
  let b = Data.ByteString.Lazy.repeat 1
  in putStrLn $ show $ runGet getter b

I haven't tried a pure version myself, so I don't know what to expect from it. Something to do tonight...

Cheers,
John

I can reproduce the slowdown with ghc 6.10 in the binary benchmark suite. Investigating.

-- Don

Is this the sole test case? I can investigate. Though perhaps using a newer GHC release candidate is also a good idea.

-- Don

Where can I get ghc-6.10? I cannot find it on the haskell.org website.
On 2008-10-29, Don Stewart

On Tue, Oct 28, 2008 at 5:43 PM, Don Stewart
I'll try a newer release candidate, although at the time I first ran this it was the latest.

Last night I tried creating the bytestring using "repeat 1" to remove the file I/O, and the result was the same. There are a few other things I want to try as well (removing replicateM, folding the list to force values rather than printing them all), but my time is extremely limited at the moment.

John
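Two of the variations mentioned above (a "repeat 1" input and a fold instead of printing) could be combined roughly as follows. This is only a sketch, not the code that was actually run: the names sumWords and the parameterised getter are hypothetical, and the explicit import lists assume a binary release that exports these names from Data.Binary.Get.

```haskell
import qualified Data.ByteString.Lazy as L
import Control.Monad (replicateM)
import Data.Binary.Get (Get, getWord16le, runGet)
import Data.List (foldl')
import Data.Word (Word16)

-- Same shape as the getter in the original test, with the element count
-- passed in as a parameter (an assumption made for easier experimenting).
getter :: Int -> Get [Word16]
getter n = replicateM n getWord16le

-- Decode n words from an infinite lazy ByteString of 0x01 bytes and fold
-- them strictly, so every element is forced without printing the list.
sumWords :: Int -> Int
sumWords n = foldl' (+) 0 (map fromIntegral (runGet (getter n) (L.repeat 1)))

main :: IO ()
main = print (sumWords 1000000)
```

Each decoded word is 0x0101 (257), so the printed sum is easy to sanity-check, and the only IO left is the final print.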

Could you send me a minimal test case not involving IO?

-- Don
participants (6)
- Alexander Dunlap
- Don Stewart
- Donald Halomoan
- Ian Lynagh
- Jason Dusek
- John Lato