Re: [Haskell-cafe] Why is this so inefficient?

6 Feb 2008

      If the  strings are relatively short, there can be a bottleneck
in the current Ord instance for Bytestrings. When lots of them
are in a map, the ffi calls to memcmp dominate. 

I've a fix for this (do it all in Haskell for small strings), and
can push it if someone complains some more.

jefferson.r.heard:
...
I thought this was fairly straightforward, but where the marked line
finishes in 0.31 seconds on my machine, the actual transpose takes
more than 5 minutes.  I know it must be possible to read data in
haskell faster than this.  I'm trying to read a 100MB comma delimited
file.  I've tried both CSV modules, and these take even longer to read
the file.  Is there some general best-practice for reading and parsing
large amounts of data that I'm not aware of?
I have tried, by the way, a couple of things, including putting a bang
(!) before row in transposeRow and using foldr instead of foldl', but
that didn't change anything other than force me to increase the stack
size to 100M on the command line.
I'm running in the profiler now, and I'll update this, but I thought I
would check and see if my head was on remotely straight to begin with.
-- Jeff
---
module ColumnMajorCSV where
import qualified Data.ByteString.Char8 as S
import qualified Data.Map as M
import qualified Data.List as L
transposeRow cols row = zipWith (:) row cols
transposeCSV :: [[S.ByteString]] -> M.Map String [S.ByteString]
transposeCSV (header:rows) = M.fromList (zip (map S.unpack header) spreadsheet)
    where spreadsheet = L.foldl' transposeRow emptySheet rows
          emptySheet = take (length header) $ repeat []
dataFromFile :: String -> IO (M.Map String [S.ByteString])
dataFromFile filename = do
    f <- S.readFile filename
    print . length . map (S.split ',' $!) . S.lines $ f
 -- finishes in 0.31 seconds
    return . transposeCSV . map (S.split ',' $!) . S.lines $ f  --
this takes 5 minutes and 6 seconds
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Why is this so inefficient?

Don Stewart