
On Thu, Sep 18, 2008 at 1:55 PM, Don Stewart wrote:

wchogg:
On Thu, Sep 18, 2008 at 1:29 PM, Don Stewart wrote:
<snip>

This makes me cry.
import System.Environment
import qualified Data.ByteString.Lazy.Char8 as B

-- Count the newlines in the file named on the command line.
main = do
    [f] <- getArgs
    s <- B.readFile f          -- lazy: the file is read in chunks on demand
    print (B.count '\n' s)
Compile it.
$ ghc -O2 --make A.hs
$ time ./A /usr/share/dict/words
52848
./A /usr/share/dict/words  0.00s user 0.00s system 93% cpu 0.007 total
Against standard tools:
$ time wc -l /usr/share/dict/words
52848 /usr/share/dict/words
wc -l /usr/share/dict/words  0.01s user 0.00s system 88% cpu 0.008 total
So both you & Bryan do essentially the same thing, and of course both versions are far better than mine. So the purpose of using the lazy version of ByteString was so that the file is only incrementally loaded by readFile as count processes it?
Yep, that's right.
The streaming nature is implicit in the lazy bytestring. It's kind of the dual of explicit chunkwise control -- chunk processing reified into the data structure.
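To make that duality concrete, here is a small sketch (mine, not from the thread) that does the chunkwise traversal explicitly instead of leaving it to B.count: toChunks exposes the list of strict chunks that a lazy ByteString is built from, and countNewlines (a name invented for this example) folds over them one at a time.

import Data.List (foldl')
import System.Environment
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as B

-- Explicit chunkwise processing: count newlines in each strict chunk
-- and sum the results, visiting one chunk at a time.
countNewlines :: B.ByteString -> Int
countNewlines = foldl' (\n c -> n + S.count '\n' c) 0 . B.toChunks

main :: IO ()
main = do
    [f] <- getArgs
    s <- B.readFile f
    print (countNewlines s)

This is exactly the loop the lazy bytestring runs for you; reifying the chunks into a data structure just means the consumer never has to write it by hand.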
To ask an overly general question: if lazy bytestring makes a nice provider for incremental processing, are there reasons to _not_ reach for it as my default when processing large files?
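For contrast with the program above, here is a strict-ByteString variant (again a sketch, not from the thread): it computes the same count, but S.readFile reads the entire file into memory before count ever runs, so peak memory grows with the file size rather than staying at roughly one chunk. That difference is what the lazy version buys you.

import System.Environment
import qualified Data.ByteString.Char8 as S

main :: IO ()
main = do
    [f] <- getArgs
    s <- S.readFile f          -- strict: the whole file is in memory here
    print (S.count '\n' s)     -- same answer, but no streaming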