
On Thu, Sep 18, 2008 at 1:55 PM, Don Stewart wrote:
wchogg:
On Thu, Sep 18, 2008 at 1:29 PM, Don Stewart wrote:
wchogg:
Hey Haskell,

So for a fairly inane reason, I ended up taking a couple of minutes and writing a program that would spit out, to the console, the number of lines in a file. Off the top of my head, I came up with this, which worked fine with files that had 100k lines:
    main = do
      path <- liftM head $ getArgs
      h <- openFile path ReadMode
      n <- execStateT (countLines h) 0
      print n

    untilM :: Monad m => (a -> m Bool) -> (a -> m ()) -> a -> m ()
    untilM cond action val = do
      truthy <- cond val
      if truthy
        then return ()
        else action val >> (untilM cond action val)

    countLines :: Handle -> StateT Int IO ()
    countLines = untilM (\h -> lift $ hIsEOF h) (\h -> do
      lift $ hGetLine h
      modify (+1))
If this makes anyone cringe or cry "you're doing it wrong", I'd actually like to hear it. I never really share my projects, so I don't know how idiosyncratic my style is.
This makes me cry.
    import System.Environment
    import qualified Data.ByteString.Lazy.Char8 as B

    main = do
        [f] <- getArgs
        s   <- B.readFile f
        print (B.count '\n' s)
Compile it.
    $ ghc -O2 --make A.hs

    $ time ./A /usr/share/dict/words
    52848
    ./A /usr/share/dict/words 0.00s user 0.00s system 93% cpu 0.007 total
Against standard tools:
    $ time wc -l /usr/share/dict/words
    52848 /usr/share/dict/words
    wc -l /usr/share/dict/words 0.01s user 0.00s system 88% cpu 0.008 total
So both you & Bryan do essentially the same thing and of course both versions are far better than mine. So the purpose of using the Lazy version of ByteString was so that the file is only incrementally loaded by readFile as count is processing?
Yep, that's right
The streaming nature is implicit in the lazy bytestring. It's kind of the dual of explicit chunkwise control -- chunk processing reified into the data structure.
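To make the "chunk processing reified into the data structure" point concrete, here is a minimal sketch that is not from the thread. It assumes only the public `toChunks` function from `Data.ByteString.Lazy` and walks the strict chunks explicitly; the name `countLinesChunked` is made up for illustration. It should give the same answer as `B.count '\n'`, just with the chunk loop spelled out.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy as BL
import Data.List (foldl')
import System.Environment (getArgs)

-- | Count newlines by folding over the strict chunks of a lazy ByteString.
-- (Hypothetical helper name, for illustration only.)
countLinesChunked :: BL.ByteString -> Int
countLinesChunked = foldl' step 0 . BL.toChunks
  where
    -- Each chunk is an ordinary strict ByteString; count within it
    -- and add to the running total.
    step n chunk = n + S.count '\n' chunk

main :: IO ()
main = do
  [f] <- getArgs
  s <- BL.readFile f          -- chunks are read on demand
  print (countLinesChunked s)
```

Because `foldl'` consumes the chunk list from the front and nothing else holds on to `s`, chunks that have already been counted can be garbage collected, so memory use should stay roughly constant regardless of file size.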
Hi Don,

I have a bit more of a followup, actually. You make use of the built-in bytestring consumer count, which itself is built upon the foldlChunks function, which is only exported from ByteString.Lazy.Internal. If I want to make my own efficient bytestring consumer, is that what I need to use in order to preserve the inherent laziness of the data structure?

Also, I feel a little at a loss for how to make a good bytestring producer for efficiently _writing_ large swaths of data via writeFile. Would it be possible to whip up a small example?

Oh, and lastly, I apologize to both you & Bryan for making you cry. I hope you can forgive my cruelty.

Thanks,
Creighton
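One possible shape for such a producer, offered as a rough sketch rather than as anyone's reply in this thread: build the lazy bytestring from a lazily generated list of strict chunks with `fromChunks`, and hand it to the lazy `writeFile`. The helper name `numberedLines` and the output path `numbers.txt` are invented for the example.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy as BL

-- | Lazily produce the lines "1\n", "2\n", ..., "n\n",
-- one strict chunk per line. (Hypothetical helper name.)
numberedLines :: Int -> BL.ByteString
numberedLines n = BL.fromChunks [S.pack (show i ++ "\n") | i <- [1 .. n]]

main :: IO ()
main =
  -- writeFile consumes the chunks one at a time, so the whole
  -- output never needs to be in memory at once.
  BL.writeFile "numbers.txt" (numberedLines 1000000)
```

One chunk per line keeps the sketch simple but makes for many tiny chunks. With a current bytestring package, `Data.ByteString.Builder` and its `toLazyByteString` would be one way to batch the output into larger chunks.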