
dagit:
On Thu, Sep 18, 2008 at 12:31 PM, Creighton Hogg wrote:
On Thu, Sep 18, 2008 at 1:55 PM, Don Stewart wrote:
wchogg:
On Thu, Sep 18, 2008 at 1:29 PM, Don Stewart wrote:
<snip>
This makes me cry.
import System.Environment
import qualified Data.ByteString.Lazy.Char8 as B

main = do
    [f] <- getArgs
    s   <- B.readFile f
    print (B.count '\n' s)
Compile it.
$ ghc -O2 --make A.hs
$ time ./A /usr/share/dict/words
52848
./A /usr/share/dict/words  0.00s user 0.00s system 93% cpu 0.007 total
Against standard tools:
$ time wc -l /usr/share/dict/words
52848 /usr/share/dict/words
wc -l /usr/share/dict/words  0.01s user 0.00s system 88% cpu 0.008 total
So both you & Bryan do essentially the same thing, and of course both versions are far better than mine. So the purpose of using the lazy version of ByteString was so that the file is only incrementally loaded by readFile as count processes it?
Yep, that's right.
The streaming nature is implicit in the lazy bytestring. It's kind of the dual of explicit chunkwise control -- chunk processing reified into the data structure.
To ask an overly general question: if lazy bytestrings make a nice vehicle for incremental processing, are there reasons to _not_ reach for them as my default when processing large files?
Yes. The main one is when you "accidentally" force the whole file (or at least large parts of it) into memory at the same time. Profiling and careful programming are the usual workarounds, but in a large application the "careful programming" part can become prohibitively expensive, because of the sometimes subtle ways strictness composes with laziness. This is a symptom of a more general issue: with lazy evaluation it is non-obvious how your program is evaluated at run-time, which makes laziness a double-edged sword at times. I'm not saying get rid of lazy evaluation, but it occasionally causes efficiency problems, and makes those problems harder to diagnose.
The rule seems to be: Write correct code first, fix the problems (usually just inefficiencies) later.
Using lazy bytestrings makes it easier to write concise code that is more easily inspected for correctness. Perhaps such code is even easier to test, but I'm skeptical of that. Thus, I think most people here would agree that reaching first for lazy bytestrings is preferable to other techniques. Plus, one of the most common fixes for inefficient Haskell programs is to make them lazy in the right places and strict in key places, and using lazy bytestrings usually gets you part of the way to that refactoring. The sketch below shows the kind of local strictness fix I mean.
Work on the "dual" of lazy bytestrings -- chunked enumerators -- may lead to more options in this area. The question of how well left-fold enumerators compose remains open (afaik), but we'll see. A rough sketch of the style is below.

-- Don