
On Monday, 04 January 2010 at 16:30:18, Will Ness wrote:
Heinrich Apfelmus writes: (I haven't followed the whole thread, but hopefully I have enough grasp of it to make a useful remark. :))
Concerning lists as producer/consumer, I think that's exactly what lazy evaluation is doing. Neither filter, map, nor span evaluates and stores more list elements than strictly necessary.
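As a quick illustration of that demand-driven behaviour (the trace instrumentation is my own, not from the thread), one can watch how little of an infinite list is actually evaluated:

```haskell
import Debug.Trace (trace)

-- An infinite producer that announces each element as it is demanded.
producer :: [Int]
producer = map (\x -> trace ("producing " ++ show x) x) [1 ..]

main :: IO ()
main = print (take 3 (filter even producer))
```

This prints "producing 1" through "producing 6" (on stderr) and then [2,4,6]; filter only forces the six elements it needs to find three even ones, and nothing beyond that is ever produced or stored.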
I always suspected as much, but was once told that Chris Okasaki had shown that any filter etc. must allocate its own storage. With peek/pull they don't have to: when they are nested, the deepest one eventually pulls from the real storage through some pointer chasing. Span isn't so easily compiled out either, or is it? But that might be a minor point.
For me, a real smart compiler is one that would take in e.g. (sum $ take n $ cycle $ [1..m]) and spit out a straight-up math formula, inside a few ifs maybe (just an aside).
Such things just aren't common enough. If you start letting the compiler look for patterns such as this which can be transformed into a simple formula, compile times would explode.
Such a smart compiler might even be able to derive well-performing code right from Turner's sieve. :)
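For this particular expression the closed form is easy to state by hand, which shows what such a compiler would have to discover; a minimal sketch (the helper name sumTakeCycle is mine): n `div` m full cycles each contribute the triangular number m*(m+1)/2, and the leftover n `mod` m elements contribute a partial triangular sum.

```haskell
-- Closed form for: sum $ take n $ cycle [1..m]   (assumes m > 0)
sumTakeCycle :: Integer -> Integer -> Integer
sumTakeCycle n m = q * tri m + tri r
  where
    (q, r) = n `divMod` m          -- q full cycles, r leftover elements
    tri k  = k * (k + 1) `div` 2   -- triangular number 1 + 2 + ... + k

main :: IO ()
main = print (sumTakeCycle 7 3 == sum (take 7 (cycle [1 .. 3])))
-- prints True  (both are 13)
```

The formula runs in constant time regardless of n, which is exactly the payoff the "straight-up math formula" remark is after.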
Sure, creating a list head only to immediately consume it is somewhat inefficient -- and the target of stream fusion[1] -- but this is an overhead of how list elements are stored, not how many.
It might be equivalent to the (imagined) producer storing its 'current' value inside its frame.
How much can we rely on the run-time to actually destroy all the passed-over elements and not hang on to them for some time?
If they're really passed over, they very probably won't survive the next GC.
Is this that compiler switch that Daniel mentioned? Is it reliable?
No, it's something different, though not entirely unrelated. The -fno-cse flag turns off Common Subexpression Elimination (really sharing rather than elimination). That is, if you have

    f x = g (expensive x) y z (h (expensive x))

the compiler can see the common subexpression (expensive x) on the right-hand side and decide to share it, i.e. calculate it only once:

    f x = let e = expensive x in g e y z (h e)

That may or may not be a good idea. If e is small (say, an Int) and expensive isn't a blatant lie, it is a *very* good idea. But if e is a long list that expensive x lazily produces, and it is consumed by g and h in the right manner, sharing is often a bad idea: h might not start consuming before g has demanded a long prefix of the list, and kaboom, heap exhausted.

If you turn on optimisations, the compiler does some looking for common subexpressions to share. There are heuristics to decide when to share, but they're far from infallible. So if you have a situation like the above, and the heuristics indicate that sharing would be good although it isn't, you're hosed. Hence the option to tell the compiler "try hard to optimise, but don't do CSE (because I know it would be bad here)". That can be done on a per-file basis; it would be cool if it could be done on a per-function basis.

Now, if you have a list producer and a consumer, without fusion it goes like this:

    Consumer: gimme
    Producer: creates a cons cell, puts in the value and the next cons cell (how to produce more)
    Consumer: takes the value, does something with it, gimme more

Stream fusion is about eliminating the cons-cell creation and value passing, fusing production and consumption into a nice loop. That is of course impossible if the produced list is shared between two (or more) consumers.
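To make that producer/consumer dialogue concrete, here is a minimal sketch of the stream representation that fusion frameworks build on (constructor names follow the stream-fusion literature; this is a toy model, not GHC's actual machinery):

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- One step of a producer: finished, internal step, or a value plus new state.
data Step s a = Done | Skip s | Yield a s

-- A stream is a step function together with a hidden state.
data Stream a = forall s. Stream (s -> Step s a) s

stream :: [a] -> Stream a
stream xs = Stream next xs
  where
    next []       = Done
    next (y : ys) = Yield y ys

unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Done        -> []
      Skip s'     -> go s'
      Yield a s'  -> a : go s'

-- A non-allocating map over streams: no cons cells, just state passing.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done        -> Done
      Skip s'     -> Skip s'
      Yield a s'  -> Yield (f a) s'

main :: IO ()
main = print (unstream (mapS (* 2) (stream [1 .. 5 :: Int])))
-- prints [2,4,6,8,10]
```

Writing map f = unstream . mapS f . stream still allocates cons cells at the stream/unstream boundary; the fusion rule unstream . stream = id removes that boundary between adjacent combinators, leaving a single loop. And as the message says, the rewrite is only valid when the intermediate list has exactly one consumer.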