
Will Ness wrote:
I keep thinking that the storage duplication with span, filter etc. is not really necessary, and can be replaced with some pointer chasing - especially when there's only one consumer for the generated values.
What I mean is thinking of lists in terms of the producer/consumer paradigm, as objects supporting a { pull, peek } interface, keeping the generator inside so that it produces the next value on a 'pull' request and keeps it ready for any 'peek's.
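A minimal sketch of such a { pull, peek } object in Haskell (the names Gen, fromList and mapG are illustrative, not from the original post) could look like this:

-- A single-consumer generator: 'peek' inspects the next value without
-- consuming it, 'pull' advances the generator past it.
data Gen a = Gen { peek :: a, pull :: Gen a }

-- Wrap an (infinite) list as a generator.
fromList :: [a] -> Gen a
fromList (x:xs) = Gen x (fromList xs)
fromList []     = error "fromList: generator exhausted"

-- A storage-free 'modifier' in the sense described further below: it
-- just transforms values as they are pulled through.
mapG :: (a -> b) -> Gen a -> Gen b
mapG f g = Gen (f (peek g)) (mapG f (pull g))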
Euler's sieve is
sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
  where (h,t) = span (< p*p) xs
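(The definition relies on an ordered-list difference minus that is not shown in this excerpt; a common definition for increasing lists -- the data-ordlist package's Data.List.Ordered module also provides one -- is:)

-- Remove from the first increasing list every element of the second.
minus :: Ord a => [a] -> [a] -> [a]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : minus xt ys
  EQ ->     minus xt yt
  GT ->     minus xs yt
minus xs _ = xs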
[...]
The real difference here is between those producers whose values will only be consumed once, by one specific consumer, and those whose values may be needed more than once, and so really need to be maintained in some storage. If not - span, filter, map, whatever - they are all just little modifiers on top of the real producers, which may or may not also have an actual storage maintained by them.
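With the hypothetical Gen type sketched above, values that are needed more than once can be reified back into an ordinary, memoised list, for example:

-- Reify a generator's output so that several consumers can share it.
-- (Intended for the infinite streams discussed here.)
toList :: Gen a -> [a]
toList g = peek g : toList (pull g)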
(I haven't followed the whole thread, but hopefully I have enough grasp of it to make a useful remark. :))

Concerning lists as producer/consumer, I think that's exactly what lazy evaluation is doing. Neither filter, map nor span evaluates and stores more list elements than strictly necessary. Sure, creating a list head only to immediately consume it is somewhat inefficient -- and the target of stream fusion [1] -- but this is an overhead in how list elements are stored, not in how many.

You can try to implement the Euler sieve with producers by using a type like

data Producer a = forall s. Producer { state :: !s, next :: s -> s, value :: s -> a }

but I think this will be quite difficult; it's not clear what, and thus how big, the state will be. (See [1] for choosing a good type.)

Concerning the sieves, there is a fundamental difference between the imperative sieve and the functional sieves, regardless of whether the latter start at p or p^2 or use a priority queue. Namely, the imperative sieve makes essential use of *pointer arithmetic*. The key point is that in order to cross off the multiples p, 2*p, 3*p, ... of a prime p, the algorithm can jump directly from the (k*p)-th to the (k*p+p)-th array element by adding p to the index. The functional versions can never beat that, because they can't jump over p constructors of a data structure in O(1) time.

[1]: http://www.cse.unsw.edu.au/~dons/papers/CLS07.html

Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com
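To experiment with the Producer type above, the ExistentialQuantification extension is needed; a minimal sketch that compiles (the helpers current, step, takeP and oddsP are illustrative, not from the message above):

{-# LANGUAGE ExistentialQuantification #-}

-- The type from the message above: a producer hides its state type s.
data Producer a = forall s. Producer { state :: !s, next :: s -> s, value :: s -> a }

-- Illustrative observers: look at the current value, advance the state.
current :: Producer a -> a
current (Producer s _ v) = v s

step :: Producer a -> Producer a
step (Producer s n v) = Producer (n s) n v

-- Collect the first k values of a producer.
takeP :: Int -> Producer a -> [a]
takeP k p
  | k <= 0    = []
  | otherwise = current p : takeP (k - 1) (step p)

-- Example: the odd numbers 3,5,7,... with a plain Integer as state.
oddsP :: Producer Integer
oddsP = Producer 3 (+ 2) id

And, for comparison, a sketch of the imperative, array-based sieve written in Haskell itself with Data.Array.ST (details illustrative): crossing off the multiples of a prime p advances the array index by p, which is exactly the O(1) jump described above.

import Control.Monad (forM_, when)
import Data.Array.ST (runSTUArray, newArray, readArray, writeArray)
import Data.Array.Unboxed (UArray, assocs)

-- Imperative sieve of Eratosthenes on a mutable unboxed array.
primesUpTo :: Int -> [Int]
primesUpTo n = [i | (i, True) <- assocs sieveArr]
  where
    sieveArr :: UArray Int Bool
    sieveArr = runSTUArray $ do
      a <- newArray (2, n) True
      forM_ [2 .. isqrt n] $ \p -> do
        isPrime <- readArray a p
        when isPrime $
          -- jump from k*p to k*p + p by adding p to the index
          forM_ [p * p, p * p + p .. n] $ \k ->
            writeArray a k False
      return a
    isqrt = floor . sqrt . (fromIntegral :: Int -> Double)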