
2011/12/26 Gábor Lehel
On Sun, Dec 25, 2011 at 9:19 PM, Eugene Kirpichov
wrote: Hello Heinrich,
Thanks, that's sure some food for thought!
A few notes: * This is indeed analogous to Iteratees. I tried doing the same with Iteratees but failed, so I decided to put together something simple of my own. * The Applicative structure over this stuff is very nice. I was thinking, what structure to put on - and Applicative seems the perfect fit. It's also possible to implement Arrows - but I once tried and failed; however, I was trying that for a more complex stream transformer datatype (a hybrid of Iteratee and Enumerator). * StreamSummary is trivially a bifunctor. I actually wanted to make it an instance of Bifunctor, but it was in the category-extras package and I hesitated to reference this giant just for this purpose :) Probably bifunctors should be in prelude.
Edward Kmett has been splitting that up into a variety of smaller packages, for instance:
Actually it's not a bifunctor - it's a functor in one argument and contrafunctor in the other. Is there a name for such a structure?
* Whereas StreamSummary a r abstracts deconstruction of lists, the dual datatype (StreamSummary a r ->) abstracts construction; however I just now (after looking at your first definition of length) understood that it is trivially isomorphic to the regular list datatype - you just need to be non-strict in the state - listify :: ListTo a [a] = CaseOf [] (\x -> fmap (x:) listify). So you don't need functions of the form (forall r . ListTo a r -> ListTo b r) - you just need (ListTo b [a]). This is a revelation for me.
On Sun, Dec 25, 2011 at 2:25 PM, Heinrich Apfelmus
wrote: Eugene Kirpichov wrote:
In the last couple of days I completed my quest of making my graphing utility timeplot ( http://jkff.info/software/timeplotters ) not load
whole input dataset into memory and consequently be able to deal with datasets of any size, provided however that the amount of data to *draw* is not so large. On the go it also got a huge speedup - previously visualizing a cluster activity dataset with a million events took around 15 minutes and a gig of memory, now it takes 20 seconds and 6 Mb max residence. (I haven't yet uploaded to hackage as I have to give it a bit more testing)
The refactoring involved a number of interesting programming patterns that I'd like to share with you and ask for feedback - perhaps something can be simplified.
The source is at http://github.com/jkff/timeplot
The datatype of incremental computations is at
https://github.com/jkff/timeplot/blob/master/Tools/TimePlot/Incremental.hs.
Strictness is extremely important here - the last memory leak I eliminated was lack of bang patterns in teeSummary.
Your StreamSummary type has a really nice interpretation: it's a reification of case expressions.
For instance, consider the following simple function from lists to integers
length :: [a] -> Int length xs = case xs of [] -> 0 (y:ys) -> 1 + length ys
We want to reify the case expression as constructor of a data type. What type should it have? Well, a case expression maps a list xs to a result, here of type Int, via two cases: the first case gives a result and the other maps a value of type a to a function from lists to results again. This explanation was probably confusing, so I'll just go ahead and define a data type that represents functions from lists [a] to some result of type r
data ListTo a r = CaseOf r (a -> ListTo a r)
interpret :: ListTo a r -> ([a] -> r) interpret (CaseOf nil cons) xs = case xs of [] -> nil (y:ys) -> interpret (cons y) ys
As you can see, we are just mapping each CaseOf constructor back to a built-in case expression.
Likewise, each function from lists can be represented in terms of our new data type: simply replace all built-in case expressions with the new constructor
length' :: ListTo a Int length' = CaseOf (0) (\x -> fmap (1+) length')
length = interpret length'
The CaseOf may look a bit weird, but it's really just a straightforward translation of the case expression you would use to define the function go instead.
Ok, this length function is really inefficient because it builds a huge expression of the form (1+(1+...)). Let's implement a strict variant instead
lengthL :: ListTo a Int lengthL = go 0 where go !n = CaseOf (n) (\x -> go (n+1))
While we're at it, let's translate two more list functions
foldL' :: (b -> a -> b) -> b -> ListTo a b foldL' f b = Case b (\a -> foldL' f $! f b a)
sumL :: ListTo Int Int sumL = foldL' (\b a -> a+b) 0
And now we can go for the point of this message: unlike ordinary functions from lists, we can compose these in lock-step! In particular, the following applicative instance
instance Applicative (ListTo a) where pure b = CaseOf b (const $ pure b) (CaseOf f fs) <*> (CaseOf x xs) = CaseOf (f x) (\a -> fs a <*> xs a)
allows us to write a function
average :: ListTo Int Double average = divide <$> sumL <*> lengthL where divide a b = fromIntegral a / fromIntegral b
that runs in constant space! Why does this work? Well, since we can now inspect case expressions, we can choose to evaluate them in lock-step, essentially computing sum and length with just one pass over the input list. Remember that the original definition
average xs = sum xs / length xs
has a space leak because the input list xs is being shared.
Remarks: 1. Reified case expressions are, of course, the same thing as Iteratees, modulo chunking and weird naming.
2. My point is topped by scathing irony: if Haskell had a form of *partial evaluation*, we could write applicative combinators for *ordinary* functions [a] -> r and express average in constant space. In other words,
the partial
evaluation would make it unnecessary to reify case expressions for the purpose of controlling performance / space leaks.
Best regards, Heinrich Apfelmus
-- http://apfelmus.nfshost.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Work is punishment for failing to procrastinate effectively.
-- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/