
Hi guys, I tried for days now to figure out a solution that Luke Palmer has presented me with, by myself, I'm getting nowhere. He has kindly provided me with this code: import Data.Monoid newtype IntTrie a = IntTrie [a] deriving Show singleton :: (Monoid a) => Int -> a -> IntTrie a singleton ch x = IntTrie $ replicate ch mempty ++ [x] ++ repeat mempty lookupTrie :: IntTrie a -> Int -> a lookupTrie (IntTrie xs) n = xs !! n instance (Monoid a) => Monoid (IntTrie a) where mempty = IntTrie (repeat mempty) mappend (IntTrie xs) (IntTrie ys) = IntTrie (infZipWith mappend xs ys) infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys test = mconcat [singleton (n `mod` 42) [n] | n <- [0..]] `lookupTrie` 10 It's supposed to eventually help me group a list of key value pairs and then further process them in a linear (streaming like) way. The original list being something like [('a', 23), ('b', 18), ('a', 34) ...]. There are couple of techniques employed in this solution, but I'm just guessing here. The keywords I've been looking up so far: Memmoization, Deforestation, Single Pass, Linear Map and some others. Can someone please fill me in? Günther

On Thu, Mar 26, 2009 at 12:21 PM, GüŸnther Schmidt
Hi guys,
I tried for days now to figure out a solution that Luke Palmer has presented me with, by myself, I'm getting nowhere.
Sorry, I meant to respond earlier. They say you don't really understand something until you can explain it to a six year old. So trying to explain this to a colleague made me realize how little I must understand it :-). But I'll try by saying whatever come to mind... *Lazy* list processing is all about *right* associativity. We need to be able to output some information knowing that our input looks like a:b:c:..., where we don't know the ... I see IntTrie [a] as an infinite collection of lists (well, it is [[a]], after all :-), one for each integer. So I want to take a structure like this: (1,2):(3,4):(3,5):... And turn it into a structure like this: { 0 -> ... 1 -> 2:... 2 -> ... 3 -> 4:5:... ... } (This is just a list in my implementation, but I intended it to be a trie, ideally, which is why I wrote the keys explicitly) So the yet-unknown information at the tail of the list turns into yet-unknown information about the tails of the keys. In fact, if you replace ... with _|_, you get exactly the same thing (this is no coincidence!) The spine of this trie is maximally lazy: this is key. If the structure of the spine depended on the input data (as it does for Data.Map), then we wouldn't be able to process infinite data, because we can never get it all. So even making a trie out of the list _|_ gives us: { 0 -> _|_, 1 -> _|_, 2 -> _|_, ... } I.e. the keys are still there. Then we can combine two tries just by combining them pointwise (which is what infZipWith does). It is essential that the pattern matches on infZipWith are lazy. We're zipping together an infinite sequence of lists, and normally the result would be the length of the shortest one, which is unknowable. So the lazy pattern match forces the result ('s spine) to be infinite. Umm... yeah, that's a braindump. Sorry I couldn't be more helpful. I'm happy to answer any specific questions. Luke
He has kindly provided me with this code:
import Data.Monoid
newtype IntTrie a = IntTrie [a] deriving Show
singleton :: (Monoid a) => Int -> a -> IntTrie a singleton ch x = IntTrie $ replicate ch mempty ++ [x] ++ repeat mempty
lookupTrie :: IntTrie a -> Int -> a lookupTrie (IntTrie xs) n = xs !! n
instance (Monoid a) => Monoid (IntTrie a) where mempty = IntTrie (repeat mempty) mappend (IntTrie xs) (IntTrie ys) = IntTrie (infZipWith mappend xs ys)
infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys
test = mconcat [singleton (n `mod` 42) [n] | n <- [0..]] `lookupTrie` 10
It's supposed to eventually help me group a list of key value pairs and then further process them in a linear (streaming like) way.
The original list being something like [('a', 23), ('b', 18), ('a', 34) ...].
There are couple of techniques employed in this solution, but I'm just guessing here.
The keywords I've been looking up so far:
Memmoization, Deforestation, Single Pass, Linear Map and some others.
Can someone please fill me in?
Günther
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Dear Luke, let me thank you first of all for your response, period, and let me also assure you that your efforts are not (totally) in vain. :) Of course I had all kinds of guesses what sort of dark arts you employed here, mostly really long shots (Memoization, CPS ...). And at the end of the day it turns out to be a lot simpler than that. You've basically chosen a data structure that can be consumed while it's still being built (due to Lazy Evaluation). I certainly haven't figured it all out and it will take a lot more time to play with Tries before I have. I'm already eager to sift through the other goodies that are on your blog once I have finished my app, very interesting stuff even if can't understand half of it. :) Would you happen to know a good starting point about tries? Günther Luke Palmer schrieb:
On Thu, Mar 26, 2009 at 12:21 PM, GüŸnther Schmidt
mailto:gue.schmidt@web.de> wrote: Hi guys,
I tried for days now to figure out a solution that Luke Palmer has presented me with, by myself, I'm getting nowhere.
Sorry, I meant to respond earlier.
They say you don't really understand something until you can explain it to a six year old. So trying to explain this to a colleague made me realize how little I must understand it :-).. But I'll try by saying whatever come to mind...
/Lazy/ list processing is all about /right/ associativity. We need to be able to output some information knowing that our input looks like a:b:c:..., where we don't know the ... I see IntTrie [a] as an infinite collection of lists (well, it is [[a]], after all :-), one for each integer. So I want to take a structure like this:
(1,2):(3,4):(3,5):...
And turn it into a structure like this:
{ 0 -> ... 1 -> 2:... 2 -> ... 3 -> 4:5:... ... }
(This is just a list in my implementation, but I intended it to be a trie, ideally, which is why I wrote the keys explicitly)
So the yet-unknown information at the tail of the list turns into yet-unknown information about the tails of the keys. In fact, if you replace .... with _|_, you get exactly the same thing (this is no coincidence!)
The spine of this trie is maximally lazy: this is key. If the structure of the spine depended on the input data (as it does for Data.Map), then we wouldn't be able to process infinite data, because we can never get it all. So even making a trie out of the list _|_ gives us:
{ 0 -> _|_, 1 -> _|_, 2 -> _|_, ... }
I.e. the keys are still there. Then we can combine two tries just by combining them pointwise (which is what infZipWith does). It is essential that the pattern matches on infZipWith are lazy. We're zipping together an infinite sequence of lists, and normally the result would be the length of the shortest one, which is unknowable. So the lazy pattern match forces the result ('s spine) to be infinite.
Umm... yeah, that's a braindump. Sorry I couldn't be more helpful. I'm happy to answer any specific questions.
Luke

Luke, does your explanation to Guenther have anything to do with
coinduction? -- the property that a producer gives a little bit of
output at each step of recursion, which a consumer can than crunch in
a lazy way?
I find that coinduction seems to figure frequently in algos that
process a stream.
So, Guenther, I'm not certain if coinduction figures in here yet but
it gives you another thing to google on. Co-induction seems to play
the same role for stream processing in haskell that tail recursiveness
plays in non-lazy languages
like lisp. That is, it's kind of the ideal to be striven for. Whereas
in haskell, tail recursive is frequently not the best thing because it
goes into a non-terminating state when there is an infinite data
structure which is crunched down to a finite one but at the wrong
point in the function pipeline.
see http://groups.google.com/group/fa.haskell/browse_thread/thread/4240bc7c7abd4d30/49f28f5a41519335?q="it+is+however+nicely+coinductive"#49f28f5a41519335
2009/3/26 Luke Palmer
On Thu, Mar 26, 2009 at 12:21 PM, GüŸnther Schmidt
wrote:y Hi guys,
I tried for days now to figure out a solution that Luke Palmer has presented me with, by myself, I'm getting nowhere.
Sorry, I meant to respond earlier.
They say you don't really understand something until you can explain it to a six year old. So trying to explain this to a colleague made me realize how little I must understand it :-). But I'll try by saying whatever come to mind...
Lazy list processing is all about right associativity. We need to be able to output some information knowing that our input looks like a:b:c:..., where we don't know the ... I see IntTrie [a] as an infinite collection of lists (well, it is [[a]], after all :-), one for each integer. So I want to take a structure like this:
(1,2):(3,4):(3,5):...
And turn it into a structure like this:
{ 0 -> ... 1 -> 2:... 2 -> ... 3 -> 4:5:... ... }
(This is just a list in my implementation, but I intended it to be a trie, ideally, which is why I wrote the keys explicitly)
So the yet-unknown information at the tail of the list turns into yet-unknown information about the tails of the keys. In fact, if you replace ... with _|_, you get exactly the same thing (this is no coincidence!)
The spine of this trie is maximally lazy: this is key. If the structure of the spine depended on the input data (as it does for Data.Map), then we wouldn't be able to process infinite data, because we can never get it all. So even making a trie out of the list _|_ gives us:
{ 0 -> _|_, 1 -> _|_, 2 -> _|_, ... }
I.e. the keys are still there. Then we can combine two tries just by combining them pointwise (which is what infZipWith does). It is essential that the pattern matches on infZipWith are lazy. We're zipping together an infinite sequence of lists, and normally the result would be the length of the shortest one, which is unknowable. So the lazy pattern match forces the result ('s spine) to be infinite.
Umm... yeah, that's a braindump. Sorry I couldn't be more helpful. I'm happy to answer any specific questions.
Luke
He has kindly provided me with this code:
import Data.Monoid
newtype IntTrie a = IntTrie [a] deriving Show
singleton :: (Monoid a) => Int -> a -> IntTrie a singleton ch x = IntTrie $ replicate ch mempty ++ [x] ++ repeat mempty
lookupTrie :: IntTrie a -> Int -> a lookupTrie (IntTrie xs) n = xs !! n
instance (Monoid a) => Monoid (IntTrie a) where mempty = IntTrie (repeat mempty) mappend (IntTrie xs) (IntTrie ys) = IntTrie (infZipWith mappend xs ys)
infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys
test = mconcat [singleton (n `mod` 42) [n] | n <- [0..]] `lookupTrie` 10
It's supposed to eventually help me group a list of key value pairs and then further process them in a linear (streaming like) way.
The original list being something like [('a', 23), ('b', 18), ('a', 34) ...].
There are couple of techniques employed in this solution, but I'm just guessing here.
The keywords I've been looking up so far:
Memmoization, Deforestation, Single Pass, Linear Map and some others.
Can someone please fill me in?
Günther
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Re that link: search for wren's comments containing "it is however
nicely coinductive"
2009/3/26 Thomas Hartman
Luke, does your explanation to Guenther have anything to do with coinduction? -- the property that a producer gives a little bit of output at each step of recursion, which a consumer can than crunch in a lazy way?
I find that coinduction seems to figure frequently in algos that process a stream.
So, Guenther, I'm not certain if coinduction figures in here yet but it gives you another thing to google on. Co-induction seems to play the same role for stream processing in haskell that tail recursiveness plays in non-lazy languages like lisp. That is, it's kind of the ideal to be striven for. Whereas in haskell, tail recursive is frequently not the best thing because it goes into a non-terminating state when there is an infinite data structure which is crunched down to a finite one but at the wrong point in the function pipeline.
2009/3/26 Luke Palmer
: On Thu, Mar 26, 2009 at 12:21 PM, GüŸnther Schmidt
wrote:y Hi guys,
I tried for days now to figure out a solution that Luke Palmer has presented me with, by myself, I'm getting nowhere.
Sorry, I meant to respond earlier.
They say you don't really understand something until you can explain it to a six year old. So trying to explain this to a colleague made me realize how little I must understand it :-). But I'll try by saying whatever come to mind...
Lazy list processing is all about right associativity. We need to be able to output some information knowing that our input looks like a:b:c:..., where we don't know the ... I see IntTrie [a] as an infinite collection of lists (well, it is [[a]], after all :-), one for each integer. So I want to take a structure like this:
(1,2):(3,4):(3,5):...
And turn it into a structure like this:
{ 0 -> ... 1 -> 2:... 2 -> ... 3 -> 4:5:... ... }
(This is just a list in my implementation, but I intended it to be a trie, ideally, which is why I wrote the keys explicitly)
So the yet-unknown information at the tail of the list turns into yet-unknown information about the tails of the keys. In fact, if you replace ... with _|_, you get exactly the same thing (this is no coincidence!)
The spine of this trie is maximally lazy: this is key. If the structure of the spine depended on the input data (as it does for Data.Map), then we wouldn't be able to process infinite data, because we can never get it all. So even making a trie out of the list _|_ gives us:
{ 0 -> _|_, 1 -> _|_, 2 -> _|_, ... }
I.e. the keys are still there. Then we can combine two tries just by combining them pointwise (which is what infZipWith does). It is essential that the pattern matches on infZipWith are lazy. We're zipping together an infinite sequence of lists, and normally the result would be the length of the shortest one, which is unknowable. So the lazy pattern match forces the result ('s spine) to be infinite.
Umm... yeah, that's a braindump. Sorry I couldn't be more helpful. I'm happy to answer any specific questions.
Luke
He has kindly provided me with this code:
import Data.Monoid
newtype IntTrie a = IntTrie [a] deriving Show
singleton :: (Monoid a) => Int -> a -> IntTrie a singleton ch x = IntTrie $ replicate ch mempty ++ [x] ++ repeat mempty
lookupTrie :: IntTrie a -> Int -> a lookupTrie (IntTrie xs) n = xs !! n
instance (Monoid a) => Monoid (IntTrie a) where mempty = IntTrie (repeat mempty) mappend (IntTrie xs) (IntTrie ys) = IntTrie (infZipWith mappend xs ys)
infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys
test = mconcat [singleton (n `mod` 42) [n] | n <- [0..]] `lookupTrie` 10
It's supposed to eventually help me group a list of key value pairs and then further process them in a linear (streaming like) way.
The original list being something like [('a', 23), ('b', 18), ('a', 34) ...].
There are couple of techniques employed in this solution, but I'm just guessing here.
The keywords I've been looking up so far:
Memmoization, Deforestation, Single Pass, Linear Map and some others.
Can someone please fill me in?
Günther
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Well Folks, I've been programming for almost a decade now and making a living off it for almost 8 years. To me programing in Haskell is sometimes quite a humbling experience, because I come to realize how shallow my ventures so far were. The depth this language has is just amazing and the stuff that is tackled in this language is just .... aaahhh. Can't quite put it in words, maybe something along the lines "the ultimate thing, key to the universe" I don't know. Humbling and frustrating especially when you come across the Lukes and they do to you what he did to me in about 12 lines of code. And then you come across the blogs of the Lukes and see all the other things where you stand in awe and your jaw drops. And you realize how far away from that you still are. And now you! Sry for that Thomas, my prescription ran out today. I'll check on the link first thing once I got a refill. Günther Karl-Heeeeinz, mei Drobbe!

Gü?nther Schmidt wrote:
The depth this language has is just amazing and the stuff that is tackled in this language is just .... aaahhh. Can't quite put it in words, maybe something along the lines "the ultimate thing, key to the universe" I don't know.
Humbling and frustrating especially when you come across the Lukes and they do to you what he did to me in about 12 lines of code. And then you come across the blogs of the Lukes and see all the other things where you stand in awe and your jaw drops. And you realize how far away from that you still are.
If you enjoy jaw-dropping brain-exploding fun, have you seen Martin Escardo's work on exhaustively searching infinite spaces? http://math.andrej.com/2007/09/28/seemingly-impossible-functional-programs/ http://math.andrej.com/2008/11/21/a-haskell-monad-for-infinite-search-in-fin... -- Live well, ~wren

Thomas Hartman wrote:
Luke, does your explanation to Guenther have anything to do with coinduction? -- the property that a producer gives a little bit of output at each step of recursion, which a consumer can than crunch in a lazy way?
It has more to do with "tying the knot" (using laziness to define values in terms of themselves), though there are similarities. Take the function: infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys which we could write less clearly as: infZipWith f xxs yys = f (head xxs) (head yys) : infZipWith f (tail xxs) (tail yys) The trick is that we can emit the thunk for f(head xxs)(head yys) without knowing what values xxs and yys actually contain--- they could still be _|_! The hope is that by the time we get to where someone asks for the value of that thunk ---by that point--- we *will* know enough about xxs and yys that we can give an answer. For knot tying to work, we must ensure that we don't ask for things "from the future" before we've actually created them. If we do, then the function will diverge, i.e. _|_. This shares similarities with the ideal 1-to-1 producer/consumer role for deforestation (whence, similarities to coinduction). -- Live well, ~wren

On Thu, 26 Mar 2009, wren ng thornton wrote:
Thomas Hartman wrote:
Luke, does your explanation to Guenther have anything to do with coinduction? -- the property that a producer gives a little bit of output at each step of recursion, which a consumer can than crunch in a lazy way?
It has more to do with "tying the knot" (using laziness to define values in terms of themselves), though there are similarities. Take the function:
infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys
What about using a custom list type, that has only one constructor like (:), that is, a type for infinite lists?

On Fri, Mar 27, 2009 at 7:03 PM, Henning Thielemann < lemming@henning-thielemann.de> wrote:
On Thu, 26 Mar 2009, wren ng thornton wrote:
Thomas Hartman wrote:
Luke, does your explanation to Guenther have anything to do with coinduction? -- the property that a producer gives a little bit of output at each step of recursion, which a consumer can than crunch in a lazy way?
It has more to do with "tying the knot" (using laziness to define values in terms of themselves), though there are similarities. Take the function:
infZipWith f ~(x:xs) ~(y:ys) = f x y : infZipWith f xs ys
What about using a custom list type, that has only one constructor like (:), that is, a type for infinite lists?
Yes, that would be more correct. However, the lazy pattern match would still be necessary, because single-constructor types are lifted. And as long as you're doing that, you might as well go all the way to an infinite binary trie. Luke

On Sat, Mar 28, 2009 at 3:03 AM, Henning Thielemann < lemming@henning-thielemann.de> wrote:
What about using a custom list type, that has only one constructor like (:), that is, a type for infinite lists?
You mean Data.Stream?http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Stream Interesting that this module's zipWith does not do lazy pattern matching, even though it only supports infinite lists.

2009/3/26 Luke Palmer
The spine of this trie is maximally lazy: this is key. If the structure of the spine depended on the input data (as it does for Data.Map), then we wouldn't be able to process infinite data, because we can never get it all. So even making a trie out of the list _|_ gives us:
{ 0 -> _|_, 1 -> _|_, 2 -> _|_, ... }
I.e. the keys are still there. Then we can combine two tries just by combining them pointwise (which is what infZipWith does). It is essential that the pattern matches on infZipWith are lazy. We're zipping together an infinite sequence of lists, and normally the result would be the length of the shortest one, which is unknowable. So the lazy pattern match forces the result ('s spine) to be infinite.
It's also important that (++) is non-strict in its second argument.
Using infZipWith with (+) requires examining the entire input before
getting any output.
Unfortunately, Günther's original post indicated that he wanted the
*sum* of the values for each key. Unless you're using non-strict
natural numbers, that pretty much requires examining the entire input
list before producing any output.
--
Incidentally, this also works (in that it produces the right answers):
type MultiMap k v = k -> [v]
-- The Monoid instance is pre-defined; here's the relevant code:
-- mappend map1 map2 = \k -> map1 k ++ map2 k
singleton :: Eq k => k -> v -> MultiMap k v
singleton k v = \k' -> if k == k' then [v] else []
test = mconcat [ singleton (n `mod` 42) n | n <- [0..]] 10
Or, using insert instead of singleton/union,
insert :: k -> v -> MultiMap k v -> MultiMap k v
insert k v map = \k' -> if k == k' then v : map k' else map k'
fromList :: Eq k => [(k,v)] -> MultiMap k v
fromList = foldr (\(k,v) -> insert k v) (const [])
Note that insert is non-strict in its third argument, meaning that
foldr can return an answer immediately.
--
Dave Menendez

Guys, I appreciate the contribution but I'm now more confused than ever :) On the Co-Induction-whatever thingy: Thanks, that's certainly something interesting and I'll try to find out more about it. But it looks like none of all the proposals offers a true solution, at least none from start to finish. The amount of data is large, but finite. So for now, I'll just stick with my initial approach as I'm running out of time. Günther

How do you plan to consume the running sums of these data? How many
different keys do you have?
If all pure stuff fails, you can always revert to imperative techniques,
using a mutable
arrayhttp://www.haskell.org/ghc/docs/6.6.1/html/libraries/base/Data-Array-ST.htmlor
something :)
On Fri, Mar 27, 2009 at 3:18 PM, GüŸnther Schmidt
Guys,
I appreciate the contribution but I'm now more confused than ever :)
On the Co-Induction-whatever thingy: Thanks, that's certainly something interesting and I'll try to find out more about it.
But it looks like none of all the proposals offers a true solution, at least none from start to finish. The amount of data is large, but finite.
So for now, I'll just stick with my initial approach as I'm running out of time.
Günther
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (8)
-
David Menendez
-
Gü?nther Schmidt
-
Günther Schmidt
-
Henning Thielemann
-
Luke Palmer
-
Peter Verswyvelen
-
Thomas Hartman
-
wren ng thornton