
Dear Haskell Cafe,

When programming the other day I ran into this problem. What I want to do is a function that would work like this:

splitStreams :: Ord a => [(a,b)] -> [(a,[b])]

splitStreams [(3,x),(1,y),(3,z),(2,w)]
[(3,[x,z]),(1,[y]),(2,[w])]

I don't care about the order that the pairs are output, so the answer could just as well be [(2,[w]),(3,[x,z]),(1,[y])]. However, I do care about the order of the x, y, z, w values, so (3,[z,x]) could not be part of the solution in this example. Furthermore, it should work on infinite lists: it can't eat the whole list before producing any output.

Now, it's not too hard to come up with an inefficient solution that traverses the input list multiple times. For example, a sieving solution:

import Data.List

splitStreams [] = []
splitStreams ((channel,msg):rest) =
    let (myMsgs, otherMsgs) = partition (\(c,_) -> c == channel) rest
    in  (channel, msg : map snd myMsgs) : splitStreams otherMsgs

I'm afraid this algorithm is O(n*m) time in the worst case, where n is the length of the input list and m is the number of unique channels. But is there any way to write it such that each element is touched only once? Or at least an O(n*log(m)) algorithm?

Any takers?

/ Magnus Jonsson
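As a rough check, assuming the sieving definition above is loaded into GHCi, one would expect the first channel to stream even on infinite input, since partition is lazy:

*Main> take 3 $ snd $ head $ splitStreams (cycle [(1,'a'),(2,'b')])
"aaa"

Each further channel still costs another pass over the remaining input, which is where the O(n*m) worst case comes from.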

Magnus Jonsson wrote:
Dear Haskell Cafe,
When programming the other day I ran into this problem. What I want to do is a function that would work like this:
splitStreams::Ord a=>[(a,b)]->[(a,[b])]
splitStreams [(3,x),(1,y),(3,z),(2,w)]
[(3,[x,z]),(1,[y]),(2,[w])]
An O(n log(n)) algorithm is easy if you use Data.Map:
import qualified Data.Map as Map
splitStreamsMap :: Ord a => [(a,b)] -> Map.Map a [b]
splitStreamsMap = foldl add Map.empty
  where add (a,b) m = Map.insertWith (++) a [b]
splitStreams = Map.fromList . splitStreamsMap
Twan

Nice try Twan but your example fails on infinite lists. I cleaned up your example so that it compiles:

import qualified Data.Map as Map

splitStreamsMap :: Ord a => [(a,b)] -> Map.Map a [b]
splitStreamsMap = foldl add Map.empty
  where add m (a,b) = Map.insertWith (++) a [b] m

splitStreams :: Ord a => [(a,b)] -> [(a,[b])]
splitStreams = Map.toList . splitStreamsMap

It fails to return a value on this test:

take 2 $ snd $ head $ splitStreams (map (\x -> (0,x)) [1..])

/ Magnus

On Thu, 14 Sep 2006, Twan van Laarhoven wrote:
Magnus Jonsson wrote:
Dear Haskell Cafe,
When programming the other day I ran into this problem. What I want to do is a function that would work like this:
splitStreams::Ord a=>[(a,b)]->[(a,[b])]
splitStreams [(3,x),(1,y),(3,z),(2,w)]
[(3,[x,z]),(1,[y]),(2,[w])]
An O(n log(n)) algorithm is easy if you use Data.Map:
import qualified Data.Map as Map
splitStreamsMap :: Ord a => [(a,b)] -> Map.Map a [b]
splitStreamsMap = foldl add Map.empty
  where add (a,b) m = Map.insertWith (++) a [b]
splitStreams = Map.fromList . splitStreamsMap
Twan

Magnus Jonsson
splitStreams::Ord a=>[(a,b)]->[(a,[b])]
splitStreams [(3,x),(1,y),(3,z),(2,w)]
[(3,[x,z]),(1,[y]),(2,[w])]
[...]
But is there any way to write it such that each element is touched only once? Or at least an O(n*log(m)) algorithm?
I guess it should be possible to use a quicksort-like algorithm, splitting the stream into three streams with keys less than, equal, and greater than the first element, and recurse on the less/equal streams?

-k
--
If I haven't seen further, it is by standing in the footprints of giants

On Thu, 14 Sep 2006, Ketil Malde wrote:
Magnus Jonsson writes:

splitStreams :: Ord a => [(a,b)] -> [(a,[b])]

splitStreams [(3,x),(1,y),(3,z),(2,w)]
[(3,[x,z]),(1,[y]),(2,[w])]
[...]
But is there any way to write it such that each element is touched only once? Or at least an O(n*log(m)) algorithm?
I guess it should be possible to use a quicksort-like algorithm, splitting the stream into three streams with keys less than, equal, and greater than the first element, and recurse on the less/equal streams?
-k
That is probably the best we can do. I think in a hypothetical concurrent Haskell with futures/promises, the split stream problem could be solved. But if it can be solved in regular Haskell, I would be pleasantly surprised. / Magnus

Magnus Jonsson writes:

splitStreams :: Ord a => [(a,b)] -> [(a,[b])]

splitStreams [(3,x),(1,y),(3,z),(2,w)]
[(3,[x,z]),(1,[y]),(2,[w])]
[...]
But is there any way to write it such that each element is touched only once? Or at least an O(n*log(m)) algorithm?
I guess it should be possible to use a quicksort-like algorithm, splitting the stream into three streams with keys less than, equal, and greater than the first element, and recurse on the less/equal streams?
something like this, then?

splitStreams :: Ord a => [(a, b)] -> [(a, [b])]
splitStreams [] = []
splitStreams ((a, b) : t) = (a, b : bs) : splitStreams t'
  where
    (bs, t') = foldr f ([], []) t
    f z@(a', b') (bs, t') = if a' == a then (b' : bs, t') else (bs, z : t')

(i guess i should use a strictness '!' for (xs, ys), but i am still running ghc-6.5, and i don't like the case constructions that much. does this bite me here? i didn't check, really.)

who can tune this any further?

cheers, m.
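On the strictness question, a minimal sketch of one guess (an assumption, not a tested answer): the plain tuple pattern in f forces the result of the recursive foldr before anything can be returned, so nothing comes out on an infinite tail; an irrefutable pattern (a '~' rather than a '!') would restore the laziness of partition:

splitStreams :: Ord a => [(a, b)] -> [(a, [b])]
splitStreams [] = []
splitStreams ((a, b) : t) = (a, b : bs) : splitStreams t'
  where
    (bs, t') = foldr f ([], []) t
    -- the lazy pattern keeps f from forcing the rest of the fold
    f z@(a', b') ~(ys, zs)
      | a' == a   = (b' : ys, zs)
      | otherwise = (ys, z : zs)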

Hello,

I think it is possible to do it in O(n) (if you change the representation of the output stream). Note that the solution is quite similar to Twan van Laarhoven's, and it should be trivial to use "Map" instead of "Rel". Here is my take on it.

The type of the output stream:
type Rel a b = a -> [b]
Here are cons and nil:
cons :: Eq a => (a,b) -> Rel a b -> Rel a b
cons (x,y) l z
  | x == z    = y : l x
  | otherwise = l z
nil :: Rel a b
nil _ = []
and a lookUp function:
lookUp :: Rel a b -> a -> [b]
lookUp = id
Finally the splitStreams algorithm:
splitStreams :: Eq a => [(a,b)] -> Rel a b
splitStreams = foldr cons nil
Here is a test with finite lists:
fin = splitStreams [(3,'x'),(1,'y'),(3,'z'),(2,'w')]
and here are the console tests:

*General7> lookUp fin 1
"y"
*General7> lookUp fin 2
"w"
*General7> lookUp fin 3
"xz"

Now with infinite lists:
inf = splitStreams (map (\x -> (0, x)) [1..])
and here is a case where it terminates with infinite lists:

*General7> take 10 (lookUp inf 0)
[1,2,3,4,5,6,7,8,9,10]

Cheers,
Bruno Oliveira

Thanks Bruno. However I think this is still O(n*m). As far as I understand your code, it is a long-winded way to say "[b | (a,b) <- input, a == myChannelOfInterest]". This is fine if you are only interested in one channel, and for that case I agree it's O(n), but if you are interested in many different channels, it will take O(n*m). Here's why I think your code is equivalent to using a filter/list-comprehension:
cons :: Eq a => (a,b) -> Rel a b -> Rel a b
cons (x,y) l z
  | x == z    = y : l x
  | otherwise = l z
z is the channel that the user is interested in. x is the channel of the current message in the list. y is the message content. l is a function for searching the rest of the list.

For each element of the list you create a function that compares the requested channel to a (in (a,b)). If it's the same, you return that element followed by a continued search down the list. If you don't find it, you forward the request to the next function down the list. That's exactly the same as what the filter function does.

It is possible I made a mistake somewhere and if I did, let me know.

/ Magnus

On Thu, 14 Sep 2006, Bruno Oliveira wrote:
Hello,
I think it is possible to do it in O(n) (if you change the representation of the output stream).
Note that the solution is quite similar to Twan van Laarhoven's, and it should be trivial to use "Map" instead of "Rel".
Here is my take on it:
The type of the output stream:
type Rel a b = a -> [b]
Here are cons and nil:
nil :: Rel a b
nil _ = []
and a lookUp function:
lookUp :: Rel a b -> a -> [b]
lookUp = id
Finally the splitStreams algorithm:
splitStreams :: Eq a => [(a,b)] -> Rel a b
splitStreams = foldr cons nil
Here is a test with finite lists:
fin = splitStreams [(3,'x'),(1,'y'),(3,'z'),(2,'w')]
and here are the console tests:
*General7> lookUp fin 1
"y"
*General7> lookUp fin 2
"w"
*General7> lookUp fin 3
"xz"
Now with infinite lists:
inf = splitStreams (map (\x -> (0, x)) [1..])
and here a case where it terminates with infinite lists:
*General7> take 10 (lookUp inf 0)
[1,2,3,4,5,6,7,8,9,10]
Cheers,
Bruno Oliveira

Hello Magnus,

You are right. Silly me. I was thinking of Rel as a kind of Hashtable. In this case, I think it should be possible to have O(n), since "cons" would only need constant time.

Cheers,
Bruno

On Thu, 14 Sep 2006 13:11:05 -0400 (EDT), Magnus Jonsson wrote:
Thanks Bruno.
However I think this is still O(n*m). As far as I understand your code, it is a long-winded way to say "[b | (a,b) <- input, a == myChannelOfInterest]". This is fine if you are only interested in one channel, and for that case I agree it's O(n), but if you are interested in many different channels, it will take O(n*m). Here's why I think your code is equivalent to using a filter/list-comprehension:
cons :: Eq a => (a,b) -> Rel a b -> Rel a b
cons (x,y) l z
  | x == z    = y : l x
  | otherwise = l z
z is the channel that the user is interested in. x is the channel of the current message in the list. y is the message content. l is a function for searching the rest of the list.
For each element of the list you create a function that compares the requested channel to a (in (a,b)). If it's the same, you return that element followed by a continued search down the list. If you don't find it, you forward the request to the next function down the list. That's exactly the same as what the filter function does.
It is possible I made a mistake somewhere and if I did, let me know.
/ Magnus
On Thu, 14 Sep 2006, Bruno Oliveira wrote:
Hello,
I think it is possible to do it in O(n) (if you change the representation of the output stream).
Note that the solution is quite similar to Twan van Laarhoven's, and it should be trivial to use "Map" instead of "Rel".
Here is my take on it:
The type of the output stream:
type Rel a b = a -> [b]
Here are cons and nil:
nil :: Rel a b
nil _ = []
and a lookUp function:
lookUp :: Rel a b -> a -> [b]
lookUp = id
Finally the splitStreams algorithm:
splitStreams :: Eq a => [(a,b)] -> Rel a b
splitStreams = foldr cons nil
Here is a test with finite lists:
fin = splitStreams [(3,'x'),(1,'y'),(3,'z'),(2,'w')]
and here are the console tests:
*General7> lookUp fin 1
"y"
*General7> lookUp fin 2
"w"
*General7> lookUp fin 3
"xz"
Now with infinite lists:
inf = splitStreams (map (\x -> (0, x)) [1..])
and here a case where it terminates with infinite lists:
*General7> take 10 (lookUp inf 0)
[1,2,3,4,5,6,7,8,9,10]
Cheers,
Bruno Oliveira

On Wed, 13 Sep 2006, Magnus Jonsson wrote:
When programming the other day I ran into this problem. What I want to do is a function that would work like this:
splitStreams::Ord a=>[(a,b)]->[(a,[b])]
splitStreams [(3,x),(1,y),(3,z),(2,w)]
[(3,[x,z]),(1,[y]),(2,[w])]
Interestingly we use such a routine in Haskore for splitting a sequence of notes into sequences of notes of equal instruments. It's implemented in much the same way as your version. It's 'slice' in http://darcs.haskell.org/haskore/src/Haskore/Basic/TimeOrderedList.lhs but it is a bit more complicated because it has to manage time information.

On Thu, 14 Sep 2006, Henning Thielemann wrote:
Interestingly we use such a routine in Haskore for splitting a sequence of notes into sequences of notes of equal instruments. It's implemented in much the same way as your version.
It's 'slice' in http://darcs.haskell.org/haskore/src/Haskore/Basic/TimeOrderedList.lhs but it is a bit more complicated because it has to manage time information.
Now even more interestingly, my program also deals with music! :) I'm generating microtonal midi files. I use it for very much the same purpose as you do (although my program is not yet finished).

- Magnus

On Thu, 14 Sep 2006, Magnus Jonsson wrote:
On Thu, 14 Sep 2006, Henning Thielemann wrote:
Interestingly we use such a routine in Haskore for splitting a sequence of notes into sequences of notes of equal instruments. It's implemented in much the same way as your version.
It's 'slice' in http://darcs.haskell.org/haskore/src/Haskore/Basic/TimeOrderedList.lhs but it is a bit more complicated because it has to manage time information.
Now even more interestingly, my program also deals with music! :) I'm generating microtonal midi files. I use it for very much the same purpose as you do (although my program is not yet finished).
Is it something we could and should add to Haskore? "microtonal" -- I recently read this term here: http://www.math.tu-dresden.de/~mutabor/

On Thu, 14 Sep 2006, Henning Thielemann wrote:
On Thu, 14 Sep 2006, Magnus Jonsson wrote:
Now even more interestingly, my program also deals with music! :) I'm generating microtonal midi files. I use it for very much the same purpose as you do (although my program is not yet finished).
Is it something we could and should add to Haskore?
If Haskore could support microtones that would make this world a slightly better world for me. Here are the basic things you need to support microtonal music:

- Pitch representations would have to be able to express any pitch.
- One appealing approach is to represent a pitch directly as its frequency.
- Probably the most useful representation though is a base pitch, say one of C,D,E,F,G,A,B, followed by a list of accidentals that modify the pitch. The user should be able to define his own base pitches and accidentals, in terms of cents or frequency ratios or something similar.
- Generating microtonal midi files requires that you add pitch-bend messages before all notes. That restricts each midi channel to only being able to play one note at a time. This is a big deficiency in the midi protocol imo.

/ Magnus

An easier and better way to support microtonal music in Haskore is to use the csound back-end instead of MIDI. I'd be happy to help someone develop such a thing if interested.

-Paul

Magnus Jonsson wrote:
On Thu, 14 Sep 2006, Henning Thielemann wrote:
On Thu, 14 Sep 2006, Magnus Jonsson wrote:
Now even more interestingly, my program also deals with music! :) I'm generating microtonal midi files. I use it for very much the same purpose as you do (although my program is not yet finished).
Is it something we could and should add to Haskore?
If Haskore could support microtones that would make this world a slightly better world for me. Here are the basic things you need to support microtonal music:
- Pitch representations would have to be able to express any pitch.
- One appealing approach is to represent a pitch directly as its frequency.
- Probably the most useful representation though is a base pitch, say one of C,D,E,F,G,A,B, followed by a list of accidentals that modify the pitch. The user should be able to define his own base pitches and accidentals, in terms of cents or frequency ratios or something similar.
- Generating microtonal midi files requires that you add pitch-bend messages before all notes. That restricts each midi channel to only being able to play one note at a time. This is a big deficiency in the midi protocol imo.
/ Magnus

Magnus Jonsson wrote:
splitStreams::Ord a=>[(a,b)]->[(a,[b])]
splitStreams [(3,x),(1,y),(3,z),(2,w)] [(3,[x,z]),(1,[y]),(2,[w])]
I'm afraid this algorithm is O(n*m) time in the worst case, where n is the length of the input list, and m is the number of unique channels.
But is there any way to write it such that each element is touched only once? Or at least an O(n*log(m)) algorithm?
This can be done. It's an interesting puzzle to make it work for infinite lists. For finite lists, sortBy + groupBy + map easily do the job. The problem is dealing with holes in data structures in Haskell, which are to be filled in later. This can be done by providing blueprints for them. Let's define a basic map:
data Map k a = Node k a (Map k a) (Map k a) | Leaf
We will use Map k () as blueprints - the () indicate where the holes are. Next we define a lookup function, which takes a blueprint and evaluates the correct hole in a second map:
lookup :: Ord k => k -> Map k () -> Map k a -> Maybe a
lookup _ Leaf _ = Nothing
lookup k (Node k' _ l r) ~(Node _ a' l' r') = case compare k k' of
    LT -> lookup k l l'
    EQ -> Just a'
    GT -> lookup k r r'
As you can see, the structure of the second argument is forced by the first argument. The lazy pattern assures that we don't look at the second argument too early. In a similar fashion, we can define an update function:
update :: Ord k => k -> (a -> a) -> Map k x -> Map k a -> Map k a
update k f (Node k' _ l r) ~(Node _ a' l' r') = case compare k k' of
    LT -> Node k' a' (update k f l l') r'
    EQ -> Node k' (f a') l' r'
    GT -> Node k' a' l' (update k f r r')
Next comes insert. Insert takes a blueprint and inserts a key into it. It also takes a map and removes the corresponding hole from it. To simplify the code it does no balancing. [1]
insert :: Ord k => k -> Map k () -> Map k a -> (Map k (), Map k a)
insert k Leaf _ = (Node k () Leaf Leaf, Leaf)
insert k (Node k' _ l r) ~(Node _ a' l' r') = case compare k k' of
    LT -> let (m, m') = insert k l l' in (Node k' () m r, Node k' a' m' r')
    EQ -> error "inserting existing element"
    GT -> let (m, m') = insert k r r' in (Node k' () l m, Node k' a' l' m')
We also need a map, defined in the usual fashion, without the blueprint. A version with blueprint can also be defined, but it isn't necessary for our problem:
map :: (a -> b) -> Map k a -> Map k b
map _ Leaf = Leaf
map f (Node k a l r) = Node k (f a) (map f l) (map f r)
We can now build the splitStream function, using the following helper function:
splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b])
splitSeq' bp [] = ([], map (const []) bp)
splitSeq' bp ((a,b):xs) = case lookup a bp bp of
    Just _ -> let (l, m) = splitSeq' bp xs
              in  (l, update a (b:) bp m)
    _      -> let (bp', m) = insert a bp m'
                  (l, m')  = splitSeq' bp' xs
              in  ((a, b : (fromJust $ lookup a bp' m')) : l, m)
splitSeq' takes a blueprint for a map with all keys seen so far, and a list tail. It returns the result list for all new keys, and a map (corresponding to the given blueprint) with the tails of the seen elements.

In the base case it just fills the blueprint with empty lists and returns to the caller.

If a new element is seen, insert is used to create a new blueprint, including the new key a, which is passed to the recursive call of splitSeq'. The resulting map from that call is threaded back into insert, which gives us a new map without the a key that matches the structure of the original blueprint.

Finally we can define splitSeq as follows:
splitSeq :: Ord a => [(a,b)] -> [(a,[b])]
splitSeq = fst . splitSeq' Leaf
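For reference, gluing the snippets above into a single module presumably needs a header along these lines (fromJust comes from Data.Maybe, and the local lookup and map shadow the Prelude versions):

module SplitSeq where

import Prelude hiding (lookup, map)
import Data.Maybe (fromJust)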
A quick test:

*Main> let s = splitSeq ([(3,'x'),(1,'y'),(3,'z'),(2,'w')] ++ repeat (4,' '))
*Main> s !! 0
(3,"xzInterrupted. (I pressed ^C)
*Main> s !! 2
(2,"wInterrupted. (ditto)
*Main> s !! 3
(4," ...

Is there a simpler way to do this? I don't know. I'm also unsure whether it is a good idea - unless you use several threads to process the list, the code will produce a lot of thunks, and eat a lot of memory. The code above provides a maximum amount of laziness while using O(n log m) time. Depending on the circumstances, processing the list in chunks and using techniques like the above to combine the results will be better.

enjoy,
Bertram

[1] Balancing can be done with the information in the blueprint, and mapping back is easily done by doing the transformation on the tree in reverse.

Here's my attempt, also using the quicksort idea, but using two passes instead of tying the knot:
import Data.Set hiding (map)
First a binary search tree, with a lookup function:
data Tree k v = Node (Tree k v) k v (Tree k v)
get :: Ord a => a -> Tree a b -> b
get a (Node l k v r) = case compare a k of
    LT -> get a l
    EQ -> v
    GT -> get a r
There's no empty case: we'll ensure that we only search for keys that are in the tree. Now to make a tree of lists from the list of pairs:
mkTree :: Ord a => [(a,b)] -> Tree a [b]
mkTree [] = error "unused"
mkTree ((a,b):abs) = Node l a (b:bs) r
  where
    l  = mkTree [(a',b') | (a',b') <- abs, a' < a]
    r  = mkTree [(a',b') | (a',b') <- abs, a' > a]
    bs = [b' | (a',b') <- abs, a' == a]
It remains to extract from this tree the list of second elements corresponding to each distinct first element in the input list:
splitStreams :: Ord a => [(a,b)] -> [(a,[b])]
splitStreams abs = [(a, get a t) | a <- uniq (map fst abs)]
  where t = mkTree abs
where uniq computes the list of unique elements of a list:
uniq :: Ord a => [a] -> [a]
uniq = u empty
  where
    u s [] = []
    u s (x:xs)
      | member x s = u s xs
      | otherwise  = x : u (insert x s) xs
There's no rebalancing, so it suffers from the same problems as quicksort.
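A quick sanity check, assuming the definitions above are loaded into GHCi, would look something like:

*Main> let s = splitStreams ([(3,'x'),(1,'y'),(3,'z'),(2,'w')] ++ repeat (4,' '))
*Main> map fst (take 4 s)
[3,1,2,4]
*Main> take 2 (snd (s !! 0))
"xz"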

I wrote:
[1] Balancing can be done with the information in the blueprint, and mapping back is easily done by doing the transformation on the tree in reverse.
I should add that this possibility was the main reason for dealing with blueprints at all. As Ross Paterson's solution shows, it's possible to get simpler code without balancing the tree. regards, Bertram

Thanks Bertram for your solution. I have difficulty understanding it so I can't comment on it right now, but I'll try to have a look at it. Do you know of any article that explains this technique, starting from very simple examples?

/Magnus

On Thu, 14 Sep 2006, Bertram Felgenhauer wrote:
I wrote:
[1] Balancing can be done with the information in the blueprint, and mapping back is easily done by doing the transformation on the tree in reverse.
I should add that this possibility was the main reason for dealing with blueprints at all. As Ross Paterson's solution shows, it's possible to get simpler code without balancing the tree.
regards,
Bertram

Magnus Jonsson wrote:
Thanks Bertram for your solution. I have difficulty understanding it so I can't comment on it right now but I'll try to have a look at it. Do you know of any article that explains this technique, starting from very simple examples?
Not really. It's a result of thinking about lazy evaluation, and especially lazy patterns (and let bindings) for some time. A wiki article that helped me a lot to understand these is

http://www.haskell.org/hawiki/TyingTheKnot

I'd like to point out the trustList function there which uses the idea of encoding the structure of a term and its actual values in different arguments, i.e. a blueprint.

HTH,
Bertram

On Thu, 14 Sep 2006, Bertram Felgenhauer wrote:
Magnus Jonsson wrote:
Thanks Bertran for your solution. I have difficulty understanding it so I can't comment on it right now but I'll try to have a look at it. Do you know of any article that explains this technique, starting from very simple examples?
Not really. It's a result of thinking about lazy evaluation, and especially lazy patterns (and let bindings) for some time. A wiki article that helped me a lot to understand these is
http://www.haskell.org/hawiki/TyingTheKnot
I'd like to point out the trustList function there which uses the idea of encoding the structure of a term and its actual values in different arguments, i.e. a blueprint.
Maybe you could add a new item "Blueprints" to Haskell Idioms.

On Thu, Sep 14, 2006 at 07:51:59PM +0200, Bertram Felgenhauer wrote:
It's a result of thinking about lazy evaluation, and especially lazy patterns (and let bindings) for some time. A wiki article that helped me a lot to understand these is
http://www.haskell.org/hawiki/TyingTheKnot
I'd like to point out the trustList function there which uses the idea of encoding the structure of a term and its actual values in different arguments, i.e. a blueprint.
One view of your device is as separating the shape (blueprint) from the contents, e.g. one can split a finite map type

data Map k a = Node !Int k a (Map k a) (Map k a) | Leaf

into a pair of types

data MapShape k = SNode !Int k (MapShape k) (MapShape k) | SLeaf
data MapData a = DNode a (MapData a) (MapData a) | DLeaf

(We don't even need DLeaf, as it's never examined.)

We need functions

fill :: a -> MapShape k -> MapData a

update :: Ord k => k -> (a -> a) -> MapShape k -> MapData a -> MapData a

insert :: Ord k => k -> MapShape k -> (MapShape k, MapData a -> (a, MapData a))

In a dependently typed language we could parameterize MapData with shapes to give more precise types:

fill :: a -> forall s :: MapShape k. MapData s a

update :: Ord k => k -> (a -> a) -> forall s :: MapShape k. MapData s a -> MapData s a

insert :: Ord k => k -> forall s :: MapShape k. (exists s' :: MapShape k, MapData s' a -> (a, MapData s a))

hinting that the function returned by insert reverses the effect of the insertion. It's not possible to code this with GADTs, because existentials are incompatible with irrefutable patterns, at least in GHC.

Anyway, splitSeq then becomes

splitSeq :: Ord a => [(a,b)] -> [(a,[b])]
splitSeq = fst . splitSeq' SLeaf

splitSeq' :: Ord a => MapShape a -> [(a,b)] -> ([(a,[b])], MapData [b])
splitSeq' bp [] = ([], fill [] bp)
splitSeq' bp ((a,b):xs)
    | member a bp = let (l, m) = splitSeq' bp xs
                    in  (l, update a (b:) bp m)
    | otherwise   = let (bp', extract) = insert a bp
                        (l, m') = splitSeq' bp' xs
                        (bs, m) = extract m'
                    in  ((a, b:bs) : l, m)

Again, no knot-tying is required.

To prove that (even for partial and infinite lists ps)

splitSeq ps = [(a, seconds a ps) | a <- nub ps]

where

seconds :: Eq a => a -> [(a,b)] -> [b]
seconds a ps = [b | (a', b) <- ps, a' == a]

we need another function

get :: Ord k => k -> MapShape k -> MapData a -> a

and the properties

member k s => get k s (fill v s) = v

member k s => get k s (update k f s m) = f (get k s m)

member k s /\ member x s /\ x /= k => get x s (update k f s m) = get x s m

not (member k s) /\ insert k s = (s', extract) => member x s' = member x s || x == k

not (member k s) /\ insert k s = (s', extract) => fst (extract m) = get k s' m

not (member k s) /\ insert k s = (s', extract) /\ member x s => get x s (snd (extract m)) = get x s' m

Then we can establish, by induction on the input list,

(1) fst (splitSeq' s ps) = [(a, seconds a ps) | a <- nub ps, not (member a s)]

(2) member x s => get x s (snd (splitSeq' s ps)) = seconds x ps

This is enough to sustain the induction and obtain the desired property, but it's a bit inelegant not to have an exact characterization of the second component. It seems essential to observe the map data only through get.
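For concreteness, a minimal sketch of two of these operations on the split representation, assuming the obvious recursive definitions and ignoring balancing (the SLeaf case of update is omitted because update is only applied to keys known to be present):

-- fill every hole of a shape with the same value
fill :: a -> MapShape k -> MapData a
fill _ SLeaf           = DLeaf
fill v (SNode _ _ l r) = DNode v (fill v l) (fill v r)

-- navigate by the shape, lazily rebuilding the data tree
update :: Ord k => k -> (a -> a) -> MapShape k -> MapData a -> MapData a
update k f (SNode _ k' l r) ~(DNode a l' r') = case compare k k' of
    LT -> DNode a (update k f l l') r'
    EQ -> DNode (f a) l' r'
    GT -> DNode a l' (update k f r r')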

ross:
On Thu, Sep 14, 2006 at 07:51:59PM +0200, Bertram Felgenhauer wrote:
It's a result of thinking about lazy evaluation, and especially lazy patterns (and let bindings) for some time. A wiki article that helped me a lot to understand these is
http://www.haskell.org/hawiki/TyingTheKnot
I'd like to point out the trustList function there which uses the idea of encoding the structure of a term and its actual values in different arguments, i.e. a blueprint.
One view of your device is as separating the shape (blueprint) from the contents, e.g. one can split a finite map type
data Map k a = Node !Int k a (Map k a) (Map k a) | Leaf
into a pair of types
data MapShape k = SNode !Int k (MapShape k) (MapShape k) | SLeaf data MapData a = DNode a (MapData a) (MapData a) | DLeaf
... Nice description. Ross, I added a wiki page for this technique. Would you like to either elaborate on the wiki, or include the text of your email to it? http://haskell.org/haskellwiki/Separating_shape_and_content Cheers, Don

On Mon, Sep 18, 2006 at 12:23:11AM +0100, Ross Paterson wrote:
To prove that (even for partial and infinite lists ps)
splitSeq ps = [(a, seconds a ps) | a <- nub ps]
[...] we can establish, by induction on the input list,
(1) fst (splitSeq' s ps) = [(a, seconds a ps) | a <- nub ps, not (member a s)] (2) member x s => get x s (snd (splitSeq' s ps)) = seconds x ps
Oops, nub ps should be nub (map fst ps) in each case.

Bertram Felgenhauer wrote:
splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b])
splitSeq' bp [] = ([], map (const []) bp)
splitSeq' bp ((a,b):xs) = case lookup a bp bp of
    Just _ -> let (l, m) = splitSeq' bp xs
              in  (l, update a (b:) bp m)
    _      -> let (bp', m) = insert a bp m'
                  (l, m')  = splitSeq' bp' xs
              in  ((a, b : (fromJust $ lookup a bp' m')) : l, m)
splitSeq' takes a blueprint for a map with all keys seen so far, and a list tail. It returns the result list for all new keys, and a map (corresponding to the given blueprint) with the tails of the seen elements.
The in the base case it just fills the blueprint with empty lists and returns to the caller.
If a new element is seen, insert is used to create a new blueprint, including the new key a, which is passed to the recursive call of splitSeq'. The resulting map from that call is threaded back into insert, which gives us a new map without the a key which matches the structure of the original blueprint.
Very interesting! So the map with the seen tails matches the blueprint and therefore throws away information (the future keys) from the future. This is what effectively allows the key-tree structure to be rebalanced. If one returned the tails-map with all future keys, it would be _|_ because the key-order in the tree depends on the future keys and changes every time a new key is found.

I thought that there could only be a static solution, i.e. like the one Ross Paterson presented where the structure (I mean the outermost constructors) of the returned tree is determined before the future. This obviously excludes rebalancing. I found a static solution by trying to fit the key-tails pairs into an infinite tails-map which is filled "first come, first served":

map ! 1 := (first key seen, tails)
map ! 2 := (second key seen, tails)

keeping another key-position-map around which records where each key has been inserted, so that the future knows where to put the tails. The point is that the structure of the tails-map is known before the future comes, as its keys are just 1,2,3,... and so on.

It remains to construct such an infinite random access list, but this turns out to be even easier than finite random access lists (when using non-uniform recursion from Okasaki's chapter 10) :)
data Imp a = Imp a (Imp (a,a)) deriving (Show)
instance Functor Imp where
    fmap h ~(Imp x xs) = Imp (h x) (fmap (\(x,y) -> (h x, h y)) xs)
update :: (a -> a) -> Position -> Imp a -> Imp a
update f 1 ~(Imp x xs) = Imp (f x) xs
update f n ~(Imp x xs) = Imp x $ update f2 (n `div` 2) xs
  where f2 ~(y,z) = if even n then (f y, z) else (y, f z)
Note that we can use irrefutable patterns without hesitation as there is only one constructor. Folding over an infinite thing is strange but here we are
fold :: (a -> b -> b) -> Imp a -> b
fold f ~(Imp x xs) = f x (fold f2 xs)
  where f2 (x,y) z = f x (f y z)
It's not so strange anymore when we realize that this can be used to convert it into an infinite list
toList = fold (:)
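For illustration, a positional lookup that mirrors update can be sketched, assuming Position is simply Int (the alias is left implicit above):

type Position = Int  -- assumed

index :: Imp a -> Position -> a
index ~(Imp x _)  1 = x
index ~(Imp _ xs) n
    | even n    = fst (index xs (n `div` 2))
    | otherwise = snd (index xs (n `div` 2))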
The implementation of fromList is fun as well, so I won't tell it. As fold and unfold can be defined generically for Mu f where f is a functor, i wonder how to deal with it in the case of non-uniform recursion. For splitStreams, the key-position-map is needed in both directions, so we quickly define a bijective map
type BiMap a b = (Map.Map a b, Map.Map b a)
switch :: BiMap a b -> BiMap b a
switch (x,y) = (y,x)
with the usual operations (empty, member, size etc.) omitted here. Now comes splitStreams itself:
splitStreams :: Ord a => [(a,b)] -> [(a,[b])]
splitStreams xs = takeWhile (not . null . snd) $ toList $ splitStreams' empty xs
splitStreams' :: Ord a => BiMap a Position -> [(a,b)] -> Imp (a,[b])
splitStreams' bimap [] =
    fmap (\i -> (findWithDefault undefined i $ switch bimap, [])) $ fromList [1..]
splitStreams' bimap ((a,b):xs) = update fun pos $ splitStreams' bimap' xs
  where
    fun ~(_,bs) = (a, b:bs)
    sz     = size bimap
    pos    = findWithDefault (sz+1) a bimap
    bimap' = (if member a bimap then id else insert a (sz+1)) bimap
Note that update actually generates fresh constructors, so the structure of our tails-Imp is not really static. But this is no problem as the form of the constructors is completely known: there is only one. Regards, apfelmus

apfelmus@quantentunnel.de wrote:
type BiMap a b = (Map.Map a b, Map.Map b a)
Actually BiMap is not needed at all, it suffices to have
splitStreams :: Ord a => [(a,b)] -> [(a,[b])]
splitStreams xs = takeWhile (not . null . snd) $ toList $ splitStreams' Map.empty xs
splitStreams' :: Ord a => Map.Map a Position -> [(a,b)] -> Imp (a,[b])
splitStreams' map [] = fmap (const (undefined,[])) $ fromList [1..]
splitStreams' map ((a,b):xs) = update fun pos $ splitStreams' map' xs
  where
    fun ~(_,bs) = (a, b:bs)
    sz   = Map.size map
    pos  = Map.findWithDefault (sz+1) a map
    map' = (if Map.member a map then id else Map.insert a (sz+1)) map
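As a sanity check, assuming the Imp machinery and a suitable fromList from the previous message, the finite example from the start of the thread would be expected to come out as:

*Main> splitStreams [(3,'x'),(1,'y'),(3,'z'),(2,'w')]
[(3,"xz"),(1,"y"),(2,"w")]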
Regards, apfelmus

On Thu, Sep 14, 2006 at 05:22:05PM +0200, Bertram Felgenhauer wrote:
[much subtle code] We can now build the splitStream function, using the following helper function:
splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b])
This works for infinite lists if there is no balancing, but if insert does balancing, the top of the map will not be available until the last key is seen, so splitSeq' could only be used for finite chunks. Then you'll need a way to put the partial answers together. It might be possible to amortize the cost of that for an appropriate choice of chunk length. It would also cost some laziness: the chunked version would work for infinite lists, but wouldn't produce all the output for a partial list.

Ross Paterson wrote:
On Thu, Sep 14, 2006 at 05:22:05PM +0200, Bertram Felgenhauer wrote:
[much subtle code] We can now build the splitStream function, using the following helper function:
splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b])
This works for infinite lists if there is no balancing, but if insert does balancing, the top of the map will not be available until the last key is seen, so splitSeq' could only be used for finite chunks. Then you'll need a way to put the partial answers together.
Just to prove the point, here's the same code with balancing:
SNIP HERE (end marked with <<<) >>>
module SplitSeq (splitSeq) where

import Prelude hiding (lookup, map)

-- our map
data Map k a = Node !Int k a (Map k a) (Map k a) | Leaf
    deriving Show

size :: Map k a -> Int
size Leaf = 0
size (Node s _ _ _ _) = s

member :: Ord k => k -> Map k a -> Bool
member _ Leaf = False
member k (Node _ k' _ l r) = case compare k k' of
    LT -> member k l
    EQ -> True
    GT -> member k r

-- insert key into blueprint and extract the corresponding value from
-- the second argument, threading it backward through all operations
insert :: Ord k => k -> Map k () -> Map k a -> (Map k (), a, Map k a)
insert k Leaf ~(Node _ _ a _ _) = (Node 1 k () Leaf Leaf, a, Leaf)
insert k (Node s k' _ l r) node = case compare k k' of
    LT -> let (m, a, l'') = insert k l l'
              (m', a', l', r') = balance k' m r node
          in  (m', a, Node s k' a' l'' r')
    EQ -> error "inserting existing element"
    GT -> let (m, a, r'') = insert k r r'
              (m', a', l', r') = balance k' l m node
          in  (m', a, Node s k' a' l' r'')

-- balance and co are taken from Data.Map and adapted
balance k l r node
    | size l + size r <= 1 = let Node _ _ a l' r' = node
                             in  (mkNode k () l r, a, l', r')
    | size r >= 5 * size l = rotateL k l r node
    | size l >= 5 * size r = rotateR k l r node
    | otherwise            = let Node _ _ a l' r' = node
                             in  (mkNode k () l r, a, l', r')

rotateL k l r@(Node _ _ _ l' r') node
    | size l' < 2*size r' = singleL k l r node
    | otherwise           = doubleL k l r node

rotateR k l@(Node _ _ _ l' r') r node
    | size r' < 2*size l' = singleR k l r node
    | otherwise           = doubleR k l r node

singleL k l (Node s k' _ m r) ~(Node _ _ a ~(Node _ _ a' l' m') r') =
    (mkNode k' () (mkNode k () l m) r, a', l', Node s k' a m' r')

singleR k (Node s k' _ l m) r ~(Node _ _ a l' ~(Node _ _ a' m' r')) =
    (mkNode k' () l (mkNode k () m r), a', Node s k' a l' m', r')

doubleL k l (Node s k' _ (Node s' k'' _ ml mr) r)
        ~(Node _ _ a ~(Node _ _ a' l' ml') ~(Node _ _ a'' mr' r')) =
    (mkNode k'' () (mkNode k () l ml) (mkNode k' () mr r),
     a', l', Node s k' a'' (Node s' k'' a ml' mr') r')

doubleR k (Node s k' _ l (Node s' k'' _ ml mr)) r
        ~(Node _ _ a ~(Node _ _ a' l' ml') ~(Node _ _ a'' mr' r')) =
    (mkNode k'' () (mkNode k' () l ml) (mkNode k () mr r),
     a'', Node s k' a' l' (Node s' k'' a ml' mr'), r')

-- make a new node with the correct size
mkNode k x l r = Node (size l + size r + 1) k x l r

-- update the element associated with the given key
update :: Ord k => k -> (a -> a) -> Map k x -> Map k a -> Map k a
update k f (Node s k' _ l r) ~(Node _ _ a' l' r') = case compare k k' of
    LT -> Node s k' a' (update k f l l') r'
    EQ -> Node s k' (f a') l' r'
    GT -> Node s k' a' l' (update k f r r')

-- standard map function, no blueprints here
map :: (a -> b) -> Map k a -> Map k b
map _ Leaf = Leaf
map f (Node s k a l r) = Node s k (f a) (map f l) (map f r)

-- finally, define splitSeq
splitSeq :: Ord a => [(a,b)] -> [(a,[b])]
splitSeq = fst . splitSeq' Leaf

splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b])
splitSeq' bp [] = ([], map (const []) bp)
splitSeq' bp ((a,b):xs) = case member a bp of
    True  -> let (l, m) = splitSeq' bp xs
             in  (l, update a (b:) bp m)
    False -> let (bp', a', m) = insert a bp m'
                 (l, m') = splitSeq' bp' xs
             in  ((a, b : a') : l, m)

<<<

The balancing code is adopted from Data.Map. I added threading back the result map (from splitSeq) to the balancing operations. This results in the promised O(n*log m) time.
The cost associated with a lazy pattern (it creates a closure for each bound variable, IIRC) is quite high but constant, so each call to insert takes O(log m) time, including future forces of the lazy patterns. Likewise, member and update are also O(log m).

Example test:

Prelude SplitSeq> print $ take 11 $ snd $ splitSeq ([(n `mod` 10000, 'a') | n <- [1..100000]] ++ [(n, 'b') | n <- [0..]]) !! 9534
"aaaaaaaaaab"
(1.07 secs, 137026252 bytes)

The nonbalancing version would take ages.

regards,
Bertram

On Fri, Sep 15, 2006 at 05:13:29AM +0200, Bertram Felgenhauer wrote:
Just to prove the point, here's the same code with balancing:
How embarrassing. Still, your code is quite subtle. As penance, here's my explanation, separating the main idea from the knot-tying.

The ingredients are a map type with an insertion function

data Map k a = ...
instance Functor (Map k)

insert :: Ord k => k -> a -> Map k a -> Map k a

with a partial inverse

uninsert :: Ord k => k -> Map k () -> Map k a -> (a, Map k a)

satisfying

uninsert k (fmap (const ()) m) (insert k v m) = (v, m)

for any map m not containing k. We also need an update function

update :: Ord k => k -> (a -> a) -> Map k x -> Map k a -> Map k a

where the two map arguments have the same shape. Then splitSeq becomes:

splitSeq :: Ord a => [(a,b)] -> [(a,[b])]
splitSeq = fst . splitSeq' Leaf

splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b])
splitSeq' bp [] = ([], map (const []) bp)
splitSeq' bp ((a,b):xs)
    | member a bp = let (l, m) = splitSeq' bp xs
                    in  (l, update a (b:) bp m)
    | otherwise   = let bp' = insert a () bp
                        (l, m') = splitSeq' bp' xs
                        (bs, m) = uninsert a bp m'
                    in  ((a, b : bs) : l, m)

Applying a tupling transformation to insert+uninsert gives your version. It's interesting that these composed transformations don't seem to cost too much.
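To make the specification concrete, here is a minimal sketch of what uninsert could look like for the plain unbalanced Map from earlier in the thread (Node k a l r | Leaf), assuming insertion places the new key at the Leaf found by the search path; it walks the old blueprint and peels the inserted key back out of the data map:

uninsert :: Ord k => k -> Map k () -> Map k a -> (a, Map k a)
uninsert _ Leaf ~(Node _ a _ _) = (a, Leaf)
uninsert k (Node k' _ l r) ~(Node _ a' l' r') = case compare k k' of
    LT -> let (a, l'') = uninsert k l l' in (a, Node k' a' l'' r')
    GT -> let (a, r'') = uninsert k r r' in (a, Node k' a' l' r'')
    EQ -> error "uninsert: key already present in the blueprint"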

Ross Paterson wrote:
How embarrassing. Still, your code is quite subtle. As penance, here's my explanation, separating the main idea from the knot-tying.
The ingredients are a map type with an insertion function [...] insert :: Ord k => k -> a -> Map k a -> Map k a
with a partial inverse
uninsert :: Ord k => k -> Map k () -> Map k a -> (a, Map k a)
[...] Applying a tupling transformation to insert+uninsert gives your version.
It's interesting that these composed transformations don't seem to cost too much.
That the composed transformations are indeed cheap is not necessarily disturbing. Let's call Bertram's original insert version "insertdelete" from now on. When analyzing the magic behind it, you do

ross_analysis :: ((bp,map') -> (bp',map)) -> (bp->bp', (bp,map') -> map)
ross_analysis insertdelete = (insert, uninsert)

Of course a tupling transformation will reverse this

tupletiple :: (bp->bp', (bp,map') -> map) -> ((bp,map') -> (bp',map))
tupletiple (insert,uninsert) = insertdelete

But as @djinn will tell you, "ross_analysis cannot be realized" :)

So the original version possesses additional computational power: it can reverse everything it's doing with bp on the map'. uninsert does not have information about the single steps that have been done, it only knows what should come out. From that, it would have to reconstruct quickly what happened, if it wants to be fast.

Regards,
apfelmus

On Sun, Sep 17, 2006 at 01:08:04PM +0200, apfelmus@quantentunnel.de wrote:
Ross Paterson wrote:
It's interesting that these composed transformations don't seem to cost too much.
That the composed transformations are indeed cheap is not necessarily disturbing.
I meant the composition of the transformations of the tree (update or reverse insert) that Bertram does for each list element. They're cheap because they're bounded by the depth of the tree, and even cheaper if you're probing some other part of the tree.

Ross Paterson wrote:
On Sun, Sep 17, 2006 at 01:08:04PM +0200, apfelmus@quantentunnel.de wrote:
Ross Paterson wrote:
It's interesting that these composed transformations don't seem to cost too much. That the composed transformations are indeed cheap is not necessarily disturbing.
I meant the composition of the transformations of the tree (update or reverse insert) that Bertram does for each list element. They're cheap because they're bounded by the depth of the tree, and even cheaper if you're probing some other part of the tree.
Me too :) I tried to point out that separating uninsert from insert increases time complexity. In general

uninsert :: Ord k => k -> Map k () -> Map k a -> (a, Map k a)

fst $ uninsert _ bp m == differenceWith (curry snd) bp m

which needs O(size of blueprint) time. This is to be compared with O(log (size of blueprint)) for the combined insertdelete. That there (likely) must be such a difference can already be seen from the types of insertdelete and (insert,uninsert)! But you already pointed out that something like

insertdelete :: Ord k => k -> Map shape k -> (exists shape' . (Map shape' k, Map shape' a -> (a, Map shape a))

is the best type for insertdelete. Here, it is clear that insertdelete likely can do a fast uninsert.

Btw, why are there no irrefutable patterns for GADTs? I mean, such a sin should be a shame for a non-strict language...

Regards,
apfelmus

On Tue, Sep 19, 2006 at 01:41:51PM +0200, apfelmus@quantentunnel.de wrote:
Btw, why are there no irrefutable patterns for GADTs?
Not GADTs, but existential types (whether done with GADTs or not). They can't be analysed with irrefutable patterns, of which let bindings are a special case: http://www.haskell.org/ghc/docs/6.4.2/html/users_guide/type-extensions.html#... See the second restriction. Irrefutable patterns aren't mentioned there, but they're the general case. See also http://www.haskell.org/pipermail/haskell-cafe/2006-May/015609.html http://www.haskell.org/pipermail/glasgow-haskell-users/2006-May/010278.html though I don't buy the rationale there. Hugs has no such restriction.
I mean, such a sin should be shame for a non-strict language...
It certainly bites in this case. We could define

data TLeaf
data TNode s1 s2

data MapShape s k where
    SLeaf :: MapShape TLeaf k
    SNode :: !Int -> k -> MapShape s1 k -> MapShape s2 k -> MapShape (TNode s1 s2) k

data MapData s a where
    DLeaf :: MapData TLeaf a
    DNode :: a -> MapData s1 a -> MapData s2 a -> MapData (TNode s1 s2) a

data InsertResult s k = forall s'. InsertResult (MapShape s' k) (forall a. MapData s' a -> (a, MapData s a))

insert :: Ord k => k -> MapShape s k -> InsertResult s k

and have the compiler check that the transformations on the shape match the transformations on the data, but first we need to turn lots of lets into cases and erase the tildes. Of course the resulting program no longer works, but it does have verifiably correct transformations.
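Read-only operations carry over essentially unchanged; for what it's worth, a minimal sketch (under the definitions above) of member over the indexed shape:

member :: Ord k => k -> MapShape s k -> Bool
member _ SLeaf = False
member k (SNode _ k' l r) = case compare k k' of
    LT -> member k l
    EQ -> True
    GT -> member k r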

Hi folks apfelmus@quantentunnel.de wrote:
Btw, why are there no irrefutable patterns for GADTs? I mean, such a sin should be shame for a non-strict language...
Just imagine
data Eq a b where
    Refl :: Eq a a
coerce :: Eq a b -> a -> b
coerce ~Refl a = b
coerce undefined True :: String

Bang you're dead. Or rather... Twiddle you're dead.

Moral: in a non-total language, if you're using indexing to act as evidence for something, you need to be strict about checking the evidence before you act on it, or you will be vulnerable to the blandishments of the most appalling liars.

As Randy Pollack used to say to us when we were children, the best thing about working in a strongly normalizing language is not having to normalize things.

All the best
Conor

On Sep 19, 2006, at 8:52 AM, Conor McBride wrote:
Hi folks
apfelmus@quantentunnel.de wrote:
Btw, why are there no irrefutable patterns for GADTs? I mean, such a sin should be shame for a non-strict language...
Just imagine
data Eq a b where
    Refl :: Eq a a

coerce :: Eq a b -> a -> b
coerce ~Refl a = b
I think you mean:
coerce ~Refl x = x
coerce undefined True :: String
Bang you're dead. Or rather... Twiddle you're dead.
Moral: in a non-total language, if you're using indexing to act as evidence for something, you need to be strict about checking the evidence before you act on it, or you will be vulnerable to the blandishments of the most appalling liars.
As Randy Pollack used to say to us when we were children, the best thing about working in a strongly normalizing language is not having to normalize things.
All the best
Conor
Rob Dockins Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank. -- TMBG

Robert Dockins wrote:
On Sep 19, 2006, at 8:52 AM, Conor McBride wrote:
Hi folks
apfelmus@quantentunnel.de wrote:
Btw, why are there no irrefutable patterns for GADTs? I mean, such a sin should be shame for a non-strict language...
Just imagine
data Eq a b where
    Refl :: Eq a a

coerce :: Eq a b -> a -> b
coerce ~Refl a = b
I think you mean:
coerce ~Refl x = x
Indeed I do. My original program was much safer!

Thanks
Conor

Conor McBride wrote:
Just imagine
data Eq a b where
    Refl :: Eq a a

coerce :: Eq a b -> a -> b

Robert Dockins wrote:
coerce ~Refl x = x
coerce undefined True :: String
Bang you're dead. Or rather... Twiddle you're dead.
Mom, he's scaring me! ;)

Who says this function may not be strict in the first argument despite the irrefutable pattern? Also, for the sarcastically inclined, death is semantically equivalent to _|_. As is (fix id :: a -> b), too.

Alas, one can say

dontcoerce :: Eq a (Maybe b) -> a -> Maybe b
dontcoerce ~Refl x = Just x

and

dontcoerce' _|_
  ==> (\x -> Just (x :: b(~Refl)))
  ==> \(x::_|_) -> Just (x :: _|_)
  ==> \_ -> Just _|_

The type of x on the right-hand side inherently depends on ~Refl and should be _|_ if ~Refl is. Type refinement makes the types depend on values, after all.

For our optimization problem, it's only a matter of constructors on the right hand side. They should pop out before looking at any arguments, so it's safe to cry "so you just know, I'm a Just". Maybe one can remedy things by introducing a _|_ on the type level with the only inhabitant _|_ :: _|_. That would address the objections Ross Paterson referenced:
See the second restriction. Irrefutable patterns aren't mentioned there, but they're the general case. See also
http://www.haskell.org/pipermail/haskell-cafe/2006-May/015609.html http://www.haskell.org/pipermail/glasgow-haskell-users/2006-May/010278.html
Admittedly, it's thin ice to trample on, but otherwise, for this special splitSeq problem, one runs into the "more haste, less speed" dilemma (I mean Wadler's paper). Bertram's lazy algorithm actually is an online algorithm and it should remain one when making it type safe.

Regards,
apfelmus

On Tue, Sep 19, 2006 at 08:06:07PM +0200, apfelmus@quantentunnel.de wrote:
For our optimization problem, it's only a matter of constructors on the right hand side. They should pop out before looking at any arguments, so it's safe to cry "so you just know, I'm a Just".
It seems the appropriate encoding would be for the shape to be an inductive datatype and the contents (as well as the lists) to be coinductive, as access to the contents is gated through the shape.

On Tue, Sep 19, 2006 at 01:52:02PM +0100, Conor McBride wrote:
apfelmus@quantentunnel.de wrote:
Btw, why are there no irrefutable patterns for GADTs?
Just imagine
data Eq a b where
    Refl :: Eq a a

coerce :: Eq a b -> a -> b
coerce ~Refl a = a
coerce undefined True :: String
Bang you're dead. Or rather... Twiddle you're dead.
Does anything go wrong with irrefutable patterns for existential types?

| Does anything go wrong with irrefutable patterns for existential types?

Try giving the translation into System F.

Simon

On Thu, Sep 28, 2006 at 03:22:25PM +0100, Simon Peyton-Jones wrote:
| Does anything go wrong with irrefutable patterns for existential types?
Try giving the translation into System F.
I'm a bit puzzled about this. A declaration

data Foo = forall a. MkFoo a (a -> Bool)

is equivalent to

newtype Foo = forall a. Foo (Foo' a)
data Foo' a = MkFoo a (a -> Bool)

except that you also don't allow existentials with newtypes, for similarly unclear reasons. If you did allow them, you'd certainly allow this equivalent of the original irrefutable match:

... case x of
      Foo y -> let MkFoo val fn = y in fn val

So, is there some real issue with existentials and non-termination?

I must be missing your point. Newtype is just type isomorphism; a new name for an existing type. But there is no existing type "exists x. T(x)". So it's not surprising that newtype doesn't support existentials.

I've lost track of this thread. Can you re-state the question? I'm strongly influenced by the question "can we translate this into System F + (existential) data types" because (a) that's what GHC does (b) it's an excellent sanity check. E.g. if you can, then we know the system is sound.

Simon

| On Thu, Sep 28, 2006 at 03:22:25PM +0100, Simon Peyton-Jones wrote:
| > | Does anything go wrong with irrefutable patterns for existential types?
| >
| > Try giving the translation into System F.
|
| I'm a bit puzzled about this. A declaration
|
| data Foo = forall a. MkFoo a (a -> Bool)
|
| is equivalent to
|
| newtype Foo = forall a. Foo (Foo' a)
| data Foo' a = MkFoo a (a -> Bool)
|
| except that you also don't allow existentials with newtypes, for
| similarly unclear reasons. If you did allow them, you'd certainly
| allow this equivalent of the original irrefutable match:
|
| ... case x of
|       Foo y -> let MkFoo val fn = y in fn val
|
| So, is there some real issue with existentials and non-termination?

On Fri, Sep 29, 2006 at 11:19:26AM +0100, Simon Peyton-Jones wrote:
I must be missing your point. Newtype is just type isomorphism; a new name for an existing type. But there is not existing type "exists x. T(x)". So it's not surprising that newtype doesn't support existentials.
And yet newtype does support recursion.
I've lost track of this thread.
The story so far:

apfelmus: why are there no irrefutable patterns for GADTs?
Conor: because you could use them to write unsafeCoerce
Ross: how about irrefutable patterns (or newtypes) for existential types?
Simon: Try giving the translation into System F + (existential) data types

Copying a notion of datatype from Haskell to Core and then asking for a translation to that seems to be begging the question. If your target really was System F, you could use the standard encoding of existentials as nested foralls, but there's the problem that Haskell function spaces differ from System F ones, System F is strongly normalizing, etc.

Ross Paterson wrote:
On Thu, Sep 14, 2006 at 05:22:05PM +0200, Bertram Felgenhauer wrote:
[much subtle code] We can now build the splitStream function, using the following helper function:
splitSeq' :: Ord a => Map a () -> [(a,b)] -> ([(a,[b])], Map a [b])
This works for infinite lists if there is no balancing, but if insert does balancing, the top of the map will not be available until the last key is seen, so splitSeq' could only be used for finite chunks. Then you'll need a way to put the partial answers together. It might be possible to amortize the cost of that for an appropriate choice of chunk length. It would also cost some laziness: the chunked version would work for infinite lists, but wouldn't produce all the output for a partial list.
No. The point is that in

let blueprint = empty
    (_,m) = splitSeq' blueprint $ concat $ repeat [(1,'a'),(2,'b')]

the map m contains only as many keys as there are in blueprint, i.e. not a single one! After munching the first (1,'a') in the first recursive call to splitSeq', the returned map m' fulfills

toList m' == [(1,"aaaaa...")]

The trick is to throw away all keys from the future, so that there is no need to wait on them.

Also, if your argument were correct, then the version without balancing wouldn't work either, because insert already exchanges Leafs for Nodes in m. So the top of the map would be unavailable until all Leafs have been exchanged.

Regards,
apfelmus
participants (14)

- apfelmus@quantentunnel.de
- Bertram Felgenhauer
- Bruno Oliveira
- Conor McBride
- dons@cse.unsw.edu.au
- Henning Thielemann
- Ketil Malde
- Magnus Jonsson
- Matthias Fischmann
- Paul Hudak
- Robert Dockins
- Ross Paterson
- Simon Peyton-Jones
- Twan van Laarhoven