I have created an ugly Haskell program..

.. and I am positive there must be a way of beautifying it, but I am struggling. I bet there is just some lovely way of making this all shrink to three lines.. So here's the problem. I have two lists of tuples: (timestamp, value) What I would like to do in do a kind of 'zip' on two of these lists to make a list of (timestamp, (value1, value2)) with the following rules: - If the timestamps are equal it's easy - make your new element an move on - If one of the lists has a timestamp that the other doesn't, repeat an old value from the other list - If we don't have an old value yet, then don't create an element in the new list. e.g. if I ran my algorithm on these two lists d1 = [ (1,"a"), (2,"b"), (3,"c") ] d2 = [ (2,"b'"), (4,"d'") ] I would like to get result = [ (2, (b,b')), (3, (c,b')), (4, (c,d')) ] e.g. there was no data in d2 for our first element so we skipped it. Okay, so here is my code.. It works, but makes me feel a bit dirty. To explain my nomenclature 't' is 'timestamp of', 'v' is 'value of'. vx' and vy' are the 'old' values from the previous iteration in case a repeat is needed. They are Maybes because at the beginning there may be no old value. d1 = [ (1,"a"), (2,"b"), (3,"c") ] d2 = [ (2,"b'"), (4,"d'") ] t (x,y) = x v (x,y) = y js vx' vy' (x:xs) (y:ys) | t x == t y = ( (t x), (v x, v y) ) : js (Just (v x)) (Just (v y)) xs ys | t x < t y = maybe (js (Just (v x)) Nothing xs (y:ys)) (\z -> ( t x, (v x, z ) ) : ( js (Just (v x)) (Just z) xs (y:ys))) vy' | t x > t y = maybe (js Nothing (Just (v y)) (x:xs) ys) (\z -> ( t y, (z, v y ) ) : ( js (Just z) (Just (v y)) (x:xs) ys)) vx' js vx' vy' (x:xs) [] = maybe [] (\z -> ( t x, (v x, z ) ) : ( js (Just (v x)) (Just z) xs [])) vy' js vx' vy' [] (y:ys) = maybe [] (\z -> ( t y, (z, v y ) ) : ( js (Just z) (Just (v y)) [] ys )) vx' js _ _ [] [] = [] You call it with the first two arguments as Nothing to kick it off (I have a trivial wrapper function to do this) It works fine:
:t js js :: (Ord t) => Maybe a1 -> Maybe a -> [(t, a1)] -> [(t, a)] -> [(t, (a1, a))]
js Nothing Nothing d1 d2 [(2,("b","b'")),(3,("c","b'")),(4,("c","d'"))]
But it just feels gross. Any advice on how to tame this beast would be greatly appreciated :) All the best, Philip

Function you seek is 'specialZip' below. 'fluff' and 'decapitate' are helpers. Not extensively tested. -- Given a list of ints that "should" all have values, fill in missing -- values using the "last" value as default. fluff :: String -> [Int] -> [(Int,String)] -> [(Int,String)] fluff last (i:is) pss@((t,s):ps) | i == t = (i,s) : fluff s is ps | i < t = (i,last) : fluff last is pss fluff last is [] = zip is (repeat last) -- Given two lists, remove enough from the front to get to two equal keys. decapitate [] _ = ([],[]) decapitate _ [] = ([],[]) decapitate xss@((tx,_):xs) yss@((ty,_):ys) | tx < ty = decapitate xs yss | ty < tx = decapitate xss ys | ty == tx = (xss,yss) specialZip d1 d2 = let (dd1,dd2) = decapitate d1 d2 -- build set of every key that should be in final list s = S.toAscList . S.fromList $ (map fst dd1) ++ (map fst dd2) in case (dd1,dd2) of ([],[]) -> [] (xs1,xs2) -> let f1 = fluff "" s xs1 -- use this set to fluff f2 = fluff "" s xs2 -- each list -- so final answer can be a simple zipWith in zipWith (\(t1,s1) (t2,s2) -> (t1,(s1,s2))) f1 f2 Philip Scott wrote:
.. and I am positive there must be a way of beautifying it, but I am struggling. I bet there is just some lovely way of making this all shrink to three lines..
So here's the problem. I have two lists of tuples: (timestamp, value)
What I would like to do in do a kind of 'zip' on two of these lists to make a list of (timestamp, (value1, value2)) with the following rules:
- If the timestamps are equal it's easy - make your new element an move on - If one of the lists has a timestamp that the other doesn't, repeat an old value from the other list - If we don't have an old value yet, then don't create an element in the new list.
e.g. if I ran my algorithm on these two lists
d1 = [ (1,"a"), (2,"b"), (3,"c") ] d2 = [ (2,"b'"), (4,"d'") ]
I would like to get
result = [ (2, (b,b')), (3, (c,b')), (4, (c,d')) ]
e.g. there was no data in d2 for our first element so we skipped it.
Okay, so here is my code.. It works, but makes me feel a bit dirty. To explain my nomenclature 't' is 'timestamp of', 'v' is 'value of'. vx' and vy' are the 'old' values from the previous iteration in case a repeat is needed. They are Maybes because at the beginning there may be no old value.
d1 = [ (1,"a"), (2,"b"), (3,"c") ] d2 = [ (2,"b'"), (4,"d'") ]
t (x,y) = x v (x,y) = y
js vx' vy' (x:xs) (y:ys) | t x == t y = ( (t x), (v x, v y) ) : js (Just (v x)) (Just (v y)) xs ys | t x < t y = maybe (js (Just (v x)) Nothing xs (y:ys)) (\z -> ( t x, (v x, z ) ) : ( js (Just (v x)) (Just z) xs (y:ys))) vy' | t x > t y = maybe (js Nothing (Just (v y)) (x:xs) ys) (\z -> ( t y, (z, v y ) ) : ( js (Just z) (Just (v y)) (x:xs) ys)) vx' js vx' vy' (x:xs) [] = maybe [] (\z -> ( t x, (v x, z ) ) : ( js (Just (v x)) (Just z) xs [])) vy' js vx' vy' [] (y:ys) = maybe [] (\z -> ( t y, (z, v y ) ) : ( js (Just z) (Just (v y)) [] ys )) vx' js _ _ [] [] = []
You call it with the first two arguments as Nothing to kick it off (I have a trivial wrapper function to do this)
It works fine:
:t js js :: (Ord t) => Maybe a1 -> Maybe a -> [(t, a1)] -> [(t, a)] -> [(t, (a1, a))]
js Nothing Nothing d1 d2 [(2,("b","b'")),(3,("c","b'")),(4,("c","d'"))]
But it just feels gross. Any advice on how to tame this beast would be greatly appreciated :)
All the best,
Philip _______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

Another solution here. The inspiration is to try to use Data.Map's fromListWith to do the main work. Notice that you can "decapitate" the useless head of each list with the single line dropWhile (not . both) . M.toAscList $ neated import Control.Arrow import qualified Data.Map as M data Combine v = LeftOnly v | RightOnly v | BothOfThem v v deriving (Show) cmb :: Combine a -> Combine a -> Combine a cmb (LeftOnly x) (RightOnly y) = BothOfThem x y cmb (RightOnly y) (LeftOnly x) = BothOfThem x y both (_,(BothOfThem _ _ )) = True both _ = False chain _ last2 ((t,LeftOnly v):xs) = (t,(v,last2)) : chain v last2 xs chain last1 _ ((t,RightOnly v):xs) = (t,(last1,v)) : chain last1 v xs chain _ _ ((t,BothOfThem v w):xs) = (t,(v,w)) : chain v w xs chain _ _ [] = [] specialZip d1 d2 = let neated = M.fromListWith cmb $ map (second LeftOnly) d1 ++ map (second RightOnly) d2 dr = dropWhile (not . both) . M.toAscList $ neated in chain "" "" dr Philip Scott wrote:
Michael Mossey wrote:
Function you seek is 'specialZip' below. 'fluff' and 'decapitate' are helpers. Not extensively tested.
Thanks Michael, that looks much better than mine :)

On Sun, Nov 01, 2009 at 11:27:42PM +0000, Philip Scott wrote:
.. and I am positive there must be a way of beautifying it, but I am struggling. I bet there is just some lovely way of making this all shrink to three lines..
So here's the problem. I have two lists of tuples: (timestamp, value)
What I would like to do in do a kind of 'zip' on two of these lists to make a list of (timestamp, (value1, value2)) with the following rules:
- If the timestamps are equal it's easy - make your new element an move on - If one of the lists has a timestamp that the other doesn't, repeat an old value from the other list - If we don't have an old value yet, then don't create an element in the new list.
Ask yourself: What Would Conal Do (WWCD)? Conal Elliott is always trying to get people to think about the semantic essence of their problems, so let's try it. What are we REALLY trying to do here? What are those lists of tuples, REALLY? Well, it seems to me that the lists of tuples are really just representing *functions* on some totally ordered domain. The list-of-pairs representation takes advantage of the fact that these functions tend to be constant on whole intervals of the domain; we only need a tuple to mark the *beginning* of a constant interval. The fact that we want to take a value from the last old timestamp when we don't have a certain timestamp in the list reflects the fact that the function takes on that value over the whole *interval* from the timestamp when it occurred to whenever the next timestamp is. So, let's try converting these lists of pairs to actual functions: asFunc :: (Ord a) => [(a,b)] -> (a -> Maybe b) asFunc is a = fmap snd . listToMaybe . reverse . takeWhile ((<=a) . fst) $ is Simple -- we just scan through the list looking for the right interval. Now the combining function is just a matter of converting the lists to functions, and applying those functions to each index we want in the output list (discarding any Nothings). combine :: (Ord a) => [(a,b)] -> [(a,c)] -> [(a,(b,c))] combine is js = catMaybes . flip map ixs $ \a -> fmap ((,) a) (liftA2 (,) (asFunc is a) (asFunc js a)) where ixs = sort . nub $ map fst is ++ map fst js Done! Now, you might object that this is much more inefficient than the other solutions put forth. That is very true. Converting to a function with 'asFunc' means that we do a linear-time lookup in the list every time we call the function, so this is O(n^2) overall instead of O(n). Building the list of indices ixs in the code above is also O(n^2) instead of O(n). However, I still find it very helpful to think about the essence of the problem like this: elegant yet inefficient code is a much better starting place than the other way around! From here there are several possibilities: maybe this version is efficient enough, if you'll only be working with small lists. Or you can also try to optimize, taking advantage of the fact that we always call the functions built by asFunc with a sequence of strictly increasing inputs. I might make a sort of "iterator" object which acts like a function (a -> Maybe b), but keeps some extra state so that as long as you call it with strictly increasing values of a, you get back a Maybe b (and a new iterator) in constant time. Of course, this is really what the other solutions are doing: but thinking about it this way has helped to structure the solution in a (hopefully) more clear and elegant way. -Brent

Thanks, Brent, for this way of looking at it. If you want n log n behavior you could write asFunc to use a Map for lookup. -Mike Brent Yorgey wrote:
Ask yourself: What Would Conal Do (WWCD)? Conal Elliott is always trying to get people to think about the semantic essence of their problems, so let's try it.
What are we REALLY trying to do here? What are those lists of tuples, REALLY? Well, it seems to me that the lists of tuples are really just representing *functions* on some totally ordered domain. The list-of-pairs representation takes advantage of the fact that these functions tend to be constant on whole intervals of the domain; we only need a tuple to mark the *beginning* of a constant interval. The fact that we want to take a value from the last old timestamp when we don't have a certain timestamp in the list reflects the fact that the function takes on that value over the whole *interval* from the timestamp when it occurred to whenever the next timestamp is.
So, let's try converting these lists of pairs to actual functions:
asFunc :: (Ord a) => [(a,b)] -> (a -> Maybe b) asFunc is a = fmap snd . listToMaybe . reverse . takeWhile ((<=a) . fst) $ is
Simple -- we just scan through the list looking for the right interval.

Brent Yorgey wrote:
Ask yourself: What Would Conal Do (WWCD)? Conal Elliott is always trying to get people to think about the semantic essence of their problems, so let's try it.
What are we REALLY trying to do here? What are those lists of tuples, REALLY? Well, it seems to me that the lists of tuples are really just representing *functions* on some totally ordered domain. [...]
So, let's try converting these lists of pairs to actual functions:
asFunc :: (Ord a) => [(a,b)] -> (a -> Maybe b) asFunc is a = fmap snd . listToMaybe . reverse . takeWhile ((<=a) . fst) $ is
[...]
Now, you might object that this is much more inefficient than the other solutions put forth. That is very true. [...]
However, I still find it very helpful to think about the essence of the problem like this: elegant yet inefficient code is a much better starting place than the other way around! [...]
You can also try to optimize, taking advantage of the fact that we always call the functions built by asFunc with a sequence of strictly increasing inputs.
I am with Brent and Conal here. Now, to continue, ask yourself: What Would Conal Do Next (WWCDN)? What are we really trying to do here? What is this function, really, considering that we are only evaluating it at a strictly increasing sequence of inputs? Well, it seems to me that it is some special kind of function, best captured as an *abstract data type*. In particular, the function is something which I will call a "time series". In other words, the input is to be thought of as time. data Time t = Moment t | Infinity deriving (Eq,Ord,Show) The inclusion of infinity will turn out to be very convenient. Now, the time series is a function that has a value x1 in the distant past, until a time t1 where it begins to have the value x2 , again until a time t2 where it switches to x3 and so on, until a value xn that is kept until infinity. In Haskell, this looks like this function t | -Infinity <= t && t < t1 = x1 | t1 <= t && t < t2 = x2 | t2 <= t && t < t3 = x3 | ... | t1 <= t && t < Infinity = xn and pictorially, something like this: ____ xn _____ ____ x2 ____ | | |____ x3 ____ ... | _____ x1 ____| -Inf t1 t2 ... tn Inf Of course, we can implement this abstract data type with a list of pairs (tk,xk) newtype TimeSeries t a = TS { unTS :: [(a,Time t)] } deriving (Show) and our goal is to equip this data type with a few natural operations that can be used to implement Philip's zip-like function. The first two operations are progenitor :: TimeSeries t a -> a progenitor = fst . head . unTS which returns the value from the distant past and beginning :: TimeSeries t a -> Time t beginning = snd . head . unTS which returns the first point in time when the function changes its value. These correspond to the operation head on lists. The next operation is called `forgetTo` t and will throw away all values and changes before and including a given time t . forgetTo :: Ord t => TimeSeries t a -> Time t -> TimeSeries t a forgetTo (TS xs) Infinity = TS [last xs] forgetTo (TS xs) t = TS $ dropWhile ((<= t) . snd) xs This roughly corresponds to tail , but takes advantage of the time being continuous. Last but not least, we need a way to create a time series forever :: a -> TimeSeries t a forever x = TS [(x,Infinity)] and we need to add values to a time series, which can be done with an operation called prepend that adds a new beginning and replaces the progenitor . -- We assume that t < beginning xs prepend :: a -> Time t -> TimeSeries t a -> TimeSeries t a prepend x Infinity _ = TS [(x,Infinity)] prepend x t (TS xs) = TS $ (x,t) : xs These operations correspond to [] and (:) for lists. The key about these operations is that they have a description / intuition that is *independent* of the implementation of times series. At no point do we need to know how exactly TimeSeries is implemented to understand what these five operations do. Now, Philip's desired zip-like function is straightforward to implement: zipSeries :: Ord t => TimeSeries t a -> TimeSeries t b -> TimeSeries t (a,b) zipSeries xs ys = prepend (progenitor xs, progenitor ys) t $ zipSeries (xs `forgetTo` t) (ys `forgetTo` t) where t = min (beginning xs) (beginning ys) and you may want to convince yourself of its correctness by appealing to the intuition behind time series. Regards, apfelmus -- http://apfelmus.nfshost.com
participants (4)
-
Brent Yorgey
-
Heinrich Apfelmus
-
Michael Mossey
-
Philip Scott