Paths to tree

newer
Re: [Haskell-cafe] GADTs: the plot...

John Ky

29 Jan 2007 29 Jan '07

11:10 a.m.

Hi, I've written some code and was wondering if there was a better way to write it in terms of readability, brevity and/or efficiency. The function concerned is pathsToForest which takes a list of paths (ie. [[String]]) and converts it into a tree structure where the individual nodes are the names in the path. Siblings with the same name are merged. For instance: prettyPrint $ mergeForest $ pathsToForest [["a", "b", "c"], ["c", "b", "a"], ["a", "b", "d"]] gives: a b d c c b a Thanks -John import Data.Tree import Control.Monad data ArcData = ArcData { name :: String } deriving Show type ArcTree = Tree ArcData type ArcForest = Forest ArcData pathsToForest :: [[String]] -> ArcForest pathsToForest paths = mergeForest $ concat $ map pathToTree paths mergeForest :: ArcForest -> ArcForest mergeForest [] = [] mergeForest (x:xs) = merge x (mergeForest xs) where merge :: ArcTree -> ArcForest -> ArcForest merge tree [] = [tree] merge tree (y:ys) = if sameTreeName tree y then merge tree { subForest = mergeForest ((subForest tree) ++ (subForest y)) } ys else (y:merge tree ys) treeName :: ArcTree -> String treeName tree = name $ rootLabel $ tree sameTreeName :: ArcTree -> ArcTree -> Bool sameTreeName treeLeft treeRight = treeName treeLeft == treeName treeRight pathToTree :: [String] -> ArcForest pathToTree [] = [] pathToTree (name:subpath) = [ Node { rootLabel = ArcData { name = name } , subForest = pathToTree subpath } ] prettyPrint' :: ArcForest -> [String] prettyPrint' [] = [] prettyPrint' (x:xs) = [name $ rootLabel $ x] ++ (map (" " ++) (prettyPrint' $ subForest x)) ++ prettyPrint' xs prettyPrint :: ArcForest -> IO () prettyPrint forest = do forM_ (prettyPrint' forest) putStrLn

Attachments:

attachment.html (text/html — 2.9 KB)

Show replies by date

Tomasz Zielonka

29 Jan 29 Jan

12:07 p.m.

On Mon, Jan 29, 2007 at 10:10:47PM +1100, John Ky wrote:

...

I've written some code and was wondering if there was a better way to write it in terms of readability, brevity and/or efficiency.

This version of mergeForest2 should be more efficient. For even better efficiency it should use Data.Sequence for efficient concatenation (instead of ++). I also made it more general. You have to judge readability yourself. import qualified Data.Map as Map data ArcData = ArcData { name :: String } deriving (Show, Eq, Ord) -- derive Ord and Eq mergeForest2 :: (Ord k) => [Tree k] -> Forest k mergeForest2 = map pairToNode . Map.toList . Map.map mergeForest2 . Map.fromListWith (++) . map nodeToPair where nodeToPair (Node x y) = (x, y) pairToNode = uncurry Node Best regards Tomasz

John Ky

30 Jan 30 Jan

11:57 a.m.

Hi Tomasz, I actually quite like this style. I was able to understand it after spending some time reading the docs for some of the functions you used. My problem I guess is being able to write the code this way when the need arises or even just recognising when and where it's an option, both of which to me is considerably harder. Thanks for the tip. -John On 1/29/07, Tomasz Zielonka wrote:

...

On Mon, Jan 29, 2007 at 10:10:47PM +1100, John Ky wrote:

...
I've written some code and was wondering if there was a better way to write it in terms of readability, brevity and/or efficiency.

This version of mergeForest2 should be more efficient. For even better efficiency it should use Data.Sequence for efficient concatenation (instead of ++). I also made it more general. You have to judge readability yourself.

import qualified Data.Map as Map

data ArcData = ArcData { name :: String } deriving (Show, Eq, Ord) -- derive Ord and Eq

mergeForest2 :: (Ord k) => [Tree k] -> Forest k mergeForest2 = map pairToNode . Map.toList . Map.map mergeForest2 . Map.fromListWith (++) . map nodeToPair where nodeToPair (Node x y) = (x, y) pairToNode = uncurry Node

Best regards Tomasz

Bulat Ziganshin

31 Jan 31 Jan

11:22 a.m.

New subject: Re[2]: Paths to tree

Hello John, Tuesday, January 30, 2007, 2:57:04 PM, you wrote:

...

mergeForest2 = map pairToNode . Map.toList . Map.map mergeForest2 . Map.fromListWith (++) . map nodeToPair

...

My problem I guess is being able to write the code this way when the need arises or even just recognising when and where it's an option, both of which to me is considerably harder.

i guess that it's just imperative style of thought that you can't easily scrap out. the whole idea of functional programming is that you make data *transformations*. you start with one data representation, then transform it into another, then you can make some sub-transformations with sub-elements of your data (using various "map"s), then you may glue data together using various folds and so on. there are real difference between imperative style do-it-after-that and functional style "map original data into result" btw, are you seen http://haskell.org/haskellwiki/Simple_unix_tools ? :) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Pixel

12:36 p.m.

Bulat Ziganshin writes:

...

btw, are you seen http://haskell.org/haskellwiki/Simple_unix_tools ? :)

: This is intended as a beginners tutorial for learning Haskell from a : "Lets just solve things already!" point of view. The examples should : help give a flavour of the beauty and expressiveness of Haskell : programming. : : -- : -- Some unix-like tools written in simple, clean Haskell : -- : -- : : [...] : showln = (++ "\n") . show maybe it should be mentioned in the page that it needs -fno-monomorphism-restriction?

apfelmus＠quantentunnel.de

29 Jan 29 Jan

2:01 p.m.

John Ky wrote:

...

I've written some code and was wondering if there was a better way to write it in terms of readability, brevity and/or efficiency.

The function concerned is pathsToForest which takes a list of paths (ie. [[String]]) and converts it into a tree structure where the individual nodes are the names in the path. Siblings with the same name are merged.

For instance:

prettyPrint $ mergeForest $ pathsToForest [["a", "b", "c"], ["c", "b", "a"], ["a", "b", "d"]]

gives:

a b d c c b a

Your code is fine, I like it. A minor hint is that mergeForest is a fold: mergeForest = foldr merge [] Also, we have prettyPrint = putStr . unlines . prettyPrint' $ forest So far, so good. Ironically, Data.Tree is not used much because it is so easy to invent your own tree type in Haskell. Most often, Data.Tree is too rigid and does not offer enough structure. Indeed, this is the case here, too: there is more structure behind your task than you may think at first. Let me show it. I can't know, but it doesn't seem unreasonable that you intend to use the ArcForest as a trie, i.e. an efficient implementation of a set of paths which allows to look up quickly whether a given path (here of type [String]) is in the set or not. So, we have type Path = [String] type SetPath = ... -- a set of paths, to be defined later -- checks whether a given path is in the trie member :: Path -> SetPath -> Bool where 'member' is the function akin to 'elem' on lists. The focus of your code is not on the membership test, but constructing the trie from a list of paths. You named the function 'pathsToForest', we will name it fromList :: [Path] -> SetPath Now, if we have a function insert :: Path -> SetPath -> SetPath that inserts a path into the set and we if we are given the notion of an empty set empty :: SetPath , we can write fromList = foldr insert empty Indeed, your function 'merge' corresponds to 'insert'. So let's find 'insert'. It will turn out that it is easier to generalize from SetPath to a storage that associates a value of type v with every path. This is called "finite map" (here, the keys are of type Path): data MapPath v = ... -- to be defined later -- the empty map contains no values empty :: MapPath v -- a map that contains a single key-value pair singleton :: Path -> v -> MapPath v -- lookup the value for a given Path lookup :: Path -> MapPath v -> Maybe v -- insert a value with given key into the finite map insert :: (v -> v -> v) -> Path -> v -> MapPath v -> MapPath v These functions are like the ones from the standard library Data.Map (btw, you can use this library for your task, too). Note that we had to generalize the type signature for 'insert' considerably because we have to specify what to do when there is already a value for the given key in the trie. This is what the argument of type '(v -> v -> v)' does: it takes the old and the new value and merges them somehow. We will need it, soon. Given the generalization, we can now simply put type SetPath = MapPath () and function 'fromList' will read fromList :: [Path] -> SetPath fromList = foldr (\path -> insert (\_ x -> x) path ()) empty It is time to think about coding 'lookup' and 'insert'. And here comes the trick: we know that 'Path = [String]' so let's assume that we already have a finite map for strings: data MapString v = ... emptyStr :: MapString v singletonStr :: String -> v -> MapString v lookupStr :: String -> MapString v -> Maybe v insertStr :: (v -> v -> v) -> String -> v -> MapString v -> MapString v Now, we can build up our finite map for paths: data MapPath v = TriePath (Maybe v) (MapString (MapPath v)) because it (maybe) contains a value for the key '[] :: Path' and it (maybe) contains a map of paths that is organized by their first String element. Let's write this down: lookup :: Path -> MapPath v -> Maybe v lookup [] (TriePath w _) = w lookup (x:xs) (TriePath _ m) = lookup xs (lookupStr x m) Coding 'insert' is slightly more involved and it may be very instructive to find out how your code and the following are essentially the same. We'll prepare us with 'singleton' which corresponds to your 'pathToTree' singleton :: Path -> v -> MapPath v singleton [] v = TriePath (Just v) emptyStr singleton (x:xs) v = TriePath Nothing $ singletonStr x (singleton xs v) Now, we can tackle 'insert'. Compared to your implementation, it is roughly equivalent to 'merge': insert _ p v m ^= merge (singleton p v) m We have insert :: (v -> v -> v) -> Path -> v -> MapPath v -> MapPath v insert f [] v (TriePath w m) = case w of Just v' -> TriePath (Just (f v v')) m Nothing -> TriePath (Just v) m insert f (x:xs) v (TriePath w m) = TriePath w (insertStr (insert f xs v) x empty m) (Coding 'empty' is left as an exercise. Also note that the case expression can be seen as a kind of 'insertMaybe :: (v -> v -> v) -> v -> Maybe v -> Maybe v). Now what about 'MapString v', how do we get this? Well, your implementation corresponds to the choice type MapString v = [(String,v)] But in our case, we can apply the same trick again! We have 'String = [Char]' and given an implementation of data MapChar v = ... we can use exactly the same code from 'MapPath v' to implement 'MapString v'! (This reuse can be abstracted into a type class, but I'll not cover that here.) Of course, we need 'MapChar v' now. But yet, again we can think of Char as Char ^= Int ^= [Bool] where the '[Bool]' means the list of digits in binary representation. So, given 'MapBool v', we can implement 'MapChar v' with yet again the same code that we used for the preceding finite maps! Fortunately, the recursion ends here because a finite map for 'Bool'eans is just the pair type MapBool v = (Maybe v, Maybe v) In case your head does not yet hurt too much :), further information about tries in Haskell and a detailed explanation of why the code above works, can be found in Ralf Hinze. Generalizing generalized tries. Journal of Functional Programming, 10(4):327-351, July 2000 http://www.informatik.uni-bonn.de/~ralf/publications/GGTries.ps.gz Regards, apfelmus

...

import Data.Tree import Control.Monad

data ArcData = ArcData { name :: String } deriving Show

type ArcTree = Tree ArcData type ArcForest = Forest ArcData

pathsToForest :: [[String]] -> ArcForest pathsToForest paths = mergeForest $ concat $ map pathToTree paths

mergeForest :: ArcForest -> ArcForest mergeForest [] = [] mergeForest (x:xs) = merge x (mergeForest xs) where merge :: ArcTree -> ArcForest -> ArcForest merge tree [] = [tree] merge tree (y:ys) = if sameTreeName tree y then merge tree { subForest = mergeForest ((subForest tree) ++ (subForest y)) } ys else (y:merge tree ys)

treeName :: ArcTree -> String treeName tree = name $ rootLabel $ tree

sameTreeName :: ArcTree -> ArcTree -> Bool sameTreeName treeLeft treeRight = treeName treeLeft == treeName treeRight

pathToTree :: [String] -> ArcForest pathToTree [] = [] pathToTree (name:subpath) = [ Node { rootLabel = ArcData { name = name } , subForest = pathToTree subpath } ]

prettyPrint' :: ArcForest -> [String] prettyPrint' [] = [] prettyPrint' (x:xs) = [name $ rootLabel $ x] ++ (map (" " ++) (prettyPrint' $ subForest x)) ++ prettyPrint' xs

prettyPrint :: ArcForest -> IO () prettyPrint forest = do forM_ (prettyPrint' forest) putStrLn

John Ky

30 Jan 30 Jan

12:18 p.m.

Hi apfelmus, Your code is fine, I like it. A minor hint is that mergeForest is a fold:

...

mergeForest = foldr merge [] Also, we have prettyPrint = putStr . unlines . prettyPrint' $ forest

Nice help on the simple things. I can't know, but it doesn't seem unreasonable that you intend to use

...

the ArcForest as a trie, i.e. an efficient implementation of a set of paths which allows to look up quickly whether a given path (here of type [String]) is in the set or not. So, we have

For a while, I was thinking what on Earth are you talking about, even while I continued reading the rest of the email, but it eventually clicked what you where trying to show me - which was something I didn't dare try until I got more familiar with Haskell. You're examples got me started on dealing with these sorts of complex tree structures (or tries as you call them). They made more sense as I spent more time reading and rereading them. Now, we can build up our finite map for paths:

...

data MapPath v = TriePath (Maybe v) (MapString (MapPath v))

because it (maybe) contains a value for the key '[] :: Path' and it (maybe) contains a map of paths that is organized by their first String element.

In my own code I had to diverge from your definition because for my needs, every node needed to contain a value (even if it was a default value). I plan to later add other numerical values to every node so that I can traverse them and do calculations that feed up and trickle down the tree. Now what about 'MapString v', how do we get this? Well, your

...

implementation corresponds to the choice

type MapString v = [(String,v)]

But in our case, we can apply the same trick again! We have 'String = [Char]' and given an implementation of

data MapChar v = ...

we can use exactly the same code from 'MapPath v' to implement 'MapString v'! (This reuse can be abstracted into a type class, but I'll not cover that here.) Of course, we need 'MapChar v' now. But yet, again we can think of Char as

Char ^= Int ^= [Bool]

where the '[Bool]' means the list of digits in binary representation. So, given 'MapBool v', we can implement 'MapChar v' with yet again the same code that we used for the preceding finite maps! Fortunately, the recursion ends here because a finite map for 'Bool'eans is just the pair

type MapBool v = (Maybe v, Maybe v)

That's quite beautiful, but I don't actually need to go that far. Question though, does taking the approach to this conclusion actually have real applications? In case your head does not yet hurt too much :), further information

...

about tries in Haskell and a detailed explanation of why the code above works, can be found in.

I did try to write my own insertWithInit called by fromPath (below), which I couldn't get working. Branches went missing from the result. I had so much trouble figuring where in the function I forgot to do something. At this point my head was about to explode, so I took a different approach using union called by fromList' (also below), which from my limited testing appears to work. I also find the union function incredibly easy to understand. I only hope I got it right. Thanks much, -John import qualified Data.Map as Map import Data.Map (Map) type Path k = [k] data Trie k v = Trie v (Map k (Trie k v)) deriving Show singleton :: v -> Trie k v singleton v = Trie v Map.empty insertWithInit :: (Ord k) => v -> (v -> v -> v) -> Path k -> v -> Trie k v -> Trie k v insertWithInit _ fInsert [] v (Trie v' m) = Trie (fInsert v v') m insertWithInit fInit fInsert (x:xs) v (Trie v' m) = Trie v' (Map.insertWith merge x subTrie m) where subTrie = insertWithInit fInit fInsert xs v (singleton fInit) merge = seq -- Left biased union union :: (Ord k) => Trie k v -> Trie k v -> Trie k v union (Trie k0 v0) (Trie k1 v1) = Trie k0 v where v = Map.unionWith union v0 v1 fromPath :: (Ord k) => v -> v -> Path k -> Trie k v fromPath initV v path = foldr addParent (singleton v) path where addParent step child = Trie initV (Map.fromList [(step, child)]) fromList :: [Path String] -> Trie String () fromList paths = foldl f (singleton ()) paths where f :: Trie String () -> Path String -> Trie String () f trie path = insertWithInit () (\x y -> ()) path () trie fromList' :: [Path String] -> Trie String () fromList' paths = foldl f (singleton ()) paths where f :: Trie String () -> Path String -> Trie String () f trie path = union trie (fromPath () () path) prettyPrint :: Trie String () -> IO () prettyPrint trie = putStrLn $ unlines $ prettyPrint' trie where prettyPrint' (Trie v m) = Map.foldWithKey f [] m f k a out = out ++ [k] ++ (map (" " ++) (prettyPrint' a))

apfelmus＠quantentunnel.de

5:39 p.m.

John Ky wrote:

...

I can't know, but it doesn't seem unreasonable that you intend to use

...
the ArcForest as a trie, i.e. an efficient implementation of a set of paths which allows to look up quickly whether a given path (here of type [String]) is in the set or not. So, we have

For a while, I was thinking what on Earth are you talking about, even while I continued reading the rest of the email, but it eventually clicked what you where trying to show me - which was something I didn't dare try until I got more familiar with Haskell.

Your examples got me started on dealing with these sorts of complex tree structures (or tries as you call them). They made more sense as I spent more time reading and rereading them.

:) I think that the important point is that one can think of the trees you had as things where one can insert and lookup (path,value)-pairs. This suggests a lot of useful functions like 'insert', 'union' and 'singleton' together with corresponding laws like insert k v m == union (singleton k v) m -- left biased union that are very handy for implementation.

...

Now what about 'MapString v', how do we get this? Well, your

...
implementation corresponds to the choice

type MapString v = [(String,v)]

But in our case, we can apply the same trick again!

[...]

That's quite beautiful, but I don't actually need to go that far. Question though, does taking the approach to this conclusion actually have real applications?

Well, besides providing an actual implementation of finite maps, it is also one of the fastest available. So while 'MapString v' and 'Data.Map String v' have the same purpose, 'MapString v' will be faster. But in your case, I wouldn't bother about this now, because if it turns out that you need to change the trie data structure again, the effort spend in optimization would be wasted. Moreover, changing from 'Data.Map' to 'MapString' or similar is very transparent and therefore can be done later because you only rely on the functions like 'unionWith' that are provided by both. Also, the trick that currently reduces the problem of a finite map for the list [k] to the problem of a finite map for k can be extended to decompose arbitrary types. To get a finite map for either one of the keys k1 or k2, you can take a pair of finite maps for the keys Either k1 k2 -> v ^= (k1 -> v, k2 -> v) Similarly, a finite map for pair of keys (k1,k2) can be encoded as a composition of finite maps (k1,k2) -> v ^= k1 -> (k2 -> v) The paper has more on this.

...

Now, we can build up our finite map for paths:

...
data MapPath v = TriePath (Maybe v) (MapString (MapPath v))

because it (maybe) contains a value for the key '[] :: Path' and it (maybe) contains a map of paths that is organized by their first String element.

In my own code I had to diverge from your definition because for my needs, every node needed to contain a value (even if it was a default value). I plan to later add other numerical values to every node so that I can traverse them and do calculations that feed up and trickle down the tree.

...

type Path k = [k]

data Trie k v = Trie v (Map k (Trie k v)) deriving Show

That's fine, adapt them recklessly to your task :)

...

I did try to write my own insertWithInit called by fromPath (below), which I couldn't get working. Branches went missing from the result. I had so much rouble figuring where in the function I forgot to do something.

I don't know an easy way to implement 'insertWithInit' that works with default elements. The problem is that one has to create the default nodes when inserting u insertWithInit v0 f ["a","b","c"] x $ / \ "a" "b" v w ==> u / \ "a" "b" v w / "b" v0 / "c" x while still guaranteeing that f only acts on the inserted value x. This somehow breaks the intuition of inserting a single (key,value)-pair. If you dispense with the 'With' part, you can outsource the creation of the default nodes to 'fromPath' and employ 'union' to implement 'insertInit': insertInit :: (Ord k) => v -> Path k -> v -> Trie k v -> Trie k v insertInit vInit path v m = union m (fromPath vInit v path) In fact, that's what you did for fromList'. If there is a globally known default element, you also have the option to actually stick with (Maybe v). For example, if you do calculations with 'Int', you can do vdefault = 5 withDefault :: Maybe Int -> Int withDefault Nothing = vdefault withDefault (Maybe x) = x instance Num (Maybe Int) where x + y = Just $ withDefault x + withDefault y ... One could also do with 'Trie k v = Trie (Either v) ...' but i don't think that it's really worth it.

...

At this point my head was about to explode, so I took a different approach using union called by fromList' (also below), which from my limited testing appears to work. I also find the union function incredibly easy to understand. I only hope I got it right.

...

union :: (Ord k) => Trie k v -> Trie k v -> Trie k v union (Trie k0 v0) (Trie k1 v1) = Trie k0 v where v = Map.unionWith union v0 v1

Well, once you found such a really concise function, it can only be correct :) While it is not relevant in your case, note that 'union' can be extended to be applicable with the recursive trick. But the extension suggest itself by noting that you used 'Map.unionWith' instead of 'Map.union'. So, you could do unionWith :: (Ord k) => (v -> v -> v) -> Trie k v -> Trie k v -> Trie k v unionWith f (Trie k0 v0) (Trie k1 v1) = Trie (f k0 k1) v where v = Map.unionWith (unionWith f) v0 v1 union = unionWith (\_ y -> y) Regards, apfelmus

6732

Age (days ago)

6734

Last active (days ago)

List overview

Download

7 comments

5 participants

participants (5)

apfelmus＠quantentunnel.de
Bulat Ziganshin
John Ky
Pixel
Tomasz Zielonka

Paths to tree

John Ky

Tomasz Zielonka

John Ky

Bulat Ziganshin

Pixel

apfelmus＠quantentunnel.de

John Ky

apfelmus＠quantentunnel.de

tags

participants (5)