
John Ky wrote:
I've written some code and was wondering if there was a better way to write it in terms of readability, brevity and/or efficiency.
The function concerned is pathsToForest which takes a list of paths (ie. [[String]]) and converts it into a tree structure where the individual nodes are the names in the path. Siblings with the same name are merged.
For instance:
prettyPrint $ mergeForest $ pathsToForest [["a", "b", "c"], ["c", "b", "a"], ["a", "b", "d"]]
gives:
a b d c c b a
Your code is fine, I like it. A minor hint is that mergeForest is a fold: mergeForest = foldr merge [] Also, we have prettyPrint = putStr . unlines . prettyPrint' $ forest So far, so good. Ironically, Data.Tree is not used much because it is so easy to invent your own tree type in Haskell. Most often, Data.Tree is too rigid and does not offer enough structure. Indeed, this is the case here, too: there is more structure behind your task than you may think at first. Let me show it. I can't know, but it doesn't seem unreasonable that you intend to use the ArcForest as a trie, i.e. an efficient implementation of a set of paths which allows to look up quickly whether a given path (here of type [String]) is in the set or not. So, we have type Path = [String] type SetPath = ... -- a set of paths, to be defined later -- checks whether a given path is in the trie member :: Path -> SetPath -> Bool where 'member' is the function akin to 'elem' on lists. The focus of your code is not on the membership test, but constructing the trie from a list of paths. You named the function 'pathsToForest', we will name it fromList :: [Path] -> SetPath Now, if we have a function insert :: Path -> SetPath -> SetPath that inserts a path into the set and we if we are given the notion of an empty set empty :: SetPath , we can write fromList = foldr insert empty Indeed, your function 'merge' corresponds to 'insert'. So let's find 'insert'. It will turn out that it is easier to generalize from SetPath to a storage that associates a value of type v with every path. This is called "finite map" (here, the keys are of type Path): data MapPath v = ... -- to be defined later -- the empty map contains no values empty :: MapPath v -- a map that contains a single key-value pair singleton :: Path -> v -> MapPath v -- lookup the value for a given Path lookup :: Path -> MapPath v -> Maybe v -- insert a value with given key into the finite map insert :: (v -> v -> v) -> Path -> v -> MapPath v -> MapPath v These functions are like the ones from the standard library Data.Map (btw, you can use this library for your task, too). Note that we had to generalize the type signature for 'insert' considerably because we have to specify what to do when there is already a value for the given key in the trie. This is what the argument of type '(v -> v -> v)' does: it takes the old and the new value and merges them somehow. We will need it, soon. Given the generalization, we can now simply put type SetPath = MapPath () and function 'fromList' will read fromList :: [Path] -> SetPath fromList = foldr (\path -> insert (\_ x -> x) path ()) empty It is time to think about coding 'lookup' and 'insert'. And here comes the trick: we know that 'Path = [String]' so let's assume that we already have a finite map for strings: data MapString v = ... emptyStr :: MapString v singletonStr :: String -> v -> MapString v lookupStr :: String -> MapString v -> Maybe v insertStr :: (v -> v -> v) -> String -> v -> MapString v -> MapString v Now, we can build up our finite map for paths: data MapPath v = TriePath (Maybe v) (MapString (MapPath v)) because it (maybe) contains a value for the key '[] :: Path' and it (maybe) contains a map of paths that is organized by their first String element. Let's write this down: lookup :: Path -> MapPath v -> Maybe v lookup [] (TriePath w _) = w lookup (x:xs) (TriePath _ m) = lookup xs (lookupStr x m) Coding 'insert' is slightly more involved and it may be very instructive to find out how your code and the following are essentially the same. We'll prepare us with 'singleton' which corresponds to your 'pathToTree' singleton :: Path -> v -> MapPath v singleton [] v = TriePath (Just v) emptyStr singleton (x:xs) v = TriePath Nothing $ singletonStr x (singleton xs v) Now, we can tackle 'insert'. Compared to your implementation, it is roughly equivalent to 'merge': insert _ p v m ^= merge (singleton p v) m We have insert :: (v -> v -> v) -> Path -> v -> MapPath v -> MapPath v insert f [] v (TriePath w m) = case w of Just v' -> TriePath (Just (f v v')) m Nothing -> TriePath (Just v) m insert f (x:xs) v (TriePath w m) = TriePath w (insertStr (insert f xs v) x empty m) (Coding 'empty' is left as an exercise. Also note that the case expression can be seen as a kind of 'insertMaybe :: (v -> v -> v) -> v -> Maybe v -> Maybe v). Now what about 'MapString v', how do we get this? Well, your implementation corresponds to the choice type MapString v = [(String,v)] But in our case, we can apply the same trick again! We have 'String = [Char]' and given an implementation of data MapChar v = ... we can use exactly the same code from 'MapPath v' to implement 'MapString v'! (This reuse can be abstracted into a type class, but I'll not cover that here.) Of course, we need 'MapChar v' now. But yet, again we can think of Char as Char ^= Int ^= [Bool] where the '[Bool]' means the list of digits in binary representation. So, given 'MapBool v', we can implement 'MapChar v' with yet again the same code that we used for the preceding finite maps! Fortunately, the recursion ends here because a finite map for 'Bool'eans is just the pair type MapBool v = (Maybe v, Maybe v) In case your head does not yet hurt too much :), further information about tries in Haskell and a detailed explanation of why the code above works, can be found in Ralf Hinze. Generalizing generalized tries. Journal of Functional Programming, 10(4):327-351, July 2000 http://www.informatik.uni-bonn.de/~ralf/publications/GGTries.ps.gz Regards, apfelmus
import Data.Tree import Control.Monad
data ArcData = ArcData { name :: String } deriving Show
type ArcTree = Tree ArcData type ArcForest = Forest ArcData
pathsToForest :: [[String]] -> ArcForest pathsToForest paths = mergeForest $ concat $ map pathToTree paths
mergeForest :: ArcForest -> ArcForest mergeForest [] = [] mergeForest (x:xs) = merge x (mergeForest xs) where merge :: ArcTree -> ArcForest -> ArcForest merge tree [] = [tree] merge tree (y:ys) = if sameTreeName tree y then merge tree { subForest = mergeForest ((subForest tree) ++ (subForest y)) } ys else (y:merge tree ys)
treeName :: ArcTree -> String treeName tree = name $ rootLabel $ tree
sameTreeName :: ArcTree -> ArcTree -> Bool sameTreeName treeLeft treeRight = treeName treeLeft == treeName treeRight
pathToTree :: [String] -> ArcForest pathToTree [] = [] pathToTree (name:subpath) = [ Node { rootLabel = ArcData { name = name } , subForest = pathToTree subpath } ]
prettyPrint' :: ArcForest -> [String] prettyPrint' [] = [] prettyPrint' (x:xs) = [name $ rootLabel $ x] ++ (map (" " ++) (prettyPrint' $ subForest x)) ++ prettyPrint' xs
prettyPrint :: ArcForest -> IO () prettyPrint forest = do forM_ (prettyPrint' forest) putStrLn