
On Thu, 08 Jan 2009 21:28:27 minh thu wrote:
Hi,
I'd like to process some kind of graph data structure, say something like
data DS = A [DS] | B DS DS | C.
Graphs in funtional languages aren't usually represented in this sort of manner. Trees are fine to represent like that as they're acyclic and have exactly one parent for each node but for graphs it's much more difficult. Say that you have a graph with directed connections like this: 0 -> 1 1 -> 2 2 -> 3 1 -> 3 3 -> 4 Now you want to alter node 4. Node 3 has to be updated to point to the new version of 4, node 1 has to be changed to point to the new version of 3, node 2 has to be changed to point to the new version of node 3, then node 1 has to be changed again to point to the new version of 2, then finally 0 can be changed to point to the new version of 1 and returned. There is no simple way using this representation to handle that double-update to node 1, or to handle disconnected or cyclic graphs. Updates are extremely difficult since Haskell data structures are not mutable and have no concept of identity. The approach of treating nodes as structures with pointers to each other cannot be cleanly and efficiently implemented in an immutable fashion. It only really makes sense in a stateful, imperative context. An approach that suits functional languages better is to store a flat structure listing the edges leaving each node. This, I believe, is the approach taken by Haskell's main graph library, FGL (http://hackage.haskell.org/cgi-bin/hackage-scripts/package/fgl). You would now have something like: data MyNode nv = MyNode {nodeId::Int, nodeValue::nv} data MyEdge ev = MyEdge {edgeDestination::Int, edgeValue::ev} data MyGraph nv ev = MyGraph { maxNode :: Int, nodes :: (Map Int nv), edges :: (Map Int [MyEdge ev])} emptyGraph :: MyGraph nv ev emptyGraph = MyGraph 0 (Data.Map.empty) (Data.Map.empty) getNode :: MyGraph nv ev -> Int -> Maybe (MyNode nv) getNode g id = ((nodes g) `lookup` id) >>= (\v -> MyNode id v) getEdgesLeaving :: MyGraph nv ev -> Int -> [MyEdge ev] getEdgesLeaving g id = fromMaybe [] ((edges g) `lookup` id) addNode :: nv -> MyGraph nv ev -> (Int, MyGraph nv ev) addNode val g = (maxNode newGraph, newGraph) where newNodeId = (maxNode g) + 1 newGraph = MyGraph newNodeId (insert newNodeId val (nodes g)) (edges g) ... and so on. (This is all totally untested - use at your own peril.) Each node in the graph has a unique identifying number, issued in sequence using maxNode as a counter. This makes identifying cycles easy. The nodes map contains the value for each node based on its id. The edges map contains a list of links from each node to others in the graph. Finding links entering a node is quite expensive - if you need to do this often then maintaining a second list of edges entering each node would speed it up. Each node and each edge can have a custom data structure attached. New nodes and edges can be added without having to modify references elsewhere, nodes have a distinct identity given by the associated Int and the graph is immutable - operations on it produce modified copies. Cheers, Tim