
Graphviz (http://graphviz.org/) has the option to convert provided Dot code for visualising a graph into a canonical form. For example, take the sample Dot code: ---- | digraph foo { | a [color=blue]; | subgraph cluster_bar { | a [label="hi!"] | b [color=blue, label="bye!"]; | b -> c; | } | a -> b; | a -> c; | c -> d; | d [color=red] `---- Calling "dot -Tcanon" on this produces: ,---- | digraph foo { | node [label="\N"]; | subgraph cluster_bar { | a [label="hi!", color=blue]; | b [label="bye!", color=blue]; | b -> c; | a -> b; | a -> c; | } | d [color=red]; | c -> d; | } `---- Note in particular the following: * The addition of the top-level "node [label="\N"]" attribute; this happens automatically even if all nodes have their own custom label. * The two `a' nodes are merged into the inner-most cluster/subgraph that they are defined in (since they represent the same node). * The attributes for `b' are re-ordered. * The "a -> b" and "a -> c" edges are shifted into the cluster, since all of the related nodes are in that cluster (note that the `c' node is implicitly defined there since an edge involving it is defined there). * The "c -> d" edge is shifted to being _after_ the definition of the `d' node (in fact, in each grouping all nodes are defined before edges). I've recently thought up a way that I can duplicate this functionality (in terms of what it does, not necessarily the actual output) in my graphviz library (http://hackage.haskell.org/package/graphviz), however I'm not sure how closely to follow what Graphviz does. What would the community prefer out of the following (including a mixture, such as options 2 and 3): 1) Match Graphviz's output as much as possible (note that the ordering of the attributes won't happen this way, since I'll be sorting them alphabetically rather than working out Graphviz's arcane rules). 1a) As with 1), but don't worry about defining extra attributes such as "node [label="\N"]". 2) Explicitly define nodes that are only mentioned in edges (e.g. actually define a `c' node, even if it doesn't have any extra attributes). Note that this already happens if such an edge is specified inside a different cluster than the already known node is in; in that case then the stand-alone node is defined in the cluster its edge is defined in, and the actual edge is moved into the outer context that combines the two clusters. 2a) As with 2), but define all edges in the overall graph rather than grouping them internally inside clusters, etc. 3) Group common attributes together as much as possible (e.g. inside the bar cluster, define another subgroup which has a top-level attribute "node [color=blue]" and define the `a' and `b' nodes in there). Not sure how well this will work with edges though, and is probably not possible in conjunction with option 2a). This is a bit tricky and subtle: if `a' and `b' are already in such a subgraph, but an edge specifying one of them is defined _before_ the subgraph is defined, then that node will have an explicit color attribute defined of "" (i.e. empty string), though I would classify this as a bug. 4) As an alternate to option 3), remove all explicit top-level node and edge attributes and move them all to the affected node and edge definitions themselves; this may require option 2). I'm also willing to hear and consider other possible variants; however, I'm only going to code (at most) one. -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com