Canonical Graphviz code

Graphviz (http://graphviz.org/) has the option to convert provided Dot code for visualising a graph into a canonical form. For example, take the sample Dot code: ---- | digraph foo { | a [color=blue]; | subgraph cluster_bar { | a [label="hi!"] | b [color=blue, label="bye!"]; | b -> c; | } | a -> b; | a -> c; | c -> d; | d [color=red] `---- Calling "dot -Tcanon" on this produces: ,---- | digraph foo { | node [label="\N"]; | subgraph cluster_bar { | a [label="hi!", color=blue]; | b [label="bye!", color=blue]; | b -> c; | a -> b; | a -> c; | } | d [color=red]; | c -> d; | } `---- Note in particular the following: * The addition of the top-level "node [label="\N"]" attribute; this happens automatically even if all nodes have their own custom label. * The two `a' nodes are merged into the inner-most cluster/subgraph that they are defined in (since they represent the same node). * The attributes for `b' are re-ordered. * The "a -> b" and "a -> c" edges are shifted into the cluster, since all of the related nodes are in that cluster (note that the `c' node is implicitly defined there since an edge involving it is defined there). * The "c -> d" edge is shifted to being _after_ the definition of the `d' node (in fact, in each grouping all nodes are defined before edges). I've recently thought up a way that I can duplicate this functionality (in terms of what it does, not necessarily the actual output) in my graphviz library (http://hackage.haskell.org/package/graphviz), however I'm not sure how closely to follow what Graphviz does. What would the community prefer out of the following (including a mixture, such as options 2 and 3): 1) Match Graphviz's output as much as possible (note that the ordering of the attributes won't happen this way, since I'll be sorting them alphabetically rather than working out Graphviz's arcane rules). 1a) As with 1), but don't worry about defining extra attributes such as "node [label="\N"]". 2) Explicitly define nodes that are only mentioned in edges (e.g. actually define a `c' node, even if it doesn't have any extra attributes). Note that this already happens if such an edge is specified inside a different cluster than the already known node is in; in that case then the stand-alone node is defined in the cluster its edge is defined in, and the actual edge is moved into the outer context that combines the two clusters. 2a) As with 2), but define all edges in the overall graph rather than grouping them internally inside clusters, etc. 3) Group common attributes together as much as possible (e.g. inside the bar cluster, define another subgroup which has a top-level attribute "node [color=blue]" and define the `a' and `b' nodes in there). Not sure how well this will work with edges though, and is probably not possible in conjunction with option 2a). This is a bit tricky and subtle: if `a' and `b' are already in such a subgraph, but an edge specifying one of them is defined _before_ the subgraph is defined, then that node will have an explicit color attribute defined of "" (i.e. empty string), though I would classify this as a bug. 4) As an alternate to option 3), remove all explicit top-level node and edge attributes and move them all to the affected node and edge definitions themselves; this may require option 2). I'm also willing to hear and consider other possible variants; however, I'm only going to code (at most) one. -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

On Mon, 05 Jul 2010 19:26:34 -0700, Ivan Miljenovic
Graphviz (http://graphviz.org/) has the option to convert provided Dot code for visualising a graph into a canonical form. For example, take the sample Dot code: [snip] I've recently thought up a way that I can duplicate this functionality (in terms of what it does, not necessarily the actual output) in my graphviz library (http://hackage.haskell.org/package/graphviz), however I'm not sure how closely to follow what Graphviz does.
There doesn't seem to be a clear definition of "canon" output is, and the implication in the documentation is that this mode might better have been named "pprint" (thus my hesitance to refer to it as "canonical form"). Based on this, I'd suggest that you don't need strict adherence to graphviz. It generally appears that "canon" output follows the two guidelines of "Atomic" and "Minimal": * "Atomic" meaning specify all attributes that apply to an element instead of relying on inheritance * "Minimal" meaning don't specify too much - No default attributes - Each element (edge or node) on its own, no combining (see also "Atomic") - Elements only in necessary scope (thus move edges/nodes that only appear in a subgraph context into that subgraph) Based on this I would anticipate your options (1), (2), and (4) below. The advantages of "canon" form would seem to be that removing, or adding elements to the graph is as "safe" as possible because the impact to the graph is only as explicitly stated. For example, removal of a node requires only removing that node statement and any edge statement specifying that node---no concern about removing it from multi-node or multi-edge elements---and that the locality of the removal is fairly evident (i.e. which subgraph the element is part of). Likewise adding an element means that the element will appear pretty much as-specified without inheriting attributes. Interestingly you could envision providing a dual of this mode named "compact". The compact mode would attempt to specify the graph as compactly as possible, using inheritance for attributes and multi-element statements (interestingly it's an attribute-oriented approach to the graph as opposed to the element-oriented approach of "canon" form). -KQ
What would the community prefer out of the following (including a mixture, such as options 2 and 3):
1) Match Graphviz's output as much as possible (note that the ordering of the attributes won't happen this way, since I'll be sorting them alphabetically rather than working out Graphviz's arcane rules).
1a) As with 1), but don't worry about defining extra attributes such as "node [label="\N"]".
2) Explicitly define nodes that are only mentioned in edges (e.g. actually define a `c' node, even if it doesn't have any extra attributes). Note that this already happens if such an edge is specified inside a different cluster than the already known node is in; in that case then the stand-alone node is defined in the cluster its edge is defined in, and the actual edge is moved into the outer context that combines the two clusters.
2a) As with 2), but define all edges in the overall graph rather than grouping them internally inside clusters, etc.
3) Group common attributes together as much as possible (e.g. inside the bar cluster, define another subgroup which has a top-level attribute "node [color=blue]" and define the `a' and `b' nodes in there). Not sure how well this will work with edges though, and is probably not possible in conjunction with option 2a). This is a bit tricky and subtle: if `a' and `b' are already in such a subgraph, but an edge specifying one of them is defined _before_ the subgraph is defined, then that node will have an explicit color attribute defined of "" (i.e. empty string), though I would classify this as a bug.
4) As an alternate to option 3), remove all explicit top-level node and edge attributes and move them all to the affected node and edge definitions themselves; this may require option 2).
I'm also willing to hear and consider other possible variants; however, I'm only going to code (at most) one.
-- -KQ

"Kevin Quick"
On Mon, 05 Jul 2010 19:26:34 -0700, Ivan Miljenovic
wrote: Graphviz (http://graphviz.org/) has the option to convert provided Dot code for visualising a graph into a canonical form. For example, take the sample Dot code: [snip] I've recently thought up a way that I can duplicate this functionality (in terms of what it does, not necessarily the actual output) in my graphviz library (http://hackage.haskell.org/package/graphviz), however I'm not sure how closely to follow what Graphviz does.
There doesn't seem to be a clear definition of "canon" output is, and the implication in the documentation is that this mode might better have been named "pprint" (thus my hesitance to refer to it as "canonical form"). Based on this, I'd suggest that you don't need strict adherence to graphviz.
Even better; after sending that email out yesterday I wrote a Dot-graph by hand and it had some very weird characteristics; try to guess what this looks like: ,---- | digraph { | a; | subgraph cluster { | a -> b; | } | } `---- Note that it isn't the location of the definition of `a' that's the problem; it's the definition of the location of the edge: edges should be defined in the innermost context where _both_ nodes are. As such, I probably won't be implementing the canonical form stuff any time soon in graphviz, and might need to examine Graphviz's source code to compare it and ensure that it's at least similar :s -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

On Tue, Jul 6, 2010 at 7:15 PM, Ivan Lazar Miljenovic
As such, I probably won't be implementing the canonical form stuff any time soon in graphviz, and might need to examine Graphviz's source code to compare it and ensure that it's at least similar :s
I'm sorry for being silly, but what's the motivation of having this canonic form? =) Cheers! -- Felipe.

Felipe Lessa
On Tue, Jul 6, 2010 at 7:15 PM, Ivan Lazar Miljenovic
wrote: As such, I probably won't be implementing the canonical form stuff any time soon in graphviz, and might need to examine Graphviz's source code to compare it and ensure that it's at least similar :s
I'm sorry for being silly, but what's the motivation of having this canonic form? =)
A few things come to mind: * Easier to reason about, as the various items of different types (global attributes, subgraphs/clusters, nodes and edges) are grouped together rather than being all mixed up; compare the layout of Data.GraphViz.Types.DotGraph to Data.GraphViz.Types.Generalised.GDotGraph. * A non-canonical graph can have the same node specified several times with different attributes; in the canonical form they are all merged into one (see the `a' node in the examples in my original email). * Less ambiguity: in the email I just sent out, I had a graph which I expected `a' to be outside the box and `b' to be inside it; the canonical representation of this explicitly puts both of them inside the cluster such that that ambiguity is no longer present. -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

On Tue, Jul 6, 2010 at 7:53 PM, Ivan Lazar Miljenovic
Felipe Lessa
writes: I'm sorry for being silly, but what's the motivation of having this canonic form? =)
A few things come to mind:
* Easier to reason about, [...] * Less ambiguity: [...]
So you want to do some post-processing to the Dot graph inside Haskell-land? Or is it just a pretty printer? I mean, Dot will accept both and produce the same result (by definition), so if you just wanted to draw it then there wouldn't be any difference. Cheers! :) -- Felipe.

Felipe Lessa
On Tue, Jul 6, 2010 at 7:53 PM, Ivan Lazar Miljenovic
wrote: Felipe Lessa
writes: I'm sorry for being silly, but what's the motivation of having this canonic form? =)
A few things come to mind:
* Easier to reason about, [...] * Less ambiguity: [...]
So you want to do some post-processing to the Dot graph inside Haskell-land? Or is it just a pretty printer? I mean, Dot will accept both and produce the same result (by definition), so if you just wanted to draw it then there wouldn't be any difference.
It's not just to draw it, it's for when you import some Dot code and then want to edit it (add some extra attributes, etc.). And to an extent, it's also a pretty-printer (though graphviz's pretty-printer will still probably stick to using "dot -Tcanon" as it's using the pretty library, which doesn't support infinite-length lines with indenting). -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com
participants (4)
-
Felipe Lessa
-
Ivan Lazar Miljenovic
-
Ivan Miljenovic
-
Kevin Quick