darcs patch dependencies in dot format

Hi all! Yesterday I wrote a little tool to output the dependencies of darcs patches in dot format. The hardest part was to wrap my head around the darcs API and find a way to let it compute the patch dependencies. I don't know, if I got it right, but it looks correct at first glance. Here is the code: https://patch-tag.com/r/shahn/darcs2dot To use it just execute the program in a darcs repo and it will output a dot graph to stdout. The graph does not contain all dependencies, but the transitive reduction. The patch names are truncated at 15 characters. And here is an example graph: http://open-projects.net/~shahn/patchDeps.pdf These are the patch dependencies of one of my darcs repos (https://patch-tag.com/r/shahn/hate). Some observations I made: * There are two completely separate subgraphs. One subgraph seems to be for the testsuite, the other for the actual code. This means, the two don't depend on each other and could easily be put in two distinct repos. * There is a "re-implementation" patch with a lot of incoming and outgoing edges. (Which makes sense.) * There are some sequences of patches (e.g. these four "allow ..." patches in the upper left corner) that seem to contain related patches. * darcs's patch system is awesome! Any comments or suggestions? Cheers, Sönke

Truly amazing! I wonder it would fare with larger repositories. =)
Cheers,
--
Felipe – enviado do meu Galaxy Tab.
Em 12/05/2012 09:52, "Sönke Hahn"
Hi all!
Yesterday I wrote a little tool to output the dependencies of darcs patches in dot format. The hardest part was to wrap my head around the darcs API and find a way to let it compute the patch dependencies. I don't know, if I got it right, but it looks correct at first glance.
Here is the code:
https://patch-tag.com/r/shahn/darcs2dot
To use it just execute the program in a darcs repo and it will output a dot graph to stdout. The graph does not contain all dependencies, but the transitive reduction. The patch names are truncated at 15 characters.
And here is an example graph:
http://open-projects.net/~shahn/patchDeps.pdf
These are the patch dependencies of one of my darcs repos (https://patch-tag.com/r/shahn/hate). Some observations I made:
* There are two completely separate subgraphs. One subgraph seems to be for the testsuite, the other for the actual code. This means, the two don't depend on each other and could easily be put in two distinct repos. * There is a "re-implementation" patch with a lot of incoming and outgoing edges. (Which makes sense.) * There are some sequences of patches (e.g. these four "allow ..." patches in the upper left corner) that seem to contain related patches. * darcs's patch system is awesome!
Any comments or suggestions?
Cheers, Sönke
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 05/13/2012 03:13 AM, Felipe Almeida Lessa wrote:
Truly amazing! I wonder it would fare with larger repositories. =)
It does not scale well. I have not looked into optimization at all. For example the algorithm to compute the transitive reduction is very naive and brute force. Somehow related questions are: What am I going to do with a dot-graph, that has more than 500 vertices? Is there an intelligent way to reduce the graph?

On 13/05/12 14:55, Sönke Hahn wrote:
Somehow related questions are: What am I going to do with a dot-graph, that has more than 500 vertices? Is there an intelligent way to reduce the graph?
Setting the `concentrate' parameter to true helps, but dot does not work well with massive graphs. Francesco.

On 13/05/12 15:13, Francesco Mazzoli wrote:
On 13/05/12 14:55, Sönke Hahn wrote:
Somehow related questions are: What am I going to do with a dot-graph, that has more than 500 vertices? Is there an intelligent way to reduce the graph?
Setting the `concentrate' parameter to true helps, but dot does not work well with massive graphs.
Actually, "neato" doesn't work well with massive graph - and neither do the other algorithms provided by GraphViz. dot is just the file format. I found Gephi (https://gephi.org/) quite good when I had to visualize big graphs, and it supports dot files so you can try it out easily. Francesco.

On 05/13/2012 04:16 PM, Francesco Mazzoli wrote:
I found Gephi (https://gephi.org/) quite good when I had to visualize big graphs, and it supports dot files so you can try it out easily.
gephi looks very interesting, thanks.

Sönke Hahn
On 05/13/2012 03:13 AM, Felipe Almeida Lessa wrote:
Truly amazing!
Yes, nice work!
I wonder it would fare with larger repositories. =)
It does not scale well. [...] Somehow related questions are: What am I going to do with a dot-graph, that has more than 500 vertices? Is there an intelligent way to reduce the graph?
Lacking a solution for the problem of drawing large graphs, making the graph smaller might be the second choice. :-) One option might be to only track dependencies back to a specified tag? Or between tags? -k -- If I haven't seen further, it is by standing in the footprints of giants

On 5/12/12 5:52 AM, Sönke Hahn wrote:
Hi all!
Yesterday I wrote a little tool to output the dependencies of darcs patches in dot format. The hardest part was to wrap my head around the darcs API and find a way to let it compute the patch dependencies. I don't know, if I got it right, but it looks correct at first glance.
Here is the code:
https://patch-tag.com/r/shahn/darcs2dot
To use it just execute the program in a darcs repo and it will output a dot graph to stdout. The graph does not contain all dependencies, but the transitive reduction. The patch names are truncated at 15 characters.
And here is an example graph:
http://open-projects.net/~shahn/patchDeps.pdf
These are the patch dependencies of one of my darcs repos (https://patch-tag.com/r/shahn/hate). Some observations I made:
* There are two completely separate subgraphs. One subgraph seems to be for the testsuite, the other for the actual code. This means, the two don't depend on each other and could easily be put in two distinct repos. * There is a "re-implementation" patch with a lot of incoming and outgoing edges. (Which makes sense.) * There are some sequences of patches (e.g. these four "allow ..." patches in the upper left corner) that seem to contain related patches. * darcs's patch system is awesome!
Any comments or suggestions?
Cheers, Sönke
That's nifty, thanks for sharing it. Cc'ing darcs-user. I tried it in a few repos, like so: $ darcs2dot > patchdeps.dot && dot patchdeps.dot -Tpdf -o patchdeps.pdf In a 200-patch repo it ran quickly, giving: http://joyful.com/darcsden/simon/darcsum/raw/patchdeps.pdf In a 2000-patch repo it took 22 hours: http://joyful.com/darcsden/simon/hledger/raw/patchdeps.pdf In the 10000-patch Darcs repo I killed it after a few hours, but here are the early dependencies (up to tag 1.0.0rc2): http://joyful.com/darcsden/simon/darcs-sm/raw/patchdeps.pdf It should escape double-quotes in patch names, I did that manually. I wonder how to simplify the graph further. Perhaps just the dependencies of tags would be interesting in some repos ? Best, -Simon

Hi, Am Montag, den 14.05.2012, 07:21 -0700 schrieb Simon Michael:
I wonder how to simplify the graph further. Perhaps just the dependencies of tags would be interesting in some repos ?
I think you’d want to collapse [a]→[b] to [a,b], if no other edges leave a or reach b. This way you only get arrows at branching and merging points. Greetings, Joachim -- Joachim "nomeata" Breitner mail@joachim-breitner.de | nomeata@debian.org | GPG: 0x4743206C xmpp: nomeata@joachim-breitner.de | http://www.joachim-breitner.de/

On 05/14/2012 04:21 PM, Simon Michael wrote:
In a 2000-patch repo it took 22 hours: http://joyful.com/darcsden/simon/hledger/raw/patchdeps.pdf
:)
It should escape double-quotes in patch names, I did that manually.
That should be fixed (in the repo).

On 5/12/12 8:52 AM, Sönke Hahn wrote:
Any comments or suggestions?
Cabalize it and release it on Hackage. But especially the cabalization part :) You should probably farm out the toDot rendering to one of the libraries that focuses on that[1], since they'll have focused on the efficiency issues--- or if they haven't, then you can contribute improvements there, helping everyone win. In particular, you're using Strings which is a notorious performance sink. Using Text or ByteStrings would be far better. Also, have you compared your transitive reduction to just outputting the whole graph and then using `tred`? The latter approach has the distinct downside of needing to serialize the whole graph; but it could still be a win unless you intend to implement similar algorithms yourself. The smart algorithms do *much* better than brute force. And of course it'd be nice to be able to pass arguments to the program in order to filter and otherwise manipulate the resulting graph. A lot of that can be done by special programs which only know about the Dot language (e.g., tred), so you should only focus on things which aren't captured by the Dot language or are otherwise only knowable by querying Darcs. [1] Like graphviz or language-dot: http://hackage.haskell.org/package/graphviz http://hackage.haskell.org/package/language-dot Though it doesn't look like those are used by the various other foo2dot programs on Hackage: http://hackage.haskell.org/package/hs2dot http://hackage.haskell.org/package/prof2dot http://hackage.haskell.org/package/scons2dot http://hackage.haskell.org/package/vacuum-cairo http://hackage.haskell.org/package/vacuum-opengl So perhaps there's some issue with those libraries... -- Live well, ~wren

On 16 May 2012 19:43, wren ng thornton
On 5/12/12 8:52 AM, Sönke Hahn wrote:
Any comments or suggestions?
Cabalize it and release it on Hackage. But especially the cabalization part :)
You should probably farm out the toDot rendering to one of the libraries that focuses on that[1], since they'll have focused on the efficiency issues--- or if they haven't, then you can contribute improvements there, helping everyone win. In particular, you're using Strings which is a notorious performance sink. Using Text or ByteStrings would be far better.
Also, have you compared your transitive reduction to just outputting the whole graph and then using `tred`? The latter approach has the distinct downside of needing to serialize the whole graph; but it could still be a win unless you intend to implement similar algorithms yourself. The smart algorithms do *much* better than brute force.
I would like to point out that graphviz has a native implementation of tred (well, analogous rather than exact re-implementation). I also haven't joined this discussion before now, but some of the reduction algorithms in Graphalyze [1] (as used in SourceGraph [2]) might be applicable, though I admit they're not the best possible implementations [1]: http://hackage.haskell.org/package/Graphalyze [2]: http://hackage.haskell.org/package/SourceGraph
And of course it'd be nice to be able to pass arguments to the program in order to filter and otherwise manipulate the resulting graph. A lot of that can be done by special programs which only know about the Dot language (e.g., tred), so you should only focus on things which aren't captured by the Dot language or are otherwise only knowable by querying Darcs.
[1] Like graphviz or language-dot:
http://hackage.haskell.org/package/graphviz http://hackage.haskell.org/package/language-dot
Though it doesn't look like those are used by the various other foo2dot programs on Hackage:
http://hackage.haskell.org/package/hs2dot http://hackage.haskell.org/package/prof2dot http://hackage.haskell.org/package/scons2dot http://hackage.haskell.org/package/vacuum-cairo http://hackage.haskell.org/package/vacuum-opengl
So perhaps there's some issue with those libraries...
-- Live well, ~wren
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com http://IvanMiljenovic.wordpress.com

On 16 May 2012 19:43, wren ng thornton
wrote:
You should probably farm out the toDot rendering to one of the libraries that focuses on that[1], since they'll have focused on the efficiency issues--- or if they haven't, then you can contribute improvements there, helping everyone win. In particular, you're using Strings which is a notorious performance sink. Using Text or ByteStrings would be far better.
I'm not sure swapping to Text or ByteStrings make be much great shakes for this. If you are generating huge files, where it would count - then the files are going to be a real problem for Graphviz to render (unless Graphviz has seen some optimization recently...). That said, I would recommend Sönke uses a pretty print library rather than Printf as using the former makes for much more idiomatic for Haskell and generally performs well enough for "generational" activities even if it uses Strings internally. Best wishes Stephen

On 17 May 2012 03:31, Stephen Tetley
On 16 May 2012 19:43, wren ng thornton
wrote: You should probably farm out the toDot rendering to one of the libraries that focuses on that[1], since they'll have focused on the efficiency issues--- or if they haven't, then you can contribute improvements there, helping everyone win. In particular, you're using Strings which is a notorious performance sink. Using Text or ByteStrings would be far better.
I'm not sure swapping to Text or ByteStrings make be much great shakes for this. If you are generating huge files, where it would count - then the files are going to be a real problem for Graphviz to render (unless Graphviz has seen some optimization recently...).
I found with graphviz that switching to Text gave two advantages: 1) Easier to require UTF-8 usage 2) Printing and parsing was faster In part, the speed-up came from switching to wl-pprint-text rather than pretty (the wl-pprint method of pretty-printing seems to have some efficiency improvements in how concatenation is done over pretty) and the Text backend for polyparse lets you use manySatisfy rather than (many . satisfy) which is more efficient. But even in my initial pseudo-port (going via String a lot, etc.) there was a slight speed-up. So I think there is still a valid reason to consider using Text (or possibly ByteString, but I think that has potential issues).
That said, I would recommend Sönke uses a pretty print library rather than Printf as using the former makes for much more idiomatic for Haskell and generally performs well enough for "generational" activities even if it uses Strings internally.
Best wishes
Stephen
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com http://IvanMiljenovic.wordpress.com

On 05/16/2012 11:43 AM, wren ng thornton wrote:
On 5/12/12 8:52 AM, Sönke Hahn wrote:
Any comments or suggestions?
Cabalize it and release it on Hackage. But especially the cabalization part :)
Both done: http://hackage.haskell.org/package/darcs2dot

On 05/16/2012 11:43 AM, wren ng thornton wrote:
Also, have you compared your transitive reduction to just outputting the whole graph and then using `tred`? The latter approach has the distinct downside of needing to serialize the whole graph; but it could still be a win unless you intend to implement similar algorithms yourself. The smart algorithms do *much* better than brute force.
I've done some profiling and found that the executable is spending about half of its time executing my brute force graph algorithm. So doing something smarter here (like using "tred") seems like a good idea. The bad news is that without running my inefficient algorithm the executable still doesn't scale well. Maybe there is a better way to let the darcs library compute the patch dependencies.
And of course it'd be nice to be able to pass arguments to the program in order to filter and otherwise manipulate the resulting graph. A lot of that can be done by special programs which only know about the Dot language (e.g., tred), so you should only focus on things which aren't captured by the Dot language or are otherwise only knowable by querying Darcs.
Sounds reasonable. Command line options would be nice.
participants (10)
-
Felipe Almeida Lessa
-
Francesco Mazzoli
-
Ivan Lazar Miljenovic
-
Joachim Breitner
-
Ketil Malde
-
Simon Michael
-
Soenke Hahn
-
Stephen Tetley
-
Sönke Hahn
-
wren ng thornton