Imperative graph algorithm, (tail?) recursion, and speed

Suppose I have a general (not a tree) directed graph, and I would like to find all the "pure descendents" of a node N -- that is, all nodes M for which (1) N is an ancestor of M, and (2) every partial or maximal sequence of predecessors starting from M either consists entirely of descendents of N, or else includes N itself. I have an imperative algorithm (below) for doing that. I want to know how hard it would be to implement in Haskell. It's complex, but basically it sets up a few lists like "members of this list have had all their descendents processed", "members of this list have been processed but their descendents have not been", "members of this list have not been processed at all", etc. Then I iterate through the graph, moving nodes from one collection to the other until the "to be processed" list is empty. I would like a function Graph -> Graph that does that and does not need to rely on a monad. The only way I can think of involves a few mutually recursive functions, each of which passes not only the original graph of interest, but all the lists described in the previous paragraph, around to the others. I don't understand Haskell's evaluation strategy well enough to anticipate the result, but my vague sense is that a lot of copies of the same data would be floating around, ruining the speed of it. Python code for the algorithm: def descendedOnlyFrom(gset): "All Gnodes n for which every ancestry of n includes a Gnode in gset (which can be a set or a list)." # For efficiency improvement ideas and verbose comments, # see treeSearch/all.py/node.descendedOnlyFrom if 1: # variables # "pure" = "descended only from the calling Glist" pe = set(gset) # determined Pure, yet to Explore children of pf = set() # determined Pure, Finished exploring children of pb = set(gset) # determined Pure, Both kinds: pe | pf ud = set() # purity UnDetermined # descended from root, but might have parents outside of pb udp = {} # "Parents of the UnDetermined" # This is a dictionary, from gnodes to sets of gnodes. # The other four variables are sets. while pe: while pe: k = 1 i = pe.pop(); pf.add(i) for c in set(i._children()): if c in pb | ud: continue # If already handled, do not repeat. if set(c._parents()) <= pb: pe.add(c); pb.add(c) else: ud.add(c); udp[c] = set(c._parents()) - pb for i in ud: ipf = udp[i] & pb # (i)'s (p)arents newly (f)ound in pb if ipf: udp[i] -= ipf if udp[i]==set(): ud.remove(i); del( udp[i] ) pe.add(i); pb.add(i) break return pb

Have you looked at fgl? I would also read the paper that went with the
library. It has some nice notions of graph decomposition, and reading the
source for bfs et al should give you a good idea of how to mix in state
like 'what have I visited'.
Ben
On Fri, 6 Mar 2015 2:46 am Jeffrey Brown
Suppose I have a general (not a tree) directed graph, and I would like to find all the "pure descendents" of a node N -- that is, all nodes M for which (1) N is an ancestor of M, and (2) every partial or maximal sequence of predecessors starting from M either consists entirely of descendents of N, or else includes N itself.
I have an imperative algorithm (below) for doing that. I want to know how hard it would be to implement in Haskell. It's complex, but basically it sets up a few lists like "members of this list have had all their descendents processed", "members of this list have been processed but their descendents have not been", "members of this list have not been processed at all", etc. Then I iterate through the graph, moving nodes from one collection to the other until the "to be processed" list is empty.
I would like a function Graph -> Graph that does that and does not need to rely on a monad. The only way I can think of involves a few mutually recursive functions, each of which passes not only the original graph of interest, but all the lists described in the previous paragraph, around to the others. I don't understand Haskell's evaluation strategy well enough to anticipate the result, but my vague sense is that a lot of copies of the same data would be floating around, ruining the speed of it.
Python code for the algorithm: def descendedOnlyFrom(gset): "All Gnodes n for which every ancestry of n includes a Gnode in gset (which can be a set or a list)." # For efficiency improvement ideas and verbose comments, # see treeSearch/all.py/node.descendedOnlyFrom if 1: # variables # "pure" = "descended only from the calling Glist" pe = set(gset) # determined Pure, yet to Explore children of pf = set() # determined Pure, Finished exploring children of pb = set(gset) # determined Pure, Both kinds: pe | pf ud = set() # purity UnDetermined # descended from root, but might have parents outside of pb udp = {} # "Parents of the UnDetermined" # This is a dictionary, from gnodes to sets of gnodes. # The other four variables are sets. while pe: while pe: k = 1 i = pe.pop(); pf.add(i) for c in set(i._children()): if c in pb | ud: continue # If already handled, do not repeat. if set(c._parents()) <= pb: pe.add(c); pb.add(c) else: ud.add(c); udp[c] = set(c._parents()) - pb for i in ud: ipf = udp[i] & pb # (i)'s (p)arents newly (f)ound in pb if ipf: udp[i] -= ipf if udp[i]==set(): ud.remove(i); del( udp[i] ) pe.add(i); pb.add(i) break return pb _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Indeed, I did look at FGL. It's beautiful. It has a bug that does not
permit multiedges in directed graphs, which I need. More importantly,
though, I'm just not ready for it. I have hardly written anything bigger
than a screenful of text in Haskell. If I try using a lot of things I don't
yet understand in addition to the base language I get discouraged.
On Fri, Mar 6, 2015 at 2:25 AM, Benjamin Edwards
Have you looked at fgl? I would also read the paper that went with the library. It has some nice notions of graph decomposition, and reading the source for bfs et al should give you a good idea of how to mix in state like 'what have I visited'.
Ben
On Fri, 6 Mar 2015 2:46 am Jeffrey Brown
wrote: Suppose I have a general (not a tree) directed graph, and I would like to find all the "pure descendents" of a node N -- that is, all nodes M for which (1) N is an ancestor of M, and (2) every partial or maximal sequence of predecessors starting from M either consists entirely of descendents of N, or else includes N itself.
I have an imperative algorithm (below) for doing that. I want to know how hard it would be to implement in Haskell. It's complex, but basically it sets up a few lists like "members of this list have had all their descendents processed", "members of this list have been processed but their descendents have not been", "members of this list have not been processed at all", etc. Then I iterate through the graph, moving nodes from one collection to the other until the "to be processed" list is empty.
I would like a function Graph -> Graph that does that and does not need to rely on a monad. The only way I can think of involves a few mutually recursive functions, each of which passes not only the original graph of interest, but all the lists described in the previous paragraph, around to the others. I don't understand Haskell's evaluation strategy well enough to anticipate the result, but my vague sense is that a lot of copies of the same data would be floating around, ruining the speed of it.
Python code for the algorithm: def descendedOnlyFrom(gset): "All Gnodes n for which every ancestry of n includes a Gnode in gset (which can be a set or a list)." # For efficiency improvement ideas and verbose comments, # see treeSearch/all.py/node.descendedOnlyFrom if 1: # variables # "pure" = "descended only from the calling Glist" pe = set(gset) # determined Pure, yet to Explore children of pf = set() # determined Pure, Finished exploring children of pb = set(gset) # determined Pure, Both kinds: pe | pf ud = set() # purity UnDetermined # descended from root, but might have parents outside of pb udp = {} # "Parents of the UnDetermined" # This is a dictionary, from gnodes to sets of gnodes. # The other four variables are sets. while pe: while pe: k = 1 i = pe.pop(); pf.add(i) for c in set(i._children()): if c in pb | ud: continue # If already handled, do not repeat. if set(c._parents()) <= pb: pe.add(c); pb.add(c) else: ud.add(c); udp[c] = set(c._parents()) - pb for i in ud: ipf = udp[i] & pb # (i)'s (p)arents newly (f)ound in pb if ipf: udp[i] -= ipf if udp[i]==set(): ud.remove(i); del( udp[i] ) pe.add(i); pb.add(i) break return pb _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

On 7 March 2015 at 04:17, Jeffrey Brown
Indeed, I did look at FGL. It's beautiful. It has a bug that does not permit multiedges in directed graphs, which I need.
What do you mean by "multiedges"? Multiple edges between two nodes? If that's the case, PatriciaTree used to have this problem, but I fixed that way back in 2010 when I first took over maintainership of fgl. -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com http://IvanMiljenovic.wordpress.com

Cool! I was referring to the bug declared at the top of this page [1]. Is that warning obsolete? [1] https://web.engr.oregonstate.edu/~erwig/fgl/haskell/ On Fri, Mar 6, 2015 at 3:04 PM, Ivan Lazar Miljenovic < ivan.miljenovic@gmail.com> wrote:
Indeed, I did look at FGL. It's beautiful. It has a bug that does not
On 7 March 2015 at 04:17, Jeffrey Brown
wrote: permit multiedges in directed graphs, which I need.
What do you mean by "multiedges"? Multiple edges between two nodes? If that's the case, PatriciaTree used to have this problem, but I fixed that way back in 2010 when I first took over maintainership of fgl.
-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com http://IvanMiljenovic.wordpress.com

On 7 March 2015 at 11:18, Jeffrey Brown
Cool! I was referring to the bug declared at the top of this page [1]. Is that warning obsolete?
Apart from the theory that whole website is pretty much obsolete.
On Fri, Mar 6, 2015 at 3:04 PM, Ivan Lazar Miljenovic
wrote: On 7 March 2015 at 04:17, Jeffrey Brown
wrote: Indeed, I did look at FGL. It's beautiful. It has a bug that does not permit multiedges in directed graphs, which I need.
What do you mean by "multiedges"? Multiple edges between two nodes? If that's the case, PatriciaTree used to have this problem, but I fixed that way back in 2010 when I first took over maintainership of fgl.
-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com http://IvanMiljenovic.wordpress.com
-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com http://IvanMiljenovic.wordpress.com

For the most part you shouldn't worry much about having "copies of the same
data". Since nearly everything in Haskell is immutable, copying is a pretty
rare operation. Even something like updating a Set, it seems like now you
have two distinct Sets but in fact they will share a lot of the same
internal structure.
Here are my go-to intro references for Haskell's evaluation strategy:
*
http://chimera.labs.oreilly.com/books/1230000000929/ch02.html#sec_par-eval-w...
* https://hackhands.com/lazy-evaluation-works-haskell/
If it did turn out to be one of those situations where the cost of doing
those allocations is just too high, Haskell has mutable data structures
available, and you can hide their usage with ST. I wouldn't start there
though.
On Thu, Mar 5, 2015 at 6:45 PM, Jeffrey Brown
Suppose I have a general (not a tree) directed graph, and I would like to find all the "pure descendents" of a node N -- that is, all nodes M for which (1) N is an ancestor of M, and (2) every partial or maximal sequence of predecessors starting from M either consists entirely of descendents of N, or else includes N itself.
I have an imperative algorithm (below) for doing that. I want to know how hard it would be to implement in Haskell. It's complex, but basically it sets up a few lists like "members of this list have had all their descendents processed", "members of this list have been processed but their descendents have not been", "members of this list have not been processed at all", etc. Then I iterate through the graph, moving nodes from one collection to the other until the "to be processed" list is empty.
I would like a function Graph -> Graph that does that and does not need to rely on a monad. The only way I can think of involves a few mutually recursive functions, each of which passes not only the original graph of interest, but all the lists described in the previous paragraph, around to the others. I don't understand Haskell's evaluation strategy well enough to anticipate the result, but my vague sense is that a lot of copies of the same data would be floating around, ruining the speed of it.
Python code for the algorithm: def descendedOnlyFrom(gset): "All Gnodes n for which every ancestry of n includes a Gnode in gset (which can be a set or a list)." # For efficiency improvement ideas and verbose comments, # see treeSearch/all.py/node.descendedOnlyFrom if 1: # variables # "pure" = "descended only from the calling Glist" pe = set(gset) # determined Pure, yet to Explore children of pf = set() # determined Pure, Finished exploring children of pb = set(gset) # determined Pure, Both kinds: pe | pf ud = set() # purity UnDetermined # descended from root, but might have parents outside of pb udp = {} # "Parents of the UnDetermined" # This is a dictionary, from gnodes to sets of gnodes. # The other four variables are sets. while pe: while pe: k = 1 i = pe.pop(); pf.add(i) for c in set(i._children()): if c in pb | ud: continue # If already handled, do not repeat. if set(c._parents()) <= pb: pe.add(c); pb.add(c) else: ud.add(c); udp[c] = set(c._parents()) - pb for i in ud: ipf = udp[i] & pb # (i)'s (p)arents newly (f)ound in pb if ipf: udp[i] -= ipf if udp[i]==set(): ud.remove(i); del( udp[i] ) pe.add(i); pb.add(i) break return pb
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
participants (4)
-
Benjamin Edwards
-
Bob Ippolito
-
Ivan Lazar Miljenovic
-
Jeffrey Brown