Representing cyclic data structures efficiently in Haskell

What is the best way to represent cyclic data structures in Haskell? Lists are trivial. Trees are trivial. But what if the thing you're trying to represent is shaped like an electronic circuit, with a (possibly large) number of nodes connected by multiple arrows to other nodes? In an imperative language like C++, I'd probably just model the circuit directly, with objects representing nodes and pointers representing links between nodes. A good way to model this in Haskell is less obvious, however. A simplistic approach would be to represent such a structure as a list of nodes, where each node contained a list of other connected nodes. Containing the actual nodes within the sub-lists probably isn't feasible -- some kind of reference would be necessary in order to get around chicken-and-egg problems with instantiating the data structure. But, in this case, it's not so easy to see how one could 'walk' the structure efficiently, because moving from node to node would involve quite horrible (possibly slow) lookups. In C++, I'd simply follow a pointer, which would be an extremely fast operation. I would also need to implement efficient indexes into the data structure -- flat searching will be too slow for non-trivial amounts of data. In C++, I'd implement one or more external (probably STL-based) indexes that point into the main data structure. How do people typically solve this problem at the moment? I know a few people out there have written hardware compilers in Haskell and similar languages, so there should be some experience of this in the community. I can probably work something out soon enough, but reinventing the wheel is never a great idea. Thank you in advance, -- ---------------------------------------------- / __ + / Sarah Thompson **** / / (_ _ _ _ |_ / * / / __)(_|| (_|| ) / sarah@telergy.com * / / + / http://findatlantis.com/ / ----------------------------------------------

Hi Sarah, Representing cyclic structures is indeed somewhat tricky in a pure functional language. The obvious representation (representing the link structure implicitly in an algebraic data type) doesn't work, because you can't obvserve cycles. I recommend checking out two pieces of work in this area: 1. Martin Erwig's work on inductive representations of graphs: Inductive Graphs and Functional Graph Algorithms, Martin Erwig Journal of Functional Programming Vol. 11, No. 5, 467-492, 2001 available from: http://cs.oregonstate.edu/~erwig/papers/abstracts.html#JFP01 2. Koen Claessen and David Sands' work on observable sharing: Observable Sharing for Functional Circuit Description, Koen Claessen, David Sands, published at ASIAN '99, in Phuket, Thailand available from: http://www.math.chalmers.se/~koen/Papers/obs-shar.ps Hope that helps, -Antony -- Antony Courtney Grad. Student, Dept. of Computer Science, Yale University antony@apocalypse.org http://www.apocalypse.org/pub/u/antony

Why not just use a GUID to represent each node. Then a C pointer just becomes a GUID in a tuple alongside the node. The C pointer approach is just a method to have the memory system dole out GUIDs. And it can be generalised. Then all the tuples could be inserted in a hash table on the GUID component. Edges are then tuples of two GUIDs. These could be stored in some other data structure. Another approach would be to have a type (node * (guid list)) (excuse the Ocaml syntax). Put elements of this type in a big array. The GUID of a node (guid) is its array index. Each node is accompanied in the tuple by a list of nodes that it points to. You know your workload and whether you have advance knowledge of the number of nodes, so you are in the position to compare the efficiency of these 2 approaches. Good luck Lex -- Lex Stein http://www.eecs.harvard.edu/~stein/ stein@eecs.harvard.edu TEL: 617-233-0246 On Mon, 7 Jul 2003, Sarah Thompson wrote:
What is the best way to represent cyclic data structures in Haskell?
Lists are trivial. Trees are trivial. But what if the thing you're trying to represent is shaped like an electronic circuit, with a (possibly large) number of nodes connected by multiple arrows to other nodes? In an imperative language like C++, I'd probably just model the circuit directly, with objects representing nodes and pointers representing links between nodes. A good way to model this in Haskell is less obvious, however.
A simplistic approach would be to represent such a structure as a list of nodes, where each node contained a list of other connected nodes. Containing the actual nodes within the sub-lists probably isn't feasible -- some kind of reference would be necessary in order to get around chicken-and-egg problems with instantiating the data structure. But, in this case, it's not so easy to see how one could 'walk' the structure efficiently, because moving from node to node would involve quite horrible (possibly slow) lookups. In C++, I'd simply follow a pointer, which would be an extremely fast operation.
I would also need to implement efficient indexes into the data structure -- flat searching will be too slow for non-trivial amounts of data. In C++, I'd implement one or more external (probably STL-based) indexes that point into the main data structure.
How do people typically solve this problem at the moment? I know a few people out there have written hardware compilers in Haskell and similar languages, so there should be some experience of this in the community. I can probably work something out soon enough, but reinventing the wheel is never a great idea.
Thank you in advance,
-- ---------------------------------------------- / __ + / Sarah Thompson **** / / (_ _ _ _ |_ / * / / __)(_|| (_|| ) / sarah@telergy.com * / / + / http://findatlantis.com/ / ----------------------------------------------
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

--- Sarah Thompson
What is the best way to represent cyclic data structures in Haskell?
You _might_ find some useful ideas in Franklyn Turbak and J. B. Wells. Cycle Therapy: A Prescription for Fold and Unfold on Regular Trees. Third International Conference on Principles and Practice of Declarative Programming. ACM, 2001. http://cs.wellesley.edu/~fturbak/pubs/ppdp01.html http://cs.wellesley.edu/~fturbak/pubs/index.html Stefan Kahrs. Unlimp: Uniqueness as a leitmotiv for implementation. In M. Bruynooghe and M. Wirsing, editors, Proc. Programming Language Implementation and Logic Programming, Lecture Notes in Computer Science 631, 115--129, 1992. http://citeseer.nj.nec.com/kahrs92unlimp.html Chris Milton (busy processing MILSTRIPs in Perl) ===== Christopher Milton cmiltonperl@yahoo.com ���Ӥ䤫�鷺���ӹ����β� --Matsuo Bashou

G'day all. As you note, some kind of indirection is what you want here. On Mon, Jul 07, 2003 at 04:04:17PM +0100, Sarah Thompson wrote:
I would also need to implement efficient indexes into the data structure -- flat searching will be too slow for non-trivial amounts of data. In C++, I'd implement one or more external (probably STL-based) indexes that point into the main data structure.
The direct Haskell equivalent is to use Refs; either IORefs or STRefs. (I'd personally use IORefs, but that's because I like having IO around.) Dereferencing an IORef is a constant-time operation, just like chasing a pointer in C. Cheers, Andrew Bromage

G'day all. On Mon, Jul 07, 2003 at 04:04:17PM +0100, Sarah Thompson wrote:
I would also need to implement efficient indexes into the data structure -- flat searching will be too slow for non-trivial amounts of data. In C++, I'd implement one or more external (probably STL-based) indexes that point into the main data structure.
I replied:
The direct Haskell equivalent is to use Refs; either IORefs or STRefs. (I'd personally use IORefs, but that's because I like having IO around.) Dereferencing an IORef is a constant-time operation, just like chasing a pointer in C.
I forgot to give an example. :-) Suppose you want some kind of electronic circuit, as you suggested. Abstractly, you want something like this: data Node = Node NodeState [Component] data Component = Resistor ResistorCharacteristics Node Node | Transistor TransistorCharacteristics Node Node Node | {- etc -} You can make this indirect in introducing refs: type NodeRef = IORef Node type ComponentRef = IORef Component data Node = Node NodeState [ComponentRef] data Component = Resistor ResistorCharacteristics NodeRef NodeRef | Transistor TransistorCharacteristics NodeRef NodeRef NodeRef The data structures are mutable: getNodeState :: NodeRef -> IO NodeState getNodeState node = do Node state _ <- readIORef node return state setNodeState :: NodeRef -> NodeState -> IO () setNodeState node state = do modifyIORef (\Node _ cs -> Node state cs) node and it's straightforward to construct an index into the middle somewhere: type NamedNodeIndex = FiniteMap String NodeRef Cheers, Andrew Bromage

Hi Sarah, On Mon, 2003-07-07 at 16:04, Sarah Thompson wrote:
What is the best way to represent cyclic data structures in Haskell?
Lists are trivial. Trees are trivial. But what if the thing you're trying to represent is shaped like an electronic circuit, ... ... How do people typically solve this problem at the moment? I know a few people out there have written hardware compilers in Haskell and similar languages, ...
You're right, this is a longstanding problem! I've experimented with several solutions to it in Hydra, a pure functional hardware description language whose aims include preserving referential transparency, and also keeping within a natural functional style. My view is that if you have to resort to encoding a C-style solution in a functional language, then you might as well use C instead. Anyway, there are several papers on Hydra that are relevant. The first solution, described in a paper from 1988, is to build the circular graphs in the obvious way and then to use a pointer-equality predicate to traverse the graphs safely. Of course, this makes the hardware description language impure, it breaks referential transparency, and it has other drawbacks. You can find that paper on my web page, but it was written before Haskell appeared, and it uses an older functional language with different syntax. I wrote a paper about the problem in 1992, www.dcs.gla.ac.uk/~jtod/papers/1992-Netlist/ This paper discusses the problems with the pointer-equality solution and introduces naming as a pure alternative. The idea of naming is to give unique labels to some of the nodes, so that you can build the circular graph in the usual way, and you can also traverse the graph safely. This is a pure functional solution, and it has a lot of advantages over adjacency lists, but it does require that the labels are introduced correctly. The best way to put the labels in is by a transformation. Currently, Hydra uses an automated transformation implemented with Template Haskell to introduce the names. I think this is the ideal solution: it results in a very clean notation for specifying the circuit, and it supports traversal of the graph, yet it preserves simple equational reasoning and all the supporting algorithms are in a pure functional style. I'm currently writing a paper on the details of this solution. It won't be finished for some time, but there already exists a brief survey of the solution, which you can find at: www.dcs.gla.ac.uk/~jtod/drafts/ ("Embedding a hardware description language in Template Haskell") This paper focuses on the way Hydra is implemented as a domain specific language embedded in Haskell, and the last part of it talks about the graph traversal issues. This is a draft, submitted for publication, and any comments on it would be welcome! (However, the actual method used in Hydra is considerably more complex than what's presented in the draft paper, because there are a number of other issues that need to be addressed in addition to just traversing the circular graph safely.) There are a lot of alternative approaches to the problem. They will be compared in more detail in the paper-in-progress, but briefly they include: (1) Using adjacency lists, which have the disadvantages that you pointed out. (2) Using a monadic approach, which can be used in several distinct ways to solve the problem, but which have a bad impact on the specification style and the ease of reasoning about the circuit. (3) Using impure solutions, such as the pointer equality method used in the 1988 Hydra paper. Observable sharing (Claessen and Sands, 1999) is identical to the solution that was implemented in Hydra in the mid 80's and described in the 1988 paper; the difference is that they call the approach "observable sharing" instead of "pointer equality". Best wishes, John O'Donnell
participants (6)
-
Andrew J Bromage
-
Antony Courtney
-
Christopher Milton
-
John O'Donnell
-
Lex Stein
-
Sarah Thompson