
I have a problem that I'm trying to use Haskell for, and I think I'm running into scalability issues in FGL. However, I am quite new to practical programming in Haskell, so it's possible that I have some other bone-headed performance bug in my code. I tried looking around for concrete information about the scalability of Haskell's graph libraries, but didn't find much. So here are the characteristics of the problem I'm working on: - Large directed graphs. Mostly 10k-100k nodes, but some in the low 100ks. - Sparse graphs. The number of edges is only 2-3x the number of nodes. - Immutable structure, mutable labels. After initially reading in the graphs, their shape doesn't change, but information "flows" around the graph, changing the labels on nodes and edges. I wrote some code that reads in graphs and some some basic flow computations on them. The first few graphs I tried were around 10k nodes, and the performance was okay (on the order of several seconds). When I tried some larger graphs (~100k), the memory consumption spiked into multiple GB, the CPU utilization went down to single digit percentages and the overall running time was closer to hours than seconds. Because the graph structure is basically immutable for my problem, I'm tempted to write my own graph representation based on mutable arrays. Before I embark on that, I wonder if anyone else can share their experience with large graphs in Haskell? Is there a library (FGL or otherwise) that should be able to scale up to the size of graph I'm interested in, if I write my code correctly? Thanks, Ben

2012/5/20 Benjamin Ylvisaker
I have a problem that I'm trying to use Haskell for, and I think I'm running into scalability issues in FGL. However, I am quite new to practical programming in Haskell, so it's possible that I have some other bone-headed performance bug in my code. I tried looking around for concrete information about the scalability of Haskell's graph libraries, but didn't find much. So here are the characteristics of the problem I'm working on:
- Large directed graphs. Mostly 10k-100k nodes, but some in the low 100ks. - Sparse graphs. The number of edges is only 2-3x the number of nodes. - Immutable structure, mutable labels. After initially reading in the graphs, their shape doesn't change, but information "flows" around the graph, changing the labels on nodes and edges.
I would like to suggest to you a representation based in 32-bit integers as vertex index. I.e., "roll your own" Use strict IntMap IntSet for neighbor information, it is very efficient.
I wrote some code that reads in graphs and some some basic flow computations on them. The first few graphs I tried were around 10k nodes, and the performance was okay (on the order of several seconds). When I tried some larger graphs (~100k), the memory consumption spiked into multiple GB, the CPU utilization went down to single digit percentages and the overall running time was closer to hours than seconds.
Looks like your code does not force everything. It leaves some thunks unevaluated, check for that situation. It is common pitfall, not only for computations on graphs.
Because the graph structure is basically immutable for my problem, I'm tempted to write my own graph representation based on mutable arrays. Before I embark on that, I wonder if anyone else can share their experience with large graphs in Haskell? Is there a library (FGL or otherwise) that should be able to scale up to the size of graph I'm interested in, if I write my code correctly?
The above structure (IntMap IntSet) allowed for fast computations on relatively large arrays, in order of 1M vertices and 16M undirected/32M directed edges.

I had issues with FGL in the past, too. Although FGL is really nice to
work with, it just uses a ridiculous amount of memory for large
graphs.
In the end, I used Data.Graph from containers [1]. This was a lot more
reasonable, and let me finish my project relatively easily.
Regards,
- Clark
[1] http://hackage.haskell.org/packages/archive/containers/0.5.0.0/doc/html/Data...
On Sun, May 20, 2012 at 10:55 AM, Serguey Zefirov
2012/5/20 Benjamin Ylvisaker
: I have a problem that I'm trying to use Haskell for, and I think I'm running into scalability issues in FGL. However, I am quite new to practical programming in Haskell, so it's possible that I have some other bone-headed performance bug in my code. I tried looking around for concrete information about the scalability of Haskell's graph libraries, but didn't find much. So here are the characteristics of the problem I'm working on:
- Large directed graphs. Mostly 10k-100k nodes, but some in the low 100ks. - Sparse graphs. The number of edges is only 2-3x the number of nodes. - Immutable structure, mutable labels. After initially reading in the graphs, their shape doesn't change, but information "flows" around the graph, changing the labels on nodes and edges.
I would like to suggest to you a representation based in 32-bit integers as vertex index. I.e., "roll your own"
Use strict IntMap IntSet for neighbor information, it is very efficient.
I wrote some code that reads in graphs and some some basic flow computations on them. The first few graphs I tried were around 10k nodes, and the performance was okay (on the order of several seconds). When I tried some larger graphs (~100k), the memory consumption spiked into multiple GB, the CPU utilization went down to single digit percentages and the overall running time was closer to hours than seconds.
Looks like your code does not force everything. It leaves some thunks unevaluated, check for that situation.
It is common pitfall, not only for computations on graphs.
Because the graph structure is basically immutable for my problem, I'm tempted to write my own graph representation based on mutable arrays. Before I embark on that, I wonder if anyone else can share their experience with large graphs in Haskell? Is there a library (FGL or otherwise) that should be able to scale up to the size of graph I'm interested in, if I write my code correctly?
The above structure (IntMap IntSet) allowed for fast computations on relatively large arrays, in order of 1M vertices and 16M undirected/32M directed edges.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Benjamin,
- Immutable structure, mutable labels. After initially reading in the graphs, their shape doesn't change, but information "flows" around the graph, changing the labels on nodes and edges.
I have been working on a similar problem for a while now, hence my interest and dissatisfaction with FGL. I'm not sure if this is exactly what you are looking for, but I've been playing around with mutable graphs (both structure and labels) and found issues with GC, so I set about the goal of creating an unboxed in-place mutable graph that has O(1) neighbor retrieval. The graph has to live in ST or IO (i've been using ST, and it works fine, though I'd like to try STM and try a multiple-spider traversal). So far, it can do very fast traversal and label mutation with no allocation or GC after the initial build.
I wrote some code that reads in graphs and some some basic flow computations on them ... When I tried some larger graphs (~100k), the memory consumption spiked into multiple GB, ...
I can try to use the nodes/specs you provide to give you an estimate of what my framework can handle. If that works for you, I'll clean up my code and you can give it a shot. Send me whatever other details you think are relevant. I am also curious what would happen if I take out the mutable structure feature and just stick with mutable labels and how it affects performance. -Tom

I can try to use the nodes/specs you provide to give you an estimate of what my framework can handle. If that works for you, I'll clean up my code and you can give it a shot. Send me whatever other details you think are relevant.
Benjamin, I had a few moments, so I made a sparse graph of 100k nodes, each with 2-3 edges with each node holding an unboxed int. This is created in 0.12 sec and then traversed for 100 million steps (add one to each node, then move to next node) in an additional 9 secs but no more GC's or allocations. Let me know if this would help you out, I would like a chance to use my pet project for something useful.
participants (4)
-
Benjamin Ylvisaker
-
Clark Gaebel
-
Serguey Zefirov
-
tomberek