
Hello I'm trying to create a directed graph using the Data.Graph.Inductive. The graph is a random graph using the G(n, p) model, that is, each of the n nodes is linked to every other node with probability p. I'm seeing a large increase of memory usage when n grows (this is using p = 0.1): n = 1000 -> 96MB n = 2000 -> 283MB n = 3000 -> 760MB So, I'm probably doing something very stupid :) The code is below. Is there anything I could do to optimize memory usage here? module Main where import Control.Monad import Data.Graph.Inductive import System.Random createEdges :: Int -> Double -> IO [LEdge Int] createEdges n prob = foldM create [] [1..n] where create es i = foldM (flip $ link i) es [i, i-1 .. 1] link i j es | i == j = return es -- no self-loops | otherwise = do es' <- maybeCreateEdge i j prob es es'' <- maybeCreateEdge j i prob es' return es'' maybeCreateEdge :: Node -> Node -> Double -> [LEdge Int] -> IO [LEdge Int] maybeCreateEdge i j prob es = do r <- randomDouble return $ if r < prob then (i, j, 0):es else es randomDouble :: IO Double randomDouble = getStdRandom $ random main :: IO () main = do let (n, p) = (3000, 0.1) es <- createEdges n p let g = mkGraph [(i, 0) | i <- [1..n]] es :: Gr Int Int return () Thanks, Andre

On Thu, Jul 10, 2008 at 4:57 PM, Andre Nathan
Hello
I'm trying to create a directed graph using the Data.Graph.Inductive. The graph is a random graph using the G(n, p) model, that is, each of the n nodes is linked to every other node with probability p.
So the average degree of a single node is p * n, and the expected number of edges in the entire graph will grow as O(n ^2).
I'm seeing a large increase of memory usage when n grows (this is using p = 0.1):
n = 1000 -> 96MB n = 2000 -> 283MB n = 3000 -> 760MB
So, I'm probably doing something very stupid :)
Your ratios are about 1 : 3 : 8. That pretty close to quadratic growth, 1 : 4 : 9, so I think all is well.

On Thu, 2008-07-10 at 18:32 -0400, Ronald Guida wrote:
Your ratios are about 1 : 3 : 8. That pretty close to quadratic growth, 1 : 4 : 9, so I think all is well.
Maybe, but 96MB of resident memory for a 1000-node graph looks bad, especially considering p is low. Is the internal representation of inductive graphs perhaps not very memory-efficient? I still haven't read Erwig's paper... I know this is probably not fair, but I'm comparing these numbers with a C implementation which uses Ruby's C API for its complex data structures, and a 10,000 nodes graph uses around 6MB of memory. Thanks, Andre

andre:
On Thu, 2008-07-10 at 18:32 -0400, Ronald Guida wrote:
Your ratios are about 1 : 3 : 8. That pretty close to quadratic growth, 1 : 4 : 9, so I think all is well.
Maybe, but 96MB of resident memory for a 1000-node graph looks bad, especially considering p is low. Is the internal representation of inductive graphs perhaps not very memory-efficient? I still haven't read Erwig's paper...
I know this is probably not fair, but I'm comparing these numbers with a C implementation which uses Ruby's C API for its complex data structures, and a 10,000 nodes graph uses around 6MB of memory.
Well, they're radically different graph representations, and fgl hasn't been designed for large graphs. What C library is Ruby's binding to? It might be quite cheap to bind to that. I've been on the look out for a good C graph lib to use for Haskell bindings for a while.. -- Don

On Thu, 2008-07-10 at 16:52 -0700, Don Stewart wrote:
Well, they're radically different graph representations, and fgl hasn't been designed for large graphs.
Do you know if King and Launchbury's implementation (Data.Graph) scales better?
What C library is Ruby's binding to? It might be quite cheap to bind to that. I've been on the look out for a good C graph lib to use for Haskell bindings for a while..
None. I've built my own representing the graph as a hash table with nodes as keys and arrays of nodes as values, and I'm using ruby's hash and array classes (which are written in C) for that. Did you have a look at igraph [http://cneurocvs.rmki.kfki.hu/igraph/]? It would probably be a lot of work to bind to it (it has many public functions), but it looks nice and has bindings for a few languages. Andre

On Friday July 11 2008, Andre Nathan wrote:
On Thu, 2008-07-10 at 16:52 -0700, Don Stewart wrote:
Well, they're radically different graph representations, and fgl hasn't been designed for large graphs.
Do you know if King and Launchbury's implementation (Data.Graph) scales better?
Looks like it. I now did a rough benchmark on fully connected graphs with 25 and 50 nodes. Data.Graph.Inductive used 28MB and 365MB respectively (x13 increase) compared to Data.Graph's 486KB and 2MB (x4). And Data.Graph seems to be much faster (basic operations up to 200 times), although it might be related to GC. -- Gokhan

gsan:
On Friday July 11 2008, Andre Nathan wrote:
On Thu, 2008-07-10 at 16:52 -0700, Don Stewart wrote:
Well, they're radically different graph representations, and fgl hasn't been designed for large graphs.
Do you know if King and Launchbury's implementation (Data.Graph) scales better?
Looks like it.
I now did a rough benchmark on fully connected graphs with 25 and 50 nodes. Data.Graph.Inductive used 28MB and 365MB respectively (x13 increase) compared to Data.Graph's 486KB and 2MB (x4). And Data.Graph seems to be much faster (basic operations up to 200 times), although it might be related to GC.
Do you have the bencmark code? I'd like to try a couple of variants on the underlying structures.
participants (4)
-
Andre Nathan
-
Don Stewart
-
Gökhan San
-
Ronald Guida