Graph library recommendations for large graphs

VanL · Aug 24, 2009

I am working on a project that will require building and querying large
graph objects (initially 8M nodes, 30-40M edges; eventually 40M nodes,
100M edges). NetworkX seems to be the most popular, but I am concerned
that a dict representation for nodes would use too much memory -- my
initial tests suggest that a graph with 3M nodes and 12M edges creates
substantial memory pressure on my machine.

Can anybody who has worked with large graphs before give a recommendation?

Thanks,

Van

Diez B. Roggisch · Aug 24, 2009

VanL said:
I am working on a project that will require building and querying large
graph objects (initially 8M nodes, 30-40M edges; eventually 40M nodes,
100M edges). NetworkX seems to be the most popular, but I am concerned
that a dict representation for nodes would use too much memory -- my
initial tests suggest that a graph with 3M nodes and 12M edges creates
substantial memory pressure on my machine.

Can anybody who has worked with large graphs before give a recommendation?

My initial tests show otherwise. The below test-script creates 3 million
nodes with 12 million adjacencies, on my 2GB Notebook.

The theoretical limit for this (if we assume pointer-based
adjacency-references which makes sense if we have sparse graphs as your
numbers indicate) is (32 bits assumed):

- 8 bytes per node (4 byte pointer to adjacency list, 4 byte int for
counting the number of adjacencies in that list)
- 4 bytes per adjacency

This is 60.000.000 for your example - roughly 60MB. On my machine, the
process has 320.000.000MB - (roughly) a factor five. Given the much
richer properties a Python-object (and python-lists) have thas is pretty
good I'd say.

So for your eventual size of 40M nodes, 100M edges, we have a
theoretical amount of 560MB, times 5 makes 2.5 GB. Not exactly a low
memory profile, but manageable on modern hardware.

I don't know anything about NetworkX - it still might be the better
solution, given the underlying C-based algorithms. But if all you need
is to represent a graph of that size, it appears to be working.

---- test.py ----

import random
import gc
import time

class Node(object):

__slots__ = ["adjacencies", "value", "id"]

def __init__(self, id):
id = id
value = random.random()
self.adjacencies = []

nodes = []

gc.disable()
nc = 3000000

for i in xrange(nc):
nodes.append(Node(i))
if (i % 1000) == 0:
print i

for i in xrange(12000000):
a = random.randint(0, nc - 1)
b = random.randint(0, nc - 1)
while a == b:
b = random.randint(0, nc)
nodes[a].adjacencies.append(nodes)
if (i % 1000) == 0:
print "e", i

gc.enable()
while True:
time.sleep(1)
traversed = set()
def depth_search(node, depth=0):
traversed.add(node)
if depth == 4:
return
for child in node.adjacencies:
if child not in traversed:
depth_search(child, depth+1)

depth_search(nodes[random.randint(0, nc - 1)])

Bearophile · Aug 24, 2009

You may try the Python bindings for the Boost Graph Library, the graph
you talk about may fit in 2GB of a 32 bit OS too (this is the first
link I have found, it's a lot of time I don't use those graph
bindings):
http://banyan.usc.edu/log/c_cpp/boost-graph-library-python-bindings

Bye,
bearophile

Istvan Albert · Aug 25, 2009

Can anybody who has worked with large graphs before give a recommendation?

when using large graphs another limitation may come from the various
graph algorithm run times. Most likely you will need to squeeze out as
much as possible and a python implementation has a lot of overhead.

I've used the LEDA graph library with great success. This is a C++
library with substantial syntax sugar that looks a bit like python
(and I made some python bindings for it via SWIG and thus got the best
of both worlds, lost the code I'm afraid).

http://www.algorithmic-solutions.info/leda_guide/Graphs.html

i.

Which graph library is best suited for large graphs?	8	Dec 11, 2009
comp.lang.c Answers to Frequently Asked Questions (FAQ List)	15	Apr 1, 2006
comp.lang.c Changes to Answers to Frequently Asked Questions (FAQ)	1	Jul 4, 2004
comp.lang.vhdl FAQ part 3 of 4: products & services	0	Jul 8, 2003
comp.lang.c Answers to Frequently Asked Questions (FAQ List)	1	Feb 1, 2004

Graph library recommendations for large graphs

VanL

Diez B. Roggisch

Bearophile

Istvan Albert

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads