Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Graph library recommendations for large graphs

Reply
Thread Tools

Graph library recommendations for large graphs

 
 
VanL
Guest
Posts: n/a
 
      08-24-2009
I am working on a project that will require building and querying large
graph objects (initially 8M nodes, 30-40M edges; eventually 40M nodes,
100M edges). NetworkX seems to be the most popular, but I am concerned
that a dict representation for nodes would use too much memory -- my
initial tests suggest that a graph with 3M nodes and 12M edges creates
substantial memory pressure on my machine.

Can anybody who has worked with large graphs before give a recommendation?

Thanks,

Van

 
Reply With Quote
 
 
 
 
Diez B. Roggisch
Guest
Posts: n/a
 
      08-24-2009
VanL schrieb:
> I am working on a project that will require building and querying large
> graph objects (initially 8M nodes, 30-40M edges; eventually 40M nodes,
> 100M edges). NetworkX seems to be the most popular, but I am concerned
> that a dict representation for nodes would use too much memory -- my
> initial tests suggest that a graph with 3M nodes and 12M edges creates
> substantial memory pressure on my machine.
>
> Can anybody who has worked with large graphs before give a recommendation?


My initial tests show otherwise. The below test-script creates 3 million
nodes with 12 million adjacencies, on my 2GB Notebook.

The theoretical limit for this (if we assume pointer-based
adjacency-references which makes sense if we have sparse graphs as your
numbers indicate) is (32 bits assumed):

- 8 bytes per node (4 byte pointer to adjacency list, 4 byte int for
counting the number of adjacencies in that list)
- 4 bytes per adjacency

This is 60.000.000 for your example - roughly 60MB. On my machine, the
process has 320.000.000MB - (roughly) a factor five. Given the much
richer properties a Python-object (and python-lists) have thas is pretty
good I'd say.

So for your eventual size of 40M nodes, 100M edges, we have a
theoretical amount of 560MB, times 5 makes 2.5 GB. Not exactly a low
memory profile, but manageable on modern hardware.

I don't know anything about NetworkX - it still might be the better
solution, given the underlying C-based algorithms. But if all you need
is to represent a graph of that size, it appears to be working.


---- test.py ----

import random
import gc
import time

class Node(object):

__slots__ = ["adjacencies", "value", "id"]

def __init__(self, id):
id = id
value = random.random()
self.adjacencies = []


nodes = []

gc.disable()
nc = 3000000

for i in xrange(nc):
nodes.append(Node(i))
if (i % 1000) == 0:
print i

for i in xrange(12000000):
a = random.randint(0, nc - 1)
b = random.randint(0, nc - 1)
while a == b:
b = random.randint(0, nc)
nodes[a].adjacencies.append(nodes[b])
if (i % 1000) == 0:
print "e", i


gc.enable()
while True:
time.sleep(1)
traversed = set()
def depth_search(node, depth=0):
traversed.add(node)
if depth == 4:
return
for child in node.adjacencies:
if child not in traversed:
depth_search(child, depth+1)

depth_search(nodes[random.randint(0, nc - 1)])

------


Diez
 
Reply With Quote
 
 
 
 
Bearophile
Guest
Posts: n/a
 
      08-24-2009
You may try the Python bindings for the Boost Graph Library, the graph
you talk about may fit in 2GB of a 32 bit OS too (this is the first
link I have found, it's a lot of time I don't use those graph
bindings):
http://banyan.usc.edu/log/c_cpp/boos...ython-bindings

Bye,
bearophile
 
Reply With Quote
 
Istvan Albert
Guest
Posts: n/a
 
      08-24-2009
On Aug 24, 5:37*pm, VanL <van.lindb...@gmail.com> wrote:
>
> Can anybody who has worked with large graphs before give a recommendation?
>


when using large graphs another limitation may come from the various
graph algorithm run times. Most likely you will need to squeeze out as
much as possible and a python implementation has a lot of overhead.

I've used the LEDA graph library with great success. This is a C++
library with substantial syntax sugar that looks a bit like python
(and I made some python bindings for it via SWIG and thus got the best
of both worlds, lost the code I'm afraid).

http://www.algorithmic-solutions.inf...de/Graphs.html

i.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Using Boost Graph Library to create very large graph Almoni C++ 0 01-17-2010 05:13 AM
Re: Which graph library is best suited for large graphs? Tiago de Paula Peixoto Python 0 12-12-2009 12:01 PM
Which graph library is best suited for large graphs? Wolodja Wentland Python 8 12-11-2009 08:59 PM
Help with initialization of graph (Boost Graph Library) Jef Driesen C++ 3 01-24-2006 01:44 PM
GD::Graph: "mixed" graph doesn't recognize "area" graph type Emilio Mayorga Perl Misc 6 10-08-2003 02:14 AM



Advertisments