I am looking for good recommendations for scalable and/or parallel large graph analysis libraries in various languages. The problems I am working on involve significant computational analysis of graphs/networks with 1-100 million nodes and 10 million to 1+ billion edges. The largest SMP computer I am using has 256 GB memory, but I also have access to an HPC cluster with 1000 cores, 2 TB aggregate memory, and MPI for communication.
I am primarily looking for scalable, high-performance graph libraries that could be used in either single or multi-threaded scenarios, but parallel analysis libraries based on MPI or a similar protocol for communication and/or distributed memory are also of interest for high-end problems. Target programming languages include C++, C, Java, and Python.
My research to-date has come up with the following possible solutions for these languages:
C++ -- The most viable solutions appear to be the Boost Graph Library and Parallel Boost Graph Library. I have looked briefly at MTGL, but it is currently slanted more toward massively multithreaded hardware architectures like the Cray XMT.
C - igraph and SNAP (Small-world Network
Analysis and Partitioning); latter uses OpenMP for parallelism on SMP systems.
Java - I have found no parallel libraries here yet, but JGraphT and perhaps JUNG are leading contenders in the non-parallel space.
Python - igraph and NetworkX look like the most solid options, though neither is parallel. There used to be Python bindings for BGL, but these are now unsupported; last release in 2005 looks stale now.
Other topics here on SO that I've looked at have discussed graph libraries in C++, Java, Python, and other languages. However, none of these topics focused significantly on scalability.
Does anyone have recommendations they can offer based on experience with any of the above or other library packages when applied to large graph analysis problems? Performance, scalability, and code stability/maturity are my primary concerns. Most of the specialized algorithms will be developed by my team with the exception of any graph-oriented parallel communication or distributed memory frameworks (where the graph state is distributed across a cluster).