finding ALL cycles in a huge sparse matrix
- by Andy
Hi there,
First of all I'm quite a Java beginner, so I'm not sure if this is even possible! Basically I have a huge (3+million) data source of relational data (i.e. A is friends with B+C+D, B is friends with D+G+Z (but not A - i.e. unmutual) etc.) and I want to find every cycle within this (not necessarily connected) directed graph.
I've found this thread (http://stackoverflow.com/questions/546655/finding-all-cycles-in-graph/549402#549402) which has pointed me to Donald Johnson's (elementary) cycle-finding algorithm which, superficially at least, looks like it'll do what I'm after (I'm going to try when I'm back at work on Tuesday - thought it wouldn't hurt to ask in the meanwhile!).
I had a quick scan through the code of the Java implementation of Johnson's algorithm (in that thread) and it looks like a matrix of relations is the first step, so I guess my questions are:
a) Is Java capable of handling a 3+million*3+million matrix? (was planning on representing A-friends-with-B by a binary sparse matrix)
b) Do I need to find every connected subgraph as my first problem, or will cycle-finding algorithms handle disjoint data?
c) Is this actually an appropriate solution for the problem? My understanding of "elementary" cycles is that in the graph below, rather than picking out A-B-C-D-E-F it'll pick out A-B-F, B-C-D etc. but that's not the end of the world given the task.
E
/ \
D---F
/ \ / \
C---B---A
d) If necessary, I can simplify the problem by enforcing mutuality in relations - i.e. A-friends-with-B <== B-friends-with-A, and if really necessary I can maybe cut down the data size, but realistically it is always going to be around the 1mil mark.
z) Is this a P or NP task?! Am I biting off more than I can chew?
Thanks all, any help appreciated!
Andy