Velocity Reviews > Perl > perl graph theory (using Graph::Directed)

# perl graph theory (using Graph::Directed)

IRR
Guest
Posts: n/a

 07-26-2004
Hi all,
I'm fairly new to perl and graph theory, so hopefully this question isn't
too ignorant on both counts. I'm trying to input basically an edge list for
a graph, and get an output of the connected graphs that all of these edges
comprise. So for example:
Edge list input:
A B
A C
B A
D E
Perl script output:
A+B+C,D+E
using strongly_connected_graph (described in Graph::Base). Admittedly,
strongly_connected_graph is a black box to me -- I'm not sure if its doing a
DFS versus BFS or ?.
But what is clear is that this scales exponentially -- O(n^2.3) or so, based
on a few test sets) -- in time versus the total number of input edges. This
is fine for a few smaller edge lists that I have, but my eventual goal is to
do this for a set with ~50,000 vertices and ~2 million edges total. Based
on those test sets I mentioned, this size dataset will take ~7000 hours to
run on my current system, which obviously isn't reasonable -- and that's not
even accounting for things getting choked up in inputing that amount of data
So my question is, am I approaching this problem with the right tools? Is
there another way within the Graph module to do this more efficiently?
Thanks,
I.R.

Greg Bacon
Guest
Posts: n/a

 07-26-2004
In article <z1YMc.454\$(E-Mail Removed)>,
IRR <(E-Mail Removed)> wrote:

: I'm fairly new to perl and graph theory, so hopefully this question
: isn't too ignorant on both counts. I'm trying to input basically an
: edge list for a graph, and get an output of the connected graphs that
: all of these edges comprise. [...]

Do you want connected subgraphs or strongly connected? You say
connected here, but you're searching for strongly connected subgraphs
with Graph::Base::strongly_connected_graph.

You say you're new to graph theory, so I'll give quick overviews in
case you don't know the difference. A connected graph (perhaps called
weakly connected for emphasis) is a graph such that for each vertex v,
v is reachable from some other vertex w. A strongly connected graph is
one such that for each pair of vertices v and w, there's a path from v
to w and a path from w to v.

Below is a rendering in Perl of the algorithm from Baase that
enumerates weakly connected subgraphs. The author reports time
complexity of Theta(max(n,m)) with space Theta(n+m), where n is the
number of vertices and m is the number of edges.

#! /usr/local/bin/perl

use warnings;
use strict;

# from Baase's *Computer Algorithms*, 2nd ed., pg. 180

sub dfs {
my \$v = shift;
my \$mark = shift;
my \$tag = shift;

\$mark->{\$v} = \$tag;

my \$w = shift @{ \$adj->{\$v} };

}
}

sub concomps {
# destructive, so copy
my %adj = map ref(\$_) ? [ @\$_ ] : \$_, %{ shift @_ };
my @v = @_;

my %mark;

my \$componentNumber = 0;

foreach my \$v (@v) {
unless (\$mark{\$v}) {
++\$componentNumber;

}
}

my %subgraph;
for (keys %mark) {
push @{ \$subgraph{\$mark{\$_}} }, \$_;
}

values %subgraph;
}

## main
my %v;

while (<DATA>) {
my(\$v,\$w) = split;

# treat graph as undirected
push @{ \$adj{\$v} } => \$w;
push @{ \$adj{\$w} } => \$v;

++\$v{\$v};
++\$v{\$w};
}

print join("," => map join("+", sort @\$_), concomps \%adj, keys %v), "\n";

__DATA__
A B
A C
B A
D E

The author used arrays, and the code above uses hashes to be more
Perlish but with time and space penalties. If the size of your input
makes you consider switching back to arrays, you might also consider
switching to a lower-level language.

Hope this helps,
Greg
--
Those who wish to preserve freedom should recognize, however, that inflation
is probably the most important single factor in that vicious circle wherein
one kind of government action makes more and more government control necessary.
-- F. A. Hayek

IRR
Guest
Posts: n/a

 07-27-2004
Thanks Greg,
You're exactly right, and I can see how strongly_connected_graph could
really eat up a substantial amount of time. That script you gave works
seamlessly and gave exactly what I was after with about 2 lines of
modification -- and it did so around 100x faster than my previous attempt!
Also picked up a copy of the book at the local library (1st edition, but it
still has an informative section on graph algorithms).
Hugely obliged,
I.R.

"Greg Bacon" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> In article <z1YMc.454\$(E-Mail Removed)>,
> IRR <(E-Mail Removed)> wrote:
>
> : I'm fairly new to perl and graph theory, so hopefully this question
> : isn't too ignorant on both counts. I'm trying to input basically an
> : edge list for a graph, and get an output of the connected graphs that
> : all of these edges comprise. [...]
>
> Do you want connected subgraphs or strongly connected? You say
> connected here, but you're searching for strongly connected subgraphs
> with Graph::Base::strongly_connected_graph.
>
> You say you're new to graph theory, so I'll give quick overviews in
> case you don't know the difference. A connected graph (perhaps called
> weakly connected for emphasis) is a graph such that for each vertex v,
> v is reachable from some other vertex w. A strongly connected graph is
> one such that for each pair of vertices v and w, there's a path from v
> to w and a path from w to v.
>
> Below is a rendering in Perl of the algorithm from Baase that
> enumerates weakly connected subgraphs. The author reports time
> complexity of Theta(max(n,m)) with space Theta(n+m), where n is the
> number of vertices and m is the number of edges.
>
> #! /usr/local/bin/perl
>
> use warnings;
> use strict;
>
> # from Baase's *Computer Algorithms*, 2nd ed., pg. 180
>
> sub dfs {
> my \$v = shift;
> my \$mark = shift;
> my \$tag = shift;
>
> \$mark->{\$v} = \$tag;
>
> my \$w = shift @{ \$adj->{\$v} };
>
> }
> }
>
> sub concomps {
> # destructive, so copy
> my %adj = map ref(\$_) ? [ @\$_ ] : \$_, %{ shift @_ };
> my @v = @_;
>
> my %mark;
>
> my \$componentNumber = 0;
>
> foreach my \$v (@v) {
> unless (\$mark{\$v}) {
> ++\$componentNumber;
>
> dfs \$v, \%adj, \%mark, \$componentNumber;
> }
> }
>
> my %subgraph;
> for (keys %mark) {
> push @{ \$subgraph{\$mark{\$_}} }, \$_;
> }
>
> values %subgraph;
> }
>
> ## main
> my %v;
>
> while (<DATA>) {
> my(\$v,\$w) = split;
>
> # treat graph as undirected
> push @{ \$adj{\$v} } => \$w;
> push @{ \$adj{\$w} } => \$v;
>
> ++\$v{\$v};
> ++\$v{\$w};
> }
>
> print join("," => map join("+", sort @\$_), concomps \%adj, keys %v),

"\n";
>
> __DATA__
> A B
> A C
> B A
> D E
>
> The author used arrays, and the code above uses hashes to be more
> Perlish but with time and space penalties. If the size of your input
> makes you consider switching back to arrays, you might also consider
> switching to a lower-level language.
>
> Hope this helps,
> Greg
> --
> Those who wish to preserve freedom should recognize, however, that

inflation
> is probably the most important single factor in that vicious circle

wherein
> one kind of government action makes more and more government control

necessary.
> -- F. A. Hayek