Velocity Reviews - Computer Hardware Reviews

counting matched lines in extremely large files.

 
 
mikester
Guest
Posts: n/a
 
      12-18-2003
First off I'll say - I am a bad perl programmer.

I want to be better and with your help I'll get there and then be able
to contribute more here.

That being said, I have a simple problem compounded by file size.

I have a PIX that logs to my syslog server for a ton of items - my
log sizes get extremely large, ~13 gigabytes daily, and they are
rotated daily.

I'm trying to set up some intrusion detection, but with files that
big, just counting incidents to start getting a baseline gets time, CPU
and memory intensive using shell commands like grep. So I wanted to do
something in Perl, but I don't know whether I can, given the file size
and memory limitations.

Here's the shell command based perl script I run to get a basic count
on a certain number of incidents.

#!/usr/bin/perl
$LOG = "$ARGV[1]";
$VARIABLE = "$ARGV[0]";
$GREP = `zgrep -c $VARIABLE $LOG`;
print "$GREP\n";

I print out the number and another program uses that output to put the
number into a database.

How would I accomplish this simply in Perl?

More complicated would be to match multiple variables against the same
log at one time. I would just pull the log into memory if it were a
manageable size but it is not...

Anyway - your help is appreciated.

The Mikester
 
 
 
 
 
mikester
Guest
Posts: n/a
 
      12-18-2003
(E-Mail Removed) (mikester) wrote in message news:<(E-Mail Removed)>...
> First off I'll say - I am a bad perl programmer.
>
> I want to be better and with your help I'll get there and then be able
> to contribute more here.
>
> That being said, I have a simple problem compounded by file size.
>
> I have a PIX that logs to my syslog server for a ton of items - my
> logs sizes get extremely large; ~13 GIGABYTEs daily and they are
> rotated daily.
>
> I'm trying to set up some intrusion detection but with file sizes that
> big just counting incidents to start getting a baseline gets time, cpu
> and memory intensive using shell commands like grep. So I wanted to do
> something in perl but I don't know if because of the file size and
> memory limitations I can do that.
>
> Here's the shell command based perl script I run to get a basic count
> on a certain number of incidents.
>
> #!/usr/bin/perl
> $LOG = "$ARGV[1]";
> $VARIABLE = "$ARGV[0]";
> $GREP = `zgrep -c $VARIABLE $LOG`;
> print "$GREP\n";
>
> I print out the number and another program uses that output to put the
> number into a database.
>
> How would I accomplish this simply in perl?
>
> More complicated would be to match multiple variables against the same
> log at one time. I would just pull the log into memory if it were a
> manageable size but it is not...
>
> Anyway - your help is appreciated.
>
> The Mikester



Sorry, typo. It is actually:

> #!/usr/bin/perl
> $LOG = "$ARGV[1]";
> $VARIABLE = "$ARGV[0]";
> $GREP = `grep -c $VARIABLE $LOG`; <----
> print "$GREP\n";


Thanks
 
 
 
 
 
Jim Gibson
Guest
Posts: n/a
 
      12-19-2003
In article <(E-Mail Removed) >, mikester
<(E-Mail Removed)> wrote:

[snip]

>
> Here's the shell command based perl script I run to get a basic count
> on a certain number of incidents.
>
> #!/usr/bin/perl
> $LOG = "$ARGV[1]";
> $VARIABLE = "$ARGV[0]";
> $GREP = `zgrep -c $VARIABLE $LOG`;
> print "$GREP\n";
>
> I print out the number and another program uses that output to put the
> number into a database.
>
> How would I accomplish this simply in perl?


Here is a simple perl program that will do that:

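#!/usr/bin/perl

use strict;
use warnings;

# Usage: script.pl PATTERN LOGFILE
my $log = $ARGV[1];
my $count = 0;

# Read one line at a time so memory use does not grow with file size.
open(my $fh, '<', $log) or die("Can't open $log: $!");
while (<$fh>) {
    $count++ if /$ARGV[0]/;
}
close($fh);
print "count of '$ARGV[0]' in $log is $count\n";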


>
> More complicated would be to match multiple variables against the same
> log at one time. I would just pull the log into memory if it were a
> manageable size but it is not...


Scanning one line at a time is better. You can make the regular
expression (/$ARGV[0]/ above) as complicated as you want it.
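
For the multiple-pattern case, here is a minimal sketch of a single pass over the log that keeps a separate count per pattern. The sub name count_patterns and the sample log lines are illustrative assumptions, not part of the original script. The demo reads from an in-memory string so it runs without a log file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how many lines match each pattern, reading one line at a time.
# Returns a hash ref of pattern => number of matching lines.
sub count_patterns {
    my ($fh, @patterns) = @_;
    # Compile each pattern once rather than on every line.
    my %regex = map { $_ => qr/$_/ } @patterns;
    my %count = map { $_ => 0 } @patterns;
    while (my $line = <$fh>) {
        for my $p (@patterns) {
            $count{$p}++ if $line =~ $regex{$p};
        }
    }
    return \%count;
}

# Demo on a small in-memory "log"; these sample lines are made up.
my $sample = join "\n",
    'Dec 18 10:00:01 pix Deny tcp src outside:10.0.0.1',
    'Dec 18 10:00:02 pix Built outbound TCP connection',
    'Dec 18 10:00:03 pix Deny udp src outside:10.0.0.2',
    '';
open(my $fh, '<', \$sample) or die "Can't open in-memory log: $!";
my $counts = count_patterns($fh, 'Deny', 'Built', 'Teardown');
close($fh);
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

To run it on a real log, open the file with open(my $fh, '<', $path) and pass that handle instead. Memory use depends on the number of patterns, not on the size of the file.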

>
> Anyway - your help is appreciated.
>
> The Mikester

 
 
mikester
Guest
Posts: n/a
 
      12-22-2003
Jim Gibson <(E-Mail Removed)> wrote in message news:<191220031038058768%(E-Mail Removed) >...
> In article <(E-Mail Removed) >, mikester
> <(E-Mail Removed)> wrote:
>
> [snip]
>
> >
> > Here's the shell command based perl script I run to get a basic count
> > on a certain number of incidents.
> >
> > #!/usr/bin/perl
> > $LOG = "$ARGV[1]";
> > $VARIABLE = "$ARGV[0]";
> > $GREP = `zgrep -c $VARIABLE $LOG`;
> > print "$GREP\n";
> >
> > I print out the number and another program uses that output to put the
> > number into a database.
> >
> > How would I accomplish this simply in perl?

>
> Here is a simple perl program that will do that:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my $log = $ARGV[1];
> my $count = 0;
>
> open(LOG,$log) or die("Can't open $log: $!");
> while(<LOG>) {
> $count++ if /$ARGV[0]/;
> }
> print "count of '$ARGV[0]' in $log is $count\n";
>
>
> >
> > More complicated would be to match multiple variables against the same
> > log at one time. I would just pull the log into memory if it were a
> > manageable size but it is not...

>
> Scanning one line at a time is better. You can make the regular
> expression (/$ARGV[0]/ above) as complicated as you want it.
>
> >
> > Anyway - your help is appreciated.
> >
> > The Mikester



I'll give that a shot tomorrow. Thanks - I'll let you know how it goes.
 
 
 
mikester
Guest
Posts: n/a
 
      12-23-2003
(E-Mail Removed) (mikester) wrote in message news:<(E-Mail Removed). com>...
> Jim Gibson <(E-Mail Removed)> wrote in message news:<191220031038058768%(E-Mail Removed) >...
> > In article <(E-Mail Removed) >, mikester
> > <(E-Mail Removed)> wrote:
> >
> > [snip]
> >
> > >
> > > Here's the shell command based perl script I run to get a basic count
> > > on a certain number of incidents.
> > >
> > > #!/usr/bin/perl
> > > $LOG = "$ARGV[1]";
> > > $VARIABLE = "$ARGV[0]";
> > > $GREP = `zgrep -c $VARIABLE $LOG`;
> > > print "$GREP\n";
> > >
> > > I print out the number and another program uses that output to put the
> > > number into a database.
> > >
> > > How would I accomplish this simply in perl?

> >
> > Here is a simple perl program that will do that:
> >
> > #!/usr/bin/perl
> >
> > use strict;
> > use warnings;
> >
> > my $log = $ARGV[1];
> > my $count = 0;
> >
> > open(LOG,$log) or die("Can't open $log: $!");
> > while(<LOG>) {
> > $count++ if /$ARGV[0]/;
> > }
> > print "count of '$ARGV[0]' in $log is $count\n";
> >
> >
> > >
> > > More complicated would be to match multiple variables against the same
> > > log at one time. I would just pull the log into memory if it were a
> > > manageable size but it is not...

> >
> > Scanning one line at a time is better. You can make the regular
> > expression (/$ARGV[0]/ above) as complicated as you want it.
> >
> > >
> > > Anyway - your help is appreciated.
> > >
> > > The Mikester

>
>
> I'll give that a shot tomorrow, Thanks - I'll let you know how it goes.



It works great - but not with the large files. The files are around
13GB in size and I just don't have the memory to load that up.
 
 
Jim Gibson
Guest
Posts: n/a
 
      12-23-2003
In article <(E-Mail Removed)> , mikester
<(E-Mail Removed)> wrote:

> (E-Mail Removed) (mikester) wrote in message
> news:<(E-Mail Removed). com>...
> > Jim Gibson <(E-Mail Removed)> wrote in message
> > news:<191220031038058768%(E-Mail Removed) >...
> > > In article <(E-Mail Removed) >, mikester
> > > <(E-Mail Removed)> wrote:
> > >
> > > [snip]
> > >
> > > >
> > > > Here's the shell command based perl script I run to get a basic count
> > > > on a certain number of incidents.
> > > >


[snip]

> > > Here is a simple perl program that will do that:
> > >
> > > #!/usr/bin/perl
> > >
> > > use strict;
> > > use warnings;
> > >
> > > my $log = $ARGV[1];
> > > my $count = 0;
> > >
> > > open(LOG,$log) or die("Can't open $log: $!");
> > > while(<LOG>) {
> > > $count++ if /$ARGV[0]/;
> > > }
> > > print "count of '$ARGV[0]' in $log is $count\n";
> > >


>
> It works great - but not with the large files. The files are around
> 13GB in size and I just don't have the memory to load that up.


It shouldn't take much more memory to run that program on a 13GB file
than it does on a small one. The program only reads in one line at a
time. What doesn't "work great" with the large file? What happens?
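
To help answer "what happens?", here is a hedged diagnostic sketch. It counts matches line by line, reports progress through a callback, and prints whether this perl was built with large-file support. The thread never states the cause of the failure, so the large-file check is only one thing to rule out. The sub name count_with_progress, its parameters, and the sample lines are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Count matching lines, calling $progress->($lines_read) every $every lines
# so a long run on a huge file shows that it is still moving.
sub count_with_progress {
    my ($fh, $pattern, $every, $progress) = @_;
    my $re = qr/$pattern/;
    my ($count, $lines) = (0, 0);
    while (my $line = <$fh>) {
        $lines++;
        $count++ if $line =~ $re;
        $progress->($lines) if $every && $lines % $every == 0;
    }
    return ($count, $lines);
}

# 'define' when this perl was built with large-file support; without it,
# files over 2GB may fail to open or to read fully.
my $largefiles = $Config{uselargefiles} || 'not set';
print STDERR "uselargefiles: $largefiles\n";

# Demo on in-memory data (made-up lines); pass a real file handle in practice.
my $sample = "deny a\npermit b\ndeny c\npermit d\n";
open(my $fh, '<', \$sample) or die "Can't open in-memory log: $!";
my ($count, $lines) = count_with_progress(
    $fh, 'deny', 2, sub { print STDERR "read $_[0] lines\n" });
close($fh);
print "matched $count of $lines lines\n";
```

On a real 13GB log, a larger interval such as one million lines keeps the progress output readable. If the line count stops well short of the end of the file, that points at the read rather than at memory.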
 
 
mikester
Guest
Posts: n/a
 
      12-25-2003
Jim Gibson <(E-Mail Removed)> wrote in message news:<231220031527288072%(E-Mail Removed) >...
> In article <(E-Mail Removed)> , mikester
> <(E-Mail Removed)> wrote:
>
> > (E-Mail Removed) (mikester) wrote in message
> > news:<(E-Mail Removed). com>...
> > > Jim Gibson <(E-Mail Removed)> wrote in message
> > > news:<191220031038058768%(E-Mail Removed) >...
> > > > In article <(E-Mail Removed) >, mikester
> > > > <(E-Mail Removed)> wrote:
> > > >
> > > > [snip]
> > > >
> > > > >
> > > > > Here's the shell command based perl script I run to get a basic count
> > > > > on a certain number of incidents.
> > > > >

>
> [snip]
>
> > > > Here is a simple perl program that will do that:
> > > >
> > > > #!/usr/bin/perl
> > > >
> > > > use strict;
> > > > use warnings;
> > > >
> > > > my $log = $ARGV[1];
> > > > my $count = 0;
> > > >
> > > > open(LOG,$log) or die("Can't open $log: $!");
> > > > while(<LOG>) {
> > > > $count++ if /$ARGV[0]/;
> > > > }
> > > > print "count of '$ARGV[0]' in $log is $count\n";
> > > >

>
> >
> > It works great - but not with the large files. The files are around
> > 13GB in size and I just don't have the memory to load that up.

>
> It shouldn't take much more memory to run that program on a 13GB file
> than it does on a small one. The program only reads in one line at a
> time. What doesn't "work great" with the large file? What happens?


I'll post the output after the holiday.
 