Velocity Reviews - Computer Hardware Reviews

matching a pattern with a space or no space??

 
 
erik
Guest
Posts: n/a
 
      11-09-2005
I am trying to chop up some NetScreen firewall logs where I just want
certain fields. In Perl, I am doing a "cut" and picking the fields that
I want. The problem is, silly NetScreens insert spaces in their service
names at will. For example, a log might have:

start_time="2005-11-08 service=https proto=6 src src_port=3873
dst_port=443 src-xlated ip=x.x.x.x
(notice there is no space in the service name, it is just https)

start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
src_port=123 dst_port=123 src-xlated
(notice the space between Network and Time.)

If my cut is space delimited, the space in the service name throws me
off by 1 field of course. How can I regex a data flow that is always
changing? I am stuck...

Now I can do a "find and replace" for ALL the possible space
delimited service names, but that has a high Level of Effort. Any
ideas?

 
 
 
 
 
Degz
Guest
Posts: n/a
 
      11-09-2005
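Does the "proto" string always come after the service?

If so, you can find the start position of "service=" and the start
position of " proto=", then take the substring between them:

my $key  = "service=";
my $pos1 = index($instring, $key) + length($key);   # first character of the service name
my $pos2 = index($instring, " proto=", $pos1);      # the space just before "proto="
my $servicename = substr($instring, $pos1, $pos2 - $pos1);   # third argument is a length, not an end position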
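
A minimal runnable sketch of the index/substr idea, assuming every line of interest contains "service=" followed later by " proto=". The helper name service_name and the sample lines (shortened from the ones in the question) are made up for illustration; index returns -1 when a marker is missing, so the sketch checks for that before calling substr.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text between "service=" and " proto=",
# or undef when either marker is missing from the line.
sub service_name {
    my ($line) = @_;
    my $key   = 'service=';
    my $start = index($line, $key);
    return if $start < 0;                        # no "service=" on this line
    $start += length $key;                       # skip past "service="
    my $end = index($line, ' proto=', $start);   # search only after the key
    return if $end < 0;                          # no " proto=" after it
    return substr($line, $start, $end - $start); # third argument is a length
}

# Sample lines shortened from the thread (an assumption, not real log output).
for my $line (
    'start_time="2005-11-08 service=https proto=6 src_port=3873',
    'start_time="2005-11-08 service=Network Time proto=17 src_port=123',
) {
    my $name = service_name($line);
    print defined $name ? "$name\n" : "no service field\n";
}
```

This only works while "proto=" directly follows the service field; if the field order ever changes, a regex is probably the safer choice.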

Degz

 
Michael Zawrotny
Guest
Posts: n/a
 
      11-09-2005
On 9 Nov 2005 07:45:55 -0800, erik <(E-Mail Removed)> wrote:
> I am trying to chop up some netscreen firewall logs where I just want
> certain fields. In perl, I am doing a "cut" and picking the fields that
> I want. The problem is, silly NetScreens insert spaces in their service
> name at will. For example it might have:
>
> start_time="2005-11-08 service=https proto=6 src src_port=3873
> dst_port=443 src-xlated ip=x.x.x.x
> (notice there is no space in the service name, it is just https)
>
> start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
> src_port=123 dst_port=123 src-xlated
> (notice the space between Network and Time.)


As you discovered, simply splitting on whitespace doesn't work. There
might be ways to make it work, but it's better to match the parts you
want explicitly with capturing groups and leave the rest uncaptured.
For example, on a cut-down version of your data:

#!/usr/bin/perl

use strict;
use warnings;

while ( <DATA> ) {
    chomp;
    # Capture the service name and the protocol; skip lines that do not match.
    my ($service, $proto) = /service=(.*)\s+proto=(.*)/ or next;
    print "$service: $proto\n";
}

__DATA__
service=https proto=6
service=Network Time proto=17
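
On the full log lines from the question, more fields follow proto=, so a greedy (.*) after proto= would capture the rest of the line. A hedged sketch of one way around that, assuming the service name always ends at the first " proto=" and the protocol is a plain number. The helper name service_and_proto and the shortened sample lines are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the service name and protocol number out of a
# full log line. Returns an empty list when the line does not match.
sub service_and_proto {
    my ($line) = @_;
    # (.+?) stops at the first " proto=", and (\d+) takes only the digits,
    # so the fields that follow proto= are not swallowed.
    return $line =~ /\bservice=(.+?)\s+proto=(\d+)/;
}

# Sample lines shortened from the thread (an assumption, not real log output).
my @lines = (
    'start_time="2005-11-08 service=https proto=6 src_port=3873 dst_port=443',
    'start_time="2005-11-08 service=Network Time proto=17 src_port=123 dst_port=123',
);

for my $line (@lines) {
    my ($service, $proto) = service_and_proto($line) or next;
    print "$service: $proto\n";
}
```

For the two sample lines this prints "https: 6" and "Network Time: 17".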


Mike

--
Michael Zawrotny
Institute of Molecular Biophysics
Florida State University | email: (E-Mail Removed)
Tallahassee, FL 32306-4380 | phone: (850) 644-0069
 
 
it_says_BALLS_on_your forehead
Guest
Posts: n/a
 
      11-09-2005

erik wrote:
> I am trying to chop up some netscreen firewall logs where I just want
> certain fields. In perl, I am doing a "cut" and picking the fields that
> I want. The problem is, silly NetScreens insert spaces in their service
> name at will. For example it might have:
>
> start_time="2005-11-08 service=https proto=6 src src_port=3873
> dst_port=443 src-xlated ip=x.x.x.x
> (notice there is no space in the service name, it is just https)
>
> start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
> src_port=123 dst_port=123 src-xlated
> (notice the space between Network and Time.)
>
> If my cut is space delimited, the space in the service name throws me
> off by 1 field of course. How can I regex a data flow that is always
> changing? I am stuck...
>
> Now I can do a "find and replace" for ALL the possible space
> delimited service names, but that has a high Level of Effort. Any
> ideas?


i'm going to guess that the 'name' in the name/value pair cannot
contain spaces. i'm also going to guess that all name/value pairs are
delimited by spaces. if this is true, you can match on this pattern:

/(\w+)=([\w\s]+)\s\w+=/

...i haven't tested this, just using it to convey the concept. you can
also probably do positive look-ahead, but i'm not too familiar with
that.

 
 
it_says_BALLS_on_your forehead
Guest
Posts: n/a
 
      11-09-2005

it_says_BALLS_on_your forehead wrote:
> erik wrote:
> > I am trying to chop up some netscreen firewall logs where I just want
> > certain fields. In perl, I am doing a "cut" and picking the fields that
> > I want. The problem is, silly NetScreens insert spaces in their service
> > name at will. For example it might have:
> >
> > start_time="2005-11-08 service=https proto=6 src src_port=3873
> > dst_port=443 src-xlated ip=x.x.x.x
> > (notice there is no space in the service name, it is just https)
> >
> > start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
> > src_port=123 dst_port=123 src-xlated
> > (notice the space between Network and Time.)
> >
> > If my cut is space delimited, the space in the service name throws me
> > off by 1 field of course. How can I regex a data flow that is always
> > changing? I am stuck...
> >
> > Now I can do a "find and replace" for ALL the possible space
> > delimited service names, but that has a high Level of Effort. Any
> > ideas?

>
> i'm going to guess that the 'name' in the name/value pair cannot
> contain spaces. i'm also going to guess that all name/value pairs are
> delimited by spaces. if this is true, you can match on this pattern:
>
> /(\w+)=([\w\s]+)\s\w+=/
>
> ...i haven't tested this, just using it to convey the concept. you can
> also probably do positive look-ahead, but i'm not too familiar with
> that.


ok, tested my pattern:

#!/apps/webstats/bin/perl

use strict;

my $string1 = "service=https proto=6";
my $string2 = "service=Network Time proto=17";

my ($name1, $value1) = $string1 =~ m/(\w+)=([\w\s]+)\s\w+=/;
my ($name2, $value2) = $string2 =~ m/(\w+)=([\w\s]+)\s\w+=/;

print "1: -$name1- = -$value1-\n";
print "2: -$name2- = -$value2-\n";

#---OUTPUT
1: -service- = -https-
2: -service- = -Network Time-


...again, positive lookahead may be a cleaner fit, since it checks for the next name without consuming it.

 
 
it_says_BALLS_on_your forehead
Guest
Posts: n/a
 
      11-09-2005

Purl Gurl wrote:
> Purl Gurl wrote:
>
> (snipped)
>
> > > start_time="2005-11-08 service=https proto=6 src src_port=3873
> > > dst_port=443 src-xlated ip=x.x.x.x

>
> > > start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
> > > src_port=123 dst_port=123 src-xlated

>
> > There exists discrepancies between your two log formats suggesting
> > your examples are fabricated.

>
> I forgot to add, there is a glaring error in both of your examples
> which directly indicates your examples are fabricated.
>
> Purl Gurl


are you referring to the double quotes?

...anyway, in the interest of productive conversation, here is the code
with positive lookaheads:

#!/apps/webstats/bin/perl

use strict; use warnings;

my $string1 = "service=https proto=6";
my $string2 = "service=Network Time proto=17";

my ($name1, $value1) = $string1 =~ m/(\w+)=([\w\s]+)(?=\s\w+=)/;
my ($name2, $value2) = $string2 =~ m/(\w+)=([\w\s]+)(?=\s\w+=)/;

print "1: -$name1- = -$value1-\n";
print "2: -$name2- = -$value2-\n";

#--OUTPUT
1: -service- = -https-
2: -service- = -Network Time-
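
Because the lookahead does not consume the next name, the same idea can be repeated with /g to walk every pair on a line. A rough sketch, assuming names are plain \w+ and each value ends where the next " name=" begins or at the end of the line. The helper name parse_pairs and the simplified sample line are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a line of name=value pairs into a hash.
# Assumes each value runs until the next " name=" or the end of the line.
sub parse_pairs {
    my ($line) = @_;
    my %field;
    # The lookahead checks for the next name (or end of line) without
    # consuming it, so the following iteration can match that name.
    while ( $line =~ /(\w+)=(.*?)(?=\s+\w+=|\s*$)/g ) {
        $field{$1} = $2;
    }
    return %field;
}

# Simplified sample line (an assumption, not real log output).
my %f = parse_pairs('service=Network Time proto=17 src_port=123 dst_port=123');
print "$_ = $f{$_}\n" for sort keys %f;
```

One caveat: a bare token with no "=" (such as the "src" in the first sample line of the question) would be folded into the preceding value, so real logs may need extra handling.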

 
 
erik
Guest
Posts: n/a
 
      11-09-2005
Thanks everyone!! I completely forgot about substringing. That'll do it.

 
 
Samwyse
Guest
Posts: n/a
 
      11-10-2005
Purl Gurl wrote:
> simon.chao wrote:
>
>>Purl Gurl wrote:
>>
>>>Purl Gurl wrote:

>
>
> (snipped)
>
>
>>are you referring to the double quotes?

>
>
> There is no such critter "double quotes."
>
> My presumption is you meant to write,
>
> "...the single quote mark?"


Or perhaps "... the single double quote?"
 
 
Samwyse
Guest
Posts: n/a
 
      11-10-2005
Purl Gurl wrote:
> I suppose you could write a couple dozen snippets to index, return
> true or false, then select an appropriate substring function. Strikes
> me substring would be the most difficult method to use.


No, 'unpack' would be the most difficult.
 