Re: Convert AWK regex to Python

 
 
J
05-16-2011

Hello Peter, Angelico,

OK, let's see. My aim is to filter out several fields from a log file and write them to a new log file. The current log file, as I mentioned previously, has thousands of lines like this:
2011-05-16 09:46:22,361 [Thread-4847133] PDU D <G_CC_SMS_SERVICE_51408_656.O_ CC_SMS_SERVICE_51408_656-ServerThread-VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004 Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) ) >

All the lines in the log file are similar and they all have the same length (same number of fields). Most of the fields are separated by spaces, except for a couple of them which I am processing with AWK (removing "<G_" from the string, for example). So in essence what I want to do is evaluate each line in the log file, break it down into fields which I can reference individually, and write them to a new log file (for example, selecting only fields 1, 2 and 3).

I hope this is clearer now.

Regards,

Junior
 
Steven D'Aprano
05-16-2011

On Mon, 16 May 2011 03:57:49 -0700, J wrote:

> Most of the fields are separated by
> spaces, except for a couple of them which I am processing with AWK
> (removing "<G_" from the string, for example). So in essence what I want
> to do is evaluate each line in the log file, break it down into fields
> which I can reference individually, and write them to a new log file (for
> example, selecting only fields 1, 2 and 3).


fields = line.split()  # like AWK, split on any run of whitespace
output.write(fields[0] + ' ')  # AWK's $1 is fields[0]: Python counts from zero
output.write(fields[1] + ' ')
output.write(fields[2] + '\n')
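
If the whole file needs processing, a minimal loop might look like this. It is only a sketch: "current.log" and "filtered.log" are placeholder names, and the "<G_" stripping is based on your description of what the AWK step does.

# Sketch only: file names are placeholders
with open("current.log") as infile, open("filtered.log", "w") as output:
    for line in infile:
        fields = line.split()
        # Drop the "<G_" prefix wherever it occurs, per the AWK step
        fields = [f.replace("<G_", "", 1) for f in fields]
        # Write AWK's $1, $2, $3 (zero-based in Python)
        output.write(' '.join(fields[:3]) + '\n')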



--
Steven
 
Peter Otten
05-16-2011

J wrote:

> Hello Peter, Angelico,
>
> OK, let's see. My aim is to filter out several fields from a log file and
> write them to a new log file. The current log file, as I mentioned
> previously, has thousands of lines like this:
> 2011-05-16 09:46:22,361 [Thread-4847133] PDU D <G_CC_SMS_SERVICE_51408_656.O_
> CC_SMS_SERVICE_51408_656-ServerThread-VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX
> - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004
> Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) ) >
>
> All the lines in the log file are similar and they all have the same
> length (same number of fields). Most of the fields are separated by
> spaces, except for a couple of them which I am processing with AWK (removing
> "<G_" from the string, for example). So in essence what I want to do is
> evaluate each line in the log file, break it down into fields which I can
> reference individually, and write them to a new log file (for example,
> selecting only fields 1, 2 and 3).
>
> I hope this is clearer now.


Not much.

It doesn't really matter whether there are 100, 1000, or a million lines in
the file; the important information is the structure of the file. You may be
able to get away with a quick and dirty script consisting of just a few
regular expressions, e.g.:

import re

filename = ...

def get_service(line):
    return re.compile(r"[(](\w+)").search(line).group(1)

def get_command(line):
    return re.compile(r"<G_(\w+)").search(line).group(1)

def get_status(line):
    return re.compile(r"Status:\s+(\d+)").search(line).group(1)

with open(filename) as infile:
    for line in infile:
        print get_service(line), get_command(line), get_status(line)
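
One side note on the helpers above: re.compile() inside each function recompiles the pattern on every call. The re module caches compiled patterns, so it still works, but compiling once up front is the more common idiom, e.g.:

SERVICE = re.compile(r"[(](\w+)")

def get_service(line):
    # Same behaviour as above; the pattern is compiled once at import time
    return SERVICE.search(line).group(1)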

But there is no guarantee that there isn't data in your file that breaks the
implied assumptions. Also, from the shell hackery it looks like your
ultimate goal is a kind of frequency table, which could be built along
these lines:

freq = {}
with open(filename) as infile:
    for line in infile:
        service = get_service(line)
        command = get_command(line)
        status = get_status(line)
        key = command, service, status
        freq[key] = freq.get(key, 0) + 1

for key, occurrences in sorted(freq.iteritems()):
    print "Command: {}, Service: {}, Status: {}, Occurrences: {}".format(*key + (occurrences,))
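
If you are on Python 2.7, collections.Counter does the same tally and saves the get() dance; a sketch under the same assumptions as above:

from collections import Counter

freq = Counter()
with open(filename) as infile:
    for line in infile:
        key = get_command(line), get_service(line), get_status(line)
        freq[key] += 1  # missing keys default to 0

for key, occurrences in sorted(freq.iteritems()):
    print "Command: {}, Service: {}, Status: {}, Occurrences: {}".format(*key + (occurrences,))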

 
J
05-16-2011

Thanks for the suggestions, Peter. I will give them a try.


 