Velocity Reviews - Computer Hardware Reviews


extract text from log file using re

 
 
Fabian Braennstroem, 09-13-2007
Hi,

I would like to delete a region from a log file which has this
kind of structure:


#------flutest------------------------------------------------------------
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing "/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.


Writing "/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.dat"...
Done.

500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04 8.3913e-04 3.8545e-03 1.3315e-01 11:14:10 500

reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499

#------------------------------------------------------------------

I have a small script which removes lines starting with
'(re)versed', '(i)teration' and '(t)urbulent' and puts the
rest into an array:

# -- plot residuals ----------------------------------------
import re
filename="flutest"
reversed_flow=re.compile('^\ re')
turbulent_viscosity_ratio=re.compile('^\ tu')
iteration=re.compile('^\ \ i')

begin_of_res=re.compile('>\ \ \ i')
end_of_res=re.compile('^\ ad')

begin_of_writing=re.compile('^\Writing')
end_of_writing=re.compile('^\Done')

end_number=0
begin_number=0

# first pass: record the line numbers of the region markers
n = 0
for line in open(filename).readlines():
    n = n + 1
    if begin_of_res.match(line):
        begin_number=n+1
        print "Line Number (begin): " + str(n)

    if end_of_res.match(line):
        end_number=n
        print "Line Number (end): " + str(n)

    if begin_of_writing.match(line):
        begin_w=n+1
        print "BeginWriting: " + str(n)
        print "HALLO"

    if end_of_writing.match(line):
        end_w=n+1
        print "EndWriting: " +str(n)

# make end_number cover the rest of the file
if n > end_number:
    end_number=n
    print "Line Number (end): " + str(end_number)

array = []
array_dummy = []
array_mapped = []

mapped = []
mappe = []

# second pass: split and collect the lines between the markers
n = 0
for line in open(filename).readlines():
    n = n + 1
    if (begin_number <= n) and (end_number > n):
    # if (begin_w <= n) and (end_w > n):
        # parentheses let the condition span several lines
        if (not reversed_flow.match(line) and
                not iteration.match(line) and
                not turbulent_viscosity_ratio.match(line)):
            m=(line.strip().split())
            print m
            if len(m) > 0:
                # print len(m)
                laenge_liste=len(m)
                # print len(m)
                mappe.append(m)

#--end plot residuals-------------------------------------------------

This works fine, except for the region with the writing
information:

#-----writing information -----------------------------------------
Writing "/home/fb/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.
# -------end writing information -------------------------------

Does anyone know how I can remove this 'writing' stuff too?
It occurs a lot.

Regards!
Fabian

 
Peter Otten, 09-14-2007
Fabian Braennstroem wrote:

> [...]


The following regular expressions have some extra backslashes
which change their meaning:

> begin_of_writing=re.compile('^\Writing')
> end_of_writing=re.compile('^\Done')
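A small check, written for Python 3 and using raw strings so the pattern text is identical to the posted (non-raw) literals, shows what the two escapes actually do: `\W` means "any non-word character" and `\D` means "any non-digit character". The variable names below reuse the ones from the question; the `_fixed` patterns are illustrative corrections.

```python
import re

# Same pattern text as in the question: \W and \D are regex classes here.
begin_of_writing = re.compile(r'^\Writing')  # \W = any NON-word character, then "riting"
end_of_writing = re.compile(r'^\Done')       # \D = any NON-digit character, then "one"

# Illustrative corrections: no backslash before the literal text.
begin_fixed = re.compile(r'^Writing')
end_fixed = re.compile(r'^Done')

line_w = 'Writing "/home/fb/fluent-0500.cas"...'
line_d = 'Done.'

# "W" is a word character, so \W can never match it: the start marker is missed.
print(begin_of_writing.match(line_w))
# "D" is a non-digit, so the end marker matches, but only by accident.
print(end_of_writing.match(line_d) is not None)
print(begin_fixed.match(line_w) is not None)
print(end_fixed.match(line_d) is not None)
```

So the end pattern happens to work while the start pattern never fires on a real "Writing" line.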


But I don't think you need regular expressions at all.
Also, it's better to iterate over the file just once; that way
you don't need to remember the positions of the regions to be skipped.
Here's a simplified demo:

def skip_region(items, start, end):
    items = iter(items)
    while 1:
        # pass lines through until a start marker is found
        for line in items:
            if start(line):
                break
            yield line
        else:
            break
        # drop lines up to and including the end marker
        for line in items:
            if end(line):
                break
        else:
            break

def begin(line):
    # startswith() also catches 'Writing "/path/..."...' on a single line
    return line.strip().startswith("Writing")

def end(line):
    return line.strip() == "Done."

# --- begin demo setup (remove to test with real data) ---
def open(filename):
    from StringIO import StringIO
    return StringIO("""\
iteration # to be ignored
alpha
beta
reversed # to be ignored
Writing
to
be
ignored
Done.
gamma
delta

""")
# --- end demo setup ---

if __name__ == "__main__":
    filename = "flutest"
    for line in skip_region(open(filename), begin, end):
        line = line.strip()
        if line and not line.startswith(("reversed", "iteration")):
            print line

skip_region() takes a file (or any iterable) and two functions
that test for the begin/end of the region to be skipped.
You can nest skip_region() calls if you have regions with different
start/end conditions.
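A Python 3 sketch of the same generator, with two nested calls as described above. `io.StringIO` stands in for the file, `return` replaces the `break`s (same effect inside a generator), and the BEGIN/END region is hypothetical, invented only to show nesting.

```python
import io

def skip_region(items, start, end):
    """Yield items, dropping every run from a start() line to an end() line."""
    items = iter(items)
    while True:
        # Pass lines through until a start marker (which is itself dropped).
        for line in items:
            if start(line):
                break
            yield line
        else:
            return  # input exhausted outside a region
        # Swallow lines up to and including the end marker.
        for line in items:
            if end(line):
                break
        else:
            return  # input exhausted inside an unterminated region

def is_writing(line):
    return line.strip().startswith("Writing")

def is_done(line):
    return line.strip() == "Done."

# Hypothetical second region type, for illustration only.
def is_begin(line):
    return line.strip() == "BEGIN"

def is_end(line):
    return line.strip() == "END"

log = io.StringIO("""\
alpha
Writing "/home/fb/fluent-0500.cas"...
to be ignored
Done.
beta
BEGIN
also ignored
END
gamma
""")

# The inner call removes Writing/Done blocks, the outer one BEGIN/END blocks.
kept = [line.strip()
        for line in skip_region(skip_region(log, is_writing, is_done),
                                is_begin, is_end)]
print(kept)  # ['alpha', 'beta', 'gamma']
```

Because each call only consumes lines lazily from the one inside it, the file is still read once, line by line.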

Peter
 
Paul McGuire, 09-14-2007
On Sep 13, 4:09 pm, Fabian Braennstroem <f.braennstr...@gmx.de> wrote:
> Hi,
>
> I would like to delete a region on a log file which has this
> kind of structure:
>


How about just searching for what you want? Here are two approaches,
one using pyparsing, one using the batteries-included re module.

-- Paul


# -*- coding: iso-8859-15 -*-
data = """\
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.dat"...
Done.


500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04 8.3913e-04
3.8545e-03 1.3315e-01 11:14:10 500


reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
"""

print "search using pyparsing"
from pyparsing import *

integer = Word(nums).setParseAction(lambda t:int(t[0]))
scireal = Regex(r"\d*\.\d*e\-\d\d").setParseAction(lambda t:float(t[0]))
time = Regex(r"\d\d:\d\d:\d\d")

logline = (integer("testNum") +
           And([scireal]*7)("data") +
           time("testTime") +
           integer("result"))

for tRes in logline.searchString(data):
    print "Test#:",tRes.testNum
    print "Data:", tRes.data
    print "Time:", tRes.testTime
    print "Output:", tRes.result
    print

print
print "search using re's"
import re
integer = r"\d*"
scireal = r"\d*\.\d*e\-\d\d"
time = r"\d\d:\d\d:\d\d"
ws = r"\s*"

namedField = lambda reStr,n: "(?P<%s>%s)" % (n,reStr)
logline = re.compile(
    namedField(integer,"testNum") + ws +
    namedField( (scireal+ws)*7,"data" ) +
    namedField(time,"testTime") + ws +
    namedField(integer,"result") )
for m in logline.finditer(data):
    print "Test#:",int(m.group("testNum"))
    print "Data:", map(float,m.group("data").split())
    print "Time:", m.group("testTime")
    print "Output:", int(m.group("result"))
    print

Prints:

search using pyparsing
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500

Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499


search using re's
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500

Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499
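For readers on Python 3, the named-group regex translates directly. This is an adaptation, not Paul's code: it uses `\d+` instead of `\d*` for the integers and, as an assumption beyond the posted data, also accepts a `+` sign in the exponent. The sample is trimmed from the log in the question; the Writing/Done block needs no special handling because nothing in it looks like a residual row.

```python
import re

# One residual row: iteration number, seven residuals, wall-clock time and a
# trailing integer. \s+ also matches newlines, so wrapped rows are found too.
SCIREAL = r"\d*\.\d*e[-+]\d\d"  # assumption: "+" exponents may also occur
ROW = re.compile(
    r"(?P<testNum>\d+)\s+"
    + r"(?P<data>(?:" + SCIREAL + r"\s+){7})"
    + r"(?P<testTime>\d\d:\d\d:\d\d)\s+"
    + r"(?P<result>\d+)"
)

def residual_rows(text):
    """Return (testNum, [seven floats], time, result) for every row in text."""
    rows = []
    for m in ROW.finditer(text):
        rows.append((int(m.group("testNum")),
                     [float(x) for x in m.group("data").split()],
                     m.group("testTime"),
                     int(m.group("result"))))
    return rows

sample = """\
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing "/home/fb/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
1232141 nodes, binary.
Done.

500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04
8.3913e-04 3.8545e-03 1.3315e-01 11:14:10 500
"""

for row in residual_rows(sample):
    print(row)
```

The returned list of tuples can take the place of the `mappe` array from the original script.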


 