Extract Text Table From File

 
 
Huso
      08-27-2012
Hi,

I am trying to extract some text table data from a log file. I have tried different methods, but I can't get anything to work. I am also fairly new to Python, so I would appreciate it if someone could help me out.

Below is just ONE block of the traffic data I have in the log files. The files contain more blocks like it, with different data.

ROUTES TRAFFIC RESULTS, LSR
TRG MP DATE TIME
37 17 120824 0000

R TRAFF NBIDS CCONG NDV ANBLO MHTIME NBANSW
AABBCCO 6.4 204 0.0 115 1.0 113.4 144
AABBCCI 3.0 293 115 1.0 37.0 171
DDEEFFO 0.2 5 0.0 59 0.0 107.6 3
EEFFEEI 0.0 0 59 0.0 0.0 0
HHGGFFO 0.0 0 0.0 30 0.0 0.0 0
HHGGFFI 0.3 15 30 0.0 62.2 4
END

Thanks
 
Laszlo Nagy
      08-27-2012
On 2012-08-27 11:53, Huso wrote:
> Hi,
>
> I am trying to extract some text table data from a log file. I am trying different methods, but I don't seem to get anything to work. I am kind of new to python as well. Hence, appreciate if someone could help me out.


#
# Write test data to test.txt
#

data = """
ROUTES TRAFFIC RESULTS, LSR
TRG MP DATE TIME
37 17 120824 0000

R TRAFF NBIDS CCONG NDV ANBLO MHTIME NBANSW
AABBCCO 6.4 204 0.0 115 1.0 113.4 144
AABBCCI 3.0 293 115 1.0 37.0 171
DDEEFFO 0.2 5 0.0 59 0.0 107.6 3
EEFFEEI 0.0 0 59 0.0 0.0 0
HHGGFFO 0.0 0 0.0 30 0.0 0.0 0
HHGGFFI 0.3 15 30 0.0 62.2 4
END
"""
fout = open("test.txt", "w")
fout.write(data)
fout.close()

#
# This is how you iterate over a file and process its lines
#
fin = open("test.txt", "r")
for line in fin:
    # This is one possible way to extract values.
    values = line.strip().split()
    print(values)
fin.close()


This will print:

[]
['ROUTES', 'TRAFFIC', 'RESULTS,', 'LSR']
['TRG', 'MP', 'DATE', 'TIME']
['37', '17', '120824', '0000']
[]
['R', 'TRAFF', 'NBIDS', 'CCONG', 'NDV', 'ANBLO', 'MHTIME', 'NBANSW']
['AABBCCO', '6.4', '204', '0.0', '115', '1.0', '113.4', '144']
['AABBCCI', '3.0', '293', '115', '1.0', '37.0', '171']
['DDEEFFO', '0.2', '5', '0.0', '59', '0.0', '107.6', '3']
['EEFFEEI', '0.0', '0', '59', '0.0', '0.0', '0']
['HHGGFFO', '0.0', '0', '0.0', '30', '0.0', '0.0', '0']
['HHGGFFI', '0.3', '15', '30', '0.0', '62.2', '4']
['END']


The "values" list in the last line contains these values. This will work
only if you don't have spaces in your values. Otherwise you can use
regular expressions to parse a line. See here:

http://docs.python.org/library/re.html

Since you did not give any specification of your file format, it would
be hard to give a concrete program that parses your file(s).
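
To make the limitation of split() concrete, here is a minimal sketch (Python 3; the names HEADER and to_record are mine, not part of any library). It pairs the values of a line with the column names and refuses rows that come out short, since split() cannot tell which column was blank:

```python
# Column names taken from the table header line of the sample block.
HEADER = "R TRAFF NBIDS CCONG NDV ANBLO MHTIME NBANSW".split()

def to_record(line, header=HEADER):
    """Return a dict for a complete row, or None if values are missing."""
    values = line.split()
    if len(values) != len(header):
        # A column is blank; zipping would shift values into wrong columns.
        return None
    return dict(zip(header, values))

print(to_record("AABBCCO 6.4 204 0.0 115 1.0 113.4 144"))
print(to_record("AABBCCI 3.0 293 115 1.0 37.0 171"))
```

The second call returns None because that row has only seven values for eight columns.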

Best,

Laszlo



 
Huso
      08-27-2012
Hi,

Thank you for the information.
This is exactly how I want to extract the data.

TRG, MP, DATE and TIME are common to a given block of traffic.
So I am taking those and dumping them, together with the rest of the data, into SQL.
The table will have all the headers (TRG, MP, DATE, TIME, R, TRAFF, NBIDS, CCONG, NDV, ANBLO, MHTIME, NBANSW).

So from this text, the first row will be 37, 17, 120824, 0000, AABBCCO, 6.4, 204, 0.0, 115, 1.0, 113.4, 144.
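
As an illustration of that target layout only, here is a rough sketch using Python's sqlite3 module. The choice of SQLite, the in-memory database and the table name traffic are all assumptions made for the example; the thread does not say which SQL database is used.

```python
import sqlite3

# The twelve headers listed above, in order.
COLUMNS = ["TRG", "MP", "DATE", "TIME", "R", "TRAFF", "NBIDS",
           "CCONG", "NDV", "ANBLO", "MHTIME", "NBANSW"]

# In-memory database and table name "traffic" are assumptions for the demo.
conn = sqlite3.connect(":memory:")
column_sql = ", ".join('"%s"' % name for name in COLUMNS)
conn.execute("CREATE TABLE traffic (%s)" % column_sql)

# The first row from the sample block: block values followed by row values.
first_row = ["37", "17", "120824", "0000", "AABBCCO",
             "6.4", "204", "0.0", "115", "1.0", "113.4", "144"]
placeholders = ", ".join("?" for _ in COLUMNS)
conn.execute("INSERT INTO traffic VALUES (%s)" % placeholders, first_row)

rows = conn.execute('SELECT "DATE", "TIME", "R", "TRAFF" FROM traffic').fetchall()
print(rows)
```

The values are stored as text here; converting them to numbers before inserting is left out to keep the sketch short.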

Thanking,
Huso
 
Laszlo Nagy
      08-27-2012

> Hi,
>
> Thank you for the information.
> The exact way I want to extract the data is like as below.
>
> TRG, MP and DATE and TIME is common for that certain block of traffic.
> So I am using those and dumping it with the rest of the data into sql.
> Table will have all headers (TRG, MP, DATE, TIME, R, TRAFF, NBIDS, CCONG, NDV, ANBLO, MHTIME, NBANSW).
>
> So from this text, the first data will be 37, 17, 120824, 0000, AABBCCO, 6.4, 204, 0.0, 115, 1.0, 113.4, 144.

How many blocks do you have in a file? Do you want to create different
data sets for those blocks? How do you identify those blocks? (E.g. are
they all saved into the same database table the same way?)

Anyway here is something:

import re
# AABBCCO 6.4 204 0.0 115 1.0 113.4 144
# Rows with fewer than seven numeric values will not match.
pattern = re.compile(r"""([A-Z]{7})""" + 7 * r"""\s+([\d\.]+)""")

#
# This is how you iterate over a file and process its lines
#
fin = open("test.txt", "r")
blocks = []
block = None
for line in fin:
    # This is one possible way to extract values.
    values = line.strip().split()
    if values == ['R', 'TRAFF', 'NBIDS', 'CCONG', 'NDV', 'ANBLO',
                  'MHTIME', 'NBANSW']:
        # A table header starts a new block.
        if block is not None:
            blocks.append(block)
        block = []
    elif block is not None:
        res = pattern.match(line.strip())
        if res:
            values = list(res.groups())
            values[1:] = map(float, values[1:])
            block.append(values)
if block is not None:
    blocks.append(block)
fin.close()

for idx, block in enumerate(blocks):
    print("BLOCK %d" % idx)
    for values in block:
        print(values)

This prints:

BLOCK 0
['AABBCCO', 6.4, 204.0, 0.0, 115.0, 1.0, 113.4, 144.0]
['DDEEFFO', 0.2, 5.0, 0.0, 59.0, 0.0, 107.6, 3.0]
['HHGGFFO', 0.0, 0.0, 0.0, 30.0, 0.0, 0.0, 0.0]
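
Note that the rows whose names end in I are missing from this output: they carry only six numbers, so the pattern does not match them. If the blank column is always CCONG (an assumption, inferred from the NDV values being the same in each O/I pair of the sample), one optional group is enough to keep those rows. A minimal sketch in Python 3, where ROW_RE and parse_row are made-up names:

```python
import re

ROW_RE = re.compile(
    r"^([A-Z]{7})"        # R: route name
    r"\s+([\d.]+)"        # TRAFF
    r"\s+(\d+)"           # NBIDS
    r"(?:\s+([\d.]+))?"   # CCONG, assumed to be the only optional column
    r"\s+(\d+)"           # NDV
    r"\s+([\d.]+)"        # ANBLO
    r"\s+([\d.]+)"        # MHTIME
    r"\s+(\d+)$"          # NBANSW
)

def parse_row(line):
    """Return [name, numbers...] with None for a blank CCONG, or None."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    groups = match.groups()
    numbers = [None if g is None else float(g) for g in groups[1:]]
    return [groups[0]] + numbers

print(parse_row("AABBCCO 6.4 204 0.0 115 1.0 113.4 144"))
print(parse_row("AABBCCI 3.0 293 115 1.0 37.0 171"))
```

If other columns can be blank as well, a regular expression cannot decide which one is missing, and the column positions in the file have to be used instead.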

 
Huso
      08-27-2012
Hi,

There can be any number of blocks in the log file.
I identify a block by its start header 'ROUTES TRAFFIC RESULTS, LSR' and its ending 'END'. Each block will have a unique [date + time] value.

I tried the code you posted, and it works for the data part.
But I also need to get the TRG, MP, DATE and TIME for the block along with that data. This is the part that I'm really tangled up in.

Thanking,
Huso
 
Laszlo Nagy
      08-27-2012
On 2012-08-27 13:23, Huso wrote:
> Hi,
>
> There can be any number of blocks in the log file.
> I distinguish the block by the start header 'ROUTES TRAFFIC RESULTS, LSR' and ending in 'END'. Each block will have a unique [date + time] value.
>
> I tried the code you mentioned, it works for the data part.
> But I need to get the TRG, MP, DATE and TIME for the block with those data as well. This is the part that i'm really tangled in.
>
> Thanking,
> Huso

Well, I suggest that you try to understand my code and make changes to
it. It is not too hard. Start by reading the documentation of the
"re" module. Python is worth learning, especially for mining data out
of text files.
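
As a possible starting point, here is a small sketch (Python 3) of one way to remember the TRG, MP, DATE and TIME values while walking through the lines. The function name parse_blocks and the flag expect_common are made up for this example, and it assumes that the values always sit on the first non-empty line after the 'TRG MP DATE TIME' line and that every block ends with 'END':

```python
def parse_blocks(lines):
    """Yield (common, rows) per block; a sketch, not a validated parser."""
    common = None          # TRG, MP, DATE, TIME values of the current block
    rows = None            # data rows, or None while outside a table
    expect_common = False  # True right after the 'TRG MP DATE TIME' line
    for line in lines:
        values = line.split()
        if values == ["TRG", "MP", "DATE", "TIME"]:
            expect_common = True
        elif expect_common and values:
            common = values
            expect_common = False
        elif values[:2] == ["R", "TRAFF"]:
            rows = []      # table header seen: data rows follow
        elif values == ["END"]:
            if common is not None and rows is not None:
                yield common, rows
            common, rows = None, None
        elif rows is not None and values:
            rows.append(values)

sample = """\
ROUTES TRAFFIC RESULTS, LSR
TRG MP DATE TIME
37 17 120824 0000

R TRAFF NBIDS CCONG NDV ANBLO MHTIME NBANSW
AABBCCO 6.4 204 0.0 115 1.0 113.4 144
DDEEFFO 0.2 5 0.0 59 0.0 107.6 3
END
"""

for common, rows in parse_blocks(sample.splitlines()):
    for row in rows:
        print(common + row)
```

Each printed list starts with the four block values followed by the values of one data row, which is the shape asked for. The rows are still plain strings split on whitespace, so the blank-column problem is not addressed here.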

Best,

Laszlo

 
Tim Chase
      08-27-2012
On 08/27/12 04:53, Huso wrote:
> Below is just ONE block of the traffic i have in the log files. There will be more in them with different data.
>
> ROUTES TRAFFIC RESULTS, LSR
> TRG MP DATE TIME
> 37 17 120824 0000
>
> R TRAFF NBIDS CCONG NDV ANBLO MHTIME NBANSW
> AABBCCO 6.4 204 0.0 115 1.0 113.4 144
> AABBCCI 3.0 293 115 1.0 37.0 171
> DDEEFFO 0.2 5 0.0 59 0.0 107.6 3
> HHGGFFI 0.3 15 30 0.0 62.2 4
> END


In the past I've used something like the following to find columnar
data based on some found headers:

import re
token_re = re.compile(r'\b(\w+)\s*')
f = open(FILENAME)   # FILENAME: path to your log file
headers = next(f)    # in your case, you'd
                     # search forward until
                     # you got to a header line
                     # and use that TRAFF... line
header_map = dict(
    # build a map of field-name to slice
    (
        matchobj.group(1).upper(),
        slice(*matchobj.span())
    )
    for matchobj
    in token_re.finditer(headers)
)

You can then access your values as you iterate through the rest of
the rows:

for row in f:
    if row.startswith("END"): break
    traff = float(row[header_map["TRAFF"]])
    # ...

which makes the code pretty easy to read, effectively turning it
into a CSV file.

It has the advantage that, if for some reason data in the columns
have spaces in them, it won't throw off the row as a .split() would.
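
Here is a self-contained sketch of the same idea (Python 3) that shows what happens with a blank column. The column widths in WIDTHS are hypothetical, chosen only to build aligned sample lines, and the helper names are made up for the example; the real log file will have its own layout.

```python
import re

TOKEN_RE = re.compile(r"\b(\w+)\s*")

def column_slices(header_line):
    """Map each header name to the slice of text it spans."""
    return {m.group(1): slice(*m.span()) for m in TOKEN_RE.finditer(header_line)}

def cell(row, slices, name):
    """Return the stripped text under column `name`, or None if blank."""
    text = row[slices[name]].strip()
    return text or None

# Hypothetical fixed-width layout, used only to build the sample lines.
WIDTHS = (9, 7, 7, 7, 5, 7, 8, 6)

def fixed_width(*cells):
    """Left-justify each cell to its column width and join them."""
    return "".join(c.ljust(w) for c, w in zip(cells, WIDTHS))

header = fixed_width("R", "TRAFF", "NBIDS", "CCONG", "NDV",
                     "ANBLO", "MHTIME", "NBANSW")
row = fixed_width("AABBCCI", "3.0", "293", "", "115", "1.0", "37.0", "171")

slices = column_slices(header)
print(cell(row, slices, "CCONG"))  # the blank column
print(cell(row, slices, "NDV"))
```

Because each value is looked up by position, the blank CCONG cell comes back as None while NDV still yields '115'. This relies on the values starting in the same columns as their headers.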

-tkc



 
Ramchandra Apte
      09-05-2012
On Monday, 27 August 2012 15:42:14 UTC+5:30, Laszlo Nagy wrote:
> [...]
> The "values" list in the last line contains these values. This will work
> only if you don't have spaces in your values. Otherwise you can use
> regular expressions to parse a line. See here:
>
> http://docs.python.org/library/re.html

The csv module should be used for this, not regex.
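
For what it is worth, a minimal sketch of the csv-module route (Python 3). It assumes the fields are separated by runs of spaces; read_rows is a made-up helper name. Setting delimiter to a space together with skipinitialspace=True makes csv.reader treat consecutive spaces as one separator:

```python
import csv

def read_rows(lines):
    """Split space-separated lines with csv.reader, dropping empty lines."""
    # Strip each line first so trailing spaces do not add an empty field.
    stripped = (line.strip() for line in lines)
    reader = csv.reader(stripped, delimiter=" ", skipinitialspace=True)
    return [row for row in reader if row]

sample = [
    "R TRAFF NBIDS CCONG NDV ANBLO MHTIME NBANSW",
    "AABBCCO   6.4  204  0.0  115  1.0  113.4  144",
    "",
    "END",
]
for row in read_rows(sample):
    print(row)
```

Like split(), this collapses a blank column into nothing, so it does not help with rows where a value is missing.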


 