Velocity Reviews - Computer Hardware Reviews

Parsing ascii file

 
 
diablo
      06-17-2004
Hello,

I have a file that contains the following data (example) and does NOT have
any line feeds:

11 22 33 44 55 66 77 88 99 00 aa bb cc
dd ....to 128th byte 11 22 33 44 55 66 77 88 99
00 aa bb cc dd .... and so on

Record 1 starts at 0 and finishes at 128, record 2 starts at 129 and
finishes at 256, and so on. There can be as many as 5000 records per file. I
would like to parse the file, retrieve the value of the field at bytes 64-65
of each record, and perform an arithmetic operation on it (sum them all up).

Can I do this with Python?

If I were to use awk, it would look something like this:

cat <filename> | fold -w 128 | awk ' { SUM=SUM + substr($0,64,2) } END {print SUM}'


Regards
Dean
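
To make the record arithmetic concrete, here is a minimal sketch of where each record and its field would sit in the file. It assumes fixed 128-byte records laid end to end with no separators, and it reads "byte 64-65" as 1-based positions, the way awk's substr counts. The names record_span and field_span are illustrative only.

```python
RECORD_WIDTH = 128   # assumed: every record is exactly 128 bytes
FIELD_START = 64     # assumed: 1-based position, as in awk's substr($0,64,2)
FIELD_LENGTH = 2


def record_span(k, width=RECORD_WIDTH):
    """Return the 0-based, half-open byte range [start, end) of record k (k = 1, 2, ...)."""
    start = (k - 1) * width
    return start, start + width


def field_span(k, width=RECORD_WIDTH, field_start=FIELD_START, length=FIELD_LENGTH):
    """Return the 0-based, half-open byte range of the field inside record k."""
    record_start, _ = record_span(k, width)
    start = record_start + (field_start - 1)  # 1-based position -> 0-based offset
    return start, start + length


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, record_span(k), field_span(k))
```

Under these assumptions record 1 covers bytes 0 to 127 and record 2 covers bytes 128 to 255; if the real layout differs, only the three constants need to change.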


 
Peter Otten
      06-17-2004
Is it an ASCII or a binary file? I'm not entirely sure from your description.
In the following I assume binary data, but it should be easy to modify the
value() function if those two bytes are ASCII digits.

import struct, sys
from itertools import imap

def fold(instream, width=80):
    # yield consecutive chunks of `width` bytes until the stream is exhausted
    while 1:
        line = instream.read(width)
        if not line: break
        yield line

def value(line, start=64): # may be an "off by one" bug
    # return int(line[start:start+2])
    return struct.unpack("h", line[start:start+2])[0]

if __name__ == "__main__":
    try:
        filename = sys.argv[1]
    except IndexError:
        instream = sys.stdin
    else:
        # binary mode, so the bytes are read unmodified
        instream = file(filename, "rb")

    print sum(imap(value, fold(instream, 128)))

Peter
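
The code above is Python 2 (itertools.imap, file(), the print statement). Below is a minimal Python 3 sketch of the same idea, not a definitive version. It assumes 128-byte records and reads "byte 64-65" as 1-based positions as in awk's substr, hence the slice [63:65]; that choice is one possible answer to the "off by one" question. The names value_binary, value_ascii and total are illustrative and not from the original post, and which interpretation applies depends on the actual file.

```python
import io
import struct

RECORD_WIDTH = 128  # assumed record size, taken from the question


def fold(instream, width=RECORD_WIDTH):
    """Yield consecutive chunks of `width` bytes from a binary stream."""
    while True:
        chunk = instream.read(width)
        if not chunk:
            break
        yield chunk


def value_binary(record, start=63):
    """Interpret two bytes as a native-endian signed 16-bit integer."""
    return struct.unpack("h", record[start:start + 2])[0]


def value_ascii(record, start=63):
    """Interpret two bytes as ASCII decimal digits, e.g. b"42" -> 42."""
    return int(record[start:start + 2])


def total(instream, value=value_ascii, width=RECORD_WIDTH):
    """Sum the field over all complete records; a short trailing chunk is skipped."""
    return sum(value(rec) for rec in fold(instream, width) if len(rec) == width)


if __name__ == "__main__":
    # Synthetic demo data: two 128-byte records with ASCII fields "12" and "30".
    data = b"".join(b"." * 63 + field + b"." * 63 for field in (b"12", b"30"))
    print(total(io.BytesIO(data)))
```

For a real file, one would open it in binary mode, for example `with open(path, "rb") as f: print(total(f))`, and pass `value=value_binary` if the field turns out to hold a packed integer rather than digits.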

 
Eddie Corns
      06-17-2004
You can use stdin.read(128) to get consecutive records and slicing to extract
the fields. Something like:

from sys import stdin
total = 0
while True:
    record = stdin.read(128)
    if not record: break
    # awk's substr($0,64,2) is 1-based, so the Python slice is [63:65]
    total += int(record[63:65])
print total

Frankly, I'd stick with the awk version unless it's a pedagogical exercise.
Actually, I'd go further: have a script that simply sums up all the numbers
in the input, and add 'cut' to the pipeline to extract the columns first.

Eddie
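
A "sum everything" script of the kind suggested above could look roughly like the sketch below. It assumes the input consists of whitespace-separated integers in text form; any other token raises ValueError. The function name sum_numbers and the file name sumall.py are hypothetical.

```python
def sum_numbers(lines):
    """Sum every whitespace-separated integer found in an iterable of text lines."""
    total = 0
    for line in lines:
        for token in line.split():
            total += int(token)
    return total


if __name__ == "__main__":
    # Synthetic demo input; in a pipeline one would pass sys.stdin instead.
    print(sum_numbers(["12\n", "30\n", " 7\n"]))
```

With the demo replaced by `print(sum_numbers(sys.stdin))` (plus `import sys`), the pipeline might read `fold -w 128 <filename> | cut -c64-65 | python sumall.py`, keeping the column extraction in the shell and the arithmetic in one small reusable script.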
 