An Odd Little Script

 
 
Greg Lindstrom
      03-09-2005
Hello-

I have a task which -- dare I say -- would be easy in <asbestos_undies>
Perl </asbestos_undies> but which I would rather do in Python (our primary
language at NovaSys). I have a file with varying length records. All
but the first record, that is; it's always 107 bytes long. What I would
like to do is strip out all linefeeds from the file, read the character
in position 107 (the end of segment delimiter) and then replace all of
the end of segment characters with linefeeds, making a file where each
segment is on its own line. Currently, some vendors supply files with
linefeeds, others don't, and some split the file every 80 bytes. In
Perl I would operate on the file in place and be on my way. The files
can be quite large, so I'd rather not be making extra copies unless it's
absolutely essential/required.

I turn to the collective wisdom/trickery of the list to point me in the
right direction. How can I perform the above task while keeping my sanity?

Thanks!
--greg
--
Greg Lindstrom 501 975.4859
Computer Programmer
NovaSys Health
Little Rock, Arkansas

"We are the music makers, and we are the dreamers of dreams." W.W.
 
Michael Hoffman
      03-09-2005
Greg Lindstrom wrote:

> I have a file with varying length records. All
> but the first record, that is; it's always 107 bytes long. What I would
> like to do is strip out all linefeeds from the file, read the character
> in position 107 (the end of segment delimiter) and then replace all of
> the end of segment characters with linefeeds, making a file where each
> segment is on its own line.


Hmmmm... here's one way of doing it:

import mmap
import sys

DELIMITER_OFFSET = 107

data_file = file(sys.argv[1], "r+w")
data_file.seek(0, 2)
data_length = data_file.tell()
data = mmap.mmap(data_file.fileno(), data_length, access=mmap.ACCESS_WRITE)
delimiter = data[DELIMITER_OFFSET]

for index, char in enumerate(data):
    if char == delimiter:
        data[index] = "\n"

data.flush()

There are doubtless more efficient ways, like using mmap.mmap.find()
instead of iterating over every character but that's an exercise for
the reader. And personally I would make extra copies ANYWAY--not doing
so is asking for trouble.
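
For what it's worth, the find()-based variant mentioned above might look something like the sketch below. It assumes Python 3 (bytes rather than str, "r+b" mode, mmap used as a context manager); replace_delimiters is a made-up name, not something from the original script. Like the loop above, it only swaps the delimiter for a linefeed and does not strip linefeeds that are already in the file.

```python
import mmap

DELIMITER_OFFSET = 107  # zero-based, same value as in the script above


def replace_delimiters(path, offset=DELIMITER_OFFSET):
    """Overwrite every end-of-segment byte with a linefeed, in place.

    Hypothetical helper: returns the number of bytes replaced.
    """
    count = 0
    with open(path, "r+b") as data_file:
        # Length 0 maps the whole file (the file must not be empty).
        with mmap.mmap(data_file.fileno(), 0, access=mmap.ACCESS_WRITE) as data:
            delimiter = data[offset:offset + 1]  # one-byte bytes object
            if len(delimiter) != 1:
                raise ValueError("file is shorter than the delimiter offset")
            # Jump from match to match instead of visiting every byte.
            index = data.find(delimiter)
            while index != -1:
                data[index:index + 1] = b"\n"
                count += 1
                index = data.find(delimiter, index + 1)
            data.flush()
    return count
```

Something like replace_delimiters("vendor_file.dat") would then rewrite the file in place, so working on a backup copy is still the safer habit.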
--
Michael Hoffman
 
M.E.Farmer
      03-09-2005
Greg Lindstrom wrote:
> Hello-
>
> I have a task which -- dare I say -- would be easy in <asbestos_undies>
> Perl </asbestos_undies> but would rather do in Python (our primary
> language at Novasys). I have a file with varying length records. All
> but the first record, that is; it's always 107 bytes long. What I would
> like to do is strip out all linefeeds from the file, read the character
> in position 107 (the end of segment delimiter) and then replace all of
> the end of segment characters with linefeeds, making a file where each
> segment is on its own line. Currently, some vendors supply files with
> linefeeds, others don't, and some split the file every 80 bytes. In
> Perl I would operate on the file in place and be on my way. The files
> can be quite large, so I'd rather not be making extra copies unless it's
> absolutely essential/required.
>
> I turn to the collective wisdom/trickery of the list to point me in the
> right direction. How can I perform the above task while keeping my sanity?
>
> Thanks!
> --greg
> --
> Greg Lindstrom 501 975.4859
> Computer Programmer
> NovaSys Health
> Little Rock, Arkansas
>
> "We are the music makers, and we are the dreamers of dreams." W.W.


This should be fairly simple, but maybe not.

# read the whole file, then get the end of segment character
# this is not optimal but should be a start
f = open('yourrecord', 'r')
r = f.read()
f.close()
# strip line breaks first so they do not shift position 107
r = r.replace('\r', '')
r = r.replace('\n', '')
eos = r[107]
r = r.replace(eos, '\n')
f = open('yourrecord', 'w')
f.write(r)
f.close()
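
If the files are too big to hold in memory, the same three steps can probably be done in place with a bounded buffer. The sketch below assumes Python 3 and a single-byte delimiter; find_delimiter, normalize_in_place and the 64 KB chunk size are made-up names and values, not anything from the thread. It works because the cleaned data is never longer than what was read, so the write position can never overtake the read position.

```python
DELIMITER_OFFSET = 107   # zero-based position of the end-of-segment byte
CHUNK_SIZE = 64 * 1024   # arbitrary buffer size (an assumption)


def find_delimiter(f, offset):
    """Return the byte at `offset`, counting only non-line-break bytes."""
    f.seek(0)
    seen = b""
    while len(seen) <= offset:
        chunk = f.read(4096)
        if not chunk:
            raise ValueError("file is shorter than the delimiter offset")
        seen += chunk.replace(b"\r", b"").replace(b"\n", b"")
    return seen[offset:offset + 1]


def normalize_in_place(path, offset=DELIMITER_OFFSET, chunk_size=CHUNK_SIZE):
    """Strip CR/LF, then turn each delimiter into a linefeed, in place."""
    with open(path, "r+b") as f:
        delimiter = find_delimiter(f, offset)
        read_pos = write_pos = 0
        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            read_pos += len(chunk)
            # Drop existing line breaks, then mark segment ends.
            cleaned = chunk.replace(b"\r", b"").replace(b"\n", b"")
            cleaned = cleaned.replace(delimiter, b"\n")
            f.seek(write_pos)
            f.write(cleaned)
            write_pos += len(cleaned)
        # Cut off the leftover tail from the original, longer file.
        f.truncate(write_pos)
```

One caveat: if the run is interrupted partway through, the file is left half converted, which is a good argument for keeping a backup even when the rewrite itself makes no copy.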

hth,
M.E.Farmer

 
Michael Hoffman
      03-09-2005
Michael Hoffman wrote:
> Greg Lindstrom wrote:
>
>> I have a file with varying length records. All but the first record,
>> that is; it's always 107 bytes long. What I would like to do is strip
>> out all linefeeds from the file, read the character in position 107
>> (the end of segment delimiter) and then replace all of the end of
>> segment characters with linefeeds, making a file where each segment is
>> on its own line.
>
> Hmmmm... here's one way of doing it:
>
> import mmap
> import sys
>
> DELIMITER_OFFSET = 107


N.B. this is a zero-based 107. If you are using one-based coordinates,
then this is actually position 108.
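
A tiny illustration of the difference, assuming Python 3 bytes and a made-up sample record (the "~" delimiter and segment text are invented for the example):

```python
header = b"H" * 107               # fixed-length first record: indexes 0..106
data = header + b"~" + b"SEG1~"   # "~" stands in for the delimiter

zero_based = data[107:108]          # index 107, the first byte after the header
one_based_108 = data[108 - 1:108]   # "position 108" when counting from one

print(zero_based, one_based_108)
```

Both slices pick out the same byte, the one immediately following the 107-byte header.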
--
Michael Hoffman
 