Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > convert non-delimited to delimited

Reply
Thread Tools

convert non-delimited to delimited

 
 
RyanL
Guest
Posts: n/a
 
      08-27-2007
I'm a newbie! I have a non-delimited data file that I'd like to
convert to delimited.

Example...
Line in non-delimited file:
0139725635999992000010100534+42050-102800FM-15+1198KAIA

Should be:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

What is the best way to go about this? I've looked all over for
examples, help, suggestions, but have not found much. CSV module
doesn't seem to do exactly what I want. Maybe I'm just missing
something or not using the correct terminology in my searches. Any
assistance is greatly appreaciated! Using Python 2.4

 
Reply With Quote
 
 
 
 
Mark Elston
Guest
Posts: n/a
 
      08-27-2007
* RyanL wrote (on 8/27/2007 10:59 AM):
> I'm a newbie! I have a non-delimited data file that I'd like to
> convert to delimited.
>
> Example...
> Line in non-delimited file:
> 0139725635999992000010100534+42050-102800FM-15+1198KAIA
>
> Should be:
> 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
>
> What is the best way to go about this? I've looked all over for
> examples, help, suggestions, but have not found much. CSV module
> doesn't seem to do exactly what I want. Maybe I'm just missing
> something or not using the correct terminology in my searches. Any
> assistance is greatly appreaciated! Using Python 2.4
>


Since you have to know, a priori, how to break the input string I
assume that these fields are of fixed length. You can use the following
to do what you want:

>>> a="0139725635999992000010100534+42050-102800FM-15+1198KAIA"
>>> print "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s" %

(a[0:4],a[4:10],a[10:15],a[15:19],a[19:21],a[21:23],a[23:25],
a[25:27],a[27],a[28:34],a[34:41],a[41:46],a[46:51],a[51:])

which results in the following output:

0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

Mark
 
Reply With Quote
 
 
 
 
mensanator@aol.com
Guest
Posts: n/a
 
      08-27-2007
On Aug 27, 1:35 pm, Mark Elston <(E-Mail Removed)> wrote:
> * RyanL wrote (on 8/27/2007 10:59 AM):
>
>
>
>
>
> > I'm a newbie! I have a non-delimited data file that I'd like to
> > convert to delimited.

>
> > Example...
> > Line in non-delimited file:
> > 0139725635999992000010100534+42050-102800FM-15+1198KAIA

>
> > Should be:
> > 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

>
> > What is the best way to go about this? I've looked all over for
> > examples, help, suggestions, but have not found much. CSV module
> > doesn't seem to do exactly what I want. Maybe I'm just missing
> > something or not using the correct terminology in my searches. Any
> > assistance is greatly appreaciated! Using Python 2.4

>
> Since you have to know, a priori, how to break the input string I
> assume that these fields are of fixed length. You can use the following
> to do what you want:
>
> >>> a="0139725635999992000010100534+42050-102800FM-15+1198KAIA"
> >>> print "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s" %

> (a[0:4],a[4:10],a[10:15],a[15:19],a[19:21],a[21:23],a[23:25],
> a[25:27],a[27],a[28:34],a[34:41],a[41:46],a[46:51],a[51:])
>
> which results in the following output:
>
> 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
>
> Mark


Or try this:

import struct
test = '0139725635999992000010100534+42050-102800FM-15+1198KAIA'
template = '4s6s5s4s2s2s2s2s1s6s7s5s5s4s'
the_line = struct.unpack(template,test)
print the_line
print ','.join(the_line)

## ('0139', '725635', '99999', '2000', '01', '01', '00', '53', '4',
'+42050', '-102800', 'FM-15', '+1198', 'KAIA')
##
##
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

 
Reply With Quote
 
Matimus
Guest
Posts: n/a
 
      08-27-2007
On Aug 27, 10:59 am, RyanL <(E-Mail Removed)> wrote:
> I'm a newbie! I have a non-delimited data file that I'd like to
> convert to delimited.
>
> Example...
> Line in non-delimited file:
> 0139725635999992000010100534+42050-102800FM-15+1198KAIA
>
> Should be:
> 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
>
> What is the best way to go about this? I've looked all over for
> examples, help, suggestions, but have not found much. CSV module
> doesn't seem to do exactly what I want. Maybe I'm just missing
> something or not using the correct terminology in my searches. Any
> assistance is greatly appreaciated! Using Python 2.4


I don't think you are going to find anything that will just do this
for you. You are going to have read the file, figure out where to
split the string, and reprint it delimited with commas. As for
suggesting code... I can't tell how you actually want to delimit the
stuff from the above example? Are the fields always a fixed number of
characters? If they aren't then is there some other method for
determining how many characters to group into a field? From the looks
of it you could split that string any way you want and get something
that looks right, but isn't.

Matt

 
Reply With Quote
 
Michael Bentley
Guest
Posts: n/a
 
      08-27-2007

On Aug 27, 2007, at 10:59 AM, RyanL wrote:

> I'm a newbie! I have a non-delimited data file that I'd like to
> convert to delimited.
>
> Example...
> Line in non-delimited file:
> 0139725635999992000010100534+42050-102800FM-15+1198KAIA
>
> Should be:
> 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
>
> What is the best way to go about this? I've looked all over for
> examples, help, suggestions, but have not found much. CSV module
> doesn't seem to do exactly what I want. Maybe I'm just missing
> something or not using the correct terminology in my searches. Any
> assistance is greatly appreaciated! Using Python 2.4


Is each data element a fixed size?


 
Reply With Quote
 
Neil Cerutti
Guest
Posts: n/a
 
      08-27-2007
On 2007-08-27, RyanL <(E-Mail Removed)> wrote:
> I'm a newbie! I have a non-delimited data file that I'd like to
> convert to delimited.
>
> Example...
> Line in non-delimited file:
> 0139725635999992000010100534+42050-102800FM-15+1198KAIA
>
> Should be:
> 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA


It looks like a fixed format data file, also called flat-record
files, in which fields are of fixed lengths, and records are
separated by newlines.

> What is the best way to go about this?


Check out chapter 2 of _Text Processing in Python_ for one
solution.

http://gnosis.cx/TPiP/chap2.txt

--
Neil Cerutti
Weight Watchers will meet at 7 p.m. Please use large double door at the side
entrance. --Church Bulletin Blooper
 
Reply With Quote
 
Paul McGuire
Guest
Posts: n/a
 
      08-28-2007
On Aug 27, 12:59 pm, RyanL <(E-Mail Removed)> wrote:
> I'm a newbie! I have a non-delimited data file that I'd like to
> convert to delimited.
>
> Example...
> Line in non-delimited file:
> 0139725635999992000010100534+42050-102800FM-15+1198KAIA
>
> Should be:
> 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
>
> What is the best way to go about this? I've looked all over for
> examples, help, suggestions, but have not found much. CSV module
> doesn't seem to do exactly what I want. Maybe I'm just missing
> something or not using the correct terminology in my searches. Any
> assistance is greatly appreaciated! Using Python 2.4


I'm guessing that these lines *aren't* fixed-length, especially those
signed integer fields. I used the patented Paul McGuire CrystalBall
module to come up with this pyparsing rendition. (OP may adjust to
suit.)

-- Paul

data = "0139725635999992000010100534+42050-102800FM-15+1198KAIA"
"""to be parsed as:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA"""

from pyparsing import *
import time
def convertTimeStamp(t):
t["date"] = map(int,t.date)
t["time"] = map(int,t.time)
return time.strftime("%Y-%m-%dT%H:%M",
tuple(t.date)+tuple(t.time)+(0,0,0,0))

yearMonthDay = Word(nums,exact=4) + Word(nums,exact=2) +
Word(nums,exact=2)
hourMinuteSecond = Word(nums,exact=2) + Word(nums,exact=2)
timestamp = ( yearMonthDay("date") + hourMinuteSecond("time") )
timestamp.setParseAction(convertTimeStamp)
signedInteger = Word("+-",nums)

fieldA = Word(nums,exact=4)("A")
fieldB = Word(nums,exact=6)("B")
fieldC = Word(nums,exact=5)("C")
fieldD = timestamp("timestamp")
fieldE = Word(nums)("E")
fieldF = signedInteger("latitude").setParseAction(lambda t : int(t[0])/
1000.0)
fieldG = signedInteger("longitude").setParseAction(lambda t :
int(t[0])/1000.0)
fieldH = Combine(Word(alphas,exact=2) + "-" + Word(nums,exact=2))("H")
fieldI = signedInteger("I")
fieldJ = Word(alphas)("J")
dataFields = fieldA + fieldB + fieldC + fieldD + fieldE + \
fieldF + fieldG + fieldH + fieldI + fieldJ

res = dataFields.parseString(data)
print res.dump()

prints:

['0139', '725635', '99999', '2000-01-01T00:53', '4',
42.049999999999997, -102.8, 'FM-15', '+1198', 'KAIA']
- A: 0139
- B: 725635
- C: 99999
- E: 4
- H: FM-15
- I: +1198
- J: KAIA
- latitude: 42.05
- longitude: -102.8
- timestamp: 2000-01-01T00:53

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Perl/unix script to convert a fixed width file to a tab delimited file Srikant Perl Misc 1 09-29-2007 02:32 PM
A Script To Convert iTunes XML To Tab Delimited Format jean@vaneskahian.com Perl Misc 7 05-27-2006 08:37 PM
import delimited text file to msde TJS ASP .Net 0 10-28-2003 08:40 PM
split a tab delimited string Ajit ASP .Net 6 07-29-2003 03:32 PM



Advertisments