Parsing of a file

 
 
Tommy Grav
Guest
Posts: n/a
 
      08-06-2008
I have a file with the format

Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10

I would like to parse this file by extracting the field id, ra, dec and mjd
from each line. However, it is not certain that the values of the field id,
ra, dec or mjd have the same width in every line. Is there a way to do this
so that a line would still be parsed correctly even if Ra=****** and
MJD=******** were swapped?

Cheers
Tommy
 
 
 
 
 
Mike Driscoll
Guest
Posts: n/a
 
      08-06-2008


I'm sure Python can handle this. Try the PyParsing module or learn
Python regular expression syntax.

http://pyparsing.wikispaces.com/

You could probably do it very crudely by just iterating over each line
and then using the string's find() method.
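
A minimal sketch of that find() idea, written in Python 3 rather than the Python 2 used elsewhere in this thread. The helper name value_after is made up for illustration, and the approach really is crude: it assumes each value ends at the next space and that no other token on the line happens to contain the text "Ra=", "Dec=" or "MJD=".

```python
def value_after(line, key):
    """Return the text that follows 'key=' up to the next space, or None."""
    marker = key + "="
    start = line.find(marker)
    if start == -1:              # key is not present on this line
        return None
    start += len(marker)
    end = line.find(" ", start)
    if end == -1:                # value runs to the end of the line
        end = len(line)
    return line[start:end]


# A line with Ra and MJD swapped, as in the original question.
line = "Field f31448: MJD=53370.06811620 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2"
record = {key: value_after(line, key) for key in ("Ra", "Dec", "MJD")}
print(record)
```

Because each key is looked up by name, the order of the fields on the line does not matter.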

Mike
 
 
 
 
 
John Machin
Guest
Posts: n/a
 
      08-06-2008
On Aug 7, 6:02 am, Mike Driscoll <(E-Mail Removed)> wrote:
> I'm sure Python can handle this. Try the PyParsing module or learn
> Python regular expression syntax.
>
> http://pyparsing.wikispaces.com/
>
> You could probably do it very crudely by just iterating over each line
> and then using the string's find() method.
>


Perhaps you and the OP could spend some time becoming familiar with
built-in functions and str methods. In particular, str.split is your
friend:

C:\junk>type tommy_grav.py
# Look, Ma, no imports!

guff = """\
Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: MJD=53370.06811620123 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2
Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5

Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10

"""

is_angle = {
    'ra': True,
    'dec': True,
    'mjd': False,
}

def convert_angle(text):
    deg, min, sec = map(float, text.split(':'))
    return (sec / 60. + min) / 60. + deg

def parse_line(line):
    t = line.split()
    assert t[0].lower() == 'field'
    assert t[1].startswith('f')
    assert t[1].endswith(':')
    field_id = t[1].rstrip(':')
    rdict = {}
    for f in t[2:]:
        parts = f.split('=')
        if len(parts) == 2:
            key = parts[0].lower()
            value = parts[1]
            assert key not in rdict
            if is_angle[key]:
                rvalue = convert_angle(value)
            else:
                rvalue = float(value)
            rdict[key] = rvalue
    return field_id, rdict['ra'], rdict['dec'], rdict['mjd']

for line in guff.splitlines():
    line = line.strip()
    if not line:
        continue
    field_id, ra, dec, mjd = parse_line(line)
    print("%s %s %s %s" % (field_id, ra, dec, mjd))


C:\junk>tommy_grav.py
f29227 20.3962611111 67.5 53370.0679769
f31448 20.4161472222 79.6621944444 53370.0681162
f31226 20.4126388889 78.4458888889 53370.0682386
f31004 20.4181333333 77.2296944444 53370.0683602
f30782 20.4310944444 76.0135 53370.0684821
f30560 20.4505055556 74.7973055556 53370.068604
f30338 20.4756527778 73.5811111111 53370.0687262
f30116 20.5060277778 72.3648888889 53370.0688489
f29894 20.5412611111 71.1486111111 53370.0689707
f29672 20.5810805556 69.9323888889 53370.0690935

Cheers,
John

 
 
bearophileHUGS@lycos.com
Guest
Posts: n/a
 
      08-06-2008
Using something like PyParsing is probably better, but if you don't
want to use it you may use something like this:

raw_data = """
Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10"""

# from each line extract the fields: id, ra, dec, mjd
# even if they are swapped

data = []
for line in raw_data.lower().splitlines():
    if line.startswith("field"):
        parts = line.split()
        # parts[1] looks like "f29227:"; drop the leading "f" and trailing ":"
        record = {"id": int(parts[1][1:-1])}
        for part in parts[2:]:
            if "=" in part:
                title, field = part.split("=")
                record[title] = field
        data.append(record)
print(data)

-----------------

Stefan Behnel:
>You can use named groups in a single regular expression.<


Can you show how to use them in this situation when fields can be
swapped?
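
One possible answer, sketched in Python 3. This is not Stefan's code (his post is not shown in this thread); it is only an illustration of how named groups can cope with swapped fields by placing each group inside its own lookahead. The names FIELD_RE and parse are made up for the example, and the pattern assumes the keys are spelled exactly Ra, Dec and MJD.

```python
import re

# Each lookahead scans forward from just after "Field <id>:" for one key,
# so the three key=value pairs may appear in any order on the line.
FIELD_RE = re.compile(
    r"Field\s+(?P<id>\S+):"
    r"(?=.*?\bRa=(?P<ra>\S+))"
    r"(?=.*?\bDec=(?P<dec>\S+))"
    r"(?=.*?\bMJD=(?P<mjd>\S+))"
)


def parse(line):
    """Return a dict with id, ra, dec and mjd, or None if any key is missing."""
    m = FIELD_RE.match(line)
    return m.groupdict() if m else None


normal = "Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1"
swapped = "Field f31448: MJD=53370.06811620 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2"
print(parse(normal))
print(parse(swapped))
```

The trade-off is readability: the split-based versions in this thread are arguably easier to follow, while the single pattern rejects lines that lack any of the three keys.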

Bye,
bearophile
 
 
John Machin
Guest
Posts: n/a
 
      08-06-2008


Slightly less ugly:

C:\junk>diff tommy_grav.py tommy_grav_2.py
18,23d17
< is_angle = {
<     'ra': True,
<     'dec': True,
<     'mjd': False,
< }
<
27a22,27
> converter = {
>     'ra': convert_angle,
>     'dec': convert_angle,
>     'mjd': float,
> }
>
41,44c41
<             if is_angle[key]:
<                 rvalue = convert_angle(value)
<             else:
<                 rvalue = float(value)
---
>             rvalue = converter[key](value)
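
For readers who would rather not apply the diff by hand, here is a rough Python 3 sketch of the same dispatch-table idea. It is a paraphrase, not the patched file itself: the asserts are left out, the variable names differ, and the handling of a leading minus sign in convert_angle is an addition that the original script does not have.

```python
def convert_angle(text):
    """Turn 'dd:mm:ss.s' (optionally signed) into a decimal number."""
    # Sign handling is an assumption added here; the thread's data is all positive.
    sign = -1.0 if text.startswith('-') else 1.0
    whole, minutes, seconds = (float(p) for p in text.lstrip('+-').split(':'))
    return sign * ((seconds / 60.0 + minutes) / 60.0 + whole)


# Map each key to the function that converts its text value.
converter = {
    'ra': convert_angle,
    'dec': convert_angle,
    'mjd': float,
}


def parse_line(line):
    tokens = line.split()
    field_id = tokens[1].rstrip(':')
    values = {}
    for token in tokens[2:]:
        key, sep, value = token.partition('=')
        if sep:  # only key=value tokens; "Frames", "5", "Set" are skipped
            values[key.lower()] = converter[key.lower()](value)
    return field_id, values['ra'], values['dec'], values['mjd']


sample = "Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1"
print(parse_line(sample))
```

Supporting a new key then only needs a new entry in converter, with no extra branch in parse_line.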

 
 
Henrique Dante de Almeida
Guest
Posts: n/a
 
      08-06-2008


Did you consider changing the file format in the first place, so that
you don't have to do any contortions to parse it?
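
If the format were negotiable, one option is a one-time conversion to CSV, after which the standard csv module can read the file back without any custom parsing. The sketch below is Python 3 and only illustrative: the column names, the to_row helper and the two sample lines are assumptions, and it writes to an in-memory buffer instead of a real file.

```python
import csv
import io

lines = [
    "Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1",
    "Field f31448: MJD=53370.06811620 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2",
]


def to_row(line):
    """Collect the id and every key=value token of one line into a dict."""
    tokens = line.split()
    row = {"id": tokens[1].rstrip(":")}
    for token in tokens[2:]:
        key, sep, value = token.partition("=")
        if sep:
            row[key.lower()] = value
    return row


out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "ra", "dec", "mjd"],
                        lineterminator="\n")
writer.writeheader()
writer.writerows(to_row(line) for line in lines)
csv_text = out.getvalue()
print(csv_text)

# Reading it back needs no custom parsing at all.
rows = list(csv.DictReader(io.StringIO(csv_text)))
```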

Anyway, here is a solution with regular expressions (I'm a beginner
with regular expressions in Python, so please correct it if it is wrong
and suggest better solutions):

import re
s = """Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
Field f30560: Dec=+74:47:50.3 Ra=20:27:01.82 MJD=53370.06860400 Frames 5 Set 6
Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10"""

s = s.split('\n')
# Ra and Dec may come in either order; the alternative that does not
# match leaves its two groups empty. MJD must still come last.
r = re.compile(r'Field (\S+): (?:(?:Ra=(\S+) Dec=(\S+))|(?:Dec=(\S+) Ra=(\S+))) MJD=(\S+)')
for i in s:
    match = r.findall(i)
    field = match[0][0]
    Ra = match[0][1] or match[0][4]
    Dec = match[0][2] or match[0][3]
    MJD = match[0][5]
    print(field, Ra, Dec, MJD)
 
 
Bruno Desthuilliers
Guest
Posts: n/a
 
      08-07-2008


Quick and dirty:

src = open('/path/to/yourfile.ext')
parsed = []
for line in src:
    line = line.strip()
    if not line:
        continue
    # "Field f29227" goes to head, the key=value pairs to rest
    head, rest = line.split(':', 1)
    field_id = head.split()[1]
    data = dict(field_id=field_id)
    parts = rest.split()
    for part in parts:
        try:
            key, val = part.split('=')
        except ValueError:
            # not a key=value token (e.g. "Frames", "5", "Set")
            continue
        data[key] = val
    parsed.append(data)
src.close()
 
 
Mike Driscoll
Guest
Posts: n/a
 
      08-07-2008
On Aug 6, 4:06 pm, John Machin <(E-Mail Removed)> wrote:
> Perhaps you and the OP could spend some time becoming familiar with
> built-in functions and str methods. In particular, str.split is your
> friend:
>


I'm well aware of the split() method and built-ins, however since this
appeared to be a homework-type question and I was at work, I didn't
spend any time on the issue. The only reason I mentioned McGuire's
PyParsing module was because I had just finished reading his article
on the subject in Python Magazine and it sounded like something the OP
might find interesting.

Here's my own implementation based on what's already been done here.
I'm sure one could have some fun doing it with itertools or list
comprehensions if one wanted to get really fancy.

<code>

raw_data = """
Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10
""".splitlines()

myList = []
for line in raw_data:
    items = line.split()
    myDict = {}
    for item in items:
        if '=' in item:
            key, value = item.split('=')
            myDict[key] = value
        elif item[:1].lower() == 'f' and item[-1:] == ':':
            # the id token looks like "f29227:"; keep only the digits
            myDict['id'] = item[1:-1]
    if myDict:  # the blank first line would otherwise add an empty dict
        myList.append(myDict)

print(myList)

</code>

This doesn't have any type checking or error handling, but it works
with the data provided.
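
For the list-comprehension variant mentioned above, something along these lines might do. It is a Python 3 sketch with a shortened two-line sample; the parse helper is a made-up name, and like the loop version it does no type checking.

```python
raw_data = """
Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: MJD=53370.06811620 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2
"""


def parse(line):
    """Build one record dict from the key=value tokens of a line."""
    tokens = line.split()
    record = dict(tok.split("=", 1) for tok in tokens if "=" in tok)
    record["id"] = tokens[1][1:-1]   # "f29227:" becomes "29227"
    return record


# Only lines that start with "Field" are records; blank lines are skipped.
records = [parse(line) for line in raw_data.splitlines()
           if line.startswith("Field")]
print(records)
```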

Mike
 
 
Tommy Grav
Guest
Posts: n/a
 
      08-08-2008

On Aug 7, 2008, at 12:52 PM, Mike Driscoll wrote:
> I'm well aware of the split() method and built-ins, however since this
> appeared to be a homework-type question and I was at work, I didn't
> spend any time on the issue. The only reason I mentioned McGuire's
> PyParsing module was because I had just finished reading his article
> on the subject in Python Magazine and it sounded like something the OP
> might find interesting.


Thanks to everyone who responded; I learned a lot about text parsing from
the responses. I just wanted to respond to Mike and let him know that this
was not a homework problem. I was given a file in this format by a colleague
for a project that I am working on (it contains a list of fields observed by
the LINEAR asteroid search project during 2005 and 2006). I could have
parsed it using slices of each line, but the unusual format of each line
got me thinking about whether there was another way to do it. I had tried a
few approaches, but I had not considered .split() and .split("="). Of course,
the list members quickly came up with a simple and elegant solution, and
I learned a lot in the process.

Cheers
Tommy Grav
Associate Research Scientist, Dept. of Physics and Astronomy
Johns Hopkins University, Bloomberg 243
3400 N. Charles St., Baltimore, MD 21218
(E-Mail Removed)
(410) 516-7683




 