Pyparsing help

 
 
rh0dium
 
      03-22-2008
Hi all,

I am struggling with parsing the following data:

test1 = """
Technology {
name = "gtc"
dielectric = 2.75e-05
unitTimeName = "ns"
timePrecision = 1000
unitLengthName = "micron"
lengthPrecision = 1000
gridResolution = 5
unitVoltageName = "v"
voltagePrecision = 1000000
unitCurrentName = "ma"
currentPrecision = 1000
unitPowerName = "pw"
powerPrecision = 1000
unitResistanceName = "kohm"
resistancePrecision = 10000000
unitCapacitanceName = "pf"
capacitancePrecision = 10000000
unitInductanceName = "nh"
inductancePrecision = 100
}

Tile "unit" {
width = 0.22
height = 1.69
}

Layer "PRBOUNDARY" {
layerNumber = 0
maskName = ""
visible = 1
selectable = 1
blink = 0
color = "cyan"
lineStyle = "solid"
pattern = "blank"
pitch = 0
defaultWidth = 0
minWidth = 0
minSpacing = 0
}

Layer "METAL2" {
layerNumber = 36
maskName = "metal2"
isDefaultLayer = 1
visible = 1
selectable = 1
blink = 0
color = "yellow"
lineStyle = "solid"
pattern = "blank"
pitch = 0.46
defaultWidth = 0.2
minWidth = 0.2
minSpacing = 0.21
fatContactThreshold = 1.4
maxSegLenForRC = 2000
unitMinResistance = 6.1e-05
unitNomResistance = 6.3e-05
unitMaxResistance = 6.9e-05
unitMinHeightFromSub = 1.21
unitNomHeightFromSub = 1.237
unitMaxHeightFromSub = 1.267
unitMinThickness = 0.25
unitNomThickness = 0.475
unitMaxThickness = 0.75
fatTblDimension = 3
fatTblThreshold = (0,0.39,10.005)
fatTblParallelLength = (0,1,0)
fatTblSpacing = (0.21,0.24,0.6,
0.24,0.24,0.6,
0.6,0.6,0.6)
minArea = 0.144
}
"""

So it looks like, starting from the inside out,
I have a key and a value, where the value can be a QuotedString,
a Word(nums), or a list of nums.

So my code to catch this looks like this..

atflist = Suppress("(") + commaSeparatedList + Suppress(")")
atfstr = quotedString.setParseAction(removeQuotes)
atfvalues = ( Word(nums) | atfstr | atflist )

l = ("36", '"metal2"', '(0.21,0.24,0.6,0.24,0.24,0.6)')

for x in l:
    print atfvalues.parseString(x)

But this isn't parsing the list with commaSeparatedList. Can someone point
out my errors?

As a side note: is this the right approach to using pyparsing? Do we
start from the inside and work our way out, or should I have started
by looking at the bigger picture ( keyword + "{" + OneOrMore key /
vals + "}" )? I started there but couldn't figure out how to handle
multiple lines - I'm assuming I'd just join them all up?

Thanks

 
Paul McGuire
      03-22-2008
On Mar 22, 4:11 pm, rh0dium <steven.kl...@gmail.com> wrote:
> Hi all,
>
> I am struggling with parsing the following data:
>

<snip>
> As a side note: Is this the right approach to using pyparsing. Do we
> start from the inside and work our way out or should I have started
> with looking at the bigger picture ( keyword + "{" + OneOrMore key /
> vals + "}" + ) I started there but could figure out how to look
> multiline - I'm assuming I'd just join them all up?
>
> Thanks


I think your "inside-out" approach is just fine. Start by composing
expressions for the different "pieces" of your input text, then
steadily build up more and more complex forms.

I think the main complication you have is that of using
commaSeparatedList for your list of real numbers. commaSeparatedList
is a very generic helper expression. From the online example
(http://pyparsing.wikispaces.com/space/showimage/commasep.py), here is a
sample of the data that commaSeparatedList will handle:

"a,b,c,100.2,,3",
"d, e, j k , m ",
"'Hello, World', f, g , , 5.1,x",
"John Doe, 123 Main St., Cleveland, Ohio",
"Jane Doe, 456 St. James St., Los Angeles , California ",

In other words, the content of the items between commas is pretty much
anything that is *not* a comma. If you change your definition of
atflist to:

atflist = Suppress("(") + commaSeparatedList # + Suppress(")")

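(that is, comment out the trailing right paren), you'll get this
successful parse result:

['0.21', '0.24', '0.6', '0.24', '0.24', '0.6)']

Note the last item: commaSeparatedList swallowed the closing paren as part
of '0.6)', so there was nothing left for your Suppress(")") to match.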

In your example, you are parsing a list of floating point numbers, in
a list delimited by commas, surrounded by parens. This definition of
atflist should give you more control over the parsing process, and
give you real floats to boot:

floatnum = Combine(Word(nums) + "." + Word(nums) +
                   Optional('e'+oneOf("+ -")+Word(nums)))
floatnum.setParseAction(lambda t:float(t[0]))
atflist = Suppress("(") + delimitedList(floatnum) + Suppress(")")

Now I get this output for your parse test:

[0.20999999999999999, 0.23999999999999999, 0.59999999999999998,
0.23999999999999999, 0.23999999999999999, 0.59999999999999998]

So you can see that this has actually parsed the numbers and converted
them to floats.

I went ahead and added support for scientific notation in floatnum,
since I see that you have several atfvalues that are standalone
floats, some using scientific notation. To add these, just expand
atfvalues to:

atfvalues = ( floatnum | Word(nums) | atfstr | atflist )
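As a rough, self-contained sketch of the value expressions discussed so far (written for Python 3, and assuming an installed pyparsing release that still exposes the camelCase names used in this thread), something like the following could be used to try them out. The `samples` list and the use of `quotedString.copy()` are illustrative choices, not part of the original code.

```python
from pyparsing import (Combine, Literal, Optional, Suppress, Word,
                       delimitedList, nums, oneOf, quotedString, removeQuotes)

# Float with a mandatory fractional part and an optional exponent.
floatnum = Combine(Word(nums) + "." + Word(nums) +
                   Optional(Literal("e") + oneOf("+ -") + Word(nums)))
floatnum.setParseAction(lambda t: float(t[0]))

# Work on a copy so the module-level quotedString is left untouched.
atfstr = quotedString.copy().setParseAction(removeQuotes)
atflist = Suppress("(") + delimitedList(floatnum) + Suppress(")")

# floatnum comes before Word(nums): "|" takes the first alternative that
# matches, and Word(nums) alone would stop after the "2" in "2.75e-05".
atfvalues = floatnum | Word(nums) | atfstr | atflist

samples = ["36", '"metal2"', "2.75e-05", "(0.21,0.24,0.6,0.24,0.24,0.6)"]
results = [atfvalues.parseString(s).asList() for s in samples]
for sample, result in zip(samples, results):
    print(sample, "->", result)
```

The ordering of the alternatives is the main design point here: with `Word(nums)` listed first, a value such as 2.75e-05 would be cut short at the decimal point.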

(At this point, I'll go on to show how to parse the rest of the data
structure - if you want to take a stab at it yourself, stop reading
here, and then come back to compare your results with my approach.)

To parse the overall structure, now that you have expressions for the
different component pieces, look into using Dict (or more simply using
the helper function dictOf) to define results names automagically for
you based on the attribute names in the input. Dict does *not* change
any of the parsing or matching logic, it just adds named fields in the
parsed results corresponding to the key names found in the input.

Dict is a complex pyparsing class, but dictOf simplifies things.
dictOf takes two arguments:

dictOf(keyExpression, valueExpression)

This translates to:

Dict( OneOrMore( Group(keyExpression + valueExpression) ) )

For example, to parse the lists of entries that look like:

name = "gtc"
dielectric = 2.75e-05
unitTimeName = "ns"
timePrecision = 1000
unitLengthName = "micron"
etc.

just define that this is "a dict of entries each composed of a key
consisting of a Word(alphas), followed by a suppressed '=' sign and an
atfvalues", that is:

attrDict = dictOf(Word(alphas), Suppress("=") + atfvalues)
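To make the effect of dictOf concrete, here is a minimal Python 3 sketch, assuming a pyparsing release that still provides the camelCase helpers. The simplified `value` expression and the three-line `text` sample are stand-ins chosen for brevity; they are not the full atfvalues grammar.

```python
from pyparsing import (Suppress, Word, alphas, dictOf, nums,
                       quotedString, removeQuotes)

# Simplified value: an unsigned integer or a quoted string.
value = Word(nums) | quotedString.copy().setParseAction(removeQuotes)
attrDict = dictOf(Word(alphas), Suppress("=") + value)

text = '''
name = "gtc"
timePrecision = 1000
unitLengthName = "micron"
'''

attrs = attrDict.parseString(text)
print(attrs.asList())                     # the grouped key/value pairs
print(attrs["name"], attrs.timePrecision)  # the same data, accessed by key
```

The parsed tokens are unchanged by Dict; the keys are simply layered on top, so both list-style and name-style access work on the same result.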

dictOf takes care of all of the repetition and grouping necessary for
Dict to do its work. These attribute dicts are nested within an outer
main dict, which is "a dict of entries, each with a key of
Word(alphas), and a value of an optional quotedString (an alias,
perhaps?), a left brace, an attrDict, and a right brace," or:

mainDict = dictOf(
    Word(alphas),
    Optional(quotedString)("alias") +
    Suppress("{") + attrDict + Suppress("}")
    )

By adding this code to what you already have:

attrDict = dictOf(Word(alphas), Suppress("=") + atfvalues)
mainDict = dictOf(
    Word(alphas),
    Optional(quotedString)("alias") +
    Suppress("{") + attrDict + Suppress("}")
    )

You can now write:

md = mainDict.parseString(test1)
print md.dump()
print md.Layer.lineStyle

and get this output:

[['Technology', ['name', 'gtc'], ['dielectric',
2.7500000000000001e-005], ['unitTimeName', 'ns'], ['timePrecision',
'1000'], ['unitLengthName', 'micron'], ['lengthPrecision', '1000'],
['gridResolution', '5'], ['unitVoltageName', 'v'],
['voltagePrecision', '1000000'], ['unitCurrentName', 'ma'],
['currentPrecision', '1000'], ['unitPowerName', 'pw'],
['powerPrecision', '1000'], ['unitResistanceName', 'kohm'],
['resistancePrecision', '10000000'], ['unitCapacitanceName', 'pf'],
['capacitancePrecision', '10000000'], ['unitInductanceName', 'nh'],
['inductancePrecision', '100']], ['Tile', 'unit', ['width', 0.22],
['height', 1.6899999999999999]], ['Layer', 'PRBOUNDARY',
['layerNumber', '0'], ['maskName', ''], ['visible', '1'],
['selectable', '1'], ['blink', '0'], ['color', 'cyan'], ['lineStyle',
'solid'], ['pattern', 'blank'], ['pitch', '0'], ['defaultWidth', '0'],
['minWidth', '0'], ['minSpacing', '0']]]
- Layer: ['PRBOUNDARY', ['layerNumber', '0'], ['maskName', ''],
['visible', '1'], ['selectable', '1'], ['blink', '0'], ['color',
'cyan'], ['lineStyle', 'solid'], ['pattern', 'blank'], ['pitch', '0'],
['defaultWidth', '0'], ['minWidth', '0'], ['minSpacing', '0']]
- alias: PRBOUNDARY
- blink: 0
- color: cyan
- defaultWidth: 0
- layerNumber: 0
- lineStyle: solid
- maskName:
- minSpacing: 0
- minWidth: 0
- pattern: blank
- pitch: 0
- selectable: 1
- visible: 1
- Technology: [['name', 'gtc'], ['dielectric',
2.7500000000000001e-005], ['unitTimeName', 'ns'], ['timePrecision',
'1000'], ['unitLengthName', 'micron'], ['lengthPrecision', '1000'],
['gridResolution', '5'], ['unitVoltageName', 'v'],
['voltagePrecision', '1000000'], ['unitCurrentName', 'ma'],
['currentPrecision', '1000'], ['unitPowerName', 'pw'],
['powerPrecision', '1000'], ['unitResistanceName', 'kohm'],
['resistancePrecision', '10000000'], ['unitCapacitanceName', 'pf'],
['capacitancePrecision', '10000000'], ['unitInductanceName', 'nh'],
['inductancePrecision', '100']]
- capacitancePrecision: 10000000
- currentPrecision: 1000
- dielectric: 2.75e-005
- gridResolution: 5
- inductancePrecision: 100
- lengthPrecision: 1000
- name: gtc
- powerPrecision: 1000
- resistancePrecision: 10000000
- timePrecision: 1000
- unitCapacitanceName: pf
- unitCurrentName: ma
- unitInductanceName: nh
- unitLengthName: micron
- unitPowerName: pw
- unitResistanceName: kohm
- unitTimeName: ns
- unitVoltageName: v
- voltagePrecision: 1000000
- Tile: ['unit', ['width', 0.22], ['height', 1.6899999999999999]]
- alias: unit
- height: 1.69
- width: 0.22
solid

Cheers!
-- Paul
 
Paul McGuire
 
      03-23-2008
Oof, I see that you have multiple "Layer" entries, with different
qualifying labels. Since the dicts use "Layer" as the key, you only
get the last "Layer" value, with qualifier "PRBOUNDARY", and lose the
"Layer" for "METAL2". To fix this, you'll have to move the optional
alias term to the key, and merge "Layer" and "PRBOUNDARY" into a
single key, perhaps "Layer/PRBOUNDARY" or "Layer(PRBOUNDARY)" - a
parse action should take care of this for you. Unfortunately, these
forms will not allow you to use object attribute form
(md.Layer.lineStyle); you will have to use dict access form
(md["Layer(PRBOUNDARY)"].lineStyle), since these keys have characters
that are not valid attribute name characters.

Or you could add one more level of Dict nesting to your grammar, to
permit access like "md.Layer.PRBOUNDARY.lineStyle".

-- Paul

 
rh0dium
 
      03-23-2008
On Mar 22, 6:30 pm, Paul McGuire <pt...@austin.rr.com> wrote:
> <snip>


OK - well, I got as far as you did, but I did it a bit differently.
Then I merged some of your data with my data. But now I am at the
point of adding another level of the dict and am struggling. Here is
what I have:

# punctuation literals
LPAR = Literal("(")
RPAR = Literal(")")
LBRACE = Literal("{")
RBRACE = Literal("}")
EQUAL = Literal("=")

# This will get the values all figured out..
# "metal2" 1 6.05E-05 30
cvtInt = lambda toks: int(toks[0])
cvtReal = lambda toks: float(toks[0])

integer = Combine(Optional(oneOf("+ -")) + Word(nums))\
    .setParseAction( cvtInt )
real = Combine(Optional(oneOf("+ -")) + Word(nums) + "." +
               Optional(Word(nums)) +
               Optional(oneOf("e E")+Optional(oneOf("+ -"))
                        +Word(nums)))\
    .setParseAction( cvtReal )
atfstr = quotedString.setParseAction(removeQuotes)
atflist = Group( LPAR.suppress() +
                 delimitedList(real, ",") +
                 RPAR.suppress() )

atfvalues = ( real | integer | atfstr | atflist )

# Now this should work out a single line inside a section
#   maskName = "metal2"
#   isDefaultLayer = 1
#   visible = 1
#   fatTblSpacing = (0.21,0.24,0.6,
#                    0.6,0.6,0.6)
#   minArea = 0.144
atfkeys = Word(alphanums)
attrDict = dictOf( atfkeys , EQUAL.suppress() + atfvalues)

# Now we need to take care of the "Metal2" { one or more attrDict }
#   "METAL2" {
#       layerNumber = 36
#       maskName = "metal2"
#       isDefaultLayer = 1
#       visible = 1
#       fatTblSpacing = (0.21,0.24,0.6,
#                        0.24,0.24,0.6,
#                        0.6,0.6,0.6)
#       minArea = 0.144
#   }
attrType = dictOf(atfstr, LBRACE.suppress() + attrDict +
                  RBRACE.suppress())

# Lastly we need to get the ones without attributes (Technology)
attrType2 = LBRACE.suppress() + attrDict + RBRACE.suppress()
mainDict = dictOf(atfkeys, attrType2 | attrType )

md = mainDict.parseString(test1)


But I too am only getting the last layer. I thought that if I broke out
the "alias" area and then built on that I'd be set, but I did something
wrong.

 
Paul McGuire
 
      03-23-2008
There are a couple of bugs in our program so far.

First of all, our grammar isn't parsing the METAL2 entry at all. We
should change this line:

md = mainDict.parseString(test1)

to

md = (mainDict+stringEnd).parseString(test1)

The parser is reading as far as it can, but then stopping once
successful parsing is no longer possible. Since there is at least one
valid entry matching the OneOrMore expression, then parseString raises
no errors. By adding "+stringEnd" to our expression to be parsed, we
are saying "once parsing is finished, we should be at the end of the
input string". By making this change, we now get this parse
exception:

pyparsing.ParseException: Expected stringEnd (at char 194, (line:54,
col:1)
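The silent early stop described above can be reproduced with a much smaller grammar. The following Python 3 sketch assumes a pyparsing release that still accepts the camelCase names; the `numbers` expression and the sample `text` are made up for illustration. It also shows the `parseAll=True` argument, which to my knowledge has the same effect as appending stringEnd in releases that support it.

```python
from pyparsing import OneOrMore, ParseException, Word, nums, stringEnd

numbers = OneOrMore(Word(nums))
text = "1 2 3 oops 4"

# Without an end-of-input check, parsing stops quietly before "oops".
partial = numbers.parseString(text).asList()
print(partial)

# Requiring stringEnd turns the unparsed remainder into an error.
try:
    (numbers + stringEnd).parseString(text)
    failed_with_string_end = False
except ParseException as err:
    failed_with_string_end = True
    print("stopped at line", err.lineno, "col", err.col)

# parseAll=True is a shorthand for the same requirement.
try:
    numbers.parseString(text, parseAll=True)
    failed_with_parse_all = False
except ParseException:
    failed_with_parse_all = True
```

The location reported in the exception is a convenient starting point for the kind of divide-and-conquer search described next.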

So what is the matter with the METAL2 entries? After using brute
force "divide and conquer" (I deleted half of the entries and got a
successful parse, then restored half of the entries I removed, until I
added back the entry that caused the parse to fail), I found these
lines in the input:

fatTblThreshold = (0,0.39,10.005)
fatTblParallelLength = (0,1,0)

Both of these violate the atflist definition, because they contain
integers, not just floatnums. So we need to expand the definition of
atflist:

floatnum = Combine(Word(nums) + "." + Word(nums) +
                   Optional('e'+oneOf("+ -")+Word(nums)))
floatnum.setParseAction(lambda t:float(t[0]))
integer = Word(nums).setParseAction(lambda t:int(t[0]))
atflist = Suppress("(") + delimitedList(floatnum|integer) + \
          Suppress(")")

Then we need to tackle the issue of adding nesting for those entries
that have sub-keys. This is actually kind of tricky for your data
example, because nesting within Dict expects input data to be nested.
That is, nesting Dict's is normally done with data that is input like:

main
  Technology
  Layer
    PRBOUNDARY
    METAL2
  Tile
    unit

But your data is structured slightly differently:

main
  Technology
  Layer PRBOUNDARY
  Layer METAL2
  Tile unit

Because Layer is repeated, the second entry creates a new node named
"Layer" at the second level, and the first "Layer" entry is lost. To
fix this, we need to combine Layer and the layer id into a composite
type of key. I did this by using Group, and adding the Optional alias
(which I see now is a poor name, "layerId" would be better) as a
second element of the key:

mainDict = dictOf(
    Group(Word(alphas)+Optional(quotedString)),
    Suppress("{") + attrDict + Suppress("}")
    )

But now if we parse the input with this mainDict, we see that the keys
are no longer nice simple strings, but they are 1- or 2-element
ParseResults objects. Here is what I get from the command "print
md.keys()":

[(['Technology'], {}), (['Tile', 'unit'], {}), (['Layer',
'PRBOUNDARY'], {}), (['Layer', 'METAL2'], {})]

So to finally clear this up, we need one more parse action, attached
to the mainDict expression, that rearranges the subdicts using the
elements in the keys. The parse action looks like this, and it will
process the overall parse results for the entire data structure:

def rearrangeSubDicts(toks):
    # iterate over all key-value pairs in the dict
    for key,value in toks.items():
        # key is of the form ['name'] or ['name', 'name2']
        # and the value is the attrDict

        # if key has just one element, use it to define
        # a simple string key
        if len(key)==1:
            toks[key[0]] = value
        else:
            # if the key has two elements, create a
            # subnode with the first element
            if key[0] not in toks:
                toks[key[0]] = ParseResults([])

            # add an entry for the second key element
            toks[key[0]][key[1]] = value

        # now delete the original key that is the form
        # ['name'] or ['name', 'name2']
        del toks[key]

It looks a bit messy, but the point is to modify the tokens in place,
by rearranging the attrdicts to nodes with simple string keys, instead
of keys nested in structures.

Lastly, we attach the parse action in the usual way:

mainDict.setParseAction(rearrangeSubDicts)

Now you can access the fields of the different layers as:

print md.Layer.METAL2.lineStyle

I guess this all looks pretty convoluted. You might be better off
just doing your own Group'ing, and then navigating the nested lists to
build your own dict or other data structure.
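One possible shape for that do-it-yourself alternative is sketched below in Python 3, assuming a pyparsing release that still exposes the camelCase names. The names `number`, `qstring`, `numlist`, `section`, `grammar` and `to_dict`, and the shortened `sample` input, are all hypothetical; this is only one way to arrange the Groups, not the approach used earlier in the thread.

```python
from pyparsing import (Combine, Group, Literal, Optional, Suppress, Word,
                       ZeroOrMore, alphas, delimitedList, nums, oneOf,
                       quotedString, removeQuotes)

floatnum = Combine(Word(nums) + "." + Word(nums) +
                   Optional(Literal("e") + oneOf("+ -") + Word(nums)))
floatnum.setParseAction(lambda t: float(t[0]))
integer = Word(nums).setParseAction(lambda t: int(t[0]))
number = floatnum | integer
qstring = quotedString.copy().setParseAction(removeQuotes)
numlist = Group(Suppress("(") + delimitedList(number) + Suppress(")"))

# Each attribute is a [key, value] group; each section is a
# [kind, [optional label], [attributes...]] group.
attr = Group(Word(alphas) + Suppress("=") + (number | qstring | numlist))
section = Group(Word(alphas) + Group(Optional(qstring)) +
                Suppress("{") + Group(ZeroOrMore(attr)) + Suppress("}"))
grammar = ZeroOrMore(section)

def to_dict(text):
    """Build plain nested dicts: result[kind] or result[kind][label]."""
    result = {}
    for kind, label, attrs in grammar.parseString(text, parseAll=True):
        values = {}
        for key, value in attrs:
            # Lists arrive as nested ParseResults; store them as plain lists.
            if isinstance(value, (int, float, str)):
                values[key] = value
            else:
                values[key] = value.asList()
        if len(label):
            result.setdefault(kind, {})[label[0]] = values
        else:
            result[kind] = values
    return result

sample = '''
Technology {
name = "gtc"
dielectric = 2.75e-05
timePrecision = 1000
}
Layer "PRBOUNDARY" {
layerNumber = 0
}
Layer "METAL2" {
layerNumber = 36
fatTblThreshold = (0,0.39,10.005)
}
'''

tech = to_dict(sample)
print(tech["Layer"]["METAL2"]["fatTblThreshold"])
```

Because the result is an ordinary dict, repeated "Layer" sections no longer overwrite each other, at the cost of giving up attribute-style access such as md.Layer.METAL2.lineStyle.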

-- Paul
 
Arnaud Delobelle
 
      03-23-2008
On Mar 22, 9:11 pm, rh0dium <steven.kl...@gmail.com> wrote:
> Hi all,


Hi,

> I am struggling with parsing the following data:
>
> test1 = """
> Technology {
> name = "gtc"
> dielectric = 2.75e-05

[...]

I know it's cheating, but the grammar of your example is actually
quite simple and the values are valid python expressions, so here is a
solution without pyparsing (or regexps, for that matter). *WARNING*
it uses the exec statement.

from textwrap import dedent

def parse(txt):
    globs, parsed = {}, {}
    units = txt.strip().split('}')[:-1]
    for unit in units:
        label, params = unit.split('{')
        paramdict = {}
        exec dedent(params) in globs, paramdict
        try:
            label, key = label.split()
            parsed.setdefault(label, {})[eval(key)] = paramdict
        except ValueError:
            parsed[label.strip()] = paramdict
    return parsed

>>> p = parse(test1)
>>> p['Layer']['PRBOUNDARY']

{'maskName': '', 'defaultWidth': 0, 'color': 'cyan', 'pattern':
'blank', 'layerNumber': 0, 'minSpacing': 0, 'blink': 0, 'minWidth': 0,
'visible': 1, 'pitch': 0, 'selectable': 1, 'lineStyle': 'solid'}
>>> p['Layer']['METAL2']['maskName']

'metal2'
>>> p['Technology']['gridResolution']

5
>>>


HTH

--
Arnaud
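For readers on Python 3, where exec is a function rather than a statement, the same brace-splitting idea can be sketched with ast.literal_eval, which only accepts literals and so avoids executing the input. This is a variation on the approach above rather than the code as posted; the line-joining logic for multi-line tuples and the `sample` text are assumptions added for illustration. Like the original, it would be confused by braces or unbalanced parentheses inside quoted strings.

```python
import ast

def parse(txt):
    """Split on braces, then evaluate each 'key = value' with literal_eval."""
    parsed = {}
    for unit in txt.strip().split('}')[:-1]:
        header, body = unit.split('{')
        params, pending = {}, ''
        for line in body.strip().splitlines():
            pending += line.strip()
            if pending.count('(') > pending.count(')'):
                continue  # a tuple is still open; keep accumulating lines
            if pending:
                key, _, value = pending.partition('=')
                params[key.strip()] = ast.literal_eval(value.strip())
            pending = ''
        parts = header.split()
        if len(parts) == 2:
            # e.g. Layer "METAL2" -> parsed['Layer']['METAL2']
            parsed.setdefault(parts[0], {})[ast.literal_eval(parts[1])] = params
        else:
            parsed[parts[0]] = params
    return parsed

sample = '''
Technology {
name = "gtc"
gridResolution = 5
}

Layer "METAL2" {
maskName = "metal2"
pitch = 0.46
fatTblSpacing = (0.21,0.24,0.6,
0.24,0.24,0.6,
0.6,0.6,0.6)
}
'''

p = parse(sample)
print(p['Layer']['METAL2']['maskName'])
print(p['Technology']['gridResolution'])
```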

 
rh0dium
 
      03-23-2008
On Mar 23, 12:26 am, Paul McGuire <pt...@austin.rr.com> wrote:
> <snip>


Hi Paul,

Before I continue I must thank you for your help. You really did
do an outstanding job on this code, and it is really straightforward
to use and learn from. This was a fun weekend task and I really
wanted to use pyparsing to do it, because this is one of several types
of files I want to parse. I (as I'm sure you would agree) think
rearrangeSubDicts is a bit of a hack, but nevertheless absolutely
required due to the limitations of the data I am parsing. Once
again, thanks for your great help. Now the problem..

I attempted to use this code on another testcase. This testcase had
tabs in it. I think 1.4.11 is missing the expandtabs attribute. I
ran my code (which had tabs) and I got this..

AttributeError: 'builtin_function_or_method' object has no attribute
'expandtabs'

Uh oh. Is this a pyparsing problem, or am I just an idiot?

Thanks again!

 
rh0dium
 
      03-23-2008
On Mar 23, 1:48 pm, rh0dium <steven.kl...@gmail.com> wrote:
> <snip>


Doh! Never mind, I am an idiot. I got it - what a bonehead..

I needed to tweak it a bit to ignore the comments. Namely, this fixed
it up:

mainDict = dictOf(
    Group(Word(alphas)+Optional(quotedString)),
    Suppress("{") + attrDict + Suppress("}")
    ) | cStyleComment.suppress()

Thanks again. Now I just need to figure out how to use your dicts to
do some work..

 
Francesco Bochicchio
 
      03-24-2008
On Sat, 22 Mar 2008 14:11:16 -0700, rh0dium wrote:

> Hi all,
>
> I am struggling with parsing the following data:
>
> <snip>


Did you think of using something a bit more sophisticated than pyparsing?
I have had a good experience using ply, a pure-Python implementation
of the yacc/lex tools, which I used to extract significant data from C
programs to automate documentation.

I had never used yacc or similar tools before, but having a bit of experience
with BNF notation, I found ply easy enough. In my case, the major problem
was coping with yacc's limitations in describing C syntax (which I solved
by "relaxing" the rules a bit, since I was going to process only
already-compiled C code). In your much simpler case, I'd say that a few
production rules should be enough.

P.S.: there are other, faster and maybe more complete Python parsers, but
as I said ply is pure Python: no external libraries, and it runs everywhere.

Ciao
-------
FB
 
Paul McGuire
 
      03-24-2008
On Mar 23, 4:04 pm, rh0dium <steven.kl...@gmail.com> wrote:
>
> I needed to tweak it a bit to ignore the comments.. Namely this fixed
> it up..
>
> mainDict = dictOf(
>     Group(Word(alphas)+Optional(quotedString)),
>     Suppress("{") + attrDict + Suppress("}")
>     ) | cStyleComment.suppress()
>
> Thanks again. Now I just need to figure out how to use your dicts to
> do some work..


I'm glad this is coming around to some reasonable degree of completion
for you. One last thought - your handling of comments is a bit crude,
and will not handle comments that crop up in the middle of dict
entries, as in:

color = /* using non-standard color during testing */
"plum"

The more comprehensive way to handle comments is to call ignore.
Using ignore will propagate the comment handling to all embedded
expressions, so you only need to call ignore once on the top-most
pyparsing expression, as in:

mainDict.ignore(cStyleComment)

Also, ignore does token suppression automatically.
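A small Python 3 sketch of the ignore approach follows, assuming a pyparsing release that still provides the camelCase names. The cut-down `attrDict` (quoted-string values only) and the `text` sample are illustrative stand-ins for the full grammar.

```python
from pyparsing import (Suppress, Word, alphas, cStyleComment, dictOf,
                       quotedString, removeQuotes)

value = quotedString.copy().setParseAction(removeQuotes)
attrDict = dictOf(Word(alphas), Suppress("=") + value)

# One call on the outermost expression; the comment handling is
# propagated to the expressions nested inside it.
attrDict.ignore(cStyleComment)

text = '''
/* palette used during testing */
color = /* using non-standard color during testing */
"plum"
lineStyle = "solid"
'''

attrs = attrDict.parseString(text, parseAll=True)
print(attrs.asList())
```

Both the stand-alone comment and the one sitting between the "=" and its value are skipped, and neither shows up in the parsed tokens.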

-- Paul
 