Paul McGuire
On Mar 22, 4:11 pm, rh0dium <steven.kl...@gmail.com> wrote:
> Hi all,
>
> I am struggling with parsing the following data:
>
> <snip>
>
> As a side note: Is this the right approach to using pyparsing. Do we
> start from the inside and work our way out or should I have started
> with looking at the bigger picture ( keyword + "{" + OneOrMore key /
> vals + "}" + ) I started there but could figure out how to look
> multiline - I'm assuming I'd just join them all up?
>
> Thanks

I think your "inside-out" approach is just fine. Start by composing
expressions for the different "pieces" of your input text, then
steadily build up more and more complex forms.

I think the main complication you have is that of using
commaSeparatedList for your list of real numbers. commaSeparatedList
is a very generic helper expression. From the online example
(http://pyparsing.wikispaces.com/space/showimage/commasep.py), here is
a sample of the data that commaSeparatedList will handle:

    "a,b,c,100.2,,3",
    "d, e, j k , m ",
    "'Hello, World', f, g , , 5.1,x",
    "John Doe, 123 Main St., Cleveland, Ohio",
    "Jane Doe, 456 St. James St., Los Angeles , California ",

In other words, the content of the items between commas is pretty much
anything that is *not* a comma. If you change your definition of
atflist to:

    atflist = Suppress("(") + commaSeparatedList # + Suppress(")")

(that is, comment out the trailing right paren), you'll get this
successful parse result:

    ['0.21', '0.24', '0.6', '0.24', '0.24', '0.6)']

In your example, you are parsing a list of floating point numbers, in
a list delimited by commas, surrounded by parens. This definition of
atflist should give you more control over the parsing process, and
give you real floats to boot:

    floatnum = Combine(Word(nums) + "." + Word(nums) +
        Optional('e'+oneOf("+ -")+Word(nums)))
    floatnum.setParseAction(lambda t:float(t[0]))
    atflist = Suppress("(") + delimitedList(floatnum) + Suppress(")")

Now I get this output for your parse test:

    [0.20999999999999999, 0.23999999999999999, 0.59999999999999998,
    0.23999999999999999, 0.23999999999999999, 0.59999999999999998]

So you can see that this has actually parsed the numbers and converted
them to floats.

I went ahead and added support for scientific notation in floatnum,
since I see that you have several atfvalues that are standalone
floats, some using scientific notation. To add these, just expand
atfvalues to:

    atfvalues = ( floatnum | Word(nums) | atfstr | atflist )

(At this point, I'll go on to show how to parse the rest of the data
structure - if you want to take a stab at it yourself, stop reading
here, and then come back to compare your results with my approach.)

To parse the overall structure, now that you have expressions for the
different component pieces, look into using Dict (or more simply using
the helper function dictOf) to define results names automagically for
you based on the attribute names in the input. Dict does *not* change
any of the parsing or matching logic, it just adds named fields in the
parsed results corresponding to the key names found in the input.

Dict is a complex pyparsing class, but dictOf simplifies things.
dictOf takes two arguments:

    dictOf(keyExpression, valueExpression)

This translates to:

    Dict( OneOrMore( Group(keyExpression + valueExpression) ) )

For example, to parse the lists of entries that look like:

    name = "gtc"
    dielectric = 2.75e-05
    unitTimeName = "ns"
    timePrecision = 1000
    unitLengthName = "micron"
    etc.

just define that this is "a dict of entries each composed of a key
consisting of a Word(alphas), followed by a suppressed '=' sign and an
atfvalues", that is:

    attrDict = dictOf(Word(alphas), Suppress("=") + atfvalues)

dictOf takes care of all of the repetition and grouping necessary for
Dict to do its work. These attribute dicts are nested within an outer
main dict, which is "a dict of entries, each with a key of
Word(alphas), and a value of an optional quotedString (an alias,
perhaps?), a left brace, an attrDict, and a right brace," or:

    mainDict = dictOf(
        Word(alphas),
        Optional(quotedString)("alias") +
            Suppress("{") + attrDict + Suppress("}")
        )

By adding this code to what you already have:

    attrDict = dictOf(Word(alphas), Suppress("=") + atfvalues)
    mainDict = dictOf(
        Word(alphas),
        Optional(quotedString)("alias") +
            Suppress("{") + attrDict + Suppress("}")
        )

You can now write:

    md = mainDict.parseString(test1)
    print(md.dump())
    print(md.Layer.lineStyle)

and get this output:

    [['Technology', ['name', 'gtc'],
      ['dielectric', 2.7500000000000001e-005], ['unitTimeName', 'ns'],
      ['timePrecision', '1000'], ['unitLengthName', 'micron'],
      ['lengthPrecision', '1000'], ['gridResolution', '5'],
      ['unitVoltageName', 'v'], ['voltagePrecision', '1000000'],
      ['unitCurrentName', 'ma'], ['currentPrecision', '1000'],
      ['unitPowerName', 'pw'], ['powerPrecision', '1000'],
      ['unitResistanceName', 'kohm'], ['resistancePrecision', '10000000'],
      ['unitCapacitanceName', 'pf'], ['capacitancePrecision', '10000000'],
      ['unitInductanceName', 'nh'], ['inductancePrecision', '100']],
     ['Tile', 'unit', ['width', 0.22], ['height', 1.6899999999999999]],
     ['Layer', 'PRBOUNDARY', ['layerNumber', '0'], ['maskName', ''],
      ['visible', '1'], ['selectable', '1'], ['blink', '0'],
      ['color', 'cyan'], ['lineStyle', 'solid'], ['pattern', 'blank'],
      ['pitch', '0'], ['defaultWidth', '0'], ['minWidth', '0'],
      ['minSpacing', '0']]]
    - Layer: ['PRBOUNDARY', ['layerNumber', '0'], ['maskName', ''],
      ['visible', '1'], ['selectable', '1'], ['blink', '0'],
      ['color', 'cyan'], ['lineStyle', 'solid'], ['pattern', 'blank'],
      ['pitch', '0'], ['defaultWidth', '0'], ['minWidth', '0'],
      ['minSpacing', '0']]
      - alias: PRBOUNDARY
      - blink: 0
      - color: cyan
      - defaultWidth: 0
      - layerNumber: 0
      - lineStyle: solid
      - maskName:
      - minSpacing: 0
      - minWidth: 0
      - pattern: blank
      - pitch: 0
      - selectable: 1
      - visible: 1
    - Technology: [['name', 'gtc'],
      ['dielectric', 2.7500000000000001e-005], ['unitTimeName', 'ns'],
      ['timePrecision', '1000'], ['unitLengthName', 'micron'],
      ['lengthPrecision', '1000'], ['gridResolution', '5'],
      ['unitVoltageName', 'v'], ['voltagePrecision', '1000000'],
      ['unitCurrentName', 'ma'], ['currentPrecision', '1000'],
      ['unitPowerName', 'pw'], ['powerPrecision', '1000'],
      ['unitResistanceName', 'kohm'], ['resistancePrecision', '10000000'],
      ['unitCapacitanceName', 'pf'], ['capacitancePrecision', '10000000'],
      ['unitInductanceName', 'nh'], ['inductancePrecision', '100']]
      - capacitancePrecision: 10000000
      - currentPrecision: 1000
      - dielectric: 2.75e-005
      - gridResolution: 5
      - inductancePrecision: 100
      - lengthPrecision: 1000
      - name: gtc
      - powerPrecision: 1000
      - resistancePrecision: 10000000
      - timePrecision: 1000
      - unitCapacitanceName: pf
      - unitCurrentName: ma
      - unitInductanceName: nh
      - unitLengthName: micron
      - unitPowerName: pw
      - unitResistanceName: kohm
      - unitTimeName: ns
      - unitVoltageName: v
      - voltagePrecision: 1000000
    - Tile: ['unit', ['width', 0.22], ['height', 1.6899999999999999]]
      - alias: unit
      - height: 1.69
      - width: 0.22
    solid

Cheers!
-- Paul
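For readers without pyparsing at hand, the difference between the generic comma split and a number-specific item expression can be mimicked with the standard library. This is only a plain-Python analogue: the names FLOAT and parse_float_list are made up for this sketch, and the `re` module stands in for pyparsing, so it does not show what pyparsing does internally.

```python
import re

# Same shape as floatnum in the post: digits "." digits, optional
# exponent with a mandatory sign.
FLOAT = re.compile(r"\d+\.\d+(?:e[+-]\d+)?")

def parse_float_list(text):
    """Parse '(f, f, ...)' into a list of floats, or raise ValueError."""
    body = text.strip()
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("expected a parenthesised list: %r" % text)
    items = [item.strip() for item in body[1:-1].split(",")]
    for item in items:
        if not FLOAT.fullmatch(item):
            raise ValueError("not a float: %r" % item)
    return [float(item) for item in items]

sample = "(0.21,0.24,0.6,0.24,0.24,0.6)"

# Generic approach: an item is "anything that is not a comma", so the
# closing paren ends up glued to the last item.
generic = sample.lstrip("(").split(",")
print(generic)

# Typed approach: every item must look like a float, and is converted.
print(parse_float_list(sample))
```

The typed version rejects a list such as "(0,1,0)" outright, because a bare integer does not fit the float pattern.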
Paul McGuire
Oof, I see that you have multiple "Layer" entries, with different
qualifying labels. Since the dicts use "Layer" as the key, you only
get the last "Layer" value, with qualifier "PRBOUNDARY", and lose the
"Layer" for "METAL2". To fix this, you'll have to move the optional
alias term to the key, and merge "Layer" and "PRBOUNDARY" into a
single key, perhaps "Layer/PRBOUNDARY" or "Layer(PRBOUNDARY)" - a
parse action should take care of this for you. Unfortunately, these
forms will not allow you to use object attribute form
(md.Layer.lineStyle), you will have to use dict access form
(md["Layer(PRBOUNDARY)"].lineStyle), since these keys have characters
that are not valid attribute name characters.

Or you could add one more level of Dict nesting to your grammar, to
permit access like "md.Layer.PRBOUNDARY.lineStyle".

-- Paul
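The key collision and the two suggested workarounds can be seen with ordinary Python dicts. A minimal sketch, in which plain dicts stand in for pyparsing's Dict results and the METAL2 attribute value is made up for illustration:

```python
entries = [
    ("Layer", "PRBOUNDARY", {"lineStyle": "solid"}),
    ("Layer", "METAL2", {"lineStyle": "dashed"}),  # made-up value
]

# Keyed on the first word only: each "Layer" overwrites the previous
# one, so only the entry stored last survives.
by_kind = {}
for kind, label, attrs in entries:
    by_kind[kind] = attrs
print(list(by_kind))

# Workaround 1: merge kind and label into a single composite key.
composite = {"%s(%s)" % (kind, label): attrs
             for kind, label, attrs in entries}
print(sorted(composite))
# Such keys are not valid identifiers, hence no attribute-style access.
print("Layer(PRBOUNDARY)".isidentifier())

# Workaround 2: one more level of nesting, kind -> label -> attrs.
nested = {}
for kind, label, attrs in entries:
    nested.setdefault(kind, {})[label] = attrs
print(nested["Layer"]["PRBOUNDARY"]["lineStyle"])
```

The nested form keeps both layers and leaves every key a plain name, which is what makes the md.Layer.PRBOUNDARY.lineStyle style of access possible.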
rh0dium
On Mar 22, 6:30 pm, Paul McGuire <pt...@austin.rr.com> wrote:
> [...]
> Or you could add one more level of Dict nesting to your grammar, to
> permit access like "md.Layer.PRBOUNDARY.lineStyle".
>
> -- Paul

OK - Well, I got as far as you did but I did it a bit differently..
Then I merged some of your data with my data. But now I am at the
point of adding another level of the dict and am struggling.. Here is
what I have..

    # parse actions
    LPAR = Literal("(")
    RPAR = Literal(")")
    LBRACE = Literal("{")
    RBRACE = Literal("}")
    EQUAL = Literal("=")

    # This will get the values all figured out..
    # "metal2" 1 6.05E-05 30
    cvtInt = lambda toks: int(toks[0])
    cvtReal = lambda toks: float(toks[0])
    integer = Combine(Optional(oneOf("+ -")) + Word(nums))\
        .setParseAction( cvtInt )
    real = Combine(Optional(oneOf("+ -")) + Word(nums) + "." +
        Optional(Word(nums)) +
        Optional(oneOf("e E")+Optional(oneOf("+ -")) +Word(nums)))\
        .setParseAction( cvtReal )
    atfstr = quotedString.setParseAction(removeQuotes)
    atflist = Group( LPAR.suppress() +
        delimitedList(real, ",") +
        RPAR.suppress() )
    atfvalues = ( real | integer | atfstr | atflist )

    # Now this should work out a single line inside a section
    #     maskName = "metal2"
    #     isDefaultLayer = 1
    #     visible = 1
    #     fatTblSpacing = (0.21,0.24,0.6,
    #                      0.6,0.6,0.6)
    #     minArea = 0.144
    atfkeys = Word(alphanums)
    attrDict = dictOf( atfkeys , EQUAL.suppress() + atfvalues)

    # Now we need to take care of the "Metal2" { one or more attrDict }
    # "METAL2" {
    #     layerNumber = 36
    #     maskName = "metal2"
    #     isDefaultLayer = 1
    #     visible = 1
    #     fatTblSpacing = (0.21,0.24,0.6,
    #                      0.24,0.24,0.6,
    #                      0.6,0.6,0.6)
    #     minArea = 0.144
    # }
    attrType = dictOf(atfstr, LBRACE.suppress() + attrDict +
        RBRACE.suppress())

    # Lastly we need to get the ones without attributes (Technology)
    attrType2 = LBRACE.suppress() + attrDict + RBRACE.suppress()
    mainDict = dictOf(atfkeys, attrType2 | attrType )
    md = mainDict.parseString(test1)

But I too am only getting the last layer. I thought if I broke out the
"alias" area and then built on that I'd be set, but I did something
wrong.
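One detail of this grammar is worth spelling out: pyparsing's `|` operator takes the first alternative that matches, so `real` has to be listed before `integer` in atfvalues (as it is above). Alternation in Python's `re` module behaves the same way, which allows a quick standard-library illustration. The patterns below are simplified stand-ins written for this sketch, not the pyparsing expressions themselves.

```python
import re

INTEGER = r"[+-]?\d+"
REAL = r"[+-]?\d+\.\d*(?:[eE][+-]?\d+)?"

def first_match(pattern, text):
    """Return the text consumed by a match anchored at position 0."""
    m = re.match(pattern, text)
    return m.group(0) if m else None

integer_first = "(?:%s)|(?:%s)" % (INTEGER, REAL)
real_first = "(?:%s)|(?:%s)" % (REAL, INTEGER)

# integer first: only the leading digits match, the rest is left behind
print(first_match(integer_first, "6.05E-05"))
# real first: the whole number is consumed
print(first_match(real_first, "6.05E-05"))
# a plain integer still falls through to the second alternative
print(first_match(real_first, "30"))
```

With the wrong order the leftover ".05E-05" would make the surrounding entry fail, so ordering alternatives from most to least specific is the safer habit.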
Paul McGuire
There are a couple of bugs in our program so far.

First of all, our grammar isn't parsing the METAL2 entry at all. We
should change this line:

    md = mainDict.parseString(test1)

to

    md = (mainDict+stringEnd).parseString(test1)

The parser is reading as far as it can, but then stopping once
successful parsing is no longer possible. Since there is at least one
valid entry matching the OneOrMore expression, then parseString raises
no errors. By adding "+stringEnd" to our expression to be parsed, we
are saying "once parsing is finished, we should be at the end of the
input string". By making this change, we now get this parse
exception:

    pyparsing.ParseException: Expected stringEnd (at char 194 col:1)

So what is the matter with the METAL2 entries? After using brute
force "divide and conquer" (I deleted half of the entries and got a
successful parse, then restored half of the entries I removed, until I
added back the entry that caused the parse to fail), I found these
lines in the input:

    fatTblThreshold                 = (0,0.39,10.005)
    fatTblParallelLength            = (0,1,0)

Both of these violate the atflist definition, because they contain
integers, not just floatnums. So we need to expand the definition of
atflist:

    floatnum = Combine(Word(nums) + "." + Word(nums) +
        Optional('e'+oneOf("+ -")+Word(nums)))
    floatnum.setParseAction(lambda t:float(t[0]))
    integer = Word(nums).setParseAction(lambda t:int(t[0]))
    atflist = Suppress("(") + delimitedList(floatnum|integer) + \
                Suppress(")")

Then we need to tackle the issue of adding nesting for those entries
that have sub-keys. This is actually kind of tricky for your data
example, because nesting within Dict expects input data to be nested.
That is, nesting Dict's is normally done with data that is input like:

    main
      Technology
      Layer
        PRBOUNDARY
        METAL2
      Tile
        unit

But your data is structured slightly differently:

    main
      Technology
      Layer PRBOUNDARY
      Layer METAL2
      Tile unit

Because Layer is repeated, the second entry creates a new node named
"Layer" at the second level, and the first "Layer" entry is lost.
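Going back to the first bug for a moment: the silent partial parse has a close standard-library analogue. `re.match` also succeeds as soon as a prefix of the input matches, while `re.fullmatch` insists on reaching the end of the input, much like appending stringEnd. A small sketch with a made-up entry pattern (ENTRY is a toy stand-in and not part of the grammar in this thread):

```python
import re

# One "name = integer" entry per line; a toy stand-in for mainDict.
ENTRY = r"(?:\w+ *= *\d+\n)+"

good = "pitch = 0\nminWidth = 0\n"
bad = "pitch = 0\nminWidth = oops\n"

# Prefix matching stops quietly after the last entry that worked.
prefix = re.match(ENTRY, bad)
print(repr(prefix.group(0)))

# Requiring the end of the input turns the silent stop into a failure.
print(re.fullmatch(ENTRY, bad))
print(re.fullmatch(ENTRY, good) is not None)
```

Later pyparsing releases also accept `parseString(text, parseAll=True)`, which has the same effect as adding stringEnd to the expression.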
To fix the repeated "Layer" key, we need to combine Layer and the
layer id into a composite type of key. I did this by using Group, and
adding the Optional alias (which I see now is a poor name, "layerId"
would be better) as a second element of the key:

    mainDict = dictOf(
        Group(Word(alphas)+Optional(quotedString)),
        Suppress("{") + attrDict + Suppress("}")
        )

But now if we parse the input with this mainDict, we see that the keys
are no longer nice simple strings, but they are 1- or 2-element
ParseResults objects. Here is what I get from the command
"print(md.keys())":

    [(['Technology'], {}), (['Tile', 'unit'], {}),
     (['Layer', 'PRBOUNDARY'], {}), (['Layer', 'METAL2'], {})]

So to finally clear this up, we need one more parse action, attached
to the mainDict expression, that rearranges the subdicts using the
elements in the keys. The parse action looks like this, and it will
process the overall parse results for the entire data structure:

    def rearrangeSubDicts(toks):
        # iterate over all key-value pairs in the dict; take a
        # snapshot with list(), since toks is modified in the loop
        for key,value in list(toks.items()):
            # key is of the form ['name'] or ['name', 'name2']
            # and the value is the attrDict

            # if key has just one element, use it to define
            # a simple string key
            if len(key)==1:
                toks[key[0]] = value
            else:
                # if the key has two elements, create a
                # subnode with the first element
                if key[0] not in toks:
                    toks[key[0]] = ParseResults([])

                # add an entry for the second key element
                toks[key[0]][key[1]] = value

            # now delete the original key that is the form
            # ['name'] or ['name', 'name2']
            del toks[key]

It looks a bit messy, but the point is to modify the tokens in place,
by rearranging the attrdicts to nodes with simple string keys, instead
of keys nested in structures.

Lastly, we attach the parse action in the usual way:

    mainDict.setParseAction(rearrangeSubDicts)

Now you can access the fields of the different layers as:

    print(md.Layer.METAL2.lineStyle)

I guess this all looks pretty convoluted. You might be better off
just doing your own Group'ing, and then navigating the nested lists to
build your own dict or other data structure.

-- Paul
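The closing suggestion, building your own data structure from grouped results, might look roughly like the sketch below. It assumes the parser has already produced (key, value) pairs whose keys are one- or two-element lists, as in the md.keys() output above; nest_by_key and the sample pairs are hypothetical names and data for this illustration, with plain lists and dicts standing in for ParseResults.

```python
def nest_by_key(pairs):
    """Regroup (key_parts, value) pairs into one- or two-level dicts."""
    nested = {}
    for key, value in pairs:
        if len(key) == 1:
            # ['Technology'] -> nested['Technology']
            nested[key[0]] = value
        else:
            # ['Layer', 'METAL2'] -> nested['Layer']['METAL2']
            nested.setdefault(key[0], {})[key[1]] = value
    return nested

pairs = [
    (["Technology"], {"name": "gtc"}),
    (["Tile", "unit"], {"width": 0.22}),
    (["Layer", "PRBOUNDARY"], {"lineStyle": "solid"}),
    (["Layer", "METAL2"], {"maskName": "metal2"}),
]

result = nest_by_key(pairs)
print(sorted(result["Layer"]))
print(result["Layer"]["METAL2"]["maskName"])
```

Because a fresh dict is built rather than modifying the parse results in place, there is no need to delete the original composite keys afterwards.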
Arnaud Delobelle
On Mar 22, 9:11 pm, rh0dium <steven.kl...@gmail.com> wrote:
> Hi all,

Hi,

> I am struggling with parsing the following data:
>
> test1 = """
> Technology      {
>                 name                            = "gtc"
>                 dielectric                      = 2.75e-05
[...]

I know it's cheating, but the grammar of your example is actually
quite simple and the values are valid python expressions, so here is a
solution without pyparsing (or regexps, for that matter). *WARNING*
it uses exec.

    from textwrap import dedent

    def parse(txt):
        globs, parsed = {}, {}
        # each unit is 'label ["key"] { params', split at the closing brace
        units = txt.strip().split('}')[:-1]
        for unit in units:
            label, params = unit.split('{')
            paramdict = {}
            # run the 'name = value' lines as Python assignments;
            # this call form works in Python 2 and Python 3
            exec(dedent(params), globs, paramdict)
            try:
                # two-word header such as: Layer "METAL2"
                label, key = label.split()
                parsed.setdefault(label, {})[eval(key)] = paramdict
            except ValueError:
                # one-word header such as: Technology
                parsed[label.strip()] = paramdict
        return parsed

    >>> p = parse(test1)
    >>> p['Layer']['PRBOUNDARY']
    {'maskName': '', 'defaultWidth': 0, 'color': 'cyan', 'pattern':
    'blank', 'layerNumber': 0, 'minSpacing': 0, 'blink': 0, 'minWidth':
    0, 'visible': 1, 'pitch': 0, 'selectable': 1, 'lineStyle': 'solid'}
    >>> p['Layer']['METAL2']['maskName']
    'metal2'
    >>> p['Technology']['gridResolution']
    5
    >>>

HTH

-- Arnaud
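If the exec is a worry, the same idea can be kept while only accepting literal values. The following is a sketch of one possible variant, not Arnaud's code: it swaps exec/eval for `ast.parse` plus `ast.literal_eval`, and the SAMPLE text is a shortened, hand-written excerpt in the style of the data discussed in the thread.

```python
import ast
from textwrap import dedent

def parse(txt):
    """Parse 'Label ["key"] { name = literal ... }' blocks into dicts."""
    parsed = {}
    # NOTE: like the original, this breaks if a value contains '{' or '}'.
    for unit in txt.strip().split('}')[:-1]:
        header, params = unit.split('{')
        attrs = {}
        for stmt in ast.parse(dedent(params)).body:
            # accept only simple "name = literal" statements
            if not (isinstance(stmt, ast.Assign)
                    and len(stmt.targets) == 1
                    and isinstance(stmt.targets[0], ast.Name)):
                raise ValueError("unexpected statement in %r" % header.strip())
            # literal_eval raises ValueError for anything but literals
            attrs[stmt.targets[0].id] = ast.literal_eval(stmt.value)
        words = header.split()
        if len(words) == 2:
            kind, label = words
            parsed.setdefault(kind, {})[ast.literal_eval(label)] = attrs
        else:
            parsed[header.strip()] = attrs
    return parsed

SAMPLE = '''
Technology {
    name = "gtc"
    dielectric = 2.75e-05
}

Layer "METAL2" {
    maskName = "metal2"
    fatTblThreshold = (0,0.39,10.005)
}
'''

data = parse(SAMPLE)
print(data["Technology"])
print(data["Layer"]["METAL2"]["fatTblThreshold"])
```

The trade-off is that anything beyond literals (names, function calls, arithmetic on variables) is rejected with a ValueError instead of being executed.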
rh0dium
On Mar 23, 12:26 am, Paul McGuire <pt...@austin.rr.com> wrote:
> [...]
> I guess this all looks pretty convoluted. You might be better off
> just doing your own Group'ing, and then navigating the nested lists to
> build your own dict or other data structure.
>
> -- Paul

Hi Paul,

Before I continue this I must thank you for your help. You really did
do an outstanding job on this code and it is really straightforward
to use and learn from. This was a fun weekend task and I really
wanted to use pyparsing to do it, because this is one of several types
of files I want to parse. I (as I'm sure you would agree) think the
rearrangeSubDicts is a bit of a hack, but nevertheless absolutely
required due to the limitations of the data I am parsing. Once
again thanks for your great help. Now the problem..

I attempted to use this code on another testcase. This testcase had
tabs in it. I think 1.4.11 is missing the expandtabs attribute. I
ran my code (which had tabs) and I got this..

    AttributeError: 'builtin_function_or_method' object has no attribute
    'expandtabs'

Ugh oh. Is this a pyparsing problem or am I just an idiot..

Thanks again!
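The thread never states what actually caused this error, so the following is only a guess at one common way to produce exactly this message: handing a method object such as `f.read` (without the parentheses) to code that expects a string and calls expandtabs() on it. The function expand below is a hypothetical stand-in for such a library call, not pyparsing itself.

```python
import io

def expand(source):
    # Stand-in for a library that calls expandtabs() on its input string.
    return source.expandtabs()

f = io.StringIO("width\t= 0.22")

# A real string: tabs are expanded without complaint.
expanded = expand(f.read())
print(expanded)

f.seek(0)
try:
    # Missing "()": a built-in method object is passed, not a string.
    expand(f.read)
except AttributeError as exc:
    message = str(exc)
    print(message)
```

If the traceback points at the parse call, checking the type of the argument with type() is a quick way to confirm or rule out this explanation.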
rh0dium
On Mar 23, 1:48 pm, rh0dium <steven.kl...@gmail.com> wrote:
> [...]
> I attempted to use this code on another testcase. This testcase had
> tabs in it. I think 1.4.11 is missing the expandtabs attribute. I
> ran my code (which had tabs) and I got this..
>
> AttributeError: 'builtin_function_or_method' object has no attribute
> 'expandtabs'
>
> Ugh oh. Is this a pyparsing problem or am I just an idiot..
>
> Thanks again!

Doh!! Never mind, I am an idiot. Nope, I got it, what a bonehead..

I needed to tweak it a bit to ignore the comments.. Namely this fixed
it up..

    mainDict = dictOf(
            Group(Word(alphas)+Optional(quotedString)),
            Suppress("{") + attrDict + Suppress("}")
            ) | cStyleComment.suppress()

Thanks again. Now I just need to figure out how to use your dicts to
do some work..
Francesco Bochicchio
On Sat, 22 Mar 2008 14:11:16 -0700, rh0dium wrote:

> Hi all,
>
> I am struggling with parsing the following data:
>
> test1 = """
> Technology {
>     name = "gtc"
>     dielectric = 2.75e-05
>     unitTimeName = "ns"
>     timePrecision = 1000
>     unitLengthName = "micron"
>     lengthPrecision = 1000
>     gridResolution = 5
>     unitVoltageName = "v"
>     voltagePrecision = 1000000
>     unitCurrentName = "ma"
>     currentPrecision = 1000
>     unitPowerName = "pw"
>     powerPrecision = 1000
>     unitResistanceName = "kohm"
>     resistancePrecision = 10000000
>     unitCapacitanceName = "pf"
>     capacitancePrecision = 10000000
>     unitInductanceName = "nh"
>     inductancePrecision = 100
> }
>
> Tile "unit" {
>     width = 0.22
>     height = 1.69
> }
>

Did you think of using something a bit more sophisticated than
pyparsing?

I have had a good experience using ply, a pure-python implementation
of the yacc/lex tools, which I used to extract significant data from C
programs to automate documentation.

I had never used yacc or similar tools before, but having a bit of
experience with BNF notation, I found ply easy enough. In my case, the
major problem was coping with yacc's limitations in describing C syntax
(which I solved by "relaxing" the rules a bit, since I was going to
process only already-compiled C code). In your much simpler case, I'd
say that a few production rules should be enough.

P.S.: there are other, faster and maybe more complete python parsers,
but as I said ply is pure python: no external libraries and it runs
everywhere.

Ciao
-------
FB
Paul McGuire
On Mar 23, 4:04 pm, rh0dium <steven.kl...@gmail.com> wrote:
>
> I needed to tweak it a bit to ignore the comments.. Namely this fixed
> it up..
>
>     mainDict = dictOf(
>             Group(Word(alphas)+Optional(quotedString)),
>             Suppress("{") + attrDict + Suppress("}")
>             ) | cStyleComment.suppress()
>
> Thanks again. Now I just need to figure out how to use your dicts to
> do some work..

I'm glad this is coming around to some reasonable degree of completion
for you. One last thought - your handling of comments is a bit crude,
and will not handle comments that crop up in the middle of dict
entries, as in:

    color = /* using non-standard color
               during testing */
            "plum"

The more comprehensive way to handle comments is to call ignore.
Using ignore will propagate the comment handling to all embedded
expressions, so you only need to call ignore once on the top-most
pyparsing expression, as in:

    mainDict.ignore(cStyleComment)

Also, ignore does token suppression automatically.

-- Paul
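To see why "comments anywhere" matters, here is a rough standard-library analogue of what ignore achieves: comments are removed wherever they occur, including in the middle of an entry. This is a sketch only; C_COMMENT and strip_c_comments are names made up here, and a regular expression is a much blunter tool than pyparsing's ignore.

```python
import re

# Non-greedy match from "/*" to the nearest "*/", across line breaks.
C_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def strip_c_comments(text):
    """Remove /* ... */ comments wherever they occur."""
    # Caveat: unlike a real tokenizer, this would also remove
    # comment-like text that sits inside a quoted string.
    return C_COMMENT.sub("", text)

entry = '''color = /* using non-standard color
           during testing */
        "plum"'''

cleaned = strip_c_comments(entry)
print(cleaned.split())
```

Handling comments only as an alternative to a whole top-level entry, as in the `| cStyleComment.suppress()` version, would leave a mid-entry comment like this one in place and the entry would fail to parse.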