re.search (works)|(doesn't work) depending on for loop order

 
 
sgharvey
      03-22-2008
... and by works, I mean works like I expect it to.

I'm writing my own cheesy config.ini parser because ConfigParser
doesn't preserve case or order of sections, or order of options w/in
sections.

What's confusing me is this:
If I try matching every line to one pattern at a time, all the
patterns that are supposed to match, actually match.
If I try to match every pattern to one line at a time, only one
pattern will match.

What am I not understanding about re.search?

Doesn't match properly:
<code>
# Iterate through each pattern for each line
for line in lines:
    for pattern in patterns:
        # Match each pattern to the current line
        match = patterns[pattern].search(line)
        if match:
            print("%s: %s" % (pattern, str(match.groups())))
</code>

_Does_ match properly:
<code>
# Let's iterate through all the lines for each pattern
for pattern in patterns:
    for line in lines:
        # Match each pattern to the current line
        match = patterns[pattern].search(line)
        if match:
            print("%s: %s" % (pattern, str(match.groups())))
</code>

Related code:
The whole src
http://pastebin.com/f63298772
regexen and delimiters (imported into whole src)
http://pastebin.com/f485ac180

 
Marc 'BlackJack' Rintsch
      03-22-2008
On Sat, 22 Mar 2008 13:27:49 -0700, sgharvey wrote:

> ... and by works, I mean works like I expect it to.
>
> I'm writing my own cheesy config.ini parser because ConfigParser
> doesn't preserve case or order of sections, or order of options w/in
> sections.
>
> What's confusing me is this:
> If I try matching every line to one pattern at a time, all the
> patterns that are supposed to match, actually match.
> If I try to match every pattern to one line at a time, only one
> pattern will match.
>
> What am I not understanding about re.search?


That has nothing to do with `re.search` but with how files work. A file has a
"current position marker" that is advanced at each iteration to the next
line in the file. When it reaches the end, it stays there, so you can
iterate only *once* over an open file unless you rewind it with the `seek()`
method.

Rewinding only works on "seekable" files, and it's not a good idea anyway,
because the overhead of re-reading the file is usually greater than the
time needed to iterate over in-memory data like the patterns.
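
A minimal sketch of the behaviour described above. It assumes `io.StringIO` as a stand-in for an open text file, so the sample content and variable names are illustrative only:

```python
import io

# io.StringIO stands in for an open text file (an assumption for this demo).
ini_file = io.StringIO("[section]\nkey = value\n")

first_pass = [line for line in ini_file]    # consumes the file
second_pass = [line for line in ini_file]   # position is already at the end

ini_file.seek(0)                            # rewind to the start
third_pass = [line for line in ini_file]

print(len(first_pass), len(second_pass), len(third_pass))  # prints: 2 0 2
```

The second pass sees nothing because the position marker never moved back; only the explicit `seek(0)` makes the lines visible again.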

Ciao,
Marc 'BlackJack' Rintsch
 
John Machin
      03-22-2008
On Mar 23, 8:21 am, Marc 'BlackJack' Rintsch <(E-Mail Removed)> wrote:
> On Sat, 22 Mar 2008 13:27:49 -0700, sgharvey wrote:
> > ... and by works, I mean works like I expect it to.

>
> > I'm writing my own cheesy config.ini parser because ConfigParser
> > doesn't preserve case or order of sections, or order of options w/in
> > sections.

>
> > What's confusing me is this:
> > If I try matching every line to one pattern at a time, all the
> > patterns that are supposed to match, actually match.
> > If I try to match every pattern to one line at a time, only one
> > pattern will match.

>
> > What am I not understanding about re.search?

>
> That has nothing to do with `re.search` but with how files work. A file has a
> "current position marker" that is advanced at each iteration to the next
> line in the file. When it reaches the end, it stays there, so you can
> iterate only *once* over an open file unless you rewind it with the `seek()`
> method.
>
> Rewinding only works on "seekable" files, and it's not a good idea anyway,
> because the overhead of re-reading the file is usually greater than the
> time needed to iterate over in-memory data like the patterns.
>


Unless the OP has changed the pastebin code since you read it, that's
absolutely nothing to do with his problem -- his pastebin code slurps
in the whole .ini file using file.readlines; it is not iterating over
an open file.
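
For contrast, a small sketch of why `readlines` sidesteps the exhaustion issue. Again `io.StringIO` is only a stand-in for the real .ini file, and the content is made up:

```python
import io

# Stand-in for open("config.ini"); the content is hypothetical.
ini_file = io.StringIO("[section]\nkey = value\n")
lines = ini_file.readlines()    # a plain list of strings

# A list can be iterated any number of times.
passes = [len([line for line in lines]) for _ in range(3)]
print(passes)  # prints: [2, 2, 2]
```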
 
John Machin
      03-22-2008
On Mar 23, 7:27 am, sgharvey <(E-Mail Removed)> wrote:
> ... and by works, I mean works like I expect it to.


You haven't told us what you expect it to do. In any case, your
subject heading indicates that the problem is 99.999% likely to be in
your logic -- the converse would require the result of re.compile() to
retain some memory of what it's seen before *AND* to act differently
depending somehow on those memorised facts.

>
> I'm writing my own cheesy config.ini parser because ConfigParser
> doesn't preserve case or order of sections, or order of options w/in
> sections.
>
> What's confusing me is this:
> If I try matching every line to one pattern at a time, all the
> patterns that are supposed to match, actually match.
> If I try to match every pattern to one line at a time, only one
> pattern will match.
>
> What am I not understanding about re.search?


Its behaviour is not contingent on previous input.
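
A quick way to check this claim is to run both loop orders over the same data and compare what matched. The patterns and lines below are hypothetical, not the ones from the pastebin:

```python
import re

# Hypothetical patterns and sample lines, not the OP's pastebin regexes.
patterns = {
    "section": re.compile(r"^\[(.+)\]$"),
    "setting": re.compile(r"^(\w+)\s*=\s*(.*)$"),
}
lines = ["[general]", "name = demo", "[paths]", "home = /tmp"]

# Lines in the outer loop, patterns in the inner loop.
lines_outer = set()
for line in lines:
    for name, regex in patterns.items():
        if regex.search(line):
            lines_outer.add((name, line))

# Patterns in the outer loop, lines in the inner loop.
patterns_outer = set()
for name, regex in patterns.items():
    for line in lines:
        if regex.search(line):
            patterns_outer.add((name, line))

print(lines_outer == patterns_outer)  # prints: True
```

If the two sets ever differ in real code, the cause is in the surrounding logic (for example an exhausted iterator or a reused variable), not in the compiled regex.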

The following pseudocode is not very useful; the corrections I have
made below can be made only after reading the actual pastebin code :-(
... you are using the name "pattern" to refer both to a pattern name
(e.g. 'setting') and to a compiled regex.

> Doesn't match properly:
> <code>
> # Iterate through each pattern for each line
> for line in lines:
>     for pattern in patterns:


you mean: for pattern_name in pattern_names:

>         # Match each pattern to the current line
>         match = patterns[pattern].search(line)


you mean: match = compiled_regexes[pattern_name].search(line)

>         if match:
>             "%s: %s" % (pattern, str(match.groups()) )


you mean: print pattern_name, match.groups()
> </code>
>
> _Does_ match properly:
> <code>

[snip]

> </code>
>
> Related code:
> The whole src http://pastebin.com/f63298772


This can't be the code that you ran, because it won't even compile.
See comments in my update at http://pastebin.com/m77f0617a

By the way, you should be either (a) using *match* (not search) with a
\Z at the end of each pattern or (b) checking that there is not
extraneous guff at the end of the line ... otherwise a line like
"[blah] waffle" would be classified as a "section".

Have you considered leading/trailing/embedded spaces?
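
A small sketch of the difference between the two approaches. The section regexes here are assumptions for illustration, not the ones from the pastebin:

```python
import re

# Hypothetical section patterns for illustration.
loose = re.compile(r"\[(\w+)\]")          # intended for search()
strict = re.compile(r"\[(\w+)\]\s*\Z")    # intended for match(), anchored at the end

line = "[blah] waffle"
loose_hit = loose.search(line)            # matches: trailing guff is ignored
strict_hit = strict.match(line)           # None: "waffle" prevents \Z from matching
padded_hit = strict.match("[blah]   ")    # matches: trailing spaces are allowed

print(bool(loose_hit), bool(strict_hit), bool(padded_hit))  # prints: True False True
```

The `\s*` before `\Z` is one way to tolerate trailing spaces; leading spaces would need a `\s*` at the start of the pattern or a `strip()` on the line first.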

> regexen and delimiters (imported into whole src) http://pastebin.com/f485ac180


HTH,
John
 
Brian Lane
      03-22-2008
sgharvey wrote:
> ... and by works, I mean works like I expect it to.
>
> I'm writing my own cheesy config.ini parser because ConfigParser
> doesn't preserve case or order of sections, or order of options w/in
> sections.
>
> What's confusing me is this:
> If I try matching every line to one pattern at a time, all the
> patterns that are supposed to match, actually match.
> If I try to match every pattern to one line at a time, only one
> pattern will match.


I don't see that behavior when I try your code. I had to fix your
pattern loading:

patterns[pattern] = re.compile(pattern_strings[pattern], re.VERBOSE)

I would also recommend against using both the plural and singular
variable names; it's bound to cause confusion eventually.

I also changed contents to self.contents so that it would be accessible
outside the class.

The correct way to do it is run each pattern against each line. This
will maintain the order of the config.ini file. If you do it the other
way you will end up with everything ordered based on the patterns
instead of the file.
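
To make the ordering point concrete, here is a sketch with made-up patterns and lines (none of them taken from the pastebin) that collects matches under both loop orders:

```python
import re

# Hypothetical (name, regex) pairs, kept in a list so their order is explicit.
patterns = [
    ("section", re.compile(r"^\[(.+)\]$")),
    ("setting", re.compile(r"^(\w+)\s*=\s*(.*)$")),
]
lines = ["[general]", "name = demo", "[paths]", "home = /tmp"]

# Lines outer, patterns inner: results follow the file.
file_order = [line for line in lines
              for _name, regex in patterns if regex.search(line)]

# Patterns outer, lines inner: results are grouped by pattern.
pattern_order = [line for _name, regex in patterns
                 for line in lines if regex.search(line)]

print(file_order)     # ['[general]', 'name = demo', '[paths]', 'home = /tmp']
print(pattern_order)  # ['[general]', '[paths]', 'name = demo', 'home = /tmp']
```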

I tried it with Python2.5 on OSX from within TextMate and it ran as
expected.

Brian

 
Gabriel Genellina
      03-23-2008
On Sat, 22 Mar 2008 17:27:49 -0300, sgharvey <(E-Mail Removed)>
wrote:

> ... and by works, I mean works like I expect it to.
>
> I'm writing my own cheesy config.ini parser because ConfigParser
> doesn't preserve case or order of sections, or order of options w/in
> sections.


Take a look at ConfigObj http://pypi.python.org/pypi/ConfigObj/

Instead of:

# Remove the '\n's from the end of each line
lines = [line[0:line.__len__()-1] for line in lines]

line.__len__() is a crazy (and ugly) way of spelling len(line). The
comment is misleading; you say you remove '\n's but you don't actually
check for them. The last line in the file might not have a trailing \n.
See this:

lines = [line.rstrip('\n') for line in lines]

Usually trailing spaces are ignored too; so you end up writing:

lines = [line.rstrip() for line in lines]
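
A short comparison of the three spellings on sample data whose last line has no trailing newline (the sample lines are made up):

```python
# Hypothetical file content; note the last line has no trailing newline.
raw_lines = ["[general]\n", "name = demo  \n", "last = line"]

sliced = [line[0:len(line) - 1] for line in raw_lines]        # blindly drops one char
newline_stripped = [line.rstrip("\n") for line in raw_lines]  # drops only newlines
fully_stripped = [line.rstrip() for line in raw_lines]        # drops all trailing whitespace

print(sliced[-1])               # prints: last = lin
print(newline_stripped[-1])     # prints: last = line
print(repr(fully_stripped[1]))  # prints: 'name = demo'
```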

In this case:
# Compile the regexen
patterns = {}
for pattern in pattern_strings:
patterns.update(pattern: re.compile(pattern_strings[pattern],
re.VERBOSE))

That code does not even compile. I got lost with all those similar names;
try to choose meaningful ones. What about this:

patterns = {}
for name, regexpr in pattern_strings.items():
    patterns[name] = re.compile(regexpr, re.VERBOSE)

or even:

patterns = dict((name, re.compile(regexpr, re.VERBOSE))
                for name, regexpr in pattern_strings.items())

or even compile them directly when you define them.
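
Compiling at definition time might look like the sketch below. The two regexes are illustrative assumptions, not the contents of the OP's ini_regexen_dicts module:

```python
import re

# Hypothetical patterns, compiled where they are defined.
patterns = {
    "section": re.compile(r"""
        ^\[                 # opening bracket
        (?P<name>[^\]]+)    # section name
        \]\s*$              # closing bracket, optional trailing spaces
        """, re.VERBOSE),
    "comment": re.compile(r"""
        ^\s*[;\#]           # comment marker (escaped so VERBOSE keeps it)
        (?P<text>.*)$       # comment text, possibly empty
        """, re.VERBOSE),
}

section_name = patterns["section"].search("[General]").group("name")
comment_text = patterns["comment"].search("; hello").group("text")
print(section_name, repr(comment_text))  # prints: General ' hello'
```

This removes the separate `pattern_strings` dict and the compile loop altogether.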

I'm not sure you can process a config file in this unstructured way; looks
a lot easier if you look for [sections] and process sequentially lines
inside sections.
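
One possible shape for that sequential approach, as a rough sketch only. The function name `parse_ini` and the returned structure are assumptions; it keeps section and option order and preserves case, which was the original motivation:

```python
def parse_ini(lines):
    """Return [(section_name, [(key, value), ...]), ...] in file order."""
    sections = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line[0] in ";#":
            continue                          # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = (line[1:-1].strip(), [])
            sections.append(current)          # start a new section
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[1].append((key.strip(), value.strip()))
    return sections


sample = ["[Zeta]", "Beta = 2", "alpha = 1", "", "; note", "[Alpha]", "Empty ="]
result = parse_ini(sample)
print(result)
# [('Zeta', [('Beta', '2'), ('alpha', '1')]), ('Alpha', [('Empty', '')])]
```

Settings that appear before any section header are silently dropped in this sketch; a real parser would have to decide how to report them.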

if match:
    content.update({pattern: match.groups()})

I wonder where you got the idea of populating a dict that way. It's a
basic operation:
content[name] = value

The regular expressions look strange too. A comment may be empty. A
setting too. There may be spaces around the = sign. Don't try to catch all
in one go.
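
As a sketch of what more tolerant patterns could look like, the two regexes below are assumptions written for this example. They accept an empty comment, an empty setting value, and optional spaces around the = sign:

```python
import re

# Hypothetical patterns: one per line type, each tolerant of surrounding spaces.
comment_re = re.compile(r"^\s*[;#]\s*(.*?)\s*$")
setting_re = re.compile(r"^\s*([^=\s][^=]*?)\s*=\s*(.*?)\s*$")

empty_comment = comment_re.match(";").groups()
tight = setting_re.match("name=demo").groups()
spaced = setting_re.match(" name = demo ").groups()
empty_value = setting_re.match("empty =").groups()

print(empty_comment)  # prints: ('',)
print(tight)          # prints: ('name', 'demo')
print(spaced)         # prints: ('name', 'demo')
print(empty_value)    # prints: ('empty', '')
```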

--
Gabriel Genellina

 
sgharvey
      03-23-2008
On Mar 22, 5:03 pm, "Gabriel Genellina" <(E-Mail Removed)>
wrote:
> On Sat, 22 Mar 2008 17:27:49 -0300, sgharvey <(E-Mail Removed)>
> wrote:
> Take a look at ConfigObj http://pypi.python.org/pypi/ConfigObj/


Thanks for the pointer; I'll check it out.

> I'm not sure you can process a config file in this unstructured way; looks
> a lot easier if you look for [sections] and process sequentially lines
> inside sections.


It works though... now that I've fixed up all my ugly stuff, and a
dumb logic error or two.

> The regular expressions look strange too. A comment may be empty. A
> setting too. There may be spaces around the = sign. Don't try to catch all
> in one go.


I didn't think about empty comments/settings... fixed now.
It also seemed simpler to handle surrounding spaces after the match
was found.

New version of the problematic part:
<code>
self.contents = []
content = {}
# Get the content in each line
for line in lines:
    for name in patterns:
        # Match each pattern to the current line
        match = patterns[name].search(line)
        if match:
            content[name] = match.group(0).strip()
    self.contents.append(content)
    content = {}
</code>

new iniparsing.py
http://pastebin.com/f445701d4

new ini_regexen_dicts.py
http://pastebin.com/f1e41cd3d

> --
> Gabriel Genellina



Much thanks to all for the constructive criticism.

Samuel Harvey
 