Velocity Reviews

Tom Deco 08-11-2005 10:35 AM

Regular expression to match a #
 
Hi,

I'm trying to use a regular expression to match a string containing a #
(basically I'm looking for #include ...)

I can't seem to write a regular expression that matches this.

My (probably too naive) approach is: p = re.compile(r'\b#include\b')
I also tried p = re.compile(r'\b\#include\b') in a futile attempt to use
a backslash as an escape character before the #.
Neither of the above returns a match for a string like "#include <stdio>".

I know a # is used for comments, hence my attempt to escape it...

Any suggestion on how to get a regular expression to find a #?

Thanks


Dan 08-11-2005 11:26 AM

Re: Regular expression to match a #
 
> My (probably too naive) approach is: p = re.compile(r'\b#include\b')

I think your problem is the \b at the beginning. \b matches a word break
(defined as \w\W or \W\w). There would only be a word break before the #
if the preceding character were a \w (that is, [A-Za-z0-9_], and maybe
some other characters depending on your locale).

However, the \b after the "include" is exactly what you want.
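
A quick way to check this is to try both forms on the sample string. This is only a sketch: the variable names and the extra 'x#include' input are made up for illustration and are not from the original post.

```python
import re

# Hypothetical names; the first pattern is the one from the question.
with_leading_b = re.compile(r'\b#include\b')
without_leading_b = re.compile(r'#include\b')

line = '#include <stdio.h>'

# Nothing word-like precedes '#', so the leading \b cannot match here.
print(with_leading_b.search(line))                  # None
print(without_leading_b.search(line).group())       # #include

# The leading \b only succeeds when a \w character sits right before '#'.
print(with_leading_b.search('x#include').group())   # #include
```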




Tom Deco 08-11-2005 11:57 AM

Re: Regular expression to match a #
 
Thanks,
That did the trick...


Duncan Booth 08-11-2005 12:11 PM

Re: Regular expression to match a #
 
Dan wrote:

>> My (probably too naive) approach is: p = re.compile(r'\b#include\b')

>
> I think your problem is the \b at the beginning. \b matches a word break
> (defined as \w\W or \W\w). There would only be a word break before the #
> if the preceding character were a \w (that is, [A-Za-z0-9_], and maybe
> some other characters depending on your locale).
>
> However, the \b after the "include" is exactly what you want.
>


So the OP probably wanted '\B', the exact opposite of '\b', for the start of
the string, i.e. only match the # if it is NOT preceded by a word break.

Alternatively for C style #includes search for r'^\s*#\s*include\b'.
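
Both suggestions can be tried out as follows. Treat it as a sketch: the variable names and test strings are assumptions chosen for illustration.

```python
import re

# Hypothetical names for the two patterns suggested in this post.
pat_nonboundary = re.compile(r'\B#include\b')
pat_c_style = re.compile(r'^\s*#\s*include\b')

# \B succeeds at the start of the string because '#' is not a word character.
print(pat_nonboundary.search('#include <stdio.h>').group())   # #include

# \B fails between 'x' and '#', since that position is a word boundary.
print(pat_nonboundary.search('x#include'))                     # None

# The C-style pattern tolerates whitespace before and after the '#'.
print(repr(pat_c_style.search('  #  include <stdio.h>').group()))   # '  #  include'
```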

John Machin 08-11-2005 12:24 PM

Re: Regular expression to match a #
 
Tom Deco wrote:
> Hi,
>
> I'm trying to use a regular expression to match a string containing a #
> (basically i'm looking for #include ...)
>
> I don't seem to manage to write a regular expression that matches this.
>
> My (probably too naive) approach is: p = re.compile(r'\b#include\b')
> I also tried p = re.compile(r'\b\#include\b') in a futile attempt to use
> a backslash as an escape character before the #.
> Neither of the above returns a match for a string like "#include <stdio>".
>
> I know a # is used for comments, hence my attempt to escape it...
>
> Any suggestion on how to get a regular expression to find a #?
>
> Thanks
>


You definitely shouldn't have the first \b -- match() works only at the
beginning of the target string, so it is impossible for there to be a
word boundary just before the "#".

You probably shouldn't have the second \b.

You probably should read section A12 of K&R2.

You probably should be using a parser, but if you persist in using
regular expressions:

(a) read the manual.

(b) try something like this:

>>> import re
>>> pat1 = re.compile(r'\s*#\s*include\s*<\s*([^>\s]+)\s*>\s*$')
>>> pat1.match(" # include < fubar.h > ").group(1)
'fubar.h'

N.B. this is based on the assumption that sane programmers don't have
whitespace embedded in the names of source files ;-)
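
One possible way to apply that pattern to a whole file is to call match() on each line in turn. The sketch below assumes a small made-up C source held in a string; the names `source` and `headers` are hypothetical. Note that the pattern only covers the angle-bracket form of #include.

```python
import re

pat1 = re.compile(r'\s*#\s*include\s*<\s*([^>\s]+)\s*>\s*$')

# Made-up sample input, for illustration only.
source = """\
#include <stdio.h>
  #  include < fubar.h >
int main(void) { return 0; }
"""

headers = []
for line in source.splitlines():
    # match() anchors at the start of each line, so no ^ is needed.
    m = pat1.match(line)
    if m:
        headers.append(m.group(1))

print(headers)   # ['stdio.h', 'fubar.h']
```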

HTH,
John

John Machin 08-11-2005 12:34 PM

Re: Regular expression to match a #
 
Duncan Booth wrote:
> Dan wrote:
>
>
>>>My (probably too naive) approach is: p = re.compile(r'\b#include\b')

>>
>>I think your problem is the \b at the beginning. \b matches a word break
>>(defined as \w\W or \W\w). There would only be a word break before the #
>>if the preceding character were a \w (that is, [A-Za-z0-9_], and maybe
>>some other characters depending on your locale).
>>
>>However, the \b after the "include" is exactly what you want.
>>

>
>
> So the OP probably wanted '\B' the exact opposite of '\b' for the start of
> the string, i.e. only match the # if it is NOT preceded by a wordbreak.
>
> Alternatively for C style #includes search for r'^\s*#\s*include\b'.


Search for r'^something' can never be better/faster than match for
r'something', and with a dopey implementation of search [which Python's
re is NOT] it could be much worse. So please don't tell newbies to
search for r'^something'.
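
For a single-line target with no flags, the two calls find the same thing, which can be checked with a sketch like this one. The input strings and variable names are assumptions for illustration, and the sketch says nothing about relative speed.

```python
import re

# Illustrative inputs; not from the thread.
at_start = '#include <stdio.h>'
not_at_start = 'int x; #include <stdio.h>'

# match() is implicitly anchored at the start of the string ...
m_match = re.match(r'\s*#\s*include\b', at_start)
# ... so search() with a leading ^ finds the same span here.
m_search = re.search(r'^\s*#\s*include\b', at_start)
print(m_match.span(), m_search.span())   # (0, 8) (0, 8)

# Both decline when the directive is not at the start of the string.
print(re.match(r'\s*#\s*include\b', not_at_start))     # None
print(re.search(r'^\s*#\s*include\b', not_at_start))   # None
```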


Jeff Schwab 08-11-2005 01:23 PM

Re: Regular expression to match a #
 
John Machin wrote:

> Search for r'^something' can never be better/faster than match for
> r'something', and with a dopey implementation of search [which Python's
> re is NOT] it could be much worse. So please don't tell newbies to
> search for r'^something'.


How else would you match the beginning of a line in a multi-line string?

Duncan Booth 08-11-2005 03:46 PM

Re: Regular expression to match a #
 
John Machin wrote:

>> Alternatively for C style #includes search for r'^\s*#\s*include\b'.

>
> Search for r'^something' can never be better/faster than match for
> r'something', and with a dopey implementation of search [which
> Python's re is NOT] it could be much worse. So please don't tell
> newbies to search for r'^something'.
>

Search for r'^something' is always better than searching for r'something'
when the spec requires the search to match only at the start of a line (on
the principle that code that works is better than code which doesn't).

It appears that this may be something the original poster wanted, so I
stand by my suggestion.
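
If "start of a line" means any line within a multi-line string, the ^ has to be combined with the re.MULTILINE flag; without that flag, ^ only matches at the start of the whole string. A minimal sketch, assuming a made-up source string and hypothetical variable names:

```python
import re

# Made-up multi-line input, for illustration only.
source = 'int x;\n  # include <a.h>\n#include <b.h>\n'

# With re.MULTILINE, ^ also matches immediately after each newline.
pat_multiline = re.compile(r'^\s*#\s*include\b', re.MULTILINE)
found = pat_multiline.findall(source)
print(found)   # ['  # include', '#include']

# match() only looks at the very start of the string, which is 'int x;'.
print(re.match(r'\s*#\s*include\b', source))   # None
```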

Aahz 08-11-2005 05:04 PM

Re: Regular expression to match a #
 
In article <42fb45d7$1@news.eftel.com>,
John Machin <sjmachin@lexicon.net> wrote:
>
>Search for r'^something' can never be better/faster than match for
>r'something', and with a dopey implementation of search [which Python's
>re is NOT] it could be much worse. So please don't tell newbies to
>search for r'^something'.


You're somehow getting mixed up in thinking that "^" is some kind of
"not" operator -- it's the start of line anchor in this context.
--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

John Machin 08-11-2005 08:24 PM

Re: Regular expression to match a #
 
Jeff Schwab wrote:
> John Machin wrote:
>
>> Search for r'^something' can never be better/faster than match for
>> r'something', and with a dopey implementation of search [which
>> Python's re is NOT] it could be much worse. So please don't tell
>> newbies to search for r'^something'.

>
>
> How else would you match the beginning of a line in a multi-line string?


I beg your pardon -- I should have qualified that:

"""
So please don't tell newbies to search for r'^something' when match of
r'something' does the job.
"""


