Velocity Reviews - Computer Hardware Reviews

How to escape # hash character in regex match strings

 
 
504crank@gmail.com
      06-10-2009
I've encountered a problem with my RegEx learning curve -- how to
escape hash characters # in strings being matched, e.g.:

>>> string = re.escape('123#abc456')
>>> match = re.match('\d+', string)
>>> print match
<_sre.SRE_Match object at 0x00A6A800>
>>> print match.group()
123

The correct result should be:

123456

I've tried to escape the hash symbol in the match string without
result.

Any ideas? Is the answer something I overlooked in my lurching Python
schooling?
 
Peter Otten
      06-10-2009
504crank@gmail.com wrote:

> I've encountered a problem with my RegEx learning curve -- how to
> escape hash characters # in strings being matched, e.g.:
>
> >>> string = re.escape('123#abc456')
> >>> match = re.match('\d+', string)
> >>> print match
> <_sre.SRE_Match object at 0x00A6A800>
> >>> print match.group()
> 123
>
> The correct result should be:
>
> 123456


>>> "".join(re.findall(r"\d+", "123#abc456"))
'123456'

> I've tried to escape the hash symbol in the match string without
> result.
>
> Any ideas? Is the answer something I overlooked in my lurching Python
> schooling?


re.escape() is used to build a regex from a string that may contain
characters that have a special meaning in regular expressions but that you
want to treat as literals. It is applied to the pattern, not to the string
being searched. For example, you can search for r"C:\dir" with

>>> re.compile(re.escape(r"C:\dir")).findall(r"C:\dir C:7ir")
['C:\\dir']

Without escaping, the \d in the pattern means "any digit" and you'd get

>>> re.compile(r"C:\dir").findall(r"C:\dir C:7ir")
['C:7ir']

Peter
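
A minimal sketch, not part of the original posts, of why the code in the question prints 123: re.escape() only rewrites the subject string, and re.match() tries the pattern at the start of the string only. It uses Python 3 syntax, and the variable names are illustrative.

```python
import re

subject = '123#abc456'

# re.escape() is meant for patterns; applied to the subject it only adds a backslash.
escaped = re.escape(subject)
print(escaped)        # 123\#abc456

# re.match() anchors at the start, so it stops at the first non-digit.
first_run = re.match(r'\d+', subject).group()
print(first_run)      # 123

# re.findall() collects every run of digits; joining them gives the wanted result.
all_digits = ''.join(re.findall(r'\d+', subject))
print(all_digits)     # 123456
```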

 
David Shapiro
      06-10-2009
Maybe using a Unicode equivalent of # would do the trick.

Lie Ryan
      06-11-2009

Since you weren't clear about what you wanted, I'm guessing it is this:

>>> s = '123#abc456'
>>> re.match(r'\d+', re.sub(r'#\D+', '', s)).group()
'123456'
>>> s = '123#this is a comment and is ignored456'
>>> re.match(r'\d+', re.sub(r'#\D+', '', s)).group()
'123456'
 
Brian D
      06-11-2009

Sorry I wasn't more clear. I positively appreciate your reply. It
provides half of what I'm hoping to learn. The hash character is
actually a desirable hook to identify a data entity in a scraping
routine I'm developing, but not a character I want in the scrubbed
data.

In my application, the hash distinguishes one kind of alphanumeric string
from all the others. The strings I'm looking for are manually entered
identifiers; a real machine-created identifier shouldn't contain the hash
character. The correct form is 'A1234509', but it is often entered as just
'#12345' when the first character (a letter standing for the month) and the
last two characters (a two-digit year) can be assumed. Matching the hash
character in a RegEx is a way of trapping such a string so it can be
transformed into its correct machine-generated form.

I'm surprised it's been so difficult to find an example of the hash
character in a RegEx string -- for exactly this type of situation,
since it's so common in the real world that people want to put a pound
symbol in front of a number.

Thanks!
 
Brian D
      06-11-2009

By the way, the manually entered strings can also take these forms:

A#12345
#1234509

Garbage in, garbage out -- I know. I wish I could tell the people
entering the data how challenging it is to work with what they
provide, but it is, after all, a screen-scraping routine.
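
A rough sketch of how such hash-marked identifiers could be picked out of scraped text. The sample text is hypothetical, and the pattern assumes exactly what the examples above show: an optional uppercase letter, a literal '#', five digits and an optional two-digit year.

```python
import re

# Hypothetical scraped text containing the three manual forms plus two non-matches.
scraped = 'Refs: #12345, A#12345 and #1234509; ignore 12345 and A1234509.'

# Optional month letter, a literal '#', five digits, optional two-digit year.
# The lookarounds stop a match from starting or ending inside a longer token.
hashed_id = re.compile(r'(?<![A-Za-z0-9])[A-Z]?#\d{5}(?:\d{2})?(?!\d)')

found = hashed_id.findall(scraped)
print(found)   # ['#12345', 'A#12345', '#1234509']
```

No escaping of '#' is needed here because the pattern is compiled without re.VERBOSE.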
 
Rhodri James
      06-11-2009
On Thu, 11 Jun 2009 15:22:44 +0100, Brian D wrote:

> I'm surprised it's been so difficult to find an example of the hash
> character in a RegEx string -- for exactly this type of situation,
> since it's so common in the real world that people want to put a pound
> symbol in front of a number.


It's a character with no special meaning to the regex engine, so I'm not
in the least surprised that there aren't many examples containing it.
You could just as validly claim that there aren't many examples involving
the letter 'q'.
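
One caveat, sketched here rather than taken from the thread: with the re.VERBOSE flag, an unescaped '#' outside a character class starts a comment, so in that mode it does need escaping. In the default mode it simply matches itself. The sample text is made up.

```python
import re

text = 'ticket #12345'

# Default mode: '#' is an ordinary character and matches itself.
plain = re.search(r'#(\d+)', text)
print(plain.group(1))                   # 12345

# re.VERBOSE: escape the hash (or write [#]) so it is not read as a comment marker.
verbose = re.compile(r'''
    \#          # literal hash sign
    (\d+)       # the digits that follow it
''', re.VERBOSE)
print(verbose.search(text).group(1))    # 12345
```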

By the way, I don't know what you're doing but I'm seeing all of your
posts twice, from two different addresses. This is a little confusing,
to put it mildly, and doesn't half break the threading.

--
Rhodri James *-* Wildebeest Herder to the Masses
 
Lie Ryan
      06-14-2009
Brian D wrote:
> By the way, other forms the strings can take in their manually created
> forms:
>
> A#12345
> #1234509
>
> Garbage in, garbage out -- I know. I wish I could tell the people
> entering the data how challenging it is to work with what they
> provide, but it is, after all, a screen-scraping routine.


Perhaps something like this?

>>> # you can use re.search if that suits better
>>> a = re.match(r'([A-Z]?)#(\d{5})(\d\d)?', 'A#12345')
>>> b = re.match(r'([A-Z]?)#(\d{5})(\d\d)?', '#1234509')
>>> a.group(0)
'A#12345'
>>> a.group(1)
'A'
>>> a.group(2)
'12345'
>>> a.group(3)
>>> b.group(0)
'#1234509'
>>> b.group(1)
''
>>> b.group(2)
'12345'
>>> b.group(3)
'09'
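
Building on the groups above, here is a hedged sketch of the transformation into the machine-generated form. The function name normalize and the defaults 'A' and '09' are assumptions made for illustration; the thread only says that the month letter and year "can be assumed", not how.

```python
import re

# Same pattern as in the post above, compiled once.
ID_PATTERN = re.compile(r'([A-Z]?)#(\d{5})(\d\d)?')

def normalize(raw, default_month='A', default_year='09'):
    """Return the identifier without '#', or None if raw is not a hashed form."""
    m = ID_PATTERN.fullmatch(raw)
    if m is None:
        return None
    month = m.group(1) or default_month   # group 1 is '' when the letter is missing
    year = m.group(3) or default_year     # group 3 is None when the year is missing
    return month + m.group(2) + year

for raw in ('#12345', 'A#12345', '#1234509'):
    print(raw, '->', normalize(raw))      # each prints ... -> A1234509
```

fullmatch() is used so that trailing junk is rejected rather than silently ignored; re.search with the same pattern would be the choice for finding identifiers inside longer text.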
 