Velocity Reviews > Regular expression issue

# Regular expression issue

genxtech

 08-08-2010
I am trying to learn regular expressions in python3 and have an issue
with one of the examples I'm working with.
The code is:

#! /usr/bin/env python3

import re

search_string = "[^aeiou]y$"
print()

in_string = 'vacancy'
if re.search(search_string, in_string) != None:
    print(" ay, ey, iy, oy and uy are not at the end of {0}.".format(in_string))
else:
    print(" ay, ey, iy, oy or uy were found at the end of {0}.".format(in_string))
print()

in_string = 'boy'
if re.search(search_string, in_string) != None:
    print(" ay, ey, iy, oy and uy are not at the end of {0}.".format(in_string))
else:
    print(" ay, ey, iy, oy or uy were found at the end of {0}.".format(in_string))
print()

in_string = 'day'
if re.search(search_string, in_string) != None:
    print(" ay, ey, iy, oy and uy are not at the end of {0}.".format(in_string))
else:
    print(" ay, ey, iy, oy or uy were found at the end of {0}.".format(in_string))
print()

in_string = 'pita'
if re.search(search_string, in_string) != None:
    print(" ay, ey, iy, oy and uy are not at the end of {0}.".format(in_string))
else:
    print(" ay, ey, iy, oy or uy were found at the end of {0}.".format(in_string))
print()

The output that I am getting is:
ay, ey, iy, oy and uy are not at the end of vacancy.
ay, ey, iy, oy or uy were found at the end of boy.
ay, ey, iy, oy or uy were found at the end of day.
ay, ey, iy, oy or uy were found at the end of pita.

The last line of the output is the opposite of what I expected to see,
and I'm having trouble figuring out what the issue is. Any help would
be greatly appreciated.

Thomas Jollans

 08-08-2010
On Monday 09 August 2010, it occurred to genxtech to exclaim:
> I am trying to learn regular expressions in python3 and have an issue
> with one of the examples I'm working with.
> The code is:
>
> #! /usr/bin/env python3
>
> import re
>
> search_string = "[^aeiou]y$"

To translate this expression to English:

a character that is not a, e, i, o, or u, followed by the character 'y', at
the end of the line.

"vacancy" matches: it ends with "c" (not one of aeiou) followed by "y".

"pita" does not match: it does not end with "y".


MRAB

 08-08-2010
genxtech wrote:
> I am trying to learn regular expressions in python3 and have an issue
> with one of the examples I'm working with.
> The code is:
>
> #! /usr/bin/env python3
>
> import re
>
> search_string = "[^aeiou]y$"

You can think of this as: a non-vowel followed by a 'y', then the end of
the string.
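That reading maps one-to-one onto the pattern. As an illustrative sketch (the name `NON_VOWEL_THEN_Y` is made up here), Python's `re.VERBOSE` flag lets each piece of the pattern carry its own comment:

```python
import re

# Each piece of the pattern annotated with the plain-English reading above.
NON_VOWEL_THEN_Y = re.compile(r"""
    [^aeiou]   # one character that is not a lowercase vowel
    y          # followed by a literal 'y'
    $          # and then the end of the string
""", re.VERBOSE)

print(NON_VOWEL_THEN_Y.search("vacancy"))  # a match object covering "cy"
print(NON_VOWEL_THEN_Y.search("pita"))     # None: no 'y' at the end
```

In verbose mode whitespace and `#` comments outside character classes are ignored, so this compiles to the same pattern as `"[^aeiou]y$"`.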

> print()
>
> in_string = 'vacancy'
> if re.search(search_string, in_string) != None:
>     print(" ay, ey, iy, oy and uy are not at the end of {0}.".format(in_string))
> else:
>     print(" ay, ey, iy, oy or uy were found at the end of {0}.".format(in_string))

Matches because 'c' is a non-vowel, 'y' matches, and then the end of the
string.

> print()
>
> in_string = 'boy'
> if re.search(search_string, in_string) != None:
>     print(" ay, ey, iy, oy and uy are not at the end of {0}.".format(in_string))
> else:
>     print(" ay, ey, iy, oy or uy were found at the end of {0}.".format(in_string))

Doesn't match because 'o' is a vowel, not a non-vowel.

> print()
>
> in_string = 'day'
> if re.search(search_string, in_string) != None:
>     print(" ay, ey, iy, oy and uy are not at the end of {0}.".format(in_string))
> else:
>     print(" ay, ey, iy, oy or uy were found at the end of {0}.".format(in_string))

Doesn't match because 'a' is a vowel, not a non-vowel.

> print()
>
> in_string = 'pita'
> if re.search(search_string, in_string) != None:
>     print(" ay, ey, iy, oy and uy are not at the end of {0}.".format(in_string))
> else:
>     print(" ay, ey, iy, oy or uy were found at the end of {0}.".format(in_string))

Doesn't match because 't' is a non-vowel but 'a' doesn't match 'y'.
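Running the pattern over all four words confirms the walk-through above. A minimal sketch; `describe` is a hypothetical helper that returns the matched text, or `None` when `re.search` finds nothing:

```python
import re

# One non-vowel, then 'y', then end of string.
PATTERN = r"[^aeiou]y$"

def describe(word):
    """Hypothetical helper: matched text, or None if there is no match."""
    m = re.search(PATTERN, word)
    return m.group() if m is not None else None

for word in ["vacancy", "boy", "day", "pita"]:
    print(word, describe(word))
```

"pita" comes back `None` exactly like "boy" and "day", so all three land in the same `else` branch even though only two of them end in a vowel plus "y".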

> print()
>
> The output that I am getting is:
> ay, ey, iy, oy and uy are not at the end of vacancy.
> ay, ey, iy, oy or uy were found at the end of boy.
> ay, ey, iy, oy or uy were found at the end of day.
> ay, ey, iy, oy or uy were found at the end of pita.
>
> The last line of the output is the opposite of what I expected to see,
> and I'm having trouble figuring out what the issue is. Any help would
> be greatly appreciated.

Chris Rebert

 08-08-2010
On Sun, Aug 8, 2010 at 3:32 PM, Thomas Jollans <(E-Mail Removed)> wrote:
> On Monday 09 August 2010, it occurred to genxtech to exclaim:
>> I am trying to learn regular expressions in python3 and have an issue
>> with one of the examples I'm working with.
>> The code is:
>>
>> #! /usr/bin/env python3
>>
>> import re
>>
>> search_string = "[^aeiou]y$"

>
> To translate this expression to English:
>
> a character that is not a, e, i, o, or u, followed by the character 'y', at
> the end of the line.
>
> "vacancy" matches. It ends with "c" (not one of aeiou), followed by "y"
>
> "pita" does not match: it does not end with "y".

Or in other words, the regex will not match when:
- the string ends in "ay", "ey", "iy", "oy", or "uy"
- the string doesn't end in "y"
- the string is shorter than 2 characters
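The three cases can be told apart explicitly. The helper below (`why_no_match`, a hypothetical name) is a sketch that assumes lowercase input, since `[aeiou]` only lists lowercase vowels:

```python
import re

PATTERN = r"[^aeiou]y$"

def why_no_match(word):
    """Hypothetical helper: say which non-match case applies to `word`.

    Returns None when the pattern does match.
    """
    if re.search(PATTERN, word) is not None:
        return None
    if len(word) < 2:
        return "shorter than 2 characters"
    if not word.endswith("y"):
        return "does not end in 'y'"
    # Ends in 'y', so the character before it must be one of a, e, i, o, u.
    return "ends in a vowel followed by 'y'"

for word in ["vacancy", "boy", "day", "pita", "y"]:
    print(word, "->", why_no_match(word))
```

Only the last case is what the original `else` message describes; "pita" falls into the second one.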

So, the program has a logic error in its assumptions. A non-match
*doesn't* imply that a string ends in one of the aforementioned pairs;
the other possibilities have been overlooked.

May I suggest instead using the much more straightforward
`search_string = "[aeiou]y$"` and then swapping your conditions
around? The double-negative sort of style the program is currently
using is (as you've just experienced) harder to reason about and thus
more error-prone.
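A sketch of that suggestion applied to the original four words. The positive pattern `[aeiou]y$` is the one suggested above; the `report` helper and the loop that replaces the four copied blocks are illustrative additions:

```python
import re

# Match what we actually care about: a vowel followed by 'y' at the end.
VOWEL_Y_AT_END = r"[aeiou]y$"

def report(word):
    """Hypothetical helper: build the message for one word."""
    if re.search(VOWEL_Y_AT_END, word) is not None:
        return " ay, ey, iy, oy or uy were found at the end of {0}.".format(word)
    return " ay, ey, iy, oy and uy are not at the end of {0}.".format(word)

for word in ["vacancy", "boy", "day", "pita"]:
    print(report(word))
```

With the condition stated positively, a non-match now means exactly what the message says, and "pita" gets the "are not at the end" message.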

Cheers,
Chris
--
http://blog.rebertia.com

Tim Chase

 08-08-2010
On 08/08/10 17:20, genxtech wrote:
> if re.search(search_string, in_string) != None:

While the other responses have addressed some of the big issues,
it's also good to use

if thing_to_test is None:

or

if thing_to_test is not None:

instead of "== None" or "!= None".
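PEP 8 recommends `is` / `is not` for comparisons to singletons such as `None`. A small sketch of both the idiom and the reason for it (`AlwaysEqual` is a contrived class made up for illustration):

```python
import re

match = re.search(r"[^aeiou]y$", "pita")

# Identity comparison against the None singleton.
if match is not None:
    print("matched:", match.group())
else:
    print("no match")

# Why "is" is safer than "==": a class can redefine equality.
class AlwaysEqual:
    def __eq__(self, other):
        return True  # claims to equal everything, including None

obj = AlwaysEqual()
print(obj == None)  # True, because AlwaysEqual.__eq__ was called
print(obj is None)  # False, because obj is not the None object
```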

-tkc

genxtech

 08-09-2010
On Aug 8, 7:34 pm, Tim Chase <(E-Mail Removed)> wrote:
> On 08/08/10 17:20, genxtech wrote:
>
> > if re.search(search_string, in_string) != None:
>
> While the other responses have addressed some of the big issues,
> it's also good to use
>
>    if thing_to_test is None:
>
> or
>
>    if thing_to_test is not None:
>
> instead of "== None" or "!= None".
>
> -tkc

I would like to thank all of you for your responses. I understand
what the regular expression means, and am aware of the double negative
nature of the test. I guess what I am really getting at is why the
last test returns a value of None, and even when using the syntax
suggested in this quoted solution, the code for the last test is doing
the opposite of the previous 2 tests that also returned a value of
None. I hope this makes sense and clarifies what I am trying to ask.
Thanks

MRAB

 08-09-2010
genxtech wrote:
> On Aug 8, 7:34 pm, Tim Chase <(E-Mail Removed)> wrote:
>> On 08/08/10 17:20, genxtech wrote:
>>
>>> if re.search(search_string, in_string) != None:

>> While the other responses have addressed some of the big issues,
>> it's also good to use
>>
>> if thing_to_test is None:
>>
>> or
>>
>> if thing_to_test is not None:
>>
>> instead of "== None" or "!= None".
>>
>> -tkc

>
> I would like to thank all of you for your responses. I understand
> what the regular expression means, and am aware of the double negative
> nature of the test. I guess what I am really getting at is why the
> last test returns a value of None, and even when using the syntax
> suggested in this quoted solution, the code for the last test is doing
> the opposite of the previous 2 tests that also returned a value of
> None. I hope this makes sense and clarifies what I am trying to ask.
>

It returns None because it doesn't match.

Why doesn't it match?

Because the regex wants the last character to be a 'y', but it isn't;
it's an 'a'.

nn

 08-09-2010
On Aug 9, 9:18 am, genxtech <(E-Mail Removed)> wrote:
> On Aug 8, 7:34 pm, Tim Chase <(E-Mail Removed)> wrote:
>
> > On 08/08/10 17:20, genxtech wrote:
> >
> > > if re.search(search_string, in_string) != None:
> >
> > While the other responses have addressed some of the big issues,
> > it's also good to use
> >
> >    if thing_to_test is None:
> >
> > or
> >
> >    if thing_to_test is not None:
> >
> > instead of "== None" or "!= None".
> >
> > -tkc
>
> I would like to thank all of you for your responses. I understand
> what the regular expression means, and am aware of the double negative
> nature of the test. I guess what I am really getting at is why the
> last test returns a value of None, and even when using the syntax
> suggested in this quoted solution, the code for the last test is doing
> the opposite of the previous 2 tests that also returned a value of
> None. I hope this makes sense and clarifies what I am trying to ask.
> Thanks

First: you understand the regular expression and the double negative,
but not both of them together, otherwise you would not be asking here.
The point of refactoring the code is that, down the road, you or
somebody else doing maintenance will have to read it again. Good books
avoid confusing grammar; likewise, good programs avoid confusing
logic.

Second: the root of your problem is the mistaken belief that P=>Q
implies (not P)=>(not Q); this is not so. Let me give an example: if
you say "if it rains, then the ground is wet", that does not imply "if
it doesn't rain, then the ground is not wet". You could be watering
the plants, for instance. Saying "if the word finishes with a consonant
and a y, then ay, ey, iy, oy and uy are not at the end of the word"
does not imply "if the word does not finish with a consonant and a y,
then ay, ey, iy, oy or uy were found at the end of the word".
The word could end in x, for instance.
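The same point can be checked mechanically. In this sketch `p` and `q` are hypothetical helpers standing for the two halves of the original `if`: `p` is "the word ends with a non-vowel and a y" and `q` is "ay, ey, iy, oy and uy are not at the end of the word". "box" is added to the test words as the "ends in x" case:

```python
import re

def p(word):
    # P: the word ends with a non-vowel followed by 'y'
    return re.search(r"[^aeiou]y$", word) is not None

def q(word):
    # Q: ay, ey, iy, oy and uy are not at the end of the word
    return re.search(r"[aeiou]y$", word) is None

words = ["vacancy", "boy", "day", "pita", "box"]

# P => Q holds: every word where P is true also has Q true.
print(all(q(w) for w in words if p(w)))  # True

# (not P) => (not Q) does not hold: these words have P false but Q true.
print([w for w in words if not p(w) and q(w)])  # ['pita', 'box']
```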

I hope I didn't make it more confusing, otherwise other people will
probably chime in to make it clear to you.

genxtech

 08-09-2010
I have it now. Had to beat my head over it a couple times. Thanks
everybody.