Velocity Reviews - Computer Hardware Reviews

RE: r'\' - python parser bug?

Tim Peters
05-24-2004
[Konstantin Veretennicov]
> ActivePython 2.3.2 Build 232
> >>> '\\'
> '\\'
> >>> r'\'
> File "<stdin>", line 1
> r'\'
> ^
> SyntaxError: EOL while scanning single-quoted string
>
> Is this a known issue?


Yes, and a documented one: an r-string cannot end with an odd number of
backslashes. Note that your first expression ('\\') did create a string
with a single backslash, although the repr of that string may have fooled
you into thinking you got two backslashes.

>>> '\\'
'\\'
>>> len('\\')
1
>>> print '\\'
\
>>>
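For readers on Python 3, where print is a function, here is a minimal sketch of the same checks. It also shows one common way to end a string with a single backslash (append it from an ordinary literal), and uses the built-in compile() to confirm that r'\' is rejected when the source is parsed, before anything runs. The variable names are only for illustration.

```python
# Python 3 version of the session above (print is a function there).
one_backslash = '\\'            # an ordinary literal holding ONE backslash
print(repr(one_backslash))      # the repr doubles it for display: '\\'
print(len(one_backslash))       # 1
print(one_backslash)            # prints a single backslash

# A raw literal may not end in an odd number of backslashes, so append
# the final backslash from an ordinary literal instead.
dir_with_sep = r'c:\subdir' + '\\'
print(dir_with_sep)             # prints c:\subdir\ (with the trailing backslash)

# r'\' itself is rejected at compile time, before anything runs.
try:
    compile("r'\\'", "<example>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```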


> Should i submit a bug report to development?


Nope: it's not a bug, and won't change.

Konstantin Veretennicov
05-25-2004
"Tim Peters" wrote:
> > Should i submit a bug report to development?
>
> Nope: it's not a bug, and won't change.

OK. Does that mean I'm not encouraged even to try writing a patch?
It won't break anything, or will it? I agree we can live without r'\',
but are there any reasons *against* r'\'?

- kv

Fredrik Lundh
05-25-2004
Tim Peters wrote:

> Yup. Right now all tools (including Python itself) that scan over strings
> in Python source can (and usually do) treat backslashes identically, whether
> in loops or in regexps.


Or in other words, the point here is that the prefix flag (u, r, whatever) doesn't
affect how a string literal is *parsed*. When the parser sees a backslash inside
a string literal, it always skips the next character. There's no separate grammar
for "raw string literals".
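One way to see this from Python itself is the standard tokenize module, which reports the source text of each string literal. In this Python 3 sketch (the helper name first_string_token is made up for the illustration), a plain and a raw literal with the same body end at the same closing quote; only the value built from them afterwards differs.

```python
import ast
import io
import tokenize


def first_string_token(source):
    """Return the source text of the first STRING token in `source`."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            return tok.string
    return None


for src in ["'a\\'b' + 'x'", "r'a\\'b' + 'x'"]:
    literal = first_string_token(src)
    # With or without the r prefix, the backslash stops the second quote
    # from ending the literal, so the token runs to the quote after "b".
    print(literal, "->", repr(ast.literal_eval(literal)))
```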

</F>

Fuzzyman
05-26-2004
"Fredrik Lundh" wrote:
> Tim Peters wrote:
>
> > Yup. Right now all tools (including Python itself) that scan over strings
> > in Python source can (and usually do) treat backslashes identically, whether
> > in loops or in regexps.
>
> Or in other words, the point here is that the prefix flag (u, r, whatever) doesn't
> affect how a string literal is *parsed*. When the parser sees a backslash inside
> a string literal, it always skips the next character. There's no separate grammar
> for "raw string literals".
>
> </F>



Wrong, surely ?

>>> print '\\'
\
>>> print r'\\'
\\
>>> print r'c:\subdir\'
SyntaxError: EOL while scanning single-quoted string
>>>


> When the parser sees a backslash inside
> a string literal, it always skips the next character.

In the above example the parser *only* skips the next character if it
is at the end of the string... surely illogical. The reason given is
effectively 'raw strings were created for regular expressions, so it
doesn't matter if the behaviour is illogical' (and precludes other
reasonable uses!!)..........

Regards,


Fuzzy

Duncan Booth
05-26-2004
Fuzzyman wrote:

>>>> print r'c:\subdir\'
> SyntaxError: EOL while scanning single-quoted string
>>>>
>
>> When the parser sees a backslash inside
>> a string literal, it always skips the next character.
>
> In the above example the parser *only* skips the next character if it
> is at the end of the string... surely illogical. The reason given is
> effectively 'raw strings were created for regular expressions, so it
> doesn't matter if the behaviour is illogical' (and precludes other
> reasonable uses!!)..........
>


In a Python string, backslash is an escape character which gives the next
character(s) special meaning, so '\n' is a single newline character. If the
escaped character isn't a known escape then the parser simply passes
through the entire sequence. So '\s' is a two character string. In all
cases at least one character following the backslash is parsed when the
backslash is encountered, and this character can never form part of the
string terminator.
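A quick Python 3 check of these rules, using ast.literal_eval on source text so each literal is parsed the way the compiler parses it. Newer Pythons warn about unrecognised escapes such as \s (a DeprecationWarning since 3.6, a SyntaxWarning since 3.12), so the sketch silences that warning; the value is still the two characters backslash and s.

```python
import ast
import warnings

newline = ast.literal_eval(r"'\n'")      # known escape: one newline character
print(len(newline), newline == '\n')     # 1 True

with warnings.catch_warnings():
    warnings.simplefilter("ignore")      # newer Pythons warn about '\s'
    unknown = ast.literal_eval(r"'\s'")  # unknown escape: passed through as-is
print(len(unknown), unknown == '\\s')    # 2 True

# The character after a backslash never terminates the literal.
quote = ast.literal_eval(r"'\''")        # backslash + quote gives just a quote
print(len(quote), quote == "'")          # 1 True
```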

Raw strings are processed in exactly the same way as normal strings, except
that no escape sequences are recognised; however, the character following
the backslash is still prevented from terminating the string, just as it
is in any other string (and the backslash itself stays in the result). This
*useful*? behaviour allows you to put single and double quotes into a raw
string provided that they are preceded by a backslash.

print r'c:\subdir\'file'
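The same example in Python 3 syntax, checking what the literal actually contains: the backslash before the quote is kept; it only stops the quote from closing the literal. The variable names are illustrative.

```python
path_like = r'c:\subdir\'file'
print(path_like)                            # c:\subdir\'file
print(len(path_like))                       # 15
print(path_like == 'c:\\subdir\\\'file')    # True

# Double quotes work the same way inside a double-quoted raw string.
quoted = r"say \"hi\""
print(quoted)                               # say \"hi\"
```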

Raw strings aren't intended for writing DOS pathnames; they are actually
targeted at regular expressions, where this behaviour makes more sense.

If you need a lot of pathnames in your program you could consider using
the forward slash as the directory separator (use os.path.normpath to convert
to backslashes if you feel the need), or put all your paths in a separate
configuration file where you can choose what quoting, if any, to interpret.
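A minimal sketch of the forward-slash approach. It uses ntpath, the Windows flavour of os.path that ships with every Python, so the example behaves the same on any platform; on Windows, os.path is ntpath, so os.path.normpath does the same thing there. The variable names are illustrative.

```python
import ntpath

# Forward slashes need no escaping and no r prefix.
config_dir = 'c:/subdir/'
native = ntpath.normpath(config_dir + 'file.txt')
print(native)          # c:\subdir\file.txt
```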

Also, provided you use os.path.join to concatenate paths, you never actually
*need* to include a trailing separator:

import os
DIR = r'c:\subdir'
FILE = os.path.join(DIR, 'filename')

ducks the entire issue cleanly.

Fredrik Lundh
05-26-2004
Fuzzyman wrote:

> Wrong, surely ?


nope.

> >>> print '\\'
> \


the parser sees the first backslash, skips the second backslash,
sees the end quote, and passes everything between the quotes
to the next compiler stage.

> >>> print r'\\'
> \\


the parser sees the first backslash, skips the second backslash,
sees the end quote, and passes everything between the quotes
to the next compiler stage.

> >>> print r'c:\subdir\'
> SyntaxError: EOL while scanning single-quoted string


the parser sees the first backslash, skips the "s", and moves on.
the parser then sees the second backslash, skips the end quote,
and stumbles upon an EOL. syntax error (grammar violation).
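The rule Fredrik walks through is simple enough to sketch by hand. The hypothetical scan_literal function below is not CPython's tokenizer, only a toy model of the rule described above: look for the closing quote, treat a backslash as "skip the next character", and ignore the prefix entirely.

```python
def scan_literal(source):
    """Toy model: return the body of the first string literal in `source`.

    Handles a quoted literal with an optional r/u/b prefix and raises
    SyntaxError if the line ends before the closing quote is found.
    """
    i = 0
    while i < len(source) and source[i] in "rRuUbB":
        i += 1                  # prefix letters are skipped, not interpreted
    quote = source[i]
    i += 1
    start = i
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            i += 2              # backslash: skip it AND the next character
            continue
        if ch == quote:
            return source[start:i]
        i += 1
    raise SyntaxError("EOL while scanning string literal")


print(scan_literal(r"'\\'"))    # same body with or without the r prefix
print(scan_literal(r"r'\\'"))
try:
    scan_literal(r"r'c:\subdir\'")
except SyntaxError as exc:
    print("SyntaxError:", exc)
```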

> > When the parser sees a backslash inside
> > a string literal, it always skips the next character.
>
> In the above example the parser *only* skips the next character if it
> is at the end of the string... surely illogical.


you're confusing the string literal syntax (which is what the parser deals
with) with the contents of the resulting string object (which is created by
a later compiler stage). read the grammar (it's in the language reference)
and try again.
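The two stages can be separated with the standard ast module (Python 3.8+ for ast.Constant): parsing decides where each literal ends, and the constant stored in the tree is the finished string object. A minimal sketch:

```python
import ast

# Same literal shape, parsed by the same rule; different resulting objects.
tree = ast.parse(r"a = '\\'; b = r'\\'")
values = [node.value.value for node in tree.body]   # ast.Constant payloads
for value in values:
    print(repr(value), len(value))
```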

</F>

Fuzzyman
05-26-2004
Duncan Booth wrote in message news:<Xns94F55D228C7AAduncanrcpcouk@127.0.0.1>...
> Fuzzyman wrote:
>
> >>>> print r'c:\subdir\'
> SyntaxError: EOL while scanning single-quoted string
> >>>>
>
> >> When the parser sees a backslash inside
> >> a string literal, it always skips the next character.
> > In the above example the parser *only* skips the next character if it
> > is at the end of the string... surely illogical. The reason given is
> > effectively 'raw strings were created for regular expressions, so it
> > doesn't matter if the behaviour is illogical' (and precludes other
> > reasonable uses!!)..........
> >
>
> In a python string, backslash is an escape character which gives the next
> character(s) special meaning, so '\n' is a single newline character. If the
> escaped character isn't a known escape then the parser simply passes
> through the entire sequence. So '\s' is a two character string. In all
> cases at least one character following the backslash is parsed when the
> backslash is encountered, and this character can never form part of the
> string terminator.
>
> Raw strings are processed in exactly the same way as normal strings, except
> that no escape sequences are recognised, however the character following
> the backslash is still prevented from terminating the string, just as it
> would in any other string. This *useful*? behaviour allows you to put
> single and double quotes into a raw string provided that they are preceded
> by a backslash.
>
> print r'c:\subdir\'file'
>
> Raw strings aren't intended for writing DOS pathnames, they are actually
> targetted for regular expressions where this behaviour makes more sense.
>

[snip..]

Yeah.. that's not an annoying feature.... I mean no-one would ever
want to use strings to hold Windows pathnames in......


Regards,

Fuzzy

Peter Hansen
05-26-2004
Fuzzyman wrote:

> Duncan Booth wrote in message news:<Xns94F55D228C7AAduncanrcpcouk@127.0.0.1>...
>>Raw strings aren't intended for writing DOS pathnames, they are actually
>>targetted for regular expressions where this behaviour makes more sense.
>
> Yeah.. that's not an annoying feature.... I mean no-one would ever
> want to use strings to hold Windows pathnames in......


So use forward slashes. They're prettier anyway, and there's no need for
r-strings.

-Peter

Konstantin Veretennicov
05-28-2004
Many thanks to everyone for the enlightenment. Now I can see the reasons
behind the "no odd number of trailing backslashes" decision.
Maybe they deserve to be added to the FAQ section on raw strings?

Interestingly, C# does it the other way round. Trailing backslashes in
verbatim (raw) strings are allowed, but a backslash cannot escape a quote:

@"\" // ok
@"\"" // error, unterminated string literal

For me, personally, trailing backslashes aren't as important as quotes.
Python wins again.

- kv

Per Erik Stendahl
05-28-2004
Fuzzyman wrote:

[snip]

> Yeah.. that's not an annoying feature.... I mean no-one would ever
> want to use strings to hold Windows pathnames in......


So I guess I'm not the only one who tries to use a special class for
paths as much as possible then?

Working with pathnames as strings is painful, IMHO. Using objects makes
it much clearer, for me anyway.

path = Path(r'C:\documents\my\file.txt')
if path.isfile():
    shutil.copyfile(path.get(), ....)
print path.dir()
other_path = path.parent() / 'subdir' / 'otherfile' + '.txt'
....
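Python's standard library later gained a path class of its own, pathlib (Python 3.4+). It is not the Path class used above, whose get() and dir() methods appear to be Per Erik's own. A rough equivalent of the path arithmetic with pathlib.PureWindowsPath, which manipulates Windows-style paths on any OS without touching the file system:

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r'C:\documents\my\file.txt')
print(path.parent)               # C:\documents\my
print(path.name, path.suffix)    # file.txt .txt

# pathlib has no '+' operator for paths; with_suffix() adds the extension.
other_path = (path.parent / 'subdir' / 'otherfile').with_suffix('.txt')
print(other_path)                # C:\documents\my\subdir\otherfile.txt
```

For paths on a real disk, the concrete pathlib.Path class adds methods such as is_file(), and on Python 3.6 and later its objects can be passed directly to shutil.copyfile.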


Regards,

Per Erik Stendahl