Velocity Reviews - Computer Hardware Reviews

reg exp and octal notation

 
 
Lucas Branca
      03-05-2004
Could someone explain to me the difference between the results below?

## $cat octals.txt
## \006\034abc

import re

a= "\006\034abc"
preg= re.compile(r'([\0-\377]*)')
res = preg.search(a)
print res.groups()

loader = open('./octals.txt', 'r')
b = loader.readline()
preg= re.compile(r'([\0-\377]*)')
res = preg.search(b)
print res.groups()


RESULTS

('\x06\x1cabc',)

('\\006\\034abc\n',)


Many thanks
Lucas


 
Ruud de Jong
      03-05-2004
Lucas Branca wrote:
> Could someone explain me the difference between the results below?
>
> ## $cat octals.txt
> ## \006\034abc
>
> import re
>
> a= "\006\034abc"
> preg= re.compile(r'([\0-\377]*)')
> res = preg.search(a)
> print res.groups()
>
> loader = open('./octals.txt', 'r')
> b = loader.readline()


Look at the value of b at this point and you'll see:
>>> b
'\\006\\034abc\n'

In other words, the backslashes are seen as literal backslashes.
readline() does no evaluation of the string, it just copies the
characters.
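
A minimal sketch of the point above, written for Python 3 (the thread itself uses Python 2 syntax). The temporary file stands in for octals.txt and is an assumption of this example, as are the variable names:

```python
import os
import tempfile

# Write the eleven characters \006\034abc plus a newline, as `cat` would show them.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\\006\\034abc\n")
    path = f.name

# readline() copies the characters; it does not evaluate escape sequences.
with open(path) as loader:
    from_file = loader.readline()
os.remove(path)

# In a string literal, the compiler turns each octal escape into one character.
from_source = "\006\034abc"

print(repr(from_source), len(from_source))
print(repr(from_file), len(from_file))
```

The literal holds 5 characters, while the line read from the file holds 12: two backslashes, six digits, "abc" and the newline.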

Regards,

Ruud


 
Peter Otten
      03-05-2004
Lucas Branca wrote:

> Could someone explain me the difference between the results below?
>
> ## $cat octals.txt
> ## \006\034abc
>
> import re
>
> a= "\006\034abc"
> preg= re.compile(r'([\0-\377]*)')
> res = preg.search(a)
> print res.groups()
>
> loader = open('./octals.txt', 'r')
> b = loader.readline()
> preg= re.compile(r'([\0-\377]*)')
> res = preg.search(b)
> print res.groups()
>
>
> RESULTS
>
> ('\x06\x1cabc',)
>
> ('\\006\\034abc\n',)


a and b are two entirely different strings. Whatever similarity there
appears to be is an artifact of Python's treatment of escape sequences,
which applies only to string literals in source code, not to the contents
of an arbitrary file.

Your literal string:

>>> s = "\006\034\n"
>>> s
'\x06\x1c\n'

What you read from the text file:

>>> t = "\\006\\034\n"
>>> t
'\\006\\034\n'

Maybe it helps to learn what's really inside these two strings, so let's
have a look at the ASCII codes:

>>> map(ord, s)
[6, 28, 10]
>>> map(ord, t)
[92, 48, 48, 54, 92, 48, 51, 52, 10]

Another example: in source code you can write the newline as

>>> a = """
... """
>>> b = "\n"
>>> c = "\x0a"
>>> d = "\012"
>>> a,b,c,d
('\n', '\n', '\n', '\n')

But if read from a file, \n, \x0a and \012 would just be sequences of two or
four characters.
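
To make the "two or four characters" claim concrete, here is a small Python 3 sketch. Raw string literals are used as a stand-in for text read from a file, since they also leave the backslash untouched; the variable names are only illustrative:

```python
# Escapes evaluated by the compiler: each literal is a single newline character.
literals = ["\n", "\x0a", "\012"]

# Raw strings keep the backslash, just as file contents would.
as_in_file = [r"\n", r"\x0a", r"\012"]

for lit, raw in zip(literals, as_in_file):
    print(repr(lit), len(lit), "|", repr(raw), len(raw), list(map(ord, raw)))

lengths = [len(raw) for raw in as_in_file]
print(lengths)
```

Note that in Python 3 map() returns an iterator, so it is wrapped in list() here to display the character codes.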

Only when you have understood the above should you return to regular
expressions. Your regexp always matches the whole string, i.e. it is
redundant (and probably not what you want, but that you would need to
explain in another post).

[\0-\377] is just a fancy way of writing "match any character".
* means "repeat the preceding item as often as you want" (including zero times).
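
A short Python 3 sketch of why the pattern is redundant, using the string that readline() returned in the thread. One caveat that is an addition here, not something stated in the thread: with Python 3 text strings the class [\0-\377] covers only code points 0 to 255, so "any character" holds for 8-bit data such as this example.

```python
import re

text = "\\006\\034abc\n"  # what readline() returned in the thread

any_char = re.compile(r"([\0-\377]*)")
m = any_char.search(text)
matches_everything = (m.group(1) == text)

# With re.DOTALL, "." also matches the newline, so "(.*)" captures the same text.
dotall = re.compile(r"(.*)", re.DOTALL)
same_as_dotall = (dotall.search(text).group(1) == m.group(1))

# "*" accepts zero repetitions, so even an empty string produces a match.
empty_match = any_char.search("").group(1)

print(matches_everything, same_as_dotall, repr(empty_match))
```

Since the group always captures the entire input, the search tells you nothing that the string itself does not.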

Peter

 
Lucas Branca
      03-05-2004
-- snip --
>> ('\x06\x1cabc',) string from source code
>> ('\\006\\034abc\n',) same string read from file
-- snip --
> In other words, the backslashes are seen as literal backslashes.
> readline() does no evaluation of the string, it just copies the
> characters


Yeah, you are right, guys. I mixed up two problems;
the regexps are innocent.

OK, let me put it this way:
I have to read each line of a file and strip a particular string from it
(a string that also contains octal notation).

The actual problem is that file.readline() doesn't return
what I expected.

Pardon the newbie question, but is there a way to read a line xy from that file
and obtain

line xy: \006\034abc

('\x06\x1cabc',)

and not every single character in it, as I get now?
('\\006\\034abc\n',)

(Before I start reinventing the wheel...)

Thank you
Lucas


 
Jeff Epler
      03-05-2004
If you have a string and want to perform backslash-substitution on it,
use Python 2.3's "string_escape" codec.

Two examples:

>>> s = "\\n"
>>> s
'\\n'
>>> s.decode("string_escape")
'\n'

>>> "\x30"
'0'
>>> "\\x30"
'\\x30'
>>> "\\x30".decode("string_escape")
'0'

You can remove the trailing newline this way:
if s.endswith("\n"): s = s[:-1]
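
For readers on Python 3, where str has no decode() method and the "string_escape" codec is not available, a rough equivalent can be sketched with the standard "unicode_escape" codec. This is an adaptation, not what the thread used, and it assumes the line is plain ASCII; non-ASCII text would need more care:

```python
line = "\\006\\034abc\n"   # as returned by readline() in the thread

# Drop the trailing newline first, as suggested above.
line = line.rstrip("\n")

# "unicode_escape" decodes bytes, so the text is encoded to bytes first.
# Assumption: the line contains only ASCII characters.
decoded = line.encode("ascii").decode("unicode_escape")

print(repr(decoded))
```

The result is the same five-character string as the literal "\006\034abc" written in source code.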

Jeff

 
Lucas Branca
      03-05-2004
Great!
It's just what I was looking for.
(...and I read it in "what's new" this morning ......
.... "boing boing" with my head now ... )

Thank you very much


 