# reg exp and octal notation

Lucas Branca
Guest
Posts: n/a

 03-05-2004
Could someone explain to me the difference between the results below?

## $ cat octals.txt
## \006\034abc

import re

a= "\006\034abc"
preg= re.compile(r'([\0-\377]*)')
res = preg.search(a)
print res.groups()

b = open("octals.txt").readline()  # the same text, read from the file
preg= re.compile(r'([\0-\377]*)')
res = preg.search(b)
print res.groups()

RESULTS

('\x06\x1cabc',)

('\\006\\034abc\n',)

Many thanks
Lucas

Ruud de Jong

 03-05-2004
Lucas Branca schreef:
> Could someone explain me the difference between the results below?
>
> ## \$cat octals.txt
> ## \006\034abc
>
> import re
>
> a= "\006\034abc"
> preg= re.compile(r'([\0-\377]*)')
> res = preg.search(a)
> print res.groups()
>

Look at the value of b at this point, and you'll see:
>>> b
'\\006\\034abc\n'

In other words, the backslashes are seen as literal backslashes.
readline() does not evaluate the string; it just copies the
characters.
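
To see this for yourself, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). It assumes a temporary file standing in for octals.txt; the names `from_file` and `from_source` are made up for the illustration.

```python
import os
import tempfile

# Hypothetical stand-in for octals.txt: the file contains a backslash
# followed by digits, not the control characters themselves.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "octals.txt")
    with open(path, "w") as f:
        f.write("\\006\\034abc\n")
    with open(path) as f:
        from_file = f.readline()

# In source code the escapes are translated when the literal is compiled.
from_source = "\006\034abc"

print(repr(from_source))
print(repr(from_file))
print(len(from_source), len(from_file))  # 5 12
```

The literal is 5 characters long, while the line from the file is 12: each backslash and digit is a character of its own, plus the trailing newline.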

Regards,

Ruud

> preg= re.compile(r'([\0-\377]*)')
> res = preg.search(b)
> print res.groups()
>
>
> RESULTS
>
> ('\x06\x1cabc',)
>
> ('\\006\\034abc\n',)
>
>
> Many thanks
> Lucas
>
>

Peter Otten

 03-05-2004
Lucas Branca wrote:

> Could someone explain me the difference between the results below?
>
> ## \$cat octals.txt
> ## \006\034abc
>
> import re
>
> a= "\006\034abc"
> preg= re.compile(r'([\0-\377]*)')
> res = preg.search(a)
> print res.groups()
>
> preg= re.compile(r'([\0-\377]*)')
> res = preg.search(b)
> print res.groups()
>
>
> RESULTS
>
> ('\x06\x1cabc',)
>
> ('\\006\\034abc\n',)

a and b are two entirely different strings. Whatever similarity there
appears to be is an artifact of Python's treatment of escape sequences,
which applies only to string literals in source code, not to the
contents of an arbitrary file.

>>> s = "\006\034\n"
>>> s
'\x06\x1c\n'

What you read from the text file:

>>> t = "\\006\\034\n"
>>> t
'\\006\\034\n'

Maybe it helps to learn what's really inside these two strings, so let's
have a look at the ASCII codes:

>>> map(ord, s)
[6, 28, 10]
>>> map(ord, t)
[92, 48, 48, 54, 92, 48, 51, 52, 10]
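
For anyone trying this on Python 3, where map() returns a lazy iterator instead of a list, a list comprehension gives the same view. This is only a sketch of the comparison above; `codes_s` and `codes_t` are names invented here.

```python
s = "\006\034\n"       # escapes interpreted in source code
t = "\\006\\034\n"     # the characters as they sit in the file

codes_s = [ord(ch) for ch in s]
codes_t = [ord(ch) for ch in t]

print(codes_s)  # [6, 28, 10]
print(codes_t)  # [92, 48, 48, 54, 92, 48, 51, 52, 10]
```

Code 92 is the backslash and 48 to 57 are the digits, so t really does contain the text of the escape sequences rather than the characters they stand for.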

Another example: in source code you can write the newline as

>>> a = """
... """
>>> b = "\n"
>>> c = "\x0a"
>>> d = "\012"
>>> a,b,c,d
('\n', '\n', '\n', '\n')

But if read from a file, \n, \x0a and \012 would just be sequences of two or
four characters.
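
A small sketch of that difference, using raw string literals as a stand-in for text that came from a file (an assumption made to keep the example self-contained; no file is read here):

```python
# Three spellings of the same one-character string in source code.
in_source = ["\n", "\x0a", "\012"]

# Raw literals keep the backslash, which is what a file would hand you.
as_file_text = [r"\n", r"\x0a", r"\012"]

all_newlines = all(x == "\n" for x in in_source)
lengths = [len(x) for x in as_file_text]

print(all_newlines)  # True
print(lengths)       # [2, 4, 4]
```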

Only when you have understood the above should you return to regular
expressions. Your regexp always matches the whole string, i.e. it is
redundant (and probably not what you want, but you would need to
explain that in another post).

[\0-\377] is just a fancy way of writing "match any character"
* means "repeat the preceding as often as you want" (including zero times)
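
As a rough check of the two points above, here is a sketch in Python 3 syntax. It assumes every character in the input has a code of at most 255, which holds for both strings in this thread; under that assumption the pattern behaves like `(.*)` with re.DOTALL. The variable names are made up for the example.

```python
import re

pattern = re.compile(r'([\0-\377]*)')     # the regexp from the original post
any_char = re.compile(r'(.*)', re.DOTALL)  # "any character", newline included

a = "\006\034abc"        # literal from source code
b = "\\006\\034abc\n"    # the text as read from the file

# Both inputs are matched in full, so the regexp tells you nothing
# about the difference between them.
whole_a = pattern.search(a).group(1) == a
whole_b = pattern.search(b).group(1) == b
same_as_dotall = any_char.search(b).group(1) == pattern.search(b).group(1)

print(whole_a, whole_b, same_as_dotall)  # True True True
```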

Peter

Lucas Branca

 03-05-2004
-- snip --
>> ('\x06\x1cabc',) string from source code

>> ('\\006\\034abc\n',) same string read from file

--snip --
> In other words, the backslashes are seen as literal backslashes.
> readline() does no evaluation of the string, it just copies the
> characters

Yeah, you are right, guys. I mixed up two problems;
the regexps are innocent.

OK, let me put it this way:
I have to read each line of a file and strip a particular string from it
(a string that contains octal notation too).

The problem is actually file.readline(), which doesn't return
what I expected.

Pardon my newbie question, but is there a way to read a line xy from that file
and obtain:

line xy: \006\034abc

('\x06\x1cabc',)

and not every single character in it, as I get now?
('\\006\\034abc\n',)

(before I start to reinvent the wheel ....... )

Thank you
Lucas

Jeff Epler

 03-05-2004
If you have a string and want to perform backslash-substitution on it,
use Python 2.3's "string_escape" codec.

Two examples:

>>> s = "\\n"
>>> s
'\\n'
>>> s.decode("string_escape")
'\n'

>>> "\x30"
'0'
>>> "\\x30"
'\\x30'
>>> "\\x30".decode("string_escape")
'0'

You can remove the trailing newline this way:
if s.endswith("\n"): s = s[:-1]
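
The "string_escape" codec exists only in Python 2. On Python 3 a similar effect can be sketched with the "unicode_escape" codec, which decodes bytes, so the text is encoded first. The helper name `unescape` is invented here, and the sketch assumes the line contains only Latin-1 characters.

```python
def unescape(text):
    """Interpret backslash escapes in text, roughly as a string literal would.

    Assumes text holds only Latin-1 characters (hypothetical helper).
    """
    return text.encode("latin-1").decode("unicode_escape")


line = "\\006\\034abc\n"      # what readline() returned in the thread
stripped = line.rstrip("\n")  # drop the trailing newline before decoding
decoded = unescape(stripped)

print(repr(decoded))
print([ord(ch) for ch in decoded])  # [6, 28, 97, 98, 99]
```

Note that rstrip("\n") removes every trailing newline, while the endswith() test above removes exactly one; for a line returned by readline() the result is the same.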

Jeff

Lucas Branca

 03-05-2004
Great!
It's just what I was looking for.
(...and I read it in "what's new" this morning ......
.... "boing boing" with my head now ... )

Thank you very much

"Jeff Epler" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> If you have a string and want to perform backslash-substitution on it,
> use python2.3's "string_escape" codec.
>
> Two examples:
>
> >>> s = "\\n"
> >>> s
> '\\n'
> >>> s.decode("string_escape")
> '\n'
>
> >>> "\x30"
> '0'
> >>> "\\x30"
> '\\x30'
> >>> "\\x30".decode("string_escape")
> '0'
>
> You can remove the trailing newline this way:
> if s.endswith("\n"): s = s[:-1]
>
> Jeff
>