regex/lambda black magic

 
 
Andrew Robert
 
      05-25-2006
Hi everyone,

I have two test scripts, an encoder and a decoder.

The encoder, listed below, works perfectly.


import re,sys
output = open(r'e:\pycode\out_test.txt','wb')
for line in open(r'e:\pycode\sigh.txt','rb') :
    output.write( re.sub(r'([^\w\s])', lambda s: '%%%2X' %
        ord(s.group()), line))


The decoder, well, I have hopes.


import re,sys
output = open(r'e:\pycode\new_test.txt','wb')
for line in open(r'e:\pycode\out_test.txt','rb') :
    output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))
        % ord(s.group()), line))


The decoder generates the following traceback:

Traceback (most recent call last):
  File "E:\pycode\sample_decode_file_specials_from_hex.py", line 9, in ?
    output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))
    % ord(s.group()), line))
  File "C:\Python24\lib\sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "E:\pycode\sample_decode_file_specials_from_hex.py", line 9, in <lambda>
    output.write( re.sub(r'([^\w\s])', lambda s: chr(int(s.group(), 16))
    % ord(s.group()), line))
ValueError: invalid literal for int(): %

Does anyone see what I am doing wrong?


 
 
 
 
 
Max Erickson
 
      05-25-2006
Andrew Robert wrote:

> ValueError: invalid literal for int(): %
>
> Does anyone see what I am doing wrong?
>


Try getting rid of the lambda; it might make things clearer, and it
simplifies debugging. Something like this (just a sketch):

def callback(match):
    print match.group()
    return chr(int(match.group(),16)) % ord(match.group())

output.write(re.sub(r'([^\w\s])', callback, line))

It looks like your match.group is a '%' character:

>>> int('%', 16)

Traceback (most recent call last):
  File "<pyshell#108>", line 1, in ?
    int('%', 16)
ValueError: invalid literal for int(): %
>>>



max
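
To make the diagnosis concrete, here is a minimal sketch (not from the original posts, and written for Python 3 rather than the Python 2.4 used in the thread). It shows that the pattern `[^\w\s]` only ever matches the `%` sign in encoded text, because hex digits are word characters, and then shows one possible decoder that captures the two hex digits instead. The name `decode` is made up for illustration, and the sketch assumes the encoder always writes two uppercase hex digits.

```python
import re

encoded = "say %22hi%22"

# The original pattern sees only the '%' signs, never the hex digits,
# so int('%', 16) is what ends up being called.
only_percent = re.findall(r"[^\w\s]", encoded)
print(only_percent)


def decode(text):
    """Replace each %XX escape (two uppercase hex digits) with its character."""
    return re.sub(r"%([0-9A-F]{2})", lambda m: chr(int(m.group(1), 16)), text)


print(decode(encoded))
```

Capturing the digits in a group means the callback never has to deal with the `%` prefix at all.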

 
 
 
 
 
Andrew Robert
 
      05-25-2006
Max Erickson wrote:
<snip>

</snip>

> Try getting rid of the lambda; it might make things clearer, and it
> simplifies debugging. Something like this (just a sketch):
>
>
> max
>

Yeah.. trying to keep everything on one line is becoming something of a
problem.

To make this easier, I followed something from another poster and came
up with this.

import re,base64

# Evaluate captured character as hex
def ret_hex(value):
    return base64.b16encode(value)

def ret_ascii(value):
    return base64.b16decode(value)

# Evaluate the value of whatever was matched
def eval_match(match):
    return ret_ascii(match.group(0))

# Evaluate the value of whatever was matched
# def eval_match(match):
#     return ret_hex(match.group(0))

out=open(r'e:\pycode\sigh.new2','wb')

# Read each line, pass any matches on line to function for
# line in file.readlines():
for line in open(r'e:\pycode\sigh.new','rb'):
    print (re.sub('[^\w\s]',eval_match, line))



The char to hex pass works but omits the leading % at the start of each
hex value.

i.e. 22 instead of %22


The hex to char pass does not appear to work at all.

No error is generated. It just appears to be ignored.
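
Both symptoms can be reproduced in a few lines. This is only a sketch in Python 3 (the thread itself uses Python 2.4), and the sample input `already_encoded` is invented for illustration. The likely explanation, which is my assumption and not something confirmed in the thread, is that the input to the second pass contains only hex digits, letters and whitespace, so the pattern has nothing to match and the callback is never called.

```python
import base64
import re

# Same character class as in the post, written as a bytes pattern.
PATTERN = rb"[^\w\s]"

# b16encode returns bare hex digits; it does not add a '%' prefix.
hex_quote = base64.b16encode(b'"')
print(hex_quote)

# Hex digits are word characters, so the pattern finds nothing to replace
# in text that has already been through the encoding pass.
already_encoded = b"say 22hi22\r\n"
matches = re.findall(PATTERN, already_encoded)
print(matches)
```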
 
 
Max Erickson
 
      05-25-2006
Andrew Robert wrote:
> import re,base64
>
> # Evaluate captured character as hex
> def ret_hex(value):
>     return base64.b16encode(value)
>
> def ret_ascii(value):
>     return base64.b16decode(value)
>
>


Note that you can just do this:

from base64 import b16encode,b16decode

and use them directly, or

ret_hex=base64.b16encode

ret_ascii=base64.b16decode

if you want different names.


As far as the rest of your problem goes, I only see one pass being
made. Is the code you posted the code you are running?

Also, is there some reason that base64.b16encode should be returning a
string that starts with a '%'?

All I would expect is:

base64.b16decode(base64.b16encode(input))==input

other than that I have no idea about the expected behavior.
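
A quick check of that round-trip property, sketched in Python 3 (where these functions take and return bytes, unlike the Python 2.4 strings in the thread). The sample `data` is made up. One detail worth knowing is that `b16decode` rejects lowercase hex unless `casefold=True` is passed.

```python
import base64

data = b'say "hi" <now>'
encoded = base64.b16encode(data)

# Decoding the encoded bytes gives back the input unchanged.
round_trip = base64.b16decode(encoded)
print(round_trip == data)

# Lowercase hex digits are rejected by default ...
lowered = encoded.lower()
try:
    base64.b16decode(lowered)
    rejected = False
except ValueError:  # binascii.Error is a subclass of ValueError
    rejected = True
print(rejected)

# ... but accepted when casefold=True is given.
folded = base64.b16decode(lowered, casefold=True)
print(folded == data)
```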

max

 
 
Andrew Robert
 
      05-25-2006

Hi Everyone,


Thanks for all of your patience on this.

I finally got it to work.


Here is the completed test code showing what is going on.

Not cleaned up yet but it works for proof-of-concept purposes.



#!/usr/bin/python

import re,base64

# Evaluate captured character as hex
def ret_hex(value):
    return '%'+base64.b16encode(value)

# Evaluate the value of whatever was matched
def enc_hex_match(match):
    return ret_hex(match.group(0))

def ret_ascii(value):
    return base64.b16decode(value)

# Evaluate the value of whatever was matched
def enc_ascii_match(match):

    arg=match.group()

    #remove the artificially inserted % sign
    arg=arg[1:]

    # decode the result
    return ret_ascii(arg)

def file_encoder():
    # Read each line, pass any matches on line to function for
    # line in file.readlines():
    output=open(r'e:\pycode\sigh.new','wb')
    for line in open(r'e:\pycode\sigh.txt','rb'):
        output.write( (re.sub('[^\w\s]',enc_hex_match, line)) )
    output.close()


def file_decoder():
    # Read each line, pass any matches on line to function for
    # line in file.readlines():

    output=open(r'e:\pycode\sigh.new2','wb')
    for line in open(r'e:\pycode\sigh.new','rb'):
        output.write(re.sub('%[0-9A-F][0-9A-F]',enc_ascii_match, line))
    output.close()




file_encoder()

file_decoder()
 
 
John Machin
 
      05-25-2006
On 26/05/2006 4:33 AM, Andrew Robert wrote:
> Hi Everyone,
>
>
> Thanks for all of your patience on this.
>
> I finally got it to work.
>
>
> Here is the completed test code showing what is going on.


Consider doing what you should have done at the start: state what you
are trying to achieve. Not very many people have the patience that Max
showed in ploughing through code that was both fugly and broken in order
to determine what it should have been doing.

What is the motivation for encoding characters like
,./<>;':"`~!@#$^&*()-+=[]\{}|

>
> Not cleaned up yet but it works for proof-of-concept purposes.
>
>
>
> #!/usr/bin/python
>
> import re,base64
>
> # Evaluate captured character as hex
> def ret_hex(value):
>     return '%'+base64.b16encode(value)


This is IMHO rather pointless and obfuscatory, calling a function in a
module when it can be done by a standard language feature. Why did you
change it from the original "%%%2X" % value (which would have been
better IMHO done as "%%%02X" % value)?
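
The difference between the two format strings only shows up for character codes below 0x10. Here is a small sketch in Python 3; `encode_char` is a made-up helper, and it applies `ord()` to the character first, as the original encoder did. It uses the bell character `\x07`, which the encoder's pattern does match.

```python
def encode_char(ch, fmt):
    """Format the character's code point with the given %-style format."""
    return fmt % ord(ch)


# %2X pads to width 2 with a space; %02X pads with a zero.
space_padded = encode_char("\x07", "%%%2X")
zero_padded = encode_char("\x07", "%%%02X")
print(repr(space_padded))
print(repr(zero_padded))

# For codes of 0x10 and above the two formats agree.
print(encode_char('"', "%%%2X") == encode_char('"', "%%%02X"))
```

The space-padded form would not be matched by a decoder pattern such as `%[0-9A-F][0-9A-F]`, so such characters would never be decoded.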

>
> # Evaluate the value of whatever was matched
> def enc_hex_match(match):
>     return ret_hex(match.group(0))


Why a second level of function call?

>
> def ret_ascii(value):
>     return base64.b16decode(value)


See above.


>
> # Evaluate the value of whatever was matched
> def enc_ascii_match(match):
>
>     arg=match.group()
>
>     #remove the artificially inserted % sign


Don't bother, just ignore it:
    return chr(int(match.group()[1:], 16))

>     arg=arg[1:]
>
>     # decode the result
>     return ret_ascii(arg)
>
> def file_encoder():
>     # Read each line, pass any matches on line to function for
>     # line in file.readlines():
>     output=open(r'e:\pycode\sigh.new','wb')
>     for line in open(r'e:\pycode\sigh.txt','rb'):
>         output.write( (re.sub('[^\w\s]',enc_hex_match, line)) )
>     output.close()


Why are you opening the file with "rb" but then reading it a line at a time?
For a binary file, the whole file may be one "line"; it would be safer
to read() blocks of, say, 8 KB.
For a text file, the only point of binary mode might be to avoid any
sort of problem caused by OS-dependent definitions of "newline", i.e.
CRLF vs LF. I note that as \r and \n are whitespace, you are not
encoding them as %0D and %0A; is this deliberate?
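
Reading in fixed-size blocks could look roughly like the sketch below. It is written for Python 3, uses in-memory `io.BytesIO` streams in place of the files from the thread, and `encode_stream` and `CHUNK_SIZE` are names invented here. Whitespace, including `\r` and `\n`, is still passed through unencoded, as in the original.

```python
import io
import re

CHUNK_SIZE = 8192  # block size in bytes; an arbitrary choice


def encode_stream(src, dst, chunk_size=CHUNK_SIZE):
    """Percent-encode every byte that is neither a word character nor whitespace."""
    # iter() with a sentinel keeps calling src.read() until it returns b"".
    for chunk in iter(lambda: src.read(chunk_size), b""):
        dst.write(re.sub(rb"[^\w\s]", lambda m: b"%%%02X" % ord(m.group()), chunk))


src = io.BytesIO(b'say "hi"\r\n')
dst = io.BytesIO()
encode_stream(src, dst, chunk_size=4)  # tiny blocks, just to exercise the loop
print(dst.getvalue())
```

Chunking is safe for the encoder because each byte is handled on its own. A chunked decoder needs more care, since a block boundary can fall in the middle of a `%XX` escape.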

>
> def file_decoder():
>     # Read each line, pass any matches on line to function for
>     # line in file.readlines():
>
>     output=open(r'e:\pycode\sigh.new2','wb')
>     for line in open(r'e:\pycode\sigh.new','rb'):
>         output.write(re.sub('%[0-9A-F][0-9A-F]',enc_ascii_match, line))
>     output.close()
>
>
>
>
> file_encoder()
>
> file_decoder()

 
 
 
 