Velocity Reviews - Computer Hardware Reviews

Re: Working with bytes.

Adam T. Gautier
      04-03-2004
I came up with a solution using the binascii module's hexlify method.
Thanks

Adam T. Gautier wrote:

> I have been unable to solve a problem. I am working with MD5
> signatures, trying to put them in a database. The MD5 signatures are
> not generated using the Python md5 module but by an external application
> that produces valid 16-byte signatures. Anyway, these
> 16-byte signatures are not necessarily valid strings. How do I
> manipulate the bytes? I need to concatenate the bytes with a SQL
> statement, which is a string. This works fine for most of the MD5
> signatures, but some blow up with a TypeError because there is a NULL
> byte or something else. So I guess my ultimate question is: how do I
> get a prepared SQL statement to accept a series of bytes? How do I
> convert the bytes to a valid string like:
>
> 'x%L9d\340\316\262\363\037\311\345<\262\357\215'
>
> that can be concatenated?
>
> Thanks
>
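A minimal Python 3 sketch of the hexlify approach, combined with a bound SQL parameter instead of string concatenation. The sqlite3 in-memory database and the table and column names are assumptions for illustration only (the original poster's database is not named); the digest is the 16-byte signature quoted in the question.

```python
import sqlite3
from binascii import hexlify, unhexlify

# The 16-byte signature quoted in the question, as a Python 3 bytes literal.
digest = b'x%L9d\340\316\262\363\037\311\345<\262\357\215'

# hexlify turns any bytes (NUL included) into plain ASCII hex digits.
hex_text = hexlify(digest).decode('ascii')   # 32 characters

# Hypothetical table; any DB-API driver with placeholders works the same way.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sigs (hex TEXT, raw BLOB)')

# Placeholders let the driver do the quoting, so no SQL string concatenation
# is needed; the raw bytes can even be stored unchanged in a BLOB column.
conn.execute('INSERT INTO sigs (hex, raw) VALUES (?, ?)', (hex_text, digest))

stored_hex, stored_raw = conn.execute('SELECT hex, raw FROM sigs').fetchone()
assert unhexlify(stored_hex) == digest
assert stored_raw == digest
```

The hex column doubles the size (32 characters for 16 bytes) but is readable anywhere; the BLOB column avoids any encoding at all when the driver supports binary parameters.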

Anton Vredegoor
      04-03-2004
"Adam T. Gautier" <(E-Mail Removed)> wrote:

>I came up with a solution using the binascii module's hexlify method.


That is the most obvious method, I think. However, the code below
stores 7 bits per byte and still remains ascii-compliant (the
binascii.hexlify method stores 4 bits per byte).
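The difference between 4 and 7 bits per output character is easy to quantify. This Python 3 sketch (the helper name `packed_length` is hypothetical) counts how many characters a 16-byte MD5 signature needs at a given number of bits per character:

```python
from binascii import hexlify

def packed_length(nbytes: int, bits_per_char: int) -> int:
    """Characters needed for nbytes of data if each character carries bits_per_char bits."""
    return -(-8 * nbytes // bits_per_char)   # ceiling division

signature = bytes(16)   # stands in for any 16-byte MD5 signature
assert len(hexlify(signature)) == packed_length(16, 4)

for bits in (4, 6, 7):
    print(f'{bits} bits per character: {packed_length(16, bits)} characters')
```

So hexlify needs 32 characters, while a 7-bit packing needs 19; whether those 7-bit characters are still ASCII is a separate question.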

Anton

from itertools import islice

def _bits(i):
    return [('01'[i>>j & 1]) for j in range(8)][::-1]

_table = dict([(chr(i),_bits(i)) for i in range(256)])

def _bitstream(bytes):
    for byte in bytes:
        for bit in _table[byte]:
            yield bit

def _dropfirst(gen):
    while 1:
        gen.next()
        for x in islice(gen,7):
            yield x

def sevens(bytes):
    """ stream normal bytes to bytes where bit 8 is always 1"""
    gen = _bitstream(bytes)
    while 1:
        R = list(islice(gen,7))
        if not R: break
        s = '1'+ "".join(R) + '0' * (7-len(R))
        yield chr(int(s,2))

def eights(bytes,n):
    """ the reverse of the sevens function """
    gen = _bitstream(bytes)
    df = _dropfirst(gen)
    for i in xrange(n):
        s = ''.join(islice(df,8))
        yield chr(int(s,2))

def test():
    from random import randint
    size = 40
    R = [chr(randint(0,255)) for i in xrange(size)]
    bytes = ''.join(R)
    sv = ''.join(sevens(bytes))
    check = ''.join(eights(sv,size))
    assert check == bytes
    print sv

if __name__ == '__main__':
    test()

sample output:

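[binary output: every byte has the high bit set, so it is not ASCII and displays differently depending on the reader's character set]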

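For readers on Python 3, here is a sketch of the same 7-bit packing done with integer shifts instead of '0'/'1' strings. The function names mirror the post, but this is a rewrite, not the poster's code. The last assertion makes the property explicit: every output byte is 0x80 or above.

```python
import os

def sevens(data: bytes) -> bytes:
    """Pack data 7 bits per output byte, setting the high bit of every byte."""
    nbits = 8 * len(data)
    pad = -nbits % 7                        # zero bits appended to fill the last group
    n = int.from_bytes(data, 'big') << pad
    groups = (nbits + pad) // 7
    return bytes(0x80 | ((n >> (7 * k)) & 0x7F) for k in range(groups - 1, -1, -1))

def eights(data: bytes, n: int) -> bytes:
    """Reverse of sevens(): drop each high bit, then the padding bits."""
    value = 0
    for b in data:
        value = (value << 7) | (b & 0x7F)
    value >>= 7 * len(data) - 8 * n         # discard the zero padding
    return value.to_bytes(n, 'big')

sample = os.urandom(40)
packed = sevens(sample)
assert eights(packed, len(sample)) == sample
assert all(b >= 0x80 for b in packed)       # every byte is outside 7-bit ASCII
```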
Piet van Oostrum
      04-04-2004
>>>>> (E-Mail Removed) (Anton Vredegoor) (AV) wrote:

AV> "Adam T. Gautier" <(E-Mail Removed)> wrote:
>> I came up with a solution using the binascii module's hexlify method.


AV> That is the most obvious method, I think. However, the code below
AV> stores 7 bits per byte and still remains ascii-compliant (the
AV> binascii.hexlify method stores 4 bits per byte).
.....
AV> sample output:

AV> Ÿæ‰®ÑëÍ¡¾÷ÁóÆú½Þ·ú˂ז𠿋ˆ²ªÅ*ž›¾Ÿ£•Í£¬ô²ŸÕØ

Which includes quite a few NON-ASCII characters.
So what is ASCII-compliant about it?
You can't store 7 bits per byte and still be ASCII-compliant. At least if
you don't want to include control characters.
--
Piet van Oostrum <(E-Mail Removed)>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: (E-Mail Removed)
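The arithmetic behind this objection, as a short Python 3 sketch: ASCII has only 94 printable characters besides the space ('!' through '~'), which is enough for about 6.55 bits per character. Reaching a full 7 bits requires all 128 codes, including the 33 control codes (0x00 to 0x1F and 0x7F).

```python
import math
import string

# Printable ASCII without the space character: '!' (0x21) through '~' (0x7E).
graphic = ''.join(chr(c) for c in range(0x21, 0x7F))
assert len(graphic) == 94
# The same 94 characters, assembled from the string module's constants.
assert set(graphic) == set(string.digits + string.ascii_letters + string.punctuation)

# Bits a single character can carry for a given alphabet size.
for name, size in [('hex', 16), ('base64', 64), ('base85', 85),
                   ('94 printable', 94), ('all 7-bit ASCII', 128)]:
    print(f'{name:>16}: {math.log2(size):.2f} bits per character')
```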
 
Anton Vredegoor
      04-05-2004
Piet van Oostrum <(E-Mail Removed)> wrote:

>AV>


[snip]

>Which includes quite a few NON-ASCII characters.
>So what is ASCII-compliant about it?
>You can't store 7 bits per byte and still be ASCII-compliant. At least if
>you don't want to include control characters.


Thanks, and yes you are right. I thought that getting rid of control
codes just meant switching to the high bit codes, but of course
control codes are part of the lower bit population and can't be
removed that way. Worse than that: high bit codes are not
ASCII-compliant at all!

However, the code below has the 8th and 7th bits always set to 0 and 1
respectively, so it should produce ASCII-compliant output using 6 bits
per byte.

I wonder whether it would be possible to use more than six bits per
byte but less than seven? There seem to be some character codes left
and these could be used too?

Anton

from itertools import islice

def _bits(i):
    return [('01'[i>>j & 1]) for j in range(8)][::-1]

_table = dict([(chr(i),_bits(i)) for i in range(256)])

def _bitstream(bytes):
    for byte in bytes:
        for bit in _table[byte]:
            yield bit

def _drop_first_two(gen):
    while 1:
        gen.next()
        gen.next()
        for x in islice(gen,6):
            yield x

def sixes(bytes):
    """ stream normal bytes to bytes where bits 8,7 are 0,1 """
    gen = _bitstream(bytes)
    while 1:
        R = list(islice(gen,6))
        if not R: break
        s = '01'+ "".join(R) + '0' * (6-len(R))
        yield chr(int(s,2))

def eights(bytes,n):
    """ the reverse of the sixes function """
    gen = _bitstream(bytes)
    df = _drop_first_two(gen)
    for i in xrange(n):
        s = ''.join(islice(df,8))
        yield chr(int(s,2))

def test():
    from random import randint
    size = 20
    R = [chr(randint(0,255)) for i in xrange(size)]
    bytes = ''.join(R)
    sx = ''.join(sixes(bytes))
    check = ''.join(eights(sx,size))
    assert check == bytes
    print sx

if __name__ == '__main__':
    test()

output:

VMtdh[LII~Qexdyg}xFRhXRIVx
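A Python 3 sketch of the same 6-bit packing (a rewrite with integer shifts, not the poster's code) makes the output range explicit: with bits 8 and 7 fixed at 0 and 1, every output byte lies between 0x40 and 0x7F. Note that the top of that range, produced by an all-ones 6-bit group, is 0x7F (DEL).

```python
def sixes(data: bytes) -> bytes:
    """Pack data 6 bits per output byte with bits 8,7 fixed at 0,1 (codes 0x40..0x7F)."""
    nbits = 8 * len(data)
    pad = -nbits % 6                        # zero bits appended to fill the last group
    n = int.from_bytes(data, 'big') << pad
    groups = (nbits + pad) // 6
    return bytes(0x40 | ((n >> (6 * k)) & 0x3F) for k in range(groups - 1, -1, -1))

out = sixes(bytes(range(256)))
assert all(0x40 <= b <= 0x7F for b in out)

# Three 0xFF bytes are four all-ones 6-bit groups, i.e. four DEL characters.
print(sixes(b'\xff\xff\xff'))
```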

 
Jason Harper
      04-05-2004
Anton Vredegoor wrote:
> I wonder whether it would be possible to use more than six bits per
> byte but less than seven? There seem to be some character codes left
> and these could be used too?


Look up Base85 coding (a standard part of PostScript) for an example of
how this can be done - 4 bytes encoded per 5 characters of printable ASCII.
Jason Harper
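A minimal Python 3 sketch of the core of Base85: treat 4 bytes as one 32-bit integer and write it as 5 base-85 digits, most significant first. It uses the Ascii85 (PostScript) alphabet, where digit d is written as chr(33 + d), i.e. '!' through 'u'. It deliberately leaves out the 'z' shortcut for all-zero groups and the handling of a final partial group; `a85_group` is a hypothetical helper name.

```python
import base64

def a85_group(four: bytes) -> bytes:
    """Encode exactly 4 bytes as 5 base-85 digits ('!'..'u'), most significant first."""
    if len(four) != 4:
        raise ValueError('need exactly 4 bytes')
    n = int.from_bytes(four, 'big')         # 0 .. 2**32 - 1
    digits = []
    for _ in range(5):
        n, d = divmod(n, 85)
        digits.append(33 + d)
    return bytes(reversed(digits))

# Five base-85 digits cover every 32-bit value: 85**5 = 4437053125 > 2**32.
assert 85 ** 5 > 2 ** 32

print(a85_group(b'Man '))                   # b'9jqo^'
print(base64.a85encode(b'Man '))            # the standard library agrees
```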
 
Piet van Oostrum
      04-05-2004
>>>>> (E-Mail Removed) (Anton Vredegoor) (AV) wrote:

AV> Piet van Oostrum <(E-Mail Removed)> wrote:

>> Which includes quite a few NON-ASCII characters.
>> So what is ASCII-compliant about it?
>> You can't store 7 bits per byte and still be ASCII-compliant. At least if
>> you don't want to include control characters.


AV> Thanks, and yes you are right. I thought that getting rid of control
AV> codes just meant switching to the high bit codes, but of course
AV> control codes are part of the lower bit population and can't be
AV> removed that way. Worse than that: high bit codes are not
AV> ASCII-compliant at all!

AV> However, the code below has the 8th and 7th bits always set to 0 and 1
AV> respectively, so it should produce ASCII-compliant output using 6 bits
AV> per byte.

Except that the highest code you get is 0177 (octal, i.e. 127), which is
DEL and is also a control code. If you store 6 bits per byte, that is also
what BASE64 does, so why reinvent the wheel?
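For comparison, a Python 3 sketch with the standard base64 module. It also stores 6 bits per character, but maps them onto A-Z, a-z, 0-9, '+' and '/', all printable. The MD5 input string is arbitrary; any 16 bytes would do.

```python
import base64
import hashlib

digest = hashlib.md5(b'9034572345asdf').digest()   # 16 arbitrary bytes
encoded = base64.b64encode(digest)
print(encoded, len(encoded))
assert base64.b64decode(encoded) == digest
```

Sixteen bytes are five full 3-byte groups plus one leftover byte, so the result is 24 characters, the last two being '=' padding.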

AV> I wonder whether it would be possible to use more than six bits per
AV> byte but less than seven? There seem to be some character codes left
AV> and these could be used too?

Yes, you could in principle use 94 characters. There is a scheme called
btoa that encodes 4 bytes into 5 ASCII characters by using BASE85, but I
have never seen a Python implementation of it. It shouldn't be difficult,
however.
--
Piet van Oostrum <(E-Mail Removed)>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: (E-Mail Removed)
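Since Python 3.4 the standard base64 module does provide Base85 codecs: `a85encode` (the Ascii85 alphabet) and `b85encode` (the RFC 1924 character set, as used by git). Both encode 4-byte groups as 5 characters, so a 16-byte MD5 digest becomes 20 characters instead of 24 with base64. A short sketch:

```python
import base64
import hashlib

digest = hashlib.md5(b'9034572345asdf').digest()   # 16 bytes

a85 = base64.a85encode(digest)   # '!'..'u', plus 'z' for an all-zero group
b85 = base64.b85encode(digest)   # RFC 1924 character set, 4 bytes -> 5 chars
print(a85, len(a85))
print(b85, len(b85))
assert base64.a85decode(a85) == digest
assert base64.b85decode(b85) == digest
```

Note that `b85encode` works group by group, so it is not the same as the RFC 1924 scheme of converting the whole 128-bit value as one number, even though it shares that alphabet.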
 
Bernhard Herzog
      04-05-2004
Piet van Oostrum <(E-Mail Removed)> writes:

> Yes, you could in principle use 94 characters. There is a scheme called
> btoa that encodes 4 bytes into 5 ASCII characters by using BASE85, but I
> have never seen a Python implementation of it. It shouldn't be difficult,
> however.


Is that the same as PDF/PostScript Ascii85? If so, there's an
implementation somewhere in reportlab, IIRC.

Bernhard

--
Intevation GmbH http://intevation.de/
Skencil http://sketch.sourceforge.net/
Thuban http://thuban.intevation.org/
 
Piet van Oostrum
      04-05-2004
>>>>> Bernhard Herzog <(E-Mail Removed)> (BH) wrote:

BH> Piet van Oostrum <(E-Mail Removed)> writes:
>> Yes, you could in principle use 94 characters. There is a scheme called
>> btoa that encodes 4 bytes into 5 ASCII characters by using BASE85, but I
>> have never seen a Python implementation of it. It shouldn't be difficult,
>> however.


BH> Is that the same as PDF/PostScript Ascii85? If so, there's an
BH> implementation somewhere in reportlab, IIRC.

They are slightly different AFAIK. Postscript uses '~' and btoa uses 'x'
as terminating character. For the OP's use it doesn't matter of course.
--
Piet van Oostrum <(E-Mail Removed)>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: (E-Mail Removed)
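In Python 3's base64 module the PostScript/PDF framing is the `adobe=True` option of `a85encode`, which wraps the encoded data in '<~' and '~>'. As far as I know the module has no option for btoa-style framing, so this sketch only covers the PostScript variant:

```python
import base64

payload = b'Working with bytes'
plain = base64.a85encode(payload)
framed = base64.a85encode(payload, adobe=True)
print(framed)

# Only the delimiters differ; the encoded body is the same.
assert framed == b'<~' + plain + b'~>'
assert base64.a85decode(framed, adobe=True) == payload
```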
 
Anton Vredegoor
      04-08-2004
Jason Harper <(E-Mail Removed)> wrote:

>Anton Vredegoor wrote:
>> I wonder whether it would be possible to use more than six bits per
>> byte but less than seven? There seem to be some character codes left
>> and these could be used too?

>
>Look up Base85 coding (a standard part of PostScript) for an example of
>how this can be done - 4 bytes encoded per 5 characters of printable ASCII.


Thanks to you and Piet for mentioning this. I found another interesting
application of Base85 encoding: it is used in a scheme to encode IPv6
addresses (which use 128 bits). Since an MD5 digest is 16 bytes
(== 128 bits), the same scheme can be used for it. See

http://www.faqs.org/rfcs/rfc1924.html

for the details.
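Why 20 characters: a 128-bit value needs the smallest number of base-85 digits whose range covers 2**128, and 85**19 falls short while 85**20 does not. A small Python 3 check (`digits_needed` is a hypothetical helper) that also compares hex and a pure base-64 number:

```python
# 19 base-85 digits are too few for 128 bits, 20 suffice.
assert 85 ** 19 < 2 ** 128 <= 85 ** 20

def digits_needed(bits: int, base: int) -> int:
    """Smallest number of base-`base` digits that can represent every `bits`-bit value."""
    count, capacity = 0, 1
    while capacity < 2 ** bits:
        capacity *= base
        count += 1
    return count

for base in (16, 64, 85):
    print(base, digits_needed(128, base))
```

A pure base-64 number would need 22 digits; standard base64 output is 24 characters only because of its 3-byte grouping and '=' padding.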

Anton

from string import digits, ascii_letters

_rfc1924_chars = digits+ascii_letters+'!#$%&()*+-;<=>?@^_`{|}~'
_rfc1924_table = dict([(c,i) for i,c in enumerate(_rfc1924_chars)])
_rfc1924_bases = [85L**i for i in range(20)]

def bytes_to_rfc1924(sixteen):
    res = []
    i = 0L
    for byte in sixteen:
        i <<= 8
        i |= ord(byte)
    for j in range(20):
        i,k = divmod(i,85)
        res.append(_rfc1924_chars[k])
    return "".join(res)

def rfc1924_to_bytes(twenty):
    res = []
    i = 0L
    for b,byte in zip(_rfc1924_bases,twenty):
        i += b*_rfc1924_table[byte]
    for j in range(16):
        k = i & 255
        res.append(chr(k))
        i >>= 8
    res.reverse()
    return "".join(res)

def test():
    import md5

    #md5.digest returns 16 bytes == 128 bits, an ipv6 address
    #also uses 128 bits (I don't know which format so I'm using md5
    #as a dummy placeholder to get 16 bytes of 'random' data)

    bytes = md5.new('9034572345asdf').digest()
    r = bytes_to_rfc1924(bytes)
    print r
    check = rfc1924_to_bytes(r)
    assert bytes == check

if __name__=='__main__':
    test()

output:

k#llNFNo4sYFxKn*J<lB

 
Anton Vredegoor
      04-08-2004
(E-Mail Removed) (Anton Vredegoor) wrote:

> http://www.faqs.org/rfcs/rfc1924.html


Replying to my own post. I was putting lowercase letters before
uppercase and also writing the base-85 digits least significant first,
neither of which complies with the RFC. Because I was using random data,
I didn't notice. The code below should reproduce the RFC 1924 example.

Anton

from binascii import hexlify
from string import digits, ascii_lowercase, ascii_uppercase

_rfc1924_letters = ascii_uppercase + ascii_lowercase
_rfc1924_chars = digits+_rfc1924_letters+'!#$%&()*+-;<=>?@^_`{|}~'
_rfc1924_table = dict([(c,i) for i,c in enumerate(_rfc1924_chars)])
_rfc1924_bases = [85L**i for i in range(19,-1,-1)]

def bytes_to_rfc1924(sixteen):
    res = []
    i = 0L
    for byte in sixteen:
        i <<= 8
        i |= ord(byte)
    for j in range(20):
        i,k = divmod(i,85)
        res.append(_rfc1924_chars[k])
    res.reverse()
    return "".join(res)

def rfc1924_to_bytes(twenty):
    res = []
    i = 0L
    for b,byte in zip(_rfc1924_bases,twenty):
        i += b * _rfc1924_table[byte]
    for j in range(16):
        i,k = divmod(i,256)
        res.append(chr(k))
    res.reverse()
    return "".join(res)

def bytes_as_ipv6(bytes):
    addr = [bytes[i:i+2] for i in range(0,16,2)]
    return ":".join(map(hexlify,addr))

def test():
    s = "4)+k&C#VzJ4br>0wv%Yp"
    bytes = rfc1924_to_bytes(s)
    check = bytes_to_rfc1924(bytes)
    assert s == check
    addr = bytes_as_ipv6(bytes)
    print addr

if __name__=='__main__':
    test()

output:

1080:0000:0000:0000:0008:0800:200c:417a
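A Python 3 sketch of the corrected encoder and decoder, using `int.from_bytes`/`int.to_bytes` in place of the manual shifting and the standard `ipaddress` module to produce and print the 16 address bytes. It follows the same alphabet order (digits, upper case, lower case, then 23 symbols) and most-significant-digit-first order as the code above; the validation checks are my own additions.

```python
import ipaddress
import string

# RFC 1924 alphabet: digits, upper case, lower case, then 23 symbols.
RFC1924_CHARS = (string.digits + string.ascii_uppercase + string.ascii_lowercase
                 + '!#$%&()*+-;<=>?@^_`{|}~')
RFC1924_VALUE = {c: i for i, c in enumerate(RFC1924_CHARS)}

def bytes_to_rfc1924(sixteen: bytes) -> str:
    """Encode 16 bytes as 20 base-85 digits, most significant digit first."""
    if len(sixteen) != 16:
        raise ValueError('expected 16 bytes')
    n = int.from_bytes(sixteen, 'big')
    out = []
    for _ in range(20):
        n, r = divmod(n, 85)
        out.append(RFC1924_CHARS[r])
    return ''.join(reversed(out))

def rfc1924_to_bytes(twenty: str) -> bytes:
    """Decode 20 RFC 1924 characters back to 16 bytes.
    Raises OverflowError for a string whose value does not fit in 128 bits."""
    if len(twenty) != 20:
        raise ValueError('expected 20 characters')
    n = 0
    for c in twenty:
        n = n * 85 + RFC1924_VALUE[c]
    return n.to_bytes(16, 'big')

raw = rfc1924_to_bytes('4)+k&C#VzJ4br>0wv%Yp')
print(ipaddress.IPv6Address(raw).exploded)
```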
