Velocity Reviews - Computer Hardware Reviews


[newbie] String to binary conversion

 
 
Mok-Kong Shen
 
      08-06-2012

If I have a string "abcd" then, with 8-bit encoding of each character,
there is a corresponding 32-bit binary integer. How could I best
obtain that integer and from that integer backwards again obtain the
original string? Thanks in advance.

M. K. Shen
 
Tobiah
 
      08-06-2012
The binascii module looks like it might have
something for you. I've never used it.

Tobiah

http://docs.python.org/library/binascii.html

On 08/06/2012 01:46 PM, Mok-Kong Shen wrote:
>
> If I have a string "abcd" then, with 8-bit encoding of each character,
> there is a corresponding 32-bit binary integer. How could I best
> obtain that integer and from that integer backwards again obtain the
> original string? Thanks in advance.
>
> M. K. Shen


 
Tobiah
 
      08-06-2012
On 08/06/2012 01:59 PM, Tobiah wrote:
> The binascii module looks like it might have
> something for you. I've never used it.


Having actually read some of that doc, I see
it's not what you want at all. Sorry.


 
 
Mok-Kong Shen
 
      08-06-2012
On 06.08.2012 22:59, Tobiah wrote:
> The binascii module looks like it might have
> something for you. I've never used it.


Thanks for the hint, but unless I am mistaken, the binascii module
doesn't seem to work. I typed:

import binascii

and a line that's given as example in the document:

crc = binascii.crc32("hello")

but got the following error message:

TypeError: 'str' does not support the buffer interface.

The same error message appeared when I tried the other functions.

M. K. Shen
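
The error message above is what Python 3 reports when a binascii function is handed a str instead of bytes. A minimal sketch, assuming Python 3 and an ASCII-only input, of how the same call can be made to work, and of how hexlify could serve the original question (the variable names are illustrative only):

```python
import binascii
import zlib

# In Python 3, binascii functions operate on bytes, not on str.
crc = binascii.crc32(b"hello")                   # pass a bytes literal
same = binascii.crc32("hello".encode("ascii"))   # or encode a str first

# hexlify returns the hex digits of the bytes; int() reads them big-endian.
value = int(binascii.hexlify(b"abcd"), 16)

# Going back: format as 8 hex digits (4 bytes) and unhexlify.
text = binascii.unhexlify(("%08x" % value).encode("ascii")).decode("ascii")

print(crc == same, hex(value), text)   # True 0x61626364 abcd
```

The fixed width of 8 hex digits is an assumption that the string is exactly four bytes long.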

 
 
MRAB
 
      08-06-2012
On 06/08/2012 21:46, Mok-Kong Shen wrote:
>
> If I have a string "abcd" then, with 8-bit encoding of each character,
> there is a corresponding 32-bit binary integer. How could I best
> obtain that integer and from that integer backwards again obtain the
> original string? Thanks in advance.
>

Try this (Python 3, in which strings are Unicode):
>>> import struct
>>> # For a little-endian integer
>>> struct.unpack("<I", "abcd".encode("latin-1"))[0]
1684234849
>>> hex(_)
'0x64636261'

or this (Python 2, in which strings are bytestrings):
>>> import struct
>>> # For a little-endian integer
>>> struct.unpack("<I", "abcd")[0]
1684234849
>>> hex(_)
'0x64636261'
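
The examples above only go from string to integer. A sketch of the reverse direction, assuming Python 3 (3.2 or later for int.from_bytes and int.to_bytes) and Latin-1 as the 8-bit encoding; the variable names are made up for illustration:

```python
import struct

data = "abcd".encode("latin-1")

# Little-endian, matching the unpack call above.
n_le = struct.unpack("<I", data)[0]
back_le = struct.pack("<I", n_le).decode("latin-1")

# Big-endian: "a" becomes the most significant byte.
n_be = struct.unpack(">I", data)[0]

# int.from_bytes / int.to_bytes are not limited to 4 bytes.
n_any = int.from_bytes(data, "big")
back_any = n_any.to_bytes(len(data), "big").decode("latin-1")

print(n_le, back_le, hex(n_be), back_any)
# 1684234849 abcd 0x61626364 abcd
```

Whichever byte order is chosen for packing has to be used again for unpacking, otherwise the characters come back reversed.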

 
 
Emile van Sebille
 
      08-06-2012
On 8/6/2012 1:46 PM Mok-Kong Shen said...
>
> If I have a string "abcd" then, with 8-bit encoding of each character,
> there is a corresponding 32-bit binary integer. How could I best
> obtain that integer and from that integer backwards again obtain the
> original string? Thanks in advance.


It's easy to write one:

def str2val(s, _val=0):
    # Fold the characters into one integer, most significant byte first.
    if len(s) > 1:
        return str2val(s[1:], 256 * _val + ord(s[0]))
    return 256 * _val + ord(s[0])


def val2str(val, _str=""):
    # Peel off the low byte; >= and // keep 256 and large values correct.
    if val >= 256:
        return val2str(val // 256, _str) + chr(val % 256)
    return _str + chr(val)


print(str2val("abcd"))
print(val2str(str2val("abcd")))
print(val2str(str2val("good")))
print(val2str(str2val("longer")))
print(val2str(str2val("verymuchlonger")))

Flavor to taste.
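
One possible flavor, offered as a sketch rather than as the poster's method: an iterative variant that avoids the recursion limit on long strings and carries the byte length alongside the integer, so that leading NUL characters survive the round trip. The names str_to_int and int_to_str and the Latin-1 default are assumptions for illustration.

```python
def str_to_int(s, encoding="latin-1"):
    """Return (value, length): the big-endian integer and the byte count."""
    data = s.encode(encoding)
    value = 0
    for byte in bytearray(data):      # bytearray yields ints in Python 2 and 3
        value = value * 256 + byte
    return value, len(data)


def int_to_str(value, length, encoding="latin-1"):
    """Rebuild the string from the integer and the original byte count."""
    out = bytearray(length)
    for i in range(length - 1, -1, -1):   # fill from the low byte upwards
        out[i] = value % 256
        value //= 256
    if value:
        raise ValueError("value does not fit in %d bytes" % length)
    return bytes(out).decode(encoding)


value, length = str_to_int("abcd")
print(value, int_to_str(value, length))   # 1633837924 abcd
```

Without the stored length, a string such as "\x00abc" would come back as "abc", because the leading zero byte contributes nothing to the integer.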

Emile

 
 
Steven D'Aprano
 
      08-07-2012
On Mon, 06 Aug 2012 22:46:38 +0200, Mok-Kong Shen wrote:

> If I have a string "abcd" then, with 8-bit encoding of each character,
> there is a corresponding 32-bit binary integer. How could I best obtain
> that integer and from that integer backwards again obtain the original
> string? Thanks in advance.


First you have to know the encoding, as that will define the integers you
get. There are many 8-bit encodings, but of course they can't all encode
arbitrary 4-character strings. Since there are tens of thousands of
different characters, and an 8-bit encoding can only code for 256 of
them, there are many strings that an encoding cannot handle.

For those, you need multi-byte encodings like UTF-8, UTF-16, etc.
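
A small sketch of what this means for the "four characters, 32 bits" assumption, using Python 3 and a sample string chosen for illustration: a 4-character string may have no Latin-1 encoding at all, and its UTF-8 or UTF-16 encodings are longer than four bytes.

```python
text = "aπ©d"   # four characters, one of them Greek

try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False          # π has no code in Latin-1

utf8 = text.encode("utf-8")        # π and © take two bytes each
utf16 = text.encode("utf-16-le")   # two bytes per character here

print(latin1_ok, len(text), len(utf8), len(utf16))   # False 4 6 8
```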

Sticking to one-byte encodings: since most of them are compatible with
ASCII, examples with "abcd" aren't very interesting:

py> 'abcd'.encode('latin1')
b'abcd'

Even though the bytes object b'abcd' is printed as if it were a string,
it is actually treated as an array of one-byte ints:

py> b'abcd'[0]
97

Here's a more interesting example, using Python 3: it uses at least one
character (the Greek letter π) which cannot be encoded in Latin1, and two
which cannot be encoded in ASCII:

py> "aπ©d".encode('iso-8859-7')
b'a\xf0\xa9d'

Most encodings will round-trip successfully:

py> text = 'aπ©Z!'
py> data = text.encode('iso-8859-7')
py> data.decode('iso-8859-7') == text
True


(although the ability to round-trip is a property of the encoding itself,
not of the encoding system).
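
Tying this back to the original question, here is a sketch (Python 3.2 or later, big-endian byte order chosen arbitrarily, sample string made up for illustration) showing that the integer obtained from a string depends on the encoding, and that the round trip works as long as the same encoding is used in both directions:

```python
text = "a€cd"

# The euro sign has a different byte value in these two 8-bit encodings.
n_latin9 = int.from_bytes(text.encode("iso-8859-15"), "big")
n_cp1252 = int.from_bytes(text.encode("cp1252"), "big")
print(hex(n_latin9), hex(n_cp1252))   # 0x61a46364 0x61806364

# Round trip: integer -> 4 bytes -> string, with the matching encoding.
restored = n_latin9.to_bytes(4, "big").decode("iso-8859-15")
print(restored == text)               # True
```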

Naturally if you encode with one encoding, and then decode with another,
you are likely to get different strings:

py> text = 'aπ©Z!'
py> data = text.encode('iso-8859-7')
py> data.decode('latin1')
'að©Z!'
py> data.decode('iso-8859-14')
'aŵ©Z!'


Both the encode and decode methods take an optional argument, errors,
which specifies the error handling scheme. The default is errors='strict',
which raises an exception when a character cannot be converted. Others
include 'ignore' and 'replace'.

py> 'aŵðπ©Z!'.encode('ascii', 'ignore')
b'aZ!'
py> 'aŵðπ©Z!'.encode('ascii', 'replace')
b'a????Z!'
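
The same handlers apply in the decode direction. A short sketch, assuming Python 3, that also shows 'backslashreplace', one of the other standard handlers available when encoding:

```python
data = b"a\xf0\xa9d"   # the ISO-8859-7 bytes from the earlier example

try:
    data.decode("ascii")              # errors='strict' is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = data.decode("ascii", "ignore")     # undecodable bytes dropped
replaced = data.decode("ascii", "replace")   # each becomes U+FFFD
escaped = "aπ©d".encode("ascii", "backslashreplace")

print(strict_failed, ignored, ascii(replaced), escaped)
# True ad 'a\ufffd\ufffdd' b'a\\u03c0\\xa9d'
```

Note that 'ignore' and 'replace' lose information, so neither can be used if the original string has to be recovered later.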



--
Steven
 
 
88888 Dihedral
 
      08-07-2012
On Tuesday, 7 August 2012 at 10:01:05 AM UTC+8, Steven D'Aprano wrote:
> On Mon, 06 Aug 2012 22:46:38 +0200, Mok-Kong Shen wrote:
>
> > If I have a string "abcd" then, with 8-bit encoding of each character,
> > there is a corresponding 32-bit binary integer. How could I best obtain
> > that integer and from that integer backwards again obtain the original
> > string? Thanks in advance.
>
> First you have to know the encoding, as that will define the integers you
> get. There are many 8-bit encodings, but of course they can't all encode
> arbitrary 4-character strings. Since there are tens of thousands of
> different characters, and an 8-bit encoding can only code for 256 of
> them, there are many strings that an encoding cannot handle.
>
> For those, you need multi-byte encodings like UTF-8, UTF-16, etc.
>
> Sticking to one-byte encodings: since most of them are compatible with
> ASCII, examples with "abcd" aren't very interesting:
>
> py> 'abcd'.encode('latin1')
> b'abcd'
>
> Even though the bytes object b'abcd' is printed as if it were a string,
> it is actually treated as an array of one-byte ints:
>
> py> b'abcd'[0]
> 97
>
> Here's a more interesting example, using Python 3: it uses at least one
> character (the Greek letter π) which cannot be encoded in Latin1, and two
> which cannot be encoded in ASCII:
>
> py> "aπ©d".encode('iso-8859-7')
> b'a\xf0\xa9d'
>
> Most encodings will round-trip successfully:
>
> py> text = 'aπ©Z!'
> py> data = text.encode('iso-8859-7')
> py> data.decode('iso-8859-7') == text
> True
>
> (although the ability to round-trip is a property of the encoding itself,
> not of the encoding system).
>
> Naturally if you encode with one encoding, and then decode with another,
> you are likely to get different strings:
>
> py> text = 'aπ©Z!'
> py> data = text.encode('iso-8859-7')
> py> data.decode('latin1')
> 'að©Z!'
> py> data.decode('iso-8859-14')
> 'aŵ©Z!'
>
> Both the encode and decode methods take an optional argument, errors,
> which specify the error handling scheme. The default is errors='strict',
> which raises an exception. Others include 'ignore' and 'replace'.
>
> py> 'aŵðπ©Z!'.encode('ascii', 'ignore')
> b'aZ!'
> py> 'aŵðπ©Z!'.encode('ascii', 'replace')
> b'a????Z!'
>
> --
> Steven





I think a UTF-8 or UTF-16 codec is necessary. Just recall the MS encoding
codecs of Win98 and NT, which collected taxes all over the world.

In practice, for each character encoding, one should develop a codec to
UTF-8 or UTF-16.

That way one can convert between any two of the supported character sets.
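
In Python 3 this kind of conversion already goes through Unicode: bytes are decoded with the source codec and the resulting str is encoded with the target codec. A minimal sketch, where the helper name transcode is made up for illustration:

```python
def transcode(data, source, target, errors="strict"):
    """Convert bytes from one encoding to another via a Unicode str."""
    return data.decode(source).encode(target, errors)


greek = "aπ©d".encode("iso-8859-7")
utf8 = transcode(greek, "iso-8859-7", "utf-8")
back = transcode(utf8, "utf-8", "iso-8859-7")

print(utf8, back == greek)   # b'a\xcf\x80\xc2\xa9d' True
```

The conversion only succeeds in strict mode when every character in the text exists in the target character set; converting the same bytes to Latin-1, for example, would raise UnicodeEncodeError because of π.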

 