Velocity Reviews


Mok-Kong Shen 08-06-2012 08:46 PM

[newbie] String to binary conversion
 

If I have a string "abcd" then, with 8-bit encoding of each character,
there is a corresponding 32-bit binary integer. How could I best
obtain that integer and from that integer backwards again obtain the
original string? Thanks in advance.

M. K. Shen

Tobiah 08-06-2012 08:59 PM

Re: [newbie] String to binary conversion
 
The binascii module looks like it might have
something for you. I've never used it.

Tobiah

http://docs.python.org/library/binascii.html

On 08/06/2012 01:46 PM, Mok-Kong Shen wrote:
>
> If I have a string "abcd" then, with 8-bit encoding of each character,
> there is a corresponding 32-bit binary integer. How could I best
> obtain that integer and from that integer backwards again obtain the
> original string? Thanks in advance.
>
> M. K. Shen



Tobiah 08-06-2012 09:01 PM

Re: [newbie] String to binary conversion
 
On 08/06/2012 01:59 PM, Tobiah wrote:
> The binascii module looks like it might have
> something for you. I've never used it.


Having actually read some of that doc, I see
it's not what you want at all. Sorry.



Mok-Kong Shen 08-06-2012 09:33 PM

Re: [newbie] String to binary conversion
 
On 06.08.2012 22:59, Tobiah wrote:
> The binascii module looks like it might have
> something for you. I've never used it.


Thanks for the hint, but unless I'm mistaken, the binascii module doesn't
seem to work. I typed:

import binascii

and a line that's given as example in the document:

crc = binascii.crc32("hello")

but got the following error message:

TypeError: 'str' does not support the buffer interface.

The same error message appeared when I tried the other functions.

M. K. Shen
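
The error above is consistent with running the example under Python 3, where the binascii functions expect bytes rather than str. A minimal sketch of the same call, assuming Python 3 and an ASCII encoding (the encoding choice is an assumption, not something stated in the post):

```python
import binascii

text = "hello"

# In Python 3, str must be encoded to bytes before it is passed to binascii.
# ASCII is assumed here; any encoding that can represent the text would do.
data = text.encode("ascii")

crc = binascii.crc32(data)
print(crc)
```

Under Python 2 the original line runs as written, because there "hello" is already a bytestring.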


MRAB 08-06-2012 09:56 PM

Re: [newbie] String to binary conversion
 
On 06/08/2012 21:46, Mok-Kong Shen wrote:
>
> If I have a string "abcd" then, with 8-bit encoding of each character,
> there is a corresponding 32-bit binary integer. How could I best
> obtain that integer and from that integer backwards again obtain the
> original string? Thanks in advance.
>

Try this (Python 3, in which strings are Unicode):
>>> import struct
>>> # For a little-endian integer
>>> struct.unpack("<I", "abcd".encode("latin-1"))[0]
1684234849
>>> hex(_)
'0x64636261'

or this (Python 2, in which strings are bytestrings):
>>> import struct
>>> # For a little-endian integer
>>> struct.unpack("<I", "abcd")[0]
1684234849
>>> hex(_)
'0x64636261'
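
The question also asked for the way back from the integer to the string. A possible sketch for Python 3 using struct.pack as the inverse of struct.unpack; the helper names str_to_int and int_to_str and the byteorder parameter are made up for illustration, and the text is assumed to be exactly four Latin-1 characters:

```python
import struct


def str_to_int(text, byteorder="<"):
    """Pack a 4-character Latin-1 string into an unsigned 32-bit integer."""
    data = text.encode("latin-1")
    # "I" is an unsigned 32-bit int; "<" is little-endian, ">" is big-endian.
    return struct.unpack(byteorder + "I", data)[0]


def int_to_str(value, byteorder="<"):
    """Inverse of str_to_int: unpack the integer back into 4 characters."""
    data = struct.pack(byteorder + "I", value)
    return data.decode("latin-1")


n = str_to_int("abcd")
print(n, hex(n))                      # 1684234849 0x64636261
print(int_to_str(n))                  # abcd
print(hex(str_to_int("abcd", ">")))   # 0x61626364
```

struct raises struct.error if the encoded text is not exactly four bytes long, so this only covers the 32-bit case from the question.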


Emile van Sebille 08-06-2012 10:45 PM

Re: [newbie] String to binary conversion
 
On 8/6/2012 1:46 PM Mok-Kong Shen said...
>
> If I have a string "abcd" then, with 8-bit encoding of each character,
> there is a corresponding 32-bit binary integer. How could I best
> obtain that integer and from that integer backwards again obtain the
> original string? Thanks in advance.


It's easy to write one:

def str2val(s, _val=0):
    # Big-endian fold: the first character ends up most significant.
    if len(s) > 1: return str2val(s[1:], 256*_val + ord(s[0]))
    return 256*_val + ord(s[0])


def val2str(val, _str=""):
    # Peel off the low byte; >= (not >) so that exactly 256 also recurses.
    if val >= 256: return val2str(val // 256, _str) + chr(val % 256)
    return _str + chr(val)


print(str2val("abcd"))
print(val2str(str2val("abcd")))
print(val2str(str2val("good")))
print(val2str(str2val("longer")))
print(val2str(str2val("verymuchlonger")))

Flavor to taste.

Emile
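
For comparison, the same big-endian conversion can be sketched without recursion in Python 3.2 or later, using int.from_bytes and int.to_bytes. The names str2int and int2str and the default Latin-1 encoding are assumptions for illustration only:

```python
def str2int(text, encoding="latin-1"):
    """Interpret the encoded text as one big-endian unsigned integer."""
    return int.from_bytes(text.encode(encoding), "big")


def int2str(value, encoding="latin-1"):
    """Turn the integer back into text, using as few bytes as needed."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big").decode(encoding)


print(hex(str2int("abcd")))           # 0x61626364
for s in ("abcd", "good", "longer", "verymuchlonger"):
    print(int2str(str2int(s)))
```

As with the recursive version, leading NUL characters do not survive the round trip, because they contribute nothing to the integer's value.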


Steven D'Aprano 08-07-2012 02:01 AM

Re: [newbie] String to binary conversion
 
On Mon, 06 Aug 2012 22:46:38 +0200, Mok-Kong Shen wrote:

> If I have a string "abcd" then, with 8-bit encoding of each character,
> there is a corresponding 32-bit binary integer. How could I best obtain
> that integer and from that integer backwards again obtain the original
> string? Thanks in advance.


First you have to know the encoding, as that will define the integers you
get. There are many 8-bit encodings, but of course they can't all encode
arbitrary 4-character strings. Since there are tens of thousands of
different characters, and an 8-bit encoding can only code for 256 of
them, there are many strings that an encoding cannot handle.

For those, you need multi-byte encodings like UTF-8, UTF-16, etc.
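
A small sketch of what that means for the original question, assuming Python 3: with a multi-byte encoding, a four-character string no longer has to occupy four bytes, so it need not fit in a 32-bit integer.

```python
text = "aπ©d"

sizes = {}
for encoding in ("utf-8", "utf-16-le"):
    data = text.encode(encoding)
    sizes[encoding] = len(data)
    print(encoding, len(data), data)

# UTF-8 uses 1 byte for 'a' and 'd' and 2 bytes each for 'π' and '©': 6 bytes.
# UTF-16-LE uses 2 bytes for each of these four characters: 8 bytes.
```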

Sticking to one-byte encodings: since most of them are compatible with
ASCII, examples with "abcd" aren't very interesting:

py> 'abcd'.encode('latin1')
b'abcd'

Even though the bytes object b'abcd' is printed as if it were a string,
it is actually treated as an array of one-byte ints:

py> b'abcd'[0]
97

Here's a more interesting example, using Python 3: it uses at least one
character (the Greek letter π) which cannot be encoded in Latin1, and two
which cannot be encoded in ASCII:

py> "aπ©d".encode('iso-8859-7')
b'a\xf0\xa9d'

Most encodings will round-trip successfully:

py> text = 'aπ©Z!'
py> data = text.encode('iso-8859-7')
py> data.decode('iso-8859-7') == text
True


(although the ability to round-trip is a property of the encoding itself,
not of the encoding system).

Naturally if you encode with one encoding, and then decode with another,
you are likely to get different strings:

py> text = 'aπ©Z!'
py> data = text.encode('iso-8859-7')
py> data.decode('latin1')
'að©Z!'
py> data.decode('iso-8859-14')
'aŵ©Z!'


Both the encode and decode methods take an optional argument, errors,
which specifies the error handling scheme. The default is errors='strict',
which raises an exception when a character cannot be encoded (or a byte
cannot be decoded). Others include 'ignore' and 'replace'.

py> 'aŵðπ©Z!'.encode('ascii', 'ignore')
b'aZ!'
py> 'aŵðπ©Z!'.encode('ascii', 'replace')
b'a????Z!'



--
Steven
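
To round out the error handlers mentioned above, here is a rough sketch (Python 3 assumed) of catching the exception raised by the default 'strict' handler, plus two other standard handlers that keep the information instead of dropping it:

```python
text = "aπ©Z!"

try:
    text.encode("ascii")  # errors='strict' is the default
except UnicodeEncodeError as exc:
    # The exception records which codec failed and where in the string.
    failed_codec = exc.encoding
    print(failed_codec, exc.start, exc.end, exc.reason)

# 'backslashreplace' and 'xmlcharrefreplace' substitute escape sequences.
escaped = text.encode("ascii", "backslashreplace")
xml = text.encode("ascii", "xmlcharrefreplace")
print(escaped)
print(xml)
```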

88888 Dihedral 08-07-2012 08:17 PM

Re: [newbie] String to binary conversion
 
On Tuesday, August 7, 2012 at 10:01:05 AM UTC+8, Steven D'Aprano wrote:
> On Mon, 06 Aug 2012 22:46:38 +0200, Mok-Kong Shen wrote:
>
> > If I have a string "abcd" then, with 8-bit encoding of each character,
> > there is a corresponding 32-bit binary integer. How could I best obtain
> > that integer and from that integer backwards again obtain the original
> > string? Thanks in advance.
>
> First you have to know the encoding, as that will define the integers you
> get. There are many 8-bit encodings, but of course they can't all encode
> arbitrary 4-character strings. Since there are tens of thousands of
> different characters, and an 8-bit encoding can only code for 256 of
> them, there are many strings that an encoding cannot handle.
>
> For those, you need multi-byte encodings like UTF-8, UTF-16, etc.
>
> Sticking to one-byte encodings: since most of them are compatible with
> ASCII, examples with "abcd" aren't very interesting:
>
> py> 'abcd'.encode('latin1')
> b'abcd'
>
> Even though the bytes object b'abcd' is printed as if it were a string,
> it is actually treated as an array of one-byte ints:
>
> py> b'abcd'[0]
> 97
>
> Here's a more interesting example, using Python 3: it uses at least one
> character (the Greek letter π) which cannot be encoded in Latin1, and two
> which cannot be encoded in ASCII:
>
> py> "aπ©d".encode('iso-8859-7')
> b'a\xf0\xa9d'
>
> Most encodings will round-trip successfully:
>
> py> text = 'aπ©Z!'
> py> data = text.encode('iso-8859-7')
> py> data.decode('iso-8859-7') == text
> True
>
> (although the ability to round-trip is a property of the encoding itself,
> not of the encoding system).
>
> Naturally if you encode with one encoding, and then decode with another,
> you are likely to get different strings:
>
> py> text = 'aπ©Z!'
> py> data = text.encode('iso-8859-7')
> py> data.decode('latin1')
> 'að©Z!'
> py> data.decode('iso-8859-14')
> 'aŵ©Z!'
>
> Both the encode and decode methods take an optional argument, errors,
> which specifies the error handling scheme. The default is errors='strict',
> which raises an exception. Others include 'ignore' and 'replace'.
>
> py> 'aŵðπ©Z!'.encode('ascii', 'ignore')
> b'aZ!'
> py> 'aŵðπ©Z!'.encode('ascii', 'replace')
> b'a????Z!'
>
> --
> Steven




I think a UTF-8 or UTF-16 codec is necessary; just recall the MS encoding
codecs of Win98 and NT that collected taxes all over the world.


In fact, for each character encoding, please develop a codec to UTF-8
or UTF-16.

That means one can convert between any two of the supported
character sets.
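
In Python 3 this kind of conversion can be sketched by decoding to str (Unicode) and encoding again, so that Unicode acts as the pivot between any two codecs. The helper name transcode is hypothetical, and the codecs shown are just examples:

```python
def transcode(data, source, target, errors="strict"):
    """Convert bytes from one encoding to another via a Unicode str."""
    return data.decode(source).encode(target, errors)


greek = "aπ©d".encode("iso-8859-7")

utf8 = transcode(greek, "iso-8859-7", "utf-8")
print(utf8)

back = transcode(utf8, "utf-8", "iso-8859-7")
print(back == greek)   # True
```

The conversion only succeeds in both directions when the target character set can represent every character in the text; otherwise the errors argument decides what happens.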



All times are GMT.