"convert" string to bytes without changing data (encoding)

 
 
Peter Daum
      03-28-2012
Hi,

is there any way to convert a string to bytes without
interpreting the data in any way? Something like:

s='abcde'
b=bytes(s, "unchanged")

Regards,
Peter
 
 
Chris Angelico
      03-28-2012
On Wed, Mar 28, 2012 at 7:56 PM, Peter Daum <(E-Mail Removed)-berlin.de> wrote:
> Hi,
>
> is there any way to convert a string to bytes without
> interpreting the data in any way? Something like:
>
> s='abcde'
> b=bytes(s, "unchanged")


What is a string? It's not a series of bytes. You can't convert it
without encoding those characters into bytes in some way.
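As an illustration of that point (a minimal sketch, not part of the original reply): the same five characters become different byte sequences depending on which encoding is named, so there is no single "unchanged" form to fall back on. The variable names are made up for the example.

```python
s = 'abcde'

# The same five characters under three different encodings.
as_ascii = s.encode('ascii')
as_utf8 = s.encode('utf-8')
as_utf16 = s.encode('utf-16-le')

print(as_ascii)   # b'abcde'
print(as_utf8)    # b'abcde'
print(as_utf16)   # b'a\x00b\x00c\x00d\x00e\x00'
```

For pure ASCII text, the ASCII and UTF-8 results happen to coincide, which is probably why the question looks simpler than it is.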

ChrisA
 
 
Stefan Behnel
      03-28-2012
Peter Daum, 28.03.2012 10:56:
> is there any way to convert a string to bytes without
> interpreting the data in any way? Something like:
>
> s='abcde'
> b=bytes(s, "unchanged")


If you can tell us what you actually want to achieve, i.e. why you want to
do this, we may be able to tell you how to do what you want.

Stefan

 
Peter Daum
      03-28-2012
On 2012-03-28 11:02, Chris Angelico wrote:
> On Wed, Mar 28, 2012 at 7:56 PM, Peter Daum <(E-Mail Removed)-berlin.de> wrote:
>> is there any way to convert a string to bytes without
>> interpreting the data in any way? Something like:
>>
>> s='abcde'
>> b=bytes(s, "unchanged")

>
> What is a string? It's not a series of bytes. You can't convert it
> without encoding those characters into bytes in some way.


... in my example, the variable s points to a "string", i.e. a series of
bytes, (0x61,0x62 ...) interpreted as ascii/unicode characters.

b=bytes(s,'ascii') # or ('utf-8', 'latin1', ...)

would of course work in this case, but in general, if s holds any
data with bytes > 127, the actual data will be changed according
to the provided encoding.
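A small sketch of what "changed according to the provided encoding" looks like in practice, assuming a string with one non-ASCII character (the sample word is only an illustration):

```python
s = 'caf\u00e9'  # 'café', the last character is U+00E9

latin1_bytes = s.encode('latin-1')  # one byte per character
utf8_bytes = s.encode('utf-8')      # U+00E9 becomes two bytes

# ASCII has no representation for U+00E9, so encoding fails outright.
try:
    s.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'
print(ascii_ok)      # False
```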

What I am looking for is a general way to just copy the raw data
from a "string" object to a "byte" object without any attempt to
"decode" or "encode" anything ...

Regards,
Peter
 
Heiko Wundram
      03-28-2012
On 28.03.2012 11:43, Peter Daum wrote:
> ... in my example, the variable s points to a "string", i.e. a series
> of
> bytes, (0x61,0x62 ...) interpreted as ascii/unicode characters.


No; a string contains a series of codepoints from the unicode plane,
representing natural language characters (at least in the simplistic
view, I'm not talking about surrogates). These can be encoded to
different binary storage representations, of which ascii is (a common)
one.

> What I am looking for is a general way to just copy the raw data
> from a "string" object to a "byte" object without any attempt to
> "decode" or "encode" anything ...


There is "logically" no raw data in the string, just a series of
codepoints, as stated above. You'll have to specify the encoding to use
to get at "raw" data, and from what I gather you're interested in the
latin-1 (or iso-8859-15) encoding, as you're specifically referencing
chars >= 0x80 (which hints at your mindset being in LATIN-land, so to
speak).
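To illustrate the latin-1 suggestion (a sketch under the assumption that the data is treated as opaque 8-bit values): in Python, the `latin-1` codec maps each byte value 0..255 to the code point with the same number, so any byte sequence survives a decode/encode round trip. ISO-8859-15 is not the same mapping; byte 0xA4, for instance, decodes to the euro sign.

```python
all_bytes = bytes(range(256))

# latin-1 maps byte value N to code point N, for every N in 0..255.
text = all_bytes.decode('latin-1')
code_points = [ord(c) for c in text]

round_tripped = text.encode('latin-1')

# ISO-8859-15 differs from latin-1 in a handful of positions.
euro = b'\xa4'.decode('iso-8859-15')

print(code_points == list(range(256)))  # True
print(round_tripped == all_bytes)       # True
print(euro == '\u20ac')                 # True
```

The round trip only works in this direction: a string containing characters above U+00FF cannot be encoded as latin-1.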

--
--- Heiko.
 
Stefan Behnel
      03-28-2012
Peter Daum, 28.03.2012 11:43:
> What I am looking for is a general way to just copy the raw data
> from a "string" object to a "byte" object without any attempt to
> "decode" or "encode" anything ...


That's why I asked about your use case - where does the data come from and
why is it contained in a character string in the first place? If you could
provide that information, we can help you further.

Stefan

 
Ross Ridge
      03-28-2012
Chris Angelico <(E-Mail Removed)> wrote:
>What is a string? It's not a series of bytes.


Of course it is. Conceptually you're not supposed to think of it that
way, but a string is stored in memory as a series of bytes.

What he's asking for may not be very useful or practical, but if that's
your problem here then that's what you should be addressing, not
pretending that it's fundamentally impossible.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] http://www.velocityreviews.com/forums/(E-Mail Removed)
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //
 
Chris Angelico
      03-28-2012
On Thu, Mar 29, 2012 at 2:36 AM, Ross Ridge <(E-Mail Removed)> wrote:
> Chris Angelico <(E-Mail Removed)> wrote:
>>What is a string? It's not a series of bytes.

>
> Of course it is. Conceptually you're not supposed to think of it that
> way, but a string is stored in memory as a series of bytes.


Note that distinction. I said that a string "is not" a series of
bytes; you say that it "is stored" as bytes.

> What he's asking for may not be very useful or practical, but if that's
> your problem here then that's what you should be addressing, not
> pretending that it's fundamentally impossible.


That's equivalent to taking a 64-bit integer and trying to treat it as
a 64-bit floating point number. They're all just bits in memory, and
in C it's quite easy to cast a pointer to a different type and
dereference it. But a Python Unicode string might be stored in several
ways; for all you know, it might actually be stored as a sequence of
apples in a refrigerator, just as long as they can be referenced
correctly. There's no logical Python way to turn that into a series of
bytes.
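The integer/float analogy can be sketched with the standard `struct` module (an illustration only; the bit pattern is chosen so the result is easy to check). The same eight bytes are read once as an unsigned 64-bit integer and once as an IEEE 754 double:

```python
import struct

# Bit pattern of the IEEE 754 double 1.0, written as a 64-bit integer.
bits = 0x3FF0000000000000

# Pack the integer into 8 little-endian bytes, then reinterpret those
# same bytes as a double-precision float.
raw = struct.pack('<Q', bits)
as_float = struct.unpack('<d', raw)[0]

print(bits)      # 4607182418800017408
print(as_float)  # 1.0
```

Reinterpreting the memory gives a value unrelated to the integer's meaning. For `str` the situation is worse, because the in-memory layout is an implementation detail; in CPython 3.3 and later it even varies from string to string (PEP 393).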

ChrisA
 
Grant Edwards
      03-28-2012
On 2012-03-28, Chris Angelico <(E-Mail Removed)> wrote:

> for all you know, it might actually be stored as a sequence of
> apples in a refrigerator


[...]

> There's no logical Python way to turn that into a series of bytes.


There's got to be a joke there somewhere about how to eat an apple...

--
Grant Edwards               grant.b.edwards        Yow! Somewhere in DOWNTOWN
                                  at               BURBANK a prostitute is
                              gmail.com            OVERCOOKING a LAMB CHOP!!
 
Peter Daum
      03-28-2012
On 2012-03-28 12:42, Heiko Wundram wrote:
> On 28.03.2012 11:43, Peter Daum wrote:
>> ... in my example, the variable s points to a "string", i.e. a series of
>> bytes, (0x61,0x62 ...) interpreted as ascii/unicode characters.

>
> No; a string contains a series of codepoints from the unicode plane,
> representing natural language characters (at least in the simplistic
> view, I'm not talking about surrogates). These can be encoded to
> different binary storage representations, of which ascii is (a common) one.
>
>> What I am looking for is a general way to just copy the raw data
>> from a "string" object to a "byte" object without any attempt to
>> "decode" or "encode" anything ...

>
> There is "logically" no raw data in the string, just a series of
> codepoints, as stated above. You'll have to specify the encoding to use
> to get at "raw" data, and from what I gather you're interested in the
> latin-1 (or iso-8859-15) encoding, as you're specifically referencing
> chars >= 0x80 (which hints at your mindset being in LATIN-land, so to
> speak).


... I was under the illusion that Python (like e.g. Perl) stored
strings internally in UTF-8. In that case the "conversion" would simply
mean re-labelling the data. Unfortunately, as I have meanwhile found out,
this is not the case (nor is the "apple encoding"), so it would indeed be
pretty useless.

The longer story of my question is: I am new to python (obviously), and
since I am not familiar with either version, I thought it would be advisable
to go for python 3.x. The biggest problem that I am facing is, that I
am often dealing with data, that is basically text, but it can contain
8-bit bytes. In this case, I can not safely assume any given encoding,
but I actually also don't need to know - for my purposes, it would be
perfectly good enough to deal with the ascii portions and keep anything
else unchanged.
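One approach that may fit this use case (a sketch, not something proposed in the thread so far) is the `surrogateescape` error handler available in Python 3.1 and later. It decodes the ASCII portion normally and smuggles every undecodable byte through as a lone surrogate code point, which is turned back into the original byte when encoding with the same handler. The sample data is invented for the example.

```python
# Mostly-ASCII data with two bytes of unknown encoding (0xfc and 0xff).
raw = b'name=M\xfcller/path\xff'

# Undecodable bytes become lone surrogates (U+DC80..U+DCFF).
text = raw.decode('ascii', errors='surrogateescape')

# Ordinary str operations work on the ASCII portions.
text = text.replace('name=', 'NAME=') + '/'

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode('ascii', errors='surrogateescape')

print(restored)  # b'NAME=M\xfcller/path\xff/'
```

The intermediate string is not safe to print or to encode with a strict handler, since lone surrogates are not valid text; it is only meant to be carried through and encoded back. Working on `bytes` throughout, or decoding as latin-1, are the other usual options.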

As it seems, this would be far easier with python 2.x. With python 3
and its strict distinction between "str" and "bytes", things get
syntactically pretty awkward and error-prone (something as innocent-looking
as "s=s+'/'" hidden in a rarely reached branch, and a
seemingly correct program will crash with a TypeError 2 years
later ...)
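A short sketch of the failure mode described here, assuming `path` holds `bytes` (the name and value are only illustrative): mixing `bytes` with a `str` literal raises `TypeError` at the moment the line runs, while a `bytes` literal works.

```python
path = b'/tmp/data'

# bytes + str is rejected when this line is executed, not before.
try:
    path + '/'
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Using a bytes literal keeps both operands the same type.
joined = path + b'/'

print(mixed_ok)  # False
print(joined)    # b'/tmp/data/'
```

Because the check happens at run time, such a line goes unnoticed until the branch is actually reached, so tests that exercise rarely used branches are the practical safeguard.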

Regards,
Peter
 