Re: Question on Strings

 
 
Chris Rebert
      02-06-2009
On Fri, Feb 6, 2009 at 1:49 AM, Kalyankumar Ramaseshan
<(E-Mail Removed)> wrote:
>
> Hi,
>
> Excuse me if this is a repeat question!
>
> I just wanted to know how are strings represented in python?
>
> I need to know in terms of:
>
> a) Strings are stored as UTF-16 (LE/BE) or UTF-32 characters?


IIRC, it depends on the build settings used when CPython was
compiled. UTF-16 is the default.

> b) They are converted to utf-8 format when it is needed for e.g. when storing the string to disk or sending it through a socket (tcp/ip)?


No. They are implicitly converted to ASCII in such cases. To properly
handle non-ASCII Unicode characters, you need to encode/decode the
strings to/from bytes manually by specifying the encoding.
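To make the manual step concrete, here is a minimal sketch in Python 3 syntax. The io.BytesIO object is only a stand-in for a file or socket, and the sample text and variable names are assumptions chosen for illustration.

```python
import io

# Sample text containing two non-ASCII characters (i-diaeresis, e-acute).
text = "na\u00efve caf\u00e9"

# Encode explicitly when the text leaves the program.
payload = text.encode("utf-8")

# BytesIO stands in for a file or socket; it accepts bytes only.
buffer = io.BytesIO()
buffer.write(payload)

# Decode explicitly, with the same encoding, when the bytes come back.
round_tripped = buffer.getvalue().decode("utf-8")

print(len(text), len(payload), round_tripped == text)
```

The character count and the byte count differ because each non-ASCII character here takes two bytes in UTF-8.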

Cheers,
Chris

--
Follow the path of the Iguana...
http://rebertia.com
 
John Machin
      02-06-2009
On Feb 6, 9:24 pm, Chris Rebert <(E-Mail Removed)> wrote:
> On Fri, Feb 6, 2009 at 1:49 AM, Kalyankumar Ramaseshan
> <(E-Mail Removed)> wrote:
> >
> > Hi,
> >
> > Excuse me if this is a repeat question!
> >
> > I just wanted to know how are strings represented in python?
> >
> > I need to know in terms of:
> >
> > a) Strings are stored as UTF-16 (LE/BE) or UTF-32 characters?


Neither.

>
> IIRC, Depends on what the build settings were when CPython was
> compiled. UTF-16 is the default.


Unicode strings are held as arrays of 16-bit numbers or 32-bit numbers
[of which only 21 are used]. If you must use an acronym, use UCS-2 or
UCS-4.
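For anyone who wants to check which case applies to their own interpreter, here is a minimal sketch using sys.maxunicode. The helper name describe_build is made up for illustration. One assumption to flag: on CPython 3.3 and later the narrow/wide build distinction no longer exists and this value is always 0x10FFFF.

```python
import sys


def describe_build(max_code_point=sys.maxunicode):
    """Classify a build by its largest supported code point.

    Hypothetical helper: 0xFFFF indicates a narrow build (16-bit units),
    anything larger is treated as covering the full code point range.
    """
    if max_code_point == 0xFFFF:
        return "narrow (16-bit code units)"
    return "wide (full code point range)"


print(hex(sys.maxunicode), describe_build())
```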

The UTF-n siblings are *external* representations.
2.x: a_unicode_object.decode('UTF-16') -> an_str_object
3.x: an_str_object.decode('UTF-16') -> a_bytes_object

By the way, has anyone come up with a name for the shifting effect
observed above on str, and also with repr, range, and the iter*
family? If not, I suggest that the language's association with the
best of English humour be widened so that it be dubbed the "Mad
Hatter's Tea Party" effect.
 
MRAB
      02-06-2009
John Machin wrote:
> On Feb 6, 9:24 pm, Chris Rebert <(E-Mail Removed)> wrote:
>> On Fri, Feb 6, 2009 at 1:49 AM, Kalyankumar Ramaseshan
>>
>> <(E-Mail Removed)> wrote:
>>
>>> Hi,
>>> Excuse me if this is a repeat question!
>>> I just wanted to know how are strings represented in python?
>>> I need to know in terms of:
>>> a) Strings are stored as UTF-16 (LE/BE) or UTF-32 characters?
>
> Neither.
>
>> IIRC, Depends on what the build settings were when CPython was
>> compiled. UTF-16 is the default.
>
> Unicode strings are held as arrays of 16-bit numbers or 32-bit numbers
> [of which only 21 are used]. If you must use an acronym, use UCS-2 or
> UCS-4.
>
> The UTF-n siblings are *external* representations.
> 2.x: a_unicode_object.decode('UTF-16') -> an_str_object
> 3.x: an_str_object.decode('UTF-16') -> a_bytes_object
>
> By the way, has anyone come up with a name for the shifting effect
> observed above on str, and also with repr, range, and the iter*
> family? If not, I suggest that the language's association with the
> best of English humour be widened so that it be dubbed the "Mad
> Hatter's Tea Party" effect.
>

Bitwise shifts and rotates are collectively referred to as skew
operations. I therefore suggest the term "skewing".
 
Hendrik van Rooyen
      02-06-2009
"John Machin" <s..n@le..n.net> wrote:

>By the way, has anyone come up with a name for the shifting effect
>observed above on str, and also with repr, range, and the iter*
>family? If not, I suggest that the language's association with the
>best of English humour be widened so that it be dubbed the "Mad
>Hatter's Tea Party" effect.


The MHTP effect.

Sounds educated, almost like
a network protocol.

+1

- Hendrik



 
Terry Reedy
      02-06-2009
John Machin wrote:

> The UTF-n siblings are *external* representations.
> 2.x: a_unicode_object.decode('UTF-16') -> an_str_object
> 3.x: an_str_object.decode('UTF-16') -> a_bytes_object


That should be .encode() to bytes, which is the coded form.
.decode is bytes => str/unicode
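A short sketch of the two directions, in Python 3 syntax. The sample string and the choice of "utf-16-le" (which avoids a byte-order mark in the output) are assumptions for illustration only.

```python
# A str holding a euro sign (U+20AC) followed by "100".
text = "\u20ac100"

# encode: str -> bytes (the coded, external form)
encoded = text.encode("utf-16-le")

# decode: bytes -> str
decoded = encoded.decode("utf-16-le")

# In 3.x, str has no decode method and bytes has no encode method.
str_has_decode = hasattr(str, "decode")
bytes_has_encode = hasattr(bytes, "encode")

print(encoded, decoded == text, str_has_decode, bytes_has_encode)
```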

 
John Machin
      02-06-2009
On Feb 7, 5:23 am, Terry Reedy <(E-Mail Removed)> wrote:
> John Machin wrote:
> > The UTF-n siblings are *external* representations.
> > 2.x: a_unicode_object.decode('UTF-16') -> an_str_object
> > 3.x: an_str_object.decode('UTF-16') -> a_bytes_object
>
> That should be .encode() to bytes, which is the coded form.
> .decode is bytes => str/unicode


True. I guess that makes me the Dohmouse
 