Velocity Reviews


Marc Muehlfeld 02-25-2011 09:57 AM

Python encoding question
 
Hi,

I'm taking my first steps with Python and I'm having trouble understanding
an encoding problem. My script:

import os
os.environ["NLS_LANG"] = "German_Germany.UTF8"
import cx_Oracle
connection = cx_Oracle.Connection("username/password@SID")
cursor = connection.cursor()
cursor.execute("SELECT NAME1 FROM COR WHERE CORNB='ABCDEF'")
TEST = cursor.fetchone()
print TEST[0]
print TEST


When I run this script, it prints:
München
('M\xc3\xbcnchen',)

Why is the umlaut printed for TEST[0] but not for TEST?


And why do both prints show the wrong encoding when I switch "fetchone()" to
"fetchall()":
('M\xc3\xbcnchen',)
[('M\xc3\xbcnchen',)]


I'm running Python 2.4.3 on CentOS 5.


Regards,
Marc

Jean-Michel Pichavant 02-25-2011 11:19 AM

Re: Python encoding question
 
Marc Muehlfeld wrote:
> Hi,
>
> I'm doing my first steps with python and I have a problem with
> understanding an encoding problem I have. My script:
>
> import os
> os.environ["NLS_LANG"] = "German_Germany.UTF8"
> import cx_Oracle
> connection = cx_Oracle.Connection("username/password@SID")
> cursor = connection.cursor()
> cursor.execute("SELECT NAME1 FROM COR WHERE CORNB='ABCDEF'")
> TEST = cursor.fetchone()
> print TEST[0]
> print TEST
>
>
> When I run this script It prints me:
> München
> ('M\xc3\xbcnchen',)
>
> Why is the Umlaut of TEST[0] printed and not from TEST?
>
>
> And why are both prints show the wrong encoding, when I switch
> "fetchone()" to "fetchall()":
> ('M\xc3\xbcnchen',)
> [('M\xc3\xbcnchen',)]
>
>
> I'm running Python 2.4.3 on CentOS 5.
>
>
> Regards,
> Marc

Nothing related to encoding here. TEST[0] is a string, TEST is a tuple.

>>> s1 = 'aline \n anotherline'
>>> print str(s1)
aline
 anotherline
>>> print repr(s1)
'aline \n anotherline'
>>> atuple = (s1,)
>>> print str(atuple)
('aline \n anotherline',)
>>> print repr(atuple)
('aline \n anotherline',)

Read http://docs.python.org/reference/datamodel.html regarding __repr__
and __str__.

Basically, __str__ and __repr__ give the same result for tuples, while they
differ for strings.
If you want a nice representation of tuple elements you have to do it
yourself:

print ', '.join([str(elem) for elem in atuple])

More generally, only strings will print nicely with line breaks and UTF-8
characters. Everything else, like tuples, lists, and objects without their
own __str__, will use the __repr__ method, which displays a formal
representation.
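
A minimal sketch of doing that formatting by hand for rows like the ones in the question. The helper name format_row and the sample rows are made up for illustration. It is written so that it also runs on Python 3, where the byte strings that the question's script got back under Python 2 are modeled as bytes literals:

```python
# -*- coding: utf-8 -*-
# Hypothetical helper: turn one result row (a tuple) into readable text
# instead of relying on the tuple's own repr()-based output.
def format_row(row, encoding="utf-8"):
    parts = []
    for elem in row:
        if isinstance(elem, bytes):
            # Byte strings are decoded so the umlaut shows up as text.
            parts.append(elem.decode(encoding))
        else:
            parts.append(str(elem))
    return ", ".join(parts)

# Sample data standing in for cursor.fetchall(); not taken from a real database.
rows = [(b"M\xc3\xbcnchen",), (b"K\xc3\xb6ln", 42)]
for row in rows:
    print(format_row(row))
```

Whether the decoded text displays correctly still depends on the terminal's encoding, which this sketch does not check.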

JM

PS :

>>> class Foo:
...     def __str__(self):
...         return 'I am a nice representation of a Foo instance'
...
>>> print Foo()
I am a nice representation of a Foo instance
>>> print str(Foo())
I am a nice representation of a Foo instance
>>> print repr(Foo())
<__main__.Foo instance at 0xb73a07ac>
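
To round this off, here is a small sketch of a class that defines both methods, which also shows which one a list picks. The class name City is purely illustrative, and the code should behave the same on Python 2 and Python 3:

```python
class City(object):
    """Illustrative class with separate informal and formal representations."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Used by print and str(): the readable form.
        return self.name

    def __repr__(self):
        # Used by repr() and by containers such as lists and tuples.
        return "City(%r)" % (self.name,)

munich = City("Munich")
print(str(munich))    # Munich
print(repr(munich))   # City('Munich')
print([munich])       # [City('Munich')]
```

The last line is the interesting one: even though City has a __str__, the list shows the __repr__ of its element.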




Dave Angel 02-25-2011 12:50 PM

Re: Python encoding question
 
Marc Muehlfeld wrote:
> Hi,
>
> <snip>
> TEST = cursor.fetchone()
> print TEST[0]
> print TEST
>
>
> When I run this script It prints me:
> München
> ('M\xc3\xbcnchen',)
>
> Why is the Umlaut of TEST[0] printed and not from TEST?
>


When you print a string, it simply prints it, control characters,
international characters, and all.

When you print a more complex object, it's up to that object to decide
how to print. In the case of the tuple above, the tuple logic displays
the parentheses and the comma, but calls the repr() of any objects it
contains. Tuple doesn't make a special case for strings or for
numbers; it just always calls repr() (actually it's __repr__(), I think).

A list does the same thing, though it'll use square brackets at the ends.

So the question boils down to what repr() does. It attempts to create a
representation that could be used to create the specific object. So if
there's a newline, it uses \n. And if there are non-ASCII codes, it
uses hex escape sequences. And of course it adds the quote marks.
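
A short sketch of that round trip, under the assumption that the value is a UTF-8 byte string like the one in the question. It uses ast.literal_eval from the standard library to turn the repr() text back into an object, and is written to run on Python 3 as well, where repr() of a byte string additionally carries a b prefix:

```python
import ast

# The byte string from the question, with a newline added for illustration.
s = b"M\xc3\xbcnchen\n"

r = repr(s)
print(r)  # escapes for the non-ASCII bytes and the newline, plus quote marks

# The repr() text is a valid literal, so it can recreate the original object.
assert ast.literal_eval(r) == s

# Containers build their own output from the repr() of each element.
assert repr((s,)) == "(" + r + ",)"
assert str([s]) == "[" + r + "]"
```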

DaveA

