csv reader

 
 
Emmanuel
 
      12-15-2009
I have a problem with csv.reader from the csv library. I'm not able to
import accented characters. For example, I'm trying to import a
simple file containing a single word "equação" using the following
code:

import csv
arquivoCSV='test'
a=csv.reader(open(arquivoCSV),delimiter=',')
tab=[]
for row in a:
    tab.append(row)
print tab

As a result, I get:

[['equa\xe7\xe3o']]

How can I solve this problem?
 
Chris Rebert
 
      12-15-2009
On Tue, Dec 15, 2009 at 1:24 PM, Emmanuel <(E-Mail Removed)> wrote:
> I have a problem with csv.reader from the csv library. I'm not able to
> import accented characters. For example, I'm trying to import a
> simple file containing a single word "equação" using the following
> code:
>
> import csv
> arquivoCSV='test'
> a=csv.reader(open(arquivoCSV),delimiter=',')
> tab=[]
> for row in a:
>     tab.append(row)
> print tab
>
> As a result, I get:
>
> [['equa\xe7\xe3o']]
>
> How can I solve this problem?


From http://docs.python.org/library/csv.html :

"""
Note:
This version of the csv module doesn’t support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should be UTF-8 or printable ASCII to be safe;
see the examples in section Examples. These restrictions will be
removed in the future.
"""

Thus, you'll have to decode the results into Unicode manually; this
will require knowing what encoding your file is using. Files in some
encodings may not parse correctly due to the aforementioned NUL
problem.
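
A minimal sketch of that manual decoding step, assuming the encoding is already known. decode_row is a hypothetical helper name (it is not part of the csv module), the byte-string literal stands in for one row returned by csv.reader, and Windows-1252 is only an assumed encoding:

```python
def decode_row(row, encoding):
    """Decode every byte-string cell of one csv row into Unicode text."""
    return [cell.decode(encoding) for cell in row]


# Stand-in for one row as csv.reader returned it in the question.
raw_row = [b'equa\xe7\xe3o']

# Assumption: the file was saved as Windows-1252; use the file's real encoding.
decoded_row = decode_row(raw_row, 'windows-1252')
```

Decoding with the wrong encoding either raises UnicodeDecodeError or silently produces the wrong characters, which is why the encoding has to be known rather than guessed.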

Cheers,
Chris
--
http://blog.rebertia.com
 
Jerry Hill
 
      12-15-2009
On Tue, Dec 15, 2009 at 4:24 PM, Emmanuel <(E-Mail Removed)> wrote:
> I have a problem with csv.reader from the csv library. I'm not able to
> import accented characters. For example, I'm trying to import a
> simple file containing a single word "equação" using the following
> code:
>
> import csv
> arquivoCSV='test'
> a=csv.reader(open(arquivoCSV),delimiter=',')
> tab=[]
> for row in a:
>     tab.append(row)
> print tab
>
> As a result, I get:
>
> [['equa\xe7\xe3o']]
>
> How can I solve this problem?


I don't think it is a problem. \xe7 is the character ç encoded in
Windows-1252, which is probably the encoding of your csv file. If you
want to convert that to a unicode string, do something like the
following.

s = 'equa\xe7\xe3o'
uni_s = s.decode('Windows-1252')
print uni_s

--
Jerry
 
Emmanuel
 
      12-15-2009
Then my problem is different!

In fact I'm reading a csv file saved from OpenOffice Calc (oocalc) using
UTF-8 encoding. I get a list of lists (let's call it tab) with the csv
data.
If I do:

print tab[2][4]
In ipython, I get:
equação de Toricelli. Tarefa exercícios PVR 1 e 2 ; PVP 1

If I only do:
tab[2][4]

In ipython, I get:
'equa\xc3\xa7\xc3\xa3o de Toricelli. Tarefa exerc\xc3\xadcios PVR 1 e
2 ; PVP 1'

Does that mean that my problem is not the one I think it is?

My real problem appears when I use that kind of UTF-8 encoded (?) string
with selenium.
Here is a small code example of a non-working case that gives the same
error as my bigger program:


#!/usr/bin/env python
# -*- coding: utf-8 -*-

from selenium import selenium
import sys,os,csv,re


class test:
    '''class for interacting with the academic system'''
    def __init__(self):
        self.webpage=''
        self.arquivo=''
        self.script=[]
        self.sel = selenium('localhost', 4444, '*firefox', 'http://www.google.com.br')
        self.sel.start()
        self.sel.open('/')
        self.sel.wait_for_page_to_load(30000)
        self.sel.type("q", "equação")
        #self.sel.type("q", u"equacao")
        self.sel.click("btnG")
        self.sel.wait_for_page_to_load("30000")


def main():
    teste=test()


if __name__ == "__main__":
    main()



If I just replace the following line:
self.sel.type("q", "equação")

with:
self.sel.type("q", u"equação")


It works fine!
The problem is that the csv.reader does give a "equação" and not a
u"equação"


Here is the error given with bad code (with "equação"):
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (1202, 0))

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)

/home/manu/Labo/Cefetes_Colatina/Scripts/20091215_test_acentuated_caracters.py in <module>()
     27
     28 if __name__ == "__main__":
---> 29     main()
     30
     31

/home/manu/Labo/Cefetes_Colatina/Scripts/20091215_test_acentuated_caracters.py in main()
     23
     24 def main():
---> 25     teste=test()
     26
     27

/home/manu/Labo/Cefetes_Colatina/Scripts/20091215_test_acentuated_caracters.py in __init__(self)
     16         self.sel.open('/')
     17         self.sel.wait_for_page_to_load(30000)
---> 18         self.sel.type("q", "equação")
     19         #self.sel.type("q", u"equacao")
     20         self.sel.click("btnG")

/home/manu/Labo/Cefetes_Colatina/Scripts/selenium.pyc in type(self, locator, value)
    588         'value' is the value to type
    589         """
--> 590         self.do_command("type", [locator,value,])
    591
    592

/home/manu/Labo/Cefetes_Colatina/Scripts/selenium.pyc in do_command(self, verb, args)
    201         body = u'cmd=' + urllib.quote_plus(unicode(verb).encode('utf-8'))
    202         for i in range(len(args)):
--> 203             body += '&' + unicode(i+1) + '=' + urllib.quote_plus(unicode(args[i]).encode('utf-8'))
    204         if (None != self.sessionId):
    205             body += "&sessionId=" + unicode(self.sessionId)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(12
WARNING: Failure executing file: <20091215_test_acentuated_caracters.py>
Python 2.6.4 (r264:75706, Oct 27 2009, 06:16:59)
 
Emmanuel
 
      12-16-2009
As csv.reader does not support utf-8 encoded files, I'm using:

import codecs

fp = codecs.open(arquivoCSV, "r", "utf-8")
self.tab=[]
for l in fp:
    # drop double quotes and surrounding whitespace, then split on commas
    l=l.replace('\"','').strip()
    self.tab.append(l.split(','))

It works much better. The one remaining problem shows up when I do
self.sel.type("q", ustring), where ustring is a unicode string obtained
from the file using the code shown above:
I get <sp> instead of a regular space...
 
Gabriel Genellina
 
      12-16-2009
On Tue, 15 Dec 2009 19:12:01 -0300, Emmanuel <(E-Mail Removed)> wrote:

> Then my problem is different!
>
> In fact I'm reading a csv file saved from OpenOffice Calc (oocalc) using
> UTF-8 encoding. I get a list of lists (let's call it tab) with the csv
> data.
> If I do:
>
> print tab[2][4]
> In ipython, I get:
> equação de Toricelli. Tarefa exercícios PVR 1 e 2 ; PVP 1
>
> If I only do:
> tab[2][4]
>
> In ipython, I get:
> 'equa\xc3\xa7\xc3\xa3o de Toricelli. Tarefa exerc\xc3\xadcios PVR 1 e
> 2 ; PVP 1'
>
> Does that mean that my problem is not the one I think it is?


Yes. You have a real problem, but not this one. When you say `print
something`, you get a nice view of `something`, basically the result of
doing `str(something)`. When you say `something` alone in the interpreter,
you get a more formal representation, the result of calling
`repr(something)`:

py> x = "ecuação"
py> print x
ecuação
py> x
'ecua\x87\xc6o'
py> print repr(x)
'ecua\x87\xc6o'

The quotes around the text and the \xNN notation allow for an unambiguous
representation. Two strings may look the same but be different, and
repr shows that.
('ecua\x87\xc6o' is the cp850 encoding, a DOS console code page, rather
than windows-1252; you should see 'equa\xc3\xa7\xc3\xa3o' in utf-8)

> My real problem appears when I use that kind of UTF-8 encoded (?) string
> with selenium.
> If I just replace the following line:
> self.sel.type("q", "equação")
>
> with:
> self.sel.type("q", u"equação")
>
>
> It works fine!


Yes: you should work with unicode most of the time. The "recipe" for
having as few unicode problems as possible says:

- convert the input data (read from external sources, like a file) from
bytes to unicode, using the (known) encoding of those bytes

- handle unicode internally everywhere in your program

- and convert from unicode to bytes as late as possible, when writing
output (to screen, other files, etc) using the encoding expected by those
external files.

See the Unicode How To: http://docs.python.org/howto/unicode.html
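
The three steps of the recipe can be sketched as follows. The byte string is the UTF-8 value from the question; the variable names are only illustrative. The last part decodes the same bytes as ASCII, which is what an implicit str-to-unicode conversion does in Python 2, to show where the error in the traceback comes from:

```python
# 1. Input: bytes from the outside world, decoded with their known encoding.
raw = b'equa\xc3\xa7\xc3\xa3o'      # UTF-8 bytes, as read from the csv file
text = raw.decode('utf-8')          # Unicode text from here on

# 2. Inside the program: work on Unicode only.
shouted = text.upper()

# 3. Output: encode as late as possible, in the encoding the target expects.
out_bytes = shouted.encode('utf-8')

# Skipping step 1: decoding the same bytes as ASCII fails at the first
# non-ASCII byte (0xc3, at index 4).
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc
```

Note that raw holds 9 bytes while text holds 7 characters, so lengths and slices only mean what you expect after decoding.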

> The problem is that the csv.reader does give a "equação" and not a
> u"equação"


The csv module cannot handle unicode text directly, but see the last
example in the csv documentation for a simple workaround:
http://docs.python.org/library/csv.html
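
A sketch of one such workaround, not the exact code from the documentation. read_unicode_rows and the temporary demo file are hypothetical names made up for this example. The Python 2 branch lets csv.reader split the raw bytes (safe for UTF-8, because the delimiter and quote characters are plain ASCII) and decodes each cell afterwards; the other branch assumes a later Python whose csv module accepts decoded text directly:

```python
import csv
import io
import os
import sys
import tempfile


def read_unicode_rows(path, encoding='utf-8', **reader_kwargs):
    """Yield each csv row as a list of Unicode strings."""
    if sys.version_info[0] == 2:
        # Python 2: csv.reader works on byte strings, so decode every cell.
        with open(path, 'rb') as handle:
            for row in csv.reader(handle, **reader_kwargs):
                yield [cell.decode(encoding) for cell in row]
    else:
        # Later versions: csv.reader takes text, so decode while reading.
        with io.open(path, encoding=encoding, newline='') as handle:
            for row in csv.reader(handle, **reader_kwargs):
                yield row


# Demo with a throwaway UTF-8 file containing one row.
fd, demo_path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as handle:
    handle.write(u'equa\xe7\xe3o,1\n'.encode('utf-8'))

rows = list(read_unicode_rows(demo_path, delimiter=','))
os.remove(demo_path)
```

Every cell in rows is then a unicode string, so a value such as rows[0][0] can be passed on without any implicit ASCII decoding along the way.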

--
Gabriel Genellina

 