UnicodeEncodeError when not running script from IDE

 
 
Magnus Pettersson
02-12-2013
I am using Eclipse to write my Python scripts, and when I run them from inside Eclipse they work fine without errors.

But in almost every script that handles some form of special characters, like Swedish åäö and Chinese characters, I get Unicode errors when running the script externally with python.exe or pythonw.exe. The scripts run completely fine from within Eclipse (standard PyDev projects, Python 2.7). I have usually launched the script GUI from within Eclipse because of this error, but now I want to get to the bottom of this so I don't have to open Eclipse every time I want to run a script!

Here is the error I get now when running the script with python.exe:
UnicodeEncodeError: 'charmap' codec can't encode character u'\u898b' in position 32: character maps to <undefined>

What can I do to fix this?
 
Andrew Berg
02-12-2013
On 2013.02.12 04:43, Magnus Pettersson wrote:
> I am using Eclipse to write my python scripts and when i run them from inside eclipse they work fine without errors.
>
> But almost in every script that handle some form of special characters like swedish åäö and chinese characters etc i get Unicode errors when running the script externally with python.exe or pythonw.exe (but the scripts run completely fine from within Eclipse (standard pydev projects, python2.7)). I have usually launched the script gui from within eclipse because of this error but now i want to get to the bottom of this so i don't have to open eclipse every time i want to run a script!
>
> Here is the error i get now when running the script with python.exe:
> UnicodeEncodeError: 'charmap' codec can't encode character u'\u898b' in position 32: character maps to <undefined>
>
> what can i do to fix this?
>

Since you didn't say what code actually does this, I'll turn to my
crystal ball. It says you are trying to print characters to a terminal
that doesn't support them. If that is the case, you could try changing
the code page (but only 3.3 supports cp65001, so that probably won't
help) or use replacement characters when printing.
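
For illustration, the replacement-character idea could look roughly like the sketch below. The helper name `printable` and the choice of cp437 as the console code page are assumptions made for the example; neither comes from the original script.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def printable(text, encoding):
    """Replace characters that `encoding` cannot represent with '?'.

    `printable` is a hypothetical helper, not part of the original script.
    """
    # The "replace" error handler substitutes '?' instead of raising
    # UnicodeEncodeError for characters missing from the code page.
    return text.encode(encoding, "replace").decode(encoding)


# cp437 is assumed here as a typical Windows console code page.
print(printable("kanji: \u898b", "cp437"))  # the kanji is printed as '?'

# Characters the code page does contain (here: Swedish letters) pass through.
print(printable("\xe5\xe4\xf6", "cp437") == "\xe5\xe4\xf6")
```

This loses the unprintable characters in the console output only; the underlying unicode string is untouched and can still be written to a file with a suitable encoding.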

--
CPython 3.3.0 | Windows NT 6.2.9200.16461 / FreeBSD 9.1-RELEASE
 
Steven D'Aprano
02-12-2013
Magnus Pettersson wrote:

> I am using Eclipse to write my python scripts and when i run them from
> inside eclipse they work fine without errors.
>
> But almost in every script that handle some form of special characters
> like swedish åäö and chinese characters etc


A comment: they are not "special" characters. They're merely not American.


> i get Unicode errors when
> running the script externally with python.exe or pythonw.exe (but the
> scripts run completely fine from within Eclipse (standard pydev projects,
> python2.7)). I have usually launched the script gui from within eclipse
> because of this error but now i want to get to the bottom of this so i don't
> have to open eclipse every time i want to run a script!
>
> Here is the error i get now when running the script with python.exe:
> UnicodeEncodeError: 'charmap' codec can't encode character u'\u898b' in
> position 32: character maps to <undefined>


Please show the *complete* traceback, including the line of code that causes
the exception.


> what can i do to fix this?


My guess is that you are trying to print a character which your terminal
cannot display. My terminal is set to use UTF-8, and so it can display it
fine:

py> c = u'\u898b'
py> print(c)
見


(or at least it would display fine if the font used had a glyph for that
character). Why there are still terminals in the world that don't default
to UTF-8 is beyond me.

If I manually change the terminal's encoding to Western European ISO 8859-1,
I get some mojibake:

py> print(c)
è¦


I can't replicate the exception you give, so I assume it is specific to
Windows.
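
Both behaviours can be reproduced without touching any terminal settings, by doing the encode and decode steps by hand. This is only a sketch: cp1252 is an assumed stand-in for the Windows code page, since the thread does not say which one the console actually uses.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

c = "\u898b"

# A UTF-8 terminal receives three bytes for this one character.
utf8_bytes = c.encode("utf-8")

# Interpreted as ISO 8859-1, each byte becomes its own character: mojibake.
# The third one is an invisible control character, so only two are visible.
mojibake = utf8_bytes.decode("iso-8859-1")
print(repr(mojibake))

# A charmap code page with no slot for the character cannot encode it at all.
# cp1252 is an assumption; the actual console code page is not stated.
try:
    c.encode("cp1252")
    error = None
except UnicodeEncodeError as exc:
    error = exc
print(error)
```

The difference is where the conversion happens: in the mojibake case Python emits valid UTF-8 and the terminal misreads it, while in the exception case Python itself has to encode for the code page and refuses.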




--
Steven

 
Magnus Pettersson
02-12-2013
Ahh, so it's the actual printing that makes it error out outside of Eclipse, because it's printing to a different terminal. It's the default DOS terminal in Windows that runs when I run the script with python.exe, and I guess it's the same when I run with pythonw.exe, just that the terminal window is not opened up, only the PyQt GUI in this case.

I will try to fix it now that I know what it is.

I never thought about the terminal. Last time I had the same problem I was just playing around for hours with unicode encode and decode and all that not-so-fun stuff.

Andrew Berg: Thanks, your crystal ball seems to be right.


 
Magnus Pettersson
02-12-2013
I have now tried removing the printing to the terminal and keeping only the writing to a .txt file on disk (which is the script's purpose):

with open(filepath,"a") as f:
    for card in cardlist:
        f.write(card+"\n")

The file it writes to exists and I'm just appending to it. When I run the script through Eclipse, all is fine. When I run it in the terminal I get this error instead:

  File "K:\dev\python\webscraping\kanji_anki.py", line 69, in savefile
    f.write(card+"\n")
UnicodeEncodeError: 'ascii' codec can't encode character u'\u898b' in position 32: ordinal not in range(128)
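
The 'ascii' in this traceback is what Python 2.7 produces when a unicode string is written to a file opened with the built-in open(): the file only accepts bytes, so the string is encoded implicitly with the default codec, which is normally ASCII. A minimal sketch of doing that encoding step explicitly instead; the function name `save_cards`, the sample cards and the temporary file path are illustrative assumptions, not taken from the script.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import os
import tempfile


def save_cards(cardlist, filepath):
    """Append each card as UTF-8 bytes (hypothetical stand-in for savefile)."""
    # Binary mode: the file takes bytes, so the encoding is chosen here
    # rather than left to an implicit default.
    with open(filepath, "ab") as f:
        for card in cardlist:
            f.write((card + "\n").encode("utf-8"))


# Illustrative path in a temp directory, not the path from the thread.
path = os.path.join(tempfile.mkdtemp(), "cards.txt")
save_cards(["\u898b see", "\u53c8 again"], path)

# Read the raw bytes back to check what ended up on disk.
with open(path, "rb") as f:
    data = f.read()
```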

 
Peter Otten
02-12-2013
Magnus Pettersson wrote:

> I have tried now to take away printing to terminal and just keeping the
> writing to a .txt file to disk (which is what the scripts purpose is):
>
> with open(filepath,"a") as f:
>     for card in cardlist:
>         f.write(card+"\n")
>
> The file it writes to exists and im just appending to it, but when i run
> the script through eclipse, all is fine. When i run in terminal i get this
> error instead:
>
>   File "K:\dev\python\webscraping\kanji_anki.py", line 69, in savefile
>     f.write(card+"\n")
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u898b' in
> position 32: ordinal not in range(128)


Are you sure you are writing the same data? That would mean that pydev
changes the default encoding -- which is evil.

A portable approach would be to use codecs.open() or io.open() instead of
the built-in:

import io
with io.open(filepath, "a") as f:
    ...

io.open() uses UTF-8 by default, but you can specify other encodings with
io.open(filepath, mode, encoding=whatever).
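
As a sketch of the io.open() approach with the encoding spelled out, so the result does not depend on whatever default applies on a given machine. The file path and the sample text are assumptions made up for the example.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import io
import os
import tempfile

# Illustrative path, assumed for this example only.
filepath = os.path.join(tempfile.mkdtemp(), "cards.txt")

# Text mode with a stated encoding: write() takes unicode strings and
# io.open() does the encoding to bytes.
with io.open(filepath, "a", encoding="utf-8") as f:
    f.write("\u898b\n")

# Reading back with the same encoding returns the original text.
with io.open(filepath, "r", encoding="utf-8") as f:
    content = f.read()
```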

 
Magnus Pettersson
02-12-2013
> Are you sure you are writing the same data? That would mean that pydev
> changes the default encoding -- which is evil.
>
> A portable approach would be to use codecs.open() or io.open() instead of
> the built-in:
>
> import io
> with io.open(filepath, "a") as f:
>     ...
>
> io.open() uses UTF-8 by default, but you can specify other encodings with
> io.open(filepath, mode, encoding=whatever).



Interesting. PyDev must be doing something behind the scenes, because when I changed open() to io.open() I now get an error inside Eclipse:

    f.write(card+"\n")
  File "C:\python27\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character u'\u53c8' in position 32: character maps to <undefined>

.....

with io.open(filepath, "a", encoding="UTF-8") as f:

Then it works in Eclipse. But I seem to be having an encoding problem all over the place: things work in Eclipse but don't work outside of Eclipse PyDev.

Here is the flow of my data. I'm terrible at using unicode/encode/decode, so I could use some help here:

kanji_anki_gui.py:

def on_addButton_clicked(self):
    #code
    # self.kanji.text() comes from a kanji letter written into a pyqt4 QLineEdit
    kanji = unicode(self.kanji.text())
    card = kanji_anki.scrapeKanji(kanji,tags)
    #more code

kanji_anki.py:

def scrapeKanji(kanji, tags="", onlymeaning=False):
    baseurl = unicode("http://www.romajidesu.com/kanji/")
    url = unicode(baseurl+kanji)
    #test to write out url to disk, works outside of eclipse now
    savefile([url])

    #getting webpage works fine in eclipse, prints "Oh no..." in terminal
    try:
        page = urllib2.urlopen(url)
    except:
        print "OH no website dont work"
        return None

    #Code that does some scraping and returns a string containing kanji letters
    return card

def savefile(cardlist,filepath="D:/iknow_kanji.txt"):
    with io.open(filepath, "a") as f:
        for card in cardlist:
            f.write(card+"\n")
    return True
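
One thing that might explain urlopen() failing outside Eclipse is that the URL contains a raw non-ASCII character; this is a guess, and nothing in the thread confirms it. A common way to rule it out is to percent-encode the UTF-8 bytes of the kanji before building the URL. In the sketch below `build_url` is a hypothetical helper, and it assumes the site expects UTF-8 in its URLs.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

BASE_URL = "http://www.romajidesu.com/kanji/"


def build_url(kanji):
    """Return an ASCII-only URL for one kanji (hypothetical helper)."""
    # Percent-encode the UTF-8 bytes of the character so the URL
    # contains nothing outside ASCII.
    return BASE_URL + quote(kanji.encode("utf-8"))


url = build_url("\u898b")
print(url)
```

Catching a specific exception instead of a bare except: and printing it would also show what actually goes wrong when the page cannot be fetched.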
 
Peter Otten
02-12-2013
Magnus Pettersson wrote:

>> io.open() uses UTF-8 by default, but you can specify other encodings with
>> io.open(filepath, mode, encoding=whatever).
>
> Interesting. Pydev must be doing something behind the scenes because when
> i changed open() to io.open() i get error inside of eclipse now:
>
>     f.write(card+"\n")
>   File "C:\python27\lib\encodings\cp1252.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character u'\u53c8' in
> position 32: character maps to <undefined>
>
> ....
>
> with io.open(filepath, "a", encoding="UTF-8") as f:
>
> Then it works in eclipse. But I seem to be having an encoding problem all
> over the place that works in eclipse but doesn't work outside of eclipse
> pydev.


No, I was wrong about the default; it is actually
locale.getpreferredencoding(). Sorry for the confusion.
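
To see which encoding that is on a given machine, and to confirm that an explicit encoding= takes precedence over it, something like the following sketch can be run both inside and outside Eclipse. The temporary file path is an assumption for the example, and the printed default will differ between systems.

```python
from __future__ import print_function
import io
import locale
import os
import tempfile

# The locale's preferred encoding, which is what io.open() is said above
# to use when no encoding is given.
default_encoding = locale.getpreferredencoding()
print(default_encoding)

# Passing encoding= overrides that default for this one file.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # illustrative path
with io.open(path, "w", encoding="utf-8") as f:
    used_encoding = f.encoding
print(used_encoding)
```

On a Western European Windows installation the first line is typically cp1252, which matches the cp1252.py frame in the traceback quoted above.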


 