

Py3: Read file with Unicode characters

 
 
Gnarlodious
 
      04-08-2010
Attempting to read a file containing Unicode characters such as ±:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
5007: ordinal not in range(128)

I did succeed by converting all the characters to HTML entities such
as "&plusmn;", but I want the actual characters to appear in the
source file. What am I doing wrong? My understanding is that ALL
strings in Py3 are Unicode so... confused.

-- Gnarlie

 
 
 
 
 
Martin v. Loewis
 
      04-08-2010
Gnarlodious wrote:
> Attempting to read a file containing Unicode characters such as ±:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 5007: ordinal not in range(128)
>
> I did succeed by converting all the characters to HTML entities such
> as "&plusmn;", but I want the actual characters to appear in the
> source file. What am I doing wrong? My understanding is that ALL
> strings in Py3 are Unicode so... confused.


When opening the file, you need to specify the file encoding. If you
don't, it defaults to ASCII (in your situation; the specific default
depends on the environment).
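
A quick way to see which default applies on a given machine is sketched below. It assumes a Python 3 version in which text-mode open() falls back to the locale's preferred encoding; the helper name default_text_encoding is made up for this illustration.

```python
import locale


def default_text_encoding():
    """Return the locale's preferred encoding.

    Assumption: on this Python version, open() in text mode uses this
    value when no encoding= argument is passed.
    """
    return locale.getpreferredencoding(False)


# The printed value depends on the environment (locale settings, OS).
print(default_text_encoding())
```

If this prints an ASCII-type encoding, any byte above 0x7f in the file triggers the UnicodeDecodeError quoted above unless an encoding is passed explicitly.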

Regards,
Martin
 
 
 
 
 
Gnarlodious
 
      04-08-2010
On Apr 8, 9:14 am, "Martin v. Loewis" wrote:

> When opening the file, you need to specify the file encoding.


OK, I had tried this:

open(path, 'r').read().encode('utf-8')

however I get error

TypeError: Can't convert 'bytes' object to str implicitly

I had assumed a Unicode string was a Unicode string, so why is it a
bytes string?

Sorry, doing Unicode in Py3 has really been a challenge.

-- Gnarlie
 
 
Martin v. Loewis
 
      04-08-2010
Gnarlodious wrote:
> On Apr 8, 9:14 am, "Martin v. Loewis" wrote:
>
>> When opening the file, you need to specify the file encoding.

>
> OK, I had tried this:
>
> open(path, 'r').read().encode('utf-8')


No, when *opening* the file, you need to specify the encoding:

open(path, 'r', encoding='utf-8').read()
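
For anyone who wants to try this end to end, here is a minimal sketch. The sample text and the temporary file are assumptions made up for the illustration; they stand in for the real file behind `path`.

```python
import os
import tempfile

# Hypothetical sample content containing the plus-minus sign (U+00B1).
text = "tolerance: \u00b1 0.5"

# Create a UTF-8 encoded temporary file to stand in for the real `path`.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write(text)

# Text mode with an explicit encoding: read() returns a str.
with open(path, "r", encoding="utf-8") as f:
    decoded = f.read()

# Binary mode: read() returns the raw bytes stored on disk.
with open(path, "rb") as f:
    raw = f.read()

# Decoding those bytes as ASCII fails, as in the original error report.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

os.remove(path)

print(type(decoded).__name__, type(raw).__name__)
print(ascii_error)
```

In UTF-8 the ± character is stored as the two bytes 0xc2 0xb1, which is where the 0xc2 in the error message comes from. Note that str.encode() goes from str to bytes, so calling it on the result of read() is the reverse of what is needed here: the decoding has to happen while the file is being read.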

> Sorry, doing Unicode in Py3 has really been a challenge.


That's because you need to re-learn some things.

Regards,
Martin
 
 
Gnarlodious
 
      04-08-2010
On Apr 8, 11:04 am, "Martin v. Loewis" wrote:

> That's because you need to re-learn some things.


Apparently so, every little item is a lesson. Thank you.

-- Gnarlie

 
 
 
 
