Velocity Reviews - Computer Hardware Reviews


Character encoding & the copyright symbol

 
 
Robert Dailey
 
      08-06-2009
Hello,

I'm loading a file via open() in Python 3.1 and I'm getting the
following error when I try to print the contents of the file that I
obtained through a call to read():

UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
position 1650: character maps to <undefined>

The file is defined as ASCII and the copyright symbol shows up just
fine in Notepad++. However, Python will not print this symbol. How can
I get this to work? And no, I won't replace it with "(c)". Thanks!
 
Philip Semanchuk
 
      08-06-2009

On Aug 6, 2009, at 12:14 PM, Robert Dailey wrote:

> Hello,
>
> I'm loading a file via open() in Python 3.1 and I'm getting the
> following error when I try to print the contents of the file that I
> obtained through a call to read():
>
> UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
> position 1650: character maps to <undefined>
>
> The file is defined as ASCII and the copyright symbol shows up just
> fine in Notepad++. However, Python will not print this symbol. How can
> I get this to work? And no, I won't replace it with "(c)". Thanks!


If the file is defined as ASCII and it contains 0xa9, then the file
was written incorrectly or you were told the wrong encoding. ASCII
only runs from 0x00 to 0x7f, so it has no such character.

The copyright symbol is 0xa9 in both ISO-8859-1 and windows-1252,
and since you're on Windows the latter is a likely bet.
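The difference is easy to see by decoding the single byte 0xa9 under each of the three encodings. A minimal sketch (decode_byte is a hypothetical helper):

```python
def decode_byte(raw, encoding):
    """Decode raw bytes; return None instead of raising if the codec rejects them."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


for enc in ("ascii", "iso-8859-1", "windows-1252"):
    # ascii() escapes non-ASCII output, so printing can't itself hit the
    # console-encoding error discussed in this thread.
    print(enc, ascii(decode_byte(b"\xa9", enc)))
# ascii None
# iso-8859-1 '\xa9'
# windows-1252 '\xa9'
```

ASCII rejects the byte; both 8-bit encodings map it to U+00A9, the copyright sign.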

http://en.wikipedia.org/wiki/Ascii
http://en.wikipedia.org/wiki/Iso-8859-1
http://en.wikipedia.org/wiki/Windows-1252


Bottom line is that your file is not in ASCII. Try specifying
windows-1252 as the encoding. Without seeing your code I can't tell
you where you need to specify the encoding, but the Python docs should
help you out.
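A minimal sketch of specifying the encoding at open() time. The helper name read_text and the temporary demo file are assumptions standing in for the poster's real file:

```python
import os
import tempfile


def read_text(path, encoding="windows-1252"):
    """Read a whole text file, decoding it with an explicit encoding."""
    with open(path, encoding=encoding) as f:  # closed even if read() raises
        return f.read()


# Demo: a throwaway file containing the byte 0xa9, like the poster's file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"Copyright \xa9 2009\n")
text = read_text(path)
os.remove(path)
print(ascii(text))  # 'Copyright \xa9 2009\n'
```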


HTH
Philip

 
Richard Brodie
 
      08-06-2009

"Robert Dailey" <> wrote in message
news:29ab0981-b95d-4435-91bd-...

> UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
> position 1650: character maps to <undefined>
>
> The file is defined as ASCII.


That's the problem: ASCII is a seven bit code. What you have is
actually ISO-8859-1 (or possibly Windows-1252).

The different ISO-8859-n variants assign various characters to
'\xa9'. Rather than being Western-European centric and assuming
ISO-8859-1 by default, Python throws an error when you stray
outside of strict ASCII.
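A sketch of the point about ISO-8859-n variants: the same byte decodes to a different character depending on which variant you assume, and strict ASCII refuses it outright. The three variants chosen here are just examples:

```python
raw = b"\xa9"
variants = {}
for enc in ("iso-8859-1", "iso-8859-2", "iso-8859-5"):
    variants[enc] = raw.decode(enc)
    # Show the code point, not the glyph, so the console encoding doesn't matter.
    print(enc, "U+%04X" % ord(variants[enc]))
# iso-8859-1 U+00A9  (copyright sign)
# iso-8859-2 U+0160  (S with caron)
# iso-8859-5 U+0409  (Cyrillic LJE)

try:
    raw.decode("ascii")  # strict 7-bit: anything above 0x7f is rejected
except UnicodeDecodeError as exc:
    print("ascii:", exc.reason)  # ascii: ordinal not in range(128)
```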


 
Robert Dailey
 
      08-06-2009
On Aug 6, 11:31 am, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:
> "Robert Dailey" <rcdai...@gmail.com> wrote in message
>
> news:29ab0981-b95d-4435-91bd-...
>
> > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
> > position 1650: character maps to <undefined>

>
> > The file is defined as ASCII.

>
> That's the problem: ASCII is a seven bit code. What you have is
> actually ISO-8859-1 (or possibly Windows-1252).
>
> The different ISO-8859-n variants assign various characters to
> '\xa9'. Rather than being Western-European centric and assuming
> ISO-8859-1 by default, Python throws an error when you stray
> outside of strict ASCII.


Thanks for the help guys. Sorry I left out code, I wasn't sure at the
time if it would be helpful. Below is my code:


#========================================================
def GetFileContentsAsString( file ):
    f = open( file, mode='r', encoding='cp1252' )
    contents = f.read()
    f.close()
    return contents

#========================================================
def ReplaceVersion( file, version, regExps ):
    #match = regExps[0].search( 'FILEVERSION 1,45332,2100,32,' )
    #print( match.group() )
    text = GetFileContentsAsString( file )
    print( text )


As you can see, I am trying to load the file with encoding 'cp1252',
which, according to the Python 3.1 docs, is an alias for windows-1252. I
also tried 'latin_1', which is an alias for ISO-8859-1, but that did not
work either. Am I doing something else wrong?
 
Albert Hopkins
 
      08-06-2009
On Thu, 2009-08-06 at 09:14 -0700, Robert Dailey wrote:
> Hello,
>
> I'm loading a file via open() in Python 3.1 and I'm getting the
> following error when I try to print the contents of the file that I
> obtained through a call to read():
>
> UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
> position 1650: character maps to <undefined>
>
> The file is defined as ASCII and the copyright symbol shows up just
> fine in Notepad++. However, Python will not print this symbol. How can
> I get this to work? And no, I won't replace it with "(c)". Thanks!


It's not actually ASCII; it's Windows-1252, an ASCII-compatible
extension. With that information you can do one of two things. You can
open it in text mode and specify the encoding:

>>> fp = open(filename, 'r', encoding='windows-1252')
>>> s = fp.read()
>>> print(s)


or you can open it in binary mode and decode it later:

>>> fp = open(filename, 'rb')
>>> b = fp.read()
>>> print(str(b, encoding='windows-1252'))
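A sketch comparing the two approaches on a throwaway file (the helper names are hypothetical). Both decode 0xa9 to the same character; the one difference is that text mode also translates \r\n line endings to \n by default, while decoding raw bytes leaves them untouched:

```python
import os
import tempfile


def read_text_mode(path, encoding):
    with open(path, "r", encoding=encoding) as fp:
        return fp.read()


def read_binary_then_decode(path, encoding):
    with open(path, "rb") as fp:
        data = fp.read()
    return str(data, encoding=encoding)  # same as data.decode(encoding)


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"Copyright \xa9 2009\r\n")
via_text = read_text_mode(path, "windows-1252")
via_bytes = read_binary_then_decode(path, "windows-1252")
os.remove(path)
print(ascii(via_text))   # 'Copyright \xa9 2009\n'   (newline translated)
print(ascii(via_bytes))  # 'Copyright \xa9 2009\r\n' (bytes decoded as-is)
```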


Or you may be able to set the default encoding to windows-1252, but I
don't know how to do that on Windows.
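To check what those defaults currently are, a small sketch (the function name current_defaults is hypothetical). On Python 3.7 and later, UTF-8 mode (python -X utf8, or PYTHONUTF8=1 in the environment) makes UTF-8 the default regardless of the locale:

```python
import locale
import sys


def current_defaults():
    """Report the encodings Python falls back to when none is given explicitly."""
    return {
        # The locale's preferred encoding; open() in text mode falls back to
        # this when encoding= is omitted.
        "open_default": locale.getpreferredencoding(False),
        # What print() encodes to; None if stdout was replaced by an object
        # without an encoding attribute.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, enc in current_defaults().items():
    print(name, enc)
```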

p.s.

Next time it might be helpful to paste a code snippet; otherwise we have
to make assumptions about what you are actually doing.

 
Richard Brodie
 
      08-06-2009

"Robert Dailey" <> wrote in message
news:f64f9830-c416-41b1-a510-...

> As you can see, I am trying to load the file with encoding 'cp1252'
> which, according to the python 3.1 docs, translates to windows-1252. I
> also tried 'latin_1', which translates to ISO-8859-1, but this did not
> work either. Am I doing something else wrong?


Probably it's just the debugging print that has a problem; if you wrote
to an output file opened with an encoding specified, it would be fine.
When you get a UnicodeEncodeError, it's the conversion _from_
Unicode that has failed.
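To illustrate the direction of the failure, a sketch that decodes successfully and then fails only when encoding. cp437 is an assumption here, standing in for a Windows console code page that lacks the copyright sign; the temporary output file and the UTF-8 choice are likewise illustrative:

```python
import os
import tempfile

text = b"Copyright \xa9 2009".decode("cp1252")  # decoding succeeds

# Encoding *from* Unicode to a code page without U+00A9 fails.
encode_failed = False
try:
    text.encode("cp437")
except UnicodeEncodeError as exc:
    encode_failed = True
    print("encode failed:", exc.reason)  # character maps to <undefined>

# Writing to a file opened with an encoding that can represent U+00A9 works.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w", encoding="utf-8") as out:
    out.write(text)
with open(path, "rb") as f:
    written = f.read()
os.remove(path)
print(written)  # b'Copyright \xc2\xa9 2009'
```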


 
Philip Semanchuk
 
      08-06-2009

On Aug 6, 2009, at 12:41 PM, Robert Dailey wrote:

> On Aug 6, 11:31 am, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:
>> "Robert Dailey" <rcdai...@gmail.com> wrote in message
>>
>> news:29ab0981-b95d-4435-91bd-...
>>
>>> UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
>>> position 1650: character maps to <undefined>

>>
>>> The file is defined as ASCII.

>>
>> That's the problem: ASCII is a seven bit code. What you have is
>> actually ISO-8859-1 (or possibly Windows-1252).
>>
>> The different ISO-8859-n variants assign various characters to
>> '\xa9'. Rather than being Western-European centric and assuming
>> ISO-8859-1 by default, Python throws an error when you stray
>> outside of strict ASCII.

>
> Thanks for the help guys. Sorry I left out code, I wasn't sure at the
> time if it would be helpful. Below is my code:
>
>
> #========================================================
> def GetFileContentsAsString( file ):
>     f = open( file, mode='r', encoding='cp1252' )
>     contents = f.read()
>     f.close()
>     return contents
>
> #========================================================
> def ReplaceVersion( file, version, regExps ):
>     #match = regExps[0].search( 'FILEVERSION 1,45332,2100,32,' )
>     #print( match.group() )
>     text = GetFileContentsAsString( file )
>     print( text )
>
>
> As you can see, I am trying to load the file with encoding 'cp1252',
> which, according to the Python 3.1 docs, is an alias for windows-1252. I
> also tried 'latin_1', which is an alias for ISO-8859-1, but that did not
> work either. Am I doing something else wrong?



Are you getting the error when you read the file or when you
print(text)?

As a side note, you should probably use something other than "file"
for the parameter name in GetFileContentsAsString() since file() is a
Python function.




 
Nobody
 
      08-06-2009
On Thu, 06 Aug 2009 09:14:08 -0700, Robert Dailey wrote:

> I'm loading a file via open() in Python 3.1 and I'm getting the
> following error when I try to print the contents of the file that I
> obtained through a call to read():
>
> UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
> position 1650: character maps to <undefined>
>
> The file is defined as ASCII and the copyright symbol shows up just
> fine in Notepad++. However, Python will not print this symbol. How can
> I get this to work? And no, I won't replace it with "(c)". Thanks!


1. As others have said, your file *isn't* ASCII, but that isn't the
problem.

2. The problem is that the encoding your standard output stream uses
has no copyright symbol. You need to use something like:

import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='iso-8859-1')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='iso-8859-1')

to fix the encoding of the stdout and stderr streams.
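This can be tried without touching the real console. A sketch, assuming an in-memory io.BytesIO stands in for stdout's underlying buffer and using utf-8 rather than iso-8859-1 purely for illustration; the helper name rewrap_text_stream is hypothetical:

```python
import io


def rewrap_text_stream(stream, encoding, errors="strict"):
    """Detach the binary buffer from a text stream and wrap it with a new encoding.

    The original stream object is unusable afterwards; use the returned one.
    """
    return io.TextIOWrapper(stream.detach(), encoding=encoding, errors=errors)


# In-memory stand-in for stdout, with a code page that lacks U+00A9.
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="cp437")
fake_stdout = rewrap_text_stream(fake_stdout, "utf-8")
fake_stdout.write("Copyright \xa9 2009")
fake_stdout.flush()
print(fake_stdout.buffer.getvalue())  # b'Copyright \xc2\xa9 2009'

# Real use would be: sys.stdout = rewrap_text_stream(sys.stdout, "utf-8")
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding=...) changes the encoding in place without detaching; passing errors='replace' trades the exception for '?' substitutions.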

 
Reply With Quote
 
Martin v. Löwis
 
      08-06-2009
> As a side note, you should probably use something other than "file" for
> the parameter name in GetFileContentsAsString() since file() is a Python
> function.


Python 3.1.1a0 (py3k:74094, Jul 19 2009, 13:39:42)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> file
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'file' is not defined
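The same thing can be checked without provoking a NameError. A small sketch using the builtins module (the Python 3 name for what was __builtin__ in 2.x):

```python
import builtins

# Python 3 removed the file() built-in; open() is how you get a file object.
print(hasattr(builtins, "file"))  # False
print(hasattr(builtins, "open"))  # True
```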

Regards,
Martin

 
Philip Semanchuk
 
      08-06-2009

On Aug 6, 2009, at 3:14 PM, Martin v. Löwis wrote:

>> As a side note, you should probably use something other than "file"
>> for the parameter name in GetFileContentsAsString() since file() is a
>> Python function.

>
> Python 3.1.1a0 (py3k:74094, Jul 19 2009, 13:39:42)
> [GCC 4.3.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> py> file
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> NameError: name 'file' is not defined



Whoops, I didn't know about that change from 2.x to 3.x. Thanks.

 