Velocity Reviews - Computer Hardware Reviews

codecs.open on Win32 -- converting my newlines to CR+LF

 
 
Ryan McGuire
Guest
Posts: n/a
 
      08-27-2009
I've got a UTF-8 encoded text file from Linux with standard newlines
("\n").

I'm reading this file on Win32 with Python 2.6:

codecs.open("whatever.txt","r","utf-8").read()

Inexplicably, all the newlines ("\n") are replaced with CR+LF
("\r\n") ... Why?

As a workaround I'm having to do this:

open("whatever.txt","r").read().decode("utf-8")

which appropriately does not alter my newlines.
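
For comparison, here is a minimal sketch of the same read-then-decode idea with an explicit binary mode ("rb" rather than the "r" used above), so the bytes reach decode() untranslated on any platform. The helper name and the sample file are made up for illustration. Counting b"\r\n" in the raw bytes is one way to check what is actually on disk.

```python
import os
import tempfile


def read_utf8_bytes_then_decode(path):
    """Read the raw bytes with no newline translation, then decode as UTF-8."""
    with open(path, "rb") as f:
        raw = f.read()
    return raw, raw.decode("utf-8")


# Hypothetical sample file with LF-only line endings, standing in for whatever.txt.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(b"alpha\nbeta\n")

raw, text = read_utf8_bytes_then_decode(sample_path)
os.remove(sample_path)

# Number of CR+LF pairs present in the file itself, before any decoding.
crlf_on_disk = raw.count(b"\r\n")
print(crlf_on_disk, repr(text))
```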

What really gets me confused though is the Python docs for
codecs.open:

"Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using
8-bit values. This means that no automatic conversion of '\n' is done on
reading and writing."

The way I read that, codecs.open should not touch my newlines. What am
I doing wrong? Is this a bug in Python, or in the docs, or both?
 
 
 
 
 
Philip Semanchuk
Guest
Posts: n/a
 
      08-27-2009

On Aug 26, 2009, at 10:52 PM, Ryan McGuire wrote:

> I've got a UTF-8 encoded text file from Linux with standard newlines
> ("\n").
>
> I'm reading this file on Win32 with Python 2.6:
>
> codecs.open("whatever.txt","r","utf-8").read()
>
> Inexplicably, all the newlines ("\n") are replaced with CR+LF
> ("\r\n") ... Why?


Try using "rb" instead of "r" for the mode in the call to open().
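
A minimal sketch of that suggestion, assuming a small LF-only sample file created just for the demonstration (the helper name is made up here): pass "rb" explicitly to codecs.open and check that no carriage returns show up in the decoded text.

```python
import codecs
import os
import tempfile


def read_utf8_with_codecs(path):
    """Open with an explicit binary mode and let codecs decode UTF-8."""
    with codecs.open(path, "rb", "utf-8") as f:
        return f.read()


# Hypothetical sample file with LF-only line endings and one non-ASCII character.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(u"first line\nsecond line \u00e9\n".encode("utf-8"))

decoded = read_utf8_with_codecs(sample_path)
os.remove(sample_path)

# True if any carriage return ended up in the decoded text.
has_carriage_return = u"\r" in decoded
print(has_carriage_return, repr(decoded))
```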

HTH
Philip

 
 
 
 
 
Ryan McGuire
Guest
Posts: n/a
 
      08-27-2009
On Aug 26, 11:04 pm, Philip Semanchuk <phi...@semanchuk.com> wrote:
> Try using "rb" instead of "r" for the mode in the call to open().
>
> HTH
> Philip


That does indeed fix the problem, thanks! Still seems like the docs
are wrong though.
 
 
Chris Rebert
Guest
Posts: n/a
 
      08-27-2009
On Wed, Aug 26, 2009 at 8:40 PM, Ryan McGuire wrote:
> On Aug 26, 11:04 pm, Philip Semanchuk <phi...@semanchuk.com> wrote:
>> Try using "rb" instead of "r" for the mode in the call to open().
>>
>> HTH
>> Philip

>
> That does indeed fix the problem, thanks! Still seems like the docs
> are wrong though.


Yeah, the need to specify "b" does seem rather incongruous:

codecs.open(filename, mode[, encoding[, errors[, buffering]]])
[...]
Note: Files are always opened in binary mode, even if no binary
mode was specified. This is done to avoid data loss due to encodings
using 8-bit values. This means that no automatic conversion of b'\n'
is done on reading and writing.

File a bug perhaps?: http://bugs.python.org/

Cheers,
Chris
--
http://blog.rebertia.com
 
 
Chris Rebert
Guest
Posts: n/a
 
      08-27-2009
On Wed, Aug 26, 2009 at 11:06 PM, Chris Rebert wrote:
> On Wed, Aug 26, 2009 at 8:40 PM, Ryan McGuire wrote:
>> On Aug 26, 11:04 pm, Philip Semanchuk <phi...@semanchuk.com> wrote:
>>> Try using "rb" instead of "r" for the mode in the call to open().
>>>
>>> HTH
>>> Philip

>>
>> That does indeed fix the problem, thanks! Still seems like the docs
>> are wrong though.

>
> Yeah, the need to specify "b" does seem rather incongruous:
>
> codecs.open(filename, mode[, encoding[, errors[, buffering]]])
>    [...]
>    Note: Files are always opened in binary mode, even if no binary
> mode was specified. This is done to avoid data loss due to encodings
> using 8-bit values. This means that no automatic conversion of b'\n'
> is done on reading and writing.
>
> File a bug perhaps?: http://bugs.python.org/


Ah, I see you already did: http://bugs.python.org/issue6788

- Chris
 
 
 
 