Python 2.6 StreamReader.readline()

 
 
cpppwner@gmail.com
      07-24-2012
Hi,

I have a simple question. I'm using something like the following lines in Python 2.6.2:

import codecs

reader = codecs.getreader(encoding)
lines = []
with open(filename, 'rb') as f:
    # decode the byte stream and split it into lines without line endings
    lines = reader(f, 'strict').readlines(keepends=False)

where encoding == 'utf-16-be'.
Everything works fine, except that lines[0] is equal to codecs.BOM_UTF16_BE.
Is this behaviour correct, that the BOM is still present?

Thanks in advance for your help.

Best,
Stefan
 
 
Ulrich Eckhardt
      07-25-2012
On 24.07.2012 17:01, (E-Mail Removed) wrote:
> reader = codecs.getreader(encoding)
> lines = []
> with open(filename, 'rb') as f:
>     lines = reader(f, 'strict').readlines(keepends=False)
>
> where encoding == 'utf-16-be'
> Everything works fine, except that lines[0] is equal to codecs.BOM_UTF16_BE
> Is this behaviour correct, that the BOM is still present?


Yes, assuming the first line only contains that BOM. Technically it's a
space character, and why should those be removed?

Uli
 
 
Walter Dörwald
      07-25-2012
On 25.07.12 08:09, Ulrich Eckhardt wrote:

> On 24.07.2012 17:01, (E-Mail Removed) wrote:
>> reader = codecs.getreader(encoding)
>> lines = []
>> with open(filename, 'rb') as f:
>>     lines = reader(f, 'strict').readlines(keepends=False)
>>
>> where encoding == 'utf-16-be'
>> Everything works fine, except that lines[0] is equal to
>> codecs.BOM_UTF16_BE
>> Is this behaviour correct, that the BOM is still present?

>
> Yes, assuming the first line only contains that BOM. Technically it's a
> space character, and why should those be removed?


If the first "character" in the file is a BOM the file encoding is
probably not utf-16-be but utf-16.
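
A minimal sketch of that difference, assuming a small in-memory sample instead of the original file (the sample text and the helper name read_lines are illustrative, not from the thread). With utf-16-be the byte order is taken as given, so a leading BOM is decoded as the character U+FEFF; with utf-16 the BOM is used to detect the byte order and is dropped.

```python
import codecs
import io

# Illustrative sample: a big-endian BOM followed by UTF-16-BE encoded text.
data = codecs.BOM_UTF16_BE + u"abc\ndef\n".encode("utf-16-be")


def read_lines(raw, encoding):
    """Decode raw bytes with a codecs StreamReader, as in the question."""
    reader = codecs.getreader(encoding)
    return reader(io.BytesIO(raw), "strict").readlines(keepends=False)


# The BOM survives as U+FEFF at the start of the first line.
lines_be = read_lines(data, "utf-16-be")

# The BOM is consumed while detecting the byte order.
lines_utf16 = read_lines(data, "utf-16")

print(repr(lines_be))
print(repr(lines_utf16))
```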

Servus,
Walter

 
 
wxjmfauth@gmail.com
      07-25-2012
On Wednesday, July 25, 2012 11:02:01 AM UTC+2, Walter Dörwald wrote:
> On 25.07.12 08:09, Ulrich Eckhardt wrote:
>
> > On 24.07.2012 17:01, (E-Mail Removed) wrote:
> >> reader = codecs.getreader(encoding)
> >> lines = []
> >> with open(filename, 'rb') as f:
> >>     lines = reader(f, 'strict').readlines(keepends=False)
> >>
> >> where encoding == 'utf-16-be'
> >> Everything works fine, except that lines[0] is equal to
> >> codecs.BOM_UTF16_BE
> >> Is this behaviour correct, that the BOM is still present?
> >
> > Yes, assuming the first line only contains that BOM. Technically it's a
> > space character, and why should those be removed?
>
> If the first "character" in the file is a BOM the file encoding is
> probably not utf-16-be but utf-16.
>
> Servus,
> Walter


The byte order mark, if present, is nothing more than an encoded
ZERO WIDTH NO-BREAK SPACE *code point* (U+FEFF):

>>> import unicodedata as ud
>>> ud.name('\ufeff')
'ZERO WIDTH NO-BREAK SPACE'

Five "BOMs" are possible (Unicode Consortium), one each for UTF-8
(Python's utf-8-sig codec), UTF-16-BE, UTF-16-LE, UTF-32-BE and
UTF-32-LE. The codecs module provides many aliases.

Whether utf-16/utf-32 corresponds to -le or to -be may vary with
the platform, the compiler, ...

>>> import codecs, sys
>>> sys.version
'3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]'
>>> codecs.BOM_UTF16_BE
b'\xfe\xff'
>>> codecs.BOM_UTF16_LE
b'\xff\xfe'
>>> codecs.BOM_UTF16
b'\xff\xfe'
>>>
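
A small sketch of the platform dependence shown above, assuming only the standard codecs and sys modules (the variable names are illustrative). It checks that codecs.BOM_UTF16 matches the byte order of the running machine, that encoding with plain utf-16 writes that BOM, and that the explicit -be codec writes none.

```python
import codecs
import sys

# codecs.BOM_UTF16 is the BOM in the native byte order of this machine.
if sys.byteorder == "little":
    native_bom = codecs.BOM_UTF16_LE
else:
    native_bom = codecs.BOM_UTF16_BE
print(codecs.BOM_UTF16 == native_bom)

# Encoding with plain "utf-16" prepends the native BOM ...
with_bom = u"a".encode("utf-16")
print(with_bom.startswith(codecs.BOM_UTF16))

# ... while the explicit big-endian codec writes no BOM at all.
without_bom = u"a".encode("utf-16-be")
print(repr(without_bom))
```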


---

As far as I know, neither Py 2.7 nor Py 3.2 ever returns a "BOM" when
a file is read with the correct encoding.

>>> with open('a-utf-16-be.txt', 'r', encoding='utf-16-be') as f:
...     r = f.readlines()
...     for zeile in r:
...         print(zeile.rstrip())
...
abc
lve
cur
uro
>>>
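
If a file does start with a BOM and still has to be decoded as utf-16-be, one possible workaround is to drop a leading U+FEFF after decoding. This is only a sketch under that assumption; strip_bom is a hypothetical helper, not part of the codecs module, and the sample data is made up.

```python
import codecs
import io


def strip_bom(lines):
    """Return lines with a leading U+FEFF removed from the first line, if any."""
    if lines and lines[0].startswith(u"\ufeff"):
        return [lines[0][1:]] + lines[1:]
    return lines


# Illustrative sample: BOM plus two lines of UTF-16-BE text.
data = codecs.BOM_UTF16_BE + u"abc\ndef\n".encode("utf-16-be")
reader = codecs.getreader("utf-16-be")
lines = reader(io.BytesIO(data), "strict").readlines(keepends=False)

cleaned = strip_bom(lines)
print(repr(cleaned))
```

Lines without a leading U+FEFF are returned unchanged, so the helper is safe to apply to files that have no BOM.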



jmf

 