Velocity Reviews - Computer Hardware Reviews


PEP 8: Byte Order Mark (BOM) vs coding cookie

twyk
      08-24-2008
PEP 8 says ...

Files using ASCII (or UTF-8, for Python 3.0) should not have a coding
cookie.
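A minimal sketch of what that rule means in practice, assuming Python 3 and only the standard library: `tokenize.detect_encoding()` reports the encoding the interpreter would use for a source file, and a `coding: utf-8` cookie gives the same answer as no cookie at all. The helper name `source_encoding` is made up for this illustration.

```python
import io
import tokenize


def source_encoding(data: bytes) -> str:
    """Return the encoding Python's tokenizer would pick for these source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


plain = "name = 'café'\n".encode("utf-8")
with_cookie = b"# -*- coding: utf-8 -*-\n" + plain

print(source_encoding(plain))        # utf-8 (the Python 3 default)
print(source_encoding(with_cookie))  # utf-8 (the cookie changes nothing)
```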

What about a BOM (Byte Order Mark)? Per Wikipedia ...

http://en.wikipedia.org/wiki/Byte-or...#endnote_UTF-8)

'In UTF-8, this is not really a "byte order" mark. It identifies the
text as UTF-8 but doesn't say anything about the byte order, because
UTF-8 does not have byte order issues.'

So is it good style to omit the BOM in UTF-8 for Python 3.0?
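For illustration (Python 3, standard library only): the UTF-8 "BOM" is just U+FEFF encoded as the fixed three bytes EF BB BF, so it carries no byte-order information. The `utf-8-sig` codec writes it, plain `utf-8` does not, and the two outputs differ only by that prefix.

```python
import codecs

# U+FEFF encoded as UTF-8 is always the same three bytes.
print(codecs.BOM_UTF8)                                # b'\xef\xbb\xbf'
print("\ufeff".encode("utf-8") == codecs.BOM_UTF8)    # True

text = "print('hi')\n"
with_bom = text.encode("utf-8-sig")   # 'utf-8-sig' prepends the signature
without_bom = text.encode("utf-8")    # plain 'utf-8' writes no signature

print(with_bom[:3] == codecs.BOM_UTF8)  # True
print(with_bom[3:] == without_bom)      # True: the rest is identical
```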
 
Marc 'BlackJack' Rintsch
      08-24-2008
On Sun, 24 Aug 2008 07:28:53 -0700, twyk wrote:

> So is it good style to omit the BOM in UTF-8 for Python 3.0?


I'd say yes, because it is unnecessary with UTF-8 and it messes up the
shebang line of scripts.

Ciao,
Marc 'BlackJack' Rintsch
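A small sketch of the shebang problem, assuming a Unix-like system where the kernel only treats a file as a script if its first two bytes are `#!`. A file saved with a UTF-8 signature starts with EF BB BF instead, so the interpreter line is no longer recognised. `shebang_intact` is a hypothetical helper, not a library function.

```python
def shebang_intact(script: bytes) -> bool:
    """True if the file starts with the two bytes '#!' that the kernel looks for."""
    return script.startswith(b"#!")


body = "#!/usr/bin/env python3\nprint('hello')\n"

print(shebang_intact(body.encode("utf-8")))      # True
print(shebang_intact(body.encode("utf-8-sig")))  # False: EF BB BF comes first
```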
 
Terry Reedy
      08-25-2008


twyk wrote:
> PEP 8 says ...
>
> Files using ASCII (or UTF-8, for Python 3.0) should not have a coding
> cookie.


> What about a BOM (Byte Order Mark)? Per Wikipedia ...
>
> http://en.wikipedia.org/wiki/Byte-or...#endnote_UTF-8)
>
> 'In UTF-8, this is not really a "byte order" mark. It identifies the
> text as UTF-8 but doesn't say anything about the byte order, because
> UTF-8 does not have byte order issues.'
>
> So is it good style to omit the BOM in UTF-8 for Python 3.0?


According to the Unicode manual, yes.

http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf

The endian order entry for UTF-8 in Table 2-4 is marked N/A because
UTF-8 code units are 8 bits in size, and the usual machine issues of
endian order for larger code units do not apply. The serialized order of
the bytes must not depart from the order defined by the UTF-8 encoding
form. Use of a BOM is neither required nor recommended for
UTF-8, but may be encountered in contexts where UTF-8 data is converted
from other encoding forms that use a BOM or where the BOM is used as a
UTF-8 signature. See the “Byte Order Mark” subsection in Section 16.8,
Specials, for more information.
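When UTF-8 data does arrive with the signature (the "converted from other encoding forms" case in the passage above), Python 3's `utf-8-sig` codec is one way to accept it: it drops a leading U+FEFF if present and otherwise decodes exactly like `utf-8`. A sketch; the helper name is made up for illustration.

```python
import codecs


def decode_utf8_maybe_signed(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading U+FEFF signature if present.

    'utf-8-sig' strips the signature when it is there and decodes
    unsigned UTF-8 unchanged when it is not.
    """
    return data.decode("utf-8-sig")


signed = codecs.BOM_UTF8 + "naïve".encode("utf-8")
unsigned = "naïve".encode("utf-8")

print(decode_utf8_maybe_signed(signed))    # naïve
print(decode_utf8_maybe_signed(unsigned))  # naïve
print(repr(signed.decode("utf-8")[0]))     # '\ufeff': plain 'utf-8' keeps it
```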

Since ASCII files *are*, by deliberate design, UTF-8 files, and since
Python assumes ASCII (2.x) or UTF-8 (3.0) as the default source encoding
in the absence of a coding cookie, a source file does not need the signature.
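To check that reasoning against Python 3's own tokenizer (a sketch using the standard `tokenize.detect_encoding()`): pure ASCII bytes decode identically as ASCII and as UTF-8, and with no cookie the tokenizer picks `utf-8`. A leading signature is tolerated and reported as `utf-8-sig`, but the decoded source is the same, which is why the BOM adds nothing.

```python
import io
import tokenize

ascii_source = b"x = 1\n"                      # pure ASCII, hence valid UTF-8
bom_source = b"\xef\xbb\xbf" + ascii_source    # same source with a UTF-8 signature

print(ascii_source.decode("ascii") == ascii_source.decode("utf-8"))  # True

results = []
for data in (ascii_source, bom_source):
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    results.append((encoding, data.decode(encoding)))  # 'utf-8-sig' drops the BOM

print(results)  # [('utf-8', 'x = 1\n'), ('utf-8-sig', 'x = 1\n')]
```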
