Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Why BOM in logging message?

Reply
Thread Tools

Why BOM in logging message?

 
 
Roy Smith
Guest
Posts: n/a
 
      01-09-2013
We've got 10 (supposedly) identical servers, all running Ubuntu 12.04,
Python 2.7, Django 1.3. We log to syslog using the logging module and
a custom fomatter.

'formatters': {
'verbose': {
'format': '%(asctime)s [%(process)d]: %(program)s %(session_id)s %(request_id)s %(request_id_digest)s %(remote_addr)s %(name)s %(level\
name)s %(funcName)s() %(message)s',
'()': 'songza.logging.ContextFormatter',
},
},

There's nothing particularly exciting in the formatter code:

class ContextFormatter(logging.Formatter):
def format(self, record):
record.program = context.get_program()
record.request_id = context.get_request_id()
record.request_id_digest = context.get_request_id_digest()
record.session_id = context.get_session_id() or '-'
record.remote_addr = context.get_remote_addr() or '-'
return logging.Formatter.format(self, record)

What's weird is that two of the servers, and only those two, stick a
BOM (Byte Order Mark) in front of the message they log. It shows up
in syslog as:

2013-01-09T00:00:00+00:00 web5.songza.com <U+FEFF>2013-01-0900:00:00,754 [18979]: [etc...]

The other machines, never put the BOM in. Given that all 10 machines
are ostensibly clones of each other, we're scratching our heads what
might be different about those two which cause the BOMs to appear.
Any ideas?

I suppose it's possible it's a syslog config problem and not a Python
problem, but I figured I'd start at the beginning of the chain.
 
Reply With Quote
 
 
 
 
Chris Angelico
Guest
Posts: n/a
 
      01-09-2013
On Thu, Jan 10, 2013 at 9:54 AM, Roy Smith <(E-Mail Removed)> wrote:
> What's weird is that two of the servers, and only those two, stick a
> BOM (Byte Order Mark) in front of the message they log.


Could it be this issue you're looking at?

http://bugs.python.org/issue14452

What are the exact Python versions in use? Are the two different
servers running an older revision of Py 2.7?

ChrisA
 
Reply With Quote
 
 
 
 
John Gordon
Guest
Posts: n/a
 
      01-09-2013
In <kcksg2$bkp$(E-Mail Removed)> http://www.velocityreviews.com/forums/(E-Mail Removed) (Roy Smith) writes:

> What's weird is that two of the servers, and only those two, stick a
> BOM (Byte Order Mark) in front of the message they log. It shows up
> in syslog as:


> 2013-01-09T00:00:00+00:00 web5.songza.com <U+FEFF>2013-01-0900:00:00,754 [18979]: [etc...]


I worked on an application that would insert a BOM in syslog messages if
the logged message contained unicode, but not if it was plain ascii.

Not sure if this relates to your issue, but it's similar enough that it
seemed worth mentioning.

--
John Gordon A is for Amy, who fell down the stairs
(E-Mail Removed) B is for Basil, assaulted by bears
-- Edward Gorey, "The Gashlycrumb Tinies"

 
Reply With Quote
 
Roy Smith
Guest
Posts: n/a
 
      01-10-2013
In article <kcktti$sbg$(E-Mail Removed)>,
John Gordon <(E-Mail Removed)> wrote:

> In <kcksg2$bkp$(E-Mail Removed)> (E-Mail Removed) (Roy Smith) writes:
>
> > What's weird is that two of the servers, and only those two, stick a
> > BOM (Byte Order Mark) in front of the message they log. It shows up
> > in syslog as:

>
> > 2013-01-09T00:00:00+00:00 web5.songza.com <U+FEFF>2013-01-0900:00:00,754
> > [18979]: [etc...]

>
> I worked on an application that would insert a BOM in syslog messages if
> the logged message contained unicode, but not if it was plain ascii.
>
> Not sure if this relates to your issue, but it's similar enough that it
> seemed worth mentioning.


That doesn't seem to be it. All messages from web{2,5} have BOMs, no
message from web{1,3,4,6,7,8,9,10} ever does.

I even tried looking at the output of socket.gethostname() on the
various machines to see if maybe the hostname had some unicode character
in it. No joy.
 
Reply With Quote
 
Roy Smith
Guest
Posts: n/a
 
      01-10-2013
In article <(E-Mail Removed)>,
Chris Angelico <(E-Mail Removed)> wrote:

> On Thu, Jan 10, 2013 at 9:54 AM, Roy Smith <(E-Mail Removed)> wrote:
> > What's weird is that two of the servers, and only those two, stick a
> > BOM (Byte Order Mark) in front of the message they log.

>
> Could it be this issue you're looking at?
>
> http://bugs.python.org/issue14452
>
> What are the exact Python versions in use? Are the two different
> servers running an older revision of Py 2.7?
>
> ChrisA


It sounds like it might be it, but we're running 2.7.3 on all machines.
 
Reply With Quote
 
Roy Smith
Guest
Posts: n/a
 
      01-10-2013
In article <(E-Mail Removed)>,
Roy Smith <(E-Mail Removed)> wrote:
>In article <(E-Mail Removed)>,
> Chris Angelico <(E-Mail Removed)> wrote:
>
>> On Thu, Jan 10, 2013 at 9:54 AM, Roy Smith <(E-Mail Removed)> wrote:
>> > What's weird is that two of the servers, and only those two, stick a
>> > BOM (Byte Order Mark) in front of the message they log.

>>
>> Could it be this issue you're looking at?
>>
>> http://bugs.python.org/issue14452
>>
>> What are the exact Python versions in use? Are the two different
>> servers running an older revision of Py 2.7?
>>
>> ChrisA

>
>It sounds like it might be it, but we're running 2.7.3 on all machines.


Well, this is fascinating. It turns out that while all of our
machines report that they're running 2.7.3, they have two different
versions of /usr/lib/python2.7/logging/handlers.py!

-rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
-rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py

The April 24th version has the BOM code in SysLogHander.emit():

| # Message is a string. Convert to bytes as required by RFC 5424
| if type(msg) is unicode:
| msg = msg.encode('utf-8')
| if codecs:
| msg = codecs.BOM_UTF8 + msg
| msg = prio + msg

and the August 1st version doesn't:

| # Message is a string. Convert to bytes as required by RFC 5424
| if type(msg) is unicode:
| msg = msg.encode('utf-8')
| msg = prio + msg

Is it possible there's two different 2.7.3 builds that somehow got
packaged by Ubuntu at different times? The pattern of which machines
have the August code and which have the April code correlates with
when we rolled out each server instance.
 
Reply With Quote
 
Chris Angelico
Guest
Posts: n/a
 
      01-10-2013
On Fri, Jan 11, 2013 at 3:06 AM, Roy Smith <(E-Mail Removed)> wrote:
> Well, this is fascinating. It turns out that while all of our
> machines report that they're running 2.7.3, they have two different
> versions of /usr/lib/python2.7/logging/handlers.py!
>
> -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
> -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py


Ha, that would do it!

I don't have a corresponding system to compare against, but what
package is that file managed by?

$ dpkg -S /usr/lib/python2.7/logging/handlers.py

See if both systems show it as part of the same package, and if so, if
the package is at the same version on each. On my Maverick desktop, I
have 2.6, and the package is python2.6-minimal.

ChrisA
 
Reply With Quote
 
Roy Smith
Guest
Posts: n/a
 
      01-10-2013
>On Fri, Jan 11, 2013 at 3:06 AM, Roy Smith <(E-Mail Removed)> wrote:
>> -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
>> -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py


Chris Angelico <(E-Mail Removed)> wrote:
>$ dpkg -S /usr/lib/python2.7/logging/handlers.py


Yup, on some machines we've got 2.7.3-0ubuntu3, and on others,
2.7.3-0ubuntu3.1 of python2.7-minimal. Details at:

https://launchpad.net/ubuntu/+source...7.3-0ubuntu3.1

Well, I guess today is dist-upgrade day
 
Reply With Quote
 
Chris Angelico
Guest
Posts: n/a
 
      01-10-2013
On Fri, Jan 11, 2013 at 3:40 AM, Roy Smith <(E-Mail Removed)> wrote:
>>On Fri, Jan 11, 2013 at 3:06 AM, Roy Smith <(E-Mail Removed)> wrote:
>>> -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
>>> -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py

>
> Chris Angelico <(E-Mail Removed)> wrote:
>>$ dpkg -S /usr/lib/python2.7/logging/handlers.py

>
> Yup, on some machines we've got 2.7.3-0ubuntu3, and on others,
> 2.7.3-0ubuntu3.1 of python2.7-minimal. Details at:
>
> https://launchpad.net/ubuntu/+source...7.3-0ubuntu3.1


I love it when everything adds up! The message even cites the specific
change. It's like a cryptic crossword - once you get the answer, you
KNOW it's the answer because suddenly it all makes sense

Thanks for bringing a fun problem to solve! It's (unfortunately) a
refreshing change to read a post from someone who knows how to ask
smart questions.

ChrisA
 
Reply With Quote
 
Terry Reedy
Guest
Posts: n/a
 
      01-10-2013
On 1/10/2013 11:06 AM, Roy Smith wrote:

> Well, this is fascinating. It turns out that while all of our
> machines report that they're running 2.7.3, they have two different
> versions of /usr/lib/python2.7/logging/handlers.py!
>
> -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
> -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py
>
> The April 24th version has the BOM code in SysLogHander.emit():


The 'BOM' you are discussing here is not a true 2 or 4 byte
byte-order-mark, but the pseudo-BOM (pseudo because a stream of single
bytes has no byte order within the single bytes) that Micro$tupid adds
(sometimes) to utf-8 encoded bytes to mark their encoding as utf-8
rather than anything else. In otherwords, it is a non-(anti-)standard
U(nicode)E(ncoding)M(ark). It is actually the utf-8 encoding of the
2-byte BOM, and hence not a single mark! It is really confusing when
people use 'BOM' to refer to a UEM sequence instead of a BOM.

> | # Message is a string. Convert to bytes as required by RFC 5424
> | if type(msg) is unicode:
> | msg = msg.encode('utf-8')
> | if codecs:
> | msg = codecs.BOM_UTF8 + msg
> | msg = prio + msg


2.7.3 was released in April.

> and the August 1st version doesn't:
>
> | # Message is a string. Convert to bytes as required by RFC 5424
> | if type(msg) is unicode:
> | msg = msg.encode('utf-8')
> | msg = prio + msg


The issue referenced in an earlier messaged was to remove the UEM where
it did not belong.

> Is it possible there's two different 2.7.3 builds that somehow got
> packaged by Ubuntu at different times?


As you discovered, Ubuntu sometimes updates a release with bug patches
before we release a new official bugfix release. 2.7.4, with many
bugfixes, is still to see the light of day.

> The pattern of which machines
> have the August code and which have the April code correlates with
> when we rolled out each server instance.


Great detective work .

--
Terry Jan Reedy

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Write UTF-8 BOM marker char8s) at the start of file? X_AWieminer_X Java 2 08-18-2011 05:44 PM
why why why why why Mr. SweatyFinger ASP .Net 4 12-21-2006 01:15 PM
findcontrol("PlaceHolderPrice") why why why why why why why why why why why Mr. SweatyFinger ASP .Net 2 12-02-2006 03:46 PM
How to write UTF-16 with BOM in little endian Von: Jean-Marc Autexier <jmau2002@web.de> Datum: Samstag, 30. August 2003 13:35:54 Gruppen: comp.lang.java.help Keine Referenzen Jean-Marc Autexier Java 2 08-30-2003 09:04 PM
TextStreamReader with transparent unicode BOM Support X_AWemner_X Java 3 07-03-2003 09:38 PM



Advertisments