Mapforce: mapping to CSV without column header line inserts hex FF FE FF FE

 
 
Lukas
 
      12-09-2005
Hi Group,

In Mapforce 2005 R3, when mapping to CSV with the "First row contains
field names" option UN-checked on the CSV target component settings,
the characters (hex) FF FE FF FE are inserted in the beginning of the
first line when running Java code autogenerated by Mapforce.

In the output tab of the Mapforce application, this problem doesn't
occur. I've not checked whether it occurs when running C#, C++, or XSLT
autogenerated code.

I've encountered this problem when mapping XML to CSV and CSV to CSV.

Does anyone know whether this is a known bug? Is it fixed in a
later release?
Any known workarounds?

Not holding my breath,

Lukas

 
Lukas
 
      12-12-2005
Correction:

My editor was displaying those bytes incorrectly.
The bytes inserted are actually:

EF BB BF

 
Peter Flynn
 
      12-12-2005
Lukas wrote:

> Hi Group,
>
> In Mapforce 2005 R3, when mapping to CSV with the "First row contains
> field names" option UN-checked on the CSV target component settings,
> the characters (hex) FF FE FF FE are inserted in the beginning of the
> first line when running Java code autogenerated by Mapforce.
>
> In the output tab of the Mapforce application, this problem doesn't
> occur. I've not checked whether it occurs when running C#, C++, or XSLT
> autogenerated code.
>
> I've encountered this problem when mapping XML to CSV and CSV to CSV.
>
> Does anyone know whether this is a known bug? Is it fixed in a
> later release?
> Any known workarounds?


It's not a bug, it's part of XML. It's the Byte Order Mark (BOM) which
is designed to signal to a processor before processing starts which
16-bit character encoding is in use. It's being output because your
processor is emitting UCS-2 which is probably unnecessary unless you
are using a very wide range of character repertoire planes. Check the
Mapforce output settings and switch to UTF-8 instead.

///Peter
--
See FAQ: http://xml.silmaril.ie/appendix/glossary/#bom

 
Richard Tobin
 
      12-13-2005
Lukas wrote:

>My editor was displaying those bytes incorrectly.
>The bytes inserted are actually:
>
>EF BB BF


I can't help you directly, but EF BB BF is the UTF-8 code for a
byte-order mark (or "BOM"). Maybe you can look that up in the manual
for your software.

-- Richard
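
For anyone who wants to check the byte values quoted in this thread, here is a minimal sketch. Python is an assumption made for illustration (the thread itself concerns Mapforce-generated Java), and the helper name hex_bytes is made up. It encodes the BOM character U+FEFF in three encodings and prints the resulting bytes.

```python
import codecs

BOM = "\ufeff"  # U+FEFF, the character used as a byte-order mark


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex."""
    return " ".join(f"{b:02X}" for b in data)


utf8_bom = hex_bytes(BOM.encode("utf-8"))
utf16le_bom = hex_bytes(BOM.encode("utf-16-le"))
utf16be_bom = hex_bytes(BOM.encode("utf-16-be"))

print("UTF-8:    ", utf8_bom)
print("UTF-16 LE:", utf16le_bom)
print("UTF-16 BE:", utf16be_bom)

# The standard library exposes the same byte sequences as constants.
matches_stdlib = codecs.BOM_UTF8 == BOM.encode("utf-8")
```

The UTF-8 form is the three-byte sequence reported in the correction above; the FF FE pair from the first post is what the same character looks like in little-endian UTF-16.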
 
Lukas
 
      12-14-2005
Sorry for the confusion. The sequence was actually EF BB BF (UTF-8 BOM,
as Richard notes).

What confuses me about the UTF-8 BOM issue:

A) In XML: Since I'm using UTF-8, which is a 7 bit encoding, and the
xml processing instruction says so explicitly, why would I want to have
nasty binary at the start of my document?

B)
* In Text (CSV): some articles claim that Windows Notepad handles the
BOM gracefully, but in our project the issue would've not even been
raised if our editors had not displayed spurious characters;
... "" (if you view this in ISO 8859-1) in Notepad, a dot in
Ultraedit 8.2. When switching to hex in Ultraedit, completely wrong
values are being displayed throughout the length of the doc.

* The issue did not occur when (in Mapforce) the option "First row
contains field names" was checked for the output CSV, although we
viewed the output files with the same editors.

* Mapforce ITSELF doesn't handle the BOM gracefully. If the CSV output
with BOM from one Mapforce code-gen mapping is fed as input to another,
the BOM is visible in the first field and trips up functions operating
on that field.
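
One possible workaround for the last point, sketched in Python rather than in the generated Java: decode the file with Python's utf-8-sig codec, which drops a leading BOM if one is present and otherwise behaves like plain UTF-8. The sample bytes and the helper read_rows are hypothetical and are not taken from Mapforce.

```python
import csv
import io

# Hypothetical sample: a headerless CSV file with a UTF-8 BOM (EF BB BF)
# in front of the first field, as described in the post above.
raw = b"\xef\xbb\xbfalpha,1\r\nbeta,2\r\n"


def read_rows(data: bytes, encoding: str) -> list:
    """Decode the bytes with the given codec and parse the text as CSV."""
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.reader(text))


# Plain "utf-8" keeps U+FEFF glued to the first field of the first row.
naive = read_rows(raw, "utf-8")
# "utf-8-sig" strips a leading BOM before the CSV parser sees the text.
cleaned = read_rows(raw, "utf-8-sig")

print(repr(naive[0][0]))
print(repr(cleaned[0][0]))
```

The first value printed still carries the invisible U+FEFF character, which is the kind of thing that trips up string functions on the first field; the second is the bare field value.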

 
Lukas
 
      12-14-2005
Sorry, something doesn't display in my last post. It's meant to read:

...

"ï»¿"

(if you view this in ISO 8859-1) in Notepad, a dot ...

 
Richard Tobin
 
      12-14-2005
Lukas wrote:

>A) In XML: Since I'm using UTF-8, which is a 7 bit encoding, and the
>xml processing instruction says so explicitly, why would I want to have
>nasty binary at the start of my document?


UTF-8 is not a 7-bit encoding! It corresponds to ASCII for characters
up to 127, but uses bytes with the high bit set to encode the rest of
Unicode.

>* In Text (CSV): some articles claim that Windows Notepad handles the
>BOM gracefully, but in our project the issue would've not even been
>raised if our editors had not displayed spurious characters;
>.. "" (if you view this in ISO 8859-1) in Notepad


I don't know anything about Notepad, but if you see those characters -
i with diaeresis, double greater-than, inverted question mark - it
means that the program is interpreting the document as 8859-1 rather
than UTF-8. Of course, the whole point of the UTF-8 BOM is to let it
know that it's in UTF-8!

-- Richard
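
Both points in this reply can be reproduced in a few lines. This is only an illustrative sketch in Python (an assumption; nothing here comes from Mapforce or Notepad): it decodes the three BOM bytes first as ISO 8859-1 and then as UTF-8, and checks the high bit of the bytes UTF-8 uses for a non-ASCII character.

```python
bom_bytes = b"\xef\xbb\xbf"

# Misread as ISO 8859-1, each byte becomes a separate visible character.
as_latin1 = bom_bytes.decode("iso-8859-1")
# Read as UTF-8, the three bytes form the single character U+FEFF.
as_utf8 = bom_bytes.decode("utf-8")

# UTF-8 matches ASCII up to 127; everything else uses bytes with the
# high bit (0x80) set, so it is not a 7-bit encoding.
ascii_bytes = "A".encode("utf-8")
non_ascii_bytes = "\u00e9".encode("utf-8")  # e with acute accent
high_bit_set = all(b & 0x80 for b in non_ascii_bytes)

print(as_latin1, len(as_latin1))
print(len(as_utf8))
print(ascii_bytes.hex(), non_ascii_bytes.hex(), high_bit_set)
```

The ISO 8859-1 reading yields exactly the i with diaeresis, double greater-than, and inverted question mark described above.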
 
Peter Flynn
 
      12-14-2005
Lukas wrote:

> Sorry for the confusion. The sequence was actually EF BB BF (UTF-8
> BOM, as Richard notes).
>
> What confuses me about the UTF-8 BOM issue:
>
> A) In XML: Since I'm using UTF-8, which is a 7 bit encoding,


Whoah there. UTF-8 uses all 8 bits in the byte. Where did you get the
information that it's 7-bit? The only 7-bit encoding in widespread
use is US-ASCII.

> and the
> xml processing instruction says so explicitly, why would I want to
> have nasty binary at the start of my document?


To identify that it is UTF-8 as opposed to UTF-16 or UTF-32.
If your XML software can't handle it, it's broken and should be
replaced.

> B)
> * In Text (CSV): some articles claim that Windows Notepad handles the
> BOM gracefully, but in our project the issue would've not even been
> raised if our editors had not displayed spurious characters;
> .. "" (if you view this in ISO 8859-1) in Notepad, a dot in
> Ultraedit 8.2. When switching to hex in Ultraedit, completely wrong
> values are being displayed throughout the length of the doc.


While most plaintext editors will display ASCII or ISO-8859-1
adequately, large numbers of them spit blood when faced with anything
else. Notepad is suitable for shopping lists and not much else.

> * The issue did not occur when (in Mapforce) the option "First row
> contains field names" was checked for the output CSV, although we
> viewed the output files with the same editors.
>
> * Mapforce ITSELF doesn't handle the BOM gracefully. If the CSV output
> with BOM from one Mapforce code-gen mapping is fed as input to
> another, the BOM is visible in the first field and trips up functions
> operating on that field.


Sounds like Mapforce is broken and you should complain to the vendor.

///Peter
--
XML FAQ: http://xml.silmaril.ie/

 
Shmuel (Seymour J.) Metz
 
      12-19-2005
On 12/14/2005 at 12:59 PM, Richard Tobin said:

>I don't know anything about Notepad, but if you see those characters -
>i with diaeresis, double greater-than, inverted question mark - it
>means that the program is interpreting the document as 8859-1 rather
>than UTF-8. Of course, the whole point of the UTF-8 BOM is to let it
>know that it's in UTF-8!


Why would you need a BOM for UTF-8? It's only needed for characters
larger than an octet, e.g., UTF-16, raw UCS4.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to (E-Mail Removed)

 
Richard Tobin
 
      12-19-2005
Shmuel (Seymour J.) Metz wrote:

>Why would you need a BOM for UTF-8? It's only needed for characters
>larger than an octet, e.g., UTF-16, raw UCS4.


It also serves to indicate the encoding, as well as which byte-order
variant.

-- Richard
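
As a rough illustration of the BOM acting as an encoding signature, a reader can compare the first bytes of a file against the known marks. The function name sniff_bom and the sample inputs are made up for this sketch; the byte sequences come from Python's standard codecs module.

```python
import codecs

# Longer marks are checked first: the UTF-32 LE mark (FF FE 00 00)
# begins with the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


print(sniff_bom(b"\xef\xbb\xbfa,b"))
print(sniff_bom(b"\xff\xfea\x00"))
print(sniff_bom(b"a,b"))
```

With no BOM present the function cannot say anything about the encoding, so a caller would have to fall back on other information such as an XML declaration or a configured default.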
 