XML and Java: euro signs disappear

 
 
flm
 
      05-11-2005
I've got an XML document that contains euro signs and looks like this:

<?xml version="1.0" encoding="utf-8"?>
<merchant id="52">
<product
offerid="03543068131"
deliverycost="6,90 €"
/>
....

I use this bit of Java (JDK 1.4.2) code to parse it:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse( file_ );

The problem is that the euro signs are transformed into the character '?'
(printing the value of getAttribute( "deliverycost" ) gives ? on a
UTF-8 terminal).

Thanks for any help,
FL

 
 
 
 
 
David Carlisle
 
      05-11-2005

You have declared that your XML file is UTF-8 encoded but have used (as
far as I can tell) a byte with value 128 to represent the euro, which isn't
the UTF-8 encoding of character 8364 (U+20AC), the euro sign.
You either need to declare the encoding that you are actually using or
express the character in an encoding-neutral form such as
"& # 8364 ;"
(without the spaces).

David
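
A minimal sketch of the point above, assuming a JDK newer than the 1.4.2 from the question (StandardCharsets needs Java 7 or later). The class and helper names are invented for illustration. It prints the UTF-8 bytes of character 8364 and checks that a lone byte 128 is not well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EuroUtf8Bytes {
    // Formats bytes as upper-case hex pairs separated by spaces.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Returns true only if the bytes decode as UTF-8 without any error.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8Euro = "\u20AC".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8Euro));                           // E2 82 AC
        System.out.println(isValidUtf8(utf8Euro));                   // true
        System.out.println(isValidUtf8(new byte[] { (byte) 0x80 })); // false
    }
}
```

So a file that really is UTF-8 stores the euro as three bytes, and a single byte 128 in a file declared as UTF-8 is an encoding error rather than a euro sign.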
 
 
 
 
 
Francois-Louis Mommens
 
      05-11-2005
Thanks for your reply, David.
If I use & # 8364; or even & # x20ac; as you recommend, I get the same
result.

FLM
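
One way to narrow this down is to look at the code units the parser returns instead of at what the terminal shows. The sketch below assumes the JAXP parser bundled with a modern JDK and uses a one-element in-memory document as a stand-in for the real file; EuroReferenceCheck and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EuroReferenceCheck {
    // Parses an in-memory document and returns the root element's deliverycost attribute.
    public static String deliveryCost(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            return document.getDocumentElement().getAttribute("deliverycost");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse the sample document", e);
        }
    }

    // Renders every UTF-16 code unit of the string as U+XXXX.
    public static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<product deliverycost=\"6,90 &#8364;\"/>";
        // prints: U+0036 U+002C U+0039 U+0030 U+0020 U+20AC
        System.out.println(codeUnits(deliveryCost(xml)));
    }
}
```

If the last value is U+20AC, the parser has delivered the euro correctly and the '?' is introduced later, when the string is written out.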

 
Alain Ketterlin
 
      05-11-2005
"flm" writes:

> The problem is that the euro signs are transformed into the character '?'
> (printing the value of getAttribute( "deliverycost" ) gives ? on a
> UTF-8 terminal).


The problem is in "printing", probably because your Writer object has an
improper encoding and/or a mismatching locale, or because you use
System.out, which uses the locale-specified encoding, which may not be
UTF-8. It's probably best to give an explicit encoding/charset.

-- Alain.
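
As a rough sketch of giving an explicit encoding: the PrintStream constructor that takes a charset name exists since JDK 1.4 and can wrap System.out. The class and helper below are hypothetical. The helper captures the bytes a PrintStream would emit, so the effect of the encoding can be seen without depending on the terminal.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ExplicitEncodingPrint {
    // Prints text through a PrintStream with the named encoding and returns the emitted bytes as hex.
    public static String printedBytes(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown encoding: " + charsetName, e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : sink.toByteArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String euro = "\u20AC";
        System.out.println(printedBytes(euro, "ISO-8859-1")); // 3F, the byte for '?'
        System.out.println(printedBytes(euro, "UTF-8"));      // E2 82 AC

        // Wrap System.out so that the output encoding no longer depends on the locale.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("6,90 " + euro);
    }
}
```

With an ISO-8859-1 stream the euro is replaced by '?' before it ever reaches the terminal, which would match the symptom in the question.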
 
 
Martin Honnen
 
      05-11-2005


Francois-Louis Mommens wrote:

> If I use & # 8364; or even & # x20ac; as you recommend, I get the same
> result.


Are you sure that the output terminal is able to render a Euro symbol
properly? What happens if you do not use XML at all but try to output a
Euro symbol '€' from a normal string?

--

Martin Honnen
http://JavaScript.FAQTs.com/
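
A small no-XML check along these lines might look as follows. It is a sketch assuming Java 5 or later (Charset.defaultCharset() is not available in JDK 1.4.2), and the class and method names are made up. The euro is written as the escape \u20AC so that the encoding of the source file itself cannot interfere.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Reports whether the given charset has a byte representation for the euro sign.
    public static boolean canEncodeEuro(Charset charset) {
        return charset.canEncode() && charset.newEncoder().canEncode('\u20AC');
    }

    public static void main(String[] args) {
        Charset defaultCharset = Charset.defaultCharset();
        System.out.println("default charset: " + defaultCharset);
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        System.out.println("can encode euro: " + canEncodeEuro(defaultCharset));

        // The actual test: a plain string, no XML involved.
        System.out.println("6,90 \u20AC");
    }
}
```

Treat the result as a hint only: newer JDKs may pick the console encoding separately from the default charset, so the last line is the real test. If it also shows '?', the XML parser is not the culprit.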
 
 
Rob van der Putten
 
      05-12-2005
Hi there


flm wrote:

> I've got an XML document that contains euro signs and looks like this:
>
> <?xml version="1.0" encoding="utf-8"?>
> <merchant id="52">
> <product
> offerid="03543068131"
> deliverycost="6,90 ?"
> />
> ...
>
> I use this bit of Java (JDK 1.4.2) code to parse it:
>
> DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
> DocumentBuilder builder = factory.newDocumentBuilder();
> Document document = builder.parse( file_ );
>
> The problem is that the euro signs are transformed into the character '?'
> (printing the value of getAttribute( "deliverycost" ) gives ? on a
> UTF-8 terminal).


If you want to post a UTF-8 file, use UTF-8 as the charset: set the default
charset in your browser / newsreader to UTF-8.

Set your locale to UTF-8, e.g. en_GB.UTF-8 or en_US.UTF-8.
Set the default character set of your editor to UTF-8.
Use a UTF-8 enabled terminal such as uxterm.
Install Unicode fonts such as Cyberbit.ttf, Arial Unicode or GNU Unifont
and make a Unicode font your default font.


Regards,
Rob
--
+----------------------------------------------------------------------+
| The EU constitution will turn the EU into an USA colony |
| Vote against the EU constitution in the referendum |
+----------------------------------------------------------------------+
 
 
Rob vd Putten
 
      05-12-2005
Hi there


Rob van der Putten wrote:

> If you want to post a UTF-8 file, use UTF-8 as the charset: set the default
> charset in your browser / newsreader to UTF-8.
>
> Set your locale to UTF-8, e.g. en_GB.UTF-8 or en_US.UTF-8.
> Set the default character set of your editor to UTF-8.
> Use a UTF-8 enabled terminal such as uxterm.
> Install Unicode fonts such as Cyberbit.ttf, Arial Unicode or GNU Unifont
> and make a Unicode font your default font.


If all goes well, this should be UTF-8:

Nicer typography in plain text files:

╔══════════════════════════════════════════╗
║                                          ║
║   • ‘single’ and “double” quotes         ║
║                                          ║
║   • Curly apostrophes: “We’ve been here” ║
║                                          ║
║   • Latin-1 apostrophe and accents: '´`  ║
║                                          ║
║   • ‚deutsche‘ „Anführungszeichen“       ║
║                                          ║
║   • †, ‡, ‰, •, 3–4, —, −5/+5, ™, …      ║
║                                          ║
║   • ASCII safety test: 1lI|, 0OD, 8B     ║
║                      ╭─────────╮         ║
║   • the euro symbol: │ 14.95 € │         ║
║                      ╰─────────╯         ║
╚══════════════════════════════════════════╝

Russian:

From a Unicode conference invitation:

Зарегистрируйтесь сейчас на Десятую Международную Конференцию по
Unicode, которая состоится 10-12 марта 1997 года в Майнце в Германии.
Конференция соберет широкий круг экспертов по вопросам глобального
Интернета и Unicode, локализации и интернационализации, воплощению и
применению Unicode в различных операционных системах и программных
приложениях, шрифтах, верстке и многоязычных компьютерных системах.

Greek:

From a speech of Demosthenes in the 4th century BC:

Οὐχὶ ταὐτὰ παρίσταταί μοι γιγνώσκειν, ὦ ἄνδρες ᾿Αθηναῖοι,
ὅταν τ᾿ εἰς τὰ πράγματα ἀποβλέψω καὶ ὅταν πρὸς τοὺς
λόγους οὓς ἀκούω· τοὺς μὲν γὰρ λόγους περὶ τοῦ
τιμωρήσασθαι Φίλιππον ὁρῶ γιγνομένους, τὰ δὲ πράγματ᾿
εἰς τοῦτο προήκοντα, ὥσθ᾿ ὅπως μὴ πεισόμεθ᾿ αὐτοὶ
πρότερον κακῶς σκέψασθαι δέον. οὐδέν οὖν ἄλλο μοι δοκοῦσιν
οἱ τὰ τοιαῦτα λέγοντες ἢ τὴν ὑπόθεσιν, περὶ ἧς βουλεύεσθαι,
οὐχὶ τὴν οὖσαν παριστάντες ὑμῖν ἁμαρτάνειν. ἐγὼ δέ, ὅτι μέν
ποτ᾿ ἐξῆν τῇ πόλει καὶ τὰ αὑτῆς ἔχειν ἀσφαλῶς καὶ Φίλιππον
τιμωρήσασθαι, καὶ μάλ᾿ ἀκριβῶς οἶδα· ἐπ᾿ ἐμοῦ γάρ, οὐ πάλαι
γέγονεν ταῦτ᾿ ἀμφότερα· νῦν μέντοι πέπεισμαι τοῦθ᾿ ἱκανὸν
προλαβεῖν ἡμῖν εἶναι τὴν πρώτην, ὅπως τοὺς συμμάχους
σώσομεν. ἐὰν γὰρ τοῦτο βεβαίως ὑπάρξῃ, τότε καὶ περὶ τοῦ
τίνα τιμωρήσεταί τις καὶ ὃν τρόπον ἐξέσται σκοπεῖν· πρὶν δὲ
τὴν ἀρχὴν ὀρθῶς ὑποθέσθαι, μάταιον ἡγοῦμαι περὶ τῆς
τελευτῆς ὁντινοῦν ποιεῖσθαι λόγον.

Δημοσθένους, Γ´ ᾿Ολυνθιακὸς

All the display, editing and conversion software you use should also be
capable of handling UTF-8.


Regards,
Rob
 
Rob van der Putten
 
      05-12-2005
Hi there


Martin Honnen wrote:

> Are you sure that the output terminal is able to render a Euro symbol
> properly? What happens if you do not use XML at all but try to output a
> Euro symbol '?' from a normal string?


Most UTF-8 environments display dec 128 / hex 0x80 as a glyph looking
something like:

+----+
| 00 |
| 80 |
+----+

The same applies to other glyphs in the 128 / 0x80 ... 159 / 0x9F range:

+----+
| 00 |
| 9F |
+----+

Maybe UTF-8 is somehow converted to CP-1252.

Try yudit, http://www.yudit.org/ to view and edit your files.


Regards,
Rob
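
To illustrate the 0x80 ... 0x9F range, here is a sketch, assuming a modern JDK in which the charset name "windows-1252" (CP-1252) is available. The class and method names are hypothetical. It decodes the same single byte once as ISO-8859-1 and once as CP-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class C1RangeCheck {
    // Decodes one byte with the given single-byte charset and returns the resulting character value.
    public static int decodeSingleByte(int value, Charset charset) {
        return new String(new byte[] { (byte) value }, charset).charAt(0);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset cp1252 = Charset.forName("windows-1252");

        int asLatin1 = decodeSingleByte(0x80, latin1);
        int asCp1252 = decodeSingleByte(0x80, cp1252);

        System.out.println(String.format("ISO-8859-1: U+%04X", asLatin1));  // ISO-8859-1: U+0080
        System.out.println(String.format("CP-1252:    U+%04X", asCp1252));  // CP-1252:    U+20AC
        System.out.println(Character.isISOControl(asLatin1));               // true
    }
}
```

U+0080 is a control character without a glyph of its own, which is why terminals fall back to a box showing its hex value. Only when the byte is interpreted as CP-1252 does it become the euro sign U+20AC.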
 
 
 