Velocity Reviews


jodleren 12-04-2008 02:57 PM

New to XML
 
Hi

I thought XML would be simpler... My problem: I am storing some news
stories in XML, for example:

<?xml version="1.0" ?>
<article>
<date>20081111</date>
<author>name with non english characters</author>
<header1>text with non english characters</header1>
</article>

The problem is non-English characters: how do I store, e.g., &oslash;
or &otilde; in there?

WBR
Sonnich

Martin Honnen 12-04-2008 03:11 PM

Re: New to XML
 
jodleren wrote:

> <?xml version="1.0" ?>
> <article>
> <date>20081111</date>
> <author>name with non english characters</author>
> <header1>text with non english characters</header1>
> </article>
>
> the problem is non english characters - how do I store e.g. &oslash;
> or &otilde; in there?


XML uses and supports Unicode, so simply use an editor that supports
Unicode to edit and save your XML documents. That way you can use the
characters directly and don't need any character or entity references.
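A minimal sketch of this advice, assuming Python 3's `xml.etree.ElementTree` as a stand-in for whatever parser reads the file (the thread itself used Internet Explorer). The file is saved as UTF-8, the way a Unicode-aware editor would save it, and the Ø and õ go in literally. The file name `test.xml` is only illustrative.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# The document from the thread, with the characters typed in literally.
doc = """<?xml version="1.0" standalone="yes"?>
<document>
  <author>unknown</author>
  <language>English</language>
  <more>Ø õ</more>
</document>
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.xml")
    # Save the file as UTF-8, as a Unicode-aware editor would.
    with open(path, "w", encoding="utf-8") as f:
        f.write(doc)
    # No encoding declaration is needed: UTF-8 is the XML default.
    more = ET.parse(path).getroot().find("more").text

print(more == "\u00d8 \u00f5")  # True
```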


--

Martin Honnen
http://JavaScript.FAQTs.com/

jodleren 12-04-2008 04:09 PM

Re: New to XML
 
On Dec 4, 5:11 pm, Martin Honnen <mahotr...@yahoo.de> wrote:
> jodleren wrote:
> > <?xml version="1.0" ?>
> > <article>
> >  <date>20081111</date>
> >  <author>name with non english characters</author>
> >  <header1>text with non english characters</header1>
> > </article>

>
> > the problem is non english characters - how do I store e.g. &oslash;
> > or &otilde; in there?

>
> XML uses and supports Unicode so simply use an editor that supports
> Unicode to edit and save your XML documents, that way you can use
> characters directly and don't need any character or entity references.


Well, that does not work either. Both cases fail:


<?xml version="1.0" standalone="yes"?>
<document>
<aphorism>**** happens</aphorism>
<author>unknown</author>
<language>English</language>
<more>Ø</more>
</document>



<?xml version="1.0" standalone="yes"?>
<document>
<aphorism>**** happens</aphorism>
<author>unknown</author>
<language>English</language>
<more>&Oslash;</more>
</document>


And they fail at the same line: &Oslash;, even &amp;Oslash; (someone
suggested that), and Ø all fail. How do I overcome this?

WBR
Sonnich

Martin Honnen 12-04-2008 04:21 PM

Re: New to XML
 
jodleren wrote:

>> XML uses and supports Unicode so simply use an editor that supports
>> Unicode to edit and save your XML documents, that way you can use
>> characters directly and don't need any character or entity references.

>
> Well, that does not work either. Both cases fail:
>
>
> <?xml version="1.0" standalone="yes"?>
> <document>
> <aphorism>**** happens</aphorism>
> <author>unknown</author>
> <language>English</language>
> <more>Ø</more>
> </document>


Works fine for me: http://home.arcor.de/martin.honnen/x...2008120403.xml

If you still think there are problems then you need to explain exactly
what you have tried and why you think it failed. I am afraid "does not
work" does not tell us what you have tried exactly and what kind of
failure you think there is. You have managed to use the character "Ø"
literally in your Usenet post, why should that pose a problem in an XML
document?


Philippe Poulard 12-04-2008 04:34 PM

Re: New to XML
 
Hi,

jodleren wrote:
> <more>Ø</more>


If you write such a character directly, you have to declare the encoding
that your editor used to save the file:
<?xml version="1.0" encoding="[the-encoding-that-contains-theOslash]"?>
(Note that if you don't specify the encoding, the default is UTF-8 or
UTF-16; so in UTF-8 you can also write the Ø as the 2 bytes C3 98,
shown here in hex.)
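A small check of the byte sequence mentioned here, assuming Python 3's standard library: Ø (U+00D8) encodes to C3 98 in UTF-8, and a document holding those raw bytes parses to Ø with no encoding declared.

```python
import xml.etree.ElementTree as ET

# In UTF-8, U+00D8 (Ø) is stored as the two bytes C3 98.
encoded = "\u00d8".encode("utf-8")
print(encoded.hex())  # c398

# A document holding those raw bytes, with no encoding declared,
# is read as UTF-8 by default and yields Ø.
data = b'<?xml version="1.0"?><more>\xc3\x98</more>'
parsed = ET.fromstring(data).text
print(parsed == "\u00d8")  # True
```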

Otherwise, you can insert a character reference, whatever encoding is used:
<more>&#xD8;</more>
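A sketch, assuming Python 3's `xml.etree.ElementTree` as the parser, showing that the hex (`&#xD8;`) and decimal (`&#216;`) character references for U+00D8 both give Ø in a pure-ASCII file, whichever encoding is declared:

```python
import xml.etree.ElementTree as ET

# A character reference names the code point U+00D8 itself, so the file can
# be pure ASCII and it parses the same whatever encoding is declared.
results = []
for ref in ("&#xD8;", "&#216;"):
    for enc in ("UTF-8", "ISO-8859-1"):
        data = f'<?xml version="1.0" encoding="{enc}"?><more>{ref}</more>'.encode("ascii")
        results.append(ET.fromstring(data).text == "\u00d8")

print(results)  # [True, True, True, True]
```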

> <more>&Oslash;</more>


This doesn't work because XML is not HTML. An HTML parser relies on a
hard-coded table of entities that maps Oslash to U+00D8, but in XML
you have to declare this mapping explicitly (with an ENTITY declaration
in the DTD). I don't recommend that practice, though (trust me: don't do that).

XML has only 5 predefined entities: &amp; &quot; &apos; &lt; &gt;
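To illustrate the difference, here is a sketch assuming Python 3's `xml.etree.ElementTree`: `&Oslash;` is rejected because nothing declares it, while the five predefined entities expand without any DTD. The exact error wording depends on the parser, so it is not shown.

```python
import xml.etree.ElementTree as ET

# &Oslash; is an HTML entity; with no DTD declaring it, an XML parser rejects it.
try:
    ET.fromstring("<more>&Oslash;</more>")
    oslash_ok = True
except ET.ParseError as err:
    oslash_ok = False
    print("ParseError:", err)

# The five predefined entities work without any declaration.
text = ET.fromstring("<more>&amp; &quot; &apos; &lt; &gt;</more>").text
print(text)  # & " ' < >
```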

"&amp;Oslash;" means that you explicitely wants the sequence of text
"&Oslash;" and not an entity reference
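A short demonstration, again assuming Python 3's `xml.etree.ElementTree`: `&amp;Oslash;` parses to the eight characters `&Oslash;`, and serializing escapes the `&` again.

```python
import xml.etree.ElementTree as ET

# &amp; expands to "&", so the parser delivers the literal text "&Oslash;".
elem = ET.fromstring("<more>&amp;Oslash;</more>")
print(elem.text)          # &Oslash;

# Serializing escapes the & again, so the round trip is stable.
print(ET.tostring(elem))  # b'<more>&amp;Oslash;</more>'
```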

--
Best regards,

///
(. .)
--------ooO--(_)--Ooo--------
| Philippe Poulard |
-----------------------------
http://reflex.gforge.inria.fr/
Have the RefleX !

Philippe Poulard 12-04-2008 04:37 PM

Re: New to XML
 
Here is a very useful online tool:
http://people.w3.org/rishida/scripts...conversion.php


jodleren 12-04-2008 04:44 PM

Re: New to XML
 
On Dec 4, 6:21 pm, Martin Honnen <mahotr...@yahoo.de> wrote:
> jodleren wrote:
> >> XML uses and supports Unicode so simply use an editor that supports
> >> Unicode to edit and save your XML documents, that way you can use
> >> characters directly and don't need any character or entity references.

>
> > Well, that does not work either. Both cases fail:

>
> > <?xml version="1.0" standalone="yes"?>
> > <document>
> >  <aphorism>**** happens</aphorism>
> >  <author>unknown</author>
> >  <language>English</language>
> >  <more>Ø</more>
> > </document>

>
> Works fine for me: http://home.arcor.de/martin.honnen/x...2008120403.xml
>
> If you still think there are problems then you need to explain exactly
> what you have tried and why you think it failed. I am afraid "does not
> work" does not tell us what you have tried exactly and what kind of
> failure you think there is. You have managed to use the character "Ø"
> literally in your Usenet post, why should that pose a problem in an XML
> document?


I understand the Unicode part now...

<from ie>
The error I get when the file is _not_ saved as Unicode:
The XML page cannot be displayed
Cannot view XML input using XSL style sheet. Please correct the error
and then click the Refresh button, or try again later.
--------------------------------------------------------------------------------
An invalid character was found in text content. Error processing
resource 'file:///Y:/html2/2770/articles/test.xml'. Line ...
<more>
</from ie>

When I open the file in Notepad I can save it as Unicode, and I have to
do so; an ordinary text document does not work.
This might cause problems later, so it would be easier for me to
use &oslash; instead. Would that be possible in any way?

WBR
Sonnich


Martin Honnen 12-04-2008 04:53 PM

Re: New to XML
 
jodleren wrote:

> When I open the file in notepad, I can save it as unicode, I have to
> do so. An ordanirary text document does not do it.
> This might cause problems ahead, therefor it would be easier for me to
> use &oslash; instead. Would that in any way be possible?


I strongly suggest using a Unicode encoding like UTF-8 or UTF-16; those
are the encodings every XML parser has to support.
If you want to use another encoding, you simply need to declare it
in the XML declaration, e.g.
<?xml version="1.0" encoding="ISO-8859-1"?>
is certainly possible.
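A sketch of both the failure reported earlier and this fix, assuming Python 3's `xml.etree.ElementTree` stands in for IE's parser (both assume UTF-8 when nothing is declared; the error text differs). The same ISO-8859-1 bytes fail without a declaration and parse with one.

```python
import xml.etree.ElementTree as ET

body = "<document><more>\u00d8</more></document>"

# Encoded as ISO-8859-1, Ø becomes the single byte D8.
latin1 = body.encode("iso-8859-1")

# Without a declaration the parser assumes UTF-8; a lone D8 byte followed
# by "<" is not valid UTF-8, so parsing fails.
try:
    ET.fromstring(latin1)
    undeclared_ok = True
except ET.ParseError as err:
    undeclared_ok = False
    print("ParseError:", err)

# With a matching declaration, the very same bytes parse fine.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1
declared_text = ET.fromstring(declared).find("more").text
print(declared_text == "\u00d8")  # True
```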

As for using an entity reference, you would first need to declare the
entities in a document type definition. See
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent for how to do that. But
be aware that non-validating parsers might not read any external
resources, so you would need to include the declarations in the internal
subset to ensure that every XML parser knows the entities.
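A minimal sketch of the internal-subset approach, assuming Python 3's `xml.etree.ElementTree` (which expands internally declared entities). The two declarations copy the values `xhtml-lat1.ent` uses for `oslash` (`&#248;`) and `otilde` (`&#245;`); the `article` element is just the example from the first post.

```python
import xml.etree.ElementTree as ET

# Entities declared in the internal subset are known even to a parser
# that never fetches external DTD files.
doc = """<?xml version="1.0"?>
<!DOCTYPE article [
  <!ENTITY oslash "&#248;">
  <!ENTITY otilde "&#245;">
]>
<article><author>&oslash; &otilde;</author></article>
"""

author = ET.fromstring(doc).find("author").text
print(author == "\u00f8 \u00f5")  # True
```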




Asger Joergensen 12-05-2008 11:52 AM

Re: New to XML
 
Hi jodleren
jodleren wrote:

> Well, that does not work either. Both cases fail:
>
>
> <?xml version="1.0" standalone="yes"?>


> <more>Ø</more>


> <more>&Oslash;</more>


>
> and they fail at the same line - both & and even &amp;slash; (someone
> suggested that) and Ø fail.... how do I overcome this?


I come from Denmark, so I know about the Ø. Here is what you need to do:

The header should look either like this:

<?xml version="1.0" encoding="ISO-8859-1" ?>
if you save in a non-Unicode encoding

or like this if you save in Unicode (UTF-8):
<?xml version="1.0" encoding="UTF-8" ?>
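One way to keep the header and the saved bytes in agreement (a suggestion not made in the thread) is to let the program that writes the XML generate the declaration itself. A minimal sketch assuming Python 3.8+ (for the `xml_declaration` argument of `ET.tostring`):

```python
import xml.etree.ElementTree as ET

root = ET.Element("document")
ET.SubElement(root, "more").text = "\u00d8"

# The serializer writes a declaration that names the encoding it actually used.
latin1 = ET.tostring(root, encoding="ISO-8859-1")  # Ø is the single byte D8
utf8 = ET.tostring(root, encoding="UTF-8", xml_declaration=True)  # Ø is C3 98

print(latin1)
print(utf8)
```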

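These characters have a special meaning in XML:
& " ' < >
& and < must always be written as &amp; and &lt; in text, and the others
have the predefined escapes &quot; &apos; &gt;, which you need for a
quote inside an attribute value delimited by the same quote and for > in
the sequence ]]>.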
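A sketch of escaping in practice, assuming Python 3's `xml.sax.saxutils` helpers: `escape()` handles text content, `quoteattr()` handles attribute values, and raw quotes in element text parse without any escaping.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# escape() replaces &, < and > in text content.
escaped = escape('Fish & Chips <tasty> "quoted"')
print(escaped)  # Fish &amp; Chips &lt;tasty&gt; "quoted"

# Raw quotes and apostrophes are fine in element text...
text = ET.fromstring("<a>\"q\" and 'a'</a>").text
print(text)     # "q" and 'a'

# ...but in an attribute they must not clash with the delimiter;
# quoteattr() picks a safe delimiter and escapes as needed.
attr = quoteattr('say "hi"')
print(attr)     # 'say "hi"'
```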

If you use UTF-8, you can use every other character directly.

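If you use ISO-8859-1, you have to stay within the ISO-8859-1 characters.
You can see which ones those are with charmap.exe: tick Advanced view and
choose the character set "Windows: Western" (strictly that is Windows-1252,
which differs from ISO-8859-1 in the range 0x80 to 0x9F).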
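A quick way to check whether an article's text stays within ISO-8859-1, sketched in Python with a hypothetical helper `fits_latin1`. It also shows that € exists in Windows-1252 (the set charmap labels "Windows: Western") but not in ISO-8859-1.

```python
def fits_latin1(text: str) -> bool:
    """Hypothetical helper: True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

print(fits_latin1("\u00d8 \u00f5 \u00e6"))  # True  (Ø õ æ)
print(fits_latin1("\u0151"))               # False (ő is not in ISO-8859-1)
print(fits_latin1("\u20ac"))               # False (€ is not in ISO-8859-1)
print("\u20ac".encode("cp1252"))           # b'\x80' (but it is in Windows-1252)
```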

Kind regards
Asger

jodleren 12-05-2008 02:33 PM

Re: New to XML
 
On Dec 5, 1:52 pm, "Asger Joergensen" <J...@Asger-P.dk> wrote:
> Hi jodleren
>
> jodleren wrote:
> > Well, that does not work either. Both cases fail:

>
> > <?xml version="1.0" standalone="yes"?>
> >  <more>Ø</more>
> >  <more>&Oslash;</more>

>
> > and they fail at the same line - both & and even &amp;slash; (someone
> > suggested that) and Ø fail.... how do I overcome this?

>
> I come from Denmark so I know about the Ø and what You need to do is:
>
> The header should look either like this:
>
> <?xml version="1.0" encoding="ISO-8859-1" ?>
> if You save in NON-Unicode
>
> or like this if You save in unicode:
> <?xml version="1.0" encoding="UTF-8" ?>
>
> Thes characters are not alowed in the text in XML files
>  & " ' < >
> they are reserved for tags and they must be translated to
> &amp; &quot; &apos; &lt; &gt;
>
> If You use UTF-8 You can use all other characters
>
> If You use ISO-8859-1 You will have to stay within ISO-8859-1
> You can see what that is if You use the charmap.exe and chose
> Windows:Wester under advanced.


Hi there,

Thanks for the answer, it seems to work. I am still wondering about all
the characters an article might contain, though, so maybe I will convert
everything to UTF-8 after all. But I can do that later; for now I can get
on with the project.

Thanks for the help

Best regards
Sonnich


All times are GMT.
