New to XML
Hi
I thought XML would be simpler... My problem: I am storing some news stories in XML, say:

<?xml version="1.0" ?>
<article>
  <date>20081111</date>
  <author>name with non-English characters</author>
  <header1>text with non-English characters</header1>
</article>

The problem is the non-English characters: how do I store e.g. ø or õ in there?

WBR
Sonnich
Re: New to XML
jodleren wrote:
> <?xml version="1.0" ?>
> <article>
>   <date>20081111</date>
>   <author>name with non english characters</author>
>   <header1>text with non english characters</header1>
> </article>
>
> the problem is non english characters - how do I store e.g. ø
> or õ in there?

XML uses and supports Unicode, so simply use an editor that supports Unicode to edit and save your XML documents. That way you can use the characters directly and don't need any character or entity references.

-- 
Martin Honnen
http://JavaScript.FAQTs.com/
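
A minimal sketch of the advice above, assuming Python's standard `xml.etree.ElementTree` as the parser (nobody in the thread names a parser, so this is only an illustration). The author name and file name are made up. It saves the text as UTF-8, which is roughly what an editor's "save as UTF-8" option does, and reads it back:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical article; the author name is invented for illustration.
article_xml = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<article>\n"
    "  <date>20081111</date>\n"
    "  <author>Søren Õun</author>\n"
    "</article>\n"
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "article.xml")

    # Write the characters directly, encoded as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write(article_xml)

    # The parser decodes the bytes according to the XML declaration.
    author = ET.parse(path).getroot().findtext("author")

print(author == "Søren Õun")
```

No character or entity references are needed as long as the bytes on disk match the declared encoding.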
Re: New to XML
On Dec 4, 5:11 pm, Martin Honnen <mahotr...@yahoo.de> wrote:
> jodleren wrote:
> > <?xml version="1.0" ?>
> > <article>
> >   <date>20081111</date>
> >   <author>name with non english characters</author>
> >   <header1>text with non english characters</header1>
> > </article>
> >
> > the problem is non english characters - how do I store e.g. ø
> > or õ in there?
>
> XML uses and supports Unicode so simply use an editor that supports
> Unicode to edit and save your XML documents, that way you can use
> characters directly and don't need any character or entity references.

Well, that does not work either. Both cases fail:

<?xml version="1.0" standalone="yes"?>
<document>
  <aphorism>**** happens</aphorism>
  <author>unknown</author>
  <language>English</language>
  <more>Ø</more>
</document>

<?xml version="1.0" standalone="yes"?>
<document>
  <aphorism>**** happens</aphorism>
  <author>unknown</author>
  <language>English</language>
  <more>&Oslash;</more>
</document>

They fail at the same line: both & and even &slash; (someone suggested that) and Ø fail. How do I overcome this?

WBR
Sonnich
Re: New to XML
jodleren wrote:
>> XML uses and supports Unicode so simply use an editor that supports
>> Unicode to edit and save your XML documents, that way you can use
>> characters directly and don't need any character or entity references.
>
> Well, that does not work either. Both cases fail:
>
> <?xml version="1.0" standalone="yes"?>
> <document>
>   <aphorism>**** happens</aphorism>
>   <author>unknown</author>
>   <language>English</language>
>   <more>Ø</more>
> </document>

Works fine for me: http://home.arcor.de/martin.honnen/x...2008120403.xml

If you still think there are problems, then you need to explain exactly what you have tried and why you think it failed. I am afraid "does not work" does not tell us what exactly you tried or what kind of failure you see. You managed to use the character "Ø" literally in your Usenet post, so why should that pose a problem in an XML document?

-- 
Martin Honnen
http://JavaScript.FAQTs.com/
Re: New to XML
Hi,
jodleren wrote:
> <more>Ø</more>

If you write such a character directly, you have to declare the character encoding that your editor used:

<?xml version="1.0" encoding="[the-encoding-that-contains-theOslash]"?>

(Note that if you don't specify the encoding, the default is UTF-8 or UTF-16. In UTF-8 the Ø is stored as the 2 bytes C3 98, shown here in hexadecimal.)

Otherwise, you can insert a character reference, which works whatever the encoding used:

<more>&#216;</more>

> <more>&Oslash;</more>

This doesn't work because XML is not HTML. An HTML parser relies on a hard-coded table of entities that maps Oslash to U+00D8, but in XML you have to declare this mapping explicitly (with ENTITY in the DTD). I don't recommend that practice, though (trust me: don't do that).

XML has 5 hard-coded entities: &amp; &quot; &apos; &lt; &gt;

"&amp;Oslash;" means that you explicitly want the text "&Oslash;" and not an entity reference.

-- 
Regards,
Philippe Poulard
http://reflex.gforge.inria.fr/
Have the RefleX !
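
The points above can be checked with a short sketch, again assuming Python's standard `xml.etree.ElementTree` (an illustration only, not a tool anyone in the thread mentioned). It compares a numeric character reference, an undeclared HTML-style entity, an escaped ampersand, and the five predefined entities:

```python
import xml.etree.ElementTree as ET

# In UTF-8 the character Ø (U+00D8) is the two bytes C3 98.
oslash_bytes = "Ø".encode("utf-8")

# A numeric character reference works whatever the file encoding is.
from_reference = ET.fromstring(b"<more>&#216;</more>").text

# An HTML-style named entity is not predefined in XML, so parsing fails.
try:
    ET.fromstring(b"<more>&Oslash;</more>")
    entity_error = None
except ET.ParseError as exc:
    entity_error = exc

# Escaping the ampersand yields the literal text "&Oslash;", not the character.
literal_text = ET.fromstring(b"<more>&amp;Oslash;</more>").text

# The five predefined entities are always available without a DTD.
predefined = ET.fromstring(b"<t>&amp; &quot; &apos; &lt; &gt;</t>").text

print(oslash_bytes.hex(), entity_error is not None, literal_text, predefined)
```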
Re: New to XML
Here is a very useful online tool:
http://people.w3.org/rishida/scripts...conversion.php

-- 
Regards,
Philippe Poulard
http://reflex.gforge.inria.fr/
Have the RefleX !
Re: New to XML
On Dec 4, 6:21 pm, Martin Honnen <mahotr...@yahoo.de> wrote:
> jodleren wrote:
> >> XML uses and supports Unicode so simply use an editor that supports
> >> Unicode to edit and save your XML documents, that way you can use
> >> characters directly and don't need any character or entity references.
> >
> > Well, that does not work either. Both cases fail:
> >
> > <?xml version="1.0" standalone="yes"?>
> > <document>
> >   <aphorism>**** happens</aphorism>
> >   <author>unknown</author>
> >   <language>English</language>
> >   <more>Ø</more>
> > </document>
>
> Works fine for me: http://home.arcor.de/martin.honnen/x...2008120403.xml
>
> If you still think there are problems then you need to explain exactly
> what you have tried and why you think it failed. I am afraid "does not
> work" does not tell us what you have tried exactly and what kind of
> failure you think there is. You have managed to use the character "Ø"
> literally in your Usenet post, why should that pose a problem in an XML
> document?

The Unicode part I realise now...

<from ie>
The error I get when the file is _not_ saved as Unicode:

The XML page cannot be displayed
Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later.
--------------------------------------------------------------------------------
An invalid character was found in text content. Error processing resource 'file:///Y:/html2/2770/articles/test.xml'. Line ...
<more>
</from ie>

When I open the file in Notepad, I can save it as Unicode, and I have to do so. An ordinary text document does not work. This might cause problems later on, so it would be easier for me to use &oslash; instead. Would that be possible in any way?

WBR
Sonnich
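
The failure described above can be reproduced in a small sketch. It assumes that the non-Unicode save used a single-byte encoding such as ISO-8859-1, and it uses Python's `xml.etree.ElementTree` rather than Internet Explorer, so the error message differs from the one quoted:

```python
import xml.etree.ElementTree as ET

document = (
    '<?xml version="1.0" standalone="yes"?>\n'
    "<document><more>Ø</more></document>"
)

# Saved in a single-byte encoding, Ø becomes the lone byte D8. With no
# encoding declaration the parser assumes UTF-8, where that byte followed
# by "<" is not a valid sequence.
latin1_bytes = document.encode("iso-8859-1")
try:
    ET.fromstring(latin1_bytes)
    latin1_parsed = True
except ET.ParseError:
    latin1_parsed = False

# The same text saved as UTF-8 parses without any declaration change.
utf8_bytes = document.encode("utf-8")
utf8_text = ET.fromstring(utf8_bytes).find("more").text

print(latin1_parsed, utf8_text == "Ø")
```

The document text is identical in both cases; only the bytes on disk differ, which is why the file works once it is saved in a Unicode encoding.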
Re: New to XML
jodleren wrote:
> When I open the file in notepad, I can save it as unicode, I have to
> do so. An ordanirary text document does not do it.
> This might cause problems ahead, therefor it would be easier for me to
> use &oslash; instead. Would that in any way be possible?

I strongly suggest using a Unicode encoding such as UTF-8 or UTF-16; those are the encodings that XML parsers have to support. If you want to use another encoding, you simply need to declare it in the XML declaration, e.g.

<?xml version="1.0" encoding="ISO-8859-1"?>

is certainly possible.

As for using an entity reference, you would need to declare the entities first in a document type definition. See http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent for how to do that. But be aware that non-validating parsers might not read any external resources, so you would need to include the definitions in the internal subset to ensure that any XML parser knows the entities.

-- 
Martin Honnen
http://JavaScript.FAQTs.com/
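
A sketch of the internal-subset approach described above, assuming Python's `xml.etree.ElementTree` (which is built on expat and expands entities declared in the internal subset). Only the two entities needed here are declared; the element names follow the earlier test document:

```python
import xml.etree.ElementTree as ET

# The DOCTYPE carries the entity declarations inside the document itself,
# so the parser does not have to fetch any external DTD.
doc = (
    b'<?xml version="1.0"?>\n'
    b"<!DOCTYPE document [\n"
    b'  <!ENTITY Oslash "&#216;">\n'
    b'  <!ENTITY oslash "&#248;">\n'
    b"]>\n"
    b"<document><more>&Oslash;&oslash;</more></document>"
)

more = ET.fromstring(doc).find("more").text
print(more == "Øø")
```

Every document that uses the entities needs this DOCTYPE block, which is one reason the thread leans towards simply saving the file in a Unicode encoding.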
Re: New to XML
Hi jodleren
jodleren wrote:
> Well, that does not work either. Both cases fail:
>
> <?xml version="1.0" standalone="yes"?>
>   <more>Ø</more>
>   <more>&Oslash;</more>
>
> and they fail at the same line - both & and even &slash; (someone
> suggested that) and Ø fail.... how do I overcome this?

I come from Denmark, so I know about the Ø. What you need to do is this.

The header should look like this if you save in a non-Unicode encoding:

<?xml version="1.0" encoding="ISO-8859-1" ?>

or like this if you save in Unicode:

<?xml version="1.0" encoding="UTF-8" ?>

These characters are reserved for markup and should not appear as-is in the text of XML files:

& " ' < >

They must be translated to:

&amp; &quot; &apos; &lt; &gt;

If you use UTF-8, you can use all other characters.

If you use ISO-8859-1, you will have to stay within ISO-8859-1. You can see which characters that covers if you open charmap.exe and choose "Windows: Western" under Advanced.

Kind regards
Asger
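
The advice above can be sketched as follows, assuming Python's standard library (`xml.sax.saxutils.escape` and `xml.etree.ElementTree`). The headline text is invented for illustration. The same escaped text is stored once as ISO-8859-1 and once as UTF-8, each time with a matching declaration:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical headline with Ø plus characters that need escaping.
headline = 'Ørsted & Søn: "5 < 6"'

# escape() handles &, < and > by default; quotes are added explicitly.
escaped = escape(headline, {'"': "&quot;", "'": "&apos;"})

template = '<?xml version="1.0" encoding="{enc}" ?>\n<header1>{body}</header1>'

# The declared encoding must match the encoding the bytes are saved in.
results = {}
for enc in ("ISO-8859-1", "UTF-8"):
    data = template.format(enc=enc, body=escaped).encode(enc)
    results[enc] = ET.fromstring(data).text

# A character outside ISO-8859-1, such as the euro sign, cannot be saved in it.
try:
    "€".encode("iso-8859-1")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False

print(escaped)
print(results["ISO-8859-1"] == headline, results["UTF-8"] == headline, euro_fits)
```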
Re: New to XML
On Dec 5, 1:52 pm, "Asger Joergensen" <J...@Asger-P.dk> wrote:
> Hi jodleren
>
> jodleren wrote:
> > Well, that does not work either. Both cases fail:
> >
> > <?xml version="1.0" standalone="yes"?>
> >   <more>Ø</more>
> >   <more>&Oslash;</more>
> >
> > and they fail at the same line - both & and even &slash; (someone
> > suggested that) and Ø fail.... how do I overcome this?
>
> I come from Denmark so I know about the Ø and what You need to do is:
>
> The header should look either like this:
>
> <?xml version="1.0" encoding="ISO-8859-1" ?>
> if You save in NON-Unicode
>
> or like this if You save in unicode:
> <?xml version="1.0" encoding="UTF-8" ?>
>
> These characters are not allowed in the text in XML files
>   & " ' < >
> they are reserved for tags and they must be translated to
>   &amp; &quot; &apos; &lt; &gt;
>
> If You use UTF-8 You can use all other characters
>
> If You use ISO-8859-1 You will have to stay within ISO-8859-1
> You can see what that is if You use the charmap.exe and chose
> Windows: Western under advanced.

Hi there,

Thanks for the answer, it seems to work. I am still wondering about all the characters an article can contain, though, so maybe I will convert everything to UTF-8 after all. But I can do that later; for now I can move on with the project.

Thanks for the help.

Best regards
Sonnich
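
One possible way to do the later conversion to UTF-8, sketched with Python's `xml.etree.ElementTree` (an assumption; the thread does not say which tools are in use, and the `xml_declaration` argument of `tostring` needs Python 3.8 or newer). The sample document is invented:

```python
import xml.etree.ElementTree as ET

# Hypothetical article that was saved as ISO-8859-1.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    "<article><author>Sørensen</author></article>"
).encode("iso-8859-1")

# Parsing decodes the bytes using the declared encoding.
root = ET.fromstring(latin1_doc)

# Serialising again as UTF-8 writes a new declaration and new bytes.
utf8_doc = ET.tostring(root, encoding="utf-8", xml_declaration=True)

round_trip = ET.fromstring(utf8_doc).findtext("author")
print(round_trip == "Sørensen")
```

Re-serialising through this parser does not keep comments or a DOCTYPE, so for files that contain those, re-saving from a text editor in UTF-8 and changing the declaration by hand may be the safer route.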