Velocity Reviews - Computer Hardware Reviews

Unicode values

Can anyone explain to me the difference between Unicode and the hexadecimal
entities used in XML?

Andreas Prilop
On Tue, 19 Feb 2008, (E-Mail Removed) wrote:

> Can anyone explain to me the difference between Unicode and the hexadecimal
> entities used in XML?

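For example, the Devanagari letter 'ka' has the position U+0915
in Unicode and can be referenced in both HTML and XML as &#2325;
(decimal) or as &#x0915; (hexadecimal). Both name the same code
point: 915 in hexadecimal is 2325 in decimal.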
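To make the decimal/hexadecimal relationship concrete, here is a minimal Python sketch (the thread itself has no code; Python and the helper name `char_refs` are illustrative assumptions). It builds both numeric character references for a character from its code point and decodes them back with the standard library's `html.unescape`.

```python
import html


def char_refs(ch: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal numeric character references for ch.

    char_refs is a hypothetical helper written for this example.
    """
    code_point = ord(ch)  # the character's Unicode number
    return f"&#{code_point};", f"&#x{code_point:04X};"


ka = "\u0915"  # DEVANAGARI LETTER KA
decimal_ref, hex_ref = char_refs(ka)
print(decimal_ref, hex_ref)  # &#2325; &#x0915;

# Both spellings name the same code point, so both decode to the same character.
print(html.unescape(decimal_ref) == html.unescape(hex_ref) == ka)  # True
print(int("915", 16))  # 2325
```

The only difference between the two forms is the base the number is written in; the `x` after `&#` marks it as hexadecimal.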

Solipsists of the world - unite!
Andy Dingley
On 19 Feb, 14:00, (E-Mail Removed) wrote:

> Can anyone explain to me the difference between Unicode and the hexadecimal
> entities used in XML?

Try searching for "Jukka Korpela" and Unicode. He has written an O'Reilly
book and has a very useful website on the topic. Wikipedia is also worth
reading.

"Unicode" defines a "character set". There are also "encodings", which
specify how computers turn sequences of bytes or numbers into
characters. Many different encodings can represent the same character
from the same character set, which can get complicated.
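A minimal Python sketch of the character-set/encoding split (the two characters and three encodings are picked only for illustration): each character keeps a single Unicode code point, while the bytes that store it depend on the encoding.

```python
import unicodedata

# One character set (Unicode), several encodings: each character keeps
# its code point, but the stored bytes depend on the encoding chosen.
characters = ("\u00f8", "\u0915")  # "o with a slash" and Devanagari "ka"
encodings = ("utf-8", "utf-16-be", "utf-32-be")

for ch in characters:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    for enc in encodings:
        # bytes.hex(" ") puts a space between bytes (Python 3.8+)
        print(f"  {enc:<10} {ch.encode(enc).hex(' ')}")
```

For U+00F8 this prints `c3 b8`, `00 f8` and `00 00 00 f8`; for U+0915 it prints `e0 a4 95`, `09 15` and `00 00 09 15`. Three byte sequences, one character.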

Character sets before Unicode tended to work for only one language at
a time. This made them manageably smaller, but also inconvenient for
multi-language work. Unicode takes a different approach: one single,
huge character set for everything.
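As an illustration (ISO-8859-1 is chosen here as one example of a pre-Unicode, Western European character set; the `can_encode` helper is hypothetical), this Python sketch shows that a legacy encoding can store "ø" but has no slot at all for Devanagari "ka", while UTF-8, as an encoding of the whole Unicode character set, handles both.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Hypothetical helper: report whether every character fits the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


sample = "\u00f8\u0915"  # "o with a slash" followed by Devanagari "ka"
for enc in ("iso-8859-1", "utf-8"):
    # Check each character on its own so the result shows which ones fit.
    print(enc, {ch: can_encode(ch, enc) for ch in sample})
```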

When you use HTML or XML, there is only _one_ character set that is
ever used: Unicode.

There may be lots of different encodings for an HTML or XML document
(one at a time), but they all lead to Unicode characters. Most
commonly you will specify a character directly (e.g. by typing it),
which also requires you to make sure it's in a suitable encoding for
the document. Alternatively you can use a "numeric character reference"
to specify the Unicode character "ø" by its identifying number, either
in decimal &#248; or in hexadecimal &#xF8;. Whatever the document's
encoding, these numbers always refer to the same characters: a
character reference skips the encoding and goes straight to Unicode.
This works equally in XML and HTML.
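The claim that character references bypass the document's encoding can be checked with Python's standard `xml.etree.ElementTree` parser. In this sketch (the `make_doc` helper and the sample markup are made up for illustration), the same markup, a literal "ø" plus `&#248;` and `&#xF8;`, is saved once as UTF-8 and once as ISO-8859-1. The bytes for the literal character differ between the two files, but all three forms parse to U+00F8.

```python
import xml.etree.ElementTree as ET


def make_doc(encoding: str) -> bytes:
    """Hypothetical helper: a tiny XML document saved in the given encoding."""
    # A literal ø, then its decimal and hexadecimal character references.
    markup = f'<?xml version="1.0" encoding="{encoding}"?><p>ø &#248; &#xF8;</p>'
    return markup.encode(encoding)


for enc in ("UTF-8", "ISO-8859-1"):
    text = ET.fromstring(make_doc(enc)).text
    # The literal character's bytes depend on the encoding...
    literal_bytes = "ø".encode(enc).hex()
    # ...but every form resolves to the same Unicode code point.
    code_points = [f"U+{ord(c):04X}" for c in text.split()]
    print(enc, literal_bytes, code_points)
```

This prints `c3b8` for UTF-8 and `f8` for ISO-8859-1, with `['U+00F8', 'U+00F8', 'U+00F8']` in both cases.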

For some of these characters, HTML also defines named "character entity
references", such as &oslash; (meaning the same "o with a slash"
character as before). These are a bit more readable than the raw
numbers. However, remember that they're part of HTML, not XML! XML
itself predefines only &lt;, &gt;, &amp;, &apos; and &quot;. So you can
use the HTML names in XHTML, whose DTDs declare them, but not in RSS.
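A short Python sketch of the HTML-versus-XML difference, using only the standard library: `html.unescape` knows HTML's named references such as `&oslash;`, while `xml.etree.ElementTree`, with no DTD declaring extra entities, accepts only XML's predefined entities and numeric references.

```python
import html
import xml.etree.ElementTree as ET

# HTML defines the named reference &oslash;, so an HTML-aware decoder resolves it.
print(html.unescape("&oslash;"))  # ø

# XML predefines only &lt; &gt; &amp; &apos; &quot;; numeric references always work.
print(ET.fromstring(b"<p>&amp; &#248; &#xF8;</p>").text)  # & ø ø

# Without a DTD declaring it, &oslash; is an undefined entity to an XML parser.
try:
    ET.fromstring(b"<p>&oslash;</p>")
except ET.ParseError as err:
    print("rejected by the XML parser:", err)
```

Writing `&#248;` or `&#xF8;` instead of `&oslash;` is the portable choice when the same markup may end up in plain XML such as RSS.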

(I've blurred some distinctions here, between bytes and octets,
characters and code points, and Unicode, UCS and ISO 10646, in an
attempt at brevity, if not clarity. Jukka will probably accuse me of
"worthless babbling" again as a result.)