Velocity Reviews


WindAndWaves 11-22-2004 12:46 AM

unicode characters in asci file
 
Hi Gurus

Do you know if it is possible to display Unicode characters (e.g. Japanese
ones) in an ASCII-based file?

TIA

- Nicolaas



Philip Ronan 11-22-2004 10:16 AM

Re: unicode characters in asci file
 
WindAndWaves wrote:

> Do you know if it is possible to display unicode characters (e.g. japanese
> ones) in a asci based file?


I'm not sure I understand what you mean.

The ASCII character set contains 128 characters consisting of uppercase and
lowercase Roman alphabets, Arabic numerals from 0-9, various punctuation
characters and 32 control codes. No Japanese characters there at all.

The Unicode standard contains tens of thousands of characters, but it isn't
the same thing as ASCII.
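To make the distinction concrete, here is a small sketch in Python (my choice of language, not something from the thread). ASCII covers only code points 0 to 127, so plain English text passes an ASCII test while Japanese text fails it. The helper name `is_ascii` is hypothetical; Python 3.7+ also offers the built-in `str.isascii()`.

```python
def is_ascii(text: str) -> bool:
    """Return True if every character is in the 128-character ASCII set (U+0000..U+007F)."""
    return all(ord(ch) < 128 for ch in text)

print(is_ascii("Hello, Japan 123"))  # True: Roman letters, digits, punctuation
print(is_ascii("\u65e5\u672c"))      # False: "Japan" in Japanese, U+65E5 and U+672C
```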

If you want to include Japanese characters in an *HTML* file, then you
should use Unicode character entities. For example, the characters for
"Japan" are &#x65e5;&#x672c;.

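The reference form in that example is just `&#x`, the character's code point in hexadecimal, and a closing `;`. A minimal Python sketch that produces it; the helper name `to_hex_refs` is an illustration, not a standard API:

```python
def to_hex_refs(text: str) -> str:
    """Turn every character into a hexadecimal numeric character reference (&#xHHHH;)."""
    return "".join(f"&#x{ord(ch):x};" for ch in text)

print(to_hex_refs("\u65e5\u672c"))  # "Japan" -> &#x65e5;&#x672c;
```
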
Is that what you wanted?

--
Philip Ronan
phil.ronanzzz@virgin.net
(Please remove the "z"s if replying by email)



Jukka K. Korpela 11-22-2004 12:44 PM

Re: unicode characters in asci file
 
Philip Ronan <phil.ronanzzz@virgin.net> wrote:

> If you want to include Japanese characters in an *HTML* file, then you
> should use Unicode character entities.


Make it "could". Why not use UTF-8? But technically it is indeed possible
to write an HTML document that is ASCII encoded, yet contains any
characters you want.

And they are character references, not entities. See
http://www.cs.tut.fi/~jkorpela/chars/ref.html
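A quick way to convince yourself that an ASCII-encoded HTML document can still carry any character: Python's `str.encode` accepts the standard `xmlcharrefreplace` error handler, which replaces every character the target encoding cannot represent with a decimal `&#...;` reference, and `html.unescape` from the standard library turns the references back into characters. The sample markup is arbitrary.

```python
import html

original = "<p>Japan: \u65e5\u672c, Russia: \u0420\u043e\u0441\u0441\u0438\u044f</p>"

# Characters outside ASCII become decimal references; the ASCII markup is untouched.
ascii_bytes = original.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes.decode("ascii"))

# Every byte is in the ASCII range, yet nothing was lost in the round trip.
assert all(b < 128 for b in ascii_bytes)
assert html.unescape(ascii_bytes.decode("ascii")) == original
```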

> For example, the characters for
> "Japan" are &#x65e5;&#x672c;.


Using decimal notation works more often, though the difference is getting
more and more marginal.
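The remark above is about browser support; in meaning, decimal and hexadecimal references are interchangeable. This Python check (it says nothing about how any particular old browser behaves) shows both spellings resolve to the same two characters:

```python
import html

hex_form = "&#x65e5;&#x672c;"
dec_form = "&#26085;&#26412;"

# Same code points, written in base 16 and in base 10.
assert int("65e5", 16) == 26085 and int("672c", 16) == 26412
assert html.unescape(hex_form) == html.unescape(dec_form) == "\u65e5\u672c"
```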

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html



Sybren Stuvel 11-22-2004 02:28 PM

Re: unicode characters in asci file
 
Philip Ronan <phil.ronanzzz@virgin.net> wrote:
> [...] you should use Unicode character entities.


Jukka K. Korpela replied:
> Make it "could". Why not use UTF-8?


UTF-8 *is* Unicode. It's just an encoding. Philip didn't specify any
encoding - OP might as well use UCS, although it's larger.
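To illustrate "it's just an encoding": the same Unicode string can be serialized several ways. This Python sketch compares UTF-8 with UTF-16 (two bytes per character for text like this, as in UCS-2) and UTF-32 (four bytes per character, as in UCS-4). The `-le` codec variants are used so Python does not prepend a byte-order mark; the sample text is arbitrary.

```python
text = "Home \u65e5\u672c"  # 5 ASCII characters + 2 Japanese characters

for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(codec)
    print(f"{codec:10} {len(data):3} bytes")
# utf-8       11 bytes
# utf-16-le   14 bytes
# utf-32-le   28 bytes
```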

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?

Steve Pugh 11-22-2004 02:40 PM

Re: unicode characters in asci file
 
On Mon, 22 Nov 2004 15:28:52 +0100, Sybren Stuvel
<sybrenUSE@YOURthirdtower.com.imagination> wrote:

> Philip Ronan <phil.ronanzzz@virgin.net> wrote:
>> [...] you should use Unicode character entities.
>
> Jukka K. Korpela replied:
>> Make it "could". Why not use UTF-8?
>
> UTF-8 *is* Unicode. It's just an encoding. Philip didn't specify any
> encoding - OP might as well use UCS, although it's larger.


Philip said to use "Unicode character entities", and from his example it is
clear that he was talking about "numeric character references" -
&#number;. As such, Philip didn't need to specify any encoding: in HTML,
numeric character references always refer to Unicode code points, so the
encoding used would be irrelevant.
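A small Python contrast may help here. A raw byte such as 0xE9 means different characters depending on the declared encoding, whereas the reference `&#233;` is made of ASCII bytes that are identical in both encodings, and its number is always taken as a Unicode code point. The two encodings chosen are just examples.

```python
import html

raw_byte = b"\xe9"
print(raw_byte.decode("latin-1"))  # é (U+00E9) in ISO-8859-1
print(raw_byte.decode("cp1251"))   # й (U+0439) in windows-1251: same byte, other character

reference = b"&#233;"  # pure ASCII, so identical bytes in both encodings
for charset in ("latin-1", "cp1251"):
    # The number 233 is looked up in Unicode, not in the page's charset.
    assert html.unescape(reference.decode(charset)) == "\u00e9"
```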

Jukka was pointing out that instead of the character references the OP
could use UTF-8 and include the characters directly in the page.
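A sketch of that alternative in Python: the characters are written directly and the file is saved as UTF-8, with a `meta` element declaring the encoding (an HTTP `Content-Type` header sent by the server takes precedence over it). The file name `index.html`, the temporary directory and the page content are placeholders.

```python
import tempfile
from pathlib import Path

page = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Translations</title>
</head>
<body>
<p><a href="ja/">\u65e5\u672c\u8a9e</a> <a href="ru/">\u0420\u0443\u0441\u0441\u043a\u0438\u0439</a></p>
</body>
</html>
"""

# The Japanese and Russian link texts are stored as UTF-8 byte sequences, no &#...; needed.
out_file = Path(tempfile.mkdtemp()) / "index.html"
out_file.write_text(page, encoding="utf-8")
```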

Steve

WindAndWaves 11-22-2004 08:54 PM

Re: unicode characters in asci file
 

Thank you all for your replies. I now understand that it is indeed
possible to have 'funny' characters in an ASCII file. You see, I have an
index file, which I would like to load quickly, but also contains some
Japanese, Russian, Chinese, etc. characters (links pointing to translations
of the page). Now, I could either double the file size by saving it as
Unicode or I could use the &#number; codes to specify the characters that I
need.

Can someone please confirm that I understood this correctly.

Thank you


- Nicolaas

PS: Does anyone know of any programs / online applications that can translate
characters into these codes (&#number;)?



Jukka K. Korpela 11-22-2004 09:08 PM

Re: unicode characters in asci file
 
"WindAndWaves" <access@ngaru.com> wrote:

> I have an index file, which I would like to load quickly, but also
> contains some Japanese, Russian, Chinese, etc.. characters (links
> pointing to translations of the page).


Ideally, we would use language negotiation (a protocol for selecting
content based on the language preferences in the browser and information
on existing versions in the server) for sending the user the best
alternative available. But this is unreliable since most people have
wrong language settings in their browsers, so a multilingual index file
is indeed needed for a multilingual site.
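For readers unfamiliar with language negotiation: the browser sends an `Accept-Language` request header, for example `ja,en-US;q=0.8,en;q=0.5`, and the server picks the best available version. Below is a simplified Python sketch of that choice. The function names and the list of available languages are hypothetical, and the parsing ignores some edge cases of the HTTP grammar. It uses the RFC 2616 matching rule, under which a range such as `en` matches `en` or `en-GB`. As Jukka notes, misconfigured browsers are exactly why a multilingual index page is still needed.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Split an Accept-Language value into (language-range, q) pairs, best first."""
    ranges = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        lang, _, params = piece.partition(";")
        q = 1.0
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        ranges.append((lang.strip().lower(), q))
    # sorted() is stable, so ranges with equal q keep the browser's order.
    return sorted(ranges, key=lambda item: item[1], reverse=True)


def choose_language(header: str, available: list[str], default: str) -> str:
    """Return the available tag matched by the highest-q range, else the default."""
    for lang_range, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for tag in available:
            t = tag.lower()
            if lang_range == "*" or t == lang_range or t.startswith(lang_range + "-"):
                return tag
    return default


print(choose_language("ja,en-US;q=0.8,en;q=0.5", ["en", "ja", "ru"], "en"))
```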

> Now, I could either double
> the file in size by saving it as unicode or I could use the &#number;
> codes to specify the characters that I need.


You can use either of the methods, but please note that using Unicode
does not double the file size. Well, sometimes it might, but normally it
won't. In UTF-8, each Ascii character takes just one octet (byte), just
as in a pure Ascii file. Other characters take two or more octets each,
but if your document (including HTML markup, which uses Ascii only) is
predominantly Ascii characters, the increase in file size won't be big, and
the UTF-8 version will probably be a little smaller than a version that uses
&#number; references. (After all, &#1234; is seven octets.)

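The arithmetic can be checked in Python. The character U+04D2 (code point 1234) takes two octets in UTF-8 but seven as the ASCII reference `&#1234;`. A short link of the kind the original poster described (the markup is an arbitrary example) comes out smaller in UTF-8 than with references:

```python
ch = "\u04d2"                        # code point 1234
as_reference = f"&#{ord(ch)};"       # "&#1234;"
print(len(as_reference))             # 7 octets in an ASCII file
print(len(ch.encode("utf-8")))       # 2 octets in UTF-8

page = '<li><a href="ja/">\u65e5\u672c\u8a9e</a></li>'
utf8_size = len(page.encode("utf-8"))
refs_size = len(page.encode("ascii", errors="xmlcharrefreplace"))
print(utf8_size, refs_size)
```
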
> PS does anyone know of any programs / online applications that can
> translate characters into these codes (&#number;)


There are many of them, for different platforms. See
http://www.alanwood.net/unicode/utilities_editors.html
(which is about Unicode editors, which let you work with UTF-8 in
general, but they often have an output mode that uses &#number;).
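If no ready-made tool is at hand, a few lines of Python will do the job. This sketch (the helper names `to_decimal_refs` and `convert_file` are hypothetical) reads a UTF-8 file and writes an ASCII-only copy in which every non-ASCII character becomes a decimal `&#number;` reference, using the standard `xmlcharrefreplace` error handler. One caveat: browsers do not interpret references inside `<script>` or `<style>` content, so characters there should be handled separately.

```python
def to_decimal_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def convert_file(src_path: str, dst_path: str) -> None:
    """Copy a UTF-8 file to an ASCII file, converting non-ASCII characters to &#number;."""
    # newline="" on both sides keeps the original line endings unchanged.
    with open(src_path, encoding="utf-8", newline="") as src:
        converted = to_decimal_refs(src.read())
    with open(dst_path, "w", encoding="ascii", newline="") as dst:
        dst.write(converted)
```

Usage would be something like `convert_file("index-utf8.html", "index.html")`, with both file names chosen by you.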




Sybren Stuvel 11-23-2004 03:09 PM

Re: unicode characters in asci file
 
Steve Pugh enlightened us with:
> Jukka was pointing out that instead of the character references the
> OP could use UTF-8 and include the characters directly in the page.


Ah, ok! Indeed, that's very possible, and I do it often.

Sybren

Sybren Stuvel 11-23-2004 03:11 PM

Re: unicode characters in asci file
 
WindAndWaves enlightened us with:
> I know understand that it is indeed possible to have 'funny'
> characters in an ascii file.


Strictly speaking, it's not. You have references to 'funny'
characters, but the references themselves are ASCII again, so no
'funny' characters are actually in the file. Or you have UTF-8 'funny'
characters in the file, but then the file isn't ASCII any more.
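The distinction can be checked mechanically: a file is ASCII only if every byte is below 128. In this Python sketch (sample text arbitrary), the reference version decodes cleanly as ASCII, while the UTF-8 version is rejected because the Japanese characters become multi-byte sequences with the high bit set.

```python
text = "\u65e5\u672c"

with_refs = text.encode("ascii", errors="xmlcharrefreplace")  # b"&#26085;&#26412;"
as_utf8 = text.encode("utf-8")                                  # b"\xe6\x97\xa5\xe6\x9c\xac"

with_refs.decode("ascii")  # succeeds: only ASCII bytes, so this really is an ASCII file
try:
    as_utf8.decode("ascii")
except UnicodeDecodeError as err:
    print("UTF-8 version is not ASCII:", err.reason)
```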

> You see, I have an index file, which I would like to load quickly,
> but also contains some Japanese, Russian, Chinese, etc.. characters
> (links pointing to translations of the page). Now, I could either
> double the file in size by saving it as unicode or I could use the
> &#number; codes to specify the characters that I need.


You understood it incorrectly. If you were to store the Unicode text as
UCS-2 (two bytes per character), you'd be right. If you use UTF-8 to store
it, the ASCII characters still take a single byte each, and the others two
or more.
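That rule of thumb follows from how UTF-8 splits the code-point range (RFC 3629). A small Python illustration, with one sample character each from Latin, Cyrillic and Japanese; the helper name `utf8_length` is hypothetical and is checked against Python's own encoder:

```python
def utf8_length(code_point: int) -> int:
    """Bytes UTF-8 needs for one code point, following the ranges in RFC 3629."""
    if code_point < 0x80:
        return 1  # ASCII: unchanged, one byte
    if code_point < 0x800:
        return 2  # e.g. Cyrillic, Greek, Hebrew, Arabic
    if code_point < 0x10000:
        return 3  # rest of the Basic Multilingual Plane, including most CJK
    return 4      # supplementary planes

for ch in ("A", "\u0416", "\u65e5"):  # Latin A, Cyrillic Zhe, CJK "sun/day"
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {utf8_length(ord(ch))} byte(s)")
```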

> PS does anyone know of any programs / online applications that can
> translate characters into these codes (&#number;)


I think HTML Tidy can do that.

Sybren


All times are GMT.