prob's w foreign char sets ...

 
 
Zigzag
 
      05-20-2012
hi,

I'm attempting to translate some web content into other languages
and need some help understanding the proper coding to have these
characters show up consistently, cross-browser & cross platform.

I'm using HTML 4.01 transitional for these documents, and for the
foreign character sets (ie, chinese, korean, japanese etc), I'm using
the unicode numeric reference example: &#xxxxx; ...

I normally state the page language in the HTML tag (ie <html lang="ko">)
also, using the meta tag:
<meta http-equiv="Content-Language" content="ko">

The web pages all seem to show up well on a new Mac computer,
but on an older PC laptop, none of the Korean, Chinese or Japanese
shows up properly. (firefox renders little squares with a stack of
numbers & letters in them; opera renders plain squares).

Are there any consistent 'rules' to follow for displaying
and rendering unicode character sets properly?

thanks for any pointers.

ZZ
 
 
 
 
 
Zigzag
 
      05-20-2012

ps - wondering what the proper character set declaration
should be, ie, UTF-8, iso-8859-1, etc ...

thanks
 
 
 
 
 
Jukka K. Korpela
 
      05-21-2012
2012-05-20 22:55, Zigzag wrote:

> [...] for the
> foreign character sets (ie, chinese, korean, japanese etc), I'm using
> the unicode numeric reference example: &#xxxxx; ...


That's possible and works independently of the character encoding of the
HTML document, but it really makes HTML source hard to read. It's
comparable to writing "hello" as "&#x68;&#x65;&#x6c;&#x6c;&#x6f;" (which
is possible).
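
As a minimal sketch of that equivalence (the sample word is an illustrative choice, not taken from the original post), both paragraphs below are meant to display the Korean word "hangul". The first uses hexadecimal character references and is pure ASCII in the source, so it survives any declared encoding. The second uses the characters directly and only works if the file is saved and declared in an encoding that contains them, such as UTF-8.

```html
<!-- Character references: ASCII-only source, independent of the document encoding -->
<p>&#xD55C;&#xAE00;</p>

<!-- The same two characters typed directly: requires an encoding such as UTF-8 -->
<p>한글</p>
```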

> I normally state the page language in the HTML tag (ie <html lang="ko">)
> also, using the meta tag:
> <meta http-equiv="Content-Language" content="ko">


Neither of these has much effect, but the former can be regarded as good
practice in principle (the latter is then redundant). Beware that it may
change the default font used by the browser, as this can be
language-dependent. Any setting of the document's overall font family,
say body { font-family: Gulim, Malgun Gothic }, will override that,
though, whenever it lists any specific font that is available in the
user's system.

If you open the Settings in Firefox and select Contents, there's a
button for "additional settings" for fonts (I'm using a Finnish version
now, so I don't know what the exact English terms are here). There you can
see (and modify) the settings for various character repertoires, like
Korean, setting the default serif font, default sans-serif font, and
default monospace font. The factory defaults for these defaults are
probably reasonable, though perhaps not optimal.

The moral is: If you use lang markup, you should expect variation of
fonts across browsers, and possibly fonts other than those that would be
used without the lang markup. But you can largely remove this variation
by explicitly specifying a list of fonts, in order of preference.
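
A minimal sketch of such a setup, assuming the two Korean fonts named above. The generic sans-serif fallback at the end, the page title and the sample text are additions for illustration only:

```html
<html lang="ko">
<head>
<title>Font list sketch</title>
<style type="text/css">
/* Fonts are tried in order: for each character the browser uses the first
   listed font that is installed and has a glyph for that character. */
body { font-family: Gulim, "Malgun Gothic", sans-serif; }
</style>
</head>
<body>
<p>&#xD55C;&#xAE00;</p>
</body>
</html>
```

Quoting the multi-word name "Malgun Gothic" is optional in CSS but avoids surprises if a font name happens to contain a keyword.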

> The web pages all seem to show up well on a new Mac computer,
> but on an older PC laptop, none of the Korean, Chinese or Japanese
> shows up properly. (firefox renders little squares with a stack of
> numbers & letters in them; opera renders plain squares).


This is most probably due to lack of suitable fonts on the PC, but it is
also possible that the browser just can't find a relevant font and needs
a little help. Testing on different browsers may reveal this.
You can also try with the following style sheet:

* { font-family: Batang }

This should work on Windows XP and later.
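
One way to run that test is a small page like the following sketch. The sample words (the Korean, Chinese and Japanese words for "hangul", "Chinese" and "Japanese") and the page title are illustrative assumptions. Batang is a Korean font, so whether it covers the Chinese and Japanese samples on a given system is also an assumption to verify. If the squares persist with the rule in place, the font is most likely missing from the system; if the text only shows up with the rule, the browser needed help choosing a font.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>CJK font test</title>
<style type="text/css">
/* The rule from the post: force one font for every element. */
* { font-family: Batang }
</style>
</head>
<body>
<p lang="ko">&#xD55C;&#xAE00;</p>          <!-- Korean -->
<p lang="zh">&#x4E2D;&#x6587;</p>          <!-- Chinese -->
<p lang="ja">&#x65E5;&#x672C;&#x8A9E;</p>  <!-- Japanese -->
</body>
</html>
```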

More info: Guide to using special characters in HTML,
http://www.cs.tut.fi/~jkorpela/html/characters.html

There's a specific issue with Korean: you can write a hangul syllable
using one syllabic character, or as decomposed, using several
characters. The choice between these representations isn't supposed to
affect the rendering, but in reality it may, due to font limitations and
program limitations.
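
As an illustration (the syllable chosen is an assumption for the example), the syllable HAN can be written as the single precomposed character U+D55C or as the sequence of three conjoining jamo U+1112, U+1161, U+11AB. The two forms are canonically equivalent in Unicode, so comparing how the paragraphs below render on a problem system is a quick way to check whether the font or browser handles the decomposed form:

```html
<!-- Precomposed: one syllable character -->
<p lang="ko">&#xD55C;</p>

<!-- Decomposed: leading consonant + vowel + trailing consonant -->
<p lang="ko">&#x1112;&#x1161;&#x11AB;</p>
```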

--
Yucca, http://www.cs.tut.fi/~jkorpela/
 
 
mayeul.marguet
 
      05-21-2012
On 20/05/2012 22:03, Zigzag wrote:
>
> ps - wondering what the proper character set declaration
> should be, ie, UTF-8, iso-8859-1, etc ...
>
> thanks


If you insist on using character references for everything non-US, then
it doesn't matter. Declare whatever you want and use it.

Otherwise, you need to use and declare UTF-8, as iso-8859-1 just can't
write korean.
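
For an HTML 4.01 Transitional document like the one described, a UTF-8 declaration might look like the sketch below (the title and sample text are placeholders). The file itself must also be saved as UTF-8, and a charset parameter in the server's HTTP Content-Type header, if present, takes precedence over the meta tag.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<html lang="ko">
<head>
<!-- Declare the encoding before any non-ASCII content, including the title -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Sample page</title>
</head>
<body>
<p>한글</p>
</body>
</html>
```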

--
Mayeul
 
 
Jukka K. Korpela
 
      05-21-2012
2012-05-21 13:22, mayeul.marguet wrote:

> On 20/05/2012 22:03, Zigzag wrote:
>>
>> ps - wondering what the proper character set declaration
>> should be, ie, UTF-8, iso-8859-1, etc ...
>>
>> thanks

>
> If you insist on using character references for everything non-US, then
> it doesn't matter. Declare whatever you want and use it.


In principle, yes. And e.g. ASCII, UTF-8, iso-8859-1 and windows-1252
produce exactly the same bytes when there are no characters outside the
ASCII range in the data.

> Otherwise, you need to use and declare UTF-8, as iso-8859-1 just can't
> write korean.


UTF-8 is probably the best option, but there _are_ several encodings
specifically designed for Korean. But they are usually not a good choice
for web pages. For example, my IE 9 has only one Korean encoding in its
menu for encoding selection, and it is labelled "korealainen" (=
Korean), leaving it to the user to guess which of the Korean encodings
it is...

Besides, if you later find out that you need characters outside the set
supported by a Korean encoding you've selected, you'll face the problem
again: either use clumsy character references, or switch to UTF-8 (which
may be non-trivial after you've created a large site in another encoding).

--
Yucca, http://www.cs.tut.fi/~jkorpela/
 
 
Neil Gould
 
      05-21-2012
Zigzag wrote:
>
> I'm attempting to translate some web content into other languages
> and need some help understanding the proper coding to have these
> characters show up consistently, cross-browser & cross platform.
>

[...]
>
> The web pages all seem to show up well on a new Mac computer,
> but on an older PC laptop, none of the Korean, Chinese or Japanese
> shows up properly. (firefox renders little squares with a stack of
> numbers & letters in them; opera renders plain squares).
>

Without the appropriate language fonts installed, you will see those
squares, so make sure that your PC has the correct fontset installed, and it
may render them correctly without changes to the HTML character set.


--
best regards,

Neil




 
 
Jukka K. Korpela
 
      05-21-2012
2012-05-21 15:17, Neil Gould wrote:

> Without the appropriate language fonts installed, you will see those
> squares, so make sure that your PC has the correct fontset installed, and it
> may render them correctly without changes to the HTML character set.


Little does that help to make the page display properly on anyone else’s
computer. And installing a “fontset” (whatever that means) does *not*
ensure that all browsers will use it automatically.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
 
 
Neil Gould
 
      05-21-2012
Jukka K. Korpela wrote:
> 2012-05-21 15:17, Neil Gould wrote:
>
>> Without the appropriate language fonts installed, you will see
>> those squares, so make sure that your PC has the correct fontset
>> installed, and it may render them correctly without changes to the
>> HTML character set.

>
> Little does that help to make the page display properly on anyone
> else’s computer. And installing a “fontset” (whatever that
> means) does *not* ensure that all browsers will use it automatically.
>

You snipped the context of my remark:
>
> The web pages all seem to show up well on a new Mac computer,
> but on an older PC laptop, none of the Korean, Chinese or Japanese
> shows up properly.
>

I was addressing one reason that Korean, Chinese and Japanese fonts would
not display correctly on *the OP's* older PC. Those fonts are not installed
by default on some older PCs, and will not display regardless of the HTML
character set specified. I was not addressing the issue as it applies to
"anyone else's computer", nor whether all browsers would display installed
fonts correctly.

--
best regards,

Neil



 
 
Andreas Prilop
 
      05-22-2012
On Mon, 21 May 2012, mayeul.marguet wrote:

>> wondering what the proper character set declaration should be

>
> If you insist on using character references for everything non-US,
> then it doesn't matter.


It does matter. If you choose ISO-8859-6 or Windows-1256, then
Internet Explorer's default typeface will be Simplified Arabic.
Even in Windows 7, Simplified Arabic does not contain Urdu letters
although Urdu letters were added to code page 1256 with Windows 2000.
Internet Explorer will take the Urdu letters from some other font,
thereby failing to join the letters.
http://www.unicode.org/mail-arch/uni...0110/join.html
http://www.unicode.org/mail-arch/uni...-m03/0110.html

--
In memoriam Alan J. Flavell
http://www.alanflavell.org.uk/charset/
 
 
mayeul.marguet
 
      05-23-2012
On 22/05/2012 17:28, Andreas Prilop wrote:
> On Mon, 21 May 2012, mayeul.marguet wrote:
>
>>> wondering what the proper character set declaration should be

>>
>> If you insist on using character references for everything non-US,
>> then it doesn't matter.

>
> It does matter. If you choose ISO-8859-6 or Windows-1256, then
> Internet Explorer's default typeface will be Simplified Arabic.
> Even in Windows 7, Simplified Arabic does not contain Urdu letters
> although Urdu letters were added to code page 1256 with Windows 2000.
> Internet Explorer will take the Urdu letters from some other font,
> thereby failing to join the letters.
> http://www.unicode.org/mail-arch/uni...0110/join.html
> http://www.unicode.org/mail-arch/uni...-m03/0110.html
>


I was reading the question more as an either-or between utf-8 or
windows-1252, but fine observation, thanks.

--
Mayeul
 