Re: Chinese text in HTML page and Byte-Order Mark
2013-05-28 0:16, Alfred Molon wrote:
> I've noticed that some pages use <span lang="zh" xml:lang="zh"> to embed
> Chinese text, but even simply embedding Chinese text in a UTF-8 HTML
> page seems to work fine as well
Yes, pages work without the lang attribute, but using it may have some
benefits.
> Why then would this language declaration be necessary?
According to accessibility guidelines, the language of text should be
declared, to help e.g. in speech synthesis. This applies to all texts,
including English-language texts. It would apply especially strongly to
Chinese texts, since the way "Chinese" characters (characters of Chinese
origin, used for writing Chinese, Japanese, and other languages) are
pronounced may essentially depend on language. But this is largely just
theory: speech synthesizers will guess the language or use a fixed
language or use the language selected by the user.
There are other reasons for declaring language, see
but I will just illustrate one of them:
When I view a page containing Chinese characters, on Firefox, those
characters appear in my system in the MS PGothic font, when the page
does not have any font settings. If the characters are inside an element
to which lang=zh applies, they appear in the SimSun font instead. And if
the attribute is lang=zh-TW or lang=zh-Hant, they appear in PMingLiU.
The reason is that the attribute makes the browser apply different
default fonts for different languages.
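A minimal test page for trying this out might look like the sketch below. The sample characters are my own choice for illustration (their shapes often differ noticeably between regional fonts), and the fonts you actually see depend on your browser and the fonts installed on your system.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>lang attribute and default fonts</title>
</head>
<body>
<!-- No font settings anywhere, so the browser's own defaults apply. -->
<p>No lang attribute applies: 直 骨</p>
<p>lang=zh: <span lang="zh">直 骨</span></p>
<p>lang=zh-TW: <span lang="zh-TW">直 骨</span></p>
<p>lang=zh-Hant: <span lang="zh-Hant">直 骨</span></p>
</body>
</html>
```

Inspecting each line with the browser's developer tools should show which font was actually used for the characters.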
Nowadays, few authors leave fonts unspecified. The main reason is
probably that most browsers have Times New Roman as the default font,
and it is common knowledge, or prejudice, that it is unsuitable for web
pages. So authors declare Arial, because someone told them it's cool, or
Verdana, since someone said it's even cooler. And because those fonts
aren't really cool at all in normal font size, authors too often set
font size to something barely legible, but I digress.
On the page you mentioned, the font family declaration in CSS is
font-family: Verdana, Arial, Helvetica, sans-serif. Since none of the
specific font families listed contains Chinese characters, the browser
will use its definition for sans-serif and, if it does not contain them
either, pick them up from some of the fonts in the system, using its own
fallback rules.
The moral is that when using Chinese characters, you should take them
into account when writing your font-family rule. This is not obligatory,
but it's the right way to ensure (as far as possible) that the font used
for them will be acceptable and will stylistically match the font used
in the text otherwise.
And when you do so, the lang attribute does not matter in font selection
- but it is advisable to use it for other reasons.
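As a sketch of such a rule: the Chinese font names here ("Microsoft YaHei" and SimHei, both sans-serif fonts found on many Windows systems) are assumptions for illustration, not something taken from the page in question. Substitute whatever fonts your readers are likely to have.

```html
<style>
/* Latin fonts first, then Chinese fonts of a similar sans-serif style.
   "Microsoft YaHei" and SimHei are illustrative assumptions. */
body {
  font-family: Verdana, Arial, Helvetica, "Microsoft YaHei", SimHei, sans-serif;
}
</style>
<p>English text with <span lang="zh">中文</span> inside.</p>
```

Since Verdana, Arial and Helvetica contain no Chinese characters, the browser moves down the list and takes them from the first listed font that has them, instead of falling back to its own choice.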
> Another question, the above page validates without errors, but I get the
> Byte-Order Mark found in UTF-8 File.
> The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to
> cause problems for some text editors and older browsers. You may want to
> consider avoiding its use until it is better supported.
That's grossly outdated information, probably retained just because some
people think there *might* be some browser in use that has problems with
BOM. There isn't. Hasn't been for many years. Except perhaps in a
museum, where Netscape 2 and IE 3 can be seen.
In the modern world, BOM is *good* even in UTF-8. It acts as a
practically certain way of indicating that the page is UTF-8 encoded,
even if HTTP headers are missing (e.g., because the page has been saved
locally).
You may have problems if you have a BOM at the start of a PHP file. But
that's something completely different.
> How to remove the Byte-Order Mark?
You could remove it by using an editor that can save in the "UTF-8, no
BOM" format. But there is no reason to remove it.