Re: Chinese text in HTML page and Byte-Order Mark

 
 
Jukka K. Korpela
      05-28-2013
2013-05-28 0:16, Alfred Molon wrote:

> I've noticed that some pages use <span lang="zh" xml:lang="zh"> to embed
> Chinese text, but even simply embedding Chinese text in a UTF-8 HTML
> page seems to work fine as well


Yes, pages work without the lang attribute, but using it may have some
effect.

> Why then would this language declaration be necessary?


According to accessibility guidelines, the language of text should be
declared, to help e.g. in speech synthesis. This applies to all texts,
including English-language texts. In principle it applies especially
strongly to Chinese texts, since the way "Chinese" characters
(characters of Chinese origin, used for writing Chinese, Japanese, and
other languages) are pronounced may essentially depend on language. But
this is largely just theory: speech synthesizers will guess the
language, use a fixed language, or use the language selected by the
user.

There are other reasons for declaring language, see
http://www.w3.org/International/ques...qa-lang-why.en
but I will just illustrate one of them:

When I view a page containing Chinese characters, on Firefox, those
characters appear in my system in the MS PGothic font, when the page
does not have any font settings. If the characters are inside an element
to which lang=zh applies, they appear in the SimSun font instead. And if
the attribute is lang=zh-TW or lang=zh-Hant, they appear in PMingLiU.
The reason is that the attribute makes the browser apply different
default fonts.

Nowadays, few authors leave fonts unspecified. The main reason is
probably that most browsers have Times New Roman as the default font,
and it is common knowledge, or prejudice, that it is unsuitable for web
pages. So authors declare Arial, because someone told them it's cool,
or Verdana, since someone said it's even cooler. And because those
fonts aren't really cool at all in normal font size, authors too often
set font size to something barely legible, but I digress.

On the page you mentioned, the font family declaration in CSS is
font-family: Verdana, Arial, Helvetica, sans-serif. Since none of the
specific font families listed contains Chinese characters, the browser
will use its definition for sans-serif and, if it does not contain them
either, pick them up from some of the fonts in the system, using its own
internal rules.

The moral is that when using Chinese characters, you should take them
into account when writing your font-family rule. This is not obligatory,
but it's the right way to ensure (as far as possible) that the font used
for them will be acceptable and will stylistically match the font used
in the text otherwise.

And when you do so, the lang attribute does not matter in font selection
- but it is advisable to use it for other reasons.

> Another question, the above page validates without errors, but I get the
> warning:
>
> Byte-Order Mark found in UTF-8 File.
>
> The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to
> cause problems for some text editors and older browsers. You may want to
> consider avoiding its use until it is better supported.


That's grossly outdated information, probably retained just because some
people think there *might* be some browser in use that has problems with
BOM. There isn't. Hasn't been for many years. Except perhaps in a
museum, where Netscape 2 and IE 3 can be seen.

In the modern world, BOM is *good* even in UTF-8. It acts as a
practically certain way of indicating that the page is UTF-8 encoded,
even if HTTP headers are missing (e.g., because the page has been saved
locally).
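
To illustrate why the BOM works as an encoding indicator: in UTF-8 it is
the fixed three-byte sequence EF BB BF at the very start of the file, so
checking for it is trivial. The following is only a minimal Python
sketch of that check, not a description of what any browser actually
does; the function names and the sample page content are made up for
illustration.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_page(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present.

    The "utf-8-sig" codec accepts input both with and without a BOM.
    """
    return data.decode("utf-8-sig")


# Hypothetical sample data: the same page saved with and without a BOM.
page_without_bom = "<p>中文</p>".encode("utf-8")
page_with_bom = codecs.BOM_UTF8 + page_without_bom

print(has_utf8_bom(page_with_bom))     # True
print(has_utf8_bom(page_without_bom))  # False
```

Both variants decode to the same text with decode_page(); only the
first one carries the explicit marker.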

You may have problems if you have a BOM at the start of a PHP file. But
that's something completely different.

> How to remove the Byte-Order Mark?


You could remove it by using an editor that can save in the "UTF-8, no
BOM" format. But there is no reason to remove it.
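
If you still want to remove it (say, for the PHP case) and your editor
lacks such an option, a few lines of script will do. This is just a
sketch in Python, assuming the file is small enough to read into memory;
the function name strip_utf8_bom and the file name in the usage line are
hypothetical.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: str) -> bool:
    """Rewrite the file without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False if the file
    had no BOM and was left untouched.
    """
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Drop exactly the three BOM bytes; the rest is written unchanged.
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Usage would be something like strip_utf8_bom("page.html"). The file is
overwritten in place, so keep a copy if in doubt.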

--
Yucca, http://www.cs.tut.fi/~jkorpela/
 