unprintable characters in a javascript produced msgbox

 
 
emrefan
Guest
Posts: n/a
 
      07-02-2008
I am wondering a bit about what I should see in a message box (or in a
webpage, for that matter) when I include an unprintable ASCII
character, say ASCII 255, in there. I experimented a bit on my PC
running Traditional Chinese Windows 98SE and found that the following
javascript code produced a message that seemed to have ASCII
represented as "y".

alert( 'the following char is ASCII FF: \xff. So what does it look like to you?' );

I had this line in the <HEAD> section of the relevant HTML file where
I put that javascript code:

<meta http-equiv='Content-Type' content='text/html; charset=Big5-HKSCS'>

But even if I try to figure that into the picture, I still can't see
why it should come out as "y".

Can anybody please enlighten this thick mind?
 
 
 
 
 
Thomas 'PointedEars' Lahn
Guest
Posts: n/a
 
      07-02-2008
emrefan wrote:
> I am wondering a bit about what I should see in a message box (or in a
> webpage, for that matter) when I include an unprintable ASCII character,
> say ASCII 255, in there.


The (7-bit US-)ASCII character set ranges from code points 0 (0x00) to 127
(0x7F). Everything else is _not_ part of (US-)ASCII code:

<http://en.wikipedia.org/wiki/ASCII>

> I experimented a bit on my PC running Traditional Chinese Windows 98SE
> and found that the following javascript code produced a message that
> seemed to have ASCII represented as "y".


You are getting the LATIN SMALL LETTER Y WITH DIAERESIS character ("ÿ"; note
that there are two dots in the ascent) because this is the character at code
point U+00FF in the Unicode character set as defined in the Unicode
Standard, versions 2.1 and later (a conforming implementation of ECMAScript
Edition 3 must implement the latter), and at code point 255 (0xFF) of
several other character sets, most notably ISO/IEC 8859-1 and Windows-1252:

<http://en.wikipedia.org/wiki/ISO/IEC_8859-1#Related_character_maps>
<http://unicode.org/>
<http://www.ecmascript.org/>
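
A quick check, as a minimal sketch (the variable names are illustrative and not from the thread): in an engine that conforms to ECMAScript Edition 3 or later, the `\xff` and `\u00ff` escapes should yield the same one-character string with code unit 255.

```javascript
// In a conforming engine, \xHH denotes the character whose code unit
// value is HH, so '\xff' is the same string as '\u00ff' (U+00FF).
var fromHexEscape = '\xff';
var fromUnicodeEscape = '\u00ff';

var sameString = fromHexEscape === fromUnicodeEscape;
var codeUnit = fromHexEscape.charCodeAt(0);
var stringLength = fromHexEscape.length;

console.log(sameString, codeUnit, stringLength);
```

This only shows what the string value is inside the engine. How a given user agent renders that value in a message box is a separate question.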

> alert( 'the following char is ASCII FF: \xff. So what does it look like
> to you?' );


Should be window.alert(...) so as to rely less on the UA's scope chain.

> I had this line in the <HEAD> section of the relevant HTML file where I
> put that javascript code:
>
> <meta http-equiv='Content-Type' content='text/html; charset=Big5- HKSCS'>
>
> But even if I try to figure that into the picture, I still can't see why
> it should come out as "y".


The display behavior for the code point 0xFF of the *proposed* character
encoding Big5-HKSCS (which uses the Big5 Character Set with Hong Kong
Supplementary Character Set), even if written properly, is undefined:

<http://en.wikipedia.org/wiki/Big5#HKSCS>
<http://www.iana.org/assignments/charset-reg/>

You should also check the HTTP response message's headers for a
`Content-Type' header that says differently, for it takes precedence then:

<http://www.w3.org/TR/1999/REC-html401-19991224/charset.html#h-5.2.2>
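
The precedence rule referenced above can be sketched as follows. This is an illustration only: `charsetParam` and `effectiveCharset` are hypothetical helper names, the regular expression is a simplification of real header parsing, and actual user agents apply further fallbacks.

```javascript
// Hypothetical helper: pull the charset parameter out of a Content-Type value.
function charsetParam(contentType) {
  if (!contentType) return null;
  var match = /;\s*charset\s*=\s*"?([^\s;"]+)"?/i.exec(contentType);
  return match ? match[1] : null;
}

// Hypothetical helper: the HTTP header's charset wins when it is present;
// otherwise fall back to the META http-equiv declaration.
function effectiveCharset(httpContentType, metaContent) {
  return charsetParam(httpContentType) || charsetParam(metaContent);
}

var fromHeader = effectiveCharset(
  'text/html; charset=ISO-8859-1',
  'text/html; charset=Big5-HKSCS'
);
var fromMeta = effectiveCharset(
  'text/html',
  'text/html; charset=Big5-HKSCS'
);

console.log(fromHeader, fromMeta);
```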


HTH

PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee
 
 
 
 
 
Bart Van der Donck
Guest
Posts: n/a
 
      07-02-2008
emrefan wrote:

> I am wondering a bit about what I should see in a message box (or in a
> webpage, for that matter) ...


Character encoding in message boxes or web pages are two totally
different things.

> ... when I include an unprintable ASCII character, say ASCII 255,
> in there.


Code points above 127 are not ASCII anymore. And why would it be
unprintable ?

> I experimented a bit on my PC running Traditional Chinese Windows
> 98SE and found that the following javascript code produced a
> message that seemed to have ASCII represented as "y".


Google Groups probably replaced your "y-umlaut" by "y".

> alert( 'the following char is ASCII FF: \xff. So what does it
> look like to you?' );


This always looks the same for everyone, namely a y with an umlaut on.
No other display is possible here.

> I had this line in the <HEAD> section of the relevant HTML file where
> I put that javascript code:
>
> <meta http-equiv='Content-Type' content='text/html; charset=Big5-HKSCS'>


That line does not affect javascript's internal code point table (like
eg. \xff). It defines which character set must be used on the web
page. For displaying y-umlaut on a web page, you probably want:

<meta http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1">

If you want both ISO-8859-1 characters and Chinese on the same page, I
would definitely go for UTF-8.

> But even if I try to figure that into the picture, I still can't see
> why it should come out as "y".


Because you get what you define. If you say ISO-8859-1, then the
browser ties code point 255 to y-umlaut. If you say ISO-8859-2, then
you get an upper dot, etc.
http://en.wikipedia.org/wiki/ISO_8859-1
http://en.wikipedia.org/wiki/ISO_8859-2
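
To illustrate how the same byte value maps to different characters depending on the declared character set, here is a small sketch using a hand-written lookup table for byte 0xFF only. The table and the `decodeByteFF` name are illustrative assumptions, not a browser API, and only a handful of encodings are listed.

```javascript
// Code point that the single byte 0xFF maps to in a few 8-bit encodings.
var byteFF = {
  'ISO-8859-1': 0x00FF,   // LATIN SMALL LETTER Y WITH DIAERESIS
  'windows-1252': 0x00FF, // LATIN SMALL LETTER Y WITH DIAERESIS
  'ISO-8859-2': 0x02D9,   // DOT ABOVE
  'ISO-8859-5': 0x045F,   // CYRILLIC SMALL LETTER DZHE
  'windows-1251': 0x044F, // CYRILLIC SMALL LETTER YA
  'KOI8-R': 0x042A        // CYRILLIC CAPITAL LETTER HARD SIGN
};

// Hypothetical helper: returns the character for byte 0xFF in the given
// charset, or null when the charset is not in the table above.
function decodeByteFF(charset) {
  if (!Object.prototype.hasOwnProperty.call(byteFF, charset)) return null;
  return String.fromCharCode(byteFF[charset]);
}

console.log(decodeByteFF('ISO-8859-1'), decodeByteFF('ISO-8859-2'));
```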

Hope this helps,

--
Bart
 
 
Thomas 'PointedEars' Lahn
Guest
Posts: n/a
 
      07-02-2008
Bart Van der Donck wrote:
> emrefan wrote:
>> I am wondering a bit about what I should see in a message box (or in a
>> webpage, for that matter) ...

>
> Character encoding in message boxes or web pages are two totally
> different things.


Not true.

>> alert( 'the following char is ASCII FF: \xff. So what does it
>> look like to you?' );

>
> This always looks the same for everyone, namely a y with an umlaut on.
> No other display is possible here.


You are mistaken. The \x string literal escape sequence may or may not
specify a Unicode character, depending on the ECMAScript implementation.

>> I had this line in the <HEAD> section of the relevant HTML file where
>> I put that javascript code:
>>
>> <meta http-equiv='Content-Type' content='text/html; charset=Big5-
>> HKSCS'>

>
> That line does not affect javascript's internal code point table (like
> eg. \xff).


It could affect it if there was no corresponding HTTP header present that
says otherwise. There is no "javascript", BTW.

> It defines which character set must be used on the web page.


Unless a corresponding HTTP header is present that says otherwise. There
are no "web pages", BTW.


PointedEars
--
var bugRiddenCrashPronePieceOfJunk = (
navigator.userAgent.indexOf('MSIE 5') != -1
&& navigator.userAgent.indexOf('Mac') != -1
) // Plone, register_function.js:16
 
 
Bart Van der Donck
Guest
Posts: n/a
 
      07-02-2008
Thomas 'PointedEars' Lahn wrote:

> Bart Van der Donck wrote:


>> Character encoding in message boxes or web pages are two totally
>> different things.

>
> Not true.


It is true, because the character encoding is done at a different
level. Message boxes -like in this example- are actually much easier.
There can only be one possible representation. But when trying to
write y-umlaut in a web page, you have a bunch of possibilities, off
the top of my head, at least 10 - of which of course some are
preferred over others.
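
Two of those possibilities are the decimal and hexadecimal numeric character references. A minimal sketch that builds them (`characterReferences` is a hypothetical name; HTML 4 also defines the named entity `&yuml;` for this character):

```javascript
// Hypothetical helper: build the decimal and hexadecimal numeric character
// references for the first character of a string.
function characterReferences(ch) {
  var code = ch.charCodeAt(0);
  return [
    '&#' + code + ';',
    '&#x' + code.toString(16).toUpperCase() + ';'
  ];
}

var refs = characterReferences('\xff');
console.log(refs.join(' '));
```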

>>> alert( 'the following char is ASCII FF: \xff. So what does it
>>> look like to you?' );

>
>> This always looks the same for everyone, namely a y with an umlaut on.
>> No other display is possible here.

>
> You are mistaken. The \x string literal escape sequence may or may not
> specify a Unicode character, depending on the ECMAScript implementation.


But I was only saying that alert('\xff') always shows y-umlaut in any
browser. y-umlaut is the character that is tied to code point 255 in
any ECMAScript implementation.

>>> <meta http-equiv='Content-Type' content='text/html; charset=Big5-HKSCS'>

>
>> That line does not affect javascript's internal code point table (like
>> eg. \xff).

>
> It could affect it if there was no corresponding HTTP header present that
> says otherwise.


Untrue. The display of \x.. (and \u....) can never be influenced by
any HTTP-header. The notation is ASCII-safe, and is passed to the
javascript engine to tie it to a fixed character. I think you're
mixing up the character set of a web page with javascript's consistent
internal code point table.

> There is no "javascript", BTW.


Is that so.

>> It defines which character set must be used on the web page.

>
> Unless a corresponding HTTP header is present that says otherwise.


That is far from sure, and could easily vary from browser to browser.
Anyway - it would be unwise to specify a charset on the web page that
contradicts the HTTP header (coder's fault, not browser's fault).

> There are no "web pages", BTW.


Is that so.

--
Bart
 
 
Thomas 'PointedEars' Lahn
Guest
Posts: n/a
 
      07-02-2008
Bart Van der Donck wrote:
> Thomas 'PointedEars' Lahn wrote:
>> Bart Van der Donck wrote:
>>> Character encoding in message boxes or web pages are two totally
>>> different things.

>> Not true.

>
> It is true, because the character encoding is done at a different level.
> Message boxes -like in this example- are actually much easier. There can
> only be one possible representation.


You are mistaken. It depends on the user agent which characters are
supported in a message box. However, it has been observed that message
boxes use the character set of their document, regardless of the encoding
that the ECMAScript implementation supports. We have discussed this here
before.

> But when trying to write y-umlaut in a web page, you have a bunch of
> possibilities, off the top of my head, at least 10 - of which of course
> some are preferred over others.


I don't think the OP wanted to write "y-umlaut" at all.

>>>> alert( 'the following char is ASCII FF: \xff. So what does it look
>>>> like to you?' );
>>> This always looks the same for everyone, namely a y with an umlaut
>>> on. No other display is possible here.

>> You are mistaken. The \x string literal escape sequence may or may not
>> specify a Unicode character, depending on the ECMAScript
>> implementation.

>
> But I was only saying that alert('\xff') always shows y-umlaut in any
> browser.


But you are dead wrong.

> y-umlaut is the character that is tied to code point 255 in any
> ECMAScript implementation.


However, there are implementations that do not support Unicode.

>>>> <meta http-equiv='Content-Type' content='text/html; charset=Big5-
>>>> HKSCS'>

>>> That line does not affect javascript's internal code point table (like
>>> eg. \xff).

>> It could affect it if there was no corresponding HTTP header present
>> that says otherwise.

>
> Untrue. The display of \x.. (and \u....) can never be influenced by any
> HTTP-header.


\x definitely can. Obviously, \u cannot.

> The notation is ASCII-safe,


\x cannot be ASCII-safe, as it allows characters to be represented that
are outside the range of the ASCII character set.

>> There is no "javascript", BTW.

>
> Is that so.


Yes, there are different ECMAScript implementations (some of which don't
even deserve that designation), and versions thereof.

>>> It defines which character set must be used on the web page.

>> Unless a corresponding HTTP header is present that says otherwise.

>
> That is far from sure, and could easily vary from browser to browser.


It has been observed that user agents honor the Specification in that
regard. This was the reason why AddDefaultCharset was disabled in newer
Apache versions.

> Anyway - it would be unwise to specify a charset on the web page that
> contradicts the HTTP header (coder's fault, not browser's fault).


Nowadays, no argument there.


PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm>
 
 
Bart Van der Donck
Guest
Posts: n/a
 
      07-02-2008
Thomas 'PointedEars' Lahn wrote:

> Bart Van der Donck wrote:
>
>> Thomas 'PointedEars' Lahn wrote:
>>> Bart Van der Donck wrote:
>>>> Character encoding in message boxes or web pages are two totally
>>>> different things.
>>> Not true.

>
>> It is true, because the character encoding is done at a different level.
>> Message boxes -like in this example- are actually much easier. There can
>> only be one possible representation.

>
> You are mistaken. It depends on the user agent which characters are
> supported in a message box. However, it has been observed that message
> boxes use the character set of their document, regardless of the encoding
> that the ECMAScript implementation supports. We have discussed this here
> before.


That is not the point here. It is clear that the original poster was
talking about alert('\xff') versus the encoding of y-umlaut in an HTML-
document. In that regard the representation of \xff has nothing to do
with the representation of y-umlaut outside javascript.

[...]
>> But I was only saying that alert('\xff') always shows y-umlaut in any
>> browser.

>
> But you are dead wrong.


Well, let's see then. Could you show a case where alert('\xff') does
not show y-umlaut ?

>> y-umlaut is the character that is tied to code point 255 in any
>> ECMAScript implementation.

>
> However, there are implementations that do not support Unicode.


Irrelevant. y-umlaut does not need Unicode at all.

>> The display of \x.. (and \u....) can never be influenced by any
>> HTTP-header.

>
> \x definitely can. Obviously, \u cannot.


Let's see. Could you show an example where \x.. is displayed
differently depending on a varying HTTP-header ?

>> The notation is ASCII-safe,

>
> \x cannot be ASCII-safe, as it allows characters to be represented that
> are outside the range of the ASCII character set.


That's why I said the *notation* is ASCII-safe. What is *represented*
by that notation, is a different job; that is decided by the
javascript engine.
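
The distinction between the notation and what it represents can be sketched like this (a minimal illustration with made-up variable names; `new Function` is used here only to evaluate the literal at run time):

```javascript
// The source text of the literal: six ASCII characters ' \ x f f '
var source = "'\\xff'";

// Check that every character of the notation is within 0..127.
var asciiOnly = true;
for (var i = 0; i < source.length; i++) {
  if (source.charCodeAt(i) > 127) asciiOnly = false;
}

// Let the script engine interpret the notation.
var value = new Function('return ' + source + ';')();
var valueCode = value.charCodeAt(0);

console.log(source.length, asciiOnly, valueCode);
```

The notation itself stays within the ASCII range, while the value the engine produces from it has a code unit above 127.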

>>> There is no "javascript", BTW.

>> Is that so.

>
> Yes, there are different ECMAScript implementations (some of which don't
> even deserve that designation), and versions thereof.


That's like saying that cars don't exist, but only implementations of
fuel engines.

--
Bart
 
 
Thomas 'PointedEars' Lahn
Guest
Posts: n/a
 
      07-02-2008
Bart Van der Donck wrote:
> Thomas 'PointedEars' Lahn wrote:
>> Bart Van der Donck wrote:
>>> Thomas 'PointedEars' Lahn wrote:
>>>> Bart Van der Donck wrote:
>>>>> Character encoding in message boxes or web pages are two totally
>>>>> different things.
>>>> Not true.
>>> It is true, because the character encoding is done at a different level.
>>> Message boxes -like in this example- are actually much easier. There can
>>> only be one possible representation.

>> You are mistaken. It depends on the user agent which characters are
>> supported in a message box. However, it has been observed that message
>> boxes use the character set of their document, regardless of the encoding
>> that the ECMAScript implementation supports. We have discussed this here
>> before.

>
> That is not the point here. It is clear that the original poster was
> talking about alert('\xff') versus the encoding of y-umlaut in an HTML-
> document. In that regard the representation of \xff has nothing to do
> with the representation of y-umlaut outside javascript.


Yes, it has.

> [...]
>>> But I was only saying that alert('\xff') always shows y-umlaut in any
>>> browser.

>> But you are dead wrong.

>
> Well, let's see then. Could you show a case where alert('\xff') does
> not show y-umlaut ?


Wasting my time supporting your logical fallacy? I don't think so.

Ask someone living in Bosnia, Croatia, Czech Republic, Hungary, Poland,
Romania, Serbia, Slovakia, Slovenia, Malta, Estonia, Latvia, Lithuania,
Greenland, Bulgaria, Belarus, Russia, Macedonia, Greece, Israel, or any
other country where the character set designed for their main language does
not have "y-umlaut", as you put it (you really don't know what an umlaut
is), at decimal code point 255 (*except* with Unicode support), instead.

>>> y-umlaut is the character that is tied to code point 255 in any
>>> ECMAScript implementation.

>> However, there are implementations that do not support Unicode.

>
> Irrelevant.


Not at all.

> y-umlaut does not need Unicode at all.


True, it is also contained in ISO-8859-1. However, as ASCII does not
provide this character, if the \x string escape sequence is used and Unicode
support is not present, the locale encoding (or the encoding of the
document/file) must be used to determine which character to display for
decimal code points beyond 127. (If Unicode is not supported, "\uhhhh" is
interpreted as "uhhhh" rather than a single character.)

>>> The notation is ASCII-safe,

>> \x cannot be ASCII-safe, as it allows characters to be represented that
>> are outside the range of the ASCII character set.

>
> That's why I said the *notation* is ASCII-safe.


It would seem whether that is true depends on how one defines "ASCII-safe".

> What is *represented* by that notation, is a different job; that is
> decided by the javascript engine.


See?

>>>> There is no "javascript", BTW.
>>> Is that so.

>> Yes, there are different ECMAScript implementations (some of which don't
>> even deserve that designation), and versions thereof.

>
> That's like saying that cars don't exist, but only implementations of
> fuel engines.


As a matter of fact, there are JavaScript and JScript versions that are not
fully ECMAScript-compliant, and therefore do not provide Unicode support.


PointedEars
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f806at$ail$1$>
 
 
Bart Van der Donck
Guest
Posts: n/a
 
      07-02-2008
Thomas 'PointedEars' Lahn wrote:

> Bart Van der Donck wrote:


>> Could you show a case where alert('\xff') does
>> not show y-umlaut ?

>
> Wasting my time supporting your logical fallacy? I don't think so.
>
> Ask someone living in Bosnia, Croatia, Czech Republic, Hungary, Poland,
> Romania, Serbia, Slovakia, Slovenia, Malta, Estonia, Latvia, Lithuania,
> Greenland, Bulgaria, Belarus, Russia, Macedonia, Greece, Israel, or any
> other country where the character set designed for their main language does
> not have "y-umlaut", as you put it (you really don't know what an umlaut
> is), at decimal code point 255 (*except* with Unicode support), instead.


You are simply wrong; all of those will display y-umlaut with
alert('\xff'). You keep talking about Unicode but it has nothing to do
with it. As I said, just give me one example, and I'll be immediately
convinced of your point. But there is no such example.

>>>> y-umlaut is the character that is tied to code point 255 in any
>>>> ECMAScript implementation.
>>> However, there are implementations that do not support Unicode.

>
>> Irrelevant.

>
> Not at all.
>
>> y-umlaut does not need Unicode at all.

>
> True, it is also contained in ISO-8859-1. However, as ASCII does not
> provide this character, if the \x string escape sequence is used and Unicode
> support is not present, the locale encoding (or the encoding of the
> document/file) must be used to determine which character to display for
> decimal code points beyond 127.


You just wrote the core of your misconception. In the (nowadays highly
unlikely) case that Unicode support would not be present in the
browser's script engine, the locale is NOT used as lookup-table for
\x. It's always the internal lookup table of the script engine. It has
nothing to do with the document or its encoding !

[...]
>> That's why I said the *notation* is ASCII-safe.

> It would seem whether that is true depends on how one defines "ASCII-safe".


You have the nasty habit of giving a silly twist to a position that you
can no longer hold. ASCII-safe is code point 0 to 127, as you
perfectly know. There is no room for other interpretations.

>> What is *represented* by that notation, is a different job; that is
>> decided by the javascript engine.

>
> See?


See what then ?

>>>>> There is no "javascript", BTW.
>>>> Is that so.
>>> Yes, there are different ECMAScript implementations (some of which don't
>>> even deserve that designation), and versions thereof.

>> That's like saying that cars don't exist, but only implementations of
>> fuel engines.

> As a matter of fact, there are JavaScript and JScript versions that are not
> fully ECMAScript-compliant, and therefore do not provide Unicode support.


I'm not going to reply to your arguments like "there is no
javascript", "you don't know what an umlaut is", "web pages don't
exist", etc. I made my point clear enough. You already conveniently
snipped my question "Could you show an example where \x.. is displayed
differently depending on a varying HTTP-header", which addressed one of
your basic points.

--
Bart
 