RE: [OT] sentances with two meanings

 
 
Alan Kennedy
      07-18-2003
"Martin v. Loewis" wrote:

> So what do you think about this message?:
>
> γίγνωσκω
>
> Look Ma, no markup. And not every character uses two bytes, either.
> And I can use Umlauts (äöü) and Arabic (ءﺎﻣ.ﺔﻛﺮﺷ)
> if I want to.


Martin,

I can see from other people's messages that this has been successful
for some people with modern software.

However, it failed for me on my old Netscape 4.x Messenger. Which is
acceptable, I suppose, because I intentionally use ancient email and
usenet software. It is also worth noting that although my poor old
usenet client failed to display the sequence of characters, the
"Navigator" component to which it belongs correctly displayed the
greek text when fed Bengt's "gignooskoo.html" file (although it failed
on my xml snippet).

More worrying however is the failure of modern browsers to display the
characters when accessed through Google Groups.

http://groups.google.com/groups?hl=e...%40v.loewis.de

I tried to view this in IE6.0 and Netscape 6.2, and all I saw was
"?????s??".

Whereas that thread still shows my XML snippet intact, still
copy&paste-able.

kind regards,

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
 
Martin v. Löwis
      07-18-2003
Erik Max Francis writes:

> Yeah, but that only works if everyone's expecting the same encoding. I
> just see garbage non-ASCII characters, for instance, with my lowly
> Netscape 4 newsreader.


No, that is not a prerequisite. Instead, the prerequisite is that the
news reader/MUA knows what MIME (Multipurpose Internet Mail
Extensions) is. In the MIME header, I clearly identified the encoding
of this message as UTF-8, so any news reader *should* be capable of
converting this to the local encoding (perhaps using replacement
characters where glyphs are missing).
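
A minimal sketch of that idea in present-day Python, assuming a hypothetical raw article whose MIME header declares UTF-8 and a reader whose local encoding is ISO-8859-1. The message bytes and the choice of local charset are illustrative assumptions, not taken from the actual post.

```python
from email import message_from_bytes

# Hypothetical raw article: the MIME header declares UTF-8 and the
# body is sent as 8-bit UTF-8 bytes.
raw = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "γίγνωσκω (äöü)".encode("utf-8")
)

msg = message_from_bytes(raw)
charset = msg.get_content_charset()      # read from the Content-Type header
text = msg.get_payload(decode=True).decode(charset)

# Convert to the assumed local encoding, substituting '?' for every
# character that encoding cannot represent.
local = text.encode("iso-8859-1", errors="replace")
print(charset)
print(local)
```

The reader never has to guess the encoding: it is taken from the header, and only the final conversion to the local charset is lossy.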

> Looked like it probably was here, I saw what looked very strongly
> like eight double-byte characters (and two bytes each).


But then, you were able to read the English parts of my message just
fine, right? The ASCII letters in this message did not take up two
bytes per letter, as they would have if the message was encoded in
UTF-16.
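
The byte counts can be checked with a short sketch. The sample strings are chosen for illustration only, and "utf-16-le" is used so that no byte-order mark is counted.

```python
# Compare how many bytes each sample needs in UTF-8 and in UTF-16.
samples = {"ASCII": "Look Ma", "Umlauts": "äöü", "Greek": "γίγνωσκω"}

sizes = {}
for label, text in samples.items():
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")   # little-endian, no byte-order mark
    sizes[label] = (len(text), len(utf8), len(utf16))
    print(label, "chars:", len(text), "utf-8:", len(utf8), "utf-16:", len(utf16))
```

Only the ASCII sample differs between the two encodings: one byte per letter in UTF-8 against two in UTF-16.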

Regards,
Martin

 
Martin v. Löwis
      07-18-2003
Alan Kennedy writes:

> More worrying however is the failure of modern browsers to display the
> characters when accessed through Google Groups.


It's not the browsers that display it incorrectly; it is Google
rendering it incorrectly. Fortunately, they keep the original data at

http://groups.google.com/groups?selm...&output=gplain

Regards,
Martin

 
Brian McErlean
      07-18-2003
Alan Kennedy wrote:
> "Martin v. Loewis" wrote:
>
> > So what do you think about this message?:
> >
> > γίγν??κ?
> >
> > Look Ma, no markup. And not every character uses two bytes, either.
> > And I can use Umlauts (äöü) and Arabic (ء?ﻣ.??ﺮﺷ)
> > if I want to.

>
> Martin,
>
> I can see from other people's messages that this has been successful
> for some people with modern software.
>
> However, it failed for me on my old Netscape 4.x Messenger. Which is
> acceptable, I suppose, because I intentionally use ancient email and
> usenet software. It is also worth noting that although my poor old
> usenet client failed to display the sequence of characters, the
> "Navigator" component to which it belongs correctly displayed the
> greek text when fed Bengt's "gignooskoo.html" file (although it failed
> on my xml snippet).
>
> More worrying however is the failure of modern browsers to display the
> characters when accessed through Google Groups.
>
> >http://groups.google.com/groups?hl=e...%40v.loewis.de

>
> I tried to view this in IE6.0 and Netscape 6.2, and all I saw was
> "?????s??".
>
> Whereas that thread still shows my XML snippet intact, still
> copy&paste-able.
>
> kind regards,


I saw the same as you with that URL, but viewing the thread in Google,
or going to the link it gave for "View this article only":
http://groups.google.com/groups?dq=&...%40v.loewis.de

displayed OK (using Mozilla Firebird 0.6).

The key difference seems to be the "oe=UTF-8" argument in the URL.
Adding this to your URL displays it correctly.
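
For illustration, a small sketch of adding that argument to a URL programmatically. The helper name with_output_encoding and the example URL (including its selm value) are hypothetical; only the "oe=UTF-8" parameter itself comes from the observation above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_output_encoding(url, encoding="UTF-8"):
    """Return url with its 'oe' query argument set to encoding."""
    parts = urlsplit(url)
    # Keep every existing argument except a previous 'oe', then append ours.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "oe"]
    query.append(("oe", encoding))
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical article URL, used only to show the effect.
url = "http://groups.google.com/groups?hl=en&selm=EXAMPLE-ID"
fixed = with_output_encoding(url)
print(fixed)
```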
 
Cliff Wells
      07-18-2003
On Thu, 2003-07-17 at 15:33, Martin v. Loewis wrote:

> So what do you think about this message?:
>
> γίγνωσκω
>
> Look Ma, no markup. And not every character uses two bytes, either.
> And I can use Umlauts (äöü) and Arabic (ءﺎﻣ.ﺔﻛﺮﺷ) if I want to.
>
> I don't know for whom this renders well, but I guess MSIE5+, NS6+
> and Mozilla 1+ are good candidates - without the need for saving
> things into files.


Looks fine in Evolution 1.4.3. Well, that is, it looks like a bunch of
gibberish, which is what I expected to see <wink>

--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 (800) 735-0555


 
Robin Munn
      07-18-2003
(Restoring the [OT] marker in the subject because this is definitely
off-topic for a Python newsgroup. Well, I guess it might have a bearing
on Python's use of Unicode. Somehow. Remotely.)

Martin v. Loewis wrote:
> Alan Kennedy wrote:
>> For anybody who has MS Internet Explorer 5+, Netscape 6+, Mozilla 1+,
>> i.e. any browser that supports XML, simply save this to a disk file
>> and open it in your chosen browser.

> [...]
>
>> So, the challenge to the ASCII proponents is: put the greek word
>> "gignooskoo" on everybody's screen, originating from a usenet message,
>> in the original greek, where "oo" -> greek letter omega.

> [...]
>> I expect you won't find it as simple as the XML above, although I'm
>> also completely prepared to be proven wrong (Alan tries to cover his
>> a** in advance).

>
> So what do you think about this message?:
>
> ????????


I see eight identical question marks.

>
> Look Ma, no markup. And not every character uses two bytes, either.
> And I can use Umlauts (äöü) and Arabic (???.????) if I want to.


I see an a-umlaut, o-umlaut, u-umlaut. But for the Arabic, I see three
question marks, a period, and four question marks.

Running slrn version 0.9.7.4 on Linux. My terminal is a PuTTY SSH
connection from a Windows box. slrn --version produces:

Slrn 0.9.7.4 [2002-03-13]
S-Lang Library Version: 1.4.7
Compiled at: Jan 12 2003 08:31:04
Operating System: Linux

COMPILE TIME OPTIONS:
Backends: +nntp +slrnpull +spool
External programs / libs: -inews -ssl -uudeview
Features: +charset_mapping +decoding +emphasized_text +end_of_thread
+fake_refs +gen_msgid -grouplens +mime -msgid_cache +piping +rnlock
+slang +spoilers -strict_from +verbatim_marks
DEFAULTS:
Default server object: nntp
Default posting mechanism: nntp
Default character set: isolatin
SUPPORTED CHARACTER SETS:
isolatin ibm850 ibm852 ibm737 NeXT koi8



Looking at a hex dump of this post, I see that your Unicode characters
have become ASCII question marks in my reply. Bad slrn! No biscuit!
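
The effect is easy to reproduce in a sketch, assuming the client re-encodes text into a Latin-1 character set (the "isolatin" default listed above) and substitutes a question mark for anything it cannot represent. Whether slrn does exactly this internally is an assumption.

```python
original = "γίγνωσκω / äöü"

# Encode for a Latin-1 terminal; unrepresentable characters become '?'.
as_latin1 = original.encode("iso-8859-1", errors="replace")

print(as_latin1.hex(" "))               # 0x3f is the ASCII question mark
round_trip = as_latin1.decode("iso-8859-1")
print(round_trip)
```

The umlauts survive because Latin-1 contains them. The Greek letters are gone for good: after the substitution, every one of them is the same byte.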

--
Robin Munn <(E-Mail Removed)> | http://www.rmunn.com/ | PGP key 0x6AFB6838
-----------------------------+-----------------------+----------------------
"Remember, when it comes to commercial TV, the program is not the product.
YOU are the product, and the advertiser is the customer." - Mark W. Schumann
 
Bengt Richter
      07-18-2003
On 18 Jul 2003 09:30:32 +0200, Hallvard B Furuseth <h.b.furuseth(nospam)@usit.uio(nospam).no> wrote:

>Oren Tirosh wrote:
>>On Fri, Jul 18, 2003 at 03:11:56AM +0000, Bengt Richter wrote:

>
>>> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-7">

>
>Needs to be charset=utf-8. iso-8859-7 has no character number 947.

You're right. I think the iso-8859-7 just served as a font hint, in effect.
You can't leave out the <META ... line on my browser (NS4.5, English font defaults),
but I would guess one could with Greek defaults. I would also guess everything is
converted to Windows wchars internally either way, according to some best-guess rules
if things aren't consistent.

If we add a space and the character Ж after the Greek, the difference
will show up: with utf-8 you get the Cyrillic, and with iso-8859-7 you get a question mark
(or at least you do with NS4.5). Just tried IE5 -- it seems to fake it either way, but screws up
the presentation with a change in font weight after two characters, and then spaces between chars.
Don't know what that's about. I don't use IE5, (comma required) normally.
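
A rough sketch of why the Cyrillic letter is a useful probe: it exists in UTF-8 but has no slot at all in iso-8859-7. This shows only the encodings themselves; what any particular browser does with a numeric character reference is a separate matter.

```python
text = "γίγνωσκω Ж"

as_utf8 = text.encode("utf-8")
as_greek = text.encode("iso-8859-7", errors="replace")   # Ж has no slot here

print(as_utf8)
print(as_greek)

# Without an error handler, the Greek charset refuses the Cyrillic letter.
try:
    text.encode("iso-8859-7")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print("strict iso-8859-7 encode succeeded:", strict_ok)
```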
>
>>> <h1>γ(...)

>
>> Actually, you don't need the "CHARSET=iso-8859-7". It would be
>> required if you used the bytes 227, 223, 227, 237, 249, 243, 234, 249
>> to represent the characters. With numeric character references you can
>> embed any character from the UCS repertoire regardless of the charset
>> used.

>
>&#<num>; seems to mean character number NUM in the current character
>set, not in UCS. At least on NS 4.79.

That seems to be confirmed by the Cyrillic experiment above, now at least
for NS4.5 and NS4.79.
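
For comparison, a sketch of the two representations discussed above: the eight iso-8859-7 bytes, and numeric character references built from the UCS code points. It uses Python's html.unescape, which resolves references against UCS as Oren describes. That is an assumption about spec-conforming behaviour and says nothing about what NS4.x actually does.

```python
import html

# The byte values quoted above, interpreted as iso-8859-7.
raw_bytes = bytes([227, 223, 227, 237, 249, 243, 234, 249])
word = raw_bytes.decode("iso-8859-7")

# The same word as UCS code points and as numeric character references.
code_points = [ord(c) for c in word]
ncr = "".join(f"&#{cp};" for cp in code_points)

print(word)
print(code_points)    # every value is above 255, so none fits in one byte
print(ncr)
print(html.unescape(ncr) == word)
```

The code points start at 947 while the byte values stay below 256, which is the sense in which iso-8859-7 "has no character number 947".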

Regards,
Bengt Richter
 
Alan Kennedy
      07-18-2003
Alan Kennedy:

>> More worrying however is the failure of modern browsers to display
>> the characters when accessed through Google Groups.


Martin v. Löwis:

> It's not the browsers that display it incorrectly; it is Google
> rendering it incorrectly. Fortunately, they keep the original data
> at
>

http://groups.google.com/groups?selm...&output=gplain

Thanks Martin, a virtuoso demonstration.

It is also worth noting that your message and messages quoting it are
the only hits that turn up in a Google Groups search using the
original greek text as a search term: i.e. I go to Google Groups and
paste in the greek letters. This is true of both "global" Google
Groups and the Greek version as well:

groups.google.com: http://tinyurl.com/hd58
groups.google.com.gr: http://tinyurl.com/hd5l

Bravo! (These kudos exchangeable for food+beers should you ever decide
to visit Dublin.)

To everyone else: Why does this stuff get so complicated? Why does it
take a multi-lingual + encoding-guru + protocol-guru + markup-guru +
python-bot like Martin von L to get stuff like this done? Does it have
to require somebot who writes better quality software (i.e. less
defective) than the world's leading search engine, Google, who got it
slightly wrong?

The idea of raising this came to me when that Russian individual
posted a message a few days ago that got very garbled in the
transmission, both subject and content. Again, it was only Martin who
was able to figure out its content: I, being an ordinary mortal, was
left saying "Qué?"

Computers should be about making it easier for people to communicate
with each other. And yes I fully realise python's excellence in that
regard, thanks in large part to Martin.

To me, the "structure data using ASCII" argument seems very similar to
the human language position: "English is now universal, therefore all
people must learn and speak it if they want to communicate." What if I
want to have an irish gaelic word in the subject line of my emails or
usenet posts?

slán libh,

--
alin cinnide
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
 
Alan Kennedy
      07-18-2003
Brian McErlean wrote:
> I saw the same as you with that URL, but viewing the thread in
> google, or going to the link it gave for "View this article only":


> http://groups.google.com/groups?dq=&...%40v.loewis.de
>
> displayed OK (Using Mozilla firebird 0.6)
>
> The key difference seems to be the "oe=UTF-8" argument in the URL.
> Adding this to your URL displays it correctly.


Yep, we're definitely not in ASCII-land anymore.

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
 
Alan Kennedy
      07-20-2003
Alan Kennedy wrote:

>> One person, Bengt, said that he couldn't see it


Ben Finney wrote:

> This is identical to the justification of "$BIGNUM percent of our
> target users use browser $BROWSER, so we can ignore the rest and
> use methods only viewable by browser $BROWSER."


Hmm, I fail to see the connection here. Fair enough, I made a mistake
in structuring my original xml snippet. I didn't attempt to address
the fact there are still some browsers out there that don't do XML.
Bengt corrected that mistake by providing an HTML snippet that works in
non-XML browsers as well, i.e. a superset of the set I covered. Given
the current market breakdown for browsers, I guesstimate that Bengt's
snippet worked for > 99.9% of recipients.

> Which quickly leads to "You must use $BROWSER to view this site".
> No thanks.


No, that's the precise opposite of the point I was making. My position
is "You must use markup-capable software to perceive what I've
written. Your choice of software is entirely up to you: the only
requirement is the ability to process (x|ht)ml". I try to avoid
platform/language/os/browser dependent anything: that was the whole
point of the post.

> Provide a method that degrades gracefully to ASCII, the current
> standard; then I'll be interested.


#------------------------------------------------------------
snippet = """<?xml version="1.0" encoding="utf-8"?>
<verb>&#x3b3;&#x3af;&#x3b3;&#x3bd;&#x3c9;&#x3c3;&#x3ba;&#x3c9;</verb>
"""

def is7bitclean(s):
    # Return 1 only if every character is in the 7-bit ASCII range.
    for c in s:
        if ord(c) > 127:
            return 0
    return 1

if is7bitclean(snippet):
    print("Yep, it's clean.")
else:
    print("Thou hast broken the rules.")
#------------------------------------------------------------

Is that what you meant by "graceful degradation to ASCII"?

The 7-bit cleanness of my original snippet was the reason why it
arrived safely in everyone's "inbox".

Bengt's even-further-travelling HTML snippet is also 7-bit clean.

And if the message structures used in the protocol transporting the
messages were encoded in XML, you wouldn't even have seen any encoding
declarations or pointy brackets, or had to copy&paste.
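
A sketch of the receiving side, assuming a present-day Python with the standard xml.etree.ElementTree parser. It uses a copy of the snippet's markup, written here without stray whitespace inside the character references, and shows the 7-bit text turning back into the Greek word and back again.

```python
import xml.etree.ElementTree as ET

# The markup as bytes: nothing in it is above 127.
snippet = b"""<?xml version="1.0" encoding="utf-8"?>
<verb>&#x3b3;&#x3af;&#x3b3;&#x3bd;&#x3c9;&#x3c3;&#x3ba;&#x3c9;</verb>
"""
seven_bit = all(byte < 128 for byte in snippet)

# The XML parser expands the character references into real characters.
verb = ET.fromstring(snippet)
print(verb.text)

# Going the other way: squeeze the word back into 7-bit-clean text.
encoded = verb.text.encode("ascii", errors="xmlcharrefreplace")
print(encoded)
```

The "xmlcharrefreplace" handler emits decimal references rather than the hexadecimal ones in the snippet; both name the same code points.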

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
 