Velocity Reviews - Computer Hardware Reviews

Google cached version mangled

 
 
N Cook
 
      05-15-2005
I've added a small piece of foreign script to a file, and now the Google
cached version is wholly mangled.
The cached version starts with a y with two dots over it, followed by a
sort of p, and everything after that is the HTML source, brackets and all,
minus the spaces.
I tried adding <html lang="en"> at the beginning of the file, but the
Google cached version did not change.


 
Toby Inkster
 
      05-15-2005
N Cook wrote:

> ÿþ


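Check your HTTP headers. Those two characters are a UTF-16 byte order mark
(the bytes FF FE) displayed as ISO-8859-1, which commonly happens when a
UTF-16 file is served without a matching charset.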
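
To illustrate where "ÿþ" comes from, here is a minimal Python sketch. It assumes the file was saved as little-endian UTF-16 with a byte order mark and was then decoded as ISO-8859-1 by whatever displayed it. The sample markup is invented for the illustration and is not the actual page.

```python
import codecs

# Hypothetical sample markup; the real page is not reproduced here.
sample = '<html lang="en">'

# Bytes as an editor saving UTF-16 little-endian with a BOM would write them.
raw = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")

# A consumer that assumes a single-byte encoding turns the BOM into two letters
# and leaves a NUL character after every ASCII character.
misread = raw.decode("iso-8859-1")
bom_as_text = misread[:2]

# Decoding with the right encoding recovers the text; "utf-16" consumes the BOM.
recovered = raw.decode("utf-16")

print(repr(bom_as_text))
print(recovered == sample)
```

The same bytes read correctly once the consumer is told (or correctly guesses) that they are UTF-16.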

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

 
David Dorward
 
      05-15-2005
N Cook wrote:

> I've added a small bit of foreign script to a file


It would help if you showed a URL.

> I tried adding html lang ="en" in <> at the beginning of the file but no
> change on the Google cached version.


The lang attribute tells the user agent what language the document is
written in. This is useful for things such as telling an aural browser
which pronunciation guide to use, or for search engines to filter out
documents if the user specified "Only in language X".

It doesn't tell the user agent anything about how characters are represented
in the text file. For that you need to configure your webserver to inform
the user agent what the character encoding of the file is.

http://www.cs.tut.fi/~jkorpela/chars/
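
As a rough illustration of the distinction, the character encoding travels in the charset parameter of the HTTP Content-Type header, not in the lang attribute. The sketch below uses Python's standard email.message module to pull that parameter out of a header value. The helper name charset_from_content_type and the sample header values are assumptions made for the example.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lower-cases the value and returns None when missing.
    return msg.get_content_charset()


# A header that declares the encoding, and one that leaves the client guessing.
declared = charset_from_content_type("text/html; charset=UTF-8")
undeclared = charset_from_content_type("text/html")

print(declared, undeclared)
```

When the parameter is missing, the user agent has to guess the encoding, which is where mangled output tends to come from.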

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
 
N Cook
 
      05-15-2005
"David Dorward" <(E-Mail Removed)> wrote in message
news:d67ic8$pdk$1$(E-Mail Removed)...
> It would help if you showed a URL.

The actual file is
http://www.divdev.fsnet.co.uk/dysch.htm
It was all fine until I added the Hebrew piece near the top, which links
(via a # anchor) to the full Hebrew summary text near the end of the file.
The Hebrew text displays correctly, right to left and so on; it is just
that the Google cached version does not seem to like it.

Do I need to add an ISO code number for English, not just the "en"
designation?
 
N Cook
 
      05-15-2005
"N Cook" <(E-Mail Removed)> wrote in message
news:d67nfm$h4c$(E-Mail Removed)...
> The actual file is
> http://www.divdev.fsnet.co.uk/dysch.htm

I have now changed the page at that URL to try it without any reference to
"he".
The original, the version from this weekend that is cached on Google, is now
parked and renamed as
http://www.divdev.fsnet.co.uk/dysch_old.htm


 
Toby Inkster
 
      05-16-2005
N Cook wrote:

> http://www.divdev.fsnet.co.uk/dysch.htm


As I said yesterday, this is a UTF-16 file. You ought to specify that it's
UTF-16 in the HTTP headers.

Better yet -- convert it to UTF-8 (which handles Hebrew characters just
fine!) and specify UTF-8 in the HTTP headers.
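
A minimal sketch of that conversion in Python, assuming the input bytes are UTF-16 with a byte order mark. The function name utf16_to_utf8 and the file names in the trailing comment are hypothetical.

```python
def utf16_to_utf8(data):
    """Re-encode UTF-16 bytes (with a byte order mark) as UTF-8 bytes."""
    text = data.decode("utf-16")   # the BOM tells the decoder the byte order
    return text.encode("utf-8")    # plain "utf-8" does not write a BOM


# Stand-in for the page: a short line of markup containing Hebrew letters.
sample = "<p>\u05e9\u05dc\u05d5\u05dd</p>"
as_utf16 = sample.encode("utf-16")  # includes a BOM
as_utf8 = utf16_to_utf8(as_utf16)

print(as_utf8.decode("utf-8") == sample)

# For a file on disk (hypothetical names), the same helper could be used as:
# from pathlib import Path
# Path("page_utf8.htm").write_bytes(utf16_to_utf8(Path("page.htm").read_bytes()))
```

The server still has to announce charset=utf-8 for the converted file; re-encoding alone does not change the HTTP headers.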

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

 
Luigi Donatello Asero
 
      05-16-2005

"Toby Inkster" <(E-Mail Removed)> wrote in message
news(E-Mail Removed) .uk...
> As I said yesterday, this is a UTF-16 file. You ought to specify that it's
> UTF-16 in the HTTP headers.
>
> Better yet -- convert it to UTF-8 (which handles Hebrew characters just
> fine!) and specify UTF-8 in the HTTP headers.

I am not sure whether this is the same subject you are talking about, but I
have noticed something unusual (for me) about the way the website
https://www.scaiecat-spa-gigi.com can be searched at www.google.se now.
When I searched for the term "Scaiecat Spa Gigi", I got some hits from this
website and then a link to other pages of the same website.
When I followed it, I found about 500 results.
Now I do not find this link any more, although it is clear that
more pages have been indexed.
For example:
http://www.google.se/search?hl=sv&q=...Spa+Gigi&meta=
http://www.google.se/search?hl=sv&q=...+Italien&meta=
http://www.google.se/search?hl=sv&q=fakta+Italien&meta=
http://www.google.it/search?q=traduz...&start=10&sa=N
http://www.google.it/search?hl=it&q=...+svedese&meta=

Please note that some of the cached links are https addresses and php
addresses, and others are html addresses.
In the image section you still find a lot of results using the term
"Scaiecat Spa Gigi":
http://images.google.se/images?q=Sca...Spa+Gigi&hl=sv
So now I am wondering what has happened.



--
Luigi (an Italian living in Sweden)
https://www.scaiecat-spa-gigi.com/it...ggio-2005.html





 
N Cook
 
      05-16-2005

"Toby Inkster" <(E-Mail Removed)> wrote in message
news(E-Mail Removed) .uk...
> Better yet -- convert it to UTF-8 (which handles Hebrew characters just
> fine!) and specify UTF-8 in the HTTP headers.

The Hebrew text, as perceived by Google, covers the 'letters'
& # 1488 ... & # 1514 (no spaces).
Is there a simple way of converting them to equivalents
that will not upset Google? I'm thinking of a cut and paste
into an online facility, like the online language translation services.
I couldn't find one using the keywords {convert "utf-16 to utf-8" online }
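
For reference, the numbers 1488 to 1514 are the decimal Unicode code points of the Hebrew letters alef to tav, and a reference of the form ampersand, hash, number, semicolon is plain ASCII, so it survives whatever encoding the file is saved in. A minimal Python sketch of converting between the two forms follows; the helper name to_numeric_references is an assumption made for the example.

```python
import html


def to_numeric_references(text):
    """Replace every non-ASCII character with a decimal reference like &#1488;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Alef and tav, the first and last letters of the range quoted above.
hebrew = "\u05d0 \u05ea"
encoded = to_numeric_references(hebrew)

# html.unescape() goes the other way, from references back to characters.
decoded = html.unescape(encoded)

print(encoded)
print(decoded == hebrew)
```

Markup written this way contains only ASCII bytes, so it reads the same whether it is treated as UTF-8, ISO-8859-1 or windows-1252.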






 
N Cook
 
      05-18-2005
"N Cook" <(E-Mail Removed)> wrote in message
news:d6a292$sir$(E-Mail Removed)...
> Is there a simple way of converting them to equivalents
> that will not upset Google. I'm thinking of a cut & paste
> into an online facility like online language translation.

For the archives, and for anyone else who is not so computer-wise:
it looks as though all that is required, when saving the file to
disk (in my case from Notepad), is to select UTF-8 as the encoding
option in "Save As", rather than Unicode, which I had used before.
I will try to FTP a revised UTF-8 version of the file this week.
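
One way to confirm which encoding a saved file actually uses is to look at its first bytes. The sketch below is a simple check, assuming that Notepad's "Unicode" option writes little-endian UTF-16 with a byte order mark and that its UTF-8 option may also write a UTF-8 byte order mark. The function name sniff_bom is hypothetical.

```python
import codecs


def sniff_bom(data):
    """Name the encoding suggested by a leading byte order mark, or None.

    Only UTF-8 and UTF-16 marks are checked; a UTF-32 little-endian file
    would be reported as UTF-16 here, which is acceptable for this sketch.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


# Simulated file contents for the two "Save As" choices, plus a file with no BOM.
saved_as_unicode = codecs.BOM_UTF16_LE + "<html>".encode("utf-16-le")
saved_as_utf8 = "<html>".encode("utf-8-sig")
no_bom = b"<html>"

print(sniff_bom(saved_as_unicode), sniff_bom(saved_as_utf8), sniff_bom(no_bom))
```

A result of None only means there is no byte order mark; the file could still be UTF-8 or a single-byte encoding.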


 
N Cook
 
      05-20-2005
"N Cook" <(E-Mail Removed)> wrote in message
news:d6f85h$cvg$(E-Mail Removed)...
> It looks as though all that is required is when it comes to saving file to
> disk , in my case from Notepad, to
> select coding option in "Save As" as UTF-8 rather than Unicode which I had
> done before.
> Will try ftp, UTF-8 version revised file this week

That didn't work.

This file, which is basically in English, contains some UTF-16 code for
Hebrew, Russian and Thai, and is cached on Google with no problem:
http://pclt.cis.yale.edu/pclt/encoding/
The cached copy is at
http://64.233.183.104/search?q=cache...e.edu/pclt/encoding/+%22iso-8859-8%22+hebrew+russian+thai+yale&hl=en&start=1&ie=UTF-8

That Hebrew text does not contain character numbers 1494, 1509 and 1510,
which are in 'my' Hebrew text.
I've tried a version without two of these, in case they are being
interpreted as control codes. I've also added a reference to
charset=windows-1252.
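
A quick way to check both ideas is sketched below in Python, using the standard unicodedata module. It looks up what characters 1494, 1509 and 1510 are, and tests whether a piece of text can be represented in windows-1252 at all. The helper name fits_windows_1252 is an assumption made for the example.

```python
import unicodedata

# Look up the name and general category of each character number in question.
# Category "Cc" would mean a control code; "Lo" means an ordinary letter.
info = {
    number: (unicodedata.name(chr(number)), unicodedata.category(chr(number)))
    for number in (1494, 1509, 1510)
}


def fits_windows_1252(text):
    """Return True if every character of text exists in windows-1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


for number, (name, category) in info.items():
    print(number, name, category, fits_windows_1252(chr(number)))
```

If the lookup reports letters rather than control codes, and the characters do not fit windows-1252, then removing the characters or declaring charset=windows-1252 would not be expected to address the encoding mismatch.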


 