identify the language of a web page

 
 
usgog@yahoo.com
      04-11-2008
Suppose I need to classify 10000 web pages based on their languages.
What should I look for to determine the language of each web page? Any
advice is welcome.
 
Andreas Prilop
      04-11-2008
On Thu, 10 Apr 2008, wrote:

> Suppose I need to classify 10000 web pages based on their languages.
> What should I look for to determine the language of each web page?


Check the "lang" attribute in HTML and the "xml:lang" attribute in XHTML.
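
As a rough illustration of that suggestion, here is a minimal Python sketch that reads the declared language from the root element of a page. It uses only the standard-library `html.parser` module; the names `LangAttributeParser` and `declared_language` are made up for this example, and it assumes the declaration sits on the `<html>` element, which many pages omit or get wrong.

```python
from html.parser import HTMLParser


class LangAttributeParser(HTMLParser):
    """Records the language declared on the root <html> element, if any."""

    def __init__(self):
        super().__init__()
        self.declared_lang = None
        self._seen_root = False

    def handle_starttag(self, tag, attrs):
        # Only the first <html> start tag is inspected (an assumption of this
        # sketch); lang attributes on nested elements are ignored.
        if tag != "html" or self._seen_root:
            return
        self._seen_root = True
        attr_map = dict(attrs)
        # Prefer xml:lang (XHTML), then fall back to lang (HTML).
        raw = attr_map.get("xml:lang") or attr_map.get("lang") or ""
        self.declared_lang = raw.strip().lower() or None


def declared_language(markup):
    """Return the declared language tag in lowercase, or None if absent."""
    parser = LangAttributeParser()
    parser.feed(markup)
    return parser.declared_lang
```

A page with no declaration yields `None`, so for a batch of 10000 pages the declared value is best treated as a hint rather than a guarantee.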
 
Richard Tobin
      04-11-2008
In article <26414fbe-a0ef-48c9-af07->,
<> wrote:

>Suppose I need to classify 10000 web pages based on their languages.
>What should I look for to determine the language of each web page? Any
>advice is welcome.


Assuming you want to do this by inspecting the text itself (rather than
looking for xml:lang and the like), search Google for "language
identification". The first page of results lists several tools and a
research bibliography on the subject.
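
To give a feel for what identification by inspection can look like, here is a toy Python sketch that counts common function words per language. It is not one of the tools referred to above; the `STOPWORDS` table and the `guess_language` function are hypothetical, the word lists are tiny and hand-picked, and real tools usually rely on much larger statistical models.

```python
import re

# Tiny, hand-picked stopword lists; purely illustrative and far from complete.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it", "for", "with"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu", "mit", "von"},
    "fr": {"le", "la", "les", "et", "est", "une", "des", "pour", "dans", "que"},
    "es": {"el", "los", "las", "y", "es", "una", "por", "para", "con", "que"},
}


def guess_language(text):
    """Return the language code whose stopwords occur most often, or None."""
    # Split into runs of letters (Unicode-aware), ignoring digits and punctuation.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for word in words if word in stops)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # No stopword matched at all: report "unknown" instead of guessing.
    return best if scores[best] > 0 else None
```

The input is expected to be plain text, so markup would have to be stripped from each page first. Short or mixed-language pages will often be misclassified by a heuristic this simple.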

-- Richard
--
:wq
 
