wrote:
> Suppose I need to classify 10000 web pages based on their languages.
> What should I look for to determine the language of each web page? Any
> advice is welcome.
You could search the content for a word that is common in a given
language but is not part of HTML or script syntax: " the " (including
the spaces) would, I guess, be a reasonable choice to identify English,
although there's no guarantee that some bright spark hasn't named a
script variable 'the', or used the word in a comment.
Are you sure this is a JavaScript question?
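
If it does have to be done in script, the idea above might look roughly
like the sketch below. The function names (stripMarkup, countWord,
looksEnglish) and the default threshold of 3 hits are my own assumptions
for illustration, not anything standard. Stripping script blocks, style
blocks and comments first is meant to sidestep the "variable named
'the'" problem, but regular expressions are only a crude way to do that
and will misfire on unusual markup.

```javascript
// Rough sketch only: all names and the default threshold are hypothetical.

// Remove script/style blocks, HTML comments and tags, then collapse
// whitespace, so that only the visible text (approximately) remains.
function stripMarkup(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<!--[\s\S]*?-->/g, " ")
    .replace(/<[^>]*>/g, " ")
    .replace(/\s+/g, " ");
}

// Count occurrences of a word surrounded by spaces, ignoring case.
function countWord(text, word) {
  var needle = " " + word + " ";
  // Pad with spaces so a word at the very start or end still matches.
  var haystack = " " + text.toLowerCase() + " ";
  var count = 0;
  var pos = haystack.indexOf(needle);
  while (pos !== -1) {
    count++;
    // Resume at the trailing space so adjacent hits can share it.
    pos = haystack.indexOf(needle, pos + needle.length - 1);
  }
  return count;
}

// Guess "English" if " the " appears at least `threshold` times
// (assumed default: 3).
function looksEnglish(html, threshold) {
  var minHits = (threshold === undefined) ? 3 : threshold;
  return countWord(stripMarkup(html), "the") >= minHits;
}
```

Something like looksEnglish(pageSource) would then give a yes/no guess
per page. Other languages would each need their own marker word, and
very short pages may not reach the threshold at all.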