Velocity Reviews - Computer Hardware Reviews



Saving the web, charset problems and symbols problems

 
 
Sak Na rede
01-30-2009

Hi all!

I think a lot of Ruby scripts are written for web crawling, web scraping
and many other web applications. I'm working with the web too: I'm
trying to save the text of many different websites. At the moment I'm
trying to solve two problems:

1 - How to standardize the charset of web pages. There are a lot of
different charsets, and I think there must be a way to detect each
page's charset and convert it to the proper charset every time.
(By the way, what is the best method to detect the charset of a file?
The "file" command is not very good, I think.)

2 - How to convert HTML to plain text. I use Hpricot, but a lot of very
rare symbols remain, like "€" or "”". What is the most commonly used
method?
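For point 1, one common approach (a sketch under assumptions, not the only way) is to trust the charset declared in the HTTP Content-Type header first, then a <meta> charset declaration near the top of the page, and only then fall back to a guess, converting everything to UTF-8 with Ruby's String#encode (Ruby 1.9 and later). The method names detect_charset and to_utf8, the 1024-byte window and the Windows-1252 fallback are hypothetical choices for illustration; pages that declare nothing, or declare the wrong charset, would need a statistical detector instead.

```ruby
# Sketch only. Assumed priority: HTTP Content-Type header, then a
# <meta> charset in the first 1024 bytes, then a guess: UTF-8 if the
# bytes are valid UTF-8, otherwise Windows-1252 (hypothetical fallback).
def detect_charset(body, content_type = nil)
  if content_type && (m = content_type.match(/charset=["']?([\w.:-]+)/i))
    return m[1]
  end

  # Look at raw bytes so an undeclared encoding cannot break the regex.
  head = body.byteslice(0, 1024).force_encoding(Encoding::BINARY)
  if (m = head.match(/<meta[^>]+charset=["']?([\w.:-]+)/i))
    return m[1]
  end

  body.dup.force_encoding(Encoding::UTF_8).valid_encoding? ? "UTF-8" : "Windows-1252"
end

# Reinterpret the raw bytes in the detected charset and transcode to
# UTF-8, replacing bytes that are invalid or have no UTF-8 mapping.
def to_utf8(body, content_type = nil)
  name = detect_charset(body, content_type)
  source = begin
    Encoding.find(name)
  rescue ArgumentError # a label Ruby does not recognise
    Encoding::Windows_1252
  end

  body.dup.force_encoding(source)
      .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
      .scrub("?") # repair invalid bytes even when the source was already UTF-8
end
```

The same call also answers the side question about files: passing the raw bytes of a saved page, e.g. to_utf8(File.binread("page.html")) (the file name is only an example), uses the <meta> declaration or the fallback guess. When fetching with Net::HTTP, the header value can be passed as the second argument via response["Content-Type"].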
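For point 2: if symbols like the ones above show up in the output as character references (&euro;, &rdquo;, &#8364;), they still need decoding; if they are real characters, they are valid text once the page is in UTF-8. The sketch below uses only the standard library; html_to_text, decode_entities and the deliberately small NAMED_ENTITIES table are hypothetical names, and it assumes UTF-8 input (for example the result of a charset conversion). Regex-based tag stripping breaks on malformed markup, so an HTML parser such as Hpricot is the sturdier way to collect the text nodes; the entity-decoding step applies either way.

```ruby
# Hypothetical, deliberately incomplete table of named entities;
# HTML defines many more than these. &nbsp; is mapped to a plain space.
NAMED_ENTITIES = {
  "amp" => "&", "lt" => "<", "gt" => ">", "quot" => "\"", "apos" => "'",
  "nbsp" => " ", "euro" => "\u20AC",
  "lsquo" => "\u2018", "rsquo" => "\u2019",
  "ldquo" => "\u201C", "rdquo" => "\u201D"
}.freeze

# Decode &name;, &#NNN; and &#xHHH; in a single pass, so "&amp;euro;"
# becomes "&euro;" instead of being decoded twice. Unknown names and
# invalid code points are left untouched.
def decode_entities(text)
  text.gsub(/&(?:#[xX](\h+)|#(\d+)|([A-Za-z][A-Za-z0-9]*));/) do |ref|
    if $3
      NAMED_ENTITIES.fetch($3, ref)
    else
      cp = $1 ? $1.to_i(16) : $2.to_i
      valid = cp <= 0x10FFFF && !(0xD800..0xDFFF).cover?(cp)
      valid ? [cp].pack("U") : ref
    end
  end
end

# Naive HTML-to-text: drop <script>/<style> blocks, turn <br> and common
# block-level closing tags into newlines, remove all other tags, then
# decode character references and tidy up whitespace.
def html_to_text(html)
  text = html.gsub(%r{<(script|style)\b.*?</\1\s*>}mi, "")
  text = text.gsub(%r{<br\s*/?>|</(?:p|div|li|tr|h[1-6])\s*>}i, "\n")
  text = text.gsub(/<[^>]*>/, "")
  decode_entities(text).gsub(/[ \t]+/, " ").gsub(/\s*\n\s*/, "\n").strip
end
```

Decoding in one gsub pass matters: running a second replacement over already-decoded text would wrongly turn "&amp;euro;" into "€".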

Thanks a lot
--
Posted via http://www.ruby-forum.com/.

 



