Looking for Java web crawler api

 
 
pm
 
      07-12-2011
Hello, I am working on a project that requires me to do custom searches on
different websites. I am using Java, and while I could write this from
the ground up, I would rather use an existing API because of time
constraints. So far I have come across Apache's HttpClient.
I am wondering if there are any others that are effective or
give more options for web searching/scraping. I plan to create a
GUI-based application and need something quick and effective that is
not too complex.
I appreciate any feedback.
 
 
 
 
 
Bent C Dalager
 
      07-12-2011
I found JSoup (jsoup.org) to be a fine library for web scraping. It
lets you easily set cookies and headers, fetches the URL for you, and
converts the tangled mess of HTML you tend to receive into a
well-formed XML document model.
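
A minimal sketch of the steps described above (setting headers and cookies, fetching, parsing), assuming jsoup is on the classpath. The class name JsoupSketch, the user-agent string, the cookie name and value, and the example.com URLs are placeholders for illustration, not anything from this thread.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class JsoupSketch {

    // Fetch a page over HTTP; jsoup performs the request and parses the response.
    // The header, cookie, and user-agent values are placeholders.
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .userAgent("example-scraper/0.1")
                .header("Accept-Language", "en")
                .cookie("session", "placeholder")
                .timeout(10_000) // milliseconds
                .get();
    }

    // Collect the absolute target of every <a href=...> in the document.
    static List<String> links(Document doc) {
        List<String> out = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            out.add(a.absUrl("href")); // resolved against the document's base URI
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Offline demo: sloppy HTML (unclosed tags) is still parsed into a tree.
        String html = "<html><head><title>Demo</title><body>"
                + "<p>Unclosed paragraph<a href='/docs'>Docs</a>";
        Document doc = Jsoup.parse(html, "http://example.com/");
        System.out.println(doc.title());
        System.out.println(links(doc));

        // Live fetch only when a URL is passed on the command line.
        if (args.length > 0) {
            System.out.println(links(fetch(args[0])));
        }
    }
}
```

The offline part runs without network access; pass a URL as the first argument to try the fetch path.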

Cheers,
Bent D.
--
Bent Dalager - (E-Mail Removed) - http://www.pvv.org/~bcd
powered by emacs
 
 
 
 
 
Durango2011
 
      07-13-2011
On Tue, 12 Jul 2011 09:44:38 +0000, Bent C Dalager wrote:

> I found JSoup (jsoup.org) to be a fine library for web scraping. It lets
> you easily set cookies and headers, fetches the URL for you, and
> converts the tangled mess of HTML you tend to receive into a well-formed
> XML document model.
>
> Cheers,
> Bent D.


Thank you very much; that looks like what I am looking for.
 
 
Roedy Green
 
      07-14-2011
On 12 Jul 2011 07:14:45 GMT, pm <(E-Mail Removed)> wrote, quoted
or indirectly quoted someone who said:

> I am wondering if there are any others that are effective or
> give more options for web searching/scraping. I plan to create a
> GUI-based application and need something quick and effective that is
> not too complex.


If you want something very simple, see
http://mindprod.com/products1.html#HTTP

See also http://mindprod.com/jgloss/screenscraping.html
--
Roedy Green Canadian Mind Products
http://mindprod.com
One thing I love about having a website, is that when I complain about
something, I only have to do it once. It saves me endless hours of grumbling.
 
 
iadb
 
      07-18-2011
On Jul 12, 3:14 am, pm <(E-Mail Removed)> wrote:
> Hello, I am working on a project that requires me to do custom searches on
> different websites. I am using Java, and while I could write this from
> the ground up, I would rather use an existing API because of time
> constraints. So far I have come across Apache's HttpClient.
> I am wondering if there are any others that are effective or
> give more options for web searching/scraping. I plan to create a
> GUI-based application and need something quick and effective that is
> not too complex.
> I appreciate any feedback.


Look at the example linked below; it works fine with a little
customization.
http://java.sun.com/developer/techni...ty/WebCrawler/
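
The link above is truncated, so its code cannot be reproduced here. As a rough idea of what a small crawler of that kind involves, here is a JDK-only sketch (not the linked article's code): a breadth-first crawl with a visited set and a page limit. The class name MiniCrawler, the regex-based link extraction, and the in-memory "site" used in place of real HTTP fetches are all assumptions made for illustration; a real crawler would plug in an HTTP client as the fetcher and use a proper HTML parser.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiniCrawler {

    // Crude href matcher, kept simple for the sketch; it ignores fragments
    // and will miss unquoted attributes.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    // Resolve every href found in the page against the page's own URL.
    static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Malformed link: skip it rather than abort the crawl.
            }
        }
        return links;
    }

    // Breadth-first crawl from start. At most maxPages URLs are visited,
    // counting URLs whose fetch fails. The fetcher maps a URL to its HTML,
    // or returns null when the page cannot be retrieved.
    static Set<String> crawl(String start, int maxPages, Function<String, String> fetcher) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // already seen
            }
            String html = fetcher.apply(url);
            if (html == null) {
                continue;
            }
            for (String link : extractLinks(url, html)) {
                if (link.startsWith("http") && !visited.contains(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a website, so the sketch runs offline.
        Map<String, String> site = Map.of(
                "http://example.com/", "<a href=\"/a\">A</a> <a href='b.html'>B</a>",
                "http://example.com/a", "<a href=\"/\">home</a> <a href=\"/c\">C</a>",
                "http://example.com/b.html", "no links here",
                "http://example.com/c", "");
        System.out.println(crawl("http://example.com/", 10, site::get));
    }
}
```

Passing the fetcher as a function keeps the crawl logic separate from the networking, so the same loop can be driven by HttpClient, jsoup, or a test fixture.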


http://www.internetarticlesdb.com
 
 
Durango2011
 
      07-21-2011
On Tue, 12 Jul 2011 07:14:45 +0000, pm wrote:


Thanks for all the great feedback.
 
 
Arne Vajhøj
 
      07-21-2011
On 7/12/2011 3:14 AM, pm wrote:
> Hello, I am working on a project that requires me to do custom searches on
> different websites. I am using Java, and while I could write this from
> the ground up, I would rather use an existing API because of time
> constraints. So far I have come across Apache's HttpClient.
> I am wondering if there are any others that are effective or
> give more options for web searching/scraping. I plan to create a
> GUI-based application and need something quick and effective that is
> not too complex.


http://nutch.apache.org/ should contain a crawler, and it comes with
a searchable index (Lucene).

Arne

 
 
 
 