Velocity Reviews - Computer Hardware Reviews

JSP or httpservlet for Java spider?

 
 
Greg Peters
 
      12-24-2005
Hi. I want to spider just a few websites, not the entire site, just 1 or 2
levels deep. So can I use JSP or httpservlets for this? Does anyone know of
some tutorial/code/book that explains this? I usually use JSP and
httpservlets for processing requests, but I want to get the data from a
different website.

Or do I have to spider using Perl, then store it in a database and retrieve
it using JSP/httpservlets? Thank you.

 
 
 
 
 
Roedy Green
 
      12-24-2005
On 24 Dec 2005 05:25:54 GMT, Greg Peters <(E-Mail Removed)> wrote,
quoted or indirectly quoted someone who said :

>Hi. I want to spider just a few websites, not the entire site, just 1 or 2
>levels deep. So can I use JSP or httpservlets for this? Does anyone know of
>some tutorial/code/book that explains this? I usually use JSP and
>httpservlets for processing requests, but I want to get the data from a
>different website.


See http://mindprod.com/applets/fileio.htm
for how to do a GET.
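As an independent sketch (not the code from the page above), here is one way to do such a GET with java.net.HttpURLConnection on a current JDK (Java 9 or later, for InputStream.readAllBytes). The names PageFetcher and fetch and the User-Agent string are hypothetical, and the code assumes the page is UTF-8 instead of reading the charset from the Content-Type header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class PageFetcher {
    /** Performs an HTTP GET and returns the response body; any status other than 200 is an error. */
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(10_000);
        conn.setRequestProperty("User-Agent", "SimpleSpider/0.1"); // hypothetical agent name
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + address);
            }
            try (InputStream in = conn.getInputStream()) {
                // Assumption: the page is UTF-8; a real spider should honour the Content-Type charset.
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Failing loudly on non-200 responses keeps error pages (404s and the like) from being parsed for links as if they were real content.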

Then you have to find the links to spider, e.g. those matching the
pattern

<a href="xxxx"

You can crudely use indexOf("<a href="), or you can use a regex if you
want to catch squirrelly stuff like extra spaces or other attributes
before the href.

See http://mindprod.com/jgloss/regex.html
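A sketch of the regex approach, assuming links appear in double- or single-quoted href attributes; the class name LinkExtractor is hypothetical. It tolerates mixed case, extra whitespace, and other attributes before href, but a regex is still a heuristic: unquoted hrefs, links generated by JavaScript, and commented-out markup are not handled correctly, so an HTML parser is the sturdier choice for anything beyond a quick spider.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // "<a", whitespace, optionally other attributes ending in whitespace, then href = "..." or '...'.
    // Group 1 is the opening quote; \1 requires the same quote character to close the value.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+(?:[^>]*?\\s)?href\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    /** Returns the raw href values in document order (they may still be relative). */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2).trim());
        }
        return links;
    }
}
```

The values come back exactly as written in the page, so a relative link such as page1.html still has to be resolved against the URL of the page it came from before it goes on the queue.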

You add the links to a queue of links to be spidered.
See http://mindprod.com/queue.html

Then you spawn up to N threads, each of which grabs the next item from
the queue and spiders it.
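A sketch of the queue-plus-threads design, limited to a fixed number of link hops as in the original question. Rather than a hand-written queue, it lets a fixed-size ExecutorService supply both the work queue and the N threads, and it handles one depth level at a time so the depth limit is easy to enforce. The class name DepthLimitedSpider and the linksOf function (fetch a page, return its absolute links) are hypothetical plug-in points, not anything from the pages linked above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class DepthLimitedSpider {
    private final Function<String, List<String>> linksOf; // hypothetical: fetch a page, return its absolute links
    private final int threads;                            // the "N" threads

    DepthLimitedSpider(Function<String, List<String>> linksOf, int threads) {
        this.linksOf = linksOf;
        this.threads = threads;
    }

    /** Fetches startUrl and every page within maxDepth link hops of it; returns them in discovery order. */
    Set<String> crawl(String startUrl, int maxDepth) throws InterruptedException {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(startUrl);
        List<String> frontier = new ArrayList<>();
        frontier.add(startUrl);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int depth = 0; !frontier.isEmpty(); depth++) {
                // Queue every page at this depth; the pool works through them N at a time.
                List<Future<List<String>>> results = new ArrayList<>();
                for (String url : frontier) {
                    results.add(pool.submit(() -> linksOf.apply(url)));
                }
                List<String> next = new ArrayList<>();
                for (Future<List<String>> f : results) {
                    try {
                        List<String> links = f.get();
                        if (depth < maxDepth) {          // pages at maxDepth are fetched but not followed
                            for (String link : links) {
                                if (visited.add(link)) { // enqueue each URL only once
                                    next.add(link);
                                }
                            }
                        }
                    } catch (ExecutionException e) {
                        // The page failed to load; skip it (a real spider would log this).
                    }
                }
                frontier = next;
            }
        } finally {
            pool.shutdownNow();
        }
        return visited;
    }
}
```

Only the calling thread touches visited, so it needs no synchronization; the worker threads just run linksOf. Processing level by level means waiting for the slowest page at each depth, which is a reasonable trade-off when spidering only one or two levels.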

See http://mindprod.com/projects/htmlbrokenlink.html
for more details.

--
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
 
 
 
 
 
John C. Bollinger
 
      12-26-2005
Greg Peters wrote:
> Hi. I want to spider just a few websites, not the entire site, just 1 or 2
> levels deep. So can I use JSP or httpservlets for this? Does anyone know of
> some tutorial/code/book that explains this? I usually use JSP and
> httpservlets for processing requests, but I want to get the data from a
> different website.
>
> Or do I have to spider using perl, then store it in a database and retrieve
> it using JSP/httpservlets? Thank you.


JSP and servlets are mechanisms for generating dynamic responses to HTTP
requests. They are most often used for serving HTML pages. They have
no special mechanism beyond any other Java code for making
general-purpose HTTP requests or doing anything with the results of
such a request.

Even though JSP and servlets specifically would be inappropriate choices
for a web spider, that does not mean that Java in general is wrong for
the task. To the contrary, the Java platform library has good support
for a wide variety of network- and web-oriented tasks, and there is a
multitude of third-party libraries that build further on that foundation.
Look at the URL, URLConnection, and HttpURLConnection classes in the
java.net package to start, and perhaps at DOM (package org.w3c.dom) for
document analysis. You might also find the Jakarta Commons HttpClient
library useful (http://jakarta.apache.org/commons/httpclient/). There are
many other resources available.
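One early java.net job in a spider is turning the (usually relative) hrefs found on a page into absolute URLs. A minimal sketch, with the hypothetical name LinkResolver, using java.net.URI.resolve: it drops the #fragment and keeps only http/https links on the same host, which is one reasonable policy for spidering "just a few websites" but by no means the only one.

```java
import java.net.URI;

class LinkResolver {
    /**
     * Resolves href against the URL of the page it appeared on. Returns an absolute http(s) URL
     * on the same host with any #fragment removed, or null if the spider should skip the link.
     */
    static String resolveSameHost(String pageUrl, String href) {
        try {
            URI base = URI.create(pageUrl);
            if (base.getRawPath() == null || base.getRawPath().isEmpty()) {
                base = base.resolve("/"); // give a bare host such as http://example.com an explicit root path
            }
            URI abs = base.resolve(href.trim());
            String scheme = abs.getScheme();
            if (scheme == null
                    || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
                return null; // skip mailto:, javascript:, ftp: and similar links
            }
            if (abs.getHost() == null || !abs.getHost().equalsIgnoreCase(base.getHost())) {
                return null; // stay on the site being spidered
            }
            // Strip the fragment so page.html#top and page.html count as the same page.
            String s = abs.toString();
            int hash = s.indexOf('#');
            return hash < 0 ? s : s.substring(0, hash);
        } catch (IllegalArgumentException e) {
            return null; // pageUrl or href is not a syntactically valid URI
        }
    }
}
```

Running every raw href through a function like this before the duplicate check means that page2.html and /docs/page2.html, found on http://example.com/docs/index.html, are recognized as the same page and fetched only once.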

As for displaying pages previously retrieved by your spider, chances are
that a fairly simple servlet could handle the job admirably. There
might be reasons to do it with JSP / custom tags instead, but that
approach wouldn't be my first inclination.


--
John Bollinger
http://www.velocityreviews.com/forums/(E-Mail Removed)
 
 
 
 