Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > recursively pull web site?

recursively pull web site?

 
 
Mike
 
      06-23-2004
For the purpose of mirroring files on many nodes, I have set the files
under a web server and want to pull from the top of the structure
down getting all files. Is there an easy way to pass the beginning
url (http://server/RPMS/) to a method and have that method(s) pull
all files to the local node (keeping the directory structure from
the web server)?

This is to propagate configuration files, scripts, etc., to many boxes.

Mike
 
 
 
 
 
Mike
 
      06-23-2004
In article <(E-Mail Removed)>, Michael Borgwardt wrote:
> Mike wrote:
>
>> For the purpose of mirroring files on many nodes, I have set the files
>> under a web server and want to pull from the top of the structure
>> down getting all files. Is there an easy way to pass the beginning
>> url (http://server/RPMS/) to a method and have that method(s) pull
>> all files to the local node (keeping the directory structure from
>> the web server)?

>
> Runtime.getRuntime().exec("wget -m "+url);
>
> Assuming, of course, that all the files you need are listed on HTML pages
> (possibly directory listings generated by the server) reachable from
> that first one.


And assuming that wget is installed on all my servers (unix, intel,
mainframe, etc.).
 
 
 
 
 
Michael Borgwardt
 
      06-23-2004
Mike wrote:

> For the purpose of mirroring files on many nodes, I have set the files
> under a web server and want to pull from the top of the structure
> down getting all files. Is there an easy way to pass the beginning
> url (http://server/RPMS/) to a method and have that method(s) pull
> all files to the local node (keeping the directory structure from
> the web server)?


Runtime.getRuntime().exec("wget -m "+url);

Assuming, of course, that all the files you need are listed on HTML pages
(possibly directory listings generated by the server) reachable from
that first one.
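One caveat with that one-liner: `exec(String)` splits the command on whitespace naively, and a process whose output streams are never drained can block. A safer sketch uses `ProcessBuilder`, assuming `wget` is on the PATH (the class and method names here are mine, not from the thread):

```java
import java.util.Arrays;
import java.util.List;

public class WgetMirror {

    // Pass the URL as its own argv token so it is never re-split,
    // unlike exec("wget -m " + url), which tokenizes on whitespace.
    static List<String> buildCommand(String url) {
        return Arrays.asList("wget", "-m", url);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java WgetMirror <url>");
            return;
        }
        ProcessBuilder pb = new ProcessBuilder(buildCommand(args[0]));
        pb.inheritIO(); // stream wget's output to our console instead of letting it buffer
        Process p = pb.start();
        System.out.println("wget exited with " + p.waitFor());
    }
}
```

`wget -m` (mirror mode) recurses with timestamping and keeps the server's directory layout, which matches what Mike asked for.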
 
 
Andy Fish
 
      06-23-2004
You need a web spider or web robot.

Since you're asking here, I presume you want a Java one. I have used JoBo, which is free and seems to work OK, but many others are available.

Whether it's an appropriate mechanism for mirroring software is another question - I would probably prefer to tar/zip it up and then FTP it around

Andy


"Mike" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> For the purpose of mirroring files on many nodes, I have set the files
> under a web server and want to pull from the top of the structure
> down getting all files. Is there an easy way to pass the beginning
> url (http://server/RPMS/) to a method and have that method(s) pull
> all files to the local node (keeping the directory structure from
> the web server)?
>
> This is to propagate configuration files, scripts, etc., to many boxes.
>
> Mike



 
 
Michael Borgwardt
 
      06-23-2004
Mike wrote:
>>Runtime.getRuntime().exec("wget -m "+url);
>>
>>Assuming, of course, that all the files you need are listed on HTML pages
>>(possibly directory listings generated by the server) reachable from
>>that first one.

>
>
> And assuming that wget is installed on all my servers (unix, intel,
> mainframe, etc.).


Isn't it?

I'm pretty sure it would be less work than programming a web spider of your
own, but if there's one already done in Java, that's of course even better.
 
 
Mike
 
      06-23-2004
In article <7kiCc.779$(E-Mail Removed)>, Andy Fish wrote:
> you need a web spider or web robot.
>
> since you're asking in here, I presume you want a java one. I have used jobo
> which is free and seems to work OK but many others are available.
>
> Whether it's an appropriate mechanism for mirroring software is another
> question - I would probably prefer to tar/zip it up and then FTP it around
>
> Andy
>
>
> "Mike" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>> For the purpose of mirroring files on many nodes, I have set the files
>> under a web server and want to pull from the top of the structure
>> down getting all files. Is there an easy way to pass the beginning
>> url (http://server/RPMS/) to a method and have that method(s) pull
>> all files to the local node (keeping the directory structure from
>> the web server)?
>>
>> This is to propagate configuration files, scripts, etc., to many boxes.
>>
>> Mike

>
>


The tar.gz solution is fine for lots of things, but not for incremental
changes to a production server farm. For major application changes,
even utility changes (sudo, lsof, cvs, etc.), I will use rpm, since
I can get the source for it and compile it everywhere.

Thanks for the suggestions.

Mike
 
 
Roedy Green
 
      06-23-2004
On Wed, 23 Jun 2004 16:36:19 GMT, "Andy Fish"
<(E-Mail Removed)> wrote or quoted :

>you need a web spider or web robot.


Xenu is very quick at spidering and will produce reports on what it
found, including broken links and orphaned files.

You would take its output and feed it into a mindless little program
that just downloads the files it found, one after another.

Xenu is quick because it uses many threads.

You could smarten your own beast up a bit by using several download
threads, each feeding off a common queue.
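The several-threads-off-a-common-queue idea can be sketched with `java.util.concurrent`: a fixed pool's internal work queue is the shared queue the workers feed from. The class name, URL list, and `fetchAll` method are illustrative, and the actual download body is stubbed out:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelFetcher {

    // Submit one task per URL to a fixed pool of worker threads;
    // the executor's work queue is the "common queue" they feed off.
    static int fetchAll(List<String> urls, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger done = new AtomicInteger();
        for (String url : urls) {
            pool.execute(() -> {
                // A real task would open 'url' here and save the response
                // body to disk; this sketch only counts completions.
                done.incrementAndGet();
            });
        }
        pool.shutdown();                       // no new tasks; workers drain the queue
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        int n = fetchAll(Arrays.asList("u1", "u2", "u3"), 2);
        System.out.println(n + " downloads finished"); // prints "3 downloads finished"
    }
}
```

For plain file downloads, two to four threads usually saturate a link; more mostly adds contention on the server.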


See http://mindprod.com/projects/brokenlinkfixer.html

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
 
 
Mike
 
      06-23-2004
In article <(E-Mail Removed)>, Roedy Green wrote:
> On Wed, 23 Jun 2004 16:36:19 GMT, "Andy Fish"
><(E-Mail Removed)> wrote or quoted :
>
>>you need a web spider or web robot.

>
> Xenu is very quick at spidering and will produce reports on what it
> found including broken links and orphaned files.
>
> you would take its output and feed that into a mindless little program
> that just downloaded the files it found one after another.
>
> Xenu is quick because it uses many threads.
>
> You could smarten your own beast up a bit by using several download
> threads each feeding off a common queue.
>
>
> See http://mindprod.com/projects/brokenlinkfixer.html
>


Thanks, Roedy. I enjoy reading your posts. I'll look at Xenu.

Mike
 
 
Andrew Thompson
 
      06-23-2004
On Wed, 23 Jun 2004 15:39:39 -0000, Mike wrote:

> Is there an easy way to pass the beginning
> url (http://server/RPMS/) to a method and have that method(s) pull
> all files to the local node (keeping the directory structure from
> the web server)?


This might serve as an example to get you started..
<http://groups.google.com/groups?as_q=PullUrl3%20koran.html>

--
Andrew Thompson
http://www.PhySci.org/ Open-source software suite
http://www.PhySci.org/codes/ Web & IT Help
http://www.1point1C.org/ Science & Technology
 
 
Mike
 
      06-24-2004
In article <vonwkqe5mkzg$(E-Mail Removed)>, Andrew Thompson wrote:
> On Wed, 23 Jun 2004 15:39:39 -0000, Mike wrote:
>
>> Is there an easy way to pass the beginning
>> url (http://server/RPMS/) to a method and have that method(s) pull
>> all files to the local node (keeping the directory structure from
>> the web server)?

>
> This might serve as an example to get you started..
><http://groups.google.com/groups?as_q=PullUrl3%20koran.html>
>


Fantastic, thanks. A simple solution occurred to me while driving
home. Since I want to replicate my own files from my web server,
the script that keeps the files current from the CVS repository
can easily do a 'find . -type f -print > files'. My program first
pulls the file 'files', then iterates through its contents, pulling
each file mentioned.
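That manifest approach is straightforward to sketch: fetch the 'files' listing first, then stream each entry into a matching local path so the server's layout is preserved. The class name and the local "mirror" directory are illustrative; the base URL is the one from the thread:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ManifestMirror {

    // 'find . -type f' output starts each line with "./"; strip it so
    // the path can be appended to the base URL and reused locally.
    static String relativize(String line) {
        return line.startsWith("./") ? line.substring(2) : line;
    }

    // Download base + path into destDir/path, creating parent
    // directories so the web server's structure is kept on this node.
    static void fetch(String base, String path, Path destDir) throws IOException {
        Path target = destDir.resolve(path);
        Files.createDirectories(target.getParent());
        try (InputStream in = new URL(base + path).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java ManifestMirror <destDir>");
            return;
        }
        String base = "http://server/RPMS/";
        Path dest = Paths.get(args[0]);
        // Pull the manifest produced by 'find . -type f -print > files',
        // then fetch every file it lists.
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new URL(base + "files").openStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                fetch(base, relativize(line), dest);
            }
        }
    }
}
```

One nicety of this scheme over spidering: it needs no HTML directory listings at all, only the one plain-text manifest.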

Thanks for the help, everyone.

Mike
 
 
 
 