Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > http request

http request

 
 
Peder Ydalus
 
      01-14-2004
I'm trying to write a program that will dynamically let me download
pictures from a website. The problem seems to be, however, that when I
use getstore() or enter the (e.g.) ".../images/01.jpg" address
manually, the server redirects the request to some ad page. I guess
it's checking $REMOTE_ADDR, $REMOTE_HOST, $HTTP_REFERER or something
similar in the request, so that's the only way to get to these pics.

If I need to manually construct such a request, what is the way to go
about this?

Thanks!

- Peder -

 
 
 
 
 
Richard Gration
 
      01-14-2004
In article <bu3h72$dgh$(E-Mail Removed)>, "Peder Ydalus"
<(E-Mail Removed)> wrote:


> I'm trying to write a program that will dynamically let me download
> pictures from a website. The problem seems to be, however, that when I
> use getstore() or enter the (e.g.) ".../images/01.jpg" address
> manually, the server redirects the request to some ad page. I guess
> it's checking $REMOTE_ADDR, $REMOTE_HOST, $HTTP_REFERER or something
> similar in the request, so that's the only way to get to these pics.
> If I need to manually construct such a request, what is the way to go
> about this?
> Thanks!
> - Peder -
>


Hi,

This is how I would go (have gone) about this:

1. Use a packet sniffer (e.g. Ethereal) to find the headers from a
successful request.
2. See if you can duplicate this successful request from a Perl script by
setting the relevant [1] headers correctly. Setting headers is explained
in the docs for the LWP library. If yes, you're done. If not ...
3. Set up a cookie jar (also explained in the docs) in your Perl script
and see if this improves matters.
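To make steps 2 and 3 concrete, here's a rough sketch using LWP::UserAgent and HTTP::Cookies. The URLs and the User-Agent string are invented; the headers you actually need to set are whatever the sniffer shows in the successful request:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build a browser-like user agent: a custom User-Agent string plus a
# cookie jar (step 3), so any cookies the site sets on the gallery
# page get sent back with the image request.
sub make_browser_ua {
    my ($agent_string) = @_;
    return LWP::UserAgent->new(
        agent      => $agent_string,
        cookie_jar => HTTP::Cookies->new,
    );
}

# Step 2: fetch the image with Referer pointing at the page that
# links to it, and save the bytes to $outfile.
sub fetch_image {
    my ($ua, $page_url, $image_url, $outfile) = @_;
    $ua->get($page_url);    # visit the page first to prime the cookie jar
    my $resp = $ua->get($image_url, Referer => $page_url);
    die 'Request failed: ' . $resp->status_line . "\n"
        unless $resp->is_success;
    open my $fh, '>', $outfile or die "open $outfile: $!";
    binmode $fh;
    print $fh $resp->content;
    close $fh;
}

# Example use (made-up URLs):
# my $ua = make_browser_ua('Mozilla/4.0 (compatible; MSIE 6.0)');
# fetch_image($ua, 'http://www.example.com/gallery.html',
#             'http://www.example.com/images/01.jpg', '01.jpg');
```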

If none of this works, post with your results.

Might I also suggest you look into wget, a utility for bulk download of
web pages.
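wget can set the same headers from the command line; something along these lines (the URLs are made up, and the exact Referer and User-Agent values would come from the sniffed request):

```shell
# Fetch one image while presenting a Referer and a browser-like
# User-Agent; wget saves it as 01.jpg in the current directory.
wget --referer='http://www.example.com/gallery.html' \
     --user-agent='Mozilla/4.0 (compatible; MSIE 6.0)' \
     --tries=1 --timeout=10 \
     'http://www.example.com/images/01.jpg'
```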

HTH
Rick

[1] Referer: is a good candidate for a relevant header. There may be
others. Also, some web sites react differently based on the User-Agent:
string.
 
 
 
 
 
Iain Chalmers
 
      01-15-2004
In article <bu3kpm$6fs$(E-Mail Removed)2surf.net>,
"Richard Gration" <(E-Mail Removed)> wrote:

> In article <bu3h72$dgh$(E-Mail Removed)>, "Peder Ydalus"
> <(E-Mail Removed)> wrote:
>
>
> > [snip - original post]

>
> Hi,
>
> This is how I would go (have gone) about this:
>
> 1. Use a packet sniffer (e.g. Ethereal) to find the headers from a
> successful request.
> 2. See if you can duplicate this successful request from a Perl script by
> setting the relevant [1] headers correctly. Setting headers is explained
> in the docs for the LWP library. If yes, you're done. If not ...
> 3. Set up a cookie jar (also explained in the docs) in your Perl script
> and see if this improves matters.


Even easier is to use Web Scraping Proxy from:

http://www.research.att.com/~hpk/wsp/

"Web Scraping Proxy

Programmers often need to use information on Web pages as input to
other programs. This is done by Web Scraping, writing a program to
simulate a person viewing a Web site with a browser. It is often hard
to write these programs because it is difficult to determine the Web
requests necessary to do the simulation.

The Web Scraping Proxy (WSP) solves this problem by monitoring the flow
of information between the browser and the Web site and emitting Perl
LWP code fragments that can be used to write the Web Scraping program.
A developer would use the WSP by browsing the site once with a browser
that accesses the WSP as a proxy server. He then uses the emitted code
as a template to build a Perl program that accesses the site. "

cheers,
big

--
'When I first met Katho, she had a meat cleaver in one hand and
half a sheep in the other. "Come in", she says, "Hammo's not here.
I hope you like meat."' Sharkey in aus.moto
 
