Web-crawling

 
 
John Bradbury
 
      10-04-2003
I am trying to develop a special purpose crawler using htmllib & urllib.
How do you tell the server application that you are a modern browser and can
handle frames?

Thanks,

John Bradbury


 
 
 
 
 
Rene Pijlman
 
      10-04-2003
John Bradbury:
>I am trying to develop a special purpose crawler using htmllib & urllib.
>How do you tell the server application that you are a modern browser and can
>handle frames?


I don't know of any "I can handle frames" header and I don't see why the
server would care, but you could mimic the User-agent header sent by a
modern browser.

--
René Pijlman
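As a rough illustration of the suggestion above, here is a minimal sketch of attaching a browser-like User-Agent to a single request. It assumes Python 3, where the functionality of `urllib2` lives in `urllib.request`; the URL and the User-Agent string are made up for the example and are not taken from the thread.

```python
import urllib.request

# Hypothetical values, chosen only for illustration.
URL = "http://example.com/"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"

# Building the Request object does not touch the network.
req = urllib.request.Request(URL, headers={"User-Agent": BROWSER_UA})

# Request normalises header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))

# To actually fetch the page (needs network access):
# with urllib.request.urlopen(req) as resp:
#     html = resp.read()
```

Whether a given site changes its output based on this header is something only a test against that site can tell.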
 
 
 
 
 
John Bradbury
 
      10-04-2003
I don't know what is causing the problem, but the site I am accessing is
sending out forms for a browser that has a low resolution and does not
support frames. Excuse my ignorance, but where do you set up the User-agent
header you suggested?

Many thanks for your prompt reply.

John Bradbury

"Rene Pijlman" <(E-Mail Removed)> wrote in
message news:(E-Mail Removed)...
> John Bradbury:
> >I am trying to develop a special purpose crawler using htmllib & urllib.
> >How do you tell the server application that you are a modern browser and
> >can handle frames?
>
> I don't know of any "I can handle frames" header and I don't see why the
> server would care, but you could mimic the User-agent header sent by a
> modern browser.
>
> --
> René Pijlman
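On the crawler side, "handling frames" mostly means finding the frame sources in a frameset page and fetching each of them. The sketch below shows one possible way to do that. It assumes Python 3 and uses `html.parser` as a stand-in for `htmllib`, which no longer exists in Python 3; the class name `FrameCollector`, the sample markup, and the base URL are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameCollector(HTMLParser):
    """Collect absolute URLs of <frame> and <iframe> sources (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frame_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative links against the page's own URL.
                self.frame_urls.append(urljoin(self.base_url, src))


# Made-up frameset page standing in for a real response body.
SAMPLE = """<html><frameset cols="20%,80%">
<frame src="menu.html" name="menu">
<frame src="/content/main.html" name="main">
</frameset></html>"""

collector = FrameCollector("http://example.com/site/index.html")
collector.feed(SAMPLE)
print(collector.frame_urls)
```

A crawler would then queue each collected URL for fetching, the same way it queues ordinary links.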



 
 
Rene Pijlman
 
      10-04-2003
John Bradbury:
>where do you set up the User-agent header you suggested?


It's an HTTP header:
http://www.ietf.org/rfc/rfc2616.txt

See also:
http://www.google.com/search?q=urlli...ser%2Dagent%22

which leads you to... tada... the documentation!
http://www.python.org/doc/current/li...le-urllib.html

--
René Pijlman
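To make "it's an HTTP header" concrete, the following sketch starts a throwaway server on the loopback interface that echoes back whatever User-Agent it receives, then sends it one request. It assumes Python 3 (`urllib.request`, `http.server`); the handler class, the helper function, and the User-Agent string are hypothetical names invented for this example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoUserAgent(BaseHTTPRequestHandler):
    """Reply with whatever User-Agent header the client sent."""

    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_seen_user_agent(user_agent):
    """Start a local server, send one request, return the header it saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoUserAgent)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        # An empty ProxyHandler keeps the request away from any configured proxy.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(req, timeout=5) as resp:
            return resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


seen = fetch_seen_user_agent("Mozilla/5.0 (compatible; ExampleCrawler/0.1)")
print(seen)
```

The server side sees only the header text, so it has no way of telling a crawler from a browser beyond what the client chooses to send.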
 
 
John J. Lee
 
      10-04-2003
"John Bradbury" <john_bradbury@___cableinet.co.uk> writes:

> "Rene Pijlman" <(E-Mail Removed)> wrote in
> message news:(E-Mail Removed)...
> > John Bradbury:
> > >I am trying to develop a special purpose crawler using htmllib & urllib.
> > >How do you tell the server application that you are a modern browser
> > >and can handle frames?

[...]
> > server would care, but you could mimic the User-agent header sent by a

[...]
> I don't know what is causing the problem, but the site I am accessing is
> sending out forms for a browser that has a low resolution and does not
> support frames. Excuse my ignorance, but where do you set up the
> User-agent header you suggested?


For urllib2 (well, almost):

http://wwwsearch.sourceforge.net/Cli...c.html#headers


John
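For a crawler that makes many requests, setting the header once on an opener may be more convenient than setting it per request. This is a sketch only, assuming Python 3's `urllib.request` (the successor to `urllib2`); it does not reproduce the linked page, and the User-Agent value is hypothetical.

```python
import urllib.request

BROWSER_UA = "Mozilla/5.0 (compatible; ExampleCrawler/0.1)"  # hypothetical value

opener = urllib.request.build_opener()

# A fresh opener identifies itself as "Python-urllib/<version>".
default_ua = dict(opener.addheaders)["User-agent"]

# Replace the default list; these headers go out with every request made
# through this opener unless the Request object sets its own value.
opener.addheaders = [("User-agent", BROWSER_UA)]

# Optional: make plain urllib.request.urlopen() use this opener as well.
# urllib.request.install_opener(opener)

print(default_ua.startswith("Python-urllib/"))
```

The default "Python-urllib" identifier is one plausible reason a site might serve a fallback page to a script, though nothing in this thread confirms that is what the site in question does.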
 
 
 
 