Velocity Reviews - Computer Hardware Reviews


Obtaining Webpage Source with Python

 
 
Ryan Kaskel
 
      06-24-2004
How can I obtain the source of a remote webpage (e.g.
http://www.python.org/index.html) using Python?

Something like:

pyPage = open('http://www.python.org/index.html',r).read()

Obviously that won't work but how can I do something to that effect?
Thanks,
Ryan Kaskel

--I posted this before but it seems it is not showing up...
 
 
 
 
 
Paul Rubin
 
      06-24-2004
Ryan Kaskel writes:
> Something like:
>
> pyPage = open('http://www.python.org/index.html',r).read()
>
> Obviously that won't work but how can I do something to that effect?


import urllib
pyPage = urllib.urlopen('http://www.python.org/index.html',r).read()
 
 
 
 
 
Paul Rubin
 
      06-24-2004
Paul Rubin writes:
> import urllib
> pyPage = urllib.urlopen('http://www.python.org/index.html',r).read()


oops:

import urllib
pyPage = urllib.urlopen('http://www.python.org/index.html').read()

i.e. omit the 'r' argument to urllib.urlopen.
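
For anyone reading this on Python 3: urllib.urlopen from the snippet above now lives at urllib.request.urlopen, and read() returns bytes rather than a string. A minimal sketch of the same idea, assuming a hypothetical helper name fetch_source, a 10-second timeout, and a UTF-8 fallback when the server declares no charset (none of these come from the thread):

```python
from urllib.request import urlopen


def fetch_source(url, timeout=10):
    """Return the body served at `url` as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes in Python 3, not str
        # Use the charset the server declared; assume UTF-8 if there is none.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Calling fetch_source('http://www.python.org/index.html') would then play the role of pyPage in the original one-liner.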
 
 
Pierre-Frédéric Caillaud
 
      06-24-2004

> pyPage = open('http://www.python.org/index.html',r).read()


Using the same open() call for both local files and URLs is what PHP
does (its allow_url_fopen setting). That is a major security hole, because
it even lets a script include() code files from the web without the author
realizing it, that kind of thing...

Python keeps these as two separate functions (open() for files,
urllib.urlopen() for URLs), so you know which one you're doing.
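
A small sketch of that separation, assuming a hypothetical helper name opens_as_local_file: the built-in open() treats its argument strictly as a filesystem path, so a URL string simply fails with an OSError instead of triggering a network fetch.

```python
def opens_as_local_file(name):
    """Return True if the built-in open() can open `name` as a local file."""
    try:
        with open(name):
            return True
    except OSError:
        # A URL is just an odd-looking path to open(); no request is made.
        return False
```

Passing 'http://www.python.org/index.html' is expected to return False unless a local path with that exact name happens to exist.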

If the page needs cookies or something similar, you'll need urllib2.
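
As a rough illustration of the cookie case on Python 3, where urllib2's machinery lives in urllib.request and the cookie store comes from http.cookiejar: the helper name make_cookie_opener is an assumption for this sketch, not something from the thread.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_cookie_opener():
    """Build an opener that remembers cookies between requests (hypothetical helper)."""
    jar = CookieJar()
    # HTTPCookieProcessor stores cookies from responses in `jar`
    # and sends them back on later requests made through this opener.
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar
```

Requests made with opener.open(url) share the same jar, so a session cookie set by one response is sent with the next request; iterating over jar shows what has been collected.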

If you want to parse it afterwards, use htmllib or BeautifulSoup.
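
A sketch of the parsing step, with one substitution stated plainly: htmllib was removed in Python 3, so this uses the standard-library html.parser module instead (BeautifulSoup would also work but is a separate install). The class and function names here are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical example class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Feeding it the text of a downloaded page gives a list of the link targets in document order.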
 
 
Phil Frost
 
      06-24-2004
Take a look at the urllib module:

http://python.org/doc/2.3.3/lib/module-urllib.html

On Wed, Jun 23, 2004 at 10:03:04PM -0700, Ryan Kaskel wrote:
> How can I obtain the source of a remote webpage (e.g.
> http://www.python.org/index.html) using Python?
>
> Something like:
>
> pyPage = open('http://www.python.org/index.html',r).read()
>
> Obviously that won't work but how can I do something to that effect?
> Thanks,
> Ryan Kaskel
>
> --I posted this before but it seems it is not showing up...


 