Velocity Reviews - Computer Hardware Reviews

view page source or save after load

 
 
zephron2000
      09-21-2006
Hey,

I need to either:

1. View the page source of a webpage after it loads

or

2. Save the webpage to my computer after it loads (same as File > Save
Page As)

urllib is not sufficient (using urlopen or something else in urllib
isn't going to do the trick)

Any ideas?

Thanks,
Lara



 
James Stroud
      09-21-2006
zephron2000 wrote:
> Hey,
>
> I need to either:
>
> 1. View the page source of a webpage after it loads
>
> or
>
> 2. Save the webpage to my computer after it loads (same as File > Save
> Page As)
>
> urllib is not sufficient (using urlopen or something else in urllib
> isn't going to do the trick)
>
> Any ideas?
>
> Thanks,
> Lara
>
>
>


I happen to be tweaking a module that does this as your question came
in. The relevant lines are:

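import urllib

# fetchparams (a dict of query parameters), baseurl and filename are
# defined elsewhere in the module
fetchparams = urllib.urlencode(fetchparams)
wwwf = urllib.urlopen("?".join([baseurl, fetchparams]))
afile = open(filename, "w")
afile.write(wwwf.read())
afile.close()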
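A rough Python 3 equivalent of those lines, as a sketch only: in Python 3, urllib.urlencode became urllib.parse.urlencode and urllib.urlopen became urllib.request.urlopen. The function names build_url and save_page, and skipping the "?" when there are no parameters, are my own assumptions, not part of James's module.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(baseurl, params):
    """Append params (a dict) to baseurl as a URL-encoded query string."""
    query = urlencode(params)
    # Assumption: leave the URL untouched when there are no parameters.
    return "?".join([baseurl, query]) if query else baseurl


def save_page(baseurl, params, filename):
    """Fetch the page and write the raw response bytes to filename."""
    with urlopen(build_url(baseurl, params)) as response:
        data = response.read()
    # Binary mode stores the bytes exactly as the server sent them.
    with open(filename, "wb") as afile:
        afile.write(data)
    return len(data)
```

For example, build_url("http://example.com/search", {"q": "python", "page": 2}) gives "http://example.com/search?q=python&page=2". Like the original, this saves the HTML the server returns, not whatever a browser's scripts change after the page loads.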

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
 
alex23
      09-21-2006
zephron2000 wrote:
> I need to either:
> 1. View the page source of a webpage after it loads
> or
> 2. Save the webpage to my computer after it loads (same as File > Save
> Page As)
> urllib is not sufficient (using urlopen or something else in urllib
> isn't going to do the trick)


You don't really say _why_ urllib.urlopen "isn't going to do the
trick". The following does what you've described:

import urllib
page = urllib.urlopen('http://some.address')
open('saved_page.txt','w').write(page).close()

If you're needing to use a browser directly and you're running under
Windows, try the Internet Explorer Controller library, IEC:

import IEC
ie = IEC.IEController()
ie.Navigate('http://some.address')
page = ie.GetDocumentHTML()
open('saved_page.txt','w').write(page.encode('iso-8859-1')).close()

(You can grab IEC from http://www.mayukhbose.com/python/IEC/index.php)

Hope this helps.

-alex23

 
Gabriel Genellina
      09-21-2006
At Thursday 21/9/2006 02:26, alex23 wrote:

>page = urllib.urlopen('http://some.address')


add .read() at the end

>open('saved_page.txt','w').write(page).close()


write() does not return the file object, so this won't work; you have
to bind the file to a temporary variable to be able to close it.
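To make the point concrete, here is a small Python 3 sketch (the helper names are illustrative only). In Python 3, write() returns the number of characters written, an int, so calling .close() on its result raises AttributeError. Binding the file to a name first fixes it.

```python
def chained_write(path, text):
    """The one-liner pattern: close() ends up being called on write()'s result."""
    open(path, "w").write(text).close()


def bound_write(path, text):
    """The fix: keep a reference to the file so it can be closed explicitly."""
    afile = open(path, "w")
    count = afile.write(text)  # an int, not the file object
    afile.close()
    return count
```

In Python 2, which this thread uses, write() returns None instead, so the chained line fails with an AttributeError on NoneType.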



Gabriel Genellina
Softlab SRL





alex23
      09-21-2006

Gabriel Genellina wrote:
<fixes for my stupidity>

Thanks for the corrections, Gabriel. I really need to learn to
cut & paste working code.

Cheers.

-alex23

 
James Stroud
      09-21-2006
Gabriel Genellina wrote:
> At Thursday 21/9/2006 02:26, alex23 wrote:
>
>> page = urllib.urlopen('http://some.address')

>
> add .read() at the end
>
>> open('saved_page.txt','w').write(page).close()

>
> write() does not return the file object, so this won't work; you have to
> bind the file to a temporary variable to be able to close it.


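Strictly speaking, "have to" is not perfectly correct. The ".close()"
part can simply be eliminated: the file object is never bound to a
name, so in CPython its reference count drops to zero as soon as the
statement finishes and the file is closed right there. Other Python
implementations may not close it until the garbage collector runs.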
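A sketch of the two approaches, with illustrative function names. The first relies on CPython's reference counting to close an unbound file object when the statement ends; the second uses a with-block, which closes the file on any implementation.

```python
def write_unbound(path, text):
    # Nothing keeps a reference to the file object, so on CPython it is
    # deallocated, and therefore flushed and closed, when this statement ends.
    # Other implementations (e.g. PyPy) may only close it on a later GC pass.
    open(path, "w").write(text)


def write_with(path, text):
    # Portable: the with-block closes the file on exit, even if write() raises.
    with open(path, "w") as afile:
        afile.write(text)
```

Python 3 also emits a ResourceWarning (hidden by default) when an open file is reclaimed this way, which is one reason the with form is usually preferred.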

James
 


Similar Threads
Thread Thread Starter Forum Replies Last Post
retaining save dialog box even after clicking save button hell2heaven Java 0 10-26-2008 07:18 PM
how to automatically "Save " a page after certain intervals without clicking "Save Page As..." subhadip Java 0 03-28-2007 04:15 PM
Updates in Design view not moved to Source view andspal ASP .Net Web Controls 0 11-02-2006 08:13 AM
Help with status bar - still showing load in progress even after page load Mike Dee Javascript 3 03-01-2006 08:06 PM
How to make a week view and day view calendar just like month view calendar in .NET ? Parthiv Joshi ASP .Net Web Controls 1 07-06-2004 03:15 PM


