
download robot

 
 
larryzhang
      04-13-2009
Hi,
As a Python newbie, I am trying to write a program that acts as
a download robot.

The website provides information on companies. Manually, I can search
by company name and then click the “download” button to get the data
in Excel or Word format, then save the file to a local directory.
The program should do this automatically.

I have run into several problems while writing the code:
1. The website requires a user ID and password. Is there a way to
pass my ID and password to the server from my Python code?
2. Can Python hit the “download” button automatically and choose the
file format, as I can do manually?
3. The URL of each download page is not unique (pages that point
to different data files may share the same URL), which prevents me from
using the URL as the address of a particular file.
Is there any solution for this? Does this mean I have to work directly
with the database stored on the server rather than with the webpage
displayed?

Thank you very much for any comments and suggestions.

Larry
 
 
Kushal Kumaran
      04-13-2009
On Mon, Apr 13, 2009 at 11:13 AM, larryzhang wrote:
> Hi,
> As a Python newbie, I am trying to write a program that acts as
> a download robot.
>


This might be useful: http://wwwsearch.sourceforge.net/mechanize/.
I've only skimmed the page and have not actually used it. If
you prefer, you can also use urllib2 from the standard library and do
all the work yourself. Notes for that approach are below.

> The website provides information on companies. Manually, I can search
> by company name and then click the “download” button to get the data
> in Excel or Word format, then save the file to a local directory.
> The program should do this automatically.
>
> I have run into several problems while writing the code:
> 1. The website requires a user ID and password. Is there a way to
> pass my ID and password to the server from my Python code?


See the examples in the urllib2 documentation for how to send a
username and password for Basic authentication. If the site
authenticates through an HTML login form instead, you'll need to send
the form fields (your ID and password) as data with your request.
The website might then use cookies to track your session, so your code
will need to store them and send them back.
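
A rough sketch of both cases, in case it helps. It uses urllib.request
and http.cookiejar, which are the Python 3 homes of urllib2 and
cookielib; on Python 2 the same classes live in those older modules.
The function names, the URLs, and the form field names ("username",
"password") are made up for illustration, so check the site's login
form for the real ones.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener(base_url, username, password):
    """Return (opener, cookie_jar) prepared for Basic auth and cookies."""
    # Credentials are offered for any realm under base_url.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, base_url, username, password)

    # The cookie jar stores cookies set by the server and sends them
    # back on later requests made through the same opener.
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_mgr),
        urllib.request.HTTPCookieProcessor(cookie_jar),
    )
    return opener, cookie_jar


def build_login_request(login_url, username, password):
    """Build a POST request for a form-based login.

    The field names are assumptions; read them from the real form.
    """
    form = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("utf-8")
    # Supplying data makes this a POST request.
    return urllib.request.Request(login_url, data=form)
```

For a form login you would call opener.open(build_login_request(...))
once, then reuse the same opener for every later request so the
session cookie goes back to the server.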

> 2. Can Python hit the “download” button automatically and choose the
> file format, as I can do manually?


Clicking the download button probably just sends an ordinary GET or
POST request. You'll need to be familiar with HTML forms to reproduce
that request in your code.
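
As a sketch of what "doing the form yourself" can look like: parse the
page, collect the form's action, method and input fields, override the
one that selects the file format, and build the matching request. The
sample HTML, the field names (company_id, format) and the class and
function names are all invented here; the real page will differ. This
uses html.parser and urllib.request from Python 3.

```python
from html.parser import HTMLParser
import urllib.parse
import urllib.request


class FormParser(HTMLParser):
    """Collect the first form's action, method, and named input values."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "get"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # ignore any later forms on the page


def build_form_request(page_url, form, overrides):
    """Build the request a browser would send when the form is submitted."""
    fields = dict(form.fields)
    fields.update(overrides)
    target = urllib.parse.urljoin(page_url, form.action)
    encoded = urllib.parse.urlencode(fields)
    if form.method == "post":
        return urllib.request.Request(target, data=encoded.encode("utf-8"))
    separator = "&" if "?" in target else "?"
    return urllib.request.Request(target + separator + encoded)


# Hypothetical page content, for illustration only.
SAMPLE_HTML = """
<form action="/export" method="post">
  <input type="hidden" name="company_id" value="12345">
  <input type="hidden" name="format" value="xls">
  <input type="submit" name="download" value="Download">
</form>
"""

form = FormParser()
form.feed(SAMPLE_HTML)
request = build_form_request(
    "http://example.com/search", form, {"format": "doc"}
)
```

Only input elements are handled; a real form may also use select or
textarea elements, which would need the same treatment.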

> 3. The URL of each download page is not unique (pages that point
> to different data files may share the same URL), which prevents me from
> using the URL as the address of a particular file.
> Is there any solution for this? Does this mean I have to work directly
> with the database stored on the server rather than with the webpage
> displayed?


This simply means that the identifiers for the file to download are
being passed in using means other than the URL, most likely as POST
data. Look at the HTML for the page to see how.
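
To illustrate: two requests can go to the same URL and still ask for
different files, because the identifier travels in the POST body. In
this sketch EXPORT_URL, the field names and the function names are
placeholders I made up, not anything taken from the actual site.

```python
import shutil
import urllib.parse
import urllib.request

EXPORT_URL = "http://example.com/export"  # hypothetical address


def export_request(company_id, file_format):
    """Same URL every time; the POST body says which file is wanted."""
    body = urllib.parse.urlencode(
        {"company_id": company_id, "format": file_format}
    )
    return urllib.request.Request(EXPORT_URL, data=body.encode("utf-8"))


def save_response(response, path):
    """Copy a file-like HTTP response to a local file in binary mode."""
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out)


def download(opener, request, path):
    """Send the request through an opener and save whatever comes back."""
    with opener.open(request) as response:
        save_response(response, path)
```

The opener passed to download() would be one built with
urllib.request.build_opener, ideally the same one used for logging in.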

>
> Thank you very much for any comments and suggestions.
>


Tools that let you observe the traffic between your browser and the
web server will be useful here. If you use Mozilla Firefox, the
HttpFox extension might help.


 