Velocity Reviews - Computer Hardware Reviews

how to count and extract images

 
 
Joe
 
      10-23-2005
I'm trying to get the location of the image using

start = s.find('<a href="somefile') + len('<a href="somefile')
stop = s.find('">Save File</a></B>', start)
fileName = s[start:stop]

and then construct the URL from the filename to download the image.
This works fine because every image has the Save File link, and I can
count the number of images easily. The problem is when there is more
than one image. I tried using a while loop to download the files; it
works fine for the first one but always matches the same one. How can I
count them and tell the loop to skip the first one if it has already
been downloaded and go on to the next one, and so on?
 
 
Alex Martelli
 
      10-24-2005
Joe wrote:

> I'm trying to get the location of the image using
>
> start = s.find('<a href="somefile') + len('<a href="somefile')
> stop = s.find('">Save File</a></B>', start)
> fileName = s[start:stop]
>
> and then construct the URL from the filename to download the image.
> This works fine because every image has the Save File link, and I can
> count the number of images easily. The problem is when there is more
> than one image. I tried using a while loop to download the files; it
> works fine for the first one but always matches the same one. How can I
> count them and tell the loop to skip the first one if it has already
> been downloaded and go on to the next one, and so on?


Pass the index from where the search must start as the second argument
to the s.find method -- you're already doing that for the second call,
so it should be pretty obvious it will also work for the first one, no?
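
As a rough illustration of that suggestion, here is a minimal sketch that keeps a running position and passes it to both find calls. The function name iter_filenames, the variable pos, and the sample page are made up for the example; only the two marker strings come from the original post.

```python
def iter_filenames(s, prefix='<a href="somefile', suffix='">Save File</a></B>'):
    """Yield the text between each prefix/suffix pair, scanning left to right."""
    pos = 0
    while True:
        start = s.find(prefix, pos)      # resume where the last match ended
        if start == -1:
            return                       # no more links
        start += len(prefix)
        stop = s.find(suffix, start)
        if stop == -1:
            return                       # unterminated link; give up
        yield s[start:stop]
        pos = stop + len(suffix)


# Made-up sample page with two links.
page = ('<B><a href="somefile1.jpg">Save File</a></B>'
        '<B><a href="somefile2.jpg">Save File</a></B>')
names = list(iter_filenames(page))
print(len(names), names)   # 2 ['1.jpg', '2.jpg']
```

Collecting the results in a list gives the count for free via len(), and because each search starts after the previous match, no link is returned twice.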


Alex
 
 
Mike Meyer
 
      10-24-2005
Joe writes:
> start = s.find('<a href="somefile') + len('<a href="somefile')
> stop = s.find('">Save File</a></B>', start)
> fileName = s[start:stop]
>
> and then construct the URL from the filename to download the image.
> This works fine because every image has the Save File link, and I can
> count the number of images easily. The problem is when there is more
> than one image. I tried using a while loop to download the files; it
> works fine for the first one but always matches the same one. How can I
> count them and tell the loop to skip the first one if it has already
> been downloaded and go on to the next one, and so on?


To answer your question, pass the optional start argument to find in
both invocations, and stop when find returns -1:

stop = 0
while True:
    start = s.find('<a href="somefile', stop)
    if start < 0:
        break  # no more links
    start += len('<a href="somefile')
    stop = s.find('">Save File</a></B>', start)
    if stop < 0:
        break  # link text not found
    fileName = s[start:stop]

Now, to give you some advice: don't do this by hand, use an HTML
parsing library. The code above is incredibly fragile, and will break
on any number of minor variations in the input text. Using a real
parser not only avoids all those problems, it makes your code shorter.
I like BeautifulSoup:

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x and earlier

soup = BeautifulSoup(s)
for anchor in soup.fetch('a'):
    fileName = anchor['href']

to get all the hrefs. If you only want the ones that have "Save File"
in the link text, you'd do:

soup = BeautifulSoup(s)
for link in soup.fetchText('Save File'):
    fileName = link.findParent('a')['href']
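
If installing a third-party parser is not an option, roughly the same thing can be sketched with the standard library's html.parser module. This is an illustration only: the class name SaveFileLinks and the sample page are invented for the example, and it assumes the link text is exactly "Save File".

```python
from html.parser import HTMLParser


class SaveFileLinks(HTMLParser):
    """Collect the href of every <a> whose link text is 'Save File'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            if ''.join(self._text).strip() == 'Save File':
                self.hrefs.append(self._href)
            self._href = None


# Made-up sample page; note the differing case and spacing in the last link.
page = ('<B><a href="somefile1.jpg">Save File</a></B>'
        '<a href="other.html">Home</a>'
        '<B><A HREF="somefile2.jpg" >Save File</A></B>')
parser = SaveFileLinks()
parser.feed(page)
parser.close()
print(parser.hrefs)   # ['somefile1.jpg', 'somefile2.jpg']
```

The parser lowercases tag and attribute names, so variations in case or whitespace that would defeat the string search are handled without extra code.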

<mike
--
Mike Meyer http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
 