Web get command (wget) to download all icons/pics on a web page, skipping files that are too large or too small

Discussion in 'Digital Photography' started by barb, Aug 4, 2006.

  1. barb

    barb Guest

    How do I get wget on Windows or Linux to ignore files that are too small or too large?

    Like everyone, I often use the Free Software Foundation's wget ("web get")
    command on Windows or Linux to download all the PDFs, GIFs, or JPEGs on a
    web site to my hard disk.

    The basic command we all use is:

    EXAMPLE FOR WINDOWS:
    c:\> wget -prA.gif http://machine/path

    EXAMPLE FOR LINUX:
    % wget -prA.jpg http://machine/path

    This famous wget command works great, except it downloads ALL the JPG & GIF
    icons and photos at the targeted web site - large or small.

    How do we tell wget to skip files of a certain size?

    For example, assume we wish to skip anything smaller than, say, 10KB and
    anything larger than, say, 100KB.

    Can we get wget to skip files that are too small or too large?
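
    One workaround, I suppose, is to let the recursive fetch finish and then
    prune everything outside the range afterwards. This is only a rough sketch:
    it assumes GNU find and wget's usual host-named download directory (here
    "machine"), and the limits are given in bytes via find's "c" suffix:

    % wget -prA.jpg http://machine/path
    % find machine -type f \( -size -10240c -o -size +102400c \) -print -delete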

    barb
     
    barb, Aug 4, 2006
    #1

  2. barb

    barb Guest

    Hi Marvin,

    Thank you for your help. I thought of this but I was kind of hoping that
    wget would have a "size" range option that handled this.

    Something like:

    wget -prA.pdf http://www.consumerreports.com --size<min:max>

    What I do today is sort by file size and then delete the too-large and
    too-small files by hand, but that is obviously not optimal.
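
    In the meantime, one rough pre-download filter is to ask the server for
    each file's size first. This is only a sketch: it assumes the file URLs are
    already in a list (urls.txt here is made up) and that the server reports a
    Content-Length header, which not all of them do:

    while read -r url; do
        # --spider fetches the response headers without downloading the body
        size=$(wget --spider --server-response "$url" 2>&1 | tr -d '\r' |
               awk 'tolower($1) == "content-length:" { print $2; exit }')
        if [ -n "$size" ] && [ "$size" -ge 10240 ] && [ "$size" -le 102400 ]; then
            wget "$url"
        fi
    done < urls.txt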

    barb
     
    barb, Aug 4, 2006
    #2

  3. barb

    barb Guest

    Hi Dances with Crows,

    Thank you for your kind help. As you surmised, I do not have the skill set
    to "hack" the venerable wget command so that it downloads only files within
    a certain size range.

    I had also read the manpage and searched beforehand, but I did not see that
    anyone had done this yet. I am kind of surprised, since it seems like such
    a basic thing to want.

    For example, let's say we went to a free icon site that is updated
    periodically with tiny web-page bitmaps, mid-sized icons usable for
    PowerPoint slides, and too-big images suitable for photo sessions.

    Let's say you had a scheduled wget visit that site daily and automatically
    download all the icons from that web page, but not the large ones or the
    really, really small ones. Let's say there were thousands of them. Of
    course, FTP would be a pain, and you likely wouldn't even have FTP access
    anyway. Downloading them manually isn't in the cards.

    What I'd want to schedule is:
    wget -prA.gif,jpg,bmp http://that/freeware/icon/web/page --size:<low:high>
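
    Until an option like that exists, the daily job could probably be faked
    with a small wrapper script run from cron: do the recursive fetch, then
    prune by size. A rough sketch (the script name, download directory, and URL
    are made up; it assumes GNU find):

    #!/bin/sh
    # fetch-icons.sh - hypothetical daily job: mirror the icons, then prune by size
    cd /home/barb/icons || exit 1
    wget -q -prA.gif,jpg,bmp http://that/freeware/icon/web/page
    # keep only files between 10 KB and 100 KB (sizes are in bytes)
    find . -type f \( -size -10240c -o -size +102400c \) -delete

    # crontab entry to run it every day at 3am:
    # 0 3 * * * /home/barb/fetch-icons.sh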

    barb
     
    barb, Aug 4, 2006
    #3
  4. barb

    barb Guest

    Hi poddys,

    Thank you very much for asking the right questions. Let's say I went to
    http://www.freeimages.co.uk or http://www.bigfoto.com or
    http://www.freefoto.com/index.jsp or any of a zillion sites that supply
    royalty-free images, GIFs, bitmaps, PDFs, HTML files, etc.

    Why wouldn't I want to use wget to obtain all the images, PDFs, Word
    documents, PowerPoint templates, whatever ... that such a site offers?

    Even for sites I PAY for, such as Consumer Reports and technical data
    sites ... why wouldn't I want to just use wget to download every single PDF
    or Microsoft Office document or graphic at that web site?

    There's no copyright infringement in that, is there?

    I can do all of that today with wget. The only problem is that the really
    large (too large) files get downloaded too, and the really small (too
    small) files just end up as useless clutter.

    barb
     
    barb, Aug 4, 2006
    #4
  5. barb

    barb Guest

    Hi Dances with crows,

    I don't know what I want to do with the images or PDFs or PowerPoint
    templates. For example, I recently found a page of royalty-free PowerPoint
    calendar templates. The web page had scores and scores of them.

    Nobody in their right mind is going to click link by link when they can run
    a simple wget command and get them all in one fell swoop (are they?)

    wget -prA.ppt http://that/web/page

    My older brother pointed me to one of his Yahoo web pages, which contained
    hundreds of photos. I picked them all up in seconds using:
    wget -prA.jpg http://that/web/page

    I wouldn't THINK of downloading a hundred photos manually (would you?).

    Do people REALLY download documents MANUALLY nowadays? Oh my. They're crazy
    in my opinion (although I did write and file this letter manually myself
    :p)

    barb
     
    barb, Aug 4, 2006
    #5
  6. barb

    barb Guest

    Hi Ben Dover,
    Thank you very much for your kind advice.

    I am not a programmer, but I guess it could look like this (in DOS)?

    REM wget.bat - fetch everything, then delete files under 10 KB or over 100 KB
    REM (call wget.exe explicitly so this batch file does not call itself)
    wget.exe -prA.ppt,jpg,doc,pdf,gif http://some/web/page
    for /R %%F in (*) do (
        if %%~zF LSS 10240 del "%%F"
        if %%~zF GTR 102400 del "%%F"
    )

    And, in Linux, maybe something like this (pieced together from things I
    found on the web):

    # wget.sh - fetch everything, then delete files under 10 KB or over 100 KB
    wget -prA.ppt,jpg,doc,pdf,gif http://some/web/page
    find . -type f | while read -r file; do
        size=$(wc -c < "$file")
        if [ "$size" -lt 10240 ] || [ "$size" -gt 102400 ]; then
            rm "$file"
        fi
    done

    Is this a good start? (And which newsgroup could we ask?)
    barb
     
    barb, Aug 4, 2006
    #6
