----- Original Message -----
From: "Andy Dingley" <>
Newsgroups: alt.html
Sent: Saturday, April 23, 2005 11:49 AM
Subject: Re: download blocking
> On Sat, 23 Apr 2005 07:57:55 GMT, (Helmut Blass)
> wrote:
>>I have written a VB program which automatically downloads web pages
>>linked to RSS feeds. Unfortunately there are some sites which cannot be
>>downloaded by a program but only viewed online.
>
> We can guess, but if you tell us the URLs then we can look at the actual
> examples. Also tell us why you can't download them - do you get
> anything, the wrong thing, or just a 404 ?
>
> My two guesses:
>
> First, it may be related to the HTTP user-agent string that you're
> sending. The site only accepts browsers that it recognises. This is stupid
> behaviour on the part of the site, so stupid that I don't think this is
> likely. You should be able to work around it easily by impersonating IE.
>
> Secondly (and more likely) you're probably using the MSXML component
> within your VB program. This uses XML, and RSS 0.9* isn't an XML
> protocol. It looks a lot like XML, but most feeds are either not valid
> RSS, or not even well-formed XML. For a "production grade" RSS reader
> you can't rely on all feeds being well-formed XML, all the time.
>
>
> And I don't know what "lostinspace"'s problem is, but he's a clueless
> muppet if he doesn't realise what RSS is about.
>
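The well-formedness point in the quoted text can be tested before a feed is handed to a strict XML parser. This is only a minimal sketch in Python rather than VB, using the standard library parser instead of MSXML; the function name and the two sample feeds are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(feed_text: str) -> bool:
    """Return True if feed_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(feed_text)
    except ET.ParseError:
        # A strict parser rejects the whole document on the first error.
        return False
    return True


# Hypothetical sample feeds: the second has a bare "&", a common feed defect.
good_feed = "<rss version='0.91'><channel><title>News &amp; views</title></channel></rss>"
bad_feed = "<rss version='0.91'><channel><title>News & views</title></channel></rss>"

print(is_well_formed(good_feed))  # True
print(is_well_formed(bad_feed))   # False
```

A reader that has to cope with feeds like the second one would need some more forgiving fallback when this check fails.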
http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html
http://blogs.law.harvard.edu/tech/rss#whatIsRss
http://www.webreference.com/authorin...xml/rss/intro/
As a webmaster with unique and copyrighted content (which exists NOWHERE
else), am I supposed to allow crawling of my sites under the pretense of
offline use while the material is harvested to be sold to third parties,
presented to third parties outside my websites, or interpreted for any
other third-party benefit?
Hogwash.
If viable organizations desire my content, then let them approach me with
compensation and/or permission for the sweat of my brow; otherwise let them
eat 403s.
My sites are unique in these types of materials; however, so are many others.
Few issues regarding website traffic and visitors are cut and dried or
black and white.
Each webmaster must make their own decisions on what is beneficial and
detrimental to their websites and base their websites' actions on what they
desire.
One example would be "Helmut", who would never get into my sites from a DE
IP range or a DE referral search.
Of course he may fake his IP for limited access. That's not the same as a
full scrape.
WHY?
There is no possible way for DE visitors or traffic to enhance or benefit
my websites. They only draw on resources and materials, which I have little
time to spend monitoring for plagiarism.
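The kind of IP-range and referral blocking described above could be sketched roughly as follows. This is an illustration in Python, not how any particular server is configured: the function name is made up, the blocked ranges are reserved documentation addresses standing in for real allocations, and treating any referring host ending in ".de" as a DE referral is an assumption.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical blocked ranges: RFC 5737 documentation addresses,
# used here as placeholders, not real DE allocations.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]

# Assumption: a referring host ending in ".de" counts as a DE referral.
BLOCKED_REFERRER_SUFFIX = ".de"


def status_for(client_ip: str, referrer: str = "") -> int:
    """Return the HTTP status to send: 403 if blocked, otherwise 200."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return 403
    # An empty or malformed referrer yields no hostname.
    host = urlparse(referrer).hostname or ""
    if host.endswith(BLOCKED_REFERRER_SUFFIX):
        return 403
    return 200


print(status_for("192.0.2.10"))                                       # 403
print(status_for("198.51.100.7", "http://www.google.de/search?q=x"))  # 403
print(status_for("198.51.100.7", "http://www.example.com/"))          # 200
```

A visitor who fakes his IP and sends no referrer passes both checks, which matches the limited access mentioned above.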