download blocking

 
 
Helmut Blass
Guest
Posts: n/a
 
      04-23-2005
hi,
I have written a VB program that automatically downloads the web pages
linked from RSS feeds. Unfortunately, some sites cannot be downloaded by
the program and can only be viewed online.
I guess there must be some HTML or JavaScript trick that blocks the
download process.
Does anybody know how this dirty trick works?

Thanks for your help, Helmut

--
The state is the great fiction by which everyone endeavours to live at the
expense of everyone else.

Frédéric Bastiat
 
 
 
 
 
lostinspace
Guest
Posts: n/a
 
      04-23-2005
----- Original Message -----
From: "Helmut Blass" <>
Newsgroups: alt.html
Sent: Saturday, April 23, 2005 3:57 AM
Subject: download blocking


>hi,
>I have written a VB program that automatically downloads the web pages
>linked from RSS feeds. Unfortunately, some sites cannot be downloaded by
>the program and can only be viewed online.
>I guess there must be some HTML or JavaScript trick that blocks the
>download process.
>Does anybody know how this dirty trick works?
>
>Thanks for your help, Helmut
>
>--
>The state is the great fiction by which everyone endeavours to live at the
>expense of everyone else.
>
>Frédéric Bastiat



Please help me understand this.
You created software that crawls and scrapes websites, thereby needlessly
using those websites' bandwidth for your own purposes?

Perhaps even violating their UAGs and TOS.

And then you want other webmasters to advise you on how to circumvent (hack)
their prevention tactics?

****-OFF!


 
 
 
 
 
Travis Newbury
Guest
Posts: n/a
 
      04-23-2005
lostinspace wrote:

>>I have written a VB program that automatically downloads the web pages
>>linked from RSS feeds. Unfortunately, some sites cannot be downloaded by
>>the program and can only be viewed online.
>>I guess there must be some HTML or JavaScript trick that blocks the
>>download process.
>>Does anybody know how this dirty trick works?


No dirty tricks, just some bad vb code on your part. If you can see it
in a browser, you can grab it with VB and inet, and save it to a file.

> Please help me understand this.
> You created software that crawls and scrapes websites, thereby needlessly
> using those websites' bandwidth for your own purposes?


Or more innocently, they want to read it off line later.

> ****-OFF!


Better to be ****ed off, than ****ed on....

--
-=tn=-
 
 
Helmut Blass
Guest
Posts: n/a
 
      04-23-2005
"lostinspace" <> wrote:

>Please help me understand this.
>You created software that crawls and scrapes websites, thereby needlessly
>using those websites' bandwidth for your own purposes?


Every web surfer uses bandwidth for his own purposes. My program just does
automatically what you are doing manually. Is there much difference?

Helmut

--
The state is the great fiction by which everyone endeavours to live at the
expense of everyone else.

Frédéric Bastiat
 
 
Helmut Blass
Guest
Posts: n/a
 
      04-23-2005
In article <3mqae.5338$>, Travis Newbury <> wrote:

>No dirty tricks, just some bad vb code on your part. If you can see it
>in a browser, you can grab it with VB and inet, and save it to a file.


In most cases it works. However, in a few cases I can't grab it with VB and
inet, so there must be some tricky mechanism...

Helmut
 
 
lostinspace
Guest
Posts: n/a
 
      04-23-2005
----- Original Message -----
From: "Helmut Blass" <>
Newsgroups: alt.html
Sent: Saturday, April 23, 2005 8:25 AM
Subject: Re: download blocking


"lostinspace" <> wrote:

>>Please help me understand this.
>>You created software that crawls and scrapes websites, thereby
>>needlessly using those websites' bandwidth for your own purposes?
>
>Every web surfer uses bandwidth for his own purposes. My program just does
>automatically what you are doing manually. Is there much difference?
>
>Helmut


Most assuredly there is a difference, and if you are incapable of realizing
the difference, you're no different than a thief in the night!

The majority of websites were neither created nor intended with this type of
delivery and presentation in mind.
That's why, before scraping/downloading, you might try reading the website's
UAG/TOS, and your own internet provider's as well.



 
 
lostinspace
Guest
Posts: n/a
 
      04-23-2005
----- Original Message -----
From: "Travis Newbury" <>
Newsgroups: alt.html
Sent: Saturday, April 23, 2005 7:31 AM
Subject: Re: download blocking


> lostinspace wrote:
>
>>>I have written a VB program that automatically downloads the web pages
>>>linked from RSS feeds. Unfortunately, some sites cannot be downloaded by
>>>the program and can only be viewed online.
>>>I guess there must be some HTML or JavaScript trick that blocks the
>>>download process.
>>>Does anybody know how this dirty trick works?
>
> No dirty tricks, just some bad vb code on your part. If you can see it in
> a browser, you can grab it with VB and inet, and save it to a file.
>
>> Please help me understand this.
>> You created software that crawls and scrapes websites, thereby
>> needlessly using those websites' bandwidth for your own purposes?
>
> Or more innocently, they want to read it off line later.
>
>> ****-OFF!
>
> Better to be ****ed off, than ****ed on....
>
> --
> -=tn=-


"> Or more innocently, they want to read it off line later."

That is a violation of my sites' TOS and will get you (as well as innocents
in the same IP range as your provider) denied access in the future.


 
 
Oli Filth
Guest
Posts: n/a
 
      04-23-2005
Helmut Blass wrote:
> hi,
> I have written a VB program that automatically downloads the web pages
> linked from RSS feeds. Unfortunately, some sites cannot be downloaded by
> the program and can only be viewed online.
> I guess there must be some HTML or JavaScript trick that blocks the
> download process.
> Does anybody know how this dirty trick works?
>


What are you sending as your User-Agent HTTP header? If you "fake" this
by setting it to that of a standard browser, it might help, as the
server of the site will just assume you're a browser.

(P.S. This is a complete guess, but give it a go.)
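
A minimal sketch of that suggestion, written in Python rather than VB for brevity. The helper names and the Internet Explorer 6 style User-Agent string are assumptions chosen for illustration; nothing in the thread says which header value the blocking sites expect, so this may or may not change the outcome.

```python
import urllib.request

# Illustrative value only: an Internet Explorer 6 style identifier.
# The thread does not say which User-Agent the sites actually accept.
BROWSER_USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Create a GET request that announces itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_USER_AGENT, timeout=10.0):
    """Download the raw bytes of a page using the chosen User-Agent."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `fetch()` once with the default value and once with the program's usual identifier, then comparing the results, would show whether the header is what the server reacts to.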


--
Oli
 
 
Andy Dingley
Guest
Posts: n/a
 
      04-23-2005
On Sat, 23 Apr 2005 07:57:55 GMT, (Helmut Blass)
wrote:
>I have written a VB program that automatically downloads the web pages
>linked from RSS feeds. Unfortunately, some sites cannot be downloaded by
>the program and can only be viewed online.


We can guess, but if you tell us the URLs then we can look at the actual
examples. Also tell us why you can't download them - do you get
anything, the wrong thing, or just a 404?
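
One way to answer that question is to record the HTTP status and body size for each failing URL. The following is a rough sketch in Python, not the original VB program; `describe_response` and `probe` are hypothetical helper names, and the wording of the diagnoses is only a suggestion.

```python
import urllib.error
import urllib.request


def describe_response(status, body):
    """Turn a status code and response body into a one-line diagnosis."""
    if status == 404:
        return "404: the URL does not exist on the server"
    if status in (401, 403):
        return f"{status}: the server refused this client"
    if 200 <= status < 300:
        if not body.strip():
            return f"{status}: success, but the body is empty"
        return f"{status}: success, {len(body)} bytes received"
    return f"{status}: unexpected status"


def probe(url, timeout=10.0):
    """Fetch url and report what came back instead of raising on HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return describe_response(response.status, response.read())
    except urllib.error.HTTPError as err:
        # 4xx/5xx replies arrive as exceptions but still carry a status and body.
        return describe_response(err.code, err.read())
    except urllib.error.URLError as err:
        return f"no HTTP response at all: {err.reason}"
```

A 403 would point towards deliberate blocking, while a normal 200 whose content differs from what the browser shows would suggest the page depends on something the program does not send or execute.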

My two guesses:

It's related to the HTTP user-agent string that you're sending. The site
only accepts browsers that it recognises. This is stupid behaviour on
the part of the site, so stupid that I don't think this is likely. You
should be able to work around it easily by impersonating IE.

Secondly (and more likely) you're probably using the MSXML component
within your VB program. This uses XML and RSS 0.9* isn't an XML
protocol. It looks a lot like XML, but most feeds are either not valid
RSS, or not even well-formed XML. For a "production grade" RSS reader
you can't rely on all feeds being well-formed XML, all the time.
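
To illustrate the well-formedness point, here is a small Python sketch (standing in for MSXML, which the thread only guesses is in use) that checks a feed before handing it to a strict XML parser. The function name and the two sample feeds are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(feed):
    """Return True if feed (str or bytes) parses as well-formed XML."""
    try:
        ET.fromstring(feed)
    except ET.ParseError:
        return False
    return True


# Made-up samples: the second contains a bare ampersand, a common feed error.
GOOD_FEED = '<rss version="0.91"><channel><title>News</title></channel></rss>'
BAD_FEED = '<rss version="0.91"><channel><title>News & Views</title></channel></rss>'

if __name__ == "__main__":
    for name, feed in (("good", GOOD_FEED), ("bad", BAD_FEED)):
        print(name, "well-formed" if is_well_formed(feed) else "NOT well-formed")
```

A reader that has to cope with such feeds needs a fallback for the failing case, for example a more tolerant parser; which fallback is appropriate is left open here.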


And I don't know what "lostinspace"'s problem is, but he's a clueless
muppet if he doesn't realise what RSS is about.

 
 
lostinspace
Guest
Posts: n/a
 
      04-23-2005
----- Original Message -----
From: "Andy Dingley" <>
Newsgroups: alt.html
Sent: Saturday, April 23, 2005 11:49 AM
Subject: Re: download blocking


> On Sat, 23 Apr 2005 07:57:55 GMT, (Helmut Blass)
> wrote:
>>I have written a VB program that automatically downloads the web pages
>>linked from RSS feeds. Unfortunately, some sites cannot be downloaded by
>>the program and can only be viewed online.
>
> We can guess, but if you tell us the URLs then we can look at the actual
> examples. Also tell us why you can't download them - do you get
> anything, the wrong thing, or just a 404?
>
> My two guesses:
>
> It's related to the HTTP user-agent string that you're sending. The site
> only accepts browsers that it recognises. This is stupid behaviour on
> the part of the site, so stupid that I don't think this is likely. You
> should be able to work around it easily by impersonating IE.
>
> Secondly (and more likely) you're probably using the MSXML component
> within your VB program. This uses XML and RSS 0.9* isn't an XML
> protocol. It looks a lot like XML, but most feeds are either not valid
> RSS, or not even well-formed XML. For a "production grade" RSS reader
> you can't rely on all feeds being well-formed XML, all the time.
>
>
> And I don't know what "lostinspace"'s problem is, but he's a clueless
> muppet if he doesn't realise what RSS is about.
>


http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html
http://blogs.law.harvard.edu/tech/rss#whatIsRss
http://www.webreference.com/authorin...xml/rss/intro/

As a webmaster with unique, copyrighted content (which exists NOWHERE
else), I should allow crawling of my sites under the pretense of offline
use, while the material is harvested to be sold to third parties, presented
to third parties outside my websites, or interpreted for some other third
party's benefit?

Hogwash.

If viable orgs desire my content, then let them approach me for permission
and with compensation for the sweat of my brow; otherwise let them eat 403s.

My sites are unique in these types of materials; however, so are many others.
Few issues regarding traffic and visitors as related to websites are cut and
dried or black and white.
Each webmaster must make their own decisions on what is beneficial and
detrimental to their websites and base their websites' actions on what they
desire.

One example would be "Helmut", who would never get into my sites from a DE
IP range or a DE referral search.
Of course, he may fake his IP for limited access. That's not the same as a
full scrape.
WHY?
There is no possible way for a DE visitor or DE traffic to enhance or benefit
my websites. They only draw resources and materials, which I have little
time to spend monitoring for plagiarism.


 