screen scraping

 
 
Roland Hall
      03-26-2005
Am I correct in assuming screen scraping is just the response text sent to
the browser? If so, would that mean that this could not be screen scraped?

function moi() {
  var tag = '<a href=';
  var tagType1 = '"mail'+'to:', tagType2 = '">', tagType3 = '<\/a>';
  var user1 = 'web', user2 = 'master', user3 = '@';
  var dom1 = 'danger', dom2 = 'ous', dom3 = 'ly';
  var tld = '.us';
  document.write(tag+tagType1+user1+user2+user3+dom1+dom2+dom3+tld+tagType2+user1+user2+user3+dom1+dom2+dom3+tld+tagType3);
}

--
Roland Hall
/* This information is distributed in the hope that it will be useful, but
without any warranty; without even the implied warranty of merchantability
or fitness for a particular purpose. */
Technet Script Center - http://www.microsoft.com/technet/scriptcenter/
WSH 5.6 Documentation - http://msdn.microsoft.com/downloads/list/webdev.asp
MSDN Library - http://msdn.microsoft.com/library/default.asp


 
Mark Schupp
      03-28-2005
Screen scraping is a technique, not a format. The technique is to intercept
the raw data (in this case HTML) that would normally be displayed on the
client system screen and extract data from it. In an ASP context, screen
scraping would typically be done by having a server-side component (such as
xmlhttprequest) perform a GET or POST to a URL and return the raw HTML as
text. Then a parser of some kind is used to extract the desired information.
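
A minimal sketch of the parsing step described above, assuming the raw HTML has already been fetched as text. The helper name `extractMailtoAddresses` and the sample markup are illustrative assumptions, not anything from the thread, and a real scraper would more likely use a proper HTML parser than a regular expression.

```javascript
// Hypothetical helper: pull e-mail addresses out of mailto: links in raw HTML text.
function extractMailtoAddresses(html) {
  // Matches href="mailto:user@host" with single or double quotes,
  // stopping before any ?subject=... query part.
  const pattern = /href\s*=\s*["']mailto:([^"'?\s]+)/gi;
  const found = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    found.push(match[1]);
  }
  return found;
}

// Sample response text standing in for what a GET to some URL might return.
const sampleHtml =
  '<p>Contact: <a href="mailto:someone@example.com">someone@example.com</a></p>';
console.log(extractMailtoAddresses(sampleHtml));
```

A scraper like this only sees addresses that appear literally in the response text, which is why splitting the address across script variables hides it from this kind of pass.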

The example you present would be difficult (though not impossible) to
screen-scrape server-side. The parser would have to be able to evaluate the
output of the JavaScript function to get the data. I have seen references to
using the HTML browser component (MSHTML object) to do things like this but
I don't think it works well server-side.
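
As a rough illustration of "difficult though not impossible", here is a sketch of how a scraper could evaluate the script instead of just reading it. It assumes Node.js and its built-in `vm` module rather than MSHTML, and the helper name `captureDocumentWrites` is hypothetical. The idea is to run the page's script against a stand-in `document` object whose `write` method records what would have been rendered.

```javascript
const vm = require("vm");

// The obfuscated function from the original post, as it would appear in the response text.
const scriptSource = `
function moi() {
  var tag = '<a href=';
  var tagType1 = '"mail'+'to:', tagType2 = '">', tagType3 = '<\\/a>';
  var user1 = 'web', user2 = 'master', user3 = '@';
  var dom1 = 'danger', dom2 = 'ous', dom3 = 'ly';
  var tld = '.us';
  document.write(tag+tagType1+user1+user2+user3+dom1+dom2+dom3+tld+tagType2+user1+user2+user3+dom1+dom2+dom3+tld+tagType3);
}
`;

// Hypothetical helper: run a script with a fake `document` that records writes.
function captureDocumentWrites(source, call) {
  const written = [];
  const sandbox = { document: { write: (text) => written.push(String(text)) } };
  // Caution: vm is not a security boundary; this is only a sketch.
  vm.runInNewContext(source + "\n" + call, sandbox, { timeout: 1000 });
  return written.join("");
}

const rendered = captureDocumentWrites(scriptSource, "moi();");
// Once the script has run, the address is ordinary text again.
const addresses = [...rendered.matchAll(/mailto:([^"'\s]+)/g)].map((m) => m[1]);
console.log(rendered);
console.log(addresses);
```

The extra cost for the scraper is executing every script it encounters, which is the effort-versus-return question raised later in the thread.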

--
Mark Schupp
Head of Development
Integrity eLearning
www.ielearning.com





 
larrybud2002@yahoo.com
      03-29-2005

Roland Hall wrote:
> Am I correct in assuming screen scraping is just the response text sent to
> the browser? If so, would that mean that this could not be screen scraped?
>
> function moi() {
> var tag = '<a href=';
> var tagType1 = '"mail'+'to:', tagType2 = '">', tagType3 = '<\/a>';
> var user1 = 'web', user2 = 'master', user3 = '@';
> var dom1 = 'danger', dom2 = 'ous', dom3 = 'ly';
> var tld = '.us';
> document.write(tag+tagType1+user1+user2+user3+dom1+dom2+dom3+tld+tagType2+user1+user2+user3+dom1+dom2+dom3+tld+tagType3);
> }


Anything can be scraped. If you want to hide an email address, put a
form up and send the email server side so that the email address can
never be retrieved over HTML.
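
A minimal sketch of that approach, assuming a Node.js server rather than classic ASP. The recipient lives only in server-side configuration (a hypothetical `CONTACT_RECIPIENT` constant), and the page sent to the browser contains just a form. `buildOutgoingMessage` and the field names are made up for illustration, and actual delivery through whatever mail component the server has is left out.

```javascript
// Assumption: the recipient is kept in server-side config and never written into any page.
const CONTACT_RECIPIENT = "contact@example.com";

// The form sent to the browser: no address appears anywhere in the markup.
const contactFormHtml = [
  '<form method="post" action="/contact">',
  '  <input name="from" type="email">',
  '  <textarea name="body"></textarea>',
  '  <button type="submit">Send</button>',
  "</form>",
].join("\n");

// Hypothetical helper: turn posted form fields into a message for the server's mailer.
function buildOutgoingMessage(fields) {
  // Strip CR/LF from the single-line field so it cannot inject extra mail headers.
  const replyTo = String(fields.from || "").replace(/[\r\n]+/g, " ").trim();
  return {
    to: CONTACT_RECIPIENT, // filled in on the server only
    replyTo,
    subject: "Website contact form",
    text: String(fields.body || ""),
  };
}

console.log(buildOutgoingMessage({ from: "visitor@example.net", body: "Hello" }));
```

Because the address is attached after the form is posted, there is nothing in the response text or in any client-side script for a scraper to find.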

 
Roland Hall
      03-29-2005
<(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
:
: Anything can be scraped. If you want to hide an email address, put a
: form up and send the email server side so that the email address can
: never be retrieved over HTML.

Hi Larry...

Thanks for responding...

I understand a form is best, but I was looking for a way to defeat the
JavaScript. Surely a spammer is not going to capture all scripts and
process them in hopes of finding a single email address. The goal of a
spammer is to be lazy and get as much as possible with as little effort as
possible. There is no benefit to processing every script they spider with
no guarantee of finding an email address encoded in it somewhere. I see
the benefit of finding one in plain sight, since 99.99% of them will be that
way.

I also shouldn't have said "screen" scraped, as it's not really the screen
memory that's being queried but rather the response text. JavaScript
doesn't show its results, except to the browser. I have not seen a way to
grab those results, although I can think of some possibilities, which appear
to be a lot of effort. I just don't see the ROI but would welcome any info
on how it is accomplished.

--
Roland Hall


 
Roland Hall
      03-29-2005
"Mark Schupp" wrote in message news:(E-Mail Removed)...
: Screen scraping is a technique, not a format.

Hi Mark...

Thanks for responding. I didn't realize I said it was a format, and I should
have said HTML scraping, since it's not really screen scraping like it would
be on a terminal.

: The technique is to intercept
: the raw data (in this case HTML) that would normally be displayed on the
: client system screen and extract data from it. In an ASP context, screen
: scraping would typically be done by having a server-side component (such as
: xmlhttprequest) perform a GET or POST to a URL and return the raw HTML as
: text. Then a parser of some kind is used to extract the desired information.

Yes, I'm familiar with that process.

: The example you present would be difficult (though not impossible) to
: screen-scrape server-side. The parser would have to be able to evaluate the
: output of the JavaScript function to get the data. I have seen references to
: using the HTML browser component (MSHTML object) to do things like this but
: I don't think it works well server-side.

I have not been able to do it either. I think it may require HTML scraping
the site and then "screen" scraping my page, that is, printing it to a text
file and then reloading and parsing that, or capturing it from my screen
memory, the former being the easier of the two. This would require the
result to look like (E-Mail Removed) instead of user at domain dot com. I think
I'll test the first, since so many suggest using encoded JavaScript to hide
from spammers.

--
Roland Hall


 