
JavaScript web scraping test cases?

 
 
John J. Lee
 
      08-20-2003
I've put together a Python package for scraping / testing pages that
depend on embedded JavaScript code (without depending on IE, Mozilla
or Konqueror, and with the DOM etc. all implemented in pure Python --
mostly a hacked 4DOM, with some bits from pxdom; the JavaScript
interpreter I'm using ATM is spidermonkey). It's still missing a lot
and is pre-alpha, but it works, just barely.

Anyway, the point of this post is that I'm looking for pages to test
it on, so if you have a page that you'd like scraped (one that uses
JavaScript in some non-trivial way, of course! -- for dynamically
modifying forms, setting cookies, or whatever), mail me the details:
better that than some randomly-selected site from the Internet.
Obviously, it should be something that doesn't violate any terms &
conditions of use or otherwise cause people trouble, and preferably
that doesn't require any signup.


[In fact, TBH, my completely ad-hoc methodology with this is to write
some web scraping code, discover that the JavaScript breaks things,
often by depending on some nonstandard DOM feature, hack the DOM a
bit, etc. Hopefully I'll reach a point in understanding where I can
rewrite the DOM from scratch ('scratch' here being 4DOM), properly, to
match some approximation of 'HTML DOM as deployed'...]


John
 
 
John J. Lee
 
      08-22-2003
(E-Mail Removed) (John J. Lee) writes:
[...]
> Anyway, the point of this post is that I'm looking for pages to test
> it on, so if you have a page that you'd like scraped (one that uses
> JavaScript in some non-trivial way, of course! -- for dynamically
> modifying forms, setting cookies, or whatever), mail me the details:
> better that than some randomly-selected site from the Internet.
> Obviously, it should be something that doesn't violate any terms &
> conditions of use or otherwise cause people trouble, and preferably
> that doesn't require any signup.

[...]

Nobody?

I'll get my coat.


John
 
 
Skip Montanaro
 
      08-23-2003

>> Anyway, the point of this post is that I'm looking for pages to test
>> it on, so if you have a page that you'd like scraped (one that uses
>> JavaScript in some non-trivial way, of course! ...


John> Nobody?

Sorry, I couldn't think of anything off the top of my head. In my own pages
I've only ever used JS in trivial ways. Aside from a calendar on the Mojam
search results pages, I don't think JS is used on our sites at all. Still,
you're welcome to try it out on something like

http://www.mojam.com/concerts/search...lue=greg+brown

Skip

 
 
John J Lee
 
      08-23-2003
On Fri, 22 Aug 2003, Skip Montanaro wrote:

>
> >> Anyway, the point of this post is that I'm looking for pages to test
> >> it on, so if you have a page that you'd like scraped (one that uses
> >> JavaScript in some non-trivial way, of course! ...

>
> John> Nobody?
>
> Sorry, I couldn't think of anything off the top of my head. In my own pages

[...]

Oh, I'm sure I'll have no trouble finding test cases -- I just thought
that, rather than some random sites that are of no use to anyone, there is
bound to be somebody out there who actually wanted to scrape a particular
page in the past, and had not bothered previously thanks to the
inconvenience of having to read & reproduce the effect of the JS code
(particularly code that messes about with forms). It would be nice to be
doing something useful at the same time as writing tests!

Of course, I already have those sites that gave rise to the 'itch' to do
this in the first place, but I'm sure there's lots of the browser object
model that they don't exercise...


John

 
 
Cousin Stanley
 
      08-25-2003
John ...

I'm not sure what types of applications
you're looking for, but I have some JavaScript plots
that might be interesting to test ...

http://fastq.com/~sckitching/JS/Circle_MH.htm

http://fastq.com/~sckitching/JS/DD_Circles.htm

http://fastq.com/~sckitching/JS/Parabola.htm

--
Cousin Stanley
Human Being
Phoenix, Arizona


 
 
John J. Lee
 
      08-25-2003
"Cousin Stanley" <(E-Mail Removed)> writes:

> I'm not sure what types of applications
> you're looking for,


The kind that people actually want to use <wink>.

As I said, there's no problem finding test cases, I just thought that
while I was about this, somebody might happen to be reading who was
actually trying to scrape a JS page.


> but I have some JavaScript plots
> that might be interesting to test ...
>
> http://fastq.com/~sckitching/JS/Circle_MH.htm

[...]

Konqueror 3.1 didn't show anything, Mozilla 1.4 printed some pretty
circles, then froze!


John
 
 
Cousin Stanley
 
      08-25-2003
John ...

Although it's been a while since I tested these scripts
I thought I remembered testing successfully in both
Mozilla 0.95 and IE 5.1 at the time ...

I tested this morning using Moz 1.3.1 and 2 out of 3 failed,
but all 3 worked in IE 6 ...

The JS used in these scripts, although a bit hackish,
doesn't use any particular IE magic ...

I zipped up all 3 scripts for convenience,
if you want to look at the sources ...

http://fastq.com/~sckitching/JS/JS_Plots.zip

Differences in JS/DOM implementations from browser to browser
hurt my head and seem to be an endless source of problems
for web developers ...

--
Cousin Stanley
Human Being
Phoenix, Arizona


 