Trolling a site for data

 
 
Steffan A. Cline
 
      11-09-2009
I am trying to find a way to troll/poll/scrape a site for data.
Unfortunately the site uses AJAX with ASP.NET, which I have never worked with
before. If the site used PHP, Lasso or the like, it would be easy to grab
the URL and query the data directly. I have done similar things before where
the pages use a plain form and then paginate through the results (1-100,
101-200, etc.).

The example would be this site :
http://www.tblaw.com/FsSales/PendingSales.aspx

I can simply fetch the URL with curl or something similar, but that only
returns the first 60 records. No matter what I do, I can't work out how to
get more than those 60.

On another site, for example, I hit the page first to get the cookie and the
event and action values, so that I can keep posting them back with the page
parameters to get the next page and then parse the results.

Sorry if I am not explaining this very well.

Any suggestions?

Thanks,
Steffan

 
 
 
 
 
Alexey Smirnov
 
      11-09-2009


You have to learn how AJAX works. Usually it is a script (JavaScript, or
sometimes VBScript) that requests some data from the server and renders the
returned data into the page. That means you need to find out how it is
implemented in each particular case and read the output of the script/page
that returns the data.
 
 
 
 
 
Steffan A. Cline
 
      11-09-2009


Right. I get that. The problem is that ASP.NET does an outstanding job of
obfuscating this. With a normal JS-based AJAX query, you can easily see the
URL and parameters being sent. The deal is that ASP.NET sends waaaay more data.

I was hoping someone could help figure out exactly how ASP.NET is doing it.
I tried parsing the headers and had no luck.

Thanks,
Steffan

 
 
bruce barker
 
      11-09-2009
one area where asp.net is different is its postback model. there are
hidden fields __EVENTTARGET and __EVENTARGUMENT that contain info on the
control that caused the postback. __VIEWSTATE contains state information.
before you can do a form post to an asp.net server, you must do a get to
obtain a valid viewstate.

in your case you need to do a get, to get the page one data and a viewstate.
then a form post (filling in __EVENTTARGET) to get page two and the
viewstate for page 3.
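The get-then-post sequence described above can be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not the site's actual requirements: the helper names (`hidden_fields`, `postback_body`, `fetch`) are made up for this example, cookies are not handled, and a real page may need additional form fields in the post.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return the hidden form fields (e.g. __VIEWSTATE) found in a page."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def postback_body(html, event_target, event_argument=""):
    """Build the urlencoded post body from the previous response's fields."""
    fields = hidden_fields(html)
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields).encode("ascii")


def fetch(url, body=None):
    """Do a get when body is None, otherwise post the form body."""
    request = Request(url, data=body)
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical usage (the control id is a placeholder, not taken from the site):
# page1 = fetch(url)
# page2 = fetch(url, postback_body(page1, "SomePager$ctl01"))
```

Each response carries its own hidden fields, so the body for the next post is built from the most recent response rather than from the first one.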

if the site uses an update panel, then it's just a little trickier. the
update panel posts all the form data (there will be hidden fields to
identify it as an async postback) via XMLHttpRequest, and gets back just
the html (pretty simple format) for a subsection of the page. you will
need to parse this for your data, the new viewstate, and any form field
updates (keep track of all the form fields from before the post and merge
in the results).
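A rough sketch of the parse-and-merge step follows. It assumes the async response is a run of `length|type|id|content|` records, with the length counted in characters and with type names such as `updatePanel` and `hiddenField`. That layout is an assumption here and should be checked against a real response captured from the site before relying on it. The function names are made up for this example.

```python
def parse_delta(text):
    """Split a length|type|id|content| stream into (type, id, content) tuples."""
    parts = []
    pos = 0
    while pos < len(text):
        length_end = text.index("|", pos)
        length = int(text[pos:length_end])
        type_end = text.index("|", length_end + 1)
        id_end = text.index("|", type_end + 1)
        content_start = id_end + 1
        # The content may itself contain "|", so slice by the declared length.
        content = text[content_start:content_start + length]
        kind = text[length_end + 1:type_end]
        name = text[type_end + 1:id_end]
        parts.append((kind, name, content))
        pos = content_start + length + 1  # skip the "|" that ends the record
    return parts


def merge_hidden_fields(fields, parts):
    """Return a copy of fields updated with hiddenField records (assumed type name)."""
    merged = dict(fields)
    for kind, name, content in parts:
        if kind == "hiddenField":
            merged[name] = content
    return merged
```

The `updatePanel` records would hold the html fragment to scrape, and the merged fields would form the basis of the next post.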

-- bruce (sqlwork.com)


 
 
Alexey Smirnov
 
      11-11-2009


As Bruce correctly noted, look into the postback data; in most cases all
the information is there. For instance, if we take your URL as an example,
we see that the gridview has paging 1..2..3..etc. These links initiate
asynchronous postbacks and cause a partial-page update. Each link has an
id like 'ListView1$PagerTop$ctl01$ctlXX', where XX is 00 for page #1, 01
for page #2, etc., and a URL of the form
javascript:__doPostBack('ListView1$PagerTop$ctl01$ctlXX',''). What does
that mean? It means that the number of the new page is sent via the
postback as the id of the link control. Sounds simple, right? Send a
request to the remote server in which __EVENTTARGET is
ListView1%24PagerTop%24ctl01%24ctl01 when you want to get page #2.

If the page controls rely on viewstate, you need to copy the viewstate
into the request as well. This is probably where you were confused by the
amount of data. ViewState is used to retain the state of controls between
postbacks. Again, if we take your example, we don't change any control
state, which means that you can copy the original viewstate from the very
first page. If you need to know what the ViewState contains, you can
decode it. There are some tools to do it, for example:

http://lachlankeown.blogspot.com/200...r-decoder.html
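For a quick look without a separate tool, the sketch below base64-decodes a __VIEWSTATE value and lists the readable strings inside it. It is not a real ViewState decoder: it assumes the value is plain base64 and not encrypted, and it ignores the serialization structure entirely. The function name is made up for this example.

```python
import base64
import re


def viewstate_strings(viewstate, min_length=4):
    """Base64-decode a __VIEWSTATE value and return its printable ASCII runs."""
    raw = base64.b64decode(viewstate)
    # Match runs of printable ASCII bytes at least min_length long.
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [run.decode("ascii") for run in re.findall(pattern, raw)]
```

Control ids and cell text often show up in such a listing, which is usually enough to tell whether the viewstate carries the data you are after.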

To debug HTTP requests, use Fiddler Web Debugger.
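The pager ids described above can be generated for any page number with a small helper like the one below. It assumes the 'ctlXX' suffix stays a two-digit, zero-padded value equal to the page number minus one, which is only what the first few links suggest. The function name and the default prefix argument are illustrative.

```python
from urllib.parse import quote


def pager_event_target(page, prefix="ListView1$PagerTop$ctl01$ctl"):
    """Control id for a 1-based page number: page 1 -> ...ctl00, page 2 -> ...ctl01."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return f"{prefix}{page - 1:02d}"


# URL-encoded form, as it would appear in the post body for page #2.
encoded_page_2 = quote(pager_event_target(2), safe="")
```

Comparing the generated value with the __EVENTTARGET seen in a captured request is a cheap way to confirm the numbering before looping over all pages.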
 