URL parameter sts - mechanize & nokogiri differences

 
 
Don Norcott
 
      10-09-2010
I have written Ruby code (with mechanize and nokogiri) to do the
following:

1) Retrieve the search webpage
2) Enter search criteria into the form
3) Submit the form and retrieve the first webpage, which is a list of
book titles embedded in the page
4) For each title in the retrieved webpage, extract 5 fields
5) Retrieve the next webpage of titles
6) Repeat 4 & 5 until all titles are retrieved

The mechanize code below works up to the point of submitting the form,
but the first webpage returned is missing at least 2 of the fields for
each title.

If I grab the URL generated by mech.submit and use it in Firefox, it
displays all the titles and information normally, BUT the URL is
changed slightly before the titles are displayed.

THIS IS THE URL RETURNED BY MECHANIZE.SUBMIT
#<URI::HTTP:0x17706d8
URL:http://www.xyz.com/servlet/SearchResults?an=Asimov&bi=0&bx=off&ds=30&kn=science+fiction&recentlyadded=all&sortby=17&sts=t>}


If I take the URL from the submit and use it in the nokogiri code
below, it fails to open with BAD URI.
If I take the changed URL from Firefox and use it in the nokogiri code
below, it also fails to open with BAD URI.
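
The `>}` at the end of the string shown above may not belong to the URL at all: it looks like the closing characters of an inspected object rather than part of the query string. That is a guess from the text as posted, not something confirmed here. A minimal sketch using only the standard `uri` library shows the difference between an inspected URI and its plain string form; the shortened URL is illustrative.

```ruby
require 'uri'

# Shortened, illustrative version of the search URL (not the full original).
url = 'http://www.xyz.com/servlet/SearchResults?an=Asimov&sortby=17&sts=t'
uri = URI.parse(url)

# inspect wraps the URL in "#<... >" for debugging; the exact layout
# varies between Ruby versions, but it ends with ">".
inspected = uri.inspect

# to_s returns only the URL, which is what open-uri or a browser expects.
plain = uri.to_s

puts inspected
puts plain
```

With mechanize, the page returned by `submit` exposes its address as `title_pg.uri`, so `title_pg.uri.to_s` should give a string without the wrapper.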

If I instead start in Firefox at the search page, enter the same data
into the form and submit it manually, I end up with the same screen as
when I pasted in the URL from the mechanize.submit code.

If I now copy the URL from Firefox and use it in the nokogiri code below,
it works fine, and the "puts node.text" shows that all 5 of the fields I
require are there (plus others not present in the mechanize object).

The URLs from the 3 steps above differ in only one way: the end of the
query string, starting at the last variable (sts).

&sortby=17&sts=t>}          from mechanize.submit
&sortby=17&sts=t%3E}        copied from Firefox after the submit URL was
                            used and the webpage displayed (changed URL)
&sortby=17&sts=t&x=84&y=10  entered the search manually; this is the URL
                            when the first page is displayed
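
To see exactly which parameters differ, the three query-string tails can be decoded into hashes and compared. The sketch below uses `URI.decode_www_form` from the standard library on the tails quoted above; the labels and variable names are made up for illustration.

```ruby
require 'uri'

# The three query-string tails quoted in the post (leading "&" dropped).
tails = {
  'mechanize.submit'     => 'sortby=17&sts=t>}',
  'Firefox, changed URL' => 'sortby=17&sts=t%3E}',
  'Firefox, manual form' => 'sortby=17&sts=t&x=84&y=10'
}

# Decode each tail into a { name => value } hash.
decoded = tails.transform_values { |query| URI.decode_www_form(query).to_h }

decoded.each do |source, params|
  puts "#{source}: #{params.inspect}"
end
```

The first two tails decode to the same `sts` value with the trailing characters included. Only the manually submitted form yields a plain `sts=t`, plus `x` and `y`, which are typically the click coordinates sent by an image submit button.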

The attached file shows what the page source for the last title looks
like and what the mechanize content for that same title looks like.

THE CONTENTS OF BOTH <td class="itemNumbr" valign="top">
AND <div class="result-price"> are missing in the mechanize object

Can anyone shed light on what is happening? It would be greatly
appreciated.
Thanks Don

#MECHANIZE CODE
require 'rubygems'
require 'open-uri'
require 'nokogiri'
require 'mechanize'

url = "...." # url of search form

# Create an agent that identifies itself as Safari on a Mac
a = Mechanize.new { |agent|
  agent.user_agent_alias = 'Mac Safari'
}

# Fetch the search page and fill in the advanced search form
search_page = a.get(url)
search_form = search_page.form_with(:name => 'form-advancedSearch')
search_form.an = 'Asimov'
search_form.kn = 'science fiction'

# Capture the submitted url and title_pg contents
title_pg = search_form.submit

# Print the text of every link on the returned page
title_pg.links.each do |link|
  puts link.text # not all the data is there
end

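#NOKOGIRI CODE
require 'open-uri'
require 'nokogiri'

url = "http://www.xyz.com/....."

# Fetch the page and print the text of every table row.
# URI.open is the open-uri entry point on current Ruby; older versions used open(url).
doc = Nokogiri::HTML(URI.open(url))
doc.xpath('//tr').each do |node|
  puts node.text
end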

Attachments:
http://www.ruby-forum.com/attachment...pPage-Docs.txt

--
Posted via http://www.ruby-forum.com/.

Don Norcott
 
      10-09-2010
I still have not resolved (or do not understand) my problem, but the
following is a workaround that allows me to continue with development:

# get first title page - last line of orig code
title_pg = search_form.submit

# initialize a Nokogiri::HTML object with 'title_pg.body', the returned
# web page
doc = Nokogiri::HTML(title_pg.body)

# can now use Nokogiri to process the title page HTML
doc.xpath('//tr').each do |node|
  puts node.text
end

This prints out the fields that are missing in the mechanize object.
I am not sure whether this really is a problem, or whether I simply do
not understand the mechanize object properly and the data is there but
requires a different selector.