Getting Response from HTTPS POST

 
 
Posted by Matt White (Guest) on 05-31-2007

Hello,

I am writing a crawler to parse web pages. One site that I am crawling
requires a login, which I perform with an HTTPS POST. After the POST,
however, I can't get any further, because every request needs a valid
session id in the URL. When I log in using Firefox, the session id is
appended to the URL of every page that I visit (something like
http://blah.com/page?sessid=5438729057). How can I get this session id
so that I can append it to my URLs and crawl the site? The site used to
send the session id in a cookie but no longer uses cookies (my code
below still tries to read the cookie). Here is what I have:

require 'net/https'
require 'uri'
require 'cgi' # needed for CGI::escape below

url = '<appropriate URL here>'
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = uri.scheme == 'https'
http.verify_mode = OpenSSL::SSL::VERIFY_NONE

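# headers, get_data and start_with_homepage are presumably defined
# elsewhere in the script (not shown here)
response = get_data(http, uri, headers)
page = response.body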

# Grab the hidden __VIEWSTATE field from the page
view_state = CGI::escape(page[/<input type="hidden" name="__VIEWSTATE" value="([^"]*)"/, 1])
post_data = '<post data here>'

# On current Ruby versions http.post returns only the response object
login_response = http.post('<appropriate path here>', post_data, headers)

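cookie = nil
location = nil
login_response.each_header do |name, value|
  # Keep only the name=value part; works even if there is no ';'
  cookie = value.split(';').first if name == 'set-cookie'
  location = value if name == 'location'
end

# Only send a Cookie header if the server actually set one
headers['Cookie'] = cookie if cookie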

if location
  homepage = get_data(http, URI.parse('<appropriate URI here>' + location), headers).body
else
  homepage = get_data(http, URI.parse('<default URI here>'), headers).body
end

start_with_homepage(homepage, http, headers)

Thanks,
Matt
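
A minimal sketch of one way to approach this, assuming the server puts the session id in the query string of the redirect it sends after login (the Location header that the code above already captures). The thread does not confirm this; the site may instead embed the id only in links inside the returned page. The helper names extract_session_id and with_session_id are hypothetical, and the parameter name sessid is taken from the example URL in the post.

```ruby
require 'uri'

# Hypothetical helper: read a query parameter (default 'sessid', as in the
# example URL from the post) out of a Location header value.
# Returns nil when the URL has no query string or no such parameter.
def extract_session_id(location, param = 'sessid')
  query = URI.parse(location).query
  return nil unless query
  pair = URI.decode_www_form(query).assoc(param)
  pair && pair[1]
end

# Hypothetical helper: return url with the session id set in its query
# string, replacing any existing value for the same parameter.
def with_session_id(url, session_id, param = 'sessid')
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query || '')
  params.reject! { |key, _| key == param }
  params << [param, session_id]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

session_id = extract_session_id('http://blah.com/page?sessid=5438729057')
next_url = with_session_id('http://blah.com/other?x=1', session_id)
puts session_id # prints 5438729057
puts next_url   # prints http://blah.com/other?x=1&sessid=5438729057
```

In the script above, the value of location from the login response could be passed to extract_session_id, and every later URL run through with_session_id before it is fetched. If location turns out to be nil, or carries no such parameter, the id would have to be found some other way, for example by scanning the links in the returned page body.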

 