Todd W wrote:
>
> "Hal Vaughan" <> wrote in message
> news:2Y-dnU1DnpqllNzfRVn-...
>> I'm trying to write a scraper for a website that uses cookies. The short of
>> it is that I keep getting their "You have to set your browser to allow
>> cookies" message. The code for the full scraper is a bit much, so here are
>> the relevant sections:
>>
> <snip />
>
> I've had a lot of success using LWP to scrape web pages. For instance, I
> have a neat program that shows me all my bank account balances on my
> web-enabled cell phone, but I've had some trouble getting LWP to scrape some
> pages that also required cookies.
>
> Here's my code:
>
> [trwww[at]waveright temp]$ perl -MWWW::Mechanize::Shell -e 'shell'
>>get https://www.setsivr.odjfs.state.oh.us/welcome.asp
> Retrieving https://www.setsivr.odjfs.state.oh.us/welcome.asp(200)
> https://www.setsivr.odjfs.state.oh.us/cookieerror.htm>
>
> If the client and the server were doing everything according to
> specification, this would work.
>
> I get the same problem with Lynx. Another poster on perl.libwww reproduced
> my issue and got the same error using a Python HTTP library.
>
> Here's the archive of my thread:
>
> http://groups-beta.google.com/group/...d09ffd6ff2f4fd

I checked the thread, and I've gone back over the pages I downloaded. I
wasn't clear on how cookies are normally handled (I think I mentioned that in
my first post), and I had not looked closely at the files, since I figured
they were not likely to be the problem. It turns out that the cookie IS being
set in JavaScript, which I suspected, but I didn't realize that was a
problem. I wrote a routine that scans the page, grabs the cookie, and sets it
manually with $cookie_jar->set_cookie(), and the cookie looks like it is set
properly (it includes the domain and path settings as well). However, even
after setting the cookie manually, I either get "no cookie" messages, or any
page I try to load after the login gives me the login page again. (I noticed
the same thing happens in Firefox if I paste in a link to a page past the
login page when I'm not logged in.) I also compared the cookies in Firefox
with the ones I was getting in Perl, and they seem the same except for the
session ID number.
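
The scan-and-set routine isn't shown here, so below is a minimal sketch of
just the scanning half, assuming the page assigns the cookie with a plain
string literal such as document.cookie = "name=value; path=/". The helper
name parse_js_cookie and the sample page are hypothetical; a cookie built
with string concatenation or computed inside a function will not match
this regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull "name=value" plus attributes out of a
# JavaScript assignment such as:  document.cookie = "name=value; path=/";
# Assumes a plain quoted string literal; computed cookies are not matched.
sub parse_js_cookie {
    my ($html) = @_;
    my @cookies;
    while ($html =~ /document\.cookie\s*=\s*(["'])(.*?)\1/g) {
        my $spec = $2;                      # copy before running other regexes
        my ($pair, @attrs) = split /\s*;\s*/, $spec;
        my ($name, $value) = split /=/, $pair, 2;
        my %cookie = (name => $name, value => defined $value ? $value : '');
        for my $attr (@attrs) {
            # Attributes like path=/ or domain=...; flags with no "=" become 1.
            my ($k, $v) = split /=/, $attr, 2;
            $cookie{lc $k} = defined $v ? $v : 1;
        }
        push @cookies, \%cookie;
    }
    return @cookies;
}

# Made-up page fragment, for illustration only.
my $page = q{<script>document.cookie = "jscookie=enabled; path=/";</script>};
my @found = parse_js_cookie($page);
printf "%s=%s (path %s)\n", $found[0]{name}, $found[0]{value}, $found[0]{path};
```

Each hash it returns carries the name and value plus any attributes (path,
domain, expires) in lower case, which is the information set_cookie() needs.
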

So I've found a way to set the cookie by hand, but the server I'm trying to
read from doesn't seem to see that the cookie is set. Is there something I
need to do, other than setting a cookie, to make sure the server I'm
connecting to knows the cookie is set?
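
One thing worth checking is whether the jar holding the hand-set cookie is
the same jar the user agent sends its requests with. A sketch, assuming
libwww-perl's HTTP::Cookies and LWP::UserAgent and a hypothetical host
www.example.com: add_cookie_header() fills in the Cookie: header the way the
user agent does before each request, so calling it directly shows what the
server would receive. The second request shows that a cookie stored under a
domain that doesn't match the request's host is silently not sent.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use LWP::UserAgent;

# One jar for the whole session. Attaching it to the user agent makes LWP
# add matching cookies to every request and store any Set-Cookie replies.
my $jar = HTTP::Cookies->new;
my $ua  = LWP::UserAgent->new(cookie_jar => $jar);

# Hand-set cookie, as if scraped from the page's JavaScript.
# Hypothetical values. Argument order: version, key, value, path, domain,
# port, path_spec, secure, maxage, discard.
$jar->set_cookie(0, 'jscookie', 'enabled', '/', 'www.example.com',
                 undef, 1, 0, undef, 1);

# add_cookie_header() is what the user agent runs before sending a request;
# calling it directly shows the header the server would get.
my $req = HTTP::Request->new(GET => 'http://www.example.com/login.asp');
$jar->add_cookie_header($req);
my $cookie_header = $req->header('Cookie');
print 'www.example.com gets:   ',
    (defined $cookie_header ? $cookie_header : '(no Cookie header)'), "\n";

# A host that does not match the stored domain gets nothing.
my $other = HTTP::Request->new(GET => 'http://other.example.org/');
$jar->add_cookie_header($other);
my $other_header = $other->header('Cookie');
print 'other.example.org gets: ',
    (defined $other_header ? $other_header : '(no Cookie header)'), "\n";
```

If the header comes back empty for the real site, the domain or path passed
to set_cookie() probably doesn't match the URL being requested. Reusing the
same $ua for every request also keeps the session cookie the server sets
with Set-Cookie headers, which a fresh jar per request would lose.
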

This is not an area I'm an expert in, and it's frustrating: I need to get
this done, I'm low on sleep, and I'm having to put together a lot more
pieces than I expected. I didn't know that when I send a page request to a
server, the server can actually read the cookie along with the request; I
thought cookies were only used by client-side JavaScript. But the fact that
the server won't send me the right pages without the cookie seems to say the
server can read it. Is that right? If so, how do I make sure the server gets
the cookie?

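
For reference, under HTTP the client sends cookies back to the server in a
Cookie: request header, in the form name=value; name2=value2, on every
request whose host and path match the cookie; that header is how the server
reads them. The sketch below uses core Perl only and just builds the request
text so the header is visible. The function name, host, path, and cookie
names are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the text of a plain HTTP/1.1 GET request. The server never sees the
# client's cookie store; it only sees the "Cookie:" header the client puts
# in each request, so every request after the login has to carry it.
sub build_get_request {
    my ($host, $path, %cookies) = @_;
    my @lines = ("GET $path HTTP/1.1", "Host: $host");
    if (%cookies) {
        # Sorted only to make the output predictable.
        push @lines, 'Cookie: '
            . join('; ', map { "$_=$cookies{$_}" } sort keys %cookies);
    }
    # Header lines end in CRLF, and an empty line ends the header block.
    return join("\r\n", @lines, '', '');
}

# Hypothetical host, path and cookie names.
my $request = build_get_request('www.example.com', '/account.asp',
                                jscookie => 'enabled', SESSIONID => 'abc123');
print $request;
```
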
Thanks for any help on this!
Hal