Velocity Reviews - Computer Hardware Reviews

simulating a browser to get redirected URL location.

 
 
Roedy Green
      11-16-2010
I am trying to write some code that chases HTTP redirect chains and
makes a list of URLs that have been permanently moved and where they
went.

The code I think should work to get at a "Location:" field in the
response header does not work. I just get null.

urlc.connect();
String location = urlc.getHeaderField( "Location" );

I found some code on the net that claims to work, but it is pretty
ugly:
http://www.kodejava.org/examples/198.html

Is there something obvious I am missing?
--
Roedy Green Canadian Mind Products
http://mindprod.com

Finding a bug is a sign you were asleep at the switch when coding. Stop debugging, and go back over your code line by line.
 
Arne Vajhøj
      11-16-2010
On 16-11-2010 11:36, Roedy Green wrote:
> I am trying to write some code that chases HTTP redirect chains and
> makes a list of URLs that have been permanently moved and where they
> went.
>
> The code I think should work to get at a "Location:" field in the
> response header does not work. I just get null.
>
> urlc.connect();
> String location = urlc.getHeaderField( "Location" );
>
> I found some code on the net that claims to work, but it is pretty
> ugly:
> http://www.kodejava.org/examples/198.html
>
> Is there something obvious I am missing?


Use Jakarta HttpClient instead.

Arne

 
Roedy Green
      11-16-2010
On Tue, 16 Nov 2010 19:16:13 +0100, Jake Jarvis wrote, quoted or
indirectly quoted someone who said:

>
>Is urlc set up to automatically follow redirects?


Yes. That works. It is also the default. When you turn it off, you get
a short message about the redirect as the page content.
 
Roedy Green
      11-17-2010
On Tue, 16 Nov 2010 13:48:57 -0800, Roedy Green wrote, quoted or
indirectly quoted someone who said:

>>
>>Is urlc set up to automatically follow redirects?

>
>Yes. That works. It is also the default. When you turn it off, you get
>a short message about the redirect as the page content.


However, I still can't get the Location field. I figured out how to
get IntelliJ to trace through getHeaderField, and it is looking
through a list of header fields; none of them is Location.

I have two ideas for attacking this.

1. Use Wireshark to find out whether Location is indeed in the returned
header, both with and without follow redirects, and what status codes
come back.

2. At some point getHeaderField must flip from scanning the request
header you are about to send to the response header you received. I
must find out precisely when that is and whether it happens as
expected. Perhaps it happens only after you open the InputStream.
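
A small diagnostic along these lines might help with both ideas without leaving Java. It is only a sketch: the class name HeaderDump and the helpers format and fetch are made up for illustration. As far as I know, getResponseCode() forces the request to be sent, so anything read from getHeaderFields() after it is the response header, not the request header. Running it with follow redirects on and then off shows which status code and which fields each setting exposes.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.List;
import java.util.Map;

public class HeaderDump {
    /** Formats response headers one per line; the status line is stored under a null key. */
    static String format(Map<String, List<String>> headers) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            String name = (e.getKey() == null) ? "(status line)" : e.getKey();
            sb.append(name).append(": ")
              .append(String.join(", ", e.getValue())).append('\n');
        }
        return sb.toString();
    }

    /** Fetches the URL with the given redirect setting and reports what came back. */
    static String fetch(String url, boolean follow) throws IOException {
        HttpURLConnection urlc =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        urlc.setInstanceFollowRedirects(follow);
        int status = urlc.getResponseCode(); // forces the request to be sent
        // Assumption: on the standard JDK implementation getURL() reflects the
        // last URL fetched when redirects were followed automatically.
        String dump = "follow=" + follow + " status=" + status
                + " url=" + urlc.getURL() + "\n"
                + format(urlc.getHeaderFields());
        urlc.disconnect();
        return dump;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HeaderDump <url>");
            return;
        }
        System.out.println(fetch(args[0], true));  // default behaviour
        System.out.println(fetch(args[0], false)); // redirect left visible
    }
}
```

If Location shows up only in the second dump, the header is being consumed by the automatic redirect handling rather than being absent from the server's reply.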
 
Roedy Green
      11-17-2010
On Tue, 16 Nov 2010 21:37:54 -0800, Roedy Green wrote, quoted or
indirectly quoted someone who said:

>
>However, I still can't get the Location field. I figured out how to
>get IntelliJ to trace through getHeaderField, and it is looking
>through a list of header fields; none of them is Location.


It is sort of working now. You see the Location field only if you turn
off follow redirects, and then, I think, you find out only about the
first leg of the chain.

Browsers must do the fetch in explicit stages. The Location: field is
not there when you fetch the last leg. You have to get it from the
second-to-last leg, and so on back up the chain.
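
Doing the fetch in explicit stages could look roughly like the following. This is a sketch under assumptions, not tested against real sites: the class name RedirectChaser, the helpers isRedirect, resolve and chase, and the MAX_HOPS limit are all invented for illustration. Each leg is fetched with follow redirects turned off, the Location field is read from that leg, and the loop moves on to the next URL until a response arrives that is not a redirect.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class RedirectChaser {
    /** Hypothetical cap on hops, so a redirect cycle cannot loop forever. */
    static final int MAX_HOPS = 10;

    /** True for the status codes that normally carry a Location to follow. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a possibly relative Location against the URL that returned it. */
    static String resolve(String base, String location) {
        return URI.create(base).resolve(location).toString();
    }

    /** Follows the chain one leg at a time; one "status from -> to" entry per redirect. */
    static List<String> chase(String start) throws IOException {
        List<String> hops = new ArrayList<>();
        String current = start;
        for (int i = 0; i < MAX_HOPS; i++) {
            HttpURLConnection urlc =
                    (HttpURLConnection) URI.create(current).toURL().openConnection();
            urlc.setInstanceFollowRedirects(false); // keep the 3xx response visible
            int status = urlc.getResponseCode();    // sends the request
            String location = urlc.getHeaderField("Location");
            urlc.disconnect();
            if (!isRedirect(status) || location == null) {
                break; // last leg: nothing further to follow
            }
            String next = resolve(current, location);
            hops.add(status + " " + current + " -> " + next);
            current = next;
        }
        return hops;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java RedirectChaser <url>");
            return;
        }
        for (String hop : chase(args[0])) {
            System.out.println(hop);
        }
    }
}
```

Each entry starts with the status code, so the permanently moved URLs are the ones whose entries begin with 301 (or 308). Note that setInstanceFollowRedirects affects only this connection, whereas the static setFollowRedirects changes the default for the whole JVM.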
