Velocity Reviews - Computer Hardware Reviews

Browser versus Java URLConnection

 
 
little_mm@ntlworld.com
      10-04-2006
Hi All

Perhaps someone knows the answer to this problem. I open a connection
to a URL and read lines one at a time from the URL using an
InputStreamReader and a BufferedReader:

// Open a connection to the URL (no charset is passed to the reader here,
// so the platform default encoding is used to decode the bytes)
URLConnection conn = pageURL.openConnection();
conn.setReadTimeout(timeout);
conn.setConnectTimeout(timeout);
conn.setUseCaches(false);
InputStream pageStream = conn.getInputStream();
BufferedReader reader = new BufferedReader(
        new InputStreamReader(pageStream));

// Read the page line by line; readLine() strips the line terminators
String line;
StringBuffer pageBuffer = new StringBuffer();
while ((line = reader.readLine()) != null)
{
    System.out.println(line);
    pageBuffer.append(line);
}
return pageBuffer.toString();


However, the text I get back from the URL differs from what a browser
saves from the same URL. In particular, the browser saves £ characters,
whereas the lines read in Java are missing these characters altogether.
Some other characters have also been dropped from the Java lines. I have
tried passing different character encodings as the second argument of
the InputStreamReader. This has virtually no effect, except for UTF-16,
which returns a large number of "?" characters in the stream. The
Content-Type header of the page says it is ISO-8859-1, but passing that
encoding to the InputStreamReader changes nothing in the Java code: the
£ symbol is still missing.

In the browser, if I change the character encoding to "UTF-8", the
£ symbol is still displayed properly. In other words, it looks like I
am receiving different data from the server depending on whether I use
the browser or the code. I'm not sure whether it has anything to do
with the encoding; I'm just guessing.

Thanks,
Nubs.
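
One way to check the encoding side of this is to decode the response with the charset named in the Content-Type header rather than the platform default. The sketch below is only illustrative: the class name CharsetSketch and its two helper methods are hypothetical, and falling back to ISO-8859-1 when no usable charset is given is an assumption, not something established in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {

    /** Returns the charset named in a Content-Type value, or the fallback if none is usable. */
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // A Content-Type value looks like "text/html; charset=ISO-8859-1"
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    /** Decodes raw response bytes using the charset from the header (assumed fallback: ISO-8859-1). */
    public static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] pound = { (byte) 0xA3 }; // the single byte that ISO-8859-1 uses for the pound sign
        String text = decode(pound, "text/html; charset=ISO-8859-1");
        // Print the code point rather than the character, so console encoding does not matter
        System.out.println(Integer.toHexString(text.charAt(0)));
    }
}
```

With a real connection, the header value would come from conn.getContentType(). If the bytes on the wire do not contain 0xA3 at all, no choice of charset in the reader will bring the symbol back, which would point away from an encoding problem.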

 
Andrew Thompson
      10-04-2006
little_mm@ntlworld.com wrote:
....
> Perhaps someone knows the answer to this problem. I open a connection
> to a URL ...


What URL (specifically)?

> ...However, the actual text I get back from the URL is different from that
> saved out of a browser ...


What browser (make, version, OS - specifically)?

Is the saved text identical to the text shown when
you 'view source' in the 'a browser'?

Andrew T.

 
little_mm@ntlworld.com
      10-04-2006
Thanks for the response Andrew.

URL: http://www.net-a-porter.com/Shop/Sho...l?pageNumber=0

Browser: Mozilla Firefox, but same effect in IE6, OS: Windows XP.

Yes, I think view source and save page are identical, although I
haven't checked byte-for-byte.

Nubs.

Andrew Thompson wrote:

> wrote:
> ...
> > Perhaps someone knows the answer to this problem. I open a connection
> > to a URL ...

>
> What URL (specifically)?
>
> > ...However, the actual text I get back from the URL is different from that
> > saved out of a browser ...

>
> What browser (make, version, OS - specifically)?
>
> Is the saved text identical to the text shown when
> you 'view source' in the 'a browser'?
>
> Andrew T.
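
A byte-for-byte check of the kind mentioned above can be done without any Reader at all, by reading the raw bytes and listing the non-ASCII ones. This is a minimal sketch under assumptions: the class RawBytesSketch and its methods are hypothetical names, and the stream passed in would be conn.getInputStream() or a FileInputStream over the page saved from the browser.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class RawBytesSketch {

    /** Reads a stream to the end and returns the undecoded bytes. */
    public static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lists the offset and hex value of every byte outside the 7-bit ASCII range. */
    public static String nonAsciiReport(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF; // treat the byte as unsigned
            if (b >= 0x80) {
                sb.append(String.format("offset %d: 0x%02X%n", i, b));
            }
        }
        return sb.toString();
    }
}
```

Running nonAsciiReport over both the bytes fetched in Java and the bytes of the browser's saved file would show whether the two sources really differ, independent of any character decoding.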


 
Chris Uppal
      10-04-2006
little_mm@ntlworld.com wrote:

> Perhaps someone knows the answer to this problem. I open a connection
> to a URL and read lines one at a time from the URL using a
> InputStreamReader and a BufferedReader:

[...]
> However, the actual text I get back from the URL is different from that
> saved out of a browser from the same URL. Particularly, the browser
> saves £ characters, whereas the lines read in Java are missing
> these characters altogether. Also, some of the characters have actually
> been deleted in the Java lines.


Maybe the website is using something like the Accept-Language: field in the
request to decide what currency (etc) to send back. I don't know what the Java
HTTP client will send in that field by default, but it is unlikely to be
'en-GB' which is what my browser would send.

I just tried it myself, but -- most unfortunately -- the site has just stopped
responding. I /do/ hope my little experiment didn't kill it...

-- chris



 
little_mm@ntlworld.com
      10-04-2006
Chris Uppal wrote:

> > Perhaps someone knows the answer to this problem. I open a connection
> > to a URL and read lines one at a time from the URL using a
> > InputStreamReader and a BufferedReader:

> [...]
> > However, the actual text I get back from the URL is different from that
> > saved out of a browser from the same URL. Particularly, the browser
> > saves £ characters, whereas the lines read in Java are missing
> > these characters altogether. Also, some of the characters have actually
> > been deleted in the Java lines.

>
> Maybe the website is using something like the Accept-Language: field in the
> request to decide what currency (etc) to send back. I don't know what the Java
> HTTP client will send in that field by default, but it is unlikely to be
> 'en-GB' which is what my browser would send.
>
> I just tried it myself, but -- most unfortunately -- the site has just stopped
> responding. I /do/ hope my little experiment didn't kill it...
>
> -- chris


Hi Chris - thanks for the response. So, question: how do you mimic the
browser's HTTP requests precisely, so that a website generally behaves
in the same way? For example, how do you change the Accept-Language
field?

Thanks,
Nubs.

 
Tor Iver Wilhelmsen
      10-04-2006
little_mm@ntlworld.com writes:

> Hi Chris - thanks for the response. So, question: how do you mimic the
> browser's HTTP requests precisely, so that a website generally behaves
> in the same way? For example, how do you change the Accept-Language
> field?


Look at URLConnection.setRequestProperty().
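
As a rough illustration of that suggestion, request headers can be set on the connection before it is used. In this sketch the class name BrowserLikeRequest, the method open, and the User-Agent string are placeholders invented for the example; only the "en-GB" value comes from the discussion above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

public class BrowserLikeRequest {

    /** Creates a connection with browser-style request headers; nothing is sent until it is read. */
    public static URLConnection open(String url) {
        try {
            // openConnection() only creates the object; the request goes out later,
            // for example when getInputStream() is called
            URLConnection conn = URI.create(url).toURL().openConnection();
            conn.setRequestProperty("Accept-Language", "en-GB");
            // Placeholder value: copy the real string from the browser being imitated
            conn.setRequestProperty("User-Agent", "Mozilla/5.0 (placeholder)");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection conn = open("http://example.com/");
        System.out.println(conn.getRequestProperty("Accept-Language"));
    }
}
```

Headers must be set before the connection is opened; calling setRequestProperty() after getInputStream() throws an IllegalStateException.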
 
little_mm@ntlworld.com
      10-04-2006
Tor Iver Wilhelmsen wrote:

> > Hi Chris - thanks for the response. So, question: how do you mimic the
> > browser's HTTP requests precisely, so that a website generally behaves
> > in the same way? For example, how do you change the Accept-Language
> > field?

>
> Look at URLConnection.setRequestProperty().


OK, many thanks Iver.

 
Chris Uppal
      10-05-2006
little_mm@ntlworld.com wrote:

> Hi Chris - thanks for the response. So, question: how do you mimic the
> browser's HTTP requests precisely, so that a website generally behaves
> in the same way?


I see that Tor has already answered. I want to add that their server is back
up this morning, and I've just tried again (it stayed up this time!). The bad
news is that changing the Accept-Language field to, say, "da" made no
difference -- it still sent back a page where the price of the first boot was
&pound; <some jaw-droppingly large number>. So that was a red herring, I'm
afraid.

-- chris
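
If the response does carry the symbol as the HTML entity &pound; rather than as a raw character, code that searches the text for a literal £ will not find it. The following is a hedged sketch for checking which form is present; the class PoundFormsSketch and its methods are hypothetical, and it handles only the pound sign, not HTML entities in general.

```java
public class PoundFormsSketch {

    /** Reports which representations of the pound sign occur in the given HTML text. */
    public static String describe(String html) {
        boolean raw = html.indexOf('\u00A3') >= 0;   // literal character U+00A3
        boolean named = html.contains("&pound;");    // named entity
        boolean numeric = html.contains("&#163;");   // decimal numeric reference
        return "raw=" + raw + " named=" + named + " numeric=" + numeric;
    }

    /** Replaces the named and decimal entity forms with the literal character. */
    public static String decodePound(String html) {
        return html.replace("&pound;", "\u00A3").replace("&#163;", "\u00A3");
    }

    public static void main(String[] args) {
        System.out.println(describe("<td>&pound;250</td>"));
    }
}
```

Running describe over the text fetched in Java and over the browser's saved copy would show whether the two differ only in how the same symbol is written.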


 