Searching google in java

 
 
mfasoccer@gmail.com
Guest
Posts: n/a
 
      05-18-2006
I'm working on a project that involves searching with Google. I have
been getting an HTTP 403 error with the following code:

import java.net.*;
import java.io.*;

public class GoogleSearchTest
{
    public static void main(String[] args) throws Exception {
        URL hp = new URL("http://www.google.com/search?q=babelfish");
        URLConnection hpCon = hp.openConnection();
        hpCon.connect();
        InputStream input = hpCon.getInputStream(); // error traces to here

        /*
        This code is all irrelevant to my problem because
        the input stream is refused
        String content = "";
        int c;
        while((c = input.read()) != -1)
            content += (char)c;
        */
    }
}

I know that an HTTP 403 error means that the server understood the
request, yet refused it. As you can probably tell, I have very little
network programming experience, so maybe more experienced programmers
could help alter my approach, or explain a better one? Thanks.
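
One way to see what the server actually sent back is to cast the connection to HttpURLConnection and check the status code before asking for the input stream. The following is only a sketch of that idea: the class name StatusCheckSketch and its two methods are hypothetical, the timeouts are arbitrary, and decoding the body as UTF-8 is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name and methods are illustrative only.
class StatusCheckSketch {

    // Reads a stream to the end and decodes it as UTF-8 (an assumption;
    // a real client would honour the charset in the Content-Type header).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Returns the response body, or throws an IOException carrying the
    // status code and any error body when the status is 400 or above.
    static String fetch(String address) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        con.setConnectTimeout(5000); // arbitrary 5 second timeouts
        con.setReadTimeout(5000);
        try {
            int status = con.getResponseCode(); // sends the request
            if (status >= 400) {
                InputStream err = con.getErrorStream(); // may be null
                String body = (err == null) ? "" : readAll(err);
                throw new IOException("HTTP " + status + ": " + body);
            }
            try (InputStream in = con.getInputStream()) {
                return readAll(in);
            }
        } finally {
            con.disconnect();
        }
    }
}
```

Calling StatusCheckSketch.fetch("http://www.google.com/search?q=babelfish") would then report the status code and whatever explanation the server included, rather than failing inside getInputStream(). Reading into a byte buffer also avoids the repeated string concatenation of the commented-out loop.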

 
Patricia Shanahan
Guest
Posts: n/a
 
      05-18-2006
(E-Mail Removed) wrote:
> Im working on a project that involves searching with google. I have
> been getting an http 403 error with the following code:

....

Google offers a Java API; see http://www.google.com/apis/. It is much
easier than trying to get and parse a web page.

Note that they limit automated searching to 1000 queries per day,
non-commercial, and require a license key with each request.

Patricia
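
A program that has to stay under a daily limit like the one mentioned above could keep its own count of the queries it sends. The sketch below is a minimal client-side counter under that assumption; the class DailyQuotaSketch is hypothetical, it only tracks what this one process sends, and it says nothing about how Google itself enforces the limit.

```java
import java.time.LocalDate;

// Hypothetical client-side counter; names are illustrative only.
class DailyQuotaSketch {
    private final int limit;  // maximum queries allowed per day
    private LocalDate day;    // the day the current count belongs to
    private int used;         // queries counted so far on that day

    DailyQuotaSketch(int limit, LocalDate today) {
        this.limit = limit;
        this.day = today;
    }

    // Returns true and counts the query if today's budget is not exhausted.
    synchronized boolean tryAcquire(LocalDate today) {
        if (!today.equals(day)) { // a new day resets the counter
            day = today;
            used = 0;
        }
        if (used >= limit) {
            return false;
        }
        used++;
        return true;
    }
}
```

A caller would create new DailyQuotaSketch(1000, LocalDate.now()) once and check tryAcquire(LocalDate.now()) before each request, skipping the request when it returns false.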
 
alexandre_paterson@yahoo.fr
Guest
Posts: n/a
 
      05-18-2006
(E-Mail Removed) wrote:
....
> I know that http 403 error means that the server understood the
> request, yet refused it. As you can probably tell I have very little
> network programming experience, so maybe more experienced programmers
> could help alter my approach, or explain a better one? Thanks.


A better approach would be to use Google's APIs, as Patricia pointed
out.

However, this is not always an option (the API didn't help
for, e.g., groups.google.com last time I checked [but this was
a long time ago, I admit]).

Faking your user agent string will allow you to bypass the 403
(though it would probably be a breach of Google's terms).



--
(Don't pay attention to my .sig) Text file size: 1509 bytes
SHA1: bbfa3226005c2d4d04e3d72d49bfb1eb17e67f12
MD5: 38dfd87012a2754059a88341d66e2ef4

 
mfasoccer@gmail.com
Guest
Posts: n/a
 
      05-18-2006
> Faking your user agent string will allow you to bypass the 403

Could anyone provide a sample of how to fake my user agent string?

 
alexandre_paterson@yahoo.fr
Guest
Posts: n/a
 
      05-18-2006
In your example, you insert one line:

URLConnection hpCon = hp.openConnection();
hpCon.setRequestProperty("User-Agent",
    "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511");
hpCon.connect();

and that may work.

But you still should respect Google's terms...
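
Put together as a small helper, the suggestion might look like the sketch below. The class UserAgentSketch and its methods are hypothetical, the agent string is modelled on the one in this thread and its exact value should be treated as an assumption, and the query encoding step is an addition that the original "babelfish" query did not need.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.net.URLEncoder;

// Hypothetical helper; class and method names are illustrative only.
class UserAgentSketch {

    // Example browser-style string modelled on the one in the thread.
    static final String AGENT =
            "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511";

    // Builds the search URL, encoding the query so that spaces and
    // punctuation survive being placed in the query string.
    static String searchUrl(String query) throws IOException {
        return "http://www.google.com/search?q=" + URLEncoder.encode(query, "UTF-8");
    }

    // Opens, but does not yet connect, a connection carrying the header.
    static URLConnection open(String address, String agent) throws IOException {
        URLConnection con = URI.create(address).toURL().openConnection();
        con.setRequestProperty("User-Agent", agent); // must precede connect()
        return con;
    }
}
```

The caller can then use UserAgentSketch.open(UserAgentSketch.searchUrl("babelfish"), UserAgentSketch.AGENT).getInputStream() in place of the original three lines. The header has to be set before the connection is established, which is why the helper returns the connection unconnected.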

 
Andrea Desole
Guest
Posts: n/a
 
      05-18-2006
(E-Mail Removed) wrote:
> In your example, you insert one line:
>
> URLConnection hpCon = hp.openConnection();
> hpCon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U;
> Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511");
> hpCon.connect();
>
> and that may work.


I'm not sure this is enough.
You probably have to set the http.agent property:

http://java.sun.com/j2se/1.5.0/docs/...roperties.html
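
For comparison, the http.agent route is a JVM-wide system property rather than a per-connection header. The sketch below shows one way to set it from code; the class HttpAgentSketch is hypothetical. As far as I understand the networking properties documentation, the JDK appends its own "Java/<version>" token to the value, and the property is best set before the first HTTP request is made.

```java
// Hypothetical helper; the class and method names are illustrative only.
class HttpAgentSketch {

    // Sets the JVM-wide http.agent property and returns the previous value
    // (null if it was unset). Intended to be called before the first
    // HTTP request the program makes.
    static String install(String agent) {
        return System.setProperty("http.agent", agent);
    }
}
```

The same property can be supplied on the command line as -Dhttp.agent=... without touching the code. A User-Agent set explicitly through setRequestProperty on a connection is a separate, per-request mechanism.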
 
Robert Klemme
Guest
Posts: n/a
 
      05-18-2006
Andrea Desole wrote:
> (E-Mail Removed) wrote:
>> In your example, you insert one line:
>>
>> URLConnection hpCon = hp.openConnection();
>> hpCon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U;
>> Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511");
>> hpCon.connect();
>>
>> and that may work.

>
> I'm not sure this is enough.
> You probably have to set the http.agent property:
>
> http://java.sun.com/j2se/1.5.0/docs/...roperties.html


Additional hint: you are better off using a decent HTTP client such as
Apache's, as the standard library classes are quite limited.

Regards

robert
 
mfasoccer@gmail.com
Guest
Posts: n/a
 
      05-18-2006
> URLConnection hpCon = hp.openConnection();
> hpCon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U;
> Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511");
> hpCon.connect();
>

It works, thanks.

 
VisionSet
Guest
Posts: n/a
 
      05-18-2006

<(E-Mail Removed)> wrote in message
news:(E-Mail Removed) ups.com...
> > URLConnection hpCon = hp.openConnection();
> > hpCon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U;
> > Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511");
> > hpCon.connect();
> >

> it works, thanks.


But you'll still get the same restriction of 1000 hits per day however you
do it.

--
Mike W


 
mfasoccer@gmail.com
Guest
Posts: n/a
 
      05-18-2006
> But you'll still get the same restriction of 1000 hits per day however you
> do it.


Does this mean that even regular searches that are executed through
their website with an actual browser are also limited to 1000 hits per
day?

 