urllib2 - 403 that _should_ not occur.

 
 
James Mills
 
      01-12-2009
Hey all,

The following fails for me:

>>> from urllib2 import urlopen
>>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.6/urllib2.py", line 124, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.6/urllib2.py", line 389, in open
response = meth(req, response)
File "/usr/lib/python2.6/urllib2.py", line 502, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.6/urllib2.py", line 427, in error
return self._call_chain(*args)
File "/usr/lib/python2.6/urllib2.py", line 361, in _call_chain
result = func(*args)
File "/usr/lib/python2.6/urllib2.py", line 510, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
>>>


However, that _same_ url works perfectly fine on the
same machine (and same network) using any of:
* curl
* wget
* elinks
* firefox

Any helpful ideas ?

cheers
James

--
-- "Problems are solved by method"
 
ajaksu
 
      01-12-2009
On Jan 11, 11:59 pm, "James Mills" <(E-Mail Removed)>
wrote:
> Hey all,
>
> The following fails for me:
>
> >>> from urllib2 import urlopen
> >>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")

>
> Traceback (most recent call last):

[...]
> Any helpful ideas ?


Maybe raise a real bug @ bugs.python.org instead of just mentioning it
like I did: http://bugs.python.org/msg77889

I think at least some sites would be willing to add the new UA to
their whitelists.

HTH,
Daniel
 
Philip Semanchuk
 
      01-13-2009

On Jan 12, 2009, at 6:48 PM, ajaksu wrote:

> On Jan 11, 11:59 pm, "James Mills" <(E-Mail Removed)>
> wrote:
>> Hey all,
>>
>> The following fails for me:
>>
>> >>> from urllib2 import urlopen
>> >>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")

>>
>> Traceback (most recent call last):

> [...]
>> Any helpful ideas ?

>
> Maybe raise a real bug @ bugs.python.org instead of just mentioning it
> like I did: http://bugs.python.org/msg77889
>
> I think at least some sites would be willing to add the new UA to
> their whitelists.


I don't think I understand you clearly. Whether or not Google et al
whitelist the Python UA isn't a Python issue, is it?


 
Steve Holden
 
      01-13-2009
Philip Semanchuk wrote:
>
> On Jan 12, 2009, at 6:48 PM, ajaksu wrote:
>
>> On Jan 11, 11:59 pm, "James Mills" <(E-Mail Removed)>
>> wrote:
>>> Hey all,
>>>
>>> The following fails for me:
>>>
>>> >>> from urllib2 import urlopen
>>> >>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")
>>>
>>> Traceback (most recent call last):

>> [...]
>>> Any helpful ideas ?

>>
>> Maybe raise a real bug @ bugs.python.org instead of just mentioning it
>> like I did: http://bugs.python.org/msg77889
>>
>> I think at least some sites would be willing to add the new UA to
>> their whitelists.

>
> I don't think I understand you clearly. Whether or not Google et al
> whitelist the Python UA isn't a Python issue, is it?
>

I'd say it's an issue relevant to Python users, which would seem to put
it pretty much in the mainstream for c.l.py - especially as the code
causing concern was written in Python.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

 
Philip Semanchuk
 
      01-13-2009

On Jan 13, 2009, at 1:22 AM, Steve Holden wrote:

> Philip Semanchuk wrote:
>>
>> On Jan 12, 2009, at 6:48 PM, ajaksu wrote:
>>
>>> On Jan 11, 11:59 pm, "James Mills" <(E-Mail Removed)>
>>> wrote:
>>>> Hey all,
>>>>
>>>> The following fails for me:
>>>>
>>>> >>> from urllib2 import urlopen
>>>> >>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")
>>>>
>>>> Traceback (most recent call last):
>>> [...]
>>>> Any helpful ideas ?
>>>
>>> Maybe raise a real bug @ bugs.python.org instead of just
>>> mentioning it
>>> like I did: http://bugs.python.org/msg77889
>>>
>>> I think at least some sites would be willing to add the new UA to
>>> their whitelists.

>>
>> I don't think I understand you clearly. Whether or not Google et al
>> whitelist the Python UA isn't a Python issue, is it?
>>

> I'd say it's an issue relevant to Python users, which would seem to put
> it pretty much in the mainstream for c.l.py - especially as the code
> causing concern was written in Python.


I didn't mean to imply that the conversation didn't belong here. I
think that is perfectly appropriate. What I don't understand is the
suggestion that Google's server config should be raised as a bug
against Python. (i.e. "raise a real bug @ bugs.python.org...")




 
Steve Holden
 
      01-13-2009
Philip Semanchuk wrote:
>
> On Jan 13, 2009, at 1:22 AM, Steve Holden wrote:
>
>> Philip Semanchuk wrote:
>>>
>>> On Jan 12, 2009, at 6:48 PM, ajaksu wrote:
>>>
>>>> On Jan 11, 11:59 pm, "James Mills" <(E-Mail Removed)>
>>>> wrote:
>>>>> Hey all,
>>>>>
>>>>> The following fails for me:
>>>>>
>>>>> >>> from urllib2 import urlopen
>>>>> >>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")
>>>>>
>>>>> Traceback (most recent call last):
>>>> [...]
>>>>> Any helpful ideas ?
>>>>
>>>> Maybe raise a real bug @ bugs.python.org instead of just mentioning it
>>>> like I did: http://bugs.python.org/msg77889
>>>>
>>>> I think at least some sites would be willing to add the new UA to
>>>> their whitelists.
>>>
>>> I don't think I understand you clearly. Whether or not Google et al
>>> whitelist the Python UA isn't a Python issue, is it?
>>>

>> I'd say it's an issue relevant to Python users, which would seem to put
>> it pretty much in the mainstream for c.l.py - especially as the code
>> causing concern was written in Python.

>
> I didn't mean to imply that the conversation didn't belong here. I think
> that is perfectly appropriate. What I don't understand is the suggestion
> that Google's server config should be raised as a bug against Python.
> (i.e. "raise a real bug @ bugs.python.org...")
>

Oh, I see! Yes, it's hard to know what actions anyone could take on such
a bug report. I suppose the documentation could be modified to describe
how some services require specific agents, but that wouldn't help a huge
amount.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

 
Falcolas
 
      01-13-2009
On Jan 11, 6:59 pm, "James Mills" <(E-Mail Removed)>
wrote:
> Hey all,
>
> The following fails for me:
>
> >>> from urllib2 import urlopen
> >>> f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")


For what it's worth, I've had a similar problem with urlopen as
well. Using the library's default urlopen results in an error, but if I
build an opener with only the basic handlers, it works just fine.

>>> import urllib2
>>> f = urllib2.urlopen("http://localhost:8000")

Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
f = urllib2.urlopen("http://localhost:8000")
File "C:\Python25\lib\urllib2.py", line 121, in urlopen
return _opener.open(url, data)
File "C:\Python25\lib\urllib2.py", line 380, in open
response = meth(req, response)
File "C:\Python25\lib\urllib2.py", line 491, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python25\lib\urllib2.py", line 418, in error
return self._call_chain(*args)
File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
result = func(*args)
File "C:\Python25\lib\urllib2.py", line 499, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
>>> opener = urllib2.OpenerDirector()
>>> opener.add_handler(urllib2.HTTPHandler())
>>> opener.add_handler(urllib2.HTTPDefaultErrorHandler())
>>> f = opener.open("http://localhost:8000")
>>> f.read()

'something relevant'
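
One possible reason for the difference, offered as a guess since the server on localhost:8000 is not described: the hand-built opener above leaves out several handlers that the default opener installs. Among them is HTTPErrorProcessor, the handler that turns a non-2xx response into an HTTPError (it is the http_response frame in the traceback), so without it a 403 response would be returned rather than raised. The sketch below uses Python 3's urllib.request, the successor to urllib2, and only compares handler lists; it makes no network request. The helper name handler_names is hypothetical.

```python
import urllib.request


def handler_names(opener):
    """Return the class names of the handlers installed on an opener.

    Hypothetical helper for illustration; not part of urllib.
    """
    return sorted(type(h).__name__ for h in opener.handlers)


# urlopen() falls back to an opener created by build_opener().
default_opener = urllib.request.build_opener()

# The hand-built opener from the post: only two handlers.
bare_opener = urllib.request.OpenerDirector()
bare_opener.add_handler(urllib.request.HTTPHandler())
bare_opener.add_handler(urllib.request.HTTPDefaultErrorHandler())

default_names = handler_names(default_opener)
bare_names = handler_names(bare_opener)

# Handlers present by default but missing from the bare opener.
missing = sorted(set(default_names) - set(bare_names))
print("missing from bare opener:", missing)
```

The exact list printed depends on the Python build and on whether any proxies are configured, so it is worth running on the machine that shows the problem.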
 
ajaksu
 
      01-14-2009
On Jan 13, 1:33 am, Philip Semanchuk <(E-Mail Removed)> wrote:
> I don't think I understand you clearly. Whether or not Google et al
> whitelist the Python UA isn't a Python issue, is it?


Hi, sorry for taking so long to reply.

I imagine it's something akin to Firefox's 'Report broken website':
evangelism.

IMHO, if the PSF *cough* Steve *cough* or individual Python hackers
can contact key sites (such as Wikipedia, groups.google, etc.) the issue
can be solved sooner.

Instead of waiting for each whitelist maintainer to find out we have
a new UA, go out and tell them. A template for such requests could
help those inside e.g. Google to bring the issue to the attention of
the whitelist admins. The community has lots of connections that could
be useful to pass the message along, if only 'led by the nose' to
achieve that.

Hence, the suggestion to raise a bug.

Regards,
Daniel
 
Philip Semanchuk
 
      01-14-2009

On Jan 13, 2009, at 9:42 PM, ajaksu wrote:

> On Jan 13, 1:33 am, Philip Semanchuk <(E-Mail Removed)> wrote:
>> I don't think I understand you clearly. Whether or not Google et al
>> whitelist the Python UA isn't a Python issue, is it?

>
> Hi, sorry for taking so long to reply.
>
> I imagine it's something akin to Firefox's 'Report broken website':
> evangelism.
>
> IMHO, if the PSF *cough* Steve *cough* or individual Python hackers
> can contact key sites (such as Wikipedia, groups.google, etc.) the issue
> can be solved sooner.
>
> Instead of waiting for each whitelist maintainer to find out we have
> a new UA, go out and tell them. A template for such requests could
> help those inside e.g. Google to bring the issue to the attention of
> the whitelist admins. The community has lots of connections that could
> be useful to pass the message along, if only 'led by the nose' to
> achieve that.
>
> Hence, the suggestion to raise a bug.


Gotcha.

In this case I think there is no whitelist. I think Google has a
default accept policy supplemented with a blacklist rather than a
default ban policy mitigated by a whitelist. As evidence I submit the
fact that my user agent of "funny fish" was accepted. In other words,
Google has taken explicit steps to ban agents sending the default
Python UA. Now, if the default UA changed in Python 3.0, maybe the
best thing to do is keep quiet and maybe it will fly under the Google
radar for a while. =)

Cheers
Philip
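
A minimal sketch of the workaround implied above, assuming the 403 really is triggered by the default agent string: send an explicit User-Agent header instead. It is written for Python 3's urllib.request, the successor to urllib2. The helper names default_user_agent and build_request are hypothetical, and "funny fish" is just the agent string mentioned in this post. Nothing is fetched when the block runs.

```python
import urllib.request

# URL from the original post.
FEED_URL = "http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml"


def default_user_agent():
    """Return the User-Agent a fresh opener sends when none is set.

    Hypothetical helper; it reads the opener's default header list.
    """
    return dict(urllib.request.OpenerDirector().addheaders)["User-agent"]


def build_request(url, user_agent):
    """Build a Request carrying an explicit User-Agent header (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request(FEED_URL, "funny fish")
print("default agent:", default_user_agent())
print("custom agent: ", req.get_header("User-agent"))

# To actually fetch the feed (needs network access, and the server
# may behave differently today than it did in this thread):
# with urllib.request.urlopen(req) as f:
#     data = f.read()
```

A header set on the Request takes precedence over the opener's default, so the default Python-urllib agent is not sent for that request.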



 
Steve Holden
 
      01-14-2009
ajaksu wrote:
> On Jan 13, 1:33 am, Philip Semanchuk <(E-Mail Removed)> wrote:
>> I don't think I understand you clearly. Whether or not Google et al
>> whitelist the Python UA isn't a Python issue, is it?

>
> Hi, sorry for taking so long to reply.
>
> I imagine it's something akin to Firefox's 'Report broken website':
> evangelism.
>
> IMHO, if the PSF *cough* Steve *cough* or individual Python hackers
> can contact key sites (such as Wikipedia, groups.google, etc.) the issue
> can be solved sooner.
>
> Instead of waiting for each whitelist maintainer to find out we have
> a new UA, go out and tell them. A template for such requests could
> help those inside e.g. Google to bring the issue to the attention of
> the whitelist admins. The community has lots of connections that could
> be useful to pass the message along, if only 'led by the nose' to
> achieve that.
>
> Hence, the suggestion to raise a bug.
>

OK, but be aware that the PSF doesn't monitor the bugs looking for
actions to take on behalf of the Python user community. In fact we
aren't overtly "political" in this way at all. This doesn't mean it
wouldn't be useful for the PSF to get involved in this role; just that
right now it isn't, and a bug report probably isn't the best way to get
action.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

 