Velocity Reviews - Computer Hardware Reviews

Re: urllib2 FTP Weirdness

 
 
Chris Angelico
 
      01-23-2013
On Thu, Jan 24, 2013 at 7:07 AM, Nick Cash
<(E-Mail Removed)> wrote:
> Python 2.7.3 on linux
>
> This has me fairly stumped. It looks like
> urllib2.urlopen("ftp://some.ftp.site/path").read()
> will either immediately return '' or hang indefinitely. But
> response = urllib2.urlopen("ftp://some.ftp.site/path")
> response.read()
> works fine and returns what is expected. This is only an issue with urllib2, vanilla urllib doesn't do it.
>
> The site I first noticed it on is private, but I can reproduce it with "ftp://ftp2.census.gov/".


Confirmed on 2.6.5 on Windows, fwiw. This is extremely weird. Possibly
it's some kind of race condition??

ChrisA
 
Hans Mulder
 
      01-24-2013
On 24/01/13 00:58:04, Chris Angelico wrote:
> On Thu, Jan 24, 2013 at 7:07 AM, Nick Cash
> <(E-Mail Removed)> wrote:
>> Python 2.7.3 on linux
>>
>> This has me fairly stumped. It looks like
>> urllib2.urlopen("ftp://some.ftp.site/path").read()
>> will either immediately return '' or hang indefinitely. But
>> response = urllib2.urlopen("ftp://some.ftp.site/path")
>> response.read()
>> works fine and returns what is expected. This is only an issue with urllib2, vanilla urllib doesn't do it.
>>
>> The site I first noticed it on is private, but I can reproduce it with "ftp://ftp2.census.gov/".
>
> Confirmed on 2.6.5 on Windows, fwiw. This is extremely weird.


It works fine with 2.7.3 on my Mac.

> Possibly it's some kind of race condition??


If urllib2 is using active mode FTP, then a firewall on your box
could explain what you're seeing. But then, that's why active
mode is hardly used these days.


Hope this helps,

-- HansM
 
Steven D'Aprano
 
      01-24-2013
On Thu, 24 Jan 2013 01:45:31 +0100, Hans Mulder wrote:

> On 24/01/13 00:58:04, Chris Angelico wrote:
>> On Thu, Jan 24, 2013 at 7:07 AM, Nick Cash
>> <(E-Mail Removed)> wrote:
>>> Python 2.7.3 on linux
>>>
>>> This has me fairly stumped. It looks like
>>> urllib2.urlopen("ftp://some.ftp.site/path").read()
>>> will either immediately return '' or hang indefinitely. But
>>> response = urllib2.urlopen("ftp://some.ftp.site/path")
>>> response.read()
>>> works fine and returns what is expected. This is only an issue with
>>> urllib2, vanilla urllib doesn't do it.
>>>
>>> The site I first noticed it on is private, but I can reproduce it with
>>> "ftp://ftp2.census.gov/".
>>
>> Confirmed on 2.6.5 on Windows, fwiw. This is extremely weird.
>
> It works fine with 2.7.3 on my Mac.
>
>> Possibly it's some kind of race condition??
>
> If urllib2 is using active mode FTP, then a firewall on your box could
> explain what you're seeing. But then, that's why active mode is hardly
> used these days.



Explain please?

I cannot see how the firewall could possibly distinguish between using a
temporary variable or not in these two snippets:

# no temporary variable hangs, or fails
urllib2.urlopen("ftp://ftp2.census.gov/").read()


# temporary variable succeeds
response = urllib2.urlopen("ftp://ftp2.census.gov/")
response.read()



--
Steven
 
 
Cameron Simpson
 
      02-06-2013
On 24Jan2013 04:12, Steven D'Aprano <(E-Mail Removed)> wrote:
| On Thu, 24 Jan 2013 01:45:31 +0100, Hans Mulder wrote:
| > On 24/01/13 00:58:04, Chris Angelico wrote:
| >> Possibly it's some kind of race condition??
| >
| > If urllib2 is using active mode FTP, then a firewall on your box could
| > explain what you're seeing. But then, that's why active mode is hardly
| > used these days.
|
| Explain please?

You do know the difference between active and passive FTP, yes?

| I cannot see how the firewall could possibly distinguish between using a
| temporary variable or not in these two snippets:
|
| # no temporary variable hangs, or fails
| urllib2.urlopen("ftp://ftp2.census.gov/").read()
|
| # temporary variable succeeds
| response = urllib2.urlopen("ftp://ftp2.census.gov/")
| response.read()

Timing. (Let me say I consider this scenario unlikely, very unlikely.
But...)

If the latter is consistently slightly slower then the firewall may be an
issue if active FTP is being used. "Active" FTP requires the FTP server
to connect to you to deliver the data: your end opens a listening TCP
socket and says "get", supplying the socket details.

Really the TCP protocol is supposed to be plenty robust enough for this
not to be timing-dependent - the opening SYN packet will get resent if the
first try doesn't elicit a response.

For this to work over a firewall the firewall must (1) read your FTP
control connection to see the port announcements and then (2) open a
firewall hole to let the FTP server connect in, probably including a
NAT or RDR arrangement to catch the incoming connection and deliver it
to your end. Let us not even consider other NATting firewalls further
upstream with your ISP.
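
As an illustration of the "socket details" a firewall has to read: in active
mode the client announces its listening address with a PORT command whose
argument is six comma-separated decimal numbers (four address octets, then
the port split into high and low bytes, per RFC 959). The sketch below
encodes and decodes that argument. The function names and the sample address
are hypothetical, chosen only for the example.

```python
# Hypothetical helpers illustrating the RFC 959 PORT argument format:
# "h1,h2,h3,h4,p1,p2", where port = p1 * 256 + p2.

def format_port_argument(host, port):
    """Encode a dotted-quad IPv4 address and a TCP port for a PORT command."""
    octets = host.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    if not 0 <= port <= 65535:
        raise ValueError("port out of range")
    high, low = divmod(port, 256)
    return ",".join(octets + [str(high), str(low)])


def parse_port_argument(argument):
    """Decode a PORT argument, as a firewall watching the control
    connection would have to do before opening a hole."""
    fields = [int(field) for field in argument.split(",")]
    if len(fields) != 6:
        raise ValueError("expected six comma-separated numbers")
    host = ".".join(str(field) for field in fields[:4])
    port = fields[4] * 256 + fields[5]
    return host, port


# Sample address and port are made up for the example.
print("PORT " + format_port_argument("192.168.1.10", 50000))
# prints: PORT 192,168,1,10,195,80
```

A NATting firewall would also have to rewrite the address in that argument,
which is part of why this is fragile.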

Active FTP (the original FTP mode) is horrible. Passive FTP is more
conventional: the server listens and you connect to fetch the file. But
it still requires the server to accept connections on multiple ports;
ugh.

I hate FTP and really don't understand why it is still in common use.
--
Cameron Simpson <(E-Mail Removed)>

To be positive: To be mistaken at the top of one's voice.
Ambrose Bierce (1842-1914), U.S. author. The Devil's Dictionary (1881-1906).
 
 
Steven D'Aprano
 
      02-07-2013
On Thu, 07 Feb 2013 10:06:32 +1100, Cameron Simpson wrote:

> | I cannot see how the firewall could possibly distinguish between using
> | a temporary variable or not in these two snippets:
> |
> | # no temporary variable hangs, or fails
> | urllib2.urlopen("ftp://ftp2.census.gov/").read()
> |
> | # temporary variable succeeds
> | response = urllib2.urlopen("ftp://ftp2.census.gov/")
> | response.read()
>
> Timing. (Let me say I consider this scenario unlikely, very unlikely.
> But...)
>
> If the latter is consistently slightly slower


On my laptop, the difference is of the order of 10 microseconds. About
half a million times smaller than the amount of time it takes to open the
connection in the first place.
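
One way to estimate that overhead with the network taken out of the picture
is to time both forms against a stand-in object. This is only a sketch:
fake_urlopen and FakeResponse are hypothetical placeholders, not urllib2,
and the numbers will vary by machine and interpreter.

```python
import timeit


class FakeResponse(object):
    """Hypothetical stand-in for the object urlopen() returns."""

    def read(self):
        return b""


def fake_urlopen(url):
    """Hypothetical stand-in for urllib2.urlopen(); does no network I/O."""
    return FakeResponse()


def chained():
    # No temporary variable.
    return fake_urlopen("ftp://example.invalid/").read()


def with_temporary():
    # Same calls, with the response bound to a name first.
    response = fake_urlopen("ftp://example.invalid/")
    return response.read()


CALLS = 20000
chained_seconds = timeit.timeit(chained, number=CALLS)
temporary_seconds = timeit.timeit(with_temporary, number=CALLS)

print("chained:        %.3f microseconds per call" % (chained_seconds / CALLS * 1e6))
print("with temporary: %.3f microseconds per call" % (temporary_seconds / CALLS * 1e6))
```

Whatever the exact figures, any gap measured this way is tiny next to the
time spent opening the connection.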


> then the firewall may be
> an issue if active FTP is being used. "Active" FTP requires the FTP
> server to connect to you to deliver the data: your end opens a listening
> TCP socket and says "get", supplying the socket details.


If you are thinking that the socket gets closed if the read is delayed
too much, that doesn't explain the results you are getting. The read
succeeds when there is a delay, not when there is no delay. Almost as if
something is saying "oh, the read request came in too soon after the
connection was made, must block".

What can I say? I cannot reproduce the issue you are having. If you can
reproduce it, try again without the firewall. If bypassing the firewall
makes the issue go away, then go and yell at your network admins until
they fix it.


--
Steven
 
 
Cameron Simpson
 
      02-07-2013
On 07Feb2013 02:43, Steven D'Aprano <(E-Mail Removed)> wrote:
| On Thu, 07 Feb 2013 10:06:32 +1100, Cameron Simpson wrote:
| > Timing. (Let me say I consider this scenario unlikely, very unlikely.
| > But...)
| > If the latter is consistently slightly slower
|
| On my laptop, the difference is of the order of 10 microseconds.

Like I said, I do not consider this likely.

| > then the firewall may be
| > an issue if active FTP is being used. "Active" FTP requires the FTP
| > server to connect to you to deliver the data: your end opens a listening
| > TCP socket and says "get", supplying the socket details.
|
| If you are thinking that the socket gets closed if the read is delayed
| too much, that doesn't explain the results you are getting. The read
| succeeds when there is a delay, not when there is no delay. Almost as if
| something is saying "oh, the read request came in too soon after the
| connection was made, must block".

Exactly so.

For active FTP the firewall must accept an _inbound_ connection
from the server. If that connection's opening SYN packet comes in
_before_ the firewall has set up the special purpose rule to accept this
(remember, the fw is not the FTP client) then the firewall will quite
possibly _reject_ the inbound SYN packet, causing the server to see
"connection refused".

client:
    open listening socket for data
    say "GET foo" to server, with socket details
server:
    connect to the socket
    send data...

The firewall's in the middle, watching for the socket details. When it
sees them it must create an inbound forwarding rule to let the server's
inbound DATA connection through to the client. But the server believes
the socket is already available (because it is - the client makes it
before supplying the socket details) and may dispatch the DATA
connection before the firewall gets its rule in place.
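
The ordering above can be mimicked on the loopback interface. This is a
rough sketch for Python 3, not real FTP: there is no control connection and
no firewall, so it cannot reproduce the race, only the sequence of steps
(client listens first, server connects back and sends). The function names
are hypothetical.

```python
import socket
import threading


def server_send(host, port, payload):
    """Server side: connect to the socket the client announced, send data."""
    with socket.create_connection((host, port), timeout=5) as data_conn:
        data_conn.sendall(payload)


def active_mode_transfer(payload):
    """Client side: open the listening socket *before* telling the server."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        listener.settimeout(5)
        host, port = listener.getsockname()

        # Handing (host, port) to the thread stands in for sending the
        # socket details over the control connection.
        server = threading.Thread(target=server_send, args=(host, port, payload))
        server.start()

        conn, _addr = listener.accept()
        chunks = []
        with conn:
            while True:
                chunk = conn.recv(4096)
                if not chunk:          # server closed the data connection
                    break
                chunks.append(chunk)
        server.join()
        return b"".join(chunks)
    finally:
        listener.close()


print(active_mode_transfer(b"hello from the server"))
```

A firewall in the middle would have to notice the announcement and install
its rule somewhere between the client sending the details and the server
calling connect; that window is the one described above.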

| What can I say? I cannot reproduce the issue you are having. If you can
| reproduce it, try again without the firewall. If bypassing the firewall
| makes the issue go away, then go and yell at your network admins until
| they fix it.

If it is a speed thing, it may not be fixable. The fix is to use PASV
mode FTP or better still to avoid FTP entirely. I certainly don't support
active FTP on the firewalls I administer.
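
For anyone who wants to rule the transfer mode in or out, ftplib lets you
choose it explicitly with set_pasv(); its documentation says passive mode is
on by default. The sketch below uses ftplib directly rather than urllib2,
and whether urllib2 behaves the same way is not established in this thread.
list_directory is a hypothetical helper and assumes the server accepts
anonymous logins.

```python
import ftplib


def list_directory(host, path="/", passive=True, timeout=30):
    """Hypothetical helper: log in anonymously and list one directory,
    with the transfer mode chosen explicitly."""
    ftp = ftplib.FTP(timeout=timeout)
    ftp.set_pasv(passive)      # True: client connects out; False: active mode
    try:
        ftp.connect(host)
        ftp.login()            # anonymous login
        ftp.cwd(path)
        return ftp.nlst()
    finally:
        ftp.close()


# Example call (needs network access, so it is left commented out):
# print(list_directory("ftp2.census.gov", passive=True))
```

If the passive call works and the same call with passive=False hangs or is
refused, the firewall is the likely suspect.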

Cheers,
--
Cameron Simpson <(E-Mail Removed)>

Symbol? What's a symbol? This is a rose.
- R.A. MacAvoy, _Tea with the Black Dragon_
 