Velocity Reviews


Chris Angelico 01-23-2013 11:58 PM

Re: urllib2 FTP Weirdness
 
On Thu, Jan 24, 2013 at 7:07 AM, Nick Cash
<nick.cash@npcinternational.com> wrote:
> Python 2.7.3 on linux
>
> This has me fairly stumped. It looks like
> urllib2.urlopen("ftp://some.ftp.site/path").read()
> will either immediately return '' or hang indefinitely. But
> response = urllib2.urlopen("ftp://some.ftp.site/path")
> response.read()
> works fine and returns what is expected. This is only an issue with urllib2; vanilla urllib doesn't do it.
>
> The site I first noticed it on is private, but I can reproduce it with "ftp://ftp2.census.gov/".


Confirmed on 2.6.5 on Windows, fwiw. This is extremely weird. Possibly
it's some kind of race condition??

ChrisA

Hans Mulder 01-24-2013 12:45 AM

Re: urllib2 FTP Weirdness
 
On 24/01/13 00:58:04, Chris Angelico wrote:
> On Thu, Jan 24, 2013 at 7:07 AM, Nick Cash
> <nick.cash@npcinternational.com> wrote:
>> Python 2.7.3 on linux
>>
>> This has me fairly stumped. It looks like
>> urllib2.urlopen("ftp://some.ftp.site/path").read()
>> will either immediately return '' or hang indefinitely. But
>> response = urllib2.urlopen("ftp://some.ftp.site/path")
>> response.read()
>> works fine and returns what is expected. This is only an issue with urllib2; vanilla urllib doesn't do it.
>>
>> The site I first noticed it on is private, but I can reproduce it with "ftp://ftp2.census.gov/".
>
> Confirmed on 2.6.5 on Windows, fwiw. This is extremely weird.


It works fine with 2.7.3 on my Mac.

> Possibly it's some kind of race condition??


If urllib2 is using active mode FTP, then a firewall on your box
could explain what you're seeing. But then, that's why active
mode is hardly used these days.


Hope this helps,

-- HansM

Steven D'Aprano 01-24-2013 04:12 AM

Re: urllib2 FTP Weirdness
 
On Thu, 24 Jan 2013 01:45:31 +0100, Hans Mulder wrote:

> On 24/01/13 00:58:04, Chris Angelico wrote:
>> On Thu, Jan 24, 2013 at 7:07 AM, Nick Cash
>> <nick.cash@npcinternational.com> wrote:
>>> Python 2.7.3 on linux
>>>
>>> This has me fairly stumped. It looks like
>>> urllib2.urlopen("ftp://some.ftp.site/path").read()
>>> will either immediately return '' or hang indefinitely. But
>>> response = urllib2.urlopen("ftp://some.ftp.site/path")
>>> response.read()
>>> works fine and returns what is expected. This is only an issue with
>>> urllib2; vanilla urllib doesn't do it.
>>>
>>> The site I first noticed it on is private, but I can reproduce it with
>>> "ftp://ftp2.census.gov/".
>>
>> Confirmed on 2.6.5 on Windows, fwiw. This is extremely weird.
>
> It works fine with 2.7.3 on my Mac.
>
>> Possibly it's some kind of race condition??
>
> If urllib2 is using active mode FTP, then a firewall on your box could
> explain what you're seeing. But then, that's why active mode is hardly
> used these days.



Explain please?

I cannot see how the firewall could possibly distinguish between using a
temporary variable or not in these two snippets:

# no temporary variable hangs, or fails
urllib2.urlopen("ftp://ftp2.census.gov/").read()


# temporary variable succeeds
response = urllib2.urlopen("ftp://ftp2.census.gov/")
response.read()



--
Steven

Cameron Simpson 02-06-2013 11:06 PM

Re: urllib2 FTP Weirdness
 
On 24Jan2013 04:12, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
| On Thu, 24 Jan 2013 01:45:31 +0100, Hans Mulder wrote:
| > On 24/01/13 00:58:04, Chris Angelico wrote:
| >> Possibly it's some kind of race condition??
| >
| > If urllib2 is using active mode FTP, then a firewall on your box could
| > explain what you're seeing. But then, that's why active mode is hardly
| > used these days.
|
| Explain please?

You do know the difference between active and passive FTP, yes?

| I cannot see how the firewall could possibly distinguish between using a
| temporary variable or not in these two snippets:
|
| # no temporary variable hangs, or fails
| urllib2.urlopen("ftp://ftp2.census.gov/").read()
|
| # temporary variable succeeds
| response = urllib2.urlopen("ftp://ftp2.census.gov/")
| response.read()

Timing. (Let me say I consider this scenario unlikely, very unlikely.
But...)

If the latter is consistently slightly slower, then the firewall may be an
issue if active FTP is being used. "Active" FTP requires the FTP server
to connect to you to deliver the data: your end opens a listening TCP
socket and says "get", supplying the socket details.
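
To make the "socket details" concrete: in RFC 959 the client announces its
listening socket with a PORT command whose argument is six comma-separated
decimal numbers, four for the IPv4 address and two for the port (high byte,
then low byte). The helper below is only an illustrative sketch; the name
port_argument and the sample address are made up for this example and are
not part of urllib2 or ftplib.

```python
def port_argument(host, port):
    """Encode an IPv4 address and TCP port as an RFC 959 PORT argument.

    Hypothetical helper for illustration only.
    """
    octets = host.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("expected a dotted-quad IPv4 address: %r" % (host,))
    if not 0 <= port <= 65535:
        raise ValueError("port out of range: %r" % (port,))
    # The port is split into its high and low bytes.
    return ",".join(octets + [str(port // 256), str(port % 256)])


if __name__ == "__main__":
    # Sample values only; a real client would use its own address and
    # whatever port its listening socket was bound to.
    print("PORT " + port_argument("192.168.1.2", 50000))
```

A firewall that supports active FTP has to read exactly this argument out of
the control connection in order to know which inbound connection to allow.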

Really, TCP is supposed to be robust enough for this not to be a
timing issue: the opening SYN packet will get resent if the first
try doesn't elicit a response.

For this to work over a firewall the firewall must (1) read your FTP
control connection to see the port announcements and then (2) open a
firewall hole to let the FTP server connect in, probably including a
NAT or RDR arrangement to catch the incoming connection and deliver it
to your end. Let us not even consider other NATting firewalls further
upstream with your ISP.

Active FTP (the original FTP mode) is horrible. Passive FTP is more
conventional: the server listens and you connect to fetch the file. But
it still requires the server to accept connections on multiple ports;
ugh.
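
In passive mode the roles are reversed: the client sends PASV and the server
answers with a 227 reply carrying its own listening address in the same
six-number form used by PORT. A minimal sketch of decoding such a reply
follows; parse_pasv_reply is a hypothetical name, and the sample reply text
is invented (the wording around the numbers varies between servers).

```python
import re

# Six comma-separated decimal numbers: h1,h2,h3,h4,p1,p2
_SIX_NUMBERS = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_pasv_reply(reply):
    """Return (host, port) from a '227 ...' PASV reply line.

    Hypothetical helper for illustration only.
    """
    if not reply.startswith("227"):
        raise ValueError("not a 227 reply: %r" % (reply,))
    match = _SIX_NUMBERS.search(reply)
    if match is None:
        raise ValueError("no address found in reply: %r" % (reply,))
    numbers = [int(group) for group in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    # The port is sent as high byte, then low byte.
    port = numbers[4] * 256 + numbers[5]
    return host, port


if __name__ == "__main__":
    sample = "227 Entering Passive Mode (192,168,1,2,195,80)"
    print(parse_pasv_reply(sample))
```

The client then makes an ordinary outbound connection to that host and port,
which is the direction most client-side firewalls already allow.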

I hate FTP and really don't understand why it is still in common use.
--
Cameron Simpson <cs@zip.com.au>

To be positive: To be mistaken at the top of one's voice.
Ambrose Bierce (1842-1914), U.S. author. The Devil's Dictionary (1881-1906).

Steven D'Aprano 02-07-2013 02:43 AM

Re: urllib2 FTP Weirdness
 
On Thu, 07 Feb 2013 10:06:32 +1100, Cameron Simpson wrote:

> | I cannot see how the firewall could possibly distinguish between using
> | a temporary variable or not in these two snippets:
> |
> | # no temporary variable hangs, or fails
> | urllib2.urlopen("ftp://ftp2.census.gov/").read()
> |
> | # temporary variable succeeds
> | response = urllib2.urlopen("ftp://ftp2.census.gov/")
> | response.read()
>
> Timing. (Let me say I consider this scenario unlikely, very unlikely.
> But...)
>
> If the latter is consistently slightly slower


On my laptop, the difference is of the order of 10 microseconds, about
half a million times smaller than the amount of time it takes to open the
connection in the first place.
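
The size of that gap can be estimated without touching the network. The
sketch below times a chained call against the same call made through a
temporary name, using a stand-in object in place of a real urlopen()
response. FakeResponse, open_fake and the other names are invented for this
illustration, and the absolute numbers will vary by machine and Python
version.

```python
import timeit


class FakeResponse(object):
    """Stand-in for the object urlopen() returns; no network involved."""

    def read(self):
        return b""


def open_fake():
    return FakeResponse()


def chained():
    # Equivalent in shape to urlopen(url).read()
    return open_fake().read()


def with_temporary():
    # Equivalent in shape to response = urlopen(url); response.read()
    response = open_fake()
    return response.read()


def per_call_seconds(func, number=100000):
    """Average wall-clock seconds per call of func."""
    return timeit.timeit(func, number=number) / number


if __name__ == "__main__":
    a = per_call_seconds(chained)
    b = per_call_seconds(with_temporary)
    print("chained:        %.3g s/call" % a)
    print("with temporary: %.3g s/call" % b)
    print("difference:     %.3g s/call" % (b - a))
```

This only measures the cost of the extra name binding inside a function. It
says nothing about what urllib2 itself does differently in the two cases.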


> then the firewall may be
> an issue if active FTP is being used. "Active" FTP requires the FTP
> server to connect to you to deliver the data: your end opens a listening
> TCP socket and says "get", supplying the socket details.


If you are thinking that the socket gets closed if the read is delayed
too much, that doesn't explain the results you are getting. The read
succeeds when there is a delay, not when there is no delay. Almost as if
something is saying "oh, the read request came in too soon after the
connection was made, must block".

What can I say? I cannot reproduce the issue you are having. If you can
reproduce it, try again without the firewall. If bypassing the firewall
makes the issue go away, then go and yell at your network admins until
they fix it.


--
Steven

Cameron Simpson 02-07-2013 09:49 PM

Re: urllib2 FTP Weirdness
 
On 07Feb2013 02:43, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
| On Thu, 07 Feb 2013 10:06:32 +1100, Cameron Simpson wrote:
| > Timing. (Let me say I consider this scenario unlikely, very unlikely.
| > But...)
| > If the latter is consistently slightly slower
|
| On my laptop, the difference is of the order of 10 microseconds.

Like I said, I do not consider this likely.

| > then the firewall may be
| > an issue if active FTP is being used. "Active" FTP requires the FTP
| > server to connect to you to deliver the data: your end opens a listening
| > TCP socket and says "get", supplying the socket details.
|
| If you are thinking that the socket gets closed if the read is delayed
| too much, that doesn't explain the results you are getting. The read
| succeeds when there is a delay, not when there is no delay. Almost as if
| something is saying "oh, the read request came in too soon after the
| connection was made, must block".

Exactly so.

For active FTP the firewall must accept an _inbound_ connection
from the server. If that connection's opening SYN packet comes in
_before_ the firewall has set up the special purpose rule to accept this
(remember, the fw is not the FTP client) then the firewall will quite
possibly _reject_ the inbound SYN packet, causing the server to see
"connection refused".

client:
    open listening socket for data
    say "GET foo" to server, with socket details
server:
    connect to the socket
    send data...
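
The same sequence can be acted out on the loopback interface with plain
sockets. This is only a sketch of the connection direction in active mode,
not an FTP implementation: there is no control connection and no firewall in
the path, the "server" is just a thread in the same process, and
active_mode_demo is a made-up name.

```python
import socket
import threading


def active_mode_demo(payload=b"file contents"):
    """Client listens, 'server' connects in and sends the payload."""
    # Client side: open a listening socket for the data connection.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5.0)          # do not wait forever for the server
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    host, port = listener.getsockname()  # the "socket details"

    def server_side():
        # Server side: connect back to the client's socket and send data.
        conn = socket.create_connection((host, port))
        try:
            conn.sendall(payload)
        finally:
            conn.close()

    server = threading.Thread(target=server_side)
    server.start()
    try:
        data_conn, _addr = listener.accept()
        try:
            chunks = []
            while True:
                chunk = data_conn.recv(4096)
                if not chunk:         # server closed the connection
                    break
                chunks.append(chunk)
        finally:
            data_conn.close()
    finally:
        listener.close()
        server.join()
    return b"".join(chunks)


if __name__ == "__main__":
    print(active_mode_demo())
```

Note that the listening socket exists before its address is ever handed out,
so from the connecting side nothing needs to be waited for.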

The firewall's in the middle, watching for the socket details. When it
sees them it must create an inbound forwarding rule to let the server's
inbound DATA connection through to the client. But the server believes
the socket is already available (because it is - the client makes it
before supplying the socket details) and may dispatch the DATA
connection before the firewall gets its rule in place.

| What can I say? I cannot reproduce the issue you are having. If you can
| reproduce it, try again without the firewall. If bypassing the firewall
| makes the issue go away, then go and yell at your network admins until
| they fix it.

If it is a speed thing, it may not be fixable on the firewall. The practical
fix is to use PASV mode FTP or, better still, to avoid FTP entirely. I
certainly don't support active FTP on the firewalls I administer.
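
For anyone who wants to rule active mode in or out, one option is to bypass
urllib2 and fetch the file with ftplib, where the mode can be set
explicitly. The following is a minimal sketch under assumptions: the
function name fetch_passive and its defaults are invented here, and nothing
in this thread establishes which mode urllib2 was actually using. ftplib
already defaults to passive mode, so the set_pasv(True) call only makes the
choice visible.

```python
import ftplib
import io


def fetch_passive(host, path, user="anonymous", passwd="", timeout=30):
    """Download one file over FTP in passive mode and return its bytes.

    Hypothetical helper for illustration only.
    """
    buf = io.BytesIO()
    ftp = ftplib.FTP(host, timeout=timeout)  # connects to the server
    try:
        ftp.login(user, passwd)
        ftp.set_pasv(True)                   # client opens the data connection
        ftp.retrbinary("RETR " + path, buf.write)
    finally:
        ftp.close()
    return buf.getvalue()
```

Usage would look like fetch_passive("ftp2.census.gov", "some/file.txt"),
where the path is a placeholder rather than a file known to exist. Calling
set_pasv(False) instead switches the same code to active mode, which makes
for a direct comparison from behind a given firewall.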

Cheers,
--
Cameron Simpson <cs@zip.com.au>

Symbol? What's a symbol? This is a rose.
- R.A. MacAvoy, _Tea with the Black Dragon_

