urllib2 (py2.6) vs urllib.request (py3)

 
 
R. David Murray
      03-17-2009
mattia <(E-Mail Removed)> wrote:
> Hi all, can you tell me why the urllib.request module (py3) adds extra
> characters (b'fef\r\n and \r\n0\r\n\r\n') in a simple example like the
> following, while urllib2 (py2.6) correctly does not?
>
> py2.6
> >>> import urllib2
> >>> f = urllib2.urlopen("http://www.google.com").read()
> >>> fd = open("google26.html", "w")
> >>> fd.write(f)
> >>> fd.close()
>
> py3
> >>> import urllib.request
> >>> f = urllib.request.urlopen("http://www.google.com").read()
> >>> with open("google30.html", "w") as fd:
> ...     print(f, file=fd)
> ...
> >>>
>
> Opening the two HTML pages in Firefox, I get different results (the
> extra characters mentioned earlier). Why?


The problem isn't a difference between urllib2 and urllib.request; it
is the difference between fd.write and print. This produces the same
result as your first example:


>>> import urllib.request
>>> f = urllib.request.urlopen("http://www.google.com").read()
>>> with open("temp3.html", "wb") as fd:
...     fd.write(f)
...


The "b'....'" is the stringified representation of a bytes object,
which is what read() on the urllib.request response returns in python3.
Note the 'wb', which is a critical difference from the python2.6 case.
If you omit the 'b' in python3, fd.write raises a TypeError, because you
can't write bytes to a file object opened in text mode.
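
To see the 'w' versus 'wb' difference without touching the network, here
is a minimal sketch. The payload value and the temporary file name are
made up for illustration; payload stands in for whatever
urlopen(...).read() returns.

```python
import os
import tempfile

# Hypothetical stand-in for the bytes returned by urlopen(...).read().
payload = b"<html>hello</html>\r\n"

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "page.html")

# Text mode: write() accepts only str, so passing bytes raises TypeError.
try:
    with open(path, "w") as fd:
        fd.write(payload)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc

# Binary mode: the bytes go to disk unchanged.
with open(path, "wb") as fd:
    fd.write(payload)

with open(path, "rb") as fd:
    round_tripped = fd.read()

print("text-mode write failed:", text_mode_error is not None)
print("binary round trip matches:", round_tripped == payload)
```

The binary round trip is what makes the python3 output match the file
written by the python2.6 example.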

The thing to keep in mind is that print converts its argument to string
before writing it anywhere (that's the point of using it), and that
bytes (or buffer) and string are very different types in python3.
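
A small sketch of that conversion, using an in-memory text stream in
place of the real file. The sample bytes and the choice of UTF-8 for
decoding are assumptions for illustration; a real page may use a
different encoding.

```python
import io

# Hypothetical sample; real data would come from urlopen(...).read().
data = b"<html>\r\n"

# print() calls str() on its argument. For a bytes object that yields the
# repr, including the b'...' wrapper and escaped \r\n sequences.
buf = io.StringIO()
print(data, file=buf)
printed = buf.getvalue()

# Decoding is the way to turn bytes into real text (encoding assumed).
decoded = data.decode("utf-8")

print(repr(printed))
print(repr(decoded))
```

So if text is what you want in the file, decode first and then print or
write the resulting string; if you want the raw page, write the bytes in
binary mode.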

--
R. David Murray http://www.bitdance.com

 
R. David Murray
      03-17-2009
mattia <(E-Mail Removed)> wrote:
> On Tue, 17 Mar 2009 10:55:21 +0000, R. David Murray wrote:
>
> > mattia <(E-Mail Removed)> wrote:
> >> Hi all, can you tell me why the urllib.request module (py3) adds extra
> >> characters (b'fef\r\n and \r\n0\r\n\r\n') in a simple example like the
> >> following, while urllib2 (py2.6) correctly does not?
> >>
> >> py2.6
> >> >>> import urllib2
> >> >>> f = urllib2.urlopen("http://www.google.com").read()
> >> >>> fd = open("google26.html", "w")
> >> >>> fd.write(f)
> >> >>> fd.close()
> >>
> >> py3
> >> >>> import urllib.request
> >> >>> f = urllib.request.urlopen("http://www.google.com").read()
> >> >>> with open("google30.html", "w") as fd:
> >> ...     print(f, file=fd)
> >> ...
> >> >>>
> >>
> >> Opening the two HTML pages in Firefox, I get different results (the
> >> extra characters mentioned earlier). Why?
> >
> > The problem isn't a difference between urllib2 and urllib.request; it is
> > the difference between fd.write and print. This produces the same result
> > as your first example:
> >
> > >>> import urllib.request
> > >>> f = urllib.request.urlopen("http://www.google.com").read()
> > >>> with open("temp3.html", "wb") as fd:
> > ...     fd.write(f)
> >
> >
> > The "b'....'" is the stringified representation of a bytes object, which
> > is what read() on the urllib.request response returns in python3. Note
> > the 'wb', which is a critical difference from the python2.6 case. If you
> > omit the 'b' in python3, fd.write raises a TypeError, because you can't
> > write bytes to a file object opened in text mode.
> >
> > The thing to keep in mind is that print converts its argument to string
> > before writing it anywhere (that's the point of using it), and that
> > bytes (or buffer) and string are very different types in python3.
>
> Well... now in the saved file I've got the extra characters "fef" at the
> beginning and "0" at the end...

The 'fef' is reminiscent of a BOM. I don't see any such thing in the
data file produced by my code snippet above. Did you try running that,
or did you modify your code? If the latter, maybe if you post your
exact code I can try to run it and see if I can figure out what is going on.

I'm far from an expert in Unicode issues, by the way. Oh, and I'm running
3.1a1+ from svn, so it is also possible there's been a bug fix of some
sort.
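
Another reading of those characters, offered as an assumption rather
than a diagnosis: a hexadecimal number followed by \r\n before the data,
and \r\n0\r\n\r\n after it, is exactly the framing that HTTP/1.1 chunked
transfer encoding puts around a response body (0xfef would be a chunk
size of 4079 bytes, and the 0 is the terminating chunk). If the body
reaches you with that framing still attached, a rough sketch of removing
it by hand looks like this. The decode_chunked helper and the sample
bytes are hypothetical and not part of urllib; trailers are ignored.

```python
def decode_chunked(raw):
    """Strip HTTP/1.1 chunked transfer framing from a bytes body."""
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # The size line is hexadecimal and may carry ";extension" data.
        size_field = raw[pos:line_end].split(b";")[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            break  # a zero-length chunk marks the end of the body
        start = line_end + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(body)


# Hypothetical sample: two chunks ("hello", " world") plus the terminator.
sample = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
decoded_body = decode_chunked(sample)
print(decoded_body)
```

Normally the HTTP library removes this framing for you, so seeing it in
the saved file would point at the library version rather than at your
code.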

--
R. David Murray http://www.bitdance.com

 