urlopen() error

 
 
Tempo
Guest
Posts: n/a
 
      09-08-2006
Hello. I am getting an error and it has me stuck. I think the
best thing I can do is post my code and the error message, and thank
everybody in advance for any help you can give on this issue. Thank you.

#############
Here's the code:
#############

import urllib2
import re
import xlrd
from BeautifulSoup import BeautifulSoup

book = xlrd.open_workbook("ige_virtualMoney.xls")
sh = book.sheet_by_index(0)
rx = 1
for rx in range(sh.nrows):
    u = sh.cell_value(rx, 0)
    page = urllib2.urlopen(u)
    soup = BeautifulSoup(page)
    p = soup.findAll('span', "sale")
    p = str(p)
    p2 = re.findall('\$\d+\.\d\d', p)
    for price in p2:
        print price

######################
Here are the error messages:
######################

Traceback (most recent call last):
  File "E:\Python24\scraper.py", line 16, in -toplevel-
    page = urllib2.urlopen(u)
  File "E:\Python24\lib\urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "E:\Python24\lib\urllib2.py", line 350, in open
    protocol = req.get_type()
  File "E:\Python24\lib\urllib2.py", line 233, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: List

 
 
 
 
 
Rafal Zawadzki
Guest
Posts: n/a
 
      09-08-2006
Tempo wrote:

> Hello. I am getting an error and it has me stuck. I think the
> best thing I can do is post my code and the error message, and thank
> everybody in advance for any help you can give on this issue. Thank you.
>
> #############
> Here's the code:
> #############
>
> import urllib2
> import re
> import xlrd
> from BeautifulSoup import BeautifulSoup
>
> book = xlrd.open_workbook("ige_virtualMoney.xls")
> sh = book.sheet_by_index(0)
> rx = 1
> for rx in range(sh.nrows):
>     u = sh.cell_value(rx, 0)
>     page = urllib2.urlopen(u)
>     soup = BeautifulSoup(page)
>     p = soup.findAll('span', "sale")
>     p = str(p)
>     p2 = re.findall('\$\d+\.\d\d', p)
>     for price in p2:
>         print price


> ValueError: unknown url type: List

^^^^^^^^^^^^^^^^^^^^^^

I don't know xlrd, but:
http://docs.python.org/lib/module-urllib2.html
urlopen( url[, data])
Open the URL url, which can be either a string or a Request object.
data should be a string, which specifies additional data to send to the
server. In HTTP requests, which are the only ones that support data, it
should be a buffer in the format of application/x-www-form-urlencoded, for
example one returned from urllib.urlencode().

What is your _u_?
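
As a rough illustration of the documentation excerpt quoted above, here is a minimal sketch of passing form-encoded data. It is written for Python 3, where urllib2 became urllib.request and urllib.urlencode() moved to urllib.parse; the endpoint and form fields are invented for the example. Nothing is sent over the network, because the sketch only builds the Request object.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and form fields, for illustration only.
url = "http://example.com/search"
form = {"item": "gold", "qty": 2}

# urlencode() produces application/x-www-form-urlencoded text;
# Python 3 expects the request body as bytes, hence the encode().
body = urlencode(form).encode("ascii")

# Building the Request does not touch the network; urlopen(req) would.
req = Request(url, data=body)
print(req.get_method())  # POST, because data is present
print(req.data)
```

Note that the first argument still has to be a proper URL string (or a Request built from one), which is the part that matters for the error in this thread.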
--
Rafał Zawadzki [jid/mail: (E-Mail Removed), skype: blvszcz]
http://glam.pl - used clothes, vintage, secondhand
http://bluszcz.net - my home page
 
 
 
 
 
Paul McNett
Guest
Posts: n/a
 
      09-08-2006
Tempo wrote:
> Hello. I am getting an error and it has me stuck. I think the
> best thing I can do is post my code and the error message, and thank
> everybody in advance for any help you can give on this issue. Thank you.
>
> #############
> Here's the code:
> #############
>
> import urllib2
> import re
> import xlrd
> from BeautifulSoup import BeautifulSoup
>
> book = xlrd.open_workbook("ige_virtualMoney.xls")
> sh = book.sheet_by_index(0)
> rx = 1
> for rx in range(sh.nrows):
>     u = sh.cell_value(rx, 0)
>     page = urllib2.urlopen(u)
>     soup = BeautifulSoup(page)
>     p = soup.findAll('span', "sale")
>     p = str(p)
>     p2 = re.findall('\$\d+\.\d\d', p)
>     for price in p2:
>         print price
>
> ######################
> Here are the error messages:
> ######################
>
> Traceback (most recent call last):
>   File "E:\Python24\scraper.py", line 16, in -toplevel-
>     page = urllib2.urlopen(u)
>   File "E:\Python24\lib\urllib2.py", line 130, in urlopen
>     return _opener.open(url, data)
>   File "E:\Python24\lib\urllib2.py", line 350, in open
>     protocol = req.get_type()
>   File "E:\Python24\lib\urllib2.py", line 233, in get_type
>     raise ValueError, "unknown url type: %s" % self.__original
> ValueError: unknown url type: List


You were expecting u to be a url string like "http://google.com", but it
looks like it is actually a list. I'm not familiar with package xlrd but
cell_value() must be returning a list and not a cell value. Presumably,
the list contains the cell value probably in element 0. Put in a print
statement before your call to urlopen() like:

print u

You'll likely discover your error.

--
Paul McNett
http://paulmcnett.com
http://dabodev.com

 
 
John Machin
Guest
Posts: n/a
 
      09-15-2006

Paul McNett wrote:
> Tempo wrote:
> > Hello. I am getting an error and it has me stuck. I think the
> > best thing I can do is post my code and the error message, and thank
> > everybody in advance for any help you can give on this issue. Thank you.
> >
> > #############
> > Here's the code:
> > #############
> >
> > import urllib2
> > import re
> > import xlrd
> > from BeautifulSoup import BeautifulSoup
> >
> > book = xlrd.open_workbook("ige_virtualMoney.xls")
> > sh = book.sheet_by_index(0)
> > rx = 1
> > for rx in range(sh.nrows):


The above 2 lines should probably be:
for rx in range(1, sh.nrows):
otherwise the likelihood is that a column heading will be treated as
data.
Now read on.
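
To illustrate the point about the heading row, here is a minimal sketch in Python 3. A plain list stands in for the first column of the sheet; the values and the `data_rows` helper are invented for the example and are not taken from the poster's spreadsheet or from xlrd.

```python
# Stand-in for column 0 of the sheet; row 0 is assumed to be a heading.
# These values are invented for illustration.
column = ["Item URL", "http://example.com/a", "http://example.com/b"]
nrows = len(column)  # plays the role of sh.nrows


def data_rows(nrows):
    """Row indexes of the data, skipping the heading in row 0."""
    return range(1, nrows)


for rx in data_rows(nrows):
    print(rx, column[rx])
```

Starting the range at 1 means the heading text is never handed to urlopen().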

> >     u = sh.cell_value(rx, 0)
> >     page = urllib2.urlopen(u)
> >     soup = BeautifulSoup(page)
> >     p = soup.findAll('span', "sale")
> >     p = str(p)
> >     p2 = re.findall('\$\d+\.\d\d', p)
> >     for price in p2:
> >         print price
> >
> > ######################
> > Here are the error messages:
> > ######################
> >
> > Traceback (most recent call last):
> >   File "E:\Python24\scraper.py", line 16, in -toplevel-
> >     page = urllib2.urlopen(u)
> >   File "E:\Python24\lib\urllib2.py", line 130, in urlopen
> >     return _opener.open(url, data)
> >   File "E:\Python24\lib\urllib2.py", line 350, in open
> >     protocol = req.get_type()
> >   File "E:\Python24\lib\urllib2.py", line 233, in get_type
> >     raise ValueError, "unknown url type: %s" % self.__original
> > ValueError: unknown url type: List

>
> You were expecting u to be a url string like "http://google.com", but it
> looks like it is actually a list. I'm not familiar with package xlrd but
> cell_value() must be returning a list and not a cell value. Presumably,
> the list contains the cell value probably in element 0. Put in a print
> statement before your call to urlopen() like:
>
> print u


Sage advice. print repr(u) is in general even better advice.
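
A small sketch of why repr() tends to be more informative than a bare print when debugging a value like u. It is written for Python 3; the sample values and the `describe` helper are made up for the example.

```python
# Hypothetical cell values; not taken from the poster's spreadsheet.
candidates = ["List", " http://example.com/item", "http://example.com/item\n", 42.0]


def describe(value):
    """Return the repr of a value together with its type name."""
    return "%s (%s)" % (repr(value), type(value).__name__)


for value in candidates:
    print(describe(value))
```

The repr shows the quotes, stray whitespace and newlines, and makes it obvious whether the value is a string or a number, all of which a plain print would hide.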

>
> You'll likely discover your error.
>


Just for the record:

1. The xlrd package's Book.Sheet.cell_value() does *not* return lists.
As its docs say, it returns scalars of the following types: unicode,
int, float, str.

2. The error is nothing to do with Python lists, it's all about
malformed URLs. "unknown url type" means it's not one of http, ftp,
file, data, gopher, ...

|>>> x = urllib2.urlopen('List')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "C:\Python24\lib\urllib2.py", line 350, in open
    protocol = req.get_type()
  File "C:\Python24\lib\urllib2.py", line 233, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: List

|>>> x = urllib2.urlopen('GOTCHA')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "C:\Python24\lib\urllib2.py", line 350, in open
    protocol = req.get_type()
  File "C:\Python24\lib\urllib2.py", line 233, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: GOTCHA
|>>>
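
One way to guard against this in a scraper is to check each cell before fetching it. The following is a minimal sketch for Python 3 (urlparse lives in urllib.parse there); the `looks_like_url` helper, the list of allowed schemes and the sample cells are assumptions made for the example, not part of xlrd or urllib.

```python
from urllib.parse import urlparse

# Assumption: only these schemes are worth fetching in this scraper.
ALLOWED_SCHEMES = ("http", "https", "ftp")


def looks_like_url(value):
    """Return True if value is a string with an allowed scheme and a host."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value.strip())
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


# Invented sample cells: a heading, a real-looking URL, a number, an empty cell.
for cell in ["List", "http://example.com/page", 42.0, ""]:
    if looks_like_url(cell):
        print("would fetch:", cell)
    else:
        print("skipping non-URL cell:", repr(cell))
```

A string such as 'List' has no scheme at all, so it is skipped here instead of reaching urlopen() and raising ValueError.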

HTH,
John

 