Velocity Reviews - Computer Hardware Reviews

python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

 
 
Dmitry Arsentiev
      08-15-2012
Hello.

Has anybody run into a problem like this before?
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

When I run scrapy, I get

  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py", line 14, in <module>
    libxml2.HTML_PARSE_NOERROR + \
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'


When I run
python -c 'import libxml2; libxml2.HTML_PARSE_RECOVER'

I get
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

How can I fix it?

Python 2.7
libxml2-python 2.6.9
2.6.11-gentoo-r6


I will be grateful for any help.

DETAILS:

scrapy crawl lgz -o items.json -t json
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 112, in execute
    cmds = _get_commands_dict(inproject)
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 37, in _get_commands_dict
    cmds = _get_commands_from_module('scrapy.commands', inproject)
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 30, in _get_commands_from_module
    for cmd in _iter_command_classes(module):
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 21, in _iter_command_classes
    for module in walk_modules(module_name):
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 65, in walk_modules
    submod = __import__(fullpath, {}, {}, [''])
  File "/usr/local/lib/python2.7/site-packages/scrapy/commands/shell.py", line 8, in <module>
    from scrapy.shell import Shell
  File "/usr/local/lib/python2.7/site-packages/scrapy/shell.py", line 14, in <module>
    from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/__init__.py", line 30, in <module>
    from scrapy.selector.libxml2sel import *
  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/libxml2sel.py", line 12, in <module>
    from .factories import xmlDoc_from_html, xmlDoc_from_xml
  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py", line 14, in <module>
    libxml2.HTML_PARSE_NOERROR + \
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'


 
Dieter Maurer
      08-16-2012
Dmitry Arsentiev writes:

> Has anybody run into a problem like this before?
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
> When I run scrapy, I get
>
>   File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py", line 14, in <module>
>     libxml2.HTML_PARSE_NOERROR + \
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'


Apparently, your versions of "scrapy" and "libxml2" are not compatible.

Check which "libxml2" versions your "scrapy" version works with,
and then install one of them.
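
A quick way to confirm the mismatch before reinstalling anything is to ask the installed bindings which of the expected constants they actually define. The following is only a minimal sketch: the helper names (`missing_attributes`, `check_bindings`) are made up for illustration, and `REQUIRED_NAMES` lists just the two constants visible in the traceback, so the real list that scrapy needs may well be longer.

```python
import importlib

# Constants that the traceback shows scrapy's factories.py reading from libxml2.
# Only these two names appear in the thread; this list is an assumption.
REQUIRED_NAMES = ["HTML_PARSE_RECOVER", "HTML_PARSE_NOERROR"]


def missing_attributes(module, names):
    """Return the entries of `names` that `module` does not define."""
    return [name for name in names if not hasattr(module, name)]


def check_bindings(module_name="libxml2", names=REQUIRED_NAMES):
    """Import `module_name` and list which required names are absent.

    Returns None when the module cannot be imported at all.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return missing_attributes(module, names)
```

Calling `check_bindings()` on the system described above should report at least `HTML_PARSE_RECOVER`, since that is the name the one-line `python -c` test fails on. An empty list would mean the bindings are new enough for these two constants.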

 
personificator@gmail.com
      08-17-2012
I believe ftp://xmlsoft.org/libxml2/libxml2-2.8.0.tar.gz is what you're looking for. Submit a ticket to get the docs updated if you're feeling generous.

On Wednesday, August 15, 2012 7:49:04 AM UTC-5, Dmitry Arsentiev wrote:
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
> [...]


 
Stefan Behnel
      08-18-2012
Dmitry Arsentiev, 15.08.2012 14:49:
> Has anybody run into a problem like this before?
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
> When I run scrapy, I get
>
>   File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py", line 14, in <module>
>     libxml2.HTML_PARSE_NOERROR + \
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
>
> When I run
> python -c 'import libxml2; libxml2.HTML_PARSE_RECOVER'
>
> I get
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
> How can I cure it?
>
> Python 2.7
> libxml2-python 2.6.9
> 2.6.11-gentoo-r6


That version of libxml2 is way too old and doesn't support parsing
real-world HTML. IIRC, that support started with 2.6.21 and was improved
a bit after that.

Get a 2.8.0 installation, as someone pointed out already.

Stefan
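
When comparing the installed version against that threshold, note that dotted versions have to be compared numerically: as plain strings, "2.6.9" sorts after "2.6.21". A minimal sketch of the comparison follows. The helper names are hypothetical, and the 2.6.21 threshold is taken from the "IIRC" above, so treat it as approximate rather than authoritative.

```python
def parse_version(text):
    """Turn a plain dotted version string such as '2.6.9' into a tuple of ints.

    Suffixed versions (for example '2.6.11-gentoo-r6') are not handled.
    """
    return tuple(int(part) for part in text.split("."))


# Threshold recalled from memory in the reply above; an assumption, not a spec.
MIN_HTML_VERSION = parse_version("2.6.21")


def is_new_enough(installed, minimum=MIN_HTML_VERSION):
    """Return True if `installed` is at least `minimum`, compared numerically."""
    return parse_version(installed) >= minimum
```

With these definitions, `is_new_enough("2.6.9")` is false and `is_new_enough("2.8.0")` is true, which matches the advice to upgrade.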


 