XML Parsing

 
 
Alok Kothari
 
      04-01-2008
Hello,
I am new to XML parsing. Could you kindly tell me what the
problem is with the following code:

import xml.dom.minidom
import xml.parsers.expat
document = """<token pos="nn">Letterman</token><token pos="bez">is</
token><token pos="jjr">better</token><token pos="cs">than</
token><token pos="np">Jay</token><token pos="np">Leno</token>"""



# 3 handler functions
def start_element(name, attrs):
    print 'Start element:', name, attrs
def end_element(name):
    print 'End element:', name
def char_data(data):
    print 'Character data:', repr(data)

p = xml.parsers.expat.ParserCreate()

p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data
p.Parse(document, 1)

OUTPUT:

Start element: token {u'pos': u'nn'}
Character data: u'Letterman'
End element: token

Traceback (most recent call last):
  File "C:/Python25/Programs/eg.py", line 20, in <module>
    p.Parse(document, 1)
ExpatError: junk after document element: line 1, column 33

 
Jason Scheirer
 
      04-01-2008
On Apr 1, 12:42 pm, Alok Kothari <(E-Mail Removed)> wrote:
> ExpatError: junk after document element: line 1, column 33


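Your XML is not well-formed. Don't put line breaks between </ and token>: an end tag must be written as </token>, with no whitespace between </ and the element name.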
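The error position in the traceback (line 1, column 33) is the point right after the first </token>, which suggests a second problem: a well-formed XML document has exactly one root element, and the posted string is a run of sibling elements. Here is a minimal sketch of that, written in Python 3 syntax (an assumption; the thread itself uses Python 2.5). The parse_tokens helper and the <tokens> wrapper element are hypothetical names chosen for illustration.

```python
import xml.parsers.expat

# Tokens from the original post, with each end tag kept on a single line.
FRAGMENT = (
    '<token pos="nn">Letterman</token>'
    '<token pos="bez">is</token>'
    '<token pos="jjr">better</token>'
)

def parse_tokens(text):
    """Return (pos, word) pairs for every <token> element in text."""
    pairs = []
    current = {}  # state for the <token> currently being read

    def start_element(name, attrs):
        if name == "token":
            current["pos"] = attrs.get("pos")
            current["chunks"] = []

    def char_data(data):
        # expat may deliver one text node in several pieces
        if "chunks" in current:
            current["chunks"].append(data)

    def end_element(name):
        if name == "token":
            pairs.append((current["pos"], "".join(current["chunks"])))
            current.clear()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(text, True)
    return pairs

# The bare fragment has several top-level elements, so expat rejects it.
try:
    parse_tokens(FRAGMENT)
except xml.parsers.expat.ExpatError as err:
    print("fragment rejected:", err)

# A line break between "</" and the tag name is rejected as well.
try:
    parse_tokens("<tokens><token>x</\ntoken></tokens>")
except xml.parsers.expat.ExpatError as err:
    print("split end tag rejected:", err)

# Wrapped in a single (hypothetical) <tokens> root, the same text parses.
print(parse_tokens("<tokens>" + FRAGMENT + "</tokens>"))
```

Wrapping the string is only one way around it; if the data really is a stream of fragments, a more lenient parser may be a better fit.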
 
7stud
 
      04-01-2008
On Apr 1, 1:42 pm, Alok Kothari <(E-Mail Removed)> wrote:
> ExpatError: junk after document element: line 1, column 33



I don't know if you are aware of the BeautifulSoup module:


import BeautifulSoup as bs

xml = """<token pos="nn">Letterman</token><token pos="bez">is</
token><token pos="jjr">better</token><token pos="cs">than</
token><token pos="np">Jay</token><token pos="np">Leno</token>"""

doc = bs.BeautifulStoneSoup(xml)

tokens = doc.findAll("token")
for token in tokens:
    for attr in token.attrs:
        print "%s : %s" % attr
    print token.string

--output:--
pos : nn
Letterman
pos : bez
is
pos : jjr
better
pos : cs
than
pos : np
Jay
pos : np
Leno
 
 
Gabriel Genellina
 
      04-02-2008
On Tue, 01 Apr 2008 20:44:41 -0300, 7stud <(E-Mail Removed)>
wrote:

> I don't know if you are aware of the BeautifulSoup module:
>

Or ElementTree:

import xml.etree.ElementTree as ET

doctext = """<tokens><token pos="nn">Letterman</token><token
pos="bez">is</token><token pos="jjr">better</token><token
pos="cs">than</token><token pos="np">Jay</token><token
pos="np">Leno</token></tokens>"""

doc = ET.fromstring(doctext)
for token in doc.findall("token"):
    print 'pos:', token.get('pos')
    print 'text:', token.text
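
Note that doctext above differs from the original string in one important way: everything sits inside a single <tokens> root element. If the input arrives without a root, one option is to add the wrapper in code before parsing. A rough sketch in Python 3 syntax (an assumption; the thread uses Python 2), where tokens_from_fragment and the default root_tag are hypothetical names:

```python
import xml.etree.ElementTree as ET

def tokens_from_fragment(fragment, root_tag="tokens"):
    """Wrap a rootless run of <token> elements and return (pos, text) pairs."""
    # The wrapper element exists only to give the parser a single root.
    doc = ET.fromstring("<%s>%s</%s>" % (root_tag, fragment, root_tag))
    return [(tok.get("pos"), tok.text) for tok in doc.findall("token")]

# A line break between the tag name and an attribute is legal XML.
fragment = '<token pos="np">Jay</token><token\npos="np">Leno</token>'
for pos, text in tokens_from_fragment(fragment):
    print("pos:", pos, "text:", text)
```

This only helps when the fragment is otherwise well-formed; an end tag split as </ plus a line break plus token> would still raise a parse error.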

--
Gabriel Genellina

 