Get content in a xml element using hpricot

 
 
Bonita
      04-13-2007
Hi


I'm using hpricot to parse the following file.

<item
rdf:about="http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn">
<title>[from morwyn] * HTML for the Conceptually Challenged</title>
<link>http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn</link>
<description>HTML for the Conceptually Challenged. Very basic tutorial,
plainly worded for people who hate to read instructions.</description>
<dc:creator>morwyn</dc:creator>
<dc:date>2006-10-10T07:28:28Z</dc:date>
<dc:subject>html imported webpagedesign</dc:subject>
<taxo:topics>
<rdf:Bag>
<rdf:li resource="http://del.icio.us/tag/imported" />
<rdf:li resource="http://del.icio.us/tag/html" />
<rdf:li resource="http://del.icio.us/tag/webpagedesign" />
</rdf:Bag>
</taxo:topics>
</item>

I'm trying to get the content of <dc:subject> like this:

doc = Hpricot.parse(File.read("965.xhtml"))

(doc/"item").each do |t|

puts (t/"dc:subject").innerTEXT

end

but I got

<dc:subject>html internet tutorial web</dc:subject>

while I only need "html internet tutorial web".

Does anyone know the right function to call?

Thanks

--
Posted via http://www.ruby-forum.com/.

 
kikijump@gmail.com
      04-13-2007


>> puts (t/'dc:subject').text


 
kikijump@gmail.com
      04-13-2007

puts (t/'dc:subject').text

Sorry for the double post, but I shouldn't have copied and pasted the result
directly from irb.
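
For anyone who wants to see the difference between an element's markup and its text content without installing Hpricot, here is a minimal sketch. It uses REXML from Ruby's standard distribution instead of Hpricot, so it is a swapped-in library and not what the posters ran. The `rdf:RDF` wrapper with its namespace declarations, the `SAMPLE` constant, and the `subject_texts` helper are assumptions added for illustration; the thread only shows the `<item>` fragment.

```ruby
require "rexml/document"

# Sample feed: a shortened copy of the <item> from the question, wrapped in a
# hypothetical root element that declares the rdf/dc prefixes so it parses.
SAMPLE = <<~XML
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <item rdf:about="http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn">
      <title>[from morwyn] * HTML for the Conceptually Challenged</title>
      <dc:subject>html imported webpagedesign</dc:subject>
    </item>
  </rdf:RDF>
XML

# Returns the text content of each item's <dc:subject>, without the tags.
def subject_texts(xml)
  doc = REXML::Document.new(xml)
  subjects = []
  doc.root.elements.each("item") do |item|
    subject = item.elements["dc:subject"]
    subjects << subject.text if subject  # skip items that have no subject
  end
  subjects
end

subject_element = REXML::Document.new(SAMPLE).root.elements["item/dc:subject"]
puts subject_element.to_s   # serialized element, tags included
puts subject_element.text   # text content only
puts subject_texts(SAMPLE).inspect
```

The point is the same as in the replies above: printing the element itself gives the markup, while asking for its text gives only the content between the tags.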

 
Billy Hsu
      04-13-2007
Sorry for deleting your text.

Maybe you can try:

puts (t/"dc:subject").text




 