How to get REXML to return items in order??

 
 
ted
 
      09-03-2005
Hi,

I'm new to Ruby and can't figure out why REXML isn't returning the elements
in the order they appear in the document. Here's my code and the document.
Any help appreciated.

Thanks,
Ted

#==============================
# ruby
#==============================
require 'rexml/document'

xml = REXML::Document.new(File.open("test.html"))
xml.elements.each("//span[@class='c5']") do |element|
  puts element
end

#==============================
# the "test.html" file
#==============================
<html>
<body>
<a name="1"/>
<table><tr><td><span class="c5"><b>1st Title</b></span></td></tr></table>
<a name="2"/>
<table><tr><td><span class="c5"><b>2nd Title</b></span></td></tr></table>
<a name="3"/>
<table><tr><td><span class="c5"><b>3rd Title</b></span></td></tr>
</table>
</body>
</html>



 
 
 
 
 
Gavin Kistner
 
      09-03-2005

On Sep 3, 2005, at 3:16 PM, ted wrote:
> I'm new to Ruby and can't figure out why REXML isn't returning the
> elements
> in the order they appear in the document. Here's my code and the
> document.


I confirm the problem. Looks like a bug. If I remove some of the
anchors, it works.
(Off-topic - no need to use empty named anchors in your page - just
use IDs on existing elements instead.)

[Sliver:~/Desktop] gkistner$ cat tmp.rb
code = <<ENDHTML
<html><body>
<a name="1"/>
<table><tr><td><span class="c5"><b>1st Title</b></span></td></tr></table>
<a name="2"/>
<table><tr><td><span class="c5"><b>2nd Title</b></span></td></tr></table>
<a name="3"/>
<table><tr><td><span class="c5"><b>3rd Title</b></span></td></tr></table>
</body></html>
ENDHTML

require 'rexml/document'
xml = REXML::Document.new( code )
xml.elements.each( "//span[@class='c5']" ) do |element|
  puts element
end


[Sliver:~/Desktop] gkistner$ ruby -v tmp.rb
ruby 1.8.2 (2004-12-25) [powerpc-darwin7.7.2]
<span class='c5'><b>3rd Title</b></span>
<span class='c5'><b>1st Title</b></span>
<span class='c5'><b>2nd Title</b></span>



 
 
 
 
 
ted
 
      09-04-2005
Thanks Gavin. Unfortunately I can't remove the anchors. The html is just a
sample of the documents (not my docs) that I'm given to parse. Someone on
IRC mentioned that XPath 1.0 doesn't guarantee the order of elements.



 
 
David A. Black
 
      09-04-2005
Hi --

On Sun, 4 Sep 2005, ted wrote:

> Thanks Gavin. Unfortunately I can't remove the anchors. The html is just a
> sample of the documents (not my docs) that I'm given to parse. Someone on
> IRC mentioned that XPath 1.0 doesn't guarantee the order of elements.


I would be astonished if Sean Russell had combed through the 1.0 spec
to find some loophole that made it plausible to have an iteration not
follow document order. I could be wrong but I think it's more likely
a REXML bug.


David

--
David A. Black
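
For anyone stuck on a REXML that shows this behaviour, one possible workaround is to skip XPath and walk the tree by hand, which visits elements in document order by construction. This is only a minimal sketch, assuming the markup from the first post; `each_element_in_order` is a made-up helper name, not part of REXML.

```ruby
require 'rexml/document'

html = <<ENDHTML
<html>
<body>
<a name="1"/>
<table><tr><td><span class="c5"><b>1st Title</b></span></td></tr></table>
<a name="2"/>
<table><tr><td><span class="c5"><b>2nd Title</b></span></td></tr></table>
<a name="3"/>
<table><tr><td><span class="c5"><b>3rd Title</b></span></td></tr></table>
</body>
</html>
ENDHTML

# Hypothetical helper: depth-first, pre-order walk over child elements.
# Text nodes, comments and the like are skipped.
def each_element_in_order(node, &block)
  node.children.each do |child|
    next unless child.is_a?(REXML::Element)
    block.call(child)
    each_element_in_order(child, &block)
  end
end

doc = REXML::Document.new(html)
spans = []
each_element_in_order(doc) do |el|
  # Same filter as //span[@class='c5'], applied by hand.
  spans << el if el.name == 'span' && el.attributes['class'] == 'c5'
end
spans.each { |el| puts el }
```

The walk relies only on `children`, so it does not go through the XPath code at all. It is slower than a working XPath query on large documents, so it is probably best treated as a stopgap.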



 
 
daz
 
      09-04-2005

Gavin Kistner wrote:
> On Sep 3, 2005, at 3:16 PM, ted wrote:
> > I'm new to Ruby and can't figure out why REXML isn't returning the
> > elements
> > in the order they appear in the document. Here's my code and the
> > document.

>
> I confirm the problem. Looks like a bug. [...]



... and it's fixed in CVS for 1.8.3

If you need this now, you could download the later version here:
http://www.ruby-lang.org/cgi-bin/cvs..._1_8;tarball=1

to e.g. "C:\Ruby\TEMP" then change the lookup path at the top of your script.



$:.unshift('C:/Ruby/TEMP') # for rexml fixes
require 'rexml/document'
xml = REXML::Document.new(DATA)
xml.elements.each("//span[@class='c5']") do |element|
  puts element
end

#-> <span class='c5'><b>1st Title</b></span>
#-> <span class='c5'><b>2nd Title</b></span>
#-> <span class='c5'><b>3rd Title</b></span>

__END__
<html>
<body>
<a name="1"/>
<table><tr><td><span class="c5"><b>1st Title</b></span></td></tr></table>
<a name="2"/>
<table><tr><td><span class="c5"><b>2nd Title</b></span></td></tr></table>
<a name="3"/>
<table><tr><td><span class="c5"><b>3rd Title</b></span></td></tr>
</table>
</body>
</html>


daz
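
When swapping libraries through the load path like this, it can help to check which copy of REXML was actually picked up. A small sketch, assuming the same `C:/Ruby/TEMP` location as above (left commented out here so the snippet runs anywhere); `loaded_from` is just an illustrative variable name.

```ruby
# $:.unshift('C:/Ruby/TEMP')   # uncomment to prefer the downloaded copy
require 'rexml/document'

# $LOADED_FEATURES records every file pulled in by require.
loaded_from = $LOADED_FEATURES.find { |path| path =~ %r{rexml/document\.rb\z} }
puts "REXML #{REXML::VERSION} loaded from #{loaded_from}"
```

If the reported path is still inside the standard Ruby install, the `unshift` line is either missing or points at the wrong directory.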



 
 
ted
 
      09-04-2005
Thanks daz.



 
 
Dan Kohn
 
      09-15-2005
I just wanted to mention that I encountered the same bug and that the
new version of the library fixed it for me. Thank you very much for
the clear instructions. If only for-pay products had support that was
this good....

- dan
--
Dan Kohn
<http://www.dankohn.com/> <tel:+1-415-233-1000>

 
 
Dan Kohn
 
      09-15-2005
Daz, there's a bug in the CVS version of REXML. The following code
produces the error below, but works perfectly with the default 1.8.2
REXML (i.e., when I comment out the first line).

>ruby rexmlbug.rb

C:/Dan/dev/rexml/xpath_parser.rb:157:in `expr': undefined method `delete_if' for nil:NilClass (NoMethodError)
from C:/Dan/dev/rexml/xpath_parser.rb:481:in `d_o_s'
from C:/Dan/dev/rexml/xpath_parser.rb:478:in `each_index'
from C:/Dan/dev/rexml/xpath_parser.rb:478:in `d_o_s'
from C:/Dan/dev/rexml/xpath_parser.rb:469:in `descendant_or_self'
from C:/Dan/dev/rexml/xpath_parser.rb:314:in `expr'
from C:/Dan/dev/rexml/xpath_parser.rb:125:in `match'
from C:/Dan/dev/rexml/xpath_parser.rb:56:in `parse'
from C:/Dan/dev/rexml/xpath.rb:53:in `each'
from rexmlbug.rb:28
>Exit code: 1



$:.unshift('C:/Dan/dev') # for rexml fixes
require "rexml/document"
include REXML
string = <<EOF
<html>
<td class="t4"><a href="javascript:lu('OZ')">OZ</a>
0204 F Class
<a href="/cgi/get?apt:uMl8TIcSlHI*itn/airports/ICN,itn/air/mp">
ICN</a> to <a
href="/cgi/get?apt:uMl8TIcSlHI*itn/airports/LAX,itn/air/mp">
LAX</a></td>
<tr>
<td class="t4"><font color="white">UNITED</font></td>
<td colspan="4" align="right">
<strong>48,164</strong></td>
</tr>
<tr>
<td class="t4"><font color="white">Star
Alliance</font></td>
<td colspan="4" align="right">
<strong>49,072</strong></td>
</tr>
</html>
EOF

doc = Document.new string.gsub!(/\s+|&nbsp;/," ")
array = Array.new
XPath.each( doc, "//td[@colspan='4']/preceding-sibling::td/child::*") { |cell|
  array << cell.texts.to_s
}
puts array
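
Until the parser is fixed, the same cells can be collected without going through `xpath_parser.rb` at all, by walking the tree and stepping back through siblings with `previous_element`. This is a rough sketch on a trimmed copy of the table above, not a tested fix for the CVS snapshot; `walk` and `labels` are made-up names.

```ruby
require 'rexml/document'

# Trimmed copy of the table rows from the script above.
xml = <<EOF
<html>
<tr>
<td class="t4"><font color="white">UNITED</font></td>
<td colspan="4" align="right"><strong>48,164</strong></td>
</tr>
<tr>
<td class="t4"><font color="white">Star Alliance</font></td>
<td colspan="4" align="right"><strong>49,072</strong></td>
</tr>
</html>
EOF

# Hypothetical helper: yields every element in document order.
def walk(node, &block)
  node.children.each do |child|
    next unless child.is_a?(REXML::Element)
    block.call(child)
    walk(child, &block)
  end
end

doc = REXML::Document.new(xml)
labels = []
walk(doc) do |cell|
  next unless cell.name == 'td' && cell.attributes['colspan'] == '4'
  # Step backwards through the preceding sibling elements by hand.
  sibling = cell.previous_element
  while sibling
    if sibling.name == 'td'
      sibling.children.each do |child|
        labels << child.text if child.is_a?(REXML::Element)
      end
    end
    sibling = sibling.previous_element
  end
end
puts labels
```

Unlike the XPath version, this visits the nearest preceding sibling first, so the order can differ when a row has more than one preceding `td`.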

 
 
daz
 
      09-17-2005

Dan Kohn wrote:
> Daz, there's a bug in the CVS version of REXML.
> The following code produces the error below, but works
> perfectly with the default 1.8.2 REXML [...]


Thanks Dan.

The REXML code in that area looks quite "fluid" and
there are clear warnings to "turn away" (which I heeded).

I've filed a bug report which you might want to check over
and then keep a watch on.

http://www.germane-software.com/proj...exml/ticket/32


Cheers,

daz



 