Parse and modify an XML file with REXML

 
 
jeffnyman@gmail.com
 
      10-05-2006
Greetings all.

When processing XML, is there a way to check what the previous and what
the next "rows" are?

That probably makes no sense without context, so here is an example. I
need to find things in the XML based on rules. For example, one rule
might be "find the first 203 that comes after 202." Another rule is
"Find the first 203 that comes before 16." So say I have this:

<variable value="202">
<variable value="203">
<variable value="203">
<variable value="203">
<variable value="203">
<variable value="203">
<variable value="16">

I have to be able to find that the element after 202 is 203. (As
opposed to a situation where a 202 appeared, but the next element was
not 203.) I then have to determine that the element after a given 203
is 16. Then I have to change the value attribute of the first and last
203 elements. So the XML, after applying the rules, would look like
this:

<variable value="202">
<variable value="203First">
<variable value="203">
<variable value="203">
<variable value="203">
<variable value="203Last">
<variable value="16">

The 202 and 16 are essentially bracketers of data, in this case. There
can be many such groups in the XML that look like this.

I know how to parse through XML using XPath or using a stream listener.
I have read the tutorial that comes with REXML. But what I'm not sure
how to do is check for the conditions like I described above. One
thought was I could read the XML into an array because then I get an
enforced "line numbering" with the indexing. So I could check
currentLine - 1 and currentLine + 1. I'm not sure if that is a smart
approach, however.
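The array idea above can be sketched roughly as follows. This is only an illustration under some assumptions: the sample in the post is not well-formed, so a root element (`data`) and self-closing tags are assumed here, and the `sample`, `rows`, and `values` names are made up for the example.

```ruby
require "rexml/document"

# Assumed sample: a root element and self-closing tags are added so that
# REXML can parse it. The real file layout may differ.
sample = <<~XML
  <data>
    <variable value="202"/>
    <variable value="203"/>
    <variable value="203"/>
    <variable value="203"/>
    <variable value="16"/>
  </data>
XML

doc = REXML::Document.new(sample)

# Child elements in document order; the array index acts as the "line number".
rows = doc.root.elements.to_a("variable")

# Snapshot the values first so that edits do not affect later comparisons.
values = rows.map { |row| row.attributes["value"] }

rows.each_with_index do |row, i|
  next unless values[i] == "203"

  if i > 0 && values[i - 1] == "202"
    row.attributes["value"] = "203First"   # 203 directly after a 202
  elsif values[i + 1] == "16"
    row.attributes["value"] = "203Last"    # 203 directly before a 16
  end
end

result = rows.map { |row| row.attributes["value"] }
puts result.inspect
```

A lone 203 sitting between a 202 and a 16 would be marked only as "First" in this sketch; how that case should be handled is not stated in the post.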

Has anyone done something similar in their work?

- Jeff

 
 
 
 
 
Tomasz Wegrzanowski
 
      10-05-2006
On 10/5/06, (E-Mail Removed) wrote:
> Has anyone done something similar in their work?


I think this is fairly similar to what magic/xml does.

The fastest way you can get a solution is by looking at
collection of XQuery use cases reimplemented in magic/xml:
http://zabor.org/taw/magic_xml/xquery_use_cases.html

--
Tomasz Wegrzanowski [ http://t-a-w.blogspot.com/ ]

 
 
 
 
 
Peter Szinek
 
      10-05-2006
(E-Mail Removed) wrote:
> Greetings all.
>
> When processing XML, is there a way to check what the previous and what
> the next "rows" are?


I don't know REXML that well (I'm using Hpricot anyway), but won't the
standard XPath axes (following-sibling, preceding-sibling) help? The
previous node in this case would be self::previous-sibling[1] etc.
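For reference, in XPath 1.0 the axis is named preceding-sibling, and an axis is followed by "::" and a node test (for example `preceding-sibling::*[1]`). A minimal sketch of how the two sibling axes might be used from REXML, assuming a well-formed version of the sample with a made-up root element `data`:

```ruby
require "rexml/document"

# Assumed sample: root element and self-closing tags added for well-formedness.
sample = <<~XML
  <data>
    <variable value="202"/>
    <variable value="203"/>
    <variable value="203"/>
    <variable value="16"/>
  </data>
XML

doc = REXML::Document.new(sample)
first_203 = REXML::XPath.first(doc, '//variable[@value="203"]')

# [1] on a sibling axis selects the nearest node along that axis.
before = REXML::XPath.first(first_203, "preceding-sibling::*[1]")
after  = REXML::XPath.first(first_203, "following-sibling::*[1]")

# REXML elements also expose the same neighbours without XPath.
same_before = first_203.previous_element

puts before.attributes["value"]
puts after.attributes["value"]
```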

HTH,

Peter
http://www.rubyrailways.com


 
 
Pete
 
      10-05-2006
In article <(E-Mail Removed) .com>,
<(E-Mail Removed)> wrote:
>Greetings all.
>
>When processing XML, is there a way to check what the previous and what
>the next "rows" are?
>
>That probably makes no sense without context, so here is an example. I
>need to find things in the XML based on rules. For example, one rule
>might be "find the first 203 that comes after 202." Another rule is
>"Find the first 203 that comes before 16." So say I have this:
>
><variable value="202">
><variable value="203">
><variable value="203">
><variable value="203">
><variable value="203">
><variable value="203">
><variable value="16">
>
>I have to be able to find that the element after 202 is 203. (As
>opposed to a situation where a 202 appeared, but the next element was
>not 203.) I then have to determine that the element after a given 203
>is 16. Then I have to change the value attribute of the first and last
>203 elements. [.....]
>
>Has anyone done something similar in their work?
>
>- Jeff


I've just been playing with a project that looks like it might have
some similarities. I acquired an app that creates an XML representation
of a midifile, and I wanted to add useful info to the XML to help the
human reader (and maybe allow other postprocessing). In particular,
a 'note' in a midifile is begun with a NoteOn event, and ends sometime
later when a corresponding NoteOff appears. I wanted to add an attribute
to each NoteOn element that gave its actual duration. Other elements
that had added attributes could (otherwise) be output again immediately,
but the NoteOns would have to be held until the NoteOff was read, and
as order is important that meant other events might have to wait, too.

(Of course I'm using stream parsing here. The XML-ized midifile can
get pretty long, and I don't like the idea of keeping an entire DOM
tree around. I'm kind of more at home with streams, anyway.)

Essentially I make a list of the elements waiting to be output. Each
object in the list has a 'complete' flag that is set immediately for
most tags, except for NoteOn, which is set complete when the NoteOff
arrives and the duration can be calculated. When the first element
in the list becomes complete, all finished items at the head of the list
are output.
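The holding list described above might look something like this in outline. It is a guess at the shape of the idea, not the poster's code: the `OutputQueue` class, its method names, and the event strings are all invented for illustration.

```ruby
# Hypothetical sketch of an ordered output queue with a 'complete' flag.
class OutputQueue
  Entry = Struct.new(:text, :complete)

  attr_reader :written

  def initialize
    @pending = []   # entries waiting to be output, in arrival order
    @written = []   # stands in for the real output stream
  end

  # Queue an item; returns the entry so the caller can complete it later.
  def push(text, complete: true)
    entry = Entry.new(text, complete)
    @pending << entry
    flush
    entry
  end

  # Mark a held entry as finished, optionally replacing its text.
  def complete(entry, text = entry.text)
    entry.text = text
    entry.complete = true
    flush
  end

  private

  # Output every finished entry at the head of the list, preserving order.
  def flush
    @written << @pending.shift.text while @pending.first&.complete
  end
end

queue = OutputQueue.new
queue.push("Tempo")                              # complete at once, written at once
note = queue.push("NoteOn", complete: false)     # held until its NoteOff arrives
queue.push("Controller")                         # finished, but waits behind NoteOn
held_state = queue.written.dup
queue.complete(note, "NoteOn duration=480")      # releases NoteOn and what follows
puts queue.written.inspect
```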

To keep track of the reading end of things I have Element Handler
objects that can maintain knowledge of the current state (which in the
case of NoteOn/Offs means a fairly large array of references, but for
your purposes would just be the value of the previous 'variable').
I actually wrote an extension to REXML for this that I think is quite
useful, and will publish -- soon, I hope. I don't think that would
be needed for your job, though; a simple 'tag_start' handler (from
REXML::StreamListener) that recognized tag 'variable' should be
adequate.

You'd then just have to note, when you got a '203' whether the
previous was '202' and modify it if so. If not, you'd hold on to
it until the next 'variable'; if that was '16', you'd modify it
and output it, otherwise you'd just output it. You wouldn't even
need a list if there were never any intervening elements.
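A rough sketch of that 'tag_start' approach, assuming no intervening elements and a well-formed version of the sample (a root element `data` and self-closing tags are assumptions). The `VariableMarker` class and its `finish` method are hypothetical names; only `REXML::StreamListener` and `REXML::Document.parse_stream` come from REXML itself.

```ruby
require "rexml/document"
require "rexml/streamlistener"

class VariableMarker
  include REXML::StreamListener

  attr_reader :output

  def initialize
    @output = []      # values in the order they would be written out
    @previous = nil   # value of the previous 'variable' as read
    @held = nil       # a plain 203 waiting to learn what follows it
  end

  def tag_start(name, attrs)
    return unless name == "variable"

    value = attrs["value"]
    release(value)
    if value == "203" && @previous == "202"
      @output << "203First"
    elsif value == "203"
      @held = value             # decide once the next 'variable' is seen
    else
      @output << value
    end
    @previous = value
  end

  # Call after parsing so a trailing held element is not lost.
  def finish
    release(nil)
  end

  private

  def release(next_value)
    return if @held.nil?

    @output << (next_value == "16" ? "203Last" : @held)
    @held = nil
  end
end

sample = <<~XML
  <data>
    <variable value="202"/>
    <variable value="203"/>
    <variable value="203"/>
    <variable value="203"/>
    <variable value="16"/>
  </data>
XML

listener = VariableMarker.new
REXML::Document.parse_stream(sample, listener)
listener.finish
puts listener.output.inspect
```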

Oof! Sorry, that got rather long-winded, and I don't know if it made
any sense, but I hope it's useful.

-- Pete --

--
================================================== ==========================
The address in the header is a Spam Bucket -- don't bother replying to it...
(If you do need to email, replace the account name with my true name.)
 
 
Jeff Nyman
 
      10-06-2006
"Peter Szinek" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> (E-Mail Removed) wrote:
>> Greetings all.
>>
>> When processing XML, is there a way to check what the previous and what
>> the next "rows" are?

>
> I don't know REXML that well (I'm using Hpricot anyway), but won't the
> standard XPath axes (following-sibling, preceding-sibling) help? The
> previous node in this case would be self::previous-sibling[1] etc.


Thanks for the suggestion. This sounds like it might work. I did not see
this in the REXML documentation initially but I see generally how these work
in concept. In practice, it does not seem to work for me.

I have my XML like this (greatly pared down):

<perflog>
<module>
<perfpoints>
<variable name="202G_OrdAdd">
<variable name="203G_OrdUpdate">
....
</perfpoints>
</module>
</perflog>

I tried this:

<code>
require "rexml/document"
include REXML

xml = Document.new(File.open("test.xml"))

events = XPath.match(xml,
  '/perflog/module/perfpoints/variable[@name="203G_OrdUpdate"]'
)

events.each do |event|
  puts XPath.match(event, '[self::preceding-sibling[1](@name, "202G_OrdAdd")]')
end
</code>

In the events iterator, I also tried the following variation:

puts XPath.match(event, 'self::preceding-sibling[1](@name, "202G_OrdAdd")')

I also tried replacing the 'self' with the full node path (i.e.,
"//perflog/module/perfpoints/variable").

I should note I don't get an error when I run the above. I simply get
nothing, so my guess is that I'm using preceding-sibling wrong. I'm guessing
it never feels it found the condition I'm indicating it should be finding.

I did find that I can do this:

puts XPath.match(event, '[self::preceding-sibling::variable[1](@name, "202G_OrdAdd")]')

(Note the "::variable[1]" addition.) Some documentation I found suggests
that this should count backwards and reference the closest preceding
variable sibling. That does seem to work -- to an extent, but I get
everything returned. Meaning I get this in my results:

<variable name = "203G_OrdUpdate">
<variable name = "202G_OrdAdd">

.... but then I get all the other 203's in my XML listed as well. What I'm
trying to do is just return the one 203 that has a preceding sibling that
has the attribute name 202G_OrdAdd.

I'm getting closer, though. Thank you for the suggestion, as this does seem
to be the road I need to be on.

- Jeff


 
 
Ken
 
      10-06-2006
Actually, none of this will work. You can't do what you're trying to do
because preceding-sibling will look at all the preceding siblings. So you'll
find your first 203 gets reported correctly as being "after" 202. But all of
the other 203's in your XML will also say they are after 202 -- because they
are!
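The over-matching described here can be reproduced with a small test. This is a sketch on assumed data: the `perfpoints` root, the self-closing tags, and the `count` values are invented, and the predicate deliberately has no positional filter on the preceding-sibling step.

```ruby
require "rexml/document"

# Assumed sample data; count values are made up to tell the 203s apart.
sample = <<~XML
  <perfpoints>
    <variable name="202G_OrdAdd" count="1"/>
    <variable name="203G_OrdUpdate" count="2"/>
    <variable name="203G_OrdUpdate" count="3"/>
    <variable name="203G_OrdUpdate" count="4"/>
    <variable name="16G_OrdAdd" count="5"/>
  </perfpoints>
XML

doc = REXML::Document.new(sample)

# "Has some preceding sibling named 202G_OrdAdd" is true for every 203 here,
# not just for the one immediately after the 202.
loose = '//variable[@name="203G_OrdUpdate"]' \
        '[preceding-sibling::variable[@name="202G_OrdAdd"]]'

counts = REXML::XPath.match(doc, loose).map { |el| el.attributes["count"] }
puts counts.inspect
```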

If you put a yield statement in your events.each iterator, you'll see what I
mean. It will report the first 203 correctly. The loop will break at that
point because yield will tell you that you have no block. But the point is
when you take out yield, you'll see that your output is all the 203's.

The issue is that you're trying to do two predicates at the same time. That
can work (just have two bracketed groups), but not with how you are trying
to do it in this case. I'd recommend just treating the XML file like a
regular old text file and parse it line by line with regular expressions.
Don't even use an XML parser.
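A line-by-line version of that suggestion might look like the sketch below. It assumes exactly one `variable` tag per line and the attribute layout from the first post; anything else (tags split across lines, different quoting) would need a real parser. The `text`, `pattern`, and `marked` names are made up.

```ruby
# Sample copied from the first post; it is treated as plain text, so the
# unclosed tags do not matter here.
text = <<~TEXT
  <variable value="202">
  <variable value="203">
  <variable value="203">
  <variable value="203">
  <variable value="16">
TEXT

lines = text.lines.map(&:chomp)
pattern = /<variable value="([^"]*)"/

# Value found on each line, or nil when the line has no variable tag.
values = lines.map { |line| line[pattern, 1] }

marked = lines.each_with_index.map do |line, i|
  next line unless values[i] == "203"

  if i > 0 && values[i - 1] == "202"
    line.sub('value="203"', 'value="203First"')
  elsif values[i + 1] == "16"
    line.sub('value="203"', 'value="203Last"')
  else
    line
  end
end

puts marked
```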


 
 
Jeff Nyman
 
      10-06-2006
"Ken" <(E-Mail Removed)> wrote:

> If you put a yield statement in your events.each iterator, you'll see what
> I mean. It will report the first 203 correctly. The loop will break at
> that point because yield will tell you that you have no block. But the
> point is when you take out yield, you'll see that your output is all the
> 203's.


Hmmm. But, you know, you gave me an idea and it does appear to work, at
least when I get out of using my event iterator. Check this out.

If I use this:

XPath.first(xml,
'//variable[@name="203G_OrdUpdate"][following-sibling::variable[1][@name="16G_OrdAdd"]]')

I do get the 203 that appears just before the 16G_OrdAdd. (There are 30
203's in the file and I can tell it's grabbing the right one because each
has a unique count attribute.)

Similarly, I can do this:

XPath.first(xml,
'//variable[@name="203G_OrdUpdate"][preceding-sibling::variable[1][@name="202G_OrdAdd"]]')

That, in turn gets me the first 203 after my 202.

If I change my "first" to "match" then everything comes up just as I want.
So I think my use of the events iterator was throwing me off in terms of
getting my results. It looks like I don't really need to do that. Is the
iterator what you were referring to in terms of this not being workable?
(The "yield" thing kind of threw me off.)
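Putting the two expressions above together with the attribute change from the original question could look like this. It is a sketch on assumed data: the `perfpoints` root, the self-closing tags, and the "First"/"Last" suffixes on the name attribute are assumptions made for the example.

```ruby
require "rexml/document"

# Assumed sample data, pared down to a single 202 ... 16 group.
sample = <<~XML
  <perfpoints>
    <variable name="202G_OrdAdd"/>
    <variable name="203G_OrdUpdate"/>
    <variable name="203G_OrdUpdate"/>
    <variable name="203G_OrdUpdate"/>
    <variable name="16G_OrdAdd"/>
  </perfpoints>
XML

doc = REXML::Document.new(sample)

first_after_202 = '//variable[@name="203G_OrdUpdate"]' \
                  '[preceding-sibling::variable[1][@name="202G_OrdAdd"]]'
last_before_16  = '//variable[@name="203G_OrdUpdate"]' \
                  '[following-sibling::variable[1][@name="16G_OrdAdd"]]'

# Run both queries before changing anything, so renaming cannot affect matching.
firsts = REXML::XPath.match(doc, first_after_202)
lasts  = REXML::XPath.match(doc, last_before_16)

firsts.each { |el| el.attributes["name"] = "203G_OrdUpdateFirst" }
lasts.each  { |el| el.attributes["name"] = "203G_OrdUpdateLast" }

names = doc.root.elements.map { |el| el.attributes["name"] }
puts names.inspect
```

Using `match` rather than `first` means every 202 ... 16 group in the file would be handled in one pass.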

- Jeff


 