XQuerying material between elements

 
 
patrik.nyman@orient.su.se
 
      04-25-2007
I am marking up the text of old books
and need to be able to present the result page by page.
The problem is that a page break sometimes occurs in the
middle of a paragraph (or of some other element).
See the following example.

<p>I shall not describe it to you, for in-
<lb/>deed I cannot. To delineate the truly aw-
<lb/>ful locality of Trollhättan, would
<lb/>baffle the powers of poetic fancy, and mock
<pb n="15" urn="urn:nbn:se:kb:digark-7886"/>
<lb/>the painter's daring pencil. I ran only af-
<lb/>ford you a faint idea of its characteristic
<lb/>features, and even that will he found
<lb/>arduous. Come, and see it, and you will
<lb/>applaud my modesty.
</p>

<p>[...]
<lb/>of gold." Subscribing to the old Swedish
<lb/>proverb: When it rains down milk, the poor
<lb/>has no spoon," I silently dropped the theme,
<lb/>and would not have rementioned it now,
<pb n="16" urn="urn:nbn:se:kb:digark-7887"/>
<lb/>if I were not anxious to dis-play to you, what
<lb/>an able minister of state I might possibly
<lb/>be, if His Majesty should be pleased to
<lb/>invest me with that honor, which, you
<lb/>know, is as distant from me as the mitre
<lb/>and the slipper of the Pope of Rome.
</p>

Simply extracting the material between the <pb/> elements
gives XML that is not well-formed.

So, is it possible to write an XQuery expression that
can fix this, i.e. 'detect' that the <pb/> occurs in
the middle of another element and take the appropriate
action? The result would have to look something like

<pb n="15" urn="urn:nbn:se:kb:digark-7886"/>
<p rend="noindent">the painter's daring pencil. I ran only af-
<lb/>ford you a faint idea of its characteristic
<lb/>features, and even that will he found
<lb/>arduous. Come, and see it, and you will
<lb/>applaud my modesty.
</p>

<p>[...]
<lb/>of gold." Subscribing to the old Swedish
<lb/>proverb: When it rains down milk, the poor
<lb/>has no spoon," I silently dropped the theme,
<lb/>and would not have rementioned it now,
</p>

Thanks.

 
 
 
 
 
Joseph Kesselman
 
      04-25-2007
I'm sure XQuery can do it, though I'm not sure of the syntax offhand.

In XPath, I would set up a template that matches on p[pb] (a paragraph
that contains a page break) and rewrites it appropriately by first
outputting a p containing the pb's preceding siblings, then the pb,
then a p containing the following siblings. Very straightforward.

--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
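
The split described above (a p holding what precedes the pb, then the pb itself, then a p holding what follows) can also be sketched in XQuery. This is a minimal sketch, not code from the thread: `local:split-at-pb` is a hypothetical helper name, and it assumes the page break is a direct child of the paragraph and that only the first page break in a paragraph needs handling.

```xquery
(: Hypothetical helper: split a paragraph at its first child <pb/>.
   Paragraphs without a page break are returned unchanged. :)
declare function local:split-at-pb($p as element(p)) as node()* {
  let $pb := $p/pb[1]
  return
    if (empty($pb)) then $p
    else (
      (: everything before the page break, keeping the original attributes :)
      <p>{ $p/@*, $pb/preceding-sibling::node() }</p>,
      (: the page break itself, now between the two halves :)
      $pb,
      (: everything after the page break; rend="noindent" follows the
         desired output shown in the first post :)
      <p rend="noindent">{ $pb/following-sibling::node() }</p>
    )
};

(: Small inline sample, made up for illustration :)
local:split-at-pb(
  <p>end of one page
    <pb n="15" urn="urn:nbn:se:kb:digark-7886"/>
    <lb/>start of the next page</p>
)
```

Because both halves are built with element constructors, the result is well-formed by construction. Nested cases (a pb deeper than one level inside the paragraph) would need a recursive variant.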
 
 
 
 
 
Pavel Lepin
 
      04-25-2007
Joseph Kesselman <keshlam-> wrote in
<462f6975$1@kcnews01>:
> > So, is it possible to write an XQuery expression that
> > can fix this, i.e. 'detect' that the <pb/> occurs in
> > the middle of another element and take the appropriate
> > action? The result would have to look something like
>
> I'm sure XQuery can do it, though I'm not sure of the
> syntax offhand.
>
> In XPath, I would set up a template that matches on p[pb]
> (a paragraph that contains a page break) and rewrites it
> appropriately by first outputting a p containing the pb's
> preceding siblings, then the pb, then a p containing the
> following siblings. Very straightforward.


XSLT does indeed seem like a better bet than XQuery in this
case, but if you try to generalize the problem a bit
(multiple page breaks and more than one level of ancestor
elements to be spliced) it gets kinda messy with XSLT1. On
the other hand, an XSLT2 solution would be fairly elegant
thanks to sequences--may FSM touch with his noodly
appendage whoever on XSLT WG came up with those.

--
Pavel Lepin
 
 
Joseph Kesselman
 
      04-25-2007
Joseph Kesselman wrote:
> In XPath,


Meant to write XSLT, obviously. Sigh. Engage mind, THEN put fingers in
gear...

--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
 
 
patrik.nyman@orient.su.se
 
      04-26-2007
Thanks for the replies. I forgot to mention that the texts
are stored in the eXist database, hence the need for XQuery.
This is what I've managed to come up with:

1 <hit>
2 (: Check if the initial <pb> is the child of another element,
3 and print the name of that element. :)
4 {
5 let $i1 := //pb[@urn='urn:nbn:se:kb:digark-7886']
6 return
7 if ($i1[parent::p]) then
8 '<p rend="noindent">'
9 else
10 if ($i1[parent::lg]) then
11 '<lg>'
12 else ()
13 }
14 (: Print the material between the pagebreaks. :)
15 {
16 let $i1 := //pb[@urn='urn:nbn:se:kb:digark-7886'],
17 $i2 := //pb[@urn='urn:nbn:se:kb:digark-7887']
18 for $n in //text()
19 where $n >> $i1 and $n << $i2
20 return $n
21 }
22 (: Check if the final <pb> is the child of another element,
23 and print the name of that element. :)
24 {
25 let $i2 := //pb[@urn='urn:nbn:se:kb:digark-7887']
26 return
27 if ($i2[parent::p]) then
28 '</p>'
29 else
30 if ($i2[parent::lg]) then
31 '</lg>'
32 else ()
33 }
34 </hit>

This works fine, except of course for the 'text()' on line 18,
which outputs only the text and not the text plus markup, which is
what I want.
Switching 'text()' for 'node()' or 'element()' doesn't give the
desired result either, naturally.

Any suggestions are welcome. Thanks.
--
Patrik Nyman

 
 
Pierrick Brihaye
 
      04-26-2007
Out of curiosity, is:

> 1 <hit>
> 2 (: Check if the initial <pb> is the child of another element,
> 3 and print the name of that element. :)
> 4 {
> 5 let $i1 := //pb[@urn='urn:nbn:se:kb:digark-7886']
> 6 return
> 7 if ($i1[parent::p]) then


this:

> 8 '<p rend="noindent">'
> 9 else
> 10 if ($i1[parent::lg]) then


this:

> 11 '<lg>'
> 12 else ()
> 13 }
> 14 (: Print the material between the pagebreaks. :)
> 15 {
> 16 let $i1 := //pb[@urn='urn:nbn:se:kb:digark-7886'],
> 17 $i2 := //pb[@urn='urn:nbn:se:kb:digark-7887']
> 18 for $n in //text()
> 19 where $n >> $i1 and $n << $i2
> 20 return $n
> 21 }
> 22 (: Check if the final <pb> is the child of another element,
> 23 and print the name of that element. :)
> 24 {
> 25 let $i2 := //pb[@urn='urn:nbn:se:kb:digark-7887']
> 26 return
> 27 if ($i2[parent::p]) then


this:

> 28 '</p>'
> 29 else
> 30 if ($i2[parent::lg]) then


and this:

> 31 '</lg>'
> 32 else ()
> 33 }
> 34 </hit>


... supposed to be markup in the resulting sequence?

p.b.
 
 
Joseph Kesselman
 
      04-26-2007
Pierrick Brihaye wrote:
> For my curiosity, is :
> this :
>> 8 '<p rend="noindent">'

> ... supposed to be mark-up in the resulting sequence ?


I certainly hope not, because if so I'd consider it an abuse of XQuery,
akin to trying to hand-construct tags in XSLT.

If the goal is to construct document structure, construct structure, not
text that looks like structure.


--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
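
To illustrate the distinction drawn above, here is a minimal XQuery sketch (not code from the thread) contrasting a string that merely looks like a tag with an element built by a direct element constructor. The variable names are made up for the example.

```xquery
(: Text that looks like structure: this is just an xs:string. :)
let $as-text := concat('<p rend="noindent">', 'some words', '</p>')

(: Actual structure: an element node with an attribute and a text child. :)
let $as-node := <p rend="noindent">some words</p>

return (
  $as-text instance of xs:string,   (: true :)
  $as-node instance of element(p),  (: true :)
  string($as-node/@rend)            (: "noindent" :)
)
```

A string of this kind is typically escaped when the result is serialized as XML, whereas the constructed element comes out as real markup and can be navigated with path expressions such as `$as-node/@rend`.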
 
 
Priscilla Walmsley
 
      04-27-2007
Hi,

How about something like this:

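let $i1 := //pb[@urn='urn:nbn:se:kb:digark-7886'],
    $i2 := //pb[@urn='urn:nbn:se:kb:digark-7887']
return <hit>{
  (: Tail of the paragraph that the first page break interrupts. :)
  if ($i1[parent::p])
  then <p rend="noindent">{$i1/following-sibling::node()}</p>
  else ()
  ,
  (: Whole paragraphs lying between the two page breaks. :)
  for $n in //p
  where $n >> $i1 and $n << $i2
    and not($n/*[. is $i1]) and not($n/*[. is $i2])
  return $n
  ,
  (: Head of the paragraph that the second page break interrupts. :)
  if ($i2[parent::p])
  then <p>{$i2/preceding-sibling::node()}</p>
  else ()
}</hit>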

Hope that helps,
Priscilla

---------------------------------------------
Priscilla Walmsley
Author, XQuery (2007, O'Reilly Media)
http://www.datypic.com
http://www.xqueryfunctions.com
---------------------------------------------
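
For readers who want to reuse the query above for other pages, one possible way is to wrap it in a function that takes the two URNs as parameters. This is only a sketch under stated assumptions, not part of the original post: `local:page` is a hypothetical name, the sample document and its `urn:example:` values are invented, and each URN is assumed to match exactly one pb that is a direct child of a p.

```xquery
(: Hypothetical wrapper around the query above. $root is the node to
   search under; $from-urn and $to-urn identify the two page breaks. :)
declare function local:page(
  $root as node(),
  $from-urn as xs:string,
  $to-urn as xs:string
) as element(hit) {
  let $i1 := $root//pb[@urn = $from-urn],
      $i2 := $root//pb[@urn = $to-urn]
  return <hit>{
    if ($i1[parent::p])
    then <p rend="noindent">{ $i1/following-sibling::node() }</p>
    else (),
    for $n in $root//p
    where $n >> $i1 and $n << $i2
      and not($n/*[. is $i1]) and not($n/*[. is $i2])
    return $n,
    if ($i2[parent::p])
    then <p>{ $i2/preceding-sibling::node() }</p>
    else ()
  }</hit>
};

(: Invented sample: two page breaks, each in the middle of a paragraph. :)
let $sample :=
  <text>
    <p>end of page 14
      <pb n="15" urn="urn:example:15"/>
      <lb/>start of page 15</p>
    <p>a whole paragraph on page 15</p>
    <p>more of page 15
      <pb n="16" urn="urn:example:16"/>
      <lb/>start of page 16</p>
  </text>
return local:page($sample, 'urn:example:15', 'urn:example:16')
```

On this sample the result should be a hit element holding three paragraphs: the tail of the first, the whole second, and the head of the third. Other parent elements, such as the lg mentioned earlier in the thread, are not handled.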

 
 
patrik.nyman@orient.su.se
 
      04-29-2007
On 27 Apr, 19:13, Priscilla Walmsley <nos...@datypic.com> wrote:
> How about something like this:
> [...]


Thanks a lot for this. I cannot test it until Wednesday, but I'll
let you know then.

/Patrik Nyman

 
 
patrik.nyman@orient.su.se
 
      05-03-2007
On 27 Apr, 19:13, Priscilla Walmsley <nos...@datypic.com> wrote:
> How about something like this:
> [...]


Yes, it works, and is much better than my version!
Thanks a lot,
Patrik


 