Getting kind of abstract text snippets from text nodes

 
 
Andreas W. Wylach
      03-08-2007
Hi everybody,

I am implementing a small search engine that searches for a phrase
in XML text nodes. That part is working fine, but I do not want the
results to show the complete text of each text node. I would like an
abstract-style result list, like the output you get from Google
searches.

For example:

... I am the <b>substring</b> from a complete text node ...

where "substring" is the search term.

The problem is simple (I think): I want to extract every part of the
complete text node where the search term occurs, with the term
highlighted and surrounded by about 30 characters of text.

I found an interesting post, "cut down text", which is almost what I
am looking for, but there the text is simply trimmed by x characters.

Does anybody here have an "elegant" way to solve this, or some hints
that would get me to a solution? I am not able to use regular
expressions (that would be nice, though). My parser is Sablotron, so
I am restricted to the functions it provides (1.0).


Any help is greatly appreciated.


regards,
Andreas W Wylach

 
Joe Kesselman
      03-08-2007
Think about dividing the text into three parts: before your target, the
target itself, and after the target. Process each appropriately. If you
want to report multiple instances within the same block of text, look at
the standard examples of recursive text processing.
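
A minimal XSLT 1.0 sketch of that three-part split, using only core
XPath 1.0 string functions (contains, substring-before, substring-after,
substring, string-length) and no regular expressions. The template name
"snippets", the parameters $term and $width, and the match on every text
node are illustrative assumptions, not code from this thread, and the
sketch has not been tested against Sablotron specifically:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Hypothetical parameters: the search term and the number of
       context characters kept on each side of a hit. -->
  <xsl:param name="term" select="'substring'"/>
  <xsl:param name="width" select="30"/>

  <!-- Assumption: any text node containing the term counts as a hit. -->
  <xsl:template match="/">
    <xsl:for-each select="//text()[contains(., $term)]">
      <p>
        <xsl:call-template name="snippets">
          <xsl:with-param name="text" select="string(.)"/>
        </xsl:call-template>
      </p>
    </xsl:for-each>
  </xsl:template>

  <!-- Emits one snippet for the first occurrence of $term in $text,
       then recurses on the text that follows that occurrence. -->
  <xsl:template name="snippets">
    <xsl:param name="text"/>
    <!-- The empty-term guard prevents endless recursion, because
         contains($text, '') is always true. -->
    <xsl:if test="$term != '' and contains($text, $term)">
      <xsl:variable name="before"
                    select="substring-before($text, $term)"/>
      <xsl:variable name="after"
                    select="substring-after($text, $term)"/>
      <xsl:text>... </xsl:text>
      <!-- Last $width characters before the hit; a start position
           below 1 simply yields the whole of $before. -->
      <xsl:value-of
          select="substring($before, string-length($before) - $width + 1)"/>
      <b><xsl:value-of select="$term"/></b>
      <!-- First $width characters after the hit. -->
      <xsl:value-of select="substring($after, 1, $width)"/>
      <xsl:text> ... </xsl:text>
      <xsl:call-template name="snippets">
        <xsl:with-param name="text" select="$after"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Each recursive call continues with the text after the current hit, so
every occurrence in a text node gets its own snippet. Matching is
case-sensitive, and hits that lie closer together than $width characters
produce overlapping snippets.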


--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
 
Dimitre Novatchev
      03-10-2007




FXSL gives you exactly that (look for testConcordance.xsl).

As first shown here a year and a half ago:


http://www.stylusstudio.com/xsllist/...post00560.html

this was used to create a concordance of the text of the New Testament,
covering every word longer than three characters whose frequency count in
the document does not exceed a given parameter (1280, which in practice
leaves out mainly pronouns).

The code itself is 95 lines long. On a 3GHz, 2GB Pentium IV PC with Saxon
8.6 (the version I used at that time) it needed less than 92 seconds to
produce the complete (huge) concordance. The source XML document,
"ot Ending Spaces.xml", is almost 50,000 (fifty thousand) lines long.

This is just one illustration of what can really be done with XSLT,
dispelling the myths that "XSLT cannot do this or that
efficiently/elegantly".

Hope this helped.


Cheers,
Dimitre Novatchev




 