regexp to list all sentences and sub sentences, with overlapping?

 
 
Tony
Guest
Posts: n/a
 
      11-26-2003
Hello,

Can someone please point me toward a regular expression that goes
through a string and constructs a list of sentences and part sentences,
where words are gradually dropped from the front of the current
sentence? Sound confusing?

Well perhaps an example would help? Given...

"Different countries have different ideas. Merry Christmas to all."

I'd like to output:

Different countries have different ideas.
countries have different ideas.
have different ideas.
different ideas.
Merry Christmas to all.
Christmas to all.
to all.

Is that possible?

Thanks in advance,

Tony
 
 
 
 
 
Jürgen Exner
Guest
Posts: n/a
 
      11-26-2003
Tony wrote:
> Can someone please point me toward a regular expression that goes
> through a string and constructs a list of sentences and part sentences,
> where words are gradually dropped from the front of the current
> sentence? Sound confusing?
>
> Well perhaps an example would help? Given...
>
> "Different countries have different ideas. Merry Christmas to all."
>
> I'd like to output:
>
> Different countries have different ideas.
> countries have different ideas.
> have different ideas.
> different ideas.
> Merry Christmas to all.
> Christmas to all.
> to all.
>
> Is that possible?


Maybe, I don't know.
But I question if REs are the best tool for the job.

Two splits with two nested loops will do quite nicely:

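use strict;
use warnings;

my $s = "Different countries have different ideas. Merry Christmas to all.";
# Split into sentences on the full stop (this drops the period itself).
my @sentences = split /\./, $s;
for my $sentence (@sentences) {
    # split ' ' splits on runs of whitespace and skips leading whitespace,
    # so the second sentence does not start with an empty word.
    my @words = split ' ', $sentence;
    while (@words) {
        print join(' ', @words), "\n";
        shift @words;    # drop the first word
    }
}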

Just replace the print with a push to your result list if you want to have a
list instead.
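A minimal sketch of that push variant, shaped to reproduce the output asked for in the question. The helper name `sentence_tails` is hypothetical, and two details are assumptions read off the requested output rather than anything stated in the thread: the period is put back on each line, and the loop stops once fewer than two words remain.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the sentence tails
# as a list instead of printing them.
sub sentence_tails {
    my ($text) = @_;
    my @result;
    # Split on full stops; the period is dropped here and re-added below.
    for my $sentence (split /\./, $text) {
        my @words = split ' ', $sentence;    # ' ' also skips leading whitespace
        # Assumption: stop at two words, as in the requested output.
        while (@words >= 2) {
            push @result, join(' ', @words) . '.';
            shift @words;
        }
    }
    return @result;
}

my $s = "Different countries have different ideas. Merry Christmas to all.";
print "$_\n" for sentence_tails($s);
```

This treats every "." as a sentence end, so abbreviations or decimal numbers would need extra handling.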

jue


 
 
 
 
 
Andy De Petter
Guest
Posts: n/a
 
      11-26-2003
(Tony) wrote in news::

> Is that possible?


Everything is possible with perl.

my $s = "Different countries have different ideas. Merry Christmas to all.";

while ($s =~ m/\s/) {
    print $s."\n";
    $s =~ s/[^\s]+\s(.*)/$1/;
}
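
Note that the loop treats the whole string as one sentence, so after "different ideas." it goes on to print "ideas. Merry Christmas to all.". As a sketch of a single-regexp alternative that respects sentence boundaries, a capture inside a lookahead lets the matches overlap. This is not from the thread, and it assumes that every sentence ends in "." and that only parts of two or more words are wanted.

```perl
use strict;
use warnings;

my $s = "Different countries have different ideas. Merry Christmas to all.";

# (?<!\S)   : only start where the previous character is not a non-space,
#             i.e. at the beginning of a word
# (?=(...)) : capture inside a lookahead, so successive matches may overlap
# [^\s.]+\s+[^.]*\. : a word, whitespace, then everything up to the next
#             period (at least two words, never crossing a sentence end)
my @parts = $s =~ /(?<!\S)(?=([^\s.]+\s+[^.]*\.))/g;

print "$_\n" for @parts;
```

Because the match itself is zero-width, the regexp engine moves on one word start at a time and the captured text is free to overlap the previous capture.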

Hth,

-Andy

--
Andy De Petter - http://www.techos.be/andy - (ROT13)
Expert IT Analyst - Belgacom ANS/NTA/NST - http://www.belgacom.be
"Cogito Ergo Sum - I think, therefore I am."
-- R. Descartes
 
 
Tony
Guest
Posts: n/a
 
      11-27-2003
Very impressive. Thank you very much.

But what is the second "\s" for in: $s =~ s/[^\s]+\s(.*)/$1/;

I've also decided to implement a second loop, and this time drop off
the LAST word each time. Is there a better regexp than the below
(which seems to be working):

$s =~ s/(.*)[\$\s]+.+$/$1/;



Andy De Petter <> wrote in message news:<Xns943F6BDE93314adepetteskynetbe@195.238.3.180>...
> (Tony) wrote in news::
>
> > Is that possible?

>
> Everything is possible with perl.
>
> my $s = "Different countries have different ideas. Merry Christmas to all.";
>
> while ($s =~ m/\s/) {
> print $s."\n";
> $s =~ s/[^\s]+\s(.*)/$1/;
> }

 
 
Andy De Petter
Guest
Posts: n/a
 
      11-27-2003
(Tony) wrote in news::

>
> But what is the second "\s" for in: $s =~ s/[^\s]+\s(.*)/$1/;
>


To check whether there's still a space after a detected word.
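
A small sketch of the difference, comparing the pattern with and without the second `\s` (the variable names are only for illustration). With it, the space after the dropped word is consumed as well, so the remainder does not start with a space.

```perl
use strict;
use warnings;

my $with    = "have different ideas.";
my $without = "have different ideas.";

$with    =~ s/[^\s]+\s(.*)/$1/;    # pattern from the thread
$without =~ s/[^\s]+(.*)/$1/;      # same pattern minus the second \s

print "[$with]\n";       # [different ideas.]
print "[$without]\n";    # [ different ideas.]
```

With the `\s` in place the substitution also simply fails once a single word is left, since there is no space to match.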

> I've also decided to implement a second loop, and this time drop off
> the LAST word each time. Is there a better regexp than the below
> (which seems to be working):
>
> $s =~ s/(.*)[\$\s]+.+$/$1/;


$s =~ s/(.*)\s+[^\s]+\.?/$1/;

(or something like that)
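
For comparison, a sketch of the drop-the-last-word loop with the substitution anchored at the end of the string. The helper name `drop_last_words` is hypothetical, and the assumption is that it is applied to one sentence at a time and that single words are not wanted. Note that inside a bracketed character class `\$` matches a literal dollar sign, not the end of the string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collects the sentence, then the
# sentence minus its last word, and so on while two or more words remain.
sub drop_last_words {
    my ($sentence) = @_;
    my @result;
    while ($sentence =~ /\S\s+\S/) {      # at least two words left
        push @result, $sentence;
        # Remove the last whitespace-separated word, anchored at the end.
        $sentence =~ s/\s+\S+\s*$//;
    }
    return @result;
}

print "$_\n" for drop_last_words("Merry Christmas to all.");
```

The final period goes away together with the last word here; whether it should be re-attached depends on the output wanted.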

-Andy
 