Matching neighbouring words of a pattern using Regex

 
 
CV
 
      08-30-2004
How can I match 'n' number of neighbouring words of a pattern using regular
expressions?

For example, suppose I am looking for the pattern "length xyz cm" in some
text, where xyz is a number (an integer, a fraction, or a decimal). How can
I also grab about 3-5 words on either side of the pattern "length xyz cm"?
The surrounding words are not always constant & may be variable. Also, the
original text to be matched is not just a single sentence, but lines from a
file concatenated together - so the text has many newline characters too. I
only want the words on the same line as the pattern.

I have tried using regex of the form
/\b(\w*)\b(\w*)\b(\w*)\b($pattern)\b(\w*)\b(\w*)\b( \w*), but this doesn't
work for some reason. Could someone please offer some suggestions?

thanks!


 
 
 
 
 
Gunnar Hjalmarsson
 
      08-30-2004
[ Reply not posted to the defunct group comp.lang.perl ]

CV wrote:
> How can I match 'n' number of neighbouring words of a pattern using
> regular expressions?
>
> For example, suppose I am looking for the pattern "length xyz cm"
> in some text. where xyz is a number - integer or fraction or
> decimal point. How can I also grab about 3-5 words on either side
> of the pattern "length xyz cm"? The surrounding words are not
> always constant & may be variable. Also, the original text to be
> matched is not just a single sentence, but lines from a file
> concatenated together - so the text has many newline characters
> too. I only want the words on the same line as the pattern.
>
> I have tried using regex of the form
> /\b(\w*)\b(\w*)\b(\w*)\b($pattern)\b(\w*)\b(\w*)\b( \w*), but this
> doesn't work for some reason.


It doesn't work for several reasons, such as:

- No space characters.
- '\w*\b\w*' is an impossible combination that can never match (check
out the description of \b in "perldoc perlre" to learn why).
- The \w character class does not include e.g. the '$' character,
while you mentioned that a "word" may be a variable.

> Could someone please offer some suggestions?


Try something like this:

/((?:\S+ +){0,3})\b($pattern)\b((?: +\S+){0,3})/
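A minimal sketch of how the regex above might be used. The definition of $pattern (an integer, decimal, or simple fraction between "length" and "cm") and the sample text are assumptions made for illustration, not something the OP supplied:

```perl
use strict;
use warnings;

# Assumed pattern for "length xyz cm": integer, decimal, or simple fraction.
my $pattern = qr{length \d+(?:[./]\d+)? cm};

# Made-up sample: several lines joined with newlines.
my $text = "first line has nothing\n"
         . "the measured total length 12.5 cm was recorded by hand today\n"
         . "last line";

my ($before, $match, $after);
if ($text =~ /((?:\S+ +){0,3})\b($pattern)\b((?: +\S+){0,3})/) {
    ($before, $match, $after) = ($1, $2, $3);
    print "before: '$before'\n";   # up to 3 words left of the match
    print "match:  '$match'\n";
    print "after:  '$after'\n";    # up to 3 words right of the match
}
```

Because \S+ and the literal space never match a newline, the captured neighbours stay on the same line as the pattern. Words separated by tabs would not be picked up, and {0,3} can be raised to {0,5} if more context is wanted.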

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
 
 
 
 
Tad McClellan
 
      08-30-2004
Gunnar Hjalmarsson wrote:

> - '\w*\b\w*' is an impossible combination that can never match



It will match any string with at least one \w character in it:

$_ = 'hi';
print "matched '$&'\n" if /\w*\b\w*/;


> (check
> out the description of \b in "perldoc perlre" to learn why).



Check out this part too:

... counting the imaginary characters off the
beginning and end of the string as matching a \W




\W could be the beginning of string in the OP's regex.
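To see what the OP's regex actually captures, here is a small sketch. The stand-in $pattern and the sample line are assumptions for illustration only:

```perl
use strict;
use warnings;

my $pattern = qr{length \d+ cm};   # assumed stand-in for the OP's $pattern
my $line    = "the total length 12 cm was measured today";  # made-up sample

# The OP's regex, unchanged: every \b is satisfied at the same position,
# so the (\w*) groups can only match empty strings next to $pattern.
my @caps = $line =~ /\b(\w*)\b(\w*)\b(\w*)\b($pattern)\b(\w*)\b(\w*)\b( \w*)/;
print join('|', map { "'$_'" } @caps), "\n";
```

The regex does match, but since nothing in it allows for the spaces between words, the groups around $pattern come back empty. Only the last group, which includes a literal space, picks up a neighbouring word.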


--
Tad McClellan SGML consulting
Perl programming
Fort Worth, Texas
 
 
Charles DeRykus
 
      08-31-2004
CV wrote:
>How can I match 'n' number of neighbouring words of a pattern using regular
>expressions?
>
>For example, suppose I am looking for the pattern "length xyz cm" in some
>text. where xyz is a number - integer or fraction or decimal point. How can
>I also grab about 3-5 words on either side of the pattern "length xyz cm"?
>The surrounding words are not always constant & may be variable. Also, the
>original text to be matched is not just a single sentence, but lines from a
>file concatenated together - so the text has many newline characters too. I
>only want the words on the same line as the pattern.
>
>I have tried using regex of the form
>/\b(\w*)\b(\w*)\b(\w*)\b($pattern)\b(\w*)\b(\w*)\b( \w*), but this doesn't
>work for some reason. Could someone please offer some suggestions?
>


You may be confused about the \b assertion. Did you intend for
something with \w and \W..? Also, what if the pattern falls
at the beginning or end of the line... do you want to capture
the patterns that may not have 3-5 surrounding words?

One possibility presuming you intend to capture 3-5 surrounding
words:


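my $text = "...";
my $pattern = 'length ... cm ';

# Each "word" is \w+ followed by non-word characters other than newline,
# so the surrounding words never cross a line boundary.
my $words = '(?:\w+[^\w\n]+){3,5}';
#my $words = '(?:\w+[^\w\n]+){0,5}'; # to catch every pattern

# Bind the match to $text; a bare /.../ would match against $_ instead.
print "$1\n" while $text =~ /($words$pattern$words)/g;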


[ Note the 3-5 surrounding words may consume another
adjacent $pattern instance but you don't specify what
to do in that case. ]
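As a rough sketch of the approach above with the elided parts filled in by assumption: $number below is a guess at "integer or fraction or decimal", the sample text is made up, and the {0,5} variant of $words is used so that patterns near the start or end of a line are still caught.

```perl
use strict;
use warnings;

# Assumed number forms: integer, decimal, or simple fraction (12, 12.5, 3/4).
my $number  = '\d+(?:[./]\d+)?';
my $pattern = "length $number cm ";    # trailing space kept, as in the post
my $words   = '(?:\w+[^\w\n]+){0,5}';  # up to 5 words, never crossing a newline

# Made-up sample: lines concatenated with newlines.
my $text = "intro line\n"
         . "we found that the total length 3/4 cm was measured by hand today\n"
         . "another line with length 10 cm here\n";

# Collect every match together with its neighbouring words.
my @hits;
push @hits, $1 while $text =~ /($words$pattern$words)/g;
print "[$_]\n" for @hits;
```

With this definition of $words, the last word on a line is not captured, because each word must be followed by at least one non-word character other than a newline. Likewise the trailing space in $pattern means a pattern sitting at the very end of a line would be missed.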


hth,
--
Charles DeRykus
 
 
Gunnar Hjalmarsson
 
      08-31-2004
Tad McClellan wrote:
> Gunnar Hjalmarsson wrote:
>>
>> - '\w*\b\w*' is an impossible combination that can never match

>
> It will match any string with at least one \w character in it:
>
> $_ = 'hi';
> print "matched '$&'\n" if /\w*\b\w*/;
>
>> (check
>> out the description of \b in "perldoc perlre" to learn why).

>
> Check out this part too:
>
> ... counting the imaginary characters off the
> beginning and end of the string as matching a \W
>
>
>
> \W could be the beginning of string in the OP's regex.


Thanks, Tad, I stand corrected (even if it doesn't do what the OP
wanted it to do...).

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
 
 
 