Help needed with reg exp please

 
 
Aristotle
      09-04-2004
Could you please help me out with regular expressions? I'm trying to
write a Perl script that processes some text, and I'm stuck at the
following:

I need to remove from the text below all words starting and ending with
lower case letters. Words may be followed by a dot "." or not (most are),
and may contain a "-" character:


eg:

---> Apis calc. Carb-v. cham. dendr-pol. halia-lac. hep. lac-leo. lyc.
Med. nat-m. nit-ac. nux-v. OPIUM plat. polys. PULS. rauw. sal-fr.
Sanguis-s sil. sulph. Tarent. tung-met. VERAT. viol-o vio-zinc.
zinc-c.

should yield:

---> Apis Carb-v. Med. OPIUM PULS. Sanguis-s Tarent. VERAT.


i.e. words starting with a capital letter must remain untouched.


I've tried various combinations of reg exp before posting here, but
could not find the right one.
I'd really appreciate your help.
 
Gunnar Hjalmarsson
      09-04-2004
Aristotle wrote:
> Could you please help me out with regular expressions.


<snip>

> I've tried various combinations of reg exp before posting here,


Show us!

And consult e.g. "perldoc perlrequick", if you haven't done so already.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
Aristotle
      09-04-2004
Aristotle wrote:
> I've tried various combinations of reg exp before posting here,

Gunnar Hjalmarsson wrote:
>Show us!


I managed to get the desired effect by using the following code; it
gets the job done, but it looks ugly:

{
$parts[1] =~ s/ ([a-z]+[a-z]) / /g;
$parts[1] =~ s/ ([a-z]+[a-z])./ /g;
$parts[1] =~ s/ ([a-z]+[a-z]) / /g;
$parts[1] =~ s/ ([a-z]+[a-z])./ /g;
$parts[1] =~ s/ ([a-z]) / /g;
$parts[1] =~ s/ ([a-z])./ /g;
}

However that was after trying MANY, MANY exps, eg:

$parts[1] =~ s/([a-z]+[a-z]\.)//g;
$parts[1] =~ s/([a-z]*[a-z]\.)//g;
$parts[1] =~ s/([a-z][a-z]+\-[a-z]\.)//g;
$parts[1] =~ s/([a-z][a-z]+\-.[a-z])//g;
$parts[1] =~ s/([a-z][a-z]+[a-z])//g;

I'm no expert, I did what I could...
If you think you can help, please do so without questioning me.
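
One likely reason the working version above needs repeated passes is the unescaped dot: in a regex, "." matches any character, while "\." matches only a literal period. A minimal sketch of the difference, assuming a short made-up sample held in a hypothetical variable $text (not a name from the thread):

```perl
use strict;
use warnings;

# Hypothetical sample; $text is not a name used in the thread.
my $text = ' dendr-pol. OPIUM ';

# Unescaped dot: "." matches any character, here the hyphen after "dendr",
# so only " dendr-" is replaced and "pol." is left behind for a later pass.
(my $any = $text) =~ s/ ([a-z]+[a-z])./ /g;

# Escaped dot: "\." matches only a literal period, so " dendr-" does not match.
(my $literal = $text) =~ s/ ([a-z]+[a-z])\./ /g;

print "unescaped: [$any]\n";
print "escaped:   [$literal]\n";
```

Neither form handles the hyphenated words in a single pass; the point is only that the two patterns are not equivalent.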
 
Gunnar Hjalmarsson
      09-04-2004
Aristotle wrote:
> Gunnar Hjalmarsson wrote:
>> Aristotle wrote:
>>> need to remove from the text below all words starting and
>>> ending with lower case letters. Words maybe followed by dot "."
>>> or not (most do), and may contain a "-" character:
>>>
>>> eg:
>>>
>>> ---> Apis calc. Carb-v. cham. dendr-pol. halia-lac. hep.
>>> lac-leo. lyc. Med. nat-m. nit-ac. nux-v. OPIUM plat. polys.
>>> PULS. rauw. sal-fr. Sanguis-s sil. sulph. Tarent. tung-met.
>>> VERAT. viol-o vio-zinc. zinc-c.
>>>
>>> should yield:
>>>
>>> ---> Apis Carb-v. Med. OPIUM PULS. Sanguis-s Tarent. VERAT.
>>>
>>> ie words starting with a capital letter must remain untouched.
>>>
>>> I've tried various combinations of reg exp before posting here,
>>
>> Show us!
>
> I managed to get the desired effect by using the following code; it
> gets the job done, but it looks ugly:
>
> {
> $parts[1] =~ s/ ([a-z]+[a-z]) / /g;
> $parts[1] =~ s/ ([a-z]+[a-z])./ /g;
> $parts[1] =~ s/ ([a-z]+[a-z]) / /g;
> $parts[1] =~ s/ ([a-z]+[a-z])./ /g;
> $parts[1] =~ s/ ([a-z]) / /g;
> $parts[1] =~ s/ ([a-z])./ /g;
> }
>
> However that was after trying MANY, MANY exps, eg:
>
> $parts[1] =~ s/([a-z]+[a-z]\.)//g;
> $parts[1] =~ s/([a-z]*[a-z]\.)//g;
> $parts[1] =~ s/([a-z][a-z]+\-[a-z]\.)//g;
> $parts[1] =~ s/([a-z][a-z]+\-.[a-z])//g;
> $parts[1] =~ s/([a-z][a-z]+[a-z])//g;
>
> I'm no expert, i did what i could...
> If you think you can help, please do so without questioning me.


There are all too many lazy people who have no real interest in
learning Perl, and who believe that groups like this one are just free
help desks. I asked you to prove that you are not one of those by
posting code. You need to live with that, whatever you call it, or
else few people are willing to assist.

Anyway, this is one way to do it with one substitution:

s/\s+[a-z][-\w]*[a-z]\.?//g;

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
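
For reference, a small runnable sketch of that substitution applied to the sample from the first post. It assumes the text sits on one line in a hypothetical scalar $text (the thread itself uses $parts[1]):

```perl
use strict;
use warnings;

# Sample text from the first post, joined onto one line.
# $text is a hypothetical variable name; the thread uses $parts[1].
my $text = '---> Apis calc. Carb-v. cham. dendr-pol. halia-lac. hep. '
         . 'lac-leo. lyc. Med. nat-m. nit-ac. nux-v. OPIUM plat. polys. '
         . 'PULS. rauw. sal-fr. Sanguis-s sil. sulph. Tarent. tung-met. '
         . 'VERAT. viol-o vio-zinc. zinc-c.';

# \s+      at least one whitespace character before the word
# [a-z]    first character is a lower-case letter
# [-\w]*   any run of word characters or hyphens
# [a-z]    last character is a lower-case letter
# \.?      optional trailing period
$text =~ s/\s+[a-z][-\w]*[a-z]\.?//g;

print "$text\n";
# prints: ---> Apis Carb-v. Med. OPIUM PULS. Sanguis-s Tarent. VERAT.
```

Because the pattern requires leading whitespace and two lower-case letters, it leaves alone a lower-case word at the very start of the string and any single-letter word. Neither case occurs in the sample.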
 
Gunnar Hjalmarsson
      09-04-2004
Gunnar Hjalmarsson wrote:
> Anyway, this is one way to do it with one substitution:
>
> s/\s+[a-z][-\w]*[a-z]\.?//g;


Should better be:

s/\s*[a-z][-\w]*[a-z]\.?//g;
----^

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
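
A caveat worth checking before relying on the \s* form: since \s* can match nothing, the pattern appears able to start in the middle of a capitalised word. The sketch below shows this on a made-up sample and, as one possible alternative (not something proposed in the thread), tests each whitespace-separated token as a whole. The names $text, $star and $kept are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample with a lower-case word first; not taken from the thread.
my $text = 'calc. Apis cham. Carb-v. nux-v. OPIUM viol-o';

# With \s* the leading whitespace is optional, so a match may begin
# inside a capitalised word ("pis" in "Apis", "arb-v." in "Carb-v.").
(my $star = $text) =~ s/\s*[a-z][-\w]*[a-z]\.?//g;

# Alternative sketch: split on whitespace, drop every token that matches
# the same word pattern from start (\A) to end (\z), and rejoin the rest.
my $kept = join ' ',
    grep { !/\A[a-z][-\w]*[a-z]\.?\z/ }
    split ' ', $text;

print "star: [$star]\n";
print "kept: [$kept]\n";
```

The split/grep/join version normalises all whitespace to single spaces, which may or may not be acceptable for the real data.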
 