Velocity Reviews - Computer Hardware Reviews

string capture regex

 
 
Cheez
Guest
Posts: n/a
 
      01-07-2004
Howdy, newbie to Perl. I want to make a regex that will process a
particular line of text from a large flatfile:

>gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]


I want the regex to:
1. capture the 7 digit number that always follows >gi|
2. then associate that number (in a hash?) with the "words"
Hypothetical, ORF, Yal069wp. These "words" always follow the
"NP_009331.1|" format and end before the "[SC]".

I am a little overwhelmed by all the m// and s/// modifiers. Any nudge
in the right direction about developing a regex would be greatly
appreciated.

I will post my code but it's really lame!

Thanks,
Cheez

=====================
$flatfile = "I.faa";

open(FILE, "$flatfile") || die "Can't open '$flatfile': $!\n";

@test2 = <FILE>;

close(FILE);

foreach (@test2) {

chomp;

$_ =~ s/\W/ /g; # getting rid of non word chunks..not sure it helps

push @newtest, split(/ /);

}

open (FILE, ">parsed.txt") || die "Can't open 'parsed.txt': $!\n";

print FILE "$_\n" for @newtest;

close(FILE);

print scalar(@newtest); # checking that the array is populated
 
 
 
 
 
Sam Holden
Guest
Posts: n/a
 
      01-07-2004
On 6 Jan 2004 16:20:29 -0800, Cheez <(E-Mail Removed)> wrote:
> Howdy, newbie to Perl. I want to make a regex that will process a
> particular line of text from a large flatfile:
>
> >gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]

>
> I want the regex to:
> 1. capture the 7 digit number that always follows >gi|
> 2. then associate that number (in a hash?) with the "words"
> Hypothetical, ORF, Yal069wp. These "words" always follow the
> "NP_009331.1|" format and end before the "[SC]".


my %hash;
while (<>) {
    chomp;
    my (undef, $number, undef, undef, $words) = split /\|/;
    $words =~ s/\s*\[SC\]$//;
    $hash{$number} = $words;
}


>
> I am a little overwhelmed by all the m// and s/// modifiers. Any nudge
> in the right direction about developing a regex would be greatly
> appreciated.


Just use split (which does use a regex but a very simple one).

[snip code]
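
For anyone following along, here is a minimal sketch of what the split approach yields for the sample line. The inline sample data, the made-up sequence line, the `/^>gi\|/` guard that skips non-header lines, and the leading-whitespace trim are all assumptions added for illustration; they are not part of the code above.

```perl
use strict;
use warnings;

# The first line is the sample header from the question; the second is a
# made-up sequence line, assumed here to show why a guard may be useful.
my @lines = (
    ">gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]\n",
    "MVKLTSIAAGVAAIAAGASA\n",
);

my %hash;
for my $line (@lines) {
    chomp $line;
    next unless $line =~ /^>gi\|/;    # assumption: only header lines start with >gi|

    # Fields after the split: ">gi", number, "ref", accession, description
    my (undef, $number, undef, undef, $words) = split /\|/, $line;

    $words =~ s/^\s+//;               # trim the space left after the last "|"
    $words =~ s/\s*\[SC\]$//;         # drop the trailing "[SC]" tag
    $hash{$number} = $words;
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```

Without a guard of some kind, lines that contain no "|" would leave $number undefined and trigger warnings under `use warnings`.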

--
Sam Holden
 
 
 
 
 
Matt Garrish
Guest
Posts: n/a
 
      01-07-2004

"Cheez" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Howdy, newbie to Perl.


You're best not asking the newbie for help. He just doesn't get it... : )

Matt


 
 
Gunnar Hjalmarsson
Guest
Posts: n/a
 
      01-07-2004
Cheez wrote:
> Howdy, newbie to Perl. I want to make a regex that will process a
> particular line of text from a large flatfile:
>
> >gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]

>
> I want the regex to:
> 1. capture the 7 digit number that always follows >gi|
> 2. then associate that number (in a hash?) with the "words"
> Hypothetical, ORF, Yal069wp. These "words" always follow the
> "NP_009331.1|" format and end before the "[SC]".


<snip>

> $flatfile = "I.faa";
>
> open(FILE, "$flatfile") || die "Can't open '$flatfile': $!\n";


Yet another variant - from here you might want to do something like:

my %hash = ();
while (<FILE>) {
    if ( /^>gi\|(\d+)\S+\s+(\w+)\s+(\w+);\s+(\w+)/ ) {
        $hash{$1} = [ $2, $3, $4 ];
    }
}
close FILE;

use Data::Dumper;
print Dumper \%hash;
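
To help with working backwards through a pattern of this kind, here is a sketch that spells it out piece by piece using the /x modifier, which lets a regex carry whitespace and comments. It assumes the header line begins with >gi| exactly as in the sample, and that the description is always two words, a semicolon, then a third word; lines with a different shape simply will not match. The inline $line variable is for illustration only.

```perl
use strict;
use warnings;

my $line = ">gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]";

my %hash;
if ( $line =~ m{
        ^>gi\|      # line starts with the literal text >gi followed by a pipe
        (\d+)       # capture 1: the run of digits, i.e. the gi number
        \S+         # skip the non-space run: the ref and accession fields
        \s+         # the whitespace before the description
        (\w+)       # capture 2: first word
        \s+
        (\w+);      # capture 3: second word, followed by a semicolon
        \s+
        (\w+)       # capture 4: third word
    }x )
{
    $hash{$1} = [ $2, $3, $4 ];    # number => reference to an array of words
}

for my $number ( sort keys %hash ) {
    print "$number: ", join( ", ", @{ $hash{$number} } ), "\n";
}
```

Storing an array reference as the hash value keeps the three words separate, so each one can be reached later as $hash{$number}[0], $hash{$number}[1], and so on.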

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
 
Cheez
Guest
Posts: n/a
 
      01-07-2004
I am really thankful for all of these suggestions. I will try some of
these regexes and get back to you all. Looking at a regex that works
helps me to work backwards (deconstructionist?) to see *why* it worked.
This will be very helpful for not only this task but many more in the
future.

Thanks again everyone,
Cheez

(E-Mail Removed) (Cheez) wrote in message news:<(E-Mail Removed)>...
> Howdy, newbie to Perl. I want to make a regex that will process a
> particular line of text from a large flatfile:

[snip]
 
 
Kris Jenkins
Guest
Posts: n/a
 
      01-08-2004
Sam Holden wrote:
> On 6 Jan 2004 16:20:29 -0800, Cheez <(E-Mail Removed)> wrote:
>
>>Howdy, newbie to Perl. I want to make a regex that will process a
>>particular line of text from a large flatfile:
>>
>>
>>>gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]

>>
>>I want the regex to:
>>1. capture the 7 digit number that always follows >gi|
>>2. then associate that number (in a hash?) with the "words"
>>Hypothetical, ORF, Yal069wp. These "words" always follow the
>>"NP_009331.1|" format and end before the "[SC]".

>
>
> my %hash;
> while (<>) {
> chomp;
> my (undef, $number, undef, undef, $words) = split /\|/;
> $words=~s/\s*\[SC\]$//;
> $hash{$number} = $words;
> }


Just as another option, you could replace:

my (undef, $number, undef, undef, $words) = split /\|/;

With:

my ( $number, $words) = ( split /\|/ )[1,4];
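
A quick sketch to check that the two forms pick out the same fields. The inline $line and the variable names $n1, $w1, $n2, $w2 are made up for the comparison; in the loop above, split works on $_ instead.

```perl
use strict;
use warnings;

my $line = ">gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]";

# Form 1: undef placeholders discard the unwanted fields.
my ( undef, $n1, undef, undef, $w1 ) = split /\|/, $line;

# Form 2: a list slice picks fields 1 and 4 (indexes start at 0).
my ( $n2, $w2 ) = ( split /\|/, $line )[ 1, 4 ];

print "same number\n" if $n1 eq $n2;
print "same words\n"  if $w1 eq $w2;
```

The slice form may be easier to maintain if the wanted fields ever move, since only the index list changes.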

Cheers,
Kris
 
 
 
 