Help: Show specific part

 
 
Amy Lee
08-15-2008
Hello,

I'm a newbie in Perl and do some work in bioinformatics. I wrote a tiny
script to show the sequences. However, I have a problem with the next
processing step.

My output looks like this.
>xxx
IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
>yyy
gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
gccaagttcgatccggaaatagtccttgagttttatgccaatgcttggccaacagaggag
>zzz
EGDAHAVSSTPAWVKPQQTPHGTHQYAQHHPSFSAHAGNASSST
PVQPKAPTQREAPQVPTPNTTRPAGNSNTTRNFPPRPLPEFTPLPMTYEDLLPSLIANHL
AVVTPGRVLEPPFPKWYDPNATCKYHGGVPGHSVEKCLALKYKVQHLMDAGWLTFQEDRP
NVRTNPLANHGGGAVNAVESD
>qqq
tggaagccgcagaagaatcgttagaaactgctttccag
tcttttgaggtggtcagcatttcctccgtggactccctctttgggcaaccttgtctgtcc
gatgcagcggtaatgatggcccgagttatgttggggaacggttttgaacccgggatgggt
ttagaaaaaaacaacggcggcataactagc

I would like to save all the protein sequences, together with their
tags (>blahblah), into a file called "protein", and the DNA sequences
into a file called "dna".

So the "protein" file would be
>xxx
IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
>zzz
EGDAHAVSSTPAWVKPQQTPHGTHQYAQHHPSFSAHAGNASSST
PVQPKAPTQREAPQVPTPNTTRPAGNSNTTRNFPPRPLPEFTPLPMTYEDLLPSLIANHL
AVVTPGRVLEPPFPKWYDPNATCKYHGGVPGHSVEKCLALKYKVQHLMDAGWLTFQEDRP
NVRTNPLANHGGGAVNAVESD

"dna" is
>yyy
gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
gccaagttcgatccggaaatagtccttgagttttatgccaatgcttggccaacagaggag
>qqq
tggaagccgcagaagaatcgttagaaactgctttccag
tcttttgaggtggtcagcatttcctccgtggactccctctttgggcaaccttgtctgtcc
gatgcagcggtaatgatggcccgagttatgttggggaacggttttgaacccgggatgggt
ttagaaaaaaacaacggcggcataactagc

Since I lack Perl knowledge, could you give me some tips?

Thank you very much~

Regards,

Amy Lee
 
Jürgen Exner
08-15-2008
Amy Lee <(E-Mail Removed)> wrote:
>I'm a newbie in Perl and do some work in bioinformatics. I wrote a tiny
>script to show the sequences. However, I have a problem with the next
>processing step.
>
>My output looks like this.

[snip lengthy text]

>I would like to save all the protein sequences, together with their
>tags (>blahblah), into a file called "protein", and the DNA sequences
>into a file called "dna".
>
>So the "protein" file would be

[snip lengthy text]

>Since I lack Perl knowledge, could you give me some tips?


In what way is the text marked as "output" different from the part marked
as "protein"? They appear identical to me. But then again, I did
not compare each and every character in those lengthy sequences.

jue
 
Amy Lee
08-15-2008
On Fri, 15 Aug 2008 13:02:12 +0000, Jürgen Exner wrote:

> Amy Lee <(E-Mail Removed)> wrote:
>>I'm a newbie in Perl and do some work in bioinformatics. I wrote a tiny
>>script to show the sequences. However, I have a problem with the next
>>processing step.
>>
>>My output looks like this.
>
> [snip lengthy text]
>
>>I would like to save all the protein sequences, together with their
>>tags (>blahblah), into a file called "protein", and the DNA sequences
>>into a file called "dna".
>>
>>So the "protein" file would be
>
> [snip lengthy text]
>
>>Since I lack Perl knowledge, could you give me some tips?
>
> In what way is the text marked as "output" different from the part marked
> as "protein"? They appear identical to me. But then again, I did
> not compare each and every character in those lengthy sequences.
>
> jue

Well, the protein is in uppercase letters and the DNA is in lowercase
letters. So I suppose I can handle it that way, but I don't know how
to do that~

Thanks,

Amy
 
Jürgen Exner
08-15-2008
Amy Lee <(E-Mail Removed)> wrote:
>On Fri, 15 Aug 2008 13:02:12 +0000, Jürgen Exner wrote:
>>>Since I lack Perl knowledge, could you give me some tips?
>>
>> In what way is the text marked as "output" different from the part marked
>> as "protein"? They appear identical to me. But then again, I did
>> not compare each and every character in those lengthy sequences.
>
>Well, the protein is in uppercase letters and the DNA is in lowercase
>letters.


What on earth are you talking about? I was asking what the difference is
between your "output" and your "protein" character sequences, i.e. how do
you want your Perl script to manipulate/change/modify those character
sequences?

>So I suppose I can handle it that way, but I don't know how
>to do that~


I have no idea what you are talking about. What "that" are you referring
to?

jue
 
Amy Lee
08-15-2008
On Fri, 15 Aug 2008 13:26:53 +0000, Jürgen Exner wrote:

> Amy Lee <(E-Mail Removed)> wrote:
>>On Fri, 15 Aug 2008 13:02:12 +0000, Jürgen Exner wrote:
>>>>Since I lack Perl knowledge, could you give me some tips?
>>>
>>> In what way is the text marked as "output" different from the part marked
>>> as "protein"? They appear identical to me. But then again, I did
>>> not compare each and every character in those lengthy sequences.
>>
>>Well, the protein is in uppercase letters and the DNA is in lowercase
>>letters.
>
> What on earth are you talking about? I was asking what the difference is
> between your "output" and your "protein" character sequences, i.e. how do
> you want your Perl script to manipulate/change/modify those character
> sequences?
>
>>So I suppose I can handle it that way, but I don't know how
>>to do that~
>
> I have no idea what you are talking about. What "that" are you referring
> to?
>
> jue

Hmm, sorry for my poor English. Anyway, I will describe my problem in
detail.

In fact, Perl does not need to modify any characters. As you know, the
"output" consists of two kinds of parts: uppercase parts (protein
sequences) and lowercase parts (DNA sequences). What I want to do is save
the "protein" parts into one file and the "dna" parts into another file.
I do not need to change any characters.

Furthermore, each sequence is preceded by a tag like ">xxx". I want to
keep this tag when I save the "dna" and "protein" parts.

Thank you very much~

Regards,

Amy
 
Jürgen Exner
08-15-2008
Amy Lee <(E-Mail Removed)> wrote:
>In fact, Perl does not need to modify any characters. As you know, the
>"output" consists of two kinds of parts: uppercase parts (protein
>sequences) and lowercase parts (DNA sequences).


No, I did not know. It may have been obvious to you but I did not notice
that detail in the long complicated character sequences. Thank you for
the explanation.

>What I want to do is save
>the "protein" parts into one file and the "dna" parts into another file.


Ok, those four lines of explanation make it quite clear what you want to
do. Posting only samples doesn't help because it leaves too much room
for confusion and misunderstandings.

>Furthermore, each sequence is preceded by a tag like ">xxx". I want to
>keep this tag when I save the "dna" and "protein" parts.


Here's how I would do it (sketch of code only, details and error
handling omitted):

open() the input file with file handle $IN, and open() two output files
'dna' and 'protein' with properly named file handles $DNA and $PROTEIN.
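
A minimal sketch of that setup, assuming three-argument open() with lexical handles. The helper name open_handles and the input name 'sequences.txt' are made up for illustration (the thread never names the input file); 'dna' and 'protein' come from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: open the input for reading and both outputs for writing.
# Dies with the system error message if any of the three files cannot be opened.
sub open_handles {
    my ($in_name, $dna_name, $protein_name) = @_;
    open(my $IN,      '<', $in_name)      or die "Cannot read $in_name: $!";
    open(my $DNA,     '>', $dna_name)     or die "Cannot write $dna_name: $!";
    open(my $PROTEIN, '>', $protein_name) or die "Cannot write $protein_name: $!";
    return ($IN, $DNA, $PROTEIN);
}

# Usage (commented out so the snippet does nothing without an input file;
# 'sequences.txt' is an assumed name):
# my ($IN, $DNA, $PROTEIN) = open_handles('sequences.txt', 'dna', 'protein');
```

Note that '>' truncates existing 'dna' and 'protein' files; '>>' would append instead.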

Then

my $isDNA;                                  # true while the current record is DNA
while (<$IN>) {                             # loop through input file
    if (substr($_, 0, 1) eq '>') {          # found tag in this line
        my $next = <$IN>;                   # get next line for analysis
        $isDNA = $next eq lc($next);        # all lowercase means DNA, else protein
        print { $isDNA ? $DNA : $PROTEIN } $_, $next;
                                            # print tag line and line from analysis to
                                            # either $DNA or $PROTEIN depending on flag
    } else {                                # not a tag line but regular data
        print { $isDNA ? $DNA : $PROTEIN } $_;    # print normal data line
    }
}
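
To try the loop without touching any files, here is a minimal self-contained sketch that runs the same logic against in-memory filehandles. The name split_by_case and the short sample sequences are made up for illustration, and the guard for a tag with no following line is an addition that the sketch above leaves out.

```perl
use strict;
use warnings;

# Split FASTA-like text into DNA and protein text using the lowercase test.
# split_by_case is a hypothetical name; it returns the two outputs as strings.
sub split_by_case {
    my ($text) = @_;
    my ($dna, $protein) = ('', '');

    # In-memory filehandles stand in for the real input and output files.
    open(my $IN,      '<', \$text)    or die "Cannot open input string: $!";
    open(my $DNA,     '>', \$dna)     or die "Cannot open dna buffer: $!";
    open(my $PROTEIN, '>', \$protein) or die "Cannot open protein buffer: $!";

    my $isDNA;
    while (my $line = <$IN>) {
        if (substr($line, 0, 1) eq '>') {       # tag line starts a new record
            my $next = <$IN>;
            last unless defined $next;          # tag with no sequence: stop
            $isDNA = $next eq lc($next);        # all lowercase means DNA
            print { $isDNA ? $DNA : $PROTEIN } $line, $next;
        } elsif (defined $isDNA) {              # continuation of current record
            print { $isDNA ? $DNA : $PROTEIN } $line;
        }
    }
    close $_ for $IN, $DNA, $PROTEIN;
    return ($dna, $protein);
}

my ($dna, $protein) = split_by_case(">xxx\nIGQLL\nGYPLV\n>yyy\ngaggcc\natcaag\n");
print "dna:\n$dna", "protein:\n$protein";
```

The block form print { ... } is needed because the filehandle is chosen by an expression; print (...) with parentheses would treat the expression as the thing to print.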


jue
 
Dr.Ruud
08-16-2008
Amy Lee wrote:

> I'm a newbie in Perl and do some work in bioinformatics. I wrote a
> tiny script to show the sequences. However, I have a problem with
> the next processing step. [...]
> I would like to save all the protein sequences, together with their
> tags (>blahblah), into a file called "protein", and the DNA sequences
> into a file called "dna".



The following code expects "good input". It will be fooled by mixed-up
input like

>xxx
IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
>yyy
gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
gccaagttcgatccggaaatagtccttgagttttatgccaatgcttggccaacagaggag


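#!/usr/bin/perl
use strict;
use warnings;

# For this demo DNA goes to STDOUT and protein to STDERR;
# replace these with handles opened on the 'dna' and 'protein' files.
my ($fh_dna, $fh_pro) = (\*STDOUT, \*STDERR);

my $tag;    # most recent tag line, held until the record's type is known

while ( <DATA> ) {
    if ( /^>.+/ ) {
        $tag = $_;
        next;                       # wait for the first sequence line
    } elsif ( /^[acgt]+$/ ) {
        select $fh_dna;             # lowercase bases: DNA
    } elsif ( /^[A-Z]+$/ ) {
        select $fh_pro;             # uppercase residues: protein
    } else {
        die "Unrecognised line $.: $_";
    }
    $tag and print $tag and undef $tag;    # emit the pending tag once
    print;
}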

__DATA__
>xxx
IGRRQWASLVTPMAKFDPEIVLEFYANAWPTEEGVRDMRSWVRGQWIPFDADA
IGQLLGYPLVLEEGQECEYGQRRNRSDGFDEEA
>yyy
gaggccatcaagggatggtcgtttctccgggagcaacgcgtccagctcagggacgacgag
tatactgatttccaggaggaaatagggcgccggcagtgggcatcactggttactcccatg
gccaagttcgatccggaaatagtccttgagttttatgccaatgcttggccaacagaggag
>zzz
EGDAHAVSSTPAWVKPQQTPHGTHQYAQHHPSFSAHAGNASSST
PVQPKAPTQREAPQVPTPNTTRPAGNSNTTRNFPPRPLPEFTPLPMTYEDLLPSLIANHL
AVVTPGRVLEPPFPKWYDPNATCKYHGGVPGHSVEKCLALKYKVQHLMDAGWLTFQEDRP
NVRTNPLANHGGGAVNAVESD
>qqq
tggaagccgcagaagaatcgttagaaactgctttccag
tcttttgaggtggtcagcatttcctccgtggactccctctttgggcaaccttgtctgtcc
gatgcagcggtaatgatggcccgagttatgttggggaacggttttgaacccgggatgggt
ttagaaaaaaacaacggcggcataactagc


"shrunken code" variant of the while-loop:

while ( <DATA> ) {
    /^>.+/ and $tag = $_ and next;
    /^[acgt]+$/ and select($fh_dna) or
        /^[A-Z]+$/ and select($fh_pro) or die;
    print $tag and undef $tag if $tag;
    print;
}
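
One possible way to avoid being fooled by mixed-up input is to look at a whole record before deciding where it goes. A rough sketch, assuming records always start with a '>' tag line and that DNA uses only the letters acgt; classify_record and classify_all are hypothetical names, not part of the code above:

```perl
use strict;
use warnings;

# Classify one record body (all sequence lines of a record as one string).
# Returns 'dna', 'protein', or 'mixed'.
sub classify_record {
    my ($body) = @_;
    (my $seq = $body) =~ s/\s+//g;          # drop newlines and stray spaces
    return 'dna'     if $seq =~ /\A[acgt]+\z/;
    return 'protein' if $seq =~ /\A[A-Z]+\z/;
    return 'mixed';                         # empty body, or anything else
}

# Split text into records, each starting at a '>' tag line,
# and return a list of [tag, kind] pairs.
sub classify_all {
    my ($text) = @_;
    my @result;
    for my $record (split /^(?=>)/m, $text) {
        # Skip anything that does not look like "tag line, then body".
        my ($tag, $body) = $record =~ /\A(>[^\n]*)\n(.*)\z/s or next;
        push @result, [ $tag, classify_record($body) ];
    }
    return @result;
}

for my $pair (classify_all(">xxx\nIGRR\ntata\n>yyy\ngagg\nccat\n")) {
    print "$pair->[0]: $pair->[1]\n";
}
```

With this approach a record whose lines disagree is reported as 'mixed' instead of being split across both outputs, at the cost of holding one whole record in memory.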

--
Affijn, Ruud

"Gewoon is een tijger."

 