Velocity Reviews - Computer Hardware Reviews

print given character range.

 
 
Jayaprakash Rudraraju
      04-05-2004

Most bioinformatics files store their sequences in FASTA
format. FASTA files contain header lines followed by DNA
sequence. I have been using the following shortcut to extract
a given range of the sequence.

perl -ne 'chomp; next if />/; print' FASTA.TXT | cut -c3450-3470

Is there a better and more convenient way to do it?
 
Andre Majorel
      04-05-2004
On 2004-04-05, Jayaprakash Rudraraju <(E-Mail Removed)> wrote:

> Most bioinformatics files store their sequences in FASTA
> format. FASTA files contain header lines followed by DNA
> sequence. I have been using the following shortcut to extract
> a given range of the sequence.
>
> perl -ne 'chomp; next if />/; print' FASTA.TXT | cut -c3450-3470
>
> Is there a better and more convenient way to do it?


Other ways to do it would be:

grep -v '>' FASTA.TXT | tr -d '\n' | cut -c3450-3470

perl -ne '
chomp;
next if />/;            # skip header lines
$result .= $_;          # accumulate the sequence read so far
if (length ($result) >= 3470)
{
print substr ($result, 3449, 21), "\n";
exit 0
}' FASTA.TXT

Whether they're faster or more convenient than the above, I
don't know. But the solutions involving cut(1) may not do what
you want if FASTA.TXT is too big to be swallowed in one line.

--
André Majorel <URL:http://www.teaser.fr/~amajorel/>
"Finally I am becoming stupider no more." -- Paul Erdös' epitaph
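
The cut(1) columns 3450-3470 are 1-based and inclusive, while the substr
arguments 3449 and 21 are a 0-based offset and a length. A minimal shell
sketch of that arithmetic follows. It is not from the thread: the function
names fasta_range and substr_args are hypothetical, and it assumes a
single-record FASTA file small enough to be joined into one line.

```shell
# fasta_range FILE START END
# Hypothetical helper: prints bases START..END (1-based, inclusive) of the
# sequence in FILE, ignoring header lines that begin with '>'.
fasta_range() {
    # Remove headers, join the remaining lines, then keep the wanted columns.
    grep -v '^>' "$1" | tr -d '\n' | cut -c"$2-$3"
}

# substr_args START END
# Hypothetical helper: prints the equivalent 0-based offset and length
# that Perl's substr() would need for the same range.
substr_args() {
    echo "$(( $1 - 1 )) $(( $2 - $1 + 1 ))"
}
```

For example, `substr_args 3450 3470` gives the offset and length used in the
Perl version above, and `fasta_range FASTA.TXT 3450 3470` wraps the
grep/tr/cut pipeline.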
 
Cognition Peon
      04-06-2004
Yesterday, IP packets from Andre Majorel delivered:

> On 2004-04-05, Jayaprakash Rudraraju <(E-Mail Removed)> wrote:
>
> > Most bioinformatics files store their sequences in FASTA
> > format. FASTA files contain header lines followed by DNA
> > sequence. I have been using the following shortcut to extract
> > a given range of the sequence.
> >
> > perl -ne 'chomp; next if />/; print' FASTA.TXT | cut -c3450-3470
> >
> > Is there a better and more convenient way to do it?

>
> Other ways to do it would be:
>
> grep -v '>' FASTA.TXT | tr -d '\n' | cut -c3450-3470


Thanks for the solution. I wanted a simpler way to get a range of
the sequence from a FASTA file. The headers in FASTA files always start
with '>', but I was not looking for a faster solution. I will use a
script if the FASTA file is too long.

>
> perl -ne '
> chomp;
> next if />/;
> $result .= $_;
> if (length ($result) >= 3470)
> {
> print substr ($result, 3449, 21), "\n";
> exit 0
> }' FASTA.TXT
>
> Whether they're faster or more convenient than the above, I
> don't know. But the solutions involving cut(1) may not do what
> you want if FASTA.TXT is too big to be swallowed in one line.
>
>


--
echo (E-Mail Removed) | perl -pe 'y/a-z/n-za-m/'

If you want to make God laugh, tell him your future plans.
-------------------------------------
Printed using 100% recycled electrons
 
Adam Price
      04-08-2004
On Mon, 5 Apr 2004 15:09:36 -0700, Jayaprakash Rudraraju wrote:

> Most bioinformatics files store their sequences in FASTA
> format. FASTA files contain header lines followed by DNA
> sequence. I have been using the following shortcut to extract
> a given range of the sequence.
>
> perl -ne 'chomp; next if />/; print' FASTA.TXT | cut -c3450-3470
>
> Is there a better and more convenient way to do it?


You could try looking at CPAN;
http://search.cpan.org/~birney/bioperl-1.4/
is a good place to start.
It seems to cover a lot of functionality for working with FASTA files.
Adam
 
Kevin Collins
      04-08-2004
Jayaprakash Rudraraju <(E-Mail Removed)> wrote in message news:<(E-Mail Removed)>...
> Most bioinformatics files store their sequences in FASTA
> format. FASTA files contain header lines followed by DNA
> sequence. I have been using the following shortcut to extract
> a given range of the sequence.
>
> perl -ne 'chomp; next if />/; print' FASTA.TXT | cut -c3450-3470
>
> Is there a better and more convenient way to do it?


Try this:

perl -ne 'chomp; print substr($_, 3449, 21) unless (/^>/);' FASTA.TXT

The "^" assumes (as you mentioned in another reply) that the header
starts with '>'; otherwise you can leave it out. However, if the
lines do start with '>', it is much faster (especially for long
records) for the regexp engine if you anchor the RE with '^'.

Because substr is applied to each line separately, this only works
when the whole sequence sits on a single line. In that case, this
single perl command should always be faster than 'perl | cut'.

Kevin
 