Split a multi-sequence file into individual files

 
 
ela
 
      11-08-2008
I found this via Google. There's no need to reinvent the wheel, but this
one-line code is too difficult for me to understand...

perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){ open(F,">$1.fsa")||warn"$1 write failed:$!\n";chomp;print F ">", $_ }' fastafile

Can anybody help?


 
 
 
 
 
Tad J McClellan
 
      11-08-2008
ela <(E-Mail Removed)> wrote:
> From google, no need to reinvent the wheel but this one line code is too
> difficult to understand...
>
> perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){ open(F,">$1.fsa")||warn"$1 write failed:$!\n";chomp;print F ">", $_ }' fastafile
>
> anybody helps?



BEGIN{ $/=">"; }                  # set the Input Record Separator (perlvar.pod)
while ( <> ) {                    # -n wraps in a while-diamond loop
    if( /^\s*(\S+)/ ){            # grab the first non-whitespace characters
        open(F,">$1.fsa") || warn"$1 write failed:$!\n";  # open a file
        chomp;                    # remove ">" from end of string
        print F ">", $_;          # print ">" at beginning of string
    }
}
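
To see what setting $/ to ">" does to the reads, here is a minimal sketch. It assumes a made-up two-record input held in memory (the names seq1 and seq2 are invented for illustration) and only prints each record as the loop would receive it:

```perl
use strict;
use warnings;

# Hypothetical two-record FASTA input, held in memory for illustration.
my $fasta = ">seq1 first\nACGT\n>seq2 second\nGGCC\n";

my @records;
{
    local $/ = '>';    # each read now ends at the next ">"
    open my $in, '<', \$fasta or die "cannot read in-memory data: $!";
    while ( my $rec = <$in> ) {
        push @records, $rec;
    }
    close $in;
}

# Print each record with its newlines made visible.
for my $rec (@records) {
    ( my $shown = $rec ) =~ s/\n/\\n/g;
    print "[$shown]\n";
}
# prints:
# [>]
# [seq1 first\nACGT\n>]
# [seq2 second\nGGCC\n]
```

Note that the first record is just the separator itself, and that every record except the last carries a trailing ">" that really belongs to the next one.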



--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
 
 
 
 
 
Mirco Wahab
 
      11-08-2008
Tad J McClellan wrote:
> BEGIN{ $/=">"; } # set the Input Record Separator (perlvar.pod)
> while ( <> ) { # -n wraps in a while-diamond loop
> if( /^\s*(\S+)/ ){ # grab the first non-whitespace characters
> open(F,">$1.fsa") || warn"$1 write failed:$!\n"; # open a file
> chomp; # remove ">" from end of string
> print F ">", $_; # print ">" at beginning of string
> }
> }


I don't understand the purpose of the chomp where it is;
maybe it needs to go in front of the if():

...
local $/ = '>';
while (<>) {
    chomp;
    if( /\s*(\S+)/ ) {
        open my $fh, '>', "$1.fsa" or warn "$1 $!";
        print $fh '>'.$_;
    }
}
...

Regards

M.
 
 
Tim Greer
 
      11-08-2008
Mirco Wahab wrote:

> I don't understand the purpose of the chomp,
> maybe it needs to be in front of the if():
>
> ...
> local $/ = '>';
> while (<>) {
> chomp;
> if( /\s*(\S+)/ ) {
> open my $fh, '>', "$1.fsa" or warn "$1 $!";
> print $fh '>'.$_
> }
> }
> ...
>
> Regards
>
> M.


perldoc -f chomp

Chomp removes any newline, if one exists (which it probably would on
<>).

It's the difference between (trying to) open:
$1.fsa

and

$1
.fsa


--
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!
 
 
Tim Greer
 
      11-08-2008
Tim Greer wrote:

> perldoc -f chomp
>
> Chomp removes any newline, if one exists


Pardon... to be clear, it removes the newline at the end of the string
(not just any newline).
 
 
Mirco Wahab
 
      11-08-2008
Tim Greer wrote:
> perldoc -f chomp
>
> Chomp removes any newline, if one exists (which it probably would on
> <>).


No, it doesn't. It removes the $/, which
here is the '>'.
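
A small sketch of that point, assuming a made-up record that ends with ">" as the loop would see it (the variable names are only for illustration): chomp strips whatever $/ currently holds, so the same call behaves differently under the two settings.

```perl
use strict;
use warnings;

# Made-up record as the loop would see it: it ends with the separator ">".
my $record = "seq1 first\nACGT\n>";

my $with_newline_sep = $record;
chomp $with_newline_sep;    # $/ is still "\n"; the string ends in ">", so nothing is removed

my $with_gt_sep = $record;
{
    local $/ = '>';         # same setting as the one-liner's BEGIN block
    chomp $with_gt_sep;     # now the trailing ">" is removed
}

print "default \$/ changed the record: ",
    ( $with_newline_sep ne $record ? "yes" : "no" ), "\n";
print "\$/ = '>' changed the record:   ",
    ( $with_gt_sep ne $record ? "yes" : "no" ), "\n";
```

The newline before the ">" stays in place either way, since chomp only looks at the very end of the string.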

> It's the difference between (trying to) opening:
> $1.fsa
>
> and
>
> $1
> .fsa


No way. In the above problem, on the first record
it would get the '>' in $1, which leads
to an open argument of ">>.fsa", which
creates a file '.fsa' that contains no sequence data.
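
That first-record case can be traced without touching the file system. The helper below is hypothetical (open_argument is not part of the one-liner); it just applies the same regex and builds the same string that the two-argument open would receive:

```perl
use strict;
use warnings;

# Hypothetical helper: report what the one-liner would pass to open()
# for a given record, without actually opening anything.
sub open_argument {
    my ($record) = @_;
    return unless $record =~ /^\s*(\S+)/;    # same test as the one-liner
    return ">$1.fsa";                        # same string as the two-argument open
}

my $first  = '>';                       # first record when the file begins with ">"
my $second = "seq1 first\nACGT\n>";     # a made-up ordinary record

print open_argument($first),  "\n";     # prints ">>.fsa"  (append mode, file ".fsa")
print open_argument($second), "\n";     # prints ">seq1.fsa"
```

With the original ordering, the chomp and print that follow then write a lone ">" into that '.fsa' file.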

Regards

M.

Tim Greer
 
      11-08-2008
Mirco Wahab wrote:

>> perldoc -f chomp
>>
>> Chomp removes any newline, if one exists (which it probably would on
>> <>).

>
> No, it doesn't. It removes the $/, which is
> here the '>'.


My newsreader is interpreting / / and <> for some reason (and I'm not
seeing what I should be seeing), so I didn't see all of the code for
what it was, I guess. I saw while (<>) { chomp; ... } and hence my
reply. Disregard if it wasn't relevant after all.
 
xhoster@gmail.com
 
      11-10-2008
Mirco Wahab <(E-Mail Removed)> wrote:
>
> I don't understand the purpose of the chomp,


It is to remove the trailing ">", which is not wanted. In FASTA sequence
files, ">" marks the start of the next record, not the end of the current one.

> maybe it needs to be in front of the if():


At first I didn't see how that would make a difference. If the if fails,
nothing happens anyway. If the if succeeds, it makes no difference whether
the chomp is done before or after.

Ah, but if the file starts with ">" as its first character (which it
probably does), then the first record contains nothing but $/. If you don't
chomp first, the conditional is true and you litter your file system with
invisible (on Linux) junk files named .fsa. If you do chomp first, the
conditional is false and nothing happens, which is what one wants. So yes,
the chomp should be before the if.
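
Putting the chomp first, the whole thing might look like the sketch below. This is only one way to write it, not the original one-liner: the split_fasta helper, the temporary directory, and the sample data are all assumptions made for illustration, and it uses lexical filehandles with three-argument open so that an odd sequence name cannot change the open mode.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the thread): split FASTA text read from $in
# into one "<id>.fsa" file per record inside $dir; returns the paths written.
sub split_fasta {
    my ( $in, $dir ) = @_;
    my @written;
    local $/ = '>';                          # each read ends at the next ">"
    while ( my $rec = <$in> ) {
        chomp $rec;                          # drop the trailing ">" before testing
        next unless $rec =~ /^\s*(\S+)/;     # skips the empty first record
        my $path = File::Spec->catfile( $dir, "$1.fsa" );
        open my $out, '>', $path or do { warn "$path: $!\n"; next };
        print {$out} '>', $rec;              # put the leading ">" back
        close $out or warn "$path: $!\n";
        push @written, $path;
    }
    return @written;
}

# Made-up sample input and a throwaway output directory.
my $fasta = ">seq1 first\nACGT\n>seq2 second\nGGCC\n";
open my $in, '<', \$fasta or die "cannot read in-memory data: $!";
my $dir   = tempdir( CLEANUP => 1 );
my @files = split_fasta( $in, $dir );
print "$_\n" for @files;
```

A sequence name containing a path separator would still need separate handling; the sketch does not attempt that.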


Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 