Reading chunks from file?

 
 
Bryan
      06-10-2004
Hi, I'm reading in a file in fasta format:
>header

DATADATADATA
DATADATA

>header

DATA

I have been doing this:
open (INFILE, "< $filename") or die "Cannot open $filename for read\n\n";
undef $/;
my @chunks = split(/>/, <INFILE>);
$/ = "\n";
close INFILE;

This works, but this split loses the '>' from the header part of the
file, which I would rather keep for identifying header info later. So
first, why do I lose the '>' on this particular split, is there
something I can do to keep it? Second, is there a better way to split
this file into chunks than I am doing?

Thanks,
Bryan

 
Paul Lalli
      06-10-2004
On Thu, 10 Jun 2004, Bryan wrote:

> Hi, I'm reading in a file in fasta format:
> >header
>
> DATADATADATA
> DATADATA
>
> >header
>
> DATA
>
> I have been doing this:
> open (INFILE, "< $filename") or die "Cannot open $filename for read\n\n";
> undef $/;
> my @chunks = split(/>/, <INFILE>);
> $/ = "\n";
> close INFILE;
>
> This works, but this split loses the '>' from the header part of the
> file, which I would rather keep for identifying header info later. So
> first, why do I lose the '>' on this particular split, is there
> something I can do to keep it?


Have you read the documentation for split? The answer to both questions
is found within.

perldoc -f split

> Second, is there a better way to split
> this file into chunks than I am doing?


Do you need to store the whole file in memory at once? Might it be a
better idea to read one record at a time? Rather than undefining the
input record separator, maybe you want to set that variable to the actual
string which separates your records, and then read a file in one record at
a time.

perldoc perlvar
for info on $/

Hope this helps,
Paul Lalli
 
ctcgag@hotmail.com
      06-10-2004
Bryan wrote:
> Hi, I'm reading in a file in fasta format:
> >header
>
> DATADATADATA
> DATADATA
>
> >header
>
> DATA
>
> I have been doing this:
> open (INFILE, "< $filename") or die "Cannot open $filename for read\n\n";
> undef $/;
> my @chunks = split(/>/, <INFILE>);
> $/ = "\n";
> close INFILE;
>
> This works, but this split loses the '>' from the header part of the
> file, which I would rather keep for identifying header info later. So
> first, why do I lose the '>' on this particular split, is there
> something I can do to keep it?


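You lose the '>' because split discards whatever the pattern matches; only
the text between delimiters is returned.

You could keep it by using a look-ahead assertion, which matches the
position before each '>' without consuming it.

split /(?=>)/ , <DATA>

Because that match is zero-width, split will not produce an empty first
element when the data starts with '>'. You will get an extra first element
only if something (such as whitespace) comes before the first '>'.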
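A minimal sketch of the look-ahead split, in case it helps. The sub name split_fasta and the sample data are made up for illustration, and the grep is only there to drop a whitespace-only chunk in front of the first '>':

```perl
use strict;
use warnings;

# Split FASTA text into records, keeping the leading '>' on each one.
# The look-ahead (?=>) matches the empty string just before a '>',
# so split consumes no characters at the split point.
sub split_fasta {
    my ($text) = @_;
    # grep drops a whitespace-only leading chunk, if there is one.
    return grep { /\S/ } split /(?=>)/, $text;
}

# Hypothetical sample data in the layout from the question.
my $sample = ">seq1\nDATADATADATA\nDATADATA\n\n>seq2\nDATA\n";
my @chunks = split_fasta($sample);
print scalar(@chunks), " records\n";
print "[$_]\n" for @chunks;
```

If a header line can itself contain a '>', the pattern /^(?=>)/m should be safer, since it only splits where '>' starts a line.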

> Second, is there a better way to split
> this file into chunks than I am doing?


If the file is big, it would probably be better not to slurp it all
at once. You could set $/ = '>', but then you would have a '>' at the
end of every record (except the last), and not one at the beginning of
every record. (You would also have a blank record as the first one read.)
This is kind of ugly, but what you gonna do?
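One way to tidy that up, as a rough sketch: chomp removes whatever $/ is currently set to, so it strips the trailing '>', and you can put one back on the front yourself. The sub name each_fasta_record, the callback interface, and the sample data are all invented for this example:

```perl
use strict;
use warnings;

# Read FASTA records one at a time from an open filehandle and
# call $callback with each record, '>' restored at the front.
sub each_fasta_record {
    my ($fh, $callback) = @_;
    local $/ = '>';                  # restored automatically on scope exit
    while (my $record = <$fh>) {
        chomp $record;               # chomp strips the trailing $/ (here '>')
        next unless $record =~ /\S/; # skip the empty chunk before the first '>'
        $callback->(">$record");
    }
    return;
}

# Demonstration on an in-memory filehandle (hypothetical sample data).
my $sample = ">seq1\nDATADATADATA\nDATADATA\n\n>seq2\nDATA\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!\n";
each_fasta_record($fh, sub { print "[$_[0]]\n" });
close $fh;
```

Only one record is held in memory at a time. Like the plain $/ = '>' approach, this assumes '>' never appears inside a header or data line.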

Xho

 
Brian McCauley
      06-10-2004
ctcgag@hotmail.com writes:

> If the file is big, it would probably be better not to slurp it all
> at once. You could set $/ ='>', but then you would have an '>' at the
> end of every record (except the last), and not one at the beginning of
> every record. (You would also have a blank record as the first one read).
> This is kind of ugly, but what you gonna do?


Perhaps File::Stream would help?

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
 