Reading whole file into memory. Parsing 'C' like file efficiently

 
 
n_macpherson@sky.com
      06-17-2008
I know there are a number of FAQs which discourage reading whole
files into memory rather than line by line.

However my problem is as follows.

I am reading a file written in a language which looks like (but isn't)
C. I need to insert comments / documentation at various points in the
file. However, sometimes I don't know what I want to insert until I get
well past the current line. For example:


for(i=0;i<64;i++)
{
// lots of code
}

Say my opening brace is on line 95 and my closing brace on line 195. I
want to insert a comment

// for loop ends line 195

at line 94 (i.e. immediately above the opening brace). The problem is
that, processing line by line, I don't know until I get to line 195 what
I have to change at line 94, so I have to store lines 94 to 195 in
memory anyway.

Similarly, if I read a function header, I want to insert some
documentation before the function header, so I don't believe
processing the file line by line is the best solution here. As I will
be inserting extra lines into the middle of an array, I think I am
going to need a module to do this.

Memory won't be an issue - my largest file will only be 6000 lines.

I've been away from Perl for a while but I seem to remember there was
a module File::Tie which might be suitable.

I'd be grateful if anyone has any suggestions - the people who will be
using this don't normally use Perl so I'd like to avoid using any non-
standard modules if possible

Thanks

Niall
 
Jürgen Exner
      06-17-2008
(E-Mail Removed) wrote:
>Similarly if I read a function header, I want to insert some
>documentation before the function header
>so I don't believe processing the file line by line is the best
>solution here.


Based on what you said I would tend to agree.

Whether that kind of automated annotation is useful is a different
story, though. I doubt it. Take for example

>Say my opening brace is on line 95 and my closing brace 195 I want to
>insert a comment
>// for loop ends line 195


First of all, proper indentation will provide even better guidance as
to where the loop ends. And second, a single block spanning 100 lines is
just plain nuts. A classic rule of thumb used to be that if the code for
a sub doesn't fit on a VT220 screen, then it is too long and you should
think about splitting it. There were two reasons for this:
- you don't want to keep scrolling up and down while thinking about this
sub
- anything much longer becomes too complex for a single sub

Granted, times have changed and typically you can display many more
lines on modern terminals. But the second reason is still very sound.
Many people will probably consider 30-50 lines of code to be the maximum
length of code that can still be easily viewed and recognized without
too much mental scrolling.

>As I will be inserting extra lines into the middle of
>an array I think I am going to need a module to do this.


Why? Sounds like a perfect job for splice().
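
A minimal sketch of how that could look, assuming the file has already been read into an array with one line per element and that every brace sits on a line of its own (the real language may not be that tidy). The sub name annotate_blocks is made up for illustration, and the reported line numbers refer to the file as it was before any comments were inserted.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to an array of lines and inserts
# "// block ends line N" above every line that consists of an opening brace.
# N is the line number of the matching closing brace in the ORIGINAL file.
sub annotate_blocks {
    my ($lines) = @_;
    my (@open, @pairs);

    for my $i (0 .. $#$lines) {
        if ($lines->[$i] =~ /^\s*\{\s*$/) {
            push @open, $i;                       # remember where the block opened
        }
        elsif ($lines->[$i] =~ /^\s*\}\s*$/ and @open) {
            push @pairs, [ pop(@open), $i + 1 ];  # [open index, 1-based close line]
        }
    }

    # Insert from the bottom up so that the indices still to be processed
    # are not shifted by earlier insertions.
    for my $pair (sort { $b->[0] <=> $a->[0] } @pairs) {
        my ($open_idx, $close_line) = @$pair;
        splice @$lines, $open_idx, 0, "// block ends line $close_line\n";
    }
    return $lines;
}

my @code = ("for(i=0;i<64;i++)\n", "{\n", "// lots of code\n", "}\n");
annotate_blocks(\@code);
print @code;
```

splice() with a length of 0 removes nothing and only inserts, which is all that is needed here.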

jue
 
n_macpherson@sky.com
      06-17-2008
>
> First of all, proper indentation will provide even better guidance as
> to where the loop ends. And second, a single block spanning 100 lines is
> just plain nuts. A classic rule of thumb used to be that if the code for
> a sub doesn't fit on a VT220 screen, then it is too long and you should
> think about splitting it. There were two reasons for this:
> - you don't want to keep scrolling up and down while thinking about this
> sub
> - anything much longer becomes too complex for a single sub
>
> Granted, times have changed and typically you can display many more
> lines on modern terminals. But the second reason is still very sound.
> Many people will probably consider 30-50 lines of code to be the maximum
> length of code that can still be easily viewed and recognized without
> too much mental scrolling.
>


One of the reasons I am writing this script is because we have
introduced coding standards which specify a maximum of 300 lines per
function and 70 lines for a while/if/else/for block, and I need to
highlight places in our scripts where these limits are exceeded. I
agree 300 lines for a function is probably too long, but in the
language concerned anything less than 200 would be completely
impractical, unfortunately.
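
A rough sketch of such a length check, assuming blocks are delimited by braces and that braces inside strings or comments can be ignored (the real language may need more care). The name find_long_blocks and the single limit parameter are made up for illustration. Telling functions apart from loops, so that 300 and 70 can be applied separately, is not attempted.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [open line, close line, length] for every
# brace-delimited block longer than $max_lines. Braces inside strings or
# comments are not handled; this is only a sketch.
sub find_long_blocks {
    my ($lines, $max_lines) = @_;
    my (@open, @too_long);

    for my $i (0 .. $#$lines) {
        my $line_no = $i + 1;
        # Walk the braces on this line in order, so "} else {" works too.
        for my $brace ($lines->[$i] =~ /([{}])/g) {
            if ($brace eq '{') {
                push @open, $line_no;
            }
            elsif (@open) {
                my $start  = pop @open;
                my $length = $line_no - $start + 1;
                push @too_long, [ $start, $line_no, $length ]
                    if $length > $max_lines;
            }
        }
    }
    return @too_long;
}

my @code = ("{\n", ("x = 1;\n") x 80, "}\n");
for my $block (find_long_blocks(\@code, 70)) {
    printf "block at lines %d-%d is %d lines long\n", @$block;
}
```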

The indentation is a good point - our developers mostly develop on
site which means a variety of editors (UltraEdit, Visual Studio,
Notepad++, our own proprietary editor) are used. This means
indentation across scripts becomes inconsistent. One of the functions
of the script I am writing will be to make sure the indentation
conforms to the coding standards.
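
As a very rough sketch of the indentation part, assuming that brace depth alone decides the indent level and that four spaces per level is the standard (both are assumptions, and reindent is a made-up name):

```perl
use strict;
use warnings;

# Hypothetical helper: re-indents lines according to brace depth.
# Assumes braces are the only thing that changes depth and ignores
# braces in strings or comments.
sub reindent {
    my ($lines, $indent) = @_;
    $indent = '    ' unless defined $indent;   # four spaces is an assumed standard
    my $depth = 0;
    my @out;

    for my $line (@$lines) {
        (my $text = $line) =~ s/^\s+|\s+$//g;  # strip old indentation and line end
        my $opens  = () = $text =~ /\{/g;      # number of '{' on this line
        my $closes = () = $text =~ /\}/g;      # number of '}' on this line

        # A line that starts with '}' belongs to the outer level.
        my $level = $depth;
        $level-- if $text =~ /^\}/;
        $level = 0 if $level < 0;

        push @out, length($text) ? ($indent x $level) . "$text\n" : "\n";
        $depth += $opens - $closes;
        $depth = 0 if $depth < 0;
    }
    return @out;
}

print reindent([ "for(i=0;i<64;i++)\n", "  {\n", "x = i;\n", "      }\n" ]);
```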

> Why? Sounds like a perfect job for splice().


Yes - I'd forgotten splice() will allow me to insert into the middle
of an array (as I said, I have been away from Perl for a little
while). That should work fine for my purposes.
 
xhoster@gmail.com
      06-17-2008
(E-Mail Removed) wrote:
> I know there are a number of FAQs which discourage reading whole
> files into memory rather than line by line.


I hope they discourage you from reading whole files into memory
thoughtlessly and without good reason. It seems like you do have a good
reason to read them into memory, so go ahead and do it. There is even a
module, File::Slurp, to facilitate it.
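
File::Slurp is not part of the core distribution, so if non-standard modules are to be avoided, a plain filehandle read in list context does much the same job. A minimal sketch (slurp_lines is a made-up name):

```perl
use strict;
use warnings;

# Hypothetical helper: returns every line of $path, newlines included.
sub slurp_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open '$path': $!";
    my @lines = <$fh>;          # list context reads the whole file at once
    close $fh;
    return @lines;
}

# Usage (the file name is an assumption):
# my @lines = slurp_lines('script.txt');
```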

....
>
> Memory won't be an issue - my largest file will only be 6000 lines.


Those are famous last words

I remember many times when I've said "it will only ever be X large" and
then had to eat those words. But of course, I suspect there are many many
more times that my statement held true and it never did get much larger,
but those ones don't force themselves back into your attention the way the
other ones do.

>
> I've been away from Perl for a while but I seem to remember there was
> a module File::Tie which might be suitable.


For 6000 lines of code, you should be a long long way from needing
Tie::File. In fact, last time I investigated it, the memory overhead for
Tie::File was so large that, unless your file's lines are very long, much
longer than one generally finds in a computer program, it provided little
memory benefit over slurping the file.

>
> I'd be grateful if anyone has any suggestions -


Don't worry about this particular problem until it has proven itself
to be an issue (which it probably won't)

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Ben Morrow
      06-17-2008

Quoth (E-Mail Removed):
> (E-Mail Removed) wrote:
>

[slurping a file into an array]
> > I've been away from Perl for a while but I seem to remember there was
> > a module File::Tie which might be suitable.

>
> For 6000 lines of code, you should be a long long way from needing
> Tie::File. In fact, last time I investigated it, the memory overhead for
> Tie::File was so large that, unless your file's lines are very long, much
> longer than one generally finds in a computer program, it provided little
> memory benefit over slurping the file.


One major advantage of Tie::File is that the interface is exactly the
same as a slurped array, so if/when memory does become a problem, you
can simply replace

use File::Slurp qw/read_file/;

my @data = read_file 'name';

with

use Tie::File;

tie my @data, 'Tie::File', 'name' or die "can't read 'name': $!";

and leave the rest of the code unchanged.
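
One caveat to that swap: by default Tie::File removes the line ending from each element (its autochomp option), whereas read_file in list context returns lines with their newline attached, so code that depends on the trailing "\n" may need a small adjustment. A sketch of splice() on a tied array, using a throwaway temporary file so that it is self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Build a small throwaway file so the sketch is self-contained.
my ($fh, $name) = tempfile();
print {$fh} "for(i=0;i<64;i++)\n{\n// lots of code\n}\n";
close $fh;

tie my @data, 'Tie::File', $name or die "can't tie '$name': $!";

# Elements come back without their newline (autochomp is on by default),
# and the inserted element gets a line ending added when it is written.
splice @data, 1, 0, '// for loop ends line 4';

untie @data;

# Read the file back to show that the change went to disk.
open my $in, '<', $name or die "can't open '$name': $!";
my @on_disk = <$in>;
close $in;
unlink $name;

print @on_disk;
```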

Ben

--
Many users now operate their own computers day in and day out on various
applications without ever writing a program. Indeed, many of these users
cannot write new programs for their machines...
-- F.P. Brooks, 'No Silver Bullet', 1987 [(E-Mail Removed)]
 
xhoster@gmail.com
      06-17-2008
Ben Morrow <(E-Mail Removed)> wrote:
> Quoth (E-Mail Removed):
> > (E-Mail Removed) wrote:
> >

> [slurping a file into an array]
> > > I've been away from Perl for a while but I seem to remember there was
> > > a module File::Tie which might be suitable.

> >
> > For 6000 lines of code, you should be a long long way from needing
> > Tie::File. In fact, last time I investigated it, the memory overhead
> > for Tie::File was so large that, unless your file's lines are very
> > long, much longer than one generally finds in a computer program, it
> > provided little memory benefit over slurping the file.

>
> One major advantage of Tie::File is that the interface is exactly the
> same as a slurped array, so if/when memory does become a problem, you
> can simply replace
>
> use File::Slurp qw/read_file/;
>
> my @data = read_file 'name';


This uses 3 times as much memory as reading in the file in a while loop
and pushing it into the array. It seems like it should only be two times
as much, but it isn't (and it is 1.5 times as much as @data=<$fh> takes).
Of course, most of that excess memory is eligible for later reuse,
provided your program survives and needs it.
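
For reference, the loop-and-push variant mentioned above might look like this. It is only a sketch with a made-up sub name, and the memory ratios quoted are my own measurements, not something this snippet demonstrates.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the array one line at a time instead of
# creating a temporary list of all lines first.
sub read_lines_incrementally {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open '$path': $!";
    my @lines;
    while (my $line = <$fh>) {
        push @lines, $line;
    }
    close $fh;
    return \@lines;     # return a reference to avoid copying the list
}

# Usage (the file name is an assumption):
# my $lines = read_lines_incrementally('script.txt');
```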

>
> with
>
> use Tie::File;
>
> tie my @data, 'Tie::File', 'name' or die "can't read 'name': $!";
>
> and leave the rest of the code unchanged.


But my lament is that this just doesn't save all that much memory over
an already efficient slurping method, due to the overhead of Tie::File's
internal structures. I checked again on the latest Tie::File, and based on
vague recollections it does seem substantially better than the older one I
played around with, but still the memory overhead is not an insignificant
fraction of what it would be to just slurp a large file of short lines. So
I consider Tie::File to be an emergency measure I'd throw at a program to
keep it limping along while I redesign and rewrite. (Not that there is
anything wrong with that)

Xho

 
cartercc
      06-17-2008
On Jun 17, 6:49 am, (E-Mail Removed) wrote:
> Say my opening brace is on line 95 and my closing brace 195 I want to
> insert a comment
>
> // for loop ends line 195
>
> at line 94 (i.e. immediately above the opening brace). The problem is
> that, processing line by line, I don't know until I get to line 195 what
> I have to change at line 94, so I have to store lines 94 to 195 in
> memory anyway.
>
> Similarly if I read a function header, I want to insert some
> documentation before the function header
> so I don't believe processing the file line by line is the best
> solution here. As I will be inserting extra lines into the middle of
> an array I think I am going to need a module to do this.


I might approach this by matching delimiters. You can certainly match
delimiters and insert comments just above the opening brace. If you
match on key words (for, while, if, else, etc.) and count your lines,
you can create an intermediate file with a comment template just above
the opening brace, and then manually edit for the final program.
Something like this, maybe:

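use strict;
use warnings;

# Assumes INFILE and OUTFILE are already open and braces sit on their own lines.
my @brace_stack;    # line numbers of blocks that are still open
while (my $line = <INFILE>) {
    if ($line =~ /^\s*\{/) {
        push @brace_stack, $.;    # $. is the current input line number
        print OUTFILE "// COMMENT: block opens at line $.\n";
        print OUTFILE $line;
    }
    elsif ($line =~ /^\s*\}/) {
        my $opened_at = pop @brace_stack;
        $opened_at = '?' unless defined $opened_at;    # unbalanced brace
        print OUTFILE $line;
        print OUTFILE "// COMMENT: closes block opened at line $opened_at\n";
    }
    else {
        print OUTFILE $line;
    }
}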

Obviously, your logic would depend on your coding standard. I wrote
a class along these lines in Java; Perl ought to be a lot easier.

CC
 