On Monday, January 14, 2013 7:24:45 PM UTC-8, Xho Jingleheimerschmidt wrote:
> On 01/09/2013 06:10 AM, C.DeRykus wrote:
> > Since speed isn't critical, the Tie::File suggestion would simplify
> > the code considerably. Since the whole file isn't loaded, big files
> > won't be problematic
>
> I haven't used it in a while, but if I recall correctly Tie::File stores
> the entire table of line-number/byte-offset in RAM, and that can often
> be about as large as storing the entire file if the lines are fairly short.

Actually, IIUC, Tie::File is more parsimonious with memory than even DB_File, for instance, and it employs a
"lazy cache" whose size the user can specify.
See:
http://perl.plover.com/TieFile/why-not-DB_File
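
For illustration, here is a minimal sketch of capping that cache through Tie::File's `memory` option. The temporary file, the record contents, and the 1 MB figure are made up for the example; they are not taken from the original poster's data.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical sample file, created only so the sketch is self-contained.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "record $_\n" for 1 .. 1000;
close $fh or die "close failed: $!";

# 'memory' is an upper limit, in bytes, on the read cache and the
# deferred-write buffer. The 1 MB value is an arbitrary choice.
tie my @lines, 'Tie::File', $path, memory => 1_000_000
    or die "Cannot tie $path: $!";

my $count = scalar @lines;   # number of records in the file
my $tenth = $lines[9];       # fetched on demand; indices are 0-based

print "$count records; tenth is '$tenth'\n";

untie @lines;
```

Per the Tie::File documentation, the `memory` value is not an exact ceiling: some internal structures are not charged against it, so treat it as a tuning knob rather than a guarantee.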
So, even with an overhead of 310 bytes per record, that
would get slow only if the file gets really huge and the
least-recently-read records start to get tossed from the cache.
But the stated aim was accuracy rather than speed.
And since there's a 10Mb record limit with only 200-300K records, that's unlikely to be a show-stopper. In my simple test, it took only a couple of seconds to read a comparably sized file.
--
Charles DeRykus