On Thu, 10 Dec 2009 23:09:28 -0800 (PST), cvhLE <> wrote:
>On Dec 10, 3:21 pm, Ryan Chan <ryanchan...@gmail.com> wrote:
>> Hello,
>>
>> Consider the case:
>>
>> You have 200 lines of mapping to replace, in a csv format, e.g.
>>
>> apple,orange
>> boy,girl
>> ...
>>
>> You have a 500MB file, you want to replace all 200 lines of mapping,
>> what would be the most efficient way to do it?
>>
>> Thanks.
>
>If you want to replace the whole line, or you know which column needs
>replacing and the line has clear separators, you may be a lot faster
>using awk:
>
>cat csv|awk -F"," '$2~/apple/ {$2="orange"; print $1,$2}' ...
>
>Otherwise I don't see a reason not to use the most obvious way:
>starting from line 1 and running until the end ... especially if you
>don't know *where* the 200 lines are ...
>
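>#! /usr/bin/perl -w
># Map each search word to its replacement.
>%replace=('apple'=>'orange','boy'=>'girl');
># Build one alternation of all keys, escaping regex metacharacters.
>$r="(".join ("|", map { quotemeta } keys %replace ).")";$r=qr($r);
>while (<>) {
>s/$r/$replace{$1}/g;
>print;
>}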
>
I would assume this process would take a
long time.
At a minimum, it would take
500,000,000
x
200
-----------------
100,000,000,000
100 billion character comparisons
if nothing ever matched.
If still no word matches, but the first character
matches before backtracking,
100,000,000,000
x
2
----------------
200,000,000,000
brings the total up to 200 billion character
comparisons.
Since all of this is a conservative estimate,
I would assume an average of (conservatively) 4 character
comparisons per mapping per byte in the file and say
500,000,000
x
800
-----------------
400,000,000,000
400 billion comparisons.
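
The same arithmetic as a quick script, in case anyone wants to plug in
other numbers. This is only a sketch of the back-of-envelope model above,
written in Python for brevity rather than Perl. The function name and the
per-byte comparison counts are assumptions for illustration, not
measurements.

```python
def estimated_comparisons(file_bytes, num_mappings, compares_per_map_per_byte):
    """Back-of-envelope count: every byte is tested against every mapping."""
    return file_bytes * num_mappings * compares_per_map_per_byte


FILE_BYTES = 500_000_000  # the 500MB file, taken as 500 million bytes
NUM_MAPPINGS = 200        # lines in the csv mapping

# The three cases from the estimate: 1, 2 and 4 comparisons
# per mapping per byte (the last one is an assumed average).
for label, per_byte in [("nothing matches", 1),
                        ("first char matches", 2),
                        ("assumed average", 4)]:
    total = estimated_comparisons(FILE_BYTES, NUM_MAPPINGS, per_byte)
    print(f"{label:>20}: {total:,}")
```

The model deliberately ignores how the regex engine actually handles an
alternation, so treat the totals as a rough guide at best.
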
Add to that the minutiae of backtracking, loading
buffers, writing to disk, and the underpinning layers
Perl has to go through to execute its C code, and I would
go out for coffee or take a nap.
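
The estimate is easy enough to check by timing the replacement on a small
sample and scaling up. A rough sketch, again in Python rather than Perl,
and not run against a real 500MB file: the helper names, the two-entry
mapping, and the sample text are all made up for illustration, and a
linear projection from a small sample is itself an assumption.

```python
import re
import time


def build_pattern(mapping):
    # Longest keys first, so a longer key wins over a shorter one that
    # shares its prefix; re.escape guards against regex metacharacters.
    keys = sorted(mapping, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in keys))


def replace_all(text, mapping, pattern=None):
    # One pass over the text, replacing every key with its mapped value.
    pattern = pattern or build_pattern(mapping)
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def seconds_per_byte(sample, mapping):
    # Time one replacement pass over the sample, per character of input.
    pattern = build_pattern(mapping)
    start = time.perf_counter()
    replace_all(sample, mapping, pattern)
    return (time.perf_counter() - start) / len(sample)


if __name__ == "__main__":
    mapping = {"apple": "orange", "boy": "girl"}  # stand-in for the 200-line csv
    sample = "an apple for the boy\n" * 50_000    # made-up sample data
    rate = seconds_per_byte(sample, mapping)
    print(f"projected for 500MB: {rate * 500_000_000:.1f} s")
```

With the real 200 mappings and a slice of the real file, the projection
would say whether it is a coffee break or a nap.
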
-sln