Velocity Reviews


sln@netherlands.com 11-07-2008 02:17 AM

SUBSTR() with replacement or lvalue performance issues
 
I've read the docs on substr many times, but I am still not
quite clear on whether to use it as an lvalue or with the replacement parameter.

I have a possibly quite large string (it could be megabytes).
I want to insert a replacement text, possibly in the middle.
I'm iterating over the string through subs, etc.

Apart from copying the text up to each matched position to a
file (as opposed to another buffer), then appending the modification
to the file, then continuing with the next match, is substr
(lvalue or replacement) a viable option?

I have to consider performance on such large operations.

What do you think would be the performance 'hit' if modifying
the string in-place using substr as either an lvalue or replacement?

There have to be some memcpy()s or memory moves involved.
If I use the replacement form, I can adjust pos() for the next match,
but wouldn't inserting even a small change in string size in the middle
of a very large string be a big performance hit?
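A minimal sketch of the replacement-based approach described above, assuming a pattern that never matches the empty string; `replace_in_place` and its callback argument are hypothetical names used only for illustration. Each length-changing `substr` call has to move the whole tail of the string, which is exactly the cost being asked about.

```perl
use strict;
use warnings;

# Hypothetical helper: edit the string in place with 4-argument substr,
# then reset pos() so the next //g match starts after the inserted text.
# Assumes $re never matches the empty string (otherwise this could loop).
sub replace_in_place {
    my ($sref, $re, $make_repl) = @_;
    while ($$sref =~ /$re/g) {
        my ($start, $len) = ($-[0], $+[0] - $-[0]);
        my $matched = substr($$sref, $start, $len);
        my $new     = $make_repl->($matched);
        substr($$sref, $start, $len, $new);   # tail moves if lengths differ
        pos($$sref) = $start + length $new;   # resume after the new text
    }
    return;
}

my $text = "a big dog and a big cat";
replace_in_place(\$text, qr/big/, sub { "huge" });
print "$text\n";    # prints "a huge dog and a huge cat"
```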

All help is appreciated!
TIA

sln


xhoster@gmail.com 11-07-2008 03:22 AM

Re: SUBSTR() with replacement or lvalue performance issues
 
sln@netherlands.com wrote:
> I've read the docs on substr many times, but I am still not
> quite clear on whether to use it as an lvalue or with the replacement parameter.
>
> I have a possibly quite large string (it could be megabytes).
> I want to insert a replacement text, possibly in the middle.
> I'm iterating over the string through subs, etc.
>
> Apart from copying the text up to each matched position to a
> file (as opposed to another buffer), then appending the modification
> to the file, then continuing with the next match,


Are you describing just the ordinary practice of writing your output to
an output file while looping over a read of the input file? That is
often a good way to do things.
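For comparison, a minimal sketch of that copy-through style: the input is only read, and each unchanged stretch plus its replacement is printed to an output handle. `copy_with_replacements` is a hypothetical name, and the in-memory handle merely stands in for a real output file.

```perl
use strict;
use warnings;

# Hypothetical helper: never modifies the input; writes the text between
# matches and the replacement for each match to $out, in order.
sub copy_with_replacements {
    my ($inref, $out, $re, $make_repl) = @_;   # input by reference: no big copy
    my $last = 0;
    while ($$inref =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);
        print {$out} substr($$inref, $last, $start - $last);   # unchanged stretch
        print {$out} $make_repl->(substr($$inref, $start, $end - $start));
        $last = $end;
    }
    print {$out} substr($$inref, $last);                       # tail after last match
    return;
}

my $input = "a big dog and a big cat";
open my $out, '>', \my $result or die "cannot open in-memory handle: $!";
copy_with_replacements(\$input, $out, qr/big/, sub { "huge" });
close $out or die "close: $!";
print "$result\n";    # prints "a huge dog and a huge cat"
```

Since nothing is ever inserted into the middle of the input, no tail of the big string is moved; the cost is one sequential write of the output.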

> is substr
> (lvalue or replacement) a viable option?
>
> I have to consider performance on such large operations.
>
> What do you think would be the performance 'hit' if modifying
> the string in-place using substr as either an lvalue or replacement?


On my system it takes a little under a second to use substr to splice
into the middle of a 1GB string (in a way that makes the string longer);
from the two runs below, that is about 0.76 s of user time per splice:

##baseline:
time perl -le 'my $x; $x.="x"x1000 foreach 1..1e6; \
substr $x, 4e8, 3, "yyyy" foreach 1..0 ;'
1.352u 0.904s 0:02.29 98.2% 0+0k 0+0io 0pf+0w

##20 substrs
time perl -le 'my $x; $x.="x"x1000 foreach 1..1e6; \
 substr $x, 4e8, 3, "yyyy" foreach 1..20 ;'
16.585u 1.084s 0:18.31 96.4% 0+0k 0+0io 0pf+0w
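A portable variant of the same measurement, sketched with the core Time::HiRes module so it does not depend on the shell's `time` output format. The 10 MB size, the 20 splices and the 40% offset are arbitrary choices, and the timings will of course vary by machine.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Arbitrary parameters: string size in bytes and number of splices.
my ($size, $n) = (10_000_000, 20);
my $x   = 'x' x $size;
my $off = int($size * 0.4);

my $t0 = time;
substr($x, $off, 3, 'yyyy') for 1 .. $n;   # each call grows $x by one byte
my $elapsed = time - $t0;

printf "%d splices: %.4f s total, %.6f s each\n", $n, $elapsed, $elapsed / $n;
```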

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

sln@netherlands.com 11-07-2008 05:48 AM

Re: SUBSTR() with replacement or lvalue performance issues
 
On 07 Nov 2008 03:22:43 GMT, xhoster@gmail.com wrote:

>sln@netherlands.com wrote:
>> I've read the docs on substr many times, but I am still not
>> quite clear on whether to use it as an lvalue or with the replacement parameter.
>>
>> I have a possibly quite large string (it could be megabytes).
>> I want to insert a replacement text, possibly in the middle.
>> I'm iterating over the string through subs, etc.
>>
>> Apart from copying the text up to each matched position to a
>> file (as opposed to another buffer), then appending the modification
>> to the file, then continuing with the next match,

>
>Are you describing just the ordinary practice of writing your output to
>an output file while looping over a read of the input file? That is
>often a good way to do things.
>

[snip]

Yes. I'm looking at your benchmarks for substr(). I have nightmares about
cases with over 100,000 replacements in a gigantic string, which
could easily happen. It's a block replacement; I'm not doing s///g
on a gigantic string. I'm finding a block, acting on indirect values,
then re-inserting them into the stream. The regexp has stopped, to be re-pos()'d
later (or not), depending on whether a separate file is being constructed or
the replacement is acting strictly on the buffer.
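One way to keep recorded offsets valid when the edits are applied to the buffer afterwards is sketched below; it is an assumption, not something proposed in the thread. Scan first and only record `[offset, length, replacement]` triples, then apply them from the highest offset down, so a length change never shifts an edit that is still pending. It assumes non-overlapping edits, each edit still moves the tail of the string, and `apply_edits_backwards` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: apply recorded, non-overlapping edits in place,
# highest offset first, so earlier (lower) offsets stay valid.
sub apply_edits_backwards {
    my ($sref, @edits) = @_;
    for my $e (sort { $b->[0] <=> $a->[0] } @edits) {
        my ($off, $len, $new) = @$e;
        substr($$sref, $off, $len, $new);
    }
    return;
}

my $text = "a big dog and a big cat";

# Pass 1: find the blocks without touching the string.
my @edits;
while ($text =~ /big/g) {
    push @edits, [ $-[0], $+[0] - $-[0], 'huge' ];
}

# Pass 2: modify the buffer.
apply_edits_backwards(\$text, @edits);
print "$text\n";    # prints "a huge dog and a huge cat"
```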

I'm looking over your benches. I have a bad feeling, though. Insertion
(or deletion), possibly several hundred thousand times, could reek havoc,
maybe.

Checking... thanks!

sln




Michele Dondi 11-07-2008 10:41 AM

Re: SUBSTR() with replacement or lvalue performance issues
 
On Fri, 07 Nov 2008 02:17:58 GMT, sln@netherlands.com wrote:

>Apart from copying the text up to each matched position to a
>file (as opposed to another buffer), then appending the modification
>to the file, then continuing with the next match, is substr
>(lvalue or replacement) a viable option?
>
>I have to consider performance on such large operations.


ISTR that the lvaluedness of substr()'s return value, as well as the
fact that you can EVEN take references to it and modify the string
with a sort of action-at-a-distance, was put there specifically for
performance reasons. At some point there were also problems, IIRC, with
substitutions having a length larger than the substituted text,
but they should be solved in recent enough perls.

See: <http://perlmonks.org/?node_id=498434>
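To make the two forms concrete, a small sketch (the string and offsets are made up for illustration): `substr` used as an lvalue, and a reference to a `substr` lvalue that writes through to the original string.

```perl
use strict;
use warnings;

my $s = "the quick brown fox";

# Plain lvalue substr: assigning to it replaces that part of $s.
substr($s, 0, 3) = "A";          # $s is now "A quick brown fox"

# Reference to a substr lvalue: assigning through it modifies $s
# "at a distance", and the string shrinks here (5 chars -> 4).
my $r = \substr($s, 2, 5);       # the "quick" window
$$r = "slow";

print "$s\n";                    # prints "A slow brown fox"
```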


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

smallpond 11-07-2008 04:00 PM

Re: SUBSTR() with replacement or lvalue performance issues
 
On Nov 7, 12:48 am, s...@netherlands.com wrote:
>
> I'm looking over your benches. I have a bad feeling, though. Insertion
> (or deletion), possibly several hundred thousand times, could reek havoc,
> maybe.
>



code can reek, but havoc must be wreaked.

It is good practice to see if you actually have a problem before doing
optimization. "First make it right, then make it fast."


Mirco Wahab 11-07-2008 04:21 PM

Re: SUBSTR() with replacement or lvalue performance issues
 
sln@netherlands.com wrote:
> Yes. I'm looking at your benchmarks for substr(). I have nightmares about
> cases with over 100,000 replacements in a gigantic string, which
> could easily happen. It's a block replacement; I'm not doing s///g
> on a gigantic string. I'm finding a block, acting on indirect values,
> then re-inserting them into the stream. The regexp has stopped, to be re-pos()'d
> later (or not), depending on whether a separate file is being constructed or
> the replacement is acting strictly on the buffer.


What is the problem domain? "Megabyte strings" and
"100,000 things" might not turn out to be that slow on
a typical 3 GHz Core2 with 6 MB of L2 cache.

> I'm looking over your benches. I have a bad feeling, though. Insertion
> (or deletion), possibly several hundred thousand times, could reek havoc,
> maybe.


An lvalue substr() might be beaten by an
"Inline C"-based use of memchr/memcpy, which
unfortunately would need to handle the reallocation
of the buffer manually (if that is possible/feasible at
all in your problem range).

Regards

M.

xhoster@gmail.com 11-07-2008 05:49 PM

Re: SUBSTR() with replacement or lvalue performance issues
 
Michele Dondi <bik.mido@tiscalinet.it> wrote:
> On Fri, 07 Nov 2008 02:17:58 GMT, sln@netherlands.com wrote:
>
> >Apart from copying the text up to each matched position to a
> >file (as opposed to another buffer), then appending the modification
> >to the file, then continuing with the next match, is substr
> >(lvalue or replacement) a viable option?
> >
> >I have to consider performance on such large operations.

>
> ISTR that the lvaluedness of substr()'s return value, as well as the
> fact that you can EVEN take references to it and modify the string
> with a sort of action-at-a-distance, was put there specifically for
> performance reasons. At some point there were also problems, IIRC, with
> substitutions having a length larger than the substituted text,
> but they should be solved in recent enough perls.
>
> See: <http://perlmonks.org/?node_id=498434>


My reading of that is that there used to be problems with having more than one
action-at-a-distance reference outstanding on the same string, regardless
of the size of the replacements.

Doing replacements that don't preserve length can have a performance
impact, but I don't see that as a "problem" in the same way a bug is a
"problem"; just as one of the trade-offs that always exist and which need
to be kept in mind. I doubt this performance issue will be "fixed" anytime
soon, as it would likely require a fundamental change of the way strings
are managed (i.e., as ropes rather than as contiguous memory regions).

Of course, the other issue is one of semantics. If I take a reference like
my $x=\substr($q,100,10), and then later do an insertion like
substr($q,10,0,"xxx"), will $x now refer to the same *characters* as it did
before (i.e. substr($q,103,10)) or to the same *positions* as it did
before? Empirically, it refers to the same positions.
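A small sketch of that experiment, using a made-up 200-character string: after inserting three characters before the window, reading through the reference returns whatever now sits at offsets 100..109, not the ten characters originally captured. This is consistent with perlfunc's description of the substr lvalue remembering which part of the string it covers, but treat it as an observation on your own perl rather than a guarantee for older ones.

```perl
use strict;
use warnings;

# 200 characters cycling through the alphabet, so every window is distinct.
my $q = join '', map { chr(ord('a') + $_ % 26) } 0 .. 199;

my $x      = \substr($q, 100, 10);   # reference to the window at offset 100
my $before = $$x;                    # characters captured before the edit

substr($q, 10, 0, "xxx");            # insert 3 characters earlier in $q
my $after = $$x;                     # read through the same reference

print "before: $before\n";           # prints "before: wxyzabcdef"
print "after:  $after\n";            # prints "after:  tuvwxyzabc"
```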

Xho


sln@netherlands.com 11-09-2008 08:27 PM

Re: SUBSTR() with replacement or lvalue performance issues
 
On Fri, 07 Nov 2008 11:41:21 +0100, Michele Dondi <bik.mido@tiscalinet.it> wrote:

>On Fri, 07 Nov 2008 02:17:58 GMT, sln@netherlands.com wrote:
>
>>Apart from copying the text up to each matched position to a
>>file (as opposed to another buffer), then appending the modification
>>to the file, then continuing with the next match, is substr
>>(lvalue or replacement) a viable option?
>>
>>I have to consider performance on such large operations.

>
>ISTR that the lvaluedness of substr()'s return value, as well as the
>fact that you can EVEN take references to it and modify the string
>with a sort of action-at-a-distance, was put there specifically for
>performance reasons. At some point there were also problems, IIRC, with
>substitutions having a length larger than the substituted text,
>but they should be solved in recent enough perls.
>
>See: <http://perlmonks.org/?node_id=498434>
>
>
>Michele


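If this were C: at each find, place a 0 at the start of the match (ending
the unchanged segment before it), save a pointer to the beginning of that
segment (the end of the last find), and add the pointer to a list.
Create a new char[modified size] for the replacement and add its pointer
to the list. Repeat until the end of the string.

Write the pointer list to a file/buffer (a file).
Delete the list of pointers.

Perl can't reference the middle of a string like that. Maybe it can be jury-rigged...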

sln


sln@netherlands.com 11-10-2008 10:23 PM

Re: SUBSTR() with replacement or lvalue performance issues
 
On Fri, 07 Nov 2008 11:41:21 +0100, Michele Dondi <bik.mido@tiscalinet.it> wrote:

>On Fri, 07 Nov 2008 02:17:58 GMT, sln@netherlands.com wrote:
>
>>Apart from copying the text up to each matched position to a
>>file (as opposed to another buffer), then appending the modification
>>to the file, then continuing with the next match, is substr
>>(lvalue or replacement) a viable option?
>>
>>I have to consider performance on such large operations.

>
>ISTR that the lvaluedness of substr()'s return value, as well as the
>fact that you can EVEN take references to it and modify the string
>with a sort of action-at-a-distance, was put there specifically for
>performance reasons. At some point there were also problems, IIRC, with
>substitutions having a length larger than the substituted text,
>but they should be solved in recent enough perls.
>
>See: <http://perlmonks.org/?node_id=498434>
>
>
>Michele


^^^^^^^^^^^^

Being able to get segment references while not altering the
string works pretty well. Altering the string through the refs is
possible, but I wouldn't trust it, and the string would still shrink/expand.

In the simplest usage, something like the code below would seem to solve
the performance issues. Thanks for the link!


sln
---------------------------------

use strict;
use warnings;

my $buffer    = "some big big big scalar string";
my $bigstring = \$buffer;
my $lastpos   = 0;
my @segrefs   = ();

while ($$bigstring =~ /big/g)
{
    my ($offset, $curpos) = ($-[0], pos($$bigstring));

    # modify part (a local copy) of the big string; the copy is declared
    # inside the loop so that each match gets its own scalar to reference
    # (a single outer $modstring would make every pushed ref the same one)
    my $modstring = substr $$bigstring, $offset, ($curpos - $offset);
    $modstring .= "-huge";

    # cache the unchanged interval (read only) and the modstring reference
    push @segrefs, \substr $$bigstring, $lastpos, ($offset - $lastpos);
    push @segrefs, \$modstring;

    $lastpos = $curpos;
}

# the rest of the string after the last match (all of it if nothing matched)
push @segrefs, \substr $$bigstring, $lastpos;

# print the new string (to a file maybe)
for (@segrefs) {
    print $$_;
}
print "\n";


Ilya Zakharevich 11-10-2008 11:44 PM

Re: SUBSTR() with replacement or lvalue performance issues
 
[A complimentary Cc of this posting was NOT [per weedlist] sent to
Michele Dondi
<bik.mido@tiscalinet.it>], who wrote in article <bs38h49bu6oanughiuvgdgr8rfen2v8fh0@4ax.com>:
> ISTR that the lvaluedness of substr()'s return value, as well as the
> fact that you can EVEN take references to it and modify the string
> with a sort of action-at-a-distance, was put there specifically for
> performance reasons. At some point there were also problems, IIRC, with
> substitutions having a length larger than the substituted text,
> but they should be solved in recent enough perls.
>
> See: <http://perlmonks.org/?node_id=498434>


Simple experiments show that it is still buggy with 5.8.8: the code below prints

the quick brown fox jumps over the laxy dog
the quick brown fox jumps over the lazy dog

Hope this helps,
Ilya

#!/usr/bin/perl -w
use strict;

my $bigScalar = 'the quick brown fox jumps over the laxy dog';

sub change_nth ($$@) {
    my ($n, $subst) = (shift, shift);
    $_[$n] = $subst;
    return;    # Just in case: avoid $_[7] being returned
}

change_nth 7, 'lazy', map { substr $bigScalar, $_->[0], $_->[1] }
    [0,3], [4,5], [10,5], [16,3], [20,5], [26,4], [31,3], [35,4], [40,3];
print "$bigScalar\n";

change_nth 1, 'lazy', substr($bigScalar, 31, 3), substr($bigScalar, 35, 4),
    substr($bigScalar, 40, 3);
print "$bigScalar\n";
__END__


All times are GMT.