Velocity Reviews > Efficency and the standard library

# Efficency and the standard library

Seebs
Guest
Posts: n/a

 02-11-2010
On 2010-02-11, Peter Nilsson <(E-Mail Removed)> wrote:
>> * for (count = 0, t = strstr(target, in); t && *t;
>> t = strstr(t, in)) {

> Why are you seemingly scanning the target string for occurances
> of 'in', which I would presume to be original 'input' string?

"in", "out", "original".

> What if the string being scanned for is empty? If the string
> to be scanned is non-empty, this will produce an infinite loop.

This was indeed observed, which is why the error checking at the
beginning of the function is now modified to reject a 0-length
thing-to-replace.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / http://www.velocityreviews.com/forums/(E-Mail Removed)
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Eric Sosman
Guest
Posts: n/a

 02-11-2010
On 2/10/2010 9:30 PM, Seebs wrote:
> On 2010-02-11, Peter Nilsson<(E-Mail Removed)> wrote:
>>> for (count = 0, t = strstr(target, in); t&& *t;
>>> t = strstr(t, in)) {

>
>> Why are you seemingly scanning the target string for occurances
>> of 'in', which I would presume to be original 'input' string?

>
> "in", "out", "original".
>
>> What if the string being scanned for is empty? If the string
>> to be scanned is non-empty, this will produce an infinite loop.

>
> This was indeed observed, which is why the error checking at the
> beginning of the function is now modified to reject a 0-length
> thing-to-replace.

... which is what I imagined the `*t' test in your fragment
was supposed to catch. If zero-length targets are rejected
earlier, I'm somehow missing the purpose of the `*t'.

--
Eric Sosman
(E-Mail Removed)lid

Phred Phungus
Guest
Posts: n/a

 02-11-2010
Dann Corbit wrote:
> In article <(E-Mail Removed)>, (E-Mail Removed)
> says...
>> Phred Phungus <(E-Mail Removed)> writes:
>>
>>> Ben Pfaff wrote:
>>>> Dann Corbit <(E-Mail Removed)> writes:
>>>>
>>>>> For a standard library, the *most* important thing is correctness.
>>>>> IOW, a bsearch() function that is blazing fast but fails if there
>>>>> are more than 2 billion items is not as good as one that is only
>>>>> half as fast but works with anything up to size_t (-1) number of
>>>>> elements.
>>>> If the former bsearch() implementation is on a system that
>>>> doesn't support more than 2 billion bytes of contiguous data, I'm
>>> Why?

>> Because the bsearch() function cannot exceed a limit of 2 billion
>> items on an implementation that limits objects to 2 billion bytes
>> or less.

>
>
> If we are talking about constructing a best of breed library system for
> the C language, then the distinction is of enormous importance.
>
> While I agree it is not relevant for the original implementation, it is
> for an implementation where we are trying to pick and choose the best
> possible routines.
>
> P.S.
> The former case is still a bug waiting to happen.
>
> Even if the former implementation is a toaster IC, 40 years in the
> future, our toaster ICs will have 50 GB RAM on board.

That's a lot of computer power for something that requires almost none.
--
fred

Mark
Guest
Posts: n/a

 02-11-2010
Richard wrote:
> Total nonsense.
>
> The better one is the one that runs more efficiently and works. So
> assuming the bug ix fixed and its more efficient then other is
> better. Its called a function. You dont need to know whats in it.

The one who will maintain this and other functions, needs to know what's in
it.

--
Mark

Nick Keighley
Guest
Posts: n/a

 02-11-2010
On 11 Feb, 01:05, Ben Bacarisse <(E-Mail Removed)> wrote:
> Seebs <(E-Mail Removed)> writes:

> > This may have gotten buried given that it started in a Nilges thread,
> > but it's actually pretty interesting.

>
> > There was some discussion of algorithms to perform the task:
> > * *Inputs: *target string, replacement string, original string
> > * *Output: *a newly-allocated string containing the original
> > * * *string with each copy of the target string replaced with the
> > * * *replacement string.

>
> > Here's a hunk of mine:

>
> > * * * * for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
> > * * * * * * * * ++count;
> > * * * * * * * * t += inlen;
> > * * * * }

>
> > Here's a hunk of another one:

>
> >> * * ptrIndex1 = strMaster;
> >> * * while(*ptrIndex1)
> >> * * {
> >> * * * * ptrIndex0 = ptrIndex1;
> >> * * * * while (-1)
> >> * * * * {
> >> * * * * * * for(;
> >> * * * * * * * * *ptrIndex1 && *ptrIndex1 != *strTarget;
> >> * * * * * * * * ptrIndex1++);
> >> * * * * * * for(ptrIndex2 = strTarget, ptrIndex3 = ptrIndex1;
> >> * * * * * * * * *ptrIndex3
> >> * * * * * * * * &&
> >> * * * * * * * * *ptrIndex2
> >> * * * * * * * * &&
> >> * * * * * * * * *ptrIndex3 == *ptrIndex2;
> >> * * * * * * * * ptrIndex3++, ptrIndex2++);
> >> * * * * * * if (!*ptrIndex2) break;
> >> * * * * * * ptrIndex1 = ptrIndex3;
> >> * * * * * * if (!*ptrIndex3) break;
> >> * * * * }

>
> > These hunks are essentially performing the same part of the core loop;
> > find the next occurrence of the target string in the original string.

>
> > The second one was written by Edward Nilges ("spinoza1111"). *He offers
> > in its defense the assertion that it's the "best" algorithm. *(Nevermind
> > that it's got a bug; the bug could be fixed easily.)

>
> > My thinking:

>
> > The first is better, not because it's less buggy (that could be fixed), but
> > because it's simpler to understand. *It may, or may not, be less efficient.
> > However, efficiency will vary widely with input characteristics; for some
> > cases, the arguable inefficiencies may be completely irrelevant, while in
> > others, they'd be significant.

>
> I don't think you can dismiss the bugs so easily. *I would argue that
> it is not a coincidence that the more complex code has more bugs. *I
> can't write code that looks like spinoza1111's code and get it right.
> I *have* to write simpler code, broken into simple functions, or I have
> no chance of getting it to be correct.

"I now suggest that we confine ourselves to the design and
implementation of intellectually manageable programs." Dijkstra

> For example, the latest improvement introduced another bug[1] and I
> don't think that is simple carelessness. *Without a re-write I suspect
> almost any change is as likely to introduce a bug as fix one (I tried
> to see where the error was, but I gave up). *If I was paid to fix it,
> I'd simply start again and it would end a up as I have already posted,
> though to mirror the behaviour I'd now have to add some argument
> checking.
>
> <snip>
>
> [1] Apparently when the target string is not present in the source
> string: e.g. replace("a", "x", "b").

Nick Keighley
Guest
Posts: n/a

 02-11-2010
On 10 Feb, 19:25, Seebs <(E-Mail Removed)> wrote:

> This may have gotten buried given that it started in a Nilges thread,
> but it's actually pretty interesting.
>
> There was some discussion of algorithms to perform the task:
> * * * * Inputs: *target string, replacement string, original string
> * * * * Output: *a newly-allocated string containing the original
> * * * * * string with each copy of the target string replaced with the
> * * * * * replacement string.
>
> Here's a hunk of mine:
>
> * * * * for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
> * * * * * * * * ++count;
> * * * * * * * * t += inlen;
> * * * * }
>
> Here's a hunk of another one:
>
> > * * ptrIndex1 = strMaster;
> > * * while(*ptrIndex1)
> > * * {
> > * * * * ptrIndex0 = ptrIndex1;
> > * * * * while (-1)
> > * * * * {
> > * * * * * * for(;
> > * * * * * * * * *ptrIndex1 && *ptrIndex1 != *strTarget;
> > * * * * * * * * ptrIndex1++);
> > * * * * * * for(ptrIndex2 = strTarget, ptrIndex3 = ptrIndex1;
> > * * * * * * * * *ptrIndex3
> > * * * * * * * * &&
> > * * * * * * * * *ptrIndex2
> > * * * * * * * * &&
> > * * * * * * * * *ptrIndex3 == *ptrIndex2;
> > * * * * * * * * ptrIndex3++, ptrIndex2++);
> > * * * * * * if (!*ptrIndex2) break;
> > * * * * * * ptrIndex1 = ptrIndex3;
> > * * * * * * if (!*ptrIndex3) break;
> > * * * * }

>
> These hunks are essentially performing the same part of the core loop;
> find the next occurrence of the target string in the original string.
>
> The second one was written by Edward Nilges ("spinoza1111"). *He offers
> in its defense the assertion that it's the "best" algorithm. *(Nevermind
> that it's got a bug; the bug could be fixed easily.)
>
> My thinking:
>
> The first is better, not because it's less buggy (that could be fixed), but
> because it's simpler to understand. *It may, or may not, be less efficient.
> However, efficiency will vary widely with input characteristics; for some
> cases, the arguable inefficiencies may be completely irrelevant, while in
> others, they'd be significant.
>
> But you should always write the simpler one first, and wait until you've
> profiled the code and verified that there's a bottleneck, and you know where
> it is, before trying to optimize. *You may, or may not, be able to outperform
> a given library's implementation of strstr(). *If you have unusual inputs,
> you have a pretty good shot at it... But it may not be worth it. *The
> complicated example has had at least one bug identified, and there may be
> others. *It's extremely hard to read, partially because of the poor
> naming, and partially because of the counter-idiomatic usage. *But mostly,
> it's a lot more work to write and maintain, and for no known benefit.
>
> It's good to think a little about efficiency, but write for clarity and
> correctness first, so you have a known-working program to check your
> results if you later need to modify it for speed.

yes. If strstr() didn't exist it would be necessary to invent it. The
use of appropriate abstractions simplifies the code.

lists of memory blocks. Queues were linked lists of messages. Since
the same pointers were used for both purposes a queue was also a list
of memory blocks.

The code needed to free (return to a free list) all the memory blocks
of the messages that came before a particular message type. Two pages
of tangled code needed to find the block before the first block of the
target message followed by direct manipulation of queue head and tail
pointer and the free list pointer. The blocks also had to be marked as
free (they had some associated data). After I found my third bug I
wrote:

while (not empty (queue) and queue_head.message_type != target)
{
msg = remove_msg (queue)
free_msg (msg)
}

(pseudo code)

As my version was shorter, simpler and more correct it was adopted.
Why deal with three things at once when it can be delegated else
where? The code wanted to manipulate messages not memory blocks.

Nick Keighley
Guest
Posts: n/a

 02-11-2010
On 11 Feb, 06:52, Richard <(E-Mail Removed)> wrote:
> Seebs <(E-Mail Removed)> writes:

> > There was some discussion of algorithms to perform [some] task:

[simple code]

and

[complex code]

<snip>

> > The first is better, not because it's less buggy (that could be fixed), but
> > because it's simpler to understand. *It may, or may not, be less
> > efficient.

>
> Total nonsense.
>
> The better one is the one that runs more efficiently and works.

how do we demonstrate that it works?

> So assuming
> the bug ix fixed and its more efficient then other is better. Its called
> a function. You dont need to know whats in it.

"There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies, and the
other way is to make it so complicated that there are no obvious
deficiencies. The first method is far more difficult."
-- C.A.R. Hoare

Richard Tobin
Guest
Posts: n/a

 02-11-2010
In article <(E-Mail Removed)>,
Seebs <(E-Mail Removed)> wrote:

>But you should always write the simpler one first, and wait until you've
>profiled the code and verified that there's a bottleneck, and you know where
>it is, before trying to optimize.

But wouldn't that be Extreme Programming?

-- Richard
--
Please remember to mention me / in tapes you leave behind.

Seebs
Guest
Posts: n/a

 02-11-2010
On 2010-02-11, Richard Tobin <(E-Mail Removed)> wrote:
> In article <(E-Mail Removed)>,
> Seebs <(E-Mail Removed)> wrote:
>>But you should always write the simpler one first, and wait until you've
>>profiled the code and verified that there's a bottleneck, and you know where
>>it is, before trying to optimize.

> But wouldn't that be Extreme Programming?

Only if you have two people do it at once, I think?

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / (E-Mail Removed)
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

John Bode
Guest
Posts: n/a

 02-11-2010
On Feb 11, 2:09*am, "io_x" <(E-Mail Removed)> wrote:
> "Dann Corbit" <(E-Mail Removed)> ha scritto nel messaggionews:(E-Mail Removed) nal-september.org...
>

[snip]

>
> > For a standard library, the *most* important thing is correctness. *IOW,

>
> for a standard library function
> the *most* important thing is the easy to use
> and the error handling well think
>

It doesn't matter how easy to it is to use if it gives you the wrong
It doesn't matter how well it handles errors if it gives you the wrong
It doesn't matter how fast it is if it gives you the wrong answer.
It doesn't matter how much memory it uses if it gives you the wrong

Correctness comes first. Then we can argue about everything else.

 Posting Rules You may not post new threads You may not post replies You may not post attachments You may not edit your posts BB code is On Smilies are On [IMG] code is On HTML code is OffTrackbacks are On Pingbacks are On Refbacks are Off Forum Rules

 Similar Threads Thread Thread Starter Forum Replies Last Post Daniel Java 14 02-16-2005 09:01 AM Gernot Frisch C++ 3 01-24-2005 12:44 PM benevilent@optusnet.com.au Python 1 11-10-2004 07:06 PM Daniel T. C++ 2 10-19-2004 02:02 AM Earl Teigrob ASP .Net 1 07-09-2004 03:56 PM