Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   C Programming (http://www.velocityreviews.com/forums/f42-c-programming.html)
-   -   Bug/Gross InEfficiency in HeathField's fgetline program (http://www.velocityreviews.com/forums/t542648-bug-gross-inefficiency-in-heathfields-fgetline-program.html)

Antoninus Twink 10-07-2007 10:16 PM

Bug/Gross InEfficiency in HeathField's fgetline program
 
The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.

char *dot_to_underscore(const char *s)
{
char *t = malloc(strlen(s) + 1);
if(t != NULL)
{
char *u;
strcpy(t, s);
u = t;
while(*u)
{
if(*u == '.')
{
*u = '_';
}
++u;
}
}
return
t;
}

Proposed solution:

char *dot_to_underscore(const char *s)
{
char *t, *u;
if(t=u=malloc(strlen(s)+1))
while(*u++=(*s=='.' ? s++, '_' : *s++));
return t;
}


Richard Heathfield 10-07-2007 10:55 PM

Re: Bug/Gross InEfficiency in HeathField's fgetline program
 
Antoninus Twink said:

> The function below is from Richard HeathField's fgetline program.


It appears to be from my emgen utility, in fact.

> For
> some reason, it makes three passes through the string (a strlen(), a
> strcpy() then another pass to change dots) when two would clearly be
> sufficient. This could lead to unnecessarily bad performance on very
> long strings.


If it becomes a problem, I'll fix it. So far, it has not been a problem.

> It is also written in a hard-to-read and clunky style.


A matter of opinion. Which bit did you find hard to read?

> char *dot_to_underscore(const char *s)
> {
> char *t = malloc(strlen(s) + 1);
> if(t != NULL)
> {
> char *u;
> strcpy(t, s);
> u = t;
> while(*u)
> {
> if(*u == '.')
> {
> *u = '_';
> }
> ++u;
> }
> }
> return
> t;
> }
>
> Proposed solution:
>
> char *dot_to_underscore(const char *s)
> {
> char *t, *u;
> if(t=u=malloc(strlen(s)+1))
> while(*u++=(*s=='.' ? s++, '_' : *s++));
> return t;
> }


It is not obvious to me that this code correctly replaces the code I wrote.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Mark McIntyre 10-07-2007 11:05 PM

Re: Bug/Gross InEfficiency in HeathField's fgetline program
 
On Mon, 8 Oct 2007 00:16:07 +0200 (CEST), in comp.lang.c , Antoninus
Twink <nospam@nospam.com> wrote:

>The function below is from Richard HeathField's fgetline program.


Troll alert.

>it makes three passes through the string (a strlen(), a
>strcpy() then another pass to change dots) when two would clearly be
>sufficient. This could lead to unnecessarily bad performance on very
>long strings.


Possibly but consider the three laws of optimisation, and the data
typically being processed.

>It is also written in a hard-to-read and clunky style.


I disagree.

> char *t, *u;
> if(t=u=malloc(strlen(s)+1))


hilarious !

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Tor Rustad 10-08-2007 12:03 AM

Re: Bug/Gross InEfficiency in HeathField's fgetline program
 
Antoninus Twink wrote:

> The function below is from Richard HeathField's fgetline program. For
> some reason, it makes three passes through the string (a strlen(), a
> strcpy() then another pass to change dots) when two would clearly be
> sufficient.


What new, R.H. has written slow code for years. ;-)

> This could lead to unnecessarily bad performance on very
> long strings. It is also written in a hard-to-read and clunky style.


ROTFL


> Proposed solution:
>
> char *dot_to_underscore(const char *s)
> {
> char *t, *u;
> if(t=u=malloc(strlen(s)+1))
> while(*u++=(*s=='.' ? s++, '_' : *s++));
> return t;
> }


Hmm.. who's code was hard-to-read and clunky?


Anyway, you are missing the point. Strings are usually rather short, and
when located in L1 cache, is doesn't matter much, scanning it 2 or 3 times.

However, a real spead-up here, would be to drop malloc() and use fixed
sized strings. That's the way, to beat the crap out of OO code.

--
Tor <torust [at] online [dot] no>

C-FAQ: http://c-faq.com/

Richard Heathfield 10-08-2007 12:10 AM

Re: Bug/Gross InEfficiency in HeathField's fgetline program
 
Tor Rustad said:

<snip>
>
> Anyway, you are missing the point. Strings are usually rather short, and
> when located in L1 cache, is doesn't matter much, scanning it 2 or 3
> times.


The important thing is to avoid reading it n times.

> However, a real spead-up here, would be to drop malloc() and use fixed
> sized strings. That's the way, to beat the crap out of OO code.


If performance were the primary consideration, that's exactly what I'd have
done. Since it wasn't, it isn't. I considered robustness in the face of
long strings to be more important.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Peter Nilsson 10-08-2007 02:59 AM

Re:Bug/Gross InEfficiency in HeathField's fgetline program
 
Antoninus Twink <nos...@nospam.com> wrote:
> The function below is from Richard HeathField's fgetline
> program. For some reason, it makes three passes through
> the string (a strlen(), a strcpy() then another pass to
> change dots) when two would clearly be sufficient. ...


For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.

--
Peter


CBFalconer 10-08-2007 03:15 AM

Re: Bug/Gross InEfficiency in HeathField's fgetline program
 
Mark McIntyre wrote:
> Antoninus Twink <nospam@nospam.com> wrote:
>

.... snip ...
>
>> char *t, *u;
>> if(t=u=malloc(strlen(s)+1))

>
> hilarious !


What is hilarious? It should detect the failure of malloc quite
reliably. Of course the lack of blanks is rather foul.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>



--
Posted via a free Usenet account from http://www.teranews.com


Chris Thomasson 10-08-2007 05:25 AM

Re: Re:Bug/Gross InEfficiency in HeathField's fgetline program
 
"Peter Nilsson" <airia@acay.com.au> wrote in message
news:1191812348.091709.63610@g4g2000hsf.googlegrou ps.com...
> Antoninus Twink <nos...@nospam.com> wrote:
>> The function below is from Richard HeathField's fgetline
>> program. For some reason, it makes three passes through
>> the string (a strlen(), a strcpy() then another pass to
>> change dots) when two would clearly be sufficient. ...

>
> For some reason you've capitalised three letters in his
> name, when two would clearly be sufficient.


Perhaps when he says his name out loud with a nervous tinge, it sound as if
there should be to capital letters... Speaking out load:

"That darn Heath, possible minor/stutter, Field!"

Na... Nobody is that serious about the pronouncement of somebody's name...
Ahh, it could be a simple typo...



Richard Heathfield 10-08-2007 05:50 AM

Re:Bug/Gross InEfficiency in HeathField's fgetline program
 
Peter Nilsson said:

> Antoninus Twink <nos...@nospam.com> wrote:
>> The function below is from Richard HeathField's fgetline
>> program. For some reason, it makes three passes through
>> the string (a strlen(), a strcpy() then another pass to
>> change dots) when two would clearly be sufficient. ...

>
> For some reason you've capitalised three letters in his
> name, when two would clearly be sufficient.


The mis-capitalisation is consistent with that used by a troll named "Paul"
(with an email address containing "paulcr"), who once threatened to break
my nose, apparently because he didn't know C very well.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Antoninus Twink 10-08-2007 08:38 AM

Re: Bug/Gross InEfficiency in HeathField's fgetline program
 
On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
> Antoninus Twink said:
>
>> The function below is from Richard HeathField's fgetline program.

>
> It appears to be from my emgen utility, in fact.
>
>> For
>> some reason, it makes three passes through the string (a strlen(), a
>> strcpy() then another pass to change dots) when two would clearly be
>> sufficient. This could lead to unnecessarily bad performance on very
>> long strings.

>
> If it becomes a problem, I'll fix it. So far, it has not been a problem.


But what's frustrating is that it's an inefficiency that's completely
gratuitous! We all know that micro-optimization is bad, but this is a
micro-anti-optimization, which is surely worse!

The most natural way to look at this is "copy the characters from one
string to another, replacing . by _ when we see it". This has the
benefit of being a 1-pass algorithm. Instead, you split this up "first
copy one string to another; then go back to the beginning and swap . for
_". This makes a simple single operation into two, at the same time
introducing an extra pass through the string! It's not as if there's so
much fiendish complexity here that there's any benefit in breaking it up
into two separate operations.

>
>> It is also written in a hard-to-read and clunky style.

>
> A matter of opinion. Which bit did you find hard to read?


The function is a completely trivial one, yet I can't see it all at once
in my editor without scrolling! Whitespace can help readability, but
excessive whitespace can reduce it, and at the same time give too much
weight to things that aren't important.

>
>> char *dot_to_underscore(const char *s)
>> {
>> char *t = malloc(strlen(s) + 1);
>> if(t != NULL)
>> {
>> char *u;
>> strcpy(t, s);
>> u = t;
>> while(*u)
>> {
>> if(*u == '.')
>> {
>> *u = '_';
>> }
>> ++u;
>> }
>> }
>> return
>> t;
>> }
>>
>> Proposed solution:
>>
>> char *dot_to_underscore(const char *s)
>> {
>> char *t, *u;
>> if(t=u=malloc(strlen(s)+1))
>> while(*u++=(*s=='.' ? s++, '_' : *s++));
>> return t;
>> }

>
> It is not obvious to me that this code correctly replaces the code I wrote.


If you believe that it doesn't correctly replace the code you wrote, it
would be easy to demonstrate that by pointing out a specific input s for
which it gives a different result, or an error (syntax error or
undefined behavior or whatever).



All times are GMT. The time now is 03:00 PM.

Powered by vBulletin®. Copyright ©2000 - 2014, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.