Velocity Reviews > Can anyone improve this any further......

# Can anyone improve this any further......

Sean Kenwrick
Guest
Posts: n/a

 02-06-2004
I am writing a byte-code interpreter/emulator for a language that
exclusively uses strings for variables (i.e all variables are pushed onto
the stack as strings). Therefore for arithmetic operations I need to
convert the string to a 32 bit integer, carry out the arithmetic operation,
then convert it back to a string before pushing it back onto the stack.
This can happen millions of times per second so it is essential that I have
optimised (for speed) atol() and ltoa() functions.

Here is my attempt at ltoa()... (The __fastcall keyword is to tell my
compiler to pass arguments through registers and is used instead of inline
because it seems to be just as fast and this function gets called from
various modules ..).

static char * __fastcall myltoa(long n,char * s)
{
register long r, k;
register int flag = 0;
register int next;

next = 0;
if (n == 0) {
s[next++] = '0';
}
else {
if (n < 0) {
s[next++] = '-';
n = -n;
}

if(n < 100) goto label2;
if(n < 100000) goto label1;

k = 1000000000;
r = n/k;
if(flag) s[next++] = '0' + r;
else if(r > 0){ s[next++] = '0' + r;flag = 1;}
n -= r * k;

k = 100000000;
r = n/k;
if(flag) s[next++] = '0' + r;
else if(r > 0){ s[next++] = '0' + r;flag = 1;}
n -= r * k;

k = 10000000;
r = n/k;
if(flag) s[next++] = '0' + r;
else if(r > 0){ s[next++] = '0' + r;flag = 1;}
n -= r * k;

k = 1000000;
r = n/k;
if(flag) s[next++] = '0' + r;
else if(r > 0){ s[next++] = '0' + r;flag = 1;}
n -= r * k;

k = 100000;
r = n/k;
if(flag) s[next++] = '0' + r;
else if(r > 0){ s[next++] = '0' + r;flag = 1;}
n -= r * k;
label1:
k = 10000;
r = n/k;
if(flag) s[next++] = '0' + r;
else if(r > 0){ s[next++] = '0' + r;flag = 1;}
n -= r * k;

k = 1000;
r = n/k;
if(flag) s[next++] = '0' + r;
else if(r > 0){ s[next++] = '0' + r;flag = 1;}
n -= r * k;

k = 100;
r = n/k;
if(flag) s[next++] = '0' + r;
else if(r > 0){ s[next++] = '0' + r;flag = 1;}
n -= r * k;
label2:
k = 10;
r = n/k;
if(flag) s[next++] = '0' + r;
else if(r > 0){ s[next++] = '0' + r;flag = 1;}
n -= r * k;

r=n;
s[next++] = '0' + r;
}
s[next] = '\0';
return(s);
}

The goto's are there because I recognised that the majority of numbers are
less than 100 (mainly return codes from functions) and those that are not
less than 100 tend to be less than 100,000 (this language is not usually
used for mathemetics). This did give me another 15-20% improvement in
speed on a tight loop going some arithmetic..

By the way, I have tried pre-calculating integer values for all strings
pushed onto the stack (by always appending the 4 byte integer value on the
stack at the end of the string, but it actually caused a slowdown since the
I have to do a atol() on every function return value or string concatenation
operation even if the value is not going to be used in an arithmetic or
comparison operation.

Any hints would be welcome...

Sean

pete
Guest
Posts: n/a

 02-06-2004
Sean Kenwrick wrote:
>
> I am writing a byte-code interpreter/emulator for a language that
> exclusively uses strings for variables (i.e all variables are pushed onto
> the stack as strings). Therefore for arithmetic operations I need to
> convert the string to a 32 bit integer, carry out the arithmetic operation,
> then convert it back to a string before pushing it back onto the stack.
> This can happen millions of times per second so it is essential that I have
> optimised (for speed) atol() and ltoa() functions.
>
> Here is my attempt at ltoa()... (The __fastcall keyword is to tell my
> compiler to pass arguments through registers and is used instead of inline
> because it seems to be just as fast and this function gets called from
> various modules ..).
>
> static char * __fastcall myltoa(long n,char * s)
> {
> register long r, k;
> register int flag = 0;
> register int next;
>
> next = 0;
> if (n == 0) {
> s[next++] = '0';
> }
> else {
> if (n < 0) {
> s[next++] = '-';
> n = -n;

Undefined behavior if (-LONG_MAX > n)

> }
>
> if(n < 100) goto label2;
> if(n < 100000) goto label1;
>
> k = 1000000000;
> r = n/k;
> if(flag) s[next++] = '0' + r;

if(flag)
flag is already known to be zero, at this point.

> else if(r > 0){ s[next++] = '0' + r;flag = 1;}

> Any hints would be welcome...

#include <limits.h>
char *lto_a(long n, char *s)
{
char c, *p, *q;
int flag = 0;

q = p = s;
if (0 > n) {
*p++ = '-';
++s;
if (-LONG_MAX > n) {
flag = 1;
n = LONG_MAX;
} else {
n = -n;
}
}
do {
*p++ = (char)(n % 10 + '0');
n /= 10;
} while (n != 0);
if (flag) {
++*s;
}
for (*p = '\0'; --p > s; ++s) {
c = *s;
*s = *p;
*p = c;
}
return q;
}

--
pete

Sean Kenwrick
Guest
Posts: n/a

 02-06-2004

"pete" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Sean Kenwrick wrote:
> >
> > I am writing a byte-code interpreter/emulator for a language that
> > exclusively uses strings for variables (i.e all variables are pushed

onto
> > the stack as strings). Therefore for arithmetic operations I need to
> > convert the string to a 32 bit integer, carry out the arithmetic

operation,
> > then convert it back to a string before pushing it back onto the stack.
> > This can happen millions of times per second so it is essential that I

have
> > optimised (for speed) atol() and ltoa() functions.
> >
> > Here is my attempt at ltoa()... (The __fastcall keyword is to tell my
> > compiler to pass arguments through registers and is used instead of

inline
> > because it seems to be just as fast and this function gets called from
> > various modules ..).
> >
> > static char * __fastcall myltoa(long n,char * s)
> > {
> > register long r, k;
> > register int flag = 0;
> > register int next;
> >
> > next = 0;
> > if (n == 0) {
> > s[next++] = '0';
> > }
> > else {
> > if (n < 0) {
> > s[next++] = '-';
> > n = -n;

>
> Undefined behavior if (-LONG_MAX > n)
>
> > }
> >
> > if(n < 100) goto label2;
> > if(n < 100000) goto label1;
> >
> > k = 1000000000;
> > r = n/k;
> > if(flag) s[next++] = '0' + r;

>
> if(flag)
> flag is already known to be zero, at this point.
>
> > else if(r > 0){ s[next++] = '0' + r;flag = 1;}

>
>
> > Any hints would be welcome...

>
> Please time this for me:
>
> #include <limits.h>
> char *lto_a(long n, char *s)
> {
> char c, *p, *q;
> int flag = 0;
>
> q = p = s;
> if (0 > n)

> *p++ = '-';
> ++s;
> if (-LONG_MAX > n) {
> flag = 1;
> n = LONG_MAX;
> } else {
> n = -n;
> }
> }
> do {
> *p++ = (char)(n % 10 + '0');
> n /= 10;
> } while (n != 0);
> if (flag) {
> ++*s;
> }
> for (*p = '\0'; --p > s; ++s) {
> c = *s;
> *s = *p;
> *p = c;
> }
> return q;
> }
>
> --
> pete

After I change the call to __fastcall and use register variables It's very
slightly slower (mine took 4650ms, yours took 4860ms for a 1,000,000 times
loop doing some simple arithmetic). The difference is probably in the
strrev() required at the end - but I like the solution of going from the
right to left and I think I have a version that might not need the strrev()
at the end..... I will post my attempt and the timing later...

Thanks
Sean

Francois Grieu
Guest
Posts: n/a

 02-06-2004

#include <limits.h>

#if INT_MAX==0x7FFFFFFF && INT_MIN+2147483647>=-1
/* 32 bit signed arithmetic assumed */
char *myltoa(long n, char *s)
{
unsigned short j = 1;
if (n<0)
{ /* handle negatives */
s[0] = '-'; ++j;
if (INT_MIN+2147483647==-1 && n==INT_MIN)
{ /* handle -2147483648 the hard way */
s[1] = '2'; ++j;
n = -147483648;
}
n = -n;
}
/* find length using kinda dichotomy */
s[j += (n<10000)?(n<100)?(n>=10)n>=1000)+2:
(n<100000000)?(n<1000000)?(n>=100000)+4:
(n>=10000000)+6n>=1000000000)+8] = '\0';
/* convert from right to left */
do s[--j] = n%10+'0'; while ((n/=10)!=0);
/* all done */
return(s);
}
#endif

François Grieu

pete
Guest
Posts: n/a

 02-06-2004
Sean Kenwrick wrote:

> After I change the call to __fastcall and use register variables
> It's very slightly slower (mine took 4650ms,
> yours took 4860ms for a 1,000,000 times
> loop doing some simple arithmetic).
> The difference is probably in the strrev() required at the end
> - but I like the solution of going from the
> right to left and I think I have a version that might not need
> the strrev() at the end.....
> I will post my attempt and the timing later...

Don't forget the (-LONG_MAX > n) case.
Thank you.

--
pete

Sean Kenwrick
Guest
Posts: n/a

 02-06-2004

"Sean Kenwrick" <(E-Mail Removed)> wrote in message
news:c00ing\$4dc\$(E-Mail Removed)...
>
> "pete" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
> > Sean Kenwrick wrote:
> > >
> > > I am writing a byte-code interpreter/emulator for a language that
> > > exclusively uses strings for variables (i.e all variables are pushed

> onto
> > > the stack as strings). Therefore for arithmetic operations I need

to
> > > convert the string to a 32 bit integer, carry out the arithmetic

> operation,
> > > then convert it back to a string before pushing it back onto the

stack.
> > > This can happen millions of times per second so it is essential that I

> have
> > > optimised (for speed) atol() and ltoa() functions.
> > >
> > > Here is my attempt at ltoa()... (The __fastcall keyword is to tell my
> > > compiler to pass arguments through registers and is used instead of

> inline
> > > because it seems to be just as fast and this function gets called from
> > > various modules ..).
> > >
> > > static char * __fastcall myltoa(long n,char * s)
> > > {
> > > register long r, k;
> > > register int flag = 0;
> > > register int next;
> > >
> > > next = 0;
> > > if (n == 0) {
> > > s[next++] = '0';
> > > }
> > > else {
> > > if (n < 0) {
> > > s[next++] = '-';
> > > n = -n;

> >
> > Undefined behavior if (-LONG_MAX > n)
> >
> > > }
> > >
> > > if(n < 100) goto label2;
> > > if(n < 100000) goto label1;
> > >
> > > k = 1000000000;
> > > r = n/k;
> > > if(flag) s[next++] = '0' + r;

> >
> > if(flag)
> > flag is already known to be zero, at this point.
> >
> > > else if(r > 0){ s[next++] = '0' + r;flag = 1;}

> >
> >
> > > Any hints would be welcome...

> >
> > Please time this for me:
> >
> > #include <limits.h>
> > char *lto_a(long n, char *s)
> > {
> > char c, *p, *q;
> > int flag = 0;
> >
> > q = p = s;
> > if (0 > n)

>
> > *p++ = '-';
> > ++s;
> > if (-LONG_MAX > n) {
> > flag = 1;
> > n = LONG_MAX;
> > } else {
> > n = -n;
> > }
> > }
> > do {
> > *p++ = (char)(n % 10 + '0');
> > n /= 10;
> > } while (n != 0);
> > if (flag) {
> > ++*s;
> > }
> > for (*p = '\0'; --p > s; ++s) {
> > c = *s;
> > *s = *p;
> > *p = c;
> > }
> > return q;
> > }
> >
> > --
> > pete

>
> After I change the call to __fastcall and use register variables It's very
> slightly slower (mine took 4650ms, yours took 4860ms for a 1,000,000 times
> loop doing some simple arithmetic). The difference is probably in the
> strrev() required at the end - but I like the solution of going from the
> right to left and I think I have a version that might not need the

strrev()
> at the end..... I will post my attempt and the timing later...
>
> Thanks
> Sean
>
>

By the way I mis-remembered the timings above, they were actually 2650 and
2860ms...

Here is the version I ended up with of the above algorithm. I can be
certain that the long passed does not exceed +/- LONG_MAX so I don't need
these tests, and I've done away with the strrev() at the end by filling the
string backwards from the 11th char and then returning a pointer to where
the actual number starts in the string..

char * __fastcall myltoa(long n, char *s)
{
register char *p;
register int flag=0;

p=s+12;
*p--='\0'; /* Null terminate */

// If n is negative
if (0 > n) {
flag=1;
n = -n;
}

do {
// Next char is n%10 (Rightmost digit)
*p-- = (char)(n % 10 + '0');
// Strip off the rightmost digit
n /= 10;
} while (n != 0); // Until n is 0

if(flag)
*p-- = '-';
// Return the offset into the string where the number starts
return p+1;
}

However there was no significant change in the timings so I suspect that the
problem with the above is that it is doing two divides per digit (in fact
one modulo and 1 divide) - and divides are typically alot slower than all
the other arithmetic operations.. In my version I do 1 divide, 1 multiply
and 1 subtraction which might be a saving of quite a few cycles per digit...

I can't see any way of optimising this version any firther unless there is a
way to get rid of one of the divides...
Sean

Sean Kenwrick
Guest
Posts: n/a

 02-06-2004

"Francois Grieu" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
>
> #include <limits.h>
>
> #if INT_MAX==0x7FFFFFFF && INT_MIN+2147483647>=-1
> /* 32 bit signed arithmetic assumed */
> char *myltoa(long n, char *s)
> {
> unsigned short j = 1;
> if (n<0)
> { /* handle negatives */
> s[0] = '-'; ++j;
> if (INT_MIN+2147483647==-1 && n==INT_MIN)
> { /* handle -2147483648 the hard way */
> s[1] = '2'; ++j;
> n = -147483648;
> }
> n = -n;
> }
> /* find length using kinda dichotomy */
> s[j += (n<10000)?(n<100)?(n>=10)n>=1000)+2:
> (n<100000000)?(n<1000000)?(n>=100000)+4:
> (n>=10000000)+6n>=1000000000)+8] = '\0';
> /* convert from right to left */
> do s[--j] = n%10+'0'; while ((n/=10)!=0);
> /* all done */
> return(s);
> }
> #endif
>
>
> François Grieu

This is similar to my solution expect that I didn't bother checking against
INT_MIN/MAX (because I know that these can't be exceeded) and I didn't
bother filling the string from the first character (so no need for the
length calculation (I just return a pointer to the offset in the string).
But this suffers from the same problem that I had with my version (See
previous post) in that it effectively does two divides per digit and this
seems to be causing the function to be slower than my orignal by about 10%..

If there is any way to get rid of one of the divides then this would
probably be faster..

Sean

pete
Guest
Posts: n/a

 02-06-2004
Sean Kenwrick wrote:

> However there was no significant change in the timings
> so I suspect that the problem with the above is that it
> is doing two divides per digit
> (in fact one modulo and 1 divide)
> - and divides are typically alot slower than all
> the other arithmetic operations.. In my version I do 1 divide,
> 1 multiply and 1 subtraction which might be a saving of quite
> a few cycles per digit...
>
> I can't see any way of optimising this version any
> firther unless there is a way to get rid of one of the divides...

There is a way, but is it faster ?

#include <stdlib.h>
ldiv_t ldiv(long numer, long denom);

--
pete

Sean Kenwrick
Guest
Posts: n/a

 02-06-2004

"pete" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Sean Kenwrick wrote:
>
> > However there was no significant change in the timings
> > so I suspect that the problem with the above is that it
> > is doing two divides per digit
> > (in fact one modulo and 1 divide)
> > - and divides are typically alot slower than all
> > the other arithmetic operations.. In my version I do 1 divide,
> > 1 multiply and 1 subtraction which might be a saving of quite
> > a few cycles per digit...
> >
> > I can't see any way of optimising this version any
> > firther unless there is a way to get rid of one of the divides...

>
> There is a way, but is it faster ?
>
> #include <stdlib.h>
> ldiv_t ldiv(long numer, long denom);
>
> --
> pete

I got rid of the extra divide and replaced it with a multiply and subtract
(to do the modulo bit). It is now as fast as my original, and even
slightly faster in some cases since I don't need to short-cut numbers less
than 100,000 or 100 for example... It is also a much cleaner solution so I
will use this instead. Here is my final version...

char * __fastcall myltoa(long n, char *s)
{
register int flag=0;
register int x;

s+=12;
*s--='\0'; /* Null terminate */

/* If n is negative */
if (0 > n) {
flag=1;
n = -n;
}

do {
/* Next char is n%10 (Rightmost digit) */
x=n/10;
*s-- = n -(x*10) + '0';
/* Strip off the rightmost digit */
n = x;
} while (n != 0); /* Until n is 0 */

if(flag)
*s-- = '-';
/* Return the offset into the string where the number starts */
return s+1;
}

Sean

Francois Grieu
Guest
Posts: n/a

 02-07-2004
In article <c00tva\$s3c\$(E-Mail Removed)>,
"Sean Kenwrick" <(E-Mail Removed)> wrote:

> I got rid of the extra divide and replaced it with a multiply and subtract
> (to do the modulo bit). It is now (faster)

In my experience, is unusual that % is slower than / is for unsigned
quatities. I guess your system has a problem with modulo of signed
quantities.
Since apparently you have no requirement that the string returned
starts where s does, maybe try:

/* 32 bit signed arithmetic assumed */
/* does not handle -2147483648 */
char *myltoa(long n, char *s)
{
char neg;
*(s += 11) = neg = '\0';
if (n<0)
{
n = -n; /* does not handle -2147483648 */
neg = '-';
}
do *--s = (unsigned)n%10+'0';
while ((n = (unsigned)n/10)!=0);
if (neg!='\0')
*--s = neg;
return(s);
}

François Grieu