Velocity Reviews > C++ > optimizing the integer rescaling

# optimizing the integer rescaling

Guest

 09-08-2010
Hello,

I am trying to optimize integer rescaling (see example at the end).

There is a range of integer elements (in the example, elements are of
type unsigned short), a current range maximum, and a desired maximum.
The straightforward way is to multiply by the new maximum and then divide
by the old maximum (taking care to avoid overflow).

Are there better ways of rescaling the integers?

///////////////////////////////////
#include <iostream>

unsigned short a = 0x1234; // value in the range to rescale
unsigned short b = 0x3fff; // new maximum range value
unsigned short c = 0x0800; // current maximum range value

int main()
{
    const unsigned int d = a * b;
    const unsigned short e = d / c;
    std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
}
///////////////////////////////////

Victor Bazarov
Guest

 09-08-2010
On 9/8/2010 6:43 AM, Vladimir Jovic wrote:
> I am trying to optimize integer rescaling (see example at the end).

Optimize? As in "speed it up"?

> There is a range or integer elements (in the example, elements are of
> type unsigned short), current range maximum, and the desired maximum.
> Straight forward way is to multiply by new maximum, and then to divide
> by old maximum (taking care about the overflow).
>
> Are there better ways of rescaling the integers?

Better? In what way? To make it more precise perform the operation
using [long] doubles and then truncate or round the results.

> ///////////////////////////////////
> #include <iostream>
>
> unsigned short a = 0x1234; // value in the range to rescale

It's *not* in the range. The "current maximum" is smaller than this value.

> unsigned short b = 0x3fff; // new maximum range value
> unsigned short c = 0x0800; // current maximum range value
>
>
> int main()
> {
> const unsigned int d = a * b;
> const unsigned short e = d / c;

It is better to store the intermediate value in a type that is
*guaranteed* to be larger. 'int' is not necessarily larger than
'short'. The actual solution apparently depends on the implementation
which controls the sizes of integral values. I would probably seek a
platform-specific solution.

> std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
> }
> ///////////////////////////////////

V
--

Guest

 09-08-2010
Victor Bazarov wrote:
> On 9/8/2010 6:43 AM, Vladimir Jovic wrote:
>> I am trying to optimize integer rescaling (see example at the end).

>
> Optimize? As in "speed it up"?

Yes, like using operations which are going to be faster than one
multiplication + one division (as in the example)

>
>> There is a range or integer elements (in the example, elements are of
>> type unsigned short), current range maximum, and the desired maximum.
>> Straight forward way is to multiply by new maximum, and then to divide
>> by old maximum (taking care about the overflow).
>>
>> Are there better ways of rescaling the integers?

>
> Better? In what way? To make it more precise perform the operation
> using [long] doubles and then truncate or round the results.

To do that, you need to convert the integer to double, multiply and divide,
and convert back to integer. That is not going to be faster. It might be
more precise, but it doesn't need to be.

>
>> ///////////////////////////////////
>> #include <iostream>
>>
>> unsigned short a = 0x1234; // value in the range to rescale

>
> It's *not* in the range. The "current maximum" is smaller than this value.

Yes, it is not in the range, but let's imagine it is a randomly taken
element from a range. Whether you have the whole range or a randomly chosen
element doesn't make much difference to *how* you rescale it.

>
>> unsigned short b = 0x3fff; // new maximum range value
>> unsigned short c = 0x0800; // current maximum range value

My mistake about the current maximum:
unsigned short c = 0x2800; // current maximum range value

>>
>>
>> int main()
>> {
>> const unsigned int d = a * b;
>> const unsigned short e = d / c;

>
> It is better to store the intermediate value in a type that is
> *guaranteed* to be larger. 'int' is not necessarily larger than
> 'short'. The actual solution apparently depends on the implementation
> which controls the sizes of integral values. I would probably seek a
> platform-specific solution.

Good point. Which NG would you suggest for x86?

Alf P. Steinbach /Usenet
Guest

 09-08-2010
* Vladimir Jovic, on 08.09.2010 12:43:
> Hello,
>
> I am trying to optimize integer rescaling (see example at the end).
>
> There is a range or integer elements (in the example, elements are of type
> unsigned short), current range maximum, and the desired maximum. Straight
> forward way is to multiply by new maximum, and then to divide by old maximum
> (taking care about the overflow).
>
> Are there better ways of rescaling the integers?
>
>
>
> ///////////////////////////////////
> #include <iostream>
>
> unsigned short a = 0x1234; // value in the range to rescale
> unsigned short b = 0x3fff; // new maximum range value
> unsigned short c = 0x0800; // current maximum range value
>
>
> int main()
> {
> const unsigned int d = a * b;
> const unsigned short e = d / c;
> std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
> }
> ///////////////////////////////////

As Victor noted your 'a' is out of range. That means that you've not paid
attention to detail. The details may or may not be important depending on the
problem at hand, which you do not describe.

In the general case you need to first decide on the distribution you want. For
example, are your maximums inclusive or exclusive? The value 0x0800 seems like
an exclusive maximum, while 0x3fff seems like an inclusive one.

When you have decided /what/ your rescaling should do, then you should implement
it correctly. That means e.g. avoiding overflow. The above code can easily
overflow, so as portable code it's generally incorrect no matter the details of
the desired scaling (it's like a car with square wheels, doesn't matter whether
it's meant to be a sports car or a lorry, it's just wrong).

So, yes, there are better ways.

The first step on the road to betterness is to define the desired effect, and
the second step is to implement that correctly. Only then worry about micro
efficiency. If at all (and if you do, first measure, Measure, MEASURE).

Cheers & hth.,

- Alf

--
blog at <url: http://alfps.wordpress.com>

Guest

 09-08-2010
Alf P. Steinbach /Usenet wrote:
> * Vladimir Jovic, on 08.09.2010 12:43:
>> Hello,
>>
>> I am trying to optimize integer rescaling (see example at the end).
>>
>> There is a range or integer elements (in the example, elements are of
>> type
>> unsigned short), current range maximum, and the desired maximum. Straight
>> forward way is to multiply by new maximum, and then to divide by old
>> maximum
>> (taking care about the overflow).
>>
>> Are there better ways of rescaling the integers?
>>
>>

[snip old code]

> As Victor noted your 'b' is out of range. That means that you've not
> paid attention to detail. The details may or may not be important
> depending on the problem at hand, which you do not describe.
>

Actually, the first part of the algorithm is calculating the histogram of
an image. Another part of the algorithm uses the cumulative sum of the
histogram as a lookup table.

Since the maximum value in the histogram is the number of points in the
image (one point is 16 bits), I need to rescale all values in the
histogram to the maximum value of 16 bits (which is 0xffff).

All parts of the algorithm are optimized (at least I think that is
true), except for this rescaling. I am doing it after I calculate the
cumulative sum, and I am doing it as shown in the code example (one
multiplication and one division). It works fine, but I was wondering if
there is a faster way.

> In the general case you need to first decide on the distribution you
> want. For example, are your maximums inclusive or exclusive? The value
> 0x0800 seems like an exclusive maximum, while 0x3fff seems like an
> inclusive one.
>

Yes, as I said elsethread, that is a small mistake. For completeness,
here is the correct version:

///////////////////////////////////
#include <iostream>

unsigned short a = 0x1234; // value in the range to rescale
unsigned short b = 0x3fff; // new maximum range value
unsigned short c = 0x2800; // current maximum range value
int main()
{
    const unsigned int d = a * b;
    const unsigned short e = d / c;
    std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
}
///////////////////////////////////

> When you have decided /what/ your rescaling should do, then you should
> implement it correctly. That means e.g. avoiding overflow. The above
> code can easily overflow, so as portable code it's generally incorrect
> no matter the details of the desired scaling (it's like a car with
> square wheels, doesn't matter whether it's meant to be a sports car or a
> lorry, it's just wrong).
>

I didn't really think about code portability, because my target is fixed
(x86, some Pentium processor). I also wasn't thinking about the
overflow, because that is easily caught by proper unit tests.

The problem I am facing is that the algorithm cannot be used, because it
uses too much CPU.

> So, yes, there are better ways.
>
> The first step on the road to betterness is to define the desired
> effect, and the second step is to implement that correctly. Only then
> worry about micro efficiency. If at all (and if you do, first measure,
> Measure, MEASURE).
>

I did measure, and the profiler told that 30% time is spent on rescaling.

PS: Is this another problem for comp.programming? Is there a NG where
optimization questions should be posted?

Victor Bazarov
Guest

 09-08-2010
On 9/8/2010 9:36 AM, Vladimir Jovic wrote:
> Victor Bazarov wrote:
>> On 9/8/2010 6:43 AM, Vladimir Jovic wrote:
>>> I am trying to optimize integer rescaling (see example at the end).

[..]
>>
>> It is better to store the intermediate value in a type that is
>> *guaranteed* to be larger. 'int' is not necessarily larger than
>> 'short'. The actual solution apparently depends on the implementation
>> which controls the sizes of integral values. I would probably seek a
>> platform-specific solution.

>
> Good point. Which NG would you suggest for x86?

comp.lang.asm.x86, maybe?

V
--

Victor Bazarov
Guest

 09-08-2010
On 9/8/2010 10:57 AM, Vladimir Jovic wrote:
> The problem I am facing is the algorithm can not be used, because it
> uses too much CPU

Consider utilizing SSE (that's what those are called, IIRC). There's
your x86 newsgroup. Actually, a decent compiler should be able to vectorize
some of those for you. BTW, "too much CPU" is not a good description of
your code's deficiencies. Too much compared to WHAT?

V
--

Guest

 09-08-2010
Victor Bazarov wrote:
> On 9/8/2010 10:57 AM, Vladimir Jovic wrote:
>> The problem I am facing is the algorithm can not be used, because it
>> uses too much CPU

>
> Consider utilizing SSE (that's what those are called, IIRC). There's
> your x86 newsgroup. Actually, a decent compiler should be able to vectorize

SSE doesn't have integer division.
I will try the x86 group.

> some of those for you. BTW, "too much CPU" is not a good description of
> your code's deficiencies. Too much compared to WHAT?

IMO the description is OK if you are trying to optimize the code, but
that is not relevant to the question. If your algorithm spends too much
time in one loop, you know you have to optimize everything in that loop.

For example, your suggestion elsethread to convert to double, multiply
and divide, and convert back to integer is going to slow down my algorithm.

tni
Guest

 09-08-2010
On 2010-09-08 18:18, Vladimir Jovic wrote:
> Victor Bazarov wrote:
>> On 9/8/2010 10:57 AM, Vladimir Jovic wrote:
>>> The problem I am facing is the algorithm can not be used, because it
>>> uses too much CPU

>>
>> Consider utilizing SSE (that's what those are called, IIRC). There's
>> your x86 newsgroup. Actually, a decent compiler should be able to vectorize

>
> The SSE doesn't have integer division.
> Will try x86 group
>
>> some of those for you. BTW, "too much CPU" is not a good description
>> of your code's deficiencies. Too much compared to WHAT?

>
> IMO the description is ok if you are trying to optimize the code, but
> that is not relevant to the question. If your algorithm spends too much
> time in one loop, you know you have to optimize everything in that loop.
>
> For example, your response in else thread to convert to double, multiply
> and divide and convert back to integer is going to slow down my algorithm.

You can multiply by the reciprocal.

Since your input is 16 bits only, float instead of double should be enough.

Floating-point-to-int conversion is quite slow in C/C++ on x86; you
should look at SSE compiler intrinsics (that stuff is portable across
the major x86/x64 compilers). With SSE, you can also do 4 conversions in
parallel.


You seem to have compile-time constants for your scaling. If you make
those available to the compiler (declare them as integral const values),
the compiler will be able to optimize away the division (at least Visual
C++ and GCC do).

Daniel Pitts
Guest

 09-08-2010
On 9/8/2010 7:57 AM, Vladimir Jovic wrote:
> Alf P. Steinbach /Usenet wrote:
>> * Vladimir Jovic, on 08.09.2010 12:43:
>>> Hello,
>>>
>>> I am trying to optimize integer rescaling (see example at the end).
>>>
>>> There is a range or integer elements (in the example, elements are of
>>> type
>>> unsigned short), current range maximum, and the desired maximum.
>>> Straight
>>> forward way is to multiply by new maximum, and then to divide by old
>>> maximum
>>> (taking care about the overflow).
>>>
>>> Are there better ways of rescaling the integers?
>>>
>>>

>
> [snip old code]
>
>> As Victor noted your 'a' is out of range. That means that you've not
>> paid attention to detail. The details may or may not be important
>> depending on the problem at hand, which you do not describe.
>>

>
> Actually, the first part of the algorithm is calculating histogram of an
> image. Another part of the algorithm is using cumulative sum of the
> histogram as a lookup table.
>
> Since the maximum value in the histogram is number of points in the
> image (one point is 16 bits), I need to rescale all values in the
> histogram to the maximum value of 16 bits (which is 0xffff).
>
> All parts of the algorithm are optimized (at least I think that is
> true), except for this rescaling. I am doing it after I calculate the
> cumulative sum, and I am doing it as shown in the code example (one
> multiplication and one division). It works fine, but I was wondering if
> there is a faster way.
>
>> In the general case you need to first decide on the distribution you
>> want. For example, are your maximums inclusive or exclusive? The value
>> 0x0800 seems like an exclusive maximum, while 0x3fff seems like an
>> inclusive one.
>>

>
> Yes, as I said else thread, that is a small mistake. For completeness,
> here is the correct version :
>
> ///////////////////////////////////
> #include <iostream>
>
> unsigned short a = 0x1234; // value in the range to rescale
> unsigned short b = 0x3fff; // new maximum range value
> unsigned short c = 0x2800; // current maximum range value
> int main()
> {
> const unsigned int d = a * b;
> const unsigned short e = d / c;
> std::cout << "rescaled value = 0x" << std::hex << e << std::endl;
> }
> ///////////////////////////////////
>
>> When you have decided /what/ your rescaling should do, then you should
>> implement it correctly. That means e.g. avoiding overflow. The above
>> code can easily overflow, so as portable code it's generally incorrect
>> no matter the details of the desired scaling (it's like a car with
>> square wheels, doesn't matter whether it's meant to be a sports car or
>> a lorry, it's just wrong).
>>

>
> I didn't really think about code portability, because my target is set
> (x86 - some pentium processor). I also wasn't thinking about the
> overflow, because that is easily fixed in proper unit tests.
>
> The problem I am facing is the algorithm can not be used, because it
> uses too much CPU
>
>> So, yes, there are better ways.
>>
>> The first step on the road to betterness is to define the desired
>> effect, and the second step is to implement that correctly. Only then
>> worry about micro efficiency. If at all (and if you do, first measure,
>> Measure, MEASURE).
>>

>
> I did measure, and the profiler told that 30% time is spent on rescaling.

Depending on the size of the image, you can try generating a lookup
table. You can use something like Bresenham's Line Algorithm to build
that table using only integers.
>
> ps Is this another problem for comp.programming ? Is there a NG where to
> post optimization questions?

comp.programming is a better place, or comp.graphics.algorithms.

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>