Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Array class and pointer aliasing problems

Reply
Thread Tools

Array class and pointer aliasing problems

 
 
Mathieu Benoit
Guest
Posts: n/a
 
      01-18-2004
(I'm having trouble to post this, sorry if this mail comes
several times)

I'm trying to optimize the use of an integer array class,
and I found that one major problem is pointer aliasing.

I consider the following example:

--------------------------------------------------------
class IntTab
{
public:
IntTab(int n, int m) :
data(new int[n*m]), dim0(n), dim1(m) {
}
~IntTab() {
delete data;
}
int ni() const {
return dim0;
}
int nj() const {
return dim1;
}
int & operator()(int i, int j) {
return data[i*dim1+j];
}
const int & operator()(int i, int j) const {
return data[i*dim1+j];
}
void myjob(IntTab & a) const;
private:
int * const data;
const int dim0;
const int dim1;
};

void myjob(const IntTab & a, IntTab & b)
{
int i, j, k;
int ni = b.ni();
int nj = b.nj();
for (i=0, k=0; i<ni; i++)
for (j=0; j<nj; j++, k++)
b(k,0) = a(i,j);
}

void IntTab::myjob(IntTab & a) const
{
int a_j = a.dim1;
int ni = dim0;
int nj = dim1;
int i, j, k;
int * a_data = a.data;
int * my_data = data;
for (i=0, k=0; i<ni; i++)
for (j=0; j<nj; j++, k++)
a_data[k*a_j] = my_data[i*nj+j];
}
---------------------------------------------------------

"myjob" is poorly optimized by the compiler (g++ 3.2)
because it assumes that in the process of writing b, I could
silently update the pointers (a.data and b.data) and the
index counters (a.dim1 and b.dim1) which are used in the
loop. Indeed, making these variables local as in
IntTab::myjob is much better but of course I prefer to write
my algorithms with the operator() which allows to easily put
some asserts for debugging.

Strangely enough, changing my array for an array of double
leads to the same problem, even with "-fstrict-aliasing".

Moreover, the "restrict" keyword might help here but it is
not supported by g++. Is there another way to tell the
compiler that caching the pointers is safe while preserving
readability ?

Is there something special with g++3.2 ?

Thanks,
Benoit MATHIEU

 
Reply With Quote
 
 
 
 
Jumbo
Guest
Posts: n/a
 
      01-18-2004

"Mathieu Benoit" <(E-Mail Removed)> wrote in message
news:400a7ccb$0$22298$(E-Mail Removed)...
> (I'm having trouble to post this, sorry if this mail comes
> several times)
>
> I'm trying to optimize the use of an integer array class,
> and I found that one major problem is pointer aliasing.
>
> I consider the following example:
>
> --------------------------------------------------------
> class IntTab
> {
> public:
> IntTab(int n, int m) :
> data(new int[n*m]), dim0(n), dim1(m) {
> }
> ~IntTab() {
> delete data;
> }
> int ni() const {
> return dim0;
> }
> int nj() const {
> return dim1;
> }
> int & operator()(int i, int j) {
> return data[i*dim1+j];
> }
> const int & operator()(int i, int j) const {
> return data[i*dim1+j];
> }
> void myjob(IntTab & a) const;
> private:
> int * const data;
> const int dim0;
> const int dim1;
> };
>
> void myjob(const IntTab & a, IntTab & b)
> {
> int i, j, k;
> int ni = b.ni();
> int nj = b.nj();
> for (i=0, k=0; i<ni; i++)
> for (j=0; j<nj; j++, k++)
> b(k,0) = a(i,j);
> }
>
> void IntTab::myjob(IntTab & a) const
> {
> int a_j = a.dim1;
> int ni = dim0;
> int nj = dim1;
> int i, j, k;
> int * a_data = a.data;
> int * my_data = data;
> for (i=0, k=0; i<ni; i++)
> for (j=0; j<nj; j++, k++)
> a_data[k*a_j] = my_data[i*nj+j];
> }
> ---------------------------------------------------------
>
> "myjob" is poorly optimized by the compiler (g++ 3.2)
> because it assumes that in the process of writing b, I could
> silently update the pointers (a.data and b.data) and the
> index counters (a.dim1 and b.dim1) which are used in the
> loop. Indeed, making these variables local as in
> IntTab::myjob is much better but of course I prefer to write
> my algorithms with the operator() which allows to easily put
> some asserts for debugging.
>
> Strangely enough, changing my array for an array of double
> leads to the same problem, even with "-fstrict-aliasing".
>
> Moreover, the "restrict" keyword might help here but it is
> not supported by g++. Is there another way to tell the
> compiler that caching the pointers is safe while preserving
> readability ?
>
> Is there something special with g++3.2 ?
>
> Thanks,
> Benoit MATHIEU
>

Why don't you pass the dimensions into the function and call it like this:
myjob(IntTab1, IntTab2, IntTab2.ni(), IntTab2.nj())


 
Reply With Quote
 
 
 
 
Benoit Mathieu
Guest
Posts: n/a
 
      01-19-2004
>>I'm trying to optimize an integer array class,
>>and I found that one major problem is pointer aliasing.
>>--------------------------------------------------------
>>class IntTab
>>{
>>public:
>> IntTab(int n, int m) :
>> data(new int[n*m]), dim0(n), dim1(m) {
>> }
>> ~IntTab() {
>> delete data;
>> }
>> int ni() const {
>> return dim0;
>> }
>> int nj() const {
>> return dim1;
>> }
>> int & operator()(int i, int j) {
>> return data[i*dim1+j];
>> }
>> const int & operator()(int i, int j) const {
>> return data[i*dim1+j];
>> }
>> void myjob(IntTab & a) const;
>>private:
>> int * const data;
>> const int dim0;
>> const int dim1;
>>};
>>
>>void myjob(const IntTab & a, IntTab & b)
>>{
>> int i, j, k;
>> int ni = b.ni();
>> int nj = b.nj();
>> for (i=0, k=0; i<ni; i++)
>> for (j=0; j<nj; j++, k++)
>> b(k,0) = a(i,j);

<snip>
>>"myjob" is poorly optimized by the compiler (g++ 3.2)
>>because it assumes that in the process of writing b, I could
>>silently update the pointers (a.data and b.data) and the
>>index counters (a.dim1 and b.dim1) which are used in the
>>loop.

<snip>

> Why don't you pass the dimensions into the function and call it like this:
> myjob(IntTab1, IntTab2, IntTab2.ni(), IntTab2.nj())


Would this really help ? I already made a local copy of ni
and nj... I can see in the assembly output that ni and nj
and handled efficiently, only data and dim1 are reloaded
every time...

Maybe this issue is strongly related to the implementation
details in g++ (which might be particularly conservative
about aliasing) and I should ask in another group. But if
this is a g++ issue and the answer is g++ specific, then I
will probably not use it...

I would be more interested if somebody knows about a C++
construct which clearly tells the compiler that the actual
arrays will never alias these indexes and pointers... Does
the standard say something about that ?

 
Reply With Quote
 
Jumbo
Guest
Posts: n/a
 
      01-20-2004

"Benoit Mathieu" <(E-Mail Removed)> wrote in message
news:400c51cb$0$1167$(E-Mail Removed)...
> >>I'm trying to optimize an integer array class,
> >>and I found that one major problem is pointer aliasing.
> >>--------------------------------------------------------
> >>class IntTab
> >>{
> >>public:
> >> IntTab(int n, int m) :
> >> data(new int[n*m]), dim0(n), dim1(m) {
> >> }
> >> ~IntTab() {
> >> delete data;
> >> }
> >> int ni() const {
> >> return dim0;
> >> }
> >> int nj() const {
> >> return dim1;
> >> }
> >> int & operator()(int i, int j) {
> >> return data[i*dim1+j];
> >> }
> >> const int & operator()(int i, int j) const {
> >> return data[i*dim1+j];
> >> }
> >> void myjob(IntTab & a) const;
> >>private:
> >> int * const data;
> >> const int dim0;
> >> const int dim1;
> >>};
> >>
> >>void myjob(const IntTab & a, IntTab & b)
> >>{
> >> int i, j, k;
> >> int ni = b.ni();
> >> int nj = b.nj();
> >> for (i=0, k=0; i<ni; i++)
> >> for (j=0; j<nj; j++, k++)
> >> b(k,0) = a(i,j);

> <snip>
> >>"myjob" is poorly optimized by the compiler (g++ 3.2)
> >>because it assumes that in the process of writing b, I could
> >>silently update the pointers (a.data and b.data) and the
> >>index counters (a.dim1 and b.dim1) which are used in the
> >>loop.

> <snip>
>
> > Why don't you pass the dimensions into the function and call it like

this:
> > myjob(IntTab1, IntTab2, IntTab2.ni(), IntTab2.nj())

>
> Would this really help ? I already made a local copy of ni
> and nj... I can see in the assembly output that ni and nj
> and handled efficiently, only data and dim1 are reloaded
> every time...
>
> Maybe this issue is strongly related to the implementation
> details in g++ (which might be particularly conservative
> about aliasing) and I should ask in another group. But if
> this is a g++ issue and the answer is g++ specific, then I
> will probably not use it...
>
> I would be more interested if somebody knows about a C++
> construct which clearly tells the compiler that the actual
> arrays will never alias these indexes and pointers... Does
> the standard say something about that ?
>

Sorry I don't know.
What about doing it an an asm block, or something like that?


 
Reply With Quote
 
Karl Heinz Buchegger
Guest
Posts: n/a
 
      01-20-2004
Benoit Mathieu wrote:
>
> Maybe this issue is strongly related to the implementation
> details in g++ (which might be particularly conservative
> about aliasing) and I should ask in another group. But if
> this is a g++ issue and the answer is g++ specific, then I
> will probably not use it...
>
> I would be more interested if somebody knows about a C++
> construct which clearly tells the compiler that the actual
> arrays will never alias these indexes and pointers... Does
> the standard say something about that ?


What do you want to achieve?
As I see it, you want the compiler to replace the index
calculation in

for (j=0; j<nj; j++, k++)
b(k,0) = a(i,j);

with some sort of pointer magic. The same thing it can
do when you write

for( i = 0; i < size; ++i )
m[i] = ...

where the compiler can replace the whole thing with

basePtr = m;
for( i = 0; i < size; ++i, basePtr++ )
*basePtr = ...

If your compiler is unable to do this transformation in
your current case, then clearly it is a quality of implementation
issue (meaning: g++ specific).

But what can you do?
One thing you can do in any case is to look how others solve this
(or a similar) problem. What immediatly comes to my mind is the
STL. How do they do it? The STL has introduced a concept called
iterators. There are functions for getting the first iterator
and incrementing such an iterator. Well, in the above, transformed
example, basePtr acts as an iterator! It is assigned a starting value
and after that the pointer (aehh: iterator) is incremented and evaluated.
Hmm.

So you could add functions to your class IntTab:

class intTab
{
int* GetFirstColumn( int Row ) { return &data[ Row*dim1 ]; }
int GetNextColumnValue( int*& Pos ) { return *(Pos++); }

...
};

And your looping construct modifies to:

for( i = 0, k = 0; i < n1; ++i ) {
int* PosA = a.GetFirstColumn( i );

for( j = 0; j < nj; j++, k++ )
b(k,0) = a.GetNextColumnValue( PosA );
}

Try it and see, if the compiler is able to optimize this in a better
way.

Of course one could polish up the above by introducing a seperate
iterator class etc., but for a quick test to see if your compiler can
do a better job the above should be sufficient.

Yes I am aware, that this could be seen as premature oprimization
and as we all know this is the root of all evil

--
Karl Heinz Buchegger
http://www.velocityreviews.com/forums/(E-Mail Removed)
 
Reply With Quote
 
Benoit Mathieu
Guest
Posts: n/a
 
      01-20-2004
Karl Heinz Buchegger wrote:
<snip>
> So you could add functions to your class IntTab:
>
> class intTab
> {
> int* GetFirstColumn( int Row ) { return &data[ Row*dim1 ]; }
> int GetNextColumnValue( int*& Pos ) { return *(Pos++); }
>
> ...
> };
>
> And your looping construct modifies to:
>
> for( i = 0, k = 0; i < n1; ++i ) {
> int* PosA = a.GetFirstColumn( i );
>
> for( j = 0; j < nj; j++, k++ )
> b(k,0) = a.GetNextColumnValue( PosA );
> }
>
> Try it and see, if the compiler is able to optimize this in a better
> way.
>
> Of course one could polish up the above by introducing a seperate
> iterator class etc., but for a quick test to see if your compiler can
> do a better job the above should be sufficient.
>
> Yes I am aware, that this could be seen as premature oprimization
> and as we all know this is the root of all evil


(why is it that I so often look at the assembly output,
that's certainly evil... )

Using an iterator is cool in the example that I showed :
seen from the compiler, it "is" a pointer, and from my point
of view it is also quite interesting since I can hide some
sanity checking in the iterator to prevent the user from
going out of bounds. Things get worse when I need to access
the data randomly (and this is in fact the general case, and
here it is much more intersting to have some bound
checking). An iterator won't do the job in this case... I
also wonder if preserving logical constness is easy with
iterators (I'll have a look in the STL, the answer probably
sits there).

I feel that there is no easy solution to my
performance/clarity/safety tradeoff problem (I mean, other
than "write the code in fortran" (uh!)). I will take it this
way : code every thing as clearly as possible with all the
safe bound checking, run the code and eventually perhaps
concentrate on the few pieces of code where I really spend a
lot of time (creating some "friend" functions where I can
hack with pointers and other evil things...)

Anyway this is interesting: I should have a look at the STL
more often. Thank you ...

Benoît Mathieu
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
pointer to an array vs pointer to pointer subramanian100in@yahoo.com, India C Programming 5 09-23-2011 10:28 AM
Cast a pointer to array to base class pointer to array Hansen C++ 3 04-24-2010 03:30 PM
Pointer to array of array of const pointer RSL C++ 14 02-19-2010 02:06 PM
Array of pointer and pointer of array erfan C Programming 6 01-28-2008 08:55 PM
Array of pointer Vs Pointer to Array sangeetha C Programming 9 10-09-2004 07:01 PM



Advertisments