Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Row-wise vs. column-wise image processing

Reply
Thread Tools

Row-wise vs. column-wise image processing

 
 
Enrique Cruiz
Guest
Posts: n/a
 
      01-25-2007
Hello all,

I am currently implementing a fairly simple algorithm. It scans a
grayscale image, and computes a pixel's new value as a function of its
original value. Two passes are made, first horizontally and second
vertically. The problem I have is that the vertical pass is 3 to 4
times slower than the horizontal, although the code is _exactly_ the
same in both cases?!

The code itself is very simple. There are 2 loops to scan through rows
and colums (horizontal pass), or columns and rows (vertical pass)
depending on the pass. The processing part is simple and is contained
in a single line. 'pixel' is a pointer to the current pixel. The new
value of the current pixel is first updated, and the pointer is then
incremented to the next pixel, which is either in the next column or in
the next row depending on the pass. I should add that the image is
stored as a large 1-D vector, i.e. the values of each rows are
contiguous in memory. Here is a simplified version of the code:

####################
// HORIZONTAL
// loop for every row
....
// then for every column
for(col=firstCol+1 ; col<=lastCol-1 ; ++col)
{
*pixel = (*pixel) * 2 / 3;
pixel++;
}

// VERTICAL
// loop for every column
....
// then for every row
for(row=firstRow+1 ; row<=lastRow-1 ; ++row)
{
*pixel = (*pixel) * 2 / 3;
pixel+=imgWidth;
}
##################

For this small amount of code, timings are as follow:
- horizontal = 0.035 sec.
- vertical =* *0.135 sec.

Now if we simply remove in each case the line updating the pixel
pointer (i.e. "pixel++;" and "pixel+=imgWidth;"), the timings then
becomes equal at 0.081. This simple instruction is responsible for the
massive loss of time. And I have no idea why.

My only guess relates to memory management issues. Since the image is
stored row-wise, the current and next values are physically next in
memory during the horizontal pass. On the other hand, for the vertical
pass, the next value is stored in the next row, and the distance
between them becomes 'image_width'. My guess is that the next pixel
value in such a case is not close enough to be stored in the processor
cache or register. The processor has to fetch it from memory, hence the
massive loss in speed. This is however just a guess.

I would really appreciate if anybody could enlighten me on this topic.

Thanks in advance,


Enrique

 
Reply With Quote
 
 
 
 
peter koch
Guest
Posts: n/a
 
      01-25-2007


On Jan 25, 11:59 am, Enrique Cruiz <jni6l03mdo6n...@jetable.org>
wrote:
> Hello all,
>
> I am currently implementing a fairly simple algorithm. It scans a
> grayscale image, and computes a pixel's new value as a function of its
> original value. Two passes are made, first horizontally and second
> vertically. The problem I have is that the vertical pass is 3 to 4
> times slower than the horizontal, although the code is _exactly_ the
> same in both cases?!
>

[snip]
Modern CPU's are complex beasts doing lots of stuff to improve
performance. One such thing is caching of memory, and that is what
happens here: the horisontal pass can exploit the cache much better
than the vertical pass.

/Peter

 
Reply With Quote
 
 
 
 
=?iso-8859-1?q?Erik_Wikstr=F6m?=
Guest
Posts: n/a
 
      01-25-2007
Enrique Cruiz wrote:

This is not a C++ question as such, so it's kind of off topic, but
anyway...

> // then for every column
> for(col=firstCol+1 ; col<=lastCol-1 ; ++col)
> {
> *pixel = (*pixel) * 2 / 3;
> pixel++;
>
> }


Normally one would use the loop-variable inside the loop, I suspect
that this is not the real code you are showing. In the real code there
might be something else that makes the difference, though I doubt it.

> My only guess relates to memory management issues. Since the image is
> stored row-wise, the current and next values are physically next in
> memory during the horizontal pass. On the other hand, for the vertical
> pass, the next value is stored in the next row, and the distance
> between them becomes 'image_width'. My guess is that the next pixel
> value in such a case is not close enough to be stored in the processor
> cache or register. The processor has to fetch it from memory, hence the
> massive loss in speed. This is however just a guess.


Probably yes, many factors come into play, the size of the image, the
size of the caches or the CPU, how much speculation the CPU does and so
on. A modern CPU often speculates that you don't only want to use the
piece of memory you try to access but also those pieces close to it.
This helps in the row-wise run since it allows the CPU to read in the
memory before it is actually used, in the column-wise run it works
against you by pulling in more than you need. If you have a smaller
image you might be able to squeeze it all into the cache which should
make both runs equally fast.

--
Erik Wikström

 
Reply With Quote
 
Enrique Cruiz
Guest
Posts: n/a
 
      01-25-2007
On 2007-01-25 12:02:42 +0000, "peter koch" <> said:
>
> Modern CPU's are complex beasts doing lots of stuff to improve
> performance. One such thing is caching of memory, and that is what
> happens here: the horisontal pass can exploit the cache much better
> than the vertical pass.


Thanks. Do you know any resources that discuss similar issues related
to (agressive) optimization?

Alexis


 
Reply With Quote
 
Enrique Cruiz
Guest
Posts: n/a
 
      01-25-2007
On 2007-01-25 12:05:36 +0000, "Erik Wikström"
<> said:
> Probably yes, many factors come into play, the size of the image, the
> size of the caches or the CPU, how much speculation the CPU does and so
> on. A modern CPU often speculates that you don't only want to use the
> piece of memory you try to access but also those pieces close to it.
> This helps in the row-wise run since it allows the CPU to read in the
> memory before it is actually used, in the column-wise run it works
> against you by pulling in more than you need. If you have a smaller
> image you might be able to squeeze it all into the cache which should
> make both runs equally fast.


Thanks a lot for the informative answer. Do you know good resources
that discuss similar issues related to optimisation?

Enrique

 
Reply With Quote
 
Jim Langston
Guest
Posts: n/a
 
      01-25-2007
"Enrique Cruiz" <> wrote in message
news:2007012514482643658-jni6l03mdo6nvuu@jetableorg...
> On 2007-01-25 12:05:36 +0000, "Erik Wikström" <>
> said:
>> Probably yes, many factors come into play, the size of the image, the
>> size of the caches or the CPU, how much speculation the CPU does and so
>> on. A modern CPU often speculates that you don't only want to use the
>> piece of memory you try to access but also those pieces close to it.
>> This helps in the row-wise run since it allows the CPU to read in the
>> memory before it is actually used, in the column-wise run it works
>> against you by pulling in more than you need. If you have a smaller
>> image you might be able to squeeze it all into the cache which should
>> make both runs equally fast.

>
> Thanks a lot for the informative answer. Do you know good resources that
> discuss similar issues related to optimisation?
>
> Enrique


Also, consider that
pixel++;
is less assembly than:
pixel+=imgWidth;

I mean, pixel++; may be something as simple as:
INC [pixel]
(although it may be more like)
MOV AX,[pixel]
INC AX;
MOV [pixel],AX

Where pixel+=imgWidth; would involve some memory swapping and an add.
Some I've seen are like:
MOV AX,[pixel];
MOV BX,[imgWidth];
ADD AX,BX
MOV [pixel],DX

or similar. without looking at the assembly I couldn't say. I learned
assembly back in the 80x86 days and I"m sure it's changed a bit since then.


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Parallel Image Processing in VHDL mike VHDL 10 02-06-2009 11:58 PM
Processing an image with numarray.nd image =?ISO-8859-1?Q?Rapha=EBl_MARC?= Python 1 10-04-2005 10:58 AM
Post-Processing RAW vs Post-Processing TIFF Mike Henley Digital Photography 42 01-30-2005 08:26 AM
wx.Image: Couldn't add an image to the image list. Laszlo Zsolt Nagy Python 1 01-26-2005 09:55 PM
Question: processing HTML, re-write default processing action of many tags Hubert Hung-Hsien Chang Python 2 09-17-2004 03:10 PM



Advertisments