Velocity Reviews > Optimize a discovery algorithm

Optimize a discovery algorithm

Keith Thompson
Guest
Posts: n/a

 01-02-2014
Willem <(E-Mail Removed)> writes:
> Keith Thompson wrote:
> ) Johannes Bauer <(E-Mail Removed)> writes:
> )> On 02.01.2014 08:51, pozz wrote:
> )>> I hope binary operations (AND) on 32-bits variables will be much more
> )>> simple for 8-bit core than mathematical comparisons..
> )>
> )> Definitely not. I'd guess exactly the same amount of clock cycles per
> )> operation (four instructions and one conditional branch). You'll see it
> )> if you look at the assembly output via objdump.
> )
> ) I'm not so sure about that.
> )
> ) If the CPU can only perform 8-bit operations (I have no idea whether
> ) that's the case or not), it can do a 32-bit AND, OR, or XOR in 4 8-bit
> ) operations. For arithmetic operations, it has to deal with carry bits
> ) for addition and subtraction, and more complex interactions for
> ) multiplication and division.
>
> The addition and subtraction operations on 8-bit CPU's (or any CPU for that
> matter) already takes the carry bits into account, so 'it has to deal with
> carry bits' has been built into the hardware.

Fair enough; multi-word addition and subtraction (where a "word" is the
size that can be handled by a CPU instruction) are likely to be as fast
as multi-word bitwise operations, assuming that single-word ADD and XOR,
for example, take the same number of clock cycles.

Multiplication and division are another story.

--
Keith Thompson (The_Other_Keith) http://www.velocityreviews.com/forums/(E-Mail Removed) <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

pozz
Guest
Posts: n/a

 01-02-2014
Il 02/01/2014 23:05, Robert Wessel ha scritto:
>> It's very similar to my algorithm except for two things.
>> You use a numerical min..max range and I use bitmask, but it is
>> equivalent (see above).
>> You don't repeat the entire process from the beginning, but you continue
>> querying remaining subranges until the last one. This is the complex
>> job I'm not able to make and it is my true question of my original post:
>> how to implement the discovery algorithm without needing to restart from
>> the beginning (and without using recursion). It's strictly a
>> programming problem, related to C language.

>
> The advantage of using ranges is that you can eliminate a found device
> from a range fairly easily, and the continue checking on either side
> of that found device.

I think it's the same for bitwise ranges.

>> I agree with you. Indeed I have chosen to use your tecnique, but now I
>> have to implement it

>
> Well, in pseudocode:
>
> struct {first, last} stack[33];
> // (stack levels need to match number of bits in serial #s + 1)
> [...]

But you are using a "custom stack", 33 elements wide! It's a lot of
memory space lost (64 * 33 = 2kB!!). I can't statically allocate that
space in my AVR8 microcontroller. It's better to shoot a discovered
slave and restart the algorithm from the beginning: more queries so more
time to complete, but much less RAM requirements.

I think your code is very similar, in terms of memory requirements, to
an analog recursive algorithm that is much shorter and simpler.

> As I mentioned, doing this several times to increase reliability may
> be a good idea if your collision detection is not 100% reliable.

Yes, it is a possibility. Anyway the discovery is manually started by
the user that can check the result. If a problem appears, the user can
restart the detection.

pozz
Guest
Posts: n/a

 01-03-2014
Il 02/01/2014 23:55, Robert Wessel ha scritto:
>>
>> But you are using a "custom stack", 33 elements wide! It's a lot of
>> memory space lost (64 * 33 = 2kB!!). I can't statically allocate that
>> space in my AVR8 microcontroller. It's better to shoot a discovered
>> slave and restart the algorithm from the beginning: more queries so more
>> time to complete, but much less RAM requirements.

>
>
> No, it's a stack of 33 pairs of 32 bit words. IOW, 264 bytes.

Yes, you're right. I was calculating bits and not bytes. Anyway it's a
big space to allocate (I'm using ATmega32A that has 2kB of RAM).

>> I think your code is very similar, in terms of memory requirements, to
>> an analog recursive algorithm that is much shorter and simpler.

>
>
> Well, it is the basic explicit stack version of the inherently
> recursive routine. A "real" recursive routine would probably use
> considerably more than 264 bytes of stack, though (because of the need
> to the parameters). The real recursive implementation would be less
> code, though.

You're right again. If I had RAM space, I would use a "real" recursive
routine: much more simple.

> You could reduce stack usage by iteratively looking at smaller ranges
> than normally are generated for each recursion (implicit or explicit).
> For example, you might subdivide the incoming range into 256 sub
> parts, and then issue queries for those 1/256th subranges until to get
> a response (valid or invalid), and then recurse (again implicitly or
> explicitly) on that range and the remaining area to the right. That
> would reduce the maximum smaller ranged recursed on, limiting the
> stack size (and if you used 256, you'd need no more than 5* stack
> levels). Of course that would increase the number of probes by about
> a factor of 256 as well. An intermediate approach (say 16 subranges),
> would reduce stack usage to 9, at an 16 fold increase in work.
>
>
> *It might be four (or 32, in the original), but I'm too lazy to figure
> out if that's true with the simpler empty range discarding method I'm
> using.

RAM space requirements is important as well as number of probes on the
bus: the probe could take about 100ms-200ms (the timeout of the answer).
If the number of probes increase, the overall time increases as well.

I think my original implementation could be a good compromise: I will
send more probes on the bus (not too much more) than the recursive
algorithm, but I don't need memory space for stack.

Aleksandar Kuktin
Guest
Posts: n/a

 01-03-2014
On Thu, 02 Jan 2014 16:55:22 -0600, Robert Wessel wrote:

> On Thu, 02 Jan 2014 23:17:41 +0100, pozz <(E-Mail Removed)> wrote:
>
>>Il 02/01/2014 23:05, Robert Wessel ha scritto:
>>>> It's very similar to my algorithm except for two things.
>>>> You use a numerical min..max range and I use bitmask, but it is
>>>> equivalent (see above).
>>>> You don't repeat the entire process from the beginning, but you
>>>> continue querying remaining subranges until the last one. This is
>>>> the complex job I'm not able to make and it is my true question of my
>>>> original post:
>>>> how to implement the discovery algorithm without needing to restart
>>>> from the beginning (and without using recursion). It's strictly a
>>>> programming problem, related to C language.
>>>
>>> The advantage of using ranges is that you can eliminate a found device
>>> from a range fairly easily, and the continue checking on either side
>>> of that found device.

[snip]

>>>> I agree with you. Indeed I have chosen to use your tecnique, but now
>>>> I have to implement it
>>>
>>> Well, in pseudocode:
>>>
>>> struct {first, last} stack[33];
>>> // (stack levels need to match number of bits in serial #s + 1)
>> > [...]

>>
>>But you are using a "custom stack", 33 elements wide! It's a lot of
>>memory space lost (64 * 33 = 2kB!!). I can't statically allocate that
>>space in my AVR8 microcontroller. It's better to shoot a discovered
>>slave and restart the algorithm from the beginning: more queries so more
>>time to complete, but much less RAM requirements.

>
> No, it's a stack of 33 pairs of 32 bit words. IOW, 264 bytes.
>
>>I think your code is very similar, in terms of memory requirements, to
>>an analog recursive algorithm that is much shorter and simpler.

>
> Well, it is the basic explicit stack version of the inherently recursive
> routine. A "real" recursive routine would probably use considerably
> more than 264 bytes of stack, though (because of the need to save
> parameters). The real recursive implementation would be less code,
> though.

[snip]

Well... IMHO, this problem does not really have a good solution.
Optimization (AKA, political decision making) is in order.

Using a large amount of memory is mandatory to achieve the minimum
possible packet traffic on the net^H^H^Hbus. Memory is needed to store
the intermediate state of the probing algorithm. Discarding it (AKA using
less memory) requires regenerating the state at a subsequent point in
execution, AKA sending more packets.

Below, I propose a solution (which has not been tested and may or may not
contain fence-post errors) that uses the absolute minimum amount of
memory and discards as little of the intermediate state as possible,
thereby sending less packets than it would otherwise need to. It's also
non-recursive (in a way) and short.

uint32_t top, bottom, reusable;

#define DEFAULT_BOTTOM (~0) /* Assuming the compiler will fill the
entire field with ones. */
top = 0;
bottom = DEFAULT_BOTTOM;

do {
reusable = do_querry_bus(top, bottom);
switch (reusable) {
space where the device was detected. */
bottom = DEFAULT_BOTTOM;
break;

/* Maybe it's in the other half. */
reusable = (bottom - top) /2;

top = bottom;
bottom = bottom + reusable; /* May or may not contain
fence-post errors. */
break;

default:
bottom = bottom - ((bottom - top) /2); /* May or may not contain
fence-post errors. */
};
} while (bottom > top);
#undef DEFAULT_BOTTOM
}

Aleksandar Kuktin
Guest
Posts: n/a

 01-03-2014
On Fri, 03 Jan 2014 17:02:10 +0000, Aleksandar Kuktin wrote:

God, I'm an idiot. Or not. The unpatched code will not incur any
additional cost when more than one slave is on the bus, but it /will/
send an unneccessary packet if there is only one slave on the bus.

@@ -10,9 +10,8 @@
reusable = do_querry_bus(top, bottom);
switch (reusable) {
- space where the device was detected. */
- top = do_assign_address(top, bottom) +1;
+ top = bottom +1;
bottom = DEFAULT_BOTTOM;
break;

Aleksandar Kuktin
Guest
Posts: n/a

 01-03-2014
On Fri, 03 Jan 2014 17:12:31 +0000, Aleksandar Kuktin wrote:

> On Fri, 03 Jan 2014 17:02:10 +0000, Aleksandar Kuktin wrote:
>

>
> God, I'm an idiot. Or not. The unpatched code will not incur any
> additional cost when more than one slave is on the bus, but it /will/
> send an unneccessary packet if there is only one slave on the bus.

And it also has a halting problem when either the bus is empty or all the
slaves are very close to the beggining of the address space.