Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Synchronization Algorithm Verificator for C++0x

Reply
Thread Tools

Synchronization Algorithm Verificator for C++0x

 
 
Dmitriy V'jukov
Guest
Posts: n/a
 
      08-02-2008
I want to announce tool called Relacy Race Detector, which I've
developed. It's synchronization algorithm verificator for C++0x's
relaxed memory model. The tool is implemented in the form of header-
only library for C++03, which can be used for efficient execution of
unit-tests for synchronization algorithms. The tool executes unit-test
many times under control of special scheduler, on every execution
scheduler models different interleaving between threads, at the same
time every execution is exhaustively analyzed for data races, accesses
to freed memory, failed assertions etc. If no errors found then
verification terminates when particular number of interleavings are
verified (for random scheduler), or when all possible interleavings
are verified (for full search scheduler). If error is found then tool
outputs detailed execution history which leads to
error and terminates.
The tool was designed for verification of algorithms like memory
management, memory reclamation for lock-free algorithms, multithreaded
containers (queues, stacks, maps), mutexes, eventcounts and so on.
My personal subjective feeling to date is that tool is capable of
finding extremely subtle algorithmic errors, memory fence placement
errors and memory fence type errors within a second. And after error
is detected error fixing is relatively trivial, because one has
detailed execution history which leads to error.

Main features:
- Relaxed ISO C++0x Memory Model. Relaxed/acquire/release/acq_rel/
seq_cst memory operations and fences. The only non-supported feature
is memory_order_consume, it's simulated with memory_order_acquire.
- Exhaustive automatic error checking (including ABA detection).
- Full-fledged atomics library (with spurious failures in
compare_exchange()).
- Arbitrary number of threads.
- Detailed execution history for failed tests.
- Before/after/invariant functions for test suites.
- No false positives.

Types of detectable errors:
- Race condition (according to ISO C++0x definition)
- Access to uninitialized variable
- Access to freed memory
- Double free
- Memory leak
- Deadlock
- Livelock
- User assert failed
- User invariant failed

You can get some more information (tutorial, examples etc) here:
http://groups.google.com/group/relacy/web

And here is dedicated news group/discussion forum:
http://groups.google.com/group/relacy/topics

If you want to get a copy of the tool, please, contact me via
http://www.velocityreviews.com/forums/(E-Mail Removed).

Any feedback, comments, suggestions are welcome.

Relacy Race Detector is free for open-source, non-commercial
development; research with non-patented, published results;
educational purposes; home/private usage. Please contact me
((E-Mail Removed)) about other usages.


--------------------------------------------------------
Here is quick example.
Code of unit-test:

#include <relacy/relacy_std.hpp>

// template parameter '2' is number of threads
struct race_test : rl::test_suite<race_test, 2>
{
std::atomic<int> a;
rl::var<int> x;

// executed in single thread before main thread function
void before()
{
a($) = 0;
x($) = 0;
}

// main thread function
void thread(unsigned thread_index)
{
if (0 == thread_index)
{
// code executed by first thread
x($) = 1;
a($).store(1, rl::memory_order_relaxed);
}
else
{
// code executed by second thread
if (1 == a($).load(rl::memory_order_relaxed))
x($) = 2;
}
}

// executed in single thread after main thread function
void after()
{
}

// executed in single thread after every
// 'visible' action in main threads.
// disallowed to modify any state
void invariant()
{
}
};

int main()
{
// start simulation
rl::simulate<race_test>();
}

And here is output of the tool:

struct race_test
DATA RACE
iteration: 8

execution history:
[0] 0: <00366538> atomic store, value=0, (prev value=0),
order=seq_cst, in race_test::before, test.cpp(14)
[1] 0: <0036655C> store, value=0, in race_test::before, test.cpp(15)
[2] 0: <0036655C> store, value=1, in race_test::thread, test.cpp(23)
[3] 0: <00366538> atomic store, value=1, (prev value=0),
order=relaxed, in race_test::thread, test.cpp(24)
[4] 1: <00366538> atomic load, value=1, order=relaxed, in
race_test::thread, test.cpp(2
[5] 1: <0036655C> store, value=0, in race_test::thread, test.cpp(29)
[6] 1: data race detected, in race_test::thread, test.cpp(29)

thread 0:
[0] 0: <00366538> atomic store, value=0, (prev value=0),
order=seq_cst, in race_test::before, test.cpp(14)
[1] 0: <0036655C> store, value=0, in race_test::before, test.cpp(15)
[2] 0: <0036655C> store, value=1, in race_test::thread, test.cpp(23)
[3] 0: <00366538> atomic store, value=1, (prev value=0),
order=relaxed, in race_test::thread, test.cpp(24)

thread 1:
[4] 1: <00366538> atomic load, value=1, order=relaxed, in
race_test::thread, test.cpp(2
[5] 1: <0036655C> store, value=0, in race_test::thread, test.cpp(29)
[6] 1: data race detected, in race_test::thread, test.cpp(29)
--------------------------------------------------------

Dmitriy V'jukov
 
Reply With Quote
 
 
 
 
Dmitriy V'jukov
Guest
Posts: n/a
 
      08-03-2008
On 2 ΑΧΗ, 20:47, "Dmitriy V'jukov" <(E-Mail Removed)> wrote:

> I want to announce tool called Relacy Race Detector, which I've
> developed. It's synchronization algorithm verificator for C++0x's
> relaxed memory model.



Q: Can I use Relacy Race Detector to check my algo againts other that C
++0x memory models (x86, PPC, Java, CLI)?


A Yes, you can. Fortunately, C++0x memory model is very relaxaed, so
for the main part it's a "superset" of basically any other memory
model. You just have to define "binding" between your target memory
model and C++0x memory model.

Let's create such binding, for example, for x86 memory model:
- plain load operation is always "acquire" (i.e.
memory_order_acquire)
- plain store operation is always "release" (i.e.
memory_order_release)
- atomic RMW operation is always sequentially consistent (i.e.
memory_order_seq_cst)
- mfence instruction is
std::atomic_thread_fence(memory_order_seq_cst)
That is all. You can create such bindings for other hardware memory
models you are interested in (PPC, Itatium, SPARC etc).

And you can define such binding to other abstract memory model, like
Java MM. Let's see:
- plain load is "relaxed" (i.e. memory_order_relaxed)
- plain store is "relaxed" (i.e. memory_order_relaxed)
- volatile load is "acquire" (i.e. memory_order_acquire)
- volatile store operation is "release" (i.e. memory_order_release)
- atomic RMW operation is always sequentially consistent (i.e.
memory_order_seq_cst)
But here are some caveats. First, you have to emulate work of GC, i.e.
put all allocated memory to special list, and free all allocated
memory in test_suite::after(). Second, you have to manually emit
sequentially consistent memory fence between every volatile store and
volatile load. Third, you have to manually initialize all variables to
default value (0). Fourth, there is no such thing as data race, so you
have to define all variables as std::atomic, this will effectively
disable data race detection mechanizm. Well, actually you can use
rl::var variables, if you know that there must be no concurrent
accesses to variable, this will enable some automatic error detection
wrt data races.
Sounds not very cool... So I'm going to add built-in support for Java
and CLI. Then user would have to define something like RL_JAVA_MODE/
RL_CLI_MODE, and get all those things out-of-the-box. But yes, it
still will be C++ library. What do you think?

Dmitriy V'jukov
 
Reply With Quote
 
 
 
 
Chris M. Thomasson
Guest
Posts: n/a
 
      08-05-2008
"Dmitriy V'jukov" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
>I want to announce tool called Relacy Race Detector, which I've
> developed. It's synchronization algorithm verificator for C++0x's
> relaxed memory model. The tool is implemented in the form of header-
> only library for C++03, which can be used for efficient execution of
> unit-tests for synchronization algorithms. The tool executes unit-test
> many times under control of special scheduler, on every execution
> scheduler models different interleaving between threads, at the same
> time every execution is exhaustively analyzed for data races, accesses
> to freed memory, failed assertions etc. If no errors found then
> verification terminates when particular number of interleavings are
> verified (for random scheduler), or when all possible interleavings
> are verified (for full search scheduler). If error is found then tool
> outputs detailed execution history which leads to
> error and terminates.
> The tool was designed for verification of algorithms like memory
> management, memory reclamation for lock-free algorithms, multithreaded
> containers (queues, stacks, maps), mutexes, eventcounts and so on.
> My personal subjective feeling to date is that tool is capable of
> finding extremely subtle algorithmic errors, memory fence placement
> errors and memory fence type errors within a second. And after error
> is detected error fixing is relatively trivial, because one has
> detailed execution history which leads to error.

[...]

Have you testing the following algorithm yet?

http://research.sun.com/scalable/pub...rkstealing.pdf

I thought about coding it up. However, I think the algorithm might have a
patent application.

 
Reply With Quote
 
Chris M. Thomasson
Guest
Posts: n/a
 
      08-05-2008
"Dmitriy V'jukov" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
>I want to announce tool called Relacy Race Detector, which I've
> developed. It's synchronization algorithm verificator for C++0x's
> relaxed memory model. The tool is implemented in the form of header-
> only library for C++03, which can be used for efficient execution of
> unit-tests for synchronization algorithms. The tool executes unit-test
> many times under control of special scheduler, on every execution
> scheduler models different interleaving between threads, at the same
> time every execution is exhaustively analyzed for data races, accesses
> to freed memory, failed assertions etc. If no errors found then
> verification terminates when particular number of interleavings are
> verified (for random scheduler), or when all possible interleavings
> are verified (for full search scheduler). If error is found then tool
> outputs detailed execution history which leads to
> error and terminates.
> The tool was designed for verification of algorithms like memory
> management, memory reclamation for lock-free algorithms, multithreaded
> containers (queues, stacks, maps), mutexes, eventcounts and so on.
> My personal subjective feeling to date is that tool is capable of
> finding extremely subtle algorithmic errors, memory fence placement
> errors and memory fence type errors within a second. And after error
> is detected error fixing is relatively trivial, because one has
> detailed execution history which leads to error.


[...]


After you integrate a mutex and condvar into the detection framework, I
would like to model an eventcount.

Also, it seems like you could parallelize the testing process somewhat... I
need to think about this some more.

Anyway, nice work.


BTW, you ask if there is any market for this type of tool in an e-mail, well
I hope you don't mind if I answer that here...



IMVOH, when C++ 0x is finally released, it should have a fairly big impact.
I can see it now... You trademark a logo, and give a company the chance to
stick it on their products to give their customers piece of mind:

Customer: "Well, the software has the Relacy logo on the front of the box;
it must be correct concurrent software indeed!"

Vendor: "Are software lives up to the prerequisites of the Relacy logo. We
are proud to be able to display it on our software packaging."


I hope you make tons of $$$ with this tool Dmitriy. Get rich and, then a
little richer; AFAICT its a winner.


Any thoughts?


=^D

 
Reply With Quote
 
Chris M. Thomasson
Guest
Posts: n/a
 
      08-05-2008
"Chris M. Thomasson" <(E-Mail Removed)> wrote in message
news:gH4mk.11785$(E-Mail Removed)...
> "Dmitriy V'jukov" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>>I want to announce tool called Relacy Race Detector, which I've
>> developed.

[...]
>
> [...]
>
> After you integrate a mutex and condvar into the detection framework, I
> would like to model an eventcount.


I would be interested to see how long it takes to detect the following bug
you found in an older version of AppCore:

http://groups.google.com/group/comp....cf4a952ff5af7d

[...]

 
Reply With Quote
 
Dmitriy V'jukov
Guest
Posts: n/a
 
      08-06-2008
On 6 ΑΧΗ, 02:54, "Chris M. Thomasson" <(E-Mail Removed)> wrote:

> I would be interested to see how long it takes to detect the following bug
> you found in an older version of AppCore:
>
> http://groups.google.com/group/comp..../browse_frm/th...


Ok, let's see. I've just finished basic mutex and condvar
implementation.

Here is eventcount:

class eventcount
{
public:
eventcount()
: count(0)
, waiters(0)
{}

void signal_relaxed()
{
unsigned cmp = count($).load(std::memory_order_relaxed);
signal_impl(cmp);
}

void signal()
{
unsigned cmp = count($).fetch_add(0, std::memory_order_seq_cst);
signal_impl(cmp);
}

unsigned get()
{
unsigned cmp = count($).fetch_or(0x80000000,
std::memory_order_seq_cst);
return cmp & 0x7FFFFFFF;
}

void wait(unsigned cmp)
{
unsigned ec = count($).load(std::memory_order_seq_cst);
if (cmp == (ec & 0x7FFFFFFF))
{
guard.lock($);
ec = count($).load(std::memory_order_seq_cst);
if (cmp == (ec & 0x7FFFFFFF))
{
waiters($) += 1;
cv.wait(guard, $);
}
guard.unlock($);
}
}

private:
std::atomic<unsigned> count;
rl::var<unsigned> waiters;
mutex guard;
condition_variable_any cv;

void signal_impl(unsigned cmp)
{
if (cmp & 0x80000000)
{
guard.lock($);
while (false == count($).compare_swap(cmp,
(cmp + 1) & 0x7FFFFFFF, std::memory_order_relaxed));
unsigned w = waiters($);
waiters($) = 0;
guard.unlock($);
if (w)
cv.notify_all($);
}
}
};

Here is the queue:

template<typename T>
class spsc_queue
{
public:
spsc_queue()
{
node* n = RL_NEW node ();
head($) = n;
tail($) = n;
}

~spsc_queue()
{
RL_ASSERT(head($) == tail($));
RL_DELETE((node*)head($));
}

void enqueue(T data)
{
node* n = RL_NEW node (data);
head($)->next($).store(n, std::memory_order_release);
head($) = n;
ec.signal_relaxed();
}

T dequeue()
{
T data = try_dequeue();
while (0 == data)
{
int cmp = ec.get();
data = try_dequeue();
if (data)
break;
ec.wait(cmp);
data = try_dequeue();
if (data)
break;
}
return data;
}

private:
struct node
{
std::atomic<node*> next;
rl::var<T> data;

node(T data = T())
: next(0)
, data(data)
{}
};

rl::var<node*> head;
rl::var<node*> tail;

eventcount ec;

T try_dequeue()
{
node* t = tail($);
node* n = t->next($).load(std::memory_order_acquire);
if (0 == n)
return 0;
T data = n->data($);
RL_DELETE(t);
tail($) = n;
return data;
}
};

Here is the test:

struct spsc_queue_test : rl::test_suite<spsc_queue_test, 2>
{
spsc_queue<int> q;

void thread(unsigned thread_index)
{
if (0 == thread_index)
{
q.enqueue(11);
}
else
{
int d = q.dequeue();
RL_ASSERT(11 == d);
}
}
};

int main()
{
rl::simulate<spsc_queue_test>();
}

It takes around 40 minutes.

Let's run it!
Here is output (note error detected on iteration 3):

struct spsc_queue_test
DEADLOCK
iteration: 3

execution history:
[0] 1: [BEFORE BEGIN], in rl::context_impl<struct
spsc_queue_test,class rl::random_scheduler<2> >::fiber_proc_impl,
context.hpp(341)
[1] 1: <0035398C> atomic store, value=-1, (prev value=0),
order=relaxed, in rl::generic_mutex<0>::generic_mutex, stdlib/
mutex.hpp(77)
[2] 1: memory allocation: addr=00353A80, size=52, in
spsc_queue<int>::spsc_queue, peterson.cpp(397)
[3] 1: <00353938> store, value=00353A80, in
spsc_queue<int>::spsc_queue, peterson.cpp(39
[4] 1: <00353948> store, value=00353A80, in
spsc_queue<int>::spsc_queue, peterson.cpp(399)
[5] 1: [BEFORE END], in rl::context_impl<struct spsc_queue_test,class
rl::random_scheduler<2> >::fiber_proc_impl, context.hpp(356)
[6] 1: <00353948> load, value=00353A80, in
spsc_queue<int>::try_dequeue, peterson.cpp(452)
[7] 1: <00353A80> atomic load, value=00000000, order=acquire, in
spsc_queue<int>::try_dequeue, peterson.cpp(453)
[8] 1: <00353958> fetch_or , prev=0, op=2147483648, new=2147483648,
order=seq_cst, in eventcount::get, peterson.cpp(349)
[9] 1: <00353948> load, value=00353A80, in
spsc_queue<int>::try_dequeue, peterson.cpp(452)
[10] 0: memory allocation: addr=0035BCA0, size=52, in
spsc_queue<int>::enqueue, peterson.cpp(410)
[11] 0: <00353938> load, value=00353A80, in spsc_queue<int>::enqueue,
peterson.cpp(411)
[12] 1: <00353A80> atomic load, value=00000000, order=acquire, in
spsc_queue<int>::try_dequeue, peterson.cpp(453)
[13] 0: <00353A80> atomic store, value=0035BCA0, (prev
value=00000000), order=release, in spsc_queue<int>::enqueue,
peterson.cpp(411)
[14] 0: <00353938> store, value=0035BCA0, in spsc_queue<int>::enqueue,
peterson.cpp(412)
[15] 0: <00353958> atomic load, value=0 [NOT CURRENT], order=relaxed,
in eventcount::signal_relaxed, peterson.cpp(337)
[16] 1: <00353958> atomic load, value=2147483648, order=seq_cst, in
eventcount::wait, peterson.cpp(355)
[17] 1: <0035398C> CAS succ orig=-1, cmp=-1, xchg=1, order=acquire, in
rl::generic_mutex<0>::lock, stdlib/mutex.hpp(117)
[18] 1: <0035398C> mutex: lock, in eventcount::wait, peterson.cpp(35
[19] 1: <00353958> atomic load, value=2147483648, order=seq_cst, in
eventcount::wait, peterson.cpp(359)
[20] 1: <0035397C> load, value=0, in eventcount::wait,
peterson.cpp(362)
[21] 1: <0035397C> store, value=1, in eventcount::wait,
peterson.cpp(362)
[22] 1: <0035398C> atomic store, value=-1, (prev value=1),
order=release, in rl::generic_mutex<0>::unlock, stdlib/mutex.hpp(209)
[23] 1: <0035398C> mutex: unlock, in eventcount::wait,
peterson.cpp(363)
[24] 1: blocking thread, in eventcount::wait, peterson.cpp(363)
[25] 1: deadlock detected, in eventcount::wait, peterson.cpp(363)

thread 0:
[10] 0: memory allocation: addr=0035BCA0, size=52, in
spsc_queue<int>::enqueue, peterson.cpp(410)
[11] 0: <00353938> load, value=00353A80, in spsc_queue<int>::enqueue,
peterson.cpp(411)
[13] 0: <00353A80> atomic store, value=0035BCA0, (prev
value=00000000), order=release, in spsc_queue<int>::enqueue,
peterson.cpp(411)
[14] 0: <00353938> store, value=0035BCA0, in spsc_queue<int>::enqueue,
peterson.cpp(412)
[15] 0: <00353958> atomic load, value=0 [NOT CURRENT], order=relaxed,
in eventcount::signal_relaxed, peterson.cpp(337)

thread 1:
[0] 1: [BEFORE BEGIN], in rl::context_impl<struct
spsc_queue_test,class rl::random_scheduler<2> >::fiber_proc_impl,
context.hpp(341)
[1] 1: <0035398C> atomic store, value=-1, (prev value=0),
order=relaxed, in rl::generic_mutex<0>::generic_mutex, stdlib/
mutex.hpp(77)
[2] 1: memory allocation: addr=00353A80, size=52, in
spsc_queue<int>::spsc_queue, peterson.cpp(397)
[3] 1: <00353938> store, value=00353A80, in
spsc_queue<int>::spsc_queue, peterson.cpp(39
[4] 1: <00353948> store, value=00353A80, in
spsc_queue<int>::spsc_queue, peterson.cpp(399)
[5] 1: [BEFORE END], in rl::context_impl<struct spsc_queue_test,class
rl::random_scheduler<2> >::fiber_proc_impl, context.hpp(356)
[6] 1: <00353948> load, value=00353A80, in
spsc_queue<int>::try_dequeue, peterson.cpp(452)
[7] 1: <00353A80> atomic load, value=00000000, order=acquire, in
spsc_queue<int>::try_dequeue, peterson.cpp(453)
[8] 1: <00353958> fetch_or , prev=0, op=2147483648, new=2147483648,
order=seq_cst, in eventcount::get, peterson.cpp(349)
[9] 1: <00353948> load, value=00353A80, in
spsc_queue<int>::try_dequeue, peterson.cpp(452)
[12] 1: <00353A80> atomic load, value=00000000, order=acquire, in
spsc_queue<int>::try_dequeue, peterson.cpp(453)
[16] 1: <00353958> atomic load, value=2147483648, order=seq_cst, in
eventcount::wait, peterson.cpp(355)
[17] 1: <0035398C> CAS succ orig=-1, cmp=-1, xchg=1, order=acquire, in
rl::generic_mutex<0>::lock, stdlib/mutex.hpp(117)
[18] 1: <0035398C> mutex: lock, in eventcount::wait, peterson.cpp(35
[19] 1: <00353958> atomic load, value=2147483648, order=seq_cst, in
eventcount::wait, peterson.cpp(359)
[20] 1: <0035397C> load, value=0, in eventcount::wait,
peterson.cpp(362)
[21] 1: <0035397C> store, value=1, in eventcount::wait,
peterson.cpp(362)
[22] 1: <0035398C> atomic store, value=-1, (prev value=1),
order=release, in rl::generic_mutex<0>::unlock, stdlib/mutex.hpp(209)
[23] 1: <0035398C> mutex: unlock, in eventcount::wait,
peterson.cpp(363)
[24] 1: blocking thread, in eventcount::wait, peterson.cpp(363)
[25] 1: deadlock detected, in eventcount::wait, peterson.cpp(363)

Operations on condition variables are not yet in execution history.

Let's replace 'ec.signal_relaxed()' with 'ec.signal()'. Here is the
output:

struct spsc_queue_test
6% (65536/1000000)
13% (131072/1000000)
19% (196608/1000000)
26% (262144/1000000)
32% (327680/1000000)
39% (393216/1000000)
45% (458752/1000000)
52% (524288/1000000)
58% (589824/1000000)
65% (655360/1000000)
72% (720896/1000000)
78% (786432/1000000)
85% (851968/1000000)
91% (917504/1000000)
98% (983040/1000000)
iterations: 1000000
total time: 2422
throughput: 412881

Test passed. Throughput is around 400'000 test executions per second.

Also test reveals some errors with memory fences.
In signaling function you are using acquire fence in cas, but relaxed
is enough here:
void signal_impl(unsigned cmp)
{
if (cmp & 0x80000000)
{
guard.lock($);
while (false == count($).compare_swap(cmp,
(cmp + 1) & 0x7FFFFFFF, std::memory_order_relaxed));
unsigned w = waiters($);
waiters($) = 0;
guard.unlock($);
if (w)
cv.notify_all($);
}
}

Also. Following naive approach doesn't work (at least in C++0x):
void signal()
{
std::atomic_thread_fence(std::memory_order_seq_cst );
signal_relaxed();
}
You have to execute sequentially consistent atomic RMW on 'ec'
variable here. Because sequentially consistent fence and sequentially
consistent atomic RMW operation doesn' t synchronize-with each other.
Hmmm... This is strange. But I can't find anything about this in
draft... Anthony Williams said that committee made some changes to
initial proposal on fences, but they are not yet published...


Dmitriy V'jukov
 
Reply With Quote
 
Chris M. Thomasson
Guest
Posts: n/a
 
      08-06-2008
"Dmitriy V'jukov" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
On 6 ΑΧΗ, 02:54, "Chris M. Thomasson" <(E-Mail Removed)> wrote:

> > I would be interested to see how long it takes to detect the following
> > bug
> > you found in an older version of AppCore:
> >
> > http://groups.google.com/group/comp..../browse_frm/th...


> Ok, let's see. I've just finished basic mutex and condvar
> implementation.


Can you e-mail the new version to me please?




> Here is eventcount:


class eventcount
{
[...]
void wait(unsigned cmp)
{
unsigned ec = count($).load(std::memory_order_seq_cst);
if (cmp == (ec & 0x7FFFFFFF))
{
guard.lock($);
ec = count($).load(std::memory_order_seq_cst);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^
I don't think that you need a memory barrier under the lock because the
only mutation which will make the following compare succeed is also
performed under the same lock. The signal-slow path is under the lock, and
the waiter slow-path is under the lock. However, I don't know if this
violates some rule in C++ 0x.

if (cmp == (ec & 0x7FFFFFFF))
{
waiters($) += 1;
cv.wait(guard, $);
}
guard.unlock($);
}
}
};

> Here is the queue:


[...]

> Here is the test:


[...]

> It takes around 40 minutes.


> Let's run it!
> Here is output (note error detected on iteration 3):


> struct spsc_queue_test
> DEADLOCK
> iteration: 3


Cool; it sure does detect the race-condition in the old AppCore! Very nice
indeed.



[...]

> Operations on condition variables are not yet in execution history.


> Let's replace 'ec.signal_relaxed()' with 'ec.signal()'. Here is the
> output:


> struct spsc_queue_test

[...]
> iterations: 1000000
> total time: 2422
> throughput: 412881


> Test passed. Throughput is around 400'000 test executions per second.


Beautiful. My eventcount algorihtm works!

=^D




> Also test reveals some errors with memory fences.
> In signaling function you are using acquire fence in cas, but relaxed
> is enough here:
> void signal_impl(unsigned cmp)
> {
> if (cmp & 0x80000000)
> {
> guard.lock($);
> while (false == count($).compare_swap(cmp,
> (cmp + 1) & 0x7FFFFFFF, std::memory_order_relaxed));
> unsigned w = waiters($);
> waiters($) = 0;
> guard.unlock($);
> if (w)
> cv.notify_all($);
> }
> }


Right. The current public AppCore atomic API is not fine-grain enough to
allow for relaxed operations. I need to change that.




> Also. Following naive approach doesn't work (at least in C++0x):
> void signal()
> {
> std::atomic_thread_fence(std::memory_order_seq_cst );
> signal_relaxed();
> }


Good thing the MFENCE works in x86!




> You have to execute sequentially consistent atomic RMW on 'ec'
> variable here. Because sequentially consistent fence and sequentially
> consistent atomic RMW operation doesn' t synchronize-with each other.
> Hmmm... This is strange. But I can't find anything about this in
> draft... Anthony Williams said that committee made some changes to
> initial proposal on fences, but they are not yet published...


This is weird. a seq_cst fence does not work with seq_cst RMW!? Do you
happen to know the rational for this? IMVHO, it makes no sense at all... I
must be missing something.

:^/

 
Reply With Quote
 
Dmitriy V'jukov
Guest
Posts: n/a
 
      08-06-2008
On Aug 6, 12:23 am, "Chris M. Thomasson" <(E-Mail Removed)> wrote:

> Have you testing the following algorithm yet?
>
> http://research.sun.com/scalable/pub...rkstealing.pdf


Yes and no. No, I didn't test this particular algorithm. But I test my
analogous algorithm for work-stealing deque.
It's extremely hardcore lock-free algorithm with 500+ LOC. And it was
hangup sometimes, and sometimes cause SIGSEGV. I was trying to debug
it manually, I was trying to insert some asserts, I was trying to
insert some "yields", I was trying to review it several times, I was
trying to analyze dumps. Uselessly.
When I run it under Relacy Race Detector I localize and fix 2 subtle
algo errors and clarify some moments with memory fences it about 3
hours. And it was almost mechanical work, not some black magic.

Dmitriy V'jukov
 
Reply With Quote
 
Dmitriy V'jukov
Guest
Posts: n/a
 
      08-06-2008
On Aug 6, 2:46 am, "Chris M. Thomasson" <(E-Mail Removed)> wrote:

> Also, it seems like you could parallelize the testing process somewhat... I
> need to think about this some more.


Yes, this is in todo list:
http://groups.google.com/group/relac...o-feature-list

Feel free to comment on todo/feature list, propose new items, and
prioritize items.
It's better to do this here:
http://groups.google.com/group/relacy/topics

Dmitriy V'jukov
 
Reply With Quote
 
Dmitriy V'jukov
Guest
Posts: n/a
 
      08-06-2008
On Aug 6, 2:46 am, "Chris M. Thomasson" <(E-Mail Removed)> wrote:

> IMVOH, when C++ 0x is finally released, it should have a fairly big impact.
> I can see it now... You trademark a logo, and give a company the chance to
> stick it on their products to give their customers piece of mind:
>
> Customer: "Well, the software has the Relacy logo on the front of the box;
> it must be correct concurrent software indeed!"
>
> Vendor: "Are software lives up to the prerequisites of the Relacy logo. We
> are proud to be able to display it on our software packaging."
>
> I hope you make tons of $$$ with this tool Dmitriy. Get rich and, then a
> little richer; AFAICT its a winner.
>
> Any thoughts?
>
> =^D



Sounds great!

I think I can start developing 2 logos:
[RELACY APPROVED]
and the second:
[RELACY BUSTED]


Dmitriy V'jukov
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Filtered Back Projection Algorithm (FBP Algorithm) Bapaiah Katepalli VHDL 1 06-23-2006 04:50 PM
synchronization Firefox 0 04-12-2005 03:44 PM
System Time synchronization arcvonz ASP .Net 2 08-23-2004 10:17 PM
Flip Flop Synchronization John VHDL 3 01-05-2004 05:58 PM
Key generation algorithm and Cipher algorithm Ahmed Moustafa Java 0 11-15-2003 06:35 AM



Advertisments