4

I'm synchronizing reader and writer processes on Linux.

I have zero or more processes (the readers) that need to sleep until they are woken up, read a resource, go back to sleep, and so on. Please note that I don't know how many reader processes are running at any given moment. I have one process (the writer) that writes to a resource, wakes up the readers, and goes about its business until another resource is ready (in detail, I developed a starvation-free readers-writers solution, but that's not important).

To implement the sleep / wake-up mechanism I use a POSIX condition variable, pthread_cond_t. The clients call pthread_cond_wait() on the variable to sleep, while the server calls pthread_cond_broadcast() to wake them all up. As the manual says, I surround these two calls with a lock/unlock of the associated pthread mutex.

The condition variable and the mutex are initialized in the server and shared between processes through a shared memory area (because I'm not working with threads, but with separate processes), and I'm sure my kernel and C library support this (because I checked _POSIX_THREAD_PROCESS_SHARED).

What happens is that the first client process sleeps and wakes up perfectly. When I start a second process, it blocks in its pthread_cond_wait() and never wakes up, even though the logs confirm that pthread_cond_broadcast() is called.

If I kill the first process and launch another one, it works perfectly. In other words, pthread_cond_broadcast() seems to wake up only one process at a time. If more than one process waits on the very same shared condition variable, only the first one manages to wake up correctly, while the others just seem to ignore the broadcast.

Why does this happen? When I call pthread_cond_broadcast(), every waiting process should wake up, not just one (and, in any case, not always the same one).

5 Answers

7

Have you set the PTHREAD_PROCESS_SHARED attribute on both your condvar and mutex?

For Linux consult the following man pages:

Methods, types, constants etc. are normally defined in /usr/include/pthread.h, /usr/include/nptl/pthread.h.
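As a rough illustration of what setting the attribute looks like, here is a minimal sketch. The struct layout and the names shared_sync, shared_sync_init and shared_sync_create are made up for this example, and it uses an anonymous shared mapping (which only forked children inherit) instead of whatever shared memory area the question actually uses.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical layout of the shared region; the names are illustrative. */
struct shared_sync {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    unsigned long   generation;  /* a predicate the waiters can check */
};

/* Initialise mutex and condvar as process-shared. Returns 0 on success,
 * otherwise the error number reported by the failing pthread call. */
int shared_sync_init(struct shared_sync *s)
{
    pthread_mutexattr_t ma;
    pthread_condattr_t ca;
    int rc;

    if ((rc = pthread_mutexattr_init(&ma)) != 0)
        return rc;
    rc = pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(&s->mutex, &ma);
    pthread_mutexattr_destroy(&ma);
    if (rc != 0)
        return rc;

    if ((rc = pthread_condattr_init(&ca)) != 0)
        return rc;
    rc = pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_cond_init(&s->cond, &ca);
    pthread_condattr_destroy(&ca);
    if (rc != 0)
        return rc;

    s->generation = 0;
    return 0;
}

/* Assumption for this sketch: an anonymous MAP_SHARED mapping.
 * Unrelated processes would need shm_open() or a file-backed mapping. */
struct shared_sync *shared_sync_create(void)
{
    void *p = mmap(NULL, sizeof(struct shared_sync), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (shared_sync_init(p) != 0) {
        munmap(p, sizeof(struct shared_sync));
        return NULL;
    }
    return p;
}
```

Both objects need the attribute: a process-shared condvar paired with a default (process-private) mutex is not enough. Only one process should run the initialisation; the others just map the region and use the objects.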


3 Comments

Vlad, I'm on Linux, there's no such attribute (according to the manpages).
@james, check your header files (find /usr/include/ -type f | xargs egrep '(PTHREAD_PROCESS_SHARED|pthread_condattr_setpshared|pthread_mutexattr_setpshared)'), it should all be there in /usr/include/pthread.h, even on Linux (it's POSIX after all, and I have it on my CentOS 4.x box.)
...which also raises the question, while we're at it: what Linux are you on? :) (uname -a; cat /etc/issue)
5

Do you test for some condition before calling pthread_cond_wait()? I am asking because it's a very common mistake: your process must not call wait() unless you know some other process will call signal() (or broadcast()) later.

Consider this code (from the pthread_cond_wait man page):

          pthread_mutex_lock(&mut);
          while (x <= y) {
                  pthread_cond_wait(&cond, &mut);
          }
          /* operate on x and y */
          pthread_mutex_unlock(&mut);

If you omit the while test and just signal from another process whenever your (x <= y) condition is true, it won't work, since the signal only wakes up the processes that are already waiting. If signal() was called before the other process calls wait(), the signal is lost and the waiting process will wait forever.

EDIT: About the while loop. When you signal one process from another, the woken process is put on the "ready list" but is not necessarily scheduled right away, and your condition (x <= y) may change again before it runs, since no one holds the lock in the meantime. That's why you need to check the condition each time you are about to wait. The pattern should always be: wake up -> check whether the condition still holds -> do the work.
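To make that concrete for a broadcast to several processes, here is a minimal sketch. It assumes a generation counter as the predicate and an anonymous shared mapping inherited over fork(); shared_state, wait_for_update, publish_update and demo_broadcast are names invented for this example, and error checks on the pthread calls are left out for brevity.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared_state {            /* hypothetical shared-memory layout */
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    unsigned long   generation;  /* the predicate: bumped when data is ready */
};

/* Map the state and mark mutex and condvar as process-shared. */
static struct shared_state *state_create(void)
{
    struct shared_state *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED)
        return NULL;
    pthread_mutexattr_t ma;
    pthread_condattr_t ca;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&s->mutex, &ma);
    pthread_mutexattr_destroy(&ma);
    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&s->cond, &ca);
    pthread_condattr_destroy(&ca);
    s->generation = 0;
    return s;
}

/* Reader: sleep until generation differs from the last value seen. */
static unsigned long wait_for_update(struct shared_state *s, unsigned long seen)
{
    pthread_mutex_lock(&s->mutex);
    while (s->generation == seen)            /* re-checked after every wakeup */
        pthread_cond_wait(&s->cond, &s->mutex);
    seen = s->generation;
    pthread_mutex_unlock(&s->mutex);
    return seen;
}

/* Writer: change the predicate under the lock, then wake every waiter. */
static void publish_update(struct shared_state *s)
{
    pthread_mutex_lock(&s->mutex);
    s->generation++;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->mutex);
}

/* Fork nreaders readers, publish once, return how many readers woke up. */
int demo_broadcast(int nreaders)
{
    struct shared_state *s = state_create();
    if (s == NULL)
        return -1;
    for (int i = 0; i < nreaders; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                           /* still wake and reap the rest */
        if (pid == 0)
            _exit(wait_for_update(s, 0) == 1 ? 0 : 1);
    }
    publish_update(s);
    int woken = 0, status;
    while (wait(&status) > 0)
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            woken++;
    munmap(s, sizeof *s);
    return woken;
}
```

A reader that reaches wait_for_update() only after the broadcast sees that generation has already changed and never blocks, so a "lost" wakeup does no harm. A reader that was already waiting is woken by the broadcast and re-checks the counter before proceeding.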

hope it's clear.

1 Comment

I don't fully understand your answer.. how can adding a while loop prevent the wait from blocking?
1

The documentation says that it should work... are you sure it's the same condition variable that the rest of the threads are looking at?

This is the example code from opengroup.org:

pthread_cond_wait(mutex, cond):
    value = cond->value; /* 1 */
    pthread_mutex_unlock(mutex); /* 2 */
    pthread_mutex_lock(cond->mutex); /* 10 */
    if (value == cond->value) { /* 11 */
        me->next_cond = cond->waiter;
        cond->waiter = me;
        pthread_mutex_unlock(cond->mutex);
        unable_to_run(me);
    } else
        pthread_mutex_unlock(cond->mutex); /* 12 */
    pthread_mutex_lock(mutex); /* 13 */


pthread_cond_signal(cond):
    pthread_mutex_lock(cond->mutex); /* 3 */
    cond->value++; /* 4 */
    if (cond->waiter) { /* 5 */
        sleeper = cond->waiter; /* 6 */
        cond->waiter = sleeper->next_cond; /* 7 */
        able_to_run(sleeper); /* 8 */
    }
    pthread_mutex_unlock(cond->mutex); /* 9 */


0

What the last poster said is correct. The KEY to the whole condition-variable mechanism working correctly is that the condvar is NOT signalled before it is waited on. It's strictly a signal to be used when others (one or many) are already waiting. When no one is waiting, it's effectively a no-op. Which, by the way, is NOT how I believe it SHOULD work, but it is how it DOES work.

larry

1 Comment

You should post this as a comment, not as a new answer.
0

I know that this question is quite old but anyway:

At least on Linux, there is a far better mechanism in the Linux kernel that is comparable to Windows events called "eventfd" (event file descriptor).

Personally, I stopped using POSIX condition variables for the reason mentioned above: if no one is waiting on the condition, a signalled condition is simply lost.

You can use eventfds as a reliable mechanism to implement the producer-consumer pattern. Please check the man page eventfd(2); there is also a special flag (EFD_SEMAPHORE) that turns an eventfd into a semaphore-style event.

Writing to it adds to an internal 64-bit counter (which is accompanied by a wait queue). Reading either resets the counter to zero and returns its previous value or, in semaphore mode, decrements it by one. If the counter is already zero, the read blocks. Because this is a normal file descriptor, you can use select, poll or epoll to wait for the event to occur. Very clean and nice kernel implementation!
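A minimal sketch of those semantics, assuming a blocking eventfd; the wrapper names event_create, event_post and event_wait are invented for this example and are not part of any API.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create an eventfd whose counter starts at zero.
 * Returns the file descriptor, or -1 on error. */
int event_create(int semaphore_mode)
{
    return eventfd(0, semaphore_mode ? EFD_SEMAPHORE : 0);
}

/* Producer: add n to the counter, waking a reader blocked on it.
 * Returns 0 on success, -1 on error. */
int event_post(int fd, uint64_t n)
{
    return write(fd, &n, sizeof n) == (ssize_t)sizeof n ? 0 : -1;
}

/* Consumer: block until the counter is non-zero and return the value read
 * (the whole counter in default mode, 1 in semaphore mode). Returns 0 on error. */
uint64_t event_wait(int fd)
{
    uint64_t v = 0;
    if (read(fd, &v, sizeof v) != (ssize_t)sizeof v)
        return 0;
    return v;
}
```

Posting before anyone waits is not lost: the value stays in the counter until a later event_wait() picks it up. Note that a read consumes the counter, so a single post is seen by one reader only; it does not behave like a broadcast to every waiting process.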

This has been rock-solid since 2007 (when eventfd was added to the Linux kernel). Just forget about POSIX condvars on Linux.
