Saturday, May 26, 2007

Fixing the semaphore bug when debugging with GDB

There is a problem when we use GDB to debug a POSIX threads program that uses the semaphore mechanism. If some threads are blocked in sem_wait while we debug, then when we quit GDB those sem_wait calls fail instead of continuing to wait. The result is unpredictable errors, or even a core dump. The following sample program shows the problem:

#include <stdio.h>
#include <stdlib.h>     /* exit */
#include <unistd.h>     /* sleep */
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

sem_t empty;
sem_t full;
int value[1];

/* Producer: stores a value, posts "full", then waits for "empty". */
void *child1(void *arg)
{
    int i = 0;
    int ret = 0;
    /* pthread_t is opaque; the cast is only for printing (works on Linux). */
    unsigned long tid = (unsigned long)pthread_self();

    for (;;)
    {
        value[0] = i++;
        printf("thread %lu set value : %d\n", tid, value[0]);
        ret = sem_post(&full);
        if (ret)
            printf("thread %lu sem_post full fail: %d \n", tid, ret);
        sleep(5);
        if (0 != (ret = sem_wait(&empty)))
        {
            printf("Sem_wait returned %d\n", ret);
            printf("sem_wait for handler failed");
            exit(1);
        }
    }
}

/* Consumer: waits for "full", reads the value, then posts "empty". */
void *child2(void *arg)
{
    unsigned long tid = (unsigned long)pthread_self();
    int ret = 0;

    for (;;)
    {
        if (0 != (ret = sem_wait(&full)))
        {
            printf("Sem_wait returned %d\n", ret);
            printf("sem_wait for handler failed");
            exit(1);
        }
        printf("thread %lu get value %d\n", tid, value[0]);
        value[0] = 0;
        ret = sem_post(&empty);
        if (ret)
            printf("thread %lu sem_post empty fail:%d\n", tid, ret);
        sleep(1);
    }
}

int main(void)
{
    pthread_t tid1, tid2;

    sem_init(&empty, 0, 0);
    sem_init(&full, 0, 0);
    printf("hello\n");
    value[0] = 0;
    pthread_create(&tid1, NULL, child1, NULL);
    pthread_create(&tid2, NULL, child2, NULL);
    sleep(1000);
    printf("main thread exit\n");
    return 0;
}

In this sample program, suppose thread 2 is waiting in sem_wait while we debug it with GDB. After the GDB process terminates, sem_wait returns a non-zero value on its own. That is not what we want to see, because it should keep waiting.

Why does this happen, and can it be fixed? The answer is yes. When GDB quits, the blocked sem_wait call is interrupted: it returns -1 and sets the global variable errno to EINTR. So after a failed sem_wait, check whether errno equals EINTR. If it does, ignore the interruption and wait again. If it does not, a real error has occurred. In that case I exit the program explicitly; you can also call assert(0) to make the program dump core and then debug it.

The patch to the program is as follows (first for child1, then for child2):

if (0 != (ret = sem_wait(&empty)))
{
    printf("Sem_wait returned %d\n", ret);
    printf("sem_wait for handler failed");
    exit(1);
}

Change it to:

while (0 != (ret = sem_wait(&empty)))
{
    /* EINTR only means the wait was interrupted, so loop and wait again. */
    if (errno != EINTR)
    {
        printf("Sem_wait returned %d\n", ret);
        printf("sem_wait for handler failed");
        exit(1);
    }
}

if (0 != (ret = sem_wait(&full)))
{
    printf("Sem_wait returned %d\n", ret);
    printf("sem_wait for handler failed");
    exit(1);
}

Change it to:

while (0 != (ret = sem_wait(&full)))
{
    /* EINTR only means the wait was interrupted, so loop and wait again. */
    if (errno != EINTR)
    {
        printf("Sem_wait returned %d\n", ret);
        printf("sem_wait for handler failed");
        exit(1);
    }
}
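Since both threads need the same retry logic, it could be factored into a small helper. The following is only a sketch: the name sem_wait_retry is hypothetical and does not appear in the program above.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper (not part of the original program): wait on a
 * semaphore and retry whenever the call is interrupted (errno == EINTR).
 * Returns 0 on success, or -1 with errno set for any other error. */
static int sem_wait_retry(sem_t *sem)
{
    int ret;

    do {
        ret = sem_wait(sem);
    } while (ret == -1 && errno == EINTR);

    return ret;
}
```

With such a helper, each thread would test sem_wait_retry(&empty) or sem_wait_retry(&full) in a plain if statement, and a non-zero result would then always mean a real error.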

