Understanding Linux InterProcess Communication-IV: Duplicate Process

Linux provides an elegant two-step mechanism for process creation and execution through the fork() and exec() system calls.

fork() creates the image of a new process by copying that of an existing one, and
exec() runs a program by overwriting that image with the program’s own.

This separation makes it possible to perform some interesting housekeeping actions in between, as we’ll see in the following lectures.

The fork() call has the following form:

#include <unistd.h> 
      pid_t fork(void);

Return Value:

On success, the PID of the child process is returned in the parent, and 0 is returned in the child. On failure, -1 is returned in the parent, no child process is created, and errno is set appropriately.

Errors:

EAGAIN fork() cannot allocate sufficient memory to copy the parent’s page tables and allocate a task structure for the child.
EAGAIN It was not possible to create a new process because the caller’s RLIMIT_NPROC resource limit was encountered. To exceed this limit, the process must have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability.
ENOMEM fork() failed to allocate the necessary kernel structures because memory is tight.

If it succeeds, fork() creates a new process, which is a child of the caller and an exact copy of the (parent) caller itself. By exact copy we mean that its image is a bitwise copy of the parent’s. In principle the two do not share the image in memory: there can be exceptions to this rule (Linux, for instance, shares pages copy-on-write until one of the two processes modifies them), but we can always think of the two images as being stored in two separate and protected address spaces in memory, so a change to the parent’s variables won’t affect the child’s copies, and vice versa. The only visible differences are in the PCB (process control block), and the most relevant (for now) of them are the following:

The two processes obviously have two different process ids (pids). In a C program, process ids are conveniently represented by variables of type pid_t, which is defined in the sys/types.h header.

In LINUX the PCB of a process contains the id of the process’s parent, hence the child’s PCB will contain as parent id (ppid) the pid of the process that called fork(), while the caller will have as ppid the pid of the process that spawned it.
The child process has its own copy of the parent’s file descriptors. These descriptors reference the same underlying objects, so open files are shared between the child and the parent. This makes sense, since other processes might access those files as well, and having them already open in the child is a time-saver.

The fork() call returns in both the parent and the child, and both resume their execution from the statement immediately following the call. One usually wants the parent and the child to behave differently, and the way to distinguish between them in the program’s source code is to test the value returned by fork(). This value is 0 in the child, and the child’s pid in the parent. Since fork() returns -1 in case the child spawning fails, a catch-all C code fragment to separate behaviours may look like the following:

#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>

int main(void)
{
    pid_t childpid;

    childpid = fork();
    switch (childpid)
    {
    case -1:
        /* fork() failed: no child was created */
        fprintf(stderr, "ERROR: %s\n", strerror(errno));
        exit(1);
    case 0:
        /* Child's code goes here */
        printf("Child: My id is: %d and my parent's id is: %d\n", (int)getpid(), (int)getppid());
        break;
    default:
        /* Parent's code goes here */
        printf("Parent: My id is: %d and my child's id is: %d\n", (int)getpid(), (int)childpid);
        break;
    }
    return 0;
}

Note that a child (i.e. any process at all, since they are all children of some other process, with the exception of processes 0 (swapper) and 1 (init)) cannot use the value returned by fork() to know its pid, since this is always 0 in the child. A system call named getpid() is provided for this purpose, and another one, named getppid(), is used to ask the system about the parent’s id. Both functions take no arguments and return the requested value as a pid_t. They always succeed, so there is no error value to check for.

In the above program, exit() is called in case of failure, which terminates the program. We’ll see later that the exit() call returns the lower 8 bits of its argument (1, in the above example) to a waiting parent process, which can use them to determine the child’s exit status and behave accordingly. The usual convention is to exit with 0 on correct termination, and with a meaningful (for the parent) error code on failure.

It is often the case that a parent process must coordinate its actions with those of its children, maybe exchanging various kinds of messages with them. UNIX defines several sophisticated inter-process communication (IPC) mechanisms, the simplest of which is a parent’s ability to test the termination status of its children. A synchronization mechanism is provided via the wait() system call, which allows a parent to sleep until one of its children exits, and then get its exit status. This call actually comes in three flavors,

one simply called wait(), common to all versions of UNIX (that I know of),
one called waitpid(), which is a POSIX extension, and
one called wait3(), which is a BSD extension.

The general syntax of the wait() system call is:

#include <sys/types.h>
#include <sys/wait.h>

pid_t wait(int *status);

DESCRIPTION

The wait function suspends execution of the current process until a child has exited, or until a signal is delivered whose action is to terminate the current process or to call a signal handling function. If a child has already exited by the time of the call (a so-called “zombie” process), the function returns immediately. Any system resources used by the child are freed.

Return Value:

The process ID of the child which exited, or -1 on error (in which case errno is set to an appropriate value). With waitpid(), zero is returned if WNOHANG was used and no child was available.

Errors:

ECHILD : for wait(), if the calling process has no unwaited-for children; for waitpid(), if the process specified in pid does not exist or is not a child of the calling process. (This can happen for one’s own child if the action for SIGCHLD is set to SIG_IGN. See also the notes about threads in the wait(2) man page.)
EINVAL : (waitpid() only) if the options argument was invalid.
EINTR : if WNOHANG was not set and an unblocked signal or a SIGCHLD was caught.

Here’s an example call to wait(): a program spawns two children, then waits for their completion and behaves differently according to which one has finished. Try to compile and execute it (no need to type: you can cut and paste from your web browser…).

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>   /* fork(), getpid(), sleep() */
#include <stdlib.h>   /* exit() */
#include <stdio.h>

int main(int argc, char *argv[])
{
    pid_t whichone, first, second;
    int howmany;
    int status;

    if ((first=fork())==0) /* Parent spawns 1st child */
    {
        printf("Hi, I am the first child, and my id is %d\n",getpid());
        sleep(10); /* Sleep 10 sec, then exit */
        exit(0); 
    }
    else if (first == -1)
    {
        perror("1st fork: something went bananas");
        exit(1);
    }
    else if ((second=fork())==0) /* Parent spawns 2nd child */
    {
        printf("Hiya, I am the second child, and my id is %d\n",getpid());
        sleep(15); /* Sleep 15 sec, then exit */
        exit(0); 
    }
    else if (second == -1)
    {
        perror("2nd fork: something went bananas");
        exit(1);
    }

    printf("This is the parent\n");

    howmany=0; 
    while (howmany < 2) /* Wait twice */
    {
        whichone=wait(&status);
        howmany++;

        if (whichone==first)
           printf("First child exited ");
        else
           printf("Second child exited ");

        if ((status & 0xffff)==0)
           printf("correctly\n");
        else
           printf("incorrectly\n");
    }
    return 0;
}

The parent enters the loop and waits for the children’s completion. The wait() system call blocks the calling process until one of its immediate children (not children’s children, or siblings) terminates, and then returns the pid of the terminated process. The argument to wait() is the address of an integer variable or the NULL pointer. If it’s not NULL, the system writes 16 bits of status information about the terminated child in the low-order 16 bits of that variable. Of these 16 bits, the higher 8 contain the lower 8 bits of the argument the child passed to exit(), while the lower 8 are all zero if the process exited normally, and otherwise describe how it was terminated, for example the number of the signal that killed it (see the wait(2) man page for details).

Hence, if a child exits with 0, all 16 bits are zero. To check whether this is actually the case we test the bitwise AND expression (status & 0xffff), which evaluates to an integer whose lower 16 bits are those of status, and whose other bits are zero. If it evaluates to zero, everything went fine, otherwise some trouble occurred. Try changing the argument passed to exit() in one of the children.

The POSIX and BSD extensions to wait() are useful when a parent must not block waiting for children, but still wants to know about the children’s termination status values via the wait mechanism. We’ll treat only the POSIX waitpid() call, and you are referred to the man page for the BSD call.

The waitpid() call is declared as follows in the sys/wait.h header:

pid_t waitpid(pid_t pid, int *statptr, int options);

Here the meaning of the return value and of the status pointer statptr is exactly the same as in wait(). However, this call lets the caller specify which children should be waited for and how. Specifically, the first argument pid specifies the process(es) that must be waited for. The relevant (for now) cases are:

pid == -1: any child is waited for, as with wait();
pid > 0: it specifies the pid of a single child that should be waited for (an error occurs if that process does not exist or is not one of the caller’s children);

The relevant (for now) value for the third argument is a constant called WNOHANG, which causes the function not to suspend the caller’s execution if status is not immediately available for one of the child processes. This makes it possible to implement a loop in which the parent can do something useful and periodically poll the children’s status as well.

Sources

http://www.cim.mcgill.ca/~franco/OpSys-304-427/lecture-notes/node16.html
http://linux.about.com/od/commands/l/blcmdl2_wait.htm
info pages of linux operating system
