Multithreaded Programming Guide

Covering Multithreading Basics

The word multithreading can be translated as many threads of control. A traditional UNIX process has always contained, and still contains, a single thread of control. Multithreading (MT) separates a process into many execution threads, each of which runs independently.
Read this chapter to understand the multithreading basics.
  • Defining Multithreading Terms
  • Benefiting From Multithreading
  • Looking At Multithreading Structure
  • Meeting Multithreading Standards
Because each thread runs independently, multithreading your code can:
  • Improve application responsiveness
  • Use multiprocessors more efficiently
  • Improve your program structure
  • Use fewer system resources
  • Improve performance

Defining Multithreading Terms

The following terms are used in this chapter to describe multithreading concepts.
  • Thread: A sequence of instructions executed within the context of a process
  • Single-threaded: Restricting access to a single thread
  • Multithreaded: Allowing access to two or more threads
  • User-level or application-level threads: Threads managed by the threads library routines in user (as opposed to kernel) space
  • Lightweight processes: Threads in the kernel that execute kernel code and system calls (also called LWPs)
  • Bound threads: Threads that are permanently bound to LWPs
  • Unbound threads: Threads that attach and detach from among the LWP pool
  • Counting semaphore: A memory-based synchronization mechanism

Defining Concurrency and Parallelism

Concurrency exists when at least two threads are in progress at the same time. Parallelism arises when at least two threads are executing simultaneously.
In a multithreaded process on a single processor, the processor can switch execution resources between threads, resulting in concurrent execution. In the same multithreaded process on a shared-memory multiprocessor, each thread in the process can run on a separate processor at the same time, resulting in parallel execution.
When the process has as many threads as, or fewer threads than, there are processors, the threads support system and the operating system ensure that each thread runs on a different processor. For example, in a matrix multiplication with m processors and m threads, each thread computes a row of the result.
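The matrix multiplication example can be sketched as follows. This is a minimal illustration, not code from this guide: it uses the POSIX pthread_create() and pthread_join() calls as a stand-in for the threads library interface, and the matrix size M, the global arrays, and the names row_worker() and multiply_by_rows() are all hypothetical.

```c
#include <pthread.h>

#define M 3   /* hypothetical size: M result rows, so M threads */

static double A[M][M], B[M][M], C[M][M];

/* Each thread computes exactly one row of C = A * B. */
static void *row_worker(void *arg)
{
    int i = *(int *)arg;              /* index of the row this thread owns */

    for (int j = 0; j < M; j++) {
        double sum = 0.0;
        for (int k = 0; k < M; k++)
            sum += A[i][k] * B[k][j];
        C[i][j] = sum;
    }
    return NULL;
}

/* Multiply the global matrices with one thread per result row.
 * Returns 0 on success, or the pthread_create() error code. */
int multiply_by_rows(void)
{
    pthread_t tid[M];
    int row[M];
    int started = 0, err = 0;

    for (int i = 0; i < M; i++) {
        row[i] = i;                   /* each thread gets its own argument */
        err = pthread_create(&tid[i], NULL, row_worker, &row[i]);
        if (err != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);   /* wait until every row is finished */
    return err;
}
```

No locking is needed here because each thread writes only to its own row of C. Whether the M threads actually run on M different processors is decided by the threads library and the operating system, not by this code.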

Benefiting From Multithreading

Improve Application Responsiveness

Any program in which many activities are not dependent upon each other can be redesigned so that each activity is fired off as a thread. For example, a GUI in which you are performing one activity while starting up another will show improved performance when implemented with threads.
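As a rough sketch of firing off an activity as a thread, the fragment below starts a long-running computation in its own thread so that the caller can keep doing other work until it needs the result. The POSIX calls are used here as an assumption in place of the threads library interface, and struct job, slow_activity(), start_activity(), and finish_activity() are hypothetical names chosen for illustration.

```c
#include <pthread.h>

/* Hypothetical long-running activity: sum the integers 1..n. */
struct job {
    long n;        /* input */
    long result;   /* filled in by the activity thread */
};

static void *slow_activity(void *arg)
{
    struct job *j = arg;
    long sum = 0;

    for (long i = 1; i <= j->n; i++)
        sum += i;
    j->result = sum;
    return NULL;
}

/* Start the activity in its own thread and return at once, so the
 * caller stays free to do other work. Returns 0 on success. */
int start_activity(pthread_t *tid, struct job *j)
{
    return pthread_create(tid, NULL, slow_activity, j);
}

/* Wait for the activity to end, then return its result. */
long finish_activity(pthread_t tid, struct job *j)
{
    pthread_join(tid, NULL);
    return j->result;
}
```

A caller would invoke start_activity(), continue with its own work (for example, servicing user input), and call finish_activity() only when the result is required.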

Use Multiprocessors Efficiently

Typically, applications that express concurrency requirements with threads need not take into account the number of available processors. The performance of the application improves transparently with additional processors.
Numerical algorithms and applications with a high degree of parallelism, such as matrix multiplications, can run much faster when implemented with threads on a multiprocessor.

Improve Program Structure

Many programs are more efficiently structured as multiple independent or semi-independent units of execution instead of as a single, monolithic thread. Multithreaded programs can be more adaptive to variations in user demands than single-threaded programs.

Use Fewer System Resources

Programs that use two or more processes that access common data through shared memory are applying more than one thread of control. However, each process has a full address space and operating system state. The cost of creating and maintaining this large amount of state makes each process much more expensive than a thread in both time and space. In addition, the inherent separation between processes can require a major effort by the programmer to communicate between the threads in different processes or to synchronize their actions.

Combine Threads and RPC

By combining threads and a remote procedure call (RPC) package, you can exploit nonshared-memory multiprocessors (such as a collection of workstations). This combination distributes your application relatively easily and treats the collection of workstations as a multiprocessor.
For example, one thread might create child threads. Each of these children could then place a remote procedure call, invoking a procedure on another workstation. Although the original thread has merely created a number of threads that are now running in parallel, this parallelism involves other computers.

Improve Performance

The performance numbers in this section were obtained on a SPARCstation(TM) 2 (Sun 4/75). The measurements were made using the built-in microsecond resolution timer.

Thread Creation Time

Table 1-1 shows the time consumed to create a thread using a default stack that is cached by the threads package. The measured time includes only the actual creation time. It does not include the time for the initial context switch to the thread. The ratio column gives the ratio of the creation time in that row to the creation time of an unbound thread (the first row).
These data show that threads are inexpensive. The operation of creating a new process is over 30 times as expensive as creating an unbound thread, and about 5 times the cost of creating a bound thread consisting of both a thread and an LWP.
Table 1-1
Operation                Microseconds   Ratio
Create unbound thread              52       -
Create bound thread               350     6.7
fork()                           1700    32.7

Thread Synchronization Times

Table 1-2 shows the time it takes for two threads to synchronize with each other using two p and v semaphores.
Table 1-2
Operation                Microseconds   Ratio
Unbound thread                     66       -
Bound thread                      390     5.9
Between processes                 200       3
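The kind of exchange being timed can be sketched as two threads that hand control back and forth through two semaphores. This is an illustration only, not the benchmark used for the table: it assumes POSIX unnamed semaphores (sem_init(), sem_wait() for the p operation, sem_post() for the v operation), and ROUNDS, partner(), and run_ping_pong() are hypothetical names.

```c
#include <pthread.h>
#include <semaphore.h>

#define ROUNDS 1000   /* hypothetical number of round trips */

static sem_t ping, pong;
static int handoffs;  /* touched only by the thread that holds the turn */

static void *partner(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        sem_wait(&ping);   /* p: block until the first thread signals */
        handoffs++;
        sem_post(&pong);   /* v: hand the turn back */
    }
    return NULL;
}

/* Run ROUNDS round trips between the calling thread and a partner.
 * Returns the number of handoffs observed, or -1 on setup failure. */
int run_ping_pong(void)
{
    pthread_t tid;

    handoffs = 0;
    if (sem_init(&ping, 0, 0) != 0)
        return -1;
    if (sem_init(&pong, 0, 0) != 0) {
        sem_destroy(&ping);
        return -1;
    }
    if (pthread_create(&tid, NULL, partner, NULL) != 0) {
        sem_destroy(&ping);
        sem_destroy(&pong);
        return -1;
    }
    for (int i = 0; i < ROUNDS; i++) {
        sem_post(&ping);   /* v: wake the partner */
        sem_wait(&pong);   /* p: block until the partner answers */
        handoffs++;
    }
    pthread_join(tid, NULL);
    sem_destroy(&ping);
    sem_destroy(&pong);
    return handoffs;
}
```

Because the two semaphores strictly alternate the turn, the shared counter needs no extra lock. Timing the loop and dividing by the number of round trips would give a per-synchronization figure comparable in spirit to the table entries.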

Looking At Multithreading Structure

Traditional UNIX already supports the concept of threads--each process contains a single thread, so programming with multiple processes is programming with multiple threads. But a process is also an address space, and creating a process involves creating a new address space.
Because of this, creating a process is expensive, while creating a thread within an existing process is cheap. In the measurements in Table 1-1, creating an unbound thread takes roughly one-thirtieth of the time of a fork(). Switching between threads is also cheaper than switching between processes, because it does not involve switching between address spaces.
Communicating between the threads of one process is simple because the threads share everything--address space, in particular. So, data produced by one thread is immediately available to all the other threads.
The interface to multithreading support is through a subroutine library, libthread. Multithreading provides flexibility by decoupling kernel-level and user-level resources.

User-level Threads

Threads are visible only from within the process, where they share all process resources like address space, open files, and so on. The following state is unique to each thread.
  • Thread ID
  • Register state (including PC and stack pointer)
  • Stack
  • Signal mask
  • Priority
  • Thread-private storage
Because threads share the process instructions and most of its data, a change in shared data by one thread can be seen by the other threads in the process. When a thread needs to interact with other threads in the same process, it can do so without involving the operating system.
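The difference between shared data and thread-private storage can be illustrated with a short sketch. It assumes the POSIX thread-specific data calls (pthread_key_create(), pthread_setspecific(), pthread_getspecific()) and a POSIX mutex rather than the threads library interface described in this guide. NTHREADS, worker(), and sum_with_threads() are hypothetical names.

```c
#include <pthread.h>

#define NTHREADS 4   /* hypothetical number of threads */

static int shared_total;        /* one copy, visible to every thread */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_key_t private_key;   /* one slot per thread */

static void *worker(void *arg)
{
    int my_value = *(int *)arg;     /* lives on this thread's own stack */
    int *mine;

    /* Store a pointer in this thread's private slot. Other threads
     * using the same key see their own pointer, not this one. */
    pthread_setspecific(private_key, &my_value);
    mine = pthread_getspecific(private_key);

    /* The shared variable is common to all threads, so updates to it
     * are serialized with a mutex. */
    pthread_mutex_lock(&lock);
    shared_total += *mine;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Add up NTHREADS values, one per thread.
 * Returns the total, or -1 if setup or thread creation fails. */
int sum_with_threads(int values[NTHREADS])
{
    pthread_t tid[NTHREADS];
    int started = 0;

    shared_total = 0;
    if (pthread_key_create(&private_key, NULL) != 0)
        return -1;
    for (int i = 0; i < NTHREADS; i++) {
        if (pthread_create(&tid[i], NULL, worker, &values[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_key_delete(private_key);
    return started == NTHREADS ? shared_total : -1;
}
```

Every thread reads its own value back through the same key, while all of them add into the single shared_total. None of these operations requires the threads to communicate through the operating system in the way that separate processes would.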
Threads are the primary programming interface in multithreaded programming. User-level threads are handled in user space and so can avoid kernel context switching penalties. An application can have thousands of threads and still not consume many kernel resources. How many kernel resources the application uses is largely determined by the application.
By default, threads are very lightweight. To get more control over a thread (for instance, to control its scheduling policy more closely), the application can bind the thread. When an application binds threads to execution resources, the threads become kernel resources (see "Bound Threads" for more information).


Note - User-level threads are so named to distinguish them from kernel-level threads, which concern only systems programmers. Because this book is for application programmers, kernel-level threads are not discussed here.
To summarize, Solaris user-level threads are:
  • Inexpensive to create because they are bits of virtual memory that are allocated from your address space at run time
  • Fast to synchronize because synchronization is done at the application level, not at the kernel level
  • Easily managed by the threads library, libthread

Lightweight Processes

The threads library uses underlying threads of control called lightweight processes that are supported by the kernel. You can think of an LWP as a virtual CPU that executes code or system calls.
Most programmers use threads without thinking about LWPs. All the information here about LWPs is provided so you can understand the differences between bound and unbound threads, described in "Unbound Threads" and "Bound Threads."

Note - The LWPs in Solaris 2.x are not the same as the LWPs in the SunOS(TM) 4.0 LWP library, which are not supported in Solaris 2.x.

Much as the stdio library routines such as fopen(3S) and fread(3S) use the open(2) and read(2) functions, the thread interface uses the LWP interface, and for many of the same reasons.
Lightweight processes (LWPs) bridge the user level and the kernel level. Each process contains one or more LWPs, each of which runs one or more user threads. The creation of a thread usually involves just the creation of some user context, but not the creation of an LWP.
The user-level threads library, with help from the programmer and the operating system, ensures that the number of LWPs available is adequate for the currently active user-level threads. However, there is no one-to-one mapping between user threads and LWPs, and user-level threads can freely migrate from one LWP to another.
The programmer can tell the threads library how many threads should be "running" at the same time. For example, if the programmer says that up to three threads should run at the same time, then at least three LWPs should be available. If there are three available processors, the threads run in parallel. If there is only one processor, then the operating system multiplexes the three LWPs on that one processor. If all the LWPs block, the threads library adds another LWP to the pool.
When a user thread blocks due to synchronization, its LWP transfers to another runnable thread. This transfer is done with a coroutine linkage and not with a system call.
The operating system decides which LWP should run on which processor and when. It has no knowledge about what user threads are or how many are active in each process. The kernel schedules LWPs onto CPU resources according to their scheduling classes and priorities. The threads library schedules threads on the process pool of LWPs in much the same way. Each LWP is independently dispatched by the kernel, performs independent system calls, incurs independent page faults, and runs in parallel on a multiprocessor system.
An LWP has some capabilities that are not exported directly to threads, such as a special scheduling class.

Unbound Threads

Threads that are scheduled on the LWP pool are called unbound threads. You will usually want your threads to be unbound, allowing them to float among the LWPs.
The library invokes LWPs as needed and assigns them to execute runnable threads. The LWP assumes the state of the thread and executes its instructions. If the thread becomes blocked on a synchronization mechanism, or if another thread should be run, the thread state is saved in process memory and the threads library assigns another thread to the LWP to run.

Bound Threads

If needed, you can permanently bind a thread to an LWP.
For example, you can bind a thread to:
  • Have the thread scheduled globally (for example, with realtime scheduling)
  • Give the thread an alternate signal stack
  • Give the thread a unique alarm or timer
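As a hedged illustration of requesting a globally scheduled thread, the sketch below uses the POSIX contention-scope attribute. Treating PTHREAD_SCOPE_SYSTEM as the counterpart of a bound thread is an assumption made for this example, not a statement about how the threads library implements binding. The names task() and run_system_scope_thread() are hypothetical.

```c
#include <pthread.h>

/* Hypothetical thread body: record that the thread ran. */
static void *task(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Create one thread that competes for the processor system-wide,
 * run it to completion, and report the outcome.
 * Returns 0 on success, or a pthread error code. */
int run_system_scope_thread(int *ran)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err;

    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    /* Ask for system contention scope: the thread is scheduled
     * against all threads on the system, not only those in this
     * process. An implementation may refuse the request. */
    err = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (err == 0)
        err = pthread_create(&tid, &attr, task, ran);
    pthread_attr_destroy(&attr);

    if (err == 0)
        err = pthread_join(tid, NULL);
    return err;
}
```

The attribute is set before the thread is created, which matches the description of binding as a permanent property of the thread rather than something toggled while it runs.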
Sometimes having more threads than LWPs, as can happen with unbound threads, is a disadvantage.
For example, a parallel array computation divides the rows of its arrays among different threads. If there is one LWP for each processor, but multiple threads for each LWP, each processor spends time switching between threads. In this case, it is better to have one thread for each LWP, divide the rows among a smaller number of threads, and reduce the number of thread switches.
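Dividing the rows among a smaller number of threads might look like the following sketch. It assumes POSIX threads, and the array dimensions, the worker count NWORKERS (meant to stand for one thread per LWP), and the names sum_slice() and sum_rows_in_slices() are hypothetical.

```c
#include <pthread.h>

#define ROWS 8
#define COLS 4
#define NWORKERS 2   /* hypothetical: one thread per LWP */

static double grid[ROWS][COLS];
static double row_sum[ROWS];

struct slice {
    int first;   /* first row of the slice */
    int last;    /* one past the last row of the slice */
};

/* Each thread processes a contiguous block of rows, not one row. */
static void *sum_slice(void *arg)
{
    const struct slice *s = arg;

    for (int r = s->first; r < s->last; r++) {
        double total = 0.0;
        for (int c = 0; c < COLS; c++)
            total += grid[r][c];
        row_sum[r] = total;
    }
    return NULL;
}

/* Split ROWS rows as evenly as possible among NWORKERS threads.
 * Returns 0 on success, or the pthread_create() error code. */
int sum_rows_in_slices(void)
{
    pthread_t tid[NWORKERS];
    struct slice part[NWORKERS];
    int started = 0, err = 0;

    for (int w = 0; w < NWORKERS; w++) {
        part[w].first = w * ROWS / NWORKERS;
        part[w].last = (w + 1) * ROWS / NWORKERS;
        err = pthread_create(&tid[w], NULL, sum_slice, &part[w]);
        if (err != 0)
            break;
        started++;
    }
    for (int w = 0; w < started; w++)
        pthread_join(tid[w], NULL);
    return err;
}
```

With eight rows and two workers, each thread handles four rows, so only two threads ever need to be scheduled instead of eight.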
A mixture of threads that are permanently bound to LWPs and unbound threads is also appropriate for some applications.
An example of this is a realtime application that wants some threads to have system-wide priority and realtime scheduling, while other threads attend to background computations. Another example is a window system with unbound threads for most operations and a mouse serviced by a high-priority, bound, realtime thread.
When a user-level thread issues a system call, the LWP running the thread calls into the kernel and remains attached to the thread at least until the system call completes.

Meeting Multithreading Standards

The history of multithreaded programming goes back to at least the 1960s. Its development on UNIX systems goes back to the mid-1980s. Perhaps surprisingly, there is fair agreement about the features necessary to support multithreaded programming. Even so, several different thread packages are available today, each with a different interface.
However, for several years a group known as POSIX 1003.4a has been working on a standard for multithreaded programming. When the standard is finalized, most vendors of systems supporting multithreaded programming will support the POSIX interface. This will have the important benefit of allowing multithreaded programs to be portable.
There are no fundamental differences between Solaris threads and POSIX 1003.4a. Certainly the interfaces differ, but there is nothing that is expressible with one interface that cannot be expressed relatively easily with the other. There are no incompatibilities between the two, so, at least on Solaris systems, there will be one underlying implementation with two interfaces. Even within a single application, you will be able to use both interfaces.
Another reason for using Solaris threads is the collection of support tools supplied with it, such as the multithreaded debugger. truss, which traces a program's system calls and signals, has been extended to report on the activities of a program's threads as well.