-
Memory Model
There is no guarantee that memory accesses by multiple threads to the same variable
without synchronization are atomic with respect to each other.
Several implementation-dependent and application-dependent factors affect whether
accesses are atomic. Some variables might be larger than the largest
atomic memory operation on the target platform. Some variables might be misaligned
or of unknown alignment, so the compiler or the runtime system may need to
use multiple loads and stores to access the variable. In other cases, the compiler
may choose a faster code sequence that uses more loads and stores.
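As a minimal sketch of the point above, the following C function updates a two-field record that is typically wider than a single atomic memory operation, so the update is enclosed in a critical construct instead of being left unsynchronized. The names range_t and find_range are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-field record; on many platforms it is wider than
   the largest single atomic memory operation. */
typedef struct {
    long min;
    long max;
} range_t;

/* Returns the smallest and largest values in data[0..n-1]. */
range_t find_range(const long *data, int n)
{
    range_t r;
    int i;

    assert(data != NULL && n > 0);
    r.min = data[0];
    r.max = data[0];

    #pragma omp parallel for
    for (i = 1; i < n; i++) {
        /* Both fields are read and written together, so the whole
           update is one critical section shared by all threads. */
        #pragma omp critical
        {
            if (data[i] < r.min) r.min = data[i];
            if (data[i] > r.max) r.max = data[i];
        }
    }
    return r;
}
```

A caller passes an array and its length, for example find_range(values, count), and reads the min and max fields of the result.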
-
Internal Control Variables
The OpenMP runtime library maintains the following internal control
variables:
nthreads-var - stores the
number of threads requested for future parallel regions.
dyn-var - controls whether dynamic adjustment of the number
of threads to be used for future parallel regions is enabled.
nest-var - controls whether nested parallelism is enabled for
future parallel regions.
run-sched-var -
stores scheduling information to be used for loop regions using the RUNTIME schedule clause.
def-sched-var -
stores implementation defined default scheduling information for loop regions.
The runtime library maintains separate copies of each of nthreads-var, dyn-var, and nest-var for
each thread. On the other hand, the runtime library maintains one copy of
each of run-sched-var and def-sched-var that
applies to all threads.
-
Number of Threads
The default value of nthreads-var is 1. That
is, without an explicit num_threads() clause, a call
to the omp_set_num_threads() routine, or an explicit
definition of the OMP_NUM_THREADS environment variable,
the default number of threads in a team is 1.
A call to omp_set_num_threads() modifies the value of nthreads-var for
the calling thread only and applies to parallel regions at the same or inner
nesting level encountered by the calling thread.
If the requested number of threads is greater than the number of threads the
implementation can support, or if the value is not a positive integer, a warning
message is issued, provided that SUNW_MP_WARN is set to TRUE or a callback function
has been registered by a call to sunw_mp_register_warn().
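The following sketch shows one way to observe the effect of omp_set_num_threads() on the next parallel region encountered by the calling thread. The function name observed_team_size is hypothetical, and the serial stand-ins under #else are an assumption added so that the code also builds when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption) for builds without OpenMP. */
static void omp_set_num_threads(int n) { (void)n; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Requests a team size for the calling thread and reports the size of
   the team that the next parallel region actually received. */
int observed_team_size(int requested)
{
    int team = 0;

    assert(requested > 0);  /* a non-positive request is invalid */
    omp_set_num_threads(requested);

    #pragma omp parallel
    {
        /* One thread records the team size for everyone. */
        #pragma omp single
        team = omp_get_num_threads();
    }
    return team;
}
```

The value returned can be smaller than the value requested, for example when dynamic adjustment is enabled.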
-
Nested Parallelism
Nested parallelism is supported. Nested parallel regions can be executed
by multiple threads.
The default value of nest-var is
false. That is, nested parallelism is disabled by default. Set the OMP_NESTED environment variable, or call the omp_set_nested() routine
to enable it.
A call to omp_set_nested() modifies
the value of nest-var for the calling thread only
and applies to parallel regions at the same or inner nesting level encountered
by the calling thread.
By default, the maximum number of active
nesting levels supported is 4. You can change that maximum by setting the
environment variable SUNW_MP_MAX_NESTED_LEVELS.
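As an illustration of enabling nested parallelism from code, the sketch below calls omp_set_nested() and then counts how many threads execute the inner region. The function name count_inner_executions is hypothetical, and the stand-in under #else is an assumption for builds without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-in (assumption) for builds without OpenMP. */
static void omp_set_nested(int flag) { (void)flag; }
#endif

/* Runs an outer parallel region that contains an inner one and
   returns how many times the inner block was executed. */
int count_inner_executions(int outer, int inner)
{
    int executions = 0;

    assert(outer > 0 && inner > 0);
    omp_set_nested(1);  /* sets nest-var for the calling thread */

    #pragma omp parallel num_threads(outer)
    {
        #pragma omp parallel num_threads(inner)
        {
            /* Each thread of each inner team counts itself once. */
            #pragma omp atomic
            executions++;
        }
    }
    return executions;
}
```

With nested parallelism disabled, each inner region would run on one thread, so the count would not exceed the size of the outer team.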
-
Dynamic Adjustment of Threads
The
default value of dyn-var is true. That is, dynamic
adjustment is enabled by default. Set the OMP_DYNAMIC environment
variable, or call the omp_set_dynamic() routine to
disable dynamic adjustment.
A call to omp_set_dynamic() modifies
the value of dyn-var for the calling thread only
and applies to parallel regions at the same or inner nesting level encountered
by the calling thread.
If dynamic adjustment is enabled, then
the number of threads in the team is adjusted to be the minimum of:
-
the number of threads the user requested
-
1 + the number of available threads in the pool
-
the number of available virtual processors
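The minimum rule listed above can be written out as a small C function. This is only a sketch of the arithmetic: the function and parameter names are hypothetical, and the runtime library tracks these quantities internally.

```c
#include <assert.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Team size with dynamic adjustment enabled: the minimum of the
   requested count, 1 + the threads available in the pool, and the
   number of available virtual processors. */
int dynamic_team_size(int requested, int pool_available, int virtual_procs)
{
    assert(requested > 0 && pool_available >= 0 && virtual_procs > 0);
    return min_int(requested, min_int(1 + pool_available, virtual_procs));
}
```

For example, a request for 8 threads with 15 threads available in the pool and 4 available virtual processors yields a team of 4.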
On the other hand, if dynamic adjustment is disabled, then the number
of threads in the team will be the minimum of:
In exceptional situations, such as when system resources are scarce,
the number of threads supplied will be fewer than described above. In these
situations, if SUNW_MP_WARN is set to TRUE or
a callback function is registered by a call to sunw_mp_register_warn(), a warning message will be issued.
Refer to Chapter
2 for more information about the pool of threads and the nested parallelism
execution model.
-
Loop Scheduling
The default value of def-sched-var is STATIC scheduling.
To specify a different schedule for a loop region, use the SCHEDULE clause.
The default value of run-sched-var is also STATIC scheduling. You can change that default by setting the OMP_SCHEDULE environment variable.
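A loop picks up run-sched-var only when it uses the RUNTIME schedule. The sketch below shows such a loop in C; the function name sum_below is hypothetical.

```c
#include <assert.h>

/* Sums 0 + 1 + ... + (n - 1). The schedule is taken from
   run-sched-var at run time, which OMP_SCHEDULE can set. */
long sum_below(int n)
{
    long sum = 0;
    int i;

    assert(n >= 0);

    #pragma omp parallel for schedule(runtime) reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}
```

The result does not depend on the schedule, so the same binary can be run with different OMP_SCHEDULE settings to compare their performance.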
-
GUIDED: Determination of Chunk Sizes
The default chunk size for SCHEDULE(GUIDED) when chunksize is not specified is 1. The OpenMP
runtime library uses the following formula for computing the chunk sizes for
a loop with GUIDED scheduling:
chunksize = unassigned_iterations / (weight * num_threads)
where:
-
unassigned_iterations is the number of iterations
in the loop that have not yet been assigned to any thread.
-
weight is a floating-point constant that can be specified by the user at runtime
with the SUNW_MP_GUIDED_WEIGHT environment variable (2.3 OpenMP Environment Variables).
If it is not specified, the current default weight is 2.0.
-
num_threads is the number of threads used to execute the loop.
The choice of the weight value affects the sizes of the initial and subsequent chunks
of iterations assigned to threads in loops, and has a direct effect on load
balancing. Experimental results show that the default weight of 2.0 generally works
well. However, some applications could benefit from a different weight
value.
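To make the formula concrete, the sketch below applies it repeatedly to a shrinking pool of unassigned iterations. The function names are hypothetical. The source does not state how the quotient is rounded, so rounding up and the lower bound of 1 (taken from the default chunk size) are assumptions of this sketch, not a description of the runtime library.

```c
#include <assert.h>

/* Size of the next chunk for the given number of unassigned iterations.
   Assumptions: the quotient is rounded up, the chunk is never smaller
   than 1, and it never exceeds what remains. */
long guided_chunk(long unassigned, double weight, int num_threads)
{
    double exact;
    long chunk;

    assert(unassigned >= 0 && weight > 0.0 && num_threads > 0);
    if (unassigned == 0) return 0;

    exact = (double)unassigned / (weight * (double)num_threads);
    chunk = (long)exact;                 /* truncate toward zero */
    if ((double)chunk < exact) chunk++;  /* round up (assumption) */
    if (chunk < 1) chunk = 1;
    if (chunk > unassigned) chunk = unassigned;
    return chunk;
}

/* Number of chunks needed to hand out every iteration of a loop. */
int guided_chunk_count(long total, double weight, int num_threads)
{
    int count = 0;
    long remaining = total;

    while (remaining > 0) {
        remaining -= guided_chunk(remaining, weight, num_threads);
        count++;
    }
    return count;
}
```

Under these assumptions, a loop with 1000 unassigned iterations, a weight of 2.0, and 4 threads starts with a chunk of 1000 / (2.0 * 4) = 125 iterations. A larger weight gives smaller initial chunks and therefore more chunks overall.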
-
Explicitly Threaded Programs
Programs that are explicitly threaded using POSIX or Solaris threads
can contain OpenMP directives or call routines that contain OpenMP directives.
-
Runtime Warnings
Setting the SUNW_MP_WARN environment variable (2.3 OpenMP Environment Variables) enables runtime validity checking by the OpenMP runtime
library.
For example, the following program hangs because threads wait at different
barriers: odd-numbered threads wait at the explicit barrier, while the remaining
threads wait at the implicit barrier that ends the parallel region. The program
must be terminated with Control-C from the terminal:
% cat bad1.c
#include <omp.h>
#include <stdio.h>
int
main(void)
{
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int i = omp_get_thread_num();
        if (i % 2) {
            printf("At barrier 1.\n");
            #pragma omp barrier
        }
    }
    return 0;
}
% cc -xopenmp -xO3 bad1.c
% ./a.out
At barrier 1.
At barrier 1.
(The program hangs. Press Control-C to terminate execution.)
But if we set SUNW_MP_WARN before execution,
the runtime library will detect the problem:
% setenv SUNW_MP_WARN TRUE
% ./a.out
WARNING (libmtsk): Environment variable SUNW_MP_WARN is set to
TRUE. Runtime error checking will be enabled.
At barrier 1.
At barrier 1.
WARNING (libmtsk): Threads at barrier from different directives.
Thread at barrier from bad1.c:8.
Thread at barrier from bad1.c:13.
Possible Reasons:
Worksharing constructs not encountered by all threads in the
team in the same order.
Incorrect placement of barrier directives.
WARNING (libmtsk): Runtime shutting down while some parallel region
is still active.
The C and C++ compilers also provide a function for registering a callback
that the runtime library calls when it detects an error. The registered
callback function is passed a pointer to an error message string as its argument.
int sunw_mp_register_warn(void (*func)(void *));
Access to the prototype for this function requires adding #include <sunw_mp_misc.h>
to the source file.
For example:
% cat bad2.c
#include <omp.h>
#include <sunw_mp_misc.h>
#include <stdio.h>
void handle_warn(void *msg)
{
    printf("handle_warn: %s\n", (char *)msg);
}
void set(int i)
{
    static int k;
    #pragma omp critical
    {
        k++;
    }
    #pragma omp barrier
}
int main(void)
{
    int i, rc;
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    if (sunw_mp_register_warn(handle_warn) != 0) {
        printf ("Installing callback failed\n");
    }
    #pragma omp parallel for
    for (i = 0; i < 20; i++) {
        set(i);
    }
    return 0;
}
% cc -xopenmp -xO3 bad2.c
% a.out
WARNING (libmtsk): Environment variable SUNW_MP_WARN is set to
TRUE. Runtime error checking will be enabled.
handle_warn: WARNING (libmtsk): at bad2.c:15. BARRIER is not
permitted in the dynamic extent of FOR / DO.
In this example, handle_warn() is installed as the callback function that the
OpenMP runtime library calls when it detects an error. The callback function
merely prints the error message passed to it from the library,
but a callback could also be used to trap certain errors.
-
Regarding Specific Constructs:
sections construct
The structured
blocks in a sections construct are divided among the
members of the team executing the sections region, so that the threads execute
an approximately equal number of sections.
single construct
The structured block of a single construct
will be executed by the thread that encounters the single region first.
atomic construct
This implementation
replaces all ATOMIC directives and pragmas by enclosing
the target statement in a CRITICAL construct.
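The two functions below sketch the source-level equivalence described above: the same update written once with an atomic construct and once enclosed in a critical construct. The function names are hypothetical, and the code does not show how the compiler performs the replacement internally.

```c
#include <assert.h>

/* Counts even values, protecting the update with an atomic construct. */
int count_even_atomic(const int *data, int n)
{
    int count = 0;
    int i;

    assert(n >= 0);

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}

/* The same loop with the update enclosed in a critical construct. */
int count_even_critical(const int *data, int n)
{
    int count = 0;
    int i;

    assert(n >= 0);

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            #pragma omp critical
            {
                count++;
            }
        }
    }
    return count;
}
```

Both functions return the same result for the same input. Because critical constructs serialize the threads that reach them, loops dominated by such updates may scale poorly, and a reduction clause is often a better fit.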
-
Binding Thread Set for OpenMP Library Routines:
omp_set_num_threads routine
When called from within an explicit parallel region, the binding
thread set for the omp_set_num_threads region is the
calling thread.
omp_get_max_threads routine
When called from within an explicit parallel region, the binding thread
set for the omp_get_max_threads region is the calling
thread.
omp_set_dynamic routine
When called from within any explicit parallel region, the binding thread set for
the omp_set_dynamic region is the calling thread only.
omp_get_dynamic routine
When called from within an explicit parallel region, the binding thread set for
the omp_get_dynamic region is the calling thread only.
omp_set_nested routine
When called from within an explicit parallel region, the binding thread set for
the omp_set_nested region is the calling thread only.
omp_get_nested routine
When called from within an explicit parallel region, the binding thread set for
the omp_get_nested region is the calling thread only.
-
Fortran 95-Specific Issues:
threadprivate directive
If the conditions for values of data in the threadprivate objects of
threads (other than the initial thread) to persist between two consecutive
active parallel regions do not all hold, then the allocation status of an
allocatable array in the second region may be "not currently allocated".
shared clause
Passing a shared variable to a non-intrinsic procedure may result in
the value of the shared variable being copied into temporary storage before
the procedure reference, and back out of the temporary storage into the actual
argument storage after the procedure reference. This copying into and out
of temporary storage can occur only if conditions a, b, and c in Section 2.8.3.2
of the OpenMP 2.5 Specification hold.
Include and module files
Both the include file omp_lib.h and the module file omp_lib are provided
in this implementation.
On Solaris, the OpenMP runtime library
routines that take an argument are extended with a generic interface so arguments
of different Fortran KIND types can be accommodated.