Chapter 2 Compiling and Running OpenMP Programs
This chapter describes compiler and runtime options affecting programs
that utilize the OpenMP API.
To run an OpenMP program in a multithreaded environment, you must set the number
of threads in the program to be greater than one. The number of threads
is controlled by the OMP_NUM_THREADS environment
variable, which must be set greater than one prior to running the program.
The number of threads can also be set by calling omp_set_num_threads() in the program with a value greater than one, or by using the num_threads clause with a PARALLEL directive.
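The two programmatic methods mentioned above can be sketched in C as follows. This is a minimal illustration rather than code from the Sun Studio documentation: the function names team_size_default and team_size_clause are hypothetical, and the _OPENMP guards are an added convenience so that the file also compiles without OpenMP, in which case both functions report a single thread.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Request a default team size with omp_set_num_threads() and return the
 * size of the team that actually executed the next parallel region. */
int team_size_default(int requested)
{
    int team_size = 1;          /* value reported when built without OpenMP */
#ifdef _OPENMP
    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        #pragma omp single
        team_size = omp_get_num_threads();
    }
#else
    (void)requested;            /* unused without OpenMP */
#endif
    return team_size;
}

/* Request a team size for one region only with the num_threads clause. */
int team_size_clause(int requested)
{
    int team_size = 1;
#ifdef _OPENMP
    #pragma omp parallel num_threads(requested)
    {
        #pragma omp single
        team_size = omp_get_num_threads();
    }
#else
    (void)requested;
#endif
    return team_size;
}
```

The runtime may supply fewer threads than requested, for example when dynamic adjustment is enabled, so the returned value should be read as at most the requested number.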
The latest information regarding Sun Studio compilers and OpenMP can
be found on the Sun Developer Network portal, http://developers.sun.com/sunstudio
2.1 Compiler Options To Use
To
enable explicit parallelization with OpenMP directives, compile your program
with the cc, CC, or f95 option
flag -xopenmp. (The f95 compiler
accepts both -xopenmp and -openmp as
synonyms.)
The -xopenmp flag accepts the following optional
keyword sub-options.
|
parallel
|
Enables recognition of OpenMP directives.
The minimum optimization level for -xopenmp=parallel is -xO3.
The compiler changes the optimization from a lower level to -xO3 if necessary, and issues a warning.
|
|
noopt
|
Enables recognition of OpenMP directives.
The compiler does not raise the optimization level if it is lower than -xO3.
If you explicitly set the optimization level lower than -xO3, as in -xO2 -xopenmp=noopt, the compiler issues an error.
If you do not specify an optimization level with -xopenmp=noopt, the OpenMP directives are recognized and the program is parallelized accordingly, but no optimization is done.
|
|
stubs
|
This option is no longer supported.
An OpenMP stubs library is provided for users’ convenience.
To compile an OpenMP program that calls OpenMP library routines but
ignores the OpenMP directives, compile the program without an -xopenmp option, and link the object files with the libompstubs.a library.
For example, % cc omp_ignore.c -lompstubs
Linking with both libompstubs.a and the OpenMP
runtime library libmtsk.so is unsupported and may result
in unexpected behavior.
|
|
none
|
Disables recognition of OpenMP directives and does not change the optimization
level.
|
Additional Notes:
-
If you do not specify -xopenmp on
the command line, the compiler assumes -xopenmp=none (disabling
recognition of OpenMP directives).
-
If you specify -xopenmp without
a keyword sub-option, the compiler assumes -xopenmp=parallel.
-
Specifying -xopenmp=parallel or noopt will define the _OPENMP preprocessor
token to be YYYYMM (specifically 200805L for C/C++
and 200805 for Fortran 95).
-
When debugging OpenMP programs with dbx,
compile with -xopenmp=noopt -g.
-
The default optimization level for -xopenmp might
change in future releases. Compilation warning messages can be avoided by
specifying an appropriate optimization level explicitly.
-
With Fortran 95, -xopenmp, -xopenmp=parallel, and -xopenmp=noopt add -stackvar automatically.
-
When compiling and linking an OpenMP program in separate steps,
include -xopenmp on each of the compile and the link
steps.
-
Use the -xvpara C/C++/Fortran option
to display compiler parallelization messages.
-
For best performance and functionality on Solaris platforms,
make sure that the latest OpenMP runtime library, libmtsk.so,
is installed on the running system.
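As an illustration of the _OPENMP preprocessor token described in the notes above, the following C sketch reports whether a source file was compiled with OpenMP directives recognized. The function names openmp_version_macro and report_openmp_support are hypothetical and used only for this example.

```c
#include <stdio.h>

/* Returns the value of the _OPENMP macro, or 0 when the file was compiled
 * without OpenMP support (for example, without -xopenmp). */
long openmp_version_macro(void)
{
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}

/* Prints a one-line summary of how this file was compiled. */
void report_openmp_support(void)
{
    long version = openmp_version_macro();

    if (version != 0L)
        printf("OpenMP enabled, _OPENMP = %ld\n", version);
    else
        printf("OpenMP directives were not recognized at compile time\n");
}
```

Compiled with cc -xopenmp, the first function would be expected to return the YYYYMM value given in the notes above. Compiled without -xopenmp, it returns 0.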
2.2 OpenMP Environment Variables
The OpenMP specification
defines a number of environment variables that control the execution of OpenMP
programs. These are summarized in the following table. For details, refer
to the OpenMP API Version 3.0 specifications. See also 3.8 Environment Variables.
Table 2–1 OpenMP Environment Variables
|
Environment Variable
|
Function
|
|
OMP_SCHEDULE
|
Sets the schedule type for DO, PARALLEL DO, for, and parallel for directives/pragmas that specify the schedule
type RUNTIME.
If not set, a default value of STATIC is used. The value is “type[,chunk]”.
Example: setenv OMP_SCHEDULE 'GUIDED,4'
|
|
OMP_NUM_THREADS
|
Sets the number of threads
to use during execution of a parallel region.
You can override this value with a NUM_THREADS clause
or a call to OMP_SET_NUM_THREADS().
If not set, a default of 1 is used. The value is
a positive integer.
Example: setenv OMP_NUM_THREADS 16
|
|
OMP_DYNAMIC
|
Enables or
disables dynamic adjustment of the number of threads available for execution
of parallel regions.
If not set, a default value of TRUE is used. The value is either TRUE or FALSE.
Example: setenv OMP_DYNAMIC FALSE
|
|
OMP_NESTED
|
Enables or
disables nested parallelism.
The value is either TRUE or FALSE.
The default is FALSE.
Example: setenv OMP_NESTED FALSE
|
|
OMP_STACKSIZE
|
Sets the size of the stack for threads created by OpenMP.
Size may be specified as a positive integer in
Kilobytes, or with a suffix B, K, M, or G,
for Bytes, Kilobytes, Megabytes, or Gigabytes.
Example: setenv OMP_STACKSIZE 10M
|
|
OMP_WAIT_POLICY
|
Sets the desired policy for waiting threads. The value is either ACTIVE or PASSIVE.
ACTIVE threads consume processor time while waiting. PASSIVE threads do not and may yield the processor or go to sleep.
|
|
OMP_MAX_ACTIVE_LEVELS
|
Sets the maximum number of levels of nested active parallel regions.
Value is a non-negative integer.
|
|
OMP_THREAD_LIMIT
|
Sets the maximum number of threads to use in the whole OpenMP program. The value
is a positive integer.
|
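To show how these variables act on a running program, the sketch below uses a loop with schedule type RUNTIME, so that the schedule is taken from OMP_SCHEDULE and the team size from OMP_NUM_THREADS without recompiling. The function name sum_of_squares is hypothetical. The result does not depend on the settings chosen.

```c
/* Sums the squares 1*1 + 2*2 + ... + n*n. The schedule(runtime) clause
 * defers the choice of schedule type and chunk size to OMP_SCHEDULE. */
long sum_of_squares(long n)
{
    long total = 0;
    long i;

    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (i = 1; i <= n; i++) {
        total += i * i;
    }
    return total;
}
```

For instance, running the program after setenv OMP_SCHEDULE 'GUIDED,4' and setenv OMP_NUM_THREADS 16 should change only how iterations are distributed among threads, not the value returned.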
Sun Studio supports additional multiprocessing environment variables
that affect execution of OpenMP programs and are not part of the OpenMP specifications.
These are summarized in the following table.
Table 2–2 Multiprocessing Environment Variables
|
Environment Variable
|
Function
|
|
PARALLEL
|
For compatibility with legacy programs, setting the PARALLEL environment variable has the same effect as setting OMP_NUM_THREADS. However, if both PARALLEL and OMP_NUM_THREADS are set, they must be set to the same value.
|
|
SUNW_MP_WARN
|
Controls
warning messages issued by the OpenMP runtime library. If set to TRUE the
runtime library issues warning messages to stderr; FALSE disables warning messages. The default is FALSE.
The OpenMP runtime library has the ability to check for many common
OpenMP violations, such as incorrect nesting and deadlocks. However, runtime
checking does add overhead to the execution of the program.
The runtime library issues warning messages to stderr if SUNW_MP_WARN is set to TRUE. The runtime
library will also issue warning messages if the program registers a call-back
function to accept warning messages. A program can register a user call-back
function by calling the following function:
int sunw_mp_register_warn (void (*func)(void *));
The address of the call-back function is passed as argument to sunw_mp_register_warn(). This function returns 0 upon successfully registering the call-back
function, 1 upon failure.
If the program has registered a call-back function, libmtsk will
call the registered function passing a pointer to the localized string containing
the error message. The memory pointed to is no longer valid upon return from
the call-back function.
Example:
setenv SUNW_MP_WARN TRUE
|
|
SUNW_MP_THR_IDLE
|
Controls
the status of idle threads in an OpenMP program that are waiting at a barrier
or waiting for new parallel regions to work on. You can set the value to be
one of the following: SPIN, SLEEP, SLEEP(times), SLEEP(timems), SLEEP(timemc),
where time is an integer that specifies an amount
of time, and s, ms, and mc specify
the time unit (seconds, milliseconds, and microseconds, respectively). SLEEP, SLEEP(0), SLEEP(0s), SLEEP(0ms), and SLEEP(0mc) are
all equivalent.
SPIN specifies that an idle thread should spin
while waiting at a barrier or waiting for new parallel regions to work on. SLEEP without a time argument specifies that an idle thread should
sleep immediately. SLEEP with a time argument specifies
the amount of time a thread should spin-wait before going to sleep.
The default idle thread status is to sleep after possibly spin-waiting
for some amount of time.
Note that if both SUNW_MP_THR_IDLE and OMP_WAIT_POLICY are set, they must have consistent values.
Examples:
setenv SUNW_MP_THR_IDLE SPIN
setenv SUNW_MP_THR_IDLE SLEEP
setenv SUNW_MP_THR_IDLE SLEEP(2s)
setenv SUNW_MP_THR_IDLE SLEEP(20ms)
setenv SUNW_MP_THR_IDLE SLEEP(150mc)
|
|
SUNW_MP_PROCBIND
|
This environment variable works on both Solaris and Linux systems. The SUNW_MP_PROCBIND environment variable can be used to bind threads of an OpenMP
program to virtual processors on the running system. Performance can be enhanced
with processor binding, but performance degradation will occur if multiple
threads are bound to the same virtual processor. See 2.3 Processor Binding for details.
|
|
SUNW_MP_MAX_POOL_THREADS
|
Specifies the maximum size of the thread pool. The thread pool contains
only non-user threads that the OpenMP runtime library creates. It does not
contain the master thread or any threads created explicitly by the user’s
program. If this environment variable is set to zero, the thread pool will
be empty and all parallel regions will be executed by one thread. The default,
if not specified, is 1023. See 4.2 Control of Nested Parallelism for details.
Note that SUNW_MP_MAX_POOL_THREADS specifies
the maximum number of non-user OpenMP threads to use
for the whole program, while OMP_THREAD_LIMIT specifies
the maximum number of user and non-user OpenMP threads
for the whole program. If both SUNW_MP_MAX_POOL_THREADS and OMP_THREAD_LIMIT are set they must have consistent values such
that OMP_THREAD_LIMIT is set to one more than the value
of SUNW_MP_MAX_POOL_THREADS.
|
|
SUNW_MP_MAX_NESTED_LEVELS
|
Specifies the maximum depth of active nested parallel regions. Any parallel
region that has an active nested depth greater than the value of this environment
variable will be executed by only one thread. A parallel region is considered
not active if it is an OpenMP parallel region that has a false IF clause.
The default, if not specified, is 4. See 4.2 Control of Nested Parallelism for details.
Note that if both SUNW_MP_MAX_NESTED_LEVELS and OMP_MAX_ACTIVE_LEVELS are set, they must be set to the same value.
|
|
STACKSIZE
|
Sets the stack size for each thread. The value is
in kilobytes. The default stack size for a helper thread is 4 Megabytes for
32-bit applications, and 8 Megabytes for 64-bit applications.
Example:
setenv STACKSIZE 8192 sets the thread
stack size to 8 MB.
The STACKSIZE environment variable also accepts
numerical values with a suffix of either B, K, M, or G for bytes, kilobytes, megabytes,
or gigabytes respectively. The default unit is kilobytes.
Note that if both STACKSIZE and OMP_STACKSIZE are set, they must be set to the same value.
|
|
SUNW_MP_GUIDED_WEIGHT
|
Sets the weighting factor used to determine the size of chunks assigned to threads in loops with GUIDED scheduling.
The value should be a positive floating-point number, and will apply to all
loops with GUIDED scheduling in the program. If not
set, the default value assumed is 2.0.
|
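The call-back mechanism described under SUNW_MP_WARN can be sketched in C as follows. Only sunw_mp_register_warn() and its prototype come from the description above. The names record_warning, install_warning_handler, and latest_warning are hypothetical, and guarding the registration with the __SUNPRO_C and _OPENMP macros is an assumption about how the file is built, so that it still compiles where libmtsk is not available. The handler copies the message because the string is no longer valid once the call-back returns.

```c
#include <stdio.h>
#include <string.h>

/* Prototype as given in the SUNW_MP_WARN description. The function lives
 * in libmtsk, so it is declared and called only when building with the
 * Sun C compiler and OpenMP enabled (an assumption of this sketch). */
#if defined(__SUNPRO_C) && defined(_OPENMP)
int sunw_mp_register_warn(void (*func)(void *));
#endif

#define WARN_BUF_SIZE 512
static char last_warning[WARN_BUF_SIZE];   /* private copy of the last message */

/* Call-back: copy the message before returning, because the runtime's
 * string is not valid afterwards. Not synchronized between threads. */
void record_warning(void *msg)
{
    strncpy(last_warning, (const char *)msg, WARN_BUF_SIZE - 1);
    last_warning[WARN_BUF_SIZE - 1] = '\0';
    fprintf(stderr, "OpenMP runtime warning: %s\n", last_warning);
}

/* Returns 0 on success and 1 on failure, mirroring sunw_mp_register_warn().
 * Without libmtsk there is nothing to register with, so report failure. */
int install_warning_handler(void)
{
#if defined(__SUNPRO_C) && defined(_OPENMP)
    return sunw_mp_register_warn(record_warning);
#else
    return 1;
#endif
}

/* Accessor for the most recently recorded message. */
const char *latest_warning(void)
{
    return last_warning;
}
```

A program would typically call install_warning_handler() once, before its first parallel region, and check the return value.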
2.3 Processor Binding
With processor binding, the programmer instructs the operating system that a thread in the program should
run on the same processor throughout the execution of the program.
Processor binding, when used along with static scheduling, benefits
applications that exhibit a certain data reuse pattern where data accessed
by a thread in a parallel or worksharing region will be in the local cache
from a previous invocation of a parallel or worksharing region.
From the hardware point of view, a computer system is composed of one
or more physical processors. From the operating system point of view, each of these physical
processors maps to one or more virtual processors onto which threads in a
program can be run. If n virtual processors are
available, then n threads can be scheduled to run
at the same time. Depending on the system, a virtual processor may be a processor,
a core, etc.
For example,
each UltraSPARC IV physical processor has two cores; from the Solaris OS point
of view, each of these cores is a virtual processor onto which a thread can
be scheduled to run. The UltraSPARC T1 physical processor, on the other hand,
has eight cores, and each core can run four simultaneous processing threads;
from the Solaris OS point of view, there are 32 virtual processors onto which
threads can be scheduled to run. On the Solaris Operating System, the number
of virtual processors can be determined by using the psrinfo(1M)
command. On Linux systems,
the file /proc/cpuinfo provides information about available
processors.
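A program can also query the number of on-line virtual processors itself. The sketch below assumes the sysconf() parameter _SC_NPROCESSORS_ONLN, which is available on both Solaris and Linux but is not mentioned in this chapter. The function name online_virtual_processors is hypothetical.

```c
#include <unistd.h>

/* Returns the number of virtual processors currently on-line, or -1 if
 * the system cannot report it. */
long online_virtual_processors(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

The value returned corresponds to n in the discussion above: the number of threads that can be scheduled to run at the same time.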
When the operating system binds threads to processors, they are in effect
bound to specific virtual processors, not physical processors.
Set the SUNW_MP_PROCBIND environment
variable to bind threads in an OpenMP program to specific virtual processors.
The value specified for SUNW_MP_PROCBIND can be one
of the following:
-
The string "TRUE" or "FALSE"
(or lower case "true" or "false"). For
example, % setenv SUNW_MP_PROCBIND "false"
-
A non-negative integer. For
example, % setenv SUNW_MP_PROCBIND "2"
-
A list of two or more non-negative integers separated by one
or more spaces. For example, % setenv SUNW_MP_PROCBIND "0 2 4 6"
-
Two non-negative integers, n1 and n2, separated by a minus ("-"); n1 must
be less than or equal to n2. For
example, % setenv SUNW_MP_PROCBIND "0-6"
Note that the non-negative integers referred to above denote logical
identifiers (IDs). Logical IDs may be different from virtual processor
IDs. The difference will be explained below.
Virtual Processor IDs:
Each virtual processor in a system has a unique processor ID. On Solaris
platforms, you can use the psrinfo(1M) command to display
information about the processors in a system, including their processor IDs.
You can use psrinfo -pv to list all physical processors
in the system and the virtual processors that are associated with each physical
processor. Moreover, you can use the prtdiag(1M) command
to display system configuration and diagnostic information.
Virtual processor IDs may be sequential or there may be gaps in the
IDs. For example, on a Sun Fire 4810 with 8 UltraSPARC IV processors (16 cores),
the virtual processor IDs may be: 0, 1, 2, 3, 8, 9, 10, 11, 512, 513, 514,
515, 520, 521, 522, 523.
Logical IDs:
As mentioned above, the non-negative integers specified for SUNW_MP_PROCBIND are logical IDs. Logical IDs are consecutive integers that start
with 0. If the number of virtual processors available in the system is n, then their logical IDs are 0, 1, ..., n-1,
in the order presented by psrinfo(1M). The following
Korn shell script can be used to display the mapping from virtual processor
IDs to logical IDs.
#!/bin/ksh
# Print the logical ID assigned to each on-line virtual processor.
NUMV=`psrinfo | fgrep "on-line" | wc -l`
set -A VID `psrinfo | cut -f1`
echo "Total number of on-line virtual processors = $NUMV"
echo
let "I=0"
let "J=0"
while [[ $I -lt $NUMV ]]
do
   echo "Virtual processor ID ${VID[I]} maps to logical ID ${J}"
   let "I=I+1"
   let "J=J+1"
done
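The same mapping can be expressed as a small C helper, shown here only to make the rule concrete: the logical ID of a virtual processor is its position in the list of IDs as presented by psrinfo(1M). The function name logical_id_of is hypothetical, and the caller is assumed to supply the IDs in that order.

```c
#include <stddef.h>

/* Returns the logical ID of virtual processor `vid`, given the on-line
 * virtual processor IDs in the order psrinfo lists them, or -1 if `vid`
 * is not in the list. The logical ID is the position in the list. */
int logical_id_of(const int *vids, size_t count, int vid)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (vids[i] == vid)
            return (int)i;
    }
    return -1;
}
```

With the Sun Fire 4810 IDs listed earlier (0, 1, 2, 3, 8, 9, 10, 11, 512, ...), virtual processor ID 8 has logical ID 4 and virtual processor ID 512 has logical ID 8.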
On systems where a single physical processor maps to several virtual
processors, it may be useful to know which logical IDs correspond to virtual
processors that belong to the same physical processor. The following Korn
shell script can be used with later Solaris releases to display this information.
#!/bin/ksh
# Group logical IDs whose virtual processors share a physical processor.
NUMV=`psrinfo | grep "on-line" | wc -l`
set -A VLIST `psrinfo | cut -f1`
set -A CHECKLIST `psrinfo | cut -f1`
let "I=0"
while [ $I -lt $NUMV ]
do
   let "COUNT=0"
   SAMELIST="$I"
   let "J=I+1"
   while [ $J -lt $NUMV ]
   do
      # Skip logical IDs already reported in an earlier group.
      if [ ${CHECKLIST[J]} -ne -1 ]
      then
         # A count of 1 physical processor means both IDs share it.
         if [ `psrinfo -p ${VLIST[I]} ${VLIST[J]}` = 1 ]
         then
            SAMELIST="$SAMELIST $J"
            let "CHECKLIST[J]=-1"
            let "COUNT=COUNT+1"
         fi
      fi
      let "J=J+1"
   done
   if [ $COUNT -gt 0 ]
   then
      echo "The following logical IDs belong to the same physical processor:"
      echo "$SAMELIST"
      echo " "
   fi
   let "I=I+1"
done
Interpreting the Value Specified for SUNW_MP_PROCBIND:
If the value specified for SUNW_MP_PROCBIND is TRUE, then the threads will be bound to virtual processors in
a round-robin fashion, starting with the processor whose logical ID is 0.
(Specifying TRUE is equivalent to specifying the value
0 for SUNW_MP_PROCBIND.)
If the value specified for SUNW_MP_PROCBIND is
a non-negative integer, then that integer denotes the starting logical ID
of the virtual processor to which threads should be bound. Threads will be
bound to virtual processors in a round-robin fashion, starting with the processor
with the specified logical ID, and wrapping around to the processor with logical
ID 0, after binding to the processor with logical ID n-1.
If the value specified for SUNW_MP_PROCBIND is
a list of two or more non-negative integers, then threads will be bound in
a round-robin fashion to virtual processors with the specified logical IDs.
Processors with logical IDs other than those specified will not be used.
If the value specified for SUNW_MP_PROCBIND is
two non-negative integers separated by a minus ("-"), then threads will be
bound in a round-robin fashion to virtual processors in the range that begins
with the first logical ID and ends with the second logical ID. Processors
with logical IDs other than those included in the range will not be used.
If the value specified for SUNW_MP_PROCBIND does
not conform to one of the forms described above, or if an invalid logical
ID is given, then an error message will be emitted and execution of the program
will terminate.
Note that the number of threads created by the microtasking library,
libmtsk, depends on environment variables, API calls in the user’s program,
and the num_threads clause. SUNW_MP_PROCBIND specifies
the logical IDs of virtual processors to which the threads should be bound.
Threads will be bound to that set of processors in a round-robin fashion.
If the number of threads used in the program is less than the number of logical
IDs specified by SUNW_MP_PROCBIND, then some virtual
processors will not be used by the program. If the number of threads is greater
than the number of logical IDs specified by SUNW_MP_PROCBIND,
then some virtual processors will have more than one thread bound to them.
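The round-robin rules above can be illustrated with a few lines of C arithmetic. This is a sketch of the rules as stated, not the libmtsk implementation: the function names are hypothetical, and the assumption that threads are assigned in order of a 0-based thread index is made here for illustration only.

```c
#include <stddef.h>

/* SUNW_MP_PROCBIND is a single starting logical ID `start` on a system
 * with `nproc` virtual processors: wrap to 0 after logical ID nproc-1. */
int bind_from_start(int thread, int start, int nproc)
{
    return (start + thread) % nproc;
}

/* SUNW_MP_PROCBIND is a list of `count` logical IDs: cycle through the
 * list, so IDs not in the list are never used. */
int bind_from_list(int thread, const int *ids, size_t count)
{
    return ids[(size_t)thread % count];
}

/* SUNW_MP_PROCBIND is a range "first-last": cycle through first..last. */
int bind_from_range(int thread, int first, int last)
{
    return first + thread % (last - first + 1);
}
```

For example, with the list "0 2 4 6" and six threads, the threads with index 4 and 5 wrap around to logical IDs 0 and 2, so those two virtual processors each have two threads bound to them.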
Interaction with OS Processor Sets
A processor set can be specified using the psrset utility
on Solaris platforms, or the taskset command on Linux
platforms. SUNW_MP_PROCBIND does not take processor
sets into account. If the programmer uses processor sets, then it is their
responsibility to ensure that the setting of SUNW_MP_PROCBIND is
consistent with the processor set used. Otherwise, the setting of SUNW_MP_PROCBIND will override the processor set setting on Linux systems, while
on Solaris systems an error message will be issued.
2.4 Stacks and Stack Sizes
The executing program maintains a main stack for
the initial thread executing the program, as well as distinct stacks for each
helper thread. Stacks are temporary memory address spaces used to hold arguments
and automatic variables during invocation of a subprogram or function reference.
In general, the default main stack size is 8 megabytes. Compiling Fortran programs
with the f95 -stackvar option
forces the allocation of local variables and arrays on the stack as if they
were automatic variables. Use of -stackvar with OpenMP
programs is implied. However, this may lead to stack overflow if not enough
memory is allocated for the stack.
Use the limit C-shell command, or the ulimit ksh/sh command, to display or set the size of the main stack.
Each helper thread of an OpenMP program has its own thread stack. This
stack mimics the initial (or main) thread stack but is unique to the thread.
The thread’s PRIVATE arrays and variables (local
to the thread) are allocated on the thread stack. The default size is 4 megabytes
on 32-bit SPARC V8 and x86 platforms, and 8 megabytes on 64-bit SPARC V9 and
x86 platforms. The size of the helper thread stack is set with the OMP_STACKSIZE environment
variable.
demo% setenv OMP_STACKSIZE 16384 <-Set thread stack size to 16 MB (C shell)
demo$ OMP_STACKSIZE=16384 <-Same, using Bourne/Korn shell
demo$ export OMP_STACKSIZE
The best stack size might have to be determined by trial and
error. If the stack size is too small for a thread to run, it may cause silent
data corruption in neighboring threads, or segmentation faults. If you are
unsure about stack overflows, compile your Fortran, C, or C++ programs with
the -xcheck=stkovf compiler option to force a segmentation
fault on stack overflow. This stops the program before any data corruption
can occur.
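When choosing a value for OMP_STACKSIZE, it can help to convert the setting to bytes and compare it with the size of the private arrays each thread needs. The helper below is a sketch of the unit rules described in this chapter (a bare number means kilobytes, and the suffixes B, K, M, and G select the unit). It is not the parser used by the runtime library, and the function name stacksize_bytes is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

/* Converts a stack size setting such as "16384", "10M", or "2G" to bytes.
 * A bare number is taken as kilobytes. Returns 0 for malformed text. */
unsigned long long stacksize_bytes(const char *text)
{
    char *end;
    unsigned long long value;
    unsigned long long unit = 1024ULL;          /* default unit: kilobytes */

    if (!isdigit((unsigned char)text[0]))
        return 0;                               /* must start with a digit */
    value = strtoull(text, &end, 10);

    switch (*end) {
    case 'B':  unit = 1ULL;                        end++; break;
    case 'K':  unit = 1024ULL;                     end++; break;
    case 'M':  unit = 1024ULL * 1024ULL;           end++; break;
    case 'G':  unit = 1024ULL * 1024ULL * 1024ULL; end++; break;
    case '\0': break;                           /* no suffix given */
    default:   return 0;                        /* unknown suffix */
    }
    if (*end != '\0')
        return 0;                               /* trailing characters */
    return value * unit;
}
```

Under these rules, the setting 16384 used in the example above corresponds to 16384 × 1024 bytes, that is, 16 megabytes.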
2.5 Checking the Correctness of OpenMP Programs
The following are various mechanisms and tools that are available for
checking the correctness of an OpenMP program.
-
You can use the -xvpara C/C++/Fortran
option to display compiler parallelization messages.
-
You can use the Sun Studio dbx tool
to debug C, C++, and Fortran OpenMP programs. An OpenMP program should first
be prepared for debugging with dbx by compiling it with the options -xopenmp=noopt -g.
dbx allows the programmer to single-step into a PARALLEL region, set breakpoints in the body of an OpenMP construct, and
print the values of shared, private, and other variables for a given
thread.
-
You can use the SUNW_MP_WARN environment
variable to enable runtime error checking and the issuing of warning messages
by the OpenMP runtime library.
-
You can check your OpenMP program for data races and deadlocks
by using the Sun Studio Thread Analyzer tool. Refer to the Thread Analyzer
manual and the tha(1) man page for details.