Link-Editor
2
Overview
- The link-editing process consists of building an output file from one or more input files. The building of the output file is directed by the options supplied to the link-editor together with the input sections provided by the input files.
- All files are represented in the executable and linking format (ELF). For a complete description of the ELF format, refer to Chapter 5, "Object Files". For this introduction, however, it is first necessary to introduce two ELF structures: sections and segments. Sections are the smallest indivisible units that can be processed within an ELF file. Segments are collections of sections and are the smallest individual units that can be mapped to a memory image by exec(2) or by the runtime linker.
- Although there are many types of ELF sections, they all fall into two categories with respect to the link-editing phase:
-
- Sections that contain program data, whose interpretation is only meaningful to the application itself (examples of these include the program instructions .text, and the associated data .data and .bss).
- Sections that contain link-editing information (examples of these include the symbol table information found in .symtab and .strtab, and relocation information such as .rela.text).
- Basically, the link-editor concatenates the program data sections into the output file. The link-editing information sections are interpreted by the link-editor and may result in modifications to other sections, or the generation of new output information sections for use in later processing of the output file.
- The following is a simple breakdown of the link-editor's functionality, and introduces the topics covered in this chapter:
-
- It verifies all the options passed to it and checks them for consistency.
- It concatenates sections of the same characteristics (for example, type, attributes and name) from the input relocatable objects to form new sections within the output file. These concatenated sections may in turn be associated with output segments.
- It reads symbol table information from both relocatable objects and shared objects to verify and unite references with definitions, and normally generates a new symbol table, or tables, within the output file.
- It reads relocation information from the input relocatable objects and applies this information to the output file by updating other input sections. In addition, output relocation sections may be generated for use by the runtime linker.
- It generates program headers that describe any segments created.
- It generates a dynamic linking information section if necessary, which provides information such as shared library dependencies to the runtime linker.
- The process of concatenating like sections, together with the association of sections to segments, is carried out using default information within the link-editor. The default section and segment handling provided by the link-editor is normally sufficient for most users. However, the defaults may be manipulated using the -M option with an associated mapfile (refer to Chapter 6, "Mapfile Option" for more details).
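- As a rough illustration of the concatenation step, the following C sketch groups input sections by name, type and attributes and accumulates their sizes. The structures and the function concat_sections() are hypothetical simplifications, not the link-editor's own data structures, and alignment padding is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of an input section. */
struct in_section {
    const char *name;   /* for example ".text" */
    unsigned type;      /* stand-in for the ELF section type */
    unsigned flags;     /* stand-in for the ELF section attributes */
    size_t size;        /* size in bytes */
};

/* Hypothetical output section that accumulates matching input sections. */
struct out_section {
    const char *name;
    unsigned type;
    unsigned flags;
    size_t size;
};

/*
 * Append each input section to the output section with the same name,
 * type and flags, creating a new output section when none matches.
 * Returns the number of output sections written to out[].
 */
size_t concat_sections(const struct in_section *in, size_t n_in,
                       struct out_section *out, size_t max_out)
{
    size_t n_out = 0;

    for (size_t i = 0; i < n_in; i++) {
        size_t j;

        for (j = 0; j < n_out; j++) {
            if (strcmp(out[j].name, in[i].name) == 0 &&
                out[j].type == in[i].type && out[j].flags == in[i].flags)
                break;
        }
        if (j == n_out) {
            /* No matching output section yet: start a new one. */
            assert(n_out < max_out);
            out[j].name = in[i].name;
            out[j].type = in[i].type;
            out[j].flags = in[i].flags;
            out[j].size = 0;
            n_out++;
        }
        out[j].size += in[i].size;
    }
    return n_out;
}
```

- Given two .text sections and one .data section as input, the sketch yields two output sections, with the .text sizes summed in input order.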
Invoking the Link-Editor
- You can run the link-editor directly from the command-line, or have a compiler driver invoke it for you. In the following two sections both of these methods are expanded upon. However, the latter is the preferred choice, as the compilation environment is often the consequence of a complex and occasionally changing series of operations known only to compiler drivers.
Direct Invocation
- When you invoke the link-editor directly, you have to supply every object file and library required to build the intended output. The link-editor makes no assumptions about the object modules or libraries you meant to use in building the output. For example, when you issue the command:
-
- the link-editor tries to build a dynamic executable named a.out using only the input file test.o. For the a.out to be a useful executable, it should include start-up and exit processing code. This code may be language or operating system specific, and is normally provided through files supplied by the compiler drivers. Additionally, you may also supply your own initialization and termination code. This code must be encapsulated and labelled correctly for it to be correctly recognized and made available to the runtime linker. This encapsulation and labelling is also provided through files supplied by the compiler drivers.
- In practice, there is little reason to invoke the link-editor directly.
Using a Compiler Driver
- The conventional way to use the link-editor is through a language-specific compiler driver. You supply the compiler driver, for example cc(1) or f77(1), with the input files that make up your application, and the compiler driver adds additional files and default libraries to complete the link-edit. These additional files may be seen by expanding the compilation invocation, for example:
-
$ cc -# -o prog main.o
/usr/ccs/bin/ld -dy /opt/COMPILER/crti.o /opt/COMPILER/crt1.o \
/usr/ccs/lib/values-Xt.o -o prog main.o \
-YP,/opt/COMPILER/lib:/usr/ccs/lib:/usr/lib -Qy -lc \
/opt/COMPILER/crtn.o
-
Note - This is an example; the actual files included by your compiler driver and the mechanism used to display the link-editor invocation may vary.
Specifying the Link-Editor Options
- Most options to the link-editor can be passed via the compiler driver command-line. For the most part there is no conflict between the compiler and the link-editor options. In cases where a conflict arises, the compiler drivers normally provide a command-line syntax that allows specific options to be passed to the link-editor. However, an alternative mechanism to provide options to the link-editor is to set the LD_OPTIONS environment variable. For example:
-
$ LD_OPTIONS="-R /home/me/libs -L /home/me/libs" cc -o prog \
main.c -lfoo
- Here the -R and -L options will be interpreted by the link-editor and prepended to any command-line options received from the compiler driver.
- The link-editor parses the entire option list looking for any invalid options or any options with invalid associated arguments. If either of these cases is found, a suitable error message is generated, and if the error is deemed fatal the link-edit terminates. For example:
-
$ ld -X -z sillydefs main.o
ld: illegal option -- X
ld: fatal: option -z has illegal argument 'sillydefs'
- Here the illegal option -X is identified, and the illegal argument to the -z option is caught by the link-editor's checking. If an option requiring an associated argument is mistakenly specified twice, the link-editor will provide a suitable warning but will continue with the link-edit. For example:
-
$ ld -e foo ...... -e bar main.o
ld: warning: option -e appears more than once, first setting taken
- The link-editor also checks the option list for any fatal inconsistencies. For example:
-
$ ld -dy -r main.o
ld: fatal: option -dy and -r are incompatible
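- The two checks just described (a repeated option whose first setting is taken, and a fatal incompatibility) can be sketched in C as follows. This is a minimal model under stated assumptions: struct opt_state and check_options() are hypothetical names, only -e, -dy and -r are recognized, and diagnostics are counted rather than printed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the options seen so far. */
struct opt_state {
    const char *entry;   /* argument of the first -e seen, or NULL */
    int dynamic;         /* non-zero once -dy is seen */
    int relocatable;     /* non-zero once -r is seen */
    int warnings;        /* number of non-fatal diagnostics */
    int fatal;           /* non-zero once a fatal condition is found */
};

void check_options(int argc, const char *const *argv, struct opt_state *st)
{
    assert(argv != NULL && st != NULL);
    *st = (struct opt_state){0};

    for (int i = 0; i < argc; i++) {
        if (strcmp(argv[i], "-e") == 0) {
            if (i + 1 >= argc) {        /* -e without its argument */
                st->fatal = 1;
                break;
            }
            if (st->entry == NULL)
                st->entry = argv[i + 1];
            else
                st->warnings++;         /* repeated: first setting taken */
            i++;                        /* skip the option's argument */
        } else if (strcmp(argv[i], "-dy") == 0) {
            st->dynamic = 1;
        } else if (strcmp(argv[i], "-r") == 0) {
            st->relocatable = 1;
        }
    }
    /* Inconsistencies are checked once the whole list has been read. */
    if (st->dynamic && st->relocatable)
        st->fatal = 1;
}
```

- Running the sketch over "-e foo -e bar main.o" leaves the entry set to foo with one warning recorded, whereas "-dy -r main.o" sets the fatal flag.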
- After processing all options, and providing no fatal error conditions have been detected, the link-editor proceeds to process the input files.
- Refer to Appendix A, "Link-Editor Quick Reference" for the most commonly used link-editor options, and to the ld(1) manual page for a complete description of all link-editor options.
Input File Processing
- The link-editor reads input files in the order they appear on the command-line. Each file is opened and inspected to determine its ELF file type and thus determine how it must be processed. The file types applicable as input for the link-edit are determined by the binding mode of the link-edit, either static or dynamic.
- Under static linking the link-editor will only accept relocatable objects or archive libraries as input files. Under dynamic linking the link-editor will also accept shared objects.
- Relocatable objects represent the most basic input file type to the link-editing process. The program data sections within these files are concatenated into the output file image being generated. The link-edit information sections are organized for later use, but will not become part of the output file image, as new sections will be generated to take their place. Symbols are gathered into a special internal symbol table that allows for their verification and resolution, and eventually the creation of one or more symbol tables in the output image.
- Although any input file can be specified directly on the link-edit command-line, archive libraries and shared objects are commonly specified using the -l option (refer to the section "Linking with Additional Libraries" on page 14 for coverage of the use of this mechanism and how it relates to the two different linking modes). However, even though shared objects are often referred to as shared libraries, and both of these objects may be specified using the same option, the interpretation of shared objects and archive libraries is quite different. The next two sections expand upon these differences.
Archive Processing
- Archives are built using ar(1), and normally consist of a collection of relocatable objects with an archive symbol table. This symbol table provides an association of symbol definitions with the objects that supply these definitions. When the link-editor reads an archive, it uses information within the internal symbol table it is creating to select only the objects from the archive it requires to complete the binding process. To be more precise, the link-editor will extract a relocatable object from an archive if:
-
- It contains a symbol definition that satisfies a symbol reference (sometimes referred to as an undefined symbol) presently held in the link-editor's internal symbol table, or
- It contains a data symbol definition that satisfies a tentative symbol definition presently held in the link-editor's internal symbol table. An example of this would be that a FORTRAN COMMON block definition would result in the extraction of a relocatable object that defines the same DATA symbol.
-
Note - A weak symbol reference will not cause the extraction of an object from an archive. Weak symbols are expanded upon in section "Simple Resolutions" on page 22.
- The link-editor will make multiple passes through an archive, extracting relocatable objects as needed to satisfy the symbol information being accumulated in the link-editor's internal symbol table. Once the link-editor has made a complete pass through the archive without extracting any relocatable objects, it will move on to process the next input file. This mechanism of only extracting from the archive the relocatable objects needed at the time the archive was encountered means that the position of the archive within the input file list may be significant (refer to section "Position of an Archive on the Command-Line" on page 16 for more details).
-
Note - Although the link-editor will make multiple passes through an archive to resolve symbols, this mechanism may be quite costly for large archives containing random organizations of relocatable objects. In these cases it is recommended that tools like lorder(1) and tsort(1) be used to order the relocatable objects within the archive and thus reduce the number of passes the link-editor must carry out.
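- The multiple-pass extraction can be modelled with a short C sketch. Everything here is a hypothetical simplification: struct member, struct symtab and process_archive() are illustrative names, only the undefined-reference rule is modelled (tentative definitions and weak references are left out), and table sizes are fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 32

/* Hypothetical archive member: the symbols it defines and references. */
struct member {
    const char *name;
    const char *defs[4];   /* NULL-terminated */
    const char *refs[4];   /* NULL-terminated */
    bool extracted;
};

/* Hypothetical internal symbol table: names and whether each is defined. */
struct symtab {
    const char *name[MAX_SYMS];
    bool defined[MAX_SYMS];
    size_t count;
};

/* Return the index of "name", or t->count when it is not present. */
size_t sym_find(const struct symtab *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->name[i], name) == 0)
            return i;
    return t->count;
}

/* Record a reference or a definition; a definition is never downgraded. */
void sym_enter(struct symtab *t, const char *name, bool defined)
{
    size_t i = sym_find(t, name);

    if (i == t->count) {
        assert(t->count < MAX_SYMS);
        t->name[i] = name;
        t->defined[i] = false;
        t->count++;
    }
    if (defined)
        t->defined[i] = true;
}

/* True if the member defines a symbol that is currently undefined. */
bool member_needed(const struct symtab *t, const struct member *m)
{
    for (size_t d = 0; d < 4 && m->defs[d] != NULL; d++) {
        size_t i = sym_find(t, m->defs[d]);
        if (i < t->count && !t->defined[i])
            return true;
    }
    return false;
}

/* Repeat passes until one pass extracts nothing; return the pass count. */
int process_archive(struct symtab *t, struct member *ar, size_t n)
{
    int passes = 0;
    bool extracted;

    do {
        extracted = false;
        passes++;
        for (size_t k = 0; k < n; k++) {
            if (ar[k].extracted || !member_needed(t, &ar[k]))
                continue;
            ar[k].extracted = extracted = true;
            for (size_t d = 0; d < 4 && ar[k].defs[d] != NULL; d++)
                sym_enter(t, ar[k].defs[d], true);
            for (size_t r = 0; r < 4 && ar[k].refs[r] != NULL; r++)
                sym_enter(t, ar[k].refs[r], false);
        }
    } while (extracted);
    return passes;
}
```

- In this model, an archive whose members are ordered so that a definition precedes the member that references it needs an extra pass, which is the cost that ordering the members aims to avoid.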
Shared Object Processing
- Shared objects are indivisible, whole units that have been generated via a previous link-edit of one or more input files. When the link-editor processes a shared object the entire contents of the shared object become a logical part of the resulting output file image. The shared object is not copied physically during the link-edit as its actual inclusion is deferred until process execution. This logical inclusion means that all symbol entries defined in the shared object are made available to the link-editing process.
- The shared object's program data sections and most of the link-editing information sections are unused by the link-editor, as these will be interpreted by the runtime linker when the shared object is bound to generate a runnable process. However, the occurrence of a shared object will be remembered, and information will be stored in the output file image to indicate that this object is a dependency and must be made available at runtime.
- If a shared object has dependencies on other shared objects, these too will be processed. This processing will occur after all command-line input files have been processed. These shared objects will be used to complete the symbol resolution process. However, their names will not be recorded as dependencies in the output file image being generated.
- Although the position of a shared object on the link-edit command-line has less significance than it does for archive processing, it may have a global effect. Multiple symbols of the same name are allowed to occur between relocatable objects and shared objects, and between multiple shared objects (refer to the section "Symbol Resolution" on page 21 for more details). The order of shared objects processed by the link-editor is maintained in the dependency information stored in the output file image. As the runtime linker reads this information it will load the specified shared objects in the same order. Therefore, both the link-editor and the runtime linker will select the first occurrence of a multiply defined symbol.
-
Note - Multiple symbol definitions, and thus the information to describe the interposing of one definition of a symbol for another, are reported in the load map output generated using the -m option.
Linking with Additional Libraries
- Although the compiler drivers will often ensure that appropriate libraries are specified to the link-editor, it is frequently necessary for developers to supply their own. Shared objects and archives can be specified by explicitly naming the required input files to the link-editor. However, a more common and more flexible method involves using the link-editor's -l option.
Library Naming Conventions
- By convention, shared objects are normally designated by the prefix lib and the suffix .so, and archives are designated by the prefix lib and the suffix .a. For example, libc.so is the shared object version of the standard C library made available to the compilation environment, and libc.a is its archive version. These conventions are recognized by the -l option of the link-editor. Developers commonly use this option to supply additional libraries to their link-edit. For example, the command
-
$ cc -o prog file1.c file2.c -lfoo
- directs the link-editor to search for libfoo.so, and if it does not find it, to search for libfoo.a.
-
Note - There is a naming convention regarding the compilation environment and the runtime environment use of shared objects. The compilation environment uses the simple .so suffix, whereas the runtime environment commonly uses the suffix with an additional version number. Refer to section "Naming Conventions" on page 68, and "Versioning" on page 73 for more details.
- When link-editing in dynamic mode, you may choose to link with a mix of shared objects and archives. When link-editing in static mode, only archive libraries are acceptable for input. When in dynamic mode and using the -l option to enable a library search, the link-editor will first search in a given directory for a shared object that matches the specified name. If no match is found the link-editor will then look for an archive library in the same directory. When in static mode and using the -l option, only archive libraries will be sought.
Linking with a Mix of Shared Objects and Archives
- Although the library search mechanism, in dynamic mode, searches a given directory for a shared object, and then an archive library, finer control of the type of search required can be achieved using the -B option. By specifying the -Bdynamic and -Bstatic options on the command-line, as many times as required, the library search can be toggled between shared objects or archives respectively. For example, to link an application with the archive libfoo.a and the shared object libbar.so, issue the following command:
-
$ cc -o prog main.o file1.o -Bstatic -lfoo -Bdynamic -lbar
- The -Bstatic and -Bdynamic keywords are not exactly symmetrical. When you specify -Bstatic, the link-editor does not accept shared objects as input until the next occurrence of -Bdynamic. However, when you specify -Bdynamic, the link-editor will first look for shared objects and then archives in any given directory.
- Thus in the previous example it would be more precise to say that the link-editor will first search for libfoo.a. It will then search for libbar.so, and if that fails, for libbar.a. Finally, it will search for libc.so, and if that fails, libc.a.
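- The per-directory search order can be expressed as a small C sketch that enumerates the candidate file names for a -l option. The function lib_candidate() and the enum bind_mode are hypothetical; the sketch only models the ordering described above (shared object before archive in each directory under -Bdynamic, archives only under -Bstatic) and does not touch the file system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum bind_mode { BIND_DYNAMIC, BIND_STATIC };

/*
 * Write the idx-th candidate path for "-l<lib>" into buf.
 * Returns 1 on success, 0 when idx is past the last candidate
 * or the path does not fit in buf.
 */
int lib_candidate(const char *lib, enum bind_mode mode,
                  const char *const *dirs, size_t n_dirs,
                  size_t idx, char *buf, size_t buflen)
{
    assert(lib != NULL && strlen(lib) > 0);

    /* Dynamic mode tries two names per directory, static mode one. */
    size_t per_dir = (mode == BIND_DYNAMIC) ? 2 : 1;
    if (idx >= n_dirs * per_dir)
        return 0;

    const char *dir = dirs[idx / per_dir];
    /* In dynamic mode the shared object is tried before the archive. */
    const char *suffix =
        (mode == BIND_DYNAMIC && idx % per_dir == 0) ? "so" : "a";

    int n = snprintf(buf, buflen, "%s/lib%s.%s", dir, lib, suffix);
    return n > 0 && (size_t)n < buflen;
}
```

- With the directories /usr/ccs/lib and /usr/lib, the dynamic-mode candidates for -lbar begin with /usr/ccs/lib/libbar.so followed by /usr/ccs/lib/libbar.a, while the static-mode candidates for -lfoo are the two libfoo.a paths only.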
- Another example of using these options is in the creation of an ABI-conforming application. For example:
-
$ cc -o prog main.o file1.o -lsys -Bstatic
- Here all the basic system routines defined in libsys.so will be bound to this shared object. Because the compiler driver appends a -lc to the options supplied to the link-editor, and because the -Bstatic has instructed the link-editor to search for archive libraries only, any remaining undefined symbols will be resolved by extracting the appropriate relocatable objects from libc.a.
Position of an Archive on the Command-Line
- The position of an archive on the command-line may affect the output file being produced. The link-editor searches an archive only to resolve undefined or tentative external references it has previously seen. Once this search is completed and the required relocatable objects have been extracted, the archive will not be available to resolve any new symbols obtained from the input files that follow the archive on the command-line. For example, the command
-
$ cc -o prog file1.c -Bstatic -lfoo file2.c file3.c -Bdynamic
- directs the link-editor to search libfoo.a only to resolve symbol references that have been obtained from file1.c; libfoo.a will not be available to resolve symbol references from file2.c or file3.c.
-
Note - As a rule, it is best to specify any archives at the end of the command-line unless multiple-definition conflicts require you to do otherwise.
Directories Searched by the Link-Editor
- All previous examples assumed that the link-editor knows where to search for the libraries listed on the command-line. By default the link-editor knows of only two standard places to look for libraries, /usr/ccs/lib and /usr/lib. All other directories to be searched must be added to the link-editor's search path explicitly.
- There are two ways to change the link-editor search path: using a command-line option, or using an environment variable.
-
Using a Command-Line Option
- The -L option can be used to add a new pathname to the library search path. This option affects the search path at the point it is encountered on the command-line. For example, the command
-
$ cc -o prog main.o -Lpath1 file1.o -lfoo file2.o -Lpath2 -lbar
- searches path1 (then /usr/ccs/lib and /usr/lib) to find libfoo, but searches path1 and then path2 (and then /usr/ccs/lib and /usr/lib) to find libbar.
- Pathnames defined using the -L option are used only by the link-editor. They are not recorded in the output file image created for use by the runtime linker.
-
Note - You must specify -L if you want the link-editor to search for libraries in your current directory. You can use a period (.) to represent the current directory.
- The -Y option can be used to change the default directories searched by the link-editor. The argument supplied with this option takes the form of a colon separated list of directories. For example, the command
-
$ cc -o prog main.o -YP,/opt/COMPILER/lib:/home/me/lib -lfoo
- searches for libfoo only in the directories /opt/COMPILER/lib and /home/me/lib. The directories specified using the -Y option can be supplemented by using the -L option.
-
Using an Environment Variable
- You can also use the environment variable LD_LIBRARY_PATH, which takes a colon-separated list of directories, to add to the link-editor's library search path. In its most general form, LD_LIBRARY_PATH takes two directory lists separated by a semicolon. The first list is searched before the list(s) supplied on the command-line, and the second list is searched after.
- Here is the combined effect of setting LD_LIBRARY_PATH and calling the link-editor with several -L occurrences:
-
$ LD_LIBRARY_PATH="dir1:dir2;dir3"
$ export LD_LIBRARY_PATH
$ cc -o prog main.o -Lpath1 ... -Lpath2 ... -Lpathn -lfoo
- The effective search path will be dir1:dir2:path1:path2... pathn:dir3:/usr/ccs/lib:/usr/lib.
- If no semicolon is specified as part of the LD_LIBRARY_PATH definition, the specified directory list is interpreted after any -L options. For example:
-
$ LD_LIBRARY_PATH=dir1:dir2
$ export LD_LIBRARY_PATH
$ cc -o prog main.o -Lpath1 ... -Lpath2 ... -Lpathn -lfoo
- Here the effective search path will be path1:path2... pathn:dir1:dir2:/usr/ccs/lib:/usr/lib.
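- The way the two forms of LD_LIBRARY_PATH combine with -L options can be summarized in a C sketch that builds the effective search path as a string. The helper names path_append() and effective_search_path() are hypothetical, and the default directories are hard-coded to the two named in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append "piece" (len bytes) to the colon-separated list held in buf. */
void path_append(char *buf, size_t buflen, const char *piece, size_t len)
{
    if (len == 0)
        return;
    size_t used = strlen(buf);
    assert(used + len + 2 <= buflen);   /* room for ':' and '\0' */
    if (used > 0)
        buf[used++] = ':';
    memcpy(buf + used, piece, len);
    buf[used + len] = '\0';
}

/* Build the effective library search path into buf. */
void effective_search_path(const char *ld_library_path,
                           const char *const *l_dirs, size_t n_l,
                           char *buf, size_t buflen)
{
    const char *defaults = "/usr/ccs/lib:/usr/lib";
    const char *env = ld_library_path ? ld_library_path : "";
    const char *semi = strchr(env, ';');

    assert(buflen > 0);
    buf[0] = '\0';

    /* With a semicolon, the first list precedes the -L directories. */
    if (semi != NULL)
        path_append(buf, buflen, env, (size_t)(semi - env));

    for (size_t i = 0; i < n_l; i++)
        path_append(buf, buflen, l_dirs[i], strlen(l_dirs[i]));

    /* The second list, or the whole value when there is no semicolon. */
    const char *rest = semi ? semi + 1 : env;
    path_append(buf, buflen, rest, strlen(rest));

    path_append(buf, buflen, defaults, strlen(defaults));
}
```

- Calling the sketch with the value dir1:dir2;dir3 and the -L directories path1 and path2 gives dir1:dir2:path1:path2:dir3 followed by the defaults; without the semicolon, the environment list follows the -L directories.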
-
Note - This environment variable may also be used to augment the search path of the runtime linker (refer to "Directories Searched by the Runtime Linker" on page 40). To prevent this environment variable from influencing the link-editor the -i option can be used.
Directories Searched by the Runtime Linker
- The runtime linker knows of only one standard place to look for libraries, /usr/lib. All other directories to be searched must be added to the runtime linker's search path explicitly.
- When a dynamic executable or shared object is linked with additional shared objects, these shared objects are recorded as dependencies that must be located again during process execution by the runtime linker. During the link-edit, one or more pathnames can be recorded in the output file being built for the runtime linker to use to search for any shared object dependencies. These recorded pathnames are referred to as a runpath.
-
Note - No matter how you modify the runtime linker's library search path, its last element is always /usr/lib.
- The -R option, which takes a colon-separated list of directories, can be used to record a runpath in a dynamic executable or shared library. For example:
-
$ cc -o prog main.o -R/home/me/lib:/home/you/lib -Lpath1 \
-Lpath2 file1.o file2.o -lfoo -lbar
- will record the runpath /home/me/lib:/home/you/lib in the dynamic executable prog. The runtime linker will use these paths, and then the default location /usr/lib, to locate any shared object dependencies, in this case libfoo.so.1 and libbar.so.1.
- The link-editor accepts multiple -R options and will concatenate each of these specifications, separated by a colon. Thus, the above example could also be expressed as:
-
$ cc -o prog main.o -R/home/me/lib -Lpath1 \
-R/home/you/lib -Lpath2 file1.o file2.o -lfoo -lbar
-
Note - A historic alternative to specifying the -R option is to set the environment variable LD_RUN_PATH, and make this available to the link-editor. The scope and function of LD_RUN_PATH and -R are identical, but when both are specified, -R supersedes LD_RUN_PATH.
Initialization and Termination Sections
- The .init and .fini section types provide for runtime initialization and termination processing. These section types are concatenated from the input relocatable objects like any other sections. However, the compiler drivers may also supply .init and .fini sections as part of the additional files they add to the beginning and end of the user's input-file list. These files have the effect of encapsulating the .init and .fini code into individual functions that are identified by the reserved symbol names _init and _fini respectively. When building a dynamic executable or shared object, the link-editor records these symbol addresses in the output file's image so they may be called by the runtime linker during initialization and termination processing. Refer to the "Initialization and Termination Routines" on page 50 for more details on the runtime processing of these sections.
- The creation of .init and .fini sections can be carried out directly using an assembler, or some compilers may offer special primitives to simplify their declaration. For example, the following code segments result in a call to the function foo being placed in an .init section, and a call to the function bar being placed in a .fini section:
-
#pragma init (foo)
#pragma fini (bar)
foo()
{
/* Perform some initialization processing. */
......
}
bar()
{
/* Perform some termination processing. */
.......
}
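- The #pragma directives shown above are specific to the compilers that support them. As a comparable sketch for other toolchains, GCC and Clang provide the constructor and destructor function attributes. This is a swapped-in technique, not the mechanism described in this chapter: on ELF platforms these attributes typically register the functions in .init_array and .fini_array rather than placing calls in .init and .fini. The helper is_initialized() is a hypothetical name added for illustration.

```c
#include <assert.h>

static int initialized;   /* set by the constructor before main() runs */

/* Runs during process initialization, much as an .init call would. */
__attribute__((constructor))
static void foo(void)
{
    /* Perform some initialization processing. */
    initialized = 1;
}

/* Runs during process termination, much as a .fini call would. */
__attribute__((destructor))
static void bar(void)
{
    /* Perform some termination processing. */
    assert(initialized == 1);
    initialized = 0;
}

/* Report whether the initialization routine has already run. */
int is_initialized(void)
{
    return initialized;
}
```

- When this is built into an executable with GCC or Clang, is_initialized() returns 1 by the time main() is entered.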
- Care should be taken when designing initialization and termination code that may be included in both a shared object and archive library. If this code is spread throughout a number of relocatable objects within an archive library, then the link-edit of an application using this archive may only extract a portion of the modules, and hence only a portion of the initialization and termination code. At runtime only this portion of code will be executed. However, the same application built against the shared object will have all the accumulated initialization and termination code executed at runtime when the shared object is mapped in as one of the application's dependencies.
Symbol Processing
- During input file processing, all local symbols from the input relocatable objects are passed through to the output file image. All other symbol entries are accumulated internally to the link-editor. Each time a symbol entry is processed, the link-editor determines if a symbol with the same name has already been encountered from a previous input file. If so, a symbol resolution process is called to determine which of the two entries is to be kept.
- On completion of input file processing, providing no fatal error conditions have been encountered during symbol resolution, the link-editor determines if any unbound symbol references (undefined symbols) remain that will cause the link-edit to fail.
- The following sections expand upon symbol resolution and undefined symbol processing.
Symbol Resolution
- Symbol resolution runs the entire spectrum, from simple and intuitive to complex and perplexing. Resolutions may be carried out silently by the link-editor, be accompanied by warning diagnostics, or result in a fatal error condition. The resolution of two symbols depends on the symbols' attributes, the type of file providing the symbol and the type of file being generated. For a complete description of symbol attributes, refer to section "Symbol Table" on page 119. For the following discussions, however, it is worth identifying three basic symbol types:
-
- Undefined symbols. These symbols have been referenced in a file but have not been assigned a storage address.
- Tentative symbols. These symbols have been created within a file but have not yet been sized or allocated in storage. They appear as uninitialized C symbols, or FORTRAN COMMON blocks within the file.
- Defined symbols. These symbols have been created and assigned storage addresses and space within the file.
- In its simplest form, resolution involves the use of a precedence relationship that has defined symbols dominating tentative symbols, which dominate undefined symbols.
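- This precedence relationship can be written down as a minimal C sketch. The enum ordering encodes the dominance of defined over tentative over undefined; struct sym_entry and resolve() are hypothetical names, and keeping the entry already held when both states are equal is an assumption of the sketch rather than a statement about the link-editor.

```c
#include <assert.h>
#include <stddef.h>

/* Ordered so that a larger value dominates a smaller one. */
enum sym_state { SYM_UNDEFINED, SYM_TENTATIVE, SYM_DEFINED };

/* Hypothetical symbol entry: its state and the file that supplied it. */
struct sym_entry {
    enum sym_state state;
    const char *file;
};

/*
 * Keep whichever entry has the dominant state. On a tie the entry
 * already held is kept, which is an assumption of this sketch.
 */
const struct sym_entry *resolve(const struct sym_entry *held,
                                const struct sym_entry *incoming)
{
    assert(held != NULL && incoming != NULL);
    return (incoming->state > held->state) ? incoming : held;
}
```

- For instance, an undefined reference held from one file gives way to a tentative definition from a second file, which in turn gives way to a defined symbol from a third.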
- The following C code example shows how these symbol types may be generated (undefined symbols are prefixed with u_, tentative symbols are prefixed with t_, and defined symbols are prefixed with d_):
-
$ cat main.c
extern int u_bar;
extern int u_foo();
int t_bar;
int d_bar = 1;
d_foo()
{
return (u_foo(u_bar, t_bar, d_bar));
}
$ cc -o main.o -c main.c
$ nm -x main.o
[Index] Value Size Type Bind Other Shndx Name
...............
[8] |0x00000000|0x00000000|NOTY |GLOB |0x0 |UNDEF |u_foo
[9] |0x00000000|0x00000040|FUNC |GLOB |0x0 |2 |d_foo
[10] |0x00000004|0x00000004|OBJT |GLOB |0x0 |COMMON |t_bar
[11] |0x00000000|0x00000000|NOTY |GLOB |0x0 |UNDEF |u_bar
[12] |0x00000000|0x00000004|OBJT |GLOB |0x0 |3 |d_bar
Simple Resolutions
- These symbol resolutions are by far the most common, and result when two symbols with similar characteristics are detected, and one symbol takes precedence over the other. This symbol resolution is carried out silently by the link-editor. For example, for symbols with the same binding, a reference to an undefined symbol from one file will be bound to, or satisfied by, a defined or tentative symbol definition from another file. Or, a tentative symbol definition from one file will be bound to a defined symbol definition from another file.
- Symbols that undergo resolution may have either a global or weak binding. Weak bindings have lower precedence than global bindings, and thus symbols with different bindings are resolved according to a slight alteration of the simple rules outlined above. But first, it is worth introducing how weak symbols may be produced.
- Weak symbols may be defined individually, or as aliases to global symbols using a pragma definition:
-
$ cat main.c
#pragma weak bar
#pragma weak foo = _foo
int bar = 1;
_foo()
{
return (bar);
}
$ cc -o main.o -c main.c
$ nm -x main.o
[Index] Value Size Type Bind Other Shndx Name
...............
[7] |0x00000000|0x00000004|OBJT |WEAK |0x0 |3 |bar
[8] |0x00000000|0x00000028|FUNC |WEAK |0x0 |2 |foo
[9] |0x00000000|0x00000028|FUNC |GLOB |0x0 |2 |_foo
- Notice that the weak alias foo is assigned the same attributes as the global symbol _foo. This relationship will be maintained by the link-editor and will result in the symbols being assigned the same value in the output image.
- In symbol resolution, weak defined symbols will be silently overridden by any global definition of the same name.
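- The effect of binding on two definitions of the same name can be sketched in C as follows. The names struct definition and resolve_defs() are hypothetical. Two assumptions are made loudly here: when both definitions are weak the one already held is kept, and when both are global the sketch reports an unresolvable conflict by returning NULL.

```c
#include <assert.h>
#include <stddef.h>

enum binding { BIND_WEAK, BIND_GLOBAL };

/* Hypothetical record of one definition of a symbol. */
struct definition {
    const char *file;    /* file that supplied the definition */
    enum binding bind;
};

/*
 * Choose between two definitions of the same name.
 * Returns the winner, or NULL when both are global
 * (treated as an unresolvable conflict in this sketch).
 */
const struct definition *resolve_defs(const struct definition *held,
                                      const struct definition *incoming)
{
    assert(held != NULL && incoming != NULL);
    if (held->bind == BIND_GLOBAL && incoming->bind == BIND_GLOBAL)
        return NULL;
    if (incoming->bind == BIND_GLOBAL)
        return incoming;    /* global silently overrides weak */
    return held;            /* otherwise keep what is already held */
}
```

- A weak definition held from one file is therefore replaced when a global definition of the same name arrives, and a global definition is unaffected by a later weak one.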
- Another form of simple symbol resolution occurs between relocatable objects and shared objects, or between multiple shared objects, and is termed interposition. In these cases, if a symbol is multiply defined, the relocatable object, or the first definition between multiple shared objects, will be silently taken by the link-editor. The relocatable object's definition, or the first shared object's definition, is said to interpose on all other definitions. This interposition may be used to override the functionality provided by one shared object by a dynamic executable or another shared object.
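- Interposition amounts to a first-match search over the objects in their processing order. The following C sketch shows that search; struct object and lookup() are hypothetical names, and each object carries only a short list of the symbols it defines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object: its name and the symbols it defines. */
struct object {
    const char *name;
    const char *syms[4];   /* NULL-terminated */
};

/* Return the first object, in processing order, that defines "sym". */
const struct object *lookup(const struct object *objs, size_t n,
                            const char *sym)
{
    assert(sym != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t s = 0; s < 4 && objs[i].syms[s] != NULL; s++)
            if (strcmp(objs[i].syms[s], sym) == 0)
                return &objs[i];
    return NULL;   /* no object defines the symbol */
}
```

- If the application's own object is listed first and defines a symbol that a later shared object also defines, the search returns the application's definition, which is said to interpose on the other.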
- The combination of weak symbols and interposition provides a very useful programming technique. For example, the standard C library provides a number of services that users are allowed to redefine for themselves. However, ANSI C defines a set of standard services that must be present on the system and cannot be replaced in a strictly conforming program. The function fread(3S), for example, is an ANSI C library function, whereas the system function read(2) is not. A conforming ANSI C program must be able to redefine read(2), and still use fread(3S) in a predictable way.
- The problem here is that read(2) underlies the fread(3S) implementation in the standard C library, and thus it would seem that a program that redefines read(2) could confuse the fread(3S) implementation. To guard against this, ANSI C states that an implementation cannot use a name that is not reserved to it, and by using the pragma directive shown below:
-
# pragma weak read = _read
- we are able to define just such a reserved name, and from it generate an alias for the function read(2). A user may quite freely define their own read() function without compromising the fread(3S) implementation, which in turn is implemented to use the _read() function. The link-editor will not complain of a user's redefinition of read(), either when linking against the shared object or archive version of the standard C library. In the former case, interposition will take its course, whereas in the latter case, the fact that the C library's definition of read(2) is weak allows it to be quietly overridden.
- By using the -m option, the link-editor will list all interposed symbol references along with section load address information to the standard output.
Complex Resolutions
- Complex resolutions occur when two symbols of the same name are found with differing attributes. In these cases the link-editor will select the most appropriate symbol and will generate a warning message indicating the symbol, the attributes that conflict, and the identity of the file from which the symbol definition is taken. For example:
-
$ cat foo.c
int array[1];
$ cat bar.c
int array[2] = { 1, 2 };
$ cc -dn -r -o temp.o foo.c bar.c
ld: warning: symbol `array' has differing sizes:
(file foo.o value=0x4; file bar.o value=0x8);
bar.o definition taken
|
- Here, two files with a definition of the data item array have different size requirements. A similar diagnostic would be produced if the symbols' alignment requirements differed. In both of these cases the diagnostic may be suppressed by using the link-editor's -t option.
- Another form of attribute difference is the symbol's type. For example:
-
$ cat foo.c
bar()
{
    return (0);
}
$ cc -o libfoo.so -G -K pic foo.c
$ cat main.c
int bar = 1;
main()
{
    return (bar);
}
$ cc -o main main.c -L. -lfoo
ld: warning: symbol `bar' has differing types:
(file main.o type=OBJT; file ./libfoo.so type=FUNC);
main.o definition taken
|
- Here the symbol bar has been defined as both a data item and a function.
-
Note - types in this context are the symbol types that can be expressed in ELF. They are not related to the data types as employed by the programming language except in the crudest fashion.
- In cases like this, the relocatable object definition will be taken when the resolution occurs between a relocatable object and a shared object, and the first definition will be taken when the resolution occurs between two shared objects. When such resolutions occur between symbols of different bindings (weak or global), a warning will also be produced.
- Inconsistencies between symbol types are not suppressed by the -t option.
Fatal Resolutions
- Symbol conflicts that cannot be resolved result in a fatal error condition. In this case an appropriate error message is provided indicating the symbol name together with the names of the files that provided the symbols, and no output file will be generated. Although the fatal condition is sufficient to terminate the link-edit, all input file processing will first be completed. In this manner all fatal resolution errors can be identified.
- The most common fatal error condition exists when two relocatable objects both define symbols of the same name, and neither symbol is a weak definition:
-
$ cat foo.c
int bar = 1;
$ cat bar.c
bar()
{
    return (0);
}
$ cc -dn -r -o temp.o foo.c bar.c
ld: fatal: symbol `bar' is multiply defined:
(file foo.o and file bar.o);
ld: fatal: File processing errors. No output written to temp.o
|
- Here foo.c and bar.c have conflicting definitions for the symbol bar. Since the link-editor cannot determine which should dominate, it will normally give up. However, the link-editor's -z muldefs option can be used to suppress this error condition, and allows the first symbol definition to be taken.
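- The outcomes described for simple, complex and fatal resolutions can be summarized as a small decision function. This is a simplified model under stated assumptions, not the link-editor's implementation: the type and function names are hypothetical, symbol types and sizes are ignored, and the handling of two weak definitions (first one taken) is an assumption.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; not the link-editor's own structures. */
typedef enum { FROM_RELOCATABLE, FROM_SHARED } sym_origin;
typedef enum { BIND_WEAK, BIND_GLOBAL } sym_binding;
typedef enum { KEEP_FIRST, TAKE_SECOND, FATAL_MULTIPLY_DEFINED } resolution;

typedef struct {
    sym_origin origin;      /* kind of file that supplied the definition */
    sym_binding binding;    /* weak or global */
} sym_def;

/* Decide between two definitions of one name, given in command-line order. */
resolution
resolve_definitions(sym_def first, sym_def second, bool muldefs)
{
    /* Interposition: a relocatable object's definition beats a shared object's. */
    if (first.origin != second.origin)
        return ((first.origin == FROM_RELOCATABLE) ? KEEP_FIRST : TAKE_SECOND);

    /* Between shared objects the first definition is silently taken. */
    if (first.origin == FROM_SHARED)
        return (KEEP_FIRST);

    /* Both relocatable: a weak definition gives way to a global one. */
    if (first.binding != second.binding)
        return ((first.binding == BIND_GLOBAL) ? KEEP_FIRST : TAKE_SECOND);

    /* Assumption: two weak definitions resolve to the first without error. */
    if (first.binding == BIND_WEAK)
        return (KEEP_FIRST);

    /* Two global definitions are fatal unless -z muldefs is in effect. */
    return (muldefs ? KEEP_FIRST : FATAL_MULTIPLY_DEFINED);
}
```

- In this model, the foo.c and bar.c example corresponds to two FROM_RELOCATABLE, BIND_GLOBAL definitions, which yields FATAL_MULTIPLY_DEFINED unless muldefs is true.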
Undefined Symbols
- After all input files have been read and all symbol resolution is complete, the link-editor will search the internal symbol table for any symbol references that have not been bound to symbol definitions. These symbol references are referred to as undefined symbols. The effect of these undefined symbols on the link-edit process can vary according to the type of output file being generated, and possibly the type of symbol.
Generating an Executable
- When the link-editor is generating an executable output file, the link-editor's default behavior is to terminate the link-edit with an appropriate error message should any symbols remain undefined. A symbol remains undefined when a symbol reference in a relocatable object is never matched to a symbol definition:
-
$ cat main.c
extern int foo();
main()
{
    return (foo());
}
$ cc -o prog main.c
Undefined first referenced
symbol in file
foo main.o
ld: fatal: Symbol referencing errors. No output written to prog
|
- In a similar manner, a symbol reference within a shared object that is never matched to a symbol definition when the shared object is being used to build a dynamic executable, will also result in an undefined symbol:
-
$ cat foo.c
extern int bar;
foo()
{
    return (bar);
}
$ cc -o libfoo.so -G -K pic foo.c
$ cc -o prog main.c -L. -lfoo
Undefined first referenced
symbol in file
bar ./libfoo.so
ld: fatal: Symbol referencing errors. No output written to prog
|
- Sometimes, developers wish to allow undefined symbols in cases like the previous example. In these cases the default fatal error condition can be suppressed by using the -z nodefs option.
-
Note - Care should be taken when using the -z nodefs option. If an unavailable symbol reference is required during the execution of a process, a fatal runtime relocation error will occur. Although this error may be detected during the initial execution and testing of an application, more complex execution paths may result in this error condition taking much longer to detect, which may be time consuming and costly.
- Symbols can also remain undefined when a symbol reference in a relocatable object is bound to a symbol definition in an implicitly defined shared object. For example, continuing with the files main.c and foo.c used in the previous example:
-
$ cat bar.c
int bar = 1;
$ cc -o libbar.so -R. -G -K pic bar.c -L. -lfoo
$ ldd libbar.so
libfoo.so => ./libfoo.so
$ cc -o prog main.c -L. -lbar
Undefined first referenced
symbol in file
foo main.o (./libfoo.so?)
ld: fatal: Symbol referencing errors. No output written to prog
|
- Here prog is being built with an explicit reference to libbar.so, and because libbar.so has a dependency on libfoo.so, an implicit reference to libfoo.so from prog is established. Now, main.c made a specific reference to the interface provided by libfoo.so. This means that prog really has a dependency on libfoo.so. However, because only explicit shared object dependencies are recorded in the output file being generated, prog would fail to run should a new version of libbar.so be developed that no longer has a dependency on libfoo.so. For this reason, bindings of this type are deemed fatal, and the implicit reference should be made explicit by referencing the library directly during the link-edit of prog (the required reference is hinted at as "(./libfoo.so?)" in the fatal error message shown in this example).
Generating a Shared Object
- When the link-editor is generating a shared object, it will by default allow undefined symbols to remain at the end of the link-edit. This allows the shared object to import symbols from either relocatable objects or other shared objects when it is used to build a dynamic executable. The -z defs option can be used to force a fatal error should any undefined symbols remain.
Weak Symbols
- Weak symbol references that are not bound during a link-edit will not result in a fatal error condition, no matter what output file type is being generated. If a static executable is being generated, the symbol will be converted to an absolute symbol and assigned a value of zero. If a dynamic executable or shared object is being produced, the symbol will be left as an undefined weak reference. In this case, during process execution, the runtime linker will search for this symbol, and if it does not find a match, will bind the reference to an address of zero instead of generating a fatal runtime relocation error.
- Within the confines of position-independent code (refer to section "Position-Independent Code" on page 85 for more information), these undefined weak referenced symbols may provide a useful mechanism for testing for the existence of functionality. For example, consider the following C code fragment:
-
#pragma weak foo
extern void foo(char *);

void
bar(char *path)
{
    void (*fptr)(char *);

    /* The address of foo tests zero if no definition was bound. */
    if ((fptr = foo) != 0)
        (*fptr)(path);
}
|
- If, during the link-editing of an executable containing this code, a definition for the function foo was found (say, from binding with a shared object that defined the symbol), then during execution the function address will test nonzero, which will result in the function being called. However, if the symbol definition was not found, the executable would still have been built, but during execution the function address will test zero, and thus the function will not be called.
Tentative Symbol Order Within the Output File
- Normally, contributions from input files appear in the output file in the order of their contribution. An exception occurs when processing tentative symbols and their associated storage. These symbols are not fully defined until their resolution is complete. If the resolution occurs as a result of encountering a defined symbol from a relocatable object, then the order of appearance will be that which would have occurred normally for the definition.
- If it is desirable to control the ordering of a group of symbols, then any tentative definition should be redefined to a zero-initialized data item. For example, the following tentative definitions have resulted in a reordering of the data items within the output file compared to the original order described in the source file foo.c:
-
$ cat foo.c
char A_array[0x10];
char B_array[0x20];
char C_array[0x30];
$ cc -o prog main.c foo.c
$ nm -vx prog | grep array
[32] |0x00020754|0x00000010|OBJT |GLOB |0x0 |15 |A_array
[34] |0x00020764|0x00000030|OBJT |GLOB |0x0 |15 |C_array
[42] |0x00020794|0x00000020|OBJT |GLOB |0x0 |15 |B_array
|
- By defining these symbols as initialized data items, the relative ordering of these symbols within the input file is carried over to the output file:
-
$ cat foo.c
char A_array[0x10] = { 0 };
char B_array[0x20] = { 0 };
char C_array[0x30] = { 0 };
$ cc -o prog main.c foo.c
$ nm -vx prog | grep array
[32] |0x000206bc|0x00000010|OBJT |GLOB |0x0 |12 |A_array
[42] |0x000206cc|0x00000020|OBJT |GLOB |0x0 |12 |B_array
[34] |0x000206ec|0x00000030|OBJT |GLOB |0x0 |12 |C_array
|
Defining Additional Symbols
- The -u option provides a mechanism to generate a symbol reference from the link-edit command line. This option is well suited for extracting objects from archive libraries. This option may be used to perform a link-edit entirely from archives, or to provide additional flexibility in selecting the objects to extract from multiple archives (refer to "Archive Processing" on page 12 for an overview of archive extraction).
- For example, consider the generation of a dynamic executable from the relocatable object main.o, which makes reference to the symbols foo and bar. A developer wishes to obtain the symbol definition foo from the relocatable object foo.o contained in lib1.a, and the symbol definition bar from the relocatable object bar.o contained in lib2.a. However, the archive lib1.a also contains a relocatable object defining the symbol bar (presumably of differing functionality to that provided in lib2.a). In order to specify the required archive extraction the following link-edit can be used:
-
$ cc -o prog -L. -u foo -l1 main.o -l2
|
- Here, the -u option generates a reference to the symbol foo. This reference will cause extraction of the relocatable object foo.o from the archive lib1.a. As the first reference to the symbol bar occurs in main.o, which is encountered after lib1.a has been processed, the relocatable object bar.o will be obtained from the archive lib2.a.
-
Note - This simple example assumes that the relocatable object foo.o from lib1.a does not directly, or indirectly, reference the symbol bar. If it did then the relocatable object bar.o would also be extracted from lib1.a during its processing (refer to "Archive Processing" on page 12 for a discussion of the link-editor's multi-pass processing of an archive).
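- The effect of the command-line ordering in this example can be replayed with a small model of archive extraction. This is only a sketch under simplifying assumptions: every name in it (sym, member, reference, scan_archive and so on) is hypothetical, each archive member defines a single symbol and references none, and so the link-editor's multi-pass archive processing is not modelled.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16     /* the sketch assumes no more than this many names */

typedef struct {
    const char *name;
    const char *definer;    /* NULL while the name is only referenced */
} sym;

typedef struct {
    const char *label;      /* for example "lib1.a(foo.o)" */
    const char *defines;    /* one definition per member keeps this short */
} member;

static sym table[MAX_SYMS];
static size_t nsyms;

static sym *
find(const char *name)
{
    for (size_t i = 0; i < nsyms; i++)
        if (strcmp(table[i].name, name) == 0)
            return (&table[i]);
    return (NULL);
}

/* Record a reference: from -u, or from a relocatable object such as main.o. */
void
reference(const char *name)
{
    if (find(name) == NULL && nsyms < MAX_SYMS) {
        table[nsyms].name = name;
        table[nsyms].definer = NULL;
        nsyms++;
    }
}

/* Extract each member that defines a name referenced so far but not yet defined. */
void
scan_archive(const member *members, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        sym *s = find(members[i].defines);

        if (s != NULL && s->definer == NULL)
            s->definer = members[i].label;
    }
}

/* Report which member supplied a definition, or NULL if none did. */
const char *
definer_of(const char *name)
{
    sym *s = find(name);

    return ((s != NULL) ? s->definer : NULL);
}

/* Replays: cc -o prog -L. -u foo -l1 main.o -l2 */
void
replay_example(void)
{
    static const member lib1[] = {
        { "lib1.a(foo.o)", "foo" }, { "lib1.a(bar.o)", "bar" }
    };
    static const member lib2[] = { { "lib2.a(bar.o)", "bar" } };

    nsyms = 0;
    reference("foo");       /* -u foo */
    scan_archive(lib1, 2);  /* -l1: only foo is referenced so far */
    reference("foo");       /* main.o references foo ... */
    reference("bar");       /* ... and bar */
    scan_archive(lib2, 1);  /* -l2: supplies bar */
}
```

- After replay_example() has run, definer_of("foo") names the member from lib1.a and definer_of("bar") names the member from lib2.a. Omitting the initial reference("foo") call leaves foo undefined in this model, because lib1.a is scanned before main.o has referenced anything.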
Generating the Output Image
- Once all input file processing and symbol resolution is completed with no fatal errors, the link-editor will start generating the output file image.
- The link-editor establishes what additional sections must be generated to complete the output file image. These include the symbol tables that may contain local symbol definitions from the input files, together with the global and weak symbol information that has been collected in its internal symbol table, with any output relocation and dynamic information required by the runtime linker. Once all the output section information has been established, the total output file size is calculated and the output file image is created accordingly.
- When building a dynamic executable or shared object, two symbol tables are normally generated. The .dynsym, and its associated string table .dynstr, contain only global, weak and section symbols. These sections are associated with the .text segment so that they are mapped as part of the process image at runtime, and made available to the runtime linker to perform any necessary relocations. The .symtab, and its associated string table .strtab, contain all the symbols collected from the input file processing. These sections are not mapped as part of the process image, and can even be stripped from the image using the -s option, or after the link-edit using strip(1).
- During the generation of the symbol tables a number of reserved symbols are created. These have special meaning to the linking process and should not be defined in any user code:
-
- _etext, the first location after the text segment.
- _edata, the first location after initialized data.
- _end, the first location after all data.
- _DYNAMIC, the address of the dynamic information section (the .dynamic section).
- _GLOBAL_OFFSET_TABLE_, the position-independent reference to a link-editor supplied table of addresses (the .got section). This table is constructed from position-independent data references occurring in objects that have been compiled with the -K pic option (refer to the section "Position-Independent Code" on page 85 for more information).
- _PROCEDURE_LINKAGE_TABLE_, the position-independent reference to a link-editor supplied table of addresses (the .plt section). This table is constructed from position-independent function references occurring in objects that have been compiled with the -K pic option (refer to the section "Position-Independent Code" on page 85 for more information).
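- Although user code should not define these symbols, it may refer to them. The following is a minimal sketch, assuming an ELF toolchain whose link-editor supplies _edata and _end; the function names are hypothetical. The symbols are declared as arrays so that only their addresses are ever used.

```c
#include <stdint.h>

/*
 * Declared here, never defined: the link-editor supplies these symbols.
 * Assumption: the toolchain in use provides _edata and _end.
 */
extern char _edata[], _end[];

/* Bytes between the end of initialized data and the end of all data. */
uintptr_t
uninitialized_data_bytes(void)
{
    return ((uintptr_t)_end - (uintptr_t)_edata);
}

/* Nonzero when _end lies at or beyond _edata, as the descriptions imply. */
int
end_follows_edata(void)
{
    return ((uintptr_t)_end >= (uintptr_t)_edata);
}
```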
- If the link-editor is generating an executable, it will look for additional symbols to define the executable's entry point. If a symbol was specified using the -e option it will be used, otherwise the link-editor will look for the reserved symbol names _start, and then main. If none of these symbols exists, the first address of the text segment will be used.
- Having created the output file, all data sections from the input files are copied to the new image. Any relocations specified in the input files are applied to the output image. Any new relocation information that must be generated, together with all the other link-editor generated information, is also written to the new image.
Debugging Aids
- The SunOS operating system linkers are provided with a debugging library that allows developers to trace the link-editing process in more detail. This library helps users understand, or debug, the link-edit of their own applications or libraries. It is a visual aid, and although the type of information displayed using this library is expected to remain constant, the exact format of the information may change slightly from release to release.
- Much of the debugging output may be unfamiliar to those who do not have an intimate knowledge of ELF; however, some aspects may be of general interest to many developers.
- Debugging is enabled by using the -D option, and all output produced is directed to the standard error. This option must be augmented with one or more tokens to indicate the type of debugging required. The tokens available can be displayed by using -Dhelp. For example:
-
$ ld -Dhelp
debug:
debug: For debugging the link-editing of an application:
debug: LD_OPTIONS=-Doption1,option2 cc -o prog ...
debug: or,
debug: ld -Doption1,option2 -o prog ...
debug: where placement of -D on the command line is significant
debug: and options can be switched off by prepending with `!'.
debug:
debug:
debug: args display input argument processing
debug: detail provide more information in conjunction with other
debug: options
debug: entry display entrance criteria descriptors
debug: files display input file processing (files and libraries)
debug: help display this help message
debug: libs display library search paths; detail flag shows actual
debug: library lookup (-l) processing
debug: map display map file processing
debug: reloc display relocation processing
debug: sections display input section processing
debug: segments display available output segments and address/offset
debug: processing; detail flag shows associated sections
debug: symbols display symbol table processing;
debug: detail flag shows resolution and linker table addition
|
-
Note - The above is an example, and shows the options meaningful to the link-editor. The exact options may differ from release to release.
- As most compiler drivers will interpret the -D option during their preprocessing phase, the LD_OPTIONS environment variable is a suitable mechanism for passing this option to the link-editor.
- The following example shows how input files can be traced. This can be especially useful in determining what libraries have been located, or what relocatable objects have been extracted from an archive during a link-edit:
-
$ LD_OPTIONS=-Dfiles cc -o prog main.o -L. -lfoo
............
debug: file=main.o [ ET_REL ]
debug: file=./libfoo.a [ archive ]
debug: file=./libfoo.a(foo.o) [ ET_REL ]
debug: file=./libfoo.a [ archive ] (again)
............
|
- Here the member foo.o is extracted from the archive library libfoo.a to satisfy the link-edit of prog. Notice that the archive is searched twice (again) to verify that the extraction of foo.o did not warrant the extraction of additional relocatable objects. More than one "again" display indicates that the archive is a candidate for ordering using lorder(1) and tsort(1).
- By adding the symbols token you can also determine what symbol caused this archive member to be extracted, and which object made the initial symbol reference:
-
$ LD_OPTIONS=-Dsymbols cc -o prog main.o -L. -lfoo
............
debug: symbol table processing; input file=main.o [ ET_REL ]
............
debug: symbol[7]=foo (global); adding
debug:
debug: symbol table processing; input file=./libfoo.a [ archive ]
debug: archive[0]=bar
debug: archive[1]=foo (foo.o) resolves undefined or tentative symbol
debug:
debug: symbol table processing; input file=./libfoo.a(foo.o) [ ET_REL ]
.............
|
- Here the symbol foo is referenced by main.o and is added to the link-editor's internal symbol table. This symbol reference causes the extraction of the relocatable object foo.o from the archive libfoo.a.
-
Note - The above output has been simplified for this document.
- Using the detail token together with the symbols token the details of symbol resolution during input file processing can be observed:
-
$ LD_OPTIONS=-Dsymbols,detail cc -o prog main.o -L. -lfoo
............
debug: symbol table processing; input file=main.o [ ET_REL ]
............
debug: symbol[7]=foo (global); adding
debug: entered 0x000000 0x000000 NOTY GLOB UNDEF REF_REL_NEED
debug:
debug: symbol table processing; input file=./libfoo.a [ archive ]
debug: archive[0]=bar
debug: archive[1]=foo (foo.o) resolves undefined or tentative symbol
debug:
debug: symbol table processing; input file=./libfoo.a(foo.o) [ ET_REL ]
debug: symbol[1]=foo.c
.............
debug: symbol[7]=bar (global); adding
debug: entered 0x000000 0x000004 OBJT GLOB 3 REF_REL_NEED
debug: symbol[8]=foo (global); resolving [7][0]
debug: old 0x000000 0x000000 NOTY GLOB UNDEF main.o
debug: new 0x000000 0x000024 FUNC GLOB 2 ./libfoo.a(foo.o)
debug: resolved 0x000000 0x000024 FUNC GLOB 2 REF_REL_NEED
|
- Here, the original undefined symbol foo from main.o has been overridden with the symbol definition from the extracted archive member foo.o. The detailed symbol information reflects the attributes of each symbol.
- As the above example shows, some of the debugging tokens can produce a wealth of output. When only the activity around a subset of the input files is of interest, the -D option can be placed directly on the link-edit command line and toggled on and off. To obtain the link-edit command line it may be necessary to expand the compilation line from any driver being used; refer to "Using a Compiler Driver" on page 9 for more details. For example:
-
$ ld .... -o prog main.o -L. -Dsymbols -lbar -D!symbols ....
|
- Here the display of symbol processing will be switched on only during the processing of the library libbar.
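- The toggling behavior can be modelled as a set of flags that is updated as each -D argument is encountered on the command line. This is a sketch of the documented semantics only, not the link-editor's option parser: the function names are hypothetical, and the token list is copied from the -Dhelp output shown earlier, which may differ from release to release.

```c
#include <stddef.h>
#include <string.h>

/* Tokens as listed by the -Dhelp example; one bit per token. */
static const char *const tokens[] = {
    "args", "detail", "entry", "files", "help", "libs",
    "map", "reloc", "sections", "segments", "symbols"
};
#define NTOKENS (sizeof (tokens) / sizeof (tokens[0]))

/* Return the bit index of a token given by name and length, or -1. */
static int
token_index(const char *name, size_t len)
{
    for (size_t i = 0; i < NTOKENS; i++)
        if (strlen(tokens[i]) == len && strncmp(tokens[i], name, len) == 0)
            return ((int)i);
    return (-1);
}

/*
 * Apply one -D argument (without the "-D" prefix) to the current state.
 * The argument is a comma-separated token list; a leading '!' switches
 * a token off. Unknown tokens are ignored in this sketch.
 */
unsigned
apply_debug_tokens(unsigned state, const char *list)
{
    while (*list != '\0') {
        size_t len = strcspn(list, ",");
        size_t off = (len > 0 && list[0] == '!') ? 1 : 0;
        int idx = token_index(list + off, len - off);

        if (idx >= 0) {
            if (off)
                state &= ~(1u << idx);
            else
                state |= 1u << idx;
        }
        list += len;
        if (*list == ',')
            list++;
    }
    return (state);
}

/* Nonzero if the named token is currently switched on. */
int
debug_enabled(unsigned state, const char *token)
{
    int idx = token_index(token, strlen(token));

    return (idx >= 0 && (state & (1u << idx)) != 0);
}
```

- Applying "symbols" before the library is processed and "!symbols" after it reproduces the window of output described above: debug_enabled() reports the symbols token as on only between the two calls.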