OpenGL 1.1.1 for Solaris Implementation and Performance Guide
CHAPTER 2

OpenGL for Solaris Architecture


The purpose of designing a graphics system architecture is to enable performance within the constraints of cost and functionality goals. Hardware design places various stages of the graphics pipeline into hardware accelerators. Software design uses the hardware features and complements the hardware by providing complete coverage of functionality.
Understanding the hardware and software architecture of a particular system will help you determine whether a feature is accelerated in the graphics hardware or implemented in software. This will enable you to identify which path through the system your application uses for the feature. With this information, you can project your application's performance. Given knowledge of performance versus functionality tradeoffs, you can make informed choices about how to use the system to maximize your application's interactivity.
This chapter describes the OpenGL for Solaris architecture. First it defines two terms commonly used when discussing hardware and software performance.

Acceleration vs. Optimization

When discussing performance, it helps to understand how the hardware implementor, the software implementor, and the application programmer each define and differentiate the terms hardware acceleration and software optimization.
  • To the hardware designer, hardware accelerating OpenGL means implementing logic in the form of gates and data paths for OpenGL functions.
  • To the OpenGL software implementor, accelerating OpenGL functions means writing software to use the graphics hardware features. In addition, the software implementor can optimize OpenGL features that are not accelerated in hardware by writing highly tuned code to make the performance of those features as efficient as possible.
  • To the OpenGL application programmer, acceleration typically means the speed at which various combinations of geometry and OpenGL state render, with the goal generally being interactive performance.
With these definitions in mind, the next sections describe the OpenGL architecture and the implementation of this architecture in the Solaris OpenGL software.

A Quick Review of the OpenGL Architecture

As a first step in examining the OpenGL for Solaris architecture, Figure 2-1 shows the basic architecture of the OpenGL library.

FIGURE 2-1

In the first stage of the OpenGL pipeline, vertex data enters the pipeline, and curve and surface geometry is evaluated. Next, colors, normals, and texture coordinates are associated with vertices, and vertices are transformed and lit. Vertices are then assembled into geometric primitives.
The rasterization stage converts geometric primitives into frame buffer addresses and values, or fragments. Each fragment may be altered by per-fragment operations, such as blending. Per-fragment operations store updates into the frame buffer based on incoming and previously stored Z values (for Z buffering), blending of incoming fragment colors with stored colors, as well as masking and other logical operations.
1. From Segal, Mark, and Kurt Akeley, "The OpenGL Graphics System: A Specification," Mountain View, CA, 1995.
Pixel data is processed in the pixel operation stage. The resulting data is stored as texture memory, or rasterized and processed as fragments before being written to the frame buffer.
The task of the hardware and software implementors at Sun was to implement the OpenGL functionality. The remainder of this chapter describes this implementation.

Graphics Hardware Architecture

Graphics hardware architectures can be designed to meet varying constraints of cost and CPU performance. High-performance model coordinate (MC) devices typically implement vertex processing and transformations in hardware. A model coordinate device may perform lighting, coordinate transformations, clipping, and culling as well as rasterization and fragment processing in hardware, thereby providing very fast performance.
At a different performance level, rasterization devices typically use the host CPU to perform vertex processing and use the rasterization hardware to convert device coordinate geometry into pixel values. The Ultra Creator and Creator3D systems are examples of device coordinate (DC) devices. The graphics hardware architecture of the Creator3D graphics system is designed as follows:
  • Primitive assembly and vertex processing are performed on the UltraSPARC™ CPU. Texturing operations are also performed on the CPU.
  • Rasterization and fragment processing are performed in the Creator3D graphics hardware subsystem. The Creator3D graphics system accelerates rasterization of lines, points, and triangles, and also accelerates per-fragment operations such as the pixel ownership test, scissor test, depth buffer test, blending, logical operations, line anti-aliasing, line stippling, and polygon stippling.
The benefit of building custom hardware for graphics is that when operations are parallelized in hardware circuits, turning on features (like both Z-buffering and blending) has a very small performance cost. If a feature is provided in hardware, the hardware is usually designed to allow sustained throughput for that feature. Thus, you can make full use of features that have been implemented in hardware without experiencing performance degradation.
The benefit of putting graphics functions in software is that, because the CPU is a required and shared computing resource, using it for graphics operations imposes no additional financial cost. The disadvantage is that each additional graphics operation consumes CPU cycles. The more graphics work an application asks of the CPU, the fewer cycles remain for everything else, and overall performance may suffer.

Solaris OpenGL Software Architecture

Once the hardware designers have determined what the hardware will accelerate, all other decisions regarding performance fall to the software implementors. Software implementors need to consider the following questions:
  1. What hardware features will be used?

  2. What features that are not accelerated in hardware can the software optimize?

  3. How will the software implement all functionality?

In response to these questions, the Solaris OpenGL software developers implemented OpenGL as follows:
  • Accelerated OpenGL by using all features of the Creator and Creator3D graphics subsystems.
  • Optimized line and point transformation and clip testing, and a subset of texture lookup and filtering, for the Creator and Creator3D systems.
  • Implemented OpenGL to its complete specification by writing code for primitive assembly and vertex processing, including:

    · Coordinate transformations

    · Texture coordinate generation

    · Clipping

  • Implemented two forms of software rasterization for OpenGL features not rasterized in hardware:

    · Optimized software rasterizer for many texturing functions and pixel operations. Software rasterization is done by the CPU using an optimized implementation. On an UltraSPARC CPU, some features, such as texturing rasterization, may be handled using software code employing the VIS instruction set.

    · A software rasterizer for all features not handled by the hardware or by the VIS software.

This implementation of the OpenGL for Solaris library allows devices with varying capabilities to run efficiently on the OpenGL software. It enables OpenGL for Solaris applications to run on the following types of devices:
  • Model coordinate device - Handles most OpenGL functionality in hardware, including vertex processing, primitive assembly, rasterization, and fragment operations.
  • Device coordinate device (Creator or Creator3D graphics system) - Vertex processing is performed in software on the host CPU. Rasterization and fragment processing are handled in hardware.
  • Memory mappable devices (SX, ZX, GX, GX+, TGX, TGX+, TCX) - Vertex processing, primitive assembly, rasterization, and fragment processing are performed in software, and the results are written to the memory-mapped frame buffer.
FIGURE 2-2 illustrates the graphics software architecture of the OpenGL for Solaris product. This figure shows the paths that application data can take through the OpenGL system, depending on the type of hardware device the application is running on. TABLE 2-1 summarizes the data paths with reference to several hardware platforms.
TABLE 2-1
Platform                          Vertex Processing           Rasterization        Performance
MC device                         Hardware vertex processing  Hardware rasterizer  Fastest path
MC device                         Software vertex processing  Hardware rasterizer  Fast path
MC device                         Software vertex processing  Software rasterizer  Slow path
DC device (Creator3D or Creator)  Software vertex processing  Hardware rasterizer  Fast path
DC device (Creator3D or Creator)  Software vertex processing  Software rasterizer  Slow path
Memory map (ZX, GX, SX)           Software vertex processing  Software rasterizer  Only path
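
The rows of TABLE 2-1 can be read as a simple lookup. The following Python sketch is illustrative only: `DeviceType`, `select_path`, and the two boolean flags are hypothetical names chosen for this example and are not part of the OpenGL for Solaris library. The flags stand for "the current attributes allow this stage to stay in hardware," which is an assumption about how the choice is made.

```python
from enum import Enum


class DeviceType(Enum):
    """Hypothetical labels for the three platform rows of TABLE 2-1."""
    MODEL_COORDINATE = "MC device"
    DEVICE_COORDINATE = "DC device (Creator3D or Creator)"
    MEMORY_MAP = "Memory map (ZX, GX, SX)"


def select_path(device, vertex_in_hw, raster_in_hw):
    """Return (vertex processing, rasterization, performance) for a table row.

    vertex_in_hw and raster_in_hw are assumed flags: True when the current
    attributes do not force that stage into software.
    """
    # Memory-mapped devices have a single, all-software path.
    if device is DeviceType.MEMORY_MAP:
        return ("software", "software", "only path")
    # The table lists no row with hardware vertex processing feeding a
    # software rasterizer, so software rasterization implies the slow path.
    if not raster_in_hw:
        return ("software", "software", "slow path")
    # Only MC devices can process vertices in hardware.
    if device is DeviceType.MODEL_COORDINATE and vertex_in_hw:
        return ("hardware", "hardware", "fastest path")
    return ("software", "hardware", "fast path")


if __name__ == "__main__":
    for dev in DeviceType:
        print(dev.value, select_path(dev, True, True))
```

Under these assumptions, a DC device never reports the fastest path, because its vertex processing always runs on the host CPU.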

FIGURE 2-2

Vertex Processing Architecture

As Figure 2-2 shows, Sun's OpenGL implementation handles vertex processing in several ways:
  • Hardware vertex processing - On model coordinate devices, vertex processing is done via the hardware. In addition to hardware acceleration, the model coordinate (MC) pipeline is optimized for vertex arrays and display list mode. The model coordinate pipeline also recognizes consistent data types within glBegin/glEnd pairs. If the data is consistent, the software is able to use hardware resources efficiently.
  • Software vertex processor - This is the fully optimized path from the software implementor's point of view. The principal optimization is that the model coordinate software pipeline recognizes consistent data types within glBegin/glEnd pairs: if the data is consistent, the software pipeline is able to use CPU resources efficiently.
The OpenGL vertex array commands result in the best performance for vertex processing on all hardware platforms. For repeated rendering of the same geometry, display lists provide significant performance benefits over immediate mode rendering.
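
This chapter does not define exactly what "consistent data types within glBegin/glEnd pairs" means. One plausible reading, assumed here, is that every vertex in the pair is specified with the same sequence of command variants (for example, always glColor3f followed by glVertex3f). The Python sketch below checks that property on a list of command names. The functions `vertex_formats` and `is_consistent` are hypothetical helpers for illustration, not part of the library, and the library's actual test may differ.

```python
def vertex_formats(commands):
    """Group the commands issued between glBegin and glEnd by vertex.

    Each group is the tuple of command names up to and including a
    glVertex* call. Commands after the last glVertex* call are ignored.
    """
    groups, current = [], []
    for name in commands:
        current.append(name)
        if name.startswith("glVertex"):
            groups.append(tuple(current))
            current = []
    return groups


def is_consistent(commands):
    """True when every vertex uses the same sequence of command variants."""
    return len(set(vertex_formats(commands))) <= 1


if __name__ == "__main__":
    same = ["glColor3f", "glVertex3f"] * 3
    mixed = ["glColor3f", "glVertex3f", "glVertex3f", "glColor3ub", "glVertex3f"]
    print(is_consistent(same))   # every vertex is glColor3f + glVertex3f
    print(is_consistent(mixed))  # vertex layouts differ
```

In the second list, the first vertex carries a float color, the second carries no color, and the third carries an unsigned-byte color, so under this reading the data is not consistent.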

Rasterization and Fragment Processing Architecture

Rasterization and fragment processing are handled in one of the following ways:
  • Hardware rasterizer - The graphics subsystem handles lines, points, and triangles, and does simple fragment processing, such as blending and the depth-buffer test.
  • Optimized software rasterizer - The CPU does software rasterization using an optimized implementation. On an UltraSPARC CPU, some features, such as texturing rasterization, may be handled by the UltraSPARC CPU using software code employing the VIS instruction set.
  • Software rasterizer - The CPU does software rasterization using a generic, unoptimized implementation. The generic software rasterizer is approximately one-sixth the speed of the optimized software rasterizer.

Solaris OpenGL Interface Layers

The OpenGL for Solaris implementation has three layers of interfaces with the hardware, each requiring successively more processing by the host CPU. These interface layers correspond to the stages of the OpenGL pipeline. The rendering interface is determined by the value of the current OpenGL attributes, and in a small number of cases by the geometry itself. In general, the more host processing needed, the slower the resulting rendering, so an application should avoid attributes that force the slower rendering layers to be used.
FIGURE 2-3 shows the interface layers and their relationship to data paths through the OpenGL for Solaris system. In this illustration, the filled boxes represent the hardware-specific device pipeline (DP) components and show the hardware data paths. The white boxes represent the device-independent (DI) software components and show the software data paths.
The more efficiently an application can reach a filled box, the better the application's performance will be. For example, for an application running on a model coordinate device, the fast data paths are those that result in rendering in hardware at the vertex processing layer. Setting an attribute that causes the use of the software pipeline for model coordinate processing can result in a significant drop in performance. Setting an attribute that results in the use of software rasterizing can cause an even more significant drop in performance.
On a device coordinate device such as the Creator3D system, hardware rasterization is about three times faster than the VIS (optimized) rasterizer. The VIS rasterizer is about five to six times faster than the generic software rasterizer. Thus, the best way to increase rasterization and fragment processing performance on a DC device is to stay in the hardware rasterizer whenever possible.
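
Compounding the two approximate ratios above gives a rough sense of the gap between the hardware rasterizer and the generic software rasterizer. The Python sketch below only does that arithmetic; the constant and function names are hypothetical, and the results are estimates derived from the "about" figures quoted in the text, not measurements.

```python
# Approximate ratios quoted in the text for a DC device such as Creator3D.
HW_OVER_VIS = 3.0              # hardware rasterizer vs. VIS rasterizer
VIS_OVER_GENERIC = (5.0, 6.0)  # VIS rasterizer vs. generic rasterizer (low, high)


def hardware_over_generic():
    """Compound the ratios: hardware speed relative to the generic rasterizer."""
    low, high = VIS_OVER_GENERIC
    return (HW_OVER_VIS * low, HW_OVER_VIS * high)


def relative_time(path, vis_over_generic=5.0):
    """Time to rasterize a fixed workload, taking the hardware path as 1.0."""
    times = {
        "hardware": 1.0,
        "vis": HW_OVER_VIS,
        "generic": HW_OVER_VIS * vis_over_generic,
    }
    return times[path]


if __name__ == "__main__":
    print(hardware_over_generic())
    for path in ("hardware", "vis", "generic"):
        print(path, relative_time(path))
```

By this estimate, falling from the hardware rasterizer to the generic software rasterizer costs roughly a factor of 15 to 18 in rasterization time.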
Memory-mappable devices without hardware support use the software pipeline for model coordinate operations and the software rasterizer for rasterization. Examples of such devices are the single-buffered GX and TGX. For devices that do not allow memory access, the OpenGL for Solaris architecture provides a pixel-rendering interface layer. However, at this time no Sun hardware devices use this interface layer.
For detailed information on attributes that result in slower rendering paths, see Chapter 3, "Performance."

FIGURE 2-3