Java on Solaris 7 Developer's Guide

Chapter 5 Application Performance Tuning

This chapter describes how to improve the performance of your Java applications in the Solaris 7 environment. An application's performance can be defined as its usage of resources; performance tuning therefore means minimizing that usage.


Caution -

Many of these performance tuning tips are specific to Java on the Solaris 2.6 and Solaris 7 platforms. Future releases may have different performance characteristics, so these tips may not remain appropriate.


Tuning Techniques

System Interface Level

These areas of the Java system interface level, where tuning can often result in significant performance gains, are discussed here:

  • I/O

  • Strings

  • Arrays

  • Vectors

  • Painting/drawing

  • Hashing

  • Images

  • Memory usage

  • Threads

Compiler Optimization Level

The optimizations for these compilers are listed:

  • Java compiler

  • JIT compiler

Code Tuning Level

Code tuning in these areas may be used to increase performance:

  • Loops

  • Convert expr to table lookup

  • Caching

  • Result pre-computation

  • Lazy evaluation

  • Class vs. object initialization

I/O

The biggest and most common performance problem in Java applications is often inefficient I/O. Therefore, I/O issues should generally be the first thing to look at when performance-tuning a Java application. Fixing these problems often results in greater performance gains than all the other possible optimizations combined. It is not unusual to see a speed improvement of one order of magnitude achieved by using efficient I/O techniques.

If an application performs a significant amount of I/O, then it is a candidate for I/O performance tuning. This conclusion can be confirmed by profiling the application. To learn how to profile an application, you can use the Java WorkShop (JWS) product. JWS can be obtained from:

http://www.sun.com/workshop

In JWS, select Help->Help Contents, and click on Profiling Projects.

The following example is a benchmark test that reads a 150,000-line file using four different methods:

  1. DataInputStream.readLine() alone (unbuffered).

  2. DataInputStream.readLine() with a BufferedInputStream underneath, which has a buffer size of 2048 bytes.

  3. BufferedReader.readLine() with a buffer size of 8192 bytes.

  4. BufferedFileReader(fileName).

The results were as follows (times in seconds):

DataInputStream:                        178.740
DataInputStream(BufferedInputStream):    21.559
BufferedReader:                          11.150
BufferedFileReader:                       6.991

Note that methods 1 and 2 do not properly handle Unicode characters, while methods 3 and 4 handle them correctly. This makes methods 1 and 2 unacceptable for most product uses. Also, DataInputStream.readLine() is deprecated as of JDK 1.1. Method 1 is used in JWS and other programs.
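
As an illustration of method 3, the following minimal sketch reads a file line by line through a BufferedReader with an 8192-character buffer. The class and method names (LineCounter, countLines) are hypothetical, and the code uses current Java syntax (try-with-resources) rather than JDK 1.1 syntax. It is not the benchmark program that produced the timings above.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper; not the benchmark code from this chapter.
class LineCounter {
    // Counts the lines in a file, reading through an 8192-character buffer
    // so that each readLine() call rarely reaches the operating system.
    static int countLines(String fileName) throws IOException {
        int lines = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(fileName), 8192)) {
            while (in.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(args[0]));
    }
}
```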

Another way to spot Solaris I/O problems is to use truss(1) to look for read(2) and write(2) system calls.

Strings

When using strings, the most important thing to remember is to use char arrays for all character processing in loops, instead of the String or StringBuffer classes. Accessing an array element is much faster than using the charAt() method to access a character in a string. Also, remember that string constants ("...") are already String objects, so there is no need to wrap them in new String().

//DON'T

String s = new String("hello");

//DO

String s = "hello";

In addition:

  • class String

    Do not use this class for mutable strings, for character processing, or with the charAt() method inside a loop.

  • class StringBuffer

    Use this class only when a string is mutable, accessed concurrently by multiple threads, and no character processing is performed. Do not use it for immutable strings, for character processing, or with the charAt() and setCharAt() methods inside a loop. The default buffer capacity is 16 characters. The compiler automatically uses this class for string concatenation. Set the initial buffer size to the maximum string length, if it is known.

  • class StringTokenizer

    This class is useful for simple parsing or scanning, but very inefficient. It can be optimized by storing the string and delimiter in a character array instead of in String, or by storing the highest delimiter character to allow a quicker check. This will result in a 1.6x to 10x performance increase (2.4x is typical), depending on the delimiter list and target string.
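
The advice above on char arrays and on StringBuffer sizing can be sketched as follows. The class and method names (CharScan, countChar, repeat) are hypothetical. Whether the array version is measurably faster depends on the JVM in use, so treat this as an illustration rather than a guaranteed speedup.

```java
// Hypothetical example class for the String tips.
class CharScan {
    // Copies the string into a char[] once, then indexes the array in the
    // loop instead of calling charAt() on every iteration.
    static int countChar(String s, char target) {
        char[] chars = s.toCharArray();
        int count = 0;
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == target) {
                count++;
            }
        }
        return count;
    }

    // Sizes the StringBuffer to the known final length so it never has to
    // grow from its default capacity of 16 characters.
    static String repeat(String piece, int times) {
        StringBuffer sb = new StringBuffer(piece.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(piece);
        }
        return sb.toString();
    }
}
```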

Arrays

Arrays are bounds-checked, which degrades performance. However, accessing arrays is much faster than accessing Vector, String, and StringBuffer. Use System.arraycopy() to copy arrays. It is a native method, and much faster than copying elements one at a time in a loop.
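
A minimal sketch of System.arraycopy() in use, assuming a hypothetical helper (ArrayCopy.grow) that copies an int array into a larger or smaller one:

```java
// Hypothetical helper illustrating System.arraycopy().
class ArrayCopy {
    // Returns a new array of newLength elements holding as many elements
    // of src as fit; any remaining elements keep the default value 0.
    static int[] grow(int[] src, int newLength) {
        int[] dest = new int[newLength];
        // Arguments: source, source offset, destination, destination offset, count.
        System.arraycopy(src, 0, dest, 0, Math.min(src.length, newLength));
        return dest;
    }
}
```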

Vectors

Vector is convenient to use, but inefficient. For best performance, use it only when the structure size is unknown, and efficiency is not a concern. When using Vector, ensure that elementAt() is not used inside a loop, as performance will degrade. Use Vector only when you have an array with the following characteristics:

  • Accessed concurrently by multiple threads

  • Dynamic size
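
One way to keep elementAt() out of a loop is to copy the Vector into an array first with copyInto() and then loop over the array. The sketch below uses a hypothetical class (VectorSum) and assumes that no other thread changes the Vector while sum() is running.

```java
import java.util.Vector;

// Hypothetical example; assumes the Vector is not modified concurrently.
class VectorSum {
    static int sum(Vector<Integer> v) {
        Integer[] items = new Integer[v.size()];
        v.copyInto(items);              // one synchronized call instead of one per element
        int total = 0;
        for (int i = 0; i < items.length; i++) {
            total += items[i].intValue();
        }
        return total;
    }
}
```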

Hashing

Hashtable has these tunable parameters:

  • Capacity (usually a prime number), initialCapacity; if this is not set large enough, collisions become frequent, and lookups degrade from hashing to linear list processing.

  • Load factor (0.0-1.0), loadFactor, which is the fraction of capacity beyond which the table will expand.

Hashtable calls hashCode(). These classes have pre-defined hashCode() methods:

  • Color, Font, Point

  • File

  • Boolean, Byte, Character, Double, Float, Integer, Long, Short, String

  • URL

  • BitSet, Date, GregorianCalendar, Locale, SimpleTimeZone

Note that String.hashCode() does not always sample all the characters, depending on the length:

  • Length from 1 to 15: all n characters

  • Length from 16 to 23: every other character

  • Length from 24 to 31: every third character

  • And so on.
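
The two tunable parameters are passed to the Hashtable(int initialCapacity, float loadFactor) constructor. In this sketch the class name (WordCounts) and the values 101 and 0.75 are illustrative assumptions, not recommendations from this guide.

```java
import java.util.Hashtable;

// Hypothetical example of setting initialCapacity and loadFactor.
class WordCounts {
    static Hashtable<String, Integer> count(String[] words) {
        // 101 is a prime capacity chosen for illustration; the table
        // expands once it is more than 75% full.
        Hashtable<String, Integer> table = new Hashtable<String, Integer>(101, 0.75f);
        for (int i = 0; i < words.length; i++) {
            Integer old = table.get(words[i]);
            table.put(words[i], Integer.valueOf(old == null ? 1 : old.intValue() + 1));
        }
        return table;
    }
}
```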

Images

Painting and Drawing

To improve performance in these areas, use the following techniques:

  • Double buffering (for instance, for animation, draw the image off-screen and load it all at once).

  • Overriding the default update() method:

    public void update(Graphics g) {
        paint(g);
    }

  • Custom layout managers. If you want custom behavior, GUI performance is best if you write your own.

  • Events. The JDK 1.1 has a more efficient event model than JDK 1.0.

  • Repaint only the damaged regions (use clipRect()).
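
Double buffering can be sketched as follows. The class name (OffscreenFrame) and the drawing itself are hypothetical. BufferedImage is used as the off-screen buffer only so that the sketch runs without a visible window; in an AWT component the buffer would more typically come from the component's createImage(width, height) method.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.image.BufferedImage;

// Hypothetical double-buffering sketch.
class OffscreenFrame {
    // Draws the complete frame into an off-screen image.
    static BufferedImage render(int width, int height) {
        BufferedImage buffer = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics g = buffer.getGraphics();
        try {
            g.setColor(Color.white);
            g.fillRect(0, 0, width, height);   // background
            g.setColor(Color.red);
            g.fillRect(10, 10, 20, 20);        // the "animation" content
        } finally {
            g.dispose();
        }
        return buffer;
    }

    // Copies the finished frame to the screen with a single call,
    // as a component's paint(Graphics) method would.
    static void show(Graphics screen, BufferedImage frame) {
        screen.drawImage(frame, 0, 0, null);
    }
}
```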

Asynchronous Loading

To improve asynchronous loading performance, override imageUpdate() with your own method. The default imageUpdate() can cause more repainting than desired.


// Polling approach: wait for the width information to be loaded
while (image.getWidth(null) == -1) {
    try {
        Thread.sleep(200);
    }
    catch (InterruptedException e) {
    }
}

// wait()/notifyAll() approach: block until imageUpdate() reports the width
if (!haveWidth) {
    synchronized (im) {
        if (im.getWidth(this) == -1) {
            try {
                im.wait();
            }
            catch (InterruptedException e) {
            }
        }
    }
    // If we got this far, the width is loaded; we will never go through
    // all that checking again.
    haveWidth = true;
}
...
public boolean imageUpdate(Image img, int flags, int x, int y, int width,
                           int height) {
    boolean moreUpdatesNeeded = true;
    // The WIDTH flag means the image width is now available.
    if ((flags & ImageObserver.WIDTH) != 0) {
        synchronized (img) {
            img.notifyAll();
            moreUpdatesNeeded = false;
        }
    }
    return moreUpdatesNeeded;
}

Pre-Decoding

Pre-decoding and storing the image in an array will improve performance, because image decoding takes longer than loading. Pre-decode using PixelGrabber and MemoryImageSource, and combine multiple images into one file for maximum speed. These techniques are more efficient than polling.

Memory Usage

You can dramatically improve application performance by reducing the amount of garbage collection performed during execution. The following practices can also increase performance:

  • Increase the initial heap size from the 1 MByte default with

    java -ms number

    and the maximum heap size from the 16 MByte default with

    java -mx number

  • Find areas where too much memory is being used with

    java -verbosegc

  • Take size into account when allocating arrays (for instance, if short is big enough, use it instead of int).

  • Avoid allocating objects in loops (readLine() is a common example)
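
As a sketch of the last point, the hypothetical class below scans text through one char[] buffer that is allocated once and reused, instead of calling readLine(), which returns a new String for every line. It counts newline characters, so a final line without a terminating newline is not counted.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical example of avoiding per-iteration allocation.
class NewlineCounter {
    static int countNewlines(Reader in) throws IOException {
        char[] buf = new char[8192];    // allocated once, outside the loop
        int count = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\n') {
                    count++;
                }
            }
        }
        return count;
    }
}
```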

Threads

As discussed in "Java Threads In The Solaris Environment - Earlier Releases", performance is increased dramatically by using native threads. Green threads are not time-sliced and may require calls to Thread.yield() in loops, slowing execution. Other techniques to avoid:

  • Overuse of synchronization increases the possibility of deadlock (due to coding errors) and increases the likelihood of delays due to lock contention. Also, the overhead of synchronizing might frequently overcome the advantages. Minimizing synchronization may take work, but can pay off well.

  • Polling: it is acceptable only when waiting for outside events, and should then be performed in a "side" thread. Use wait()/notify() instead.
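
A minimal sketch of replacing polling with wait()/notify(), assuming a hypothetical one-slot class (ResultBox) that hands a single non-null value from one thread to another:

```java
// Hypothetical one-slot hand-off between threads; values must be non-null.
class ResultBox {
    private Object value;               // null means "not ready yet"

    // Called by the producing thread when the result is available.
    synchronized void put(Object v) {
        value = v;
        notifyAll();                    // wake any thread blocked in take()
    }

    // Blocks, without polling, until put() has supplied a value.
    synchronized Object take() {
        boolean interrupted = false;
        while (value == null) {         // re-check the condition after every wakeup
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true;     // remember the interrupt and keep waiting
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
        return value;
    }
}
```

The waiting thread consumes no CPU time while blocked in wait(), whereas a polling loop would wake up repeatedly just to re-test the condition.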

Compiler Optimizations

The following compilers automatically perform the listed optimizations.

Java Compiler

  • Inlining

  • Constant folding

JIT Compiler

  • Elimination of some array bounds checking.

  • Elimination of common sub-expressions within blocks

  • Empty method elimination

  • Some register allocation for locals

  • No flow analysis

  • Limited inlining

Code Optimization

Loops

Use these techniques for performance improvements:

  • Move loop invariants outside the loop.

  • Make the tests as simple as possible.

  • Use only local variables inside a loop; assign class fields to local variables before the loop.

  • Move constant conditionals outside loops.

  • Combine similar loops.

  • If loops are interchangeable, put the busiest one innermost.

  • As a last resort, unroll the loop.
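
Several of these techniques are combined in the sketch below: the class fields are copied to local variables, the loop bound is evaluated once, and an invariant expression is computed before the loop. The class and field names (Samples, data, scale) are hypothetical.

```java
// Hypothetical example of basic loop tuning.
class Samples {
    private int[] data;
    private int scale;

    Samples(int[] data, int scale) {
        this.data = data;
        this.scale = scale;
    }

    int weightedSum() {
        int[] local = data;                 // class field copied to a local variable
        int n = local.length;               // simple loop test: i < n
        int factor = scale * scale + 1;     // loop invariant moved outside the loop
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += local[i] * factor;
        }
        return total;
    }
}
```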

Convert expr to Table Lookup

When a value is being selected based on a single expression with a range of small integers, convert it to a table lookup. Conditional branches defeat many compiler optimizations.
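
For example, a switch or if/else chain that selects the number of days in a month can be replaced by an array indexed by the month number. The class name (DaysInMonth) is hypothetical, and the sketch ignores leap years.

```java
// Hypothetical table-lookup example; February is always 28 days here.
class DaysInMonth {
    private static final int[] DAYS = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

    // month is 1..12; one array access replaces a chain of conditional branches.
    static int days(int month) {
        return DAYS[month - 1];
    }
}
```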

Caching

Though caching takes more memory, it can be used for performance improvement. Use the technique of caching values that are expensive to fetch or compute.
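
A minimal caching sketch, assuming a hypothetical class (PrimeCache) whose primality test stands in for any expensive computation. The computations counter exists only to show that a repeated request is served from the cache. It is not synchronized, so the sketch assumes single-threaded use.

```java
import java.util.Hashtable;

// Hypothetical cache in front of an expensive computation.
class PrimeCache {
    private static final Hashtable<Integer, Boolean> cache = new Hashtable<Integer, Boolean>();
    static int computations = 0;        // number of times the slow path ran

    static boolean isPrime(int n) {
        Integer key = Integer.valueOf(n);
        Boolean cached = cache.get(key);
        if (cached != null) {
            return cached.booleanValue();   // cache hit: no recomputation
        }
        boolean result = slowIsPrime(n);
        cache.put(key, Boolean.valueOf(result));
        return result;
    }

    // Trial division; deliberately simple and slow.
    private static boolean slowIsPrime(int n) {
        computations++;
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }
}
```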

Pre-compute Results

Increase performance by precomputing values known at compile time.
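
As a sketch, factorials that fit in an int are known in advance and can be written into the source as constants instead of being multiplied out at run time. The class name (Factorial) is hypothetical.

```java
// Hypothetical example of precomputed results.
class Factorial {
    // 0! through 12!; 13! no longer fits in an int.
    private static final int[] TABLE = {
        1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880,
        3628800, 39916800, 479001600
    };

    // n must be in the range 0..12.
    static int of(int n) {
        return TABLE[n];
    }
}
```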

Lazy Evaluation

Save startup time by delaying computation of results until they are needed.
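
A minimal sketch of lazy evaluation, assuming a hypothetical class (Report) whose summary is built only on the first request. The builds counter is included only to show that the work happens once, and the sketch assumes single-threaded use.

```java
// Hypothetical lazy-evaluation example; not thread-safe.
class Report {
    private String summary;             // null until first requested
    int builds = 0;                     // number of times the summary was built

    String getSummary() {
        if (summary == null) {
            summary = buildSummary();   // computed on first use, not in the constructor
        }
        return summary;
    }

    private String buildSummary() {
        builds++;
        StringBuffer sb = new StringBuffer();
        for (int i = 1; i <= 3; i++) {
            sb.append("line ").append(i).append('\n');
        }
        return sb.toString();
    }
}
```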

Class vs. Object Initialization

Improve performance by putting all one-time initializations into a class initializer.
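
In the sketch below, a lookup table is filled in a static initializer, which runs once when the class is initialized, rather than in a constructor, which would run for every object created. The class name (BitCounter) is hypothetical.

```java
// Hypothetical example of class (static) initialization.
class BitCounter {
    // Number of 1 bits in each possible byte value.
    private static final byte[] BITS = new byte[256];

    static {
        // One-time setup shared by all instances.
        for (int i = 1; i < 256; i++) {
            BITS[i] = (byte) (BITS[i >> 1] + (i & 1));
        }
    }

    // Counts the 1 bits in a 32-bit value, one byte at a time.
    int count(int value) {
        return BITS[value & 0xFF]
             + BITS[(value >>> 8) & 0xFF]
             + BITS[(value >>> 16) & 0xFF]
             + BITS[(value >>> 24) & 0xFF];
    }
}
```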