SunDiag User's Guide

1 Introducing the SunDiag System Exerciser

1.1 Overview

The SunDiag on-line system exerciser runs multiple diagnostic hardware tests from a single interface. It is used primarily with the OpenWindows(TM) user interface that enables you to set test parameters quickly and easily when running the diagnostic tests.

Support for SunDiag 4.4
This manual describes revision 4.4 of the SunDiag software package, released with the Solaris 2.4 operating environment. The SunDiag 4.4 exerciser is an optional, relocatable software package on the Solaris 2.4 System disk. The default installation directory for SunDiag is /opt/SUNWdiag/bin. When you choose to install the SunDiag software, you can change this directory.

Window Interface
SunDiag allows you to select tests and test options with a click of the mouse. See Chapter 2, "The SunDiag OPEN LOOK Interface," for detailed information on the OPEN LOOK interface. You can create your own test environment and save it for future use. SunDiag also features an extensive on-line Magnify Help(TM) facility to quickly answer questions about the interface. If you have a question about a specific part of the interface (a button, field, or setting), try the on-line help before consulting this manual. You can usually find the answer by pointing to a specific part of the SunDiag window and pressing the Help key.

TTY Interface
Using the TTY interface, you can run the SunDiag software from a terminal or modem attached to a serial port. This interface requires you to type commands instead of using the mouse, and it displays one screen of information at a time. However, it emulates the window system whenever possible; choices are "toggled" by entering a single letter instead of clicking the mouse button.
See Chapter 3, "The SunDiag TTY Interface," for detailed information.

Command Line Interface
You can also run each of the SunDiag tests individually from a shell command line. Each test description contains the command line syntax to use. See Chapter 5, "Running Individual SunDiag Tests from the Command Line," for a list of standard arguments common to all SunDiag tests.

Hardware Verification
The SunDiag exerciser automatically probes the system kernel for installed hardware devices. Those devices are then displayed on the SunDiag control panel with the appropriate tests and test options. This provides a quick check of your hardware setup. However, you may need to create other tests to verify specific hardware devices (see Section 1.9, "Adding Your Own Tests in .usertest," and Appendix A, "Developing Your Own Tests").
The SunDiag exerciser verifies the configuration, functionality, and reliability of most hardware controllers and devices.

1.2 Hardware and Software Requirements

The SunDiag 4.4 software will run on any system installed with the Solaris 2.4 operating environment. The operating system kernel must be configured to support all peripherals that are to be tested.

1.2.1 OpenWindows Software Requirements

You must meet the following three requirements to run SunDiag with the OpenWindows(TM) software.
  1. OpenWindows Version 3.0

    You must be running the OpenWindows software, version 3.0, or later.

  2. Display Permission

    You must have permission to display the SunDiag window on your screen. Typing the following command at a shell prompt gives you the display permission (server access) you need. Replace the machinename variable with the actual name of your workstation.


  /usr/openwin/bin/xhost + machinename  


Note - Use the /usr/openwin/bin/xhost command before you become superuser (obtain root privileges). xhost will not work in superuser mode.

This command must be used for every workstation that you intend to display the SunDiag window on.
  3. Correct Library Path

    You may also have to set the LD_LIBRARY_PATH variable, depending on the location of the OpenWindows directory in your system. If you have installed or mounted OpenWindows files in /usr/openwin (the default location), you can ignore this step.

If you have installed the OpenWindows software in a different location, then you must specify where the OpenWindows libraries reside. Use the following command, replacing the pathname variable with the actual path where you installed the OpenWindows software:

  setenv LD_LIBRARY_PATH pathname  

You can check the existing LD_LIBRARY_PATH by typing setenv.

1.2.2 Special Note on Testing Multiple Framebuffers

These rules apply when testing multiple framebuffers (displays) simultaneously:
  • You can test multiple framebuffers on a system simultaneously, but only one framebuffer can be running OpenWindows software.
  • The framebuffer running OpenWindows software must have window locking enabled to avoid incorrect test failures. Other framebuffers must have window locking disabled.

Caution - If window locking is disabled (unlocked) on framebuffers that are running OpenWindows software, the SunDiag tests will return spurious error messages if you move the mouse during testing. Even slight mouse movement can cause a test to fail.

  • By default, SunDiag enables window locking on framebuffers that have unit number 0 (for example, cgsix0).

    If your system has more than one framebuffer with unit number 0, you must disable window locking on all of them except the one running OpenWindows software.

    If you are running a framebuffer test from a command line, you can disable window locking by specifying the L argument.

TTY Mode and Framebuffer Window Locking
You can run the SunDiag exerciser in TTY mode in one of three ways: from a terminal attached to a serial port, from a Shell Tool window using the -t option (see Table 1-3), or on a monitor that is not running OpenWindows software.
Table 1-1
TTY Mode                                  Window Locking Issues
Terminal attached to a serial port        Window locking is not necessary because the terminal cannot run OpenWindows software.
Shell Tool using the -t option            SunDiag enables window locking on the framebuffer with unit number 0.
Monitor not running OpenWindows software  The SunDiag software will probe and find framebuffer devices (cgsix, for example), but it will not enable window locking on those devices because the OpenWindows software is not running.


Warning - Do not attempt to run the TTY mode on the console monitor and framebuffer tests concurrently in this way; doing so causes the framebuffer tests to fail.

1.2.3 Volume Management

Volume Management is a layer of removable media support software that has been added to the Solaris operating environment. This layer manages interactions between users and their removable media and provides transparent access to the media by automatically mounting media with labels.
rawtest, fstest, and cdtest test the diskette and CD-ROM drives regardless of whether the Volume Management software is installed and running. If the Volume Management software is installed, these tests use the device names listed in the second column of Table 1-2. If the Volume Management software is not installed, they use the device names in the third column.
Table 1-2
Device    With Volume Management        Without Volume Management
diskette  D=/vol/dev/aliases/floppy0    D=/dev/diskette
CD-ROM    D=/vol/dev/aliases/cdrom0     D=/dev/dsk/c0t6d0


Warning - Do not edit the /etc/vold.conf file to change the logical names of the diskette drive (floppy0) and CD-ROM drive (cdrom0). Currently, SunDiag is hard-wired to use these names as the default logical names.

1.2.4 Booting and New Device Drivers

When adding a new device driver in the Solaris operating environment, you must reboot the machine with the boot -r command to reconfigure the system and allow the SunDiag exerciser to recognize the new driver.
When you use the boot -r command, the system will probe all attached hardware devices and assign nodes in the filesystem to represent only those devices actually found. It will also configure the logical namespace in /dev as well as the physical namespace in /devices. If you have removed a device from the system, you also need to reboot the system with the boot -r command before the SunDiag kernel sees the correct devices. See the kernel(1M) man page for more information.
Starting with Solaris 2.1, boot messages, SCSI error messages and some other debugging messages are no longer sent to the /var/adm/messages file. To continue to see these types of messages, make sure to boot your system using the boot -v (verbose) option.

1.2.5 Swap Space Requirements

The amount of swap space the SunDiag exerciser requires varies widely with individual hardware and software configurations. Most systems have enough swap space already configured to satisfy the SunDiag testing requirements.
When you start testing with the SunDiag system exerciser (by clicking on the Start button), the program will calculate the amount of swap space it will need for testing. The program does this calculation by first determining the amount of swap space available on the system under test, and then calculating how much of this swap space is needed to run the program itself, the various tests, and the virtual memory test.
If there is not enough swap space available on your machine, a pop-up window will display, preventing you from testing. This pop-up window will display how much swap space you will need to add to your system in order to run all of the SunDiag diagnostic tests.
If you do not have enough swap space to run all of the SunDiag tests, you can either deselect some of the tests or add more swap space to your system. (Refer to the Solaris 2.4 Administering File Systems manual for information on increasing swap space.)
Look at the test's option menu to find out how much swap space an individual test uses while testing. The amount of swap space used by the test is shown in kilobytes next to the word Configuration. Knowing these amounts will help you decide which SunDiag tests to deselect.

Caution - If your system does not have enough swap space configured, some of the SunDiag tests may run very slowly or they may freeze your screen. In these instances, the SunDiag kernel will usually return error messages indicating that the problem is due to insufficient swap space.

1.2.5.1 Starting SunDiag With the Virtual Memory Test

The SunDiag virtual memory test (vmem) is designed to stress test the virtual memory of the system; therefore it uses all of the system's remaining swap space.
Because vmem uses all of the system's remaining swap space, you must start all non-SunDiag processes (for example, OpenWindows applications) before you begin to test the system. Starting a process after you have begun testing your system may cause the SunDiag exerciser to run slowly or freeze your screen.
If you plan on starting other processes after you have started the virtual memory test, you must use the vmem reserve option to leave space for these processes before you start testing.

Reserve Option to vmem
The vmem Reserve option allows you to reserve additional amounts of swap space for other processes or tests. This option can be set from either the vmem test options menu (within the SunDiag control panel) or from the command line.

1.2.6 Setting the Maximum Number of Processes

The SunDiag system exerciser runs under the Solaris 2.4 operating environment. The SunDiag kernel controls hardware specific tests from one interface. Each test displayed on the Status Panel is a separate UNIX(R) process, independent from the SunDiag kernel and other processes.
As a system's configuration grows, the number of processes needed for SunDiag testing also grows (especially on multi-processor systems). If the number of processes needed by SunDiag tests exceeds the maximum number of processes allowed by your system, the SunDiag application will fail.
However, you can change the maximum number of processes on your system by setting the maxusers parameter in your /etc/system file. The following section describes how to set this parameter to allow for all SunDiag processes.

1.2.6.1 Setting the maxusers Parameter in the /etc/system file

Beginning with the Solaris 2.3 operating environment, the default value of maxusers will be set automatically based on the amount of RAM installed on the system. For example, if your system has 32 Mbytes of RAM, the kernel will set the maxusers to 32.
The maximum number of UNIX processes allowed on a system is 16 times the value of maxusers (16*maxusers). Therefore, the maximum number of processes allowed on a system with 32 Mbytes of RAM is 512 (16*32).
However, the default number of UNIX processes set by a system may be insufficient to run all of the SunDiag tests. The number of test processes that the SunDiag exerciser may create while testing is approximately:

  (20 + number_of_disks*2) * (number_of_CPUs*2)

For example, the SunDiag exerciser will generate 1008 processes on a SPARCcenter(TM) 2000 system with 6 CPUs and 32 disks [(20 + 32*2)*(6*2)]. Therefore, this system will conservatively need 1200 processes (including system, network, and window system processes) to run without problems, especially if you have set the Concurrent Tests # to a very high value (see "Concurrent Tests #" on page 2-21). With 1200 processes needed, the maxusers parameter should be set to 75 (maxusers = (1200/16) = 75).
If this machine has more than 75 MBytes of RAM, then there is no need to adjust the default value of maxusers. Otherwise, in this example, the following line must be added to the /etc/system file:

  set maxusers=75  

Refer to the Solaris 2.4 Transition Guide and the SunOS 5.4 Administering Security, Performance, and Accounting manual for more information on setting system configuration parameters.
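
The sizing arithmetic in this section can be sketched in Python. This is only an illustration: the process-count expression is inferred from the bracketed arithmetic in the SPARCcenter 2000 example, so the constant 20 and the factors of 2 should be treated as assumptions, and the function names are hypothetical rather than part of SunDiag.

```python
import math

PROCESSES_PER_MAXUSER = 16  # from the text: max processes = 16 * maxusers


def estimated_test_processes(cpus, disks):
    """Approximate number of SunDiag test processes.

    Mirrors the worked example (20 + 32*2)*(6*2); the general form is an
    assumption inferred from that example.
    """
    return (20 + disks * 2) * (cpus * 2)


def maxusers_for(processes_needed):
    """Smallest maxusers value whose process limit covers processes_needed."""
    return math.ceil(processes_needed / PROCESSES_PER_MAXUSER)


def needs_override(ram_mbytes, required_maxusers):
    """The default maxusers tracks RAM in Mbytes (32 Mbytes gives 32), so an
    /etc/system entry is needed only when that default falls short."""
    return ram_mbytes < required_maxusers


if __name__ == "__main__":
    tests = estimated_test_processes(cpus=6, disks=32)
    required = maxusers_for(1200)  # 1200 is the conservative total from the text
    print("test processes:", tests)
    print("required maxusers:", required)
    if needs_override(ram_mbytes=64, required_maxusers=required):
        print("add to /etc/system: set maxusers=%d" % required)
```

For the example system, the sketch reproduces the 1008 test processes and the maxusers value of 75 given above. The 64 Mbytes of RAM used in the last step is an arbitrary illustration value.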

1.2.7 Loopback Connectors

Certain SunDiag tests require loopback plugs or cables to run successfully. See the individual test descriptions to find out which tests need loopback cables or plugs. Also see Appendix B, "Loopback Connectors" for directions on how to obtain loopback plugs or cables.

1.2.8 Scratch CDs, Tapes, Diskettes

SunDiag requires that "scratch" tapes, diskettes, or optical discs (CDs) be installed in Sun's tape and disk drives for tests to run correctly. Scratch media are spare (usually blank) tapes and diskettes that can be overwritten, as well as optical discs. Insert the scratch media before SunDiag probes the kernel; otherwise, SunDiag will report an error message.
For CD-ROM tests, a test CD with a well-known table of contents must be loaded in the CD drive. It is recommended you use a demonstration CD shipped with the drive. Do not use an operating system distribution disc.
For tape tests, you need 4mm, 8mm, 1/2", or 1/4" scratch tapes (depending on the type of drive being tested). Make sure the tape heads have been properly cleaned.
For hard disk and diskette tests, be sure there is enough space on your disk partition. Double or triple density diskettes (1.4 Mbyte) are required, depending on the diskette drive in your system. An additional megabyte of swap space is needed to run fstest.
Note - Beware of using old or scratched tapes and diskettes; they could cause spurious errors in specific tests.

1.2.9 TTY Terminals

SunDiag is designed to run in a window environment, but may be used from a terminal attached to a serial port. Chapter 3, "The SunDiag TTY Interface" describes the user interface when running in TTY mode.

1.3 Preparing to Start the SunDiag Exerciser

You must have server access (display) permission on the system under test to display the SunDiag window. Typing /usr/openwin/bin/xhost + displayhost will give you display permission to run SunDiag on displayhost. Remember that you must run this command before you become superuser.
You must be logged in as superuser (root) to run the program. The SunDiag software needs to write log and error files to the /var/adm/sundiaglog directory, which should be owned by root.

Note - While the SunDiag software is running, you should not run other programs or software that use the same hardware devices that the program is testing. In particular, when the virtual memory test is running, there may not be enough free memory to run any other programs. However, the Reserve option in the virtual memory (vmem) menu lets you specify the amount of memory to reserve from being tested. This option allows you to free up memory to run another application while running the SunDiag software.

If you log in remotely (rlogin) or log in from a serial port, the SunDiag application will automatically run in TTY mode. See Chapter 3, "The SunDiag TTY Interface."
The syntax for using the sundiag command is shown below. You will find it helpful to read Chapter 3, "The SunDiag TTY Interface" before actually starting the program in TTY mode.

Note - The SunDiag exerciser enables all available tests when it is invoked (except those that require intervention mode). Starting all the enabled tests will slow your system down drastically. Be sure to read the "Hardware and Software Requirements" section in this chapter before running all available tests.

1.4 Starting the SunDiag Exerciser

Read through the argument descriptions listed in Table 1-3 before actually starting the SunDiag exerciser. If you don't need any of those options, just type these basic commands:

  example% xhost + machinename  
   machinename being added to access control list  
  example% su  
  Password: (enter your root password)  
  example# /opt/SUNWdiag/bin/sundiag  
  SunDiag: Starting probing routine, please wait...  
  
  (Next, you may see some error messages if you haven't inserted scratch tapes, disks, or CDs in  
  your system. The SunDiag window then displays on your screen.)  

If you are running OpenWindows software, the SunDiag window will display by default; otherwise, SunDiag will display in TTY mode (See Chapter 3, "The SunDiag TTY Interface" for more details). Table 1-3 shows the full syntax for starting the SunDiag exerciser.
Table 1-3
/opt/SUNWdiag/bin/sundiag [-Cpqtw] [-i number] [-o options_file] [-b batch_file] [-k kernel_name]
Argument          Description
-C                Redirects the console output from any existing console window to the SunDiag console window. If you are using the TTY interface, the console message is displayed in the message line of the status screen.
-p                Tells SunDiag to ignore the kernel probe for devices. Use this when running user-defined tests found in .usertest.
-q                Automatically quits the SunDiag program when testing stops. This option can only be issued from a command line.
-t                Instructs SunDiag to run in TTY mode. See Chapter 3, "The SunDiag TTY Interface," for information on running SunDiag in this mode.
-w                Writes the system hardware configuration to the /var/adm/sundiaglog/sundiag.conf file.
-i number         Specifies the maximum number of instances for scalable tests. This setting overrides the default setting of two times the number of processors on the system under test.
-o options_file   Directs the SunDiag software to use a specific "saved option" file. See the "Option Files Window Button" section in Chapter 2 for directions on creating an options file. If you do not use the -o argument, SunDiag uses the default option file /var/adm/sundiaglog/options/.sundiag, if it exists.
-b batch_file     Enables you to use a batch_file (collection of option files) to specify testing parameters when running the SunDiag software. See the "Using Batch Files" section in this chapter for more details.
-k kernel_name    Specifies a customized kernel name. The default kernel name is /kernel/unix. Since the rstatd that the performance monitor requires is hard-wired to use /vmunix as kernel name, the performance monitor is disabled when the -k option is specified.
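
When SunDiag is started from a script, the syntax in Table 1-3 can be assembled programmatically. The following Python sketch is one possible way to do that; the helper name build_sundiag_argv is hypothetical, and passing each single-letter flag as its own argument is an assumption about what the command accepts.

```python
SUNDIAG = "/opt/SUNWdiag/bin/sundiag"  # default installation path from the text
ALLOWED_FLAGS = set("Cpqtw")           # single-letter flags listed in Table 1-3


def build_sundiag_argv(flags="", instances=None, options_file=None,
                       batch_file=None, kernel_name=None):
    """Return an argument list that follows the Table 1-3 syntax."""
    unknown = set(flags) - ALLOWED_FLAGS
    if unknown:
        raise ValueError("unsupported flag(s): " + "".join(sorted(unknown)))
    argv = [SUNDIAG]
    for flag in flags:                 # one argument per flag, in the order given
        argv.append("-" + flag)
    if instances is not None:
        argv += ["-i", str(instances)]
    if options_file is not None:
        argv += ["-o", options_file]
    if batch_file is not None:
        argv += ["-b", batch_file]
    if kernel_name is not None:
        argv += ["-k", kernel_name]
    return argv


if __name__ == "__main__":
    # TTY mode, quit when testing stops, using a saved option file.
    print(" ".join(build_sundiag_argv("tq", options_file="optfile1")))
```

Building a list instead of a single string avoids quoting problems if an option file name contains spaces.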

1.5 Stopping the SunDiag Exerciser

To exit SunDiag from the OpenWindows interface, click the Stop button at the top of the control panel to stop any tests that are running. Some of the tests, such as the tape tests, may delay before actually stopping, because these tests require time to rewind the tapes.
To exit SunDiag from a terminal or remote login, type q for quit.

1.6 Using Batch Files

Batch files are lists of option files which specify SunDiag tests and their options. To use the SunDiag -b option, you must first create a batch_file in /var/adm/sundiaglog/configs before invoking SunDiag. The batch_file must use the following format:

  #option file runtime delay_before_loading_next_option_file (min.)  
  #----------- ------- ------------------------------------  
  optfile1     60      3  
  optfile2     1020    5  
  optfile1     60      16  
  optfile3     0       0  
  #  

In the example above, optfile1 and optfile2 are created using the Option Files menu button on the SunDiag control panel (See Section 2.1.4, "Control Panel"). These files list the SunDiag tests to be run. They run for the times specified (in minutes) in the runtime column.
Files with a runtime of 0 display the final status of tests that have already run. This feature can be used to give the status of some or all of the option files in the batch file. For instance, the example file above first runs optfile1 and optfile2. Assume that optfile3 is a concatenation of optfile1 and optfile2. Running optfile3 with a runtime of 0 will return the final status of optfile1 and optfile2.
The delay_before_loading_next_option_file field ensures that all tests have been stopped before the next option file is loaded. SunDiag reserves enough time to ensure a smooth transition between tests, even if the delay specified in this field is not long enough.
The settings in the SunDiag Options menu override those in batch files. For example, if the value for Max # of passes is reached before the runtime set in the batch file is over, the test will stop. For this reason, you should set large values on the Set Options menu when using batch files and avoid using the Single Pass values.
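
The batch file format described in this section can be checked before a long run with a short parser. The Python sketch below is an illustration only: it assumes that lines beginning with # are comments (as the example file suggests), that fields are separated by white space, and that both numeric fields are in minutes. The names BatchEntry and parse_batch_file are hypothetical.

```python
from typing import List, NamedTuple


class BatchEntry(NamedTuple):
    option_file: str
    runtime_min: int   # 0 means "report the final status only"
    delay_min: int     # delay before loading the next option file


def parse_batch_file(text: str) -> List[BatchEntry]:
    """Parse batch-file text into entries, skipping blank and # lines."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError("line %d: expected 3 fields, got %d"
                             % (lineno, len(fields)))
        name, runtime, delay = fields
        entries.append(BatchEntry(name, int(runtime), int(delay)))
    return entries


EXAMPLE = """\
#option file runtime delay_before_loading_next_option_file (min.)
#----------- ------- ------------------------------------
optfile1     60      3
optfile2     1020    5
optfile1     60      16
optfile3     0       0
#
"""

if __name__ == "__main__":
    entries = parse_batch_file(EXAMPLE)
    for entry in entries:
        kind = "status only" if entry.runtime_min == 0 else "run"
        print(entry.option_file, kind, entry.runtime_min, entry.delay_min)
    total = sum(e.runtime_min + e.delay_min for e in entries)
    print("minimum scheduled minutes:", total)
```

The total is a lower bound, because SunDiag may extend a delay to ensure a smooth transition between option files.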

Note - The SunDiag schedule option does not work with the batch option.

1.7 Running the SunDiag Exerciser on a Remote System

Use the following commands to run the SunDiag exerciser remotely with the OpenWindows interface. In the cases below, you must be logged in as superuser (root) to run the program. The SunDiag exerciser must be able to write the log and error files to the /var/adm/sundiaglog directory, which should be owned by root.

Note - To get the OpenWindows display on a remote machine, be certain to open the permissions of that window server using the /usr/openwin/bin/xhost + hostname command.

On a Local System
Use one of the following commands to run the SunDiag exerciser with the OpenWindows interface from a local system:

  # ./sundiag                                     If OpenWindows software is running, the SunDiag OpenWindows
                                                   interface displays. Otherwise, SunDiag starts in TTY mode.

  # ./sundiag -display remotehost:0               If OpenWindows software is running on the remote system, the
                                                   SunDiag OpenWindows interface will display on the remote
                                                   system. Otherwise an error message is returned:
                                                   XView Error: Cannot open display on window_server:remotehost:0

From a Remote System
Use one of the following commands to run the SunDiag exerciser with the OpenWindows interface from a remote machine. These cases assume that you have used rlogin to log in as superuser on remotehost, and that you want the SunDiag OpenWindows interface to appear on your local system:

  # ./sundiag                                     SunDiag TTY mode will display on your local
                                                   window/console. See Chapter 3; this is the default mode.

  # ./sundiag -display localhost:0                If OpenWindows software is running on the local system, the
                                                   SunDiag OpenWindows interface displays on the localhost.
                                                   Otherwise, an error message is returned:
                                                   XView Error: Cannot Open display on window_server: localhost:0

  # ./sundiag -display remotehost:0               If OpenWindows is running on the remote system, the SunDiag
                                                   OpenWindows interface displays on the remote server.
                                                   Otherwise, an error message is returned:
                                                   XView Error: Cannot open display on window_server: remotehost:0

1.8 Running the SunDiag Exerciser on a Stand-alone System

Some SunDiag applications require a network connection to run successfully. However, you can run the SunDiag exerciser "stand-alone" (without a network connection) by following these steps.
  1. Edit the /etc/rc2.d/S72inetsvc script file, and comment out the following line:


  # /usr/sbin/ifconfig -au netmask + broadcast +  

  2. Comment out all remote mount file systems from /etc/vfstab (since there will be no network connection).

  3. Make sure ypbind is not running on the stand-alone system. This step will make sure that ypbind is not started by any of the rc scripts.

  4. Reboot your system.

    The SunDiag software will now run properly on stand-alone machines.

1.9 Adding Your Own Tests in .usertest

You can add your own tests and have them appear as options on the SunDiag interface. Any file that can be run from a UNIX(R) shell can be used as a test file (including shell scripts).
To run your test through the SunDiag interface, you need to put the test file in the /opt/SUNWdiag/bin directory, and create a new /opt/SUNWdiag/bin/.usertest file using the syntax below.

1.9.1 Setting up a .usertest file

Use the following syntax to set up a .usertest file:

  device_name_label, testname, test_specific_arguments, SCA  

Argument                 Description
device_name_label        The device name to be displayed for your test on the control panel.
testname                 The actual name of the file that contains your test (for example, vmem for the virtual memory test). It is used as the test name on the status panel.
test_specific_arguments  The optional test-specific command-line arguments you use to execute your test. Since commas are used to delimit arguments in a .usertest line, use spaces to delimit the test-specific arguments instead. See the individual test descriptions in "SunDiag Tests" for the test-specific arguments.
SCA                      This optional flag specifies that the test will be scalable. See "Scalable Tests" on page 4-1 for a list of scalable tests. Not all tests are designed to be scalable.
The options you set in the .usertest file will become the default options for each test. You can change these options using the pop-up option menus on the SunDiag window interface, but the Reset button will return the options to the default settings specified in the .usertest file.
The following is an example of a .usertest file. SunDiag will ignore lines that are commented out (lines that begin with #).

  # @(#).usertest Rev   MM/DD/YY  
  
  SPC/S Internal Test,newtest,D=any T=1  
  SPC/S 25-pin LB on ttyz00,newtest,D=/dev/ttyz00 T=8  
  #SPC/S 25-pin LB on ttyz01,newtest,D=/dev/ttyz01 T=8  
  #SPC/S 25-pin LB on ttyz02,newtest,D=/dev/ttyz02 T=8  
  #SPC/S 25-pin LB on ttyz03,newtest,D=/dev/ttyz03 T=8  
  #SPC/S 25-pin LB on ttyz04,newtest,D=/dev/ttyz04 T=8  
  #SPC/S 25-pin LB on ttyz05,newtest,D=/dev/ttyz05 T=8  
  #SPC/S 25-pin LB on ttyz06,newtest,D=/dev/ttyz06 T=8  
  #SPC/S 25-pin LB on ttyz07,newtest,D=/dev/ttyz07 T=8  
  SPC/S Echo TTY on ttyz00,newtest,D=/dev/ttyz00 T=16  
  SPC/S 96-pin LB on board 1,newtest,D=sb1 T=4  
  SPC/S 96-pin LB on board 2,newtest,D=sb2 T=4  
  SPC/S 96-pin LB on board 3,newtest,D=sb3 T=4  

Figure 1-1 Example of a .usertest File
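
A .usertest line can be validated before SunDiag reads it. The following Python sketch splits a line according to the syntax in Section 1.9.1; it is not SunDiag's own parser, and the names UserTest and parse_usertest_line are hypothetical. It assumes that a final field of exactly SCA marks a scalable test.

```python
from typing import List, NamedTuple, Optional


class UserTest(NamedTuple):
    label: str        # device_name_label shown on the control panel
    testname: str     # name of the executable test file
    args: List[str]   # test-specific arguments, space-delimited in the file
    scalable: bool    # True when the optional SCA flag is present


def parse_usertest_line(line: str) -> Optional[UserTest]:
    """Parse one .usertest line; return None for blank or commented lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = [field.strip() for field in line.split(",")]
    if len(fields) < 2 or len(fields) > 4:
        raise ValueError("expected 2 to 4 comma-separated fields: %r" % line)
    label, testname = fields[0], fields[1]
    rest = fields[2:]
    scalable = bool(rest) and rest[-1] == "SCA"
    if scalable:
        rest = rest[:-1]
    if len(rest) > 1:
        raise ValueError("unexpected extra field: %r" % line)
    args = rest[0].split() if rest else []
    return UserTest(label, testname, args, scalable)


if __name__ == "__main__":
    sample = "SPC/S 25-pin LB on ttyz00,newtest,D=/dev/ttyz00 T=8"
    print(parse_usertest_line(sample))
    print(parse_usertest_line("#SPC/S 25-pin LB on ttyz01,newtest,D=/dev/ttyz01 T=8"))
```

Commented lines return None, which matches the way SunDiag ignores lines that begin with #.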

1.9.2 Test Writing Precautions

When writing tests for the SunDiag program, you should note the following precautions: (For more thorough instructions on developing your own tests, see Appendix A, "Developing Your Own Tests.")
  • The file descriptor, stdin, will be closed, and stdout and stderr will be redirected to /dev/null.

    Therefore, there will be no direct interaction between the user and the test. Tests should be written to use other methods of logging messages (for example, you can program your tests to write to output files).

  • When the test exits with a status code of 0, SunDiag will count this exit as a pass.

    Status codes of 1 through 90 indicate errors and increment the error count. All other status codes are reserved by SunDiag and should not be used.

  • If the test requires a large amount of memory to run (greater than 1 MByte), you should use the reserve option of the vmem test to set aside extra memory for the test.
See Section 6.2, "Virtual Memory Test (vmem)" for instructions.
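
The precautions above can be illustrated with a small test skeleton. Any executable can serve as a test; Python is used here only for illustration. The log file location, the argument check, and all function names are hypothetical. The sketch writes its messages to a file because stdout and stderr are discarded, and it keeps its exit status within 0 through 90.

```python
import os
import sys
import tempfile
import time

# Hypothetical log location; stdout and stderr go to /dev/null under SunDiag.
LOG_PATH = os.path.join(tempfile.gettempdir(), "newtest.log")
MAX_ERROR_STATUS = 90  # statuses above 90 are reserved by SunDiag


def log(message, path=LOG_PATH):
    """Append a timestamped message to the test's own log file."""
    with open(path, "a") as handle:
        handle.write("%s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), message))


def exit_status(error_count):
    """Map an error count to an exit status: 0 is a pass, 1 to 90 an error."""
    if error_count <= 0:
        return 0
    return min(error_count, MAX_ERROR_STATUS)


def run_checks(args):
    """Placeholder for real device checks; returns the number of errors.

    Here it only verifies that each argument has the NAME=value form used
    in the .usertest example (D=/dev/ttyz00 T=8).
    """
    errors = 0
    for arg in args:
        if "=" not in arg:
            log("malformed argument: %r" % arg)
            errors += 1
    return errors


def main(argv):
    errors = run_checks(argv)
    log("finished with %d error(s)" % errors)
    return exit_status(errors)

# Entry point when installed as a test file: sys.exit(main(sys.argv[1:]))
```

Calling sys.exit(main(sys.argv[1:])) as the last line of the file turns the sketch into a runnable test whose exit status SunDiag can count as a pass or an error.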

1.10 SunDiag Exit Status Codes

SunDiag will exit with the following status codes:
Table 1-4
Code  Description
0     The SunDiag kernel exited normally and the tests that were run (if any) all completed successfully.
-1    The SunDiag kernel exited due to an abnormal condition in the execution of its kernel.
1     The SunDiag kernel exited normally, but one or more of the tests that were run failed.
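
A wrapper script can act on these status codes, for example when SunDiag is started with the -q option. The Python sketch below shows one way to interpret them. The function names are hypothetical. Because a UNIX exit status is reported to the parent process as a value from 0 through 255, an exit code of -1 normally appears as 255, so the sketch accepts both values.

```python
import subprocess

SUNDIAG = "/opt/SUNWdiag/bin/sundiag"  # default installation path from the text


def describe_exit_status(code):
    """Translate a SunDiag exit status into a short description."""
    if code == 0:
        return "kernel exited normally; all tests that ran passed"
    if code == 1:
        return "kernel exited normally; one or more tests failed"
    if code in (-1, 255):  # -1 is seen as 255 by the parent process
        return "kernel exited due to an abnormal condition"
    return "unexpected exit status %d" % code


def run_sundiag(extra_args=()):
    """Run sundiag and return (status, description).

    Requires root privileges and an installed SunDiag, so it is defined
    here but not called.
    """
    result = subprocess.run([SUNDIAG] + list(extra_args))
    if result.returncode < 0:
        # A negative returncode from subprocess means death by signal.
        return result.returncode, "terminated by signal %d" % -result.returncode
    return result.returncode, describe_exit_status(result.returncode)


if __name__ == "__main__":
    for status in (0, 1, 255):
        print(status, describe_exit_status(status))
```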