SMCC NFS Server Performance and Tuning Guide

Configuring the Server and the Client to Maximize NFS Performance

4

This chapter provides configuration recommendations to maximize NFS performance. For troubleshooting tips see Chapter 5, "Troubleshooting."

Tuning for NFS Performance Improvement

Check these items when tuning the system:
  • Networks
  • Disk drives
  • Central processor units
  • Memory
  • Number of NFS threads in /etc/init.d/nfs.server
  • /etc/system to modify kernel variables
Once you have profiled the performance capabilities of your server, begin tuning the system. Tuning an NFS server requires a basic understanding of how networks, disk drives, CPUs, and memory affect performance. To tune the system, determine which parameters to adjust to improve balance.
There are three steps to monitor and tune the server performance:
  1. Collect statistics.

    See Chapter 3, "Analyzing NFS Performance."

  2. Identify a constraint or overutilized resource and reconfigure around it. Refer to this chapter and Chapter 3, "Analyzing NFS Performance" for tuning recommendations.

  3. Measure the performance gain over a long evaluation period.

This chapter discusses tuning recommendations for these environments:
  • Attribute-intensive environments, which are applications or environments where primarily small files (one to two hundred bytes) are accessed. Software development is an example of an attribute-intensive environment.
  • Data-intensive environments, which are applications or environments where primarily large files are accessed. A large file can be defined as a file that takes one or more seconds to transfer (roughly 1 Mbyte). CAD and CAE are examples of data-intensive environments.

Balancing NFS Server Workload

All NFS processing takes place inside the operating system kernel at a higher priority than user-level tasks. When the NFS load is high, any additional tasks performed by an NFS server will run slowly. For this reason, do not combine databases or time-shared loads on an NFS server.
Non-interactive workloads such as mail delivery and printing (excluding the SPARCprinter or other Sun printers based on the NeWSprint(TM) software) are good candidates for using the server for dual purpose (such as NFS and other tasks). If you have spare CPU power and a light NFS load, then interactive work will run normally.

Networking Requirements

Make sure that network traffic is well balanced across all client networks and that networks are not overloaded. If one client network is excessively loaded, watch the NFS traffic on that segment. Identify the hosts that are making the largest demands on the servers. Partition the work load, or move clients from one segment to another.
Simply adding disks to a system does not improve its NFS performance unless the system is truly disk I/O-bound. The network itself is likely to be the constraint as the file server grows, requiring adding more network interfaces to keep the system in balance.
Instead of attempting to move more data blocks over a single network, consider characterizing the amount of data consumed by a typical client and balance the NFS reads and writes over multiple networks.

Data-Intensive Applications

Data-intensive applications demand relatively few networks. However, the networks must be of high-bandwidth (such as FDDI, CDDI, SunATM(TM), or SunFastEthernet(TM)). For more information on FDDI, see the FDDI/S3.0 User's Guide.
If your configuration has either of the following characteristics, then your applications require high-speed networking:
  • Your clients require aggregate data rates of more than 1 Mbyte per second.
  • More than one client must be able to simultaneously consume 1 Mbyte per second of network bandwidth.

Configuring the Network

Follow these suggestions to configure your network when your server's primary application is data-intensive.
  • Configure FDDI or another high-speed network.

    If fiber cabling can't be used for logistical reasons, consider FDDI, CDDI, or SunFastEthernet over twisted-pair implementations. SunATM uses the same size fiber cabling as FDDI.

  • Configure one FDDI ring for each five to seven concurrent fully NFS-active clients.

    Few data-intensive applications make continuous NFS demands. In typical data-intensive EDA and earth-resources applications, this results in 25-40 clients per ring.

    A typical use consists of loading a big block of data, manipulating it, and then writing the data back. These environments can have very high write percentages because of the write-back step.

  • If your installation has Ethernet cabling, configure one Ethernet for every two active clients.

    This almost always requires using a SPARCserver 1000, SPARCserver 1000E, SPARCcenter 2000, SPARCcenter 2000E system, or an Ultra Enterprise 3000, 4000, 5000, or 6000 system since useful communities require many networks. Configure a maximum of four to six clients per network.
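
As a rough illustration of the sizing rules above, the following Python sketch estimates how many networks a data-intensive server needs. The function name and the choice of the conservative end of the FDDI range (five clients per ring) are assumptions made for this example, not part of the guide.

```python
import math

# Fully active clients per network for data-intensive loads, taken from the
# guidelines above. Using the low end of "five to seven" is an assumption.
CLIENTS_PER_NETWORK = {
    "fddi": 5,       # one FDDI ring for each five to seven active clients
    "ethernet": 2,   # one Ethernet for every two active clients
}

def data_intensive_networks(active_clients, medium):
    """Return the number of networks needed for the given active clients."""
    return math.ceil(active_clients / CLIENTS_PER_NETWORK[medium])

print(data_intensive_networks(12, "fddi"))      # 3
print(data_intensive_networks(12, "ethernet"))  # 6
```

The Ethernet figure shows why the guide steers data-intensive sites toward FDDI: the same twelve clients need twice as many networks.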

Attribute-Intensive Applications

In contrast, most attribute-intensive applications are easily handled with less expensive networks. But, attribute-intensive applications will need many networks. Use lower-speed networking media, such as Ethernet or Token Ring.
· To configure networking when your server's primary application is attribute-intensive
  • Configure on Ethernet or Token Ring.
  • Configure one Ethernet network for eight to ten fully active clients.

    More than 20 to 25 clients per Ethernet results in severe degradation when many clients are active. As a check, an Ethernet is able to sustain about 250-300 NFS ops/second on the SPECnfs_097 (LADDIS) benchmark, albeit at high collision rates. It is unwise to exceed 200 NFS ops/second on a sustained basis.

  • Configure one Token Ring network for each ten to fifteen active clients.

    If necessary, 50 to 80 total clients per network are feasible on Token Ring networks, due to their superior degradation characteristics under heavy load (compared to Ethernet).
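
The attribute-intensive guidelines can be sketched the same way. This Python example is only an illustration: the function names are hypothetical, and it assumes the conservative end of each range (eight clients per Ethernet, ten per Token Ring) together with the 200 NFS ops/second sustained ceiling quoted above.

```python
import math

# Fully active clients per network; the low end of each range is an assumption.
CLIENTS_PER_NETWORK = {
    "ethernet": 8,     # one Ethernet for eight to ten fully active clients
    "token_ring": 10,  # one Token Ring for each ten to fifteen active clients
}
MAX_SUSTAINED_OPS_PER_ETHERNET = 200  # NFS ops/second, from the text above

def attribute_intensive_networks(active_clients, medium="ethernet"):
    """Return the number of networks needed for the given active clients."""
    return math.ceil(active_clients / CLIENTS_PER_NETWORK[medium])

def ethernet_overloaded(nfs_ops_per_second):
    """True if a single Ethernet exceeds the sustained NFS ops/second ceiling."""
    return nfs_ops_per_second > MAX_SUSTAINED_OPS_PER_ETHERNET

print(attribute_intensive_networks(30))                # 4
print(attribute_intensive_networks(30, "token_ring"))  # 3
print(ethernet_overloaded(250))                        # True
```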

Systems with More Than One Class of Users

Mixing network types is not unreasonable. For example, both FDDI and Token Ring are appropriate for a server that supports both a document imaging application (data-intensive) and a group of PCs running a financial analysis application (most likely attribute-intensive).
The platform you choose is often dictated by the type and number of networks, as they may require many network interface cards.
To configure networking for servers that have more than one class of users, mix network types.

Disk Drives

Disk drive usage is frequently the tightest constraint in an NFS server. Even a sufficiently large memory configuration may not improve performance if the cache cannot be filled quickly enough from the file systems. Use iostat to determine disk usage. Look at the number of read and write operations per second (see "Checking the NFS Server" in Chapter 3).
Because there is little dependence in the stream of NFS requests, the disk activity generated contains large numbers of random access disk operations. The maximum number of random I/O operations per second ranges from 40-90 per disk.
Driving a single disk at more than 60 percent of its random I/O capacity creates a disk bottleneck.
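
The 60 percent rule is easy to apply to iostat readings. The sketch below is a minimal Python illustration; the function name is hypothetical, and the per-disk maximum must be supplied by you (the guide gives a range of 40 to 90 random operations per second).

```python
BOTTLENECK_PERCENT = 60  # from the rule above

def is_disk_bottleneck(observed_ops_per_sec, max_random_ops_per_sec):
    """True if a disk runs above 60 percent of its random I/O capacity."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return observed_ops_per_sec * 100 > BOTTLENECK_PERCENT * max_random_ops_per_sec

# A disk assumed to sustain at most 60 random operations per second:
print(is_disk_bottleneck(40, 60))  # True  (40 is above 60% of 60, which is 36)
print(is_disk_bottleneck(30, 60))  # False
```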

Limiting Disk Bottlenecks

Disk bandwidth on an NFS server has the greatest effect on NFS client performance. Providing sufficient bandwidth and memory for file system caching is crucial to providing the best possible file server performance. Note that read/write latency is also important. For example, each NFSop may involve one or more disk accesses. Disk service times add to the NFSop latency, so slow disks mean a slow NFS server.
Follow these suggestions to ease the disk bottleneck.
  • Balance the I/O load across all disks on the system.

    If one disk is heavily loaded and others are operating at the low end of their capacity, shuffle directories or frequently accessed files to less busy disks.

  • Partition the file system(s) on the heavily used disk and spread the file system(s) over several disks.

    Adding disks provides additional disk capacity and disk I/O bandwidth.

  • Replicate the file system to provide more network-to-disk bandwidth for the clients if the file system used is read-only by the NFS clients, and contains data that doesn't change constantly.

    See the following section "Replicating File Systems."

  • Size the operating system caches correctly, so that frequently needed file system data may be found in memory.

    Caches for inodes (file information nodes), file system metadata such as cylinder group information, and name-to-inode translations must be sufficiently large, or additional disk traffic is created on cache misses. For example, if an NFS client opens a file, that operation generates several name-to-inode translations on the NFS server.

    If an operation misses the Directory Name Lookup Cache (DNLC), the server must search the disk-based directory entries to locate the appropriate entry name. What would nominally be a memory-based operation degrades into several disk operations. Also, cached pages will not be associated with the file.

Replicating File Systems

Commonly used file systems, such as the following, are frequently the most heavily used file systems on an NFS server:
  • /usr directory for diskless clients
  • Local tools and libraries
  • Third-party packages
  • Read-only source code archives
The best way to improve performance for these file systems is to replicate them. One NFS server is limited by disk bandwidth when handling requests for only one file system. Replicating the data increases the size of the aggregate "pipe" from NFS clients to the data.
Replication is not a viable strategy for improving performance with writable data, such as a file system of home directories. Use replication with read-only data.
Follow these suggestions to replicate file systems.
  • Identify the file or file systems to be replicated.

    If several individual files are candidates, consider merging them in a single file system. The potential decrease in performance that arises from combining heavily used files on one disk is more than offset by performance gains through replication.

  • Use nfswatch to identify the most commonly used files and file systems in a group of NFS servers.

    Table A-1 in the Appendix lists performance monitoring tools, including nfswatch, and explains how to obtain nfswatch.

  • Determine how clients will choose a replica.

    Specify a server name in the /etc/vfstab file to create a permanent binding from NFS client to the server. Alternatively, listing all server names in an automounter map entry allows completely dynamic binding, but may also lead to a client imbalance on some NFS servers. Enforcing "workgroup" partitions, in which groups of clients have their own replicated NFS server, strikes a middle ground between the extremes and often provides the most predictable performance.

  • Choose an update schedule and mechanism for distributing the new data.

    The frequency of change of the read-only data determines the schedule and the mechanism for distributing the new data. File systems that undergo a complete change in contents, for example, a flat file with historical data that is updated monthly, may be best handled by copying data from the distribution media on each machine, or using a combination of ufsdump and restore. File systems with few changes can be handled using management tools such as rdist.

    Evaluate what penalties, if any, are involved if users access stale data on a replica that is out of date. One possible way of doing this is with the Solaris 2.x JumpStart(TM) facilities in combination with cron.

Adding the Cache File System

Adding the cache file system to client mounts provides a local replica for each client. Unlike a replicated server, the cache file system can be used with writable file systems, but performance degrades as the percentage of writes climbs. If the percentage of writes is too high, the cache file system may decrease NFS performance. The /etc/vfstab entry for the cache file system looks like this:

  #device           device    mount      FS       fsck  mount    mount
  #to mount         to fsck   point      type     pass  at boot  options
  server:/usr/dist  /cache    /usr/dist  cachefs  3     yes      ro,backfstype=nfs,cachedir=/cache

Configuring Rules for Disk Drives

Follow these general guidelines for configuring disk drives. More specific guidelines for data-intensive and attribute-intensive environments follow.
  • Configure additional drives on each host adapter without degrading performance (as long as the number of active drives does not exceed SCSI standard guidelines).
Keep these rules in mind when configuring disk drives in data-intensive and attribute-intensive environments.
To configure disk drives in data-intensive environments:
  • Configure no more than four to five fully active 2.9 Gbyte disks per fast/wide SCSI host adapter.
  • Configure at least three disk drives for every active client on FDDI.
  • Configure one drive for every client on Ethernet or Token Ring.
To configure disk drives in attribute-intensive environments:
  • Configure no more than four to five fully active 1.05 Gbyte or 535 Mbyte disks per fast SCSI host adapter.
  • Configure at least one disk drive for every two fully active clients (on any type of network).
  • Configure no more than six to seven 2.9 Gbyte disk drives for each fast/wide SCSI host adapter.
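
The drive-count rules above reduce to simple arithmetic. The following Python sketch is an illustration only: the function names are hypothetical, and the default of four fully active drives per host adapter is the conservative end of the "four to five" range, chosen here as an assumption.

```python
import math

def data_intensive_drives(active_clients, network):
    """Minimum drives: three per active client on FDDI, one per client otherwise."""
    per_client = 3 if network == "fddi" else 1
    return active_clients * per_client

def attribute_intensive_drives(fully_active_clients):
    """Minimum drives: one for every two fully active clients."""
    return math.ceil(fully_active_clients / 2)

def host_adapters(fully_active_drives, drives_per_adapter=4):
    """SCSI host adapters needed at the given number of drives per adapter."""
    return math.ceil(fully_active_drives / drives_per_adapter)

drives = data_intensive_drives(6, "fddi")
print(drives, host_adapters(drives))   # 18 5
drives = attribute_intensive_drives(15)
print(drives, host_adapters(drives))   # 8 2
```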

Using Solstice DiskSuite or Online: DiskSuite to Spread Disk Access Load

A common problem in NFS servers is poor load balancing across disk drives and disk controllers. Balance loads by physical usage instead of logical usage. Use Solstice DiskSuite or Online: DiskSuite to spread disk access across disk drives transparently by using its striping and mirroring functions. Disk concatenation accomplishes a minimum amount of load balancing as well but only when disks are relatively full.
If your environment is data-intensive, stripe with a small interlace to improve disk throughput and distribute the service load. Disk striping improves read and write performance for serial applications. As a starting point for the interlace size, use 64 Kbytes divided by the number of disks in the stripe.
If your environment is attribute-intensive, where random access dominates disk usage, the default interlace (one disk cylinder) is most appropriate.
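
For the data-intensive starting point above, the arithmetic can be sketched in Python as follows. This assumes the rule means 64 Kbytes divided by the number of disks in the stripe; the function name is hypothetical, and the result is only a starting value to refine with iostat.

```python
def starting_interlace_kbytes(disks_in_stripe):
    """Starting interlace size in Kbytes: 64 Kbytes spread over the stripe."""
    if disks_in_stripe < 1:
        raise ValueError("a stripe needs at least one disk")
    return 64 / disks_in_stripe

print(starting_interlace_kbytes(4))  # 16.0
print(starting_interlace_kbytes(8))  # 8.0
```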
The disk mirroring feature of Solstice DiskSuite or Online: DiskSuite improves disk access time and reduces disk usage by providing access to two or three copies of the same data. This is particularly true in environments dominated by read operations. Write operations are normally slower on a mirrored disk since two or three writes must be accomplished for each logical operation requested.
Use the iostat and sar commands to report disk drive usage. Attaining even disk usage usually requires some iterations of monitoring and data reorganization. In addition, usage patterns change over time. A data layout that works when installed may perform poorly a year later. For more information on checking disk drive usage, see the section "Checking the NFS Server" in Chapter 3.

Using Log-Based File Systems with Solstice DiskSuite or Online: DiskSuite 3.0

The Solaris 2.4 and later software environments and the Online: DiskSuite 3.0 software support a log-based extension to the standard UNIX file system, which behaves like a disk-based Prestoserve NFS accelerator.
In addition to the main file system disk, a small (typically 10 Mbytes) section of disk is used as a sequential log for writes. This speeds up the same kind of operations as a Prestoserve NFS accelerator with two advantages:
  • In dual-machine high-availability configurations, the Prestoserve NFS accelerator cannot be used. The log, however, can be shared between the machines, so it can be used in these configurations.
  • After an operating system crash, the fsck of the log-based file system involves a sequential read of the log only. The sequential read of the log is almost instantaneous, even on very large file systems.

Note - You cannot use the Prestoserve NFS accelerator and the log on the same file system.

Using the Optimum Zones of the Disk

When you analyze your disk data layout, consider zone bit recording.
All of Sun's current disks (except the 207 Mbyte disk) use this type of encoding, which uses the peculiar geometric properties of a spinning disk to pack more data into the parts of the platter closest to its edge. As a result, the lower disk addresses (corresponding to the outside cylinders) usually outperform the inside addresses by 50 percent. Put the data in the lowest-numbered cylinders, since the zone bit recording data layout makes those cylinders the fastest ones.
This margin is most often realized in serial transfer performance, but also affects random access I/O. Data on the outside cylinders (zero) not only moves past the read/write heads more quickly, but the cylinders are also larger. Data will be spread over fewer large cylinders, resulting in fewer and shorter seeks.

Central Processor Units

This section explains how to determine CPU usage and provides guidelines for configuring CPUs in NFS servers.
· To determine CPU usage
* Type mpstat 30 at the % prompt to get 30 second averages.

  zardoz-12% mpstat 30  
  CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl  
    0    6   0    0   114   14   25    0    6    3    0    48    1   2  25  72  
    1    6   0    0    86   85   50    0    6    3    0    66    1   4  24  71  
    2    7   0    0    42   42   31    0    6    3    0    54    1   3  24  72  
    3    8   0    0     0    0   33    0    6    4    0    54    1   3  24  72  

The mpstat 30 command reports statistics per processor. Each row of the table represents the activity of one processor. The first table summarizes all activities since the system was last booted. Each subsequent table summarizes activity for the preceding interval. All values are rates (events per second).
Look at the following data in the mpstat output (see Table 4-1 on page 70):
  • usr Percent user time
  • sys Percent system time (can be caused by NFS processing)
  • wt Percent wait time (treat as idle time)
  • idl Percent idle time
If sys is greater than 50 percent, there is a lot of NFS processing going on. Add CPU power to improve NFS performance.
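
The check can be automated against saved mpstat output. The Python sketch below parses a single table in the column layout shown above and lists the processors whose sys value exceeds 50 percent. The function names are hypothetical, applying the threshold per processor is an assumption, and real mpstat output repeats the header for every interval, which this minimal parser does not handle.

```python
SAMPLE = """\
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    6   0    0   114   14   25    0    6    3    0    48    1   2  25  72
  1    6   0    0    86   85   50    0    6    3    0    66    1   4  24  71
  2    7   0    0    42   42   31    0    6    3    0    54    1   3  24  72
  3    8   0    0     0    0   33    0    6    4    0    54    1   3  24  72
"""

def parse_mpstat(text):
    """Parse one mpstat table into a list of {column name: integer} rows."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(int, row))) for row in rows]

def busy_cpus(rows, sys_threshold=50):
    """Return the CPU numbers whose system time exceeds the threshold."""
    return [row["CPU"] for row in rows if row["sys"] > sys_threshold]

rows = parse_mpstat(SAMPLE)
print([row["sys"] for row in rows])  # [2, 4, 3, 3]
print(busy_cpus(rows))               # []
```

In the sample, every processor spends only a few percent of its time in system mode, so no additional CPU power is indicated.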
Table 4-1 describes guidelines for configuring CPUs in NFS servers.
Table 4-1

  If:   Your environment is predominantly attribute-intensive, and you have 1 to 3 medium-speed Ethernet or Token Ring networks.
  Then: A uniprocessor system is sufficient. For smaller systems, the UltraServer 1, SPARCserver 5, or SPARCserver 2 systems have sufficient processor power.

  If:   Your environment is predominantly attribute-intensive and you have between 4 to 60 medium-speed Ethernet or Token Ring networks.
  Then: Use an UltraServer 2, SPARCserver 10, or SPARCserver 20 system.

  If:   You have larger attribute-intensive environments and SBus and disk expansion capacity is sufficient.
  Then: Use multiprocessor models of the UltraServer 2, SPARCserver 10 or the SPARCserver 20 systems.

  If:   You have larger attribute-intensive environments.
  Then: Use dual-processor systems such as:
        - SPARCserver 10 system Model 512
        - SPARCserver 20 system
        - SPARCserver 1000 or 1000E system
        - Ultra Enterprise 3000, 4000, 5000, or 6000 system
        - SPARCcenter 2000/2000E system
        Either the 40 MHz/1 Mbyte or the 50 MHz/2 Mbyte module works well for an NFS work load in the SPARCcenter 2000 system. You will get better performance from the 50 MHz/2 Mbyte module.

  If:   Your environment is data-intensive and you have a high-speed network.
  Then: Configure 1 SuperSPARC processor per high-speed network (such as FDDI).

  If:   Your environment is data-intensive and you must use Ethernet due to cabling restrictions.
  Then: Configure 1 SuperSPARC processor for every 4 Ethernet or Token Ring networks.

  If:   Your environment is a pure NFS installation.
  Then: You do not need to configure additional processors, beyond the recommended number, on your server(s).

  If:   Your servers perform tasks in addition to NFS processing.
  Then: Add additional processors to increase performance significantly.

Memory

Since NFS is a disk I/O-intensive service, a slow server can suffer from I/O bottlenecks. Adding memory eliminates the I/O bottleneck by increasing the file system cache size.
The system could be waiting for file system pages, or it may be paging process images to and from the swap device. The latter effect is only a problem if additional services are provided by the system, since NFS service runs entirely in the operating system kernel.
If the swap device is not showing any I/O activity, then all paging is due to file I/O operations from NFS reads, writes, attributes, and lookups.

Determining if an NFS Server Is Memory Bound

Paging filesystem data from the disk into memory is a more common NFS server performance problem. Follow these suggestions to determine if the server is memory bound:
  • Watch the scan rate reported by vmstat 30.

    If the scan rate (sr, the number of pages scanned) is often over 200 pages/second, then the system is short of memory (RAM). The system is trying to find unused pages to be reused and may be reusing pages that should be cached for rereading by NFS clients.

  • Add memory.
Adding memory eliminates repeated reads of the same data and allows the NFS requests to be satisfied out of the page cache of the server. To calculate the memory required for your NFS server, see the section, "Calculating Memory," which follows.
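
The scan-rate test can be sketched in Python as follows. The guide says only that the rate is "often" over 200 pages/second; treating "often" as more than half of the collected samples is an assumption made for this example, as are the function and parameter names.

```python
def is_memory_bound(scan_rates, limit=200, often_fraction=0.5):
    """True if the vmstat sr value exceeds the limit in most samples."""
    if not scan_rates:
        raise ValueError("at least one scan-rate sample is required")
    over = sum(1 for rate in scan_rates if rate > limit)
    return over / len(scan_rates) > often_fraction

# sr values collected from successive vmstat 30 intervals (made-up numbers):
print(is_memory_bound([250, 310, 180, 420]))  # True
print(is_memory_bound([0, 12, 230, 5]))       # False
```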
The memory capacity required for optimal performance depends on the average working set size of files used on that server. The memory acts as a cache for recently read files. The most efficient cache matches the current working set size as closely as possible.
Because of this memory caching feature, it is not unusual for free memory in NFS servers to be in the 3-4 Mbyte range (0.5 Mbytes to 1.0 Mbytes for the Solaris 2.4 and later software environments) if the server has been active for a long time. Such activity is normal and desirable. Having enough memory allows you to service multiple requests without blocking.
The actual files in the working set may change over time. However, the size of the working set may remain relatively constant. NFS creates a sliding window of active files, with many files entering and leaving the working set throughout a typical monitoring period.

Calculating Memory

You can calculate memory according to general or specific memory rules. The following sections contain suggestions for calculating memory.

General Memory Rules

  • Virtual memory = RAM (main memory) + Swap space
  • Calculate the amount of memory according to the five-minute rule: Memory is sized at 16 Mbytes plus memory to cache the data, which will be accessed more often than once in five minutes.

Note - These general memory rules are different from the SunOS 4.x operating system where virtual memory = swap space. Swap space must always be greater than RAM with the SunOS 4.x operating system.

Specific Memory Rules

  • If your server primarily provides user data for many clients, configure relatively minimal memory.

    For small installations, this will be 32 Mbytes; for large installations, this will be about 128 Mbytes. In multiprocessor configurations, provide at least 64 Mbytes per processor. Attribute-intensive applications normally benefit slightly more from memory than data-intensive applications.

  • If your server normally provides temporary file space for applications that use those files heavily, configure your server memory to about 75 percent of the size of the active temporary files in use on the server.
For example, if each client's temporary file is about 5 Mbytes, and the server is expected to handle 20 fully active clients, configure it as follows:
(20 clients x 5 Mbytes)/75% = 133 Mbytes of memory
Note that 128 Mbytes is the most appropriate amount of memory that is easily configured.
  • If your server's primary task is to provide only executable images, configure server memory to be approximately equal to the combined size of the heavily used binary files (including libraries).

    For example, a server expected to provide /usr/openwin should have enough memory to cache the X server, CommandTool, libX11.so, libview.so and libXt. This NFS application is considerably different from the more typical /home, /src or /data server in that it normally provides the same files repeatedly to all of its clients; hence is able to effectively cache this data. Clients will not use every page of all of the binaries, which is why it is reasonable to configure only enough to hold the frequently-used programs and libraries. Use the cache file system on the client, if possible, to reduce the load and RAM needs on the server.

  • If the clients are DOS PCs or Macintosh machines, add more RAM cache on the Sun NFS server. These systems do much less caching than UNIX system clients.

Swap Space

Configure at least 64 Mbytes virtual memory (RAM plus swap space). For example:
  16 Mbytes RAM           48 Mbytes swap space
  32 Mbytes RAM           32 Mbytes swap space
  64 or more Mbytes RAM   No swap space
Configure enough swap space to allow the applications to run properly. For more information, see the manual, File System Administration.
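
The three examples above follow directly from the 64 Mbyte virtual memory minimum. A minimal Python sketch, with a hypothetical function name, is:

```python
MIN_VIRTUAL_MEMORY_MBYTES = 64  # RAM plus swap space, from the rule above

def minimum_swap_mbytes(ram_mbytes):
    """Swap space needed to reach the 64 Mbyte virtual memory minimum."""
    return max(0, MIN_VIRTUAL_MEMORY_MBYTES - ram_mbytes)

for ram in (16, 32, 64, 128):
    print(ram, minimum_swap_mbytes(ram))
# 16 48
# 32 32
# 64 0
# 128 0
```

This is only the floor; applications running on the server may need more swap space than the minimum.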

Prestoserve NFS Accelerator

Adding a Prestoserve(TM) NFS accelerator is yet another way to improve NFS performance. The NFS version 2 protocol requires that all writes be written to stable storage before the server replies to the operation. The Prestoserve NFS accelerator allows high-speed NVRAM instead of slow disks to satisfy the stable storage requirement. The two types of NVRAM hardware used by the Prestoserve NFS accelerator are the NVRAM SBus board and the NVRAM-NVSIMM.

NVRAM-NVSIMM

In cases where you can use either NVRAM hardware, use the NVRAM-NVSIMM as the Prestoserve cache. The NVRAM-NVSIMM and SBus hardware are functionally identical. However, the NVRAM-NVSIMM hardware is slightly more efficient and doesn't use up one of your SBus slots. The NVRAM-NVSIMMs reside in memory and the NVRAM-NVSIMM cache is larger than the SBus hardware.

NVRAM SBus

The NVRAM SBus board contains only a 1 Mbyte cache. The NVRAM-NVSIMM SIMM module contains 2 Mbytes of cache and is supported by the SPARCcenter 2000, SPARCcenter 2000E, SPARCserver 1000, SPARCserver 1000E, SPARCstation 20, and SPARCstation 10 systems. The SPARCserver 1000 and the SPARCcenter 2000 support up to 8 Mbytes and 16 Mbytes of NVRAM, respectively.
For the Ultra Enterprise 3000, 4000, 5000, and 6000 systems, instead of upgrading NVSIMM SIMM modules, upgrade the NVRAM in the SPARCstorage Arrays connected to the server as an alternate method to improve NFS performance. Upgrading the NVRAM in a SPARCstorage Array performs in a comparable manner to upgrading the NVSIMM SIMM modules.
NFS servers can significantly increase the performance of NFS write operations by using nonvolatile memory (NVRAM), which is typically much faster than local disk writes. The Prestoserve NFS accelerators take advantage of the fact that the NFS version 2 protocol requires synchronous writes to reach nonvolatile storage, not necessarily disk. As long as the server can return the data acknowledged by previous write operations, it can save that data in nonvolatile memory.
Both types of Prestoserve NFS accelerators speed up NFS server performance by:
  • Providing faster selection of file systems
  • Caching writes for synchronous I/O operations
  • Intercepting synchronous write requests to disk and storing the data in nonvolatile memory

Adding the SBus Prestoserve NFS Accelerator

The SBus Prestoserve NFS accelerator resides on the SBus. You can add the SBus Prestoserve NFS accelerator to any SBus-based server, except the SPARCserver 1000(E) system, the SPARCcenter 2000(E) system, or the Ultra Enterprise 3000, 4000, 5000, or 6000 server systems.

Note - The Ultra Enterprise 3000, 4000, 5000, and 6000 server systems do not support the SBus Prestoserve NFS accelerator.

Systems on which you can add the SBus Prestoserve NFS accelerator:
  • SPARCclassic system
  • SPARCserver LX system
  • SPARCserver 5 system
  • SPARCserver 10 system
  • SPARCserver 20 system
  • Ultra Enterprise 1 system
  • Ultra Enterprise 2 system
  • SPARCserver 600 series

Adding the NVRAM-NVSIMM Prestoserve NFS Accelerator

The NVRAM-NVSIMM Prestoserve NFS accelerator significantly improves the response time seen by NFS clients of heavily loaded or I/O-bound servers. Add the NVRAM-NVSIMM Prestoserve NFS accelerator to the following platforms to improve performance:
  • SPARCserver 10 system
  • SPARCserver 20 system
  • SPARCserver 1000 or 1000E system
  • SPARCcenter 2000 or 2000E system
For Ultra Enterprise 3000, 4000, 5000, and 6000 server systems, enable SPARCstorage Array NVRAM fast writes. Turn on fast writes by invoking the ssaadm command.

Note - The introduction of NFS Version 3 reduces the need for Prestoserve capability.

Tuning Parameters

This section describes how to set the number of NFS threads. It also covers tuning the main NFS performance-related parameters in the /etc/system file. Tune these /etc/system parameters carefully, considering the physical memory size of the server and kernel architecture type.

Note - Arbitrary tuning creates major instability problems, including an inability to boot.

Setting the Number of NFS Threads in /etc/init.d/nfs.server

All NFS server configurations must set the number of NFS threads. Each thread is capable of processing one NFS request. A larger pool of threads allows the server to handle more NFS requests in parallel. The default setting, 16, in Solaris 2.3 and later software environments, results in less than desired NFS response times. Scale the setting with the number of processors and networks. Increase the number of NFS server threads by editing the invocation of nfsd in /etc/init.d/nfs.server:

  /usr/lib/nfs/nfsd -a 64  

This example specifies that the maximum allocation of demand-based NFS threads is 64.
There are three ways to size the number of NFS threads. Each method results in about the same number if you followed the configuration guidelines in this manual. Extra NFS threads do not cause a problem.
To set the number of NFS threads:
Take the maximum of the following three suggestions:
  • Use 2 NFS threads for each active client process. A client workstation usually only has one active process. However, a time-shared system that is an NFS client may have many active processes.
  • Use 16 to 32 NFS threads for each CPU. Use roughly 16 for a SPARCclassic or a SPARCstation 5 system. Use 32 NFS threads for a system with a 60 MHz SuperSPARC processor.
  • Use 16 NFS threads for each 10 Mbits per second of network capacity. For example, if you have FDDI (100 Mbits per second), set the number of NFS threads to 160.
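
Taking the maximum of the three suggestions can be sketched in Python as follows. The function and parameter names are hypothetical; the per-CPU figure defaults to 16 and should be raised toward 32 for faster processors, as described above.

```python
def nfs_thread_count(active_client_processes, cpus, network_mbits_per_sec,
                     threads_per_cpu=16):
    """Return the largest of the three thread-count suggestions."""
    by_clients = 2 * active_client_processes          # 2 per active client process
    by_cpus = threads_per_cpu * cpus                  # 16 to 32 per CPU
    by_network = 16 * network_mbits_per_sec // 10     # 16 per 10 Mbits/second
    return max(by_clients, by_cpus, by_network)

# 40 active client processes, two 60 MHz SuperSPARC CPUs, one FDDI ring:
threads = nfs_thread_count(40, 2, 100, threads_per_cpu=32)
print(f"/usr/lib/nfs/nfsd -a {threads}")  # /usr/lib/nfs/nfsd -a 160
```

Here the network rule dominates (80 by clients, 64 by CPUs, 160 by network), so the nfsd invocation would request 160 threads.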

Identifying Buffer Sizes and Tuning Variables

The number of fixed-size tables in the kernel has been reduced in each release of the Solaris software environment. Most are now dynamically sized or are linked to the maxusers calculation. Extra tuning to increase the DNLC and inode caches is required for the Solaris 2.2 and later software environments. For Solaris versions 2.2, 2.3, and 2.4, you must tune the pager. Tuning the pager is not necessary for Solaris 2.5.

Using /etc/system to Modify Kernel Variables

The /etc/system file is read by the operating system kernel at start-up. It configures the search path for loadable operating system kernel modules and allows kernel variables to be set. For more information, see the man page for system(4).

Note - Be very careful with set commands in /etc/system. The commands in /etc/system cause automatic patches of the kernel.

If your machine will not boot and you suspect a problem with /etc/system, use the boot -a option. With this option, the system prompts (with defaults) for its boot parameters. One of these is the configuration file /etc/system. Either enter the name of a backup copy of the original /etc/system file or enter /dev/null. Fix the file and immediately reboot the system to make sure it is operating correctly.

Adjusting Cache Size: maxusers

The maxusers parameter determines the size of various kernel tables such as the process table. The maxusers parameter is set in the /etc/system file. For example:

  set maxusers = 200  

Solaris 2.2 Software Environment

In the Solaris 2.2 software environment, maxusers is dynamically sized based upon the amount of RAM configured in the system. The automatic configuration of maxusers in the Solaris 2.2 software environment is based upon the value of physmem, which is the amount of memory (in pages) after the kernel allocates its own code and initial data space of around 2 Mbytes. The automatic scaling stops when memory exceeds 128 Mbytes.
For systems with 256 Mbytes or more of memory, set maxusers to the generic safe maximum of 200, or leave it and set the individual kernel resources directly. The actual safe maximum level is hardware-dependent, and varies with the operating system kernel architecture.
Table 4-2 Maxusers
RAM Configuration                  Maxusers   Processes   Name Cache
Up to and including 16 Mbytes      8          138         226
Up to and including 32 Mbytes      32         522         634
Up to and including 64 Mbytes      40         650         770
Up to and including 128 Mbytes     64         1034        1178
Over 128 Mbytes                    128        2058        2266

Note - The operating system kernel uses about 1.5 Mbytes. Therefore, over 130 Mbytes of memory will be required for maxusers to be set to 128.

The name cache size is the number used for:
ncsize        Number of directory name entries in the DNLC
ufs_ninode    Inactive inode cache size limit

Solaris 2.3 and Later Software Environment

In the Solaris 2.3 and later software environments, maxusers is dynamically sized based upon the amount of RAM configured in the system. The sizing method used for maxusers is:
maxusers = Mbytes of RAM configured in the system
The number of Mbytes of RAM configured into the system is actually based upon physmem, which does not include the 2 Mbytes or so that the kernel uses at boot time. The minimum limit is 8 and the maximum automatic limit is 1024, corresponding to systems with 1 Gbyte or more of RAM. The maxusers parameter can still be set manually in /etc/system, but the manual setting is checked and limited to a maximum of 2048. This is a safe level on all kernel architectures, but uses a large amount of operating system kernel memory.
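The limits described above can be expressed as a simple clamp. This is an illustrative sketch, not kernel code: the function names are hypothetical, and the input is assumed to be physmem already converted to Mbytes.

```python
def auto_maxusers(physmem_mbytes):
    """Automatic sizing: one per Mbyte, held between 8 and 1024."""
    return max(8, min(1024, physmem_mbytes))

def checked_manual_maxusers(requested):
    """A manual setting in /etc/system is limited to a maximum of 2048."""
    return min(requested, 2048)

print(auto_maxusers(62))              # 64 Mbyte system less about 2 Mbytes of kernel
print(checked_manual_maxusers(4096))  # limited to 2048
```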

Parameters Derived from maxusers

Table 4-3 describes the default settings for the performance-related inode cache and name cache operating system kernel parameters.
Table 4-3
Kernel Resource   Variable      Default Setting
Inode Cache       ufs_ninode    17 * maxusers + 90
Name Cache        ncsize        17 * maxusers + 90
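As a worked check of the default formula, the following sketch computes both derived values for a given maxusers. The function name is hypothetical; the formula is the one given in the table.

```python
def derived_cache_defaults(maxusers):
    """Default inode cache and name cache sizes derived from maxusers."""
    size = 17 * maxusers + 90
    return {"ufs_ninode": size, "ncsize": size}

for maxusers in (8, 64, 128):
    print(maxusers, derived_cache_defaults(maxusers))
```

For maxusers values of 8, 64, and 128, the result is 226, 1178, and 2266, which agrees with the Name Cache column of Table 4-2.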

Adjusting the Buffer Cache: bufhwm

The bufhwm variable, set in the /etc/system file, controls the maximum amount of memory allocated to the buffer cache and is specified in Kbytes. The default value of bufhwm is 0, which allows up to 2 percent of system memory to be used. This can be increased up to 20 percent. It may need to be increased to 10 percent for a dedicated NFS file server with a relatively small memory system. On a larger system the bufhwm variable may need to be limited to prevent the system from running out of the operating system kernel virtual address space.
The buffer cache is used to cache inode, indirect block, and cylinder group related disk I/O only. Following is an example of a buffer cache (bufhwm) setting in the /etc/system file that will allow up to 10 Mbytes of cache. This is the highest value to which you should set bufhwm.

  set bufhwm=10240  
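Because bufhwm is specified in Kbytes, a percentage of memory has to be converted before it goes into /etc/system. The sketch below shows one way to do that arithmetic, assuming the 10 Mbyte ceiling recommended above; the function and parameter names are hypothetical.

```python
def bufhwm_kbytes(ram_mbytes, percent, ceiling_kbytes=10240):
    """Convert a percentage of RAM to Kbytes, capped at the 10 Mbyte ceiling."""
    wanted = ram_mbytes * 1024 * percent // 100
    return min(wanted, ceiling_kbytes)

# Hypothetical dedicated NFS server with 64 Mbytes of RAM, using 10 percent
print("set bufhwm=%d" % bufhwm_kbytes(64, 10))  # prints: set bufhwm=6553
```

On a larger system the same percentage exceeds the ceiling, so the sketch returns 10240.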

You can monitor the buffer cache using sar -b, which reports a read hit rate (%rcache) and a write hit rate (%wcache) for the buffer cache.
Code Example 4-1 Output of the sar -b 5 10 Command

  # sar -b 5 10  
  SunOS hostname 5.2 Generic sun4c    08/06/93  
  23:43:39 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s  
  Average        0      25     100       3      22      88       0       0  

If there is a significant (greater than 50) number of reads and writes per second and if the read hit rate (%rcache) falls below 90 percent or if the write hit rate (%wcache) falls below 65 percent, increase the buffer cache size, bufhwm.
In the previous sar -b 5 10 command output, the read hit rate (%rcache) and the write hit rate (%wcache) did not fall below 90 percent or 65 percent respectively.
The following lists the arguments to the sar command shown in Code Example 4-1.
b     Checks buffer activity
5     Time. Every 5 seconds (must be at least 5 seconds)
10    Number of times the command gathers statistics
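The decision rule for the sar -b output can be sketched as a small check. The function name is hypothetical, and the sketch assumes that "reads and writes per second" means the sum of the lread/s and lwrit/s columns; adjust this if you measure activity differently.

```python
def buffer_cache_needs_tuning(lread, lwrit, rcache, wcache):
    """Apply the thresholds: busy (more than 50 per second) and a low hit rate."""
    busy = (lread + lwrit) > 50             # assumption: logical reads plus logical writes
    low_hit = rcache < 90 or wcache < 65    # read below 90 percent or write below 65 percent
    return busy and low_hit

# Values from the Average line of Code Example 4-1
print(buffer_cache_needs_tuning(lread=25, lwrit=22, rcache=100, wcache=88))  # prints False
```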

Note - If you increase your buffer cache too much, your server may hang, as other device drivers could suffer from a shortage of operating system kernel virtual memory.

Directory Name Lookup Cache (DNLC)

By default, the directory name lookup cache (DNLC) is sized from maxusers. A large cache size (ncsize) significantly helps NFS servers that have many clients.
· To show the DNLC hit rate (cache hits), type vmstat -s

  % vmstat -s  
  ... lines omitted  
  79062 total name lookups (cache hits 94%)  
  16 toolong  

Directory names less than 30 characters long are cached. Names that are too long to be cached are also reported (the toolong count). A cache miss means that a disk I/O may be needed to read the directory when traversing the path name components to get to a file. A hit rate of much less than 90 percent needs attention.
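If you collect vmstat -s output regularly, the hit rate can be extracted with a short script. This is a sketch that assumes the line format shown in the sample output above; the function name is hypothetical.

```python
import re

def dnlc_hit_rate(vmstat_output):
    """Return the DNLC cache hit percentage from vmstat -s text, or None if absent."""
    match = re.search(r"total name lookups \(cache hits (\d+)%\)", vmstat_output)
    return int(match.group(1)) if match else None

sample = "79062 total name lookups (cache hits 94%)\n16 toolong\n"
rate = dnlc_hit_rate(sample)
print(rate)  # prints 94
if rate is not None and rate < 90:
    print("DNLC hit rate needs attention")
```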
Cache hit rates can significantly affect NFS performance. getattr, setattr and lookup usually represent greater than 50 percent of all NFS calls. If the requested information isn't in cache, the request will generate a disk operation, resulting in a performance penalty as significant as that of a read or write request. The only limit to the size of the DNLC cache is available kernel memory.
If the hit rate (cache hits) is less than 90 percent and there is no problem with too many long names, tune the ncsize variable. See the procedure "To reset ncsize." The variable ncsize refers to the size of the DNLC in terms of the number of name and vnode translations that can be cached. Each DNLC entry causes about 50 bytes of extra kernel memory to be used.
· To reset ncsize
  1. Set ncsize in the /etc/system file to a value higher than the default (based on maxusers).

    As an initial guideline, since dedicated NFS servers do not need a lot of RAM, maxusers will be low and the DNLC will be small. Double its size.


  set ncsize=5000  

The default value of ncsize is:
ncsize (name cache) = 17 * maxusers + 90
  • For NFS server benchmarks, set it as high as 16000.
  • For maxusers = 2048, set it at 34906.
  2. Reboot the system.
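The "double its size" guideline and its memory cost can be estimated as follows. This is a rough sketch with hypothetical function names; it uses the default formula and the figure of about 50 bytes per DNLC entry given earlier in this section.

```python
def doubled_ncsize(maxusers):
    """Initial guideline: twice the default of 17 * maxusers + 90."""
    return 2 * (17 * maxusers + 90)

def dnlc_extra_bytes(ncsize, bytes_per_entry=50):
    """Approximate extra kernel memory used by the DNLC entries."""
    return ncsize * bytes_per_entry

ncsize = doubled_ncsize(64)   # maxusers of 64 gives a default of 1178
print("set ncsize=%d" % ncsize)                # prints: set ncsize=2356
print(dnlc_extra_bytes(ncsize), "bytes")       # prints: 117800 bytes
```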

See the section "Increasing the Inode Cache," which follows.

Increasing the Inode Cache

A memory-resident inode is used whenever an operation is performed on an entity in the file system. The inode read from disk is cached in case it is needed again. The maximum number of active and inactive inodes that the system will cache is set by ufs_ninode.
The inodes are kept on a linked list, rather than in a fixed-size table. It is possible that the number of open files in the system can cause the number of active inodes to exceed the limit. Raising the limit allows inactive inodes to be cached in the Solaris 2.3 software environment in case they are needed again. In Solaris version 2.4 and later, the ufs_ninode count applies only to inactive inodes.
Every entry in the DNLC cache points to an entry in the inode cache, so both caches should be sized together. The inode cache should be at least as big as the DNLC cache. For best performance, it should be the same size as the DNLC in Solaris version 2.4 and later, but twice the size of the DNLC in Solaris versions 2.2 and 2.3.
Since it is just a limit, ufs_ninode can be tweaked with adb on a running system with immediate effect. The only upper limit is the amount of kernel memory used by the inodes. The tested upper limit corresponds to maxusers = 2048, which is the same as ncsize at 34906.
Use sar -k to report the size of the kernel memory allocation. Each inode uses 512 bytes of kernel memory from the lg_mem pool in the Solaris versions 2.2 and 2.3. In the Solaris version 2.4, each inode uses 300 bytes of kernel memory from the lg_mem pool. In Solaris version 2.5.1, each inode uses 320 bytes of kernel memory from the lg_mem pool.
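The per-inode figures above give a quick estimate of how much kernel memory a given ufs_ninode setting can consume. The sketch below is illustrative only; the table and function names are hypothetical, and only the releases quoted in the text are included.

```python
# Bytes of kernel memory per cached inode, as quoted above for each release
INODE_BYTES = {"2.2": 512, "2.3": 512, "2.4": 300, "2.5.1": 320}

def inode_cache_bytes(ufs_ninode, release):
    """Estimate kernel memory used when the inode cache is full."""
    return ufs_ninode * INODE_BYTES[release]

print(inode_cache_bytes(10000, "2.3"))    # prints 5120000
print(inode_cache_bytes(34906, "2.5.1"))  # prints 11169920
```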

Tuning ncsize in the Solaris 2.5.1 and Later Software Environments

In Solaris version 2.5.1, ufs_ninode is automatically adjusted to be at least ncsize. With Solaris version 2.5.1 and later, tune ncsize to get the hit rate up. Allow the system to pick the default ufs_ninode value.

· To increase the inode cache


Note - This procedure applies to Solaris software environments up to and including the Solaris 2.5 software environment. If you have the Solaris 2.5.1 or later software environments, see "Tuning ncsize in the Solaris 2.5.1 and Later Software Environments."

If the inode cache hit rate is below 90 percent, or if the DNLC requires tuning for local disk file I/O workloads:
  1. Increase the size of the inode cache.

    Change the variable ufs_ninode in your /etc/system file to the same size as the DNLC (ncsize). For example, for the Solaris version 2.4:


  set ufs_ninode=5000  

For Solaris releases prior to Solaris 2.4:

  set ufs_ninode=10000  

The default value of the inode cache is the same as that for ncsize:
ufs_ninode (default value) = 17 * maxusers + 90.

Caution - Do not set ufs_ninode less than ncsize.

Solaris 2.x releases up to and including Solaris 2.3 run more efficiently with ufs_ninode set to twice ncsize. In Solaris version 2.4 and later, ufs_ninode limits only the number of inactive inodes, rather than the total number of active and inactive inodes.
  2. Reboot the system.
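The release-dependent sizing rule in this procedure can be summarized in a few lines. This is a sketch under the assumption that the release is given as a string such as "2.3" or "2.4"; the function name is hypothetical.

```python
def recommended_ufs_ninode(ncsize, release):
    """Twice ncsize through Solaris 2.3; the same size as ncsize from Solaris 2.4."""
    major_minor = tuple(int(part) for part in release.split(".")[:2])
    return 2 * ncsize if major_minor <= (2, 3) else ncsize

print("set ufs_ninode=%d" % recommended_ufs_ninode(5000, "2.4"))  # prints: set ufs_ninode=5000
print("set ufs_ninode=%d" % recommended_ufs_ninode(5000, "2.3"))  # prints: set ufs_ninode=10000
```

Both results match the example settings shown in the procedure for an ncsize of 5000.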

Increasing Read Throughput

If you are using NFS over a high speed network, such as FDDI, SunFastEthernet, or SunATM, you will obtain better read throughput by increasing the number of read-aheads on the NFS client. The read-ahead functionality is a new feature of Solaris 2.4.
Increasing read-aheads is not recommended in these conditions:
  • The client is very short of RAM.
  • The network is very busy.
  • File accesses are randomly distributed.
When free memory is low, read-ahead will not be performed.
By default, the read-ahead is set to 1 block (8 Kbytes). For example, a read-ahead set to 2 blocks fetches an additional 16 Kbytes from a file while you are reading the first 8 Kbytes from the file. In other words, the read-ahead stays one step ahead of you and fetches information in 8 Kbyte increments to stay ahead of the information you need.
Read throughput stops improving noticeably once the number of read-aheads reaches 6 or 7 blocks, and it does not usually increase with more than eight read-aheads (8 blocks).
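The amount of data fetched ahead for a given setting is simple arithmetic on the 8 Kbyte block size. The following sketch uses hypothetical names and only restates the figures above.

```python
BLOCK_KBYTES = 8  # size of one read-ahead block, as stated above

def read_ahead_kbytes(nra_blocks):
    """Kbytes fetched ahead of the block currently being read."""
    return nra_blocks * BLOCK_KBYTES

for blocks in (1, 2, 4, 6):
    print(blocks, "blocks ->", read_ahead_kbytes(blocks), "Kbytes")
```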

Note - In the following procedures, "To increase the number of read-aheads in the Solaris 2.4 version and later with NFS Version 2" and "To increase the number of read-aheads in Solaris 2.5 version and later with NFS Version 3," the nfs_nra and the nfs3_nra values can be tuned independently.
If a client is running Solaris version 2.5 or later, the client may need to tune nfs_nra (NFS Version 2). This happens if the client is talking to a server that does not support Version 3.
· To increase the number of read-aheads in the Solaris 2.4 version and later with NFS Version 2
  1. Add the following line to /etc/system on the NFS client.


  set nfs:nfs_nra=4  

  2. Reboot the system to implement the read-ahead value.

· To increase the number of read-aheads in Solaris 2.5 version and later with NFS Version 3
  1. Add the following line to /etc/system on the NFS client.


  set nfs:nfs3_nra=6  

  2. Reboot the system to implement the read-ahead value.