Overview of Solstice DiskSuite
3
- Solstice DiskSuite is an unbundled software package that offers enhanced availability, performance, capacity, reliability, and administration.
- DiskSuite provides higher data availability and reliability by supporting the mirroring of any file systems, including root, swap, and /usr. Disk striping increases performance by spreading requests over multiple components. A combination of concatenation and striping increases capacity by grouping components into a single large logical device. Administration is simplified by the automatic replacement of failed components within a mirror and the dynamic growth of metadevices and file systems. UNIX file system (UFS) logging speeds up local directory operations and reboots and decreases the number of synchronous disk writes.
- This chapter provides a high level overview of the features included with the DiskSuite software package. Use the following table to locate specific information.
Elements of the Metadisk Driver
- The metadisk driver is implemented as a set of loadable, pseudo device drivers. The metadisk driver uses other physical device drivers to pass I/O requests to and from the underlying devices.
- The metadisk driver resides between the file system interface and the device driver interface and interprets information from both above and below. After passing through the metadisk driver, information is received in the expected form by both the file system and the device drivers. The metadisk driver is a loadable device driver and has the same interfaces as any other device driver.
- Figure 3-1 illustrates the position of the metadisk driver in the kernel hierarchy.

Figure 3-1
- An overview of the primary elements of the metadisk driver is provided in the following sections. These elements include:
- Metadevices
- Concatenation and striping
- Mirroring (metamirrors and submirrors)
- UFS logging
- Hot spares
- Disksets
- RAID devices
Metadevices
- Metadevices are the basic functional unit of the metadisk driver. After you create metadevices, you can use them like physical disk partitions. These logical devices can be made up of one or more component partitions. You can configure the component partitions to use a single device, a concatenation of stripes, or a stripe of devices.
- Metadevices can provide increased capacity, higher availability, and better performance. To gain increased capacity, you create metadevices that are either concatenations or stripes. Mirroring and UFS logging provide higher availability and striping can help performance.
- Metadevices are transparent to applications software and to component and controller hardware.
- The standard metadevice name begins with "d" and is followed by a number. By default, there are 128 unique metadevices in the range 0 to 127. Additional metadevices can be added. Metadevice names are located in /dev/md/dsk and /dev/md/rdsk.
- When using DiskSuite commands such as metastat(1M) and metainit(1M) or when setting up metadevices in the md.tab file, the metadevice name does not need to be fully qualified. For example, you can enter d1 rather than /dev/md/dsk/d1. The examples presented in this manual use the short and long forms for metadevice names interchangeably.
- Metadevices can be configured from IPI and SCSI devices on all SPARC systems and on SCSI and IDE devices on all x86 systems.
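The short and long metadevice name forms described in this section can be related with a small helper. This is only an illustrative sketch in Python, not part of DiskSuite: the function names are hypothetical, and the only facts taken from the text are the "d" plus number naming rule and the /dev/md/dsk and /dev/md/rdsk directories.

```python
import re

BLOCK_DIR = "/dev/md/dsk"   # block metadevice names, per the text
RAW_DIR = "/dev/md/rdsk"    # raw metadevice names, per the text


def short_name(name: str) -> str:
    """Return the short form (for example 'd1') of a metadevice name.

    Accepts either the short form or a path under one of the two
    metadevice directories. Hypothetical helper for illustration.
    """
    directory, sep, base = name.rpartition("/")
    if sep and directory not in (BLOCK_DIR, RAW_DIR):
        raise ValueError(f"not a metadevice path: {name!r}")
    if not re.fullmatch(r"d\d+", base):
        raise ValueError(f"not a metadevice name: {name!r}")
    return base


def long_name(name: str, raw: bool = False) -> str:
    """Return the fully qualified block (or raw) path for a metadevice name."""
    return f"{RAW_DIR if raw else BLOCK_DIR}/{short_name(name)}"
```

For example, `long_name("d1")` yields the block device path and `long_name("d1", raw=True)` the raw device path for the same metadevice.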
Concatenation and Striping
- Each metadevice is either a concatenation or a stripe of component partitions. Concatenations and stripes work much the way the cat(1) program is used to concatenate two or more files to create one larger file. When partitions are concatenated, the component blocks are addressed sequentially, one component after another. The file system can use the entire concatenation.
- You can use a concatenated or striped metadevice for any file system with the exceptions of root, swap, /usr, /var, /opt, or any file system accessed during a Solaris upgrade or install.
- Figure 3-2 illustrates the concatenation of three 327 Mbyte components. The block size is 512 bytes. The logical block address ranges are listed below the drives. The physical block address range for each of the three drives would be 0 to 669695. When concatenated, the logical block address range is sequential from 0 to 2009087. In this illustration the disk labels are not part of the logical block addresses.

Figure 3-2
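The sequential addressing shown in Figure 3-2 can be checked with a few lines of arithmetic. The sketch below is written in Python purely for illustration. It assumes, as the figure does, that disk labels are not part of the logical address space, and the function name is hypothetical.

```python
BLOCKS_PER_COMPONENT = 669696  # 327 Mbytes in 512-byte blocks (0 to 669695)


def concat_locate(logical_block, component_sizes):
    """Map a logical block to (component index, block within that component).

    Components are addressed one after another, so the search walks the
    component list and subtracts each size until the block fits.
    """
    if logical_block < 0:
        raise ValueError("logical block must not be negative")
    remaining = logical_block
    for index, size in enumerate(component_sizes):
        if remaining < size:
            return index, remaining
        remaining -= size
    raise ValueError("logical block is beyond the end of the concatenation")


sizes = [BLOCKS_PER_COMPONENT] * 3
```

With these sizes, logical block 669696 is the first block of the second component and logical block 2009087 is the last block of the third.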
- Striping is similar to concatenation, except that the addressing of the metadevice blocks is interlaced on the components rather than sequential. When stripes are defined, an interlace size is specified following the -i option. The interlace size is a number (for example, 8, 16, or 32) followed by "k" for kilobytes, "m" for megabytes, or "b" for (512-byte) blocks. The units can be specified in either uppercase or lowercase. If the size is not specified, it defaults to 16 Kbytes. The interlace value tells DiskSuite how much data is placed on a component before moving to the next component of the stripe.
- Because data is spread across a stripe, you gain increased performance as reads and writes are spread across multiple disk arms. Also, concurrent I/O requests may use different disk arms. This may be true of concatenation as well.
- In Figure 3-3, three 327 Mbyte components are used to illustrate a stripe of component partitions. The block size is 512 bytes. The interlace value is 8 Kbytes (or 16 512-byte blocks). The same logical address range as shown for the concatenation in Figure 3-2 (0 to 2009087) applies to a stripe of the same component configuration. However, in striping, the logical block addresses on each component are alternated according to the interlace size specified. In this illustration the disk labels are not part of the logical block addresses.

Figure 3-3
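The interlace size notation described above (a number followed by k, m, or b, with a 16 Kbyte default) can be made concrete with a small converter. This is a hedged sketch in Python: it follows only the rules stated in this section, the function name is hypothetical, and the real metainit command may accept or reject forms that this sketch does not model.

```python
import re

BLOCK_BYTES = 512                      # block size used throughout this chapter
DEFAULT_INTERLACE_BYTES = 16 * 1024    # default when no size is given
_UNIT_BYTES = {"k": 1024, "m": 1024 * 1024, "b": BLOCK_BYTES}


def interlace_bytes(value=None):
    """Convert an interlace size such as '8k', '1M', or '16b' to bytes."""
    if value is None:
        return DEFAULT_INTERLACE_BYTES
    match = re.fullmatch(r"(\d+)([kmb])", value.strip().lower())
    if match is None or int(match.group(1)) == 0:
        raise ValueError(f"unrecognized interlace size: {value!r}")
    return int(match.group(1)) * _UNIT_BYTES[match.group(2)]
```

Under these rules, "8k" and "16b" describe the same interlace, which matches the 8 Kbyte (sixteen 512-byte blocks) value used in Figure 3-3.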
- Figure 3-3 further illustrates how improved performance is gained through striping. For example, if a 1 Mbyte request were issued to this configuration, the data would be spread across the three components, and the arms on all three components would be used to retrieve the data concurrently.
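The interlaced addressing in this section can be sketched as a round-robin mapping. The Python below assumes the conventional stripe layout (one interlace-sized chunk per component in turn) and ignores disk labels; the exact on-disk layout DiskSuite uses is not stated in this chapter, and the function names are hypothetical.

```python
from collections import Counter


def stripe_locate(logical_block, components, interlace_blocks):
    """Map a logical block to (component index, block within that component)."""
    chunk, offset = divmod(logical_block, interlace_blocks)
    row, component = divmod(chunk, components)
    return component, row * interlace_blocks + offset


def blocks_per_component(first_block, block_count, components, interlace_blocks):
    """Count how many blocks of a request land on each component."""
    counts = Counter()
    for block in range(first_block, first_block + block_count):
        component, _ = stripe_locate(block, components, interlace_blocks)
        counts[component] += 1
    return [counts[c] for c in range(components)]
```

With three components and an interlace of 16 blocks (8 Kbytes), a 1 Mbyte request (2048 blocks) starting at block 0 touches all three components, with 688, 688, and 672 blocks respectively, so all three disk arms can work on it at once.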
Mirroring
- DiskSuite supports mirroring to as many as three separate metadevices. This enables the system to tolerate single-component failures with two-way mirroring and double failures with three-way mirroring. Mirroring can also be used for online backups of file systems.
- To set up mirroring, you create a metamirror. A metamirror is a special type of metadevice made up of one or more other metadevices. Each metadevice within a metamirror is called a submirror.
- Metamirrors can be given names such as d0. The same naming convention used for metamirrors is used for metadevices. For example, a metamirror could have the name d1 and a metadevice might have the name d2, or vice versa.
- After you define a metamirror (with the metainit command, for example), you can add submirrors later without bringing the system down or disrupting reads and writes to existing metamirrors. Submirrors are added with the metattach(1M) command, which attaches a submirror to the metamirror.
- When the submirror is attached, all the data from another submirror in the metamirror is automatically written to the newly attached submirror. This is called resyncing. After the resyncing is complete, the new submirror is readable and writable. Once a metattach is performed, the submirrors remain attached (even when the system is rebooted) until they are explicitly detached with the metadetach(1M) command.
- You can use other DiskSuite utilities to perform maintenance on metamirrors and submirrors. If a controller fails, any submirrors on that controller can be taken offline with the metaoffline(1M) utility. While a submirror is offline, DiskSuite keeps track of all writes to the metamirror. When the submirror is brought back online with the metaonline(1M) command, only the portions of the metamirror that were written (called resync regions) are resynced.
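The offline and online behavior described above, where only the regions written while a submirror was offline are resynced, can be modeled with a toy mirror. This Python sketch is an assumption-laden illustration: the class, its methods, and the region size are hypothetical and do not reflect how the metadisk driver actually records resync regions.

```python
class MirrorSketch:
    """Toy metamirror that tracks regions written while a submirror is offline."""

    def __init__(self, submirror_count, block_count, region_blocks):
        self.region_blocks = region_blocks
        self.submirrors = [[0] * block_count for _ in range(submirror_count)]
        self.offline = set()
        self.dirty = {}  # submirror index -> set of region numbers to resync

    def take_offline(self, index):
        self.offline.add(index)
        self.dirty[index] = set()

    def write(self, block, value):
        for index, data in enumerate(self.submirrors):
            if index in self.offline:
                # Remember only which region changed, not the data itself.
                self.dirty[index].add(block // self.region_blocks)
            else:
                data[block] = value

    def bring_online(self, index):
        """Copy only the dirty regions from an online submirror; return blocks copied."""
        source = next(i for i in range(len(self.submirrors))
                      if i not in self.offline)
        target = self.submirrors[index]
        copied = 0
        for region in sorted(self.dirty.pop(index)):
            start = region * self.region_blocks
            end = min(start + self.region_blocks, len(target))
            target[start:end] = self.submirrors[source][start:end]
            copied += end - start
        self.offline.discard(index)
        return copied
```

In this model, two writes that fall in the same region cost one region copy when the submirror returns, rather than a full resync of every block.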
UNIX File System Logging
- The UNIX file system (UFS) logging facility included with DiskSuite provides faster local directory operations, speeds up reboots, and decreases synchronous disk writes by safely recording file system updates in a log before they are applied to the UFS file system.
Note - The UFS logging facility can be used only with Solaris 2.4 or later releases.
- UFS is the standard Solaris file system. UFS file systems are created when Solaris is installed or by users with the newfs(1M) command.
- Because a system crash can interrupt system calls that are already in progress and thereby introduce inconsistencies, UFS file systems should be checked before they are mounted again. Mounting a UFS file system without first checking it and repairing any inconsistencies can cause panics or data corruption. Checking large file systems is a slow operation because it requires reading and verifying the file system data. With the UFS logging facility, UFS file systems do not have to be checked at boot time because the changes from unfinished system calls are discarded.
- A pseudo device, called the metatrans device, is responsible for managing the contents of the log of file system updates. Like other metadevices, the metatrans device behaves the same as an ordinary disk device. The metatrans device is made up of two subdevices: the logging device and the master device. These can be disk partitions, metadevices, or metamirrors, but not metatrans devices.
- Figure 3-4 illustrates the metatrans device /dev/md/dsk/d1 and the two subdevices of which it is composed. /dev/dsk/c0t0d0s3 is the master device and /dev/dsk/c1t0d0s3 is the logging device.

Figure 3-4
- As shown in Figure 3-4, the same naming convention used for metamirrors and metadevices is used for metatrans devices. For example, a metatrans device could have the name d1 and a metamirror might have the name d2, or vice versa.
- The logging device contains the log of file system updates. This log consists of a sequence of records, each of which describes a change to a file system. The master device contains an existing or a newly created UFS file system.
- The master device can contain an existing UFS file system because creating a metatrans device does not alter the master device. The difference is that updates to the file system are written to the log before being "rolled forward" to the UFS file system. Likewise, clearing a metatrans device leaves the UFS file system on the master device intact.
- As illustrated in Figure 3-5, a logging device can also be shared among metatrans devices. The figure shows the first metatrans device /dev/md/dsk/d1 and the two subdevices of which it is composed. /dev/dsk/c0t0d0s3 is the master device and /dev/dsk/c1t0d0s3 is the logging device. The second metatrans device /dev/md/dsk/d2 is shown with its own master device /dev/dsk/c2t0d0s3 and the logging device /dev/dsk/c1t0d0s3 that it shares with the first metatrans device.

Figure 3-5
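The log-then-roll-forward behavior described in this section, including the discarding of changes from unfinished system calls, can be sketched with a toy model. The Python below is an illustration only: the record layout, the commit marker, and the operation identifiers are hypothetical and are not the DiskSuite log format.

```python
class MetatransSketch:
    """Toy metatrans device: updates go to a log, then roll forward to the master."""

    def __init__(self):
        self.master = {}  # stands in for the UFS file system on the master device
        self.log = []     # stands in for the records on the logging device

    def update(self, op_id, key, value):
        """Record one change belonging to a file system operation."""
        self.log.append(("update", op_id, key, value))

    def commit(self, op_id):
        """Mark an operation as finished (hypothetical commit record)."""
        self.log.append(("commit", op_id))

    def roll_forward(self):
        """Apply changes from finished operations; discard unfinished ones."""
        finished = {record[1] for record in self.log if record[0] == "commit"}
        for record in self.log:
            if record[0] == "update" and record[1] in finished:
                _, _, key, value = record
                self.master[key] = value
        self.log.clear()
```

If a crash interrupts an operation before its commit record is written, rolling the log forward applies only the finished operations, which is why a full file system check is not needed in this model.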
Hot Spares
- DiskSuite's hot spare facility automatically replaces failed submirror or RAID components, provided that a spare component is available and reserved. Hot spares are temporary fixes, used until failed components are either repaired or replaced. Hot spares provide further security from downtime due to hardware failures.
- The analogy that best describes hot spares is spare tires for cars. However, when a component fails, you do not have to stop and change to the hot spare component manually. This occurs automatically and without interruption of service. When a component fails, the replicated data is simply copied from any of the other submirrors or regenerated from the other components in a RAID device. Writes continue to the other components of the submirror or RAID device containing the failed component.
- Hot spares are defined within hot spare pools, which can be a shared resource for all the submirrors and RAID devices you have configured. Individual hot spares can be included in one or more hot spare pools. For example, you may have three submirrors and three hot spares. The three hot spares can be arranged as three hot spare pools, with each pool having the three hot spares in a different order of preference. This enables you to specify which hot spare is used first. It also improves availability by having more hot spares available.
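The three-pool arrangement described above can be sketched as ordered lists. This Python sketch assumes that a pool is searched in its listed order and that the first spare not already in use is chosen; the pool and spare names are hypothetical, and any further criteria DiskSuite applies when selecting a spare are not modeled.

```python
def pick_hot_spare(pool, in_use):
    """Return the first spare in the pool's preference order that is not in use."""
    for spare in pool:
        if spare not in in_use:
            return spare
    return None


# Three hot spares shared by three pools, each with a different order of preference.
pools = {
    "pool_1": ["spare_a", "spare_b", "spare_c"],
    "pool_2": ["spare_b", "spare_c", "spare_a"],
    "pool_3": ["spare_c", "spare_a", "spare_b"],
}
```

Because every pool lists all three spares, a submirror whose preferred spare is already taken still finds a replacement as long as any spare remains.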
Disksets
- DiskSuite's diskset feature lets you set up groups of host machines and disk drives in which all of the hosts in the set are connected to all the drives in the set. A diskset is a grouping of two hosts and disk drives in which all the drives are accessible by both hosts. DiskSuite requires that the device name be identical on each host in the diskset. There is one metadevice state database per shared diskset, as well as the one on the local diskset.
RAID Devices
- DiskSuite's RAID feature provides support for RAID devices. RAID stands for "Redundant Arrays of Inexpensive Disks." DiskSuite RAID devices support RAID Level 5.
- RAID devices are composed of three or more physical partitions. Each partition is referred to as a column. A RAID metadevice can be grown by concatenating additional partitions to the metadevice.
- RAID Level 5 uses multiple physical partitions to simulate a single large slice (partition). A single sector on one of these physical slices contains either a sector's worth of contiguous data or parity information relating to the data on the same sector of all other slices in the array.
- In order to eliminate a parity partition as a bottleneck, no one physical partition will hold all of the parity information; it will be placed on different partitions for different sectors.
- The advantages of a RAID Level 5 configuration are that it can recover from a single disk failure and that it can be more cost effective than mirroring disks.
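The parity idea behind RAID Level 5 can be shown in a few lines. The Python sketch below uses bytewise XOR, the usual RAID 5 parity operation, and a simple rotation that places parity on a different column for each row. The rotation rule and all function names are assumptions for illustration; this chapter does not describe the layout DiskSuite actually uses.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length sectors."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(sectors):
    """Parity sector for a list of equal-length sectors."""
    return reduce(xor_bytes, sectors)


def parity_column(row, columns):
    """Hypothetical rotation: parity sits on a different column on each row."""
    return row % columns


def build_row(row, data_sectors):
    """Lay out one row: the data sectors plus a parity sector at its rotated column."""
    sectors = list(data_sectors)
    sectors.insert(parity_column(row, len(data_sectors) + 1), parity(data_sectors))
    return sectors


def rebuild(sectors, failed):
    """Regenerate the sector on a failed column from the surviving columns."""
    return parity([s for i, s in enumerate(sectors) if i != failed])
```

XOR-ing the surviving sectors of a row reproduces the missing one whether the failed column held data or parity, which is how a single disk failure is tolerated with only one column's worth of extra space.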
State Database Replicas
- State database replicas provide the non-volatile storage necessary to keep track of configuration and status information for all metadevices, metamirrors, metatrans devices, hot spares, and RAID devices. The replicas also keep track of error conditions that have occurred.
- After a metadevice is configured, it is necessary for the metadevice driver to remember its configuration and status information. The metadevice state database is the metadevice driver's long term memory. The metadevice driver stores all the metadevice configuration information in the state database. This includes the configuration information about mirrors, submirrors, concatenations, stripes, metatrans devices, and hot spares.
- If the replicated metadevice state database were to be lost, the metadevice driver would have no way of knowing any configuration information. This could result in the loss of all data stored on metadevices. To protect against losing the metadevice state database because of hardware failures, multiple replicas (copies) of the state database are kept.
- These multiple replicas also protect the state database against corruption that can result from a system crash. Each replica of the state database contains a checksum. When the state database is updated, the replicas are modified one at a time. If a crash occurs while the database is being updated, only one of the replicas will be corrupted. When the system reboots, the metadevice driver uses the checksum embedded in the replicas to determine if a replica has been corrupted. Any replicas that have been corrupted are ignored.
- If a disk that contains the metadevice state database is turned off, the metadevices remain fully functional because the database is retrieved from one of the replicas still in operation. Changes made to the configuration following the reboot are stored only in the replicas that are in operation when the system comes back up. If the disk drive that was turned off is later turned back on, the data contained in the replica stored on that disk is ignored. This is accomplished by comparing it with other replicas.
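The checksum test described in this section can be sketched as follows. The Python below uses CRC-32 from the standard library as a stand-in, since the text does not say which checksum the state database uses, and the replica structure and function names are hypothetical.

```python
import zlib


def make_replica(payload: bytes):
    """Build a replica record holding the payload and its checksum."""
    return {"payload": payload, "checksum": zlib.crc32(payload)}


def is_valid(replica):
    """A replica is usable only if its stored checksum matches its payload."""
    return zlib.crc32(replica["payload"]) == replica["checksum"]


def usable_replicas(replicas):
    """Ignore any replica whose checksum shows it was corrupted."""
    return [replica for replica in replicas if is_valid(replica)]
```

Because replicas are updated one at a time, a crash in the middle of an update leaves at most one replica failing this test, and the remaining replicas still describe the configuration.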
Expanding Mounted File Systems
- You can expand mounted or unmounted UFS file systems with the DiskSuite concatenation facilities and the growfs(1M) command. The expansion can be performed without bringing down the system or performing a backup.
- Mounted or unmounted file systems can be expanded up to the new size of the metadevice on which the file system resides.
Note - Once you have expanded a file system, it cannot be shrunk.
DiskSuite Commands and Utilities
- There are several new utilities included with the DiskSuite package. An overview of the DiskSuite utilities follows. For a complete definition of the functionality and options associated with the utilities, refer to Appendix D.
- metaclear(1M) - Clears (deletes) all (or only specified) metadevices and/or hot spare pools from the configuration. After a metadevice is cleared, it must be reconfigured with the metainit utility. metaclear does not clear metadevices that are currently in use (open). You can never clear root and swap metadevices.
- metadb(1M) - Reserves or releases space for the metadevice state databases, which are used in the event of a system failure to determine the status and configuration of the metadevices. All metadevice state databases contain identical information, which guards against the loss of configuration information. If the status and configuration information is lost, the metadevices will no longer operate.
- One other utility associated with DiskSuite is:
- growfs(1M) - Nondestructively expands a mounted or unmounted file system up to the size of the physical device allocated for the file system.
System Files Associated with DiskSuite
- There are three system files associated with DiskSuite that are used by the various utilities:
- md.tab - Used by the metainit and metadb commands as a workspace file. Each metadevice may have a unique entry in this file. Tabs, spaces, comments (using the pound sign (#) character), and continuation of lines (using the backslash (\) character) can be used in the file.
Note - The md.tab file is used only when creating metadevices, hot spares, or database replicas. This file is not automatically updated by the DiskSuite utilities. This file may have little or no correspondence with actual metadevices, hot spares, or replicas.
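The md.tab conventions listed above (tabs and spaces as separators, # comments, and backslash line continuation) can be illustrated with a small reader. This Python sketch is not the parser DiskSuite uses, and its handling of edge cases (for example, a comment after a continuation backslash) is an assumption. The sample entries are hypothetical and shown only to exercise the syntax.

```python
def read_entries(text):
    """Split md.tab-style text into entries, each a list of fields."""
    entries = []
    pending = ""
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].rstrip()   # drop comments
        if line.endswith("\\"):                     # entry continues on next line
            pending += line[:-1] + " "
            continue
        fields = (pending + line).split()           # tabs and spaces both separate
        pending = ""
        if fields:
            entries.append(fields)
    if pending.split():                             # dangling continuation at EOF
        entries.append(pending.split())
    return entries


# Hypothetical entries, for illustration only.
SAMPLE = "\n".join([
    "# workspace file sketch",
    "d1 1 3 c0t0d0s2 c1t0d0s2 \\",
    "\tc2t0d0s2 -i 8k",
    "d2 1 1 c3t0d0s2  # trailing comment",
])
```

Calling `read_entries(SAMPLE)` returns one field list per entry, with the continued line joined into the first entry and the comments removed.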
- md.cf - Automatically updated whenever the configuration is changed by the user. This is basically a disaster recovery file and should never be edited by the user. After a disaster, never use this file blindly as the md.tab file. Be sure to examine the file carefully first.
Note - The md.cf file does not get updated when hot sparing occurs.
- mddb.cf - Created whenever the metadb command is run and is used by metainit to find the locations of the metadevice state database. You should never edit this file. Each metadevice state database replica has a unique entry in this file. Each entry contains the driver name and minor unit numbers associated with the block physical device where the replica is stored. Each entry also contains the block number of the master block, which contains a list of all other blocks in the replica.