RAID Devices
9
- This chapter discusses RAID (Redundant Arrays of Inexpensive Disks) device configuration information. The information provided includes hardware and software considerations along with instructions for managing RAID devices. Use the following table to locate specific information.
-
RAID Overview
- DiskSuite RAID devices support RAID Level 5. RAID Level 5 configuration allows you to recover from a single disk failure. It can also be more cost effective than mirroring disks, especially when it comes to hardware costs.
- A RAID metadevice must consist of three or more physical partitions. Each partition is referred to as a component. A RAID metadevice can be grown by concatenating additional partitions to the metadevice.
- RAID Level 5 uses multiple physical partitions to simulate a single large slice (partition). A single sector on one of these physical slices contains either a sector's worth of data or parity information relating to the data on the same sector of all other slices in the array.
- In order to eliminate a parity partition as a bottleneck, no one physical partition will hold all of the parity information; it will be placed on different partitions for different sectors.
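- The parity scheme described above can be sketched in a few lines of Python. This is a minimal illustration of XOR parity in general, not DiskSuite's actual on-disk layout: the function names and the rotation rule in parity_component() are assumptions made for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_component(stripe, n_components):
    """Hypothetical rotation rule: parity moves to the next component each stripe."""
    return stripe % n_components

def build_stripe(stripe, data_blocks, n_components):
    """Lay out n-1 data blocks plus one parity block across n components."""
    if len(data_blocks) != n_components - 1:
        raise ValueError("need exactly n-1 data blocks per stripe")
    row = list(data_blocks)
    row.insert(parity_component(stripe, n_components), xor_blocks(data_blocks))
    return row

def rebuild(row, failed):
    """Recover the block on a failed component from the surviving blocks."""
    return xor_blocks([blk for i, blk in enumerate(row) if i != failed])

# Stripe 1 of a three-component device: parity lands on component 1.
row = build_stripe(1, [b"\x0f\xf0", b"\x33\x33"], 3)
# Losing any single component is recoverable from the other two.
recovered = rebuild(row, 0)
```

- Because the parity block is the XOR of the data blocks, XORing the surviving blocks of a stripe reproduces whichever single block is missing, whether it held data or parity.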
Warning - Because RAID devices require the parity to be intermingled with data, running metainit on devices that comprise a RAID device destroys existing data.
Operation of RAID
- The following operations are supported for RAID metadevices using Solstice DiskSuite 4.0:
-
- Configuring and defining RAID metadevices.
- Concatenating new components to existing RAID metadevices.
- Allocating hot spare pools to RAID metadevices to provide component backup in the event of original component failures.
- Replacing errored components or in-use hot spare components with a new component.
- Resynchronizing components during reboot.
- Viewing the status of RAID metadevices.
- Clearing RAID metadevices.
- Each of these operations is discussed in more detail in the following sections.
Creating RAID Metadevices
- In defining a RAID metadevice, the definition of the metadevice is included on the command line as options and parameters to the metainit command. The following is an example of a RAID metadevice command line definition:
-
# metainit /dev/md/dsk/d80 -r /dev/dsk/c0t0d0s1 \
/dev/dsk/c1t0d0s1 /dev/dsk/c2t0d0s1 -i 8k
- The option parameters specified in the example above define the characteristics of the RAID metadevice. The -r option informs the metainit utility that the metadevice being defined is a RAID metadevice. The -i option defines the interlace size (8 kilobytes) that is to be used when configuring the RAID metadevice for striping of the data and parity regions of the components defined on the command line, /dev/dsk/c0t0d0s1, /dev/dsk/c1t0d0s1, and /dev/dsk/c2t0d0s1 in the example above.
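- For scripted setups, the command line above can be assembled programmatically. The following Python sketch only builds the argument list; it does not run metainit. The helper name metainit_raid_argv is hypothetical, and the only options it emits are the ones this chapter describes (-r, -i, and the -n validation option used in the examples later in the chapter).

```python
def metainit_raid_argv(metadevice, components, interlace=None, dry_run=False):
    """Build a metainit argument list for a RAID metadevice definition."""
    if len(components) < 3:
        # The chapter requires three or more physical partitions.
        raise ValueError("a RAID metadevice needs three or more components")
    argv = ["metainit"]
    if dry_run:
        argv.append("-n")            # validate only, do not initialize
    argv += [metadevice, "-r", *components]
    if interlace:
        argv += ["-i", interlace]    # for example "8k"
    return argv

argv = metainit_raid_argv(
    "/dev/md/dsk/d80",
    ["/dev/dsk/c0t0d0s1", "/dev/dsk/c1t0d0s1", "/dev/dsk/c2t0d0s1"],
    interlace="8k",
)
print(" ".join(argv))
```

- Keeping the command as a list rather than a single string avoids shell quoting problems if the list is later handed to a process-launching call.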
- The following sections describe how to check the status of RAID metadevices and how to alter their configuration. Step-by-step examples demonstrate the entire procedure for setting up RAID metadevices.
Resyncing RAID Devices
- All RAID metadevices are resynchronized during reboot by the metasync -r command (see metasync(1M)). During the resync operation, the state of the RAID metadevice is validated and any operations that may have been halted due to a system panic, a system reboot, or a failure to complete (possibly due to an I/O error on an individual component) are restarted. Any of the states defined in "Checking Status" are valid unless components are found to be in either the Initializing or Resyncing state and the overall RAID metadevice state contradicts this information. Upon validation of a RAID metadevice, if any components of the metadevice are in the Initializing or Resyncing state, the appropriate operation (init or resync, respectively) is reinitiated on those components.
- Once the metasync operation completes, the state of the metadevice may either be in the "Okay" state, indicating full component availability, or in one of the two maintenance states ("Maintenance" or "Last Erred"), indicating errors were encountered on component(s) during the resync operation. If one of the latter states prevails, use the metareplace(1M) command to perform the appropriate level of data recovery.
Reconfiguring RAID Devices
- Reconfiguration of a RAID metadevice means altering the original composition of the metadevice by concatenating additional components, replacing components (errored or non-errored), or assigning a hot spare pool to the metadevice (to provide a backup in the event that errors are encountered on any of the components of the metadevice). Each of these reconfiguration options is discussed in more detail in the following sections.
Concatenating Components
- Concatenation is the appending of new components to an existing metadevice. Concatenation of a component to a RAID metadevice allows the device to grow by allocating additional disk space from the concatenated components.
-
Note - No parity information is stored on the newly appended components.
- The following steps "grow" an existing metadevice by adding a single concatenated component:
-
- Use metattach(1M) specifying the RAID metadevice to grow and the new component(s) to add to the metadevice configuration.
- If a UFS filesystem exists on the RAID metadevice, run growfs(1M) to update the filesystem so that the additional space is allocated and recognized by the filesystem.
- At this point the new component has been attached to the metadevice and may be used as though it were an original component.
-
Note - Once a component is attached to a RAID device, it cannot be removed.
Replacing Components
- Once an I/O error is detected on a component of a RAID metadevice, no further I/Os will be performed on that component unless the component is in the "Last Erred" state (refer to "Checking Status"). In this situation, the administrator would most likely want to perform some type of error recovery so that the RAID device returns to a non-errored state and the possibility of data loss is reduced.
- There are two methods for performing recovery:
-
- attempt to recover from possible soft errors by enabling the currently errored component
- replace the existing errored component with a new component
- Both of these methods involve the use of the metareplace(1M) utility to perform component replacement and data recovery. During component replacement, data is recovered. If a hot spare is currently in use, the data is copied from the hot spare. When no hot spare is in use, data is rebuilt using the parity.
- The following steps demonstrate how to replace an errored component of a RAID metadevice in which only one component is errored.
-
- Determine if you have any additional components that are available to be used to replace the errored component.
- If other device components are available, run metareplace with the new component.
- If no other components are available, run metareplace with the -e option to attempt to recover from possible soft errors by resyncing the errored device.
- If multiple errors exist, the device in the "Maintenance" state must first be replaced or enabled.
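- The decision steps above can be sketched as a small Python helper. It only builds the metareplace argument list and orders the repairs; it does not run anything. The function names are hypothetical, and the states are the ones named in this chapter.

```python
def recovery_command(metadevice, errored, replacement=None):
    """Pick the metareplace form for one errored component."""
    if replacement:
        # A spare component is available: replace the errored one.
        return ["metareplace", metadevice, errored, replacement]
    # No spare: try to recover from a soft error by re-enabling in place.
    return ["metareplace", "-e", metadevice, errored]

def repair_order(states):
    """Order errored components: "Maintenance" before "Last Erred"."""
    priority = {"Maintenance": 0, "Last Erred": 1}
    errored = [comp for comp, state in states.items() if state in priority]
    return sorted(errored, key=lambda comp: priority[states[comp]])

states = {"c1t0d0s2": "Last Erred", "c2t0d0s2": "Okay", "c3t0d0s2": "Maintenance"}
for comp in repair_order(states):
    print(" ".join(recovery_command("d10", comp)))
```

- With the sample states shown, the component in the "Maintenance" state is handled first, matching the rule given in the last step.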
Caution - Replacing an errored component when multiple components are in error may cause data to be fabricated. The integrity of the data in this instance is questionable.
- Note that you can use the metareplace command on non-errored devices to change a disk. This can be useful for tuning performance of RAID devices.
Changing Hot Spare Pool Association
- The only parameter of a RAID metadevice that may be altered is the allocation of hot spare pools. The hot spare pool association can be changed on an existing RAID metadevice regardless of whether the metadevice is in use. Hot spare pools may be allocated, deallocated, or reassigned at any time unless a component in the hot spare pool is currently being used to replace an errored component in the RAID metadevice.
- The following steps describe how to allocate a hot spare pool to a RAID metadevice:
-
- Create and configure a RAID metadevice.
- Create a hot spare pool of unused disk partitions.
- Run the metaparam(1M) utility to assign this hot spare pool to an existing RAID metadevice.
- Once the hot spare pool has been assigned to the metadevice, any components that are currently errored or error in the future will be replaced by a component in the hot spare pool as long as hot spare components are available and are at least as large as the smallest component in the metadevice.
-
Note - To avoid data fabrication, DiskSuite will not allow hot sparing of a metadevice if any devices within that metadevice are in the "Last Erred" state.
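- The eligibility rules above (a spare must be available, must be at least as large as the smallest component, and is never used while a component is in the "Last Erred" state) can be expressed as a short Python sketch. The function name, the tuple layout, and the "first eligible spare wins" ordering are assumptions for illustration; the chapter does not say how DiskSuite chooses among several eligible spares.

```python
def pick_hot_spare(component_sizes, component_states, spares):
    """Return the name of an eligible hot spare, or None.

    component_sizes:  {component: size in blocks}
    component_states: {component: state string}
    spares:           list of (name, size in blocks, in_use) tuples
    """
    # Hot sparing is refused while any component is "Last Erred".
    if "Last Erred" in component_states.values():
        return None
    needed = min(component_sizes.values())
    for name, size, in_use in spares:
        # Assumption: take the first spare that is free and large enough.
        if not in_use and size >= needed:
            return name
    return None

sizes = {"c1t0d0s2": 5536, "c2t0d0s2": 5536, "c3t0d0s2": 6000}
states = {"c1t0d0s2": "Okay", "c2t0d0s2": "Okay", "c3t0d0s2": "Maintenance"}
spares = [("c6t0d0s2", 4000, False), ("c7t0d0s2", 5536, False)]
chosen = pick_hot_spare(sizes, states, spares)
```

- In this sample the first spare is skipped because it is smaller than the smallest component, so the second spare is selected.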
Checking Status
- Like other metadevices (mirrors, stripes, and concatenations), the status of RAID metadevices may be observed using the metastat(1M) command. The states of RAID metadevices vary, as do the states of their components. The following are brief descriptions of the possible states of RAID metadevices.
-
-
Initializing - This state indicates that the components are in the process of having all disk blocks zeroed. This is necessary due to the nature of RAID metadevices with respect to data and parity interlace striping. If an I/O error occurs during this process, the device will go into the "Maintenance" state. If the initialization fails, the metadevice is in the "Init Failed" state and the component is in the "Maintenance" state.
- To recover from this condition, run metaclear(1M) to clear the RAID metadevice and reinitialize the RAID metadevice with a new component to replace the one in error.
- Once the state of the RAID metadevice changes to the "Okay" state, the initialization process is complete and you are once again able to open the RAID device. Until then, applications that attempt to open the device will continue to receive an error message.
Hardware and Software Considerations
- There are both hardware and software considerations that affect RAID metadevices.
- The software considerations include:
-
- The values assigned to the interlace size when building a RAID metadevice
- Concatenation
- Performance
- The use of a RAID metadevice as a component of another metadevice
- The hardware considerations include:
-
- Mixing different size components
- The number of controllers
- Mixing components with different geometry
- The I/O load on the bus
Assigning Interlace Values
- The key to performance using RAID is the interlace value. The value is user configurable at the time a metadevice is created. Thereafter, the value cannot be modified.
- The interlace value defaults to 16 Kbytes. This is a reasonable value for most applications. If the different components in the RAID metadevice reside on different controllers and the accesses to the metadevice are primarily large sequential accesses, then an interlace value of 32 Kbytes may perform better.
Concatenating to a Device
- Concatenating a new component to an existing RAID metadevice will have an impact on the overall performance of the metadevice because the data on concatenated components is sequential; data is not striped across all components. The original components of the metadevice, as discussed in the introduction, have data and parity striped across all components. This striping is lost for the concatenated component, although the data will still be recoverable if this component errors since the parity will still be used during component I/O.
- Concatenated components also differ in that they do not have parity striped on any of their regions, thereby allowing the entire contents of the component (up to the current component size) to be available for data.
- Any performance enhancements for large or sequential writes are lost when components are concatenated.
Write Performance
- A RAID metadevice maintains parity for multiple partitions. When a write is issued to a RAID metadevice, multiple I/Os are performed to adjust the parity. For this reason, the type of application should be considered; applications with a high ratio of reads to writes will perform better on a RAID device.
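- To see why a single write turns into multiple I/Os, consider the usual read-modify-write parity update used by RAID Level 5 implementations in general. The Python sketch below illustrates that general technique; the chapter does not state that DiskSuite uses exactly this sequence, and the function names are hypothetical.

```python
def xor(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(old_data, old_parity, new_data):
    """Update one data block and return the new parity plus the I/Os needed."""
    # Remove the old data's contribution from parity, then add the new data's.
    new_parity = xor(xor(old_parity, old_data), new_data)
    ios = ["read old data", "read old parity", "write new data", "write new parity"]
    return new_parity, ios

d1, d2 = b"\x01\x02", b"\x10\x20"      # two data blocks in one stripe
parity = xor(d1, d2)                    # parity for the stripe
new_parity, ios = small_write(d1, parity, b"\xff\x00")
print(len(ios), "I/Os for one logical write")
```

- Under this scheme one logical write costs two reads and two writes, whereas a read of an intact stripe costs a single I/O, which is why read-heavy applications are a better fit.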
Performance of a Degraded Device
- When a RAID metadevice requires maintenance due to a failed disk, the parity is used to reconstruct the data; this requires reading multiple partitions to reconstruct the data. The more components assigned to a RAID metadevice with a failed disk, the longer a read or write operation will take. This applies to resyncing the metadevice as well as normal I/O activity.
RAID as a Component to a Device
- A RAID metadevice cannot be used as a submirror or as a component to a concatenation or stripe. A RAID metadevice may be used as either the master or log device of a metatrans device.
Mixing Different Size Components
- When different size disk components are used in a RAID metadevice, some disk space will be unused unless the unused portion is assigned to another metadevice. This is because the metadevice is limited by the smallest partition in the configuration (n times the smallest component, where n is the number of components in the metadevice). For example, if there are two 327 Mbyte partitions and one 661 Mbyte partition in a RAID metadevice, the metadevice will only use 327 Mbytes of the space on the 661 Mbyte partition.
- To assign the unused disk space to another metadevice, the component must be repartitioned (use format(1M)).
-
Note - You should never repartition a component which is in a RAID device.
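- The arithmetic behind the mixed-size example above can be checked with a short Python sketch. The function name is hypothetical, and the "data" figure assumes the usual RAID Level 5 property that one component's worth of space holds parity; it ignores any space DiskSuite itself reserves on each component.

```python
def raid5_space(sizes_mb):
    """Summarize space usage for components of differing sizes (in Mbytes)."""
    n = len(sizes_mb)
    smallest = min(sizes_mb)
    consumed = n * smallest             # every component contributes `smallest`
    return {
        "consumed": consumed,
        "unused": sum(sizes_mb) - consumed,
        "data": (n - 1) * smallest,     # assumption: one component's worth is parity
    }

print(raid5_space([327, 327, 661]))
```

- For two 327 Mbyte partitions and one 661 Mbyte partition, the metadevice consumes 981 Mbytes in total and leaves 334 Mbytes of the larger partition unused.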
Using Components with Different Geometry
- All components in a RAID metadevice should have the same number of sectors and tracks per cylinder. This is referred to as the disk geometry. The geometry is related to the capacity of the drive. Disk geometry varies depending on the manufacturer and type.
- The problem with the differing component geometries is that the UFS file system attempts to lay out file blocks in an efficient manner. UNIX counts on a knowledge of the component geometry and uses cylinder groups to attempt to minimize seek distances. If all components do not have the same geometry, the geometry of the first component is reported to the file system. This may cause the efficiency of the file system to suffer.
Controllers
- Building a RAID metadevice with all the component partitions on the same controller will adversely affect I/O performance. Also, creating a metadevice of components on different controller types can affect performance because some controllers are faster than others. The I/O throughput will be limited by the slowest controller.
- An example of a controller limiting performance is when several devices (e.g., 3 Mbyte per second disks) are attached to the same controller. In this instance, the throughput may be limited to the throughput of the controller and not the sum of the devices.
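- The controller bottleneck described above amounts to taking the smaller of two numbers. The following Python sketch makes that explicit; the 5 Mbyte per second controller limit is an invented figure for illustration only.

```python
def effective_throughput(device_rates, controller_limit):
    """Throughput of devices sharing one controller, in Mbytes per second."""
    return min(sum(device_rates), controller_limit)

# Four 3 Mbyte/s disks behind a hypothetical 5 Mbyte/s controller.
shared = effective_throughput([3, 3, 3, 3], controller_limit=5)
# The same disks spread over four such controllers, one disk each.
spread = sum(effective_throughput([3], controller_limit=5) for _ in range(4))
print(shared, spread)
```

- In this made-up case the shared controller caps the four disks at 5 Mbytes per second, while spreading them across controllers lets each disk run at its full rate.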
- Another factor to consider when configuring RAID metadevices with respect to controllers is the possibility of the controller being a single point of failure. RAID provides the capability to recover data when a single errored component exists within the configured metadevice, but when multiple components are errored (for example, after a controller failure) data recovery may or may not succeed.
Examples
- Examples of the basic RAID operations are defined in this section.
- The examples include:
-
- Configuring a RAID metadevice and monitoring its status during initialization
- Concatenating a new component to an existing RAID metadevice
- Replacing an errored component and monitoring the progress of the component resync process
Defining a RAID device
- The following example shows how to define a RAID metadevice of four components with an interlace size of 10 Mbytes.
- In the following example, the four components comprising the RAID metadevice are /dev/dsk/c1t0d0s2, /dev/dsk/c2t0d0s2, /dev/dsk/c3t0d0s2, and /dev/dsk/c4t0d0s2. The RAID metadevice will be identified by d10.
-
Verify that the components and RAID definition are valid using metainit -n.
- The -n option validates the command line syntax without performing actual metadevice initialization.
-
# /usr/opt/SUNWmd/sbin/metainit -n d10 -r /dev/dsk/c1t0d0s2 \
/dev/dsk/c2t0d0s2 /dev/dsk/c3t0d0s2 /dev/dsk/c4t0d0s2 -i 10m
-
-
If the configuration is accurate, run metainit to initialize the RAID metadevice.
-
# /usr/opt/SUNWmd/sbin/metainit d10 -r /dev/dsk/c1t0d0s2 \
/dev/dsk/c2t0d0s2 /dev/dsk/c3t0d0s2 /dev/dsk/c4t0d0s2 -i 10m
-
-
While the metadevice is initializing, you can use metastat to view the progress.
The RAID metadevice will not be available for use until the completion of the initialization cycle. At this point the state of the metadevice will transition from "Initializing" to "Okay".
-
# metastat d10
d10: RAID
State : Initializing
Initialization in progress: 16% done
Size : 16608 blocks
Original device:
Size : 16608 blocks
Device      Start Block  Dbase  State         Hot Spare
c1t0d0s2    471          No     Initializing
c2t0d0s2    170          No     Initializing
c3t0d0s2    170          No     Initializing
c4t0d0s2    170          No     Initializing
# metastat d10
d10: RAID
State : Okay
Size : 16608 blocks
Original device:
Size : 16608 blocks
Device      Start Block  Dbase  State         Hot Spare
c1t0d0s2    471          No     Okay
c2t0d0s2    170          No     Okay
c3t0d0s2    170          No     Okay
c4t0d0s2    170          No     Okay
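- When the initialization is monitored from a script, the metastat output shown above can be reduced to the overall state and the per-component states. The Python sketch below parses text in the layout printed in this example; it is not an official interface. The function name is hypothetical, the "Yes" value for the Dbase column is an assumption, and the parser does not separate a hot spare name from the state column.

```python
import re

# A component row: device name, start block, Dbase flag, then the state.
COMPONENT = re.compile(r"^\s*(c\d+t\d+d\d+s\d+)\s+(\d+)\s+(Yes|No)\s+(.+?)\s*$")
# The overall state line, for example "State : Initializing".
STATE = re.compile(r"^\s*State\s*:\s*(.+?)\s*$")

def parse_metastat(text):
    """Return (overall_state, {component: state}) from metastat-style text."""
    overall = None
    components = {}
    for line in text.splitlines():
        match = STATE.match(line)
        if match and overall is None:
            overall = match.group(1)
            continue
        match = COMPONENT.match(line)
        if match:
            components[match.group(1)] = match.group(4)
    return overall, components

SAMPLE = """d10: RAID
State : Initializing
Initialization in progress: 16% done
Size : 16608 blocks
Original device:
Size : 16608 blocks
Device Start Block Dbase State Hot Spare
c1t0d0s2 471 No Initializing
c2t0d0s2 170 No Initializing
"""

overall, components = parse_metastat(SAMPLE)
ready = overall == "Okay" and all(s == "Okay" for s in components.values())
```

- A polling loop could call such a parser repeatedly and treat the metadevice as usable only once the overall state and every component state read "Okay".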
Concatenating to a RAID Device
- The following example shows how to concatenate a new component to an existing RAID metadevice on which a file system exists.
- In the following example, the RAID metadevice is d10, and the new component to attach is /dev/dsk/c5t0d0s2.
-
-
Use metastat to check the status of the RAID metadevice to which the new component will be attached.
-
# metastat d10
d10: RAID
State : Okay
Size : 16608 blocks
Original device:
Size : 16608 blocks
Device      Start Block  Dbase  State         Hot Spare
c1t0d0s2    471          No     Okay
c2t0d0s2    170          No     Okay
c3t0d0s2    170          No     Okay
c4t0d0s2    170          No     Okay
-
-
If the state of the RAID configuration is stable, use metattach to attach the new component to this metadevice.
-
# metattach d10 /dev/dsk/c5t0d0s2
-
-
While the new component is initializing, you can use metastat to view the progress.
The new RAID component will not be available for use until the completion of the initialization cycle. At this point the state of the component will transition from "Initializing" to "Okay".
-
# metastat d10
d10: RAID
State : Okay
Size : 27680 blocks
Original device:
Size : 16608 blocks
Device      Start Block  Dbase  State         Hot Spare
c1t0d0s2    471          No     Okay
c2t0d0s2    170          No     Okay
c3t0d0s2    170          No     Okay
c4t0d0s2    170          No     Okay
Concatenated Devices:
Size : 11072 blocks
Device      Start Block  Dbase  State         Hot Spare
c5t0d0s2    935          No     Okay
-
-
Use the growfs command to expand the mounted file system. growfs is a non-destructive utility and may be issued while the file system is mounted.
-
# growfs -M /foo /dev/md/rdsk/d10
- where /foo is the mount point.
Recovering from Component Errors
- The following example shows how to recover when a single component in a RAID metadevice is errored.
- In the following example, the RAID metadevice is d10, and the component that will be used to replace the errored component is /dev/dsk/c5t0d0s2.
-
-
Using metastat, identify the component that is errored and needs to be replaced.
The state of the metadevice and the errored component will be "Maintenance". When in this state, a line will be displayed with the action that should be taken to recover from this state.
-
# metastat d10
d10: RAID
State : Maintenance
Invoke: metareplace d10 c3t0d0s2 <new device>
Size : 16608 blocks
Original device:
Size : 16608 blocks
Device      Start Block  Dbase  State         Hot Spare
c1t0d0s2    471          No     Okay
c2t0d0s2    170          No     Okay
c3t0d0s2    170          No     Maintenance
c4t0d0s2    170          No     Okay
-
-
Use the metareplace command to perform the replacement of the errored component as follows.
-
# metareplace d10 c3t0d0s2 c5t0d0s2
-
-
Use metastat to monitor the progress of the replacement. During the replacement of the errored component, the state of the metadevice and the new component will be "Resyncing". While in this state, you may continue to use the metadevice.
-
# metastat d10
d10: RAID
State : Resyncing
Resync in process : 21% done
Size : 16608 blocks
Original device:
Size : 16608 blocks
Device      Start Block  Dbase  State         Hot Spare
c1t0d0s2    471          No     Okay
c2t0d0s2    170          No     Okay
c5t0d0s2    170          No     Resyncing
c4t0d0s2    170          No     Okay