Chapter 1 ZFS File System (Introduction)

This chapter provides an overview of the ZFS file system and its features and benefits. This chapter also covers some basic terminology used throughout the rest of this book.

The following sections are provided in this chapter:

What's New in ZFS?

This section summarizes new features in the ZFS file system.

ZFS Log Device Enhancements

The following log device enhancements are available in the Solaris Express Community Edition:
Triple Parity RAIDZ (raidz3)

Solaris Express Community Edition, build 120: In this Solaris release, a redundant RAID-Z configuration can now have single-, double-, or triple-parity, which means that one, two, or three device failures, respectively, can be sustained without any data loss. You can specify the raidz3 keyword for a triple-parity RAID-Z configuration.

Holding ZFS Snapshots

Solaris Express Community Edition, build 121: If you implement different automatic snapshot policies so that older snapshots are being inadvertently destroyed by zfs receive because they no longer exist on the sending side, you might consider using the snapshot hold feature in this Solaris release. Holding a snapshot prevents it from being destroyed. In addition, this feature allows a snapshot with clones to be deleted pending the removal of the last clone by using the zfs destroy -d command. You can hold a snapshot or set of snapshots. For example, the following syntax puts a hold tag, keep, on tank/home/cindys/snap@1.
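A minimal sketch of the hold workflow for that snapshot follows. The run helper is a hypothetical dry-run wrapper added for illustration: it prints each command unless APPLY=1 is set. The zfs holds and zfs release steps are included as the usual companions to zfs hold; they are an addition to the text above, not part of it.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and an existing ZFS pool).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

snap=tank/home/cindys/snap@1   # snapshot name taken from the text

run zfs hold keep "$snap"      # place the hold tag "keep" on the snapshot
run zfs holds "$snap"          # list the holds on the snapshot
run zfs release keep "$snap"   # remove the hold so the snapshot can be destroyed
```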
For more information, see Holding ZFS Snapshots.

ZFS Device Replacement Enhancements

Solaris Express Community Edition, build 117: In this Solaris release, a system event or sysevent is provided when an underlying device is expanded. ZFS has been enhanced to recognize these events and adjust the pool based on the new size of the expanded LUN, depending on the setting of the autoexpand property. You can use the autoexpand pool property to enable or disable automatic pool expansion when a dynamic LUN expansion event is received. These features enable you to expand a LUN so that the resulting pool can access the expanded space without having to export and import the pool or reboot the system. For example, automatic LUN expansion is enabled on the tank pool.
Or, you can create the pool with the autoexpand property enabled.
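The two approaches above can be sketched as follows. The device name c1t13d0 is a placeholder, and run is a hypothetical dry-run helper that only prints commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and suitable devices).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

pool=tank        # pool name from the text
disk=c1t13d0     # hypothetical device name

# Option 1: enable automatic expansion on an existing pool, then confirm it.
run zpool set autoexpand=on "$pool"
run zpool get autoexpand "$pool"

# Option 2 (alternative to option 1): enable the property at creation time.
run zpool create -o autoexpand=on "$pool" "$disk"
```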
The autoexpand property is disabled by default so you can decide whether you want the LUN expanded or not. A LUN can also be expanded by using the zpool online -e command. For example:
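A minimal sketch of a manual expansion, assuming a pool named tank and a device named c1t6d0 (both placeholders). The run helper is a hypothetical dry-run wrapper that prints the command unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and an existing pool).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

pool=tank      # hypothetical pool name
disk=c1t6d0    # hypothetical device whose LUN has grown

# Expand the device to use all available space, regardless of autoexpand.
run zpool online -e "$pool" "$disk"
```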
Or, you can reset the autoexpand property after the LUN is attached or made available by using the zpool replace feature. For example, the following pool is created with one 8-Gbyte disk (c0t0d0). The 8-Gbyte disk is replaced with a 16-Gbyte disk (c1t13d0), but the pool size is not expanded until the autoexpand property is enabled.
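That sequence might look like the following sketch. The pool name pool is an assumption, the disk names come from the text, and run is a hypothetical dry-run helper that prints commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and the named disks).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

pool=pool          # hypothetical pool name
small=c0t0d0       # 8-Gbyte disk from the text
large=c1t13d0      # 16-Gbyte disk from the text

run zpool create "$pool" "$small"
run zpool list "$pool"                      # reports the 8-Gbyte size
run zpool replace "$pool" "$small" "$large"
run zpool list "$pool"                      # size is unchanged at this point
run zpool set autoexpand=on "$pool"
run zpool list "$pool"                      # size should now reflect the larger disk
```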
Another way to expand the LUN in the above example without enabling the autoexpand property is to use the zpool online -e command, even though the device is already online. For example:
Additional device replacement enhancements in this release include the following features:
For more information about replacing devices, see Replacing Devices in a Storage Pool.

ZFS User and Group Quotas

Solaris Express Community Edition, build 114: In previous Solaris releases, you could apply quotas and reservations to ZFS file systems to manage and reserve space. In this Solaris release, you can set a quota on the amount of space consumed by files that are owned by a particular user or group. You might consider setting user and group quotas in an environment with a large number of users or groups. You can set user or group quotas by using the userquota and groupquota properties with the zfs set command as follows:
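A minimal sketch of setting both kinds of quota. The user name, group name, dataset names, and sizes are all placeholders, and run is a hypothetical dry-run helper that prints commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and existing datasets).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

user=user1                    # hypothetical user
group=staff                   # hypothetical group

# Limit the space consumed by files owned by one user in tank/data.
run zfs set "userquota@$user=5G" tank/data
# Limit the space consumed by files owned by one group in tank/staff/admins.
run zfs set "groupquota@$group=10G" tank/staff/admins
```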
You can display a user's or group's current quota setting as follows:
Display general quota information as follows:
You can display individual user or group space usage by viewing the userused@user and groupused@group properties as follows:
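The display commands described above can be sketched together as follows. The user, group, and dataset names are placeholders, and run is a hypothetical dry-run helper that prints commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires an existing pool).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

user=user1     # hypothetical user
group=staff    # hypothetical group

# Current quota setting for one user and one group.
run zfs get "userquota@$user" tank/data
run zfs get "groupquota@$group" tank/staff/admins

# General per-user and per-group space and quota summary.
run zfs userspace tank/data
run zfs groupspace tank/staff

# Space currently consumed by one user and one group.
run zfs get "userused@$user" tank/staff
run zfs get "groupused@$group" tank/staff
```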
For more information about setting user quotas, see Setting ZFS Quotas and Reservations.

ZFS ACL Pass Through Inheritance for Execute Permission

Solaris Express Community Edition, build 103: In previous Solaris releases, you could apply ACL inheritance so that all files are created with 0664 or 0666 permissions. If you want to optionally include the execute bit from the file creation mode in the inherited ACL, you can use the pass through inheritance for execute permission in this release. If aclinherit=passthrough-x is enabled on a ZFS dataset, you can include execute permission for an output file that is generated from cc or gcc tools. If the inherited ACL does not include execute permission, then the executable output from the compiler won't be executable until you use the chmod command to change the file's permissions. For more information, see Example 8–13.

Automatic ZFS Snapshots

Solaris Express Community Edition, build 100: This release includes the Time Slider snapshot tool. This tool automatically snapshots ZFS file systems and allows you to browse and recover snapshots of file systems. For more information, see Managing Automatic ZFS Snapshots.

ZFS Property Enhancements

Solaris Express Community Edition, builds 96-99: The following ZFS file system enhancements are included in these releases.

ZFS Log Device Recovery

Solaris Express Community Edition, build 96: In this release, ZFS identifies intent log failures in the zpool status command output. FMA reports these errors as well. Both ZFS and FMA describe how to recover from an intent log failure. For example, if the system shuts down abruptly before synchronous write operations are committed to a pool with a separate log device, you will see messages similar to the following:
You will need to resolve the log device failure in the following ways:
If you want to recover from this error without replacing the failed log device, you can clear the error with the zpool clear command. In this scenario, the pool will operate in degraded mode and the log records will be written to the main pool until the separate log device is replaced. Consider using mirrored log devices to reduce the risk of log device failure.

Using ZFS ACL Sets

Solaris Express Community Edition, build 95: This release provides the ability to apply NFSv4-style ACLs in sets, rather than apply different ACL permissions individually. The following ACL sets are provided:
These ACL sets are predefined and cannot be modified. For more information about using ACL sets, see Example 8–5.

Using Cache Devices in Your ZFS Storage Pool

Solaris Express Community Edition, build 78: In this Solaris release, you can create a pool and specify cache devices, which are used to cache storage pool data. Cache devices provide an additional layer of caching between main memory and disk. Using cache devices provides the greatest performance improvement for random-read workloads of mostly static content. One or more cache devices can be specified when the pool is created. For example:
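A minimal sketch of creating a mirrored pool with one cache device. The pool and device names are placeholders, and run is a hypothetical dry-run helper that prints commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and the named disks).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

pool=pool   # hypothetical pool name

# Two mirrored data disks plus one cache device (all names hypothetical).
run zpool create "$pool" mirror c0t2d0 c0t4d0 cache c0t0d0
# Show the resulting layout, including the cache device.
run zpool status "$pool"
```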
After cache devices are added, they gradually fill with content from main memory. Depending on the size of your cache device, it could take over an hour for them to fill. Capacity and reads can be monitored by using the zpool iostat command as follows:
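A minimal sketch of that monitoring command, assuming a pool named pool and a 5-second reporting interval (both placeholders). The run helper is a hypothetical dry-run wrapper that prints the command unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires an existing pool).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

pool=pool       # hypothetical pool name
interval=5      # seconds between reports (assumed value)

# Per-device capacity and I/O statistics, including cache devices.
run zpool iostat -v "$pool" "$interval"
```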
Cache devices can be added or removed from the pool after the pool is created. For more information, see Creating a ZFS Storage Pool with Cache Devices and Example 4–4.

ZFS Installation and Boot Support

Solaris Express Community Edition, build 90: This release provides the ability to install and boot a ZFS root file system. You can use the initial installation option or the JumpStart feature to install a ZFS root file system. Or, you can use the Live Upgrade feature to migrate a UFS root file system to a ZFS root file system. ZFS support for swap and dump devices is also provided. For more information, see Chapter 5, Installing and Booting a ZFS Root File System. For a list of known issues with this release, go to the following site: http://opensolaris.org/os/community/zfs/boot

Rolling Back a Dataset Without Unmounting

Solaris Express Community Edition, build 80: This release provides the ability to roll back a dataset without unmounting it first. This feature means that the zfs rollback -f option is no longer needed to force an unmount operation. The -f option is no longer supported, and is ignored if specified.

Enhancements to the zfs send Command

Solaris Express Community Edition, build 77: This release includes the following enhancements to the zfs send command.
For more information, see Sending and Receiving Complex ZFS Snapshot Streams.

ZFS Quotas and Reservations for File System Data Only

Solaris Express Community Edition, build 77: In addition to the existing ZFS quota and reservation features, this release includes dataset quotas and reservations that do not include descendants, such as snapshots and clones, in the space consumption accounting.
For example, you can set a 10-Gbyte refquota for studentA that sets a 10-Gbyte hard limit of referenced space. For additional flexibility, you can set a 20-Gbyte quota that allows you to manage studentA's snapshots.
For more information, see Setting ZFS Quotas and Reservations.

ZFS File System Properties for the Solaris CIFS Service

Solaris Express Community Edition, build 77: This release provides support for the Solaris Common Internet File System (CIFS) service. This product provides the ability to share files between Solaris and Windows or MacOS systems. To facilitate sharing files between these systems by using the Solaris CIFS service, the following new ZFS properties are provided:
Currently, the sharesmb property is available to share ZFS files in the Solaris CIFS environment. More ZFS CIFS-related properties will be available in an upcoming release. For information about using the sharesmb property, see Sharing ZFS Files in a Solaris CIFS Environment. In addition to the ZFS properties added for supporting the Solaris CIFS software product, the vscan property is available for scanning ZFS files if you have a third-party virus scanning engine.

ZFS Storage Pool Properties

Solaris Express Community Edition, build 77: ZFS storage pool properties were introduced in an earlier release. This release provides for additional property information. For example:
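A minimal sketch of the studentA example. The dataset path students/studentA is an assumption (the text names only studentA), and run is a hypothetical dry-run helper that prints commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and an existing dataset).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

fs=students/studentA   # hypothetical dataset path

# Hard limit on referenced space only (snapshots and clones not counted).
run zfs set refquota=10g "$fs"
# Overall limit that also covers the space held by snapshots.
run zfs set quota=20g "$fs"
# Confirm both settings.
run zfs get refquota,quota "$fs"
```

With these values, up to 10 Gbytes of the 20-Gbyte quota remains available for snapshot data after the referenced space is full.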
For a description of these properties, see Table 4–1.
ZFS and File System Mirror Mounts

Solaris Express Community Edition, build 77: In this Solaris release, NFSv4 mount enhancements are provided to make ZFS file systems more accessible to NFS clients. When file systems are created on the NFS server, the NFS client can automatically discover these newly created file systems within their existing mount of a parent file system. For example, if the server neo already shares the tank file system and client zee has it mounted, /tank/baz is automatically visible on the client after it is created on the server.

ZFS Command History Enhancements (zpool history)

Solaris Express Community Edition, build 69: The zpool history command has been enhanced to provide the following new features:
For more information about using the zpool history command, see Identifying Problems in ZFS.

Upgrading ZFS File Systems (zfs upgrade)

Solaris Express Community Edition, build 69: The zfs upgrade command is included in this release to provide future ZFS file system enhancements to existing file systems. ZFS storage pools have a similar upgrade feature to provide pool enhancements to existing storage pools. For example:
Note: File systems that are upgraded and any streams created from those upgraded file systems by the zfs send command are not accessible on systems that are running older software releases.

ZFS Delegated Administration

Solaris Express Community Edition, build 69: In this release, you can delegate fine-grained permissions to perform ZFS administration tasks to non-privileged users. You can use the zfs allow and zfs unallow commands to grant and remove permissions. You can modify the ability to use delegated administration with the pool's delegation property. For example:
By default, the delegation property is enabled. For more information, see Chapter 9, ZFS Delegated Administration and zfs(1M).

Setting Up Separate ZFS Logging Devices

Solaris Express Community Edition, build 68: The ZFS intent log (ZIL) is provided to satisfy POSIX requirements for synchronous transactions. For example, databases often require their transactions to be on stable storage devices when returning from a system call. NFS and other applications can also use fsync() to ensure data stability. By default, the ZIL is allocated from blocks within the main storage pool. However, better performance might be possible by using separate intent log devices in your ZFS storage pool, such as with NVRAM or a dedicated disk. Log devices for the ZFS intent log are not related to database log files. You can set up a ZFS logging device when the storage pool is created or after the pool is created. For examples of setting up log devices, see Creating a ZFS Storage Pool with Log Devices and Adding Devices to a Storage Pool. You can attach a log device to an existing log device to create a mirrored log device. This operation is identical to attaching a device in an unmirrored storage pool. Consider the following points when determining whether setting up a ZFS log device is appropriate for your environment:

Creating Intermediate ZFS Datasets

Solaris Express Community Edition, build 68: You can use the -p option with the zfs create, zfs clone, and zfs rename commands to quickly create an intermediate dataset if it doesn't already exist. For example, create ZFS datasets (users/area51) in the datab storage pool.
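A minimal sketch of that example, using the pool and dataset names from the text. The run helper is a hypothetical dry-run wrapper that prints commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and the datab pool).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

target=datab/users/area51   # names taken from the text

# Create area51 and, if it is missing, the intermediate dataset datab/users.
run zfs create -p "$target"
# List the pool's datasets to confirm that both now exist.
run zfs list -r datab
```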
If the intermediate dataset exists during the create operation, the operation completes successfully. Properties specified apply to the target dataset, not to the intermediate datasets. For example:
The intermediate dataset is created with the default mount point. Any additional properties are disabled for the intermediate dataset. For example:
For more information, see zfs(1M).

ZFS Hotplugging Enhancements

Solaris Express Community Edition, build 68: In this release, ZFS more effectively responds to devices that are removed and provides a mechanism to automatically identify devices that are inserted with the following enhancements:
For more information, see zpool(1M).

Recursively Renaming ZFS Snapshots (zfs rename -r)

Solaris Express Community Edition, build 63: You can recursively rename all descendent ZFS snapshots by using the zfs rename -r command. For example, snapshot a set of ZFS file systems.
Then, rename the snapshots the following day.
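The two steps above can be sketched as follows. The file system name users/home and the snapshot names today and yesterday are placeholders, and run is a hypothetical dry-run helper that prints commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and an existing file system).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

fs=users/home   # hypothetical parent file system

# Day 1: snapshot the file system and all of its descendents.
run zfs snapshot -r "$fs@today"
# Day 2: rename that snapshot on every descendent in one step.
run zfs rename -r "$fs@today" "$fs@yesterday"
# List snapshots to confirm the new names.
run zfs list -t snapshot -r "$fs"
```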
Snapshots are the only dataset that can be renamed recursively. For more information about snapshots, see Overview of ZFS Snapshots and this blog entry that describes how to create rolling snapshots: http://blogs.sun.com/mmusante/entry/rolling_snapshots_made_easy

ZFS Boot Support on x86 Systems

Solaris Express Community Edition, build 62: In this Solaris release, support for booting a ZFS file system is available on x86 systems. For more information, see: http://www.opensolaris.org/os/community/zfs/boot

GZIP Compression is Available for ZFS

Solaris Express Community Edition, build 62: In this Solaris release, you can set gzip compression on ZFS file systems in addition to lzjb compression. You can specify compression as gzip, which uses the default level (equivalent to gzip-6), or as gzip-N, where N equals 1 through 9. For example:
For more information about setting ZFS properties, see Setting ZFS Properties.

Storing Multiple Copies of ZFS User Data

Solaris Express Community Edition, build 61: As a reliability feature, ZFS file system metadata is automatically stored multiple times across different disks, if possible. This feature is known as ditto blocks. In this Solaris release, you can specify that multiple copies of user data are also stored per file system by using the zfs set copies command. For example:
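A minimal sketch of both forms of the setting. The dataset names are placeholders, and run is a hypothetical dry-run helper that prints commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and existing datasets).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# gzip at its default compression level (dataset name hypothetical).
run zfs set compression=gzip users/home/snapshots
# gzip at the highest compression level, 9 (dataset name hypothetical).
run zfs set compression=gzip-9 users/home/oldfiles
# Confirm the settings.
run zfs get compression users/home/snapshots users/home/oldfiles
```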
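A minimal sketch of setting and checking the property, assuming a file system named users/home (a placeholder). The run helper is a hypothetical dry-run wrapper that prints commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and an existing file system).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

fs=users/home   # hypothetical file system

# Store two copies of user data written to this file system from now on.
run zfs set copies=2 "$fs"
# Confirm the setting.
run zfs get copies "$fs"
```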
Available values are 1, 2, or 3. The default value is 1. These copies are in addition to any pool-level redundancy, such as in a mirrored or RAID-Z configuration. The benefits of storing multiple copies of ZFS user data are as follows:
Depending on the allocation of the ditto blocks in the storage pool, multiple copies might be placed on a single disk. A subsequent full disk failure might cause all ditto blocks to be unavailable. You might consider using ditto blocks when you accidentally create a non-redundant pool and when you need to set data retention policies. For a detailed description of how setting copies on a system with a single-disk pool or a multiple-disk pool might impact overall data protection, see this blog: http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection For more information about setting ZFS properties, see Setting ZFS Properties.

Improved zpool status Output

Solaris Express Community Edition, build 57: You can use the zpool status -v command to display a list of files with persistent errors. Previously, you had to use the find -inum command to identify the filenames from the list of displayed inodes. For more information about displaying a list of files with persistent errors, see Repairing a Corrupted File or Directory.

ZFS and Solaris iSCSI Improvements

Solaris Express Community Release, build 54: In this Solaris release, you can create a ZFS volume as a Solaris iSCSI target device by setting the shareiscsi property on the ZFS volume. This method is a convenient way to quickly set up a Solaris iSCSI target. For example:
After the iSCSI target is created, set up the iSCSI initiator. For information about setting up a Solaris iSCSI initiator, see Chapter 14, Configuring Solaris iSCSI Targets and Initiators (Tasks), in System Administration Guide: Devices and File Systems. For more information about managing a ZFS volume as an iSCSI target, see Using a ZFS Volume as a Solaris iSCSI Target.

Sharing ZFS File System Enhancements

Solaris Express Community Release, build 53: In this Solaris release, the process of sharing file systems has been improved. Although modifying system configuration files, such as /etc/dfs/dfstab, is unnecessary for sharing ZFS file systems, you can use the sharemgr command to manage ZFS share properties. The sharemgr command enables you to set and manage share properties on share groups. ZFS shares are automatically designated in the zfs share group. As in previous releases, you can set the ZFS sharenfs property on a ZFS file system to share a ZFS file system. For example:
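A minimal sketch of creating a volume and exposing it as an iSCSI target. The volume name and size are placeholders, and run is a hypothetical dry-run helper that prints commands unless APPLY=1 is set. The final iscsitadm step assumes the Solaris iSCSI target tools are installed.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and an existing pool).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

vol=tank/volumes/v2   # hypothetical volume name

# Create a 2-Gbyte ZFS volume (size is an assumed value).
run zfs create -V 2g "$vol"
# Share the volume as an iSCSI target.
run zfs set shareiscsi=on "$vol"
# List the targets to confirm that one was created for the volume.
run iscsitadm list target
```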
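A minimal sketch of sharing a file system through the sharenfs property, assuming a file system named tank/home (a placeholder). The run helper is a hypothetical dry-run wrapper that prints commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and an existing file system).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

fs=tank/home   # hypothetical file system

# Share the file system over NFS with default options.
run zfs set sharenfs=on "$fs"
# Confirm the setting.
run zfs get sharenfs "$fs"
```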
Or, you can use the new sharemgr add-share subcommand to share a ZFS file system in the zfs share group. For example:
Then, you can use the sharemgr command to manage ZFS shares. The following example shows how to use sharemgr to set the nosuid property on the shared ZFS file systems. You must preface ZFS share paths with a /zfs designation.
For more information, see sharemgr(1M).

ZFS Command History (zpool history)

Solaris Express Community Release, build 51: In this Solaris release, ZFS automatically logs successful zfs and zpool commands that modify pool state information. For example:
This feature enables you or Sun support personnel to identify the exact set of ZFS commands that was executed to troubleshoot an error scenario. You can identify a specific storage pool with the zpool history command. For example:
In this Solaris release, the zpool history command does not record user-ID, hostname, or zone-name. For more information, see ZFS Command History Enhancements (zpool history). For more information about troubleshooting ZFS problems, see Identifying Problems in ZFS.

ZFS Property Improvements

ZFS xattr Property

Solaris Express Community Release, build 56: You can use the xattr property to disable or enable extended attributes for a specific ZFS file system. The default value is on. For a description of ZFS properties, see Introducing ZFS Properties.

ZFS canmount Property

Solaris Express Community Release, build 48: The new canmount property allows you to specify whether a dataset can be mounted by using the zfs mount command. For more information, see The canmount Property.

ZFS User Properties

Solaris Express Community Release, build 48: In addition to the standard native properties that can either export internal statistics or control ZFS file system behavior, ZFS supports user properties. User properties have no effect on ZFS behavior, but you can use them to annotate datasets with information that is meaningful in your environment. For more information, see ZFS User Properties.

Setting Properties When Creating ZFS File Systems

Solaris Express Community Release, build 48: In this Solaris release, you can set properties when you create a file system, in addition to setting properties after the file system is created. The following examples illustrate equivalent syntax:

Displaying All ZFS File System Information

Solaris Express Community Release, build 48: In this Solaris release, you can use various forms of the zfs get command to display information about all datasets if you do not specify a dataset, or if you specify all. In previous releases, all dataset information was not retrievable with the zfs get command. For example:

New zfs receive -F Option

Solaris Express Community Release, build 48: In this Solaris release, you can use the new -F option to the zfs receive command to force a rollback of the file system to the most recent snapshot before doing the receive. Using this option might be necessary when the file system is modified between the time a rollback occurs and the receive is initiated. For more information, see Receiving a ZFS Snapshot.

Recursive ZFS Snapshots

Solaris Express Community Release, build 43: When you use the zfs snapshot command to create a file system snapshot, you can use the -r option to recursively create snapshots for all descendent file systems. In addition, using the -r option recursively destroys all descendent snapshots when a snapshot is destroyed. Recursive ZFS snapshots are created quickly as one atomic operation. The snapshots are created together (all at once) or not created at all. The benefit of atomic snapshot operations is that the snapshot data is always taken at one consistent time, even across descendent file systems. For more information, see Creating and Destroying ZFS Snapshots.

Double Parity RAID-Z (raidz2)

Solaris Express Community Release, build 42: A redundant RAID-Z configuration can now have either single- or double-parity, which means that one or two device failures, respectively, can be sustained without any data loss. You can specify the raidz2 keyword for a double-parity RAID-Z configuration. Or, you can specify the raidz or raidz1 keyword for a single-parity RAID-Z configuration. For more information, see Creating RAID-Z Storage Pools or zpool(1M).

Hot Spares for ZFS Storage Pool Devices

Solaris Express Community Release, build 42: The ZFS hot spares feature enables you to identify disks that could be used to replace a failed or faulted device in one or more storage pools. Designating a device as a hot spare means that if an active device in the pool fails, the hot spare automatically replaces the failed device. Or, you can manually replace a device in a storage pool with a hot spare. For more information, see Designating Hot Spares in Your Storage Pool and zpool(1M).

Replacing a ZFS File System With a ZFS Clone (zfs promote)

Solaris Express Community Release, build 42: The zfs promote command enables you to replace an existing ZFS file system with a clone of that file system. This feature is helpful when you want to run tests on an alternative version of a file system and then make that alternative version of the file system the active file system. For more information, see Replacing a ZFS File System With a ZFS Clone and zfs(1M).

Upgrading ZFS Storage Pools (zpool upgrade)

Solaris Express Community Release, build 39: You can upgrade your storage pools to a newer version to take advantage of the latest features by using the zpool upgrade command. In addition, the zpool status command has been modified to notify you when your pools are running older versions. For more information, see Upgrading ZFS Storage Pools and zpool(1M). If you want to use the ZFS Administration console on a system with a pool from a previous Solaris release, make sure you upgrade your pools before using the ZFS Administration console. To see if your pools need to be upgraded, use the zpool status command. For information about the ZFS Administration console, see ZFS Web-Based Management.

Using ZFS to Clone Non-Global Zones and Other Enhancements

Solaris Express Community Release, build 39: When the source zonepath and the target zonepath both reside on ZFS and are in the same pool, zoneadm clone now automatically uses the ZFS clone feature to clone a zone. This enhancement means that zoneadm clone will take a ZFS snapshot of the source zonepath and set up the target zonepath. The snapshot is named SUNWzoneX, where X is a unique ID used to distinguish between multiple snapshots. The destination zone's zonepath is used to name the ZFS clone. A software inventory is performed so that a snapshot used at a future time can be validated by the system. Note that you can still specify that the ZFS zonepath be copied instead of the ZFS clone, if desired. To clone a source zone multiple times, a new parameter added to zoneadm allows you to specify that an existing snapshot should be used. The system validates that the existing snapshot is usable on the target. Additionally, the zone install process now has the capability to detect when a ZFS file system can be created for a zone, and the uninstall process can detect when a ZFS file system in a zone can be destroyed. These steps are then performed automatically by the zoneadm command. Keep the following points in mind when using ZFS on a system with Solaris containers installed:
For more information, see System Administration Guide: Virtualization Using the Solaris Operating System.

ZFS Backup and Restore Commands are Renamed

Solaris Express Community Release, build 38: In this Solaris release, the zfs backup and zfs restore commands are renamed to zfs send and zfs receive to more accurately describe their function. The function of these commands is to save and restore ZFS data stream representations. For more information about these commands, see Sending and Receiving ZFS Data.

Recovering Destroyed Storage Pools

Solaris Express Community Release, build 37: This release includes the zpool import -D command, which enables you to recover pools that were previously destroyed with the zpool destroy command. For more information, see Recovering Destroyed ZFS Storage Pools.

ZFS is Integrated With Fault Manager

Solaris Express Community Release, build 36: This release includes the integration of a ZFS diagnostic engine that is capable of diagnosing and reporting pool failures and device failures. Checksum, I/O, device, and pool errors associated with pool or device failures are also reported. The diagnostic engine does not include predictive analysis of checksum and I/O errors, nor does it include proactive actions based on fault analysis. In the event of a ZFS failure, you might see a message similar to the following from fmd:
By reviewing the recommended action, which will be to follow the more specific directions in the zpool status command, you will be able to quickly identify and resolve the failure. For an example of recovering from a reported ZFS problem, see Resolving a Missing Device.

New zpool clear Command

Solaris Express Community Release, build 36: This release includes the zpool clear command for clearing error counts associated with a device or the pool. Previously, error counts were cleared when a device in a pool was brought online with the zpool online command. For more information, see zpool(1M) and Clearing Storage Pool Devices.

Compact NFSv4 ACL Format

Solaris Express Community Release, build 34: In this release, three NFSv4 ACL formats are available: verbose, positional, and compact. The new compact and positional ACL formats are available to set and display ACLs. You can use the chmod command to set all three ACL formats. You can use the ls -V command to display compact and positional ACL formats and the ls -v command to display verbose ACL formats. For more information, see Setting and Displaying ACLs on ZFS Files in Compact Format, chmod(1), and ls(1).

File System Monitoring Tool (fsstat)

Solaris Express Community Release, build 34: A new file system monitoring tool, fsstat, is available to report file system operations. Activity can be reported by mount point or by file system type. The following example shows general ZFS file system activity.
For more information, see fsstat(1M).

ZFS Web-Based Management

Solaris Express Community Release, build 28: A web-based ZFS management tool is available to perform many administrative actions. With this tool, you can perform the following tasks:
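The equivalent forms can be sketched as follows. The file system name, mount point, and chosen properties are placeholders, and run is a hypothetical dry-run helper that prints commands unless APPLY=1 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set APPLY=1 to execute for real (requires root and an existing pool).
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

fs=tank/home          # hypothetical file system
mnt=/export/zfs       # hypothetical mount point

# Form 1: create the file system, then set each property afterward.
run zfs create "$fs"
run zfs set mountpoint="$mnt" "$fs"
run zfs set sharenfs=on "$fs"
run zfs set compression=on "$fs"

# Form 2 (alternative to form 1): set the same properties at creation time.
run zfs create -o mountpoint="$mnt" -o sharenfs=on -o compression=on "$fs"
```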
You can access the ZFS Administration console through a secure web browser at the following URL:
If you type the appropriate URL and are unable to reach the ZFS Administration console, the server might not be started. To start the server, run the following command:
If you want the server to run automatically when the system boots, run the following command:
Note – You cannot use the Solaris Management Console (smc) to manage ZFS storage pools or file systems. What Is ZFS?The Solaris ZFS file system is a revolutionary new file system that fundamentally changes the way file systems are administered, with features and benefits not found in any other file system available today. ZFS has been designed to be robust, scalable, and simple to administer. ZFS Pooled StorageZFS uses the concept of storage pools to manage physical storage. Historically, file systems were constructed on top of a single physical device. To address multiple devices and provide for data redundancy, the concept of a volume manager was introduced to provide the image of a single device so that file systems would not have to be modified to take advantage of multiple devices. This design added another layer of complexity and ultimately prevented certain file system advances, because the file system had no control over the physical placement of data on the virtualized volumes. ZFS eliminates the volume management altogether. Instead of forcing you to create virtualized volumes, ZFS aggregates devices into a storage pool. The storage pool describes the physical characteristics of the storage (device layout, data redundancy, and so on,) and acts as an arbitrary data store from which file systems can be created. File systems are no longer constrained to individual devices, allowing them to share space with all file systems in the pool. You no longer need to predetermine the size of a file system, as file systems grow automatically within the space allocated to the storage pool. When new storage is added, all file systems within the pool can immediately use the additional space without additional work. In many ways, the storage pool acts as a virtual memory system. When a memory DIMM is added to a system, the operating system doesn't force you to invoke some commands to configure the memory and assign it to individual processes. 
All processes on the system automatically use the additional memory. Transactional SemanticsZFS is a transactional file system, which means that the file system state is always consistent on disk. Traditional file systems overwrite data in place, which means that if the machine loses power, for example, between the time a data block is allocated and when it is linked into a directory, the file system will be left in an inconsistent state. Historically, this problem was solved through the use of the fsck command. This command was responsible for going through and verifying file system state, making an attempt to repair any inconsistencies in the process. This problem caused great pain to administrators and was never guaranteed to fix all possible problems. More recently, file systems have introduced the concept of journaling. The journaling process records action in a separate journal, which can then be replayed safely if a system crash occurs. This process introduces unnecessary overhead, because the data needs to be written twice, and often results in a new set of problems, such as when the journal can't be replayed properly. With a transactional file system, data is managed using copy on write semantics. Data is never overwritten, and any sequence of operations is either entirely committed or entirely ignored. This mechanism means that the file system can never be corrupted through accidental loss of power or a system crash. So, no need for a fsck equivalent exists. While the most recently written pieces of data might be lost, the file system itself will always be consistent. In addition, synchronous data (written using the O_DSYNC flag) is always guaranteed to be written before returning, so it is never lost. Checksums and Self-Healing DataWith ZFS, all data and metadata is checksummed using a user-selectable algorithm. 
Traditional file systems that do provide checksumming have performed it on a per-block basis, out of necessity due to the volume management layer and traditional file system design. The traditional design means that certain failure modes, such as writing a complete block to an incorrect location, can result in properly checksummed data that is actually incorrect. ZFS checksums are stored in such a way that these failure modes are detected and can be recovered from gracefully. All checksumming and data recovery is done at the file system layer, and is transparent to applications.

In addition, ZFS provides for self-healing data. ZFS supports storage pools with varying levels of data redundancy, including mirroring and a variation on RAID-5. When a bad data block is detected, ZFS fetches the correct data from another redundant copy and repairs the bad data, replacing it with the good copy.

Unparalleled Scalability

ZFS has been designed from the ground up to be the most scalable file system ever. The file system itself is 128-bit, allowing for 256 quadrillion zettabytes of storage. All metadata is allocated dynamically, so no need exists to pre-allocate inodes or otherwise limit the scalability of the file system when it is first created. All the algorithms have been written with scalability in mind. Directories can have up to 2^48 (256 trillion) entries, and no limit exists on the number of file systems or the number of files that can be contained within a file system.

ZFS Snapshots

A snapshot is a read-only copy of a file system or volume. Snapshots can be created quickly and easily. Initially, snapshots consume no additional space within the pool.

As data within the active dataset changes, the snapshot consumes space by continuing to reference the old data. As a result, the snapshot prevents the data from being freed back to the pool.

Simplified Administration

Most importantly, ZFS provides a greatly simplified administration model.
Through the use of hierarchical file system layout, property inheritance, and automatic management of mount points and NFS share semantics, ZFS makes it easy to create and manage file systems without needing multiple commands or editing configuration files. You can easily set quotas or reservations, turn compression on or off, or manage mount points for numerous file systems with a single command. Devices can be examined or repaired without having to understand a separate set of volume manager commands. You can take an unlimited number of instantaneous snapshots of file systems. You can back up and restore individual file systems.

ZFS manages file systems through a hierarchy that allows for this simplified management of properties such as quotas, reservations, compression, and mount points. In this model, file systems become the central point of control. File systems themselves are very cheap (equivalent to a new directory), so you are encouraged to create a file system for each user, project, workspace, and so on. This design allows you to define fine-grained management points.

ZFS Terminology

This section describes the basic terminology used throughout this book:
ZFS Component Naming Requirements

Each ZFS component must be named according to the following rules: