Diffstat (limited to 'sbin/raidctl/raidctl.8')
-rw-r--r-- | sbin/raidctl/raidctl.8 | 1325 |
1 files changed, 1325 insertions, 0 deletions
diff --git a/sbin/raidctl/raidctl.8 b/sbin/raidctl/raidctl.8 new file mode 100644 index 0000000..9aef14f --- /dev/null +++ b/sbin/raidctl/raidctl.8 @@ -0,0 +1,1325 @@ +.\" $FreeBSD$ +.\" $NetBSD: raidctl.8,v 1.21 2000/08/10 15:14:14 oster Exp $ +.\" +.\" Copyright (c) 1998 The NetBSD Foundation, Inc. +.\" All rights reserved. +.\" +.\" This code is derived from software contributed to The NetBSD Foundation +.\" by Greg Oster +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the NetBSD +.\" Foundation, Inc. and its contributors. +.\" 4. Neither the name of The NetBSD Foundation nor the names of its +.\" contributors may be used to endorse or promote products derived +.\" from this software without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS +.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED +.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS +.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +.\" POSSIBILITY OF SUCH DAMAGE. +.\" +.\" +.\" Copyright (c) 1995 Carnegie-Mellon University. +.\" All rights reserved. +.\" +.\" Author: Mark Holland +.\" +.\" Permission to use, copy, modify and distribute this software and +.\" its documentation is hereby granted, provided that both the copyright +.\" notice and this permission notice appear in all copies of the +.\" software, derivative works or modified versions, and any portions +.\" thereof, and that both notices appear in supporting documentation. +.\" +.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" +.\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND +.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. +.\" +.\" Carnegie Mellon requests users of this software to return to +.\" +.\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU +.\" School of Computer Science +.\" Carnegie Mellon University +.\" Pittsburgh PA 15213-3890 +.\" +.\" any improvements or extensions that they make and grant Carnegie the +.\" rights to redistribute these changes. 
+.\"
+.Dd November 6, 1998
+.Dt RAIDCTL 8
+.Os FreeBSD
+.Sh NAME
+.Nm raidctl
+.Nd configuration utility for the RAIDframe disk driver
+.Sh SYNOPSIS
+.Nm
+.Op Fl v
+.Fl a Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl A Op yes | no | root
+.Ar dev
+.Nm
+.Op Fl v
+.Fl B Ar dev
+.Nm
+.Op Fl v
+.Fl c Ar config_file
+.Nm
+.Op Fl v
+.Fl C Ar config_file
+.Nm
+.Op Fl v
+.Fl f Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl F Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl g Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl i Ar dev
+.Nm
+.Op Fl v
+.Fl I Ar serial_number Ar dev
+.Nm
+.Op Fl v
+.Fl p Ar dev
+.Nm
+.Op Fl v
+.Fl P Ar dev
+.Nm
+.Op Fl v
+.Fl r Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl R Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl s Ar dev
+.Nm
+.Op Fl v
+.Fl S Ar dev
+.Nm
+.Op Fl v
+.Fl u Ar dev
+.Sh DESCRIPTION
+.Nm
+is the user-land control program for
+.Xr raid 4 ,
+the RAIDframe disk device.
+.Nm
+is primarily used to dynamically configure and unconfigure RAIDframe disk
+devices. For more information about the RAIDframe disk device, see
+.Xr raid 4 .
+.Pp
+This document assumes the reader has at least rudimentary knowledge of
+RAID and RAID concepts.
+.Pp
+The command-line options for
+.Nm
+are as follows:
+.Bl -tag -width indent
+.It Fl a Ar component Ar dev
+Add
+.Ar component
+as a hot spare for the device
+.Ar dev .
+.It Fl A Ic yes Ar dev
+Make the RAID set auto-configurable. The RAID set will be
+automatically configured at boot
+.Ar before
+the root filesystem is
+mounted. Note that all components of the set must be of type RAID in the
+disklabel.
+.It Fl A Ic no Ar dev
+Turn off auto-configuration for the RAID set.
+.It Fl A Ic root Ar dev
+Make the RAID set auto-configurable, and also mark the set as being
+eligible to be the root partition. A RAID set configured this way
+will
+.Ar override
+the use of the boot disk as the root device. All components of the
+set must be of type RAID in the disklabel. Note that the kernel being
+booted must currently reside on a non-RAID set.
+.It Fl B Ar dev
+Initiate a copyback of reconstructed data from a spare disk to
+its original disk. This is performed after a component has failed,
+and the failed drive has been reconstructed onto a spare drive.
+.It Fl c Ar config_file
+Configure a RAIDframe device
+according to the configuration given in
+.Ar config_file .
+A description of the contents of
+.Ar config_file
+is given later.
+.It Fl C Ar config_file
+As for
+.Ar -c ,
+but forces the configuration to take place. This is required the
+first time a RAID set is configured.
+.It Fl f Ar component Ar dev
+This marks the specified
+.Ar component
+as having failed, but does not initiate a reconstruction of that
+component.
+.It Fl F Ar component Ar dev
+Fails the specified
+.Ar component
+of the device, and immediately begins a reconstruction of the failed
+disk onto an available hot spare. This is one of the mechanisms used to start
+the reconstruction process if a component does have a hardware failure.
+.It Fl g Ar component Ar dev
+Get the component label for the specified component.
+.It Fl i Ar dev
+Initialize the RAID device. In particular, re-write the parity on
+the selected device. This
+.Ar MUST
+be done for
+.Ar all
+RAID sets before the RAID device is labeled and before
+filesystems are created on the RAID device.
+.It Fl I Ar serial_number Ar dev
+Initialize the component labels on each component of the device.
+.Ar serial_number
+is used as one of the keys in determining whether a
+particular set of components belongs to the same RAID set.
While not
+strictly enforced, different serial numbers should be used for
+different RAID sets. This step
+.Ar MUST
+be performed when a new RAID set is created.
+.It Fl p Ar dev
+Check the status of the parity on the RAID set. Displays a status
+message, and returns successfully if the parity is up-to-date.
+.It Fl P Ar dev
+Check the status of the parity on the RAID set, and initialize
+(re-write) the parity if the parity is not known to be up-to-date.
+This is normally used after a system crash (and before a
+.Xr fsck 8 )
+to ensure the integrity of the parity.
+.It Fl r Ar component Ar dev
+Remove the spare disk specified by
+.Ar component
+from the set of available spare components.
+.It Fl R Ar component Ar dev
+Fails the specified
+.Ar component ,
+if necessary, and immediately begins a reconstruction back to
+.Ar component .
+This is useful for reconstructing back onto a component after
+it has been replaced following a failure.
+.It Fl s Ar dev
+Display the status of the RAIDframe device for each of the components
+and spares.
+.It Fl S Ar dev
+Check the status of parity re-writing, component reconstruction, and
+component copyback. The output indicates the amount of progress
+achieved in each of these areas.
+.It Fl u Ar dev
+Unconfigure the RAIDframe device.
+.It Fl v
+Be more verbose. For operations such as reconstructions, parity
+re-writing, and copybacks, provide a progress indicator.
+.El
+.Pp
+The device used by
+.Nm
+is specified by
+.Ar dev .
+.Ar dev
+may be either the full name of the device, e.g. /dev/rraid0d
+for the i386 architecture and /dev/rraid0c
+for all others, or simply raid0 (for /dev/rraid0d).
+.Pp
+The format of the configuration file is complex, and
+only an abbreviated treatment is given here. In the configuration
+files, a
+.Sq #
+indicates the beginning of a comment.
+.Pp
+There are 4 required sections of a configuration file, and 2
+optional sections. Each section begins with a
+.Sq START ,
+followed by
+the section name, and the configuration parameters associated with that
+section. The first section is the
+.Sq array
+section, and it specifies
+the number of rows, columns, and spare disks in the RAID set. For
+example:
+.Bd -unfilled -offset indent
+START array
+1 3 0
+.Ed
+.Pp
+indicates an array with 1 row, 3 columns, and 0 spare disks. Note
+that although multi-dimensional arrays may be specified, they are
+.Ar NOT
+supported in the driver.
+.Pp
+The second section, the
+.Sq disks
+section, specifies the actual
+components of the device. For example:
+.Bd -unfilled -offset indent
+START disks
+/dev/da0s1e
+/dev/da1s1e
+/dev/da2s1e
+.Ed
+.Pp
+specifies the three component disks to be used in the RAID device. If
+any of the specified drives cannot be found when the RAID device is
+configured, then they will be marked as
+.Sq failed ,
+and the system will
+operate in degraded mode. Note that it is
+.Ar imperative
+that the order of the components in the configuration file does not
+change between configurations of a RAID device. Changing the order
+of the components will result in data loss if the set is configured
+with the
+.Fl C
+option. In normal circumstances, the RAID set will not configure if
+only
+.Fl c
+is specified, and the components are out-of-order.
+.Pp
+The next section, which is the
+.Sq spare
+section, is optional, and, if
+present, specifies the devices to be used as
+.Sq hot spares
+-- devices
+which are on-line, but are not actively used by the RAID driver unless
+one of the main components fails.
A simple
+.Sq spare
+section might be:
+.Bd -unfilled -offset indent
+START spare
+/dev/da3s1e
+.Ed
+.Pp
+for a configuration with a single spare component. If no spare drives
+are to be used in the configuration, then the
+.Sq spare
+section may be omitted.
+.Pp
+The next section is the
+.Sq layout
+section. This section describes the
+general layout parameters for the RAID device, and provides such
+information as sectors per stripe unit, stripe units per parity unit,
+stripe units per reconstruction unit, and the parity configuration to
+use. This section might look like:
+.Bd -unfilled -offset indent
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
+32 1 1 5
+.Ed
+.Pp
+The sectors per stripe unit specifies, in blocks, the interleave
+factor; i.e. the number of contiguous sectors to be written to each
+component for a single stripe. Appropriate selection of this value
+(32 in this example) is the subject of much research in RAID
+architectures. The stripe units per parity unit and
+stripe units per reconstruction unit are normally each set to 1.
+While certain values above 1 are permitted, a discussion of valid
+values and the consequences of using anything other than 1 are outside
+the scope of this document. The last value in this section (5 in this
+example) indicates the parity configuration desired. Valid entries
+include:
+.Bl -tag -width inde
+.It 0
+RAID level 0. No parity, only simple striping.
+.It 1
+RAID level 1. Mirroring. The parity is the mirror.
+.It 4
+RAID level 4. Striping across components, with parity stored on the
+last component.
+.It 5
+RAID level 5. Striping across components, parity distributed across
+all components.
+.El
+.Pp
+There are other valid entries here, including those for Even-Odd
+parity, RAID level 5 with rotated sparing, Chained declustering,
+and Interleaved declustering, but as of this writing the code for
+those parity operations has not been tested with
+.Fx .
+.Pp
+The next required section is the
+.Sq queue
+section. This is most often
+specified as:
+.Bd -unfilled -offset indent
+START queue
+fifo 100
+.Ed
+.Pp
+where the queuing method is specified as fifo (first-in, first-out),
+and the size of the per-component queue is limited to 100 requests.
+Other queuing methods may also be specified, but a discussion of them
+is beyond the scope of this document.
+.Pp
+The final section, the
+.Sq debug
+section, is optional. For more details
+on this the reader is referred to the RAIDframe documentation
+discussed in the
+.Sx HISTORY
+section.
+.Pp
+See
+.Sx EXAMPLES
+for a more complete configuration file example.
+.Sh EXAMPLES
+It is highly recommended that, before using the RAID driver for real
+filesystems, the system administrator(s) become quite familiar
+with the use of
+.Nm ,
+and that they understand how the component reconstruction process
+works. The examples in this section will focus on configuring a
+number of different RAID sets of varying degrees of redundancy.
+By working through these examples, administrators should be able to
+develop a good feel for how to configure a RAID set, and how to
+initiate reconstruction of failed components.
+.Pp
+In the following examples
+.Sq raid0
+will be used to denote the RAID device. Depending on the
+architecture,
+.Sq /dev/rraid0c
+or
+.Sq /dev/rraid0d
+may be used in place of
+.Sq raid0 .
+.Pp
+.Ss Initialization and Configuration
+The initial step in configuring a RAID set is to identify the components
+that will be used in the RAID set.
All components should be the same +size. Each component should have a disklabel type of +.Dv FS_RAID , +and a typical disklabel entry for a RAID component +might look like: +.Bd -unfilled -offset indent +f: 1800000 200495 RAID # (Cyl. 405*- 4041*) +.Ed +.Pp +While +.Dv FS_BSDFFS +will also work as the component type, the type +.Dv FS_RAID +is preferred for RAIDframe use, as it is required for features such as +auto-configuration. As part of the initial configuration of each RAID +set, each component will be given a +.Sq component label . +A +.Sq component label +contains important information about the component, including a +user-specified serial number, the row and column of that component in +the RAID set, the redundancy level of the RAID set, a 'modification +counter', and whether the parity information (if any) on that +component is known to be correct. Component labels are an integral +part of the RAID set, since they are used to ensure that components +are configured in the correct order, and used to keep track of other +vital information about the RAID set. Component labels are also +required for the auto-detection and auto-configuration of RAID sets at +boot time. For a component label to be considered valid, that +particular component label must be in agreement with the other +component labels in the set. For example, the serial number, +.Sq modification counter , +number of rows and number of columns must all +be in agreement. If any of these are different, then the component is +not considered to be part of the set. See +.Xr raid 4 +for more information about component labels. +.Pp +Once the components have been identified, and the disks have +appropriate labels, +.Nm +is then used to configure the +.Xr raid 4 +device. To configure the device, a configuration +file which looks something like: +.Bd -unfilled -offset indent +START array +# numRow numCol numSpare +1 3 1 + +START disks +/dev/da1s1e +/dev/da2s1e +/dev/da3s1e + +START spare +/dev/da4s1e + +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5 +32 1 1 5 + +START queue +fifo 100 +.Ed +.Pp +is created in a file. The above configuration file specifies a RAID 5 +set consisting of the components /dev/da1s1e, /dev/da2s1e, and /dev/da3s1e, +with /dev/da4s1e available as a +.Sq hot spare +in case one of +the three main drives should fail. A RAID 0 set would be specified in +a similar way: +.Bd -unfilled -offset indent +START array +# numRow numCol numSpare +1 4 0 + +START disks +/dev/da1s10e +/dev/da1s11e +/dev/da1s12e +/dev/da1s13e + +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0 +64 1 1 0 + +START queue +fifo 100 +.Ed +.Pp +In this case, devices /dev/da1s10e, /dev/da1s11e, /dev/da1s12e, and /dev/da1s13e +are the components that make up this RAID set. Note that there are no +hot spares for a RAID 0 set, since there is no way to recover data if +any of the components fail. +.Pp +For a RAID 1 (mirror) set, the following configuration might be used: +.Bd -unfilled -offset indent +START array +# numRow numCol numSpare +1 2 0 + +START disks +/dev/da2s10e +/dev/da2s11e + +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1 +128 1 1 1 + +START queue +fifo 100 +.Ed +.Pp +In this case, /dev/da2s10e and /dev/da2s11e are the two components of the +mirror set. While no hot spares have been specified in this +configuration, they easily could be, just as they were specified in +the RAID 5 case above. Note as well that RAID 1 sets are currently +limited to only 2 components. 
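+.Pp
+As noted above, hot spares can be added to a mirror set just as in
+the RAID 5 case. For instance, the same mirror set with a single hot
+spare (using an arbitrary additional device, /dev/da2s12e, purely for
+illustration) might be specified as:
+.Bd -unfilled -offset indent
+START array
+# numRow numCol numSpare
+1 2 1
+
+START disks
+/dev/da2s10e
+/dev/da2s11e
+
+START spare
+/dev/da2s12e
+
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
+128 1 1 1
+
+START queue
+fifo 100
+.Ed
+.Pp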
At present, n-way mirroring is not
+possible.
+.Pp
+The first time a RAID set is configured, the
+.Fl C
+option must be used:
+.Bd -unfilled -offset indent
+raidctl -C raid0.conf
+.Ed
+.Pp
+where
+.Sq raid0.conf
+is the name of the RAID configuration file. The
+.Fl C
+forces the configuration to succeed, even if any of the component
+labels are incorrect. The
+.Fl C
+option should not be used lightly in
+situations other than initial configurations, as, if
+the system is refusing to configure a RAID set, there is probably a
+very good reason for it. After the initial configuration is done (and
+appropriate component labels are added with the
+.Fl I
+option) then raid0 can be configured normally with:
+.Bd -unfilled -offset indent
+raidctl -c raid0.conf
+.Ed
+.Pp
+When the RAID set is configured for the first time, it is
+necessary to initialize the component labels, and to initialize the
+parity on the RAID set. Initializing the component labels is done with:
+.Bd -unfilled -offset indent
+raidctl -I 112341 raid0
+.Ed
+.Pp
+where
+.Sq 112341
+is a user-specified serial number for the RAID set. This
+initialization step is
+.Ar required
+for all RAID sets. As well, using different
+serial numbers between RAID sets is
+.Ar strongly encouraged ,
+as using the same serial number for all RAID sets will only serve to
+decrease the usefulness of the component label checking.
+.Pp
+Initializing the RAID set is done via the
+.Fl i
+option. This initialization
+.Ar MUST
+be done for
+.Ar all
+RAID sets, since among other things it verifies that the parity (if
+any) on the RAID set is correct. Since this initialization may be
+quite time-consuming, the
+.Fl v
+option may also be used in conjunction with
+.Fl i :
+.Bd -unfilled -offset indent
+raidctl -iv raid0
+.Ed
+.Pp
+This will give more verbose output on the
+status of the initialization:
+.Bd -unfilled -offset indent
+Initiating re-write of parity
+Parity Re-write status:
+ 10% |**** | ETA: 06:03 /
+.Ed
+.Pp
+The output provides a
+.Sq Percent Complete
+in both a numeric and graphical format, as well as an estimated time
+to completion of the operation.
+.Pp
+Since it is the parity that provides the
+.Sq redundancy
+part of RAID, it is critical that the parity is correct
+as much as possible. If the parity is not correct, then there is no
+guarantee that data will not be lost if a component fails.
+.Pp
+Once the parity is known to be correct,
+it is then safe to perform
+.Xr disklabel 8 ,
+.Xr newfs 8 ,
+or
+.Xr fsck 8
+on the device or its filesystems, and then to mount the filesystems
+for use.
+.Pp
+Under certain circumstances (e.g. the additional component has not
+arrived, or data is being migrated off of a disk destined to become a
+component) it may be desirable to configure a RAID 1 set with only
+a single component. This can be achieved by configuring the set with
+a physically existing component (as either the first or second
+component) and with a
+.Sq fake
+component. In the following:
+.Bd -unfilled -offset indent
+START array
+# numRow numCol numSpare
+1 2 0
+
+START disks
+/dev/da6s1e
+/dev/da0s1e
+
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
+128 1 1 1
+
+START queue
+fifo 100
+.Ed
+.Pp
+/dev/da0s1e is the real component, and will be the second disk of a RAID 1
+set. The component /dev/da6s1e, which must exist but which has no physical
+device associated with it, is simply used as a placeholder.
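+.Pp
+For example, assuming the above configuration has been saved in a
+file called
+.Sq raid0.conf
+(the file name is arbitrary), the set might then be configured and
+its component labels initialized with:
+.Bd -unfilled -offset indent
+raidctl -C raid0.conf
+raidctl -I 12345 raid0
+.Ed
+.Pp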
+Configuration (using
+.Fl C
+and
+.Fl I Ar 12345
+as above) proceeds normally, but initialization of the RAID set will
+have to wait until all physical components are present. After
+configuration, this set can be used normally, but will be operating
+in degraded mode. Once a second physical component is obtained, it
+can be hot-added, the existing data mirrored, and normal operation
+resumed.
+.Pp
+.Ss Maintenance of the RAID set
+After the parity has been initialized for the first time, the command:
+.Bd -unfilled -offset indent
+raidctl -p raid0
+.Ed
+.Pp
+can be used to check the current status of the parity. To check the
+parity and rebuild it if necessary (for example, after an unclean
+shutdown), the command:
+.Bd -unfilled -offset indent
+raidctl -P raid0
+.Ed
+.Pp
+is used. Note that re-writing the parity can be done while
+other operations on the RAID set are taking place (e.g. while doing a
+.Xr fsck 8
+on a filesystem on the RAID set). However: for maximum effectiveness
+of the RAID set, the parity should be known to be correct before any
+data on the set is modified.
+.Pp
+To see how the RAID set is doing, the following command can be used to
+show the RAID set's status:
+.Bd -unfilled -offset indent
+raidctl -s raid0
+.Ed
+.Pp
+The output will look something like:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: optimal
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: spare
+Component label for /dev/da1s1e:
+ Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
+ Version: 2 Serial Number: 13432 Mod Counter: 65
+ Clean: No Status: 0
+ sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
+ RAID Level: 5 blocksize: 512 numBlocks: 1799936
+ Autoconfig: No
+ Last configured as: raid0
+Component label for /dev/da2s1e:
+ Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
+ Version: 2 Serial Number: 13432 Mod Counter: 65
+ Clean: No Status: 0
+ sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
+ RAID Level: 5 blocksize: 512 numBlocks: 1799936
+ Autoconfig: No
+ Last configured as: raid0
+Component label for /dev/da3s1e:
+ Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
+ Version: 2 Serial Number: 13432 Mod Counter: 65
+ Clean: No Status: 0
+ sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
+ RAID Level: 5 blocksize: 512 numBlocks: 1799936
+ Autoconfig: No
+ Last configured as: raid0
+Parity status: clean
+Reconstruction is 100% complete.
+Parity Re-write is 100% complete.
+Copyback is 100% complete.
+.Ed
+.Pp
+This indicates that all is well with the RAID set. Of importance here
+are the component lines which read
+.Sq optimal ,
+and the
+.Sq Parity status
+line which indicates that the parity is up-to-date. Note that if
+there are filesystems open on the RAID set, the individual components
+will not be
+.Sq clean
+but the set as a whole can still be clean.
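+.Pp
+Since
+.Fl p
+returns successfully only when the parity is known to be up-to-date,
+its exit status can be used from a script. A minimal sketch (the
+device name raid0 is only an example) which re-writes the parity only
+when it is suspect might be:
+.Bd -unfilled -offset indent
+#!/bin/sh
+# Re-write the parity on raid0 only if it is not known to be good.
+if ! raidctl -p raid0 > /dev/null 2>&1; then
+        raidctl -P raid0
+fi
+.Ed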
+.Pp
+To check the component label of /dev/da1s1e, the following is used:
+.Bd -unfilled -offset indent
+raidctl -g /dev/da1s1e raid0
+.Ed
+.Pp
+The output of this command will look something like:
+.Bd -unfilled -offset indent
+Component label for /dev/da1s1e:
+ Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
+ Version: 2 Serial Number: 13432 Mod Counter: 65
+ Clean: No Status: 0
+ sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
+ RAID Level: 5 blocksize: 512 numBlocks: 1799936
+ Autoconfig: No
+ Last configured as: raid0
+.Ed
+.Pp
+.Ss Dealing with Component Failures
+If for some reason
+(perhaps to test reconstruction) it is necessary to pretend a drive
+has failed, the following will perform that function:
+.Bd -unfilled -offset indent
+raidctl -f /dev/da2s1e raid0
+.Ed
+.Pp
+The system will then be performing all operations in degraded mode,
+where missing data is re-computed from existing data and the parity.
+In this case, obtaining the status of raid0 will return (in part):
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: failed
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: spare
+.Ed
+.Pp
+Note that with the use of
+.Fl f
+a reconstruction has not been started. To both fail the disk and
+start a reconstruction, the
+.Fl F
+option must be used:
+.Bd -unfilled -offset indent
+raidctl -F /dev/da2s1e raid0
+.Ed
+.Pp
+The
+.Fl f
+option may be used first, and then the
+.Fl F
+option used later, on the same disk, if desired.
+Immediately after the reconstruction is started, the status will report:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: reconstructing
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: used_spare
+[...]
+Parity status: clean
+Reconstruction is 10% complete.
+Parity Re-write is 100% complete.
+Copyback is 100% complete.
+.Ed
+.Pp
+This indicates that a reconstruction is in progress. To find out how
+the reconstruction is progressing, the
+.Fl S
+option may be used. This will indicate the progress in terms of the
+percentage of the reconstruction that is completed. When the
+reconstruction is finished, the
+.Fl s
+option will show:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: spared
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: used_spare
+[...]
+Parity status: clean
+Reconstruction is 100% complete.
+Parity Re-write is 100% complete.
+Copyback is 100% complete.
+.Ed
+.Pp
+At this point there are at least two options. First, if /dev/da2s1e is
+known to be good (i.e. the failure was either caused by
+.Fl f
+or
+.Fl F ,
+or the failed disk was replaced), then a copyback of the data can
+be initiated with the
+.Fl B
+option. In this example, this would copy the entire contents of
+/dev/da4s1e to /dev/da2s1e. Once the copyback procedure is complete, the
+status of the device would be (in part):
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: optimal
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: spare
+.Ed
+.Pp
+and the system is back to normal operation.
+.Pp
+The second option after the reconstruction is to simply use /dev/da4s1e
+in place of /dev/da2s1e in the configuration file. For example, the
+configuration file (in part) might now look like:
+.Bd -unfilled -offset indent
+START array
+1 3 0
+
+START disks
+/dev/da1s1e
+/dev/da4s1e
+/dev/da3s1e
+.Ed
+.Pp
+This can be done as /dev/da4s1e is completely interchangeable with
+/dev/da2s1e at this point. Note that extreme care must be taken when
+changing the order of the drives in a configuration.
This is one of
+the few instances where the devices and/or their orderings can be
+changed without loss of data! In general, the ordering of components
+in a configuration file should
+.Ar never
+be changed.
+.Pp
+If a component fails and there are no hot spares
+available on-line, the status of the RAID set might (in part) look like:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: failed
+ /dev/da3s1e: optimal
+No spares.
+.Ed
+.Pp
+In this case there are a number of options. The first option is to add a hot
+spare using:
+.Bd -unfilled -offset indent
+raidctl -a /dev/da4s1e raid0
+.Ed
+.Pp
+After the hot add, the status would then be:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: failed
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: spare
+.Ed
+.Pp
+Reconstruction could then take place using
+.Fl F
+as described above.
+.Pp
+A second option is to rebuild directly onto /dev/da2s1e. Once the disk
+containing /dev/da2s1e has been replaced, one can simply use:
+.Bd -unfilled -offset indent
+raidctl -R /dev/da2s1e raid0
+.Ed
+.Pp
+to rebuild the /dev/da2s1e component. As the rebuilding is in progress,
+the status will be:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: reconstructing
+ /dev/da3s1e: optimal
+No spares.
+.Ed
+.Pp
+and when completed, will be:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: optimal
+ /dev/da3s1e: optimal
+No spares.
+.Ed
+.Pp
+In circumstances where a particular component is completely
+unavailable after a reboot, a special component name will be used to
+indicate the missing component. For example:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da2s1e: optimal
+ component1: failed
+No spares.
+.Ed
+.Pp
+indicates that the second component of this RAID set was not detected
+at all by the auto-configuration code. The name
+.Sq component1
+can be used anywhere a normal component name would be used. For
+example, to add a hot spare to the above set, and rebuild to that hot
+spare, the following could be done:
+.Bd -unfilled -offset indent
+raidctl -a /dev/da3s1e raid0
+raidctl -F component1 raid0
+.Ed
+.Pp
+at which point the data missing from
+.Sq component1
+would be reconstructed onto /dev/da3s1e.
+.Pp
+.Ss RAID on RAID
+RAID sets can be layered to create more complex and much larger RAID
+sets. A RAID 0 set, for example, could be constructed from four RAID
+5 sets. The following configuration file shows such a setup:
+.Bd -unfilled -offset indent
+START array
+# numRow numCol numSpare
+1 4 0
+
+START disks
+/dev/raid1e
+/dev/raid2e
+/dev/raid3e
+/dev/raid4e
+
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
+128 1 1 0
+
+START queue
+fifo 100
+.Ed
+.Pp
+A similar configuration file might be used for a RAID 0 set
+constructed from components on RAID 1 sets. In such a configuration,
+the mirroring provides a high degree of redundancy, while the striping
+provides additional speed benefits.
+.Pp
+.Ss Auto-configuration and Root on RAID
+RAID sets can also be auto-configured at boot. To make a set
+auto-configurable, simply prepare the RAID set as above, and then do
+a:
+.Bd -unfilled -offset indent
+raidctl -A yes raid0
+.Ed
+.Pp
+to turn on auto-configuration for that set. To turn off
+auto-configuration, use:
+.Bd -unfilled -offset indent
+raidctl -A no raid0
+.Ed
+.Pp
+RAID sets which are auto-configurable will be configured before the
+root filesystem is mounted.
These RAID sets are thus available for
+use as a root filesystem, or for any other filesystem. A primary
+advantage of using the auto-configuration is that RAID components
+become more independent of the disks they reside on. For example,
+SCSI ID's can change, but auto-configured sets will always be
+configured correctly, even if the SCSI ID's of the component disks
+have become scrambled.
+.Pp
+Having a system's root filesystem (/) on a RAID set is also allowed,
+with the
+.Sq a
+partition of such a RAID set being used for /.
+To use raid0a as the root filesystem, simply use:
+.Bd -unfilled -offset indent
+raidctl -A root raid0
+.Ed
+.Pp
+To return raid0a to be just an auto-configuring set, simply use the
+.Fl A Ar yes
+arguments.
+.Pp
+Note that kernels can only be directly read from RAID 1 components on
+alpha and pmax architectures. On those architectures, the
+.Dv FS_RAID
+filesystem is recognized by the bootblocks, and will properly load the
+kernel directly from a RAID 1 component. For other architectures, or
+to support the root filesystem on other RAID sets, some other
+mechanism must be used to get a kernel booting. For example, a small
+partition containing only the secondary boot-blocks and an alternate
+kernel (or two) could be used. Once a kernel is booting, however, and
+an auto-configuring RAID set is found that is eligible to be root,
+then that RAID set will be auto-configured and used as the root
+device. If two or more RAID sets claim to be root devices, then the
+user will be prompted to select the root device. At this time, RAID
+0, 1, 4, and 5 sets are all supported as root devices.
+.Pp
+A typical RAID 1 setup with root on RAID might be as follows:
+.Bl -enum
+.It
+wd0a - a small partition, which contains a complete, bootable, basic
+NetBSD installation.
+.It
+wd1a - also contains a complete, bootable, basic NetBSD installation.
+.It
+wd0e and wd1e - a RAID 1 set, raid0, used for the root filesystem.
+.It
+wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
+swap space.
+.It
+wd0g and wd1g - a RAID 1 set, raid2, used for /usr, /home, or other
+data, if desired.
+.It
+wd0h and wd1h - a RAID 1 set, raid3, if desired.
+.El
+.Pp
+RAID sets raid0, raid1, and raid2 are all marked as
+auto-configurable. raid0 is marked as being a root filesystem.
+When new kernels are installed, the kernel is not only copied to /,
+but also to wd0a and wd1a. The kernel on wd0a is required, since that
+is the kernel the system boots from. The kernel on wd1a is also
+required, since that will be the kernel used should wd0 fail. The
+important point here is to have redundant copies of the kernel
+available, in the event that one of the drives fails.
+.Pp
+There is no requirement that the root filesystem be on the same disk
+as the kernel. For example, obtaining the kernel from wd0a, and using
+da0s1e and da1s1e for raid0, and the root filesystem, is fine. It
+.Ar is
+critical, however, that there be multiple kernels available, in the
+event of media failure.
+.Pp
+Multi-layered RAID devices (such as a RAID 0 set made
+up of RAID 1 sets) are
+.Ar not
+supported as root devices or auto-configurable devices at this point.
+(Multi-layered RAID devices
+.Ar are
+supported in general, however, as mentioned earlier.) Note that in
+order to enable component auto-detection and auto-configuration of
+RAID devices, the line:
+.Bd -unfilled -offset indent
+options RAID_AUTOCONFIG
+.Ed
+.Pp
+must be in the kernel configuration file. See
+.Xr raid 4
+for more details.
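+.Pp
+For the typical layout described above, the sets might be marked
+auto-configurable (with raid0 additionally marked as eligible to be
+the root device) using:
+.Bd -unfilled -offset indent
+raidctl -A root raid0
+raidctl -A yes raid1
+raidctl -A yes raid2
+.Ed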
+.Pp
+.Ss Unconfiguration
+The final operation performed by
+.Nm
+is to unconfigure a
+.Xr raid 4
+device. This is accomplished via a simple:
+.Bd -unfilled -offset indent
+raidctl -u raid0
+.Ed
+.Pp
+at which point the device is ready to be reconfigured.
+.Pp
+.Ss Performance Tuning
+Selection of the various parameter values which result in the best
+performance can be quite tricky, and often requires a bit of
+trial-and-error to get those values most appropriate for a given system.
+A whole range of factors come into play, including:
+.Bl -enum
+.It
+Types of components (e.g. SCSI vs. IDE) and their bandwidth
+.It
+Types of controller cards and their bandwidth
+.It
+Distribution of components among controllers
+.It
+IO bandwidth
+.It
+Filesystem access patterns
+.It
+CPU speed
+.El
+.Pp
+As with most performance tuning, benchmarking under real-life loads
+may be the only way to measure expected performance. Understanding
+some of the underlying technology is also useful in tuning. The goal
+of this section is to provide pointers to those parameters which may
+make significant differences in performance.
+.Pp
+For a RAID 1 set, a SectPerSU value of 64 or 128 is typically
+sufficient. Since data in a RAID 1 set is arranged in a linear
+fashion on each component, selecting an appropriate stripe size is
+somewhat less critical than it is for a RAID 5 set. However: a stripe
+size that is too small will cause large IO's to be broken up into a
+number of smaller ones, hurting performance. At the same time, a
+large stripe size may cause problems with concurrent accesses to
+stripes, which may also affect performance. Thus values in the range
+of 32 to 128 are often the most effective.
+.Pp
+Tuning RAID 5 sets is trickier. In the best case, IO is presented to
+the RAID set one stripe at a time. Since the entire stripe is
+available at the beginning of the IO, the parity of that stripe can
+be calculated before the stripe is written, and then the stripe data
+and parity can be written in parallel. When the amount of data being
+written is less than a full stripe worth, the
+.Sq small write
+problem occurs. Since a
+.Sq small write
+means only a portion of the stripe on the components is going to
+change, the data (and parity) on the components must be updated
+slightly differently. First, the
+.Sq old parity
+and
+.Sq old data
+must be read from the components. Then the new parity is constructed,
+using the new data to be written, and the old data and old parity.
+Finally, the new data and new parity are written. All this extra data
+shuffling results in a serious loss of performance, and is typically 2
+to 4 times slower than a full stripe write (or read). To combat this
+problem in the real world, it may be useful to ensure that stripe
+sizes are small enough that a
+.Sq large IO
+from the system will use exactly one large stripe write. As is seen
+later, there are some filesystem dependencies which may come into play
+here as well.
+.Pp
+Since the size of a
+.Sq large IO
+is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may
+be desirable to select a SectPerSU value of 16 blocks (8K) or 32
+blocks (16K). Since there are 4 data stripe units per stripe, the maximum
+data per stripe is 64 blocks (32K) or 128 blocks (64K). Again,
+empirical measurement will provide the best indicators of which
+values will yield better performance.
+.Pp
+The parameters used for the filesystem are also critical to good
+performance.
For +.Xr newfs 8 , +for example, increasing the block size to 32K or 64K may improve +performance dramatically. As well, changing the cylinders-per-group +parameter from 16 to 32 or higher is often not only necessary for +larger filesystems, but may also have positive performance +implications. +.Pp +.Ss Summary +Despite the length of this man-page, configuring a RAID set is a +relatively straight-forward process. All that needs to be done is the +following steps: +.Bl -enum +.It +Use +.Xr disklabel 8 +to create the components (of type RAID). +.It +Construct a RAID configuration file: e.g. +.Sq raid0.conf +.It +Configure the RAID set with: +.Bd -unfilled -offset indent +raidctl -C raid0.conf +.Ed +.Pp +.It +Initialize the component labels with: +.Bd -unfilled -offset indent +raidctl -I 123456 raid0 +.Ed +.Pp +.It +Initialize other important parts of the set with: +.Bd -unfilled -offset indent +raidctl -i raid0 +.Ed +.Pp +.It +Get the default label for the RAID set: +.Bd -unfilled -offset indent +disklabel raid0 > /tmp/label +.Ed +.Pp +.It +Edit the label: +.Bd -unfilled -offset indent +vi /tmp/label +.Ed +.Pp +.It +Put the new label on the RAID set: +.Bd -unfilled -offset indent +disklabel -R -r raid0 /tmp/label +.Ed +.Pp +.It +Create the filesystem: +.Bd -unfilled -offset indent +newfs /dev/rraid0e +.Ed +.Pp +.It +Mount the filesystem: +.Bd -unfilled -offset indent +mount /dev/raid0e /mnt +.Ed +.Pp +.It +Use: +.Bd -unfilled -offset indent +raidctl -c raid0.conf +.Ed +.Pp +To re-configure the RAID set the next time it is needed, or put +raid0.conf into /etc where it will automatically be started by +the /etc/rc scripts. +.El +.Pp +.Sh WARNINGS +Certain RAID levels (1, 4, 5, 6, and others) can protect against some +data loss due to component failure. However the loss of two +components of a RAID 4 or 5 system, or the loss of a single component +of a RAID 0 system will result in the entire filesystem being lost. +RAID is +.Ar NOT +a substitute for good backup practices. +.Pp +Recomputation of parity +.Ar MUST +be performed whenever there is a chance that it may have been +compromised. This includes after system crashes, or before a RAID +device has been used for the first time. Failure to keep parity +correct will be catastrophic should a component ever fail -- it is +better to use RAID 0 and get the additional space and speed, than it +is to use parity, but not keep the parity correct. At least with RAID +0 there is no perception of increased data security. +.Pp +.Sh FILES +.Bl -tag -width /dev/XXrXraidX -compact +.It Pa /dev/{,r}raid* +.Cm raid +device special files. +.El +.Pp +.Sh SEE ALSO +.Xr raid 4 , +.Xr ccd 4 , +.Xr rc 8 +.Sh BUGS +Hot-spare removal is currently not available. +.Sh HISTORY +RAIDframe is a framework for rapid prototyping of RAID structures +developed by the folks at the Parallel Data Laboratory at Carnegie +Mellon University (CMU). +A more complete description of the internals and functionality of +RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool +for RAID Systems", by William V. Courtright II, Garth Gibson, Mark +Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the +Parallel Data Laboratory of Carnegie Mellon University. +.Pp +The +.Nm +command first appeared as a program in CMU's RAIDframe v1.1 distribution. This +version of +.Nm +is a complete re-write, and first appeared in +.Fx 4.4 . +.Sh COPYRIGHT +.Bd -unfilled +The RAIDframe Copyright is as follows: + +Copyright (c) 1994-1996 Carnegie-Mellon University. +All rights reserved. 
+
+Permission to use, copy, modify and distribute this software and
+its documentation is hereby granted, provided that both the copyright
+notice and this permission notice appear in all copies of the
+software, derivative works or modified versions, and any portions
+thereof, and that both notices appear in supporting documentation.
+
+CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
+CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
+FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
+
+Carnegie Mellon requests users of this software to return to
+
+ Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
+ School of Computer Science
+ Carnegie Mellon University
+ Pittsburgh PA 15213-3890
+
+any improvements or extensions that they make and grant Carnegie the
+rights to redistribute these changes.
+.Ed