Diffstat (limited to 'sbin/raidctl/raidctl.8')
-rw-r--r--  sbin/raidctl/raidctl.8  1325
1 file changed, 1325 insertions, 0 deletions
diff --git a/sbin/raidctl/raidctl.8 b/sbin/raidctl/raidctl.8
new file mode 100644
index 0000000..9aef14f
--- /dev/null
+++ b/sbin/raidctl/raidctl.8
@@ -0,0 +1,1325 @@
+.\" $FreeBSD$
+.\" $NetBSD: raidctl.8,v 1.21 2000/08/10 15:14:14 oster Exp $
+.\"
+.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
+.\" All rights reserved.
+.\"
+.\" This code is derived from software contributed to The NetBSD Foundation
+.\" by Greg Oster
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\" 3. All advertising materials mentioning features or use of this software
+.\" must display the following acknowledgement:
+.\" This product includes software developed by the NetBSD
+.\" Foundation, Inc. and its contributors.
+.\" 4. Neither the name of The NetBSD Foundation nor the names of its
+.\" contributors may be used to endorse or promote products derived
+.\" from this software without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
+.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
+.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+.\" POSSIBILITY OF SUCH DAMAGE.
+.\"
+.\"
+.\" Copyright (c) 1995 Carnegie-Mellon University.
+.\" All rights reserved.
+.\"
+.\" Author: Mark Holland
+.\"
+.\" Permission to use, copy, modify and distribute this software and
+.\" its documentation is hereby granted, provided that both the copyright
+.\" notice and this permission notice appear in all copies of the
+.\" software, derivative works or modified versions, and any portions
+.\" thereof, and that both notices appear in supporting documentation.
+.\"
+.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
+.\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
+.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
+.\"
+.\" Carnegie Mellon requests users of this software to return to
+.\"
+.\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
+.\" School of Computer Science
+.\" Carnegie Mellon University
+.\" Pittsburgh PA 15213-3890
+.\"
+.\" any improvements or extensions that they make and grant Carnegie the
+.\" rights to redistribute these changes.
+.\"
+.Dd November 6, 1998
+.Dt RAIDCTL 8
+.Os FreeBSD
+.Sh NAME
+.Nm raidctl
+.Nd configuration utility for the RAIDframe disk driver
+.Sh SYNOPSIS
+.Nm
+.Op Fl v
+.Fl a Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl A Op yes | no | root
+.Ar dev
+.Nm
+.Op Fl v
+.Fl B Ar dev
+.Nm
+.Op Fl v
+.Fl c Ar config_file
+.Nm
+.Op Fl v
+.Fl C Ar config_file
+.Nm
+.Op Fl v
+.Fl f Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl F Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl g Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl i Ar dev
+.Nm
+.Op Fl v
+.Fl I Ar serial_number Ar dev
+.Nm
+.Op Fl v
+.Fl p Ar dev
+.Nm
+.Op Fl v
+.Fl P Ar dev
+.Nm
+.Op Fl v
+.Fl r Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl R Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl s Ar dev
+.Nm
+.Op Fl v
+.Fl S Ar dev
+.Nm
+.Op Fl v
+.Fl u Ar dev
+.Sh DESCRIPTION
+.Nm
+is the user-land control program for
+.Xr raid 4 ,
+the RAIDframe disk device.
+.Nm
+is primarily used to dynamically configure and unconfigure RAIDframe disk
+devices. For more information about the RAIDframe disk device, see
+.Xr raid 4 .
+.Pp
+This document assumes the reader has at least rudimentary knowledge of
+RAID and RAID concepts.
+.Pp
+The command-line options for
+.Nm
+are as follows:
+.Bl -tag -width indent
+.It Fl a Ar component Ar dev
+Add
+.Ar component
+as a hot spare for the device
+.Ar dev .
+.It Fl A Ic yes Ar dev
+Make the RAID set auto-configurable. The RAID set will be
+automatically configured at boot
+.Ar before
+the root filesystem is
+mounted. Note that all components of the set must be of type RAID in the
+disklabel.
+.It Fl A Ic no Ar dev
+Turn off auto-configuration for the RAID set.
+.It Fl A Ic root Ar dev
+Make the RAID set auto-configurable, and also mark the set as being
+eligible to be the root partition. A RAID set configured this way
+will
+.Ar override
+the use of the boot disk as the root device. All components of the
+set must be of type RAID in the disklabel. Note that the kernel being
+booted must currently reside on a non-RAID set.
+.It Fl B Ar dev
+Initiate a copyback of reconstructed data from a spare disk to
+its original disk. This is performed after a component has failed,
+and the failed drive has been reconstructed onto a spare drive.
+.It Fl c Ar config_file
+Configure a RAIDframe device
+according to the configuration given in
+.Ar config_file .
+A description of the contents of
+.Ar config_file
+is given later.
+.It Fl C Ar config_file
+As for
+.Fl c ,
+but forces the configuration to take place. This is required the
+first time a RAID set is configured.
+.It Fl f Ar component Ar dev
+This marks the specified
+.Ar component
+as having failed, but does not initiate a reconstruction of that
+component.
+.It Fl F Ar component Ar dev
+Fails the specified
+.Ar component
+of the device, and immediately begins a reconstruction of the failed
+disk onto an available hot spare. This is one of the mechanisms used to start
+the reconstruction process if a component has a hardware failure.
+.It Fl g Ar component Ar dev
+Get the component label for the specified component.
+.It Fl i Ar dev
+Initialize the RAID device. In particular, re-write the parity on
+the selected device. This
+.Ar MUST
+be done for
+.Ar all
+RAID sets before the RAID device is labeled and before
+filesystems are created on the RAID device.
+.It Fl I Ar serial_number Ar dev
+Initialize the component labels on each component of the device.
+.Ar serial_number
+is used as one of the keys in determining whether a
+particular set of components belongs to the same RAID set. While not
+strictly enforced, different serial numbers should be used for
+different RAID sets. This step
+.Ar MUST
+be performed when a new RAID set is created.
+.It Fl p Ar dev
+Check the status of the parity on the RAID set. Displays a status
+message, and returns successfully if the parity is up-to-date.
+.It Fl P Ar dev
+Check the status of the parity on the RAID set, and initialize
+(re-write) the parity if the parity is not known to be up-to-date.
+This is normally used after a system crash (and before a
+.Xr fsck 8 )
+to ensure the integrity of the parity.
+.It Fl r Ar component Ar dev
+Remove the spare disk specified by
+.Ar component
+from the set of available spare components.
+.It Fl R Ar component Ar dev
+Fails the specified
+.Ar component ,
+if necessary, and immediately begins a reconstruction back to
+.Ar component .
+This is useful for reconstructing back onto a component after
+it has been replaced following a failure.
+.It Fl s Ar dev
+Display the status of the RAIDframe device for each of the components
+and spares.
+.It Fl S Ar dev
+Check the status of parity re-writing, component reconstruction, and
+component copyback. The output indicates the amount of progress
+achieved in each of these areas.
+.It Fl u Ar dev
+Unconfigure the RAIDframe device.
+.It Fl v
+Be more verbose. For operations such as reconstructions, parity
+re-writing, and copybacks, provide a progress indicator.
+.El
+.Pp
+The device used by
+.Nm
+is specified by
+.Ar dev .
+.Ar dev
+may be either the full name of the device, e.g. /dev/rraid0d
+on the i386 architecture or /dev/rraid0c
+on all others, or simply raid0 (for /dev/rraid0d).
+.Pp
+The format of the configuration file is complex, and
+only an abbreviated treatment is given here. In the configuration
+files, a
+.Sq #
+indicates the beginning of a comment.
+.Pp
+There are 4 required sections of a configuration file, and 2
+optional sections. Each section begins with a
+.Sq START ,
+followed by
+the section name, and the configuration parameters associated with that
+section. The first section is the
+.Sq array
+section, and it specifies
+the number of rows, columns, and spare disks in the RAID set. For
+example:
+.Bd -unfilled -offset indent
+START array
+1 3 0
+.Ed
+.Pp
+indicates an array with 1 row, 3 columns, and 0 spare disks. Note
+that although multi-dimensional arrays may be specified, they are
+.Ar NOT
+supported in the driver.
+.Pp
+The second section, the
+.Sq disks
+section, specifies the actual
+components of the device. For example:
+.Bd -unfilled -offset indent
+START disks
+/dev/da0s1e
+/dev/da1s1e
+/dev/da2s1e
+.Ed
+.Pp
+specifies the three component disks to be used in the RAID device. If
+any of the specified drives cannot be found when the RAID device is
+configured, then they will be marked as
+.Sq failed ,
+and the system will
+operate in degraded mode. Note that it is
+.Ar imperative
+that the order of the components in the configuration file does not
+change between configurations of a RAID device. Changing the order
+of the components will result in data loss if the set is configured
+with the
+.Fl C
+option. In normal circumstances, the RAID set will not configure if
+only
+.Fl c
+is specified, and the components are out-of-order.
+.Pp
+The next section, which is the
+.Sq spare
+section, is optional, and, if
+present, specifies the devices to be used as
+.Sq hot spares
+-- devices
+which are on-line, but are not actively used by the RAID driver unless
+one of the main components fails. A simple
+.Sq spare
+section might be:
+.Bd -unfilled -offset indent
+START spare
+/dev/da3s1e
+.Ed
+.Pp
+for a configuration with a single spare component. If no spare drives
+are to be used in the configuration, then the
+.Sq spare
+section may be omitted.
+.Pp
+The next section is the
+.Sq layout
+section. This section describes the
+general layout parameters for the RAID device, and provides such
+information as sectors per stripe unit, stripe units per parity unit,
+stripe units per reconstruction unit, and the parity configuration to
+use. This section might look like:
+.Bd -unfilled -offset indent
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
+32 1 1 5
+.Ed
+.Pp
+The sectors per stripe unit specifies, in blocks, the interleave
+factor; i.e. the number of contiguous sectors to be written to each
+component for a single stripe. Appropriate selection of this value
+(32 in this example) is the subject of much research in RAID
+architectures. The stripe units per parity unit and
+stripe units per reconstruction unit are normally each set to 1.
+While certain values above 1 are permitted, a discussion of valid
+values and the consequences of using anything other than 1 are outside
+the scope of this document. The last value in this section (5 in this
+example) indicates the parity configuration desired. Valid entries
+include:
+.Bl -tag -width inde
+.It 0
+RAID level 0. No parity, only simple striping.
+.It 1
+RAID level 1. Mirroring. The parity is the mirror.
+.It 4
+RAID level 4. Striping across components, with parity stored on the
+last component.
+.It 5
+RAID level 5. Striping across components, parity distributed across
+all components.
+.El
+.Pp
+There are other valid entries here, including those for Even-Odd
+parity, RAID level 5 with rotated sparing, Chained declustering,
+and Interleaved declustering, but as of this writing the code for
+those parity operations has not been tested with
+.Fx .
+.Pp
+The next required section is the
+.Sq queue
+section. This is most often
+specified as:
+.Bd -unfilled -offset indent
+START queue
+fifo 100
+.Ed
+.Pp
+where the queuing method is specified as fifo (first-in, first-out),
+and the size of the per-component queue is limited to 100 requests.
+Other queuing methods may also be specified, but a discussion of them
+is beyond the scope of this document.
+.Pp
+The final section, the
+.Sq debug
+section, is optional. For more details
+on this the reader is referred to the RAIDframe documentation
+discussed in the
+.Sx HISTORY
+section.
+.Pp
+See
+.Sx EXAMPLES
+for a more complete configuration file example.
+.Sh EXAMPLES
+It is highly recommended that, before using the RAID driver for real
+filesystems, the system administrator(s) become quite familiar
+with the use of
+.Nm ,
+and that they understand how the component reconstruction process
+works. The examples in this section will focus on configuring a
+number of different RAID sets of varying degrees of redundancy.
+By working through these examples, administrators should be able to
+develop a good feel for how to configure a RAID set, and how to
+initiate reconstruction of failed components.
+.Pp
+In the following examples
+.Sq raid0
+will be used to denote the RAID device. Depending on the
+architecture,
+.Sq /dev/rraid0c
+or
+.Sq /dev/rraid0d
+may be used in place of
+.Sq raid0 .
+.Pp
+.Ss Initialization and Configuration
+The initial step in configuring a RAID set is to identify the components
+that will be used in the RAID set. All components should be the same
+size. Each component should have a disklabel type of
+.Dv FS_RAID ,
+and a typical disklabel entry for a RAID component
+might look like:
+.Bd -unfilled -offset indent
+f: 1800000 200495 RAID # (Cyl. 405*- 4041*)
+.Ed
+.Pp
+While
+.Dv FS_BSDFFS
+will also work as the component type, the type
+.Dv FS_RAID
+is preferred for RAIDframe use, as it is required for features such as
+auto-configuration. As part of the initial configuration of each RAID
+set, each component will be given a
+.Sq component label .
+A
+.Sq component label
+contains important information about the component, including a
+user-specified serial number, the row and column of that component in
+the RAID set, the redundancy level of the RAID set, a
+.Sq modification counter ,
+and whether the parity information (if any) on that
+component is known to be correct. Component labels are an integral
+part of the RAID set, since they are used to ensure that components
+are configured in the correct order, and used to keep track of other
+vital information about the RAID set. Component labels are also
+required for the auto-detection and auto-configuration of RAID sets at
+boot time. For a component label to be considered valid, that
+particular component label must be in agreement with the other
+component labels in the set. For example, the serial number,
+.Sq modification counter ,
+number of rows and number of columns must all
+be in agreement. If any of these are different, then the component is
+not considered to be part of the set. See
+.Xr raid 4
+for more information about component labels.
+.Pp
+Once the components have been identified, and the disks have
+appropriate labels,
+.Nm
+is then used to configure the
+.Xr raid 4
+device. To configure the device, a configuration
+file which looks something like:
+.Bd -unfilled -offset indent
+START array
+# numRow numCol numSpare
+1 3 1
+
+START disks
+/dev/da1s1e
+/dev/da2s1e
+/dev/da3s1e
+
+START spare
+/dev/da4s1e
+
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
+32 1 1 5
+
+START queue
+fifo 100
+.Ed
+.Pp
+is created. The above configuration file specifies a RAID 5
+set consisting of the components /dev/da1s1e, /dev/da2s1e, and /dev/da3s1e,
+with /dev/da4s1e available as a
+.Sq hot spare
+in case one of
+the three main drives should fail. A RAID 0 set would be specified in
+a similar way:
+.Bd -unfilled -offset indent
+START array
+# numRow numCol numSpare
+1 4 0
+
+START disks
+/dev/da1s10e
+/dev/da1s11e
+/dev/da1s12e
+/dev/da1s13e
+
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
+64 1 1 0
+
+START queue
+fifo 100
+.Ed
+.Pp
+In this case, devices /dev/da1s10e, /dev/da1s11e, /dev/da1s12e, and /dev/da1s13e
+are the components that make up this RAID set. Note that there are no
+hot spares for a RAID 0 set, since there is no way to recover data if
+any of the components fail.
+.Pp
+For a RAID 1 (mirror) set, the following configuration might be used:
+.Bd -unfilled -offset indent
+START array
+# numRow numCol numSpare
+1 2 0
+
+START disks
+/dev/da2s10e
+/dev/da2s11e
+
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
+128 1 1 1
+
+START queue
+fifo 100
+.Ed
+.Pp
+In this case, /dev/da2s10e and /dev/da2s11e are the two components of the
+mirror set. While no hot spares have been specified in this
+configuration, they easily could be, just as they were specified in
+the RAID 5 case above. Note as well that RAID 1 sets are currently
+limited to only 2 components. At present, n-way mirroring is not
+possible.
+.Pp
+The first time a RAID set is configured, the
+.Fl C
+option must be used:
+.Bd -unfilled -offset indent
+raidctl -C raid0.conf
+.Ed
+.Pp
+where
+.Sq raid0.conf
+is the name of the RAID configuration file. The
+.Fl C
+forces the configuration to succeed, even if any of the component
+labels are incorrect. The
+.Fl C
+option should not be used lightly in
+situations other than initial configurations; if
+the system is refusing to configure a RAID set, there is probably a
+very good reason for it. After the initial configuration is done (and
+appropriate component labels are added with the
+.Fl I
+option) then raid0 can be configured normally with:
+.Bd -unfilled -offset indent
+raidctl -c raid0.conf
+.Ed
+.Pp
+When the RAID set is configured for the first time, it is
+necessary to initialize the component labels, and to initialize the
+parity on the RAID set. Initializing the component labels is done with:
+.Bd -unfilled -offset indent
+raidctl -I 112341 raid0
+.Ed
+.Pp
+where
+.Sq 112341
+is a user-specified serial number for the RAID set. This
+initialization step is
+.Ar required
+for all RAID sets. As well, using different
+serial numbers between RAID sets is
+.Ar strongly encouraged ,
+as using the same serial number for all RAID sets will only serve to
+decrease the usefulness of the component label checking.
+.Pp
+Initializing the RAID set is done via the
+.Fl i
+option. This initialization
+.Ar MUST
+be done for
+.Ar all
+RAID sets, since among other things it verifies that the parity (if
+any) on the RAID set is correct. Since this initialization may be
+quite time-consuming, the
+.Fl v
+option may be also used in conjunction with
+.Fl i :
+.Bd -unfilled -offset indent
+raidctl -iv raid0
+.Ed
+.Pp
+This will give more verbose output on the
+status of the initialization:
+.Bd -unfilled -offset indent
+Initiating re-write of parity
+Parity Re-write status:
+ 10% |**** | ETA: 06:03 /
+.Ed
+.Pp
+The output provides a
+.Sq Percent Complete
+in both a numeric and graphical format, as well as an estimated time
+to completion of the operation.
+.Pp
+Since it is the parity that provides the
+.Sq redundancy
+part of RAID, it is critical that the parity be correct
+as much of the time as possible. If the parity is not correct, then there is no
+guarantee that data will not be lost if a component fails.
+.Pp
+Once the parity is known to be correct,
+it is then safe to perform
+.Xr disklabel 8 ,
+.Xr newfs 8 ,
+or
+.Xr fsck 8
+on the device or its filesystems, and then to mount the filesystems
+for use.
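+.Pp
+A typical sequence at this point (the partition letter and mount
+point are only illustrative; the same commands are collected in the
+Summary below) would be:
+.Bd -unfilled -offset indent
+disklabel raid0 > /tmp/label
+vi /tmp/label
+disklabel -R -r raid0 /tmp/label
+newfs /dev/rraid0e
+mount /dev/raid0e /mnt
+.Ed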
+.Pp
+Under certain circumstances (e.g. the additional component has not
+arrived, or data is being migrated off a disk destined to become a
+component) it may be desirable to configure a RAID 1 set with only
+a single component. This can be achieved by configuring the set with
+a physically existing component (as either the first or second
+component) and with a
+.Sq fake
+component. In the following:
+.Bd -unfilled -offset indent
+START array
+# numRow numCol numSpare
+1 2 0
+
+START disks
+/dev/da6s1e
+/dev/da0s1e
+
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
+128 1 1 1
+
+START queue
+fifo 100
+.Ed
+.Pp
+/dev/da0s1e is the real component, and will be the second disk of a RAID 1
+set. The component /dev/da6s1e, which must exist but has no physical
+device associated with it, is simply used as a placeholder.
+Configuration (using
+.Fl C
+and
+.Fl I Ar 112341
+as above) proceeds normally, but initialization of the RAID set will
+have to wait until all physical components are present. After
+configuration, this set can be used normally, but will be operating
+in degraded mode. Once a second physical component is obtained, it
+can be hot-added, the existing data mirrored, and normal operation
+resumed.
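+For example, assuming the replacement disk is attached so that it
+appears as /dev/da6s1e, the rebuild onto it could be started with:
+.Bd -unfilled -offset indent
+raidctl -R /dev/da6s1e raid0
+.Ed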
+.Pp
+.Ss Maintenance of the RAID set
+After the parity has been initialized for the first time, the command:
+.Bd -unfilled -offset indent
+raidctl -p raid0
+.Ed
+.Pp
+can be used to check the current status of the parity. To check the
+parity and rebuild it if necessary (for example, after an unclean
+shutdown) the command:
+.Bd -unfilled -offset indent
+raidctl -P raid0
+.Ed
+.Pp
+is used. Note that re-writing the parity can be done while
+other operations on the RAID set are taking place (e.g. while doing a
+.Xr fsck 8
+on a filesystem on the RAID set). However: for maximum effectiveness
+of the RAID set, the parity should be known to be correct before any
+data on the set is modified.
+.Pp
+To see how the RAID set is doing, the following command can be used to
+show the RAID set's status:
+.Bd -unfilled -offset indent
+raidctl -s raid0
+.Ed
+.Pp
+The output will look something like:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: optimal
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: spare
+Component label for /dev/da1s1e:
+ Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
+ Version: 2 Serial Number: 13432 Mod Counter: 65
+ Clean: No Status: 0
+ sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
+ RAID Level: 5 blocksize: 512 numBlocks: 1799936
+ Autoconfig: No
+ Last configured as: raid0
+Component label for /dev/da2s1e:
+ Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
+ Version: 2 Serial Number: 13432 Mod Counter: 65
+ Clean: No Status: 0
+ sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
+ RAID Level: 5 blocksize: 512 numBlocks: 1799936
+ Autoconfig: No
+ Last configured as: raid0
+Component label for /dev/da3s1e:
+ Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
+ Version: 2 Serial Number: 13432 Mod Counter: 65
+ Clean: No Status: 0
+ sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
+ RAID Level: 5 blocksize: 512 numBlocks: 1799936
+ Autoconfig: No
+ Last configured as: raid0
+Parity status: clean
+Reconstruction is 100% complete.
+Parity Re-write is 100% complete.
+Copyback is 100% complete.
+.Ed
+.Pp
+This indicates that all is well with the RAID set. Of importance here
+are the component lines which read
+.Sq optimal ,
+and the
+.Sq Parity status
+line which indicates that the parity is up-to-date. Note that if
+there are filesystems open on the RAID set, the individual components
+will not be
+.Sq clean
+but the set as a whole can still be clean.
+.Pp
+To check the component label of /dev/da1s1e, the following is used:
+.Bd -unfilled -offset indent
+raidctl -g /dev/da1s1e raid0
+.Ed
+.Pp
+The output of this command will look something like:
+.Bd -unfilled -offset indent
+Component label for /dev/da1s1e:
+ Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
+ Version: 2 Serial Number: 13432 Mod Counter: 65
+ Clean: No Status: 0
+ sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
+ RAID Level: 5 blocksize: 512 numBlocks: 1799936
+ Autoconfig: No
+ Last configured as: raid0
+.Ed
+.Pp
+.Ss Dealing with Component Failures
+If for some reason
+(perhaps to test reconstruction) it is necessary to pretend a drive
+has failed, the following will perform that function:
+.Bd -unfilled -offset indent
+raidctl -f /dev/da2s1e raid0
+.Ed
+.Pp
+The system will then be performing all operations in degraded mode,
+where missing data is re-computed from existing data and the parity.
+In this case, obtaining the status of raid0 will return (in part):
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: failed
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: spare
+.Ed
+.Pp
+Note that with the use of
+.Fl f
+a reconstruction has not been started. To both fail the disk and
+start a reconstruction, the
+.Fl F
+option must be used:
+.Bd -unfilled -offset indent
+raidctl -F /dev/da2s1e raid0
+.Ed
+.Pp
+The
+.Fl f
+option may be used first, and then the
+.Fl F
+option used later, on the same disk, if desired.
+Immediately after the reconstruction is started, the status will report:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: reconstructing
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: used_spare
+[...]
+Parity status: clean
+Reconstruction is 10% complete.
+Parity Re-write is 100% complete.
+Copyback is 100% complete.
+.Ed
+.Pp
+This indicates that a reconstruction is in progress. To find out how
+the reconstruction is progressing the
+.Fl S
+option may be used. This will indicate the progress in terms of the
+percentage of the reconstruction that is completed.
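+For example, the progress of this reconstruction could be followed by
+periodically running:
+.Bd -unfilled -offset indent
+raidctl -S raid0
+.Ed
+.Pp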
+When the reconstruction is finished the
+.Fl s
+option will show:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: spared
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: used_spare
+[...]
+Parity status: clean
+Reconstruction is 100% complete.
+Parity Re-write is 100% complete.
+Copyback is 100% complete.
+.Ed
+.Pp
+At this point there are at least two options. First, if /dev/da2s1e is
+known to be good (i.e. the failure was either caused by
+.Fl f
+or
+.Fl F ,
+or the failed disk was replaced), then a copyback of the data can
+be initiated with the
+.Fl B
+option. In this example, this would copy the entire contents of
+/dev/da4s1e to /dev/da2s1e. Once the copyback procedure is complete, the
+status of the device would be (in part):
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: optimal
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: spare
+.Ed
+.Pp
+and the system is back to normal operation.
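+.Pp
+The copyback itself is initiated simply with:
+.Bd -unfilled -offset indent
+raidctl -B raid0
+.Ed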
+.Pp
+The second option after the reconstruction is to simply use /dev/da4s1e
+in place of /dev/da2s1e in the configuration file. For example, the
+configuration file (in part) might now look like:
+.Bd -unfilled -offset indent
+START array
+1 3 0
+
+START disks
+/dev/da1s1e
+/dev/da4s1e
+/dev/da3s1e
+.Ed
+.Pp
+This can be done as /dev/da4s1e is completely interchangeable with
+/dev/da2s1e at this point. Note that extreme care must be taken when
+changing the order of the drives in a configuration. This is one of
+the few instances where the devices and/or their orderings can be
+changed without loss of data! In general, the ordering of components
+in a configuration file should
+.Ar never
+be changed.
+.Pp
+If a component fails and there are no hot spares
+available on-line, the status of the RAID set might (in part) look like:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: failed
+ /dev/da3s1e: optimal
+No spares.
+.Ed
+.Pp
+In this case there are a number of options. The first option is to add a hot
+spare using:
+.Bd -unfilled -offset indent
+raidctl -a /dev/da4s1e raid0
+.Ed
+.Pp
+After the hot add, the status would then be:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: failed
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: spare
+.Ed
+.Pp
+Reconstruction could then take place using
+.Fl F
+as described above.
+.Pp
+A second option is to rebuild directly onto /dev/da2s1e. Once the disk
+containing /dev/da2s1e has been replaced, one can simply use:
+.Bd -unfilled -offset indent
+raidctl -R /dev/da2s1e raid0
+.Ed
+.Pp
+to rebuild the /dev/da2s1e component. As the rebuilding is in progress,
+the status will be:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: reconstructing
+ /dev/da3s1e: optimal
+No spares.
+.Ed
+.Pp
+and when completed, will be:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: optimal
+ /dev/da3s1e: optimal
+No spares.
+.Ed
+.Pp
+In circumstances where a particular component is completely
+unavailable after a reboot, a special component name will be used to
+indicate the missing component. For example:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da2s1e: optimal
+ component1: failed
+No spares.
+.Ed
+.Pp
+indicates that the second component of this RAID set was not detected
+at all by the auto-configuration code. The name
+.Sq component1
+can be used anywhere a normal component name would be used. For
+example, to add a hot spare to the above set, and rebuild to that hot
+spare, the following could be done:
+.Bd -unfilled -offset indent
+raidctl -a /dev/da3s1e raid0
+raidctl -F component1 raid0
+.Ed
+.Pp
+at which point the data missing from
+.Sq component1
+would be reconstructed onto /dev/da3s1e.
+.Pp
+.Ss RAID on RAID
+RAID sets can be layered to create more complex and much larger RAID
+sets. A RAID 0 set, for example, could be constructed from four RAID
+5 sets. The following configuration file shows such a setup:
+.Bd -unfilled -offset indent
+START array
+# numRow numCol numSpare
+1 4 0
+
+START disks
+/dev/raid1e
+/dev/raid2e
+/dev/raid3e
+/dev/raid4e
+
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
+128 1 1 0
+
+START queue
+fifo 100
+.Ed
+.Pp
+A similar configuration file might be used for a RAID 0 set
+constructed from components on RAID 1 sets. In such a configuration,
+the mirroring provides a high degree of redundancy, while the striping
+provides additional speed benefits.
+.Pp
+.Ss Auto-configuration and Root on RAID
+RAID sets can also be auto-configured at boot. To make a set
+auto-configurable, simply prepare the RAID set as above, and then do
+a:
+.Bd -unfilled -offset indent
+raidctl -A yes raid0
+.Ed
+.Pp
+to turn on auto-configuration for that set. To turn off
+auto-configuration, use:
+.Bd -unfilled -offset indent
+raidctl -A no raid0
+.Ed
+.Pp
+RAID sets which are auto-configurable will be configured before the
+root filesystem is mounted. These RAID sets are thus available for
+use as a root filesystem, or for any other filesystem. A primary
+advantage of using the auto-configuration is that RAID components
+become more independent of the disks they reside on. For example,
+SCSI ID's can change, but auto-configured sets will always be
+configured correctly, even if the SCSI ID's of the component disks
+have become scrambled.
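+.Pp
+The auto-configuration setting is stored in the component labels, so
+it can be verified with the
+.Fl g
+output shown earlier; for an auto-configurable set the component
+label should contain a line like:
+.Bd -unfilled -offset indent
+   Autoconfig: Yes
+.Ed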
+.Pp
+Having a system's root filesystem (/) on a RAID set is also allowed,
+with the
+.Sq a
+partition of such a RAID set being used for /.
+To use raid0a as the root filesystem, simply use:
+.Bd -unfilled -offset indent
+raidctl -A root raid0
+.Ed
+.Pp
+To return raid0a to being just an auto-configuring set, simply use the
+.Fl A Ic yes
+arguments.
+.Pp
+Note that kernels can only be directly read from RAID 1 components on
+alpha and pmax architectures. On those architectures, the
+.Dv FS_RAID
+filesystem is recognized by the bootblocks, and will properly load the
+kernel directly from a RAID 1 component. For other architectures, or
+to support the root filesystem on other RAID sets, some other
+mechanism must be used to get a kernel booting. For example, a small
+partition containing only the secondary boot-blocks and an alternate
+kernel (or two) could be used. Once a kernel is booting however, and
+an auto-configuring RAID set is found that is eligible to be root,
+then that RAID set will be auto-configured and used as the root
+device. If two or more RAID sets claim to be root devices, then the
+user will be prompted to select the root device. At this time, RAID
+0, 1, 4, and 5 sets are all supported as root devices.
+.Pp
+A typical RAID 1 setup with root on RAID might be as follows:
+.Bl -enum
+.It
+wd0a - a small partition, which contains a complete, bootable, basic
+NetBSD installation.
+.It
+wd1a - also contains a complete, bootable, basic NetBSD installation.
+.It
+wd0e and wd1e - a RAID 1 set, raid0, used for the root filesystem.
+.It
+wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
+swap space.
+.It
+wd0g and wd1g - a RAID 1 set, raid2, used for /usr, /home, or other
+data, if desired.
+.It
+wd0h and wd1h - a RAID 1 set, raid3, if desired.
+.El
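+.Pp
+A configuration file for the raid0 set in this layout might look
+much like the RAID 1 example given earlier:
+.Bd -unfilled -offset indent
+START array
+# numRow numCol numSpare
+1 2 0
+
+START disks
+/dev/wd0e
+/dev/wd1e
+
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
+128 1 1 1
+
+START queue
+fifo 100
+.Ed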
+.Pp
+RAID sets raid0, raid1, and raid2 are all marked as
+auto-configurable. raid0 is marked as being a root filesystem.
+When new kernels are installed, the kernel is not only copied to /,
+but also to wd0a and wd1a. The kernel on wd0a is required, since that
+is the kernel the system boots from. The kernel on wd1a is also
+required, since that will be the kernel used should wd0 fail. The
+important point here is to have redundant copies of the kernel
+available, in the event that one of the drives fails.
+.Pp
+There is no requirement that the root filesystem be on the same disk
+as the kernel. For example, obtaining the kernel from wd0a, and using
+da0s1e and da1s1e for raid0, and the root filesystem, is fine. It
+.Ar is
+critical, however, that there be multiple kernels available, in the
+event of media failure.
+.Pp
+Multi-layered RAID devices (such as a RAID 0 set made
+up of RAID 1 sets) are
+.Ar not
+supported as root devices or auto-configurable devices at this point.
+(Multi-layered RAID devices
+.Ar are
+supported in general, however, as mentioned earlier.) Note that in
+order to enable component auto-detection and auto-configuration of
+RAID devices, the line:
+.Bd -unfilled -offset indent
+options RAID_AUTOCONFIG
+.Ed
+.Pp
+must be in the kernel configuration file. See
+.Xr raid 4
+for more details.
+.Pp
+.Ss Unconfiguration
+The final operation performed by
+.Nm
+is to unconfigure a
+.Xr raid 4
+device. This is accomplished via a simple:
+.Bd -unfilled -offset indent
+raidctl -u raid0
+.Ed
+.Pp
+at which point the device is ready to be reconfigured.
+.Pp
+.Ss Performance Tuning
+Selection of the various parameter values which result in the best
+performance can be quite tricky, and often requires a bit of
+trial-and-error to get those values most appropriate for a given system.
+A whole range of factors come into play, including:
+.Bl -enum
+.It
+Types of components (e.g. SCSI vs. IDE) and their bandwidth
+.It
+Types of controller cards and their bandwidth
+.It
+Distribution of components among controllers
+.It
+IO bandwidth
+.It
+Filesystem access patterns
+.It
+CPU speed
+.El
+.Pp
+As with most performance tuning, benchmarking under real-life loads
+may be the only way to measure expected performance. Understanding
+some of the underlying technology is also useful in tuning. The goal
+of this section is to provide pointers to those parameters which may
+make significant differences in performance.
+.Pp
+For a RAID 1 set, a SectPerSU value of 64 or 128 is typically
+sufficient. Since data in a RAID 1 set is arranged in a linear
+fashion on each component, selecting an appropriate stripe size is
+somewhat less critical than it is for a RAID 5 set. However: a stripe
+size that is too small will cause large IO's to be broken up into a
+number of smaller ones, hurting performance. At the same time, a
+large stripe size may cause problems with concurrent accesses to
+stripes, which may also affect performance. Thus values in the range
+of 32 to 128 are often the most effective.
+.Pp
+Tuning RAID 5 sets is trickier. In the best case, IO is presented to
+the RAID set one stripe at a time. Since the entire stripe is
+available at the beginning of the IO, the parity of that stripe can
+be calculated before the stripe is written, and then the stripe data
+and parity can be written in parallel. When the amount of data being
+written is less than a full stripe's worth, the
+.Sq small write
+problem occurs. Since a
+.Sq small write
+means only a portion of the stripe on the components is going to
+change, the data (and parity) on the components must be updated
+slightly differently. First, the
+.Sq old parity
+and
+.Sq old data
+must be read from the components. Then the new parity is constructed,
+using the new data to be written, and the old data and old parity.
+Finally, the new data and new parity are written. All this extra data
+shuffling results in a serious loss of performance, and is typically 2
+to 4 times slower than a full stripe write (or read). To combat this
+problem in the real world, it may be useful to ensure that stripe
+sizes are small enough that a
+.Sq large IO
+from the system will use exactly one large stripe write. As is seen
+later, there are some filesystem dependencies which may come into play
+here as well.
+.Pp
+Since the size of a
+.Sq large IO
+is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may
+be desirable to select a SectPerSU value of 16 blocks (8K) or 32
+blocks (16K). Since there are 4 data sectors per stripe, the maximum
+data per stripe is 64 blocks (32K) or 128 blocks (64K). Again,
+empirical measurement will provide the best indicators of which
+values will yield better performance.
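+.Pp
+Expressed in the configuration file format described earlier, such a
+choice for a 5-drive RAID 5 set would give a layout section like:
+.Bd -unfilled -offset indent
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
+16 1 1 5
+.Ed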
+.Pp
+The parameters used for the filesystem are also critical to good
+performance. For
+.Xr newfs 8 ,
+for example, increasing the block size to 32K or 64K may improve
+performance dramatically. As well, changing the cylinders-per-group
+parameter from 16 to 32 or higher is often not only necessary for
+larger filesystems, but may also have positive performance
+implications.
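+.Pp
+For example, a
+.Xr newfs 8
+invocation along those lines (the option values shown are only
+illustrative) might be:
+.Bd -unfilled -offset indent
+newfs -b 32768 -f 4096 -c 32 /dev/rraid0e
+.Ed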
+.Pp
+.Ss Summary
+Despite the length of this man page, configuring a RAID set is a
+relatively straightforward process. All that needs to be done are the
+following steps:
+.Bl -enum
+.It
+Use
+.Xr disklabel 8
+to create the components (of type RAID).
+.It
+Construct a RAID configuration file: e.g.
+.Sq raid0.conf
+.It
+Configure the RAID set with:
+.Bd -unfilled -offset indent
+raidctl -C raid0.conf
+.Ed
+.Pp
+.It
+Initialize the component labels with:
+.Bd -unfilled -offset indent
+raidctl -I 123456 raid0
+.Ed
+.Pp
+.It
+Initialize other important parts of the set with:
+.Bd -unfilled -offset indent
+raidctl -i raid0
+.Ed
+.Pp
+.It
+Get the default label for the RAID set:
+.Bd -unfilled -offset indent
+disklabel raid0 > /tmp/label
+.Ed
+.Pp
+.It
+Edit the label:
+.Bd -unfilled -offset indent
+vi /tmp/label
+.Ed
+.Pp
+.It
+Put the new label on the RAID set:
+.Bd -unfilled -offset indent
+disklabel -R -r raid0 /tmp/label
+.Ed
+.Pp
+.It
+Create the filesystem:
+.Bd -unfilled -offset indent
+newfs /dev/rraid0e
+.Ed
+.Pp
+.It
+Mount the filesystem:
+.Bd -unfilled -offset indent
+mount /dev/raid0e /mnt
+.Ed
+.Pp
+.It
+Use:
+.Bd -unfilled -offset indent
+raidctl -c raid0.conf
+.Ed
+.Pp
+to re-configure the RAID set the next time it is needed, or put
+raid0.conf into /etc where it will automatically be started by
+the /etc/rc scripts.
+.El
+.Pp
+.Sh WARNINGS
+Certain RAID levels (1, 4, 5, 6, and others) can protect against some
+data loss due to component failure. However, the loss of two
+components of a RAID 4 or 5 system, or the loss of a single component
+of a RAID 0 system will result in the entire filesystem being lost.
+RAID is
+.Ar NOT
+a substitute for good backup practices.
+.Pp
+Recomputation of parity
+.Ar MUST
+be performed whenever there is a chance that it may have been
+compromised. This includes after system crashes, or before a RAID
+device has been used for the first time. Failure to keep parity
+correct will be catastrophic should a component ever fail -- it is
+better to use RAID 0 and get the additional space and speed, than it
+is to use parity, but not keep the parity correct. At least with RAID
+0 there is no perception of increased data security.
+.Pp
+.Sh FILES
+.Bl -tag -width /dev/XXrXraidX -compact
+.It Pa /dev/{,r}raid*
+.Cm raid
+device special files.
+.El
+.Pp
+.Sh SEE ALSO
+.Xr ccd 4 ,
+.Xr raid 4 ,
+.Xr rc 8
+.Sh BUGS
+Hot-spare removal is currently not available.
+.Sh HISTORY
+RAIDframe is a framework for rapid prototyping of RAID structures
+developed by the folks at the Parallel Data Laboratory at Carnegie
+Mellon University (CMU).
+A more complete description of the internals and functionality of
+RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
+for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
+Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
+Parallel Data Laboratory of Carnegie Mellon University.
+.Pp
+The
+.Nm
+command first appeared as a program in CMU's RAIDframe v1.1 distribution. This
+version of
+.Nm
+is a complete re-write, and first appeared in
+.Fx 4.4 .
+.Sh COPYRIGHT
+.Bd -unfilled
+The RAIDframe Copyright is as follows:
+
+Copyright (c) 1994-1996 Carnegie-Mellon University.
+All rights reserved.
+
+Permission to use, copy, modify and distribute this software and
+its documentation is hereby granted, provided that both the copyright
+notice and this permission notice appear in all copies of the
+software, derivative works or modified versions, and any portions
+thereof, and that both notices appear in supporting documentation.
+
+CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
+CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
+FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
+
+Carnegie Mellon requests users of this software to return to
+
+ Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
+ School of Computer Science
+ Carnegie Mellon University
+ Pittsburgh PA 15213-3890
+
+any improvements or extensions that they make and grant Carnegie the
+rights to redistribute these changes.
+.Ed