Diffstat (limited to 'sbin/raidctl/raidctl.8')
-rw-r--r-- | sbin/raidctl/raidctl.8 | 1325 |
1 files changed, 1325 insertions, 0 deletions
diff --git a/sbin/raidctl/raidctl.8 b/sbin/raidctl/raidctl.8 new file mode 100644 index 0000000..9aef14f --- /dev/null +++ b/sbin/raidctl/raidctl.8 @@ -0,0 +1,1325 @@ +.\" $FreeBSD$ +.\" $NetBSD: raidctl.8,v 1.21 2000/08/10 15:14:14 oster Exp $ +.\" +.\" Copyright (c) 1998 The NetBSD Foundation, Inc. +.\" All rights reserved. +.\" +.\" This code is derived from software contributed to The NetBSD Foundation +.\" by Greg Oster +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the NetBSD +.\" Foundation, Inc. and its contributors. +.\" 4. Neither the name of The NetBSD Foundation nor the names of its +.\" contributors may be used to endorse or promote products derived +.\" from this software without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS +.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED +.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS +.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +.\" POSSIBILITY OF SUCH DAMAGE. +.\" +.\" +.\" Copyright (c) 1995 Carnegie-Mellon University. +.\" All rights reserved. +.\" +.\" Author: Mark Holland +.\" +.\" Permission to use, copy, modify and distribute this software and +.\" its documentation is hereby granted, provided that both the copyright +.\" notice and this permission notice appear in all copies of the +.\" software, derivative works or modified versions, and any portions +.\" thereof, and that both notices appear in supporting documentation. +.\" +.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" +.\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND +.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. +.\" +.\" Carnegie Mellon requests users of this software to return to +.\" +.\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU +.\" School of Computer Science +.\" Carnegie Mellon University +.\" Pittsburgh PA 15213-3890 +.\" +.\" any improvements or extensions that they make and grant Carnegie the +.\" rights to redistribute these changes. 
+.\"
+.Dd November 6, 1998
+.Dt RAIDCTL 8
+.Os FreeBSD
+.Sh NAME
+.Nm raidctl
+.Nd configuration utility for the RAIDframe disk driver
+.Sh SYNOPSIS
+.Nm
+.Op Fl v
+.Fl a Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl A Op yes | no | root
+.Ar dev
+.Nm
+.Op Fl v
+.Fl B Ar dev
+.Nm
+.Op Fl v
+.Fl c Ar config_file
+.Nm
+.Op Fl v
+.Fl C Ar config_file
+.Nm
+.Op Fl v
+.Fl f Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl F Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl g Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl i Ar dev
+.Nm
+.Op Fl v
+.Fl I Ar serial_number Ar dev
+.Nm
+.Op Fl v
+.Fl p Ar dev
+.Nm
+.Op Fl v
+.Fl P Ar dev
+.Nm
+.Op Fl v
+.Fl r Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl R Ar component Ar dev
+.Nm
+.Op Fl v
+.Fl s Ar dev
+.Nm
+.Op Fl v
+.Fl S Ar dev
+.Nm
+.Op Fl v
+.Fl u Ar dev
+.Sh DESCRIPTION
+.Nm
+is the user-land control program for
+.Xr raid 4 ,
+the RAIDframe disk device.
+.Nm
+is primarily used to dynamically configure and unconfigure RAIDframe disk
+devices. For more information about the RAIDframe disk device, see
+.Xr raid 4 .
+.Pp
+This document assumes the reader has at least rudimentary knowledge of
+RAID and RAID concepts.
+.Pp
+The command-line options for
+.Nm
+are as follows:
+.Bl -tag -width indent
+.It Fl a Ar component Ar dev
+Add
+.Ar component
+as a hot spare for the device
+.Ar dev .
+.It Fl A Ic yes Ar dev
+Make the RAID set auto-configurable. The RAID set will be
+automatically configured at boot
+.Ar before
+the root filesystem is
+mounted. Note that all components of the set must be of type RAID in the
+disklabel.
+.It Fl A Ic no Ar dev
+Turn off auto-configuration for the RAID set.
+.It Fl A Ic root Ar dev
+Make the RAID set auto-configurable, and also mark the set as being
+eligible to be the root partition. A RAID set configured this way
+will
+.Ar override
+the use of the boot disk as the root device. All components of the
+set must be of type RAID in the disklabel. Note that the kernel being
+booted must currently reside on a non-RAID set.
+.It Fl B Ar dev
+Initiate a copyback of reconstructed data from a spare disk to
+its original disk. This is performed after a component has failed,
+and the failed drive has been reconstructed onto a spare drive.
+.It Fl c Ar config_file
+Configure a RAIDframe device
+according to the configuration given in
+.Ar config_file .
+A description of the contents of
+.Ar config_file
+is given later.
+.It Fl C Ar config_file
+As for
+.Ar -c ,
+but forces the configuration to take place. This is required the
+first time a RAID set is configured.
+.It Fl f Ar component Ar dev
+This marks the specified
+.Ar component
+as having failed, but does not initiate a reconstruction of that
+component.
+.It Fl F Ar component Ar dev
+Fails the specified
+.Ar component
+of the device, and immediately begins a reconstruction of the failed
+disk onto an available hot spare. This is one of the mechanisms used to start
+the reconstruction process if a component does have a hardware failure.
+.It Fl g Ar component Ar dev
+Get the component label for the specified component.
+.It Fl i Ar dev
+Initialize the RAID device. In particular, re-write the parity on
+the selected device. This
+.Ar MUST
+be done for
+.Ar all
+RAID sets before the RAID device is labeled and before
+filesystems are created on the RAID device.
+.It Fl I Ar serial_number Ar dev
+Initialize the component labels on each component of the device.
+.Ar serial_number
+is used as one of the keys in determining whether a
+particular set of components belongs to the same RAID set.
While not
+strictly enforced, different serial numbers should be used for
+different RAID sets. This step
+.Ar MUST
+be performed when a new RAID set is created.
+.It Fl p Ar dev
+Check the status of the parity on the RAID set. Displays a status
+message, and returns successfully if the parity is up-to-date.
+.It Fl P Ar dev
+Check the status of the parity on the RAID set, and initialize
+(re-write) the parity if the parity is not known to be up-to-date.
+This is normally used after a system crash (and before a
+.Xr fsck 8 )
+to ensure the integrity of the parity.
+.It Fl r Ar component Ar dev
+Remove the spare disk specified by
+.Ar component
+from the set of available spare components.
+.It Fl R Ar component Ar dev
+Fails the specified
+.Ar component ,
+if necessary, and immediately begins a reconstruction back to
+.Ar component .
+This is useful for reconstructing back onto a component after
+it has been replaced following a failure.
+.It Fl s Ar dev
+Display the status of the RAIDframe device for each of the components
+and spares.
+.It Fl S Ar dev
+Check the status of parity re-writing, component reconstruction, and
+component copyback. The output indicates the amount of progress
+achieved in each of these areas.
+.It Fl u Ar dev
+Unconfigure the RAIDframe device.
+.It Fl v
+Be more verbose. For operations such as reconstructions, parity
+re-writing, and copybacks, provide a progress indicator.
+.El
+.Pp
+The device used by
+.Nm
+is specified by
+.Ar dev .
+.Ar dev
+may be either the full name of the device, e.g. /dev/rraid0d
+for the i386 architecture and /dev/rraid0c
+for all others, or simply raid0 (for /dev/rraid0d).
+.Pp
+The format of the configuration file is complex, and
+only an abbreviated treatment is given here. In the configuration
+files, a
+.Sq #
+indicates the beginning of a comment.
+.Pp
+There are 4 required sections of a configuration file, and 2
+optional sections. Each section begins with a
+.Sq START ,
+followed by
+the section name, and the configuration parameters associated with that
+section. The first section is the
+.Sq array
+section, and it specifies
+the number of rows, columns, and spare disks in the RAID set. For
+example:
+.Bd -unfilled -offset indent
+START array
+1 3 0
+.Ed
+.Pp
+indicates an array with 1 row, 3 columns, and 0 spare disks. Note
+that although multi-dimensional arrays may be specified, they are
+.Ar NOT
+supported in the driver.
+.Pp
+The second section, the
+.Sq disks
+section, specifies the actual
+components of the device. For example:
+.Bd -unfilled -offset indent
+START disks
+/dev/da0s1e
+/dev/da1s1e
+/dev/da2s1e
+.Ed
+.Pp
+specifies the three component disks to be used in the RAID device. If
+any of the specified drives cannot be found when the RAID device is
+configured, then they will be marked as
+.Sq failed ,
+and the system will
+operate in degraded mode. Note that it is
+.Ar imperative
+that the order of the components in the configuration file does not
+change between configurations of a RAID device. Changing the order
+of the components will result in data loss if the set is configured
+with the
+.Fl C
+option. In normal circumstances, the RAID set will not configure if
+only
+.Fl c
+is specified, and the components are out-of-order.
+.Pp
+The next section, which is the
+.Sq spare
+section, is optional, and, if
+present, specifies the devices to be used as
+.Sq hot spares
+-- devices
+which are on-line, but are not actively used by the RAID driver unless
+one of the main components fails.
A simple
+.Sq spare
+section might be:
+.Bd -unfilled -offset indent
+START spare
+/dev/da3s1e
+.Ed
+.Pp
+for a configuration with a single spare component. If no spare drives
+are to be used in the configuration, then the
+.Sq spare
+section may be omitted.
+.Pp
+The next section is the
+.Sq layout
+section. This section describes the
+general layout parameters for the RAID device, and provides such
+information as sectors per stripe unit, stripe units per parity unit,
+stripe units per reconstruction unit, and the parity configuration to
+use. This section might look like:
+.Bd -unfilled -offset indent
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
+32 1 1 5
+.Ed
+.Pp
+The sectors per stripe unit specifies, in blocks, the interleave
+factor; i.e. the number of contiguous sectors to be written to each
+component for a single stripe. Appropriate selection of this value
+(32 in this example) is the subject of much research in RAID
+architectures. The stripe units per parity unit and
+stripe units per reconstruction unit are normally each set to 1.
+While certain values above 1 are permitted, a discussion of valid
+values and the consequences of using anything other than 1 are outside
+the scope of this document. The last value in this section (5 in this
+example) indicates the parity configuration desired. Valid entries
+include:
+.Bl -tag -width inde
+.It 0
+RAID level 0. No parity, only simple striping.
+.It 1
+RAID level 1. Mirroring. The parity is the mirror.
+.It 4
+RAID level 4. Striping across components, with parity stored on the
+last component.
+.It 5
+RAID level 5. Striping across components, parity distributed across
+all components.
+.El
+.Pp
+There are other valid entries here, including those for Even-Odd
+parity, RAID level 5 with rotated sparing, Chained declustering,
+and Interleaved declustering, but as of this writing the code for
+those parity operations has not been tested with
+.Fx .
+.Pp
+The next required section is the
+.Sq queue
+section. This is most often
+specified as:
+.Bd -unfilled -offset indent
+START queue
+fifo 100
+.Ed
+.Pp
+where the queuing method is specified as fifo (first-in, first-out),
+and the size of the per-component queue is limited to 100 requests.
+Other queuing methods may also be specified, but a discussion of them
+is beyond the scope of this document.
+.Pp
+The final section, the
+.Sq debug
+section, is optional. For more details
+on this the reader is referred to the RAIDframe documentation
+discussed in the
+.Sx HISTORY
+section.
+.Pp
+See
+.Sx EXAMPLES
+for a more complete configuration file example.
+.Sh EXAMPLES
+It is highly recommended that, before using the RAID driver for real
+filesystems, the system administrator(s) become quite familiar
+with the use of
+.Nm ,
+and that they understand how the component reconstruction process
+works. The examples in this section will focus on configuring a
+number of different RAID sets of varying degrees of redundancy.
+By working through these examples, administrators should be able to
+develop a good feel for how to configure a RAID set, and how to
+initiate reconstruction of failed components.
+.Pp
+In the following examples
+.Sq raid0
+will be used to denote the RAID device. Depending on the
+architecture,
+.Sq /dev/rraid0c
+or
+.Sq /dev/rraid0d
+may be used in place of
+.Sq raid0 .
+.Pp
+.Ss Initialization and Configuration
+The initial step in configuring a RAID set is to identify the components
+that will be used in the RAID set.
All components should be the same +size. Each component should have a disklabel type of +.Dv FS_RAID , +and a typical disklabel entry for a RAID component +might look like: +.Bd -unfilled -offset indent +f: 1800000 200495 RAID # (Cyl. 405*- 4041*) +.Ed +.Pp +While +.Dv FS_BSDFFS +will also work as the component type, the type +.Dv FS_RAID +is preferred for RAIDframe use, as it is required for features such as +auto-configuration. As part of the initial configuration of each RAID +set, each component will be given a +.Sq component label . +A +.Sq component label +contains important information about the component, including a +user-specified serial number, the row and column of that component in +the RAID set, the redundancy level of the RAID set, a 'modification +counter', and whether the parity information (if any) on that +component is known to be correct. Component labels are an integral +part of the RAID set, since they are used to ensure that components +are configured in the correct order, and used to keep track of other +vital information about the RAID set. Component labels are also +required for the auto-detection and auto-configuration of RAID sets at +boot time. For a component label to be considered valid, that +particular component label must be in agreement with the other +component labels in the set. For example, the serial number, +.Sq modification counter , +number of rows and number of columns must all +be in agreement. If any of these are different, then the component is +not considered to be part of the set. See +.Xr raid 4 +for more information about component labels. +.Pp +Once the components have been identified, and the disks have +appropriate labels, +.Nm +is then used to configure the +.Xr raid 4 +device. To configure the device, a configuration +file which looks something like: +.Bd -unfilled -offset indent +START array +# numRow numCol numSpare +1 3 1 + +START disks +/dev/da1s1e +/dev/da2s1e +/dev/da3s1e + +START spare +/dev/da4s1e + +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5 +32 1 1 5 + +START queue +fifo 100 +.Ed +.Pp +is created in a file. The above configuration file specifies a RAID 5 +set consisting of the components /dev/da1s1e, /dev/da2s1e, and /dev/da3s1e, +with /dev/da4s1e available as a +.Sq hot spare +in case one of +the three main drives should fail. A RAID 0 set would be specified in +a similar way: +.Bd -unfilled -offset indent +START array +# numRow numCol numSpare +1 4 0 + +START disks +/dev/da1s10e +/dev/da1s11e +/dev/da1s12e +/dev/da1s13e + +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0 +64 1 1 0 + +START queue +fifo 100 +.Ed +.Pp +In this case, devices /dev/da1s10e, /dev/da1s11e, /dev/da1s12e, and /dev/da1s13e +are the components that make up this RAID set. Note that there are no +hot spares for a RAID 0 set, since there is no way to recover data if +any of the components fail. +.Pp +For a RAID 1 (mirror) set, the following configuration might be used: +.Bd -unfilled -offset indent +START array +# numRow numCol numSpare +1 2 0 + +START disks +/dev/da2s10e +/dev/da2s11e + +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1 +128 1 1 1 + +START queue +fifo 100 +.Ed +.Pp +In this case, /dev/da2s10e and /dev/da2s11e are the two components of the +mirror set. While no hot spares have been specified in this +configuration, they easily could be, just as they were specified in +the RAID 5 case above. Note as well that RAID 1 sets are currently +limited to only 2 components. 
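+.Pp
+As noted above, hot spares can be added to a mirror set just as in
+the RAID 5 case. For instance, the same mirror set with a single hot
+spare (using an arbitrary additional device, /dev/da2s12e, purely for
+illustration) might be specified as:
+.Bd -unfilled -offset indent
+START array
+# numRow numCol numSpare
+1 2 1
+
+START disks
+/dev/da2s10e
+/dev/da2s11e
+
+START spare
+/dev/da2s12e
+
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
+128 1 1 1
+
+START queue
+fifo 100
+.Ed
+.Pp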
At present, n-way mirroring is not
+possible.
+.Pp
+The first time a RAID set is configured, the
+.Fl C
+option must be used:
+.Bd -unfilled -offset indent
+raidctl -C raid0.conf
+.Ed
+.Pp
+where
+.Sq raid0.conf
+is the name of the RAID configuration file. The
+.Fl C
+forces the configuration to succeed, even if any of the component
+labels are incorrect. The
+.Fl C
+option should not be used lightly in
+situations other than initial configurations, as, if
+the system is refusing to configure a RAID set, there is probably a
+very good reason for it. After the initial configuration is done (and
+appropriate component labels are added with the
+.Fl I
+option) then raid0 can be configured normally with:
+.Bd -unfilled -offset indent
+raidctl -c raid0.conf
+.Ed
+.Pp
+When the RAID set is configured for the first time, it is
+necessary to initialize the component labels, and to initialize the
+parity on the RAID set. Initializing the component labels is done with:
+.Bd -unfilled -offset indent
+raidctl -I 112341 raid0
+.Ed
+.Pp
+where
+.Sq 112341
+is a user-specified serial number for the RAID set. This
+initialization step is
+.Ar required
+for all RAID sets. As well, using different
+serial numbers between RAID sets is
+.Ar strongly encouraged ,
+as using the same serial number for all RAID sets will only serve to
+decrease the usefulness of the component label checking.
+.Pp
+Initializing the RAID set is done via the
+.Fl i
+option. This initialization
+.Ar MUST
+be done for
+.Ar all
+RAID sets, since among other things it verifies that the parity (if
+any) on the RAID set is correct. Since this initialization may be
+quite time-consuming, the
+.Fl v
+option may also be used in conjunction with
+.Fl i :
+.Bd -unfilled -offset indent
+raidctl -iv raid0
+.Ed
+.Pp
+This will give more verbose output on the
+status of the initialization:
+.Bd -unfilled -offset indent
+Initiating re-write of parity
+Parity Re-write status:
+ 10% |**** | ETA: 06:03 /
+.Ed
+.Pp
+The output provides a
+.Sq Percent Complete
+in both a numeric and graphical format, as well as an estimated time
+to completion of the operation.
+.Pp
+Since it is the parity that provides the
+.Sq redundancy
+part of RAID, it is critical that the parity is correct
+as much as possible. If the parity is not correct, then there is no
+guarantee that data will not be lost if a component fails.
+.Pp
+Once the parity is known to be correct,
+it is then safe to perform
+.Xr disklabel 8 ,
+.Xr newfs 8 ,
+or
+.Xr fsck 8
+on the device or its filesystems, and then to mount the filesystems
+for use.
+.Pp
+Under certain circumstances (e.g. the additional component has not
+arrived, or data is being migrated off of a disk destined to become a
+component) it may be desirable to configure a RAID 1 set with only
+a single component. This can be achieved by configuring the set with
+a physically existing component (as either the first or second
+component) and with a
+.Sq fake
+component. In the following:
+.Bd -unfilled -offset indent
+START array
+# numRow numCol numSpare
+1 2 0
+
+START disks
+/dev/da6s1e
+/dev/da0s1e
+
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
+128 1 1 1
+
+START queue
+fifo 100
+.Ed
+.Pp
+/dev/da0s1e is the real component, and will be the second disk of a RAID 1
+set. The component /dev/da6s1e, which must exist but which has no physical
+device associated with it, is simply used as a placeholder.
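+.Pp
+For example, assuming the above configuration has been saved in a
+file called
+.Sq raid0.conf
+(the file name is arbitrary), the set might then be configured and
+its component labels initialized with:
+.Bd -unfilled -offset indent
+raidctl -C raid0.conf
+raidctl -I 12345 raid0
+.Ed
+.Pp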
+Configuration (using
+.Fl C
+and
+.Fl I Ar 12345
+as above) proceeds normally, but initialization of the RAID set will
+have to wait until all physical components are present. After
+configuration, this set can be used normally, but will be operating
+in degraded mode. Once a second physical component is obtained, it
+can be hot-added, the existing data mirrored, and normal operation
+resumed.
+.Pp
+.Ss Maintenance of the RAID set
+After the parity has been initialized for the first time, the command:
+.Bd -unfilled -offset indent
+raidctl -p raid0
+.Ed
+.Pp
+can be used to check the current status of the parity. To check the
+parity and rebuild it if necessary (for example, after an unclean
+shutdown), the command:
+.Bd -unfilled -offset indent
+raidctl -P raid0
+.Ed
+.Pp
+is used. Note that re-writing the parity can be done while
+other operations on the RAID set are taking place (e.g. while doing a
+.Xr fsck 8
+on a filesystem on the RAID set). However: for maximum effectiveness
+of the RAID set, the parity should be known to be correct before any
+data on the set is modified.
+.Pp
+To see how the RAID set is doing, the following command can be used to
+show the RAID set's status:
+.Bd -unfilled -offset indent
+raidctl -s raid0
+.Ed
+.Pp
+The output will look something like:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: optimal
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: spare
+Component label for /dev/da1s1e:
+ Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
+ Version: 2 Serial Number: 13432 Mod Counter: 65
+ Clean: No Status: 0
+ sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
+ RAID Level: 5 blocksize: 512 numBlocks: 1799936
+ Autoconfig: No
+ Last configured as: raid0
+Component label for /dev/da2s1e:
+ Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
+ Version: 2 Serial Number: 13432 Mod Counter: 65
+ Clean: No Status: 0
+ sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
+ RAID Level: 5 blocksize: 512 numBlocks: 1799936
+ Autoconfig: No
+ Last configured as: raid0
+Component label for /dev/da3s1e:
+ Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
+ Version: 2 Serial Number: 13432 Mod Counter: 65
+ Clean: No Status: 0
+ sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
+ RAID Level: 5 blocksize: 512 numBlocks: 1799936
+ Autoconfig: No
+ Last configured as: raid0
+Parity status: clean
+Reconstruction is 100% complete.
+Parity Re-write is 100% complete.
+Copyback is 100% complete.
+.Ed
+.Pp
+This indicates that all is well with the RAID set. Of importance here
+are the component lines which read
+.Sq optimal ,
+and the
+.Sq Parity status
+line which indicates that the parity is up-to-date. Note that if
+there are filesystems open on the RAID set, the individual components
+will not be
+.Sq clean
+but the set as a whole can still be clean.
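+.Pp
+Since
+.Fl p
+returns successfully only when the parity is known to be up-to-date,
+its exit status can be used from a script. A minimal sketch (the
+device name raid0 is only an example) which re-writes the parity only
+when it is suspect might be:
+.Bd -unfilled -offset indent
+#!/bin/sh
+# Re-write the parity on raid0 only if it is not known to be good.
+if ! raidctl -p raid0 > /dev/null 2>&1; then
+        raidctl -P raid0
+fi
+.Ed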
+.Pp
+To check the component label of /dev/da1s1e, the following is used:
+.Bd -unfilled -offset indent
+raidctl -g /dev/da1s1e raid0
+.Ed
+.Pp
+The output of this command will look something like:
+.Bd -unfilled -offset indent
+Component label for /dev/da1s1e:
+ Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
+ Version: 2 Serial Number: 13432 Mod Counter: 65
+ Clean: No Status: 0
+ sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
+ RAID Level: 5 blocksize: 512 numBlocks: 1799936
+ Autoconfig: No
+ Last configured as: raid0
+.Ed
+.Pp
+.Ss Dealing with Component Failures
+If for some reason
+(perhaps to test reconstruction) it is necessary to pretend a drive
+has failed, the following will perform that function:
+.Bd -unfilled -offset indent
+raidctl -f /dev/da2s1e raid0
+.Ed
+.Pp
+The system will then be performing all operations in degraded mode,
+where missing data is re-computed from existing data and the parity.
+In this case, obtaining the status of raid0 will return (in part):
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: failed
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: spare
+.Ed
+.Pp
+Note that with the use of
+.Fl f
+a reconstruction has not been started. To both fail the disk and
+start a reconstruction, the
+.Fl F
+option must be used:
+.Bd -unfilled -offset indent
+raidctl -F /dev/da2s1e raid0
+.Ed
+.Pp
+The
+.Fl f
+option may be used first, and then the
+.Fl F
+option used later, on the same disk, if desired.
+Immediately after the reconstruction is started, the status will report:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: reconstructing
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: used_spare
+[...]
+Parity status: clean
+Reconstruction is 10% complete.
+Parity Re-write is 100% complete.
+Copyback is 100% complete.
+.Ed
+.Pp
+This indicates that a reconstruction is in progress. To find out how
+the reconstruction is progressing, the
+.Fl S
+option may be used. This will indicate the progress in terms of the
+percentage of the reconstruction that is completed. When the
+reconstruction is finished, the
+.Fl s
+option will show:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: spared
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: used_spare
+[...]
+Parity status: clean
+Reconstruction is 100% complete.
+Parity Re-write is 100% complete.
+Copyback is 100% complete.
+.Ed
+.Pp
+At this point there are at least two options. First, if /dev/da2s1e is
+known to be good (i.e. the failure was either caused by
+.Fl f
+or
+.Fl F ,
+or the failed disk was replaced), then a copyback of the data can
+be initiated with the
+.Fl B
+option. In this example, this would copy the entire contents of
+/dev/da4s1e to /dev/da2s1e. Once the copyback procedure is complete, the
+status of the device would be (in part):
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: optimal
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: spare
+.Ed
+.Pp
+and the system is back to normal operation.
+.Pp
+The second option after the reconstruction is to simply use /dev/da4s1e
+in place of /dev/da2s1e in the configuration file. For example, the
+configuration file (in part) might now look like:
+.Bd -unfilled -offset indent
+START array
+1 3 0
+
+START disks
+/dev/da1s1e
+/dev/da4s1e
+/dev/da3s1e
+.Ed
+.Pp
+This can be done as /dev/da4s1e is completely interchangeable with
+/dev/da2s1e at this point. Note that extreme care must be taken when
+changing the order of the drives in a configuration.
This is one of
+the few instances where the devices and/or their orderings can be
+changed without loss of data! In general, the ordering of components
+in a configuration file should
+.Ar never
+be changed.
+.Pp
+If a component fails and there are no hot spares
+available on-line, the status of the RAID set might (in part) look like:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: failed
+ /dev/da3s1e: optimal
+No spares.
+.Ed
+.Pp
+In this case there are a number of options. The first option is to add a hot
+spare using:
+.Bd -unfilled -offset indent
+raidctl -a /dev/da4s1e raid0
+.Ed
+.Pp
+After the hot add, the status would then be:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: failed
+ /dev/da3s1e: optimal
+Spares:
+ /dev/da4s1e: spare
+.Ed
+.Pp
+Reconstruction could then take place using
+.Fl F
+as described above.
+.Pp
+A second option is to rebuild directly onto /dev/da2s1e. Once the disk
+containing /dev/da2s1e has been replaced, one can simply use:
+.Bd -unfilled -offset indent
+raidctl -R /dev/da2s1e raid0
+.Ed
+.Pp
+to rebuild the /dev/da2s1e component. As the rebuilding is in progress,
+the status will be:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: reconstructing
+ /dev/da3s1e: optimal
+No spares.
+.Ed
+.Pp
+and when completed, will be:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da1s1e: optimal
+ /dev/da2s1e: optimal
+ /dev/da3s1e: optimal
+No spares.
+.Ed
+.Pp
+In circumstances where a particular component is completely
+unavailable after a reboot, a special component name will be used to
+indicate the missing component. For example:
+.Bd -unfilled -offset indent
+Components:
+ /dev/da2s1e: optimal
+ component1: failed
+No spares.
+.Ed
+.Pp
+indicates that the second component of this RAID set was not detected
+at all by the auto-configuration code. The name
+.Sq component1
+can be used anywhere a normal component name would be used. For
+example, to add a hot spare to the above set, and rebuild to that hot
+spare, the following could be done:
+.Bd -unfilled -offset indent
+raidctl -a /dev/da3s1e raid0
+raidctl -F component1 raid0
+.Ed
+.Pp
+at which point the data missing from
+.Sq component1
+would be reconstructed onto /dev/da3s1e.
+.Pp
+.Ss RAID on RAID
+RAID sets can be layered to create more complex and much larger RAID
+sets. A RAID 0 set, for example, could be constructed from four RAID
+5 sets. The following configuration file shows such a setup:
+.Bd -unfilled -offset indent
+START array
+# numRow numCol numSpare
+1 4 0
+
+START disks
+/dev/raid1e
+/dev/raid2e
+/dev/raid3e
+/dev/raid4e
+
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
+128 1 1 0
+
+START queue
+fifo 100
+.Ed
+.Pp
+A similar configuration file might be used for a RAID 0 set
+constructed from components on RAID 1 sets. In such a configuration,
+the mirroring provides a high degree of redundancy, while the striping
+provides additional speed benefits.
+.Pp
+.Ss Auto-configuration and Root on RAID
+RAID sets can also be auto-configured at boot. To make a set
+auto-configurable, simply prepare the RAID set as above, and then do
+a:
+.Bd -unfilled -offset indent
+raidctl -A yes raid0
+.Ed
+.Pp
+to turn on auto-configuration for that set. To turn off
+auto-configuration, use:
+.Bd -unfilled -offset indent
+raidctl -A no raid0
+.Ed
+.Pp
+RAID sets which are auto-configurable will be configured before the
+root filesystem is mounted.
These RAID sets are thus available for
+use as a root filesystem, or for any other filesystem. A primary
+advantage of using the auto-configuration is that RAID components
+become more independent of the disks they reside on. For example,
+SCSI ID's can change, but auto-configured sets will always be
+configured correctly, even if the SCSI ID's of the component disks
+have become scrambled.
+.Pp
+Having a system's root filesystem (/) on a RAID set is also allowed,
+with the
+.Sq a
+partition of such a RAID set being used for /.
+To use raid0a as the root filesystem, simply use:
+.Bd -unfilled -offset indent
+raidctl -A root raid0
+.Ed
+.Pp
+To return raid0a to be just an auto-configuring set, simply use the
+.Fl A Ar yes
+arguments.
+.Pp
+Note that kernels can only be directly read from RAID 1 components on
+alpha and pmax architectures. On those architectures, the
+.Dv FS_RAID
+filesystem is recognized by the bootblocks, and will properly load the
+kernel directly from a RAID 1 component. For other architectures, or
+to support the root filesystem on other RAID sets, some other
+mechanism must be used to get a kernel booting. For example, a small
+partition containing only the secondary boot-blocks and an alternate
+kernel (or two) could be used. Once a kernel is booting, however, and
+an auto-configuring RAID set is found that is eligible to be root,
+then that RAID set will be auto-configured and used as the root
+device. If two or more RAID sets claim to be root devices, then the
+user will be prompted to select the root device. At this time, RAID
+0, 1, 4, and 5 sets are all supported as root devices.
+.Pp
+A typical RAID 1 setup with root on RAID might be as follows:
+.Bl -enum
+.It
+wd0a - a small partition, which contains a complete, bootable, basic
+NetBSD installation.
+.It
+wd1a - also contains a complete, bootable, basic NetBSD installation.
+.It
+wd0e and wd1e - a RAID 1 set, raid0, used for the root filesystem.
+.It
+wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
+swap space.
+.It
+wd0g and wd1g - a RAID 1 set, raid2, used for /usr, /home, or other
+data, if desired.
+.It
+wd0h and wd1h - a RAID 1 set, raid3, if desired.
+.El
+.Pp
+RAID sets raid0, raid1, and raid2 are all marked as
+auto-configurable. raid0 is marked as being a root filesystem.
+When new kernels are installed, the kernel is not only copied to /,
+but also to wd0a and wd1a. The kernel on wd0a is required, since that
+is the kernel the system boots from. The kernel on wd1a is also
+required, since that will be the kernel used should wd0 fail. The
+important point here is to have redundant copies of the kernel
+available, in the event that one of the drives fails.
+.Pp
+There is no requirement that the root filesystem be on the same disk
+as the kernel. For example, obtaining the kernel from wd0a, and using
+da0s1e and da1s1e for raid0, and the root filesystem, is fine. It
+.Ar is
+critical, however, that there be multiple kernels available, in the
+event of media failure.
+.Pp
+Multi-layered RAID devices (such as a RAID 0 set made
+up of RAID 1 sets) are
+.Ar not
+supported as root devices or auto-configurable devices at this point.
+(Multi-layered RAID devices
+.Ar are
+supported in general, however, as mentioned earlier.) Note that in
+order to enable component auto-detection and auto-configuration of
+RAID devices, the line:
+.Bd -unfilled -offset indent
+options RAID_AUTOCONFIG
+.Ed
+.Pp
+must be in the kernel configuration file. See
+.Xr raid 4
+for more details.
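+.Pp
+For the typical layout described above, the sets might be marked
+auto-configurable (with raid0 additionally marked as eligible to be
+the root device) using:
+.Bd -unfilled -offset indent
+raidctl -A root raid0
+raidctl -A yes raid1
+raidctl -A yes raid2
+.Ed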
+.Pp
+.Ss Unconfiguration
+The final operation performed by
+.Nm
+is to unconfigure a
+.Xr raid 4
+device. This is accomplished via a simple:
+.Bd -unfilled -offset indent
+raidctl -u raid0
+.Ed
+.Pp
+at which point the device is ready to be reconfigured.
+.Pp
+.Ss Performance Tuning
+Selection of the various parameter values which result in the best
+performance can be quite tricky, and often requires a bit of
+trial-and-error to get those values most appropriate for a given system.
+A whole range of factors come into play, including:
+.Bl -enum
+.It
+Types of components (e.g. SCSI vs. IDE) and their bandwidth
+.It
+Types of controller cards and their bandwidth
+.It
+Distribution of components among controllers
+.It
+IO bandwidth
+.It
+Filesystem access patterns
+.It
+CPU speed
+.El
+.Pp
+As with most performance tuning, benchmarking under real-life loads
+may be the only way to measure expected performance. Understanding
+some of the underlying technology is also useful in tuning. The goal
+of this section is to provide pointers to those parameters which may
+make significant differences in performance.
+.Pp
+For a RAID 1 set, a SectPerSU value of 64 or 128 is typically
+sufficient. Since data in a RAID 1 set is arranged in a linear
+fashion on each component, selecting an appropriate stripe size is
+somewhat less critical than it is for a RAID 5 set. However: a stripe
+size that is too small will cause large IO's to be broken up into a
+number of smaller ones, hurting performance. At the same time, a
+large stripe size may cause problems with concurrent accesses to
+stripes, which may also affect performance. Thus values in the range
+of 32 to 128 are often the most effective.
+.Pp
+Tuning RAID 5 sets is trickier. In the best case, IO is presented to
+the RAID set one stripe at a time. Since the entire stripe is
+available at the beginning of the IO, the parity of that stripe can
+be calculated before the stripe is written, and then the stripe data
+and parity can be written in parallel. When the amount of data being
+written is less than a full stripe worth, the
+.Sq small write
+problem occurs. Since a
+.Sq small write
+means only a portion of the stripe on the components is going to
+change, the data (and parity) on the components must be updated
+slightly differently. First, the
+.Sq old parity
+and
+.Sq old data
+must be read from the components. Then the new parity is constructed,
+using the new data to be written, and the old data and old parity.
+Finally, the new data and new parity are written. All this extra data
+shuffling results in a serious loss of performance, and is typically 2
+to 4 times slower than a full stripe write (or read). To combat this
+problem in the real world, it may be useful to ensure that stripe
+sizes are small enough that a
+.Sq large IO
+from the system will use exactly one large stripe write. As is seen
+later, there are some filesystem dependencies which may come into play
+here as well.
+.Pp
+Since the size of a
+.Sq large IO
+is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may
+be desirable to select a SectPerSU value of 16 blocks (8K) or 32
+blocks (16K). Since there are 4 data stripe units per stripe, the maximum
+data per stripe is 64 blocks (32K) or 128 blocks (64K). Again,
+empirical measurement will provide the best indicators of which
+values will yield better performance.
+.Pp
+The parameters used for the filesystem are also critical to good
+performance.
For +.Xr newfs 8 , +for example, increasing the block size to 32K or 64K may improve +performance dramatically. As well, changing the cylinders-per-group +parameter from 16 to 32 or higher is often not only necessary for +larger filesystems, but may also have positive performance +implications. +.Pp +.Ss Summary +Despite the length of this man-page, configuring a RAID set is a +relatively straight-forward process. All that needs to be done is the +following steps: +.Bl -enum +.It +Use +.Xr disklabel 8 +to create the components (of type RAID). +.It +Construct a RAID configuration file: e.g. +.Sq raid0.conf +.It +Configure the RAID set with: +.Bd -unfilled -offset indent +raidctl -C raid0.conf +.Ed +.Pp +.It +Initialize the component labels with: +.Bd -unfilled -offset indent +raidctl -I 123456 raid0 +.Ed +.Pp +.It +Initialize other important parts of the set with: +.Bd -unfilled -offset indent +raidctl -i raid0 +.Ed +.Pp +.It +Get the default label for the RAID set: +.Bd -unfilled -offset indent +disklabel raid0 > /tmp/label +.Ed +.Pp +.It +Edit the label: +.Bd -unfilled -offset indent +vi /tmp/label +.Ed +.Pp +.It +Put the new label on the RAID set: +.Bd -unfilled -offset indent +disklabel -R -r raid0 /tmp/label +.Ed +.Pp +.It +Create the filesystem: +.Bd -unfilled -offset indent +newfs /dev/rraid0e +.Ed +.Pp +.It +Mount the filesystem: +.Bd -unfilled -offset indent +mount /dev/raid0e /mnt +.Ed +.Pp +.It +Use: +.Bd -unfilled -offset indent +raidctl -c raid0.conf +.Ed +.Pp +To re-configure the RAID set the next time it is needed, or put +raid0.conf into /etc where it will automatically be started by +the /etc/rc scripts. +.El +.Pp +.Sh WARNINGS +Certain RAID levels (1, 4, 5, 6, and others) can protect against some +data loss due to component failure. However the loss of two +components of a RAID 4 or 5 system, or the loss of a single component +of a RAID 0 system will result in the entire filesystem being lost. +RAID is +.Ar NOT +a substitute for good backup practices. +.Pp +Recomputation of parity +.Ar MUST +be performed whenever there is a chance that it may have been +compromised. This includes after system crashes, or before a RAID +device has been used for the first time. Failure to keep parity +correct will be catastrophic should a component ever fail -- it is +better to use RAID 0 and get the additional space and speed, than it +is to use parity, but not keep the parity correct. At least with RAID +0 there is no perception of increased data security. +.Pp +.Sh FILES +.Bl -tag -width /dev/XXrXraidX -compact +.It Pa /dev/{,r}raid* +.Cm raid +device special files. +.El +.Pp +.Sh SEE ALSO +.Xr raid 4 , +.Xr ccd 4 , +.Xr rc 8 +.Sh BUGS +Hot-spare removal is currently not available. +.Sh HISTORY +RAIDframe is a framework for rapid prototyping of RAID structures +developed by the folks at the Parallel Data Laboratory at Carnegie +Mellon University (CMU). +A more complete description of the internals and functionality of +RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool +for RAID Systems", by William V. Courtright II, Garth Gibson, Mark +Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the +Parallel Data Laboratory of Carnegie Mellon University. +.Pp +The +.Nm +command first appeared as a program in CMU's RAIDframe v1.1 distribution. This +version of +.Nm +is a complete re-write, and first appeared in +.Fx 4.4 . +.Sh COPYRIGHT +.Bd -unfilled +The RAIDframe Copyright is as follows: + +Copyright (c) 1994-1996 Carnegie-Mellon University. +All rights reserved. 
+
+Permission to use, copy, modify and distribute this software and
+its documentation is hereby granted, provided that both the copyright
+notice and this permission notice appear in all copies of the
+software, derivative works or modified versions, and any portions
+thereof, and that both notices appear in supporting documentation.
+
+CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
+CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
+FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
+
+Carnegie Mellon requests users of this software to return to
+
+ Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
+ School of Computer Science
+ Carnegie Mellon University
+ Pittsburgh PA 15213-3890
+
+any improvements or extensions that they make and grant Carnegie the
+rights to redistribute these changes.
+.Ed