summaryrefslogtreecommitdiffstats
path: root/share/doc/psd
diff options
context:
space:
mode:
authorgrog <grog@FreeBSD.org>2002-05-19 05:57:43 +0000
committergrog <grog@FreeBSD.org>2002-05-19 05:57:43 +0000
commit6c5f4af7cbed52bd39c7d0fa1117fad665887bb6 (patch)
treec5e29fcfe83241a2f8f80a068a7b0e69be91acd9 /share/doc/psd
parent850ebcf984161ee76292c91f4344c0a2c72f5d90 (diff)
downloadFreeBSD-src-6c5f4af7cbed52bd39c7d0fa1117fad665887bb6.zip
FreeBSD-src-6c5f4af7cbed52bd39c7d0fa1117fad665887bb6.tar.gz
Initial checkin: 4.4BSD version. These files need to be updated with
current license information and adapted to the FreeBSD build environment before they will build. Approved by: David Taylor <davidt@caldera.com>
Diffstat (limited to 'share/doc/psd')
-rw-r--r--share/doc/psd/03.iosys/Makefile11
-rw-r--r--share/doc/psd/03.iosys/iosys1054
2 files changed, 1065 insertions, 0 deletions
diff --git a/share/doc/psd/03.iosys/Makefile b/share/doc/psd/03.iosys/Makefile
new file mode 100644
index 0000000..40132a6
--- /dev/null
+++ b/share/doc/psd/03.iosys/Makefile
@@ -0,0 +1,11 @@
+# @(#)Makefile 8.1 (Berkeley) 6/8/93
+# $FreeBSD$
+
+DIR= psd/03.iosys
+SRCS= iosys
+MACROS= -ms
+
+paper.ps: ${SRCS}
+ ${ROFF} ${SRCS} > ${.TARGET}
+
+.include <bsd.doc.mk>
diff --git a/share/doc/psd/03.iosys/iosys b/share/doc/psd/03.iosys/iosys
new file mode 100644
index 0000000..678c4c0
--- /dev/null
+++ b/share/doc/psd/03.iosys/iosys
@@ -0,0 +1,1054 @@
+.\" This module is believed to contain source code proprietary to AT&T.
+.\" Use and redistribution is subject to the Berkeley Software License
+.\" Agreement and your Software Agreement with AT&T (Western Electric).
+.\"
+.\" @(#)iosys 8.1 (Berkeley) 6/8/93
+.\"
+.\" $FreeBSD$
+.EH 'PSD:3-%''The UNIX I/O System'
+.OH 'The UNIX I/O System''PSD:3-%'
+.TL
+The UNIX I/O System
+.AU
+Dennis M. Ritchie
+.AI
+AT&T Bell Laboratories
+Murray Hill, NJ
+.PP
+This paper gives an overview of the workings of the UNIX\(dg
+.FS
+\(dgUNIX is a Trademark of Bell Laboratories.
+.FE
+I/O system.
+It was written with an eye toward providing
+guidance to writers of device driver routines,
+and is oriented more toward describing the environment
+and nature of device drivers than the implementation
+of that part of the file system which deals with
+ordinary files.
+.PP
+It is assumed that the reader has a good knowledge
+of the overall structure of the file system as discussed
+in the paper ``The UNIX Time-sharing System.''
+A more detailed discussion
+appears in
+``UNIX Implementation;''
+the current document restates parts of that one,
+but is still more detailed.
+It is most useful in
+conjunction with a copy of the system code,
+since it is basically an exegesis of that code.
+.SH
+Device Classes
+.PP
+There are two classes of device:
+.I block
+and
+.I character.
+The block interface is suitable for devices
+like disks, tapes, and DECtape
+which work, or can work, with addressible 512-byte blocks.
+Ordinary magnetic tape just barely fits in this category,
+since by use of forward
+and
+backward spacing any block can be read, even though
+blocks can be written only at the end of the tape.
+Block devices can at least potentially contain a mounted
+file system.
+The interface to block devices is very highly structured;
+the drivers for these devices share a great many routines
+as well as a pool of buffers.
+.PP
+Character-type devices have a much
+more straightforward interface, although
+more work must be done by the driver itself.
+.PP
+Devices of both types are named by a
+.I major
+and a
+.I minor
+device number.
+These numbers are generally stored as an integer
+with the minor device number
+in the low-order 8 bits and the major device number
+in the next-higher 8 bits;
+macros
+.I major
+and
+.I minor
+are available to access these numbers.
+The major device number selects which driver will deal with
+the device; the minor device number is not used
+by the rest of the system but is passed to the
+driver at appropriate times.
+Typically the minor number
+selects a subdevice attached to
+a given controller, or one of
+several similar hardware interfaces.
+.PP
+The major device numbers for block and character devices
+are used as indices in separate tables;
+they both start at 0 and therefore overlap.
+.SH
+Overview of I/O
+.PP
+The purpose of
+the
+.I open
+and
+.I creat
+system calls is to set up entries in three separate
+system tables.
+The first of these is the
+.I u_ofile
+table,
+which is stored in the system's per-process
+data area
+.I u.
+This table is indexed by
+the file descriptor returned by the
+.I open
+or
+.I creat,
+and is accessed during
+a
+.I read,
+.I write,
+or other operation on the open file.
+An entry contains only
+a pointer to the corresponding
+entry of the
+.I file
+table,
+which is a per-system data base.
+There is one entry in the
+.I file
+table for each
+instance of
+.I open
+or
+.I creat.
+This table is per-system because the same instance
+of an open file must be shared among the several processes
+which can result from
+.I forks
+after the file is opened.
+A
+.I file
+table entry contains
+flags which indicate whether the file
+was open for reading or writing or is a pipe, and
+a count which is used to decide when all processes
+using the entry have terminated or closed the file
+(so the entry can be abandoned).
+There is also a 32-bit file offset
+which is used to indicate where in the file the next read
+or write will take place.
+Finally, there is a pointer to the
+entry for the file in the
+.I inode
+table,
+which contains a copy of the file's i-node.
+.PP
+Certain open files can be designated ``multiplexed''
+files, and several other flags apply to such
+channels.
+In such a case, instead of an offset,
+there is a pointer to an associated multiplex channel table.
+Multiplex channels will not be discussed here.
+.PP
+An entry in the
+.I file
+table corresponds precisely to an instance of
+.I open
+or
+.I creat;
+if the same file is opened several times,
+it will have several
+entries in this table.
+However,
+there is at most one entry
+in the
+.I inode
+table for a given file.
+Also, a file may enter the
+.I inode
+table not only because it is open,
+but also because it is the current directory
+of some process or because it
+is a special file containing a currently-mounted
+file system.
+.PP
+An entry in the
+.I inode
+table differs somewhat from the
+corresponding i-node as stored on the disk;
+the modified and accessed times are not stored,
+and the entry is augmented
+by a flag word containing information about the entry,
+a count used to determine when it may be
+allowed to disappear,
+and the device and i-number
+whence the entry came.
+Also, the several block numbers that give addressing
+information for the file are expanded from
+the 3-byte, compressed format used on the disk to full
+.I long
+quantities.
+.PP
+During the processing of an
+.I open
+or
+.I creat
+call for a special file,
+the system always calls the device's
+.I open
+routine to allow for any special processing
+required (rewinding a tape, turning on
+the data-terminal-ready lead of a modem, etc.).
+However,
+the
+.I close
+routine is called only when the last
+process closes a file,
+that is, when the i-node table entry
+is being deallocated.
+Thus it is not feasible
+for a device to maintain, or depend on,
+a count of its users, although it is quite
+possible to
+implement an exclusive-use device which cannot
+be reopened until it has been closed.
+.PP
+When a
+.I read
+or
+.I write
+takes place,
+the user's arguments
+and the
+.I file
+table entry are used to set up the
+variables
+.I u.u_base,
+.I u.u_count,
+and
+.I u.u_offset
+which respectively contain the (user) address
+of the I/O target area, the byte-count for the transfer,
+and the current location in the file.
+If the file referred to is
+a character-type special file, the appropriate read
+or write routine is called; it is responsible
+for transferring data and updating the
+count and current location appropriately
+as discussed below.
+Otherwise, the current location is used to calculate
+a logical block number in the file.
+If the file is an ordinary file the logical block
+number must be mapped (possibly using indirect blocks)
+to a physical block number; a block-type
+special file need not be mapped.
+This mapping is performed by the
+.I bmap
+routine.
+In any event, the resulting physical block number
+is used, as discussed below, to
+read or write the appropriate device.
+.SH
+Character Device Drivers
+.PP
+The
+.I cdevsw
+table specifies the interface routines present for
+character devices.
+Each device provides five routines:
+open, close, read, write, and special-function
+(to implement the
+.I ioctl
+system call).
+Any of these may be missing.
+If a call on the routine
+should be ignored,
+(e.g.
+.I open
+on non-exclusive devices that require no setup)
+the
+.I cdevsw
+entry can be given as
+.I nulldev;
+if it should be considered an error,
+(e.g.
+.I write
+on read-only devices)
+.I nodev
+is used.
+For terminals,
+the
+.I cdevsw
+structure also contains a pointer to the
+.I tty
+structure associated with the terminal.
+.PP
+The
+.I open
+routine is called each time the file
+is opened with the full device number as argument.
+The second argument is a flag which is
+non-zero only if the device is to be written upon.
+.PP
+The
+.I close
+routine is called only when the file
+is closed for the last time,
+that is when the very last process in
+which the file is open closes it.
+This means it is not possible for the driver to
+maintain its own count of its users.
+The first argument is the device number;
+the second is a flag which is non-zero
+if the file was open for writing in the process which
+performs the final
+.I close.
+.PP
+When
+.I write
+is called, it is supplied the device
+as argument.
+The per-user variable
+.I u.u_count
+has been set to
+the number of characters indicated by the user;
+for character devices, this number may be 0
+initially.
+.I u.u_base
+is the address supplied by the user from which to start
+taking characters.
+The system may call the
+routine internally, so the
+flag
+.I u.u_segflg
+is supplied that indicates,
+if
+.I on,
+that
+.I u.u_base
+refers to the system address space instead of
+the user's.
+.PP
+The
+.I write
+routine
+should copy up to
+.I u.u_count
+characters from the user's buffer to the device,
+decrementing
+.I u.u_count
+for each character passed.
+For most drivers, which work one character at a time,
+the routine
+.I "cpass( )"
+is used to pick up characters
+from the user's buffer.
+Successive calls on it return
+the characters to be written until
+.I u.u_count
+goes to 0 or an error occurs,
+when it returns \(mi1.
+.I Cpass
+takes care of interrogating
+.I u.u_segflg
+and updating
+.I u.u_count.
+.PP
+Write routines which want to transfer
+a probably large number of characters into an internal
+buffer may also use the routine
+.I "iomove(buffer, offset, count, flag)"
+which is faster when many characters must be moved.
+.I Iomove
+transfers up to
+.I count
+characters into the
+.I buffer
+starting
+.I offset
+bytes from the start of the buffer;
+.I flag
+should be
+.I B_WRITE
+(which is 0) in the write case.
+Caution:
+the caller is responsible for making sure
+the count is not too large and is non-zero.
+As an efficiency note,
+.I iomove
+is much slower if any of
+.I "buffer+offset, count"
+or
+.I u.u_base
+is odd.
+.PP
+The device's
+.I read
+routine is called under conditions similar to
+.I write,
+except that
+.I u.u_count
+is guaranteed to be non-zero.
+To return characters to the user, the routine
+.I "passc(c)"
+is available; it takes care of housekeeping
+like
+.I cpass
+and returns \(mi1 as the last character
+specified by
+.I u.u_count
+is returned to the user;
+before that time, 0 is returned.
+.I Iomove
+is also usable as with
+.I write;
+the flag should be
+.I B_READ
+but the same cautions apply.
+.PP
+The ``special-functions'' routine
+is invoked by the
+.I stty
+and
+.I gtty
+system calls as follows:
+.I "(*p) (dev, v)"
+where
+.I p
+is a pointer to the device's routine,
+.I dev
+is the device number,
+and
+.I v
+is a vector.
+In the
+.I gtty
+case,
+the device is supposed to place up to 3 words of status information
+into the vector; this will be returned to the caller.
+In the
+.I stty
+case,
+.I v
+is 0;
+the device should take up to 3 words of
+control information from
+the array
+.I "u.u_arg[0...2]."
+.PP
+Finally, each device should have appropriate interrupt-time
+routines.
+When an interrupt occurs, it is turned into a C-compatible call
+on the devices's interrupt routine.
+The interrupt-catching mechanism makes
+the low-order four bits of the ``new PS'' word in the
+trap vector for the interrupt available
+to the interrupt handler.
+This is conventionally used by drivers
+which deal with multiple similar devices
+to encode the minor device number.
+After the interrupt has been processed,
+a return from the interrupt handler will
+return from the interrupt itself.
+.PP
+A number of subroutines are available which are useful
+to character device drivers.
+Most of these handlers, for example, need a place
+to buffer characters in the internal interface
+between their ``top half'' (read/write)
+and ``bottom half'' (interrupt) routines.
+For relatively low data-rate devices, the best mechanism
+is the character queue maintained by the
+routines
+.I getc
+and
+.I putc.
+A queue header has the structure
+.DS
+struct {
+ int c_cc; /* character count */
+ char *c_cf; /* first character */
+ char *c_cl; /* last character */
+} queue;
+.DE
+A character is placed on the end of a queue by
+.I "putc(c, &queue)"
+where
+.I c
+is the character and
+.I queue
+is the queue header.
+The routine returns \(mi1 if there is no space
+to put the character, 0 otherwise.
+The first character on the queue may be retrieved
+by
+.I "getc(&queue)"
+which returns either the (non-negative) character
+or \(mi1 if the queue is empty.
+.PP
+Notice that the space for characters in queues is
+shared among all devices in the system
+and in the standard system there are only some 600
+character slots available.
+Thus device handlers,
+especially write routines, must take
+care to avoid gobbling up excessive numbers of characters.
+.PP
+The other major help available
+to device handlers is the sleep-wakeup mechanism.
+The call
+.I "sleep(event, priority)"
+causes the process to wait (allowing other processes to run)
+until the
+.I event
+occurs;
+at that time, the process is marked ready-to-run
+and the call will return when there is no
+process with higher
+.I priority.
+.PP
+The call
+.I "wakeup(event)"
+indicates that the
+.I event
+has happened, that is, causes processes sleeping
+on the event to be awakened.
+The
+.I event
+is an arbitrary quantity agreed upon
+by the sleeper and the waker-up.
+By convention, it is the address of some data area used
+by the driver, which guarantees that events
+are unique.
+.PP
+Processes sleeping on an event should not assume
+that the event has really happened;
+they should check that the conditions which
+caused them to sleep no longer hold.
+.PP
+Priorities can range from 0 to 127;
+a higher numerical value indicates a less-favored
+scheduling situation.
+A distinction is made between processes sleeping
+at priority less than the parameter
+.I PZERO
+and those at numerically larger priorities.
+The former cannot
+be interrupted by signals, although it
+is conceivable that it may be swapped out.
+Thus it is a bad idea to sleep with
+priority less than PZERO on an event which might never occur.
+On the other hand, calls to
+.I sleep
+with larger priority
+may never return if the process is terminated by
+some signal in the meantime.
+Incidentally, it is a gross error to call
+.I sleep
+in a routine called at interrupt time, since the process
+which is running is almost certainly not the
+process which should go to sleep.
+Likewise, none of the variables in the user area
+``\fIu\fB.\fR''
+should be touched, let alone changed, by an interrupt routine.
+.PP
+If a device driver
+wishes to wait for some event for which it is inconvenient
+or impossible to supply a
+.I wakeup,
+(for example, a device going on-line, which does not
+generally cause an interrupt),
+the call
+.I "sleep(&lbolt, priority)
+may be given.
+.I Lbolt
+is an external cell whose address is awakened once every 4 seconds
+by the clock interrupt routine.
+.PP
+The routines
+.I "spl4( ), spl5( ), spl6( ), spl7( )"
+are available to
+set the processor priority level as indicated to avoid
+inconvenient interrupts from the device.
+.PP
+If a device needs to know about real-time intervals,
+then
+.I "timeout(func, arg, interval)
+will be useful.
+This routine arranges that after
+.I interval
+sixtieths of a second, the
+.I func
+will be called with
+.I arg
+as argument, in the style
+.I "(*func)(arg).
+Timeouts are used, for example,
+to provide real-time delays after function characters
+like new-line and tab in typewriter output,
+and to terminate an attempt to
+read the 201 Dataphone
+.I dp
+if there is no response within a specified number
+of seconds.
+Notice that the number of sixtieths of a second is limited to 32767,
+since it must appear to be positive,
+and that only a bounded number of timeouts
+can be going on at once.
+Also, the specified
+.I func
+is called at clock-interrupt time, so it should
+conform to the requirements of interrupt routines
+in general.
+.SH
+The Block-device Interface
+.PP
+Handling of block devices is mediated by a collection
+of routines that manage a set of buffers containing
+the images of blocks of data on the various devices.
+The most important purpose of these routines is to assure
+that several processes that access the same block of the same
+device in multiprogrammed fashion maintain a consistent
+view of the data in the block.
+A secondary but still important purpose is to increase
+the efficiency of the system by
+keeping in-core copies of blocks that are being
+accessed frequently.
+The main data base for this mechanism is the
+table of buffers
+.I buf.
+Each buffer header contains a pair of pointers
+.I "(b_forw, b_back)"
+which maintain a doubly-linked list
+of the buffers associated with a particular
+block device, and a
+pair of pointers
+.I "(av_forw, av_back)"
+which generally maintain a doubly-linked list of blocks
+which are ``free,'' that is,
+eligible to be reallocated for another transaction.
+Buffers that have I/O in progress
+or are busy for other purposes do not appear in this list.
+The buffer header
+also contains the device and block number to which the
+buffer refers, and a pointer to the actual storage associated with
+the buffer.
+There is a word count
+which is the negative of the number of words
+to be transferred to or from the buffer;
+there is also an error byte and a residual word
+count used to communicate information
+from an I/O routine to its caller.
+Finally, there is a flag word
+with bits indicating the status of the buffer.
+These flags will be discussed below.
+.PP
+Seven routines constitute
+the most important part of the interface with the
+rest of the system.
+Given a device and block number,
+both
+.I bread
+and
+.I getblk
+return a pointer to a buffer header for the block;
+the difference is that
+.I bread
+is guaranteed to return a buffer actually containing the
+current data for the block,
+while
+.I getblk
+returns a buffer which contains the data in the
+block only if it is already in core (whether it is
+or not is indicated by the
+.I B_DONE
+bit; see below).
+In either case the buffer, and the corresponding
+device block, is made ``busy,''
+so that other processes referring to it
+are obliged to wait until it becomes free.
+.I Getblk
+is used, for example,
+when a block is about to be totally rewritten,
+so that its previous contents are
+not useful;
+still, no other process can be allowed to refer to the block
+until the new data is placed into it.
+.PP
+The
+.I breada
+routine is used to implement read-ahead.
+it is logically similar to
+.I bread,
+but takes as an additional argument the number of
+a block (on the same device) to be read asynchronously
+after the specifically requested block is available.
+.PP
+Given a pointer to a buffer,
+the
+.I brelse
+routine
+makes the buffer again available to other processes.
+It is called, for example, after
+data has been extracted following a
+.I bread.
+There are three subtly-different write routines,
+all of which take a buffer pointer as argument,
+and all of which logically release the buffer for
+use by others and place it on the free list.
+.I Bwrite
+puts the
+buffer on the appropriate device queue,
+waits for the write to be done,
+and sets the user's error flag if required.
+.I Bawrite
+places the buffer on the device's queue, but does not wait
+for completion, so that errors cannot be reflected directly to
+the user.
+.I Bdwrite
+does not start any I/O operation at all,
+but merely marks
+the buffer so that if it happens
+to be grabbed from the free list to contain
+data from some other block, the data in it will
+first be written
+out.
+.PP
+.I Bwrite
+is used when one wants to be sure that
+I/O takes place correctly, and that
+errors are reflected to the proper user;
+it is used, for example, when updating i-nodes.
+.I Bawrite
+is useful when more overlap is desired
+(because no wait is required for I/O to finish)
+but when it is reasonably certain that the
+write is really required.
+.I Bdwrite
+is used when there is doubt that the write is
+needed at the moment.
+For example,
+.I bdwrite
+is called when the last byte of a
+.I write
+system call falls short of the end of a
+block, on the assumption that
+another
+.I write
+will be given soon which will re-use the same block.
+On the other hand,
+as the end of a block is passed,
+.I bawrite
+is called, since probably the block will
+not be accessed again soon and one might as
+well start the writing process as soon as possible.
+.PP
+In any event, notice that the routines
+.I "getblk"
+and
+.I bread
+dedicate the given block exclusively to the
+use of the caller, and make others wait,
+while one of
+.I "brelse, bwrite, bawrite,"
+or
+.I bdwrite
+must eventually be called to free the block for use by others.
+.PP
+As mentioned, each buffer header contains a flag
+word which indicates the status of the buffer.
+Since they provide
+one important channel for information between the drivers and the
+block I/O system, it is important to understand these flags.
+The following names are manifest constants which
+select the associated flag bits.
+.IP B_READ 10
+This bit is set when the buffer is handed to the device strategy routine
+(see below) to indicate a read operation.
+The symbol
+.I B_WRITE
+is defined as 0 and does not define a flag; it is provided
+as a mnemonic convenience to callers of routines like
+.I swap
+which have a separate argument
+which indicates read or write.
+.IP B_DONE 10
+This bit is set
+to 0 when a block is handed to the the device strategy
+routine and is turned on when the operation completes,
+whether normally as the result of an error.
+It is also used as part of the return argument of
+.I getblk
+to indicate if 1 that the returned
+buffer actually contains the data in the requested block.
+.IP B_ERROR 10
+This bit may be set to 1 when
+.I B_DONE
+is set to indicate that an I/O or other error occurred.
+If it is set the
+.I b_error
+byte of the buffer header may contain an error code
+if it is non-zero.
+If
+.I b_error
+is 0 the nature of the error is not specified.
+Actually no driver at present sets
+.I b_error;
+the latter is provided for a future improvement
+whereby a more detailed error-reporting
+scheme may be implemented.
+.IP B_BUSY 10
+This bit indicates that the buffer header is not on
+the free list, i.e. is
+dedicated to someone's exclusive use.
+The buffer still remains attached to the list of
+blocks associated with its device, however.
+When
+.I getblk
+(or
+.I bread,
+which calls it) searches the buffer list
+for a given device and finds the requested
+block with this bit on, it sleeps until the bit
+clears.
+.IP B_PHYS 10
+This bit is set for raw I/O transactions that
+need to allocate the Unibus map on an 11/70.
+.IP B_MAP 10
+This bit is set on buffers that have the Unibus map allocated,
+so that the
+.I iodone
+routine knows to deallocate the map.
+.IP B_WANTED 10
+This flag is used in conjunction with the
+.I B_BUSY
+bit.
+Before sleeping as described
+just above,
+.I getblk
+sets this flag.
+Conversely, when the block is freed and the busy bit
+goes down (in
+.I brelse)
+a
+.I wakeup
+is given for the block header whenever
+.I B_WANTED
+is on.
+This strategem avoids the overhead
+of having to call
+.I wakeup
+every time a buffer is freed on the chance that someone
+might want it.
+.IP B_AGE
+This bit may be set on buffers just before releasing them; if it
+is on,
+the buffer is placed at the head of the free list, rather than at the
+tail.
+It is a performance heuristic
+used when the caller judges that the same block will not soon be used again.
+.IP B_ASYNC 10
+This bit is set by
+.I bawrite
+to indicate to the appropriate device driver
+that the buffer should be released when the
+write has been finished, usually at interrupt time.
+The difference between
+.I bwrite
+and
+.I bawrite
+is that the former starts I/O, waits until it is done, and
+frees the buffer.
+The latter merely sets this bit and starts I/O.
+The bit indicates that
+.I relse
+should be called for the buffer on completion.
+.IP B_DELWRI 10
+This bit is set by
+.I bdwrite
+before releasing the buffer.
+When
+.I getblk,
+while searching for a free block,
+discovers the bit is 1 in a buffer it would otherwise grab,
+it causes the block to be written out before reusing it.
+.SH
+Block Device Drivers
+.PP
+The
+.I bdevsw
+table contains the names of the interface routines
+and that of a table for each block device.
+.PP
+Just as for character devices, block device drivers may supply
+an
+.I open
+and a
+.I close
+routine
+called respectively on each open and on the final close
+of the device.
+Instead of separate read and write routines,
+each block device driver has a
+.I strategy
+routine which is called with a pointer to a buffer
+header as argument.
+As discussed, the buffer header contains
+a read/write flag, the core address,
+the block number, a (negative) word count,
+and the major and minor device number.
+The role of the strategy routine
+is to carry out the operation as requested by the
+information in the buffer header.
+When the transaction is complete the
+.I B_DONE
+(and possibly the
+.I B_ERROR)
+bits should be set.
+Then if the
+.I B_ASYNC
+bit is set,
+.I brelse
+should be called;
+otherwise,
+.I wakeup.
+In cases where the device
+is capable, under error-free operation,
+of transferring fewer words than requested,
+the device's word-count register should be placed
+in the residual count slot of
+the buffer header;
+otherwise, the residual count should be set to 0.
+This particular mechanism is really for the benefit
+of the magtape driver;
+when reading this device
+records shorter than requested are quite normal,
+and the user should be told the actual length of the record.
+.PP
+Although the most usual argument
+to the strategy routines
+is a genuine buffer header allocated as discussed above,
+all that is actually required
+is that the argument be a pointer to a place containing the
+appropriate information.
+For example the
+.I swap
+routine, which manages movement
+of core images to and from the swapping device,
+uses the strategy routine
+for this device.
+Care has to be taken that
+no extraneous bits get turned on in the
+flag word.
+.PP
+The device's table specified by
+.I bdevsw
+has a
+byte to contain an active flag and an error count,
+a pair of links which constitute the
+head of the chain of buffers for the device
+.I "(b_forw, b_back),"
+and a first and last pointer for a device queue.
+Of these things, all are used solely by the device driver
+itself
+except for the buffer-chain pointers.
+Typically the flag encodes the state of the
+device, and is used at a minimum to
+indicate that the device is currently engaged in
+transferring information and no new command should be issued.
+The error count is useful for counting retries
+when errors occur.
+The device queue is used to remember stacked requests;
+in the simplest case it may be maintained as a first-in
+first-out list.
+Since buffers which have been handed over to
+the strategy routines are never
+on the list of free buffers,
+the pointers in the buffer which maintain the free list
+.I "(av_forw, av_back)"
+are also used to contain the pointers
+which maintain the device queues.
+.PP
+A couple of routines
+are provided which are useful to block device drivers.
+.I "iodone(bp)"
+arranges that the buffer to which
+.I bp
+points be released or awakened,
+as appropriate,
+when the
+strategy module has finished with the buffer,
+either normally or after an error.
+(In the latter case the
+.I B_ERROR
+bit has presumably been set.)
+.PP
+The routine
+.I "geterror(bp)"
+can be used to examine the error bit in a buffer header
+and arrange that any error indication found therein is
+reflected to the user.
+It may be called only in the non-interrupt
+part of a driver when I/O has completed
+.I (B_DONE
+has been set).
+.SH
+Raw Block-device I/O
+.PP
+A scheme has been set up whereby block device drivers may
+provide the ability to transfer information
+directly between the user's core image and the device
+without the use of buffers and in blocks as large as
+the caller requests.
+The method involves setting up a character-type special file
+corresponding to the raw device
+and providing
+.I read
+and
+.I write
+routines which set up what is usually a private,
+non-shared buffer header with the appropriate information
+and call the device's strategy routine.
+If desired, separate
+.I open
+and
+.I close
+routines may be provided but this is usually unnecessary.
+A special-function routine might come in handy, especially for
+magtape.
+.PP
+A great deal of work has to be done to generate the
+``appropriate information''
+to put in the argument buffer for
+the strategy module;
+the worst part is to map relocated user addresses to physical addresses.
+Most of this work is done by
+.I "physio(strat, bp, dev, rw)
+whose arguments are the name of the
+strategy routine
+.I strat,
+the buffer pointer
+.I bp,
+the device number
+.I dev,
+and a read-write flag
+.I rw
+whose value is either
+.I B_READ
+or
+.I B_WRITE.
+.I Physio
+makes sure that the user's base address and count are
+even (because most devices work in words)
+and that the core area affected is contiguous
+in physical space;
+it delays until the buffer is not busy, and makes it
+busy while the operation is in progress;
+and it sets up user error return information.
OpenPOWER on IntegriCloud