Diffstat (limited to 'sys/geom/sched/README')
 sys/geom/sched/README | 162 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 162 insertions(+), 0 deletions(-)
diff --git a/sys/geom/sched/README b/sys/geom/sched/README
new file mode 100644
index 0000000..1b52d90
--- /dev/null
+++ b/sys/geom/sched/README
@@ -0,0 +1,162 @@
+
+ --- GEOM BASED DISK SCHEDULERS FOR FREEBSD ---
+
+This code contains a framework for GEOM-based disk schedulers and a
+couple of sample scheduling algorithms that use the framework and
+implement two forms of "anticipatory scheduling" (see below for more
+details).
+
+As a quick example of what this code can give you, try running "dd",
+"tar", or some other program with a highly SEQUENTIAL access pattern
+together with "cvs", "cvsup", "svn" or other programs with highly
+RANDOM access patterns (this is not a made-up example: it is pretty
+common for developers to have one or more apps doing random accesses
+while others do sequential ones, e.g. loading large binaries from
+disk, checking the integrity of tarballs, watching media streams,
+and so on).
+
+These are the results we get on a local machine (AMD BE2400 dual
+core CPU, SATA 250GB disk):
+
+ /mnt is a partition mounted on /dev/ad0s1f
+
+   cvs:       cvs -d /mnt/home/ncvs-local update -Pd /mnt/ports
+   dd-read:   dd bs=128k of=/dev/null if=/dev/ad0 (or ad0.sched.)
+   dd-write:  dd bs=128k if=/dev/zero of=/mnt/largefile
+
+                      NO SCHEDULER            RR SCHEDULER
+                      dd       cvs            dd       cvs
+
+   dd-read only    72 MB/s    ---          72 MB/s    ---
+   dd-write only   55 MB/s    ---          55 MB/s    ---
+   dd-read+cvs      6 MB/s    ok           30 MB/s    ok
+   dd-write+cvs    55 MB/s    slooow       14 MB/s    ok
+
+As you can see, when cvs runs concurrently with dd the performance
+drops dramatically, and depending on whether dd is reading or
+writing, one of the two processes is severely penalized.  Using the
+RR scheduler in this example makes the dd reader go much faster
+when competing with cvs, and lets cvs make progress when competing
+with a writer.
+
+To try it out:
+
+1. USERS OF FREEBSD 7, PLEASE READ THE FOLLOWING CAREFULLY:
+
+ On loading, this module patches one kernel function (g_io_request())
+ so that I/O requests ("bio's") carry a classification tag, useful
+ for scheduling purposes.
+
+   ON FREEBSD 7, the tag is stored in an existing (though rarely
+   used) field of "struct bio", a solution which makes this module
+   incompatible with other modules that use the same field, such as
+   ZFS and gjournal.  Additionally, g_io_request() is patched
+   in-memory to add a call to the function that initializes this
+   field (i386/amd64 only; on other architectures you need to patch
+   sys/geom/geom_io.c manually).  See the details in g_sched.c.
+
+   On FreeBSD 8.0 and above, this trick is not necessary: "struct
+   bio" contains dedicated fields for the classifier tag, and GEOM
+   provides hooks for request classifiers.
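+
+   As a minimal sketch of that FreeBSD 8 mechanism (not the actual
+   g_sched.c code, which you should read for the real thing; field
+   and hook names as in sys/sys/bio.h and sys/geom/geom.h), a
+   classifier is just a hook that tags each bio on its way into the
+   geom chain:
+
+	#include <sys/param.h>
+	#include <sys/bio.h>
+	#include <sys/proc.h>
+	#include <geom/geom.h>
+
+	/*
+	 * Tag each request with the issuing thread, so the scheduler
+	 * can tell the "flows" (e.g. dd vs. cvs) apart.
+	 */
+	static int
+	example_classify(void *arg, struct bio *bp)
+	{
+
+		if (bp->bio_classifier1 == NULL)
+			bp->bio_classifier1 = curthread;
+		return (1);	/* nonzero: this bio has been classified */
+	}
+
+	static struct g_classifier_hook example_hook = {
+		.func = example_classify,
+	};
+
+	/* Register on module load, unregister on unload: */
+	/*   g_register_classifier(&example_hook);   */
+	/*   g_unregister_classifier(&example_hook); */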
+
+ If you don't like the above, don't run this code.
+
+2. PLEASE MAKE SURE THAT THE DISK THAT YOU WILL BE USING FOR TESTS
+ DOES NOT CONTAIN PRECIOUS DATA.
+ This is experimental code, so we make no guarantees, though
+ I am routinely using it on my desktop and laptop.
+
+3. EXTRACT AND BUILD THE PROGRAMS
+   A 'make install' in the directory should work (with root
+   privileges), or you can also try the prebuilt binary modules.
+   If you want to build the modules yourself, look at the Makefile.
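+
+   For example, from the directory containing the sources:
+
+	# make
+	# make install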
+
+4. LOAD THE MODULE, CREATE A GEOM NODE, RUN TESTS
+
+ The scheduler's module must be loaded first:
+
+ # kldload gsched_rr
+
+   (substitute gsched_rr with gsched_as to test the AS scheduler).
+   Then, assuming you are using /dev/ad0 for testing, a scheduler
+   can be attached to it with:
+
+ # geom sched insert ad0
+
+ The scheduler is inserted transparently in the geom chain, so
+ mounted partitions and filesystems will keep working, but
+ now requests will go through the scheduler.
+
+ To change scheduler on-the-fly, you can reconfigure the geom:
+
+ # geom sched configure -a as ad0.sched.
+
+ assuming that gsched_as was loaded previously.
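+
+   For example, after an "insert" on ad0 the original device name
+   keeps working but is now scheduled, so you can re-run the dd
+   test from above and watch the requests flow through the new
+   node with gstat(8):
+
+	# dd bs=128k if=/dev/ad0 of=/dev/null &
+	# gstat -f ad0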
+
+5. SCHEDULER REMOVAL
+
+ In principle it is possible to remove the scheduler module
+ even on an active chain by doing
+
+ # geom sched destroy ad0.sched.
+
+   However, there is a race in the geom subsystem which makes the
+   removal unsafe while there are active requests on the chain.
+   So, to reduce the risk of data loss, make sure you do not remove
+   a scheduler from a chain with ongoing transactions.
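+
+   For example, quiesce the disk before destroying the node:
+
+	# umount /mnt
+	# geom sched destroy ad0.sched.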
+
+--- NOTES ON THE SCHEDULERS ---
+
+The important contribution of this code is the framework to experiment
+with different scheduling algorithms. 'Anticipatory scheduling'
+is a very powerful technique based on the following reasoning:
+
+    Disk throughput is much higher when serving sequential requests.
+    If we have a mix of sequential and random requests, and we see
+    a non-sequential one, do not serve it immediately; instead, wait
+    a little bit (2..5 ms) to see if another request arrives that
+    the disk can serve more efficiently.
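+
+In pseudo-C, the core of that decision can be sketched as follows
+(illustration only: the as_softc fields and function names below
+are made up for this note, this is not the actual gsched_as code):
+
+	#include <sys/param.h>
+	#include <sys/kernel.h>		/* hz */
+	#include <sys/bio.h>
+	#include <sys/callout.h>
+
+	struct as_softc {		/* hypothetical per-device state */
+		struct bio_queue_head sc_queue;	/* pending requests */
+		struct callout	sc_callout;	/* anticipation timer */
+		off_t		sc_last_end;	/* end of last dispatch */
+		int		sc_waiting;	/* are we idling? */
+	};
+
+	/*
+	 * Timer handler (not shown): calls as_next_request(sc, 1)
+	 * and pushes the returned bio down the geom chain.
+	 */
+	static void as_timeout(void *arg);
+
+	/* Called whenever the device can accept a new request. */
+	static struct bio *
+	as_next_request(struct as_softc *sc, int timed_out)
+	{
+		struct bio *bp = bioq_first(&sc->sc_queue);
+
+		if (bp == NULL)
+			return (NULL);
+		if (!timed_out && bp->bio_offset != sc->sc_last_end) {
+			/*
+			 * Non-sequential request: instead of serving
+			 * it, idle a few ms hoping that a request
+			 * contiguous to the last one arrives first.
+			 */
+			if (!sc->sc_waiting) {
+				sc->sc_waiting = 1;
+				callout_reset(&sc->sc_callout,
+				    4 * hz / 1000,  /* ~4 ms, hz=1000 */
+				    as_timeout, sc);
+			}
+			return (NULL);
+		}
+		/* Sequential request, or the wait expired: serve it. */
+		callout_stop(&sc->sc_callout);
+		sc->sc_waiting = 0;
+		bp = bioq_takefirst(&sc->sc_queue);
+		sc->sc_last_end = bp->bio_offset + bp->bio_length;
+		return (bp);
+	}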
+
+There are many details that must be handled to make the mechanism
+effective across different workloads and systems, to gain a few
+extra percent in performance, and to improve fairness and isolation
+among processes.  A discussion of the vast literature on the subject
+is beyond the scope of this short note.
+
+--------------------------------------------------------------------------
+
+TRANSPARENT INSERT/DELETE
+
+geom_sched is an ordinary geom module; however, it is convenient
+to plug it transparently into the geom graph, so that one can
+enable or disable scheduling on a mounted filesystem and the
+names in /etc/fstab do not depend on the presence of the scheduler.
+
+To understand how this works in practice, remember that GEOM has
+"provider" and "geom" objects.
+Say that we want to hook a scheduler onto provider "ad0",
+accessible through pointer 'pp'.  Originally, pp is attached to
+geom "ad0" (same name, different object), accessible through
+pointer old_gp:
+
+ BEFORE ---> [ pp --> old_gp ...]
+
+A normal "geom sched create ad0" call would create a new geom node
+on top of provider ad0/pp, and export a newly created provider
+("ad0.sched." accessible through pointer newpp).
+
+ AFTER create ---> [ newpp --> gp --> cp ] ---> [ pp --> old_gp ... ]
+
+On top of newpp, a whole tree is created automatically, so we can
+e.g. mount partitions on /dev/ad0.sched.s1d.  Requests to those
+partitions go through the scheduler, whereas partitions mounted on
+the pre-existing device entries bypass it.
+
+With the transparent insert mechanism, the original provider "ad0"/pp
+is hooked to the newly created geom, as follows:
+
+ AFTER insert ---> [ pp --> gp --> cp ] ---> [ newpp --> old_gp ... ]
+
+so anything that was previously using provider pp now has its
+requests routed through the scheduler node.
+
+A removal ("geom sched destroy ad0.sched.") will restore the original
+configuration.
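+
+In terms of commands (using the names from the example above):
+
+	# geom sched create ad0
+	# mount /dev/ad0.sched.s1d /mnt	  (new name, goes through the scheduler)
+
+versus, with the transparent insert:
+
+	# geom sched insert ad0
+	# mount /dev/ad0s1d /mnt	  (old name, now scheduled as well)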
+
+# $FreeBSD$