diff options
Diffstat (limited to 'sys/geom/sched/README')
-rw-r--r-- | sys/geom/sched/README | 162 |
1 files changed, 162 insertions, 0 deletions
diff --git a/sys/geom/sched/README b/sys/geom/sched/README new file mode 100644 index 0000000..1b52d90 --- /dev/null +++ b/sys/geom/sched/README @@ -0,0 +1,162 @@ + + --- GEOM BASED DISK SCHEDULERS FOR FREEBSD --- + +This code contains a framework for GEOM-based disk schedulers and a +couple of sample scheduling algorithms that use the framework and +implement two forms of "anticipatory scheduling" (see below for more +details). + +As a quick example of what this code can give you, try to run "dd", +"tar", or some other program with highly SEQUENTIAL access patterns, +together with "cvs", "cvsup", "svn" or other highly RANDOM access patterns +(this is not a made-up example: it is pretty common for developers +to have one or more apps doing random accesses, and others that do +sequential accesses e.g., loading large binaries from disk, checking +the integrity of tarballs, watching media streams and so on). + +These are the results we get on a local machine (AMD BE2400 dual +core CPU, SATA 250GB disk): + + /mnt is a partition mounted on /dev/ad0s1f + + cvs: cvs -d /mnt/home/ncvs-local update -Pd /mnt/ports + dd-read: dd bs=128k of=/dev/null if=/dev/ad0 (or ad0-sched-) + dd-writew dd bs=128k if=/dev/zero of=/mnt/largefile + + NO SCHEDULER RR SCHEDULER + dd cvs dd cvs + + dd-read only 72 MB/s ---- 72 MB/s --- + dd-write only 55 MB/s --- 55 MB/s --- + dd-read+cvs 6 MB/s ok 30 MB/s ok + dd-write+cvs 55 MB/s slooow 14 MB/s ok + +As you can see, when a cvs is running concurrently with dd, the +performance drops dramatically, and depending on read or write mode, +one of the two is severely penalized. The use of the RR scheduler +in this example makes the dd-reader go much faster when competing +with cvs, and lets cvs progress when competing with a writer. + +To try it out: + +1. USERS OF FREEBSD 7, PLEASE READ CAREFULLY THE FOLLOWING: + + On loading, this module patches one kernel function (g_io_request()) + so that I/O requests ("bio's") carry a classification tag, useful + for scheduling purposes. + + ON FREEBSD 7, the tag is stored in an existing (though rarely used) + field of the "struct bio", a solution which makes this module + incompatible with other modules using it, such as ZFS and gjournal. + Additionally, g_io_request() is patched in-memory to add a call + to the function that initializes this field (i386/amd64 only; + for other architectures you need to manually patch sys/geom/geom_io.c). + See details in the file g_sched.c. + + On FreeBSD 8.0 and above, the above trick is not necessary, + as the struct bio contains dedicated fields for the classifier, + and hooks for request classifiers. + + If you don't like the above, don't run this code. + +2. PLEASE MAKE SURE THAT THE DISK THAT YOU WILL BE USING FOR TESTS + DOES NOT CONTAIN PRECIOUS DATA. + This is experimental code, so we make no guarantees, though + I am routinely using it on my desktop and laptop. + +3. EXTRACT AND BUILD THE PROGRAMS + A 'make install' in the directory should work (with root privs), + or you can even try the binary modules. + If you want to build the modules yourself, look at the Makefile. + +4. LOAD THE MODULE, CREATE A GEOM NODE, RUN TESTS + + The scheduler's module must be loaded first: + + # kldload gsched_rr + + substitute with gsched_as to test AS. Then, supposing that you are + using /dev/ad0 for testing, a scheduler can be attached to it with: + + # geom sched insert ad0 + + The scheduler is inserted transparently in the geom chain, so + mounted partitions and filesystems will keep working, but + now requests will go through the scheduler. + + To change scheduler on-the-fly, you can reconfigure the geom: + + # geom sched configure -a as ad0.sched. + + assuming that gsched_as was loaded previously. + +5. SCHEDULER REMOVAL + + In principle it is possible to remove the scheduler module + even on an active chain by doing + + # geom sched destroy ad0.sched. + + However, there is some race in the geom subsystem which makes + the removal unsafe if there are active requests on a chain. + So, in order to reduce the risk of data losses, make sure + you don't remove a scheduler from a chain with ongoing transactions. + +--- NOTES ON THE SCHEDULERS --- + +The important contribution of this code is the framework to experiment +with different scheduling algorithms. 'Anticipatory scheduling' +is a very powerful technique based on the following reasoning: + + The disk throughput is much better if it serves sequential requests. + If we have a mix of sequential and random requests, and we see a + non-sequential request, do not serve it immediately but instead wait + a little bit (2..5ms) to see if there is another one coming that + the disk can serve more efficiently. + +There are many details that should be added to make sure that the +mechanism is effective with different workloads and systems, to +gain a few extra percent in performance, to improve fairness, +insulation among processes etc. A discussion of the vast literature +on the subject is beyond the purpose of this short note. + +-------------------------------------------------------------------------- + +TRANSPARENT INSERT/DELETE + +geom_sched is an ordinary geom module, however it is convenient +to plug it transparently into the geom graph, so that one can +enable or disable scheduling on a mounted filesystem, and the +names in /etc/fstab do not depend on the presence of the scheduler. + +To understand how this works in practice, remember that in GEOM +we have "providers" and "geom" objects. +Say that we want to hook a scheduler on provider "ad0", +accessible through pointer 'pp'. Originally, pp is attached to +geom "ad0" (same name, different object) accessible through pointer old_gp + + BEFORE ---> [ pp --> old_gp ...] + +A normal "geom sched create ad0" call would create a new geom node +on top of provider ad0/pp, and export a newly created provider +("ad0.sched." accessible through pointer newpp). + + AFTER create ---> [ newpp --> gp --> cp ] ---> [ pp --> old_gp ... ] + +On top of newpp, a whole tree will be created automatically, and we +can e.g. mount partitions on /dev/ad0.sched.s1d, and those requests +will go through the scheduler, whereas any partition mounted on +the pre-existing device entries will not go through the scheduler. + +With the transparent insert mechanism, the original provider "ad0"/pp +is hooked to the newly created geom, as follows: + + AFTER insert ---> [ pp --> gp --> cp ] ---> [ newpp --> old_gp ... ] + +so anything that was previously using provider pp will now have +the requests routed through the scheduler node. + +A removal ("geom sched destroy ad0.sched.") will restore the original +configuration. + +# $FreeBSD$ |