summaryrefslogtreecommitdiffstats
path: root/block/cfq-iosched.c
Commit message (Collapse)AuthorAgeFilesLines
* cfq-iosched: fix alias + front merge bugJens Axboe2007-04-251-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a really rare and obscure bug in CFQ, that causes a crash in cfq_dispatch_insert() due to rq == NULL. One example of the resulting oops is seen here: http://lkml.org/lkml/2007/4/15/41 Neil correctly diagnosed the situation for how this can happen: if two concurrent requests with the exact same sector number (due to direct IO or aliasing between MD and the raw device access), the alias handling will add the request to the sortlist, but next_rq remains NULL. Read the more complete analysis at: http://lkml.org/lkml/2007/4/25/57 This looks like it requires md to trigger, even though it should potentially be possible to due with O_DIRECT (at least if you edit the kernel and doctor some of the unplug calls). The fix is to move the ->next_rq update to when we add a request to the rbtree. Then we remove the possibility for a request to exist in the rbtree code, but not have ->next_rq correctly updated. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cfq-iosched: fix sequential write regressionJens Axboe2007-04-201-15/+19
| | | | | | | | | | | | | | | We have a 10-15% performance regression for sequential writes on TCQ/NCQ enabled drives in 2.6.21-rcX after the CFQ update went in. It has been reported by Valerie Clement <valerie.clement@bull.net> and the Intel testing folks. The regression is because of CFQ's now more aggressive queue control, limiting the depth available to the device. This patches fixes that regression by allowing a greater depth when only one queue is busy. It has been tested to not impact sync-vs-async workloads too much - we still do a lot better than 2.6.20. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cfq-iosched: improve continue or break logic in cfq_dispatchJens Axboe2007-02-111-8/+8
| | | | | | | This improves performance considerably for sync requests when you have command queuing enabled. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: remove the implicit queue kicking in slice expireJens Axboe2007-02-111-6/+6
| | | | | | | We only really need it for a process going away, so move it to those locations. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: check whether a queue timed out in accountingJens Axboe2007-02-111-14/+18
| | | | | | Makes it more fair for the residual slice count. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: tweak the FIFO checkingJens Axboe2007-02-111-3/+4
| | | | | | | | We currently check the FIFO once per slice. Optimize that a bit and only do it as the first thing for a new slice, so we don't end up doing a single request and then seek to the FIFO requests. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: don't pass in queue for cfq_arm_slice_timer()Jens Axboe2007-02-111-5/+4
| | | | | | | It must always be the active queue, otherwise it's a bug. So just use the active_queue, don't pass it in explicitly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: account for slice over/under timeJens Axboe2007-02-111-20/+12
| | | | | | | | If a slice uses less than it is entitled to (or perhaps more), include that in the decision on how much time to give it the next time it gets serviced. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: defer slice activation to first request being activeJens Axboe2007-02-111-38/+53
| | | | | | | This better matches what time the queue is actually spending doing IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] cfq-iosched: use last service point as the fairness criteriaJens Axboe2007-02-111-14/+34
| | | | | | | | Right now we use slice_start, which gives async queues an unfair advantage. Chance that to service_last, and base the resorter on that. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: document the cfqq flagsJens Axboe2007-02-111-9/+9
| | | | Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] cfq-iosched: move on_rr check into cfq_resort_rr_list()Jens Axboe2007-02-111-10/+9
| | | | | | | Move the on_rr check into cfq_resort_rr_list(), every call site needs to check it anyway. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* cfq-iosched: remove cfq_io_context last_queueJens Axboe2007-02-111-17/+2
| | | | | | | It hasn't been used for a while, kill it off and remove the old if 0 code chunk. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] cfq-iosched: merging problemJens Axboe2007-01-021-3/+3
| | | | | | | | | | | | | | | | | | | Two issues: - The final return 1 should be a return 0, otherwise comparing cfqq is a noop. - bio_sync() only checks the sync flag, while rq_is_sync() checks both for READ and sync. The latter is what we want. Expand the bio check to include reads, and relax the restriction to allow merging of async io into sync requests. In the future we want to clean up the SYNC logic, right now it means both sync request (such as READ and O_DIRECT WRITE) and unplug-on-issue. Leave that for later. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cfq-iosched: tighten allow merge criteriaJens Axboe2006-12-221-13/+8
| | | | | | | | | | | | The logic in cfq_allow_merge() wasn't clear enough - basically allow merging for the same queues only. Do a fast check for 'rq and bio both sync/async' before doing the cfqq hash lookup. This is verified to work with the fixed elv_try_merge() from commit bb4067e34159648d394943d5e2a011f838bff22f. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cfq-iosched: don't allow sync merges across queuesJens Axboe2006-12-201-0/+33
| | | | | | | | | | | | Currently we allow any merge, even if the io originates from different processes. This can cause really bad starvation and unfairness, if those ios happen to be synchronous (reads or direct writes). So add a allow_merge hook to the io scheduler ops, so an io scheduler can help decide whether a bio/process combination may be merged with an existing request. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] Propagate down request sync flagJens Axboe2006-12-131-6/+12
| | | | | | | | We need to do this, otherwise the io schedulers don't get access to the sync flag. Then they cannot tell the difference between a regular write and an O_DIRECT write, which can cause a performance loss. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] slab: remove kmem_cache_tChristoph Lameter2006-12-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge branch 'master' of ↵David Howells2006-12-051-5/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/infiniband/core/iwcm.c drivers/net/chelsio/cxgb2.c drivers/net/wireless/bcm43xx/bcm43xx_main.c drivers/net/wireless/prism54/islpci_eth.c drivers/usb/core/hub.h drivers/usb/input/hid-core.c net/core/netpoll.c Fix up merge failures with Linus's head and fix new compilation failures. Signed-Off-By: David Howells <dhowells@redhat.com>
| * [BLOCK] Cleanup unused variable passingJens Axboe2006-12-011-5/+4
| | | | | | | | | | | | | | | | - ->init_queue() does not need the elevator passed in - ->put_request() is a hot path and need not have the queue passed in - cfq_update_io_seektime() does not need cfqd passed in Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* | WorkStruct: Pass the work_struct pointer instead of context dataDavid Howells2006-11-221-3/+5
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated.. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is *not* cleared before jumping to the work function, then the work function *may* access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: David Howells <dhowells@redhat.com>
* [PATCH] CFQ: request <-> request merging rr_list fixupJens Axboe2006-10-311-3/+3
| | | | | | | | | | | | | | In very rare circumstances would we be pruning a merged request and at the same time delete the implicated cfqq from the rr_list, and not readd it when the merged request got added. This could cause io stalls until that process issued io again. Fix it up by putting the rr_list add handling into cfq_add_rq_rb(), identical to how pruning is handled in cfq_del_rq_rb(). This fixes a hang reproducible with fsx-linux. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] CFQ: bad locking in changed_ioprio()Jens Axboe2006-10-301-2/+3
| | | | | | | | | When the ioprio code recently got juggled a bit, a bug was introduced. changed_ioprio() is no longer called with interrupts disabled, so using plain spin_lock() on the queue_lock is a bug. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] CFQ: use irq safe locking in cfq_cic_link()Jens Axboe2006-10-301-2/+3
| | | | | | | | | | | | | | If cfq_set_request() is called for a new process AND a non-fs io request (so that __GFP_WAIT may not be set), cfq_cic_link() may use spin_lock_irq() and spin_unlock_irq() with interrupts already disabled. Fix is to always use irq safe locking in cfq_cic_link() Acked-By: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] completions: lockdep annotate on stack completionsPeter Zijlstra2006-10-011-1/+1
| | | | | | | | | | | All on stack DECLARE_COMPLETIONs should be replaced by: DECLARE_COMPLETION_ONSTACK Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Update axboe@suse.de email addressJens Axboe2006-09-301-1/+1
| | | | | | | As people often look for the copyright in files to see who to mail, update the link to a neutral one. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* [PATCH] cfq-iosched: use metadata read flagJens Axboe2006-09-301-0/+24
| | | | | | | | Give meta data reads preference over regular reads, as the process often needs to get that out of the way to do the io it was actually interested in. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: improve queue preemptionJens Axboe2006-09-301-6/+8
| | | | | | | Don't touch the current queues, just make sure that the wanted queue is selected next. Simplifies the logic. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] Add blk_start_queueing() helperJens Axboe2006-09-301-18/+4
| | | | | | | | | | CFQ implements this on its own now, but it's really block layer knowledge. Tells a device queue to start dispatching requests to the driver, taking care to unplug if needed. Also fixes the issue where as/cfq will invoke a stopped queue, which we really don't want. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: kill the empty_listJens Axboe2006-09-301-8/+2
| | | | | | | No point in having a place holder list just for empty queues, so remove it. It's not used for anything other than to keep ->cfq_list busy. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: Kill O(N) runtime of cfq_resort_rr_list()Jens Axboe2006-09-301-31/+22
| | | | | | | | | Currently it scales with number of processes in that priority group, which is potentially not very nice as it's called quite often. Basically we always need to do tail inserts, except for the case of a new process. So just mark/detect a queue as such. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] Make sure all block/io scheduler setups are node awareJens Axboe2006-09-301-6/+7
| | | | | | | Some were kmalloc_node(), some were still kmalloc(). Change them all to kmalloc_node(). Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] Audit block layer inlinesJens Axboe2006-09-301-1/+1
| | | | | | | Kill a few inlines that bring in too much code to more than one location Shrinks kernel text by about 300 bytes on 32-bit x86. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: use new io context counting mechanismJens Axboe2006-09-301-5/+7
| | | | | | | | It's ok if the read path is a lot more costly, as long as inc/dec is really cheap. The inc/dec will happen for each created/freed io context, while the reading only happens when a disk queue exits. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: kill cfq_exit_lockJens Axboe2006-09-301-37/+17
| | | | | | | | | | | | | | | | | | | | | cfq_exit_lock is protecting two things now: - The per-ioc rbtree of cfq_io_contexts - The per-cfqd linked list of cfq_io_contexts The per-cfqd linked list can be protected by the queue lock, as it is (by definition) per cfqd as the queue lock is. The per-ioc rbtree is mainly used and updated by the process itself only. The only outside use is the io priority changing. If we move the priority changing to not browsing the rbtree, we can remove any locking from the rbtree updates and lookup completely. Let the sys_ioprio syscall just mark processes as having the iopriority changed and lazily update the private cfq io contexts the next time io is queued, and we can remove this locking as well. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: cleanups, fixes, dead code removalJens Axboe2006-09-301-114/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | A collection of little fixes and cleanups: - We don't use the 'queued' sysfs exported attribute, since the may_queue() logic was rewritten. So kill it. - Remove dead defines. - cfq_set_active_queue() can be rewritten cleaner with else if conditions. - Several places had cfq_exit_cfqq() like logic, abstract that out and use that. - Annotate the cfqq kmem_cache_alloc() so the allocator knows that this is a repeat allocation if it fails with __GFP_WAIT set. Allows the allocator to start freeing some memory, if needed. CFQ already loops for this condition, so might as well pass the hint down. - Remove cfqd->rq_starved logic. It's not needed anymore after we dropped the crq allocation in cfq_set_request(). - Remove uneeded parameter passing. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] Drop useless bio passing in may_queue/set_request APIJens Axboe2006-09-301-3/+2
| | | | | | It's not needed for anything, so kill the bio passing. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: kill crqJens Axboe2006-09-301-144/+95
| | | | | | | | Get rid of the cfq_rq request type. With the added elevator_private2, we have enough room in struct request to get rid of any crq allocation/free for each request. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: remove the crq flag functions/variableJens Axboe2006-09-301-42/+16
| | | | | | | There's just one flag currently (SYNC), and that one can be grabbed from the request. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: convert to using the FIFO elevator definesJens Axboe2006-09-301-2/+1
| | | | Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: migrate to using the elevator rb functionsJens Axboe2006-09-301-123/+41
| | | | | | This removes the rbtree handling from CFQ. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] elevator: move the backmerging logic into the elevator coreJens Axboe2006-09-301-83/+3
| | | | | | | | | | | Right now, every IO scheduler implements its own backmerging (except for noop, which does no merging). That results in duplicated code for essentially the same operation, which is never a good thing. This patch moves the backmerging out of the io schedulers and into the elevator core. We save 1.6kb of text and as a bonus get backmerging for noop as well. Win-win! Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq_cic_link: fix usage of wrong cfq_io_contextOleg Nesterov2006-08-211-1/+1
| | | | | | | | Obviously, cfq_cic_link() shouldn't free a just allocated cfq_io_context? The dead key is from __cic, so drop that. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: don't use a hard jiffies value, translate from msecsJens Axboe2006-07-251-1/+1
| | | | | | | | | | | | The CIC_SEEKY() test really wants to use the minimum of either: - 2 msecs (not jiffies) - or, the pending slice time So code it like that. Signed-off-by: Jens Axboe <axboe@suse.de>
* Remove obsolete #include <linux/config.h>Jörn Engel2006-06-301-1/+0
| | | | | Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
* [PATCH] rbtree: support functions used by the io schedulersJens Axboe2006-06-231-14/+8
| | | | | | | They all duplicate macros to check for empty root and/or node, and clearing a node. So put those in rbtree.h. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: rq update fixesJens Axboe2006-06-231-5/+5
| | | | | | | | | - Remember to set ->last_sector so that the cfq_choose_req() logic works correctly. - Remove redundant call to cfq_choose_req() Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: many performance fixesJens Axboe2006-06-231-40/+76
| | | | | | | | | | | | | | | | | | | | This is a collection of patches that greatly improve CFQ performance in some circumstances. - Change the idling logic to only kick in after a request is done and we are deciding what to do. Before the idling included the request service time, so it was hard to adjust. Now it's true think/idle time. - Take advantage of TCQ/NCQ/queueing for seeky sync workloads, but keep it in control for sync and sequential (or close to) workloads. - Expire queues immediately and move on to other busy queues, if we are not going to idle after the current one finishes. - Don't rearm idle timer if there are no busy queues. Just leave the system idle. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: correctly set ioprio on both targetsJens Axboe2006-06-231-3/+2
| | | | | | | | | | | | | | | | | Patch originally from Vasily Tarasov <vtaras@sw.ru> If you set io-priority of process 1 using sys_ioprio_set system call by another process 2 (like ionice do), then cfq_init_prio_data() function sets priority of process 2 (current) on queue of process 1 and clears the flag, that designates change of ioprio. So the process 1 will work like with priority of process 2. I propose not to call cfq_init_prio_data() on io-priority change, but only mark queue as queue with changed prority. Every time when new request comes cfq-scheduler checks for this flag and atomaticaly changes priority of queue to new value. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] Kill PF_SYNCWRITE flagJens Axboe2006-06-231-3/+1
| | | | | | | | | | | | | | | A process flag to indicate whether we are doing sync io is incredibly ugly. It also causes performance problems when one does a lot of async io and then proceeds to sync it. Part of the io will go out as async, and the other part as sync. This causes a disconnect between the previously submitted io and the synced io. For io schedulers such as CFQ, this will cause us lost merges and suboptimal behaviour in scheduling. Remove PF_SYNCWRITE completely from the fsync/msync paths, and let the O_DIRECT path just directly indicate that the writes are sync by using WRITE_SYNC instead. Signed-off-by: Jens Axboe <axboe@suse.de>
OpenPOWER on IntegriCloud