Commit message (Author, Date; files changed, lines -/+)
* fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away (Jens Axboe, 2011-03-17; 3 files, -3/+5)
    We don't have proper reference counting for this yet, so we run into cases where the device is pulled and we OOPS on flushing the fs data. This happens even though the dirty inodes have already been migrated to the default_backing_dev_info.
    Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
    Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

* block: Require subsystems to explicitly allocate bio_set integrity mempool (Martin K. Petersen, 2011-03-17; 11 files, -22/+41)
    MD and DM create a new bio_set for every metadevice. Each bio_set has an integrity mempool attached regardless of whether the metadevice is capable of passing integrity metadata. This is a waste of memory. Instead we defer the allocation decision to MD and DM since we know at metadevice creation time whether integrity passthrough is needed or not. Automatic integrity mempool allocation can then be removed from bioset_create() and we make an explicit integrity allocation for the fs_bio_set.
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
    Acked-by: Mike Snitzer <snitzer@redhat.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

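    A minimal sketch of the allocation pattern described above, assuming the existing bioset_integrity_create()/bioset_free() helpers; the capability check and pool sizing are illustrative, not taken from the patch:

        #include <linux/bio.h>
        #include <linux/blkdev.h>
        #include <linux/genhd.h>

        /* Hypothetical helper: set up a per-metadevice bio_set and attach the
         * integrity mempool only when integrity passthrough is actually needed. */
        static struct bio_set *example_create_bioset(struct gendisk *disk)
        {
                struct bio_set *bs = bioset_create(BIO_POOL_SIZE, 0);

                if (!bs)
                        return NULL;

                /* Illustrative opt-in check; MD/DM decide this from their
                 * member devices at metadevice creation time. */
                if (blk_get_integrity(disk) &&
                    bioset_integrity_create(bs, BIO_POOL_SIZE)) {
                        bioset_free(bs);
                        return NULL;
                }

                return bs;
        }
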
* jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging (Jens Axboe, 2011-03-17; 1 file, -10/+8)
    'write_op' was still used, even though it was always WRITE_SYNC now. Add plugging around the cases where it submits IO, and flush them before we end up waiting for that IO.
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

* jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging (Jens Axboe, 2011-03-17; 1 file, -11/+11)
    'write_op' was still used, even though it was always WRITE_SYNC now. Add plugging around the cases where it submits IO, and flush them before we end up waiting for that IO.
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

* fs: make fsync_buffers_list() plug (Jens Axboe, 2011-03-17; 1 file, -0/+6)
    It used WRITE_SYNC_PLUG before and potentially submits a batch of IO, so let's enable plugging for this case.
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

* mm: make generic_writepages() use plugging (Shaohua Li, 2011-03-17; 1 file, -1/+7)
    This recovers a performance regression caused by the removal of the per-device plugging.
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

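    A minimal sketch of what "use plugging" means for generic_writepages(), assuming the blk_start_plug()/blk_finish_plug() API introduced later in this series; this is an illustrative rewrite, not the exact patch (__writepage is the existing static helper in mm/page-writeback.c):

        #include <linux/blkdev.h>
        #include <linux/pagemap.h>
        #include <linux/writeback.h>

        /* Illustrative generic_writepages(): batch the page IO inside an
         * on-stack plug and submit it in one go when the plug is finished. */
        int generic_writepages_sketch(struct address_space *mapping,
                                      struct writeback_control *wbc)
        {
                struct blk_plug plug;
                int ret;

                /* deal with chardevs and other special files */
                if (!mapping->a_ops->writepage)
                        return 0;

                blk_start_plug(&plug);
                ret = write_cache_pages(mapping, wbc, __writepage, mapping);
                blk_finish_plug(&plug);         /* flushes the queued IO */

                return ret;
        }
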
* blk-cgroup: Add unaccounted time to timeslice_used. (Justin TerAvest, 2011-03-12; 4 files, -14/+41)
    There are two kinds of times that tasks are not charged for: the first seek and the extra time slice used over the allocated timeslice. Both of these are exported as a new unaccounted_time stat. I think it would be good to have this reported in 'time' as well, but that is probably a separate discussion.
    Signed-off-by: Justin TerAvest <teravest@google.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

* block: fixup plugging stubs for !CONFIG_BLOCK (Jens Axboe, 2011-03-11; 1 file, -3/+6)
    They used an older prototype, fix it up.
    Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

* block: remove obsolete comments for blkdev_issue_zeroout. (Tao Ma, 2011-03-11; 1 file, -2/+0)
    The barrier is already removed, so remove the obsolete comments in blkdev_issue_zeroout.
    Cc: Jens Axboe <jaxboe@fusionio.com>
    Signed-off-by: Tao Ma <boyu.mt@taobao.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

* blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. (Tao Ma, 2011-03-11; 1 file, -11/+4)
    In blk_add_trace_rq, we only choose the lower 2 bits from the request's cmd_flags and do some checking for discard, so most of the other flags (e.g. REQ_SYNC) are missing. For example, with a sync write, after blkparse we get:

        8,16   1        1     0.001776503  7509  A  WS 1349632 + 1024 <- (8,17) 1347584
        8,16   1        2     0.001776813  7509  Q  WS 1349632 + 1024 [dd]
        8,16   1        3     0.001780395  7509  G  WS 1349632 + 1024 [dd]
        8,16   1        5     0.001783186  7509  I   W 1349632 + 1024 [dd]
        8,16   1       11     0.001816987  7509  D   W 1349632 + 1024 [dd]
        8,16   0        2     0.006218192     0  C   W 1349632 + 1024 [0]

    Since we have now integrated the flags of both bio and request, it is safe to pass rq->cmd_flags directly to __blk_add_trace. With this patch, after a sync write we get:

        8,16   1        1     0.001776900  5425  A  WS 1189888 + 1024 <- (8,17) 1187840
        8,16   1        2     0.001777179  5425  Q  WS 1189888 + 1024 [dd]
        8,16   1        3     0.001780797  5425  G  WS 1189888 + 1024 [dd]
        8,16   1        5     0.001783402  5425  I  WS 1189888 + 1024 [dd]
        8,16   1       11     0.001817468  5425  D  WS 1189888 + 1024 [dd]
        8,16   0        2     0.005640709     0  C  WS 1189888 + 1024 [0]

    Signed-off-by: Tao Ma <boyu.mt@taobao.com>
    Acked-by: Jeff Moyer <jmoyer@redhat.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

* Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core (Jens Axboe, 2011-03-10; 137 files, -1533/+606)
    Conflicts:
        block/blk-core.c
        block/blk-flush.c
        drivers/md/raid1.c
        drivers/md/raid10.c
        drivers/md/raid5.c
        fs/nilfs2/btnode.c
        fs/nilfs2/mdt.c
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

| * blk-throttle: Use blk_plug in throttle dispatch (Vivek Goyal, 2011-03-10; 1 file, -0/+3)
    Use a plug in throttle dispatch too, as we are dispatching a bunch of bios in throttle context and some of them might merge.
    Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

| * block: kill off REQ_UNPLUG (Jens Axboe, 2011-03-10; 24 files, -75/+43)
    With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will.
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

| * aio: remove request submission batching (Jens Axboe, 2011-03-10; 1 file, -72/+3)
    This should be useless now that we have on-stack plugging, so let's just kill it.
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

| * fs: make aio plug (Shaohua Li, 2011-03-10; 1 file, -0/+4)
    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

| * fs: make mpage read/write_pages() plug (Jens Axboe, 2011-03-10; 1 file, -0/+8)
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

| * read-ahead: use plugging (Jens Axboe, 2011-03-10; 1 file, -0/+6)
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

| * fs: make generic file read/write functions plug (Jens Axboe, 2011-03-10; 1 file, -0/+7)
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

| * block: remove per-queue plugging (Jens Axboe, 2011-03-10; 119 files, -1269/+151)
    Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So let's kill off the old plugging along with aops->sync_page().
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

| * block: initial patch for on-stack per-task plugging (Jens Axboe, 2011-03-10; 10 files, -101/+344)
    This patch adds support for creating a queuing context outside of the queue itself. This enables us to batch up pieces of IO before grabbing the block device queue lock and submitting them to the IO scheduler. The context is created on the stack of the process and assigned in the task structure, so that we can auto-unplug it if we hit a schedule event.
    The current queue plugging happens implicitly if IO is submitted to an empty device, yet callers have to remember to unplug that IO when they are going to wait for it. This is an ugly API and has caused bugs in the past. Additionally, it requires hacks in the vm (->sync_page() callback) to handle that logic. By switching to an explicit plugging scheme we make the API a lot nicer and can get rid of the ->sync_page() hack in the vm.
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

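    For reference, a hedged sketch of how a submitter uses the on-stack plug described above; blk_start_plug()/blk_finish_plug() and current->plug come from this series, while the helper wrapped around them is illustrative:

        #include <linux/bio.h>
        #include <linux/blkdev.h>
        #include <linux/fs.h>

        /* Typical caller-side pattern for on-stack, per-task plugging. */
        static void example_submit_batch(struct bio **bios, int nr)
        {
                struct blk_plug plug;
                int i;

                blk_start_plug(&plug);          /* current->plug now points at our stack */
                for (i = 0; i < nr; i++)
                        submit_bio(READ, bios[i]);      /* queued on the plug, not dispatched yet */
                blk_finish_plug(&plug);         /* the batch is flushed to the IO scheduler here;
                                                 * it is also flushed automatically if the task
                                                 * schedules while the plug is active */
        }
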
| * scsi: convert to blk_delay_queue() (Jens Axboe, 2011-03-10; 1 file, -25/+19)
    It was always abuse to reuse the plugging infrastructure for this; convert it to the (new) real API for delaying queueing a bit. A default delay of 3 msec is defined, to match the previous behaviour.
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

| * ide-cd: convert to blk_delay_queue() for a short pause (Jens Axboe, 2011-03-10; 1 file, -11/+2)
    It was always abuse to reuse the plugging infrastructure for this; convert it to the (new) real API for delaying queueing a bit.
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    Acked-by: David S. Miller <davem@davemloft.net>

| * block: add API for delaying work/request_fn a little bit (Jens Axboe, 2011-03-10; 2 files, -0/+35)
    Currently we use plugging for that, but as plugging is going away, we need an alternative mechanism.
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

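    A hedged sketch of how a driver's request_fn might use the new delay API instead of (ab)using plugging when it has to back off briefly; the 3 msec value mirrors the scsi conversion above, and example_device_ready()/example_dispatch() are hypothetical helpers:

        #include <linux/blkdev.h>

        #define EXAMPLE_RETRY_DELAY_MS  3       /* default used by the scsi conversion */

        static bool example_device_ready(void *hw);             /* hypothetical */
        static void example_dispatch(struct request *rq);       /* hypothetical */

        static void example_request_fn(struct request_queue *q)
        {
                struct request *rq;

                while ((rq = blk_fetch_request(q)) != NULL) {
                        if (!example_device_ready(q->queuedata)) {
                                /* Put the request back and ask the block layer
                                 * to re-run the queue a little later. */
                                blk_requeue_request(q, rq);
                                blk_delay_queue(q, EXAMPLE_RETRY_DELAY_MS);
                                return;
                        }
                        example_dispatch(rq);
                }
        }
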
* | staging: Convert to bdops->check_events() (Tejun Heo, 2011-03-09; 2 files, -8/+14)
    Convert two staging drivers - blkvsc_drv and cyasblkdev_block - from ->media_changed() to ->check_events(). The former always indicated media changed while the latter always indicated media not changed. Not sure what the drivers are trying to achieve but keep the original behavior.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>

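    For the ->check_events() conversions in this series, a hedged sketch of the driver-side shape; the driver struct and media-change bookkeeping are illustrative, only the callback signature and DISK_EVENT_MEDIA_CHANGE come from the block layer:

        #include <linux/blkdev.h>
        #include <linux/genhd.h>

        /* Hypothetical driver state, used only for this sketch. */
        struct example_disk {
                struct gendisk  *gd;
                bool            media_changed;  /* set by the driver's sense/IRQ path */
        };

        /* Replaces int (*media_changed)(struct gendisk *): report pending
         * events and clear the ones the caller asked to clear. */
        static unsigned int example_check_events(struct gendisk *disk,
                                                 unsigned int clearing)
        {
                struct example_disk *d = disk->private_data;
                unsigned int events = 0;

                if (d->media_changed)
                        events |= DISK_EVENT_MEDIA_CHANGE;
                if (clearing & DISK_EVENT_MEDIA_CHANGE)
                        d->media_changed = false;

                return events;
        }

        /* Wired up as .check_events in block_device_operations, together with
         * gd->events = DISK_EVENT_MEDIA_CHANGE so the block layer knows which
         * events this disk can report. */
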
* | pktcdvd: Convert to bdops->check_events() (Tejun Heo, 2011-03-09; 1 file, -4/+9)
    Convert from ->media_changed() to ->check_events(). pktcdvd needs to forward all event related operations to the underlying device. Forward ->check_events() instead of ->media_changed() and inherit disk->[async_]events.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>
    Cc: Peter Osterlund <petero2@telia.com>

* | umem: Drop dummy ->media_changed() (Tejun Heo, 2011-03-09; 1 file, -10/+0)
    umem doesn't implement media changed detection and there's no need to implement a dummy callback anymore. Remove it.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>

* | s390/tape_block: Convert to bdops->check_events() (Tejun Heo, 2011-03-09; 1 file, -5/+6)
    Convert from ->media_changed() to ->check_events(). s390/tape_block buffers media changed state and clears it on revalidation. It will behave correctly with kernel event polling.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>
    Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>

* | i2o_block: Convert to bdops->check_events() (Tejun Heo, 2011-03-09; 1 file, -4/+7)
    Convert from ->media_changed() to ->check_events(). i2o_block buffers media changed state and clears it after reporting. It will behave correctly with kernel event polling.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>
    Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>

* | xsysace: Convert to bdops->check_events() (Tejun Heo, 2011-03-09; 1 file, -4/+5)
    Convert from ->media_changed() to ->check_events(). xsysace buffers media changed state and clears it on revalidation. It will behave correctly with kernel event polling.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Grant Likely <grant.likely@secretlab.ca>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>

* | ub: Convert to bdops->check_events() (Tejun Heo, 2011-03-09; 1 file, -4/+6)
    Convert from ->media_changed() to ->check_events(). ub buffers media changed state and clears it on revalidation. It will behave correctly with kernel event polling.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>
    Cc: Pete Zaitcev <zaitcev@redhat.com>

* | swim[3]: Convert to bdops->check_events() (Tejun Heo, 2011-03-09; 2 files, -7/+12)
    Convert from ->media_changed() to ->check_events(). Both swim and swim3 buffer media changed state and clear it on revalidation. They will behave correctly with kernel event polling.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>
    Cc: Laurent Vivier <laurent@lvivier.info>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>

* | dac960: Convert to bdops->check_events() (Tejun Heo, 2011-03-09; 1 file, -3/+5)
    Convert from ->media_changed() to ->check_events(). DAC960 media change notification seems to be one way (once set, never cleared) and will generate spurious events when polled once the condition triggers.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>

* | paride: Convert to bdops->check_events() (Tejun Heo, 2011-03-09; 3 files, -14/+21)
    Convert paride drivers from ->media_changed() to ->check_events(). pcd and pd buffer and clear events after reporting; however, pf unconditionally reports MEDIA_CHANGE and will generate spurious events when polled.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>
    Cc: Tim Waugh <tim@cyberelk.net>

* | gdrom,viocd: Convert to bdops->check_events() (Tejun Heo, 2011-03-09; 2 files, -13/+20)
    Convert gdrom and viocd from ->media_changed() to ->check_events(). It's unclear how the conditions are cleared and it's possible that they may generate spurious events when polled.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>

* | floppy,{ami|ata}flop: Convert to bdops->check_events() (Tejun Heo, 2011-03-09; 3 files, -14/+19)
    Convert the floppy drivers from ->media_changed() to ->check_events(). Both floppy and ataflop buffer the media changed state bit, clear it on revalidation, and will behave correctly with kernel event polling. I can't tell how amiflop clears its event and it's possible that it may generate spurious events when polled.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>

* | ide: Convert to bdops->check_events() (Tejun Heo, 2011-03-09; 4 files, -15/+20)
    Convert ->media_changed() to the new ->check_events() method. The conversion is mostly mechanical. The only notable change is that cdrom now doesn't generate any event if @slot_nr isn't CDSL_CURRENT. It used to return -EINVAL, which would be treated as media changed. As the media changer isn't supported anyway, this doesn't make any difference.
    This makes ide emit the standard disk events and allows kernel event polling. Currently, only the MEDIA_CHANGE event is implemented. Adding support for EJECT_REQUEST shouldn't be difficult; however, given that the ide driver is already deprecated, it probably is best to leave it alone.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: linux-ide@vger.kernel.org

* | block: Don't check events while open is in progress (Tejun Heo, 2011-03-09; 1 file, -7/+10)
    Not all block drivers clear events immediately after reporting. Some do so in ->revalidate_disk() or other steps during ->open(). There is a slim chance an event poll may happen between the clearing event check from check_disk_change() and the actual clearing of the events, which would result in spurious events. Block event checks while a block device open is in progress. There is no need to kick an explicit event check afterwards, as events are always checked during open.
    -v2: The original patch could have called disk_unblock_events() with an already released or %NULL @disk, causing an oops. Fixed by making sure references are put after disk_unblock_events() is called. It also makes the error path of __blkdev_get() a bit simpler. This problem was reported by Jens.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>

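    A hedged sketch of the open-path blocking described above, assuming the disk_block_events()/disk_unblock_events() helpers; the function below is a heavily reduced stand-in for __blkdev_get(), not the actual code:

        #include <linux/blkdev.h>
        #include <linux/fs.h>
        #include <linux/genhd.h>

        static int example_open(struct block_device *bdev, fmode_t mode)
        {
                struct gendisk *disk = bdev->bd_disk;
                int ret = 0;

                /* Suppress event checks while ->open()/revalidation may still
                 * be clearing events, closing the spurious-event window. */
                disk_block_events(disk);
                if (disk->fops->open)
                        ret = disk->fops->open(bdev, mode);
                disk_unblock_events(disk);      /* no explicit check needed afterwards:
                                                 * events are always checked during open */
                return ret;
        }
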
* | block: Don't check events on close unless it was blocked (Tejun Heo, 2011-03-09; 1 file, -4/+2)
    The block event mechanism currently always checks events when the device is being closed regardless of the open mode. The intention was to allow detection of EJECT_REQUEST when a device is closed whether disk event polling is enabled or not.
    This is unnecessary as, for devices of interest, events are checked from either userland or kernel and in the former case ->check_events() is performed on open of each poll attempt anyway. Furthermore, this unconditional event check on close makes the code susceptible to an event loop if the block driver doesn't clear reported events correctly - an event triggers userland to open and close the device, which in turn causes another event, rinse and repeat.
    Check events on close only if it was blocked by excl write open.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>

* | block: Don't implicitly trigger event check on disk_unblock_events() (Tejun Heo, 2011-03-09; 2 files, -1/+2)
    Currently, disk_unblock_events() implicitly kicks an event check if the block count reaches zero. This behavior is not described in the comment and hinders future changes. Make the unblocker explicitly check events by calling disk_check_events() as necessary. This patch doesn't cause any behavior difference.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Kay Sievers <kay.sievers@vrfy.org>

* | blk-cgroup: Lower minimum weight from 100 to 10. (Justin TerAvest, 2011-03-08; 2 files, -2/+2)
    We've found that we still get good, useful isolation at weights this low. I'd like to adjust the minimum so that any other changes can take these values into account.
    Signed-off-by: Justin TerAvest <teravest@google.com>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

* | block: biovec_slab vs. CONFIG_BLK_DEV_INTEGRITY (Martin K. Petersen, 2011-03-08; 2 files, -4/+1)
    The block integrity subsystem no longer uses the bio_vec slabs, so this code can safely be compiled in.
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

* | blk-throttle: Some cleanups and race fixes in limit update code (Vivek Goyal, 2011-03-07; 1 file, -56/+40)
    When throttle group limits are updated through cgroups, a thread is woken up to process these updates. While reviewing that code, Oleg noted a couple of race conditions in it and also suggested that the code can be simplified. This patch fixes the races and simplifies the code based on Oleg's suggestions:
    - Use xchg().
    - Introduce a common function, throtl_update_blkio_group_common(), which is now shared by all the iops/bps update functions.
    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
    [Fixed a merge issue: throtl_schedule_delayed_work() takes throtl_data as the argument now, not the queue.]
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

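    A hedged sketch of the xchg() pattern referred to above: the updater sets a flag and the dispatch side atomically tests and clears it, so an update can never be lost between the check and the clear. Field and function names are illustrative:

        #include <linux/types.h>        /* xchg() itself comes from the arch headers */

        struct example_throtl_data {
                unsigned long   limits_changed;
        };

        /* cgroup side: note that limits changed, then wake the worker. */
        static void example_note_limit_update(struct example_throtl_data *td)
        {
                xchg(&td->limits_changed, 1);
                /* ... schedule the delayed work / dispatch thread here ... */
        }

        /* dispatch side: atomically test-and-clear; an update that races in
         * after the xchg() simply leaves the flag set for the next pass. */
        static void example_process_limit_change(struct example_throtl_data *td)
        {
                if (!xchg(&td->limits_changed, 0))
                        return;

                /* ... requeue groups that were blocked on the old limits ... */
        }
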
* | blk-throttle: process limit change only through one function (Vivek Goyal, 2011-03-07; 1 file, -7/+1)
    With the help of the cgroup interface one can go and update the bps/iops limits of an existing group. Once the limits are updated, a thread is woken up to see if some blocked group needs recalculation based on the new limits and needs to be requeued. There was also a piece of code where I was checking for a group limit update when a fresh bio comes in. This patch gets rid of that piece of code and keeps processing the limit change in one place, throtl_process_limit_change(). It just keeps the code simple and easy to understand.
    Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

* | Merge branch 'block-for-2.6.39-core' of ssh://master.kernel.org/pub/scm/linux/kernel/git/tj/misc into for-2.6.39/core (Jens Axboe, 2011-03-07; 209 files, -734/+1841)

| * \ Merge branch 'for-linus' of ../linux-2.6-block into block-for-2.6.39/core (Tejun Heo, 2011-03-04; 209 files, -734/+1841)
    This merge creates two sets of conflicts. One is simple context conflicts caused by the removal of throtl_scheduled_delayed_work() in for-linus and the removal of throtl_shutdown_timer_wq() in for-2.6.39/core.
    The other is caused by commit 255bb490c8 (block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue()) in for-linus clashing with the FLUSH reimplementation in for-2.6.39/core. The conflict isn't trivial but the resolution is straight-forward.
    * __blk_run_queue() calls in flush_end_io() and flush_data_end_io() should be called with @force_kblockd set to %true.
    * elv_insert() in blk_kick_flush() should use %ELEVATOR_INSERT_REQUEUE.
    Both changes are to avoid invoking ->request_fn() directly from the request completion path and closely match the changes in commit 255bb490c8.
    Signed-off-by: Tejun Heo <tj@kernel.org>

| | * | block: kill loop_mutex (Petr Uzel, 2011-03-03; 1 file, -5/+0)
    The following steps lead to a deadlock in the kernel:

        dd if=/dev/zero of=img bs=512 count=1000
        losetup -f img
        mkfs.ext2 /dev/loop0
        mount -t ext2 -o loop /dev/loop0 mnt
        umount mnt/

    Stacktrace:
        [<c102ec04>] irq_exit+0x36/0x59
        [<c101502c>] smp_apic_timer_interrupt+0x6b/0x75
        [<c127f639>] apic_timer_interrupt+0x31/0x38
        [<c101df88>] mutex_spin_on_owner+0x54/0x5b
        [<fe2250e9>] lo_release+0x12/0x67 [loop]
        [<c10c4eae>] __blkdev_put+0x7c/0x10c
        [<c10a4da5>] fput+0xd5/0x1aa
        [<fe2250cf>] loop_clr_fd+0x1a9/0x1b1 [loop]
        [<fe225110>] lo_release+0x39/0x67 [loop]
        [<c10c4eae>] __blkdev_put+0x7c/0x10c
        [<c10a59d9>] deactivate_locked_super+0x17/0x36
        [<c10b6f37>] sys_umount+0x27e/0x2a5
        [<c10b6f69>] sys_oldumount+0xb/0xe
        [<c1002897>] sysenter_do_call+0x12/0x26
        [<ffffffff>] 0xffffffff

    This is a regression since 2a48fc0ab24241755dc9, which introduced the private loop_mutex as part of the BKL removal process. As per [1], the mutex can be safely removed.
    [1] http://www.gossamer-threads.com/lists/linux/kernel/1341930
    Addresses: https://bugzilla.novell.com/show_bug.cgi?id=669394
    Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=29172
    Signed-off-by: Petr Uzel <petr.uzel@suse.cz>
    Cc: stable@kernel.org
    Reviewed-by: Nikanth Karthikesan <knikanth@suse.de>
    Acked-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

| | * | blktrace: Remove blk_fill_rwbs_rq. (Tao Ma, 2011-03-03; 3 files, -20/+3)
    If we enable trace events to trace block actions, we use blk_fill_rwbs_rq to analyze the corresponding actions in the request's cmd_flags, but we only choose the lower 2 bits from it, so most of the other flags (e.g. REQ_SYNC) are missing. For example, with a sync write we get:

        write_test-2409 [001]   160.013869: block_rq_insert: 3,64 W 0 () 258135 + 8 [write_test]

    Since we have now integrated the flags of both bio and request, it is safe to pass rq->cmd_flags directly to blk_fill_rwbs, and blk_fill_rwbs_rq isn't needed any more. With this patch, after a sync write we get:

        write_test-2417 [000]   226.603878: block_rq_insert: 3,64 WS 0 () 258135 + 8 [write_test]

    Signed-off-by: Tao Ma <boyu.mt@taobao.com>
    Acked-by: Jeff Moyer <jmoyer@redhat.com>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

| | * | block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue() (Tejun Heo, 2011-03-02; 1 file, -3/+5)
    blk-flush decomposes a flush into a sequence of multiple requests. On completion of a request, the next one is queued; however, the block layer must not implicitly call into q->request_fn() directly from the completion path. This makes the queue behave unexpectedly when seen from the drivers and violates the assumption that q->request_fn() is called with process context + queue_lock.
    This patch makes the following two changes to blk-flush to make sure q->request_fn() is not called directly from the request completion path:
    - blk_flush_complete_seq_end_io() now asks __blk_run_queue() to always use kblockd instead of calling directly into q->request_fn().
    - queue_next_fseq() uses ELEVATOR_INSERT_REQUEUE instead of ELEVATOR_INSERT_FRONT so that elv_insert() doesn't try to unplug the request queue directly.
    Reported by Jan in the following threads:
        http://thread.gmane.org/gmane.linux.ide/48778
        http://thread.gmane.org/gmane.linux.ide/48786
    stable: applicable to v2.6.37.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reported-by: Jan Beulich <JBeulich@novell.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

| | * | block: add @force_kblockd to __blk_run_queue() (Tejun Heo, 2011-03-02; 7 files, -14/+15)
    __blk_run_queue() automatically either calls q->request_fn() directly or schedules kblockd, depending on whether the function is recursed. The blk-flush implementation needs to be able to explicitly choose kblockd. Add @force_kblockd.
    All the current users are converted to specify %false for the parameter and this patch doesn't introduce any behavior change.
    stable: This is a prerequisite for fixing the ide oops caused by the new blk-flush implementation.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Jan Beulich <JBeulich@novell.com>
    Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

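    A hedged sketch of what the new parameter means for callers; the prototype is inferred from the changelog rather than copied from the patch:

        #include <linux/blkdev.h>

        /* Assumed prototype after this change:
         *      void __blk_run_queue(struct request_queue *q, bool force_kblockd);
         */
        static void example_kick_queue(struct request_queue *q, bool from_completion)
        {
                if (from_completion)
                        /* completion context must never recurse into
                         * ->request_fn(); always punt to kblockd */
                        __blk_run_queue(q, true);
                else
                        /* normal context keeps the old behavior and may call
                         * ->request_fn() directly */
                        __blk_run_queue(q, false);
        }
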
| | * | block: fix kernel-doc format for blkdev_issue_zeroout (Ben Hutchings, 2011-03-01; 1 file, -1/+1)
    Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>