op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge branch 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block	Linus Torvalds	2010-10-22	43	-299/+2494
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits) cfq-iosched: Fix a gcc 4.5 warning and put some comments block: Turn bvec_k{un,}map_irq() into static inline functions block: fix accounting bug on cross partition merges block: Make the integrity mapped property a bio flag block: Fix double free in blk_integrity_unregister block: Ensure physical block size is unsigned int blkio-throttle: Fix possible multiplication overflow in iops calculations blkio-throttle: limit max iops value to UINT_MAX blkio-throttle: There is no need to convert jiffies to milli seconds blkio-throttle: Fix link failure failure on i386 blkio: Recalculate the throttled bio dispatch time upon throttle limit change blkio: Add root group to td->tg_list blkio: deletion of a cgroup was causes oops blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n block: set the bounce_pfn to the actual DMA limit rather than to max memory block: revert bad fix for memory hotplug causing bounces Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK block: set the bounce_pfn to the actual DMA limit rather than to max memory block: Prevent hang_check firing during long I/O cfq: improve fsync performance for small files ... Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h
\| *	cfq-iosched: Fix a gcc 4.5 warning and put some comments	Vivek Goyal	2010-10-22	1	-3/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Andi encountedred following warning with gcc 4.5 linux/block/cfq-iosched.c: In function ‘cfq_dispatch_requests’: linux/block/cfq-iosched.c:2156:3: warning: array subscript is above array bounds - Warning happens due to following code. slice = group_slice * count / max_t(unsigned, cfqg->busy_queues_avg[cfqd->serving_prio], cfq_group_busy_queues_wl(cfqd->serving_prio, cfqd, cfqg)); gcc is complaining about cfqg->busy_queues_avg[] being indexed by CFQ prio classes (RT, BE and IDLE) while the array size is only 2. - At run time, we never access cfqg->busy_queues_avg[IDLE] and return from function before this code hits. - To fix warning increase the array size though it will remain unused. This patch also puts some comments to clarify some of the confusions. - I have taken Jens's patch and modified it a bit. - Compile tested with gcc 4.4 and boot tested. I don't have gcc 4.5 running, Andi can you please test it with gcc 4.5 to make sure it worked. Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	block: Turn bvec_k{un,}map_irq() into static inline functions	Geert Uytterhoeven	2010-10-21	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert bvec_k{un,}map_irq() from macros to static inline functions if !CONFIG_HIGHMEM, so we can easier detect mistakes like the one fixed in 93055c31045a2d5599ec613a0c6cdcefc481a460 ("ps3disk: passing wrong variable = to bvec_kunmap_irq()") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	block: fix accounting bug on cross partition merges	Yasuaki Ishimatsu	2010-10-19	8	-13/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	/proc/diskstats would display a strange output as follows. $ cat /proc/diskstats \|grep sda 8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089 8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691 ~~~~~~~~~~ 8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390 8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92 8 4 sda4 4 0 8 0 0 0 0 0 0 0 0 8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137 Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE. The detailed root cause is as follows. Assuming that there are two partition, sda1 and sda2. 1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight is 0 and sda2's one is 1. \| hd_struct->in_flight --------------------------- sda1 \| 0 sda2 \| 1 --------------------------- 2. A bio belongs to sda1 is issued and is merged into the request mentioned on step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed from sda2 region to sda1 region. However the two partition's hd_struct->in_flight are not changed. \| hd_struct->in_flight --------------------------- sda1 \| 0 sda2 \| 1 --------------------------- 3. The request is finished and blk_account_io_done() is called. In this case, sda2's hd_struct->in_flight, not a sda1's one, is decremented. \| hd_struct->in_flight --------------------------- sda1 \| -1 sda2 \| 1 --------------------------- The patch fixes the problem by caching the partition lookup inside the request structure, hence making sure that the increment and decrement will always happen on the same partition struct. This also speeds up IO with accounting enabled, since it cuts down on the number of lookups we have to do. When reloading partition tables, quiesce IO to ensure that no request references to the partition struct exists. When it is safe to free the partition table, the IO for that device is restarted again. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	block: Make the integrity mapped property a bio flag	Martin K. Petersen	2010-10-15	2	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously we tracked whether the integrity metadata had been remapped using a request flag. This was fine for low-level retries. However, if an I/O was redriven by upper layers we would end up remapping again, causing the retry to fail. Deprecate the REQ_INTEGRITY flag and introduce BIO_MAPPED_INTEGRITY which enables filesystems to notify lower layers that the bio in question has already been remapped. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	block: Fix double free in blk_integrity_unregister	Martin K. Petersen	2010-10-15	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 3839e4b introduced a kobject_put but failed to remove the kmem_cache_free beneath it, leading to a double free. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	block: Ensure physical block size is unsigned int	Martin K. Petersen	2010-10-13	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Physical block size was declared unsigned int to accomodate the maximum size reported by READ CAPACITY(16). Make sure we use the right type in the related functions. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blkio-throttle: Fix possible multiplication overflow in iops calculations	Vivek Goyal	2010-10-01	1	-1/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o User can specify max iops value of 32bit (UINT_MAX), through cgroup interface. If a user has specified say 4294967294 (UNIT_MAX - 2), then on 32bit platform, following multiplication can overflow. io_allowed = (tg->iops[rw] * jiffy_elapsed_rnd) o Explicitly cast the multiplication to 64bit and then perform division and then check whether result is still great then UNINT_MAX. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blkio-throttle: limit max iops value to UINT_MAX	Vivek Goyal	2010-10-01	2	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Limit max iops value to UINT_MAX and return error to user if value is more than that instead of accepting bigger values and truncating implicitly. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blkio-throttle: There is no need to convert jiffies to milli seconds	Vivek Goyal	2010-10-01	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Do not convert jiffies to mili seconds as it is not required. Just work with jiffies and HZ. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blkio-throttle: Fix link failure failure on i386	Vivek Goyal	2010-10-01	1	-6/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Randy Dunlap reported following linux-next failure. This patch fixes it. on i386: blk-throttle.c:(.text+0x1abb8): undefined reference to `__udivdi3' blk-throttle.c:(.text+0x1b1dc): undefined reference to `__udivdi3' o bytes_per_second interface is 64bit and I was continuing to do 64 bit division even on 32bit platform without help of special macros/functions hence the failure. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blkio: Recalculate the throttled bio dispatch time upon throttle limit change	Vivek Goyal	2010-10-01	4	-35/+139
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Currently any cgroup throttle limit changes are processed asynchronousy and the change does not take affect till a new bio is dispatched from same group. o It might happen that a user sets a redicuously low limit on throttling. Say 1 bytes per second on reads. In such cases simple operations like mount a disk can wait for a very long time. o Once bio is throttled, there is no easy way to come out of that wait even if user increases the read limit later. o This patch fixes it. Now if a user changes the cgroup limits, we recalculate the bio dispatch time according to new limits. o Can't take queueu lock under blkcg_lock, hence after the change I wake up the dispatch thread again which recalculates the time. So there are some variables being synchronized across two threads without lock and I had to make use of barriers. Hoping I have used barriers correctly. Any review of memory barrier code especially will help. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blkio: Add root group to td->tg_list	Vivek Goyal	2010-10-01	1	-4/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Currently all the dynamically allocated groups, except root grp is added to td->tg_list. This was not a problem so far but in next patch I will travel through td->tg_list to process any updates of limits on the group. If root group is not in tg_list, then root group's updates are not processed. o It is better to root group also to tg_list instead of doing special processing for it during limit updates. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blkio: deletion of a cgroup was causes oops	Vivek Goyal	2010-10-01	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Now a cgroup list of blkg elements can contain blkg from multiple policies. Before sending an unlink event, make sure blkg belongs to they policy. If policy does not own the blkg, do not send update for this blkg. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n	Vivek Goyal	2010-10-01	1	-47/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently throttling related files were visible even if user had disabled throttling using config options. It was switching off background throttling of bio but not the cgroup files. This patch fixes it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	block: set the bounce_pfn to the actual DMA limit rather than to max memory	Malahal Naineni	2010-10-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The bounce_pfn of the request queue in 64 bit systems is set to the current max_low_pfn. Adding more memory later makes this incorrect. Memory allocated beyond this boot time max_low_pfn appear to require bounce buffers (bounce buffers are actually not allocated but used in calculating segments that may result in "over max segments limit" errors). Signed-off-by: Malahal Naineni <malahal@us.ibm.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	block: revert bad fix for memory hotplug causing bounces	Jens Axboe	2010-10-01	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Revert "block: set the bounce_pfn to the actual DMA limit rather than to max memory" This reverts commit c49825facfd4969585224a896a5e717f88450cad. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK	Mark Lord	2010-09-25	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ensure that 'sysctl_hung_task_timeout_secs' is defined even when CONFIG_DETECT_HUNG_TASK is not set. This way we can safely reference it without need for ifdefs in the code elsewhere. eg. in block/blk-exec.c Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	block: set the bounce_pfn to the actual DMA limit rather than to max memory	Malahal Naineni	2010-09-24	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The bounce_pfn of the request queue in 64 bit systems is set to the current max_low_pfn. Adding more memory later makes this incorrect. Memory allocated beyond this boot time max_low_pfn appear to require bounce buffers (bounce buffers are actually not allocated but used in calculating segments that may result in "over max segments limit" errors). Signed-off-by: Malahal Naineni <malahal@us.ibm.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	block: Prevent hang_check firing during long I/O	Mark Lord	2010-09-24	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During long I/O operations, the hang_check timer may fire, trigger stack dumps that unnecessarily alarm the user. Eg. hdparm --security-erase NULL /dev/sdb ## can take hours to complete So, if hang_check is armed, we should wake up periodically to prevent it from triggering. This patch uses a wake-up interval equal to half the hang_check timer period, which keeps overhead low enough. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	cfq: improve fsync performance for small files	Corrado Zoccolo	2010-09-20	3	-16/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fsync performance for small files achieved by cfq on high-end disks is lower than what deadline can achieve, due to idling introduced between the sync write happening in process context and the journal commit. Moreover, when competing with a sequential reader, a process writing small files and fsync-ing them is starved. This patch fixes the two problems by: - marking journal commits as WRITE_SYNC, so that they get the REQ_NOIDLE flag set, - force all queues that have REQ_NOIDLE requests to be put in the noidle tree. Having the queue associated to the fsync-ing process and the one associated to journal commits in the noidle tree allows: - switching between them without idling, - fairness vs. competing idling queues, since they will be serviced only after the noidle tree expires its slice. Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	do_mounts: only enable PARTUUID for CONFIG_BLOCK	Jens Axboe	2010-09-17	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When CONFIG_BLOCK is not enabled: init/do_mounts.c:71: error: implicit declaration of function 'dev_to_part' init/do_mounts.c:71: warning: initialization makes pointer from integer without a cast init/do_mounts.c:73: error: dereferencing pointer to incomplete type init/do_mounts.c:76: error: dereferencing pointer to incomplete type init/do_mounts.c:76: error: dereferencing pointer to incomplete type init/do_mounts.c:102: error: implicit declaration of function 'part_pack_uuid' init/do_mounts.c:104: error: 'block_class' undeclared (first use in this function) Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	block: Fix race during disk initialization	Signed-off-by: Jan Kara	2010-09-16	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a new disk is being discovered, add_disk() first ties the bdev to gendisk (via register_disk()->blkdev_get()) and only after that calls bdi_register_bdev(). Because register_disk() also creates disk's kobject, it can happen that userspace manages to open and modify the device's data (or inode) before its BDI is properly initialized leading to a warning in __mark_inode_dirty(). Fix the problem by registering BDI early enough. This patch addresses https://bugzilla.kernel.org/show_bug.cgi?id=16312 Cc: stable@kernel.org Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blkio: Documentation Update	Vivek Goyal	2010-09-16	1	-3/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Documentation update Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blkio: Implementation of IOPS limit logic	Vivek Goyal	2010-09-16	1	-37/+127
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	o core logic of implementing IOPS throttling. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blk-cgroup: cgroup changes for IOPS limit support	Vivek Goyal	2010-09-16	2	-17/+135
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	o cgroup changes for IOPS throttling rules. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blkio: Core implementation of throttle policy	Vivek Goyal	2010-09-16	7	-3/+979
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Actual implementation of throttling policy in block layer. Currently it implements READ and WRITE bytes per second throttling logic. IOPS throttling comes in later patches. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blk-cgroup: Introduce cgroup changes for throttling policy	Vivek Goyal	2010-09-16	2	-8/+167
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o cgroup chagnes for throttle policy. o Introduces READ and WRITE bytes per second throttling rules. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blk-cgroup: Prepare the base for supporting more than one IO control policies	Vivek Goyal	2010-09-16	4	-154/+429
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o This patch prepares the base for introducing new IO control policies. Currently all the code is written knowing there is only one policy and that is proportional bandwidth. Creating infrastructure for newer policies to come in. o Also there were many functions which were generated using macro. It was very confusing. Got rid of those. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	blk-cgroup: Kill the header printed at the start of blkio.weight_device file	Vivek Goyal	2010-09-16	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Kill extra "dev weight" header which is printed when somebody reads blkio.weight_device file. This really seems to be out of convention. No other blkio files are printing any header at the start of file. I think it is ok to just print values and how to interpret values should be part of documentation. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	core: match_dev_by_uuid() should not be marked __init	Jens Axboe	2010-09-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is also called outside the scope of init functions. Stephen reports: WARNING: init/mounts.o(.text+0x21a): Section mismatch in reference from the function name_to_dev_t() to the function .init.text:match_dev_by_uuid() The function name_to_dev_t() references the function __init match_dev_by_uuid(). This is often because name_to_dev_t lacks a __init annotation or the annotation of match_dev_by_uuid is wrong. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	sg: fix a warning in blk_rq_aligned() call	Namhyung Kim	2010-09-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	2nd argument of blk_rq_aligned() has changed to 'unsigned long' by the previous commit 'block: fix an address space warning in blk-map.c'. That commit neglected to update a user of that function. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	init: add support for root devices specified by partition UUID	Will Drewry	2010-09-15	2	-2/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the third patch in a series which adds support for storing partition metadata, optionally, off of the hd_struct. One major use for that data is being able to resolve partition by other identities than just the index on a block device. Device enumeration varies by platform and there's a benefit to being able to use something like EFI GPT's GUIDs to determine the correct block device and partition to mount as the root. This change adds that support to root= by adding support for the following syntax: root=PARTUUID=hex-uuid Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	genhd, efi: add efi partition metadata to hd_structs	Will Drewry	2010-09-15	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change extends the partition_meta_info structure to support EFI GPT-specific metadata and ensures that data is copied in on partition scanning. Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	block, partition: add partition_meta_info to hd_struct	Will Drewry	2010-09-15	5	-6/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm reposting this patch series as v4 since there have been no additional comments, and I cleaned up one extra bit of unneeded code (in 3/3). The patches are against Linus's tree: 2bfc96a127bc1cc94d26bfaa40159966064f9c8c (2.6.36-rc3). Would this patchset be suitable for inclusion in an mm branch? This changes adds a partition_meta_info struct which itself contains a union of structures that provide partition table specific metadata. This change leaves the union empty. The subsequent patch includes an implementation for CONFIG_EFI_PARTITION-based metadata. Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	block: fix an address space warning in blk-map.c	Namhyung Kim	2010-09-15	2	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change type of 2nd parameter of blk_rq_aligned() into unsigned long and remove unnecessary casting. Now we can call it with 'uaddr' instead of 'ubuf' in __blk_rq_map_user() so that it can remove following warnings from sparse: block/blk-map.c:57:31: warning: incorrect type in argument 2 (different address spaces) block/blk-map.c:57:31: expected void addr block/blk-map.c:57:31: got void [noderef] <asn:1>ubuf However blk_rq_map_kern() needs one more local variable to handle it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	block: block_dump: Add number of sectors to debug output	San Mehat	2010-09-14	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: San Mehat <san@android.com> Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| *	zfcp: Report scatter gather limit for DIX protection information	Christof Schmitt	2010-09-10	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When sending DIX integrity segments with an I/O request, the restriction for the maximum number of segments is still the same for the zfcp hardware. Report the new sg_prot_tablesize for the SCSI host, so that the number of integrity segments plus the number of data segments is not larger than the hardware limit. This results in using half of the hardware segments for integrity data and the other half for regular data. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
\| *	block/scsi: Provide a limit on the number of integrity segments	Martin K. Petersen	2010-09-10	12	-50/+167
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some controllers have a hardware limit on the number of protection information scatter-gather list segments they can handle. Introduce a max_integrity_segments limit in the block layer and provide a new scsi_host_template setting that allows HBA drivers to provide a value suitable for the hardware. Add support for honoring the integrity segment limit when merging both bios and requests. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
\| *	Consolidate min_not_zero	Martin K. Petersen	2010-09-10	5	-13/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have several users of min_not_zero, each of them using their own definition. Move the define to kernel.h. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
* \|	Merge branch 'linux-next' of git://git.infradead.org/ubi-2.6	Linus Torvalds	2010-10-22	13	-286/+419
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'linux-next' of git://git.infradead.org/ubi-2.6: UBI: tighten the corrupted PEB criteria UBI: fix check_data_ff return code UBI: remember copy_flag while scanning UBI: preserve corrupted PEBs UBI: add truly corrupted PEBs to corrupted list UBI: introduce debugging helper function UBI: make check_pattern function non-static UBI: do not put eraseblocks to the corrupted list unnecessarily UBI: separate out corrupted list UBI: change cascade of ifs to switch statements UBI: rename a local variable UBI: handle bit-flips when no header found UBI: remove duplicate IO error codes UBI: rename IO error code UBI: fix small 80 characters limit style issue UBI: cleanup and simplify Kconfig
\| * \|	UBI: tighten the corrupted PEB criteria	Artem Bityutskiy	2010-10-21	1	-11/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we get a bit-flip of ECC error while reading the data area, do not add it to corrupted list, because it is possible that this is just unstable PEB with corruptions caused by unclean reboots. This patch also improves commentaries. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBI: fix check_data_ff return code	Artem Bityutskiy	2010-10-21	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the data does not contain all 0xFF bytes, 'check_data_ff()' should return 1, not -EINVAL; Also, the caller ('process_eb()') should not add the PEB to the "corrupted" list if there was a read error. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBI: remember copy_flag while scanning	Artem Bityutskiy	2010-10-21	2	-11/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While scanning the flash we read all VID headers and store some important information in 'struct ubi_scan_leb'. Store also the 'copy_flag' value there as it is needed when comparing LEBs. We do not increase memory consumption because this is just one bit and we have plenty of spare bits in 'struct ubi_scan_leb' (sizeof(struct ubi_scan_leb) is 48 both with and without this patch). Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBI: preserve corrupted PEBs	Artem Bityutskiy	2010-10-19	7	-47/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently UBI erases all corrupted eraseblocks, irrespectively of the nature of corruption: corruption due to power cuts and non-power cut corruption. The former case is OK, but the latter is not, because UBI may destroy potentially important data. With this patch, during scanning, when UBI hits a PEB with corrupted VID header, it checks whether this PEB contains only 0xFF data. If yes, it is safe to erase this PEB and it is put to the 'erase' list. If not, this may be important data and it is better to avoid erasing this PEB. Instead, UBI puts it to the corr list and moves out of the pool of available PEB. IOW, UBI preserves this PEB. Such corrupted PEB lessen the amount of available PEBs. So the more of them we accumulate, the less PEBs are available. The maximum amount of non-power cut corrupted PEBs is 8. This patch is a response to UBIFS problem where reporter (Matthew L. Creech <mlcreech@gmail.com>) observes that UBIFS index points to an unmapped LEB. The theory is that corresponding PEB somehow got corrupted and UBI wiped it. This patch (actually a series of patches) tries to make sure such PEBs are preserved - this would make it is easier to analyze the corruption. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBI: add truly corrupted PEBs to corrupted list	Artem Bityutskiy	2010-10-19	1	-2/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Start using the 'corr' list and add there PEBs which look truly corrupted, which means they have corrupted VID header and the data which follows the corrupted header does not contain all 0xFF bytes. At the moment, this does not change UBI functionality much because these PEBs will be erase when scanning finishes. But the plan is to teach UBI preserving them. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBI: introduce debugging helper function	Artem Bityutskiy	2010-10-19	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce a helper function to print hexdump: 'ubi_dbg_print_hex_dump()'. It is compiled out if debugging is enabled. Will be used in the next patch. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBI: make check_pattern function non-static	Artem Bityutskiy	2010-10-19	3	-24/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch turns static function 'check_pattern()' into a non-static 'ubi_check_pattern()'. This is just a preparation for the chages which are coming in the next patches. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBI: do not put eraseblocks to the corrupted list unnecessarily	Artem Bityutskiy	2010-10-19	3	-72/+90
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently UBI maintains 2 lists of PEBs during scanning: 1. 'erase' list - PEBs which have no corruptions but should be erased 2. 'corr' list - PEBs which have some corruptions and should be erased But we do not really need 2 lists for PEBs which should be erased after scanning is done - this is redundant. So this patch makes sure all PEBs which are corrupted are moved to the head of the 'erase' list. We add them to the head to make sure they are erased first and we get rid of corruption ASAP. However, we do not remove the 'corr' list and realted functions, because the plan is to use this list for other purposes. Namely, we plan to put eraseblocks with corruption which does not look like it was caused by unclean power cut. Then we'll preserve thes PEBs in order to avoid killing potentially valuable user data. This patch also amends PEBs accounting, because it was closely tight to the 'erase'/'corr' lists separation. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
\| * \|	UBI: separate out corrupted list	Artem Bityutskiy	2010-10-19	1	-11/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces 'add_corrupted()' function and separates out 'corr' list manipulation from the common 'add_to_list()' function. This is just a preparation for further changes - this patch does not change functionality. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>