op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	idr: implement idr_preload[_end]() and idr_alloc()	Tejun Heo	2013-02-27	2	-8/+180
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current idr interface is very cumbersome. * For all allocations, two function calls - idr_pre_get() and idr_get_new() - should be made. idr_pre_get() doesn't guarantee that the following idr_get_new() will not fail from memory shortage. If idr_get_new() returns -EAGAIN, the caller is expected to retry pre_get and allocation. * idr_get_new() can't enforce upper limit. Upper limit can only be enforced by allocating and then freeing if above limit. idr_layer buffer is unnecessarily per-idr. Each idr ends up keeping around MAX_IDR_FREE idr_layers. The memory consumed per idr is under two pages but it makes it difficult to make idr_layer larger. This patch implements the following new set of allocation functions. * idr_preload[_end]() - Similar to radix preload but doesn't fail. The first idr_alloc() inside preload section can be treated as if it were called with @gfp_mask used for idr_preload(). * idr_alloc() - Allocate an ID w/ lower and upper limits. Takes @gfp_flags and can be used w/o preloading. When used inside preloaded section, the allocation mask of preloading can be assumed. If idr_alloc() can be called from a context which allows sufficiently relaxed @gfp_mask, it can be used by itself. If, for example, idr_alloc() is called inside spinlock protected region, preloading can be used like the following. idr_preload(GFP_KERNEL); spin_lock(lock); id = idr_alloc(idr, ptr, start, end, GFP_NOWAIT); spin_unlock(lock); idr_preload_end(); if (id < 0) error; which is much simpler and less error-prone than idr_pre_get and idr_get_new*() loop. The new interface uses per-pcu idr_layer buffer and thus the number of idr's in the system doesn't affect the amount of memory used for preloading. idr_layer_alloc() is introduced to handle idr_layer allocations for both old and new ID allocation paths. This is a bit hairy now but the new interface is expected to replace the old and the internal implementation eventually will become simpler. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	idr: refactor idr_get_new_above()	Tejun Heo	2013-02-27	1	-18/+12
\| \| \| \| \| \| \| \| \| \| \| \|	Move slot filling to idr_fill_slot() from idr_get_new_above_int() and make idr_get_new_above() directly call it. idr_get_new_above_int() is no longer needed and removed. This will be used to implement a new ID allocation interface. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	idr: remove _idr_rc_to_errno() hack	Tejun Heo	2013-02-27	2	-18/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	idr uses -1, IDR_NEED_TO_GROW and IDR_NOMORE_SPACE to communicate exception conditions internally. The return value is later translated to errno values using _idr_rc_to_errno(). This is confusing. Drop the custom ones and consistently use -EAGAIN for "tree needs to grow", -ENOMEM for "need more memory" and -ENOSPC for "ran out of ID space". Due to the weird memory preloading mechanism, [ra]_get_new*() return -EAGAIN on memory shortage, so we need to substitute -ENOMEM w/ -EAGAIN on those interface functions. They'll eventually be cleaned up and the translations will go away. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	idr: relocate idr_for_each_entry() and reorganize id[r\|a]_get_new()	Tejun Heo	2013-02-27	2	-61/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Move idr_for_each_entry() definition next to other idr related definitions. * Make id[r\|a]_get_new() inline wrappers of id[r\|a]_get_new_above(). This changes the implementation of idr_get_new() but the new implementation is trivial. This patch doesn't introduce any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	idr: cosmetic updates to struct / initializer definitions	Tejun Heo	2013-02-27	1	-16/+12
\| \| \| \| \| \| \| \| \| \| \| \|	* Tab align fields like a normal person. * Drop the unnecessary 0 inits from IDR_INIT(). This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	idr: deprecate idr_remove_all()	Tejun Heo	2013-02-27	2	-8/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was only one legitimate use of idr_remove_all() and a lot more of incorrect uses (or lack of it). Now that idr_destroy() implies idr_remove_all() and all the in-kernel users updated not to use it, there's no reason to keep it around. Mark it deprecated so that we can later unexport it. idr_remove_all() is made an inline function calling __idr_remove_all() to avoid triggering deprecated warning on EXPORT_SYMBOL(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	cgroup: don't use idr_remove_all()	Tejun Heo	2013-02-27	1	-3/+1
\| \| \| \| \| \| \| \| \| \|	idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	inotify: don't use idr_remove_all()	Tejun Heo	2013-02-27	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \|	idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nfs: idr_destroy() no longer needs idr_remove_all()	Tejun Heo	2013-02-27	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop reference to idr_remove_all(). Note that the code wasn't completely correct before because idr_remove() on all entries doesn't necessarily release all idr_layers which could lead to memory leak. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	dlm: don't use idr_remove_all()	Tejun Heo	2013-02-27	2	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. The conversion isn't completely trivial for recover_idr_clear() as it's the only place in kernel which makes legitimate use of idr_remove_all() w/o idr_destroy(). Replace it with idr_remove() call inside idr_for_each_entry() loop. It goes on top so that it matches the operation order in recover_idr_del(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	dlm: use idr_for_each_entry() in recover_idr_clear() error path	Tejun Heo	2013-02-27	1	-13/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	Convert recover_idr_clear() to use idr_for_each_entry() instead of idr_for_each(). It's somewhat less efficient this way but it shouldn't matter in an error path. This is to help with deprecation of idr_remove_all(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	rpmsg: don't use idr_remove_all()	Tejun Heo	2013-02-27	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ohad Ben-Cohen <ohad@wizery.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	remoteproc: don't use idr_remove_all()	Tejun Heo	2013-02-27	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ohad Ben-Cohen <ohad@wizery.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	dm: don't use idr_remove_all()	Tejun Heo	2013-02-27	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Alasdair Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	drm: don't use idr_remove_all()	Tejun Heo	2013-02-27	7	-11/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. * drm_ctxbitmap_cleanup() was calling idr_remove_all() but forgetting idr_destroy() thus leaking all buffered free idr_layers. Replace it with idr_destroy(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David Airlie <airlied@linux.ie> Cc: Inki Dae <inki.dae@samsung.com> Cc: Joonyoung Shim <jy0922.shim@samsung.com> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	firewire: don't use idr_remove_all()	Tejun Heo	2013-02-27	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	block/loop: don't use idr_remove_all()	Tejun Heo	2013-02-27	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	atm/nicstar: don't use idr_remove_all()	Tejun Heo	2013-02-27	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \|	idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Chas Williams <chas@cmf.nrl.navy.mil> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	idr: make idr_destroy() imply idr_remove_all()	Tejun Heo	2013-02-27	1	-9/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	idr is silly in quite a few ways, one of which is how it's supposed to be destroyed - idr_destroy() doesn't release IDs and doesn't even whine if the idr isn't empty. If the caller forgets idr_remove_all(), it simply leaks memory. Even ida gets this wrong and leaks memory on destruction. There is absoltely no reason not to call idr_remove_all() from idr_destroy(). Nobody is abusing idr_destroy() for shrinking free layer buffer and continues to use idr after idr_destroy(), so it's safe to do remove_all from destroy. In the whole kernel, there is only one place where idr_remove_all() is legitimiately used without following idr_destroy() while there are quite a few places where the caller forgets either idr_remove_all() or idr_destroy() leaking memory. This patch makes idr_destroy() call idr_destroy_all() and updates the function description accordingly. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	idr: fix a subtle bug in idr_get_next()	Tejun Heo	2013-02-27	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The iteration logic of idr_get_next() is borrowed mostly verbatim from idr_for_each(). It walks down the tree looking for the slot matching the current ID. If the matching slot is not found, the ID is incremented by the distance of single slot at the given level and repeats. The implementation assumes that during the whole iteration id is aligned to the layer boundaries of the level closest to the leaf, which is true for all iterations starting from zero or an existing element and thus is fine for idr_for_each(). However, idr_get_next() may be given any point and if the starting id hits in the middle of a non-existent layer, increment to the next layer will end up skipping the same offset into it. For example, an IDR with IDs filled between [64, 127] would look like the following. [ 0 64 ... ] /----/ \| \| \| NULL [ 64 ... 127 ] If idr_get_next() is called with 63 as the starting point, it will try to follow down the pointer from 0. As it is NULL, it will then try to proceed to the next slot in the same level by adding the slot distance at that level which is 64 - making the next try 127. It goes around the loop and finds and returns 127 skipping [64, 126]. Note that this bug also triggers in idr_for_each_entry() loop which deletes during iteration as deletions can make layers go away leaving the iteration with unaligned ID into missing layers. Fix it by ensuring proceeding to the next slot doesn't carry over the unaligned offset - ie. use round_up(id + 1, slot_distance) instead of id += slot_distance. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: David Teigland <teigland@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	block: fix ext_devt_idr handling	Tomas Henzl	2013-02-27	2	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While adding and removing a lot of disks disks and partitions this sometimes shows up: WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted) Hardware name: sysfs: cannot create duplicate filename '/dev/block/259:751' Modules linked in: raid1 autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc scsi_tgt garp stp llc sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 dm_mirror dm_region_hash dm_log power_meter microcode dcdbas serio_raw amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core k10temp bnx2 sg ixgbe dca mdio ext4 mbcache jbd2 dm_round_robin sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi pata_atiixp ahci mptsas mptscsih mptbase scsi_transport_sas dm_multipath dm_mod [last unloaded: scsi_wait_scan] Pid: 44103, comm: async/16 Not tainted 2.6.32-195.el6.x86_64 #1 Call Trace: warn_slowpath_common+0x87/0xc0 warn_slowpath_fmt+0x46/0x50 sysfs_add_one+0xc9/0x130 sysfs_do_create_link+0x12b/0x170 sysfs_create_link+0x13/0x20 device_add+0x317/0x650 idr_get_new+0x13/0x50 add_partition+0x21c/0x390 rescan_partitions+0x32b/0x470 sd_open+0x81/0x1f0 [sd_mod] __blkdev_get+0x1b6/0x3c0 blkdev_get+0x10/0x20 register_disk+0x155/0x170 add_disk+0xa6/0x160 sd_probe_async+0x13b/0x210 [sd_mod] add_wait_queue+0x46/0x60 async_thread+0x102/0x250 default_wake_function+0x0/0x20 async_thread+0x0/0x250 kthread+0x96/0xa0 child_rip+0xa/0x20 kthread+0x0/0xa0 child_rip+0x0/0x20 This most likely happens because dev_t is freed while the number is still used and idr_get_new() is not protected on every use. The fix adds a mutex where it wasn't before and moves the dev_t free function so it is called after device del. Signed-off-by: Tomas Henzl <thenzl@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	kexec: avoid freeing NULL pointer in image_crash_alloc()	Zhang Yanfei	2013-02-27	1	-8/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	Though there is no error if we free a NULL pointer, I think we could avoid this behaviour. Change the code a little in kimage_crash_alloc() could avoid this kind of unnecessary free. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Sasha Levin <sasha.levin@oracle.com> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	kexec: fix memory leak in function kimage_normal_alloc	Zhang Yanfei	2013-02-27	1	-8/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	If kimage_normal_alloc() fails to alloc pages for image->swap_page, it should call kimage_free_page_list() to free allocated pages in image->control_pages list before it frees image. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Sasha Levin <sasha.levin@oracle.com> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	kexec: prevent double free on image allocation failure	Sasha Levin	2013-02-27	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If kimage_normal_alloc() fails to initialize an allocated kimage, it will free the image but would still set 'rimage', as a result kexec_load will try to free it again. This would explode as part of the freeing process is accessing internal members which point to uninitialized memory. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	kexec: export PG_hwpoison flag into vmcoreinfo	Mitsuhiro Tanino	2013-02-27	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch exports a PG_hwpoison into vmcoreinfo when CONFIG_MEMORY_FAILURE is defined. "makedumpfile" needs to read information of memory, such as 'mem_section', 'zone', 'pageflags' from vmcore. We introduce a function into "makedumpfile" to exclude hwpoison page from vmcore dump. In order to introduce this function, PG_hwpoison flag have to export into vmcoreinfo. Signed-off-by: Mitsuhiro Tanino <mitsuhiro.tanino.gm@hitachi.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Mitsuhiro Tanino <mitsuhiro.tanino.gm@hitachi.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	kexec: get rid of duplicate check for hole_end	Zhang Yanfei	2013-02-27	1	-2/+0
\| \| \| \| \| \| \| \| \| \|	hole_end has been checked to make sure it is <= crash_res.end in the while condition check, so the if condition check is duplicate. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	kexec: add the values related to buddy system for filtering free pages.	Atsushi Kumagai	2013-02-27	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tAdd adds the values related to buddy system to vmcoreinfo data so that makedumpfile (dump filtering command) can filter out all free pages with the new logic. It's faster than the current logic because it can distinguish free page by analyzing page structure at the same time as filtering for other unnecessary pages (e.g. anonymous page). OTOH, the current logic has to trace free_list to distinguish free pages while analyzing page structure to filter out other unnecessary pages. The new logic uses the fact that buddy page is marked by _mapcount == PAGE_BUDDY_MAPCOUNT_VALUE. But, _mapcount shares its memory with other fields for SLAB/SLUB when PG_slab is set, so we need to check if PG_slab is set or not before looking up _mapcount value. And we can get the order of buddy system from private field. To sum it up, the values below are required for this logic. Required values: - OFFSET(page._mapcount) - OFFSET(page.private) - NUMBER(PG_slab) - NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE) Changelog from v1 to v2: 1. remove SIZE(pageflags) The new logic was changed after I sent v1 patch. Accordingly, SIZE(pageflags) has been unnecessary for makedumpfile. What's makedumpfile: makedumpfile creates a small dumpfile by excluding unnecessary pages for the analysis. To distinguish unnecessary pages, makedumpfile gets the vmcoreinfo data which has the minimum debugging information only for dump filtering. Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fork: unshare: remove dead code	Alan Cox	2013-02-27	1	-6/+1
\| \| \| \| \| \| \| \| \| \|	If new_nsproxy is set we will always call switch_task_namespaces and then set new_nsproxy back to NULL so the reassignment and fall through check are redundant Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fs/seq_file.c:seq_lseek(): fix switch statement indenting	Andrew Morton	2013-02-27	1	-20/+20
\| \| \| \| \| \| \|	[akpm@linux-foundation.org: checkpatch fixes] Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	seq-file: use SEEK_ macros instead of hardcoded numbers	Cyrill Gorcunov	2013-02-27	1	-2/+2
\| \| \| \| \| \|	Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	coredump: use a freezable_schedule for the coredump_finish wait	Mandeep Singh Baines	2013-02-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prevents hung_task detector from panicing the machine. This is also needed to prevent this wait from blocking suspend. (It doesnt' currently block suspend but it would once the next patch in this series is applied.) [yongjun_wei@trendmicro.com.cn: kernel/exit.c: remove duplicated include] Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Cc: Ben Chan <benchan@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lockdep: check that no locks held at freeze time	Mandeep Singh Baines	2013-02-27	4	-12/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We shouldn't try_to_freeze if locks are held. Holding a lock can cause a deadlock if the lock is later acquired in the suspend or hibernate path (e.g. by dpm). Holding a lock can also cause a deadlock in the case of cgroup_freezer if a lock is held inside a frozen cgroup that is later acquired by a process outside that group. [akpm@linux-foundation.org: export debug_check_no_locks_held] Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Cc: Ben Chan <benchan@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fs/proc/vmcore.c: put if tests in the top of the while loop to reduce ↵	Zhang Yanfei	2013-02-27	1	-13/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	duplication In read_vmcore() two `if' tests are duplicated. Change the position of them could reduce the duplication. This change does not affect the behaviour of the function. [akpm@linux-foundation.org: avoid `if (foo = bar)' thing, use min_t()] [akpm@linux-foundation.org: s/max_t/min_t/] Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fs/proc: clean up printks	Andrew Morton	2013-02-27	7	-46/+39
\| \| \| \| \| \| \| \| \| \| \| \|	- use pr_foo() throughout - remove a couple of duplicated KERN_WARNINGs, via WARN(KERN_WARNING "...") - nuke a few warnings which I've never seen happen, ever. Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	coredump: remove redundant defines for dumpable states	Kees Cook	2013-02-27	5	-13/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The existing SUID_DUMP_* defines duplicate the newer SUID_DUMPABLE_* defines introduced in 54b501992dd2 ("coredump: warn about unsafe suid_dumpable / core_pattern combo"). Remove the new ones, and use the prior values instead. Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Chen Gang <gang.chen@asianux.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	kernel/signal.c: fix suboptimal printk usage	Valdis Kletnieks	2013-02-27	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \|	Several printk's were missing KERN_INFO and KERN_CONT flags. In addition, a printk that was outside a #if/#endif should have been inside, which would result in stray blank line on non-x86 boxes. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	signal: allow to send any siginfo to itself	Andrey Vagin	2013-02-27	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The idea is simple. We need to get the siginfo for each signal on checkpointing dump, and then return it back on restore. The first problem is that the kernel doesn't report complete siginfos to userspace. In a signal handler the kernel strips SI_CODE from siginfo. When a siginfo is received from signalfd, it has a different format with fixed sizes of fields. The interface of signalfd was extended. If a signalfd is created with the flag SFD_RAW, it returns siginfo in a raw format. rt_sigqueueinfo looks suitable for restoring signals, but it can't send siginfo with a positive si_code, because these codes are reserved for the kernel. In the real world each person has right to do anything with himself, so I think a process should able to send any siginfo to itself. This patch: The kernel prevents sending of siginfo with positive si_code, because these codes are reserved for kernel. I think we can allow a task to send such a siginfo to itself. This operation should not be dangerous. This functionality is required for restoring signals in checkpoint/restart. Signed-off-by: Andrey Vagin <avagin@openvz.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Documentation/cgroups/blkio-controller.txt: fix typo	Warren Turkal	2013-02-27	1	-1/+1
\| \| \| \| \| \|	Signed-off-by: Warren Turkal <wt@ooyala.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Documentation/DMA-API-HOWTO.txt: minor grammar corrections	Shuah Khan	2013-02-27	1	-3/+4
\| \| \| \| \| \|	Signed-off-by: Shuah Khan <shuah.khan@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fat: mark fs as dirty on mount and clean on umount	Oleksij Rempel	2013-02-27	3	-0/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no documented methods to mark FAT as dirty. Unofficially MS started to use reserved Byte in boot sector for this purpose, at least since Win 2000. With Win 7 user is warned if fs is dirty and asked to clean it. Different versions of Win, handle it in different ways, but always have same meaning: - Win 2000 and XP, set it on write operations and remove it after operation was finnished - Win 7, set dirty flag on first write and remove it on umount. We will do it as follows: - set dirty flag on mount. If fs was initially dirty, warn user, remember it and do not do any changes to boot sector. - clean it on umount. If fs was initially dirty, leave it dirty. - do not do any thing if fs mounted read-only. - TODO: leave fs dirty if we found some error after mount. Signed-off-by: Oleksij Rempel <bug-track@fisher-privat.net> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fat: add extended fileds to struct fat_boot_sector	Oleksij Rempel	2013-02-27	2	-12/+32
\| \| \| \| \| \| \| \| \|	Later we will need "state" field to check if volume was cleanly unmounted. Signed-off-by: Oleksij Rempel <bug-track@fisher-privat.net> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	hfsplus: fix issue with unzeroed unused b-tree nodes	Vyacheslav Dubeyko	2013-02-27	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The fsck_hfs (under MacOS X) complains about unzeroed unused b-tree nodes after deletion of folders' tree under Linux. SYMPTOMS: Running Disk Utiltiy's "Verify Disk" on "test" gives the following: Verifying volume “Test” Checking file systemChecking Journaled HFS Plus volume. Checking extents overflow file. Checking catalog file. Unused node is not erased (node = 3111) Checking multi-linked files. Checking catalog hierarchy. Checking extended attributes file. Checking volume bitmap. Checking volume information. The volume Test was found corrupt and needs to be repaired. Error: This disk needs to be repaired. Click Repair Disk. REPRODUCING PATH: 1. Prepare HFS+ (non-case sensitive) partition (for example, 5GB) under MacOS X. 2. Copy linux kernel source tree (for example, 3.7-rc6 version) on this partition under MacOS X. 3. Then switch to Linux and mount this prepared partition. 4. Execute `sudo rm -r` under prepared directory with linux kernel source tree. 5. Unmount and boot back into OS X. 6. Open up Disk Utility and verify partition. REPRODUCIBILITY: 100% FIX: It is added code of node clearing in hfs_bnode_put() method for the case when node has flag HFS_BNODE_DELETED. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Kyle Laracey <kalaracey@gmail.com> Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	hfsplus: add support of manipulation by attributes file	Vyacheslav Dubeyko	2013-02-27	13	-183/+287
\| \| \| \| \| \| \| \| \| \| \| \|	Add support of manipulation by attributes file. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	hfsplus: rework functionality of getting, setting and deleting of extended ↵	Vyacheslav Dubeyko	2013-02-27	5	-0/+999
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	attributes Rework functionality of getting, setting and deleting of extended attributes. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	hfsplus: add functionality of manipulating by records in attributes tree	Vyacheslav Dubeyko	2013-02-27	1	-0/+399
\| \| \| \| \| \| \| \| \| \| \| \|	Add functionality of manipulating by records in attributes tree. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	hfsplus: add on-disk layout declarations related to attributes tree	Vyacheslav Dubeyko	2013-02-27	1	-2/+66
\| \| \| \| \| \| \| \| \| \| \| \|	Add all necessary on-disk layout declarations related to attributes file. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	hfsplus: add osx.* prefix for handling namespace of Mac OS X extended attributes	Vyacheslav Dubeyko	2013-02-27	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	hfsplus: reworked support of extended attributes. Current mainline implementation of hfsplus file system driver treats as extended attributes only two fields (fdType and fdCreator) of user_info field in file description record (struct hfsplus_cat_file). It is possible to get or set only these two fields as extended attributes. But HFS+ treats as com.apple.FinderInfo extended attribute an union of user_info and finder_info fields as for file (struct hfsplus_cat_file) as for folder (struct hfsplus_cat_folder). Moreover, current mainline implementation of hfsplus file system driver doesn't support special metadata file - attributes tree. Mac OS X 10.4 and later support extended attributes by making use of the HFS+ filesystem Attributes file B-tree feature which allows for named forks. Mac OS X supports only inline extended attributes, limiting their size to 3802 bytes. Any regular file may have a list of extended attributes. HFS+ supports an arbitrary number of named forks. Each attribute is denoted by a name and the associated data. The name is a null-terminated Unicode string. It is possible to list, to get, to set, and to remove extended attributes from files or directories. It exists some peculiarity during getting of extended attributes list by means of getfattr utility. The getfattr utility expects prefix "user." before any extended attribute's name. So, it ignores any names that don't contained such prefix. Such behavior of getfattr utility results in unexpected empty output of extended attributes list even in the case when file (or folder) contains extended attributes. It needs to use empty string as regular expression pattern for names matching (getfattr --match=""). For support of extended attributes in HFS+: 1. It was added necessary on-disk layout declarations related to Attributes tree into hfsplus_raw.h file. 2. It was added attributes.c file with implementation of functionality of manipulation by records in Attributes tree. 3. It was reworked hfsplus_listxattr, hfsplus_getxattr, hfsplus_setxattr functions in ioctl.c. Moreover, it was added hfsplus_removexattr method. This patch: Add osx. prefix for handling namespace of Mac OS X extended attributes. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lib/scatterlist: use page iterator in the mapping iterator	Imre Deak	2013-02-27	3	-30/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For better code reuse use the newly added page iterator to iterate through the pages. The offset, length within the page is still calculated by the mapping iterator as well as the actual mapping. Idea from Tejun Heo. Signed-off-by: Imre Deak <imre.deak@intel.com> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: James Hogan <james.hogan@imgtec.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	lib/scatterlist: add simple page iterator	Imre Deak	2013-02-27	2	-0/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add an iterator to walk through a scatter list a page at a time starting at a specific page offset. As opposed to the mapping iterator this is meant to be small, performing well even in simple loops like collecting all pages on the scatterlist into an array or setting up an iommu table based on the pages' DMA address. Signed-off-by: Imre Deak <imre.deak@intel.com> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Tested-by: Stephen Warren <swarren@wwwdotorg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	MAINTAINERS: update Tegra section to capture all Tegra files	Stephen Warren	2013-02-27	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The intent is to ensure that all Tegra-related patches are sent to the linux-tegra@ mailing list, so people can keep up-to-date on all misc driver changes. Doing this with a keyword is far simpler and more compact than listing all Tegra-related drivers, even if wildcards were used. Words such as integrate or integrator are common. Ensure the character right before "tegra" isn't a-z (case-insensitive), to make sure the keyword doesn't match those. The only files that the keyword doesn't match are the NVEC driver. Add the linux-tegra mailing list to the NVEC entry to solve this. Signed-off-by: Stephen Warren <swarren@nvidia.com> Cc: Joe Perches <joe@perches.com> Cc: Julian Andres Klode <jak@jak-linux.org> Cc: Marc Dietrich <marvin24@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>