op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message	Author	Age	Files	Lines
*	ceph: switch to ->read_iter()	Al Viro	2014-05-06	1	-11/+7
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	nfs: switch to ->read_iter()	Al Viro	2014-05-06	3	-14/+9
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fs/block_dev.c: switch to ->read_iter()	Al Viro	2014-05-06	1	-10/+6
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	shmem: switch to ->read_iter()	Al Viro	2014-05-06	1	-10/+5
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	pipe: switch to ->read_iter()	Al Viro	2014-05-06	1	-11/+5
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	cifs: switch to ->read_iter()	Al Viro	2014-05-06	3	-32/+22
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fuse_file_aio_read(): convert to ->read_iter()	Al Viro	2014-05-06	1	-6/+5
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	ocfs2: switch to ->read_iter()	Al Viro	2014-05-06	1	-11/+10
\| \| \| \| \| \|	tracepoints are evil, exhibit #6969... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	ecryptfs: switch to ->read_iter()	Al Viro	2014-05-06	1	-5/+4
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	xfs: switch to ->read_iter()	Al Viro	2014-05-06	1	-12/+7
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	switch simple generic_file_aio_read() users to ->read_iter()	Al Viro	2014-05-06	35	-75/+75
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	new methods: ->read_iter() and ->write_iter()	Al Viro	2014-05-06	7	-13/+121
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Beginning to introduce those. Just the callers for now, and it's clumsier than it'll eventually become; once we finish converting aio_read and aio_write instances, the things will get nicer. For now, these guys are in parallel to ->aio_read() and ->aio_write(); they take iocb and iov_iter, with everything in iov_iter already validated. File offset is passed in iocb->ki_pos, iov/nr_segs - in iov_iter. Main concerns in that series are stack footprint and ability to split the damn thing cleanly. [fix from Peter Ujfalusi <peter.ujfalusi@ti.com> folded] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	replace checking for ->read/->aio_read presence with check in ->f_mode	Al Viro	2014-05-06	6	-11/+23
\| \| \| \| \| \| \| \| \| \| \|	Since we are about to introduce new methods (read_iter/write_iter), the tests in a bunch of places would have to grow inconveniently. Check once (at open() time) and store results in ->f_mode as FMODE_CAN_READ and FMODE_CAN_WRITE resp. It might end up being a temporary measure - once everything switches from ->aio_{read,write} to ->{read,write}_iter it might make sense to return to open-coded checks. We'll see... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
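The open-time check described in this message can be sketched in userspace C. This is only a model under stated assumptions: the `model_*` structs, the helper names and the flag values are hypothetical and chosen for illustration; only the idea (probe the method pointers once, then test a bit in `f_mode`) comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values only; the kernel's real values may differ. */
#define MODEL_FMODE_READ      0x1u
#define MODEL_FMODE_WRITE     0x2u
#define MODEL_FMODE_CAN_READ  0x4u
#define MODEL_FMODE_CAN_WRITE 0x8u

struct model_file;

/* Hypothetical stand-in for the file method table. */
struct model_fops {
    long (*read)(struct model_file *);
    long (*aio_read)(struct model_file *);
    long (*write)(struct model_file *);
    long (*aio_write)(struct model_file *);
};

struct model_file {
    unsigned int f_mode;
    const struct model_fops *f_op;
};

/* Stub method so an example table has something to point at. */
static long model_stub_io(struct model_file *f)
{
    (void)f;
    return 0;
}

/* Done once at open() time: record whether any read/write method exists. */
static void model_open_fixup(struct model_file *f)
{
    if ((f->f_mode & MODEL_FMODE_READ) &&
        (f->f_op->read || f->f_op->aio_read))
        f->f_mode |= MODEL_FMODE_CAN_READ;
    if ((f->f_mode & MODEL_FMODE_WRITE) &&
        (f->f_op->write || f->f_op->aio_write))
        f->f_mode |= MODEL_FMODE_CAN_WRITE;
}

/* Later callers test one bit instead of probing every method pointer. */
static bool model_can_read(const struct model_file *f)
{
    return (f->f_mode & MODEL_FMODE_CAN_READ) != 0;
}

static bool model_can_write(const struct model_file *f)
{
    return (f->f_mode & MODEL_FMODE_CAN_WRITE) != 0;
}
```

Adding a further method (such as the `read_iter` the message anticipates) would then only touch `model_open_fixup()` in this model, not every call site.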
*	xfs: trim the argument lists of xfs_file_{dio,buffered}_aio_write()	Al Viro	2014-05-06	1	-19/+14
\| \| \| \| \| \| \|	pos is redundant (it's iocb->ki_pos), and iov/nr_segs/count are taken care of by lifting iov_iter into the caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	blkdev_aio_read(): switch to generic_file_read_iter(), get rid of iov_shorten()	Al Viro	2014-05-06	1	-3/+6
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	iov_iter_truncate()	Al Viro	2014-05-06	7	-42/+40
\| \| \| \| \| \| \| \| \| \| \| \|	Now It Can Be Done(tm) - we don't need to do iov_shorten() in generic_file_direct_write() anymore, now that all ->direct_IO() instances are converted to proper iov_iter methods and honour iter->count and iter->iov_offset properly. Get rid of count/ocount arguments of generic_file_direct_write(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
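A minimal userspace sketch of what truncating an iterator amounts to, assuming the iterator tracks its remaining byte count in a `count` field (as the message's mention of `iter->count` suggests). The `model_iter` struct is hypothetical and omits the segment array entirely.

```c
#include <stddef.h>

/* Hypothetical, stripped-down iterator: only the remaining byte count. */
struct model_iter {
    size_t count;
};

/* Clamp the remaining length to at most `count`; never grows the iterator. */
static void model_iter_truncate(struct model_iter *i, size_t count)
{
    if (i->count > count)
        i->count = count;
}
```

Because only the count changes, the segment array itself is left untouched, which is what makes a separate shortening pass over the iovec unnecessary in this model.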
*	btrfs: switch check_direct_IO() to iov_iter	Al Viro	2014-05-06	1	-25/+15
\| \| \| \| \| \|	... and don't open-code iov_iter_alignment() there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	new helper: iov_iter_get_pages_alloc()	Al Viro	2014-05-06	4	-257/+167
\| \| \| \| \| \| \| \|	same as iov_iter_get_pages(), except that pages array is allocated (kmalloc if possible, vmalloc if that fails) and left for caller to free. Lustre and NFS ->direct_IO() switched to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	new helper: iov_iter_npages()	Al Viro	2014-05-06	4	-22/+31
\| \| \| \| \| \| \| \|	counts the pages covered by iov_iter, up to given limit. do_block_direct_io() and fuse_iter_npages() switched to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
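The counting described here can be modelled in userspace as below. This is a sketch, not the kernel helper: the `model_segment` struct, the 4096-byte page size and the choice to count pages per segment (so a page shared by two segments counts twice) are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical segment: a start address and a length in bytes. */
struct model_segment {
    uintptr_t base;
    size_t len;
};

/* Count pages touched by the segments, stopping once maxpages is reached. */
static int model_npages(const struct model_segment *seg, size_t nr_segs,
                        int maxpages)
{
    int npages = 0;

    for (size_t i = 0; i < nr_segs && npages < maxpages; i++) {
        if (seg[i].len == 0)
            continue;
        uintptr_t first = seg[i].base / MODEL_PAGE_SIZE;
        uintptr_t last = (seg[i].base + seg[i].len - 1) / MODEL_PAGE_SIZE;
        uintptr_t span = last - first + 1;

        /* Cap at the limit rather than overshooting it. */
        if (span >= (uintptr_t)(maxpages - npages))
            return maxpages;
        npages += (int)span;
    }
    return npages;
}
```

A page-aligned 4096-byte segment counts as one page, while the same length starting mid-page straddles two.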
*	f2fs: switch to iov_iter_alignment()	Al Viro	2014-05-06	1	-6/+5
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fuse: switch to iov_iter_get_pages()	Al Viro	2014-05-06	1	-21/+12
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fuse: pull iov_iter initializations up	Al Viro	2014-05-06	3	-30/+38
\| \| \| \| \| \| \|	... to fuse_direct_{read,write}(). ->direct_IO() path uses the iov_iter passed by the caller instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	new helper: iov_iter_get_pages()	Al Viro	2014-05-06	3	-73/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	iov_iter_get_pages(iter, pages, maxsize, &start) grabs references pinning the pages of up to maxsize of (contiguous) data from iter. Returns the amount of memory grabbed or -error. In case of success, the requested area begins at offset start in pages[0] and runs through pages[1], etc. Less than requested amount might be returned - either because the contiguous area in the beginning of iterator is smaller than requested, or because the kernel failed to pin that many pages. direct-io.c switched to using iov_iter_get_pages() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
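The return convention described above (a byte count plus an offset `start` into `pages[0]`) implies some arithmetic on the caller's side. The sketch below models only that arithmetic in userspace; the helper names and the 4096-byte page size are assumptions, and no pages are actually pinned.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size */

/* Number of pages[] entries covered by `bytes` of data beginning at
 * offset `start` within pages[0]. */
static size_t model_pages_spanned(size_t start, size_t bytes)
{
    if (bytes == 0)
        return 0;
    return (start + bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* Bytes of the returned area that fall inside pages[idx]. */
static size_t model_chunk_len(size_t start, size_t bytes, size_t idx)
{
    size_t lo = idx * MODEL_PAGE_SIZE;
    size_t hi = lo + MODEL_PAGE_SIZE;

    if (lo < start)
        lo = start;
    if (hi > start + bytes)
        hi = start + bytes;
    return hi > lo ? hi - lo : 0;
}
```

For example, 5000 bytes starting at offset 100 cover two pages: 3996 bytes in `pages[0]` and the remaining 1004 in `pages[1]`.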
*	dio: take updating ->result into do_direct_IO()	Al Viro	2014-05-06	1	-4/+2
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	start adding the tag to iov_iter	Al Viro	2014-05-06	15	-35/+41
\| \| \| \| \| \| \| \| \|	For now, just use the same thing we pass to ->direct_IO() - it's all iovec-based at the moment. Pass it explicitly to iov_iter_init() and account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO() uses. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	new helper: generic_file_read_iter()	Al Viro	2014-05-06	4	-48/+37
\| \| \| \| \| \| \| \| \|	iov_iter-using variant of generic_file_aio_read(). Some callers converted. Note that it's still not quite there for use as ->read_iter() - we depend on having zero iter->iov_offset in O_DIRECT case. Fortunately, that's true for all converted callers (and for generic_file_aio_read() itself). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fuse_file_aio_write(): merge initializations of iov_iter	Al Viro	2014-05-06	1	-2/+1
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	ceph_aio_read(): keep iov_iter across retries	Al Viro	2014-05-06	1	-6/+8
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	new primitive: iov_iter_alignment()	Al Viro	2014-05-06	4	-27/+34
\| \| \| \| \| \| \|	returns the value aligned as badly as the worst remaining segment in iov_iter is. Use instead of open-coded equivalents. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
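One way to obtain a value "aligned as badly as the worst remaining segment" is to OR together every segment's address and length; a caller can then mask the result against its block size. The sketch below is a userspace model under that assumption, with a hypothetical `model_segment` struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment: a start address and a length in bytes. */
struct model_segment {
    uintptr_t base;
    size_t len;
};

/* OR of all addresses and lengths: a low bit is set in the result if it
 * is set in any segment's base or length. */
static uintptr_t model_alignment(const struct model_segment *seg,
                                 size_t nr_segs)
{
    uintptr_t res = 0;

    for (size_t i = 0; i < nr_segs; i++)
        res |= seg[i].base | (uintptr_t)seg[i].len;
    return res;
}
```

A caller requiring 512-byte alignment would check `model_alignment(seg, n) & 511`; a non-zero result means at least one segment is misaligned.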
*	give ->direct_IO() a copy of iov_iter	Al Viro	2014-05-06	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \|	the thing is, we want to advance what's given to ->direct_IO() as we are forming the request; however, the callers care about the amount of data actually transferred, not the amount we tried to transfer. It's more convenient to allow ->direct_IO() instances to use iov_iter_advance() on the copy of iov_iter, leaving the actual advancing of the original to the caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
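The division of labour described in this message can be modelled in userspace as follows. Everything here is hypothetical: `model_iter` keeps only byte counters, and `model_direct_io()` stands in for a `->direct_IO()` instance that attempts the whole request but transfers at most `limit` bytes.

```c
#include <stddef.h>

/* Hypothetical iterator: bytes remaining and bytes consumed so far. */
struct model_iter {
    size_t count;
    size_t done;
};

static void model_iter_advance(struct model_iter *i, size_t n)
{
    if (n > i->count)
        n = i->count;
    i->count -= n;
    i->done += n;
}

/* Stand-in for ->direct_IO(): advances its iterator while forming the
 * request, yet may transfer less than it attempted. */
static long model_direct_io(struct model_iter *it, size_t limit)
{
    size_t attempted = it->count;

    model_iter_advance(it, attempted);
    return (long)(attempted < limit ? attempted : limit);
}

/* The caller hands over a copy and advances the original only by the
 * amount actually transferred. */
static long model_caller(struct model_iter *orig, size_t limit)
{
    struct model_iter copy = *orig;
    long transferred = model_direct_io(&copy, limit);

    if (transferred > 0)
        model_iter_advance(orig, (size_t)transferred);
    return transferred;
}
```

After a short transfer, the original iterator still describes the untransferred tail, so the caller can retry or fall back without reconstructing it.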
*	switch {__,}blockdev_direct_IO() to iov_iter	Al Viro	2014-05-06	19	-60/+49
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	get rid of pointless iov_length() in ->direct_IO()	Al Viro	2014-05-06	18	-30/+32
\| \| \| \| \| \|	all callers have iov_length(iter->iov, iter->nr_segs) == iov_iter_count(iter) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	ext4: switch the guts of ->direct_IO() to iov_iter	Al Viro	2014-05-06	3	-18/+15
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	convert the guts of nfs_direct_IO() to iov_iter	Al Viro	2014-05-06	3	-33/+35
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	pass iov_iter to ->direct_IO()	Al Viro	2014-05-06	30	-126/+117
\| \| \| \| \| \|	unmodified, for now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	kill generic_segment_checks()	Al Viro	2014-05-06	10	-104/+16
\| \| \| \| \| \| \| \|	all callers of ->aio_read() and ->aio_write() have iov/nr_segs already checked - generic_segment_checks() done after that is just an odd way to spell iov_length(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	__btrfs_direct_write(): switch to iov_iter	Al Viro	2014-05-06	1	-11/+8
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	generic_file_direct_write(): switch to iov_iter	Al Viro	2014-05-06	6	-23/+19
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	kill iov_iter_copy_from_user()	Al Viro	2014-05-06	5	-40/+5
\| \| \| \| \| \| \|	all callers can use copy_page_from_iter() and it actually simplifies them. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fs/file.c: don't open-code kvfree()	Al Viro	2014-05-06	1	-8/+3
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Merge branch 'akpm' (incoming from Andrew)	Linus Torvalds	2014-05-06	19	-91/+147
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge misc fixes from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: agp: info leak in agpioc_info_wrap() fs/affs/super.c: bugfix / double free fanotify: fix -EOVERFLOW with large files on 64-bit slub: use sysfs'es release mechanism for kmem_cache revert "mm: vmscan: do not swap anon pages just because free+file is low" autofs: fix lockref lookup mm: filemap: update find_get_pages_tag() to deal with shadow entries mm/compaction: make isolate_freepages start at pageblock boundary MAINTAINERS: zswap/zbud: change maintainer email address mm/page-writeback.c: fix divide by zero in pos_ratio_polynom hugetlb: ensure hugepage access is denied if hugepages are not supported slub: fix memcg_propagate_slab_attrs drivers/rtc/rtc-pcf8523.c: fix month definition
\| *	agp: info leak in agpioc_info_wrap()	Dan Carpenter	2014-05-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On 64 bit systems the agp_info struct has a 4 byte hole between ->agp_mode and ->aper_base. We need to clear it to avoid disclosing stack information to userspace. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
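The usual remedy for this kind of leak is to zero the whole structure before filling in its fields, so that padding bytes cannot carry stale stack contents. The sketch below illustrates the pattern with a hypothetical two-field struct; the real `agp_info` layout has more members and is not reproduced here.

```c
#include <string.h>

/* Hypothetical layout: on typical LP64 targets the compiler inserts a
 * 4-byte hole between the int and the unsigned long. */
struct model_agp_info {
    int agp_mode;
    unsigned long aper_base;
};

/* Clear the whole object first, padding included, then set the fields. */
static void model_fill_info(struct model_agp_info *out, int mode,
                            unsigned long base)
{
    memset(out, 0, sizeof(*out));
    out->agp_mode = mode;
    out->aper_base = base;
}
```

Assigning the members one by one without the `memset()` would leave whatever was previously on the stack in the hole, and copying the struct to userspace would expose it.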
\| *	fs/affs/super.c: bugfix / double free	Fabian Frederick	2014-05-06	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 842a859db26b ("affs: use ->kill_sb() to simplify ->put_super() and failure exits of ->mount()") adds .kill_sb which frees sbi but doesn't remove sbi free in case of parse_options error causing double free+random crash. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> [3.14.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| *	fanotify: fix -EOVERFLOW with large files on 64-bit	Will Woods	2014-05-06	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On 64-bit systems, O_LARGEFILE is automatically added to flags inside the open() syscall (also openat(), blkdev_open(), etc). Userspace therefore defines O_LARGEFILE to be 0 - you can use it, but it's a no-op. Everything should be O_LARGEFILE by default. But: when fanotify does create_fd() it uses dentry_open(), which skips all that. And userspace can't set O_LARGEFILE in fanotify_init() because it's defined to 0. So if fanotify gets an event regarding a large file, the read() will just fail with -EOVERFLOW. This patch adds O_LARGEFILE to fanotify_init()'s event_f_flags on 64-bit systems, using the same test as open()/openat()/etc. Addresses https://bugzilla.redhat.com/show_bug.cgi?id=696821 Signed-off-by: Will Woods <wwoods@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
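The fix described above amounts to OR-ing the large-file flag into the event flags whenever the platform's native word size makes large files the default. A userspace model, with assumed names and an assumed flag value (the kernel-internal value varies by architecture):

```c
#include <limits.h>

/* Assumed value for illustration; userspace on 64-bit sees 0 instead. */
#define MODEL_O_LARGEFILE 0100000u

/* Model of the "same test as open()": true unless long is 32 bits wide. */
static int model_force_o_largefile(void)
{
    return sizeof(long) * CHAR_BIT != 32;
}

/* Apply the implicit flag to the event_f_flags requested by the caller. */
static unsigned int model_event_f_flags(unsigned int requested)
{
    if (model_force_o_largefile())
        requested |= MODEL_O_LARGEFILE;
    return requested;
}
```

On a 64-bit build, `model_event_f_flags(0)` has the large-file bit set even though the caller could not request it explicitly.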
\| *	slub: use sysfs'es release mechanism for kmem_cache	Christoph Lameter	2014-05-06	4	-24/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	debugobjects warning during netfilter exit: ------------[ cut here ]------------ WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0() ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 Modules linked in: CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G W 3.11.0-next-20130906-sasha #3984 Workqueue: netns cleanup_net Call Trace: dump_stack+0x52/0x87 warn_slowpath_common+0x8c/0xc0 warn_slowpath_fmt+0x46/0x50 debug_print_object+0x8d/0xb0 __debug_check_no_obj_freed+0xa5/0x220 debug_check_no_obj_freed+0x15/0x20 kmem_cache_free+0x197/0x340 kmem_cache_destroy+0x86/0xe0 nf_conntrack_cleanup_net_list+0x131/0x170 nf_conntrack_pernet_exit+0x5d/0x70 ops_exit_list+0x5e/0x70 cleanup_net+0xfb/0x1c0 process_one_work+0x338/0x550 worker_thread+0x215/0x350 kthread+0xe7/0xf0 ret_from_fork+0x7c/0xb0 Also during dcookie cleanup: WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0() ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 Modules linked in: CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408 Call Trace: dump_stack (lib/dump_stack.c:52) warn_slowpath_common (kernel/panic.c:430) warn_slowpath_fmt (kernel/panic.c:445) debug_print_object (lib/debugobjects.c:262) __debug_check_no_obj_freed (lib/debugobjects.c:697) debug_check_no_obj_freed (lib/debugobjects.c:726) kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717) kmem_cache_destroy (mm/slab_common.c:363) 
dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343) event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153) __fput (fs/file_table.c:217) ____fput (fs/file_table.c:253) task_work_run (kernel/task_work.c:125 (discriminator 1)) do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751) int_signal (arch/x86/kernel/entry_64.S:807) Sysfs has a release mechanism. Use that to release the kmem_cache structure if CONFIG_SYSFS is enabled. Only slub is changed - slab currently only supports /proc/slabinfo and not /sys/kernel/slab/*. We talked about adding that and someone was working on it. [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build] [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more] Signed-off-by: Christoph Lameter <cl@linux.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Greg KH <greg@kroah.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Pekka Enberg <penberg@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| *	revert "mm: vmscan: do not swap anon pages just because free+file is low"	Johannes Weiner	2014-05-06	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 0bf1457f0cfc ("mm: vmscan: do not swap anon pages just because free+file is low") because it introduced a regression in mostly-anonymous workloads, where reclaim would become ineffective and trap every allocating task in direct reclaim. The problem is that there is a runaway feedback loop in the scan balance between file and anon, where the balance tips heavily towards a tiny thrashing file LRU and anonymous pages are no longer being looked at. The commit in question removed the safe guard that would detect such situations and respond with forced anonymous reclaim. This commit was part of a series to fix premature swapping in loads with relatively little cache, and while it made a small difference, the cure is obviously worse than the disease. Revert it. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| *	autofs: fix lockref lookup	Ian Kent	2014-05-06	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	autofs needs to be able to see private data dentry flags for its dentrys that are being created but not yet hashed and for its dentrys that have been rmdir()ed but not yet freed. It needs to do this so it can block processes in these states until a status has been returned to indicate the given operation is complete. It does this by keeping two lists, active and expiring, of dentrys in this state and uses ->d_release() to keep them stable while it checks the reference count to determine if they should be used. But with the recent lockref changes, dentrys being freed sometimes don't transition to a reference count of 0 before being freed, so autofs can occasionally use a dentry that is invalid, which can lead to a panic. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| *	mm: filemap: update find_get_pages_tag() to deal with shadow entries	Johannes Weiner	2014-05-06	3	-37/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dave Jones reports the following crash when find_get_pages_tag() runs into an exceptional entry: kernel BUG at mm/filemap.c:1347! RIP: find_get_pages_tag+0x1cb/0x220 Call Trace: find_get_pages_tag+0x36/0x220 pagevec_lookup_tag+0x21/0x30 filemap_fdatawait_range+0xbe/0x1e0 filemap_fdatawait+0x27/0x30 sync_inodes_sb+0x204/0x2a0 sync_inodes_one_sb+0x19/0x20 iterate_supers+0xb2/0x110 sys_sync+0x44/0xb0 ia32_do_call+0x13/0x13 1343 /* 1344 * This function is never used on a shmem/tmpfs 1345 * mapping, so a swap entry won't be found here. 1346 / 1347 BUG(); After commit 0cd6144aadd2 ("mm + fs: prepare for non-page entries in page cache radix trees") this comment and BUG() are out of date because exceptional entries can now appear in all mappings - as shadows of recently evicted pages. However, as Hugh Dickins notes, "it is truly surprising for a PAGECACHE_TAG_WRITEBACK (and probably any other PAGECACHE_TAG_) to appear on an exceptional entry. I expect it comes down to an occasional race in RCU lookup of the radix_tree: lacking absolute synchronization, we might sometimes catch an exceptional entry, with the tag which really belongs with the unexceptional entry which was there an instant before." And indeed, not only is the tree walk lockless, the tags are also read in chunks, one radix tree node at a time. There is plenty of time for page reclaim to swoop in and replace a page that was already looked up as tagged with a shadow entry. Remove the BUG() and update the comment. 
While reviewing all other lookup sites for whether they properly deal with shadow entries of evicted pages, update all the comments and fix memcg file charge moving to not miss shmem/tmpfs swapcache pages. Fixes: 0cd6144aadd2 ("mm + fs: prepare for non-page entries in page cache radix trees") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Dave Jones <davej@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| *	mm/compaction: make isolate_freepages start at pageblock boundary	Vlastimil Babka	2014-05-06	1	-10/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The compaction freepage scanner implementation in isolate_freepages() starts by taking the current cc->free_pfn value as the first pfn. In a for loop, it scans from this first pfn to the end of the pageblock, and then subtracts pageblock_nr_pages from the first pfn to obtain the first pfn for the next for loop iteration. This means that when cc->free_pfn starts at offset X rather than being aligned on pageblock boundary, the scanner will start at offset X in all scanned pageblock, ignoring potentially many free pages. Currently this can happen when a) zone's end pfn is not pageblock aligned, or b) through zone->compact_cached_free_pfn with CONFIG_HOLES_IN_ZONE enabled and a hole spanning the beginning of a pageblock This patch fixes the problem by aligning the initial pfn in isolate_freepages() to pageblock boundary. This also permits replacing the end-of-pageblock alignment within the for loop with a simple pageblock_nr_pages increment. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Heesub Shin <heesub.shin@samsung.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Dongjun Shin <d.j.shin@samsung.com> Cc: Sunghwan Yun <sunghwan.yun@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
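The alignment step described in this message can be illustrated with plain integer arithmetic. The sketch below is a userspace model with hypothetical helper names; it assumes the pageblock size is a power of two, so rounding down is a single mask.

```c
#include <assert.h>

/* Round pfn down to the first pfn of its pageblock.
 * nr_pages is the pageblock size and must be a power of two. */
static unsigned long model_pageblock_start(unsigned long pfn,
                                           unsigned long nr_pages)
{
    assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);
    return pfn & ~(nr_pages - 1);
}

/* Pages visited when scanning from start_pfn to the end of its block:
 * an unaligned start skips everything before it in that block. */
static unsigned long model_pages_scanned(unsigned long start_pfn,
                                         unsigned long nr_pages)
{
    unsigned long block_end =
        model_pageblock_start(start_pfn, nr_pages) + nr_pages;

    return block_end - start_pfn;
}
```

With 512-page pageblocks, a scan starting at pfn 1000 covers only 24 pages of its block, whereas starting from the aligned pfn 512 covers all 512; repeating the unaligned offset in every block is the loss the message describes.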
\| *	MAINTAINERS: zswap/zbud: change maintainer email address	Seth Jennings	2014-05-06	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sjenning@linux.vnet.ibm.com is no longer a viable entity. Signed-off-by: Seth Jennings <sjennings@variantweb.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>