op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	dma: mv_xor: add Device Tree binding	Thomas Petazzoni	2012-11-20	2	-4/+94
\| \| \| \| \| \| \| \| \| \| \|	This patch finally adds a Device Tree binding to the mv_xor driver. Thanks to the previous cleanup patches, the Device Tree binding is relatively simply: one DT node per XOR engine, with sub-nodes for each XOR channel of the XOR engine. The binding obviously comes with the necessary documentation. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: devicetree-discuss@lists.ozlabs.org
*	dma: mv_xor: add missing free_irq() call	Thomas Petazzoni	2012-11-20	2	-1/+5
\| \| \| \| \| \| \|	Even though the driver cannot be unloaded at the moment, it is still good to properly free the IRQ handlers in the channel removal function. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: remove the pool_size from platform_data	Thomas Petazzoni	2012-11-20	4	-26/+9
\| \| \| \| \| \| \| \| \|	The pool_size is always PAGE_SIZE, and since it is a software configuration paramter (and not a hardware description parameter), we cannot make it part of the Device Tree binding, so we'd better remove it from the platform_data as well. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: remove hw_id field from platform_data	Thomas Petazzoni	2012-11-20	3	-8/+3
\| \| \| \| \| \| \| \|	There is no need for the platform_data to give this ID, it is simply the channel number, so we can compute it inside the driver when registering the channels. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: remove useless backpointer from mv_xor_chan to mv_xor_device	Thomas Petazzoni	2012-11-20	2	-2/+0
\| \| \| \| \| \| \|	The backpointer from mv_xor_chan to mv_xor_device is now useless, get rid of it. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: rename mv_xor_private to mv_xor_device	Thomas Petazzoni	2012-11-20	2	-35/+35
\| \| \| \| \| \| \| \| \| \| \| \| \|	Now that mv_xor_device is no longer used to designate the per-channel DMA devices, use it know to designate the XOR engine themselves (currently composed of two XOR channels). So, now we have the nice organization where: - mv_xor_device represents each XOR engine in the system - mv_xor_chan represents each XOR channel of a given XOR engine Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: merge mv_xor_device and mv_xor_chan	Thomas Petazzoni	2012-11-20	2	-55/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Even though the DMA engine infrastructure has support for multiple channels per device, the mv_xor driver registers one DMA engine device for each channel, because the mv_xor channels inside the same XOR engine have different capabilities, and the DMA engine infrastructure only allows to express capabilities at the DMA engine device level. The mv_xor driver has therefore been registering one DMA engine device and one DMA engine channel for each XOR channel since its introduction in the kernel. However, it kept two separate internal structures, mv_xor_device and mv_xor_channel, which didn't make a lot of sense since there was a 1:1 mapping between those structures. This patch gets rid of this duplication, and merges everything into the mv_xor_chan structure. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: use mv_xor_chan pointers as arguments to self-test functions	Thomas Petazzoni	2012-11-20	1	-11/+6
\| \| \| \| \| \| \| \| \| \|	In preparation for the removal of the mv_xor_device structure, we directly pass mv_xor_chan pointers to the self-test functions included in the driver. These functions were anyway selecting the first (and only channel) available in each DMA device, so the behaviour is unchanged. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: in mv_xor_device, rename 'common' to 'dmadev'	Thomas Petazzoni	2012-11-20	2	-8/+8
\| \| \| \| \| \| \| \| \|	The mv_xor_device structure embeds a 'struct dma_device', which is named 'common', a not very meaningful name. Rename it to 'dmadev', which will help avoid confusions later as we merge the mv_xor_device and mv_xor_chan structures together. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: in mv_xor_chan, rename 'common' to 'dmachan'	Thomas Petazzoni	2012-11-20	2	-7/+7
\| \| \| \| \| \| \| \| \|	The mv_xor_chan structure embeds a 'struct dma_chan', which is named 'common', a not very meaningful name. Rename it to 'dmachan', which will help avoid confusions later as we merge the mv_xor_device and mv_xor_chan structures together. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: get rid of the pdev pointer in mv_xor_device	Thomas Petazzoni	2012-11-20	2	-6/+4
\| \| \| \| \| \| \|	It was only used in places where we could get the 'struct device *' pointer through a different way. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: introduce a mv_chan_to_devp() helper	Thomas Petazzoni	2012-11-20	1	-29/+32
\| \| \| \| \| \| \| \| \|	In many place, we need to get the 'struct device ' pointer from a 'struct mv_chan ', so we add a helper that makes this a bit easier. It will also help reducing the change noise in further patches. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: simplify dma_sync_single_for_cpu() calls	Thomas Petazzoni	2012-11-20	1	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \|	In mv_xor_memcpy_self_test() and mv_xor_xor_self_test(), all DMA functions are called by passing dma_chan->device->dev as the 'device *', except the calls to dma_sync_single_for_cpu() which uselessly goes through mv_chan->device->pdev->dev. Simplify this by using dma_chan->device->dev direclty in dma_sync_single_for_cpu() calls. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: remove unused to_mv_xor_device() macro	Thomas Petazzoni	2012-11-20	1	-3/+0
\| \| \| \| \| \| \|	The to_mv_xor_device() macro is not being used by the driver, so we can get rid of it. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: remove unused id field in mv_xor_device structure	Thomas Petazzoni	2012-11-20	2	-3/+0
\| \| \| \|	Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: rename many symbols to remove the 'shared' word	Thomas Petazzoni	2012-11-20	2	-14/+14
\| \| \| \| \| \| \|	The 'shared' word no longer makes sense in a number of places as we renamed the 'mv_xor_shared' driver to 'mv_xor'. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: change the driver name to 'mv_xor'	Thomas Petazzoni	2012-11-20	6	-12/+14
\| \| \| \| \| \| \| \|	Since we got rid of the per-XOR channel 'mv_xor' driver, now the per-XOR engine driver that used to be called 'mv_xor_shared' can simply be named 'mv_xor'. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: rename mv_xor_shared_platform_data to mv_xor_platform_data	Thomas Petazzoni	2012-11-20	3	-4/+4
\| \| \| \| \| \| \| \| \|	'struct mv_xor_shared_platform_data' used to be the platform_data structure for the 'mv_xor_shared', but this driver is going to be renamed simply 'mv_xor', so also rename its platform_data structure accordingly. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: rename mv_xor_platform_data to mv_xor_channel_data	Thomas Petazzoni	2012-11-20	3	-17/+17
\| \| \| \| \| \| \| \| \| \|	mv_xor_platform_data used to be the platform_data structure associated to the 'mv_xor' driver. This driver no longer exists, and this data structure really contains the properties of each XOR channel part of a given XOR engine. Therefore 'struct mv_xor_channel_data' is a more appropriate name. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: remove 'shared' from mv_xor_platform_data	Thomas Petazzoni	2012-11-20	2	-9/+0
\| \| \| \| \| \| \|	This member of the platform_data structure is no longer used, so get rid of it. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: remove sub-driver 'mv_xor'	Thomas Petazzoni	2012-11-20	2	-49/+1
\| \| \| \| \| \| \| \|	Now that XOR channels are directly registered by the main 'mv_xor_shared' device ->probe() function and all users of the 'mv_xor' device have been removed, we can get rid of the latter. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	arm: plat-orion: remove unused orion_xor_init_channels()	Thomas Petazzoni	2012-11-20	1	-20/+0
\| \| \| \| \| \| \|	Now that xor0 and xor1 are registered in a single driver manner, the orion_xor_init_channels() function has become useless. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	arm: plat-orion: convert the registration of the xor1 engine to the single ↵	Thomas Petazzoni	2012-11-20	1	-52/+42
\| \| \| \| \| \| \| \| \| \|	driver Instead of registering one 'mv_xor_shared' device for the XOR engine, and then two 'mv_xor' devices for the XOR channels, pass the channels properties as platform_data for the main 'mv_xor_shared' device. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	arm: plat-orion: convert the registration of the xor0 engine to the single ↵	Thomas Petazzoni	2012-11-20	1	-52/+42
\| \| \| \| \| \| \| \| \| \|	driver Instead of registering one 'mv_xor_shared' device for the XOR engine, and then two 'mv_xor' devices for the XOR channels, pass the channels properties as platform_data for the main 'mv_xor_shared' device. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: allow channels to be registered directly from the main device	Thomas Petazzoni	2012-11-20	3	-3/+53
\| \| \| \| \| \| \| \| \| \| \|	Extend the XOR engine driver (currently called "mv_xor_shared") so that XOR channels can be passed in the platform_data structure, and be registered from there. This will allow the users of the driver to be converted to the single platform_driver variant of the mv_xor driver. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: split initialization/cleanup of XOR channels	Thomas Petazzoni	2012-11-20	1	-27/+48
\| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of doing the initialization/cleanup of the XOR channels directly in the ->probe() and ->remove() hooks, we create separate utility functions mv_xor_channel_add() and mv_xor_channel_remove(). This will allow to easily introduce in a future patch a different way of registering XOR channels: instead of having one platform_device per channel, we'll trigger the registration of all XOR channels of a given XOR engine directly from the XOR engine ->probe() function. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: do not use pool_size from platform_data within the driver	Thomas Petazzoni	2012-11-20	2	-6/+5
\| \| \| \| \| \| \| \| \| \| \|	The driver currently pokes into the platform_data structure during its normal operation to get the pool_size value. Poking into the platform_data structure is not nice when moving to the Device Tree, so this commit adds a new pool_size field in the mv_xor_device structure, which gets initialized at ->probe() time. The driver then uses this field instead of the platform_data. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	dma: mv_xor: use dev_(err\|info\|notice) instead of dev_printk	Thomas Petazzoni	2012-11-20	1	-30/+30
\| \| \| \| \| \| \|	The usage of dev_printk() is deprecated, and the dev_err(), dev_info() and dev_notice() functions should be used instead. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	ARM: Kirkwood: switch to DT clock providers	Andrew Lunn	2012-11-20	4	-13/+83
\| \| \| \| \| \| \| \| \|	With true DT clock providers available switch Kirkwood clock setup in DT- enabled boards. While AUXDATA can be removed completely from bus probing, some devices still don't know about DT. Therefore, some clkdev aliases are created until these devices also move to DT. Signed-off-by: Andrew Lunn <andrew@lunn.ch>
*	ARM: dove: switch to DT clock providers	Sebastian Hesselbarth	2012-11-20	4	-15/+71
\| \| \| \| \| \| \| \| \|	With true DT clock providers available switch Dove clock setup in DT- enabled boards. While AUXDATA can be removed completely from bus probing, some devices still don't know about DT at all. Therefore, some clock aliases are created until the devices also move to DT. Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
*	clocksource: convert time-armada-370-xp to clk framework	Gregory CLEMENT	2012-11-20	4	-9/+8
\| \| \| \| \|	Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by Gregory CLEMENT <gregory.clement@free-electrons.com>
*	clk: armada-370-xp: add support for clock framework	Gregory CLEMENT	2012-11-20	7	-1/+112
\| \| \| \| \| \|	Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Tested-by Gregory CLEMENT <gregory.clement@free-electrons.com>
*	clk: mvebu: armada 370/XP add clock gating control provider for DT	Gregory CLEMENT	2012-11-20	2	-1/+116
\| \| \| \| \| \|	Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
*	clk: mvebu: add clock gating control provider for DT	Sebastian Hesselbarth	2012-11-20	6	-0/+280
\| \| \| \| \| \| \| \| \|	This driver allows to provide DT clocks for clock gates found on Marvell Dove and Kirkwood SoCs. The clock gates are referenced by the phandle index of the corresponding bit in the clock gating control register to ease lookup in the datasheet. Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
*	clk: mvebu: add armada-370-xp CPU specific clocks	Gregory CLEMENT	2012-11-20	6	-0/+235
\| \| \| \| \| \| \| \|	Add Armada 370/XP specific CPU clocks Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
*	clk: mvebu: add mvebu core clocks.	Sebastian Hesselbarth	2012-11-20	9	-0/+792
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This driver allows to provide DT clocks for core clocks found on Marvell Kirkwood, Dove & 370/XP SoCs. The core clock frequencies and ratios are determined by decoding the Sample-At-Reset registers. Although technically correct, using a divider of 0 will lead to div_by_zero panic. Let's use a ratio of 0/1 instead to fail later with a zero clock. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Tested-by Gregory CLEMENT <gregory.clement@free-electrons.com>
*	Linux 3.7-rc6v3.7-rc6	Linus Torvalds	2012-11-16	1	-1/+1
\|
*	Merge git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	2012-11-16	1	-4/+7
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull KVM fix from Marcelo Tosatti: "A correction for oops on module init with older Intel hosts." * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Fix invalid secondary exec controls in vmx_cpuid_update()
\| *	KVM: x86: Fix invalid secondary exec controls in vmx_cpuid_update()	Takashi Iwai	2012-11-16	1	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The commit [ad756a16: KVM: VMX: Implement PCID/INVPCID for guests with EPT] introduced the unconditional access to SECONDARY_VM_EXEC_CONTROL, and this triggers kernel warnings like below on old CPUs: vmwrite error: reg 401e value a0568000 (err 12) Pid: 13649, comm: qemu-kvm Not tainted 3.7.0-rc4-test2+ #154 Call Trace: [<ffffffffa0558d86>] vmwrite_error+0x27/0x29 [kvm_intel] [<ffffffffa054e8cb>] vmcs_writel+0x1b/0x20 [kvm_intel] [<ffffffffa054f114>] vmx_cpuid_update+0x74/0x170 [kvm_intel] [<ffffffffa03629b6>] kvm_vcpu_ioctl_set_cpuid2+0x76/0x90 [kvm] [<ffffffffa0341c67>] kvm_arch_vcpu_ioctl+0xc37/0xed0 [kvm] [<ffffffff81143f7c>] ? __vunmap+0x9c/0x110 [<ffffffffa0551489>] ? vmx_vcpu_load+0x39/0x1a0 [kvm_intel] [<ffffffffa0340ee2>] ? kvm_arch_vcpu_load+0x52/0x1a0 [kvm] [<ffffffffa032dcd4>] ? vcpu_load+0x74/0xd0 [kvm] [<ffffffffa032deb0>] kvm_vcpu_ioctl+0x110/0x5e0 [kvm] [<ffffffffa032e93d>] ? kvm_dev_ioctl+0x4d/0x4a0 [kvm] [<ffffffff8117dc6f>] do_vfs_ioctl+0x8f/0x530 [<ffffffff81139d76>] ? remove_vma+0x56/0x60 [<ffffffff8113b708>] ? do_munmap+0x328/0x400 [<ffffffff81187c8c>] ? fget_light+0x4c/0x100 [<ffffffff8117e1a1>] sys_ioctl+0x91/0xb0 [<ffffffff815a942d>] system_call_fastpath+0x1a/0x1f This patch adds a check for the availability of secondary exec control to avoid these warnings. Cc: <stable@vger.kernel.org> [v3.6+] Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* \|	Merge branch 'akpm' (Fixes from Andrew)	Linus Torvalds	2012-11-16	19	-120/+86
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge misc fixes from Andrew Morton. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (12 patches) revert "mm: fix-up zone present pages" tmpfs: change final i_blocks BUG to WARNING tmpfs: fix shmem_getpage_gfp() VM_BUG_ON mm: highmem: don't treat PKMAP_ADDR(LAST_PKMAP) as a highmem address mm: revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" rapidio: fix kernel-doc warnings swapfile: fix name leak in swapoff memcg: fix hotplugged memory zone oops mips, arc: fix build failure memcg: oom: fix totalpages calculation for memory.swappiness==0 mm: fix build warning for uninitialized value mm: add anon_vma_lock to validate_mm()
\| * \|	revert "mm: fix-up zone present pages"	Andrew Morton	2012-11-16	6	-58/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Revert commit 7f1290f2f2a4 ("mm: fix-up zone present pages") That patch tried to fix a issue when calculating zone->present_pages, but it caused a regression on 32bit systems with HIGHMEM. With that change, reset_zone_present_pages() resets all zone->present_pages to zero, and fixup_zone_present_pages() is called to recalculate zone->present_pages when the boot allocator frees core memory pages into buddy allocator. Because highmem pages are not freed by bootmem allocator, all highmem zones' present_pages becomes zero. Various options for improving the situation are being discussed but for now, let's return to the 3.6 code. Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Tested-by: Chris Clayton <chris2553@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \|	tmpfs: change final i_blocks BUG to WARNING	Hugh Dickins	2012-11-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Under a particular load on one machine, I have hit shmem_evict_inode()'s BUG_ON(inode->i_blocks), enough times to narrow it down to a particular race between swapout and eviction. It comes from the "if (freed > 0)" asymmetry in shmem_recalc_inode(), and the lack of coherent locking between mapping's nrpages and shmem's swapped count. There's a window in shmem_writepage(), between lowering nrpages in shmem_delete_from_page_cache() and then raising swapped count, when the freed count appears to be +1 when it should be 0, and then the asymmetry stops it from being corrected with -1 before hitting the BUG. One answer is coherent locking: using tree_lock throughout, without info->lock; reasonable, but the raw_spin_lock in percpu_counter_add() on used_blocks makes that messier than expected. Another answer may be a further effort to eliminate the weird shmem_recalc_inode() altogether, but previous attempts at that failed. So far undecided, but for now change the BUG_ON to WARN_ON: in usual circumstances it remains a useful consistency check. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \|	tmpfs: fix shmem_getpage_gfp() VM_BUG_ON	Hugh Dickins	2012-11-16	1	-2/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fuzzing with trinity hit the "impossible" VM_BUG_ON(error) (which Fedora has converted to WARNING) in shmem_getpage_gfp(): WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 Call Trace: warn_slowpath_common+0x7f/0xc0 warn_slowpath_null+0x1a/0x20 shmem_getpage_gfp+0xa5c/0xa70 shmem_fault+0x4f/0xa0 __do_fault+0x71/0x5c0 handle_pte_fault+0x97/0xae0 handle_mm_fault+0x289/0x350 __do_page_fault+0x18e/0x530 do_page_fault+0x2b/0x50 page_fault+0x28/0x30 tracesys+0xe1/0xe6 Thanks to Johannes for pointing to truncation: free_swap_and_cache() only does a trylock on the page, so the page lock we've held since before confirming swap is not enough to protect against truncation. What cleanup is needed in this case? Just delete_from_swap_cache(), which takes care of the memcg uncharge. Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Dave Jones <davej@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \|	mm: highmem: don't treat PKMAP_ADDR(LAST_PKMAP) as a highmem address	Will Deacon	2012-11-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	kmap_to_page returns the corresponding struct page for a virtual address of an arbitrary mapping. This works by checking whether the address falls in the pkmap region and using the pkmap page tables instead of the linear mapping if appropriate. Unfortunately, the bounds checking means that PKMAP_ADDR(LAST_PKMAP) is incorrectly treated as a highmem address and we can end up walking off the end of pkmap_page_table and subsequently passing junk to pte_page. This patch fixes the bound check to stay within the pkmap tables. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \|	mm: revert "mm: vmscan: scale number of pages reclaimed by ↵	Mel Gorman	2012-11-16	1	-25/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	reclaim/compaction based on failures" Jiri Slaby reported the following: (It's an effective revert of "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures".) Given kswapd had hours of runtime in ps/top output yesterday in the morning and after the revert it's now 2 minutes in sum for the last 24h, I would say, it's gone. The intention of the patch in question was to compensate for the loss of lumpy reclaim. Part of the reason lumpy reclaim worked is because it aggressively reclaimed pages and this patch was meant to be a sane compromise. When compaction fails, it gets deferred and both compaction and reclaim/compaction is deferred avoid excessive reclaim. However, since commit c654345924f7 ("mm: remove __GFP_NO_KSWAPD"), kswapd is woken up each time and continues reclaiming which was not taken into account when the patch was developed. Attempts to address the problem ended up just changing the shape of the problem instead of fixing it. The release window gets closer and while a THP allocation failing is not a major problem, kswapd chewing up a lot of CPU is. This patch reverts commit 83fde0f22872 ("mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures") and will be revisited in the future. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Zdenek Kabelac <zkabelac@redhat.com> Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \|	rapidio: fix kernel-doc warnings	Randy Dunlap	2012-11-16	2	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix rapidio kernel-doc warnings: Warning(drivers/rapidio/rio.c:415): No description found for parameter 'local' Warning(drivers/rapidio/rio.c:415): Excess function parameter 'lstart' description in 'rio_map_inb_region' Warning(include/linux/rio.h:290): No description found for parameter 'switches' Warning(include/linux/rio.h:290): No description found for parameter 'destid_table' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Matt Porter <mporter@kernel.crashing.org> Acked-by: Alexandre Bounine <alexandre.bounine@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \|	swapfile: fix name leak in swapoff	Xiaotian Feng	2012-11-16	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's a name leak introduced by commit 91a27b2a7567 ("vfs: define struct filename and have getname() return it"). Add the missing putname. [akpm@linux-foundation.org: cleanup] Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \|	memcg: fix hotplugged memory zone oops	Hugh Dickins	2012-11-16	4	-18/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When MEMCG is configured on (even when it's disabled by boot option), when adding or removing a page to/from its lru list, the zone pointer used for stats updates is nowadays taken from the struct lruvec. (On many configurations, calculating zone from page is slower.) But we have no code to update all the lruvecs (per zone, per memcg) when a memory node is hotadded. Here's an extract from the oops which results when running numactl to bind a program to a newly onlined node: BUG: unable to handle kernel NULL pointer dereference at 0000000000000f60 IP: __mod_zone_page_state+0x9/0x60 Pid: 1219, comm: numactl Not tainted 3.6.0-rc5+ #180 Bochs Bochs Process numactl (pid: 1219, threadinfo ffff880039abc000, task ffff8800383c4ce0) Call Trace: __pagevec_lru_add_fn+0xdf/0x140 pagevec_lru_move_fn+0xb1/0x100 __pagevec_lru_add+0x1c/0x30 lru_add_drain_cpu+0xa3/0x130 lru_add_drain+0x2f/0x40 ... The natural solution might be to use a memcg callback whenever memory is hotadded; but that solution has not been scoped out, and it happens that we do have an easy location at which to update lruvec->zone. The lruvec pointer is discovered either by mem_cgroup_zone_lruvec() or by mem_cgroup_page_lruvec(), and both of those do know the right zone. So check and set lruvec->zone in those; and remove the inadequate attempt to set lruvec->zone from lruvec_init(), which is called before NODE_DATA(node) has been allocated in such cases. Ah, there was one exceptionr. For no particularly good reason, mem_cgroup_force_empty_list() has its own code for deciding lruvec. Change it to use the standard mem_cgroup_zone_lruvec() and mem_cgroup_get_lru_size() too. In fact it was already safe against such an oops (the lru lists in danger could only be empty), but we're better proofed against future changes this way. I've marked this for stable (3.6) since we introduced the problem in 3.5 (now closed to stable); but I have no idea if this is the only fix needed to get memory hotadd working with memcg in 3.6, and received no answer when I enquired twice before. Reported-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \|	mips, arc: fix build failure	David Rientjes	2012-11-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using a cross-compiler to fix another issue, the following build error occurred for mips defconfig: arch/mips/fw/arc/misc.c: In function 'ArcHalt': arch/mips/fw/arc/misc.c:25:2: error: implicit declaration of function 'local_irq_disable' Fix it up by including irqflags.h. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \|	memcg: oom: fix totalpages calculation for memory.swappiness==0	Michal Hocko	2012-11-16	2	-6/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	oom_badness() takes a totalpages argument which says how many pages are available and it uses it as a base for the score calculation. The value is calculated by mem_cgroup_get_limit which considers both limit and total_swap_pages (resp. memsw portion of it). This is usually correct but since fe35004fbf9e ("mm: avoid swapping out with swappiness==0") we do not swap when swappiness is 0 which means that we cannot really use up all the totalpages pages. This in turn confuses oom score calculation if the memcg limit is much smaller than the available swap because the used memory (capped by the limit) is negligible comparing to totalpages so the resulting score is too small if adj!=0 (typically task with CAP_SYS_ADMIN or non zero oom_score_adj). A wrong process might be selected as result. The problem can be worked around by checking mem_cgroup_swappiness==0 and not considering swap at all in such a case. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>