summaryrefslogtreecommitdiffstats
path: root/lib
Commit message (Collapse)AuthorAgeFilesLines
* RAID/s390: add SIMD implementation for raid6 gen/xorMartin Schwidefsky2016-08-294-0/+178
| | | | | | | | | | | | | | | | | | | | | | | | Using vector registers is slightly faster: raid6: vx128x8 gen() 19705 MB/s raid6: vx128x8 xor() 11886 MB/s raid6: using algorithm vx128x8 gen() 19705 MB/s raid6: .... xor() 11886 MB/s, rmw enabled vs the software algorithms: raid6: int64x1 gen() 3018 MB/s raid6: int64x1 xor() 1429 MB/s raid6: int64x2 gen() 4661 MB/s raid6: int64x2 xor() 3143 MB/s raid6: int64x4 gen() 5392 MB/s raid6: int64x4 xor() 3509 MB/s raid6: int64x8 gen() 4441 MB/s raid6: int64x8 xor() 3207 MB/s raid6: using algorithm int64x4 gen() 5392 MB/s raid6: .... xor() 3509 MB/s, rmw enabled Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2016-08-172-5/+10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Buffers powersave frame test is reversed in cfg80211, fix from Felix Fietkau. 2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme. 3) Fix some tg3 ethtool logic bugs, and one that would cause no interrupts to be generated when rx-coalescing is set to 0. From Satish Baddipadige and Siva Reddy Kallam. 4) QLCNIC mailbox corruption and napi budget handling fix from Manish Chopra. 5) Fix fib_trie logic when walking the trie during /proc/net/route output than can access a stale node pointer. From David Forster. 6) Several sctp_diag fixes from Phil Sutter. 7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel. 8) Checksum fixup fixes in bpf from Daniel Borkmann. 9) Memork leaks in nfnetlink, from Liping Zhang. 10) Use after free in rxrpc, from David Howells. 11) Use after free in new skb_array code of macvtap driver, from Jason Wang. 12) Calipso resource leak, from Colin Ian King. 13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang. 14) Fix bpf non-linear packet write helpers, from Daniel Borkmann. 15) Fix lockdep splats in macsec, from Sabrina Dubroca. 16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF handling. 17) Various tc-action bug fixes, from CONG Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) net_sched: allow flushing tc police actions net_sched: unify the init logic for act_police net_sched: convert tcf_exts from list to pointer array net_sched: move tc offload macros to pkt_cls.h net_sched: fix a typo in tc_for_each_action() net_sched: remove an unnecessary list_del() net_sched: remove the leftover cleanup_a() mlxsw: spectrum: Allow packets to be trapped from any PG mlxsw: spectrum: Unmap 802.1Q FID before destroying it mlxsw: spectrum: Add missing rollbacks in error path mlxsw: reg: Fix missing op field fill-up mlxsw: spectrum: Trap loop-backed packets mlxsw: spectrum: Add missing packet traps mlxsw: spectrum: Mark port as active before registering it mlxsw: spectrum: Create PVID vPort before registering netdevice mlxsw: spectrum: Remove redundant errors from the code mlxsw: spectrum: Don't return upon error in removal path i40e: check for and deal with non-contiguous TCs ixgbe: Re-enable ability to toggle VLAN filtering ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths ...
| * rhashtable: fix shift by 64 when shrinkingVegard Nossum2016-08-151-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I got this: ================================================================================ UBSAN: Undefined behaviour in ./include/linux/log2.h:63:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 1 PID: 721 Comm: kworker/1:1 Not tainted 4.8.0-rc1+ #87 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 Workqueue: events rht_deferred_worker 0000000000000000 ffff88011661f8d8 ffffffff82344f50 0000000041b58ab3 ffffffff84f98000 ffffffff82344ea4 ffff88011661f900 ffff88011661f8b0 0000000000000001 ffff88011661f6b8 dffffc0000000000 ffffffff867f7640 Call Trace: [<ffffffff82344f50>] dump_stack+0xac/0xfc [<ffffffff82344ea4>] ? _atomic_dec_and_lock+0xc4/0xc4 [<ffffffff8242f5b8>] ubsan_epilogue+0xd/0x8a [<ffffffff82430c41>] __ubsan_handle_shift_out_of_bounds+0x255/0x29a [<ffffffff824309ec>] ? __ubsan_handle_out_of_bounds+0x180/0x180 [<ffffffff84003436>] ? nl80211_req_set_reg+0x256/0x2f0 [<ffffffff812112ba>] ? print_context_stack+0x8a/0x160 [<ffffffff81200031>] ? amd_pmu_reset+0x341/0x380 [<ffffffff823af808>] rht_deferred_worker+0x1618/0x1790 [<ffffffff823af808>] ? rht_deferred_worker+0x1618/0x1790 [<ffffffff823ae1f0>] ? rhashtable_jhash2+0x370/0x370 [<ffffffff8134c12d>] ? process_one_work+0x6fd/0x1970 [<ffffffff8134c1cf>] process_one_work+0x79f/0x1970 [<ffffffff8134c12d>] ? process_one_work+0x6fd/0x1970 [<ffffffff8134ba30>] ? try_to_grab_pending+0x4c0/0x4c0 [<ffffffff8134d564>] ? worker_thread+0x1c4/0x1340 [<ffffffff8134d8ff>] worker_thread+0x55f/0x1340 [<ffffffff845e904f>] ? __schedule+0x4df/0x1d40 [<ffffffff8134d3a0>] ? process_one_work+0x1970/0x1970 [<ffffffff8134d3a0>] ? process_one_work+0x1970/0x1970 [<ffffffff813642f7>] kthread+0x237/0x390 [<ffffffff813640c0>] ? __kthread_parkme+0x280/0x280 [<ffffffff845f8c93>] ? _raw_spin_unlock_irq+0x33/0x50 [<ffffffff845f95df>] ret_from_fork+0x1f/0x40 [<ffffffff813640c0>] ? __kthread_parkme+0x280/0x280 ================================================================================ roundup_pow_of_two() is undefined when called with an argument of 0, so let's avoid the call and just fall back to ht->p.min_size (which should never be smaller than HASH_MIN_SIZE). Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * rhashtable: avoid large lock-array allocationsFlorian Westphal2016-08-141-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sander reports following splat after netfilter nat bysrc table got converted to rhashtable: swapper/0: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP) CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1 [..] [<ffffffff811633ed>] warn_alloc_failed+0xdd/0x140 [<ffffffff811638b1>] __alloc_pages_nodemask+0x3e1/0xcf0 [<ffffffff811a72ed>] alloc_pages_current+0x8d/0x110 [<ffffffff8117cb7f>] kmalloc_order+0x1f/0x70 [<ffffffff811aec19>] __kmalloc+0x129/0x140 [<ffffffff8146d561>] bucket_table_alloc+0xc1/0x1d0 [<ffffffff8146da1d>] rhashtable_insert_rehash+0x5d/0xe0 [<ffffffff819fcfff>] nf_nat_setup_info+0x2ef/0x400 The failure happens when allocating the spinlock array. Even with GFP_KERNEL its unlikely for such a large allocation to succeed. Thomas Graf pointed me at inet_ehash_locks_alloc(), so in addition to adding NOWARN for atomic allocations this also makes the bucket-array sizing more conservative. In commit 095dc8e0c3686 ("tcp: fix/cleanup inet_ehash_locks_alloc()"), Eric Dumazet says: "Budget 2 cache lines per cpu worth of 'spinlocks'". IOW, consider size needed by a single spinlock when determining number of locks per cpu. So with 64 byte per cacheline and 4 byte per spinlock this gives 32 locks per cpu. Resulting size of the lock-array (sizeof(spinlock) == 4): cpus: 1 2 4 8 16 32 64 old: 1k 1k 4k 8k 16k 16k 16k new: 128 256 512 1k 2k 4k 8k 8k allocation should have decent chance of success even with GFP_ATOMIC, and should not fail with GFP_KERNEL. With 72-byte spinlock (LOCKDEP): cpus : 1 2 old: 9k 18k new: ~2k ~4k Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Suggested-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * rhashtable-test: Fix max_size parameter descriptionPhil Sutter2016-08-081-1/+1
| | | | | | | | | | | | | | | | Looks like a simple copy'n'paste error. Fixes: 1aa661f5c3df1 ("rhashtable-test: Measure time to insert, remove & traverse entries") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
* | unsafe_[get|put]_user: change interface to use a error target labelLinus Torvalds2016-08-082-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I initially added the unsafe_[get|put]_user() helpers in commit 5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses"), I made the mistake of modeling the interface on our traditional __[get|put]_user() functions, which return zero on success, or -EFAULT on failure. That interface is fairly easy to use, but it's actually fairly nasty for good code generation, since it essentially forces the caller to check the error value for each access. In particular, since the error handling is already internally implemented with an exception handler, and we already use "asm goto" for various other things, we could fairly easily make the error cases just jump directly to an error label instead, and avoid the need for explicit checking after each operation. So switch the interface to pass in an error label, rather than checking the error value in the caller. Best do it now before we start growing more users (the signal handling code in particular would be a good place to use the new interface). So rather than if (unsafe_get_user(x, ptr)) ... handle error .. the interface is now unsafe_get_user(x, ptr, label); where an error during the user mode fetch will now just cause a jump to 'label' in the caller. Right now the actual _implementation_ of this all still ends up being a "if (err) goto label", and does not take advantage of any exception label tricks, but for "unsafe_put_user()" in particular it should be fairly straightforward to convert to using the exception table model. Note that "unsafe_get_user()" is much harder to convert to a clever exception table model, because current versions of gcc do not allow the use of "asm goto" (for the exception) with output values (for the actual value to be fetched). But that is hopefully not a limitation in the long term. [ Also note that it might be a good idea to switch unsafe_get_user() to actually _return_ the value it fetches from user space, but this commit only changes the error handling semantics ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | dynamic_debug: add jump label supportJason Baron2016-08-041-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although dynamic debug is often only used for debug builds, sometimes its enabled for production builds as well. Minimize its impact by using jump labels. This reduces the text section by 7000+ bytes in the kernel image below. It does increase data, but this should only be referenced when changing the direction of the branches, and hence usually not in cache. text data bss dec hex filename 8194852 4879776 925696 14000324 d5a0c4 vmlinux.pre 8187337 4960224 925696 14073257 d6bda9 vmlinux.post Link: http://lkml.kernel.org/r/d165b465e8c89bc582d973758d40be44c33f018b.1467837322.git.jbaron@akamai.com Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Joe Perches <joe@perches.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | dma-mapping: use unsigned long for dma_attrsKrzysztof Kozlowski2016-08-042-10/+12
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dma-mapping core and the implementations do not change the DMA attributes passed by pointer. Thus the pointer can point to const data. However the attributes do not have to be a bitfield. Instead unsigned long will do fine: 1. This is just simpler. Both in terms of reading the code and setting attributes. Instead of initializing local attributes on the stack and passing pointer to it to dma_set_attr(), just set the bits. 2. It brings safeness and checking for const correctness because the attributes are passed by value. Semantic patches for this change (at least most of them): virtual patch virtual context @r@ identifier f, attrs; @@ f(..., - struct dma_attrs *attrs + unsigned long attrs , ...) { ... } @@ identifier r.f; @@ f(..., - NULL + 0 ) and // Options: --all-includes virtual patch virtual context @r@ identifier f, attrs; type t; @@ t f(..., struct dma_attrs *attrs); @@ identifier r.f; @@ f(..., - NULL + 0 ) Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> Acked-by: Mark Salter <msalter@redhat.com> [c6x] Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris] Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm] Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp] Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core] Acked-by: David Vrabel <david.vrabel@citrix.com> [xen] Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb] Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32] Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'akpm' (patches from Andrew)Linus Torvalds2016-08-026-23/+33
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge yet more updates from Andrew Morton: - the rest of ocfs2 - various hotfixes, mainly MM - quite a bit of misc stuff - drivers, fork, exec, signals, etc. - printk updates - firmware - checkpatch - nilfs2 - more kexec stuff than usual - rapidio updates - w1 things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits) ipc: delete "nr_ipc_ns" kcov: allow more fine-grained coverage instrumentation init/Kconfig: add clarification for out-of-tree modules config: add android config fragments init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig relay: add global mode support for buffer-only channels init: allow blacklisting of module_init functions w1:omap_hdq: fix regression w1: add helper macro module_w1_family w1: remove need for ida and use PLATFORM_DEVID_AUTO rapidio/switches: add driver for IDT gen3 switches powerpc/fsl_rio: apply changes for RIO spec rev 3 rapidio: modify for rev.3 specification changes rapidio: change inbound window size type to u64 rapidio/idt_gen2: fix locking warning rapidio: fix error handling in mbox request/release functions rapidio/tsi721_dma: advance queue processing from transfer submit call rapidio/tsi721: add messaging mbox selector parameter rapidio/tsi721: add PCIe MRRS override parameter rapidio/tsi721_dma: add channel mask and queue size parameters ...
| * kcov: allow more fine-grained coverage instrumentationVegard Nossum2016-08-021-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For more targeted fuzzing, it's better to disable kernel-wide instrumentation and instead enable it on a per-subsystem basis. This follows the pattern of UBSAN and allows you to compile in the kcov driver without instrumenting the whole kernel. To instrument a part of the kernel, you can use either # for a single file in the current directory KCOV_INSTRUMENT_filename.o := y or # for all the files in the current directory (excluding subdirectories) KCOV_INSTRUMENT := y or # (same as above) ccflags-y += $(CFLAGS_KCOV) or # for all the files in the current directory (including subdirectories) subdir-ccflags-y += $(CFLAGS_KCOV) Link: http://lkml.kernel.org/r/1464008380-11405-1-git-send-email-vegard.nossum@oracle.com Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * crc32: use ktime_get_ns() for measurementArnd Bergmann2016-08-021-12/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The crc32 test function measures the elapsed time in nanoseconds, but uses 'struct timespec' for that. We want to remove timespec from the kernel for y2038 compatibility, and ktime_get_ns() also helps make the code simpler here. It is also slightly better to use monontonic time, as we are only interested in the time difference. Link: http://lkml.kernel.org/r/20160617143932.3289626-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: "David S . Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * lib/iommu-helper: skip to next segmentSebastian Ott2016-08-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a large enough area in the iommu bitmap is found but would span a boundary we continue the search starting from the next bit position. For large allocations this can lead to several useless invocations of bitmap_find_next_zero_area() and iommu_is_span_boundary(). Continue the search from the start of the next segment (which is the next bit position such that we'll not cross the same segment boundary again). Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1606081910070.3211@schleppi Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * ratelimit: extend to print suppressed messages on releaseBorislav Petkov2016-08-021-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the ratelimiting facility to print the amount of suppressed lines when it is being released. This use case is aimed at short-termed, burst-like users for which we want to output the suppressed lines stats only once, after it has been disposed of. For an example, see /dev/kmsg usage in a follow-on patch. Also, change the printk() line we issue on release to not use "callbacks" as it is misleading: we're not suppressing callbacks but printk() calls. This has been separated from a previous patch by Linus. Link: http://lkml.kernel.org/r/20160716061745.15795-2-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Dave Young <dyoung@redhat.com> Cc: Franck Bui <fbui@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * UBSAN: fix typo in format stringNicolas Iooss2016-08-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | handle_object_size_mismatch() used %pk to format a kernel pointer with pr_err(). This seemed to be a misspelling for %pK, but using this to format a kernel pointer does not make much sence here. Therefore use %p instead, like in handle_missaligned_access(). Link: http://lkml.kernel.org/r/20160730083010.11569-1-nicolas.iooss_linux@m4x.org Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * radix-tree: account nodes to memcg only if explicitly requestedVladimir Davydov2016-08-021-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Radix trees may be used not only for storing page cache pages, so unconditionally accounting radix tree nodes to the current memory cgroup is bad: if a radix tree node is used for storing data shared among different cgroups we risk pinning dead memory cgroups forever. So let's only account radix tree nodes if it was explicitly requested by passing __GFP_ACCOUNT to INIT_RADIX_TREE. Currently, we only want to account page cache entries, so mark mapping->page_tree so. Fixes: 58e698af4c63 ("radix-tree: account radix_tree_node to memory cgroup") Link: http://lkml.kernel.org/r/1470057188-7864-1-git-send-email-vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> [4.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'kbuild' of ↵Linus Torvalds2016-08-021-0/+2
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild updates from Michal Marek: - GCC plugin support by Emese Revfy from grsecurity, with a fixup from Kees Cook. The plugins are meant to be used for static analysis of the kernel code. Two plugins are provided already. - reduction of the gcc commandline by Arnd Bergmann. - IS_ENABLED / IS_REACHABLE macro enhancements by Masahiro Yamada - bin2c fix by Michael Tautschnig - setlocalversion fix by Wolfram Sang * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: gcc-plugins: disable under COMPILE_TEST kbuild: Abort build on bad stack protector flag scripts: Fix size mismatch of kexec_purgatory_size kbuild: make samples depend on headers_install Kbuild: don't add obj tree in additional includes Kbuild: arch: look for generated headers in obtree Kbuild: always prefix objtree in LINUXINCLUDE Kbuild: avoid duplicate include path Kbuild: don't add ../../ to include path vmlinux.lds.h: replace config_enabled() with IS_ENABLED() kconfig.h: allow to use IS_{ENABLE,REACHABLE} in macro expansion kconfig.h: use already defined macros for IS_REACHABLE() define export.h: use __is_defined() to check if __KSYM_* is defined kconfig.h: use __is_defined() to check if MODULE is defined kbuild: setlocalversion: print error to STDERR Add sancov plugin Add Cyclomatic complexity GCC plugin GCC plugin infrastructure Shared library support
| * gcc-plugins: disable under COMPILE_TESTKees Cook2016-07-271-2/+2
| | | | | | | | | | | | | | | | | | | | Since adding the gcc plugin development headers is required for the gcc plugin support, we should ease into this new kernel build dependency more slowly. For now, disable the gcc plugins under COMPILE_TEST so that all*config builds will skip it. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michal Marek <mmarek@suse.com>
| * Add sancov pluginEmese Revfy2016-06-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The sancov gcc plugin inserts a __sanitizer_cov_trace_pc() call at the start of basic blocks. This plugin is a helper plugin for the kcov feature. It supports all gcc versions with plugin support (from gcc-4.5 on). It is based on the gcc commit "Add fuzzing coverage support" by Dmitry Vyukov (https://gcc.gnu.org/viewcvs/gcc?limit_changes=0&view=revision&revision=231296). Signed-off-by: Emese Revfy <re.emese@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michal Marek <mmarek@suse.com>
* | Merge branch 'linus' of ↵Linus Torvalds2016-08-011-7/+7
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a number of regressions in the marvell cesa driver caused by the chaining work, and a regression in lib/mpi that leads to a GFP_KERNEL allocation with preemption disabled" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: marvell - Don't copy IV vectors from the _process op for ciphers lib/mpi: Fix SG miter leak crypto: marvell - Update cache with input sg only when it is unmapped crypto: marvell - Don't chain at DMA level when backlog is disabled crypto: marvell - Fix memory leaks in TDMA chain for cipher requests
| * | lib/mpi: Fix SG miter leakHerbert Xu2016-07-291-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In mpi_read_raw_from_sgl we may leak the SG miter resouces after reading the leading zeroes. This patch fixes this by stopping the iteration once the leading zeroes have been read. Fixes: 127827b9c295 ("lib/mpi: Do not do sg_virt") Reported-by: Nicolai Stange <nicstange@gmail.com> Tested-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | | Merge branch 'x86-microcode-for-linus' of ↵Linus Torvalds2016-07-301-1/+4
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode updates from Thomas Gleixner: - more work to make the microcode loader robust - a fix for the micro code load precedence - fixes for initrd loading with randomized memory - less printk noise on SMP machines * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm, x86/microcode: Add __PAGE_OFFSET_BASE define on 32-bit x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y x86/microcode: Remove unused symbol exports x86/microcode/intel: Do not issue microcode updates messages on each CPU Documentation/microcode: Document some aspects for more clarity x86/microcode/AMD: Make amd_ucode_patch[] static x86/microcode/intel: Unexport save_mc_for_early() x86/microcode/intel: Rename load_microcode_early() to find_microcode_patch() x86/microcode: Propagate save_microcode_in_initrd() retval x86/microcode: Get rid of find_cpio_data()'s dummy offset arg lib/cpio: Make find_cpio_data()'s offset arg optional x86/microcode: Fix suspend to RAM with builtin microcode x86/microcode: Fix loading precedence
| * \ \ Merge branch 'linus' into x86/microcode, to pick up merge window changesIngo Molnar2016-07-277-44/+67
| |\ \ \ | | | | | | | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | lib/cpio: Make find_cpio_data()'s offset arg optionalBorislav Petkov2016-06-081-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some callers don't use it so make it optional. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1465225850-7352-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2016-07-283-4/+9
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge more updates from Andrew Morton: "The rest of MM" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (101 commits) mm, compaction: simplify contended compaction handling mm, compaction: introduce direct compaction priority mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations mm, page_alloc: make THP-specific decisions more generic mm, page_alloc: restructure direct compaction handling in slowpath mm, page_alloc: don't retry initial attempt in slowpath mm, page_alloc: set alloc_flags only once in slowpath lib/stackdepot.c: use __GFP_NOWARN for stack allocations mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB mm, kasan: account for object redzone in SLUB's nearest_obj() mm: fix use-after-free if memory allocation failed in vma_adjust() zsmalloc: Delete an unnecessary check before the function call "iput" mm/memblock.c: fix index adjustment error in __next_mem_range_rev() mem-hotplug: alloc new page from a nearest neighbor node when mem-offline mm: optimize copy_page_to/from_iter_iovec mm: add cond_resched() to generic_swapfile_activate() Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode mm: hwpoison: remove incorrect comments make __section_nr() more efficient ...
| * | | | | lib/stackdepot.c: use __GFP_NOWARN for stack allocationsKirill A. Shutemov2016-07-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This (large, atomic) allocation attempt can fail. We expect and handle that, so avoid the scary warning. Link: http://lkml.kernel.org/r/20160720151905.GB19146@node.shutemov.name Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUBAlexander Potapenko2016-07-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For KASAN builds: - switch SLUB allocator to using stackdepot instead of storing the allocation/deallocation stacks in the objects; - change the freelist hook so that parts of the freelist can be put into the quarantine. [aryabinin@virtuozzo.com: fixes] Link: http://lkml.kernel.org/r/1468601423-28676-1-git-send-email-aryabinin@virtuozzo.com Link: http://lkml.kernel.org/r/1468347165-41906-3-git-send-email-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Kuthonuzo Luruo <kuthonuzo.luruo@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | mm: optimize copy_page_to/from_iter_iovecMikulas Patocka2016-07-281-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | copy_page_to_iter_iovec() and copy_page_from_iter_iovec() copy some data to userspace or from userspace. These functions have a fast path where they map a page using kmap_atomic and a slow path where they use kmap. kmap is slower than kmap_atomic, so the fast path is preferred. However, on kernels without highmem support, kmap just calls page_address, so there is no need to avoid kmap. On kernels without highmem support, the fast path just increases code size (and cache footprint) and it doesn't improve copy performance in any way. This patch enables the fast path only if CONFIG_HIGHMEM is defined. Code size reduced by this patch: x86 (without highmem) 928 x86-64 960 sparc64 848 alpha 1136 pa-risc 1200 [akpm@linux-foundation.org: use IS_ENABLED(), per Andi] Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1607221711410.4818@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge branch 'salted-string-hash'Linus Torvalds2016-07-281-2/+2
|\ \ \ \ \ \ | |/ / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This changes the vfs dentry hashing to mix in the parent pointer at the _beginning_ of the hash, rather than at the end. That actually improves both the hash and the code generation, because we can move more of the computation to the "static" part of the dcache setup, and do less at lookup runtime. It turns out that a lot of other hash users also really wanted to mix in a base pointer as a 'salt' for the hash, and so the slightly extended interface ends up working well for other cases too. Users that want a string hash that is purely about the string pass in a 'salt' pointer of NULL. * merge branch 'salted-string-hash': fs/dcache.c: Save one 32-bit multiply in dcache lookup vfs: make the string hashes salt the hash
| * | | | | vfs: make the string hashes salt the hashLinus Torvalds2016-06-101-2/+2
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We always mixed in the parent pointer into the dentry name hash, but we did it late at lookup time. It turns out that we can simplify that lookup-time action by salting the hash with the parent pointer early instead of late. A few other users of our string hashes also wanted to mix in their own pointers into the hash, and those are updated to use the same mechanism. Hash users that don't have any particular initial salt can just use the NULL pointer as a no-salt. Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 'random_for_linus' of ↵Linus Torvalds2016-07-272-1/+80
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull random driver updates from Ted Ts'o: "A number of improvements for the /dev/random driver; the most important is the use of a ChaCha20-based CRNG for /dev/urandom, which is faster, more efficient, and easier to make scalable for silly/abusive userspace programs that want to read from /dev/urandom in a tight loop on NUMA systems. This set of patches also improves entropy gathering on VM's running on Microsoft Azure, and will take advantage of a hw random number generator (if present) to initialize the /dev/urandom pool" (It turns out that the random tree hadn't been in linux-next this time around, because it had been dropped earlier as being too quiet. Oh well). * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: strengthen input validation for RNDADDTOENTCNT random: add backtracking protection to the CRNG random: make /dev/urandom scalable for silly userspace programs random: replace non-blocking pool with a Chacha20-based CRNG random: properly align get_random_int_hash random: add interrupt callback to VMBus IRQ handler random: print a warning for the first ten uninitialized random users random: initialize the non-blocking pool via add_hwgenerator_randomness()
| * | | | | random: replace non-blocking pool with a Chacha20-based CRNGTheodore Ts'o2016-07-032-1/+80
| | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | The CRNG is faster, and we don't pretend to track entropy usage in the CRNG any more. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2016-07-271-2/+24
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Unified UDP encapsulation offload methods for drivers, from Alexander Duyck. 2) Make DSA binding more sane, from Andrew Lunn. 3) Support QCA9888 chips in ath10k, from Anilkumar Kolli. 4) Several workqueue usage cleanups, from Bhaktipriya Shridhar. 5) Add XDP (eXpress Data Path), essentially running BPF programs on RX packets as soon as the device sees them, with the option to mirror the packet on TX via the same interface. From Brenden Blanco and others. 6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet. 7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli. 8) Simplify netlink conntrack entry layout, from Florian Westphal. 9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido Schimmel, Yotam Gigi, and Jiri Pirko. 10) Add SKB array infrastructure and convert tun and macvtap over to it. From Michael S Tsirkin and Jason Wang. 11) Support qdisc packet injection in pktgen, from John Fastabend. 12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy. 13) Add NV congestion control support to TCP, from Lawrence Brakmo. 14) Add GSO support to SCTP, from Marcelo Ricardo Leitner. 15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni. 16) Support MPLS over IPV4, from Simon Horman. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits) xgene: Fix build warning with ACPI disabled. be2net: perform temperature query in adapter regardless of its interface state l2tp: Correctly return -EBADF from pppol2tp_getname. net/mlx5_core/health: Remove deprecated create_singlethread_workqueue net: ipmr/ip6mr: update lastuse on entry change macsec: ensure rx_sa is set when validation is disabled tipc: dump monitor attributes tipc: add a function to get the bearer name tipc: get monitor threshold for the cluster tipc: make cluster size threshold for monitoring configurable tipc: introduce constants for tipc address validation net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update() MAINTAINERS: xgene: Add driver and documentation path Documentation: dtb: xgene: Add MDIO node dtb: xgene: Add MDIO node drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset drivers: net: xgene: Use exported functions drivers: net: xgene: Enable MDIO driver drivers: net: xgene: Add backward compatibility drivers: net: phy: xgene: Add MDIO driver ...
| * | | | | Introduce rb_replace_node_rcu()David Howells2016-07-061-2/+24
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement an RCU-safe variant of rb_replace_node() and rearrange rb_replace_node() to do things in the same order. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
* | | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2016-07-263-5/+82
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge updates from Andrew Morton: - a few misc bits - ocfs2 - most(?) of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits) thp: fix comments of __pmd_trans_huge_lock() cgroup: remove unnecessary 0 check from css_from_id() cgroup: fix idr leak for the first cgroup root mm: memcontrol: fix documentation for compound parameter mm: memcontrol: remove BUG_ON in uncharge_list mm: fix build warnings in <linux/compaction.h> mm, thp: convert from optimistic swapin collapsing to conservative mm, thp: fix comment inconsistency for swapin readahead functions thp: update Documentation/{vm/transhuge,filesystems/proc}.txt shmem: split huge pages beyond i_size under memory pressure thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE khugepaged: add support of collapse for tmpfs/shmem pages shmem: make shmem_inode_info::lock irq-safe khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page() thp: extract khugepaged from mm/huge_memory.c shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings shmem: add huge pages support shmem: get_unmapped_area align huge page shmem: prepare huge= mount option and sysfs knob mm, rmap: account shmem thp pages ...
| * | | | | radix-tree: implement radix_tree_maybe_preload_order()Kirill A. Shutemov2016-07-261-5/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new helper is similar to radix_tree_maybe_preload(), but tries to preload number of nodes required to insert (1 << order) continuous naturally-aligned elements. This is required to push huge pages into pagecache. Link: http://lkml.kernel.org/r/1466021202-61880-24-git-send-email-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | mm/page_owner: use stackdepot to store stacktraceJoonsoo Kim2016-07-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we store each page's allocation stacktrace on corresponding page_ext structure and it requires a lot of memory. This causes the problem that memory tight system doesn't work well if page_owner is enabled. Moreover, even with this large memory consumption, we cannot get full stacktrace because we allocate memory at boot time and just maintain 8 stacktrace slots to balance memory consumption. We could increase it to more but it would make system unusable or change system behaviour. To solve the problem, this patch uses stackdepot to store stacktrace. It obviously provides memory saving but there is a drawback that stackdepot could fail. stackdepot allocates memory at runtime so it could fail if system has not enough memory. But, most of allocation stack are generated at very early time and there are much memory at this time. So, failure would not happen easily. And, one failure means that we miss just one page's allocation stacktrace so it would not be a big problem. In this patch, when memory allocation failure happens, we store special stracktrace handle to the page that is failed to save stacktrace. With it, user can guess memory usage properly even if failure happens. Memory saving looks as following. (4GB memory system with page_owner) (before the patch -> after the patch) static allocation: 92274688 bytes -> 25165824 bytes dynamic allocation after boot + kernel build: 0 bytes -> 327680 bytes total: 92274688 bytes -> 25493504 bytes 72% reduction in total. Note that implementation looks complex than someone would imagine because there is recursion issue. stackdepot uses page allocator and page_owner is called at page allocation. Using stackdepot in page_owner could re-call page allcator and then page_owner. That is a recursion. To detect and avoid it, whenever we obtain stacktrace, recursion is checked and page_owner is set to dummy information if found. Dummy information means that this page is allocated for page_owner feature itself (such as stackdepot) and it's understandable behavior for user. [iamjoonsoo.kim@lge.com: mm-page_owner-use-stackdepot-to-store-stacktrace-v3] Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com Link: http://lkml.kernel.org/r/1466150259-27727-7-git-send-email-iamjoonsoo.kim@lge.com Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | dma-debug: track bucket lock state for static checkersStephen Boyd2016-07-261-0/+2
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | get_hash_bucket() and put_hash_bucket() acquire and release the same spinlock, but this confuses static checkers such as sparse lib/dma-debug.c:254:27: warning: context imbalance in 'get_hash_bucket' - wrong count at exit lib/dma-debug.c:268:13: warning: context imbalance in 'put_hash_bucket' - unexpected unlock Add the appropriate acquire and release statements so that checkers can properly track the lock state. Link: http://lkml.kernel.org/r/20160701191552.24295-1-sboyd@codeaurora.org Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-blockLinus Torvalds2016-07-261-30/+15
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull core block updates from Jens Axboe: - the big change is the cleanup from Mike Christie, cleaning up our uses of command types and modified flags. This is what will throw some merge conflicts - regression fix for the above for btrfs, from Vincent - following up to the above, better packing of struct request from Christoph - a 2038 fix for blktrace from Arnd - a few trivial/spelling fixes from Bart Van Assche - a front merge check fix from Damien, which could cause issues on SMR drives - Atari partition fix from Gabriel - convert cfq to highres timers, since jiffies isn't granular enough for some devices these days. From Jan and Jeff - CFQ priority boost fix idle classes, from me - cleanup series from Ming, improving our bio/bvec iteration - a direct issue fix for blk-mq from Omar - fix for plug merging not involving the IO scheduler, like we do for other types of merges. From Tahsin - expose DAX type internally and through sysfs. From Toshi and Yigal * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits) block: Fix front merge check block: do not merge requests without consulting with io scheduler block: Fix spelling in a source code comment block: expose QUEUE_FLAG_DAX in sysfs block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Btrfs: fix comparison in __btrfs_map_block() block: atari: Return early for unsupported sector size Doc: block: Fix a typo in queue-sysfs.txt cfq-iosched: Charge at least 1 jiffie instead of 1 ns cfq-iosched: Fix regression in bonnie++ rewrite performance cfq-iosched: Convert slice_resid from u64 to s64 block: Convert fifo_time from ulong to u64 blktrace: avoid using timespec block/blk-cgroup.c: Declare local symbols static block/bio-integrity.c: Add #include "blk.h" block/partition-generic.c: Remove a set-but-not-used variable block: bio: kill BIO_MAX_SIZE cfq-iosched: temporarily boost queue priority for idle classes block: drbd: avoid to use BIO_MAX_SIZE block: bio: remove BIO_MAX_SECTORS ...
| * | | | iov_iter: use bvec iterator to implement iterate_bvec()Ming Lei2016-06-091-30/+15
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bvec has one native/mature iterator for long time, so not necessary to use the reinvented wheel for iterating bvecs in lib/iov_iter.c. Two ITER_BVEC test cases are run: - xfstest(-g auto) on loop dio/aio, no regression found - swap file works well under extreme stress(stress-ng --all 64 -t 800 -v), and lots of OOMs are triggerd, and the whole system still survives Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | | Merge branch 'linus' of ↵Linus Torvalds2016-07-262-173/+92
|\ \ \ \ | |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Here is the crypto update for 4.8: API: - first part of skcipher low-level conversions - add KPP (Key-agreement Protocol Primitives) interface. Algorithms: - fix IPsec/cryptd reordering issues that affects aesni - RSA no longer does explicit leading zero removal - add SHA3 - add DH - add ECDH - improve DRBG performance by not doing CTR by hand Drivers: - add x86 AVX2 multibuffer SHA256/512 - add POWER8 optimised crc32c - add xts support to vmx - add DH support to qat - add RSA support to caam - add Layerscape support to caam - add SEC1 AEAD support to talitos - improve performance by chaining requests in marvell/cesa - add support for Araneus Alea I USB RNG - add support for Broadcom BCM5301 RNG - add support for Amlogic Meson RNG - add support Broadcom NSP SoC RNG" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (180 commits) crypto: vmx - Fix aes_p8_xts_decrypt build failure crypto: vmx - Ignore generated files crypto: vmx - Adding support for XTS crypto: vmx - Adding asm subroutines for XTS crypto: skcipher - add comment for skcipher_alg->base crypto: testmgr - Print akcipher algorithm name crypto: marvell - Fix wrong flag used for GFP in mv_cesa_dma_add_iv_op crypto: nx - off by one bug in nx_of_update_msc() crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct crypto: scatterwalk - Inline start/map/done crypto: scatterwalk - Remove unnecessary BUG in scatterwalk_start crypto: scatterwalk - Remove unnecessary advance in scatterwalk_pagedone crypto: scatterwalk - Fix test in scatterwalk_done crypto: api - Optimise away crypto_yield when hard preemption is on crypto: scatterwalk - add no-copy support to copychunks crypto: scatterwalk - Remove scatterwalk_bytes_sglen crypto: omap - Stop using crypto scatterwalk_bytes_sglen crypto: skcipher - Remove top-level givcipher interface crypto: user - Remove crypto_lookup_skcipher call crypto: cts - Convert to skcipher ...
| * | | lib/mpi: Do not do sg_virtHerbert Xu2016-07-011-36/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the mpi SG helpers use sg_virt which is completely broken. It happens to work with normal kernel memory but will fail with anything that is not linearly mapped. This patch fixes this by using the SG iterator helpers. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | | crypto: rsa - Generate fixed-length outputHerbert Xu2016-07-011-29/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Every implementation of RSA that we have naturally generates output with leading zeroes. The one and only user of RSA, pkcs1pad wants to have those leading zeroes in place, in fact because they are currently absent it has to write those zeroes itself. So we shouldn't be stripping leading zeroes in the first place. In fact this patch makes rsa-generic produce output with fixed length so that pkcs1pad does not need to do any extra work. This patch also changes DH to use the new interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | | lib/mpi: refactor mpi_read_from_buffer() in terms of mpi_read_raw_data()Nicolai Stange2016-05-311-21/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mpi_read_from_buffer() and mpi_read_raw_data() do basically the same thing except that the former extracts the number of payload bits from the first two bytes of the input buffer. Besides that, the data copying logic is exactly the same. Replace the open coded buffer to MPI instance conversion by a call to mpi_read_raw_data(). Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | | lib/mpi: mpi_read_from_buffer(): sanitize short buffer printkNicolai Stange2016-05-311-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first two bytes of the input buffer encode its expected length and mpi_read_from_buffer() prints a console message if the given buffer is too short. However, there are some oddities with how this message is printed: - It is printed at the default loglevel. This is different from the one used in the case that the first two bytes' value is unsupportedly large, i.e. KERN_INFO. - The format specifier '%d' is used for unsigned ints. - It prints the values of nread and *ret_nread. This is redundant since the former is always the latter + 1. Clean this up as follows: - Use pr_info() rather than printk() with no loglevel. - Use the format specifiers '%u' in place if '%d'. - Do not print the redundant 'nread' but the more helpful 'nbytes' value. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | | lib/mpi: mpi_read_from_buffer(): return -EINVAL upon too short bufferNicolai Stange2016-05-311-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if the input buffer is shorter than the expected length as indicated by its first two bytes, an MPI instance of this expected length will be allocated and filled with as much data as is available. The rest will remain uninitialized. Instead of leaving this condition undetected, an error code should be reported to the caller. Since this situation indicates that the input buffer's first two bytes, encoding the number of expected bits, are garbled, -EINVAL is appropriate here. If the input buffer is shorter than indicated by its first two bytes, make mpi_read_from_buffer() return -EINVAL. Get rid of the 'nread' variable: with the new semantics, the total number of bytes read from the input buffer is known in advance. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | | lib/digsig: digsig_verify_rsa(): return -EINVAL if modulo length is zeroNicolai Stange2016-05-311-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if digsig_verify_rsa() detects that the modulo's length is zero, i.e. mlen == 0, it returns -ENOMEM which doesn't really fit here. Make digsig_verify_rsa() return -EINVAL upon mlen == 0. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | | lib/mpi: mpi_read_from_buffer(): return error codeNicolai Stange2016-05-312-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mpi_read_from_buffer() reads a MPI from a buffer into a newly allocated MPI instance. It expects the buffer's leading two bytes to contain the number of bits, followed by the actual payload. On failure, it returns NULL and updates the in/out argument ret_nread somewhat inconsistently: - If the given buffer is too short to contain the leading two bytes encoding the number of bits or their value is unsupported, then ret_nread will be cleared. - If the allocation of the resulting MPI instance fails, ret_nread is left as is. The only user of mpi_read_from_buffer(), digsig_verify_rsa(), simply checks for a return value of NULL and returns -ENOMEM if that happens. While this is all of cosmetic nature only, there is another error condition which currently isn't detectable by the caller of mpi_read_from_buffer(): if the given buffer is too small to hold the number of bits as encoded in its first two bytes, the return value will be non-NULL and *ret_nread > 0. In preparation of communicating this condition to the caller, let mpi_read_from_buffer() return error values by means of the ERR_PTR() mechanism. Make the sole caller of mpi_read_from_buffer(), digsig_verify_rsa(), check the return value for IS_ERR() rather than == NULL. If IS_ERR() is true, return the associated error value rather than the fixed -ENOMEM. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | | lib/mpi: mpi_read_raw_data(): fix nbits calculationNicolai Stange2016-05-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The number of bits, nbits, is calculated in mpi_read_raw_data() as follows: nbits = nbytes * 8; Afterwards, the number of leading zero bits of the first byte get subtracted: nbits -= count_leading_zeros(buffer[0]); However, count_leading_zeros() takes an unsigned long and thus, the u8 gets promoted to an unsigned long. Thus, the above doesn't subtract the number of leading zeros in the most significant nonzero input byte from nbits, but the number of leading zeros of the most significant nonzero input byte promoted to unsigned long, i.e. BITS_PER_LONG - 8 too many. Fix this by subtracting count_leading_zeros(...) - (BITS_PER_LONG - 8) from nbits only. Fixes: e1045992949 ("MPILIB: Provide a function to read raw data into an MPI") Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | | lib/mpi: mpi_read_raw_data(): purge redundant clearing of nbitsNicolai Stange2016-05-311-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In mpi_read_raw_data(), unsigned nbits is calculated as follows: nbits = nbytes * 8; and redundantly cleared later on if nbytes == 0: if (nbytes > 0) ... else nbits = 0; Purge this redundant clearing for the sake of clarity. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | | lib/mpi: purge mpi_set_buffer()Nicolai Stange2016-05-311-76/+0
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | mpi_set_buffer() has no in-tree users and similar functionality is provided by mpi_read_raw_data(). Remove mpi_set_buffer(). Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
OpenPOWER on IntegriCloud