| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
sysfs frequently performs duplication of strings located in read-only
memory section. Replacing kstrdup by kstrdup_const allows to avoid such
operations.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mike Turquette <mturquette@linaro.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
kstrdup() is often used to duplicate strings where neither source neither
destination will be ever modified. In such case we can just reuse the
source instead of duplicating it. The problem is that we must be sure
that the source is non-modifiable and its life-time is long enough.
I suspect the good candidates for such strings are strings located in
kernel .rodata section, they cannot be modifed because the section is
read-only and their life-time is equal to kernel life-time.
This small patchset proposes alternative version of kstrdup -
kstrdup_const, which returns source string if it is located in .rodata
otherwise it fallbacks to kstrdup. To verify if the source is in
.rodata function checks if the address is between sentinels
__start_rodata, __end_rodata. I guess it should work with all
architectures.
The main patch is accompanied by four patches constifying kstrdup for
cases where situtation described above happens frequently.
I have tested the patchset on mobile platform (exynos4210-trats) and it
saves 3272 string allocations. Since minimal allocation is 32 or 64
bytes depending on Kconfig options the patchset saves respectively about
100KB or 200KB of memory.
Stats from tested platform show that the main offender is sysfs:
By caller:
2260 __kernfs_new_node
631 clk_register+0xc8/0x1b8
318 clk_register+0x34/0x1b8
51 kmem_cache_create
12 alloc_vfsmnt
By string (with count >= 5):
883 power
876 subsystem
135 parameters
132 device
61 iommu_group
...
This patch (of 5):
Add an alternative version of kstrdup which returns pointer to constant
char array. The function checks if input string is in persistent and
read-only memory section, if yes it returns the input string, otherwise it
fallbacks to kstrdup.
kstrdup_const is accompanied by kfree_const performing conditional memory
deallocation of the string.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mike Turquette <mturquette@linaro.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 8f243af42ade ("sections: fix const sections for crc32 table")
removed the compile-time generated crc32 tables from the RO sections,
because it conflicts with the definition of __cacheline_aligned which
puts all such aligned data into .data..cacheline_aligned section
optimized for wasting less space, and can cause alignment issues when
used in combination with const with some gcc versions like 4.7.0 due to
a gcc bug [1].
Given that most gcc versions should have the fix by now, we can just use
____cacheline_aligned, which only aligns the data but doesn't move it
into specific sections as opposed to __cacheline_aligned. In case of
gcc versions having the mentioned bug, the alignment attribute will have
no effect, but the data will still be made RO.
After patch tables are in RO:
$ nm -v lib/crc32.o | grep -1 -E "crc32c?table"
0000000000000000 t arch_local_irq_enable
0000000000000000 r crc32ctable_le
0000000000000000 t crc32_exit
--
0000000000000960 t test_buf
0000000000002000 r crc32table_be
0000000000004000 r crc32table_le
000000001d1056e5 A __crc_crc32_be
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52181
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first of these conditionals is completely redundant: If k == lim-1, we
must have off==0, so the second conditional will also trigger and then it
wouldn't matter if upper had some high bits set. But the second
conditional is in fact also redundant, since it only serves to clear out
some high-order "don't care" bits of dst, about which no guarantee is
made.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
| |
We can shift the bits from lower and upper into place before assembling
dst[k + off]; moving the shift of lower into the branch where we already
know that rem is non-zero allows us to remove a conditional.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gcc can generate slightly better code for stuff like "nbits %
BITS_PER_LONG" when it knows nbits is not negative. Since negative size
bitmaps or shift amounts don't make sense, change these parameters of
bitmap_shift_right to unsigned.
If off >= lim (which requires shift >= nbits), k is initialized with a
large positive value, but since I've let k continue to be signed, the loop
will never run and dst will be zeroed as expected. Inside the loop, k is
guaranteed to be non-negative, so the fact that it is promoted to unsigned
in the various expressions it appears in is harmless.
Also use "shift" and "nbits" consistently for the parameter names.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
| |
If left is 0, we can just let mask be ~0UL, so that anding with it is a
no-op. Conveniently, BITMAP_LAST_WORD_MASK provides precisely what we
need, and we can eliminate left.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the condition k==lim-1 is true, we must have off == 0 (otherwise, k
could never become that big). But in that case we have upper == 0 and
hence dst[k] == (src[k] & mask) >> rem. Since mask consists of a
consecutive range of bits starting from the LSB, anding dst[k] with mask
is a no-op.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
| |
We can shift the bits from lower and upper into place before assembling
dst[k]; moving the shift of upper into the branch where we already know
that rem is non-zero allows us to remove a conditional.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've previously changed the nbits parameter of most bitmap_* functions to
unsigned; now it is bitmap_shift_{left,right}'s turn. This alone saves
some .text, but while at it I found that there were a few other things one
could do. The end result of these seven patches is
$ scripts/bloat-o-meter /tmp/bitmap.o.{old,new}
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-328 (-328)
function old new delta
__bitmap_shift_right 384 226 -158
__bitmap_shift_left 306 136 -170
and less importantly also a smaller stack footprint
$ stack-o-meter.pl master bitmap
file function old new delta
lib/bitmap.o __bitmap_shift_right 24 8 -16
lib/bitmap.o __bitmap_shift_left 24 0 -24
For each pair of 0 <= shift <= nbits <= 256 I've tested the end result
with a few randomly filled src buffers (including garbage beyond nbits),
in each case verifying that the shift {left,right}-most bits of dst are
zero and the remaining nbits-shift bits correspond to src, so I'm fairly
confident I didn't screw up. That hasn't stopped me from being wrong
before, though.
This patch (of 7):
gcc can generate slightly better code for stuff like "nbits %
BITS_PER_LONG" when it knows nbits is not negative. Since negative size
bitmaps or shift amounts don't make sense, change these parameters of
bitmap_shift_right to unsigned.
The expressions involving "lim - 1" are still ok, since if lim is 0 the
loop is never executed.
Also use "shift" and "nbits" consistently for the parameter names.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
| |
On little-endian, there's no reason to have an extra, presumably less
efficient, way of copying a bitmap.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make the prototype of bitmap_copy_le the same as bitmap_copy's. All other
bitmap_* functions take unsigned long* parameters; there's no reason this
should be special.
The only current user is the static inline uwb_mas_bm_copy_le, which
already does the void* laundering, so the end users can pass their u8 or
__le32 buffers without a cast.
Furthermore, this allows us to simply let bitmap_copy_le be an alias for
bitmap_copy on little-endian; see next patch.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds
Pull LED subsystem update from Bryan Wu:
"The big change of LED subsystem is introducing a new LED class for
Flash type LEDs which will be used for V4L2 subsystem.
Also we got some cleanup and fixes"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
leds: leds-gpio: Pass on error codes unmodified
DT: leds: Add led-sources property
leds: Add LED Flash class extension to the LED subsystem
leds: leds-mc13783: Use of_get_child_by_name() instead of refcount hack
leds: Use setup_timer
leds: Don't allow brightness values greater than max_brightness
DT: leds: Add flash LED devices related properties
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Instead of overriding error codes, pass them on unmodified. This
way a EPROBE_DEFER is correctly passed to the driver core. This results
in the LED driver correctly requesting probe deferral in cases the GPIO
controller is not yet available.
Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Reported-and-tested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add a property for defining device outputs the LED
represented by the DT child node is connected to.
Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Bryan Wu <cooloney@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ian Campbell <ijc+devicetree@hellion.org.uk>
Cc: Kumar Gala <galak@codeaurora.org>
Cc: devicetree@vger.kernel.org
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Some LED devices support two operation modes - torch and flash.
This patch provides support for flash LED devices in the LED subsystem
by introducing new sysfs attributes and kernel internal interface.
The attributes being introduced are: flash_brightness, flash_strobe,
flash_timeout, max_flash_timeout, max_flash_brightness, flash_fault,
flash_sync_strobe and available_sync_leds. All the flash related
features are placed in a separate module.
The modifications aim to be compatible with V4L2 framework requirements
related to the flash devices management. The design assumes that V4L2
sub-device can take of the LED class device control and communicate
with it through the kernel internal interface. When V4L2 Flash sub-device
file is opened, the LED class device sysfs interface is made
unavailable.
Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
of_find_node_by_name() calls of_node_put() on its "from" parameter.
To counter this, mc13xxx_led_probe_dt() calls of_node_get() first.
Use of_get_child_by_name() instead to get rid of the refcount hack.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: linux-leds@vger.kernel.org
Signed-off-by: Bryan Wu <cooloney@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Convert a call to init_timer and accompanying intializations of
the timer's data and function fields to a call to setup_timer.
A simplified version of the semantic match that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
expression t,f,d;
@@
-init_timer(&t);
+setup_timer(&t,f,d);
-t.function = f;
-t.data = d;
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since commit 4d71a4a12b13 ("leds: Add support for setting brightness in
a synchronous way") the value passed to brightness_set() is no longer
limited to max_brightness and can be different from the internally saved
brightness value.
Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Addition of a LED Flash class extension entails the need for flash LED
specific device tree properties. The properties being added are:
max-microamp, flash-max-microamp, flash-timeout-microsec.
(cooloney@gmail.com: remove white spaces)
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Jacek Anaszewski <j.anaszewski@samsung.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module update from Rusty Russell:
"Trivial cleanups, mainly"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
module: Replace over-engineered nested sleep
module: Annotate nested sleep in resolve_symbol()
module: Remove double spaces in module verification taint message
kernel/module.c: Free lock-classes if parse_args failed
module: set ksymtab/kcrctab* section addresses to 0x0
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Since the introduction of the nested sleep warning; we've established
that the occasional sleep inside a wait_event() is fine.
wait_event() loops are invariant wrt. spurious wakeups, and the
occasional sleep has a similar effect on them. As long as its occasional
its harmless.
Therefore replace the 'correct' but verbose wait_woken() thing with
a simple annotation to shut up the warning.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Because wait_event() loops are safe vs spurious wakeups we can allow the
occasional sleep -- which ends up being very similar.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The warning message when loading modules with a wrong signature has
two spaces in it:
"module verification failed: signature and/or required key missing"
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
parse_args call module parameters' .set handlers, which may use locks defined in the module.
So, these classes should be freed in case parse_args returns error(e.g. due to incorrect parameter passed).
Signed-off-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
These __ksymtab*/__kcrctab* sections currently have non-zero addresses.
Non-zero section addresses in a relocatable ELF confuse GDB and it ends
up not relocating all symbols when add-symbol-file is used on modules
which have exports. The kernel's module loader does not care about
these addresses, so let's just set them to zero.
Before:
$ readelf -S lib/notifier-error-inject.ko | grep 'Name\| __ksymtab_gpl'
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 8] __ksymtab_gpl PROGBITS 0000000c 0001b4 000010 00 A 0 0 4
(gdb) add-symbol-file lib/notifier-error-inject.ko 0x500000 -s .bss 0x700000
add symbol table from file "lib/notifier-error-inject.ko" at
.text_addr = 0x500000
.bss_addr = 0x700000
(gdb) p ¬ifier_err_inject_dir
$3 = (struct dentry **) 0x0
After:
$ readelf -S lib/notifier-error-inject.ko | grep 'Name\| __ksymtab_gpl'
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 8] __ksymtab_gpl PROGBITS 00000000 0001b4 000010 00 A 0 0 4
(gdb) add-symbol-file lib/notifier-error-inject.ko 0x500000 -s .bss 0x700000
add symbol table from file "lib/notifier-error-inject.ko" at
.text_addr = 0x500000
.bss_addr = 0x700000
(gdb) p ¬ifier_err_inject_dir
$3 = (struct dentry **) 0x700000
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull arch/tile changes from Chris Metcalf:
"Not much in this batch, just some minor cleanups"
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: change MAINTAINERS website from tilera.com to ezchip.com
tile: enable sparse checks for get/put_user
tile: fix put_user sparse errors
tile: default to little endian on older toolchains
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add an extra intermediate variable to __get_user and __put_user
to give sparse an opportunity to detect mismatches.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Use x86's __inttype macro instead of using the typeof(x-x) trick to
generate a suitable integer size type. This avoids a sparse warning
when examining the x-x type with a bitwise type.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Older toolchains may not specify __LITTLE_ENDIAN__, but older
toolchains were all little endian. Don't make things unnecessarily
hard for those toolchains.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This reverts commit 5fcee53ce705d49c766f8a302c7e93bdfc33c124.
It causes the suspend to fail on at least the Chromebook Pixel, possibly
other platforms too.
Joerg Roedel points out that the logic should probably have been
if (max_physical_apicid > 255 ||
!(IS_ENABLED(CONFIG_HYPERVISOR_GUEST) &&
hypervisor_x2apic_available())) {
instead, but since the code is not in any fast-path, so we can just live
without that optimization and just revert to the original code.
Acked-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull KVM update from Paolo Bonzini:
"Fairly small update, but there are some interesting new features.
Common:
Optional support for adding a small amount of polling on each HLT
instruction executed in the guest (or equivalent for other
architectures). This can improve latency up to 50% on some
scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This
also has to be enabled manually for now, but the plan is to
auto-tune this in the future.
ARM/ARM64:
The highlights are support for GICv3 emulation and dirty page
tracking
s390:
Several optimizations and bugfixes. Also a first: a feature
exposed by KVM (UUID and long guest name in /proc/sysinfo) before
it is available in IBM's hypervisor! :)
MIPS:
Bugfixes.
x86:
Support for PML (page modification logging, a new feature in
Broadwell Xeons that speeds up dirty page tracking), nested
virtualization improvements (nested APICv---a nice optimization),
usual round of emulation fixes.
There is also a new option to reduce latency of the TSC deadline
timer in the guest; this needs to be tuned manually.
Some commits are common between this pull and Catalin's; I see you
have already included his tree.
Powerpc:
Nothing yet.
The KVM/PPC changes will come in through the PPC maintainers,
because I haven't received them yet and I might end up being
offline for some part of next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
KVM: ia64: drop kvm.h from installed user headers
KVM: x86: fix build with !CONFIG_SMP
KVM: x86: emulate: correct page fault error code for NoWrite instructions
KVM: Disable compat ioctl for s390
KVM: s390: add cpu model support
KVM: s390: use facilities and cpu_id per KVM
KVM: s390/CPACF: Choose crypto control block format
s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
KVM: s390: reenable LPP facility
KVM: s390: floating irqs: fix user triggerable endless loop
kvm: add halt_poll_ns module parameter
kvm: remove KVM_MMIO_SIZE
KVM: MIPS: Don't leak FPU/DSP to guest
KVM: MIPS: Disable HTW while in guest
KVM: nVMX: Enable nested posted interrupt processing
KVM: nVMX: Enable nested virtual interrupt delivery
KVM: nVMX: Enable nested apic register virtualization
KVM: nVMX: Make nested control MSRs per-cpu
KVM: nVMX: Enable nested virtualize x2apic mode
KVM: nVMX: Prepare for using hardware MSR bitmap
...
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The header was deleted, so stop trying to install it.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
<asm/apic.h> isn't included directly and without CONFIG_SMP, an option
that automagically pulls it can't be enabled.
Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
NoWrite instructions (e.g. cmp or test) never set the "write access"
bit in the error code, even if one of the operands is treated as a
destination.
Fixes: c205fb7d7d4f81e46fc577b707ceb9e356af1456
Cc: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: fixes and features for kvm/next (3.20)
1. Fixes
- Fix user triggerable endless loop
- reenable LPP facility
- disable KVM compat ioctl on s390 (untested and broken)
2. cpu models for s390
- provide facilities and instruction blocking per VM
- add s390 specific vm attributes for setting values
3. crypto
- toleration patch for z13 support
4. add uuid and long name to /proc/sysinfo (stsi 322)
- patch Acked by Heiko Carstens (touches non-kvm s390 code)
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
We never had a 31bit QEMU/kuli running. We would need to review several
ioctls to check if this creates holes, bugs or whatever to make it work.
Lets just disable compat support for KVM on s390.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This patch enables cpu model support in kvm/s390 via the vm attribute
interface.
During KVM initialization, the host properties cpuid, IBC value and the
facility list are stored in the architecture specific cpu model structure.
During vcpu setup, these properties are taken to initialize the related SIE
state. This mechanism allows to adjust the properties from user space and thus
to implement different selectable cpu models.
This patch uses the IBC functionality to block instructions that have not
been implemented at the requested CPU type and GA level compared to the
full host capability.
Userspace has to initialize the cpu model before vcpu creation. A cpu model
change of running vcpus is not possible.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The patch introduces facilities and cpu_ids per virtual machine.
Different virtual machines may want to expose different facilities and
cpu ids to the guest, so let's make them per-vm instead of global.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
We need to specify a different format for the crypto control block
depending on whether the APXA facility is installed or not. Let's
test for it by executing the PQAP(QCI) function and use either a
format-1 or a format-2 crypto control block accordingly. This is a
host only change for z13 and does not affect the guest view.
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
A new architecture extends STSI 3.2.2 with UUID and long names. KVM
will provide the first implementation. This patch adds the additional
data fields (Extended Name and UUID) from the 4KB block returned by
the STSI 3.2.2 command and reflect this information in the
/proc/sysinfo file accordingly.
Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
commit 7be81a46695d ("KVM: s390/facilities: allow TOD-CLOCK steering
facility bit") accidentially disabled the "load program parameter"
facility bit during rebase for upstream submission (my fault).
Re-add that bit.
As this is only for a performance measurement helper instruction
(used by KVM itself) cc stable is not necessary see
http://www-01.ibm.com/support/docview.wss?uid=isg26fcd1cc32246f4c8852574ce0044734a
(SA23-2260 The Load-Program-Parameter and CPU-Measurement Facilities)
for details about LPP and its usecase.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Fixes: 7be81a46695d ("KVM: s390/facilities: allow TOD-CLOCK steering")
|
| |/ / / /
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If a vm with no VCPUs is created, the injection of a floating irq
leads to an endless loop in the kernel.
Let's skip the search for a destination VCPU for a floating irq if no
VCPUs were created.
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
After f78146b0f923, "KVM: Fix page-crossing MMIO", and
87da7e66a405, "KVM: x86: fix vcpu->mmio_fragments overflow",
actually KVM_MMIO_SIZE is gone.
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The FPU and DSP are enabled via the CP0 Status CU1 and MX bits by
kvm_mips_set_c0_status() on a guest exit, presumably in case there is
active state that needs saving if pre-emption occurs. However neither of
these bits are cleared again when returning to the guest.
This effectively gives the guest access to the FPU/DSP hardware after
the first guest exit even though it is not aware of its presence,
allowing FP instructions in guest user code to intermittently actually
execute instead of trapping into the guest OS for emulation. It will
then read & manipulate the hardware FP registers which technically
belong to the user process (e.g. QEMU), or are stale from another user
process. It can also crash the guest OS by causing an FP exception, for
which a guest exception handler won't have been registered.
First lets save and disable the FPU (and MSA) state with lose_fpu(1)
before entering the guest. This simplifies the problem, especially for
when guest FPU/MSA support is added in the future, and prevents FR=1 FPU
state being live when the FR bit gets cleared for the guest, which
according to the architecture causes the contents of the FPU and vector
registers to become UNPREDICTABLE.
We can then safely remove the enabling of the FPU in
kvm_mips_set_c0_status(), since there should never be any active FPU or
MSA state to save at pre-emption, which should plug the FPU leak.
DSP state is always live rather than being lazily restored, so for that
it is simpler to just clear the MX bit again when re-entering the guest.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # v3.10+: 044f0f03eca0: MIPS: KVM: Deliver guest interrupts
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Ensure any hardware page table walker (HTW) is disabled while in KVM
guest mode, as KVM doesn't yet set up hardware page table walking for
guest mappings so the wrong mappings would get loaded, resulting in the
guest hanging or crashing once it reaches userland.
The HTW is disabled and re-enabled around the call to
__kvm_mips_vcpu_run() which does the initial switch into guest mode and
the final switch out of guest context. Additionally it is enabled for
the duration of guest exits (i.e. kvm_mips_handle_exit()), getting
disabled again before returning back to guest or host.
In all cases the HTW is only disabled in normal kernel mode while
interrupts are disabled, so that the HTW doesn't get left disabled if
the process is preempted.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # v3.17+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If vcpu has a interrupt in vmx non-root mode, injecting that interrupt
requires a vmexit. With posted interrupt processing, the vmexit
is not needed, and interrupts are fully taken care of by hardware.
In nested vmx, this feature avoids much more vmexits than non-nested vmx.
When L1 asks L0 to deliver L1's posted interrupt vector, and the target
VCPU is in non-root mode, we use a physical ipi to deliver POSTED_INTR_NV
to the target vCPU. Using POSTED_INTR_NV avoids unexpected interrupts
if a concurrent vmexit happens and L1's vector is different with L0's.
The IPI triggers posted interrupt processing in the target physical CPU.
In case the target vCPU was not in guest mode, complete the posted
interrupt delivery on the next entry to L2.
Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
With virtual interrupt delivery, the hardware lets KVM use a more
efficient mechanism for interrupt injection. This is an important feature
for nested VMX, because it reduces vmexits substantially and they are
much more expensive with nested virtualization. This is especially
important for throughput-bound scenarios.
Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|