summaryrefslogtreecommitdiffstats
path: root/tools
Commit message (Collapse)AuthorAgeFilesLines
* radix_tree: tag all internal tree nodes as indirect pointersMatthew Wilcox2016-03-171-2/+3
| | | | | | | | | | | | | | | | Set the 'indirect_ptr' bit on all the pointers to internal nodes, not just on the root node. This enables the following patches to support multi-order entries in the radix tree. This patch is split out for ease of bisection. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* radix tree test harnessMatthew Wilcox2016-03-1736-0/+2110
| | | | | | | | | | | | | | | | | | | | | | | | | | This code is mostly from Andrew Morton and Nick Piggin; tarball downloaded from http://ozlabs.org/~akpm/rtth.tar.gz with sha1sum 0ce679db9ec047296b5d1ff7a1dfaa03a7bef1bd Some small modifications were necessary to the test harness to fix the build with the current Linux source code. I also made minor modifications to automatically test the radix-tree.c and radix-tree.h files that are in the current source tree, as opposed to a copied and slightly modified version. I am sure more could be done to tidy up the harness, as well as adding more tests. [koct9i@gmail.com: fix compilation] Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* tools/vm/page-types.c: avoid memset() in walk_pfn() when count == 1Naoya Horiguchi2016-03-171-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found that page-types is very slow and my testing shows many timeout errors. Here's an example with a simple program allocating 1000 thps. $ time ./page-types -p $(pgrep -f test_alloc) ... real 0m17.201s user 0m16.889s sys 0m0.312s Most of time is spent in memset(). Currently memset() clears over whole buffer for every walk_pfn() call, which is inefficient when walk_pfn() is called from walk_vma(), because in that case walk_pfn() is called for each pfn. So this patch limits the zero initialization only for the first element. $ time ./page-types.patched -p $(pgrep -f test_alloc) ... real 0m0.182s user 0m0.046s sys 0m0.135s Fixes: 954e95584579 ("tools/vm/page-types.c: add memory cgroup dumping and filtering") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Suggested-by: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* tools/vm/page-types.c: add memory cgroup dumping and filteringKonstantin Khlebnikov2016-03-171-15/+84
| | | | | | | | | | | | | | This adds two command line keys: -c|--cgroup path|@inode Walk only pages owned by this memory cgroup -C|--list-cgroup Show memory cgroup inodes [vdavydov@virtuozzo.com: opt_cgroup should be uint64_t. Fix conflicts with "tools/vm/page-types.c: support swap entry"] Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* tools/vm/page-types.c: support swap entryNaoya Horiguchi2016-03-171-1/+29
| | | | | | | | | | | /proc/pid/pagemap (pte_to_pagemap_entry() internally) already reports about swap entry, so let's make the in-kernel utility aware of it. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'libnvdimm-for-4.6' of ↵Linus Torvalds2016-03-161-63/+222
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: - Asynchronous address range scrub: Given the capacities of next generation persistent memory devices a scrub operation to find all poison may take 10s of seconds. We want this scrub work to be done asynchronously with the rest of system initialization, so we move it out of line from the NFIT probing, i.e. acpi_nfit_add(). - Clear poison: ACPI 6.1 introduces the ability to send "clear error" commands to the ACPI0012:00 device representing the root of an "nvdimm bus". Similar to relocating a bad block on a disk, this support clears media errors in response to a write. - Persistent memory resource tracking: A persistent memory range may be designated as simply "reserved" by platform firmware in the efi/e820 memory map. Later when the NFIT driver loads it discovers that the range is "Persistent Memory". The NFIT bus driver inserts a resource to advertise that "persistent" attribute in the system resource tree for /proc/iomem and kernel-internal usages. - Miscellaneous cleanups and fixes: Workaround section misaligned pmem ranges when allocating a struct page memmap, fix handling of the read-only case in the ioctl path, and clean up block device major number allocation. * tag 'libnvdimm-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits) libnvdimm, pmem: clear poison on write libnvdimm, pmem: fix kmap_atomic() leak in error path nvdimm/btt: don't allocate unused major device number nvdimm/blk: don't allocate unused major device number pmem: don't allocate unused major device number ACPI: Change NFIT driver to insert new resource resource: Export insert_resource and remove_resource resource: Add remove_resource interface resource: Change __request_region to inherit from immediate parent libnvdimm, pmem: fix ia64 build, use PHYS_PFN nfit, libnvdimm: clear poison command support libnvdimm, pfn: 'resource'-address and 'size' attributes for pfn devices libnvdimm, pmem: adjust for section collisions with 'System RAM' libnvdimm, pmem: fix 'pfn' support for section-misaligned namespaces libnvdimm: Fix security issue with DSM IOCTL. libnvdimm: Clean-up access mode check. tools/testing/nvdimm: expand ars unit testing nfit: disable userspace initiated ars during scrub nfit: scrub and register regions in a workqueue nfit, libnvdimm: async region scrub workqueue ...
| * nfit, libnvdimm: clear poison command supportDan Williams2016-03-051-0/+29
| | | | | | | | | | | | | | | | | | Add the boiler-plate for a 'clear error' command based on section 9.20.7.6 "Function Index 4 - Clear Uncorrectable Error" from the ACPI 6.1 specification, and add a reference implementation in nfit_test. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * tools/testing/nvdimm: expand ars unit testingDan Williams2016-03-051-22/+90
| | | | | | | | | | | | | | Simulate platform-firmware-initiated and asynchronous scrub results. This injects poison in the middle of all nfit_test pmem address ranges. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * nfit, tools/testing/nvdimm: unify common init for acpi_nfit_descDan Williams2016-03-051-19/+3
| | | | | | | | | | | | | | | | | | The nvdimm unit test infrastructure performs its own initialization of an acpi_nfit_desc to specify test overrides over the native implementation. Make it clear which attributes and operations it is overriding by re-using acpi_nfit_init_desc() as a common starting point. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * libnvdimm, nfit: centralize command status translationDan Williams2016-03-051-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The return value from an 'ndctl_fn' reports the command execution status, i.e. was the command properly formatted and was it successfully submitted to the bus provider. The new 'cmd_rc' parameter allows the bus provider to communicate command specific results, translated into common error codes. Convert the ARS commands to this scheme to: 1/ Consolidate status reporting 2/ Prepare for for expanding ars unit test cases 3/ Make the implementation more generic Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * nfit, tools/testing/nvdimm: test multiple control regions per-dimmDan Williams2016-03-051-24/+94
| | | | | | | | | | | | | | | | ACPI 6.1 clarifies that "The system shall include an NVDIMM Control Region Structure for every Function Interface in the NVDIMM." Implement this clarification in nfit_test. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * nfit, tools/testing/nvdimm: add format interface code definitionsDan Williams2016-03-051-1/+6
| | | | | | | | | | | | | | ACPI 6.1 and JEDEC Annex L Release 3 formalize the format interface code. Add definitions and update their usage in the unit test. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | Merge tag 'pm+acpi-4.6-rc1-1' of ↵Linus Torvalds2016-03-162-193/+720
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "This time the majority of changes go into cpufreq and they are significant. First off, the way CPU frequency updates are triggered is different now. Instead of having to set up and manage a deferrable timer for each CPU in the system to evaluate and possibly change its frequency periodically, cpufreq governors set up callbacks to be invoked by the scheduler on a regular basis (basically on utilization updates). The "old" governors, "ondemand" and "conservative", still do all of their work in process context (although that is triggered by the scheduler now), but intel_pstate does it all in the callback invoked by the scheduler with no need for any additional asynchronous processing. Of course, this eliminates the overhead related to the management of all those timers, but also it allows the cpufreq governor code to be simplified quite a bit. On top of that, the common code and data structures used by the "ondemand" and "conservative" governors are cleaned up and made more straightforward and some long-standing and quite annoying problems are addressed. In particular, the handling of governor sysfs attributes is modified and the related locking becomes more fine grained which allows some concurrency problems to be avoided (particularly deadlocks with the core cpufreq code). In principle, the new mechanism for triggering frequency updates allows utilization information to be passed from the scheduler to cpufreq. Although the current code doesn't make use of it, in the works is a new cpufreq governor that will make decisions based on the scheduler's utilization data. That should allow the scheduler and cpufreq to work more closely together in the long run. In addition to the core and governor changes, cpufreq drivers are updated too. Fixes and optimizations go into intel_pstate, the cpufreq-dt driver is updated on top of some modification in the Operating Performance Points (OPP) framework and there are fixes and other updates in the powernv cpufreq driver. Apart from the cpufreq updates there is some new ACPICA material, including a fix for a problem introduced by previous ACPICA updates, and some less significant changes in the ACPI code, like CPPC code optimizations, ACPI processor driver cleanups and support for loading ACPI tables from initrd. Also updated are the generic power domains framework, the Intel RAPL power capping driver and the turbostat utility and we have a bunch of traditional assorted fixes and cleanups. Specifics: - Redesign of cpufreq governors and the intel_pstate driver to make them use callbacks invoked by the scheduler to trigger CPU frequency evaluation instead of using per-CPU deferrable timers for that purpose (Rafael Wysocki). - Reorganization and cleanup of cpufreq governor code to make it more straightforward and fix some concurrency problems in it (Rafael Wysocki, Viresh Kumar). - Cleanup and improvements of locking in the cpufreq core (Viresh Kumar). - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh Kumar, Eric Biggers). - intel_pstate driver updates including fixes, optimizations and a modification to make it enable enable hardware-coordinated P-state selection (HWP) by default if supported by the processor (Philippe Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe Franciosi). - Operating Performance Points (OPP) framework updates to improve its handling of voltage regulators and device clocks and updates of the cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter). - Updates of the powernv cpufreq driver to fix initialization and cleanup problems in it and correct its worker thread handling with respect to CPU offline, new powernv_throttle tracepoint (Shilpasri Bhat). - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki). - ACPICA updates including one fix for a regression introduced by previos changes in the ACPICA code (Bob Moore, Lv Zheng, David Box, Colin Ian King). - Support for installing ACPI tables from initrd (Lv Zheng). - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin Chaugule). - Support for _HID(ACPI0010) devices (ACPI processor containers) and ACPI processor driver cleanups (Sudeep Holla). - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory, Aleksey Makarov). - Modification of the ACPI PCI IRQ management code to make it treat 255 in the Interrupt Line register as "not connected" on x86 (as per the specification) and avoid attempts to use that value as a valid interrupt vector (Chen Fan). - ACPI APEI fixes related to resource leaks (Josh Hunt). - Removal of modularity from a few ACPI drivers (BGRT, GHES, intel_pmic_crc) that cannot be built as modules in practice (Paul Gortmaker). - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS as a valid resource type (Harb Abdulhamid). - New device ID (future AMD I2C controller) in the ACPI driver for AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu). - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin). - cpuidle menu governor optimization to avoid a square root computation in it (Rasmus Villemoes). - Fix for potential use-after-free in the generic device properties framework (Heikki Krogerus). - Updates of the generic power domains (genpd) framework including support for multiple power states of a domain, fixes and debugfs output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart, Geert Uytterhoeven). - Intel RAPL power capping driver updates to reduce IPI overhead in it (Jacob Pan). - System suspend/hibernation code cleanups (Eric Biggers, Saurabh Sengar). - Year 2038 fix for the process freezer (Abhilash Jindal). - turbostat utility updates including new features (decoding of more registers and CPUID fields, sub-second intervals support, GFX MHz and RC6 printout, --out command line option), fixes (syscall jitter detection and workaround, reductioin of the number of syscalls made, fixes related to Xeon x200 processors, compiler warning fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)" * tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits) tools/power turbostat: bugfix: TDP MSRs print bits fixing tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump tools/power turbostat: call __cpuid() instead of __get_cpuid() tools/power turbostat: indicate SMX and SGX support tools/power turbostat: detect and work around syscall jitter tools/power turbostat: show GFX%rc6 tools/power turbostat: show GFXMHz tools/power turbostat: show IRQs per CPU tools/power turbostat: make fewer systems calls tools/power turbostat: fix compiler warnings tools/power turbostat: add --out option for saving output in a file tools/power turbostat: re-name "%Busy" field to "Busy%" tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding tools/power turbostat: Intel Xeon x200: fix erroneous bclk value tools/power turbostat: allow sub-sec intervals ACPI / APEI: ERST: Fixed leaked resources in erst_init ACPI / APEI: Fix leaked resources intel_pstate: Do not skip samples partially intel_pstate: Remove freq calculation from intel_pstate_calc_busy() intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance() ...
| * \ Merge branch 'turbostat' of ↵Rafael J. Wysocki2016-03-142-197/+724
| |\ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-tools Pull turbostat updates for 4.6 from Len Brown. * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: bugfix: TDP MSRs print bits fixing tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump tools/power turbostat: call __cpuid() instead of __get_cpuid() tools/power turbostat: indicate SMX and SGX support tools/power turbostat: detect and work around syscall jitter tools/power turbostat: show GFX%rc6 tools/power turbostat: show GFXMHz tools/power turbostat: show IRQs per CPU tools/power turbostat: make fewer systems calls tools/power turbostat: fix compiler warnings tools/power turbostat: add --out option for saving output in a file tools/power turbostat: re-name "%Busy" field to "Busy%" tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding tools/power turbostat: Intel Xeon x200: fix erroneous bclk value tools/power turbostat: allow sub-sec intervals tools/power turbostat: Decode MSR_MISC_PWR_MGMT tools/power turbostat: decode HWP registers x86 msr-index: Simplify syntax for HWP fields tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency tools/power turbostat: decode more CPUID fields
| | * tools/power turbostat: bugfix: TDP MSRs print bits fixingChen Yu2016-03-131-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MSR_CONFIG_TDP_NOMINAL: should print all 8 bits of base_ratio (bit 0:7) 0xFF MSR_CONFIG_TDP_LEVEL_1: should print all 15 bits of PKG_MIN_PWR_LVL1 (bit 48:62) 0x7FFF should print all 15 bits of PKG_MAX_PWR_LVL1 (bit 32:46) 0x7FFF should print all 8 bits of LVL1_RATIO (bit 16:23) 0xFF should print all 15 bits of PKG_TDP_LVL1 (bit 0:14) 0x7FFF And the same modification to MSR_CONFIG_TDP_LEVEL_2. MSR_TURBO_ACTIVATION_RATIO: should print all 8 bits of MAX_NON_TURBO_RATIO (bit 0:7) 0xFF Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dumpLen Brown2016-03-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=0: unlimited) should print as MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=8: unlimited) Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: call __cpuid() instead of __get_cpuid()Len Brown2016-03-131-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | turbostat already checks whether calling each cpuid leavf is legal, and it doesn't look at the function return value, so call the simpler gcc intrinsic __cpuid() instead of __get_cpuid(). syntax only, no functional change Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: indicate SMX and SGX supportLen Brown2016-03-131-1/+27
| | | | | | | | | | | | | | | | | | | | | SGX presence is related to a SKL power workaround, so lets show when that is enabled. Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: detect and work around syscall jitterLen Brown2016-03-131-1/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The accuracy of Bzy_Mhz and Busy% depend on reading the TSC, APERF, and MPERF close together in time. When there is a very short measurement interval, or a large system is profoundly idle, the changes in APERF and MPERF may be very small. They can be small enough that an expensive interrupt between reading APERF and MPERF can cause the APERF/MPERF ratio to become inaccurate, resulting in invalid calculation and display of Bzy_MHz. A dummy APERF read of APERF makes this problem much more rare. Apparently this 1st systemn call after exiting a long stretch of idle is when we typically see expensive timer interrupts that cause large jitter. For the cases that dummy APERF read fails to prevent, we compare the latency of the APERF and MPERF reads. If they differ by more than 2x, we re-issue them. Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: show GFX%rc6Len Brown2016-03-131-0/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The column "GFX%c6" show the percentage of time the GPU is in the "render C6" state, rc6. Deep package C-states on several systems depend on the GPU being in RC6. This information comes from the counter /sys/class/drm/card0/power/rc6_residency_ms, as read before and after the measurement interval. Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: show GFXMHzLen Brown2016-03-131-1/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under the column "GFXMHz", show a snapshot of this attribute: /sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz This is an instantaneous snapshot of what sysfs presents at the end of the measurement interval. turbostat does not average or otherwise perform any math on this value. Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: show IRQs per CPULen Brown2016-03-131-4/+122
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new IRQ column shows how many interrupts have occurred on each CPU during the measurement inteval. This information comes from the difference between /proc/interrupts shapshots made before and after the measurement interval. The first row, the system summary, shows the sum of the IRQS for all CPUs during that interval. Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: make fewer systems callsLen Brown2016-03-131-10/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skip the open(2)/close(2) on each msr read by keeping the /dev/cpu/*/msr files open. The remaining read(2) is generally far fewer cycles than the removed open(2) system call. Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: fix compiler warningsLen Brown2016-03-131-4/+4
| | | | | | | | | | | | Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: add --out option for saving output in a fileLen Brown2016-03-132-135/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By default... Turbostat --debug gconfiguration info goes to stderr. In FORK mode, turbostat statistics go to stderr. In PERIODIC mode, turbostat statistics go to stdout. These defaults do not change, but an option "--out file" will send all output above only to the specified file. Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: re-name "%Busy" field to "Busy%"Len Brown2016-03-132-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | some tools processing turbostat output have difficulty with items that begin with %... Reported-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: Intel Xeon x200: fix turbo-ratio decodingHubert Chrzaniuk2016-03-131-27/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Following changes have been made: - changed MSR_NHM_TURBO_RATIO_LIMIT to MSR_TURBO_RATIO_LIMIT in debug print for consistency with Developer Manual - updated definition of bitfields in MSR_TURBO_RATIO_LIMIT and appropriate parsing code - added x200 to list of architectures that do not support Nahlem compatible definition of MSR_TURBO_RATIO_LIMIT register (x200 has the register but bits definition is custom) - fixed typo in code that parses MSR_TURBO_RATIO_LIMIT (logical instead of bitwise operator) - changed MSR_TURBO_RATIO_LIMIT parsing algorithm so the print out had the same order as implementations for other platforms Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: Intel Xeon x200: fix erroneous bclk valueChrzaniuk, Hubert2016-03-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | x200 does not enable any way to programmatically obtain bus clock speed. Bclk for the architecture has a fixed value of 100 MHz. At the same time x200 cannot be included in has_snb_msrs since it does not support C7 idle state. prior to this patch, MHz values reported on this chip were erroneously calculated using bclk of 133MHz, causing MHz values to be reported 33% higher than actual. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: allow sub-sec intervalsLen Brown2016-03-132-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | turbostat -i interval_sec will sample and display statistics every interval_sec. interval_sec used to be a whole number of seconds, but now we accept a decimal, as small as 0.001 sec (1 ms). Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: Decode MSR_MISC_PWR_MGMTLen Brown2016-02-171-0/+23
| | | | | | | | | | | | | | | | | | | | | This MSR is helpful to show if P-state HW coordination is enabled or disabled. Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: decode HWP registersLen Brown2016-02-171-6/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | # turbostat --debug ... CPUID(6): ... HWP, HWPnotify, HWPwindow, HWPepp, HWPpkg ... ... cpu0: MSR_PM_ENABLE: 0x00000001 (HWP) cpu0: MSR_HWP_CAPABILITIES: 0x01050916 (high 0x16 guar 0x9 eff 0x5 low 0x1) cpu0: MSR_HWP_REQUEST: 0x80001604 (min 0x4 max 0x16 des 0x0 epp 0x80 window 0x0 pkg 0x0) cpu0: MSR_HWP_INTERRUPT: 0x00000001 (EN_Guaranteed_Perf_Change, Dis_Excursion_Min) cpu0: MSR_HWP_STATUS: 0x00000000 (No-Guaranteed_Perf_Change, No-Excursion_Min) Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequencyLen Brown2016-02-171-5/+18
| | | | | | | | | | | | | | | | | | | | | | | | This CPUID leaf is available on Skylake: CPUID(0x16): base_mhz: 1500 max_mhz: 2200 bus_mhz: 100 Signed-off-by: Len Brown <len.brown@intel.com>
| | * tools/power turbostat: decode more CPUID fieldsLen Brown2016-02-171-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for debugging, dump a few more fields: CPUID(1): SSE3 MONITOR EIST TM2 TSC MSR ACPI-TM TM cpu0: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MONITOR) Signed-off-by: Len Brown <len.brown@intel.com>
* | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2016-03-162-21/+30
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge first patch-bomb from Andrew Morton: - some misc things - ofs2 updates - about half of MM - checkpatch updates - autofs4 update * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits) autofs4: fix string.h include in auto_dev-ioctl.h autofs4: use pr_xxx() macros directly for logging autofs4: change log print macros to not insert newline autofs4: make autofs log prints consistent autofs4: fix some white space errors autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked() autofs4: fix coding style line length in autofs4_wait() autofs4: fix coding style problem in autofs4_get_set_timeout() autofs4: coding style fixes autofs: show pipe inode in mount options kallsyms: add support for relative offsets in kallsyms address table kallsyms: don't overload absolute symbol type for percpu symbols x86: kallsyms: disable absolute percpu symbols on !SMP checkpatch: fix another left brace warning checkpatch: improve UNSPECIFIED_INT test for bare signed/unsigned uses checkpatch: warn on bare unsigned or signed declarations without int checkpatch: exclude asm volatile from complex macro check mm: memcontrol: drop unnecessary lru locking from mem_cgroup_migrate() mm: migrate: consolidate mem_cgroup_migrate() calls mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous ...
| * | | mm, tracing: unify mm flags handling in tracepoints and printkVlastimil Babka2016-03-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In tracepoints, it's possible to print gfp flags in a human-friendly format through a macro show_gfp_flags(), which defines a translation array and passes is to __print_flags(). Since the following patch will introduce support for gfp flags printing in printk(), it would be nice to reuse the array. This is not straightforward, since __print_flags() can't simply reference an array defined in a .c file such as mm/debug.c - it has to be a macro to allow the macro magic to communicate the format to userspace tools such as trace-cmd. The solution is to create a macro __def_gfpflag_names which is used both in show_gfp_flags(), and to define the gfpflag_names[] array in mm/debug.c. On the other hand, mm/debug.c also defines translation tables for page flags and vma flags, and desire was expressed (but not implemented in this series) to use these also from tracepoints. Thus, this patch also renames the events/gfpflags.h file to events/mmflags.h and moves the table definitions there, using the same macro approach as for gfpflags. This allows translating all three kinds of mm-specific flags both in tracepoints and printk. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Michal Hocko <mhocko@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | tools, perf: make gfp_compact_table up to dateVlastimil Babka2016-03-151-19/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When updating tracing's show_gfp_flags() I have noticed that perf's gfp_compact_table is also outdated. Fill in the missing flags and place a note in gfp.h to increase chance that future updates are synced. Convert the __GFP_X flags from "GFP_X" to "__GFP_X" strings in line with the previous patch. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | slub: convert SLAB_DEBUG_FREE to SLAB_CONSISTENCY_CHECKSLaura Abbott2016-03-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SLAB_DEBUG_FREE allows expensive consistency checks at free to be turned on or off. Expand its use to be able to turn off all consistency checks. This gives a nice speed up if you only want features such as poisoning or tracing. Credit to Mathias Krause for the original work which inspired this series Signed-off-by: Laura Abbott <labbott@fedoraproject.org> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mathias Krause <minipli@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds2016-03-151-2/+4
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes in this cycle were: - Miscellaneous fixes, cleanups, restructuring. - RCU torture-test updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Export rcu_gp_is_normal() rcu: Remove rcu_user_hooks_switch rcu: Catch up rcu_report_qs_rdp() comment with reality rcu: Document unique-name limitation for DEFINE_STATIC_SRCU() rcu: Make rcu/tiny_plugin.h explicitly non-modular irq: Privatize irq_common_data::state_use_accessors RCU: Privatize rcu_node::lock sparse: Add __private to privatize members of structs rcu: Remove useless rcu_data_p when !PREEMPT_RCU rcutorture: Correct no-expedite console messages rcu: Set rdp->gpwrap when CPU is idle rcu: Stop treating in-kernel CPU-bound workloads as errors rcu: Update rcu_report_qs_rsp() comment rcu: Assign false instead of 0 for ->core_needs_qs rcutorture: Check for self-detected stalls rcutorture: Don't keep empty console.log.diags files rcutorture: Add checks for rcutorture writer starvation
| * \ \ \ Merge commit 'torture.2015.02.23a' into core/rcuIngo Molnar2016-03-151-2/+4
| |\ \ \ \ | | |/ / / | |/| | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | rcutorture: Check for self-detected stallsPaul E. McKenney2016-01-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current scripts parse console output only for cases where one CPU detect a stall on some other CPU or task. This commit therefore adds checks for self-detected stalls. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | * | | rcutorture: Don't keep empty console.log.diags filesPaul E. McKenney2016-01-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, an error-free run produces an empty console.log.diags file. This can be annoying when using "vi */console.log.diags" to see a full summary of the errors. This commit therefore removes any empty files during the analysis process. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | * | | rcutorture: Add checks for rcutorture writer starvationPaul E. McKenney2016-01-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds checks for rcutorture writer starvation, so that instances will be added to the test summary. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* | | | | Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds2016-03-155-44/+501
|\ \ \ \ \ | |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "This is another big update. Main changes are: - lots of x86 system call (and other traps/exceptions) entry code enhancements. In particular the complex parts of the 64-bit entry code have been migrated to C code as well, and a number of dusty corners have been refreshed. (Andy Lutomirski) - vDSO special mapping robustification and general cleanups (Andy Lutomirski) - cpufeature refactoring, cleanups and speedups (Borislav Petkov) - lots of other changes ..." * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) x86/cpufeature: Enable new AVX-512 features x86/entry/traps: Show unhandled signal for i386 in do_trap() x86/entry: Call enter_from_user_mode() with IRQs off x86/entry/32: Change INT80 to be an interrupt gate x86/entry: Improve system call entry comments x86/entry: Remove TIF_SINGLESTEP entry work x86/entry/32: Add and check a stack canary for the SYSENTER stack x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed x86/entry: Vastly simplify SYSENTER TF (single-step) handling x86/entry/traps: Clear DR6 early in do_debug() and improve the comment x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions x86/entry/32: Restore FLAGS on SYSEXIT x86/entry/32: Filter NT and speed up AC filtering in SYSENTER x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test selftests/x86: In syscall_nt, test NT|TF as well x86/asm-offsets: Remove PARAVIRT_enabled x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled uprobes: __create_xol_area() must nullify xol_mapping.fault x86/cpufeature: Create a new synthetic cpu capability for machine check recovery ...
| * | | | selftests/x86: In syscall_nt, test NT|TF as wellAndy Lutomirski2016-03-101-8/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setting TF prevents fastpath returns in most cases, which causes the test to fail on 32-bit kernels because 32-bit kernels do not, in fact, handle NT correctly on SYSENTER entries. The next patch will fix 32-bit kernels. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/bd4bb48af6b10c0dc84aec6dbcf487ed25683495.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | Merge tag 'v4.5-rc7' into x86/asm, to pick up SMAP fixIngo Molnar2016-03-072-12/+11
| |\ \ \ \ | | | |/ / | | |/| | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | Merge branch 'x86/urgent' into x86/asm, to pick up fixesIngo Molnar2016-02-1828-63/+1510
| |\ \ \ \ | | | | | | | | | | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | selftests/x86: Add a test for syscall restart under ptraceAndy Lutomirski2016-02-171-0/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This catches a regression from the compat syscall rework. The 32-bit variant of this test currently fails. The issue is that, for a 32-bit tracer and a 32-bit tracee, GETREGS+SETREGS with no changes should be a no-op. It currently isn't a no-op if RAX indicates signal restart, because the high bits get cleared and the kernel loses track of the restart state. Reported-by: Robert O'Callahan <robert@ocallahan.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/c4040b40b5b4a37ed31375a69b683f753ec6788a.1455142412.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | selftests/x86: Fix some error messages in ptrace_syscallAndy Lutomirski2016-02-171-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I had some obvious typos. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e5e6772d4802986cf7df702e646fa24ac14f2204.1455142412.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | selftests/x86: Add tests for UC_SIGCONTEXT_SS and UC_STRICT_RESTORE_SSAndy Lutomirski2016-02-171-28/+202
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This tests the two ABI-preserving cases that DOSEMU cares about, and it also explicitly tests the new UC_SIGCONTEXT_SS and UC_STRICT_RESTORE_SS flags. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Stas Sergeev <stsp@list.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f3d08f98541d0bd3030ceb35e05e21f59e30232c.1455664054.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | x86/signal/64: Re-add support for SS in the 64-bit signal contextAndy Lutomirski2016-02-171-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a second attempt to make the improvements from c6f2062935c8 ("x86/signal/64: Fix SS handling for signals delivered to 64-bit programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add support for SS in the 64-bit signal context"). This adds two new uc_flags flags. UC_SIGCONTEXT_SS will be set for all 64-bit signals (including x32). It indicates that the saved SS field is valid and that the kernel supports the new behavior. The goal is to fix a problems with signal handling in 64-bit tasks: SS wasn't saved in the 64-bit signal context, making it awkward to determine what SS was at the time of signal delivery and making it impossible to return to a non-flat SS (as calling sigreturn clobbers SS). This also made it extremely difficult for 64-bit tasks to return to fully-defined 16-bit contexts, because only the kernel can easily do espfix64, but sigreturn was unable to set a non-flag SS:ESP. (DOSEMU has a monstrous hack to partially work around this limitation.) If we could go back in time, the correct fix would be to make 64-bit signals work just like 32-bit signals with respect to SS: save it in signal context, reset it when delivering a signal, and restore it in sigreturn. Unfortunately, doing that (as I tried originally) breaks DOSEMU: DOSEMU wouldn't reset the signal context's SS when clearing the LDT and changing the saved CS to 64-bit mode, since it predates the SS context field existing in the first place. This patch is a bit more complicated, and it tries to balance a bunch of goals. It makes most cases of changing ucontext->ss during signal handling work as expected. I do this by special-casing the interesting case. On sigreturn, ucontext->ss will be honored by default, unless the ucontext was created from scratch by an old program and had a 64-bit CS (unfortunately, CRIU can do this) or was the result of changing a 32-bit signal context to 64-bit without resetting SS (as DOSEMU does). For the benefit of new 64-bit software that uses segmentation (new versions of DOSEMU might), the new behavior can be detected with a new ucontext flag UC_SIGCONTEXT_SS. To avoid compilation issues, __pad0 is left as an alias for ss in ucontext. The nitty-gritty details are documented in the header file. This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests, as the kernel change allows both of them to pass. Tested-by: Stas Sergeev <stsp@list.ru> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org [ Small readability edit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
OpenPOWER on IntegriCloud