op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	cpufreq: intel_pstate: Ignore turbo active ratio in HWP	Srinivas Pandruvada	2018-08-06	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	When HWP is active turbo active ratio is not used, so we should allow policy max frequency above turbo activation ratio to be set. When HWP is not active, then any policy max frequency above turbo activation ratio can result upto max one-core turbo frequency. This fix helps better thermal control in turbo region when other methods like "Running Average Power Limit" is not available to use. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	Merge back cpufreq changes for 4.19.	Rafael J. Wysocki	2018-08-06	1	-9/+15
\|\
\| *	Merge back cpufreq material for 4.19.	Rafael J. Wysocki	2018-07-25	1	-9/+15
\| \|\
\| \| *	cpufreq: intel_pstate: Show different max frequency with turbo 3 and HWP	Srinivas Pandruvada	2018-07-19	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On HWP platforms with Turbo 3.0, the HWP capability max ratio shows the maximum ratio of that core, which can be different than other cores. If we show the correct maximum frequency in cpufreq sysfs via cpuinfo_max_freq and scaling_max_freq then, user can know which cores can run faster for pinning some high priority tasks. Currently the max turbo frequency is shown as max frequency, which is the max of all cores, even if some cores can't reach that frequency even for single threaded workload. But it is possible that max ratio in HWP capabilities is set as 0xFF or some high invalid value (E.g. One KBL NUC). Since the actual performance can never exceed 1 core turbo frequency from MSR TURBO_RATIO_LIMIT, we use this as a bound check. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| *	cpufreq: intel_pstate: use match_string() helper	Xie Yisheng	2018-07-02	1	-9/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	match_string() returns the index of an array for a matching string, which can be used instead of open coded variant. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \| \|	cpufreq: intel_pstate: Limit the scope of HWP dynamic boost platforms	Srinivas Pandruvada	2018-07-31	1	-2/+15
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dynamic boosting of HWP performance on IO wake showed significant improvement to IO workloads. This series was intended for Skylake Xeon platforms only and feature was enabled by default based on CPU model number. But some Xeon platforms reused the Skylake desktop CPU model number. This caused some undesirable side effects to some graphics workloads. Since they are heavily IO bound, the increase in CPU performance decreased the power available for GPU to do its computing and hence decrease in graphics benchmark performance. For example on a Skylake desktop, GpuTest benchmark showed average FPS reduction from 529 to 506. This change makes sure that HWP boost feature is only enabled for Skylake server platforms by using ACPI FADT preferred PM Profile. If some desktop users wants to get benefit of boost, they can still enable boost from intel_pstate sysfs attribute "hwp_dynamic_boost". Fixes: 41ab43c9c89e (cpufreq: intel_pstate: enable boost for Skylake Xeon) Link: https://bugs.freedesktop.org/show_bug.cgi?id=107410 Reported-by: Eero Tamminen <eero.t.tamminen@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Acked-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \|	cpufreq: intel_pstate: Register when ACPI PCCH is present	Rafael J. Wysocki	2018-07-18	1	-1/+16
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, intel_pstate doesn't register if _PSS is not present on HP Proliant systems, because it expects the firmware to take over CPU performance scaling in that case. However, if ACPI PCCH is present, the firmware expects the kernel to use it for CPU performance scaling and the pcc-cpufreq driver is loaded for that. Unfortunately, the firmware interface used by that driver is not scalable for fundamental reasons, so pcc-cpufreq is way suboptimal on systems with more than just a few CPUs. In fact, it is better to avoid using it at all. For this reason, modify intel_pstate to look for ACPI PCCH if _PSS is not present and register if it is there. Also prevent the pcc-cpufreq driver from trying to initialize itself if intel_pstate has been registered already. Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power mgmt option) Reported-by: Andreas Herrmann <aherrmann@suse.com> Reviewed-by: Andreas Herrmann <aherrmann@suse.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: Andreas Herrmann <aherrmann@suse.com> Cc: 4.16+ <stable@vger.kernel.org> # 4.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	cpufreq: intel_pstate: Fix scaling max/min limits with Turbo 3.0	Srinivas Pandruvada	2018-06-19	1	-5/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When scaling max/min settings are changed, internally they are converted to a ratio using the max turbo 1 core turbo frequency. This works fine when 1 core max is same irrespective of the core. But under Turbo 3.0, this will not be the case. For example: Core 0: max turbo pstate: 43 (4.3GHz) Core 1: max turbo pstate: 45 (4.5GHz) In this case 1 core turbo ratio will be maximum of all, so it will be 45 (4.5GHz). Suppose scaling max is set to 4GHz (ratio 40) for all cores ,then on core one it will be = max_state * policy->max / max_freq; = 43 * (4000000/4500000) = 38 (3.8GHz) = 38 which is 200MHz less than the desired. On core2, it will be correctly set to ratio 40 (4GHz). Same holds true for scaling min frequency limit. So this requires usage of correct turbo max frequency for core one, which in this case is 4.3GHz. So we need to adjust per CPU cpu->pstate.turbo_freq using the maximum HWP ratio of that core. This change uses the HWP capability of a core to adjust max turbo frequency. But since Broadwell HWP doesn't use ratios in the HWP capabilities, we have to use legacy max 1 core turbo ratio. This is not a problem as the HWP capabilities don't differ among cores in Broadwell. We need to check for non Broadwell CPU model for applying this change, though. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 4.6+ <stable@vger.kernel.org> # 4.6+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	Merge tag 'pm-4.18-rc1-2' of ↵	Linus Torvalds	2018-06-13	1	-3/+176
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These revert a recent PM core change that introduced a regression, fix the build when the recently added Kryo cpufreq driver is selected, add support for devices attached to multiple power domains to the generic power domains (genpd) framework, add support for iowait boosting on systens with hardware-managed P-states (HWP) enabled to the intel_pstate driver, modify the behavior of the wakeup_count device attribute in sysfs, fix a few issues and clean up some ugliness, mostly in cpufreq (core and drivers) and in the cpupower utility. Specifics: - Revert a recent PM core change that attempted to fix an issue related to device links, but introduced a regression (Rafael Wysocki) - Fix build when the recently added cpufreq driver for Kryo processors is selected by making it possible to build that driver as a module (Arnd Bergmann) - Fix the long idle detection mechanism in the out-of-band (ondemand and conservative) cpufreq governors (Chen Yu) - Add support for devices in multiple power domains to the generic power domains (genpd) framework (Ulf Hansson) - Add support for iowait boosting on systems with hardware-managed P-states (HWP) enabled to the intel_pstate driver and make it use that feature on systems with Skylake Xeon processors as it is reported to improve performance significantly on those systems (Srinivas Pandruvada) - Fix and update the acpi_cpufreq, ti-cpufreq and imx6q cpufreq drivers (Colin Ian King, Suman Anna, Sébastien Szymanski) - Change the behavior of the wakeup_count device attribute in sysfs to expose the number of events when the device might have aborted system suspend in progress (Ravi Chandra Sadineni) - Fix two minor issues in the cpupower utility (Abhishek Goel, Colin Ian King)" * tag 'pm-4.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "PM / runtime: Fixup reference counting of device link suppliers at probe" cpufreq: imx6q: check speed grades for i.MX6ULL cpufreq: governors: Fix long idle detection logic in load calculation cpufreq: intel_pstate: enable boost for Skylake Xeon PM / wakeup: Export wakeup_count instead of event_count via sysfs PM / Domains: Add dev_pm_domain_attach_by_id() to manage multi PM domains PM / Domains: Add support for multi PM domains per device to genpd PM / Domains: Split genpd_dev_pm_attach() PM / Domains: Don't attach devices in genpd with multi PM domains PM / Domains: dt: Allow power-domain property to be a list of specifiers cpufreq: intel_pstate: New sysfs entry to control HWP boost cpufreq: intel_pstate: HWP boost performance on IO wakeup cpufreq: intel_pstate: Add HWP boost utility and sched util hooks cpufreq: ti-cpufreq: Use devres managed API in probe() cpufreq: ti-cpufreq: Fix an incorrect error return value cpufreq: ACPI: make function acpi_cpufreq_fast_switch() static cpufreq: kryo: allow building as a loadable module cpupower : Fix header name to read idle state name cpupower: fix spelling mistake: "logilename" -> "logfilename"
\| *	cpufreq: intel_pstate: enable boost for Skylake Xeon	Srinivas Pandruvada	2018-06-08	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enable HWP boost on Skylake server and workstations. Reported-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| *	cpufreq: intel_pstate: New sysfs entry to control HWP boost	Srinivas Pandruvada	2018-06-06	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A new attribute is added to intel_pstate sysfs to enable/disable HWP dynamic performance boost. Reported-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| *	cpufreq: intel_pstate: HWP boost performance on IO wakeup	Srinivas Pandruvada	2018-06-06	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change uses SCHED_CPUFREQ_IOWAIT flag to boost HWP performance. Since SCHED_CPUFREQ_IOWAIT flag is set frequently, we don't start boosting steps unless we see two consecutive flags in two ticks. This avoids boosting due to IO because of regular system activities. To avoid synchronization issues, the actual processing of the flag is done on the local CPU callback. Reported-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| *	cpufreq: intel_pstate: Add HWP boost utility and sched util hooks	Srinivas Pandruvada	2018-06-06	1	-3/+97
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added two utility functions to HWP boost up gradually and boost down to the default cached HWP request values. Boost up: Boost up updates HWP request minimum value in steps. This minimum value can reach upto at HWP request maximum values depends on how frequently, this boost up function is called. At max, boost up will take three steps to reach the maximum, depending on the current HWP request levels and HWP capabilities. For example, if the current settings are: If P0 (Turbo max) = P1 (Guaranteed max) = min No boost at all. If P0 (Turbo max) > P1 (Guaranteed max) = min Should result in one level boost only for P0. If P0 (Turbo max) = P1 (Guaranteed max) > min Should result in two level boost: (min + p1)/2 and P1. If P0 (Turbo max) > P1 (Guaranteed max) > min Should result in three level boost: (min + p1)/2, P1 and P0. We don't set any level between P0 and P1 as there is no guarantee that they will be honored. Boost down: After the system is idle for hold time of 3ms, the HWP request is reset to the default value from HWP init or user modified one via sysfs. Caching of HWP Request and Capabilities Store the HWP request value last set using MSR_HWP_REQUEST and read MSR_HWP_CAPABILITIES. This avoid reading of MSRs in the boost utility functions. These boost utility functions calculated limits are based on the latest HWP request value, which can be modified by setpolicy() callback. So if user space modifies the minimum perf value, that will be accounted for every time the boost up is called. There will be case when there can be contention with the user modified minimum perf, in that case user value will gain precedence. For example just before HWP_REQUEST MSR is updated from setpolicy() callback, the boost up function is called via scheduler tick callback. Here the cached MSR value is already the latest and limits are updated based on the latest user limits, but on return the MSR write callback called from setpolicy() callback will update the HWP_REQUEST value. This will be used till next time the boost up function is called. In addition add a variable to control HWP dynamic boosting. When HWP dynamic boost is active then set the HWP specific update util hook. The contents in the utility hooks will be filled in the subsequent patches. Reported-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \|	treewide: Use array_size() in vzalloc()	Kees Cook	2018-06-12	1	-1/+1
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The vzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vzalloc(a * b) with: vzalloc(array_size(a, b)) as well as handling cases of: vzalloc(a * b * c) with: vzalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vzalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| vzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| vzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| vzalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| vzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| vzalloc( - sizeof(u8) * COUNT + COUNT , ...) \| vzalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| vzalloc( - sizeof(char) * COUNT + COUNT , ...) \| vzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vzalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) \| vzalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) \| vzalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) \| vzalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vzalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| vzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| vzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| vzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| vzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| vzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| vzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vzalloc(C1 * C2 * C3, ...) \| vzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vzalloc(C1 * C2, ...) \| vzalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
*	cpufreq: intel_pstate: allow trace in passive mode	Doug Smythies	2018-05-14	1	-2/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow use of the trace_pstate_sample trace function when the intel_pstate driver is in passive mode. Since the core_busy and scaled_busy fields are not used, and it might be desirable to know which path through the driver was used, either intel_cpufreq_target or intel_cpufreq_fast_switch, re-task the core_busy field as a flag indicator. The user can then use the intel_pstate_tracer.py utility to summarize and plot the trace. Note: The core_busy feild still goes by that name in include/trace/events/power.h and within the intel_pstate_tracer.py script and csv file headers, but it is graphed as "performance", and called core_avg_perf now in the intel_pstate driver. Sometimes, in passive mode, the driver is not called for many tens or even hundreds of seconds. The user needs to understand, and not be confused by, this limitation. Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	cpufreq: intel_pstate: Do not include debugfs.h	Rafael J. Wysocki	2018-04-10	1	-1/+0
\| \| \| \| \| \| \| \|	The intel_pstate driver doesn't use debugfs any more, so drop linux/debugfs.h from the list of included headers in it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
*	cpufreq: intel_pstate: Enable HWP during system resume on CPU0	Chen Yu	2018-02-08	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When maxcpus=1 is in the kernel command line, the BP is responsible for re-enabling the HWP - because currently only the APs invoke intel_pstate_hwp_enable() during their online process - which might put the system into unstable state after resume. Fix this by enabling the HWP explicitly on BP during resume. Reported-by: Doug Smythies <dsmythies@telus.net> Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Yu Chen <yu.c.chen@intel.com> [ rjw: Subject/changelog, minor modifications ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	cpufreq: intel_pstate: Add Skylake servers support	Srinivas Pandruvada	2018-01-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently intel_pstate can function only in HWP mode on Skylake servers. When HWP feature is not enabled on the processor then acpi-cpufreq is driver is used. Based on the power and performance tests using intel_pstate scaling algorithm the results are comparable. But intel_pstate brings in additional features: - Display of turbo frequency range, which many users like to see - Place limits in the turbo frequency range when platform allows Since these tests are done only using non PID algorithm introduced in kernel version 4.14, this patch is not a backport candidate. So each user has to carefully weigh the benefits before he backports. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	cpufreq: intel_pstate: Replace bxt_funcs with core_funcs	Srinivas Pandruvada	2018-01-11	1	-11/+2
\| \| \| \| \| \| \| \|	Since core_funcs and bxt_funcs have same set of callbacks, replace bxt_funcs with core_funcs. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*	Merge tag 'acpi-4.14-rc1' of ↵	Linus Torvalds	2017-09-05	1	-39/+25
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These include a usual ACPICA code update (this time to upstream revision 20170728), a fix for a boot crash on some systems with Thunderbolt devices connected at boot time, a rework of the handling of PCI bridges when setting up device wakeup, new support for Apple device properties, support for DMA configurations reported via ACPI on ARM64, APEI-related updates, ACPI EC driver updates and assorted minor modifications in several places. Specifics: - Update the ACPICA code in the kernel to upstream revision 20170728 including: * Alias operator handling update (Bob Moore). * Deferred resolution of reference package elements (Bob Moore). * Support for the _DMA method in walk resources (Bob Moore). * Tables handling update and support for deferred table verification (Lv Zheng). * Update of SMMU models for IORT (Robin Murphy). * Compiler and disassembler updates (Alex James, Erik Schmauss, Ganapatrao Kulkarni, James Morse). * Tools updates (Erik Schmauss, Lv Zheng). * Assorted minor fixes and cleanups (Bob Moore, Kees Cook, Lv Zheng, Shao Ming). - Rework the initialization of non-wakeup GPEs with method handlers in order to address a boot crash on some systems with Thunderbolt devices connected at boot time where we miss an early hotplug event due to a delay in GPE enabling (Rafael Wysocki). - Rework the handling of PCI bridges when setting up ACPI-based device wakeup in order to avoid disabling wakeup for bridges prematurely (Rafael Wysocki). - Consolidate Apple DMI checks throughout the tree, add support for Apple device properties to the device properties framework and use these properties for the handling of I2C and SPI devices on Apple systems (Lukas Wunner). - Add support for _DMA to the ACPI-based device properties lookup code and make it possible to use the information from there to configure DMA regions on ARM64 systems (Lorenzo Pieralisi). - Fix several issues in the APEI code, add support for exporting the BERT error region over sysfs and update APEI MAINTAINERS entry with reviewers information (Borislav Petkov, Dongjiu Geng, Loc Ho, Punit Agrawal, Tony Luck, Yazen Ghannam). - Fix a potential initialization ordering issue in the ACPI EC driver and clean it up somewhat (Lv Zheng). - Update the ACPI SPCR driver to extend the existing XGENE 8250 workaround in it to a new platform (m400) and to work around an Xgene UART clock issue (Graeme Gregory). - Add a new utility function to the ACPI core to support using ACPI OEM ID / OEM Table ID / Revision for system identification in blacklisting or similar and switch over the existing code already using this information to this new interface (Toshi Kani). - Fix an xpower PMIC issue related to GPADC reads that always return 0 without extra pin manipulations (Hans de Goede). - Add statements to print debug messages in a couple of places in the ACPI core for easier diagnostics (Rafael Wysocki). - Clean up the ACPI processor driver slightly (Colin Ian King, Hanjun Guo). - Clean up the ACPI x86 boot code somewhat (Andy Shevchenko). - Add a quirk for Dell OptiPlex 9020M to the ACPI backlight driver (Alex Hung). - Assorted fixes, cleanups and updates related to ACPI (Amitoj Kaur Chawla, Bhumika Goyal, Frank Rowand, Jean Delvare, Punit Agrawal, Ronald Tschalär, Sumeet Pawnikar)" * tag 'acpi-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits) ACPI / APEI: Suppress message if HEST not present intel_pstate: convert to use acpi_match_platform_list() ACPI / blacklist: add acpi_match_platform_list() ACPI, APEI, EINJ: Subtract any matching Register Region from Trigger resources ACPI: make device_attribute const ACPI / sysfs: Extend ACPI sysfs to provide access to boot error region ACPI: APEI: fix the wrong iteration of generic error status block ACPI / processor: make function acpi_processor_check_duplicates() static ACPI / EC: Clean up EC GPE mask flag ACPI: EC: Fix possible issues related to EC initialization order ACPI / PM: Add debug statements to acpi_pm_notify_handler() ACPI: Add debug statements to acpi_global_event_handler() ACPI / scan: Enable GPEs before scanning the namespace ACPICA: Make it possible to enable runtime GPEs earlier ACPICA: Dispatch active GPEs at init time ACPI: SPCR: work around clock issue on xgene UART ACPI: SPCR: extend XGENE 8250 workaround to m400 ACPI / LPSS: Don't abort ACPI scan on missing mem resource mailbox: pcc: Drop uninformative output during boot ACPI/IORT: Add IORT named component memory address limits ...
\| *	intel_pstate: convert to use acpi_match_platform_list()	Toshi Kani	2017-08-29	1	-39/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert to use acpi_match_platform_list() for the platform check. There is no change in functionality. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \|	Merge branch 'intel_pstate'	Rafael J. Wysocki	2017-09-04	1	-300/+20
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* intel_pstate: cpufreq: intel_pstate: Shorten a couple of long names cpufreq: intel_pstate: Simplify intel_pstate_adjust_pstate() cpufreq: intel_pstate: Improve IO performance with per-core P-states cpufreq: intel_pstate: Drop INTEL_PSTATE_HWP_SAMPLING_INTERVAL cpufreq: intel_pstate: Drop ->update_util from pstate_funcs cpufreq: intel_pstate: Do not use PID-based P-state selection
\| * \	Merge back intel_pstate material for v4.14.	Rafael J. Wysocki	2017-08-21	1	-296/+20
\| \|\ \ \| \| \|/ \| \|/\|
\| \| *	cpufreq: intel_pstate: Shorten a couple of long names	Rafael J. Wysocki	2017-08-10	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The names of the INTEL_PSTATE_DEFAULT_SAMPLING_INTERVAL symbol and the get_target_pstate_use_cpu_load() function don't need to be so long any more, so make them shorter. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| *	cpufreq: intel_pstate: Simplify intel_pstate_adjust_pstate()	Rafael J. Wysocki	2017-08-10	1	-7/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since there is only one P-state selection routine in intel_pstate now, make intel_pstate_adjust_pstate() call it directly and drop the target_pstate argument from that function. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| *	Merge v4.13 intel_pstate fixes.	Rafael J. Wysocki	2017-08-04	1	-8/+0
\| \| \|\
\| \| * \|	cpufreq: intel_pstate: Improve IO performance with per-core P-states	Srinivas Pandruvada	2017-08-04	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the current implementation, the response latency between seeing SCHED_CPUFREQ_IOWAIT set and the actual P-state adjustment can be up to 10ms. It can be reduced by bumping up the P-state to the max at the time SCHED_CPUFREQ_IOWAIT is passed to intel_pstate_update_util(). With this change, the IO performance improves significantly. For a simple "grep -r . linux" (Here linux is the kernel source folder) with caches dropped every time on a Broadwell Xeon workstation with per-core P-states, the user and system time is shorter by as much as 30% - 40%. The same performance difference was not observed on clients that don't support per-core P-state. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| * \|	cpufreq: intel_pstate: Drop INTEL_PSTATE_HWP_SAMPLING_INTERVAL	Rafael J. Wysocki	2017-08-01	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After commit 62611cb912f7 (intel_pstate: delete scheduler hook in HWP mode) the INTEL_PSTATE_HWP_SAMPLING_INTERVAL is not used anywhere in the code, so drop it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| * \|	cpufreq: intel_pstate: Drop ->update_util from pstate_funcs	Rafael J. Wysocki	2017-07-26	1	-13/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All systems use the same P-state selection "powersave" algorithm in the active mode if HWP is not used, so there's no need to provide a pointer for it in struct pstate_funcs any more. Drop ->update_util from struct pstate_funcs and make intel_pstate_set_update_util_hook() use intel_pstate_update_util() directly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| * \|	cpufreq: intel_pstate: Do not use PID-based P-state selection	Rafael J. Wysocki	2017-07-26	1	-274/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All systems with a defined ACPI preferred profile that are not "servers" have been using the load-based P-state selection algorithm in intel_pstate since 4.12-rc1 (mobile systems and laptops have been using it since 4.10-rc1) and no problems with it have been reported to date. In particular, no regressions with respect to the PID-based P-state selection have been reported. Also testing indicates that the P-state selection algorithm based on CPU load is generally on par with the PID-based algorithm performance-wise, and for some workloads it turns out to be better than the other one, while being more straightforward and easier to understand at the same time. Moreover, the PID-based P-state selection algorithm in intel_pstate is known to be unstable in some situation and generally problematic, the issues with it are hard to address and it has become a significant maintenance burden. For these reasons, make intel_pstate use the "powersave" P-state selection algorithm based on CPU load in the active mode on all systems and drop the PID-based P-state selection code along with all things related to it from the driver. Also update the documentation accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \| \| \|	Merge branch 'pm-cpufreq-sched'	Rafael J. Wysocki	2017-09-04	1	-0/+8
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* pm-cpufreq-sched: cpufreq: schedutil: Always process remote callback with slow switching cpufreq: schedutil: Don't restrict kthread to related_cpus unnecessarily cpufreq: Return 0 from ->fast_switch() on errors cpufreq: Simplify cpufreq_can_do_remote_dvfs() cpufreq: Process remote callbacks from any CPU if the platform permits sched: cpufreq: Allow remote cpufreq callbacks cpufreq: schedutil: Use unsigned int for iowait boost cpufreq: schedutil: Make iowait boost more energy efficient
\| * \| \| \|	sched: cpufreq: Allow remote cpufreq callbacks	Viresh Kumar	2017-08-01	1	-0/+8
\| \| \|/ / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With Android UI and benchmarks the latency of cpufreq response to certain scheduling events can become very critical. Currently, callbacks into cpufreq governors are only made from the scheduler if the target CPU of the event is the same as the current CPU. This means there are certain situations where a target CPU may not run the cpufreq governor for some time. One testcase to show this behavior is where a task starts running on CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the system is configured such that the new tasks should receive maximum demand initially, this should result in CPU0 increasing frequency immediately. But because of the above mentioned limitation though, this does not occur. This patch updates the scheduler core to call the cpufreq callbacks for remote CPUs as well. The schedutil, ondemand and conservative governors are updated to process cpufreq utilization update hooks called for remote CPUs where the remote CPU is managed by the cpufreq policy of the local CPU. The intel_pstate driver is updated to always reject remote callbacks. This is tested with couple of usecases (Android: hackbench, recentfling, galleryfling, vellamo, Ubuntu: hackbench) on ARM hikey board (64 bit octa-core, single policy). Only galleryfling showed minor improvements, while others didn't had much deviation. The reason being that this patch only targets a corner case, where following are required to be true to improve performance and that doesn't happen too often with these tests: - Task is migrated to another CPU. - The task has high demand, and should take the target CPU to higher OPPs. - And the target CPU doesn't call into the cpufreq governor until the next tick. Based on initial work from Steve Muckle. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \| \| \|	Merge branch 'pm-cpufreq'	Rafael J. Wysocki	2017-09-04	1	-2/+0
\|\ \ \ \ \| \|_\|/ / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* pm-cpufreq: (33 commits) cpufreq: imx6q: Fix imx6sx low frequency support cpufreq: speedstep-lib: make several arrays static, makes code smaller cpufreq: ti: Fix 'of_node_put' being called twice in error handling path cpufreq: dt-platdev: Drop few entries from whitelist cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2 ARM: ux500: don't select CPUFREQ_DT cpufreq: Convert to using %pOF instead of full_name cpufreq: Cap the default transition delay value to 10 ms cpufreq: dbx500: Delete obsolete driver mfd: db8500-prcmu: Get rid of cpufreq dependency cpufreq: enable the DT cpufreq driver on the Ux500 cpufreq: Loongson2: constify platform_device_id cpufreq: dt: Add r8a7796 support to to use generic cpufreq driver cpufreq: remove setting of policy->cpu in policy->cpus during init cpufreq: mediatek: add support of cpufreq to MT7622 SoC cpufreq: mediatek: add cleanups with the more generic naming cpufreq: rcar: Add support for R8A7795 SoC cpufreq: dt: Add rk3328 compatible to use generic cpufreq driver cpufreq: s5pv210: add missing of_node_put() cpufreq: Allow dynamic switching with CPUFREQ_ETERNAL latency ...
\| * \| \|	cpufreq: remove setting of policy->cpu in policy->cpus during init	Sudeep Holla	2017-08-18	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	policy->cpu is copied into policy->cpus in cpufreq_online() before calling into cpufreq_driver->init(). So there's no need to set the same in the individual driver init() functions again. This patch removes the redundant setting of policy->cpu in policy->cpus in intel_pstate and cppc drivers. Reported-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| * \| \|	cpufreq: Don't set transition_latency for setpolicy drivers	Viresh Kumar	2017-07-26	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The transition_latency field isn't used for drivers with ->setpolicy() callback present and there is no point setting it from the drivers. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \| \| \|	cpufreq: intel_pstate: report correct CPU frequencies during trace	Doug Smythies	2017-08-11	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The intel_pstate CPU frequency scaling driver has always calculated CPU frequency incorrectly. Recent changes have eliminted most of the issues, however the frequency reported in the trace buffer, if used, is incorrect. It remains desireable that cpu->pstate.scaling still be a nice round number for things such as when setting max and min frequencies. So the proposal is to just fix the reported frequency in the trace data. Fixes what remains of [1]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=96521 # [1] Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \| \| \|	Merge branches 'pm-cpufreq-x86', 'pm-cpufreq-docs' and 'intel_pstate'	Rafael J. Wysocki	2017-08-03	1	-8/+0
\|\ \ \ \ \| \|_\|/ / \|/\| \| / \| \| \|/ \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* pm-cpufreq-x86: cpufreq: x86: Make scaling_cur_freq behave more as expected * pm-cpufreq-docs: cpufreq: docs: Add missing cpuinfo_cur_freq description * intel_pstate: cpufreq: intel_pstate: Drop ->get from intel_pstate structure
\| * \|	cpufreq: intel_pstate: Drop ->get from intel_pstate structure	Rafael J. Wysocki	2017-07-27	1	-8/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ->get callback in the intel_pstate structure was mostly there for the scaling_cur_freq sysfs attribute to work, but after commit f8475cef9008 (x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF) that attribute uses arch_freq_get_on_cpu() provided by the x86 arch code on all processors supported by intel_pstate, so it doesn't need the ->get callback from the driver any more. Moreover, the very presence of the ->get callback in the intel_pstate structure causes the cpuinfo_cur_freq attribute to be present when intel_pstate operates in the active mode, which is bogus, because the role of that attribute is to return the current CPU frequency as seen by the hardware. For intel_pstate, though, this is just an average frequency and not really current, but computed for the previous sampling interval (the actual current frequency may be way different at the point this value is obtained by reading from cpuinfo_cur_freq), and after commit 82b4e03e01bc (intel_pstate: skip scheduler hook when in "performance" mode) the value in cpuinfo_cur_freq may be stale or just 0, depending on the driver's operation mode. In fact, however, on the hardware supported by intel_pstate there is no way to read the current CPU frequency from it, so the cpuinfo_cur_freq attribute should not be present at all when this driver is in use. For this reason, drop intel_pstate_get() and clear the ->get callback pointer pointing to it, so that the cpuinfo_cur_freq is not present for intel_pstate in the active mode any more. Fixes: 82b4e03e01bc (intel_pstate: skip scheduler hook when in "performance" mode) Reported-by: Huaisheng Ye <yehs1@lenovo.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
\| \| \|
\| \ \
*-. \ \	Merge branches 'intel_pstate' and 'pm-domains'	Rafael J. Wysocki	2017-07-20	1	-2/+19
\|\ \ \ \ \| \| \|/ / \| \|/\| / \| \|_\|/ \|/\| \| \| \| \| \| \| \| \| \| \|	* intel_pstate: cpufreq: intel_pstate: Correct the busy calculation for KNL * pm-domains: PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
\| * \|	cpufreq: intel_pstate: Correct the busy calculation for KNL	Srinivas Pandruvada	2017-07-14	1	-2/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The busy percent calculated for the Knights Landing (KNL) platform is 1024 times smaller than the correct busy value. This causes performance to get stuck at the lowest ratio. The scaling algorithm used for KNL is performance-based, but it still looks at the CPU load to set the scaled busy factor to 0 when the load is less than 1 percent. In this case, since the computed load is 1024x smaller than it should be, the scaled busy factor will always be 0, irrespective of CPU business. This needs a fix similar to the turbostat one in commit b2b34dfe4d9a (tools/power turbostat: KNL workaround for %Busy and Avg_MHz). For this reason, add one more callback to processor-specific callbacks to specify an MPERF multiplier represented by a number of bit positions to shift the value of that register to the left to copmensate for its rate difference with respect to the TSC. This shift value is used during CPU busy calculations. Fixes: ffb810563c (intel_pstate: Avoid getting stuck in high P-states when idle) Reported-and-tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 4.6+ <stable@vger.kernel.org> # 4.6+ [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| \|
\| \ \
*-. \ \	Merge branches 'pm-cpufreq-sched' and 'intel_pstate'	Rafael J. Wysocki	2017-07-14	1	-1/+1
\|\ \ \ \ \| \| \|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* pm-cpufreq-sched: cpufreq: schedutil: Fix sugov_start() versus sugov_update_shared() race * intel_pstate: cpufreq: intel_pstate: Fix ratio setting for min_perf_pct
\| \| * \|	cpufreq: intel_pstate: Fix ratio setting for min_perf_pct	Srinivas Pandruvada	2017-07-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the minimum performance limit percentage is set to the power-up default, it is possible that minimum performance ratio is off by one. In the set_policy() callback the minimum ratio is calculated by applying global.min_perf_pct to turbo_ratio and rounding up, but the power-up default global.min_perf_pct is already rounded up to the next percent in min_perf_pct_min(). That results in two round up operations, so for the default min_perf_pct one of them is not required. It is better to remove rounding up in min_perf_pct_min() as this matches the displayed min_perf_pct prior to commit c5a2ee7dde89 (cpufreq: intel_pstate: Active mode P-state limits rework) in 4.12. For example on a platform with max turbo ratio of 37 and minimum ratio of 10, the min_perf_pct resulted in 28 with the above commit. Before this commit it was 27 and it will be the same after this change. Fixes: 1a4fe38add8b (cpufreq: intel_pstate: Remove max/min fractions to limit performance) Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| \| \|
\| \ \ \
\| \ \ \
\| \ \ \
*---. \ \ \	Merge branches 'pm-domains', 'pm-sleep' and 'pm-cpufreq'	Rafael J. Wysocki	2017-07-10	1	-1/+1
\|\ \ \ \ \ \ \| \| \| \|/ / / \| \| \|/\| \| / \| \| \|_\|_\|/ \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* pm-domains: PM / Domains: provide pm_genpd_poweroff_noirq() stub Revert "PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device" * pm-sleep: PM / sleep: constify attribute_group structures * pm-cpufreq: cpufreq: intel_pstate: constify attribute_group structures cpufreq: cpufreq_stats: constify attribute_group structures
\| \| \| * \|	cpufreq: intel_pstate: constify attribute_group structures	Arvind Yadav	2017-07-04	1	-1/+1
\| \| \|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	attribute_groups are not supposed to change at runtime. All functions working with attribute_groups provided by <linux/sysfs.h> work with const attribute_group. So mark the non-const structs as const. File size before: text data bss dec hex filename 15197 2552 40 17789 457d drivers/cpufreq/intel_pstate.o File size After adding 'const': text data bss dec hex filename 15261 2488 40 17789 457d drivers/cpufreq/intel_pstate.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| \| \|
\| \ \ \
*-. \ \ \	Merge branches 'pm-cpufreq', 'intel_pstate' and 'pm-cpuidle'	Rafael J. Wysocki	2017-07-03	1	-70/+68
\|\ \ \ \ \ \| \| \|_\|/ / \| \|/\| \| / \| \| \| \|/ \| \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* pm-cpufreq: cpufreq / CPPC: Initialize policy->min to lowest nonlinear performance cpufreq: sfi: make freq_table static cpufreq: exynos5440: Fix inconsistent indenting cpufreq: imx6q: imx6ull should use the same flow as imx6ul cpufreq: dt: Add support for hi3660 * intel_pstate: cpufreq: Update scaling_cur_freq documentation cpufreq: intel_pstate: Clean up after performance governor changes intel_pstate: skip scheduler hook when in "performance" mode intel_pstate: delete scheduler hook in HWP mode x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF cpufreq: intel_pstate: Remove max/min fractions to limit performance x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" * pm-cpuidle: cpuidle: menu: allow state 0 to be disabled intel_idle: Use more common logging style x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on AMD systems ARM: cpuidle: Support asymmetric idle definition
\| \| * \|	cpufreq: intel_pstate: Clean up after performance governor changes	Rafael J. Wysocki	2017-06-29	1	-6/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After commit 82b4e03e01bc (intel_pstate: skip scheduler hook when in "performance" mode) get_target_pstate_use_performance() and get_target_pstate_use_cpu_load() are never called if scaling_governor is "performance", so drop the CPUFREQ_POLICY_PERFORMANCE checks from them as they will never trigger anyway. Moreover, the documentation needs to be updated to reflect the change made by the above commit, so do that too. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
\| \| * \|	intel_pstate: skip scheduler hook when in "performance" mode	Len Brown	2017-06-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the governor is set to "performance", intel_pstate does not need the scheduler hook for doing any calculations. Under these conditions, its only purpose is to continue to maintain cpufreq/scaling_cur_freq. The cpufreq/scaling_cur_freq sysfs attribute is now provided by shared x86 cpufreq code on modern x86 systems, including all systems supported by the intel_pstate driver. So in "performance" governor mode, the scheduler hook can be skipped. This applies to both in Software and Hardware P-state control modes. Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| * \|	intel_pstate: delete scheduler hook in HWP mode	Len Brown	2017-06-27	1	-11/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cpufreq/scaling_cur_freq sysfs attribute is now provided by shared x86 cpufreq code on modern x86 systems, including all systems supported by the intel_pstate driver. In HWP mode, maintaining that value was the sole purpose of the scheduler hook, intel_pstate_update_util_hwp(), so it can now be removed. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| * \|	cpufreq: intel_pstate: Remove max/min fractions to limit performance	Srinivas Pandruvada	2017-06-24	1	-51/+63
\| \|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the current model the max/min perf limits are a fraction of current user space limits to the allowed max_freq or 100% for global limits. This results in wrong ratio limits calculation because of rounding issues for some user space limits. Initially we tried to solve this issue by issue by having more shift bits to increase precision. Still there are isolated cases where we still have error. This can be avoided by using ratios all together. Since the way we get cpuinfo.max_freq is by multiplying scaling factor to max ratio, we can easily keep the max/min ratios in terms of ratios and not fractions. For example: if the max ratio = 36 cpuinfo.max_freq = 36 * 100000 = 3600000 Suppose user space sets a limit of 1200000, then we can calculate max ratio limit as = 36 * 1200000 / 3600000 = 12 This will be correct for any user limits. The other advantage is that, we don't need to do any calculation in the fast path as ratio limit is already calculated via set_policy() callback. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \| \|	Merge branch 'pm-tools'	Rafael J. Wysocki	2017-07-03	1	-19/+15
\|\ \ \ \| \|/ / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* pm-tools: cpupower: Add support for new AMD family 0x17 cpupower: Fix bug where return value was not used tools/power turbostat: update version number tools/power turbostat: decode MSR_IA32_MISC_ENABLE only on Intel tools/power turbostat: stop migrating, unless '-m' tools/power turbostat: if --debug, print sampling overhead tools/power turbostat: hide SKL counters, when not requested intel_pstate: use updated msr-index.h HWP.EPP values tools/power x86_energy_perf_policy: support HWP.EPP x86: msr-index.h: fix shifts to ULL results in HWP macros. x86: msr-index.h: define HWP.EPP values x86: msr-index.h: define EPB mid-points