op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	drm/i915: expose rcs topology through query uAPI	Lionel Landwerlin	2018-03-08	1	-0/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the introduction of asymmetric slices in CNL, we cannot rely on the previous SUBSLICE_MASK getparam to tell userspace what subslices are available. Here we introduce a more detailed way of querying the Gen's GPU topology that doesn't aggregate numbers. This is essential for monitoring parts of the GPU with the OA unit, because counters need to be normalized to the number of EUs/subslices/slices. The current aggregated numbers like EU_TOTAL do not gives us sufficient information. The Mesa series making use of this API is : https://patchwork.freedesktop.org/series/38795/ As a bonus we can draw representations of the GPU : https://imgur.com/a/vuqpa v2: Rename uapi struct s/_mask/_info/ (Tvrtko) Report max_slice/subslice/eus_per_subslice rather than strides (Tvrtko) Add uapi macros to read data from *_info structs (Tvrtko) v3: Use !!(v & DRM_I915_BIT()) for uapi macros instead of custom shifts (Tvrtko) v4: factorize query item writting (Tvrtko) tweak uapi struct/define names (Tvrtko) v5: Replace ALIGN() macro (Chris) v6: Updated uapi comments (Tvrtko) Moved flags != 0 checks into vfuncs (Tvrtko) v7: Use access_ok() before copying anything, to avoid overflows (Chris) Switch BUG_ON() to GEM_WARN_ON() (Tvrtko) v8: Tweak uapi comments style to match the coding style (Lionel) v9: Fix error in comment about computation of enabled subslice (Tvrtko) v10: Fix/update comments in uAPI (Sagar) v11: Drop drm_i915_query_(slice\|subslice\|eu)_info in favor of a single drm_i915_query_topology_info (Joonas) v12: Add subslice_stride/eu_stride in drm_i915_query_topology_info (Joonas) v13: Fix comment in uAPI (Joonas) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180306122857.27317-7-lionel.g.landwerlin@intel.com
*	drm/i915: add query uAPI	Lionel Landwerlin	2018-03-08	1	-3/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are a number of information that are readable from hardware registers and that we would like to make accessible to userspace. One particular example is the topology of the execution units (how are execution units grouped in subslices and slices and also which ones have been fused off for die recovery). At the moment the GET_PARAM ioctl covers some basic needs, but generally is only able to return a single value for each defined parameter. This is a bit problematic with topology descriptions which are array/maps of available units. This change introduces a new ioctl that can deal with requests to fill structures of potentially variable lengths. The user is expected fill a query with length fields set at 0 on the first call, the kernel then sets the length fields to the their expected values. A second call to the kernel with length fields at their expected values will trigger a copy of the data to the pointed memory locations. The scope of this uAPI is only to provide information to userspace, not to allow configuration of the device. v2: Simplify dispatcher code iteration (Tvrtko) Tweak uapi drm_i915_query_item structure (Tvrtko) v3: Rename pad fields into flags (Chris) Return error on flags field != 0 (Chris) Only copy length back to userspace in drm_i915_query_item (Chris) v4: Use array of functions instead of switch (Chris) v5: More comments in uapi (Tvrtko) Return query item errors in length field (All) v6: Tweak uapi comments style to match the coding style (Lionel) v7: Add i915_query.h (Joonas) v8: (Lionel) Change the behavior of the item iterator to report invalid queries into the query item rather than stopping the iteration. This enables userspace applications to query newer items on older kernels and only have failure on the items that are not supported. v9: Edit copyright headers (Joonas) v10: Typos & comments in uapi (Joonas) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180306122857.27317-6-lionel.g.landwerlin@intel.com
*	drm/i915: Deprecate I915_SET_COLORKEY_NONE	Ville Syrjälä	2018-02-05	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Deprecate the silly I915_SET_COLORKEY_NONE flag. The obvious way to disable colorkey is to just set flags to 0, which is exactly what the intel ddx has been doing all along. Currently when userspace sets the flags to 0, we end up in a funny state where colorkey is disabled, but various colorkey vs. scaling checks still consider colorkey to be enabled, and thus we don't allow plane scaling to kick in. In case there is some other userspace out there that actually uses this flag (unlikely as this is an i915 specific uapi) we'll keep on accepting it. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180202204231.27905-1-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
*	drm/i915/pmu: Aggregate all RC6 states into one counter	Tvrtko Ursulin	2017-11-24	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Chris has discovered that RC6, RC6p and RC6pp counters are mutually exclusive, and even that on some SNB SKUs you get RC6p increasing, and on the others RC6. Furthermore RC6p and RC6pp were only present starting from GEN6 until, GEN7, not including Haswell. All this combined makes it questionable whether we need to reserve new ABI for these counters. One idea was to just combine them all under the RC6 counter to simplify things for userspace. So that is what this patch does. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171124171331.17981-1-tvrtko.ursulin@linux.intel.com
*	drm/i915/pmu: Drop I915_ENGINE_SAMPLE_MAX from uapi headers	Tvrtko Ursulin	2017-11-23	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \|	We have agreed during the engine classes discussion that fields marked as non-ABI are better left out altogether from uapi headers. v2: Use a local define for maintanability. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171123100701.18430-1-tvrtko.ursulin@linux.intel.com
*	drm/i915/pmu: Add RC6 residency metrics	Tvrtko Ursulin	2017-11-22	1	-1/+5
\| \| \| \| \| \| \| \| \|	For clients like intel-gpu-overlay it is easier to read the counters via the perf API than having to parse sysfs. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-9-tvrtko.ursulin@linux.intel.com
*	drm/i915/pmu: Add interrupt count metric	Tvrtko Ursulin	2017-11-22	1	-1/+3
\| \| \| \| \| \| \| \| \|	For clients like intel-gpu-overlay it is easier to read the count via the perf API than having to parse /proc. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-7-tvrtko.ursulin@linux.intel.com
*	drm/i915/pmu: Expose a PMU interface for perf queries	Tvrtko Ursulin	2017-11-22	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From: Chris Wilson <chris@chris-wilson.co.uk> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> From: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> The first goal is to be able to measure GPU (and invidual ring) busyness without having to poll registers from userspace. (Which not only incurs holding the forcewake lock indefinitely, perturbing the system, but also runs the risk of hanging the machine.) As an alternative we can use the perf event counter interface to sample the ring registers periodically and send those results to userspace. Functionality we are exporting to userspace is via the existing perf PMU API and can be exercised via the existing tools. For example: perf stat -a -e i915/rcs0-busy/ -I 1000 Will print the render engine busynnes once per second. All the performance counters can be enumerated (perf list) and have their unit of measure correctly reported in sysfs. v1-v2 (Chris Wilson): v2: Use a common timer for the ring sampling. v3: (Tvrtko Ursulin) * Decouple uAPI from i915 engine ids. * Complete uAPI defines. * Refactor some code to helpers for clarity. * Skip sampling disabled engines. * Expose counters in sysfs. * Pass in fake regs to avoid null ptr deref in perf core. * Convert to class/instance uAPI. * Use shared driver code for rc6 residency, power and frequency. v4: (Dmitry Rogozhkin) * Register PMU with .task_ctx_nr=perf_invalid_context * Expose cpumask for the PMU with the single CPU in the mask * Properly support pmu->stop(): it should call pmu->read() * Properly support pmu->del(): it should call stop(event, PERF_EF_UPDATE) * Introduce refcounting of event subscriptions. * Make pmu.busy_stats a refcounter to avoid busy stats going away with some deleted event. * Expose cpumask for i915 PMU to avoid multiple events creation of the same type followed by counter aggregation by perf-stat. * Track CPUs getting online/offline to migrate perf context. If (likely) cpumask will initially set CPU0, CONFIG_BOOTPARAM_HOTPLUG_CPU0 will be needed to see effect of CPU status tracking. * End result is that only global events are supported and perf stat works correctly. * Deny perf driver level sampling - it is prohibited for uncore PMU. v5: (Tvrtko Ursulin) * Don't hardcode number of engine samplers. * Rewrite event ref-counting for correctness and simplicity. * Store initial counter value when starting already enabled events to correctly report values to all listeners. * Fix RC6 residency readout. * Comments, GPL header. v6: * Add missing entry to v4 changelog. * Fix accounting in CPU hotplug case by copying the approach from arch/x86/events/intel/cstate.c. (Dmitry Rogozhkin) v7: * Log failure message only on failure. * Remove CPU hotplug notification state on unregister. v8: * Fix error unwind on failed registration. * Checkpatch cleanup. v9: * Drop the energy metric, it is available via intel_rapl_perf. (Ville Syrjälä) * Use HAS_RC6(p). (Chris Wilson) * Handle unsupported non-engine events. (Dmitry Rogozhkin) * Rebase for intel_rc6_residency_ns needing caller managed runtime pm. * Drop HAS_RC6 checks from the read callback since creating those events will be rejected at init time already. * Add counter units to sysfs so perf stat output is nicer. * Cleanup the attribute tables for brevity and readability. v10: * Fixed queued accounting. v11: * Move intel_engine_lookup_user to intel_engine_cs.c * Commit update. (Joonas Lahtinen) v12: * More accurate sampling. (Chris Wilson) * Store and report frequency in MHz for better usability from perf stat. * Removed metrics: queued, interrupts, rc6 counters. * Sample engine busyness based on seqno difference only for less MMIO (and forcewake) on all platforms. (Chris Wilson) v13: * Comment spelling, use mul_u32_u32 to work around potential GCC issue and somne code alignment changes. (Chris Wilson) v14: * Rebase. v15: * Rebase for RPS refactoring. v16: * Use the dynamic slot in the CPU hotplug state machine so that we are free to setup our state as multi-instance. Previously we were re-using the CPUHP_AP_PERF_X86_UNCORE_ONLINE slot which is neither used as multi-instance, nor owned by our driver to start with. * Register the CPU hotplug handlers after the PMU, otherwise the callback will get called before the PMU is initialized which can end up in perf_pmu_migrate_context with an un-initialized base. * Added workaround for a probable bug in cpuhp core. v17: * Remove workaround for the cpuhp bug. v18: * Rebase for drm_i915_gem_engine_class getting upstream before us. v19: * Rebase. (trivial) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-2-tvrtko.ursulin@linux.intel.com
*	drm/i915: expose command stream timestamp frequency to userspace	Lionel Landwerlin	2017-11-13	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We use to have this fixed per generation, but starting with CNL userspace cannot tell just off the PCI ID. Let's make this information available. This is particularly useful for performance monitoring where much of the normalization work is done using those timestamps (this include pipeline statistics in both GL & Vulkan as well as OA reports). v2: Use variables for 24MHz/19.2MHz values (Ewelina) Renamed function & coding style (Sagar) v3: Fix frequency read on Broadwell (Sagar) Fix missing divide by 4 on <= gen4 (Sagar) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110190845.32574-7-lionel.g.landwerlin@intel.com
*	drm/i915: Record the default hw state after reset upon load	Chris Wilson	2017-11-10	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Take a copy of the HW state after a reset upon module loading by executing a context switch from a blank context to the kernel context, thus saving the default hw state over the blank context image. We can then use the default hw state to initialise any future context, ensuring that each starts with the default view of hw state. v2: Unmap our default state from the GTT after stealing it from the context. This should stop us from accidentally overwriting it via the GTT (and frees up some precious GTT space). Testcase: igt/gem_ctx_isolation Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-7-chris@chris-wilson.co.uk
*	drm/i915: Define an engine class enum for the uABI	Tvrtko Ursulin	2017-11-10	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We want to be able to report back to userspace details about an engine's class, and in return for userspace to be able to request actions regarding certain classes of engines. To isolate the uABI from any variations between hw generations, we define an abstract class for the engines and internally map onto the hw. v2: Remove MAX from the uABI; keep it internal if we need it, but don't let userspace make the mistake of using it themselves. v3: s/OTHER/INVALID/ The use of OTHER is ill-defined, so remove it from the uABI as any future new type of engine can define a class to suit it. But keep a reserved value for an invalid class, so that we can always unambiguously express when something doesn't belong to the classification. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v2 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-1-chris@chris-wilson.co.uk
*	drm/i915: Reject unknown syncobj flags	Tvrtko Ursulin	2017-11-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have to reject unknown flags for uAPI considerations, and also because the curent implementation limits their i915 storage space to two bits. v2: (Chris Wilson) * Fix fail in ABI check. * Added unknown flags and BUILD_BUG_ON. v3: * Use ARCH_KMALLOC_MINALIGN instead of alignof. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: cf6e7bac6357 ("drm/i915: Add support for drm syncobjs") Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: intel-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171031102326.9738-1-tvrtko.ursulin@linux.intel.com
*	drm/i915: Don't use BIT() in UAPI section	Joonas Lahtinen	2017-10-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Lets not introduce BIT() macro requirement for UAPI for now. Fixes: 3fd3a6ffe279 ("drm/i915: Simplify i915_reg_read_ioctl") Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171006104559.17312-1-joonas.lahtinen@linux.intel.com
*	drm/i915/scheduler: Support user-defined priorities	Chris Wilson	2017-10-04	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use a priority stored in the context as the initial value when submitting a request. This allows us to change the default priority on a per-context basis, allowing different contexts to be favoured with GPU time at the expense of lower importance work. The user can adjust the context's priority via I915_CONTEXT_PARAM_PRIORITY, with more positive values being higher priority (they will be serviced earlier, after their dependencies have been resolved). Any prerequisite work for an execbuf will have its priority raised to match the new request as required. Normal users can specify any value in the range of -1023 to 0 [default], i.e. they can reduce the priority of their workloads (and temporarily boost it back to normal if so desired). Privileged users can specify any value in the range of -1023 to 1023, [default is 0], i.e. they can raise their priority above all overs and so potentially starve the system. Note that the existing schedulers are not fair, nor load balancing, the execution is strictly by priority on a first-come, first-served basis, and the driver may choose to boost some requests above the range available to users. This priority was originally based around nice(2), but evolved to allow clients to adjust their priority within a small range, and allow for a privileged high priority range. For example, this can be used to implement EGL_IMG_context_priority https://www.khronos.org/registry/egl/extensions/IMG/EGL_IMG_context_priority.txt EGL_CONTEXT_PRIORITY_LEVEL_IMG determines the priority level of the context to be created. This attribute is a hint, as an implementation may not support multiple contexts at some priority levels and system policy may limit access to high priority contexts to appropriate system privilege level. The default value for EGL_CONTEXT_PRIORITY_LEVEL_IMG is EGL_CONTEXT_PRIORITY_MEDIUM_IMG." so we can map PRIORITY_HIGH -> 1023 [privileged, will failback to 0] PRIORITY_MED -> 0 [default] PRIORITY_LOW -> -1023 They also map onto the priorities used by VkQueue (and a VkQueue is essentially a timeline, our i915_gem_context under full-ppgtt). v2: s/CAP_SYS_ADMIN/CAP_SYS_NICE/ v3: Report min/max user priorities as defines in the uapi, and rebase internal priorities on the exposed values. Testcase: igt/gem_exec_schedule Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-9-chris@chris-wilson.co.uk
*	drm/i915: Expand I915_PARAM_HAS_SCHEDULER into a capability bitmask	Chris Wilson	2017-10-04	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the next few patches, we wish to enable different features for the scheduler, some which may subtlety change ABI (e.g. allow requests to be reordered under different circumstances). So we need to make sure userspace is cognizant of the changes (if they care), by which we employ the usual method of a GETPARAM. We already have an I915_PARAM_HAS_SCHEDULER (which notes the existing ability to reorder requests to avoid bubbles), and now we wish to extend that to be a bitmask to describe the different capabilities implemented. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-7-chris@chris-wilson.co.uk
*	uapi/drm/i915: document field usage of drm_i915_perf_oa_config	Lionel Landwerlin	2017-09-18	1	-0/+5
\| \| \| \| \| \| \| \| \|	Document the expected length of buffers config pointers (tuple of u32 values). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170918114241.30105-1-lionel.g.landwerlin@intel.com
*	drm/i915: Simplify i915_reg_read_ioctl	Joonas Lahtinen	2017-09-14	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert to use the freshly available made INTEL_GEN_MASK for easier grepping and improve function readability and clarify the UABI documentation. No functional changes. v2: - Lift GEM_BUG_ONs and use is_power_of_2 (Chris) - Retain -EINVAL on bad flags behavior (Chris) v3: - Extract flags with 'entry->size - 1' (Chris) v4: - Add GEM_BUG_ON on for flags vs entry offset (Chris) v5: - Use 'u16' to match 'dev_priv' (Ville) v6: - Fix checkpatch.pl errors Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170913115255.13851-2-joonas.lahtinen@linux.intel.com
*	drm/i915/perf: Remove __user from u64 in drm_i915_perf_oa_config	Chris Wilson	2017-09-05	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	Sparse complains that these integers from which we form void __user *, and so we don't need the annotation itself inside the uABI. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170901145729.21363-2-chris@chris-wilson.co.uk Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
*	drm/i915: Add support for drm syncobjs	Jason Ekstrand	2017-08-15	1	-2/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit adds support for waiting on or signaling DRM syncobjs as part of execbuf. It does so by hijacking the currently unused cliprects pointer to instead point to an array of i915_gem_exec_fence structs which containe a DRM syncobj and a flags parameter which specifies whether to wait on it or to signal it. This implementation theoretically allows for both flags to be set in which case it waits on the dma_fence that was in the syncobj and then immediately replaces it with the dma_fence from the current execbuf. v2: - Rebase on new syncobj API v3: - Pull everything out into helpers - Do all allocation in gem_execbuffer2 - Pack the flags in the bottom 2 bits of the drm_syncobj* v4: - Prevent a potential race on syncobj->fence Testcase: igt/gem_exec_fence/syncobj* Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/1499289202-25441-1-git-send-email-jason.ekstrand@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170815145733.4562-1-chris@chris-wilson.co.uk
*	drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface	Lionel Landwerlin	2017-08-03	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The motivation behind this new interface is expose at runtime the creation of new OA configs which can be used as part of the i915 perf open interface. This will enable the kernel to learn new configs which may be experimental, or otherwise not part of the core set currently available through the i915 perf interface. v2: Drop DRM_ERROR for userspace errors (Matthew) Add padding to userspace structure (Matthew) s/guid/uuid/ (Matthew) v3: Use u32 instead of int to iterate through registers (Matthew) v4: Lock access to dynamic config list (Lionel) v5: by Matthew: Fix uninitialized error values Fix incorrect unwiding when opening perf stream Use kmalloc_array() to store register Use uuid_is_valid() to valid config uuids Declare ioctls as write only Check padding members are set to 0 by Lionel: Return ENOENT rather than EINVAL when trying to remove non existing config v6: by Chris: Use ref counts for OA configs Store UUID in drm_i915_perf_oa_config rather then using pointer Shuffle fields of drm_i915_perf_oa_config to avoid padding v7: by Chris Rename uapi pointers fields to end with '_ptr' v8: by Andrzej, Marek, Sebastian Update register whitelisting by Lionel Add more register names for documentation Allow configuration programming in non-paranoid mode Add support for value filter for a couple of registers already programmed in other part of the kernel v9: Documentation fix (Lionel) Allow writing WAIT_FOR_RC6_EXIT only on Gen8+ (Andrzej) v10: Perform read access_ok() on register pointers (Lionel) Signed-off-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Andrzej Datczuk <andrzej.datczuk@intel.com> Reviewed-by: Andrzej Datczuk <andrzej.datczuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-2-lionel.g.landwerlin@intel.com
*	drm/i915: Allow execbuffer to use the first object as the batch	Chris Wilson	2017-06-16	1	-2/+17
\| \| \| \| \| \| \| \| \| \| \|	Currently, the last object in the execlist is the always the batch. However, when building the batch buffer we often know the batch object first and if we can use the first slot in the execlist we can emit relocation instructions relative to it immediately and avoid a separate pass to adjust the relocations to point to the last execlist slot. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
*	drm/i915/perf: Add OA unit support for Gen 8+	Robert Bragg	2017-06-14	1	-7/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all share (more-or-less) the same OA unit design. Of particular note in comparison to Haswell: some OA unit HW config state has become per-context state and as a consequence it is somewhat more complicated to manage synchronous state changes from the cpu while there's no guarantee of what context (if any) is currently actively running on the gpu. The periodic sampling frequency which can be particularly useful for system-wide analysis (as opposed to command stream synchronised MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to have become per-context save and restored (while the OABUFFER destination is still a shared, system-wide resource). This support for gen8+ takes care to consider a number of timing challenges involved in synchronously updating per-context state primarily by programming all config state from the cpu and updating all current and saved contexts synchronously while the OA unit is still disabled. The driver intentionally avoids depending on command streamer programming to update OA state considering the lack of synchronization between the automatic loading of OACTXCONTROL state (that includes the periodic sampling state and enable state) on context restore and the parsing of any general purpose BB the driver can control. I.e. this implementation is careful to avoid the possibility of a context restore temporarily enabling any out-of-date periodic sampling state. In addition to the risk of transiently-out-of-date state being loaded automatically; there are also internal HW latencies involved in the loading of MUX configurations which would be difficult to account for from the command streamer (and we only want to enable the unit when once the MUX configuration is complete). Since the Gen8+ OA unit design no longer supports clock gating the unit off for a single given context (which effectively stopped any progress of counters while any other context was running) and instead supports tagging OA reports with a context ID for filtering on the CPU, it means we can no longer hide the system-wide progress of counters from a non-privileged application only interested in metrics for its own context. Although we could theoretically try and subtract the progress of other contexts before forwarding reports via read() we aren't in a position to filter reports captured via MI_REPORT_PERF_COUNT commands. As a result, for Gen8+, we always require the dev.i915.perf_stream_paranoid to be unset for any access to OA metrics if not root. v5: Drain submitted requests when enabling metric set to ensure no lite-restore erases the context image we just updated (Lionel) v6: In addition to drain, switch to kernel context & update all context in place (Chris) v7: Add missing mutex_unlock() if switching to kernel context fails (Matthew) v8: Simplify OA period/flex-eu-counters programming by using the batchbuffer instead of modifying ctx-image (Lionel) v9: Back to updating the context image (due to erroneous testing, batchbuffer programming the OA unit doesn't actually work) (Lionel) Pin context before updating context image (Chris) Drop MMIO programming now that we switch to a kernel context with right values in initial context image (Chris) v10: Just pin_map the contexts we want to modify or let the configuration happen on first use (Chris) v11: Update kernel context OA config through the batchbuffer rather than on the fly ctx-image update (Lionel) v12: Rework OA context registers update again by swithing away from user contexts and reconfiguring the kernel context through the batchbuffer and updating all the other contexts' context image. Also take care to lock slice/subslice configuration when OA is on. (Lionel) v13: Request rpcs updates on all engine when updating the OA config (Lionel) v14: Drop any kind of rpcs management now that we monitor sseu configuration changes in a later patch (Lionel) Remove usleep after programming the NOA configs on Gen8+, this doesn't seem to be needed (Lionel) v15: Respect coding style for block comments (Chris) v16: Add missing i915_add_request() in case we fail to emit OA configuration (Matthew) Signed-off-by: Robert Bragg <robert@sixbynine.org> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/ Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
*	drm/i915: expose _SUBSLICE_MASK GETPARM	Robert Bragg	2017-06-14	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	Assuming a uniform mask across all slices, this enables userspace to determine the specific sub slices can be enabled. This information is required, for example, to be able to analyse some OA counter reports where the counter configuration depends on the HW sub slice configuration. Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
*	drm/i915: expose _SLICE_MASK GETPARM	Robert Bragg	2017-06-14	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	Enables userspace to determine the maximum number of slices that can be enabled on the device and also know what specific slices can be enabled. This information is required, for example, to be able to analyse some OA counter reports where the counter configuration depends on the HW slice configuration. Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
*	drm/i915: Copy user requested buffers into the error state	Chris Wilson	2017-04-15	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce a new execobject.flag (EXEC_OBJECT_CAPTURE) that userspace may use to indicate that it wants the contents of this buffer preserved in the error state (/sys/class/drm/cardN/error) following a GPU hang involving this batch. Use this at your discretion, the contents of the error state. although compressed, are allocated with GFP_ATOMIC (i.e. limited) and kept for all eternity (until the error state is destroyed). Based on an earlier patch by Ben Widawsky <ben@bwidawsk.net> Testcase: igt/gem_exec_capture Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Matt Turner <mattst88@gmail.com> Acked-by: Ben Widawsky <ben@bwidawsk.net> Acked-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170415093902.22581-1-chris@chris-wilson.co.uk
*	drm/i915: Treat WC a separate cache domain	Chris Wilson	2017-04-12	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When discussing a new WC mmap, we based the interface upon the assumption that GTT was fully coherent. How naive! Commits 3b5724d702ef ("drm/i915: Wait for writes through the GTT to land before reading back") and ed4596ea992d ("drm/i915/guc: WA to address the Ringbuffer coherency issue") demonstrate that writes through the GTT are indeed delayed and may be overtaken by direct WC access. To be safe, if userspace is mixing WC mmaps with other potential GTT access (pwrite, GTT mmaps) it should use set_domain(WC). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96563 Testcase: igt/gem_pwrite/small-gtt* Testcase: igt/drv_selftest/coherency Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170412110111.26626-2-chris@chris-wilson.co.uk
*	drm/i915: Support explicit fencing for execbuf	Chris Wilson	2017-01-27	1	-1/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that the user can opt-out of implicit fencing, we need to give them back control over the fencing. We employ sync_file to wrap our drm_i915_gem_request and provide an fd that userspace can merge with other sync_file fds and pass back to the kernel to wait upon before future execution. Testcase: igt/gem_exec_fence Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Chad Versace <chadversary@chromium.org> Link: http://patchwork.freedesktop.org/patch/msgid/20170127094008.27489-2-chris@chris-wilson.co.uk
*	drm/i915: Enable userspace to opt-out of implicit fencing	Chris Wilson	2017-01-27	1	-1/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Userspace is faced with a dilemma. The kernel requires implicit fencing to manage resource usage (we always must wait for the GPU to finish before releasing its PTE) and for third parties. However, userspace may wish to avoid this serialisation if it is either using explicit fencing between parties and wants more fine-grained access to buffers (e.g. it may partition the buffer between uses and track fences on ranges rather than the implicit fences tracking the whole object). It follows that userspace needs a mechanism to avoid the kernel's serialisation on its implicit fences before execbuf execution. The next question is whether this is an object, execbuf or context flag. Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on shared winsys buffers, but implicit fencing on internal surfaces) require a per-object level flag. Given that this flag need to be only set once for the lifetime of the object, this reduces the convenience of having an execbuf or context level flag (and avoids having multiple pieces of uABI controlling the same feature). Incorrect use of this flag will result in rendering corruption and GPU hangs - but will not result in use-after-free or similar resource tracking issues. Serious caveat: write ordering is not strictly correct after setting this flag on a render target on multiple engines. This affects all subsequent GEM operations (execbuf, set-domain, pread) and shared dma-buf operations. A fix is possible - but costly (both in terms of further ABI changes and runtime overhead). Testcase: igt/gem_exec_async Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Chad Versace <chadversary@chromium.org> Link: http://patchwork.freedesktop.org/patch/msgid/20170127094008.27489-1-chris@chris-wilson.co.uk
*	drm/i915/get_params: Add HuC status to getparams	Anusha Srivatsa	2017-01-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch will allow for getparams to return the status of the HuC. As the HuC has to be validated by the GuC this patch uses the validated status to show when the HuC is loaded and ready for use. You cannot use the loaded status as with the GuC as the HuC is verified after it is loaded and is not usable until it is verified. v2: removed the forewakes as the registers are already force-woken. (T.Ursulin) v3: rebased on top of drm-tip. Removed any reference to intel_huc.h v4: rebased. Rename I915_PARAM_HAS_HUC to I915_PARAM_HUC_STATUS. Remove intel_is_huc_valid() since it is used only in one place. Put the case of I915_PARAM_HAS_HUC() in the right place. v5: rebased. Add a comment to specify that I915_READ(reg) does not read garbage value. The register HUC_STATUS2 is force woken and no rpm is needed. Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Signed-off-by: Peter Antoine <peter.antoine@intel.com> Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1484755558-1234-6-git-send-email-anusha.srivatsa@intel.com
*	drm/i915/perf: Treat u64 in uabi as a normal integer	Chris Wilson	2016-12-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Forgo marking up the u64 integer representing a user pointer as this just annoys sparse. The conversion from u64 to a user pointer is managed by u64_to_user_ptr(). Fixes: eec688e1420d ("drm/i915: Add i915 perf infrastructure") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Robert Bragg <robert@sixbynine.org> Cc: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161130164649.26809-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
*	drm/i915: Enable i915 perf stream for Haswell OA unit	Robert Bragg	2016-11-22	1	-2/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gen graphics hardware can be set up to periodically write snapshots of performance counters into a circular buffer via its Observation Architecture and this patch exposes that capability to userspace via the i915 perf interface. v2: Make sure to initialize ->specific_ctx_id when opening, without relying on _pin_notify hook, in case ctx already pinned. v3: Revert back to pinning ctx upfront when opening stream, removing need to hook in to pinning and to update OACONTROL on the fly. Signed-off-by: Robert Bragg <robert@sixbynine.org> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-7-robert@sixbynine.org
*	drm/i915: Add i915 perf infrastructure	Robert Bragg	2016-11-22	1	-0/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adds base i915 perf infrastructure for Gen performance metrics. This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64 properties to configure a stream of metrics and returns a new fd usable with standard VFS system calls including read() to read typed and sized records; ioctl() to enable or disable capture and poll() to wait for data. A stream is opened something like: uint64_t properties[] = { /* Single context sampling / DRM_I915_PERF_PROP_CTX_HANDLE, ctx_handle, / Include OA reports in samples / DRM_I915_PERF_PROP_SAMPLE_OA, true, / OA unit configuration */ DRM_I915_PERF_PROP_OA_METRICS_SET, metrics_set_id, DRM_I915_PERF_PROP_OA_FORMAT, report_format, DRM_I915_PERF_PROP_OA_EXPONENT, period_exponent, }; struct drm_i915_perf_open_param parm = { .flags = I915_PERF_FLAG_FD_CLOEXEC \| I915_PERF_FLAG_FD_NONBLOCK \| I915_PERF_FLAG_DISABLED, .properties_ptr = (uint64_t)properties, .num_properties = sizeof(properties) / 16, }; int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param); Records read all start with a common { type, size } header with DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records contain an extensible number of fields and it's the DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that determine what's included in every sample. No specific streams are supported yet so any attempt to open a stream will return an error. v2: use i915_gem_context_get() - Chris Wilson v3: update read() interface to avoid passing state struct - Chris Wilson fix some rebase fallout, with i915-perf init/deinit v4: s/DRM_IORW/DRM_IOW/ - Emil Velikov Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-2-robert@sixbynine.org
*	drm/i915: Add bannable context parameter	Mika Kuoppala	2016-11-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now when driver has per context scoring of 'hanging badness' and also subsequent hangs during short windows are allowed, if there is progress made in between, it does not make sense to expose a ban timing window as a context parameter anymore. Let the scoring be the sole indicator for ban policy and substitute ban period context parameter as a boolean to get/set context bannable property. v2: allow non root to opt into being banned (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
*	drm/i915/scheduler: Signal the arrival of a new request	Chris Wilson	2016-11-14	1	-0/+5
\| \| \| \| \| \| \| \| \|	The start of the scheduler, add a hook into request submission for the scheduler to see the arrival of new requests and prepare its runqueues. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-6-chris@chris-wilson.co.uk
*	drm/i915: Add I915_PARAM_MMAP_GTT_VERSION to advertise unlimited mmaps	Chris Wilson	2016-08-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we have working partial VMA and faulting support for all objects, including fence support, advertise to userspace that it can take advantage of unlimited GGTT mmaps. v2: Make room in the kerneldoc for a more detailed explanation of the limitations of the GTT mmap interface. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20160825180519.11341-1-chris@chris-wilson.co.uk
*	drm/i915: Embrace the race in busy-ioctl	Chris Wilson	2016-08-16	1	-1/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Daniel Vetter proposed a new challenge to the serialisation inside the busy-ioctl that exposed a flaw that could result in us reporting the wrong engine as being busy. If the request is reallocated as we test its busyness and then reassigned to this object by another thread, we would not notice that the test itself was incorrect. We are faced with a choice of using __i915_gem_active_get_request_rcu() to first acquire a reference to the request preventing the race, or to acknowledge the race and accept the limitations upon the accuracy of the busy flags. Note that we guarantee that we never falsely report the object as idle (providing userspace itself doesn't race), and so the most important use of the busy-ioctl and its guarantees are fulfilled. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471337440-16777-1-git-send-email-chris@chris-wilson.co.uk
*	drm/i915: Document and reject invalid tiling modes	Chris Wilson	2016-08-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Through the GTT interface to the fence registers, we can only handle linear, X and Y tiling. The more esoteric tiling patterns are ignored. Document that the tiling ABI only supports upto Y tiling, and reject any attempts to set a tiling mode other than NONE, X or Y. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-17-git-send-email-chris@chris-wilson.co.uk
*	drm/i915: Pad GTT views of exec objects up to user specified size	Chris Wilson	2016-08-04	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Our GPUs impose certain requirements upon buffers that depend upon how exactly they are used. Typically this is expressed as that they require a larger surface than would be naively computed by pitch * height. Normally such requirements are hidden away in the userspace driver, but when we accept pointers from strangers and later impose extra conditions on them, the original client allocator has no idea about the monstrosities in the GPU and we require the userspace driver to inform the kernel how many padding pages are required beyond the client allocation. v2: Long time, no see v3: Try an anonymous union for uapi struct compatibility Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-7-git-send-email-chris@chris-wilson.co.uk
*	drm/i915: Give proper names to MOCS entries	Imre Deak	2016-07-19	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The purpose for each MOCS entry isn't well defined atm. Defining these is important to remove any uncertainty about the use of these entries for example in terms of performance and GPU/CPU coherency. Suggested by Ville. v4: - Rename I915_MOCS_AUTO to I915_MOCS_PTE. (Chris) CC: Rong R Yang <rong.r.yang@intel.com> CC: Yakui Zhao <yakui.zhao@intel.com> CC: Ville Syrjälä <ville.syrjala@linux.intel.com> CC: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467383528-16142-1-git-send-email-imre.deak@intel.com
*	drm/i915: compile-time consistency check on __EXEC_OBJECT flags	Dave Gordon	2016-07-19	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two different sets of flag bits are stored in the 'flags' member of a 'struct drm_i915_gem_exec_object2', and they're defined in two different source files, increasing the risk of an accidental clash. Some flags in this field are supplied by the user; these are defined in i915_drm.h, and they start from the LSB and work up. Other flags are defined in i915_gem_execbuffer, for internal use within that file only; they start from the MSB and work down. So here we add a compile-time check that the two sets of flags do not overlap, which would cause all sorts of confusion. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1468504324-12690-1-git-send-email-david.s.gordon@intel.com
*	drm/i915: Allow userspace to request no-error-capture upon GPU hangs	Chris Wilson	2016-07-04	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	igt likes to inject GPU hangs into its command streams. However, as we expect these hangs, we don't actually want them recorded in the dmesg output or stored in the i915_error_state (usually). To accommodate this allow userspace to set a flag on the context that any hang emanating from that context will not be recorded. We still do the error capture (otherwise how do we find the guilty context and know its intent?) as part of the reason for random GPU hang injection is to exercise the race conditions between the error capture and normal execution. v2: Split out the request->ringbuf error capture changes. v3: Move the flag defines next to the intel_context->flags definition Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-9-git-send-email-chris@chris-wilson.co.uk
*	drm/i915/bxt: Export pooled eu info to userspace	arun.siluvery@linux.intel.com	2016-07-01	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pooled EU is a bxt only feature and kernel changes are already merged. This feature is not yet exposed to userspace as the support was not yet available. Beignet team expressed interest and added patches to use this. Since we now have a user and patches to use them, expose them from the kernel side as well. v2: fix compile error [1] https://lists.freedesktop.org/archives/beignet/2016-June/007698.html [2] https://lists.freedesktop.org/archives/beignet/2016-June/007699.html Cc: Winiarski, Michal <michal.winiarski@intel.com> Cc: Zou, Nanhai <nanhai.zou@intel.com> Cc: Yang, Rong R <rong.r.yang@intel.com> Cc: Tim Gore <tim.gore@intel.com> Cc: Jeff McGee <jeff.mcgee@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467369782-25992-1-git-send-email-arun.siluvery@linux.intel.com Acked-by: Jani Nikula <jani.nikula@intel.com>
*	drm/i915: add extern C guard for the UAPI header	Emil Velikov	2016-05-13	1	-0/+8
\| \| \| \| \| \|	Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Daniel Vetter <daniel.vetter@intel.com>
*	drm/i915: Fix VCS ring selection after uapi decoupling	Tvrtko Ursulin	2016-01-28	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This got broken in: commit de1add360522c876c25ef2bbbbab1c94bdb509ab Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Fri Jan 15 15:12:50 2016 +0000 drm/i915: Decouple execbuf uAPI from internal implementation BSD ring flags need to be shifted before they can be considered indices into the ring array. Reported by Zhipeng Gong. v2: Simplify the code. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Zhipeng Gong <zhipeng.gong@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1453902069-31353-1-git-send-email-tvrtko.ursulin@linux.intel.com Testcase: igt/gem_exec_basic # bdw-gt3
*	drm/i915: Seal busy-ioctl uABI and prevent leaking of internal ids	Chris Wilson	2016-01-21	1	-4/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tvrtko was looking through the execbuffer-ioctl and noticed that the uABI was tightly coupled to our internal engine identifiers. Close inspection also revealed that we leak those internal engine identifiers through the busy-ioctl, and those internal identifiers already do not match the user identifiers. Fortuitiously, there is only one user of the set of busy rings from the busy-ioctl, and they only wish to choose between the RENDER and the BLT engines. Let's fix the userspace ABI while we still can. v2: Update the uAPI documentation to explain the identifiers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Testcase: igt/gem_busy Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1452876706-21620-1-git-send-email-chris@chris-wilson.co.uk
*	Merge tag 'drm-intel-next-2015-12-18' of ↵	Dave Airlie	2015-12-23	1	-3/+9
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://anongit.freedesktop.org/drm-intel into drm-next - fix atomic watermark recomputation logic (Maarten) - modeset sequence fixes for LPT (Ville) - more kbl enabling&prep work (Rodrigo, Wayne) - first bits for mst audio - page dirty tracking fixes from Dave Gordon - new get_eld hook from Takashi, also included in the sound tree - fixup cursor handling when placed at address 0 (Ville) - refactor VBT parsing code (Jani) - rpm wakelock debug infrastructure ( Imre) - fbdev is pinned again (Chris) - tune the busywait logic to avoid wasting cpu cycles (Chris) * tag 'drm-intel-next-2015-12-18' of git://anongit.freedesktop.org/drm-intel: (81 commits) drm/i915: Update DRIVER_DATE to 20151218 drm/i915/skl: Default to noncoherent access up to F0 drm/i915: Only spin whilst waiting on the current request drm/i915: Limit the busy wait on requests to 5us not 10ms! drm/i915: Break busywaiting for requests on pending signals drm/i915: don't enable autosuspend on platforms without RPM support drm/i915/backlight: prefer dev_priv over dev pointer drm/i915: Disable primary plane if we fail to reconstruct BIOS fb (v2) drm/i915: Pin the ifbdev for the info->system_base GGTT mmapping drm/i915: Set the map-and-fenceable flag for preallocated objects drm/i915: mdelay(10) considered harmful drm/i915: check that we are in an RPM atomic section in GGTT PTE updaters drm/i915: add support for checking RPM atomic sections drm/i915: check that we hold an RPM wakelock ref before we put it drm/i915: add support for checking if we hold an RPM reference drm/i915: use assert_rpm_wakelock_held instead of opencoding it drm/i915: add assert_rpm_wakelock_held helper drm/i915: remove HAS_RUNTIME_PM check from RPM get/put/assert helpers drm/i915: get a permanent RPM reference on platforms w/o RPM support drm/i915: refactor RPM disabling due to RC6 being disabled ...
\| *	drm/i915: Add soft-pinning API for execbuffer	Chris Wilson	2015-12-09	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Userspace can pass in an offset that it presumes the object is located at. The kernel will then do its utmost to fit the object into that location. The assumption is that userspace is handling its own object locations (for example along with full-ppgtt) and that the kernel will rarely have to make space for the user's requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> v2: Fixed incorrect eviction found by Michal Winiarski - fix suggested by Chris Wilson. Fixed incorrect error paths causing crash found by Michal Winiarski. (Not published externally) v3: Rebased because of trivial conflict in object_bind_to_vm. Fixed eviction to allow eviction of soft-pinned objects when another soft-pinned object used by a subsequent execbuffer overlaps reported by Michal Winiarski. (Not published externally) v4: Moved soft-pinned objects to the front of ordered_vmas so that they are pinned first after an address conflict happens to avoid repeated conflicts in rare cases (Suggested by Chris Wilson). Expanded comment on drm_i915_gem_exec_object2.offset to cover this new API. v5: Added I915_PARAM_HAS_EXEC_SOFTPIN parameter for detecting this capability (Kristian). Added check for multiple pinnings on eviction (Akash). Made sure buffers are not considered misplaced without the user specifying EXEC_OBJECT_SUPPORTS_48B_ADDRESS. User must assume responsibility for any addressing workarounds. Updated object2.offset field comment again to clarify NO_RELOC case (Chris). checkpatch cleanup. v6: Trivial rebase on latest drm-intel-nightly v7: Catch attempts to pin above the max virtual address size and return EINVAL (Tvrtko). Decouple EXEC_OBJECT_SUPPORTS_48B_ADDRESS and EXEC_OBJECT_PINNED flags, user must pass both flags in any attempt to pin something at an offset above 4GB (Chris, Daniel Vetter). Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Zou Nanhai <nanhai.zou@intel.com> Cc: Kristian Høgsberg <hoegsberg@gmail.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Acked-by: PDT Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1449575707-20933-1-git-send-email-thomas.daniel@intel.com
* \|	Merge branch 'drm-header-fixes' of https://github.com/GabrielL/linux into ↵	Dave Airlie	2015-12-11	1	-1/+1
\|\ \ \| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	drm-next Fix all the problems with the header files and userspace builds off them. I really care so little about this, but hey who am I to stop progress. * 'drm-header-fixes' of https://github.com/GabrielL/linux: (30 commits) drm: fix inclusion of drm.h in via_drm.h drm: fix inclusion of drm.h in vmwgfx_drm.h drm: fix inclusion of drm.h in virtgpu_drm.h drm: fix inclusion of drm.h in tegra_drm.h drm: fix inclusion of drm.h in savage_drm.h drm: fix inclusion of drm.h in r128_drm.h drm: fix inclusion of drm.h in qxl_drm.h drm: fix inclusion of drm.h in omap_drm.h drm: fix inclusion of drm.h in msm_drm.h drm: fix inclusion of drm.h in mga_drm.h drm: fix inclusion of drm.h in exynos_sarea.h drm: fix inclusion of drm.h in i810_drm.h drm: fix inclusion of drm.h in exynos_sarea.h drm: fix inclusion of drm.h in drm_sarea.h drm: drm_mode.h fix includes drm: drm_fourcc.h fix includes drm: include drm.h in armada_drm.h include/uapi/drm/amdgpu_drm.h: use __u32 and __u64 from <linux/types.h> drm: Kbuild: add admgpu_drm.h to the installed headers drm: use __u{32,64} instead of uint{32,64}_t in virtgpu_drm.h ...
\| *	drm: fix inclusion of drm.h in exynos_sarea.h	Gabriel Laskar	2015-12-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using `#include "drm.h"` instead of `#include <drm/drm.h>` allow drm headers to be moved in another directory without changes, like for the libdrm imports. Signed-off-by: Gabriel Laskar <gabriel@lse.epita.fr> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> CC: Emil Velikov <emil.l.velikov@gmail.com> CC: Mikko Rapeli <mikko.rapeli@iki.fi>
* \|	drm/i915: Make the high dword offset more explicit in i915_reg_read_ioctl	Ville Syrjälä	2015-11-18	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Store the upper dword of the register offset in the whitelist as well. This would allow it to read register where the two halves aren't sitting right next to each other, and it'll make it easier to make register access type safe. While at it change the register offsets to u32 from u64. Our register space isn't quite that big, yet :) v2: Use ldw/udw as the suffixes, and add a note about 64bit wide split regs (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1446839021-18599-1-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>