drivers/gpu/drm/i915/i915_drv.h · e36891900855b3bed5d9cc9209655e6dfa435a5f · xcap / xcap-capability-linux

drm/i915: Dynamic Parity Detection handling · e3689190

Ben Widawsky authored May 25, 2012



On IVB hardware we are given an interrupt whenever a L3 parity error
occurs in the L3 cache. The L3 cache is used by internal GPU clients
only.  This is a very rare occurrence (in fact to test this I need to
use specially instrumented silicon).

When a row in the L3 cache detects a parity error the HW generates an
interrupt. The interrupt is masked in GTIMR until we get a chance to
read some registers and alert userspace via a uevent. With this
information userspace can use a sysfs interface (follow-up patch) to
remap those rows.

Way above my level of understanding, but if a given row fails, it is
statistically more likely to fail again than a row which has not failed.
Therefore it is desirable for an operating system to maintain a lifelong
list of failing rows and always remap any bad rows on driver load.
Hardware limits the number of rows that are remappable per bank/subbank,
and should more than that many rows detect parity errors, software
should maintain a list of the most frequent errors, and remap those
rows.

V2: Drop WARN_ON(IS_GEN6) (Jesse)
DRM_DEBUG row/bank/subbank on errror (Jesse)
Comment updates (Jesse)

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

e3689190