Be consistent, and even when we know we had used a WC, flush the mapped
object after writing into it. The flush understands the mapping type and
will only clflush if !I915_MAP_WC, but will always insert a wmb [sfence]
so that we can be sure that all writes are visible.
v2: Add the unconditional wmb so we are know that we always flush the
writes to memory/HW at that point.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200511141304.599-1-chris@chris-wilson.co.uk
The bspec lists both the clock frequency and the effective interval. The
interval corresponds to observed behaviour, so adjust the frequency to
match.
v2: Mika rightfully asked if we could measure the clock frequency from a
selftest.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200427154554.12736-1-chris@chris-wilson.co.uk
For many configuration details within RC6 and RPS we are programming
intervals for the internal clocks. From gen11, these clocks are
configuration via the RPM_CONFIG and so for convenience, we would like
to convert to/from more natural units (ns).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424162805.25920-2-chris@chris-wilson.co.uk
For verifying reciving the EI interrupts, we need to hold the GPU in
very precise conditions (in terms of C0 cycles during the EI). If we
preempt the busy load to handle the heartbeat, this may perturb the busy
load causing us to miss the interrupt.
The other tests, while not as time sensitive, may also be slightly
perturbed, so apply the heartbeat protection across all the
measurements.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422083855.26842-1-chris@chris-wilson.co.uk
Having noticed that MI_BB_START is incurring a memory stall (see the
correlation with uncore frequency), we have to unroll the loop in order
to diminish the impact of the MI_BB_START on the instruction throughput.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200421171351.19575-1-chris@chris-wilson.co.uk
Let's isolate the impact of cpu frequency selection on determing the GPU
throughput in response to selection of RPS frequencies.
For real systems, we do have to be concerned with the impact of
integrating c-states, p-states and rp-states, but for the sake of
proving whether or not RPS works, one baby step at a time.
For the record, as one would hope, it does not seem to impact on the
measured performance, but we do it anyway to reduce the number of
variables. Later, we can extend the testing to encourage the the
cpu/pkg to try and sleep while the GPU is busy.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200421142236.8614-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20200421142236.8614-1-chris@chris-wilson.co.uk
If we detect that the RPS end points do not scale perfectly, take the
time to measure all the in between values as well. We are aborting the
test, so we might as well spend the available time gathering critical
debug information instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200421124636.22554-1-chris@chris-wilson.co.uk
After having testing all the RPS controls individually, we need to take
a step back and check how our RPS worker integrates them to perform
dynamic GPU reclocking. So do that by submitting a spinner and wait and
see what happens.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200420172739.11620-6-chris@chris-wilson.co.uk
If we can not manipulate the frequency with RPS, then comparing min/max
power consumption is pointless / misleading. We will leave the warning
about not being able to control the frequency selection via RPS to other
tests so as not to confuse this more specialised check.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200420172739.11620-2-chris@chris-wilson.co.uk
One of the core tenents of reclocking the GPU is that its throughput
scales with the clock frequency. We can observe this by incrementing a
loop counter on the GPU, and compare the different execution rates at
the notional RPS frequencies.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200420172739.11620-1-chris@chris-wilson.co.uk
A basic premise of RPS is that at lower frequencies, not only do we run
slower, but we save power compared to higher frequencies. For example,
when idle, we set the minimum frequency just in case there is some
residual current. Since the power curve should be a physical
relationship, if we find no power saving it's likely that we've broken
our frequency handling, so test!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200417152018.13079-2-chris@chris-wilson.co.uk
It seems that although (perhaps because of the memory stall?) the
spinner has signaled that it has started, it still takes some time to
spin up to 100% utilisation of the HW. Since the test depends on the
full utilisation of the HW to trigger the RPS interrupt, wait a little
bit and flush the interrupt status to be sure that the event we see if
from the spinner.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200417093928.17822-1-chris@chris-wilson.co.uk
Since we depend upon RPS generating interrupts after evaluation
intervals to determine when to up/down clock the GPU, it is imperative
that we successfully enable interrupt generation! Verify that we do see
an interrupt if we keep the GPU busy for an entire EI.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200415170318.16771-1-chris@chris-wilson.co.uk