v2: rename ina209/ina219 read function
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
v2: add list_del call, reword error message
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
v2: add list_del calls
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
When we start communicating with the pmu a bit more, the current code is
a real issue. I encountered a dead lock here, while testing my dynamic
reclocking code
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
In case of successful suspend, devinit will have to be run and this is
the behavior currently hardcoded. However, as FD bug 94725 suggests,
there might be cases where runtime suspend leaves the GPU powered, and
in such cases devinit should not be run on resume.
On GF100+ we have a reliable way to know whether we need to run devinit.
Use it instead of blindly trusting the flag set by nvkm_devinit_fini().
The code around the NvForcePost also needs to be slightly reworked in
order to keep working.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Suggested-by: Dave Airlie <airlied@redhat.com>
Suggested-by: Karol Herbst <nouveau@karolherbst.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Add a basic clock driver that reuses the GK20A logic.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Make functions/structures that the GM20B driver will reuse public.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Err on the safe side by setting the lowest frequency (and thus voltage)
during device init.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This allows to instanciate drivers that use the same logic as gk20a with
different parameters.
Add a constructor function to allow other chips that inherit from this
clock to easily initialize its members
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
pl_to_div may be done differently depending on the chip. Abstract this
operation so the same logic can be reused for them as well.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This allows us to read them using one single function and will be handy
to the GM20B driver.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Most users are probably not interested in this information.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Only restore the 1:1 divider if it is not set already. Also use the
proper masks for this operation and add a second write as done in the
Android code.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
n_lo is used if we are going to slide. Compute it only if that condition
succeeds to avoid confusion about future usage of this computation.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fix the mask specified to switch to VCO mode was given as an (incorrect)
immediate value. Although the side-effect happens to be the same, this
is clearly incorrect.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
gk20a_pllg_disable() is only used in the context of gk20a_clk_fini().
Move its body there and rename _gk20a_pllg_enable() and
_gk20a_pllg_disable() to non-underscored versions.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Move some variables declarations to the scope where they are actually
used to make the code easier to follow.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Perform computations in Khz instead of Mhz for better precision.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Add basic GM20B volt driver that reuses the GK20A logic.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Split the constructor function so we can reuse the same logic in other
chips.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The CVB calculation and voltage setting functions can be reused for the
future chips. So move the declaration to gk20a.h.
Signed-off-by: Vince Hsu <vinceh@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
the macro deals with target specific differences and so we should always use
this
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
on gk208+ we can simply mov 32bits, so we should have a single mov there
v2: use or operator instead of add
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
When using the DMA-API for instmem, we may obtain a write-combined
mapping. For such cases, add a write barrier in
gk20a_instobj_release_dma() to make sure that all writes have reached
memory at this time.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
LTC operations timeout was set to 2ms, which may be too low for devices
that run at very low clocks (e.g. GM20B) and trigger timeout messages.
Set the timeout to the default 2s. Also remove the redundant error
messages since nvkm_wait_msec() will already display a warning.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
based on Martins initial work
v3: fix ina2x9 calculations
v4: don't kmalloc(0), fix the lsb/pga stuff
v5: add a field to tell if the power reading may be invalid
add nkvm_iccsense_read_all function
check for the device on the i2c bus
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Karol Herbst:
v4: don't kmalloc(0)
v5: stricter validation
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Add secure boot support for the GM20B chip found in Tegra X1. Secure
boot on Tegra works slightly differently from desktop, notably in the
way the WPR region is set up.
In addition, the firmware bootloaders use a slightly different header
format.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Add secure-boot for the dGPU set of GM20X chips, using the PMU as the
high-secure falcon.
This work is based on Deepak Goyal's initial port of Secure Boot to
Nouveau.
v2. use proper memory target function
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
On GM200 and later GPUs, firmware for some essential falcons (notably
GR ones) must be authenticated by a NVIDIA-produced signature and
loaded by a high-secure falcon in order to be able to access privileged
registers, in a process known as Secure Boot.
Secure Boot requires building a binary blob containing the firmwares
and signatures of the falcons to be loaded. This blob is then given to
a high-secure falcon running a signed loader firmware that copies the
blob into a write-protected region, checks that the signatures are
valid, and finally loads the verified firmware into the managed falcons
and switches them to privileged mode.
This patch adds infrastructure code to support this process on chips
that require it.
v2:
- The IRQ mask of the PMU falcon was left - replace it with the proper
irq_mask variable.
- The falcon reset procedure expecting a falcon in an initialized state,
which was accidentally provided by the PMU subdev. Make sure that
secboot can manage the falcon on its own.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Upon encountering an unknown condition code, the script interpreter
is supposed to skip 'size' bytes and continue at the next devinit
token.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
It is not advisable to perform devinit if it has already been done.
VBIOS will very likely have invoked devinit if the GPU is the primary
graphics device, but there is no accurate way to detect this fact yet.
This patch adds such a method for gf100 and later chips, by means of the
NV_PTOP_SCRATCH1_DEVINIT_COMPLETED bit. This bit is set to 1 by devinit,
and reset to 0 when the GPU is powered.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We never use any nv50-specific member in this nv50_devinit_preinit().
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Patch "ltc/gm107: use nvkm_mask to set cbc_ctrl1" sets the 3rd bit
of the CTRL1 register instead of writing it entirely in
gm107_ltc_cbc_clear(). As a counterpart, gm107_ltc_cbc_wait() must also
be modified to wait on that single bit only, otherwise a timeout may
occur if some other bit of that register is set. This happened at least
on GM206 when running glmark2-drm.
While we are at it, use the more compact nvkm_wait_msec() to wait for
the bit to clear.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The asm-generic tree this time contains one series from Nicolas Pitre
that makes the optimized do_div() implementation from the ARM
architecture available to all architectures. This also adds stricter
type checking for callers of do_div, which has uncovered a number
of bugs in existing code, and fixes up the ones we have found.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAVqARKWCrR//JCVInAQJrBhAAlwZL0IiVGFfDXWtvQGOm+yC5j4vdIhMf
1scsvRbk3ln1xUk5+NM61NpxbQotro78K5HxFZFhaVGUTbbFXM9w2VZSyI8ZaGAJ
Od6lBUUyLQmzlbHDJ3v/zrZn8Up7qZlRApmXcbUVDtssfnEfKk4xA2RG9JwIMS1c
uZMvnD7N3P9vxDPl+CsYlB2osi6Yks3VQ1tXYe2z6siO+H67zHaF08+ls7fbsd3d
oyKjZqlaQ02MIOr+AdR0h9iKyJJ6SXT0DQlsMyzB6aBWmeBCNLNALNIiukDk9Qc1
VV3sF1MOS3LtfU2TeOx4Na7hcd2iC6WYLb271iApO2Ww7t16n+de3i6AipZxLUJ0
08jiRlisTzUhXDobRSqI3mcQlxrB5UGfyblab2z/MqGGmIGJSPPRdTPRQUgi0ZKg
jksSmsaPwOQp64FhTgECLJthlYX7h6ULjkvJ9h60gZHa4jhGZbGPeMwHPf1uSm95
EvQE971Ssgm4jwhvxZ/kt1ruuZI/fxxG1Qfw+C25QkXZGKye2nB+icLWeMwz+FXG
HLqkmaAjasf5MAV1GiK8U6zoC6bCOLU0Lea83hOwRPZ999v3Nym1giSatNv4/pB+
QmkXRvFi93cdQ643l7xcUEDT2zpk4pogF3xREiBhyaXtqLlT7pPMKsBQOgdWvFuu
Ou0ZbEAwIVo=
=4psa
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"The asm-generic tree this time contains one series from Nicolas Pitre
that makes the optimized do_div() implementation from the ARM
architecture available to all architectures.
This also adds stricter type checking for callers of do_div, which has
uncovered a number of bugs in existing code, and fixes up the ones we
have found"
* tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
ARM: asm/div64.h: adjust to generic codde
__div64_32(): make it overridable at compile time
__div64_const32(): abstract out the actual 128-bit cross product code
do_div(): generic optimization for constant divisor on 32-bit machines
div64.h: optimize do_div() for power-of-two constant divisors
mtd/sm_ftl.c: fix wrong do_div() usage
drm/mgag200/mgag200_mode.c: fix wrong do_div() usage
hid-sensor-hub.c: fix wrong do_div() usage
ti/fapll: fix wrong do_div() usage
ti/clkt_dpll: fix wrong do_div() usage
tegra/clk-divider: fix wrong do_div() usage
imx/clk-pllv2: fix wrong do_div() usage
imx/clk-pllv1: fix wrong do_div() usage
nouveau/nvkm/subdev/clk/gk20a.c: fix wrong do_div() usage
v2: rename and group functions
v4: change copyright information
move printing of pcie speeds into oneinit,
rename all pcie functions to nvkm_pcie_*
don't try to raise the pcie version when no higher one is supported
v5: revert Copyright changes and rename nvkm_pcie_raise_version to nvkm_pcie_set_version
v6: remove some useless pci_is_pcie checks and rework messages
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Commit 69c4938249 ("drm/nouveau/instmem/gk20a: use direct CPU access")
tried to be smart while using the DMA-API by managing the CPU mappings of
buffers allocated with the DMA-API by itself. In doing so, it relied
on dma_to_phys() which is an architecture-private function not
available everywhere. This broke the build on several architectures.
Since there is no reliable and portable way to obtain the physical
address of a DMA-API buffer, stop trying to be smart and just use the
CPU mapping that the DMA-API can provide. This means that buffers will
be CPU-mapped for all their life as opposed to when we need them, but
anyway using the DMA-API here is a fallback for when no IOMMU is
available so we should not expect optimal behavior.
This makes the IOMMU and DMA-API implementations of instmem diverge
enough that we should maybe put them into separate files...
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The LRU list used for recycling CPU mappings was handling concurrency
very poorly. For instance, if an instobj was acquired twice before being
released once, it would end up into the LRU list even though there is
still a client accessing it.
This patch fixes this by properly counting how many clients are
currently using a given instobj.
While at it, we also raise errors when inconsistencies are detected, and
factorize some code.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is an oversight that made use of the trip-point-based fan managenent on
cards that never expose those. This led the fan to stay at fan_min.
Fortunately, the emergency code would kick when the temperature would reach
90°C.
Reported-by: Tom Englund <tomenglund26@gmail.com>
Tested-by: Tom Englund <tomenglund26@gmail.com>
Signed-off-by: Martin Peres <martin.peres@free.fr>
Tested-by: Daemon32 <lnf.purple@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92126
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
Just the one commit I mentioned earlier, making the PGOB workaround the
default.
* 'linux-4.4' of https://github.com/skeggsb/linux:
drm/nouveau/pmu: remove whitelist for PGOB-exit WAR, enable by default
NVIDIA have indicated that the workaround is required on all GK10[467]
boards that have the PGOB fuse set.
I've left the commandline option in place for now, as paranoia.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs wrote:
A couple of regression fixes, some more boards whitelisted for a hw bug
workaround, gr/ucode fixes for hangs a user is seeing.
The changes look larger than they actually are due to the ucode binaries
(*.fucN.h) being regenerated.
* 'linux-4.4' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau/volt/pwm/gk104: fix an off-by-one resulting in the voltage not being set
drm/nouveau/nvif: allow userspace access to its own client object
drm/nouveau/gr/gf100-: fix oops when calling zbc methods
drm/nouveau/gr/gf117-: assume no PPC if NV_PGRAPH_GPC_GPM_PD_PES_TPC_ID_MASK is zero
drm/nouveau/gr/gf117-: read NV_PGRAPH_GPC_GPM_PD_PES_TPC_ID_MASK from correct GPC
drm/nouveau/gr/gf100-: split out per-gpc address calculation macro
drm/nouveau/bios: return actual size of the buffer retrieved via _ROM
drm/nouveau/instmem: protect instobj list with a spinlock
drm/nouveau/pci: enable c800 magic for some unknown Samsung laptop
drm/nouveau/pci: enable c800 magic for Clevo P157SM
No locking is required for the traversal of this list, as it only
happens during suspend/resume where nothing else can be executing.
Fixes some of the issues noticed during parallel piglit runs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
gk20a is an ARM only GPU, so we can just do the correct thing on
ARM but fail on other architectures. The other option was to use
SWIOTLB as the define, which means phys_to_page exists, but
this seems clearer.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch uses an approach closer to the nvidia driver to configure
both PLLs for high gddr5 memory clocks (usually above 2400MHz)
Previously nouveau used the one PLL as it was used for the lower clocks
and just adjusted the second PLL to get as close as possible to the
requested clock. This means for my card, that I got a 4050 MHz clock
although 4008 MHz was requested.
Now the driver iterates over a list of PLL configuration also used by
the nvidia driver and then adjust the second PLL to get near the
requested clock. Also it hold to some restriction I found while
analyzing the PLL configurations
This won't fix all gddr5 high clock issues itself, but it should be
fine on hybrid gpu systems as found on many laptops these days. Also
switching while normal desktop usage should be a lot more stable than
before.
v2: move the pll code into ramgk104
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Your milage may vary, as it's only been tested on a single G94 and one G96.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Avoids waiting for VBLANKS that never arrive on headless or otherwise
unconventional set-ups. Strategy taken from MEMX.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
10053c is not even read on some cards, and I have no idea exactly what the
criteria are. Likely NVIDIA pre-scans the VBIOS and in their driver disables
all features that are never used. The practical effect should be the same
as this implementation though.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Like Pierre's G94. We might want to structure Kepler similarly in a follow-up.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Does not seem to be necessary for NVA0, hence untested by me.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Seems to be mostly equal to DDR3 on < GT218, should improve stability for
DDR2 reclocks.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
In preparation of changing FBVDDQ, as observed on at least one GDDR3 card.
While at it, adhere to func.log[1] properly for consistency.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
If the hardware supports extended tag field (8-bit ones), then enable it.
This is usually done by the VBIOS, but not on some MBPs (see fdo#86537).
In case extended tag field is not supported, 5-bit tag field is used which
limits the possible number of requests to 32. Apparently bits 7:0 of
0x08841c stores some number of outstanding requests, so cap it to 32 if
extended tag is unsupported.
Fixes: fdo#86537
v2: Restrict changes to chipsets >= 0x84
v3:
* Add nvkm_pci_mask to pci.h
* Mask bit 8 before setting it
v4:
* Rename `add` argument of nvkm_pci_mask to `value`
* Move code from nvkm_pci_init to g84_pci_init and remove PCIe and chipset
checks
v5:
* Rebase code on latest PCI structure
* Restore PCIe check
* Fix namings in nvkm_pci_mask
* Rephrase part of the commit message
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
coverity.com reported that memset was using a buffer of size 0, on
checking the code it turned out that the function was not being used. So
remove it.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Was not able to obtain a trace of NVRM due to kernel version annoyances,
however, experimentally confirmed that the WAR we use on NV50/G8x boards
works here too.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Increase clock timeout of some unknown engines in order to avoid failure
at high gpcclk rate.
This fixes IBUS read faults on my GF119 when reclocking is manually
enabled. Note that memory reclocking is completely broken and NvMemExec
has to be disabled to allow core clock reclocking only.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Most Keplers actually use the GPIO-based voltage management instead of the new
PWM-based one. Use the GPIO mode as a fallback as it already gracefully handles
the case where no GPIOs exist.
All the Maxwells seem to use the PWM method though.
v2:
- Do not forget to commit the PWM configuration change!
Signed-off-by: Martin Peres <martin.peres@free.fr>
This patch is not ideal but it definitely beats a rewrite of the current
interface and is very self-contained.
Signed-off-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Use the IOMMU bit specified in platform data instead of hardcoding it to
the bit used by current Tegra GPUs.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The Great Nouveau Refactoring Take II brought us a lot of goodness,
including acquire/release methods that are called before and after an
instobj is modified. These functions can be used as synchronization
points to manage CPU/GPU coherency if we modify an instobj using the
CPU.
This patch replaces the legacy and slow PRAMIN access for gk20a instmem
with CPU mappings and writes. A LRU list is used to unmap unused
mappings after a certain threshold (currently 1MB) of mapped instobjs is
reached. This allows mappings to be reused most of the time.
Accessing instobjs using the CPU requires to maintain the GPU L2 cache,
which we do in the acquire/release functions. This triggers a lot of L2
flushes/invalidates, but most of them are performed on an empty cache
(and thus return immediately), and overall context setup performance
greatly benefits from this (from 250ms to 160ms on Jetson TK1 for a
simple libdrm program).
Making L2 management more explicit should allow us to grab some more
performance in the future.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Allow clients to manually flush and invalidate L2. This will be useful
for Tegra systems for which we want to write instmem using the CPU.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
These are useful for systems without a coherent CPU/GPU bus. For such
systems we may need to maintain the L2 ourselves.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Some devices may not have a PMU. Avoid a NULL pointer dereference in
such cases by checking whether the pointer given to nvkm_pmu_pgob() is
valid.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Currently OF bios load fails for a few reasons:
- checksum failure
- bios size too small
- no PCIR header
- bios length not a multiple of 4
In this change, we resolve all of the above by ignoring any checksum
failures (since OF VBIOS tends not to have a checksum), and faking the
PCIR data when loading from OF.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
SiS 761 chipset does not support AGP cards but has AGP capability (for
the onboard video). At least PC Chips A31G board using this chipset has
an AGP-like AGPro slot that's wired to the PCI bus. Enabling AGP will
fail (GPU lockup and software fbcon, X11 hangs).
Add support for matching just the host bridge in nvkm_device_agp_quirks
and add entry for SiS 761 with mode 0 (AGP disabled).
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
three nouveau regression fixes.
* 'linux-4.3' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau/device: enable c800 quirk for tecra w50
drm/nouveau/clk/gt215: Unbreak engine pausing for GT21x/MCP7x
drm/nouveau/gr/nv04: fix big endian setting on gr context
Typo that snuck in with commit 6979c6303a
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reported-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The copyright header in nvkm/engine/device/platform.c has been replaced
with the NVIDIA one from drm/nouveau_platform.c, as most of the actual
code is now theirs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Doesn't fix any known issue, but best be safe in case control is handed
to us from firmware with these left enabled.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This ensures we have a valid mask of disabled engines before we start
trying to execute fini()/init() on the subdevs, potentially touching
devices that don't exist.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
An upcoming commit requires being able to modify the PRAMIN BAR page
tables while already holding the MMU subdev mutex.
To solve this issue, each VM has been given its own mutex. As a nice
side-effect, this also allows separate VMs to be updated concurrently.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>