Current uninitialized sensor detection does not work for me on nv4b and
sensor returns crazy values (>190°C). It stabilises later, but it's too
late - therm code shutdowns the machine...
Let's just reset it on init.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
+ the same for shutdown threshold - seems impossible, but shutdown can fail.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fixes "BUG: spinlock bad magic" on module load for nva3+ cards.
Introduced in commit "drm/nouveau/therm: implement support for temperature
alarms".
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
the variable chan is dereferenced in line 190, so it is no reason to check
null again in line 198.
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
It's safe to call kfree(NULL).
Found by smatch.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
fan->get returns int, but we write it to unsigned variable, and then check
whether it's >= 0 (it always is)
Found by smatch:
drivers/gpu/drm/nouveau/core/subdev/therm/fan.c:61 nouveau_fan_update() warn: always true condition '(duty >= 0) => (0-u32max >= 0)'
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
COPY_ZM_REG: destination and source addresses were swapped
RAM_RESTRICT_ZM_REG_GROUP: missing 0x prefix for register address
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We already rely on them having the same fields and layout.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This fixes a bug where, when temperature is outside of the linear range, fan
pwm would be outside of the allowed range ([0, 100]) and could get negative in
some cases.
It seems like a regression that happened when we re-worked the fan management
logic before merging.
Tested-by: Ozan Çağlayan <ozancag@gmail.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
- record channel owner process name
- add some helpers for accessing this information
- let nouveau_enum hold additional value (will be needed in the next patch)
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This should avoid the situation where a user gets its kernel logs flooded when
temperature oscillates around a threshold with 0°C hysteresis.
This patch is just meant to fix broken vbios (as reported on a nv4e on
sysfs hwmon interface.
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Commit 767baf82 drm/nouveau: remove some more unnecessary legacy bios code
has introduced a regression my misplacing the code that sets the major/chip
versions, which are used whist parsing the bmp/bit structure in vbios
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
My GTX660 has the GPIO_FAN function, but it's configured in input-mode;
presumably to monitor the frequency set by an I2C fan controller?
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Expose all the hysteresis parameters + shutdown (emergency) +
fan_boost (fixed pwm trip point).
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
For now, we only boost the fan speed to the maximum and auto-mode
when hitting the FAN_BOOST threshold and halt the computer when it
reaches the shutdown temperature. The downclock and critical thresholds
do nothing.
On nv43:50 and nva3+, temperature is polled because of the limited hardware.
I'll improve the nva3+ situation by implementing alarm management in PDAEMON
whenever I can but polling once every second shouldn't be such a problem.
v2 (Ben Skeggs):
- rebased
v3: fixed false-detections and threshold reprogrammation handling on nv50:nvc0
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
We are going to use PTHERM's IRQs for thermal monitoring but we need to route
them first.
On nv31-50, PBUS's IRQ line is shared with GPIOs IRQs.
It seems like nv10-31 GPIO interruptions aren't well handled. I kept the
original behaviour but it is wrong and may lead to an IRQ storm.
Since we enable all PBUS IRQs, we need a way to avoid being stormed if we
don't handle them. The solution I used was to mask the IRQs that have not been
handled. This will also print one message in the logs to let us know.
v2: drop the shared intr handler because of was racy
v3: style fixes
v4: drop a useless construct in the chipset-dependent INTR
v5: add BUS to the disable mask
v6 (Ben Skeggs):
- general tidy to match the rest of the driver's style
- nva3->nvc0, nva3 can be serviced just fine with nv50.c, rnndb even notes
that the THERM_ALARM bit got left in the hw until fermi anyway.. so, it's
not going to conflict
- removed the peephole and user stuff, for the moment.. will handle them
later if we find a good reason to actually care..
- limited INTR_EN to just what we can handle for now, mostly to prevent
spam of unknown status bits (seen on at least nv4x)
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
v2: improved design but drops safety monitoring (to be in a later patch)
v3: fix locking and mode management
v4: gently fallback to the no-control mode when temperature cannot be got
and use kernel-provided min/max macros
v5 (Ben Skeggs):
- rebased on my previous patches
v6: fix hysterisis management in trip-based auto fan management
This commit also forbids access to fan management to nvc0+ chipsets as
fan management is already taken care of my PDAEMON's default fw.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
v2 (Ben Skeggs):
- split from larger patch
- fixed to not require alarm resched patch
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
v2: change percent from int to atomic_t
v3: random fixes
v4 (Ben Skeggs):
- adapted for split-out fan-control "protocol" structure
- removed need for timer resched
- support for forcing 'toggle' control on PWM boards
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Mostly to allow for the possibility of testing 'toggle' fan control easily.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
This info will be used by two more implementations in upcoming commits.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Regression fixes since rework mostly.
* 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nvc0/fb: fix crash when different mutex is used to protect same list
drm/nouveau/clock: fix support for more than 2 monitors on nve0
drm/nv50/disp: fix selection of bios script for analog outputs
drm/nv17-50: restore fence buffer on resume
drm/nouveau: fix blank LVDS screen regression on pre-nv50 cards
drm/nouveau: fix nouveau_client allocation failure path
drm/nouveau: don't return freed object from nouveau_handle_create
drm/nouveau/vm: fix memory corruption when pgt allocation fails
drm/nouveau: add locking around instobj list operations
drm/nouveau: do not forcibly power on lvds panels
drm/nouveau/devinit: ensure legacy vga control is enabled during post
Fixes regression introduced in commit 861d2107
"drm/nouveau/fb: merge fb/vram and port to subdev interfaces"
nv50_fb_vram_{new,del} functions were changed to use
nouveau_subdev->mutex instead of the old nouveau_mm->mutex.
nvc0_fb_vram_new still uses the nouveau_mm->mutex, but nvc0 doesn't
have its own fb_vram_del function, using nv50_fb_vram_del instead.
Because of this, on nvc0 a different mutex ends up being used to protect
additions and deletions to the same list.
This patch is a -stable candidate for 3.7.
Signed-off-by: Aleksi Torhamo <aleksi@torhamo.net>
Reported-by: Roy Spliet <r.spliet@student.tudelft.nl>
Tested-by: Roy Spliet <r.spliet@student.tudelft.nl>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
Fixes regression introduced in commit 70790f4f
"drm/nouveau/clock: pull in the implementation from all over the place"
When code was moved from nv50_crtc_set_clock to nvc0_clock_pll_set,
the PLLs it is used for got limited to only the first two VPLLs.
nv50_crtc_set_clock was only called to change VPLLs, so it didn't
limit what it was used for in any way. Since nvc0_clock_pll_set is
used for all PLLs, it has to specify which PLLs the code is used for,
and only listed the first two VPLLs.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=58735
This patch is a -stable candidate for 3.7.
Signed-off-by: Aleksi Torhamo <aleksi@torhamo.net>
Tested-by: Aleksi Torhamo <aleksi@torhamo.net>
Tested-by: Sean Santos <quantheory@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
Analog output number was overwritten by value from digital output path.
Fix it.
Fixes resume from s2ram: https://bugs.freedesktop.org/show_bug.cgi?id=58729
(as stumbled on by J Binder, Pontus Fuchs and me)
Fixes blank screen on module load (reported by Sune Mølgaard).
Fixes regression from commit 186ecad21c
("drm/nv50/disp: move remaining interrupt handling into core").
Reported-by: J Binder <wheel@herr-der-mails.de>
Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Reported-by: Sune Mølgaard <sune@molgaard.org>
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Tested-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Tested-by: Sune Mølgaard <sune@molgaard.org>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Since commit 5e120f6e4b "drm/nouveau/fence:
convert to exec engine, and improve channel sync" nouveau fence sync
implementation for nv17-50 and nvc0+ started to rely on state of fence buffer
left by previous sync operation. But as pinned bo's (where fence state is
stored) are not saved+restored across suspend/resume, we need to do it
manually.
nvc0+ was fixed by commit d6ba6d215a
"drm/nvc0/fence: restore pre-suspend fence buffer context on resume".
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=50121
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org