Current uninitialized sensor detection does not work for me on nv4b and
sensor returns crazy values (>190°C). It stabilises later, but it's too
late - therm code shutdowns the machine...
Let's just reset it on init.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
+ the same for shutdown threshold - seems impossible, but shutdown can fail.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fixes "BUG: spinlock bad magic" on module load for nva3+ cards.
Introduced in commit "drm/nouveau/therm: implement support for temperature
alarms".
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
the variable chan is dereferenced in line 190, so it is no reason to check
null again in line 198.
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
It's safe to call kfree(NULL).
Found by smatch.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
fan->get returns int, but we write it to unsigned variable, and then check
whether it's >= 0 (it always is)
Found by smatch:
drivers/gpu/drm/nouveau/core/subdev/therm/fan.c:61 nouveau_fan_update() warn: always true condition '(duty >= 0) => (0-u32max >= 0)'
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
COPY_ZM_REG: destination and source addresses were swapped
RAM_RESTRICT_ZM_REG_GROUP: missing 0x prefix for register address
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We already rely on them having the same fields and layout.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This fixes a bug where, when temperature is outside of the linear range, fan
pwm would be outside of the allowed range ([0, 100]) and could get negative in
some cases.
It seems like a regression that happened when we re-worked the fan management
logic before merging.
Tested-by: Ozan Çağlayan <ozancag@gmail.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
- record channel owner process name
- add some helpers for accessing this information
- let nouveau_enum hold additional value (will be needed in the next patch)
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This should avoid the situation where a user gets its kernel logs flooded when
temperature oscillates around a threshold with 0°C hysteresis.
This patch is just meant to fix broken vbios (as reported on a nv4e on
sysfs hwmon interface.
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Commit 767baf82 drm/nouveau: remove some more unnecessary legacy bios code
has introduced a regression my misplacing the code that sets the major/chip
versions, which are used whist parsing the bmp/bit structure in vbios
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
My GTX660 has the GPIO_FAN function, but it's configured in input-mode;
presumably to monitor the frequency set by an I2C fan controller?
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Expose all the hysteresis parameters + shutdown (emergency) +
fan_boost (fixed pwm trip point).
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
For now, we only boost the fan speed to the maximum and auto-mode
when hitting the FAN_BOOST threshold and halt the computer when it
reaches the shutdown temperature. The downclock and critical thresholds
do nothing.
On nv43:50 and nva3+, temperature is polled because of the limited hardware.
I'll improve the nva3+ situation by implementing alarm management in PDAEMON
whenever I can but polling once every second shouldn't be such a problem.
v2 (Ben Skeggs):
- rebased
v3: fixed false-detections and threshold reprogrammation handling on nv50:nvc0
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
We are going to use PTHERM's IRQs for thermal monitoring but we need to route
them first.
On nv31-50, PBUS's IRQ line is shared with GPIOs IRQs.
It seems like nv10-31 GPIO interruptions aren't well handled. I kept the
original behaviour but it is wrong and may lead to an IRQ storm.
Since we enable all PBUS IRQs, we need a way to avoid being stormed if we
don't handle them. The solution I used was to mask the IRQs that have not been
handled. This will also print one message in the logs to let us know.
v2: drop the shared intr handler because of was racy
v3: style fixes
v4: drop a useless construct in the chipset-dependent INTR
v5: add BUS to the disable mask
v6 (Ben Skeggs):
- general tidy to match the rest of the driver's style
- nva3->nvc0, nva3 can be serviced just fine with nv50.c, rnndb even notes
that the THERM_ALARM bit got left in the hw until fermi anyway.. so, it's
not going to conflict
- removed the peephole and user stuff, for the moment.. will handle them
later if we find a good reason to actually care..
- limited INTR_EN to just what we can handle for now, mostly to prevent
spam of unknown status bits (seen on at least nv4x)
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
v2: improved design but drops safety monitoring (to be in a later patch)
v3: fix locking and mode management
v4: gently fallback to the no-control mode when temperature cannot be got
and use kernel-provided min/max macros
v5 (Ben Skeggs):
- rebased on my previous patches
v6: fix hysterisis management in trip-based auto fan management
This commit also forbids access to fan management to nvc0+ chipsets as
fan management is already taken care of my PDAEMON's default fw.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
v2 (Ben Skeggs):
- split from larger patch
- fixed to not require alarm resched patch
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
v2: change percent from int to atomic_t
v3: random fixes
v4 (Ben Skeggs):
- adapted for split-out fan-control "protocol" structure
- removed need for timer resched
- support for forcing 'toggle' control on PWM boards
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Mostly to allow for the possibility of testing 'toggle' fan control easily.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
This info will be used by two more implementations in upcoming commits.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Pull x86 fixes from Peter Anvin:
"This is a collection of miscellaneous fixes, the most important one is
the fix for the Samsung laptop bricking issue (auto-blacklisting the
samsung-laptop driver); the efi_enabled() changes you see below are
prerequisites for that fix.
The other issues fixed are booting on OLPC XO-1.5, an UV fix, NMI
debugging, and requiring CAP_SYS_RAWIO for MSR references, just as
with I/O port references."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
samsung-laptop: Disable on EFI hardware
efi: Make 'efi_enabled' a function to query EFI facilities
smp: Fix SMP function call empty cpu mask race
x86/msr: Add capabilities check
x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES
x86/olpc: Fix olpc-xo1-sci.c build errors
arch/x86/platform/uv: Fix incorrect tlb flush all issue
x86-64: Fix unwind annotations in recent NMI changes
x86-32: Start out cr0 clean, disable paging before modifying cr3/4
Originally 'efi_enabled' indicated whether a kernel was booted from
EFI firmware. Over time its semantics have changed, and it now
indicates whether or not we are booted on an EFI machine with
bit-native firmware, e.g. 64-bit kernel with 64-bit firmware.
The immediate motivation for this patch is the bug report at,
https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557
which details how running a platform driver on an EFI machine that is
designed to run under BIOS can cause the machine to become
bricked. Also, the following report,
https://bugzilla.kernel.org/show_bug.cgi?id=47121
details how running said driver can also cause Machine Check
Exceptions. Drivers need a new means of detecting whether they're
running on an EFI machine, as sadly the expression,
if (!efi_enabled)
hasn't been a sufficient condition for quite some time.
Users actually want to query 'efi_enabled' for different reasons -
what they really want access to is the list of available EFI
facilities.
For instance, the x86 reboot code needs to know whether it can invoke
the ResetSystem() function provided by the EFI runtime services, while
the ACPI OSL code wants to know whether the EFI config tables were
mapped successfully. There are also checks in some of the platform
driver code to simply see if they're running on an EFI machine (which
would make it a bad idea to do BIOS-y things).
This patch is a prereq for the samsung-laptop fix patch.
Cc: David Airlie <airlied@linux.ie>
Cc: Corentin Chary <corentincj@iksaif.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Peter Jones <pjones@redhat.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Steve Langasek <steve.langasek@canonical.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Just a few small things:
- 2x workaround bits from Chris to fix up the new scanline waits enabled
in 3.8 on snb. People who've been struck by this on dual-screen also
need to upgrade the ddx.
- Dump the kernel version into i915_error_state, we've had a few mixups
there recently.
- Disable gfx DMAR on gen4 devices, acked by David Woodhouse.
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: dump UTS_RELEASE into the error_state
iommu/intel: disable DMAR for g4x integrated gfx
drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits
drm/i915: Disable AsyncFlip performance optimisations
V2: Add mutex protection, while read.
The hdmi and mixer win_commit calls currently are
not checking the status of IP before updating the
respective registers, this patch adds this check.
Signed-off-by: Shirish S <s.shirish@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Fixes the following warning:
drivers/gpu/drm/exynos/exynos_drm_hdmi.c:111:13: warning:
symbol 'drm_hdmi_get_edid' was not declared. Should it be static?
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Inki Dae <inki.dae@samsung.com>