Disable runtime power management on several sienna cichlid
cards, otherwise SMU will possibly fail to be resumed from
runtime suspend. Will drop this after a clean solution between
kernel driver and SMU FW is available.
amdgpu 0000:63:00.0: amdgpu: GECC is enabled
amdgpu 0000:63:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:63:00.0: amdgpu: SMU is resuming...
amdgpu 0000:63:00.0: amdgpu: SMU: I'm not done with your command: SMN_C2PMSG_66:0x0000000E SMN_C2PMSG_82:0x00000080
amdgpu 0000:63:00.0: amdgpu: Failed to SetDriverDramAddr!
amdgpu 0000:63:00.0: amdgpu: Failed to setup smc hw!
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
amdgpu 0000:63:00.0: amdgpu: amdgpu_device_ip_resume failed (-62)
v2: seperate to a function.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
From the GC info table to the gfx config structure in the
driver. The driver will use this data to configure the
card correctly.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add new or modify existing SEC() definitions to be target-less and
validate that libbpf handles such program definitions correctly.
For kprobe/kretprobe we also add explicit test that generic
bpf_program__attach() works in cases when kprobe definition contains
proper target. It wasn't previously tested as selftests code always
explicitly specified the target regardless.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220428185349.3799599-4-andrii@kernel.org
Similar to previous patch, support target-less definitions like
SEC("fentry"), SEC("freplace"), etc. For such BTF-backed program types
it is expected that user will specify BTF target programmatically at
runtime using bpf_program__set_attach_target() *before* load phase. If
not, libbpf will report this as an error.
Aslo use SEC_ATTACH_BTF flag instead of explicitly listing a set of
types that are expected to require attach_btf_id. This was an accidental
omission during custom SEC() support refactoring.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220428185349.3799599-3-andrii@kernel.org
While technically Xen dom0 is a virtual machine too, it does have
access to most of the hardware so it doesn't need to be considered a
"passthrough". Commit b818a5d374 ("drm/amdgpu/gmc: use PCI BARs for
APUs in passthrough") changed how FB is accessed based on passthrough
mode. This breaks amdgpu in Xen dom0 with message like this:
[drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
While the reason for this failure is unclear, the passthrough mode is
not really necessary in Xen dom0 anyway. So, to unbreak booting affected
kernels, disable passthrough mode in this case.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1985
Fixes: b818a5d374 ("drm/amdgpu/gmc: use PCI BARs for APUs in passthrough")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In a lot of cases the target of kprobe/kretprobe, tracepoint, raw
tracepoint, etc BPF program might not be known at the compilation time
and will be discovered at runtime. This was always a supported case by
libbpf, with APIs like bpf_program__attach_{kprobe,tracepoint,etc}()
accepting full target definition, regardless of what was defined in
SEC() definition in BPF source code.
Unfortunately, up till now libbpf still enforced users to specify at
least something for the fake target, e.g., SEC("kprobe/whatever"), which
is cumbersome and somewhat misleading.
This patch allows target-less SEC() definitions for basic tracing BPF
program types:
- kprobe/kretprobe;
- multi-kprobe/multi-kretprobe;
- tracepoints;
- raw tracepoints.
Such target-less SEC() definitions are meant to specify declaratively
proper BPF program type only. Attachment of them will have to be handled
programmatically using correct APIs. As such, skeleton's auto-attachment
of such BPF programs is skipped and generic bpf_program__attach() will
fail, if attempted, due to the lack of enough target information.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220428185349.3799599-2-andrii@kernel.org
The logic to update the IO links when a KFD device
is removed was not correct as it would miss updating
the proximity domain values for some nodes where the
node_from and node_to both were greater values than the
proximity domain value of the KFD device being removed
from topology.
Fixes: 46d18d510d ("drm/amdkfd: Cleanup IO links during KFD device removal")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
All uses of the 'kfd->gtt_sa_bitmap' bitmap are protected with the
'kfd->gtt_sa_lock' mutex.
So:
- prefer the non-atomic '__set_bit()' function
- use the non-atomic 'bitmap_[set|clear]()' functions instead of
equivalent 'for' loops. These functions can work on several bits at a
time
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
'kfd->gtt_sa_bitmap' is a bitmap. So use 'bitmap_zalloc()' to simplify
code, improve the semantic and avoid some open-coded arithmetic in
allocator arguments.
Also change the corresponding 'kfree()' into 'bitmap_free()' to keep
consistency.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
From [1], I realized two other calls to dcn30 code are associated with
FPU operations and are not protected by DC_FP_* macros:
* dcn30_populate_dml_writeback_from_context()
* dcn30_set_mcif_arb_params()
So, since FPU-associated code is not fully isolated in dcn30, and
dcn3.1.x reuses them, let's wrap their calls properly.
Note: this patch complements the fix from [1].
[1] https://lore.kernel.org/amd-gfx/20220329082957.1662655-1-chandan.vurdigerenataraj@amd.com/
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lockdep reports the following deadlock scenarios for CXL root device
power-management, device_prepare(), operations, and device_shutdown()
operations for 'nd_region' devices:
Chain exists of:
&nvdimm_region_key --> &nvdimm_bus->reconfig_mutex --> system_transition_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(system_transition_mutex);
lock(&nvdimm_bus->reconfig_mutex);
lock(system_transition_mutex);
lock(&nvdimm_region_key);
Chain exists of:
&cxl_nvdimm_bridge_key --> acpi_scan_lock --> &cxl_root_key
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&cxl_root_key);
lock(acpi_scan_lock);
lock(&cxl_root_key);
lock(&cxl_nvdimm_bridge_key);
These stem from holding nvdimm_bus_lock() over hibernate_quiet_exec()
which walks the entire system device topology taking device_lock() along
the way. The nvdimm_bus_lock() is protecting against unregistration,
multiple simultaneous ops callers, and preventing activate_show() from
racing activate_store(). For the first 2, the lock is redundant.
Unregistration already flushes all ops users, and sysfs already prevents
multiple threads to be active in an ops handler at the same time. For
the last userspace should already be waiting for its last
activate_store() to complete, and does not need activate_show() to flush
the write side, so this lock usage can be deleted in these attributes.
Fixes: 48001ea50d ("PM, libnvdimm: Add runtime firmware activation support")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/165074883800.4116052.10737040861825806582.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The CXL "root" device, ACPI0017, is an attach point for coordinating
platform level CXL resources and is the parent device for a CXL port
topology tree. As such it has distinct locking rules relative to other
CXL subsystem objects, but because it is an ACPI device the lock class
is established well before it is given to the cxl_acpi driver.
However, the lockdep API does support changing the lock class "live" for
situations like this. Add a device_lock_set_class() helper that a driver
can use in ->probe() to set a custom lock class, and
device_lock_reset_class() to return to the default "no validate" class
before the custom lock class key goes out of scope after ->remove().
Note the helpers are all macros to support dead code elimination in the
CONFIG_PROVE_LOCKING=n case, however device_set_lock_class() still needs
#ifdef CONFIG_PROVE_LOCKING since lockdep_match_class() explicitly does
not have a helper in the CONFIG_PROVE_LOCKING=n case (see comment in
lockdep.h). The lockdep API needs 2 small tweaks to prevent "unused"
warnings for the @key argument to lock_set_class(), and a new
lock_set_novalidate_class() is added to supplement
lockdep_set_novalidate_class() in the cases where the lock class is
converted while the lock is held.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ben Widawsky <ben.widawsky@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/165100081305.1528964.11138612430659737238.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add full support for negotiating _OSC as defined in the CXL 2.0 spec, as
applicable to CXL-enabled platforms. Advertise support for the CXL
features we support - 'CXL 2.0 port/device register access', 'Protocol
Error Reporting', and 'CXL Native Hot Plug'. Request control for 'CXL
Memory Error Reporting'. The requests are dependent on CONFIG_* based
prerequisites, and prior PCI enabling, similar to how the standard PCI
_OSC bits are determined.
The CXL specification does not define any additional constraints on
the hotplug flow beyond PCIe native hotplug, so a kernel that supports
native PCIe hotplug, supports CXL hotplug. For error handling protocol
and link errors just use PCIe AER. There is nascent support for
amending AER events with CXL specific status [1], but there's
otherwise no additional OS responsibility for CXL errors beyond PCIe
AER. CXL Memory Errors behave the same as typical memory errors so
CONFIG_MEMORY_FAILURE is sufficient to indicate support to platform
firmware.
[1]: https://lore.kernel.org/linux-cxl/164740402242.3912056.8303625392871313860.stgit@dwillia2-desk3.amr.corp.intel.com/
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/20220413073618.291335-4-vishal.l.verma@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
OB In preparation for negotiating OS control of CXL _OSC features, do the
minimal enabling to use CXL _OSC to handle the base PCIe feature
negotiation. Recall that CXL _OSC is a super-set of PCIe _OSC and the
CXL 2.0 specification mandates: "If a CXL Host Bridge device exposes CXL
_OSC, CXL aware OSPM shall evaluate CXL _OSC and not evaluate PCIe
_OSC."
Rather than pass a boolean flag alongside @root to all the helper
functions that need to consider PCIe specifics, add is_pcie() and
is_cxl() helper functions to check the flavor of @root. This also
allows for dynamic fallback to PCIe _OSC in cases where an attempt to
use CXL _OXC fails. This can happen on CXL 1.1 platforms that publish
ACPI0016 devices to indicate CXL host bridges, but do not publish the
optional CXL _OSC method. CXL _OSC is mandatory for CXL 2.0 hosts.
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Robert Moore <robert.moore@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/20220413073618.291335-3-vishal.l.verma@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reintroduce the __kvm_nvhe_ symbols in kallsyms, ignoring the local
symbols in this namespace. The local symbols are not informative and
can cause aliasing issues when symbolizing the addresses.
With the necessary symbols now in kallsyms we can symbolize nVHE
addresses using the %p print format specifier:
[ 98.916444][ T426] kvm [426]: nVHE hyp panic at: [<ffffffc0096156fc>] __kvm_nvhe_overflow_stack+0x8/0x34!
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-7-kaleshsingh@google.com
The hypervisor stacks (for both nVHE Hyp mode and nVHE protected mode)
are aligned such that any valid stack address has PAGE_SHIFT bit as 1.
This allows us to conveniently check for overflow in the exception entry
without corrupting any GPRs. We won't recover from a stack overflow so
panic the hypervisor.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-6-kaleshsingh@google.com
pkvm_hyp_alloc_private_va_range() can be used to reserve private VA ranges
in the pKVM nVHE hypervisor. Allocations are aligned based on the order of
the requested size.
This will be used to implement stack guard pages for pKVM nVHE hypervisor
(in a subsequent patch in the series).
Credits to Quentin Perret <qperret@google.com> for the idea of moving
private VA allocation out of __pkvm_create_private_mapping()
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-3-kaleshsingh@google.com
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf and netfilter.
Current release - new code bugs:
- bridge: switchdev: check br_vlan_group() return value
- use this_cpu_inc() to increment net->core_stats, fix preempt-rt
Previous releases - regressions:
- eth: stmmac: fix write to sgmii_adapter_base
Previous releases - always broken:
- netfilter: nf_conntrack_tcp: re-init for syn packets only,
resolving issues with TCP fastopen
- tcp: md5: fix incorrect tcp_header_len for incoming connections
- tcp: fix F-RTO may not work correctly when receiving DSACK
- tcp: ensure use of most recently sent skb when filling rate samples
- tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
- virtio_net: fix wrong buf address calculation when using xdp
- xsk: fix forwarding when combining copy mode with busy poll
- xsk: fix possible crash when multiple sockets are created
- bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from
bpf_xmit lwt hook
- sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event
- wireguard: device: check for metadata_dst with skb_valid_dst()
- netfilter: update ip6_route_me_harder to consider L3 domain
- gre: make o_seqno start from 0 in native mode
- gre: switch o_seqno to atomic to prevent races in collect_md mode
Misc:
- add Eric Dumazet to networking maintainers
- dt: dsa: realtek: remove realtek,rtl8367s string
- netfilter: flowtable: Remove the empty file"
* tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
tcp: fix F-RTO may not work correctly when receiving DSACK
Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
ixgbe: ensure IPsec VF<->PF compatibility
MAINTAINERS: Update BNXT entry with firmware files
netfilter: nft_socket: only do sk lookups when indev is available
net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
bnx2x: fix napi API usage sequence
tls: Skip tls_append_frag on zero copy size
Add Eric Dumazet to networking maintainers
netfilter: conntrack: fix udp offload timeout sysctl
netfilter: nf_conntrack_tcp: re-init for syn packets only
net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
net: Use this_cpu_inc() to increment net->core_stats
Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
Bluetooth: hci_event: Fix creating hci_conn object on error status
Bluetooth: hci_event: Fix checking for invalid handle on error status
ice: fix use-after-free when deinitializing mailbox snapshot
ice: wait 5 s for EMP reset after firmware flash
ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg()
...
In some cases the DT binding will fully describe the set of available
RPMh power-domains, but there is no reason for exposing them all in the
implementation.
Omitting individual data->domains is handle gracefully by
of_genpd_add_provider_onecell(), so there's no reason for printing a
warning when this occurs.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20220426233508.1762345-3-bjorn.andersson@linaro.org
io_rw_init_file does not initialize kiocb->private, so when iocb_bio_iopoll
reads kiocb->private it can contain uninitialized data.
Fixes: 3e08773c38 ("block: switch polling to be bio based")
Signed-off-by: Joseph Ravichandran <jravi@mit.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull thermal control fixes from Rafael Wysocki:
"These take back recent chages that started to confuse users and fix up
an attr.show callback prototype in a driver.
Specifics:
- Stop warning about deprecation of the userspace thermal governor
and cooling device status interface, because there are cases in
which user space has to drive thermal management with the help of
them (Daniel Lezcano)
- Fix attr.show callback prototype in the int340x thermal driver
(Kees Cook)"
* tag 'thermal-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/governor: Remove deprecated information
Revert "thermal/core: Deprecate changing cooling device state from userspace"
thermal: int340x: Fix attr.show callback prototype
Pull power management fixes from Rafael Wysocki:
"These fix up recent intel_idle driver changes and fix some ARM cpufreq
driver issues.
Specifics:
- Fix issues with the Qualcomm's cpufreq driver (Dmitry Baryshkov,
Vladimir Zapolskiy).
- Fix memory leak with the Sun501 driver (Xiaobing Luo).
- Make intel_idle enable C1E promotion on all CPUs when C1E is
preferred to C1 (Artem Bityutskiy).
- Make C6 optimization on Sapphire Rapids added recently work as
expected if both C1E and C1 are "preferred" (Artem Bityutskiy)"
* tag 'pm-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: Fix SPR C6 optimization
intel_idle: Fix the 'preferred_cstates' module parameter
cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts
cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms
cpufreq: qcom-hw: provide online/offline operations
cpufreq: qcom-hw: fix the opp entries refcounting
cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
Pull ACPI fixes from Rafael WysockiL
"These fix up the ACPI processor driver after a change made during the
5.16 cycle that inadvertently broke falling back to shallower C-states
when C3 cannot be used.
Specifics:
- Make the ACPI processor driver avoid falling back to C3 type of
C-states when C3 cannot be requested (Ville Syrjälä)
- Revert a quirk that is not necessary any more after fixing the
underlying issue properly (Ville Syrjälä)"
* tag 'acpi-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40"
ACPI: processor: idle: Avoid falling back to C3 type C-states