The DMA address used when mapping PCI P2P memory must be the PCI bus
address. Thus, introduce pci_p2pmem_map_sg() to map the correct addresses
when using P2P memory.
Memory mapped in this way does not need to be unmapped and thus if we
provided pci_p2pmem_unmap_sg() it would be empty. This breaks the expected
balance between map/unmap but was left out as an empty function doesn't
really provide any benefit. In the future, if this call becomes necessary
it can be added without much difficulty.
For this, we assume that an SGL passed to these functions contain all P2P
memory or no P2P memory.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Add a sysfs group to display statistics about P2P memory that is registered
in each PCI device.
Attributes in the group display the total amount of P2P memory, the amount
available and whether it is published or not.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Some PCI devices may have memory mapped in a BAR space that's intended for
use in peer-to-peer transactions. To enable such transactions the memory
must be registered with ZONE_DEVICE pages so it can be used by DMA
interfaces in existing drivers.
Add an interface for other subsystems to find and allocate chunks of P2P
memory as necessary to facilitate transfers between two PCI peers:
struct pci_dev *pci_p2pmem_find[_many]();
int pci_p2pdma_distance[_many]();
void *pci_alloc_p2pmem();
The new interface requires a driver to collect a list of client devices
involved in the transaction then call pci_p2pmem_find() to obtain any
suitable P2P memory. Alternatively, if the caller knows a device which
provides P2P memory, they can use pci_p2pdma_distance() to determine if it
is usable. With a suitable p2pmem device, memory can then be allocated
with pci_alloc_p2pmem() for use in DMA transactions.
Depending on hardware, using peer-to-peer memory may reduce the bandwidth
of the transfer but can significantly reduce pressure on system memory.
This may be desirable in many cases: for example a system could be designed
with a small CPU connected to a PCIe switch by a small number of lanes
which would maximize the number of lanes available to connect to NVMe
devices.
The code is designed to only utilize the p2pmem device if all the devices
involved in a transfer are behind the same PCI bridge. This is because we
have no way of knowing whether peer-to-peer routing between PCIe Root Ports
is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
transfers that go through the RC is limited to only reducing DRAM usage
and, in some cases, coding convenience. The PCI-SIG may be exploring
adding a new capability bit to advertise whether this is possible for
future hardware.
This commit includes significant rework and feedback from Christoph
Hellwig.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: fold in fix from Keith Busch <keith.busch@intel.com>:
https://lore.kernel.org/linux-pci/20181012155920.15418-1-keith.busch@intel.com,
to address comment from Dan Carpenter <dan.carpenter@oracle.com>, fold in
https://lore.kernel.org/linux-pci/20181017160510.17926-1-logang@deltatee.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Set the eetlp_prefix_path on PCIE_EXP_TYPE_RC_END devices to allow PASID
to be enabled on them. This fixes IOMMUv2 initialization on AMD Carrizo
APUs.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201079
Fixes: 7ce3f912ae ("PCI: Enable PASID only if entire path supports End-End TLP prefixes")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Calling into the new API to reset the secondary bus results in a deadlock.
This occurs because the device/bus is already locked at probe time.
Reverting back to the old behavior while the API is improved.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985
Fixes: c6a44ba950 ("PCI: Rename pci_try_reset_bus() to pci_reset_bus()")
Fixes: 409888e096 ("IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
The pci_reset_bus() function calls pci_probe_reset_slot() to determine
whether to call the slot or bus reset. The check has faulty logic in that
it does not account for pci_probe_reset_slot() being able to return an
errno. Fix by only calling the slot reset when the function returns 0.
Fixes: 811c5cb37d ("PCI: Unify try slot and bus reset API")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
If both hot-add and power fault were observed in a single interrupt, we
handled the hot-add first, then the power fault, in this path:
pciehp_ist
if (events & (PDC | DLLSC))
pciehp_handle_presence_or_link_change
case OFF_STATE:
pciehp_enable_slot
__pciehp_enable_slot
board_added
pciehp_power_on_slot
ctrl->power_fault_detected = 0
pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC)
pciehp_green_led_on(p_slot) # power LED on
pciehp_set_attention_status(p_slot, 0) # attention LED off
if ((events & PFD) && !ctrl->power_fault_detected)
ctrl->power_fault_detected = 1
pciehp_set_attention_status(1) # attention LED on
pciehp_green_led_off(slot) # power LED off
This left the attention indicator on (even though the hot-add succeeded)
and the power indicator off (even though the slot power was on).
Fix this by checking for power faults before checking for new devices.
Prior to 0e94916e60, this was successful because everything was chained
through work queues and the order was:
INT_PRESENCE_ON -> INT_POWER_FAULT -> ENABLE_REQ
The ENABLE_REQ cleared the power fault at the end, but now everything is
handled inline with the interrupt thread, such that the work ENABLE_REQ was
doing happens before power fault handling now.
Fixes: 0e94916e60 ("PCI: pciehp: Handle events synchronously")
Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
p.port can is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/pci/switch/switchtec.c:912 ioctl_port_to_pff() warn: potential spectre issue 'pcfg->dsp_pff_inst_id' [r]
Fix this by sanitizing p.port before using it to index
pcfg->dsp_pff_inst_id
Notice that given that speculation windows are large, the policy is to kill
the speculation on the first load and not worry if it can be completed with
a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Logan Gunthorpe <logang@deltatee.com>
Cc: stable@vger.kernel.org
Merge more updates from Andrew Morton:
- the rest of MM
- procfs updates
- various misc things
- more y2038 fixes
- get_maintainer updates
- lib/ updates
- checkpatch updates
- various epoll updates
- autofs updates
- hfsplus
- some reiserfs work
- fatfs updates
- signal.c cleanups
- ipc/ updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (166 commits)
ipc/util.c: update return value of ipc_getref from int to bool
ipc/util.c: further variable name cleanups
ipc: simplify ipc initialization
ipc: get rid of ids->tables_initialized hack
lib/rhashtable: guarantee initial hashtable allocation
lib/rhashtable: simplify bucket_table_alloc()
ipc: drop ipc_lock()
ipc/util.c: correct comment in ipc_obtain_object_check
ipc: rename ipcctl_pre_down_nolock()
ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid()
ipc: reorganize initialization of kern_ipc_perm.seq
ipc: compute kern_ipc_perm.id under the ipc lock
init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE
fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp
adfs: use timespec64 for time conversion
kernel/sysctl.c: fix typos in comments
drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md
fork: don't copy inconsistent signal handler state to child
signal: make get_signal() return bool
signal: make sigkill_pending() return bool
...
Allow the PCI quirk tables to be emitted in a way that avoids absolute
references to the hook functions. This reduces the size of the entries,
and, more importantly, makes them invariant under runtime relocation
(e.g., for KASLR)
Link: http://lkml.kernel.org/r/20180704083651.24360-6-ard.biesheuvel@linaro.org
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morris <james.morris@microsoft.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Make the idle loop handle stopped scheduler tick correctly (Rafael
Wysocki).
- Prevent the menu cpuidle governor from letting CPUs spend too much
time in shallow idle states when it is invoked with scheduler tick
stopped and clean it up somewhat (Rafael Wysocki).
- Avoid invoking the platform firmware to make the platform enter
the ACPI S3 sleep state with suspended PCIe root ports which may
confuse the firmware and cause it to crash (Rafael Wysocki).
- Fix sysfs-related race in the ondemand and conservative cpufreq
governors which may cause the system to crash if the governor
module is removed during an update of CPU frequency limits (Henry
Willard).
- Select SRCU when building the system wakeup framework to avoid a
build issue in it (zhangyi).
- Make the descriptions of ACPI C-states vendor-neutral to avoid
confusion (Prarit Bhargava).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJbfSnVAAoJEILEb/54YlRxBn4QAKQ8PqkSYkBby+1hb90ET4dk
VaLkbCYXuzLK5rIDvnbYOALhVKo4B29Ex5GdCLN7cWkZMkrVKe7oX8QQTnp3/7lF
URjTKgTNec5uJG652PrE3ESAa3X/kYggj6aeQOxDR4iYKzcpJEQ92ekFW+SoJTNp
Jc2kZh3qkC2On64GB3ibsZaKnmHfPvLg0t4agwzuYq/Gff8NRJFk7kMwAPzqGzZo
b2UVRcYFWIRkJjgmU9iInoeHIY8mBdT3IiKwTemZP1dOhb5T1AHOXwGTk6/cS+RH
A9qx4eg7I3R00KmnYvO8WytYJeOu2qb83GIUx4fIJGOqfvevm5xkxB9F+nfE+ouj
ROBqO4+X4XfQGPw8slayg0rJjI9JSkXLnLdl0Qw2WRlbc4/fVWntra1C57EeKFBR
EG9UAF9+7nUUx0bOCLsfFF3+r9R3SDUjk7b4thyhYncyQRsYC+FL7ztlxnMzVtzW
M5SF2sPrpcQzqmcszdUdbESI10n5X8m/crJW4rsbTxBpAM+coO+uLcvHWOY4MpkW
BgBsR6bMDAlG/VlTFgeeP/tkCRd5zNlJi7yBFItXuOoVKXpnHCJuxq2WkZ1Rb74M
Gk1d3TduekHJm8VsLEdCJR/tEk1cMc0zVUD/a1yzI4Z21QxvXUCqMDdws4/Ey184
qmKgNR9R94vSC5xIPRhM
=9GrU
-----END PGP SIGNATURE-----
Merge tag 'pm-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These fix the main idle loop and the menu cpuidle governor, clean up
the latter, fix a mistake in the PCI bus type's support for system
suspend and resume, fix the ondemand and conservative cpufreq
governors, address a build issue in the system wakeup framework and
make the ACPI C-states desciptions less confusing.
Specifics:
- Make the idle loop handle stopped scheduler tick correctly (Rafael
Wysocki).
- Prevent the menu cpuidle governor from letting CPUs spend too much
time in shallow idle states when it is invoked with scheduler tick
stopped and clean it up somewhat (Rafael Wysocki).
- Avoid invoking the platform firmware to make the platform enter the
ACPI S3 sleep state with suspended PCIe root ports which may
confuse the firmware and cause it to crash (Rafael Wysocki).
- Fix sysfs-related race in the ondemand and conservative cpufreq
governors which may cause the system to crash if the governor
module is removed during an update of CPU frequency limits (Henry
Willard).
- Select SRCU when building the system wakeup framework to avoid a
build issue in it (zhangyi).
- Make the descriptions of ACPI C-states vendor-neutral to avoid
confusion (Prarit Bhargava)"
* tag 'pm-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: menu: Handle stopped tick more aggressively
sched: idle: Avoid retaining the tick when it has been stopped
PCI / ACPI / PM: Resume all bridges on suspend-to-RAM
cpuidle: menu: Update stale polling override comment
cpufreq: governor: Avoid accessing invalid governor_data
x86/ACPI/cstate: Make APCI C1 FFH MWAIT C-state description vendor-neutral
cpuidle: menu: Fix white space
PM / sleep: wakeup: Fix build error caused by missing SRCU support
Commit 26112ddc25 (PCI / ACPI / PM: Resume bridges w/o drivers on
suspend-to-RAM) attempted to fix a functional regression resulting
from commit c62ec4610c (PM / core: Fix direct_complete handling
for devices with no callbacks) by resuming PCI bridges without
drivers (that is, "parallel PCI" ones) during system-wide suspend if
the target system state is not ACPI S0 (working state).
That turns out insufficient, however, as it is reported that, at
least in one case, the platform firmware gets confused if a PCIe
root port is suspended before entering the ACPI S3 sleep state.
That issue was exposed by commit 77b3729ca03 (PCI / PM: Use
SMART_SUSPEND and LEAVE_SUSPENDED flags for PCIe ports) that allowed
PCIe ports to stay in runtime suspend during system-wide suspend
(which is OK for suspend-to-idle, but turns out to be problematic
otherwise).
For this reason, drop the driver check from acpi_pci_need_resume()
and resume all bridges (including PCIe ports with drivers) during
system-wide suspend if the target system state is not ACPI S0.
[If the target system state is ACPI S0, it means suspend-to-idle
and the platform firmware is not going to be invoked to actually
suspend the system, so there is no need to resume the bridges in
that case.]
Fixes: 77b3729ca03 (PCI / PM: Use SMART_SUSPEND and LEAVE_SUSPENDED flags for PCIe ports)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200675
Reported-by: teika kazura <teika@gmx.com>
Tested-by: teika kazura <teika@gmx.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: 4.16+ <stable@vger.kernel.org> # 4.16+: 26112ddc25 (PCI / ACPI / PM: Resume bridges ...)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlt1f9AUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxbdhAArnhRvkwOk4m4/LCuKF6HpmlxbBNC
TjnBCenNf+lFXzWskfDFGFl/Wif4UzGbRTSCNQrwMzj3Ww3f/6R2QIq9rEJvyNC4
VdxQnaBEZSUgN87q5UGqgdjMTo3zFvlFH6fpb5XDiQ5IX/QZeXeYqoB64w+HvKPU
M+IsoOvnA5gb7pMcpchrGUnSfS1e6AqQbbTt6tZflore6YCEA4cH5OnpGx8qiZIp
ut+CMBvQjQB01fHeBc/wGrVte4NwXdONrXqpUb4sHF7HqRNfEh0QVyPhvebBi+k1
kquqoBQfPFTqgcab31VOcQhg70dEx+1qGm5/YBAwmhCpHR/g2gioFXoROsr+iUOe
BtF6LZr+Y8cySuhJnkCrJBqWvvBaKbJLg0KMbI+7p4o9MZpod2u7LS5LFrlRDyKW
3nz3o+b1+v3tCCKVKIhKo0ljolgkweQtR1f6KIHvq93wBODHVQnAOt9NlPfHVyks
ryGBnOhMjoU5hvfexgIWFk9Ph9MEVQSffkI+TeFPO/tyGBfGfQyGtESiXuEaMQaH
FGdZHX2RLkY3pWHOtWeMzRHzOnr2XjpDFcAqL3HBGPdJ30K3Umv3WOgoFe2SaocG
0gaddPjKSwwM4Sa/VP+O5cjGuzi7QnczSDdpYjxIGZzBav32hqx4/rsnLw7bHH8y
XkEme7cYJc8MGsA=
=2Dmn
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
- Decode AER errors with names similar to "lspci" (Tyler Baicar)
- Expose AER statistics in sysfs (Rajat Jain)
- Clear AER status bits selectively based on the type of recovery (Oza
Pawandeep)
- Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
Gagniuc)
- Don't clear AER status bits if we're using the "Firmware-First"
strategy where firmware owns the registers (Alexandru Gagniuc)
- Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
Shevchenko)
- Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)
- Defer DPC event handling to work queue (Keith Busch)
- Use threaded IRQ for DPC bottom half (Keith Busch)
- Print AER status while handling DPC events (Keith Busch)
- Work around IDT switch ACS Source Validation erratum (James
Puthukattukaran)
- Emit diagnostics for all cases of PCIe Link downtraining (Links
operating slower than they're capable of) (Alexandru Gagniuc)
- Skip VFs when configuring Max Payload Size (Myron Stowe)
- Reduce Root Port Max Payload Size if necessary when hot-adding a
device below it (Myron Stowe)
- Simplify SHPC existence/permission checks (Bjorn Helgaas)
- Remove hotplug sample skeleton driver (Lukas Wunner)
- Convert pciehp to threaded IRQ handling (Lukas Wunner)
- Improve pciehp tolerance of missed events and initially unstable
links (Lukas Wunner)
- Clear spurious pciehp events on resume (Lukas Wunner)
- Add pciehp runtime PM support, including for Thunderbolt controllers
(Lukas Wunner)
- Support interrupts from pciehp bridges in D3hot (Lukas Wunner)
- Mark fall-through switch cases before enabling -Wimplicit-fallthrough
(Gustavo A. R. Silva)
- Move DMA-debug PCI init from arch code to PCI core (Christoph
Hellwig)
- Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is
supplied (Heiner Kallweit)
- Unify PCI and DMA direction #defines (Shunyong Yang)
- Add PCI_DEVICE_DATA() macro (Andy Shevchenko)
- Check for VPD completion before checking for timeout (Bert Kenward)
- Limit Netronome NFP5000 config space size to work around erratum
(Jakub Kicinski)
- Set IRQCHIP_ONESHOT_SAFE for PCI MSI irqchips (Heiner Kallweit)
- Document ACPI description of PCI host bridges (Bjorn Helgaas)
- Add "pci=disable_acs_redir=" parameter to disable ACS redirection for
peer-to-peer DMA support (we don't have the peer-to-peer support yet;
this is just one piece) (Logan Gunthorpe)
- Clean up devm_of_pci_get_host_bridge_resources() resource allocation
(Jan Kiszka)
- Fixup resizable BARs after suspend/resume (Christian König)
- Make "pci=earlydump" generic (Sinan Kaya)
- Fix ROM BAR access routines to stay in bounds and check for signature
correctly (Rex Zhu)
- Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer)
- Expand documentation for pci_add_dma_alias() (Logan Gunthorpe)
- To avoid bus errors, enable PASID only if entire path supports
End-End TLP prefixes (Sinan Kaya)
- Unify slot and bus reset functions and remove hotplug knowledge from
callers (Sinan Kaya)
- Add Function-Level Reset quirks for Intel and Samsung NVMe devices to
fix guest reboot issues (Alex Williamson)
- Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD
Controller (Bjorn Helgaas)
- Remove Xilinx AXI-PCIe host bridge arch dependency (Palmer Dabbelt)
- Remove Aardvark outbound window configuration (Evan Wang)
- Fix Aardvark bridge window sizing issue (Zachary Zhang)
- Convert Aardvark to use pci_host_probe() to reduce code duplication
(Thomas Petazzoni)
- Correct the Cadence cdns_pcie_writel() signature (Alan Douglas)
- Add Cadence support for optional generic PHYs (Alan Douglas)
- Add Cadence power management ops (Alan Douglas)
- Remove redundant variable from Cadence driver (Colin Ian King)
- Add Kirin MSI support (Xiaowei Song)
- Drop unnecessary root_bus_nr setting from exynos, imx6, keystone,
armada8k, artpec6, designware-plat, histb, qcom, spear13xx (Shawn
Guo)
- Move link notification settings from DesignWare core to individual
drivers (Gustavo Pimentel)
- Add endpoint library MSI-X interfaces (Gustavo Pimentel)
- Correct signature of endpoint library IRQ interfaces (Gustavo
Pimentel)
- Add DesignWare endpoint library MSI-X callbacks (Gustavo Pimentel)
- Add endpoint library MSI-X test support (Gustavo Pimentel)
- Remove unnecessary GFP_ATOMIC from Hyper-V "new child" allocation
(Jia-Ju Bai)
- Add more devices to Broadcom PAXC quirk (Ray Jui)
- Work around corrupted Broadcom PAXC config space to enable SMMU and
GICv3 ITS (Ray Jui)
- Disable MSI parsing to work around broken Broadcom PAXC logic in some
devices (Ray Jui)
- Hide unconfigured functions to work around a Broadcom PAXC defect
(Ray Jui)
- Lower iproc log level to reduce console output during boot (Ray Jui)
- Fix mobiveil iomem/phys_addr_t type usage (Lorenzo Pieralisi)
- Fix mobiveil missing include file (Lorenzo Pieralisi)
- Add mobiveil Kconfig/Makefile support (Lorenzo Pieralisi)
- Fix mvebu I/O space remapping issues (Thomas Petazzoni)
- Use generic pci_host_bridge in mvebu instead of ARM-specific API
(Thomas Petazzoni)
- Whitelist VMD devices with fast interrupt handlers to avoid sharing
vectors with slow handlers (Keith Busch)
* tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (153 commits)
PCI/AER: Don't clear AER bits if error handling is Firmware-First
PCI: Limit config space size for Netronome NFP5000
PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips
PCI/VPD: Check for VPD access completion before checking for timeout
PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
PCI: Match Root Port's MPS to endpoint's MPSS as necessary
PCI: Skip MPS logic for Virtual Functions (VFs)
PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
PCI: Check for PCIe Link downtraining
PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
PCI: Add device-specific ACS Redirect disable infrastructure
PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
PCI: Allow specifying devices using a base bus and path of devfns
PCI: Make specifying PCI devices in kernel parameters reusable
PCI: Hide ACS quirk declarations inside PCI core
PCI: Delay after FLR of Intel DC P3700 NVMe
PCI: Disable Samsung SM961/PM961 NVMe before FLR
PCI: Export pcie_has_flr()
PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers()
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbc41pAAoJEAx081l5xIa+ZrAP/AzKj4i4pBLVJcvNZ2BwD+UD
ZNSNj2iqCJ5+Jo/WtIwQ8tLct9UqfVssUwBke6tZksiLdTigGPTUyVIAdK+9kyWD
D00m3x/pToJrSF2D0FwxQlPUtPkohp9N+E6+TU7gd1oCasZfBzmcEpoVAmZf+NWE
kN1xXpmGxZWpu0wc7JA2lv9MuUTijCwIqJqa5E0bB3z06G5mw+PJ89kYzMx19OyA
ZYQK8y3A40ZGl8UbajZ4xg9pqFCRYFFHGqfYlpUWWTh0XMAXu8+Yqzh3dJxmak7r
4u2pdQBsxPMZO8qKBHpVvI7Zhoe0Ntnolc0XVD+2IbqqnTprVbQs0bWf3YyfUlQi
1/9bWFK67W0LEuzac6M7a7EQqFNiHF13Btao7aqENTIe/GaCZJoopaiRMAmh6EHD
4PezeYqrW8cSaPj6OKouL1BhW9Bjixsg0bvjS/uB6m4KekFCt1++BDFGzkqvm6Mo
SVW7nkJoCFpCASaR7DhUEOPexaHeJ65HCDDUvYdqz9jd2w1TgvvanEZWual1NwEm
ImA8A4wGZ/3KijpyyKm0gE96RX7+zMMZ3brW6p1vhUUKVYJCrvSr5jrXH5+2k6Aw
Y455doGL87IRkwyje/YbQF0I8pbUZD9QS5wII13tLGwOH9/uC/Xl6dHNM40gtqyh
W4gEdY+NAMJmYLvRNawa
=g9rD
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"This is the main drm pull request for 4.19.
Rob has some new hardware support for new qualcomm hw that I'll send
along separately. This has the display part of it, the remaining pull
is for the acceleration engine.
This also contains a wound-wait/wait-die mutex rework, Peter has acked
it for merging via my tree.
Otherwise mostly the usual level of activity. Summary:
core:
- Wound-wait/wait-die mutex rework
- Add writeback connector type
- Add "content type" property for HDMI
- Move GEM bo to drm_framebuffer
- Initial gpu scheduler documentation
- GPU scheduler fixes for dying processes
- Console deferred fbcon takeover support
- Displayport support for CEC tunneling over AUX
panel:
- otm8009a panel driver fixes
- Innolux TV123WAM and G070Y2-L01 panel driver
- Ilitek ILI9881c panel driver
- Rocktech RK070ER9427 LCD
- EDT ETM0700G0EDH6 and EDT ETM0700G0BDH6
- DLC DLC0700YZG-1
- BOE HV070WSA-100
- newhaven, nhd-4.3-480272ef-atxl LCD
- DataImage SCF0700C48GGU18
- Sharp LQ035Q7DB03
- p079zca: Refactor to support multiple panels
tinydrm:
- ILI9341 display panel
New driver:
- vkms - virtual kms driver to testing.
i915:
- Icelake:
Display enablement
DSI support
IRQ support
Powerwell support
- GPU reset fixes and improvements
- Full ppgtt support refactoring
- PSR fixes and improvements
- Execlist improvments
- GuC related fixes
amdgpu:
- Initial amdgpu documentation
- JPEG engine support on VCN
- CIK uses powerplay by default
- Move to using core PCIE functionality for gens/lanes
- DC/Powerplay interface rework
- Stutter mode support for RV
- Vega12 Powerplay updates
- GFXOFF fixes
- GPUVM fault debugging
- Vega12 GFXOFF
- DC improvements
- DC i2c/aux changes
- UVD 7.2 fixes
- Powerplay fixes for Polaris12, CZ/ST
- command submission bo_list fixes
amdkfd:
- Raven support
- Power management fixes
udl:
- Cleanups and fixes
nouveau:
- misc fixes and cleanups.
msm:
- DPU1 support display controller in sdm845
- GPU coredump support.
vmwgfx:
- Atomic modesetting validation fixes
- Support for multisample surfaces
armada:
- Atomic modesetting support completed.
exynos:
- IPPv2 fixes
- Move g2d to component framework
- Suspend/resume support cleanups
- Driver cleanups
imx:
- CSI configuration improvements
- Driver cleanups
- Use atomic suspend/resume helpers
- ipu-v3 V4L2 XRGB32/XBGR32 support
pl111:
- Add Nomadik LCDC variant
v3d:
- GPU scheduler jobs management
sun4i:
- R40 display engine support
- TCON TOP driver
mediatek:
- MT2712 SoC support
rockchip:
- vop fixes
omapdrm:
- Workaround for DRA7 errata i932
- Fix mm_list locking
mali-dp:
- Writeback implementation
PM improvements
- Internal error reporting debugfs
tilcdc:
- Single fix for deferred probing
hdlcd:
- Teardown fixes
tda998x:
- Converted to a bridge driver.
etnaviv:
- Misc fixes"
* tag 'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm: (1506 commits)
drm/amdgpu/sriov: give 8s for recover vram under RUNTIME
drm/scheduler: fix param documentation
drm/i2c: tda998x: correct PLL divider calculation
drm/i2c: tda998x: get rid of private fill_modes function
drm/i2c: tda998x: move mode_valid() to bridge
drm/i2c: tda998x: register bridge outside of component helper
drm/i2c: tda998x: cleanup from previous changes
drm/i2c: tda998x: allocate tda998x_priv inside tda998x_create()
drm/i2c: tda998x: convert to bridge driver
drm/scheduler: fix timeout worker setup for out of order job completions
drm/amd/display: display connected to dp-1 does not light up
drm/amd/display: update clk for various HDMI color depths
drm/amd/display: program display clock on cache match
drm/amd/display: Add NULL check for enabling dp ss
drm/amd/display: add vbios table check for enabling dp ss
drm/amd/display: Don't share clk source between DP and HDMI
drm/amd/display: Fix DP HBR2 Eye Diagram Pattern on Carrizo
drm/amd/display: Use calculated disp_clk_khz value for dce110
drm/amd/display: Implement custom degamma lut on dcn
drm/amd/display: Destroy aux_engines only once
...
- Whitelist VMD devices with fast interrupt handlers to avoid sharing
vectors with slow handlers (Keith Busch)
* remotes/lorenzo/pci/vmd:
PCI: vmd: White list for fast interrupt handlers
- Fix mvebu I/O space remapping issues (Thomas Petazzoni)
- Use generic pci_host_bridge in mvebu instead of ARM-specific API
(Thomas Petazzoni)
* remotes/lorenzo/pci/mvebu:
PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers()
PCI: mvebu: Convert to use pci_host_bridge directly
PCI: mvebu: Use resource_size() to remap I/O space
PCI: mvebu: Only remap I/O space if configured
PCI: mvebu: Fix I/O space end address calculation
PCI: mvebu: Remove redundant platform_set_drvdata() call
- Add more devices to Broadcom PAXC quirk (Ray Jui)
- Work around corrupted Broadcom PAXC config space to enable SMMU and
GICv3 ITS (Ray Jui)
- Disable MSI parsing to work around broken Broadcom PAXC logic in some
devices (Ray Jui)
- Hide unconfigured functions to work around a Broadcom PAXC defect (Ray
Jui)
- Lower iproc log level to reduce console output during boot (Ray Jui)
* remotes/lorenzo/pci/iproc:
PCI: iproc: Reduce inbound/outbound mapping print level
PCI: iproc: Reject unconfigured physical functions from PAXC
PCI: iproc: Disable MSI parsing in certain PAXC blocks
PCI: iproc: Fix up corrupted PAXC root complex config registers
PCI: iproc: Activate PAXC bridge quirk for more devices
- Remove Xilinx AXI-PCIe host bridge arch dependency (Palmer Dabbelt)
* remotes/lorenzo/pci/controller/misc:
PCI/xilinx: Depend on OF instead of the ARCH
- To avoid bus errors, enable PASID only if entire path supports End-End
TLP prefixes (Sinan Kaya)
- Unify slot and bus reset functions and remove hotplug knowledge from
callers (Sinan Kaya)
- Add Function-Level Reset quirks for Intel and Samsung NVMe devices to
fix guest reboot issues (Alex Williamson)
- Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD Controller
(Bjorn Helgaas)
* pci/virtualization:
PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
PCI: Delay after FLR of Intel DC P3700 NVMe
PCI: Disable Samsung SM961/PM961 NVMe before FLR
PCI: Export pcie_has_flr()
PCI: Rename pci_try_reset_bus() to pci_reset_bus()
PCI: Deprecate pci_reset_bus() and pci_reset_slot() functions
PCI: Unify try slot and bus reset API
PCI: Hide pci_reset_bridge_secondary_bus() from drivers
IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset
PCI: Handle error return from pci_reset_bridge_secondary_bus()
PCI/IOV: Tidy pci_sriov_set_totalvfs()
PCI: Enable PASID only if entire path supports End-End TLP prefixes
# Conflicts:
# drivers/pci/hotplug/pciehp_hpc.c
- Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer)
- Expand documentation for pci_add_dma_alias() (Logan Gunthorpe)
* pci/switchtec:
PCI: Expand documentation for pci_add_dma_alias()
PCI: Add DMA alias quirk for Microsemi Switchtec NTB
switchtec: Use generic PCI Vendor ID and Class Code
# Conflicts:
# drivers/pci/quirks.c
- Clean up devm_of_pci_get_host_bridge_resources() resource allocation
(Jan Kiszka)
- Fixup resizable BARs after suspend/resume (Christian König)
- Make "pci=earlydump" generic (Sinan Kaya)
- Fix ROM BAR access routines to stay in bounds and check for signature
correctly (Rex Zhu)
* pci/resource:
PCI: Make pci_get_rom_size() static
PCI: Add check code for last image indicator not set
PCI: Avoid accessing memory outside the ROM BAR
PCI: Make early dump functionality generic
PCI: Cleanup PCI_REBAR_CTRL_BAR_SHIFT handling
PCI: Restore resized BAR state on resume
PCI: Clean up resource allocation in devm_of_pci_get_host_bridge_resources()
# Conflicts:
# Documentation/admin-guide/kernel-parameters.txt
- Add "pci=disable_acs_redir=" parameter to disable ACS redirection for
peer-to-peer DMA support (we don't have the peer-to-peer support yet;
this is just one piece) (Logan Gunthorpe)
* pci/peer-to-peer:
PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
PCI: Add device-specific ACS Redirect disable infrastructure
PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
PCI: Allow specifying devices using a base bus and path of devfns
PCI: Make specifying PCI devices in kernel parameters reusable
PCI: Hide ACS quirk declarations inside PCI core
- Mark fall-through switch cases before enabling -Wimplicit-fallthrough
(Gustavo A. R. Silva)
- Move DMA-debug PCI init from arch code to PCI core (Christoph Hellwig)
- Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is supplied
(Heiner Kallweit)
- Unify PCI and DMA direction #defines (Shunyong Yang)
- Add PCI_DEVICE_DATA() macro (Andy Shevchenko)
- Check for VPD completion before checking for timeout (Bert Kenward)
- Limit Netronome NFP5000 config space size to work around erratum (Jakub
Kicinski)
* pci/misc:
PCI: Limit config space size for Netronome NFP5000
PCI/VPD: Check for VPD access completion before checking for timeout
PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
PCI: Unify PCI and normal DMA direction definitions
PCI: Use IRQF_ONESHOT if pci_request_irq() called with no handler
PCI: Call dma_debug_add_bus() for pci_bus_type from PCI core
PCI: Mark fall-through switch cases before enabling -Wimplicit-fallthrough
# Conflicts:
# drivers/pci/hotplug/pciehp_ctrl.c
- Work around IDT switch ACS Source Validation erratum (James
Puthukattukaran)
- Emit diagnostics for all cases of PCIe Link downtraining (Links
operating slower than they're capable of) (Alexandru Gagniuc)
- Skip VFs when configuring Max Payload Size (Myron Stowe)
- Reduce Root Port Max Payload Size if necessary when hot-adding a device
below it (Myron Stowe)
* pci/enumeration:
PCI: Match Root Port's MPS to endpoint's MPSS as necessary
PCI: Skip MPS logic for Virtual Functions (VFs)
PCI: Check for PCIe Link downtraining
PCI: Workaround IDT switch ACS Source Validation erratum
- Defer DPC event handling to work queue (Keith Busch)
- Use threaded IRQ for DPC bottom half (Keith Busch)
- Print AER status while handling DPC events (Keith Busch)
* pci/dpc:
PCI/DPC: Remove indirection waiting for inactive link
PCI/DPC: Use threaded IRQ for bottom half handling
PCI/DPC: Print AER status in DPC event handling
PCI/DPC: Remove rp_pio_status from dpc struct
PCI/DPC: Defer event handling to work queue
PCI/DPC: Leave interrupts enabled while handling event
- Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
Shevchenko)
- Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)
* pci/aspm:
PCI: Remove unnecessary include of <linux/pci-aspm.h>
iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>
ath9k: Remove unnecessary include of <linux/pci-aspm.h>
igb: Remove unnecessary include of <linux/pci-aspm.h>
PCI/ASPM: Convert to use sysfs_match_string() helper
- Decode AER errors with names similar to "lspci" (Tyler Baicar)
- Expose AER statistics in sysfs (Rajat Jain)
- Clear AER status bits selectively based on the type of recovery (Oza
Pawandeep)
- Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
Gagniuc)
- Don't clear AER status bits if we're using the "Firmware-First"
strategy where firmware owns the registers (Alexandru Gagniuc)
* pci/aer:
PCI/AER: Don't clear AER bits if error handling is Firmware-First
PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition
PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset
PCI/AER: Clear device status bits during ERR_COR handling
PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL
PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
PCI/AER: Factor out ERR_NONFATAL status bit clearing
PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
PCI/AER: Clear only ERR_FATAL status bits during fatal recovery
PCI/AER: Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST
PCI/AER: Add sysfs attributes for rootport cumulative stats
PCI/AER: Add sysfs attributes to provide AER stats and breakdown
PCI/AER: Define aer_stats structure for AER capable devices
PCI/AER: Move internal declarations to drivers/pci/pci.h
PCI/AER: Adopt lspci names for AER error decoding
PCI/AER: Expose internal API for obtaining AER information
# Conflicts:
# drivers/pci/pci.h
If the platform requests Firmware-First error handling, firmware is
responsible for reading and clearing AER status bits. If OSPM also clears
them, we may miss errors. See ACPI v6.2, sec 18.3.2.5 and 18.4.
This race is mostly of theoretical significance, as it is not easy to
reasonably demonstrate it in testing.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: add similar guards to pci_cleanup_aer_uncorrect_error_status()
and pci_aer_clear_fatal_status()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Like the NFP4000 and NFP6000, the NFP5000 as an erratum where reading/
writing to PCI config space addresses above 0x600 can cause the NFP to
generate PCIe completion timeouts.
Limit the NFP5000's PF's config space size to 0x600 bytes as is already
done for the NFP4000 and NFP6000.
The NFP5000's VF is 0x6003 (PCI_DEVICE_ID_NETRONOME_NFP6000_VF), the same
device ID as the NFP6000's VF. Thus, its config space is already limited
by the existing use of quirk_nfp6000().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Tony Egan <tony.egan@netronome.com>
If flag IRQCHIP_ONESHOT_SAFE isn't set for an irqchip and we have a
threaded interrupt with no primary handler, flag IRQF_ONESHOT needs to be
set for the interrupt, causing some overhead in the threaded interrupt
handler. For more detailed explanation also check following comment in
__setup_irq():
The interrupt was requested with handler = NULL, so we use the default
primary handler for it. But it does not have the oneshot flag set. In
combination with level interrupts this is deadly, because the default
primary handler just wakes the thread, then the irq lines is reenabled,
but the device still has the level irq asserted. Rinse and repeat....
While this works for edge type interrupts, we play it safe and reject
unconditionally because we can't say for sure which type this interrupt
really has. The type flags are unreliable as the underlying chip
implementation can override them.
Another comment in __setup_irq() gives a hint already that this
overhead can be avoided for PCI-MSI:
Some irq chips like MSI based interrupts are per se one shot safe. Check
the chip flags, so we can avoid the unmask dance at the end of the
threaded handler for those.
Following this let's mark all PCI-MSI irqchips as oneshot-safe.
See also discussion here:
https://lkml.kernel.org/r/alpine.DEB.2.21.1808032136490.1658@nanos.tec.linutronix.de
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Previously we checked the timeout before checking the VPD access completion
bit. On a very heavily loaded system this can cause VPD access to timeout.
Check the completion bit before checking the timeout.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Merge L1 Terminal Fault fixes from Thomas Gleixner:
"L1TF, aka L1 Terminal Fault, is yet another speculative hardware
engineering trainwreck. It's a hardware vulnerability which allows
unprivileged speculative access to data which is available in the
Level 1 Data Cache when the page table entry controlling the virtual
address, which is used for the access, has the Present bit cleared or
other reserved bits set.
If an instruction accesses a virtual address for which the relevant
page table entry (PTE) has the Present bit cleared or other reserved
bits set, then speculative execution ignores the invalid PTE and loads
the referenced data if it is present in the Level 1 Data Cache, as if
the page referenced by the address bits in the PTE was still present
and accessible.
While this is a purely speculative mechanism and the instruction will
raise a page fault when it is retired eventually, the pure act of
loading the data and making it available to other speculative
instructions opens up the opportunity for side channel attacks to
unprivileged malicious code, similar to the Meltdown attack.
While Meltdown breaks the user space to kernel space protection, L1TF
allows to attack any physical memory address in the system and the
attack works across all protection domains. It allows an attack of SGX
and also works from inside virtual machines because the speculation
bypasses the extended page table (EPT) protection mechanism.
The assoicated CVEs are: CVE-2018-3615, CVE-2018-3620, CVE-2018-3646
The mitigations provided by this pull request include:
- Host side protection by inverting the upper address bits of a non
present page table entry so the entry points to uncacheable memory.
- Hypervisor protection by flushing L1 Data Cache on VMENTER.
- SMT (HyperThreading) control knobs, which allow to 'turn off' SMT
by offlining the sibling CPU threads. The knobs are available on
the kernel command line and at runtime via sysfs
- Control knobs for the hypervisor mitigation, related to L1D flush
and SMT control. The knobs are available on the kernel command line
and at runtime via sysfs
- Extensive documentation about L1TF including various degrees of
mitigations.
Thanks to all people who have contributed to this in various ways -
patches, review, testing, backporting - and the fruitful, sometimes
heated, but at the end constructive discussions.
There is work in progress to provide other forms of mitigations, which
might be less horrible performance wise for a particular kind of
workloads, but this is not yet ready for consumption due to their
complexity and limitations"
* 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
x86/microcode: Allow late microcode loading with SMT disabled
tools headers: Synchronise x86 cpufeatures.h for L1TF additions
x86/mm/kmmio: Make the tracer robust against L1TF
x86/mm/pat: Make set_memory_np() L1TF safe
x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
x86/speculation/l1tf: Invert all not present mappings
cpu/hotplug: Fix SMT supported evaluation
KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
Documentation/l1tf: Remove Yonah processors from not vulnerable list
x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
x86: Don't include linux/irq.h from asm/hardirq.h
x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
cpu/hotplug: detect SMT disabled by BIOS
...
In commit 27d868b5e6 ("PCI: Set MPS to match upstream bridge"), we made
sure every device's MPS setting matches its upstream bridge, making it more
likely that a hot-added device will work in a system with an optimized MPS
configuration.
Recently I've started encountering systems where the endpoint device's MPSS
capability is less than its Root Port's current MPS value, thus the
endpoint is not capable of matching its upstream bridge's MPS setting (see:
bugzilla via "Link:" below). This leaves the system vulnerable - the
upstream Root Port could respond with larger TLPs than the device can
handle, and the device will consider them to be 'Malformed'.
One could use the "pci=pcie_bus_safe" kernel parameter to work around the
issue, but that forces a user to supply a kernel parameter to get the
system to function reliably and may end up limiting MPS settings of other
unrelated, sub-topologies which could benefit from maintaining their larger
values.
Augment Keith's approach to include tuning down a Root Port's MPS setting
when its hot-added endpoint device is not capable of matching it.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jon Mason <jdmason@kudzu.us>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Dongdong Liu <liudongdong3@huawei.com>
PCIe r4.0, sec 9.3.5.4, "Device Control Register", shows both
Max_Payload_Size (MPS) and Max_Read_request_Size (MRRS) to be 'RsvdP' for
VFs. Just prior to the table it states:
"PF and VF functionality is defined in Section 7.5.3.4 except where
noted in Table 9-16. For VF fields marked 'RsvdP', the PF setting
applies to the VF."
All of which implies that with respect to Max_Payload_Size Supported
(MPSS), MPS, and MRRS values, we should not be paying any attention to the
VF's fields, but rather only to the PF's. Only looking at the PF's fields
also logically makes sense as it's the sole physical interface to the PCIe
bus.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
Fixes: 27d868b5e6 ("PCI: Set MPS to match upstream bridge")
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # 4.3+
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Dongdong Liu <liudongdong3@huawei.com>
Cc: Jon Mason <jdmason@kudzu.us>
When both ends of a PCIe Link are capable of a higher bandwidth than is
currently in use, the Link is said to be "downtrained". A downtrained Link
may indicate hardware or configuration problems in the system, but it's
hard to identify such Links from userspace.
Refactor pcie_print_link_status() so it continues to always print PCIe
bandwidth information, as several NIC drivers desire.
Add a new internal __pcie_print_link_status() to emit a message only when a
device's bandwidth is constrained by the fabric and call it from the PCI
core for all devices, which identifies all downtrained Links. It also
emits messages for a few cases that are technically not downtrained, such
as a x4 device in an open-ended x1 slot.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: changelog, move __pcie_print_link_status() declaration to
drivers/pci/, rename pcie_check_upstream_link() to
pcie_report_downtraining()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Intel Sunrise Point PCH hardware has an implementation of the ACS bits that
does not comply with the PCIe standard. Add a device-specific quirk,
pci_quirk_disable_intel_spt_pch_acs_redir() to disable ACS Redirection on
this system.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: changelog, split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Intel Sunrise Point (SPT) PCH hardware has an implementation of the ACS
bits that does not comply with the PCIe standard. To deal with this we
need device-specific quirks to disable ACS redirection.
Add a new pci_dev_specific_disable_acs_redir() quirk and a new
.disable_acs_redir() function pointer for use by non-compliant devices. No
functional change intended.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: split to separate patch, move
pci_dev_specific_disable_acs_redir() declarations to drivers/pci/pci.h]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Convert the search for device-specific ACS enable quirks from searching a
NULL-terminated array to iterating through the array, which is always
fixed-size anyway. No functional change intended.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: changelog, split to separate patch for reviewability]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
To support peer-to-peer traffic on a segment of the PCI hierarchy, we must
disable the ACS redirect bits for select PCI bridges. The bridges must be
selected before the devices are discovered by the kernel and the IOMMU
groups created. Therefore, add a kernel command line parameter to specify
devices which must have their ACS bits disabled.
The new parameter takes a list of devices separated by a semicolon. Each
device specified will have its ACS redirect bits disabled. This is
similar to the existing 'resource_alignment' parameter.
The ACS Request P2P Request Redirect, P2P Completion Redirect and P2P
Egress Control bits are disabled, which is sufficient to always allow
passing P2P traffic uninterrupted. The bits are set after the kernel
(optionally) enables the ACS bits itself. It is also done regardless of
whether the kernel or platform firmware sets the bits.
If the user tries to disable the ACS redirect for a device without the ACS
capability, print a warning to dmesg.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: reorder to add the generic code first and move the
device-specific quirk to subsequent patches]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
When specifying PCI devices on the kernel command line using a
bus/device/function address, bus numbers can change when adding or
replacing a device, changing motherboard firmware, or applying kernel
parameters like "pci=assign-buses". When bus numbers change, it's likely
the command line tweak will be applied to the wrong device.
Therefore, it is useful to be able to specify devices with a base bus
number and the path of devfns needed to get to it, similar to the "device
scope" structure in the Intel VT-d spec, Section 8.3.1.
Thus, we add an option to specify devices in the following format:
[<domain>:]<bus>:<device>.<func>[/<device>.<func>]*
The path can be any segment within the PCI hierarchy of any length and
determined through the use of 'lspci -t'. When specified this way, it is
less likely that a renumbered bus will result in a valid device
specification and the tweak won't be applied to the wrong device.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: use "device" instead of "slot" in documentation since that's the
usual language in the PCI specs]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Separate out the code to match a PCI device with a string (typically
originating from a kernel parameter) from the
pci_specified_resource_alignment() function into its own helper function.
While we are at it, this change fixes the kernel style of the function
(fixing a number of long lines and extra parentheses).
Additionally, make the analogous change to the kernel parameter
documentation: Separate the description of how to specify a PCI device
into its own section at the head of the "pci=" parameter.
This patch should have no functional alterations.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: use "device" instead of "slot" in documentation since that's the
usual language in the PCI specs]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>