pci_dev_resource_resize_attr(n) macro is invoked for six resources,
creating a large footprint function for each resource.
Rework the macro to only create a function that calls a helper function so
the compiler can decide if it warrants to inline the function or not.
With x86_64 defconfig, this saves roughly 2.5kB:
$ scripts/bloat-o-meter drivers/pci/pci-sysfs.o{.old,.new}
add/remove: 1/0 grow/shrink: 0/6 up/down: 512/-2934 (-2422)
Function old new delta
__resource_resize_store - 512 +512
resource5_resize_store 503 14 -489
resource4_resize_store 503 14 -489
resource3_resize_store 503 14 -489
resource2_resize_store 503 14 -489
resource1_resize_store 503 14 -489
resource0_resize_store 500 11 -489
Total: Before=13399, After=10977, chg -18.08%
(The compiler seemingly chose to still inline __resource_resize_show()
which is fine, those functions are not very complex/large.)
Link: https://lore.kernel.org/r/20240222114607.1837-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
It is possible to enable CONFIG_PCI but disable CONFIG_SYSFS and for
space-constrained devices such as routers, such a configuration may
actually make sense.
However pci-sysfs.c is compiled even if CONFIG_SYSFS is disabled,
unnecessarily increasing the kernel's size.
To rectify that:
* Move pci_mmap_fits() to mmap.c. It is not only needed by
pci-sysfs.c, but also proc.c.
* Move pci_dev_type to probe.c and make it private. It references
pci_dev_attr_groups in pci-sysfs.c. Make that public instead for
consistency with pci_dev_groups, pcibus_groups and pci_bus_groups,
which are likewise public and referenced by struct definitions in
pci-driver.c and probe.c.
* Define pci_dev_groups, pci_dev_attr_groups, pcibus_groups and
pci_bus_groups to NULL if CONFIG_SYSFS is disabled. Provide empty
static inlines for pci_{create,remove}_legacy_files() and
pci_{create,remove}_sysfs_dev_files().
Result:
vmlinux size is reduced by 122996 bytes in my arm 32-bit test build.
Link: https://lore.kernel.org/r/85ca95ae8e4d57ccf082c5c069b8b21eb141846e.1698668982.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Here is the set of driver core updates for 6.7-rc1. Nothing major in
here at all, just a small number of changes including:
- minor cleanups and updates from Andy Shevchenko
- __counted_by addition
- firmware_loader update for aborting loads cleaner
- other minor changes, details in the shortlog
- documentation update
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZUTe8A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynP0QCfT2jQx3OcL22MoqCvdTuZJKPiHSIAoMxrliJF
d4cUeICW17ywlTFzsKg8
=nTeu
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of driver core updates for 6.7-rc1. Nothing major in
here at all, just a small number of changes including:
- minor cleanups and updates from Andy Shevchenko
- __counted_by addition
- firmware_loader update for aborting loads cleaner
- other minor changes, details in the shortlog
- documentation update
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (21 commits)
firmware_loader: Abort all upcoming firmware load request once reboot triggered
firmware_loader: Refactor kill_pending_fw_fallback_reqs()
Documentation: security-bugs.rst: linux-distros relaxed their rules
driver core: Release all resources during unbind before updating device links
driver core: class: remove boilerplate code
driver core: platform: Annotate struct irq_affinity_devres with __counted_by
resource: Constify resource crosscheck APIs
resource: Unify next_resource() and next_resource_skip_children()
resource: Reuse for_each_resource() macro
PCI: Implement custom llseek for sysfs resource entries
kernfs: sysfs: support custom llseek method for sysfs entries
debugfs: Fix __rcu type comparison warning
device property: Replace custom implementation of COUNT_ARGS()
drivers: base: test: Make property entry API test modular
driver core: Add missing parameter description to __fwnode_link_add()
device property: Clarify usage scope of some struct fwnode_handle members
devres: rename the first parameter of devm_add_action(_or_reset)
driver core: platform: Unify the firmware node type check
driver core: platform: Use temporary variable in platform_device_add()
driver core: platform: Refactor error path in a couple places
...
- Use FIELD_GET()/FIELD_PREP() when possible throughout drivers/pci/ (Ilpo
Järvinen, Bjorn Helgaas)
- Rework DPC control programming for clarity (Ilpo Järvinen)
* pci/field-get:
PCI/portdrv: Use FIELD_GET()
PCI/VC: Use FIELD_GET()
PCI/PTM: Use FIELD_GET()
PCI/PME: Use FIELD_GET()
PCI/ATS: Use FIELD_GET()
PCI/ATS: Show PASID Capability register width in bitmasks
PCI: Use FIELD_GET() in Sapphire RX 5600 XT Pulse quirk
PCI: Use FIELD_GET()
PCI/MSI: Use FIELD_GET/PREP()
PCI/DPC: Use defines with DPC reason fields
PCI/DPC: Use defined fields with DPC_CTL register
PCI/DPC: Use FIELD_GET()
PCI: hotplug: Use FIELD_GET/PREP()
PCI: dwc: Use FIELD_GET/PREP()
PCI: cadence: Use FIELD_GET()
PCI: Use FIELD_GET() to extract Link Width
PCI: mvebu: Use FIELD_PREP() with Link Width
PCI: tegra194: Use FIELD_GET()/FIELD_PREP() with Link Width fields
# Conflicts:
# drivers/pci/controller/dwc/pcie-tegra194.c
- Add pci_is_vga() helper, which checks for both PCI_CLASS_DISPLAY_VGA and
PCI_CLASS_NOT_DEFINED_VGA (which catches ancient devices built before
Class Codes were defined) (Sui Jingfeng)
- Use the new pci_is_vga() to identify devices for the VGA arbiter, the
sysfs "boot_vga" attribute, and the virtio and qxl drivers (SUi Jingfeng)
* pci/vga:
drm/qxl: Use pci_is_vga() to identify VGA devices
drm/virtio: Use pci_is_vga() to identify VGA devices
PCI/sysfs: Enable 'boot_vga' attribute via pci_is_vga()
PCI/VGA: Select VGA devices earlier
PCI/VGA: Use pci_is_vga() to identify VGA devices
PCI: Add pci_is_vga() helper
Use FIELD_GET() to extract PCIe Negotiated and Maximum Link Width fields
instead of custom masking and shifting.
Link: https://lore.kernel.org/r/20230919125648.1920-7-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: drop duplicate include of <linux/bitfield.h>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Enable the 'boot_vga' sysfs attribute via pci_is_vga().
This exposes 'boot_vga' for old PCI_CLASS_NOT_DEFINED_VGA (0x0001) devices
as well as for the PCI_CLASS_DISPLAY_VGA (0x0300) devices where it was
previously exposed.
Link: https://lore.kernel.org/r/20230830111532.444535-4-sui.jingfeng@linux.dev
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Maciej W. Rozycki" <macro@orcam.me.uk>
Since commit 636b21b501 ("PCI: Revoke mappings like devmem"), mmappable
sysfs entries have started to receive their f_mapping from the iomem
pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs
(and procfs) as well as in /dev/[k]mem.
This resulted in a userspace-visible regression:
1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0)
2. Use lseek(fd, 0, SEEK_END) to determine its size
Expected result: a PCI region size is returned.
Actual result: 0 is returned.
The reason is that PCI resource files residing in sysfs use
generic_file_llseek(), which relies on f_mapping->host inode to get the
file size. As f_mapping is now redefined, f_mapping->host points to an
anonymous zero-sized iomem_inode which has nothing to do with sysfs file
in question.
Implement a custom llseek method for sysfs PCI resources, which is
almost the same as proc_bus_pci_lseek() used for procfs entries.
This makes sysfs and procfs entries consistent with regards to seeking,
but also introduces userspace-visible changes to seeking PCI resources
in sysfs:
- SEEK_DATA and SEEK_HOLE are no longer supported;
- Seeking past the end of the file is prohibited while previously
offsets up to MAX_NON_LFS were accepted (reading from these offsets
was always invalid).
Signed-off-by: Valentine Sinitsyn <valesini@yandex-team.ru>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20230925084013.309399-2-valesini@yandex-team.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
struct pci_dev contains two flags which govern whether the device may
suspend to D3cold:
* no_d3cold provides an opt-out for drivers (e.g. if a device is known
to not wake from D3cold)
* d3cold_allowed provides an opt-out for user space (default is true,
user space may set to false)
Since commit 9d26d3a8f1 ("PCI: Put PCIe ports into D3 during suspend"),
the user space setting overwrites the driver setting. Essentially user
space is trusted to know better than the driver whether D3cold is
working.
That feels unsafe and wrong. Assume that the change was introduced
inadvertently and do not overwrite no_d3cold when d3cold_allowed is
modified. Instead, consider d3cold_allowed in addition to no_d3cold
when choosing a suspend state for the device.
That way, user space may opt out of D3cold if the driver hasn't, but it
may no longer force an opt in if the driver has opted out.
Fixes: 9d26d3a8f1 ("PCI: Put PCIe ports into D3 during suspend")
Link: https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.1695025365.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: stable@vger.kernel.org # v4.8+
If legacy I/O spaces are not supported simply return an error when
trying to access them via pci_resource_io(). This allows inb() and
friends to become undefined when they are known at compile time to be
non-functional in a later patch.
Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Link: https://lore.kernel.org/r/20230703135255.2202721-3-schnelle@linux.ibm.com
Signed-off-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
struct bus_type should never be modified in a sysfs callback as there is
nothing in the structure to modify, and frankly, the structure is almost
never used in a sysfs callback, so mark it as constant to allow struct
bus_type to be moved to read-only memory.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Ben Widawsky <bwidawsk@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Harald Freudenberger <freude@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hu Haowen <src.res@email.cn>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Stuart Yoder <stuyoder@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Acked-by: Ilya Dryomov <idryomov@gmail.com> # rbd
Acked-by: Ira Weiny <ira.weiny@intel.com> # cxl
Reviewed-by: Alex Shi <alexs@kernel.org>
Acked-by: Iwona Winiarska <iwona.winiarska@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi
Link: https://lore.kernel.org/r/20230313182918.1312597-23-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmOYpTIUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxuZhAAhGjE8voLZeOYwxbvfL69hGTAZ+Me
x2hqRWVhh/IGWXTTaoSLwSjMMokcmAKN5S/wv8qdCG5sB8EN8FyTBIZDy8PuRRdl
8UlqlBMSL+d4oSRDCnYLxFNcynLRNnmx2dfcdw9tJ4zjTLN8Y4o8PHFogR6pJ3MT
sDC8S0myTQKXr4wAGzTZycKsiGManviYtByp6dCcKD3Oy5Q2uZ9OKO2DP2yQpn+F
c3IJSV9oDz3KR8JVJ5Q1iz9cdMXbGwjkM3JLlHpxhedwjN4ErLumPutKcebtzO5C
aTqabN7Nnzc4yJusAIfojFCWH7fgaYUyJ3pxcFyJ4tu4m9Last+2I5UB/kV2sYAD
jWiCYx3sA/mRopNXOnrBGae+Lgy+sQnt8or0grySr0bK+b+ArAGis4uT4A0uASGO
RUQdIQwz7zhHeQrwAladHWxnx4BEDNCatgfn38p4fklIYKydCY5nfZURMDvHezSR
G6Nu08hoE9ZXlmkWTFw+5F23wPWKcCpzZj0hf7OroIouXUp8vqSFSqatH5vGkbCl
bDswck9GdRJ2hl5SvFOeelaXkM42du45TMLU2JmIn6dYYFNrO93JgdvKSU7E2CpG
AmDIpg1Idxo8fEPPGH1I7RVU5+ilzmmPQQY7poQW+va4/dEd/QVp1+ZZTDnMC1qk
qi3ck22VdvPU2VU=
=KULr
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Squash portdrv_{core,pci}.c into portdrv.c to ease maintenance and
make more things static.
- Make portdrv bind to Switch Ports that have AER. Previously, if
these Ports lacked MSI/MSI-X, portdrv failed to bind, which meant
the Ports couldn't be suspended to low-power states. AER on these
Ports doesn't use interrupts, and the AER driver doesn't need to
claim them.
- Assign PCI domain IDs using ida_alloc(), which makes host bridge
add/remove work better.
Resource management:
- To work better with recent BIOSes that use EfiMemoryMappedIO for
PCI host bridge apertures, remove those regions from the E820 map
(E820 entries normally prevent us from allocating BARs). In v5.19,
we added some quirks to disable E820 checking, but that's not very
maintainable. EfiMemoryMappedIO means the OS needs to map the
region for use by EFI runtime services; it shouldn't prevent OS
from using it.
PCIe native device hotplug:
- Build pciehp by default if USB4 is enabled, since Thunderbolt/USB4
PCIe tunneling depends on native PCIe hotplug.
- Enable Command Completed Interrupt only if supported to avoid user
confusion from lspci output that says this is enabled but not
supported.
- Prevent pciehp from binding to Switch Upstream Ports; this happened
because of interaction with acpiphp and caused devices below the
Upstream Port to disappear.
Power management:
- Convert AGP drivers to generic power management. We hope to remove
legacy power management from the PCI core eventually.
Virtualization:
- Fix pci_device_is_present(), which previously always returned
"false" for VFs, causing virtio hangs when unbinding the driver.
Miscellaneous:
- Convert drivers to gpiod API to prepare for dropping some legacy
code.
- Fix DOE fencepost error for the maximum data object length.
Baikal-T1 PCIe controller driver:
- Add driver and DT bindings.
Broadcom STB PCIe controller driver:
- Enable Multi-MSI.
- Delay 100ms after PERST# deassert to allow power and clocks to
stabilize.
- Configure Read Completion Boundary to 64 bytes.
Freescale i.MX6 PCIe controller driver:
- Initialize PHY before deasserting core reset to fix a regression in
v6.0 on boards where the PHY provides the reference.
- Fix imx6sx and imx8mq clock names in DT schema.
Intel VMD host bridge driver:
- Fix Secondary Bus Reset on VMD bridges, which allows reset of NVMe
SSDs in VT-d pass-through scenarios.
- Disable MSI remapping, which gets re-enabled by firmware during
suspend/resume.
MediaTek PCIe Gen3 controller driver:
- Add MT7986 and MT8195 support.
Qualcomm PCIe controller driver:
- Add SC8280XP/SA8540P basic interconnect support.
Rockchip DesignWare PCIe controller driver:
- Base DT schema on common Synopsys schema.
Synopsys DesignWare PCIe core:
- Collect DT items shared between Root Port and Endpoint (PERST GPIO,
PHY info, clocks, resets, link speed, number of lanes, number of
iATU windows, interrupt info, etc) to snps,dw-pcie-common.yaml.
- Add dma-ranges support for Root Ports and Endpoints.
- Consolidate DT resource retrieval for "dbi", "dbi2", "atu", etc. to
reduce code duplication.
- Add generic names for clocks and resets to encourage more
consistent naming across drivers using DesignWare IP.
- Stop advertising PTM Responder role for Endpoints, which aren't
allowed to be responders.
TI J721E PCIe driver:
- Add j721s2 host mode ID to DT schema.
- Add interrupt properties to DT schema.
Toshiba Visconti PCIe controller driver:
- Fix interrupts array max constraints in DT schema"
* tag 'pci-v6.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (95 commits)
x86/PCI: Use pr_info() when possible
x86/PCI: Fix log message typo
x86/PCI: Tidy E820 removal messages
PCI: Skip allocate_resource() if too little space available
efi/x86: Remove EfiMemoryMappedIO from E820 map
PCI/portdrv: Allow AER service only for Root Ports & RCECs
PCI: xilinx-nwl: Fix coding style violations
PCI: mvebu: Switch to using gpiod API
PCI: pciehp: Enable Command Completed Interrupt only if supported
PCI: aardvark: Switch to using devm_gpiod_get_optional()
dt-bindings: PCI: mediatek-gen3: add support for mt7986
dt-bindings: PCI: mediatek-gen3: add SoC based clock config
dt-bindings: PCI: qcom: Allow 'dma-coherent' property
PCI: mt7621: Add sentinel to quirks table
PCI: vmd: Fix secondary bus reset for Intel bridges
PCI: endpoint: pci-epf-vntb: Fix sparse ntb->reg build warning
PCI: endpoint: pci-epf-vntb: Fix sparse build warning for epf_db
PCI: endpoint: pci-epf-vntb: Replace hardcoded 4 with sizeof(u32)
PCI: endpoint: pci-epf-vntb: Remove unused epf_db_phy struct member
PCI: endpoint: pci-epf-vntb: Fix call pci_epc_mem_free_addr() in error path
...
PCI config space access from user space has traditionally been
unrestricted with writes being an understood risk for device operation.
Unfortunately, device breakage or odd behavior from config writes lacks
indicators that can leave driver writers confused when evaluating
failures. This is especially true with the new PCIe Data Object
Exchange (DOE) mailbox protocol where backdoor shenanigans from user
space through things such as vendor defined protocols may affect device
operation without complete breakage.
A prior proposal restricted read and writes completely.[1] Greg and
Bjorn pointed out that proposal is flawed for a couple of reasons.
First, lspci should always be allowed and should not interfere with any
device operation. Second, setpci is a valuable tool that is sometimes
necessary and it should not be completely restricted.[2] Finally
methods exist for full lock of device access if required.
Even though access should not be restricted it would be nice for driver
writers to be able to flag critical parts of the config space such that
interference from user space can be detected.
Introduce pci_request_config_region_exclusive() to mark exclusive config
regions. Such regions trigger a warning and kernel taint if accessed
via user space.
Create pci_warn_once() to restrict the user from spamming the log.
[1] https://lore.kernel.org/all/161663543465.1867664.5674061943008380442.stgit@dwillia2-desk3.amr.corp.intel.com/
[2] https://lore.kernel.org/all/YF8NGeGv9vYcMfTV@kroah.com/
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20220926215711.2893286-2-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When pci_create_attr() fails, pci_remove_resource_files() is called which
will iterate over the res_attr[_wc] arrays and frees every non NULL entry.
To avoid a double free here set the array entry only after it's clear we
successfully initialized it.
Fixes: b562ec8f74 ("PCI: Don't leak memory if sysfs_create_bin_file() fails")
Link: https://lore.kernel.org/r/20221007070735.GX986@pengutronix.de/
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Add a simple sysfs interface to Resizable BAR support, largely for the
purposes of assigning such devices to a VM through VFIO. Resizable BARs
present a difficult feature to expose to a VM through emulation, as
resizing a BAR is done on the host. It can fail, and often does, but we
have no means via emulation of a PCIe REBAR capability to handle the error
cases.
A vfio-pci specific ioctl interface is also cumbersome as there are often
multiple devices within the same bridge aperture and handling them is a
challenge. In the interface proposed here, expanding a BAR potentially
requires such devices to be soft-removed during the resize operation and
rescanned after, in order for all the necessary resources to be released.
A pci-sysfs interface is also more universal than a vfio specific
interface.
Please see the ABI documentation update for usage.
Link: https://lore.kernel.org/r/166336088796.3597940.14973499936692558556.stgit@omen
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Krzysztof Wilczyński <kw@linux.com>
Use a helper to set driver_override to the reduce amount of duplicated
code. Make the driver_override field const char, because it is not
modified by the core and it matches other subsystems.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220419113435.246203-6-krzysztof.kozlowski@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove variables and assignments that are never used.
Found by Krzysztof using cppcheck, e.g.,
$ cppcheck --enable=all --force
uselessAssignmentPtrArg drivers/pci/proc.c:102 Assignment of function parameter has no effect outside the function. Did you forget dereferencing it?
unreadVariable drivers/pci/setup-bus.c:1528 Variable 'old_flags' is assigned a value that is never used.
Reported-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20220313192933.434746-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmGFXBkUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vx6Tg/7BsGWm8f+uw/mr9lLm47q2mc4XyoO
7bR9KDp5NM84W/8ZOU7dqqqsnY0ddrSOLBRyhJJYMW3SwJd1y1ajTBsL1Ujqv+eN
z+JUFmhq4Laqm4k6Spc9CEJE+Ol5P6gGUtxLYo6PM2R0VxnSs/rDxctT5i7YOpCi
COJ+NVT/mc/by2loz1kLTSR9GgtBBgd+Y8UA33GFbHKssROw02L0OI3wffp81Oba
EhMGPoD+0FndAniDw+vaOSoO+YaBuTfbM92T/O00mND69Fj1PWgmNWZz7gAVgsXb
3RrNENUFxgw6CDt7LZWB8OyT04iXe0R2kJs+PA9gigFCGbypwbd/Nbz5M7e9HUTR
ray+1EpZib6+nIksQBL2mX8nmtyHMcLiM57TOEhq0+ECDO640MiRm8t0FIG/1E8v
3ZYd9w20o/NxlFNXHxxpZ3D/osGH5ocyF5c5m1rfB4RGRwztZGL172LWCB0Ezz9r
eHB8sWxylxuhrH+hp2BzQjyddg7rbF+RA4AVfcQSxUpyV01hoRocKqknoDATVeLH
664nJIINFxKJFwfuL3E6OhrInNe1LnAhCZsHHqbS+NNQFgvPRznbixBeLkI9dMf5
Yf6vpsWO7ur8lHHbRndZubVu8nxklXTU7B/w+C11sq6k9LLRJSHzanr3Fn9WA80x
sznCxwUvbTCu1r0=
=nsMh
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Conserve IRQs by setting up portdrv IRQs only when there are users
(Jan Kiszka)
- Rework and simplify _OSC negotiation for control of PCIe features
(Joerg Roedel)
- Remove struct pci_dev.driver pointer since it's redundant with the
struct device.driver pointer (Uwe Kleine-König)
Resource management:
- Coalesce contiguous host bridge apertures from _CRS to accommodate
BARs that cover more than one aperture (Kai-Heng Feng)
Sysfs:
- Check CAP_SYS_ADMIN before parsing user input (Krzysztof
Wilczyński)
- Return -EINVAL consistently from "store" functions (Krzysztof
Wilczyński)
- Use sysfs_emit() in endpoint "show" functions to avoid buffer
overruns (Kunihiko Hayashi)
PCIe native device hotplug:
- Ignore Link Down/Up caused by resets during error recovery so
endpoint drivers can remain bound to the device (Lukas Wunner)
Virtualization:
- Avoid bus resets on Atheros QCA6174, where they hang the device
(Ingmar Klein)
- Work around Pericom PI7C9X2G switch packet drop erratum by using
store and forward mode instead of cut-through (Nathan Rossi)
- Avoid trying to enable AtomicOps on VFs; the PF setting applies to
all VFs (Selvin Xavier)
MSI:
- Document that /sys/bus/pci/devices/.../irq contains the legacy INTx
interrupt or the IRQ of the first MSI (not MSI-X) vector (Barry
Song)
VPD:
- Add pci_read_vpd_any() and pci_write_vpd_any() to access anywhere
in the possible VPD space; use these to simplify the cxgb3 driver
(Heiner Kallweit)
Peer-to-peer DMA:
- Add (not subtract) the bus offset when calculating DMA address
(Wang Lu)
ASPM:
- Re-enable LTR at Downstream Ports so they don't report Unsupported
Requests when reset or hot-added devices send LTR messages
(Mingchuang Qiao)
Apple PCIe controller driver:
- Add driver for Apple M1 PCIe controller (Alyssa Rosenzweig, Marc
Zyngier)
Cadence PCIe controller driver:
- Return success when probe succeeds instead of falling into error
path (Li Chen)
HiSilicon Kirin PCIe controller driver:
- Reorganize PHY logic and add support for external PHY drivers
(Mauro Carvalho Chehab)
- Support PERST# GPIOs for HiKey970 external PEX 8606 bridge (Mauro
Carvalho Chehab)
- Add Kirin 970 support (Mauro Carvalho Chehab)
- Make driver removable (Mauro Carvalho Chehab)
Intel VMD host bridge driver:
- If IOMMU supports interrupt remapping, leave VMD MSI-X remapping
enabled (Adrian Huang)
- Number each controller so we can tell them apart in
/proc/interrupts (Chunguang Xu)
- Avoid building on UML because VMD depends on x86 bare metal APIs
(Johannes Berg)
Marvell Aardvark PCIe controller driver:
- Define macros for PCI_EXP_DEVCTL_PAYLOAD_* (Pali Rohár)
- Set Max Payload Size to 512 bytes per Marvell spec (Pali Rohár)
- Downgrade PIO Response Status messages to debug level (Marek Behún)
- Preserve CRS SV (Config Request Retry Software Visibility) bit in
emulated Root Control register (Pali Rohár)
- Fix issue in configuring reference clock (Pali Rohár)
- Don't clear status bits for masked interrupts (Pali Rohár)
- Don't mask unused interrupts (Pali Rohár)
- Avoid code repetition in advk_pcie_rd_conf() (Marek Behún)
- Retry config accesses on CRS response (Pali Rohár)
- Simplify emulated Root Capabilities initialization (Pali Rohár)
- Fix several link training issues (Pali Rohár)
- Fix link-up checking via LTSSM (Pali Rohár)
- Fix reporting of Data Link Layer Link Active (Pali Rohár)
- Fix emulation of W1C bits (Marek Behún)
- Fix MSI domain .alloc() method to return zero on success (Marek
Behún)
- Read entire 16-bit MSI vector in MSI handler, not just low 8 bits
(Marek Behún)
- Clear Root Port I/O Space, Memory Space, and Bus Master Enable bits
at startup; PCI core will set those as necessary (Pali Rohár)
- When operating as a Root Port, set class code to "PCI Bridge"
instead of the default "Mass Storage Controller" (Pali Rohár)
- Add emulation for PCI_BRIDGE_CTL_BUS_RESET since aardvark doesn't
implement this per spec (Pali Rohár)
- Add emulation of option ROM BAR since aardvark doesn't implement
this per spec (Pali Rohár)
MediaTek MT7621 PCIe controller driver:
- Add MediaTek MT7621 PCIe host controller driver and DT binding
(Sergio Paracuellos)
Qualcomm PCIe controller driver:
- Add SC8180x compatible string (Bjorn Andersson)
- Add endpoint controller driver and DT binding (Manivannan
Sadhasivam)
- Restructure to use of_device_get_match_data() (Prasad Malisetty)
- Add SC7280-specific pcie_1_pipe_clk_src handling (Prasad Malisetty)
Renesas R-Car PCIe controller driver:
- Remove unnecessary includes (Geert Uytterhoeven)
Rockchip DesignWare PCIe controller driver:
- Add DT binding (Simon Xue)
Socionext UniPhier Pro5 controller driver:
- Serialize INTx masking/unmasking (Kunihiko Hayashi)
Synopsys DesignWare PCIe controller driver:
- Run dwc .host_init() method before registering MSI interrupt
handler so we can deal with pending interrupts left by bootloader
(Bjorn Andersson)
- Clean up Kconfig dependencies (Andy Shevchenko)
- Export symbols to allow more modular drivers (Luca Ceresoli)
TI DRA7xx PCIe controller driver:
- Allow host and endpoint drivers to be modules (Luca Ceresoli)
- Enable external clock if present (Luca Ceresoli)
TI J721E PCIe driver:
- Disable PHY when probe fails after initializing it (Christophe
JAILLET)
MicroSemi Switchtec management driver:
- Return error to application when command execution fails because an
out-of-band reset has cleared the device BARs, Memory Space Enable,
etc (Kelvin Cao)
- Fix MRPC error status handling issue (Kelvin Cao)
- Mask out other bits when reading of management VEP instance ID
(Kelvin Cao)
- Return EOPNOTSUPP instead of ENOTSUPP from sysfs show functions
(Kelvin Cao)
- Add check of event support (Logan Gunthorpe)
Miscellaneous:
- Remove unused pci_pool wrappers, which have been replaced by
dma_pool (Cai Huoqing)
- Use 'unsigned int' instead of bare 'unsigned' (Krzysztof
Wilczyński)
- Use kstrtobool() directly, sans strtobool() wrapper (Krzysztof
Wilczyński)
- Fix some sscanf(), sprintf() format mismatches (Krzysztof
Wilczyński)
- Update PCI subsystem information in MAINTAINERS (Krzysztof
Wilczyński)
- Correct some misspellings (Krzysztof Wilczyński)"
* tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (137 commits)
PCI: Add ACS quirk for Pericom PI7C9X2G switches
PCI: apple: Configure RID to SID mapper on device addition
iommu/dart: Exclude MSI doorbell from PCIe device IOVA range
PCI: apple: Implement MSI support
PCI: apple: Add INTx and per-port interrupt support
PCI: kirin: Allow removing the driver
PCI: kirin: De-init the dwc driver
PCI: kirin: Disable clkreq during poweroff sequence
PCI: kirin: Move the power-off code to a common routine
PCI: kirin: Add power_off support for Kirin 960 PHY
PCI: kirin: Allow building it as a module
PCI: kirin: Add MODULE_* macros
PCI: kirin: Add Kirin 970 compatible
PCI: kirin: Support PERST# GPIOs for HiKey970 external PEX 8606 bridge
PCI: apple: Set up reference clocks when probing
PCI: apple: Add initial hardware bring-up
PCI: of: Allow matching of an interrupt-map local to a PCI device
of/irq: Allow matching of an interrupt-map local to an interrupt controller
irqdomain: Make of_phandle_args_to_fwspec() generally available
PCI: Do not enable AtomicOps on VFs
...
- Check for CAP_SYS_ADMIN before validating sysfs user input, not after
(Krzysztof Wilczyński)
- Always return -EINVAL from sysfs "store" functions for invalid user input
instead of -EINVAL sometimes and -ERANGE others (Krzysztof Wilczyński)
- Use kstrtobool() directly instead of the strtobool() wrapper (Krzysztof
Wilczyński)
* pci/sysfs:
PCI: Use kstrtobool() directly, sans strtobool() wrapper
PCI/sysfs: Return -EINVAL consistently from "store" functions
PCI/sysfs: Check CAP_SYS_ADMIN before parsing user input
# Conflicts:
# drivers/pci/iov.c
The sysfs "irq" file contains the legacy INTx IRQ. Or, if the device has
MSI enabled, it contains the first MSI IRQ instead.
Previously this file showed the pci_dev.irq value directly. But we'd
prefer to use pci_dev.irq only for the INTx IRQ and decouple that from any
MSI or MSI-X IRQs.
If the device has MSI enabled, explicitly look up and show the first MSI
IRQ in the sysfs "irq" file. Otherwise, show the INTx IRQ.
This removes the requirement that msi_capability_init() set pci_dev.irq to
the first MSI IRQ when enabling MSI and pci_msi_shutdown() restore the INTx
IRQ when disabling MSI.
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20210825102636.52757-3-21cnbao@gmail.com
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Use the proper macro instead of hard-coded (-1) value.
Suggested-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20211004133453.18881-2-mgurtovoy@nvidia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Most of the "store" functions that handle userspace input via sysfs return
-EINVAL should the value fail validation and/or type conversion. This
error code is a clear message to userspace that the value is not a valid
input.
However, some of the "show" functions return input parsing error codes
as-is, which may be either -EINVAL or -ERANGE. The former would often be
from kstrtobool(), and the latter typically from other kstr*() functions
such as kstrtou8(), kstrtou32(), kstrtoint(), etc.
-EINVAL is commonly returned as the error code to indicate that the value
provided is invalid, but -ERANGE is not very useful in userspace.
Therefore, normalize the return error code to be -EINVAL for when the
validation and/or type conversion fails.
Link: https://lore.kernel.org/r/20210915230127.2495723-2-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Check if the "CAP_SYS_ADMIN" capability flag is set before parsing user
input as it makes more sense to first check whether the current user
actually has the right permissions before accepting any input from such
user.
This will also make order in which enable_store() and msi_bus_store()
perform the "CAP_SYS_ADMIN" capability check consistent with other
PCI-related sysfs objects that first verify whether user has this
capability set.
Link: https://lore.kernel.org/r/20210915230127.2495723-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Here is the big set of driver core patches for 5.15-rc1.
These do change a number of different things across different
subsystems, and because of that, there were 2 stable tags created that
might have already come into your tree from different pulls that did the
following
- changed the bus remove callback to return void
- sysfs iomem_get_mapping rework
The latter one will cause a tiny merge issue with your tree, as there
was a last-minute fix for this in 5.14 in your tree, but the fixup
should be "obvious". If you want me to provide a fixed merge for this,
please let me know.
Other than those two things, there's only a few small things in here:
- kernfs performance improvements for huge numbers of sysfs
users at once
- tiny api cleanups
- other minor changes
All of these have been in linux-next for a while with no reported
problems, other than the before-mentioned merge issue.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYS+FLQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylXuACfWECnysDtXNe66DdETCFs1a1RToYAoMokWeU5
s8VFP1NY2BjmxJbkebLL
=8kVu
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big set of driver core patches for 5.15-rc1.
These do change a number of different things across different
subsystems, and because of that, there were 2 stable tags created that
might have already come into your tree from different pulls that did
the following
- changed the bus remove callback to return void
- sysfs iomem_get_mapping rework
Other than those two things, there's only a few small things in here:
- kernfs performance improvements for huge numbers of sysfs users at
once
- tiny api cleanups
- other minor changes
All of these have been in linux-next for a while with no reported
problems, other than the before-mentioned merge issue"
* tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (33 commits)
MAINTAINERS: Add dri-devel for component.[hc]
driver core: platform: Remove platform_device_add_properties()
ARM: tegra: paz00: Handle device properties with software node API
bitmap: extend comment to bitmap_print_bitmask/list_to_buf
drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI
topology: use bin_attribute to break the size limitation of cpumap ABI
lib: test_bitmap: add bitmap_print_bitmask/list_to_buf test cases
cpumask: introduce cpumap_print_list/bitmask_to_buf to support large bitmask and list
sysfs: Rename struct bin_attribute member to f_mapping
sysfs: Invoke iomem_get_mapping() from the sysfs open callback
debugfs: Return error during {full/open}_proxy_open() on rmmod
zorro: Drop useless (and hardly used) .driver member in struct zorro_dev
zorro: Simplify remove callback
sh: superhyway: Simplify check in remove callback
nubus: Simplify check in remove callback
nubus: Make struct nubus_driver::remove return void
kernfs: dont call d_splice_alias() under kernfs node lock
kernfs: use i_lock to protect concurrent inode updates
kernfs: switch kernfs to use an rwsem
kernfs: use VFS negative dentry caching
...
Two legacy PCI sysfs objects "legacy_io" and "legacy_mem" were updated
to use an unified address space in the commit 636b21b501 ("PCI: Revoke
mappings like devmem"). This allows for revocations to be managed from
a single place when drivers want to take over and mmap() a /dev/mem
range.
Following the update, both of the sysfs objects should leverage the
iomem_get_mapping() function to get an appropriate address range, but
only the "legacy_io" has been correctly updated - the second attribute
seems to be using a wrong variable to pass the iomem_get_mapping()
function to.
Thus, correct the variable name used so that the "legacy_mem" sysfs
object would also correctly call the iomem_get_mapping() function.
Fixes: 636b21b501 ("PCI: Revoke mappings like devmem")
Link: https://lore.kernel.org/r/20210812132144.791268-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add "reset_method" sysfs attribute to enable user to query and set
preferred device reset methods and their ordering.
[bhelgaas: on invalid sysfs input, return error and preserve previous
config, as in earlier patch versions]
Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20210817180500.1253-6-ameynarkhede03@gmail.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
"reset_fn" indicates whether the device supports any reset mechanism.
Remove the use of reset_fn in favor of the reset_methods array that tracks
supported reset mechanisms of a device and their ordering.
The octeon driver incorrectly used reset_fn to detect whether the device
supports FLR or not. Use pcie_reset_flr() to probe whether it supports FLR.
Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20210817180500.1253-5-ameynarkhede03@gmail.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
There are two users of iomem_get_mapping(), the struct file and struct
bin_attribute. The former has a member called "f_mapping" and the
latter has a member called "mapping", and both are poniters to struct
address_space.
Rename struct bin_attribute member to "f_mapping" to keep both meaning
and the usage consistent with other users of iomem_get_mapping().
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20210729233235.1508920-3-kw@linux.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Defer invocation of the iomem_get_mapping() to the sysfs open callback
so that it can be executed as needed when the binary sysfs object has
been accessed.
To do that, convert the "mapping" member of the struct bin_attribute
from a pointer to the struct address_space into a function pointer with
a signature that requires the same return type, and then updates the
sysfs_kf_bin_open() to invoke provided function should the function
pointer be valid.
Also, convert every invocation of iomem_get_mapping() into a function
pointer assignment, therefore allowing for the iomem_get_mapping()
invocation to be deferred to when the sysfs open callback runs.
Thus, this change removes the need for the fs_initcalls to complete
before any other sub-system that uses the iomem_get_mapping() would be
able to invoke it safely without leading to a failure and an Oops
related to an invalid iomem_get_mapping() access.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20210729233235.1508920-2-kw@linux.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously, when the value of the "devspec" sysfs attribute was read from
the user space there was no newline present, and utilities such as "cat"
wouldn't display the result of the read correctly.
Append a newline character in the show() function to match other "devspec"
attributes.
Link: https://lore.kernel.org/r/20210603000112.703037-5-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
The sysfs_emit() and sysfs_emit_at() functions were introduced to make it
less ambiguous which function is preferred when writing to the output
buffer in a device attribute's "show" callback [1].
Convert the PCI sysfs object "show" functions from sprintf(), snprintf()
and scnprintf() to sysfs_emit() and sysfs_emit_at() accordingly, as the
latter is aware of the PAGE_SIZE buffer and correctly returns the number of
bytes written into the buffer.
No functional change intended.
[1] Documentation/filesystems/sysfs.rst
[bhelgaas: drop dsm_label_utf16s_to_utf8s(), link speed/width changes]
Link: https://lore.kernel.org/r/20210416205856.3234481-10-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The "label", "index", and "acpi_index" sysfs attributes show firmware label
information about the device. If the ACPI Device Name _DSM is implemented
for the device, we have:
label Device name (optional, may be null)
acpi_index Instance number (unique under \_SB scope)
When there is no ACPI _DSM and SMBIOS provides an Onboard Devices structure
for the device, we have:
label Reference Designation, e.g., a silkscreen label
index Device Type Instance
Previously these attributes were dynamically created either by
pci_bus_add_device() or the pci_sysfs_init() initcall, but since they don't
need to be created or removed dynamically, we can use a static attribute so
the device model takes care of addition and removal automatically.
Convert "label", "index", and "acpi_index" to static attributes.
Presence of the ACPI _DSM (device_has_acpi_name()) determines whether the
ACPI information (label, acpi_index) or the SMBIOS information (label,
index) is visible.
[bhelgaas: commit log, split to separate patch, add "pci_dev_" prefix]
Suggested-by: Oliver O'Halloran <oohall@gmail.com>
Link: https://lore.kernel.org/r/20210416205856.3234481-6-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The "vpd" sysfs attribute allows access to Vital Product Data (VPD).
Previously it was dynamically created either by pci_bus_add_device() or the
pci_sysfs_init() initcall, but since it doesn't need to be created or
removed dynamically, we can use a static attribute so the device model
takes care of addition and removal automatically.
Convert "vpd" to a static attribute and use the .is_bin_visible() callback
to check whether the device supports VPD.
Remove pcie_vpd_create_sysfs_dev_files(),
pcie_vpd_remove_sysfs_dev_files(), pci_create_capabilities_sysfs(), and
pci_create_capabilities_sysfs(), which are no longer needed.
[bhelgaas: This is substantially the same as the earlier patch from Heiner
Kallweit <hkallweit1@gmail.com>. I included Krzysztof's change here so all
the "convert to static attribute" changes are together.]
[bhelgaas: rename to vpd_read()/vpd_write() and pci_dev_vpd_attr_group]
Suggested-by: Oliver O'Halloran <oohall@gmail.com>
Based-on: https://lore.kernel.org/r/7703024f-8882-9eec-a122-599871728a89@gmail.com
Based-on-patch-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20210416205856.3234481-5-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The "reset" sysfs attribute allows for resetting a PCI function.
Previously it was dynamically created either by pci_bus_add_device() or
the pci_sysfs_init() initcall, but since it doesn't need to be created or
removed dynamically, we can use a static attribute so the device model
takes care of addition and removal automatically.
Convert "reset" to a static attribute and use the .is_visible() callback to
check whether the device supports reset.
Clear reset_fn in pci_stop_dev() instead of pci_remove_capabilities_sysfs()
since we no longer explicitly remove the "reset" sysfs file.
[bhelgaas: commit log]
Suggested-by: Oliver O'Halloran <oohall@gmail.com>
Link: https://lore.kernel.org/r/20210416205856.3234481-4-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The "rom" sysfs attribute allows access to the PCI Option ROM. Previously
it was dynamically created either by pci_bus_add_device() or the
pci_sysfs_init() initcall, but since it doesn't need to be created or
removed dynamically, we can use a static attribute so the device model
takes care of addition and removal automatically.
Convert "rom" to a static attribute and use the .is_bin_visible() callback
to set the correct object size based on the ROM size.
Remove "rom_attr" from the struct pci_dev since it is no longer needed.
This attribute was added in the pre-git era by https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/drivers/pci/pci-sysfs.c?id=f6d553444da2
[bhelgaas: commit log]
Suggested-by: Oliver O'Halloran <oohall@gmail.com>
Link: https://lore.kernel.org/r/20210416205856.3234481-3-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The "config" sysfs attribute allows access to either the legacy (PCI and
PCI-X Mode 1) or the extended (PCI-X Mode 2 and PCIe) device configuration
space. Previously it was dynamically created either when a device was
added (for hot-added devices) or via a late_initcall (for devices present
at boot):
pci_bus_add_devices
pci_bus_add_device
pci_create_sysfs_dev_files
if (!sysfs_initialized)
return
sysfs_create_bin_file # for hot-added devices
pci_sysfs_init # late_initcall
sysfs_initialized = 1
for_each_pci_dev(pdev)
pci_create_sysfs_dev_files(pdev) # for devices present at boot
And dynamically removed when the PCI device is stopped and removed:
pci_stop_bus_device
pci_stop_dev
pci_remove_sysfs_dev_files
sysfs_remove_bin_file
This attribute does not need to be created or removed dynamically, so we
can use a static attribute so the device model takes care of addition and
removal automatically.
Convert "config" to a static attribute and use the .is_bin_visible()
callback to set the correct object size (either 256 bytes or 4 KiB) at
runtime.
The pci_sysfs_init() scheme was added in the pre-git era by
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/drivers/pci/pci-sysfs.c?id=f6d553444da2
[bhelgaas: commit log]
Suggested-by: Oliver O'Halloran <oohall@gmail.com>
Link: https://lore.kernel.org/r/CAOSf1CHss03DBSDO4PmTtMp0tCEu5kScn704ZEwLKGXQzBfqaA@mail.gmail.com
Link: https://lore.kernel.org/r/20210416205856.3234481-2-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
A typical cloud provider SR-IOV use case is to create many VFs for use by
guest VMs. The VFs may not be assigned to a VM until a customer requests a
VM of a certain size, e.g., number of CPUs. A VF may need MSI-X vectors
proportional to the number of CPUs in the VM, but there is no standard way
to change the number of MSI-X vectors supported by a VF.
Some Mellanox ConnectX devices support dynamic assignment of MSI-X vectors
to SR-IOV VFs. This can be done by the PF driver after VFs are enabled,
and it can be done without affecting VFs that are already in use. The
hardware supports a limited pool of MSI-X vectors that can be assigned to
the PF or to individual VFs. This is device-specific behavior that
requires support in the PF driver.
Add a read-only "sriov_vf_total_msix" sysfs file for the PF and a writable
"sriov_vf_msix_count" file for each VF. Management software may use these
to learn how many MSI-X vectors are available and to dynamically assign
them to VFs before the VFs are passed through to a VM.
If the PF driver implements the ->sriov_get_vf_total_msix() callback,
"sriov_vf_total_msix" contains the total number of MSI-X vectors available
for distribution among VFs.
If no driver is bound to the VF, writing "N" to "sriov_vf_msix_count" uses
the PF driver ->sriov_set_msix_vec_count() callback to assign "N" MSI-X
vectors to the VF. When a VF driver subsequently reads the MSI-X Message
Control register, it will see the new Table Size "N".
Link: https://lore.kernel.org/linux-pci/20210314124256.70253-2-leon@kernel.org
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Since 3234ac664a ("/dev/mem: Revoke mappings when a driver claims
the region") /dev/kmem zaps PTEs when the kernel requests exclusive
acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
the default for all driver uses.
Except there are two more ways to access PCI BARs: sysfs and proc mmap
support. Let's plug that hole.
For revoke_devmem() to work we need to link our vma into the same
address_space, with consistent vma->vm_pgoff. ->pgoff is already
adjusted, because that's how (io_)remap_pfn_range works, but for the
mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is
to adjust this at at ->open time:
- for sysfs this is easy, now that binary attributes support this. We
just set bin_attr->mapping when mmap is supported
- for procfs it's a bit more tricky, since procfs PCI access has only
one file per device, and access to a specific resource first needs
to be set up with some ioctl calls. But mmap is only supported for
the same resources as sysfs exposes with mmap support, and otherwise
rejected, so we can set the mapping unconditionally at open time
without harm.
A special consideration is for arch_can_pci_mmap_io() - we need to
make sure that the ->f_mapping doesn't alias between ioport and iomem
space. There are only 2 ways in-tree to support mmap of ioports: generic
PCI mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single
architecture hand-rolling. Both approaches support ioport mmap through a
special PFN range and not through magic PTE attributes. Aliasing is
therefore not a problem.
The only difference in access checks left is that sysfs PCI mmap does
not check for CAP_RAWIO. I'm not really sure whether that should be
added or not.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20210204165831.2703772-3-daniel.vetter@ffwll.ch
We are already doing this for all the regular sysfs files on PCI
devices, but not yet on the legacy io files on the PCI buses. Thus far
no problem, but in the next patch I want to wire up iomem revoke
support. That needs the vfs up and running already to make sure that
iomem_get_mapping() works.
Wire it up exactly like the existing code in
pci_create_sysfs_dev_files(). Note that pci_remove_legacy_files()
doesn't need a check since the one for pci_bus->legacy_io is
sufficient.
An alternative solution would be to implement a callback in sysfs to
set up the address space from iomem_get_mapping() when userspace calls
mmap(). This also works, but Greg didn't really like that just to work
around an ordering issue when the kernel loads initially.
v2: Improve commit message (Bjorn)
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20210205133632.2827730-1-daniel.vetter@ffwll.ch
While PCI power states D0-D3hot can be queried from user-space via lspci,
D3cold cannot. lspci cannot provide an accurate value when the device is
in D3cold as it has to restore the device to D0 before it can access its
power state via the configuration space, leading to it reporting D0 or
another on-state. Thus lspci cannot be used to diagnose power consumption
issues for devices that can enter D3cold or to ensure that devices properly
enter D3cold at all.
Add a new sysfs device attribute for the PCI power state, showing the
current power state as seen by the kernel.
[bhelgaas: drop READ_ONCE(), see discussion at the link]
Link: https://lore.kernel.org/r/20201102141520.831630-1-luzmaximilian@gmail.com
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
- Remove unnecessary #includes (Gustavo Pimentel)
- Fix intel_mid_pci.c build error when !CONFIG_ACPI (Randy Dunlap)
- Use scnprintf(), not snprintf(), in sysfs "show" functions (Krzysztof
Wilczyński)
- Simplify pci-pf-stub by using module_pci_driver() (Liu Shixin)
- Print IRQ used by Link Bandwidth Notification (Dongdong Liu)
- Update sysfs mmap-related #ifdef comments (Clint Sbisa)
- Simplify pci_dev_reset_slot_function() (Lukas Wunner)
- Use "NULL" instead of "0" to fix sparse warnings (Gustavo Pimentel)
- Simplify bool comparisons (Krzysztof Wilczyński)
- Drop double zeroing for P2PDMA sg_init_table() (Julia Lawall)
* pci/misc:
PCI: v3-semi: Remove unneeded break
PCI/P2PDMA: Drop double zeroing for sg_init_table()
PCI: Simplify bool comparisons
PCI: endpoint: Use "NULL" instead of "0" as a NULL pointer
PCI: Simplify pci_dev_reset_slot_function()
PCI: Update mmap-related #ifdef comments
PCI/LINK: Print IRQ number used by port
PCI/IOV: Simplify pci-pf-stub with module_pci_driver()
PCI: Use scnprintf(), not snprintf(), in sysfs "show" functions
x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled
PCI: Remove unnecessary header includes
f719582435 ("PCI: Add pci_mmap_resource_range() and use it for ARM64")
changed the #ifdef condition around pci_create_resource_files(),
pci_remove_resource_files(), and related functions, but did not update
comments at the #else and #ifdef.
Update the comments to match the #ifdef.
[bhelgaas: commit log, drop #endif comment since it's close to the #else]
Link: https://lore.kernel.org/r/20200821155121.nzxjeeoze4h5pone@amazon.com
Signed-off-by: Clint Sbisa <csbisa@amazon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The PCI sysfs "config" file allows large reads, and the resulting PCI
config reads can take several milliseconds to complete. Testing with the
cyclictest [1] benchmark showed 5ms+ latencies.
Add a schedule point in pci_read_config() to reduce the maximum latency.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git/
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20200824052025.48362-1-benbjiang@tencent.com
Reported-by: Bin Lai <robinlai@tencent.com>
Signed-off-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
- Move _HPX type array from stack to static data (Colin Ian King)
- Avoid an ASMedia XHCI USB PME# defect; apparently it doesn't assert
PME# when USB3.0 devices are hotplugged in D0 (Kai-Heng Feng)
- Revert sysfs "rescan" file renames that broke an application (Kelsey
Skunberg)
* pci/misc:
PCI: sysfs: Revert "rescan" file renames
PCI: Avoid ASMedia XHCI USB PME# from D0 defect
PCI/ACPI: Move pcie_to_hpx3_type[] from stack to static data
We changed these sysfs filenames:
.../pci_bus/<domain:bus>/rescan -> .../pci_bus/<domain:bus>/bus_rescan
.../<domain🚌dev.fn>/rescan -> .../<domain🚌dev.fn>/dev_rescan
and Ruslan reported [1] that this broke a userspace application.
Revert these name changes so both files are named "rescan" again.
Note that we have to use __ATTR() to assign custom C symbols, i.e.,
"struct device_attribute <symbol>".
[1] https://lore.kernel.org/r/CAB=otbSYozS-ZfxB0nCiNnxcbqxwrHOSYxJJtDKa63KzXbXgpw@mail.gmail.com
[bhelgaas: commit log, use __ATTR() both places so we don't have to rename
the attributes]
Fixes: 8bdfa145f5 ("PCI: sysfs: Define device attributes with DEVICE_ATTR*()")
Fixes: 4e2b79436e ("PCI: sysfs: Change DEVICE_ATTR() to DEVICE_ATTR_WO()")
Link: https://lore.kernel.org/r/20200325151708.32612-1-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <kelsey.skunberg@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org # v5.4+
Previously some PCI speed strings came from pci_speed_string(), some came
from the PCIe-specific PCIE_SPEED2STR(), and some came from a PCIe-specific
switch statement. These methods were inconsistent:
pci_speed_string() PCIE_SPEED2STR() switch
------------------ ---------------- ------
33 MHz PCI
...
2.5 GT/s PCIe 2.5 GT/s 2.5 GT/s
5.0 GT/s PCIe 5 GT/s 5 GT/s
8.0 GT/s PCIe 8 GT/s 8 GT/s
16.0 GT/s PCIe 16 GT/s 16 GT/s
32.0 GT/s PCIe 32 GT/s 32 GT/s
Standardize on pci_speed_string() as the single source of these strings.
Note that this adds ".0" and "PCIe" to some messages, including sysfs
"max_link_speed" files, a brcmstb "link up" message, and the link status
dmesg logging, e.g.,
nvme 0000:01:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:01.1 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
I think it's better to standardize on a single version of the speed text.
Previously we had strings like this:
/sys/bus/pci/slots/0/cur_bus_speed: 8.0 GT/s PCIe
/sys/bus/pci/slots/0/max_bus_speed: 8.0 GT/s PCIe
/sys/devices/pci0000:00/0000:00:1c.0/current_link_speed: 8 GT/s
/sys/devices/pci0000:00/0000:00:1c.0/max_link_speed: 8 GT/s
This changes the latter two to match the slots files:
/sys/devices/pci0000:00/0000:00:1c.0/current_link_speed: 8.0 GT/s PCIe
/sys/devices/pci0000:00/0000:00:1c.0/max_link_speed: 8.0 GT/s PCIe
Based-on-patch by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>