Move the refcount manipulation of the MMU context to the point where the
hardware state is programmed. At that point it is also known if a previous
MMU state is still there, or the state needs to be reprogrammed with a
potentially different context.
Cc: stable@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Michael Walle <michael@walle.cc>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
After a reset the GPU is no longer using the MMU context and may be
restarted with a different context. While the mmu_state proeprly was
cleared, the context wasn't unreferenced, leading to a memory leak.
Cc: stable@vger.kernel.org # 5.4
Reported-by: Michael Walle <michael@walle.cc>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Michael Walle <michael@walle.cc>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
When the GPU is reset both the current exec state, as well as all MMU
state is lost. Move the driver side state tracking into the reset function
to keep hardware and software state from diverging.
Cc: stable@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Michael Walle <michael@walle.cc>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The MMU state may be kept across a runtime suspend/resume cycle, as we
avoid a full hardware reset to keep the latency of the runtime PM small.
Don't pretend that the MMU state is lost in driver state. The MMU
context is pushed out when new HW jobs with a different context are
coming in. The only exception to this is when the GPU is unbound, in
which case we need to make sure to also free the last active context.
Cc: stable@vger.kernel.org # 5.4
Reported-by: Michael Walle <michael@walle.cc>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Michael Walle <michael@walle.cc>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
While the DMA frontend can only be active when the MMU context is set, the
reverse isn't necessarily true, as the frontend can be stopped while the
MMU state is kept. Stop treating mmu_context being set as a indication that
the frontend is running and instead add a explicit property.
Cc: stable@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Michael Walle <michael@walle.cc>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The prev context is the MMU context at the time of the job
queueing in hardware. As a job might be queued multiple times
due to recovery after a GPU hang, we need to make sure to put
the stale prev MMU context from a prior queuing, to avoid the
reference and thus the MMU context leaking.
Cc: stable@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Michael Walle <michael@walle.cc>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Being able to have the refcount manipulation in an assignment makes
it much easier to parse the code.
Cc: stable@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Michael Walle <michael@walle.cc>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Drivers:
- cmos: fix a locking issue
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEBqsFVZXh8s/0O5JiY6TcMGxwOjIFAmFBHF4ACgkQY6TcMGxw
OjJz5w/+NCnCVV1LU+N+SRke7/9rnYBQThtED00z+vIZloT+ELz0joO+Zk+NFIp0
H0CQN8Hj03nqaeurYzPunhuqYGEpih8twI3UdrCvcJ0v4Nl9/Xz4x/OLq/uiBGdT
zH15hsle7TMccrcgjAVonzPkVdhaKOgbfvybcAkvUX7rJaHTn8D0T9Ne/3+hEn4f
KfTMghAa5WeZaD4hQV9IdxG7wXy3Q3i4hEOfhBiAXz4x80WVhs7htLdH8VFlD1kc
aZ7jlnT6rna7i5SJUX+DhbDx2q+mE443UPmIu8ow8KS7zvhqoDCnMB8lnOe4diOX
djFxrD9Ql1hvOH5G8VGLbQJb8R7HsKaG7fRW/cG5eSYqspBgd5d6TooiVhI90EjX
sBIZ4Lq7PopAEH1NDh3ONOSND+D6Q2pN9xrN/yicheQVNUZL9nrXKSWXVZePETHn
4pofNnsjCWzDcmQFxrjWury9vs3WR5FEyLxHrbt+EcN6z2hZlmjxpWJAsvFRA2jC
mu9GtvgTRLM1Vhz3dRmS5nwl+2yZ4oBLcrRZ8Q+6JZa9qhmfATawUZJXroihpvZM
SZavK8OwXiztWuW+lunsRGdxSDW/QK9E5Bydc/gRcfTDRPzWqqsnDIMOXptWSJ1L
VS58F4/YrPnbdoH8wYWhZT0ZnB7P3ca9oEqxQp0afFcRjHSqkLk=
=sw2n
-----END PGP SIGNATURE-----
Merge tag 'rtc-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC fix from Alexandre Belloni:
"Fix a locking issue in the cmos rtc driver"
* tag 'rtc-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: cmos: Disable irq around direct invocation of cmos_interrupt()
Sometimes when unbinding the mv88e6xxx driver on Turris MOX, these error
messages appear:
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 1 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 0 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 100 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 1 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 0 from fdb: -2
(and similarly for other ports)
What happens is that DSA has a policy "even if there are bugs, let's at
least not leak memory" and dsa_port_teardown() clears the dp->fdbs and
dp->mdbs lists, which are supposed to be empty.
But deleting that cleanup code, the warnings go away.
=> the FDB and MDB lists (used for refcounting on shared ports, aka CPU
and DSA ports) will eventually be empty, but are not empty by the time
we tear down those ports. Aka we are deleting them too soon.
The addresses that DSA complains about are host-trapped addresses: the
local addresses of the ports, and the MAC address of the bridge device.
The problem is that offloading those entries happens from a deferred
work item scheduled by the SWITCHDEV_FDB_DEL_TO_DEVICE handler, and this
races with the teardown of the CPU and DSA ports where the refcounting
is kept.
In fact, not only it races, but fundamentally speaking, if we iterate
through the port list linearly, we might end up tearing down the shared
ports even before we delete a DSA user port which has a bridge upper.
So as it turns out, we need to first tear down the user ports (and the
unused ones, for no better place of doing that), then the shared ports
(the CPU and DSA ports). In between, we need to ensure that all work
items scheduled by our switchdev handlers (which only run for user
ports, hence the reason why we tear them down first) have finished.
Fixes: 161ca59d39 ("net: dsa: reference count the MDB entries at the cross-chip notifier level")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210914134726.2305133-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 3ac8eed625, which did
more than it said on the box, and not only it replaced to_phy_driver
with phydev->drv, but it also removed the "!drv" check, without actually
explaining why that is fine.
That patch in fact breaks suspend/resume on any system which has PHY
devices with no drivers bound.
The stack trace is:
Unable to handle kernel NULL pointer dereference at virtual address 00000000000000e8
pc : mdio_bus_phy_suspend+0xd8/0xec
lr : dpm_run_callback+0x38/0x90
Call trace:
mdio_bus_phy_suspend+0xd8/0xec
dpm_run_callback+0x38/0x90
__device_suspend+0x108/0x3cc
dpm_suspend+0x140/0x210
dpm_suspend_start+0x7c/0xa0
suspend_devices_and_enter+0x13c/0x540
pm_suspend+0x2a4/0x330
Examples why that assumption is not fine:
- There is an MDIO bus with a PHY device that doesn't have a specific
PHY driver loaded, because mdiobus_register() automatically creates a
PHY device for it but there is no specific PHY driver in the system.
Normally under those circumstances, the generic PHY driver will be
bound lazily to it (at phy_attach_direct time). But some Ethernet
drivers attach to their PHY at .ndo_open time. Until then it, the
to-be-driven-by-genphy PHY device will not have a driver. The blamed
patch amounts to saying "you need to open all net devices before the
system can suspend, to avoid the NULL pointer dereference".
- There is any raw MDIO device which has 'plausible' values in the PHY
ID registers 2 and 3, which is located on an MDIO bus whose driver
does not set bus->phy_mask = ~0 (which prevents auto-scanning of PHY
devices). An example could be a MAC's internal MDIO bus with PCS
devices on it, for serial links such as SGMII. PHY devices will get
created for those PCSes too, due to that MDIO bus auto-scanning, and
although those PHY devices are not used, they do not bother anybody
either. PCS devices are usually managed in Linux as raw MDIO devices.
Nonetheless, they do not have a PHY driver, nor does anybody attempt
to connect to them (because they are not a PHY), and therefore this
patch breaks that.
The goal itself of the patch is questionable, so I am going for a
straight revert. to_phy_driver does not seem to have a need to be
replaced by phydev->drv, in fact that might even trigger code paths
which were not given too deep of a thought.
For instance:
phy_probe populates phydev->drv at the beginning, but does not clean it
up on any error (including EPROBE_DEFER). So if the phydev driver
requests probe deferral, phydev->drv will remain populated despite there
being no driver bound.
If a system suspend starts in between the initial probe deferral request
and the subsequent probe retry, we will be calling the phydev->drv->suspend
method, but _before_ any phydev->drv->probe call has succeeded.
That is to say, if the phydev->drv is allocating any driver-private data
structure in ->probe, it pretty much expects that data structure to be
available in ->suspend. But it may not. That is a pretty insane
environment to present to PHY drivers.
In the code structure before the blamed patch, mdio_bus_phy_may_suspend
would just say "no, don't suspend" to any PHY device which does not have
a driver pointer _in_the_device_structure_ (not the phydev->drv). That
would essentially ensure that ->suspend will never get called for a
device that has not yet successfully completed probe. This is the code
structure the patch is returning to, via the revert.
Fixes: 3ac8eed625 ("net: phy: Uniform PHY driver access")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210914140515.2311548-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DSA supports connecting to a phy-handle, and has a fallback to a non-OF
based method of connecting to an internal PHY on the switch's own MDIO
bus, if no phy-handle and no fixed-link nodes were present.
The -ENODEV error code from the first attempt (phylink_of_phy_connect)
is what triggers the second attempt (phylink_connect_phy).
However, when the first attempt returns a different error code than
-ENODEV, this results in an unbalance of calls to phylink_create and
phylink_destroy by the time we exit the function. The phylink instance
has leaked.
There are many other error codes that can be returned by
phylink_of_phy_connect. For example, phylink_validate returns -EINVAL.
So this is a practical issue too.
Fixes: aab9c4067d ("net: dsa: Plug in PHYLINK support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20210914134331.2303380-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change my email to my unaffiliated address and move me to reviewer status.
Link: https://lore.kernel.org/r/20210909200221.29981-1-jonathan.derrick@intel.com
Signed-off-by: Jon Derrick <jonathan.derrick@linux.dev>
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Nirmal Patel <nirmal.patel@linux.intel.com>
Some AMD GPUs have built-in USB xHCI and USB Type-C UCSI controllers with
power dependencies between the GPU and the other functions as in
6d2e369f0d ("PCI: Add NVIDIA GPU multi-function power dependencies").
Add device link support for the AMD integrated USB xHCI and USB Type-C UCSI
controllers.
Without this, runtime power management, including GPU resume and temp and
fan sensors don't work correctly.
Reported-at: https://gitlab.freedesktop.org/drm/amd/-/issues/1704
Link: https://lore.kernel.org/r/20210903063311.3606226-1-evan.quan@amd.com
Signed-off-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Commit 375553a932 ("PCI: Setup ACPI fwnode early and at the same time
with OF") added a call to pci_set_acpi_fwnode() in pci_setup_device(),
which unconditionally clears any fwnode previously set by
pci_set_of_node().
pci_set_acpi_fwnode() looks for ACPI_COMPANION(), which only returns the
existing fwnode if it was set by ACPI_COMPANION_SET(). If it was set by
OF instead, ACPI_COMPANION() returns NULL and pci_set_acpi_fwnode()
accidentally clears the fwnode. To fix this, look for any fwnode instead
of just ACPI companions.
Fixes a virtio-iommu boot regression in v5.15-rc1.
Fixes: 375553a932 ("PCI: Setup ACPI fwnode early and at the same time with OF")
Link: https://lore.kernel.org/r/20210913172358.1775381-1-jean-philippe@linaro.org
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
7bac54497c ("PCI/VPD: Determine VPD size in pci_vpd_init()") reads VPD at
enumeration-time to find the size. But this is quite slow, and we don't
need the size until we actually need data from VPD. Dave reported a boot
slowdown of more than two minutes [1].
Defer the VPD sizing until a driver or the user (via sysfs) requests
information from VPD.
If devices are quirked because VPD is known not to work, don't bother even
looking for the VPD capability. The VPD will not be accessible at all.
[1] https://lore.kernel.org/r/20210913141818.GA27911@codemonkey.org.uk/
Link: https://lore.kernel.org/r/20210914215543.GA1437800@bjorn-Precision-5520
Fixes: 7bac54497c ("PCI/VPD: Determine VPD size in pci_vpd_init()")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The qnx4 directory entries are 64-byte blocks that have different
contents depending on the a status byte that is in the last byte of the
block.
In particular, a directory entry can be either a "link info" entry with
a 48-byte name and pointers to the real inode information, or an "inode
entry" with a smaller 16-byte name and the full inode information.
But the code was written to always just treat the directory name as if
it was part of that "inode entry", and just extend the name to the
longer case if the status byte said it was a link entry.
That work just fine and gives the right results, but now that gcc is
tracking data structure accesses much more, the code can trigger a
compiler error about using up to 48 bytes (the long name) in a structure
that only has that shorter name in it:
fs/qnx4/dir.c: In function ‘qnx4_readdir’:
fs/qnx4/dir.c:51:32: error: ‘strnlen’ specified bound 48 exceeds source size 16 [-Werror=stringop-overread]
51 | size = strnlen(de->di_fname, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from fs/qnx4/qnx4.h:3,
from fs/qnx4/dir.c:16:
include/uapi/linux/qnx4_fs.h:45:25: note: source object declared here
45 | char di_fname[QNX4_SHORT_NAME_MAX];
| ^~~~~~~~
which is because the source code doesn't really make this whole "one of
two different types" explicit.
Fix this by introducing a very explicit union of the two types, and
basically explaining to the compiler what is really going on.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The sparc mdesc code does pointer games with 'struct mdesc_hdr', but
didn't describe to the compiler how that header is then followed by the
data that the header describes.
As a result, gcc is now unhappy since it does stricter pointer range
tracking, and doesn't understand about how these things work. This
results in various errors like:
arch/sparc/kernel/mdesc.c: In function ‘mdesc_node_by_name’:
arch/sparc/kernel/mdesc.c:647:22: error: ‘strcmp’ reading 1 or more bytes from a region of size 0 [-Werror=stringop-overread]
647 | if (!strcmp(names + ep[ret].name_offset, name))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
which are easily avoided by just describing 'struct mdesc_hdr' better,
and making the node_block() helper function look into that unsized
data[] that follows the header.
This makes the sparc64 build happy again at least for my cross-compiler
version (gcc version 11.2.1).
Link: https://lore.kernel.org/lkml/CAHk-=wi4NW3NC0xWykkw=6LnjQD6D_rtRtxY9g8gQAJXtQMi8A@mail.gmail.com/
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge absolute_pointer macro series from Guenter Roeck:
"Kernel test builds currently fail for several architectures with error
messages such as the following.
drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
arch/m68k/include/asm/string.h:72:25: error:
'__builtin_memcpy' reading 6 bytes from a region of size 0
[-Werror=stringop-overread]
Such warnings may be reported by gcc 11.x for string and memory
operations on fixed addresses if gcc's builtin functions are used for
those operations.
This series introduces absolute_pointer() to fix the problem.
absolute_pointer() disassociates a pointer from its originating symbol
type and context, and thus prevents gcc from making assumptions about
pointers passed to memory operations"
* emailed patches from Guenter Roeck <linux@roeck-us.net>:
alpha: Use absolute_pointer to define COMMAND_LINE
alpha: Move setup.h out of uapi
net: i825xx: Use absolute_pointer for memcpy from fixed memory location
compiler.h: Introduce absolute_pointer macro
alpha:allmodconfig fails to build with the following error
when using gcc 11.x.
arch/alpha/kernel/setup.c: In function 'setup_arch':
arch/alpha/kernel/setup.c:493:13: error:
'strcmp' reading 1 or more bytes from a region of size 0
Avoid the problem by declaring COMMAND_LINE as absolute_pointer().
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most of the contents of setup.h have no value for userspace
applications. The file was probably moved to uapi accidentally.
Keep the file in uapi to define the alpha-specific COMMAND_LINE_SIZE.
Move all other defines to arch/alpha/include/asm/setup.h.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc 11.x reports the following compiler warning/error.
drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
arch/m68k/include/asm/string.h:72:25: error:
'__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]
Use absolute_pointer() to work around the problem.
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
absolute_pointer() disassociates a pointer from its originating symbol
type and context. Use it to prevent compiler warnings/errors such as
drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
arch/m68k/include/asm/string.h:72:25: error:
'__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]
Such warnings may be reported by gcc 11.x for string and memory
operations on fixed addresses.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The lib/bootconfig.c file is shared with the 'bootconfig' tooling, and
as a result, the changes incommit 77e02cf57b ("memblock: introduce
saner 'memblock_free_ptr()' interface") need to also be reflected in the
tooling header file.
So define the new memblock_free_ptr() wrapper, and remove unused __pa()
and memblock_free().
Fixes: 77e02cf57b ("memblock: introduce saner 'memblock_free_ptr()' interface")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Readers of rwbase can lock and unlock without taking any inner lock, if
that happens, we need the ordering provided by atomic operations to
satisfy the ordering semantics of lock/unlock. Without that, considering
the follow case:
{ X = 0 initially }
CPU 0 CPU 1
===== =====
rt_write_lock();
X = 1
rt_write_unlock():
atomic_add(READER_BIAS - WRITER_BIAS, ->readers);
// ->readers is READER_BIAS.
rt_read_lock():
if ((r = atomic_read(->readers)) < 0) // True
atomic_try_cmpxchg(->readers, r, r + 1); // succeed.
<acquire the read lock via fast path>
r1 = X; // r1 may be 0, because nothing prevent the reordering
// of "X=1" and atomic_add() on CPU 1.
Therefore audit every usage of atomic operations that may happen in a
fast path, and add necessary barriers.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20210909110203.953991276@infradead.org
The code in rwbase_write_lock() is a little non-obvious vs the
read+set 'trylock', extract the sequence into a helper function to
clarify the code.
This also provides a single site to fix fast-path ordering.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/YUCq3L+u44NDieEJ@hirez.programming.kicks-ass.net
Noticed while looking at the readers race.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20210909110203.828203010@infradead.org
In perf_event_addr_filters_apply, the task associated with
the event (event->ctx->task) is read using READ_ONCE at the beginning
of the function, checked, and then re-read from event->ctx->task,
voiding all guarantees of the checks. Reuse the value that was read by
READ_ONCE to ensure the consistency of the task struct throughout the
function.
Fixes: 375637bc52 ("perf/core: Introduce address range filtering")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210906015310.12802-1-baptiste.lepers@gmail.com
This reverts commit 2112ff5ce0.
We no longer need to track the truncation count, the one user that did
need it has been converted to using iov_iter_restore() instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Get rid of the need to do re-expand and revert on an iterator when we
encounter a short IO, or failure that warrants a retry. Use the new
state save/restore helpers instead.
We keep the iov_iter_state persistent across retries, if we need to
restart the read or write operation. If there's a pending retry, the
operation will always exit with the state correctly saved.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- fix ANA state updates when a namespace is not present (Anton Eidelman)
- nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show
(Dan Carpenter)
- avoid race in shutdown namespace removal (Daniel Wagner)
- fix io_work priority inversion in nvme-tcp (Keith Busch)
- destroy cm id before destroy qp to avoid use after free (Ruozhu Li)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmFB+aQLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYOfPg//fREqsvY1eMkexP/fOyrpVpLH1lLLM2SHUYwyQOtg
4CVxl2KU8YGyGLGpAMO6ToctrNV83oUQUQlgy7PyQQpUzr6m4sJm7Lbxzvqhyh96
MKeHRcWo173twiCLICx4GADSmlwofmwUSzBXdhEtt1gNmg26T+nnDewHpSqLY5Ax
BNAtFkAGTZodupVKUNsI+tqAyXY3Bgk069YqDiZfBCqU62JzLj6QX0aVXcpS2kcv
bVpJ0kbnuoUkHG/eIrLZeiyvikxQWx1KaLaL3Y24wrdkVWmZJpCYbg/gXq5v9GUQ
U8yq/iuOSTtfE0+7hNCyNz8tNLfzxejd0jDCfQcPwXOuzIfNkdzdqhkhste2HYB5
uriDj1eTQq1siH6kKu8oYQBMSxMSw9wM5q71d5QX2XJzw3hl9bqw5GqiT8Gjk1pX
K759Tj+IMFNa1SqUGI5IBfkHz4vvuwOr+pO60fwd3CzajOI8mNnNAuJ+1I2UJaEP
cgq+coD950BsWCyhL+XO/NhYPUSugI331JQqlIkVPPYxqBCJK22c7rZusM0Sm4Gq
aweWKIxtI9cbsrlf6WfJdyHf5clZ9AjMmWEzHFbDeGqUhb3pTFoT5H/UqSw9/6xe
pi7iKS6lBoZAoWKfyt9bOKfmaDlcrylAxPQ/LfyJGSTy5Buj5bcc4FO5rRJ+BIay
t88=
=wad8
-----END PGP SIGNATURE-----
Merge tag 'nvme-5.15-2021-09-15' of git://git.infradead.org/nvme into block-5.15
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.15
- fix ANA state updates when a namespace is not present (Anton Eidelman)
- nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show
(Dan Carpenter)
- avoid race in shutdown namespace removal (Daniel Wagner)
- fix io_work priority inversion in nvme-tcp (Keith Busch)
- destroy cm id before destroy qp to avoid use after free (Ruozhu Li)"
* tag 'nvme-5.15-2021-09-15' of git://git.infradead.org/nvme:
nvme-tcp: fix io_work priority inversion
nvme-rdma: destroy cm id before destroy qp to avoid use after free
nvme-multipath: fix ANA state updates when a namespace is not present
nvme: avoid race in shutdown namespace removal
nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show()
This reverts commit cf4b94c853.
Some PHYs pointed to by "phy-handle" will never bind to a driver until a
consumer attaches to it. And when the consumer attaches to it, they get
forcefully bound to a generic PHY driver. In such cases, parsing the
phy-handle property and creating a device link will prevent the consumer
from ever probing. We don't want that. So revert support for
"phy-handle" property until we come up with a better mechanism for
binding PHYs to generic drivers before a consumer tries to attach to it.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210915081933.485112-1-saravanak@google.com
Signed-off-by: Rob Herring <robh@kernel.org>
s390 is the only architecture which allows to set the
-mwarn-dynamicstack compile option. This however will also always
generate a warning with system call stack randomization, which uses
alloca to generate some random sized stack frame.
On the other hand Linus just enabled "-Werror" by default with commit
3fe617ccaf ("Enable '-Werror' by default for all kernel builds"),
which means compiles will always fail by default.
So instead of playing once again whack-a-mole for something which is
s390 specific, simply remove this option.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Get rid of warnings like:
drivers/s390/crypto/ap_bus.c:216: warning:
bad line:
drivers/s390/crypto/ap_bus.c:444:
warning: Function parameter or member 'floating' not described in 'ap_interrupt_handler'
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Prevent out-of-range access if the returned SCLP SCCB response is smaller
in size than the address of the Secure-IPL flag.
Fixes: c9896acc78 ("s390/ipl: Provide has_secure sysfs attribute")
Cc: stable@vger.kernel.org # 5.2+
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f260 ("mm: mmap: zap pages
with read mmap_sem in munmap").
find_vma() does not check if the address is >= the VMA start address;
use vma_lookup() instead.
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Fixes: dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The ICS native driver relies on the IRQ chip data to find the struct
'ics_native' describing the ICS controller but it was removed by commit
248af248a8 ("powerpc/xics: Rename the map handler in a check handler").
Revert this change to fix the Microwatt SoC platform.
Fixes: 248af248a8 ("powerpc/xics: Rename the map handler in a check handler")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210913134056.3761960-1-clg@kaod.org
It was introduced by 4035b43da6 ("xen-swiotlb: remove xen_set_nslabs")
and then not removed by 2d29960af0 ("swiotlb: dynamically allocate
io_tlb_default_mem").
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/15259326-209a-1d11-338c-5018dc38abe8@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
I consider it unhelpful that address and size of the buffer aren't put
in the log file; it makes diagnosing issues needlessly harder. The
majority of callers of swiotlb_init() also passes 1 for the "verbose"
parameter.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/2e3c8e68-36b2-4ae9-b829-bf7f75d39d47@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Commit a98f565462 ("xen-swiotlb: split xen_swiotlb_init") should not
only have added __init to the split off function, but also should have
dropped __ref from the one left.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/7cd163e1-fe13-270b-384c-2708e8273d34@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Due to the use of max(1024, ...) there's no point retrying (and issuing
bogus log messages) when the number of slabs is already no larger than
this minimum value.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/984fa426-2b7b-4b77-5ce8-766619575b7f@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Only on the 2nd of the paths leading to xen_swiotlb_init()'s "error"
label it is useful to retry the allocation; the first one did already
iterate through all possible order values.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/56477481-87da-4962-9661-5e1b277efde0@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Generic swiotlb code makes sure to keep the slab count a multiple of the
number of slabs per segment. Yet even without checking whether any such
assumption is made elsewhere, it is easy to see that xen_swiotlb_fixup()
might alter unrelated memory when calling xen_create_contiguous_region()
for the last segment, when that's not a full one - the function acts on
full order-N regions, not individual pages.
Align the slab count suitably when halving it for a retry. Add a build
time check and a runtime one. Replace the no longer useful local
variable "slabs" by an "order" one calculated just once, outside of the
loop. Re-use "order" for calculating "dma_bits", and change the type of
the latter as well as the one of "i" while touching this anyway.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/dc054cb0-bec4-4db0-fc06-c9fc957b6e66@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
The commit referenced below removed the assignment of "bytes" from
xen_swiotlb_init() without - like done for xen_swiotlb_init_early() -
adding an assignment on the retry path, thus leading to excessively
sized allocations upon retries.
Fixes: 2d29960af0 ("swiotlb: dynamically allocate io_tlb_default_mem")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/778299d6-9cfd-1c13-026e-25ee5d14ecb3@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Of the two paths leading to the "error" label in xen_swiotlb_init() one
didn't allocate anything, while the other did already free what was
allocated.
Fixes: b827760053 ("xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late when PV PCI is used")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/ce9c2adb-8a52-6293-982a-0d6ece943ac6@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>