Release the page frag cache when tearing down the io queues
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If maxh2cdata < r2t_length then driver will form multiple
H2CData PDUs, validate R2T PDU in nvme_tcp_handle_r2t() to
reuse nvme_tcp_setup_h2c_data_pdu().
Also set req->state to NVME_TCP_SEND_H2C_PDU in
nvme_tcp_setup_h2c_data_pdu().
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Current nvmet_try_send_ddgst() code does not check whether
all data digest bytes are transmitted, fix this by returning
-EAGAIN if all data digest bytes are not transmitted.
Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If a reset controller is executed while the initiator
is performing some I/O the driver may leak the memory allocated
for the commands' iovec.
Make sure that nvmet_tcp_uninit_data_in_cmds() releases
all the memory.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Makes the code easier to read and to debug.
Sets the freed pointers to NULL, it will be useful
when destroying the queues to understand if the commands'
buffers have been released already or not.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If the initiator executes a reset controller operation while
performing I/O, the target kernel will crash because of a race condition
between release_queue and io_work;
nvmet_tcp_uninit_data_in_cmds() may be executed while io_work
is running, calling flush_work() was not sufficient to
prevent this because io_work could requeue itself.
Fix this bug by using cancel_work_sync() to prevent io_work
from requeuing itself and set rcv_state to NVMET_TCP_RECV_ERR to
make sure we don't receive any more data from the socket.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
It is clearer to initialize rc at the beginning of the function.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Recently, a new field got added to the smb3_fs_context struct
named server_hostname. While creating extra channels, pick up
this field from primary channel.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Recent fix to maintain a nosharesock state on the
server struct caused a regression. It updated this
field in the old tcp session, and not the new one.
This caused the multichannel scenario to misbehave.
Fixes: c9f1c19cf7 (cifs: nosharesock should not share socket with future sessions)
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
On systems with overclocking enabled, CPPC Highest Performance can be
hard coded to 0xff. In this case even if we have cores with different
highest performance, ITMT can't be enabled as the current implementation
depends on CPPC Highest Performance.
On such systems we can use MSR_HWP_CAPABILITIES maximum performance field
when CPPC.Highest Performance is 0xff.
Due to legacy reasons, we can't solely depend on MSR_HWP_CAPABILITIES as
in some older systems CPPC Highest Performance is the only way to identify
different performing cores.
Reported-by: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The power state switch needs to happen first, as that
kickstarts the firmware into normal mode.
Fixes: c9c14be664 ("usb: typec: tipd: Switch CD321X power state to S0")
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Hector Martin <marcan@marcan.st>
Link: https://lore.kernel.org/r/20211120030717.84287-3-marcan@marcan.st
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SPSS should've been SSPS.
Fixes: c9c14be664 ("usb: typec: tipd: Switch CD321X power state to S0")
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Hector Martin <marcan@marcan.st>
Link: https://lore.kernel.org/r/20211120030717.84287-2-marcan@marcan.st
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the circular lock dependency and unbalanced unlock of addess0_mutex
introduced when fixing an address0_mutex enumeration retry race in commit
ae6dc22d2d1 ("usb: hub: Fix usb enumeration issue due to address0 race")
Make sure locking order between port_dev->status_lock and address0_mutex
is correct, and that address0_mutex is not unlocked in hub_port_connect
"done:" codepath which may be reached without locking address0_mutex
Fixes: 6ae6dc22d2 ("usb: hub: Fix usb enumeration issue due to address0 race")
Cc: <stable@vger.kernel.org>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20211123101656.1113518-1-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After commit 4adcf2e582 ("cpufreq: intel_pstate: Add ->offline and
->online callbacks") the EPP value set by the "performance" scaling
algorithm in the active mode is not restored after an offline/online
cycle which replaces it with the saved EPP value coming from user
space.
Address this issue by forcing intel_pstate_hwp_set() to set a new
EPP value when it runs first time after online.
Fixes: 4adcf2e582 ("cpufreq: intel_pstate: Add ->offline and ->online callbacks")
Link: https://lore.kernel.org/linux-pm/adc7132c8655bd4d1c8b6129578e931a14fe1db2.camel@linux.intel.com/
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers
support in no-HWP mode") enabled the use of Intel P-State driver
for Ice Lake servers.
But it doesn't cover the case when OS can't control P-States.
Therefore, for Ice Lake server, if MSR_MISC_PWR_MGMT bits 8 or 18
are enabled, then the Intel P-State driver should exit as OS can't
control P-States.
Fixes: fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode")
Signed-off-by: Adamos Ttofari <attofari@amazon.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently mvpp2_xdp_setup won't allow attaching XDP program if
mtu > ETH_DATA_LEN (1500).
The mvpp2_change_mtu on the other hand checks whether
MVPP2_RX_PKT_SIZE(mtu) > MVPP2_BM_LONG_PKT_SIZE.
These two checks are semantically different.
Moreover this limit can be increased to MVPP2_MAX_RX_BUF_SIZE, since in
mvpp2_rx we have
xdp.data = data + MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM;
xdp.frame_sz = PAGE_SIZE;
Change the checks to check whether
mtu > MVPP2_MAX_RX_BUF_SIZE
Fixes: 07dd0a7aae ("mvpp2: add basic XDP support")
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling ipa_cmd_pipeline_clear() after stopping the channel
underlying the AP<-modem RX endpoint can lead to a deadlock.
This occurs in the ->runtime_suspend device power operation for the
IPA driver. While this callback is in progress, any other requests
for power will block until the callback returns.
Stopping the AP<-modem RX channel does not prevent the modem from
sending another packet to this endpoint. If a packet arrives for an
RX channel when the channel is stopped, an SUSPEND IPA interrupt
condition will be pending. Handling an IPA interrupt requires
power, so ipa_isr_thread() calls pm_runtime_get_sync() first thing.
The problem occurs because a "pipeline clear" command will not
complete while such a SUSPEND interrupt condition exists. So the
SUSPEND IPA interrupt handler won't proceed until it gets power;
that won't happen until the ->runtime_suspend callback (and its
"pipeline clear" command) completes; and that can't happen while
the SUSPEND interrupt condition exists.
It turns out that in this case there is no need to use the "pipeline
clear" command. There are scenarios in which clearing the pipeline
is required while suspending, but those are not (yet) supported
upstream. So a simple fix, avoiding the potential deadlock, is to
stop calling ipa_cmd_pipeline_clear() in ipa_endpoint_suspend().
This removes the only user of ipa_cmd_pipeline_clear(), so get rid
of that function. It can be restored again whenever it's needed.
This is basically a manual revert along with an explanation for
commit 6cb63ea6a3 ("net: ipa: introduce ipa_cmd_tag_process()").
Fixes: 6cb63ea6a3 ("net: ipa: introduce ipa_cmd_tag_process()")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
During the process of driver probing, probe function should return < 0
for failure, otherwise kernel will treat value == 0 as success.
Therefore, we should set err to -EINVAL when
adapter->registered_device_map is NULL. Otherwise kernel will assume
that driver has been successfully probed and will cause unexpected
errors.
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original changes brakes MAC address assignment on older chip
versions (see bug report [0]), and it brakes random MAC assignment.
is_valid_ether_addr() requires that its argument is word-aligned.
Add the missing alignment to array mac_addr.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=215087
Fixes: 1c5d09d587 ("ethernet: r8169: use eth_hw_addr_set()")
Reported-by: Richard Herbert <rherbert@sympatico.ca>
Tested-by: Richard Herbert <rherbert@sympatico.ca>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-11-22
Maciej Fijalkowski says:
Here are the two fixes for issues around ethtool's set_channels()
callback for ice driver. Both are related to XDP resources. First one
corrects the size of vsi->txq_map that is used to track the usage of Tx
resources and the second one prevents the wrong refcounting of bpf_prog.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder says:
====================
net: ipa: prevent shutdown during setup
The setup phase of the IPA driver occurs in one of two ways.
Normally, it is done directly by the main driver probe function.
But some systems (those having a "modem-init" DTS property) don't
start setup until an SMP2P interrupt (sent by the modem) arrives.
Because it isn't performed by the probe function, setup on
"modem-init" systems could be underway at the time a driver
remove (or shutdown) request arrives (or vice-versa). This
situation can lead to hardware state not being cleaned up
properly.
This series addresses this problem by having the driver remove
function disable the setup interrupt. A consequence of this is
that setup will complete if it is underway when the remove function
is called.
So now, when removing the driver, setup:
- will have already completed;
- is underway, and will complete before proceeding; or
- will not have begun (and will not occur).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPA setup_complete flag is set at the end of ipa_setup(), when
the setup phase of initialization has completed successfully. This
occurs as part of driver probe processing, or (if "modem-init" is
specified in the DTS file) it is triggered by the "ipa-setup-ready"
SMP2P interrupt generated by the modem.
In the latter case, it's possible for driver shutdown (or remove) to
begin while setup processing is underway, and this can't be allowed.
The problem is that the setup_complete flag is not adequate to signal
that setup is underway.
If setup_complete is set, it will never be un-set, so that case is
not a problem. But if setup_complete is false, there's a chance
setup is underway.
Because setup is triggered by an interrupt on a "modem-init" system,
there is a simple way to ensure the value of setup_complete is safe
to read. The threaded handler--if it is executing--will complete as
part of a request to disable the "ipa-modem-ready" interrupt. This
means that ipa_setup() (which is called from the handler) will run
to completion if it was underway, or will never be called otherwise.
The request to disable the "ipa-setup-ready" interrupt is currently
made within ipa_modem_stop(). Instead, disable the interrupt
outside that function in the two places it's called. In the case of
ipa_remove(), this ensures the setup_complete flag is safe to read
before we read it.
Rename ipa_smp2p_disable() to be ipa_smp2p_irq_disable_setup(), to be
more specific about its effect.
Fixes: 530f9216a9 ("soc: qcom: ipa: AP/modem communications")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently maintain a "disabled" Boolean flag to determine whether
the "ipa-setup-ready" SMP2P IRQ handler does anything. That flag
must be accessed under protection of a mutex.
Instead, disable the SMP2P interrupt when requested, which prevents
the interrupt handler from ever being called. More importantly, it
synchronizes a thread disabling the interrupt with the completion of
the interrupt handler in case they run concurrently.
Use the IPA setup_complete flag rather than the disabled flag in the
handler to determine whether to ignore any interrupts arriving after
the first.
Rename the "disabled" flag to be "setup_disabled", to be specific
about its purpose.
Fixes: 530f9216a9 ("soc: qcom: ipa: AP/modem communications")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sr is a repeated start, it is used in both I2C and SMBus protocols.
Provide its description and replace start ("S") conditions with repeated
start ("Sr") conditions when relevant. This allows the documentation to
match the SMBus specification available at [1].
[1] http://www.smbus.org/specs/SMBus_3_1_20180319.pdf
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Ido Schimmel says:
====================
mlxsw: Two small fixes
Patch #1 fixes a recent regression that prevents the driver from loading
with old firmware versions.
Patch #2 protects the driver from a NULL pointer dereference when
working on top of a buggy firmware. This was never observed in an actual
system, only on top of an emulator during development.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When processing port up/down events generated by the device's firmware,
the driver protects itself from events reported for non-existent local
ports, but not the CPU port (local port 0), which exists, but lacks a
netdev.
This can result in a NULL pointer dereference when calling
netif_carrier_{on,off}().
Fix this by bailing early when processing an event reported for the CPU
port. Problem was only observed when running on top of a buggy emulator.
Fixes: 28b1987ef5 ("mlxsw: spectrum: Register CPU port with devlink")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver fails to load with old firmware versions that cannot report
the maximum number of RIF MAC profiles [1].
Fix this by defaulting to a maximum of a single profile in such
situations, as multiple profiles are not supported by old firmware
versions.
[1]
mlxsw_spectrum 0000:03:00.0: cannot register bus device
mlxsw_spectrum: probe of 0000:03:00.0 failed with error -5
Fixes: 1c375ffb2e ("mlxsw: spectrum_router: Expose RIF MAC profiles to devlink resource")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reported-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Lu says:
====================
smc: Fixes for closing process and minor cleanup
Patch 1 is a minor cleanup for local struct sock variables.
Patch 2 ensures the active closing side enters TIME_WAIT.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The side that actively closed socket, it's clcsock doesn't enter
TIME_WAIT state, but the passive side does it. It should show the same
behavior as TCP sockets.
Consider this, when client actively closes the socket, the clcsock in
server enters TIME_WAIT state, which means the address is occupied and
won't be reused before TIME_WAIT dismissing. If we restarted server, the
service would be unavailable for a long time.
To solve this issue, shutdown the clcsock in [A], perform the TCP active
close progress first, before the passive closed side closing it. So that
the actively closed side enters TIME_WAIT, not the passive one.
Client | Server
close() // client actively close |
smc_release() |
smc_close_active() // PEERCLOSEWAIT1 |
smc_close_final() // abort or closed = 1|
smc_cdc_get_slot_and_msg_send() |
[A] |
|smc_cdc_msg_recv_action() // ACTIVE
| queue_work(smc_close_wq, &conn->close_work)
| smc_close_passive_work() // PROCESSABORT or APPCLOSEWAIT1
| smc_close_passive_abort_received() // only in abort
|
|close() // server recv zero, close
| smc_release() // PROCESSABORT or APPCLOSEWAIT1
| smc_close_active()
| smc_close_abort() or smc_close_final() // CLOSED
| smc_cdc_get_slot_and_msg_send() // abort or closed = 1
smc_cdc_msg_recv_action() | smc_clcsock_release()
queue_work(smc_close_wq, &conn->close_work) | sock_release(tcp) // actively close clc, enter TIME_WAIT
smc_close_passive_work() // PEERCLOSEWAIT1 | smc_conn_free()
smc_close_passive_abort_received() // CLOSED|
smc_conn_free() |
smc_clcsock_release() |
sock_release(tcp) // passive close clc |
Link: https://www.spinics.net/lists/netdev/msg780407.html
Fixes: b38d732477 ("smc: socket closing and linkgroup cleanup")
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There remains some variables to replace with local struct sock. So clean
them up all.
Fixes: 3163c5071f ("net/smc: use local struct sock variables consistently")
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MIPS/IA64 define END as assembly function ending, which conflict
with END definition in slip.h, just undef it at first
Reported-by: lkp@intel.com
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
MIPS/IA64 define END as assembly function ending, which conflict
with END definition in mkiss.c, just undef it at first
Reported-by: lkp@intel.com
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5fa6863ba6 ("spi: Check we have a spi_device_id for each DT
compatible") added a test to check that every SPI driver has a
spi_device_id for each DT compatiable string defined by the driver
and warns if the spi_device_id is missing. The spi_device_id is
missing for the MMC SPI driver and the following warning is now seen.
WARNING KERN SPI driver mmc_spi has no spi_device_id for mmc-spi-slot
Fix this by adding the necessary spi_device_id.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lore.kernel.org/r/20211115113813.238044-1-jonathanh@nvidia.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Use array_size() helper to aid in 2-factor allocation instances.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Evan Green <evgreen@chromium.org>
Acked-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
We have a special helper to get fwnode out of struct device.
Moreover, dereferencing it directly prevents the fwnode
modifications in the future.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Evan Green <evgreen@chromium.org>
Acked-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Recently ACPI gained the acpi_get_local_address() API which may be used
instead of home grown i2c_mux_gpio_get_acpi_adr().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Evan Green <evgreen@chromium.org>
Acked-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Most IMX I2C interfaces don't generate an interrupt on a stop condition,
so it won't generate a timely stop event on a slave mode transfer.
Some users, like IPMB, need a timely stop event to work properly.
So, add a timer and add the proper handling to generate a stop event in
slave mode if the interface goes idle.
Signed-off-by: Corey Minyard <minyard@acm.org>
Tested-by: Andrew Manley <andrew.manley@sealingtech.com>
Reviewed-by: Andrew Manley <andrew.manley@sealingtech.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
If a timeout is hit, it can result is incorrect data on the I2C bus
and/or memory corruptions in the guest since the device can still be
operating on the buffers it was given while the guest has freed them.
Here is, for example, the start of a slub_debug splat which was
triggered on the next transfer after one transfer was forced to timeout
by setting a breakpoint in the backend (rust-vmm/vhost-device):
BUG kmalloc-1k (Not tainted): Poison overwritten
First byte 0x1 instead of 0x6b
Allocated in virtio_i2c_xfer+0x65/0x35c age=350 cpu=0 pid=29
__kmalloc+0xc2/0x1c9
virtio_i2c_xfer+0x65/0x35c
__i2c_transfer+0x429/0x57d
i2c_transfer+0x115/0x134
i2cdev_ioctl_rdwr+0x16a/0x1de
i2cdev_ioctl+0x247/0x2ed
vfs_ioctl+0x21/0x30
sys_ioctl+0xb18/0xb41
Freed in virtio_i2c_xfer+0x32e/0x35c age=244 cpu=0 pid=29
kfree+0x1bd/0x1cc
virtio_i2c_xfer+0x32e/0x35c
__i2c_transfer+0x429/0x57d
i2c_transfer+0x115/0x134
i2cdev_ioctl_rdwr+0x16a/0x1de
i2cdev_ioctl+0x247/0x2ed
vfs_ioctl+0x21/0x30
sys_ioctl+0xb18/0xb41
There is no simple fix for this (the driver would have to always create
bounce buffers and hold on to them until the device eventually returns
the buffers), so just disable the timeout support for now.
Fixes: 3cfc883804 ("i2c: virtio: add a virtio i2c frontend driver")
Acked-by: Jie Deng <jie.deng@intel.com>
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Currently interrupt storm will occur from i2c-i801 after first
transaction if SMB_ALERT signal is enabled and ever asserted. It is
enough if the signal is asserted once even before the driver is loaded
and does not recover because that interrupt is not acknowledged.
This fix aims to fix it by two ways:
- Add acknowledging for the SMB_ALERT interrupt status
- Disable the SMB_ALERT interrupt on platforms where possible since the
driver currently does not make use for it
Acknowledging resets the SMB_ALERT interrupt status on all platforms and
also should help to avoid interrupt storm on older platforms where the
SMB_ALERT interrupt disabling is not available.
For simplicity this fix reuses the host notify feature for disabling and
restoring original register value.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=177311
Reported-by: ck+kernelbugzilla@bl4ckb0x.de
Reported-by: stephane.poignant@protonmail.com
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
If driver interrupts are enabled, SMBHSTCNT_INTREN will be 1 after
the first transaction, and will stay to that value forever. This
means that interrupts will be generated for both host-initiated
transactions and also SMBus Alert events even after the driver is
unloaded. To be on the safe side, we should restore the initial state
of this bit at suspend and reboot time, as we do for several other
configuration bits already and for the same reason: the BIOS should
be handed the device in the same configuration state in which we
received it. Otherwise interrupts may be generated which nobody
will process.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Commits 95b8a5e011 ("MIPS: Remove NETLOGIC support") and edd4488aea
("ARM: remove tango platform") removed Netlogic XLR and Sigma Designs
Tango platforms which means there are no platforms using the XLR I2C
driver and it can be removed.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Netlogic XLP was removed in commit 95b8a5e011 ("MIPS: Remove NETLOGIC
support"). With those gone, the single platform left to support is
Cavium ThunderX2. Remove the Netlogic variant and DT support.
For simplicity, the existing kconfig name is retained.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by George Cherian <gcherian@marvell.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
The i.MX 8QM DTS files use two compatibles, so update the binding to fix
dtbs_check warnings like:
arch/arm64/boot/dts/freescale/imx8qm-mek.dt.yaml: i2c@5a800000:
compatible: ['fsl,imx8qm-lpi2c', 'fsl,imx7ulp-lpi2c'] is too long
Signed-off-by: Abel Vesa <abel.vesa@nxp.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
syzbot reported that the warning in perf_sigtrap() fires, saying that
the event's task does not match current:
| WARNING: CPU: 0 PID: 9090 at kernel/events/core.c:6446 perf_pending_event+0x40d/0x4b0 kernel/events/core.c:6513
| Modules linked in:
| CPU: 0 PID: 9090 Comm: syz-executor.1 Not tainted 5.15.0-syzkaller #0
| Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
| RIP: 0010:perf_sigtrap kernel/events/core.c:6446 [inline]
| RIP: 0010:perf_pending_event_disable kernel/events/core.c:6470 [inline]
| RIP: 0010:perf_pending_event+0x40d/0x4b0 kernel/events/core.c:6513
| ...
| Call Trace:
| <IRQ>
| irq_work_single+0x106/0x220 kernel/irq_work.c:211
| irq_work_run_list+0x6a/0x90 kernel/irq_work.c:242
| irq_work_run+0x4f/0xd0 kernel/irq_work.c:251
| __sysvec_irq_work+0x95/0x3d0 arch/x86/kernel/irq_work.c:22
| sysvec_irq_work+0x8e/0xc0 arch/x86/kernel/irq_work.c:17
| </IRQ>
| <TASK>
| asm_sysvec_irq_work+0x12/0x20 arch/x86/include/asm/idtentry.h:664
| RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
| RIP: 0010:_raw_spin_unlock_irqrestore+0x38/0x70 kernel/locking/spinlock.c:194
| ...
| coredump_task_exit kernel/exit.c:371 [inline]
| do_exit+0x1865/0x25c0 kernel/exit.c:771
| do_group_exit+0xe7/0x290 kernel/exit.c:929
| get_signal+0x3b0/0x1ce0 kernel/signal.c:2820
| arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
| handle_signal_work kernel/entry/common.c:148 [inline]
| exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
| exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
| __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
| syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
| do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
| entry_SYSCALL_64_after_hwframe+0x44/0xae
On x86 this shouldn't happen, which has arch_irq_work_raise().
The test program sets up a perf event with sigtrap set to fire on the
'sched_wakeup' tracepoint, which fired in ttwu_do_wakeup().
This happened because the 'sched_wakeup' tracepoint also takes a task
argument passed on to perf_tp_event(), which is used to deliver the
event to that other task.
Since we cannot deliver synchronous signals to other tasks, skip an event if
perf_tp_event() is targeted at another task and perf_event_attr::sigtrap is
set, which will avoid ever entering perf_sigtrap() for such events.
Fixes: 97ba62b278 ("perf: Add support for SIGTRAP on perf events")
Reported-by: syzbot+663359e32ce6f1a305ad@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YYpoCOBmC/kJWfmI@elver.google.com
We found that a process with 10 thousnads threads has been encountered
a regression problem from Linux-v4.14 to Linux-v5.4. It is a kind of
workload which will concurrently allocate lots of memory in different
threads sometimes. In this case, we will see the down_read_trylock()
with a high hotspot. Therefore, we suppose that rwsem has a regression
at least since Linux-v5.4. In order to easily debug this problem, we
write a simply benchmark to create the similar situation lile the
following.
```c++
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sched.h>
#include <cstdio>
#include <cassert>
#include <thread>
#include <vector>
#include <chrono>
volatile int mutex;
void trigger(int cpu, char* ptr, std::size_t sz)
{
cpu_set_t set;
CPU_ZERO(&set);
CPU_SET(cpu, &set);
assert(pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0);
while (mutex);
for (std::size_t i = 0; i < sz; i += 4096) {
*ptr = '\0';
ptr += 4096;
}
}
int main(int argc, char* argv[])
{
std::size_t sz = 100;
if (argc > 1)
sz = atoi(argv[1]);
auto nproc = std:🧵:hardware_concurrency();
std::vector<std::thread> thr;
sz <<= 30;
auto* ptr = mmap(nullptr, sz, PROT_READ | PROT_WRITE, MAP_ANON |
MAP_PRIVATE, -1, 0);
assert(ptr != MAP_FAILED);
char* cptr = static_cast<char*>(ptr);
auto run = sz / nproc;
run = (run >> 12) << 12;
mutex = 1;
for (auto i = 0U; i < nproc; ++i) {
thr.emplace_back(std::thread([i, cptr, run]() { trigger(i, cptr, run); }));
cptr += run;
}
rusage usage_start;
getrusage(RUSAGE_SELF, &usage_start);
auto start = std::chrono::system_clock::now();
mutex = 0;
for (auto& t : thr)
t.join();
rusage usage_end;
getrusage(RUSAGE_SELF, &usage_end);
auto end = std::chrono::system_clock::now();
timeval utime;
timeval stime;
timersub(&usage_end.ru_utime, &usage_start.ru_utime, &utime);
timersub(&usage_end.ru_stime, &usage_start.ru_stime, &stime);
printf("usr: %ld.%06ld\n", utime.tv_sec, utime.tv_usec);
printf("sys: %ld.%06ld\n", stime.tv_sec, stime.tv_usec);
printf("real: %lu\n",
std::chrono::duration_cast<std::chrono::milliseconds>(end -
start).count());
return 0;
}
```
The functionality of above program is simply which creates `nproc`
threads and each of them are trying to touch memory (trigger page
fault) on different CPU. Then we will see the similar profile by
`perf top`.
25.55% [kernel] [k] down_read_trylock
14.78% [kernel] [k] handle_mm_fault
13.45% [kernel] [k] up_read
8.61% [kernel] [k] clear_page_erms
3.89% [kernel] [k] __do_page_fault
The highest hot instruction, which accounts for about 92%, in
down_read_trylock() is cmpxchg like the following.
91.89 │ lock cmpxchg %rdx,(%rdi)
Sice the problem is found by migrating from Linux-v4.14 to Linux-v5.4,
so we easily found that the commit ddb20d1d3a ("locking/rwsem: Optimize
down_read_trylock()") caused the regression. The reason is that the
commit assumes the rwsem is not contended at all. But it is not always
true for mmap lock which could be contended with thousands threads.
So most threads almost need to run at least 2 times of "cmpxchg" to
acquire the lock. The overhead of atomic operation is higher than
non-atomic instructions, which caused the regression.
By using the above benchmark, the real executing time on a x86-64 system
before and after the patch were:
Before Patch After Patch
# of Threads real real reduced by
------------ ------ ------ ----------
1 65,373 65,206 ~0.0%
4 15,467 15,378 ~0.5%
40 6,214 5,528 ~11.0%
For the uncontended case, the new down_read_trylock() is the same as
before. For the contended cases, the new down_read_trylock() is faster
than before. The more contended, the more fast.
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/20211118094455.9068-1-songmuchun@bytedance.com
There are some inconsistency in the way that the handoff bit is being
handled in readers and writers that lead to a race condition.
Firstly, when a queue head writer set the handoff bit, it will clear
it when the writer is being killed or interrupted on its way out
without acquiring the lock. That is not the case for a queue head
reader. The handoff bit will simply be inherited by the next waiter.
Secondly, in the out_nolock path of rwsem_down_read_slowpath(), both
the waiter and handoff bits are cleared if the wait queue becomes
empty. For rwsem_down_write_slowpath(), however, the handoff bit is
not checked and cleared if the wait queue is empty. This can
potentially make the handoff bit set with empty wait queue.
Worse, the situation in rwsem_down_write_slowpath() relies on wstate,
a variable set outside of the critical section containing the ->count
manipulation, this leads to race condition where RWSEM_FLAG_HANDOFF
can be double subtracted, corrupting ->count.
To make the handoff bit handling more consistent and robust, extract
out handoff bit clearing code into the new rwsem_del_waiter() helper
function. Also, completely eradicate wstate; always evaluate
everything inside the same critical section.
The common function will only use atomic_long_andnot() to clear bits
when the wait queue is empty to avoid possible race condition. If the
first waiter with handoff bit set is killed or interrupted to exit the
slowpath without acquiring the lock, the next waiter will inherit the
handoff bit.
While at it, simplify the trylock for loop in
rwsem_down_write_slowpath() to make it easier to read.
Fixes: 4f23dbc1e6 ("locking/rwsem: Implement lock handoff to prevent lock starvation")
Reported-by: Zhenhua Ma <mazhenhua@xiaomi.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211116012912.723980-1-longman@redhat.com