Assignment to a typed pointer is sufficient in C.
No cast is needed.
The following Coccinelle script was used to detect this:
@r@
expression x;
void* e;
type T;
identifier f;
@@
(
*((T *)e)
|
((T *)x)[...]
|
((T*)x)->f
|
- (T*)
e
)
Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Link: https://lore.kernel.org/r/20200326113210.GA29951@simran-Inspiron-5558
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cleanup line over 80 characters by removing unnecessary test
'pDM_Odm->RSSI_Min <= 25'. The above test 'pDM_Odm->RSSI_Min > 25'
already guarantees that it is <= 25.
Signed-off-by: Michael Straube <straube.linux@gmail.com>
Link: https://lore.kernel.org/r/20200326084348.15072-1-straube.linux@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is one one corner case at dma_fence_signal_locked
which will raise the NULL pointer problem just like below.
->dma_fence_signal
->dma_fence_signal_locked
->test_and_set_bit
here trigger dma_fence_release happen due to the zero of fence refcount.
->dma_fence_put
->dma_fence_release
->drm_sched_fence_release_scheduled
->call_rcu
here make the union fled “cb_list” at finished fence
to NULL because struct rcu_head contains two pointer
which is same as struct list_head cb_list
Therefore, to hold the reference of finished fence at drm_sched_process_job
to prevent the null pointer during finished fence dma_fence_signal
[ 732.912867] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 732.914815] #PF: supervisor write access in kernel mode
[ 732.915731] #PF: error_code(0x0002) - not-present page
[ 732.916621] PGD 0 P4D 0
[ 732.917072] Oops: 0002 [#1] SMP PTI
[ 732.917682] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 5.4.0-rc7 #1
[ 732.918980] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[ 732.920906] RIP: 0010:dma_fence_signal_locked+0x3e/0x100
[ 732.938569] Call Trace:
[ 732.939003] <IRQ>
[ 732.939364] dma_fence_signal+0x29/0x50
[ 732.940036] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[ 732.940996] drm_sched_process_job+0x34/0xa0 [gpu_sched]
[ 732.941910] dma_fence_signal_locked+0x85/0x100
[ 732.942692] dma_fence_signal+0x29/0x50
[ 732.943457] amdgpu_fence_process+0x99/0x120 [amdgpu]
[ 732.944393] sdma_v4_0_process_trap_irq+0x81/0xa0 [amdgpu]
v2: hold the finished fence at drm_sched_process_job instead of
amdgpu_fence_process
v3: resume the blank line
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The signed 1 bit bitfields should be unsigned, so make them unsigned.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Peter Chen <peter.chen@nxp.com>
Link: https://lore.kernel.org/r/20200325125041.94769-1-colin.king@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/usb/gadget/udc/fsl_udc_core.c:56:19:
warning: 'driver_desc' defined but not used [-Wunused-const-variable=]
It is never used, so remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Peter Chen <peter.chen@nxp.com>
Link: https://lore.kernel.org/r/20200326071419.19240-1-yuehaibing@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In AIO case, the request is freed up if ep_queue fails.
However, io_data->req still has the reference to this freed
request. In the case of this failure if there is aio_cancel
call on this io_data it will lead to an invalid dequeue
operation and a potential use after free issue.
Fix this by setting the io_data->req to NULL when the request
is freed as part of queue failure.
Fixes: 2e4c7553cd ("usb: gadget: f_fs: add aio support")
Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
CC: stable <stable@vger.kernel.org>
Reviewed-by: Peter Chen <peter.chen@nxp.com>
Link: https://lore.kernel.org/r/20200326115620.12571-1-sallenki@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
typec_cable_put() function had typec_cable_get in it's documentation.
Change it to reflect the correct name.
Signed-off-by: Azhar Shaikh <azhar.shaikh@intel.com>
Link: https://lore.kernel.org/r/20200326134633.26780-1-azhar.shaikh@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Here are the USB-serial updates for 5.7-rc1, including:
- support for a new family of Fintek devices
- fix for an io-edgeport slab-out-of-bounds access
- fixes for a couple of kernel-doc issues
Included are also various clean ups and some new modem device ids.
All but the io-edgeport fix have been in linux-next with no reported
issues.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQHbPq+cpGvN/peuzMLxc3C7H1lCAUCXnx+TQAKCRALxc3C7H1l
CEGnAQCCITnwkGkLZe64yyEM9txUw5oXSRFvoURs7en/2+2YZQD/br8Ygn24NgFh
Qg0dZKSoh7FLGGlJm2Dnny31Edz7kAY=
=k8M6
-----END PGP SIGNATURE-----
Merge tag 'usb-serial-5.7-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next
Johan writes:
USB-serial updates for 5.7-rc1
Here are the USB-serial updates for 5.7-rc1, including:
- support for a new family of Fintek devices
- fix for an io-edgeport slab-out-of-bounds access
- fixes for a couple of kernel-doc issues
Included are also various clean ups and some new modem device ids.
All but the io-edgeport fix have been in linux-next with no reported
issues.
* tag 'usb-serial-5.7-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: io_edgeport: fix slab-out-of-bounds read in edge_interrupt_callback
USB: serial: option: add Wistron Neweb D19Q1
USB: serial: option: add BroadMobi BM806U
USB: serial: option: add support for ASKEY WWHC050
USB: serial: f81232: add control driver for F81534A
USB: serial: fix tty cleanup-op kernel-doc
USB: serial: clean up carrier-detect helper
USB: serial: f81232: set F81534A serial port with RS232 mode
USB: serial: f81232: add F81534A support
USB: serial: f81232: use devm_kzalloc for port data
USB: serial: f81232: add tx_empty function
USB: serial: f81232: extract LSR handler
USB: serial: digi_acceleport: remove redundant assignment to pointer priv
USB: serial: relax unthrottle memory barrier
The original single target IPI fastpath patch forgot to filter the
ICR destination shorthand field. Multicast IPI is not suitable for
this feature since wakeup the multiple sleeping vCPUs will extend
the interrupt disabled time, it especially worse in the over-subscribe
and VM has a little bit more vCPUs scenario. Let's narrow it down to
single target IPI.
Two VMs, each is 76 vCPUs, one running 'ebizzy -M', the other
running cyclictest on all vCPUs, w/ this patch, the avg score
of cyclictest can improve more than 5%. (pv tlb, pv ipi, pv
sched yield are disabled during testing to avoid the disturb).
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1585189202-1708-3-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix slab-out-of-bounds read in the interrupt-URB completion handler.
The boundary condition should be (length - 1) as we access
data[position + 1].
Reported-and-tested-by: syzbot+37ba33391ad5f3935bbd@syzkaller.appspotmail.com
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Simplify function returns by merging assignment and return into
one command line.
Found with Coccinelle
@@
local idexpression ret;
expression e;
@@
-ret =
+return
e;
-return ret;
Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Link: https://lore.kernel.org/r/20200325214312.GA1936@simran-Inspiron-5558
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Compress two lines into a single line if immediate return statement is found.
It also removes variable cmd_obj as it is no longer needed.
It is done using script Coccinelle.
And coccinelle uses following semantic patch for this compression function:
@@
expression ret;
identifier f;
@@
-ret =
+return
f(...);
-return ret;
Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Link: https://lore.kernel.org/r/20200325212253.GA8175@simran-Inspiron-5558
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Compress two lines into a single line if immediate return statement is found.
It is done using script Coccinelle. And coccinelle uses following semantic
patch for this compression function:
@@
expression ret;
identifier f;
@@
-ret =
+return
f(...);
-return ret;
Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Link: https://lore.kernel.org/r/20200325205418.GA29149@simran-Inspiron-5558
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The imx SC api strongly assumes that messages are composed out of
4-bytes words but some of our message structs have odd sizeofs.
This produces many oopses with CONFIG_KASAN=y.
Fix by marking with __aligned(4).
Fixes: 666aed2d13 ("clk: imx: scu: add set parent support")
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Link: https://lkml.kernel.org/r/aad021e432b3062c142973d09b766656eec18fde.1582216144.git.leonard.crestez@nxp.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
The imx SC api strongly assumes that messages are composed out of
4-bytes words but some of our message structs have odd sizeofs.
This produces many oopses with CONFIG_KASAN=y.
Fix by marking with __aligned(4).
Fixes: fe37b48204 ("clk: imx: add scu clock common part")
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Link: https://lkml.kernel.org/r/10e97a04980d933b2cfecb6b124bf9046b6e4f16.1582216144.git.leonard.crestez@nxp.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
I copy/pasted these macros and forgot to update the argument
names and where they're passed to. Fix it so that these macros make
sense.
Reported-by: Maxime Ripard <maxime@cerno.tech>
Fixes: 194efb6e26 ("clk: gate: Add support for specifying parents via DT/pointers")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20200325022257.148244-1-sboyd@kernel.org
Tested-by: Maxime Ripard <mripard@kernel.org>
Pull 5.7 NVMe updates from Keith.
* 'nvme-5.7-rc1' of git://git.infradead.org/nvme: (42 commits)
nvme: cleanup namespace identifier reporting in nvme_init_ns_head
nvme: rename __nvme_find_ns_head to nvme_find_ns_head
nvme: refactor nvme_identify_ns_descs error handling
nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl
nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl
nvme: Fix controller creation races with teardown flow
nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl
nvme: Fix ctrl use-after-free during sysfs deletion
nvme-pci: Re-order nvme_pci_free_ctrl
nvme: Remove unused return code from nvme_delete_ctrl_sync
nvme: Use nvme_state_terminal helper
nvme: release ida resources
nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO
nvmet-tcp: optimize tcp stack TX when data digest is used
nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow
nvme-multipath: do not reset on unknown status
nvmet-rdma: allocate RW ctxs according to mdts
nvmet-rdma: Implement get_mdts controller op
nvmet: Add get_mdts op for controllers
nvme-pci: properly print controller address
...
Pull networking fixes from David Miller:
1) Fix deadlock in bpf_send_signal() from Yonghong Song.
2) Fix off by one in kTLS offload of mlx5, from Tariq Toukan.
3) Add missing locking in iwlwifi mvm code, from Avraham Stern.
4) Fix MSG_WAITALL handling in rxrpc, from David Howells.
5) Need to hold RTNL mutex in tcindex_partial_destroy_work(), from Cong
Wang.
6) Fix producer race condition in AF_PACKET, from Willem de Bruijn.
7) cls_route removes the wrong filter during change operations, from
Cong Wang.
8) Reject unrecognized request flags in ethtool netlink code, from
Michal Kubecek.
9) Need to keep MAC in reset until PHY is up in bcmgenet driver, from
Doug Berger.
10) Don't leak ct zone template in act_ct during replace, from Paul
Blakey.
11) Fix flushing of offloaded netfilter flowtable flows, also from Paul
Blakey.
12) Fix throughput drop during tx backpressure in cxgb4, from Rahul
Lakkireddy.
13) Don't let a non-NULL skb->dev leave the TCP stack, from Eric
Dumazet.
14) TCP_QUEUE_SEQ socket option has to update tp->copied_seq as well,
also from Eric Dumazet.
15) Restrict macsec to ethernet devices, from Willem de Bruijn.
16) Fix reference leak in some ethtool *_SET handlers, from Michal
Kubecek.
17) Fix accidental disabling of MSI for some r8169 chips, from Heiner
Kallweit.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (138 commits)
net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build
net: ena: Add PCI shutdown handler to allow safe kexec
selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED
selftests/net: add missing tests to Makefile
r8169: re-enable MSI on RTL8168c
net: phy: mdio-bcm-unimac: Fix clock handling
cxgb4/ptp: pass the sign of offset delta in FW CMD
net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop
net: cbs: Fix software cbs to consider packet sending time
net/mlx5e: Do not recover from a non-fatal syndrome
net/mlx5e: Fix ICOSQ recovery flow with Striding RQ
net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset
net/mlx5e: Enhance ICOSQ WQE info fields
net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure
selftests: netfilter: add nfqueue test case
netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
netfilter: nft_fwd_netdev: validate family and chain type
netfilter: nft_set_rbtree: Detect partial overlaps on insertion
netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()
netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion
...
- One core quirk by myself to fix the .irq_disable()
semantics when the gpiolib core takes over this callback.
- The rest is an elaborate series of 4 patches fixing Intel
laptop ACPI wakeup quirks.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAl57qSIACgkQQRCzN7AZ
XXMmqxAAiggbsjiMkufxHxDqKl0tLRkbQwlsnz5VQlH0XCPBYSRidbIfQdiDFTbD
yoBENIYeW/9aVlb0Q7a2T5H4zy1Ye3MiqObZt97ECr7DyxwNrAbPS/IgOFmsjSmu
C4HZgX6/fQpBlJnKVgMBwzdgKShX4Fb4l1gFnY7uRRjBGVkp5ZArqJALJ747zFR0
yj/kpA+hGgO24AJspF+QaLFfKkjMXKv2zgaa3HEtsYqUBH/3yBZigLKJs3koXHYB
3c7EbWxnpYLiPi/T51aN7ZbbNX7Ag6wW6rLm/6460MCYDRSOB+7fCJCZc2KZJomr
4ntq1HHxSrjOiqce+MSvc+WLI6pqjN73tyWJwhFgBfwr32Ara5FR6CO0KJT9M5d+
0i8Na0Dyp2LlZM530ygOOn48p+tiU6NVoR/BXUnXBBuYtFsOmfTtrQGh++Uw+rcJ
wraFAsSzzr7JmXzh+tUEeGGdI4EYy1T6+DppEfSbyty9W6YnWvqHP9xcyuicUyi2
g9Btl0hRhSqQZozDYjROPQox1qjUKHLnI66n/W9pVUN61GlFlSH/38yRAEpD0DZW
GlzSEkfL5SN5DQjaAOC2z0z+kfFSGdtCYMOKkyl+cKGbLCRc43PrzHB3oH+wF1Mu
0J+Hilrk9y4sGTX67jIQPV/t1qhoorRFzihuUkHBdGt//XCqtI4=
=+dEG
-----END PGP SIGNATURE-----
Merge tag 'gpio-v5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
- One core quirk by myself to fix the .irq_disable() semantics when the
gpiolib core takes over this callback.
- The rest is an elaborate series of four patches fixing Intel laptop
ACPI wakeup quirks.
* tag 'gpio-v5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 CHT + AXP288 model
gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 BYT + AXP288 model
gpiolib: acpi: Rework honor_wakeup option into an ignore_wake option
gpiolib: acpi: Correct comment for HP x2 10 honor_wakeup quirk
gpiolib: Fix irq_disable() semantics
Fourth, and last, set of fixes for v5.6. Just two important fixes to
iwlwifi regressions.
iwlwifi
* fix GEO_TX_POWER_LIMIT command on certain devices which caused
firmware to crash during initialisation
* add back device ids for three devices which were accidentally
removed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJee7S6AAoJEG4XJFUm622bcH4IAICV95Y79p9loKZVwXBrguiK
5D9rBNZcSc0Yo9AwZkhatRAJYiLu19hw2qra3YsQnaWpjESmnZtV6/ZASDcpOGSP
NaI4TMLt57dhnpqhpglvGeM9PlUbaoqX7Sl5LbhnSQoKkk7rppnk90KqbXl7SDba
4eG7+i4iMc69uw4BttVh/lfgAH0YyYpgStWi9ccoQ2ip9SrljFZwmCF7q+3/25OR
Hyty44U90BBrsi+LrRuhutR2LuR+TfXEWmtXWRzOpnwmicY8sCpeEVCMeeVy59TR
y66sjlhM5w+N1mZbzBrqUZJLdMGInljx4G4DIvvi8jbD6rVLZc8zAeZMLjO24ZM=
=CFbE
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-2020-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for v5.6
Fourth, and last, set of fixes for v5.6. Just two important fixes to
iwlwifi regressions.
iwlwifi
* fix GEO_TX_POWER_LIMIT command on certain devices which caused
firmware to crash during initialisation
* add back device ids for three devices which were accidentally
removed
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lift the common namespace identifier reporting between the shared
namespace and new nshead cases into common code. This also means
one less lock is held while doing I/O.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
There is no non __-prefixed version, so make the name a little more
readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Move the handling of an error into the function from the caller, and
only do it for an actual error on the admin command itself, not the
command parsing, as that should be enough to deal with devices claiming
a bogus version compliance.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The transition to LIVE state should not fail in case of a new controller.
Moving to DELETING state before nvme_tcp_create_ctrl() allocates all the
resources may leads to NULL dereference at teardown flow (e.g., IO tagset,
admin_q, connect_q).
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The transition to LIVE state should not fail in case of a new controller.
Moving to DELETING state before nvme_tcp_create_ctrl() allocates all the
resources may leads to NULL dereference at teardown flow (e.g., IO tagset,
admin_q, connect_q).
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Calling nvme_sysfs_delete() when the controller is in the middle of
creation may cause several bugs. If the controller is in NEW state we
remove delete_controller file and don't delete the controller. The user
will not be able to use nvme disconnect command on that controller again,
although the controller may be active. Other bugs may happen if the
controller is in the middle of create_ctrl callback and
nvme_do_delete_ctrl() starts. For example, freeing I/O tagset at
nvme_do_delete_ctrl() before it was allocated at create_ctrl callback.
To fix all those races don't allow the user to delete the controller
before it was fully created.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Put the ctrl reference count at nvme_uninit_ctrl as opposed to
nvme_init_ctrl which takes it. This decrease the reference count at the
core layer instead of decreasing it on each transport separately.
Also move the call of nvme_uninit_ctrl at PCI driver after calling to
nvme_release_prp_pools and nvme_dev_unmap, in order to put the reference
count after using the dev. This is safe because those functions use
nvme_dev which is freed only later at nvme_pci_free_ctrl.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
In case nvme_sysfs_delete() is called by the user before taking the ctrl
reference count, the ctrl may be freed during the creation and cause the
bug. Take the reference as soon as the controller is externally visible,
which is done by cdev_device_add() in nvme_init_ctrl(). Also take the
reference count at the core layer instead of taking it on each transport
separately.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Destroy the resources in the same order like in nvme_probe error flow to
improve code readability.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The return code of nvme_delete_ctrl_sync is never used, so change it to
void.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Improve code readability.
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
ida instances allocate some internal memory in addition to the base
'struct ida'. Use ida_destroy() to release that memory at module_exit().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Currently 32 bit application gets ENOTTY when it calls
compat_ioctl with NVME_IOCTL_SUBMIT_IO in 64 bit kernel.
The cause is that the results of sizeof(struct nvme_user_io),
which is used to define NVME_IOCTL_SUBMIT_IO,
are not same between 32 bit compiler and 64 bit compiler.
* 32 bit: the result of sizeof nvme_user_io is 44.
* 64 bit: the result of sizeof nvme_user_io is 48.
64 bit compiler seems to add 32 bit padding for multiple of 8 bytes.
This patch adds a compat_ioctl handler.
The handler replaces NVME_IOCTL_SUBMIT_IO32 with NVME_IOCTL_SUBMIT_IO
in case 32 bit application calls compat_ioctl for submit in 64 bit kernel.
Then, it calls nvme_ioctl as usual.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada (KIOXIA) <masahiro31.yamada@kioxia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
If we have a 4-byte data digest to send to the wire, but we
have more data to send, set MSG_MORE to tell the stack
that more is coming.
Reviewed-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit. Fix it by replacing with scnprintf().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The nvme multipath error handling defaults to controller reset if the
error is unknown. There are, however, no existing nvme status codes that
indicate a reset should be used, and resetting causes unnecessary
disruption to the rest of IO.
Change nvme's error handling to first check if failover should happen.
If not, let the normal error handling take over rather than reset the
controller.
Based-on-a-patch-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Meneghini <johnm@netapp.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Current nvmet-rdma code allocates MR pool budget based on queue size,
assuming both host and target use the same "max_pages_per_mr" count.
After limiting the mdts value for RDMA controllers, we know the factor
of maximum MR's per IO operation. Thus, make sure MR pool will be
sufficient for the required IO depth and IO size.
That is, say host's SQ size is 100, then the MR pool budget allocated
currently at target will also be 100 MRs. But 100 IO WRITE Requests
with 256 sg_count(IO size above 1MB) require 200 MRs when target's
"max_pages_per_mr" is 128.
Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Set the maximal data transfer size to be 1MB (currently mdts is
unlimited). This will allow calculating the amount of MR's that
one ctrl should allocate to fulfill it's capabilities.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Some transports, such as RDMA, would like to set the Maximum Data
Transfer Size (MDTS) according to device/port/ctrl characteristics.
This will enable the transport to set the optimal MDTS according to
controller needs and device capabilities. Add a new nvmet transport
op that is called during ctrl identification. This will not effect
transports that don't implement this option. The return value of the new
op is according to the NVMe spec definition for MDTS.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
If we failed to receive data from the socket, don't try
to further process it, we will for sure be handling a queue
error at this point. While no issue was seen with the
current behavior thus far, its safer to cease socket processing
if we detected an error.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Consolidate the request failure handling code to where
it is being fetched (nvme_tcp_try_send).
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
MAXH2CDATA is not zero based. Also no reason to limit ourselves to
1M transfers as we can do more easily. Make this an arbitrary limit
of 16M.
Reported-by: Wenhua Liu <liuw@vmware.com>
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Currently, queue io_cpu assignment is done sequentially for default,
read and poll queues based on queue id. This causes miss-alignment between
context of CPU initiating I/O and the I/O worker thread processing
queued requests or completions.
Change to modify queue io_cpu assignment to take into account queue
maps offset. Each queue io_cpu will start at zero for each queue map.
This essentially aligns read/poll queues to start over the same range as
default queues.
Testing performed by Mark with:
- ram device (nvmet)
- single CPU core (pinned)
- 100% 4k reads
- engine io_uring (not using sq_thread option)
- hipri flag set
Micro-benchmark results show a net gain of:
- increase of 18%-29% in IOPs
- reduction of 16%-22% in average latency
- reduction of 7%-23% in 99.99% latency
Baseline:
========
QDepth/Batch | IOPs [k] | Avg. Lat [us] | 99.99% Lat [us]
-----------------------------------------------------------------
1/1 | 32.4 | 30.11 | 50.94
32/8 | 179 | 168.20 | 371
CPU alignment:
=============
QDepth/Batch | IOPs [k] | Avg. Lat [us] | 99.99% Lat [us]
-----------------------------------------------------------------
1/1 | 38.5 | 25.18 | 39.16
32/8 | 231 | 130.75 | 343
Reported-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The timeout handler can use the existing nvme_poll() if it needs to
check a polled queue, allowing nvme_poll_irqdisable() to handle only
irq driven queues for the remaining callers.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Completion handling had been done in two steps: find all new completions
under a lock, then handle those completions outside the lock. This was
done to make the locked section as short as possible so that other
threads using the same lock wait less time.
The driver no longer shares locks during completion, and is in fact
lockless for interrupt driven queues, so the optimization no longer
serves its original purpose. Replace the two-pass completion queue
handler with a single pass that completes entries immediately.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The only user for tagged completion was for timeout handling. That user,
though, really only cares if the timed out command is completed, which
we can safely check within the timeout handler.
Remove the tag check to simplify completion handling.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
For set feature command when setting up NVME_FEAT_NUM_QUEUES, check
Number of I/O Completion Queues Requested (NCQR) and Number of I/O
Submission Queues Requested (NSQR) before we proceed, for invalid values
(i.e. 65535) return an appropriate NVMe invalid field status.
Signed-off-by: Amit Engel <Amit.Engel@dell.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>