Commit Graph

873808 Commits

Author SHA1 Message Date
Aneesh Kumar K.V
047e6575ae powerpc/mm: Fixup tlbie vs mtpidr/mtlpidr ordering issue on POWER9
On POWER9, under some circumstances, a broadcast TLB invalidation will
fail to invalidate the ERAT cache on some threads when there are
parallel mtpidr/mtlpidr happening on other threads of the same core.
This can cause stores to continue to go to a page after it's unmapped.

The workaround is to force an ERAT flush using PID=0 or LPID=0 tlbie
flush. This additional TLB flush will cause the ERAT cache
invalidation. Since we are using PID=0 or LPID=0, we don't get
filtered out by the TLB snoop filtering logic.

We need to still follow this up with another tlbie to take care of
store vs tlbie ordering issue explained in commit:
a5d4b5891c ("powerpc/mm: Fixup tlbie vs store ordering issue on
POWER9"). The presence of ERAT cache implies we can still get new
stores and they may miss store queue marking flush.

Cc: stable@vger.kernel.org
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190924035254.24612-3-aneesh.kumar@linux.ibm.com
2019-09-24 20:58:55 +10:00
Aneesh Kumar K.V
09ce98cacd powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flag
Rename the #define to indicate this is related to store vs tlbie
ordering issue. In the next patch, we will be adding another feature
flag that is used to handles ERAT flush vs tlbie ordering issue.

Fixes: a5d4b5891c ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com
2019-09-24 20:58:47 +10:00
Aneesh Kumar K.V
677733e296 powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisions
The store ordering vs tlbie issue mentioned in commit
a5d4b5891c ("powerpc/mm: Fixup tlbie vs store ordering issue on
POWER9") is fixed for Nimbus 2.3 and Cumulus 1.3 revisions. We don't
need to apply the fixup if we are running on them

We can only do this on PowerNV. On pseries guest with KVM we still
don't support redoing the feature fixup after migration. So we should
be enabling all the workarounds needed, because whe can possibly
migrate between DD 2.3 and DD 2.2

Fixes: a5d4b5891c ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190924035254.24612-1-aneesh.kumar@linux.ibm.com
2019-09-24 20:57:50 +10:00
Laurent Dufour
59545ebe33 powerpc/pseries: Call H_BLOCK_REMOVE when supported
Depending on the hardware and the hypervisor, the hcall H_BLOCK_REMOVE
may not be able to process all the page sizes for a segment base page
size, as reported by the TLB Invalidate Characteristics.

For each pair of base segment page size and actual page size, this
characteristic tells us the size of the block the hcall supports.

In the case, the hcall is not supporting a pair of base segment page
size, actual page size, it is returning H_PARAM which leads to a panic
like this:

  kernel BUG at /home/srikar/work/linux.git/arch/powerpc/platforms/pseries/lpar.c:466!
  Oops: Exception in kernel mode, sig: 5 [#1]
  BE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 28 PID: 583 Comm: modprobe Not tainted 5.2.0-master #5
  NIP: c0000000000be8dc LR: c0000000000be880 CTR: 0000000000000000
  REGS: c0000007e77fb130 TRAP: 0700  Not tainted (5.2.0-master)
  MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 42224824 XER: 20000000
  CFAR: c0000000000be8fc IRQMASK: 0
  GPR00: 0000000022224828 c0000007e77fb3c0 c000000001434d00 0000000000000005
  GPR04: 9000000004fa8c00 0000000000000000 0000000000000003 0000000000000001
  GPR08: c0000007e77fb450 0000000000000000 0000000000000001 ffffffffffffffff
  GPR12: c0000007e77fb450 c00000000edfcb80 0000cd7d3ea30000 c0000000016022b0
  GPR16: 00000000000000b0 0000cd7d3ea30000 0000000000000001 c080001f04f00105
  GPR20: 0000000000000003 0000000000000004 c000000fbeb05f58 c000000001602200
  GPR24: 0000000000000000 0000000000000004 8800000000000000 c000000000c5d148
  GPR28: c000000000000000 8000000000000000 a000000000000000 c0000007e77fb580
  NIP [c0000000000be8dc] .call_block_remove+0x12c/0x220
  LR [c0000000000be880] .call_block_remove+0xd0/0x220
  Call Trace:
    0xc000000fb8c00240 (unreliable)
    .pSeries_lpar_flush_hash_range+0x578/0x670
    .flush_hash_range+0x44/0x100
    .__flush_tlb_pending+0x3c/0xc0
    .zap_pte_range+0x7ec/0x830
    .unmap_page_range+0x3f4/0x540
    .unmap_vmas+0x94/0x120
    .exit_mmap+0xac/0x1f0
    .mmput+0x9c/0x1f0
    .do_exit+0x388/0xd60
    .do_group_exit+0x54/0x100
    .__se_sys_exit_group+0x14/0x20
    system_call+0x5c/0x70
  Instruction dump:
  39400001 38a00000 4800003c 60000000 60420000 7fa9e800 38e00000 419e0014
  7d29d278 7d290074 7929d182 69270001 <0b070000> 7d495378 394a0001 7fa93040

The call to H_BLOCK_REMOVE should only be made for the supported pair
of base segment page size, actual page size and using the correct
maximum block size.

Due to the required complexity in do_block_remove() and
call_block_remove(), and the fact that currently a block size of 8 is
returned by the hypervisor, we are only supporting 8 size block to the
H_BLOCK_REMOVE hcall.

In order to identify this limitation easily in the code, a local
define HBLKR_SUPPORTED_SIZE defining the currently supported block
size, and a dedicated checking helper is_supported_hlbkr() are
introduced.

For regular pages and hugetlb, the assumption is made that the page
size is equal to the base page size. For THP the page size is assumed
to be 16M.

Fixes: ba2dd8a26b ("powerpc/pseries/mm: call H_BLOCK_REMOVE")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190920130523.20441-3-ldufour@linux.ibm.com
2019-09-24 19:58:47 +10:00
Laurent Dufour
1211ee61b4 powerpc/pseries: Read TLB Block Invalidate Characteristics
The PAPR document specifies the TLB Block Invalidate Characteristics
which tells for each pair of segment base page size, actual page size,
the size of the block the hcall H_BLOCK_REMOVE supports.

These characteristics are loaded at boot time in a new table
hblkr_size. The table is separate from the mmu_psize_def because this
is specific to the pseries platform.

A new init function, pseries_lpar_read_hblkrm_characteristics() is
added to read the characteristics. It is called from
pSeries_setup_arch().

Fixes: ba2dd8a26b ("powerpc/pseries/mm: call H_BLOCK_REMOVE")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190920130523.20441-2-ldufour@linux.ibm.com
2019-09-24 19:58:42 +10:00
Filippo Sironi
0b15e02f0c iommu/amd: Wait for completion of IOTLB flush in attach_device
To make sure the domain tlb flush completes before the
function returns, explicitly wait for its completion.

Signed-off-by: Filippo Sironi <sironi@amazon.de>
Fixes: 42a49f965a ("amd-iommu: flush domain tlb when attaching a new device")
[joro: Added commit message and fixes tag]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-09-24 11:40:04 +02:00
Dmytro Linkin
fe1587a7de net/mlx5e: Fix matching on tunnel addresses type
In mlx5 parse_tunnel_attr() function dispatch on encap IP address type
is performed by directly checking flow_rule_match_key() on
FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS, and then on
FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS. However, since those are stored in
union, first check is always true if any type of encap address is set,
which leads to IPv6 tunnel encap address being parsed as IPv4 by mlx5.
Determine correct IP address type by checking control key first and if
it set, take address type from match.key->addr_type.

Fixes: d1bda7eecd ("net/mlx5e: Allow matching only enc_key_id/enc_dst_port for decapsulation action")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-24 12:38:08 +03:00
Saeed Mahameed
d22fcc806b net/mlx5e: Fix traffic duplication in ethtool steering
Before this patch, when adding multiple ethtool steering rules with
identical classification, the driver used to append the new destination
to the already existing hw rule, which caused the hw to forward the
traffic to all destinations (rx queues).

Here we avoid this by setting the "no append" mlx5 fs core flag when
adding a new ethtool rule.

Fixes: 6dc6071cfc ("net/mlx5e: Add ethtool flow steering support")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-24 12:38:08 +03:00
Bodong Wang
d19a79ee38 net/mlx5: Add device ID of upcoming BlueField-2
Add the device ID of upcoming BlueField-2 integrated ConnectX-6 Dx
network controller. Its VFs will be using the generic VF device ID:
0x101e "ConnectX Family mlx5Gen Virtual Function".

Fixes: 2e9d3e83ab ("net/mlx5: Update the list of the PCI supported devices")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-24 12:38:07 +03:00
Alaa Hleihel
640bdb1fdb net/mlx5: DR, Allow matching on vport based on vhca_id
In case source_eswitch_owner_vhca_id is given as a match,
the source_vport (vhca_id) will be set in case vhca_id_valid.

This will allow matching on peer vports, vports that belong
to the other pf.

Fixes: 26d688e33f ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-24 12:38:07 +03:00
Alex Vesker
48cbde4bd2 net/mlx5: DR, Fix getting incorrect prev node in ste_free
When we free an STE and the STE is in the middle of collision
list, the prev_ste was obtained incorrectly from the list.
To avoid such issues list_entry calls replaced with standard list API.

Fixes: 26d688e33f ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-24 12:38:07 +03:00
Alex Vesker
cc5fd15fc5 net/mlx5: DR, Remove redundant vport number from action
The vport number is part of the vport_cap, there is no reason
to store in a separate variable on the vport.

Fixes: 9db810ed2d ("net/mlx5: DR, Expose steering action functionality")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-24 12:38:07 +03:00
Yevgeny Kliteynik
d32d7c52e0 net/mlx5: DR, Fix SW steering HW bits and definitions
Fix wrong reserved bits offsets.

Fixes: 97b5484ed6 ("net/mlx5: Add HW bits and definitions required for SW steering")
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-24 12:38:06 +03:00
Andrei Dulea
cc449541f2 iommu/amd: Unmap all L7 PTEs when downgrading page-sizes
When replacing a large mapping created with page-mode 7 (i.e.
non-default page size), tear down the entire series of replicated PTEs.
Besides providing access to the old mapping, another thing that might go
wrong with this issue is on the fetch_pte() code path that can return a
PDE entry of the newly re-mapped range.

While at it, make sure that we flush the TLB in case alloc_pte() fails
and returns NULL at a lower level.

Fixes: 6d568ef9a6 ("iommu/amd: Allow downgrading page-sizes in alloc_pte()")
Signed-off-by: Andrei Dulea <adulea@amazon.de>
2019-09-24 11:15:51 +02:00
Andrei Dulea
7f1f1683c1 iommu/amd: Introduce first_pte_l7() helper
Given an arbitrary pte that is part of a large mapping, this function
returns the first pte of the series (and optionally the mapped size and
number of PTEs)
It will be re-used in a subsequent patch to replace an existing L7
mapping.

Fixes: 6d568ef9a6 ("iommu/amd: Allow downgrading page-sizes in alloc_pte()")
Signed-off-by: Andrei Dulea <adulea@amazon.de>
2019-09-24 11:15:46 +02:00
Andrei Dulea
6ccb72f837 iommu/amd: Fix downgrading default page-sizes in alloc_pte()
Downgrading an existing large mapping to a mapping using smaller
page-sizes works only for the mappings created with page-mode 7 (i.e.
non-default page size).

Treat large mappings created with page-mode 0 (i.e. default page size)
like a non-present mapping and allow to overwrite it in alloc_pte().

While around, make sure that we flush the TLB only if we change an
existing mapping, otherwise we might end up acting on garbage PTEs.

Fixes: 6d568ef9a6 ("iommu/amd: Allow downgrading page-sizes in alloc_pte()")
Signed-off-by: Andrei Dulea <adulea@amazon.de>
2019-09-24 11:15:37 +02:00
Andrei Dulea
34c0989c05 iommu/amd: Fix pages leak in free_pagetable()
Take into account the gathered freelist in free_sub_pt(), otherwise we
end up leaking all that pages.

Fixes: 409afa44f9 ("iommu/amd: Introduce free_sub_pt() function")
Signed-off-by: Andrei Dulea <adulea@amazon.de>
2019-09-24 11:15:09 +02:00
Yan-Hsuan Chuang
0b8dc6abbd rtw88: configure firmware after HCI started
After firmware has been downloaded, driver should send
some information to it through H2C commands. Those H2C
commands are transmitted through TX path.

But before HCI has been started, the TX path is not
working completely. Such as PCI interfaces, the interrupts
are not enabled, hence TX interrupts will not be issued
after H2C skb has been DMAed to the device. And the H2C
skbs will not be released until the device is powered off.

Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-09-24 10:55:40 +03:00
Murphy Zhou
63d37fb4ce CIFS: fix max ea value size
It should not be larger then the slab max buf size. If user
specifies a larger size, it passes this check and goes
straightly to SMB2_set_info_init performing an insecure memcpy.

Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-23 23:28:59 -05:00
zhengbin
8559ad8e89 fs/cifs/sess.c: Remove set but not used variable 'capabilities'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/cifs/sess.c: In function sess_auth_lanman:
fs/cifs/sess.c:910:8: warning: variable capabilities set but not used [-Wunused-but-set-variable]

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-23 22:51:57 -05:00
zhengbin
388962e8e9 fs/cifs/smb2pdu.c: Make SMB2_notify_init static
Fix sparse warnings:

fs/cifs/smb2pdu.c:3200:1: warning: symbol 'SMB2_notify_init' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-23 22:49:05 -05:00
Steve French
d2f15428d6 smb3: fix leak in "open on server" perf counter
We were not bumping up the "open on server" (num_remote_opens)
counter (in some cases) on opens of the share root so
could end up showing as a negative value.

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-09-23 22:48:36 -05:00
Quinn Tran
0aabb6b699 scsi: qla2xxx: Fix Nport ID display value
For N2N, the NPort ID is assigned by driver in the PLOGI ELS.  According to
FW Spec the byte order for SID is not the same as DID.

Link: https://lore.kernel.org/r/20190912180918.6436-8-hmadhani@marvell.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 23:09:43 -04:00
Quinn Tran
f3f1938bb6 scsi: qla2xxx: Fix N2N link up fail
During link up/bounce, qla driver would do command flush as part of
cleanup.  In this case, the flush can intefere with FW state.  This patch
allows FW to be in control of link up.

Link: https://lore.kernel.org/r/20190912180918.6436-7-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 23:09:43 -04:00
Quinn Tran
7f2a398d59 scsi: qla2xxx: Fix N2N link reset
Fix stalled link recovery for N2N with FC-NVMe connection.

Link: https://lore.kernel.org/r/20190912180918.6436-6-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 23:09:43 -04:00
Quinn Tran
f5187b7d1a scsi: qla2xxx: Optimize NPIV tear down process
In the case of NPIV port is being torn down, this patch will set a flag to
indicate VPORT_DELETE. This would prevent relogin to be triggered.

Link: https://lore.kernel.org/r/20190912180918.6436-5-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 23:09:43 -04:00
Quinn Tran
fd5564ba54 scsi: qla2xxx: Fix stale mem access on driver unload
On driver unload, 'remove_one' thread was allowed to advance, while session
cleanup still lag behind.  This patch ensures session deletion will finish
before remove_one can advance.

Link: https://lore.kernel.org/r/20190912180918.6436-4-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 23:09:43 -04:00
Quinn Tran
c3b6a1d397 scsi: qla2xxx: Fix unbound sleep in fcport delete path.
There are instances, though rare, where a LOGO request cannot be sent out
and the thread in free session done can wait indefinitely. Fix this by
putting an upper bound to sleep.

Link: https://lore.kernel.org/r/20190912180918.6436-3-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 23:09:42 -04:00
Himanshu Madhani
248a445adf scsi: qla2xxx: Silence fwdump template message
Print if fwdt template is present or not, only when
ql2xextended_error_logging is enabled.

Link: https://lore.kernel.org/r/20190912180918.6436-2-hmadhani@marvell.com
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 23:09:42 -04:00
YueHaibing
4b6b1bb686 scsi: hisi_sas: Make three functions static
Fix sparse warnings:

drivers/scsi/hisi_sas/hisi_sas_main.c:3686:6:
 warning: symbol 'hisi_sas_debugfs_release' was not declared. Should it be static?
drivers/scsi/hisi_sas/hisi_sas_main.c:3708:5:
 warning: symbol 'hisi_sas_debugfs_alloc' was not declared. Should it be static?
drivers/scsi/hisi_sas/hisi_sas_main.c:3799:6:
 warning: symbol 'hisi_sas_debugfs_bist_init' was not declared. Should it be static?

Link: https://lore.kernel.org/r/20190923054035.19036-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 23:09:42 -04:00
Xiang Chen
70054aa39a scsi: megaraid: disable device when probe failed after enabled device
For pci device, need to disable device when probe failed after enabled
device.

Link: https://lore.kernel.org/r/1567818450-173315-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 23:09:42 -04:00
Long Li
0ed8810276 scsi: storvsc: setup 1:1 mapping between hardware queue and CPU queue
storvsc doesn't use a dedicated hardware queue for a given CPU queue. When
issuing I/O, it selects returning CPU (hardware queue) dynamically based on
vmbus channel usage across all channels.

This patch advertises num_present_cpus() as number of hardware queues. This
will have upper layer setup 1:1 mapping between hardware queue and CPU
queue and avoid unnecessary locking when issuing I/O.

Link: https://lore.kernel.org/r/1567790660-48142-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 23:09:42 -04:00
Austin Kim
4b062e7cf9 scsi: qedf: Remove always false 'tmp_prio < 0' statement
Since tmp_prio is declared as u8, the following statement is always false.
   tmp_prio < 0

So remove 'always false' statement.

Link: https://lore.kernel.org/r/20190919075548.GA112801@LGEARND20B15
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 23:09:42 -04:00
Stanley Chu
f51913eef2 scsi: ufs: skip shutdown if hba is not powered
In some cases, hba may go through shutdown flow without successful
initialization and then make system hang.

For example, if ufshcd_change_power_mode() gets error and leads to
ufshcd_hba_exit() to release resources of the host, future shutdown flow
may hang the system since the host register will be accessed in unpowered
state.

To solve this issue, simply add checking to skip shutdown for above kind of
situation.

Link: https://lore.kernel.org/r/1568780438-28753-1-git-send-email-stanley.chu@mediatek.com
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Acked-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 23:09:42 -04:00
Michael Roth
3a83f677a6 KVM: PPC: Book3S HV: use smp_mb() when setting/clearing host_ipi flag
On a 2-socket Power9 system with 32 cores/128 threads (SMT4) and 1TB
of memory running the following guest configs:

  guest A:
    - 224GB of memory
    - 56 VCPUs (sockets=1,cores=28,threads=2), where:
      VCPUs 0-1 are pinned to CPUs 0-3,
      VCPUs 2-3 are pinned to CPUs 4-7,
      ...
      VCPUs 54-55 are pinned to CPUs 108-111

  guest B:
    - 4GB of memory
    - 4 VCPUs (sockets=1,cores=4,threads=1)

with the following workloads (with KSM and THP enabled in all):

  guest A:
    stress --cpu 40 --io 20 --vm 20 --vm-bytes 512M

  guest B:
    stress --cpu 4 --io 4 --vm 4 --vm-bytes 512M

  host:
    stress --cpu 4 --io 4 --vm 2 --vm-bytes 256M

the below soft-lockup traces were observed after an hour or so and
persisted until the host was reset (this was found to be reliably
reproducible for this configuration, for kernels 4.15, 4.18, 5.0,
and 5.3-rc5):

  [ 1253.183290] rcu: INFO: rcu_sched self-detected stall on CPU
  [ 1253.183319] rcu:     124-....: (5250 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=1941
  [ 1256.287426] watchdog: BUG: soft lockup - CPU#105 stuck for 23s! [CPU 52/KVM:19709]
  [ 1264.075773] watchdog: BUG: soft lockup - CPU#24 stuck for 23s! [worker:19913]
  [ 1264.079769] watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [worker:20331]
  [ 1264.095770] watchdog: BUG: soft lockup - CPU#45 stuck for 23s! [worker:20338]
  [ 1264.131773] watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [avocado:19525]
  [ 1280.408480] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
  [ 1316.198012] rcu: INFO: rcu_sched self-detected stall on CPU
  [ 1316.198032] rcu:     124-....: (21003 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=8243
  [ 1340.411024] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
  [ 1379.212609] rcu: INFO: rcu_sched self-detected stall on CPU
  [ 1379.212629] rcu:     124-....: (36756 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=14714
  [ 1404.413615] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791]
  [ 1442.227095] rcu: INFO: rcu_sched self-detected stall on CPU
  [ 1442.227115] rcu:     124-....: (52509 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=21403
  [ 1455.111787] INFO: task worker:19907 blocked for more than 120 seconds.
  [ 1455.111822]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
  [ 1455.111833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1455.111884] INFO: task worker:19908 blocked for more than 120 seconds.
  [ 1455.111905]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
  [ 1455.111925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1455.111966] INFO: task worker:20328 blocked for more than 120 seconds.
  [ 1455.111986]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
  [ 1455.111998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1455.112048] INFO: task worker:20330 blocked for more than 120 seconds.
  [ 1455.112068]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
  [ 1455.112097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1455.112138] INFO: task worker:20332 blocked for more than 120 seconds.
  [ 1455.112159]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
  [ 1455.112179] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1455.112210] INFO: task worker:20333 blocked for more than 120 seconds.
  [ 1455.112231]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
  [ 1455.112242] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1455.112282] INFO: task worker:20335 blocked for more than 120 seconds.
  [ 1455.112303]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1
  [ 1455.112332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1455.112372] INFO: task worker:20336 blocked for more than 120 seconds.
  [ 1455.112392]       Tainted: G             L    5.3.0-rc5-mdr-vanilla+ #1

CPUs 45, 24, and 124 are stuck on spin locks, likely held by
CPUs 105 and 31.

CPUs 105 and 31 are stuck in smp_call_function_many(), waiting on
target CPU 42. For instance:

  # CPU 105 registers (via xmon)
  R00 = c00000000020b20c   R16 = 00007d1bcd800000
  R01 = c00000363eaa7970   R17 = 0000000000000001
  R02 = c0000000019b3a00   R18 = 000000000000006b
  R03 = 000000000000002a   R19 = 00007d537d7aecf0
  R04 = 000000000000002a   R20 = 60000000000000e0
  R05 = 000000000000002a   R21 = 0801000000000080
  R06 = c0002073fb0caa08   R22 = 0000000000000d60
  R07 = c0000000019ddd78   R23 = 0000000000000001
  R08 = 000000000000002a   R24 = c00000000147a700
  R09 = 0000000000000001   R25 = c0002073fb0ca908
  R10 = c000008ffeb4e660   R26 = 0000000000000000
  R11 = c0002073fb0ca900   R27 = c0000000019e2464
  R12 = c000000000050790   R28 = c0000000000812b0
  R13 = c000207fff623e00   R29 = c0002073fb0ca808
  R14 = 00007d1bbee00000   R30 = c0002073fb0ca800
  R15 = 00007d1bcd600000   R31 = 0000000000000800
  pc  = c00000000020b260 smp_call_function_many+0x3d0/0x460
  cfar= c00000000020b270 smp_call_function_many+0x3e0/0x460
  lr  = c00000000020b20c smp_call_function_many+0x37c/0x460
  msr = 900000010288b033   cr  = 44024824
  ctr = c000000000050790   xer = 0000000000000000   trap =  100

CPU 42 is running normally, doing VCPU work:

  # CPU 42 stack trace (via xmon)
  [link register   ] c00800001be17188 kvmppc_book3s_radix_page_fault+0x90/0x2b0 [kvm_hv]
  [c000008ed3343820] c000008ed3343850 (unreliable)
  [c000008ed33438d0] c00800001be11b6c kvmppc_book3s_hv_page_fault+0x264/0xe30 [kvm_hv]
  [c000008ed33439d0] c00800001be0d7b4 kvmppc_vcpu_run_hv+0x8dc/0xb50 [kvm_hv]
  [c000008ed3343ae0] c00800001c10891c kvmppc_vcpu_run+0x34/0x48 [kvm]
  [c000008ed3343b00] c00800001c10475c kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
  [c000008ed3343b90] c00800001c0f5a78 kvm_vcpu_ioctl+0x470/0x7c8 [kvm]
  [c000008ed3343d00] c000000000475450 do_vfs_ioctl+0xe0/0xc70
  [c000008ed3343db0] c0000000004760e4 ksys_ioctl+0x104/0x120
  [c000008ed3343e00] c000000000476128 sys_ioctl+0x28/0x80
  [c000008ed3343e20] c00000000000b388 system_call+0x5c/0x70
  --- Exception: c00 (System Call) at 00007d545cfd7694
  SP (7d53ff7edf50) is in userspace

It was subsequently found that ipi_message[PPC_MSG_CALL_FUNCTION]
was set for CPU 42 by at least 1 of the CPUs waiting in
smp_call_function_many(), but somehow the corresponding
call_single_queue entries were never processed by CPU 42, causing the
callers to spin in csd_lock_wait() indefinitely.

Nick Piggin suggested something similar to the following sequence as
a possible explanation (interleaving of CALL_FUNCTION/RESCHEDULE
IPI messages seems to be most common, but any mix of CALL_FUNCTION and
!CALL_FUNCTION messages could trigger it):

    CPU
      X: smp_muxed_ipi_set_message():
      X:   smp_mb()
      X:   message[RESCHEDULE] = 1
      X: doorbell_global_ipi(42):
      X:   kvmppc_set_host_ipi(42, 1)
      X:   ppc_msgsnd_sync()/smp_mb()
      X:   ppc_msgsnd() -> 42
     42: doorbell_exception(): // from CPU X
     42:   ppc_msgsync()
    105: smp_muxed_ipi_set_message():
    105:   smb_mb()
         // STORE DEFERRED DUE TO RE-ORDERING
  --105:   message[CALL_FUNCTION] = 1
  | 105: doorbell_global_ipi(42):
  | 105:   kvmppc_set_host_ipi(42, 1)
  |  42:   kvmppc_set_host_ipi(42, 0)
  |  42: smp_ipi_demux_relaxed()
  |  42: // returns to executing guest
  |      // RE-ORDERED STORE COMPLETES
  ->105:   message[CALL_FUNCTION] = 1
    105:   ppc_msgsnd_sync()/smp_mb()
    105:   ppc_msgsnd() -> 42
     42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
    105: // hangs waiting on 42 to process messages/call_single_queue

This can be prevented with an smp_mb() at the beginning of
kvmppc_set_host_ipi(), such that stores to message[<type>] (or other
state indicated by the host_ipi flag) are ordered vs. the store to
to host_ipi.

However, doing so might still allow for the following scenario (not
yet observed):

    CPU
      X: smp_muxed_ipi_set_message():
      X:   smp_mb()
      X:   message[RESCHEDULE] = 1
      X: doorbell_global_ipi(42):
      X:   kvmppc_set_host_ipi(42, 1)
      X:   ppc_msgsnd_sync()/smp_mb()
      X:   ppc_msgsnd() -> 42
     42: doorbell_exception(): // from CPU X
     42:   ppc_msgsync()
         // STORE DEFERRED DUE TO RE-ORDERING
  -- 42:   kvmppc_set_host_ipi(42, 0)
  |  42: smp_ipi_demux_relaxed()
  | 105: smp_muxed_ipi_set_message():
  | 105:   smb_mb()
  | 105:   message[CALL_FUNCTION] = 1
  | 105: doorbell_global_ipi(42):
  | 105:   kvmppc_set_host_ipi(42, 1)
  |      // RE-ORDERED STORE COMPLETES
  -> 42:   kvmppc_set_host_ipi(42, 0)
     42: // returns to executing guest
    105:   ppc_msgsnd_sync()/smp_mb()
    105:   ppc_msgsnd() -> 42
     42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored
    105: // hangs waiting on 42 to process messages/call_single_queue

Fixing this scenario would require an smp_mb() *after* clearing
host_ipi flag in kvmppc_set_host_ipi() to order the store vs.
subsequent processing of IPI messages.

To handle both cases, this patch splits kvmppc_set_host_ipi() into
separate set/clear functions, where we execute smp_mb() prior to
setting host_ipi flag, and after clearing host_ipi flag. These
functions pair with each other to synchronize the sender and receiver
sides.

With that change in place the above workload ran for 20 hours without
triggering any lock-ups.

Fixes: 755563bc79 ("powerpc/powernv: Fixes for hypervisor doorbell handling") # v4.0
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190911223155.16045-1-mdroth@linux.vnet.ibm.com
2019-09-24 12:46:26 +10:00
Linus Torvalds
4c07e2ddab Merge tag 'mfd-next-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
 "New Drivers:
   - Add support for Merrifield Basin Cove PMIC

  New Device Support:
   - Add support for Intel Tiger Lake to Intel LPSS PCI
   - Add support for Intel Sky Lake to Intel LPSS PCI
   - Add support for ST-Ericsson DB8520 to DB8500 PRCMU

  New Functionality:
   - Add RTC and PWRC support to MT6323

  Fix-ups:
   - Clean-up include files; davinci_voicecodec, asic3, sm501, mt6397
   - Ignore return values from debugfs_create*(); ab3100-*, ab8500-debugfs, aat2870-core
   - Device Tree changes; rn5t618, mt6397
   - Use new I2C API; tps80031, 88pm860x-core, ab3100-core, bcm590xx,
                      da9150-core, max14577, max77693, max77843, max8907,
                      max8925-i2c, max8997, max8998, palmas, twl-core,
   - Remove obsolete code; da9063, jz4740-adc
   - Simplify semantics; timberdale, htc-i2cpld
   - Add 'fall-through' tags; omap-usb-host, db8500-prcmu
   - Remove superfluous prints; ab8500-debugfs, db8500-prcmu, fsl-imx25-tsadc,
                                intel_soc_pmic_bxtwc, qcom_rpm, sm501
   - Trivial rename/whitespace/typo fixes; mt6397-core, MAINTAINERS
   - Reorganise code structure; mt6397-*
   - Improve code consistency; intel-lpss
   - Use MODULE_SOFTDEP() helper; intel-lpss
   - Use DEFINE_RES_*() helpers; mt6397-core

  Bug Fixes:
   - Clean-up resources; max77620
   - Prevent input events being dropped on resume; intel-lpss-pci
   - Prevent sleeping in IRQ context; ezx-pcap"

* tag 'mfd-next-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (48 commits)
  mfd: mt6323: Add MT6323 RTC and PWRC
  mfd: mt6323: Replace boilerplate resource code with DEFINE_RES_* macros
  mfd: mt6397: Add mutex include
  dt-bindings: mfd: mediatek: Add MT6323 Power Controller
  dt-bindings: mfd: mediatek: Update RTC to include MT6323
  dt-bindings: mfd: mediatek: mt6397: Change to relative paths
  mfd: db8500-prcmu: Support the higher DB8520 ARMSS
  mfd: intel-lpss: Use MODULE_SOFTDEP() instead of implicit request
  mfd: htc-i2cpld: Drop check because i2c_unregister_device() is NULL safe
  mfd: sm501: Include the GPIO driver header
  mfd: intel-lpss: Add Intel Skylake ACPI IDs
  mfd: intel-lpss: Consistently use GENMASK()
  mfd: Add support for Merrifield Basin Cove PMIC
  mfd: ezx-pcap: Replace mutex_lock with spin_lock
  mfd: asic3: Include the right header
  MAINTAINERS: altera-sysmgr: Fix typo in a filepath
  mfd: mt6397: Extract IRQ related code from core driver
  mfd: mt6397: Rename macros to something more readable
  mfd: Remove dev_err() usage after platform_get_irq()
  mfd: db8500-prcmu: Mark expected switch fall-throughs
  ...
2019-09-23 19:37:49 -07:00
Linus Torvalds
d0b3cfee33 Merge tag 'backlight-next-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
 "Core Frameworks
   - Obtain scale type through sysfs

  New Functionality:
   - Provide Device Tree functionality in rave-sp-backlight
   - Calculate if scale type is (non-)linear in pwm_bl

  Fix-ups:
   - Simplify code in lm3630a_bl
   - Trivial rename/whitespace/typo fixes in lms283gf05
   - Remove superfluous NULL check in tosa_lcd
   - Fix power state initialisation in gpio_backlight
   - List supported file in MAINTAINERS

  Bug Fixes:
   - Kconfig - default to not building unless requested in
     {LED,BACKLIGHT}_CLASS_DEVICE"

* tag 'backlight-next-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
  backlight: pwm_bl: Set scale type for brightness curves specified in the DT
  backlight: pwm_bl: Set scale type for CIE 1931 curves
  backlight: Expose brightness curve type through sysfs
  MAINTAINERS: Add entry for stable backlight sysfs ABI documentation
  backlight: gpio-backlight: Correct initial power state handling
  video: backlight: tosa_lcd: drop check because i2c_unregister_device() is NULL safe
  video: backlight: Drop default m for {LCD,BACKLIGHT_CLASS_DEVICE}
  backlight: lms283gf05: Fix a typo in the description passed to 'devm_gpio_request_one()'
  backlight: lm3630a: Switch to use fwnode_property_count_uXX()
  backlight: rave-sp: Leave initial state and register with correct device
2019-09-23 19:33:56 -07:00
Linus Torvalds
299d14d4c3 Merge tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Consolidate _HPP/_HPX stuff in pci-acpi.c and simplify it
     (Krzysztof Wilczynski)

   - Fix incorrect PCIe device types and remove dev->has_secondary_link
     to simplify code that deals with upstream/downstream ports (Mika
     Westerberg)

   - After suspend, restore Resizable BAR size bits correctly for 1MB
     BARs (Sumit Saxena)

   - Enable PCI_MSI_IRQ_DOMAIN support for RISC-V (Wesley Terpstra)

  Virtualization:

   - Add ACS quirks for iProc PAXB (Abhinav Ratna), Amazon Annapurna
     Labs (Ali Saidi)

   - Move sysfs SR-IOV functions to iov.c (Kelsey Skunberg)

   - Remove group write permissions from sysfs sriov_numvfs,
     sriov_drivers_autoprobe (Kelsey Skunberg)

  Hotplug:

   - Simplify pciehp indicator control (Denis Efremov)

  Peer-to-peer DMA:

   - Allow P2P DMA between root ports for whitelisted bridges (Logan
     Gunthorpe)

   - Whitelist some Intel host bridges for P2P DMA (Logan Gunthorpe)

   - DMA map P2P DMA requests that traverse host bridge (Logan
     Gunthorpe)

  Amazon Annapurna Labs host bridge driver:

   - Add DT binding and controller driver (Jonathan Chocron)

  Hyper-V host bridge driver:

   - Fix hv_pci_dev->pci_slot use-after-free (Dexuan Cui)

   - Fix PCI domain number collisions (Haiyang Zhang)

   - Use instance ID bytes 4 & 5 as PCI domain numbers (Haiyang Zhang)

   - Fix build errors on non-SYSFS config (Randy Dunlap)

  i.MX6 host bridge driver:

   - Limit DBI register length (Stefan Agner)

  Intel VMD host bridge driver:

   - Fix config addressing issues (Jon Derrick)

  Layerscape host bridge driver:

   - Add bar_fixed_64bit property to endpoint driver (Xiaowei Bao)

   - Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC drivers separately
     (Xiaowei Bao)

  Mediatek host bridge driver:

   - Add MT7629 controller support (Jianjun Wang)

  Mobiveil host bridge driver:

   - Fix CPU base address setup (Hou Zhiqiang)

   - Make "num-lanes" property optional (Hou Zhiqiang)

  Tegra host bridge driver:

   - Fix OF node reference leak (Nishka Dasgupta)

   - Disable MSI for root ports to work around design problem (Vidya
     Sagar)

   - Add Tegra194 DT binding and controller support (Vidya Sagar)

   - Add support for sideband pins and slot regulators (Vidya Sagar)

   - Add PIPE2UPHY support (Vidya Sagar)

  Misc:

   - Remove unused pci_block_cfg_access() et al (Kelsey Skunberg)

   - Unexport pci_bus_get(), etc (Kelsey Skunberg)

   - Hide PM, VC, link speed, ATS, ECRC, PTM constants and interfaces in
     the PCI core (Kelsey Skunberg)

   - Clean up sysfs DEVICE_ATTR() usage (Kelsey Skunberg)

   - Mark expected switch fall-through (Gustavo A. R. Silva)

   - Propagate errors for optional regulators and PHYs (Thierry Reding)

   - Fix kernel command line resource_alignment parameter issues (Logan
     Gunthorpe)"

* tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (112 commits)
  PCI: Add pci_irq_vector() and other stubs when !CONFIG_PCI
  arm64: tegra: Add PCIe slot supply information in p2972-0000 platform
  arm64: tegra: Add configuration for PCIe C5 sideband signals
  PCI: tegra: Add support to enable slot regulators
  PCI: tegra: Add support to configure sideband pins
  PCI: vmd: Fix shadow offsets to reflect spec changes
  PCI: vmd: Fix config addressing when using bus offsets
  PCI: dwc: Add validation that PCIe core is set to correct mode
  PCI: dwc: al: Add Amazon Annapurna Labs PCIe controller driver
  dt-bindings: PCI: Add Amazon's Annapurna Labs PCIe host bridge binding
  PCI: Add quirk to disable MSI-X support for Amazon's Annapurna Labs Root Port
  PCI/VPD: Prevent VPD access for Amazon's Annapurna Labs Root Port
  PCI: Add ACS quirk for Amazon Annapurna Labs root ports
  PCI: Add Amazon's Annapurna Labs vendor ID
  MAINTAINERS: Add PCI native host/endpoint controllers designated reviewer
  PCI: hv: Use bytes 4 and 5 from instance ID as the PCI domain numbers
  dt-bindings: PCI: tegra: Add PCIe slot supplies regulator entries
  dt-bindings: PCI: tegra: Add sideband pins configuration entries
  PCI: tegra: Add Tegra194 PCIe support
  PCI: Get rid of dev->has_secondary_link flag
  ...
2019-09-23 19:16:01 -07:00
Laurence Oberman
c9c5374937 scsi: bnx2fc: Handle scope bits when array returns BUSY or TSF
The qla2xxx driver had this issue as well when the newer array firmware
returned the retry_delay_timer in the fcp_rsp.  The bnx2fc is not handling
the masking of the scope bits either so the retry_delay_timestamp value
lands up being a large value added to the timer timestamp delaying I/O for
up to 27 Minutes.  This patch adds similar code to handle this to the
bnx2fc driver to avoid the huge delay.

Link: https://lore.kernel.org/r/1568210202-12794-1-git-send-email-loberman@redhat.com
Signed-off-by: Laurence Oberman <loberman@redhat.com>
Reported-by: David Jeffery <djeffery@redhat.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 22:13:30 -04:00
Zhang Rui
0f84d1d18c Merge branches 'thermal-core', 'thermal-intel' and 'thermal-soc' into for-5.4 2019-09-24 09:56:37 +08:00
Amit Kucheria
bf8ca04d8b MAINTAINERS: Add Amit Kucheria as reviewer for thermal
Add Amit Kucheria as the reviewer for thermal as he would like to
participate in the review process effort for the thermal framework.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-09-24 09:56:08 +08:00
Amit Kucheria
67eed44b8a thermal: Add some error messages
When registering a thermal zone device, we currently return -EINVAL in
four cases. This makes it a little hard to debug the real cause of the
failure.

Print some error messages to make it easier for developer to figure out
what happened.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-09-24 09:56:08 +08:00
Ido Schimmel
1851799e1d thermal: Fix use-after-free when unregistering thermal zone device
thermal_zone_device_unregister() cancels the delayed work that polls the
thermal zone, but it does not wait for it to finish. This is racy with
respect to the freeing of the thermal zone device, which can result in a
use-after-free [1].

Fix this by waiting for the delayed work to finish before freeing the
thermal zone device. Note that thermal_zone_device_set_polling() is
never invoked from an atomic context, so it is safe to call
cancel_delayed_work_sync() that can block.

[1]
[  +0.002221] ==================================================================
[  +0.000064] BUG: KASAN: use-after-free in __mutex_lock+0x1076/0x11c0
[  +0.000016] Read of size 8 at addr ffff8881e48e0450 by task kworker/1:0/17

[  +0.000023] CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.2.0-rc6-custom-02495-g8e73ca3be4af #1701
[  +0.000010] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[  +0.000016] Workqueue: events_freezable_power_ thermal_zone_device_check
[  +0.000012] Call Trace:
[  +0.000021]  dump_stack+0xa9/0x10e
[  +0.000020]  print_address_description.cold.2+0x9/0x25e
[  +0.000018]  __kasan_report.cold.3+0x78/0x9d
[  +0.000016]  kasan_report+0xe/0x20
[  +0.000016]  __mutex_lock+0x1076/0x11c0
[  +0.000014]  step_wise_throttle+0x72/0x150
[  +0.000018]  handle_thermal_trip+0x167/0x760
[  +0.000019]  thermal_zone_device_update+0x19e/0x5f0
[  +0.000019]  process_one_work+0x969/0x16f0
[  +0.000017]  worker_thread+0x91/0xc40
[  +0.000014]  kthread+0x33d/0x400
[  +0.000015]  ret_from_fork+0x3a/0x50

[  +0.000020] Allocated by task 1:
[  +0.000015]  save_stack+0x19/0x80
[  +0.000015]  __kasan_kmalloc.constprop.4+0xc1/0xd0
[  +0.000014]  kmem_cache_alloc_trace+0x152/0x320
[  +0.000015]  thermal_zone_device_register+0x1b4/0x13a0
[  +0.000015]  mlxsw_thermal_init+0xc92/0x23d0
[  +0.000014]  __mlxsw_core_bus_device_register+0x659/0x11b0
[  +0.000013]  mlxsw_core_bus_device_register+0x3d/0x90
[  +0.000013]  mlxsw_pci_probe+0x355/0x4b0
[  +0.000014]  local_pci_probe+0xc3/0x150
[  +0.000013]  pci_device_probe+0x280/0x410
[  +0.000013]  really_probe+0x26a/0xbb0
[  +0.000013]  driver_probe_device+0x208/0x2e0
[  +0.000013]  device_driver_attach+0xfe/0x140
[  +0.000013]  __driver_attach+0x110/0x310
[  +0.000013]  bus_for_each_dev+0x14b/0x1d0
[  +0.000013]  driver_register+0x1c0/0x400
[  +0.000015]  mlxsw_sp_module_init+0x5d/0xd3
[  +0.000014]  do_one_initcall+0x239/0x4dd
[  +0.000013]  kernel_init_freeable+0x42b/0x4e8
[  +0.000012]  kernel_init+0x11/0x18b
[  +0.000013]  ret_from_fork+0x3a/0x50

[  +0.000015] Freed by task 581:
[  +0.000013]  save_stack+0x19/0x80
[  +0.000014]  __kasan_slab_free+0x125/0x170
[  +0.000013]  kfree+0xf3/0x310
[  +0.000013]  thermal_release+0xc7/0xf0
[  +0.000014]  device_release+0x77/0x200
[  +0.000014]  kobject_put+0x1a8/0x4c0
[  +0.000014]  device_unregister+0x38/0xc0
[  +0.000014]  thermal_zone_device_unregister+0x54e/0x6a0
[  +0.000014]  mlxsw_thermal_fini+0x184/0x35a
[  +0.000014]  mlxsw_core_bus_device_unregister+0x10a/0x640
[  +0.000013]  mlxsw_devlink_core_bus_device_reload+0x92/0x210
[  +0.000015]  devlink_nl_cmd_reload+0x113/0x1f0
[  +0.000014]  genl_family_rcv_msg+0x700/0xee0
[  +0.000013]  genl_rcv_msg+0xca/0x170
[  +0.000013]  netlink_rcv_skb+0x137/0x3a0
[  +0.000012]  genl_rcv+0x29/0x40
[  +0.000013]  netlink_unicast+0x49b/0x660
[  +0.000013]  netlink_sendmsg+0x755/0xc90
[  +0.000013]  __sys_sendto+0x3de/0x430
[  +0.000013]  __x64_sys_sendto+0xe2/0x1b0
[  +0.000013]  do_syscall_64+0xa4/0x4d0
[  +0.000013]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  +0.000017] The buggy address belongs to the object at ffff8881e48e0008
               which belongs to the cache kmalloc-2k of size 2048
[  +0.000012] The buggy address is located 1096 bytes inside of
               2048-byte region [ffff8881e48e0008, ffff8881e48e0808)
[  +0.000007] The buggy address belongs to the page:
[  +0.000012] page:ffffea0007923800 refcount:1 mapcount:0 mapping:ffff88823680d0c0 index:0x0 compound_mapcount: 0
[  +0.000020] flags: 0x200000000010200(slab|head)
[  +0.000019] raw: 0200000000010200 ffffea0007682008 ffffea00076ab808 ffff88823680d0c0
[  +0.000016] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
[  +0.000007] page dumped because: kasan: bad access detected

[  +0.000012] Memory state around the buggy address:
[  +0.000012]  ffff8881e48e0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000012]  ffff8881e48e0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000012] >ffff8881e48e0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000008]                                                  ^
[  +0.000012]  ffff8881e48e0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000012]  ffff8881e48e0500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000007] ==================================================================

Fixes: b1569e99c7 ("ACPI: move thermal trip handling to generic thermal layer")
Reported-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-09-24 09:56:08 +08:00
Yue Hu
adc8749b15 thermal/drivers/core: Use put_device() if device_register() fails
Never directly free @dev after calling device_register(), even if it
returned an error! Always use put_device() to give up the reference
initialized. Clean up the rollback block also.

Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-09-24 09:56:08 +08:00
Stefan Mavrodiev
8c7aa18428 thermal_hwmon: Sanitize thermal_zone type
When calling thermal_add_hwmon_sysfs(), the device type is sanitized by
replacing '-' with '_'. However tz->type remains unsanitized. Thus
calling thermal_hwmon_lookup_by_type() returns no device. And if there is
no device, thermal_remove_hwmon_sysfs() fails with "hwmon device lookup
failed!".

The result is unregisted hwmon devices in the sysfs.

Fixes: 409ef0baca ("thermal_hwmon: Sanitize attribute name passed to hwmon")

Signed-off-by: Stefan Mavrodiev <stefan@olimex.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-09-24 09:56:08 +08:00
Chuhong Yuan
97e9cafe85 thermal: intel: Use dev_get_drvdata
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-09-24 09:55:51 +08:00
Rishi Gupta
4c8a342c11 thermal: intel: int3403: replace printk(KERN_WARN...) with pr_warn(...)
Direct invocation of printk() is not preferred to emit logs.
This commit replaces printk(KERN_WARNING) with corresponding
pr_warn() function call.

Signed-off-by: Rishi Gupta <gupt21@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-09-24 09:53:03 +08:00
Kelsey Skunberg
f639cff55f thermal: intel: int340x_thermal: Remove unnecessary acpi_has_method() uses
acpi_evaluate_object() will already return in error if the method does not
exist. Checking if the method is absent before the acpi_evaluate_object()
call is not needed. Remove acpi_has_method() calls to avoid additional
work.

Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-09-24 09:53:03 +08:00
Srinivas Pandruvada
c669675b56 thermal: int340x: processor_thermal: Add Ice Lake support
Add new PCI id for Ice lake processor thermal device. Also enabled
the RAPL mmio interface. The MMIO offsets match Skylake.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-09-24 09:53:03 +08:00
Steffen Maier
6b6fa7a5c8 scsi: core: fix dh and multipathing for SCSI hosts without request batching
This was missing from scsi_device_from_queue() due to the introduction of
another new scsi_mq_ops_no_commit of linux-next commit 8930a6c207 ("scsi:
core: add support for request batching") from Martin's scsi/5.4/scsi-queue
or James' scsi/misc.

Only devicehandler code seems to call scsi_device_from_queue():
*** drivers/scsi/scsi_dh.c:
scsi_dh_activate[255]          sdev = scsi_device_from_queue(q);
scsi_dh_set_params[302]        sdev = scsi_device_from_queue(q);
scsi_dh_attach[325]            sdev = scsi_device_from_queue(q);
scsi_dh_attached_handler_name[363] sdev = scsi_device_from_queue(q);

Fixes multipath tools follow-on errors:

$ multipath -v6
...
libdevmapper: ioctl/libdm-iface.c(1887): device-mapper: reload ioctl on mpatha  failed: No such device
...
mpatha: failed to load map, error 19
...

showing also as kernel messages:

device-mapper: table: 252:0: multipath: error attaching hardware handler
device-mapper: ioctl: error adding target to table

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 8930a6c207 ("scsi: core: add support for request batching")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-23 21:34:34 -04:00