Commit Graph

1059871 Commits

Author SHA1 Message Date
Andrii Nakryiko
de5d0dcef6 libbpf: Fix off-by-one bug in bpf_core_apply_relo()
Fix instruction index validity check which has off-by-one error.

Fixes: 3ee4f53355 ("libbpf: Split bpf_core_apply_relo() into bpf_program independent helper.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211025224531.1088894-2-andrii@kernel.org
2021-10-25 18:37:21 -07:00
Ido Schimmel
759635760a mlxsw: pci: Recycle received packet upon allocation failure
When the driver fails to allocate a new Rx buffer, it passes an empty Rx
descriptor (contains zero address and size) to the device and marks it
as invalid by setting the skb pointer in the descriptor's metadata to
NULL.

After processing enough Rx descriptors, the driver will try to process
the invalid descriptor, but will return immediately seeing that the skb
pointer is NULL. Since the driver no longer passes new Rx descriptors to
the device, the Rx queue will eventually become full and the device will
start to drop packets.

Fix this by recycling the received packet if allocation of the new
packet failed. This means that allocation is no longer performed at the
end of the Rx routine, but at the start, before tearing down the DMA
mapping of the received packet.

Remove the comment about the descriptor being zeroed as it is no longer
correct. This is OK because we either use the descriptor as-is (when
recycling) or overwrite its address and size fields with that of the
newly allocated Rx buffer.

The issue was discovered when a process ("perf") consumed too much
memory and put the system under memory pressure. It can be reproduced by
injecting slab allocation failures [1]. After the fix, the Rx queue no
longer comes to a halt.

[1]
 # echo 10 > /sys/kernel/debug/failslab/times
 # echo 1000 > /sys/kernel/debug/failslab/interval
 # echo 100 > /sys/kernel/debug/failslab/probability

 FAULT_INJECTION: forcing a failure.
 name failslab, interval 1000, probability 100, space 0, times 8
 [...]
 Call Trace:
  <IRQ>
  dump_stack_lvl+0x34/0x44
  should_fail.cold+0x32/0x37
  should_failslab+0x5/0x10
  kmem_cache_alloc_node+0x23/0x190
  __alloc_skb+0x1f9/0x280
  __netdev_alloc_skb+0x3a/0x150
  mlxsw_pci_rdq_skb_alloc+0x24/0x90
  mlxsw_pci_cq_tasklet+0x3dc/0x1200
  tasklet_action_common.constprop.0+0x9f/0x100
  __do_softirq+0xb5/0x252
  irq_exit_rcu+0x7a/0xa0
  common_interrupt+0x83/0xa0
  </IRQ>
  asm_common_interrupt+0x1e/0x40
 RIP: 0010:cpuidle_enter_state+0xc8/0x340
 [...]
 mlxsw_spectrum2 0000:06:00.0: Failed to alloc skb for RDQ

Fixes: eda6500a98 ("mlxsw: Add PCI bus implementation")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20211024064014.1060919-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:21:23 -07:00
Bhawanpreet Lakha
41724ea273 drm/amd/display: Add DP 2.0 MST DM Support
[Why]
Add DP2 MST and debugfs support

[How]
Update the slot info based on the link encoding format

Reviewed-by: "Lin, Wayne" <Wayne.Lin@amd.com>
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211025223825.301703-5-lyude@redhat.com
2021-10-25 21:21:09 -04:00
Fangzhi Zuo
d740e0bf8e drm/amd/display: Add DP 2.0 MST DC Support
[Why]
configure/call DC interface for DP2 mst support. This is needed to make DP2
mst work.

[How]
- add encoding type, logging, mst update/reduce payload functions

Use the link encoding to determine the DP type (1.4 or 2.0) and add a
flag to dc_stream_update to determine whether to increase/reduce
payloads.

v2:
* add DP_UNKNOWN_ENCODING handling

Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Reviewed-by: "Lin, Wayne" <Wayne.Lin@amd.com>
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211025223825.301703-4-lyude@redhat.com
2021-10-25 21:21:08 -04:00
Bhawanpreet Lakha
d6c6a76f80 drm: Update MST First Link Slot Information Based on Encoding Format
8b/10b encoding format requires to reserve the first slot for
recording metadata. Real data transmission starts from the second slot,
with a total of available 63 slots available.

In 128b/132b encoding format, metadata is transmitted separately
in LLCP packet before MTP. Real data transmission starts from
the first slot, with a total of 64 slots available.

v2:
* Move total/start slots to mst_state, and copy it to mst_mgr in
atomic_check

v3:
* Only keep the slot info on the mst_state
* add a start_slot parameter to the payload function, to facilitate non
  atomic drivers (this is a temporary workaround and should be removed when
  we are moving out the non atomic driver helpers)

v4:
*fixed typo and formatting

v5: (no functional changes)
* Fixed formatting in drm_dp_mst_update_slots()
* Reference mst_state instead of mst_state->mgr for debugging info

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
[v5 nitpicks]
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211025223825.301703-3-lyude@redhat.com
2021-10-25 21:21:07 -04:00
Bhawanpreet Lakha
0332078398 drm: Remove slot checks in dp mst topology during commit
This code path is used during commit, and we dont expect things to fail
during the commit stage, so remove this.

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211025223825.301703-2-lyude@redhat.com
2021-10-25 21:21:07 -04:00
Jakub Kicinski
e43b76abf7 Merge branch 'tcp-receive-path-optimizations'
Eric Dumazet says:

====================
tcp: receive path optimizations

This series aims to reduce cache line misses in RX path.

I am still working on better cache locality in tcp_sock but
this will wait few more weeks.
====================

Link: https://lore.kernel.org/r/20211025164825.259415-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:17 -07:00
Eric Dumazet
12c8691de3 ipv6/tcp: small drop monitor changes
Two kfree_skb() calls must be replaced by consume_skb()
for skbs that are not technically dropped.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:14 -07:00
Eric Dumazet
020e71a3cf ipv4: guard IP_MINTTL with a static key
RFC 5082 IP_MINTTL option is rarely used on hosts.

Add a static key to remove from TCP fast path useless code,
and potential cache line miss to fetch inet_sk(sk)->min_ttl

Note that once ip4_min_ttl static key has been enabled,
it stays enabled until next boot.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:14 -07:00
Eric Dumazet
14834c4f4e ipv4: annotate data races arount inet->min_ttl
No report yet from KCSAN, yet worth documenting the races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:13 -07:00
Eric Dumazet
790eb67374 ipv6: guard IPV6_MINHOPCOUNT with a static key
RFC 5082 IPV6_MINHOPCOUNT is rarely used on hosts.

Add a static key to remove from TCP fast path useless code,
and potential cache line miss to fetch tcp_inet6_sk(sk)->min_hopcount

Note that once ip6_min_hopcount static key has been enabled,
it stays enabled until next boot.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:13 -07:00
Eric Dumazet
cc17c3c8e8 ipv6: annotate data races around np->min_hopcount
No report yet from KCSAN, yet worth documenting the races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:13 -07:00
Eric Dumazet
09b8984667 net: annotate accesses to sk->sk_rx_queue_mapping
sk->sk_rx_queue_mapping can be modified locklessly,
add a couple of READ_ONCE()/WRITE_ONCE() to document this fact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:13 -07:00
Eric Dumazet
342159ee39 net: avoid dirtying sk->sk_rx_queue_mapping
sk_rx_queue_mapping is located in a cache line that should be kept read mostly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:13 -07:00
Eric Dumazet
2b13af8ade net: avoid dirtying sk->sk_napi_id
sk_napi_id is located in a cache line that can be kept read mostly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:12 -07:00
Eric Dumazet
ef57c1610d ipv6: move inet6_sk(sk)->rx_dst_cookie to sk->sk_rx_dst_cookie
Increase cache locality by moving rx_dst_coookie next to sk->sk_rx_dst

This removes one or two cache line misses in IPv6 early demux (TCP/UDP)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:12 -07:00
Eric Dumazet
0c0a5ef809 tcp: move inet->rx_dst_ifindex to sk->sk_rx_dst_ifindex
Increase cache locality by moving rx_dst_ifindex next to sk->sk_rx_dst

This is part of an effort to reduce cache line misses in TCP fast path.

This removes one cache line miss in early demux.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:12 -07:00
Alexander Lobakin
fd559a943e ax88796c: fix fetching error stats from percpu containers
rx_dropped, tx_dropped, rx_frame_errors and rx_crc_errors are being
wrongly fetched from the target container rather than source percpu
ones.
No idea if that goes from the vendor driver or was brainoed during
the refactoring, but fix it either way.

Fixes: a97c69ba4f ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Łukasz Stelmach <l.stelmach@samsung.com>
Link: https://lore.kernel.org/r/20211023121148.113466-1-alobakin@pm.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 17:33:16 -07:00
Andrii Nakryiko
9327acd0f9 Merge branch 'bpftool: Switch to libbpf's hashmap for referencing BPF objects'
Quentin Monnet says:

====================

When listing BPF objects, bpftool can print a number of properties about
items holding references to these objects. For example, it can show pinned
paths for BPF programs, maps, and links; or programs and maps using a given
BTF object; or the names and PIDs of processes referencing BPF objects. To
collect this information, bpftool uses hash maps (to be clear: the data
structures, inside bpftool - we are not talking of BPF maps). It uses the
implementation available from the kernel, and picks it up from
tools/include/linux/hashtable.h.

This patchset converts bpftool's hash maps to a distinct implementation
instead, the one coming with libbpf. The main motivation for this change is
that it should ease the path towards a potential out-of-tree mirror for
bpftool, like the one libbpf already has. Although it's not perfect to
depend on libbpf's internal components, bpftool is intimately tied with the
library anyway, and this looks better than depending too much on (non-UAPI)
kernel headers.

The first two patches contain preparatory work on the Makefile and on the
initialisation of the hash maps for collecting pinned paths for objects.
Then the transition is split into several steps, one for each kind of
properties for which the collection is backed by hash maps.

v2:
  - Move hashmap cleanup for pinned paths for links from do_detach() to
    do_show().
  - Handle errors on hashmap__append() (in three of the patches).
  - Rename bpftool_hash_fn() and bpftool_equal_fn() as hash_fn_for_key_id()
    and equal_fn_for_key_id(), respectively.
  - Add curly braces for hashmap__for_each_key_entry() { } in
    show_btf_plain() and show_btf_json(), where the flow was difficult to
    read.
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-10-25 17:31:39 -07:00
Quentin Monnet
d6699f8e0f bpftool: Switch to libbpf's hashmap for PIDs/names references
In order to show PIDs and names for processes holding references to BPF
programs, maps, links, or BTF objects, bpftool creates hash maps to
store all relevant information. This commit is part of a set that
transitions from the kernel's hash map implementation to the one coming
with libbpf.

The motivation is to make bpftool less dependent of kernel headers, to
ease the path to a potential out-of-tree mirror, like libbpf has.

This is the third and final step of the transition, in which we convert
the hash maps used for storing the information about the processes
holding references to BPF objects (programs, maps, links, BTF), and at
last we drop the inclusion of tools/include/linux/hashtable.h.

Note: Checkpatch complains about the use of __weak declarations, and the
missing empty lines after the bunch of empty function declarations when
compiling without the BPF skeletons (none of these were introduced in
this patch). We want to keep things as they are, and the reports should
be safe to ignore.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211023205154.6710-6-quentin@isovalent.com
2021-10-25 17:31:39 -07:00
Quentin Monnet
2828d0d75b bpftool: Switch to libbpf's hashmap for programs/maps in BTF listing
In order to show BPF programs and maps using BTF objects when the latter
are being listed, bpftool creates hash maps to store all relevant items.
This commit is part of a set that transitions from the kernel's hash map
implementation to the one coming with libbpf.

The motivation is to make bpftool less dependent of kernel headers, to
ease the path to a potential out-of-tree mirror, like libbpf has.

This commit focuses on the two hash maps used by bpftool when listing
BTF objects to store references to programs and maps, and convert them
to the libbpf's implementation.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211023205154.6710-5-quentin@isovalent.com
2021-10-25 17:31:39 -07:00
Quentin Monnet
8f184732b6 bpftool: Switch to libbpf's hashmap for pinned paths of BPF objects
In order to show pinned paths for BPF programs, maps, or links when
listing them with the "-f" option, bpftool creates hash maps to store
all relevant paths under the bpffs. So far, it would rely on the
kernel implementation (from tools/include/linux/hashtable.h).

We can make bpftool rely on libbpf's implementation instead. The
motivation is to make bpftool less dependent of kernel headers, to ease
the path to a potential out-of-tree mirror, like libbpf has.

This commit is the first step of the conversion: the hash maps for
pinned paths for programs, maps, and links are converted to libbpf's
hashmap.{c,h}. Other hash maps used for the PIDs of process holding
references to BPF objects are left unchanged for now. On the build side,
this requires adding a dependency to a second header internal to libbpf,
and making it a dependency for the bootstrap bpftool version as well.
The rest of the changes are a rather straightforward conversion.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211023205154.6710-4-quentin@isovalent.com
2021-10-25 17:31:38 -07:00
Quentin Monnet
46241271d1 bpftool: Do not expose and init hash maps for pinned path in main.c
BPF programs, maps, and links, can all be listed with their pinned paths
by bpftool, when the "-f" option is provided. To do so, bpftool builds
hash maps containing all pinned paths for each kind of objects.

These three hash maps are always initialised in main.c, and exposed
through main.h. There appear to be no particular reason to do so: we can
just as well make them static to the files that need them (prog.c,
map.c, and link.c respectively), and initialise them only when we want
to show objects and the "-f" switch is provided.

This may prevent unnecessary memory allocations if the implementation of
the hash maps was to change in the future.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211023205154.6710-3-quentin@isovalent.com
2021-10-25 17:31:38 -07:00
Quentin Monnet
8b6c46241c bpftool: Remove Makefile dep. on $(LIBBPF) for $(LIBBPF_INTERNAL_HDRS)
The dependency is only useful to make sure that the $(LIBBPF_HDRS_DIR)
directory is created before we try to install locally the required
libbpf internal header. Let's create this directory properly instead.

This is in preparation of making $(LIBBPF_INTERNAL_HDRS) a dependency to
the bootstrap bpftool version, in which case we want no dependency on
$(LIBBPF).

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211023205154.6710-2-quentin@isovalent.com
2021-10-25 17:31:38 -07:00
Heiner Kallweit
78b5d5c998 cxgb3: Remove seeprom_write and use VPD API
Using the VPD API allows to simplify the code and completely get
rid of t3_seeprom_write().

Link: https://lore.kernel.org/r/a0291004-dda3-ea08-4d6c-a2f8826c8527@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 19:20:22 -05:00
Heiner Kallweit
43f3b61e37 cxgb3: Use VPD API in t3_seeprom_wp()
Use standard VPD API to replace t3_seeprom_write(), this prepares for
removing this function. Chelsio T3 maps the EEPROM write protect flag
to an arbitrary place in VPD address space, therefore we have to use
pci_write_vpd_any().

Link: https://lore.kernel.org/r/f768fdbe-3a16-d539-57d2-c7c908294336@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 19:20:21 -05:00
Heiner Kallweit
48225f1878 cxgb3: Remove t3_seeprom_read and use VPD API
Using the VPD API allows to simplify the code and completely get rid
of t3_seeprom_read(). Note that we don't have to use pci_read_vpd_any()
here because a VPD quirk sets dev->vpd.len to the full EEPROM size.

Tested with a T320 card.

Link: https://lore.kernel.org/r/68ef15bb-b6bf-40ad-160c-aaa72c4a70f8@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 19:20:21 -05:00
Heiner Kallweit
3331325c63 PCI/VPD: Use pci_read_vpd_any() in pci_vpd_size()
Use new function pci_read_vpd_any() to simplify the code.

[bhelgaas: squash in fix for stack overflow reported & tested by
Qian [1] and Kunihiko [2]:
[1] https://lore.kernel.org/netdev/e89087c5-c495-c5ca-feb1-54cf3a8775c5@quicinc.com/
[2] https://lore.kernel.org/r/2f7e3770-ab47-42b5-719c-f7c661c07d28@socionext.com
Link: https://lore.kernel.org/r/6211be8a-5d10-8f3a-6d33-af695dc35caf@gmail.com
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Reported-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Tested-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
]

Link: https://lore.kernel.org/r/049fa71c-c7af-9c69-51c0-05c1bc2bf660@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 19:12:23 -05:00
Cai Huoqing
d238817238 pinctrl: intel: Kconfig: Add configuration menu to Intel pin control
Adding a configuration menu to hold many Intel pin control drivers
helps to make the display more concise.

Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-10-26 01:30:11 +02:00
Prathamesh Shete
a42c7d95d2 pinctrl: tegra: Use correct offset for pin group
Function tegra_pinctrl_gpio_request_enable() and
tegra_pinctrl_gpio_disable_free() uses pin offset instead
of group offset, causing the driver to use wrong offset
to enable gpio.

Add a helper function tegra_pinctrl_get_group() to parse the
pin group and determine correct offset.

Signed-off-by: Kartik K <kkartik@nvidia.com>
Signed-off-by: Prathamesh Shete <pshete@nvidia.com>
Link: https://lore.kernel.org/r/20211025110959.27751-1-pshete@nvidia.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-10-26 01:27:39 +02:00
Christoph Hellwig
3dd60fb9d9 nvdimm/pmem: stop using q_usage_count as external pgmap refcount
Originally all DAX access when through block_device operations and thus
needed a queue reference.  But since commit cccbce6715
("filesystem-dax: convert to dax_direct_access()") all this happens at
the DAX device level which uses its own refcounting.  Having the external
refcount thus wasn't needed but has otherwise been harmless for long
time.

But now that "block: drain file system I/O on del_gendisk" waits for
q_usage_count to reach 0 in del_gendisk this whole scheme can't work
anymore (and pmem is the only driver abusing q_usage_count like that).
So switch to the internal reference and remove the unbalanced
blk_freeze_queue_start that is taken care of by del_gendisk.

Fixes: 8e141f9eb8 ("block: drain file system I/O on del_gendisk")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211019073641.2323410-2-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-25 16:12:32 -07:00
Geert Uytterhoeven
6dbe88e93c m68knommu: Remove MCPU32 config symbol
As of commit a3595962d8 ("m68knommu: remove obsolete 68360
support"), nothing selects MCPU32 anymore.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2021-10-26 08:46:27 +10:00
Randy Dunlap
1aaa557b2d m68k: set a default value for MEMORY_RESERVE
'make randconfig' can produce a .config file with
"CONFIG_MEMORY_RESERVE=" (no value) since it has no default.
When a subsequent 'make all' is done, kconfig restarts the config
and prompts for a value for MEMORY_RESERVE. This breaks
scripting/automation where there is no interactive user input.

Add a default value for MEMORY_RESERVE. (Any integer value will
work here for kconfig.)

Fixes a kconfig warning:

.config:214:warning: symbol value '' invalid for MEMORY_RESERVE
* Restart config...
Memory reservation (MiB) (MEMORY_RESERVE) [] (NEW)

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2") # from beginning of git history
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2021-10-26 08:46:27 +10:00
Qian Cai
95cadae320 fortify: strlen: Avoid shadowing previous locals
The __compiletime_strlen() macro expansion will shadow p_size and p_len
local variables. No callers currently use any of the shadowed names
for their "p" variable, so there are no code generation problems.

Add "__" prefixes to variable definitions __compiletime_strlen() to
avoid new W=2 warnings:

./include/linux/fortify-string.h: In function 'strnlen':
./include/linux/fortify-string.h:17:9: warning: declaration of 'p_size' shadows a previous local [-Wshadow]
   17 |  size_t p_size = __builtin_object_size(p, 1); \
      |         ^~~~~~
./include/linux/fortify-string.h:77:17: note: in expansion of macro '__compiletime_strlen'
   77 |  size_t p_len = __compiletime_strlen(p);
      |                 ^~~~~~~~~~~~~~~~~~~~
./include/linux/fortify-string.h:76:9: note: shadowed declaration is here
   76 |  size_t p_size = __builtin_object_size(p, 1);
      |         ^~~~~~

Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211025210528.261643-1-quic_qiancai@quicinc.com
2021-10-25 15:34:41 -07:00
Alexei Starovoitov
57c8d362ce Merge branch 'Parallelize verif_scale selftests'
Andrii Nakryiko says:

====================

Reduce amount of waiting time when running test_progs in parallel mode (-j) by
splitting bpf_verif_scale selftests into multiple tests. Previously it was
structured as a test with multiple subtests, but subtests are not easily
parallelizable with test_progs' infra. Also in practice each scale subtest is
really an independent test with nothing shared across all substest.

This patch set changes how test_progs test discovery works. Now it is possible
to define multiple tests within a single source code file. One of the patches
also marks tc_redirect selftests as serial, because it's extremely harmful to
the test system when run in parallel mode.
====================

Acked-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-10-25 14:45:52 -07:00
Andrii Nakryiko
3762a39ce8 selftests/bpf: Split out bpf_verif_scale selftests into multiple tests
Instead of using subtests in bpf_verif_scale selftest, turn each scale
sub-test into its own test. Each subtest is compltely independent and
just reuses a bit of common test running logic, so the conversion is
trivial. For convenience, keep all of BPF verifier scale tests in one
file.

This conversion shaves off a significant amount of time when running
test_progs in parallel mode. E.g., just running scale tests (-t verif_scale):

BEFORE
======
Summary: 24/0 PASSED, 0 SKIPPED, 0 FAILED

real    0m22.894s
user    0m0.012s
sys     0m22.797s

AFTER
=====
Summary: 24/0 PASSED, 0 SKIPPED, 0 FAILED

real    0m12.044s
user    0m0.024s
sys     0m27.869s

Ten second saving right there. test_progs -j is not yet ready to be
turned on by default, unfortunately, and some tests fail almost every
time, but this is a good improvement nevertheless. Ignoring few
failures, here is sequential vs parallel run times when running all
tests now:

SEQUENTIAL
==========
Summary: 206/953 PASSED, 4 SKIPPED, 0 FAILED

real    1m5.625s
user    0m4.211s
sys     0m31.650s

PARALLEL
========
Summary: 204/952 PASSED, 4 SKIPPED, 2 FAILED

real    0m35.550s
user    0m4.998s
sys     0m39.890s

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211022223228.99920-5-andrii@kernel.org
2021-10-25 14:45:46 -07:00
Andrii Nakryiko
2c0f51ac32 selftests/bpf: Mark tc_redirect selftest as serial
It seems to cause a lot of harm to kprobe/tracepoint selftests. Yucong
mentioned before that it does manipulate sysfs, which might be the
reason. So let's mark it as serial, though ideally it would be less
intrusive on the system at test.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211022223228.99920-4-andrii@kernel.org
2021-10-25 14:45:46 -07:00
Andrii Nakryiko
8ea688e7f4 selftests/bpf: Support multiple tests per file
Revamp how test discovery works for test_progs and allow multiple test
entries per file. Any global void function with no arguments and
serial_test_ or test_ prefix is considered a test.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211022223228.99920-3-andrii@kernel.org
2021-10-25 14:45:46 -07:00
Andrii Nakryiko
6972dc3b87 selftests/bpf: Normalize selftest entry points
Ensure that all test entry points are global void functions with no
input arguments. Mark few subtest entry points as static.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211022223228.99920-2-andrii@kernel.org
2021-10-25 14:45:45 -07:00
Eric W. Biederman
1fbd60df8a signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
Update save_v86_state to always complete all of it's work except
possibly some of the copies to userspace even if save_v86_state takes
a fault.  This ensures that the kernel is always in a sane state, even
if userspace has done something silly.

When save_v86_state takes a fault update it to force userspace to take
a SIGSEGV and terminate the userspace application.

As Andy pointed out in review of the first version of this change
there are races between sigaction and the application terinating.  Now
that the code has been modified to always perform all save_v86_state's
work (except possibly copying to userspace) those races do not matter
from a kernel perspective.

Forcing the userspace application to terminate (by resetting it's
handler to SIGDFL) is there to keep everything as close to the current
behavior as possible while removing the unique (and difficult to
maintain) use of do_exit.

If this new SIGSEGV happens during handle_signal the next time around
the exit_to_user_mode_loop, SIGSEGV will be delivered to userspace.

All of the callers of handle_vm86_trap and handle_vm86_fault run the
exit_to_user_mode_loop before they return to userspace any signal sent
to the current task during their execution will be delivered to the
current task before that tasks exits to usermode.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: H Peter Anvin <hpa@zytor.com>
v1: https://lkml.kernel.org/r/20211020174406.17889-10-ebiederm@xmission.com
Link: https://lkml.kernel.org/r/877de1xcr6.fsf_-_@disp2133
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-25 15:56:29 -05:00
Eric W. Biederman
1a4d21a23c signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
The function save_v86_state is only called when userspace was
operating in vm86 mode before entering the kernel.  Not having vm86
state in the task_struct should never happen.  So transform the hand
rolled BUG_ON into an actual BUG_ON to make it clear what is
happening.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: H Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20211020174406.17889-9-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-25 15:56:29 -05:00
Eric W. Biederman
984bd71fb3 signal/sparc: In setup_tsb_params convert open coded BUG into BUG
The function setup_tsb_params has exactly one caller tsb_grow.  The
function tsb_grow passes in a tsb_bytes value that is between 8192 and
1048576 inclusive, and is guaranteed to be a power of 2.  The function
setup_tsb_params verifies this property with a switch statement and
then prints an error and causes the task to exit if this is not true.

In practice that print statement can never be reached because tsb_grow
never passes in a bad tsb_size.  So if tsb_size ever gets a bad value
that is a kernel bug.

So replace the do_exit which is effectively an open coded version of
BUG() with an actuall call to BUG().  Making it clearer that this
is a case that can never, and should never happen.

Cc: David Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Link: https://lkml.kernel.org/r/20211020174406.17889-8-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-25 15:56:29 -05:00
Eric W. Biederman
83a1f27ad7 signal/powerpc: On swapcontext failure force SIGSEGV
If the register state may be partial and corrupted instead of calling
do_exit, call force_sigsegv(SIGSEGV).  Which properly kills the
process with SIGSEGV and does not let any more userspace code execute,
instead of just killing one thread of the process and potentially
confusing everything.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 756f1ae8a44e ("PPC32: Rework signal code and add a swapcontext system call.")
Fixes: 04879b04bf50 ("[PATCH] ppc64: VMX (Altivec) support & signal32 rework, from Ben Herrenschmidt")
Link: https://lkml.kernel.org/r/20211020174406.17889-7-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-25 15:56:29 -05:00
Eric W. Biederman
ce0ee4e6ac signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
Today the sh code allocates memory the first time a process uses
the fpu.  If that memory allocation fails, kill the affected task
with force_sig(SIGKILL) rather than do_group_exit(SIGKILL).

Calling do_group_exit from an exception handler can potentially lead
to dead locks as do_group_exit is not designed to be called from
interrupt context.  Instead use force_sig(SIGKILL) to kill the
userspace process.  Sending signals in general and force_sig in
particular has been tested from interrupt context so there should be
no problems.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: linux-sh@vger.kernel.org
Fixes: 0ea820cf9b ("sh: Move over to dynamically allocated FPU context.")
Link: https://lkml.kernel.org/r/20211020174406.17889-6-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-25 15:56:29 -05:00
Eric W. Biederman
95bf9d646c signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
When an instruction to save or restore a register from the stack fails
in _save_fp_context or _restore_fp_context return with -EFAULT.  This
change was made to r2300_fpu.S[1] but it looks like it got lost with
the introduction of EX2[2].  This is also what the other implementation
of _save_fp_context and _restore_fp_context in r4k_fpu.S does, and
what is needed for the callers to be able to handle the error.

Furthermore calling do_exit(SIGSEGV) from bad_stack is wrong because
it does not terminate the entire process it just terminates a single
thread.

As the changed code was the only caller of arch/mips/kernel/syscall.c:bad_stack
remove the problematic and now unused helper function.

Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Maciej Rozycki <macro@orcam.me.uk>
Cc: linux-mips@vger.kernel.org
[1] 35938a00ba ("MIPS: Fix ISA I FP sigcontext access violation handling")
[2] f92722dc45 ("MIPS: Correct MIPS I FP sigcontext layout")
Cc: stable@vger.kernel.org
Fixes: f92722dc45 ("MIPS: Correct MIPS I FP sigcontext layout")
Acked-by: Maciej W. Rozycki <macro@orcam.me.uk>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Link: https://lkml.kernel.org/r/20211020174406.17889-5-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-25 15:55:35 -05:00
Parav Pandit
d67ab0a8c1 net/mlx5: SF_DEV Add SF device trace points
Add SF device add and delete specific trace points.

echo mlx5:mlx5_sf_dev_add >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_dev_del >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_vhca_event >> /sys/kernel/debug/tracing/set_event

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:21 -07:00
Parav Pandit
b3ccada68b net/mlx5: SF, Add SF trace points
Add support for trace events for SFs to improve debugging.
This covers
(a) port add and free trace points
(b) device level trace points
(c) SF hardware context add, free trace points.
(d) SF function activate/deacticate and state trace points

SF events examples:
echo mlx5:mlx5_sf_add >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_free >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_hwc_alloc >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_hwc_free >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_hwc_deferred_free >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_update_state >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_activate >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_deactivate >> /sys/kernel/debug/tracing/set_event

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:21 -07:00
Shay Drory
5546040619 net/mlx5: Let user configure max_macs param
Currently, max_macs is taking 70Kbytes of memory per function. This
size is not needed in all use cases, and is critical with large scale.
Hence, allow user to configure the number of max_macs.

For example, to reduce the number of max_macs to 1, execute::
$ devlink dev param set pci/0000:00:0b.0 name max_macs value 1 \
              cmode driverinit
$ devlink dev reload pci/0000:00:0b.0

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:21 -07:00
Shay Drory
a6cb08daa3 net/mlx5: Let user configure event_eq_size param
Event EQ is an EQ which received the notification of almost all the
events generated by the NIC.
Currently, each event EQ is taking 512KB of memory. This size is not
needed in most use cases, and is critical with large scale. Hence,
allow user to configure the size of the event EQ.

For example to reduce event EQ size to 64, execute::
$ devlink resource set pci/0000:00:0b.0 path /event_eq_size/ size 64
$ devlink dev reload pci/0000:00:0b.0

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:20 -07:00
Shay Drory
46ae40b94d net/mlx5: Let user configure io_eq_size param
Currently, each I/O EQ is taking 128KB of memory. This size
is not needed in all use cases, and is critical with large scale.
Hence, allow user to configure the size of I/O EQs.

For example, to reduce I/O EQ size to 64, execute:
$ devlink resource set pci/0000:00:0b.0 path /io_eq_size/ size 64
$ devlink dev reload pci/0000:00:0b.0

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:20 -07:00