Commit Graph

1062067 Commits

Author SHA1 Message Date
Aleksander Jan Bajkowski
5be60a9453 net: lantiq_xrx200: fix statistics of received bytes
Received frames have FCS truncated. There is no need
to subtract FCS length from the statistics.

Fixes: fe1a56420c ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-28 12:18:18 +00:00
Christophe JAILLET
1cd5384c88 net: ag71xx: Fix a potential double free in error handling paths
'ndev' is a managed resource allocated with devm_alloc_etherdev(), so there
is no need to call free_netdev() explicitly or there will be a double
free().

Simplify all error handling paths accordingly.

Fixes: d51b6ce441 ("net: ethernet: add ag71xx driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-28 12:16:34 +00:00
wolfgang huang
8b5fdfc57c mISDN: change function names to avoid conflicts
As we build for mips, we meet following error. l1_init error with
multiple definition. Some architecture devices usually marked with
l1, l2, lxx as the start-up phase. so we change the mISDN function
names, align with Isdnl2_xxx.

mips-linux-gnu-ld: drivers/isdn/mISDN/layer1.o: in function `l1_init':
(.text+0x890): multiple definition of `l1_init'; \
arch/mips/kernel/bmips_5xxx_init.o:(.text+0xf0): first defined here
make[1]: *** [home/mips/kernel-build/linux/Makefile:1161: vmlinux] Error 1

Signed-off-by: wolfgang huang <huangjinhui@kylinos.cn>
Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-28 12:12:04 +00:00
Linus Torvalds
a8ad9a2434 Another EFI fix for v5.16
- Prevent missing prototype warning from breaking the build under
   CONFIG_WERROR=y
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmHJrdYACgkQw08iOZLZ
 jyRS/wwAvrrv6xidZNqm/BAr4bhY9saQ4BZTracZuHJwD7300H9mjlMDMnVbLhOU
 oVv9hEHDqBqBKgZxIYc0ok4SMZEpieFhLuTdXzVf5dMg+BL+qHR0EQp8kLp8mssL
 xbPnMidecL2TfjPqLn34MVPe46uQo2yioxET4hDvWA5xsWNKKd0AeKVzUzdV2AVE
 elnHsdsBWpCexaIJkQEFJWf/Y3xUlzcqp7zcqgJq4hq3fZlEbD8SJ7TUM7ReAQJ6
 5afzYXBm086sn602hZVbAtOy94vLBBQswnbKntmFpcT+z/GLNjycx+A4Z2h+6jjx
 ImiIkNZA2N5ij12riXdx3n6OG0LMuILlRhhqyckgrgHyc1oYCwaUq+LOuVQENZF+
 YNGfyntQ+Y0F9dfG28atnFiL5ZFZcCSJxVZ3Dl2bql2DHptbQNHOBdXKOcDWTkpc
 JKTs3j3rigZ1I01b1SgWzH418do8s8y4iv4nmp3EdMUO8v8rEFMx3yHIHKSRjMqj
 1vAPZtWQ
 =coEy
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent-for-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fix from Ard Biesheuvel:
 "Another EFI fix for v5.16:

   - Prevent missing prototype warning from breaking the build under
     CONFIG_WERROR=y"

* tag 'efi-urgent-for-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: Move efifb_setup_from_dmi() prototype from arch headers
2021-12-27 08:58:35 -08:00
Tom Rix
732bc2ff08 selinux: initialize proto variable in selinux_ip_postroute_compat()
Clang static analysis reports this warning

hooks.c:5765:6: warning: 4th function call argument is an uninitialized
                value
        if (selinux_xfrm_postroute_last(sksec->sid, skb, &ad, proto))
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

selinux_parse_skb() can return ok without setting proto.  The later call
to selinux_xfrm_postroute_last() does an early check of proto and can
return ok if the garbage proto value matches.  So initialize proto.

Cc: stable@vger.kernel.org
Fixes: eef9b41622 ("selinux: cleanup selinux_xfrm_sock_rcv_skb() and selinux_xfrm_postroute_last()")
Signed-off-by: Tom Rix <trix@redhat.com>
[PM: typo/spelling and checkpatch.pl description fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-12-27 10:41:20 -05:00
Krzysztof Kozlowski
79b69a8370 nfc: uapi: use kernel size_t to fix user-space builds
Fix user-space builds if it includes /usr/include/linux/nfc.h before
some of other headers:

  /usr/include/linux/nfc.h:281:9: error: unknown type name ‘size_t’
    281 |         size_t service_name_len;
        |         ^~~~~~

Fixes: d646960f79 ("NFC: Initial LLCP support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 14:58:37 +00:00
Dmitry V. Levin
7175f02c4e uapi: fix linux/nfc.h userspace compilation errors
Replace sa_family_t with __kernel_sa_family_t to fix the following
linux/nfc.h userspace compilation errors:

/usr/include/linux/nfc.h:266:2: error: unknown type name 'sa_family_t'
  sa_family_t sa_family;
/usr/include/linux/nfc.h:274:2: error: unknown type name 'sa_family_t'
  sa_family_t sa_family;

Fixes: 23b7869c0f ("NFC: add the NFC socket raw protocol")
Fixes: d646960f79 ("NFC: Initial LLCP support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 14:58:03 +00:00
Wen Zhiwei
b4aadd2073 net:Remove initialization of static variables to 0
Delete the initialization of three static variables
because it is meaningless.

Signed-off-by: Wen Zhiwei <wenzhiwei@kylinos.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 14:53:00 +00:00
Matthias-Christian Ott
ca506fca46 net: usb: pegasus: Do not drop long Ethernet frames
The D-Link DSB-650TX (2001:4002) is unable to receive Ethernet frames
that are longer than 1518 octets, for example, Ethernet frames that
contain 802.1Q VLAN tags.

The frames are sent to the pegasus driver via USB but the driver
discards them because they have the Long_pkt field set to 1 in the
received status report. The function read_bulk_callback of the pegasus
driver treats such received "packets" (in the terminology of the
hardware) as errors but the field simply does just indicate that the
Ethernet frame (MAC destination to FCS) is longer than 1518 octets.

It seems that in the 1990s there was a distinction between
"giant" (> 1518) and "runt" (< 64) frames and the hardware includes
flags to indicate this distinction. It seems that the purpose of the
distinction "giant" frames was to not allow infinitely long frames due
to transmission errors and to allow hardware to have an upper limit of
the frame size. However, the hardware already has such limit with its
2048 octet receive buffer and, therefore, Long_pkt is merely a
convention and should not be treated as a receive error.

Actually, the hardware is even able to receive Ethernet frames with 2048
octets which exceeds the claimed limit frame size limit of the driver of
1536 octets (PEGASUS_MTU).

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Matthias-Christian Ott <ott@mirix.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 14:52:06 +00:00
Zekun Shen
5f50153288 atlantic: Fix buff_ring OOB in aq_ring_rx_clean
The function obtain the next buffer without boundary check.
We should return with I/O error code.

The bug is found by fuzzing and the crash report is attached.
It is an OOB bug although reported as use-after-free.

[    4.804724] BUG: KASAN: use-after-free in aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.805661] Read of size 4 at addr ffff888034fe93a8 by task ksoftirqd/0/9
[    4.806505]
[    4.806703] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G        W         5.6.0 #34
[    4.809030] Call Trace:
[    4.809343]  dump_stack+0x76/0xa0
[    4.809755]  print_address_description.constprop.0+0x16/0x200
[    4.810455]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.811234]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.813183]  __kasan_report.cold+0x37/0x7c
[    4.813715]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.814393]  kasan_report+0xe/0x20
[    4.814837]  aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[    4.815499]  ? hw_atl_b0_hw_ring_rx_receive+0x9a5/0xb90 [atlantic]
[    4.816290]  aq_vec_poll+0x179/0x5d0 [atlantic]
[    4.816870]  ? _GLOBAL__sub_I_65535_1_aq_pci_func_init+0x20/0x20 [atlantic]
[    4.817746]  ? __next_timer_interrupt+0xba/0xf0
[    4.818322]  net_rx_action+0x363/0xbd0
[    4.818803]  ? call_timer_fn+0x240/0x240
[    4.819302]  ? __switch_to_asm+0x40/0x70
[    4.819809]  ? napi_busy_loop+0x520/0x520
[    4.820324]  __do_softirq+0x18c/0x634
[    4.820797]  ? takeover_tasklets+0x5f0/0x5f0
[    4.821343]  run_ksoftirqd+0x15/0x20
[    4.821804]  smpboot_thread_fn+0x2f1/0x6b0
[    4.822331]  ? smpboot_unregister_percpu_thread+0x160/0x160
[    4.823041]  ? __kthread_parkme+0x80/0x100
[    4.823571]  ? smpboot_unregister_percpu_thread+0x160/0x160
[    4.824301]  kthread+0x2b5/0x3b0
[    4.824723]  ? kthread_create_on_node+0xd0/0xd0
[    4.825304]  ret_from_fork+0x35/0x40

Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 14:49:53 +00:00
yangxingwu
6c25449e1a net: udp: fix alignment problem in udp4_seq_show()
$ cat /pro/net/udp

before:

  sl  local_address rem_address   st tx_queue rx_queue tr tm->when
26050: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000
26320: 0100007F:0143 00000000:0000 07 00000000:00000000 00:00000000
27135: 00000000:8472 00000000:0000 07 00000000:00000000 00:00000000

after:

   sl  local_address rem_address   st tx_queue rx_queue tr tm->when
26050: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000
26320: 0100007F:0143 00000000:0000 07 00000000:00000000 00:00000000
27135: 00000000:8472 00000000:0000 07 00000000:00000000 00:00000000

Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 14:47:14 +00:00
Karsten Graul
6d7373dabf net/smc: fix using of uninitialized completions
In smc_wr_tx_send_wait() the completion on index specified by
pend->idx is initialized and after smc_wr_tx_send() was called the wait
for completion starts. pend->idx is used to get the correct index for
the wait, but the pend structure could already be cleared in
smc_wr_tx_process_cqe().
Introduce pnd_idx to hold and use a local copy of the correct index.

Fixes: 09c61d24f9 ("net/smc: wait for departure of an IB message")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 14:46:17 +00:00
Remi Pommarel
fd3a459000 net: bridge: Get SIOCGIFBR/SIOCSIFBR ioctl working in compat mode
In compat mode SIOC{G,S}IFBR ioctls were only supporting
BRCTL_GET_VERSION returning an artificially version to spur userland
tool to use SIOCDEVPRIVATE instead. But some userland tools ignore that
and use SIOC{G,S}IFBR unconditionally as seen with busybox's brctl.

Example of non working 32-bit brctl with CONFIG_COMPAT=y:
$ brctl show
brctl: SIOCGIFBR: Invalid argument

Example of fixed 32-bit brctl with CONFIG_COMPAT=y:
$ brctl show
bridge name     bridge id               STP enabled     interfaces
br0

Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:31:57 +00:00
Lad Prabhakar
32f52e8e78 net: ethernet: ti: davinci_emac: Use platform_get_irq() to get the interrupt
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.

In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq() for DT users only.

While at it propagate error code in case request_irq() fails instead of
returning -EBUSY.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:22:19 +00:00
Lad Prabhakar
7801302b9a net: xilinx: emaclite: Use platform_get_irq() to get the interrupt
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.

In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq().

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:22:19 +00:00
Lad Prabhakar
6c119fbdb8 net: ethoc: Use platform_get_irq() to get the interrupt
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.

In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq().

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:22:19 +00:00
Lad Prabhakar
441faddaad fsl/fman: Use platform_get_irq() to get the interrupt
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.

In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq(). While doing so return error pointer
from read_dts_node() as platform_get_irq() may return -EPROBE_DEFER.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:22:19 +00:00
Lad Prabhakar
f83b434811 net: pxa168_eth: Use platform_get_irq() to get the interrupt
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.

In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq().

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:22:19 +00:00
Lad Prabhakar
c0032d6e87 ethernet: netsec: Use platform_get_irq() to get the interrupt
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue
when using hierarchical interrupt domains using "interrupts" property
in the node as this bypasses the hierarchical setup and messes up the
irq chaining.

In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq().

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:22:19 +00:00
Kai-Heng Feng
f4dd5174e2 net: wwan: iosm: Keep device at D0 for s2idle case
We are seeing spurious wakeup caused by Intel 7560 WWAN on AMD laptops.
This prevent those laptops to stay in s2idle state.

>From what I can understand, the intention of ipc_pcie_suspend() is to
put the device to D3cold, and ipc_pcie_suspend_s2idle() is to keep the
device at D0. However, the device can still be put to D3hot/D3cold by
PCI core.

So explicitly let PCI core know this device should stay at D0, to solve
the spurious wakeup.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:19:51 +00:00
Kai-Heng Feng
8f58e29ed7 net: wwan: iosm: Let PCI core handle PCI power transition
pci_pm_suspend_noirq() and pci_pm_resume_noirq() already handle power
transition for system-wide suspend and resume, so it's not necessary to
do it in the driver.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:19:51 +00:00
William Zhao
c1833c3964 ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate
The "__ip6_tnl_parm" struct was left uninitialized causing an invalid
load of random data when the "__ip6_tnl_parm" struct was used elsewhere.
As an example, in the function "ip6_tnl_xmit_ctl()", it tries to access
the "collect_md" member. With "__ip6_tnl_parm" being uninitialized and
containing random data, the UBSAN detected that "collect_md" held a
non-boolean value.

The UBSAN issue is as follows:
===============================================================
UBSAN: invalid-load in net/ipv6/ip6_tunnel.c:1025:14
load of value 30 is not a valid value for type '_Bool'
CPU: 1 PID: 228 Comm: kworker/1:3 Not tainted 5.16.0-rc4+ #8
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
Call Trace:
<TASK>
dump_stack_lvl+0x44/0x57
ubsan_epilogue+0x5/0x40
__ubsan_handle_load_invalid_value+0x66/0x70
? __cpuhp_setup_state+0x1d3/0x210
ip6_tnl_xmit_ctl.cold.52+0x2c/0x6f [ip6_tunnel]
vti6_tnl_xmit+0x79c/0x1e96 [ip6_vti]
? lock_is_held_type+0xd9/0x130
? vti6_rcv+0x100/0x100 [ip6_vti]
? lock_is_held_type+0xd9/0x130
? rcu_read_lock_bh_held+0xc0/0xc0
? lock_acquired+0x262/0xb10
dev_hard_start_xmit+0x1e6/0x820
__dev_queue_xmit+0x2079/0x3340
? mark_lock.part.52+0xf7/0x1050
? netdev_core_pick_tx+0x290/0x290
? kvm_clock_read+0x14/0x30
? kvm_sched_clock_read+0x5/0x10
? sched_clock_cpu+0x15/0x200
? find_held_lock+0x3a/0x1c0
? lock_release+0x42f/0xc90
? lock_downgrade+0x6b0/0x6b0
? mark_held_locks+0xb7/0x120
? neigh_connected_output+0x31f/0x470
? lockdep_hardirqs_on+0x79/0x100
? neigh_connected_output+0x31f/0x470
? ip6_finish_output2+0x9b0/0x1d90
? rcu_read_lock_bh_held+0x62/0xc0
? ip6_finish_output2+0x9b0/0x1d90
ip6_finish_output2+0x9b0/0x1d90
? ip6_append_data+0x330/0x330
? ip6_mtu+0x166/0x370
? __ip6_finish_output+0x1ad/0xfb0
? nf_hook_slow+0xa6/0x170
ip6_output+0x1fb/0x710
? nf_hook.constprop.32+0x317/0x430
? ip6_finish_output+0x180/0x180
? __ip6_finish_output+0xfb0/0xfb0
? lock_is_held_type+0xd9/0x130
ndisc_send_skb+0xb33/0x1590
? __sk_mem_raise_allocated+0x11cf/0x1560
? dst_output+0x4a0/0x4a0
? ndisc_send_rs+0x432/0x610
addrconf_dad_completed+0x30c/0xbb0
? addrconf_rs_timer+0x650/0x650
? addrconf_dad_work+0x73c/0x10e0
addrconf_dad_work+0x73c/0x10e0
? addrconf_dad_completed+0xbb0/0xbb0
? rcu_read_lock_sched_held+0xaf/0xe0
? rcu_read_lock_bh_held+0xc0/0xc0
process_one_work+0x97b/0x1740
? pwq_dec_nr_in_flight+0x270/0x270
worker_thread+0x87/0xbf0
? process_one_work+0x1740/0x1740
kthread+0x3ac/0x490
? set_kthread_struct+0x100/0x100
ret_from_fork+0x22/0x30
</TASK>
===============================================================

The solution is to initialize "__ip6_tnl_parm" struct to zeros in the
"vti6_siocdevprivate()" function.

Signed-off-by: William Zhao <wizhao@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:17:33 +00:00
Horatiu Vultur
0c94d657d2 net: lan966x: Fix the vlan used by host ports
The blamed commit changed the vlan used by the host ports to be 4095
instead of 0.
Because of this change the following issues are seen:
- when the port is probed first it was adding an entry in the MAC table
  with the wrong vlan (port->pvid which is default 0) and not HOST_PVID
- when the port is removed from a bridge, it was using the wrong vlan to
  add entries in the MAC table. It was using the old PVID and not the
  HOST_PVID

This patch fixes this two issues by using the HOST_PVID instead of
port->pvid.

Fixes: 6d2c186afa ("net: lan966x: Add vlan support.")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:16:30 +00:00
David S. Miller
099eac91bc Merge branch 'bnxt_en-next'
Michael Chan says:

====================
bnxt_en: Update for net-next

This series includes some added error logging for firmware error reports,
DIM interrupt sampling for the latest 5750X chips, CQE interrupt
coalescing mode support, and to use RX page frag buffers for better
software GRO performance.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:29 +00:00
Jakub Kicinski
720908e5f8 bnxt_en: Use page frag RX buffers for better software GRO performance
If NETIF_F_GRO_HW is disabled, the existing driver code uses kmalloc'ed
data for RX buffers.  This causes inefficient SW GRO performance
because the GRO data is merged using the less efficient frag_list.
Use netdev_alloc_frag() and friends instead so that GRO data can be
merged into skb_shinfo(skb)->frags for better performance.

[Use skb_free_frag() - Vikas Gupta]

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Edwin Peer
b976969bed bnxt_en: convert to xdp_do_flush
The xdp_do_flush_map function has been replaced with the more general
xdp_do_flush().

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Michael Chan
3fcbdbd5d8 bnxt_en: Support CQE coalescing mode in ethtool
Support showing and setting the CQE mode in ethtool.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Michael Chan
df78ea2246 bnxt_en: Support configurable CQE coalescing mode
CQE coalescing mode is the same as the timer reset coalescing mode
on Broadcom devices.  Currently this mode is always enabled if it
is supported by the device.  Restructure the code slightly to support
dynamically changing this mode.

Add a flags field to struct bnxt_coal.  Initially, the CQE flag will
be set for the RX and TX side if the device supports it.  We need to
move bnxt_init_dflt_coal() to set up default coalescing until the
capability is determined.

Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Andy Gospodarek
dc1f5d1ebc bnxt_en: enable interrupt sampling on 5750X for DIM
5750X (P5) chips handle receiving packets on the NQ rather than the main
completion queue so we need to get and set stats from the correct spots
for dynamic interrupt moderation.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Michael Chan
0fb8582ae5 bnxt_en: Log error report for dropped doorbell
Log the unrecognized error report type value as well.

Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Somnath Kotur
5a717f4a8e bnxt_en: Add event handler for PAUSE Storm event
FW has been modified to send a new async event when it detects
a pause storm. Register for this new event and log it upon receipt.

Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-27 12:00:28 +00:00
Linus Torvalds
fc74e0a40e Linux 5.16-rc7 2021-12-26 13:17:17 -08:00
Linus Torvalds
e8ffcd3ab0 - Prevent potential undefined behavior due to shifting pkey constants into the sign bit
- Move the EFI memory reservation code *after* the efi= cmdline parsing has happened
 
 - Revert two commits which turned out to be the wrong direction to chase
 when accommodating early memblock reservations consolidation and command line parameters
 parsing
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHIYbAACgkQEsHwGGHe
 VUphMhAAptI7YGJFqfH6bUuVh7HX38FOx5pCODpZUXJWIsvgC0MRnxqUF0+Srzbt
 ZBMhh7WUiOpIe/ZeUpRouX5Al4nhiDdG4QeddJLc1DaSaAgOJWD9NH0V0K1aWf4k
 poAUoX7mp5M1mfc/boOQGBdtX+l+3vCYU0RzyQ+IhfDNM6GzJpAC1txWFaL/lWE+
 3IVb4OF37mW7Hlk+HdizcTgm1qaHRobwP7fxbqRwdHZy+YRjz0eOd2dR625jff6b
 y3Z7WVkwx+YtNVD7YnXvmJKyh43rbbMmx0CU6WiYlhGkz5dooe9Er2LuouS1kzRr
 4XWvZrr28qLc9eCdv6MaZGv62KyP0QBm4Fg7E/nC/0Jo8NA3GQXulF2OHofSCX/h
 BZa3mJa+heoXpUb7zDTUMa8UCdOYGJpvHRGYcU3JiaIwv+dnTRBdAzyVcqCMWt5P
 wPOl4A08vVZ+w3aNrZh3Zfg03pG4/9miveIa7EsuCZF/oN+5KAVQdcU74zneH4Qx
 nUfF7b80PyiSnCblQ3DL+efKVG4/Bud6crxo1tOtw6cDQxiAYDzupuZBlHV9791O
 0TwPuLJSAqv4aMiobi5/j32pSLaohYJij9stbfB8zeD49rq5uvT+jDoxcKiZurJP
 jamo6LHffF+o8+F2YwFOSIPayZOViXxWvwPd+E6uL7EodvZgiS0=
 =j12C
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Prevent potential undefined behavior due to shifting pkey constants
   into the sign bit

 - Move the EFI memory reservation code *after* the efi= cmdline parsing
   has happened

 - Revert two commits which turned out to be the wrong direction to
   chase when accommodating early memblock reservations consolidation
   and command line parameters parsing

* tag 'x86_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pkey: Fix undefined behaviour with PKRU_WD_BIT
  x86/boot: Move EFI range reservation after cmdline parsing
  Revert "x86/boot: Pull up cmdline preparation and early param parsing"
  Revert "x86/boot: Mark prepare_command_line() __init"
2021-12-26 10:28:55 -08:00
Linus Torvalds
2afa90bd1c - Prevent clang from reordering the reachable annotation in an inline asm statement
without inputs
 
 - Fix objtool builds on non-glibc systems due to undefined __always_inline
 -----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHIVhIACgkQEsHwGGHe
 VUoG0Q/4rbCfRL4apKqVgZKZ2zXW69V+aWqGzFL56vshUhChnQpIr3ANN4qDOaCu
 TD23gxGcjQzjJsqDOu/YCQIxbcBdsDY5Ufix22eOnO24xcjFCNMcdYkV+nn1iXtO
 VBqqOM7ebm1ZXjIuhu43gNABY7s8v9bFM6MupZUKQ51tSsn2v6Mt2acsRZXeRFl1
 bUPBmEr6nOepeDwuAvdIWFQ2hl87FAlQlkPcvdbE7zEnAE/a/TOT8IFOZ4fHh+IB
 Kluw0FeiWBkP5pqPHNKwP64JNKvlAl5yMxFOYONBevgB1WVQJTFt0PXM78OXT0Fw
 oythqKA64c3XI9GyXuDL5Uwl8X4LmJzVE0LIflSesvAhWuAkiz3pYg8n5Q6vD4IL
 9VV/SZes+8/XmcxirAqnPWrmQmWa8iDf4QGiWwpVnXaHPQxw1HMjU0jC54/mSuiF
 0mV6G4IedsBiOXs0MocZF5VwcN0oroOZQEXqtHSo2uxUDIqDQyGY4f8xsHmi3Oed
 XZdS/RpUVUyrcj/aVXnSFQfBcfUO0Q0xvqqMDmBeSHE8B6gXdm8aW0fPSFQxc1Tk
 blYZ+0rB3ZdQqZsIXFe9nQGIh+4/bSHiLWP5+yf2zr1WAqyCBfSEPkmtvqPt9hJX
 SJfvAHlj2TIWR2VMO88EonRxNIAlhpEAsptHvCLCAQFReCgsNw==
 =QUDC
 -----END PGP SIGNATURE-----

Merge tag 'objtool_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Borislav Petkov:

 - Prevent clang from reordering the reachable annotation in
   an inline asm statement without inputs

 - Fix objtool builds on non-glibc systems due to undefined
   __always_inline

* tag 'objtool_urgent_for_v5.16_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  compiler.h: Fix annotation macro misplacement with Clang
  uapi: Fix undefined __always_inline on non-glibc systems
2021-12-26 10:19:40 -08:00
Linus Torvalds
438645193e Some hopefully final pin control fixes for the v5.16 kernel:
- Fix an out-of-bounds bug in the Mediatek driver
 - Fix an init order bug in the Broadcom BCM2835 driver
 - Fix a GPIO offset bug in the STM32 driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmHH3VUACgkQQRCzN7AZ
 XXNIOBAAmPYOpFC7UrQ5dr6RU2X9/sG1NjyMRcl0fsuTi+vv0BmBWUj3lC64Epjm
 CubPen+UpEkcME2t7h7XNT2qgtHraHdnQ7muCwriClq6qZiOJVmMIflGTONEbrcG
 wGEIRBH/1EMFr/5oYdF2caJ7VQzSR6d6JwKzLr/P4NJxPOoMxQL27Pw1C1umud2z
 ailoYrtRYqN71aAMFBl0v0pgiiFffYN6ey4CNEwZ1Vj5UodsYP4tNtMSkky+a31L
 AHoqu1f4hNiTAfa0C/UxuOhlQyak4iw8PKcQErwhJ1a06BKHT9xA9QYa7l7FuH7i
 DoDmuKoR96LJnEotVYwHpwWLUv3LwhSk+uZl58xLhmDovk8yo9V3vmNAwAS1Ckf8
 i8kVZyTRXGXrChAmwEZz48w++8B2+xQB1y8LnkfhG7fx9izwsDXouTd/ilnofNdA
 sWbd+lB2Bfn1qjKu+SFnXvXOwt7ji8FuCp3hMvZE2ANh6NV6boYro4isSrcXHvCV
 Tw1qDuEHqIKZimz/PwJPryHLTs3T3GkXqejh9ciFY7x4JVkhIhtTCXITRgsPasBm
 gt7eBD7HvkaXbbLdIRU56VbKgD4aIjtyQ27d3V4bZMA/8S8ynI1d/FDN5FQcLxlW
 IQbTvF63UxOWCEcY1S+J+NSE1twuaY0ZE2b4ElmNTOQkfxYi20I=
 =9jNV
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Some hopefully final pin control fixes for the v5.16 kernel:

   - Fix an out-of-bounds bug in the Mediatek driver

   - Fix an init order bug in the Broadcom BCM2835 driver

   - Fix a GPIO offset bug in the STM32 driver"

* tag 'pinctrl-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: stm32: consider the GPIO offset to expose all the GPIO lines
  pinctrl: bcm2835: Change init order for gpio hogs
  pinctrl: mediatek: fix global-out-of-bounds issue
2021-12-25 20:00:09 -08:00
Linus Torvalds
e2ae0d4a6b hwmon fixes for v5.16-rc7
A couple of lm90 driver fixes. None of them are critical,
 but they should nevertheless be fixed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEiHPvMQj9QTOCiqgVyx8mb86fmYEFAmHHQUwACgkQyx8mb86f
 mYEZ6RAAh7t8ePVhgZvy26q0Lz86xmle3npou6GspvJywxtaMIxHb9ojffGp3mpS
 64TksTkgV0f7d+hWcB1CuY2rJB4jt1oGjfV0waYzM6kduKSRwxsauH556rfgfz27
 quauK152uQKxFP3p/lI040lQKxa3kUnM+xj0+7FFvimaJe05zEMMJqt6DqHJ4LTF
 qqd9v4gb9aIxrDKECYVohagYXL5P5dsmtGsrY4rqxtUUC/cmLXzTYY4Uyf7Bz96h
 HQlidtMYL1VyLAin/E2VDCbSwKqG/uhLc4dxHQ8MjDgbKfWCdwSWEKgjRIAw9P8j
 lK8d8eB82wm1arDXdndSZg4Iww+SekZtbv5EgRDDEOxuNzq5hyydbmd5ECxBqiot
 WkV0vrIG8x1hXij2OH+dIpfIGqcQHKf4NIVPQRHkVebgAywVxsn2uoOl3aaE6fZI
 HjHR6dbEd9dhTwQrZIOUbPnRaNljq0a1HFs5P3N5m91iOW/f93+IdSAxWN1aWIsH
 nWPrXvMFRrE+1FjNdnplPZ+VxUHjkeYXy5bWi0Jv6RmZyrcHDbBei5Dv7/7y7Qje
 HHp1F/JZ1wYYg1S75ludDs9DT4BEuWMOcNffB3WeZrzpSTOgeUbDl/u1nlyVQ8lk
 Oh+XdSc+S/ibNpmEaOSMFpBxc2/E/MaCPLj0sZ1NW2iv8xA4iLk=
 =JB4V
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-v5.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "A couple of lm90 driver fixes. None of them are critical, but they
  should nevertheless be fixed"

* tag 'hwmon-for-v5.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (lm90) Do not report 'busy' status bit as alarm
  hwmom: (lm90) Fix citical alarm status for MAX6680/MAX6681
  hwmon: (lm90) Drop critical attribute support for MAX6654
  hwmon: (lm90) Prevent integer overflow/underflow in hysteresis calculations
  hwmon: (lm90) Fix usage of CONFIG2 register in detect function
2021-12-25 13:08:22 -08:00
Linus Torvalds
5b5e3d0347 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 "A few small updates to drivers.

  Of note we are now deferring probes of i8042 on some Asus devices as
  the controller is not ready to respond to queries first time around
  when the driver is compiled into the kernel"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elants_i2c - do not check Remark ID on eKTH3900/eKTH5312
  Input: atmel_mxt_ts - fix double free in mxt_read_info_block
  Input: goodix - fix memory leak in goodix_firmware_upload
  Input: goodix - add id->model mapping for the "9111" model
  Input: goodix - try not to touch the reset-pin on x86/ACPI devices
  Input: i8042 - enable deferred probe quirk for ASUS UM325UA
  Input: elantech - fix stack out of bound access in elantech_change_report_id()
  Input: iqs626a - prohibit inlining of channel parsing functions
  Input: i8042 - add deferred probe support
2021-12-25 13:00:14 -08:00
Linus Torvalds
d0cc67b278 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "9 patches.

  Subsystems affected by this patch series: mm (kfence, mempolicy,
  memory-failure, pagemap, pagealloc, damon, and memory-failure),
  core-kernel, and MAINTAINERS"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()
  mm/damon/dbgfs: protect targets destructions with kdamond_lock
  mm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid
  mm: delete unsafe BUG from page_cache_add_speculative()
  mm, hwpoison: fix condition in free hugetlb page path
  MAINTAINERS: mark more list instances as moderated
  kernel/crash_core: suppress unknown crashkernel parameter warning
  mm: mempolicy: fix THP allocations escaping mempolicy restrictions
  kfence: fix memory leak when cat kfence objects
2021-12-25 12:30:03 -08:00
Liu Shixin
2a57d83c78 mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()
Hulk Robot reported a panic in put_page_testzero() when testing
madvise() with MADV_SOFT_OFFLINE.  The BUG() is triggered when retrying
get_any_page().  This is because we keep MF_COUNT_INCREASED flag in
second try but the refcnt is not increased.

    page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
    ------------[ cut here ]------------
    kernel BUG at include/linux/mm.h:737!
    invalid opcode: 0000 [#1] PREEMPT SMP
    CPU: 5 PID: 2135 Comm: sshd Tainted: G    B             5.16.0-rc6-dirty #373
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    RIP: release_pages+0x53f/0x840
    Call Trace:
      free_pages_and_swap_cache+0x64/0x80
      tlb_flush_mmu+0x6f/0x220
      unmap_page_range+0xe6c/0x12c0
      unmap_single_vma+0x90/0x170
      unmap_vmas+0xc4/0x180
      exit_mmap+0xde/0x3a0
      mmput+0xa3/0x250
      do_exit+0x564/0x1470
      do_group_exit+0x3b/0x100
      __do_sys_exit_group+0x13/0x20
      __x64_sys_exit_group+0x16/0x20
      do_syscall_64+0x34/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae
    Modules linked in:
    ---[ end trace e99579b570fe0649 ]---
    RIP: 0010:release_pages+0x53f/0x840

Link: https://lkml.kernel.org/r/20211221074908.3910286-1-liushixin2@huawei.com
Fixes: b94e02822d ("mm,hwpoison: try to narrow window race for free pages")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-25 12:20:56 -08:00
SeongJae Park
3479641796 mm/damon/dbgfs: protect targets destructions with kdamond_lock
DAMON debugfs interface iterates current monitoring targets in
'dbgfs_target_ids_read()' while holding the corresponding
'kdamond_lock'.  However, it also destructs the monitoring targets in
'dbgfs_before_terminate()' without holding the lock.  This can result in
a use_after_free bug.  This commit avoids the race by protecting the
destruction with the corresponding 'kdamond_lock'.

Link: https://lkml.kernel.org/r/20211221094447.2241-1-sj@kernel.org
Reported-by: Sangwoo Bae <sangwoob@amazon.com>
Fixes: 4bc05954d0 ("mm/damon: implement a debugfs-based user space interface")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>	[5.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-25 12:20:56 -08:00
Thibaut Sautereau
595ec1973c mm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid
The second parameter of alloc_pages_exact_nid is the one indicating the
size of memory pointed by the returned pointer.

Link: https://lkml.kernel.org/r/YbjEgwhn4bGblp//@coeus
Fixes: abd58f38df ("mm/page_alloc: add __alloc_size attributes for better bounds checking")
Signed-off-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Levente Polyak <levente@leventepolyak.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-25 12:20:56 -08:00
Hugh Dickins
94ab10dd42 mm: delete unsafe BUG from page_cache_add_speculative()
It is not easily reproducible, but on 5.16-rc I have several times hit
the VM_BUG_ON_PAGE(PageTail(page), page) in
page_cache_add_speculative(): usually from filemap_get_read_batch() for
an ext4 read, yesterday from next_uptodate_page() from
filemap_map_pages() for a shmem fault.

That BUG used to be placed where page_ref_add_unless() had succeeded,
but now it is placed before folio_ref_add_unless() is attempted: that is
not safe, since it is only the acquired reference which makes the page
safe from racing THP collapse or split.

We could keep the BUG, checking PageTail only when
folio_ref_try_add_rcu() has succeeded; but I don't think it adds much
value - just delete it.

Link: https://lkml.kernel.org/r/8b98fc6f-3439-8614-c3f3-945c659a1aba@google.com
Fixes: 020853b6f5 ("mm: Add folio_try_get_rcu()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-25 12:20:55 -08:00
Naoya Horiguchi
e37e7b0b3b mm, hwpoison: fix condition in free hugetlb page path
When a memory error hits a tail page of a free hugepage,
__page_handle_poison() is expected to be called to isolate the error in
4kB unit, but it's not called due to the outdated if-condition in
memory_failure_hugetlb().  This loses the chance to isolate the error in
the finer unit, so it's not optimal.  Drop the condition.

This "(p != head && TestSetPageHWPoison(head)" condition is based on the
old semantics of PageHWPoison on hugepage (where PG_hwpoison flag was
set on the subpage), so it's not necessray any more.  By getting to set
PG_hwpoison on head page for hugepages, concurrent error events on
different subpages in a single hugepage can be prevented by
TestSetPageHWPoison(head) at the beginning of memory_failure_hugetlb().
So dropping the condition should not reopen the race window originally
mentioned in commit b985194c8c ("hwpoison, hugetlb:
lock_page/unlock_page does not match for handling a free hugepage")

[naoya.horiguchi@linux.dev: fix "HardwareCorrupted" counter]
  Link: https://lkml.kernel.org/r/20211220084851.GA1460264@u2004

Link: https://lkml.kernel.org/r/20211210110208.879740-1-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: Fei Luo <luofei@unicloud.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>	[5.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-25 12:20:55 -08:00
Randy Dunlap
7e5b901e46 MAINTAINERS: mark more list instances as moderated
Some lists that are moderated are not marked as moderated consistently,
so mark them all as moderated.

Link: https://lkml.kernel.org/r/20211209001330.18558-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Conor Culhane <conor.culhane@silvaco.com>
Cc: Ryder Lee <ryder.lee@mediatek.com>
Cc: Jianjun Wang <jianjun.wang@mediatek.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-25 12:20:55 -08:00
Philipp Rudo
71d2bcec2d kernel/crash_core: suppress unknown crashkernel parameter warning
When booting with crashkernel= on the kernel command line a warning
similar to

    Kernel command line: ro console=ttyS0 crashkernel=256M
    Unknown kernel command line parameters "crashkernel=256M", will be passed to user space.

is printed.

This comes from crashkernel= being parsed independent from the kernel
parameter handling mechanism.  So the code in init/main.c doesn't know
that crashkernel= is a valid kernel parameter and prints this incorrect
warning.

Suppress the warning by adding a dummy early_param handler for
crashkernel=.

Link: https://lkml.kernel.org/r/20211208133443.6867-1-prudo@redhat.com
Fixes: 86d1919a4f ("init: print out unknown kernel parameters")
Signed-off-by: Philipp Rudo <prudo@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-25 12:20:55 -08:00
Andrey Ryabinin
3386353406 mm: mempolicy: fix THP allocations escaping mempolicy restrictions
alloc_pages_vma() may try to allocate THP page on the local NUMA node
first:

	page = __alloc_pages_node(hpage_node,
		gfp | __GFP_THISNODE | __GFP_NORETRY, order);

And if the allocation fails it retries allowing remote memory:

	if (!page && (gfp & __GFP_DIRECT_RECLAIM))
    		page = __alloc_pages_node(hpage_node,
					gfp, order);

However, this retry allocation completely ignores memory policy nodemask
allowing allocation to escape restrictions.

The first appearance of this bug seems to be the commit ac5b2c1891
("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings").

The bug disappeared later in the commit 89c83fb539 ("mm, thp:
consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") and
reappeared again in slightly different form in the commit 76e654cc91
("mm, page_alloc: allow hugepage fallback to remote nodes when
madvised")

Fix this by passing correct nodemask to the __alloc_pages() call.

The demonstration/reproducer of the problem:

    $ mount -oremount,size=4G,huge=always /dev/shm/
    $ echo always > /sys/kernel/mm/transparent_hugepage/defrag
    $ cat mbind_thp.c
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <assert.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <numaif.h>

    #define SIZE 2ULL << 30
    int main(int argc, char **argv)
    {
        int fd;
        unsigned long long i;
        char *addr;
        pid_t pid;
        char buf[100];
        unsigned long nodemask = 1;

        fd = open("/dev/shm/test", O_RDWR|O_CREAT);
        assert(fd > 0);
        assert(ftruncate(fd, SIZE) == 0);

        addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                           MAP_SHARED, fd, 0);

        assert(mbind(addr, SIZE, MPOL_BIND, &nodemask, 2, MPOL_MF_STRICT|MPOL_MF_MOVE)==0);
        for (i = 0; i < SIZE; i+=4096) {
          addr[i] = 1;
        }
        pid = getpid();
        snprintf(buf, sizeof(buf), "grep shm /proc/%d/numa_maps", pid);
        system(buf);
        sleep(10000);

        return 0;
    }
    $ gcc mbind_thp.c -o mbind_thp -lnuma
    $ numactl -H
    available: 2 nodes (0-1)
    node 0 cpus: 0 2
    node 0 size: 1918 MB
    node 0 free: 1595 MB
    node 1 cpus: 1 3
    node 1 size: 2014 MB
    node 1 free: 1731 MB
    node distances:
    node   0   1
      0:  10  20
      1:  20  10
    $ rm -f /dev/shm/test; taskset -c 0 ./mbind_thp
    7fd970a00000 bind:0 file=/dev/shm/test dirty=524288 active=0 N0=396800 N1=127488 kernelpagesize_kB=4

Link: https://lkml.kernel.org/r/20211208165343.22349-1-arbn@yandex-team.com
Fixes: ac5b2c1891 ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-25 12:20:55 -08:00
Baokun Li
0129ab1f26 kfence: fix memory leak when cat kfence objects
Hulk robot reported a kmemleak problem:

    unreferenced object 0xffff93d1d8cc02e8 (size 248):
      comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s)
      hex dump (first 32 bytes):
        00 40 85 19 d4 93 ff ff 00 10 00 00 00 00 00 00  .@..............
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      backtrace:
         seq_open+0x2a/0x80
         full_proxy_open+0x167/0x1e0
         do_dentry_open+0x1e1/0x3a0
         path_openat+0x961/0xa20
         do_filp_open+0xae/0x120
         do_sys_openat2+0x216/0x2f0
         do_sys_open+0x57/0x80
         do_syscall_64+0x33/0x40
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
    unreferenced object 0xffff93d419854000 (size 4096):
      comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s)
      hex dump (first 32 bytes):
        6b 66 65 6e 63 65 2d 23 32 35 30 3a 20 30 78 30  kfence-#250: 0x0
        30 30 30 30 30 30 30 37 35 34 62 64 61 31 32 2d  0000000754bda12-
      backtrace:
         seq_read_iter+0x313/0x440
         seq_read+0x14b/0x1a0
         full_proxy_read+0x56/0x80
         vfs_read+0xa5/0x1b0
         ksys_read+0xa0/0xf0
         do_syscall_64+0x33/0x40
         entry_SYSCALL_64_after_hwframe+0x44/0xa9

I find that we can easily reproduce this problem with the following
commands:

	cat /sys/kernel/debug/kfence/objects
	echo scan > /sys/kernel/debug/kmemleak
	cat /sys/kernel/debug/kmemleak

The leaked memory is allocated in the stack below:

    do_syscall_64
      do_sys_open
        do_dentry_open
          full_proxy_open
            seq_open            ---> alloc seq_file
      vfs_read
        full_proxy_read
          seq_read
            seq_read_iter
              traverse          ---> alloc seq_buf

And it should have been released in the following process:

    do_syscall_64
      syscall_exit_to_user_mode
        exit_to_user_mode_prepare
          task_work_run
            ____fput
              __fput
                full_proxy_release  ---> free here

However, the release function corresponding to file_operations is not
implemented in kfence.  As a result, a memory leak occurs.  Therefore,
the solution to this problem is to implement the corresponding release
function.

Link: https://lkml.kernel.org/r/20211206133628.2822545-1-libaokun1@huawei.com
Fixes: 0ce20dd840 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Marco Elver <elver@google.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-25 12:20:55 -08:00
Ma Xinjian
e6007b85df selftests: mptcp: Remove the deprecated config NFT_COUNTER
NFT_COUNTER was removed since
390ad4295aa ("netfilter: nf_tables: make counter support built-in")
LKP/0Day will check if all configs listing under selftests are able to
be enabled properly.

For the missing configs, it will report something like:
LKP WARN miss config CONFIG_NFT_COUNTER= of net/mptcp/config

- it's not reasonable to keep the deprecated configs.
- configs under kselftests are recommended by corresponding tests.
So if some configs are missing, it will impact the testing results

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ma Xinjian <xinjianx.ma@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-25 17:15:49 +00:00
Xin Long
5ec7d18d18 sctp: use call_rcu to free endpoint
This patch is to delay the endpoint free by calling call_rcu() to fix
another use-after-free issue in sctp_sock_dump():

  BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
  Call Trace:
    __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
    lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
    __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
    _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
    spin_lock_bh include/linux/spinlock.h:334 [inline]
    __lock_sock+0x203/0x350 net/core/sock.c:2253
    lock_sock_nested+0xfe/0x120 net/core/sock.c:2774
    lock_sock include/net/sock.h:1492 [inline]
    sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324
    sctp_for_each_transport+0x2b5/0x370 net/sctp/socket.c:5091
    sctp_diag_dump+0x3ac/0x660 net/sctp/diag.c:527
    __inet_diag_dump+0xa8/0x140 net/ipv4/inet_diag.c:1049
    inet_diag_dump+0x9b/0x110 net/ipv4/inet_diag.c:1065
    netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244
    __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352
    netlink_dump_start include/linux/netlink.h:216 [inline]
    inet_diag_handler_cmd+0x2ce/0x3f0 net/ipv4/inet_diag.c:1170
    __sock_diag_cmd net/core/sock_diag.c:232 [inline]
    sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263
    netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477
    sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274

This issue occurs when asoc is peeled off and the old sk is freed after
getting it by asoc->base.sk and before calling lock_sock(sk).

To prevent the sk free, as a holder of the sk, ep should be alive when
calling lock_sock(). This patch uses call_rcu() and moves sock_put and
ep free into sctp_endpoint_destroy_rcu(), so that it's safe to try to
hold the ep under rcu_read_lock in sctp_transport_traverse_process().

If sctp_endpoint_hold() returns true, it means this ep is still alive
and we have held it and can continue to dump it; If it returns false,
it means this ep is dead and can be freed after rcu_read_unlock, and
we should skip it.

In sctp_sock_dump(), after locking the sk, if this ep is different from
tsp->asoc->ep, it means during this dumping, this asoc was peeled off
before calling lock_sock(), and the sk should be skipped; If this ep is
the same with tsp->asoc->ep, it means no peeloff happens on this asoc,
and due to lock_sock, no peeloff will happen either until release_sock.

Note that delaying endpoint free won't delay the port release, as the
port release happens in sctp_endpoint_destroy() before calling call_rcu().
Also, freeing endpoint by call_rcu() makes it safe to access the sk by
asoc->base.sk in sctp_assocs_seq_show() and sctp_rcv().

Thanks Jones to bring this issue up.

v1->v2:
  - improve the changelog.
  - add kfree(ep) into sctp_endpoint_destroy_rcu(), as Jakub noticed.

Reported-by: syzbot+9276d76e83e3bcde6c99@syzkaller.appspotmail.com
Reported-by: Lee Jones <lee.jones@linaro.org>
Fixes: d25adbeb0c ("sctp: fix an use-after-free issue in sctp_sock_dump")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-25 17:13:37 +00:00
Christophe JAILLET
7c63f26cb5 lib: objagg: Use the bitmap API when applicable
Use 'bitmap_zalloc()' to simplify code, improve the semantic and reduce
some open-coded arithmetic in allocator arguments.

Also change the corresponding 'kfree()' into 'bitmap_free()' to keep
consistency.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/f9541b085ec68e573004e1be200c11c9c901181a.1640295165.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-24 14:54:29 -08:00