Commit Graph

1106573 Commits

Author SHA1 Message Date
Linus Torvalds
d9cdc3b125 powerpc fixes for 5.19 #5
- On Power8 bare metal, fix creation of RNG platform devices, which are needed
    for the /dev/hwrng driver to probe correctly.
 
 Thanks to: Jason A. Donenfeld, Sachin Sant.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmLJSdATHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDIzEACy0D2CLdJq1YSvjVfuTbon6WpzUkBq
 5ld1cIst+tLkmq9ixpwaGe+x0/7eELRL5/yfdp1/cFTLirKX1fJOlM2/P52Q+ehN
 DOJfSlDLBHP+os4klKw1g3HrbyP5IBLL0Up6fiebd83SpRHHQ8gmANlfb0KZH5MD
 mUH33cE0egSqkZmTIeMW5cCOq9tnppaM/2CFl5ijxXs5xdJodZAR1HV6uhfxhf0N
 Gqj2O44VVHyckMEVrdd2NmqXeOufWCObGAI/4WYVAijMyyewMnJDnTpso5ZBcgGv
 Y2cX5kCdN1D6rfBEKqTVkECt+Q340KbB86kIJqnPuc7T7ay7Ky4mfVOTRF2EXDQs
 mc0+MUTn1W0ydEuktO2A8lXN/3y/Y4S3SzkPDPKCLJ0+iKMx5Hij6aBqyV+gxyBp
 oc7XC4VCMNChnpxgqVzEuweTUf/3YiEDI0pQqG7pt2E/SLFoi98r4T3OH23PjdWM
 13HjilUFKVD4DNWSLCtdS9+TksvYirL7SiRhqL6x1e1qHKsvWaNwP346kEmKG5Gl
 sls152oyiNrLLiIZucuzLsJthlqJIWH+r5GNBtvxx/TO8i7O82c24wyohJm+U4lt
 liwyPQtudDTnPkA4fbTevNqQlPRMXv1/w5B2LMfJiLmbbh0fVHWaU7beHwyDji73
 xF4zYtzEp4zmIQ==
 =rqGK
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:

 - On Power8 bare metal, fix creation of RNG platform devices, which are
   needed for the /dev/hwrng driver to probe correctly.

Thanks to Jason A. Donenfeld, and Sachin Sant.

* tag 'powerpc-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: delay rng platform device creation until later in boot
2022-07-09 10:34:08 -07:00
Takashi Iwai
a4bd9358d5 ASoC: Fixes for v5.19
Quite a large batch due to things building up for a couple of weeks but
 all driver specific apart from Marek's documentation fix.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmLHDQcACgkQJNaLcl1U
 h9DaoAf+LjanLg76KbeHMQg/IaG2dhvKJvSAjixMdLlNixMDpSOVr/2UR4lzLCpw
 gfoJk9liujPjtKMsIlQT/OiVdsMROc8vfb10PdmV7rntaBwCN01aqyr4NMauOmby
 ccdZlsoj7D0k3iTtP0nlZ+Xvm+L0zTEfI66bSB9UU+/h8PDSoK/RHMCjNBtqenAd
 4QMQWxQTvqt3xfgrWxiZeFPUJrLFjDypEDqG8qx8roIcbzEhVYInGq52XZw3ST9o
 mvC/LIFK96WvDGgiOj2UNmSuWpe3lF7cMPmomwyYkdUMcMwbSdVfDOsnAdnyqONi
 RiWIr7g5NOPkYa4mSNhB9Ia1PTzf0Q==
 =V5ia
 -----END PGP SIGNATURE-----

Merge tag 'asoc-fix-v5.19-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.19

Quite a large batch due to things building up for a couple of weeks but
all driver specific apart from Marek's documentation fix.
2022-07-09 18:23:54 +02:00
Pablo Neira Ayuso
c39ba4de6b netfilter: nf_tables: replace BUG_ON by element length check
BUG_ON can be triggered from userspace with an element with a large
userdata area. Replace it by length check and return EINVAL instead.
Over time extensions have been growing in size.

Pick a sufficiently old Fixes: tag to propagate this fix.

Fixes: 7d7402642e ("netfilter: nf_tables: variable sized set element keys / data")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-09 16:25:09 +02:00
Jens Axboe
d785a773be io_uring: check that we have a file table when allocating update slots
If IORING_FILE_INDEX_ALLOC is set asking for an allocated slot, the
helper doesn't check if we actually have a file table or not. The non
alloc path does do that correctly, and returns -ENXIO if we haven't set
one up.

Do the same for the allocated path, avoiding a NULL pointer dereference
when trying to find a free bit.

Fixes: a7c41b4687 ("io_uring: let IORING_OP_FILES_UPDATE support choosing fixed file slots")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-09 07:02:10 -06:00
Eric Dumazet
72a0b32911 vlan: fix memory leak in vlan_newlink()
Blamed commit added back a bug I fixed in commit 9bbd917e0b
("vlan: fix memory leak in vlan_dev_set_egress_priority")

If a memory allocation fails in vlan_changelink() after other allocations
succeeded, we need to call vlan_dev_free_egress_priority()
to free all allocated memory because after a failed ->newlink()
we do not call any methods like ndo_uninit() or dev->priv_destructor().

In following example, if the allocation for last element 2000:2001 fails,
we need to free eight prior allocations:

ip link add link dummy0 dummy0.100 type vlan id 100 \
	egress-qos-map 1:2 2:3 3:4 4:5 5:6 6:7 7:8 8:9 2000:2001

syzbot report was:

BUG: memory leak
unreferenced object 0xffff888117bd1060 (size 32):
comm "syz-executor408", pid 3759, jiffies 4294956555 (age 34.090s)
hex dump (first 32 bytes):
09 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff83fc60ad>] kmalloc include/linux/slab.h:600 [inline]
[<ffffffff83fc60ad>] vlan_dev_set_egress_priority+0xed/0x170 net/8021q/vlan_dev.c:193
[<ffffffff83fc6628>] vlan_changelink+0x178/0x1d0 net/8021q/vlan_netlink.c:128
[<ffffffff83fc67c8>] vlan_newlink+0x148/0x260 net/8021q/vlan_netlink.c:185
[<ffffffff838b1278>] rtnl_newlink_create net/core/rtnetlink.c:3363 [inline]
[<ffffffff838b1278>] __rtnl_newlink+0xa58/0xdc0 net/core/rtnetlink.c:3580
[<ffffffff838b1629>] rtnl_newlink+0x49/0x70 net/core/rtnetlink.c:3593
[<ffffffff838ac66c>] rtnetlink_rcv_msg+0x21c/0x5c0 net/core/rtnetlink.c:6089
[<ffffffff839f9c37>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2501
[<ffffffff839f8da7>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
[<ffffffff839f8da7>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
[<ffffffff839f9266>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
[<ffffffff8384dbf6>] sock_sendmsg_nosec net/socket.c:714 [inline]
[<ffffffff8384dbf6>] sock_sendmsg+0x56/0x80 net/socket.c:734
[<ffffffff8384e15c>] ____sys_sendmsg+0x36c/0x390 net/socket.c:2488
[<ffffffff838523cb>] ___sys_sendmsg+0x8b/0xd0 net/socket.c:2542
[<ffffffff838525b8>] __sys_sendmsg net/socket.c:2571 [inline]
[<ffffffff838525b8>] __do_sys_sendmsg net/socket.c:2580 [inline]
[<ffffffff838525b8>] __se_sys_sendmsg net/socket.c:2578 [inline]
[<ffffffff838525b8>] __x64_sys_sendmsg+0x78/0xf0 net/socket.c:2578
[<ffffffff845ad8d5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff845ad8d5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
[<ffffffff8460006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fixes: 37aa50c539 ("vlan: introduce vlan_dev_free_egress_priority")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:26:59 +01:00
Baowen Zheng
9c840d5f9a nfp: fix issue of skb segments exceeds descriptor limitation
TCP packets will be dropped if the segments number in the tx skb
exceeds limitation when sending iperf3 traffic with --zerocopy option.

we make the following changes:

Get nr_frags in nfp_nfdk_tx_maybe_close_block instead of passing from
outside because it will be changed after skb_linearize operation.

Fill maximum dma_len in first tx descriptor to make sure the whole
head is included in the first descriptor.

Fixes: c10d12e3dc ("nfp: add support for NFDK data path")
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:25:02 +01:00
Pawan Gupta
4ad3278df6 x86/speculation: Disable RRSBA behavior
Some Intel processors may use alternate predictors for RETs on
RSB-underflow. This condition may be vulnerable to Branch History
Injection (BHI) and intramode-BTI.

Kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines,
eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against
such attacks. However, on RSB-underflow, RET target prediction may
fallback to alternate predictors. As a result, RET's predicted target
may get influenced by branch history.

A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback
behavior when in kernel mode. When set, RETs will not take predictions
from alternate predictors, hence mitigating RETs as well. Support for
this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit2).

For spectre v2 mitigation, when a user selects a mitigation that
protects indirect CALLs and JMPs against BHI and intramode-BTI, set
RRSBA_DIS_S also to protect RETs for RSB-underflow case.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-09 13:12:45 +02:00
Konrad Rzeszutek Wilk
697977d841 x86/kexec: Disable RET on kexec
All the invocations unroll to __x86_return_thunk and this file
must be PIC independent.

This fixes kexec on 64-bit AMD boxes.

  [ bp: Fix 32-bit build. ]

Reported-by: Edward Tran <edward.tran@oracle.com>
Reported-by: Awais Tanveer <awais.tanveer@oracle.com>
Suggested-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-09 13:12:32 +02:00
Pablo Neira Ayuso
7a847c00ee netfilter: nf_log: incorrect offset to network header
NFPROTO_ARP is expecting to find the ARP header at the network offset.

In the particular case of ARP, HTYPE= field shows the initial bytes of
the ethernet header destination MAC address.

 netdev out: IN= OUT=bridge0 MACSRC=c2:76:e5:71:e1:de MACDST=36:b0:4a:e2:72:ea MACPROTO=0806 ARP HTYPE=14000 PTYPE=0x4ae2 OPCODE=49782

NFPROTO_NETDEV egress hook is also expecting to find the IP headers at
the network offset.

Fixes: 35b9395104 ("netfilter: add generic ARP packet logger")
Reported-by: Tom Yan <tom.ty89@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-09 09:55:43 +02:00
Jakub Kicinski
6676d7270c Merge branch 'selftests-forwarding-install-two-missing-tests'
Martin Blumenstingl says:

====================
selftests: forwarding: Install two missing tests

For some distributions (e.g. OpenWrt) we don't want to rely on rsync
to copy the tests to the target as some extra dependencies need to be
installed. The Makefile in tools/testing/selftests/net/forwarding
already installs most of the tests.

This series adds the two missing tests to the list of installed tests.
That way a downstream distribution can build a package using this
Makefile (and add dependencies there as needed).
====================

Link: https://lore.kernel.org/r/20220707135532.1783925-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 20:31:03 -07:00
Martin Blumenstingl
cfbba7b46a selftests: forwarding: Install no_forwarding.sh
When using the Makefile from tools/testing/selftests/net/forwarding/
all tests should be installed. Add no_forwarding.sh to the list of
"to be installed tests" where it has been missing so far.

Fixes: 476a4f05d9 ("selftests: forwarding: add a no_forwarding.sh test")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 20:30:57 -07:00
Martin Blumenstingl
437ac2592c selftests: forwarding: Install local_termination.sh
When using the Makefile from tools/testing/selftests/net/forwarding/
all tests should be installed. Add local_termination.sh to the list of
"to be installed tests" where it has been missing so far.

Fixes: 90b9566aa5 ("selftests: forwarding: add a test for local_termination.sh")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 20:30:57 -07:00
Linus Torvalds
e5524c2a1f fscache fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmLInYwACgkQ+7dXa6fL
 C2tbmxAAlBkMvoIeyOlkMeAB3l1aK5CAYmgddjWOxY6UHNIGgXqWlGJCoxYngTmy
 j+5Uz1mMNh7TamSJpCTXUBbiLWxzoHI34oie+NT9VUXTTVw0/GQ//jQPlvE+HeJG
 Hmi0HR4lby1O6X9r63obzvDctJPhsVLdgJFtZ3UidwzgY2zYnScxwyCuwSfLUVnF
 YJACZWPFj1oWxOiS2bQZgnvCJpfJ6J0I5asmO0lI1tQDS0o4RoiRvDPTx8Z6K3KO
 8CSm5tvu+vYrbZ5aNCN9f0MZfNl+YZiNQyxEiIP9Phu9iZFRe9UgXvDrE9346uQM
 /Vxa5nvyj+D0Hg+l1jBFXvwoKxLHTZ/BaK74h2/QzRKgxKrHTAyYOgxqt7U/YCsa
 S489eHOGNwbqIKACljBcMX9YZeCdeixt4aK/VIRr+d1f2Ui30Z05ExuYOfGn4wBq
 VcSsBgAALwcyqrpi61URYvKduHTa2D8ATKNKXNM/i1Wv6KQDeom5jyKj3mE7UNsI
 WAgWE2xy74kjYs1Lc0izsfieyZrt1MqZiyYFjS86Cnt0wVmdAQYrQ3wciN5D9IJN
 MiUYuienGRf1MNM2tmFK+N7e+pa/waKcl15/d3Zal94rlATpR2bFOI1zJ9qJ8rpi
 k8B/eydAZx+oUthxWvfIdLwIREJzA49jzBHgfAy/CMjqPCj7gZ8=
 =FDkz
 -----END PGP SIGNATURE-----

Merge tag 'fscache-fixes-20220708' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull fscache fixes from David Howells:

 - Fix a check in fscache_wait_on_volume_collision() in which the
   polarity is reversed. It should complain if a volume is still marked
   acquisition-pending after 20s, but instead complains if the mark has
   been cleared (ie. the condition has cleared).

   Also switch an open-coded test of the ACQUIRE_PENDING volume flag to
   use the helper function for consistency.

 - Not a fix per se, but neaten the code by using a helper to check for
   the DROPPED state.

 - Fix cachefiles's support for erofs to only flush requests associated
   with a released control file, not all requests.

 - Fix a race between one process invalidating an object in the cache
   and another process trying to look it up.

* tag 'fscache-fixes-20220708' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  fscache: Fix invalidation/lookup race
  cachefiles: narrow the scope of flushed requests when releasing fd
  fscache: Introduce fscache_cookie_is_dropped()
  fscache: Fix if condition in fscache_wait_on_volume_collision()
2022-07-08 16:08:48 -07:00
Jakub Kicinski
7c895ef884 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
bpf 2022-07-08

We've added 3 non-merge commits during the last 2 day(s) which contain
a total of 7 files changed, 40 insertions(+), 24 deletions(-).

The main changes are:

1) Fix cBPF splat triggered by skb not having a mac header, from Eric Dumazet.

2) Fix spurious packet loss in generic XDP when pushing packets out (note
   that native XDP is not affected by the issue), from Johan Almbladh.

3) Fix bpf_dynptr_{read,write}() helper signatures with flag argument before
   its set in stone as UAPI, from Joanne Koong.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs
  bpf: Make sure mac_header was set before using it
  xdp: Fix spurious packet loss in generic XDP TX path
====================

Link: https://lore.kernel.org/r/20220708213418.19626-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 15:24:16 -07:00
Sven Schnelle
3418357a32 ptrace: fix clearing of JOBCTL_TRACED in ptrace_unfreeze_traced()
CI reported the following splat while running the strace testsuite:

[ 3976.640309] WARNING: CPU: 1 PID: 3570031 at kernel/ptrace.c:272 ptrace_check_attach+0x12e/0x178
[ 3976.640391] CPU: 1 PID: 3570031 Comm: strace Tainted: G           OE     5.19.0-20220624.rc3.git0.ee819a77d4e7.300.fc36.s390x #1
[ 3976.640410] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
[ 3976.640452] Call Trace:
[ 3976.640454]  [<00000000ab4b645a>] ptrace_check_attach+0x132/0x178
[ 3976.640457] ([<00000000ab4b6450>] ptrace_check_attach+0x128/0x178)
[ 3976.640460]  [<00000000ab4b6cde>] __s390x_sys_ptrace+0x86/0x160
[ 3976.640463]  [<00000000ac03fcec>] __do_syscall+0x1d4/0x200
[ 3976.640468]  [<00000000ac04e312>] system_call+0x82/0xb0
[ 3976.640470] Last Breaking-Event-Address:
[ 3976.640471]  [<00000000ab4ea3c8>] wait_task_inactive+0x98/0x190

This is because JOBCTL_TRACED is set, but the task is not in TASK_TRACED
state. Caused by ptrace_unfreeze_traced() which does:

task->jobctl &= ~TASK_TRACED

but it should be:

task->jobctl &= ~JOBCTL_TRACED

Fixes: 31cae1eaae ("sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220706101625.2100298-1-svens@linux.ibm.com
Link: https://lkml.kernel.org/r/YrHA5UkJLornOdCz@li-4a3a4a4c-28e5-11b2-a85c-a8d192c6f089.ibm.com
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2101641
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Tested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2022-07-08 16:08:24 -05:00
Linus Torvalds
525496a030 ACPI fixes for 5.19-rc6
- Prevent _CPC from being used if the platform firmware does not
    confirm CPPC v2 support via _OSC (Mario Limonciello).
 
  - Allow systems with X86_FEATURE_CPPC set to use _CPC even if CPPC
    support cannot be agreed on via _OSC (Mario Limonciello).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmLIgTwSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxmDAP/1OEgChbdJNvhg7no4BHEwSyuArAVcRD
 +D5bCiPq8x5dSNke3BrN4n789fhsHphXucqI5yTq7lX1AonyO1EHkc+oG+EGrZ+8
 dyw+6Nx4WdJkOLKrEwQm3QQYuBK/k4pCUwhWBBg5FEsn4QldBE2pzEC8aP7gNgN6
 gAxs+6nC7r/9fMoRo8+SG9Xrr1FH1mddru4WTp6croLNrlLVXTaoAHwXMB4qk8in
 JC8bvfnWSWXn05fvJ+nYf8wb93tjBsuyx/Om7kmP7V1WT30fidv+BdP50x6kFq9/
 nE0iTAAgqiO1SpRpUHfYc63GUvBC9d9ZUsbSyT7fMEqZCqLRsKukUA5xFxgQORBi
 fKWVVSjOaQUFk0eeivdqWTgpaIRkjxVCmLFxGN/vY3T1pqAmJIdoY5x7ZzxuOrBW
 6h1zOT0Qm9LURrlnXhcFHXALXFAGVpYCoP4X0PhNi9fZCSrjraG77dar0fj5KgHv
 pUnLTL1qHxrzX/Ax7LJ/Usc+mKO8s/Wd790CvtFElHIsH5p7Bx/l3BDdQHbGK2l2
 ToO9WQQCumccXs6SHCVYQJIH1pr0UIB2X15FNrmLVem8WkEBFJ7DwrZ/nitenOV3
 c9BsQr+rJ3s9S9aSYvIAadurPzC6x5lKu0GKMUfM2GUl/vVcFrzkzlAATD467/IC
 Z7gsfL1UtOwx
 =yWsx
 -----END PGP SIGNATURE-----

Merge tag 'acpi-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix two recent regressions related to CPPC support.

  Specifics:

   - Prevent _CPC from being used if the platform firmware does not
     confirm CPPC v2 support via _OSC (Mario Limonciello)

   - Allow systems with X86_FEATURE_CPPC set to use _CPC even if CPPC
     support cannot be agreed on via _OSC (Mario Limonciello)"

* tag 'acpi-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported
  ACPI: CPPC: Only probe for _CPC if CPPC v2 is acked
2022-07-08 13:05:56 -07:00
Linus Torvalds
3784fad934 Power management fixes for 5.19-rc6
- Fix NULL pointer dereference related to printing a diagnostic
    message in the exynos-bus devfreq driver (Christian Marangi).
 
  - Fix race condition in the runtime PM framework which in some
    cases may cause a supplier device to be suspended when its
    consumer is still active (Rafael Wysocki).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmLIgKMSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxNgAP/R4t7jx+LyeQbX9JzdY/bA6NTGIdkMbJ
 RhA9Nc3Sbsb+KXulsccq108O4CZwjiCEngvZHkOJJtzgRzLL7jxV6J63tNfLLS60
 oJS4EAOAZ0c1JiO/Fh9G/c6stKdPAajs4PNSHpGcsq8m1e8CktmWPqW40OPD6kDk
 YbkdMkO/7KJtOBX4IwzF67lYMQsulUGi2TAsq6sxsC00KA2A1CoXTWzoamDKtOD3
 8y5JKKRCg7nYhZ7/qxy1zlnhHFJGzVppWbSv98DdZdvsKyizAkk7agBOP4Ld3iPT
 ZtA1L0Pgjx6qUoQ7D7rUgNrmnHGCk77aluAkg/gtS2GI61bvbtT3apyYusJVC0jg
 IKI3MHZZTwqdsEyoBxxWXuRciv1HXXrcwSMB3S9pBAwZEES+GwDFO9MwStkAv8zi
 acceNirUiawODs6qNZjS8Ycsaz56DHCjcsBBCvmP5NnNA8kFdqxB4muz1SRmqubE
 N9BPNDLFCoZArUkwg35QL9EZZLYjLjf5YtF9Xe3lPCaQdbPaIc+4MfXkEp4HHYC3
 V1vvbyZbkq/xSNNCjXRoF4n1x2IZvz7SpsdPjxqADpVNxpL/d/dET8/NBXGxAwJE
 KLh/iElyuIL17mgqH/fXpYNI6bqphxhe+sasH/nFpKOewdEQDtnYqmz6GQQGH469
 /42eiBC4saNB
 =iyaY
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix a NULL pointer dereference in a devfreq driver and a runtime
  PM framework issue that may cause a supplier device to be suspended
  before its consumer.

  Specifics:

   - Fix NULL pointer dereference related to printing a diagnostic
     message in the exynos-bus devfreq driver (Christian Marangi)

   - Fix race condition in the runtime PM framework which in some cases
     may cause a supplier device to be suspended when its consumer is
     still active (Rafael Wysocki)"

* tag 'pm-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / devfreq: exynos-bus: Fix NULL pointer dereference
  PM: runtime: Fix supplier device management during consumer probe
  PM: runtime: Redefine pm_runtime_release_supplier()
2022-07-08 13:01:04 -07:00
Linus Torvalds
483e4a1d83 cxl fixes for 5.19-rc6
- Update MAINTAINERS for Ben's email
 
 - Fix cleanup of port devices on failure to probe driver
 
 - Fix endianness in get/set LSA mailbox command structures
 
 - Fix memregion_free() fallback definition
 
 - Fix missing variable payload checks in CXL cmd size validation
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQT9vPEBxh63bwxRYEEPzq5USduLdgUCYsiAcQAKCRAPzq5USduL
 duFHAP9Szd4QqD4hrqjkUeAHqwm6Sa9xgReB+Pbk3VuMbR25YwD9FhLA5fy9UL7Q
 oG8srdh0SqTB/kdOs2hnHUTEnonBGAw=
 =fso6
 -----END PGP SIGNATURE-----

Merge tag 'cxl-fixes-for-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl fixes from Vishal Verma:

 - Update MAINTAINERS for Ben's email

 - Fix cleanup of port devices on failure to probe driver

 - Fix endianness in get/set LSA mailbox command structures

 - Fix memregion_free() fallback definition

 - Fix missing variable payload checks in CXL cmd size validation

* tag 'cxl-fixes-for-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/mbox: Fix missing variable payload checks in cmd size validation
  memregion: Fix memregion_free() fallback definition
  cxl/mbox: Use __le32 in get,set_lsa mailbox structures
  cxl/core: Use is_endpoint_decoder
  cxl: Fix cleanup of port devices on failure to probe driver.
  MAINTAINERS: Update Ben's email address
2022-07-08 12:55:25 -07:00
Linus Torvalds
f5645edf6c IOMMU Fixes for Linux v5.19-rc5
Including:
 
 	- Patch for the Intel VT-d driver to fix device setup failures
 	  when the PASID table is shared
 
 	- Fix for Intel VT-d device hot-add failure due to wrong device
 	  notifier order
 
 	- Remove the old IOMMU mailing list from the MAINTAINERS file
 	  now that it has been retired
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmLIUbYACgkQK/BELZcB
 GuPSYBAAo5HfhNcdBaecojN/bPxvOpZBm00Tta1XQ6KnEKGiD+qgE3EhZZgmeaEq
 yrpomHhGzb6Der+eq0mdKSYGTY4RTPFbwTpmZh/VtCm5aLnhjeyjgk5y3fFDlsiU
 wZfKr1BrBZvJ+yAgpAdvIKsZEjsNyK+2cwvLz36WlMHqzMtNtqaU0OF3oxF+DC08
 CzFOlRSqOiyp2Ct4FP9yEXU8sVwvqnHQbuekn8L+loQB5zm5SgIOVy7ttb4idPEn
 Q2Kg6imxRVSA8MXvMR8OleQrnOTZrMZVr5OuMbsFPQ2H9RxX7Bq92AMPSReYGRhi
 ep6WOVaAq14PxlL4y4HyEd6RYkiNYyz0hdnoXgezJF3k0hoG6hZLdG/SSK5l7zGY
 +hrMmKgnF7uPz1M7YFenoy5p5EudzRBafIRgXrOZNZoXUdwNhh8hqJfjRev4op/y
 t8Mt1j6goX9icUonPK5SN8MrDo8EaQj0BgC0d2ihScA3uOhVGSfCzLfhNDkGg5Jy
 b936bgBzwv8Q7kBORPMmzK1+sNu28/NJnSmURGeSGGwZnsSOEd/GRjZkf7iGc9u7
 RrRpW4dH3RlwcUY/ug6UcGqq7xaQPSS//PIVQ/dmjCjX32Xk95jurt1v/hLf0Cm8
 eXSp1Q69kQAiNsu4uvZMfYRsVzh9Z2+UgdpoVXQH20ct94M92vA=
 =KPue
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - fix device setup failures in the Intel VT-d driver when the PASID
   table is shared

 - fix Intel VT-d device hot-add failure due to wrong device notifier
   order

 - remove the old IOMMU mailing list from the MAINTAINERS file now that
   it has been retired

* tag 'iommu-fixes-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  MAINTAINERS: Remove iommu@lists.linux-foundation.org
  iommu/vt-d: Fix RID2PASID setup/teardown failure
  iommu/vt-d: Fix PCI bus rescan device hot add
2022-07-08 12:49:00 -07:00
Linus Torvalds
2b93fe647c gpio fixes for v5.19-rc6
- fix a build error in gpio-vf610
 - fix a null-pointer dereference in the GPIO character device code
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmLIQ2YACgkQEacuoBRx
 13KmMRAAqP0dMz9xocZWIif2feUYy4EAHGidIh0AN2ehTW0urmb6e8KAzOL4t0h6
 O7ODeCHXvtM7xGqmeFPveyY+yygSfB1oitQQWawePpUHtG0jHicG6gtV70S7k0BA
 MZbBhWHmJ0iC0s3x5DPnXlJ2/jqBjaF7UZi4bHVMMMfjB049YVTKQJ3mS/KTEFv8
 Wn5GBwSACwSlYYA9opjJNK7fxxU4z7tbiycORLFKoiG66AdOGz7aZeI9Rr06EZ3T
 ftkawIfxOtBivvcRdQmqKF3pvYju/aWnSTfCErmuifHP+sSsna8EZke+qvyx2YJp
 +Q95QObc/7sghKWjSNz1vvGRdT27kzyJdquopzVDdV5+vM6X7OGS/leSmGTo5m0f
 WAt8ZCmyGcTAsK3eL814vhntYls9aTiCeHLYD3i5k+cVym8qHxzGLe8vls8+Mmvy
 U/S0obfFHbe2dur4ET2f71vjF4niEJmDwNYTe5tCzTNP/9fWcw8LLMOKfdjLldu7
 44ennv+FAKE9eLRrRjQ8CkpXxUl++IV/TEtbTTY780ocSG4Olr951Ns7YQQC4aV3
 ro3SzmgBaeO7UDvelCUsxhQT1G3faBmhwaJh57CegGFSKQYN1waCmT/oispz4U0p
 08Xa7f2iNnWjBqI4nLWe2vad7+2Lq0SwH9jhp9oIw2qmVGGnz0I=
 =Fq0O
 -----END PGP SIGNATURE-----

Merge tag 'gpio-fixes-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

 - fix a build error in gpio-vf610

 - fix a null-pointer dereference in the GPIO character device code

* tag 'gpio-fixes-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: cdev: fix null pointer dereference in linereq_free()
  gpio: vf610: fix compilation error
2022-07-08 12:39:52 -07:00
Rafael J. Wysocki
fe7c758c07 Merge branch 'pm-core'
Merge a runtime PM framework cleanup and fix related to device links.

* pm-core:
  PM: runtime: Fix supplier device management during consumer probe
  PM: runtime: Redefine pm_runtime_release_supplier()
2022-07-08 20:38:51 +02:00
Linus Torvalds
a471da3100 block-5.19-2022-07-08
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLIJFEQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgIPD/9InvUKGh8y27bZT+y5F7aucRm1wAZGa016
 EESTOapGUb0lWo3X3JGDjwfnpUhCWwxsHs5R13ZEdrwEWh1q55TJiCc9MGt2X+NB
 f5fJHeoa9hWTlG/iFi7+ElkuTER1TcKqxysL3Gci6JWlQ1ObYVBOZAi9/9v5r5t8
 3Cv7KfkhhclVOFpEMio00YTSl54axKToqFoNr0jE10lVXZtZtIsPlRAt8QvG66qA
 K10rbznRrKmtKJvzHKx+3PrTj77QIaifcH9bqRu4KWqs2SsoEUIjlVKUkbz8F16g
 +YmhfktYaatrPI/G3/tACKkdCi537HdUVyJFRFNw1439+DSETJNi2nQjeVdatsTe
 kEL+5RlrUfjUnQpEWUBOmW4T5mNB0sryxSiOjSNQx0X8Mulp6SI7CaNovioI0SWS
 7lydZZpb3difQjdrA1iGU8dpLtyDPRPeQZOJV9jPodhMYzr5UB4qTX2NkWTFSnr3
 tiIflRdEdbGCVc5E41OVMpIAzdyLQWQDRn3r/3T/hScN99mKKAEv31HoPP4etmRw
 M8mvwrJwNRrbOd+1QiAgblh/3ozqafmFEAI3LZBtCbupb1fWG2qaqXer5KPWpCXv
 Kor+Q6dGWK0pQFSxoMaFwjDUo7gZiL9RxkmZEpkr+w1rKA3gqlx4b+NlZV7RbJez
 2tijGLQtQA==
 =UQiB
 -----END PGP SIGNATURE-----

Merge tag 'block-5.19-2022-07-08' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "NVMe pull request with another id quirk addition, and a tracing fix"

* tag 'block-5.19-2022-07-08' of git://git.kernel.dk/linux-block:
  nvme: use struct group for generic command dwords
  nvme-pci: phison e16 has bogus namespace ids
2022-07-08 11:32:23 -07:00
Linus Torvalds
29837019d5 io_uring-5.19-2022-07-08
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLIJGcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgS1D/9E2MxaWaHR+A35AXignJLaXxiLWlzTAwxr
 /ioT6omgogbiaoTrxVZMbW+6vAPbUamXA9yqv2fC/4RERBfz3Z6UBLa8Hp6SFCMQ
 vb/LSwRwnUuMr/HyF3hLjwISOi4oYs/uo3E7IpOPbpC/e0nDpJnGWx1UfQn0tJSH
 WVwZsLbUe8T9fhHA0uSHbMoMSvLQGqfwwY+MET2+j+ZDwMoet194yka22jJwfDbF
 l3cnUe2TQh6orRdzuagWX9WmdnWWyQM3DTqW2cSA0hepyxGMWkCMhMuyV5yqUXhD
 noHshcyL76h8WQi/BwYDAGYqGy1+FkOkuV3DmVnjHVQory17Ze8ijtImMoEWpkgl
 TwTTd2+o0ivcEd0JHLeqLHkTXKUENeUMTpJVuLotLMMdupIvF0jrdNWTWCM7uBto
 Q9JxIkEs+16bRqT+yzC4cNuzSQRL6+qQ5jVO5BsNmJoNvs15KN7vgAQ+uR4NCCIv
 GqHbTiBVsi7DYipS6jNi/bWnxDtIsNsn48WCdx52OYpd1NbkY3oEHMqBZ6rnsPJH
 /uek3VLajRZfG61EMUGlazitQdW3/z31wM0iP8Y8xPyvSNCbsJkFtrNO13jpuy9p
 0YRP3peXYb2eEzYpq355RSCCpyActDYp77hjcyYQP3gJcnoViZ6rBUHr5Rw5W6lN
 siwWz7aNXg==
 =w3SQ
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.19-2022-07-08' of git://git.kernel.dk/linux-block

Pull io_uring tweak from Jens Axboe:
 "Just a minor tweak to an addition made in this release cycle: padding
  a 32-bit value that's in a 64-bit union to avoid any potential
  funkiness from that"

* tag 'io_uring-5.19-2022-07-08' of git://git.kernel.dk/linux-block:
  io_uring: explicit sqe padding for ioctl commands
2022-07-08 11:25:01 -07:00
Linus Torvalds
086ff84617 fbdev fixes and updates for kernel v5.19-rc6:
fbcon now prevents switching to screen resolutions which are smaller
 than the font size, and prevents enabling a font which is bigger than
 the current screen resolution. This fixes vmalloc-out-of-bounds accesses
 found by KASAN.
 
 Guiling Deng fixed a bug where the centered fbdev logo wasn't displayed
 correctly if the screen size matched the logo size.
 
 Hsin-Yi Wang provided a patch to include errno.h to fix build when
 CONFIG_OF isn't enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCYsfdCgAKCRD3ErUQojoP
 XxhvAP9HYH0XcfHGEIQ3YSyFRY4JpyLb0TcTD7mnrYwPgAw+KAD7Bw9EE7WhGZLC
 7iuDn30/mdCFqiz3IuTjoRKYuG2ceg8=
 =yrqq
 -----END PGP SIGNATURE-----

Merge tag 'for-5.19/fbdev-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull fbdev fixes from Helge Deller:

 - fbcon now prevents switching to screen resolutions which are smaller
   than the font size, and prevents enabling a font which is bigger than
   the current screen resolution. This fixes vmalloc-out-of-bounds
   accesses found by KASAN.

 - Guiling Deng fixed a bug where the centered fbdev logo wasn't
   displayed correctly if the screen size matched the logo size.

 - Hsin-Yi Wang provided a patch to include errno.h to fix build when
   CONFIG_OF isn't enabled.

* tag 'for-5.19/fbdev-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbcon: Use fbcon_info_from_console() in fbcon_modechange_possible()
  fbmem: Check virtual screen sizes in fb_set_var()
  fbcon: Prevent that screen size is smaller than font size
  fbcon: Disallow setting font bigger than screen size
  video: of_display_timing.h: include errno.h
  fbdev: fbmem: Fix logo center image dx issue
2022-07-08 11:03:26 -07:00
Naohiro Aota
b3a3b02557 btrfs: zoned: drop optimization of zone finish
We have an optimization in do_zone_finish() to send REQ_OP_ZONE_FINISH only
when necessary, i.e. we don't send REQ_OP_ZONE_FINISH when we assume we
wrote fully into the zone.

The assumption is determined by "alloc_offset == capacity". This condition
won't work if the last ordered extent is canceled due to some errors. In
that case, we consider the zone is deactivated without sending the finish
command while it's still active.

This inconstancy results in activating another block group while we cannot
really activate the underlying zone, which causes the active zone exceeds
errors like below.

    BTRFS error (device nvme3n2): allocation failed flags 1, wanted 520192 tree-log 0, relocation: 0
    nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
    active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0
    nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
    active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0

Fix the issue by removing the optimization for now.

Fixes: 8376d9e1ed ("btrfs: zoned: finish superblock zone once no space left for new SB")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-07-08 19:18:00 +02:00
Christoph Hellwig
2963457829 btrfs: zoned: fix a leaked bioc in read_zone_info
The bioc would leak on the normal completion path and also on the RAID56
check (but that one won't happen in practice due to the invalid
combination with zoned mode).

Fixes: 7db1c5d14d ("btrfs: zoned: support dev-replace in zoned filesystems")
CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[ update changelog ]
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-07-08 19:13:32 +02:00
Filipe Manana
a4527e1853 btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents
When doing a direct IO read or write, we always return -ENOTBLK when we
find a compressed extent (or an inline extent) so that we fallback to
buffered IO. This however is not ideal in case we are in a NOWAIT context
(io_uring for example), because buffered IO can block and we currently
have no support for NOWAIT semantics for buffered IO, so if we need to
fallback to buffered IO we should first signal the caller that we may
need to block by returning -EAGAIN instead.

This behaviour can also result in short reads being returned to user
space, which although it's not incorrect and user space should be able
to deal with partial reads, it's somewhat surprising and even some popular
applications like QEMU (Link tag #1) and MariaDB (Link tag #2) don't
deal with short reads properly (or at all).

The short read case happens when we try to read from a range that has a
non-compressed and non-inline extent followed by a compressed extent.
After having read the first extent, when we find the compressed extent we
return -ENOTBLK from btrfs_dio_iomap_begin(), which results in iomap to
treat the request as a short read, returning 0 (success) and waiting for
previously submitted bios to complete (this happens at
fs/iomap/direct-io.c:__iomap_dio_rw()). After that, and while at
btrfs_file_read_iter(), we call filemap_read() to use buffered IO to
read the remaining data, and pass it the number of bytes we were able to
read with direct IO. Than at filemap_read() if we get a page fault error
when accessing the read buffer, we return a partial read instead of an
-EFAULT error, because the number of bytes previously read is greater
than zero.

So fix this by returning -EAGAIN for NOWAIT direct IO when we find a
compressed or an inline extent.

Reported-by: Dominique MARTINET <dominique.martinet@atmark-techno.com>
Link: https://lore.kernel.org/linux-btrfs/YrrFGO4A1jS0GI0G@atmark-techno.com/
Link: https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
Tested-by: Dominique MARTINET <dominique.martinet@atmark-techno.com>
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-07-08 19:13:22 +02:00
Christian Brauner
4a47c6385b ovl: turn of SB_POSIXACL with idmapped layers temporarily
This cycle we added support for mounting overlayfs on top of idmapped
mounts.  Recently I've started looking into potential corner cases when
trying to add additional tests and I noticed that reporting for POSIX ACLs
is currently wrong when using idmapped layers with overlayfs mounted on top
of it.

I have sent out an patch that fixes this and makes POSIX ACLs work
correctly but the patch is a bit bigger and we're already at -rc5 so I
recommend we simply don't raise SB_POSIXACL when idmapped layers are
used. Then we can fix the VFS part described below for the next merge
window so we can have good exposure in -next.

I'm going to give a rather detailed explanation to both the origin of the
problem and mention the solution so people know what's going on.

Let's assume the user creates the following directory layout and they have
a rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you
would expect files on your host system to be owned. For example, ~/.bashrc
for your regular user would be owned by 1000:1000 and /root/.bashrc would
be owned by 0:0. IOW, this is just regular boring filesystem tree on an
ext4 or xfs filesystem.

The user chooses to set POSIX ACLs using the setfacl binary granting the
user with uid 4 read, write, and execute permissions for their .bashrc
file:

        setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc

Now they to expose the whole rootfs to a container using an idmapped
mount. So they first create:

        mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap}
        mkdir -pv /vol/contpool/ctrover/{over,work}
        chown 10000000:10000000 /vol/contpool/ctrover/{over,work}

The user now creates an idmapped mount for the rootfs:

        mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \
                                      /var/lib/lxc/c2/rootfs \
                                      /vol/contpool/lowermap

This for example makes it so that
/var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc which is owned by uid and gid
1000 as being owned by uid and gid 10001000 at
/vol/contpool/lowermap/home/ubuntu/.bashrc.

Assume the user wants to expose these idmapped mounts through an overlayfs
mount to a container.

       mount -t overlay overlay                      \
             -o lowerdir=/vol/contpool/lowermap,     \
                upperdir=/vol/contpool/overmap/over, \
                workdir=/vol/contpool/overmap/work   \
             /vol/contpool/merge

The user can do this in two ways:

(1) Mount overlayfs in the initial user namespace and expose it to the
    container.

(2) Mount overlayfs on top of the idmapped mounts inside of the container's
    user namespace.

Let's assume the user chooses the (1) option and mounts overlayfs on the
host and then changes into a container which uses the idmapping
0:10000000:65536 which is the same used for the two idmapped mounts.

Now the user tries to retrieve the POSIX ACLs using the getfacl command

        getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc

and to their surprise they see:

        # file: vol/contpool/merge/home/ubuntu/.bashrc
        # owner: 1000
        # group: 1000
        user::rw-
        user:4294967295:rwx
        group::r--
        mask::rwx
        other::r--

indicating the uid wasn't correctly translated according to the idmapped
mount. The problem is how we currently translate POSIX ACLs. Let's inspect
the callchain in this example:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                  |> vfs_getxattr()
                  |  -> __vfs_getxattr()
                  |     -> handler->get == ovl_posix_acl_xattr_get()
                  |        -> ovl_xattr_get()
                  |           -> vfs_getxattr()
                  |              -> __vfs_getxattr()
                  |                 -> handler->get() /* lower filesystem callback */
                  |> posix_acl_fix_xattr_to_user()
                     {
                              4 = make_kuid(&init_user_ns, 4);
                              4 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 4);
                              /* FAILURE */
                             -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                     }

If the user chooses to use option (2) and mounts overlayfs on top of
idmapped mounts inside the container things don't look that much better:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:10000000:65536

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                  |> vfs_getxattr()
                  |  -> __vfs_getxattr()
                  |     -> handler->get == ovl_posix_acl_xattr_get()
                  |        -> ovl_xattr_get()
                  |           -> vfs_getxattr()
                  |              -> __vfs_getxattr()
                  |                 -> handler->get() /* lower filesystem callback */
                  |> posix_acl_fix_xattr_to_user()
                     {
                              4 = make_kuid(&init_user_ns, 4);
                              4 = mapped_kuid_fs(&init_user_ns, 4);
                              /* FAILURE */
                             -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                     }

As is easily seen the problem arises because the idmapping of the lower
mount isn't taken into account as all of this happens in do_gexattr(). But
do_getxattr() is always called on an overlayfs mount and inode and thus
cannot possible take the idmapping of the lower layers into account.

This problem is similar for fscaps but there the translation happens as
part of vfs_getxattr() already. Let's walk through an fscaps overlayfs
callchain:

        setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc

The expected outcome here is that we'll receive the cap_net_raw capability
as we are able to map the uid associated with the fscap to 0 within our
container.  IOW, we want to see 0 as the result of the idmapping
translations.

If the user chooses option (1) we get the following callchain for fscaps:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                   -> vfs_getxattr()
                      -> xattr_getsecurity()
                         -> security_inode_getsecurity()                                       ________________________________
                            -> cap_inode_getsecurity()                                         |                              |
                               {                                                               V                              |
                                        10000000 = make_kuid(0:0:4k /* overlayfs idmapping */, 10000000);                     |
                                        10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                  |
                                               /* Expected result is 0 and thus that we own the fscap. */                     |
                                               0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);            |
                               }                                                                                              |
                               -> vfs_getxattr_alloc()                                                                        |
                                  -> handler->get == ovl_other_xattr_get()                                                    |
                                     -> vfs_getxattr()                                                                        |
                                        -> xattr_getsecurity()                                                                |
                                           -> security_inode_getsecurity()                                                    |
                                              -> cap_inode_getsecurity()                                                      |
                                                 {                                                                            |
                                                                0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);               |
                                                         10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); |
                                                         10000000 = from_kuid(0:0:4k /* overlayfs idmapping */, 10000000);    |
                                                         |____________________________________________________________________|
                                                 }
                                                 -> vfs_getxattr_alloc()
                                                    -> handler->get == /* lower filesystem callback */

And if the user chooses option (2) we get:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:10000000:65536

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                   -> vfs_getxattr()
                      -> xattr_getsecurity()
                         -> security_inode_getsecurity()                                                _______________________________
                            -> cap_inode_getsecurity()                                                  |                             |
                               {                                                                        V                             |
                                       10000000 = make_kuid(0:10000000:65536 /* overlayfs idmapping */, 0);                           |
                                       10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                           |
                                               /* Expected result is 0 and thus that we own the fscap. */                             |
                                              0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);                     |
                               }                                                                                                      |
                               -> vfs_getxattr_alloc()                                                                                |
                                  -> handler->get == ovl_other_xattr_get()                                                            |
                                    |-> vfs_getxattr()                                                                                |
                                        -> xattr_getsecurity()                                                                        |
                                           -> security_inode_getsecurity()                                                            |
                                              -> cap_inode_getsecurity()                                                              |
                                                 {                                                                                    |
                                                                 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);                      |
                                                          10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0);        |
                                                                 0 = from_kuid(0:10000000:65536 /* overlayfs idmapping */, 10000000); |
                                                                 |____________________________________________________________________|
                                                 }
                                                 -> vfs_getxattr_alloc()
                                                    -> handler->get == /* lower filesystem callback */

We can see how the translation happens correctly in those cases as the
conversion happens within the vfs_getxattr() helper.

For POSIX ACLs we need to do something similar. However, in contrast to
fscaps we cannot apply the fix directly to the kernel internal posix acl
data structure as this would alter the cached values and would also require
a rework of how we currently deal with POSIX ACLs in general which almost
never take the filesystem idmapping into account (the noteable exception
being FUSE but even there the implementation is special) and instead
retrieve the raw values based on the initial idmapping.

The correct values are then generated right before returning to
userspace. The fix for this is to move taking the mount's idmapping into
account directly in vfs_getxattr() instead of having it be part of
posix_acl_fix_xattr_to_user().

To this end we simply move the idmapped mount translation into a separate
step performed in vfs_{g,s}etxattr() instead of in
posix_acl_fix_xattr_{from,to}_user().

To see how this fixes things let's go back to the original example. Assume
the user chose option (1) and mounted overlayfs on top of idmapped mounts
on the host:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                  |> vfs_getxattr()
                  |  |> __vfs_getxattr()
                  |  |  -> handler->get == ovl_posix_acl_xattr_get()
                  |  |     -> ovl_xattr_get()
                  |  |        -> vfs_getxattr()
                  |  |           |> __vfs_getxattr()
                  |  |           |  -> handler->get() /* lower filesystem callback */
                  |  |           |> posix_acl_getxattr_idmapped_mnt()
                  |  |              {
                  |  |                              4 = make_kuid(&init_user_ns, 4);
                  |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                  |  |                       10000004 = from_kuid(&init_user_ns, 10000004);
                  |  |                       |_______________________
                  |  |              }                               |
                  |  |                                              |
                  |  |> posix_acl_getxattr_idmapped_mnt()           |
                  |     {                                           |
                  |                                                 V
                  |             10000004 = make_kuid(&init_user_ns, 10000004);
                  |             10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004);
                  |             10000004 = from_kuid(&init_user_ns, 10000004);
                  |     }       |_________________________________________________
                  |                                                              |
                  |                                                              |
                  |> posix_acl_fix_xattr_to_user()                               |
                     {                                                           V
                                 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                        /* SUCCESS */
                                        4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004);
                     }

And similarly if the user chooses option (1) and mounted overayfs on top of
idmapped mounts inside the container:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:10000000:65536

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                  |> vfs_getxattr()
                  |  |> __vfs_getxattr()
                  |  |  -> handler->get == ovl_posix_acl_xattr_get()
                  |  |     -> ovl_xattr_get()
                  |  |        -> vfs_getxattr()
                  |  |           |> __vfs_getxattr()
                  |  |           |  -> handler->get() /* lower filesystem callback */
                  |  |           |> posix_acl_getxattr_idmapped_mnt()
                  |  |              {
                  |  |                              4 = make_kuid(&init_user_ns, 4);
                  |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                  |  |                       10000004 = from_kuid(&init_user_ns, 10000004);
                  |  |                       |_______________________
                  |  |              }                               |
                  |  |                                              |
                  |  |> posix_acl_getxattr_idmapped_mnt()           |
                  |     {                                           V
                  |             10000004 = make_kuid(&init_user_ns, 10000004);
                  |             10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004);
                  |             10000004 = from_kuid(0(&init_user_ns, 10000004);
                  |             |_________________________________________________
                  |     }                                                        |
                  |                                                              |
                  |> posix_acl_fix_xattr_to_user()                               |
                     {                                                           V
                                 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                        /* SUCCESS */
                                        4 = from_kuid(0:10000000:65536 /* caller's idmappings */, 10000004);
                     }

The last remaining problem we need to fix here is ovl_get_acl(). During
ovl_permission() overlayfs will call:

        ovl_permission()
        -> generic_permission()
           -> acl_permission_check()
              -> check_acl()
                 -> get_acl()
                    -> inode->i_op->get_acl() == ovl_get_acl()
                        > get_acl() /* on the underlying filesystem)
                          ->inode->i_op->get_acl() == /*lower filesystem callback */
                 -> posix_acl_permission()

passing through the get_acl request to the underlying filesystem. This will
retrieve the acls stored in the lower filesystem without taking the
idmapping of the underlying mount into account as this would mean altering
the cached values for the lower filesystem. The simple solution is to have
ovl_get_acl() simply duplicate the ACLs, update the values according to the
idmapped mount and return it to acl_permission_check() so it can be used in
posix_acl_permission(). Since overlayfs doesn't cache ACLs they'll be
released right after.

Link: https://github.com/brauner/mount-idmapped/issues/9
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: linux-unionfs@vger.kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Fixes: bc70682a49 ("ovl: support idmapped layers")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-07-08 15:48:31 +02:00
David S. Miller
32b3ad1418 Merge branch 'sysctl-data-races'
Kuniyuki Iwashima says:

====================
sysctl: Fix data-races around ipv4_table.

A sysctl variable is accessed concurrently, and there is always a chance
of data-race.  So, all readers and writers need some basic protection to
avoid load/store-tearing.

The first half of this series changes some proc handlers used in ipv4_table
to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the
sysctl side.  Then, the second half adds READ_ONCE() to the other readers
of ipv4_table.

Changes:
  v2:
    * Drop some changes that makes backporting difficult
      * First cleanup patch
      * Lockless helpers and .proc_handler changes
    * Drop the tracing part for .sysctl_mem
      * Steve already posted a fix
    * Drop int-to-bool change for cipso
      * Should be posted to net-next later
    * Drop proc_dobool() change
      * Can be included in another series

  v1: https://lore.kernel.org/netdev/20220706052130.16368-1-kuniyu@amazon.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:34 +01:00
Kuniyuki Iwashima
73318c4b7d ipv4: Fix a data-race around sysctl_fib_sync_mem.
While reading sysctl_fib_sync_mem, it can be changed concurrently.
So, we need to add READ_ONCE() to avoid a data-race.

Fixes: 9ab948a91b ("ipv4: Allow amount of dirty memory from fib resizing to be controllable")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:34 +01:00
Kuniyuki Iwashima
48d7ee321e icmp: Fix data-races around sysctl.
While reading icmp sysctl variables, they can be changed concurrently.
So, we need to add READ_ONCE() to avoid data-races.

Fixes: 4cdf507d54 ("icmp: add a global rate limitation")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:34 +01:00
Kuniyuki Iwashima
dd44f04b92 cipso: Fix data-races around sysctl.
While reading cipso sysctl variables, they can be changed concurrently.
So, we need to add READ_ONCE() to avoid data-races.

Fixes: 446fda4f26 ("[NetLabel]: CIPSOv4 engine")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:33 +01:00
Kuniyuki Iwashima
310731e2f1 net: Fix data-races around sysctl_mem.
While reading .sysctl_mem, it can be changed concurrently.
So, we need to add READ_ONCE() to avoid data-races.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:33 +01:00
Kuniyuki Iwashima
3d32edf1f3 inetpeer: Fix data-races around sysctl.
While reading inetpeer sysctl variables, they can be changed
concurrently.  So, we need to add READ_ONCE() to avoid data-races.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:33 +01:00
Kuniyuki Iwashima
47e6ab24e8 tcp: Fix a data-race around sysctl_tcp_max_orphans.
While reading sysctl_tcp_max_orphans, it can be changed concurrently.
So, we need to add READ_ONCE() to avoid a data-race.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:33 +01:00
Kuniyuki Iwashima
e877820877 sysctl: Fix data races in proc_dointvec_jiffies().
A sysctl variable is accessed concurrently, and there is always a chance
of data-race.  So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_dointvec_jiffies() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side.  For now,
proc_dointvec_jiffies() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:33 +01:00
Kuniyuki Iwashima
c31bcc8fb8 sysctl: Fix data races in proc_doulongvec_minmax().
A sysctl variable is accessed concurrently, and there is always a chance
of data-race.  So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_doulongvec_minmax() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side.  For now,
proc_doulongvec_minmax() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:33 +01:00
Kuniyuki Iwashima
2d3b559df3 sysctl: Fix data races in proc_douintvec_minmax().
A sysctl variable is accessed concurrently, and there is always a chance
of data-race.  So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_douintvec_minmax() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side.  For now,
proc_douintvec_minmax() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.

Fixes: 61d9b56a89 ("sysctl: add unsigned int range support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:33 +01:00
Kuniyuki Iwashima
f613d86d01 sysctl: Fix data races in proc_dointvec_minmax().
A sysctl variable is accessed concurrently, and there is always a chance
of data-race.  So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_dointvec_minmax() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side.  For now,
proc_dointvec_minmax() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:32 +01:00
Kuniyuki Iwashima
4762b532ec sysctl: Fix data races in proc_douintvec().
A sysctl variable is accessed concurrently, and there is always a chance
of data-race.  So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_douintvec() to use READ_ONCE() and WRITE_ONCE()
internally to fix data-races on the sysctl side.  For now, proc_douintvec()
itself is tolerant to a data-race, but we still need to add annotations on
the other subsystem's side.

Fixes: e7d316a02f ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:32 +01:00
Kuniyuki Iwashima
1f1be04b4d sysctl: Fix data races in proc_dointvec().
A sysctl variable is accessed concurrently, and there is always a chance
of data-race.  So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_dointvec() to use READ_ONCE() and WRITE_ONCE()
internally to fix data-races on the sysctl side.  For now, proc_dointvec()
itself is tolerant to a data-race, but we still need to add annotations on
the other subsystem's side.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:32 +01:00
Steven Rostedt (Google)
820b8963ad net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointer
The trace event sock_exceed_buf_limit saves the prot->sysctl_mem pointer
and then dereferences it in the TP_printk() portion. This is unsafe as the
TP_printk() portion is executed at the time the buffer is read. That is,
it can be seconds, minutes, days, months, even years later. If the proto
is freed, then this dereference will can also lead to a kernel crash.

Instead, save the sysctl_mem array into the ring buffer and have the
TP_printk() reference that instead. This is the proper and safe way to
read pointers in trace events.

Link: https://lore.kernel.org/all/20220706052130.16368-12-kuniyu@amazon.com/

Cc: stable@vger.kernel.org
Fixes: 3847ce32ae ("core: add tracepoints for queueing skb to rcvbuf")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:06:17 +01:00
Thadeu Lima de Souza Cascardo
2259da159f x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported
There are some VM configurations which have Skylake model but do not
support IBPB. In those cases, when using retbleed=ibpb, userspace is going
to be killed and kernel is going to panic.

If the CPU does not support IBPB, warn and proceed with the auto option. Also,
do not fallback to IBPB on AMD/Hygon systems if it is not supported.

Fixes: 3ebc170068 ("x86/bugs: Add retbleed=ibpb")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-08 12:50:52 +02:00
Joanne Koong
f8d3da4ef8 bpf: Add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs
Commit 13bbbfbea7 ("bpf: Add bpf_dynptr_read and bpf_dynptr_write")
added the bpf_dynptr_write() and bpf_dynptr_read() APIs.

However, it will be needed for some dynptr types to pass in flags as
well (e.g. when writing to a skb, the user may like to invalidate the
hash or recompute the checksum).

This patch adds a "u64 flags" arg to the bpf_dynptr_read() and
bpf_dynptr_write() APIs before their UAPI signature freezes where
we then cannot change them anymore with a 5.19.x released kernel.

Fixes: 13bbbfbea7 ("bpf: Add bpf_dynptr_read and bpf_dynptr_write")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20220706232547.4016651-1-joannelkoong@gmail.com
2022-07-08 10:55:53 +02:00
Joerg Roedel
c51b8f85c4 MAINTAINERS: Remove iommu@lists.linux-foundation.org
The IOMMU mailing list has moved to iommu@lists.linux.dev
and the old list should bounce by now. Remove it from the
MAINTAINERS file.

Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20220706103331.10215-1-joro@8bytes.org
2022-07-08 09:34:26 +02:00
Pavel Skripkin
f46fd3d7c3 net: ocelot: fix wrong time_after usage
Accidentally noticed, that this driver is the only user of
while (time_after(jiffies...)).

It looks like typo, because likely this while loop will finish after 1st
iteration, because time_after() returns true when 1st argument _is after_
2nd one.

There is one possible problem with this poll loop: the scheduler could put
the thread to sleep, and it does not get woken up for
OCELOT_FDMA_CH_SAFE_TIMEOUT_US. During that time, the hardware has done
its thing, but you exit the while loop and return -ETIMEDOUT.

Fix it by using sane poll API that avoids all problems described above

Fixes: 753a026cfe ("net: ocelot: add FDMA support")
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220706132845.27968-1-paskripkin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07 18:05:36 -07:00
Jakub Kicinski
fe5235aef8 mlx5-fixes-2022-07-06
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmLGFrwACgkQSD+KveBX
 +j6frQf9Hb1Qg4+gzRuhC35GgZ72HTWSWmECnQKx98puSSYrtTMGUUUgrJTf0O5V
 Jxrm7bWG4/D5ykfWcXWJm7bQZNgnYia9vC8YAWJpHd7zzCcecQBU8qhRH371eMfp
 wUs8rsNxIOJyhLvmQ+bLO/7Njb3cPrahnurWAoLVspjpR4nzbrVOjPAznjMHeXur
 OIiVgr0DPbYf/DaacKgiymlE6z+qziBuu2gohv21008EvW2ymzlqU33eGO1pKblt
 icOlx6wcEgbyrJmdy6gd9eCO/qd+z7EIUYfbcTSDdrVpOpH4HQjXMLEgKBZqCTq4
 lkQpJIamFRSgsAJb8sF58b/RhcQofA==
 =DOa+
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2022-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2022-07-06

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2022-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Ring the TX doorbell on DMA errors
  net/mlx5e: Fix capability check for updating vnic env counters
  net/mlx5e: CT: Use own workqueue instead of mlx5e priv
  net/mlx5: Lag, correct get the port select mode str
  net/mlx5e: Fix enabling sriov while tc nic rules are offloaded
  net/mlx5e: kTLS, Fix build time constant test in RX
  net/mlx5e: kTLS, Fix build time constant test in TX
  net/mlx5: Lag, decouple FDB selection and shared FDB
  net/mlx5: TC, allow offload from uplink to other PF's VF
====================

Link: https://lore.kernel.org/r/20220706231309.38579-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07 17:44:45 -07:00
Siddharth Vadapalli
0680e20af5 net: ethernet: ti: am65-cpsw: Fix devlink port register sequence
Renaming interfaces using udevd depends on the interface being registered
before its netdev is registered. Otherwise, udevd reads an empty
phys_port_name value, resulting in the interface not being renamed.

Fix this by registering the interface before registering its netdev
by invoking am65_cpsw_nuss_register_devlink() before invoking
register_netdev() for the interface.

Move the function call to devlink_port_type_eth_set(), invoking it after
register_netdev() is invoked, to ensure that netlink notification for the
port state change is generated after the netdev is completely initialized.

Fixes: 58356eb31d ("net: ti: am65-cpsw-nuss: Add devlink support")
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://lore.kernel.org/r/20220706070208.12207-1-s-vadapalli@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07 17:21:44 -07:00
Jon Hunter
029c1c2059 net: stmmac: dwc-qos: Disable split header for Tegra194
There is a long-standing issue with the Synopsys DWC Ethernet driver
for Tegra194 where random system crashes have been observed [0]. The
problem occurs when the split header feature is enabled in the stmmac
driver. In the bad case, a larger than expected buffer length is
received and causes the calculation of the total buffer length to
overflow. This results in a very large buffer length that causes the
kernel to crash. Why this larger buffer length is received is not clear,
however, the feedback from the NVIDIA design team is that the split
header feature is not supported for Tegra194. Therefore, disable split
header support for Tegra194 to prevent these random crashes from
occurring.

[0] https://lore.kernel.org/linux-tegra/b0b17697-f23e-8fa5-3757-604a86f3a095@nvidia.com/

Fixes: 67afd6d1cf ("net: stmmac: Add Split Header support and enable it in XGMAC cores")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lore.kernel.org/r/20220706083913.13750-1-jonathanh@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07 17:05:01 -07:00
Jens Axboe
6b0de7d0f3 nvme fixes for Linux 5.19
- another bogus identifier quirk (Keith Busch)
  - use struct group in the tracer to avoid a gcc warning (Keith Busch)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmLG3mQLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYNoRQ/9HOXrJuVXUxtm86vs0cxF94uyGs5WvuHn2SEt7cA9
 bYporFmqz2U/I6+tCXMS5ZlxAgzE9+lFedizjEVJsZ4yTxf6fYUTpf4JFUflghVr
 Fnqhe8bmbCpBrz6jwFjX7cuwIOw5BPenZnCGojPsKuTICMbYjo/OlTFgfJjlFOhm
 d0WY9z1CkS0c9FL+CzO6ST0GWsKtjpkjhMokWjqOV9txcXaGdzdac7nr7Bky0fZC
 XwFmokAnmGvko5X88EQNq1+p/QkRnY8ucjy5vhBU9XuXiJAJScgD7pLreqzLvHyK
 6qwgBCCpeg6x4A19LgmUDNgZi1Apa/F/DHjpeVwN5C7hJn1ZIKWqGV6TfLwm6V5Z
 bqwhWzs0JXUFLW2Zgoh0B5Apro33ABn3o+qY5rwdFlAiS/SH12EWlo7OW1/tVa0w
 P5c0iZvRfwDeWImBWCYPvq/dh6IAuYl0OkqDQJwcvTIoejwIWeqj9kbvk25HBzP3
 F+hhvriF/lZhTannu6GsZd7+I7Xtigr9FnTHkJZKgrKVPrvjaeauQ+YRRt7QKmV4
 /Vtq/SVTIZZc4LJGzHWgvuCTi3UuO69JOQTP+Ylsfm0N+xGKKRUUYWReBb4yquB4
 f8Hs0n2BQDQMVUWnRuWJF2tMt1mAfVOXrCMzFH1n4XwT0BQybH3fR/h+S/HsWvlG
 SYE=
 =jbww
 -----END PGP SIGNATURE-----

Merge tag 'nvme-5.19-2022-07-07' of git://git.infradead.org/nvme into block-5.19

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 5.19

 - another bogus identifier quirk (Keith Busch)
 - use struct group in the tracer to avoid a gcc warning (Keith Busch)"

* tag 'nvme-5.19-2022-07-07' of git://git.infradead.org/nvme:
  nvme: use struct group for generic command dwords
  nvme-pci: phison e16 has bogus namespace ids
2022-07-07 17:38:19 -06:00