Commit Graph

68110 Commits

Author SHA1 Message Date
Duoming Zhou
fc6d01ff9e ax25: Fix NULL pointer dereferences in ax25 timers
The previous commit 7ec02f5ac8 ("ax25: fix NPD bug in ax25_disconnect")
move ax25_disconnect into lock_sock() in order to prevent NPD bugs. But
there are race conditions that may lead to null pointer dereferences in
ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(),
ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we use
ax25_kill_by_device() to detach the ax25 device.

One of the race conditions that cause null pointer dereferences can be
shown as below:

      (Thread 1)                    |      (Thread 2)
ax25_connect()                      |
 ax25_std_establish_data_link()     |
  ax25_start_t1timer()              |
   mod_timer(&ax25->t1timer,..)     |
                                    | ax25_kill_by_device()
   (wait a time)                    |  ...
                                    |  s->ax25_dev = NULL; //(1)
   ax25_t1timer_expiry()            |
    ax25->ax25_dev->values[..] //(2)|  ...
     ...                            |

We set null to ax25_cb->ax25_dev in position (1) and dereference
the null pointer in position (2).

The corresponding fail log is shown below:
===============================================================
BUG: kernel NULL pointer dereference, address: 0000000000000050
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc6-00794-g45690b7d0
RIP: 0010:ax25_t1timer_expiry+0x12/0x40
...
Call Trace:
 call_timer_fn+0x21/0x120
 __run_timers.part.0+0x1ca/0x250
 run_timer_softirq+0x2c/0x60
 __do_softirq+0xef/0x2f3
 irq_exit_rcu+0xb6/0x100
 sysvec_apic_timer_interrupt+0xa2/0xd0
...

This patch moves ax25_disconnect() before s->ax25_dev = NULL
and uses del_timer_sync() to delete timers in ax25_disconnect().
If ax25_disconnect() is called by ax25_kill_by_device() or
ax25->ax25_dev is NULL, the reason in ax25_disconnect() will be
equal to ENETUNREACH, it will wait all timers to stop before we
set null to s->ax25_dev in ax25_kill_by_device().

Fixes: 7ec02f5ac8 ("ax25: fix NPD bug in ax25_disconnect")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-21 10:56:19 +00:00
Duoming Zhou
9fd75b66b8 ax25: Fix refcount leaks caused by ax25_cb_del()
The previous commit d01ffb9eee ("ax25: add refcount in ax25_dev to
avoid UAF bugs") and commit feef318c85 ("ax25: fix UAF bugs of
net_device caused by rebinding operation") increase the refcounts of
ax25_dev and net_device in ax25_bind() and decrease the matching refcounts
in ax25_kill_by_device() in order to prevent UAF bugs, but there are
reference count leaks.

The root cause of refcount leaks is shown below:

     (Thread 1)                      |      (Thread 2)
ax25_bind()                          |
 ...                                 |
 ax25_addr_ax25dev()                 |
  ax25_dev_hold()   //(1)            |
  ...                                |
 dev_hold_track()   //(2)            |
 ...                                 | ax25_destroy_socket()
                                     |  ax25_cb_del()
                                     |   ...
                                     |   hlist_del_init() //(3)
                                     |
                                     |
     (Thread 3)                      |
ax25_kill_by_device()                |
 ...                                 |
 ax25_for_each(s, &ax25_list) {      |
  if (s->ax25_dev == ax25_dev) //(4) |
   ...                               |

Firstly, we use ax25_bind() to increase the refcount of ax25_dev in
position (1) and increase the refcount of net_device in position (2).
Then, we use ax25_cb_del() invoked by ax25_destroy_socket() to delete
ax25_cb in hlist in position (3) before calling ax25_kill_by_device().
Finally, the decrements of refcounts in ax25_kill_by_device() will not
be executed, because no s->ax25_dev equals to ax25_dev in position (4).

This patch adds decrements of refcounts in ax25_release() and use
lock_sock() to do synchronization. If refcounts decrease in ax25_release(),
the decrements of refcounts in ax25_kill_by_device() will not be
executed and vice versa.

Fixes: d01ffb9eee ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Fixes: 87563a043c ("ax25: fix reference count leaks of ax25_dev")
Fixes: feef318c85 ("ax25: fix UAF bugs of net_device caused by rebinding operation")
Reported-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-21 10:56:19 +00:00
Petr Machata
0caf6d9922 af_netlink: Fix shift out of bounds in group mask calculation
When a netlink message is received, netlink_recvmsg() fills in the address
of the sender. One of the fields is the 32-bit bitfield nl_groups, which
carries the multicast group on which the message was received. The least
significant bit corresponds to group 1, and therefore the highest group
that the field can represent is 32. Above that, the UB sanitizer flags the
out-of-bounds shift attempts.

Which bits end up being set in such case is implementation defined, but
it's either going to be a wrong non-zero value, or zero, which is at least
not misleading. Make the latter choice deterministic by always setting to 0
for higher-numbered multicast groups.

To get information about membership in groups >= 32, userspace is expected
to use nl_pktinfo control messages[0], which are enabled by NETLINK_PKTINFO
socket option.
[0] https://lwn.net/Articles/147608/

The way to trigger this issue is e.g. through monitoring the BRVLAN group:

	# bridge monitor vlan &
	# ip link add name br type bridge

Which produces the following citation:

	UBSAN: shift-out-of-bounds in net/netlink/af_netlink.c:162:19
	shift exponent 32 is too large for 32-bit type 'int'

Fixes: f7fa9b10ed ("[NETLINK]: Support dynamic number of multicast groups per netlink family")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/2bef6aabf201d1fc16cca139a744700cff9dcb04.1647527635.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18 21:47:14 -07:00
Yonglong Li
3ef3905aa3 mptcp: Fix crash due to tcp_tsorted_anchor was initialized before release skb
Got crash when doing pressure test of mptcp:

===========================================================================
dst_release: dst:ffffa06ce6e5c058 refcnt:-1
kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle kernel paging request at ffffa06ce6e5c058
PGD 190a01067 P4D 190a01067 PUD 43fffb067 PMD 22e403063 PTE 8000000226e5c063
Oops: 0011 [#1] SMP PTI
CPU: 7 PID: 7823 Comm: kworker/7:0 Kdump: loaded Tainted: G            E
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.2.1 04/01/2014
Call Trace:
 ? skb_release_head_state+0x68/0x100
 ? skb_release_all+0xe/0x30
 ? kfree_skb+0x32/0xa0
 ? mptcp_sendmsg_frag+0x57e/0x750
 ? __mptcp_retrans+0x21b/0x3c0
 ? __switch_to_asm+0x35/0x70
 ? mptcp_worker+0x25e/0x320
 ? process_one_work+0x1a7/0x360
 ? worker_thread+0x30/0x390
 ? create_worker+0x1a0/0x1a0
 ? kthread+0x112/0x130
 ? kthread_flush_work_fn+0x10/0x10
 ? ret_from_fork+0x35/0x40
===========================================================================

In __mptcp_alloc_tx_skb skb was allocated and skb->tcp_tsorted_anchor will
be initialized, in under memory pressure situation sk_wmem_schedule will
return false and then kfree_skb. In this case skb->_skb_refdst is not null
because_skb_refdst and tcp_tsorted_anchor are stored in the same mem, and
kfree_skb will try to release dst and cause crash.

Fixes: f70cad1085 ("mptcp: stop relying on tcp_tx_skb_cache")
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20220317220953.426024-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18 14:13:48 -07:00
Guillaume Nault
544b4dd568 ipv4: Fix route lookups when handling ICMP redirects and PMTU updates
The PMTU update and ICMP redirect helper functions initialise their fl4
variable with either __build_flow_key() or build_sk_flow_key(). These
initialisation functions always set ->flowi4_scope with
RT_SCOPE_UNIVERSE and might set the ECN bits of ->flowi4_tos. This is
not a problem when the route lookup is later done via
ip_route_output_key_hash(), which properly clears the ECN bits from
->flowi4_tos and initialises ->flowi4_scope based on the RTO_ONLINK
flag. However, some helpers call fib_lookup() directly, without
sanitising the tos and scope fields, so the route lookup can fail and,
as a result, the ICMP redirect or PMTU update aren't taken into
account.

Fix this by extracting the ->flowi4_tos and ->flowi4_scope sanitisation
code into ip_rt_fix_tos(), then use this function in handlers that call
fib_lookup() directly.

Note 1: We can't sanitise ->flowi4_tos and ->flowi4_scope in a central
place (like __build_flow_key() or flowi4_init_output()), because
ip_route_output_key_hash() expects non-sanitised values. When called
with sanitised values, it can erroneously overwrite RT_SCOPE_LINK with
RT_SCOPE_UNIVERSE in ->flowi4_scope. Therefore we have to be careful to
sanitise the values only for those paths that don't call
ip_route_output_key_hash().

Note 2: The problem is mostly about sanitising ->flowi4_tos. Having
->flowi4_scope initialised with RT_SCOPE_UNIVERSE instead of
RT_SCOPE_LINK probably wasn't really a problem: sockets with the
SOCK_LOCALROUTE flag set (those that'd result in RTO_ONLINK being set)
normally shouldn't receive ICMP redirects or PMTU updates.

Fixes: 4895c771c7 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18 14:06:45 -07:00
Jakub Kicinski
6bd0c76bd7 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2022-03-18

We've added 2 non-merge commits during the last 18 day(s) which contain
a total of 2 files changed, 50 insertions(+), 20 deletions(-).

The main changes are:

1) Fix a race in XSK socket teardown code that can lead to a NULL pointer
   dereference, from Magnus.

2) Small MAINTAINERS doc update to remove Lorenz from sockmap, from Lorenz.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  xsk: Fix race at socket teardown
  bpf: Remove Lorenz Bauer from L7 BPF maintainers
====================

Link: https://lore.kernel.org/r/20220318152418.28638-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18 10:05:17 -07:00
Kuniyuki Iwashima
d9a232d435 af_unix: Support POLLPRI for OOB.
The commit 314001f0bf ("af_unix: Add OOB support") introduced OOB for
AF_UNIX, but it lacks some changes for POLLPRI.  Let's add the missing
piece.

In the selftest, normal datagrams are sent followed by OOB data, so this
commit replaces `POLLIN | POLLPRI` with just `POLLPRI` in the first test
case.

Fixes: 314001f0bf ("af_unix: Add OOB support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-18 13:30:52 +00:00
Kuniyuki Iwashima
e82025c623 af_unix: Fix some data-races around unix_sk(sk)->oob_skb.
Out-of-band data automatically places a "mark" showing wherein the
sequence the out-of-band data would have been.  If the out-of-band data
implies cancelling everything sent so far, the "mark" is helpful to flush
them.  When the socket's read pointer reaches the "mark", the ioctl() below
sets a non zero value to the arg `atmark`:

The out-of-band data is queued in sk->sk_receive_queue as well as ordinary
data and also saved in unix_sk(sk)->oob_skb.  It can be used to test if the
head of the receive queue is the out-of-band data meaning the socket is at
the "mark".

While testing that, unix_ioctl() reads unix_sk(sk)->oob_skb locklessly.
Thus, all accesses to oob_skb need some basic protection to avoid
load/store tearing which KCSAN detects when these are called concurrently:

  - ioctl(fd_a, SIOCATMARK, &atmark, sizeof(atmark))
  - send(fd_b_connected_to_a, buf, sizeof(buf), MSG_OOB)

BUG: KCSAN: data-race in unix_ioctl / unix_stream_sendmsg

write to 0xffff888003d9cff0 of 8 bytes by task 175 on cpu 1:
 unix_stream_sendmsg (net/unix/af_unix.c:2087 net/unix/af_unix.c:2191)
 sock_sendmsg (net/socket.c:705 net/socket.c:725)
 __sys_sendto (net/socket.c:2040)
 __x64_sys_sendto (net/socket.c:2048)
 do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)

read to 0xffff888003d9cff0 of 8 bytes by task 176 on cpu 0:
 unix_ioctl (net/unix/af_unix.c:3101 (discriminator 1))
 sock_do_ioctl (net/socket.c:1128)
 sock_ioctl (net/socket.c:1242)
 __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:874 fs/ioctl.c:860 fs/ioctl.c:860)
 do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)

value changed: 0xffff888003da0c00 -> 0xffff888003da0d00

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 176 Comm: unix_race_oob_i Not tainted 5.17.0-rc5-59529-g83dc4c2af682 #12
Hardware name: Red Hat KVM, BIOS 1.11.0-2.amzn2 04/01/2014

Fixes: 314001f0bf ("af_unix: Add OOB support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-18 13:30:52 +00:00
David S. Miller
4fa331b45d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix PPPoE and QinQ with flowtable inet family.

2) Missing register validation in nf_tables.

3) Initialize registers to avoid stack memleak to userspace.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-18 10:33:47 +00:00
Pablo Neira Ayuso
4c905f6740 netfilter: nf_tables: initialize registers in nft_do_chain()
Initialize registers to avoid stack leak into userspace.

Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-17 15:50:27 +01:00
Pablo Neira Ayuso
6e1acfa387 netfilter: nf_tables: validate registers coming from userspace.
Bail out in case userspace uses unsupported registers.

Fixes: 49499c3e6e ("netfilter: nf_tables: switch registers to 32 bit addressing")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-17 14:06:14 +01:00
Miaoqian Lin
cb0b430b4e net: dsa: Add missing of_node_put() in dsa_port_parse_of
The device_node pointer is returned by of_parse_phandle()  with refcount
incremented. We should use of_node_put() on it when done.

Fixes: 6d4e5c570c ("net: dsa: get port type at parse time")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20220316082602.10785-1-linmq006@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-17 13:13:27 +01:00
Jakub Kicinski
186abea8a8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2022-03-16

1) Fix a kernel-info-leak in pfkey.
   From Haimin Zhang.

2) Fix an incorrect check of the return value of ipv6_skip_exthdr.
   From Sabrina Dubroca.

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  esp6: fix check on ipv6_skip_exthdr's return value
  af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
====================

Link: https://lore.kernel.org/r/20220316121142.3142336-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-16 11:39:37 -07:00
Pablo Neira Ayuso
0492d85763 netfilter: flowtable: Fix QinQ and pppoe support for inet table
nf_flow_offload_inet_hook() does not check for 802.1q and PPPoE.
Fetch inner ethertype from these encapsulation protocols.

Fixes: 72efd585f7 ("netfilter: flowtable: add pppoe support")
Fixes: 4cd91f7c29 ("netfilter: flowtable: add vlan support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-16 11:25:04 +01:00
Eric Dumazet
c700525fcc net/packet: fix slab-out-of-bounds access in packet_recvmsg()
syzbot found that when an AF_PACKET socket is using PACKET_COPY_THRESH
and mmap operations, tpacket_rcv() is queueing skbs with
garbage in skb->cb[], triggering a too big copy [1]

Presumably, users of af_packet using mmap() already gets correct
metadata from the mapped buffer, we can simply make sure
to clear 12 bytes that might be copied to user space later.

BUG: KASAN: stack-out-of-bounds in memcpy include/linux/fortify-string.h:225 [inline]
BUG: KASAN: stack-out-of-bounds in packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489
Write of size 165 at addr ffffc9000385fb78 by task syz-executor233/3631

CPU: 0 PID: 3631 Comm: syz-executor233 Not tainted 5.17.0-rc7-syzkaller-02396-g0b3660695e80 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_address_description.constprop.0.cold+0xf/0x336 mm/kasan/report.c:255
 __kasan_report mm/kasan/report.c:442 [inline]
 kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
 check_region_inline mm/kasan/generic.c:183 [inline]
 kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
 memcpy+0x39/0x60 mm/kasan/shadow.c:66
 memcpy include/linux/fortify-string.h:225 [inline]
 packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489
 sock_recvmsg_nosec net/socket.c:948 [inline]
 sock_recvmsg net/socket.c:966 [inline]
 sock_recvmsg net/socket.c:962 [inline]
 ____sys_recvmsg+0x2c4/0x600 net/socket.c:2632
 ___sys_recvmsg+0x127/0x200 net/socket.c:2674
 __sys_recvmsg+0xe2/0x1a0 net/socket.c:2704
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fdfd5954c29
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcf8e71e48 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fdfd5954c29
RDX: 0000000000000000 RSI: 0000000020000500 RDI: 0000000000000005
RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffcf8e71e60
R13: 00000000000f4240 R14: 000000000000c1ff R15: 00007ffcf8e71e54
 </TASK>

addr ffffc9000385fb78 is located in stack of task syz-executor233/3631 at offset 32 in frame:
 ____sys_recvmsg+0x0/0x600 include/linux/uio.h:246

this frame has 1 object:
 [32, 160) 'addr'

Memory state around the buggy address:
 ffffc9000385fa80: 00 04 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00
 ffffc9000385fb00: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00
>ffffc9000385fb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f3
                                                                ^
 ffffc9000385fc00: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 f1
 ffffc9000385fc80: f1 f1 f1 00 f2 f2 f2 00 f2 f2 f2 00 00 00 00 00
==================================================================

Fixes: 0fb375fb9b ("[AF_PACKET]: Allow for > 8 byte hardware addresses.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220312232958.3535620-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-14 22:08:34 -07:00
Jakub Kicinski
15d703921f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net coming late
in the 5.17-rc process:

1) Revert port remap to mitigate shadowing service ports, this is causing
   problems in existing setups and this mitigation can be achieved with
   explicit ruleset, eg.

	... tcp sport < 16386 tcp dport >= 32768 masquerade random

  This patches provided a built-in policy similar to the one described above.

2) Disable register tracking infrastructure in nf_tables. Florian reported
   two issues:

   - Existing expressions with no implemented .reduce interface
     that causes data-store on register should cancel the tracking.
   - Register clobbering might be possible storing data on registers that
     are larger than 32-bits.

   This might lead to generating incorrect ruleset bytecode. These two
   issues are scheduled to be addressed in the next release cycle.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: disable register tracking
  Revert "netfilter: conntrack: tag conntracks picked up in local out hook"
  Revert "netfilter: nat: force port remap to prevent shadowing well-known ports"
====================

Link: https://lore.kernel.org/r/20220312220315.64531-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-14 15:51:10 -07:00
Sabrina Dubroca
4db4075f92 esp6: fix check on ipv6_skip_exthdr's return value
Commit 5f9c55c806 ("ipv6: check return value of ipv6_skip_exthdr")
introduced an incorrect check, which leads to all ESP packets over
either TCPv6 or UDPv6 encapsulation being dropped. In this particular
case, offset is negative, since skb->data points to the ESP header in
the following chain of headers, while skb->network_header points to
the IPv6 header:

    IPv6 | ext | ... | ext | UDP | ESP | ...

That doesn't seem to be a problem, especially considering that if we
reach esp6_input_done2, we're guaranteed to have a full set of headers
available (otherwise the packet would have been dropped earlier in the
stack). However, it means that the return value will (intentionally)
be negative. We can make the test more specific, as the expected
return value of ipv6_skip_exthdr will be the (negated) size of either
a UDP header, or a TCP header with possible options.

In the future, we should probably either make ipv6_skip_exthdr
explicitly accept negative offsets (and adjust its return value for
error cases), or make ipv6_skip_exthdr only take non-negative
offsets (and audit all callers).

Fixes: 5f9c55c806 ("ipv6: check return value of ipv6_skip_exthdr")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-14 11:42:27 +01:00
Pablo Neira Ayuso
ed5f85d422 netfilter: nf_tables: disable register tracking
The register tracking infrastructure is incomplete, it might lead to
generating incorrect ruleset bytecode, disable it by now given we are
late in the release process.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-12 16:07:38 +01:00
Jiyong Park
8e6ed96376 vsock: each transport cycles only on its own sockets
When iterating over sockets using vsock_for_each_connected_socket, make
sure that a transport filters out sockets that don't belong to the
transport.

There actually was an issue caused by this; in a nested VM
configuration, destroying the nested VM (which often involves the
closing of /dev/vhost-vsock if there was h2g connections to the nested
VM) kills not only the h2g connections, but also all existing g2h
connections to the (outmost) host which are totally unrelated.

Tested: Executed the following steps on Cuttlefish (Android running on a
VM) [1]: (1) Enter into an `adb shell` session - to have a g2h
connection inside the VM, (2) open and then close /dev/vhost-vsock by
`exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb
session is not reset.

[1] https://android.googlesource.com/device/google/cuttlefish/

Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jiyong Park <jiyong@google.com>
Link: https://lore.kernel.org/r/20220311020017.1509316-1-jiyong@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11 23:14:19 -08:00
Tadeusz Struk
5e34af4142 net: ipv6: fix skb_over_panic in __ip6_append_data
Syzbot found a kernel bug in the ipv6 stack:
LINK: https://syzkaller.appspot.com/bug?id=205d6f11d72329ab8d62a610c44c5e7e25415580
The reproducer triggers it by sending a crafted message via sendmmsg()
call, which triggers skb_over_panic, and crashes the kernel:

skbuff: skb_over_panic: text:ffffffff84647fb4 len:65575 put:65575
head:ffff888109ff0000 data:ffff888109ff0088 tail:0x100af end:0xfec0
dev:<NULL>

Update the check that prevents an invalid packet with MTU equal
to the fregment header size to eat up all the space for payload.

The reproducer can be found here:
LINK: https://syzkaller.appspot.com/text?tag=ReproC&x=1648c83fb00000

Reported-by: syzbot+e223cf47ec8ae183f2a0@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20220310232538.1044947-1-tadeusz.struk@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11 17:11:38 -08:00
Linus Torvalds
186d32bbf0 Networking fixes for 5.17-rc8/final, including fixes from bluetooth,
and ipsec.
 
 Current release - regressions:
 
  - Bluetooth: fix unbalanced unlock in set_device_flags()
 
  - Bluetooth: fix not processing all entries on cmd_sync_work,
    make connect with qualcomm and intel adapters reliable
 
  - Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0"
 
  - xdp: xdp_mem_allocator can be NULL in trace_mem_connect()
 
  - eth: ice: fix race condition and deadlock during interface enslave
 
 Current release - new code bugs:
 
  - tipc: fix incorrect order of state message data sanity check
 
 Previous releases - regressions:
 
  - esp: fix possible buffer overflow in ESP transformation
 
  - dsa: unlock the rtnl_mutex when dsa_master_setup() fails
 
  - phy: meson-gxl: fix interrupt handling in forced mode
 
  - smsc95xx: ignore -ENODEV errors when device is unplugged
 
 Previous releases - always broken:
 
  - xfrm: fix tunnel mode fragmentation behavior
 
  - esp: fix inter address family tunneling on GSO
 
  - tipc: fix null-deref due to race when enabling bearer
 
  - sctp: fix kernel-infoleak for SCTP sockets
 
  - eth: macb: fix lost RX packet wakeup race in NAPI receive
 
  - eth: intel stop disabling VFs due to PF error responses
 
  - eth: bcmgenet: don't claim WOL when its not available
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmIqlOsACgkQMUZtbf5S
 IrtKJBAAjZpYBwwHty6JR7AahLF4LNO+o1KmraqFV7YByS5NRfBRpXV7asvpxJNF
 9iJhOWtLMsz/mVq0OXdx/+NpDh9JIHrQzb3GiskeKzBdhHmW4HjuYug1gytqRDMx
 uZOiQEuJSREu0tCsfcVWTF8wm4OgmPWtyZNZq2kwXsHiKoptB9KFK9pcvD6Utxrg
 jTpYBS5I9cX0Sj+gG9fZFNeyaxgmKkC5cM4cSLcheGSKHvEbX6MIXfi2Wb1VRBzE
 Qk/1JbkQf4gQ1BAu9kt8+jgWqW7vSnDn2iYUVw7RSSlj5xIM4f4m71nS9XzejJLb
 ADry24arlmknMS9Rhpy7n3ogNn/5MtlsZt01z/AAyZDRc1rrsWDqOJugtDRSnSEh
 yAhAsl/vqOuoovA86IRBTji8JlyfNZXt33K7+1KKDsj1wzSpcB9AKTDps8Ncu9uL
 elyaU2v4bTdhdqkQnxpcsLlLcV3FzLaWUVLpcla3XVLvzjEnoY+mhR5boW735uj7
 f8Ig9Aj4UceJ+sQtXywciknE1+s48/pWqs8b8Y5DXX1P168A1ud5voy4Po6RvqQG
 B17WvAaq/7DsMKcuofeykFHCKlwO36xdt6l0ExaQuzmV+NgoEBWAmgwsyl9ktFpT
 I09D2RMPfTqYgdNvYkKGBrMKV87weVvHpMIeJiG1YeiBB3e1Xw8=
 =WfAR
 -----END PGP SIGNATURE-----

Merge tag 'net-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, and ipsec.

  Current release - regressions:

   - Bluetooth: fix unbalanced unlock in set_device_flags()

   - Bluetooth: fix not processing all entries on cmd_sync_work, make
     connect with qualcomm and intel adapters reliable

   - Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0"

   - xdp: xdp_mem_allocator can be NULL in trace_mem_connect()

   - eth: ice: fix race condition and deadlock during interface enslave

  Current release - new code bugs:

   - tipc: fix incorrect order of state message data sanity check

  Previous releases - regressions:

   - esp: fix possible buffer overflow in ESP transformation

   - dsa: unlock the rtnl_mutex when dsa_master_setup() fails

   - phy: meson-gxl: fix interrupt handling in forced mode

   - smsc95xx: ignore -ENODEV errors when device is unplugged

  Previous releases - always broken:

   - xfrm: fix tunnel mode fragmentation behavior

   - esp: fix inter address family tunneling on GSO

   - tipc: fix null-deref due to race when enabling bearer

   - sctp: fix kernel-infoleak for SCTP sockets

   - eth: macb: fix lost RX packet wakeup race in NAPI receive

   - eth: intel stop disabling VFs due to PF error responses

   - eth: bcmgenet: don't claim WOL when its not available"

* tag 'net-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  xdp: xdp_mem_allocator can be NULL in trace_mem_connect().
  ice: Fix race condition during interface enslave
  net: phy: meson-gxl: improve link-up behavior
  net: bcmgenet: Don't claim WOL when its not available
  net: arc_emac: Fix use after free in arc_mdio_probe()
  sctp: fix kernel-infoleak for SCTP sockets
  net: phy: correct spelling error of media in documentation
  net: phy: DP83822: clear MISR2 register to disable interrupts
  gianfar: ethtool: Fix refcount leak in gfar_get_ts_info
  selftests: pmtu.sh: Kill nettest processes launched in subshell.
  selftests: pmtu.sh: Kill tcpdump processes launched by subshell.
  NFC: port100: fix use-after-free in port100_send_complete
  net/mlx5e: SHAMPO, reduce TIR indication
  net/mlx5e: Lag, Only handle events from highest priority multipath entry
  net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE
  net/mlx5: Fix a race on command flush flow
  net/mlx5: Fix size field in bufferx_reg struct
  ax25: Fix NULL pointer dereference in ax25_kill_by_device
  net: marvell: prestera: Add missing of_node_put() in prestera_switch_set_base_mac_addr
  net: ethernet: lpc_eth: Handle error for clk_enable
  ...
2022-03-10 16:47:58 -08:00
Sebastian Andrzej Siewior
e0ae713023 xdp: xdp_mem_allocator can be NULL in trace_mem_connect().
Since the commit mentioned below __xdp_reg_mem_model() can return a NULL
pointer. This pointer is dereferenced in trace_mem_connect() which leads
to segfault.

The trace points (mem_connect + mem_disconnect) were put in place to
pair connect/disconnect using the IDs. The ID is only assigned if
__xdp_reg_mem_model() does not return NULL. That connect trace point is
of no use if there is no ID.

Skip that connect trace point if xdp_alloc is NULL.

[ Toke Høiland-Jørgensen delivered the reasoning for skipping the trace
  point ]

Fixes: 4a48ef70b9 ("xdp: Allow registering memory model without rxq reference")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/YikmmXsffE+QajTB@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10 16:09:29 -08:00
Eric Dumazet
633593a808 sctp: fix kernel-infoleak for SCTP sockets
syzbot reported a kernel infoleak [1] of 4 bytes.

After analysis, it turned out r->idiag_expires is not initialized
if inet_sctp_diag_fill() calls inet_diag_msg_common_fill()

Make sure to clear idiag_timer/idiag_retrans/idiag_expires
and let inet_diag_msg_sctpasoc_fill() fill them again if needed.

[1]

BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline]
BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:154 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x6ef/0x25a0 lib/iov_iter.c:668
 instrument_copy_to_user include/linux/instrumented.h:121 [inline]
 copyout lib/iov_iter.c:154 [inline]
 _copy_to_iter+0x6ef/0x25a0 lib/iov_iter.c:668
 copy_to_iter include/linux/uio.h:162 [inline]
 simple_copy_to_iter+0xf3/0x140 net/core/datagram.c:519
 __skb_datagram_iter+0x2d5/0x11b0 net/core/datagram.c:425
 skb_copy_datagram_iter+0xdc/0x270 net/core/datagram.c:533
 skb_copy_datagram_msg include/linux/skbuff.h:3696 [inline]
 netlink_recvmsg+0x669/0x1c80 net/netlink/af_netlink.c:1977
 sock_recvmsg_nosec net/socket.c:948 [inline]
 sock_recvmsg net/socket.c:966 [inline]
 __sys_recvfrom+0x795/0xa10 net/socket.c:2097
 __do_sys_recvfrom net/socket.c:2115 [inline]
 __se_sys_recvfrom net/socket.c:2111 [inline]
 __x64_sys_recvfrom+0x19d/0x210 net/socket.c:2111
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Uninit was created at:
 slab_post_alloc_hook mm/slab.h:737 [inline]
 slab_alloc_node mm/slub.c:3247 [inline]
 __kmalloc_node_track_caller+0xe0c/0x1510 mm/slub.c:4975
 kmalloc_reserve net/core/skbuff.c:354 [inline]
 __alloc_skb+0x545/0xf90 net/core/skbuff.c:426
 alloc_skb include/linux/skbuff.h:1158 [inline]
 netlink_dump+0x3e5/0x16c0 net/netlink/af_netlink.c:2248
 __netlink_dump_start+0xcf8/0xe90 net/netlink/af_netlink.c:2373
 netlink_dump_start include/linux/netlink.h:254 [inline]
 inet_diag_handler_cmd+0x2e7/0x400 net/ipv4/inet_diag.c:1341
 sock_diag_rcv_msg+0x24a/0x620
 netlink_rcv_skb+0x40c/0x7e0 net/netlink/af_netlink.c:2494
 sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:277
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x1093/0x1360 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x14d9/0x1720 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 sock_write_iter+0x594/0x690 net/socket.c:1061
 do_iter_readv_writev+0xa7f/0xc70
 do_iter_write+0x52c/0x1500 fs/read_write.c:851
 vfs_writev fs/read_write.c:924 [inline]
 do_writev+0x645/0xe00 fs/read_write.c:967
 __do_sys_writev fs/read_write.c:1040 [inline]
 __se_sys_writev fs/read_write.c:1037 [inline]
 __x64_sys_writev+0xe5/0x120 fs/read_write.c:1037
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Bytes 68-71 of 2508 are uninitialized
Memory access of size 2508 starts at ffff888114f9b000
Data copied to user address 00007f7fe09ff2e0

CPU: 1 PID: 3478 Comm: syz-executor306 Not tainted 5.17.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 8f840e47f1 ("sctp: add the sctp_diag.c file")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20220310001145.297371-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10 14:46:42 -08:00
Haimin Zhang
9a564bccb7 af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
Add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
to initialize the buffer of supp_skb to fix a kernel-info-leak issue.
1) Function pfkey_register calls compose_sadb_supported to request
a sk_buff. 2) compose_sadb_supported calls alloc_sbk to allocate
a sk_buff, but it doesn't zero it. 3) If auth_len is greater 0, then
compose_sadb_supported treats the memory as a struct sadb_supported and
begins to initialize. But it just initializes the field sadb_supported_len
and field sadb_supported_exttype without field sadb_supported_reserved.

Reported-by: TCS Robot <tcs_robot@tencent.com>
Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-10 07:39:47 +01:00
David S. Miller
cc7e2f596e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2022-03-09

1) Fix IPv6 PMTU discovery for xfrm interfaces.
   From Lina Wang.

2) Revert failing for policies and states that are
   configured with XFRMA_IF_ID 0. It broke a
   user configuration. From Kai Lueke.

3) Fix a possible buffer overflow in the ESP output path.

4) Fix ESP GSO for tunnel and BEET mode on inter address
   family tunnels.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 14:48:11 +00:00
Duoming Zhou
71171ac8eb ax25: Fix NULL pointer dereference in ax25_kill_by_device
When two ax25 devices attempted to establish connection, the requester use ax25_create(),
ax25_bind() and ax25_connect() to initiate connection. The receiver use ax25_rcv() to
accept connection and use ax25_create_cb() in ax25_rcv() to create ax25_cb, but the
ax25_cb->sk is NULL. When the receiver is detaching, a NULL pointer dereference bug
caused by sock_hold(sk) in ax25_kill_by_device() will happen. The corresponding
fail log is shown below:

===============================================================
BUG: KASAN: null-ptr-deref in ax25_device_event+0xfd/0x290
Call Trace:
...
ax25_device_event+0xfd/0x290
raw_notifier_call_chain+0x5e/0x70
dev_close_many+0x174/0x220
unregister_netdevice_many+0x1f7/0xa60
unregister_netdevice_queue+0x12f/0x170
unregister_netdev+0x13/0x20
mkiss_close+0xcd/0x140
tty_ldisc_release+0xc0/0x220
tty_release_struct+0x17/0xa0
tty_release+0x62d/0x670
...

This patch add condition check in ax25_kill_by_device(). If s->sk is
NULL, it will goto if branch to kill device.

Fixes: 4e0f718daf ("ax25: improve the incomplete fix to avoid UAF and NPD bugs")
Reported-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09 12:45:02 +00:00
Tung Nguyen
c79fcc27be tipc: fix incorrect order of state message data sanity check
When receiving a state message, function tipc_link_validate_msg()
is called to validate its header portion. Then, its data portion
is validated before it can be accessed correctly. However, current
data sanity  check is done after the message header is accessed to
update some link variables.

This commit fixes this issue by moving the data sanity check to
the beginning of state message handling and right after the header
sanity check.

Fixes: 9aa422ad32 ("tipc: improve size validations for received domain records")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20220308021200.9245-1-tung.q.nguyen@dektech.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-08 22:18:42 -08:00
Florian Westphal
ee0a4dc9f3 Revert "netfilter: conntrack: tag conntracks picked up in local out hook"
This was a prerequisite for the ill-fated
"netfilter: nat: force port remap to prevent shadowing well-known ports".

As this has been reverted, this change can be backed out too.

Signed-off-by: Florian Westphal <fw@strlen.de>
2022-03-08 17:28:38 +01:00
Florian Westphal
a82c25c366 Revert "netfilter: nat: force port remap to prevent shadowing well-known ports"
This reverts commit 878aed8db3.

This change breaks existing setups where conntrack is used with
asymmetric paths.

In these cases, the NAT transformation occurs on the syn-ack instead of
the syn:

1. SYN    x:12345 -> y -> 443 // sent by initiator, receiverd by responder
2. SYNACK y:443 -> x:12345 // First packet seen by conntrack, as sent by responder
3. tuple_force_port_remap() gets called, sees:
  'tcp from 443 to port 12345 NAT' -> pick a new source port, inititor receives
4. SYNACK y:$RANDOM -> x:12345   // connection is never established

While its possible to avoid the breakage with NOTRACK rules, a kernel
update should not break working setups.

An alternative to the revert is to augment conntrack to tag
mid-stream connections plus more code in the nat core to skip NAT
for such connections, however, this leads to more interaction/integration
between conntrack and NAT.

Therefore, revert, users will need to add explicit nat rules to avoid
port shadowing.

Link: https://lore.kernel.org/netfilter-devel/20220302105908.GA5852@breakpoint.cc/#R
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2051413
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-03-08 13:52:11 +01:00
Steffen Klassert
23c7f8d798 net: Fix esp GSO on inter address family tunnels.
The esp tunnel GSO handlers use skb_mac_gso_segment to
push the inner packet to the segmentation handlers.
However, skb_mac_gso_segment takes the Ethernet Protocol
ID from 'skb->protocol' which is wrong for inter address
family tunnels. We fix this by introducing a new
skb_eth_gso_segment function.

This function can be used if it is necessary to pass the
Ethernet Protocol ID directly to the segmentation handler.
First users of this function will be the esp4 and esp6
tunnel segmentation handlers.

Fixes: c35fe4106b ("xfrm: Add mode handlers for IPsec on layer 2")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-07 13:14:04 +01:00
Steffen Klassert
053c8fdf2c esp: Fix BEET mode inter address family tunneling on GSO
The xfrm{4,6}_beet_gso_segment() functions did not correctly set the
SKB_GSO_IPXIP4 and SKB_GSO_IPXIP6 gso types for the address family
tunneling case. Fix this by setting these gso types.

Fixes: 384a46ea7b ("esp4: add gso_segment for esp4 beet mode")
Fixes: 7f9e40eb18 ("esp6: add gso_segment for esp6 beet mode")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-07 13:14:03 +01:00
Steffen Klassert
ebe48d368e esp: Fix possible buffer overflow in ESP transformation
The maximum message size that can be send is bigger than
the  maximum site that skb_page_frag_refill can allocate.
So it is possible to write beyond the allocated buffer.

Fix this by doing a fallback to COW in that case.

v2:

Avoid get get_order() costs as suggested by Linus Torvalds.

Fixes: cac2661c53 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a ("esp6: Avoid skb_cow_data whenever possible")
Reported-by: valis <sec@valis.email>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-07 13:14:03 +01:00
Juergen Gross
5cadd4bb1d xen/9p: use alloc/free_pages_exact()
Instead of __get_free_pages() and free_pages() use alloc_pages_exact()
and free_pages_exact(). This is in preparation of a change of
gnttab_end_foreign_access() which will prohibit use of high-order
pages.

By using the local variable "order" instead of ring->intf->ring_order
in the error path of xen_9pfs_front_alloc_dataring() another bug is
fixed, as the error path can be entered before ring->intf->ring_order
is being set.

By using alloc_pages_exact() the size in bytes is specified for the
allocation, which fixes another bug for the case of
order < (PAGE_SHIFT - XEN_PAGE_SHIFT).

This is part of CVE-2022-23041 / XSA-396.

Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V4:
- new patch
2022-03-07 09:48:55 +01:00
Vladimir Oltean
afb3cc1a39 net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails
After the blamed commit, dsa_tree_setup_master() may exit without
calling rtnl_unlock(), fix that.

Fixes: c146f9bc19 ("net: dsa: hold rtnl_mutex when calling dsa_master_{setup,teardown}")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-06 10:55:54 +00:00
Kai Lueke
a3d9001b4e Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0"
This reverts commit 68ac0f3810 because ID
0 was meant to be used for configuring the policy/state without
matching for a specific interface (e.g., Cilium is affected, see
https://github.com/cilium/cilium/pull/18789 and
https://github.com/cilium/cilium/pull/19019).

Signed-off-by: Kai Lueke <kailueke@linux.microsoft.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-06 08:38:28 +01:00
Tung Nguyen
be4977b847 tipc: fix kernel panic when enabling bearer
When enabling a bearer on a node, a kernel panic is observed:

[    4.498085] RIP: 0010:tipc_mon_prep+0x4e/0x130 [tipc]
...
[    4.520030] Call Trace:
[    4.520689]  <IRQ>
[    4.521236]  tipc_link_build_proto_msg+0x375/0x750 [tipc]
[    4.522654]  tipc_link_build_state_msg+0x48/0xc0 [tipc]
[    4.524034]  __tipc_node_link_up+0xd7/0x290 [tipc]
[    4.525292]  tipc_rcv+0x5da/0x730 [tipc]
[    4.526346]  ? __netif_receive_skb_core+0xb7/0xfc0
[    4.527601]  tipc_l2_rcv_msg+0x5e/0x90 [tipc]
[    4.528737]  __netif_receive_skb_list_core+0x20b/0x260
[    4.530068]  netif_receive_skb_list_internal+0x1bf/0x2e0
[    4.531450]  ? dev_gro_receive+0x4c2/0x680
[    4.532512]  napi_complete_done+0x6f/0x180
[    4.533570]  virtnet_poll+0x29c/0x42e [virtio_net]
...

The node in question is receiving activate messages in another
thread after changing bearer status to allow message sending/
receiving in current thread:

         thread 1           |              thread 2
         --------           |              --------
                            |
tipc_enable_bearer()        |
  test_and_set_bit_lock()   |
    tipc_bearer_xmit_skb()  |
                            | tipc_l2_rcv_msg()
                            |   tipc_rcv()
                            |     __tipc_node_link_up()
                            |       tipc_link_build_state_msg()
                            |         tipc_link_build_proto_msg()
                            |           tipc_mon_prep()
                            |           {
                            |             ...
                            |             // null-pointer dereference
                            |             u16 gen = mon->dom_gen;
                            |             ...
                            |           }
  // Not being executed yet |
  tipc_mon_create()         |
  {                         |
    ...                     |
    // allocate             |
    mon = kzalloc();        |
    ...                     |
  }                         |

Monitoring pointer in thread 2 is dereferenced before monitoring data
is allocated in thread 1. This causes kernel panic.

This commit fixes it by allocating the monitoring data before enabling
the bearer to receive messages.

Fixes: 35c55c9877 ("tipc: add neighbor monitoring framework")
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04 13:19:13 +00:00
Jakub Kicinski
9f3956d659 bluetooth pull request for net:
- Fix regression with processing of MGMT commands
  - Fix unbalanced unlock in Set Device Flags
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmIhLGwZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKdweD/wIGDx874gJQWGgrm0ASvzi
 1M3YLPu+CSfJ9wRnBS95FmnpBLKXbO6trZ1MNkxeuqS9yb7oyOXx992NrqOxwetm
 AdvA2YvQW7GB61d2QujxM6QkI1BQI2y+QIrDu+6K/YPsRQHhsLoPznlyo3mAimDx
 K7coCWq3qbQSDdjgMUT2/FGv+V6+qRVjUsp72OMs5XocXJ5ruiDKxHahdoBz8GhK
 fUKsx7v+rUjGt4iNygpYt985dOnAAPvcbP/BCk0LLwtCEB8TA4iXXrqAj1Ibzkfi
 Nyf7wtDWuKxTtOprLl1Ed8lGdx1iM+yf2adtNIekId9eWtkw/Ete2GRfHgIvIVIN
 uhIaVNDLOPB/gDzp2s0Rl8eYYUVYrjDBx37UhsUhNGksr6H49fKRBoBO19QBQ3Fk
 0jg+4Kodta5aXxYGPAiTsx1HJU9it0ascFgQBuvvMNKTZycCYTGwjbx4eemwD4z2
 N9gYAY0+S7GzNrm2vlxMq0DafTrcKTydGJX6RWtXRbczMLpcyZVnuBx20gC+qbU6
 /SROOkaSONz+jpYtff22aJ3zva+LpIOdc/XmVtLgHq1bp+S1UM99fAa2QP/om3Gl
 TXjv+NnM3ST370NDh3Zja07kW53NcKxbcRGr9j35r7q4+kFAsilAdsEDUYW/Bkvr
 dcVDhm94cErgfyhLdQ6kxg==
 =Awzt
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2022-03-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fix regression with processing of MGMT commands
 - Fix unbalanced unlock in Set Device Flags

* tag 'for-net-2022-03-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: hci_sync: Fix not processing all entries on cmd_sync_work
  Bluetooth: hci_core: Fix unbalanced unlock in set_device_flags()
====================

Link: https://lore.kernel.org/r/20220303210743.314679-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03 20:31:03 -08:00
Eric Dumazet
2d3916f318 ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
While investigating on why a synchronize_net() has been added recently
in ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report()
might drop skbs in some cases.

Discussion about removing synchronize_net() from ipv6_mc_down()
will happen in a different thread.

Fixes: f185de28d9 ("mld: add new workqueues for process mld events")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220303173728.937869-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03 09:47:06 -08:00
Vladimir Oltean
e1bec7fa1c net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change
The blamed commit said one thing but did another. It explains that we
should restore the "return err" to the original "goto out_unwind_tagger",
but instead it replaced it with "goto out_unlock".

When DSA_NOTIFIER_TAG_PROTO fails after the first switch of a
multi-switch tree, the switches would end up not using the same tagging
protocol.

Fixes: 0b0e2ff103 ("net: dsa: restore error path of dsa_tree_change_tag_proto")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220303154249.1854436-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03 08:39:12 -08:00
Vladimir Oltean
10b6bb62ae net: dcb: disable softirqs in dcbnl_flush_dev()
Ido Schimmel points out that since commit 52cff74eef ("dcbnl : Disable
software interrupts before taking dcb_lock"), the DCB API can be called
by drivers from softirq context.

One such in-tree example is the chelsio cxgb4 driver:
dcb_rpl
-> cxgb4_dcb_handle_fw_update
   -> dcb_ieee_setapp

If the firmware for this driver happened to send an event which resulted
in a call to dcb_ieee_setapp() at the exact same time as another
DCB-enabled interface was unregistering on the same CPU, the softirq
would deadlock, because the interrupted process was already holding the
dcb_lock in dcbnl_flush_dev().

Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as
in the rest of the dcbnl code.

Fixes: 91b0383fef ("net: dcb: flush lingering app table entries for unregistered devices")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220302193939.1368823-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03 08:01:55 -08:00
Luiz Augusto von Dentz
008ee9eb8a Bluetooth: hci_sync: Fix not processing all entries on cmd_sync_work
hci_cmd_sync_queue can be called multiple times, each adding a
hci_cmd_sync_work_entry, before hci_cmd_sync_work is run so this makes
sure they are all dequeued properly otherwise it creates a backlog of
entries that are never run.

Link: https://lore.kernel.org/all/CAJCQCtSeUtHCgsHXLGrSTWKmyjaQDbDNpP4rb0i+RE+L2FTXSA@mail.gmail.com/T/
Fixes: 6a98e3836f ("Bluetooth: Add helper for serialized HCI command execution")
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-03-03 13:30:03 +01:00
Hans de Goede
815d512192 Bluetooth: hci_core: Fix unbalanced unlock in set_device_flags()
There is only one "goto done;" in set_device_flags() and this happens
*before* hci_dev_lock() is called, move the done label to after the
hci_dev_unlock() to fix the following unlock balance:

[   31.493567] =====================================
[   31.493571] WARNING: bad unlock balance detected!
[   31.493576] 5.17.0-rc2+ #13 Tainted: G         C  E
[   31.493581] -------------------------------------
[   31.493584] bluetoothd/685 is trying to release lock (&hdev->lock) at:
[   31.493594] [<ffffffffc07603f5>] set_device_flags+0x65/0x1f0 [bluetooth]
[   31.493684] but there are no more locks to release!

Note this bug has been around for a couple of years, but before
commit fe92ee6425 ("Bluetooth: hci_core: Rework hci_conn_params flags")
supported_flags was hardcoded to "((1U << HCI_CONN_FLAG_MAX) - 1)" so
the check for unsupported flags which does the "goto done;" never
triggered.

Fixes: fe92ee6425 ("Bluetooth: hci_core: Rework hci_conn_params flags")
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-03-03 11:35:10 +01:00
D. Wythe
4940a1fdf3 net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
The problem of SMC_CLC_DECL_ERR_REGRMB on the server is very clear.
Based on the fact that whether a new SMC connection can be accepted or
not depends on not only the limit of conn nums, but also the available
entries of rtoken. Since the rtoken release is trigger by peer, while
the conn nums is decrease by local, tons of thing can happen in this
time difference.

This only thing that needs to be mentioned is that now all connection
creations are completely protected by smc_server_lgr_pending lock, it's
enough to check only the available entries in rtokens_used_mask.

Fixes: cd6851f303 ("smc: remote memory buffers (RMBs)")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:34:18 +00:00
D. Wythe
0537f0a215 net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
The main reason for this unexpected SMC_CLC_DECL_ERR_REGRMB in client
dues to following execution sequence:

Server Conn A:           Server Conn B:			Client Conn B:

smc_lgr_unregister_conn
                        smc_lgr_register_conn
                        smc_clc_send_accept     ->
                                                        smc_rtoken_add
smcr_buf_unuse
		->		Client Conn A:
				smc_rtoken_delete

smc_lgr_unregister_conn() makes current link available to assigned to new
incoming connection, while smcr_buf_unuse() has not executed yet, which
means that smc_rtoken_add may fail because of insufficient rtoken_entry,
reversing their execution order will avoid this problem.

Fixes: 3e034725c0 ("net/smc: common functions for RMBs and send buffers")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 10:34:18 +00:00
Eric Dumazet
e3d5ea2c01 tcp: make tcp_read_sock() more robust
If recv_actor() returns an incorrect value, tcp_read_sock()
might loop forever.

Instead, issue a one time warning and make sure to make progress.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20220302161723.3910001-2-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 22:49:03 -08:00
Eric Dumazet
60ce37b039 bpf, sockmap: Do not ignore orig_len parameter
Currently, sk_psock_verdict_recv() returns skb->len

This is problematic because tcp_read_sock() might have
passed orig_len < skb->len, due to the presence of TCP urgent data.

This causes an infinite loop from tcp_read_sock()

Followup patch will make tcp_read_sock() more robust vs bad actors.

Fixes: ef5659280e ("bpf, sockmap: Allow skipping sk_skb parser program")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20220302161723.3910001-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 22:49:03 -08:00
lena wang
224102de2f net: fix up skbs delta_truesize in UDP GRO frag_list
The truesize for a UDP GRO packet is added by main skb and skbs in main
skb's frag_list:
skb_gro_receive_list
        p->truesize += skb->truesize;

The commit 53475c5dd8 ("net: fix use-after-free when UDP GRO with
shared fraglist") introduced a truesize increase for frag_list skbs.
When uncloning skb, it will call pskb_expand_head and trusesize for
frag_list skbs may increase. This can occur when allocators uses
__netdev_alloc_skb and not jump into __alloc_skb. This flow does not
use ksize(len) to calculate truesize while pskb_expand_head uses.
skb_segment_list
err = skb_unclone(nskb, GFP_ATOMIC);
pskb_expand_head
        if (!skb->sk || skb->destructor == sock_edemux)
                skb->truesize += size - osize;

If we uses increased truesize adding as delta_truesize, it will be
larger than before and even larger than previous total truesize value
if skbs in frag_list are abundant. The main skb truesize will become
smaller and even a minus value or a huge value for an unsigned int
parameter. Then the following memory check will drop this abnormal skb.

To avoid this error we should use the original truesize to segment the
main skb.

Fixes: 53475c5dd8 ("net: fix use-after-free when UDP GRO with shared fraglist")
Signed-off-by: lena wang <lena.wang@mediatek.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1646133431-8948-1-git-send-email-lena.wang@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 22:04:10 -08:00
Jakub Kicinski
ea97ab9889 Here are some batman-adv bugfixes:
- Remove redundant iflink requests, by Sven Eckelmann (2 patches)
 
  - Don't expect inter-netns unique iflink indices, by Sven Eckelmann
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmIfmmUWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoSy0D/9XfRgiSQH9DuyzXjlRMV1qMgNO
 9O0HOVqZagRhtydGNsBhu1mSFcBIFUhyRD20BvyKrulsgb9Cpkojk9zv31kV0W9S
 PIK81oNVplAiDnz3aZmGWqC1GEZHziJOezvr8eBIvdRay+F/M5oqWqqtQmDyZFXc
 JEnQcJyNZTGAmfYKM8Nq5iAnqoIkPIxChvipR7P12g/cjd0owbzl/QShdlLw0nDe
 zK8Pj/zXSA+hXdIsWrh9n7yU7m7sTWWMOo0ElkOhwb6DyDxAhEvIe59jcv0SE+HL
 7zHFf1hJluO7mswK9pJSIm80+P5WtyEBskUjCL0rv7fdAzvuLsv5YLoZvEmYf39h
 /5Wo3rM5iXwNuT9hB9+teUZaPBF1dXAYOusN1Y/vuvNdG+YCq8BJn5UDmiXyYEWF
 23N1R1LQrYs+JoitIS01cpTi69rjZFN5xqtFs+gN84jgfb4vkP2JxjHyz2eBB47Q
 UTYaUh60uYtMLYXZdVFwUP7DbJnN5fGlPg9zIPAU+yHHvyqmrGjY80vn+voJ0Y5w
 wNOzTpyg7+KFpM1CpCccqivWxuJeVbgVS5jL+xegpkf7PgOy5oepkMyThZ24yy9o
 K3uTHCl0HH+YrvVFWZVrqEVQoMZ3ErQmihHXwZZy/p71MeQ/m8gnT2iEMaOlxX3k
 Efrh9mM6JI+wLCtZLA==
 =kOlX
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-pullrequest-20220302' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here are some batman-adv bugfixes:

 - Remove redundant iflink requests, by Sven Eckelmann (2 patches)

 - Don't expect inter-netns unique iflink indices, by Sven Eckelmann

* tag 'batadv-net-pullrequest-20220302' of git://git.open-mesh.org/linux-merge:
  batman-adv: Don't expect inter-netns unique iflink indices
  batman-adv: Request iflink once in batadv_get_real_netdevice
  batman-adv: Request iflink once in batadv-on-batadv check
====================

Link: https://lore.kernel.org/r/20220302163049.101957-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 21:53:35 -08:00
Sreeramya Soratkal
e50b88c4f0 nl80211: Update bss channel on channel switch for P2P_CLIENT
The wdev channel information is updated post channel switch only for
the station mode and not for the other modes. Due to this, the P2P client
still points to the old value though it moved to the new channel
when the channel change is induced from the P2P GO.

Update the bss channel after CSA channel switch completion for P2P client
interface as well.

Signed-off-by: Sreeramya Soratkal <quic_ssramya@quicinc.com>
Link: https://lore.kernel.org/r/1646114600-31479-1-git-send-email-quic_ssramya@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-02 22:37:05 +01:00
Sven Eckelmann
6c1f41afc1 batman-adv: Don't expect inter-netns unique iflink indices
The ifindex doesn't have to be unique for multiple network namespaces on
the same machine.

  $ ip netns add test1
  $ ip -net test1 link add dummy1 type dummy
  $ ip netns add test2
  $ ip -net test2 link add dummy2 type dummy

  $ ip -net test1 link show dev dummy1
  6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 96:81:55:1e:dd:85 brd ff:ff:ff:ff:ff:ff
  $ ip -net test2 link show dev dummy2
  6: dummy2: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 5a:3c:af:35:07:c3 brd ff:ff:ff:ff:ff:ff

But the batman-adv code to walk through the various layers of virtual
interfaces uses this assumption because dev_get_iflink handles it
internally and doesn't return the actual netns of the iflink. And
dev_get_iflink only documents the situation where ifindex == iflink for
physical devices.

But only checking for dev->netdev_ops->ndo_get_iflink is also not an option
because ipoib_get_iflink implements it even when it sometimes returns an
iflink != ifindex and sometimes iflink == ifindex. The caller must
therefore make sure itself to check both netns and iflink + ifindex for
equality. Only when they are equal, a "physical" interface was detected
which should stop the traversal. On the other hand, vxcan_get_iflink can
also return 0 in case there was currently no valid peer. In this case, it
is still necessary to stop.

Fixes: b7eddd0b39 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
Fixes: 5ed4a460a1 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-02 09:24:55 +01:00