Commit Graph

984366 Commits

Author SHA1 Message Date
Jakub Kicinski
ebf322822c Merge branch 'mptcp-another-set-of-miscellaneous-mptcp-fixes'
Mat Martineau says:

====================
mptcp: Another set of miscellaneous MPTCP fixes

This is another collection of MPTCP fixes and enhancements that we have
tested in the MPTCP tree:

Patch 1 cleans up cgroup attachment for in-kernel subflow sockets.

Patches 2 and 3 make sure that deletion of advertised addresses by an
MPTCP path manager when flushing all addresses behaves similarly to the
remove-single-address operation, and adds related tests.

Patches 4 and 8 do some minor cleanup.

Patches 5-7 add MPTCP_FASTCLOSE functionality. Note that patch 6 adds MPTCP
option parsing to tcp_reset().

Patch 9 optimizes skb size for outgoing MPTCP packets.
====================

Link: https://lore.kernel.org/r/20201210222506.222251-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:30:09 -08:00
Paolo Abeni
15e6ca974b mptcp: let MPTCP create max size skbs
Currently the xmit path of the MPTCP protocol creates smaller-
than-max-size skbs, which is suboptimal for the performances.

There are a few things to improve:
- when coalescing to an existing skb, must clear the PUSH flag
- tcp_build_frag() expect the available space as an argument.
  When coalescing is enable MPTCP already subtracted the
  to-be-coalesced skb len. We must increment said argument
  accordingly.

Before:
./use_mptcp.sh netperf -H 127.0.0.1 -t TCP_STREAM
[...]
131072  16384  16384    30.00    24414.86

After:
./use_mptcp.sh netperf -H 127.0.0.1 -t TCP_STREAM
[...]
131072  16384  16384    30.05    28357.69

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:30:06 -08:00
Paolo Abeni
1bc7327b5f mptcp: pm: simplify select_local_address()
There is no need to unconditionally acquire the join list
lock, we can simply splice the join list into the subflow
list and traverse only the latter.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:30:06 -08:00
Florian Westphal
50c504a20a mptcp: parse and act on incoming FASTCLOSE option
parse the MPTCP FASTCLOSE subtype.

If provided key matches the local one, schedule the work queue to close
(with tcp reset) all subflows.

The MPTCP socket moves to closed state immediately.

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:30:06 -08:00
Florian Westphal
049fe386d3 tcp: parse mptcp options contained in reset packets
Because TCP-level resets only affect the subflow, there is a MPTCP
option to indicate that the MPTCP-level connection should be closed
immediately without a mptcp-level fin exchange.

This is the 'MPTCP fast close option'.  It can be carried on ack
segments or TCP resets.  In the latter case, its needed to parse mptcp
options also for reset packets so that MPTCP can act accordingly.

Next patch will add receive side fastclose support in MPTCP.

Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:30:06 -08:00
Florian Westphal
ab82e996a1 mptcp: hold mptcp socket before calling tcp_done
When processing options from tcp reset path its possible that
tcp_done(ssk) drops the last reference on the mptcp socket which
results in use-after-free.

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:30:06 -08:00
Geliang Tang
ba34c3de71 mptcp: use MPTCPOPT_HMAC_LEN macro
Use the macro MPTCPOPT_HMAC_LEN instead of a constant in struct
mptcp_options_received.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:30:06 -08:00
Geliang Tang
6fe4ccdc3d selftests: mptcp: add the flush addrs testcase
This patch added the flush addrs testcase. In do_transfer, if the number
of removing addresses is less than 8, use the del addr command to remove
the addresses one by one. If the number is more than 8, use the flush addrs
command to remove the addresses.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:30:06 -08:00
Geliang Tang
141694df65 mptcp: remove address when netlink flushes addrs
When the PM netlink flushes the addresses, invoke the remove address
function mptcp_nl_remove_subflow_and_signal_addr to remove the addresses
and the subflows. Since this function should not be invoked under lock,
move __flush_addrs out of the pernet->lock.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:30:06 -08:00
Nicolas Rybowski
3764b0c565 mptcp: attach subflow socket to parent cgroup
It has been observed that the kernel sockets created for the subflows
(except the first one) are not in the same cgroup as their parents.
That's because the additional subflows are created by kernel workers.

This is a problem with eBPF programs attached to the parent's
cgroup won't be executed for the children. But also with any other features
of CGroup linked to a sk.

This patch fixes this behaviour.

As the subflow sockets are created by the kernel, we can't use
'mem_cgroup_sk_alloc' because of the current context being the one of the
kworker. This is why we have to do low level memcg manipulation, if
required.

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:30:06 -08:00
Linus Torvalds
e857b6fcc5 A moderate set of locking updates:
- A few extensions to the rwsem API and support for opportunistic
     spinning and lock stealing
 
   - lockdep selftest improvements
 
   - Documentation updates
 
   - Cleanups and small fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XvCgTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodbkD/9kmXCablxzCG+IGdRU0KfSvbalHkoS
 hUW7sJ8qYdoysOVMdvImPwqxLDy/P6D8Nk6z+hdaPmfWvIDQQECd7Mg/UhZLkRzI
 BGNgpatnzX4PK5sm/IFExCisPCkkbkjprocnk//TGjdwTiMMDxrndsEpwVwcucDp
 TwOjPPxoAbfWHUmnv2SUOD7mWMqMH/ISTQlKUaz+UCQicPNuHumdsQKvZx3eu7Cv
 KvucTso5Qjmyy0HwpmJO/IEyZs7Ibrb5Ocw5wds3yo2PFTjYTvo3JlJ16g8IvaZW
 ckk+o+3QKp29oFAPQ+dFGEG10w4JQI3AZkDVouFR4BDD0sbOm7BvWCsVq/J8vk3i
 xnmaHT3zB5F4T97O+osBj2KS4zLliOHohWzDNv1+JVBCfniYbPo5hqa/n7OO2oot
 M3xXY3ddgfTEUOtvOPPfZwfG5XmPrgwj8iiyywlTQU4BR5rWYj2ehvhWOwugQJ6x
 g56nQzuf3KmyoI2S+1GZoxtgWSLwoXbUAPL8p4lyvy6jKKFV84BOJeVac803BBUo
 yLFBSvTfZ95iNc84XHjJOJ/MGE8e2hOGa2KEdxuh1qE5FPazBg5e2cQh2j125PLz
 uyhelQn7SgAHSKSXSAOPq0JFsrmxRmkzIgG9zLSEqo+6g6uKdWgGYVCbEzOB+9gB
 2tNEgP6Mfh+ARg==
 =uqcN
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Thomas Gleixner:
 "A moderate set of locking updates:

   - A few extensions to the rwsem API and support for opportunistic
     spinning and lock stealing

   - lockdep selftest improvements

   - Documentation updates

   - Cleanups and small fixes all over the place"

* tag 'locking-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  seqlock: kernel-doc: Specify when preemption is automatically altered
  seqlock: Prefix internal seqcount_t-only macros with a "do_"
  Documentation: seqlock: s/LOCKTYPE/LOCKNAME/g
  locking/rwsem: Remove reader optimistic spinning
  locking/rwsem: Enable reader optimistic lock stealing
  locking/rwsem: Prevent potential lock starvation
  locking/rwsem: Pass the current atomic count to rwsem_down_read_slowpath()
  locking/rwsem: Fold __down_{read,write}*()
  locking/rwsem: Introduce rwsem_write_trylock()
  locking/rwsem: Better collate rwsem_read_trylock()
  rwsem: Implement down_read_interruptible
  rwsem: Implement down_read_killable_nested
  refcount: Fix a kernel-doc markup
  completion: Drop init_completion define
  atomic: Update MAINTAINERS
  atomic: Delete obsolete documentation
  seqlock: Rename __seqprop() users
  lockdep/selftest: Add spin_nest_lock test
  lockdep/selftests: Fix PROVE_RAW_LOCK_NESTING
  seqlock: avoid -Wshadow warnings
  ...
2020-12-14 17:27:47 -08:00
Loic Poulain
efc36d3c34 net: mhi: Fix unexpected queue wake
This patch checks that MHI queue is not full before waking up the net
queue. This fix sporadic MHI queueing issues in xmit. Indeed xmit and
its symmetric complete callback (ul_callback) can run concurently, it
is then not safe to unconditionnaly waking the queue in the callback
without checking queue fullness.

Fixes: 3ffec6a14f ("net: Add mhi-net driver")
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Link: https://lore.kernel.org/r/1607599507-5879-1-git-send-email-loic.poulain@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:25:56 -08:00
Rasmus Villemoes
49506a9ba0 net: dsa: mv88e6xxx: don't set non-existing learn2all bit for 6220/6250
The 6220 and 6250 switches do not have a learn2all bit in global1, ATU
control register; bit 3 is reserverd.

On the switches that do have that bit, it is used to control whether
learning frames are sent out the ports that have the message_port bit
set. So rather than adding yet another chip method, use the existence
of the ->port_setup_message_port method as a proxy for determining
whether the learn2all bit exists (and should be set).

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Link: https://lore.kernel.org/r/20201210110645.27765-1-rasmus.villemoes@prevas.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:25:03 -08:00
Linus Torvalds
8c1dccc803 RCU, LKMM and KCSAN updates collected by Paul McKenney:
RCU:
 
     - Avoid cpuinfo-induced IPI pileups and idle-CPU IPIs.
 
     - Lockdep-RCU updates reducing the need for __maybe_unused.
 
     - Tasks-RCU updates.
 
     - Miscellaneous fixes.
 
     - Documentation updates.
 
     - Torture-test updates.
 
   KCSAN:
 
     - updates for selftests, avoiding setting watchpoints on NULL pointers
 
     - fix to watchpoint encoding
 
   LKMM:
 
     - updates for documentation along with some updates to example-code
       litmus tests
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/Xon4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobXUD/92LJTI/TMgK6Z6EEQBiJZO/2mNKjK8
 FEKc6AqTNMlZNsWCfQ5UgqtHpn+MkBZsX1x4u22gehE1qaCB8gnQ5wXgbXon8tQm
 exxVk6vvQZjseeqCMqrsUYQlD7dNgHnf1qAmWXJvji4sA/1Opo6n2M74tqfE2ueV
 S5hpQwSuK/6Zu2Hrr62HD8+Fx0in6ZuKRZxHGp1392l++DGbniJM3dzntRXB+JbZ
 w3PDHFCQuGzTytyeKuQV48ot9IK+2YzmjIp/+4tHL6mvU38xeSu6gcYtqKPcfYWw
 D6HXvDa965h5IrFdSA2JWSzjJ+VYgZVElk2HyXDNIae0fM/8GidgoIDQipT1WAur
 sxW/Ke4U6Jm5MMqXqV8iMNduktkGD1/h6G/iB1Yis29xFdthorNpbHVAP+8cKXgf
 1cR6RorOuBYv6XpyzygHtE7qfLY5ST352pJ4+UqNzboujOcuEnGaygttt0F/F8sA
 ZH8NT5dyUfbGeqepdZWkbj116Hjeg3fyV3CZeyBhDeqpjf1Nn3nbJ1xRksPLfa3i
 IKvN7HSzEg+vKnsJNnQeFlAmQ/W3n2bedzRqfaCg77pNhKI6jPuavY5f2YGFUj0y
 yx0UzOYoI1Cln0keBMmynbyUKgJ7zstLkrt/JenjhtD3B+0df5BmYjkL+nqkP6ax
 +XTCu7Xg+B061g==
 =N/iO
 -----END PGP SIGNATURE-----

Merge tag 'core-rcu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Thomas Gleixner:
 "RCU, LKMM and KCSAN updates collected by Paul McKenney.

  RCU:
   - Avoid cpuinfo-induced IPI pileups and idle-CPU IPIs

   - Lockdep-RCU updates reducing the need for __maybe_unused

   - Tasks-RCU updates

   - Miscellaneous fixes

   - Documentation updates

   - Torture-test updates

  KCSAN:
   - updates for selftests, avoiding setting watchpoints on NULL pointers

   - fix to watchpoint encoding

  LKMM:
   - updates for documentation along with some updates to example-code
     litmus tests"

* tag 'core-rcu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  srcu: Take early exit on memory-allocation failure
  rcu/tree: Defer kvfree_rcu() allocation to a clean context
  rcu: Do not report strict GPs for outgoing CPUs
  rcu: Fix a typo in rcu_blocking_is_gp() header comment
  rcu: Prevent lockdep-RCU splats on lock acquisition/release
  rcu/tree: nocb: Avoid raising softirq for offloaded ready-to-execute CBs
  rcu,ftrace: Fix ftrace recursion
  rcu/tree: Make struct kernel_param_ops definitions const
  rcu/tree: Add a warning if CPU being onlined did not report QS already
  rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config
  rcu: Fix single-CPU check in rcu_blocking_is_gp()
  rcu: Implement rcu_segcblist_is_offloaded() config dependent
  list.h: Update comment to explicitly note circular lists
  rcu: Panic after fixed number of stalls
  x86/smpboot:  Move rcu_cpu_starting() earlier
  rcu: Allow rcu_irq_enter_check_tick() from NMI
  tools/memory-model: Label MP tests' producers and consumers
  tools/memory-model: Use "buf" and "flag" for message-passing tests
  tools/memory-model: Add types to litmus tests
  tools/memory-model: Add a glossary of LKMM terms
  ...
2020-12-14 17:21:16 -08:00
Eelco Chaudron
09d6217254 net: openvswitch: fix TTL decrement exception action execution
Currently, the exception actions are not processed correctly as the wrong
dataset is passed. This change fixes this, including the misleading
comment.

In addition, a check was added to make sure we work on an IPv4 packet,
and not just assume if it's not IPv6 it's IPv4.

This was all tested using OVS with patch,
https://patchwork.ozlabs.org/project/openvswitch/list/?series=21639,
applied and sending packets with a TTL of 1 (and 0), both with IPv4
and IPv6.

Fixes: 69929d4c49 ("net: openvswitch: fix TTL decrement action netlink message format")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160733569860.3007.12938188180387116741.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:18:25 -08:00
Linus Torvalds
1ac0884d54 A set of updates for entry/exit handling:
- More generalization of entry/exit functionality
 
  - The consolidation work to reclaim TIF flags on x86 and also for non-x86
    specific TIF flags which are solely relevant for syscall related work
    and have been moved into their own storage space. The x86 specific part
    had to be merged in to avoid a major conflict.
 
  - The TIF_NOTIFY_SIGNAL work which replaces the inefficient signal
    delivery mode of task work and results in an impressive performance
    improvement for io_uring. The non-x86 consolidation of this is going to
    come seperate via Jens.
 
  - The selective syscall redirection facility which provides a clean and
    efficient way to support the non-Linux syscalls of WINE by catching them
    at syscall entry and redirecting them to the user space emulation. This
    can be utilized for other purposes as well and has been designed
    carefully to avoid overhead for the regular fastpath. This includes the
    core changes and the x86 support code.
 
  - Simplification of the context tracking entry/exit handling for the users
    of the generic entry code which guarantee the proper ordering and
    protection.
 
  - Preparatory changes to make the generic entry code accomodate S390
    specific requirements which are mostly related to their syscall restart
    mechanism.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XoPoTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoe0tD/4jSKHIogVM9kVpiYfwjDGS1NluaBXn
 71ZoASbX9GZebyGandMyF2QP1iJ24ZO0RztBwHEVH6fyomKB2iFNedssCpO9yfWV
 3eFRpOvMpbszY2W2bd0QG3GrqaTttjVfB4ahkGLzqeSbchdob6hZpNDYtBZnujA6
 GSnrrurfJkCGoQny+yJQYdQJXQU+BIX90B2a2Q+jW123Luy/iHXC1f/krZSA1m14
 fC9xYLSUjPphTzh2ZOW+C3DgdjOL5PfAm/6F+DArt4GtLgrEGD7R74aLSFhvetky
 dn5QtG+yAsz1i0cc5Wu/JBcT9tOkY92rPYSyLI9bYQUSQ/bMyuprz6oYKj3dubsu
 ZSsKPdkNFPIniL4fLdCMWZcIXX5xgnrxKjdgXZXW3gtrcxSns8w8uED3Sh7dgE08
 pgIeq67E5g/OB8kJXH1VxdewmeQb9cOmnzzHwNO7TrrGbBKjDTYHNdYOKf1dUTTK
 ZX1UjLfGwxTkMYAbQD1k0JGZ2OLRshzSaH5BW/ZKa3bvJW6yYOq+/YT8B8hbJ8U3
 vThlO75/55IJxS5r5Y3vZd/IHdsYbPuETD+TA8tNYtPqNZasW8nnk4TYctWqzDuO
 /Ka1wvWYid3c6ySznQn4zSyRjr968AfHeZ9YTUMhWufy5waXVmdBMG41u3IKfsVt
 osyzNc4EK19/Mg==
 =hsjV
 -----END PGP SIGNATURE-----

Merge tag 'core-entry-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core entry/exit updates from Thomas Gleixner:
 "A set of updates for entry/exit handling:

   - More generalization of entry/exit functionality

   - The consolidation work to reclaim TIF flags on x86 and also for
     non-x86 specific TIF flags which are solely relevant for syscall
     related work and have been moved into their own storage space. The
     x86 specific part had to be merged in to avoid a major conflict.

   - The TIF_NOTIFY_SIGNAL work which replaces the inefficient signal
     delivery mode of task work and results in an impressive performance
     improvement for io_uring. The non-x86 consolidation of this is
     going to come seperate via Jens.

   - The selective syscall redirection facility which provides a clean
     and efficient way to support the non-Linux syscalls of WINE by
     catching them at syscall entry and redirecting them to the user
     space emulation. This can be utilized for other purposes as well
     and has been designed carefully to avoid overhead for the regular
     fastpath. This includes the core changes and the x86 support code.

   - Simplification of the context tracking entry/exit handling for the
     users of the generic entry code which guarantee the proper ordering
     and protection.

   - Preparatory changes to make the generic entry code accomodate S390
     specific requirements which are mostly related to their syscall
     restart mechanism"

* tag 'core-entry-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  entry: Add syscall_exit_to_user_mode_work()
  entry: Add exit_to_user_mode() wrapper
  entry_Add_enter_from_user_mode_wrapper
  entry: Rename exit_to_user_mode()
  entry: Rename enter_from_user_mode()
  docs: Document Syscall User Dispatch
  selftests: Add benchmark for syscall user dispatch
  selftests: Add kselftest for syscall user dispatch
  entry: Support Syscall User Dispatch on common syscall entry
  kernel: Implement selective syscall userspace redirection
  signal: Expose SYS_USER_DISPATCH si_code type
  x86: vdso: Expose sigreturn address on vdso to the kernel
  MAINTAINERS: Add entry for common entry code
  entry: Fix boot for !CONFIG_GENERIC_ENTRY
  x86: Support HAVE_CONTEXT_TRACKING_OFFSTACK
  context_tracking: Only define schedule_user() on !HAVE_CONTEXT_TRACKING_OFFSTACK archs
  sched: Detect call to schedule from critical entry code
  context_tracking: Don't implement exception_enter/exit() on CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK
  context_tracking: Introduce HAVE_CONTEXT_TRACKING_OFFSTACK
  x86: Reclaim unused x86 TI flags
  ...
2020-12-14 17:13:53 -08:00
Linus Torvalds
ff6135959a A much quieter cycle for documentation (happily), with, one hopes, the bulk
of the churn behind us.  Significant stuff in this pull includes:
 
  - A set of new Chinese translations
  - Italian translation updates
  - A mechanism from Mauro to automatically format Documentation/features
    for the built docs
  - Automatic cross references without explicit :ref: markup
  - A new reset-controller document
  - An extensive new document on reporting problems from Thorsten
 
 That last patch also adds the CC-BY-4.0 license to LICENSES/dual; there was
 some discussion on this, but we seem to have consensus and an ack from Greg
 for that addition.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl/XyewPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YUeYH/AiNNlVIF/80T45TAkm+1kpy2Fb+d/5wbtGK
 PB7OTXPyDmmqwZNldlF9IsRhp5W+wYC3PNlulYMG44hT7/Jqf2CMFw8SOZqGLmBV
 LhWwoS+TAWLB19IOOMrVXbhAlNsX01NwBDY/dwONjW1Jcu+tuAsBR47T9lKjw4kJ
 qGFGMQTvZG9Ig1x7E6X38mAd7W3SD1viNuUePS2YcoB15GAocWfVVHvu1r+RHUTS
 27ET8tWzMMuiaCAD6toVY9L4T7iCI7YSPXQm8BOkf/f4LXDnpo8Fo11LE5ozTAh3
 +avnNt8vnrRXc06MnzwsvNHm2TqN97B4shkeDiPAV3ySXI8Zu/w=
 =HScX
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.11' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "A much quieter cycle for documentation (happily), with, one hopes, the
  bulk of the churn behind us. Significant stuff in this pull includes:

   - A set of new Chinese translations

   - Italian translation updates

   - A mechanism from Mauro to automatically format
     Documentation/features for the built docs

   - Automatic cross references without explicit :ref: markup

   - A new reset-controller document

   - An extensive new document on reporting problems from Thorsten

  That last patch also adds the CC-BY-4.0 license to LICENSES/dual;
  there was some discussion on this, but we seem to have consensus and
  an ack from Greg for that addition"

* tag 'docs-5.11' of git://git.lwn.net/linux: (50 commits)
  docs: fix broken cross reference in translations/zh_CN
  docs: Note that sphinx 1.7 will be required soon
  docs: update requirements to install six module
  docs: reporting-issues: move 'outdated, need help' note to proper place
  docs: Update documentation to reflect what TAINT_CPU_OUT_OF_SPEC means
  docs: add a reset controller chapter to the driver API docs
  docs: make reporting-bugs.rst obsolete
  docs: Add a new text describing how to report bugs
  LICENSES: Add the CC-BY-4.0 license
  Documentation: fix multiple typos found in the admin-guide subdirectory
  Documentation: fix typos found in admin-guide subdirectory
  kernel-doc: Fix example in Nested structs/unions
  docs: clean up sysctl/kernel: titles, version
  docs: trace: fix event state structure name
  docs: nios2: add missing ReST file
  scripts: get_feat.pl: reduce table width for all features output
  scripts: get_feat.pl: change the group by order
  scripts: get_feat.pl: make complete table more coincise
  scripts: kernel-doc: fix parsing function-like typedefs
  Documentation: fix typos found in process, dev-tools, and doc-guide subdirectories
  ...
2020-12-14 16:55:54 -08:00
Linus Torvalds
f9b4240b07 fixes-v5.11
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCX9daOgAKCRCRxhvAZXjc
 ohPkAQChXUB2BAjtIzXlCkZoDBbzHHblm5DZ37oy/4xYFmAcEwEA5sw6dQqyGHnF
 GEP9def51HvXLpBV2BzNUGggo1SoGgQ=
 =w/cO
 -----END PGP SIGNATURE-----

Merge tag 'fixes-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull misc fixes from Christian Brauner:
 "This contains several fixes which felt worth being combined into a
  single branch:

   - Use put_nsproxy() instead of open-coding it switch_task_namespaces()

   - Kirill's work to unify lifecycle management for all namespaces. The
     lifetime counters are used identically for all namespaces types.
     Namespaces may of course have additional unrelated counters and
     these are not altered. This work allows us to unify the type of the
     counters and reduces maintenance cost by moving the counter in one
     place and indicating that basic lifetime management is identical
     for all namespaces.

   - Peilin's fix adding three byte padding to Dmitry's
     PTRACE_GET_SYSCALL_INFO uapi struct to prevent an info leak.

   - Two smal patches to convert from the /* fall through */ comment
     annotation to the fallthrough keyword annotation which I had taken
     into my branch and into -next before df561f6688 ("treewide: Use
     fallthrough pseudo-keyword") made it upstream which fixed this
     tree-wide.

     Since I didn't want to invalidate all testing for other commits I
     didn't rebase and kept them"

* tag 'fixes-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  nsproxy: use put_nsproxy() in switch_task_namespaces()
  sys: Convert to the new fallthrough notation
  signal: Convert to the new fallthrough notation
  time: Use generic ns_common::count
  cgroup: Use generic ns_common::count
  mnt: Use generic ns_common::count
  user: Use generic ns_common::count
  pid: Use generic ns_common::count
  ipc: Use generic ns_common::count
  uts: Use generic ns_common::count
  net: Use generic ns_common::count
  ns: Add a common refcount into ns_common
  ptrace: Prevent kernel-infoleak in ptrace_get_syscall_info()
2020-12-14 16:40:27 -08:00
Linus Torvalds
6d93a1971a time-namespace-v5.11
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCX9cwgAAKCRCRxhvAZXjc
 onViAP9CDMQct0RfdpdKOrh4NkxWiheBp7CzVSP1Xfy8KHBslgD/X7kilcthT8PC
 JTJmngrVWoehX+s49kl2PSuuLsGElAo=
 =llnx
 -----END PGP SIGNATURE-----

Merge tag 'time-namespace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull time namespace updates from Christian Brauner:
 "When time namespaces were introduced we missed to virtualize the
  'btime' field in /proc/stat. This confuses tasks which are in another
  time namespace with a virtualized boottime which is common in some
  container workloads. This contains Michael's series to fix 'btime'
  which Thomas asked me to take through my tree.

  To fix 'btime' virtualization we simply subtract the offset of the
  time namespace's boottime from btime before printing the stats. Note
  that since start_boottime of processes are seconds since boottime and
  the boottime stamp is now shifted according to the time namespace's
  offset, the offset of the time namespace also needs to be applied
  before the process stats are given to userspace. This avoids that
  processes shown by tools such as 'ps' appear as time travelers in the
  corresponding time namespace.

  Selftests are included to verify that btime virtualization in
  /proc/stat works as expected"

* tag 'time-namespace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  namespace: make timens_on_fork() return nothing
  selftests/timens: added selftest for /proc/stat btime
  fs/proc: apply the time namespace offset to /proc/stat btime
  timens: additional helper functions for boottime offset handling
2020-12-14 16:35:39 -08:00
Dmitry Torokhov
4b4193256c Merge branch 'next' into for-linus
Prepare input updates for 5.11 merge window.
2020-12-14 16:27:23 -08:00
Linus Torvalds
0ca2ce81eb arm64 updates for 5.11:
- Expose tag address bits in siginfo. The original arm64 ABI did not
   expose any of the bits 63:56 of a tagged address in siginfo. In the
   presence of user ASAN or MTE, this information may be useful. The
   implementation is generic to other architectures supporting tags (like
   SPARC ADI, subject to wiring up the arch code). The user will have to
   opt in via sigaction(SA_EXPOSE_TAGBITS) so that the extra bits, if
   available, become visible in si_addr.
 
 - Default to 32-bit wide ZONE_DMA. Previously, ZONE_DMA was set to the
   lowest 1GB to cope with the Raspberry Pi 4 limitations, to the
   detriment of other platforms. With these changes, the kernel scans the
   Device Tree dma-ranges and the ACPI IORT information before deciding
   on a smaller ZONE_DMA.
 
 - Strengthen READ_ONCE() to acquire when CONFIG_LTO=y. When building
   with LTO, there is an increased risk of the compiler converting an
   address dependency headed by a READ_ONCE() invocation into a control
   dependency and consequently allowing for harmful reordering by the
   CPU.
 
 - Add CPPC FFH support using arm64 AMU counters.
 
 - set_fs() removal on arm64. This renders the User Access Override (UAO)
   ARMv8 feature unnecessary.
 
 - Perf updates: PMU driver for the ARM DMC-620 memory controller, sysfs
   identifier file for SMMUv3, stop event counters support for i.MX8MP,
   enable the perf events-based hard lockup detector.
 
 - Reorganise the kernel VA space slightly so that 52-bit VA
   configurations can use more virtual address space.
 
 - Improve the robustness of the arm64 memory offline event notifier.
 
 - Pad the Image header to 64K following the EFI header definition
   updated recently to increase the section alignment to 64K.
 
 - Support CONFIG_CMDLINE_EXTEND on arm64.
 
 - Do not use tagged PC in the kernel (TCR_EL1.TBID1==1), freeing up 8
   bits for PtrAuth.
 
 - Switch to vmapped shadow call stacks.
 
 - Miscellaneous clean-ups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl/XcSgACgkQa9axLQDI
 XvGkwg//SLknimELD/cphf2UzZm5RFuCU0x1UnIXs9XYo5BrOpgVLLA//+XkCrKN
 0GLAdtBDfw1axWJudzgMBiHrv6wSGh4p3YWjLIW06u/PJu3m3U8oiiolvvF8d7Yq
 UKDseKGQnQkrl97J0SyA+Da/u8D11GEzp52SWL5iRxzt6vInEC27iTOp9n1yoaoP
 f3y7qdp9kv831ryUM3rXFYpc8YuMWXk+JpBSNaxqmjlvjMzipA5PhzBLmNzfc657
 XcrRX5qsgjEeJW8UUnWUVNB42j7tVzN77yraoUpoVVCzZZeWOQxqq5EscKPfIhRt
 AjtSIQNOs95ZVE0SFCTjXnUUb823coUs4dMCdftqlE62JNRwdR+3bkfa+QjPTg1F
 O9ohW1AzX0/JB19QBxMaOgbheB8GFXh3DVJ6pizTgxJgyPvQQtFuEhT1kq8Cst0U
 Pe+pEWsg9t41bUXNz+/l9tUWKWpeCfFNMTrBXLmXrNlTLeOvDh/0UiF0+2lYJYgf
 YAboibQ5eOv2wGCcSDEbNMJ6B2/6GtubDJxH4du680F6Emb6pCSw0ntPwB7mSGLG
 5dXz+9FJxDLjmxw7BXxQgc5MoYIrt5JQtaOQ6UxU8dPy53/+py4Ck6tXNkz0+Ap7
 gPPaGGy1GqobQFu3qlHtOK1VleQi/sWcrpmPHrpiiFUf6N7EmcY=
 =zXFk
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Expose tag address bits in siginfo. The original arm64 ABI did not
   expose any of the bits 63:56 of a tagged address in siginfo. In the
   presence of user ASAN or MTE, this information may be useful. The
   implementation is generic to other architectures supporting tags
   (like SPARC ADI, subject to wiring up the arch code). The user will
   have to opt in via sigaction(SA_EXPOSE_TAGBITS) so that the extra
   bits, if available, become visible in si_addr.

 - Default to 32-bit wide ZONE_DMA. Previously, ZONE_DMA was set to the
   lowest 1GB to cope with the Raspberry Pi 4 limitations, to the
   detriment of other platforms. With these changes, the kernel scans
   the Device Tree dma-ranges and the ACPI IORT information before
   deciding on a smaller ZONE_DMA.

 - Strengthen READ_ONCE() to acquire when CONFIG_LTO=y. When building
   with LTO, there is an increased risk of the compiler converting an
   address dependency headed by a READ_ONCE() invocation into a control
   dependency and consequently allowing for harmful reordering by the
   CPU.

 - Add CPPC FFH support using arm64 AMU counters.

 - set_fs() removal on arm64. This renders the User Access Override
   (UAO) ARMv8 feature unnecessary.

 - Perf updates: PMU driver for the ARM DMC-620 memory controller, sysfs
   identifier file for SMMUv3, stop event counters support for i.MX8MP,
   enable the perf events-based hard lockup detector.

 - Reorganise the kernel VA space slightly so that 52-bit VA
   configurations can use more virtual address space.

 - Improve the robustness of the arm64 memory offline event notifier.

 - Pad the Image header to 64K following the EFI header definition
   updated recently to increase the section alignment to 64K.

 - Support CONFIG_CMDLINE_EXTEND on arm64.

 - Do not use tagged PC in the kernel (TCR_EL1.TBID1==1), freeing up 8
   bits for PtrAuth.

 - Switch to vmapped shadow call stacks.

 - Miscellaneous clean-ups.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits)
  perf/imx_ddr: Add system PMU identifier for userspace
  bindings: perf: imx-ddr: add compatible string
  arm64: Fix build failure when HARDLOCKUP_DETECTOR_PERF is enabled
  arm64: mte: fix prctl(PR_GET_TAGGED_ADDR_CTRL) if TCF0=NONE
  arm64: mark __system_matches_cap as __maybe_unused
  arm64: uaccess: remove vestigal UAO support
  arm64: uaccess: remove redundant PAN toggling
  arm64: uaccess: remove addr_limit_user_check()
  arm64: uaccess: remove set_fs()
  arm64: uaccess cleanup macro naming
  arm64: uaccess: split user/kernel routines
  arm64: uaccess: refactor __{get,put}_user
  arm64: uaccess: simplify __copy_user_flushcache()
  arm64: uaccess: rename privileged uaccess routines
  arm64: sdei: explicitly simulate PAN/UAO entry
  arm64: sdei: move uaccess logic to arch/arm64/
  arm64: head.S: always initialize PSTATE
  arm64: head.S: cleanup SCTLR_ELx initialization
  arm64: head.S: rename el2_setup -> init_kernel_el
  arm64: add C wrappers for SET_PSTATE_*()
  ...
2020-12-14 16:24:30 -08:00
Linus Torvalds
586592478b - Add support for the hugetlb_cma command line option to allocate gigantic
hugepages using CMA:
 
 - Add arch_get_random_long() support.
 
 - Add ap bus userspace notifications.
 
 - Increase default size of vmalloc area to 512GB and otherwise let it increase
   dynamically by the size of physical memory. This should fix all occurrences
   where the vmalloc area was not large enough.
 
 - Completely get rid of set_fs() (aka select SET_FS) and rework address space
   handling while doing that; making address space handling much more simple.
 
 - Reimplement getcpu vdso syscall in C.
 
 - Add support for extended SCLP responses (> 4k). This allows e.g. to handle
   also potential large system configurations.
 
 - Simplify KASAN by removing 3-level page table support and only supporting
   4-levels from now on.
 
 - Improve debug-ability of the kernel decompressor code, which now prints also
   stack traces and symbols in case of problems to the console.
 
 - Remove more power management leftovers.
 
 - Other various fixes and improvements all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAl/XQAIACgkQIg7DeRsp
 bsIdYA//TCtSTrka/yW03b4b0FuLtKNpKB5zQgaqtEurbgbZhXdZ7/L3N+KavPQH
 njmKAARxebRIJB0DoZ9w9XpSb+mI3Q5y8GMi5xvUzjtJj/c6ahi3cEXIpuDR0PBv
 bf4UYSUpvndOwVFVOEZLeaJwKciCYvdoOwjBCmoKz9orthNVdVh5vztVRE2dMkNl
 y9C/Pb3w4ZMYxrbETuYnxqzueCxUhVOJmwodkGdP6bxBeemOwKn2TLVZQCbGGe7y
 BZpG+xsTaLZV1dZUZuDSOzVi1CTzJBGaJuYy5ewddWfxi7+mxqwEg/4s6nGKAciX
 Fa3T6aqLpUmDDN842Ql9TZHrwR+GYrlAp3XaQETOusUuEQLvP1dKRj/RXiDXN3MZ
 L+Mfa56dbs9GkVaNN/N+L7Y4z/6tZ2caX4X2S22Cp/QzvRTrG4jXVTn0r4WIcY/2
 vn7fEy71LJ97CLQTDryyfJx7YNMdyIlUZY5ICAk1bt8nz1lB/IoZy0YoCBvPxIzb
 cEKcFTOdOtZR4WY3F8+kU0Nv1HQ8yPBzMaAqSNERvNQhMvoCChxntmyYxuVgH5iB
 SACADqEJKQ3hb4nMnxkeTrmmrhH4e0kdF9lAEytX+VYbjAq/6MY+qYo+QHDYkFWh
 BndxI54d6IiktDcKuBcpKJM7S/7N2t+EsLTS6Dhux7dbDZ2+Upw=
 =UR7j
 -----END PGP SIGNATURE-----

Merge tag 's390-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - Add support for the hugetlb_cma command line option to allocate
   gigantic hugepages using CMA

 - Add arch_get_random_long() support.

 - Add ap bus userspace notifications.

 - Increase default size of vmalloc area to 512GB and otherwise let it
   increase dynamically by the size of physical memory. This should fix
   all occurrences where the vmalloc area was not large enough.

 - Completely get rid of set_fs() (aka select SET_FS) and rework address
   space handling while doing that; making address space handling much
   more simple.

 - Reimplement getcpu vdso syscall in C.

 - Add support for extended SCLP responses (> 4k). This allows e.g. to
   handle also potential large system configurations.

 - Simplify KASAN by removing 3-level page table support and only
   supporting 4-levels from now on.

 - Improve debug-ability of the kernel decompressor code, which now
   prints also stack traces and symbols in case of problems to the
   console.

 - Remove more power management leftovers.

 - Other various fixes and improvements all over the place.

* tag 's390-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (62 commits)
  s390/mm: add support to allocate gigantic hugepages using CMA
  s390/crypto: add arch_get_random_long() support
  s390/smp: perform initial CPU reset also for SMT siblings
  s390/mm: use invalid asce for user space when switching to init_mm
  s390/idle: fix accounting with machine checks
  s390/idle: add missing mt_cycles calculation
  s390/boot: add build-id to decompressor
  s390/kexec_file: fix diag308 subcode when loading crash kernel
  s390/cio: fix use-after-free in ccw_device_destroy_console
  s390/cio: remove pm support from ccw bus driver
  s390/cio: remove pm support from css-bus driver
  s390/cio: remove pm support from IO subchannel drivers
  s390/cio: remove pm support from chsc subchannel driver
  s390/vmur: remove unused pm related functions
  s390/tape: remove unsupported PM functions
  s390/cio: remove pm support from eadm-sch drivers
  s390: remove pm support from console drivers
  s390/dasd: remove unused pm related functions
  s390/zfcp: remove pm support from zfcp driver
  s390/ap: let bus_register() add the AP bus sysfs attributes
  ...
2020-12-14 16:22:26 -08:00
Linus Torvalds
0b03beface m68k updates for v5.11
- Fix WARNING splat in pmac_zilog driver,
   - Fix ADB input device regression,
   - Assume maintainership for adb-iop and via-macii,
   - Minor fixes and improvements,
   - Defconfig updates.
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQQ9qaHoIs/1I4cXmEiKwlD9ZEnxcAUCX9crJBUcZ2VlcnRAbGlu
 dXgtbTY4ay5vcmcACgkQisJQ/WRJ8XCmTgD/XEux61KarMOuNhbyGf05nAN+xbf5
 Q16XDa6AQmGWgRcBANpCX+9m5dgu4GQo+M6yiK9ZptevookDy+JpvE8DqLMA
 =BrTy
 -----END PGP SIGNATURE-----

Merge tag 'm68k-for-v5.11-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k

Pull m68k updates from Geert Uytterhoeven:

 - fix WARNING splat in pmac_zilog driver

 - fix ADB input device regression

 - assume maintainership for adb-iop and via-macii

 - minor fixes and improvements

 - defconfig updates

* tag 'm68k-for-v5.11-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  MAINTAINERS: Update m68k Mac entry
  macintosh/adb-iop: Send correct poll command
  macintosh/adb-iop: Always wait for reply message from IOP
  m68k: Fix WARNING splat in pmac_zilog driver
  m68k: Add a missing ELF_DETAILS in link script
  m68k: Drop redundant NOTES in link script
  m68k: mac: Update Kconfig help
  m68k: mac: Remove redundant VIA register writes
  m68k: mac: Remove dead code
  m68k: mac: Refactor iop_preinit() and iop_init()
  m68k: defconfig: Enable KUnit tests
  m68k: defconfig: Update defconfigs for v5.10-rc1
  m68k: Remove unused mach_max_dma_address
  m68k: Avoid xchg() warning
2020-12-14 16:20:48 -08:00
Linus Torvalds
2c075f38a7 Merge branch 'radeon-fixes' (Radeon and amdgpu fixes)
Merge radeon and amdgpu fixes for boot problems.

The drm merge ended up causing my AMD workstation to not boot.  When
reporting this, Alex Deucher correctly blamed commit 28a68f8282
("drm/radeon/ttm: use multihop") and pointed to these two fixes that
were already in the drm-misc tree:

  95e3d610d3 ("drm/radeon: fix check order in radeon_bo_move")
  aefec40938 ("drm/amdgpu: fix check order in amdgpu_bo_move")

Since I hate doing merges with a known-bad base, I've cherry-picked
these fixes into a branch of their own, just to be able to continue my
merge window.  The original commits will eventually come in as
duplicates through the normal channels, but in the meantime we have a
tree that isn't broken with basically every AMD GPU out there.

Acked-by: Alex Deucher <alexdeucher@gmail.com>

* cherry-picked from the drm-misc tree:
  drm/radeon: fix check order in radeon_bo_move
  drm/amdgpu: fix check order in amdgpu_bo_move
2020-12-14 16:17:38 -08:00
Christian König
68b111bf74 drm/radeon: fix check order in radeon_bo_move
Reorder the code to fix checking if blitting is available.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 28a68f8282 ("drm/radeon/ttm: use multihop")
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/403847/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-14 16:00:46 -08:00
Christian König
228ddee8ed drm/amdgpu: fix check order in amdgpu_bo_move
Reorder the code to fix checking if blitting is available.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/401019/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-14 16:00:46 -08:00
Jakub Kicinski
7bca5021a4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

1) Missing dependencies in NFT_BRIDGE_REJECT, from Randy Dunlap.

2) Use atomic_inc_return() instead of atomic_add_return() in IPVS,
   from Yejune Deng.

3) Simplify check for overquota in xt_nfacct, from Kaixu Xia.

4) Move nfnl_acct_list away from struct net, from Miao Wang.

5) Pass actual sk in reject actions, from Jan Engelhardt.

6) Add timeout and protoinfo to ctnetlink destroy events,
   from Florian Westphal.

7) Four patches to generalize set infrastructure to support
   for multiple expressions per set element.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
  netfilter: nftables: netlink support for several set element expressions
  netfilter: nftables: generalize set extension to support for several expressions
  netfilter: nftables: move nft_expr before nft_set
  netfilter: nftables: generalize set expressions support
  netfilter: ctnetlink: add timeout and protoinfo to destroy events
  netfilter: use actual socket sk for REJECT action
  netfilter: nfnl_acct: remove data from struct net
  netfilter: Remove unnecessary conversion to bool
  ipvs: replace atomic_add_return()
  netfilter: nft_reject_bridge: fix build errors due to code movement
====================

Link: https://lore.kernel.org/r/20201212230513.3465-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 15:43:21 -08:00
Jakub Kicinski
a6b5e026e6 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-12-14

1) Expose bpf_sk_storage_*() helpers to iterator programs, from Florent Revest.

2) Add AF_XDP selftests based on veth devs to BPF selftests, from Weqaar Janjua.

3) Support for finding BTF based kernel attach targets through libbpf's
   bpf_program__set_attach_target() API, from Andrii Nakryiko.

4) Permit pointers on stack for helper calls in the verifier, from Yonghong Song.

5) Fix overflows in hash map elem size after rlimit removal, from Eric Dumazet.

6) Get rid of direct invocation of llc in BPF selftests, from Andrew Delgadillo.

7) Fix xsk_recvmsg() to reorder socket state check before access, from Björn Töpel.

8) Add new libbpf API helper to retrieve ring buffer epoll fd, from Brendan Jackman.

9) Batch of minor BPF selftest improvements all over the place, from Florian Lehner,
   KP Singh, Jiri Olsa and various others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (31 commits)
  selftests/bpf: Add a test for ptr_to_map_value on stack for helper access
  bpf: Permits pointers on stack for helper calls
  libbpf: Expose libbpf ring_buffer epoll_fd
  selftests/bpf: Add set_attach_target() API selftest for module target
  libbpf: Support modules in bpf_program__set_attach_target() API
  selftests/bpf: Silence ima_setup.sh when not running in verbose mode.
  selftests/bpf: Drop the need for LLVM's llc
  selftests/bpf: fix bpf_testmod.ko recompilation logic
  samples/bpf: Fix possible hang in xdpsock with multiple threads
  selftests/bpf: Make selftest compilation work on clang 11
  selftests/bpf: Xsk selftests - adding xdpxceiver to .gitignore
  selftests/bpf: Drop tcp-{client,server}.py from Makefile
  selftests/bpf: Xsk selftests - Bi-directional Sockets - SKB, DRV
  selftests/bpf: Xsk selftests - Socket Teardown - SKB, DRV
  selftests/bpf: Xsk selftests - DRV POLL, NOPOLL
  selftests/bpf: Xsk selftests - SKB POLL, NOPOLL
  selftests/bpf: Xsk selftests framework
  bpf: Only provide bpf_sock_from_file with CONFIG_NET
  bpf: Return -ENOTSUPP when attaching to non-kernel BTF
  xsk: Validate socket state in xsk_recvmsg, prior touching socket members
  ...
====================

Link: https://lore.kernel.org/r/20201214214316.20642-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 15:34:36 -08:00
Uladzislau Rezki (Sony)
1b04fa9900 rcu-tasks: Move RCU-tasks initialization to before early_initcall()
PowerPC testing encountered boot failures due to RCU Tasks not being
fully initialized until core_initcall() time.  This commit therefore
initializes RCU Tasks (along with Rude RCU and RCU Tasks Trace) just
before early_initcall() time, thus allowing waiting on RCU Tasks grace
periods from early_initcall() handlers.

Link: https://lore.kernel.org/rcu/87eekfh80a.fsf@dja-thinkpad.axtens.net/
Fixes: 36dadef23f ("kprobes: Init kprobes in early_initcall")
Tested-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-12-14 15:31:13 -08:00
Colin Ian King
92f0a3a22c Input: da7280 - fix spelling mistake "sequemce" -> "sequence"
There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201214223109.82924-1-colin.king@canonical.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-12-14 15:02:06 -08:00
Arnd Bergmann
f051ae4f6c Input: cyapa_gen6 - fix out-of-bounds stack access
gcc -Warray-bounds warns about a serious bug in
cyapa_pip_retrieve_data_structure:

drivers/input/mouse/cyapa_gen6.c: In function 'cyapa_pip_retrieve_data_structure.constprop':
include/linux/unaligned/access_ok.h:40:17: warning: array subscript -1 is outside array bounds of 'struct retrieve_data_struct_cmd[1]' [-Warray-bounds]
   40 |  *((__le16 *)p) = cpu_to_le16(val);
drivers/input/mouse/cyapa_gen6.c:569:13: note: while referencing 'cmd'
  569 |  } __packed cmd;
      |             ^~~

Apparently the '-2' was added to the pointer instead of the value,
writing garbage into the stack next to this variable.

Fixes: c2c06c41f7 ("Input: cyapa - add gen6 device module support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20201026161332.3708389-1-arnd@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-12-14 15:02:04 -08:00
Ilya Dryomov
2f0df6cfa3 libceph: drop ceph_auth_{create,update}_authorizer()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
ce287162d9 libceph, ceph: make use of __ceph_auth_get_authorizer() in msgr1
This shouldn't cause any functional changes.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
cd1a677cad libceph, ceph: implement msgr2.1 protocol (crc and secure modes)
Implement msgr2.1 wire protocol, available since nautilus 14.2.11
and octopus 15.2.5.  msgr2.0 wire protocol is not implemented -- it
has several security, integrity and robustness issues and therefore
considered deprecated.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
00498b9941 libceph: introduce connection modes and ms_mode option
msgr2 supports two connection modes: crc (plain) and secure (on-wire
encryption).  Connection mode is picked by server based on input from
client.

Introduce ms_mode option:

  ms_mode=legacy        - msgr1 (default)
  ms_mode=crc           - crc mode, if denied fail
  ms_mode=secure        - secure mode, if denied fail
  ms_mode=prefer-crc    - crc mode, if denied agree to secure mode
  ms_mode=prefer-secure - secure mode, if denied agree to crc mode

ms_mode affects all connections, we don't separate connections to mons
like it's done in userspace with ms_client_mode vs ms_mon_client_mode.

For now the default is legacy, to be flipped to prefer-crc after some
time.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
313771e80f libceph, rbd: ignore addr->type while comparing in some cases
For libceph, this ensures that libceph instance sharing (share option)
continues to work.  For rbd, this avoids blocklisting alive lock owners
(locker addr is always LEGACY, while watcher addr is ANY in nautilus).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
a5cbd5fc22 libceph, ceph: get and handle cluster maps with addrvecs
In preparation for msgr2, make the cluster send us maps with addrvecs
including both LEGACY and MSGR2 addrs instead of a single LEGACY addr.
This means advertising support for SERVER_NAUTILUS and also some older
features: SERVER_MIMIC, MONENC and MONNAMES.

MONNAMES and MONENC are actually pre-argonaut, we just never updated
ceph_monmap_decode() for them.  Decoding is unconditional, see commit
23c625ce30 ("libceph: assume argonaut on the server side").

SERVER_MIMIC doesn't bear any meaning for the kernel client.

Since ceph_decode_entity_addrvec() is guarded by encoding version
checks (and in msgr2 case it is guarded implicitly by the fact that
server is speaking msgr2), we assume MSG_ADDR2 for it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
8921f25116 libceph: factor out finish_auth()
In preparation for msgr2, factor out finish_auth() so it is suitable
for both existing MAuth message based authentication and upcoming msgr2
authentication exchange.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
c1c0ce78f4 libceph: drop ac->ops->name field
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
59711f9ec2 libceph: amend cephx init_protocol() and build_request()
In msgr2, initial authentication happens with an exchange of msgr2
control frames -- MAuth message and struct ceph_mon_request_header
aren't used.  Make that optional.

Stop reporting cephx protocol as "x".  Use "cephx" instead.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
285ea34fc8 libceph, ceph: incorporate nautilus cephx changes
- request service tickets together with auth ticket.  Currently we get
  auth ticket via CEPHX_GET_AUTH_SESSION_KEY op and then request service
  tickets via CEPHX_GET_PRINCIPAL_SESSION_KEY op in a separate message.
  Since nautilus, desired service tickets are shared togther with auth
  ticket in CEPHX_GET_AUTH_SESSION_KEY reply.

- propagate session key and connection secret, if any.  In preparation
  for msgr2, update handle_reply() and verify_authorizer_reply() auth
  ops to propagate session key and connection secret.  Since nautilus,
  if secure mode is negotiated, connection secret is shared either in
  CEPHX_GET_AUTH_SESSION_KEY reply (for mons) or in a final authorizer
  reply (for osds and mdses).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
6610fff278 libceph: safer en/decoding of cephx requests and replies
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
f79e25b087 libceph: more insight into ticket expiry and invalidation
Make it clear that "need" is a union of "missing" and "have, but up
for renewal" and dout when the ticket goes missing due to expiry or
invalidation by client.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
a56dd9bf47 libceph: move msgr1 protocol specific fields to its own struct
A couple whitespace fixups, no functional changes.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
2f713615dd libceph: move msgr1 protocol implementation to its own file
A pure move, no other changes.

Note that ceph_tcp_recv{msg,page}() and ceph_tcp_send{msg,page}()
helpers are also moved.  msgr2 will bring its own, more efficient,
variants based on iov_iter.  Switching msgr1 to them was considered
but decided against to avoid subtle regressions.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:50 +01:00
Ilya Dryomov
566050e17e libceph: separate msgr1 protocol implementation
In preparation for msgr2, define internal messenger <-> protocol
interface (as opposed to external messenger <-> client interface, which
is struct ceph_connection_operations) consisting of try_read(),
try_write(), revoke(), revoke_incoming(), opened(), reset_session() and
reset_protocol() ops.  The semantics are exactly the same as they are
now.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:49 +01:00
Ilya Dryomov
6503e0b69c libceph: export remaining protocol independent infrastructure
In preparation for msgr2, make all protocol independent functions
in messenger.c global.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:49 +01:00
Ilya Dryomov
699921d9e6 libceph: export zero_page
In preparation for msgr2, make zero_page global.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:49 +01:00
Ilya Dryomov
3fefd43e74 libceph: rename and export con->flags bits
In preparation for msgr2, move the defines to the header file.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:49 +01:00
Ilya Dryomov
6d7f62bfb5 libceph: rename and export con->state states
In preparation for msgr2, rename msgr1 specific states and move the
defines to the header file.

Also drop state transition comments.  They don't cover all possible
transitions (e.g. NEGOTIATING -> STANDBY, etc) and currently do more
harm than good.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:49 +01:00