Commit Graph

966273 Commits

Author SHA1 Message Date
Rohit Maheshwari
38f7e1c0c4 net/tls: race causes kernel panic
BUG: kernel NULL pointer dereference, address: 00000000000000b8
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 80000008b6fef067 P4D 80000008b6fef067 PUD 8b6fe6067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 12 PID: 23871 Comm: kworker/12:80 Kdump: loaded Tainted: G S
 5.9.0-rc3+ #1
 Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.1 03/29/2018
 Workqueue: events tx_work_handler [tls]
 RIP: 0010:tx_work_handler+0x1b/0x70 [tls]
 Code: dc fe ff ff e8 16 d4 a3 f6 66 0f 1f 44 00 00 0f 1f 44 00 00 55 53 48 8b
 6f 58 48 8b bd a0 04 00 00 48 85 ff 74 1c 48 8b 47 28 <48> 8b 90 b8 00 00 00 83
 e2 02 75 0c f0 48 0f ba b0 b8 00 00 00 00
 RSP: 0018:ffffa44ace61fe88 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffff91da9e45cc30 RCX: dead000000000122
 RDX: 0000000000000001 RSI: ffff91da9e45cc38 RDI: ffff91d95efac200
 RBP: ffff91da133fd780 R08: 0000000000000000 R09: 000073746e657665
 R10: 8080808080808080 R11: 0000000000000000 R12: ffff91dad7d30700
 R13: ffff91dab6561080 R14: 0ffff91dad7d3070 R15: ffff91da9e45cc38
 FS:  0000000000000000(0000) GS:ffff91dad7d00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000000b8 CR3: 0000000906478003 CR4: 00000000003706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  process_one_work+0x1a7/0x370
  worker_thread+0x30/0x370
  ? process_one_work+0x370/0x370
  kthread+0x114/0x130
  ? kthread_park+0x80/0x80
  ret_from_fork+0x22/0x30

tls_sw_release_resources_tx() waits for encrypt_pending, which
can have race, so we need similar changes as in commit
0cada33241 here as well.

Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 20:06:29 -07:00
Wang Qing
0eb11dfe22 net/ethernet/broadcom: fix spelling typo
Modify the comment typo: "compliment" -> "complement".

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 20:05:48 -07:00
Xiaoliang Yang
4ab810a4e0 net: mscc: ocelot: fix fields offset in SG_CONFIG_REG_3
INIT_IPS and GATE_ENABLE fields have a wrong offset in SG_CONFIG_REG_3.
This register is used by stream gate control of PSFP, and it has not
been used before, because PSFP is not implemented in ocelot driver.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 20:00:40 -07:00
Xiaoliang Yang
dba1e4660a net: dsa: felix: convert TAS link speed based on phylink speed
state->speed holds a value of 10, 100, 1000 or 2500, but
QSYS_TAG_CONFIG_LINK_SPEED expects a value of 0, 1, 2, 3. So convert the
speed to a proper value.

Fixes: de143c0e27 ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload")
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 20:00:40 -07:00
Luo bin
f68910a805 hinic: fix wrong return value of mac-set cmd
It should also be regarded as an error when hw return status=4 for PF's
setting mac cmd. Only if PF return status=4 to VF should this cmd be
taken special treatment.

Fixes: 7dd29ee128 ("hinic: add sriov feature support")
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 20:00:40 -07:00
David S. Miller
a1a35529bd Merge branch 'mptcp-RM_ADDR-ADD_ADDR-enhancements'
Geliang Tang says:

====================
mptcp: RM_ADDR/ADD_ADDR enhancements

This series include two enhancements for the MPTCP path management,
namely RM_ADDR support and ADD_ADDR echo support, as specified by RFC
sections 3.4.1 and 3.4.2.

1 RM_ADDR support include 9 patches (1-3 and 8-13):

Patch 1 is the helper for patch 2, these two patches add the RM_ADDR
outgoing functions, which are derived from ADD_ADDR's corresponding
functions.

Patch 3 adds the RM_ADDR incoming logic, when RM_ADDR suboption is
received, close the subflow matching the rm_id, and update PM counter.

Patch 8 is the main remove routine. When the PM netlink removes an address,
we traverse all the existing msk sockets to find the relevant sockets. Then
trigger the RM_ADDR signal and remove the subflow which using this local
address, this subflow removing functions has been implemented in patch 9.

Finally, patches 10-13 are the self-tests for RM_ADDR.

2 ADD_ADDR echo support include 7 patches (4-7 and 14-16).

Patch 4 adds the ADD_ADDR echo logic, when the ADD_ADDR suboption has been
received, send out the same ADD_ADDR suboption with echo-flag, and no HMAC
included.

Patches 5 and 6 are the self-tests for ADD_ADDR echo. Patch 7 is a little
cleaning up.

Patch 14 and 15 are the helpers for patch 16. These three patches add
the ADD_ADDR retransmition when no ADD_ADDR echo is received.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:34 -07:00
Geliang Tang
00cfd77b90 mptcp: retransmit ADD_ADDR when timeout
This patch implemented the retransmition of ADD_ADDR when no ADD_ADDR echo
is received. It added a timer with the announced address. When timeout
occurs, ADD_ADDR will be retransmitted.

Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:34 -07:00
Geliang Tang
08b81d8731 mptcp: add sk_stop_timer_sync helper
This patch added a new helper sk_stop_timer_sync, it deactivates a timer
like sk_stop_timer, but waits for the handler to finish.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:34 -07:00
Geliang Tang
0abd40f823 mptcp: add struct mptcp_pm_add_entry
Add a new struct mptcp_pm_add_entry to describe add_addr's entry.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:34 -07:00
Geliang Tang
dd72b0fede selftests: mptcp: add remove addr and subflow test cases
This patch added the remove addr and subflow test cases and two new
functions.

The first function run_remove_tests calls do_transfer with two new
arguments, rm_nr_ns1 and rm_nr_ns2, for the numbers of addresses should be
removed during the transfer process in namespace 1 and namespace 2.

If both these two arguments are 0, we do the join test cases with
"mptcp_connect -j" command. Otherwise, do the remove test cases with
"mptcp_connect -r" command.

The second function chk_rm_nr checks the RM_ADDR related mibs's counters.

The output of the test cases looks like this:

11 remove single subflow           syn[ ok ] - synack[ ok ] - ack[ ok ]
                                   rm [ ok ] - sf    [ ok ]
12 remove multiple subflows        syn[ ok ] - synack[ ok ] - ack[ ok ]
                                   rm [ ok ] - sf    [ ok ]
13 remove single address           syn[ ok ] - synack[ ok ] - ack[ ok ]
                                   add[ ok ] - echo  [ ok ]
                                   rm [ ok ] - sf    [ ok ]
14 remove subflow and signal       syn[ ok ] - synack[ ok ] - ack[ ok ]
                                   add[ ok ] - echo  [ ok ]
                                   rm [ ok ] - sf    [ ok ]
15 remove subflows and signal      syn[ ok ] - synack[ ok ] - ack[ ok ]
                                   add[ ok ] - echo  [ ok ]
                                   rm [ ok ] - sf    [ ok ]

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:34 -07:00
Geliang Tang
1315332409 selftests: mptcp: add remove cfg in mptcp_connect
This patch added a new cfg, named cfg_remove in mptcp_connect. This new
cfg_remove is copied from cfg_join. The only difference between them is in
the do_rnd_write function. Here we slow down the transfer process of all
data to let the RM_ADDR suboption can be sent and received completely.
Otherwise the remove address and subflow test cases don't work.

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:34 -07:00
Geliang Tang
5c8c164095 mptcp: add mptcp_destroy_common helper
This patch added a new helper named mptcp_destroy_common containing the
shared code between mptcp_destroy() and mptcp_sock_destruct().

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:34 -07:00
Geliang Tang
7a7e52e38a mptcp: add RM_ADDR related mibs
This patch added two new mibs for RM_ADDR, named MPTCP_MIB_RMADDR and
MPTCP_MIB_RMSUBFLOW, when the RM_ADDR suboption is received, increase
the first mib counter, when the local subflow is removed, increase the
second mib counter.

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:34 -07:00
Geliang Tang
0ee4261a36 mptcp: implement mptcp_pm_remove_subflow
This patch implemented the local subflow removing function,
mptcp_pm_remove_subflow, it simply called mptcp_pm_nl_rm_subflow_received
under the PM spin lock.

We use mptcp_pm_remove_subflow to remove a local subflow, so change it's
argument from remote_id to local_id.

We check subflow->local_id in mptcp_pm_nl_rm_subflow_received to remove
a subflow.

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:34 -07:00
Geliang Tang
b6c0838086 mptcp: remove addr and subflow in PM netlink
This patch implements the remove announced addr and subflow logic in PM
netlink.

When the PM netlink removes an address, we traverse all the existing msk
sockets to find the relevant sockets.

We add a new list named anno_list in mptcp_pm_data, to record all the
announced addrs. In the traversing, we check if it has been recorded.
If it has been, we trigger the RM_ADDR signal.

We also check if this address is in conn_list. If it is, we remove the
subflow which using this local address.

Since we call mptcp_pm_free_anno_list in mptcp_destroy, we need to move
__mptcp_init_sock before the mptcp_is_enabled check in mptcp_init_sock.

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:33 -07:00
Geliang Tang
f58f065aa1 mptcp: add accept_subflow re-check
The re-check of pm->accept_subflow with pm->lock held was missing, this
patch fixed it.

Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:33 -07:00
Geliang Tang
be61316003 selftests: mptcp: add ADD_ADDR mibs check function
This patch added the ADD_ADDR related mibs counter check function
chk_add_nr(). This function check both ADD_ADDR and ADD_ADDR with
echo flag.

The output looks like this:

 07 unused signal address             syn[ ok ] - synack[ ok ] - ack[ ok ]
                                      add[ ok ] - echo  [ ok ]
 08 signal address                    syn[ ok ] - synack[ ok ] - ack[ ok ]
                                      add[ ok ] - echo  [ ok ]
 09 subflow and signal                syn[ ok ] - synack[ ok ] - ack[ ok ]
                                      add[ ok ] - echo  [ ok ]
 10 multiple subflows and signal      syn[ ok ] - synack[ ok ] - ack[ ok ]
                                      add[ ok ] - echo  [ ok ]
 11 remove subflow and signal         syn[ ok ] - synack[ ok ] - ack[ ok ]
                                      add[ ok ] - echo  [ ok ]

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:33 -07:00
Geliang Tang
a877de0671 mptcp: add ADD_ADDR related mibs
This patch added two mibs for ADD_ADDR, MPTCP_MIB_ADDADDR for receiving
of the ADD_ADDR suboption with echo-flag=0, and MPTCP_MIB_ECHOADD for
receiving the ADD_ADDR suboption with echo-flag=1.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:33 -07:00
Geliang Tang
6a6c05a8b0 mptcp: send out ADD_ADDR with echo flag
When the ADD_ADDR suboption has been received, we need to send out the same
ADD_ADDR suboption with echo-flag=1, and no HMAC.

Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:33 -07:00
Geliang Tang
d0876b2284 mptcp: add the incoming RM_ADDR support
This patch added the RM_ADDR option parsing logic:

We parsed the incoming options to find if the rm_addr option is received,
and called mptcp_pm_rm_addr_received to schedule PM work to a new status,
named MPTCP_PM_RM_ADDR_RECEIVED.

PM work got this status, and called mptcp_pm_nl_rm_addr_received to handle
it.

In mptcp_pm_nl_rm_addr_received, we closed the subflow matching the rm_id,
and updated PM counter.

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:33 -07:00
Geliang Tang
5cb104ae55 mptcp: add the outgoing RM_ADDR support
This patch added a new signal named rm_addr_signal in PM. On outgoing path,
we called mptcp_pm_should_rm_signal to check if rm_addr_signal has been
set. If it has been, we sent out the RM_ADDR option.

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:33 -07:00
Geliang Tang
f643b8032e mptcp: rename addr_signal and the related functions
This patch renamed addr_signal and the related functions with the explicit
word "add".

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:33 -07:00
David S. Miller
075c156850 mlx5-updates-2020-09-22
This series includes mlx5 updates
 
 1) Add support for Connection Tracking offload in NIC mode.
    Supporting CT offload in NIC mode on Mellanox cards is useful for
    scenarios where the dual port NIC serves as a gateway between 2
    networks and forwards traffic between these networks.
 
    Since the traffic is not terminated on the host in this case,
    no use of SRIOV VFs and/or switchdev mode is required.
 
    Today Mellanox NIC cards already support offloading of packet forwarding
    between physical ports without going to the host so combining it with CT
    offloading allows users to create a gateway with forwarding and CT
    (Including NAT) offloading capabilities in non-switchdev mode.
 
    To support connection tracking in non-Switchdev mode (Single NIC mode),
    we need to make use of the current Connection tracking infrastructure
    implemented on top of E-Switch and the mlx5 generic flow table chains
    APIs, to make it work on non-Eswitch steering domain e.g. NIC RX domain,
    the following was performed:
 
  1.1) Refactor current flow steering chains infrastructure and
       updates TC nic mode implementation to use flow table chains.
  1.2) Refactor current Connection Tracking (CT) infrastructure to not
       assume E-switch backend, and make the CT layer agnostic to
       underlying steering mode (E-Switch/NIC)
  1.3) Plumbing to support CT offload in NIC mode.
 
 2) Trivial code cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl9rz9cACgkQSD+KveBX
 +j7K/gf/ZysTFuFuC7MCo7xJO2vxlGGE1r6/ENsqonvUT2tcoZCdK9bZMw1Mx17Z
 r1nyn0xQ3MwRheXMSpqXngTPpfGM6eNgV9CDfFXm62z6WXMYieen0t/LrM/mxo+2
 s74Okp53peyGNpePyseewEUGV7zaR6F6uukkKvr441gvAOF3Fcfaz+dIv7KzxKNS
 +b78yw0b6mGc4foYLSuJcDQlSwqjeIpdSib8xmETMZwRzCt20GCEBDsBAaKt0wzM
 1fTZttY+kuLd/m/q+sh3s/4lN2kOO+dwK5NGf+RWtiOaDWT+J/ogVmI2ywXIwsg7
 U63nhjGAr7GPqkaG0Jv3aS7na6pbSA==
 =sByc
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2020-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-09-22

This series includes mlx5 updates

1) Add support for Connection Tracking offload in NIC mode.
   Supporting CT offload in NIC mode on Mellanox cards is useful for
   scenarios where the dual port NIC serves as a gateway between 2
   networks and forwards traffic between these networks.

   Since the traffic is not terminated on the host in this case,
   no use of SRIOV VFs and/or switchdev mode is required.

   Today Mellanox NIC cards already support offloading of packet forwarding
   between physical ports without going to the host so combining it with CT
   offloading allows users to create a gateway with forwarding and CT
   (Including NAT) offloading capabilities in non-switchdev mode.

   To support connection tracking in non-Switchdev mode (Single NIC mode),
   we need to make use of the current Connection tracking infrastructure
   implemented on top of E-Switch and the mlx5 generic flow table chains
   APIs, to make it work on non-Eswitch steering domain e.g. NIC RX domain,
   the following was performed:

 1.1) Refactor current flow steering chains infrastructure and
      updates TC nic mode implementation to use flow table chains.
 1.2) Refactor current Connection Tracking (CT) infrastructure to not
      assume E-switch backend, and make the CT layer agnostic to
      underlying steering mode (E-Switch/NIC)
 1.3) Plumbing to support CT offload in NIC mode.

2) Trivial code cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:54:40 -07:00
Xie He
ed46cd1d4c drivers/net/wan/x25_asy: Correct the ndo_open and ndo_stop functions
1.
Move the lapb_register/lapb_unregister calls into the ndo_open/ndo_stop
functions.
This makes the LAPB protocol start/stop when the network interface
starts/stops. When the network interface is down, the LAPB protocol
shouldn't be running and the LAPB module shoudn't be generating control
frames.

2.
Move netif_start_queue/netif_stop_queue into the ndo_open/ndo_stop
functions.
This makes the TX queue start/stop when the network interface
starts/stops.
(netif_stop_queue was originally in the ndo_stop function. But to make
the code look better, I created a new function to use as ndo_stop, and
made it call the original ndo_stop function. I moved netif_stop_queue
from the original ndo_stop function to the new ndo_stop function.)

Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:52:58 -07:00
Maciej Żenczykowski
02a1b175b0 net/ipv4: always honour route mtu during forwarding
Documentation/networking/ip-sysctl.txt:46 says:
  ip_forward_use_pmtu - BOOLEAN
    By default we don't trust protocol path MTUs while forwarding
    because they could be easily forged and can lead to unwanted
    fragmentation by the router.
    You only need to enable this if you have user-space software
    which tries to discover path mtus by itself and depends on the
    kernel honoring this information. This is normally not the case.
    Default: 0 (disabled)
    Possible values:
    0 - disabled
    1 - enabled

Which makes it pretty clear that setting it to 1 is a potential
security/safety/DoS issue, and yet it is entirely reasonable to want
forwarded traffic to honour explicitly administrator configured
route mtus (instead of defaulting to device mtu).

Indeed, I can't think of a single reason why you wouldn't want to.
Since you configured a route mtu you probably know better...

It is pretty common to have a higher device mtu to allow receiving
large (jumbo) frames, while having some routes via that interface
(potentially including the default route to the internet) specify
a lower mtu.

Note that ipv6 forwarding uses device mtu unless the route is locked
(in which case it will use the route mtu).

This approach is not usable for IPv4 where an 'mtu lock' on a route
also has the side effect of disabling TCP path mtu discovery via
disabling the IPv4 DF (don't frag) bit on all outgoing frames.

I'm not aware of a way to lock a route from an IPv6 RA, so that also
potentially seems wrong.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Sunmeet Gill (Sunny) <sgill@quicinc.com>
Cc: Vinay Paradkar <vparadka@qti.qualcomm.com>
Cc: Tyler Wear <twear@quicinc.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:51:16 -07:00
David S. Miller
54ce00ae36 Merge branch 'dpaa2-mac-add-PCS-support-through-the-Lynx-module'
Ioana Ciornei says:

====================
dpaa2-mac: add PCS support through the Lynx module

This patch set aims to add PCS support in the dpaa2-eth driver by
leveraging the Lynx PCS module.

The first two patches are some missing pieces: the first one adding
support for 10GBASER in Lynx PCS while the second one adds a new
function - of_mdio_find_device - which is helpful in retrieving the PCS
represented as a mdio_device.  The final patch adds the glue logic
between phylink and the Lynx PCS module: it retrieves the PCS
represented as an mdio_device and registers it to Lynx and phylink.
From that point on, any PCS callbacks are treated by Lynx, without
dpaa2-eth interaction.

Changes in v2:
 - move put_device() after destroy - 3/3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:49:36 -07:00
Ioana Ciornei
94ae899b20 dpaa2-mac: add PCS support through the Lynx module
Include PCS support in the dpaa2-eth driver by integrating it with the
new Lynx PCS module. There is not much to talk about in terms of changes
needed in the dpaa2-eth driver since the only steps necessary are to
find the MDIO device representing the PCS, register it to the Lynx PCS
module and then let phylink know if its existence also.
After this, the PCS callbacks will be treated directly by Lynx, without
interraction from dpaa2-eth's part.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:49:36 -07:00
Russell King
b5b6775d72 of: add of_mdio_find_device() api
Add a helper function which finds the mdio_device structure given a
device tree node. This is helpful for finding the PCS device based on a
DTS node but managing it as a mdio_device instead of a phy_device.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:49:36 -07:00
Ioana Ciornei
e7e95c9003 net: pcs-lynx: add support for 10GBASER
Add support in the Lynx PCS module for the 10GBASE-R mode which is only
used to get the link state, since it offers a single fixed speed.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:49:36 -07:00
Vladimir Oltean
e2f9a8fe73 net: mscc: ocelot: always pass skb clone to ocelot_port_add_txtstamp_skb
Currently, ocelot switchdev passes the skb directly to the function that
enqueues it to the list of skb's awaiting a TX timestamp. Whereas the
felix DSA driver first clones the skb, then passes the clone to this
queue.

This matters because in the case of felix, the common IRQ handler, which
is ocelot_get_txtstamp(), currently clones the clone, and frees the
original clone. This is useless and can be simplified by using
skb_complete_tx_timestamp() instead of skb_tstamp_tx().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:47:56 -07:00
David S. Miller
6d8899962a Merge branch 'net_sched-fix-a-UAF-in-tcf_action_init'
Cong Wang says:

====================
net_sched: fix a UAF in tcf_action_init()

This patchset fixes a use-after-free triggered by syzbot. Please
find more details in each patch description.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:46:21 -07:00
Cong Wang
0fedc63fad net_sched: commit action insertions together
syzbot is able to trigger a failure case inside the loop in
tcf_action_init(), and when this happens we clean up with
tcf_action_destroy(). But, as these actions are already inserted
into the global IDR, other parallel process could free them
before tcf_action_destroy(), then we will trigger a use-after-free.

Fix this by deferring the insertions even later, after the loop,
and committing all the insertions in a separate loop, so we will
never fail in the middle of the insertions any more.

One side effect is that the window between alloction and final
insertion becomes larger, now it is more likely that the loop in
tcf_del_walker() sees the placeholder -EBUSY pointer. So we have
to check for error pointer in tcf_del_walker().

Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com
Fixes: 0190c1d452 ("net: sched: atomically check-allocate action")
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:46:21 -07:00
Cong Wang
e49d8c22f1 net_sched: defer tcf_idr_insert() in tcf_action_init_1()
All TC actions call tcf_idr_insert() for new action at the end
of their ->init(), so we can actually move it to a central place
in tcf_action_init_1().

And once the action is inserted into the global IDR, other parallel
process could free it immediately as its refcnt is still 1, so we can
not fail after this, we need to move it after the goto action
validation to avoid handling the failure case after insertion.

This is found during code review, is not directly triggered by syzbot.
And this prepares for the next patch.

Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:46:21 -07:00
Dave Airlie
ba78755e0c drm-misc-fixes for v5.9:
- Single null pointer deref fix for dma-buf.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAl9seRQACgkQ/lWMcqZw
 E8PODA/8DUbUMRyJqFtWYFiX92MULZSWdS4Khj1uFRhmLcpNSpLo/P+In88HJeLK
 bbYuDYAbAAxdce2VZEP/gNu5C2HDctJ4jCV+4fusczpUavthl1hkmlvDHnoJjtiX
 VYRHnZPyq5Yj+GRAtiHLzHSszUVsc3N18Vby2IG0Qu7MgN49pIgrVWD3/L8Dzdoi
 e6eVjSTypliCEhKgp3c+nl7ZaAnyqcKnhrisZW7sTuZm32GfG8C1E0DhGzbopPC5
 KQyMtQk6YRw7aSIA9lQQ8KoWqzyNEyRhM82gVzjmnWPfjtrhowObEQkLW42QaSUF
 wcV8k5T/Bw4Kgqrbo7bsrVCzBN+Dwz8gRT2jfXkw7CmRLv9aN5cTmu9eRPlgUbnc
 NpkS/0S/V/zRv+spNd+ILn3gaJdFEU7+LSxhCJwaDq95JHf5fB+KNhqWWEpun6/y
 D6Ge2OGjnVWBF4pfumHgv+F39a4rO5KsWapxucWjakj7UBT3uOaO8O/mvkCT8G6g
 nKm2+A6u8CUIpZUQlE5ICyHwncLBUzLaoTEqCwujNdBWnNkCBreTfjX04i3yVrQg
 XODboY8EiN+w8rBHexKLBkF+dgmSOxW1hG7a6gt2A42Qn42PJ5LFAnsRbkUSKA2t
 NDNxDTzpIn/O1i+bTg96Q5hWDKY15Vo9d6tWX4CUOEwCTzq5b2c=
 =CEdq
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2020-09-24' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v5.9:
- Single null pointer deref fix for dma-buf.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/4106c21e-f52c-4c05-6cdb-daa743bb8617@linux.intel.com
2020-09-25 11:30:00 +10:00
Dave Airlie
f3231a02aa Merge tag 'drm-intel-fixes-2020-09-24' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.9-rc7:
- Fix selftest reference to stack data out of scope
- Fix GVT null pointer dereference
- Backmerge from Linus' master to fix build

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87zh5fpmha.fsf@intel.com
2020-09-25 11:07:01 +10:00
Dave Airlie
720777c5be BackMerge commit '98477740630f270aecf648f1d6a9dbc6027d4ff1' into drm-fixes
The dax mess had some fallout, and i915 used a later base to fix their CI.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2020-09-25 11:06:18 +10:00
Xiao Ni
d3ee2d8415 md/raid10: improve discard request for far layout
For far layout, the discard region is not continuous on disks. So it needs
far copies r10bio to cover all regions. It needs a way to know all r10bios
have finish or not. Similar with raid10_sync_request, only the first r10bio
master_bio records the discard bio. Other r10bios master_bio record the
first r10bio. The first r10bio can finish after other r10bios finish and
then return the discard bio.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:45 -07:00
Xiao Ni
bcc90d2804 md/raid10: improve raid10 discard request
Now the discard request is split by chunk size. So it takes a long time
to finish mkfs on disks which support discard function. This patch improve
handling raid10 discard request. It uses the similar way with patch
29efc390b (md/md0: optimize raid0 discard handling).

But it's a little complex than raid0. Because raid10 has different layout.
If raid10 is offset layout and the discard request is smaller than stripe
size. There are some holes when we submit discard bio to underlayer disks.

For example: five disks (disk1 - disk5)
D01 D02 D03 D04 D05
D05 D01 D02 D03 D04
D06 D07 D08 D09 D10
D10 D06 D07 D08 D09
The discard bio just wants to discard from D03 to D10. For disk3, there is
a hole between D03 and D08. For disk4, there is a hole between D04 and D09.
D03 is a chunk, raid10_write_request can handle one chunk perfectly. So
the part that is not aligned with stripe size is still handled by
raid10_write_request.

If reshape is running when discard bio comes and the discard bio spans the
reshape position, raid10_write_request is responsible to handle this
discard bio.

I did a test with this patch set.
Without patch:
time mkfs.xfs /dev/md0
real4m39.775s
user0m0.000s
sys0m0.298s

With patch:
time mkfs.xfs /dev/md0
real0m0.105s
user0m0.000s
sys0m0.007s

nvme3n1           259:1    0   477G  0 disk
└─nvme3n1p1       259:10   0    50G  0 part
nvme4n1           259:2    0   477G  0 disk
└─nvme4n1p1       259:11   0    50G  0 part
nvme5n1           259:6    0   477G  0 disk
└─nvme5n1p1       259:12   0    50G  0 part
nvme2n1           259:9    0   477G  0 disk
└─nvme2n1p1       259:15   0    50G  0 part
nvme0n1           259:13   0   477G  0 disk
└─nvme0n1p1       259:14   0    50G  0 part

Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:45 -07:00
Xiao Ni
f046f5d0d7 md/raid10: pull codes that wait for blocked dev into one function
The following patch will reuse these logics, so pull the same codes into
one function.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:45 -07:00
Xiao Ni
8650a88901 md/raid10: extend r10bio devs to raid disks
Now it allocs r10bio->devs[conf->copies]. Discard bio needs to submit
to all member disks and it needs to use r10bio. So extend to
r10bio->devs[geo.raid_disks].

Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:45 -07:00
Xiao Ni
2628089b74 md: add md_submit_discard_bio() for submitting discard bio
Move these logic from raid0.c to md.c, so that we can also use it in
raid10.c.

Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:45 -07:00
Zhen Lei
e287308b83 md: Simplify code with existing definition RESYNC_SECTORS in raid10.c
#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)

"RESYNC_BLOCK_SIZE/512" is equal to "RESYNC_BLOCK_SIZE >> 9", replace it
with RESYNC_SECTORS.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:45 -07:00
Yufen Yu
3891258443 md/raid5: reallocate page array after setting new stripe_size
When try to resize stripe_size, we also need to free old
shared page array and allocate new.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:45 -07:00
Yufen Yu
f16acaf328 md/raid5: resize stripe_head when reshape array
When reshape array, we try to reuse shared pages of old stripe_head,
and allocate more for the new one if needed.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:45 -07:00
Yufen Yu
046169f048 md/raid5: let multiple devices of stripe_head share page
In current implementation, grow_buffers() uses alloc_page() to
allocate the buffers for each stripe_head, i.e. allocate a page
for each dev[i] in stripe_head.

After setting stripe_size as a configurable value by writing
sysfs entry, it means that we always allocate 64K buffers, but
just use 4K of them when stripe_size is 4K in 64KB arm64.

To avoid wasting memory, we try to let multiple sh->dev share
one real page. That means, multiple sh->dev[i].page will point
to the only page with different offset. Example of 64K PAGE_SIZE
and 4K stripe_size as following:

                    64K PAGE_SIZE
          +---+---+---+---+------------------------------+
          |   |   |   |   |
          |   |   |   |   |
          +-+-+-+-+-+-+-+-+------------------------------+
            ^   ^   ^   ^
            |   |   |   +----------------------------+
            |   |   |                                |
            |   |   +-------------------+            |
            |   |                       |            |
            |   +----------+            |            |
            |              |            |            |
            +-+            |            |            |
              |            |            |            |
        +-----+-----+------+-----+------+-----+------+------+
sh      | offset(0) | offset(4K) | offset(8K) | offset(12K) |
 +      +-----------+------------+------------+-------------+
 +----> dev[0].page  dev[1].page  dev[2].page  dev[3].page

A new 'pages' array will be added into stripe_head to record shared
page used by this stripe_head. Allocate them when grow_buffers()
and free them when shrink_buffers().

After trying to share page, the users of sh->dev[i].page need to take
care of the related page offset: page of issued bio and page passed
to xor compution functions. But thanks for previous different page offset
supported. Here, we just need to set correct dev[i].offset.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:44 -07:00
Yufen Yu
4f86ff5580 md/raid6: let async recovery function support different page offset
For now, asynchronous raid6 recovery calculate functions are require
common offset for pages. But, we expect them to support different page
offset after introducing stripe shared page. Do that by simplily adding
page offset where each page address are referred. Then, replace the
old interface with the new ones in raid6 and raid6test.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:44 -07:00
Yufen Yu
d69454bc9f md/raid6: let syndrome computor support different page offset
For now, syndrome compute functions require common offset in the pages
array. However, we expect them to support different offset when try to
use shared page in the following. Simplily covert them by adding page
offset where each page address are referred.

Since the only caller of async_gen_syndrome() and async_syndrome_val()
are in raid6, we don't want to reserve the old interface but modify the
interface directly. After that, replacing old interfaces with new ones
for raid6 and raid6test.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:44 -07:00
Yufen Yu
a7c224a820 md/raid5: convert to new xor compution interface
We try to replace async_xor() and async_xor_val() with the new
introduced interface async_xor_offs() and async_xor_val_offs()
for raid456.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:44 -07:00
Yufen Yu
29bcff787a md/raid5: add new xor function to support different page offset
raid5 will call async_xor() and async_xor_val() to compute xor.
For now, both of them require the common src/dst page offset. But,
we want them to support different src/dst page offset for following
shared page.

Here, adding two new function async_xor_offs() and async_xor_val_offs()
respectively for async_xor() and async_xor_val().

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:44 -07:00
Yufen Yu
248728dd04 md/raid5: make async_copy_data() to support different page offset
ops_run_biofill() and ops_run_biodrain() will call async_copy_data()
to copy sh->dev[i].page from or to bio page. For now, it implies the
offset of dev[i].page is 0. But we want to support different page offset
in the following.

Thus, pass page offset to these functions and replace 'page_offset'
with 'page_offset + poff'.

No functional change.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:44 -07:00