Modify the comment typo: "compliment" -> "complement".
Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
INIT_IPS and GATE_ENABLE fields have a wrong offset in SG_CONFIG_REG_3.
This register is used by stream gate control of PSFP, and it has not
been used before, because PSFP is not implemented in ocelot driver.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
state->speed holds a value of 10, 100, 1000 or 2500, but
QSYS_TAG_CONFIG_LINK_SPEED expects a value of 0, 1, 2, 3. So convert the
speed to a proper value.
Fixes: de143c0e27 ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload")
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It should also be regarded as an error when hw return status=4 for PF's
setting mac cmd. Only if PF return status=4 to VF should this cmd be
taken special treatment.
Fixes: 7dd29ee128 ("hinic: add sriov feature support")
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang says:
====================
mptcp: RM_ADDR/ADD_ADDR enhancements
This series include two enhancements for the MPTCP path management,
namely RM_ADDR support and ADD_ADDR echo support, as specified by RFC
sections 3.4.1 and 3.4.2.
1 RM_ADDR support include 9 patches (1-3 and 8-13):
Patch 1 is the helper for patch 2, these two patches add the RM_ADDR
outgoing functions, which are derived from ADD_ADDR's corresponding
functions.
Patch 3 adds the RM_ADDR incoming logic, when RM_ADDR suboption is
received, close the subflow matching the rm_id, and update PM counter.
Patch 8 is the main remove routine. When the PM netlink removes an address,
we traverse all the existing msk sockets to find the relevant sockets. Then
trigger the RM_ADDR signal and remove the subflow which using this local
address, this subflow removing functions has been implemented in patch 9.
Finally, patches 10-13 are the self-tests for RM_ADDR.
2 ADD_ADDR echo support include 7 patches (4-7 and 14-16).
Patch 4 adds the ADD_ADDR echo logic, when the ADD_ADDR suboption has been
received, send out the same ADD_ADDR suboption with echo-flag, and no HMAC
included.
Patches 5 and 6 are the self-tests for ADD_ADDR echo. Patch 7 is a little
cleaning up.
Patch 14 and 15 are the helpers for patch 16. These three patches add
the ADD_ADDR retransmition when no ADD_ADDR echo is received.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implemented the retransmition of ADD_ADDR when no ADD_ADDR echo
is received. It added a timer with the announced address. When timeout
occurs, ADD_ADDR will be retransmitted.
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch added a new helper sk_stop_timer_sync, it deactivates a timer
like sk_stop_timer, but waits for the handler to finish.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new struct mptcp_pm_add_entry to describe add_addr's entry.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch added the remove addr and subflow test cases and two new
functions.
The first function run_remove_tests calls do_transfer with two new
arguments, rm_nr_ns1 and rm_nr_ns2, for the numbers of addresses should be
removed during the transfer process in namespace 1 and namespace 2.
If both these two arguments are 0, we do the join test cases with
"mptcp_connect -j" command. Otherwise, do the remove test cases with
"mptcp_connect -r" command.
The second function chk_rm_nr checks the RM_ADDR related mibs's counters.
The output of the test cases looks like this:
11 remove single subflow syn[ ok ] - synack[ ok ] - ack[ ok ]
rm [ ok ] - sf [ ok ]
12 remove multiple subflows syn[ ok ] - synack[ ok ] - ack[ ok ]
rm [ ok ] - sf [ ok ]
13 remove single address syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ]
rm [ ok ] - sf [ ok ]
14 remove subflow and signal syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ]
rm [ ok ] - sf [ ok ]
15 remove subflows and signal syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ]
rm [ ok ] - sf [ ok ]
Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch added a new cfg, named cfg_remove in mptcp_connect. This new
cfg_remove is copied from cfg_join. The only difference between them is in
the do_rnd_write function. Here we slow down the transfer process of all
data to let the RM_ADDR suboption can be sent and received completely.
Otherwise the remove address and subflow test cases don't work.
Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch added a new helper named mptcp_destroy_common containing the
shared code between mptcp_destroy() and mptcp_sock_destruct().
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch added two new mibs for RM_ADDR, named MPTCP_MIB_RMADDR and
MPTCP_MIB_RMSUBFLOW, when the RM_ADDR suboption is received, increase
the first mib counter, when the local subflow is removed, increase the
second mib counter.
Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implemented the local subflow removing function,
mptcp_pm_remove_subflow, it simply called mptcp_pm_nl_rm_subflow_received
under the PM spin lock.
We use mptcp_pm_remove_subflow to remove a local subflow, so change it's
argument from remote_id to local_id.
We check subflow->local_id in mptcp_pm_nl_rm_subflow_received to remove
a subflow.
Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the remove announced addr and subflow logic in PM
netlink.
When the PM netlink removes an address, we traverse all the existing msk
sockets to find the relevant sockets.
We add a new list named anno_list in mptcp_pm_data, to record all the
announced addrs. In the traversing, we check if it has been recorded.
If it has been, we trigger the RM_ADDR signal.
We also check if this address is in conn_list. If it is, we remove the
subflow which using this local address.
Since we call mptcp_pm_free_anno_list in mptcp_destroy, we need to move
__mptcp_init_sock before the mptcp_is_enabled check in mptcp_init_sock.
Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The re-check of pm->accept_subflow with pm->lock held was missing, this
patch fixed it.
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch added the ADD_ADDR related mibs counter check function
chk_add_nr(). This function check both ADD_ADDR and ADD_ADDR with
echo flag.
The output looks like this:
07 unused signal address syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ]
08 signal address syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ]
09 subflow and signal syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ]
10 multiple subflows and signal syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ]
11 remove subflow and signal syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ]
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch added two mibs for ADD_ADDR, MPTCP_MIB_ADDADDR for receiving
of the ADD_ADDR suboption with echo-flag=0, and MPTCP_MIB_ECHOADD for
receiving the ADD_ADDR suboption with echo-flag=1.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the ADD_ADDR suboption has been received, we need to send out the same
ADD_ADDR suboption with echo-flag=1, and no HMAC.
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch added the RM_ADDR option parsing logic:
We parsed the incoming options to find if the rm_addr option is received,
and called mptcp_pm_rm_addr_received to schedule PM work to a new status,
named MPTCP_PM_RM_ADDR_RECEIVED.
PM work got this status, and called mptcp_pm_nl_rm_addr_received to handle
it.
In mptcp_pm_nl_rm_addr_received, we closed the subflow matching the rm_id,
and updated PM counter.
Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch added a new signal named rm_addr_signal in PM. On outgoing path,
we called mptcp_pm_should_rm_signal to check if rm_addr_signal has been
set. If it has been, we sent out the RM_ADDR option.
Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch renamed addr_signal and the related functions with the explicit
word "add".
Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This series includes mlx5 updates
1) Add support for Connection Tracking offload in NIC mode.
Supporting CT offload in NIC mode on Mellanox cards is useful for
scenarios where the dual port NIC serves as a gateway between 2
networks and forwards traffic between these networks.
Since the traffic is not terminated on the host in this case,
no use of SRIOV VFs and/or switchdev mode is required.
Today Mellanox NIC cards already support offloading of packet forwarding
between physical ports without going to the host so combining it with CT
offloading allows users to create a gateway with forwarding and CT
(Including NAT) offloading capabilities in non-switchdev mode.
To support connection tracking in non-Switchdev mode (Single NIC mode),
we need to make use of the current Connection tracking infrastructure
implemented on top of E-Switch and the mlx5 generic flow table chains
APIs, to make it work on non-Eswitch steering domain e.g. NIC RX domain,
the following was performed:
1.1) Refactor current flow steering chains infrastructure and
updates TC nic mode implementation to use flow table chains.
1.2) Refactor current Connection Tracking (CT) infrastructure to not
assume E-switch backend, and make the CT layer agnostic to
underlying steering mode (E-Switch/NIC)
1.3) Plumbing to support CT offload in NIC mode.
2) Trivial code cleanups.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl9rz9cACgkQSD+KveBX
+j7K/gf/ZysTFuFuC7MCo7xJO2vxlGGE1r6/ENsqonvUT2tcoZCdK9bZMw1Mx17Z
r1nyn0xQ3MwRheXMSpqXngTPpfGM6eNgV9CDfFXm62z6WXMYieen0t/LrM/mxo+2
s74Okp53peyGNpePyseewEUGV7zaR6F6uukkKvr441gvAOF3Fcfaz+dIv7KzxKNS
+b78yw0b6mGc4foYLSuJcDQlSwqjeIpdSib8xmETMZwRzCt20GCEBDsBAaKt0wzM
1fTZttY+kuLd/m/q+sh3s/4lN2kOO+dwK5NGf+RWtiOaDWT+J/ogVmI2ywXIwsg7
U63nhjGAr7GPqkaG0Jv3aS7na6pbSA==
=sByc
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2020-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2020-09-22
This series includes mlx5 updates
1) Add support for Connection Tracking offload in NIC mode.
Supporting CT offload in NIC mode on Mellanox cards is useful for
scenarios where the dual port NIC serves as a gateway between 2
networks and forwards traffic between these networks.
Since the traffic is not terminated on the host in this case,
no use of SRIOV VFs and/or switchdev mode is required.
Today Mellanox NIC cards already support offloading of packet forwarding
between physical ports without going to the host so combining it with CT
offloading allows users to create a gateway with forwarding and CT
(Including NAT) offloading capabilities in non-switchdev mode.
To support connection tracking in non-Switchdev mode (Single NIC mode),
we need to make use of the current Connection tracking infrastructure
implemented on top of E-Switch and the mlx5 generic flow table chains
APIs, to make it work on non-Eswitch steering domain e.g. NIC RX domain,
the following was performed:
1.1) Refactor current flow steering chains infrastructure and
updates TC nic mode implementation to use flow table chains.
1.2) Refactor current Connection Tracking (CT) infrastructure to not
assume E-switch backend, and make the CT layer agnostic to
underlying steering mode (E-Switch/NIC)
1.3) Plumbing to support CT offload in NIC mode.
2) Trivial code cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
1.
Move the lapb_register/lapb_unregister calls into the ndo_open/ndo_stop
functions.
This makes the LAPB protocol start/stop when the network interface
starts/stops. When the network interface is down, the LAPB protocol
shouldn't be running and the LAPB module shoudn't be generating control
frames.
2.
Move netif_start_queue/netif_stop_queue into the ndo_open/ndo_stop
functions.
This makes the TX queue start/stop when the network interface
starts/stops.
(netif_stop_queue was originally in the ndo_stop function. But to make
the code look better, I created a new function to use as ndo_stop, and
made it call the original ndo_stop function. I moved netif_stop_queue
from the original ndo_stop function to the new ndo_stop function.)
Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Documentation/networking/ip-sysctl.txt:46 says:
ip_forward_use_pmtu - BOOLEAN
By default we don't trust protocol path MTUs while forwarding
because they could be easily forged and can lead to unwanted
fragmentation by the router.
You only need to enable this if you have user-space software
which tries to discover path mtus by itself and depends on the
kernel honoring this information. This is normally not the case.
Default: 0 (disabled)
Possible values:
0 - disabled
1 - enabled
Which makes it pretty clear that setting it to 1 is a potential
security/safety/DoS issue, and yet it is entirely reasonable to want
forwarded traffic to honour explicitly administrator configured
route mtus (instead of defaulting to device mtu).
Indeed, I can't think of a single reason why you wouldn't want to.
Since you configured a route mtu you probably know better...
It is pretty common to have a higher device mtu to allow receiving
large (jumbo) frames, while having some routes via that interface
(potentially including the default route to the internet) specify
a lower mtu.
Note that ipv6 forwarding uses device mtu unless the route is locked
(in which case it will use the route mtu).
This approach is not usable for IPv4 where an 'mtu lock' on a route
also has the side effect of disabling TCP path mtu discovery via
disabling the IPv4 DF (don't frag) bit on all outgoing frames.
I'm not aware of a way to lock a route from an IPv6 RA, so that also
potentially seems wrong.
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Sunmeet Gill (Sunny) <sgill@quicinc.com>
Cc: Vinay Paradkar <vparadka@qti.qualcomm.com>
Cc: Tyler Wear <twear@quicinc.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei says:
====================
dpaa2-mac: add PCS support through the Lynx module
This patch set aims to add PCS support in the dpaa2-eth driver by
leveraging the Lynx PCS module.
The first two patches are some missing pieces: the first one adding
support for 10GBASER in Lynx PCS while the second one adds a new
function - of_mdio_find_device - which is helpful in retrieving the PCS
represented as a mdio_device. The final patch adds the glue logic
between phylink and the Lynx PCS module: it retrieves the PCS
represented as an mdio_device and registers it to Lynx and phylink.
From that point on, any PCS callbacks are treated by Lynx, without
dpaa2-eth interaction.
Changes in v2:
- move put_device() after destroy - 3/3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Include PCS support in the dpaa2-eth driver by integrating it with the
new Lynx PCS module. There is not much to talk about in terms of changes
needed in the dpaa2-eth driver since the only steps necessary are to
find the MDIO device representing the PCS, register it to the Lynx PCS
module and then let phylink know if its existence also.
After this, the PCS callbacks will be treated directly by Lynx, without
interraction from dpaa2-eth's part.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper function which finds the mdio_device structure given a
device tree node. This is helpful for finding the PCS device based on a
DTS node but managing it as a mdio_device instead of a phy_device.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support in the Lynx PCS module for the 10GBASE-R mode which is only
used to get the link state, since it offers a single fixed speed.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, ocelot switchdev passes the skb directly to the function that
enqueues it to the list of skb's awaiting a TX timestamp. Whereas the
felix DSA driver first clones the skb, then passes the clone to this
queue.
This matters because in the case of felix, the common IRQ handler, which
is ocelot_get_txtstamp(), currently clones the clone, and frees the
original clone. This is useless and can be simplified by using
skb_complete_tx_timestamp() instead of skb_tstamp_tx().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang says:
====================
net_sched: fix a UAF in tcf_action_init()
This patchset fixes a use-after-free triggered by syzbot. Please
find more details in each patch description.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot is able to trigger a failure case inside the loop in
tcf_action_init(), and when this happens we clean up with
tcf_action_destroy(). But, as these actions are already inserted
into the global IDR, other parallel process could free them
before tcf_action_destroy(), then we will trigger a use-after-free.
Fix this by deferring the insertions even later, after the loop,
and committing all the insertions in a separate loop, so we will
never fail in the middle of the insertions any more.
One side effect is that the window between alloction and final
insertion becomes larger, now it is more likely that the loop in
tcf_del_walker() sees the placeholder -EBUSY pointer. So we have
to check for error pointer in tcf_del_walker().
Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com
Fixes: 0190c1d452 ("net: sched: atomically check-allocate action")
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All TC actions call tcf_idr_insert() for new action at the end
of their ->init(), so we can actually move it to a central place
in tcf_action_init_1().
And once the action is inserted into the global IDR, other parallel
process could free it immediately as its refcnt is still 1, so we can
not fail after this, we need to move it after the goto action
validation to avoid handling the failure case after insertion.
This is found during code review, is not directly triggered by syzbot.
And this prepares for the next patch.
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drm/i915 fixes for v5.9-rc7:
- Fix selftest reference to stack data out of scope
- Fix GVT null pointer dereference
- Backmerge from Linus' master to fix build
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87zh5fpmha.fsf@intel.com
For far layout, the discard region is not continuous on disks. So it needs
far copies r10bio to cover all regions. It needs a way to know all r10bios
have finish or not. Similar with raid10_sync_request, only the first r10bio
master_bio records the discard bio. Other r10bios master_bio record the
first r10bio. The first r10bio can finish after other r10bios finish and
then return the discard bio.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Now the discard request is split by chunk size. So it takes a long time
to finish mkfs on disks which support discard function. This patch improve
handling raid10 discard request. It uses the similar way with patch
29efc390b (md/md0: optimize raid0 discard handling).
But it's a little complex than raid0. Because raid10 has different layout.
If raid10 is offset layout and the discard request is smaller than stripe
size. There are some holes when we submit discard bio to underlayer disks.
For example: five disks (disk1 - disk5)
D01 D02 D03 D04 D05
D05 D01 D02 D03 D04
D06 D07 D08 D09 D10
D10 D06 D07 D08 D09
The discard bio just wants to discard from D03 to D10. For disk3, there is
a hole between D03 and D08. For disk4, there is a hole between D04 and D09.
D03 is a chunk, raid10_write_request can handle one chunk perfectly. So
the part that is not aligned with stripe size is still handled by
raid10_write_request.
If reshape is running when discard bio comes and the discard bio spans the
reshape position, raid10_write_request is responsible to handle this
discard bio.
I did a test with this patch set.
Without patch:
time mkfs.xfs /dev/md0
real4m39.775s
user0m0.000s
sys0m0.298s
With patch:
time mkfs.xfs /dev/md0
real0m0.105s
user0m0.000s
sys0m0.007s
nvme3n1 259:1 0 477G 0 disk
└─nvme3n1p1 259:10 0 50G 0 part
nvme4n1 259:2 0 477G 0 disk
└─nvme4n1p1 259:11 0 50G 0 part
nvme5n1 259:6 0 477G 0 disk
└─nvme5n1p1 259:12 0 50G 0 part
nvme2n1 259:9 0 477G 0 disk
└─nvme2n1p1 259:15 0 50G 0 part
nvme0n1 259:13 0 477G 0 disk
└─nvme0n1p1 259:14 0 50G 0 part
Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
The following patch will reuse these logics, so pull the same codes into
one function.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Now it allocs r10bio->devs[conf->copies]. Discard bio needs to submit
to all member disks and it needs to use r10bio. So extend to
r10bio->devs[geo.raid_disks].
Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Move these logic from raid0.c to md.c, so that we can also use it in
raid10.c.
Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
"RESYNC_BLOCK_SIZE/512" is equal to "RESYNC_BLOCK_SIZE >> 9", replace it
with RESYNC_SECTORS.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
When try to resize stripe_size, we also need to free old
shared page array and allocate new.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
When reshape array, we try to reuse shared pages of old stripe_head,
and allocate more for the new one if needed.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
In current implementation, grow_buffers() uses alloc_page() to
allocate the buffers for each stripe_head, i.e. allocate a page
for each dev[i] in stripe_head.
After setting stripe_size as a configurable value by writing
sysfs entry, it means that we always allocate 64K buffers, but
just use 4K of them when stripe_size is 4K in 64KB arm64.
To avoid wasting memory, we try to let multiple sh->dev share
one real page. That means, multiple sh->dev[i].page will point
to the only page with different offset. Example of 64K PAGE_SIZE
and 4K stripe_size as following:
64K PAGE_SIZE
+---+---+---+---+------------------------------+
| | | | |
| | | | |
+-+-+-+-+-+-+-+-+------------------------------+
^ ^ ^ ^
| | | +----------------------------+
| | | |
| | +-------------------+ |
| | | |
| +----------+ | |
| | | |
+-+ | | |
| | | |
+-----+-----+------+-----+------+-----+------+------+
sh | offset(0) | offset(4K) | offset(8K) | offset(12K) |
+ +-----------+------------+------------+-------------+
+----> dev[0].page dev[1].page dev[2].page dev[3].page
A new 'pages' array will be added into stripe_head to record shared
page used by this stripe_head. Allocate them when grow_buffers()
and free them when shrink_buffers().
After trying to share page, the users of sh->dev[i].page need to take
care of the related page offset: page of issued bio and page passed
to xor compution functions. But thanks for previous different page offset
supported. Here, we just need to set correct dev[i].offset.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
For now, asynchronous raid6 recovery calculate functions are require
common offset for pages. But, we expect them to support different page
offset after introducing stripe shared page. Do that by simplily adding
page offset where each page address are referred. Then, replace the
old interface with the new ones in raid6 and raid6test.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
For now, syndrome compute functions require common offset in the pages
array. However, we expect them to support different offset when try to
use shared page in the following. Simplily covert them by adding page
offset where each page address are referred.
Since the only caller of async_gen_syndrome() and async_syndrome_val()
are in raid6, we don't want to reserve the old interface but modify the
interface directly. After that, replacing old interfaces with new ones
for raid6 and raid6test.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
We try to replace async_xor() and async_xor_val() with the new
introduced interface async_xor_offs() and async_xor_val_offs()
for raid456.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
raid5 will call async_xor() and async_xor_val() to compute xor.
For now, both of them require the common src/dst page offset. But,
we want them to support different src/dst page offset for following
shared page.
Here, adding two new function async_xor_offs() and async_xor_val_offs()
respectively for async_xor() and async_xor_val().
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
ops_run_biofill() and ops_run_biodrain() will call async_copy_data()
to copy sh->dev[i].page from or to bio page. For now, it implies the
offset of dev[i].page is 0. But we want to support different page offset
in the following.
Thus, pass page offset to these functions and replace 'page_offset'
with 'page_offset + poff'.
No functional change.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>