Commit Graph

67575 Commits

Author SHA1 Message Date
Shay Drory
0b5705ebc3 devlink: Add new "event_eq_size" generic device param
Add new device generic parameter to determine the size of the
asynchronous control events EQ.

For example, to reduce event EQ size to 64, execute:
$ devlink dev param set pci/0000:06:00.0 \
              name event_eq_size value 64 cmode driverinit
$ devlink dev reload pci/0000:06:00.0

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21 19:08:54 -08:00
Shay Drory
47402385d0 devlink: Add new "io_eq_size" generic device param
Add new device generic parameter to determine the size of the
I/O completion EQs.

For example, to reduce I/O EQ size to 64, execute:
$ devlink dev param set pci/0000:06:00.0 \
              name io_eq_size value 64 cmode driverinit
$ devlink dev reload pci/0000:06:00.0

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21 19:08:54 -08:00
Jakub Kicinski
294e70c952 This time we have:
* ndo_fill_forward_path support in mac80211, to let
    drivers use it
  * association comeback notification for userspace,
    to be able to react more sensibly to long delays
  * support for background radar detection hardware
    in some chipsets
  * SA Query Procedures offload on the AP side
  * more logging if we find problems with HT/VHT/HE
  * various cleanups and minor fixes

Merge tag 'mac80211-next-for-net-next-2021-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
This time we have:
 * ndo_fill_forward_path support in mac80211, to let drivers use it
 * association comeback notification for userspace, to be able
   to react more sensibly to long delays
 * support for background radar detection hardware in some chipsets
 * SA Query Procedures offload on the AP side
 * more logging if we find problems with HT/VHT/HE
 * various cleanups and minor fixes

Conflicts:

net/wireless/reg.c:
  e08ebd6d7b ("cfg80211: Acquire wiphy mutex on regulatory work")
  701fdfe348 ("cfg80211: Enable regulatory enforcement checks for drivers supporting mesh iface")
  https://lore.kernel.org/r/20211221111950.57ecc6a7@canb.auug.org.au

drivers/net/wireless/ath/ath10k/wmi.c:
  7f599aeccb ("cfg80211: Use the HE operation IE to determine a 6GHz BSS channel")
  3bf2537ec2 ("ath10k: drop beacon and probe response which leak from other channel")
  https://lore.kernel.org/r/20211221115004.1cd6b262@canb.auug.org.au

* tag 'mac80211-next-for-net-next-2021-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (32 commits)
  cfg80211: Enable regulatory enforcement checks for drivers supporting mesh iface
  rfkill: allow to get the software rfkill state
  cfg80211: refactor cfg80211_get_ies_channel_number()
  nl82011: clarify interface combinations wrt. channels
  nl80211: Add support to offload SA Query procedures for AP SME device
  nl80211: Add support to set AP settings flags with single attribute
  mac80211: add more HT/VHT/HE state logging
  cfg80211: Use the HE operation IE to determine a 6GHz BSS channel
  cfg80211: rename offchannel_chain structs to background_chain to avoid confusion with ETSI standard
  mac80211: Notify cfg80211 about association comeback
  cfg80211: Add support for notifying association comeback
  mac80211: introduce channel switch disconnect function
  cfg80211: Fix order of enum nl80211_band_iftype_attr documentation
  cfg80211: simplify cfg80211_chandef_valid()
  mac80211: Remove a couple of obsolete TODO
  mac80211: fix FEC flag in radio tap header
  mac80211: use coarse boottime for airtime fairness code
  ieee80211: change HE nominal packet padding value defines
  cfg80211: use ieee80211_bss_get_elem() instead of _get_ie()
  mac80211: Use memset_after() to clear tx status
  ...
====================

Link: https://lore.kernel.org/r/20211221112532.28708-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-21 07:41:52 -08:00
Yang Li
c48c94b0ab net/sched: use min() macro instead of doing it manually
Fix following coccicheck warnings:
./net/sched/cls_api.c:3333:17-18: WARNING opportunity for min()
./net/sched/cls_api.c:3389:17-18: WARNING opportunity for min()
./net/sched/cls_api.c:3427:17-18: WARNING opportunity for min()

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-21 10:16:47 +00:00
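
As a hedged illustration of the kind of change coccicheck suggests here (this is not the actual cls_api.c code), the sketch below contrasts an open-coded comparison with a min()-style helper. MIN_SKETCH, clamp_len_manual() and clamp_len_min() are hypothetical names used only for this example; the kernel's own min() additionally type-checks its arguments.

```c
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's min(). Unlike the kernel macro,
 * this simple version may evaluate an argument more than once and does
 * no type checking, so only pass side-effect-free expressions.
 */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/* Open-coded pattern of the kind coccicheck reports. */
static size_t clamp_len_manual(size_t want, size_t avail)
{
	if (want < avail)
		return want;
	return avail;
}

/* Same result, expressed with the min()-style helper. */
static size_t clamp_len_min(size_t want, size_t avail)
{
	return MIN_SKETCH(want, avail);
}
```

Both helpers return the smaller of the two lengths; the second form states the intent in one expression, which is the point of the warning.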
Matt Johnston
dbcefdeb2a mctp: emit RTM_NEWADDR and RTM_DELADDR
Userspace can receive notification of MCTP address changes via
RTNLGRP_MCTP_IFADDR rtnetlink multicast group.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20211220023104.1965509-1-matt@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-20 18:40:48 -08:00
Sriram R
701fdfe348 cfg80211: Enable regulatory enforcement checks for drivers supporting mesh iface
Currently cfg80211 checks for invalid channels whenever there is a
regulatory update and stops any active interface that is operating on
a channel unsupported in the new regulatory domain.

This is done based on the regulatory flag REGULATORY_IGNORE_STALE_KICKOFF,
set during wiphy registration, which disables this enforcement when the
driver supports interface modes that the enforcement does not handle.

Add support to enable this enforcement when Mesh Point interface type
is advertised by drivers.

Signed-off-by: Sriram R <quic_srirrama@quicinc.com>
Link: https://lore.kernel.org/r/1638409120-28997-1-git-send-email-quic_srirrama@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 11:18:30 +01:00
Emmanuel Grumbach
5bc9a9dd75 rfkill: allow to get the software rfkill state
iwlwifi needs to be able to differentiate between the
software rfkill state and the hardware rfkill state.

The reason for this is that iwlwifi needs to notify any
change in the software rfkill state even when it doesn't
own the device (which means even when the hardware rfkill
is asserted).

In order to be able to know the software rfkill when the
host does not own the device, iwlwifi needs to be able to
ask the state of the software rfkill ignoring the state
of the hardware rfkill.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://lore.kernel.org/r/20211219195124.125689-1-emmanuel.grumbach@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 11:02:38 +01:00
Johannes Berg
75cca1fac2 cfg80211: refactor cfg80211_get_ies_channel_number()
Now that this is no longer part of the bigger function,
we can get rid of the channel_num variable. Also change
the function to use the struct element helpers, instead
of open-coding the element handling.

Link: https://lore.kernel.org/r/20211202130913.a0adf67a9319.I6db0340a34fff18d78e9cd512f4abf855da4e43a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 10:57:19 +01:00
Veerendranath Jakkam
47301a74bb nl80211: Add support to set AP settings flags with single attribute
In the previous method, each AP settings flag was represented by a top-level
flag attribute, and conversion to enum cfg80211_ap_settings_flags had to
be done before sending them to the driver. This commit makes it easier
to define new AP settings flags and send them to the driver.

This commit also deprecates sending
%NL80211_ATTR_EXTERNAL_AUTH_SUPPORT in %NL80211_CMD_START_AP. To
maintain backwards compatibility, it still checks for
%NL80211_ATTR_EXTERNAL_AUTH_SUPPORT in %NL80211_CMD_START_AP when
%NL80211_ATTR_AP_SETTINGS_FLAGS is not present in %NL80211_CMD_START_AP.

Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org>
Link: https://lore.kernel.org/r/1637911519-21306-1-git-send-email-vjakkam@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 10:41:26 +01:00
Johannes Berg
636ccdae4e mac80211: add more HT/VHT/HE state logging
Add more logging in places that affect HT/VHT/HE state, so
things get easier to debug.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211130131608.ac51d574458c.If197b45c5b31d2fbd254fa12c2d7c736f304d4ae@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 10:39:36 +01:00
Ayala Beker
7f599aeccb cfg80211: Use the HE operation IE to determine a 6GHz BSS channel
A non-collocated AP whose primary channel is not a PSC channel
may transmit a duplicated beacon on the corresponding PSC channel
in which it would indicate its true primary channel.
Use this information contained in the HE operation IE to determine
the primary channel of the AP.
If the information is invalid, ignore it and use the channel
the frame was received on.

Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211202143322.71eb2176e54e.I130f678e4aa390973ab39d838bbfe7b2d54bff8e@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 10:38:32 +01:00
Lorenzo Bianconi
a95bfb876f cfg80211: rename offchannel_chain structs to background_chain to avoid confusion with ETSI standard
ETSI standard defines "Offchannel CAC" as:
"Off-Channel CAC is performed by a number of non-continuous checks
spread over a period in time. This period, which is required to
determine the presence of radar signals, is defined as the Off-Channel
CAC Time..
Minimum Off-Channel CAC Time 6 minutes and Maximum Off-Channel CAC Time
4 hours..".
mac80211 implementation refers to a dedicated hw chain used for continuous
radar monitoring. Rename offchannel_* references to background_* in
order to avoid confusion with ETSI standard.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/4204cc1d648d76b44557981713231e030a3bd991.1638190762.git.lorenzo@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 10:37:36 +01:00
Ilan Peer
852a07c10d mac80211: Notify cfg80211 about association comeback
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211129152938.d76eac9e51ee.I986cffab95d51adfee6d84964711644392005113@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 10:37:17 +01:00
Ilan Peer
a083ee8a4e cfg80211: Add support for notifying association comeback
Though the underlying driver MLME can handle a temporary association
rejection with comeback, it is still useful to notify user space
of it, as user space might want to handle the temporary
rejection differently. For example, if the comeback time
is too long, user space can deauthenticate immediately and try
to associate with a different AP.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211129152938.2467809e8cb3.I45574185b582666bc78eef0c29a4c36b478e5382@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 10:37:03 +01:00
Nathan Errera
6d50176428 mac80211: introduce channel switch disconnect function
Introduce a disconnect function that can be used when a
channel switch error occurs. The channel switch can request to
block the tx, and so, we need to make sure we do not send a deauth
frame in this case.

Signed-off-by: Nathan Errera <nathan.errera@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211129152938.cd2a615a0702.I9edb14785586344af17644b610ab5be109dcef00@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 10:36:51 +01:00
Johannes Berg
3bb1ccc4ed cfg80211: simplify cfg80211_chandef_valid()
There are a lot of duplicate checks in this function to
check the delta between the control channel and CF1.
With the addition of 320 MHz, there will be even more.
Simplify the code so that the common checks are done
only once for multiple bandwidths.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211129152938.2d0240b07f11.I759e8e990f5386ba2b56ffb2488a8d4e16e22c1b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 10:36:24 +01:00
Ilan Peer
cee04f3c3a mac80211: Remove a couple of obsolete TODO
The HE capability IE is an extension IE, so remove
the irrelevant comments.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211129152938.550b95b5fca7.Ia31395e880172aefcc0a8c70ed060f84b94bdb83@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 10:36:14 +01:00
P Praneesh
57553c3a6c mac80211: fix FEC flag in radio tap header
In mac80211, while building the radiotap header, the
IEEE80211_RADIOTAP_MCS_HAVE_FEC flag is missing when LDPC is enabled
by the driver, hence LDPC is not reported properly in the radiotap header.
Fix that by adding the HAVE_FEC flag while building the radiotap header.

Signed-off-by: P Praneesh <quic_ppranees@quicinc.com>
Link: https://lore.kernel.org/r/1638294648-844-2-git-send-email-quic_ppranees@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 10:25:00 +01:00
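
The idea can be sketched in userspace C as follows. This is only an illustration, not the mac80211 code: struct mcs_field_sketch and fill_mcs_fec() are hypothetical, and the two SKETCH_* values are assumed to match the kernel's IEEE80211_RADIOTAP_MCS_HAVE_FEC and IEEE80211_RADIOTAP_MCS_FEC_LDPC definitions. The point is that a flag bit is only meaningful to a radiotap parser if the matching "known" bit is also set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's radiotap MCS definitions. */
#define SKETCH_MCS_HAVE_FEC 0x10 /* "known" bit: FEC type is reported */
#define SKETCH_MCS_FEC_LDPC 0x10 /* "flags" bit: FEC type is LDPC */

/* Hypothetical, simplified view of the radiotap MCS field. */
struct mcs_field_sketch {
	uint8_t known; /* which parts of "flags" are valid */
	uint8_t flags;
};

/*
 * Fill the FEC part of the MCS field. driver_known stands in for the
 * "known" bits the driver already advertises.
 */
static void fill_mcs_fec(struct mcs_field_sketch *mcs,
			 uint8_t driver_known, bool ldpc)
{
	mcs->known = driver_known;
	mcs->flags = 0;
	if (ldpc) {
		/* Without the HAVE_FEC bit a parser ignores the LDPC bit. */
		mcs->known |= SKETCH_MCS_HAVE_FEC;
		mcs->flags |= SKETCH_MCS_FEC_LDPC;
	}
}
```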
Felix Fietkau
6a789ba679 mac80211: use coarse boottime for airtime fairness code
The time values used by the airtime fairness code only need to be accurate
enough to cover station activity detection.
Using ktime_get_coarse_boottime_ns instead of ktime_get_boottime_ns will
drop the accuracy down to jiffies intervals, but at the same time saves
a lot of CPU cycles in a hot path.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20211217114258.14619-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20 10:24:41 +01:00
Baowen Zheng
c86e0209dc flow_offload: validate flags of filter and actions
Add a process to validate the flags of a filter and its actions when
adding a tc filter.

We need to prevent adding a filter whose flags conflict with those of its actions.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19 14:08:48 +00:00
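
A minimal sketch of what such a validation could look like is given below. It rests on an assumption about what "conflict" means here: that a flag set may not carry both skip_hw and skip_sw, and that a filter restricted to one side (hardware or software) cannot use an action restricted to the other. The SKETCH_* bits and both function names are hypothetical and do not reproduce the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define SKETCH_SKIP_HW (1u << 0) /* do not offload to hardware */
#define SKETCH_SKIP_SW (1u << 1) /* do not run in software */

/* A flag set that skips both hardware and software could run nowhere. */
static bool flags_self_consistent(uint32_t flags)
{
	const uint32_t both = SKETCH_SKIP_HW | SKETCH_SKIP_SW;

	return (flags & both) != both;
}

/* Assumed rule: filter and action must be able to run in the same place. */
static bool filter_action_compatible(uint32_t filter_flags,
				     uint32_t action_flags)
{
	if (!flags_self_consistent(filter_flags) ||
	    !flags_self_consistent(action_flags))
		return false;
	/* Hardware-only filter with a software-only action. */
	if ((filter_flags & SKETCH_SKIP_SW) && (action_flags & SKETCH_SKIP_HW))
		return false;
	/* Software-only filter with a hardware-only action. */
	if ((filter_flags & SKETCH_SKIP_HW) && (action_flags & SKETCH_SKIP_SW))
		return false;
	return true;
}
```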
Baowen Zheng
13926d19a1 flow_offload: add reoffload process to update hw_count
Add a reoffload process to update hw_count when a driver
is inserted or removed.

During reoffload, an action is deleted if it has the skip_sw flag and
is not offloaded to any hardware.

When reoffloading actions, we still offload the actions
that were added independently of filters.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19 14:08:48 +00:00
Baowen Zheng
e8cb5bcf6e net: sched: save full flags for tc action
Save the full action flags, and return only the user flags when
returning flags to user space.

The full action flags are saved to distinguish whether the action was
created independently of a classifier.

This change mainly prepares for a later patch that reoffloads tc actions.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19 14:08:48 +00:00
Baowen Zheng
c7a66f8d8a flow_offload: add process to update action stats from hardware
When collecting stats for actions, update them using both
hardware and software counters.

The stats update process should not run in a preempt_disable context.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19 14:08:48 +00:00
Baowen Zheng
bcd6436858 flow_offload: rename exts stats update functions with hw
Rename exts stats update functions with hw for readability.

We make this change also to update stats from hw for an action
when it is offloaded to hw as a single action.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19 14:08:48 +00:00
Baowen Zheng
7adc576512 flow_offload: add skip_hw and skip_sw to control if offload the action
Add skip_hw and skip_sw so the user can control whether the action is
offloaded to hardware.

Also add in_hw_count to indicate to the user whether the action is
offloaded to any hardware.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19 14:08:48 +00:00
Baowen Zheng
8cbfe939ab flow_offload: allow user to offload tc action to net device
Use flow_indr_dev_register/flow_indr_dev_setup_offload to
offload tc action.

We need to call tc_cleanup_flow_action to clean up tc action entry since
in tc_setup_action, some actions may hold dev refcnt, especially the mirror
action.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19 14:08:48 +00:00
Baowen Zheng
c54e1d920f flow_offload: add ops to tc_action_ops for flow action setup
Add a new ops to tc_action_ops for flow action setup.

Refactor function tc_setup_flow_action to use this new ops.

We make this change to facilitate adding a standalone action module.

We will also use this ops to offload actions independently of filters
in a following patch.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19 14:08:48 +00:00
Baowen Zheng
9c1c0e124c flow_offload: rename offload functions with offload instead of flow
To improve readability, we rename offload functions with offload instead
of flow.

The term flow is related to exact matches, so we rename these functions
with offload.

This change also makes it easier to name the single action offload functions.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19 14:08:48 +00:00
Baowen Zheng
5a9959008f flow_offload: add index to flow_action_entry structure
Add index to the flow_action_entry structure and delete index from the
police and gate child structures.

This change lets a driver identify a tc action when the action is
offloaded.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19 14:08:47 +00:00
Baowen Zheng
40bd094d65 flow_offload: fill flags to action structure
Fill flags into the action structure to let the user control whether
the action should be offloaded to hardware or not.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19 14:08:47 +00:00
Yajun Deng
f85b244ee3 xdp: move the if dev statements to the first
The xdp_rxq_info_unreg() called by xdp_rxq_info_reg() is meaningless when
dev is NULL, so move the dev check ahead of it.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-18 12:35:49 +00:00
Jean Sacren
59060a47ca mptcp: clean up harmless false expressions
entry->addr.id is u8 with a range from 0 to 255 and MAX_ADDR_ID is 255.
We should drop both false expressions of (entry->addr.id > MAX_ADDR_ID).

We should also remove the obsolete parentheses in the first if branch.

Use U8_MAX for MAX_ADDR_ID and add a comment to show the link to
mptcp_addr_info.id as suggested by Mr. Matthieu Baerts.

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-17 19:27:05 -08:00
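
Why the dropped comparison can never be true can be shown with a small userspace sketch. struct addr_info_sketch and id_out_of_range() are hypothetical stand-ins, and UINT8_MAX plays the role of the kernel's U8_MAX.

```c
#include <stdbool.h>
#include <stdint.h>

/* 255, the largest value an 8-bit unsigned id can hold. */
#define MAX_ADDR_ID UINT8_MAX

/* Hypothetical stand-in for the structure holding the u8 address id. */
struct addr_info_sketch {
	uint8_t id;
};

/*
 * The id is promoted to int for the comparison, but its value is still
 * in the range 0..255, so it can never exceed MAX_ADDR_ID. Compilers
 * may warn that this comparison is always false.
 */
static bool id_out_of_range(const struct addr_info_sketch *a)
{
	return a->id > MAX_ADDR_ID;
}
```

Looping over every possible id value confirms that the check never fires, which is why removing it does not change behavior.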
Paolo Abeni
3ce0852c86 mptcp: enforce HoL-blocking estimation
The MPTCP packet scheduler has sub-optimal behavior with asymmetric
subflows: if the faster subflow-level cwin is closed, the packet
scheduler can enqueue "too much" data on a slower subflow.

When all the data on the faster subflow is acked while the mptcp-level
cwin is closed, link utilization becomes suboptimal.

The solution is implementing blest-like[1] HoL-blocking estimation,
transmitting only on the subflow with the shorter estimated time to
flush the queued memory. If that subflow's cwin is closed, we wait
even if other subflows are available.

This is considerably simpler than the original blest implementation, as we
leverage the pacing rate provided by the TCP socket. To get a more
accurate estimation for the subflow linger-time, we maintain a
per-subflow weighted average of such info.

Additionally, drop magic number usage in favor of newly defined
macros and use more meaningful names for status variables.

[1] http://dl.ifip.org/db/conf/networking/networking2016/1570234725.pdf

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/137
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-17 19:27:04 -08:00
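
The selection rule described above can be sketched in userspace C. This is a simplified illustration under stated assumptions, not the MPTCP scheduler: struct subflow_sketch, linger_time_us() and pick_subflow() are hypothetical, the linger time is estimated as queued bytes divided by pacing rate, and the per-subflow weighted average mentioned in the commit message is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-subflow state. */
struct subflow_sketch {
	uint64_t queued_bytes; /* data already queued on this subflow */
	uint64_t pacing_rate;  /* bytes per second; 0 means unknown */
	bool cwin_open;        /* can this subflow accept more data now? */
};

/* Estimated time to flush the queued data, in microseconds. */
static uint64_t linger_time_us(const struct subflow_sketch *sf)
{
	return sf->queued_bytes * 1000000ULL / sf->pacing_rate;
}

/*
 * Return the index of the subflow to transmit on, or -1 to wait.
 * The subflow with the shortest estimated linger time is chosen first;
 * if its cwin is closed we wait instead of falling back to a slower
 * subflow, which is the HoL-blocking avoidance step.
 */
static int pick_subflow(const struct subflow_sketch *sf, size_t n)
{
	int best = -1;
	uint64_t best_us = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t us;

		if (sf[i].pacing_rate == 0)
			continue; /* no estimate possible, skip */
		us = linger_time_us(&sf[i]);
		if (best < 0 || us < best_us) {
			best = (int)i;
			best_us = us;
		}
	}
	if (best >= 0 && !sf[best].cwin_open)
		return -1;
	return best;
}
```

With a fast and a slow subflow holding the same amount of queued data, the fast one is selected; once its cwin closes, the function returns -1 rather than queueing on the slow subflow.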
Jakub Kicinski
7cd2802d74 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16 16:13:19 -08:00
Linus Torvalds
180f3bcfe3 Networking fixes for 5.16-rc6, including fixes from mac80211, wifi, bpf.
Current release - regressions:
 
  - dpaa2-eth: fix buffer overrun when reporting ethtool statistics
 
 Current release - new code bugs:
 
  - bpf: fix incorrect state pruning for <8B spill/fill
 
  - iavf:
      - add missing unlocks in iavf_watchdog_task()
      - do not override the adapter state in the watchdog task (again)
 
  - mlxsw: spectrum_router: consolidate MAC profiles when possible
 
 Previous releases - regressions:
 
  - mac80211, fix:
      - rate control, avoid driver crash for retransmitted frames
      - regression in SSN handling of addba tx
      - a memory leak where sta_info is not freed
      - marking TX-during-stop for TX in in_reconfig, prevent stall
 
  - cfg80211: acquire wiphy mutex on regulatory work
 
  - wifi drivers: fix build regressions and LED config dependency
 
  - virtio_net: fix rx_drops stat for small pkts
 
  - dsa: mv88e6xxx: unforce speed & duplex in mac_link_down()
 
 Previous releases - always broken:
 
  - bpf, fix:
     - kernel address leakage in atomic fetch
     - kernel address leakage in atomic cmpxchg's r0 aux reg
     - signed bounds propagation after mov32
     - extable fixup offset
     - extable address check
 
  - mac80211:
      - fix the size used for building probe request
      - send ADDBA requests using the tid/queue of the aggregation
        session
      - agg-tx: don't schedule_and_wake_txq() under sta->lock,
        avoid deadlocks
      - validate extended element ID is present
 
  - mptcp:
      - never allow the PM to close a listener subflow (null-defer)
      - clear 'kern' flag from fallback sockets, prevent crash
      - fix deadlock in __mptcp_push_pending()
 
  - inet_diag: fix kernel-infoleak for UDP sockets
 
  - xsk: do not sleep in poll() when need_wakeup set
 
  - smc: avoid very long waits in smc_release()
 
  - sch_ets: don't remove idle classes from the round-robin list
 
  - netdevsim:
      - zero-initialize memory for bpf map's value, prevent info leak
      - don't let user space overwrite read only (max) ethtool parms
 
  - ixgbe: set X550 MDIO speed before talking to PHY
 
  - stmmac:
      - fix null-deref in flower deletion w/ VLAN prio Rx steering
      - dwmac-rk: fix oob read in rk_gmac_setup
 
  - ice: time stamping fixes
 
  - systemport: add global locking for descriptor life cycle
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from mac80211, wifi, bpf.

  Relatively large batches of fixes from BPF and the WiFi stack, calm in
  general networking.

  Current release - regressions:

   - dpaa2-eth: fix buffer overrun when reporting ethtool statistics

  Current release - new code bugs:

   - bpf: fix incorrect state pruning for <8B spill/fill

   - iavf:
       - add missing unlocks in iavf_watchdog_task()
       - do not override the adapter state in the watchdog task (again)

   - mlxsw: spectrum_router: consolidate MAC profiles when possible

  Previous releases - regressions:

   - mac80211 fixes:
       - rate control, avoid driver crash for retransmitted frames
       - regression in SSN handling of addba tx
       - a memory leak where sta_info is not freed
       - marking TX-during-stop for TX in in_reconfig, prevent stall

   - cfg80211: acquire wiphy mutex on regulatory work

   - wifi drivers: fix build regressions and LED config dependency

   - virtio_net: fix rx_drops stat for small pkts

   - dsa: mv88e6xxx: unforce speed & duplex in mac_link_down()

  Previous releases - always broken:

   - bpf fixes:
       - kernel address leakage in atomic fetch
       - kernel address leakage in atomic cmpxchg's r0 aux reg
       - signed bounds propagation after mov32
       - extable fixup offset
       - extable address check

   - mac80211:
       - fix the size used for building probe request
       - send ADDBA requests using the tid/queue of the aggregation
         session
       - agg-tx: don't schedule_and_wake_txq() under sta->lock, avoid
         deadlocks
       - validate extended element ID is present

   - mptcp:
       - never allow the PM to close a listener subflow (null-defer)
       - clear 'kern' flag from fallback sockets, prevent crash
       - fix deadlock in __mptcp_push_pending()

   - inet_diag: fix kernel-infoleak for UDP sockets

   - xsk: do not sleep in poll() when need_wakeup set

   - smc: avoid very long waits in smc_release()

   - sch_ets: don't remove idle classes from the round-robin list

   - netdevsim:
       - zero-initialize memory for bpf map's value, prevent info leak
       - don't let user space overwrite read only (max) ethtool parms

   - ixgbe: set X550 MDIO speed before talking to PHY

   - stmmac:
       - fix null-deref in flower deletion w/ VLAN prio Rx steering
       - dwmac-rk: fix oob read in rk_gmac_setup

   - ice: time stamping fixes

   - systemport: add global locking for descriptor life cycle"

* tag 'net-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (89 commits)
  bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
  selftest/bpf: Add a test that reads various addresses.
  bpf: Fix extable address check.
  bpf: Fix extable fixup offset.
  bpf, selftests: Add test case trying to taint map value pointer
  bpf: Make 32->64 bounds propagation slightly more robust
  bpf: Fix signed bounds propagation after mov32
  sit: do not call ipip6_dev_free() from sit_init_net()
  net: systemport: Add global locking for descriptor lifecycle
  net/smc: Prevent smc_release() from long blocking
  net: Fix double 0x prefix print in SKB dump
  virtio_net: fix rx_drops stat for small pkts
  dsa: mv88e6xxx: fix debug print for SPEED_UNFORCED
  sfc_ef100: potential dereference of null pointer
  net: stmmac: dwmac-rk: fix oob read in rk_gmac_setup
  net: usb: lan78xx: add Allied Telesis AT29M2-AF
  net/packet: rx_owner_map depends on pg_vec
  netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc
  dpaa2-eth: fix ethtool statistics
  ixgbe: set X550 MDIO speed before talking to PHY
  ...
2021-12-16 15:02:14 -08:00
Jakub Kicinski
0c3e247460 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-12-16

We've added 15 non-merge commits during the last 7 day(s) which contain
a total of 12 files changed, 434 insertions(+), 30 deletions(-).

The main changes are:

1) Fix incorrect verifier state pruning behavior for <8B register spill/fill,
   from Paul Chaignon.

2) Fix x86-64 JIT's extable handling for fentry/fexit when return pointer
   is an ERR_PTR(), from Alexei Starovoitov.

3) Fix 3 different possibilities that BPF verifier missed where unprivileged
   could leak kernel addresses, from Daniel Borkmann.

4) Fix xsk's poll behavior under need_wakeup flag, from Magnus Karlsson.

5) Fix an oob-write in test_verifier due to a missed MAX_NR_MAPS bump,
   from Kumar Kartikeya Dwivedi.

6) Fix a race in test_btf_skc_cls_ingress selftest, from Martin KaFai Lau.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
  selftest/bpf: Add a test that reads various addresses.
  bpf: Fix extable address check.
  bpf: Fix extable fixup offset.
  bpf, selftests: Add test case trying to taint map value pointer
  bpf: Make 32->64 bounds propagation slightly more robust
  bpf: Fix signed bounds propagation after mov32
  bpf, selftests: Update test case for atomic cmpxchg on r0 with pointer
  bpf: Fix kernel address leakage in atomic cmpxchg's r0 aux reg
  bpf, selftests: Add test case for atomic fetch on spilled pointer
  bpf: Fix kernel address leakage in atomic fetch
  selftests/bpf: Fix OOB write in test_verifier
  xsk: Do not sleep in poll() when need_wakeup set
  selftests/bpf: Tests for state pruning with u32 spill/fill
  bpf: Fix incorrect state pruning for <8B spill/fill
====================

Link: https://lore.kernel.org/r/20211216210005.13815-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16 13:06:49 -08:00
Eric Dumazet
e28587cc49 sit: do not call ipip6_dev_free() from sit_init_net()
ipip6_dev_free is sit dev->priv_destructor, already called
by register_netdevice() if something goes wrong.

Alternative would be to make ipip6_dev_free() robust against
multiple invocations, but other drivers do not implement this
strategy.
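
The double invocation described above can be illustrated with a small user-space model in Python. This is only a sketch, not kernel code: the Dst and Device classes and the register_device(), init_buggy() and init_fixed() helpers are hypothetical stand-ins. The only behavior taken from this message is that the registration step already runs the destructor on failure.

```python
class Dst:
    """Toy refcounted object (hypothetical stand-in, not the kernel's dst)."""

    def __init__(self):
        self.refcnt = 1

    def release(self):
        self.refcnt -= 1
        if self.refcnt < 0:
            raise RuntimeError("dst_release underflow")


class Device:
    """Toy device whose private destructor drops one reference."""

    def __init__(self):
        self.dst = Dst()

    def priv_destructor(self):
        self.dst.release()


def register_device(dev, fail):
    # On failure the registration step already runs the destructor.
    if fail:
        dev.priv_destructor()
        return -1
    return 0


def init_buggy(fail):
    dev = Device()
    err = register_device(dev, fail)
    if err:
        dev.priv_destructor()  # second invocation: refcount underflows
    return err


def init_fixed(fail):
    dev = Device()
    # Leave cleanup to register_device(); do not call the destructor again.
    return register_device(dev, fail)
```

In this model init_buggy(True) raises on the second release, while init_fixed(True) simply returns the error.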

syzbot reported:

dst_release underflow
WARNING: CPU: 0 PID: 5059 at net/core/dst.c:173 dst_release+0xd8/0xe0 net/core/dst.c:173
Modules linked in:
CPU: 1 PID: 5059 Comm: syz-executor.4 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:dst_release+0xd8/0xe0 net/core/dst.c:173
Code: 4c 89 f2 89 d9 31 c0 5b 41 5e 5d e9 da d5 44 f9 e8 1d 90 5f f9 c6 05 87 48 c6 05 01 48 c7 c7 80 44 99 8b 31 c0 e8 e8 67 29 f9 <0f> 0b eb 85 0f 1f 40 00 53 48 89 fb e8 f7 8f 5f f9 48 83 c3 a8 48
RSP: 0018:ffffc9000aa5faa0 EFLAGS: 00010246
RAX: d6894a925dd15a00 RBX: 00000000ffffffff RCX: 0000000000040000
RDX: ffffc90005e19000 RSI: 000000000003ffff RDI: 0000000000040000
RBP: 0000000000000000 R08: ffffffff816a1f42 R09: ffffed1017344f2c
R10: ffffed1017344f2c R11: 0000000000000000 R12: 0000607f462b1358
R13: 1ffffffff1bfd305 R14: ffffe8ffffcb1358 R15: dffffc0000000000
FS:  00007f66c71a2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f88aaed5058 CR3: 0000000023e0f000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 dst_cache_destroy+0x107/0x1e0 net/core/dst_cache.c:160
 ipip6_dev_free net/ipv6/sit.c:1414 [inline]
 sit_init_net+0x229/0x550 net/ipv6/sit.c:1936
 ops_init+0x313/0x430 net/core/net_namespace.c:140
 setup_net+0x35b/0x9d0 net/core/net_namespace.c:326
 copy_net_ns+0x359/0x5c0 net/core/net_namespace.c:470
 create_new_namespaces+0x4ce/0xa00 kernel/nsproxy.c:110
 unshare_nsproxy_namespaces+0x11e/0x180 kernel/nsproxy.c:226
 ksys_unshare+0x57d/0xb50 kernel/fork.c:3075
 __do_sys_unshare kernel/fork.c:3146 [inline]
 __se_sys_unshare kernel/fork.c:3144 [inline]
 __x64_sys_unshare+0x34/0x40 kernel/fork.c:3144
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f66c882ce99
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f66c71a2168 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 00007f66c893ff60 RCX: 00007f66c882ce99
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000048040200
RBP: 00007f66c8886ff1 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fff6634832f R14: 00007f66c71a2300 R15: 0000000000022000
 </TASK>

Fixes: cf124db566 ("net: Fix inconsistent teardown and release of private netdev state.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20211216111741.1387540-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16 08:38:10 -08:00
D. Wythe
5c15b3123f net/smc: Prevent smc_release() from long blocking
In an nginx/wrk benchmark, the client hangs with high probability in
the following case (it takes several minutes to exit):

server: smc_run nginx

client: smc_run wrk -c 10000 -t 1 http://server

Client hangs with the following backtrace:

0 [ffffa7ce80f3bbf8] __schedule at ffffffff9f9e0d5f
1 [ffffa7ce80f3bc88] schedule at ffffffff9f9e10e6
2 [ffffa7ce80f3bca0] schedule_timeout at ffffffff9f9e3f3c
3 [ffffa7ce80f3bd20] wait_for_common at ffffffff9f9e19de
4 [ffffa7ce80f3bd80] __flush_work at ffffffff9f0fe013
5 [ffffa7ce80f3bdf0] smc_release at ffffffffc0697d24 [smc]
6 [ffffa7ce80f3be20] __sock_release at ffffffff9f802e2d
7 [ffffa7ce80f3be40] sock_close at ffffffff9f802eb1
8 [ffffa7ce80f3be48] __fput at ffffffff9f334f93
9 [ffffa7ce80f3be78] task_work_run at ffffffff9f101ff5
10 [ffffa7ce80f3bea0] do_exit at ffffffff9f0e5012
11 [ffffa7ce80f3bf10] do_group_exit at ffffffff9f0e592a
12 [ffffa7ce80f3bf38] __x64_sys_exit_group at ffffffff9f0e5994
13 [ffffa7ce80f3bf40] do_syscall_64 at ffffffff9f9d4373
14 [ffffa7ce80f3bf50] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c

This issue is due to flush_work(), which smc_release() uses to wait
for smc_connect_work() to finish. Once many smc_connect_work() items
are pending, or all executing work items are stuck, smc_release() has
to block until a worker becomes free, which is equivalent to waiting
for another smc_connect_work() to finish.

To fix this, there are two changes:

1. Cancel idle smc_connect_work() from the workqueue, and wait for
   executing smc_connect_work() to finish. For that purpose, replace
   flush_work() with cancel_work_sync().

2. Since smc_connect() holds a reference for passive closing, release
   that reference if smc_connect_work() has been cancelled.
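
The two changes can be sketched with a user-space model in Python. This is an analogy rather than the SMC code: the Sock class and the connect(), connect_work() and release() helpers are hypothetical, and Future.cancel() from concurrent.futures only approximates cancel_work_sync() (it cancels work that is still queued and reports whether it did so). As a simplification, the toy work item itself drops the connect-time reference when it runs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Sock:
    """Toy socket with a reference count (hypothetical, not struct smc_sock)."""

    def __init__(self):
        self.refs = 1
        self.lock = threading.Lock()

    def hold(self):
        with self.lock:
            self.refs += 1

    def put(self):
        with self.lock:
            self.refs -= 1


def connect_work(sock, gate):
    gate.wait()  # stands in for a slow connect
    sock.put()   # toy simplification: running work drops the reference


def connect(pool, sock, gate):
    sock.hold()  # reference taken on behalf of the queued work
    return pool.submit(connect_work, sock, gate)


def release(sock, work):
    if work.cancel():
        # The work was still queued and will never run,
        # so the reference taken in connect() is dropped here.
        sock.put()
        return "cancelled"
    # The work is running or already finished: wait for it.
    work.result()
    return "waited"
```

With a single-worker pool and a first work item blocked on the gate, releasing a second, still-queued socket returns "cancelled" immediately instead of waiting behind the first one.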

Fixes: 24ac3a08e6 ("net/smc: rebuild nonblocking connect")
Reported-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16 08:11:05 -08:00
Florian Westphal
66495f301c fib: expand fib_rule_policy
Now that there is only one fib nla_policy there is no need to
keep the macro around.  Place it where it's used.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16 07:18:35 -08:00
Florian Westphal
92e1bcee06 fib: rules: remove duplicated nla policies
The attributes are identical in all implementations so move the ipv4 one
into the core and remove the per-family nla policies.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16 07:18:35 -08:00
Gal Pressman
8a03ef676a net: Fix double 0x prefix print in SKB dump
When printing netdev features, %pNF already takes care of the 0x prefix,
so remove the explicit one.
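
The effect is easy to reproduce outside the kernel. The Python sketch below is only an illustration: format_features() is a hypothetical stand-in for a formatter that, like %pNF as described above, emits the 0x prefix itself, and the dump_* helpers are likewise made up for this example.

```python
def format_features(features):
    # Hypothetical stand-in for a formatter that adds its own 0x prefix.
    return f"{features:#x}"


def dump_buggy(features):
    # Explicit "0x" plus the formatter's own prefix gives a doubled prefix.
    return f"features=0x{format_features(features)}"


def dump_fixed(features):
    # Rely on the formatter's prefix only.
    return f"features={format_features(features)}"
```

For the value 0x1f, dump_buggy() yields "features=0x0x1f" and dump_fixed() yields "features=0x1f".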

Fixes: 6413139dfc ("skbuff: increase verbosity when dumping skb data")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16 11:08:15 +00:00
Willem de Bruijn
ec6af094ea net/packet: rx_owner_map depends on pg_vec
Packet sockets may switch ring versions. Avoid misinterpreting state
between versions, whose fields share a union. rx_owner_map is only
allocated with a packet ring (pg_vec) and both are swapped together.
If pg_vec is NULL, meaning no packet ring was allocated, then neither
was rx_owner_map. And the field may be old state from a tpacket_v3.
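
The hazard of reading a union member without checking which state is live can be modeled in user space with Python's ctypes. This is a sketch under stated assumptions: the Ring, RingUnion and V3State layouts and the owner_map_* helpers are hypothetical and much simpler than the real packet socket structures. Only the rule "trust rx_owner_map only when pg_vec is set" comes from this message.

```python
import ctypes


class V3State(ctypes.Structure):
    # Hypothetical state used only by one ring version.
    _fields_ = [("cursor", ctypes.c_uint64)]


class RingUnion(ctypes.Union):
    # Both members occupy the same bytes, as in a C union.
    _fields_ = [("rx_owner_map", ctypes.c_uint64), ("v3", V3State)]


class Ring(ctypes.Structure):
    # pg_vec is zero when no packet ring was allocated.
    _fields_ = [("pg_vec", ctypes.c_uint64), ("u", RingUnion)]


def owner_map_unchecked(ring):
    # Trusts the union member alone: stale bytes look like a valid map.
    return ring.u.rx_owner_map or None


def owner_map_checked(ring):
    # Only trust rx_owner_map when a packet ring (pg_vec) exists.
    if not ring.pg_vec:
        return None
    return ring.u.rx_owner_map or None
```

If the v3 member was written earlier and pg_vec is zero, owner_map_unchecked() returns the stale bytes as if they were a map, while owner_map_checked() returns None.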

Fixes: 61fad6816f ("net/packet: tpacket_rcv: avoid a producer race condition")
Reported-by: Syzbot <syzbot+1ac0994a0a0c55151121@syzkaller.appspotmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20211215143937.106178-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-15 17:49:36 -08:00
Jakub Kicinski
bd1d97d861 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next, mostly
rather small housekeeping patches:

1) Remove unused variable in IPVS, from GuoYong Zheng.

2) Use memset_after in conntrack, from Kees Cook.

3) Remove leftover function in nfnetlink_queue, from Florian Westphal.

4) Remove redundant test on bool in conntrack, from Bernard Zhao.

5) egress support for nft_fwd, from Lukas Wunner.

6) Make pppoe work for br_netfilter, from Florian Westphal.

7) Remove unused variable in conntrack resize routine, from luo penghao.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
  netfilter: conntrack: Remove useless assignment statements
  netfilter: bridge: add support for pppoe filtering
  netfilter: nft_fwd_netdev: Support egress hook
  netfilter: ctnetlink: remove useless type conversion to bool
  netfilter: nf_queue: remove leftover synchronize_rcu
  netfilter: conntrack: Use memset_startat() to zero struct nf_conn
  ipvs: remove unused variable for ip_vs_new_dest
====================

Link: https://lore.kernel.org/r/20211215234911.170741-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-15 17:29:28 -08:00
luo penghao
284ca7647c netfilter: conntrack: Remove useless assignment statements
The value assigned to old_size here is never used afterwards.

The clang_analyzer complains as follows:

Value stored to 'old_size' is never read

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-16 00:17:40 +01:00
Jakub Kicinski
3bc14ea0d1 ethtool: always write dev in ethnl_parse_header_dev_get
Commit 0976b888a1 ("ethtool: fix null-ptr-deref on ref tracker")
made the write to req_info.dev conditional, but as Eric points out
in a different follow-up, the structure is often allocated on the
stack and not kzalloc()'d, so it seems safer to always write the dev,
in case it's garbage on input.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:09:24 +00:00
Eric Dumazet
f1d9268e06 net: add net device refcount tracker to struct packet_type
Most notable changes are in af_packet, tipc ones are trivial.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jon Maloy <jmaloy@redhat.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:07:04 +00:00
Eric Dumazet
34ac17ecbf ethtool: use ethnl_parse_header_dev_put()
It seems I missed that most ethnl_parse_header_dev_get() callers
declare an on-stack struct ethnl_req_info, and that they simply call
dev_put(req_info.dev) when about to return.

Add ethnl_parse_header_dev_put() helper to properly untrack the
reference taken by ethnl_parse_header_dev_get().

Fixes: e4b8954074 ("netlink: add net device refcount tracker to struct ethnl_req_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 10:27:47 +00:00
Maxim Galaganov
3d79e3756c mptcp: fix deadlock in __mptcp_push_pending()
__mptcp_push_pending() may call mptcp_flush_join_list() with the
subflow socket lock held. If such a call hits mptcp_sockopt_sync_all(),
then __mptcp_sockopt_sync() could subsequently try to lock the subflow
socket for itself, causing a deadlock.

sysrq: Show Blocked State
task:ss-server       state:D stack:    0 pid:  938 ppid:     1 flags:0x00000000
Call Trace:
 <TASK>
 __schedule+0x2d6/0x10c0
 ? __mod_memcg_state+0x4d/0x70
 ? csum_partial+0xd/0x20
 ? _raw_spin_lock_irqsave+0x26/0x50
 schedule+0x4e/0xc0
 __lock_sock+0x69/0x90
 ? do_wait_intr_irq+0xa0/0xa0
 __lock_sock_fast+0x35/0x50
 mptcp_sockopt_sync_all+0x38/0xc0
 __mptcp_push_pending+0x105/0x200
 mptcp_sendmsg+0x466/0x490
 sock_sendmsg+0x57/0x60
 __sys_sendto+0xf0/0x160
 ? do_wait_intr_irq+0xa0/0xa0
 ? fpregs_restore_userregs+0x12/0xd0
 __x64_sys_sendto+0x20/0x30
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f9ba546c2d0
RSP: 002b:00007ffdc3b762d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f9ba56c8060 RCX: 00007f9ba546c2d0
RDX: 000000000000077a RSI: 0000000000e5e180 RDI: 0000000000000234
RBP: 0000000000cc57f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ba56c8060
R13: 0000000000b6ba60 R14: 0000000000cc7840 R15: 41d8685b1d7901b8
 </TASK>

Fix the issue by using __mptcp_flush_join_list() instead of plain
mptcp_flush_join_list() inside __mptcp_push_pending(), as suggested by
Florian. The sockopt sync will be deferred to the workqueue.
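
The lock pattern and the deferral can be sketched with a user-space model in Python. This is not the MPTCP code: the Subflow class and the push_pending_*, sockopt_sync() and run_deferred() helpers are hypothetical, a non-reentrant threading.Lock stands in for the subflow socket lock, and a short timeout stands in for what would be an indefinite block.

```python
import threading


class Subflow:
    """Toy subflow with a non-reentrant lock (hypothetical model)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.synced = False


def sockopt_sync(sf):
    # Needs the subflow lock for itself. The timeout only keeps the
    # model from hanging; the real path would block indefinitely.
    if not sf.lock.acquire(timeout=0.05):
        return False
    try:
        sf.synced = True
        return True
    finally:
        sf.lock.release()


def push_pending_buggy(sf):
    with sf.lock:
        # Tries to take sf.lock again while already holding it.
        return sockopt_sync(sf)


def push_pending_fixed(sf, deferred):
    with sf.lock:
        # Defer the sync instead of locking again under the lock.
        deferred.append(sf)
    return True


def run_deferred(deferred):
    # Runs later, with no subflow lock held (models the workqueue).
    while deferred:
        sockopt_sync(deferred.pop())
```

In this model push_pending_buggy() fails to sync because the lock is already held, whereas push_pending_fixed() followed by run_deferred() completes the sync.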

Fixes: 1b3e7ede13 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/244
Suggested-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Maxim Galaganov <max@internet.ru>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14 18:49:40 -08:00
Florian Westphal
d6692b3b97 mptcp: clear 'kern' flag from fallback sockets
The mptcp ULP extension relies on sk->sk_sock_kern being set correctly:
It prevents setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6); from
working for plain tcp sockets (any userspace-exposed socket).

But in case of fallback, accept() can return a plain tcp sk.
In that case, sk is still tagged as 'kernel' and setsockopt will work.

This will crash the kernel, because the subflow extension has a NULL
ctx->conn mptcp socket:

BUG: KASAN: null-ptr-deref in subflow_data_ready+0x181/0x2b0
Call Trace:
 tcp_data_ready+0xf8/0x370
 [..]

Fixes: cf7da0d66c ("mptcp: Create SUBFLOW socket for incoming connections")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14 18:49:39 -08:00
Florian Westphal
404cd9a221 mptcp: remove tcp ulp setsockopt support
TCP_ULP setsockopt cannot be used for mptcp because it's already
used internally to plumb subflow (tcp) sockets to the mptcp layer.

syzbot managed to trigger a crash for mptcp connections that are
in fallback mode:

KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
CPU: 1 PID: 1083 Comm: syz-executor.3 Not tainted 5.16.0-rc2-syzkaller #0
RIP: 0010:tls_build_proto net/tls/tls_main.c:776 [inline]
[..]
 __tcp_set_ulp net/ipv4/tcp_ulp.c:139 [inline]
 tcp_set_ulp+0x428/0x4c0 net/ipv4/tcp_ulp.c:160
 do_tcp_setsockopt+0x455/0x37c0 net/ipv4/tcp.c:3391
 mptcp_setsockopt+0x1b47/0x2400 net/mptcp/sockopt.c:638

Remove support for TCP_ULP setsockopt.

Fixes: d9e4c12918 ("mptcp: only admit explicitly supported sockopt")
Reported-by: syzbot+1fd9b69cde42967d1add@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14 18:49:39 -08:00