Commit Graph

903445 Commits

Author SHA1 Message Date
Andrii Nakryiko
babf316409 bpf: Add bpf_link_new_file that doesn't install FD
Add bpf_link_new_file() API for cases when we need to ensure anon_inode is
successfully created before we proceed with expensive BPF program attachment
procedure, which will require equally (if not more so) expensive and
potentially failing compensation detachment procedure just because anon_inode
creation failed. This API allows to simplify code by ensuring first that
anon_inode is created and after BPF program is attached proceed with
fd_install() that can't fail.

After anon_inode file is created, link can't be just kfree()'d anymore,
because its destruction will be performed by deferred file_operations->release
call. For this, bpf_link API required specifying two separate operations:
release() and dealloc(), former performing detachment only, while the latter
frees memory used by bpf_link itself. dealloc() needs to be specified, because
struct bpf_link is frequently embedded into link type-specific container
struct (e.g., struct bpf_raw_tp_link), so bpf_link itself doesn't know how to
properly free the memory. In case when anon_inode file was successfully
created, but subsequent BPF attachment failed, bpf_link needs to be marked as
"defunct", so that file's release() callback will perform only memory
deallocation, but no detachment.

Convert raw tracepoint and tracing attachment to new API and eliminate
detachment from error handling path.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200309231051.1270337-1-andriin@fb.com
2020-03-11 14:02:48 +01:00
Nicolas Cavallari
ba32679cac mac80211: Do not send mesh HWMP PREQ if HWMP is disabled
When trying to transmit to an unknown destination, the mesh code would
unconditionally transmit a HWMP PREQ even if HWMP is not the current
path selection algorithm.

Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Link: https://lore.kernel.org/r/20200305140409.12204-1-cavallar@lri.fr
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-11 09:04:14 +01:00
Jakub Kicinski
5cde05c61c nl80211: add missing attribute validation for channel switch
Add missing attribute validation for NL80211_ATTR_OPER_CLASS
to the netlink policy.

Fixes: 1057d35ede ("cfg80211: introduce TDLS channel switch commands")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20200303051058.4089398-4-kuba@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-11 08:58:39 +01:00
Jakub Kicinski
056e9375e1 nl80211: add missing attribute validation for beacon report scanning
Add missing attribute validation for beacon report scanning
to the netlink policy.

Fixes: 1d76250bd3 ("nl80211: support beacon report scanning")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20200303051058.4089398-3-kuba@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-11 08:58:31 +01:00
Jakub Kicinski
0e1a1d853e nl80211: add missing attribute validation for critical protocol indication
Add missing attribute validation for critical protocol fields
to the netlink policy.

Fixes: 5de1798489 ("cfg80211: introduce critical protocol indication from user-space")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20200303051058.4089398-2-kuba@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-11 08:58:27 +01:00
David S. Miller
96ee187bad Merge branch 'ethtool-consolidate-irq-coalescing-part-3'
Jakub Kicinski says:

====================
ethtool: consolidate irq coalescing - part 3

Convert more drivers following the groundwork laid in a recent
patch set [1] and continued in [2]. The aim of the effort is to
consolidate irq coalescing parameter validation in the core.

This set converts 15 drivers in drivers/net/ethernet.
3 more conversion sets to come.

None of the drivers here checked all unsupported parameters.

[1] https://lore.kernel.org/netdev/20200305051542.991898-1-kuba@kernel.org/
[2] https://lore.kernel.org/netdev/20200306010602.1620354-1-kuba@kernel.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:54 -07:00
Jakub Kicinski
d13f1167ab net: gemini: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:53 -07:00
Jakub Kicinski
009ab69b4b net: cxgb4vf: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:53 -07:00
Jakub Kicinski
5608c64179 net: cxgb4: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:53 -07:00
Jakub Kicinski
62923b6abe net: cxgb3: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:53 -07:00
Jakub Kicinski
d824178d0f net: cxgb2: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:53 -07:00
Jakub Kicinski
bd4be35b4a net: mlx4: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:53 -07:00
Jakub Kicinski
812df69beb net: liquidio: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:53 -07:00
Jakub Kicinski
659d0760b0 net: bna: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:53 -07:00
Jakub Kicinski
3eb2efbea1 net: tg3: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:53 -07:00
Jakub Kicinski
f6f508c07a net: bcmgenet: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject all unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:53 -07:00
Jakub Kicinski
a0dadb331d net: bnx2x: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:53 -07:00
Jakub Kicinski
05c531452f net: bnx2: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:53 -07:00
Jakub Kicinski
f4a76615f0 net: systemport: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject most of unsupported
parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:52 -07:00
Jakub Kicinski
fcca747f18 net: aquantia: reject all unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver only rejected some of the unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:52 -07:00
Jakub Kicinski
8e4f90caf0 net: ena: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Sameeh Jubran <sameehj@amazon.com>
Acked-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:52 -07:00
Heiner Kallweit
314a9cbbfb r8169: simplify getting stats by using netdev_stats_to_stats64
Let netdev_stats_to_stats64() do the copy work for us.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:23:44 -07:00
Heiner Kallweit
047521d7b1 r8169: let rtl8169_mark_to_asic clear rx descriptor field opts2
Clearing opts2 belongs to preparing the descriptor for DMA engine use.
Therefore move it into rtl8169_mark_to_asic().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:23:23 -07:00
David S. Miller
6ee2425804 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2020-03-10

This series contains updates to ice and iavf drivers.

Cleaned up unnecessary parenthesis, which was pointed out by Sergei
Shtylyov.

Mitch updates the iavf and ice drivers to expand the limitation on the
number of queues that the driver can support to account for the newer
800-series capabilities.

Brett cleans up the error messages for both SR-IOV and non SR-IOV use
cases.  Fixed the logic when the ice driver is removed and a bare-metal
VF is passing traffic, which was causing a transmit hang on the VF.
Updated the ice driver to display "Link detected" field via ethtool,
when the driver is in safe mode.  Updated ice driver to properly set
VLAN pruning when transmit anti-spoof is off.

Avinash fixed a corner case in DCB, when switching from IEEE to CEE
mode, the DCBX mode does not get properly updated.

Dave updates the logic when switching from software DCB to firmware DCB
to renegotiate DCBX to ensure the firmware agent has up to date
information about the DCB settings of the link partner.

Lukasz increases the PF's mailbox receive queue size to the maximum to
prevent potential bottleneck or slow down occurring from the PF's
mailbox receive queue being full.

Bruce updates the ice driver to use strscpy() instead of strlcpy().
Cleaned up variable names that were not very descriptive with names that
had more meaning.

Anirudh replaces the use of ENOTSUPP with EOPNOTSUPP in the ice driver.

Jake fixed up a function header comment to properly reflect the variable
size and use.

v2: Dropped patch 5 of the original series, where Tony added tunnel
    offload support.  Based on community feedback, the patch needed
    changes, so giving Tony additional time to work on those changes and
    not hold up the remaining changes in the series.
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:20:03 -07:00
DENG Qingfang
13e787ca82 net: dsa: mt7530: fix macro MIRROR_PORT
The inner pair of parentheses should be around the variable x

Fixes: 37feab6076 ("net: dsa: mt7530: add support for port mirroring")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:12:54 -07:00
George McCollister
469b390e1b net: dsa: microchip: use delayed_work instead of timer + work
Simplify ksz_common.c by using delayed_work instead of a combination of
timer and work.

Signed-off-by: George McCollister <george.mccollister@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:10:19 -07:00
David S. Miller
2165fdf4bc Merge branch 's390-qeth-fixes'
Julian Wiedmann says:

====================
s390/qeth: fixes 2020-03-10

This fixes three minor issues:
1) a setup parameter gets cleared unnecessarily when the HW config
   changes,
2) insufficient error handling when initially filling the RX ring, and
3) a rarely used worker that needs to be cancelled during tear down.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:07:49 -07:00
Julian Wiedmann
0e635c2a87 s390/qeth: cancel RX reclaim work earlier
When qeth's napi poll code fails to refill an entirely empty RX ring, it
kicks off buffer_reclaim_work to try again later.

Make sure that this worker is cancelled when setting the qeth device
offline. Otherwise a RX refill action can unexpectedly end up running
concurrently to bigger re-configurations (eg. resizing the buffer pool),
without any locking.

Fixes: b333293058 ("qeth: add support for af_iucv HiperSockets transport")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:07:49 -07:00
Julian Wiedmann
1741385280 s390/qeth: handle error when backing RX buffer
qeth_init_qdio_queues() fills the RX ring with an initial set of
RX buffers. If qeth_init_input_buffer() fails to back one of the RX
buffers with memory, we need to bail out and report the error.

Fixes: 4a71df5004 ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:07:49 -07:00
Julian Wiedmann
240c194849 s390/qeth: don't reset default_out_queue
When an OSA device in prio-queue setup is reduced to 1 TX queue due to
HW restrictions, we reset its the default_out_queue to 0.

In the old code this was needed so that qeth_get_priority_queue() gets
the queue selection right. But with proper multiqueue support we already
reduced dev->real_num_tx_queues to 1, and so the stack puts all traffic
on txq 0 without even calling .ndo_select_queue.

Thus we can preserve the user's configuration, and apply it if the OSA
device later re-gains support for multiple TX queues.

Fixes: 73dc2daf11 ("s390/qeth: add TX multiqueue support for OSA devices")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:07:49 -07:00
David S. Miller
377bb76444 Merge branch 'flow_offload-follow-ups-to-HW-stats-type-patchset'
Jiri Pirko says:

====================
flow_offload: follow-ups to HW stats type patchset

This patchset includes couple of patches in reaction to the discussions
to the original HW stats patchset. The first patch is a fix,
the other two patches are basically cosmetics.
====================

Acked-by: Edward Cree <ecree@solarflare.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:04:19 -07:00
Jiri Pirko
a16fa28984 flow_offload: restrict driver to pass one allowed bit to flow_action_hw_stats_types_check()
The intention of this helper was to allow driver to specify one type
that it supports, so not only "any" value would pass. So make the API
more strict and allow driver to pass only 1 bit that is going
to be checked.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:04:19 -07:00
Jiri Pirko
42d5fe5f9c flow_offload: turn hw_stats_type into dedicated enum
Put the values into enum and add an enum to define the bits.

Suggested-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:04:19 -07:00
Jiri Pirko
a393daa899 flow_offload: fix allowed types check
Change the check to see if the passed allowed type bit is enabled.

Fixes: 319a1d1947 ("flow_offload: check for basic action hw stats type")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:04:19 -07:00
David S. Miller
a2d8bf77a2 Merge branch 'MACSec-bugfixes-related-to-MAC-address-change'
Igor Russkikh says:

====================
MACSec bugfixes related to MAC address change

We found out that there's an issue in MACSec code when the MAC address
is changed.
Both s/w and offloaded implementations don't update SCI when the MAC
address changes at the moment, but they should do so, because SCI contains
MAC in its first 6 octets.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:59:32 -07:00
Dmitry Bogdanov
09f4136c5d net: macsec: invoke mdo_upd_secy callback when mac address changed
Notify the offload engine about MAC address change to reconfigure it
accordingly.

Fixes: 3cf3227a21 ("net: macsec: hardware offloading infrastructure")
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:59:32 -07:00
Dmitry Bogdanov
6fc498bc82 net: macsec: update SCI upon MAC address change.
SCI should be updated, because it contains MAC in its first 6 octets.

Fixes: c09440f7dc ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:59:32 -07:00
Juliet Kim
7d7195a026 ibmvnic: Do not process device remove during device reset
The ibmvnic driver does not check the device state when the device
is removed. If the device is removed while a device reset is being
processed, the remove may free structures needed by the reset,
causing an oops.

Fix this by checking the device state before processing device remove.

Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:50:42 -07:00
David S. Miller
79c57bffeb Merge branch 'enetc-Support-extended-BD-rings-at-runtime'
Claudiu Manoil says:

====================
enetc: Support extended BD rings at runtime

First two patches are just misc code cleanup.
The 3rd patch prepares the Rx BD processing code to be extended
to processing both normal and extended BDs.
The last one adds extended Rx BD support for timestamping
without the need of a static config. Finally, the config option
FSL_ENETC_HW_TIMESTAMPING can be dropped.
Care was taken not to impact non-timestamping usecases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:48:54 -07:00
Claudiu Manoil
434cebabd3 enetc: Add dynamic allocation of extended Rx BD rings
Hardware timestamping support (PTP) on Rx requires extended
buffer descriptors, double the size of normal Rx descriptors.
On the current controller revision only the timestamping offload
requires extended Rx descriptors.
Since Rx timestamping can be turned on/off at runtime, make Rx ring
allocation configurable at runtime too. As a result, the static
config option FSL_ENETC_HW_TIMESTAMPING can be dropped and the
extended descriptors can be used only when Rx timestamping gets
activated.
The extension has the same size as the base descriptor, making
the descriptor iterators easy to update for the extended case.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:48:54 -07:00
Claudiu Manoil
714239ac63 enetc: Clean up Rx BD iteration
Improve maintainability of the code iterating the Rx buffer
descriptors to prepare it to support iterating extended Rx BD
descriptors as well.
Don't increment by one the h/w descriptor pointers explicitly,
provide an iterator that takes care of the h/w details.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:48:54 -07:00
Claudiu Manoil
a784c92ee2 enetc: Clean up of ehtool stats len
Refactor the stats len computation code to make it easier
to add new stats counters.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:48:54 -07:00
Claudiu Manoil
9ff3dd7b84 enetc: Drop redundant device node check
The existence of the DT port node is the first thing checked
at probe time, and probing won't reach this point if the node
is missing.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:48:54 -07:00
Lukas Wunner
1e09e5818b pktgen: Allow on loopback device
When pktgen is used to measure the performance of dev_queue_xmit()
packet handling in the core, it is preferable to not hand down
packets to a low-level Ethernet driver as it would distort the
measurements.

Allow using pktgen on the loopback device, thus constraining
measurements to core code.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:44:59 -07:00
Jiri Pirko
62751b6808 flow_offload: use flow_action_for_each in flow_action_mixed_hw_stats_types_check()
Instead of manually iterating over entries, use flow_action_for_each
helper. Move the helper and wrap it to fit to 80 cols on the way.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:43:25 -07:00
Karsten Graul
ece0d7bd74 net/smc: cancel event worker during device removal
During IB device removal, cancel the event worker before the device
structure is freed.

Fixes: a4cf0443c4 ("smc: introduce SMC as an IB-client")
Reported-by: syzbot+b297c6825752e7a07272@syzkaller.appspotmail.com
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:40:33 -07:00
Hangbin Liu
60380488e4 ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface
Rafał found an issue that for non-Ethernet interface, if we down and up
frequently, the memory will be consumed slowly.

The reason is we add allnodes/allrouters addressed in multicast list in
ipv6_add_dev(). When link down, we call ipv6_mc_down(), store all multicast
addresses via mld_add_delrec(). But when link up, we don't call ipv6_mc_up()
for non-Ethernet interface to remove the addresses. This makes idev->mc_tomb
getting bigger and bigger. The call stack looks like:

addrconf_notify(NETDEV_REGISTER)
	ipv6_add_dev
		ipv6_dev_mc_inc(ff01::1)
		ipv6_dev_mc_inc(ff02::1)
		ipv6_dev_mc_inc(ff02::2)

addrconf_notify(NETDEV_UP)
	addrconf_dev_config
		/* Alas, we support only Ethernet autoconfiguration. */
		return;

addrconf_notify(NETDEV_DOWN)
	addrconf_ifdown
		ipv6_mc_down
			igmp6_group_dropped(ff02::2)
				mld_add_delrec(ff02::2)
			igmp6_group_dropped(ff02::1)
			igmp6_group_dropped(ff01::1)

After investigating, I can't found a rule to disable multicast on
non-Ethernet interface. In RFC2460, the link could be Ethernet, PPP, ATM,
tunnels, etc. In IPv4, it doesn't check the dev type when calls ip_mc_up()
in inetdev_event(). Even for IPv6, we don't check the dev type and call
ipv6_add_dev(), ipv6_dev_mc_inc() after register device.

So I think it's OK to fix this memory consumer by calling ipv6_mc_up() for
non-Ethernet interface.

v2: Also check IFF_MULTICAST flag to make sure the interface supports
    multicast

Reported-by: Rafał Miłecki <zajec5@gmail.com>
Tested-by: Rafał Miłecki <zajec5@gmail.com>
Fixes: 74235a25c6 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels")
Fixes: 1666d49e1d ("mld: do not remove mld souce list info when set link down")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:37:49 -07:00
Linus Torvalds
f35111a946 Another update for the .clang-format macro list
It has been a while since the last time I sent one!
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAl5nujkACgkQGXyLc2ht
 IW3cUBAAtLrfdsFPr/uLeFpHFIPm+X3ZAq/nOQBdrerR4GJfMD/zDN5FjKiXlsfo
 KWIk/kG0nyqEnRrwX/oh8XqGdx3kgGuniZY9Sjo9XOMggS4pdYZBYo/6LVnJ1Mgo
 2AOb9z+5wqOvsOLeIH46M0avOk2V2+hdlClx/ns6WeQnTipqbMzgWAfdnsTWk3i7
 ApK4Sc/1yQPKz1Nhr/REcADU38x2fyfHj1r2EvCm2E3ljC/che5LkvCmZHr5Iqud
 iC/BooxdeCsFfN4mvl17IBsK1iveeRt+ZFW4NsijiMNBYrfUQ74PxEre+iCn1uk3
 +KLsBnYVsqkrczyGPHjtZJnonEOjte7BQp076vA2QcTRDS3akAfdm94QsUwHJq39
 iJVo8LHJhLPoIgfdhml3RFDLSA5FB7O5feC6gDNgPnq7X9FqNeNvJRJzqYTe3TtE
 UqXpbbqFf4I+1GOBOmOSOL8eSBfnJVoTOwVJsvcMF1SBfwpyYyQApOXsIQjKcZH+
 KGIoUTd6bS6xc4FMx/6/O2Z7aWUaH72X1jADo0uP947BhQGQDcnRmH/cu+IqXceB
 sHs6a2a3wuJ9iqDb3fT1tEF/k2n9X5y9PPYqRynD1320E9OWihavG89+3m5BrfJd
 NFSxp8Buw72hjbTJDRv79RtVhZcYdUOAauO26IX+3K5X9ThZfrk=
 =t4To
 -----END PGP SIGNATURE-----

Merge tag 'clang-format-for-linus-v5.6-rc6' of git://github.com/ojeda/linux

Pull clang-format update from Miguel Ojeda:
 "Another update for the .clang-format macro list

  It has been a while since the last time I sent one!"

* tag 'clang-format-for-linus-v5.6-rc6' of git://github.com/ojeda/linux:
  clang-format: Update with the latest for_each macro list
2020-03-10 15:36:27 -07:00
Shakeel Butt
d752a49865 net: memcg: late association of sock to memcg
If a TCP socket is allocated in IRQ context or cloned from unassociated
(i.e. not associated to a memcg) in IRQ context then it will remain
unassociated for its whole life. Almost half of the TCPs created on the
system are created in IRQ context, so, memory used by such sockets will
not be accounted by the memcg.

This issue is more widespread in cgroup v1 where network memory
accounting is opt-in but it can happen in cgroup v2 if the source socket
for the cloning was created in root memcg.

To fix the issue, just do the association of the sockets at the accept()
time in the process context and then force charge the memory buffer
already used and reserved by the socket.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:33:05 -07:00
Shakeel Butt
e876ecc67d cgroup: memcg: net: do not associate sock with unrelated cgroup
We are testing network memory accounting in our setup and noticed
inconsistent network memory usage and often unrelated cgroups network
usage correlates with testing workload. On further inspection, it
seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in
irq context specially for cgroup v1.

mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context
and kind of assumes that this can only happen from sk_clone_lock()
and the source sock object has already associated cgroup. However in
cgroup v1, where network memory accounting is opt-in, the source sock
can be unassociated with any cgroup and the new cloned sock can get
associated with unrelated interrupted cgroup.

Cgroup v2 can also suffer if the source sock object was created by
process in the root cgroup or if sk_alloc() is called in irq context.
The fix is to just do nothing in interrupt.

WARNING: Please note that about half of the TCP sockets are allocated
from the IRQ context, so, memory used by such sockets will not be
accouted by the memcg.

The stack trace of mem_cgroup_sk_alloc() from IRQ-context:

CPU: 70 PID: 12720 Comm: ssh Tainted:  5.6.0-smp-DEV #1
Hardware name: ...
Call Trace:
 <IRQ>
 dump_stack+0x57/0x75
 mem_cgroup_sk_alloc+0xe9/0xf0
 sk_clone_lock+0x2a7/0x420
 inet_csk_clone_lock+0x1b/0x110
 tcp_create_openreq_child+0x23/0x3b0
 tcp_v6_syn_recv_sock+0x88/0x730
 tcp_check_req+0x429/0x560
 tcp_v6_rcv+0x72d/0xa40
 ip6_protocol_deliver_rcu+0xc9/0x400
 ip6_input+0x44/0xd0
 ? ip6_protocol_deliver_rcu+0x400/0x400
 ip6_rcv_finish+0x71/0x80
 ipv6_rcv+0x5b/0xe0
 ? ip6_sublist_rcv+0x2e0/0x2e0
 process_backlog+0x108/0x1e0
 net_rx_action+0x26b/0x460
 __do_softirq+0x104/0x2a6
 do_softirq_own_stack+0x2a/0x40
 </IRQ>
 do_softirq.part.19+0x40/0x50
 __local_bh_enable_ip+0x51/0x60
 ip6_finish_output2+0x23d/0x520
 ? ip6table_mangle_hook+0x55/0x160
 __ip6_finish_output+0xa1/0x100
 ip6_finish_output+0x30/0xd0
 ip6_output+0x73/0x120
 ? __ip6_finish_output+0x100/0x100
 ip6_xmit+0x2e3/0x600
 ? ipv6_anycast_cleanup+0x50/0x50
 ? inet6_csk_route_socket+0x136/0x1e0
 ? skb_free_head+0x1e/0x30
 inet6_csk_xmit+0x95/0xf0
 __tcp_transmit_skb+0x5b4/0xb20
 __tcp_send_ack.part.60+0xa3/0x110
 tcp_send_ack+0x1d/0x20
 tcp_rcv_state_process+0xe64/0xe80
 ? tcp_v6_connect+0x5d1/0x5f0
 tcp_v6_do_rcv+0x1b1/0x3f0
 ? tcp_v6_do_rcv+0x1b1/0x3f0
 __release_sock+0x7f/0xd0
 release_sock+0x30/0xa0
 __inet_stream_connect+0x1c3/0x3b0
 ? prepare_to_wait+0xb0/0xb0
 inet_stream_connect+0x3b/0x60
 __sys_connect+0x101/0x120
 ? __sys_getsockopt+0x11b/0x140
 __x64_sys_connect+0x1a/0x20
 do_syscall_64+0x51/0x200
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
Fixes: 2d75807383 ("mm: memcontrol: consolidate cgroup socket tracking")
Fixes: d979a39d72 ("cgroup: duplicate cgroup reference when cloning sockets")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 15:33:05 -07:00