Commit Graph

1105052 Commits

Author SHA1 Message Date
Jakub Kicinski
68d5428931 Merge branch 'mlxsw-remove-xm-support'
Ido Schimmel says:

====================
mlxsw: Remove XM support

The XM was supposed to be an external device connected to the
Spectrum-{2,3} ASICs using dedicated Ethernet ports. Its purpose was to
increase the number of routes that can be offloaded to hardware. This was
achieved by having the ASIC act as a cache that refers cache misses to the
XM where the FIB is stored and LPM lookup is performed.

Testing was done over an emulator and dedicated setups in the lab, but
the product was discontinued before shipping to customers.

Therefore, in order to remove dead code and reduce complexity of the
code base, revert the three patchsets that added XM support.
====================

Link: https://lore.kernel.org/r/20220613132116.2021055-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-14 21:51:08 -07:00
Petr Machata
87c0a3c676 mlxsw: Revert "Prepare for XM implementation - LPM trees"
This reverts commit 923ba95ea2 ("Merge branch
'mlxsw-spectrum-prepare-for-xm-implementation-lpm-trees'").

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-14 21:51:05 -07:00
Petr Machata
725ff53204 mlxsw: Revert "Prepare for XM implementation - prefix insertion and removal"
This reverts commit e7086213f7 ("Merge branch
'mlxsw-spectrum-prepare-for-xm-implementation-prefix-insertion-and-removal'").

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-14 21:51:05 -07:00
Petr Machata
6a4b02b8fa mlxsw: Revert "Introduce initial XM router support"
This reverts commit 75c2a8fe8e ("Merge branch
'mlxsw-introduce-initial-xm-router-support'").

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-14 21:51:05 -07:00
Jakub Kicinski
6ac6dc746d Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
mlx5-next: updates 2022-06-14

1) Updated HW bits and definitions for upcoming features
 1.1) vport debug counters
 1.2) flow meter
 1.3) Execute ASO action for flow entry
 1.4) enhanced CQE compression

2) Add ICM header-modify-pattern RDMA API

Leon Says
=========

SW steering manipulates packet's header using "modifying header" actions.
Many of these actions do the same operation, but use different data each time.
Currently we create and keep every one of these actions, which use expensive
and limited resources.

Now we introduce a new mechanism - pattern and argument, which splits
a modifying action into two parts:
1. action pattern: contains the operations to be applied on packet's header,
mainly set/add/copy of fields in the packet
2. action data/argument: contains the data to be used by each operation
in the pattern.

This way we reuse same patterns with different arguments to create new
modifying actions, and since many actions share the same operations, we end
up creating a small number of patterns that we keep in a dedicated cache.

These modify header patterns are implemented as new type of ICM memory,
so the following kernel patch series add the support for this new ICM type.
==========

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Add bits and fields to support enhanced CQE compression
  net/mlx5: Remove not used MLX5_CAP_BITS_RW_MASK
  net/mlx5: group fdb cleanup to single function
  net/mlx5: Add support EXECUTE_ASO action for flow entry
  net/mlx5: Add HW definitions of vport debug counters
  net/mlx5: Add IFC bits and enums for flow meter
  RDMA/mlx5: Support handling of modify-header pattern ICM area
  net/mlx5: Manage ICM of type modify-header pattern
  net/mlx5: Introduce header-modify-pattern ICM properties
====================

Link: https://lore.kernel.org/r/20220614184028.51548-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-14 19:09:39 -07:00
Jakub Kicinski
7e5e8ec7db docs: tls: document the TLS_TX_ZEROCOPY_RO
Add missing documentation for the TLS_TX_ZEROCOPY_RO opt-in.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Link: https://lore.kernel.org/r/20220610180212.110590-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-14 11:14:17 +02:00
Marco Bonelli
19d62f5eea ethtool: Fix and simplify ethtool_convert_link_mode_to_legacy_u32()
Fix the implementation of ethtool_convert_link_mode_to_legacy_u32(), which
is supposed to return false if src has bits higher than 31 set. The current
implementation uses the complement of bitmap_fill(ext, 32) to test high
bits of src, which is wrong as bitmap_fill() fills _with long granularity_,
and sizeof(long) can be > 4. No users of this function currently check the
return value, so the bug was dormant.

Also remove the check for __ETHTOOL_LINK_MODE_MASK_NBITS > 32, as the enum
ethtool_link_mode_bit_indices contains far beyond 32 values. Using
find_next_bit() to test the src bitmask works regardless of this anyway.

Signed-off-by: Marco Bonelli <marco@mebeim.net>
Link: https://lore.kernel.org/r/20220609134900.11201-1-marco@mebeim.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-13 23:11:35 -07:00
Rasmus Villemoes
bfa54812f0 net: phy: fixed_phy: set phy_mask before calling mdiobus_register()
There's no point probing for phys on this artificial bus, so we can
save a little bit of boot time by telling mdiobus_register() not to do
that.

This doesn't have any functional change, since, at this point,
fixed_mdio_read() returns 0xffff for all addresses/registers, so

  mdiobus_scan() -> get_phy_device() -> get_phy_c22_id()

will return -ENODEV, which is just ignored.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20220606200208.1665417-1-linux@rasmusvillemoes.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-13 23:11:24 -07:00
Ofer Levi
cdcdce948d net/mlx5: Add bits and fields to support enhanced CQE compression
Expose ifc bits and add needed structure fields and methods to
support enhanced CQE compression feature.
The enhanced CQE compression feature improves cpu utiliziation with
better packet latency from nic to host.

Signed-off-by: Ofer Levi <oferle@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13 14:59:06 -07:00
Shay Drory
d107ba1f7c net/mlx5: Remove not used MLX5_CAP_BITS_RW_MASK
Remove not used MLX5_CAP_BITS_RW_MASK.
While at it, remove CAP_MASK, MLX5_CAP_OFF_CMDIF_CSUM
and MLX5_DEV_CAP_FLAG_*, since MLX5_CAP_BITS_RW_MASK
was their only user.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13 14:59:06 -07:00
Shay Drory
684f062c97 net/mlx5: group fdb cleanup to single function
Currently, the allocation of fdb software objects are done is single
function, oppose to the cleanup of them.
Group the cleanup of fdb software objects to single function.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13 14:59:06 -07:00
Jianbo Liu
91707779a4 net/mlx5: Add support EXECUTE_ASO action for flow entry
Attach flow meter to FTE with object id and index.
Use metadata register C5 to store the packet color meter result.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13 14:59:06 -07:00
Saeed Mahameed
3e94e61bd4 net/mlx5: Add HW definitions of vport debug counters
total_q_under_processor_handle - number of queues in error state due to an
async error or errored command.

send_queue_priority_update_flow - number of QP/SQ priority/SL update
events.

cq_overrun - number of times CQ entered an error state due to an
overflow.

async_eq_overrun -number of time an EQ mapped to async events was
overrun.

comp_eq_overrun - number of time an EQ mapped to completion events was
overrun.

quota_exceeded_command - number of commands issued and failed due to quota
exceeded.

invalid_command - number of commands issued and failed dues to any reason
other than quota exceeded.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13 14:59:05 -07:00
Jianbo Liu
f5d23ee137 net/mlx5: Add IFC bits and enums for flow meter
Add/extend structure layouts and defines for flow meter.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13 14:59:05 -07:00
Yevgeny Kliteynik
a6492af380 RDMA/mlx5: Support handling of modify-header pattern ICM area
Add support for allocate/deallocate and registering MR of the new type
of ICM area. Support exists only for devices that support sw_owner_v2.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13 14:58:01 -07:00
Yevgeny Kliteynik
667658364b net/mlx5: Manage ICM of type modify-header pattern
Added support for managing new type of ICM for devices that
support sw_owner_v2.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13 14:58:01 -07:00
Yevgeny Kliteynik
795e10b450 net/mlx5: Introduce header-modify-pattern ICM properties
Added new fields for device memory capabilities, in order to
support creation of ICM memory for modify header patterns.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13 14:58:01 -07:00
Yajun Deng
c04245328d net: make __sys_accept4_file() static
__sys_accept4_file() isn't used outside of the file, make it static.

As the same time, move file_flags and nofile parameters into
__sys_accept4_file().

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13 13:47:15 +01:00
Eric Dumazet
219160be49 tcp: sk_forced_mem_schedule() optimization
sk_memory_allocated_add() has three callers, and returns
to them @memory_allocated.

sk_forced_mem_schedule() is one of them, and ignores
the returned value.

Change sk_memory_allocated_add() to return void.

Change sock_reserve_memory() and __sk_mem_raise_allocated()
to call sk_memory_allocated().

This removes one cache line miss [1] for RPC workloads,
as first skbs in TCP write queue and receive queue go through
sk_forced_mem_schedule().

[1] Cache line holding tcp_memory_allocated.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13 13:35:25 +01:00
Parthiban Veerasooran
4066bf4ce3 net: smsc95xx: add support for Microchip EVB-LAN8670-USB
This patch adds support for Microchip's EVB-LAN8670-USB 10BASE-T1S
ethernet device to the existing smsc95xx driver by adding the new
USB VID/PID pairs.

Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13 13:33:14 +01:00
Yinjun Zhang
5f30671d8d nfp: support 48-bit DMA addressing for NFP3800
48-bit DMA addressing is supported in NFP3800 HW and implemented
in NFDK firmware, so enable this feature in driver now. Note that
with this change, NFD3 firmware, which doesn't implement 48-bit
DMA, cannot be used for NFP3800 any more.

RX free list descriptor, used by both NFD3 and NFDK, is also modified
to support 48-bit DMA. That's OK because the top bits is always get
set to 0 when assigned with 40-bit address.

Based on initial work of Jakub Kicinski <jakub.kicinski@netronome.com>.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13 13:31:39 +01:00
David S. Miller
11a1585f26 Merge branch 'ipa-refactoring'
Alex Elder says:

====================
net: ipa: simple refactoring

This series contains some minor code improvements.

The first patch verifies that the configuration is compatible with a
recently-defined limit.  The second and third rename two fields so
they better reflect their use in the code.  The next gets rid of an
empty function by reworking its only caller.

The last two begin to remove the assumption that an event ring is
associated with a single channel.  Eventually we'll support having
multiple channels share an event ring but some more needs to be done
before that can happen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13 12:01:58 +01:00
Alex Elder
bcec9ecbaf net: ipa: derive channel from transaction
In gsi_channel_tx_queued(), we report when a transaction gets passed
to hardware.  Change that function so it takes transaction rather
than a channel as its argument, and derive the channel from the
transaction.  Rename the function accordingly.

Delete the header comments above the function definition; the ones
above the declaration in "gsi_private.h" should suffice.  In
addition, the comments above gsi_channel_tx_update() do a fine job
of explaining what's going on.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13 12:01:58 +01:00
Alex Elder
7dd9558fed net: ipa: determine channel from event
Each event in an event ring describes the TRE whose completion
caused the event.  Currently, every event ring is dedicated to a
single channel, so the channel is easily derived from the event
ring.

An event ring can actually be shared by more than one channel
though, and to distinguish events for one channel from another, the
event structure contains a field indicating which channel the event
is associated with.

In gsi_event_trans(), use the channel ID in an event to determine
which channel the event is for.  This makes the channel pointer now
passed to that function irrelevant; pass the GSI pointer to that
function instead.

And although it shouldn't happen, warn if an event arrives that
records a channel ID that's not in use, or if the event does not
have a transaction associated with it.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13 12:01:58 +01:00
Alex Elder
983a1a3081 net: ipa: simplify endpoint transaction completion
When a GSI transaction completes, ipa_endpoint_trans_complete() is
eventually called.  That handles TX and RX completions separately,
but ipa_endpoint_tx_complete() is a no-op.

Instead, have ipa_endpoint_trans_complete() return immediately for a
TX transaction, and incorporate code from ipa_endpoint_rx_complete()
to handle RX transactions.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13 12:01:58 +01:00
Alex Elder
317595d2ce net: ipa: rename endpoint->trans_tre_max
The trans_tre_max field of the IPA endpoint structure is only used
to limit the number of fragments allowed for an SKB being prepared
for transmission.  Recognizing that, rename the field skb_frag_max,
and reduce its value by 1.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13 12:01:58 +01:00
Alex Elder
88e03057e4 net: ipa: rename channel->tlv_count
Each GSI channel has a TLV FIFO of a certain size, specified in the
configuration data for an AP channel.  That size dictates the
maximum number of TREs that are allowed in a single transaction.

The only way that value is used after initialization is as a limit
on the number of TREs in a transaction; calling it "tlv_count"
isn't helpful, and in fact gsi_channel_trans_tre_max() exists to
sort of abstract it.

Instead, rename the channel->tlv_count field trans_tre_max, and get
rid of the helper function.  Update a couple of comments as well.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13 12:01:58 +01:00
Alex Elder
92f78f81ac net: ipa: verify command channel TLV count
In commit 8797972aff ("net: ipa: remove command info pool"), the
maximum number of IPA commands that would be sent in a single
transaction was defined.  That number can't exceed the size of the
TLV FIFO on the command channel, and we can check that at runtime.

To add this check, pass a new flag to gsi_channel_data_valid() to
indicate the channel being checked is being used for IPA commands.
Knowing that we can also verify the channel direction is correct.

Use a new local variable that refers to the command-specific portion
of the data being checked.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13 12:01:58 +01:00
Yinjun Zhang
27f2533bcc nfp: flower: support to offload pedit of IPv6 flowinto fields
Previously the traffic class field is ignored while firmware has
already supported to pedit flowinfo fields, including traffic
class and flow label, now add it back.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Link: https://lore.kernel.org/r/20220609080136.151830-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 22:23:17 -07:00
Bin Chen
10e11aa241 ethernet: Remove vf rate limit check for drivers
The commit a14857c27a ("rtnetlink: verify rate parameters for calls to
ndo_set_vf_rate") has been merged to master, so we can to remove the
now-duplicate checks in drivers.

Signed-off-by: Bin Chen <bin.chen@corigine.com>
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220609084717.155154-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 22:19:32 -07:00
Jakub Kicinski
68c51dd992 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
10GbE Intel Wired LAN Driver Updates 2022-06-09

Maximilian Heyne adds reporting of VF statistics on ixgbe via iproute2
interface.

Kai-Heng Feng removes duplicate defines from igb.

Jiaqing Zhao fixes typos in e1000, ixgb, and ixgbe drivers.

Julia Lawall fixes typos for fm10k, ixgbe, and ice drivers.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  drivers/net/ethernet/intel: fix typos in comments
  ixgbe: Fix typos in comments
  ixgb: Fix typos in comments
  e1000: Fix typos in comments
  igb: Remove duplicate defines
  drivers, ixgbe: export vf statistics
====================

Link: https://lore.kernel.org/r/20220609171257.2727150-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 22:07:32 -07:00
Jakub Kicinski
e10b02ee5b Merge branch 'net-reduce-tcp_memory_allocated-inflation'
Eric Dumazet says:

====================
net: reduce tcp_memory_allocated inflation

Hosts with a lot of sockets tend to hit so called TCP memory pressure,
leading to very bad TCP performance and/or OOM.

The problem is that some TCP sockets can hold up to 2MB of 'forward
allocations' in their per-socket cache (sk->sk_forward_alloc),
and there is no mechanism to make them relinquish their share
under mem pressure.
Only under some potentially rare events their share is reclaimed,
one socket at a time.

In this series, I implemented a per-cpu cache instead of a per-socket one.

Each CPU has a +1/-1 MB (256 pages on x86) forward alloc cache, in order
to not dirty tcp_memory_allocated shared cache line too often.

We keep sk->sk_forward_alloc values as small as possible, to meet
memcg page granularity constraint.

Note that memcg already has a per-cpu cache, although MEMCG_CHARGE_BATCH
is defined to 32 pages, which seems a bit small.

Note that while this cover letter mentions TCP, this work is generic
and supports TCP, UDP, DECNET, SCTP.
====================

Link: https://lore.kernel.org/r/20220609063412.2205738-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 16:21:40 -07:00
Eric Dumazet
0f2c269398 net: unexport __sk_mem_{raise|reduce}_allocated
These two helpers are only used from core networking.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 16:21:27 -07:00
Eric Dumazet
4890b686f4 net: keep sk->sk_forward_alloc as small as possible
Currently, tcp_memory_allocated can hit tcp_mem[] limits quite fast.

Each TCP socket can forward allocate up to 2 MB of memory, even after
flow became less active.

10,000 sockets can have reserved 20 GB of memory,
and we have no shrinker in place to reclaim that.

Instead of trying to reclaim the extra allocations in some places,
just keep sk->sk_forward_alloc values as small as possible.

This should not impact performance too much now we have per-cpu
reserves: Changes to tcp_memory_allocated should not be too frequent.

For sockets not using SO_RESERVE_MEM:
 - idle sockets (no packets in tx/rx queues) have zero forward alloc.
 - non idle sockets have a forward alloc smaller than one page.

Note:

 - Removal of SK_RECLAIM_CHUNK and SK_RECLAIM_THRESHOLD
   is left to MPTCP maintainers as a follow up.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 16:21:27 -07:00
Eric Dumazet
7c80b038d2 net: fix sk_wmem_schedule() and sk_rmem_schedule() errors
If sk->sk_forward_alloc is 150000, and we need to schedule 150001 bytes,
we want to allocate 1 byte more (rounded up to one page),
instead of 150001 :/

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 16:21:27 -07:00
Eric Dumazet
3cd3399dd7 net: implement per-cpu reserves for memory_allocated
We plan keeping sk->sk_forward_alloc as small as possible
in future patches.

This means we are going to call sk_memory_allocated_add()
and sk_memory_allocated_sub() more often.

Implement a per-cpu cache of +1/-1 MB, to reduce number
of changes to sk->sk_prot->memory_allocated, which
would otherwise be cause of false sharing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 16:21:26 -07:00
Eric Dumazet
0defbb0af7 net: add per_cpu_fw_alloc field to struct proto
Each protocol having a ->memory_allocated pointer gets a corresponding
per-cpu reserve, that following patches will use.

Instead of having reserved bytes per socket,
we want to have per-cpu reserves.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 16:21:26 -07:00
Eric Dumazet
100fdd1faf net: remove SK_MEM_QUANTUM and SK_MEM_QUANTUM_SHIFT
Due to memcg interface, SK_MEM_QUANTUM is effectively PAGE_SIZE.

This might change in the future, but it seems better to avoid the
confusion.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 16:21:26 -07:00
Eric Dumazet
e70f3c7012 Revert "net: set SK_MEM_QUANTUM to 4096"
This reverts commit bd68a2a854.

This change broke memcg on arches with PAGE_SIZE != 4096

Later, commit 2bb2f5fb21 ("net: add new socket option SO_RESERVE_MEM")
also assumed PAGE_SIZE==SK_MEM_QUANTUM

Following patches in the series will greatly reduce the over allocations
problem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 16:21:26 -07:00
Jakub Kicinski
5c281b4e52 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 15:55:32 -07:00
Linus Torvalds
aa3398fb4b Devicetree fixes for 5.19, part 2:
- More DT meta-schema check fixes from new bindings in merge window
 
 - Fix stale DT binding references from Mauro
 
 - Update various binding maintainers
 
 - Fix in arm,malidp properties to match reality
 
 - Add deprecated 'atheros' vendor prefix
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmKjj4oQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhwz85EACy/7Qvrv3BCJYW12lsWxzm69DXjMx/aXSO
 JJRN8hPOMGVb2S1278Hz8JpiRt+CjMyOLFMDgmuztieRd6DoHHVn5YU89T/8kkdv
 uzL1aT7R3X7Wwcs1OIIkPDikhtCsi74HPF6wp1mdxnXkfMj1sAd6ZGLoIpQc14qa
 Az6fwljlqj6sksDokSxjRvLtNT7Ncw6h8hYW2ALlDGeeMupbkOnA7M1OGYOsKiIS
 N6jf00dYTDgHcWzr7JJxb9ykSTHgzEy7ohkXAalSSkHp2Gl1wCuBJr7KnL+F59mg
 Z4Jp9XsU+KsqEOuSWmO2wOdfH8f8hlecU/zu1hq4uVwDv7VuugNdf4Ccqqe62Hje
 Qek4SXl4ChV+O9cVs09ww/yC2lfTJlES7R99oGL24Rx8ExpPRDXLrA6LVPhydHWf
 SK9vVcEEHeORM/in9Y8VKYlzoAG/eDPdd8ovunG9focwMizTFhU3vxhsdzcz4aGK
 VpEfKVDot/dNDxrC/CikyUldiY8GOONb+L+adux4kj0Y5Fz4W6w9YnRqha3DYtnr
 rXkUOW5v78dOFL6XQdtTW4rwJSHXUkMcYO7pxWvgY0e128qEV8+G6BxzSG0ihEET
 qcJ34C6wgPYdowP/tTIlOOFBOiPliMiAzrMLy6JFghGgnagNmK58kuTGfRoYdDDq
 VFlIkstHZw==
 =bYOd
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-fixes-for-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull more devicetree fixes from Rob Herring:

 - More DT meta-schema check fixes from new bindings in merge window

 - Fix stale DT binding references from Mauro

 - Update various binding maintainers

 - Fix in arm,malidp properties to match reality

 - Add deprecated 'atheros' vendor prefix

* tag 'devicetree-fixes-for-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: display: arm,malidp: remove bogus RQOS property
  dt-bindings: pinctrl: ralink: Fix 'enum' lists with duplicate entries
  dt-bindings: Drop more redundant 'maxItems/minItems' in if/then schemas
  dt-bindings: nvme: apple,nvme-ans: Drop 'maxItems' from 'apple,sart'
  MAINTAINERS: rectify entries for ARM DRM DRIVERS after dt conversion
  MAINTAINERS: update snps,axs10x-reset.yaml reference
  MAINTAINERS: update dongwoon,dw9807-vcm.yaml reference
  MAINTAINERS: update cortina,gemini-ethernet.yaml reference
  dt-bindings: mfd: rk808: update rockchip,rk808.yaml reference
  dt-bindings: reset: update st,stih407-powerdown.yaml references
  dt-bindings: arm: update vexpress-config.yaml references
  dt-bindings: interrupt-controller: update brcm,l2-intc.yaml reference
  dt-bindings: mfd: bd9571mwv: update rohm,bd9571mwv.yaml reference
  dt-bindings: update Luca Ceresoli's e-mail address
  dt-bindings: msm: update maintainers list with proper id
  dt-bindings: vendor-prefixes: document deprecated Atheros
  dt-bindings: Update QCOM USB subsystem maintainer information
2022-06-10 11:57:36 -07:00
Linus Torvalds
1bc27dec7e Power management fixes for 5.19-rc2
- Fix CPUIDLE_FLAG_IRQ_ENABLE handling in intel_idle (Peter Zijlstra).
 
  - Allow all platforms to use the global poweroff handler and make
    non-syscall poweroff code paths work again (Dmitry Osipenko).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmKjkJUSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxugkQAKjDzcrVBffv4zOjijJS+ojnctRZShIw
 tWw/CUjm+VVAASSQ9XsaWFP/Eu3sA2tLOEaZddoAMCxzlwCgCVVZUgwChGmytiqr
 s4hJHyLkVAP8TszQoXvoyy5AIJI7Un33moXveGH9ZIbIAjtYckfInQJbyIsmVDF6
 VmOryqDxt+LS7hiXaAhEkExiy6HdfbmTt5f+tBO76p+eXtyj1hvkbUYrk7yX+Rv4
 6V+wDxZxy4LbhQDZQRzGYL3OBG33t9ThmVKi0Km2Aq6DXd5XIUMXZ9tYJ6+TYfe2
 scn6ZpTT87HLb8j96ZQUHU+MESvlBlpGdPGHwA2WORRqBL1sR0tsGqcfvHt5c57u
 mTrDFXaVs1vG4Zaa9JpxbrKwrNyi3xspG7EVKKtPfFHjREKhcLEh5k5F/x4ZQMY/
 U45hbEoLEWrP4Z68th7vL0T7E85tCW3+wp3FfzVkJ36H5cCy7CWAG5qpto0d3uo7
 6M3Xe+ms547R2zWWlGcdxynlhIfWK3lzYc1YZb6eoeb67m7Iyav2aehnOmko9wIO
 iNONbaB4SrwPdBi7BqaAX30G+IX/MDSu6EotI6FTJEWlJmBZPXryPBXaIgluhZWL
 8CurDLxNPFyXR3tj7BiLAYbIxGJwIqFHODS6cm6kioM8VA+WskcvLfE9+QbEXCEh
 tH+l1dzk+NYt
 =2OMR
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix an intel_idle issue introduced during the 5.16 development
  cycle and two recent regressions in the system reboot/poweroff code.

  Specifics:

   - Fix CPUIDLE_FLAG_IRQ_ENABLE handling in intel_idle (Peter Zijlstra)

   - Allow all platforms to use the global poweroff handler and make
     non-syscall poweroff code paths work again (Dmitry Osipenko)"

* tag 'pm-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE
  kernel/reboot: Fix powering off using a non-syscall code paths
  kernel/reboot: Use static handler for register_platform_power_off()
2022-06-10 11:49:27 -07:00
David Howells
d56fd98612 certs: Convert spaces in certs/Makefile to a tab
There's a rule in certs/Makefile for which the command begins with eight
spaces.  This results in:

        ../certs/Makefile:21: FORCE prerequisite is missing
        ../certs/Makefile:21: *** missing separator.  Stop.

Fix this by turning the spaces into a tab.

Fixes: addf466389 ("certs: Check that builtin blacklist hashes are valid")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Mickaël Salaün <mic@linux.microsoft.com>
cc: keyrings@vger.kernel.org
Link: https://lore.kernel.org/r/486b1b80-9932-aab6-138d-434c541c934a@digikod.net/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-10 11:42:02 -07:00
Andre Przywara
0b9431c822 dt-bindings: display: arm,malidp: remove bogus RQOS property
As Liviu pointed out, the arm,malidp-arqos-high-level property
mentioned in the original .txt binding was a mistake, and
arm,malidp-arqos-value needs to take its place.

The binding commit ce6eb0253c ("dt/bindings: display: Add optional
property node define for Mali DP500") mentions the right name in the
commit message, but has the wrong name in the diff.
Commit d298e6a27a ("drm/arm/mali-dp: Add display QoS interface
configuration for Mali DP500") uses the property in the driver, but uses
the shorter name.

Remove the wrong property from the binding, and use the proper name in
the example. The actual property was already documented properly.

Fixes: 2c8b082a3a ("dt-bindings: display: convert Arm Mali-DP to DT schema")
Link: https://lore.kernel.org/linux-arm-kernel/YnumGEilUblhBx8E@e110455-lin.cambridge.arm.com/
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reported-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220609162729.1441760-1-andre.przywara@arm.com
2022-06-10 12:32:05 -06:00
Rafael J. Wysocki
67e59f8d01 Merge branch 'pm-sysoff'
Merge fixes for regressions introduced by the recent rework of the
system reboot/poweroff code.

* pm-sysoff:
  kernel/reboot: Fix powering off using a non-syscall code paths
  kernel/reboot: Use static handler for register_platform_power_off()
2022-06-10 20:24:10 +02:00
Rob Herring
01aa6cbf3d dt-bindings: pinctrl: ralink: Fix 'enum' lists with duplicate entries
There's no reason to list the same value twice in an 'enum'. This was fixed
treewide in commit c3b0068194 ("dt-bindings: Fix 'enum' lists with
duplicate entries"), but this one got added in the merge window.

A meta-schema change will catch future cases.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20220606212239.1360877-1-robh@kernel.org
2022-06-10 12:19:07 -06:00
Linus Torvalds
fe43c01889 A few documentation fixes for 5.19, including moving the new HTE docs to a
more suitable location, adding loongarch to the features lists, and a
 couple of typo fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmKjf0MPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Yus0H/jUm48oqvWxk7T7V1TVJXkRACd/pq+7v+pbl
 wgDTqjoqR4zxrUkdrZb+YAWXfnC+QKttxs1nWsZTXd5xyQ9w9pBHXEIPPNdmxHpI
 5qa1dPWQrKuyULne01R2kPLUACZTpIBVZQ3tsO7LyJ6BwD295rZxXrsnKLwuYhfl
 OmrS1oYhu/Gt9ROt3NbFDxs5PZjkwjiqjKohUAfqW8g2s1YsGVp0WeNplwx4zRZc
 sgXHw3zSg81F/NeORhneEesJmqRqCUw5pYMNzJkRKjOAiO5DtD0HXHLLRKflX6+j
 M5QlViO81BT7Rv8UiUJvXynIi5g6MPQ3EkjESERTVcH1JtRRZvo=
 =Hy9U
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.19-3' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "A few documentation fixes for 5.19, including moving the new HTE docs
  to a more suitable location, adding loongarch to the features lists,
  and a couple of typo fixes"

* tag 'docs-5.19-3' of git://git.lwn.net/linux:
  docs: arm: tcm: Fix typo in description of TCM and MMU usage
  docs: Move the HTE documentation to driver-api/
  docs: usb: fix literal block marker in usbmon verification example
  Documentation/features: Update the arch support status files
2022-06-10 11:14:47 -07:00
Linus Torvalds
36a2366379 arm64 fixes for 5.19-rc2:
- SME save/restore for EFI fix - incorrect logic for detecting the need
   for saving/restoring the FFR state.
 
 - SME fix for a CPU ID field value.
 
 - Sysreg generation awk script fix (comparison operator).
 
 - Some typos in documentation or comments and silence a sparse warning
   (missing prototype).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmKjfnMACgkQa9axLQDI
 XvFaZQ/7B4KO2DCVy9EgUQn6sC+9fIra60nb4INGezzyxPgYvPfeX51U16jaxNXB
 iPAQCJy5RKOioN76Ew8HgQd6UHGyspjLGC8Va/xvnGzWa4zSEqX6uFx7t7t6p2G7
 CgP2BjF7hQEaRZoluhs5b3lXKnVx2zooLp7HBRaVukGd1f75evSUE68hOqSLX2DO
 bhFGyCsOlpZyKC0HwVA2bl2ozaujQFAp+MHvzsb+yRdHc/IhNaaqpACrarUGXinR
 Dj4kQ/lo3nh7nNsVpsq0xwtwGV+3LrlIhIhyIMTUNE5fTCq7IlN8yy/j7yeMDhMb
 jSKPz+n9ZzHOBF2rwK4zmQC/1Pe0HFHWq/kCB66kUrtjFug9rJeTcUThkpSay7Ya
 2HEYwdPvT59OAw+IPSFtHQD5XOV33ci8wykLT8dV4mffrWtY9nx6E3xlCW8/5Ue8
 Q3ti08gN3aARUMj8sJTE/E/O02idMoU/pSzaqowtG8psZxSFf9DSZGs1WjvKVKRj
 x4JGzBv5egAk/fSDw5NbMCAh1KW94Wm1pdXYdzYp90iJQwPfXrf5YyJ+c2FzuuIj
 8YW7jxq/dBp880yWFa77d41uzauWsIpflaekzCnLBz7oXOrkpQBVxRAOEwWA35Fx
 JKeAqJieyPILymSM9e1HnBRYR5WQztPSFwHmCnUiVwCIOHM6kl4=
 =dll5
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - SME save/restore for EFI fix - incorrect logic for detecting the need
   for saving/restoring the FFR state.

 - SME fix for a CPU ID field value.

 - Sysreg generation awk script fix (comparison operator).

 - Some typos in documentation or comments and silence a sparse warning
   (missing prototype).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Add kasan_hw_tags_enable() prototype to silence sparse
  arm64/sme: Fix EFI save/restore
  arm64/fpsimd: Fix typo in comment
  arm64/sysreg: Fix typo in Enum element regex
  arm64/sme: Fix SVE/SME typo in ABI documentation
  arm64/sme: Fix tests for 0b1111 value ID registers
2022-06-10 11:03:51 -07:00
Linus Torvalds
ad6e076498 zonefs fixes for 5.19-rc2
* Fix handling of the explicit-open mount option, and in particular the
   conditions under which this option can be ignored.
 
 * Fix a problem with zonefs iomap_begin method, causing a hang in
   iomap_readahead() when a readahead request reaches the end of a file.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYqMh/wAKCRDdoc3SxdoY
 dvd+AP4jNRFhAedXl0mIutoP4k0XwblSz9RwrXLOYzkOtgpXGQD+Lps42w6EQliE
 wWuuL4syVgKamolj0WGcPLarGZC7LQA=
 =neot
 -----END PGP SIGNATURE-----

Merge tag 'zonefs-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs fixes from Damien Le Moal:

 - Fix handling of the explicit-open mount option, and in particular the
   conditions under which this option can be ignored.

 - Fix a problem with zonefs iomap_begin method, causing a hang in
   iomap_readahead() when a readahead request reaches the end of a file.

* tag 'zonefs-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: fix zonefs_iomap_begin() for reads
  zonefs: Do not ignore explicit_open with active zone limit
  zonefs: fix handling of explicit_open option on mount
2022-06-10 10:56:28 -07:00
Linus Torvalds
f7a1d00e74 ATA fixes for 5.19-rc2
Several small fixes for rc2:
 
 * Remove unused field in struct ata_port, from Hannes.
 
 * Fix a potential (very unlikely) NULL pointer dereference in
   ata_host_alloc_pinfo(), from Sergey.
 
 * Fix a device reference leak in the pata_octeon_cf driver, from
   Miaoqian.
 
 * Fixes for handling access to the concurrent positioning ranges log
   page used with multi-actuator HDDs, from Tyler.
 
 * Fix the values shown by the pio_mode and dma_mode sysfs device
   attributes, from Sergey.
 
 * Update the MAINTAINERS file to add libata sysfs ABI documentation
   file, from Sergey.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYqMjsAAKCRDdoc3SxdoY
 dvS8AQDKFhvTnzQL/nIHWC5y0bsH4wF213g69SM79U7sPL2boQEAxczNR4RllIcT
 yv4aAG1mk2+ii6ClqdC4m1EfpS1WtgI=
 =fsTk
 -----END PGP SIGNATURE-----

Merge tag 'ata-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ATA fixes from Damien Le Moal:
 "Several small fixes for rc2:

   - Remove unused field in struct ata_port (Hannes)

   - Fix a potential (very unlikely) NULL pointer dereference in
     ata_host_alloc_pinfo() (Sergey)

   - Fix a device reference leak in the pata_octeon_cf driver (Miaoqian)

   - Fixes for handling access to the concurrent positioning ranges log
     page used with multi-actuator HDDs (Tyler)

   - Fix the values shown by the pio_mode and dma_mode sysfs device
     attributes (Sergey)

   - Update the MAINTAINERS file to add libata sysfs ABI documentation
     file (Sergey)"

* tag 'ata-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  MAINTAINERS: add ATA sysfs file documentation to libata entry
  ata: libata-transport: fix {dma|pio|xfer}_mode sysfs files
  libata: fix translation of concurrent positioning ranges
  libata: fix reading concurrent positioning ranges log
  ata: pata_octeon_cf: Fix refcount leak in octeon_cf_probe
  ata: libata-core: fix NULL pointer deref in ata_host_alloc_pinfo()
  ata: libata: drop 'sas_last_tag'
2022-06-10 10:30:43 -07:00