Document the different needs of documentation for the b53 driver.
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Document DSA tagged and VLAN based switch configuration by showcases.
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This series adds mlx5 support for devlink fw versions query.
1) Implement the required low level firmware commands
2) Implement the devlink knobs and callbacks for fw versions query.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0eZOQACgkQSD+KveBX
+j6a0gf/SbSY2yDz6iyDZy7Zt0e7onD395UyWIoXbXJAI/aMggvdNE233dh369xa
d4Zo8zvFijiFpPKmP0p+WZfvR5YPPWgtOYDv1MbaTz5lqsGYUh2kN9FaTyCNZW3O
jeNidxjsne70VBrpmOZJX2mxduKwRdxchRg2iQPdeZQGO5Uq3nEdOfJVx/mhsIGw
iRczF4IAQpArZEyKXgqMSrtJ99FJ7JSJXnsqjVJyUC1zyCZ+Cq3VV55+u5VCBokd
qr/VSYqFd38SEHR5l4nXDXLX23pOwWNhUdPgGopjak8bN3k19vk6GfQ2bzLFnaOW
4fx1k4uTixKOrbLkgzay/YSsoF2TDw==
=Bwl7
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2019-07-04-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-update-2019-07-04
This series adds mlx5 support for devlink fw versions query.
1) Implement the required low level firmware commands
2) Implement the devlink knobs and callbacks for fw versions query.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The callback is invoked using 'devlink dev info <pci>' command and returns
the running and pending firmware version of the HCA and the name of the
kernel driver.
If there is a pending firmware version (a new version is burned but the
HCA still runs with the previous) it is returned as the stored
firmware version. Otherwise, the running version is returned for this
field.
Output example:
$ devlink dev info pci/0000:00:06.0
pci/0000:00:06.0:
driver mlx5_core
versions:
fixed:
fw.psid MT_0000000009
running:
fw.version 16.26.0100
stored:
fw.version 16.26.0100
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-07-03
The following pull-request contains BPF updates for your *net-next* tree.
There is a minor merge conflict in mlx5 due to 8960b38932 ("linux/dim:
Rename externally used net_dim members") which has been pulled into your
tree in the meantime, but resolution seems not that bad ... getting current
bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed
just so he's aware of the resolution below:
** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:
<<<<<<< HEAD
static int mlx5e_open_cq(struct mlx5e_channel *c,
struct dim_cq_moder moder,
struct mlx5e_cq_param *param,
struct mlx5e_cq *cq)
=======
int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder,
struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
>>>>>>> e5a3e259ef
Resolution is to take the second chunk and rename net_dim_cq_moder into
dim_cq_moder. Also the signature for mlx5e_open_cq() in ...
drivers/net/ethernet/mellanox/mlx5/core/en.h +977
... and in mlx5e_open_xsk() ...
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64
... needs the same rename from net_dim_cq_moder into dim_cq_moder.
** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:
<<<<<<< HEAD
int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
struct dim_cq_moder icocq_moder = {0, 0};
struct net_device *netdev = priv->netdev;
struct mlx5e_channel *c;
unsigned int irq;
=======
struct net_dim_cq_moder icocq_moder = {0, 0};
>>>>>>> e5a3e259ef
Take the second chunk and rename net_dim_cq_moder into dim_cq_moder
as well.
Let me know if you run into any issues. Anyway, the main changes are:
1) Long-awaited AF_XDP support for mlx5e driver, from Maxim.
2) Addition of two new per-cgroup BPF hooks for getsockopt and
setsockopt along with a new sockopt program type which allows more
fine-grained pass/reject settings for containers. Also add a sock_ops
callback that can be selectively enabled on a per-socket basis and is
executed for every RTT to help tracking TCP statistics, both features
from Stanislav.
3) Follow-up fix from loops in precision tracking which was not propagating
precision marks and as a result verifier assumed that some branches were
not taken and therefore wrongly removed as dead code, from Alexei.
4) Fix BPF cgroup release synchronization race which could lead to a
double-free if a leaf's cgroup_bpf object is released and a new BPF
program is attached to the one of ancestor cgroups in parallel, from Roman.
5) Support for bulking XDP_TX on veth devices which improves performance
in some cases by around 9%, from Toshiaki.
6) Allow for lookups into BPF devmap and improve feedback when calling into
bpf_redirect_map() as lookup is now performed right away in the helper
itself, from Toke.
7) Add support for fq's Earliest Departure Time to the Host Bandwidth
Manager (HBM) sample BPF program, from Lawrence.
8) Various cleanups and minor fixes all over the place from many others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the workqueue to handle management interrupts and
support for resets.
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for passing traffic.
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a driver framework for the Compute Engine Virtual NIC that will be
available in the future.
At this point the only functionality is loading the driver.
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend flowlabel_reflect bitmask to allow conditional
reflection of incoming flowlabels in echo replies.
Note this has precedence against auto flowlabels.
Add flowlabel_reflect enum to replace hard coded
values.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Document contains configuration options description,
details and examples of driver various settings.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix documentation that mention xdpsock_kern.c which has been
replaced by code embedded in libbpf.
Signed-off-by: Eric Leblond <eric@regit.org>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There seems to be some confusion surrounding three PHY interface modes,
specifically 1000BASE-X, 2500BASE-X and SGMII. Add some documentation
to phylib detailing precisely what these interface modes refer to.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some TCP peers announce a very small MSS option in their SYN and/or
SYN/ACK messages.
This forces the stack to send packets with a very high network/cpu
overhead.
Linux has enforced a minimal value of 48. Since this value includes
the size of TCP options, and that the options can consume up to 40
bytes, this means that each segment can include only 8 bytes of payload.
In some cases, it can be useful to increase the minimal value
to a saner value.
We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility
reasons.
Note that TCP_MAXSEG socket option enforces a minimal value
of (TCP_MIN_MSS). David Miller increased this minimal value
in commit c39508d6f1 ("tcp: Make TCP_MAXSEG minimum more correct.")
from 64 to 88.
We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.
CVE-2019-11479 -- tcp mss hardcoded to 48
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of relying on rps_needed, it is safer to use a separate
static key, since we do not want to enable TCP rx_skb_cache
by default. This feature can cause huge increase of memory
usage on hosts with millions of sockets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mlx5 devlink health fw reporters and sw reset support
This series provides mlx5 firmware reset support and firmware devlink health
reporters.
1) Add initial mlx5 kernel documentation and include devlink health reporters
2) Add CR-Space access and FW Crdump snapshot support via devlink region_snapshot
3) Issue software reset upon FW asserts
4) Add fw and fw_fatal devlink heath reporters to follow fw errors indication by
dump and recover procedures and enable trigger these functionality by user.
4.1) fw reporter:
The fw reporter implements diagnose and dump callbacks.
It follows symptoms of fw error such as fw syndrome by triggering
fw core dump and storing it and any other fw trace into the dump buffer.
The fw reporter diagnose command can be triggered any time by the user to check
current fw status.
4.2) fw_fatal repoter:
The fw_fatal reporter implements dump and recover callbacks.
It follows fatal errors indications by CR-space dump and recover flow.
The CR-space dump uses vsc interface which is valid even if the FW command
interface is not functional, which is the case in most FW fatal errors. The
CR-space dump is stored as a memory region snapshot to ease read by address.
The recover function runs recover flow which reloads the driver and triggers fw
reset if needed.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0CsLgACgkQSD+KveBX
+j7mFwf+MYvIbUO4mXyoZIezci1UCzt1vNAkUYPceE94O9fK68ItrwtwrstgIqqS
58Tgx//MXxPpe9k9NIWjeS3i8sjcb8fDoqkjOCj7KAchv0IhSUvYFRpBrUK+yTOW
NIIXZzuCgIoR9a/hVlT/lhG+dm4MX2L5dWFtORLxMoO+ff3yiy4nNf9+Zdt0H7LT
YCELWnKeIQCvdzJAxX7OyTh3eOfc/h7o1nOsU4VugBHxKxx4T+9A26d+cZeZH5Ox
3ikTCc01ivVHqcLydAy96HQu0MENSNYNpmyDxWum3oJGFFu6hBQTM2ueRmVWZfwH
DRu+hhxONZROxxtpmP/ULmwYcLnBHg==
=VhXt
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2019-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-06-13
Mlx5 devlink health fw reporters and sw reset support
This series provides mlx5 firmware reset support and firmware devlink health
reporters.
1) Add initial mlx5 kernel documentation and include devlink health reporters
2) Add CR-Space access and FW Crdump snapshot support via devlink region_snapshot
3) Issue software reset upon FW asserts
4) Add fw and fw_fatal devlink heath reporters to follow fw errors indication by
dump and recover procedures and enable trigger these functionality by user.
4.1) fw reporter:
The fw reporter implements diagnose and dump callbacks.
It follows symptoms of fw error such as fw syndrome by triggering
fw core dump and storing it and any other fw trace into the dump buffer.
The fw reporter diagnose command can be triggered any time by the user to check
current fw status.
4.2) fw_fatal repoter:
The fw_fatal reporter implements dump and recover callbacks.
It follows fatal errors indications by CR-space dump and recover flow.
The CR-space dump uses vsc interface which is valid even if the FW command
interface is not functional, which is the case in most FW fatal errors. The
CR-space dump is stored as a memory region snapshot to ease read by address.
The recover function runs recover flow which reloads the driver and triggers fw
reset if needed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Multipath hash policy value of 0 isn't distributing since the outer IP
dest and src aren't varied eventhough the inner ones are. Since the flow
is on the inner ones in the case of tunneled traffic, hashing on them is
desired.
This is done mainly for IP over GRE, hence only tested for that. But
anything else supported by flow dissection should work.
v2: Use skb_flow_dissect_flow_keys() directly so that other tunneling
can be supported through flow dissection (per Nikolay Aleksandrov).
v3: Remove accidental inclusion of ports in the hash keys and clarify
the documentation (Nikolay Alexandrov).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a spelling typo in rds.txt
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS offload drivers keep track of TCP seq numbers to make sure
the packets are fed into the HW in order.
When packets get dropped on the way through the stack, the driver
will get out of sync and have to use fallback encryption, but unless
TCP seq number is resynced it will never match the packets correctly
(or even worse - use incorrect record sequence number after TCP seq
wraps).
Existing drivers (mlx5) feed the entire record on every out-of-order
event, allowing FW/HW to always be in sync.
This patch adds an alternative, more akin to the RX resync. When
driver sees a frame which is past its expected sequence number the
stream must have gotten out of order (if the sequence number is
smaller than expected its likely a retransmission which doesn't
require resync). Driver will ask the stack to perform TX sync
before it submits the next full record, and fall back to software
crypto until stack has performed the sync.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS offload device may lose sync with the TCP stream if packets
arrive out of order. Drivers can currently request a resync at
a specific TCP sequence number. When a record is found starting
at that sequence number kernel will inform the device of the
corresponding record number.
This requires the device to constantly scan the stream for a
known pattern (constant bytes of the header) after sync is lost.
This patch adds an alternative approach which is entirely under
the control of the kernel. Kernel tracks records it had to fully
decrypt, even though TLS socket is in TLS_HW mode. If multiple
records did not have any decrypted parts - it's a pretty strong
indication that the device is out of sync.
We choose the min number of fully encrypted records to be 2,
which should hopefully be more than will get retransmitted at
a time.
After kernel decides the device is out of sync it schedules a
resync request. If the TCP socket is empty the resync gets
performed immediately. If socket is not empty we leave the
record parser to resync when next record comes.
Before resync in message parser we peek at the TCP socket and
don't attempt the sync if the socket already has some of the
next record queued.
On resync failure (encrypted data continues to flow in) we
retry with exponential backoff, up to once every 128 records
(with a 16k record thats at most once every 2M of data).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2019-06-07
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix several bugs in riscv64 JIT code emission which forgot to clear high
32-bits for alu32 ops, from Björn and Luke with selftests covering all
relevant BPF alu ops from Björn and Jiong.
2) Two fixes for UDP BPF reuseport that avoid calling the program in case of
__udp6_lib_err and UDP GRO which broke reuseport_select_sock() assumption
that skb->data is pointing to transport header, from Martin.
3) Two fixes for BPF sockmap: a use-after-free from sleep in psock's backlog
workqueue, and a missing restore of sk_write_space when psock gets dropped,
from Jakub and John.
4) Fix unconnected UDP sendmsg hook API which is insufficient as-is since it
breaks standard applications like DNS if reverse NAT is not performed upon
receive, from Daniel.
5) Fix an out-of-bounds read in __bpf_skc_lookup which in case of AF_INET6
fails to verify that the length of the tuple is long enough, from Lorenz.
6) Fix libbpf's libbpf__probe_raw_btf to return an fd instead of 0/1 (for
{un,}successful probe) as that is expected to be propagated as an fd to
load_sk_storage_btf() and thus closing the wrong descriptor otherwise,
from Michal.
7) Fix bpftool's JSON output for the case when a lookup fails, from Krzesimir.
8) Minor misc fixes in docs, samples and selftests, from various others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When RST packets are sent because no socket could be found,
it makes sense to use flowlabel_reflect sysctl to decide
if a reflection of the flowlabel is requested.
This extends commit 22b6722bfa ("ipv6: Add sysctl for per
namespace flow label reflection"), for some TCP RST packets.
In order to provide full control of this new feature,
flowlabel_reflect becomes a bitmask.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's possible that TCP stack will decide to retransmit a packet
right when that packet's data gets acked, especially in presence
of packet reordering. This means that packets may be in flight,
even though tls_device code has already freed their record state.
Make fill_sg_in() and in turn tls_sw_fallback() not generate a
warning in that case, and quietly proceed to drop such frames.
Make the exit path from tls_sw_fallback() drop monitor friendly,
for users to be able to troubleshoot dropped retransmissions.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The phylink conflict was between a bug fix by Russell King
to make sure we have a consistent PHY interface mode, and
a change in net-next to pull some code in phylink_resolve()
into the helper functions phylink_mac_link_{up,down}()
On the dp83867 side it's mostly overlapping changes, with
the 'net' side removing a condition that was supposed to
trigger for RGMII but because of how it was coded never
actually could trigger.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add docs for /proc/sys/net/ipv4/tcp_fastopen_key
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Cc: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The phylink_config structure will encapsulate a pointer to a struct
device and the operation type requested for this instance of PHYLINK.
This patch does not make any functional changes, it just transitions the
PHYLINK internals and all its users to the new API.
A pointer to a phylink_config structure will be passed to
phylink_create() instead of the net_device directly. Also, the same
phylink_config pointer will be passed back to all phylink_mac_ops
callbacks instead of the net_device. Using this mechanism, a PHYLINK
user can get the original net_device using a structure such as
'to_net_dev(config->dev)' or directly the structure containing the
phylink_config using a container_of call.
At the moment, only the PHYLINK_NETDEV is defined as a valid operation
type for PHYLINK. In this mode, a valid reference to a struct device
linked to the original net_device should be passed to PHYLINK through
the phylink_config structure.
This API changes is mainly driven by the necessity of adding a new
operation type in PHYLINK that disconnects the phy_device from the
net_device and also works when the net_device is lacking.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Describe existing kernel TLS offload (added back in Linux 4.19) -
the mechanism, the expected behavior and the notable corner cases.
This documentation is mostly targeting hardware vendors who want
to implement offload, to ensure consistency between implementations.
v2:
- add emphasis around TLS_SW/TLS_HW/TLS_HW_RECORD;
- remove mentions of ongoing work (Boris);
- split the flow of data in SW vs. HW cases in TX overview
(Boris);
- call out which fields are updated by the device and which
are filled by the stack (Boris);
- move error handling into it's own section (Boris);
- add more words about fallback (Boris);
- note that checksum validation is required (Alexei);
- note that drivers shouldn't pay attention to the TLS
device features.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the TLS doc to RST. Use C code blocks for the code
samples, and mark hyperlinks.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the device drivers have really long document titles
making the networking table of contents hard to look through.
Place vendor drivers under a submenu.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes some spelling typos found in ip-sysctl.txt
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix Sphinx warnings in Documentation/networking/af_xdp.rst by
adding indentation:
Documentation/networking/af_xdp.rst:319: WARNING: Literal block expected; none found.
Documentation/networking/af_xdp.rst:326: WARNING: Literal block expected; none found.
Fixes: 0f4a9b7d4e ("xsk: add FAQ to facilitate for first time users")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Allow kernel services using AF_RXRPC to indicate that a call should be
non-interruptible. This allows kafs to make things like lock-extension and
writeback data storage calls non-interruptible.
If this is set, signals will be ignored for operations on that call where
possible - such as waiting to get a call channel on an rxrpc connection.
It doesn't prevent UDP sendmsg from being interrupted, but that will be
handled by packet retransmission.
rxrpc_kernel_recv_data() isn't affected by this since that never waits,
preferring instead to return -EAGAIN and leave the waiting to the caller.
Userspace initiated calls can't be set to be uninterruptible at this time.
Signed-off-by: David Howells <dhowells@redhat.com>
Provide an interface to set max lifespan on a call from inside of the
kernel without having to call kernel_sendmsg().
Signed-off-by: David Howells <dhowells@redhat.com>
This adds a table which illustrates what combinations of management /
regular traffic work depending on the state the switch ports are in.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ReST underline warning:
./Documentation/networking/netdev-FAQ.rst:135: WARNING: Title underline too short.
Q: I made changes to only a few patches in a patch series should I resend only those changed?
--------------------------------------------------------------------------------------------
Fixes: ffa9125373 ("Documentation: networking: Update netdev-FAQ regarding patches")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2019-04-30
1) Fix an out-of-bound array accesses in __xfrm_policy_unlink.
From YueHaibing.
2) Reset the secpath on failure in the ESP GRO handlers
to avoid dereferencing an invalid pointer on error.
From Myungho Jung.
3) Add and revert a patch that tried to add rcu annotations
to netns_xfrm. From Su Yanjun.
4) Wait for rcu callbacks before freeing xfrm6_tunnel_spi_kmem.
From Su Yanjun.
5) Fix forgotten vti4 ipip tunnel deregistration.
From Jeremy Sowden:
6) Remove some duplicated log messages in vti4.
From Jeremy Sowden.
7) Don't use IPSEC_PROTO_ANY when flushing states because
this will flush only IPsec portocol speciffic states.
IPPROTO_ROUTING states may remain in the lists when
doing net exit. Fix this by replacing IPSEC_PROTO_ANY
with zero. From Cong Wang.
8) Add length check for UDP encapsulation to fix "Oversized IP packet"
warnings on receive side. From Sabrina Dubroca.
9) Fix xfrm interface lookup when the interface is associated to
a vrf layer 3 master device. From Martin Willi.
10) Reload header pointers after pskb_may_pull() in _decode_session4(),
otherwise we may read from uninitialized memory.
11) Update the documentation about xfrm[46]_gc_thresh, it
is not used anymore after the flowcache removal.
From Nicolas Dichtel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2019-04-22
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) allow stack/queue helpers from more bpf program types, from Alban.
2) allow parallel verification of root bpf programs, from Alexei.
3) introduce bpf sysctl hook for trusted root cases, from Andrey.
4) recognize var/datasec in btf deduplication, from Andrii.
5) cpumap performance optimizations, from Jesper.
6) verifier prep for alu32 optimization, from Jiong.
7) libbpf xsk cleanup, from Magnus.
8) other various fixes and cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_DECNET_ROUTE_FWMARK was removed in commit 47dcf0cb10 ("[NET]: Rethink mark field in struct flowi")
Since nothing replace it (and nothindg need to replace it, simply remove
it from documentation.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit da70314917 ("bpf: Document BPF_PROG_TYPE_CGROUP_SYSCTL")
Andrey proposes to put per-prog type docs under Documentation/bpf/
Let's move flow dissector documentation there as well.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
To make ICMPv6 closer to ICMPv4, add ratemask parameter. Since the ICMP
message types use larger numeric values, a simple bitmask doesn't fit.
I use large bitmap. The input and output are the in form of list of
ranges. Set the default to rate limit all error messages but Packet Too
Big. For Packet Too Big, use ratemask instead of hard-coded.
There are functions where icmpv6_xrlim_allow() and icmpv6_global_allow()
aren't called. This patch only adds them to icmpv6_echo_reply().
Rate limiting error messages is mandated by RFC 4443 but RFC 4890 says
that it is also acceptable to rate limit informational messages. Thus,
I removed the current hard-coded behavior of icmpv6_mask_allow() that
doesn't rate limit informational messages.
v2: Add dummy function proc_do_large_bitmap() if CONFIG_PROC_SYSCTL
isn't defined, expand the description in ip-sysctl.txt and remove
unnecessary conditional before kfree().
v3: Inline the bitmap instead of dynamically allocated. Still is a
pointer to it is needed because of the way proc_do_large_bitmap work.
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make rxrpc_kernel_check_life() pass back the life counter through the
argument list and return true if the call has not yet completed.
Suggested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>