Fix FTM per burst maximum value from 15 to 31
(The maximal bits that represents that number in the frame
is 5 hence a maximal value of 31)
Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If a driver does any significant activity in its ibss_join method,
then it will very well expect that to be called during restart,
before any stations are added. Do that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jouni reports that in some cases it is possible that getting
disconnected (or stopping AP, after previous patches) results
in further operations hitting the window within the regulatory
core restoring the regdomain to the defaults. The reason for
this is that we have to call out to CRDA or otherwise do some
asynchronous work, and thus can't do the restore atomically.
However, we've previously seen all the data we need to do the
restore, so we can hang on to that data and use it later for
the restore. This makes the whole thing happen within a single
locked section and thus atomic.
However, we can't *always* do this - there are unfortunately
cases where the restore needs to re-request, because this is
also used (abused?) as an error recovery process, so make the
new behaviour optional and only use it when doing a regular
restore as described above.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
All the setup code in AF_XDP is protected by a mutex with the
exception of the mmap code that cannot use it. To make sure that a
process banging on the mmap call at the same time as another process
is setting up the socket, smp_wmb() calls were added in the umem
registration code and the queue creation code, so that the published
structures that xsk_mmap needs would be consistent. However, the
corresponding smp_rmb() calls were not added to the xsk_mmap
code. This patch adds these calls.
Fixes: 37b076933a ("xsk: add missing write- and data-dependency barrier")
Fixes: c0c77d8fb7 ("xsk: add user memory registration support sockopt")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_skb_change_proto and bpf_skb_adjust_room change skb header length.
For GSO packets they adjust gso_size to maintain the same MTU.
The gso size can only be safely adjusted on bytestream protocols.
Commit d02f51cbcf ("bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat
to deal with gso sctp skbs") excluded SKB_GSO_SCTP.
Since then type SKB_GSO_UDP_L4 has been added, whose contents are one
gso_size unit per datagram. Also exclude these.
Move from a blacklist to a whitelist check to future proof against
additional such new GSO types, e.g., for fraglist based GRO.
Fixes: bec1f6f697 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds a helper function BPF_FUNC_tcp_sock and it
is currently available for cg_skb and sched_(cls|act):
struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk);
int cg_skb_foo(struct __sk_buff *skb) {
struct bpf_tcp_sock *tp;
struct bpf_sock *sk;
__u32 snd_cwnd;
sk = skb->sk;
if (!sk)
return 1;
tp = bpf_tcp_sock(sk);
if (!tp)
return 1;
snd_cwnd = tp->snd_cwnd;
/* ... */
return 1;
}
A 'struct bpf_tcp_sock' is also added to the uapi bpf.h to provide
read-only access. bpf_tcp_sock has all the existing tcp_sock's fields
that has already been exposed by the bpf_sock_ops.
i.e. no new tcp_sock's fields are exposed in bpf.h.
This helper returns a pointer to the tcp_sock. If it is not a tcp_sock
or it cannot be traced back to a tcp_sock by sk_to_full_sk(), it
returns NULL. Hence, the caller needs to check for NULL before
accessing it.
The current use case is to expose members from tcp_sock
to allow a cg_skb_bpf_prog to provide per cgroup traffic
policing/shaping.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The next patch will introduce a new "struct bpf_tcp_sock" which
exposes the same tcp_sock's fields already exposed in
"struct bpf_sock_ops".
This patch refactor the existing convert_ctx_access() codes for
"struct bpf_sock_ops" to get them ready to be reused for
"struct bpf_tcp_sock". The "rtt_min" is not refactored
in this patch because its handling is different from other
fields.
The SOCK_OPS_GET_TCP_SOCK_FIELD is new. All other SOCK_OPS_XXX_FIELD
changes are code move only.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds "state", "dst_ip4", "dst_ip6" and "dst_port" to the
bpf_sock. The userspace has already been using "state",
e.g. inet_diag (ss -t) and getsockopt(TCP_INFO).
This patch also allows narrow load on the following existing fields:
"family", "type", "protocol" and "src_port". Unlike IP address,
the load offset is resticted to the first byte for them but it
can be relaxed later if there is a use case.
This patch also folds __sock_filter_check_size() into
bpf_sock_is_valid_access() since it is not called
by any where else. All bpf_sock checking is in
one place.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
before accessing the fields in sock. For example, in __netdev_pick_tx:
static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
struct net_device *sb_dev)
{
/* ... */
struct sock *sk = skb->sk;
if (queue_index != new_index && sk &&
sk_fullsock(sk) &&
rcu_access_pointer(sk->sk_dst_cache))
sk_tx_queue_set(sk, new_index);
/* ... */
return queue_index;
}
This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff"
where a few of the convert_ctx_access() in filter.c has already been
accessing the skb->sk sock_common's fields,
e.g. sock_ops_convert_ctx_access().
"__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
Some of the fileds in "bpf_sock" will not be directly
accessible through the "__sk_buff->sk" pointer. It is limited
by the new "bpf_sock_common_is_valid_access()".
e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock
are not allowed.
The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
can be used to get a sk with all accessible fields in "bpf_sock".
This helper is added to both cg_skb and sched_(cls|act).
int cg_skb_foo(struct __sk_buff *skb) {
struct bpf_sock *sk;
sk = skb->sk;
if (!sk)
return 1;
sk = bpf_sk_fullsock(sk);
if (!sk)
return 1;
if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
return 1;
/* some_traffic_shaping(); */
return 1;
}
(1) The sk is read only
(2) There is no new "struct bpf_sock_common" introduced.
(3) Future kernel sock's members could be added to bpf_sock only
instead of repeatedly adding at multiple places like currently
in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.
(4) After "sk = skb->sk", the reg holding sk is in type
PTR_TO_SOCK_COMMON_OR_NULL.
(5) After bpf_sk_fullsock(), the return type will be in type
PTR_TO_SOCKET_OR_NULL which is the same as the return type of
bpf_sk_lookup_xxx().
However, bpf_sk_fullsock() does not take refcnt. The
acquire_reference_state() is only depending on the return type now.
To avoid it, a new is_acquire_function() is checked before calling
acquire_reference_state().
(6) The WARN_ON in "release_reference_state()" is no longer an
internal verifier bug.
When reg->id is not found in state->refs[], it means the
bpf_prog does something wrong like
"bpf_sk_release(bpf_sk_fullsock(skb->sk))" where reference has
never been acquired by calling "bpf_sk_fullsock(skb->sk)".
A -EINVAL and a verbose are done instead of WARN_ON. A test is
added to the test_verifier in a later patch.
Since the WARN_ON in "release_reference_state()" is no longer
needed, "__release_reference_state()" is folded into
"release_reference_state()" also.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Modify the kernel users of the TCA_ACT_* macros to use TCA_ID_*. For
example, use TCA_ID_GACT instead of TCA_ACT_GACT. This will align with
TCA_ID_POLICE and also differentiates these identifier, used in struct
tc_action_ops type field, from other macros starting with TCA_ACT_.
To make things clearer, we name the enum defining the TCA_ID_*
identifiers and also change the "type" field of struct tc_action to
id.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move all the TC identifiers to one place, to the same enum that defines
the identifier of police action. This makes it easier choose numbers for
new actions since they are now defined in one place. We preserve the
original values for binary compatibility. New IDs should be added inside
the enum.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function tls_sw_recvmsg() dequeues multiple records from stream parser
and decrypts them. In case the decryption is done by async accelerator,
the records may get submitted for decryption while the previous ones may
not have been decryted yet. For tls1.3, the record type is known only
after decryption. Therefore, for tls1.3, tls_sw_recvmsg() may submit
records for decryption even if it gets 'handshake' records after 'data'
records. These intermediate 'handshake' records may do a key updation.
By the time new keys are given to ktls by userspace, it is possible that
ktls has already submitted some records i(which are encrypted with new
keys) for decryption using old keys. This would lead to decrypt failure.
Therefore, async decryption of records should be disabled for tls1.3.
Fixes: 130b392c6c ("net: tls: Add tls 1.3 support")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The generic netlink code is expected to trigger notification messages when
configuration might have been changed. But the configuration of batman-adv
is most of the time still done using sysfs. So the sysfs interface should
also trigger the corresponding netlink messages via the "config" multicast
group.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The B.A.T.M.A.N. V implementation tries to estimate the link throughput of
an interface to an originator using different automatic methods. It is
still possible to overwrite it the link throughput for all reachable
originators via this interface.
The BATADV_CMD_SET_HARDIF/BATADV_CMD_GET_HARDIF commands allow to set/get
the configuration of this feature using the u32
BATADV_ATTR_THROUGHPUT_OVERRIDE attribute. The used unit is in 100 Kbit/s.
If the value is set to 0 then batman-adv will try to estimate the
throughput by itself.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The ELP packets are transmitted every elp_interval milliseconds on an
slave/hard-interface. This value can be changed using the configuration
interface.
The BATADV_CMD_SET_HARDIF/BATADV_CMD_GET_HARDIF commands allow to set/get
the configuration of this feature using the u32 BATADV_ATTR_ELP_INTERVAL
attribute.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The OGM packets are transmitted every orig_interval milliseconds. This
value can be changed using the configuration interface.
The BATADV_CMD_SET_MESH/BATADV_CMD_GET_MESH commands allow to set/get the
configuration of this feature using the u32 BATADV_ATTR_ORIG_INTERVAL
attribute.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The mesh interface can use (in an homogeneous mesh) network coding, a
mechanism that aims to increase the overall network throughput by fusing
multiple packets in one transmission.
The BATADV_CMD_SET_MESH/BATADV_CMD_GET_MESH commands allow to set/get the
configuration of this feature using the BATADV_ATTR_NETWORK_CODING_ENABLED
attribute. Setting the u8 to zero will disable this feature and setting it
to something else is enabling this feature.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The mesh interface can optimize the flooding of multicast packets based on
the content of the global translation tables. To disable this behavior and
use the broadcast-like flooding of the packets, forceflood has to be
enabled.
The BATADV_CMD_SET_MESH/BATADV_CMD_GET_MESH commands allow to set/get the
configuration of this feature using the
BATADV_ATTR_MULTICAST_FORCEFLOOD_ENABLED attribute. Setting the u8 to zero
will disable this feature (allowing multicast optimizations) and setting it
to something else is enabling this feature (forcing simple flooding).
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
In contrast to other modules, batman-adv allows to set the debug message
verbosity per mesh/soft-interface and not per module (via modparam).
The BATADV_CMD_SET_MESH/BATADV_CMD_GET_MESH commands allow to set/get the
configuration of this feature using the u32 (bitmask) BATADV_ATTR_LOG_LEVEL
attribute.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The TQ (B.A.T.M.A.N. IV) and throughput values (B.A.T.M.A.N. V) are reduced
when they are forwarded. One of the reductions is the penalty for
traversing an additional hop. This hop_penalty (0-255) defines the
percentage of reduction (0-100%).
The BATADV_CMD_SET_MESH/BATADV_CMD_GET_MESH commands allow to set/get the
configuration of this feature using the u8 BATADV_ATTR_HOP_PENALTY
attribute.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The mesh/soft-interface can optimize the handling of DHCP packets. Instead
of flooding them through the whole mesh, it can be forwarded as unicast to
a specific gateway server. The originator which injects the packets in the
mesh has to select (based on sel_class thresholds) a responsible gateway
server. This is done by switching this originator to the gw_mode client.
The servers announce their forwarding bandwidth (download/upload) when the
gw_mode server was selected.
The BATADV_CMD_SET_MESH/BATADV_CMD_GET_MESH commands allow to set/get the
configuration of this feature using the attributes:
* u8 BATADV_ATTR_GW_MODE (0 == off, 1 == client, 2 == server)
* u32 BATADV_ATTR_GW_BANDWIDTH_DOWN (in 100 kbit/s steps)
* u32 BATADV_ATTR_GW_BANDWIDTH_UP (in 100 kbit/s steps)
* u32 BATADV_ATTR_GW_SEL_CLASS
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The mesh interface can fragment unicast packets when the packet size
exceeds the outgoing slave/hard-interface MTU.
The BATADV_CMD_SET_MESH/BATADV_CMD_GET_MESH commands allow to set/get the
configuration of this feature using the BATADV_ATTR_FRAGMENTATION_ENABLED
attribute. Setting the u8 to zero will disable this feature and setting it
to something else is enabling this feature.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The mesh interface can use a distributed hash table to answer ARP requests
without flooding the request through the whole mesh.
The BATADV_CMD_SET_MESH/BATADV_CMD_GET_MESH commands allow to set/get the
configuration of this feature using the
BATADV_ATTR_DISTRIBUTED_ARP_TABLE_ENABLED attribute. Setting the u8 to zero
will disable this feature and setting it to something else is enabling this
feature.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The mesh interface can try to detect loops in the same mesh caused by
(indirectly) bridged mesh/soft-interfaces of different nodes. Some of the
loops can also be resolved without breaking the mesh.
The BATADV_CMD_SET_MESH/BATADV_CMD_GET_MESH commands allow to set/get the
configuration of this feature using the
BATADV_ATTR_BRIDGE_LOOP_AVOIDANCE_ENABLED attribute. Setting the u8 to zero
will disable this feature and setting it to something else is enabling this
feature.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The mesh interface can use multiple slave/hard-interface ports at the same
time to transport the traffic to other nodes.
The BATADV_CMD_SET_MESH/BATADV_CMD_GET_MESH commands allow to set/get the
configuration of this feature using the BATADV_ATTR_BONDING_ENABLED
attribute. Setting the u8 to zero will disable this feature and setting it
to something else is enabling this feature.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The mesh interface can drop messages between clients to implement a
mesh-wide AP isolation.
The BATADV_CMD_SET_MESH/BATADV_CMD_GET_MESH and
BATADV_CMD_SET_VLAN/BATADV_CMD_GET_VLAN commands allow to set/get the
configuration of this feature using the BATADV_ATTR_AP_ISOLATION_ENABLED
attribute. Setting the u8 to zero will disable this feature and setting it
to something else is enabling this feature.
This feature also requires that skbuff which should be handled as isolated
are marked. The BATADV_CMD_SET_MESH/BATADV_CMD_GET_MESH commands allow to
set/get the mark/mask using the u32 attributes BATADV_ATTR_ISOLATION_MARK
and BATADV_ATTR_ISOLATION_MASK.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The mesh interface can delay OGM messages to aggregate different ogms
together in a single OGM packet.
The BATADV_CMD_SET_MESH/BATADV_CMD_GET_MESH commands allow to set/get the
configuration of this feature using the BATADV_ATTR_AGGREGATED_OGMS_ENABLED
attribute. Setting the u8 to zero will disable this feature and setting it
to something else is enabling this feature.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The batman-adv configuration interface was implemented solely using sysfs.
This approach was condemned by non-batadv developers as "huge mistake".
Instead a netlink/genl based implementation was suggested.
Beside the mesh/soft-interface specific configuration, the VLANs on top of
the mesh/soft-interface have configuration settings. The genl interface
reflects this by allowing to get/set it using the vlan specific commands
BATADV_CMD_GET_VLAN/BATADV_CMD_SET_VLAN.
The set command BATADV_CMD_SET_MESH will also notify interested userspace
listeners of the "config" mcast group using the BATADV_CMD_SET_VLAN command
message type that settings might have been changed and what the current
values are.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The batman-adv configuration interface was implemented solely using sysfs.
This approach was condemned by non-batadv developers as "huge mistake".
Instead a netlink/genl based implementation was suggested.
Beside the mesh/soft-interface specific configuration, the
slave/hard-interface have B.A.T.M.A.N. V specific configuration settings.
The genl interface reflects this by allowing to get/set it using the
hard-interface specific commands.
The BATADV_CMD_GET_HARDIFS (or short version BATADV_CMD_GET_HARDIF) is
reused as get command because it already allow sto dump the content of
other information from the slave/hard-interface which are not yet
configuration specific.
The set command BATADV_CMD_SET_HARDIF will also notify interested userspace
listeners of the "config" mcast group using the BATADV_CMD_SET_HARDIF
command message type that settings might have been changed and what the
current values are.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The batman-adv configuration interface was implemented solely using sysfs.
This approach was condemned by non-batadv developers as "huge mistake".
Instead a netlink/genl based implementation was suggested.
The main objects for this configuration is the mesh/soft-interface object.
Its actual object in memory already contains most of the available
configuration settings. The genl interface reflects this by allowing to
get/set it using the mesh specific commands.
The BATADV_CMD_GET_MESH_INFO (or short version BATADV_CMD_GET_MESH) is
reused as get command because it already provides the content of other
information from the mesh/soft-interface which are not yet configuration
specific.
The set command BATADV_CMD_SET_MESH will also notify interested userspace
listeners of the "config" mcast group using the BATADV_CMD_SET_MESH command
message type that settings might have been changed and what the current
values are.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The commit ff4c92d85c ("genetlink: introduce pre_doit/post_doit hooks")
intoduced a mechanism to run specific code for doit hooks before/after the
hooks are run. Since all doit hooks are requiring the batadv softif, it
should be retrieved/freed in these helpers to simplify the code.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
net/core/ethtool.c:3023:19: warning: address of array
'ext_m_spec->h_dest' will always evaluate to 'true'
[-Wpointer-bool-conversion]
if (ext_m_spec->h_dest) {
~~ ~~~~~~~~~~~~^~~~~~
h_dest is an array, it can't be null so remove this check.
Fixes: eca4205f9e ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator")
Link: https://github.com/ClangBuiltLinux/linux/issues/353
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
size = sizeof(struct foo) + count * sizeof(struct boo);
instance = alloc(size, GFP_KERNEL)
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = alloc(struct_size(instance, entry, count), GFP_KERNEL)
Notice that, in this case, variable size is not necessary, hence it is
removed.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
instance = alloc(sizeof(struct foo) + count * sizeof(struct boo));
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = alloc(struct_size(instance, entry, count));
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
size = sizeof(struct foo) + count * sizeof(struct boo);
instance = alloc(size, GFP_KERNEL)
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
size = struct_size(instance, entry, count);
instance = alloc(size, GFP_KERNEL)
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
size = sizeof(struct foo) + count * sizeof(struct boo);
instance = alloc(size, GFP_KERNEL)
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
size = struct_size(instance, entry, count);
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent change in the rx_curs_confirmed assignment disregards
byte order, which causes problems on little endian architectures.
This patch fixes it.
Fixes: b8649efad8 ("net/smc: fix sender_free computation") (net-tree)
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the unlikely event that the kmalloc call in vmci_transport_socket_init()
fails, we end-up calling vmci_transport_destruct() with a NULL vmci_trans()
and oopsing.
This change addresses the above explicitly checking for zero vmci_trans()
at destruction time.
Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the algorithm described in the comment block at the
beginning of ip_rt_send_redirect, the host should try to send
'ip_rt_redirect_number' ICMP redirect packets with an exponential
backoff and then stop sending them at all assuming that the destination
ignores redirects.
If the device has previously sent some ICMP error packets that are
rate-limited (e.g TTL expired) and continues to receive traffic,
the redirect packets will never be transmitted. This happens since
peer->rate_tokens will be typically greater than 'ip_rt_redirect_number'
and so it will never be reset even if the redirect silence timeout
(ip_rt_redirect_silence) has elapsed without receiving any packet
requiring redirects.
Fix it by using a dedicated counter for the number of ICMP redirect
packets that has been sent by the host
I have not been able to identify a given commit that introduced the
issue since ip_rt_send_redirect implements the same rate-limiting
algorithm from commit 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, user can do dump or get of param values right after the
devlink params are registered. However the driver may not be initialized
which is an issue. The same problem happens during notification
upon param registration. Allow driver to publish devlink params
whenever it is ready to handle get() ops. Note that this cannot
be resolved by init reordering, as the "driverinit" params have
to be available before the driver is initialized (it needs the param
values there).
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An ipvlan bug fix in 'net' conflicted with the abstraction away
of the IPV6 specific support in 'net-next'.
Similarly, a bug fix for mlx5 in 'net' conflicted with the flow
action conversion in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"This pull request is dedicated to the upcoming snowpocalypse parts 2
and 3 in the Pacific Northwest:
1) Drop profiles are broken because some drivers use dev_kfree_skb*
instead of dev_consume_skb*, from Yang Wei.
2) Fix IWLWIFI kconfig deps, from Luca Coelho.
3) Fix percpu maps updating in bpftool, from Paolo Abeni.
4) Missing station release in batman-adv, from Felix Fietkau.
5) Fix some networking compat ioctl bugs, from Johannes Berg.
6) ucc_geth must reset the BQL queue state when stopping the device,
from Mathias Thore.
7) Several XDP bug fixes in virtio_net from Toshiaki Makita.
8) TSO packets must be sent always on queue 0 in stmmac, from Jose
Abreu.
9) Fix socket refcounting bug in RDS, from Eric Dumazet.
10) Handle sparse cpu allocations in bpf selftests, from Martynas
Pumputis.
11) Make sure mgmt frames have enough tailroom in mac80211, from Felix
Feitkau.
12) Use safe list walking in sctp_sendmsg() asoc list traversal, from
Greg Kroah-Hartman.
13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL
ccid, from Eric Dumazet.
14) Need to reload WoL password into bcmsysport device after deep
sleeps, from Florian Fainelli.
15) Remove filter from mask before freeing in cls_flower, from Petr
Machata.
16) Missing release and use after free in error paths of s390 qeth
code, from Julian Wiedmann.
17) Fix lockdep false positive in dsa code, from Marc Zyngier.
18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn.
19) Fix EQ firmware assert in qed driver, from Manish Chopra.
20) Don't default Caivum PTP to Y in kconfig, from Bjorn Helgaas"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net: dsa: b53: Fix for failure when irq is not defined in dt
sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
geneve: should not call rt6_lookup() when ipv6 was disabled
net: Don't default Cavium PTP driver to 'y'
net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net/mlx5e: Don't overwrite pedit action when multiple pedit used
net/mlx5e: Update hw flows when encap source mac changed
qed*: Advance drivers version to 8.37.0.20
qed: Change verbosity for coalescing message.
qede: Fix system crash on configuring channels.
qed: Consider TX tcs while deriving the max num_queues for PF.
...
new_ie is used as a temporary storage for the generation of
the new elements. However, after copying from it the memory
wasn't freed and leaked. Free it.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Extension IEs have ID 255 followed by extension ID. Current
code is buggy in handling it in two ways:
1. When checking if IE is in the frame, it uses just the ID, which
for extension elements is too broad.
2. It uses 0xFF to mark copied IEs, which will result in not copying
extension IEs from the subelement.
Fix both issue.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Set multi-bssid support flags according to driver support.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add support for multi-bssid.
This includes:
- Parsing multi-bssid element
- Overriding DTIM values
- Taking into account in various places the inner BSSID instead of
transmitter BSSID
- Save aside some multi-bssid properties needed by drivers
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the new IEs are generated, the multiple BSSID elements
are not saved. Save aside properties that are needed later
for PS.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Parsing and exposing nontransmitted APs is problematic
when underlying HW doesn't support it. Do it only if
driver indicated support. Allow HE restriction as well,
since the HE spec defined the exact manner that Multiple
BSSID set should behave. APs that not support the HE
spec will have less predictable Multiple BSSID set
support/behavior
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Previously the transmitted BSS and the non-trasmitted BSS list were
defined in struct cfg80211_internal_bss. Move them to struct cfg80211_bss
since mac80211 needs this info.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When holding data of the non-transmitting BSS, we need to keep the
transmitting BSS data on. Otherwise it will be released, and release
the non-transmitting BSS with it.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use the new for_each_element() helper here, we cannot use
for_each_subelement() since we have a fixed 1 byte before
the subelements start.
While at it, also fix le16_to_cpup() to be get_unaligned_le16()
since we don't know anything about alignment.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This extends cfg80211 BSS table processing to be able to parse Multiple
BSSID element from Beacon and Probe Response frames and to update the
BSS profiles in internal database for non-transmitted BSSs.
Signed-off-by: Peng Xu <pxu@codeaurora.org>
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This will allow iterating over multiple BSSs inside
cfg80211_bss, in case of multiple BSSID.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In multiple BSSID, we have nested IEs inside the multiple
BSSID IE, that override the external ones for that specific
BSS. As preparation for supporting that, pass 2 BSSIDs to the
parse function, the transmitter, and the selected BSSID, so
it can know which IEs to choose. If the selected BSSID is
NULL, the outer ones will be applied.
Change ieee80211_bss_info_update to parse elements itself,
instead of receiving them parsed, so we have the relevant
bss entry in hand.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This makes for much simpler code, simply walk through all
the elements and check that the last one found ends with
the end of the data. This works because if any element is
malformed the walk is aborted, we end up with a mismatch.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We currently have a number of helpers to find elements that just
return a u8 *, change those to return a struct element and add
inlines to deal with the u8 * compatibility.
Note that the match behaviour is changed to start the natch at
the data, so conversion from _ie_match to _elem_match need to
be done carefully.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Rather than always iterating elements from frames with pure
u8 pointers, add a type "struct element" that encapsulates
the id/datalen/data format of them.
Then, add the element iteration macros
* for_each_element
* for_each_element_id
* for_each_element_extid
which take, as their first 'argument', such a structure and
iterate through a given u8 array interpreting it as elements.
While at it and since we'll need it, also add
* for_each_subelement
* for_each_subelement_id
* for_each_subelement_extid
which instead of taking data/length just take an outer element
and use its data/datalen.
Also add for_each_element_completed() to determine if any of
the loops above completed, i.e. it was able to parse all of
the elements successfully and no data remained.
Use for_each_element_id() in cfg80211_find_ie_match() as the
first user of this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit ed75986f4a ("net/smc: ipv6 support for smc_diag.c") changed the
value of the diag_family field. The idea was to indicate the family of
the IP address in the inet_diag_sockid field. But the change makes it
impossible to distinguish an inet_sock_diag response message from SMC
sock_diag response. This patch restores the original behaviour and sends
AF_SMC as value of the diag_family field.
Fixes: ed75986f4a ("net/smc: ipv6 support for smc_diag.c")
Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The lgr field of an smc_connection is set in smc_conn_create() and
should be cleared in smc_conn_free() for consistency reasons, so move
the responsible code.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If SMC client and server connections are both established at the same
time, smc_connect_rdma() cannot send a CLC confirm message while
smc_listen_work() is waiting for one due to lock contention. This can
result in timeouts in smc_clc_wait_msg() and failed SMC connections.
In case of SMC-R, there are two types of LGRs (client and server LGRs)
which can be protected by separate locks. So, this patch splits the LGR
pending lock into two separate locks for client and server to avoid the
locking issue for SMC-R.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If SMC client and server connections are both established at the same
time, smc_connect_ism() cannot send a CLC confirm message while
smc_listen_work() is waiting for one due to lock contention. This can
result in timeouts in smc_clc_wait_msg() and failed SMC connections.
In case of SMC-D, the LGR pending lock is not needed while
smc_listen_work() is waiting for the CLC confirm message. So, this patch
releases the lock earlier for SMC-D to avoid the locking issue.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SMC already provides a wrapper for atomic64 calls to be
architecture independent. Use this wrapper for SMC-D as well.
Reported-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to RFC7609 (http://www.rfc-editor.org/info/rfc7609)
first the SMC-R connection is shut down and then the normal TCP
connection FIN processing drives cleanup of the internal TCP connection.
The unconditional release of the clcsock during active socket closing
has to be postponed if the peer has not yet signalled socket closing.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we disabled IPv6 from the kernel command line (ipv6.disable=1), we should
not call ip6_err_gen_icmpv6_unreach(). This:
ip link add sit1 type sit local 192.0.2.1 remote 192.0.2.2 ttl 1
ip link set sit1 up
ip addr add 198.51.100.1/24 dev sit1
ping 198.51.100.2
if IPv6 is disabled at boot time, will crash the kernel.
v2: there's no need to use in6_dev_get(), use __in6_dev_get() instead,
as we only need to check that idev exists and we are under
rcu_read_lock() (from netif_receive_skb_internal()).
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: ca15a078bd ("sit: generate icmpv6 error when receiving icmpv4 error")
Cc: Oussama Ghorbel <ghorbel@pivasoftware.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add devlink health dump commands, in order to run an dump operation
over a specific reporter.
The supported operations are dump_get in order to get last saved
dump (if not exist, dump now) and dump_clear to clear last saved
dump.
It is expected from driver's callback for diagnose command to fill it
via the devlink fmsg API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add devlink health diagnose command, in order to run a diagnose
operation over a specific reporter.
It is expected from driver's callback for diagnose command to fill it
via the devlink fmsg API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add devlink health recover command to the uapi, in order to allow the user
to execute a recover operation over a specific reporter.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add devlink health set command, in order to set configuration parameters
for a specific reporter.
Supported parameters are:
- graceful_period: Time interval between auto recoveries (in msec)
- auto_recover: Determines if the devlink shall execute recover upon
receiving error for the reporter
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add devlink health get command to provide reporter/s data for user space.
Add the ability to get data per reporter or dump data from all available
reporters.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon error discover, every driver can report it to the devlink health
mechanism via devlink_health_report function, using the appropriate
reporter registered to it. Driver can pass error specific context which
will be delivered to it as part of the dump / recovery callbacks.
Once an error is reported, devlink health will do the following actions:
* A log is being send to the kernel trace events buffer
* Health status and statistics are being updated for the reporter instance
* Object dump is being taken and stored at the reporter instance (as long
as there is no other dump which is already stored)
* Auto recovery attempt is being done. Depends on:
- Auto Recovery configuration
- Grace period vs. Time since last recover
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devlink health reporter is an instance for reporting, diagnosing and
recovering from run time errors discovered by the reporters.
Define it's data structure and supported operations.
In addition, expose devlink API to create and destroy a reporter.
Each devlink instance will hold it's own reporters list.
As part of the allocation, driver shall provide a set of callbacks which
will be used by devlink in order to handle health reports and user
commands related to this reporter. In addition, driver is entitled to
provide some priv pointer, which can be fetched from the reporter by
devlink_health_reporter_priv function.
For each reporter, devlink will hold a metadata of statistics,
dump msg and status.
For passing dumps and diagnose data to the user-space, it will use devlink
fmsg API.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devlink fmsg is a mechanism to pass descriptors between drivers and
devlink, in json-like format. The API allows the driver to add nested
attributes such as object, object pair and value array, in addition to
attributes such as name and value.
Driver can use this API to fill the fmsg context in a format which will be
translated by the devlink to the netlink message later.
There is no memory allocation in advance (other than the initial list
head), and it dynamically allocates messages descriptors and add them to
the list on the fly.
When it needs to send the data using SKBs to the netlink layer, it
fragments the data between different SKBs. In order to do this
fragmentation, it uses virtual nests attributes, to avoid actual
nesting use which cannot be divided between different SKBs.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Santosh Shilimkar says:
====================
rds: add tos support
RDS applications make use of tos to classify database traffic.
This feature has been used in shipping products from 2.6.32 based
kernels. Its tied with RDS v4.1 protocol version and the compatibility
gets negotiated as part of connections setup.
Patchset keeps full backward compatibility using existing connection
negotiation scheme. Currently the feature is exploited by RDMA
transport and for TCP transport the user tos values are mapped to
same default class (0).
For RDMA transports, RDMA CM service type API is used to
set up different SL(service lanes) and the IB fabric is configured
for tos mapping using Subnet Manager(SL to VL mappings).
Similarly for ROCE fabric, user priority is mapped with different
DSCP code points which are associated with different switch queues
in the fabric.
The original code was developed by Bang Nguyen in downstream kernel back in
2.6.32 kernel days and it has evolved significantly over period of time.
Thanks to Yanjun for doing testing with various combinations of host like
v3.1<->v4.1, v4.1.<->v3.1, v4.1 upstream to shipping v4.1 etc etc
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-02-07
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add a riscv64 JIT for BPF, from Björn.
2) Implement BTF deduplication algorithm for libbpf which takes BTF type
information containing duplicate per-compilation unit information and
reduces it to an equivalent set of BTF types with no duplication and
without loss of information, from Andrii.
3) Offloaded and native BPF XDP programs can coexist today, enable also
offloaded and generic ones as well, from Jakub.
4) Expose various BTF related helper functions in libbpf as API which
are in particular helpful for JITed programs, from Yonghong.
5) Fix the recently added JMP32 code emission in s390x JIT, from Heiko.
6) Fix BPF kselftests' tcp_{server,client}.py to be able to run inside
a network namespace, also add a fix for libbpf to get libbpf_print()
working, from Stanislav.
7) Fixes for bpftool documentation, from Prashant.
8) Type cleanup in BPF kselftests' test_maps.c to silence a gcc8 warning,
from Breno.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have a dedicated NDO for getting a port's parent ID, get rid
of SWITCHDEV_ATTR_ID_PORT_PARENT_ID and convert all callers to use the
NDO exclusively. This is a preliminary change to getting rid of
switchdev_ops eventually.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA implements SWITCHDEV_ATTR_ID_PORT_PARENT_ID and we want to get rid
of switchdev_ops eventually, ease that migration by implementing a
ndo_get_port_parent_id() function which returns what
switchdev_port_attr_get() would do.
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for getting rid of switchdev_ops, create a dedicated NDO
operation for getting the port's parent identifier. There are
essentially two classes of drivers that need to implement getting the
port's parent ID which are VF/PF drivers with a built-in switch, and
pure switchdev drivers such as mlxsw, ocelot, dsa etc.
We introduce a helper function: dev_get_port_parent_id() which supports
recursion into the lower devices to obtain the first port's parent ID.
Convert the bridge, core and ipv4 multicast routing code to check for
such ndo_get_port_parent_id() and call the helper function when valid
before falling back to switchdev_port_attr_get(). This will allow us to
convert all relevant drivers in one go instead of having to implement
both switchdev_port_attr_get() and ndo_get_port_parent_id() operations,
then get rid of switchdev_port_attr_get().
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function can't succeed if dp->pl is NULL. It will Oops inside the
call to return phylink_ethtool_get_eee(dp->pl, e);
Fixes: 1be52e97ed ("dsa: slave: eee: Allow ports to use phylink")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two and a half years ago, the client was changed to use gathered
Send for larger inline messages, in commit 655fec6987 ("xprtrdma:
Use gathered Send for large inline messages"). Several fixes were
required because there are a few in-kernel device drivers whose
max_sge is 3, and these were broken by the change.
Apparently my memory is going, because some time later, I submitted
commit 25fd86eca1 ("svcrdma: Don't overrun the SGE array in
svc_rdma_send_ctxt"), and after that, commit f3c1fd0ee2 ("svcrdma:
Reduce max_send_sges"). These too incorrectly assumed in-kernel
device drivers would have more than a few Send SGEs available.
The fix for the server side is not the same. This is because the
fundamental problem on the server is that, whether or not the client
has provisioned a chunk for the RPC reply, the server must squeeze
even the most complex RPC replies into a single RDMA Send. Failing
in the send path because of Send SGE exhaustion should never be an
option.
Therefore, instead of failing when the send path runs out of SGEs,
switch to using a bounce buffer mechanism to handle RPC replies that
are too complex for the device to send directly. That allows us to
remove the max_sge check to enable drivers with small max_sge to
work again.
Reported-by: Don Dutile <ddutile@redhat.com>
Fixes: 25fd86eca1 ("svcrdma: Don't overrun the SGE array in ...")
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Define a tracepoint and allow user to trace messages in case of an hardware
error code for hardware associated with devlink instance.
Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
batadv_dat_put_dhcp is creating a new ARP packet via
batadv_dat_arp_create_reply and tries to forward it via
batadv_dat_send_data to different peers in the DHT. The original skb is not
consumed by batadv_dat_send_data and thus has to be consumed by the caller.
Fixes: b61ec31c85 ("batman-adv: Snoop DHCPACKs for DAT")
Signed-off-by: Martin Weinelt <martin@linuxlounge.net>
[sven@narfation.org: add commit message]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
This patch adds a function to translate the ethtool_rx_flow_spec
structure to the flow_rule representation.
This allows us to reuse code from the driver side given that both flower
and ethtool_rx_flow interfaces use the same representation.
This patch also includes support for the flow type flags FLOW_EXT,
FLOW_MAC_EXT and FLOW_RSS.
The ethtool_rx_flow_spec_input wrapper structure is used to convey the
rss_context field, that is away from the ethtool_rx_flow_spec structure,
and the ethtool_rx_flow_spec structure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that drivers have been converted to use the flow action
infrastructure, remove this field from the tc_cls_flower_offload
structure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides the flow_stats structure that acts as container for
tc_cls_flower_offload, then we can use to restore the statistics on the
existing TC actions. Hence, tcf_exts_stats_update() is not used from
drivers anymore.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements a new function to translate from native TC action
to the new flow_action representation. Moreover, this patch also updates
cls_flower to use this new function.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This new infrastructure defines the nic actions that you can perform
from existing network drivers. This infrastructure allows us to avoid a
direct dependency with the native software TC action representation.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch wraps the dissector key and mask - that flower uses to
represent the matching side - around the flow_match structure.
To avoid a follow up patch that would edit the same LoCs in the drivers,
this patch also wraps this new flow match structure around the flow rule
object. This new structure will also contain the flow actions in follow
up patches.
This introduces two new interfaces:
bool flow_rule_match_key(rule, dissector_id)
that returns true if a given matching key is set on, and:
flow_rule_match_XYZ(rule, &match);
To fetch the matching side XYZ into the match container structure, to
retrieve the key and the mask with one single call.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit a25717d2b6 ("xdp: support simultaneous driver and
hw XDP attachment") users can load an XDP program for offload and
in native driver mode simultaneously. Allow a similar mix of
offload and SKB mode/generic XDP.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When we destroy the interface we already hold the wdev->mtx
while calling cfg80211_pmsr_wdev_down(), which assumes this
isn't true and flushes the worker that takes the lock, thus
leading to a deadlock.
Fix this by refactoring the worker and calling its code in
cfg80211_pmsr_wdev_down() directly.
We still need to flush the work later to make sure it's not
still running and will crash, but it will not do anything.
Fixes: 9bb7e0f24e ("cfg80211: add peer measurement with FTM initiator API")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When we *don't* have a MAC address attribute, we shouldn't
try to use this - this was intended to copy the local MAC
address instead, so fix it.
Fixes: 9bb7e0f24e ("cfg80211: add peer measurement with FTM initiator API")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Use CONFIG_NF_TABLES_INET from seltests, not NF_TABLES_INET.
From Naresh Kamboju.
2) Add a test to cover masquerading and redirect case, from Florian
Westphal.
3) Two packets coming from the same socket may race to set up NAT,
ending up with different tuples and the packet losing race being
dropped. Update nf_conntrack_tuple_taken() to exercise clash
resolution for this case. From Martynas Pumputis and Florian
Westphal.
4) Unbind anonymous sets from the commit and abort path, this fixes
a splat due to double set list removal/release in case that the
transaction needs to be aborted.
5) Do not preserve original output interface for packets that are
redirected in the output chain when ip6_route_me_harder() is
called. Otherwise packets end up going not going to the loopback
device. From Eli Cooper.
6) Fix bogus splat in nft_compat with CONFIG_REFCOUNT_FULL=y, this
also simplifies the existing logic to deal with the list insertions
of the xtables extensions. From Florian Westphal.
Diffstat look rather larger than usual because of the new selftest, but
Florian and I consider that having tests soon into the tree is good to
improve coverage. If there's a different policy in this regard, please,
let me know.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When I moved the refcount to refcount_t type I missed the fact that
refcount_inc() will result in use-after-free warning with
CONFIG_REFCOUNT_FULL=y builds.
The correct fix would be to init the reference count to 1 at allocation
time, but, unfortunately we cannot do this, as we can't undo that
in case something else fails later in the batch.
So only solution I see is to special-case the 'new entry' condition
and replace refcount_inc() with a "delayed" refcount_set(1) in this case,
as done here.
The .activate callback can be removed to simplify things, we only
need to make sure that deactivate() decrements/unlinks the entry
from the list at end of transaction phase (commit or abort).
Fixes: 12c44aba66 ("netfilter: nft_compat: use refcnt_t type for nft_xt reference count")
Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 508b09046c ("netfilter: ipv6: Preserve link scope traffic
original oif") made ip6_route_me_harder() keep the original oif for
link-local and multicast packets. However, it also affected packets
for the loopback address because it used rt6_need_strict().
REDIRECT rules in the OUTPUT chain rewrite the destination to loopback
address; thus its oif should not be preserved. This commit fixes the bug
that redirected local packets are being dropped. Actually the packet was
not exactly dropped; Instead it was sent out to the original oif rather
than lo. When a packet with daddr ::1 is sent to the router, it is
effectively dropped.
Fixes: 508b09046c ("netfilter: ipv6: Preserve link scope traffic original oif")
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
xfrm_state_put() moves struct xfrm_state to the GC list
and schedules the GC work to clean it up. On net exit call
path, xfrm_state_flush() is called to clean up and
xfrm_flush_gc() is called to wait for the GC work to complete
before exit.
However, this doesn't work because one of the ->destructor(),
ipcomp_destroy(), schedules the same GC work again inside
the GC work. It is hard to wait for such a nested async
callback. This is also why syzbot still reports the following
warning:
WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
...
ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
kthread+0x357/0x430 kernel/kthread.c:246
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
In fact, it is perfectly fine to bypass GC and destroy xfrm_state
synchronously on net exit call path, because it is in process context
and doesn't need a work struct to do any blocking work.
This patch introduces xfrm_state_put_sync() which simply bypasses
GC, and lets its callers to decide whether to use this synchronous
version. On net exit path, xfrm_state_fini() and
xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
blocking, it can use xfrm_state_put_sync() directly too.
Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
reflect this change.
Fixes: b48c05ab5d ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The unbalance of master's promiscuity or allmulti will happen after ifdown
and ifup a slave interface which is in a bridge.
When we ifdown a slave interface , both the 'dsa_slave_close' and
'dsa_slave_change_rx_flags' will clear the master's flags. The flags
of master will be decrease twice.
In the other hand, if we ifup the slave interface again, since the
slave's flags were cleared the 'dsa_slave_open' won't set the master's
flag, only 'dsa_slave_change_rx_flags' that triggered by 'br_add_if'
will set the master's flags. The flags of master is increase once.
Only propagating flag changes when a slave interface is up makes
sure this does not happen. The 'vlan_dev_change_rx_flags' had the
same problem and was fixed, and changes here follows that fix.
Fixes: 91da11f870 ("net: Distributed Switch Architecture protocol support")
Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For RDMA transports, RDS TOS is an extension of IB QoS(Annex A13)
to provide clients the ability to segregate traffic flows for
different type of data. RDMA CM abstract it for ULPs using
rdma_set_service_type(). Internally, each traffic flow is
represented by a connection with all of its independent resources
like that of a normal connection, and is differentiated by
service type. In other words, there can be multiple qp connections
between an IP pair and each supports a unique service type.
The feature has been added from RDSv4.1 onwards and supports
rolling upgrades. RDMA connection metadata also carries the tos
information to set up SL on end to end context. The original
code was developed by Bang Nguyen in downstream kernel back in
2.6.32 kernel days and it has evolved over period of time.
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
RDMA transport maps user tos to underline virtual lanes(VL)
for IB or DSCP values. RDMA CM transport abstract thats for
RDS. TCP transport makes use of default priority 0 and maps
all user tos values to it.
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
RDS Service type (TOS) is user-defined and needs to be configured
via RDS IOCTL interface. It must be set before initiating any
traffic and once set the TOS can not be changed. All out-going
traffic from the socket will be associated with its TOS.
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
For legacy protocol version incompatibility with non linux RDS,
consumer reject reason being used to convey it to peer. But the
choice of reject reason value as '1' was really poor.
Anyway for interoperability reasons with shipping products,
it needs to be supported. For any future versions, properly
encoded reject reason should to be used.
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Mark RDSv3.1 as compat version and add v4.1 version macro's.
Subsequent patches enable TOS(Type of Service) feature which is
tied with v4.1 for RDMA transport.
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
indirect calls are only needed if ipv6 is a module.
Add helpers to abstract the v6ops indirections and use them instead.
fragment, reroute and route_input are kept as indirect calls.
The first two are not not used in hot path and route_input is only
used by bridge netfilter.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nf_nat_ipv6 calls two ipv6 core functions, so add those to v6ops to avoid
the module dependency.
This is a prerequisite for merging ipv4 and ipv6 nat implementations.
Add wrappers to avoid the indirection if ipv6 is builtin.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In fl_change(), when adding a new rule (i.e. fold == NULL), a driver may
reject the new rule, for example due to resource exhaustion. By that
point, the new rule was already assigned a mask, and it was added to
that mask's hash table. The clean-up path that's invoked as a result of
the rejection however neglects to undo the hash table addition, and
proceeds to free the new rule, thus leaving a dangling pointer in the
hash table.
Fix by removing fnew from the mask's hash table before it is freed.
Fixes: 35cc3cefc4 ("net/sched: cls_flower: Reject duplicated rules also under skip_sw")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If some kind of closing is received from the peer while still in
state SMC_INIT, it means the peer has had an active connection and
closed the socket quickly before listen_work finished. This should
not result in a shortcut from state SMC_INIT to state SMC_CLOSED.
This patch adds the socket to the accept queue in state
SMC_APPCLOSEWAIT1. The socket reaches state SMC_CLOSED once being
accepted and closed with smc_release().
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once RMBs are flagged as unused they are candidates for reuse.
Thus the LLC DELETE RKEY operaton should be made before flagging
the RMB as unused.
Fixes: c7674c001b ("net/smc: unregister rkeys of unused buffer")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some scenarios a separate consumer cursor update is necessary.
The decision is made in smc_tx_consumer_cursor_update(). The
sender_free computation could be wrong:
The rx confirmed cursor is always smaller than or equal to the
rx producer cursor. The parameters in the smc_curs_diff() call
have to be exchanged, otherwise sender_free might even be negative.
And if more data arrives local_rx_ctrl.prod might be updated, enabling
a cursor difference between local_rx_ctrl.prod and rx confirmed cursor
larger than the RMB size. This case is not covered by smc_curs_diff().
Thus function smc_curs_diff_large() is introduced here.
If a recvmsg() is processed in parallel, local_tx_ctrl.cons might
change during smc_cdc_msg_send. Make sure rx_curs_confirmed is updated
with the actually sent local_tx_ctrl.cons value.
Fixes: e82f2e31f5 ("net/smc: optimize consumer cursor updates")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The work requests for rdma writes are built in local variables within
function smc_tx_rdma_write(). This violates the rule that the work
request storage has to stay till the work request is confirmed by
a completion queue response.
This patch introduces preallocated memory for these work requests.
The storage is allocated, once a link (and thus a queue pair) is
established.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anonymous sets that are bound to rules from the same transaction trigger
a kernel splat from the abort path due to double set list removal and
double free.
This patch updates the logic to search for the transaction that is
responsible for creating the set and disable the set list removal and
release, given the rule is now responsible for this. Lookup is reverse
since the transaction that adds the set is likely to be at the tail of
the list.
Moreover, this patch adds the unbind step to deliver the event from the
commit path. This should not be done from the worker thread, since we
have no guarantees of in-order delivery to the listener.
This patch removes the assumption that both activate and deactivate
callbacks need to be provided.
Fixes: cd5125d8f5 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase")
Reported-by: Mikhail Morfikov <mmorfikov@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nft "tunnel" expr match both the tun_info of RX and TX. This patch
provide the NFTA_TUNNEL_MODE to individually match the tun_info of
RX or TX.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It is possible that two concurrent packets originating from the same
socket of a connection-less protocol (e.g. UDP) can end up having
different IP_CT_DIR_REPLY tuples which results in one of the packets
being dropped.
To illustrate this, consider the following simplified scenario:
1. Packet A and B are sent at the same time from two different threads
by same UDP socket. No matching conntrack entry exists yet.
Both packets cause allocation of a new conntrack entry.
2. get_unique_tuple gets called for A. No clashing entry found.
conntrack entry for A is added to main conntrack table.
3. get_unique_tuple is called for B and will find that the reply
tuple of B is already taken by A.
It will allocate a new UDP source port for B to resolve the clash.
4. conntrack entry for B cannot be added to main conntrack table
because its ORIGINAL direction is clashing with A and the REPLY
directions of A and B are not the same anymore due to UDP source
port reallocation done in step 3.
This patch modifies nf_conntrack_tuple_taken so it doesn't consider
colliding reply tuples if the IP_CT_DIR_ORIGINAL tuples are equal.
[ Florian: simplify patch to not use .allow_clash setting
and always ignore identical flows ]
Signed-off-by: Martynas Pumputis <martynas@weave.works>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
net/core/sock.c: In function 'sock_setsockopt':
net/core/sock.c:914:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
sock_set_flag(sk, SOCK_TSTAMP_NEW);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/core/sock.c:915:2: note: here
case SO_TIMESTAMPING_OLD:
^~~~
Fixes: 9718475e69 ("socket: Add SO_TIMESTAMPING_NEW")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the header search paths -Itools/include and
-Itools/include/uapi are not used. Let's drop the unused code.
We can remove -I. too by fixing up one C file.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now when using stream reconfig to add out streams, stream->out
will get re-allocated, and all old streams' information will
be copied to the new ones and the old ones will be freed.
So without stream->out_curr updated, next time when trying to
send from stream->out_curr stream, a panic would be caused.
This patch is to check and update stream->out_curr when
allocating stream_out.
v1->v2:
- define fa_index() to get elem index from stream->out_curr.
v2->v3:
- repost with no change.
Fixes: 5bbbbe32a4 ("sctp: introduce stream scheduler foundations")
Reported-by: Ying Xu <yinxu@redhat.com>
Reported-by: syzbot+e33a3a138267ca119c7d@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shared buffer allocation is usually done in cell increments.
Drivers will either round up the allocation or refuse the
configuration if it's not an exact multiple of cell size.
Drivers know exactly the cell size of shared buffer, so help
out users by providing this information in dumps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval
as the time format. struct timeval is not y2038 safe.
The subsequent patches in the series add support for new socket
timeout options with _NEW suffix that will use y2038 safe
data structures. Although the existing struct timeval layout
is sufficiently wide to represent timeouts, because of the way
libc will interpret time_t based on user defined flag, these
new flags provide a way of having a structure that is the same
for all architectures consistently.
Rename the existing options with _OLD suffix forms so that the
right option is enabled for userspace applications according
to the architecture and time_t definition of libc.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: ccaulfie@redhat.com
Cc: deller@gmx.de
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: cluster-devel@redhat.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of y2038 solution, all internal uses of
struct timeval are replaced by struct __kernel_old_timeval
and struct compat_timeval by struct old_timeval32.
Make socket timestamps use these new types.
This is mainly to be able to verify that the kernel build
is y2038 safe when such non y2038 safe types are not
supported anymore.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: isdn@linux-pingi.de
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a cleanup to prepare for the addition of 64-bit time_t
in O_SNDTIMEO/O_RCVTIMEO. The existing compat handler seems
unnecessarily complex and error-prone, moving it all into the
main setsockopt()/getsockopt() implementation requires half
as much code and is easier to extend.
32-bit user space can now use old_timeval32 on both 32-bit
and 64-bit machines, while 64-bit code can use
__old_kernel_timeval.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the virtio transport device disappear, we should reset all
connected sockets in order to inform the users.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
virtio_vsock_remove() invokes the vsock_core_exit() also if there
are opened sockets for the AF_VSOCK protocol family. In this way
the vsock "transport" pointer is set to NULL, triggering the
kernel panic at the first socket activity.
This patch move the vsock_core_init()/vsock_core_exit() in the
virtio_vsock respectively in module_init and module_exit functions,
that cannot be invoked until there are open sockets.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1609699
Reported-by: Yan Fu <yafu@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't drop IGMP packets with a source address of all zeros which are
IGMP proxy reports. This is documented in Section 2.1.1 IGMP
Forwarding Rules of RFC 4541 IGMP and MLD Snooping Switches
Considerations.
Signed-off-by: Edward Chron <echron@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2019-02-01
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) introduce bpf_spin_lock, from Alexei.
2) convert xdp samples to libbpf, from Maciej.
3) skip verifier tests for unsupported program/map types, from Stanislav.
4) powerpc64 JIT support for BTF line info, from Sandipan.
5) assorted fixed, from Valdis, Jesper, Jiong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If driver did not fill the fw_version field, try to call into
the new devlink get_info op and collect the versions that way.
We assume ethtool was always reporting running versions.
v4:
- use IS_REACHABLE() to avoid problems with DEVLINK=m (kbuildbot).
v3 (Jiri):
- do a dump and then parse it instead of special handling;
- concatenate all versions (well, all that fit :)).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ethtool -i has a few fixed-size fields which can be used to report
firmware version and expansion ROM version. Unfortunately, modern
hardware has more firmware components. There is usually some
datapath microcode, management controller, PXE drivers, and a
CPLD load. Running ethtool -i on modern controllers reveals the
fact that vendors cram multiple values into firmware version field.
Here are some examples from systems I could lay my hands on quickly:
tg3: "FFV20.2.17 bc 5720-v1.39"
i40e: "6.01 0x800034a4 1.1747.0"
nfp: "0.0.3.5 0.25 sriov-2.1.16 nic"
Add a new devlink API to allow retrieving multiple versions, and
provide user-readable name for those versions.
While at it break down the versions into three categories:
- fixed - this is the board/fixed component version, usually vendors
report information like the board version in the PCI VPD,
but it will benefit from naming and common API as well;
- running - this is the running firmware version;
- stored - this is firmware in the flash, after firmware update
this value will reflect the flashed version, while the
running version may only be updated after reboot.
v3:
- add per-type helpers instead of using the special argument (Jiri).
RFCv2:
- remove the nesting in attr DEVLINK_ATTR_INFO_VERSIONS (now
versions are mixed with other info attrs)l
- have the driver report versions from the same callback as
other info.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ethtool -i has served us well for a long time, but its showing
its limitations more and more. The device information should
also be reported per device not per-netdev.
Lay foundation for a simple devlink-based way of reading device
info. Add driver name and device serial number as initial pieces
of information exposed via this new API.
v3:
- rename helpers (Jiri);
- rename driver name attr (Jiri);
- remove double spacing in commit message (Jiri).
RFC v2:
- wrap the skb into an opaque structure (Jiri);
- allow the serial number of be any length (Jiri & Andrew);
- add driver name (Jonathan).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf 2019-01-31
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) disable preemption in sender side of socket filters, from Alexei.
2) fix two potential deadlocks in syscall bpf lookup and prog_register,
from Martin and Alexei.
3) fix BTF to allow typedef on func_proto, from Yonghong.
4) two bpftool fixes, from Jiri and Paolo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 3fb72f1e6e ("ipconfig wait for carrier") added a
"wait for carrier" policy, with a fixed worst case maximum wait
of two minutes.
Now make the wait for carrier timeout configurable on the kernel
commandline and use the 120s as the default.
The timeout messages introduced with
commit 5e404cd658 ("ipconfig: add informative timeout messages while
waiting for carrier") are done in a fixed interval of 20 seconds, just
like they were before (240/12).
Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we don't zerocopy if the crypto framework async bit is set.
However some crypto algorithms (such as x86 AESNI) support async,
but in the context of sendmsg, will never run asynchronously. Instead,
check for actual EINPROGRESS return code before assuming algorithm is
async.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS 1.3 has minor changes from TLS 1.2 at the record layer.
* Header now hardcodes the same version and application content type in
the header.
* The real content type is appended after the data, before encryption (or
after decryption).
* The IV is xored with the sequence number, instead of concatinating four
bytes of IV with the explicit IV.
* Zero-padding: No exlicit length is given, we search backwards from the
end of the decrypted data for the first non-zero byte, which is the
content type. Currently recv supports reading zero-padding, but there
is no way for send to add zero padding.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For TLS 1.3, the control message is encrypted. Handle control
message checks after decryption.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS 1.3 has a different AAD size, use a variable in the code to
make TLS 1.3 support easy.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wire up support for 256 bit keys from the setsockopt to the crypto
framework
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not use pend->idx as index for the arrays because its value is
located in the cleared area. Use the existing local variable instead.
Without this fix the wrong area might be cleared.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device field of the IB event structure does not always point to the
SMC IB device. Load the pointer from the qp_context which is always
provided to smc_ib_qp_event_handler() in the priv field. And for qp
events the affected port is given in the qp structure of the ibevent,
derive it from there.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call smc_cdc_msg_send() under the connection send_lock to make sure all
send operations for one connection are serialized.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
smc_cdc_get_free_slot() might wait for free transfer buffers when using
SMC-R. This wait should not be done under the send_lock, which is a
spin_lock. This fixes a cpu loop in parallel threads waiting for the
send_lock.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a socket was connected and is now shut down for read, return 0 to
indicate end of data in recvmsg and splice_read (like TCP) and do not
return ENOTCONN. This behavior is required by the socket api.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When there is no more send buffer space and at least 1 byte was already
sent then return to user space. The wait is only done when no data was
sent by the sendmsg() call.
This fixes smc_tx_sendmsg() which tried to always send all user data and
started to wait for free send buffer space when needed. During this wait
the user space program was blocked in the sendmsg() call and hence not
able to receive incoming data. When both sides were in such a situation
then the connection stalled forever.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To prevent races between smc_lgr_terminate() and smc_conn_free() add an
extra check of the lgr field before accessing it, and cancel a delayed
free_work when a new smc connection is created.
This fixes the problem that free_work cleared the lgr variable but
smc_lgr_terminate() or smc_conn_free() still access it in parallel.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, users can only send pnetids with a maximum length of 15 bytes
over the SMC netlink interface although the maximum pnetid length is 16
bytes. This patch changes the SMC netlink policy to accept 16 byte
pnetids.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Comparing an int to a size, which is unsigned, causes the int to become
unsigned, giving the wrong result. kernel_sendmsg can return a negative
error code.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to provide more meaningful messages to user when the process of
loading xdp program onto network interface failed, let's add extack
messages within dev_change_xdp_fd.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
bpf program serialize access to other variables.
Example:
struct hash_elem {
int cnt;
struct bpf_spin_lock lock;
};
struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
if (val) {
bpf_spin_lock(&val->lock);
val->cnt++;
bpf_spin_unlock(&val->lock);
}
Restrictions and safety checks:
- bpf_spin_lock is only allowed inside HASH and ARRAY maps.
- BTF description of the map is mandatory for safety analysis.
- bpf program can take one bpf_spin_lock at a time, since two or more can
cause dead locks.
- only one 'struct bpf_spin_lock' is allowed per map element.
It drastically simplifies implementation yet allows bpf program to use
any number of bpf_spin_locks.
- when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
- bpf program must bpf_spin_unlock() before return.
- bpf program can access 'struct bpf_spin_lock' only via
bpf_spin_lock()/bpf_spin_unlock() helpers.
- load/store into 'struct bpf_spin_lock lock;' field is not allowed.
- to use bpf_spin_lock() helper the BTF description of map value must be
a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
Nested lock inside another struct is not allowed.
- syscall map_lookup doesn't copy bpf_spin_lock field to user space.
- syscall map_update and program map_update do not update bpf_spin_lock field.
- bpf_spin_lock cannot be on the stack or inside networking packet.
bpf_spin_lock can only be inside HASH or ARRAY map value.
- bpf_spin_lock is available to root only and to all program types.
- bpf_spin_lock is not allowed in inner maps of map-in-map.
- ld_abs is not allowed inside spin_lock-ed region.
- tracing progs and socket filter progs cannot use bpf_spin_lock due to
insufficient preemption checks
Implementation details:
- cgroup-bpf class of programs can nest with xdp/tc programs.
Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
Other solutions to avoid nested bpf_spin_lock are possible.
Like making sure that all networking progs run with softirq disabled.
spin_lock_irqsave is the simplest and doesn't add overhead to the
programs that don't use it.
- arch_spinlock_t is used when its implemented as queued_spin_lock
- archs can force their own arch_spinlock_t
- on architectures where queued_spin_lock is not available and
sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
- presence of bpf_spin_lock inside map value could have been indicated via
extra flag during map_create, but specifying it via BTF is cleaner.
It provides introspection for map key/value and reduces user mistakes.
Next steps:
- allow bpf_spin_lock in other map types (like cgroup local storage)
- introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
to request kernel to grab bpf_spin_lock before rewriting the value.
That will serialize access to map elements.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
- bump version strings, by Simon Wunderlich
- Add DHCPACKs for DAT snooping, by Linus Luessing
- Update copyright years for 2019, by Sven Eckelmann
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlxUKp8WHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoc1rD/oC39wZl/ZdubJogpGWI3G9SdU8
SxbjnaCVCkLMsLJH/wv/OM9OWwHa2OhvOzmMO5Zq47dy5SYXqBMkvv0QOMJcUQqz
1nPMssTVo9gQLrJiPiP+yXGZ5+sc4dsLCiykFsCs7YRhqeStL1QTZkAOEKfy0K+7
6G5OQsLT62aY+1OWLbb5oJaB9nlhUY+GyuQZ213jNuYxP7I6MyM9FfMHokASMLn8
H58fIvDyGkC0PXvYZiJedlnFBU92TPEnBAV/afJ8egcmYQw9jkWL3cbS5ZzqDG4m
49p9/Xmt2ARsf4UMDxQTEE3elw3tu1PZGPSecTmU+rRzHyYHIIYWnFgTZWmK7/zU
TKQMlrPx4ky8HOyIY6/5AHNR7x5muchgxf0ft+4Jf0Bf+rGFIgdqfAIeQliUNAUc
IW+HC0c1SEU/519a6z1V/ARrC6W4qk8aBZ0G4zyx+76KLvxlyvgjEo3XNasyNJnY
GpHHhpyIeY5xeNOsGmoVrQraJRMqwr4jnWdcmf1LS8o9loB6X7bij8SRKUsEwNi2
AkIK2sojUf6c4YZW/GVqIgvPuGtZL/Sy9FHx7Ve5f1NRuxpcMStSVV+daYqHYEe6
72/WYF+oJc4fCR3zAXc67wVoyuPVPxOKpwh2uJUeVbvdvHZ0+dpOV740ktdFjIWQ
3XAGhl/dSrn7OXYJqA==
=G6IW
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-for-davem-20190201' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- Add DHCPACKs for DAT snooping, by Linus Luessing
- Update copyright years for 2019, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In sctp_sendmesg(), when walking the list of endpoint associations, the
association can be dropped from the list, making the list corrupt.
Properly handle this by using list_for_each_entry_safe()
Fixes: 4910280503 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg")
Reported-by: Secunia Research <vuln@secunia.com>
Tested-by: Secunia Research <vuln@secunia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* airtime fairness scheduling in mac80211, so we can share
* more authentication offloads to userspace - this is for
SAE which is part of WPA3 and is hard to do in firmware
* documentation fixes
* various mesh improvements
* various other small improvements/cleanups
This also contains the NLA_POLICY_NESTED{,_ARRAY} change we
discussed, which affects everyone but there's no other user
yet.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlxUKdwACgkQB8qZga/f
l8TVaA/+Mw4gwHqAb4oQPfimZi4Zvaefu1lA6xRFEpubMUuo9ntR+qbRQWEt4D5B
XOyY0xoqqeewxelGdXZ/XvF9vFxK1m9SgP5A5RSswbWETDLV/AIL3x9OdhcBLYMn
ONm79SgMC0pyP+UePvgtZqNSQpf/3T1jeTBZAADyE3jf42iYhvtpyRFKwJi1mvNT
wGN6mWz4FD4m3uFQem6bVRFPPfv06f29fK7JcY9iTkP7H8Glt0NK9mrt2OVPqZM4
g7BDg/epOlKuvFgv5Z1uttNWfEWikZt6fWbyusA7T4UBO7jf7mQKotSczQ1D0awQ
wWFfLcX8esVmiioMwyyJus1sxxxX3GIcbIfgfd3wM4kq4LfqICuT8NSssziPD5OH
A1gJMQsVk/bT2uACDA2IoX1m4Sf1hvgdKiyihyMQdcAlJuRBh8Bbbthd2Jzlo+nM
mHNT2lW2WR+0WslSl+uEk9NCtnOVZKdmSM5dJCmFgc/sycg7+AtksQNBT5p8C7SR
TsqArDDRHXsySZYFVCDziaJsU/SIp3RQ/5K3owzs7TtCkVOlOJVRlG7/4twa4AdH
qw3VPspXmPj6N+LDbBfmz5g2YAVQXktTTkrfYtwN5lAD9qfLpCRwO3T+BaVgVASZ
8GCd0uIuIo8DI61QFf8lLbB5K90FHoSNNfG0yDeB7oBYF+S5v94=
=9ese
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2019-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
New features for the wifi stack:
* airtime fairness scheduling in mac80211, so we can share
* more authentication offloads to userspace - this is for
SAE which is part of WPA3 and is hard to do in firmware
* documentation fixes
* various mesh improvements
* various other small improvements/cleanups
This also contains the NLA_POLICY_NESTED{,_ARRAY} change we
discussed, which affects everyone but there's no other user
yet.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- Avoid WARN to report incorrect configuration, by Sven Eckelmann
- Fix mac header position setting, by Sven Eckelmann
- Fix releasing station statistics, by Felix Fietkau
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlxUKaYWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoZZXEACeYSbn6Z6/B6fUg0pYiDtupnr5
wTCiomEoJHSh9Ko4xz3NhTdr68b1dkq+rZrANtMf7nGJau+VDSsPdbKG5+efdU0O
or0F2gzBB1O4YF4zX50IwVz/kV81qsumzosrISDgJ1vbcEwUOf6Tu1BW+5IQjrS7
rJPYQ5V9OABoPhRd0zjnQq5efxTcouVbKDwDNubHLzjUW6fRKfEH+R8IjhzSq8Wq
E4KCEKdfFQLkaIqNYRbfaRSqD1diK5xQm1Z1ioZXPGJYLaHyDctVmC9b5vULm1fg
BGxBOcuniEvoSMLTnfSRxKHkxhioSJjIiHu4yABcCZFI1RCsDAQchTissUjtWirP
JAnR+/0wRzGKxdNc+rSISwBB0yxKTiSxNvBQRwnFoX3PkKlPLMQOZT45UH6qKFYj
yBu+2hLiG+BaJksz2R1K/23COmAJO3JktfcvvKVwki+EmXlU8TFDj7HWzM3oZO/F
b4GXR5+lO1nt92llGSn7y+IhD/VYM8mwPG40ANVPTaaWiMeAUd/0YaVw8xyf+jte
zedLOYRlQQs1ab92CJVHKXP3IH+IY9m7HNSdFjmR3mAN2aIq7ETR1ZulAwflaYe3
VI3tMe5kDPihrCkJm7GLIbjGxVJmcwqJFSa+usStsTv2xcSvcK1pxQKEoFjkFZXs
wpOsiHI1nWRQBxW5SA==
=M/+V
-----END PGP SIGNATURE-----
Merge tag 'batadv-net-for-davem-20190201' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Avoid WARN to report incorrect configuration, by Sven Eckelmann
- Fix mac header position setting, by Sven Eckelmann
- Fix releasing station statistics, by Felix Fietkau
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We recently changed this function in commit f9fc54d313 ("ethtool:
check the return value of get_regs_len") such that if "reglen" is zero
we return directly. That means we can remove this condition as well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the missing and malformed documentation that kernel-doc and
sphinx warn about. While at it, also add some things to the docs
to fix missing links.
Sadly, the only way I could find to fix this was to add some
trailing whitespace.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since we now prevent regulatory restore during STA disconnect
if concurrent AP interfaces are active, we need to reschedule
this check when the AP state changes. This fixes never doing
a restore when an AP is the last interface to stop. Or to put
it another way: we need to re-check after anything we check
here changes.
Cc: stable@vger.kernel.org
Fixes: 113f3aaa81 ("cfg80211: Prevent regulatory restore during STA disconnect in concurrent interfaces")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some drivers use IEEE80211_KEY_FLAG_SW_MGMT_TX to indicate that management
frames need to be software encrypted. Since normal data packets are still
encrypted by the hardware, crypto_tx_tailroom_needed_cnt gets decremented
after key upload to hw. This can lead to passing skbs to ccmp_encrypt_skb,
which don't have the necessary tailroom for software encryption.
Change the code to add tailroom for encrypted management packets, even if
crypto_tx_tailroom_needed_cnt is 0.
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In typical cases, there's no need to pass both the maxattr
and the policy array pointer, as the maxattr should just be
ARRAY_SIZE(policy) - 1. Therefore, to be less error prone,
just remove the maxattr argument from the default macros
and deduce the size accordingly.
Leave the original macros with a leading underscore to use
here and in case somebody needs to pass a policy pointer
where the policy isn't declared in the same place and thus
ARRAY_SIZE() cannot be used.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Merge net-next so that we get the changes from net, which would
otherwise conflict with the NLA_POLICY_NESTED/_ARRAY changes.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There was a typo in the documentation for weight_multiplier in mac80211.h,
and the doc was missing entirely for airtime and airtime_weight in sta_info.h.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The size of L2TPv2 header with all optional fields is 14 bytes.
l2tp_udp_recv_core only moves 10 bytes to the linear part of a
skb. This may lead to l2tp_recv_common read data outside of a skb.
This patch make sure that there is at least 14 bytes in the linear
part of a skb to meet the maximum need of l2tp_udp_recv_core and
l2tp_recv_common. The minimum size of both PPP HDLC-like frame and
Ethernet frame is larger than 14 bytes, so we are safe to do so.
Also remove L2TP_HDR_SIZE_NOSEQ, it is unused now.
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use pskb_may_pull() to make sure the optional fields are in skb linear
parts, so we can safely read them later.
It's easy to reproduce the issue with a net driver that supports paged
skb data. Just create a L2TPv3 over IP tunnel and then generates some
network traffic.
Once reproduced, rx err in /sys/kernel/debug/l2tp/tunnels will increase.
Changes in v4:
1. s/l2tp_v3_pull_opt/l2tp_v3_ensure_opt_in_linear/
2. s/tunnel->version != L2TP_HDR_VER_2/tunnel->version == L2TP_HDR_VER_3/
3. Add 'Fixes' in commit messages.
Changes in v3:
1. To keep consistency, move the code out of l2tp_recv_common.
2. Use "net" instead of "net-next", since this is a bug fix.
Changes in v2:
1. Only fix L2TPv3 to make code simple.
To fix both L2TPv3 and L2TPv2, we'd better refactor l2tp_recv_common.
It's complicated to do so.
2. Reloading pointers after pskb_may_pull
Fixes: f7faffa3ff ("l2tp: Add L2TPv3 protocol support")
Fixes: 0d76751fad ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support")
Fixes: a32e0eec70 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb->cb may contain data from previous layers (in an observed case
IPv4 with L3 Master Device). In the observed scenario, the data in
IPCB(skb)->frags was misinterpreted as IP6CB(skb)->frag_max_size,
eventually caused an unexpected IPv6 fragmentation in ip6_fragment()
through ip6_finish_output().
This patch clears IP6CB(skb), which potentially contains garbage data,
on the SRH ip4ip6 encapsulation.
Fixes: 32d99d0b67 ("ipv6: sr: add support for ip4ip6 encapsulation")
Signed-off-by: Yohei Kanemaru <yohei.kanemaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Erspan_v4, Erspan_v6 protocol relies on o_key to configure
session id header field. However TUNNEL_KEY bit is cleared in
ip6erspan_tunnel_xmit since ERSPAN protocol does not set the key field
of the external GRE header and so the configured o_key is not reported
to userspace. The issue can be triggered with the following reproducer:
$ip link add ip6erspan1 type ip6erspan local 2000::1 remote 2000::2 \
key 1 seq erspan_ver 1
$ip link set ip6erspan1 up
ip -d link sh ip6erspan1
ip6erspan1@NONE: <BROADCAST,MULTICAST> mtu 1422 qdisc noop state DOWN mode DEFAULT
link/ether ba:ff:09:24:c3:0e brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
ip6erspan remote 2000::2 local 2000::1 encaplimit 4 flowlabel 0x00000 ikey 0.0.0.1 iseq oseq
Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in
ip6gre_fill_info
Fixes: 5a963eb61b ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erspan protocol (version 1 and 2) relies on o_key to configure
session id header field. However TUNNEL_KEY bit is cleared in
erspan_xmit since ERSPAN protocol does not set the key field
of the external GRE header and so the configured o_key is not reported
to userspace. The issue can be triggered with the following reproducer:
$ip link add erspan1 type erspan local 192.168.0.1 remote 192.168.0.2 \
key 1 seq erspan_ver 1
$ip link set erspan1 up
$ip -d link sh erspan1
erspan1@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UNKNOWN mode DEFAULT
link/ether 52:aa:99:95:9a:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
erspan remote 192.168.0.2 local 192.168.0.1 ttl inherit ikey 0.0.0.1 iseq oseq erspan_index 0
Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in
ipgre_fill_info
Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same story as before, these use struct ifreq and thus need
to be read with the shorter version to not cause faults.
Cc: stable@vger.kernel.org
Fixes: f92d4fc953 ("kill bond_ioctl()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As reported by Robert O'Callahan in
https://bugzilla.kernel.org/show_bug.cgi?id=202273
reverting the previous changes in this area broke
the SIOCGIFNAME ioctl in compat again (I'd previously
fixed it after his previous report of breakage in
https://bugzilla.kernel.org/show_bug.cgi?id=199469).
This is obviously because I fixed SIOCGIFNAME more or
less by accident.
Fix it explicitly now by making it pass through the
restored compat translation code.
Cc: stable@vger.kernel.org
Fixes: 4cf808e7ac ("kill dev_ifname32()")
Reported-by: Robert O'Callahan <robert@ocallahan.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit bf4405737f ("kill dev_ifsioc()").
This wasn't really unused as implied by the original commit,
it still handles the copy to/from user differently, and the
commit thus caused issues such as
https://bugzilla.kernel.org/show_bug.cgi?id=199469
and
https://bugzilla.kernel.org/show_bug.cgi?id=202273
However, deviating from a strict revert, rename dev_ifsioc()
to compat_ifreq_ioctl() to be clearer as to its purpose and
add a comment.
Cc: stable@vger.kernel.org
Fixes: bf4405737f ("kill dev_ifsioc()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 1cebf8f143 ("socket: fix struct ifreq
size in compat ioctl"), it's a bugfix for another commit that
I'll revert next.
This is not a 'perfect' revert, I'm keeping some coding style
intact rather than revert to the state with indentation errors.
Cc: stable@vger.kernel.org
Fixes: 1cebf8f143 ("socket: fix struct ifreq size in compat ioctl")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_scheduler and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_scheduler,
it's compatible with 0.
SCTP_CURRENT_ASSOC is supported for SCTP_STREAM_SCHEDULER in this
patch. It also adds default_ss in sctp_sock to support
SCTP_FUTURE_ASSOC.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_event and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_event,
it's compatible with 0.
SCTP_CURRENT_ASSOC is supported for SCTP_EVENT in this patch.
It also adds sctp_assoc_ulpevent_type_set() to make code more
readable.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_enable_strreset and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_enable_strreset,
it's compatible with 0.
SCTP_CURRENT_ASSOC is supported for SCTP_ENABLE_STREAM_RESET in this patch.
It also adjusts some code to keep a same check form as other functions.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_default_prinfo and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_default_prinfo,
it's compatible with 0.
SCTP_CURRENT_ASSOC is supported for SCTP_DEFAULT_PRINFO in this patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_deactivate_key.
SCTP_CURRENT_ASSOC is supported for SCTP_AUTH_DEACTIVATE_KEY in this
patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_del_key.
SCTP_CURRENT_ASSOC is supported for SCTP_AUTH_DELETE_KEY in this patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_auth_key.
SCTP_CURRENT_ASSOC is supported for SCTP_AUTH_ACTIVE_KEY in this patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_auth_key.
SCTP_CURRENT_ASSOC is supported for SCTP_AUTH_KEY in this patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_maxburst and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_maxburst,
it's compatible with 0.
SCTP_CURRENT_ASSOC is supported for SCTP_CONTEXT in this patch.
It also adjusts some code to keep a same check form as other
functions.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_context and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_context,
it's compatible with 0.
SCTP_CURRENT_ASSOC is supported for SCTP_CONTEXT in this patch.
It also adjusts some code to keep a same check form as other
functions.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_default_sndinfo and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_default_sndinfo,
it's compatible with 0.
SCTP_CURRENT_ASSOC is supported for SCTP_DEFAULT_SNDINFO in this patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_default_send_param and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_default_send_param,
it's compatible with 0.
SCTP_CURRENT_ASSOC is supported for SCTP_DEFAULT_SEND_PARAM in this patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_delayed_ack and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_delayed_ack,
it's compatible with 0.
SCTP_CURRENT_ASSOC is supported for SCTP_DELAYED_SACK in this patch.
It also adds sctp_apply_asoc_delayed_ack() to make code more readable.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP_STREAM_SCHEDULER_VALUE is a special one, as its value is not
save in sctp_sock, but only in asoc. So only SCTP_CURRENT_ASSOC
reserved assoc_id can be used in sctp_setsockopt_scheduler_value.
This patch adds SCTP_CURRENT_ASOC support for
SCTP_STREAM_SCHEDULER_VALUE.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_reconfig_supported, it's compatible with 0.
It also adjusts some code to keep a same check form as other functions.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_reconfig_supported, it's compatible with 0.
It also adjusts some code to keep a same check form as other functions.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_pr_supported, it's compatible with 0.
It also adjusts some code to keep a same check form as other functions.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_paddr_thresholds, it's compatible with 0.
It also adds pf_retrans in sctp_sock to support SCTP_FUTURE_ASSOC.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_FUTURE_ASSOC instead in
sctp_getsockopt_local_auth_chunks, it's compatible with 0.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_maxseg, it's compatible with 0.
Also check asoc_id early as other sctp setsockopts does.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_associnfo, it's compatible with 0.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_rtoinfo, it's compatible with 0.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check with SCTP_FUTURE_ASSOC instead in
sctp_/setgetsockopt_peer_addr_params, it's compatible with 0.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to add 3 constants SCTP_FUTURE_ASSOC,
SCTP_CURRENT_ASSOC and SCTP_ALL_ASSOC for reserved
assoc_ids, as defined in rfc6458#section-7.2.
And add the process for them when doing lookup and
inserting in sctp_id2assoc and sctp_assoc_set_id.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
wake_on_lan - Enables Wake on Lan for this port. If enabled,
the controller asserts a wake pin based on the WOL type.
v2->v3:
- Define only WOL types used now and define them as bitfield, so that
mutliple WOL types can be enabled upon power on.
- Modify "wake-on-lan" name to "wake_on_lan" to be symmetric with
previous definitions.
- Rename DEVLINK_PARAM_WOL_XXX to DEVLINK_PARAM_WAKE_XXX to be
symmetrical with ethtool WOL definitions.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add notification call for devlink port param set, register and unregister
functions.
Add devlink_port_param_value_changed() function to enable the driver notify
devlink on value change. Driver should use this function after value was
changed on any configuration mode part to driverinit.
v7->v8:
Order devlink_port_param_value_changed() definitions followed by
devlink_param_value_changed()
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for "driverinit" configuration mode value for devlink_port
configuration parameters. Add devlink_port_param_driverinit_value_set()
function to help the driver set the value to devlink_port.
Also, move the common code to __devlink_param_driverinit_value_set()
to be used by both device and port params.
v7->v8:
Re-order the definitions as follows:
__devlink_param_driverinit_value_get
__devlink_param_driverinit_value_set
devlink_param_driverinit_value_get
devlink_param_driverinit_value_set
devlink_port_param_driverinit_value_get
devlink_port_param_driverinit_value_set
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for "driverinit" configuration mode value for devlink_port
configuration parameters. Add devlink_port_param_driverinit_value_get()
function to help the driver get the value from devlink_port.
Also, move the common code to __devlink_param_driverinit_value_get()
to be used by both device and port params.
v7->v8:
-Add the missing devlink_port_param_driverinit_value_get() declaration.
-Also, order devlink_port_param_driverinit_value_get() after
devlink_param_driverinit_value_get/set() calls
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add port param set command to set the value for a parameter.
Value can be set to any of the supported configuration modes.
v7->v8: Append "Acked-by: Jiri Pirko <jiri@mellanox.com>"
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add port param get command which gets data per parameter.
It also has option to dump the parameters data per port.
v7->v8: Append "Acked-by: Jiri Pirko <jiri@mellanox.com>"
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add functions to register and unregister for the driver supported
configuration parameters table per port.
v7->v8:
- Order the definitions following way as suggested by Jiri.
__devlink_params_register
__devlink_params_unregister
devlink_params_register
devlink_params_unregister
devlink_port_params_register
devlink_port_params_unregister
- Append with Acked-by: Jiri Pirko <jiri@mellanox.com>.
v2->v3:
- Add a helper __devlink_params_register() with common code used by
both devlink_params_register() and devlink_port_params_register().
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Assign a default net namespace to netdevs created by init_dummy_netdev().
Fixes a NULL pointer dereference caused by busy-polling a socket bound to
an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
if napi_poll() received packets:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
IP: napi_busy_loop+0xd6/0x200
Call Trace:
sock_poll+0x5e/0x80
do_sys_poll+0x324/0x5a0
SyS_poll+0x6c/0xf0
do_syscall_64+0x6b/0x1f0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: 7db6b048da ("net: Commonize busy polling code to focus on napi_id instead of socket")
Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 75dd48e2e4 ("netfilter: nf_tables: Support RULE_ID reference in new rule")
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If there are outstanding async tx requests (when crypto returns EINPROGRESS),
there is a potential deadlock: the tx work acquires the lock, while we
cancel_delayed_work_sync() while holding the lock. Drop the lock while waiting
for the work to complete.
Fixes: a42055e8d2 ("Add support for async encryption of records...")
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aead_request_set_crypt takes an iv pointer, and we change the iv
soon after setting it. Some async crypto algorithms don't save the iv,
so we need to save it in the tls_rec for async requests.
Found by hardcoding x64 aesni to use async crypto manager (to test the async
codepath), however I don't think this combination can happen in the wild.
Presumably other hardware offloads will need this fix, but there have been
no user reports.
Fixes: a42055e8d2 ("Add support for async encryption of records...")
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-01-29
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Teach verifier dead code removal, this also allows for optimizing /
removing conditional branches around dead code and to shrink the
resulting image. Code store constrained architectures like nfp would
have hard time doing this at JIT level, from Jakub.
2) Add JMP32 instructions to BPF ISA in order to allow for optimizing
code generation for 32-bit sub-registers. Evaluation shows that this
can result in code reduction of ~5-20% compared to 64 bit-only code
generation. Also add implementation for most JITs, from Jiong.
3) Add support for __int128 types in BTF which is also needed for
vmlinux's BTF conversion to work, from Yonghong.
4) Add a new command to bpftool in order to dump a list of BPF-related
parameters from the system or for a specific network device e.g. in
terms of available prog/map types or helper functions, from Quentin.
5) Add AF_XDP sock_diag interface for querying sockets from user
space which provides information about the RX/TX/fill/completion
rings, umem, memory usage etc, from Björn.
6) Add skb context access for skb_shared_info->gso_segs field, from Eric.
7) Add support for testing flow dissector BPF programs by extending
existing BPF_PROG_TEST_RUN infrastructure, from Stanislav.
8) Split BPF kselftest's test_verifier into various subgroups of tests
in order better deal with merge conflicts in this area, from Jakub.
9) Add support for queue/stack manipulations in bpftool, from Stanislav.
10) Document BTF, from Yonghong.
11) Dump supported ELF section names in libbpf on program load
failure, from Taeung.
12) Silence a false positive compiler warning in verifier's BTF
handling, from Peter.
13) Fix help string in bpftool's feature probing, from Prashant.
14) Remove duplicate includes in BPF kselftests, from Yue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next tree:
1) Introduce a hashtable to speed up object lookups, from Florian Westphal.
2) Make direct calls to built-in extension, also from Florian.
3) Call helper before confirming the conntrack as it used to be originally,
from Florian.
4) Call request_module() to autoload br_netfilter when physdev is used
to relax the dependency, also from Florian.
5) Allow to insert rules at a given position ID that is internal to the
batch, from Phil Sutter.
6) Several patches to replace conntrack indirections by direct calls,
and to reduce modularization, from Florian. This also includes
several follow up patches to deal with minor fallout from this
rework.
7) Use RCU from conntrack gre helper, from Florian.
8) GRE conntrack module becomes built-in into nf_conntrack, from Florian.
9) Replace nf_ct_invert_tuplepr() by calls to nf_ct_invert_tuple(),
from Florian.
10) Unify sysctl handling at the core of nf_conntrack, from Florian.
11) Provide modparam to register conntrack hooks.
12) Allow to match on the interface kind string, from wenxu.
13) Remove several exported symbols, not required anymore now after
a bit of de-modulatization work has been done, from Florian.
14) Remove built-in map support in the hash extension, this can be
done with the existing userspace infrastructure, from laura.
15) Remove indirection to calculate checksums in IPVS, from Matteo Croce.
16) Use call wrappers for indirection in IPVS, also from Matteo.
17) Remove superfluous __percpu parameter in nft_counter, patch from
Luc Van Oostenryck.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The input is packet data, the output is struct bpf_flow_key. This should
make it easy to test flow dissector programs without elaborate
setup.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This way, we can reuse it for flow dissector in BPF_PROG_TEST_RUN.
No functional changes.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Despite having stopped the parser, we still need to deinitialize it
by calling strp_done so that it cancels its work. Otherwise the worker
thread can run after we have freed the parser, and attempt to access
its workqueue resulting in a use-after-free:
==================================================================
BUG: KASAN: use-after-free in pwq_activate_delayed_work+0x1b/0x1d0
Read of size 8 at addr ffff888069975240 by task kworker/u2:2/93
CPU: 0 PID: 93 Comm: kworker/u2:2 Not tainted 5.0.0-rc2-00335-g28f9d1a3d4fe-dirty #14
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
Workqueue: (null) (kstrp)
Call Trace:
print_address_description+0x6e/0x2b0
? pwq_activate_delayed_work+0x1b/0x1d0
kasan_report+0xfd/0x177
? pwq_activate_delayed_work+0x1b/0x1d0
? pwq_activate_delayed_work+0x1b/0x1d0
pwq_activate_delayed_work+0x1b/0x1d0
? process_one_work+0x4aa/0x660
pwq_dec_nr_in_flight+0x9b/0x100
worker_thread+0x82/0x680
? process_one_work+0x660/0x660
kthread+0x1b9/0x1e0
? __kthread_create_on_node+0x250/0x250
ret_from_fork+0x1f/0x30
Allocated by task 111:
sk_psock_init+0x3c/0x1b0
sock_map_link.isra.2+0x103/0x4b0
sock_map_update_common+0x94/0x270
sock_map_update_elem+0x145/0x160
__se_sys_bpf+0x152e/0x1e10
do_syscall_64+0xb2/0x3e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 112:
kfree+0x7f/0x140
process_one_work+0x40b/0x660
worker_thread+0x82/0x680
kthread+0x1b9/0x1e0
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888069975180
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 192 bytes inside of
512-byte region [ffff888069975180, ffff888069975380)
The buggy address belongs to the page:
page:ffffea0001a65d00 count:1 mapcount:0 mapping:ffff88806d401280 index:0x0 compound_mapcount: 0
flags: 0x4000000000010200(slab|head)
raw: 4000000000010200 dead000000000100 dead000000000200 ffff88806d401280
raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888069975100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888069975180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888069975200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888069975280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888069975300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Reported-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/netdev/CAJPywTLwgXNEZ2dZVoa=udiZmtrWJ0q5SuBW64aYs0Y1khXX3A@mail.gmail.com
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree:
1) The nftnl mutex is now per-netns, therefore use reference counter
for matches and targets to deal with concurrent updates from netns.
Moreover, place extensions in a pernet list. Patches from Florian Westphal.
2) Bail out with EINVAL in case of negative timeouts via setsockopt()
through ip_vs_set_timeout(), from ZhangXiaoxu.
3) Spurious EINVAL on ebtables 32bit binary with 64bit kernel, also
from Florian.
4) Reset TCP option header parser in case of fingerprint mismatch,
otherwise follow up overlapping fingerprint definitions including
TCP options do not work, from Fernando Fernandez Mancera.
5) Compilation warning in ipt_CLUSTER with CONFIG_PROC_FS unset.
From Anders Roxell.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Clearly, I missed this when trying out the previously
merged patches. Remove the spurious variable now.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Only one caller; place it where needed and get rid of the EXPORT_SYMBOL.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When nf_ct_netns_get() fails, it should clean up itself,
its caller doesn't need to call nf_conntrack_fini_net().
nf_conntrack_init_net() is called after registering sysctl
and proc, so its cleanup function should be called before
unregistering sysctl and proc.
Fixes: ba3fbe6636 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks")
Fixes: b884fa4617 ("netfilter: conntrack: unify sysctl handling")
Reported-and-tested-by: syzbot+fcee88b2d87f0539dfe9@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nft_counter_rest() has its first argument declared as
struct nft_counter_percpu_priv __percpu *priv
but this structure is not percpu (it only countains
a member 'counter' which is, correctly, a pointer to a
percpu struct nft_counter).
So, remove the '__percpu' from the argument's declaration.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
On ESP output, sk_wmem_alloc is incremented for the added padding if a
socket is associated to the skb. When replying with TCP SYNACKs over
IPsec, the associated sk is a casted request socket, only. Increasing
sk_wmem_alloc on a request socket results in a write at an arbitrary
struct offset. In the best case, this produces the following WARNING:
WARNING: CPU: 1 PID: 0 at lib/refcount.c:102 esp_output_head+0x2e4/0x308 [esp4]
refcount_t: addition on 0; use-after-free.
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc3 #2
Hardware name: Marvell Armada 380/385 (Device Tree)
[...]
[<bf0ff354>] (esp_output_head [esp4]) from [<bf1006a4>] (esp_output+0xb8/0x180 [esp4])
[<bf1006a4>] (esp_output [esp4]) from [<c05dee64>] (xfrm_output_resume+0x558/0x664)
[<c05dee64>] (xfrm_output_resume) from [<c05d07b0>] (xfrm4_output+0x44/0xc4)
[<c05d07b0>] (xfrm4_output) from [<c05956bc>] (tcp_v4_send_synack+0xa8/0xe8)
[<c05956bc>] (tcp_v4_send_synack) from [<c0586ad8>] (tcp_conn_request+0x7f4/0x948)
[<c0586ad8>] (tcp_conn_request) from [<c058c404>] (tcp_rcv_state_process+0x2a0/0xe64)
[<c058c404>] (tcp_rcv_state_process) from [<c05958ac>] (tcp_v4_do_rcv+0xf0/0x1f4)
[<c05958ac>] (tcp_v4_do_rcv) from [<c0598a4c>] (tcp_v4_rcv+0xdb8/0xe20)
[<c0598a4c>] (tcp_v4_rcv) from [<c056eb74>] (ip_protocol_deliver_rcu+0x2c/0x2dc)
[<c056eb74>] (ip_protocol_deliver_rcu) from [<c056ee6c>] (ip_local_deliver_finish+0x48/0x54)
[<c056ee6c>] (ip_local_deliver_finish) from [<c056eecc>] (ip_local_deliver+0x54/0xec)
[<c056eecc>] (ip_local_deliver) from [<c056efac>] (ip_rcv+0x48/0xb8)
[<c056efac>] (ip_rcv) from [<c0519c2c>] (__netif_receive_skb_one_core+0x50/0x6c)
[...]
The issue triggers only when not using TCP syncookies, as for syncookies
no socket is associated.
Fixes: cac2661c53 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Use the new indirect call wrappers in IPVS when calling the TCP or UDP
protocol specific functions.
This avoids an indirect calls in IPVS, and reduces the performance
impact of the Spectre mitigation.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The function pointer ip_vs_protocol->csum_check is only used in protocol
specific code, and never in the generic one.
Remove the function pointer from struct ip_vs_protocol and call the
checksum functions directly.
This reduces the performance impact of the Spectre mitigation, and
should give a small improvement even with RETPOLINES disabled.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When CONFIG_PROC_FS isn't set the variable cn isn't used.
net/ipv4/netfilter/ipt_CLUSTERIP.c: In function ‘clusterip_net_exit’:
net/ipv4/netfilter/ipt_CLUSTERIP.c:849:24: warning: unused variable ‘cn’ [-Wunused-variable]
struct clusterip_net *cn = clusterip_pernet(net);
^~
Rework so the variable 'cn' is declared inside "#ifdef CONFIG_PROC_FS".
Fixes: b12f7bad5a ("netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When we check the tcp options of a packet and it doesn't match the current
fingerprint, the tcp packet option pointer must be restored to its initial
value in order to do the proper tcp options check for the next fingerprint.
Here we can see an example.
Assumming the following fingerprint base with two lines:
S10:64:1:60:M*,S,T,N,W6: Linux:3.0::Linux 3.0
S20:64:1:60:M*,S,T,N,W7: Linux:4.19:arch:Linux 4.1
Where TCP options are the last field in the OS signature, all of them overlap
except by the last one, ie. 'W6' versus 'W7'.
In case a packet for Linux 4.19 kicks in, the osf finds no matching because the
TCP options pointer is updated after checking for the TCP options in the first
line.
Therefore, reset pointer back to where it should be.
Fixes: 11eeef41d5 ("netfilter: passive OS fingerprint xtables match")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Unlike ip(6)tables ebtables only counts user-defined chains.
The effect is that a 32bit ebtables binary on a 64bit kernel can do
'ebtables -N FOO' only after adding at least one rule, else the request
fails with -EINVAL.
This is a similar fix as done in
3f1e53abff ("netfilter: ebtables: don't attempt to allocate 0-sized compat array").
Fixes: 7d7d7e0211 ("netfilter: compat: reject huge allocation requests")
Reported-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When the MC route socket is closed, mroute_clean_tables() is called to
cleanup existing routes. Mistakenly notifiers call was put on the cleanup
of the unresolved MC route entries cache.
In a case where the MC socket closes before an unresolved route expires,
the notifier call leads to a crash, caused by the driver trying to
increment a non initialized refcount_t object [1] and then when handling
is done, to decrement it [2]. This was detected by a test recently added in
commit 6d4efada3b ("selftests: forwarding: Add multicast routing test").
Fix that by putting notifiers call on the resolved entries traversal,
instead of on the unresolved entries traversal.
[1]
[ 245.748967] refcount_t: increment on 0; use-after-free.
[ 245.754829] WARNING: CPU: 3 PID: 3223 at lib/refcount.c:153 refcount_inc_checked+0x2b/0x30
...
[ 245.802357] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
[ 245.811873] RIP: 0010:refcount_inc_checked+0x2b/0x30
...
[ 245.907487] Call Trace:
[ 245.910231] mlxsw_sp_router_fib_event.cold.181+0x42/0x47 [mlxsw_spectrum]
[ 245.917913] notifier_call_chain+0x45/0x7
[ 245.922484] atomic_notifier_call_chain+0x15/0x20
[ 245.927729] call_fib_notifiers+0x15/0x30
[ 245.932205] mroute_clean_tables+0x372/0x3f
[ 245.936971] ip6mr_sk_done+0xb1/0xc0
[ 245.940960] ip6_mroute_setsockopt+0x1da/0x5f0
...
[2]
[ 246.128487] refcount_t: underflow; use-after-free.
[ 246.133859] WARNING: CPU: 0 PID: 7 at lib/refcount.c:187 refcount_sub_and_test_checked+0x4c/0x60
[ 246.183521] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
...
[ 246.193062] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fibmr_event_work [mlxsw_spectrum]
[ 246.202394] RIP: 0010:refcount_sub_and_test_checked+0x4c/0x60
...
[ 246.298889] Call Trace:
[ 246.301617] refcount_dec_and_test_checked+0x11/0x20
[ 246.307170] mlxsw_sp_router_fibmr_event_work.cold.196+0x47/0x78 [mlxsw_spectrum]
[ 246.315531] process_one_work+0x1fa/0x3f0
[ 246.320005] worker_thread+0x2f/0x3e0
[ 246.324083] kthread+0x118/0x130
[ 246.327683] ? wq_update_unbound_numa+0x1b0/0x1b0
[ 246.332926] ? kthread_park+0x80/0x80
[ 246.337013] ret_from_fork+0x1f/0x30
Fixes: 088aa3eec2 ("ip6mr: Support fib notifications")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Digging through the ioctls with Al because of the previous
patches, we found that on 64-bit decnet's dn_dev_ioctl()
is wrong, because struct ifreq::ifr_ifru is actually 24
bytes (not 16 as expected from struct sockaddr) due to the
ifru_map and ifru_settings members.
Clearly, decnet expects the ioctl to be called with a struct
like
struct ifreq_dn {
char ifr_name[IFNAMSIZ];
struct sockaddr_dn ifr_addr;
};
since it does
struct ifreq *ifr = ...;
struct sockaddr_dn *sdn = (struct sockaddr_dn *)&ifr->ifr_addr;
This means that DN_IFREQ_SIZE is too big for what it wants on
64-bit, as it is
sizeof(struct ifreq) - sizeof(struct sockaddr) +
sizeof(struct sockaddr_dn)
This assumes that sizeof(struct sockaddr) is the size of ifr_ifru
but that isn't true.
Fix this to use offsetof(struct ifreq, ifr_ifru).
This indeed doesn't really matter much - the result is that we
copy in/out 8 bytes more than we should on 64-bit platforms. In
case the "struct ifreq_dn" lands just on the end of a page though
it might lead to faults.
As far as I can tell, it has been like this forever, so it seems
very likely that nobody cares.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to be more confident about an on-going interactive session, we
increment pingpong count by 1 for every interactive transaction and we
adjust TCP_PINGPONG_THRESH to 3.
This means, we only consider a session in pingpong mode after we see 3
interactive transactions, and start to activate delayed acks in quick
ack mode.
And in order to not over-count the credits, we only increase pingpong
count for the first packet sent in response for the previous received
packet.
This is mainly to prevent delaying the ack immediately after some
handshake protocol but no real interactive traffic pattern afterwards.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using pingpong as a single bit information, we refactor the
code to treat it as a counter. When interactive session is detected,
we set pingpong count to TCP_PINGPONG_THRESH. And when pingpong count
is >= TCP_PINGPONG_THRESH, we consider the session in pingpong mode.
This patch is a pure refactor and sets foundation for the next patch.
This patch itself does not change any pingpong logic.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix blank line coding style issues, make the code cleaner.
Remove a redundant blank line in ip_rcv_core().
Insert a blank line in ip_rcv() between different statement blocks.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an internally generated frame is handled by rose_xmit(),
rose_route_frame() is called:
if (!rose_route_frame(skb, NULL)) {
dev_kfree_skb(skb);
stats->tx_errors++;
return NETDEV_TX_OK;
}
We have the same code sequence in Net/Rom where an internally generated
frame is handled by nr_xmit() calling nr_route_frame(skb, NULL).
However, in this function NULL argument is tested while it is not in
rose_route_frame().
Then kernel panic occurs later on when calling ax25cmp() with a NULL
ax25_cb argument as reported many times and recently with syzbot.
We need to test if ax25 is NULL before using it.
Testing:
Built kernel with CONFIG_ROSE=y.
Signed-off-by: Bernard Pidoux <f6bvp@free.fr>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+1a2c456a1ea08fa5b5f7@syzkaller.appspotmail.com
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Bernard Pidoux <f6bvp@free.fr>
Cc: linux-hams@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_reset_timer() and sk_stop_timer() properly handle
sock refcnt for timer function. Switching to them
could fix a refcounting bug reported by syzbot.
Reported-and-tested-by: syzbot+defa700d16f1bd1b9a05@syzkaller.appspotmail.com
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2019-01-25
1) Several patches to fix the fallout from the recent
tree based policy lookup work. From Florian Westphal.
2) Fix VTI for IPCOMP for 'not compressed' IPCOMP packets.
We need an extra IPIP handler to process these packets
correctly. From Su Yanjun.
3) Fix validation of template and selector families for
MODE_ROUTEOPTIMIZATION with ipv4-in-ipv6 packets.
This can lead to a stack-out-of-bounds because
flowi4 struct is treated as flowi6 struct.
Fix from Florian Westphal.
4) Restore the default behaviour of the xfrm set-mark
in the output path. This was changed accidentally
when mark setting was extended to the input path.
From Benedict Wong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Count ttl-dropped frames properly in mac80211, from Bob Copeland.
2) Integer overflow in ktime handling of bcm can code, from Oliver
Hartkopp.
3) Fix RX desc handling wrt. hw checksumming in ravb, from Simon
Horman.
4) Various hash key fixes in hv_netvsc, from Haiyang Zhang.
5) Use after free in ax25, from Eric Dumazet.
6) Several fixes to the SSN support in SCTP, from Xin Long.
7) Do not process frames after a NAPI reschedule in ibmveth, from
Thomas Falcon.
8) Fix NLA_POLICY_NESTED arguments, from Johannes Berg.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (42 commits)
qed: Revert error handling changes.
cfg80211: extend range deviation for DMG
cfg80211: reg: remove warn_on for a normal case
mac80211: Add attribute aligned(2) to struct 'action'
mac80211: don't initiate TDLS connection if station is not associated to AP
nl80211: fix NLA_POLICY_NESTED() arguments
ibmveth: Do not process frames after calling napi_reschedule
net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP
net: usb: asix: ax88772_bind return error when hw_reset fail
MAINTAINERS: Update cavium networking drivers
net/mlx4_core: Fix error handling when initializing CQ bufs in the driver
net/mlx4_core: Add masking for a few queries on HCA caps
sctp: set flow sport from saddr only when it's 0
sctp: set chunk transport correctly when it's a new asoc
sctp: improve the events for sctp stream adding
sctp: improve the events for sctp stream reset
ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel
ax25: fix possible use-after-free
sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe
hv_netvsc: fix typos in code comments
...
Refactor collect metatdata mode tunnel xmit to the generic xmit function
ip_md_tunnel_xmit. It makes codes more generic and support more feture
such as pmtu_update through ip_md_tunnel_xmit
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Init the gre_key from tuninfo->key.tun_id and init the mark
from the skb->mark, set the oif to zero in the collect metadata
mode.
Fixes: cfc7381b30 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tnl_update_pmtu in ip_md_tunnel_xmit to dynamic modify
the pmtu which packet send through collect_metadata mode
ip tunnel
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ip tunnel dst cache in ip_md_tunnel_xmit to make more
efficient for the route lookup.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Accept MSG_ZEROCOPY in all the TCP states that allow sendmsg. Remove
the explicit check for ESTABLISHED and CLOSE_WAIT states.
This requires correctly handling zerocopy state (uarg, sk_zckey) in
all paths reachable from other TCP states. Such as the EPIPE case
in sk_stream_wait_connect, which a sendmsg() in incorrect state will
now hit. Most paths are already safe.
Only extension needed is for TCP Fastopen active open. This can build
an skb with data in tcp_send_syn_data. Pass the uarg along with other
fastopen state, so that this skb also generates a zerocopy
notification on release.
Tested with active and passive tcp fastopen packetdrill scripts at
1747eef03d
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, IPv6 defragmentation code drops non-last fragments that
are smaller than 1280 bytes: see
commit 0ed4229b08 ("ipv6: defrag: drop non-last frags smaller than min mtu")
This behavior is not specified in IPv6 RFCs and appears to break
compatibility with some IPv6 implemenations, as reported here:
https://www.spinics.net/lists/netdev/msg543846.html
This patch re-uses common IP defragmentation queueing and reassembly
code in IP6 defragmentation in nf_conntrack, removing the 1280 byte
restriction.
Signed-off-by: Peter Oskolkov <posk@google.com>
Reported-by: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, IPv6 defragmentation code drops non-last fragments that
are smaller than 1280 bytes: see
commit 0ed4229b08 ("ipv6: defrag: drop non-last frags smaller than min mtu")
This behavior is not specified in IPv6 RFCs and appears to break
compatibility with some IPv6 implemenations, as reported here:
https://www.spinics.net/lists/netdev/msg543846.html
This patch re-uses common IP defragmentation queueing and reassembly
code in IPv6, removing the 1280 byte restriction.
Signed-off-by: Peter Oskolkov <posk@google.com>
Reported-by: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a refactoring patch: without changing runtime behavior,
it moves rbtree-related code from IPv4-specific files/functions
into .h/.c defrag files shared with IPv6 defragmentation code.
Signed-off-by: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Host drivers may offload authentication to the user space
through the commit ("cfg80211: Authentication offload to
user space in AP mode").
This interface can be used to implement SAE by having the
userspace do authentication/PMKID key derivation and driver
handle the association.
A step ahead, this interface can get further optimized if the
PMKID is passed to the host driver and also have it respond to
the association request by the STA on a valid PMKID.
This commit enables the userspace to pass the PMKID to the host
drivers through the set/del pmksa operations in AP mode.
Set/Del pmksa is now restricted to STA/P2P client mode only and
thus the drivers might not expect them in any other(AP) mode.
This commit also introduces a feature flag
NL80211_EXT_FEATURE_AP_PMKSA_CACHING (johannes: renamed) to
maintain the backward compatibility of such an expectation by
the host drivers. These operations are allowed in AP mode only
when the drivers advertize the capability through this flag.
Signed-off-by: Liangwei Dong <liangwei@codeaurora.org>
Signed-off-by: Srinivas Dasari <dasaris@codeaurora.org>
[rename flag to NL80211_EXT_FEATURE_AP_PMKSA_CACHING]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
commit 40cbfa9021 ("cfg80211/nl80211: Optional authentication
offload to userspace")' introduced authentication offload to user
space by the host drivers in station mode. This commit extends
the same for the AP mode too.
Extend NL80211_ATTR_EXTERNAL_AUTH_SUPPORT to also claim the
support of external authentication from the user space in AP mode.
A new flag parameter is introduced in cfg80211_ap_settings to
intend the same while "start ap".
Host driver to use NL80211_CMD_FRAME interface to transmit and
receive the authentication frames to / from the user space.
Host driver to indicate the flag NL80211_RXMGMT_FLAG_EXTERNAL_AUTH
while sending the authentication frame to the user space. This
intends to the user space that the driver wishes it to process
the authentication frame for certain protocols, though it had
initially advertised the support for SME functionality.
User space shall accordingly do the authentication and indicate
its final status through the command NL80211_CMD_EXTERNAL_AUTH.
Allow the command even if userspace doesn't include the attribute
NL80211_ATTR_SSID for AP interface.
Host driver shall continue with the association sequence and
indicate the STA connection status through cfg80211_new_sta.
To facilitate the host drivers in AP mode for matching the pmkid
by the stations during the association, NL80211_CMD_EXTERNAL_AUTH
is also enhanced to include the pmkid to drivers after
the authentication.
This pmkid can also be used in the STA mode to include in the
association request.
Also modify nl80211_external_auth to not mandate SSID in AP mode.
Signed-off-by: Srinivas Dasari <dasaris@codeaurora.org>
[remove useless nla_get_flag() usage]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* avoid trying to operate TDLS when not connection,
this is not valid and led to issues
* count TTL-dropped frames in mesh better
* deal with new WiGig channels in regulatory code
* remove a WARN_ON() that can trigger due to benign
races during device/driver registration
* fix nested netlink policy maxattrs (syzkaller)
* fix hwsim n_limits (syzkaller)
* propagate __aligned(2) to a surrounding struct
* return proper error in virt_wifi error path
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlxLDJoACgkQB8qZga/f
l8SgpQ//W9ZulmDuuuR2qBijFP3JfAZQyruAX+D4Ddp/dGdJWcRLOXhz7U/IhvfM
wuak6e7LQvJnlPDhbkwpJDyQXeva7OmN5j0JNcg4MjIszkPPATz8GctdQfcIAzKg
pxhx6p8tpUUTQdDv87u4rNHrLoa+nyx8GKBqk7Ec0FeOt3LOtp8vOv+S7XNYJlHG
J28DiU3bBWBusumfZ1hqwAcrx3NN3vHylc9WFcQjZPPJ/o9ygPxlpdbkle9XUaNu
wFFDB9hQw4cSuLCR1/aZ4Ixf1ZFX5BG76iQAkwfiIDPgl0ViXq38Nebd4d8bM3l6
dUEhIYVHpXzfz5EbpSGp5sNCqajXQ+KKmqq7QhOC8PKafCZ56FeqQWpQ4ZTOHMEs
AGFxnXWp6TOc/MdJR/bB+JELVoOWkn9K146/5BkiIc8z4Ca7yz7fF23KIw3PVi4M
Ucy6DknPwq60ytn6Mfaxc3XnQlmsJ4UbMNZ9EhL94c9tiWJt4Abm3Xk52on/AA9u
1sXeia+85V2xMyd0P3GStSl3gxoHVikQ10/0BbHtbJTlTAkl3BP1ytZiVCOCOqFs
o16A59U8V9Ilt9ZvgN9wOQ2ckPnFi8RjLZRZQwwrmVCaFIeQ0BtT6FErpml3H47x
fODWB0DZ2HLbbalaRjKEP/DXr2vZu9UT33cJILjCvm5C4Kvae3Y=
=uCO9
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2019-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just a few small fixes:
* avoid trying to operate TDLS when not connection,
this is not valid and led to issues
* count TTL-dropped frames in mesh better
* deal with new WiGig channels in regulatory code
* remove a WARN_ON() that can trigger due to benign
races during device/driver registration
* fix nested netlink policy maxattrs (syzkaller)
* fix hwsim n_limits (syzkaller)
* propagate __aligned(2) to a surrounding struct
* return proper error in virt_wifi error path
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There was no such capability advertisement from the driver and thus the
current user space has to assume the driver to support all the AKMs. While
that may be the case with some drivers (e.g., mac80211-based ones), there
are cfg80211-based drivers that implement SME and have constraints on
which AKMs can be supported (e.g., such drivers may need an update to
support SAE AKM using NL80211_CMD_EXTERNAL_AUTH). Allow such drivers to
advertise the exact set of supported AKMs so that user space tools can
determine what network profile options should be allowed to be configured.
Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org>
[pmsr data might be big, start a new netlink message section]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently Self Managed WIPHY's are not notified on any
hints other than user cell base station hints.
Self Managed wiphy's basically rely on hints from firmware
and its local regdb for regulatory management, so hints from wireless
core can be ignored. But all user hints needs to be notified
to them to provide flexibility to these drivers to honour or
ignore these user hints.
Currently none of the drivers supporting self managed wiphy
register a notifier with cfg80211. Hence this change does not affect
any other driver behavior.
Signed-off-by: Sriram R <srirrama@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>