This patch is to move nf_ct_netns_get() out of nf_conncount_init()
and let the consumers of nf_conncount decide if they want to turn
on netfilter conntrack.
It makes nf_conncount more flexible to be used in other places and
avoids netfilter conntrack turned on when using it in openvswitch
conntrack.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Similar to commit 70f06c115b ("sched: act_ct: switch to per-action
label counting"), we should also switch to per-action label counting
in openvswitch conntrack, as Florian suggested.
The difference is that nf_connlabels_get() is called unconditionally
when creating an ct action in ovs_ct_copy_action(). As with these
flows:
table=0,ip,actions=ct(commit,table=1)
table=1,ip,actions=ct(commit,exec(set_field:0xac->ct_label),table=2)
it needs to make sure the label ext is created in the 1st flow before
the ct is committed in ovs_ct_commit(). Otherwise, the warning in
nf_ct_ext_add() when creating the label ext in the 2nd flow will
be triggered:
WARN_ON(nf_ct_is_confirmed(ct));
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/6b9347d5c1a0b364e88d900b29a616c3f8e5b1ca.1723483073.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At this time, conntrack either returns NF_ACCEPT or NF_DROP.
To improve debuging it would be nice to be able to replace NF_DROP
verdict with NF_DROP_REASON() helper,
This helper releases the skb instantly (so drop_monitor can pinpoint
precise location) and returns NF_STOLEN.
Prepare call sites to deal with this before introducing such changes
in conntrack and nat core.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Aaron Conole <aconole@redhat.om>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya found a failure in running check-kernel tests with at_groups=144
(144: conntrack - FTP SNAT orig tuple) in OVS repo. After his further
investigation, the root cause is that the labels sent to userspace
for related ct are incorrect.
The labels for unconfirmed related ct should use its master's labels.
However, the changes made in commit 8c8b733208 ("openvswitch: set
IPS_CONFIRMED in tmpl status only when commit is set in conntrack")
led to getting labels from this related ct.
So fix it in ovs_ct_get_labels() by changing to copy labels from its
master ct if it is a unconfirmed related ct. Note that there is no
fix needed for ct->mark, as it was already copied from its master
ct for related ct in init_conntrack().
Fixes: 8c8b733208 ("openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Tested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since kfree_rcu, which is called in the hlist_for_each_entry_rcu traversal
of ovs_ct_limit_exit, is not part of the RCU read critical section, it
is possible that the RCU grace period will pass during the traversal and
the key will be free.
To prevent this, it should be changed to hlist_for_each_entry_safe.
Fixes: 11efd5cb04 ("openvswitch: Support conntrack zone limit")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/ZiYvzQN/Ry5oeFQW@v4bel-B760M-AORUS-ELITE-AX
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On startup, ovs-vswitchd probes different datapath features including
support for timeout policies. While probing, it tries to execute
certain operations with OVS_PACKET_ATTR_PROBE or OVS_FLOW_ATTR_PROBE
attributes set. These attributes tell the openvswitch module to not
log any errors when they occur as it is expected that some of the
probes will fail.
For some reason, setting the timeout policy ignores the PROBE attribute
and logs a failure anyway. This is causing the following kernel log
on each re-start of ovs-vswitchd:
kernel: Failed to associated timeout policy `ovs_test_tp'
Fix that by using the same logging macro that all other messages are
using. The message will still be printed at info level when needed
and will be rate limited, but with a net rate limiter instead of
generic printk one.
The nf_ct_set_timeout() itself will still print some info messages,
but at least this change makes logging in openvswitch module more
consistent.
Fixes: 06bd2bdf19 ("openvswitch: Add timeout support to ct action")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20240403203803.2137962-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Referenced commit doesn't always set iifidx when offloading the flow to
hardware. Fix the following cases:
- nf_conn_act_ct_ext_fill() is called before extension is created with
nf_conn_act_ct_ext_add() in tcf_ct_act(). This can cause rule offload with
unspecified iifidx when connection is offloaded after only single
original-direction packet has been processed by tc data path. Always fill
the new nf_conn_act_ct_ext instance after creating it in
nf_conn_act_ct_ext_add().
- Offloading of unidirectional UDP NEW connections is now supported, but ct
flow iifidx field is not updated when connection is promoted to
bidirectional which can result reply-direction iifidx to be zero when
refreshing the connection. Fill in the extension and update flow iifidx
before calling flow_offload_refresh().
Fixes: 9795ded7f9 ("net/sched: act_ct: Fill offloading tuple iifidx")
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: 6a9bad0069 ("net/sched: act_ct: offload UDP NEW connections")
Link: https://lore.kernel.org/r/20231103151410.764271-1-vladbu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Only three families use info->userhdr today and going forward
we discourage using fixed headers in new families.
So having the pointer to user header in struct genl_info
is an overkill. Compute the header pointer at runtime.
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20230814214723.2924989-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use drop reasons from include/net/dropreason-core.h when a reasonable
candidate exists.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed
from the hashtable when lookup, we can simplify the exp processing code a
lot in openvswitch conntrack.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Now handle_fragments() in OVS and TC have the similar code, and
this patch removes the duplicate code by moving the function
to nf_conntrack_ovs.
Note that skb_clear_hash(skb) or skb->ignore_df = 1 should be
done only when defrag returns 0, as it does in other places
in kernel.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch has no functional changes and just moves key and ovs_cb update
out of handle_fragments, and skb_clear_hash() and skb->ignore_df change
into handle_fragments(), to make it easier to move the duplicate code
from handle_fragments() into nf_conntrack_ovs later.
Note that it changes to pass info->family to handle_fragments() instead
of key for the packet type check, as info->family is set according to
key->eth.type in ovs_ct_copy_action() when creating the action.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are almost the same code in ovs_skb_network_trim() and
tcf_ct_skb_network_trim(), this patch extracts them into a function
nf_ct_skb_network_trim() and moves the function to nf_conntrack_ovs.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use string_is_terminated() helper instead of cpecific memchr() call.
This shows better the intention of the call.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230208133153.22528-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IPv4 GSO packets may get processed in ovs_skb_network_trim(),
and we need to use skb_ip_totlen() to get iph totlen.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are two nat functions are nearly the same in both OVS and
TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().
This patch creates nf_nat_ovs.c under netfilter and moves them
there then exports nf_ct_nat() so that it can be shared by both
OVS and TC, and keeps the nat (type) check and nat flag update
in OVS and TC's own place, as these parts are different between
OVS and TC.
Note that in OVS nat function it was using skb->protocol to get
the proto as it already skips vlans in key_extract(), while it
doesn't in TC, and TC has to call skb_protocol() to get proto.
So in nf_ct_nat_execute(), we keep using skb_protocol() which
works for both OVS and TC contrack.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When it fails to allocate nat ext, the packet should be dropped, like
the memory allocation failures in other places in ovs_ct_nat().
This patch changes to return NF_DROP when fails to add nat ext before
doing NAT in ovs_ct_nat(), also it would keep consistent with tc
action ct' processing in tcf_ct_act_nat().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Either OVS_CT_SRC_NAT or OVS_CT_DST_NAT is set, OVS_CT_NAT must be
set in info->nat. Thus, if OVS_CT_NAT is not set in info->nat, it
will definitely not do NAT but returns NF_ACCEPT in ovs_ct_nat().
This patch changes nothing funcational but only makes this return
earlier in ovs_ct_nat() to keep consistent with TC's processing
in tcf_ct_act_nat().
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The calls to ovs_ct_nat_execute() are as below:
ovs_ct_execute()
ovs_ct_lookup()
__ovs_ct_lookup()
ovs_ct_nat()
ovs_ct_nat_execute()
ovs_ct_commit()
__ovs_ct_lookup()
ovs_ct_nat()
ovs_ct_nat_execute()
and since skb_pull_rcsum() and skb_push_rcsum() are already
called in ovs_ct_execute(), there's no need to do it again
in ovs_ct_nat_execute().
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_conn:mark can be read from and written to in parallel. Use
READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted
compiler optimizations.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Move ovs_ct_add_helper from openvswitch to nf_conntrack_helper and
rename as nf_ct_add_helper, so that it can be used in TC act_ct in
the next patch.
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Move ovs_ct_helper from openvswitch to nf_conntrack_helper and rename
as nf_ct_helper so that it can be used in TC act_ct in the next patch.
Note that it also adds the checks for the family and proto, as in TC
act_ct, the packets with correct family and proto are not guaranteed.
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
A WARN_ON call trace would be triggered when 'ct(commit, alg=helper)'
applies on a confirmed connection:
WARNING: CPU: 0 PID: 1251 at net/netfilter/nf_conntrack_extend.c:98
RIP: 0010:nf_ct_ext_add+0x12d/0x150 [nf_conntrack]
Call Trace:
<TASK>
nf_ct_helper_ext_add+0x12/0x60 [nf_conntrack]
__nf_ct_try_assign_helper+0xc4/0x160 [nf_conntrack]
__ovs_ct_lookup+0x72e/0x780 [openvswitch]
ovs_ct_execute+0x1d8/0x920 [openvswitch]
do_execute_actions+0x4e6/0xb60 [openvswitch]
ovs_execute_actions+0x60/0x140 [openvswitch]
ovs_packet_cmd_execute+0x2ad/0x310 [openvswitch]
genl_family_rcv_msg_doit.isra.15+0x113/0x150
genl_rcv_msg+0xef/0x1f0
which can be reproduced with these OVS flows:
table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk
actions=ct(commit, table=1)
table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new
actions=ct(commit, alg=ftp),normal
The issue was introduced by commit 248d45f1e1 ("openvswitch: Allow
attaching helper in later commit") where it somehow removed the check
of nf_ct_is_confirmed before asigning the helper. This patch is to fix
it by bringing it back.
Fixes: 248d45f1e1 ("openvswitch: Allow attaching helper in later commit")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/c5c9092a22a2194650222bffaf786902613deb16.1665085502.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similar to the previous commit, the Netlink interface of the OVS
conntrack module was restricted to global CAP_NET_ADMIN by using
GENL_ADMIN_PERM. This is changed to GENL_UNS_ADMIN_PERM to support
unprivileged containers in non-initial user namespace.
Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We had historically not checked that genlmsghdr.reserved
is 0 on input which prevents us from using those precious
bytes in the future.
One use case would be to extend the cmd field, which is
currently just 8 bits wide and 256 is not a lot of commands
for some core families.
To make sure that new families do the right thing by default
put the onus of opting out of validation on existing families.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
Signed-off-by: David S. Miller <davem@davemloft.net>
If packet headers changed, the cached nfct is no longer relevant
for the packet and attempt to re-use it leads to the incorrect packet
classification.
This issue is causing broken connectivity in OpenStack deployments
with OVS/OVN due to hairpin traffic being unexpectedly dropped.
The setup has datapath flows with several conntrack actions and tuple
changes between them:
actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
ct(zone=8),recirc(0x4)
After the first ct() action the packet headers are almost fully
re-written. The next ct() tries to re-use the existing nfct entry
and marks the packet as invalid, so it gets dropped later in the
pipeline.
Clearing the cached conntrack entry whenever packet tuple is changed
to avoid the issue.
The flow key should not be cleared though, because we should still
be able to match on the ct_state if the recirculation happens after
the tuple change but before the next ct() action.
Cc: stable@vger.kernel.org
Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Reported-by: Frode Nordahl <frode.nordahl@canonical.com>
Link: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-May/051829.html
Link: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://lore.kernel.org/r/20220606221140.488984-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During NAT, a tuple collision may occur. When this happens, openvswitch
will make a second pass through NAT which will perform additional packet
modification. This will update the skb data, but not the flow key that
OVS uses. This means that future flow lookups, and packet matches will
have incorrect data. This has been supported since
5d50aa83e2 ("openvswitch: support asymmetric conntrack").
That commit failed to properly update the sw_flow_key attributes, since
it only called the ovs_ct_nat_update_key once, rather than each time
ovs_ct_nat_execute was called. As these two operations are linked, the
ovs_ct_nat_execute() function should always make sure that the
sw_flow_key is updated after a successful call through NAT infrastructure.
Fixes: 5d50aa83e2 ("openvswitch: support asymmetric conntrack")
Cc: Dumitru Ceara <dceara@redhat.com>
Cc: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20220318124319.3056455-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next. This
includes one patch to update ovs and act_ct to use nf_ct_put() instead
of nf_conntrack_put().
1) Add netns_tracker to nfnetlink_log and masquerade, from Eric Dumazet.
2) Remove redundant rcu read-size lock in nf_tables packet path.
3) Replace BUG() by WARN_ON_ONCE() in nft_payload.
4) Consolidate rule verdict tracing.
5) Replace WARN_ON() by WARN_ON_ONCE() in nf_tables core.
6) Make counter support built-in in nf_tables.
7) Add new field to conntrack object to identify locally generated
traffic, from Florian Westphal.
8) Prevent NAT from shadowing well-known ports, from Florian Westphal.
9) Merge nf_flow_table_{ipv4,ipv6} into nf_flow_table_inet, also from
Florian.
10) Remove redundant pointer in nft_pipapo AVX2 support, from Colin Ian King.
11) Replace opencoded max() in conntrack, from Jiapeng Chong.
12) Update conntrack to use refcount_t API, from Florian Westphal.
13) Move ip_ct_attach indirection into the nf_ct_hook structure.
14) Constify several pointer object in the netfilter codebase,
from Florian Westphal.
15) Tree-wide replacement of nf_conntrack_put() by nf_ct_put(), also
from Florian.
16) Fix egress splat due to incorrect rcu notation, from Florian.
17) Move stateful fields of connlimit, last, quota, numgen and limit
out of the expression data area.
18) Build a blob to represent the ruleset in nf_tables, this is a
requirement of the new register tracking infrastructure.
19) Add NFT_REG32_NUM to define the maximum number of 32-bit registers.
20) Add register tracking infrastructure to skip redundant
store-to-register operations, this includes support for payload,
meta and bitwise expresssions.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: (32 commits)
netfilter: nft_meta: cancel register tracking after meta update
netfilter: nft_payload: cancel register tracking after payload update
netfilter: nft_bitwise: track register operations
netfilter: nft_meta: track register operations
netfilter: nft_payload: track register operations
netfilter: nf_tables: add register tracking infrastructure
netfilter: nf_tables: add NFT_REG32_NUM
netfilter: nf_tables: add rule blob layout
netfilter: nft_limit: move stateful fields out of expression data
netfilter: nft_limit: rename stateful structure
netfilter: nft_numgen: move stateful fields out of expression data
netfilter: nft_quota: move stateful fields out of expression data
netfilter: nft_last: move stateful fields out of expression data
netfilter: nft_connlimit: move stateful fields out of expression data
netfilter: egress: avoid a lockdep splat
net: prefer nf_ct_put instead of nf_conntrack_put
netfilter: conntrack: avoid useless indirection during conntrack destruction
netfilter: make function op structures const
netfilter: core: move ip_ct_attach indirection to struct nf_ct_hook
netfilter: conntrack: convert to refcount_t api
...
====================
Link: https://lore.kernel.org/r/20220109231640.104123-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Its the same as nf_conntrack_put(), but without the
need for an indirect call. The downside is a module dependency on
nf_conntrack, but all of these already depend on conntrack anyway.
Cc: Paul Blakey <paulb@mellanox.com>
Cc: dev@openvswitch.org
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Convert nf_conn reference counting from atomic_t to refcount_t based api.
refcount_t api provides more runtime sanity checks and will warn on
certain constructs, e.g. refcount_inc() on a zero reference count, which
usually indicates use-after-free.
For this reason template allocation is changed to init the refcount to
1, the subsequenct add operations are removed.
Likewise, init_conntrack() is changed to set the initial refcount to 1
instead refcount_inc().
This is safe because the new entry is not (yet) visible to other cpus.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
To give drivers the originating device information for optimized
connection tracking offload, fill in act ct extension with
ifindex from skb.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need add 'if (skb_nfct(skb))' assignment, the
nf_conntrack_put() would check it.
Signed-off-by: Yejune Deng <yejunedeng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
'struct ovs_zone_limit' has more members than initialized in
ovs_ct_limit_get_default_limit(). The rest of the memory is a random
kernel stack content that ends up being sent to userspace.
Fix that by using designated initializer that will clear all
non-specified fields.
Fixes: 11efd5cb04 ("openvswitch: Support conntrack zone limit")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'skb_push()'/'skb_postpush_rcsum()' can be replaced by an equivalent
'skb_push_rcsum()' which is less verbose.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
When openvswitch conntrack offload with act_ct action. The first rule
do conntrack in the act_ct in tc subsystem. And miss the next rule in
the tc and fallback to the ovs datapath but miss set post_ct flag
which will lead the ct_state_key with -trk flag.
Fixes: 7baf2429a1 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify the return expression.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no easy way to distinguish if a conntracked tcp packet is
marked invalid because of tcp_in_window() check error or because
it doesn't belong to an existing connection. With this patch,
openvswitch sets liberal tcp flag for the established sessions so
that out of window packets are not marked invalid.
A helper function - nf_ct_set_tcp_be_liberal(nf_conn) is added which
sets this flag for both the directions of the nf_conn.
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20201116130126.3065077-1-nusiddiq@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Small conflict around locking in rxrpc_process_event() -
channel_lock moved to bundle in next, while state lock
needs _bh() from net.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With multiple DNAT rules it's possible that after destination
translation the resulting tuples collide.
For example, two openvswitch flows:
nw_dst=10.0.0.10,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
nw_dst=10.0.0.20,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
Assuming two TCP clients initiating the following connections:
10.0.0.10:5000->10.0.0.10:10
10.0.0.10:5000->10.0.0.20:10
Both tuples would translate to 10.0.0.10:5000->20.0.0.1:20 causing
nf_conntrack_confirm() to fail because of tuple collision.
Netfilter handles this case by allocating a null binding for SNAT at
egress by default. Perform the same operation in openvswitch for DNAT
if no explicit SNAT is requested by the user and allocate a null binding
for SNAT for packets in the "original" direction.
Reported-at: https://bugzilla.redhat.com/1877128
Suggested-by: Florian Westphal <fw@strlen.de>
Fixes: 05752523e5 ("openvswitch: Interface with NAT.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The only usage of these is to assign their address to the small_ops field
in the genl_family struct, which is a const pointer, and applying
ARRAY_SIZE() on them. Make them const to allow the compiler to put them
in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bulk of the genetlink users can use smaller ops, move them.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the function ovs_ct_limit_exit, there is already a helper vaibale
which could be reused to improve the readability, so i fix it in this
patch.
Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ovs_ct_put_key() is potentially copying uninitialized kernel stack memory
into socket buffers, since the compiler may leave a 3-byte hole at the end
of `struct ovs_key_ct_tuple_ipv4` and `struct ovs_key_ct_tuple_ipv6`. Fix
it by initializing `orig` with memset().
Fixes: 9dd7f8907c ("openvswitch: Add original direction conntrack tuple to sw_flow_key.")
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The openvswitch module shares a common conntrack and NAT infrastructure
exposed via netfilter. It's possible that a packet needs both SNAT and
DNAT manipulation, due to e.g. tuple collision. Netfilter can support
this because it runs through the NAT table twice - once on ingress and
again after egress. The openvswitch module doesn't have such capability.
Like netfilter hook infrastructure, we should run through NAT twice to
keep the symmetry.
Fixes: 05752523e5 ("openvswitch: Interface with NAT.")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows to attach conntrack helper to a confirmed conntrack
entry. Currently, we can only attach alg helper to a conntrack entry
when it is in the unconfirmed state. This patch enables an use case
that we can firstly commit a conntrack entry after it passed some
initial conditions. After that the processing pipeline will further
check a couple of packets to determine if the connection belongs to
a particular application, and attach alg helper to the connection
in a later stage.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When IP fragments are reassembled before being sent to conntrack, the
key from the last fragment is used. Unless there are reordering
issues, the last fragment received will not contain the L4 ports, so the
key for the reassembled datagram won't contain them. This patch updates
the key once we have a reassembled datagram.
The handle_fragments() function works on L3 headers so we pull the L3/L4
flow key update code from key_extract into a new function
'key_extract_l3l4'. Then we add a another new function
ovs_flow_key_update_l3l4() and export it so that it is accessible by
handle_fragments() for conntrack packet reassembly.
Co-authored-by: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>