Commit Graph

1091 Commits

Author SHA1 Message Date
Florian Westphal
cefa31a9d4 netfilter: nft_log: perform module load from nf_tables
modprobe calls from the nf_logger_find_get() API causes deadlock in very
special cases because they occur with the nf_tables transaction mutex held.

In the specific case of nf_log, deadlock is via:

 A nf_tables -> transaction mutex -> nft_log -> modprobe -> nf_log_syslog \
	    -> pernet_ops rwsem -> wait for C
 B netlink event -> rtnl_mutex -> nf_tables transaction mutex -> wait for A
 C close() -> ip6mr_sk_done -> rtnl_mutex -> wait for B

Earlier patch added NFLOG/xt_LOG module softdeps to avoid the need to load
the backend module during a transaction.

For nft_log we would have to add a softdep for both nfnetlink_log or
nf_log_syslog, since we do not know in advance which of the two backends
are going to be configured.

This defers the modprobe op until after the transaction mutex is released.

Tested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 22:34:11 +02:00
Florian Westphal
e465cccd0b netfilter: nf_log_common: merge with nf_log_syslog
Remove nf_log_common.  Now that all per-af modules have been merged
there is no longer a need to provide a helper module.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 22:34:10 +02:00
Florian Westphal
77ccee96a6 netfilter: nf_log_bridge: merge with nf_log_syslog
Provide bridge log support from nf_log_syslog.

After the merge there is no need to load the "real packet loggers",
all of them now reside in the same module.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 22:34:05 +02:00
David S. Miller
efd13b71a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 15:31:22 -07:00
Felix Fietkau
26267bf9bb netfilter: flowtable: bridge vlan hardware offload and switchdev
The switch might have already added the VLAN tag through PVID hardware
offload. Keep this extra VLAN in the flowtable but skip it on egress.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
73f97025a9 netfilter: nft_flow_offload: use direct xmit if hardware offload is enabled
If there is a forward path to reach an ethernet device and hardware
offload is enabled, then use the direct xmit path.

Moreover, store the real device in the direct xmit path info since
software datapath uses dev_hard_header() to push the layer encapsulation
headers while hardware offload refers to the real device.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
4cd91f7c29 netfilter: flowtable: add vlan support
Add the vlan id and protocol to the flow tuple to uniquely identify
flows from the receive path. For the transmit path, dev_hard_header() on
the vlan device push the headers. This patch includes support for two
vlan headers (QinQ) from the ingress path.

Add a generic encap field to the flowtable entry which stores the
protocol and the tag id. This allows to reuse these fields in the PPPoE
support coming in a later patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
7a27f6ab41 netfilter: flowtable: use dev_fill_forward_path() to obtain egress device
The egress device in the tuple is obtained from route. Use
dev_fill_forward_path() instead to provide the real egress device for
this flow whenever this is available.

The new FLOW_OFFLOAD_XMIT_DIRECT type uses dev_queue_xmit() to transmit
ethernet frames. Cache the source and destination hardware address to
use dev_queue_xmit() to transfer packets.

The FLOW_OFFLOAD_XMIT_DIRECT replaces FLOW_OFFLOAD_XMIT_NEIGH if
dev_fill_forward_path() finds a direct transmit path.

In case of topology updates, if peer is moved to different bridge port,
the connection will time out, reconnect will result in a new entry with
the correct path. Snooping fdb updates would allow for cleaning up stale
flowtable entries.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
c63a7cc4d7 netfilter: flowtable: use dev_fill_forward_path() to obtain ingress device
Obtain the ingress device in the tuple from the route in the reply
direction. Use dev_fill_forward_path() instead to get the real ingress
device for this flow.

Fall back to use the ingress device that the IP forwarding route
provides if:

- dev_fill_forward_path() finds no real ingress device.
- the ingress device that is obtained is not part of the flowtable
  devices.
- this route has a xfrm policy.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
5139c0c007 netfilter: flowtable: add xmit path types
Add the xmit_type field that defines the two supported xmit paths in the
flowtable data plane, which are the neighbour and the xfrm xmit paths.
This patch prepares for new flowtable xmit path types to come.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
0ce7cf4127 netfilter: nftables: update table flags from the commit phase
Do not update table flags from the preparation phase. Store the flags
update into the transaction, then update the flags from the commit
phase.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 01:35:39 +01:00
Pablo Neira Ayuso
f4401262b9 netfilter: flowtable: fast NAT functions never fail
Simplify existing fast NAT routines by returning void. After the
skb_try_make_writable() call consolidation, these routines cannot ever
fail.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:44:00 +01:00
Pablo Neira Ayuso
4f08f173d0 netfilter: flowtable: move FLOW_OFFLOAD_DIR_MAX away from enumeration
This allows to remove the default case which should not ever happen and
that was added to avoid gcc warnings on unhandled FLOW_OFFLOAD_DIR_MAX
enumeration case.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:44:00 +01:00
YueHaibing
d4a96be654 netfilter: conntrack: Remove unused variable declaration
commit e97c3e278e ("tproxy: split off ipv6 defragmentation to a separate
module") left behind this.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:34:19 +01:00
Pablo Neira Ayuso
7b35582cd0 netfilter: nftables: allow to update flowtable flags
Honor flowtable flags from the control update path. Disallow disabling
to toggle hardware offload support though.

Fixes: 8bb69f3b29 ("netfilter: nf_tables: add flowtable offload control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:32:21 +01:00
Pablo Neira Ayuso
6001a930ce netfilter: nftables: introduce table ownership
A userspace daemon like firewalld might need to monitor for netlink
updates to detect its ruleset removal by the (global) flush ruleset
command to ensure ruleset persistency. This adds extra complexity from
userspace and, for some little time, the firewall policy is not in
place.

This patch adds the NFT_TABLE_F_OWNER flag which allows a userspace
program to own the table that creates in exclusivity.

Tables that are owned...

- can only be updated and removed by the owner, non-owners hit EPERM if
  they try to update it or remove it.
- are destroyed when the owner closes the netlink socket or the process
  is gone (implicit netlink socket closure).
- are skipped by the global flush ruleset command.
- are listed in the global ruleset.

The userspace process that sets on the NFT_TABLE_F_OWNER flag need to
leave open the netlink socket.

A new NFTA_TABLE_OWNER netlink attribute specifies the netlink port ID
to identify the owner from userspace.

This patch also updates error reporting when an unknown table flag is
specified to change it from EINVAL to EOPNOTSUPP given that EINVAL is
usually reserved to report for malformed netlink messages to userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-15 18:17:15 +01:00
Jakub Kicinski
c273a20c30 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

1) Remove indirection and use nf_ct_get() instead from nfnetlink_log
   and nfnetlink_queue, from Florian Westphal.

2) Add weighted random twos choice least-connection scheduling for IPVS,
   from Darby Payne.

3) Add a __hash placeholder in the flow tuple structure to identify
   the field to be included in the rhashtable key hash calculation.

4) Add a new nft_parse_register_load() and nft_parse_register_store()
   to consolidate register load and store in the core.

5) Statify nft_parse_register() since it has no more module clients.

6) Remove redundant assignment in nft_cmp, from Colin Ian King.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
  netfilter: nftables: remove redundant assignment of variable err
  netfilter: nftables: statify nft_parse_register()
  netfilter: nftables: add nft_parse_register_store() and use it
  netfilter: nftables: add nft_parse_register_load() and use it
  netfilter: flowtable: add hash offset field to tuple
  ipvs: add weighted random twos choice algorithm
  netfilter: ctnetlink: remove get_ct indirection
====================

Link: https://lore.kernel.org/r/20210206015005.23037-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 15:34:23 -08:00
Pablo Neira Ayuso
08a01c11a5 netfilter: nftables: statify nft_parse_register()
This function is not used anymore by any extension, statify it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-01-27 23:16:04 +01:00
Pablo Neira Ayuso
345023b0db netfilter: nftables: add nft_parse_register_store() and use it
This new function combines the netlink register attribute parser
and the store validation function.

This update requires to replace:

        enum nft_registers      dreg:8;

in many of the expression private areas otherwise compiler complains
with:

        error: cannot take address of bit-field ‘dreg’

when passing the register field as reference.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-01-27 23:16:02 +01:00
Pablo Neira Ayuso
4f16d25c68 netfilter: nftables: add nft_parse_register_load() and use it
This new function combines the netlink register attribute parser
and the load validation function.

This update requires to replace:

	enum nft_registers      sreg:8;

in many of the expression private areas otherwise compiler complains
with:

	error: cannot take address of bit-field ‘sreg’

when passing the register field as reference.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-01-27 22:53:29 +01:00
Pablo Neira Ayuso
dbc859d96f netfilter: flowtable: add hash offset field to tuple
Add a placeholder field to calculate hash tuple offset. Similar to
2c407aca64 ("netfilter: conntrack: avoid gcc-10 zero-length-bounds
warning").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-01-26 01:10:07 +01:00
Pablo Neira Ayuso
fca05d4d61 netfilter: nft_dynset: honor stateful expressions in set definition
If the set definition contains stateful expressions, allocate them for
the newly added entries from the packet path.

Fixes: 65038428b2 ("netfilter: nf_tables: allow to specify stateful expression in set definition")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-01-16 18:57:26 +01:00
Jakub Kicinski
7bca5021a4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

1) Missing dependencies in NFT_BRIDGE_REJECT, from Randy Dunlap.

2) Use atomic_inc_return() instead of atomic_add_return() in IPVS,
   from Yejune Deng.

3) Simplify check for overquota in xt_nfacct, from Kaixu Xia.

4) Move nfnl_acct_list away from struct net, from Miao Wang.

5) Pass actual sk in reject actions, from Jan Engelhardt.

6) Add timeout and protoinfo to ctnetlink destroy events,
   from Florian Westphal.

7) Four patches to generalize set infrastructure to support
   for multiple expressions per set element.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
  netfilter: nftables: netlink support for several set element expressions
  netfilter: nftables: generalize set extension to support for several expressions
  netfilter: nftables: move nft_expr before nft_set
  netfilter: nftables: generalize set expressions support
  netfilter: ctnetlink: add timeout and protoinfo to destroy events
  netfilter: use actual socket sk for REJECT action
  netfilter: nfnl_acct: remove data from struct net
  netfilter: Remove unnecessary conversion to bool
  ipvs: replace atomic_add_return()
  netfilter: nft_reject_bridge: fix build errors due to code movement
====================

Link: https://lore.kernel.org/r/20201212230513.3465-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 15:43:21 -08:00
Pablo Neira Ayuso
563125a73a netfilter: nftables: generalize set extension to support for several expressions
This patch replaces NFT_SET_EXPR by NFT_SET_EXT_EXPRESSIONS. This new
extension allows to attach several expressions to one set element (not
only one single expression as NFT_SET_EXPR provides). This patch
prepares for support for several expressions per set element in the
netlink userspace API.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-12 19:20:24 +01:00
Pablo Neira Ayuso
92b211a289 netfilter: nftables: move nft_expr before nft_set
Move the nft_expr structure definition before nft_set. Expressions are
used by rules and sets, remove unnecessary forward declarations. This
comes as preparation to support for multiple expressions per set element.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-12 11:44:42 +01:00
Pablo Neira Ayuso
8cfd9b0f85 netfilter: nftables: generalize set expressions support
Currently, the set infrastucture allows for one single expressions per
element. This patch extends the existing infrastructure to allow for up
to two expressions. This is not updating the netlink API yet, this is
coming as an initial preparation patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-12 11:44:42 +01:00
Florian Westphal
86d21fc747 netfilter: ctnetlink: add timeout and protoinfo to destroy events
DESTROY events do not include the remaining timeout.

Add the timeout if the entry was removed explicitly. This can happen
when a conntrack gets deleted prematurely, e.g. due to a tcp reset,
module removal, netdev notifier (nat/masquerade device went down),
ctnetlink and so on.

Add the protocol state too for the destroy message to check for abnormal
state on connection termination.

Joint work with Pablo.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-12 11:44:42 +01:00
Jakub Kicinski
46d5e62dd3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
xdp_return_frame_bulk() needs to pass a xdp_buff
to __xdp_return().

strlcpy got converted to strscpy but here it makes no
functional difference, so just keep the right code.

Conflicts:
	net/netfilter/nf_tables_api.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-11 22:29:38 -08:00
Pablo Neira Ayuso
917d80d376 netfilter: nft_dynset: fix timeouts later than 23 days
Use nf_msecs_to_jiffies64 and nf_jiffies64_to_msecs as provided by
8e1102d5a1 ("netfilter: nf_tables: support timeouts larger than 23
days"), otherwise ruleset listing breaks.

Fixes: a8b1e36d0d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-08 20:42:11 +01:00
Jakub Kicinski
55fd59b003 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:
	drivers/net/ethernet/ibm/ibmvnic.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03 15:44:09 -08:00
Jan Engelhardt
04295878be netfilter: use actual socket sk for REJECT action
True to the message of commit v5.10-rc1-105-g46d6c5ae953c, _do_
actually make use of state->sk when possible, such as in the REJECT
modules.

Reported-by: Minqiang Chen <ptpt52@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-01 14:33:55 +01:00
Pablo Neira Ayuso
a5d45bc0dc netfilter: nftables_offload: build mask based from the matching bytes
Userspace might match on prefix bytes of header fields if they are on
the byte boundary, this requires that the mask is adjusted accordingly.
Use NFT_OFFLOAD_MATCH_EXACT() for meta since prefix byte matching is not
allowed for this type of selector.

The bitwise expression might be optimized out by userspace, hence the
kernel needs to infer the prefix from the number of payload bytes to
match on. This patch adds nft_payload_offload_mask() to calculate the
bitmask to match on the prefix.

Fixes: c9626a2cbd ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-11-27 12:10:47 +01:00
Pablo Neira Ayuso
3c78e9e0d3 netfilter: nftables_offload: set address type in control dissector
This patch adds nft_flow_rule_set_addr_type() to set the address type
from the nft_payload expression accordingly.

If the address type is not set in the control dissector then a rule that
matches either on source or destination IP address does not work.

After this patch, nft hardware offload generates the flow dissector
configuration as tc-flower does to match on an IP address.

This patch has been also tested functionally to make sure packets are
filtered out by the NIC.

This is also getting the code aligned with the existing netfilter flow
offload infrastructure which is also setting the control dissector.

Fixes: c9626a2cbd ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-11-27 12:10:46 +01:00
Numan Siddique
e2ef5203c8 net: openvswitch: Be liberal in tcp conntrack.
There is no easy way to distinguish if a conntracked tcp packet is
marked invalid because of tcp_in_window() check error or because
it doesn't belong to an existing connection. With this patch,
openvswitch sets liberal tcp flag for the established sessions so
that out of window packets are not marked invalid.

A helper function - nf_ct_set_tcp_be_liberal(nf_conn) is added which
sets this flag for both the directions of the nf_conn.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20201116130126.3065077-1-nusiddiq@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 09:53:48 -08:00
Jose M. Guisado Gomez
fa538f7cf0 netfilter: nf_reject: add reject skbuff creation helpers
Adds reject skbuff creation helper functions to ipv4/6 nf_reject
infrastructure. Use these functions for reject verdict in bridge
family.

Can be reused by all different families that support reject and
will not inject the reject packet through ip local out.

Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-31 10:40:22 +01:00
Saeed Mirzamohammadi
31cc578ae2 netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create
This patch fixes the issue due to:

BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2
net/netfilter/nf_tables_offload.c:40
Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244

The error happens when expr->ops is accessed early on before performing the boundary check and after nft_expr_next() moves the expr to go out-of-bounds.

This patch checks the boundary condition before expr->ops that fixes the slab-out-of-bounds Read issue.

Add nft_expr_more() and use it to fix this problem.

Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-20 13:54:54 +02:00
Jakub Kicinski
2295cddf99 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor conflicts in net/mptcp/protocol.h and
tools/testing/selftests/net/Makefile.

In both cases code was added on both sides in the same place
so just keep both.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15 12:43:21 -07:00
Pablo Neira Ayuso
d25e2e9388 netfilter: restore NF_INET_NUMHOOKS
This definition is used by the iptables legacy UAPI, restore it.

Fixes: d3519cb89f ("netfilter: nf_tables: add inet ingress support")
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-14 20:28:05 -07:00
Pablo Neira Ayuso
0d9826bc18 netfilter: nf_log: missing vlan offload tag and proto
Dump vlan tag and proto for the usual vlan offload case if the
NF_LOG_MACDECODE flag is set on. Without this information the logging is
misleading as there is no reference to the VLAN header.

[12716.993704] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0800 SRC=192.168.10.2 DST=172.217.168.163 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=2548 DF PROTO=TCP SPT=55848 DPT=80 WINDOW=501 RES=0x00 ACK FIN URGP=0
[12721.157643] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0806 ARP HTYPE=1 PTYPE=0x0800 OPCODE=2 MACSRC=86:6c:92:ea:d6:73 IPSRC=192.168.10.2 MACDST=0e:3b:eb:86:73:76 IPDST=192.168.10.1

Fixes: 83e96d443b ("netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-14 01:25:14 +02:00
Pablo Neira Ayuso
d3519cb89f netfilter: nf_tables: add inet ingress support
This patch adds a new ingress hook for the inet family. The inet ingress
hook emulates the IP receive path code, therefore, unclean packets are
drop before walking over the ruleset in this basechain.

This patch also introduces the nft_base_chain_netdev() helper function
to check if this hook is bound to one or more devices (through the hook
list infrastructure). This check allows to perform the same handling for
the inet ingress as it would be a netdev ingress chain from the control
plane.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12 01:57:34 +02:00
Phil Sutter
10fdd6d80e netfilter: nf_tables: Implement fast bitwise expression
A typical use of bitwise expression is to mask out parts of an IP
address when matching on the network part only. Optimize for this common
use with a fast variant for NFT_BITWISE_BOOL-type expressions operating
on 32bit-sized values.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-04 21:08:33 +02:00
Phil Sutter
5f48846daf netfilter: nf_tables: Enable fast nft_cmp for inverted matches
Add a boolean indicating NFT_CMP_NEQ. To include it into the match
decision, it is sufficient to XOR it with the data comparison's result.

While being at it, store the mask that is calculated during expression
init and free the eval routine from having to recalculate it each time.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-04 21:08:32 +02:00
Jose M. Guisado Gomez
002f217653 netfilter: nf_tables: add userdata attributes to nft_chain
Enables storing userdata for nft_chain. Field udata points to user data
and udlen stores its length.

Adds new attribute flag NFTA_CHAIN_USERDATA.

Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-30 11:46:50 +02:00
YueHaibing
8a8b9047a8 netfilter: nf_tables: Remove ununsed function nft_data_debug
It is never used, so can be removed.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-22 01:35:29 +02:00
David S. Miller
d85427e3c8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Rewrite inner header IPv6 in ICMPv6 messages in ip6t_NPT,
   from Michael Zhou.

2) do_ip_vs_set_ctl() dereferences uninitialized value,
   from Peilin Ye.

3) Support for userdata in tables, from Jose M. Guisado.

4) Do not increment ct error and invalid stats at the same time,
   from Florian Westphal.

5) Remove ct ignore stats, also from Florian.

6) Add ct stats for clash resolution, from Florian Westphal.

7) Bump reference counter bump on ct clash resolution only,
   this is safe because bucket lock is held, again from Florian.

8) Use ip_is_fragment() in xt_HMARK, from YueHaibing.

9) Add wildcard support for nft_socket, from Balazs Scheidler.

10) Remove superfluous IPVS dependency on iptables, from
    Yaroslav Bolyukin.

11) Remove unused definition in ebt_stp, from Wang Hai.

12) Replace CONFIG_NFT_CHAIN_NAT_{IPV4,IPV6} by CONFIG_NFT_NAT
    in selftests/net, from Fabian Frederick.

13) Add userdata support for nft_object, from Jose M. Guisado.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09 11:21:19 -07:00
Jose M. Guisado Gomez
b131c96496 netfilter: nf_tables: add userdata support for nft_object
Enables storing userdata for nft_object. Initially this will store an
optional comment but can be extended in the future as needed.

Adds new attribute NFTA_OBJ_USERDATA to nft_object.

Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-08 16:35:38 +02:00
Jose M. Guisado Gomez
7a81575b80 netfilter: nf_tables: add userdata attributes to nft_table
Enables storing userdata for nft_table. Field udata points to user data
and udlen store its length.

Adds new attribute flag NFTA_TABLE_USERDATA

Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28 19:18:48 +02:00
Florian Westphal
1e105e6afa netfilter: nf_tables: fix destination register zeroing
Following bug was reported via irc:
nft list ruleset
   set knock_candidates_ipv4 {
      type ipv4_addr . inet_service
      size 65535
      elements = { 127.0.0.1 . 123,
                   127.0.0.1 . 123 }
      }
 ..
   udp dport 123 add @knock_candidates_ipv4 { ip saddr . 123 }
   udp dport 123 add @knock_candidates_ipv4 { ip saddr . udp dport }

It should not have been possible to add a duplicate set entry.

After some debugging it turned out that the problem is the immediate
value (123) in the second-to-last rule.

Concatenations use 32bit registers, i.e. the elements are 8 bytes each,
not 6 and it turns out the kernel inserted

inet firewall @knock_candidates_ipv4
        element 0100007f ffff7b00  : 0 [end]
        element 0100007f 00007b00  : 0 [end]

Note the non-zero upper bits of the first element.  It turns out that
nft_immediate doesn't zero the destination register, but this is needed
when the length isn't a multiple of 4.

Furthermore, the zeroing in nft_payload is broken.  We can't use
[len / 4] = 0 -- if len is a multiple of 4, index is off by one.

Skip zeroing in this case and use a conditional instead of (len -1) / 4.

Fixes: 49499c3e6e ("netfilter: nf_tables: switch registers to 32 bit addressing")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-21 19:00:33 +02:00
Linus Torvalds
97d052ea3f Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Thomas Gleixner:
 "A set of locking fixes and updates:

   - Untangle the header spaghetti which causes build failures in
     various situations caused by the lockdep additions to seqcount to
     validate that the write side critical sections are non-preemptible.

   - The seqcount associated lock debug addons which were blocked by the
     above fallout.

     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict
     per CPU seqcounts. As the lock is not part of the seqcount, lockdep
     cannot validate that the lock is held.

     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored
     and write_seqcount_begin() has a lockdep assertion to validate that
     the lock is held.

     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API
     is unchanged and determines the type at compile time with the help
     of _Generic which is possible now that the minimal GCC version has
     been moved up.

     Adding this lockdep coverage unearthed a handful of seqcount bugs
     which have been addressed already independent of this.

     While generally useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if
     the writers are serialized by an associated lock, which leads to
     the well known reader preempts writer livelock. RT prevents this by
     storing the associated lock pointer independent of lockdep in the
     seqcount and changing the reader side to block on the lock when a
     reader detects that a writer is in the write side critical section.

   - Conversion of seqcount usage sites to associated types and
     initializers"

* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/seqlock, headers: Untangle the spaghetti monster
  locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
  x86/headers: Remove APIC headers from <asm/smp.h>
  seqcount: More consistent seqprop names
  seqcount: Compress SEQCNT_LOCKNAME_ZERO()
  seqlock: Fold seqcount_LOCKNAME_init() definition
  seqlock: Fold seqcount_LOCKNAME_t definition
  seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
  hrtimer: Use sequence counter with associated raw spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  vfs: Use sequence counter with associated spinlock
  timekeeping: Use sequence counter with associated raw spinlock
  xfrm: policy: Use sequence counters with associated lock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  sched: tasks: Use sequence counter with associated spinlock
  ...
2020-08-10 19:07:44 -07:00
David S. Miller
ee895a30ef Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Flush the cleanup xtables worker to make sure destructors
   have completed, from Florian Westphal.

2) iifgroup is matching erroneously, also from Florian.

3) Add selftest for meta interface matching, from Florian Westphal.

4) Move nf_ct_offload_timeout() to header, from Roi Dayan.

5) Call nf_ct_offload_timeout() from flow_offload_add() to
   make sure garbage collection does not evict offloaded flow,
   from Roi Dayan.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04 13:32:39 -07:00