As stated in [1], dma_set_mask() with a 64-bit mask never fails if
dev->dma_mask is non-NULL.
So, if it fails, the 32 bits case will also fail for the same reason.
Moreover, dma_set_mask_and_coherent() returns 0 or -EIO, so the return
code of the function can be used directly.
Finally, inline bnx2x_set_coherency_mask() because it is now only a wrapper
for a single dma_set_mask_and_coherent() call.
Simplify code and remove some dead code accordingly.
[1]: https://lkml.org/lkml/2021/6/7/398
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/29608a525876afddceabf8f11b2ba606da8748fc.1641730747.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As stated in [1], dma_set_mask() with a 64-bit mask never fails if
dev->dma_mask is non-NULL.
So, if it fails, the 32 bits case will also fail for the same reason.
Moreover, dma_set_mask_and_coherent() returns 0 or -EIO, so the return
code of the function can be used directly. There is no need to 'rc = -EIO'
explicitly.
Simplify code and remove some dead code accordingly.
[1]: https://lkml.org/lkml/2021/6/7/398
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/b9aa46e7e5a5aa61f56aac5ea439930f41ad9946.1641726804.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As stated in [1], dma_set_mask() with a 64-bit mask never fails if
dev->dma_mask is non-NULL.
So, if it fails, the 32 bits case will also fail for the same reason.
So if dma_set_mask_and_coherent() succeeds, 'netdev->features' will have
NETIF_F_HIGHDMA in all cases. Move the assignment of this feature in
be_netdev_init() instead be_probe() which is a much logical place.
Simplify code and remove some dead code accordingly.
[1]: https://lkml.org/lkml/2021/6/7/398
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/637696d7141faa68c29fc34b70f9aa67d5e605f0.1641718999.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As stated in [1], dma_set_mask() with a 64-bit mask will never fail if
dev->dma_mask is non-NULL.
So, if it fails, the 32 bits case will also fail for the same reason.
So, if dma_set_mask_and_coherent() succeeds, 'using_dac' is known to be
'true'. This variable can be removed.
Simplify code and remove some dead code accordingly.
[1]: https://lkml.org/lkml/2021/6/7/398
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/1d5a7b3f4fa735f1233c3eb3fa07e71df95fad75.1641658516.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As stated in [1], dma_set_mask() with a 64-bit mask will never fail if
dev->dma_mask is non-NULL.
So, if it fails, the 32 bits case will also fail for the same reason.
If dma_set_mask_and_coherent() succeeds, 'ap->pci_using_dac' is known to be
1. So 'pci_using_dac' can be removed from the 'struct ace_private'.
Simplify code and remove some dead code accordingly.
[1]: https://lkml.org/lkml/2021/6/7/398
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/1a414c05c27b21c661aef61dffe1adcd1578b1f5.1641651917.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As stated in [1], dma_set_mask() with a 64-bit mask will never fail if
dev->dma_mask is non-NULL.
So, if it fails, the 32 bits case will also fail for the same reason.
So qlcnic_set_dma_mask(), (in qlcnic_main.c) can be simplified a lot and
inlined directly in its only caller.
If dma_set_mask_and_coherent() succeeds, 'pci_using_dac' is known to be 1.
So it can be removed from all the calling chain.
qlcnic_setup_netdev() can finally be simplified as-well.
[1]: https://lkml.org/lkml/2021/6/7/398
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/4996ab0337d62ec6a54b2edf234cd5ced4b4d7ad.1641649611.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees reports quoted commit introduced the following warning on arm64:
drivers/net/ethernet/allwinner/sun4i-emac.c:922:60: error: format '%x' expects argument of type 'unsigned int', but argument 3 has type 'resource_size_t' {aka 'long long unsigned int'} [-Werror=format=]
922 | netdev_info(ndev, "get io resource from device: 0x%x, size = %u\n",
| ~^
| | | unsigned int
| %llx
923 | regs->start, resource_size(regs));
| ~~~~~~~~~~~
| |
| resource_size_t {aka long long unsigned int}
.. and another one like that for resource_size().
Switch to %pa and a cast.
Reported-by: Kees Cook <keescook@chromium.org>
Fixes: 47869e82c8 ("sun4i-emac.c: add dma support")
Link: https://lore.kernel.org/r/20220108034438.2227343-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As page_pool_refill_alloc_cache() is only called by
__page_pool_get_cached(), which assumes non-concurrent access
as suggested by the comment in __page_pool_get_cached(), and
ptr_ring allows concurrent access between consumer and producer,
so remove the spinlock in page_pool_refill_alloc_cache().
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/r/20220107090042.13605-1-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Menglong Dong says:
====================
net: skb: introduce kfree_skb_with_reason()
In this series patch, the interface kfree_skb_with_reason() is
introduced(), which is used to collect skb drop reason, and pass
it to 'kfree_skb' tracepoint. Therefor, 'drop_monitor' or eBPF is
able to monitor abnormal skb with detail reason.
In fact, this series patches are out of the intelligence of David
and Steve, I'm just a truck man :/
Previous discussion is here:
https://lore.kernel.org/netdev/20211118105752.1d46e990@gandalf.local.home/https://lore.kernel.org/netdev/67b36bd8-2477-88ac-83a0-35a1eeaf40c9@gmail.com/
In the first patch, kfree_skb_with_reason() is introduced and
the 'reason' field is added to 'kfree_skb' tracepoint. In the
second patch, 'kfree_skb()' in replaced with 'kfree_skb_with_reason()'
in tcp_v4_rcv(). In the third patch, 'kfree_skb_with_reason()' is
used in __udp4_lib_rcv().
Changes since v3:
- fix some code style problems in skb.h
Changes since v2:
- rename kfree_skb_with_reason() to kfree_skb_reason()
- make kfree_skb() static inline, as Jakub suggested
Changes since v1:
- rename some drop reason, as David suggested
- add the third patch
====================
Link: https://lore.kernel.org/r/20220109063628.526990-1-imagedong@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace kfree_skb() with kfree_skb_reason() in __udp4_lib_rcv.
New drop reason 'SKB_DROP_REASON_UDP_CSUM' is added for udp csum
error.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace kfree_skb() with kfree_skb_reason() in tcp_v4_rcv(). Following
drop reasons are added:
SKB_DROP_REASON_NO_SOCKET
SKB_DROP_REASON_PKT_TOO_SMALL
SKB_DROP_REASON_TCP_CSUM
SKB_DROP_REASON_TCP_FILTER
After this patch, 'kfree_skb' event will print message like this:
$ TASK-PID CPU# ||||| TIMESTAMP FUNCTION
$ | | | ||||| | |
<idle>-0 [000] ..s1. 36.113438: kfree_skb: skbaddr=(____ptrval____) protocol=2048 location=(____ptrval____) reason: NO_SOCKET
The reason of skb drop is printed too.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce the interface kfree_skb_reason(), which is able to pass
the reason why the skb is dropped to 'kfree_skb' tracepoint.
Add the 'reason' field to 'trace_kfree_skb', therefor user can get
more detail information about abnormal skb with 'drop_monitor' or
eBPF.
All drop reasons are defined in the enum 'skb_drop_reason', and
they will be print as string in 'kfree_skb' tracepoint in format
of 'reason: XXX'.
( Maybe the reasons should be defined in a uapi header file, so that
user space can use them? )
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Build bot reports:
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c: In function 'fec_set_block_stats':
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:1235:48: error: 'outl' undeclared (first use in this function); did you mean 'out'?
1235 | if (mlx5_core_access_reg(mdev, in, sz, outl, sz, MLX5_REG_PPCNT, 0, 0))
| ^~~~
| out
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220109213321.2292830-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan says:
====================
bnxt_en: Update for net-next
This series adds better error and debug logging for firmware messages.
We now also use the firmware provided timeout value for long running
commands instead of capping it to 40 seconds.
====================
Link: https://lore.kernel.org/r/1641772485-10421-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While it has always been possible to infer that an HWRM command was
abandoned due to an unhealthy firmware status by the shortened timeout
reported, this change improves the log messaging to account for this
case explicitly. In the interests of further clarity, the firmware
status is now also reported in these new messages.
v2: Remove inline keyword for hwrm_wait_must_abort() in .c file.
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some older devices cannot accommodate the 40 seconds timeout
cap for long running commands (such as NVRAM commands) due to
hardware limitations. Allow these devices to request more time for
these long running commands, but print a warning, since the longer
timeout may cause the hung task watchdog to trigger. In the case of a
firmware update operation, this is preferable to failing outright.
v2: Use bp->hwrm_cmd_max_timeout directly without the constants.
Fixes: 881d8353b0 ("bnxt_en: Add an upper bound for all firmware command timeouts.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current driver design relies on the PF netdev being open in order
to intercept the following HWRM commands from a VF:
- HWRM_FUNC_VF_CFG
- HWRM_CFA_L2_FILTER_ALLOC
- HWRM_PORT_PHY_QCFG (only if FW_CAP_LINK_ADMIN is not supported)
If the PF is closed, then VFs are subjected to rather inscrutable error
messages in response to any configuration requests involving the above
command types. Recent firmware distinguishes this problem case from
other errors by returning HWRM_ERR_CODE_PF_UNAVAILABLE. In most cases,
the appropriate course of action is still to fail, but this can now be
accomplished with the aid of more user informative log messages. For L2
filter allocations that are already asynchronous, an automatic retry
seems more appropriate.
v2: Delete extra newline.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add logging of firmware messages. These can be useful for diagnosing
issues in the field, but due to their verbosity are only appropriate
at a debug message level.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When building ARCH=arm allmodconfig:
drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c: In function ‘iwl_mvm_ftm_rtt_smoothing’:
./include/asm-generic/div64.h:222:35: error: comparison of distinct pointer types lacks a cast [-Werror]
222 | (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
| ^~
drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c:1070:9: note: in expansion of macro ‘do_div’
1070 | do_div(rtt_avg, 100);
| ^~~~~~
do_div() has to be used with an unsigned 64-bit integer dividend but
rtt_avg is a signed 64-bit integer.
div_s64() expects a signed 64-bit integer dividend and signed 32-bit
divisor, which fits this scenario, so use that function here to fix the
warning.
Fixes: 8b0f92549f ("iwlwifi: mvm: fix 32-bit build in FTM")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20211227191757.2354329-1-nathan@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next. This
includes one patch to update ovs and act_ct to use nf_ct_put() instead
of nf_conntrack_put().
1) Add netns_tracker to nfnetlink_log and masquerade, from Eric Dumazet.
2) Remove redundant rcu read-size lock in nf_tables packet path.
3) Replace BUG() by WARN_ON_ONCE() in nft_payload.
4) Consolidate rule verdict tracing.
5) Replace WARN_ON() by WARN_ON_ONCE() in nf_tables core.
6) Make counter support built-in in nf_tables.
7) Add new field to conntrack object to identify locally generated
traffic, from Florian Westphal.
8) Prevent NAT from shadowing well-known ports, from Florian Westphal.
9) Merge nf_flow_table_{ipv4,ipv6} into nf_flow_table_inet, also from
Florian.
10) Remove redundant pointer in nft_pipapo AVX2 support, from Colin Ian King.
11) Replace opencoded max() in conntrack, from Jiapeng Chong.
12) Update conntrack to use refcount_t API, from Florian Westphal.
13) Move ip_ct_attach indirection into the nf_ct_hook structure.
14) Constify several pointer object in the netfilter codebase,
from Florian Westphal.
15) Tree-wide replacement of nf_conntrack_put() by nf_ct_put(), also
from Florian.
16) Fix egress splat due to incorrect rcu notation, from Florian.
17) Move stateful fields of connlimit, last, quota, numgen and limit
out of the expression data area.
18) Build a blob to represent the ruleset in nf_tables, this is a
requirement of the new register tracking infrastructure.
19) Add NFT_REG32_NUM to define the maximum number of 32-bit registers.
20) Add register tracking infrastructure to skip redundant
store-to-register operations, this includes support for payload,
meta and bitwise expresssions.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: (32 commits)
netfilter: nft_meta: cancel register tracking after meta update
netfilter: nft_payload: cancel register tracking after payload update
netfilter: nft_bitwise: track register operations
netfilter: nft_meta: track register operations
netfilter: nft_payload: track register operations
netfilter: nf_tables: add register tracking infrastructure
netfilter: nf_tables: add NFT_REG32_NUM
netfilter: nf_tables: add rule blob layout
netfilter: nft_limit: move stateful fields out of expression data
netfilter: nft_limit: rename stateful structure
netfilter: nft_numgen: move stateful fields out of expression data
netfilter: nft_quota: move stateful fields out of expression data
netfilter: nft_last: move stateful fields out of expression data
netfilter: nft_connlimit: move stateful fields out of expression data
netfilter: egress: avoid a lockdep splat
net: prefer nf_ct_put instead of nf_conntrack_put
netfilter: conntrack: avoid useless indirection during conntrack destruction
netfilter: make function op structures const
netfilter: core: move ip_ct_attach indirection to struct nf_ct_hook
netfilter: conntrack: convert to refcount_t api
...
====================
Link: https://lore.kernel.org/r/20220109231640.104123-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The meta expression might mangle the packet metadata, cancel register
tracking since any metadata in the registers is stale.
Finer grain register tracking cancellation by inspecting the meta type
on the register is also possible.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The payload expression might mangle the packet, cancel register tracking
since any payload data in the registers is stale.
Finer grain register tracking cancellation by inspecting the payload
base, offset and length on the register is also possible.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Check if the destination register already contains the data that this
bitwise expression performs. This allows to skip this redundant
operation.
If the destination contains a different bitwise operation, cancel the
register tracking information. If the destination contains no bitwise
operation, update the register tracking information.
Update the payload and meta expression to check if this bitwise
operation has been already performed on the register. Hence, both the
payload/meta and the bitwise expressions are reduced.
There is also a special case: If source register != destination register
and source register is not updated by a previous bitwise operation, then
transfer selector from the source register to the destination register.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Check if the destination register already contains the data that this
meta store expression performs. This allows to skip this redundant
operation. If the destination contains a different selector, update
the register tracking information.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Check if the destination register already contains the data that this
payload store expression performs. This allows to skip this redundant
operation. If the destination contains a different selector, update
the register tracking information.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds new infrastructure to skip redundant selector store
operations on the same register to achieve a performance boost from
the packet path.
This is particularly noticeable in pure linear rulesets but it also
helps in rulesets which are already heaving relying in maps to avoid
ruleset linear inspection.
The idea is to keep data of the most recurrent store operations on
register to reuse them with cmp and lookup expressions.
This infrastructure allows for dynamic ruleset updates since the ruleset
blob reduction happens from the kernel.
Userspace still needs to be updated to maximize register utilization to
cooperate to improve register data reuse / reduce number of store on
register operations.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a definition including the maximum number of 32-bits registers that
are used a scratchpad memory area to store data.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds a blob layout per chain to represent the ruleset in the
packet datapath.
size (unsigned long)
struct nft_rule_dp
struct nft_expr
...
struct nft_rule_dp
struct nft_expr
...
struct nft_rule_dp (is_last=1)
The new structure nft_rule_dp represents the rule in a more compact way
(smaller memory footprint) compared to the control-plane nft_rule
structure.
The ruleset blob is a read-only data structure. The first field contains
the blob size, then the rules containing expressions. There is a trailing
rule which is used by the tracing infrastructure which is equivalent to
the NULL rule marker in the previous representation. The blob size field
does not include the size of this trailing rule marker.
The ruleset blob is generated from the commit path.
This patch reuses the infrastructure available since 0cbc06b3fa
("netfilter: nf_tables: remove synchronize_rcu in commit phase") to
build the array of rules per chain.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
include/linux/netfilter_netdev.h:97 suspicious rcu_dereference_check() usage!
2 locks held by sd-resolve/1100:
0: ..(rcu_read_lock_bh){1:3}, at: ip_finish_output2
1: ..(rcu_read_lock_bh){1:3}, at: __dev_queue_xmit
__dev_queue_xmit+0 ..
The helper has two callers, one uses rcu_read_lock, the other
rcu_read_lock_bh(). Annotate the dereference to reflect this.
Fixes: 42df6e1d22 ("netfilter: Introduce egress hook")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Its the same as nf_conntrack_put(), but without the
need for an indirect call. The downside is a module dependency on
nf_conntrack, but all of these already depend on conntrack anyway.
Cc: Paul Blakey <paulb@mellanox.com>
Cc: dev@openvswitch.org
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nf_ct_put() results in a usesless indirection:
nf_ct_put -> nf_conntrack_put -> nf_conntrack_destroy -> rcu readlock +
indirect call of ct_hooks->destroy().
There are two _put helpers:
nf_ct_put and nf_conntrack_put. The latter is what should be used in
code that MUST NOT cause a linker dependency on the conntrack module
(e.g. calls from core network stack).
Everyone else should call nf_ct_put() instead.
A followup patch will convert a few nf_conntrack_put() calls to
nf_ct_put(), in particular from modules that already have a conntrack
dependency such as act_ct or even nf_conntrack itself.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
No functional changes, these structures should be const.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ip_ct_attach predates struct nf_ct_hook, we can place it there and
remove the exported symbol.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Convert nf_conn reference counting from atomic_t to refcount_t based api.
refcount_t api provides more runtime sanity checks and will warn on
certain constructs, e.g. refcount_inc() on a zero reference count, which
usually indicates use-after-free.
For this reason template allocation is changed to init the refcount to
1, the subsequenct add operations are removed.
Likewise, init_conntrack() is changed to set the initial refcount to 1
instead refcount_inc().
This is safe because the new entry is not (yet) visible to other cpus.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>