This tracepoint can be used to trace synack retransmits. It maintains
pointer to struct request_sock.
We cannot simply reuse trace_tcp_retransmit_skb() here, because the
sk here is the LISTEN socket. The IP addresses and ports should be
extracted from struct request_sock.
Note that, like many other tracepoints, this patch uses IS_ENABLED
in TP_fast_assign macro, which triggers sparse warning like:
./include/trace/events/tcp.h:274:1: error: directive in argument list
./include/trace/events/tcp.h:281:1: error: directive in argument list
However, there is no good solution to avoid these warnings. To the
best of our knowledge, these warnings are harmless.
Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 8200 (IPv6) defines Hop-by-Hop options and Destination options
extension headers. Both of these carry a list of TLVs which is
only limited by the maximum length of the extension header (2048
bytes). By the spec a host must process all the TLVs in these
options, however these could be used as a fairly obvious
denial of service attack. I think this could in fact be
a significant DOS vector on the Internet, one mitigating
factor might be that many FWs drop all packets with EH (and
obviously this is only IPv6) so an Internet wide attack might not
be so effective (yet!).
By my calculation, the worse case packet with TLVs in a standard
1500 byte MTU packet that would be processed by the stack contains
1282 invidual TLVs (including pad TLVS) or 724 two byte TLVs. I
wrote a quick test program that floods a whole bunch of these
packets to a host and sure enough there is substantial time spent
in ip6_parse_tlv. These packets contain nothing but unknown TLVS
(that are ignored), TLV padding, and bogus UDP header with zero
payload length.
25.38% [kernel] [k] __fib6_clean_all
21.63% [kernel] [k] ip6_parse_tlv
4.21% [kernel] [k] __local_bh_enable_ip
2.18% [kernel] [k] ip6_pol_route.isra.39
1.98% [kernel] [k] fib6_walk_continue
1.88% [kernel] [k] _raw_write_lock_bh
1.65% [kernel] [k] dst_release
This patch adds configurable limits to Destination and Hop-by-Hop
options. There are three limits that may be set:
- Limit the number of options in a Hop-by-Hop or Destination options
extension header.
- Limit the byte length of a Hop-by-Hop or Destination options
extension header.
- Disallow unrecognized options in a Hop-by-Hop or Destination
options extension header.
The limits are set in corresponding sysctls:
ipv6.sysctl.max_dst_opts_cnt
ipv6.sysctl.max_hbh_opts_cnt
ipv6.sysctl.max_dst_opts_len
ipv6.sysctl.max_hbh_opts_len
If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed.
The number of known TLVs that are allowed is the absolute value of
this number.
If a limit is exceeded when processing an extension header the packet is
dropped.
Default values are set to 8 for options counts, and set to INT_MAX
for maximum length. Note the choice to limit options to 8 is an
arbitrary guess (roughly based on the fact that the stack supports
three HBH options and just one destination option).
These limits have being proposed in draft-ietf-6man-rfc6434-bis.
Tested (by Martin Lau)
I tested out 1 thread (i.e. one raw_udp process).
I changed the net.ipv6.max_dst_(opts|hbh)_number between 8 to 2048.
With sysctls setting to 2048, the softirq% is packed to 100%.
With 8, the softirq% is almost unnoticable from mpstat.
v2;
- Code and documention cleanup.
- Change references of RFC2460 to be RFC8200.
- Add reference to RFC6434-bis where the limits will be in standard.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lipeng says:
====================
net: hns3: add support for reset
There are 4 reset types for HNS3 PF driver, include global reset,
core reset, IMP reset, PF reset.The core reset will reset all datapath
of all functions except IMP, MAC and PCI interface. Global reset is equal
with the core reset plus all MAC reset. IMP reset is caused by watchdog
timer expiration, the same range with core reset. PF reset will reset
whole physical function.
This patchset adds reset support for hns3 driver and fix some related bugs.
---
Change log:
V1 -> V2:
1, fix some comments from Yunsheng Lin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
All member of Struct hdev->hw_stats is initialized to 0 as hdev is
allocated by devm_kzalloc. But in reset process, hdev will not be
allocated again, so need clear hdev->hw_stats in reset process, otherwise
the statistic will be wrong after reset. This patch set all of the
statistic counters to zero after reset.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
we should use free_irq to free the nic irq during the unloading time.
because we use request_irq to apply it when nic up. It will crash if
up net device after reset the port. This patch fixes the issue.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implement the interface of reset notification in hns3_enet,
it will do resetting business which include shutdown nic device,
free and initialize client side resource.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add timeout handler in hns3_enet.c to handle
TX side timeout event, when TX timeout event occur, it will triger
NIC driver into reset process.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds reset support for PF,it include : global reset, core reset,
IMP reset, PF reset.The core reset will Reset all datapath of all functions
except IMP, MAC and PCI interface. Global reset is equal with the core
reset plus all MAC reset. IMP reset is caused by watchdog timer expiration,
the same with core reset in the reset flow. PF reset will reset whole
physical function.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds initialization and deinitialization for misc interrupt.
This interrupt will be used to handle reset message(IRQ).
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no necessary to reallocate the descriptor and remap the descriptor
memory in reset process, But there is still some other action exist in both
reset process and initialization process.
To reuse the common interface in reset process and initialization process,
This patch moves out the descriptor allocate and memory maping from
interface cmdq_init.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It needs initialize mdio in initialization process, but reset process
does not reset mdio, so do not initialize mdio in reset process.
This patch move out the mdio configuration function from the mac_init.
So mac_init can be used both in reset process and initialization process.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch refactor the mapping of tqp to vport, making the maping function
can be used both in the reset process and initialization process.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the timer conversion patches evidently escaped build testing
until I ran into in on ARM randconfig builds:
drivers/net/ethernet/seeq/ether3.c: In function 'ether3_ledoff':
drivers/net/ethernet/seeq/ether3.c:175:40: error: 'priv' undeclared (first use in this function); did you mean 'pid'?
drivers/net/ethernet/seeq/ether3.c:176:27: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
This fixes the two small typos that caused the problems.
Fixes: 6fd9c53f71 ("net: seeq: Convert timers to use timer_setup()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide a rough overview of the state of the driver. And explain that the
driver operates in two modes: bridged and port-separated.
Signed-off-by: Egil Hjelmeland <egil.hjelmeland@zenitel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
nfp: TC block fixes, app fallback and dev_alloc()
This series has three parts. First of all John and I fix some
fallout from the TC block conversion. John also fixes sleeping
in the neigh notifier.
Secondly I reorganise the nfp_app table to make it easier to
deal with excluding apps which have unmet Kconfig dependencies.
Last but not least after the fixes which went into -net some time
ago I refactor the page allocation, add a ethtool counter for
failed allocations and clean the ethtool stat code while at it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We split rvector stats into two categories - per queue and
stats which are added up into one total counter. Improve
the defines denoting their number.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a counter incremented when allocation of replacement
RX page fails.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the dev_alloc_page() networking helper to allocate pages
for RX packets.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If kernel config does not include BPF just replace the BPF
app handler with the handler for basic NIC. The BPF app
will now be built only if BPF infrastructure is selected
in kernel config.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The app table is an unordered array right now. We have to search
apps by ID. It also makes it harder to fall back to core NIC if
advanced functions are not compiled into the kernel (e.g. eBPF).
Make the table keyed by app id.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent TC changes dropped the check protecting us from trying
to offload a TC program if XDP programs are already loaded.
Fixes: 90d97315b3 ("nfp: bpf: Convert ndo_setup_tc offloads to block callbacks")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Functions called by the netevent notifier must be in atomic context.
Change the mutex to spinlock and ensure mem allocations are done with the
atomic flag.
Also, remove unnecessary locking after notifiers are unregistered.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure priv netdev data in flower app is cast to nfp_repr and not nfp_net
as in other apps.
Fixes: 363fc53b8b ("nfp: flower: Convert ndo_setup_tc offloads to block callbacks")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use PATH_MAX instead of hardcoded array size 256
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
LiquidIO firmware supports a vswitch that needs to know the names of the
VF representors in the host to maintain compatibility for direct
programming using external Openflow agents. So, for each VF representor,
send its name to the firmware when it gets registered and when its name
changes.
Signed-off-by: Vijaya Mohan Guvva <vijaya.guvva@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
BPF range marking improvements for meta data
The set contains improvements for direct packet access range
markings related to data_meta pointer and test cases for all
such access patterns that the verifier matches on.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lets also add test cases to cover all possible data_meta access tests
for good/bad access cases so we keep tracking them.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow-up to 0fd4759c55 ("bpf: fix pattern matches for direct
packet access") to cover also the remaining data_meta/data matches
in the verifier. The matches are also refactored a bit to simplify
handling of all the cases.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two minor cleanups after Dave's recent merge in f8ddadc4db
("Merge git://git.kernel.org...") of net into net-next in
order to get the code in line with what was done originally
in the net tree: i) use max() instead of max_t() since both
ranges are u16, ii) don't split the direct access test cases
in the middle with bpf_exit test cases from 390ee7e29f
("bpf: enforce return code for cgroup-bpf programs").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Touching linux/bpf.h makes us rebuild a surprisingly large
portion of the kernel. Remove the unnecessary dependency
from security.h, it only needs forward declarations.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hesoteric board configurations where port 0 is not available would still
make SYSTEMPORT inspect the switch port 0, queue 0, which, not being
enabled, would cause transmit timeouts over time. Just ignore those
unconfigured rings instead.
Fixes: 84ff33eeb23d ("net: systemport: Establish DSA network device queue mapping")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
nfp: bpf: rename ALU_OP_NEG and support BPF_NEG
Jiong says:
Compilers are starting to use BPF_NEG, for example LLVM. However, NFP
does not support JITing it. This patch set adds this. Unit test is added
as well.
Meanwhile, the current NFP_ALU_NEG is actually doing bitwise NOT (one's
complement) operation, so the name is misleading. This patch set corrects
this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch supports BPF_NEG under both BPF_ALU64 and BPF_ALU. LLVM recently
starts to generate it.
NOTE: BPF_NEG takes single operand which is an register and serve as both
input and output.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current ALU_OP_NEG is Op encoding 0x4 for NPF ALU instruction. It is
actually performing "~B" operation which is bitwise NOT.
The using naming ALU_OP_NEG is misleading as NEG is -B which is not the
same as ~B.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
yuan linyu says:
====================
net: dpaa: two minor cleanup
original i try to remove duplicate code which clean allocated per-cpu area,
thanks to David S. Miller, there are two build warning as errors.
path 1: fix old code maybe-uninitialized warning.
path 2: remove duplicate code and fix unused var warning.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Discovered that the compiler laid-out asm code in suboptimal way
when studying perf report during benchmarking of cpumap. Help
the compiler by the marking unlikely code paths.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
net: sched: block callbacks follow-up
This patchset does a bit of cleanup of leftovers after block callbacks
patchset. The main part is patch 2, which restores the original handling
of tc offload feature flag.
---
v1->v2:
- rebased on top of current net-next (bnxt changes)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since tc_can_offload is always called from block callback or egdev
callback, no need to check if ndo_setup_tc exists.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the only user, mlx5 driver does the check in
mlx5e_setup_tc_block_cb, no need to check here.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This restores the original behaviour before the block callbacks were
introduced. Allow the drivers to do binding of block always, no matter
if the NETIF_F_HW_TC feature is on or off. Move the check to the block
callback which is called for rule insertion.
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the bridge device doesn't generate any notifications upon vlan
modifications on itself because it doesn't use the generic bridge
notifications.
With the recent changes we know if anything was modified in the vlan config
thus we can generate a notification when necessary for the bridge device
so add support to br_ifinfo_notify() similar to how other combined
functions are done - if port is present it takes precedence, otherwise
notify about the bridge. I've explicitly marked the locations where the
notification should be always for the port by setting bridge to NULL.
I've also taken the liberty to rearrange each modified function's local
variables in reverse xmas tree as well.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The assignment to kinfo is redundant as this is a duplicate of
the initialiation of kinfo a few lines earlier, so it can be
removed. The assignment to v_tc_info is never read, so this
variable is redundant and can be removed completely. Cleans
up two clang warnings:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c:433:34:
warning: Value stored to 'kinfo' during its initialization is never read
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c:775:3:
warning: Value stored to 'v_tc_info' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The zero value assigned to inst_processed at the end of each
iteration of the do-while loop is overwritten on the next iteration
and hence it is a redundant assignment and can be removed. Cleans
up clang warning:
drivers/net/ethernet/cavium/liquidio/request_manager.c:480:3:
warning: Value stored to 'inst_processed' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pointer np is initialized and then re-assigned the same value
a few lines later. Remove the redundant duplicated assignment. Cleans
up clang warning:
drivers/net/ethernet/dlink/dl2k.c:314:25: warning: Value stored to
'np' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stat set to zero and the value is never read, instead stat is
set again in the do-loop. Hence the setting to zero is redundant
and can be removed. Cleans up clang warning:
drivers/net/wan/wanxl.c:737:2: warning: Value stored to 'stat'
is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smooth Cong Wang's bug fix into 'net-next'. Basically put
the bulk of the tcf_block_put() logic from 'net' into
tcf_block_put_ext(), but after the offload unbind.
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer says:
====================
Updates for samples/pktgen
This patchset updates samples/pktgen and synchronize with changes
maintained in https://github.com/netoptimizer/network-testing/
Features wise Robert Hoo <robert.hu@intel.com> added support for
detecting and determining dev NUMA node IRQs, and added a new script
named pktgen_sample06_numa_awared_queue_irq_affinity.sh that use these
features.
Cleanup remove last of the old sample files, as IPv6 is covered by
existing sample code.
====================
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>