Currently the kernel accounts the memory for network traffic through
mem_cgroup_[un]charge_skmem() interface. However the memory accounted
only includes the truesize of sk_buff which does not include the size of
sock objects. In our production environment, with opt-out kmem
accounting, the sock kmem caches (TCP[v6], UDP[v6], RAW[v6], UNIX) are
among the top most charged kmem caches and consume a significant amount
of memory which can not be left as system overhead. So, this patch
converts the kmem caches of all sock objects to SLAB_ACCOUNT.
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge fixups for the recent extenstion of the generic power domains
(genpd) framework covering performance states.
* pm-domains:
PM / Domains: Rename opp_node to np
PM / Domains: Fix return value of of_genpd_opp_to_performance_state()
The BIT macro uses unsigned long which some architectures handle as 32 bit
and therefore might cause macro's shift to overflow when used on a value
equals or larger than 32 (NL80211_STA_INFO_RX_DURATION and afterwards).
Since 'filled' member in station_info changed to u64, BIT_ULL macro
should be used with all NL80211_STA_INFO_* attribute types instead of BIT
to prevent future possible bugs when one will use BIT macro for higher
attributes by mistake.
This commit cleans up all usages of BIT macro with the above field
in mac80211 by changing it to BIT_ULL instead.
Signed-off-by: Omer Efrat <omer.efrat@tandemg.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The BIT macro uses unsigned long which some architectures handle as 32 bit
and therefore might cause macro's shift to overflow when used on a value
equals or larger than 32 (NL80211_STA_INFO_RX_DURATION and afterwards).
Since 'filled' member in station_info changed to u64, BIT_ULL macro
should be used with all NL80211_STA_INFO_* attribute types instead of BIT
to prevent future possible bugs when one will use BIT macro for higher
attributes by mistake.
This commit cleans up all usages of BIT macro with the above field
in cfg80211 by changing it to BIT_ULL instead. In addition, there are
some places which don't use BIT nor BIT_ULL macros so align those as well.
Signed-off-by: Omer Efrat <omer.efrat@tandemg.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We don't need to check if he_oper is NULL before calling
ieee80211_verify_sta_he_mcs_support() as it - now - will
correctly check this itself. Remove the redundant check.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
he_op is being dereferenced before it is null checked, hence there
is a potential null pointer dereference.
Fix this by moving the pointer dereference after he_op has been
properly null checked.
Notice that, currently, he_op is already being null checked before
calling this function at 4593:
4593 if (!he_oper ||
4594 !ieee80211_verify_sta_he_mcs_support(sband, he_oper))
4595 ifmgd->flags |= IEEE80211_STA_DISABLE_HE;
but in case ieee80211_verify_sta_he_mcs_support is ever called
without verifying he_oper is not null, we will end up having a
null pointer dereference. So, we better don't take any chances.
Addresses-Coverity-ID: 1470068 ("Dereference before null check")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The cfg80211 layer uses get_seconds() to read the current time
in its supend handling. This function is deprecated because of the 32-bit
time_t overflow, and it can cause unexpected behavior when the time
changes due to settimeofday() calls or leap second updates.
In many cases, we want to use monotonic time instead, however cfg80211
explicitly tracks the time spent in suspend, so this changes the
driver over to use ktime_get_boottime_seconds(), which is slightly
slower, but not used in a fastpath here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
At the very least we should check the return value if
nla_parse_nested() is called with a non-NULL policy.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 9757235f45, "nl80211: correct checks for
NL80211_MESHCONF_HT_OPMODE value") relaxed the range for the HT
operation field in meshconf, while also adding checks requiring
the non-greenfield and non-ht-sta bits to be set in certain
circumstances. The latter bit is actually reserved for mesh BSSes
according to Table 9-168 in 802.11-2016, so in fact it should not
be set.
wpa_supplicant sets these bits because the mesh and AP code share
the same implementation, but authsae does not. As a result, some
meshconf updates from authsae which set only the NONHT_MIXED
protection bits were being rejected.
In order to avoid breaking userspace by changing the rules again,
simply accept the values with or without the bits set, and mask
off the reserved bit to match the spec.
While in here, update the 802.11-2012 reference to 802.11-2016.
Fixes: 9757235f45 ("nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value")
Cc: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Bob Copeland <bobcopeland@fb.com>
Reviewed-by: Masashi Honma <masashi.honma@gmail.com>
Reviewed-by: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
On pre-emption enabled kernels the following print was being seen due to
missing local_bh_disable/local_bh_enable calls. mac80211 assumes that
pre-emption is disabled in the data path.
BUG: using smp_processor_id() in preemptible [00000000] code: iwd/517
caller is __ieee80211_subif_start_xmit+0x144/0x210 [mac80211]
[...]
Call Trace:
dump_stack+0x5c/0x80
check_preemption_disabled.cold.0+0x46/0x51
__ieee80211_subif_start_xmit+0x144/0x210 [mac80211]
Fixes: 9118064914 ("mac80211: Add support for tx_control_port")
Signed-off-by: Denis Kenzior <denkenz@gmail.com>
[commit message rewrite, fixes tag]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It seems that during the conversion from gpio* to gpiod*, the initial
state of SCL was wrongly switched to LOW. Fix it to be HIGH again.
Fixes: 7bb75029ef ("i2c: gpio: Enforce open drain through gpiolib")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
If DMA safe memory was allocated, but the subsequent I2C transfer
fails the memory is leaked. Plug this leak.
Fixes: 8a77821e74 ("i2c: smbus: use DMA safe buffers for emulated SMBus transactions")
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
So, if somebody wants to re-implement this in the future, we pinpoint to
a problem case.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
This reverts commit 3e5f06bed7. As per
bugzilla #200045, this caused a regression. I don't really see a way to
fix it without having the hardware. So, revert the patch and I will fix
the issue I was seeing originally in the i2c-gpio driver itself. I
couldn't find new users of this algorithm since, so there should be no
one depending on the new behaviour.
Reported-by: Sergey Larin <cerg2010cerg2010@mail.ru>
Fixes: 3e5f06bed7 ("i2c: algo-bit: init the bus to a known state")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Sergey Larin <cerg2010cerg2010@mail.ru>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Tom Herbert says:
====================
ila: Cleanup
Perform some cleanup in ILA code. This includes:
- Fix rhashtable walk for cases where nl dumps are done with muliple
function calls. Add a skip index to skip over entries in
a node that have been previously visitied. Call rhashtable_walk_peek
to avoid dropping items between calls to ila_nl_dump.
- Call alloc_bucket_spinlocks to create bucket locks.
- Split out module initialization and netlink definitions into
separate files.
- Add ILA_CMD_FLUSH netlink command to clear the ILA translation table.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ILA_CMD_FLUSH netlink command to clear the ILA translation table.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a main ila file that contains the module initialization functions
as well as netlink definitions. Previously these were defined in
ila_xlat and ila_common. This approach allows better extensibility.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
To allocate the array of bucket locks for the hash table we now
call library function alloc_bucket_spinlocks.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Perform better EAGAIN handling, handle case where ila_dump_info
fails and we missed objects in the dump, and add a skip index
to skip over ila entires in a list on a rhashtable node that have
already been visited (by a previous call to ila_nl_dump).
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li says:
====================
net: hns3: a few code improvements
This patchset fixes a few code stylistic issues from
concentrated review, no functional changes introduced.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
MACRO lower_32_bits and upper_32_bits can help to get bits 0-31
and bits 32-63 of a number, so just use it.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hclge_hw is embedded in hclge_dev, so use container_of instead of
back to get hclge_dev.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The interface h->ae_algo->ops->put_vector is called in both
hns3_nic_dealloc_vector_data and hns3_nic_uninit_vector_data in
hns3_client_uninit, this will cause vector freed twice.
This patch remove the Redundant put_vector to make vector freed
only once.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Print the ret value in error information can help find the reason.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extraction an interface for state init|uninit to make the code
easier to read.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
linux/slab.h is not used in hnae3.h, this patch removes it.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first bd of a packet is invalid and invalid ring head for tx
IRQ is not offen, they may occur when there is error,
Add unlikely for error check branch is better for performance.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HW supports UDP, TCP and SCTP packets checksum for both ipv4 and
ipv6, but do not support other type packets checksum for ipv4 or
ipv6.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the hdev->vector_status[vector_id] is already HCLGE_INVALID_VPORT,
should log the error and return.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The interface init_client_instance and uninit_client_instance
do not register anything, only initialize the client instance.
This patch rename the related interface to make the function
name to indicate the purpose.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In hclge_unmap_ring_frm_vector, there are 2 steps:
step 1: get vector index.
step 2 unbind ring with vector.
But it gets vector id again in step 2 interface. This patch
removes hclge_get_vector_index from hclge_bind_ring_with_vector,
and make the step the same with hns3 PF driver.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a user is accessing a file in selinuxfs with a pointer to a userspace
buffer that is backed by e.g. a userfaultfd, the userspace access can
stall indefinitely, which can block fsi->mutex if it is held.
For sel_read_policy(), remove the locking, since this method doesn't seem
to access anything that requires locking.
For sel_read_bool(), move the user access below the locked region.
For sel_write_bool() and sel_commit_bools_write(), move the user access
up above the locked region.
Cc: stable@vger.kernel.org
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: removed an unused variable in sel_read_policy()]
Signed-off-by: Paul Moore <paul@paul-moore.com>
For ACLs implemented using either FIB rules or FIB entries, the BPF
program needs the FIB lookup status to be able to drop the packet.
Since the bpf_fib_lookup API has not reached a released kernel yet,
change the return code to contain an encoding of the FIB lookup
result and return the nexthop device index in the params struct.
In addition, inform the BPF program of any post FIB lookup reason as
to why the packet needs to go up the stack.
The fib result for unicast routes must have an egress device, so remove
the check that it is non-NULL.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Flag with FLAG_EXPECTED_FAIL the BPF_MAXINSNS tests that cannot be jited
on s390 because they exceed BPF_SIZE_MAX and fail when
CONFIG_BPF_JIT_ALWAYS_ON is set. Also set .expected_errcode to -ENOTSUPP
so the tests pass in that case.
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
XDP_TX requires also changing the MAC-addrs, else some hardware
may drop the TX packet before reaching the wire. This was
observed with driver mlx5.
If xdp_rxq_info select --action XDP_TX the swapmac functionality
is activated. It is also possible to manually enable via cmdline
option --swapmac. This is practical if wanting to measure the
overhead of writing/updating payload for other action types.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
There is a cost associated with reading the packet data payload
that this test ignored. Add option --read to allow enabling
reading part of the payload.
This sample/tool helps us analyse an issue observed with a NIC
mlx5 (ConnectX-5 Ex) and an Intel(R) Xeon(R) CPU E5-1650 v4.
With no_touch of data:
Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:no_touch
XDP stats CPU pps issue-pps
XDP-RX CPU 0 14,465,157 0
XDP-RX CPU 1 14,464,728 0
XDP-RX CPU 2 14,465,283 0
XDP-RX CPU 3 14,465,282 0
XDP-RX CPU 4 14,464,159 0
XDP-RX CPU 5 14,465,379 0
XDP-RX CPU total 86,789,992
When not touching data, we observe that the CPUs have idle cycles.
When reading data the CPUs are 100% busy in softirq.
With reading data:
Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:read
XDP stats CPU pps issue-pps
XDP-RX CPU 0 9,620,639 0
XDP-RX CPU 1 9,489,843 0
XDP-RX CPU 2 9,407,854 0
XDP-RX CPU 3 9,422,289 0
XDP-RX CPU 4 9,321,959 0
XDP-RX CPU 5 9,395,242 0
XDP-RX CPU total 56,657,828
The effect seen above is a result of cache-misses occuring when
more RXQs are being used. Based on perf-event observations, our
conclusion is that the CPUs DDIO (Direct Data I/O) choose to
deliver packet into main memory, instead of L3-cache. We also
found, that this can be mitigated by either using less RXQs or by
reducing NICs the RX-ring size.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Disable periodic stats update background thread and update stats in
background on demand when ndo_get_stats is called.
Having a background thread running in the driver all the time is bad for
power consumption and normally a user space daemon will query the stats
once every specific interval, so ideally the background thread and its
interval can be done in user space..
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
A per-ring counter for NOP operations already exists.
Here I add a counter that sums them up.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add ethtool counter to indicate the number of strides consumed
by filler CQEs.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add per-channel and global ethtool counters for channel events.
Each event indicates an interrupt on one of the channel's
completion queues.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add per-ring and global ethtool counters for congested UMR requests.
These events indicate congestion in UMR handlers in HW.
Such event is concluded when there's an outstanding UMR post,
yet the SW consumed at least two additional MPWQEs in the meanwhile.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add per-channel and global ethtool counters for NAPI.
This helps us monitor and analyze performance in general.
- ch[i]_poll:
the number of times the channel's NAPI poll was invoked.
- ch[i]_arm:
the number of times the channel's NAPI poll completed
and armed the completion queues.
- ch[i]_aff_change:
the number of times the channel's NAPI poll explicitly
stopped execution on a cpu due to a change in affinity.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add per-ring and global ethtool counters for XDP_TX completions.
This helps us monitor and analyze XDP_TX flow performance.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add per-ring and global ethtool counters for TX completions.
This helps us monitor and analyze TX flow performance.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Local variable 'wq' already points to &sq->wq, use it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Replace calls to kzalloc_node with kvzalloc_node, as it fallsback
to lower-order pages if the higher-order trials fail.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch adds a counter for tx UDP GSO packets that contain a segment
that is not aligned to MSS - remaining segment.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch enables UDP GSO support. We enable this by using two WQEs
the first is a UDP LSO WQE for all segments with equal length, and the
second is for the last segment in case it has different length.
Due to HW limitation, before sending, we must adjust the packet length fields.
We measure performance between two Intel(R) Xeon(R) CPU E5-2643 v2 @3.50GHz
machines connected back-to-back with Connectx4-Lx (40Gbps) NICs.
We compare single stream UDP, UDP GSO and UDP GSO with offload.
Performance:
| MSS (bytes) | Throughput (Gbps) | CPU utilization (%)
UDP GSO offload | 1472 | 35.6 | 8%
UDP GSO | 1472 | 25.5 | 17%
UDP | 1472 | 10.2 | 17%
UDP GSO offload | 1024 | 35.6 | 8%
UDP GSO | 1024 | 19.2 | 17%
UDP | 1024 | 5.7 | 17%
UDP GSO offload | 512 | 33.8 | 16%
UDP GSO | 512 | 10.4 | 17%
UDP | 512 | 3.5 | 17%
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>