watermark_boost_factor_sysctl_handler is just a pointless wrapper for
proc_dointvec_minmax, so remove it and use proc_dointvec_minmax
directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When tx complete is disabled, all tx status will be set with status
HTT_TX_COMPL_STATE_ACK and indicate to mac80211 by ieee80211_tx_status,
then it does not have the statistics for retries and failed packets.
count of tx retries and tx failed of command "iw wlan0 station dump"
are both 0. If tx complete is not disabled, then firmware report the
tx status and ath10k indicate the status to mac80211, then mac80211
save the statistics and command "iw wlan0 station dump" show them.
for example:
localhost ~ # iw dev wlan0 station dump
Station 3c:28:6d:96:fd:69 (on wlan0)
inactive time: 5 ms
rx bytes: 1325012
rx packets: 6477
tx bytes: 85264
tx packets: 518
tx retries: 0
tx failed: 0
This patch only effect chips with tx complete disabled, e.g. SDIO.
with this patch, output of command "iw dev wlan0 station dump":
Station c4:04:15:5d:97:22 (on wlan0)
inactive time: 608 ms
rx bytes: 180366
rx packets: 991
tx bytes: 98765577
tx packets: 64624
tx retries: 14682
tx failed: 47086
Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-00042.
Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20200423024134.10601-1-wgong@codeaurora.org
When run command "iw dev wlan0 station dump", the rx duration is 0.
When firmware indicate WMI_UPDATE_STATS_EVENTID, extended flag of
statsis not set by default, so firmware do not report rx duration.
one sample:
localhost # iw wlan0 station dump
Station c4:04:15:5d:97:22 (on wlan0)
inactive time: 48 ms
rx bytes: 21670
rx packets: 147
tx bytes: 11529
tx packets: 100
tx retries: 88
tx failed: 36
beacon loss: 1
beacon rx: 31
rx drop misc: 47
signal: -72 [-74, -75] dBm
signal avg: -71 [-74, -75] dBm
beacon signal avg: -71 dBm
tx bitrate: 54.0 MBit/s MCS 3 40MHz
rx bitrate: 1.0 MBit/s
rx duration: 0 us
This patch enable firmware's extened flag of stats by setting flag
WMI_TLV_STAT_PEER_EXTD of ar->fw_stats_req_mask which is set in
ath10k_core_init_firmware_features via WMI_REQUEST_STATS_CMDID.
After apply this patch, rx duration show value with the command:
Station c4:04:15:5d:97:22 (on wlan0)
inactive time: 883 ms
rx bytes: 44289
rx packets: 265
tx bytes: 10838
tx packets: 93
tx retries: 899
tx failed: 103
beacon loss: 0
beacon rx: 78
rx drop misc: 46
signal: -71 [-74, -76] dBm
signal avg: -70 [-74, -76] dBm
beacon signal avg: -70 dBm
tx bitrate: 54.0 MBit/s MCS 3 40MHz
rx bitrate: 1.0 MBit/s
rx duration: 358004 us
This patch do not have side effect for all chips, because function
ath10k_debug_fw_stats_request is already exported to debugfs
"fw_stats" and WMI_REQUEST_STATS_CMDID is safely sent after condition
checked by ath10k_peer_stats_enabled in ath10k_sta_statistics.
Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-00042.
Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20200423022758.5365-1-wgong@codeaurora.org
we are sending the reo flush command for the deleted peer
tid after the ageout period reaches 1 second. This handling
causes reo ring get full when more than 128 clients are
disconnected continuously. so added the count for flush list
and reo flush command is triggered after the list count reaches
the threshold value, it is configured as 64 (half of the reo ring).
This will avoid the situation where reo ring get full.
Signed-off-by: Karthikeyan Periyasamy <periyasa@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1587552378-4884-1-git-send-email-periyasa@codeaurora.org
If the UDP header of a local VXLAN endpoint is NAT-ed, and the VXLAN
device has disabled UDP checksums and enabled Tx checksum offloading,
then the skb passed to udp_manip_pkt() has hdr->check == 0 (outer
checksum disabled) and skb->ip_summed == CHECKSUM_PARTIAL (inner packet
checksum offloaded).
Because of the ->ip_summed value, udp_manip_pkt() tries to update the
outer checksum with the new address and port, leading to an invalid
checksum sent on the wire, as the original null checksum obviously
didn't take the old address and port into account.
So, we can't take ->ip_summed into account in udp_manip_pkt(), as it
might not refer to the checksum we're acting on. Instead, we can base
the decision to update the UDP checksum entirely on the value of
hdr->check, because it's null if and only if checksum is disabled:
* A fully computed checksum can't be 0, since a 0 checksum is
represented by the CSUM_MANGLED_0 value instead.
* A partial checksum can't be 0, since the pseudo-header always adds
at least one non-zero value (the UDP protocol type 0x11) and adding
more values to the sum can't make it wrap to 0 as the carry is then
added to the wrapped number.
* A disabled checksum uses the special value 0.
The problem seems to be there from day one, although it was probably
not visible before UDP tunnels were implemented.
Fixes: 5b1158e909 ("[NETFILTER]: Add NAT support for nf_conntrack")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This bit indicates that the conntrack entry is offloaded to hardware
flow table. nf_conntrack entry will be tagged with [HW_OFFLOAD] if
it's offload to hardware.
cat /proc/net/nf_conntrack
ipv4 2 tcp 6 \
src=1.1.1.17 dst=1.1.1.16 sport=56394 dport=5001 \
src=1.1.1.16 dst=1.1.1.17 sport=5001 dport=56394 [HW_OFFLOAD] \
mark=0 zone=0 use=3
Note that HW_OFFLOAD/OFFLOAD/ASSURED are mutually exclusive.
Changelog:
* V1->V2:
- Remove check of lastused from stats. It was meant for cases such
as removing driver module while traffic still running. Better to
handle such cases from garbage collector.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull cifs fixes from Steve French:
"Five cifs/smb3 fixes:two for DFS reconnect failover, one lease fix for
stable and the others to fix a missing spinlock during reconnect"
* tag '5.7-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix uninitialised lease_key in open_shroot()
cifs: ensure correct super block for DFS reconnect
cifs: do not share tcons with DFS
cifs: minor update to comments around the cifs_tcp_ses_lock mutex
cifs: protect updating server->dstaddr with a spinlock
Pull USB fixes from Greg KH:
"Here are a number of USB driver fixes for 5.7-rc3.
Nothing huge, just the usual collection of:
- xhci fixes
- gadget driver fixes
- syzkaller fuzzing fixes
- new device ids and DT bindings
- new quirks added for broken devices
A few of the gadget driver fixes show up twice here as they were
applied to my branch, and also by Felipe to his branch which I then
pulled in as we got out of sync a bit.
All of these have been in linux-next with no reported issues"
* tag 'usb-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
USB: sisusbvga: Change port variable from signed to unsigned
usb-storage: Add unusual_devs entry for JMicron JMS566
USB: hub: Revert commit bd0e6c9614 ("usb: hub: try old enumeration scheme first for high speed devices")
USB: hub: Fix handling of connect changes during sleep
usb: typec: altmode: Fix typec_altmode_get_partner sometimes returning an invalid pointer
xhci: Don't clear hub TT buffer on ep0 protocol stall
xhci: prevent bus suspend if a roothub port detected a over-current condition
xhci: Fix handling halted endpoint even if endpoint ring appears empty
usb: raw-gadget: Fix copy_to/from_user() checks
usb: raw-gadget: fix raw_event_queue_fetch locking
usb: gadget: udc: atmel: Fix vbus disconnect handling
usb: dwc3: gadget: Fix request completion check
USB: Add USB_QUIRK_DELAY_CTRL_MSG and USB_QUIRK_DELAY_INIT for Corsair K70 RGB RAPIDFIRE
phy: tegra: Select USB_COMMON for usb_get_maximum_speed()
usb: typec: tcpm: Ignore CC and vbus changes in PORT_RESET change
usb: f_fs: Clear OS Extended descriptor counts to zero in ffs_data_reset()
cdc-acm: introduce a cool down
cdc-acm: close race betrween suspend() and acm_softint
UAS: fix deadlock in error handling and PM flushing work
UAS: no use logging any details in case of ENODEV
...
Pull tty/serial fixes from Greg KH:
"Here are some tty and serial driver fixes for 5.7-rc3.
The "largest" in here are a number of reverts for previous changes to
the uartps serial driver that turned out to not be a good idea at all.
The others are just small fixes found by people and tools. Included in
here is a much-reported symbol export needed by previous changes that
happened in 5.7-rc1. All of these have been in linux-next for a while
with no reported issues"
* tag 'tty-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: hvc: fix buffer overflow during hvc_alloc().
tty: rocket, avoid OOB access
tty: serial: bcm63xx: fix missing clk_put() in bcm63xx_uart
vt: don't hardcode the mem allocation upper bound
tty: serial: owl: add "much needed" clk_prepare_enable()
vt: don't use kmalloc() for the unicode screen buffer
tty/sysrq: Export sysrq_mask(), sysrq_toggle_support()
serial: sh-sci: Make sure status register SCxSR is read in correct sequence
serial: sunhv: Initialize lock for non-registered console
Revert "serial: uartps: Register own uart console and driver structures"
Revert "serial: uartps: Move Port ID to device data structure"
Revert "serial: uartps: Change uart ID port allocation"
Revert "serial: uartps: Do not allow use aliases >= MAX_UART_INSTANCES"
Revert "serial: uartps: Fix error path when alloc failed"
Revert "serial: uartps: Use the same dynamic major number for all ports"
Revert "serial: uartps: Fix uartps_major handling"
Pull char/misc driver fixes from Greg KH:
"Here are 4 small misc driver fixes for 5.7-rc3:
- mei driver fix
- interconnect driver fix
- two fpga driver fixes
All have been in linux-next with no reported issues"
* tag 'char-misc-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
interconnect: qcom: Fix uninitialized tcs_cmd::wait
mei: me: fix irq number stored in hw struct
fpga: dfl: pci: fix return value of cci_pci_sriov_configure
fpga: zynq: Remove clk_get error message for probe defer
Pull staging/IIO driver fixes from Greg KH:
"Here are some small staging and IIO driver fixes for 5.7-rc3
Lots of tiny things for reported issues in staging and IIO drivers,
including a counter driver fix as well (the iio drivers seem to be
tied to those). Full details of the fixes are in the shortlog.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (27 commits)
staging: vt6656: Fix calling conditions of vnt_set_bss_mode
staging: comedi: Fix comedi_device refcnt leak in comedi_open
staging: vt6656: Fix pairwise key entry save.
staging: vt6656: Fix drivers TBTT timing counter.
staging: vt6656: Don't set RCR_MULTICAST or RCR_BROADCAST by default.
MAINTAINERS: remove Stefan Popa's email
iio: adc: ad7192: fix null pointer de-reference crash during probe
iio: core: remove extra semi-colon from devm_iio_device_register() macro
iio: adc: ti-ads8344: properly byte swap value
iio: imu: inv_mpu6050: fix suspend/resume with runtime power
iio: st_sensors: rely on odr mask to know if odr can be set
iio: xilinx-xadc: Make sure not exceed maximum samplerate
iio: xilinx-xadc: Fix sequencer configuration for aux channels in simultaneous mode
iio: xilinx-xadc: Fix clearing interrupt when enabling trigger
iio: xilinx-xadc: Fix ADC-B powerdown
iio: dac: ad5770r: fix off-by-one check on maximum number of channels
iio: imu: st_lsm6dsx: flush hw FIFO before resetting the device
iio: core: Fix handling of 'dB'
dt-bindings: iio: adc: stm32-adc: fix id relative path
counter: 104-quad-8: Add lock guards - generic interface
...
Pull driver core fixes from Greg KH:
"Here are some small firmware/driver core/debugfs fixes for 5.7-rc3.
The debugfs change is now possible as now the last users of
debugfs_create_u32() have been fixed up in the different trees that
got merged into 5.7-rc1, and I don't want it creeping back in.
The firmware changes did cause a regression in linux-next, so the
final patch here reverts part of that, re-exporting the symbol to
resolve that issue. All of these patches, with the exception of the
final one, have been in linux-next with only that one reported issue"
* tag 'driver-core-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
firmware_loader: revert removal of the fw_fallback_config export
debugfs: remove return value of debugfs_create_u32()
firmware_loader: remove unused exports
firmware: imx: fix compile-testing
Pull s390 fixes from Vasily Gorbik:
- Add a few notrace annotations to avoid potential crashes when
switching ftrace tracers.
- Avoid setting affinity for floating irqs in pci code.
- Fix build issue found by kbuild test robot.
* tag 's390-5.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/protvirt: fix compilation issue
s390/pci: do not set affinity for floating irqs
s390/ftrace: fix potential crashes when switching tracers
Pull powerpc fixes from Michael Ellerman:
- One important fix for a bug in the way we find the cache-line size
from the device tree, which was leading to the wrong size being
reported to userspace on some platforms.
- A fix for 8xx STRICT_KERNEL_RWX which was leaving TLB entries around
leading to a window at boot when the strict mapping wasn't enforced.
- A fix to enable our KUAP (kernel user access prevention) debugging on
PPC32.
- A build fix for clang in lib/mpi.
Thanks to: Chris Packham, Christophe Leroy, Nathan Chancellor, Qian Cai.
* tag 'powerpc-5.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
lib/mpi: Fix building for powerpc with clang
powerpc/mm: Fix CONFIG_PPC_KUAP_DEBUG on PPC32
powerpc/8xx: Fix STRICT_KERNEL_RWX startup test failure
powerpc/setup_64: Set cache-line-size based on cache-block-size
Pull more Devicetree fixes from Rob Herring:
"A couple of schema and kbuild fixes"
* tag 'devicetree-fixes-for-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: phy: qcom-qusb2: Fix defaults
dt-bindings: Fix erroneous 'additionalProperties'
dt-bindings: Fix command line length limit calling dt-mk-schema
dt-bindings: Re-enable core schemas for dtbs_check
Lorenz Bauer says:
====================
We've been developing an in-house L4 load balancer based on XDP
and TC for a while. Following Alexei's call for more up-to-date examples of
production BPF in the kernel tree [1], Cloudflare is making this available
under dual GPL-2.0 or BSD 3-clause terms.
The code requires at least v5.3 to function correctly.
1: https://lore.kernel.org/bpf/20200326210719.den5isqxntnoqhmv@ast-mbp/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
cls_redirect is a TC clsact based replacement for the glb-redirect iptables
module available at [1]. It enables what GitHub calls "second chance"
flows [2], similarly proposed by the Beamer paper [3]. In contrast to
glb-redirect, it also supports migrating UDP flows as long as connected
sockets are used. cls_redirect is in production at Cloudflare, as part of
our own L4 load balancer.
We have modified the encapsulation format slightly from glb-redirect:
glbgue_chained_routing.private_data_type has been repurposed to form a
version field and several flags. Both have been arranged in a way that
a private_data_type value of zero matches the current glb-redirect
behaviour. This means that cls_redirect will understand packets in
glb-redirect format, but not vice versa.
The test suite only covers basic features. For example, cls_redirect will
correctly forward path MTU discovery packets, but this is not exercised.
It is also possible to switch the encapsulation format to GRE on the last
hop, which is also not tested.
There are two major distinctions from glb-redirect: first, cls_redirect
relies on receiving encapsulated packets directly from a router. This is
because we don't have access to the neighbour tables from BPF, yet. See
forward_to_next_hop for details. Second, cls_redirect performs decapsulation
instead of using separate ipip and sit tunnel devices. This
avoids issues with the sit tunnel [4] and makes deploying the classifier
easier: decapsulated packets appear on the same interface, so existing
firewall rules continue to work as expected.
The code base started it's life on v4.19, so there are most likely still
hold overs from old workarounds. In no particular order:
- The function buf_off is required to defeat a clang optimization
that leads to the verifier rejecting the program due to pointer
arithmetic in the wrong order.
- The function pkt_parse_ipv6 is force inlined, because it would
otherwise be rejected due to returning a pointer to stack memory.
- The functions fill_tuple and classify_tcp contain kludges, because
we've run out of function arguments.
- The logic in general is rather nested, due to verifier restrictions.
I think this is either because the verifier loses track of constants
on the stack, or because it can't track enum like variables.
1: https://github.com/github/glb-director/tree/master/src/glb-redirect
2: https://github.com/github/glb-director/blob/master/docs/development/second-chance-design.md
3: https://www.usenix.org/conference/nsdi18/presentation/olteanu
4: https://github.com/github/glb-director/issues/64
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200424185556.7358-2-lmb@cloudflare.com
To make BPF verifier verbose log more releavant and easier to use to debug
verification failures, "pop" parts of log that were successfully verified.
This has effect of leaving only verifier logs that correspond to code branches
that lead to verification failure, which in practice should result in much
shorter and more relevant verifier log dumps. This behavior is made the
default behavior and can be overriden to do exhaustive logging by specifying
BPF_LOG_LEVEL2 log level.
Using BPF_LOG_LEVEL2 to disable this behavior is not ideal, because in some
cases it's good to have BPF_LOG_LEVEL2 per-instruction register dump
verbosity, but still have only relevant verifier branches logged. But for this
patch, I didn't want to add any new flags. It might be worth-while to just
rethink how BPF verifier logging is performed and requested and streamline it
a bit. But this trimming of successfully verified branches seems to be useful
and a good default behavior.
To test this, I modified runqslower slightly to introduce read of
uninitialized stack variable. Log (**truncated in the middle** to save many
lines out of this commit message) BEFORE this change:
; int handle__sched_switch(u64 *ctx)
0: (bf) r6 = r1
; struct task_struct *prev = (struct task_struct *)ctx[1];
1: (79) r1 = *(u64 *)(r6 +8)
func 'sched_switch' arg1 has btf_id 151 type STRUCT 'task_struct'
2: (b7) r2 = 0
; struct event event = {};
3: (7b) *(u64 *)(r10 -24) = r2
last_idx 3 first_idx 0
regs=4 stack=0 before 2: (b7) r2 = 0
4: (7b) *(u64 *)(r10 -32) = r2
5: (7b) *(u64 *)(r10 -40) = r2
6: (7b) *(u64 *)(r10 -48) = r2
; if (prev->state == TASK_RUNNING)
[ ... instruction dump from insn #7 through #50 are cut out ... ]
51: (b7) r2 = 16
52: (85) call bpf_get_current_comm#16
last_idx 52 first_idx 42
regs=4 stack=0 before 51: (b7) r2 = 16
; bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
53: (bf) r1 = r6
54: (18) r2 = 0xffff8881f3868800
56: (18) r3 = 0xffffffff
58: (bf) r4 = r7
59: (b7) r5 = 32
60: (85) call bpf_perf_event_output#25
last_idx 60 first_idx 53
regs=20 stack=0 before 59: (b7) r5 = 32
61: (bf) r2 = r10
; event.pid = pid;
62: (07) r2 += -16
; bpf_map_delete_elem(&start, &pid);
63: (18) r1 = 0xffff8881f3868000
65: (85) call bpf_map_delete_elem#3
; }
66: (b7) r0 = 0
67: (95) exit
from 44 to 66: safe
from 34 to 66: safe
from 11 to 28: R1_w=inv0 R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-8=mmmm???? fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
; bpf_map_update_elem(&start, &pid, &ts, 0);
28: (bf) r2 = r10
;
29: (07) r2 += -16
; tsp = bpf_map_lookup_elem(&start, &pid);
30: (18) r1 = 0xffff8881f3868000
32: (85) call bpf_map_lookup_elem#1
invalid indirect read from stack off -16+0 size 4
processed 65 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 4
Notice how there is a successful code path from instruction 0 through 67, few
successfully verified jumps (44->66, 34->66), and only after that 11->28 jump
plus error on instruction #32.
AFTER this change (full verifier log, **no truncation**):
; int handle__sched_switch(u64 *ctx)
0: (bf) r6 = r1
; struct task_struct *prev = (struct task_struct *)ctx[1];
1: (79) r1 = *(u64 *)(r6 +8)
func 'sched_switch' arg1 has btf_id 151 type STRUCT 'task_struct'
2: (b7) r2 = 0
; struct event event = {};
3: (7b) *(u64 *)(r10 -24) = r2
last_idx 3 first_idx 0
regs=4 stack=0 before 2: (b7) r2 = 0
4: (7b) *(u64 *)(r10 -32) = r2
5: (7b) *(u64 *)(r10 -40) = r2
6: (7b) *(u64 *)(r10 -48) = r2
; if (prev->state == TASK_RUNNING)
7: (79) r2 = *(u64 *)(r1 +16)
; if (prev->state == TASK_RUNNING)
8: (55) if r2 != 0x0 goto pc+19
R1_w=ptr_task_struct(id=0,off=0,imm=0) R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
; trace_enqueue(prev->tgid, prev->pid);
9: (61) r1 = *(u32 *)(r1 +1184)
10: (63) *(u32 *)(r10 -4) = r1
; if (!pid || (targ_pid && targ_pid != pid))
11: (15) if r1 == 0x0 goto pc+16
from 11 to 28: R1_w=inv0 R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-8=mmmm???? fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
; bpf_map_update_elem(&start, &pid, &ts, 0);
28: (bf) r2 = r10
;
29: (07) r2 += -16
; tsp = bpf_map_lookup_elem(&start, &pid);
30: (18) r1 = 0xffff8881db3ce800
32: (85) call bpf_map_lookup_elem#1
invalid indirect read from stack off -16+0 size 4
processed 65 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 4
Notice how in this case, there are 0-11 instructions + jump from 11 to
28 is recorded + 28-32 instructions with error on insn #32.
test_verifier test runner was updated to specify BPF_LOG_LEVEL2 for
VERBOSE_ACCEPT expected result due to potentially "incomplete" success verbose
log at BPF_LOG_LEVEL1.
On success, verbose log will only have a summary of number of processed
instructions, etc, but no branch tracing log. Having just a last succesful
branch tracing seemed weird and confusing. Having small and clean summary log
in success case seems quite logical and nice, though.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200423195850.1259827-1-andriin@fb.com
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.
nmi_access_ok() is the last inline function which requires access to
cpu_tlbstate. Move it into the TLB code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092600.052543007@linutronix.de
On a device like a cellphone which is constantly suspending
and resuming CLOCK_MONOTONIC is not particularly useful for
keeping track of or reacting to external network events.
Instead you want to use CLOCK_BOOTTIME.
Hence add bpf_ktime_get_boot_ns() as a mirror of bpf_ktime_get_ns()
based around CLOCK_BOOTTIME instead of CLOCK_MONOTONIC.
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Loongson-2K (Loongson64 Reduced) is a family of SoC shipped with
gs264e core.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reduce the number of required exports to one and make flush_tlb_global()
static to the TLB code.
flush_tlb_local() cannot be confined to the TLB code as the MTRR
handling requires a PGE-less flush.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200421092559.740388137@linutronix.de
The entire implementation is in kernel/bpf/helpers.c:
BPF_CALL_0(bpf_ktime_get_ns) {
/* NMI safe access to clock monotonic */
return ktime_get_mono_fast_ns();
}
const struct bpf_func_proto bpf_ktime_get_ns_proto = {
.func = bpf_ktime_get_ns,
.gpl_only = false,
.ret_type = RET_INTEGER,
};
and this was presumably marked GPL due to kernel/time/timekeeping.c:
EXPORT_SYMBOL_GPL(ktime_get_mono_fast_ns);
and while that may make sense for kernel modules (although even that
is doubtful), there is currently AFAICT no other source of time
available to ebpf.
Furthermore this is really just equivalent to clock_gettime(CLOCK_MONOTONIC)
which is exposed to userspace (via vdso even to make it performant)...
As such, I see no reason to keep the GPL restriction.
(In the future I'd like to have access to time from Apache licensed ebpf code)
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This allows TC eBPF programs to modify and forward (redirect) packets
from interfaces without ethernet headers (for example cellular)
to interfaces with (for example ethernet/wifi).
The lack of this appears to simply be an oversight.
Tested:
in active use in Android R on 4.14+ devices for ipv6
cellular to wifi tethering offload.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch fixes an off by one error in the RV32 JIT handling for BPF
tail call. Currently, the code decrements TCC before checking if it
is less than zero. This limits the maximum number of tail calls to 32
instead of 33 as in other JITs. The fix is to instead check the old
value of TCC before decrementing.
Fixes: 5f316b65e9 ("riscv, bpf: Add RV32G eBPF JIT")
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Xi Wang <xi.wang@gmail.com>
Link: https://lore.kernel.org/bpf/20200421002804.5118-1-luke.r.nels@gmail.com
The following error was shown when a bpf program was compiled without
vmlinux.h auto-generated from BTF:
# clang -I./linux/tools/lib/ -I/lib/modules/$(uname -r)/build/include/ \
-O2 -Wall -target bpf -emit-llvm -c bpf_prog.c -o bpf_prog.bc
...
In file included from linux/tools/lib/bpf/bpf_helpers.h:5:
linux/tools/lib/bpf/bpf_helper_defs.h:56:82: error: unknown type name '__u64'
...
It seems that bpf programs are intended for being built together with
the vmlinux.h (which will have all the __u64 and other typedefs). But
users may mistakenly think "include <linux/types.h>" is missing
because the vmlinux.h is not common for non-bpf developers. IMO, an
explicit comment therefore should be added to bpf_helpers.h as this
patch shows.
Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1587427527-29399-1-git-send-email-komachi.yoshiki@gmail.com
Currently the following prog types don't fall back to bpf_base_func_proto()
(instead they have cgroup_base_func_proto which has a limited set of
helpers from bpf_base_func_proto):
* BPF_PROG_TYPE_CGROUP_DEVICE
* BPF_PROG_TYPE_CGROUP_SYSCTL
* BPF_PROG_TYPE_CGROUP_SOCKOPT
I don't see any specific reason why we shouldn't use bpf_base_func_proto(),
every other type of program (except bpf-lirc and, understandably, tracing)
use it, so let's fall back to bpf_base_func_proto for those prog types
as well.
This basically boils down to adding access to the following helpers:
* BPF_FUNC_get_prandom_u32
* BPF_FUNC_get_smp_processor_id
* BPF_FUNC_get_numa_node_id
* BPF_FUNC_tail_call
* BPF_FUNC_ktime_get_ns
* BPF_FUNC_spin_lock (CAP_SYS_ADMIN)
* BPF_FUNC_spin_unlock (CAP_SYS_ADMIN)
* BPF_FUNC_jiffies64 (CAP_SYS_ADMIN)
I've also added bpf_perf_event_output() because it's really handy for
logging and debugging.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200420174610.77494-1-sdf@google.com
Here are some minor edits of the devices.rst documentation file,
intended to improve the clarity and add a couple of missing details.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In order to use perf tool on the Loongson platform, we should enable kernel
support for various performance events provided by software and hardware,
so add CONFIG_PERF_EVENTS=y to loongson3_defconfig.
E.g. without this patch:
[loongson@localhost perf]$ ./perf list
List of pre-defined events (to be used in -e):
duration_time [Tool event]
rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
(see 'man perf-list' on how to encode it)
mem:<addr>[/len][:access] [Hardware breakpoint]
With this patch:
[loongson@localhost perf]$ ./perf list
List of pre-defined events (to be used in -e):
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
alignment-faults [Software event]
bpf-output [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
page-faults OR faults [Software event]
task-clock [Software event]
duration_time [Tool event]
L1-dcache-load-misses [Hardware cache event]
L1-dcache-store-misses [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-store-misses [Hardware cache event]
iTLB-load-misses [Hardware cache event]
rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
(see 'man perf-list' on how to encode it)
mem:<addr>[/len][:access] [Hardware breakpoint]
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
MIPS define a "Fill" macro as a cache operation in cacheops.h, this
will cause build failure under some special configurations because in
seq_file.c there is a "Fill" label. To avoid this failure we rename the
"Fill" macro to "Fill_I" which has the same coding style as other cache
operations in cacheops.h (we think renaming the "Fill" macro is more
reasonable than renaming the "Fill" label).
Callers of "Fill" macro is also updated.
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
... and no, __put_user() doesn't help here - skipping
access_ok() on the second call does not remove the
possibility of page having become unmapped or r/o
in the meanwhile
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.
As a last step, move __flush_tlb_others() out of line and hide the
native function. The latter can be static when CONFIG_PARAVIRT is
disabled.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.641957686@linutronix.de
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.
As a fourth step, move __flush_tlb_one_kernel() out of line and hide
the native function. The latter can be static when CONFIG_PARAVIRT is
disabled.
Consolidate the name space while at it and remove the pointless extra
wrapper in the paravirt code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.535159540@linutronix.de
cpu_tlbstate is exported because various TLB-related functions need access
to it, but cpu_tlbstate is sensitive information which should only be
accessed by well-contained kernel functions and not be directly exposed to
modules.
As a third step, move _flush_tlb_one_user() out of line and hide the
native function. The latter can be static when CONFIG_PARAVIRT is
disabled.
Consolidate the name space while at it and remove the pointless extra
wrapper in the paravirt code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.428213098@linutronix.de
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.
As a second step, move __flush_tlb_global() out of line and hide the
native function. The latter can be static when CONFIG_PARAVIRT is
disabled.
Consolidate the namespace while at it and remove the pointless extra
wrapper in the paravirt code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.336916818@linutronix.de
cpu_tlbstate is exported because various TLB-related functions need
access to it, but cpu_tlbstate is sensitive information which should
only be accessed by well-contained kernel functions and not be directly
exposed to modules.
As a first step, move __flush_tlb() out of line and hide the native
function. The latter can be static when CONFIG_PARAVIRT is disabled.
Consolidate the namespace while at it and remove the pointless extra
wrapper in the paravirt code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.246130908@linutronix.de
Christoph's patch removed two unsused exported symbols, however, one
symbol is used by the firmware_loader itself. If CONFIG_FW_LOADER=m so
the firmware_loader is modular but CONFIG_FW_LOADER_USER_HELPER=y we fail
the build at mostpost.
ERROR: modpost: "fw_fallback_config" [drivers/base/firmware_loader/firmware_class.ko] undefined!
This happens because the variable fw_fallback_config is built into the
kernel if CONFIG_FW_LOADER_USER_HELPER=y always, so we need to grant
access to the firmware loader module by exporting it.
Revert only one hunk from his patch.
Fixes: 739604734b ("firmware_loader: remove unused exports")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20200424184916.22843-1-mcgrof@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>