Add support for the RX checksum offload. This is enabled by default and
may be disabled and re-enabled using 'ethtool':
# ethtool -K eth0 rx off
# ethtool -K eth0 rx on
Some Ether MACs provide a simple checksumming scheme which appears to be
completely compatible with CHECKSUM_COMPLETE: sum of all packet data after
the L2 header is appended to packet data; this may be trivially read by
the driver and used to update the skb accordingly. The same checksumming
scheme is implemented in the EtherAVB MACs and now supported by the 'ravb'
driver.
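A minimal sketch of what consuming that trailing sum looks like on
receive is shown below (close in spirit to the ravb code; the function
name is illustrative):

#include <linux/skbuff.h>
#include <net/checksum.h>
#include <asm/unaligned.h>

/* The MAC appends the 16-bit sum of all bytes after the L2 header to
 * the packet data; read it, trim it off, and hand it to the stack as
 * CHECKSUM_COMPLETE. */
static void example_rx_csum(struct sk_buff *skb)
{
	u8 *hw_csum;

	if (unlikely(skb->len < sizeof(__sum16)))
		return;
	hw_csum = skb_tail_pointer(skb) - sizeof(__sum16);
	skb->csum = csum_unfold((__force __sum16)get_unaligned_le16(hw_csum));
	skb->ip_summed = CHECKSUM_COMPLETE;
	skb_trim(skb, skb->len - sizeof(__sum16));
}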
In terms of performance, throughput is close to gigabit line rate with the
RX checksum offload both enabled and disabled. The 'perf' output, however,
appears to indicate that significantly less time is spent in do_csum() --
this is as expected.
Test results with RX checksum offload enabled:
~/netperf-2.2pl4# perf record -a ./netperf -t TCP_MAERTS -H 192.168.2.4
TCP MAERTS TEST to 192.168.2.4
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    10.01     933.93
[ perf record: Woken up 8 times to write data ]
[ perf record: Captured and wrote 1.955 MB perf.data (41940 samples) ]
~/netperf-2.2pl4# perf report
Samples: 41K of event 'cycles:ppp', Event count (approx.): 9915302763
Overhead Command Shared Object Symbol
9.44% netperf [kernel.kallsyms] [k] __arch_copy_to_user
7.75% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irq
6.31% swapper [kernel.kallsyms] [k] default_idle_call
5.89% swapper [kernel.kallsyms] [k] arch_cpu_idle
4.37% swapper [kernel.kallsyms] [k] tick_nohz_idle_exit
4.02% netperf [kernel.kallsyms] [k] _raw_spin_unlock_irq
2.52% netperf [kernel.kallsyms] [k] preempt_count_sub
1.81% netperf [kernel.kallsyms] [k] tcp_recvmsg
1.80% netperf [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.78% netperf [kernel.kallsyms] [k] preempt_count_add
1.36% netperf [kernel.kallsyms] [k] __tcp_transmit_skb
1.20% netperf [kernel.kallsyms] [k] __local_bh_enable_ip
1.10% netperf [kernel.kallsyms] [k] sh_eth_start_xmit
Test results with RX checksum offload disabled:
~/netperf-2.2pl4# perf record -a ./netperf -t TCP_MAERTS -H 192.168.2.4
TCP MAERTS TEST to 192.168.2.4
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    10.01     932.04
[ perf record: Woken up 14 times to write data ]
[ perf record: Captured and wrote 3.642 MB perf.data (78817 samples) ]
~/netperf-2.2pl4# perf report
Samples: 78K of event 'cycles:ppp', Event count (approx.): 18091442796
Overhead Command Shared Object Symbol
7.00% swapper [kernel.kallsyms] [k] do_csum
3.94% swapper [kernel.kallsyms] [k] sh_eth_poll
3.83% ksoftirqd/0 [kernel.kallsyms] [k] do_csum
3.23% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irq
2.87% netperf [kernel.kallsyms] [k] __arch_copy_to_user
2.86% swapper [kernel.kallsyms] [k] arch_cpu_idle
2.13% swapper [kernel.kallsyms] [k] default_idle_call
2.12% ksoftirqd/0 [kernel.kallsyms] [k] sh_eth_poll
2.02% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.84% swapper [kernel.kallsyms] [k] __softirqentry_text_start
1.64% swapper [kernel.kallsyms] [k] tick_nohz_idle_exit
1.53% netperf [kernel.kallsyms] [k] _raw_spin_unlock_irq
1.32% netperf [kernel.kallsyms] [k] preempt_count_sub
1.27% swapper [kernel.kallsyms] [k] __pi___inval_dcache_area
1.22% swapper [kernel.kallsyms] [k] check_preemption_disabled
1.01% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
The above results were collected on the R-Car V3H Starter Kit board.
Based on the commit 4d86d38186 ("ravb: RX checksum offload")...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 62e04b7e0e ("sh_eth: rename 'sh_eth_cpu_data::hw_crc'") renamed
the field to 'hw_checksum' for the Ether DMAC "intelligent checksum",
however some Ether MACs implement a simpler checksumming scheme, so that
name now seems misleading. Rename that field to 'csmr' as the "intelligent
checksum" is always controlled by the CSMR register.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song says:
====================
This patch set exposes a few functions in libbpf. All these newly
added API functions are helpful for JIT-based bpf compilation where
.BTF and .BTF.ext are available as in-memory data blobs.
Patch #1 exposes several btf_ext__* API functions which
are used to handle .BTF.ext ELF sections.
Patch #2 refactors the function bpf_map_find_btf_info()
and exposes the API function btf__get_map_kv_tids() to
retrieve the map key/value type ids generated by a
bpf program through the BPF_ANNOTATE_KV_PAIR macro.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, to get a map's key/value type ids, the macro
BPF_ANNOTATE_KV_PAIR(<map_name>, <key_type>, <value_type>)
needs to be defined in the bpf program for the
corresponding map.
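For reference, a typical annotation in a bpf program looks roughly
like this (the map name and value struct are illustrative, not taken
from this patch):

#include <linux/bpf.h>
#include "bpf_helpers.h"	/* provides SEC() and BPF_ANNOTATE_KV_PAIR */

struct pair {			/* illustrative value type */
	__u64 packets;
	__u64 bytes;
};

struct bpf_map_def SEC("maps") hash_map = {
	.type		= BPF_MAP_TYPE_HASH,
	.key_size	= sizeof(__u32),
	.value_size	= sizeof(struct pair),
	.max_entries	= 1024,
};

/* Records the key/value BTF types for "hash_map" so that loaders can
 * look them up by map name. */
BPF_ANNOTATE_KV_PAIR(hash_map, __u32, struct pair);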
At program/map loading time, the local static function
bpf_map_find_btf_info() in libbpf.c retrieves the key/value
type ids for the given map name.
This patch refactors bpf_map_find_btf_info() to create an API
function, btf__get_map_kv_tids(), which contains the bulk of the
original function's implementation.
The API btf__get_map_kv_tids() can be used by bcc,
a JIT based bpf compilation system, which uses the
same BPF_ANNOTATE_KV_PAIR to record map key/value types.
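The new API function has roughly the following shape (a sketch of the
declaration; btf.h carries the authoritative prototype):

/* Look up the BTF type ids recorded via BPF_ANNOTATE_KV_PAIR for the
 * map named @map_name, validating them against the expected sizes. */
LIBBPF_API int btf__get_map_kv_tids(const struct btf *btf,
				    const char *map_name,
				    __u32 expected_key_size,
				    __u32 expected_value_size,
				    __u32 *key_type_id,
				    __u32 *value_type_id);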
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The following set of functions, which manipulate the .BTF.ext
section, are exposed as API functions:
. btf_ext__new
. btf_ext__free
. btf_ext__reloc_func_info
. btf_ext__reloc_line_info
. btf_ext__func_info_rec_size
. btf_ext__line_info_rec_size
These functions are useful for JIT based bpf codegen, e.g.,
bcc, to manipulate in-memory .BTF.ext sections.
The signature of function btf_ext__reloc_func_info()
is also changed to be the same as its definition in btf.c.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Bind and connect to localhost. There is no reason for this test to
use a non-localhost interface. This lets us run this test in a network
namespace.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
DTC introduced an i2c_bus_reg check in v1.4.7, used since Linux v4.20,
which complains about upper case addresses used in the unit name.
nexys4ddr.dts names an I2C device node "ad7420@4B", leading to:
arch/mips/boot/dts/xilfpga/nexys4ddr.dts:109.16-112.8: Warning
(i2c_bus_reg): /i2c@10A00000/ad7420@4B: I2C bus unit address format
error, expected "4b"
Fix this by switching to lower case addresses throughout the file, as is
*mostly* the case in the file already & fairly standard throughout the
tree.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: stable@vger.kernel.org # v4.20+
Cc: linux-mips@vger.kernel.org
Our hugepage support already exists for MIPS64 CPUs, and is already
enabled for older architecture revisions. There's nothing MIPSr6
specific involved, and our hugepage support already works fine for
MIPS64r6 CPUs such as the I6500, so allow it to be selected in Kconfig.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
set_pte() contains an open coded version of cmpxchg() - it atomically
replaces the buddy pte's value if it is currently zero. Simplify the
code considerably by just using cmpxchg() instead of reinventing it.
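A hedged sketch of the result (the buddy-PTE handling from
asm/pgtable.h, with the config-specific cmpxchg64() variant elided):

static inline void set_pte(pte_t *ptep, pte_t pteval)
{
	*ptep = pteval;
	if (pte_val(pteval) & _PAGE_GLOBAL) {
		pte_t *buddy = ptep_buddy(ptep);
		/*
		 * Make the buddy global too, but only if it is currently
		 * zero: cmpxchg() performs the atomic compare-and-swap
		 * that was previously open coded with ll/sc.
		 */
		cmpxchg(&buddy->pte, 0, _PAGE_GLOBAL);
	}
}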
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Introduce support for using MemoryMapIDs (MMIDs) as an alternative to
Address Space IDs (ASIDs). The major difference between the two is that
MMIDs are global - ie. an MMID uniquely identifies an address space
across all coherent CPUs. In contrast ASIDs are non-global per-CPU IDs,
wherein each address space is allocated a separate ASID for each CPU
upon which it is used. This global namespace allows a new GINVT
instruction to be used to globally invalidate TLB entries associated
with a particular MMID across all coherent CPUs in the system, removing
the need for IPIs to invalidate entries with separate ASIDs on each CPU.
The allocation scheme used here is largely borrowed from arm64 (see
arch/arm64/mm/context.c). In essence we maintain a bitmap to track
available MMIDs, and MMIDs in active use at the time of a rollover to a
new MMID version are preserved in the new version. The allocation scheme
requires efficient 64 bit atomics in order to perform reasonably, so
this support depends upon CONFIG_GENERIC_ATOMIC64=n (ie. currently it
will only be included in MIPS64 kernels).
The first, and currently only, available CPU with support for MMIDs is
the MIPS I6500. This CPU supports 16 bit MMIDs, so for now we cap our
MMIDs at 16 bits wide in order to prevent the bitmap growing to an
absurd size should any future CPU implement the 32 bit MMIDs that the
architecture manuals recommend.
When MMIDs are in use we also make use of the GINVT instruction, which
is available due to the global nature of MMIDs. By executing a sequence of
GINVT & SYNC 0x14 instructions we can avoid the overhead of an IPI to
each remote CPU in many cases. One complication is that GINVT will
invalidate wired entries (in all cases apart from type 0, which targets
the entire TLB). In order to avoid GINVT invalidating any wired TLB
entries we set up, we make sure to create those entries using a reserved
MMID (0) that we never associate with any address space.
Also of note is that KVM will require further work in order to support
MMIDs & GINVT, since KVM is involved in allocating IDs for guests & in
configuring the MMU. That work is not part of this patch, so for now
when MMIDs are in use KVM is disabled.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Add a family of ginvt_* functions making it easy to emit a GINVT
instruction to globally invalidate TLB entries. We make use of the
_ASM_MACRO infrastructure to support emitting the instructions even if
the assembler isn't new enough to support them natively.
An associated STYPE_GINV definition & sync_ginv() function are added to
emit a sync instruction of type 0x14, which operates as a completion
barrier for these new GINVT (and GINVI) instructions.
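As an illustration, invalidating every entry for one MMID across the
whole system might then look like this sketch (the wrapper name is
made up; write_c0_memorymapid() selects the MMID that GINVT targets):

/* Globally invalidate TLB entries tagged with @mmid on all coherent
 * CPUs, then issue the type 0x14 SYNC that acts as a completion
 * barrier for the broadcast invalidation. */
static inline void flush_mmid_everywhere(unsigned long mmid)
{
	write_c0_memorymapid(mmid);
	mtc0_tlbw_hazard();	/* let the MMID write take effect */
	ginvt_mmid();
	sync_ginv();
}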
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
When we gain MMID support we'll be storing MMIDs as atomic64_t values
and accessing them via atomic64_* functions. This necessitates that we
don't use cpu_context() as the left hand side of an assignment, ie. as a
modifiable lvalue. In preparation for this introduce a new
set_cpu_context() function & replace all assignments with cpu_context()
on their left hand side with an equivalent call to set_cpu_context().
To enforce that cpu_context() should not be used for assignments, we
rewrite it as a static inline function.
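A minimal sketch of the resulting pair, assuming the current per-CPU
ASID storage (the MMID conversion will later route these through
atomic64_* accessors):

static inline u64 cpu_context(unsigned int cpu, const struct mm_struct *mm)
{
	return mm->context.asid[cpu];
}

static inline void set_cpu_context(unsigned int cpu,
				   struct mm_struct *mm, u64 ctx)
{
	mm->context.asid[cpu] = ctx;
}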
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Introduce a new check_mmu_context() function to check an mm's ASID
version & get a new one if it's outdated, and a
check_switch_mmu_context() function which additionally sets up the new
ASID & page directory. Simplify switch_mm() & various
get_new_mmu_context() callsites in MIPS KVM by making use of the new
functions, which will help reduce the amount of code that requires
modification to gain MMID support.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
In preparation for adding MMID support to get_new_mmu_context() which
will increase the size of the function somewhat, move it from
asm/mmu_context.h into a C file.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Split always-included objects to one per line in order to make it easier
to modify the list of included objects.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
All 3 variants of local_flush_tlb_mm() are now effectively simple calls
to drop_mmu_context(). Remove them and use drop_mmu_context() directly.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
The r4k variant of local_flush_tlb_mm() wraps its call to
drop_mmu_context() with a preempt_disable() & preempt_enable() pair, but
this is redundant since drop_mmu_context() disables interrupts and from
Documentation/preempt-locking.txt:
  Note that you do not need to explicitly prevent preemption if you are
  holding any locks or interrupts are disabled, since preemption is
  implicitly disabled in those cases.
Remove the redundant preempt_disable() & preempt_enable() calls.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
drop_mmu_context() is preceded by a comment indicating what happens if
the mm provided is currently active on the local CPU. Move that comment
into the block that executes in this case, adjusting slightly to reflect
its new location.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
If an mm does not have an ASID on the local CPU then drop_mmu_context()
is always redundant, since there's no context to "drop". Various callers
of drop_mmu_context() check whether the mm has been allocated an ASID
before making the call. Move that check into drop_mmu_context() and
remove it from callers to simplify them.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
If drop_mmu_context() is called with an mm that is not currently active
on the local CPU then there's no need for us to stop & start a hardware
page table walker because it can't be fetching entries for the ASID
corresponding to the mm we're operating on.
Move the htw_stop() & htw_start() calls into the block which we run only
if the mm is currently active, in order to avoid the redundant work.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
get_new_mmu_context() accepts a cpu argument, but implicitly assumes
that this is always equal to smp_processor_id() by operating on the
local CPU's TLB & icache.
Remove the cpu argument and have get_new_mmu_context() call
smp_processor_id() instead.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
The drop_mmu_context() function accepts a cpu argument, but it
implicitly expects that this is always equal to smp_processor_id() by
allocating & configuring an ASID on the local CPU when the mm is active
on the CPU indicated by the cpu argument.
All callers do provide the value of smp_processor_id() to the cpu
argument.
Remove the redundant argument and have drop_mmu_context() call
smp_processor_id() itself, making it clearer that the cpu variable
always represents the local CPU.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
MIPS has separate definitions of activate_mm() & switch_mm() which are
identical apart from switch_mm() checking that the ASID is valid before
acquiring a new one.
We know that when activate_mm() is called cpu_context(X, mm) will be
zero, and this will never be considered a valid ASID because we never
allow the ASID version number to be zero, instead beginning with version
1 using asid_first_version(). Therefore switch_mm() will always allocate
a new ASID when called for a new task, meaning that it will behave
identically to activate_mm().
Take advantage of this to remove the duplication & define activate_mm()
using switch_mm() just like many other architectures do.
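The resulting definition is then the familiar one-liner used by many
other architectures:

/* A fresh mm has cpu_context(cpu, mm) == 0, which switch_mm() treats
 * as an invalid ASID and replaces, so activate_mm() can simply defer
 * to switch_mm(). */
#define activate_mm(prev, next)	switch_mm(prev, next, current)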
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
On the Loongson-2G/2H/3A/3B there is a hardware flaw: ll/sc and
lld/scd have very weak ordering. We should add sync instructions
"before each ll/lld" and "at the branch-target between ll/sc" to work
around it. Otherwise, this flaw will occasionally cause deadlock (e.g.
when doing heavy load tests with LTP).
Below is the explanation from the CPU designer:
"For Loongson 3 family, when a memory access instruction (load, store,
or prefetch)'s executing occurs between the execution of LL and SC, the
success or failure of SC is not predictable. Although programmer would
not insert memory access instructions between LL and SC, the memory
instructions before LL in program-order, may dynamically executed
between the execution of LL/SC, so a memory fence (SYNC) is needed
before LL/LLD to avoid this situation.
Since Loongson-3A R2 (3A2000), we have improved our hardware design to
handle this case. But we later deduce a rarely circumstance that some
speculatively executed memory instructions due to branch misprediction
between LL/SC still fall into the above case, so a memory fence (SYNC)
at branch-target (if its target is not between LL/SC) is needed for
Loongson 3A1000, 3B1500, 3A2000 and 3A3000.
Our processor is continually evolving and we aim to to remove all these
workaround-SYNCs around LL/SC for new-come processor."
Here is an example: cpu1 and cpu2 simultaneously run atomic_add(1) on
the same atomic variable. This bug can cause the 'sc' on both CPUs
(inside atomic_add) to succeed at the same time ('sc' returns 1), yet
the variable sometimes ends up incremented by only 1 instead of 2,
which is wrong and unacceptable.
Why is the compiler option fix-loongson3-llsc disabled?
Because the compiler-side fix causes problems in the kernel's
__ex_table section.
This patch fixes all the cases in the kernel, but:
+. the fix at the end of futex_atomic_cmpxchg_inatomic is for the branch
target of 'bne'; there are other cases in which smp_mb__before_llsc() and
smp_llsc_mb() coincidentally fix both the ll and the branch target, such
as atomic_sub_if_positive/cmpxchg/xchg, just like this one.
+. Loongson 3 does not support CONFIG_EDAC_ATOMIC_SCRUB, so there is no
need to touch edac.h
+. local_ops and cmpxchg_local should not be affected by this bug since
only the owner can write.
+. mips_atomic_set() in syscall.c is deprecated and rarely used, just let
it go
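Per Paul's note below, the barrier itself compiles to nothing unless
the workarounds are enabled; a sketch of its shape:

/* Emit a full SYNC before ll/lld (and at branch targets within ll/sc
 * loops) on affected Loongson-3 CPUs; a no-op everywhere else. */
#ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS
#define loongson_llsc_mb()	__asm__ __volatile__("sync" : : : "memory")
#else
#define loongson_llsc_mb()	do { } while (0)
#endif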
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Huang Pei <huangpei@loongson.cn>
[paul.burton@mips.com:
- Simplify the addition of -mno-fix-loongson3-llsc to cflags, and add
a comment describing why it's there.
- Make loongson_llsc_mb() a no-op when
CONFIG_CPU_LOONGSON3_WORKAROUNDS=n, rather than a compiler memory
barrier.
- Add a comment describing the bug & how loongson_llsc_mb() helps
in asm/barrier.h.]
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: ambrosehua@gmail.com
Cc: Steven J . Hill <Steven.Hill@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Cc: Li Xuefeng <lixuefeng@loongson.cn>
Cc: Xu Chenghua <xuchenghua@loongson.cn>
With a suitably defined "probe:vfs_getname" probe, 'perf trace' can
"beautify" its output, so syscalls like open() or openat() can print the
"filename" argument instead of just its hex address, like:
$ perf trace -e open -- touch /dev/null
[...]
0.590 ( 0.014 ms): touch/18063 open(filename: /dev/null, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
[...]
The output without such beautifier looks like:
0.529 ( 0.011 ms): touch/18075 open(filename: 0xc78cf288, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
However, when the vfs_getname probe expands to multiple probes and it is
not the first one that is hit, the beautifier fails, as following:
0.326 ( 0.010 ms): touch/18072 open(filename: , flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
Fix it by hooking into all the expanded probes (inlines), now, for instance:
[root@quaco ~]# perf probe -l
probe:vfs_getname (on getname_flags:73@fs/namei.c with pathname)
probe:vfs_getname_1 (on getname_flags:73@fs/namei.c with pathname)
[root@quaco ~]# perf trace -e open* sleep 1
0.010 ( 0.005 ms): sleep/5588 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: RDONLY|CLOEXEC) = 3
0.029 ( 0.006 ms): sleep/5588 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: RDONLY|CLOEXEC) = 3
0.194 ( 0.008 ms): sleep/5588 openat(dfd: CWD, filename: /usr/lib/locale/locale-archive, flags: RDONLY|CLOEXEC) = 3
[root@quaco ~]#
Works, further verified with:
[root@quaco ~]# perf test vfs
65: Use vfs_getname probe to get syscall args filenames : Ok
66: Add vfs_getname probe to get syscall args filenames : Ok
67: Check open filename arg using perf trace + vfs_getname: Ok
[root@quaco ~]#
Reported-by: Michael Petlan <mpetlan@redhat.com>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-mv8kolk17xla1smvmp3qabv1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When perf is built with the annobin plugin (RHEL8 build) extra symbols
are added to its binary:
# nm perf | grep annobin | head -10
0000000000241100 t .annobin_annotate.c
0000000000326490 t .annobin_annotate.c
0000000000249255 t .annobin_annotate.c_end
00000000003283a8 t .annobin_annotate.c_end
00000000001bce18 t .annobin_annotate.c_end.hot
00000000001bce18 t .annobin_annotate.c_end.hot
00000000001bc3e2 t .annobin_annotate.c_end.unlikely
00000000001bc400 t .annobin_annotate.c_end.unlikely
00000000001bce18 t .annobin_annotate.c.hot
00000000001bce18 t .annobin_annotate.c.hot
...
Those symbols are of no use for reports or annotation and should be
skipped. Moreover, they interfere with the DWARF unwind test on the PPC
arch, where they are mixed with checked symbols and then the test fails:
# perf test dwarf -v
59: Test dwarf unwind :
--- start ---
test child forked, pid 8515
unwind: .annobin_dwarf_unwind.c:ip = 0x10dba40dc (0x2740dc)
...
got: .annobin_dwarf_unwind.c 0x10dba40dc, expecting test__arch_unwind_sample
unwind: failed with 'no error'
The annobin symbols are defined as NOTYPE/LOCAL/HIDDEN:
# readelf -s ./perf | grep annobin | head -1
40: 00000000001bce4f 0 NOTYPE LOCAL HIDDEN 13 .annobin_init.c
They can still pass the check for the label symbol. Add a check for
HIDDEN and INTERNAL visibility (as suggested by Nick below) and filter
out such symbols.
> Just to be awkward, if you are going to ignore STV_HIDDEN
> symbols then you should probably also ignore STV_INTERNAL ones
> as well... Annobin does not generate them, but you never know,
> one day some other tool might create some.
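A hedged sketch of such a filter, using the standard gelf.h visibility
macro (the actual perf helper may differ in shape):

#include <gelf.h>
#include <stdbool.h>

/* Reject symbols with HIDDEN or INTERNAL visibility, such as the
 * NOTYPE/LOCAL/HIDDEN markers emitted by annobin. */
static bool elf_sym__is_visible(const GElf_Sym *sym)
{
	switch (GELF_ST_VISIBILITY(sym->st_other)) {
	case STV_HIDDEN:
	case STV_INTERNAL:
		return false;
	default:
		return true;
	}
}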
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Clifton <nickc@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190128133526.GD15461@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Those aren't present in Alpine Linux 3.4 to edge, so provide fallback
defines to get the next patch building there, keeping the build
bisectable.
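For illustration, such fallbacks typically take the following form;
the particular macros shown are an assumption based on the ELF
visibility checks the next patch introduces:

/* Fallbacks for C libraries that lack these macros; guarded so that
 * systems which already provide them are unaffected. */
#ifndef GELF_ST_VISIBILITY
#define GELF_ST_VISIBILITY(o)	((o) & 0x03)
#endif

#ifndef STV_HIDDEN
#define STV_HIDDEN	2	/* symbol unavailable in other modules */
#endif

#ifndef STV_INTERNAL
#define STV_INTERNAL	1	/* processor-specific hidden class */
#endif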
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Clifton <nickc@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/n/tip-03cg3gya2ju4ba2x6ibb9fuz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 626a5f66da ("s390: bpf: implement jitting of JMP32") added
JMP32 code-gen support for s390. However it triggers the warning below
due to some unusual gotos in the original s390 bpf jit code.
Add a couple of additional "is_jmp32" initializations to fix this.
Also fix the wrong opcode for the "llilf" instruction that was
introduced with the same commit.
arch/s390/net/bpf_jit_comp.c: In function 'bpf_jit_insn':
arch/s390/net/bpf_jit_comp.c:248:55: warning: 'is_jmp32' may be used uninitialized in this function [-Wmaybe-uninitialized]
_EMIT6(op1 | reg(b1, b2) << 16 | (rel & 0xffff), op2 | mask); \
^
arch/s390/net/bpf_jit_comp.c:1211:8: note: 'is_jmp32' was declared here
bool is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;
Fixes: 626a5f66da ("s390: bpf: implement jitting of JMP32")
Cc: Jiong Wang <jiong.wang@netronome.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Julian Wiedmann says:
====================
s390/qeth: fixes 2019-02-04
please apply the following four fixes to -net.
Patch 1 takes care of a common resource leak in various error paths, while the
second patch fixes a misordered kfree when cleaning up after an error.
The other two patches ensure that there's no stale work dangling on workqueues
when the qeth device has already been offlined and/or removed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Work for Bridgeport events is currently placed on a driver-wide
workqueue. If the card is removed and freed while any such work is still
active, this causes a use-after-free.
So put the events on a per-card queue, where we can control their
lifetime. As we also don't want stale events to last beyond an
offline & online cycle, flush this queue when setting the card offline.
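A hedged sketch of the shape of the change (field and format names
assumed, not the exact driver code):

/* at card allocation: one ordered queue per card */
card->event_wq = alloc_ordered_workqueue("%s_event", 0,
					 dev_name(&card->gdev->dev));

/* when a Bridgeport event arrives */
queue_work(card->event_wq, &data->worker);

/* when setting the card offline: no stale events may survive */
flush_workqueue(card->event_wq);

/* at card teardown, before the card is freed */
destroy_workqueue(card->event_wq);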
Fixes: b4d72c08b3 ("qeth: bridgeport support - basic control")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A card's close_dev work is scheduled on a driver-wide workqueue. If the
card is removed and freed while the work is still active, this causes a
use-after-free.
So make sure that the work is completed before freeing the card.
Fixes: 0f54761d16 ("qeth: Support VEPA mode")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error path in qeth_alloc_qdio_buffers() that takes care of
cleaning up the Output Queues is buggy. It first frees the queue, but
then calls qeth_clear_outq_buffers() with that very queue struct.
Make the call to qeth_clear_outq_buffers() part of the free action
(in the correct order), and while at it fix the naming of the helper.
Fixes: 0da9581ddb ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever we fail before/while starting an IO, make sure to release the
IO buffer. Usually qeth_irq() would do this for us, but if the IO
doesn't even start we obviously won't get an interrupt for it either.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song says:
====================
These are patches responding to my comments for
Magnus's patch (https://patchwork.ozlabs.org/patch/1032848/).
The goal is to make pr_* macros available to other C files
than libbpf.c, and to simplify API function libbpf_set_print().
Specifically, Patch #1 uses global functions
to facilitate the pr_* macros in the header files so they
are available in different C files.
Patch #2 removes the global function libbpf_print_level_available()
which was added in Patch #1.
Patch #3 simplifies libbpf_set_print() so that it takes only one print
function, which carries a debug level argument among its parameters.
Changelogs:
v3 -> v4:
. rename libbpf internal header util.h to libbpf_util.h
. rename libbpf internal function libbpf_debug_print() to libbpf_print()
v2 -> v3:
. bailed out earlier in libbpf_debug_print() if __libbpf_pr is NULL
. added missing LIBBPF_DEBUG level check in libbpf.c __base_pr().
v1 -> v2:
. Renamed global function libbpf_dprint() to libbpf_debug_print()
to be more expressive.
. Removed libbpf_dprint_level_available() as it is used only
once in btf.c and we can remove it by optimizing for common cases.
====================
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, the libbpf API function libbpf_set_print()
takes three function pointer parameters for warning, info
and debug printout respectively.
This patch changes the API to take just one function pointer
parameter, and that function pointer gains an additional
"debug level" parameter. This way, if more debug levels are
added in the future, the function signature won't need to change.
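The resulting API has roughly this shape (a sketch based on the
description above; libbpf.h carries the authoritative declaration):

enum libbpf_print_level {
	LIBBPF_WARN,
	LIBBPF_INFO,
	LIBBPF_DEBUG,
};

typedef int (*libbpf_print_fn_t)(enum libbpf_print_level level,
				 const char *format, va_list args);

LIBBPF_API void libbpf_set_print(libbpf_print_fn_t fn);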
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, the btf log is allocated and printed out in case
of error at LIBBPF_DEBUG level.
Such logs from the kernel are very important for debugging.
For example, bpf syscall BPF_PROG_LOAD command can get
verifier logs back to user space. In function load_program()
of libbpf.c, the log buffer is allocated unconditionally
and printed out at pr_warning() level.
Let us do the same thing here for btf: allocate the buffer
unconditionally and print out error logs at pr_warning() level.
This removes one global function and
optimizes for the common situation where pr_warning()
is activated either by default or by a user-supplied
debug output function.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A global function libbpf_print, which is invisible
outside the shared library, is defined to print based
on levels. The pr_warning, pr_info and pr_debug
macros are moved into the newly created header
common.h. So any .c file including common.h can
use these macros directly.
Currently, the btf__new and btf_ext__new APIs take an argument passing
a __pr_debug function pointer into btf.c so that debugging information
can be printed there. This patch removes this parameter
from btf__new and btf_ext__new and uses pr_debug directly in btf.c.
Another global function, libbpf_print_level_available, also
invisible outside the shared library, tests
whether debug printing at a particular level is
available. It is used in btf.c to
test whether DEBUG level printing is available or not,
based on which the log buffer will be allocated when loading
btf into the kernel.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Indirect calls are only needed if ipv6 is a module.
Add helpers to abstract the v6ops indirections and use them instead.
fragment, reroute and route_input are kept as indirect calls.
The first two are not used in the hot path, and route_input is only
used by bridge netfilter.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nf_nat_ipv6 calls two ipv6 core functions, so add those to v6ops to avoid
the module dependency.
This is a prerequisite for merging ipv4 and ipv6 nat implementations.
Add wrappers to avoid the indirection if ipv6 is builtin.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In fl_change(), when adding a new rule (i.e. fold == NULL), a driver may
reject the new rule, for example due to resource exhaustion. By that
point, the new rule was already assigned a mask, and it was added to
that mask's hash table. The clean-up path that's invoked as a result of
the rejection however neglects to undo the hash table addition, and
proceeds to free the new rule, thus leaving a dangling pointer in the
hash table.
Fix by removing fnew from the mask's hash table before it is freed.
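A hedged sketch of the fix in fl_change()'s error path (labels and
field names approximate cls_flower at the time):

/* The driver rejected the new rule: undo the earlier hash table
 * insertion before fnew is freed, so no dangling pointer remains in
 * the mask's hash table. */
if (!fold)
	rhashtable_remove_fast(&fnew->mask->ht, &fnew->ht_node,
			       fnew->mask->filter_ht_params);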
Fixes: 35cc3cefc4 ("net/sched: cls_flower: Reject duplicated rules also under skip_sw")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'wireless-drivers-for-davem-2019-02-04' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 5.0
First set of small, but important, fixes for 5.0.
iwlwifi
* fix a build regression introduced in 5.0-rc1
wlcore
* fix a firmware regression from v4.18-rc1
mt76x0
* fix for configuring tx power from user space
ath10k
* fix wcn3990 regression from v4.20-rc1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun says:
====================
net/smc: fixes 2019-02-04
here are more fixes in the smc code for the net tree:
Patch 1 fixes an IB-related problem with SMCR.
Patch 2 fixes a cursor problem for one-way traffic.
Patch 3 fixes a problem with RMB-reusage.
Patch 4 fixes a closing issue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If some kind of closing is received from the peer while still in
state SMC_INIT, it means the peer has had an active connection and
closed the socket quickly before listen_work finished. This should
not result in a shortcut from state SMC_INIT to state SMC_CLOSED.
This patch adds the socket to the accept queue in state
SMC_APPCLOSEWAIT1. The socket reaches state SMC_CLOSED once being
accepted and closed with smc_release().
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once RMBs are flagged as unused, they are candidates for reuse.
Thus the LLC DELETE RKEY operation should be performed before flagging
the RMB as unused.
Fixes: c7674c001b ("net/smc: unregister rkeys of unused buffer")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some scenarios a separate consumer cursor update is necessary.
The decision is made in smc_tx_consumer_cursor_update(). The
sender_free computation could be wrong:
The rx confirmed cursor is always smaller than or equal to the
rx producer cursor. The parameters in the smc_curs_diff() call
have to be exchanged, otherwise sender_free might even be negative.
And if more data arrives local_rx_ctrl.prod might be updated, enabling
a cursor difference between local_rx_ctrl.prod and rx confirmed cursor
larger than the RMB size. This case is not covered by smc_curs_diff().
Thus function smc_curs_diff_large() is introduced here.
If a recvmsg() is processed in parallel, local_tx_ctrl.cons might
change during smc_cdc_msg_send. Make sure rx_curs_confirmed is updated
with the actually sent local_tx_ctrl.cons value.
Fixes: e82f2e31f5 ("net/smc: optimize consumer cursor updates")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The work requests for rdma writes are built in local variables within
function smc_tx_rdma_write(). This violates the rule that the work
request storage has to stay till the work request is confirmed by
a completion queue response.
This patch introduces preallocated memory for these work requests.
The storage is allocated, once a link (and thus a queue pair) is
established.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During sendmsg() a cloned skb is saved via dp83640_txtstamp() in
->tx_queue. After the NIC sends this packet, the PHY will reply with a
timestamp for that TX packet. If the cable is pulled at the right time, I
don't see that packet. It might get flushed as part of queue shutdown
on the NIC's side.
Once the link is up again then after the next sendmsg() we enqueue
another skb in dp83640_txtstamp() and have two on the list. Then the PHY
will send a reply and decode_txts() attaches it to the first skb on the
list.
No crash occurs since refcounting works but we are one packet behind.
linuxptp/ptp4l usually closes the socket and opens a new one (in such a
timeout case), so those "stale" replies never get there. However, it
does not resume normal operation.
Purge old skbs in decode_txts().
Fixes: cb646e2b02 ("ptp: Added a clock driver for the National Semiconductor PHYTER.")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anonymous sets that are bound to rules from the same transaction trigger
a kernel splat from the abort path due to double set list removal and
double free.
This patch updates the logic to search for the transaction that is
responsible for creating the set and disable the set list removal and
release, given the rule is now responsible for this. Lookup is reverse
since the transaction that adds the set is likely to be at the tail of
the list.
Moreover, this patch adds the unbind step to deliver the event from the
commit path. This should not be done from the worker thread, since we
have no guarantees of in-order delivery to the listener.
This patch removes the assumption that both activate and deactivate
callbacks need to be provided.
Fixes: cd5125d8f5 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase")
Reported-by: Mikhail Morfikov <mmorfikov@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change snprintf to scnprintf. There are generally two cases where using
snprintf causes problems.
1) Uses of size += snprintf(buf, SIZE - size, fmt, ...) In this case,
if snprintf would have written more characters than what the buffer
size (SIZE) is, then size will end up larger than SIZE. In later uses
of snprintf, SIZE - size will result in a negative number, leading to
problems. Note that size might already be too large by using size =
snprintf before the code reaches a case of size += snprintf.
2) If size is ultimately used as a length parameter for a copy back to
user space, then it will potentially allow for a buffer overflow and
information disclosure when size is greater than SIZE. When the size is
used to index the buffer directly, we can have memory corruption. This
also means when size = snprintf... is used, it may also cause problems
since size may become large. Copying to userspace is mitigated by the
HARDENED_USERCOPY kernel configuration.
The solution to these issues is to use scnprintf which returns the number
of characters actually written to the buffer, so the size variable will
never exceed SIZE.
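A compact illustration of the two patterns (SIZE and the parts are
placeholders):

char buf[SIZE];
size_t size = 0;

/* Problematic: snprintf() returns the length that *would* have been
 * written, so after truncation size can exceed SIZE and the next
 * SIZE - size underflows. */
size += snprintf(buf + size, SIZE - size, "%s", part1);
size += snprintf(buf + size, SIZE - size, "%s", part2);

/* Safe: scnprintf() returns the number of characters actually
 * written to the buffer, so size can never exceed SIZE. */
size = 0;
size += scnprintf(buf + size, SIZE - size, "%s", part1);
size += scnprintf(buf + size, SIZE - size, "%s", part2);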
Cc: Willy Tarreau <w@1wt.eu>
Cc: Silvio Cesare <silvio.cesare@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
The HTT aggr message parameters in HL2.0 firmware differ from those in
the legacy firmware version. Fill in the correct HTT aggr message
parameters for targets using HL2.0 firmware.
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>