linux

Author	SHA1	Message	Date
Brian Foster	826f7e3413	xfs: use bitops interface for buf log item AIL flag check The xfs_log_item flags were converted to atomic bitops as of commit `22525c17ed` ("xfs: log item flags are racy"). The assert check for AIL presence in xfs_buf_item_relse() still uses the old value based check. This likely went unnoticed as XFS_LI_IN_AIL evaluates to 0 and causes the assert to unconditionally pass. Fix up the check. Signed-off-by: Brian Foster <bfoster@redhat.com> Fixes: `22525c17ed` ("xfs: log item flags are racy") Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-12-19 07:53:47 -08:00
Andrii Nakryiko	7745ff9842	libbpf: Fix another __u64 printf warning Fix yet another printf warning for %llu specifier on ppc64le. This time size_t casting won't work, so cast to verbose `unsigned long long`. Fixes: `166750bc1d` ("libbpf: Support libbpf-provided extern variables") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191219052103.3515-1-andriin@fb.com	2019-12-19 16:47:56 +01:00
Toke Høiland-Jørgensen	b5c7d0d0f7	libbpf: Fix printing of ulimit value Naresh pointed out that libbpf builds fail on 32-bit architectures because rlimit.rlim_cur is defined as 'unsigned long long' on those architectures. Fix this by using %zu in printf and casting to size_t. Fixes: `dc3a2d2547` ("libbpf: Print hint about ulimit when getting permission denied error") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191219090236.905059-1-toke@redhat.com	2019-12-19 16:25:52 +01:00
Daniel Borkmann	ca8d0fa7cf	Merge branch 'bpf-fix-xsk-wakeup' Maxim Mikityanskiy says: ==================== This series addresses the issue described in the commit message of the first patch: lack of synchronization between XSK wakeup and destroying the resources used by XSK wakeup. The idea is similar to napi_synchronize. The series contains fixes for the drivers that implement XSK. v2 incorporates changes suggested by Björn: 1. Call synchronize_rcu in Intel drivers only if the XDP program is being unloaded. 2. Don't forget rcu_read_lock when wakeup is called from xsk_poll. 3. Use xs->zc as the condition to call ndo_xsk_wakeup. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-12-19 16:20:53 +01:00
Maxim Mikityanskiy	c0fdccfd22	net/ixgbe: Fix concurrency issues between config flow and XSK Use synchronize_rcu to wait until the XSK wakeup function finishes before destroying the resources it uses: 1. ixgbe_down already calls synchronize_rcu after setting __IXGBE_DOWN. 2. After switching the XDP program, call synchronize_rcu to let ixgbe_xsk_wakeup exit before the XDP program is freed. 3. Changing the number of channels brings the interface down. 4. Disabling UMEM sets __IXGBE_TX_DISABLED before closing hardware resources and resetting xsk_umem. Check that bit in ixgbe_xsk_wakeup to avoid using the XDP ring when it's already destroyed. synchronize_rcu is called from ixgbe_txrx_ring_disable. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-5-maximmi@mellanox.com	2019-12-19 16:20:49 +01:00
Maxim Mikityanskiy	b3873a5be7	net/i40e: Fix concurrency issues between config flow and XSK Use synchronize_rcu to wait until the XSK wakeup function finishes before destroying the resources it uses: 1. i40e_down already calls synchronize_rcu. On i40e_down either __I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in i40e_xsk_wakeup (the former is already checked there). 2. After switching the XDP program, call synchronize_rcu to let i40e_xsk_wakeup exit before the XDP program is freed. 3. Changing the number of channels brings the interface down (see i40e_prep_for_reset and i40e_pf_quiesce_all_vsi). 4. Disabling UMEM sets __I40E_CONFIG_BUSY, too. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-4-maximmi@mellanox.com	2019-12-19 16:20:49 +01:00
Maxim Mikityanskiy	9cf88808ad	net/mlx5e: Fix concurrency issues between config flow and XSK After disabling resources necessary for XSK (the XDP program, channels, XSK queues), use synchronize_rcu to wait until the XSK wakeup function finishes, before freeing the resources. Suspend XSK wakeups during switching channels. If the XDP program is being removed, synchronize_rcu before closing the old channels to allow XSK wakeup to complete. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-3-maximmi@mellanox.com	2019-12-19 16:20:49 +01:00
Maxim Mikityanskiy	0687068208	xsk: Add rcu_read_lock around the XSK wakeup The XSK wakeup callback in drivers makes some sanity checks before triggering NAPI. However, some configuration changes may occur during this function that affect the result of those checks. For example, the interface can go down, and all the resources will be destroyed after the checks in the wakeup function, but before it attempts to use these resources. Wrap this callback in rcu_read_lock to allow driver to synchronize_rcu before actually destroying the resources. xsk_wakeup is a new function that encapsulates calling ndo_xsk_wakeup wrapped into the RCU lock. After this commit, xsk_poll starts using xsk_wakeup and checks xs->zc instead of ndo_xsk_wakeup != NULL to decide ndo_xsk_wakeup should be called. It also fixes a bug introduced with the need_wakeup feature: a non-zero-copy socket may be used with a driver supporting zero-copy, and in this case ndo_xsk_wakeup should not be called, so the xs->zc check is the correct one. Fixes: `77cd0d7b3f` ("xsk: add support for need_wakeup flag in AF_XDP rings") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-2-maximmi@mellanox.com	2019-12-19 16:20:48 +01:00
Alexei Starovoitov	580205dd4f	selftests/bpf: Fix test_attach_probe Fix two issues in test_attach_probe: 1. it was not able to parse /proc/self/maps beyond the first line, since %s means parse string until white space. 2. offset has to be accounted for otherwise uprobed address is incorrect. Fixes: `1e8611bbdf` ("selftests/bpf: add kprobe/uprobe selftests") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191219020442.1922617-1-ast@kernel.org	2019-12-19 16:14:08 +01:00
Rafael J. Wysocki	505b308b69	Merge branch 'pm-cpufreq' * pm-cpufreq: cpufreq: Avoid leaving stale IRQ work items during CPU offline	2019-12-19 16:10:52 +01:00
Toke Høiland-Jørgensen	12dd14b230	libbpf: Add missing newline in opts validation macro The error log output in the opts validation macro was missing a newline. Fixes: `2ce8450ef5` ("libbpf: add bpf_object__open_{file, mem} w/ extensible opts") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191219120714.928380-1-toke@redhat.com	2019-12-19 16:08:46 +01:00
Daniel Borkmann	7800a3d54a	Merge branch 'bpf-riscv-jit-improvements' Björn Töpel says: ==================== This series contain one non-critical fix, support for far jumps, and some optimizations for the BPF JIT. Previously, the JIT only supported 12b branch targets for conditional branches, and 21b for unconditional branches. Starting with this series, 32b branching is supported. As part of supporting far jumps, branch relaxation was introduced. The idea is to start with a pessimistic jump (e.g. auipc/jalr) and for each pass the JIT will have an opportunity to pick a better instruction (e.g. jal) and shrink the image. Instead of two passes, the JIT requires more passes. It typically converges after 3 passes. The optimizations mentioned in the subject are for calls and tail calls. In the tail call generation we can save one instruction by using the offset in jalr. Calls are optimized by doing (auipc)/jal(r) relative jumps instead of loading the entire absolute address and doing jalr. This required that the JIT image allocator was made RISC-V specific, so we can ensure that the JIT image and the kernel text are in range (32b). The last two patches of the series is not critical to the series, but are two UAPI build issues for BPF events. A closer look from the RV-folks would be much appreciated. The test_bpf.ko module, selftests/bpf/test_verifier and selftests/seccomp/seccomp_bpf pass all tests. RISC-V is still missing proper kprobe and tracepoint support, so a lot of BPF selftests cannot be run. v1->v2: [1] * Removed unused function parameter from emit_branch() * Added patch to support far branch in tail call emit [1] https://lore.kernel.org/bpf/20191209173136.29615-1-bjorn.topel@gmail.com/ ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-12-19 16:04:18 +01:00
Björn Töpel	34bfc10a6e	riscv, perf: Add arch specific perf_arch_bpf_user_pt_regs RISC-V was missing a proper perf_arch_bpf_user_pt_regs macro for CONFIG_PERF_EVENT builds. Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191216091343.23260-10-bjorn.topel@gmail.com	2019-12-19 16:03:31 +01:00
Björn Töpel	eb9928bed0	riscv, bpf: Add missing uapi header for BPF_PROG_TYPE_PERF_EVENT programs Add missing uapi header the BPF_PROG_TYPE_PERF_EVENT programs by exporting struct user_regs_struct instead of struct pt_regs which is in-kernel only. Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191216091343.23260-9-bjorn.topel@gmail.com	2019-12-19 16:03:31 +01:00
Björn Töpel	e368b64f8b	riscv, bpf: Optimize calls Instead of using emit_imm() and emit_jalr() which can expand to six instructions, start using jal or auipc+jalr. Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191216091343.23260-8-bjorn.topel@gmail.com	2019-12-19 16:03:31 +01:00
Björn Töpel	7f3631e88e	riscv, bpf: Provide RISC-V specific JIT image alloc/free This commit makes sure that the JIT images is kept close to the kernel text, so BPF calls can use relative calling with auipc/jalr or jal instead of loading the full 64-bit address and jalr. The BPF JIT image region is 128 MB before the kernel text. Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191216091343.23260-7-bjorn.topel@gmail.com	2019-12-19 16:03:31 +01:00
Björn Töpel	fe8322b866	riscv, bpf: Optimize BPF tail calls Remove one addi, and instead use the offset part of jalr. Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191216091343.23260-6-bjorn.topel@gmail.com	2019-12-19 16:03:31 +01:00
Björn Töpel	33203c02f2	riscv, bpf: Add support for far jumps and exits This commit add support for far (offset > 21b) jumps and exits. Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Luke Nelson <lukenels@cs.washington.edu> Link: https://lore.kernel.org/bpf/20191216091343.23260-5-bjorn.topel@gmail.com	2019-12-19 16:03:30 +01:00
Björn Töpel	29d92edd9e	riscv, bpf: Add support for far branching when emitting tail call Start use the emit_branch() function in the tail call emitter in order to support far branching. Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191216091343.23260-4-bjorn.topel@gmail.com	2019-12-19 16:03:30 +01:00
Björn Töpel	7d1ef13fea	riscv, bpf: Add support for far branching This commit adds branch relaxation to the BPF JIT, and with that support for far (offset greater than 12b) branching. The branch relaxation requires more than two passes to converge. For most programs it is three passes, but for larger programs it can be more. Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Luke Nelson <lukenels@cs.washington.edu> Link: https://lore.kernel.org/bpf/20191216091343.23260-3-bjorn.topel@gmail.com	2019-12-19 16:03:30 +01:00
Björn Töpel	f1003b787c	riscv, bpf: Fix broken BPF tail calls The BPF JIT incorrectly clobbered the a0 register, and did not flag usage of s5 register when BPF stack was being used. Fixes: `2353ecc6f9` ("bpf, riscv: add BPF JIT for RV64G") Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191216091343.23260-2-bjorn.topel@gmail.com	2019-12-19 16:03:30 +01:00
Yangbo Lu	f667216c5c	mmc: sdhci-of-esdhc: re-implement erratum A-009204 workaround The erratum A-009204 workaround patch was reverted because of incorrect implementation. `8b6dc6b` mmc: sdhci-of-esdhc: Revert "mmc: sdhci-of-esdhc: add erratum A-009204 support" This patch is to re-implement the workaround (add a 5 ms delay before setting SYSCTL[RSTD] to make sure all the DMA transfers are finished). Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Link: https://lore.kernel.org/r/20191219032335.26528-1-yangbo.lu@nxp.com Fixes: `5dd1955225` ("mmc: sdhci-of-esdhc: add erratum A-009204 support") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>	2019-12-19 08:13:43 +01:00
Jeffrey Hugo	781d8cea68	clk: qcom: Avoid SMMU/cx gdsc corner cases Mark the msm8998 cpu CX gdsc as votable and use the hw control to avoid corner cases with SMMU per hardware documentation. Fixes: `3f7df5baa2` ("clk: qcom: Add MSM8998 GPU Clock Controller (GPUCC) driver") Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com> Link: https://lkml.kernel.org/r/20191217171905.5619-1-jeffrey.l.hugo@gmail.com Signed-off-by: Stephen Boyd <sboyd@kernel.org>	2019-12-18 22:02:27 -08:00
Matthias Kaehlcke	8d20c39f06	clk: qcom: gcc-sc7180: Fix setting flag for votable GDSCs Commit `17269568f7` ("clk: qcom: Add Global Clock controller (GCC) driver for SC7180") sets the VOTABLE flag in .pwrsts, but it needs to be set in .flags, fix this. Fixes: `17269568f7` ("clk: qcom: Add Global Clock controller (GCC) driver for SC7180") Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Link: https://lkml.kernel.org/r/20191204120341.1.I9971817e83ee890d1096c43c5a6ce6ced53d5bd3@changeid Signed-off-by: Stephen Boyd <sboyd@kernel.org>	2019-12-18 21:35:58 -08:00
Alexei Starovoitov	a352a82496	Merge branch 'libbpf-extern-followups' Andrii Nakryiko says: ==================== Based on latest feedback and discussions, this patch set implements the following changes: - Kconfig-provided externs have to be in .kconfig section, for which bpf_helpers.h provides convenient __kconfig macro (Daniel); - instead of allowing to override Kconfig file path, switch this to ability to extend and override system Kconfig with user-provided custom values (Alexei); - BTF is required when externs are used. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-12-18 17:33:52 -08:00
Andrii Nakryiko	630628cb7d	libbpf: BTF is required when externs are present BTF is required to get type information about extern variables. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191219002837.3074619-4-andriin@fb.com	2019-12-18 17:33:36 -08:00
Andrii Nakryiko	8601fd4221	libbpf: Allow to augment system Kconfig through extra optional config Instead of all or nothing approach of overriding Kconfig file location, allow to extend it with extra values and override chosen subset of values though optional user-provided extra config, passed as a string through open options' .kconfig option. If same config key is present in both user-supplied config and Kconfig, user-supplied one wins. This allows applications to more easily test various conditions despite host kernel's real configuration. If all of BPF object's __kconfig externs are satisfied from user-supplied config, system Kconfig won't be read at all. Simplify selftests by not needing to create temporary Kconfig files. Suggested-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191219002837.3074619-3-andriin@fb.com	2019-12-18 17:33:36 -08:00
Andrii Nakryiko	81bfdd087b	libbpf: Put Kconfig externs into .kconfig section Move Kconfig-provided externs into custom .kconfig section. Add __kconfig into bpf_helpers.h for user convenience. Update selftests accordingly. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191219002837.3074619-2-andriin@fb.com	2019-12-18 17:33:36 -08:00
Andrii Nakryiko	d69587062c	libbpf: Add bpf_link__disconnect() API to preserve underlying BPF resource There are cases in which BPF resource (program, map, etc) has to outlive userspace program that "installed" it in the system in the first place. When BPF program is attached, libbpf returns bpf_link object, which is supposed to be destroyed after no longer necessary through bpf_link__destroy() API. Currently, bpf_link destruction causes both automatic detachment and frees up any resources allocated to for bpf_link in-memory representation. This is inconvenient for the case described above because of coupling of detachment and resource freeing. This patch introduces bpf_link__disconnect() API call, which marks bpf_link as disconnected from its underlying BPF resouces. This means that when bpf_link is destroyed later, all its memory resources will be freed, but BPF resource itself won't be detached. This design allows to follow strict and resource-leak-free design by default, while giving easy and straightforward way for user code to opt for keeping BPF resource attached beyond lifetime of a bpf_link. For some BPF programs (i.e., FS-based tracepoints, kprobes, raw tracepoint, etc), user has to make sure to pin BPF program to prevent kernel to automatically detach it on process exit. This should typically be achived by pinning BPF program (or map in some cases) in BPF FS. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191218225039.2668205-1-andriin@fb.com	2019-12-18 17:17:47 -08:00
Linus Torvalds	4a94c43323	tpmdd fixes for Linux v5.5-rc3 -----BEGIN PGP SIGNATURE----- iJYEABYIAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCXfrLASAcamFya2tvLnNh a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0pZfAQD9F5Vjdqp3fWk+ pxt+eD9+xaD2MYuSVO2AEVBC949vdQD/TP7xnb66w7n9YtMtm9MgvysHAakJYeAe l4XsHAiPHgI= =CFIs -----END PGP SIGNATURE----- Merge tag 'tpmdd-next-20191219' of git://git.infradead.org/users/jjs/linux-tpmdd Pull tpm fixes from Jarkko Sakkinen: "Bunch of fixes for rc3" * tag 'tpmdd-next-20191219' of git://git.infradead.org/users/jjs/linux-tpmdd: tpm/tpm_ftpm_tee: add shutdown call back tpm: selftest: cleanup after unseal with wrong auth/policy test tpm: selftest: add test covering async mode tpm: fix invalid locking in NONBLOCKING mode security: keys: trusted: fix lost handle flush tpm_tis: reserve chip for duration of tpm_tis_core_init KEYS: asymmetric: return ENOMEM if akcipher_request_alloc() fails KEYS: remove CONFIG_KEYS_COMPAT	2019-12-18 17:17:36 -08:00
Nikita V. Shirokov	6de6c1f840	bpf: Allow to change skb mark in test_run allow to pass skb's mark field into bpf_prog_test_run ctx for BPF_PROG_TYPE_SCHED_CLS prog type. that would allow to test bpf programs which are doing decision based on this field Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-12-18 17:05:58 -08:00
Andrii Nakryiko	dacce6412e	bpftool: Work-around rst2man conversion bug Work-around what appears to be a bug in rst2man convertion tool, used to create man pages out of reStructureText-formatted documents. If text line starts with dot, rst2man will put it in resulting man file verbatim. This seems to cause man tool to interpret it as a directive/command (e.g., `.bs`), and subsequently not render entire line because it's unrecognized one. Enclose '.xxx' words in extra formatting to work around. Fixes: `cb21ac5885` ("bpftool: Add gen subcommand manpage") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com Link: https://lore.kernel.org/bpf/20191218221707.2552199-1-andriin@fb.com	2019-12-18 17:03:52 -08:00
Andrii Nakryiko	7c43e0d6a5	bpftool: Simplify format string to not use positional args Change format string referring to just single argument out of two available. Some versions of libc can reject such format string. Reported-by: Nikita Shirokov <tehnerd@tehnerd.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191218214314.2403729-1-andriin@fb.com	2019-12-18 17:02:16 -08:00
Pavel Tatashin	1760eb689e	tpm/tpm_ftpm_tee: add shutdown call back Add shutdown call back to close existing session with fTPM TA to support kexec scenario. Add parentheses to function names in comments as specified in kdoc. Signed-off-by: Thirupathaiah Annapureddy <thiruan@microsoft.com> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>	2019-12-19 02:31:28 +02:00
Chuhong Yuan	84c92365b2	drm/exynos: gsc: add missed component_del The driver forgets to call component_del in remove to match component_add in probe. Add the missed call to fix it. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: Inki Dae <inki.dae@samsung.net>	2019-12-19 08:52:42 +09:00
Jose Abreu	a1ec57c020	net: stmmac: tc: Fix TAPRIO division operation For ARCHs that don't support 64 bits division we need to use the helpers. Fixes: `b60189e039` ("net: stmmac: Integrate EST with TAPRIO scheduler API") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-18 15:12:33 -08:00
Vasily Gorbik	b4adfe5591	s390/ftrace: save traced function caller A typical backtrace acquired from ftraced function currently looks like the following (e.g. for "path_openat"): arch_stack_walk+0x15c/0x2d8 stack_trace_save+0x50/0x68 stack_trace_call+0x15a/0x3b8 ftrace_graph_caller+0x0/0x1c 0x3e0007e3c98 <- ftraced function caller (should be do_filp_open+0x7c/0xe8) do_open_execat+0x70/0x1b8 __do_execve_file.isra.0+0x7d8/0x860 __s390x_sys_execve+0x56/0x68 system_call+0xdc/0x2d8 Note random "0x3e0007e3c98" stack value as ftraced function caller. This value causes either imprecise unwinder result or unwinding failure. That "0x3e0007e3c98" comes from r14 of ftraced function stack frame, which it haven't had a chance to initialize since the very first instruction calls ftrace code ("ftrace_caller"). (ftraced function might never save r14 as well). Nevertheless according to s390 ABI any function is called with stack frame allocated for it and r14 contains return address. "ftrace_caller" itself is called with "brasl %r0,ftrace_caller". So, to fix this issue simply always save traced function caller onto ftraced function stack frame. Reported-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2019-12-18 23:29:26 +01:00
Vasily Gorbik	eef06cbf67	s390/unwind: stop gracefully at user mode pt_regs in irq stack Consider reaching user mode pt_regs at the bottom of irq stack graceful unwinder termination. This is the case when irq/mcck/ext interrupt arrives while in user mode. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2019-12-18 23:29:26 +01:00
Christian Borntraeger	c23587c92f	s390/purgatory: do not build purgatory with kcov, kasan and friends the purgatory must not rely on functions from the "old" kernel, so we must disable kasan and friends. We also need to have a separate copy of string.c as the default does not build memcmp with KASAN. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2019-12-18 23:29:26 +01:00
Hans de Goede	cd92ac2530	s390/purgatory: Make sure we fail the build if purgatory has missing symbols Since we link purgatory with -r aka we enable "incremental linking" no checks for unresolved symbols are done while linking the purgatory. This commit adds an extra check for unresolved symbols by calling ld without -r before running objcopy to generate purgatory.ro. This will help us catch missing symbols in the purgatory sooner. Note this commit also removes --no-undefined from LDFLAGS_purgatory as that has no effect. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/lkml/20191212205304.191610-1-hdegoede@redhat.com Tested-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2019-12-18 23:29:26 +01:00
Sven Schnelle	6feeee8efc	s390/ftrace: fix endless recursion in function_graph tracer The following sequence triggers a kernel stack overflow on s390x: mount -t tracefs tracefs /sys/kernel/tracing cd /sys/kernel/tracing echo function_graph > current_tracer [crash] This is because preempt_count_{add,sub} are in the list of traced functions, which can be demonstrated by: echo preempt_count_add >set_ftrace_filter echo function_graph > current_tracer [crash] The stack overflow happens because get_tod_clock_monotonic() gets called by ftrace but itself calls preempt_{disable,enable}(), which leads to a endless recursion. Fix this by using preempt_{disable,enable}_notrace(). Fixes: `011620688a` ("s390/time: ensure get_clock_monotonic() returns monotonic values") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2019-12-18 23:29:26 +01:00
David S. Miller	6bff001702	Merge branch 'ETS-qdisc' Petr Machata says: ==================== Add a new Qdisc, ETS The IEEE standard 802.1Qaz (and 802.1Q-2014) specifies four principal transmission selection algorithms: strict priority, credit-based shaper, ETS (bandwidth sharing), and vendor-specific. All these have their corresponding knobs in DCB. But DCB does not have interfaces to configure RED and ECN, unlike Qdiscs. In the Qdisc land, strict priority is implemented by PRIO. Credit-based transmission selection algorithm can then be modeled by having e.g. TBF or CBS Qdisc below some of the PRIO bands. ETS would then be modeled by placing a DRR Qdisc under the last PRIO band. The problem with this approach is that DRR on its own, as well as the combination of PRIO and DRR, are tricky to configure and tricky to offload to 802.1Qaz-compliant hardware. This is due to several reasons: - As any classful Qdisc, DRR supports adding classifiers to decide in which class to enqueue packets. Unlike PRIO, there's however no fallback in the form of priomap. A way to achieve classification based on packet priority is e.g. like this: # tc filter add dev swp1 root handle 1: \ basic match 'meta(priority eq 0)' flowid 1:10 Expressing the priomap in this manner however forces drivers to deep dive into the classifier block to parse the individual rules. A possible solution would be to extend the classes with a "defmap" a la split / defmap mechanism of CBQ, and introduce this as a last resort classification. However, unlike priomap, this doesn't have the guarantee of covering all priorities. Traffic whose priority is not covered is dropped by DRR as unclassified. But ASICs tend to implement dropping in the ACL block, not in scheduling pipelines. The need to treat these configurations correctly (if only to decide to not offload at all) complicates a driver. It's not clear how to retrofit priomap with all its benefits to DRR without changing it beyond recognition. - The interplay between PRIO and DRR is also causing problems. 802.1Qaz has all ETS TCs as a last resort. Switch ASICs that support ETS at all are likely to handle ETS traffic this way as well. However, the Linux model is more generic, allowing the DRR block in any band. Drivers would need to be careful to handle this case correctly, otherwise the offloaded model might not match the slow-path one. In a similar vein, PRIO and DRR need to agree on the list of priorities assigned to DRR. This is doubly problematic--the user needs to take care to keep the two in sync, and the driver needs to watch for any holes in DRR coverage and treat the traffic correctly, as discussed above. Note that at the time that DRR Qdisc is added, it has no classes, and thus any priorities assigned to that PRIO band are not covered. Thus this case is surprisingly rather common, and needs to be handled gracefully by the driver. - Similarly due to DRR flexibility, when a Qdisc (such as RED) is attached below it, it is not immediately clear which TC the class represents. This is unlike PRIO with its straightforward classid scheme. When DRR is combined with PRIO, the relationship between classes and TCs gets even more murky. This is a problem for users as well: the TC mapping is rather important for (devlink) shared buffer configuration and (ethtool) counters. So instead, this patch set introduces a new Qdisc, which is based on 802.1Qaz wording. It is PRIO-like in how it is configured, meaning one needs to specify how many bands there are, how many are strict and how many are ETS, quanta for the latter, and priomap. The new Qdisc operates like the PRIO / DRR combo would when configured as per the standard. The strict classes, if any, are tried for traffic first. When there's no traffic in any of the strict queues, the ETS ones (if any) are treated in the same way as in DRR. The chosen interface makes the overall system both reasonably easy to configure, and reasonably easy to offload. The extra code to support ETS in mlxsw (which already supports PRIO) is about 150 lines, of which perhaps 20 lines is bona fide new business logic. Credit-based shaping transmission selection algorithm can be configured by adding a CBS Qdisc under one of the strict bands (e.g. TBF can be used to a similar effect as well). As a non-work-conserving Qdisc, CBS can't be hooked under the ETS bands. This is detected and handled identically to DRR Qdisc at runtime. Note that offloading CBS is not subject of this patchset. The patchset proceeds in four stages: - Patches #1-#3 are cleanups. - Patches #4 and #5 contain the new Qdisc. - Patches #6 and #7 update mlxsw to offload the new Qdisc. - Patches #8-#10 add selftests for ETS. Examples: - Add a Qdisc with 6 bands, 3 strict and 3 ETS with 45%-30%-25% weights: # tc qdisc add dev swp1 root handle 1: \ ets strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5 # tc qdisc sh dev swp1 qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5 - Tweak quantum of one of the classes of the previous Qdisc: # tc class ch dev swp1 classid 1:4 ets quantum 1000 # tc qdisc sh dev swp1 qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 1000 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5 # tc class ch dev swp1 classid 1:3 ets quantum 1000 Error: Strict bands do not have a configurable quantum. - Purely strict Qdisc with 1:1 mapping between priorities and TCs: # tc qdisc add dev swp1 root handle 1: \ ets strict 8 priomap 7 6 5 4 3 2 1 0 # tc qdisc sh dev swp1 qdisc ets 1: root refcnt 2 bands 8 strict 8 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7 - Use "bands" to specify number of bands explicitly. Underspecified bands are implicitly ETS and their quantum is taken from MTU. The following thus gives each band the same weight: # tc qdisc add dev swp1 root handle 1: \ ets bands 8 priomap 7 6 5 4 3 2 1 0 # tc qdisc sh dev swp1 qdisc ets 1: root refcnt 2 bands 8 quanta 1514 1514 1514 1514 1514 1514 1514 1514 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7 v2: - This addresses points raised by David Miller. - Patch #4: - sch_ets.c: Add a comment with description of the Qdisc and the dequeuing algorithm. - Kconfig: Add a high-level description to the help blurb. v1: - No changes, first upstream submission after RFC. v3 (internal): - This addresses review from Jiri Pirko. - Patch #3: - Rename to _HR_ instead of to _HIERARCHY_. - Patch #4: - pkt_sched.h: Keep all the TCA_ETS_ constants in one enum. - pkt_sched.h: Rename TCA_ETS_BANDS to _NBANDS, _STRICT to _NSTRICT, _BAND_QUANTUM to _QUANTA_BAND and _PMAP_BAND to _PRIOMAP_BAND. - sch_ets.c: Update to reflect the above changes. Add a new policy, ets_class_policy, which is used when parsing class changes. Currently that policy is the same as the quanta policy, but that might change. - sch_ets.c: Move MTU handling from ets_quantum_parse() to the one caller that makes use of it. - sch_ets.c: ets_qdisc_priomap_parse(): WARN_ON_ONCE on invalid attribute instead of returning an extack. - Patch #6: - __mlxsw_sp_qdisc_ets_replace(): Pass the weights argument to this function in this patch already. Drop the weight computation. - mlxsw_sp_qdisc_prio_replace(): Rename "quanta" to "zeroes" and pass for the abovementioned "weights". - mlxsw_sp_qdisc_prio_graft(): Convert to a wrapper around __mlxsw_sp_qdisc_ets_graft(), instead of invoking the latter directly from mlxsw_sp_setup_tc_prio(). - Update to follow the _HIERARCHY_ -> _HR_ renaming. - Patch #7: - __mlxsw_sp_qdisc_ets_replace(): The "weights" argument passing and weight computation removal are now done in a previous patch. - mlxsw_sp_setup_tc_ets(): Drop case TC_ETS_REPLACE, which is handled earlier in the function. - Patch #3 (iproute2): - Add an example output to the commit message. - tc-ets.8: Fix output of two examples. - tc-ets.8: Describe default values of "bands", "quanta". - q_ets.c: A number of fixes in error messages. - q_ets.c: Comment formatting: /padding/ -> /* padding */ - q_ets.c: parse_nbands: Move duplicate checking to callers. - q_ets.c: Don't accept both "quantum" and "quanta" as equivalent. v2 (internal): - This addresses review from Ido Schimmel and comments from Alexander Kushnarov. - Patch #2: - s/coment/comment in the commit message. - Patch #4: - sch_ets: ets_class_is_strict(), ets_class_id(): Constify an argument - ets_class_find(): RXTify - Patch #3 (iproute2): - tc-ets.8: some spelling fixes - tc-ets.8: add another example - tc.8: add an ETS to "CLASSFUL QDISCS" section v1 (internal): - This addresses RFC reviews from Ido Schimmel and Roman Mashak, bugs found by Alexander Petrovskiy and myself, and other improvements. - Patch #2: - Expand the explanation with an explicit example. - Patch #4: - Kconfig: s/sch_drr/sch_ets/ - sch_ets: Reorder includes to be in alphabetical order - sch_ets: ets_quantum_parse(): Rename the return-pointer argument from pquantum to quantum, and use it directly, not going through a local temporary. - sch_ets: ets_qdisc_quanta_parse(): Convert syntax of function argument "quanta" from an array to a pointer. - sch_ets: ets_qdisc_priomap_parse(): Likewise with "priomap". - sch_ets: ets_qdisc_quanta_parse(), ets_qdisc_priomap_parse(): Invoke __nla_validate_nested directly instead of nl80211_validate_nested(). - sch_ets: ets_qdisc_quanta_parse(): WARN_ON_ONCE on invalid attribute instead of returning an extack. - sch_ets: ets_qdisc_change(): Make the last band the default one for unmentioned priomap priorities. - sch_ets: Fix a panic when an offloaded child in a bandwidth-sharing band notified its ETS parent. - sch_ets: When ungrafting, add the newly-created invisible FIFO to the Qdisc hash - Patch #5: - pkt_cls.h: Note that quantum=0 signifies a strict band. - Fix error path handling when ets_offload_dump() fails. - Patch #6: - __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function arguments "quanta" and "priomap" from arrays to pointers. - Patch #7: - __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function argument "weights" from an array to a pointer. - Patch #9: - mlxsw/sch_ets.sh: Add a comment explaining packet prioritization. - Adjust the whole suite to allow testing of traffic classifiers in addition to testing priomap. - Patch #10: - Add a number of new tests to test default priomap band, overlarge number of bands, zeroes in quanta, and altogether missing quanta. - Patch #1 (iproute2): - State motivation for inclusion of this patch in the patcheset in the commit message. - Patch #3 (iproute2): - tc-ets.8: it is now December - tc-ets.8: explain inactivity WRT using non-WC Qdiscs under ETS band - tc-ets.8: s/flow/band in explanation of quantum - tc-ets.8: explain what happens with priorities not covered by priomap - tc-ets.8: default priomap band is now the last one - q_ets.c: ets_parse_opt(): Remove unnecessary initialization of priomap and quanta. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-18 13:32:30 -08:00
Petr Machata	82c664b69c	selftests: qdiscs: Add test coverage for ETS Qdisc Add TDC coverage for the new ETS Qdisc. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-18 13:32:30 -08:00
Petr Machata	ddd3fd750f	selftests: forwarding: sch_ets: Add test coverage for ETS Qdisc This tests the newly-added ETS Qdisc. It runs two to three streams of traffic, each with a different priority. ETS Qdisc is supposed to allocate bandwidth according to the DRR algorithm and given weights. After running the traffic for a while, counters are compared for each stream to check that the expected ratio is in fact observed. In order for the DRR process to kick in, a traffic bottleneck must exist in the first place. In slow path, such bottleneck can be implemented by wrapping the ETS Qdisc inside a TBF or other shaper. This might however make the configuration unoffloadable. Instead, on HW datapath, the bottleneck would be set up by lowering port speed and configuring shared buffer suitably. Therefore the test is structured as a core component that implements the testing, with two wrapper scripts that implement the details of slow path resp. fast path configuration. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-18 13:32:30 -08:00
Petr Machata	4cf9b8f992	selftests: forwarding: Move start_/stop_traffic from mlxsw to lib.sh These two functions are used for starting several streams of traffic, and then stopping them later. They will be handy for the test coverage of ETS Qdisc. Move them from mlxsw-specific qos_lib.sh to the generic lib.sh. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-18 13:32:30 -08:00
Petr Machata	19f405b988	mlxsw: spectrum_qdisc: Support offloading of ETS Qdisc Handle TC_SETUP_QDISC_ETS, add a new ops structure for the ETS Qdisc. Invoke the extended prio handlers implemented in the previous patch. For stats ops, invoke directly the prio callbacks, which are not sensitive to differences between PRIO and ETS. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-18 13:32:30 -08:00
Petr Machata	7917f52ae1	mlxsw: spectrum_qdisc: Generalize PRIO offload to support ETS Thanks to the similarity between PRIO and ETS it is possible to simply reuse most of the code for offloading PRIO Qdisc. Extract the common functionality into separate functions, making the current PRIO handlers thin API adapters. Extend the new functions to pass quanta for individual bands, which allows configuring a subset of bands as WRR. Invoke mlxsw_sp_port_ets_set() as appropriate to de/configure WRR-ness and weight of individual bands. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-18 13:32:30 -08:00
Petr Machata	d35eb52bd2	net: sch_ets: Make the ETS qdisc offloadable Add hooks at appropriate points to make it possible to offload the ETS Qdisc. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-18 13:32:29 -08:00
Petr Machata	dcc68b4d80	net: sch_ets: Add a new Qdisc Introduces a new Qdisc, which is based on 802.1Q-2014 wording. It is PRIO-like in how it is configured, meaning one needs to specify how many bands there are, how many are strict and how many are dwrr, quanta for the latter, and priomap. The new Qdisc operates like the PRIO / DRR combo would when configured as per the standard. The strict classes, if any, are tried for traffic first. When there's no traffic in any of the strict queues, the ETS ones (if any) are treated in the same way as in DRR. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-18 13:32:29 -08:00
Petr Machata	9cf9b925d5	mlxsw: spectrum: Rename MLXSW_REG_QEEC_HIERARCY_* enumerators These enums want to be named MLXSW_REG_QEEC_HIERARCHY_, but due to a typo lack the second H. That is confusing and complicates searching. But actually the enumerators should be named _HR_, because that is how their enum type is called. So rename them as appropriate. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-18 13:32:29 -08:00

... 5 6 7 8 9 ...

888380 Commits