linux

Author	SHA1	Message	Date
Sean Christopherson	142ccde1f7	KVM: x86/mmu: Coalesce TLB flushes when zapping collapsible SPTEs Gather pending TLB flushes across both the legacy and TDP MMUs when zapping collapsible SPTEs to avoid multiple flushes if both the legacy MMU (for nested guests) and TDP MMU have mappings for the memslot. Note, this also optimizes the TDP MMU to flush only the relevant range when running as L1 with Hyper-V enlightenments. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:54 -04:00
Sean Christopherson	302695a574	KVM: x86/mmu: Move flushing for "slot" handlers to caller for legacy MMU Place the onus on the caller of slot_handle_*() to flush the TLB, rather than handling the flush in the helper, and rename parameters accordingly. This will allow future patches to coalesce flushes between address spaces and between the legacy and TDP MMUs. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:53 -04:00
Sean Christopherson	af95b53e56	KVM: x86/mmu: Coalesce TDP MMU TLB flushes when zapping collapsible SPTEs When zapping collapsible SPTEs across multiple roots, gather pending flushes and perform a single remote TLB flush at the end, as opposed to flushing after processing every root. Note, flush may be cleared by the result of zap_collapsible_spte_range(). This is intended and correct, e.g. yielding may have serviced a prior pending flush. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:53 -04:00
Vitaly Kuznetsov	c28fa560c5	KVM: x86/vPMU: Forbid reading from MSR_F15H_PERF MSRs when guest doesn't have X86_FEATURE_PERFCTR_CORE MSR_F15H_PERF_CTL0-5, MSR_F15H_PERF_CTR0-5 MSRs have a CPUID bit assigned to them (X86_FEATURE_PERFCTR_CORE) and when it wasn't exposed to the guest the correct behavior is to inject #GP and not just return zero. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210329124804.170173-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:53 -04:00
Krish Sadhukhan	9a7de6ecc3	KVM: nSVM: If VMRUN is single-stepped, queue the #DB intercept in nested_svm_vmexit() According to APM, the #DB intercept for a single-stepped VMRUN must happen after the completion of that instruction, when the guest does #VMEXIT to the host. However, in the current implementation of KVM, the #DB intercept for a single-stepped VMRUN happens after the completion of the instruction that follows the VMRUN instruction. When the #DB intercept handler is invoked, it shows the RIP of the instruction that follows VMRUN, instead of VMRUN itself. This is an incorrect RIP as far as single-stepping VMRUN is concerned. This patch fixes the problem by checking, in nested_svm_vmexit(), for the condition that the VMRUN instruction is being single-stepped and if so, queues the pending #DB intercept so that the #DB is accounted for before we execute L1's next instruction. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oraacle.com> Message-Id: <20210323175006.73249-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:52 -04:00
Paolo Bonzini	4a38162ee9	KVM: MMU: load PDPTRs outside mmu_lock On SVM, reading PDPTRs might access guest memory, which might fault and thus might sleep. On the other hand, it is not possible to release the lock after make_mmu_pages_available has been called. Therefore, push the call to make_mmu_pages_available and the mmu_lock critical section within mmu_alloc_direct_roots and mmu_alloc_shadow_roots. Reported-by: Wanpeng Li <wanpengli@tencent.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:52 -04:00
Paolo Bonzini	d9bd0082e2	Merge remote-tracking branch 'tip/x86/sgx' into kvm-next Pull generic x86 SGX changes needed to support SGX in virtual machines.	2021-04-17 08:29:47 -04:00
Paolo Bonzini	387cb8e89d	Merge tag 'kvm-s390-next-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix potential crash in preemptible kernels There is a potential race for preemptible kernels, where the host kernel would get a fault when it is preempted at the wrong point in time.	2021-04-17 08:29:41 -04:00
Xiongwei Song	7153d4bf0b	powerpc/traps: Enhance readability for trap types Define macros to list ppc interrupt types in interrupt.h, and replace the references to the trap hex values with these macros. The hex numbers were taken from arch/powerpc/kernel/exceptions-64e.S, arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S, arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h. Signed-off-by: Xiongwei Song <sxwjean@gmail.com> [mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1618398033-13025-1-git-send-email-sxwjean@me.com	2021-04-17 22:20:19 +10:00
Ali Saidi	84a24bf8c5	locking/qrwlock: Fix ordering in queued_write_lock_slowpath() While this code is executed with the wait_lock held, a reader can acquire the lock without holding wait_lock. The writer side loops checking the value with the atomic_cond_read_acquire(), but only truly acquires the lock when the compare-and-exchange is completed successfully which isn’t ordered. This exposes the window between the acquire and the cmpxchg to an A-B-A problem which allows reads following the lock acquisition to observe values speculatively before the write lock is truly acquired. We've seen a problem in epoll where the reader does a xchg while holding the read lock, but the writer can see a value change out from under it. Writer \| Reader -------------------------------------------------------------------------------- ep_scan_ready_list() \| \|- write_lock_irq() \| \|- queued_write_lock_slowpath() \| \|- atomic_cond_read_acquire() \| \| read_lock_irqsave(&ep->lock, flags); --> (observes value before unlock) \| chain_epi_lockless() \| \| epi->next = xchg(&ep->ovflist, epi); \| \| read_unlock_irqrestore(&ep->lock, flags); \| \| \| atomic_cmpxchg_relaxed() \| \|-- READ_ONCE(ep->ovflist); \| A core can order the read of the ovflist ahead of the atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire semantics addresses this issue at which point the atomic_cond_read can be switched to use relaxed semantics. Fixes: `b519b56e37` ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock") Signed-off-by: Ali Saidi <alisaidi@amazon.com> [peterz: use try_cmpxchg()] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steve Capper <steve.capper@arm.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Tested-by: Steve Capper <steve.capper@arm.com>	2021-04-17 13:40:50 +02:00
Peter Zijlstra	9406415f46	sched/debug: Rename the sched_debug parameter to sched_verbose CONFIG_SCHED_DEBUG is the build-time Kconfig knob, the boot param sched_debug and the /debug/sched/debug_enabled knobs control the sched_debug_enabled variable, but what they really do is make SCHED_DEBUG more verbose, so rename the lot. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2021-04-17 13:22:44 +02:00
Jiapeng Chong	9b9310445f	rtc: ds1511: remove unused function Fix the following clang warning: drivers/rtc/rtc-ds1511.c:108:1: warning: unused function 'rtc_write_alarm' [-Wunused-function]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/1618475821-102974-1-git-send-email-jiapeng.chong@linux.alibaba.com	2021-04-17 11:21:04 +02:00
Michael Walle	7fcb861859	rtc: fsl-ftm-alarm: add MODULE_TABLE() The module doesn't load automatically. Fix it by adding the missing MODULE_TABLE(). Fixes: `7b0b551dbc` ("rtc: fsl-ftm-alarm: add FTM alarm driver") Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20210414084006.17933-1-michael@walle.cc	2021-04-17 11:21:04 +02:00
Kalle Valo	197b9c152b	Merge tag 'iwlwifi-next-for-kalle-2021-04-12-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next iwlwifi patches for v5.13 * Add support for new FTM FW APIs; * Some CSA fixes; * Support for new HW family and other HW detection fixes; * Robustness improvement in the HW detection code; * One fix in PMF; * Some new regulatory features; * Support for passive scan in 6GHz; * Some improvements in the sync queue implementation; * Support for new devices; * Support for a new FW API command version; * Some locking fixes; * Bump the FW API version support for AX devices; * Some other small fixes, clean-ups and improvements. # gpg: Signature made Wed 14 Apr 2021 12:33:29 PM EEST using RSA key ID 1A3CC5FA # gpg: Good signature from "Luciano Roth Coelho (Luca) <luca@coelho.fi>" # gpg: aka "Luciano Roth Coelho (Intel) <luciano.coelho@intel.com>"	2021-04-17 11:38:01 +03:00
Kalle Valo	961b27ffc5	Merge tag 'mt76-for-kvalo-2021-04-12' of https://github.com/nbd168/wireless mt76 patches for 5.13 * code cleanup * mt7915/mt7615 decap offload support * driver fixes * mt7613 eeprom support * MCU code unification * threaded NAPI support * new device IDs * mt7921 device reset support * rx timestamp support # gpg: Signature made Tue 13 Apr 2021 12:11:25 AM EEST using DSA key ID 02A76EF5 # gpg: Good signature from "Felix Fietkau <nbd@nbd.name>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 75D1 1A7D 91A7 710F 4900 42EF D77D 141D 02A7 6EF5	2021-04-17 11:34:43 +03:00
Lucas Endres	d86f43b17e	ALSA: usb-audio: Add support for many Roland devices' implicit feedback quirks It makes USB audio capture and playback possible and pristine on my Roland INTEGRA-7, Boutique D-05, and R-26, along with many more I've encountered people having had issues with over the last decade or so. Signed-off-by: Lucas Endres <jaffa225man@gmail.com> Link: https://lore.kernel.org/r/CAOsVg8rA61B=005_VyUwpw3piVwA7Bo5fs1GYEB054efyzGjLw@mail.gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>	2021-04-17 10:07:04 +02:00
Dan Williams	fae8817ae8	cxl/mem: Fix memory device capacity probing The CXL Identify Memory Device output payload emits capacity in 256MB units. The driver is treating the capacity field as bytes. This was missed because QEMU reports bytes when it should report bytes / 256MB. Fixes: `8adaf747c9` ("cxl/mem: Find device capabilities") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Cc: Ben Widawsky <ben.widawsky@intel.com> Link: https://lore.kernel.org/r/161862021044.3259705.7008520073059739760.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-04-16 18:21:56 -07:00
Tony Ambardar	7de21e679e	powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h A few archs like powerpc have different errno.h values for macros EDEADLOCK and EDEADLK. In code including both libc and linux versions of errno.h, this can result in multiple definitions of EDEADLOCK in the include chain. Definitions to the same value (e.g. seen with mips) do not raise warnings, but on powerpc there are redefinitions changing the value, which raise warnings and errors (if using "-Werror"). Guard against these redefinitions to avoid build errors like the following, first seen cross-compiling libbpf v5.8.9 for powerpc using GCC 8.4.0 with musl 1.1.24: In file included from ../../arch/powerpc/include/uapi/asm/errno.h:5, from ../../include/linux/err.h:8, from libbpf.c:29: ../../include/uapi/asm-generic/errno.h:40: error: "EDEADLOCK" redefined [-Werror] #define EDEADLOCK EDEADLK In file included from toolchain-powerpc_8540_gcc-8.4.0_musl/include/errno.h:10, from libbpf.c:26: toolchain-powerpc_8540_gcc-8.4.0_musl/include/bits/errno.h:58: note: this is the location of the previous definition #define EDEADLOCK 58 cc1: all warnings being treated as errors Cc: Stable <stable@vger.kernel.org> Reported-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200917135437.1238787-1-Tony.Ambardar@gmail.com	2021-04-17 10:40:51 +10:00
Srikar Dronamraju	c1e53367da	powerpc/smp: Cache CPU to chip lookup On systems with a large number of CPUs per node, even with the filtered matching of related CPUs, there can be a large number of calls to cpu_to_chip_id() for the same CPU. For example, with a 4096 vCPU, 1 node QEMU configuration with 4 threads per core, the system could see up to 1024 calls to cpu_to_chip_id() for the same CPU. On a given system, cpu_to_chip_id() for a given CPU always returns the same value. Hence cache the result in a lookup table for use in subsequent calls. Since all CPUs sharing the same core will belong to the same chip, the lookup table has an entry for one CPU per core. chip_id_lookup_table is not freed, so it is reused when a CPU comes back online after being offlined. Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210415120934.232271-4-srikar@linux.vnet.ibm.com	2021-04-17 10:40:51 +10:00
Srikar Dronamraju	131c82b6a1	Revert "powerpc/topology: Update topology_core_cpumask" Now that cpu_core_mask has been reintroduced, lets revert commit `4bce545903` ("powerpc/topology: Update topology_core_cpumask") Post this commit, lscpu should reflect topologies as requested by a user when a QEMU instance is launched with NUMA spanning multiple sockets. Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210415120934.232271-3-srikar@linux.vnet.ibm.com	2021-04-17 10:40:51 +10:00
Srikar Dronamraju	c47f892d7a	powerpc/smp: Reintroduce cpu_core_mask Daniel reported that with Commit `4ca234a9cb` ("powerpc/smp: Stop updating cpu_core_mask") QEMU was unable to set single NUMA node SMP topologies such as: -smp 8,maxcpus=8,cores=2,threads=2,sockets=2 i.e he expected 2 sockets in one NUMA node. The above commit helped to reduce boot time on Large Systems for example 4096 vCPU single socket QEMU instance. PAPR is silent on having more than one socket within a NUMA node. cpu_core_mask and cpu_cpu_mask for any CPU would be same unless the number of sockets is different from the number of NUMA nodes. One option is to reintroduce cpu_core_mask but use a slightly different method to arrive at the cpu_core_mask. Previously each CPU's chip-id would be compared with all other CPU's chip-id to verify if both the CPUs were related at the chip level. Now if a CPU 'A' is found related / (unrelated) to another CPU 'B', all the thread siblings of 'A' and thread siblings of 'B' are automatically marked as related / (unrelated). Also if a platform doesn't support ibm,chip-id property, i.e its cpu_to_chip_id returns -1, cpu_core_map holds a copy of cpu_cpu_mask(). Fixes: `4ca234a9cb` ("powerpc/smp: Stop updating cpu_core_mask") Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210415120934.232271-2-srikar@linux.vnet.ibm.com	2021-04-17 10:40:51 +10:00
Cédric Le Goater	e9e16917bc	powerpc/xive: Use the "ibm, chip-id" property only under PowerNV The 'chip_id' field of the XIVE CPU structure is used to choose a target for a source located on the same chip. For that, the XIVE driver queries the chip identifier from the "ibm,chip-id" property and compares it to a 'src_chip' field identifying the chip of a source. This information is only available on the PowerNV platform, 'src_chip' being assigned to XIVE_INVALID_CHIP_ID under pSeries. The "ibm,chip-id" property is also not available on all platforms. It was first introduced on PowerNV and later, under QEMU for pSeries/KVM. However, the property is not part of PAPR and does not exist under pSeries/PowerVM. Assign 'chip_id' to XIVE_INVALID_CHIP_ID by default and let the PowerNV platform override the value with the "ibm,chip-id" property. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210413130352.1183267-1-clg@kaod.org	2021-04-17 10:40:51 +10:00
David S. Miller	474f459360	Merge branch 'mptcp-fixes-and-tracepoints' Mat Martineau says: ==================== mptcp: Fixes and tracepoints from the mptcp tree Here's one more batch of changes that we've tested out in the MPTCP tree. Patch 1 makes the MPTCP KUnit config symbol more consistent with other subsystems. Patch 2 fixes a couple of format specifiers in pr_debug()s Patches 3-7 add four helpful tracepoints for MPTCP. Patch 8 is a one-line refactor to use an available helper macro. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:10:40 -07:00
Geliang Tang	442279154c	mptcp: use mptcp_for_each_subflow in mptcp_close This patch used the macro helper mptcp_for_each_subflow() instead of list_for_each_entry() in mptcp_close. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:10:40 -07:00
Geliang Tang	d96a838a7c	mptcp: add tracepoint in subflow_check_data_avail This patch added a tracepoint in subflow_check_data_avail() to show the mapping status. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:10:40 -07:00
Geliang Tang	ed66bfb4ce	mptcp: add tracepoint in ack_update_msk This patch added a tracepoint in ack_update_msk() to track the incoming data_ack and window/snd_una updates. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:10:40 -07:00
Geliang Tang	0918e34b85	mptcp: add tracepoint in get_mapping_status This patch added a tracepoint in the mapping status function get_mapping_status() to dump every mpext field. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:10:40 -07:00
Geliang Tang	e10a989209	mptcp: add tracepoint in mptcp_subflow_get_send This patch added a tracepoint in the packet scheduler function mptcp_subflow_get_send(). Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:10:40 -07:00
Geliang Tang	43f1140b96	mptcp: export mptcp_subflow_active This patch moved the static function mptcp_subflow_active to protocol.h as an inline one. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:10:40 -07:00
Geliang Tang	e4b6135134	mptcp: fix format specifiers for unsigned int Some of the sequence numbers are printed as the negative ones in the debug log: [ 46.250932] MPTCP: DSS [ 46.250940] MPTCP: data_fin=0 dsn64=0 use_map=0 ack64=1 use_ack=1 [ 46.250948] MPTCP: data_ack=2344892449471675613 [ 46.251012] MPTCP: msk=000000006e157e3f status=10 [ 46.251023] MPTCP: msk=000000006e157e3f snd_data_fin_enable=0 pending=0 snd_nxt=2344892449471700189 write_seq=2344892449471700189 [ 46.251343] MPTCP: msk=00000000ec44a129 ssk=00000000f7abd481 sending dfrag at seq=-1658937016627538668 len=100 already sent=0 [ 46.251360] MPTCP: data_seq=16787807057082012948 subflow_seq=1 data_len=100 dsn64=1 This patch used the format specifier %u instead of %d for the unsigned int values to fix it. Fixes: `d9ca1de8c0` ("mptcp: move page frag allocation in mptcp_sendmsg()") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:10:40 -07:00
Nico Pache	3fcc8a25e3	kunit: mptcp: adhere to KUNIT formatting standard Drop 'S' from end of CONFIG_MPTCP_KUNIT_TESTS in order to adhere to the KUNIT *_KUNIT_TEST config name format. Fixes: `a00a582203` (mptcp: move crypto test to KUNIT) Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Nico Pache <npache@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:10:40 -07:00
David S. Miller	820dd7a244	Merge branch 'enetc-xdp-fixes' Vladimir Oltean says: ==================== Fixups for XDP on NXP ENETC After some more XDP testing on the NXP LS1028A, this is a set of 10 bug fixes, simplifications and tweaks, ranging from addressing Toke's feedback (the network stack can run concurrently with XDP on the same TX rings) to fixing some OOM conditions seen under TX congestion. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:40 -07:00
Vladimir Oltean	24e3930971	net: enetc: apply the MDIO workaround for XDP_REDIRECT too Described in `fd5736bf9f` ("enetc: Workaround for MDIO register access issue") is a workaround for a hardware bug that requires a register access of the MDIO controller to never happen concurrently with a register access of a port PF. To avoid that, a mutual exclusion scheme with rwlocks was implemented - the port PF accessors are the 'read' side, and the MDIO accessors are the 'write' side. When we do XDP_REDIRECT between two ENETC interfaces, all is fine because the MDIO lock is already taken from the NAPI poll loop. But when the ingress interface is not ENETC, just the egress is, the MDIO lock is not taken, so we might access the port PF registers concurrently with MDIO, which will make the link flap due to wrong values returned from the PHY. To avoid this, let's just slap an enetc_lock_mdio/enetc_unlock_mdio at the beginning and ending of enetc_xdp_xmit. The fact that the MDIO lock is designed as a rwlock is important here, because the read side is reentrant (that is one of the main reasons why we chose it). Usually, the way we benefit of its reentrancy is by running the data path concurrently on both CPUs, but in this case, we benefit from the reentrancy by taking the lock even when the lock is already taken (and that's the situation where ENETC is both the ingress and the egress interface for XDP_REDIRECT, which was fine before and still is fine now). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:40 -07:00
Vladimir Oltean	92ff9a6e57	net: enetc: fix buffer leaks with XDP_TX enqueue rejections If the TX ring is congested, enetc_xdp_tx() returns false for the current XDP frame (represented as an array of software BDs). This array of software TX BDs is constructed in enetc_rx_swbd_to_xdp_tx_swbd from software BDs freshly cleaned from the RX ring. The issue is that we scrub the RX software BDs too soon, more precisely before we know that we can enqueue the TX BDs successfully into the TX ring. If we can't enqueue them (and enetc_xdp_tx returns false), we call enetc_xdp_drop which attempts to recycle the buffers held by the RX software BDs. But because we scrubbed those RX BDs already, two things happen: (a) we leak their memory (b) we populate the RX software BD ring with an all-zero rx_swbd structure, which makes the buffer refill path allocate more memory. enetc_refill_rx_ring -> if (unlikely(!rx_swbd->page)) -> enetc_new_page That is a recipe for fast OOM. Fixes: `7ed2bc8007` ("net: enetc: add support for XDP_TX") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:40 -07:00
Vladimir Oltean	975acc833c	net: enetc: handle the invalid XDP action the same way as XDP_DROP When the XDP program returns an invalid action, we should free the RX buffer. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:40 -07:00
Vladimir Oltean	7eab503b11	net: enetc: use dedicated TX rings for XDP It is possible for one CPU to perform TX hashing (see netdev_pick_tx) between the 8 ENETC TX rings, and the TX hashing to select TX queue 1. At the same time, it is possible for the other CPU to already use TX ring 1 for XDP (either XDP_TX or XDP_REDIRECT). Since there is no mutual exclusion between XDP and the network stack, we run into an issue because the ENETC TX procedure is not reentrant. The obvious approach would be to just make XDP take the lock of the network stack's TX queue corresponding to the ring it's about to enqueue in. For XDP_REDIRECT, this is quite straightforward, a lock at the beginning and end of enetc_xdp_xmit() should do the trick. But for XDP_TX, it's a bit more complicated. For one, we do TX batching all by ourselves for frames with the XDP_TX verdict. This is something we would like to keep the way it is, for performance reasons. But batching means that the network stack's lock should be kept from the first enqueued XDP_TX frame and until we ring the doorbell. That is mostly fine, except for cases when in the same NAPI loop we have mixed XDP_TX and XDP_REDIRECT frames. So if enetc_xdp_xmit() gets called while we are holding the lock from the RX NAPI, then bam, deadlock. The naive answer could be 'just flush the XDP_TX frames first, then release the network stack's TX queue lock, then call xdp_do_flush_map()'. But even xdp_do_redirect() is capable of flushing the batched XDP_REDIRECT frames, so unless we unlock/relock the TX queue around xdp_do_redirect(), there simply isn't any clean way to protect XDP_TX from concurrent network stack .ndo_start_xmit() on another CPU. So we need to take a different approach, and that is to reserve two rings for the sole use of XDP. We leave TX rings 0..ndev->real_num_tx_queues-1 to be handled by the network stack, and we pick them from the end of the priv->tx_ring array. We make an effort to keep the mapping done by enetc_alloc_msix() which decides which CPU handles the TX completions of which TX ring in its NAPI poll. So the XDP TX ring of CPU 0 is handled by TX ring 6, and the XDP TX ring of CPU 1 is handled by TX ring 7. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:40 -07:00
Vladimir Oltean	ee3e875f10	net: enetc: increase TX ring size Now that commit `d6a2829e82` ("net: enetc: increase RX ring default size") has increased the RX ring size, it is quite easy to congest the TX rings when the traffic is predominantly XDP_TX, as the RX ring is quite a bit larger than the TX one. Since we bit the bullet and did the expensive thing already (larger RX rings consume more memory pages), it seems quite foolish to keep the TX rings small. So make them equally sized with RX. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:39 -07:00
Vladimir Oltean	a6369fe6e0	net: enetc: remove unneeded xdp_do_flush_map() xdp_do_redirect already contains: -> dev_map_enqueue -> __xdp_enqueue -> bq_enqueue -> bq_xmit_all // if we have more than 16 frames So the logic from enetc will never be hit, because ENETC_DEFAULT_TX_WORK is 128. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:39 -07:00
Vladimir Oltean	8f50d8bb3f	net: enetc: stop XDP NAPI processing when build_skb() fails When the code path below fails: enetc_clean_rx_ring_xdp // XDP_PASS -> enetc_build_skb -> enetc_map_rx_buff_to_skb -> build_skb enetc_clean_rx_ring_xdp will 'break', but that 'break' instruction isn't strong enough to actually break the NAPI poll loop, just the switch/case statement for XDP actions. So we increment rx_frm_cnt and go to the next frames minding our own business. Instead let's do what the skb NAPI poll function does, and break the loop now, waiting for the memory pressure to go away. Otherwise the next calls to build_skb() are likely to fail too. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:39 -07:00
Vladimir Oltean	672f9a2198	net: enetc: recycle buffers for frames with RX errors When receiving a frame with errors, currently we do nothing with it (we don't construct an skb or an xdp_buff), we just exit the NAPI poll loop. Let's put the buffer back into the RX ring (similar to XDP_DROP). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:39 -07:00
Vladimir Oltean	6b04830d5e	net: enetc: rename the buffer reuse helpers enetc_put_xdp_buff has nothing to do with XDP, frankly, it is just a helper to populate the recycle end of the shadow RX BD ring (next_to_alloc) with a given buffer. On the other hand, enetc_put_rx_buff plays more tricks than its name would suggest. So let's rename enetc_put_rx_buff into enetc_flip_rx_buff to reflect the half-page buffer reuse tricks that it employs, and enetc_put_xdp_buff into enetc_put_rx_buff which suggests a more garden-variety operation. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:39 -07:00
Vladimir Oltean	e9e49ae88e	net: enetc: remove redundant clearing of skb/xdp_frame pointer in TX conf path Later in enetc_clean_tx_ring we have: /* Scrub the swbd here so we don't have to do that * when we reuse it during xmit */ memset(tx_swbd, 0, sizeof(*tx_swbd)); So these assignments are unnecessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:08:39 -07:00
David S. Miller	bc45f524d9	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2021-04-16 This series contains updates to igb and igc drivers. Ederson adjusts Tx buffer distributions in Qav mode to improve TSN-aware traffic for igb. He also enables PPS support and auxiliary PHC functions for igc. Grzegorz checks that the MTA register was properly written and retries if not for igb. Sasha adds reporting of EEE low power idle counters to ethtool and fixes a return value being overwritten through looping for igc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:06:14 -07:00
Gustavo A. R. Silva	1e3d976dbb	flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target() Fix the following out-of-bounds warning: net/core/flow_dissector.c:835:3: warning: 'memcpy' offset [33, 48] from the object at 'flow_keys' is out of the bounds of referenced subobject 'ipv6_src' with type '__u32[4]' {aka 'unsigned int[4]'} at offset 16 [-Warray-bounds] The problem is that the original code is trying to copy data into a couple of struct members adjacent to each other in a single call to memcpy(). So, the compiler legitimately complains about it. As these are just a couple of members, fix this by copying each one of them in separate calls to memcpy(). This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:02:27 -07:00
Florian Westphal	f2764bd4f6	netlink: don't call ->netlink_bind with table lock held When I added support to allow generic netlink multicast groups to be restricted to subscribers with CAP_NET_ADMIN I was unaware that a genl_bind implementation already existed in the past. It was reverted due to ABBA deadlock: 1. ->netlink_bind gets called with the table lock held. 2. genetlink bind callback is invoked, it grabs the genl lock. But when a new genl subsystem is (un)registered, these two locks are taken in reverse order. One solution would be to revert again and add a comment in genl referring to commit `1e82a62fec` ("genetlink: remove genl_bind"). This would need a second change in mptcp to not expose the raw token value anymore, e.g. by hashing the token with a secret key so userspace can still associate subflow events with the correct mptcp connection. However, Paolo Abeni reminded me to double-check why the netlink table is locked in the first place. I can't find one. netlink_bind() is already called without this lock when userspace joins a group via NETLINK_ADD_MEMBERSHIP setsockopt. Same holds for the netlink_unbind operation. Digging through the history, commit `f773608026` ("netlink: access nlk groups safely in netlink bind and getname") expanded the lock scope. commit `3a20773bee` ("net: netlink: cap max groups which will be considered in netlink_bind()") ... removed the nlk->ngroups access that the lock scope extension was all about. Reduce the lock scope again and always call ->netlink_bind without the table lock. The Fixes tag should be vs. the patch mentioned in the link below, but that one got squash-merged into the patch that came earlier in the series. 
Fixes: `4d54cc3211` ("mptcp: avoid lock_fast usage in accept path") Link: https://lore.kernel.org/mptcp/20210213000001.379332-8-mathew.j.martineau@linux.intel.com/T/#u Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Sean Tranchetti <stranche@codeaurora.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:01:04 -07:00
David S. Miller	1c86514d7f	Merge branch 'ethtool-stats' Jakub Kicinski says: ==================== ethtool: add uAPI for reading standard stats Continuing the effort of providing a unified access method to standard stats, and explicitly tying the definitions to the standards this series adds an API for general stats which do not fit into more targeted control APIs. There is nothing clever here, just a netlink API for dumping statistics defined by standards and RFCs which today end up in ethtool -S under infinite variations of names. This series adds basic IEEE stats (for PHY, MAC, Ctrl frames) and RMON stats. AFAICT other RFCs only duplicate the IEEE stats. This series does _not_ add a netlink API to read driver-defined stats. There seems to be little to gain from moving that part to netlink. The netlink message format is very simple, and aims to allow adding stats and groups with no changes to user tooling (which IIUC is expected for ethtool). On user space side we can re-use -S, and make it dump standard stats if --groups are defined. $ ethtool -S eth0 --groups eth-phy eth-mac eth-ctrl rmon Stats for eth0: eth-phy-SymbolErrorDuringCarrier: 0 eth-mac-FramesTransmittedOK: 0 eth-mac-FrameTooLongErrors: 0 eth-ctrl-MACControlFramesTransmitted: 0 eth-ctrl-MACControlFramesReceived: 1 eth-ctrl-UnsupportedOpcodesReceived: 0 rmon-etherStatsUndersizePkts: 0 rmon-etherStatsJabbers: 0 rmon-rx-etherStatsPkts64Octets: 1 rmon-rx-etherStatsPkts128to255Octets: 0 rmon-rx-etherStatsPkts1024toMaxOctets: 1 rmon-tx-etherStatsPkts64Octets: 1 rmon-tx-etherStatsPkts128to255Octets: 0 rmon-tx-etherStatsPkts1024toMaxOctets: 1 v1: Driver support for mlxsw, mlx5 and bnxt included. Compared to the RFC I went ahead with wrapping the stats into a 1:1 nest. Now IDs of stats can start from 0, at a cost of slightly "careful" u64 alignment handling. v2: Add missing kdoc in patch 5. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 17:00:03 -07:00
Jakub Kicinski	b572ec9ff0	mlx5: implement ethtool standard stats Add support for PHY/MAC/Ctrl/RMON stats. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 16:59:47 -07:00
Jakub Kicinski	782bc00aff	bnxt: implement ethtool standard stats Most of the names seem to strongly correlate with names from the standard and RFC. Whether ..+good_frames are indeed Frames..OK I'm the least sure of. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 16:59:20 -07:00
Jakub Kicinski	c1912ab0ee	mlxsw: implement ethtool standard stats mlxsw has nicely grouped stats, add support for standard uAPI. I'm guessing the register access part. Compile tested only. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 16:59:20 -07:00
Jakub Kicinski	a8b06e9d40	ethtool: add interface to read RMON stats Most devices maintain RMON (RFC 2819) stats - particularly the "histogram" of packets received by size. Unlike other RFCs which duplicate IEEE stats, the short/oversized frame counters in RMON don't seem to match IEEE stats 1-to-1 either, so expose those, too. Do not expose basic packet, CRC errors etc - those are already otherwise covered. Because standard defines packet ranges only up to 1518, and everything above that should theoretically be "oversized" - devices often create their own ranges. Going beyond what the RFC defines - expose the "histogram" in the Tx direction (assume for now that the ranges will be the same). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 16:59:20 -07:00

1014064 Commits