Fixing nested_vmcb_check_save to avoid all TOC/TOU races
is a bit harder in released kernels, so do the bare minimum
by avoiding that EFER.SVME is cleared. This is problematic
because svm_set_efer frees the data structures for nested
virtualization if EFER.SVME is cleared.
Also check that EFER.SVME remains set after a nested vmexit;
clearing it could happen if the bit is zero in the save area
that is passed to KVM_SET_NESTED_STATE (the save area of the
nested state corresponds to the nested hypervisor's state
and is restored on the next nested vmexit).
Cc: stable@vger.kernel.org
Fixes: 2fcf4876ad ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Avoid races between check and use of the nested VMCB controls. This
for example ensures that the VMRUN intercept is always reflected to the
nested hypervisor, instead of being processed by the host. Without this
patch, it is possible to end up with svm->nested.hsave pointing to
the MSR permission bitmap for nested guests.
This bug is CVE-2021-29657.
Reported-by: Felix Wilhelm <fwilhelm@google.com>
Cc: stable@vger.kernel.org
Fixes: 2fcf4876ad ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To aid with debugging, add details of the source of a panic from nVHE
hyp. This is done by having nVHE hyp exit to nvhe_hyp_panic_handler()
rather than directly to panic(). The handler will then add the extra
details for debugging before panicking the kernel.
If the panic was due to a BUG(), look up the metadata to log the file
and line, if available, otherwise log an address that can be looked up
in vmlinux. The hyp offset is also logged to allow other hyp VAs to be
converted, similar to how the kernel offset is logged during a panic.
__hyp_panic_string is now inlined since it no longer needs to be
referenced as a symbol and the message is free to diverge between VHE
and nVHE.
The following is an example of the logs generated by a BUG in nVHE hyp.
[ 46.754840] kvm [307]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/switch.c:242!
[ 46.755357] kvm [307]: Hyp Offset: 0xfffea6c58e1e0000
[ 46.755824] Kernel panic - not syncing: HYP panic:
[ 46.755824] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[ 46.755824] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[ 46.755824] VCPU:0000d93a880d0000
[ 46.756960] CPU: 3 PID: 307 Comm: kvm-vcpu-0 Not tainted 5.12.0-rc3-00005-gc572b99cf65b-dirty #133
[ 46.757459] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 46.758366] Call trace:
[ 46.758601] dump_backtrace+0x0/0x1b0
[ 46.758856] show_stack+0x18/0x70
[ 46.759057] dump_stack+0xd0/0x12c
[ 46.759236] panic+0x16c/0x334
[ 46.759426] arm64_kernel_unmapped_at_el0+0x0/0x30
[ 46.759661] kvm_arch_vcpu_ioctl_run+0x134/0x750
[ 46.759936] kvm_vcpu_ioctl+0x2f0/0x970
[ 46.760156] __arm64_sys_ioctl+0xa8/0xec
[ 46.760379] el0_svc_common.constprop.0+0x60/0x120
[ 46.760627] do_el0_svc+0x24/0x90
[ 46.760766] el0_svc+0x2c/0x54
[ 46.760915] el0_sync_handler+0x1a4/0x1b0
[ 46.761146] el0_sync+0x170/0x180
[ 46.761889] SMP: stopping secondary CPUs
[ 46.762786] Kernel Offset: 0x3e1cd2820000 from 0xffff800010000000
[ 46.763142] PHYS_OFFSET: 0xffffa9f680000000
[ 46.763359] CPU features: 0x00240022,61806008
[ 46.763651] Memory Limit: none
[ 46.813867] ---[ end Kernel panic - not syncing: HYP panic:
[ 46.813867] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[ 46.813867] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[ 46.813867] VCPU:0000d93a880d0000 ]---
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210318143311.839894-6-ascull@google.com
hyp_panic() reports the address of the panic by using ELR_EL2, but this
isn't a useful address when hyp_panic() is called directly. Replace such
direct calls with BUG() and BUG_ON() which use BRK to trigger an
exception that then goes to hyp_panic() with the correct address. Also
remove the hyp_panic() declaration from the header file to avoid
accidental misuse.
Signed-off-by: Andrew Scull <ascull@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210318143311.839894-5-ascull@google.com
Set bug_get_file_line()'s output parameter values directly rather than
first nullifying them and then conditionally setting new values.
Signed-off-by: Andrew Scull <ascull@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210318143311.839894-4-ascull@google.com
There is some non-trivial config-based logic to get the file name and
line number associated with a bug. Factor this out to a getter that can
be resused.
Signed-off-by: Andrew Scull <ascull@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210318143311.839894-3-ascull@google.com
report_bug() will return early if it cannot find a bug corresponding to
the provided address. The subsequent test for the bug will always be
true so remove it.
Fixes: 1b4cfe3c0a ("lib/bug.c: exclude non-BUG/WARN exceptions from report_bug()")
Signed-off-by: Andrew Scull <ascull@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210318143311.839894-2-ascull@google.com
When suspending the driver, anx7625_power_standby() will be called to
turn off reset-gpios and enable-gpios. However, power supplies are not
disabled. To save power, the driver can get the power supply regulators
and turn off them in anx7625_power_standby().
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Reviewed-by: Robert Foss <robert.foss@linaro.org>
Reviewed-by: Xin Ji <xji@analogixsemi.com>
Signed-off-by: Robert Foss <robert.foss@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210401053202.159302-2-hsinyi@chromium.org
Use the new multi-interface support in USB serial core to properly claim
also the control interface during probe. This prevents having another
driver claim the control interface and makes core allocate resources
also for the interrupt endpoint (currently unused).
Switch to probing only Communication Class interfaces and use the Union
functional descriptor to determine the corresponding data interface.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
It may be possible that the string pointer does not move
when parsing. Add a code which detects this state and
simply break the parser loop in this case.
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20210331180725.663623-1-perex@perex.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
A single USB function can be implemented using a group of interfaces and
this is for example commonly used for Communication Class devices.
Add support for multi-interface functions to USB serial core and export
an interface that allows drivers to claim a second sibling interface.
The interface could easily be extended to allow claiming further
interfaces if ever needed.
When a driver claims a sibling interface in probe(), core allocates
resources for any bulk in, bulk out, interrupt in and interrupt out
endpoints found also on the sibling interface.
Disconnect is implemented so that unbinding either interface will
release the other interface while disconnect() is called precisely once.
Similarly, suspend() is called when the first sibling interface is
suspended and resume() is called when the last sibling interface is
resumed by USB core.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Refactor endpoint classification and replace the build-time
endpoint-array sanity checks with runtime checks in preparation for
handling endpoints from a sibling interface.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Use more specific custom GPIO line names which denote exactly where
the GPIO came from, i.e. the base board. Also, update the new blank
GPIO line names set up by stm32mp15xx-dhcom-som.dtsi back to their
original values.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Patrice Chotard <patrice.chotard@foss.st.com>
Cc: Patrick Delaunay <patrick.delaunay@foss.st.com>
Cc: linux-stm32@st-md-mailman.stormreply.com
To: linux-arm-kernel@lists.infradead.org
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Use more specific custom GPIO line names which denote exactly where
the GPIO came from, i.e. the base board. Also, update the new blank
GPIO line names set up by stm32mp15xx-dhcom-som.dtsi back to their
original values.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Patrice Chotard <patrice.chotard@foss.st.com>
Cc: Patrick Delaunay <patrick.delaunay@foss.st.com>
Cc: linux-stm32@st-md-mailman.stormreply.com
To: linux-arm-kernel@lists.infradead.org
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Fill in the custom GPIO line names used by DH.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Patrice Chotard <patrice.chotard@foss.st.com>
Cc: Patrick Delaunay <patrick.delaunay@foss.st.com>
Cc: linux-stm32@st-md-mailman.stormreply.com
To: linux-arm-kernel@lists.infradead.org
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
The suspending flag was added back in 2009 but no users ever followed.
Remove it.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
The XR21V141X does not have a 5- or 6-bit mode, but the current
implementation failed to properly restore the old setting when CS5 or
CS6 was requested. Instead an invalid request would be sent to the
device.
Fixes: c2d405aa86 ("USB: serial: add MaxLinear/Exar USB to Serial driver")
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org # 5.12
Signed-off-by: Johan Hovold <johan@kernel.org>
To use additional properties 'bluetooth' on serial, need replace false with
'type: object' for 'additionalProperties' to make it as a node, else will
run into dtbs_check warnings.
'arch/arm/boot/dts/stm32h750i-art-pi.dt.yaml: serial@40004800:
'bluetooth' does not match any of the regexes: 'pinctrl-[0-9]+'
Fixes: af1c2d8169 ("dt-bindings: serial: Convert STM32 UART to json-schema")
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Valentin Caron <valentin.caron@foss.st.com>
Signed-off-by: dillon min <dillon.minfei@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/1616757302-7889-8-git-send-email-dillon.minfei@gmail.com
Signed-off-by: dillon min <dillon.minfei@gmail.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
The STM32H750 is a Cortex-M7 MCU running at 480MHz
and containing 128KBytes internal flash, 1MiB SRAM.
Signed-off-by: dillon min <dillon.minfei@gmail.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
The variable error is initialized to 0 and is set to 1 this
value is never read as it is on an immediate return path. The
only read of error is to check it is 0 and this check is always
true at that point of the code. The variable is redundant and
can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
This patchset has following changes:
- introduce stm32h750.dtsi to support stm32h750 value line
- add pin groups for usart3/uart4/spi1/sdmmc2
- add stm32h750i-art-pi.dtb (arch/arm/boot/dts/Makefile)
- add stm32h750-art-pi.dts to support art-pi board
art-pi board component:
- 8MiB qspi flash
- 16MiB spi flash
- 32MiB sdram
- ap6212 wifi&bt&fm
the detail board information can be found at:
https://art-pi.gitee.io/website/
Signed-off-by: dillon min <dillon.minfei@gmail.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Replace upper case by lower case in i2c nodes name.
Signed-off-by: dillon min <dillon.minfei@gmail.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
Some instances are missing in current support of stm32h743 MCU. This commit
adds usart3/uart4 and sdmmc2 support.
Signed-off-by: dillon min <dillon.minfei@gmail.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
This patch is intend to add support stm32h750 value line,
just add stm32h7-pinctrl.dtsi for extending, with following changes:
- rename stm32h743-pinctrl.dtsi to stm32h7-pinctrl.dtsi
- move 'pin-controller' from stm32h7-pinctrl.dtsi to stm32h743.dtsi, to
fix make dtbs_check warrnings
arch/arm/boot/dts/stm32h750i-art-pi.dt.yaml: soc: 'i2c@40005C00',
'i2c@58001C00' do not match any of the regexes:
'@(0|[1-9a-f][0-9a-f]*)$', '^[^@]+$', 'pinctrl-[0-9]+'
Signed-off-by: dillon min <dillon.minfei@gmail.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
There is no way as of now to have a parent or bus defining properties
for child nodes. For now, avoid it in the example to silence warnings on
dt_schema_check. We can figure out how to deal with actual dts files
later.
[p.yadav@ti.com: Fix how compatible is defined, make reset optional, fix
minor typos, remove subnode properties in example, update commit
message.]
Signed-off-by: Ramuthevar Vadivel Murugan <vadivel.muruganx.ramuthevar@linux.intel.com>
Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210326130034.15231-5-p.yadav@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
We get warning of using a unsigned variable being compared to less than
zero. The comparison is correct as it checks for errors from previous
call to qcom_swrm_get_alert_slave_dev_num(), so we should use a signed
variable here.
While at it, drop the superfluous initialization as well
drivers/soundwire/qcom.c: qcom_swrm_irq_handler() warn: impossible
condition '(devnum < 0) => (0-255 < 0)'
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20210331155520.2987823-1-vkoul@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Art-pi based on stm32h750xbh6, with following resources:
-8MiB QSPI flash
-16MiB SPI flash
-32MiB SDRAM
-AP6212 wifi, bt, fm
detail information can be found at:
https://art-pi.gitee.io/website/
Signed-off-by: dillon min <dillon.minfei@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
This patchset add support for soc stm32h750, stm32h750 has mirror
different from stm32h743
item stm32h743 stm32h750
flash size: 2MiB 128KiB
adc: none 3
crypto-hash: none aes/hamc/des/tdes/md5/sha
detail information can be found at:
https://www.st.com/en/microcontrollers-microprocessors/stm32h750-value-line.html
Signed-off-by: dillon min <dillon.minfei@gmail.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
- This header file isn't used anymore so drop it.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJgYbbvAAoJEFc4NIkMQxK4KMYQALXOVKqGu8fgQXct3lyIn9cK
lvFkrWv03v5DYdyHio7XR8egFzRMKw0XENYzAM0CAaUsVApOgi63puZrtyO5+taW
++Ai3oclCrGvwEWXhxx6jGEPUPaPPrD8sQF60+3bOxbAGDxHk99RhEHvYwIQbQzD
5ZJa3V2K7xiBmnnXP2mlm5qNSVC0c8rRWtf5rXjRHcAfaWHy9gqScaXqwn/9KKZy
Nrf5dsV3vxG9F+HdoWyKIdvhGfVrjdIldkLBtCn2P3n0aHJXAeqpoK1K6JpVIFtO
mJPHwB9XwZZ/I2jxXBpATU70C50SAKgFd0bS5f6caZOZJHw1S+VkocgcOqLZA4Kz
7QnaMJfef2MkSK/1I3XsLgissd+4GsnghNt8t5lF6+vNuTTEcqY+k/eL+2xpXzaF
lKtXqTgL3+JKlCibYUK0ZCGa7M3cGGWjmSSUsXTo2jWY7GqlT5T6lB0YOp8fmr8P
7IQEm3l1gNCgFpF8S2mbIHbWKRlv/Bm0oTbUixAPMojLYjIPC48YZ7qZZ5WwPcSz
CB+y2F+QkIDR1Du8Ys8cacNgedVfSZz3SGdJ8pnpTI5UkzGWvgzqTwnqRumNmPUb
uzAtmA9Fi43f/DWgNqsA5FPODtDin3ubU8/JwGKlNEYgJfrXYga4trINOAqCjKoJ
sHgi5Ekd8ZzVsf9CPWRe
=Rp8S
-----END PGP SIGNATURE-----
Merge tag 'exynos-drm-fixes-for-v5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
Just one cleanup which drops of_gpio.h inclusion.
- This header file isn't used anymore so drop it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1617016858-14081-1-git-send-email-inki.dae@samsung.com
The page table of AMDGPU requires an alignment to CPU page so we should
check ioctl parameters for it. Return -EINVAL if some parameter is
unaligned to CPU page, instead of corrupt the page table sliently.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
In Mesa, dev_info.gart_page_size is used for alignment and it was
set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU
driver requires an alignment on CPU pages. So, for non-4KB page system,
gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE).
Signed-off-by: Rui Wang <wangr@lemote.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Link: https://github.com/loongson-community/linux-stable/commit/caa9c0a1
[Xi: rebased for drm-next, use max_t for checkpatch,
and reworded commit message.]
Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549
Tested-by: Dan Horák <dan@danny.cz>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Do the same thing we do for Renoir. We can check, but since
the sbios has started DPM, it will always return true which
causes the driver to skip some of the SMU init when it shouldn't.
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Amdgpu driver uses 4-byte data type as DQM fence memory,
and transmits GPU address of fence memory to microcode
through query status PM4 message. However, query status
PM4 message definition and microcode processing are all
processed according to 8 bytes. Fence memory only allocates
4 bytes of memory, but microcode does write 8 bytes of memory,
so there is a memory corruption.
Changes since v1:
* Change dqm->fence_addr as a u64 pointer to fix this issue,
also fix up query_status and amdkfd_fence_wait_timeout function
uses 64 bit fence value to make them consistent.
Signed-off-by: Qu Huang <jinsdb@126.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
For multiple split bios, if one of the bio is fail, the whole
should return error to application. But we found there is a race
between bio_integrity_verify_fn and bio complete, which return
io success to application after one of the bio fail. The race as
following:
split bio(READ) kworker
nvme_complete_rq
blk_update_request //split error=0
bio_endio
bio_integrity_endio
queue_work(kintegrityd_wq, &bip->bip_work);
bio_integrity_verify_fn
bio_endio //split bio
__bio_chain_endio
if (!parent->bi_status)
<interrupt entry>
nvme_irq
blk_update_request //parent error=7
req_bio_endio
bio->bi_status = 7 //parent bio
<interrupt exit>
parent->bi_status = 0
parent->bi_end_io() // return bi_status=0
The bio has been split as two: split and parent. When split
bio completed, it depends on kworker to do endio, while
bio_integrity_verify_fn have been interrupted by parent bio
complete irq handler. Then, parent bio->bi_status which have
been set in irq handler will overwrite by kworker.
In fact, even without the above race, we also need to conside
the concurrency beteen mulitple split bio complete and update
the same parent bi_status. Normally, multiple split bios will
be issued to the same hctx and complete from the same irq
vector. But if we have updated queue map between multiple split
bios, these bios may complete on different hw queue and different
irq vector. Then the concurrency update parent bi_status may
cause the final status error.
Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210331115359.1125679-1-yuyufen@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
SO_TXTIME hardware offload requires testing across devices, either
between machines or separate network namespaces.
Split up SO_TXTIME test into tx and rx modes, so traffic can be
sent from one process to another. Create a veth-pair on different
namespaces and bind each process to an end point via [-S]ource and
[-D]estination parameters. Optional start [-t]ime parameter can be
passed to synchronize the test across the hosts (with synchorinzed
clocks).
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mt7623 uses offload version 2 too
tested on Bananapi-R2
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
xdp_return_frame() may be called outside of NAPI context to return
xdpf back to page_pool. xdp_return_frame() calls __xdp_return() with
napi_direct = false. For page_pool memory model, __xdp_return() calls
xdp_return_frame_no_direct() unconditionally and below false negative
kernel BUG throw happened under preempt-rt build:
[ 430.450355] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/3884
[ 430.451678] caller is __xdp_return+0x1ff/0x2e0
[ 430.452111] CPU: 0 PID: 3884 Comm: modprobe Tainted: G U E 5.12.0-rc2+ #45
Changes in v2:
- This patch fixes the issue by making xdp_return_frame_no_direct() is
only called if napi_direct = true, as recommended for better by
Jesper Dangaard Brouer. Thanks!
Fixes: 2539650fad ("xdp: Helpers for disabling napi_direct of xdp_return_frame")
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Turn on the MEEAO field of MTL_ECC_Control_Register by default.
As the MTL ECC Error Address Status Over-ride(MEEAO) is set by default,
the following error address fields will hold the last valid address
where the error is detected.
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Tan Tee Min <tee.min.tan@intel.com>
Co-developed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit f211ac1545.
We had similar attempt in the past, and we reverted it.
History:
64a146513f [NET]: Revert incorrect accept queue backlog changes.
8488df894d [NET]: Fix bugs in "Whether sock accept queue is full" checking
I am adding a fat comment so that future attempts will
be much harder.
Fixes: f211ac1545 ("net: correct sk_acceptq_is_full()")
Cc: iuyacan <yacanliu@163.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean says:
====================
XDP for NXP ENETC
This series adds support to the enetc driver for the basic XDP primitives.
The ENETC is a network controller found inside the NXP LS1028A SoC,
which is a dual-core Cortex A72 device for industrial networking,
with the CPUs clocked at up to 1.3 GHz. On this platform, there are 4
ENETC ports and a 6-port embedded DSA switch, in a topology that looks
like this:
+-------------------------------------------------------------------------+
| +--------+ 1 Gbps (typically disabled) |
| ENETC PCI | ENETC |--------------------------+ |
| Root Complex | port 3 |-----------------------+ | |
| Integrated +--------+ | | |
| Endpoint | | |
| +--------+ 2.5 Gbps | | |
| | ENETC |--------------+ | | |
| | port 2 |-----------+ | | | |
| +--------+ | | | | |
| | | | | |
| +------------------------------------------------+
| | | Felix | | Felix | |
| | Switch | port 4 | | port 5 | |
| | +--------+ +--------+ |
| | |
| +--------+ +--------+ | +--------+ +--------+ +--------+ +--------+ |
| | ENETC | | ENETC | | | Felix | | Felix | | Felix | | Felix | |
| | port 0 | | port 1 | | | port 0 | | port 1 | | port 2 | | port 3 | |
+-------------------------------------------------------------------------+
| | | | | |
v v v v v v
Up to Up to Up to 4x 2.5Gbps
2.5Gbps 1Gbps
The ENETC ports 2 and 3 can act as DSA masters for the embedded switch.
Because 4 out of the 6 externally-facing ports of the SoC are switch
ports, the most interesting use case for XDP on this device is in fact
XDP_TX on the 2.5Gbps DSA master.
Nonetheless, the results presented below are for IPv4 forwarding between
ENETC port 0 (eno0) and port 1 (eno1) both configured for 1Gbps.
There are two streams of IPv4/UDP datagrams with a frame length of 64
octets delivered at 100% port load to eno0 and to eno1. eno0 has a flow
steering rule to process the traffic on RX ring 0 (CPU 0), and eno1 has
a flow steering rule towards RX ring 1 (CPU 1).
For the IPFWD test, standard IP routing was enabled in the netns.
For the XDP_DROP test, the samples/bpf/xdp1 program was attached to both
eno0 and to eno1.
For the XDP_TX test, the samples/bpf/xdp2 program was attached to both
eno0 and to eno1.
For the XDP_REDIRECT test, the samples/bpf/xdp_redirect program was
attached once to the input of eno0/output of eno1, and twice to the
input of eno1/output of eno0.
Finally, the preliminary results are as follows:
| IPFWD | XDP_TX | XDP_REDIRECT | XDP_DROP
--------+-------+--------+-------------------------
fps | 761 | 2535 | 1735 | 2783
Gbps | 0.51 | 1.71 | 1.17 | n/a
There is a strange phenomenon in my testing sistem where it appears that
one CPU is processing more than the other. I have not investigated this
too much. Also, the code might not be very well optimized (for example
dma_sync_for_device is called with the full ENETC_RXB_DMA_SIZE_XDP).
Design wise, the ENETC is a PCI device with BD rings, so it uses the
MEM_TYPE_PAGE_SHARED memory model, as can typically be seen in Intel
devices. The strategy was to build upon the existing model that the
driver uses, and not change it too much. So you will see things like a
separate NAPI poll function for XDP.
I have only tested with PAGE_SIZE=4096, and since we split pages in
half, it means that MTU-sized frames are scatter/gather (the XDP
headroom + skb_shared_info only leaves us 1476 bytes of data per
buffer). This is sub-optimal, but I would rather keep it this way and
help speed up Lorenzo's series for S/G support through testing, rather
than change the enetc driver to use some other memory model like page_pool.
My code is already structured for S/G, and that works fine for XDP_DROP
and XDP_TX, just not for XDP_REDIRECT, even between two enetc ports.
So the S/G XDP_REDIRECT is stubbed out (the frames are dropped), but
obviously I would like to remove that limitation soon.
Please note that I am rather new to this kind of stuff, I am more of a
control path person, so I would appreciate feedback.
Enough talking, on to the patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver implementation of the XDP_REDIRECT action reuses parts from
XDP_TX, most notably the enetc_xdp_tx function which transmits an array
of TX software BDs. Only this time, the buffers don't have DMA mappings,
we need to create them.
When a BPF program reaches the XDP_REDIRECT verdict for a frame, we can
employ the same buffer reuse strategy as for the normal processing path
and for XDP_PASS: we can flip to the other page half and seed that to
the RX ring.
Note that scatter/gather support is there, but disabled due to lack of
multi-buffer support in XDP (which is added by this series):
https://patchwork.kernel.org/project/netdevbpf/cover/cover.1616179034.git.lorenzo@kernel.org/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As explained in the XDP_TX patch, when receiving a burst of frames with
the XDP_TX verdict, there is a momentary dip in the number of available
RX buffers. The system will eventually recover as TX completions will
start kicking in and refilling our RX BD ring again. But until that
happens, we need to survive with as few out-of-buffer discards as
possible.
This increases the memory footprint of the driver in order to avoid
discards at 2.5Gbps line rate 64B packet sizes, the maximum speed
available for testing on 1 port on NXP LS1028A.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For reflecting packets back into the interface they came from, we create
an array of TX software BDs derived from the RX software BDs. Therefore,
we need to extend the TX software BD structure to contain most of the
stuff that's already present in the RX software BD structure, for
reasons that will become evident in a moment.
For a frame with the XDP_TX verdict, we don't reuse any buffer right
away as we do for XDP_DROP (the same page half) or XDP_PASS (the other
page half, same as the skb code path).
Because the buffer transfers ownership from the RX ring to the TX ring,
reusing any page half right away is very dangerous. So what we can do is
we can recycle the same page half as soon as TX is complete.
The code path is:
enetc_poll
-> enetc_clean_rx_ring_xdp
-> enetc_xdp_tx
-> enetc_refill_rx_ring
(time passes, another MSI interrupt is raised)
enetc_poll
-> enetc_clean_tx_ring
-> enetc_recycle_xdp_tx_buff
But that creates a problem, because there is a potentially large time
window between enetc_xdp_tx and enetc_recycle_xdp_tx_buff, period in
which we'll have less and less RX buffers.
Basically, when the ship starts sinking, the knee-jerk reaction is to
let enetc_refill_rx_ring do what it does for the standard skb code path
(refill every 16 consumed buffers), but that turns out to be very
inefficient. The problem is that we have no rx_swbd->page at our
disposal from the enetc_reuse_page path, so enetc_refill_rx_ring would
have to call enetc_new_page for every buffer that we refill (if we
choose to refill at this early stage). Very inefficient, it only makes
the problem worse, because page allocation is an expensive process, and
CPU time is exactly what we're lacking.
Additionally, there is an even bigger problem: if we let
enetc_refill_rx_ring top up the ring's buffers again from the RX path,
remember that the buffers sent to transmission haven't disappeared
anywhere. They will be eventually sent, and processed in
enetc_clean_tx_ring, and an attempt will be made to recycle them.
But surprise, the RX ring is already full of new buffers, because we
were premature in deciding that we should refill. So not only we took
the expensive decision of allocating new pages, but now we must throw
away perfectly good and reusable buffers.
So what we do is we implement an elastic refill mechanism, which keeps
track of the number of in-flight XDP_TX buffer descriptors. We top up
the RX ring only up to the total ring capacity minus the number of BDs
that are in flight (because we know that those BDs will return to us
eventually).
The enetc driver manages 1 RX ring per CPU, and the default TX ring
management is the same. So we do XDP_TX towards the TX ring of the same
index, because it is affined to the same CPU. This will probably not
produce great results when we have a tc-taprio/tc-mqprio qdisc on the
interface, because in that case, the number of TX rings might be
greater, but I didn't add any checks for that yet (mostly because I
didn't know what checks to add).
It should also be noted that we need to change the DMA mapping direction
for RX buffers, since they may now be reflected into the TX ring of the
same device. We choose to use DMA_BIDIRECTIONAL instead of unmapping and
remapping as DMA_TO_DEVICE, because performance is better this way.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the RX ring, enetc uses an allocation scheme based on pages split
into two buffers, which is already very efficient in terms of preventing
reallocations / maximizing reuse, so I see no reason why I would change
that.
+--------+--------+--------+--------+--------+--------+--------+
| | | | | | | |
| half B | half B | half B | half B | half B | half B | half B |
| | | | | | | |
+--------+--------+--------+--------+--------+--------+--------+
| | | | | | | |
| half A | half A | half A | half A | half A | half A | half A | RX ring
| | | | | | | |
+--------+--------+--------+--------+--------+--------+--------+
^ ^
| |
next_to_clean next_to_alloc
next_to_use
+--------+--------+--------+--------+--------+
| | | | | |
| half B | half B | half B | half B | half B |
| | | | | |
+--------+--------+--------+--------+--------+--------+--------+
| | | | | | | |
| half B | half B | half A | half A | half A | half A | half A | RX ring
| | | | | | | |
+--------+--------+--------+--------+--------+--------+--------+
| | | ^ ^
| half A | half A | | |
| | | next_to_clean next_to_use
+--------+--------+
^
|
next_to_alloc
then when enetc_refill_rx_ring is called, whose purpose is to advance
next_to_use, it sees that it can take buffers up to next_to_alloc, and
it says "oh, hey, rx_swbd->page isn't NULL, I don't need to allocate
one!".
The only problem is that for default PAGE_SIZE values of 4096, buffer
sizes are 2048 bytes. While this is enough for normal skb allocations at
an MTU of 1500 bytes, for XDP it isn't, because the XDP headroom is 256
bytes, and including skb_shared_info and alignment, we end up being able
to make use of only 1472 bytes, which is insufficient for the default
MTU.
To solve that problem, we implement scatter/gather processing in the
driver, because we would really like to keep the existing allocation
scheme. A packet of 1500 bytes is received in a buffer of 1472 bytes and
another one of 28 bytes.
Because the headroom required by XDP is different (and much larger) than
the one required by the network stack, whenever a BPF program is added
or deleted on the port, we drain the existing RX buffers and seed new
ones with the required headroom. We also keep the required headroom in
rx_ring->buffer_offset.
The simplest way to implement XDP_PASS, where an skb must be created, is
to create an xdp_buff based on the next_to_clean RX BDs, but not clear
those BDs from the RX ring yet, just keep the original index at which
the BDs for this frame started. Then, if the verdict is XDP_PASS,
instead of converting the xdb_buff to an skb, we replay a call to
enetc_build_skb (just as in the normal enetc_clean_rx_ring case),
starting from the original BD index.
We would also like to be minimally invasive to the regular RX data path,
and not check whether there is a BPF program attached to the ring on
every packet. So we create a separate RX ring processing function for
XDP.
Because we only install/remove the BPF program while the interface is
down, we forgo the rcu_read_lock() in enetc_clean_rx_ring, since there
shouldn't be any circumstance in which we are processing packets and
there is a potentially freed BPF program attached to the RX ring.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>