The XRMT configures the M-Table for the XLT-LPM.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During the router init flow, call into XM code and initialize couple of
items needed for XM functionality:
1) Query the capabilities and sizes. Check the XM device id.
2) Initialize the M-value. Note that currently the M-value is set fixed
to 16 for IPv4. In future this may change to better cover the actual
inserted routes.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The XLTQ is used to query HW for XM-related info.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The RXLTM configures and selects the M for the XM lookups.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use the info stored in the bus_info struct about the eXtended mezanine
connected ports and don't expose them.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The output of boardinfo command was extended to contain information
about XM. Indicates if is present and in case it is, tells which
localports are used for the connection. So parse this info and store it
in bus_info passed up to the driver.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In order to offload entries to XM, implement a set of low-level
functions to work with LPM trees in XM and also to pack and write
FIB entries into XM.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The RXLTE enables XLT (eXtended Lookup Table) LPM lookups if a capable
XM is present on the system.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The XMDR allows direct access to the XM device via the switch.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Simplification and distangling of the MSI related functionality
- Let IO/APIC construct the RTE entries from an MSI message instead of
having IO/APIC specific code in the interrupt remapping drivers
- Make the retrieval of the parent interrupt domain (vector or remap
unit) less hardcoded and use the relevant irqdomain callbacks for
selection.
- Allow the handling of more than 255 CPUs without a virtualized IOMMU
when the hypervisor supports it. This has made been possible by the
above modifications and also simplifies the existing workaround in the
HyperV specific virtual IOMMU.
- Cleanup of the historical timer_works() irq flags related
inconsistencies.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/Xxd8THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYpOD/9C5TppNlPMUyx2SflH6bxt37pJEpln
+hYTKsk+jSThntr5mfj+GifGvgmHOVBTGnlDUnUnrpN7TQmLFBzwTOtnBLW53AO2
16/u0+Xci4LNCtEkaymf0Rq4MfsfriXHPJr0A/CnZ0tpHSf5QKHAiitSiGujdMlb
gbq43+zXd+jNkH7vkOLPX/7dZVI1hNASQEevJu2tRR4xYTuXFdBxvLgYkHtYKKrK
R1sbs6nI6yIzye2u4m4xGu29SxgUft+zdUf+UehJKM3yFmf51d9qpkX+kLaTWuaL
VPsMItbn0kdvxwXQWO6DYnIAAnVKCklyHQJTZCoNq9Fe91OoByak1CEVspSOa1av
JmycNSch4IYWasR4vVCB1gbb+V9SejcKu5SV3CDrEDqwkOIpfiqpriUXSCJTLlFd
QOEDOLuuk/79Qs//J/tb/nJ4IuKv8WPudDfIlMro8wUsAr67DjD4mnXprZ+svwWx
Ct/0/Memk+BSa0cw6pvg24BUZGN6zrufkBu2HKT9GOXRUdNkdLkiPhT8mK4T/O0l
f90QCLjPSOJ/K/pLEWdUHEPmgC5Q9RsXOmwVGqX+RbjfP7mYTJXlmWnBb+cFNch0
xFIH3SxVGylxxT06NX3SkvinrHj10CoAlmneefBlLtx6dF+2P84DAMZSF0OFToVI
c2KMg5zoesI4bg==
=8Gfs
-----END PGP SIGNATURE-----
Merge tag 'x86-apic-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Thomas Gleixner:
"Yet another large set of x86 interrupt management updates:
- Simplification and distangling of the MSI related functionality
- Let IO/APIC construct the RTE entries from an MSI message instead
of having IO/APIC specific code in the interrupt remapping drivers
- Make the retrieval of the parent interrupt domain (vector or remap
unit) less hardcoded and use the relevant irqdomain callbacks for
selection.
- Allow the handling of more than 255 CPUs without a virtualized
IOMMU when the hypervisor supports it. This has made been possible
by the above modifications and also simplifies the existing
workaround in the HyperV specific virtual IOMMU.
- Cleanup of the historical timer_works() irq flags related
inconsistencies"
* tag 'x86-apic-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
x86/ioapic: Cleanup the timer_works() irqflags mess
iommu/hyper-v: Remove I/O-APIC ID check from hyperv_irq_remapping_select()
iommu/amd: Fix IOMMU interrupt generation in X2APIC mode
iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled
iommu/amd: Fix union of bitfields in intcapxt support
x86/ioapic: Correct the PCI/ISA trigger type selection
x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index
x86/hyperv: Enable 15-bit APIC ID if the hypervisor supports it
x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected
iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available
x86/apic: Support 15 bits of APIC ID in MSI where available
x86/ioapic: Handle Extended Destination ID field in RTE
iommu/vt-d: Simplify intel_irq_remapping_select()
x86: Kill all traces of irq_remapping_get_irq_domain()
x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
iommu/hyper-v: Implement select() method on remapping irqdomain
iommu/vt-d: Implement select() method on remapping irqdomain
iommu/amd: Implement select() method on remapping irqdomain
x86/apic: Add select() method on vector irqdomain
...
Michael Chan says:
====================
bnxt_en: Improve firmware flashing.
This patchset improves firmware flashing in 2 ways:
- If firmware returns NO_SPACE error during flashing, the driver will
create the UPDATE directory with more staging area and retry.
- Instead of allocating a big DMA buffer for the entire contents of
the firmware package size, fallback to a smaller buffer to DMA the
contents in multiple DMA operations.
====================
Link: https://lore.kernel.org/r/1607860306-17244-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current scheme allocates a DMA buffer as big as the requested
firmware package file and DMAs the contents to firmware in one
operation. The buffer size can be several hundred kilo bytes and
the driver may not be able to allocate the memory. This will cause
firmware upgrade to fail.
Improve the scheme by using smaller DMA blocks and calling firmware to
DMA each block in a batch mode. Older firmware can cause excessive
NVRAM erases if the block size is too small so we try to allocate a
256K buffer to begin with and size it down successively if we cannot
allocate the memory.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In bnxt_flash_package_from_fw_obj(), if firmware returns the NO_SPACE
error, call __bnxt_flash_nvram() to create the UPDATE directory and
then loop back and retry one more time.
Since the first try may fail, we use the silent version to send the
firmware commands.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On NICs with a smaller NVRAM, FW installation may fail after multiple
updates due to fragmentation. The driver can retry when FW returns
a special error code. To faciliate the retry, we restructure the
logic that performs the flashing in a loop. The actual retry logic
will be added in the next patch.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This function will be modified in the next patch to retry flashing
the firmware in a loop. To facilate that, we rearrange the code so
that the steps that only need to be done once before the loop will be
moved to the top of the function.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Refactor bnxt_flash_nvram() into __bnxt_flash_nvram() that takes an
additional dir_item_len parameter. The new function will be used
in subsequent patches with the dir_item_len parameter set to create
the UPDATE directory during flashing.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Simplify the FPU protection for !RT kernels
- Add the RT variant of FPU protections
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/Xxk8THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYLWEADC6OnEn/bZO6RMUQvTr0OJGo/G4ZGg
pibxHqBF1Q1vso8nGtYhPXHV7BbYKc1t1g3cvTBrWARSI+0GkRvZuqRUekcm2anR
ERs0zxIVexj0DqVa/08qkH//njG5QLPig2y6EtcOcNWnJiWrf3BayfY+Jelr+pQ0
ZK8yK6X38AwgflIs1RG2gIVocAB16cr4SX6cEqn4+FTY8TW4WFkPeZPYR55eh99s
aTQU+YK/bHul/WmWT+OPbJcfywLy//F+KbHSdjUhOHJO3YW9aAS9xti6wVGlwB0O
j6x82N4pJj1llhOsVZgLVVwX7zUtJlFz4D7w8Bl2/bUDt/EVKS2397uAWNIV5+0g
/8e+xnk9KDsGkD0GqboYUk4SXJiPUJ6GVc3FB1TIFemxWhgKYJ0yfRcGgez41O3P
IOYLoN32UZZvTPhwzX/WZlTSjQ+pTiNWE7Jhnm+HS3ss9qNEeiCYU2zhjIY2/wnB
lhB8W/kAw+UtjJbguwem6NwJB0e6egPn6c+6UL5n1JmMXvmlYe0tEBoZBgqt+30Y
Kz+rhLt5m4VQYTXoGAd1qVQ13eEk0SsbcB0Xl1tU+ix6JuFc4KiFzRR+uj7pm11V
xhCpt7I7Je0jVnnCRc1Sc5gQlEWfVb/FVy8IBlQz6qyUUqHf7tUwnwVmji9IyK4Y
Pc7Peqaoy7Cc3Q==
=bgdQ
-----END PGP SIGNATURE-----
Merge tag 'x86-fpu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU updates from Thomas Gleixner:
- Simplify the FPU protection for !RT kernels
- Add the RT variant of FPU protections
* tag 'x86-fpu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Make kernel FPU protection RT friendly
x86/fpu: Simplify fpregs_[un]lock()
Since its creation Marvell NIC driver for Armada 375/7k8k and
CN913x SoC families mvpp2 has been lacking an entry in MAINTAINERS,
which sometimes lead to unhandled bugs that persisted
across several kernel releases.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201211165114.26290-1-mw@semihalf.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot reproduces BUG_ON in skb_checksum_help():
tun creates (bogus) skb with huge partial-checksummed area and
small ip packet inside. Then ip_rcv trims the skb based on size
of internal ip packet, after that csum offset points beyond of
trimmed skb. Then checksum_tg() called via netfilter hook
triggers BUG_ON:
offset = skb_checksum_start_offset(skb);
BUG_ON(offset >= skb_headlen(skb));
To work around the problem this patch forces pskb_trim_rcsum_slow()
to return -EINVAL in described scenario. It allows its callers to
drop such kind of packets.
Link: https://syzkaller.appspot.com/bug?id=b419a5ca95062664fe1a60b764621eb4526e2cd0
Reported-by: syzbot+7010af67ced6105e5ab6@syzkaller.appspotmail.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/1b2494af-2c56-8ee2-7bc0-923fcad1cdf8@virtuozzo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub pointed out that the IP_ECN_set* helpers basically open-code
csum16_add(), so let's switch them over to using the helper instead.
v2:
- Use __be16 for check_add stack variable in IP_ECN_set_ce() (kbot)
v3:
- Turns out we need __force casts to do arithmetic on __be16 types
Reported-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Jonathan Morton <chromatix99@gmail.com>
Tested-by: Pete Heist <pete@heistp.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20201211142638.154780-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Consolidate all kmap_atomic() internals into a generic implementation
which builds the base for the kmap_local() API and make the
kmap_atomic() interface wrappers which handle the disabling/enabling of
preemption and pagefaults.
- Switch the storage from per-CPU to per task and provide scheduler
support for clearing mapping when scheduling out and restoring them
when scheduling back in.
- Merge the migrate_disable/enable() code, which is also part of the
scheduler pull request. This was required to make the kmap_local()
interface available which does not disable preemption when a mapping
is established. It has to disable migration instead to guarantee that
the virtual address of the mapped slot is the same accross preemption.
- Provide better debug facilities: guard pages and enforced utilization
of the mapping mechanics on 64bit systems when the architecture allows
it.
- Provide the new kmap_local() API which can now be used to cleanup the
kmap_atomic() usage sites all over the place. Most of the usage sites
do not require the implicit disabling of preemption and pagefaults so
the penalty on 64bit and 32bit non-highmem systems is removed and quite
some of the code can be simplified. A wholesale conversion is not
possible because some usage depends on the implicit side effects and
some need to be cleaned up because they work around these side effects.
The migrate disable side effect is only effective on highmem systems
and when enforced debugging is enabled. On 64bit and 32bit non-highmem
systems the overhead is completely avoided.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XyQwTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoUolD/9+R+BX96fGir+I8rG9dc3cbLw5meSi
0I/Nq3PToZMs2Iqv50DsoaPYHHz/M6fcAO9LRIgsE9jRbnY93GnsBM0wU9Y8yQaT
4wUzOG5WHaLDfqIkx/CN9coUl458oEiwOEbn79A2FmPXFzr7IpkufnV3ybGDwzwP
p73bjMJMPPFrsa9ig87YiYfV/5IAZHi82PN8Cq1v4yNzgXRP3Tg6QoAuCO84ZnWF
RYlrfKjcJ2xPdn+RuYyXolPtxr1hJQ0bOUpe4xu/UfeZjxZ7i1wtwLN9kWZe8CKH
+x4Lz8HZZ5QMTQ9sCHOLtKzu2MceMcpISzoQH4/aFQCNMgLn1zLbS790XkYiQCuR
ne9Cua+IqgYfGMG8cq8+bkU9HCNKaXqIBgPEKE/iHYVmqzCOqhW5Cogu4KFekf6V
Wi7pyyUdX2en8BAWpk5NHc8de9cGcc+HXMq2NIcgXjVWvPaqRP6DeITERTZLJOmz
XPxq5oPLGl7wdm7z+ICIaNApy8zuxpzb6sPLNcn7l5OeorViORlUu08AN8587wAj
FiVjp6ZYomg+gyMkiNkDqFOGDH5TMENpOFoB0hNNEyJwwS0xh6CgWuwZcv+N8aPO
HuS/P+tNANbD8ggT4UparXYce7YCtgOf3IG4GA3JJYvYmJ6pU+AZOWRoDScWq4o+
+jlfoJhMbtx5Gg==
=n71I
-----END PGP SIGNATURE-----
Merge tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull kmap updates from Thomas Gleixner:
"The new preemtible kmap_local() implementation:
- Consolidate all kmap_atomic() internals into a generic
implementation which builds the base for the kmap_local() API and
make the kmap_atomic() interface wrappers which handle the
disabling/enabling of preemption and pagefaults.
- Switch the storage from per-CPU to per task and provide scheduler
support for clearing mapping when scheduling out and restoring them
when scheduling back in.
- Merge the migrate_disable/enable() code, which is also part of the
scheduler pull request. This was required to make the kmap_local()
interface available which does not disable preemption when a
mapping is established. It has to disable migration instead to
guarantee that the virtual address of the mapped slot is the same
across preemption.
- Provide better debug facilities: guard pages and enforced
utilization of the mapping mechanics on 64bit systems when the
architecture allows it.
- Provide the new kmap_local() API which can now be used to cleanup
the kmap_atomic() usage sites all over the place. Most of the usage
sites do not require the implicit disabling of preemption and
pagefaults so the penalty on 64bit and 32bit non-highmem systems is
removed and quite some of the code can be simplified. A wholesale
conversion is not possible because some usage depends on the
implicit side effects and some need to be cleaned up because they
work around these side effects.
The migrate disable side effect is only effective on highmem
systems and when enforced debugging is enabled. On 64bit and 32bit
non-highmem systems the overhead is completely avoided"
* tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
ARM: highmem: Fix cache_is_vivt() reference
x86/crashdump/32: Simplify copy_oldmem_page()
io-mapping: Provide iomap_local variant
mm/highmem: Provide kmap_local*
sched: highmem: Store local kmaps in task struct
x86: Support kmap_local() forced debugging
mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
mm/highmem: Provide and use CONFIG_DEBUG_KMAP_LOCAL
microblaze/mm/highmem: Add dropped #ifdef back
xtensa/mm/highmem: Make generic kmap_atomic() work correctly
mm/highmem: Take kmap_high_get() properly into account
highmem: High implementation details and document API
Documentation/io-mapping: Remove outdated blurb
io-mapping: Cleanup atomic iomap
mm/highmem: Remove the old kmap_atomic cruft
highmem: Get rid of kmap_types.h
xtensa/mm/highmem: Switch to generic kmap atomic
sparc/mm/highmem: Switch to generic kmap atomic
powerpc/mm/highmem: Switch to generic kmap atomic
nds32/mm/highmem: Switch to generic kmap atomic
...
- migrate_disable/enable() support which originates from the RT tree and
is now a prerequisite for the new preemptible kmap_local() API which aims
to replace kmap_atomic().
- A fair amount of topology and NUMA related improvements
- Improvements for the frequency invariant calculations
- Enhanced robustness for the global CPU priority tracking and decision
making
- The usual small fixes and enhancements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XwK4THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoX28D/9cVrvziSQGfBfuQWnUiw8iOIq1QBa2
Me+Tvenhfrlt7xU6rbP9ciFu7eTN+fS06m5uQPGI+t22WuJmHzbmw1bJVXfkvYfI
/QoU+Hg7DkDAn1p7ZKXh0dRkV0nI9ixxSHl0E+Zf1ATBxCUMV2SO85flg6z/4qJq
3VWUye0dmR7/bhtkIjv5rwce9v2JB2g1AbgYXYTW9lHVoUdGoMSdiZAF4tGyHLnx
sJ6DMqQ+k+dmPyYO0z5MTzjW/fXit4n9w2e3z9TvRH/uBu58WSW1RBmQYX6aHBAg
dhT9F4lvTs6lJY23x5RSFWDOv6xAvKF5a0xfb8UZcyH5EoLYrPRvm42a0BbjdeRa
u0z7LbwIlKA+RFdZzFZWz8UvvO0ljyMjmiuqZnZ5dY9Cd80LSBuxrWeQYG0qg6lR
Y2povhhCepEG+q8AXIe2YjHKWKKC1s/l/VY3CNnCzcd21JPQjQ4Z5eWGmHif5IED
CntaeFFhZadR3w02tkX35zFmY3w4soKKrbI4EKWrQwd+cIEQlOSY7dEPI/b5BbYj
MWAb3P4EG9N77AWTNmbhK4nN0brEYb+rBbCA+5dtNBVhHTxAC7OTWElJOC2O66FI
e06dREjvwYtOkRUkUguWwErbIai2gJ2MH0VILV3hHoh64oRk7jjM8PZYnjQkdptQ
Gsq0rJW5iiu/OQ==
=Oz1V
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
- migrate_disable/enable() support which originates from the RT tree
and is now a prerequisite for the new preemptible kmap_local() API
which aims to replace kmap_atomic().
- A fair amount of topology and NUMA related improvements
- Improvements for the frequency invariant calculations
- Enhanced robustness for the global CPU priority tracking and decision
making
- The usual small fixes and enhancements all over the place
* tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
sched/fair: Trivial correction of the newidle_balance() comment
sched/fair: Clear SMT siblings after determining the core is not idle
sched: Fix kernel-doc markup
x86: Print ratio freq_max/freq_base used in frequency invariance calculations
x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
x86, sched: Calculate frequency invariance for AMD systems
irq_work: Optimize irq_work_single()
smp: Cleanup smp_call_function*()
irq_work: Cleanup
sched: Limit the amount of NUMA imbalance that can exist at fork time
sched/numa: Allow a floating imbalance between NUMA nodes
sched: Avoid unnecessary calculation of load imbalance at clone time
sched/numa: Rename nr_running and break out the magic number
sched: Make migrate_disable/enable() independent of RT
sched/topology: Condition EAS enablement on FIE support
arm64: Rebuild sched domains on invariance status changes
sched/topology,schedutil: Wrap sched domains rebuild
sched/uclamp: Allow to reset a task uclamp constraint value
sched/core: Fix typos in comments
Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug
...
I got a warining report:
br_sysfs_addbr: can't create group bridge4/bridge
------------[ cut here ]------------
sysfs group 'bridge' not found for kobject 'bridge4'
WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 sysfs_remove_group fs/sysfs/group.c:279 [inline]
WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 sysfs_remove_group+0x153/0x1b0 fs/sysfs/group.c:270
Modules linked in: iptable_nat
...
Call Trace:
br_dev_delete+0x112/0x190 net/bridge/br_if.c:384
br_dev_newlink net/bridge/br_netlink.c:1381 [inline]
br_dev_newlink+0xdb/0x100 net/bridge/br_netlink.c:1362
__rtnl_newlink+0xe11/0x13f0 net/core/rtnetlink.c:3441
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500
rtnetlink_rcv_msg+0x385/0x980 net/core/rtnetlink.c:5562
netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2494
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x793/0xc80 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg+0x139/0x170 net/socket.c:671
____sys_sendmsg+0x658/0x7d0 net/socket.c:2353
___sys_sendmsg+0xf8/0x170 net/socket.c:2407
__sys_sendmsg+0xd3/0x190 net/socket.c:2440
do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
In br_device_event(), if the bridge sysfs fails to be added,
br_device_event() should return error. This can prevent warining
when removing bridge sysfs that do not exist.
Fixes: bb900b27a2 ("bridge: allow creating bridge devices with netlink")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Tested-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20201211122921.40386-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Core:
- Robustness improvements for the NOHZ tick management
- Fixes and consolidation of the NTP/RTC synchronization code
- Small fixes and improvements in various places
- A set of function documentation udpates and fixes
Drivers:
- Cleanups and improvements in various clocksoure/event drivers
- Removal of the EZChip NPS clocksource driver as the platfrom support
was removed from ARC
- The usual set of new device tree binding and json conversions
- The RTC driver which have been acked by the RTC maintainer:
- Fix a long standing bug in the MC146818 library code which can cause
reading garbage during the RTC internal update.
- The changes related to the NTP/RTC consolidation work.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/Xw1wTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYof7SD/4iIjuP5HoY7ec0z9wSFQ5U5nUwJnpW
Sre13SUXpW+wOa/RcjAaHiD2G4MGtQyUIBibuL18Q5GMtGOvlIueEniuYP57p1XU
ipr1UMnFvRkAaFNOnySzLiQyuliteBcNSDHrLYsSWW2BwjLbNzX46zG5kILrt31i
IsseHZdD9+7SXBLvCjO6FAYkVH8FeIaFKv+3ZmroWOxPBOXi4wn02K86HrXs/6Wu
9SCUIMcewhvSx3xCURzyMv6S2hgKSzywRNc5WcYIE8OPlKbnAE0IC370r3o2uL1B
4dZPv4H1y7F7M4G+/XlIv0l2DTp9RuiWut9QcYmHtlFCKkrEO3ZGlcgPU6y5+mNc
AwwG0J51yJYqg42aifdDNJ18B9GUNVCfVAKZcOYHLXOBgSvshd2WkPJkXsGaHd3z
KrK3kZUnx+/QUWZB7dMuq+HQG2PJTvKkEwu4VGReWPGmubXbsIqBZ0vH5jYHjuEo
t4QCUc5BpNlXOUJxal5wzVmDWnoqfKqbmnPky/f/cmNEfQNY6nA9hC3vo781j532
Z5snFXhbITqIkaHoN86wMuuDCjKBKBJGQvejZKgPvh3oIg9d5yaj9P0UAhoYtv+M
jMus4QDb6eBirgnZIVpgBC3kVZOxNOEHNsPeCcVfvPa7QOQnY4Cmb0GWnpZ2SZOz
KYSjTIXKgZnHiQ==
=eWC0
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers and timekeeping updates from Thomas Gleixner:
"Core:
- Robustness improvements for the NOHZ tick management
- Fixes and consolidation of the NTP/RTC synchronization code
- Small fixes and improvements in various places
- A set of function documentation udpates and fixes
Drivers:
- Cleanups and improvements in various clocksoure/event drivers
- Removal of the EZChip NPS clocksource driver as the platfrom
support was removed from ARC
- The usual set of new device tree binding and json conversions
- The RTC driver which have been acked by the RTC maintainer:
* fix a long standing bug in the MC146818 library code which can
cause reading garbage during the RTC internal update.
* changes related to the NTP/RTC consolidation work"
* tag 'timers-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
ntp: Fix prototype in the !CONFIG_GENERIC_CMOS_UPDATE case
tick/sched: Make jiffies update quick check more robust
ntp: Consolidate the RTC update implementation
ntp: Make the RTC sync offset less obscure
ntp, rtc: Move rtc_set_ntp_time() to ntp code
ntp: Make the RTC synchronization more reliable
rtc: core: Make the sync offset default more realistic
rtc: cmos: Make rtc_cmos sync offset correct
rtc: mc146818: Reduce spinlock section in mc146818_set_time()
rtc: mc146818: Prevent reading garbage
clocksource/drivers/sh_cmt: Fix potential deadlock when calling runtime PM
clocksource/drivers/arm_arch_timer: Correct fault programming of CNTKCTL_EL1.EVNTI
clocksource/drivers/arm_arch_timer: Use stable count reader in erratum sne
clocksource/drivers/dw_apb_timer_of: Add error handling if no clock available
clocksource/drivers/riscv: Make RISCV_TIMER depends on RISCV_SBI
clocksource/drivers/ingenic: Fix section mismatch
clocksource/drivers/cadence_ttc: Fix memory leak in ttc_setup_clockevent()
dt-bindings: timer: renesas: tmu: Convert to json-schema
dt-bindings: timer: renesas: tmu: Document r8a774e1 bindings
clocksource/drivers/orion: Add missing clk_disable_unprepare() on error path
...
Instead doing the check for allocation in each loop, move it outside
the while loop and do it every NAPI loop.
This change boosts the xdpsock rxdrop scenario with 15% more
packets-per-second.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/r/20201211085410.59350-1-bjorn.topel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
George Cherian says:
====================
Add devlink and devlink health reporters to octeontx2
Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA block.
Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html
For now, I have dropped the NIX block health reporters.
This series attempts to add health reporters only for the NPA block.
As per Jakub's suggestion separate reporters per event is used and also
got rid of the counters.
====================
Link: https://lore.kernel.org/r/20201211062526.2302643-1-george.cherian@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add Documentation for devlink health reporters for NPA block.
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add devlink support to AF driver. Basic devlink support is added.
Currently info_get is the only supported devlink ops.
devlink ouptput looks like this
# devlink dev
pci/0002:01:00.0
# devlink dev info
pci/0002:01:00.0:
driver octeontx2-af
#
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The cleanup function in this script that tries to delete hv-1 / hv-2
vm-1 / vm-2 netns will generate some uncessary error messages:
Cannot remove namespace file "/run/netns/hv-2": No such file or directory
Cannot remove namespace file "/run/netns/vm-1": No such file or directory
Cannot remove namespace file "/run/netns/vm-2": No such file or directory
Redirect it to /dev/null like other commands in the cleanup function
to reduce confusion.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Link: https://lore.kernel.org/r/20201211042420.16411-1-po-hsu.lin@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For each TCP zero window probe, the icsk_backoff is increased by one and
its max value is tcp_retries2. If tcp_retries2 is greater than 63, the
probe0 timeout shift may exceed its max bits. On x86_64/ARMv8/MIPS, the
shift count would be masked to range 0 to 63. And on ARMv7 the result is
zero. If the shift count is masked, only several probes will be sent
with timeout shorter than TCP_RTO_MAX. But if the timeout is zero, it
needs tcp_retries2 times probes to end this false timeout. Besides,
bitwise shift greater than or equal to the width is an undefined
behavior.
This patch adds a limit to the backoff. The max value of max_when is
TCP_RTO_MAX and the min value of timeout base is TCP_RTO_MIN. The limit
is the backoff from TCP_RTO_MIN to TCP_RTO_MAX.
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Link: https://lore.kernel.org/r/20201208091910.37618-1-cambda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Core:
- Better handling of page table leaves on archictectures which have
architectures have non-pagetable aligned huge/large pages. For such
architectures a leaf can actually be part of a larger entry.
- Prevent a deadlock vs. exec_update_mutex
Architectures:
- The related updates for page size calculation of leaf entries
- The usual churn to support new CPUs
- Small fixes and improvements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XvgATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoUrdEACatdr93wv75vnm5tCZM4EsFvB2PzVJ
ck4K4+hHiMVV4802qf+kW5plF+rckAU4TAai/L7wkTntKHvjD/0/o1epoIStb+dS
SCpVkQMCLT/8xT242iHPOfgsQpVpJnIiBwVRjn8HXu82nXdgMJhKnBjTe634UfxW
o2OCFiyJzpRi5l86gVp67ueqgvl34NPI2JaSLc0g80QfZ8akzdePPpED35CzYjZh
41k+7ssvt6qch3vMUySHAhkX4gQl0nc80YAaF/XZbCfvdyY7D03PtfBjfvphTSK0
l54z9aWh0ciK9P1aPfvkHDXBJUR2VtUAx2GiURK+XU3jNk3KMrz9CcBl1D/exIAg
07IsiYVoB38YAUOZoR9K8p+p+5EuwYRRUMAgfQfBALCuaLQV477Cne82b2KmNCus
1izUQvcDDf0s74OyYTHWFXRGla95COJvNLzkrZ1oU3mX4HgdKdOAUbf/2XTLWeKO
3HOIS+jsg5cp82tRe4X5r51h73pONYlo9lLo/CjQXz25vMcXKtE/MZGq2gkRff4p
N4k88eQ5LOsRqUaU46GcHozXRCfcpW7SPI9AaN5I/fKGIZvHP7uMdMb+g5DV8yHI
dNZ8u5uLPHwdg80C3fJ3Pnp7VsVNHliPXMwv0vib7BCp7aUVZWeFnOntw3PdYFRk
XKEbfl36IuAadg==
=rZ99
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
"Core:
- Better handling of page table leaves on archictectures which have
architectures have non-pagetable aligned huge/large pages. For such
architectures a leaf can actually be part of a larger entry.
- Prevent a deadlock vs exec_update_mutex
Architectures:
- The related updates for page size calculation of leaf entries
- The usual churn to support new CPUs
- Small fixes and improvements all over the place"
* tag 'perf-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
perf/x86/intel: Add Tremont Topdown support
uprobes/x86: Fix fall-through warnings for Clang
perf/x86: Fix fall-through warnings for Clang
kprobes/x86: Fix fall-through warnings for Clang
perf/x86/intel/lbr: Fix the return type of get_lbr_cycles()
perf/x86/intel: Fix rtm_abort_event encoding on Ice Lake
x86/kprobes: Restore BTF if the single-stepping is cancelled
perf: Break deadlock involving exec_update_mutex
sparc64/mm: Implement pXX_leaf_size() support
powerpc/8xx: Implement pXX_leaf_size() support
arm64/mm: Implement pXX_leaf_size() support
perf/core: Fix arch_perf_get_page_size()
mm: Introduce pXX_leaf_size()
mm/gup: Provide gup_get_pte() more generic
perf/x86/intel: Add event constraint for CYCLE_ACTIVITY.STALLS_MEM_ANY
perf/x86/intel/uncore: Add Rocket Lake support
perf/x86/msr: Add Rocket Lake CPU support
perf/x86/cstate: Add Rocket Lake CPU support
perf/x86/intel: Add Rocket Lake CPU support
perf,mm: Handle non-page-table-aligned hugetlbfs
...
Mat Martineau says:
====================
mptcp: Another set of miscellaneous MPTCP fixes
This is another collection of MPTCP fixes and enhancements that we have
tested in the MPTCP tree:
Patch 1 cleans up cgroup attachment for in-kernel subflow sockets.
Patches 2 and 3 make sure that deletion of advertised addresses by an
MPTCP path manager when flushing all addresses behaves similarly to the
remove-single-address operation, and adds related tests.
Patches 4 and 8 do some minor cleanup.
Patches 5-7 add MPTCP_FASTCLOSE functionality. Note that patch 6 adds MPTCP
option parsing to tcp_reset().
Patch 9 optimizes skb size for outgoing MPTCP packets.
====================
Link: https://lore.kernel.org/r/20201210222506.222251-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the xmit path of the MPTCP protocol creates smaller-
than-max-size skbs, which is suboptimal for the performances.
There are a few things to improve:
- when coalescing to an existing skb, must clear the PUSH flag
- tcp_build_frag() expect the available space as an argument.
When coalescing is enable MPTCP already subtracted the
to-be-coalesced skb len. We must increment said argument
accordingly.
Before:
./use_mptcp.sh netperf -H 127.0.0.1 -t TCP_STREAM
[...]
131072 16384 16384 30.00 24414.86
After:
./use_mptcp.sh netperf -H 127.0.0.1 -t TCP_STREAM
[...]
131072 16384 16384 30.05 28357.69
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is no need to unconditionally acquire the join list
lock, we can simply splice the join list into the subflow
list and traverse only the latter.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parse the MPTCP FASTCLOSE subtype.
If provided key matches the local one, schedule the work queue to close
(with tcp reset) all subflows.
The MPTCP socket moves to closed state immediately.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Because TCP-level resets only affect the subflow, there is a MPTCP
option to indicate that the MPTCP-level connection should be closed
immediately without a mptcp-level fin exchange.
This is the 'MPTCP fast close option'. It can be carried on ack
segments or TCP resets. In the latter case, its needed to parse mptcp
options also for reset packets so that MPTCP can act accordingly.
Next patch will add receive side fastclose support in MPTCP.
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When processing options from tcp reset path its possible that
tcp_done(ssk) drops the last reference on the mptcp socket which
results in use-after-free.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use the macro MPTCPOPT_HMAC_LEN instead of a constant in struct
mptcp_options_received.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch added the flush addrs testcase. In do_transfer, if the number
of removing addresses is less than 8, use the del addr command to remove
the addresses one by one. If the number is more than 8, use the flush addrs
command to remove the addresses.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the PM netlink flushes the addresses, invoke the remove address
function mptcp_nl_remove_subflow_and_signal_addr to remove the addresses
and the subflows. Since this function should not be invoked under lock,
move __flush_addrs out of the pernet->lock.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It has been observed that the kernel sockets created for the subflows
(except the first one) are not in the same cgroup as their parents.
That's because the additional subflows are created by kernel workers.
This is a problem with eBPF programs attached to the parent's
cgroup won't be executed for the children. But also with any other features
of CGroup linked to a sk.
This patch fixes this behaviour.
As the subflow sockets are created by the kernel, we can't use
'mem_cgroup_sk_alloc' because of the current context being the one of the
kworker. This is why we have to do low level memcg manipulation, if
required.
Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- A few extensions to the rwsem API and support for opportunistic
spinning and lock stealing
- lockdep selftest improvements
- Documentation updates
- Cleanups and small fixes all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XvCgTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodbkD/9kmXCablxzCG+IGdRU0KfSvbalHkoS
hUW7sJ8qYdoysOVMdvImPwqxLDy/P6D8Nk6z+hdaPmfWvIDQQECd7Mg/UhZLkRzI
BGNgpatnzX4PK5sm/IFExCisPCkkbkjprocnk//TGjdwTiMMDxrndsEpwVwcucDp
TwOjPPxoAbfWHUmnv2SUOD7mWMqMH/ISTQlKUaz+UCQicPNuHumdsQKvZx3eu7Cv
KvucTso5Qjmyy0HwpmJO/IEyZs7Ibrb5Ocw5wds3yo2PFTjYTvo3JlJ16g8IvaZW
ckk+o+3QKp29oFAPQ+dFGEG10w4JQI3AZkDVouFR4BDD0sbOm7BvWCsVq/J8vk3i
xnmaHT3zB5F4T97O+osBj2KS4zLliOHohWzDNv1+JVBCfniYbPo5hqa/n7OO2oot
M3xXY3ddgfTEUOtvOPPfZwfG5XmPrgwj8iiyywlTQU4BR5rWYj2ehvhWOwugQJ6x
g56nQzuf3KmyoI2S+1GZoxtgWSLwoXbUAPL8p4lyvy6jKKFV84BOJeVac803BBUo
yLFBSvTfZ95iNc84XHjJOJ/MGE8e2hOGa2KEdxuh1qE5FPazBg5e2cQh2j125PLz
uyhelQn7SgAHSKSXSAOPq0JFsrmxRmkzIgG9zLSEqo+6g6uKdWgGYVCbEzOB+9gB
2tNEgP6Mfh+ARg==
=uqcN
-----END PGP SIGNATURE-----
Merge tag 'locking-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Thomas Gleixner:
"A moderate set of locking updates:
- A few extensions to the rwsem API and support for opportunistic
spinning and lock stealing
- lockdep selftest improvements
- Documentation updates
- Cleanups and small fixes all over the place"
* tag 'locking-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
seqlock: kernel-doc: Specify when preemption is automatically altered
seqlock: Prefix internal seqcount_t-only macros with a "do_"
Documentation: seqlock: s/LOCKTYPE/LOCKNAME/g
locking/rwsem: Remove reader optimistic spinning
locking/rwsem: Enable reader optimistic lock stealing
locking/rwsem: Prevent potential lock starvation
locking/rwsem: Pass the current atomic count to rwsem_down_read_slowpath()
locking/rwsem: Fold __down_{read,write}*()
locking/rwsem: Introduce rwsem_write_trylock()
locking/rwsem: Better collate rwsem_read_trylock()
rwsem: Implement down_read_interruptible
rwsem: Implement down_read_killable_nested
refcount: Fix a kernel-doc markup
completion: Drop init_completion define
atomic: Update MAINTAINERS
atomic: Delete obsolete documentation
seqlock: Rename __seqprop() users
lockdep/selftest: Add spin_nest_lock test
lockdep/selftests: Fix PROVE_RAW_LOCK_NESTING
seqlock: avoid -Wshadow warnings
...
This patch checks that MHI queue is not full before waking up the net
queue. This fix sporadic MHI queueing issues in xmit. Indeed xmit and
its symmetric complete callback (ul_callback) can run concurently, it
is then not safe to unconditionnaly waking the queue in the callback
without checking queue fullness.
Fixes: 3ffec6a14f ("net: Add mhi-net driver")
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Link: https://lore.kernel.org/r/1607599507-5879-1-git-send-email-loic.poulain@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The 6220 and 6250 switches do not have a learn2all bit in global1, ATU
control register; bit 3 is reserverd.
On the switches that do have that bit, it is used to control whether
learning frames are sent out the ports that have the message_port bit
set. So rather than adding yet another chip method, use the existence
of the ->port_setup_message_port method as a proxy for determining
whether the learn2all bit exists (and should be set).
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Link: https://lore.kernel.org/r/20201210110645.27765-1-rasmus.villemoes@prevas.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RCU:
- Avoid cpuinfo-induced IPI pileups and idle-CPU IPIs.
- Lockdep-RCU updates reducing the need for __maybe_unused.
- Tasks-RCU updates.
- Miscellaneous fixes.
- Documentation updates.
- Torture-test updates.
KCSAN:
- updates for selftests, avoiding setting watchpoints on NULL pointers
- fix to watchpoint encoding
LKMM:
- updates for documentation along with some updates to example-code
litmus tests
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/Xon4THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYobXUD/92LJTI/TMgK6Z6EEQBiJZO/2mNKjK8
FEKc6AqTNMlZNsWCfQ5UgqtHpn+MkBZsX1x4u22gehE1qaCB8gnQ5wXgbXon8tQm
exxVk6vvQZjseeqCMqrsUYQlD7dNgHnf1qAmWXJvji4sA/1Opo6n2M74tqfE2ueV
S5hpQwSuK/6Zu2Hrr62HD8+Fx0in6ZuKRZxHGp1392l++DGbniJM3dzntRXB+JbZ
w3PDHFCQuGzTytyeKuQV48ot9IK+2YzmjIp/+4tHL6mvU38xeSu6gcYtqKPcfYWw
D6HXvDa965h5IrFdSA2JWSzjJ+VYgZVElk2HyXDNIae0fM/8GidgoIDQipT1WAur
sxW/Ke4U6Jm5MMqXqV8iMNduktkGD1/h6G/iB1Yis29xFdthorNpbHVAP+8cKXgf
1cR6RorOuBYv6XpyzygHtE7qfLY5ST352pJ4+UqNzboujOcuEnGaygttt0F/F8sA
ZH8NT5dyUfbGeqepdZWkbj116Hjeg3fyV3CZeyBhDeqpjf1Nn3nbJ1xRksPLfa3i
IKvN7HSzEg+vKnsJNnQeFlAmQ/W3n2bedzRqfaCg77pNhKI6jPuavY5f2YGFUj0y
yx0UzOYoI1Cln0keBMmynbyUKgJ7zstLkrt/JenjhtD3B+0df5BmYjkL+nqkP6ax
+XTCu7Xg+B061g==
=N/iO
-----END PGP SIGNATURE-----
Merge tag 'core-rcu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Thomas Gleixner:
"RCU, LKMM and KCSAN updates collected by Paul McKenney.
RCU:
- Avoid cpuinfo-induced IPI pileups and idle-CPU IPIs
- Lockdep-RCU updates reducing the need for __maybe_unused
- Tasks-RCU updates
- Miscellaneous fixes
- Documentation updates
- Torture-test updates
KCSAN:
- updates for selftests, avoiding setting watchpoints on NULL pointers
- fix to watchpoint encoding
LKMM:
- updates for documentation along with some updates to example-code
litmus tests"
* tag 'core-rcu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
srcu: Take early exit on memory-allocation failure
rcu/tree: Defer kvfree_rcu() allocation to a clean context
rcu: Do not report strict GPs for outgoing CPUs
rcu: Fix a typo in rcu_blocking_is_gp() header comment
rcu: Prevent lockdep-RCU splats on lock acquisition/release
rcu/tree: nocb: Avoid raising softirq for offloaded ready-to-execute CBs
rcu,ftrace: Fix ftrace recursion
rcu/tree: Make struct kernel_param_ops definitions const
rcu/tree: Add a warning if CPU being onlined did not report QS already
rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config
rcu: Fix single-CPU check in rcu_blocking_is_gp()
rcu: Implement rcu_segcblist_is_offloaded() config dependent
list.h: Update comment to explicitly note circular lists
rcu: Panic after fixed number of stalls
x86/smpboot: Move rcu_cpu_starting() earlier
rcu: Allow rcu_irq_enter_check_tick() from NMI
tools/memory-model: Label MP tests' producers and consumers
tools/memory-model: Use "buf" and "flag" for message-passing tests
tools/memory-model: Add types to litmus tests
tools/memory-model: Add a glossary of LKMM terms
...
Currently, the exception actions are not processed correctly as the wrong
dataset is passed. This change fixes this, including the misleading
comment.
In addition, a check was added to make sure we work on an IPv4 packet,
and not just assume if it's not IPv6 it's IPv4.
This was all tested using OVS with patch,
https://patchwork.ozlabs.org/project/openvswitch/list/?series=21639,
applied and sending packets with a TTL of 1 (and 0), both with IPv4
and IPv6.
Fixes: 69929d4c49 ("net: openvswitch: fix TTL decrement action netlink message format")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160733569860.3007.12938188180387116741.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- More generalization of entry/exit functionality
- The consolidation work to reclaim TIF flags on x86 and also for non-x86
specific TIF flags which are solely relevant for syscall related work
and have been moved into their own storage space. The x86 specific part
had to be merged in to avoid a major conflict.
- The TIF_NOTIFY_SIGNAL work which replaces the inefficient signal
delivery mode of task work and results in an impressive performance
improvement for io_uring. The non-x86 consolidation of this is going to
come seperate via Jens.
- The selective syscall redirection facility which provides a clean and
efficient way to support the non-Linux syscalls of WINE by catching them
at syscall entry and redirecting them to the user space emulation. This
can be utilized for other purposes as well and has been designed
carefully to avoid overhead for the regular fastpath. This includes the
core changes and the x86 support code.
- Simplification of the context tracking entry/exit handling for the users
of the generic entry code which guarantee the proper ordering and
protection.
- Preparatory changes to make the generic entry code accomodate S390
specific requirements which are mostly related to their syscall restart
mechanism.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XoPoTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoe0tD/4jSKHIogVM9kVpiYfwjDGS1NluaBXn
71ZoASbX9GZebyGandMyF2QP1iJ24ZO0RztBwHEVH6fyomKB2iFNedssCpO9yfWV
3eFRpOvMpbszY2W2bd0QG3GrqaTttjVfB4ahkGLzqeSbchdob6hZpNDYtBZnujA6
GSnrrurfJkCGoQny+yJQYdQJXQU+BIX90B2a2Q+jW123Luy/iHXC1f/krZSA1m14
fC9xYLSUjPphTzh2ZOW+C3DgdjOL5PfAm/6F+DArt4GtLgrEGD7R74aLSFhvetky
dn5QtG+yAsz1i0cc5Wu/JBcT9tOkY92rPYSyLI9bYQUSQ/bMyuprz6oYKj3dubsu
ZSsKPdkNFPIniL4fLdCMWZcIXX5xgnrxKjdgXZXW3gtrcxSns8w8uED3Sh7dgE08
pgIeq67E5g/OB8kJXH1VxdewmeQb9cOmnzzHwNO7TrrGbBKjDTYHNdYOKf1dUTTK
ZX1UjLfGwxTkMYAbQD1k0JGZ2OLRshzSaH5BW/ZKa3bvJW6yYOq+/YT8B8hbJ8U3
vThlO75/55IJxS5r5Y3vZd/IHdsYbPuETD+TA8tNYtPqNZasW8nnk4TYctWqzDuO
/Ka1wvWYid3c6ySznQn4zSyRjr968AfHeZ9YTUMhWufy5waXVmdBMG41u3IKfsVt
osyzNc4EK19/Mg==
=hsjV
-----END PGP SIGNATURE-----
Merge tag 'core-entry-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core entry/exit updates from Thomas Gleixner:
"A set of updates for entry/exit handling:
- More generalization of entry/exit functionality
- The consolidation work to reclaim TIF flags on x86 and also for
non-x86 specific TIF flags which are solely relevant for syscall
related work and have been moved into their own storage space. The
x86 specific part had to be merged in to avoid a major conflict.
- The TIF_NOTIFY_SIGNAL work which replaces the inefficient signal
delivery mode of task work and results in an impressive performance
improvement for io_uring. The non-x86 consolidation of this is
going to come seperate via Jens.
- The selective syscall redirection facility which provides a clean
and efficient way to support the non-Linux syscalls of WINE by
catching them at syscall entry and redirecting them to the user
space emulation. This can be utilized for other purposes as well
and has been designed carefully to avoid overhead for the regular
fastpath. This includes the core changes and the x86 support code.
- Simplification of the context tracking entry/exit handling for the
users of the generic entry code which guarantee the proper ordering
and protection.
- Preparatory changes to make the generic entry code accomodate S390
specific requirements which are mostly related to their syscall
restart mechanism"
* tag 'core-entry-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
entry: Add syscall_exit_to_user_mode_work()
entry: Add exit_to_user_mode() wrapper
entry_Add_enter_from_user_mode_wrapper
entry: Rename exit_to_user_mode()
entry: Rename enter_from_user_mode()
docs: Document Syscall User Dispatch
selftests: Add benchmark for syscall user dispatch
selftests: Add kselftest for syscall user dispatch
entry: Support Syscall User Dispatch on common syscall entry
kernel: Implement selective syscall userspace redirection
signal: Expose SYS_USER_DISPATCH si_code type
x86: vdso: Expose sigreturn address on vdso to the kernel
MAINTAINERS: Add entry for common entry code
entry: Fix boot for !CONFIG_GENERIC_ENTRY
x86: Support HAVE_CONTEXT_TRACKING_OFFSTACK
context_tracking: Only define schedule_user() on !HAVE_CONTEXT_TRACKING_OFFSTACK archs
sched: Detect call to schedule from critical entry code
context_tracking: Don't implement exception_enter/exit() on CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK
context_tracking: Introduce HAVE_CONTEXT_TRACKING_OFFSTACK
x86: Reclaim unused x86 TI flags
...