This commit may cause a less than required dma mask to be used for
some allocations, which apparently leads to module load failures for
iwlwifi sometimes.
This reverts commit d657c5c73c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Fabio Coatti <fabio.coatti@gmail.com>
Tested-by: Fabio Coatti <fabio.coatti@gmail.com>
- Revert part of a recent ACPICA regression fix that added leading
newlines to ACPICA error messages and made the kernel log look
broken (Rafael Wysocki).
- Fix an ACPI battery driver regression introduced during the 4.17
cycle due to incorrect error handling that made Thinkpad 13
laptops crash on boot (Jouke Witteveen).
- Fix up the recently added PPTT ACPI table support by covering
the case when a PPTT structure represents a processors group
correctly (Sudeep Holla).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJbPf5UAAoJEILEb/54YlRxQjUQAJp1vfYp9w/zVT0fDQZIgkmS
k1BYw5LonMhGOls0qd8laQJ5g7wgc9rIeA6/oT/eCkvJVzOqYN0y49EKBdevLuB6
3K7NeFJtJwvQFPhQrrDBN2PL5SN6ZL9e16MxPZXeVrWN2wMAAdCZiDQN1QyEZNj2
SeFBnMiG/oYiX/87/mkbmwJyG8QX9onH6VywwRP1YGOpPnt/oLtpb3nPHh+Ep85Z
+k2LmICGdUbIadYLhO9UlvyWxaUoVIoyGt/p51vgB2wezfkqg6phKPbl3v1+3z0l
90R4Ls968yzdnZ/09e8ywAy2pUOPQe1BfzSxSrmW0+QF+83tHoOV8xoMo32571pb
/kJn+9qOa6SNfKDTK1thLbSQDgWe02qalWXmCcSyHcUbPaIeEozymyrdF8TZ6xmR
W190C+yQlSS8428B8fI8kPsslg1ZDjqRT2eSnvlAogLYX/wHHN5qygtKHc2iWMso
WL98G9KmcxIt6HtDKUCkW49mW1bACnD3RhrOVczCGPbyqk4asclsMFzqV1W+emMN
d0XaELMkkMaKfLAx3UPS+xNeMAQ/gshfUPm/oMbSjYrHpI+9l8XEUsirdjf3QWiY
8lBUtW7wZPqU8wjUtUh7e7P1F0UFHK4Dzp39IgCWuCnvuid/2mfBwAJ2LcTCaxzx
mYHKZwkBqs68wvuVQM1S
=8VBX
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix a recent ACPICA regression, fix a battery driver regression
introduced during the 4.17 cycle and fix up the recently added support
for the PPTT ACPI table.
Specifics:
- Revert part of a recent ACPICA regression fix that added leading
newlines to ACPICA error messages and made the kernel log look
broken (Rafael Wysocki).
- Fix an ACPI battery driver regression introduced during the 4.17
cycle due to incorrect error handling that made Thinkpad 13 laptops
crash on boot (Jouke Witteveen).
- Fix up the recently added PPTT ACPI table support by covering the
case when a PPTT structure represents a processors group correctly
(Sudeep Holla)"
* tag 'acpi-4.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / battery: Safe unregistering of hooks
ACPI / PPTT: use ACPI ID whenever ACPI_PPTT_ACPI_PROCESSOR_ID_VALID is set
ACPICA: Drop leading newlines from error messages
- Resume parallel PCI (non-PCIe) bridges on suspend-to-RAM (ACP S3)
to avoid confusing the platform firmware which started to happen
after a core power management regression fix that went in during
the 4.17 cycle (Rafael Wysocki).
- Fix up the recently added support for devices in multiple power
domains by avoiding to power up the entire domain unnecessarily
when attaching a device to it (Ulf Hansson).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJbPf3xAAoJEILEb/54YlRx6rcQAKlHlSZI1VwdnxJ1rYwyRLZ+
4Ss5DQQnEUYTaJUXQiwlfmV7tYqrvqsGNP3v1/vlhftdae9zGgH++rju8QxeNHfr
Yt/vTINIepbkH5Q747NSlw1MKpODIWqJKGdYmLNYeo8oKqAYHDmoipryk0r1J4ae
zQyc5nHCDiQtDQsOa0xw7XlVyeLyaHIZCuQ4FAyElTQ7T9t3KyxNtskuXqpJPndd
Kestr7P+Zx0km8GHMHZsY48IC6U561fkAhnk6S9wBUizzeMd+rQVzBLdAaJLS5Lr
zIIDY6RuYcBYcWcVhg6l5510NfuqEeSF50YFjRY1AWezGoRDA5LJkfUeXOt5G4g+
/uZqwL+8oLhTJptCDKPic9/kmTTCvIZxpZMQE391sAYdN6KpZfLpDWNToGEasr3v
ux6XLCUkmv+DOmPztXUNeKv//stCSBsWUl1tbplrvQ4ksuVNZ1U1JQ4mS3Qs0ki0
GXSCqxwM87EhrgJiRwxVd0SieC0szznYGs9zvVNsi1ecnC5SBUw3d+d5mwYdF/F4
WD+qwuuAtEGCg8qPWVTRD6xp58dojZijpynPW21HsSFy//FWR1KJwBxfd1ZkxGD2
iMntqTB3UGE/Nj5UKcsU8A6j8VojH7ndahfjdvWI9Tg8qE8g64xEjM3xomc3NevG
YfvxlNIX1pnDZoIO4iYg
=oF4i
-----END PGP SIGNATURE-----
Merge tag 'pm-4.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a PCI power management regression introduced during the 4.17
cycle and fix up the recently added support for devices in multiple
power domains.
Specifics:
- Resume parallel PCI (non-PCIe) bridges on suspend-to-RAM (ACP S3)
to avoid confusing the platform firmware which started to happen
after a core power management regression fix that went in during
the 4.17 cycle (Rafael Wysocki).
- Fix up the recently added support for devices in multiple power
domains by avoiding to power up the entire domain unnecessarily
when attaching a device to it (Ulf Hansson)"
* tag 'pm-4.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / Domains: Don't power on at attach for the multi PM domain case
PCI / ACPI / PM: Resume bridges w/o drivers on suspend-to-RAM
This patch set contains a handful of fixes for the RISC-V port:
* A fix to R_RISCV_ADD32/R_RISCV_SUB32 relocations that allows modules
that use these to load correctly.
* The removal of of_platform_populate(), which is obselete.
* The removal of irq-riscv-intc.h, which is obselete.
* A fix to PTRACE_SETREGSET.
* Fixes that allow the RV32I kernel to build (at least for Zong, I've
got another patch on the mailing list that's necessary on my setup
:)).
I've just given these a defconfig build test.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAls9OjoTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRDvTKFQLMurQf1kD/9I6PVt6fqgUvNKdrG8x9XskOI0eTPh
0+pZRUIOCu0cPBVE8YRLvuOs5wDzUsMDGNKC2UGV/Y8IJxBV2ObQ8KmrC8bbfiy6
EzYVM8oA12oT6k77DmFUhoZf87djRwIvueVuqd+CQhPI/6YjIgInemTiP+8UXHHd
fI0U4EtneiYWt7m8q9hZSXp0g7CtGLaadWRm88bDAhSEMif5O9WjQy1nAbT7WXeV
cNV6w91nru/zKCO0TrDp6zfYdBPo/M0bKALW7s2GRN7Oj/SxOegLaAq+jFp9M09c
5KLWCkcohUzsNrKgO9syHgCSm1V7pMOUsAVa7L+EisUR16WbnpZYGcHbyfCrCGwz
c8TQ3kZcpxEbvEhK+sZQZ0uvD2vNbg3wLGJUBmw7T/OvuQSs3GMMbRNOvQAhZcHp
uSqCdS7znYywFA/FRv8+/qttxSHEfPqrwWnduaL2lPnxGDDoBMa2QdYPd/iwajiT
+Epd5mg3csmxGhEyD9W5nkM4wojZs/6Wic8GF89kBx8K9tnt93cs07JlI7jC4Y+B
QCnuyMa3bjPSnHcAyhcK3Phor6Ik10JpLD3oiRngj3yWiuEtx3NX1dnHRDSGU7fw
/57vVKeLKumE0BzrkNSEogq5bAaKCWbr8iCUPfa+XeWXpvgb57Z/AD4DJ1tf83cu
wcZNg2jVd4Ylrw==
=9Bi0
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-4.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V fixes from Palmer Dabbelt:
"This contains a handful of fixes for the RISC-V port:
- A fix to R_RISCV_ADD32/R_RISCV_SUB32 relocations that allows
modules that use these to load correctly.
- The removal of of_platform_populate(), which is obselete.
- The removal of irq-riscv-intc.h, which is obselete.
- A fix to PTRACE_SETREGSET.
- Fixes that allow the RV32I kernel to build (at least for Zong, I've
got another patch on the mailing list that's necessary on my setup :)).
I've just given these a defconfig build test"
* tag 'riscv-for-linus-4.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
RISC-V: Fix PTRACE_SETREGSET bug.
RISC-V: Don't include irq-riscv-intc.h
riscv: remove unnecessary of_platform_populate call
RISC-V: fix R_RISCV_ADD32/R_RISCV_SUB32 relocations
RISC-V: Change variable type for 32-bit compatible
RISC-V: Add definiion of extract symbol's index and type for 32-bit
RISC-V: Select GENERIC_UCMPDI2 on RV32I
RISC-V: Add conditional macro for zone of DMA32
Pull m68knommu fix from Greg Ungerer:
"A single fix for breakage introduced in this merge window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: fix "bad page state" oops on ColdFire boot
Merge ACPICA regression fix and a fix for the recently added PPTT
support.
* acpi-tables:
ACPI / PPTT: use ACPI ID whenever ACPI_PPTT_ACPI_PROCESSOR_ID_VALID is set
* acpica:
ACPICA: Drop leading newlines from error messages
These patches for building 32-bit RISC-V kernel.
- Fix the compile errors and warnings on RV32I.
- Fix some incompatible problem on RV32I.
- Add format.h for compatible of print format.
The fixed width integer types format for Elf_Addr will move to
generic header by another patch. For now, there are some warning
about unexpected argument of type on RV32I.
Change in v1:
- Fix some error in v1
- Remove implementation of fixed width integer types format for Elf_Addr.
In riscv_gpr_set, pass regs instead of ®s to user_regset_copyin to fix
gdb segfault.
Signed-off-by: Jim Wilson <jimw@sifive.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
This file has never existed in the upstream kernel, but it's guarded by
an #ifdef that's also never existed in the upstream kernel. As a part
of our interrupt controller refactoring this header is no longer
necessary, but this reference managed to sneak in anyway.
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
The DT core will call of_platform_default_populate, so it is not
necessary for arch specific code to call it unless there are custom
match entries, auxdata or parent device. Neither of those apply here, so
remove the call.
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: linux-riscv@lists.infradead.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
The R_RISCV_ADD32/R_RISCV_SUB32 relocations should add/subtract the
address of the symbol (without overflow check), not its contents.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
On 32-bit, it need to use __ucmpdi2, otherwise, it can't find the __ucmpdi2
symbol.
Signed-off-by: Zong Li <zong@andestech.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
The DMA32 is for 64-bit usage.
Signed-off-by: Zong Li <zong@andestech.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
A hooking API was implemented for 4.17 in fa93854f7a followed
by hooks for Thinkpad laptops in 2801b9683f. The Thinkpad
drivers did not support the Thinkpad 13 and the hooking API crashes
on unsupported batteries by altering a list of hooks during unsafe
iteration. Thus, Thinkpad 13 laptops could no longer boot.
Additionally, a lock was kept in place and debugging information was
printed out of order.
Fixes: fa93854f7a (battery: Add the battery hooking API)
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
Signed-off-by: Jouke Witteveen <j.witteveen@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If struct page is poisoned, and uninitialized access is detected via
PF_POISONED_CHECK(page) dump_page() is called to output the page. But,
the dump_page() itself accesses struct page to determine how to print
it, and therefore gets into a recursive loop.
For example:
dump_page()
__dump_page()
PageSlab(page)
PF_POISONED_CHECK(page)
VM_BUG_ON_PGFLAGS(PagePoisoned(page), page)
dump_page() recursion loop.
Link: http://lkml.kernel.org/r/20180702180536.2552-1-pasha.tatashin@oracle.com
Fixes: f165b378bb ("mm: uninitialized struct page poisoning sanity checking")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ARM trusted foundations code is currently broken in linux-next when
CONFIG_KCOV_INSTRUMENT_ALL is set:
/tmp/ccHdQsCI.s: Assembler messages:
/tmp/ccHdQsCI.s:37: Error: .err encountered
/tmp/ccHdQsCI.s:38: Error: .err encountered
/tmp/ccHdQsCI.s:39: Error: .err encountered
scripts/Makefile.build:311: recipe for target 'arch/arm/firmware/trusted_foundations.o' failed
I could not find a function attribute that lets me disable
-fsanitize-coverage=trace-pc for just one function, so this turns it off
for the entire file instead.
Link: http://lkml.kernel.org/r/20180529103636.1535457-1-arnd@arndb.de
Fixes: 758517202b ("arm: port KCOV to arm")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Olof Johansson <olof@lixom.net>
Tested-by: Olof Johansson <olof@lixom.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a special case that the size is "(N << KASAN_SHADOW_SCALE_SHIFT)
Pages plus X", the value of X is [1, KASAN_SHADOW_SCALE_SIZE-1]. The
operation "size >> KASAN_SHADOW_SCALE_SHIFT" will drop X, and the
roundup operation can not retrieve the missed one page. For example:
size=0x28006, PAGE_SIZE=0x1000, KASAN_SHADOW_SCALE_SHIFT=3, we will get
shadow_size=0x5000, but actually we need 6 pages.
shadow_size = round_up(size >> KASAN_SHADOW_SCALE_SHIFT, PAGE_SIZE);
This can lead to a kernel crash when kasan is enabled and the value of
mod->core_layout.size or mod->init_layout.size is like above. Because
the shadow memory of X has not been allocated and mapped.
move_module:
ptr = module_alloc(mod->core_layout.size);
...
memset(ptr, 0, mod->core_layout.size); //crashed
Unable to handle kernel paging request at virtual address ffff0fffff97b000
......
Call trace:
__asan_storeN+0x174/0x1a8
memset+0x24/0x48
layout_and_allocate+0xcd8/0x1800
load_module+0x190/0x23e8
SyS_finit_module+0x148/0x180
Link: http://lkml.kernel.org/r/1529659626-12660-1-git-send-email-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Dmitriy Vyukov <dvyukov@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Libin <huawei.libin@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When booting with very large numbers of gigantic (i.e. 1G) pages, the
operations in the loop of gather_bootmem_prealloc, and specifically
prep_compound_gigantic_page, takes a very long time, and can cause a
softlockup if enough pages are requested at boot.
For example booting with 3844 1G pages requires prepping
(set_compound_head, init the count) over 1 billion 4K tail pages, which
takes considerable time.
Add a cond_resched() to the outer loop in gather_bootmem_prealloc() to
prevent this lockup.
Tested: Booted with softlockup_panic=1 hugepagesz=1G hugepages=3844 and
no softlockup is reported, and the hugepages are reported as
successfully setup.
Link: http://lkml.kernel.org/r/20180627214447.260804-1-cannonmatthews@google.com
Signed-off-by: Cannon Matthews <cannonmatthews@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use huge_ptep_get() to translate huge ptes to normal ptes so we can
check them with the huge_pte_* functions. Otherwise some architectures
will check the wrong values and will not wait for userspace to bring in
the memory.
Link: http://lkml.kernel.org/r/20180626132421.78084-1-frankja@linux.ibm.com
Fixes: 369cd2121b ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
My networking merge (commit 4e33d7d479: "Pull networking fixes from
David Miller") got the poll() handling conflict wrong for af_smc.
The conflict between my a11e1d432b ("Revert changes to convert to
->poll_mask() and aio IOCB_CMD_POLL") and Ursula Braun's 24ac3a08e6
("net/smc: rebuild nonblocking connect") should have left the call to
sock_poll_wait() in place, just without the socket lock release/retake.
And I really should have realized that. But happily, I at least asked
Ursula to double-check the merge, and she set me right.
This also fixes an incidental whitespace issue nearby that annoyed me
while looking at this.
Pointed-out-by: Ursula Braun <ubraun@linux.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are no legacy behavior in drivers to consider while attaching a
device to genpd - for the multiple PM domain case.
For that reason, let's instead require the driver to runtime resume the
device, via calling pm_runtime_get_sync() for example, when it needs to
power on the corresponding PM domain.
This allows us to improve the situation during attach. Instead of always
power on the PM domain, which may be unnecessary, let's leave it in its
current state. Additionally, to avoid the PM domain to stay powered on,
let's schedule a power off work.
Fixes: 3c095f32a9 (PM / Domains: Add support for multi PM domains ...)
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, we use the ACPI processor ID only for the leaf/processor nodes
as the specification states it must match the value of the ACPI processor
ID field in the processor’s entry in the MADT.
However, if a PPTT structure represents a processors group, it
matches a processor container UID in the namespace and the
ACPI_PPTT_ACPI_PROCESSOR_ID_VALID flag indicates whether the
ACPI processor ID is valid.
Let's use UID whenever ACPI_PPTT_ACPI_PROCESSOR_ID_VALID is set to be
consistent instead of using table offset as it's currently done for
non-leaf nodes.
Fixes: 2bd00bcd73 (ACPI/PPTT: Add Processor Properties Topology Table parsing)
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Jeremy Linton <jeremy.linton@arm.com>
[ rjw: Changelog (minor) ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull MD fixes from Shaohua Li:
"Two small fixes for MD:
- an error handling fix from me
- a recover bug fix for raid10 from BingJing"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/raid10: fix that replacement cannot complete recovery after reassemble
MD: cleanup resources in failure
Two fixes here which were breaking OpenRISC boot.
- Fix bug in __pte_free_tlb() exposed in 4.18 by Matthew Wilcox's page
table flag addition.
- Fix issue booting on real hardware if delay slot detection emulation
is disabled.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbOjFIAAoJEMOzHC1eZifkvesP/1WPmI9M6g57kkky7uU5MJi6
cdarrEJbk3KrGFQCJeDkYB3rNQ+NuGebNfbe1AJZabot8raCvU6eGcsvkOVMM4ik
v3iN7Dp4NstKJJ3nr1uAihhJpJdIrVH6caJd21Do23SZGrjUUaa621g72nUCxZT1
u1i4M9YLrUazMtIWhOBL4nkSmVmxL2Qc1fywg/ahDfeUSkqoY3su98HG/sc4t7Yx
j1Bg+ugJyXR87G6mo+wlXF9Y+lXCycSVQC8TEdD0ku9qQzGKsb9ER/wJUSFLcQbP
lrny+rYW79VEbht69NavXTyGV+k+F5+Jr9+w6XN36me3NbmgrBPucpmLj6iGMRDf
xJ0+rS+4/ECy6rGDc3Q3p6SaL/YfJeib0XxmrH5ACg7B4k0Iczk5nuL6sbPcEDLw
a7dOWlLH6DLxmeDF68ExQNi//R+wLe/MRxmOHAoBbyIAXbq+2cvGqp8Jk1V8JQP3
hgQA9BLFb72o7djepJ0MOynXE6nQbWoTIUDQqoy4sLwqCUT40JnRjC4/ji9OcFBe
Ma3CrTTu0RA3U0e984mP025f6MQrLIyhU0AdA+iadnrarC+FIpe/4bzhYfL1OAfy
chsOKAvQnzD9y3b01gbql1x6JV6ro91YGwtP0vdfjiyahQBICIzrglxoZ6byY6AQ
RrwXPgBn8BFEaxAzUBGj
=7uxj
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://github.com/stffrdhrn/linux
Pull OpenRISC fixes from Stafford Horne:
"Two fixes for issues which were breaking OpenRISC boot:
- Fix bug in __pte_free_tlb() exposed in 4.18 by Matthew Wilcox's
page table flag addition.
- Fix issue booting on real hardware if delay slot detection
emulation is disabled"
* tag 'for-linus' of git://github.com/stffrdhrn/linux:
openrisc: entry: Fix delay slot exception detection
openrisc: Call destructor during __pte_free_tlb
Pull networking fixes from David Miller:
1) Verify netlink attributes properly in nf_queue, from Eric Dumazet.
2) Need to bump memory lock rlimit for test_sockmap bpf test, from
Yonghong Song.
3) Fix VLAN handling in lan78xx driver, from Dave Stevenson.
4) Fix uninitialized read in nf_log, from Jann Horn.
5) Fix raw command length parsing in mlx5, from Alex Vesker.
6) Cleanup loopback RDS connections upon netns deletion, from Sowmini
Varadhan.
7) Fix regressions in FIB rule matching during create, from Jason A.
Donenfeld and Roopa Prabhu.
8) Fix mpls ether type detection in nfp, from Pieter Jansen van Vuuren.
9) More bpfilter build fixes/adjustments from Masahiro Yamada.
10) Fix XDP_{TX,REDIRECT} flushing in various drivers, from Jesper
Dangaard Brouer.
11) fib_tests.sh file permissions were broken, from Shuah Khan.
12) Make sure BH/preemption is disabled in data path of mac80211, from
Denis Kenzior.
13) Don't ignore nla_parse_nested() return values in nl80211, from
Johannes berg.
14) Properly account sock objects ot kmemcg, from Shakeel Butt.
15) Adjustments to setting bpf program permissions to read-only, from
Daniel Borkmann.
16) TCP Fast Open key endianness was broken, it always took on the host
endiannness. Whoops. Explicitly make it little endian. From Yuching
Cheng.
17) Fix prefix route setting for link local addresses in ipv6, from
David Ahern.
18) Potential Spectre v1 in zatm driver, from Gustavo A. R. Silva.
19) Various bpf sockmap fixes, from John Fastabend.
20) Use after free for GRO with ESP, from Sabrina Dubroca.
21) Passing bogus flags to crypto_alloc_shash() in ipv6 SR code, from
Eric Biggers.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
qede: Adverstise software timestamp caps when PHC is not available.
qed: Fix use of incorrect size in memcpy call.
qed: Fix setting of incorrect eswitch mode.
qed: Limit msix vectors in kdump kernel to the minimum required count.
ipvlan: call dev_change_flags when ipvlan mode is reset
ipv6: sr: fix passing wrong flags to crypto_alloc_shash()
net: fix use-after-free in GRO with ESP
tcp: prevent bogus FRTO undos with non-SACK flows
bpf: sockhash, add release routine
bpf: sockhash fix omitted bucket lock in sock_close
bpf: sockmap, fix smap_list_map_remove when psock is in many maps
bpf: sockmap, fix crash when ipv6 sock is added
net: fib_rules: bring back rule_exists to match rule during add
hv_netvsc: split sub-channel setup into async and sync
net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN
atm: zatm: Fix potential Spectre v1
s390/qeth: consistently re-enable device features
s390/qeth: don't clobber buffer on async TX completion
s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6]
s390/qeth: fix race when setting MAC address
...
Sudarsana Reddy Kalluru says:
====================
qed*: Fix series.
The patch series addresses few issues in the qed* drivers.
Please consider applying it to 'net' branch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When ptp clock is not available for a PF (e.g., higher PFs in NPAR mode),
get-tsinfo() callback should return the software timestamp capabilities
instead of returning the error.
Fixes: 4c55215c ("qede: Add driver support for PTP")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the correct size value while copying chassis/port id values.
Fixes: 6ad8c632e ("qed: Add support for query/config dcbx.")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default, driver sets the eswitch mode incorrectly as VEB (virtual
Ethernet bridging).
Need to set VEB eswitch mode only when sriov is enabled, and it should be
to set NONE by default. The patch incorporates this change.
Fixes: 0fefbfbaa ("qed*: Management firmware - notifications and defaults")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Memory size is limited in the kdump kernel environment. Allocation of more
msix-vectors (or queues) consumes few tens of MBs of memory, which might
lead to the kdump kernel failure.
This patch adds changes to limit the number of MSI-X vectors in kdump
kernel to minimum required value (i.e., 2 per engine).
Fixes: fe56b9e6a ("qed: Add module with basic common support")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After we change the ipvlan mode from l3 to l2, or vice versa, we only
reset IFF_NOARP flag, but don't flush the ARP table cache, which will
cause eth->h_dest to be equal to eth->h_source in ipvlan_xmit_mode_l2().
Then the message will not come out of host.
Here is the reproducer on local host:
ip link set eth1 up
ip addr add 192.168.1.1/24 dev eth1
ip link add link eth1 ipvlan1 type ipvlan mode l3
ip netns add net1
ip link set ipvlan1 netns net1
ip netns exec net1 ip link set ipvlan1 up
ip netns exec net1 ip addr add 192.168.2.1/24 dev ipvlan1
ip route add 192.168.2.0/24 via 192.168.1.2
ping 192.168.2.2 -c 2
ip netns exec net1 ip link set ipvlan1 type ipvlan mode l2
ping 192.168.2.2 -c 2
Add the same configuration on remote host. After we set the mode to l2,
we could find that the src/dst MAC addresses are the same on eth1:
21:26:06.648565 00:b7:13:ad:d3:05 > 00:b7:13:ad:d3:05, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58356, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.2.1 > 192.168.2.2: ICMP echo request, id 22686, seq 1, length 64
Fix this by calling dev_change_flags(), which will call netdevice notifier
with flag change info.
v2:
a) As pointed out by Wang Cong, check return value for dev_change_flags() when
change dev flags.
b) As suggested by Stefano and Sabrina, move flags setting before l3mdev_ops.
So we don't need to redo ipvlan_{, un}register_nf_hook() again in err path.
Reported-by: Jianlin Shi <jishi@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Fixes: 2ad7bf3638 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags,
not 'gfp_t'. So don't pass GFP_KERNEL to it.
Fixes: bf355b8d2c ("ipv6: sr: add core files for SR HMAC support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the addition of GRO for ESP, gro_receive can consume the skb and
return -EINPROGRESS. In that case, the lower layer GRO handler cannot
touch the skb anymore.
Commit 5f114163f2 ("net: Add a skb_gro_flush_final helper.") converted
some of the gro_receive handlers that can lead to ESP's gro_receive so
that they wouldn't access the skb when -EINPROGRESS is returned, but
missed other spots, mainly in tunneling protocols.
This patch finishes the conversion to using skb_gro_flush_final(), and
adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
GUE.
Fixes: 5f114163f2 ("net: Add a skb_gro_flush_final helper.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Booting a ColdFire m68k core with MMU enabled causes a "bad page state"
oops since commit 1d40a5ea01 ("mm: mark pages in use for page tables"):
BUG: Bad page state in process sh pfn:01ce2
page:004fefc8 count:0 mapcount:-1024 mapping:00000000 index:0x0
flags: 0x0()
raw: 00000000 00000000 00000000 fffffbff 00000000 00000100 00000200 00000000
raw: 039c4000
page dumped because: nonzero mapcount
Modules linked in:
CPU: 0 PID: 22 Comm: sh Not tainted 4.17.0-07461-g1d40a5ea01d5 #13
Fix by calling pgtable_page_dtor() in our __pte_free_tlb() code path,
so that the PG_table flag is cleared before we free the pte page.
Note that I had to change the type of pte_free() to be static from
extern. Otherwise you get a lot of warnings like this:
./arch/m68k/include/asm/mcf_pgalloc.h:80:2: warning: ‘pgtable_page_dtor’ is static but used in inline function ‘pte_free’ which is not static
pgtable_page_dtor(page);
^
And making it static is consistent with our use of this in the other
m68k pgalloc definitions of pte_free().
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
CC: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAls4zz0ACgkQxWXV+ddt
WDs0ZhAAplEAcN1BP986BS7GpjrG20vQtP9AnHlnSEJJJmsnykpspBylOcLRkKjF
LKBfBPCKqIo7kn5ebKT1Kk7zJPkOOEfmxGW7hffVN/oa/oMtmgJbbHDUgl2TDgdu
rky1O+Bj+S37s5rhiXAJ4oU9ekdpWIlN30GczfynjiPqGigKh/cINsEEhQIIAiJG
PRDQfSIJeh67x1AP0KE8sJAYSsaeFxT+kHrT/NPs1NFDSzrQSa/QWPFVjGVVuI/Y
w84Mo0EqdRV7tap7D3QyWyYea6zdP00PG8TyLl0Kba+LckFbzpNN5hP3SUxleBzL
0ZBJi7/tOqnrMV3YaGm40dLfgD4B+CFt8zDyg2JvWUxxEzfQfYif7KIT2IV8fSqS
QrVw2NrzQC7EZ4Zu98wCN7dyyOE8yhqbq805YdG3Nj+zT6DqRu01TBo4Yr/Ek8ux
+ITAtQVbaOZmTIt/qh/Oxc5jRsurAno1FP3XRH+1hfSlS7xc3LfI1CUbX3jAKzXN
edxdM4/h+d4nekvROnKBH4EheS6+ZVfgzYlYUW9c2rjcJ1RHhDElbh14+IoM6LKJ
nJ+Cp+744F6W5jaG3oWElJrdhlY31mWUjiZaj2CHl16EcH3MToytrxKMX+OWo95W
gChnKicrtpO6+9nbED3Tdhp7SkbDysun6jvEpSgdlm8+2H5Kwrw=
=KWL7
-----END PGP SIGNATURE-----
Merge tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"We have a few regression fixes for qgroup rescan status tracking and
the vm_fault_t conversion that mixed up the error values"
* tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix mount failure when qgroup rescan is in progress
Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion
btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf
Pull vfs fix from Al Viro:
"Followup to procfs-seq_file series this window"
This fixes a memory leak by making sure that proc seq files release any
private data on close. The 'proc_seq_open' has to be properly paired
with 'proc_seq_release' that releases the extra private data.
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
proc: add proc_seq_release
Here are a few small staging and IIO driver fixes for 4.18-rc3.
Nothing major or big, all just fixes for reported problems since
4.18-rc1. All of these have been in linux-next this week with no
reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWziQ7w8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynwZACgjpcPVTEiA0W02s4B/GhmVt1NMeUAnjgeLDzY
yRz6SX18lSmjHqCUXAk1
=/2QW
-----END PGP SIGNATURE-----
Merge tag 'staging-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO fixes from Greg KH:
"Here are a few small staging and IIO driver fixes for 4.18-rc3.
Nothing major or big, all just fixes for reported problems since
4.18-rc1. All of these have been in linux-next this week with no
reported problems"
* tag 'staging-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: android: ion: Return an ERR_PTR in ion_map_kernel
staging: comedi: quatech_daqp_cs: fix no-op loop daqp_ao_insn_write()
iio: imu: inv_mpu6050: Fix probe() failure on older ACPI based machines
iio: buffer: fix the function signature to match implementation
iio: mma8452: Fix ignoring MMA8452_INT_DRDY
iio: tsl2x7x/tsl2772: avoid potential division by zero
iio: pressure: bmp280: fix relative humidity unit
Here are 5 fixes for the tty core and some serial drivers.
The tty core one fix some security and other issues reported by the
syzbot that I have taken too long in responding to (sorry Tetsuo!). The
8350 serial driver fix resolves an issue of devices that used to work
properly stopping working as they shouldn't have been added to a
blacklist.
All of these have been in linux-next for a few days with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWziSHw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ym5LACg0xI1fC7LAKrLSLKglU/H4Wsv6b0AoNIkfbWi
wxAZZKscwFVKNpv6gN9n
=YgAj
-----END PGP SIGNATURE-----
Merge tag 'tty-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are five fixes for the tty core and some serial drivers.
The tty core ones fix some security and other issues reported by the
syzbot that I have taken too long in responding to (sorry Tetsuo!).
The 8350 serial driver fix resolves an issue of devices that used to
work properly stopping working as they shouldn't have been added to a
blacklist.
All of these have been in linux-next for a few days with no reported
issues"
* tag 'tty-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vt: prevent leaking uninitialized data to userspace via /dev/vcs*
serdev: fix memleak on module unload
serial: 8250_pci: Remove stalled entries in blacklist
n_tty: Access echo_* variables carefully.
n_tty: Fix stall at n_tty_receive_char_special().
Here is a number of USB gadget and other driver fixes for 4.18-rc3.
There's a bunch of them here, most of them being gadget driver and xhci
host controller fixes for reported issues (as normal), but there are
also some new device ids, and some fixes for the typec code.
There is an acpi core patch in here that was acked by the acpi
maintainer as it is needed for the typec fixes in order to properly
solve a problem in that driver.
All of these have been in linux-next this week with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWziS+Q8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yn8QQCfU8PnLo+uPoV+DU7Nm1486mHTP4YAoLU3Q0nE
oRmnzA3TnxytluWgbC7M
=H8MG
-----END PGP SIGNATURE-----
Merge tag 'usb-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here is a number of USB gadget and other driver fixes for 4.18-rc3.
There's a bunch of them here, most of them being gadget driver and
xhci host controller fixes for reported issues (as normal), but there
are also some new device ids, and some fixes for the typec code.
There is an acpi core patch in here that was acked by the acpi
maintainer as it is needed for the typec fixes in order to properly
solve a problem in that driver.
All of these have been in linux-next this week with no reported
issues"
* tag 'usb-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
usb: chipidea: host: fix disconnection detect issue
usb: typec: tcpm: fix logbuffer index is wrong if _tcpm_log is re-entered
typec: tcpm: Fix a msecs vs jiffies bug
NFC: pn533: Fix wrong GFP flag usage
usb: cdc_acm: Add quirk for Uniden UBC125 scanner
staging/typec: fix tcpci_rt1711h build errors
usb: typec: ucsi: Fix for incorrect status data issue
usb: typec: ucsi: acpi: Workaround for cache mode issue
acpi: Add helper for deactivating memory region
usb: xhci: increase CRS timeout value
usb: xhci: tegra: fix runtime PM error handling
usb: xhci: remove the code build warning
xhci: Fix kernel oops in trace_xhci_free_virt_device
xhci: Fix perceived dead host due to runtime suspend race with event handler
dwc2: gadget: Fix ISOC IN DDMA PID bitfield value calculation
usb: gadget: dwc2: fix memory leak in gadget_init()
usb: gadget: composite: fix delayed_status race condition when set_interface
usb: dwc2: fix isoc split in transfer with no data
usb: dwc2: alloc dma aligned buffer for isoc split in
usb: dwc2: fix the incorrect bitmaps for the ports of multi_tt hub
...
If SACK is not enabled and the first cumulative ACK after the RTO
retransmission covers more than the retransmitted skb, a spurious
FRTO undo will trigger (assuming FRTO is enabled for that RTO).
The reason is that any non-retransmitted segment acknowledged will
set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
no indication that it would have been delivered for real (the
scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
case so the check for that bit won't help like it does with SACK).
Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
in tcp_process_loss.
We need to use more strict condition for non-SACK case and check
that none of the cumulatively ACKed segments were retransmitted
to prove that progress is due to original transmissions. Only then
keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
non-SACK case.
(FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS
to better indicate its purpose but to keep this change minimal, it
will be done in another patch).
Besides burstiness and congestion control violations, this problem
can result in RTO loop: When the loss recovery is prematurely
undoed, only new data will be transmitted (if available) and
the next retransmission can occur only after a new RTO which in case
of multiple losses (that are not for consecutive packets) requires
one RTO per loss to recover.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally in patch e6d20c55a4 ("openrisc: entry: Fix delay slot
detection") I fixed delay slot detection, but only for QEMU. We missed
that hardware delay slot detection using delay slot exception flag (DSX)
was still broken. This was because QEMU set the DSX flag in both
pre-exception supervision register (ESR) and supervision register (SR)
register, but on real hardware the DSX flag is only set on the SR
register during exceptions.
Fix this by carrying the DSX flag into the SR register during exception.
We also update the DSX flag read locations to read the value from the SR
register not the pt_regs SR register which represents ESR. The ESR
should never have the DSX flag set.
In the process I updated/removed a few comments to match the current
state. Including removing a comment saying that the DSX detection logic
was inefficient and needed to be rewritten.
I have tested this on QEMU with a patch ensuring it matches the hardware
specification.
Link: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg00000.html
Fixes: e6d20c55a4 ("openrisc: entry: Fix delay slot detection")
Signed-off-by: Stafford Horne <shorne@gmail.com>
Daniel Borkmann says:
====================
pull-request: bpf 2018-07-01
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) A bpf_fib_lookup() helper fix to change the API before freeze to
return an encoding of the FIB lookup result and return the nexthop
device index in the params struct (instead of device index as return
code that we had before), from David.
2) Various BPF JIT fixes to address syzkaller fallout, that is, do not
reject progs when set_memory_*() fails since it could still be RO.
Also arm32 JIT was not using bpf_jit_binary_lock_ro() API which was
an issue, and a memory leak in s390 JIT found during review, from
Daniel.
3) Multiple fixes for sockmap/hash to address most of the syzkaller
triggered bugs. Usage with IPv6 was crashing, a GPF in bpf_tcp_close(),
a missing sock_map_release() routine to hook up to callbacks, and a
fix for an omitted bucket lock in sock_close(), from John.
4) Two bpftool fixes to remove duplicated error message on program load,
and another one to close the libbpf object after program load. One
additional fix for nfp driver's BPF offload to avoid stopping offload
completely if replace of program failed, from Jakub.
5) Couple of BPF selftest fixes that bail out in some of the test
scripts if the user does not have the right privileges, from Jeffrin.
6) Fixes in test_bpf for s390 when CONFIG_BPF_JIT_ALWAYS_ON is set
where we need to set the flag that some of the test cases are expected
to fail, from Kleber.
7) Fix to detangle BPF_LIRC_MODE2 dependency from CONFIG_CGROUP_BPF
since it has no relation to it and lirc2 users often have configs
without cgroups enabled and thus would not be able to use it, from Sean.
8) Fix a selftest failure in sockmap by removing a useless setrlimit()
call that would set a too low limit where at the same time we are
already including bpf_rlimit.h that does the job, from Yonghong.
9) Fix BPF selftest config with missing missing NET_SCHED, from Anders.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend says:
====================
This addresses two syzbot issues that lead to identifying (by Eric and
Wei) a class of bugs where we don't correctly check for IPv4/v6
sockets and their associated state. The second issue was a locking
omission in sockhash.
The first patch addresses IPv6 socks and fixing an error where
sockhash would overwrite the prot pointer with IPv4 prot. To fix
this build similar solution to TLS ULP. Although we continue to
allow socks in all states not just ESTABLISH in this patch set
because as Martin points out there should be no issue with this
on the sockmap ULP because we don't use the ctx in this code. Once
multiple ULPs coexist we may need to revisit this. However we
can do this in *next trees.
The other issue syzbot found that the tcp_close() handler missed
locking the hash bucket lock which could result in corrupting the
sockhash bucket list if delete and close ran at the same time.
And also the smap_list_remove() routine was not working correctly
at all. This was not caught in my testing because in general my
tests (to date at least lets add some more robust selftest in
bpf-next) do things in the "expected" order, create map, add socks,
delete socks, then tear down maps. The tests we have that do the
ops out of this order where only working on single maps not multi-
maps so we never saw the issue. Thanks syzbot. The fix is to
restructure the tcp_close() lock handling. And fix the obvious
bug in smap_list_remove().
Finally, during review I noticed the release handler was omitted
from the upstream code (patch 4) due to an incorrect merge conflict
fix when I ported the code to latest bpf-next before submitting.
This would leave references to the map around if the user never
closes the map.
v3: rework patches, dropping ESTABLISH check and adding rcu
annotation along with the smap_list_remove fix
v4: missed one more case where maps was being accessed without
the sk_callback_lock, spoted by Martin as well.
v5: changed to use a specific lock for maps and reduced callback
lock so that it is only used to gaurd sk callbacks. I think
this makes the logic a bit cleaner and avoids confusion
ovoer what each lock is doing.
Also big thanks to Martin for thorough review he caught at least
one case where I missed a rcu_call().
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add map_release_uref pointer to hashmap ops. This was dropped when
original sockhash code was ported into bpf-next before initial
commit.
Fixes: 8111038444 ("bpf: sockmap, add hash map support")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
First the sk_callback_lock() was being used to protect both the
sock callback hooks and the psock->maps list. This got overly
convoluted after the addition of sockhash (in sockmap it made
some sense because masp and callbacks were tightly coupled) so
lets split out a specific lock for maps and only use the callback
lock for its intended purpose. This fixes a couple cases where
we missed using maps lock when it was in fact needed. Also this
makes it easier to follow the code because now we can put the
locking closer to the actual code its serializing.
Next, in sock_hash_delete_elem() the pattern was as follows,
sock_hash_delete_elem()
[...]
spin_lock(bucket_lock)
l = lookup_elem_raw()
if (l)
hlist_del_rcu()
write_lock(sk_callback_lock)
.... destroy psock ...
write_unlock(sk_callback_lock)
spin_unlock(bucket_lock)
The ordering is necessary because we only know the {p}sock after
dereferencing the hash table which we can't do unless we have the
bucket lock held. Once we have the bucket lock and the psock element
it is deleted from the hashmap to ensure any other path doing a lookup
will fail. Finally, the refcnt is decremented and if zero the psock
is destroyed.
In parallel with the above (or free'ing the map) a tcp close event
may trigger tcp_close(). Which at the moment omits the bucket lock
altogether (oops!) where the flow looks like this,
bpf_tcp_close()
[...]
write_lock(sk_callback_lock)
for each psock->maps // list of maps this sock is part of
hlist_del_rcu(ref_hash_node);
.... destroy psock ...
write_unlock(sk_callback_lock)
Obviously, and demonstrated by syzbot, this is broken because
we can have multiple threads deleting entries via hlist_del_rcu().
To fix this we might be tempted to wrap the hlist operation in a
bucket lock but that would create a lock inversion problem. In
summary to follow locking rules the psocks maps list needs the
sk_callback_lock (after this patch maps_lock) but we need the bucket
lock to do the hlist_del_rcu.
To resolve the lock inversion problem pop the head of the maps list
repeatedly and remove the reference until no more are left. If a
delete happens in parallel from the BPF API that is OK as well because
it will do a similar action, lookup the lock in the map/hash, delete
it from the map/hash, and dec the refcnt. We check for this case
before doing a destroy on the psock to ensure we don't have two
threads tearing down a psock. The new logic is as follows,
bpf_tcp_close()
e = psock_map_pop(psock->maps) // done with map lock
bucket_lock() // lock hash list bucket
l = lookup_elem_raw(head, hash, key, key_size);
if (l) {
//only get here if elmnt was not already removed
hlist_del_rcu()
... destroy psock...
}
bucket_unlock()
And finally for all the above to work add missing locking around map
operations per above. Then add RCU annotations and use
rcu_dereference/rcu_assign_pointer to manage values relying on RCU so
that the object is not free'd from sock_hash_free() while it is being
referenced in bpf_tcp_close().
Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 8111038444 ("bpf: sockmap, add hash map support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>