Add support reading device ID when controller is in bootloader mode, which
may happen if firmware update has been interrupted.
Signed-off-by: simba.hsu <simba.hsu@rad-ic.com>
Link: https://lore.kernel.org/r/20210818063644.8654-1-simba.hsu@rad-ic.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The function cpcap_power_button_probe() does not perform
sufficient error checking after executing platform_get_irq(),
thus fix it.
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20210802121740.8700-1-tangbin@cmss.chinamobile.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
As the ADC ladder input driver does not have any code or data located in
initmem, there is no need to annotate the adc_keys_driver structure with
__refdata. Drop the annotation, to avoid suppressing future section
warnings.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/7091e8213602be64826fd689a7337246d218f3b1.1626255421.git.geert+renesas@glider.be
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
This keyboard driver is implementing a GPIO driver, so it need
to include <linux/gpio/driver.h> and not the legacy <linux/gpio.h>
header.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Link: https://lore.kernel.org/r/20210816232707.485031-1-linus.walleij@linaro.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
This keyboard driver is implementing a GPIO driver, so it need
to include <linux/gpio/driver.h> and not the legacy <linux/gpio.h>
header.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20210820222958.57238-1-linus.walleij@linaro.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Convert the TI TSC2004/TSC2005 DT bindings to YAML schema.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210620210708.100147-1-marex@denx.de
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The LRADC can be a wakeup source and is listed as such in some DT
already. Let's make sure we allow that property in the binding.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210721140424.725744-21-maxime@cerno.tech
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The Haptic feedback based on a regulator is supported by Linux thanks to
its device tree binding.
Now that we have the DT validation in place, let's convert the device
tree bindings for that driver over to a YAML schema.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210721140424.725744-19-maxime@cerno.tech
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The Pixcir Touchscreen Controller is supported by Linux thanks to
its device tree binding.
Now that we have the DT validation in place, let's convert the device
tree bindings for that driver over to a YAML schema.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210721140424.725744-18-maxime@cerno.tech
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The ChipOne ICN8318 Touchscreen Controller is supported by Linux thanks
to its device tree binding.
Now that we have the DT validation in place, let's convert the device
tree bindings for that driver over to a YAML schema.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210721140424.725744-17-maxime@cerno.tech
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
There is absolutely no reason to use comma operator in this code, 2
separate statements make much more sense.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/YPsa1qCBn/SAmE5x@google.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Convert qcom pm8941 power key binding from .txt to .yaml format.
The example has been removed in favour of full example being
available in the qcom,pon.yaml binding.
Signed-off-by: satya priya <skakit@codeaurora.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/1620800053-26405-5-git-send-email-skakit@codeaurora.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Change 'additionalProperties' to true as this is a generic binding.
Signed-off-by: satya priya <skakit@codeaurora.org>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/1620800053-26405-4-git-send-email-skakit@codeaurora.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
This reverts commit 81c7c0a350. The idea
to make write method mandatory was flawed as several client drivers
(such as atkbd) check for presence of write() method to adjust behavior
of the driver.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Given that all serio drivers except one implement write() method
let's make it mandatory to avoid testing for its presence whenever
we attempt to use it.
Link: https://lore.kernel.org/r/YFgUxG/TljMuVeQ3@google.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The NSLU2 has been migrated to devicetree and there we use
the gpio-beeper.c driver instead, the boardfile will be deleted
for kernel v5.15 so drop this custom and now unneeded driver.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20210714115028.916360-1-linus.walleij@linaro.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmDY+dceHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGx1YH/idbXKwfQkBpBCud
BvUo2RaRetACEXo38ydiHNxSyAkde79AVMNDXWCBgnWFwpUG51TtFIAn2VhVIv7d
WlBJPWhbmHwddQB+HACYXsRcBRCc2md7RufOqR/yulx+T8QxQy7yHEd7wOlSdYWC
/BUb/94qREK60FwdWjATSdqO5ditOd5XxvBnfGh04iUmiMRwubOtYPfaomo9MIK6
Qs/Yt6SkIROi9cMQf2NakE2UFeVnQ+/TrDTRsTqokUtLSzpjxDjX39JoRNjLVgS1
XOhrOlUQ+sJ1O1Hq4vSfy8maWivzF9XCCsApd6+Ks1yMB6yw15kU7cQ2yF++UPvC
ktrHsiM=
=I8Bu
-----END PGP SIGNATURE-----
Merge tag 'v5.13' into next
Sync up with the mainline to get the latest parport API.
Even though we validate user-provided inputs we then traverse past
validated data when applying the new map. The issue was originally
discovered by Murray McAllister with this simple POC (if the following
is executed by an unprivileged user it will instantly panic the system):
int main(void) {
int fd, ret;
unsigned int buffer[10000];
fd = open("/dev/input/js0", O_RDONLY);
if (fd == -1)
printf("Error opening file\n");
ret = ioctl(fd, JSIOCSBTNMAP & ~IOCSIZE_MASK, &buffer);
printf("%d\n", ret);
}
The solution is to traverse internal buffer which is guaranteed to only
contain valid date when constructing the map.
Fixes: 182d679b22 ("Input: joydev - prevent potential read overflow in ioctl")
Fixes: 999b874f4a ("Input: joydev - validate axis/button maps before clobbering current ones")
Reported-by: Murray McAllister <murray.mcallister@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Alexander Larkin <avlarkin82@gmail.com>
Link: https://lore.kernel.org/r/20210620120030.1513655-1-avlarkin82@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
This reverts commits 4bad58ebc8 (and
399f8dd9a8, which tried to fix it).
I do not believe these are correct, and I'm about to release 5.13, so am
reverting them out of an abundance of caution.
The locking is odd, and appears broken.
On the allocation side (in __sigqueue_alloc()), the locking is somewhat
straightforward: it depends on sighand->siglock. Since one caller
doesn't hold that lock, it further then tests 'sigqueue_flags' to avoid
the case with no locks held.
On the freeing side (in sigqueue_cache_or_free()), there is no locking
at all, and the logic instead depends on 'current' being a single
thread, and not able to race with itself.
To make things more exciting, there's also the data race between freeing
a signal and allocating one, which is handled by using WRITE_ONCE() and
READ_ONCE(), and being mutually exclusive wrt the initial state (ie
freeing will only free if the old state was NULL, while allocating will
obviously only use the value if it was non-NULL, so only one or the
other will actually act on the value).
However, while the free->alloc paths do seem mutually exclusive thanks
to just the data value dependency, it's not clear what the memory
ordering constraints are on it. Could writes from the previous
allocation possibly be delayed and seen by the new allocation later,
causing logical inconsistencies?
So it's all very exciting and unusual.
And in particular, it seems that the freeing side is incorrect in
depending on "current" being single-threaded. Yes, 'current' is a
single thread, but in the presense of asynchronous events even a single
thread can have data races.
And such asynchronous events can and do happen, with interrupts causing
signals to be flushed and thus free'd (for example - sending a
SIGCONT/SIGSTOP can happen from interrupt context, and can flush
previously queued process control signals).
So regardless of all the other questions about the memory ordering and
locking for this new cached allocation, the sigqueue_cache_or_free()
assumptions seem to be fundamentally incorrect.
It may be that people will show me the errors of my ways, and tell me
why this is all safe after all. We can reinstate it if so. But my
current belief is that the WRITE_ONCE() that sets the cached entry needs
to be a smp_store_release(), and the READ_ONCE() that finds a cached
entry needs to be a smp_load_acquire() to handle memory ordering
correctly.
And the sequence in sigqueue_cache_or_free() would need to either use a
lock or at least be interrupt-safe some way (perhaps by using something
like the percpu 'cmpxchg': it doesn't need to be SMP-safe, but like the
percpu operations it needs to be interrupt-safe).
Fixes: 399f8dd9a8 ("signal: Prevent sigqueue caching after task got released")
Fixes: 4bad58ebc8 ("signal: Allow tasks to cache one sigqueue struct")
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
generic entry.
- Fix potential register clobbering in stack switch helper.
- Fix thread/group masks for offline cpus.
- Fix cleanup of mdev resources when remove callback is invoked in vfio-ap
code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmDXFfYACgkQjYWKoQLX
FBjWmwf/X9JNauWHyr2lrWKiP8QF6jviRHT3aysQKAn6For0qpFEFVLdAaHGTfhw
3C+s/CHz4AJFQEsOyBIHx8hlBlaQJm4o+6Aaem7yAaDo/B65KkN9bTi5BcYIu5Bo
WZruSgoNbN8effm5dnahqsopXSSLXPf2CuaVoqwW15FUM7b/tu4ljpt5meHbtBnX
VC8iaKI90ZRmARUvwSaR29zj4y56vlzG8ebmTZA2C5ZdleNnie/DeXvePXkP4FL4
QVHGK4RjAnTe32YlwNJMbFPqOuumdQEPlVFT1mP+YLmBRaaOSdy9ky+CnfPLw8dp
pIasRLu+hex1xpyvao2d5tinCT73xw==
=M4Hh
-----END PGP SIGNATURE-----
Merge tag 's390-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix a couple of late pt_regs flags handling findings of conversion to
generic entry.
- Fix potential register clobbering in stack switch helper.
- Fix thread/group masks for offline cpus.
- Fix cleanup of mdev resources when remove callback is invoked in
vfio-ap code.
* tag 's390-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/stack: fix possible register corruption with stack switch helper
s390/topology: clear thread/group maps for offline cpus
s390/vfio-ap: clean up mdev resources when remove callback invoked
s390: clear pt_regs::flags on irq entry
s390: fix system call restart with multiple signals
- Put an fwnode in the errorpath in the SGPIO driver
- Fix the number of GPIO lines per bank in the STM32 driver
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmDWZr0ACgkQQRCzN7AZ
XXMUPhAArQVMlbu4m8V7gOwXEhpgGjI1g1E/9thF3lWkpnB5dbMrCnEt7rjhWHa+
+Oo60AwOG09TL+JClSqDPkt8KX5RWUlgauSnPsrKZN2wvSACzkX49QBf7D9KjN71
vVdmLJTo5bNcA+JTD5EoEs4cT7v5O7smpCRkufCMbTpvhlKZpXlpJht68/LFr/Uf
TlY3zeJzk5IuFuAIcvsyzeCXloGk2z+mBIKsYFtQv6F38/K9ClGtoumwdc6aXkCH
/jV3xWs0jkWNJgjbK/U/wCsg01jW2+iE4lDvufU6FqXCRUEPmKd/5yNIZ2yABoP7
SOS1QKwbinhXZujY8o3Of9Rqb/1ErL8GFIlGew3+4EJrJlXlWdCE+26cOpXtCpbR
wq8RWpd9L8GFfo4LIWbT6P4QI3FxalYMkVnPqYgNAJVvLphZGu0yguac5Tb1P8fu
M5uw8WgfEjo8qAXUu4sJBwAfm+OHriArdQ729ovmrBMWI7MmhxvBA40lML3BzxH6
9S9Q29wYpYmQCzbqaSjihx27ccaMGKC+kEKiaqfBKngrh1x0pRMWM09NUwSRRCDt
D8fVzE0BHhgJTyRVzfxrLQgsHiuDV3ty8N6vJdzzAXUpQBicupv6JjK8z6X2owOI
g56mh3SyvYIguq5IASTOkg9/OY2o+ANr8Lny5eCnMX8aTiZ+AkA=
=ZJ4G
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Two last-minute fixes:
- Put an fwnode in the errorpath in the SGPIO driver
- Fix the number of GPIO lines per bank in the STM32 driver"
* tag 'pinctrl-v5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: stm32: fix the reported number of GPIO lines per bank
pinctrl: microchip-sgpio: Put fwnode in error case during ->probe()
Two small fixes, both in upper layer drivers (scsi disk and cdrom).
The sd one is fixing a commit changing revalidation that came from the
block tree a while ago (5.10) and the sr one adds handling of a
condition we didn't previously handle for manually removed media.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYNZTQSYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishSp6AP9Q8jdJ
29ditYcnr23nsLgtzK0koo7ip3TM6WjEf3SSIAEA45wPiEdC8G35wfQFzefy3UfO
LGbq6eNGwPDakldysAs=
=NKGv
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes, both in upper layer drivers (scsi disk and cdrom).
The sd one is fixing a commit changing revalidation that came from the
block tree a while ago (5.10) and the sr one adds handling of a
condition we didn't previously handle for manually removed media"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Call sd_revalidate_disk() for ioctl(BLKRRPART)
scsi: sr: Return appropriate error code when disk is ejected
Merge misc fixes from Andrew Morton:
"24 patches, based on 4a09d388f2.
Subsystems affected by this patch series: mm (thp, vmalloc, hugetlb,
memory-failure, and pagealloc), nilfs2, kthread, MAINTAINERS, and
mailmap"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
mailmap: add Marek's other e-mail address and identity without diacritics
MAINTAINERS: fix Marek's identity again
mm/page_alloc: do bulk array bounds check after checking populated elements
mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing array
mm/hwpoison: do not lock page again when me_huge_page() successfully recovers
mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned
mm/memory-failure: use a mutex to avoid memory_failure() races
mm, futex: fix shared futex pgoff on shmem huge page
kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
kthread_worker: split code for canceling the delayed work timer
mm/vmalloc: unbreak kasan vmalloc support
KVM: s390: prepare for hugepage vmalloc
mm/vmalloc: add vmalloc_no_huge
nilfs2: fix memory leak in nilfs_sysfs_delete_device_group
mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
mm: page_vma_mapped_walk(): get vma_address_end() earlier
mm: page_vma_mapped_walk(): use goto instead of while (1)
mm: page_vma_mapped_walk(): add a level of indentation
mm: page_vma_mapped_walk(): crossing page table boundary
...
This ioctl request reads from uffdio_continue structure written by
userspace which justifies _IOC_WRITE flag. It also writes back to that
structure which justifies _IOC_READ flag.
See NOTEs in include/uapi/asm-generic/ioctl.h for more information.
Fixes: f619147104 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull i2c fixes from Wolfram Sang:
"Three more driver bugfixes and an annotation fix for the core"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: robotfuzz-osif: fix control-request directions
i2c: dev: Add __user annotation
i2c: cp2615: check for allocation failure in cp2615_i2c_recv()
i2c: i801: Ensure that SMBHSTSTS_INUSE_STS is cleared when leaving i801_access
Fix a NULL pointer dereference introduced by a recent commit and
occurring when device_remove_software_node() is used with a device
that has never been registered (Heikki Krogerus).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmDWDEYSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxIWYQALEQ3YUszZIp76jaRa4zD1dOenT/o68l
X+2ju3LHpxfXi0+zPwZLqSTZdnuhvVoo7Ym+xgjCCSPshtvuKLTh4Guzy+WU0aMc
JUGeeusd178aJe4kxMB+gD2sAAIvhNN/meHoV/olRQ8UB9xYN1QgBssP2YsnQ6ns
A/6hxrK2HgC0d7Px2p68W20MmtFgXmwWIuDYDG3uIqcdpIQWi9GxZfOmuPvC3MP3
YB5CW8qhV9BXbXth5wf7QrB9GrVRXGcJkoD7ju1FbCKMc+yu2kAjPkHr/o4w/zcB
3D141e6UbXTkIrqbmkaoaQbEf3GIhYuVYTfz0ey5xGkHZJt7fAFXvC0p+iHT9D35
iDDd4/QqS5slXOpFOkoRU66Wu+GOZtVMFmpjQvyCo8HmDt6MlSmI/XxBmZW1bopX
SekDd3CRnfZAFCzAmSqpgdsjtoENXBhn8w4qsYtpGZ9Qy4Ckf/wcpuTAuxmiY+Vh
bK4YfyaACF3LVlz8BoMaOous581sIsZIjUMTCwrwL81iVXxapkoNLz+IOGWLTX1r
e/XZ47PnJ3pHYiAcRIQ2pRhWOkaV3ihu21U+EoxNqEa1nbmBsoAoTbIbCmRSi4qr
xz1ds2QecGoG7z0JGRmcpe8fk3P+eapdKNe6iwFj0yn79wjpjKfniqMLvbXNxUgu
DuQvEVJD0Wbv
=vBvH
-----END PGP SIGNATURE-----
Merge tag 'devprop-5.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull device properties framework fix from Rafael Wysocki:
"Fix a NULL pointer dereference introduced by a recent commit and
occurring when device_remove_software_node() is used with a device
that has never been registered (Heikki Krogerus)"
* tag 'devprop-5.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
software node: Handle software node injection to an existing device properly
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYNXrhgAKCRCAXGG7T9hj
vuJDAP9WFyhonqK7ZPFy/nKqUXFdkAhXcVzshLaMZi2YR81T0wD9FNcA5efTAHLM
LPD0ZvubyAK43x1NKfUSpawPtk0WFAo=
=fomh
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.13b-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"A fix for a regression introduced in 5.12: when migrating an irq
related to a Xen user event to another cpu, a race might result
in a WARN() triggering"
* tag 'for-linus-5.13b-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: reset active flag for lateeoi events later
This is a very small fix that was provided by Nick Piggin and tested
by myself.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDU+IYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPNRggAj286k4/wI4bCFw4cB8MkuJc0hgYm
H22SuYjUsBwaxUPCYijRxshlq1nP21I05bJKHY1OpZPkEVyOd8MufC6Jc7YOENni
f0nmZRmRpgmKuIEnwLoLra17MpZn3qFOkWwfmmNLgtLX2CRaHbhq3gZpX2V3mP7J
qsepe0s65cCEbA0yC0+oMuS9JHAv7RjfDKM+3dOgE1pH83lv9UqF7W0WJfc0d7z0
KDPCLgU4Qa2rst/yPKwt/IFuzF8PBIwwYkYOZ/UrxKyZcUZGIHHyy0yWSywXIzcU
8zXl8AlUokadoqaGAP4qIaspDZWCIVA+S4X4y2Ly0CtExtfsh+27ww8eiQ==
=bjxg
-----END PGP SIGNATURE-----
Merge tag 'for-linus-urgent' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"A selftests fix for ARM, and the fix for page reference count
underflow. This is a very small fix that was provided by Nick Piggin
and tested by myself"
* tag 'for-linus-urgent' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: do not allow mapping valid but non-reference-counted pages
KVM: selftests: Fix mapping length truncation in m{,un}map()
- prevent unprivileged userspace from reinitializing supervisor states
- Prepare init_fpstate, which is the buffer used when initializing
FPU state, properly in case the skip-writing-state-components XSAVE*
variants are used.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmDVltYACgkQEsHwGGHe
VUqY9Q//c4MhJP2E15cqTWupxYk41k0UMjqPIwGmt6hRoDNKeFQm0xSgeOwe2mgk
bbzGDJOfAi2Hxza2fw6No4wIiaB3sZIqK451aI1SM9HTDB/B/dMGBPXAp9qRlnbT
kU/rDqQVqi7wlwunSunFoSLTwmQw0Lispmzwz9yirdQ+jVsnuTLWtPbUZM8RL/j8
XAhVwhDNc+Wuw0OBvRsyP5Mp6k9+2ic6z2ObIgSfgp4GeDG2F/+ZQ5W5ZeHVGQda
5QqKIdWCmAinzdz3N0iksthT3RJwLmYZ0K/qvLMrYNCvZiuUBdgrUn1Yrjo1c3lx
W+SUMtgehlylfyBbyGn5zBbJtZJtflx+kYLHLzw58lWC+ekRfxqx2F+e7S4facXr
Xn9IpnIAhru1/SAItSvScxXzjVW4DwZKO3tLr+/KsrRsTnS15pD6rx6OK88HHP/y
ofjCeS0P8STb7/Gzzqj7c+7bJvSZo/h7jmF+H2y5tRhUXZogSoh1z/QGYpvcFrwP
GOZeACREBv+D1PQNp/DN/ZiZHg6+csEg+3abtRaZSbdnfsCSpU/imXcX9GPco5vu
XS+Gxle2aqvRmQNuJEbNr7YDfocZWWXmXnkPSKCtvqSgNdxjFjZ2v3TRTAgvHEoS
Otpsv5Hk9g0FCep4oHG3zv8cb+Nk7Ycl2ZLZXQwE2Egane6U4K8=
=uqQE
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"Two more urgent FPU fixes:
- prevent unprivileged userspace from reinitializing supervisor
states
- prepare init_fpstate, which is the buffer used when initializing
FPU state, properly in case the skip-writing-state-components
XSAVE* variants are used"
* tag 'x86_urgent_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Make init_fpstate correct with optimized XSAVE
x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate()
and one in the filesystem for proper propagation of MDS request errors.
Also included a locking fix for async creates, marked for stable.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmDV440THGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi3CXB/0aA0Ka+weQtIxxX3zl1thsE1APxoKe
va77EfTJZbN12UHKAJ6sJUpXCLFc5hVJETw7w3qyz22VvJIPUQWd+h4w4eTXJ4QK
Fab6+HT0/p0NxZ29rxa1bkHnrRAD30cpNd6WXcAeMJ3ZKvZfPtPnIWXSmCbJYGLV
xhwx8y6kzjE60B60bjcQzuSpsMQkq0OpdXYdyxq3RysCjTCyDfpGuFnDHSv3aklm
d6tyv2nUDM/oF/CEFZrTeaLrIZsYxxkpJHKkm7Xy70bUv8IMW97CKJSFjKYucyYd
iV7VbtIKPq3sbGrmkaWm4nET5Z0C+m+JD2AhR17ylbdQy91hKaGrbnpw
=RTBT
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-5.13-rc8' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two regression fixes from the merge window: one in the auth code
affecting old clusters and one in the filesystem for proper
propagation of MDS request errors.
Also included a locking fix for async creates, marked for stable"
* tag 'ceph-for-5.13-rc8' of https://github.com/ceph/ceph-client:
libceph: set global_id as soon as we get an auth ticket
libceph: don't pass result into ac->ops->handle_reply()
ceph: fix error handling in ceph_atomic_open and ceph_lookup
ceph: must hold snap_rwsem when filling inode for async create
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmDQ9Z4ACgkQ+7dXa6fL
C2sVfg/9H3OlL4Vwv5CERgb5kbDoALAh5epYMFyvNVy9l5iTvYekxG3qna6ee0+G
pTRHSU5tZUCUslpafv8LUz1RE/iM127y+BP+qq2joG9jkT4q3QfeV4oDr+dgZWCL
obp+rjQSTKkaGh3eqjDx7gCSp5mqQI/M8MXe1VOXxUzAnsf2nH4iJLv/A9NN17xW
l7sQfZJGHD1BPqsxxFiSr+UCkCLuLDybUDYL6+PZcFu0jSON0h4yEtIwsAA7a1Zw
TvgUpequrbyTORI3sHJ8eIixWosh4yLTJ4pDqs8qDqq3Wm5eTZjss0wxEaVfpaTu
cg/CcNUBiHJ/6Q8r+JiuPbEnfnQ9woL8951/CNi+cOGhTy9LNoG70orSZKKynmW5
QmpYBK5BkM57EPj7DYZJRI1Bwy41pJapFj8tXbMjObU+ZyaXMysmBDanIRK/cLoy
fr4Sz+1D8yJPQ0GDgC4051CxrhynOEnRo8JbESg8CD4PnqFeM7EoCh48H2oVvR4N
9v2xuaRyBvi2KTmKSktRe+s1DS80acVMUYP33nT2zAthvL91VVgMY3Hz2/QrlAp8
h1hREME8aRcN4LrBIgp/ET150hUh44U2A07PqnYYe0653MH7aHHFfk134ZuTmbub
Pfc1MHtWgPAWN4c140ILBRTidJeShszoJGgpD6tflkj10VI2s34=
=2c6l
-----END PGP SIGNATURE-----
Merge tag 'netfs-fixes-20210621' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull netfs fixes from David Howells:
"This contains patches to fix netfs_write_begin() and afs_write_end()
in the following ways:
(1) In netfs_write_begin(), extract the decision about whether to skip
a page out to its own helper and have that clear around the region
to be written, but not clear that region. This requires the
filesystem to patch it up afterwards if the hole doesn't get
completely filled.
(2) Use offset_in_thp() in (1) rather than manually calculating the
offset into the page.
(3) Due to (1), afs_write_end() now needs to handle short data write
into the page by generic_perform_write(). I've adopted an
analogous approach to ceph of just returning 0 in this case and
letting the caller go round again.
It also adds a note that (in the future) the len parameter may extend
beyond the page allocated. This is because the page allocation is
deferred to write_begin() and that gets to decide what size of THP to
allocate."
Jeff Layton points out:
"The netfs fix in particular fixes a data corruption bug in cephfs"
* tag 'netfs-fixes-20210621' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
netfs: fix test for whether we can skip read when writing beyond EOF
afs: Fix afs_write_end() to handle short writes
- fix wake-up interrupt support on gpio-mxc
- zero the padding bytes in a structure passed to user-space in the GPIO
character device
- select HAS_IOPORT_MAP in two drivers that need it to fix a Kbuild issue
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmDVy5IACgkQEacuoBRx
13IVwhAAu7+LCkGMb5esTdE/sP73CmLuDGM7frBLg4NOZg1XCapvv8BPsylS66WI
G756fxMfp1OqG8Y6rnHPX9E8jWyBYBPKop9kvx/IfdTqxjihbzEkTgnixxjPh8I5
yeEY5RtU9K0vJGKE1PRVXh+kcjW0MGzq5309uNJw4CpWF6bzf2M1q2+f5hg0OjKS
WXH5A164PqQakCaYZUh/zLzFnjW+lVjLIWRTXrxCSzQxFzDc8p/XZE24gVNNWVfX
SVfHTdzdVb1RPFbB5I36c8e4tXKAGMoYSwgqGKj2+rQpx+ttpSqtWlAeS6dThtIo
wZcfp8b7V+4O6dreiChjx8jLN/5lU+++7d0AK1Jw8sffqxsB0HWPLJY+8rL6sVfu
VIhS3nKLJwnPe+7kulDtHFVoelhKzr3AgqDrz2lYwbfOLjH8+f6naIHMabrrsTH4
L6ydSnf/NbL0SFzMpdyQ5yFfXMmcNqziFpt/EMol8ChIaOi6QbCT7W/L/3MGPn36
TuW9QtYegb8ZVZ4OEIph2SwK3zbuxlRWuLrTTQUXT4xnckdlmt8R+7REvYjI7VWn
fkHyo2LEvwOBvR/dvmedsuWytWwx17O2x2C7NC7Cd1n1hWWU2oZzNf/GqmJYYnwM
hxJiv16zV/cXpQEswnIRWdhwJixHC1r2KDkm7ETf50JHl5g4uoE=
=udPz
-----END PGP SIGNATURE-----
Merge tag 'gpio-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix wake-up interrupt support on gpio-mxc
- zero the padding bytes in a structure passed to user-space in the
GPIO character device
- require HAS_IOPORT_MAP in two drivers that need it to fix a Kbuild
issue
* tag 'gpio-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: AMD8111 and TQMX86 require HAS_IOPORT_MAP
gpiolib: cdev: zero padding during conversion to gpioline_info_changed
gpio: mxc: Fix disabled interrupt wake-up support
Two small changes have been cherry-picked as a last material for
5.13: a coverage after UMN revert action and a stale MAINTAINERS
entry fix.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmDUg48OHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE9pkA/+Mw/lrSNN87q165/HIY+Hq24rSj9uo82eAE4J
MWCBZM98ne0F7aZ8olh0EZSxF7/q78ceCabOi+0P3LX5CYBMLldSkxABRjKSR3dv
BMzN5lvF9Kxy9rHX5rEG/O9gFaab0JErcZagk56PetSHyJug9atljnldiQYYUAqc
0b9JKjiEU9MrHdg/fg14ZIu7253tVcz2rp2bubreTZFnbH5axybU4FJ7Qn4a+eh1
PtPp2roT3cPSQpD+aUcuGggu8AfxmAI0FI+Meq0fqcFX0VLnuWLnQVo4vhQXrTkN
fErELuAw9ffImARa1Sd7X7b8jT3sM8EhPp3RZxfPox2MUEjENOr43lczAXPX8lbj
Z1G0cxJiKH0fswkoiEAwa8+RSOSOLM9M1wh3O7oieI3/ISd6WyOo3Lik2alxjG24
64z6MRBFBDfBfQ/O7MvRn6zRJR0zsEZLtB7HOLHTTzP2lbfkBH368+DGHYA2Hlv/
ug4cH3C+S97mtW8wt89eKz02W6q4harDmcKmsiGwqK45wwjhXPQzApeo//nZpkFB
gK8c7/+UDIaz/FWl0mlklRqRECMV0Is+7EKys4/HHmOEbLPYS90SARjm9h0fo77b
00wsf6bsiJL/GjVBCpyc5HP6cpv9BBFZoG9k89B5mbpr2AcKc/YpnfPIOEMtnsMx
bZunxus=
=/VH1
-----END PGP SIGNATURE-----
Merge tag 'sound-5.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Two small changes have been cherry-picked as a last material for 5.13:
a coverage after UMN revert action and a stale MAINTAINERS entry fix"
* tag 'sound-5.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
MAINTAINERS: remove Timur Tabi from Freescale SOC sound drivers
ASoC: rt5645: Avoid upgrading static warnings to errors
Both of these drivers use ioport_map(), so they need to
depend on HAS_IOPORT_MAP. Otherwise, they cannot be built
even with COMPILE_TEST on architectures without an ioport
implementation, such as ARCH=um.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Some of my commits were sent with identities
Marek Behun <marek.behun@nic.cz>
Marek Behún <marek.behun@nic.cz>
while the correct one is
Marek Behún <kabel@kernel.org>
Put this into mailmap so that git shortlog prints all my commits under
one identity.
Link: https://lkml.kernel.org/r/20210616113624.19351-2-kabel@kernel.org
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix my name to use diacritics, since MAINTAINERS supports it.
Fix my e-mail address in MAINTAINERS' marvell10g PHY driver description,
I accidentally put my other e-mail address here.
Link: https://lkml.kernel.org/r/20210616113624.19351-1-kabel@kernel.org
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter reported the following
The patch 0f87d9d30f: "mm/page_alloc: add an array-based interface
to the bulk page allocator" from Apr 29, 2021, leads to the following
static checker warning:
mm/page_alloc.c:5338 __alloc_pages_bulk()
warn: potentially one past the end of array 'page_array[nr_populated]'
The problem can occur if an array is passed in that is fully populated.
That potentially ends up allocating a single page and storing it past
the end of the array. This patch returns 0 if the array is fully
populated.
Link: https://lkml.kernel.org/r/20210618125102.GU30378@techsingularity.net
Fixes: 0f87d9d30f ("mm/page_alloc: add an array-based interface to the bulk page allocator")
Signed-off-by: Mel Gorman <mgorman@techsinguliarity.net>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the event that somebody would call this with an already fully
populated page_array, the last loop iteration would do an access beyond
the end of page_array.
It's of course extremely unlikely that would ever be done, but this
triggers my internal static analyzer. Also, if it really is not
supposed to be invoked this way (i.e., with no NULL entries in
page_array), the nr_populated<nr_pages check could simply be removed
instead.
Link: https://lkml.kernel.org/r/20210507064504.1712559-1-linux@rasmusvillemoes.dk
Fixes: 0f87d9d30f ("mm/page_alloc: add an array-based interface to the bulk page allocator")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently me_huge_page() temporary unlocks page to perform some actions
then locks it again later. My testcase (which calls hard-offline on
some tail page in a hugetlb, then accesses the address of the hugetlb
range) showed that page allocation code detects this page lock on buddy
page and printed out "BUG: Bad page state" message.
check_new_page_bad() does not consider a page with __PG_HWPOISON as bad
page, so this flag works as kind of filter, but this filtering doesn't
work in this case because the "bad page" is not the actual hwpoisoned
page. So stop locking page again. Actions to be taken depend on the
page type of the error, so page unlocking should be done in ->action()
callbacks. So let's make it assumed and change all existing callbacks
that way.
Link: https://lkml.kernel.org/r/20210609072029.74645-1-nao.horiguchi@gmail.com
Fixes: commit 78bb920344 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When memory_failure() is called with MF_ACTION_REQUIRED on the page that
has already been hwpoisoned, memory_failure() could fail to send SIGBUS
to the affected process, which results in infinite loop of MCEs.
Currently memory_failure() returns 0 if it's called for already
hwpoisoned page, then the caller, kill_me_maybe(), could return without
sending SIGBUS to current process. An action required MCE is raised
when the current process accesses to the broken memory, so no SIGBUS
means that the current process continues to run and access to the error
page again soon, so running into MCE loop.
This issue can arise for example in the following scenarios:
- Two or more threads access to the poisoned page concurrently. If
local MCE is enabled, MCE handler independently handles the MCE
events. So there's a race among MCE events, and the second or latter
threads fall into the situation in question.
- If there was a precedent memory error event and memory_failure() for
the event failed to unmap the error page for some reason, the
subsequent memory access to the error page triggers the MCE loop
situation.
To fix the issue, make memory_failure() return an error code when the
error page has already been hwpoisoned. This allows memory error
handler to control how it sends signals to userspace. And make sure
that any process touching a hwpoisoned page should get a SIGBUS even in
"already hwpoisoned" path of memory_failure() as is done in page fault
path.
Link: https://lkml.kernel.org/r/20210521030156.2612074-3-nao.horiguchi@gmail.com
Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jue Wang <juew@google.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm,hwpoison: fix sending SIGBUS for Action Required MCE", v5.
I wrote this patchset to materialize what I think is the current
allowable solution mentioned by the previous discussion [1]. I simply
borrowed Tony's mutex patch and Aili's return code patch, then I queued
another one to find error virtual address in the best effort manner. I
know that this is not a perfect solution, but should work for some
typical case.
[1]: https://lore.kernel.org/linux-mm/20210331192540.2141052f@alex-virtual-machine/
This patch (of 2):
There can be races when multiple CPUs consume poison from the same page.
The first into memory_failure() atomically sets the HWPoison page flag
and begins hunting for tasks that map this page. Eventually it
invalidates those mappings and may send a SIGBUS to the affected tasks.
But while all that work is going on, other CPUs see a "success" return
code from memory_failure() and so they believe the error has been
handled and continue executing.
Fix by wrapping most of the internal parts of memory_failure() in a
mutex.
[akpm@linux-foundation.org: make mf_mutex local to memory_failure()]
Link: https://lkml.kernel.org/r/20210521030156.2612074-1-nao.horiguchi@gmail.com
Link: https://lkml.kernel.org/r/20210521030156.2612074-2-nao.horiguchi@gmail.com
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Aili Yao <yaoaili@kingsoft.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jue Wang <juew@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If more than one futex is placed on a shmem huge page, it can happen
that waking the second wakes the first instead, and leaves the second
waiting: the key's shared.pgoff is wrong.
When 3.11 commit 13d60f4b6a ("futex: Take hugepages into account when
generating futex_key"), the only shared huge pages came from hugetlbfs,
and the code added to deal with its exceptional page->index was put into
hugetlb source. Then that was missed when 4.8 added shmem huge pages.
page_to_pgoff() is what others use for this nowadays: except that, as
currently written, it gives the right answer on hugetlbfs head, but
nonsense on hugetlbfs tails. Fix that by calling hugetlbfs-specific
hugetlb_basepage_index() on PageHuge tails as well as on head.
Yes, it's unconventional to declare hugetlb_basepage_index() there in
pagemap.h, rather than in hugetlb.h; but I do not expect anything but
page_to_pgoff() ever to need it.
[akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope]
Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com
Fixes: 800d8c63b2 ("shmem: add huge pages support")
Reported-by: Neel Natu <neelnatu@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Zhang Yi <wetpzy@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The system might hang with the following backtrace:
schedule+0x80/0x100
schedule_timeout+0x48/0x138
wait_for_common+0xa4/0x134
wait_for_completion+0x1c/0x2c
kthread_flush_work+0x114/0x1cc
kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
kthread_cancel_delayed_work_sync+0x18/0x2c
xxxx_pm_notify+0xb0/0xd8
blocking_notifier_call_chain_robust+0x80/0x194
pm_notifier_call_chain_robust+0x28/0x4c
suspend_prepare+0x40/0x260
enter_state+0x80/0x3f4
pm_suspend+0x60/0xdc
state_store+0x108/0x144
kobj_attr_store+0x38/0x88
sysfs_kf_write+0x64/0xc0
kernfs_fop_write_iter+0x108/0x1d0
vfs_write+0x2f4/0x368
ksys_write+0x7c/0xec
It is caused by the following race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync():
CPU0 CPU1
Context: Thread A Context: Thread B
kthread_mod_delayed_work()
spin_lock()
__kthread_cancel_work()
spin_unlock()
del_timer_sync()
kthread_cancel_delayed_work_sync()
spin_lock()
__kthread_cancel_work()
spin_unlock()
del_timer_sync()
spin_lock()
work->canceling++
spin_unlock
spin_lock()
queue_delayed_work()
// dwork is put into the worker->delayed_work_list
spin_unlock()
kthread_flush_work()
// flush_work is put at the tail of the dwork
wait_for_completion()
Context: IRQ
kthread_delayed_work_timer_fn()
spin_lock()
list_del_init(&work->node);
spin_unlock()
BANG: flush_work is not longer linked and will never get proceed.
The problem is that kthread_mod_delayed_work() checks work->canceling
flag before canceling the timer.
A simple solution is to (re)check work->canceling after
__kthread_cancel_work(). But then it is not clear what should be
returned when __kthread_cancel_work() removed the work from the queue
(list) and it can't queue it again with the new @delay.
The return value might be used for reference counting. The caller has
to know whether a new work has been queued or an existing one was
replaced.
The proper solution is that kthread_mod_delayed_work() will remove the
work from the queue (list) _only_ when work->canceling is not set. The
flag must be checked after the timer is stopped and the remaining
operations can be done under worker->lock.
Note that kthread_mod_delayed_work() could remove the timer and then
bail out. It is fine. The other canceling caller needs to cancel the
timer as well. The important thing is that the queue (list)
manipulation is done atomically under worker->lock.
Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com
Fixes: 9a6b06c8d9 ("kthread: allow to modify delayed kthread work")
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reported-by: Martin Liu <liumartin@google.com>
Cc: <jenhaochen@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>