-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAlxu23wTHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi04sB/9/Lm694io6coIW/8mTVjKs9iF7+AkA
V11Pu3VHnp8okcNs3iSv6rtbPgLrONP5fg7OC9xAzzsEjA8kitnV7kC0RunzuUiw
BnZcE0v5Eh4NB34DW4w4I0JDkEH75zlsw/ar8HDvgxTw9hCgV56n1oic+MkqAuiL
5TwBjuLL5Nl4tA9djcMtHhBy3u+jQXAEDJ9tiKFBhzQOPo73N2IyZyY+Kz3qqygV
XLWvpmW6PA9jJt8pUGRCqpahaYAY7+cyzOUJbqdRUpeESVl/2Epr7zJZo9LHQtsE
GympS5weYMg6xLoZHQXvLbb77E/SlXkeVyNzBw7LnV6us2PS4tbgSt2J
=alOG
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-5.0-rc8' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two bug fixes for old issues, both marked for stable"
* tag 'ceph-for-5.0-rc8' of git://github.com/ceph/ceph-client:
ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
libceph: handle an empty authorize reply
Tetsuo has reported that creating a thousands of processes sharing MM
without SIGHAND (aka alien threads) and setting
/proc/<pid>/oom_score_adj will swamp the kernel log and takes ages [1]
to finish. This is especially worrisome that all that printing is done
under RCU lock and this can potentially trigger RCU stall or softlockup
detector.
The primary reason for the printk was to catch potential users who might
depend on the behavior prior to 44a70adec9 ("mm, oom_adj: make sure
processes sharing mm have same view of oom_score_adj") but after more
than 2 years without a single report I guess it is safe to simply remove
the printk altogether.
The next step should be moving oom_score_adj over to the mm struct and
remove all the tasks crawling as suggested by [2]
[1] http://lkml.kernel.org/r/97fce864-6f75-bca5-14bc-12c9f890e740@i-love.sakura.ne.jp
[2] http://lkml.kernel.org/r/20190117155159.GA4087@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20190212102129.26288-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Yong-Taek Lee <ytk.lee@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull keys fixes from James Morris:
- Handle quotas better, allowing full quota to be reached.
- Fix the creation of shortcuts in the assoc_array internal
representation when the index key needs to be an exact multiple of
the machine word size.
- Fix a dependency loop between the request_key contruction record and
the request_key authentication key. The construction record isn't
really necessary and can be dispensed with.
- Set the timestamp on a new key rather than leaving it as 0. This
would ordinarily be fine - provided the system clock is never set to
a time before 1970
* 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
keys: Timestamp new keys
keys: Fix dependency loop between construction record and auth key
assoc_array: Fix shortcut creation
KEYS: allow reaching the keys quotas exactly
Commit 8099b047ec ("exec: load_script: don't blindly truncate
shebang string") was trying to protect against a confused exec of a
truncated interpreter path. However, it was overeager and also refused
to truncate arguments as well, which broke userspace, and it was
reverted. This attempts the protection again, but allows arguments to
remain truncated. In an effort to improve readability, helper functions
and comments have been added.
Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Samuel Dionne-Riel <samuel@dionne-riel.com>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Graham Christensen <graham@grahamc.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
that could prevent clients from reclaiming state after a kernel upgrade.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJcZzZHAAoJECebzXlCjuG+EOsQALVuwSJqQh4GUVMSBYzL6Ov4
SfinB8LJ8/1HwngSvRB3xQ4HiOtpFSNkjzfFYE7epy6augY8tRRnHGbnlHbsG5vI
wQqTR6PbSq2mupgpi2WGRlRh521SDOi8V49fplUC+FuV7dJT/wm0hgdKsHCPHPX4
TEYPglsvG6PLu5IcAofNac9PVZH21s3yVIKvqd6yifED5lhopdNw210s5DtzvugI
g2JgHOhTfana+xQS/cJ1U8JHbbpM7jwOXAJ7IWD8k4GXdAW03X6jNOcseudcBTQY
qSL33//6Xdu0r0uI21z4ZWxSWCOtt8YvnbMoG4EBqh3DpKbUpExh8j4eIyNPSuSF
Y/8iAVJ9KWYhWO+IVPqvHVXz4mCIDK+f7iJ/m+lLjOQmWkpp6koeUDjKs4k9zBUC
mbGTOrh0TJzXvKWKEU5Qy7meZVJGUpV+9ca+cDs5XN7Xa3blTp+5VrRVeDgKO5Kx
OF3Y3IBOWhqN7+kEH98RvdZAmtbO0zg02IEIHOMPxH69JU8o0EsEni1LXsqDJrRi
sLVYXvLwdPLfkqSjpI8xNeaoFXeelopx8Re+2oNEFIEvsfeT5XikbQHoqgFJNsyk
hz7PHwuyGjc6NJRRSBUKYouWKPP4rrM7ZiOSyIEDYIIwyhBirpjrECaHzdi3D5j+
xUyFGMF5F3wk1fdQHPfD
=NopI
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.0-2' of git://linux-nfs.org/~bfields/linux
Pull more nfsd fixes from Bruce Fields:
"Two small fixes, one for crashes using nfs/krb5 with older enctypes,
one that could prevent clients from reclaiming state after a kernel
upgrade"
* tag 'nfsd-5.0-2' of git://linux-nfs.org/~bfields/linux:
sunrpc: fix 4 more call sites that were using stack memory with a scatterlist
Revert "nfsd4: return default lease period"
- Make sure Send CQ is allocated on an existing compvec
- Properly check debugfs dentry before using it
- Don't use page_file_mapping() after removing a page
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlxnMQ0ACgkQ18tUv7Cl
QOsbhQ//VhgoXX25xHrApLz8wMuYPNOboDFSUf0O1GWoHi3opHnP+9LPf/iZkRQy
YS0ufcO95i1LGjZLb8ac9hBWkko8TBl/dIONsG4ppf2bAbiVuag848wehi8hsGba
zaSsXV6qdibq4qZsyK35hh0cHVHDgB1EMTu7AVORdvXsTHVX3xL86vts2y2VSLKv
w9yKQBg4E4pWwENi7v77icSuGg/WpwfKnYxBzG6JPXuHQLGidyc/HrnVmLwhd6DQ
0Sa6nzOAvgjjgVibB+tJfsitScmMTsaxulvHsm5iLjPJZ8SUjxYvAPl3AZdCYPvU
XaADy8nrvXJUe9APhMINbkoxnF4W/OPnUMG3bWkWp2LeNZvk5l7VOzTW5Sh49Xyk
pBAOd7qr3kfjFdvzypVz9NeXuS6BsTUA6LAudo8rF7nxi8jHPp6L+zZNWVrPIjY0
+bNIj3K1Bji3jU9vTHyTzxDRB/4ZnzJaPF2Gv/5Y2cvkI7mfzHUz5p6cAU1OPIVB
kuhZXkQFEPSS2OV6MUOe/HgmtY0oLM3XU9cEaFkLz59D1kb1fjO/yUu9YBQMq6Ke
o6b7Dwh4WvLVN/AbgegKOnp5G0/ljmz6y7ML0AElYXg1iT4k0zE+qJpMWhOTRJnd
+jf4hSS+l7p7D1ed+uqdMS/jc1s5vcuxwYDQUIutELjA/TCbLNI=
=28v+
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-5.0-4' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull more NFS client fixes from Anna Schumaker:
"Three fixes this time.
Nicolas's is for xprtrdma completion vector allocation on single-core
systems. Greg's adds an error check when allocating a debugfs dentry.
And Ben's is an additional fix for nfs_page_async_flush() to prevent
pages from accidentally getting truncated.
Summary:
- Make sure Send CQ is allocated on an existing compvec
- Properly check debugfs dentry before using it
- Don't use page_file_mapping() after removing a page"
* tag 'nfs-for-5.0-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Don't use page_file_mapping after removing the page
rpc: properly check debugfs dentry before using it
xprtrdma: Make sure Send CQ is allocated on an existing compvec
In the request_key() upcall mechanism there's a dependency loop by which if
a key type driver overrides the ->request_key hook and the userspace side
manages to lose the authorisation key, the auth key and the internal
construction record (struct key_construction) can keep each other pinned.
Fix this by the following changes:
(1) Killing off the construction record and using the auth key instead.
(2) Including the operation name in the auth key payload and making the
payload available outside of security/keys/.
(3) The ->request_key hook is given the authkey instead of the cons
record and operation name.
Changes (2) and (3) allow the auth key to naturally be cleaned up if the
keyring it is in is destroyed or cleared or the auth key is unlinked.
Fixes: 7ee02a316600 ("keys: Fix dependency loop between construction record and auth key")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
The netfilter conflicts were rather simple overlapping
changes.
However, the cls_tcindex.c stuff was a bit more complex.
On the 'net' side, Cong is fixing several races and memory
leaks. Whilst on the 'net-next' side we have Vlad adding
the rtnl-ness support.
What I've decided to do, in order to resolve this, is revert the
conversion over to using a workqueue that Cong did, bringing us back
to pure RCU. I did it this way because I believe that either Cong's
races don't apply with have Vlad did things, or Cong will have to
implement the race fix slightly differently.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 8099b047ec.
It turns out that people do actually depend on the shebang string being
truncated, and on the fact that an interpreter (like perl) will often
just re-interpret it entirely to get the full argument list.
Reported-by: Samuel Dionne-Riel <samuel@dionne-riel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 2a5f14f279.
This patch causes xfstests generic/311 to fail. Reverting this for
now until we have a proper fix.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit d6ebf5088f.
I forgot that the kernel's default lease period should never be
decreased!
After a kernel upgrade, the kernel has no way of knowing on its own what
the previous lease time was. Unless userspace tells it otherwise, it
will assume the previous lease period was the same.
So if we decrease this value in a kernel upgrade, we end up enforcing a
grace period that's too short, and clients will fail to reclaim state in
time. Symptoms may include EIO and log messages like "NFS:
nfs4_reclaim_open_state: Lock reclaim failed!"
There was no real justification for the lease period decrease anyway.
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Fixes: d6ebf5088f "nfsd4: return default lease period"
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Merge fixes from Andrew Morton:
"6 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: proc: smaps_rollup: fix pss_locked calculation
Rename include/{uapi => }/asm-generic/shmparam.h really
Revert "mm: use early_pfn_to_nid in page_ext_init"
mm/gup: fix gup_pmd_range() for dax
Revert "mm: slowly shrink slabs with a relatively small number of objects"
Revert "mm: don't reclaim inodes with many attached pages"
The 'pss_locked' field of smaps_rollup was being calculated incorrectly.
It accumulated the current pss everytime a locked VMA was found. Fix
that by adding to 'pss_locked' the same time as that of 'pss' if the vma
being walked is locked.
Link: http://lkml.kernel.org/r/20190203065425.14650-1-sspatil@android.com
Fixes: 493b0e9d94 ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: Sandeep Patil <sspatil@android.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: <stable@vger.kernel.org> [4.14.x, 4.19.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit a76cf1a474 ("mm: don't reclaim inodes with many
attached pages").
This change causes serious changes to page cache and inode cache
behaviour and balance, resulting in major performance regressions when
combining worklaods such as large file copies and kernel compiles.
https://bugzilla.kernel.org/show_bug.cgi?id=202441
This change is a hack to work around the problems introduced by changing
how agressive shrinkers are on small caches in commit 172b06c32b ("mm:
slowly shrink slabs with a relatively small number of objects"). It
creates more problems than it solves, wasn't adequately reviewed or
tested, so it needs to be reverted.
Link: http://lkml.kernel.org/r/20190130041707.27750-2-david@fromorbit.com
Fixes: a76cf1a474 ("mm: don't reclaim inodes with many attached pages")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Wolfgang Walter <linux@stwm.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Spock <dairinin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If nfs_page_async_flush() removes the page from the mapping, then we can't
use page_file_mapping() on it as nfs_updatepate() is wont to do when
receiving an error. Instead, push the mapping to the stack before the page
is possibly truncated.
Fixes: 8fc75bed96 ("NFS: Fix up return value on fatal errors in nfs_page_async_flush()")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
nojournal mode unsafe.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlxiSNUACgkQ8vlZVpUN
gaPcCgf+LTKVur85OUKqeEDRfiCYjNemzgSOve8EtiGnPzG5s8vVeVQ5n880lTzY
c/Q2/CotN1/0slWxF8WmZ6xTzdrdadtdJ2iv6O5R58uj3lz3UK4cXJf+XmvQRj3L
5UqMygRN1omKFCp+dB38SeZuE1amV/tV61Q3ATnVMLi+xZcH8lV+efulNi19p2Gd
EC7DYs2seAiNEJAh9nVSL1LWo3i0OH5JJQCv08+c71CPPqixAYuuEy59q55k6OvW
Jp8mhYd4/5ueoV+rxH8Gft52MaOPvPHaxWvxrIbpKb5u8EZ6WUXGZ74qdK3KG6hG
k29IX8XoXWlTHkfSiOOZLUJuW7n+XQ==
=w43x
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fix from Ted Ts'o:
"Revert a commit which landed in v5.0-rc1 since it makes fsync in ext4
nojournal mode unsafe"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
Waiman reported that on large systems with a large amount of interrupts the
readout of /proc/stat takes a long time to sum up the interrupt
statistics. In principle this is not a problem. but for unknown reasons
some enterprise quality software reads /proc/stat with a high frequency.
The reason for this is that interrupt statistics are accounted per cpu. So
the /proc/stat logic has to sum up the interrupt stats for each interrupt.
The interrupt core provides now a per interrupt summary counter which can
be used to avoid the summation loops completely except for interrupts
marked PER_CPU which are only a small fraction of the interrupt space if at
all.
Another simplification is to iterate only over the active interrupts and
skip the potentially large gaps in the interrupt number space and just
print zeros for the gaps without going into the interrupt core in the first
place.
Waiman provided test results from a 4-socket IvyBridge-EX system (60-core
120-thread, 3016 irqs) excuting a test program which reads /proc/stat
50,000 times:
Before: 18.436s (sys 18.380s)
After: 3.769s (sys 3.742s)
Reported-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Link: https://lkml.kernel.org/r/20190208135021.013828701@linutronix.de
This series finally gets us to the point of having system calls with
64-bit time_t on all architectures, after a long time of incremental
preparation patches.
There was actually one conversion that I missed during the summer,
i.e. Deepa's timex series, which I now updated based the 5.0-rc1 changes
and review comments.
The following system calls are now added on all 32-bit architectures
using the same system call numbers:
403 clock_gettime64
404 clock_settime64
405 clock_adjtime64
406 clock_getres_time64
407 clock_nanosleep_time64
408 timer_gettime64
409 timer_settime64
410 timerfd_gettime64
411 timerfd_settime64
412 utimensat_time64
413 pselect6_time64
414 ppoll_time64
416 io_pgetevents_time64
417 recvmmsg_time64
418 mq_timedsend_time64
419 mq_timedreceiv_time64
420 semtimedop_time64
421 rt_sigtimedwait_time64
422 futex_time64
423 sched_rr_get_interval_time64
Each one of these corresponds directly to an existing system call
that includes a 'struct timespec' argument, or a structure containing
a timespec or (in case of clock_adjtime) timeval. Not included here
are new versions of getitimer/setitimer and getrusage/waitid, which
are planned for the future but only needed to make a consistent API
rather than for correct operation beyond y2038. These four system
calls are based on 'timeval', and it has not been finally decided
what the replacement kernel interface will use instead.
So far, I have done a lot of build testing across most architectures,
which has found a number of bugs. Runtime testing so far included
testing LTP on 32-bit ARM with the existing system calls, to ensure
we do not regress for existing binaries, and a test with a 32-bit
x86 build of LTP against a modified version of the musl C library
that has been adapted to the new system call interface [3].
This library can be used for testing on all architectures supported
by musl-1.1.21, but it is not how the support is getting integrated
into the official musl release. Official musl support is planned
but will require more invasive changes to the library.
Link: https://lore.kernel.org/lkml/20190110162435.309262-1-arnd@arndb.de/T/
Link: https://lore.kernel.org/lkml/20190118161835.2259170-1-arnd@arndb.de/
Link: https://git.linaro.org/people/arnd/musl-y2038.git/ [2]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJcXf7/AAoJEGCrR//JCVInPSUP/RhsQSCKMGtONB/vVICQhwep
PybhzBSpHWFxszzTi6BEPN1zS9B069G9mDollRBYZCckyPqL/Bv6sI/vzQZdNk01
Q6Nw92OnNE1QP8owZ5TjrZhpbtopWdqIXjsbGZlloUemvuJP2JwvKovQUcn5CPTQ
jbnqU04CVyFFJYVxAnGJ+VSeWNrjW/cm/m+rhLFjUcwW7Y3aodxsPqPP6+K9hY9P
yIWfcH42WBeEWGm1RSBOZOScQl4SGCPUAhFydl/TqyEQagyegJMIyMOv9wZ5AuTT
xK644bDVmNsrtJDZDpx+J8hytXCk1LrnKzkHR/uK80iUIraF/8D7PlaPgTmEEjko
XcrywEkvkXTVU3owCm2/sbV+8fyFKzSPipnNfN1JNxEX71A98kvMRtPjDueQq/GA
Yh81rr2YLF2sUiArkc2fNpENT7EGhrh1q6gviK3FB8YDgj1kSgPK5wC/X0uolC35
E7iC2kg4NaNEIjhKP/WKluCaTvjRbvV+0IrlJLlhLTnsqbA57ZKCCteiBrlm7wQN
4csUtCyxchR9Ac2o/lj+Mf53z68Zv74haIROp18K2dL7ZpVcOPnA3XHeauSAdoyp
wy2Ek6ilNvlNB+4x+mRntPoOsyuOUGv7JXzB9JvweLWUd9G7tvYeDJQp/0YpDppb
K4UWcKnhtEom0DgK08vY
=IZVb
-----END PGP SIGNATURE-----
Merge tag 'y2038-new-syscalls' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground into timers/2038
Pull y2038 - time64 system calls from Arnd Bergmann:
This series finally gets us to the point of having system calls with 64-bit
time_t on all architectures, after a long time of incremental preparation
patches.
There was actually one conversion that I missed during the summer,
i.e. Deepa's timex series, which I now updated based the 5.0-rc1 changes
and review comments.
The following system calls are now added on all 32-bit architectures using
the same system call numbers:
403 clock_gettime64
404 clock_settime64
405 clock_adjtime64
406 clock_getres_time64
407 clock_nanosleep_time64
408 timer_gettime64
409 timer_settime64
410 timerfd_gettime64
411 timerfd_settime64
412 utimensat_time64
413 pselect6_time64
414 ppoll_time64
416 io_pgetevents_time64
417 recvmmsg_time64
418 mq_timedsend_time64
419 mq_timedreceiv_time64
420 semtimedop_time64
421 rt_sigtimedwait_time64
422 futex_time64
423 sched_rr_get_interval_time64
Each one of these corresponds directly to an existing system call that
includes a 'struct timespec' argument, or a structure containing a timespec
or (in case of clock_adjtime) timeval. Not included here are new versions
of getitimer/setitimer and getrusage/waitid, which are planned for the
future but only needed to make a consistent API rather than for correct
operation beyond y2038. These four system calls are based on 'timeval', and
it has not been finally decided what the replacement kernel interface will
use instead.
So far, I have done a lot of build testing across most architectures, which
has found a number of bugs. Runtime testing so far included testing LTP on
32-bit ARM with the existing system calls, to ensure we do not regress for
existing binaries, and a test with a 32-bit x86 build of LTP against a
modified version of the musl C library that has been adapted to the new
system call interface [3]. This library can be used for testing on all
architectures supported by musl-1.1.21, but it is not how the support is
getting integrated into the official musl release. Official musl support is
planned but will require more invasive changes to the library.
Link: https://lore.kernel.org/lkml/20190110162435.309262-1-arnd@arndb.de/T/
Link: https://lore.kernel.org/lkml/20190118161835.2259170-1-arnd@arndb.de/
Link: https://git.linaro.org/people/arnd/musl-y2038.git/ [2]
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlxfARoQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjsgEACP8vQzbvsOZOxHKi9Vcd8ziwyjyBebNh4F
cKOx2Blgv0ReVAqLOVp9VJOJQoVQumV1btaA2YrmevxnCMpNUBpbP6G02tAqe9Z+
D75FSpZXy4UvcMSlhfc/iB/RMI06benI9LnuL7zbzIQtrbtu+OFRnO6fpQOVGLxT
Qa1wt/Rgahc48L4aHnIgPn0nyBRsEvuhC6FjI2D8akDaNiaHzwtGbpx7yDdmLNml
fCzC2uSRJ31bXsO/5/fJorinaJ56r5N8aHaINYwXDv8zd8i94nQZhITAasXub1Km
0nyuAg/fSzIdkrGmPINTKFaGYsOfRwpS4C4vagreBhzjfolPY0z9sQEQ63gZzDrd
mAjHPxLTd165OLlR/RxoMC8AjZCZ0/YQaucxUOPkaIHfth5/dy5BFaCkWyA/I7/Z
VnAyq0SqeL4hgIOGxZM0HeehKx+palNdJNZTcY7vF/7MVPuh5WM6z/FWsFa8k+ss
B9YN4wchh7I8EVbLmfz9s/eqabRWF3Agh1dE+yAKwt1KIWHaMXWZTRQnj/69fs2e
s3pwVMiiSz6K/Xnoe12nmQ4K0XeyKNROO78IIGY/Oa0Pe/hzCAaJMRMDsLp5EcJj
dxpoi1OfGHMGoqYhL6tx6Atq5f6CMDrS28k/D44DHfO7T1qQGVy1A9SY7ZCfM5+c
HKxTuRh8mg==
=tuL6
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20190209' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request from Christoph, fixing namespace locking when
dealing with the effects log, and a rapid add/remove issue (Keith)
- blktrace tweak, ensuring requests with -1 sectors are shown (Jan)
- link power management quirk for a Smasung SSD (Hans)
- m68k nfblock dynamic major number fix (Chengguang)
- series fixing blk-iolatency inflight counter issue (Liu)
- ensure that we clear ->private when setting up the aio kiocb (Mike)
- __find_get_block_slow() rate limit print (Tetsuo)
* tag 'for-linus-20190209' of git://git.kernel.dk/linux-block:
blk-mq: remove duplicated definition of blk_mq_freeze_queue
Blk-iolatency: warn on negative inflight IO counter
blk-iolatency: fix IO hang due to negative inflight counter
blktrace: Show requests without sector
fs: ratelimit __find_get_block_slow() failure message.
m68k: set proper major_num when specifying module param major_num
libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD
nvme-pci: fix rapid add remove sequence
nvme: lock NS list changes while handling command effects
aio: initialize kiocb private in case any filesystems expect it.
An ipvlan bug fix in 'net' conflicted with the abstraction away
of the IPV6 specific support in 'net-next'.
Similarly, a bug fix for mlx5 in 'net' conflicted with the flow
action conversion in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Here are some driver core fixes for 5.0-rc6.
Well, not so much "driver core" as "debugfs". There's a lot of
outstanding debugfs cleanup patches coming in through different
subsystem trees, and in that process the debugfs core was found that it
really should return errors when something bad happens, to prevent
random files from showing up in the root of debugfs afterward. So
debugfs was fixed up to handle this properly, and then two fixes for
the relay and blk-mq code was needed as it was making invalid
assumptions about debugfs return values.
There's also a cacheinfo fix in here that resolves a tiny issue.
All of these have been in linux-next for over a week with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXF069g8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yk0+gCgy9PTVAJR5ZbYtWTJOTdBnd7pfqMAoMuGxc+6
LLEbfSykLRxEf5SeOJun
=KP8e
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are some driver core fixes for 5.0-rc6.
Well, not so much "driver core" as "debugfs". There's a lot of
outstanding debugfs cleanup patches coming in through different
subsystem trees, and in that process the debugfs core was found that
it really should return errors when something bad happens, to prevent
random files from showing up in the root of debugfs afterward. So
debugfs was fixed up to handle this properly, and then two fixes for
the relay and blk-mq code was needed as it was making invalid
assumptions about debugfs return values.
There's also a cacheinfo fix in here that resolves a tiny issue.
All of these have been in linux-next for over a week with no reported
problems"
* tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
blk-mq: protect debugfs_create_files() from failures
relay: check return of create_buf_file() properly
debugfs: debugfs_lookup() should return NULL if not found
debugfs: return error values, not NULL
debugfs: fix debugfs_rename parameter checking
cacheinfo: Keep the old value if of_property_read_u32 fails
- Fix cache coherency problem with writeback mappings
- Fix buffer deadlock when shutting fs down
- Fix a null pointer dereference when running online repair
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlxYbyEACgkQ+H93GTRK
tOtLNg//VIiU6w1EgiQBgk8XxN3i9YQLBIzfbtgZGtLJWX8zYnGx9H4iVfZ9UDr8
eKFoPVLvqZo5mutuot+3+ps5H4z6g9BotS048FJjFoarQwtYqnG3tcFkmLDInKW1
jTPWBV/P7w+ODyPO082SyQ+Zn9pooyXkPoBbgA+vbQoqIsY5IF7VeFasrFMtRu21
PEm/CpMxK5VMly+5ceOoqtdlWvRPDfczLfzW/iDZ4Qs2itUqFA6TJo5TD7kd4A/f
yrjV6H5tWtp0uvBCBDq4W225uVUFVWC+wTrrII6qbvuDBNWfBsQ65GczYubAAu9X
kdJdY3xj/Br1dk6jLTciCjihbjJ49xaxXfLAokNkh1pjHqHyinB5ALXC0dG4o+eo
d5y5qo10zt3HZ8Kzr8753SzxRBjGbQhok+ytrBSpX8GckhAXmH5S6WZDFDh6PbJj
5PSwvL7FNbS4M/Myjl+dwk3kWLVrGV2SglOJxCCsqCZPxNzopIrNf1uazLTZV+/2
d+G7LQPXSvjK/iLfDQH/6sBIREx0nd3H/6mnmWBg/1xMD6z/Hgn8GJvAE8luRtOi
usXYcjlkSEOSwxbUC4fCo0CrPp8DOHbrEEO4pavTN+GVIYsIen0ghq/x0HfcOCEu
XguyRTYdQXnLZNo3zCqmnU4/C1W2L5Oce4IDznH8PEIgqAqXXrQ=
=XsPH
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Here are a handful of XFS fixes to fix a data corruption problem, a
crasher bug, and a deadlock.
Summary:
- Fix cache coherency problem with writeback mappings
- Fix buffer deadlock when shutting fs down
- Fix a null pointer dereference when running online repair"
* tag 'xfs-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: set buffer ops when repair probes for btree type
xfs: end sync buffer I/O properly on shutdown error
xfs: eof trim writeback mapping as soon as it is cached
Creating a new cache for kernfs_iattrs.
Currently, memory is allocated with kzalloc() which
always gives aligned memory. On ARM, this is 64 byte aligned.
To avoid the wastage of memory in aligning the size requested,
a new cache for kernfs_iattrs is created.
Size of struct kernfs_iattrs is 80 Bytes.
On ARM, it will come in kmalloc-128 slab.
and it will come in kmalloc-192 slab if debug info is enabled.
Extra bytes taken 48 bytes.
Total number of objects created : 4096
Total saving = 48*4096 = 192 KB
After creating new slab(When debug info is enabled) :
sh-3.2# cat /proc/slabinfo
...
kernfs_iattrs_cache 4069 4096 128 32 1 : tunables 0 0 0 : slabdata 128 128 0
...
All testing has been done on ARM target.
Signed-off-by: Ayush Mittal <ayush.m@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This include is not needed (fs/sysfs/file.c builds just fine without
it). Remove it.
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJcXJY6AAoJECebzXlCjuG+scAP/jm9ETUxC3E9ZcVetSLs7vf1
AxHVsr3E3qf9uViq/+1NBRJ/BKE1Porzi1Uz01ie5MdY6SHWFMxvIvZrTRuEAhbc
VA9xeRcnX+7RkPWik3sepSUieVy8KgJDMxdE07HwwyzST14I1s5Ev79wo6XiYlTw
3+ZdZe19Y4owmTkbDiLsxJVsI2Y8b+9BIhZ9/ICRyFzZclnyLdO15HTDr4q7E9cw
ZEZMOoljX4cjY8cD7tqf68QECxZYm4a8Ba+L6P+oqKajq/6yUrocXA2UG65EMtIQ
LtMIdpkC+zHFagRQBC3ymqEWTpX9ED6TA4H4ZSdh6UH8NwYXHsxQUYfI4Oetju/B
iqWBbXpwg2jMNRDhXS/KdNezYcGjYGJZ03dezeSP/GwojoiKALD0iXxWU2zyq0Qs
crnhmc2j1wZZl5CFXLCYwjDjHbeH6gWGfLuzAv1Q9/jQUitQ2CpC4t1MCRdOlWBt
cqjCleF6Rd3oVk51BdYPm5OyCHyQjrrXmOsx8aHkY774p7TqsaHBjSreTBjQWhO8
wAfnr6yS4/21Hfrji52Nf4Q6UJ1FFEWB0wXJhYAHzem1RsPGSbnEDam6Bpd6Bh0d
ZxAU4spoVhezQPjF4JCmHfiAJ7rbegfltJa669rE5L1kbUYYE6TAAcOm8RTSWoPZ
iGxkg/en5XiGsOGH9AyB
=++rV
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.0-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
"Two small nfsd bugfixes for 5.0, for an RDMA bug and a file clone bug"
* tag 'nfsd-5.0-1' of git://linux-nfs.org/~bfields/linux:
svcrdma: Remove max_sge check at connect time
nfsd: Fix error return values for nfsd4_clone_file_range()
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXFls+wAKCRDh3BK/laaZ
PC+hAQDRkyJeAmMzpHwvv/IASqpJgc6HrSzH0p201lDyARcKIAD+MWxZHYP4ltAn
WVTLIvYT1xsoqGG3plfZ/d1iNbAWcwU=
=cL/O
-----END PGP SIGNATURE-----
Merge tag 'fuse-fixes-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"A fix for a CUSE regression introduced in v4.20, as well as fixes for
a couple of old bugs"
* tag 'fuse-fixes-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: decrement NR_WRITEBACK_TEMP on the right page
fuse: call pipe_buf_release() under pipe lock
cuse: fix ioctl
fuse: handle zero sized retrieve correctly
A lot of system calls that pass a time_t somewhere have an implementation
using a COMPAT_SYSCALL_DEFINEx() on 64-bit architectures, and have
been reworked so that this implementation can now be used on 32-bit
architectures as well.
The missing step is to redefine them using the regular SYSCALL_DEFINEx()
to get them out of the compat namespace and make it possible to build them
on 32-bit architectures.
Any system call that ends in 'time' gets a '32' suffix on its name for
that version, while the others get a '_time32' suffix, to distinguish
them from the normal version, which takes a 64-bit time argument in the
future.
In this step, only 64-bit architectures are changed, doing this rename
first lets us avoid touching the 32-bit architectures twice.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
If the parameter 'count' is non-zero, nfsd4_clone_file_range() will
currently clobber all errors returned by vfs_clone_file_range() and
replace them with EINVAL.
Fixes: 42ec3d4c02 ("vfs: make remap_file_range functions take and...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When something let __find_get_block_slow() hit all_mapped path, it calls
printk() for 100+ times per a second. But there is no need to print same
message with such high frequency; it is just asking for stall warning, or
at least bloating log files.
[ 399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
[ 399.873324][T15342] b_state=0x00000029, b_size=512
[ 399.878403][T15342] device loop0 blocksize: 4096
[ 399.883296][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
[ 399.890400][T15342] b_state=0x00000029, b_size=512
[ 399.895595][T15342] device loop0 blocksize: 4096
[ 399.900556][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
[ 399.907471][T15342] b_state=0x00000029, b_size=512
[ 399.912506][T15342] device loop0 blocksize: 4096
This patch reduces frequency to up to once per a second, in addition to
concatenating three lines into one.
[ 399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8, b_state=0x00000029, b_size=512, device loop0 blocksize: 4096
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A recent optimization had left private uninitialized.
Fixes: 2bc4ca9bb6 ("aio: don't zero entire aio_kiocb aio_get_req()")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable sighand_struct.count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
** Important note for maintainers:
Some functions from refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
in state to be merged to the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have some undocumented
memory guarantees for this variable usage.
For the sighand_struct.count it might make a difference
in following places:
- __cleanup_sighand: decrement in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: viro@zeniv.linux.org.uk
Link: https://lkml.kernel.org/r/1547814450-18902-2-git-send-email-elena.reshetova@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In xrep_findroot_block, we work out the btree type and correctness of a
given block by calling different btree verifiers on root block
candidates. However, we leave the NULL b_ops while ->verify_read
validates the block, which means that if the verifier calls
xfs_buf_verifier_error it'll crash on the null b_ops. Fix it to set
b_ops before calling the verifier and unsetting it if the verifier
fails.
Furthermore, improve the documentation around xfs_buf_ensure_ops, which
is the function that is responsible for cleaning up the b_ops state of
buffers that go through xrep_findroot_block but don't match anything.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
As of commit e339dd8d8b ("xfs: use sync buffer I/O for sync delwri
queue submission"), the delwri submission code uses sync buffer I/O
for sync delwri I/O. Instead of waiting on async I/O to unlock the
buffer, it uses the underlying sync I/O completion mechanism.
If delwri buffer submission fails due to a shutdown scenario, an
error is set on the buffer and buffer completion never occurs. This
can cause xfs_buf_delwri_submit() to deadlock waiting on a
completion event.
We could check the error state before waiting on such buffers, but
that doesn't serialize against the case of an error set via a racing
I/O completion. Instead, invoke I/O completion in the shutdown case
regardless of buffer I/O type.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The cached writeback mapping is EOF trimmed to try and avoid races
between post-eof block management and writeback that result in
sending cached data to a stale location. The cached mapping is
currently trimmed on the validation check, which leaves a race
window between the time the mapping is cached and when it is trimmed
against the current inode size.
For example, if a new mapping is cached by delalloc conversion on a
blocksize == page size fs, we could cycle various locks, perform
memory allocations, etc. in the writeback codepath before the
associated mapping is eventually trimmed to i_size. This leaves
enough time for a post-eof truncate and file append before the
cached mapping is trimmed. The former event essentially invalidates
a range of the cached mapping and the latter bumps the inode size
such the trim on the next writepage event won't trim all of the
invalid blocks. fstest generic/464 reproduces this scenario
occasionally and causes a lost writeback and stale delalloc blocks
warning on inode inactivation.
To work around this problem, trim the cached writeback mapping as
soon as it is cached in addition to on subsequent validation checks.
This is a minor tweak to tighten the race window as much as possible
until a proper invalidation mechanism is available.
Fixes: 40214d128e ("xfs: trim writepage mapping to within eof")
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval
as the time format. struct timeval is not y2038 safe.
The subsequent patches in the series add support for new socket
timeout options with _NEW suffix that will use y2038 safe
data structures. Although the existing struct timeval layout
is sufficiently wide to represent timeouts, because of the way
libc will interpret time_t based on user defined flag, these
new flags provide a way of having a structure that is the same
for all architectures consistently.
Rename the existing options with _OLD suffix forms so that the
right option is enabled for userspace applications according
to the architecture and time_t definition of libc.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: ccaulfie@redhat.com
Cc: deller@gmx.de
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: cluster-devel@redhat.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlxWtJgACgkQxWXV+ddt
WDsDow//ZpnyDwQWvSIfF2UUQOPlcBjbHKuBA7rU0wdybW635QYGR0mqnI+1VnMj
7ssUkeN6N0a2gQzrUG4Y+zpdzOWv2xQ4jKZ9GMOp9gwyzEFyPkcFXOnmM8UfYtVu
e3fK65e8BZHmTeu0kGKah4Dt1g0t4fUmhsKR4Pfp5YNJC+zuuGTwUW1K/ZQHXJ+3
kTHc7WP1lsF7wgaZ+Gl+Kvp8fVrHVdygMVTdRBW8QaBgPLa/KExvK62jW+NmCYhj
7OIkWdew7e8IXc3Ie5IbOomHAv7IgqqgiO9VO9+n0EpyV4UobUgxrgBKJ+0yc1Ya
eidbKhMslwUE50y00JVm+vw0gwQHkR+hZDn/mRB6xiIeI8tu/yQIJZ6AhYJXoByR
cu8+SNO5Z5dOZ1f146ZH8lnkr24tuSnkDUhbRDR5pdb4tAHHej2ALzhbfbwbPEpF
IverYLw5fOMGeRU/mBsjkVadpSZ4S0HVNU85ERdhLtVLK1PSaY2UkUaA+Ii5y7au
EYDjaGMflmJ8cAVqgtgedEff2n8OKDnzRZlz4IPLI73MVSITGZkM7PmYmYsLm2Bs
mDPnmyqR8kzcd1RRtSeZTvqOpAIZG+QUOmD2jiKrchmp54Sz0V/HJWRs3aQybD6Q
ph0yAcbkvgp/ewe5IFgaI0kcyH7zYdL6GtiI2WUE3/8DObgrgsA=
=E2PP
-----END PGP SIGNATURE-----
Merge tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- regression fix: transaction commit can run away due to delayed ref
waiting heuristic, this is not necessary now because of the proper
reservation mechanism introduced in 5.0
- regression fix: potential crash due to use-before-check of an ERR_PTR
return value
- fix for transaction abort during transaction commit that needs to
properly clean up pending block groups
- fix deadlock during b-tree node/leaf splitting, when this happens on
some of the fundamental trees, we must prevent new tree block
allocation to re-enter indirectly via the block group flushing path
- potential memory leak after errors during mount
* tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: On error always free subvol_name in btrfs_mount
btrfs: clean up pending block groups when transaction commit aborts
btrfs: fix potential oops in device_list_add
btrfs: don't end the transaction for delayed refs in throttle
Btrfs: fix deadlock when allocating tree block during leaf/node split
Merge misc fixes from Andrew Morton:
"24 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
autofs: fix error return in autofs_fill_super()
autofs: drop dentry reference only when it is never used
fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
psi: clarify the Kconfig text for the default-disable option
mm, memory_hotplug: __offline_pages fix wrong locking
mm: hwpoison: use do_send_sig_info() instead of force_sig()
kasan: mark file common so ftrace doesn't trace it
init/Kconfig: fix grammar by moving a closing parenthesis
lib/test_kmod.c: potential double free in error handling
mm, oom: fix use-after-free in oom_kill_process
mm/hotplug: invalid PFNs from pfn_to_online_page()
mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
psi: fix aggregation idle shut-off
mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
oom, oom_reaper: do not enqueue same task twice
mm: migrate: make buffer_migrate_page_norefs() actually succeed
kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
x86_64: increase stack size for KASAN_EXTRA
...
-----BEGIN PGP SIGNATURE-----
iQGyBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlxUf8kACgkQiiy9cAdy
T1Fy8Qv4rlgVRBPcSJpvZKhFjMd5KskVIDnP0tOc5od+Mg+547eAg7UKKnJVDgKS
OUtiP8uC/UErWvQ214hoXF1sMwoG+rDdTRdIOVDLD2a2gIIJzaIPADhYt4w5kiAX
nO9NwPwk03KTzNUQmpdFPofbLK4csmJUR4kBNE+XhZTulmw9BkplGXnNkQGjsEdf
nY6YdfqofE//DmR/KB1BylD0lxDtnfuB5zoELPmM6X4iWtD9W6pKVg23huEv+Dmh
80LdYt77vI4RKEzOAmsYEpROdAmlCksVQC1nlh5EHOuMN/9TCdpdWRahJWWhbbKr
/b8TuD+0QUUKEQdGvwM8DM/OCjnwoJphzI8CK1VzRpM4XkZxpQTzng2zVFPNzs4Y
wwKCsxxTO7ePGg6hE/DTnRn9XJtKbR1nQsIjHcw6rE0ydl+P6YRZDUEehqyRU+jd
Z09XCPS1+zeUPE6PRE9wlrIt7QdxOynufhaWRKf18b2b4cfKbNGcd+rMMZPagUQe
x2CMkEU=
=qX/z
-----END PGP SIGNATURE-----
Merge tag '5.0-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb3 fixes from Steve French:
"SMB3 fixes, some from this week's SMB3 test evemt, 5 for stable and a
particularly important one for queryxattr (see xfstests 70 and 117)"
* tag '5.0-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
CIFS: fix use-after-free of the lease keys
CIFS: Do not consider -ENODATA as stat failure for reads
CIFS: Do not count -ENODATA as failure for query directory
CIFS: Fix trace command logging for SMB2 reads and writes
CIFS: Fix possible oops and memory leaks in async IO
cifs: limit amount of data we request for xattrs to CIFSMaxBufSize
cifs: fix computation for MAX_SMB2_HDR_SIZE
In autofs_fill_super() on error of get inode/make root dentry the return
should be ENOMEM as this is the only failure case of the called
functions.
Link: http://lkml.kernel.org/r/154725123240.11260.796773942606871359.stgit@pluto-themaw-net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
autofs_expire_run() calls dput(dentry) to drop the reference count of
dentry. However, dentry is read via autofs_dentry_ino(dentry) after
that. This may result in a use-free-bug. The patch drops the reference
count of dentry only when it is never used.
Link: http://lkml.kernel.org/r/154725122396.11260.16053424107144453867.stgit@pluto-themaw-net
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When superblock has lots of inodes without any pagecache (like is the
case for /proc), drop_pagecache_sb() will iterate through all of them
without dropping sb->s_inode_list_lock which can lead to softlockups
(one of our customers hit this).
Fix the problem by going to the slow path and doing cond_resched() in
case the process needs rescheduling.
Link: http://lkml.kernel.org/r/20190114085343.15011-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- fix page migration when using iomap for pagecache management
- fix a use-after-free bug in the directio code
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlxPK1IACgkQ+H93GTRK
tOvzeA/+I4bWVmovfV+EGFzHSV6zRsy17v6c4ncMrdia41rhmvxl+sAJgj2+uFCb
J37cCMpPrmAOz+JGIW7PbCt8uzmwaXOfB9p9N58wM+hSSxtlN+wZFsIaoepOUTDK
t3e2L7QQxQjN9HXZU0RNUi/zgS3poDfzap7cZ71spBxX5hVd1zVQa0q/o5OXr7OI
sBlZLsIOhmS8WU2TmfkwzUVi+/FR4dCgyP8eDAGho/KbwvO9sfWzLNxf0U/ORWfA
JG+2LX42eKKjX6wo39zW1mAXwOBhnLnCqOOgKrDy8XRXPARiNAzLNn0AwhxJSAqD
z3qE298Oag6gZo0lNJjDXIko5D9y2koWjQe7z4fJ2JYpV5mSyq8/F4XDNO2FKIap
07p0OBGa1yfwQYmS5TrhJvYwvsHqTNs122jpowqeD3o0Xh64y2TZLEMdhhIsRjga
+S9OSVQ15JDf/QeI5LPGK6Oc6B3JnRYgrYf7g7DYu4eqEsJ3V3pqtbXzjxGkzUjx
5xf85ujuRUQoKCPZQ00ewmsfZMfOcaYqfhosx6LvR6ZPvPH3Ex3nLHks5ZV/eXfR
Auusq6XMiHb5ljukfVCa0WStntUl5gMaRha5QJy1Vg5Zd1ikvTkB5CkJFlODJ8hS
GaEIa58Gf75dkHOvm4bFEOoy2EcE3I+qdwR+0xgg+6gDlNeZZNQ=
=KYgc
-----END PGP SIGNATURE-----
Merge tag 'iomap-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:
"A couple of iomap fixes to eliminate some memory corruption and hang
problems that were reported:
- fix page migration when using iomap for pagecache management
- fix a use-after-free bug in the directio code"
* tag 'iomap-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: fix a use after free in iomap_dio_rw
iomap: get/put the page in iomap_page_create/release()
On ppc64le, When a string with PAGE_SIZE - 1 (i.e. 64k-1) length is
passed as a "filesystem type" argument to the mount(2) syscall,
copy_mount_string() ends up allocating 64k (the PAGE_SIZE on ppc64le)
worth of space for holding the string in kernel's address space.
Later, in set_precision() (invoked by get_fs_type() ->
__request_module() -> vsnprintf()), we end up assigning
strlen(fs-type-string) i.e. 65535 as the
value to 'struct printf_spec'->precision member. This field has a width
of 16 bits and it is a signed data type. Hence an invalid value ends
up getting assigned. This causes the "WARN_ONCE(spec->precision != prec,
"precision %d too large", prec)" statement inside set_precision() to be
executed.
This commit fixes the bug by limiting the length of the string passed by
copy_mount_string() to strndup_user() to PATH_MAX.
Signed-off-by: Chandan Rajendra <chandan@linux.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.ibm.com>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This reverts commit ad211f3e94.
As Jan Kara pointed out, this change was unsafe since it means we lose
the call to sync_mapping_buffers() in the nojournal case. The
original point of the commit was avoid taking the inode mutex (since
it causes a lockdep warning in generic/113); but we need the mutex in
order to call sync_mapping_buffers().
The real fix to this problem was discussed here:
https://lore.kernel.org/lkml/20181025150540.259281-4-bvanassche@acm.org
The proposed patch was to fix a syzbot complaint, but the problem can
also demonstrated via "kvm-xfstests -c nojournal generic/113".
Multiple solutions were discused in the e-mail thread, but none have
landed in the kernel as of this writing. Anyway, commit
ad211f3e94 is absolutely the wrong way to suppress the lockdep, so
revert it.
Fixes: ad211f3e94 ("ext4: use ext4_write_inode() when fsyncing w/o a journal")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported: Jan Kara <jack@suse.cz>
This reverts commit 2d29f6b96d.
It turns out that the fix can lead to a ~20 percent performance regression
in initial writes to the page cache according to iozone. Let's revert this
for now to have more time for a proper fix.
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stable bugfix:
- Fix up return value on fatal errors in nfs_page_async_flush()
Other bugfix:
- Fix NULL pointer dereference of dev_name
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlxTOEsACgkQ18tUv7Cl
QOuS2A//U2J1xz2N8R/k9I4puMXss+DpUAfryNRrDul0qL4tsr7UhHzHezJVl17X
coPGA/YD+voybyT+eYeACCHUhDMNN8gj2KoCMlE1ueWAbiCOxrS4NgFM2djO3lka
dlfqgSbVS1Z7+KtEEiFGq/HiF6y0WxanMBHnfhllNbXBDE6W0/+EPdgjX7fZF3FF
AS6QQmruXL/b1/hJasfTsF3wcHs3y+Y23RP85j4F8aYrcWLOyPUhhuzv/o6Zoh37
fqltMxueWy+2qpn8dBE+9ILuKnUxnIsIwpF4YFhI7XrQlqMIWYMrShiqSDqYeVUP
3qdX8LtRR2VsNCTDR9HamVtCkbi9DkJRXQA/fChVPiLA+P0W2Q2uiKsNKEijuZdl
9fvl9aIL/+glczHrZeJTKellFSEocaZ/L5gVmpM6Fk8zyFitP0+nkO40g/qou+A0
O77A+EK9v4XPe8z87kwrZhphT12QZK2oIPMAZDnjitktbuObip0Wva4w92KnIqK0
QPIN081oxNF7BnWEUESCTeqXl670lV83Xek1eVHSCTnFOI68riP1YoUQlIhujV/R
82J+y6HJYtLDj87NuJrAAXtUrtzAPDr39TJr3V2aH0kdpPajUAhkC3gLix13ORyM
cmP3K1M3U5f3HAElrywqQrGcxaYKN/Hpfb2427vEnbxieTKElVo=
=ZOXa
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-5.0-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"This addresses two bugs, one in the error code handling of
nfs_page_async_flush() and one to fix a potential NULL pointer
dereference in nfs_parse_devname().
Stable bugfix:
- Fix up return value on fatal errors in nfs_page_async_flush()
Other bugfix:
- Fix NULL pointer dereference of dev_name"
* tag 'nfs-for-5.0-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Fix up return value on fatal errors in nfs_page_async_flush()
nfs: Fix NULL pointer dereference of dev_name
The request buffers are freed right before copying the pointers.
Use the func args instead which are identical and still valid.
Simple reproducer (requires KASAN enabled) on a cifs mount:
echo foo > foo ; tail -f foo & rm foo
Cc: <stable@vger.kernel.org> # 4.20
Fixes: 179e44d49c ("smb3: add tracepoint for sending lease break responses to server")
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
The current dentry number tracking code doesn't distinguish between
positive & negative dentries. It just reports the total number of
dentries in the LRU lists.
As excessive number of negative dentries can have an impact on system
performance, it will be wise to track the number of positive and
negative dentries separately.
This patch adds tracking for the total number of negative dentries in
the system LRU lists and reports it in the 5th field in the
/proc/sys/fs/dentry-state file. The number, however, does not include
negative dentries that are in flight but not in the LRU yet as well as
those in the shrinker lists which are on the way out anyway.
The number of positive dentries in the LRU lists can be roughly found by
subtracting the number of negative dentries from the unused count.
Matthew Wilcox had confirmed that since the introduction of the
dentry_stat structure in 2.1.60, the dummy array was there, probably for
future extension. They were not replacements of pre-existing fields.
So no sane applications that read the value of /proc/sys/fs/dentry-state
will do dummy thing if the last 2 fields of the sysctl parameter are not
zero. IOW, it will be safe to use one of the dummy array entry for
negative dentry count.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
lists and the shrink lists where the DCACHE_LRU_LIST bit is set.
The shrink_dcache_sb() function moves dentries from the LRU list to a
shrink list and subtracts the dentry count from nr_dentry_unused. This
is incorrect as the nr_dentry_unused count will also be decremented in
shrink_dentry_list() via d_shrink_del().
To fix this double decrement, the decrement in the shrink_dcache_sb()
function is taken out.
Fixes: 4e717f5c10 ("list_lru: remove special case function list_lru_dispose_all."
Cc: stable@kernel.org
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The subvol_name is allocated in btrfs_parse_subvol_options and is
consumed and freed in mount_subvol. Add a free to the error paths that
don't call mount_subvol so that it is guaranteed that subvol_name is
freed when an error happens.
Fixes: 312c89fbca ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()")
Cc: stable@vger.kernel.org # v4.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
alloc_fs_devices() can return ERR_PTR(-ENOMEM), so dereferencing its
result before the check for IS_ERR() is a bad idea.
Fixes: d1a6300282 ("btrfs: add members to fs_devices to track fsid changes")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Lots of callers of debugfs_lookup() were just checking NULL to see if
the file/directory was found or not. By changing this in ff9fb72bc0
("debugfs: return error values, not NULL") we caused some subsystems to
easily crash.
Fixes: ff9fb72bc0 ("debugfs: return error values, not NULL")
Reported-by: syzbot+b382ba6a802a3d242790@syzkaller.appspotmail.com
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When doing reads beyound the end of a file the server returns
error STATUS_END_OF_FILE error which is mapped to -ENODATA.
Currently we report it as a failure which confuses read stats.
Change it to not consider -ENODATA as failure for stat purposes.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Currently we log success once we send an async IO request to
the server. Instead we need to analyse a response and then log
success or failure for a particular command. Also fix argument
list for read logging.
Cc: <stable@vger.kernel.org> # 4.18
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Allocation of a page array for non-cached IO was separated from
allocation of rdata and wdata structures and this introduced memory
leaks and a possible null pointer dereference. This patch fixes
these problems.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
minus the various headers and blobs that will be part of the reply.
or else we might trigger a session reconnect.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
The size of the fixed part of the create response is 88 bytes not 56.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Ensure that we return the fatal error value that caused us to exit
nfs_page_async_flush().
Fixes: c373fff7bd ("NFSv4: Don't special case "launder"")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.12+
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When an error happens, debugfs should return an error pointer value, not
NULL. This will prevent the totally theoretical error where a debugfs
call fails due to lack of memory, returning NULL, and that dentry value
is then passed to another debugfs call, which would end up succeeding,
creating a file at the root of the debugfs tree, but would then be
impossible to remove (because you can not remove the directory NULL).
So, to make everyone happy, always return errors, this makes the users
of debugfs much simpler (they do not have to ever check the return
value), and everyone can rest easy.
Reported-by: Gary R Hook <ghook@amd.com>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Michal Hocko <mhocko@kernel.org>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a NULL pointer dereference of dev_name in nfs_parse_devname()
The oops looks something like:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
RIP: 0010:nfs_fs_mount+0x3b6/0xc20 [nfs]
...
Call Trace:
? ida_alloc_range+0x34b/0x3d0
? nfs_clone_super+0x80/0x80 [nfs]
? nfs_free_parsed_mount_data+0x60/0x60 [nfs]
mount_fs+0x52/0x170
? __init_waitqueue_head+0x3b/0x50
vfs_kern_mount+0x6b/0x170
do_mount+0x216/0xdc0
ksys_mount+0x83/0xd0
__x64_sys_mount+0x25/0x30
do_syscall_64+0x65/0x220
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fix this by adding a NULL check on dev_name
Signed-off-by: Yao Liu <yotta.liu@ucloud.cn>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Previously callers to btrfs_end_transaction_throttle() would commit the
transaction if there wasn't enough delayed refs space. This happens in
relocation, and if the fs is relatively empty we'll run out of delayed
refs space basically immediately, so we'll just be stuck in this loop of
committing the transaction over and over again.
This code existed because we didn't have a good feedback mechanism for
running delayed refs, but with the delayed refs rsv we do now. Delete
this throttling code and let the btrfs_start_transaction() in relocation
deal with putting pressure on the delayed refs infrastructure. With
this patch we no longer take 5 minutes to balance a metadata only fs.
Qu has submitted a fstest to catch slow balance or excessive transaction
commits. Steps to reproduce:
* create subvolume
* create many (eg. 16000) inlined files, of size 2KiB
* iteratively snapshot and touch several files to trigger metadata
updates
* start balance -m
Reported-by: Qu Wenruo <wqu@suse.com>
Fixes: 64403612b7 ("btrfs: rework btrfs_check_space_for_delayed_refs")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add tags and steps to reproduce ]
Signed-off-by: David Sterba <dsterba@suse.com>
When splitting a leaf or node from one of the trees that are modified when
flushing pending block groups (extent, chunk, device and free space trees),
we need to allocate a new tree block, which in turn can result in the need
to allocate a new block group. After allocating the new block group we may
need to flush new block groups that were previously allocated during the
course of the current transaction, which is what may cause a deadlock due
to attempts to write lock twice the same leaf or node, as when splitting
a leaf or node we are holding a write lock on it and its parent node.
The same type of deadlock can also happen when increasing the tree's
height, since we are holding a lock on the existing root while allocating
the tree block to use as the new root node.
An example trace when the deadlock happens during the leaf split path is:
[27175.293054] CPU: 0 PID: 3005 Comm: kworker/u17:6 Tainted: G W 4.19.16 #1
[27175.293942] Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
[27175.294846] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
(...)
[27175.298384] RSP: 0018:ffffab2087107758 EFLAGS: 00010246
[27175.299269] RAX: 0000000000000bbd RBX: ffff9fadc7141c48 RCX: 0000000000000001
[27175.300155] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9fadc7141c48
[27175.301023] RBP: 0000000000000001 R08: ffff9faeb6ac1040 R09: ffff9fa9c0000000
[27175.301887] R10: 0000000000000000 R11: 0000000000000040 R12: ffff9fb21aac8000
[27175.302743] R13: ffff9fb1a64d6a20 R14: 0000000000000001 R15: ffff9fb1a64d6a18
[27175.303601] FS: 0000000000000000(0000) GS:ffff9fb21fa00000(0000) knlGS:0000000000000000
[27175.304468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27175.305339] CR2: 00007fdc8743ead8 CR3: 0000000763e0a006 CR4: 00000000003606f0
[27175.306220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[27175.307087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[27175.307940] Call Trace:
[27175.308802] btrfs_search_slot+0x779/0x9a0 [btrfs]
[27175.309669] ? update_space_info+0xba/0xe0 [btrfs]
[27175.310534] btrfs_insert_empty_items+0x67/0xc0 [btrfs]
[27175.311397] btrfs_insert_item+0x60/0xd0 [btrfs]
[27175.312253] btrfs_create_pending_block_groups+0xee/0x210 [btrfs]
[27175.313116] do_chunk_alloc+0x25f/0x300 [btrfs]
[27175.313984] find_free_extent+0x706/0x10d0 [btrfs]
[27175.314855] btrfs_reserve_extent+0x9b/0x1d0 [btrfs]
[27175.315707] btrfs_alloc_tree_block+0x100/0x5b0 [btrfs]
[27175.316548] split_leaf+0x130/0x610 [btrfs]
[27175.317390] btrfs_search_slot+0x94d/0x9a0 [btrfs]
[27175.318235] btrfs_insert_empty_items+0x67/0xc0 [btrfs]
[27175.319087] alloc_reserved_file_extent+0x84/0x2c0 [btrfs]
[27175.319938] __btrfs_run_delayed_refs+0x596/0x1150 [btrfs]
[27175.320792] btrfs_run_delayed_refs+0xed/0x1b0 [btrfs]
[27175.321643] delayed_ref_async_start+0x81/0x90 [btrfs]
[27175.322491] normal_work_helper+0xd0/0x320 [btrfs]
[27175.323328] ? move_linked_works+0x6e/0xa0
[27175.324160] process_one_work+0x191/0x370
[27175.324976] worker_thread+0x4f/0x3b0
[27175.325763] kthread+0xf8/0x130
[27175.326531] ? rescuer_thread+0x320/0x320
[27175.327284] ? kthread_create_worker_on_cpu+0x50/0x50
[27175.328027] ret_from_fork+0x35/0x40
[27175.328741] ---[ end trace 300a1b9f0ac30e26 ]---
Fix this by preventing the flushing of new blocks groups when splitting a
leaf/node and when inserting a new root node for one of the trees modified
by the flushing operation, similar to what is done when COWing a node/leaf
from on of these trees.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202383
Reported-by: Eli V <eliventer@gmail.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a local wait_for_completion variable to avoid an access to the
potentially freed dio struture after dropping the last reference count.
Also use the chance to document the completion behavior to make the
refcounting clear to the reader of the code.
Fixes: ff6a9292e6 ("iomap: implement direct I/O")
Reported-by: Chandan Rajendra <chandan@linux.ibm.com>
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Chandan Rajendra <chandan@linux.ibm.com>
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
migrate_page_move_mapping() expects pages with private data set to have
a page_count elevated by 1. This is what used to happen for xfs through
the buffer_heads code before the switch to iomap in commit 82cb14175e
("xfs: add support for sub-pagesize writeback without buffer_heads").
Not having the count elevated causes move_pages() to fail on memory
mapped files coming from xfs.
Make iomap compatible with the migrate_page_move_mapping() assumption by
elevating the page count as part of iomap_page_create() and lowering it
in iomap_page_release().
It causes the move_pages() syscall to misbehave on memory mapped files
from xfs. It does not not move any pages, which I suppose is "just" a
perf issue, but it also ends up returning a positive number which is out
of spec for the syscall. Talking to Michal Hocko, it sounds like
returning positive numbers might be a necessary update to move_pages()
anyway though.
Fixes: 82cb14175e ("xfs: add support for sub-pagesize writeback without buffer_heads")
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
[hch: actually get/put the page iomap_migrate_page() to make it work
properly]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlxLV8IACgkQiiy9cAdy
T1HRIgv/fieDI6HBXAoK9mXzVLJXCZ1IxVZTygC3Xn/bCaxHerS+QSe88w0r1QTZ
KLCoT7pBBy06uTowX4/cAF0IYTStsYWngBsH3+HL4EEchkZAVe02wAJ8QOCXs9YU
UjoMvNLmcyrzwNt8EduNM2jUUpOL8EHiZOQKUvLt1Vdo3xBtyB3Nkttji92YLOQb
Cxs0RJdKNaalMphOQwAtbdbZwlL79KGSA0gMm9W2VD744YKMjQ3uGXaTuoe6+w19
voYKVTSelgyZg51yzF20x1Pl3OV/8L6tWgN1BIcKZbfGBshJhhcD93Pef7bmhxNS
Y2yF3SPr/zcdgEkwigVlNs1uBCBBeU04/yQr+YvJwNcjsRTcTKOMuquxq4gXM1P8
7f8gA1u0n1H70YKK29sd3sl7Iu8rHVacwlxjfoAcCjMe0VQvOJtR+V/zwP2uWr8T
IjX05unBe/B1v+NXqNdkOpE6ZaiJtt2ny5NDZhfeubOouBLsey7/g/gSx48ICrq+
U20kKsb6
=CCH2
-----END PGP SIGNATURE-----
Merge tag '5.0-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb3 fixes from Steve French:
"A set of small smb3 fixes, some fixing various crediting issues
discovered during xfstest runs, five for stable"
* tag '5.0-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: print CIFSMaxBufSize as part of /proc/fs/cifs/DebugData
smb3: add credits we receive from oplock/break PDUs
CIFS: Fix mounts if the client is low on credits
CIFS: Do not assume one credit for async responses
CIFS: Fix credit calculations in compound mid callback
CIFS: Fix credit calculation for encrypted reads with errors
CIFS: Fix credits calculations for reads with errors
CIFS: Do not reconnect TCP session in add_credits()
smb3: Cleanup license mess
CIFS: Fix possible hang during async MTU reads and writes
cifs: fix memory leak of an allocated cifs_ntsd structure
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlxLdgsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpoVGD/4sGYQqfiXogQIJYbPH2RRPrJuLIIITjiAv
lPXX1wx/tvz/ktwKJE2OiWTES0JjdH1HlC+7/0L/fLb8DXiKBmFuUHlwhureoL9Y
o//BIQKuaje35kyHITTy2UAJOOqnNtJTaAP2AfkL+eOcj/V/G5rJIfLGs9QtuAR7
sRJ+uhg1EbW/+uO0bULDmG6WUjxFu8mqcw3i6g0VVLVOnXB2EKcZTl3KPrdAXrUp
XtmouERga6OfAUSJyZmTSV136mL+opRB2WFFVeIzjQfLmyItDGbSX/YPS8oJ2pow
v7630F+CMrd4aKpqqtnAhfWpGqd0Xw7cYfZ9MKTJmZPmGzf9a1fQFpmgZosD4Dh3
7MrhboU4TUt9PdXESA7CmE7LkTp99ghfj5/ysKrSV5h3HsH2RbLxJk91Rx3vmAWD
u1xWRYL+GYLH6ZwOLvM1iqBrrLN3mUyrx98SaMgoXuqNzmQmgz9LPeA0Gt09FJbo
uj+ebg4dRwuThjni4xQhl3zL2RQy7nlTDFDdKOz/XoiYk2NUVksss+sxGjNarHj0
b5pCD4HOp57OreGExaOARpBRah5HSNdQtBRsIOnbyEq6f/e1LsIY23Z9nNF0deGO
sZzgsbnsn+zg8bC6T/Gk4UY6XdCcgaS3SL04SVKAE3lO6A4C/Awo8DgD9Bl1zpC1
HQlNkl5fBg==
=iucY
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20190125' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes for this release. This contains:
- Silence sparse rightfully complaining about non-static wbt
functions (Bart)
- Fixes for the zoned comments/ioctl documentation (Damien)
- direct-io fix that's been lingering for a while (Ernesto)
- cgroup writeback fix (Tejun)
- Set of NVMe patches for nvme-rdma/tcp (Sagi, Hannes, Raju)
- Block recursion tracking fix (Ming)
- Fix debugfs command flag naming for a few flags (Jianchao)"
* tag 'for-linus-20190125' of git://git.kernel.dk/linux-block:
block: Fix comment typo
uapi: fix ioctl documentation
blk-wbt: Declare local functions static
blk-mq: fix the cmd_flag_name array
nvme-multipath: drop optimization for static ANA group IDs
nvmet-rdma: fix null dereference under heavy load
nvme-rdma: rework queue maps handling
nvme-tcp: fix timeout handler
nvme-rdma: fix timeout handler
writeback: synchronize sync(2) against cgroup writeback membership switches
block: cover another queue enter recursion via BIO_QUEUE_ENTERED
direct-io: allow direct writes to empty inodes
debugfs_rename() needs to check that the dentries passed into it really
are valid, as sometimes they are not (i.e. if the return value of
another debugfs call is passed into this one.) So fix this up by
properly checking if the two parent directories are errors (they are
allowed to be NULL), and if the dentry to rename is not NULL or an
error.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CRYPTO_TFM_REQ_WEAK_KEY confuses newcomers to the crypto API because it
sounds like it is requesting a weak key. Actually, it is requesting
that weak keys be forbidden (for algorithms that have the notion of
"weak keys"; currently only DES and XTS do).
Also it is only one letter away from CRYPTO_TFM_RES_WEAK_KEY, with which
it can be easily confused. (This in fact happened in the UX500 driver,
though just in some debugging messages.)
Therefore, make the intent clear by renaming it to
CRYPTO_TFM_REQ_FORBID_WEAK_KEYS.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Was helpful in debug for some recent problems.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Otherwise we gradually leak credits leading to potential
hung session.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
If the server doesn't grant us at least 3 credits during the mount
we won't be able to complete it because query path info operation
requires 3 credits. Use the cached file handle if possible to allow
the mount to succeed.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
If we don't receive a response we can't assume that the server
granted one credit. Assume zero credits in such cases.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The current code doesn't do proper accounting for credits
in SMB1 case: it adds one credit per response only if we get
a complete response while it needs to return it unconditionally.
Fix this and also include malformed responses for SMB2+ into
accounting for credits because such responses have Credit
Granted field, thus nothing prevents to get a proper credit
value from them.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We do need to account for credits received in error responses
to read requests on encrypted sessions.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently we mark MID as malformed if we get an error from server
in a read response. This leads to not properly processing credits
in the readv callback. Fix this by marking such a response as
normal received response and process it appropriately.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When executing add_credits() we currently call cifs_reconnect()
if the number of credits is zero and there are no requests in
flight. In this case we may call cifs_reconnect() recursively
twice and cause memory corruption given the following sequence
of functions:
mid1.callback() -> add_credits() -> cifs_reconnect() ->
-> mid2.callback() -> add_credits() -> cifs_reconnect().
Fix this by avoiding to call cifs_reconnect() in add_credits()
and checking for zero credits in the demultiplex thread.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlxJ6LUACgkQnJ2qBz9k
QNn75Qf/U60RPju54Kdyepo4iMFoQXS/VJg87bpRetUdN9vKU3CbnyYAx43ftPOk
GH+UU2u+24DX2AWbzy4yx2CJP1YUUF1oa2QkSW2P9WWn1NYU3goWha1Il8bc96Kq
TQU/ypl83RoaFtw4JwpL27ot39UH5WWafYiackqzOYIuAdpgei2aaZFNvVeb9Ije
9udt7ygcTn94DVdYAFI6NeH5KAA+kzpIyWWRCWlmESsNYnCUQntSLm3tQG8sTHZt
sXk2Kp1r7XFl/zkg9iFSOhSIIyVizSC4cHrE0iJ/0kLgoJjfjbOa1jZ5Eeq8yTc3
UsXI5ICNS3RuZWwSWkh759JnhQ1R/A==
=cB63
-----END PGP SIGNATURE-----
Merge tag 'fsnotify_for_v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull inotify fix from Jan Kara:
"Fix a file refcount leak in an inotify error path"
* tag 'fsnotify_for_v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
inotify: Fix fd refcount leak in inotify_add_watch().
Precise and non-ambiguous license information is important. The recently
added aegis header file has a SPDX license identifier, which is nice, but
at the same time it has a contradictionary license boiler plate text.
SPDX-License-Identifier: GPL-2.0
versus
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
Oh well.
Assuming that the SPDX identifier is correct and according to x86/hyper-v
contributions from Microsoft GPL V2 only is the usual license.
Remove the boiler plate as it is wrong and even if correct it is redundant.
Fixes: eccb4422cf ("smb3: Add ftrace tracepoints for improved SMB3 debugging")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
When doing MTU i/o we need to leave some credits for
possible reopen requests and other operations happening
in parallel. Currently we leave 1 credit which is not
enough even for reopen only: we need at least 2 credits
if durable handle reconnect fails. Also there may be
other operations at the same time including compounding
ones which require 3 credits at a time each. Fix this
by leaving 8 credits which is big enough to cover most
scenarios.
Was able to reproduce this when server was configured
to give out fewer credits than usual.
The proper fix would be to reconnect a file handle first
and then obtain credits for an MTU request but this leads
to bigger code changes and should happen in other patches.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The call to SMB2_queary_acl can allocate memory to pntsd and also
return a failure via a call to SMB2_query_acl (and then query_info).
This occurs when query_info allocates the structure and then in
query_info the call to smb2_validate_and_copy_iov fails. Currently the
failure just returns without kfree'ing pntsd hence causing a memory
leak.
Currently, *data is allocated if it's not already pointing to a buffer,
so it needs to be kfree'd only if was allocated in query_info, so the
fix adds an allocated flag to track this. Also set *dlen to zero on
an error just to be safe since *data is kfree'd.
Also set errno to -ENOMEM if the allocation of *data fails.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Dan Carpener <dan.carpenter@oracle.com>
sync_inodes_sb() can race against cgwb (cgroup writeback) membership
switches and fail to writeback some inodes. For example, if an inode
switches to another wb while sync_inodes_sb() is in progress, the new
wb might not be visible to bdi_split_work_to_wbs() at all or the inode
might jump from a wb which hasn't issued writebacks yet to one which
already has.
This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb
switch path against sync_inodes_sb() so that sync_inodes_sb() is
guaranteed to see all the target wbs and inodes can't jump wbs to
escape syncing.
v2: Fixed misplaced rwsem init. Spotted by Jiufei.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jiufei Xue <xuejiufei@gmail.com>
Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
On a DIO_SKIP_HOLES filesystem, the ->get_block() method is currently
not allowed to create blocks for an empty inode. This confusion comes
from trying to bit shift a negative number, so check the size of the
inode first.
The problem is most visible for hfsplus, because the fallback to
buffered I/O doesn't happen and the write fails with EIO. This is in
part the fault of the module, because it gives a wrong return value on
->get_block(); that will be fixed in a separate patch.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
debugfs_use_file_start() and debugfs_use_file_finish() do not exist
since commit c9afbec270 ("debugfs: purge obsolete SRCU based removal
protection"); tweak debugfs_create_file_unsafe() comment.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Precise and non-ambiguous license information is important. The recently
added quota.c file has a SPDX license identifier, which is nice, but
at the same time it has a contradictionary license boiler plate text.
SPDX-License-Identifier: GPL-2.0
versus
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
Oh well.
As the other ceph related files are licensed under the GPL v2 only, it's
assumed that the SPDX id is correct and the boiler plate was randomly
copied into that patch.
Remove the boiler plate as it is wrong and even if correct it is redundant.
Fixes: fb18a57568 ("ceph: quota: add initial infrastructure to support cephfs quotas")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Luis Henriques <lhenriques@suse.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Sage Weil <sage@redhat.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Acked-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
snap realm and corresponding inode have pointers to each other.
The two pointer should get clear at the same time. Otherwise,
snap realm's pointer may reference freed inode.
Cc: stable@vger.kernel.org # 4.17+
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
- Fix console ramoops to show the previous boot logs (Sai Prakash Ranjan)
- Avoid allocation and leak of platform data
-----BEGIN PGP SIGNATURE-----
Comment: Kees Cook <kees@outflux.net>
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlxE/QYWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsVXD/94pJsNHogOle3XnZJst/1cV4+R
KTyQ6QQ1184lI5FPGJliV3veCuKnTY4Q7e5vCl2mzEvds+VRxu2yUm+LFxKVj04W
yiRNv8Yn3+3eKxNufqLa9ySbsSZyIpoCKQ/20nbC5cNM3r+Gq12GUnDBixtGdl9V
fB316emIaKR70Qvi9KlfwDTcjh+SWBR2u8L0YsmTaoJkjtaCJhekN3hfya6Wnjf5
1sj1fCDzteda/Ld0gXoImHPvt0UcHPJDp1+ZhY0ja64QIuoWwDxsisp4X4nS8iFD
oak9BWRDhRwSzPnyCu8BI3Hc4ayxcBYR/vmBS12LTkIJVdR1CaqfeDYvZexyX04n
TYZnmPeZ+2R3wzA9aCLjUDEAYNi403sLqIvR99jlrOoiMo+5Bf05TvBG8lND2dQg
A13854vg9ssxfiuEmvxJTg8SLtcFiqEtrzZhqaLVfuf4cqaN3o0Ed+A0gjZadwwD
TPxiMh4YEZuw+qszZwyzi7uECKOvJIJeLCL8CApPvihCNFy2vKF5wj58ZizXXmoT
Orq6DJ1ITx4DTkYIgtiTC9HKDuvNDZ/ncdb5QoqlpoTe01r8GQOzffZ+tD3xRTNc
8+yAdK0/bTjLVNbFX/Q4dwagh0I8EQSrdw8CmmQs0lsCvfyYTfHz8uPeBQ7rrzeN
2r1H7a/OKpL/duTkEw==
=m6sX
-----END PGP SIGNATURE-----
Merge tag 'pstore-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore fixes from Kees Cook:
- Fix console ramoops to show the previous boot logs (Sai Prakash
Ranjan)
- Avoid allocation and leak of platform data
* tag 'pstore-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Avoid allocation and leak of platform data
pstore/ram: Fix console ramoops to show the previous boot logs
Yue Hu noticed that when parsing device tree the allocated platform data
was never freed. Since it's not used beyond the function scope, this
switches to using a stack variable instead.
Reported-by: Yue Hu <huyue2@yulong.com>
Fixes: 35da60941e ("pstore/ram: add Device Tree bindings")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlxElHsACgkQxWXV+ddt
WDsF8Q/6A+l/Ku8D+xSmw+JGXGNroX1V62sxVYWqIgrkYNg6iSMjicqF3aN1bCbR
LqvOQ5skerrKteNYIPbTTOD5Xp37ccinSeEWEF0ktFkeU1G6yJo7aRsnQlO1sduk
2PKUOA1/PdeTBiOj1bej/1ybhtIW+d0MaoPtnUCMC8DD/ihfmU332+KC8VmRUYGZ
4kT1DvKfBOSVz1UTl6OJdWo76crvjz0eGVnH1YG7DoFbpVzAbphHE7+aC/WkHQme
X5Ux2NtYVMe/0IGAC7kJnj4jCZ1weAdlmvzzagjzGtWdhWgTxntiJ/FVJs9nO8Dm
G/pVtD8RVjgMkPOWcfw5fdderrBJqjVGgl4VDrDLqjO9OTGNFJs+HgcPRi6Oli28
sA+HG+U34YzSfKY0L9eAmpkNxMjWywBuXTQIAlMhHNZCL0vZ2K5EYa3dTp9OswAW
IIcOh/LfZxiomvMvUqQWcRCy5y/b+cYjOjbHwkrw+ewd3IWXVLG8YLMyZI3vnHKu
/f1xn6KCap9a1cS4LwyK6gzstEugn0MYmnmD/Jx8I1BJFBt55Q31ES6tPHgTmh/d
QjveRjMkxNCql4h5Hq0+LiXoSoocBmsO0wrs2QrWSx4PBnJsvjySnWr/8GfAOj79
BhnuQFxbr/BkNyBzvKrjoI+zZnrVm0cBU59lP6PzN75+kQTaIfs=
=hGlM
-----END PGP SIGNATURE-----
Merge tag 'for-5.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A handful of fixes (some of them in testing for a long time):
- fix some test failures regarding cleanup after transaction abort
- revert of a patch that could cause a deadlock
- delayed iput fixes, that can help in ENOSPC situation when there's
low space and a lot data to write"
* tag 'for-5.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: wakeup cleaner thread when adding delayed iput
btrfs: run delayed iputs before committing
btrfs: wait on ordered extents on abort cleanup
btrfs: handle delayed ref head accounting cleanup in abort
Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"