There is no need to allocate aio rings from HIGHMEM because of very
little memory needed here.
Therefore, use GFP_USER flag in find_or_create_page() and get rid of
kmap*() mappings.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ira Weiny <ira.weiny@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Message-Id: <20230609145937.17610-1-fmdefrancesco@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Delete duplicated word in comment.
Signed-off-by: Mao Zhu <zhumao001@208suo.com>
Message-Id: <20230611123314.5282-1-dengshaomin@cdjrlc.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
buffer_check_dirty_writeback is only used by the block device aops,
remove the export.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230608122958.276954-1-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
As each option string fragment is always prepended with a comma it would
happen that the whole string always starts with a comma. This could be
interpreted by filesystem drivers as an empty option and may produce
errors.
For example the NTFS driver from ntfs.ko behaves like this and fails
when mounted via the new API.
Link: https://github.com/util-linux/util-linux/issues/2298
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Fixes: 3e1aeb00e6 ("vfs: Implement a filesystem superblock creation/configuration context")
Cc: stable@vger.kernel.org
Message-Id: <20230607-fs-empty-option-v1-1-20c8dbf4671b@weissschuh.net>
Signed-off-by: Christian Brauner <brauner@kernel.org>
NULL the dangling pipe reference while clearing watch_queue.
If not done, a reference to a freed pipe remains in the watch_queue,
as this function is called before freeing a pipe in free_pipe_info()
(see line 834 of fs/pipe.c).
The sole use of wqueue->defunct is for checking if the watch queue has
been cleared, but wqueue->pipe is also NULLed while clearing.
Thus, wqueue->defunct is superfluous, as wqueue->pipe can be checked
for NULL. Hence, the former can be removed.
Tested with keyutils testsuite.
Cc: stable@vger.kernel.org # 6.1
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Acked-by: David Howells <dhowells@redhat.com>
Message-Id: <20230605143616.640517-1-code@siddh.me>
Signed-off-by: Christian Brauner <brauner@kernel.org>
In the syscall test of UnixBench, performance regression occurred due
to false sharing.
The lock and atomic members, including file::f_lock, file::f_count and
file::f_pos_lock are highly contended and frequently updated in the
high-concurrency test scenarios. perf c2c indentified one affected
read access, file::f_op.
To prevent false sharing, the layout of file struct is changed as
following
(A) f_lock, f_count and f_pos_lock are put together to share the same
cache line.
(B) The read mostly members, including f_path, f_inode, f_op are put
into a separate cache line.
(C) f_mode is put together with f_count, since they are used frequently
at the same time.
Due to '__randomize_layout' attribute of file struct, the updated layout
only can be effective when CONFIG_RANDSTRUCT_NONE is 'y'.
The optimization has been validated in the syscall test of UnixBench.
performance gain is 30~50%. Furthermore, to confirm the optimization
effectiveness on the other codes path, the results of fsdisk, fsbuffer
and fstime are also shown.
Here are the detailed test results of unixbench.
Command: numactl -C 3-18 ./Run -c 16 syscall fsbuffer fstime fsdisk
Without Patch
------------------------------------------------------------------------
File Copy 1024 bufsize 2000 maxblocks 875052.1 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 235484.0 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 2815153.5 KBps (30.0 s, 2 samples)
System Call Overhead 5772268.3 lps (10.0 s, 7 samples)
System Benchmarks Partial Index BASELINE RESULT INDEX
File Copy 1024 bufsize 2000 maxblocks 3960.0 875052.1 2209.7
File Copy 256 bufsize 500 maxblocks 1655.0 235484.0 1422.9
File Copy 4096 bufsize 8000 maxblocks 5800.0 2815153.5 4853.7
System Call Overhead 15000.0 5772268.3 3848.2
========
System Benchmarks Index Score (Partial Only) 2768.3
With Patch
------------------------------------------------------------------------
File Copy 1024 bufsize 2000 maxblocks 1009977.2 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 264765.9 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 3052236.0 KBps (30.0 s, 2 samples)
System Call Overhead 8237404.4 lps (10.0 s, 7 samples)
System Benchmarks Partial Index BASELINE RESULT INDEX
File Copy 1024 bufsize 2000 maxblocks 3960.0 1009977.2 2550.4
File Copy 256 bufsize 500 maxblocks 1655.0 264765.9 1599.8
File Copy 4096 bufsize 8000 maxblocks 5800.0 3052236.0 5262.5
System Call Overhead 15000.0 8237404.4 5491.6
========
System Benchmarks Index Score (Partial Only) 3295.3
Signed-off-by: chenzhiyin <zhiyin.chen@intel.com>
Message-Id: <20230601092400.27162-1-zhiyin.chen@intel.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
With commit 849ad04cf5 ("new helper: put_and_unmap_page()"), Al Viro
introduced the put_and_unmap_page() to use in those many places where we
have a common pattern consisting of calls to kunmap_local() +
put_page().
Obviously, first we unmap and then we put pages. Instead, the original
name of this helper seems to imply that we first put and then unmap.
Therefore, rename the helper and change the only known upstreamed user
(i.e., fs/sysv) before this helper enters common use and might become
difficult to find all call sites and instead easy to break the builds.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Message-Id: <20230602103307.5637-1-fmdefrancesco@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Set mode 0600 on files in the cache so that cachefilesd can run as an
unprivileged user rather than leaving the files all with 0. Directories
are already set to 0700.
Userspace then needs to set the uid and gid before issuing the "bind"
command and the cache must've been chown'd to those IDs.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
cc: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
cc: linux-erofs@lists.ozlabs.org
cc: linux-fsdevel@vger.kernel.org
Message-Id: <1853230.1684516880@warthog.procyon.org.uk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
The split_fs_names() function takes a names parameter, but the function
actually uses the root_fs_names global variable instead. This names
parameter is not used in the function, so it can be safely removed.
This change does not affect the functionality of split_fs_names() or any
other part of the kernel.
Signed-off-by: Yihuan Pan <xun794@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <4lsiigvaw4lxcs37rlhgepv77xyxym6krkqcpc3xfncnswok3y@b67z3b44orar>
Signed-off-by: Christian Brauner <brauner@kernel.org>
To avoid confusing the compiler about possible negative sizes, switch
"ssize" which can never be negative from int to u32. Seen with GCC 13:
../fs/jfs/namei.c: In function 'jfs_symlink': ../include/linux/fortify-string.h:57:33: warning: '__builtin_memcpy' pointer overflow between offset 0 and size [-2147483648, -1]
[-Warray-bounds=]
57 | #define __underlying_memcpy __builtin_memcpy
| ^
...
../fs/jfs/namei.c:950:17: note: in expansion of macro 'memcpy'
950 | memcpy(ip->i_link, name, ssize);
| ^~~~~~
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: jfs-discussion@lists.sourceforge.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jeff Xu <jeffxu@chromium.org>
Message-Id: <20230204183355.never.877-kees@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
The arch_report_meminfo() function is provided by four architectures,
with a __weak fallback in procfs itself. On architectures that don't
have a custom version, the __weak version causes a warning because
of the missing prototype.
Remove the architecture specific prototypes and instead add one
in linux/proc_fs.h.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for arch/x86
Acked-by: Helge Deller <deller@gmx.de> # parisc
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Message-Id: <20230516195834.551901-1-arnd@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
A couple of functions from fs/pipe.c are used both internally
and for the watch queue code, but the declaration is only
visible when the latter is enabled:
fs/pipe.c:1254:5: error: no previous prototype for 'pipe_resize_ring'
fs/pipe.c:758:15: error: no previous prototype for 'account_pipe_buffers'
fs/pipe.c:764:6: error: no previous prototype for 'too_many_pipe_buffers_soft'
fs/pipe.c:771:6: error: no previous prototype for 'too_many_pipe_buffers_hard'
fs/pipe.c:777:6: error: no previous prototype for 'pipe_is_unprivileged_user'
Make the visible unconditionally to avoid these warnings.
Fixes: c73be61ced ("pipe: Add general notification queue support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Message-Id: <20230516195629.551602-1-arnd@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
make W=1 warns about a missing prototype that is defined but
not visible at point where simple_dname() is defined:
fs/d_path.c:317:7: error: no previous prototype for 'simple_dname' [-Werror=missing-prototypes]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Message-Id: <20230516195444.551461-1-arnd@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
The motivation for this patch has been to enable using a stricter
apparmor profile to prevent programs from reading any coredump in the
system.
However, this became something else. The following details are based on
Christian's and Linus' archeology into the history of the number "2" in
the coredump handling code.
To make sure we're not accidently introducing some subtle behavioral
change into the coredump code we set out on a voyage into the depths of
history.git to figure out why this was O_RDWR in the first place.
Coredump handling was introduced over 30 years ago in commit
ddc733f452e0 ("[PATCH] Linux-0.97 (August 1, 1992)").
The original code used O_WRONLY:
open_namei("core",O_CREAT | O_WRONLY | O_TRUNC,0600,&inode,NULL)
However, this changed in 1993 and starting with commit
9cb9f18b5d26 ("[PATCH] Linux-0.99.10 (June 7, 1993)") the coredump code
suddenly used the constant "2":
open_namei("core",O_CREAT | 2 | O_TRUNC,0600,&inode,NULL)
This was curious as in the same commit the kernel switched from
constants to proper defines in other places such as KERNEL_DS and
USER_DS and O_RDWR did already exist.
So why was "2" used? It turns out that open_namei() - an early version
of what later turned into filp_open() - didn't accept O_RDWR.
A semantic quirk of the open() uapi is the definition of the O_RDONLY
flag. It would seem natural to define:
#define O_RDWR (O_RDONLY | O_WRONLY)
but that isn't possible because:
#define O_RDONLY 0
This makes O_RDONLY effectively meaningless when passed to the kernel.
In other words, there has never been a way - until O_PATH at least - to
open a file without any permission; O_RDONLY was always implied on the
uapi side while the kernel does in fact allow opening files without
permissions.
The trouble comes when trying to map the uapi flags onto the
corresponding file mode flags FMODE_{READ,WRITE}. This mapping still
happens today and is causing issues to this day (We ran into this
during additions for openat2() for example.).
So the special value "3" was used to indicate that the file was opened
for special access:
f->f_flags = flag = flags;
f->f_mode = (flag+1) & O_ACCMODE;
if (f->f_mode)
flag++;
This allowed the file mode to be set to FMODE_READ | FMODE_WRITE mapping
the O_{RDONLY,WRONLY,RDWR} flags into the FMODE_{READ,WRITE} flags. The
special access then required read-write permissions and 0 was used to
access symlinks.
But back when ddc733f452e0 ("[PATCH] Linux-0.97 (August 1, 1992)") added
coredump handling open_namei() took the FMODE_{READ,WRITE} flags as an
argument. So the coredump handling introduced in
ddc733f452e0 ("[PATCH] Linux-0.97 (August 1, 1992)") was buggy because
O_WRONLY shouldn't have been passed. Since O_WRONLY is 1 but
open_namei() took FMODE_{READ,WRITE} it was passed FMODE_READ on
accident.
So 9cb9f18b5d26 ("[PATCH] Linux-0.99.10 (June 7, 1993)") was a bugfix
for this and the 2 didn't really mean O_RDWR, it meant FMODE_WRITE which
was correct.
The clue is that FMODE_{READ,WRITE} didn't exist yet and thus a raw "2"
value was passed.
Fast forward 5 years when around 2.2.4pre4 (February 16, 1999) this code
was changed to:
- dentry = open_namei(corefile,O_CREAT | 2 | O_TRUNC | O_NOFOLLOW, 0600);
...
+ file = filp_open(corefile,O_CREAT | 2 | O_TRUNC | O_NOFOLLOW, 0600);
At this point the raw "2" should have become O_WRONLY again as
filp_open() didn't take FMODE_{READ,WRITE} but O_{RDONLY,WRONLY,RDWR}.
Another 17 years later, the code was changed again cementing the mistake
and making it almost impossible to detect when commit
378c6520e7 ("fs/coredump: prevent fsuid=0 dumps into user-controlled directories")
replaced the raw "2" with O_RDWR.
And now, here we are with this patch that sent us on a quest to answer
the big questions in life such as "Why are coredump files opened with
O_RDWR?" and "Is it safe to just use O_WRONLY?".
So with this commit we're reintroducing O_WRONLY again and bringing this
code back to its original state when it was first introduced in commit
ddc733f452e0 ("[PATCH] Linux-0.97 (August 1, 1992)") over 30 years ago.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230420120409.602576-1-vsementsov@yandex-team.ru>
[brauner@kernel.org: completely rewritten commit message]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Use __FMODE_NONOTIFY instead of FMODE_NONOTIFY to fixes
the following sparce warnings:
fs/overlayfs/file.c:48:37: sparse: warning: restricted fmode_t degrades to integer
fs/overlayfs/file.c:128:13: sparse: warning: restricted fmode_t degrades to integer
fs/open.c:1159:21: sparse: warning: restricted fmode_t degrades to integer
Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com>
Message-Id: <20230502232210.119063-1-minhuadotchen@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Use kcalloc() for allocation/flush of 128 pointers table to
reduce stack usage.
Function now returns -ENOMEM or 0 on success.
stackusage
Before:
./fs/jffs2/xattr.c:775 jffs2_build_xattr_subsystem 1208
dynamic,bounded
After:
./fs/jffs2/xattr.c:775 jffs2_build_xattr_subsystem 192
dynamic,bounded
Also update definition when CONFIG_JFFS2_FS_XATTR is not enabled
Tested with an MTD mount point and some user set/getfattr.
Many current target on OpenWRT also suffer from a compilation warning
(that become an error with CONFIG_WERROR) with the following output:
fs/jffs2/xattr.c: In function 'jffs2_build_xattr_subsystem':
fs/jffs2/xattr.c:887:1: error: the frame size of 1088 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
887 | }
| ^
Using dynamic allocation fix this compilation warning.
Fixes: c9f700f840 ("[JFFS2][XATTR] using 'delete marker' for xdatum/xref deletion")
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Ron Economos <re@w6rz.net>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org
Message-Id: <20230506045612.16616-1-ansuelsmth@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
fs/open.c: In functions 'setattr_vfsuid' and 'setattr_vfsgid':
warning: Function parameter or member 'attr' not described
- Fix warning by removing kernel-doc for these as they are static
inline functions and not required to be exposed via kernel-doc.
fs/open.c:
warning: Excess function parameter 'opened' description in 'finish_open'
warning: Excess function parameter 'cred' description in 'vfs_open'
- Fix by removing the parameters from the kernel-doc as they are no
longer required by the function.
Signed-off-by: Anuradha Weeraman <anuradha@debian.org>
Message-Id: <20230506182928.384105-1-anuradha@debian.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Message-Id: <20230510221119.3508930-1-azeemshaikh38@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Fix the following sparse warnings by using __poll_t instead
of unsigned type.
fs/eventpoll.c:541:9: sparse: warning: restricted __poll_t degrades to integer
fs/eventfd.c:67:17: sparse: warning: restricted __poll_t degrades to integer
Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com>
Message-Id: <20230511164628.336586-1-minhuadotchen@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
- Fix a compilation issue with DEFINE_STATIC_SRCU() in the unit tests
- Fix leaking kernel memory to a root-only sysfs attribute
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZGETgQAKCRDfioYZHlFs
Zy+FAQDDwPDprMrALvuWz3rYPROPH0h6X2zLYH5JFq29cqjO9wD/RVlrXFFkGaG+
3n7Uip2rZaW3OpC2TOaqBaDxTkXo0ww=
=yFDG
-----END PGP SIGNATURE-----
Merge tag 'cxl-fixes-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link fixes from Dan Williams:
- Fix a compilation issue with DEFINE_STATIC_SRCU() in the unit tests
- Fix leaking kernel memory to a root-only sysfs attribute
* tag 'cxl-fixes-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Add missing return to cdat read error path
tools/testing/cxl: Use DEFINE_STATIC_SRCU()
- Fix encoding of swp_entry due to added SWP_EXCLUSIVE flag
- Include reboot.h to avoid gcc-12 compiler warning
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZGEQcQAKCRD3ErUQojoP
X9UDAQCRpuIPVJcdOmb1iIfv0+IFShNHEOb6yn2Yl8F33s3UYAD+LZyuMXDID2zj
QkeZaQWEaya6/YwEPDGKb05YwEdu4gI=
=L8XB
-----END PGP SIGNATURE-----
Merge tag 'parisc-for-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
- Fix encoding of swp_entry due to added SWP_EXCLUSIVE flag
- Include reboot.h to avoid gcc-12 compiler warning
* tag 'parisc-for-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix encoding of swp_entry due to added SWP_EXCLUSIVE flag
parisc: kexec: include reboot.h
Fixes for v6.4-rc1:
- fix unwinder for uleb128 case
- fix kernel-doc warnings for HP Jornada 7xx
- fix unbalanced stack on vfp success path
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuNNh8scc2k/wOAE+9OeQG+StrGQFAmRg4MEACgkQ9OeQG+St
rGQpQQ//UukQgRa+w7wEi9mkqYfjm8bP+LT5EdXDYfSeijvUkZ57iazMeyzDA32D
AnrirhcxJr3qMs9Er9jaLqf+jQ9intL3KAL5c69GXx4hExcDhXgTngvAxFuf+IXh
4G52brjQbgdcwjyzkALikgpKunS5SeJ9VF7Mf9jMXhg0IpoLV1bOVosoUUBlqvMJ
XEBvb9DXIgFLSeMETjG9ELX4DjaJChK5dCtyMQJCRCPCSdSub5cjMVY1A5aqROcf
w5gtOAyHCJVDCvYtMwszr4HQcOf+MWDkPJ3Knlf4y1PkdH9W1QRk9L82ADGZlnsk
3CGsq+/5nE7WeFL29ct4FbA9mP2NZTKuVVhCGVlGdzNTPuDv3+Wu1BC9orNwKqit
x5ikUa6W4iDcEpCIkYeYt8MfxUW8eGYn/DhqN4a2uSBQPtVbyLfj1Nesjix8Mud+
tZIsQ47y3TF92t35fNgbHMxQNq/V7B6uWJpvDa8UoN57/pT+VzW69cv3RXle6UtT
R4O0xcSgrOKrckfYl4zhkaJur7iMyI8QYYDquIL+0UxJ19uKPqCFuiwsN1IF/2uu
ltQkZYjXQnQazcAZPtCyJrYYt8mB2Gg6zO3jIpHNcY2RbU6GHdhPlbjodfXOFe9x
ILR6W9vVtcqbJy8pDgp2H7u7KzoUrwyN5nfH4TfPVKO/WZ+MBwE=
=vp7E
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- fix unwinder for uleb128 case
- fix kernel-doc warnings for HP Jornada 7xx
- fix unbalanced stack on vfp success path
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9297/1: vfp: avoid unbalanced stack on 'success' return path
ARM: 9296/1: HP Jornada 7XX: fix kernel-doc warnings
ARM: 9295/1: unwind:fix unwind abort for uleb128 case
names land in traceevents output instead and thus the blocked function
can be identified
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRg0vIACgkQEsHwGGHe
VUqyhw//V27hy1LQObFH86sPFUf5DU5mcpF6ymZ1ww0rNNIGPJh9DUDG0krV5c8s
cmB2bJMh7SbPc0z8PTQ9Qmj7wzWzrO52OeITvzj4n3oPsHGFcoAyqNIxh5qZLWD2
2hFuuLpwuNv7nAISd275gWU2uUkhviYZMiaBaFpysM3jxQGuAsEx+lw1zIYmCkR8
hTL4m9k71S4UBvPmgas1C3s/JClzO3OKHSoiphtb872RdemO/alhfS2YHH+kkEUL
9v5fyH+1zznisOu7XbBhLK2e8Tgj6GT0v80hzG6ySRMHs1C+mg1ZyvvnUaSZ6hIr
FXGsOH9qtI5CT/vZspJUEl9Ew1SHjO5TQlb7A+sL1TZXRuwRP3pNsryZSO8kJkag
7yVmywWCO8pngxyD7tj2tLsO8b5tQ/0Cq9w43I21kTjxpKvdJ9dixBFQCGc9IJVB
C/wD7JiiALXcN5uDrn+l2TPRBdbzM1UAommbCE9ugfs/6h20EU0Tku4qfKQSuzyD
1wX6DtAr7u5tOP10+Chapj/+BGGSiAFaTc1uQLsnf13+AvXRnyMXSHNdOouGc19E
flkZrR4ap8x1iDp9OijtU56iUjKcJkp7kGeBptFEZNbtm+iks7s7aNCnz0uHERAy
KuHoxJ0lcsefOOp5qoKa+63wbS9ooM5ErStnETpXp4X+YWg+A0k=
=otoJ
-----END PGP SIGNATURE-----
Merge tag 'locking_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Borislav Petkov:
- Make sure __down_read_common() is always inlined so that the callers'
names land in traceevents output and thus the blocked function can be
identified
* tag 'locking_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Add __always_inline annotation to __down_read_common() and inlined callers
so that the correct record sizes are used
- Update the sample size for AMD BRS events
- Fix a confusion with using the same on-stack struct with different
events in the event processing path
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRgzWIACgkQEsHwGGHe
VUpdgw//a1toWyjwrIV1YMu8lEpsrPKpOqIFuDQcLSl1vsYrmTRJ47PI1j/ZTQeo
HgNkEE6lxAa9h/lKAjlE/lACE6Hr59xnQmu0BdG/SS+hlhWkT+oKLEUWz5qD4MuE
bWdpxwHOhMIFR1ASAMThy/mE9V4TKsI/tsd7lMXUo6/skDGCmCGIgRq//3NUB5fV
0ivp5lv6NXFnUwS34Ot3fbWj/be7rr2vkYgN8WbwMAaEbpCIyseh6Tz+5ZRbENfP
dMdh6ryuJ2BJ9BcDe9XlcEvPcaTvz7LVnzOVFz/AnBgtBTIOw/26xt17pgXBH7NK
kpTKQTPp0mnt6ysnX5zYkeumKaxxqvVWaf18AQHkupj1HwggjiEFPnKK9KfslSy4
1tcED/D3i5QLOx+A8lCtA4ACwGl0Cvwgvw98Gp9imLst/zmMKa4MK96BYCodirKJ
iDKN5aFA6c3pKJ4KTE7N6KKFzwhslTrehTHAJIL7BiVw3aMGin6514OnMELZBzam
/zud81OWAKywWWRSwg7wy+K8RGH0R6K5dhwFrrm2BMqAluMq+rX1pRY9pEsL6jDj
bCl45L52IsXZBSz2JTwWHGTssPyeDIe157ICFDOBnIx08u4KzJ+Knxsbaq2Jjs3R
9wm5H9yp/+q7//3XcEkdFjQwDVh2LJkY0QinH+6rPiAseBC9ukU=
=OCba
-----END PGP SIGNATURE-----
Merge tag 'perf_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Make sure the PEBS buffer is flushed before reprogramming the
hardware so that the correct record sizes are used
- Update the sample size for AMD BRS events
- Fix a confusion with using the same on-stack struct with different
events in the event processing path
* tag 'perf_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG
perf/x86: Fix missing sample size update on AMD BRS
perf/core: Fix perf_sample_data not properly initialized for different swevents in perf_tp_event()
amd_nb.c work for drivers which switch to them. Add a PCI device ID
to k10temp's table so that latter is loaded on such systems too
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRgx6IACgkQEsHwGGHe
VUpzJQ//WL//vtzAyQyEQQ0HP5cUNwtts79rEWpNyHJD0nFymbOrR7ho97HTi7pu
a+wc3p9gwTC626GJD9dFVZs9YPqXI/Q2BKndQ6vrct9eaVnxWiJ45B5yJYLCUppE
AHAChrDZ8ErXQjV1jeHtkIZY1mSa5Q7slfn8KFLzY5gIWA/3ds2pmIDK4r2wqQUR
xIn+kDaaprsu8vhffWtHKIj5n1zRkLwPNZZ/2rZbQ7NJHt0yqu4Ld/L6myC6cQVj
cX1Z1Wg++T3rZHxChx69w/jYuCY2EeM5JHXEbQqUG1tnfkpInZqvyjz1FUs9iYOY
NOv9hcod9jIRjGvMooKn7MDhVy3yCtyrR590tKVP4qj3Wb+wKKJtPECjMfCo6I0H
nVIiQPMZAbbo9UVLVA9UickSxKUgNCvhANUcnYWEQp+DM8wYxbV35PT8lomNz6Cj
3mKo/uqQPfGn3yUR1GIQagYSxHZJ1NTv50lPQOV0tZtfP2Gk/H4uT2w6ZFJKiJIV
KTiE5vFW9VhbVye/VcHWyJ3+mB5SgClaA+t87zGpgJiPjfVWo2bnB6JkwZodEWi9
V8c53Hq+NG78QnFYZR5WNDYQ96LYWfurhcuqol8Sr836Y9LaOCGEAIsPMvG7esB2
y/GHRWdhkjOf5uNHduGoOVzlH95qW9mBtdFacVpaM/eocb1Zghk=
=dn2p
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Borislav Petkov:
- Add the required PCI IDs so that the generic SMN accesses provided by
amd_nb.c work for drivers which switch to them. Add a PCI device ID
to k10temp's table so that latter is loaded on such systems too
* tag 'x86_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hwmon: (k10temp) Add PCI ID for family 19, model 78h
x86/amd_nb: Add PCI ID for family 19h model 78h
device is replaced while the system is already in oneshot mode
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRgwZoACgkQEsHwGGHe
VUr4+g//Z9TQC2iEJWZKQZlShoV2e3nhbyzNFDKGRvLcpSQOZvW6M9XMyV6QHSlF
A56x18DT2Oi1YoVxxSuSkmYDxp6j8hA8hQWw3xHKNO7z0MfrGHsqEQo8UUuyZayj
LbVondAC5NvHQWzuPC/g+E0AcDNGvkYrIT+hqAsC7STEvzz+1Y73ZWvlWQCPjgdR
SDkw5w0OHCvbE7bEE53By2SrnDt0x8C9OHy1sa8juR3vcYIVhG6Rn/SaCB1wLmUT
RrZD+JdBh+ZEAjVNqwFa3PX5UHaIpdHJ3mutDoiCbrRsjYFJLJGFXQ/war4XPa/g
OzG8j3XQ1gSmJN75oI5RrZD9LOp3/cRqtTlTGe4tXK4yUclfsXwRVdmmvvfBIcpl
fhFpw9Purl6NDc7ezhSG9Lz6+M2lEtb//5oBytcMeNzPYnJE/BjgnKcVLxirbKGW
VHisXVVs0u0rRhxEUuEUAzEPCZoU5vtTUtEx5XWXlKlllcx1hsuYL29bAc7+w5TL
PBl676r7LjYOX3QXz0SfPUHXad+XpHCS2Yn4enciNOZsVhaxDSxIZOrzsLMQlSIx
DPyNfZD5EdsZGExZLO8YMDlXK+NqIVFDyPwkUcqLyP4cEbYzqZc/eNJ1SK3MREpC
Xx8L3GVIo6Ow5M3MWol1SejRGp8Bj6dWIdPlXqLEoOou4do13ks=
=i0Bq
-----END PGP SIGNATURE-----
Merge tag 'timers_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Borislav Petkov:
- Prevent CPU state corruption when an active clockevent broadcast
device is replaced while the system is already in oneshot mode
* tag 'timers_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/broadcast: Make broadcast device replacement work correctly
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmRgCfAACgkQ8vlZVpUN
gaOaOgf5AbFUBsjb95Aq2Y6SKvlyO2xFd2OqJXu6+bGaJScQ8qeoW2byihN4vD/e
i5V5vivpk764k1uOUe9fq5BlkaTuvFJI8d81eEnJC3LW4s7r6Gv586dwbE5lr0Bq
cZKCVMYdgwz3admGtPXrN0CVgg+Y/wHb1ZmGtt2nAqZfNqYfpX0waDyGr6JebhkO
04VE8QQCvMkO6oOIR9ZfbJmVm5vrGqQVLW4T0hXVTj9r3gUu/61qAkt2XYAu5tKJ
ENIoMv2ix0asAgFSbcIzY6YnCzSY9hiV/K6Twtusf63r22T+r6+LXBqUe+8hMx4E
Vh8L+5wkeNkCXD8HwnHizPx5r0nLqw==
=ouFA
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Some ext4 bug fixes (mostly to address Syzbot reports)"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: bail out of ext4_xattr_ibody_get() fails for any reason
ext4: add bounds checking in get_max_inline_xattr_value_size()
ext4: add indication of ro vs r/w mounts in the mount message
ext4: fix deadlock when converting an inline directory in nojournal mode
ext4: improve error recovery code paths in __ext4_remount()
ext4: improve error handling from ext4_dirhash()
ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled
ext4: check iomap type only if ext4_iomap_begin() does not fail
ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum
ext4: fix data races when using cached status extents
ext4: avoid deadlock in fs reclaim with page writeback
ext4: fix invalid free tracking in ext4_xattr_move_to_block()
ext4: remove a BUG_ON in ext4_mb_release_group_pa()
ext4: allow ext4_get_group_info() to fail
ext4: fix lockdep warning when enabling MMP
ext4: fix WARNING in mb_find_extent
Single small fix for the UFS driver to fix a power management failure.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZGAbriYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishWqRAP4yeSUs
mx7L4bQElBS8Qzha34WWf538mZKMeWd7GzkBDgD/R0qEUrfl6u2SynrwFlRM7xTN
XI4O5a0YcXJ6VWW0bd0=
=BfhA
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"A single small fix for the UFS driver to fix a power management
failure"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Fix I/O hang that occurs when BKOPS fails in W-LUN suspend
Fix the __swp_offset() and __swp_entry() macros due to commit 6d239fc78c
("parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE") which introduced the
SWP_EXCLUSIVE flag by reusing the _PAGE_ACCESSED flag.
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 6d239fc78c ("parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE")
Cc: <stable@vger.kernel.org> # v6.3+
In ext4_update_inline_data(), if ext4_xattr_ibody_get() fails for any
reason, it's best if we just fail as opposed to stumbling on,
especially if the failure is EFSCORRUPTED.
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Normally the extended attributes in the inode body would have been
checked when the inode is first opened, but if someone is writing to
the block device while the file system is mounted, it's possible for
the inode table to get corrupted. Add bounds checking to avoid
reading beyond the end of allocated memory if this happens.
Reported-by: syzbot+1966db24521e5f6e23f7@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=1966db24521e5f6e23f7
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Whether the file system is mounted read-only or read/write is more
important than the quota mode, which we are already printing. Add the
ro vs r/w indication since this can be helpful in debugging problems
from the console log.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If there are failures while changing the mount options in
__ext4_remount(), we need to restore the old mount options.
This commit fixes two problem. The first is there is a chance that we
will free the old quota file names before a potential failure leading
to a use-after-free. The second problem addressed in this commit is
if there is a failed read/write to read-only transition, if the quota
has already been suspended, we need to renable quota handling.
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-2-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When a file system currently mounted read/only is remounted
read/write, if we clear the SB_RDONLY flag too early, before the quota
is initialized, and there is another process/thread constantly
attempting to create a directory, it's possible to trigger the
WARN_ON_ONCE(dquot_initialize_needed(inode));
in ext4_xattr_block_set(), with the following stack trace:
WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680
RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141
Call Trace:
ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458
ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44
security_inode_init_security+0x2df/0x3f0 security/security.c:1147
__ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324
ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992
vfs_mkdir+0x29d/0x450 fs/namei.c:4038
do_mkdirat+0x264/0x520 fs/namei.c:4061
__do_sys_mkdirat fs/namei.c:4076 [inline]
__se_sys_mkdirat fs/namei.c:4074 [inline]
__x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu
Reported-by: syzbot+6385d7d3065524c5ca6d@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c34c
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ext4 has a filesystem wide lock protecting ext4_writepages() calls to
avoid races with switching of journalled data flag or inode format. This
lock can however cause a deadlock like:
CPU0 CPU1
ext4_writepages()
percpu_down_read(sbi->s_writepages_rwsem);
ext4_change_inode_journal_flag()
percpu_down_write(sbi->s_writepages_rwsem);
- blocks, all readers block from now on
ext4_do_writepages()
ext4_init_io_end()
kmem_cache_zalloc(io_end_cachep, GFP_KERNEL)
fs_reclaim frees dentry...
dentry_unlink_inode()
iput() - last ref =>
iput_final() - inode dirty =>
write_inode_now()...
ext4_writepages() tries to acquire sbi->s_writepages_rwsem
and blocks forever
Make sure we cannot recurse into filesystem reclaim from writeback code
to avoid the deadlock.
Reported-by: syzbot+6898da502aef574c5f8a@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/0000000000004c66b405fa108e27@google.com
Fixes: c8585c6fca ("ext4: fix races between changing inode journal mode and ext4_writepages")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230504124723.20205-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Previously, ext4_get_group_info() would treat an invalid group number
as BUG(), since in theory it should never happen. However, if a
malicious attaker (or fuzzer) modifies the superblock via the block
device while it is the file system is mounted, it is possible for
s_first_data_block to get set to a very large number. In that case,
when calculating the block group of some block number (such as the
starting block of a preallocation region), could result in an
underflow and very large block group number. Then the BUG_ON check in
ext4_get_group_info() would fire, resutling in a denial of service
attack that can be triggered by root or someone with write access to
the block device.
For a quality of implementation perspective, it's best that even if
the system administrator does something that they shouldn't, that it
will not trigger a BUG. So instead of BUG'ing, ext4_get_group_info()
will call ext4_error and return NULL. We also add fallback code in
all of the callers of ext4_get_group_info() that it might NULL.
Also, since ext4_get_group_info() was already borderline to be an
inline function, un-inline it. The results in a next reduction of the
compiled text size of ext4 by roughly 2k.
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230430154311.579720-2-tytso@mit.edu
Reported-by: syzbot+e2efa3efc15a1c9e95c3@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=69b28112e098b070f639efb356393af3ffec4220
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmRfkqYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgptznD/9PB1uPQEZ6hDaCg31XBmbe4m6sRcAEaNr0
RO18W8A8TYgxxpaQubPm53+sMFoaSc3oH7Ingu1iBfZdNY/5sY2hfgQW+M9NGoXa
uSeibVThihngeLVrWIks/w9k2UL+80Xs8THH/b+AQISK3gVqrPkpFcAybIGSPOZS
LiB3MbEbvGyxfKe64pRBPOZ3B6nedKHqpQ24g6XR9tx0dwtOUWWn6Len+4yPE1j7
Isd9OVMZEvw1orIVNe5y+FzJfoezGAMrHxTaQ+sGZj5JjaFzrFFfn9BeD9kMAxFd
8OoWRAUQWGD3yrxQ8Dd9jEdGfvNQHKmiFb9nygPPBUmgNmGct6O9lWyUWCKV121+
7m6ThdZLLYS9G555BZjI/4ubHtl/Y4upsHI5Ixbnmgvrp3tFyaX1ids5VNJPgKaH
lrrlQ307tIMDp1q2xcunE3uBE4/eBewnQ74S0rvtpvBM3UV6iDBLEkBQ/+5WPw0q
qAgQwDkTr+uzudkOUhJDa5VWzNY521ALLmc94d+4bOawiY9YYjuxV0ORVa6s4V2a
Ez6Gj9O06h3Ndg0UbDvUOE4YF4fmBezbLdbS10gcRCR9L/VZMG44jfHualgrfXmN
efZrK5YUHaSkwllsN2OSkYaH0/iG63yEHOc9sL1lfRQfUEVOWSUqkkrsPwSlG2E8
FC4b6BEpQA==
=8fWJ
-----END PGP SIGNATURE-----
Merge tag 'block-6.4-2023-05-13' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Just a few minor fixes for drivers, and a deletion of a file that is
woefully out-of-date these days"
* tag 'block-6.4-2023-05-13' of git://git.kernel.dk/linux:
Documentation/block: drop the request.rst file
ublk: fix command op code check
block/rnbd: replace REQ_OP_FLUSH with REQ_OP_WRITE
nbd: Fix debugfs_create_dir error checking
Add a return to the error path when cxl_cdat_read_table() fails. Current
code continues with the table pointer points to freed memory.
Fixes: 7a877c9239 ("cxl/pci: Simplify CDAT retrieval error path")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/168382793506.3510737.4792518576623749076.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>