Unused Value that colin.king@canonical.com sent me and a
related fix I added.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdLflkAAoJEM9EDqnrzg2+XEIQAImX96e5A/gDSAFhL23cIW73
2mAeCjwvpHk9hmLESpek3NfKH9fBkKS07bv0ICr1P4vSW2KIecbyc6Megern8w6H
WuY0WSELCDy207zqHf1BeGwEMC8Z/MadVSP/2wed7E0+AppcsPeGhe5yFlD8MKYq
sFBoZ3tQ88iujNqlORlPy7bWJzsx3a9MYznDlbeIVY7gsgnFrgyHCoY+cpZTG9fr
ZEQmyw2B7NWwzu438i3Qk+MDiztq4ldwhsO8HFt3+qTBtfUzmb7upWFMM1AUWQxd
3XIYZiCXgPmfIsQQJE5HPeE10zrMNVRYu6CjeUZPl11iy+/OKcY/UVQCwNxbQFAc
ZBFk6OIj9wZsMWAgnlp6zUUJ/s//OudnuSbbHlj3VDVEa+aZSJawgoIsauCyyZc/
5/sDpKcjb5Z4O5Sd+BkT8Zg0iAszBx7aJSqj/ggPvqxt7UQ9dF97AgYdZYfav/br
xL2O9KNfWGHj+qUV8qmQXVMJZWPrHZjQMPqA4Oi5CxmW+eF1kxr2mw3bU7suHXKm
CA7fRECsiyo2ULIl3wT12cOrpuFRQegkUW6AuBq5Z2JNw+H15JCbPZVMllfj7xye
TAxXXp9atcyRpCOrtB3dRO0itnGDrWx4WAxaKR127IeqKAiP2ZsPceu0mILamPno
ekJLcwC8kwbUDrNHR170
=PJuy
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.3-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Two small fixes.
This is just a fix for an unused value that Colin King sent me and a
related fix I added"
* tag 'for-linus-5.3-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: eliminate needless variable assignments
orangefs: remove redundant assignment to variable buffer_index
- Standardize parameter checking for the SETFLAGS and FSSETXATTR ioctls
(which were the file attribute setters for ext4 and xfs and have now
been hoisted to the vfs)
- Only allow the DAX flag to be set on files and directories.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl0aJgMACgkQ+H93GTRK
tOuKkg//SJaxcB63uVPZk9hDraYTmyo9OXRRX6X9WwDKPTWwa88CUwS1ny1QF7Mt
zMkgzG2/y2Rs9PQ0ARoPbh1hNb2CXnvA+xnzUEev1MW6UN/nTFMZEOPn2ZQ+DxQE
gg/0U56kKgtjtXzBZVpTgHzSETivdXwHxFW3hiTtyRXg+4ulgDIZLOjN2wRB+Pdb
X8ZmM6MqKOTbhQEXlw13TlCKBzoMjC1w4UU4rkZPjoSjAaUWiPfrk/XU7qgguf9p
v1dbSN2dADQ19jzZ1dmggXnlJsRMZjk/ls5rxJlB5DHDbh6YgnA2TE+tYrtH28eB
uyKfD+RQnMzRVdmH8PsMQRQQFXR2UYyprVP7a6wi6TkB+gytn7sR5uT4sbAhmhcF
TiTYfYNRXzemHCewyOwOsUE/7oCeiJcdbqiPAHHD/jYLZfRjSXDcGzz3+7ZYZ3GO
hRxUhpxHPbkmK4T2OxhzReCbRsLN/0BeEcDdLkNWmi2FTh3V1gYzMGkgI9wsVbsd
pHjoGIHbMPWqktF/obuGq96WVfYBBaWJ6WNzQqKT4dQYAJBW2omxitXQHLpi6cjt
hG5ncxa3cPpWx4t3Lx2hb0TPS7RyYvuoQIcS/Me2RWioxrwWrgnOqdHFfLEwWpfN
jRowdWiGgOIsq8hMt7qycmGCXzbgsbaA/7oRqh8TiwM9taPOM4c=
=uH2E
-----END PGP SIGNATURE-----
Merge tag 'vfs-fix-ioctl-checking-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull common SETFLAGS/FSSETXATTR parameter checking from Darrick Wong:
"Here's a patch series that sets up common parameter checking functions
for the FS_IOC_SETFLAGS and FS_IOC_FSSETXATTR ioctl implementations.
The goal here is to reduce the amount of behaviorial variance between
the filesystems where those ioctls originated (ext2 and XFS,
respectively) and everybody else.
- Standardize parameter checking for the SETFLAGS and FSSETXATTR
ioctls (which were the file attribute setters for ext4 and xfs and
have now been hoisted to the vfs)
- Only allow the DAX flag to be set on files and directories"
* tag 'vfs-fix-ioctl-checking-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
vfs: only allow FSSETXATTR to set DAX flag on files and dirs
vfs: teach vfs_ioc_fssetxattr_check to check extent size hints
vfs: teach vfs_ioc_fssetxattr_check to check project id info
vfs: create a generic checking function for FS_IOC_FSSETXATTR
vfs: create a generic checking and prep function for FS_IOC_SETFLAGS
The variable buffer_index is being initialized however this is never
read and later it is being reassigned to a new value. The initialization
is redundant and hence can be removed.
Addresses-Coverity: ("Unused Value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Stephen writes:
After merging the driver-core tree, today's linux-next build (x86_64
allmodconfig) produced this warning:
fs/orangefs/orangefs-debugfs.c: In function 'orangefs_debugfs_init':
fs/orangefs/orangefs-debugfs.c:193:1: warning: label 'out' defined but not used [-Wunused-label]
out:
^~~
fs/orangefs/orangefs-debugfs.c: In function 'orangefs_kernel_debug_init':
fs/orangefs/orangefs-debugfs.c:204:17: warning: unused variable 'ret' [-Wunused-variable]
struct dentry *ret;
^~~
Fix this up and change the return type of the function to void as it can
not fail, which cleans up some more code and variables as well.
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Martin Brandenburg <martin@omnibond.com>
Cc: devel@lists.orangefs.org
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: f095adba36 ("orangefs: no need to check return value of debugfs_create functions")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Martin Brandenburg <martin@omnibond.com>
Cc: devel@lists.orangefs.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20190612152204.GA17511@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Create a generic function to check incoming FS_IOC_SETFLAGS flag values
and later prepare the inode for updates so that we can standardize the
implementations that follow ext4's flag values.
Note that the efivarfs implementation no longer fails a no-op SETFLAGS
without CAP_LINUX_IMMUTABLE since that's the behavior in ext*.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Sterba <dsterba@suse.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have MODULE_LICENCE("GPL*") inside which was used in the initial
scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.
This patch does not change any functionality. New functionality will
follow in subsequent patches.
Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.
NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted. This breaks the current
GUP call site convention of having the returned pages be the final
parameter. So the suggestion was rejected.
Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marshall <hubcap@omnibond.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
the pagecache" patch series which greatly improves our small IO
performance and helps us pass more xfstests than before.
fix: orangefs: truncate before updating size
Pagecache series: all the rest
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJc1CucAAoJEM9EDqnrzg2+2h4P/i2ktG4Zi3VDwCr1xea0bWSf
kidYxzVPmGAmfC+UNMn/+X4v/R6rtHcpW/g7/0WLqhLUhRwjB1gzT6U9mHfu8BQV
1I7tg6lQqcTi4gNkBMWI4MHKRyoQ5bAgY7jFxLjqx1zj5+qxrtPn4GUCUbDBDPpP
Ip9RD/izEKeG77gdSUkCD72ojHl9lqmT1Qd3k3asmozzdWzd8mdXy8EQplECo7tZ
1tLzV+BSER2BVQfmws2CCG00DLFzQRJkDYxm7I5l1xoQ/Ft8n6k/tXZ1QzkPREg2
FhaoY5eCA/2MkB0VhUKj1Y+gXNZcbys11oxcqyZZ6p2kOu+1bY+evkzwDUzsIAg+
nl1d3LoeEVKoxpa4O+vCHLF++HX8TuVHSaxvNaZPMm86gxowCh0aberyWcscGbQp
m2M28WdjlzYC7uPSakoKVvF+ZBNKQP5sn6jbjYSOHAB9nFz6x5ILH3SMK8MXoZZe
2Ejd4NBmtqekdLp2MNlAWHF1O9sTagH+OqUvbNRNjrQXM3hNSgeguGx1b0E+KlT2
lwKORIVwZNZ/xuizdwM01kWRQObAmGrptlVbR4+HKqo99g9gGv6sdw//cHzo9/2S
GV34cpS5VRSwHVwuzADD1A/qQ+k5xomihrpZwcqh+YOwQV16JnTHiSeTYZKVIEW8
cBiZ2k5dbKm1QcLc9tJ3
=uBS+
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.2-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"This includes one fix and our "Orangefs through the pagecache" patch
series which greatly improves our small IO performance and helps us
pass more xfstests than before.
Fix:
- orangefs: truncate before updating size
Pagecache series:
- all the rest"
* tag 'for-linus-5.2-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (23 commits)
orangefs: truncate before updating size
orangefs: copy Orangefs-sized blocks into the pagecache if possible.
orangefs: pass slot index back to readpage.
orangefs: remember count when reading.
orangefs: add orangefs_revalidate_mapping
orangefs: implement writepages
orangefs: write range tracking
orangefs: avoid fsync service operation on flush
orangefs: skip inode writeout if nothing to write
orangefs: move do_readv_writev to direct_IO
orangefs: do not return successful read when the client-core disappeared
orangefs: implement writepage
orangefs: migrate to generic_file_read_iter
orangefs: service ops done for writeback are not killable
orangefs: remove orangefs_readpages
orangefs: reorganize setattr functions to track attribute changes
orangefs: let setattr write to cached inode
orangefs: set up and use backing_dev_info
orangefs: hold i_lock during inode_getattr
orangefs: update attributes rather than relying on server
...
Otherwise we race with orangefs_writepage/orangefs_writepages
which and does not expect i_size < page_offset.
Fixes xfstests generic/129.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
->readpage looks in file->private_data to try and find out how the
userspace program set "count" in read(2) or with "dd bs=" or whatever.
->readpage uses "count" and inode->i_size to calculate how much
data Orangefs should deposit in the Orangefs shared buffer, and
remembers which slot the data is in.
After copying data from the Orangefs shared buffer slot into
"the page", readpage tries to increment through the pagecache index
and fill as many pages as it can from the extra data in the shared
buffer. Hopefully these extra pages will soon be needed by the vfs,
and they'll be in the pagecache already.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
When userspace deposits more than a page of data into the shared buffer,
we'll need to know which slot it is in when we get back to readpage
so that we can try to use the extra data to fill some extra pages.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Orangefs wins when it can do IO on large (up to four meg) blocks at a time,
and looses when it has to do tiny "small io" reads and writes. Accessing
Orangefs through the pagecache with the kernel module helps with small io,
both reading and writing, a great deal. Readpage generally tries to fetch a
page (four k) at a time. We'll let users use "count" (as in read(2) or
pread(2) for example) as a knob to control how much data they get from
Orangefs at a time and we'll try to use the data to fill extra
pagecache pages when we get to ->readpage, hopefully resulting in
fewer calls to readpage and Orangefs userspace.
We need a way to remember how they set count so that we can still have
it available when we get to ->readpage.
- We'll use file->private_data to keep track of "count".
We'll wrap generic_file_open with orangefs_file_open and
initialize private_data to NULL there.
- In ->read_iter we have access to both "count" and file, so
we'll kmalloc some space onto file->private_data and store
"count" there.
- We'll kfree file->private_data each time we visit ->flush and
reinitialize it to NULL.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
This is modeled after NFS, except our method is different. We use a
simple timer to determine whether to invalidate the page cache. This
is bound to perform.
This addes a sysfs parameter cache_timeout_msecs which controls the time
between page cache invalidations.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Go through pages and look for a consecutive writable region. After
finding a number of consecutive writable pages or when finding that
the next page's dirty range is not contiguous and cannot be written
as one request, send the write to the server.
The number of pages is determined by the client-core's buffer size.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Attach the actual range of bytes written to plus the responsible uid/gid
to each dirty page. This information must be sent to the server when
the page is written out.
Now write_begin, page_mkwrite, and invalidatepage keep up with this
information. There are several conditions where they must write out the
page immediately to store the new range. Two non-contiguous ranges
cannot be stored on a single page.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Without this, an fsync call is sent to the server even if no data
changed. This resulted in a rather severe (50%) performance regression
under certain metadata-heavy workloads.
In the past, everything was direct IO. Nothing happend on a close call.
An explicit fsync call would send an fsync request to the server which
in turn fsynced the underlying file.
Now there are cached writes. Then fsync began writing out dirty pages
in addition to making an fsync request to the server, and close began
calling fsync.
With this commit, close only writes out dirty pages, and does not make
the fsync request.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Would happen if an inode is dirty but whatever happened is not something
that can be written out to OrangeFS.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
direct_IO was the only caller and all direct_IO did was call it,
so there's no use in having the code spread out into so many functions.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Now orangefs_inode_getattr fills from cache if an inode has dirty pages.
also if attr_valid and dirty pages and !flags, we spin on inode writeback
before returning if pages still dirty after: should it be other way
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Remove orangefs_inode_read. It was used by readpage. Calling
wait_for_direct_io directly serves the purpose just as well. There is
now no check of the bufmap size in the readpage path. There are already
other places the bufmap size is assumed to be greater than PAGE_SIZE.
Important to call truncate_inode_pages now in the write path so a
subsequent read sees the new data.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
It's a copy of the loop which would run in read_pages from
mm/readahead.c.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
OrangeFS accepts a mask indicating which attributes were changed. The
kernel must not set any bits except those that were actually changed.
The kernel must set the uid/gid of the request to the actual uid/gid
responsible for the change.
Code path for notify_change initiated setattrs is
orangefs_setattr(dentry, iattr)
-> __orangefs_setattr(inode, iattr)
In kernel changes are initiated by calling __orangefs_setattr.
Code path for writeback is
orangefs_write_inode
-> orangefs_inode_setattr
attr_valid and attr_uid and attr_gid change together under i_lock.
I_DIRTY changes separately.
__orangefs_setattr
lock
if needs to be cleaned first, unlock and retry
set attr_valid
copy data in
unlock
mark_inode_dirty
orangefs_inode_setattr
lock
copy attributes out
unlock
clear getattr_time
# __writeback_single_inode clears dirty
orangefs_inode_getattr
# possible to get here with attr_valid set and not dirty
lock
if getattr_time ok or attr_valid set, unlock and return
unlock
do server operation
# another thread may getattr or setattr, so check for that
lock
if getattr_time ok or attr_valid, unlock and return
else, copy in
update getattr_time
unlock
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
This is a fairly big change, but ultimately it's not a lot of code.
Implement write_inode and then avoid the call to orangefs_inode_setattr
within orangefs_setattr.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
This should be a no-op now. When inode writeback works, this will
prevent a getattr from overwriting inode data while an inode is
transitioning to dirty.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
This should be a no-op now, but once inode writeback works, it'll be
necessary to have the correct attribute in the dirty inode.
Previously the attribute fetch timeout was marked invalid and the server
provided the updated attribute. When the inode is dirty, the server
cannot be consulted since it does not yet know the pending setattr.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
No need to store the received mask. It is either STATX_BASIC_STATS or
STATX_BASIC_STATS & ~STATX_SIZE. If STATX_SIZE is requested, the cache
is bypassed anyway, so the cached mask is unnecessary to decide whether
to do a real getattr.
This is a change. Previously a getattr would want size and use the
cached size. All of the in-kernel callers that wanted size did not want
a cached size. Now a getattr cannot use the cached size if it wants
size at all.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
When an inode is created, we fetch attributes from the server. There is
no need to turn around and invalidate them.
No need to initialize attributes after the getattr either. Either it'll
be exactly the same, or it'll be something else and wrong.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
This uses the same timeout as the getattr cache. This substantially
increases performance when writing files with smaller buffer sizes.
When writing, the size is (often) changed, which causes a call to
notify_change which calls security_inode_need_killpriv which needs a
getxattr. Caching it reduces traffic to the server.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Pull misc vfs updates from Al Viro:
"Assorted fixes (really no common topic here)"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: Make __vfs_write() static
vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
pipe: stop using ->can_merge
splice: don't merge into linked buffers
fs: move generic stat response attr handling to vfs_getattr_nosec
orangefs: don't reinitialize result_mask in ->getattr
fs/devpts: always delete dcache dentry-s in dput()
The caller already initializes it to the basic stats. Just
clear not supported default bits where needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Multiple filesystems open code lru_to_page(). Rectify this by moving
the macro from mm_inline (which is specific to lru stuff) to the more
generic mm.h header and start using the macro where appropriate.
No functional changes.
Link: http://lkml.kernel.org/r/20181129104810.23361-1-nborisov@suse.com
Link: https://lkml.kernel.org/r/20181129075301.29087-1-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Pankaj gupta <pagupta@redhat.com>
Acked-by: "Yan, Zheng" <zyan@redhat.com> [ceph]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is already done for us internally by the signal machinery.
[akpm@linux-foundation.org: fix fs/buffer.c]
Link: http://lkml.kernel.org/r/20181116002713.8474-7-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull AFS updates from Al Viro:
"AFS series, with some iov_iter bits included"
* 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
missing bits of "iov_iter: Separate type from direction and use accessor functions"
afs: Probe multiple fileservers simultaneously
afs: Fix callback handling
afs: Eliminate the address pointer from the address list cursor
afs: Allow dumping of server cursor on operation failure
afs: Implement YFS support in the fs client
afs: Expand data structure fields to support YFS
afs: Get the target vnode in afs_rmdir() and get a callback on it
afs: Calc callback expiry in op reply delivery
afs: Fix FS.FetchStatus delivery from updating wrong vnode
afs: Implement the YFS cache manager service
afs: Remove callback details from afs_callback_break struct
afs: Commit the status on a new file/dir/symlink
afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
afs: Don't invoke the server to read data beyond EOF
afs: Add a couple of tracepoints to log I/O errors
afs: Handle EIO from delivery function
afs: Fix TTL on VL server and address lists
afs: Implement VL server rotation
afs: Improve FS server rotation error handling
...
In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.
Convert a bunch of places to use switch-statements to access them rather
then chains of bitwise-AND statements. This makes it easier to add further
iterator types. Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.
Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself. Only the direction is required.
Signed-off-by: David Howells <dhowells@redhat.com>
If a slab cache object is allocated, it needs to be freed eventually,
certainly before anyone unloads the module that allocated it.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Currently accessing various /sys/fs/orangefs files will spam the
kernel log with the following info message when the client is not
running:
[ 491.489284] sysfs_service_op_show: Client not running :-5:
Rate limit this info message to make it less spammy.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
default_acl and acl of newly created inode will be initiated
as ACL_NOT_CACHED in vfs function inode_init_always() and later
will be updated by calling xxx_init_acl() in specific filesystems.
Howerver, when default_acl and acl are NULL then they keep the value
of ACL_NOT_CACHED, this patch tries to cache NULL for acl/default_acl
in this case.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Pointer orangefs_inode is being assigned but is never used hence it is
redundant and can be removed.
Cleans up clang warning:
warning: variable 'orangefs_inode' set but not used [-Wunused-but-set-variable]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Use new return type vm_fault_t for fault handler. For now,
this is just documenting that the function returns a VM_FAULT
value rather than an errno. Once all instances are converted,
vm_fault_t will become a distinct type.
See the following
commit 1c8f422059 ("mm: change return type to vm_fault_t")
Fixed checkpatch.pl warning.
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
- can.rst: fix a footnote reference;
- crypto_engine.rst: Fix two parsing warnings;
- Fix a lot of broken references to Documentation/*;
- Improves the scripts/documentation-file-ref-check script,
in order to help detecting/fixing broken references,
preventing false-positives.
After this patch series, only 33 broken references to doc files are
detected by scripts/documentation-file-ref-check.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbJC2aAAoJEAhfPr2O5OEVPmMP/2rN5m9LZ048oRWlg4hCwo73
4FpWqDg18hbWCMHXYHIN1UACIMUkIUfgLhF7WE3D/XqRMuxHoiE5u7DUdak7+VNt
wunpksKFJbgyfFMHRvykHcZV+jQFVbM7eFvXVPIvoSaAeGH6zx4imHTyeDn3x/nL
gdtBqM4bvEhmBjotBTRR4PB8+oPrT/HIT5npHepx3UnFFFAzDQGEZ/I67/el2G5C
pVmYdBXvr7iqrvUs6FilHLTEfe1quCI4UaKNfLHKrxXrTkiJQFOwugYuobZfNmxT
GwjWzfpNy9HMlKJFYipcByALxel1Mnpqz5mIxFQaCTygBuEsORCWzW5MoKIsIUJ0
KOoG76v0rUyMvLBRvaoao3CHYHdzxhQbtVV9DjyDuDksa2G5IoCAF1t6DyIOitRw
9plMnGckk+FJ/MXJKYWXHszFS8NhI0SF2zHe3s1DmRTD8P6oxkxvxBFz6iqqADmL
W6XHd8CcqJItaS9ctPen91TFuysN1HFpdzLLY+xwWmmKOcWC/jFjhTm8pj7xLQHM
5yuuEcefsajf+Xk4w2fSQmRfXnuq+oOlPuWpwSvEy+59cHGI0ms18P1nHy/yt3II
CJywwdx6fjwDon57RFKH7kkGd7px317zMqWdIv9gUj/qZAy9gcdLdvEQLhx9u0aV
4F+hLKFDFEpf58xqRT1R
=/ozx
-----END PGP SIGNATURE-----
Merge tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental
Pull documentation fixes from Mauro Carvalho Chehab:
"This solves a series of broken links for files under Documentation,
and improves a script meant to detect such broken links (see
scripts/documentation-file-ref-check).
The changes on this series are:
- can.rst: fix a footnote reference;
- crypto_engine.rst: Fix two parsing warnings;
- Fix a lot of broken references to Documentation/*;
- improve the scripts/documentation-file-ref-check script, in order
to help detecting/fixing broken references, preventing
false-positives.
After this patch series, only 33 broken references to doc files are
detected by scripts/documentation-file-ref-check"
* tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits)
fix a series of Documentation/ broken file name references
Documentation: rstFlatTable.py: fix a broken reference
ABI: sysfs-devices-system-cpu: remove a broken reference
devicetree: fix a series of wrong file references
devicetree: fix name of pinctrl-bindings.txt
devicetree: fix some bindings file names
MAINTAINERS: fix location of DT npcm files
MAINTAINERS: fix location of some display DT bindings
kernel-parameters.txt: fix pointers to sound parameters
bindings: nvmem/zii: Fix location of nvmem.txt
docs: Fix more broken references
scripts/documentation-file-ref-check: check tools/*/Documentation
scripts/documentation-file-ref-check: get rid of false-positives
scripts/documentation-file-ref-check: hint: dash or underline
scripts/documentation-file-ref-check: add a fix logic for DT
scripts/documentation-file-ref-check: accept more wildcards at filenames
scripts/documentation-file-ref-check: fix help message
media: max2175: fix location of driver's companion documentation
media: v4l: fix broken video4linux docs locations
media: dvb: point to the location of the old README.dvb-usb file
...
Pull compat updates from Al Viro:
"Some biarch patches - getting rid of assorted (mis)uses of
compat_alloc_user_space().
Not much in that area this cycle..."
* 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
orangefs: simplify compat ioctl handling
signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart
As files move around, their previous links break. Fix the
references for them.
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>
This is a late set of changes from Deepa Dinamani doing an automated
treewide conversion of the inode and iattr structures from 'timespec'
to 'timespec64', to push the conversion from the VFS layer into the
individual file systems.
There were no conflicts between this and the contents of linux-next
until just before the merge window, when we saw multiple problems:
- A minor conflict with my own y2038 fixes, which I could address
by adding another patch on top here.
- One semantic conflict with late changes to the NFS tree. I addressed
this by merging Deepa's original branch on top of the changes that
now got merged into mainline and making sure the merge commit includes
the necessary changes as produced by coccinelle.
- A trivial conflict against the removal of staging/lustre.
- Multiple conflicts against the VFS changes in the overlayfs tree.
These are still part of linux-next, but apparently this is no longer
intended for 4.18 [1], so I am ignoring that part.
As Deepa writes:
The series aims to switch vfs timestamps to use struct timespec64.
Currently vfs uses struct timespec, which is not y2038 safe.
The series involves the following:
1. Add vfs helper functions for supporting struct timepec64 timestamps.
2. Cast prints of vfs timestamps to avoid warnings after the switch.
3. Simplify code using vfs timestamps so that the actual
replacement becomes easy.
4. Convert vfs timestamps to use struct timespec64 using a script.
This is a flag day patch.
Next steps:
1. Convert APIs that can handle timespec64, instead of converting
timestamps at the boundaries.
2. Update internal data structures to avoid timestamp conversions.
Thomas Gleixner adds:
I think there is no point to drag that out for the next merge window.
The whole thing needs to be done in one go for the core changes which
means that you're going to play that catchup game forever. Let's get
over with it towards the end of the merge window.
[1] https://www.spinics.net/lists/linux-fsdevel/msg128294.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbInZAAAoJEGCrR//JCVInReoQAIlVIIMt5ZX6wmaKbrjy9Itf
MfgbFihQ/djLnuSPVQ3nztcxF0d66BKHZ9puVjz6+mIHqfDvJTRwZs9nU+sOF/T1
g78fRkM1cxq6ZCkGYAbzyjyo5aC4PnSMP/NQLmwqvi0MXqqrbDoq5ZdP9DHJw39h
L9lD8FM/P7T29Fgp9tq/pT5l9X8VU8+s5KQG1uhB5hii4VL6pD6JyLElDita7rg+
Z7/V7jkxIGEUWF7vGaiR1QTFzEtpUA/exDf9cnsf51OGtK/LJfQ0oiZPPuq3oA/E
LSbt8YQQObc+dvfnGxwgxEg1k5WP5ekj/Wdibv/+rQKgGyLOTz6Q4xK6r8F2ahxs
nyZQBdXqHhJYyKr1H1reUH3mrSgQbE5U5R1i3My0xV2dSn+vtK5vgF21v2Ku3A1G
wJratdtF/kVBzSEQUhsYTw14Un+xhBLRWzcq0cELonqxaKvRQK9r92KHLIWNE7/v
c0TmhFbkZA+zR8HdsaL3iYf1+0W/eYy8PcvepyldKNeW2pVk3CyvdTfY2Z87G2XK
tIkK+BUWbG3drEGG3hxZ3757Ln3a9qWyC5ruD3mBVkuug/wekbI8PykYJS7Mx4s/
WNXl0dAL0Eeu1M8uEJejRAe1Q3eXoMWZbvCYZc+wAm92pATfHVcKwPOh8P7NHlfy
A3HkjIBrKW5AgQDxfgvm
=CZX2
-----END PGP SIGNATURE-----
Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground
Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
"This is a late set of changes from Deepa Dinamani doing an automated
treewide conversion of the inode and iattr structures from 'timespec'
to 'timespec64', to push the conversion from the VFS layer into the
individual file systems.
As Deepa writes:
'The series aims to switch vfs timestamps to use struct timespec64.
Currently vfs uses struct timespec, which is not y2038 safe.
The series involves the following:
1. Add vfs helper functions for supporting struct timepec64
timestamps.
2. Cast prints of vfs timestamps to avoid warnings after the switch.
3. Simplify code using vfs timestamps so that the actual replacement
becomes easy.
4. Convert vfs timestamps to use struct timespec64 using a script.
This is a flag day patch.
Next steps:
1. Convert APIs that can handle timespec64, instead of converting
timestamps at the boundaries.
2. Update internal data structures to avoid timestamp conversions'
Thomas Gleixner adds:
'I think there is no point to drag that out for the next merge
window. The whole thing needs to be done in one go for the core
changes which means that you're going to play that catchup game
forever. Let's get over with it towards the end of the merge window'"
* tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
pstore: Remove bogus format string definition
vfs: change inode times to use struct timespec64
pstore: Convert internal records to timespec64
udf: Simplify calls to udf_disk_stamp_to_time
fs: nfs: get rid of memcpys for inode times
ceph: make inode time prints to be long long
lustre: Use long long type to print inode time
fs: add timespec64_truncate()
Pull the timespec64 conversion from Deepa Dinamani:
"The series aims to switch vfs timestamps to use
struct timespec64. Currently vfs uses struct timespec,
which is not y2038 safe.
The flag patch applies cleanly. I've not seen the timestamps
update logic change often. The series applies cleanly on 4.17-rc6
and linux-next tip (top commit: next-20180517).
I'm not sure how to merge this kind of a series with a flag patch.
We are targeting 4.18 for this.
Let me know if you have other suggestions.
The series involves the following:
1. Add vfs helper functions for supporting struct timepec64 timestamps.
2. Cast prints of vfs timestamps to avoid warnings after the switch.
3. Simplify code using vfs timestamps so that the actual
replacement becomes easy.
4. Convert vfs timestamps to use struct timespec64 using a script.
This is a flag day patch.
I've tried to keep the conversions with the script simple, to
aid in the reviews. I've kept all the internal filesystem data
structures and function signatures the same.
Next steps:
1. Convert APIs that can handle timespec64, instead of converting
timestamps at the boundaries.
2. Update internal data structures to avoid timestamp conversions."
I've pulled it into a branch based on top of the NFS changes that
are now in mainline, so I could resolve the non-obvious conflict
between the two while merging.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
+ fix some sparse warnings
+ cleanup some code formatting
+ fix up some attribute/meta-data related code
-----BEGIN PGP SIGNATURE-----
iQIbBAABAgAGBQJbGUIEAAoJEM9EDqnrzg2+3ygP9RPTSd2Mh/vyzokxmBacT/iW
CrJXdrJkG+Yb/adGuT2UAPIMg73aGLqAPMN8dHhfhzotdd1BgrhMK+zZECUClwOf
dxQCA+RFn1YofXMojqWYAFtJ6ptJgMDJxje62hS+QCWzDz0IM1PYr2qCQPs1IRmb
CWVuP1igwt82VIVLey8YEuI5yuOnFOSO+7rKi7TPsY991oPxxz2brmrMuiW2Dg0W
6MCP19XUBpxdDJWtL2GWXAwgkP9JYkhokwkAloqWXoOpDUf0EniomuRMBLOtISoO
tJfgnYIN+7sRTSbqsNtKQrm7f8/B01t+tGuzJYDGlUls02V3DkS9EjypvP+VZ6e9
bsN3oQHD6xFx/7egxUVBRFD6Hw83j58DH1N+79whVrU3ospkj3z0QX1g6zqgY2ew
mgileXBAJQD+SUr0F42QqeFVexjky9dKtDZLQe42jxLZXpPly2t737h4ZatdW3ro
yKJLUzuHbLGQzgGJOSQsSfmwUkSeTilgCV6WFyJWinlxUWQEkp0F+1u5B4RX87BQ
SHUz5O3pWDpz+kLtauzsVlXDbBQfZHimIbnKWHZoQ5dER/LQfWBfvOz+CT/5uBBg
qhdPZ4wKLD4qvzByuZJKyYmPXpUoOWh/i5QcvJ8B8rNxCx8XEP/VoikjUqkCUajb
4HFQwnZtnQOLfnufTTw=
=vwvd
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.18-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Fixes and cleanups:
- fix some sparse warnings
- cleanup some code formatting
- fix up some attribute/meta-data related code"
* tag 'for-linus-4.18-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: use sparse annotations for holding locks across function calls.
orangefs: make debug_help_fops static
orangefs: remove unused function orangefs_get_bufmap_init
orangefs: specify user pointers when using dev_map_desc and bufmap
orangefs: formatting cleanups
orangefs: set i_size on new symlink
orangefs: report attributes_mask and attributes for statx
orangefs: make struct orangefs_file_vm_ops static
orangefs: revamp block sizes
get_bufmap_init is used in the out-of-tree module, but was left in
the upstream version as an oversight. Tip-of-the-hat to sparse and Al Viro.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
As long as a symlink inode remains in-core, the destination (and
therefore size) will not be re-fetched from the server, as it cannot
change. The original implementation of the attribute cache assumed that
setting the expiry time in the past was sufficient to cause a re-fetch
of all attributes on the next getattr. That does not work in this case.
The bug manifested itself as follows. When the command sequence
touch foo; ln -s foo bar; ls -l bar
is run, the output was
lrwxrwxrwx. 1 fedora fedora 4906 Apr 24 19:10 bar -> foo
However, after a re-mount, ls -l bar produces
lrwxrwxrwx. 1 fedora fedora 3 Apr 24 19:10 bar -> foo
After this commit, even before a re-mount, the output is
lrwxrwxrwx. 1 fedora fedora 3 Apr 24 19:10 bar -> foo
Reported-by: Becky Ligon <ligon@clemson.edu>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Fixes: 71680c18c8 ("orangefs: Cache getattr results.")
Cc: stable@vger.kernel.org
Cc: hubcap@omnibond.com
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
OrangeFS formerly failed to set attributes_mask with the result that
software could not see immutable and append flags present in the
filesystem.
Reported-by: Becky Ligon <ligon@clemson.edu>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Fixes: 68a24a6cc4 ("orangefs: implement statx")
Cc: stable@vger.kernel.org
Cc: hubcap@omnibond.com
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
The struct orangefs_file_vm_ops is local to the source and does not need
to be in global scope, so make it static.
Cleans up sparse warning:
fs/orangefs/file.c:547:35: warning: symbol 'orangefs_file_vm_ops' was not
declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Now the superblock block size is PAGE_SIZE. The inode block size is
PAGE_SIZE for directories and symlinks, but is the server-reported
block size for regular files.
The block size in the OrangeFS private inode is now deleted. Stat
now reports PAGE_SIZE for directories and symlinks and the
server-reported block size for regular files.
The user-space visible change is that the block size for directores
and symlinks and the superblock is now PAGE_SIZE rather than the size of
the client-core shared memory buffers, which was typically four
megabytes.
Reported-by: Becky Ligon <ligon@clemson.edu>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: hubcap@omnibond.com
Cc: walt@omnibond.com
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
For anything NFS-exported we do _not_ want to unlock new inode
before it has grown an alias; original set of fixes got the
ordering right, but missed the nasty complication in case of
lockdep being enabled - unlock_new_inode() does
lockdep_annotate_inode_mutex_key(inode)
which can only be done before anyone gets a chance to touch
->i_mutex. Unfortunately, flipping the order and doing
unlock_new_inode() before d_instantiate() opens a window when
mkdir can race with open-by-fhandle on a guessed fhandle, leading
to multiple aliases for a directory inode and all the breakage
that follows from that.
Correct solution: a new primitive (d_instantiate_new())
combining these two in the right order - lockdep annotate, then
d_instantiate(), then the rest of unlock_new_inode(). All
combinations of d_instantiate() with unlock_new_inode() should
be converted to that.
Cc: stable@kernel.org # 2.6.29 and later
Tested-by: Mike Marshall <hubcap@omnibond.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
orangefs_fill_sb() might've failed to allocate ORANGEFS_SB(s); don't
oops in that case.
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
+ Documentation cleanups
+ removal of unused code
+ cause some structs to be static
+ implement Orangefs vm_operations fault callout
+ eliminate two single-use functions and put their cleaned up code in line.
+ replace a vmalloc/memset instance with vzalloc
+ fix a race condition bug in wait code.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJayk8pAAoJEM9EDqnrzg2+92IP/2AqMOUodqF+AfKrAebQIAEk
DvQefRC2eQXN2Ak8KYSO7oMdqfY7xeVSaKllWsSHdnktI5L9l6AOBa7uYQYF4OQt
sqT5Wj5sbnDS54gOfFt95sUHEQXs5wegTXBE15t4uxaLSpv22mPjzqlTzyZcK7IX
nhSIxEi+utXOW1WodT2xLgS08ac693KXciNNN14NKi5DJ+h6a8Z2kZBq/6ssC1oV
nW2tIfRVWNVk6cfJ+Wq12JksEZOPemzjv2zojv3PZwCcwObOyxvf16nxnLjdkJvw
obYWmR8ZcF3TItBbkxOTfanr+UAhZ55YVYDpitJmV9fpPM9eoPAR0gNWBgq45kaS
s27pvMYf5uvGRJHgEO6hBpmi1dExDDMLfH6Y4IN+zyHKjNfgvfQKwIkzrjlmG6q6
AgO3mbbMJfid2x7lkShuP1uYqUfv03fo2xW3zxWjjxo3tEbADSgzJQ5kjPixDvcB
0ru2SlRVqgYIFqBN8VgWkXKbfcpSEcDITmFEkBKg/SkA08ry4iLk744UqIPa5n76
5DYB0u99nHPMf2BBVsz6L/mUfVuHBYUBx1iII95PqJhVlj2Yi0sIj3AHUzQ4neKS
YWGMk4VQ+B5zCObSohCRv9l3ECUqugpiJzCuZjZZS5LUkj9RO5EJ1LDhOU/31xDL
OUTgZl7f5NykxwIHh3bD
=zJ1Z
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Fixes and cleanups:
- Documentation cleanups
- removal of unused code
- make some structs static
- implement Orangefs vm_operations fault callout
- eliminate two single-use functions and put their cleaned up code in
line.
- replace a vmalloc/memset instance with vzalloc
- fix a race condition bug in wait code"
* tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
Orangefs: documentation updates
orangefs: document package install and xfstests procedure
orangefs: remove unused code
orangefs: make several *_operations structs static
orangefs: implement vm_ops->fault
orangefs: open code short single-use functions
orangefs: replace vmalloc and memset with vzalloc
orangefs: bug fix for a race condition when getting a slot
Must retrieve size before running filemap_fault so the kernel has
an up-to-date size.
This should have been caught by xfstests generic/246, but it was masked
by orangefs_new_inode, which set i_size to PAGE_SIZE. When nothing
caused a getattr prior to a pagefault, i_size was still PAGE_SIZE.
Since xfstests only read 10 bytes, it did not catch this bug.
When orangefs_new_inode was modified to perform a getattr instead,
i_size was set to zero, as it was a newly created file. Then
orangefs_file_write_iter did NOT set i_size. Instead it invalidated the
attribute cache, which should have caused the next caller to retrieve
i_size. But the fault handler did not know it was supposed to retrieve
i_size. So during xfstests, i_size was still zero, and filemap_fault
returned VM_FAULT_SIGBUS.
Fixes xfstests generic/452.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Use vzalloc instead of the vmalloc, memset combo
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
When a slot becomes free, call wake_up_locked regardless of the number
of slots available.
Without this patch, wake_up_locked is only called when going from no
free slots to one. This means that there is a chance a waiting task
will not be woken up. In many cases, the system will bounce between 0
and 1 free slots, and the waiting tasks will be woken up. But if there
is still a waiting task and another slot becomes available before the
number of free slots reaches zero, that waiting task may never be woken
up since the number of free slots may never reach zero again.
The bug behavior is easy to reproduce with the following script,
where /mnt/orangefs is an OrangeFS file system.
for i in {1..100}; do
for j in {1..20}; do
dd if=/dev/zero of=/mnt/orangefs/tmp$j bs=32768 count=32 &
done
wait
done
Signed-off-by: David Reynolds <david@omnibond.com>
Reviewed-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
This patch fixes spelling typos found in printk.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
with de-mangling cleanups yet to come.
NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.
The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1. don't pass garbage return codes back up the call chain (Mike Marshall)
2. fix stale inode test (Martin Brandenburg)
3. fix off-by-one errors (Xiongfeng Wang)
Also: add Martin as a reviewer in the Maintainers file.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaejneAAoJEM9EDqnrzg2+XhoQAIDF112mOwLwqDPmr4ty0g6/
gBcoHOrRFlYWPlS5aubjoZ3jFX2fAeNuHzYS4LIuqVKUdsC+oTKQ2URJ7KKpvLiK
6zOaz2Y4GLns2sa1ZUKli6nEBbPi6uwoF54FNbwt3b+97wpmJwlnXm9ztyt5REKA
zOHvLgJAcfGNZEJ7gyB1zjwllu4JeD0A4MoN4vJCtkKLAaNClywu4+V0jwZB+SSN
8QjDXNqkcD31ahWhQ/CaU4zXlxOOV+4ZR7/p5IKT693hEhV+ikTvmXy8g0+bksxj
L+FHmQMTO+GqCS5FxuBQd3v1IP5FkoHEmAwvr3C5aMlRAaVJ9eVVIZaC9CpOJBRB
S/CiaG2Mw8vx8VGOm8O93Z+xDi9tCYP8x4i7b5r62h0T9wSyHJSkSIUd6VIkCV9Q
c92bX/N3wHBvCPT+RC898plni5HsFpzs3vSs8hiaAICgp64sC8pIqVlZOAdMtJd8
RL4la/Fited/T+3BpaCTkmnvNk8Ktax7wHYsCt4gSyHN8WRvkzowgC5kV6S30Qlh
zfoXG0K50FcU8T5r3i8slvUHmsiyYxYwJIk/z1iDgXI7y4IIR6FGDxQmw5TxgNS7
+veTo6FCxon6QshtpAOeELCau7qNXhtlDdGqqm4+gDfMWoCn0Jem/LzdA2gPXCOr
iCDwHLiu6WXt7ZHTrgln
=xrih
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Mostly cleanups, but three bug fixes:
- don't pass garbage return codes back up the call chain (Mike
Marshall)
- fix stale inode test (Martin Brandenburg)
- fix off-by-one errors (Xiongfeng Wang)
Also add Martin as a reviewer in the Maintainers file"
* tag 'for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: reverse sense of is-inode-stale test in d_revalidate
orangefs: simplify orangefs_inode_is_stale
Orangefs: don't propogate whacky error codes
orangefs: use correct string length
orangefs: make orangefs_make_bad_inode static
orangefs: remove ORANGEFS_KERNEL_DEBUG
orangefs: remove gossip_ldebug and gossip_lerr
orangefs: make orangefs_client_debug_init static
MAINTAINERS: update orangefs list and add myself as reviewer
If a dentry is deleted, then a dentry is recreated with the same handle
but a different type (i.e. it was a file and now it's a symlink), then
its a different inode. The check was backwards, so d_revalidate would
not have noticed.
Due to the design of the OrangeFS server, this is rather unlikely.
It's also possible for the dentry to be deleted and recreated with the
same type. This would be undetectable. It's a bit of a ship of
Theseus.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Check whether this is a new inode at location of call.
Raises the question of what to do with an unknown inode type. Old code
would've marked the inode bad and returned ESTALE.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
When we get an error return code from userspace (the client-core)
we check to make sure it is a valid code.
This patch maps the whacky return code to -EINVAL instead of
propagating garbage back up the call chain potentially resulting
in a hard-to-find train-wreck.
The client-core doesn't have any business returning whacky return
codes, but if it does, we don't want the kernel to crash as a result.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
gcc-8 reports
fs/orangefs/dcache.c: In function 'orangefs_d_revalidate':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified
bound 256 equals destination size [-Wstringop-truncation]
fs/orangefs/namei.c: In function 'orangefs_rename':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified
bound 256 equals destination size [-Wstringop-truncation]
fs/orangefs/super.c: In function 'orangefs_mount':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified
bound 256 equals destination size [-Wstringop-truncation]
We need one less byte or call strlcpy() to make it a nul-terminated
string.
Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
It wasn't possible to enable it, and it would've had very little effect.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
gossip_ldebug is unused.
gossip_lerr is used in two places. The messages are unique so line
numbers are unnecessary.
Also remove support for compiling gossip messages out. It wasn't
possible to enable it anyway.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass all
hardened usercopy checks since these sizes cannot change at runtime.)
This new check is WARN-by-default, so any mistakes can be found over the
next several releases without breaking anyone's system.
The series has roughly the following sections:
- remove %p and improve reporting with offset
- prepare infrastructure and whitelist kmalloc
- update VFS subsystem with whitelists
- update SCSI subsystem with whitelists
- update network subsystem with whitelists
- update process memory with whitelists
- update per-architecture thread_struct with whitelists
- update KVM with whitelists and fix ioctl bug
- mark all other allocations as not whitelisted
- update lkdtm for more sensible test overage
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJabvleAAoJEIly9N/cbcAmO1kQAJnjVPutnLSbnUteZxtsv7W4
43Cggvokfxr6l08Yh3hUowNxZVKjhF9uwMVgRRg9Nl5WdYCN+vCQbHz+ZdzGJXKq
cGqdKWgexMKX+aBdNDrK7BphUeD46sH7JWR+a/lDV/BgPxBCm9i5ZZCgXbPP89AZ
NpLBji7gz49wMsnm/x135xtNlZ3dG0oKETzi7MiR+NtKtUGvoIszSKy5JdPZ4m8q
9fnXmHqmwM6uQFuzDJPt1o+D1fusTuYnjI7EgyrJRRhQ+BB3qEFZApXnKNDRS9Dm
uB7jtcwefJCjlZVCf2+PWTOEifH2WFZXLPFlC8f44jK6iRW2Nc+wVRisJ3vSNBG1
gaRUe/FSge68eyfQj5OFiwM/2099MNkKdZ0fSOjEBeubQpiFChjgWgcOXa5Bhlrr
C4CIhFV2qg/tOuHDAF+Q5S96oZkaTy5qcEEwhBSW15ySDUaRWFSrtboNt6ZVOhug
d8JJvDCQWoNu1IQozcbv6xW/Rk7miy8c0INZ4q33YUvIZpH862+vgDWfTJ73Zy9H
jR/8eG6t3kFHKS1vWdKZzOX1bEcnd02CGElFnFYUEewKoV7ZeeLsYX7zodyUAKyi
Yp5CImsDbWWTsptBg6h9nt2TseXTxYCt2bbmpJcqzsqSCUwOQNQ4/YpuzLeG0ihc
JgOmUnQNJWCTwUUw5AS1
=tzmJ
-----END PGP SIGNATURE-----
Merge tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardened usercopy whitelisting from Kees Cook:
"Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs.
To further restrict what memory is available for copying, this creates
a way to whitelist specific areas of a given slab cache object for
copying to/from userspace, allowing much finer granularity of access
control.
Slab caches that are never exposed to userspace can declare no
whitelist for their objects, thereby keeping them unavailable to
userspace via dynamic copy operations. (Note, an implicit form of
whitelisting is the use of constant sizes in usercopy operations and
get_user()/put_user(); these bypass all hardened usercopy checks since
these sizes cannot change at runtime.)
This new check is WARN-by-default, so any mistakes can be found over
the next several releases without breaking anyone's system.
The series has roughly the following sections:
- remove %p and improve reporting with offset
- prepare infrastructure and whitelist kmalloc
- update VFS subsystem with whitelists
- update SCSI subsystem with whitelists
- update network subsystem with whitelists
- update process memory with whitelists
- update per-architecture thread_struct with whitelists
- update KVM with whitelists and fix ioctl bug
- mark all other allocations as not whitelisted
- update lkdtm for more sensible test overage"
* tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
lkdtm: Update usercopy tests for whitelisting
usercopy: Restrict non-usercopy caches to size 0
kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
kvm: whitelist struct kvm_vcpu_arch
arm: Implement thread_struct whitelist for hardened usercopy
arm64: Implement thread_struct whitelist for hardened usercopy
x86: Implement thread_struct whitelist for hardened usercopy
fork: Provide usercopy whitelisting for task_struct
fork: Define usercopy region in thread_stack slab caches
fork: Define usercopy region in mm_struct slab caches
net: Restrict unwhitelisted proto caches to size 0
sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
sctp: Define usercopy region in SCTP proto slab cache
caif: Define usercopy region in caif proto slab cache
ip: Define usercopy region in IP proto slab cache
net: Define usercopy region in struct proto slab cache
scsi: Define usercopy region in scsi_sense_cache slab cache
cifs: Define usercopy region in cifs_request slab cache
vxfs: Define usercopy region in vxfs_inode slab cache
ufs: Define usercopy region in ufs_inode_cache slab cache
...
Pull poll annotations from Al Viro:
"This introduces a __bitwise type for POLL### bitmap, and propagates
the annotations through the tree. Most of that stuff is as simple as
'make ->poll() instances return __poll_t and do the same to local
variables used to hold the future return value'.
Some of the obvious brainos found in process are fixed (e.g. POLLIN
misspelled as POLL_IN). At that point the amount of sparse warnings is
low and most of them are for genuine bugs - e.g. ->poll() instance
deciding to return -EINVAL instead of a bitmap. I hadn't touched those
in this series - it's large enough as it is.
Another problem it has caught was eventpoll() ABI mess; select.c and
eventpoll.c assumed that corresponding POLL### and EPOLL### were
equal. That's true for some, but not all of them - EPOLL### are
arch-independent, but POLL### are not.
The last commit in this series separates userland POLL### values from
the (now arch-independent) kernel-side ones, converting between them
in the few places where they are copied to/from userland. AFAICS, this
is the least disruptive fix preserving poll(2) ABI and making epoll()
work on all architectures.
As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
it will trigger only on what would've triggered EPOLLWRBAND on other
architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
at all on sparc. With this patch they should work consistently on all
architectures"
* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
make kernel-side POLL... arch-independent
eventpoll: no need to mask the result of epi_item_poll() again
eventpoll: constify struct epoll_event pointers
debugging printk in sg_poll() uses %x to print POLL... bitmap
annotate poll(2) guts
9p: untangle ->poll() mess
->si_band gets POLL... bitmap stored into a user-visible long field
ring_buffer_poll_wait() return value used as return value of ->poll()
the rest of drivers/*: annotate ->poll() instances
media: annotate ->poll() instances
fs: annotate ->poll() instances
ipc, kernel, mm: annotate ->poll() instances
net: annotate ->poll() instances
apparmor: annotate ->poll() instances
tomoyo: annotate ->poll() instances
sound: annotate ->poll() instances
acpi: annotate ->poll() instances
crypto: annotate ->poll() instances
block: annotate ->poll() instances
x86: annotate ->poll() instances
...
After do_readv_writev, the inode cache is invalidated anyway, so i_size
will never be read. It will be fetched from the server which will also
know about updates from other machines.
Fixes deadlock on 32-bit SMP.
See https://marc.info/?l=linux-fsdevel&m=151268557427760&w=2
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In orangefs_devreq_read, there is a loop which picks an op off the list
of pending ops. If the loop fails to find an op, there is nothing to
read, and it returns EAGAIN. If the op has been given up on, the loop
is restarted via a goto. The bug is that the variable which the found
op is written to is not reinitialized, so if there are no more eligible
ops on the list, the code runs again on the already handled op.
This is triggered by interrupting a process while the op is being copied
to the client-core. It's a fairly small window, but it's there.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
set_op_state_purged can delete the op.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
orangefs symlink pathnames, stored in struct orangefs_inode_s.link_target
and therefore contained in the orangefs_inode_cache, need to be copied
to/from userspace.
cache object allocation:
fs/orangefs/super.c:
orangefs_alloc_inode(...):
...
orangefs_inode = kmem_cache_alloc(orangefs_inode_cache, ...);
...
return &orangefs_inode->vfs_inode;
fs/orangefs/orangefs-utils.c:
exofs_symlink(...):
...
inode->i_link = orangefs_inode->link_target;
example usage trace:
readlink_copy+0x43/0x70
vfs_readlink+0x62/0x110
SyS_readlinkat+0x100/0x130
fs/namei.c:
readlink_copy(..., link):
...
copy_to_user(..., link, len);
(inlined in vfs_readlink)
generic_readlink(dentry, ...):
struct inode *inode = d_inode(dentry);
const char *link = inode->i_link;
...
readlink_copy(..., link);
In support of usercopy hardening, this patch defines a region in the
orangefs_inode_cache slab cache in which userspace copy operations are
allowed.
This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.
This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.
Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
This is a pure automated search-and-replace of the internal kernel
superblock flags.
The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.
Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.
The script to do this was:
# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"
SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
for f in $L; do sed -i $f $SED_PROG; done
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>