Commit Graph

51627 Commits

Author SHA1 Message Date
Christoph Hellwig
942491c9e6 xfs: fix AIM7 regression
Apparently our current rwsem code doesn't like doing the trylock, then
lock for real scheme.  So change our read/write methods to just do the
trylock for the RWF_NOWAIT case.  This fixes a ~25% regression in
AIM7.

Fixes: 91f9943e ("fs: support RWF_NOWAIT for buffered reads")
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-23 18:31:50 -07:00
David S. Miller
f8ddadc4db Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
There were quite a few overlapping sets of changes here.

Daniel's bug fix for off-by-ones in the new BPF branch instructions,
along with the added allowances for "data_end > ptr + x" forms
collided with the metadata additions.

Along with those three changes came veritifer test cases, which in
their final form I tried to group together properly.  If I had just
trimmed GIT's conflict tags as-is, this would have split up the
meta tests unnecessarily.

In the socketmap code, a set of preemption disabling changes
overlapped with the rename of bpf_compute_data_end() to
bpf_compute_data_pointers().

Changes were made to the mv88e6060.c driver set addr method
which got removed in net-next.

The hyperv transport socket layer had a locking change in 'net'
which overlapped with a change of socket state macro usage
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 13:39:14 +01:00
Linus Torvalds
ec0145e9cc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "MS_I_VERSION fixes - Mimi's fix + missing bits picked from Matthew
  (his patch contained a duplicate of the fs/namespace.c fix as well,
  but by that point the original fix had already been applied)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Convert fs/*/* to SB_I_VERSION
  vfs: fix mounting a filesystem with i_version
2017-10-21 21:39:18 -04:00
nixiaoming
c79dde629d tty fix oops when rmmod 8250
After rmmod 8250.ko
tty_kref_put starts kwork (release_one_tty) to release proc interface
oops when accessing driver->driver_name in proc_tty_unregister_driver

Use jprobe, found driver->driver_name point to 8250.ko
static static struct uart_driver serial8250_reg
.driver_name= serial,

Use name in proc_dir_entry instead of driver->driver_name to fix oops

test on linux 4.1.12:

BUG: unable to handle kernel paging request at ffffffffa01979de
IP: [<ffffffff81310f40>] strchr+0x0/0x30
PGD 1a0d067 PUD 1a0e063 PMD 851c1f067 PTE 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ... ...  [last unloaded: 8250]
CPU: 7 PID: 116 Comm: kworker/7:1 Tainted: G           O    4.1.12 #1
Hardware name: Insyde RiverForest/Type2 - Board Product Name1, BIOS NE5KV904 12/21/2015
Workqueue: events release_one_tty
task: ffff88085b684960 ti: ffff880852884000 task.ti: ffff880852884000
RIP: 0010:[<ffffffff81310f40>]  [<ffffffff81310f40>] strchr+0x0/0x30
RSP: 0018:ffff880852887c90  EFLAGS: 00010282
RAX: ffffffff81a5eca0 RBX: ffffffffa01979de RCX: 0000000000000004
RDX: ffff880852887d10 RSI: 000000000000002f RDI: ffffffffa01979de
RBP: ffff880852887cd8 R08: 0000000000000000 R09: ffff88085f5d94d0
R10: 0000000000000195 R11: 0000000000000000 R12: ffffffffa01979de
R13: ffff880852887d00 R14: ffffffffa01979de R15: ffff88085f02e840
FS:  0000000000000000(0000) GS:ffff88085f5c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa01979de CR3: 0000000001a0c000 CR4: 00000000001406e0
Stack:
 ffffffff812349b1 ffff880852887cb8 ffff880852887d10 ffff88085f5cd6c2
 ffff880852800a80 ffffffffa01979de ffff880852800a84 0000000000000010
 ffff88085bb28bd8 ffff880852887d38 ffffffff812354f0 ffff880852887d08
Call Trace:
 [<ffffffff812349b1>] ? __xlate_proc_name+0x71/0xd0
 [<ffffffff812354f0>] remove_proc_entry+0x40/0x180
 [<ffffffff815f6811>] ? _raw_spin_lock_irqsave+0x41/0x60
 [<ffffffff813be520>] ? destruct_tty_driver+0x60/0xe0
 [<ffffffff81237c68>] proc_tty_unregister_driver+0x28/0x40
 [<ffffffff813be548>] destruct_tty_driver+0x88/0xe0
 [<ffffffff813be5bd>] tty_driver_kref_put+0x1d/0x20
 [<ffffffff813becca>] release_one_tty+0x5a/0xd0
 [<ffffffff81074159>] process_one_work+0x139/0x420
 [<ffffffff810745a1>] worker_thread+0x121/0x450
 [<ffffffff81074480>] ? process_scheduled_works+0x40/0x40
 [<ffffffff8107a16c>] kthread+0xec/0x110
 [<ffffffff81080000>] ? tg_rt_schedulable+0x210/0x220
 [<ffffffff8107a080>] ? kthread_freezable_should_stop+0x80/0x80
 [<ffffffff815f7292>] ret_from_fork+0x42/0x70
 [<ffffffff8107a080>] ? kthread_freezable_should_stop+0x80/0x80

Signed-off-by: nixiaoming <nixiaoming@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-20 14:06:45 +02:00
Linus Torvalds
03b652e5c0 Merge branch 'fixes-v4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull key handling fixes from James Morris:
 "This includes a fix for the capabilities code from Colin King, and a
  set of further fixes for the keys subsystem. From David:

   - Fix a bunch of places where kernel drivers may access revoked
     user-type keys and don't do it correctly.

   - Fix some ecryptfs bits.

   - Fix big_key to require CONFIG_CRYPTO.

   - Fix a couple of bugs in the asymmetric key type.

   - Fix a race between updating and finding negative keys.

   - Prevent add_key() from updating uninstantiated keys.

   - Make loading of key flags and expiry time atomic when not holding
     locks"

* 'fixes-v4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  commoncap: move assignment of fs_ns to avoid null pointer dereference
  pkcs7: Prevent NULL pointer dereference, since sinfo is not always set.
  KEYS: load key flags and expiry time atomically in proc_keys_show()
  KEYS: Load key expiry time atomically in keyring_search_iterator()
  KEYS: load key flags and expiry time atomically in key_validate()
  KEYS: don't let add_key() update an uninstantiated key
  KEYS: Fix race between updating and finding a negative key
  KEYS: checking the input id parameters before finding asymmetric key
  KEYS: Fix the wrong index when checking the existence of second id
  security/keys: BIG_KEY requires CONFIG_CRYPTO
  ecryptfs: fix dereference of NULL user_key_payload
  fscrypt: fix dereference of NULL user_key_payload
  lib/digsig: fix dereference of NULL user_key_payload
  FS-Cache: fix dereference of NULL user_key_payload
  KEYS: encrypted: fix dereference of NULL user_key_payload
2017-10-20 06:19:38 -04:00
Mathieu Desnoyers
a961e40917 membarrier: Provide register expedited private command
This introduces a "register private expedited" membarrier command which
allows eventual removal of important memory barrier constraints on the
scheduler fast-paths. It changes how the "private expedited" membarrier
command (new to 4.14) is used from user-space.

This new command allows processes to register their intent to use the
private expedited command.  This affects how the expedited private
command introduced in 4.14-rc is meant to be used, and should be merged
before 4.14 final.

Processes are now required to register before using
MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.

This fixes a problem that arose when designing requested extensions to
sys_membarrier() to allow JITs to efficiently flush old code from
instruction caches.  Several potential algorithms are much less painful
if the user register intent to use this functionality early on, for
example, before the process spawns the second thread.  Registering at
this time removes the need to interrupt each and every thread in that
process at the first expedited sys_membarrier() system call.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-19 22:13:40 -04:00
Dan Carpenter
0ce5cdc9d7 ovl: Return -ENOMEM if an allocation fails ovl_lookup()
The error code is missing here so it means we return ERR_PTR(0) or NULL.
The other error paths all return an error code so this probably should
as well.

Fixes: 02b69b284c ("ovl: lookup redirects")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-19 16:19:52 +02:00
Hirofumi Nakagawa
b3885bd6ed ovl: add NULL check in ovl_alloc_inode
This was detected by fault injection test

Signed-off-by: Hirofumi Nakagawa <nklabs@gmail.com>
Fixes: 13cf199d00 ("ovl: allocate an ovl_inode struct")
Cc: <stable@vger.kernel.org> # v4.13
2017-10-19 16:19:51 +02:00
Bhumika Goyal
761594b741 dlm: make config_item_type const
Make config_item_type structures const as they are either passed to a
function having the argument as const or stored in the const "ci_type"
field of a config_item structure.

Done using Coccinelle.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 16:15:22 +02:00
Bhumika Goyal
4843afe4e6 ocfs2/cluster: make config_item_type const
Make these structures const as they are either passed to the functions
having the argument as const or stored as a reference in the "ci_type"
const field of a config_item structure.

Done using Coccinelle.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 16:15:18 +02:00
Bhumika Goyal
aa293583f0 configfs: make ci_type field, some pointers and function arguments const
The ci_type field of the config_item structure do not modify the fields
of the config_item_type structure it points to. And the other pointers
initialized with ci_type do not modify the fields as well.
So, make the ci_type field and the pointers initialized with ci_type
as const.

Make the struct config_item_type *type function argument of functions
config_{item/group}_init_type_name const as the argument in both the
functions is only stored in the ci_type field of a config_item structure
which is now made const.
Make the argument of configfs_register_default_group const as it is
only passed to the argument of the function config_group_init_type_name
which is now const.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 16:15:16 +02:00
Thomas Meyer
3f6928c347 configfs: Fix bool initialization/comparison
Bool initializations should use true and false. Bool tests don't need
comparisons.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 16:15:14 +02:00
James Morris
494b9ae7ab Merge commit 'tags/keys-fixes-20171018' into fixes-v4.14-rc5 2017-10-19 12:28:38 +11:00
Eric Biggers
3ce2b8ddd8 ext4: switch to fscrypt_prepare_setattr()
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 20:21:58 -04:00
Eric Biggers
8990427501 ext4: switch to fscrypt_prepare_lookup()
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 20:21:58 -04:00
Eric Biggers
07543d164b ext4: switch to fscrypt_prepare_rename()
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 20:21:57 -04:00
Eric Biggers
697251816d ext4: switch to fscrypt_prepare_link()
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 20:21:57 -04:00
Eric Biggers
09a5c31c91 ext4: switch to fscrypt_file_open()
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 20:21:57 -04:00
Eric Biggers
32c3cf028e fscrypt: new helper function - fscrypt_prepare_lookup()
Introduce a helper function which prepares to look up the given dentry
in the given directory.  If the directory is encrypted, it handles
loading the directory's encryption key, setting the dentry's ->d_op to
fscrypt_d_ops, and setting DCACHE_ENCRYPTED_WITH_KEY if the directory's
encryption key is available.

Note: once all filesystems switch over to this, we'll be able to move
fscrypt_d_ops and fscrypt_set_encrypted_dentry() to fscrypt_private.h.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 19:52:38 -04:00
Eric Biggers
94b26f3672 fscrypt: new helper function - fscrypt_prepare_rename()
Introduce a helper function which prepares to rename a file into a
possibly encrypted directory.  It handles loading the encryption keys
for the source and target directories if needed, and it handles
enforcing that if the target directory (and the source directory for a
cross-rename) is encrypted, then the file being moved into the directory
has the same encryption policy as its containing directory.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 19:52:38 -04:00
Eric Biggers
0ea87a9644 fscrypt: new helper function - fscrypt_prepare_link()
Introduce a helper function which prepares to link an inode into a
possibly-encrypted directory.  It handles setting up the target
directory's encryption key, then verifying that the link won't violate
the constraint that all files in an encrypted directory tree use the
same encryption policy.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 19:52:38 -04:00
Eric Biggers
efcc7ae2c9 fscrypt: new helper function - fscrypt_file_open()
Add a helper function which prepares to open a regular file which may be
encrypted.  It handles setting up the file's encryption key, then
checking that the file's encryption policy matches that of its parent
directory (if the parent directory is encrypted).  It may be set as the
->open() method or it can be called from another ->open() method.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 19:52:37 -04:00
Eric Biggers
ffcc41829a fscrypt: remove unneeded empty fscrypt_operations structs
In the case where a filesystem has been configured without encryption
support, there is no longer any need to initialize ->s_cop at all, since
none of the methods are ever called.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 19:52:37 -04:00
Eric Biggers
f7293e48bb fscrypt: remove ->is_encrypted()
Now that all callers of fscrypt_operations.is_encrypted() have been
switched to IS_ENCRYPTED(), remove ->is_encrypted().

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 19:52:37 -04:00
Eric Biggers
e0428a266d fscrypt: switch from ->is_encrypted() to IS_ENCRYPTED()
IS_ENCRYPTED() now gives the same information as
i_sb->s_cop->is_encrypted() but is more efficient, since IS_ENCRYPTED()
is just a simple flag check.  Prepare to remove ->is_encrypted() by
switching all callers to IS_ENCRYPTED().

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 19:52:36 -04:00
Eric Biggers
2ee6a576be fs, fscrypt: add an S_ENCRYPTED inode flag
Introduce a flag S_ENCRYPTED which can be set in ->i_flags to indicate
that the inode is encrypted using the fscrypt (fs/crypto/) mechanism.

Checking this flag will give the same information that
inode->i_sb->s_cop->is_encrypted(inode) currently does, but will be more
efficient.  This will be useful for adding higher-level helper functions
for filesystems to use.  For example we'll be able to replace this:

	if (ext4_encrypted_inode(inode)) {
		ret = fscrypt_get_encryption_info(inode);
		if (ret)
			return ret;
		if (!fscrypt_has_encryption_key(inode))
			return -ENOKEY;
	}

with this:

	ret = fscrypt_require_key(inode);
	if (ret)
		return ret;

... since we'll be able to retain the fast path for unencrypted files as
a single flag check, using an inline function.  This wasn't possible
before because we'd have had to frequently call through the
->i_sb->s_cop->is_encrypted function pointer, even when the encryption
support was disabled or not being used.

Note: we don't define S_ENCRYPTED to 0 if CONFIG_FS_ENCRYPTION is
disabled because we want to continue to return an error if an encrypted
file is accessed without encryption support, rather than pretending that
it is unencrypted.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 19:52:36 -04:00
Dave Chinner
734f0d241d fscrypt: clean up include file mess
Filesystems have to include different header files based on whether they
are compiled with encryption support or not. That's nasty and messy.

Instead, rationalise the headers so we have a single include fscrypt.h
and let it decide what internal implementation to include based on the
__FS_HAS_ENCRYPTION define.  Filesystems set __FS_HAS_ENCRYPTION to 1
before including linux/fscrypt.h if they are built with encryption
support.  Otherwise, they must set __FS_HAS_ENCRYPTION to 0.

Add guards to prevent fscrypt_supp.h and fscrypt_notsupp.h from being
directly included by filesystems.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[EB: use 1 and 0 rather than defined/undefined]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 19:52:36 -04:00
Matthew Garrett
357fdad075 Convert fs/*/* to SB_I_VERSION
[AV: in addition to the fix in previous commit]

Signed-off-by: Matthew Garrett <mjg59@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-10-18 18:51:27 -04:00
Linus Torvalds
73d3393ada Merge tag 'xfs-4.14-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:

 - fix some more CONFIG_XFS_RT related build problems

 - fix data loss when writeback at eof races eofblocks gc and loses

 - invalidate page cache after fs finishes a dio write

 - remove dirty page state when invalidating pages so releasepage does
   the right thing when handed a dirty page

* tag 'xfs-4.14-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: move two more RT specific functions into CONFIG_XFS_RT
  xfs: trim writepage mapping to within eof
  fs: invalidate page cache after end_io() in dio completion
  xfs: cancel dirty pages on invalidation
2017-10-18 14:51:50 -04:00
Linus Torvalds
020b302376 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Three small fixes:

   - A fix for skd, it was using kfree() to free a structure allocate
     with kmem_cache_alloc().

   - Stable fix for nbd, fixing a regression using the normal ioctl
     based tools.

   - Fix for a previous fix in this series, that fixed up
     inconsistencies between buffered and direct IO"

* 'for-linus' of git://git.kernel.dk/linux-block:
  fs: Avoid invalidation in interrupt context in dio_complete()
  nbd: don't set the device size until we're connected
  skd: Use kmem_cache_free
2017-10-18 14:43:40 -04:00
Simon Ruderich
d98bf8cd11 ext4: mention noload when recovering on read-only device
Help the user to find the appropriate mount option to continue mounting
the file system on a read-only device if the journal requires recovery.

Signed-off-by: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 13:06:37 -04:00
Long Li
4572f0539c CIFS: SMBD: Fix the definition for SMB2_CHANNEL_RDMA_V1_INVALIDATE
The channel value for requesting server remote invalidating local memory
registration should be 0x00000002

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-10-18 11:52:39 -05:00
Ronnie Sahlberg
7cb3def44c cifs: handle large EA requests more gracefully in smb2+
Update reading the EA using increasingly larger buffer sizes
until the response will fit in the buffer, or we exceed the
(arbitrary) maximum set to 64kb.

Without this change, a user is able to add more and more EAs using
setfattr until the point where the total space of all EAs exceed 2kb
at which point the user can no longer list the EAs at all
and getfattr will abort with an error.

The same issue still exists for EAs in SMB1.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-10-18 11:52:39 -05:00
Steve French
06e2290844 Fix encryption labels and lengths for SMB3.1.1
SMB3.1.1 is most secure and recent dialect. Fixup labels and lengths
for sMB3.1.1 signing and encryption.

Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
2017-10-18 11:52:39 -05:00
Kees Cook
235699a8f4 ext4: convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
2017-10-18 12:45:17 -04:00
Kees Cook
e3c957885e jbd2: convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: linux-ext4@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
2017-10-18 12:40:28 -04:00
David Howells
bc5e3a546d rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals
Make AF_RXRPC accept MSG_WAITALL as a flag to sendmsg() to tell it to
ignore signals whilst loading up the message queue, provided progress is
being made in emptying the queue at the other side.

Progress is defined as the base of the transmit window having being
advanced within 2 RTT periods.  If the period is exceeded with no progress,
sendmsg() will return anyway, indicating how much data has been copied, if
any.

Once the supplied buffer is entirely decanted, the sendmsg() will return.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-10-18 11:43:07 +01:00
David Howells
a68f4a27f5 rxrpc: Support service upgrade from a kernel service
Provide support for a kernel service to make use of the service upgrade
facility.  This involves:

 (1) Pass an upgrade request flag to rxrpc_kernel_begin_call().

 (2) Make rxrpc_kernel_recv_data() return the call's current service ID so
     that the caller can detect service upgrade and see what the service
     was upgraded to.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-10-18 11:37:20 +01:00
Lukas Czerner
ffe51f0142 fs: Avoid invalidation in interrupt context in dio_complete()
Currently we try to defer completion of async DIO to the process context
in case there are any mapped pages associated with the inode so that we
can invalidate the pages when the IO completes. However the check is racy
and the pages can be mapped afterwards. If this happens we might end up
calling invalidate_inode_pages2_range() in dio_complete() in interrupt
context which could sleep. This can be reproduced by generic/451.

Fix this by passing the information whether we can or can't invalidate
to the dio_complete(). Thanks Eryu Guan for reporting this and Jan Kara
for suggesting a fix.

Fixes: 332391a993 ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO")
Reported-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-17 08:43:09 -06:00
Steve Magnani
89a4d970ef udf: Fix some sign-conversion warnings
Fix some warnings that appear when compiling with -Wconversion.
A sub-optimal choice of variable type leads to warnings about
conversion in both directions between unsigned and signed.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-17 12:02:07 +02:00
Steve Magnani
fcbf7637e6 udf: Fix signed/unsigned format specifiers
Fix problems noted in compilion with -Wformat=2 -Wformat-signedness.
In particular, a mismatch between the signedness of a value and the
signedness of its format specifier can result in unsigned values being
printed as negative numbers, e.g.:

  Partition (0 type 1511) starts at physical 460, block length -1779968542

...which occurs when mounting a large (> 1 TiB) UDF partition.

Changes since V1:
* Fixed additional issues noted in udf_bitmap_free_blocks(),
  udf_get_fileident(), udf_show_options()

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-17 12:00:58 +02:00
Steve Magnani
b490bdd630 udf: Fix 64-bit sign extension issues affecting blocks > 0x7FFFFFFF
Large (> 1 TiB) UDF filesystems appear subject to several problems when
mounted on 64-bit systems:

* readdir() can fail on a directory containing File Identifiers residing
  above 0x7FFFFFFF. This manifests as a 'ls' command failing with EIO.

* FIBMAP on a file block located above 0x7FFFFFFF can return a negative
  value. The low 32 bits are correct, but applications that don't mask the
  high 32 bits of the result can perform incorrectly.

Per suggestion by Jan Kara, introduce a udf_pblk_t type for representation
of UDF block addresses. Ultimately, all driver functions that manipulate
UDF block addresses should use this type; for now, deployment is limited
to functions with actual or potential sign extension issues.

Changes to udf_readdir() and udf_block_map() address the issues noted
above; other changes address potential similar issues uncovered during
audit of the driver code.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-17 11:56:45 +02:00
Mimi Zohar
917086ff23 vfs: fix mounting a filesystem with i_version
The mount i_version flag is not enabled in the new sb_flags.  This patch
adds the missing SB_I_VERSION flag.

Fixes: e462ec5 "VFS: Differentiate mount flags (MS_*) from internal
       superblock flags"
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-10-17 02:22:07 -04:00
Arnd Bergmann
785545c898 xfs: move two more RT specific functions into CONFIG_XFS_RT
The last cleanup introduced two harmless warnings:

fs/xfs/xfs_fsmap.c:480:1: warning: '__xfs_getfsmap_rtdev' defined but not used
fs/xfs/xfs_fsmap.c:372:1: warning: 'xfs_getfsmap_rtdev_rtbitmap_helper' defined but not used

This moves those two functions as well.

Fixes: bb9c2e5433 ("xfs: move more RT specific code under CONFIG_XFS_RT")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-16 12:26:50 -07:00
Brian Foster
40214d128e xfs: trim writepage mapping to within eof
The writeback rework in commit fbcc025613 ("xfs: Introduce
writeback context for writepages") introduced a subtle change in
behavior with regard to the block mapping used across the
->writepages() sequence. The previous xfs_cluster_write() code would
only flush pages up to EOF at the time of the writepage, thus
ensuring that any pages due to file-extending writes would be
handled on a separate cycle and with a new, updated block mapping.

The updated code establishes a block mapping in xfs_writepage_map()
that could extend beyond EOF if the file has post-eof preallocation.
Because we now use the generic writeback infrastructure and pass the
cached mapping to each writepage call, there is no implicit EOF
limit in place. If eofblocks trimming occurs during ->writepages(),
any post-eof portion of the cached mapping becomes invalid. The
eofblocks code has no means to serialize against writeback because
there are no pages associated with post-eof blocks. Therefore if an
eofblocks trim occurs and is followed by a file-extending buffered
write, not only has the mapping become invalid, but we could end up
writing a page to disk based on the invalid mapping.

Consider the following sequence of events:

- A buffered write creates a delalloc extent and post-eof
  speculative preallocation.
- Writeback starts and on the first writepage cycle, the delalloc
  extent is converted to real blocks (including the post-eof blocks)
  and the mapping is cached.
- The file is closed and xfs_release() trims post-eof blocks. The
  cached writeback mapping is now invalid.
- Another buffered write appends the file with a delalloc extent.
- The concurrent writeback cycle picks up the just written page
  because the writeback range end is LLONG_MAX. xfs_writepage_map()
  attributes it to the (now invalid) cached mapping and writes the
  data to an incorrect location on disk (and where the file offset is
  still backed by a delalloc extent).

This problem is reproduced by xfstests test generic/464, which
triggers racing writes, appends, open/closes and writeback requests.

To address this problem, trim the mapping used during writeback to
within EOF when the mapping is validated. This ensures the mapping
is revalidated for any pages encountered beyond EOF as of the time
the current mapping was cached or last validated.

Reported-by: Eryu Guan <eguan@redhat.com>
Diagnosed-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-16 12:26:50 -07:00
Eryu Guan
5e25c269e1 fs: invalidate page cache after end_io() in dio completion
Commit 332391a993 ("fs: Fix page cache inconsistency when mixing
buffered and AIO DIO") moved page cache invalidation from
iomap_dio_rw() to iomap_dio_complete() for iomap based direct write
path, but before the dio->end_io() call, and it re-introdued the bug
fixed by commit c771c14baa ("iomap: invalidate page caches should
be after iomap_dio_complete() in direct write").

I found this because fstests generic/418 started failing on XFS with
v4.14-rc3 kernel, which is the regression test for this specific
bug.

So similarly, fix it by moving dio->end_io() (which does the
unwritten extent conversion) before page cache invalidation, to make
sure next buffer read reads the final real allocations not unwritten
extents. I also add some comments about why should end_io() go first
in case we get it wrong again in the future.

Note that, there's no such problem in the non-iomap based direct
write path, because we didn't remove the page cache invalidation
after the ->direct_IO() in generic_file_direct_write() call, but I
decided to fix dio_complete() too so we don't leave a landmine
there, also be consistent with iomap_dio_complete().

Fixes: 332391a993 ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO")
Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
2017-10-16 12:11:56 -07:00
Dave Chinner
793d7dbe6d xfs: cancel dirty pages on invalidation
Recently we've had warnings arise from the vm handing us pages
without bufferheads attached to them. This should not ever occur
in XFS, but we don't defend against it properly if it does. The only
place where we remove bufferheads from a page is in
xfs_vm_releasepage(), but we can't tell the difference here between
"page is dirty so don't release" and "page is dirty but is being
invalidated so release it".

In some places that are invalidating pages ask for pages to be
released and follow up afterward calling ->releasepage by checking
whether the page was dirty and then aborting the invalidation. This
is a possible vector for releasing buffers from a page but then
leaving it in the mapping, so we really do need to avoid dirty pages
in xfs_vm_releasepage().

To differentiate between invalidated pages and normal pages, we need
to clear the page dirty flag when invalidating the pages. This can
be done through xfs_vm_invalidatepage(), and will result
xfs_vm_releasepage() seeing the page as clean which matches the
bufferhead state on the page after calling block_invalidatepage().

Hence we can re-add the page dirty check in xfs_vm_releasepage to
catch the case where we might be releasing a page that is actually
dirty and so should not have the bufferheads on it removed. This
will remove one possible vector of "dirty page with no bufferheads"
and so help narrow down the search for the root cause of that
problem.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-16 12:11:56 -07:00
NeilBrown
1fea73ac92 NFS: remove special-case revalidate in nfs_opendir()
Commit f5a73672d1 ("NFS: allow close-to-open cache semantics to
apply to root of NFS filesystem") added a call to
__nfs_revalidate_inode() to nfs_opendir to as the lookup
process wouldn't reliable do this.

Subsequent commit a3fbbde70a ("VFS: we need to set LOOKUP_JUMPED
on mountpoint crossing") make this unnecessary.  So remove the
unnecessary code.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-10-16 13:51:27 -04:00
NeilBrown
b688741cb0 NFS: revalidate "." etc correctly on "open".
For correct close-to-open semantics, NFS must validate
the change attribute of a directory (or file) on open.

Since commit ecf3d1f1aa ("vfs: kill FS_REVAL_DOT by adding a
d_weak_revalidate dentry op"), open() of "." or a path ending ".." is
not revalidated reliably (except when that direct is a mount point).

Prior to that commit, "." was revalidated using nfs_lookup_revalidate()
which checks the LOOKUP_OPEN flag and forces revalidation if the flag is
set.
Since that commit, nfs_weak_revalidate() is used for NFSv3 (which
ignores the flags) and nothing is used for NFSv4.

This is fixed by using nfs_lookup_verify_inode() in
nfs_weak_revalidate().  This does the revalidation exactly when needed.
Also, add a definition of .d_weak_revalidate for NFSv4.

The incorrect behavior is easily demonstrated by running "echo *" in
some non-mountpoint NFS directory while watching network traffic.
Without this patch, "echo *" sometimes doesn't produce any traffic.
With the patch it always does.

Fixes: ecf3d1f1aa ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op")
cc: stable@vger.kernel.org (3.9+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-10-16 13:51:27 -04:00
Anna Schumaker
1750d929b0 NFS: Don't compare apples to elephants to determine access bits
The NFS_ACCESS_* flags aren't a 1:1 mapping to the MAY_* flags, so
checking for MAY_WHATEVER might have surprising results in
nfs*_proc_access().  Let's simplify this check when determining which
bits to ask for, and do it in a generic place instead of copying code
for each NFS version.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-10-16 13:51:27 -04:00