Commit Graph

798761 Commits

Author SHA1 Message Date
Ronnie Sahlberg
e77fe73c7e cifs: we can not use small padding iovs together with encryption
We can not append small padding buffers as separate iovs when encryption is
used. For this case we must flatten the request into a single buffer
containing both the data from all the iovs as well as the padding bytes.

This is at least needed for 4.20 as well due to compounding changes.

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-31 00:58:52 -06:00
Steve French
14e92c5dc7 cifs: Minor Kconfig clarification
Clarify the use of the CONFIG_DFS_UPCALL for DNS name resolution
when server ip addresses change (e.g. on long running mounts)

Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:13:11 -06:00
Paulo Alcantara
28eb24ff75 cifs: Always resolve hostname before reconnecting
In case a hostname resolves to a different IP address (e.g. long
running mounts), make sure to resolve it every time prior to calling
generic_ip_connect() in reconnect.

Suggested-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:13:11 -06:00
Paulo Alcantara
0874401549 cifs: Add support for failover in cifs_reconnect_tcon()
After a successful failover, the cifs_reconnect_tcon() function will
make sure to reconnect every tcon to new target server.

Same as previous commit but for SMB1 codepath.

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:13:11 -06:00
Paulo Alcantara
a3a53b7603 cifs: Add support for failover in smb2_reconnect()
After a successful failover in cifs_reconnect(), the smb2_reconnect()
function will make sure to reconnect every tcon to new target server.

For SMB2+.

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:13:11 -06:00
Paulo Alcantara
2332440714 cifs: Only free DFS target list if we actually got one
Fix potential NULL ptr deref when DFS target list is empty.

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:13:11 -06:00
Paulo Alcantara
e511d31753 cifs: start DFS cache refresher in cifs_mount()
Start the DFS cache refresh worker per volume during cifs mount.

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:13:11 -06:00
YueHaibing
2f0a617448 cifs: Use GFP_ATOMIC when a lock is held in cifs_mount()
A spin lock is held before kstrndup, it may sleep with holding
the spinlock, so we should use GFP_ATOMIC instead.

Fixes: e58c31d5e387 ("cifs: Add support for failover in cifs_reconnect()")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
2018-12-28 10:13:11 -06:00
Paulo Alcantara
93d5cb517d cifs: Add support for failover in cifs_reconnect()
After failing to reconnect to original target, it will retry any
target available from DFS cache.

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:13:11 -06:00
Paulo Alcantara
4a367dc044 cifs: Add support for failover in cifs_mount()
This patch adds support for failover when failing to connect in
cifs_mount().

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:10:29 -06:00
YueHaibing
5a650501eb cifs: remove set but not used variable 'sep'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/cifs/cifs_dfs_ref.c: In function 'cifs_dfs_do_automount':
fs/cifs/cifs_dfs_ref.c:309:7: warning:
 variable 'sep' set but not used [-Wunused-but-set-variable]

It never used since introdution in commit 0f56b277073c ("cifs: Make use
of DFS cache to get new DFS referrals")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:09:46 -06:00
Paulo Alcantara
1c780228e9 cifs: Make use of DFS cache to get new DFS referrals
This patch will make use of DFS cache routines where appropriate and
do not always request a new referral from server.

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:09:46 -06:00
Steve French
e8bcdfdbf9 cifs: minor updates to documentation
Update cifs "TODO" file.

Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:09:46 -06:00
Joe Perches
0544b324e6 cifs: check kzalloc return
kzalloc can return NULL so an additional check is needed. While there
is a check for ret_buf there is no check for the allocation of
ret_buf->crfid.fid - this check is thus added. Both call-sites
of tconInfoAlloc() check for NULL return of tconInfoAlloc()
so returning NULL on failure of kzalloc() here seems appropriate.
As the kzalloc() is the only thing here that can fail it is
moved to the beginning so as not to initialize other resources
on failure of kzalloc.

Fixes: 3d4ef9a153 ("smb3: fix redundant opens on root")

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:09:46 -06:00
YueHaibing
29cbfa1b2b cifs: remove set but not used variable 'server'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/cifs/smb2pdu.c: In function 'smb311_posix_mkdir':
fs/cifs/smb2pdu.c:2040:26: warning:
 variable 'server' set but not used [-Wunused-but-set-variable]

fs/cifs/smb2pdu.c: In function 'build_qfs_info_req':
fs/cifs/smb2pdu.c:4067:26: warning:
 variable 'server' set but not used [-Wunused-but-set-variable]

The first 'server' never used since commit bea851b8ba ("smb3: Fix mode on
mkdir on smb311 mounts")
And the second not used since commit 1fc6ad2f10 ("cifs: remove
header_preamble_size where it is always 0")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:09:46 -06:00
Dan Carpenter
34bca9bbe7 cifs: Use kzfree() to free password
We should zero out the password before we free it.

Fixes: 3d6cacbb5310 ("cifs: Add DFS cache routines")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
2018-12-28 10:09:46 -06:00
Wei Yongjun
3e80be0158 cifs: Fix to use kmem_cache_free() instead of kfree()
memory allocated by kmem_cache_alloc() in alloc_cache_entry()
should be freed using kmem_cache_free(), not kfree().

Fixes: 34a44fb160f9 ("cifs: Add DFS cache routines")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-12-28 10:09:46 -06:00
Stephen Rothwell
54e4f73cbe cifs: update for current_kernel_time64() removal
Fixes cifs build failure after merge of the y2038 tree

After merging the y2038 tree, today's linux-next build (x86_64
allmodconfig) failed like this:

fs/cifs/dfs_cache.c: In function 'cache_entry_expired':
fs/cifs/dfs_cache.c:106:7: error: implicit declaration of function 'current_kernel_time64'; did you mean 'core_kernel_text'? [-Werror=implicit-function-declaration]
  ts = current_kernel_time64();
       ^~~~~~~~~~~~~~~~~~~~~
       core_kernel_text
fs/cifs/dfs_cache.c:106:5: error: incompatible types when assigning to type 'struct timespec64' from type 'int'
  ts = current_kernel_time64();
     ^
fs/cifs/dfs_cache.c: In function 'get_expire_time':
fs/cifs/dfs_cache.c:342:24: error: incompatible type for argument 1 of 'timespec64_add'
  return timespec64_add(current_kernel_time64(), ts);
                        ^~~~~~~~~~~~~~~~~~~~~~~
In file included from include/linux/restart_block.h:10,
                 from include/linux/thread_info.h:13,
                 from arch/x86/include/asm/preempt.h:7,
                 from include/linux/preempt.h:78,
                 from include/linux/rcupdate.h:40,
                 from fs/cifs/dfs_cache.c:8:
include/linux/time64.h:66:66: note: expected 'struct timespec64' but argument is of type 'int'
 static inline struct timespec64 timespec64_add(struct timespec64 lhs,
                                                ~~~~~~~~~~~~~~~~~~^~~
fs/cifs/dfs_cache.c:343:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^

Caused by:

  commit ccea641b6742 ("timekeeping: remove obsolete time accessors")

interacting with:
  commit 34a44fb160f9 ("cifs: Add DFS cache routines")

from the cifs tree.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:09:46 -06:00
Paulo Alcantara
54be1f6c1c cifs: Add DFS cache routines
* Add new dfs_cache.[ch] files

* Add new /proc/fs/cifs/dfscache file
  - dump current cache when read
  - clear current cache when writing "0" to it

* Add delayed_work to periodically refresh cache entries

The new interface will be used for caching DFS referrals, as well as
supporting client target failover.

The DFS cache is a hashtable that maps UNC paths to cache entries.

A cache entry contains:
- the UNC path it is mapped on
- how much the the UNC path the entry consumes
- flags
- a Time-To-Live after which the entry expires
- a list of possible targets (linked lists of UNC paths)
- a "hint target" pointing the last known working target or the first
  target if none were tried. This hint lets cifs.ko remember and try
  working targets first.

* Looking for an entry in the cache is done with dfs_cache_find()
  - if no valid entries are found, a DFS query is made, stored in the
    cache and returned
  - the full target list can be copied and returned to avoid race
    conditions and looped on with the help with the
    dfs_cache_tgt_iterator

* Updating the target hint to the next target is done with
  dfs_cache_update_tgthint()

These functions have a dfs_cache_noreq_XXX() version that doesn't
fetches referrals if no entries are found. These versions don't
require the tcp/ses/tcon/cifs_sb parameters as a result.

Expired entries cannot be used and since they have a pretty short TTL
[1] in order for them to be useful for failover the DFS cache adds a
delayed work called periodically to keep them fresh.

Since we might not have available connections to issue the referral
request when refreshing we need to store volume_info structs with
credentials and other needed info to be able to connect to the right
server.

1: Windows defaults: 5mn for domain-based referrals, 30mn for regular
links

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-28 10:05:58 -06:00
Paulo Alcantara
e7b602f437 cifs: Save TTL value when parsing DFS referrals
This will be needed by DFS cache.

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 23:49:00 -06:00
Aurelien Aptel
5fc7fcd054 cifs: auto disable 'serverino' in dfs mounts
Different servers have different set of file ids.

After failover, unique IDs will be different so we can't validate
them.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 23:05:11 -06:00
Paulo Alcantara
d9345e0ae7 cifs: Make devname param optional in cifs_compose_mount_options()
If we only want to get the mount options strings, do not return the
devname.

For DFS failover, we'll be passing the DFS full path down to
cifs_mount() rather than the devname.

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 23:05:08 -06:00
Paulo Alcantara
c34fea5a63 cifs: Skip any trailing backslashes from UNC
When extracting hostname from UNC, check for leading backslashes
before trying to remove them.

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 23:05:05 -06:00
Paulo Alcantara
56c762eb9b cifs: Refactor out cifs_mount()
* Split and refactor the very large function cifs_mount() in multiple
  functions:

- tcp, ses and tcon setup to mount_get_conns()
- tcp, ses and tcon cleanup in mount_put_conns()
- tcon tlink setup to mount_setup_tlink()
- remote path checking to is_path_remote()

* Implement 2 version of cifs_mount() for DFS-enabled builds and
  non-DFS-enabled builds (CONFIG_CIFS_DFS_UPCALL).

In preparation for DFS failover support.

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 23:00:38 -06:00
Georgy A Bystrenin
9a596f5b39 CIFS: Fix error mapping for SMB2_LOCK command which caused OFD lock problem
While resolving a bug with locks on samba shares found a strange behavior.
When a file locked by one node and we trying to lock it from another node
it fail with errno 5 (EIO) but in that case errno must be set to
(EACCES | EAGAIN).
This isn't happening when we try to lock file second time on same node.
In this case it returns EACCES as expected.
Also this issue not reproduces when we use SMB1 protocol (vers=1.0 in
mount options).

Further investigation showed that the mapping from status_to_posix_error
is different for SMB1 and SMB2+ implementations.
For SMB1 mapping is [NT_STATUS_LOCK_NOT_GRANTED to ERRlock]
(See fs/cifs/netmisc.c line 66)
but for SMB2+ mapping is [STATUS_LOCK_NOT_GRANTED to -EIO]
(see fs/cifs/smb2maperror.c line 383)

Quick changes in SMB2+ mapping from EIO to EACCES has fixed issue.

BUG: https://bugzilla.kernel.org/show_bug.cgi?id=201971

Signed-off-by: Georgy A Bystrenin <gkot@altlinux.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 22:42:56 -06:00
Long Li
54e94ff94e CIFS: return correct errors when pinning memory failed for direct I/O
When pinning memory failed, we should return the correct error code and
rewind the SMB credits.

Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Cc: Murphy Zhou <jencce.kernel@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 22:42:48 -06:00
Long Li
b6bc8a7b99 CIFS: use the correct length when pinning memory for direct I/O for write
The current code attempts to pin memory using the largest possible wsize
based on the currect SMB credits. This doesn't cause kernel oops but this
is not optimal as we may pin more pages then actually needed.

Fix this by only pinning what are needed for doing this write I/O.

Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
2018-12-23 22:42:19 -06:00
Ronnie Sahlberg
59a63e479c cifs: check ntwrk_buf_start for NULL before dereferencing it
RHBZ: 1021460

There is an issue where when multiple threads open/close the same directory
ntwrk_buf_start might end up being NULL, causing the call to smbCalcSize
later to oops with a NULL deref.

The real bug is why this happens and why this can become NULL for an
open cfile, which should not be allowed.
This patch tries to avoid a oops until the time when we fix the underlying
issue.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 22:41:31 -06:00
Ronnie Sahlberg
52baa51d30 cifs: remove coverity warning in calc_lanman_hash
password_with_pad is a fixed size buffer of 16 bytes, it contains a
password string, to be padded with \0 if shorter than 16 bytes
but is just truncated if longer.
It is not, and we do not depend on it to be, nul terminated.

As such, do not use strncpy() to populate this buffer since
the str* prefix suggests that this is a string, which it is not,
and it also confuses coverity causing a false warning.

Detected by CoverityScan CID#113743 ("Buffer not null terminated")

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 22:41:26 -06:00
YueHaibing
0f57451eeb cifs: remove set but not used variable 'smb_buf'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/cifs/sess.c: In function '_sess_auth_rawntlmssp_assemble_req':
fs/cifs/sess.c:1157:18: warning:
 variable 'smb_buf' set but not used [-Wunused-but-set-variable]

It never used since commit cc87c47d9d ("cifs: Separate rawntlmssp auth
from CIFS_SessSetup()")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 22:41:20 -06:00
Gustavo A. R. Silva
07fa6010ff cifs: suppress some implicit-fallthrough warnings
To avoid the warning:

     warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 22:41:11 -06:00
Ronnie Sahlberg
f9793b6fcc cifs: change smb2_query_eas to use the compound query-info helper
Reducing the number of network roundtrips improves the performance
of query xattrs

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 22:40:17 -06:00
Kenneth D'souza
4a3b38aec5 Add vers=3.0.2 as a valid option for SMBv3.0.2
Technically 3.02 is not the dialect name although that is more familiar to
many, so we should also accept the official dialect name (3.0.2 vs. 3.02)
in vers=

Signed-off-by: Kenneth D'souza <kdsouza@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 22:39:29 -06:00
Ronnie Sahlberg
07d3b2e426 cifs: create a helper function for compound query_info
and convert statfs to use it.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 22:38:17 -06:00
Steve French
97aa495a89 cifs: address trivial coverity warning
This is not actually a bug but as Coverity points out we shouldn't
be doing an "|=" on a value which hasn't been set (although technically
it was memset to zero so isn't a bug) and so might as well change
"|=" to "=" in this line

Detected by CoverityScan, CID#728535 ("Unitialized scalar variable")

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-12-23 22:38:14 -06:00
Steve French
f5942db5ef cifs: smb2 commands can not be negative, remove confusing check
As Coverity points out le16_to_cpu(midEntry->Command) can not be
less than zero.

Detected by CoverityScan, CID#1438650 ("Macro compares unsigned to 0")

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-12-23 22:37:23 -06:00
Ronnie Sahlberg
0967e54579 cifs: use a compound for setting an xattr
Improve performance by reducing number of network round trips
for set xattr.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 22:36:24 -06:00
Colin Ian King
5890255b83 cifs: clean up indentation, replace spaces with tab
Trivial fix to clean up indentation, replace spaces with tab

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23 22:36:09 -06:00
Linus Torvalds
8fe28cb58b Linux 4.20 2018-12-23 15:55:59 -08:00
Linus Torvalds
3c730b1041 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "A couple of fixes - no common topic ;-)"

[ The aio spectre patch also came in from Jens, so now we have that
  doubly fixed .. ]

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  proc/sysctl: don't return ENOMEM on lookup when a table is unregistering
  aio: fix spectre gadget in lookup_ioctx
2018-12-23 10:40:41 -08:00
Linus Torvalds
9105b8aa50 SCSI fixes on 20181221
This is two simple target fixes and one discard related I/O starvation
 problem in sd.  The discard problem occurs because the discard page
 doesn't have a mempool backing so if the allocation fails due to
 memory pressure, we then lose the forward progress we require if the
 writeout is on the same device.  The fix is to back it with a mempool.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXB2mCiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishSJmAP9E8ItG
 tSgUyIfRRcn/ZxYdfOg1EWxGgDq17Fq2TgQU3gEAolSLwol7eKl1hQnDpOKPVMmC
 //j4JyKpCl3EEvNs6DQ=
 =3Hmt
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is two simple target fixes and one discard related I/O starvation
  problem in sd.

  The discard problem occurs because the discard page doesn't have a
  mempool backing so if the allocation fails due to memory pressure, we
  then lose the forward progress we require if the writeout is on the
  same device. The fix is to back it with a mempool"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: use mempool for discard special page
  scsi: target: iscsi: cxgbit: add missing spin_lock_init()
  scsi: target: iscsi: cxgbit: fix csk leak
2018-12-22 15:03:00 -08:00
Linus Torvalds
1104bd96eb A cleanup for userspace in compiler_types.h
- don't pollute userspace with macro definitions
     From Xiaozhou Liu
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAlwdVEQACgkQGXyLc2ht
 IW0RSxAAo3yZBrkMxN6FIrBeEfENFs8TL3iDq5GoCPShJNWHRpRbEBhi06/k6dnA
 ePFmgXL/FEio+f47aUj/pEh2NQv5QcwkLRpizREmGtHjVBngJNARFyHxveZyqE52
 ArySpu5/WPswQdu73cQLAwqtk505Gi8jNLRKVqr4CiBJZB/WO7rsINWDOhUulpwG
 9b8Kmct4al/3mhDOhnn1ppgAIauzj2xoyXxYMLZx95h7oycfssUvbNfJtALnxCJs
 eIWAxGebr3ni85q9J69gMfIOwiSn6HtaLAuv8Q7AOKuCBd1+/ymX79gCwH68dQVl
 tdDhIE7vEAWZHbVHy7fdnNIUbPfAMk/QonLStbdd2nYVeblD/luSe91NShCo7Jg5
 ZVJHdA+eD9IjypGz4mMzjOlvhCWZBGtOdnby4tD6YxV+S9fDQPvE+9Ws1JaAIzpH
 kpnj1tmi5YwqN6T5pLWQwVs/HCuoCXI89pv0tSQwip2/txxorAIhhJvzo94lLdv/
 nOABNb6/eszVj7IGOxKLXW/djFluwt/0SzlaD3A8pIjQWNXolfpmBu/9xMcJVZuj
 070Vfn60bjH9q2qitBvlYLCX4eXaGBcfybRi7oe5WnOlKVCl1idQSPcn+2dhiaOO
 JpInO/XLQUrieHT4f/AW6prWJ4AiQJd1lpKw76acOjcm/iXMJGM=
 =e9gb
 -----END PGP SIGNATURE-----

Merge tag 'compiler-attributes-for-linus-v4.20' of https://github.com/ojeda/linux

Pull compiler_types.h fix from Miguel Ojeda:
 "A cleanup for userspace in compiler_types.h: don't pollute userspace
  with macro definitions (Xiaozhou Liu)

  This is harmless for the kernel, but v4.19 was released with a few
  macros exposed to userspace as the patch explains; which this removes,
  so it *could* happen that we break something for someone (although
  leaving inline redefined is probably worse)"

* tag 'compiler-attributes-for-linus-v4.20' of https://github.com/ojeda/linux:
  include/linux/compiler_types.h: don't pollute userspace with macro definitions
2018-12-22 14:29:21 -08:00
Linus Torvalds
38c0ecf608 Fix bug in auxdisplay.
- charlcd: fix x/y command parsing
     From Mans Rullgard
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAlwdT8QACgkQGXyLc2ht
 IW3xxQ/9EEw7lrrACBaTzj9DaEkln1Dz1ObEkSZdZJ+YxkPaagKAZZvaDmigYrkL
 SRSAxNr11mRVLOLjPK4/D1JpK07Xb7Ahu0lgcfInqxEv+lgkqVNdxCxomCqNowRM
 FbCd1xKH2VMWo1z6Bi/tvnJtD/tYnJzzBPPPNMRtd5Cmtt6OF5LVjE68pGcgHELq
 HD6uygy3WmGIzHxhV3O5PSM7FhD0mp9HgOzheBXqJY0ku5y/fj5eyp7QaucqSd2E
 oTr8X43GdGhAbntMbQ7Go0c9u4r6TJKkmqw9ylvXICDyNRzP3fLOipqRDh9U9slv
 ApHkZeM7v/2vzYdf1VJNep5Cb3EoeAE93CGnmVN+Jux006/6BQSTZ6k4z6XfXTpA
 wguhLgnRJ0EUx6REhC1ciPdmB9sJnkg6AUO8O4KQFU8X5MDT2pV+055ZopKkGTBB
 RKYvn3ymmY0FcOXPcn0lu48aCCkkyGZ7ANxTfZDpBt5UnMEWWHiYaElyet2OssG9
 RhkxS6ogzhlEesEGF5c9ZkRujHorBnvpwPACocK+kwoEMBaRQ6SDWQ+gZS6YGceS
 21MqY3WqMXZxT1PzJ1r9Z+FVB6gPPLGCiDEJg4QOrBQ3r6cq5JQK5LVNnD3b9Uqz
 /wWJj0KT1oOyMkdxzxxiGtcutDTzRluBOmRyQ0A6z0xTabgD0r8=
 =BHr1
 -----END PGP SIGNATURE-----

Merge tag 'auxdisplay-for-linus-v4.20' of https://github.com/ojeda/linux

Pull auxdisplay fix from Miguel Ojeda:
 "charlcd: fix x/y command parsing (Mans Rullgard)"

* tag 'auxdisplay-for-linus-v4.20' of https://github.com/ojeda/linux:
  auxdisplay: charlcd: fix x/y command parsing
2018-12-22 14:25:23 -08:00
Christian Brauner
94f82008ce Revert "vfs: Allow userns root to call mknod on owned filesystems."
This reverts commit 55956b59df.

commit 55956b59df ("vfs: Allow userns root to call mknod on owned filesystems.")
enabled mknod() in user namespaces for userns root if CAP_MKNOD is
available. However, these device nodes are useless since any filesystem
mounted from a non-initial user namespace will set the SB_I_NODEV flag on
the filesystem. Now, when a device node s created in a non-initial user
namespace a call to open() on said device node will fail due to:

bool may_open_dev(const struct path *path)
{
        return !(path->mnt->mnt_flags & MNT_NODEV) &&
                !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
}

The problem with this is that as of the aforementioned commit mknod()
creates partially functional device nodes in non-initial user namespaces.
In particular, it has the consequence that as of the aforementioned commit
open() will be more privileged with respect to device nodes than mknod().
Before it was the other way around. Specifically, if mknod() succeeded
then it was transparent for any userspace application that a fatal error
must have occured when open() failed.

All of this breaks multiple userspace workloads and a widespread assumption
about how to handle mknod(). Basically, all container runtimes and systemd
live by the slogan "ask for forgiveness not permission" when running user
namespace workloads. For mknod() the assumption is that if the syscall
succeeds the device nodes are useable irrespective of whether it succeeds
in a non-initial user namespace or not. This logic was chosen explicitly
to allow for the glorious day when mknod() will actually be able to create
fully functional device nodes in user namespaces.
A specific problem people are already running into when running 4.18 rc
kernels are failing systemd services. For any distro that is run in a
container systemd services started with the PrivateDevices= property set
will fail to start since the device nodes in question cannot be
opened (cf. the arguments in [1]).

Full disclosure, Seth made the very sound argument that it is already
possible to end up with partially functional device nodes. Any filesystem
mounted with MS_NODEV set will allow mknod() to succeed but will not allow
open() to succeed. The difference to the case here is that the MS_NODEV
case is transparent to userspace since it is an explicitly set mount option
while the SB_I_NODEV case is an implicit property enforced by the kernel
and hence opaque to userspace.

[1]: https://github.com/systemd/systemd/pull/9483

Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-22 14:18:34 -08:00
Christoph Hellwig
0cd60eb1a7 dma-mapping: fix flags in dma_alloc_wc
We really need the writecombine flag in dma_alloc_wc, fix a stupid
oversight.

Fixes: 7ed1d91a9e ("dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-22 08:46:27 -08:00
Linus Torvalds
23203e3f34 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "4 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, page_alloc: fix has_unmovable_pages for HugePages
  fork,memcg: fix crash in free_thread_stack on memcg charge fail
  mm: thp: fix flags for pmd migration when split
  mm, memory_hotplug: initialize struct pages for the full memory section
2018-12-21 14:59:00 -08:00
Oscar Salvador
17e2e7d7e1 mm, page_alloc: fix has_unmovable_pages for HugePages
While playing with gigantic hugepages and memory_hotplug, I triggered
the following #PF when "cat memoryX/removable":

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  #PF error: [normal kernel read fault]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 1481 Comm: cat Tainted: G            E     4.20.0-rc6-mm1-1-default+ #18
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
  RIP: 0010:has_unmovable_pages+0x154/0x210
  Call Trace:
   is_mem_section_removable+0x7d/0x100
   removable_show+0x90/0xb0
   dev_attr_show+0x1c/0x50
   sysfs_kf_seq_show+0xca/0x1b0
   seq_read+0x133/0x380
   __vfs_read+0x26/0x180
   vfs_read+0x89/0x140
   ksys_read+0x42/0x90
   do_syscall_64+0x5b/0x180
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

The reason is we do not pass the Head to page_hstate(), and so, the call
to compound_order() in page_hstate() returns 0, so we end up checking
all hstates's size to match PAGE_SIZE.

Obviously, we do not find any hstate matching that size, and we return
NULL.  Then, we dereference that NULL pointer in
hugepage_migration_supported() and we got the #PF from above.

Fix that by getting the head page before calling page_hstate().

Also, since gigantic pages span several pageblocks, re-adjust the logic
for skipping pages.  While are it, we can also get rid of the
round_up().

[osalvador@suse.de: remove round_up(), adjust skip pages logic per Michal]
  Link: http://lkml.kernel.org/r/20181221062809.31771-1-osalvador@suse.de
Link: http://lkml.kernel.org/r/20181217225113.17864-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21 14:51:18 -08:00
Rik van Riel
5eed6f1dff fork,memcg: fix crash in free_thread_stack on memcg charge fail
Commit 9b6f7e163c ("mm: rework memcg kernel stack accounting") will
result in fork failing if allocating a kernel stack for a task in
dup_task_struct exceeds the kernel memory allowance for that cgroup.

Unfortunately, it also results in a crash.

This is due to the code jumping to free_stack and calling
free_thread_stack when the memcg kernel stack charge fails, but without
tsk->stack pointing at the freshly allocated stack.

This in turn results in the vfree_atomic in free_thread_stack oopsing
with a backtrace like this:

#5 [ffffc900244efc88] die at ffffffff8101f0ab
 #6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86
 #7 [ffffc900244efce0] general_protection at ffffffff818ff082
    [exception RIP: llist_add_batch+7]
    RIP: ffffffff8150d487  RSP: ffffc900244efd98  RFLAGS: 00010282
    RAX: 0000000000000000  RBX: ffff88085ef55980  RCX: 0000000000000000
    RDX: ffff88085ef55980  RSI: 343834343531203a  RDI: 343834343531203a
    RBP: ffffc900244efd98   R8: 0000000000000001   R9: ffff8808578c3600
    R10: 0000000000000000  R11: 0000000000000001  R12: ffff88029f6c21c0
    R13: 0000000000000286  R14: ffff880147759b00  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7
 #9 [ffffc900244efdb8] copy_process at ffffffff81086e37
#10 [ffffc900244efe98] _do_fork at ffffffff810884e0
#11 [ffffc900244eff10] sys_vfork at ffffffff810887ff
#12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43
    RIP: 000000000049b948  RSP: 00007ffcdb307830  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000896030  RCX: 000000000049b948
    RDX: 0000000000000000  RSI: 00007ffcdb307790  RDI: 00000000005d7421
    RBP: 000000000067370f   R8: 00007ffcdb3077b0   R9: 000000000001ed00
    R10: 0000000000000008  R11: 0000000000000246  R12: 0000000000000040
    R13: 000000000000000f  R14: 0000000000000000  R15: 000000000088d018
    ORIG_RAX: 000000000000003a  CS: 0033  SS: 002b

The simplest fix is to assign tsk->stack right where it is allocated.

Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com
Fixes: 9b6f7e163c ("mm: rework memcg kernel stack accounting")
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21 14:51:18 -08:00
Peter Xu
2e83ee1d86 mm: thp: fix flags for pmd migration when split
When splitting a huge migrating PMD, we'll transfer all the existing PMD
bits and apply them again onto the small PTEs.  However we are fetching
the bits unconditionally via pmd_soft_dirty(), pmd_write() or
pmd_yound() while actually they don't make sense at all when it's a
migration entry.  Fix them up.  Since at it, drop the ifdef together as
not needed.

Note that if my understanding is correct about the problem then if
without the patch there is chance to lose some of the dirty bits in the
migrating pmd pages (on x86_64 we're fetching bit 11 which is part of
swap offset instead of bit 2) and it could potentially corrupt the
memory of an userspace program which depends on the dirty bit.

Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21 14:51:18 -08:00
Mikhail Zaslonko
2830bf6f05 mm, memory_hotplug: initialize struct pages for the full memory section
If memory end is not aligned with the sparse memory section boundary,
the mapping of such a section is only partly initialized.  This may lead
to VM_BUG_ON due to uninitialized struct page access from
is_mem_section_removable() or test_pages_in_a_zone() function triggered
by memory_hotplug sysfs handlers:

Here are the the panic examples:
 CONFIG_DEBUG_VM=y
 CONFIG_DEBUG_VM_PGFLAGS=y

 kernel parameter mem=2050M
 --------------------------
 page:000003d082008000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ( test_pages_in_a_zone+0xde/0x160)
   show_valid_zones+0x5c/0x190
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   test_pages_in_a_zone+0xde/0x160
 Kernel panic - not syncing: Fatal exception: panic_on_oops

 kernel parameter mem=3075M
 --------------------------
 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ( is_mem_section_removable+0xb4/0x190)
   show_mem_removable+0x9a/0xd8
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

Fix the problem by initializing the last memory section of each zone in
memmap_init_zone() till the very end, even if it goes beyond the zone end.

Michal said:

: This has alwways been problem AFAIU.  It just went unnoticed because we
: have zeroed memmaps during allocation before f7f99100d8 ("mm: stop
: zeroing memory during allocation in vmemmap") and so the above test
: would simply skip these ranges as belonging to zone 0 or provided a
: garbage.
:
: So I guess we do care for post f7f99100d8 kernels mostly and
: therefore Fixes: f7f99100d8 ("mm: stop zeroing memory during
: allocation in vmemmap")

Link: http://lkml.kernel.org/r/20181212172712.34019-2-zaslonko@linux.ibm.com
Fixes: f7f99100d8 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21 14:51:18 -08:00