Commit Graph

968775 Commits

Author SHA1 Message Date
Ilya Dryomov
fc4c128e15 libceph: change ceph_con_in_msg_alloc() to take hdr
ceph_con_in_msg_alloc() is protocol independent, but con->in_hdr (and
struct ceph_msg_header in general) is msgr1 specific.  While the struct
is deeply ingrained inside and outside the messenger, con->in_hdr field
can be separated.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:49 +01:00
Ilya Dryomov
8ee8abf797 libceph: change ceph_msg_data_cursor_init() to take cursor
Make it possible to have local cursors and embed them outside struct
ceph_msg.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:49 +01:00
Ilya Dryomov
0247192809 libceph: handle discarding acked and requeued messages separately
Make it easier to follow and remove dependency on msgr1 specific
CEPH_MSGR_TAG_SEQ.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:49 +01:00
Ilya Dryomov
5cd8da3a1c libceph: drop msg->ack_stamp field
It is set in process_ack() but never used.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:49 +01:00
Ilya Dryomov
d3c1248cac libceph: remove redundant session reset log message
Stick with pr_info message because session reset isn't an error most of
the time.  When it is (i.e. if the server denies the reconnect attempt),
we get a bunch of other pr_err messages.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:49 +01:00
Ilya Dryomov
a3da057bbd libceph: clear con->peer_global_seq on RESETSESSION
con->peer_global_seq is part of session state.  Clear it when
the server tells us to reset, not just in ceph_con_close().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Ilya Dryomov
5963c3d01c libceph: rename reset_connection() to ceph_con_reset_session()
With just session reset bits left, rename appropriately.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Ilya Dryomov
3596f4c124 libceph: split protocol reset bits out of reset_connection()
Move protocol reset bits into ceph_con_reset_protocol(), leaving
just session reset bits.

Note that con->out_skip is now reset on faults.  This fixes a crash
in the case of a stateful session getting a fault while in the middle
of revoking a message.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Ilya Dryomov
90b6561a05 libceph: don't call reset_connection() on version/feature mismatches
A fault due to a version mismatch or a feature set mismatch used to be
treated differently from other faults: the connection would get closed
without trying to reconnect and there was a ->bad_proto() connection op
for notifying about that.

This changed a long time ago, see commits 6384bb8b8e ("libceph: kill
bad_proto ceph connection op") and 0fa6ebc600 ("libceph: fix protocol
feature mismatch failure path").  Nowadays these aren't any different
from other faults (i.e. we try to reconnect even though the mismatch
won't resolve until the server is replaced).  reset_connection() calls
there are rather confusing because reset_connection() resets a session
together an individual instance of the protocol.  This is cleaned up
in the next patch.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Ilya Dryomov
418af5b3bf libceph: lower exponential backoff delay
The current setting allows the backoff to climb up to 5 minutes.  This
is too high -- it becomes hard to tell whether the client is stuck on
something or just in backoff.

In userspace, ms_max_backoff is defaulted to 15 seconds.  Let's do the
same.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Ilya Dryomov
b77f8f0e4f libceph: include middle_len in process_message() dout
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Jeff Layton
4f1ddb1ea8 ceph: implement updated ceph_mds_request_head structure
When we added the btime feature in mainline ceph, we had to extend
struct ceph_mds_request_args so that it could be set. Implement the same
in the kernel client.

Rename ceph_mds_request_head with a _old extension, and a union
ceph_mds_request_args_ext to allow for the extended size of the new
header format.

Add the appropriate code to handle both formats in struct
create_request_message and key the behavior on whether the peer supports
CEPH_FEATURE_FS_BTIME.

The gid_list field in the payload is now populated from the saved
credential. For now, we don't add any support for setting the btime via
setattr, but this does enable us to add that in the future.

[ idryomov: break unnecessarily long lines ]

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Jeff Layton
396bd62c69 ceph: clean up argument lists to __prepare_send_request and __send_request
We can always get the mdsc from the session, so there's no need to pass
it in as a separate argument. Pass the session to __prepare_send_request
as well, to prepare for later patches that will need to access it.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Jeff Layton
7fe0cdeb0f ceph: take a cred reference instead of tracking individual uid/gid
Replace req->r_uid/r_gid with an r_cred pointer and take a reference to
that at the point where we previously would sample the two.  Use that to
populate the uid and gid in the header and release the reference when
the request is freed.

This should enable us to later add support for sending supplementary
group lists in MDS requests.

[ idryomov: break unnecessarily long lines ]

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Jeff Layton
0f51a98361 ceph: don't reach into request header for readdir info
We already have a pointer to the argument struct in req->r_args. Use that
instead of groveling around in the ceph_mds_request_head.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Xiubo Li
968cd14edc ceph: set osdmap epoch for setxattr
When setting the file/dir layout, it may need data pool info. So
in mds server, it needs to check the osdmap. At present, if mds
doesn't find the data pool specified, it will try to get the latest
osdmap. Now if pass the osd epoch for setxattr, the mds server can
only check this epoch of osdmap.

URL: https://tracker.ceph.com/issues/48504
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Colin Ian King
4a756db2a1 ceph: remove redundant assignment to variable i
The variable i is being initialized with a value that is never read
and it is being updated later with a new value in a for-loop.  The
initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Luis Henriques
dd980fc0d5 ceph: add ceph.caps vxattr
Add a new vxattr that allows userspace to list the caps for a specific
directory or file.

[ jlayton: change format delimiter to '/' ]

Signed-off-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Jeff Layton
bca9fc14c7 ceph: when filling trace, call ceph_get_inode outside of mutexes
Geng Jichao reported a rather complex deadlock involving several
moving parts:

1) readahead is issued against an inode and some of its pages are locked
   while the read is in flight

2) the same inode is evicted from the cache, and this task gets stuck
   waiting for the page lock because of the above readahead

3) another task is processing a reply trace, and looks up the inode
   being evicted while holding the s_mutex. That ends up waiting for the
   eviction to complete

4) a write reply for an unrelated inode is then processed in the
   ceph_con_workfn job. It calls ceph_check_caps after putting wrbuffer
   caps, and that gets stuck waiting on the s_mutex held by 3.

The reply to "1" is stuck behind the write reply in "4", so we deadlock
at that point.

This patch changes the trace processing to call ceph_get_inode outside
of the s_mutex and snap_rwsem, which should break the cycle above.

[ idryomov: break unnecessarily long lines ]

URL: https://tracker.ceph.com/issues/47998
Reported-by: Geng Jichao <gengjichao@jd.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:48 +01:00
Luis Henriques
6646ea1c8e Revert "ceph: allow rename operation under different quota realms"
This reverts commit dffdcd7145.

When doing a rename across quota realms, there's a corner case that isn't
handled correctly.  Here's a testcase:

  mkdir files limit
  truncate files/file -s 10G
  setfattr limit -n ceph.quota.max_bytes -v 1000000
  mv files limit/

The above will succeed because ftruncate(2) won't immediately notify the
MDSs with the new file size, and thus the quota realms stats won't be
updated.

Since the possible fixes for this issue would have a huge performance impact,
the solution for now is to simply revert to returning -EXDEV when doing a cross
quota realms rename.

URL: https://tracker.ceph.com/issues/48203
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:47 +01:00
Jeff Layton
68cbb8056a ceph: fix inode refcount leak when ceph_fill_inode on non-I_NEW inode fails
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:47 +01:00
Luis Henriques
ccd1acdf1c ceph: downgrade warning from mdsmap decode to debug
While the MDS cluster is unstable and changing state the client may get
mdsmap updates that will trigger warnings:

  [144692.478400] ceph: mdsmap_decode got incorrect state(up:standby-replay)
  [144697.489552] ceph: mdsmap_decode got incorrect state(up:standby-replay)
  [144697.489580] ceph: mdsmap_decode got incorrect state(up:standby-replay)

This patch downgrades these warnings to debug, as they may flood the logs
if the cluster is unstable for a while.

Signed-off-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:47 +01:00
Luis Henriques
e5cafce3ad ceph: fix race in concurrent __ceph_remove_cap invocations
A NULL pointer dereference may occur in __ceph_remove_cap with some of the
callbacks used in ceph_iterate_session_caps, namely trim_caps_cb and
remove_session_caps_cb. Those callers hold the session->s_mutex, so they
are prevented from concurrent execution, but ceph_evict_inode does not.

Since the callers of this function hold the i_ceph_lock, the fix is simply
a matter of returning immediately if caps->ci is NULL.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/43272
Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:47 +01:00
Jeff Layton
4a357f5069 ceph: pass down the flags to grab_cache_page_write_begin
write_begin operations are passed a flags parameter that we need to
mirror here, so that we don't (e.g.) recurse back into filesystem code
inappropriately.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:47 +01:00
Xiubo Li
5a9e2f5d55 ceph: add ceph.{cluster_fsid/client_id} vxattrs
These two vxattrs will only exist in local client side, with which
we can easily know which mountpoint the file belongs to and also
they can help locate the debugfs path quickly.

URL: https://tracker.ceph.com/issues/48057
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:47 +01:00
Xiubo Li
247b1f19db ceph: add status debugfs file
This will help list some useful client side info, like the client
entity address/name and blocklisted status, etc.

URL: https://tracker.ceph.com/issues/48057
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:47 +01:00
Liu, Changcheng
36c9478d60 libceph: remove unused port macros
1. monitor's default port is defined by CEPH_MON_PORT
2. CEPH_PORT_START and CEPH_PORT_LAST are not needed.

Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:47 +01:00
Jeff Layton
04fabb1199 ceph: ensure we have Fs caps when fetching dir link count
The link count for a directory is defined as inode->i_subdirs + 2,
(for "." and ".."). i_subdirs is only populated when Fs caps are held.
Ensure we grab Fs caps when fetching the link count for a directory.

[ idryomov: break unnecessarily long line ]

URL: https://tracker.ceph.com/issues/48125
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:47 +01:00
Xiubo Li
8ba3b8c7fb ceph: send dentry lease metrics to MDS daemon
For the old ceph version, if it received this one metric message
containing the dentry lease metric info, it will just ignore it.

URL: https://tracker.ceph.com/issues/43423
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:47 +01:00
Jeff Layton
81048c00d1 ceph: acquire Fs caps when getting dir stats
We only update the inode's dirstats when we have Fs caps from the MDS.

Declare a new VXATTR_FLAG_DIRSTAT that we set on all dirstats, and have
the vxattr handling code acquire those caps when it's set.

URL: https://tracker.ceph.com/issues/48104
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:47 +01:00
Jeff Layton
06a1ad438b ceph: fix up some warnings on W=1 builds
Convert some decodes into unused variables into skips, and fix up some
non-kerneldoc comment headers to not start with "/**".

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:47 +01:00
Jeff Layton
4ae3713fe4 ceph: queue MDS requests to REJECTED sessions when CLEANRECOVER is set
Ilya noticed that the first access to a blacklisted mount would often
get back -EACCES, but then subsequent calls would be OK. The problem is
in __do_request. If the session is marked as REJECTED, a hard error is
returned instead of waiting for a new session to come into being.

When the session is REJECTED and the mount was done with
recover_session=clean, queue the request to the waiting_for_map queue,
which will be awoken after tearing down the old session. We can only
do this for sync requests though, so check for async ones first and
just let the callers redrive a sync request.

URL: https://tracker.ceph.com/issues/47385
Reported-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:47 +01:00
Jeff Layton
dbeec07bc8 ceph: remove timeout on allowing reconnect after blocklisting
30 minutes is a long time to wait, and this makes it difficult to test
the feature by manually blocklisting clients. Remove the timeout
infrastructure and just allow the client to reconnect at will.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:46 +01:00
Jeff Layton
50c9132ddf ceph: add new RECOVER mount_state when recovering session
When recovering a session (a'la recover_session=clean), we want to do
all of the operations that we do on a forced umount, but changing the
mount state to SHUTDOWN is can cause queued MDS requests to fail when
the session comes back. Most of those can idle until the session is
recovered in this situation.

Reserve SHUTDOWN state for forced umount, and make a new RECOVER state
for the forced reconnect situation. Change several tests for equality with
SHUTDOWN to test for that or RECOVER.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:46 +01:00
Jeff Layton
aa5c791053 ceph: make fsc->mount_state an int
This field is an unsigned long currently, which is a bit of a waste on
most arches since this just holds an enum. Make it (signed) int instead.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:46 +01:00
Jeff Layton
dc167e38a0 ceph: don't WARN when removing caps due to blocklisting
We expect to remove dirty caps when the client is blocklisted. Don't
throw a warning in that case.

[ idryomov: break unnecessarily long line ]

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 23:21:46 +01:00
Linus Torvalds
2c85ebc57b Linux 5.10 2020-12-13 14:41:30 -08:00
Linus Torvalds
ec6f5e0e5c A set of x86 and membarrier fixes:
- Correct a few problems in the x86 and the generic membarrier
     implementation. Small corrections for assumptions about visibility
     which have turned out not to be true.
 
   - Make the PAT bits for memory encryption correct vs. 4K and 2M/1G page
     table entries as they are at a different location.
 
   - Fix a concurrency issue in the the local bandwidth readout of resource
     control leading to incorrect values
 
   - Fix the ordering of allocating a vector for an interrupt. The order
     missed to respect the provided cpumask when the first attempt of
     allocating node local in the mask fails. It then tries the node instead
     of trying the full provided mask first. This leads to erroneous error
     messages and breaking the (user) supplied affinity request. Reorder it.
 
   - Make the INT3 padding detection in optprobe work correctly.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/WOw0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodUhEACW02Al+mlpZjOhzIGwNUALGX7YGEOi
 wJoiIDTV2vOtKhcJ/AOVn05CznoWXa6PN1+7pm0nR2M1XaiXZ3B6HMjy1+0rpSqK
 5dCCEqr+QowwMRsKd4hnuJowR/7Og66FiYIcknDJg/1egg+RXy7yKds577W6/KW0
 2VV/x35xDywERiQ28qIftQ4L7NREyiioomN/GFpeoQf8tQF06Rb4t12T5pdA6D8S
 /C1wj0IKVRMJo/7HbB/X6skxPOK0PWAEBpaT0+Q66VKSNnFVN5ap+rGTKdUoTmuM
 NdxHOukpHyFCQtbRPOjRUeeSZJLKSX5oXsBO1GUvyyxcT2XNlTOdNamYLWyO+RfV
 uP97qhzYDKL+cgDkwypJ0WOzOb9EIXOh4P9BTnJFBGhc4EQwen3cpb3CyWWftjnv
 /obXiRDnAOE6P6H2AGiwrK3Pny+SvgrFYKMe+iy+ntToz1yrDh7ZrZ0DQKVoFWEr
 n3qUnlPZmVvRzHIRVYoK69nS/UgmNN0LssavzRBzab3BcK93f23QkW86P42kNCLa
 9kL+ZgGwlpqUZZR/p3pHq9Mv2ZXGEdUYY99h2vy1fgMwOw/RQZnPDAj/131UXlsU
 4DL0mAlTK1/0/81H6/V1GDBkMkO+hN4x6Y3asi7bPHEKEzlLe+P1tUt3YELd3CEC
 dc8ebICG1InXsw==
 =t31y
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of x86 and membarrier fixes:

   - Correct a few problems in the x86 and the generic membarrier
     implementation. Small corrections for assumptions about visibility
     which have turned out not to be true.

   - Make the PAT bits for memory encryption correct vs 4K and 2M/1G
     page table entries as they are at a different location.

   - Fix a concurrency issue in the the local bandwidth readout of
     resource control leading to incorrect values

   - Fix the ordering of allocating a vector for an interrupt. The order
     missed to respect the provided cpumask when the first attempt of
     allocating node local in the mask fails. It then tries the node
     instead of trying the full provided mask first. This leads to
     erroneous error messages and breaking the (user) supplied affinity
     request. Reorder it.

   - Make the INT3 padding detection in optprobe work correctly"

* tag 'x86-urgent-2020-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kprobes: Fix optprobe to detect INT3 padding correctly
  x86/apic/vector: Fix ordering in vector assignment
  x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  x86/mm/mem_encrypt: Fix definition of PMD_FLAGS_DEC_WP
  membarrier: Execute SYNC_CORE on the calling thread
  membarrier: Explicitly sync remote cores when SYNC_CORE is requested
  membarrier: Add an actual barrier before rseq_preempt()
  x86/membarrier: Get rid of a dubious optimization
2020-12-13 11:31:19 -08:00
Linus Torvalds
d2360a398f block-5.10-2020-12-12
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/VMG8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpsBdEACjbsECHIyQZOtTHW+vY2wkOj2ph+OkXtUU
 nmRlXn5itcrTEeHyenwDEc/W03WbU5m4C6BUz9ecRtXLda6K9W2d7YspSWND1JP/
 Gz3YtingA4xwAl7wOtHT6gi6JHb/seDObiKWAik6f7A3xvBW3o9lV1QRy3kaNdG1
 XgcYvBZBHDaP7Gdvvbg6FucSe3dNjSi+skVdvx8j+i27HF3LAEp3eoWG4vZBUePH
 RIpdDyCXRao+Z22V+EeY0X5cH568fxS408Z6xXewV/g2TDd4XUaXz3WM0O9FN0jU
 oTiQeXL+1y4EfU8QoboTWvHmpoKRqWMSZVD0GXbmx4ZwpiFHbyBub2SydU2Xm/Vl
 oNwHc1wktVU2HFrWU9dwrlWz4Egt4KZO6D9SzJ3VDt9HLKIciyZD16rbsVwSr/Bc
 BVQaFJjRYoR60xxNohIEC8C3rF3qRbkAy5HYSHFUyuOE6gm05RQIHLprY6gIjSei
 7gJnP1FL3iu5bQywp7ptqaRIlkeau6kqvboc8oYFDQvVAp35JqnxjdOsAByWUZR9
 lrN87bIh0n1Vqprs5arXWhw5Ixy7ZQsf0DsrFImZgU+VNTFmuB+1cKaa94EsQwo5
 gQH2j39QQSlGmkXphQVWDd0eGCAt3OrmjYe721ZSSLvJjM64+7pZFZpYPc//VpR5
 8Kcq/yDeOA==
 =PmVb
 -----END PGP SIGNATURE-----

Merge tag 'block-5.10-2020-12-12' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "This should be it for 5.10.

  Mike and Song looked into the warning case, and thankfully it appears
  the fix was pretty trivial - we can just change the md device chunk
  type to unsigned int to get rid of it. They cannot currently be < 0,
  and nobody is checking for that either.

  We're reverting the discard changes as the corruption reports came in
  very late, and there's just no time to attempt to deal with it at this
  point. Reverting the changes in question is the right call for 5.10"

* tag 'block-5.10-2020-12-12' of git://git.kernel.dk/linux-block:
  md: change mddev 'chunk_sectors' from int to unsigned
  Revert "md: add md_submit_discard_bio() for submitting discard bio"
  Revert "md/raid10: extend r10bio devs to raid disks"
  Revert "md/raid10: pull codes that wait for blocked dev into one function"
  Revert "md/raid10: improve raid10 discard request"
  Revert "md/raid10: improve discard request for far layout"
  Revert "dm raid: remove unnecessary discard limits for raid10"
2020-12-13 10:36:23 -08:00
Linus Torvalds
6bff9bb8a2 SCSI fixes on 20201212
Five small fixes: four in drivers: hisi_sas: fix internal queue
 timeout, be2iscsi: revert a prior fix causing problems, bnx2i: add
 missing dependency, storvsc: late arriving revert of a problem fix,
 and one in the core.  The core one is a minor change to stop paying
 attention to the busy count when returning out of resources because
 there's a race window where the queue might not restart due to missing
 returning I/O.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCX9UadiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishfV1AP4qISaq
 qQyrX4OcePdQL9YrfaBCuV3NckE+cFKKO4qCmAEAxTarxMsgqpS1M9W6Y/D271mz
 Bpkc0QI1xbCUIyoGk6c=
 =HaYI
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Five small fixes.  Four in drivers:

   - hisi_sas: fix internal queue timeout

   - be2iscsi: revert a prior fix causing problems

   - bnx2i: add missing dependency

   - storvsc: late arriving revert of a problem fix

  and one in the core.

  The core one is a minor change to stop paying attention to the busy
  count when returning out of resources because there's a race window
  where the queue might not restart due to missing returning I/O"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  Revert "scsi: storvsc: Validate length of incoming packet in storvsc_on_channel_callback()"
  scsi: hisi_sas: Select a suitable queue for internal I/Os
  scsi: core: Fix race between handling STS_RESOURCE and completion
  scsi: be2iscsi: Revert "Fix a theoretical leak in beiscsi_create_eqs()"
  scsi: bnx2i: Requires MMU
2020-12-12 12:57:12 -08:00
Linus Torvalds
5ee595d907 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
 "Bugfix for the AT24 EEPROM driver"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  misc: eeprom: at24: fix NVMEM name with custom AT24 device name
2020-12-12 12:47:46 -08:00
Linus Torvalds
7b1b868e1d Bugfixes for ARM, x86 and tools.
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl/UDHQUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMGeQf9EtGft5U5EihqAbNr2O61Bh4ptCIT
 +qNWWfuGQkKLsP6PCHMUJnNI3WJy2/Gb5+nUHjFXSEZBP2l3KGRuDniAdm4+DyEi
 2khVmJiXYn2q2yfodmpHA/dqav3OHSrsq2IfH+J+WAFlIHnjkdz3Wk1zNFk7Y/xv
 PVv2czvXhsnrvHvNp5e1+YsVGkMZc9fwXLRbac7ptmaKUKCBAgpZO8Gkc2GGgOdE
 zUDp3qA8/7Ys+vzzYfPrRMUhev9dgE4x2TBmtOuzqOcfj2FOKRbKbwjur37fJ61j
 Px4F2ZI0GEL0RrHvZK1vZ5KO41BcD+gQPumKAg1Lgz312loKj85RG8nBEQ==
 =BJ9g
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes for ARM, x86 and tools"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  tools/kvm_stat: Exempt time-based counters
  KVM: mmu: Fix SPTE encoding of MMIO generation upper half
  kvm: x86/mmu: Use cpuid to determine max gfn
  kvm: svm: de-allocate svm_cpu_data for all cpus in svm_cpu_uninit()
  selftests: kvm/set_memory_region_test: Fix race in move region test
  KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()
  KVM: arm64: Fix handling of merging tables into a block entry
  KVM: arm64: Fix memory leak on stage2 update of a valid PTE
2020-12-12 10:08:16 -08:00
Linus Torvalds
b53966ffd4 xen: branch for v5.10-rc8
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCX9MyxQAKCRCAXGG7T9hj
 vrL8AP0aGFq2Ya77poPVLWMvrBV9KtMVzpLUBzF/M75ShE0EXAD7B7K5QOL/1RcJ
 6BD85rFyOaA0YrunHYGJAtDTqms4ZQY=
 =uwI9
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.10c-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "A short series fixing a regression introduced in 5.9 for running as
  Xen dom0 on a system with NVMe backed storage"

* tag 'for-linus-5.10c-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: don't use page->lru for ZONE_DEVICE memory
  xen: add helpers for caching grant mapping pages
2020-12-12 10:02:03 -08:00
Linus Torvalds
b01deddb8d RISC-V Fixes for 5.10 (unless there's an rc8)
I've just got one fix.  It's nothing critical, just a randconfig that
 wasn't building.  That said, it does seem pretty safe and is technically
 a regression so I'm sending it along for 5.10:
 
 * Define get_cycles64() all the time, as it's used by most
   configurations.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl/T5B4THHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiQflD/4oJJzU9XWYqIzUwidXbISzL9nm3Sg7
 Izo7mlhei2Hlp9CrnEcCkAdSpYRvdOsnK9ikcOe0gjOqI03DbFp+tN/oZkhLR+Gl
 HkLLm7jmNs7NumQeQ4HzMDl+SbA2zE6vUkYgaf99Oy7bD6DvVHEWe3Ghg89fqUka
 W2OzOV0m95sNYG0Yy+/YKPtb0uoYAGQeSD337etTgzPIZRqetYh9kv23H9pXpShg
 Z47+a8/ZUXLSdZjmN3oOcte5dAG4ygCoge6UkE1u9T2ATjYWUkUacoBkzzavGOrN
 a2nz2hB0A2Wfm260LDOBe+5+YqbNPxDKUuBzXL+n/g66zggZFHoPkx2CbJyU3PBj
 2QtxiRbrLAqoldEn9XnR47K+i5Eg9M+vTy09WaHbvNp8bIKPf0RY3IOuiJqCTeOx
 vlb6q17c7YGos0zPZNj9gXmZu+/ayCvl0rPXjwybHq+SOi2TXYeKeBVF4cBKDWOo
 g2oed2gOaX1ekW9b11giygWdDGlwPWxz6dMUlEtEm/2y4emOz/otTHAblq9l+jZf
 JKtUCm2Vxu4ukxFjXuNVElV5LPLR2EHrm+vJfwFSWePY+/+JQN0ft6s3zCBlSN6U
 RaHyNrQKAqNqJzftArBgiKGObBj8N7QJ6SoFNVmQBCAQHIvRGvI8tHcqyIKFYRLf
 YZn+8RbXaW4nOQ==
 =wSj3
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fix from Palmer Dabbelt:
 "Just one fix. It's nothing critical, just a randconfig that wasn't
  building. That said, it does seem pretty safe and is technically a
  regression so I'm sending it along for 5.10:

   - define get_cycles64() all the time, as it's used by most
     configurations"

* tag 'riscv-for-linus-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Define get_cycles64() regardless of M-mode
2020-12-12 09:50:26 -08:00
Linus Torvalds
31d00f6eb1 io_uring-5.10-2020-12-11
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/UMOwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmI3EAC22QohUftfWAJeks142J9wFKQgYfNZpZHs
 iJR07NTYUSghp9rpfGVNGY/ZonUoKds0VSbDZpRU8TTjglGf9GguzYE4iOMedtG3
 MA9diNOnAAK0gwZrBx1uB882X4pf1qYGChMe9fXXJ4mQsHpG84q00GnJI0cPjkKn
 PaYRVugr8AxCzG4dHHL0ckQtsEHWaPwe7i5u4UkoOC3IsrVjRri6wYAa98xDuznv
 fqssoMp2cSOcVHsfypQAVZbg0heR3Zo2OuXE4uw6yjB7w5j8tuLIV9XKtqwiRn2n
 57rdnGfY6vSfJb9EPm6fwBRnve2cI2Xbr30x/XWSIx+QcUOFOsBfdblaJikX8cSJ
 FG5c0AgBeRo1heWC4FRNIFStAubm0RPJAPqNm9p6T0DzO4hC3xTTdkl04B5bA4Ms
 4tNWhk2fP8YRKhC0gRcw7zRHAl/qjuMjrUor5L68jxGlmFR7h+uMky0AsS4KZKNZ
 o1+HFmc/m+3ZBNzL7NoXkJzWFaKzZVf0iuEqJP3Xqy91hCA85pvh7DGAlQpZBppM
 C0eNeJ59rWvfSuUS9bLlLfoMO3Ab5BsE4IEji7K2lBNZsCXQpXHVv4xJxgKJRja9
 t+SuBgYpNOfHrIPSY0LaJh93MNzoP22TP8qF8U5gWQ0Bus/mVKE9S1EXSUAvfHPm
 IyOfTB9M6g==
 =wjOp
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.10-2020-12-11' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Two fixes in here, fixing issues introduced in this merge window"

* tag 'io_uring-5.10-2020-12-11' of git://git.kernel.dk/linux-block:
  io_uring: fix file leak on error path of io ctx creation
  io_uring: fix mis-seting personality's creds
2020-12-12 09:45:01 -08:00
Linus Torvalds
643e69aff8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:

 - a fix for cm109 stomping on its own control URB if it tries to toggle
   buzzer immediately after userspace opens input device (found by
   syzcaller)

 - another fix for Raydium touchscreens that do not like splitting
   command transfers

 - quirks for i8042, soc_button_array, and goodix drivers to make them
   work better with certain hardware.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: goodix - add upside-down quirk for Teclast X98 Pro tablet
  Input: cm109 - do not stomp on control URB
  Input: i8042 - add Acer laptops to the i8042 reset list
  Input: cros_ec_keyb - send 'scancodes' in addition to key events
  Input: soc_button_array - add Lenovo Yoga Tablet2 1051L to the dmi_use_low_level_irq list
  Input: raydium_ts_i2c - do not split tx transactions
2020-12-12 09:41:33 -08:00
Mike Snitzer
6ffeb1c3f8 md: change mddev 'chunk_sectors' from int to unsigned
Commit e2782f560c ("Revert "dm raid: remove unnecessary discard
limits for raid10"") exposed compiler warnings introduced by commit
e0910c8e4f ("dm raid: fix discard limits for raid1 and raid10"):

In file included from ./include/linux/kernel.h:14,
                 from ./include/asm-generic/bug.h:20,
                 from ./arch/x86/include/asm/bug.h:93,
                 from ./include/linux/bug.h:5,
                 from ./include/linux/mmdebug.h:5,
                 from ./include/linux/gfp.h:5,
                 from ./include/linux/slab.h:15,
                 from drivers/md/dm-raid.c:8:
drivers/md/dm-raid.c: In function ‘raid_io_hints’:
./include/linux/minmax.h:18:28: warning: comparison of distinct pointer types lacks a cast
  (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                            ^~
./include/linux/minmax.h:32:4: note: in expansion of macro ‘__typecheck’
   (__typecheck(x, y) && __no_side_effects(x, y))
    ^~~~~~~~~~~
./include/linux/minmax.h:42:24: note: in expansion of macro ‘__safe_cmp’
  __builtin_choose_expr(__safe_cmp(x, y), \
                        ^~~~~~~~~~
./include/linux/minmax.h:51:19: note: in expansion of macro ‘__careful_cmp’
 #define min(x, y) __careful_cmp(x, y, <)
                   ^~~~~~~~~~~~~
./include/linux/minmax.h:84:39: note: in expansion of macro ‘min’
  __x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })
                                       ^~~
drivers/md/dm-raid.c:3739:33: note: in expansion of macro ‘min_not_zero’
   limits->max_discard_sectors = min_not_zero(rs->md.chunk_sectors,
                                 ^~~~~~~~~~~~

Fix this by changing the chunk_sectors member of 'struct mddev' from
int to 'unsigned int' to match the type used for the 'chunk_sectors'
member of 'struct queue_limits'.  Various MD code still uses 'int' but
none of it appears to ever make use of signed int; and storing
positive signed int in unsigned is perfectly safe.

Reported-by: Song Liu <songliubraving@fb.com>
Fixes: e2782f560c ("Revert "dm raid: remove unnecessary discard limits for raid10"")
Fixes: e0910c8e4f ("dm raid: fix discard limits for raid1 and raid10")
Cc: stable@vger,kernel.org # e0910c8e4f was marked for stable@
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-12 10:07:50 -07:00
Masami Hiramatsu
0d07c0ec43 x86/kprobes: Fix optprobe to detect INT3 padding correctly
Commit

  7705dc8557 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")

changed the padding bytes between functions from NOP to INT3. However,
when optprobe decodes a target function it finds INT3 and gives up the
jump optimization.

Instead of giving up any INT3 detection, check whether the rest of the
bytes to the end of the function are INT3. If all of them are INT3,
those come from the linker. In that case, continue the optprobe jump
optimization.

 [ bp: Massage commit message. ]

Fixes: 7705dc8557 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")
Reported-by: Adam Zabrocki <pi3@pi3.com.pl>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160767025681.3880685.16021570341428835411.stgit@devnote2
2020-12-12 15:25:17 +01:00
Simon Beginn
cffdd6d904 Input: goodix - add upside-down quirk for Teclast X98 Pro tablet
The touchscreen on the Teclast x98 Pro is also mounted upside-down in
relation to the display orientation.

Signed-off-by: Simon Beginn <linux@simonmicro.de>
Signed-off-by: Bastien Nocera <hadess@hadess.net>
Link: https://lore.kernel.org/r/20201117004253.27A5A27EFD@localhost
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-12-11 16:21:01 -08:00
Stefan Raspl
111d0bda8e tools/kvm_stat: Exempt time-based counters
The new counters halt_poll_success_ns and halt_poll_fail_ns do not count
events. Instead they provide a time, and mess up our statistics. Therefore,
we should exclude them.
Removal is currently implemented with an exempt list. If more counters like
these appear, we can think about a more general rule like excluding all
fields name "*_ns", in case that's a standing convention.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Tested-and-reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20201208210829.101324-1-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-11 19:18:51 -05:00