linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-20 01:52:13 +00:00

Author	SHA1	Message	Date
Jeff Layton	0fa8263367	ceph: fix endianness bug when handling MDS session feature bits Eduard reported a problem mounting cephfs on s390 arch. The feature mask sent by the MDS is little-endian, so we need to convert it before storing and testing against it. Cc: stable@vger.kernel.org Reported-and-Tested-by: Eduard Shishkin <edward6@linux.ibm.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-05-04 19:14:23 +02:00
Yan, Zheng	719a2514e9	ceph: consider inode's last read/write when calculating wanted caps Add i_last_rd and i_last_wr to ceph_inode_info. These fields are used to track the last time the client acquired read/write caps for the inode. If there is no read/write on an inode for 'caps_wanted_delay_max' seconds, __ceph_caps_file_wanted() does not request caps for read/write even if there are open files. Call __ceph_touch_fmode() for dir operations. __ceph_caps_file_wanted() calculates dir's wanted caps according to last dir read/modification. If there is a recent dir read, the dir inode wants CEPH_CAP_ANY_SHARED caps. If there is a recent dir modification, it also wants CEPH_CAP_FILE_EXCL. Readdir is a special case. Dir inode wants CEPH_CAP_FILE_EXCL after readdir, as with that, modifications do not need to release CEPH_CAP_FILE_SHARED or invalidate all dentry leases issued by readdir. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:42 +02:00
Yan, Zheng	c0e385b106	ceph: always renew caps if mds_wanted is insufficient Original code only renews caps for inodes with CEPH_I_CAP_DROPPED flag, which indicates that mds has closed the session and caps were dropped. Remove this flag in preparation for not requesting caps for idle open files. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:42 +02:00
Jeff Layton	785892fe88	ceph: cache layout in parent dir on first sync create If a create is done, then typically we'll end up writing to the file soon afterward. We don't want to wait for the reply before doing that when doing an async create, so that means we need the layout for the new file before we've gotten the response from the MDS. All files created in a directory will initially inherit the same layout, so copy off the requisite info from the first synchronous create in the directory, and save it in a new i_cached_layout field. Zero out the layout when we lose Dc caps in the dir. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:42 +02:00
Jeff Layton	6deb8008a8	ceph: add new MDS req field to hold delegated inode number Add new request field to hold the delegated inode number. Encode that into the message when it's set. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:42 +02:00
Jeff Layton	d484648787	ceph: decode interval_sets for delegated inos Starting in Octopus, the MDS will hand out caps that allow the client to do asynchronous file creates under certain conditions. As part of that, the MDS will delegate ranges of inode numbers to the client. Add the infrastructure to decode these ranges, and stuff them into an xarray for later consumption by the async creation code. Because the xarray code currently only handles unsigned long indexes, and those are 32-bits on 32-bit arches, we only enable the decoding when running on a 64-bit arch. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:42 +02:00
Jeff Layton	a25949b990	ceph: cap tracking for async directory operations Track and correctly handle directory caps for asynchronous operations. Add aliases for Frc caps that we now designate as Dcu caps (when dealing with directories). Unlike file caps, we don't reclaim these when the session goes away, and instead preemptively release them. In-flight async dirops are instead handled during reconnect phase. The client needs to re-do a synchronous operation in order to re-get directory caps. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:41 +02:00
Jeff Layton	891f3f5a6a	ceph: add infrastructure for waiting for async create to complete When we issue an async create, we must ensure that any later on-the-wire requests involving it wait for the create reply. Expand i_ceph_flags to be an unsigned long, and add a new bit that MDS requests can wait on. If the bit is set in the inode when sending caps, then don't send it and just return that it has been delayed. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:41 +02:00
Jeff Layton	3bb48b4142	ceph: add flag to designate that a request is asynchronous ...and ensure that such requests are never queued. The MDS has need to know that a request is asynchronous so add flags and proper infrastructure for that. Also, delegated inode numbers and directory caps are associated with the session, so ensure that async requests are always transmitted on the first attempt and are never queued to wait for session reestablishment. If it does end up looking like we'll need to queue the request, then have it return -EJUKEBOX so the caller can reattempt with a synchronous request. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:41 +02:00
Xiubo Li	8ccf7fcce1	ceph: return ETIMEDOUT errno to userland when request timed out req->r_timeout is only used during mounting, so this error will be more accurate. URL: https://tracker.ceph.com/issues/44215 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:41 +02:00
Jeff Layton	058daab79d	ceph: move to a dedicated slabcache for mds requests On my machine (x86_64) this struct is 952 bytes, which gets rounded up to 1024 by kmalloc. Move this to a dedicated slabcache, so we can allocate them without the extra 72 bytes of overhead per. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:41 +02:00
Yan, Zheng	525d15e8e5	ceph: check inode type for CEPH_CAP_FILE_{CACHE,RD,REXTEND,LAZYIO} These bits will have new meaning for directory inodes. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:40 +02:00
Jeff Layton	3db0a2fc56	ceph: register MDS request with dir inode from the start When the unsafe reply to a request comes in, the request is put on the r_unsafe_dir inode's list. In future patches, we're going to need to wait on requests that may not have gotten an unsafe reply yet. Change __register_request to put the entry on the dir inode's list when the pointer is set in the request, and don't check the CEPH_MDS_R_GOT_UNSAFE flag when unregistering it. The only place that uses this list today is fsync codepath, and with the coming changes, we'll want to wait on all operations whether it has gotten an unsafe reply or not. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:39 +02:00
Linus Torvalds	4c46bef2e9	Merge tag 'ceph-for-5.6-rc1' of https://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: - a set of patches that fixes various corner cases in mount and umount code (Xiubo Li). This has to do with choosing an MDS, distinguishing between laggy and down MDSes and parsing the server path. - inode initialization fixes (Jeff Layton). The one included here mostly concerns things like open_by_handle() and there is another one that will come through Al. - copy_file_range() now uses the new copy-from2 op (Luis Henriques). The existing copy-from op turned out to be infeasible for generic filesystem use; we disable the copy offload if OSDs don't support copy-from2. - a patch to link "rbd" and "block" devices together in sysfs (Hannes Reinecke) ... and a smattering of cleanups from Xiubo, Jeff and Chengguang. * tag 'ceph-for-5.6-rc1' of https://github.com/ceph/ceph-client: (25 commits) rbd: set the 'device' link in sysfs ceph: move net/ceph/ceph_fs.c to fs/ceph/util.c ceph: print name of xattr in __ceph_{get,set}xattr() douts ceph: print r_direct_hash in hex in __choose_mds() dout ceph: use copy-from2 op in copy_file_range ceph: close holes in structs ceph_mds_session and ceph_mds_request rbd: work around -Wuninitialized warning ceph: allocate the correct amount of extra bytes for the session features ceph: rename get_session and switch to use ceph_get_mds_session ceph: remove the extra slashes in the server path ceph: add possible_max_rank and make the code more readable ceph: print dentry offset in hex and fix xattr_version type ceph: only touch the caps which have the subset mask requested ceph: don't clear I_NEW until inode metadata is fully populated ceph: retry the same mds later after the new session is opened ceph: check availability of mds cluster on mount after wait timeout ceph: keep the session state until it is released ceph: add __send_request helper ceph: ensure we have a new cap before continuing in fill_inode ceph: drop unused ttl_from parameter from fill_inode ...	2020-02-06 12:21:01 +00:00
Linus Torvalds	bddea11b1b	Merge branch 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs timestamp updates from Al Viro: "More 64bit timestamp work" * 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: kernfs: don't bother with timestamp truncation fs: Do not overload update_time fs: Delete timespec64_trunc() fs: ubifs: Eliminate timespec64_trunc() usage fs: ceph: Delete timespec64_trunc() usage fs: cifs: Delete usage of timespec64_trunc fs: fat: Eliminate timespec64_trunc() usage utimes: Clamp the timestamps in notify_change()	2020-02-05 05:02:42 +00:00
Xiubo Li	3c802092da	ceph: print r_direct_hash in hex in __choose_mds() dout It's hard to read, especially when it is: ceph: __choose_mds 00000000b7bc9c15 is_hash=1 (-271041095) mode 0 At the same time, switch to __func__ to get rid of the checkpatch warning. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-01-27 16:53:40 +01:00
Xiubo Li	9ba1e22453	ceph: allocate the correct amount of extra bytes for the session features The total bytes may potentially be larger than 8. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-01-27 16:53:40 +01:00
Xiubo Li	5b3248c677	ceph: rename get_session and switch to use ceph_get_mds_session Just in case the session's refcount reach 0 and is releasing, and if we get the session without checking it, we may encounter kernel crash. Rename get_session to ceph_get_mds_session and make it global. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-01-27 16:53:40 +01:00
Xiubo Li	b38c9eb475	ceph: add possible_max_rank and make the code more readable The m_num_mds here is actually the number for MDSs which are in up:active status, and it will be duplicated to m_num_active_mds, so remove it. Add possible_max_rank to the mdsmap struct and this will be the correctly possible largest rank boundary. Remove the special case for one mds in __mdsmap_get_random_mds(), because the validate mds rank may not always be 0. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-01-27 16:53:40 +01:00
Xiubo Li	c4853e9776	ceph: retry the same mds later after the new session is opened If max_mds > 1 and a request is submitted that chooses a random mds rank, and the relating session is not opened yet, the request will wait until the session has been opened and resend again. Every time the request goes through __do_request, it will release the req->session first and choose a random one again, which may be a completely different rank than the one it just waited on. In the worst case, it will open all the mds sessions one by one just before the request can be successfully sent out. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-01-27 16:53:39 +01:00
Xiubo Li	97820058fb	ceph: check availability of mds cluster on mount after wait timeout If all the MDS daemons are down for some reason, then the first mount attempt will fail with EIO after the mount request times out. A mount attempt will also fail with EIO if all of the MDS's are laggy. This patch changes the code to return -EHOSTUNREACH in these situations and adds a pr_info error message to help the admin determine the cause. URL: https://tracker.ceph.com/issues/4386 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-01-27 16:53:39 +01:00
Xiubo Li	4d681c2f91	ceph: keep the session state until it is released When reconnecting the session but if it is denied by the MDS due to client was in blacklist or something else, kclient will receive a session close reply, and we will never see the important log: "ceph: mds%d reconnect denied" And with the confusing log: "ceph: handle_session mds0 close 0000000085804730 state ??? seq 0" Let's keep the session state until its memories is released. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-01-27 16:53:39 +01:00
Xiubo Li	9cf54563b0	ceph: add __send_request helper Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-01-27 16:53:39 +01:00
Xiubo Li	07edc0571e	ceph: fix possible long time wait during umount During umount, if there are no unsafe requests in the mdsc but some requests are still in flight without a reply, and the remaining requests are all safe ones, then even after all of them are unregistered from the mdsc, the umount still has to wait for mount_timeout seconds. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-01-27 16:53:39 +01:00
Xiubo Li	5d47648fe9	ceph: only choose one MDS who is in up:active state without laggy Even if an MDS is in up:active state, it may also be laggy. Skip the laggy MDSs. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-01-27 16:53:39 +01:00
Chengguang Xu	8f5ac172ab	ceph: delete redundant douts in con_get/put() We print session's refcount in debug message inside ceph_put_mds_session() and get_session(), so we don't have to print it in con_get()/__ceph_lookup_mds_session()/con_put(). Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-01-27 16:53:39 +01:00
Jeff Layton	9c1c2b35f1	ceph: hold extra reference to r_parent over life of request Currently, we just assume that it will stick around by virtue of the submitter's reference, but later patches will allow the syscall to return early and we can't rely on that reference at that point. While I'm not aware of any reports of it, Xiubo pointed out that this may fix a use-after-free. If the wait for a reply times out or is canceled via signal, and then the reply comes in after the syscall returns, the client can end up trying to access r_parent without a reference. Take an extra reference to the inode when setting r_parent and release it when releasing the request. Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-01-21 19:02:37 +01:00
Xiubo Li	bba1560bd4	ceph: trigger the reclaim work once there has enough pending caps The nr in ceph_reclaim_caps_nr() is very possibly larger than 1, so we may miss it and the reclaim work couldn't be triggered as expected. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-12-09 20:55:10 +01:00
Jeff Layton	3a3430affc	ceph: show tasks waiting on caps in debugfs caps file Add some visibility of tasks that are waiting for caps to the "caps" debugfs file. Display the tgid of the waiting task, inode number, and the caps the task needs and wants. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-12-09 20:55:10 +01:00
Jeff Layton	ad8c28a9eb	ceph: convert int fields in ceph_mount_options to unsigned int Most of these values should never be negative, so convert them to unsigned values. Add some sanity checking to the parsed values, and clean up some unneeded casts. Note that while caps_max should never be negative, this patch leaves it signed, since this value ends up later being compared to a signed counter. Just ensure that userland never passes in a negative value for caps_max. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-12-09 20:55:10 +01:00
Deepa Dinamani	668c9a61e3	fs: ceph: Delete timespec64_trunc() usage Since ceph always uses ns granularity, skip the truncation which is a no-op. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: jlayton@kernel.org Cc: ceph-devel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-12-08 19:10:53 -05:00
Jeff Layton	2def865a81	ceph: don't leave ino field in ceph_mds_request_head uninitialized We currently just pass junk in this field unless we're retransmitting a create, but in later patches, we'll need a mechanism to pass a delegated inode number on an initial create request. Prepare for this by ensuring this field is zeroed out. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-11-25 11:44:02 +01:00
Jeff Layton	f5946bcc5e	ceph: tone down loglevel on ceph_mdsc_build_path warning When this occurs, it usually means that we raced with a rename, and there is no need to warn in that case. Only printk if we pass the rename sequence check but still ended up with pos < 0. Either way, this doesn't warrant a KERN_ERR message. Change it to KERN_WARNING. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-11-25 11:44:02 +01:00
Jeff Layton	1d3f87233e	ceph: just skip unrecognized info in ceph_reply_info_extra In the future, we're going to want to extend the ceph_reply_info_extra for create replies. Currently though, the kernel code doesn't accept an extra blob that is larger than the expected data. Change the code to skip over any unrecognized fields at the end of the extra blob, rather than returning -EIO. Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-10-15 17:43:10 +02:00
Erqi Chen	71a228bc8d	ceph: reconnect connection if session hang in opening state If client mds session is evicted in CEPH_MDS_SESSION_OPENING state, mds won't send session msg to client, and delayed_work skip CEPH_MDS_SESSION_OPENING state session, the session hang forever. Allow ceph_con_keepalive to reconnect a session in OPENING to avoid session hang. Also, ensure that we skip sessions in RESTARTING and REJECTED states since those states can't be resurrected by issuing a keepalive. Link: https://tracker.ceph.com/issues/41551 Signed-off-by: Erqi Chen chenerqi@gmail.com Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-09-16 12:06:25 +02:00
Jeff Layton	533a2818dd	ceph: eliminate session->s_trim_caps It's only used to keep count of caps being trimmed, but that requires that we hold the session->s_mutex to prevent multiple trimming operations from running concurrently. We can achieve the same effect using an integer on the stack, which allows us to (eventually) not need the s_mutex. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-09-16 12:06:24 +02:00
Yan, Zheng	131d7eb4fa	ceph: auto reconnect after blacklisted Make the client use osd reply and session message to infer whether it is blacklisted. The client reconnects to the cluster using a new entity addr if it is blacklisted. Auto reconnect is limited to once every 30 minutes. Auto reconnect is disabled by default. It can be enabled/disabled by the recover_session=<no|clean> mount option. In 'clean' mode, the client drops any dirty data/metadata, invalidates page caches and invalidates all writable file handles. After reconnect, file locks become stale because MDS loses track of them. If an inode contains any stale file locks, read/write on the inode are not allowed until applications release all stale file locks. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-09-16 12:06:24 +02:00
Yan, Zheng	d468e729b7	ceph: add helper function that forcibly reconnects to ceph cluster. It closes mds sessions, drops all caps and invalidates page caches, then uses a new entity address to reconnect to the cluster. After reconnect, all dirty data/metadata are dropped, file locks get lost silently. Open files continue to work because client will try renewing caps on later read/write. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-09-16 12:06:24 +02:00
Yan, Zheng	f4b9786622	ceph: track and report error of async metadata operation Use errseq_t to track and report errors of async metadata operations, similar to how kernel handles errors during writeback. If any dirty caps or any unsafe request gets dropped during session eviction, record -EIO in corresponding inode's i_meta_err. The error will be reported by subsequent fsync, Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-09-16 12:06:23 +02:00
Jeff Layton	a35ead314e	ceph: add change_attr field to ceph_inode_info Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-07-08 14:01:43 +02:00
Jeff Layton	245ce991cc	ceph: add btime field to ceph_inode_info Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-07-08 14:01:43 +02:00
Yan, Zheng	428138c989	ceph: remove request from waiting list before unregister Link: https://tracker.ceph.com/issues/40339 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-07-08 14:01:42 +02:00
Yan, Zheng	6f0f597b5d	ceph: don't blindly unregister session that is in opening state handle_cap_export() may add placeholder caps to session that is in opening state. These caps' session pointer become wild after session get unregistered. The fix is not to unregister session in opening state during mds failovers, just let client to reconnect later when mds is recovered. Link: https://tracker.ceph.com/issues/40190 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-07-08 14:01:42 +02:00
Yan, Zheng	8f2a98ef3c	ceph: ensure d_name/d_parent stability in ceph_mdsc_lease_send_msg() Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-07-08 14:01:42 +02:00
Yan, Zheng	41883ba8ee	ceph: use READ_ONCE to access d_parent in RCU critical section Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-07-08 14:01:42 +02:00
David Disseldorp	193e7b3762	ceph: carry snapshot creation time with inodes MDS InodeStat v3 wire structures include a trailing snapshot creation time member. Unmarshall this and retain it for a future vxattr. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-07-08 14:01:40 +02:00
Jeff Layton	d6b8bd679c	ceph: fix ceph_mdsc_build_path to not stop on first component When ceph_mdsc_build_path is handed a positive dentry, it will return a zero-length path string with the base set to that dentry. This is not what we want. Always include at least one path component in the string. ceph_mdsc_build_path has behaved this way for a long time but it didn't matter until recent d_name handling rework. Fixes: `964fff7491` ("ceph: use ceph_mdsc_build_path instead of clone_dentry_name") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-06-27 18:27:36 +02:00
Yan, Zheng	3e1d0452ed	ceph: avoid iput_final() while holding mutex or in dispatch thread iput_final() may wait for readahead pages. The wait can cause deadlock. For example: Workqueue: ceph-msgr ceph_con_workfn [libceph] Call Trace: schedule+0x36/0x80 io_schedule+0x16/0x40 __lock_page+0x101/0x140 truncate_inode_pages_range+0x556/0x9f0 truncate_inode_pages_final+0x4d/0x60 evict+0x182/0x1a0 iput+0x1d2/0x220 iterate_session_caps+0x82/0x230 [ceph] dispatch+0x678/0xa80 [ceph] ceph_con_workfn+0x95b/0x1560 [libceph] process_one_work+0x14d/0x410 worker_thread+0x4b/0x460 kthread+0x105/0x140 ret_from_fork+0x22/0x40 Workqueue: ceph-msgr ceph_con_workfn [libceph] Call Trace: __schedule+0x3d6/0x8b0 schedule+0x36/0x80 schedule_preempt_disabled+0xe/0x10 mutex_lock+0x2f/0x40 ceph_check_caps+0x505/0xa80 [ceph] ceph_put_wrbuffer_cap_refs+0x1e5/0x2c0 [ceph] writepages_finish+0x2d3/0x410 [ceph] __complete_request+0x26/0x60 [libceph] handle_reply+0x6c8/0xa10 [libceph] dispatch+0x29a/0xbb0 [libceph] ceph_con_workfn+0x95b/0x1560 [libceph] process_one_work+0x14d/0x410 worker_thread+0x4b/0x460 kthread+0x105/0x140 ret_from_fork+0x22/0x40 In the above example, truncate_inode_pages_range() waits for readahead pages while holding s_mutex. ceph_check_caps() waits for s_mutex and blocks OSD dispatch thread. Later OSD replies (for readahead) can't be handled. ceph_check_caps() also may lock snap_rwsem for read. So similar deadlock can happen if iput_final() is called while holding snap_rwsem. In general, it's not good to call iput_final() inside MDS/OSD dispatch threads or while holding any mutex. The fix is introducing ceph_async_iput(), which calls iput_final() in workqueue. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-06-05 20:34:39 +02:00
Jeff Layton	4198aba4f4	ceph: fix unaligned access in ceph_send_cap_releases Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-05-07 19:43:05 +02:00
Jeff Layton	488f5284e2	ceph: just call get_session in __ceph_lookup_mds_session I originally thought there was a potential race here, but the fact that this is called with the mdsc->mutex held, ensures that the last reference to the session can't be put here. Still, it's clearer to just return the value from get_session here, and may prevent a bug later if we ever rework this code to be less reliant on mutexes. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-05-07 19:22:38 +02:00
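
The endianness fix in commit 0fa8263367 (first row of the log) says the feature mask sent by the MDS is little-endian and has to be converted before it is stored and tested. The following is a minimal userland sketch of that failure mode, not the kernel patch itself: the function names and the bit position `FEATURE_BIT` are hypothetical, chosen only for illustration, and a 64-bit mask is assumed.

```python
import struct

# Hypothetical feature bit position, used only for this illustration.
FEATURE_BIT = 9


def encode_features_le(mask: int) -> bytes:
    """Encode a 64-bit feature mask the way the log says the MDS sends it: little-endian."""
    return struct.pack("<Q", mask)


def decode_without_conversion_on_big_endian(raw: bytes) -> int:
    """Simulate a big-endian host (such as s390) reading the bytes as a native integer."""
    return struct.unpack(">Q", raw)[0]


def decode_le(raw: bytes) -> int:
    """Decode explicitly as little-endian, which gives the same value on any host."""
    return struct.unpack("<Q", raw)[0]


def has_feature(mask: int, bit: int) -> bool:
    """Test a single feature bit in the decoded mask."""
    return bool(mask & (1 << bit))


wire = encode_features_le(1 << FEATURE_BIT)

wrong = decode_without_conversion_on_big_endian(wire)
right = decode_le(wire)

print(has_feature(wrong, FEATURE_BIT))  # the bit lands in the wrong position
print(has_feature(right, FEATURE_BIT))  # the bit is found after conversion
```

On a little-endian host the unconverted read happens to give the right answer, which is consistent with the problem being reported on s390.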

397 Commits