linux

Author	SHA1	Message	Date
Linus Torvalds	972a278fe6	for-5.19-rc7-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmLRpPgACgkQxWXV+ddt WDtu/BAAnfx7CXKIfWKpz6FZEio9Qb3mUHVOglyKzqR0qB72OdrC1dQMvEWPJc6h N65di6+8tTNmRIlaFBMU0MDHODR2aDRpDtlR9eUzUuidTc4iOp1fi31uBwl31r7b k8mCZBc/IAdfH13lBtcfkb2HGid7rik5ZC6Kx/glMcqh647QkSMAleupUsIYHKsK IgcUWuN3wFIUK2WVgsja7+ljlwIHBHKRp9yrEYw+ef/B0NCNKvOnrIOPJzO7nxMP 1FbqJ6F7u7HjoMFcMwn5rbV/BoIwSSvXyKRqOW+EhGeQR/imVmkH9jXJ7wXdblSz IvSqaZ0DaWWSvivdMpwbr8Z0Cu4iIYhVY6PSA0hukR63qB5GwKKJ6j1L0zoYoz8C IDWJPW03FNRIu5ZOduvUQ3qG7jcJQZ3WPCCfrDST1cO2xHT/7f65Tjz4k0hvp4za edITetC1mEv310CHeGsJaLxGYPNrRe38VZYPxgJ7yFpteGYjh0ZwsuyUHb4MH1no JWwgElNW+m1BatdWSUBYk6xhqod1s2LOFPNqo7jNlv8I27hPViqCbBA2i9FlkXf+ FwL5kWyJXs69gfjUIj59381Z0U1VdA1tvU8GP2m2+JvIDS6ooAcZj7yEQ69mCZxi 2RFJIU0NFbnc/5j2ARSzOTGs9glDD0yffgXJM+cK+TWsQ3AC31I= =/47A -----END PGP SIGNATURE----- Merge tag 'for-5.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs reverts from David Sterba: "Due to a recent report [1] we need to revert the radix tree to xarray conversion patches. There's a problem with sleeping under spinlock, when xa_insert could allocate memory under pressure. We use GFP_NOFS so this is a real problem that we unfortunately did not discover during review. I'm sorry to do such change at rc6 time but the revert is IMO the safer option, there are patches to use mutex instead of the spin locks but that would need more testing. The revert branch has been tested on a few setups, all seem ok. The conversion to xarray will be revisited in the future" Link: https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/ [1] * tag 'for-5.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Revert "btrfs: turn delayed_nodes_tree into an XArray" Revert "btrfs: turn name_cache radix tree into XArray in send_ctx" Revert "btrfs: turn fs_info member buffer_radix into XArray" Revert "btrfs: turn fs_roots_radix in btrfs_fs_info into an XArray"	2022-07-16 13:48:55 -07:00
Linus Torvalds	1ce9d792e8	A folio locking fixup that Xiubo and David cooperated on, marked for stable. Most of it is in netfs but I picked it up into ceph tree on agreement with David. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmLRle4THGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHziwNrB/wLIT7pDkZl2h1LclJS1WfgzgPkaOVq sN8RO+QH3zIx5av/b3BH/R9Ilp2M4QjWr7f5y3emVZPxV9KQ2lrUj30XKecfIO4+ nGU3YunO+rfaUTyySJb06VFfhLpOjxjWGFEjgAO+exiWz4zl2h8dOXqYBTE/cStT +721WZKYR25UK7c7kp/LgRC9QhjqH1MDm7wvPOAg6CR7mw2OiwjYD7o8Ou+zvGfp 6GimxbWouJNT+/xW2T3wIJsmQuwZbw4L4tsLSfhKTk57ooKtR1cdm0h/N7LM1bQa fijU36LdGJGqKKF+kVJV73sNuPIZGY+KVS+ApiuOJ/LMDXxoeuiYtewT =P3hf -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.19-rc7' of https://github.com/ceph/ceph-client Pull ceph fix from Ilya Dryomov: "A folio locking fixup that Xiubo and David cooperated on, marked for stable. Most of it is in netfs but I picked it up into ceph tree on agreement with David" * tag 'ceph-for-5.19-rc7' of https://github.com/ceph/ceph-client: netfs: do not unlock and put the folio twice	2022-07-15 10:27:28 -07:00
David Sterba	088aea3b97	Revert "btrfs: turn delayed_nodes_tree into an XArray" This reverts commit `253bf57555`. Revert the xarray conversion, there's a problem with potential sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS allocation. The radix tree used the preloading mechanism to avoid sleeping but this is not available in xarray. Conversion from spin lock to mutex is possible but at time of rc6 is riskier than a clean revert. [1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/ Reported-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-07-15 19:15:19 +02:00
David Sterba	5b8418b843	Revert "btrfs: turn name_cache radix tree into XArray in send_ctx" This reverts commit `4076942021`. Revert the xarray conversion, there's a problem with potential sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS allocation. The radix tree used the preloading mechanism to avoid sleeping but this is not available in xarray. Conversion from spin lock to mutex is possible but at time of rc6 is riskier than a clean revert. [1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/ Reported-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-07-15 19:14:58 +02:00
David Sterba	01cd390903	Revert "btrfs: turn fs_info member buffer_radix into XArray" This reverts commit `8ee922689d`. Revert the xarray conversion, there's a problem with potential sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS allocation. The radix tree used the preloading mechanism to avoid sleeping but this is not available in xarray. Conversion from spin lock to mutex is possible but at time of rc6 is riskier than a clean revert. [1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/ Reported-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-07-15 19:14:33 +02:00
David Sterba	fc7cbcd489	Revert "btrfs: turn fs_roots_radix in btrfs_fs_info into an XArray" This reverts commit `48b36a602a`. Revert the xarray conversion, there's a problem with potential sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS allocation. The radix tree used the preloading mechanism to avoid sleeping but this is not available in xarray. Conversion from spin lock to mutex is possible but at time of rc6 is riskier than a clean revert. [1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/ Reported-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-07-15 19:14:28 +02:00
Linus Torvalds	b926f2adb0	Revert "vf/remap: return the amount of bytes actually deduplicated" This reverts commit `4a57a84000`. Dave Chinner reports: "As I suspected would occur, this change causes test failures. e.g generic/517 in fstests fails with: generic/517 1s ... - output mismatch [..] -deduped 131172/131172 bytes at offset 65536 +deduped 131072/131172 bytes at offset 65536" can you please revert this commit for the 5.19 series to give us more time to investigate and consider the impact of the the API change on userspace applications before we commit to changing the API" That changed return value seems to reflect reality, but with the fstest change, let's revert for now. Requested-by: Dave Chinner <david@fromorbit.com> Link: https://lore.kernel.org/all/20220714223238.GH3600936@dread.disaster.area/ Cc: Ansgar Lößer <ansgar.loesser@tu-darmstadt.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-07-14 15:35:24 -07:00
Linus Torvalds	f41d5df5f1	three smb3 client fixes: 2 related to multichannel, one for working around a negotiate protocol bug in some Samba servers -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmLPa1YACgkQiiy9cAdy T1EaFQv+JB99S1F4ANk7uc+diFd/j+b1pIpR2nuNq6BM0Nn1sofbNZup/yFovoXN L+nm/XyWUYhtx5Rl5HPgfBaQwMq4uZPN8o/5vxKW1Kb/uxBV+VdSw9LhiNAWZpPB dGP7pGVu05TNnlsNBJjLTjAtnB7kL2r+XSMU+UPBDx2yTECJzhZbiedcg1qysh3t GqC03rsh8Gi5gnqUI4XZDh5iC2OAnlKeoQIXck0JbF6z3/kOjHsqUyuVE8ToDSry RNXxoy+hB7dsO1eehwsKBMg4mry5ArOv4aTpAa6nH3vF94FV2YG1tVroKIQRkw3r dcnLRcaBj8E3voNy1xK7qbSIdj6DlAbBeHS4cjAFHX+VCmecn6drolZDjIMfTZv8 XSweHaP94Bm043866+vcFqqNNB9B2DgFiJgyZwOQC7sdTOmN49r44o0zE68mWn40 c6n7RxayvobU9hU7ylqID6prKsI1zibUYXEItJaAHWUuqFbISd4iihFzJky+npvR Tep17Tts =FxnJ -----END PGP SIGNATURE----- Merge tag '5.19-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Three smb3 client fixes: - two multichannel fixes: fix a potential deadlock freeing a channel, and fix a race condition on failed creation of a new channel - mount failure fix: work around a server bug in some common older Samba servers by avoiding padding at the end of the negotiate protocol request" * tag '5.19-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: workaround negprot bug in some Samba servers cifs: remove unnecessary locking of chan_lock while freeing session cifs: fix race condition with delayed threads	2022-07-14 12:35:15 -07:00
Linus Torvalds	a24a6c05ff	Notable regression fixes: - Enable SETATTR(time_create) to fix regression with Mac OS clients - Fix a lockd crasher and broken NLM UNLCK behavior -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmLQSzcACgkQM2qzM29m f5d5Sg/9Hln5dSnIpgJKyhIy+ic5QB29VL9RWZ7JtVCUsWsaVObkwobmne7UWAms X9eehGG2gYTsLcXH9oIGeLRrjbWGYbdg3B6tGMFeQZgI6NeSvBS9AeDzyG+rRTyL 2czvODorTSfYokbsHNE3kWh0Tohkvd/4wV1y5GUcLYKUj6ZO21xsSbEj7381nriJ 9nkudqgJTfCINxqRiIRBT+iCNfjvAc8EhfxP+ThcmcjJH9WeWi9kR0ZF050jloO/ JH2RZRNN+crNdg77HlXyc/jcr2G/wv/IR5G40LqeLgrJbvPAKRGAltzeyErWY7eB 6jsWf7/vil6+PfuUaPskt9h60hGe3DeBzraK4gFcEKci6YSqbE7h1GrUtM6/2E0a 5HM2mYAZfYBZyT2HitZCoFXOvGssM6543QXW2noMo/6eAWFPfdzjlN55FLNssqkq 8fPMLoiBElOvVLCYvi91CpnqSsmR4WW44dwY/d+H7DG4QnCurPGMH2OFHX3epk5J kJzhT63GuemgwrxZICT8KQ6aVWHreHRO2LTKv9yjhEohsqkCf3eV/xlWKzcRFXLj HQIBfQf1UIOonxOkdGUG1w0Es/Wz7btQehc/CwaMi74VUjZUp5qSMjCUWkckyALc 3k1ojcODEyR/5rOepQF4kRzOzSbm7TETwhF7YG0TRPugIGri/sQ= =T8gK -----END PGP SIGNATURE----- Merge tag 'nfsd-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Notable regression fixes: - Enable SETATTR(time_create) to fix regression with Mac OS clients - Fix a lockd crasher and broken NLM UNLCK behavior" * tag 'nfsd-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: lockd: fix nlm_close_files lockd: set fl_owner when unlocking files NFSD: Decode NFSv4 birth time attribute	2022-07-14 12:29:43 -07:00
Xiubo Li	fac47b43c7	netfs: do not unlock and put the folio twice check_write_begin() will unlock and put the folio when return non-zero. So we should avoid unlocking and putting it twice in netfs layer. Change the way ->check_write_begin() works in the following two ways: (1) Pass it a pointer to the folio pointer, allowing it to unlock and put the folio prior to doing the stuff it wants to do, provided it clears the folio pointer. (2) Change the return values such that 0 with folio pointer set means continue, 0 with folio pointer cleared means re-get and all error codes indicating an error (no special treatment for -EAGAIN). [ bagasdotme: use Sphinx code text syntax for *foliop pointer ] Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/56423 Link: https://lore.kernel.org/r/cf169f43-8ee7-8697-25da-0204d1b4343e@redhat.com Co-developed-by: David Howells <dhowells@redhat.com> Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-07-14 10:10:12 +02:00
Steve French	32f319183c	smb3: workaround negprot bug in some Samba servers Mount can now fail to older Samba servers due to a server bug handling padding at the end of the last negotiate context (negotiate contexts typically are rounded up to 8 bytes by adding padding if needed). This server bug can be avoided by switching the order of negotiate contexts, placing a negotiate context at the end that does not require padding (prior to the recent netname context fix this was the case on the client). Fixes: `73130a7b1a` ("smb3: fix empty netname context on secondary channels") Reported-by: Julian Sikorski <belegdol@gmail.com> Tested-by: Julian Sikorski <belegdol+github@gmail.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-13 19:59:47 -05:00
Ansgar Lößer	4a57a84000	vf/remap: return the amount of bytes actually deduplicated When using the FIDEDUPRANGE ioctl, in case of success the requested size is returned. In some cases this might not be the actual amount of bytes deduplicated. This change modifies vfs_dedupe_file_range() to report the actual amount of bytes deduplicated, instead of the requested amount. Link: https://lore.kernel.org/linux-fsdevel/5548ef63-62f9-4f46-5793-03165ceccacc@tu-darmstadt.de/ Reported-by: Ansgar Lößer <ansgar.loesser@kom.tu-darmstadt.de> Reported-by: Max Schlecht <max.schlecht@informatik.hu-berlin.de> Reported-by: Björn Scheuermann <scheuermann@kom.tu-darmstadt.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Darrick J Wong <djwong@kernel.org> Signed-off-by: Ansgar Lößer <ansgar.loesser@kom.tu-darmstadt.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-07-13 12:08:14 -07:00
Dave Chinner	5750676b64	fs/remap: constrain dedupe of EOF blocks If dedupe of an EOF block is not constrainted to match against only other EOF blocks with the same EOF offset into the block, it can match against any other block that has the same matching initial bytes in it, even if the bytes beyond EOF in the source file do not match. Fix this by constraining the EOF block matching to only match against other EOF blocks that have identical EOF offsets and data. This allows "whole file dedupe" to continue to work without allowing eof blocks to randomly match against partial full blocks with the same data. Reported-by: Ansgar Lößer <ansgar.loesser@tu-darmstadt.de> Fixes: `1383a7ed67` ("vfs: check file ranges before cloning files") Link: https://lore.kernel.org/linux-fsdevel/a7c93559-4ba1-df2f-7a85-55a143696405@tu-darmstadt.de/ Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-07-13 10:28:16 -07:00
Linus Torvalds	72a8e05d4f	overlayfs fixes for 5.19-rc7 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYs11GgAKCRDh3BK/laaZ PAD3APsHu08aHid5O/zPnD/90BNqAo3ruvu2WhI5wa8Dacd5SwEAgoSlH2Tx3iy9 4zWK4zZX98qAGyI+ij5aejc0TvONqAE= =4KjV -----END PGP SIGNATURE----- Merge tag 'ovl-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fix from Miklos Szeredi: "Add a temporary fix for posix acls on idmapped mounts introduced in this cycle. A proper fix will be added in the next cycle" * tag 'ovl-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: turn off SB_POSIXACL with idmapped layers temporarily	2022-07-12 08:59:35 -07:00
Shyam Prasad N	2883f4b5a0	cifs: remove unnecessary locking of chan_lock while freeing session In cifs_put_smb_ses, when we're freeing the last ref count to the session, we need to free up each channel. At this point, it is unnecessary to take chan_lock, since we have the last reference to the ses. Picking up this lock also introduced a deadlock because it calls cifs_put_tcp_ses, which locks cifs_tcp_ses_lock. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Acked-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-12 10:35:48 -05:00
Shyam Prasad N	50bd7d5a64	cifs: fix race condition with delayed threads On failure to create a new channel, first cancel the delayed threads, which could try to search for this channel, and not find it. The other option was to put the tcp session for the channel first, before decrementing chan_count. But that would leave a reference to the tcp session, when it has been freed already. So going with the former option and cancelling the delayed works first, before rolling back the channel. Fixes: `aa45dadd34` ("cifs: change iface_list from array to sorted linked list") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Acked-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-07-12 10:35:13 -05:00
Linus Torvalds	5a29232d87	for-5.19-rc6-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmLMiQQACgkQxWXV+ddt WDvBQg/+I1ebfW2DFY8kBwy7c1qKZWIhNx1VVk2AegIXvrW/Tos7wp5O6fi7p/jL d6k8zO/zFLlfiI4Ckmz3gt7cxaMTNXxr6+GQpNNm1b92Wdcy1a+3gquzcehT9Q10 ZB4ecPWzEDXgORvdBYG2eD2Z8PrsF0Wu88XRDiiJOBQLjZ+k2sVp8QvJlOllLDoC m7rPoq98jC6VpZwFJ+fGk2jC7y4+1QXrOuQMy7LRTe59Thp6wUFDDPtkKfr5scDC UxkctlUdInD7A6DVvPzwaBFNoT8UeEByGHcMd3KjjrTdmqSWW6k8FiF4ckZwA3zJ oPdJVzdC5a2W7t6BHw+t7VNmkKd+swnr2sVSGQ8eIzF7z3/JSqyYVwziOD1YzAdU QUmawWm4/SFvsbO8aoLrEKNbUiTgQwVbKzJh4Dhu9VJ43jeCwCX7pa/uZI4evgyG T0tuwm58bWCk4y1o1fcFYgf4JcVgK23F2vKckUFZeHoV3Q8R0DnPCCGTqs1qT5vY irZ9AIawmaR09JptMjjsAEjDA9qb16Ut/J6/anukyCgL610EyYZG7zb1WH1cUD1o zNXY6O/iKyNdiXj7V1fTMiG/M8hGDcFu4pOpBk3hFjHEXX9BefoVC0J5YzvCecPz isqboD5Lt1I4mrzac1X+serMYfVbFH6+tsEPBQBZf/o/a0u43jI= =Cxvn -----END PGP SIGNATURE----- Merge tag 'for-5.19-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A more fixes that seem to me to be important enough to get merged before release: - in zoned mode, fix leak of a structure when reading zone info, this happens on normal path so this can be significant - in zoned mode, revert an optimization added in 5.19-rc1 to finish a zone when the capacity is full, but this is not reliable in all cases - try to avoid short reads for compressed data or inline files when it's a NOWAIT read, applications should handle that but there are two, qemu and mariadb, that are affected" * tag 'for-5.19-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: drop optimization of zone finish btrfs: zoned: fix a leaked bioc in read_zone_info btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents	2022-07-11 14:41:44 -07:00
Jeff Layton	1197eb5906	lockd: fix nlm_close_files This loop condition tries a bit too hard to be clever. Just test for the two indices we care about explicitly. Cc: J. Bruce Fields <bfields@fieldses.org> Fixes: `7f024fcd5c` ("Keep read and write fds with each nlm_file") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-11 15:49:56 -04:00
Linus Torvalds	8e59a6a7a4	Mainly MM fixes. About half for issues which were introduced after 5.18 and the remainder for longer-term issues. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYsxt9wAKCRDdBJ7gKXxA jnjWAQD6ts4tgsX+hQ5lrZjWRvYIxH/I4jbtxyMyhc+iKarotAD+NILVgrzIvr0v ijlA4LLtmdhN1UWdSomUm3bZVn6n+QA= =1375 -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2022-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "Mainly MM fixes. About half for issues which were introduced after 5.18 and the remainder for longer-term issues" * tag 'mm-hotfixes-stable-2022-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: split huge PUD on wp_huge_pud fallback nilfs2: fix incorrect masking of permission flags for symlinks mm/rmap: fix dereferencing invalid subpage pointer in try_to_migrate_one() riscv/mm: fix build error while PAGE_TABLE_CHECK enabled without MMU Documentation: highmem: use literal block for code example in highmem.h comment mm: sparsemem: fix missing higher order allocation splitting mm/damon: use set_huge_pte_at() to make huge pte old sh: convert nommu io{re,un}map() to static inline functions mm: userfaultfd: fix UFFDIO_CONTINUE on fallocated shmem pages	2022-07-11 12:49:56 -07:00
Jeff Layton	aec158242b	lockd: set fl_owner when unlocking files Unlocking a POSIX lock on an inode with vfs_lock_file only works if the owner matches. Ensure we set it in the request. Cc: J. Bruce Fields <bfields@fieldses.org> Fixes: `7f024fcd5c` ("Keep read and write fds with each nlm_file") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-11 15:49:56 -04:00
Chuck Lever	5b2f3e0777	NFSD: Decode NFSv4 birth time attribute NFSD has advertised support for the NFSv4 time_create attribute since commit `e377a3e698` ("nfsd: Add support for the birth time attribute"). Igor Mammedov reports that Mac OS clients attempt to set the NFSv4 birth time attribute via OPEN(CREATE) and SETATTR if the server indicates that it supports it, but since the above commit was merged, those attempts now fail. Table 5 in RFC 8881 lists the time_create attribute as one that can be both set and retrieved, but the above commit did not add server support for clients to provide a time_create attribute. IMO that's a bug in our implementation of the NFSv4 protocol, which this commit addresses. Whether NFSD silently ignores the new birth time or actually sets it is another matter. I haven't found another filesystem service in the Linux kernel that enables users or clients to modify a file's birth time attribute. This commit reflects my (perhaps incorrect) understanding of whether Linux users can set a file's birth time. NFSD will now recognize a time_create attribute but it ignores its value. It clears the time_create bit in the returned attribute bitmask to indicate that the value was not used. Reported-by: Igor Mammedov <imammedo@redhat.com> Fixes: `e377a3e698` ("nfsd: Add support for the birth time attribute") Tested-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-11 13:52:22 -04:00
Oleg Nesterov	d5b36a4dbd	fix race between exit_itimers() and /proc/pid/timers As Chris explains, the comment above exit_itimers() is not correct, we can race with proc_timers_seq_ops. Change exit_itimers() to clear signal->posix_timers with ->siglock held. Cc: <stable@vger.kernel.org> Reported-by: chris@accessvector.net Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-07-11 09:52:59 -07:00
Linus Torvalds	d9919d43cb	o_uring-5.19-2022-07-09 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLJpHQQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpmHAD/9r8+hKrvYwoBVoi0r8SD/hDkOJMJAnJOBb IBzpDRGcZhdLqGvMkhkdk1s83TTsKysStYSsr0jTlNMtevHNUx5jalYY0x7sIJBt S5WEwJ8aKbCs4G7pRU/T/wtRtYFz+JmlKT3VNGjkLWVrWn77CGEWtAHCSgzIgScn 7QzZiQIVheDr9RsqU8xre1m21+rQWvOssip82UfuTpiYOKTYyHkb91MoNdTGd3JB ABrr1Ind4lSbMaXXcUtUP6iV15NU95CplEr8ln4DS9/syoM7O1ZiOdJb5uo/ywyC +VEh2LO+UyRn7V4me743C1UGASNTYoyOjGJs0t0en2L10gTZHx3tWLkKRvk0uF0E MOXJ/F8QRZxtb/RTUPX2MED/U/ZERLBbF/Jrdfunwj4NuF6mxOhM0MhFb+DCyxK0 BmEfTIdmYzHkyKgp46OeaoJ+8cuyoHKC2DlZMoCl+mw6Fmv1XjVnnDag0Oxl2rv4 ANCPZNGHvi5kC2t5fHu7zgOBgbb1IAkLLwcUA8SQ27dRQPfmsB6sm7YUAabuTu8N zc937WxpXw9FsjtmFb7JQR5yLsTIG4BYHSZFM1FF8YX6nFptm4OhKFevgYH8r3sg sj4qUZFVVc60xTnFR7Yr+ccBbNPQPzy1tRG70OZvMYtkPFqn+IrfxrBzZTxl6/bl 5dWh6Y8L+A== =ZK3Q -----END PGP SIGNATURE----- Merge tag 'io_uring-5.19-2022-07-09' of git://git.kernel.dk/linux-block Pull io_uring fix from Jens Axboe: "A single fix for an issue that came up yesterday that we should plug for -rc6. This is a regression introduced in this cycle" * tag 'io_uring-5.19-2022-07-09' of git://git.kernel.dk/linux-block: io_uring: check that we have a file table when allocating update slots	2022-07-10 09:14:54 -07:00
Jens Axboe	d785a773be	io_uring: check that we have a file table when allocating update slots If IORING_FILE_INDEX_ALLOC is set asking for an allocated slot, the helper doesn't check if we actually have a file table or not. The non alloc path does do that correctly, and returns -ENXIO if we haven't set one up. Do the same for the allocated path, avoiding a NULL pointer dereference when trying to find a free bit. Fixes: `a7c41b4687` ("io_uring: let IORING_OP_FILES_UPDATE support choosing fixed file slots") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-09 07:02:10 -06:00
Linus Torvalds	e5524c2a1f	fscache fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmLInYwACgkQ+7dXa6fL C2tbmxAAlBkMvoIeyOlkMeAB3l1aK5CAYmgddjWOxY6UHNIGgXqWlGJCoxYngTmy j+5Uz1mMNh7TamSJpCTXUBbiLWxzoHI34oie+NT9VUXTTVw0/GQ//jQPlvE+HeJG Hmi0HR4lby1O6X9r63obzvDctJPhsVLdgJFtZ3UidwzgY2zYnScxwyCuwSfLUVnF YJACZWPFj1oWxOiS2bQZgnvCJpfJ6J0I5asmO0lI1tQDS0o4RoiRvDPTx8Z6K3KO 8CSm5tvu+vYrbZ5aNCN9f0MZfNl+YZiNQyxEiIP9Phu9iZFRe9UgXvDrE9346uQM /Vxa5nvyj+D0Hg+l1jBFXvwoKxLHTZ/BaK74h2/QzRKgxKrHTAyYOgxqt7U/YCsa S489eHOGNwbqIKACljBcMX9YZeCdeixt4aK/VIRr+d1f2Ui30Z05ExuYOfGn4wBq VcSsBgAALwcyqrpi61URYvKduHTa2D8ATKNKXNM/i1Wv6KQDeom5jyKj3mE7UNsI WAgWE2xy74kjYs1Lc0izsfieyZrt1MqZiyYFjS86Cnt0wVmdAQYrQ3wciN5D9IJN MiUYuienGRf1MNM2tmFK+N7e+pa/waKcl15/d3Zal94rlATpR2bFOI1zJ9qJ8rpi k8B/eydAZx+oUthxWvfIdLwIREJzA49jzBHgfAy/CMjqPCj7gZ8= =FDkz -----END PGP SIGNATURE----- Merge tag 'fscache-fixes-20220708' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache fixes from David Howells: - Fix a check in fscache_wait_on_volume_collision() in which the polarity is reversed. It should complain if a volume is still marked acquisition-pending after 20s, but instead complains if the mark has been cleared (ie. the condition has cleared). Also switch an open-coded test of the ACQUIRE_PENDING volume flag to use the helper function for consistency. - Not a fix per se, but neaten the code by using a helper to check for the DROPPED state. - Fix cachefiles's support for erofs to only flush requests associated with a released control file, not all requests. - Fix a race between one process invalidating an object in the cache and another process trying to look it up. * tag 'fscache-fixes-20220708' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache: Fix invalidation/lookup race cachefiles: narrow the scope of flushed requests when releasing fd fscache: Introduce fscache_cookie_is_dropped() fscache: Fix if condition in fscache_wait_on_volume_collision()	2022-07-08 16:08:48 -07:00
Linus Torvalds	29837019d5	io_uring-5.19-2022-07-08 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLIJGcQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpgS1D/9E2MxaWaHR+A35AXignJLaXxiLWlzTAwxr /ioT6omgogbiaoTrxVZMbW+6vAPbUamXA9yqv2fC/4RERBfz3Z6UBLa8Hp6SFCMQ vb/LSwRwnUuMr/HyF3hLjwISOi4oYs/uo3E7IpOPbpC/e0nDpJnGWx1UfQn0tJSH WVwZsLbUe8T9fhHA0uSHbMoMSvLQGqfwwY+MET2+j+ZDwMoet194yka22jJwfDbF l3cnUe2TQh6orRdzuagWX9WmdnWWyQM3DTqW2cSA0hepyxGMWkCMhMuyV5yqUXhD noHshcyL76h8WQi/BwYDAGYqGy1+FkOkuV3DmVnjHVQory17Ze8ijtImMoEWpkgl TwTTd2+o0ivcEd0JHLeqLHkTXKUENeUMTpJVuLotLMMdupIvF0jrdNWTWCM7uBto Q9JxIkEs+16bRqT+yzC4cNuzSQRL6+qQ5jVO5BsNmJoNvs15KN7vgAQ+uR4NCCIv GqHbTiBVsi7DYipS6jNi/bWnxDtIsNsn48WCdx52OYpd1NbkY3oEHMqBZ6rnsPJH /uek3VLajRZfG61EMUGlazitQdW3/z31wM0iP8Y8xPyvSNCbsJkFtrNO13jpuy9p 0YRP3peXYb2eEzYpq355RSCCpyActDYp77hjcyYQP3gJcnoViZ6rBUHr5Rw5W6lN siwWz7aNXg== =w3SQ -----END PGP SIGNATURE----- Merge tag 'io_uring-5.19-2022-07-08' of git://git.kernel.dk/linux-block Pull io_uring tweak from Jens Axboe: "Just a minor tweak to an addition made in this release cycle: padding a 32-bit value that's in a 64-bit union to avoid any potential funkiness from that" * tag 'io_uring-5.19-2022-07-08' of git://git.kernel.dk/linux-block: io_uring: explicit sqe padding for ioctl commands	2022-07-08 11:25:01 -07:00
Naohiro Aota	b3a3b02557	btrfs: zoned: drop optimization of zone finish We have an optimization in do_zone_finish() to send REQ_OP_ZONE_FINISH only when necessary, i.e. we don't send REQ_OP_ZONE_FINISH when we assume we wrote fully into the zone. The assumption is determined by "alloc_offset == capacity". This condition won't work if the last ordered extent is canceled due to some errors. In that case, we consider the zone is deactivated without sending the finish command while it's still active. This inconstancy results in activating another block group while we cannot really activate the underlying zone, which causes the active zone exceeds errors like below. BTRFS error (device nvme3n2): allocation failed flags 1, wanted 520192 tree-log 0, relocation: 0 nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0 nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0 Fix the issue by removing the optimization for now. Fixes: `8376d9e1ed` ("btrfs: zoned: finish superblock zone once no space left for new SB") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-07-08 19:18:00 +02:00
Christoph Hellwig	2963457829	btrfs: zoned: fix a leaked bioc in read_zone_info The bioc would leak on the normal completion path and also on the RAID56 check (but that one won't happen in practice due to the invalid combination with zoned mode). Fixes: `7db1c5d14d` ("btrfs: zoned: support dev-replace in zoned filesystems") CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [ update changelog ] Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-07-08 19:13:32 +02:00
Filipe Manana	a4527e1853	btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents When doing a direct IO read or write, we always return -ENOTBLK when we find a compressed extent (or an inline extent) so that we fallback to buffered IO. This however is not ideal in case we are in a NOWAIT context (io_uring for example), because buffered IO can block and we currently have no support for NOWAIT semantics for buffered IO, so if we need to fallback to buffered IO we should first signal the caller that we may need to block by returning -EAGAIN instead. This behaviour can also result in short reads being returned to user space, which although it's not incorrect and user space should be able to deal with partial reads, it's somewhat surprising and even some popular applications like QEMU (Link tag #1) and MariaDB (Link tag #2) don't deal with short reads properly (or at all). The short read case happens when we try to read from a range that has a non-compressed and non-inline extent followed by a compressed extent. After having read the first extent, when we find the compressed extent we return -ENOTBLK from btrfs_dio_iomap_begin(), which results in iomap to treat the request as a short read, returning 0 (success) and waiting for previously submitted bios to complete (this happens at fs/iomap/direct-io.c:__iomap_dio_rw()). After that, and while at btrfs_file_read_iter(), we call filemap_read() to use buffered IO to read the remaining data, and pass it the number of bytes we were able to read with direct IO. Than at filemap_read() if we get a page fault error when accessing the read buffer, we return a partial read instead of an -EFAULT error, because the number of bytes previously read is greater than zero. So fix this by returning -EAGAIN for NOWAIT direct IO when we find a compressed or an inline extent. Reported-by: Dominique MARTINET <dominique.martinet@atmark-techno.com> Link: https://lore.kernel.org/linux-btrfs/YrrFGO4A1jS0GI0G@atmark-techno.com/ Link: https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582 Tested-by: Dominique MARTINET <dominique.martinet@atmark-techno.com> CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-07-08 19:13:22 +02:00
Christian Brauner	4a47c6385b	ovl: turn of SB_POSIXACL with idmapped layers temporarily This cycle we added support for mounting overlayfs on top of idmapped mounts. Recently I've started looking into potential corner cases when trying to add additional tests and I noticed that reporting for POSIX ACLs is currently wrong when using idmapped layers with overlayfs mounted on top of it. I have sent out an patch that fixes this and makes POSIX ACLs work correctly but the patch is a bit bigger and we're already at -rc5 so I recommend we simply don't raise SB_POSIXACL when idmapped layers are used. Then we can fix the VFS part described below for the next merge window so we can have good exposure in -next. I'm going to give a rather detailed explanation to both the origin of the problem and mention the solution so people know what's going on. Let's assume the user creates the following directory layout and they have a rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you would expect files on your host system to be owned. For example, ~/.bashrc for your regular user would be owned by 1000:1000 and /root/.bashrc would be owned by 0:0. IOW, this is just regular boring filesystem tree on an ext4 or xfs filesystem. The user chooses to set POSIX ACLs using the setfacl binary granting the user with uid 4 read, write, and execute permissions for their .bashrc file: setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc Now they to expose the whole rootfs to a container using an idmapped mount. So they first create: mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap} mkdir -pv /vol/contpool/ctrover/{over,work} chown 10000000:10000000 /vol/contpool/ctrover/{over,work} The user now creates an idmapped mount for the rootfs: mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \ /var/lib/lxc/c2/rootfs \ /vol/contpool/lowermap This for example makes it so that /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc which is owned by uid and gid 1000 as being owned by uid and gid 10001000 at /vol/contpool/lowermap/home/ubuntu/.bashrc. Assume the user wants to expose these idmapped mounts through an overlayfs mount to a container. mount -t overlay overlay \ -o lowerdir=/vol/contpool/lowermap, \ upperdir=/vol/contpool/overmap/over, \ workdir=/vol/contpool/overmap/work \ /vol/contpool/merge The user can do this in two ways: (1) Mount overlayfs in the initial user namespace and expose it to the container. (2) Mount overlayfs on top of the idmapped mounts inside of the container's user namespace. Let's assume the user chooses the (1) option and mounts overlayfs on the host and then changes into a container which uses the idmapping 0:10000000:65536 which is the same used for the two idmapped mounts. Now the user tries to retrieve the POSIX ACLs using the getfacl command getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc and to their surprise they see: # file: vol/contpool/merge/home/ubuntu/.bashrc # owner: 1000 # group: 1000 user::rw- user:4294967295:rwx group::r-- mask::rwx other::r-- indicating the uid wasn't correctly translated according to the idmapped mount. The problem is how we currently translate POSIX ACLs. Let's inspect the callchain in this example: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping / sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() \|> vfs_getxattr() \| -> __vfs_getxattr() \| -> handler->get == ovl_posix_acl_xattr_get() \| -> ovl_xattr_get() \| -> vfs_getxattr() \| -> __vfs_getxattr() \| -> handler->get() / lower filesystem callback / \|> posix_acl_fix_xattr_to_user() { 4 = make_kuid(&init_user_ns, 4); 4 = mapped_kuid_fs(&init_user_ns / no idmapped mount /, 4); / FAILURE / -1 = from_kuid(0:10000000:65536 / caller's idmapping /, 4); } If the user chooses to use option (2) and mounts overlayfs on top of idmapped mounts inside the container things don't look that much better: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() \|> vfs_getxattr() \| -> __vfs_getxattr() \| -> handler->get == ovl_posix_acl_xattr_get() \| -> ovl_xattr_get() \| -> vfs_getxattr() \| -> __vfs_getxattr() \| -> handler->get() / lower filesystem callback / \|> posix_acl_fix_xattr_to_user() { 4 = make_kuid(&init_user_ns, 4); 4 = mapped_kuid_fs(&init_user_ns, 4); / FAILURE / -1 = from_kuid(0:10000000:65536 / caller's idmapping /, 4); } As is easily seen the problem arises because the idmapping of the lower mount isn't taken into account as all of this happens in do_gexattr(). But do_getxattr() is always called on an overlayfs mount and inode and thus cannot possible take the idmapping of the lower layers into account. This problem is similar for fscaps but there the translation happens as part of vfs_getxattr() already. Let's walk through an fscaps overlayfs callchain: setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc The expected outcome here is that we'll receive the cap_net_raw capability as we are able to map the uid associated with the fscap to 0 within our container. IOW, we want to see 0 as the result of the idmapping translations. If the user chooses option (1) we get the following callchain for fscaps: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k / initial idmapping / sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() -> vfs_getxattr() -> xattr_getsecurity() -> security_inode_getsecurity() ________________________________ -> cap_inode_getsecurity() \| \| { V \| 10000000 = make_kuid(0:0:4k / overlayfs idmapping /, 10000000); \| 10000000 = mapped_kuid_fs(0:0:4k / no idmapped mount /, 10000000); \| / Expected result is 0 and thus that we own the fscap. / \| 0 = from_kuid(0:10000000:65536 / caller's idmapping /, 10000000); \| } \| -> vfs_getxattr_alloc() \| -> handler->get == ovl_other_xattr_get() \| -> vfs_getxattr() \| -> xattr_getsecurity() \| -> security_inode_getsecurity() \| -> cap_inode_getsecurity() \| { \| 0 = make_kuid(0:0:4k / lower s_user_ns /, 0); \| 10000000 = mapped_kuid_fs(0:10000000:65536 / idmapped mount /, 0); \| 10000000 = from_kuid(0:0:4k / overlayfs idmapping /, 10000000); \| \|____________________________________________________________________\| } -> vfs_getxattr_alloc() -> handler->get == / lower filesystem callback / And if the user chooses option (2) we get: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() -> vfs_getxattr() -> xattr_getsecurity() -> security_inode_getsecurity() _______________________________ -> cap_inode_getsecurity() \| \| { V \| 10000000 = make_kuid(0:10000000:65536 / overlayfs idmapping /, 0); \| 10000000 = mapped_kuid_fs(0:0:4k / no idmapped mount /, 10000000); \| / Expected result is 0 and thus that we own the fscap. / \| 0 = from_kuid(0:10000000:65536 / caller's idmapping /, 10000000); \| } \| -> vfs_getxattr_alloc() \| -> handler->get == ovl_other_xattr_get() \| \|-> vfs_getxattr() \| -> xattr_getsecurity() \| -> security_inode_getsecurity() \| -> cap_inode_getsecurity() \| { \| 0 = make_kuid(0:0:4k / lower s_user_ns /, 0); \| 10000000 = mapped_kuid_fs(0:10000000:65536 / idmapped mount /, 0); \| 0 = from_kuid(0:10000000:65536 / overlayfs idmapping /, 10000000); \| \|____________________________________________________________________\| } -> vfs_getxattr_alloc() -> handler->get == / lower filesystem callback / We can see how the translation happens correctly in those cases as the conversion happens within the vfs_getxattr() helper. For POSIX ACLs we need to do something similar. However, in contrast to fscaps we cannot apply the fix directly to the kernel internal posix acl data structure as this would alter the cached values and would also require a rework of how we currently deal with POSIX ACLs in general which almost never take the filesystem idmapping into account (the noteable exception being FUSE but even there the implementation is special) and instead retrieve the raw values based on the initial idmapping. The correct values are then generated right before returning to userspace. The fix for this is to move taking the mount's idmapping into account directly in vfs_getxattr() instead of having it be part of posix_acl_fix_xattr_to_user(). To this end we simply move the idmapped mount translation into a separate step performed in vfs_{g,s}etxattr() instead of in posix_acl_fix_xattr_{from,to}_user(). To see how this fixes things let's go back to the original example. Assume the user chose option (1) and mounted overlayfs on top of idmapped mounts on the host: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k / initial idmapping / sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() \|> vfs_getxattr() \| \|> __vfs_getxattr() \| \| -> handler->get == ovl_posix_acl_xattr_get() \| \| -> ovl_xattr_get() \| \| -> vfs_getxattr() \| \| \|> __vfs_getxattr() \| \| \| -> handler->get() / lower filesystem callback / \| \| \|> posix_acl_getxattr_idmapped_mnt() \| \| { \| \| 4 = make_kuid(&init_user_ns, 4); \| \| 10000004 = mapped_kuid_fs(0:10000000:65536 / lower idmapped mount /, 4); \| \| 10000004 = from_kuid(&init_user_ns, 10000004); \| \| \|_______________________ \| \| } \| \| \| \| \| \|> posix_acl_getxattr_idmapped_mnt() \| \| { \| \| V \| 10000004 = make_kuid(&init_user_ns, 10000004); \| 10000004 = mapped_kuid_fs(&init_user_ns / no idmapped mount /, 10000004); \| 10000004 = from_kuid(&init_user_ns, 10000004); \| } \|_________________________________________________ \| \| \| \| \|> posix_acl_fix_xattr_to_user() \| { V 10000004 = make_kuid(0:0:4k / init_user_ns /, 10000004); / SUCCESS / 4 = from_kuid(0:10000000:65536 / caller's idmapping /, 10000004); } And similarly if the user chooses option (1) and mounted overayfs on top of idmapped mounts inside the container: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() \|> vfs_getxattr() \| \|> __vfs_getxattr() \| \| -> handler->get == ovl_posix_acl_xattr_get() \| \| -> ovl_xattr_get() \| \| -> vfs_getxattr() \| \| \|> __vfs_getxattr() \| \| \| -> handler->get() / lower filesystem callback / \| \| \|> posix_acl_getxattr_idmapped_mnt() \| \| { \| \| 4 = make_kuid(&init_user_ns, 4); \| \| 10000004 = mapped_kuid_fs(0:10000000:65536 / lower idmapped mount /, 4); \| \| 10000004 = from_kuid(&init_user_ns, 10000004); \| \| \|_______________________ \| \| } \| \| \| \| \| \|> posix_acl_getxattr_idmapped_mnt() \| \| { V \| 10000004 = make_kuid(&init_user_ns, 10000004); \| 10000004 = mapped_kuid_fs(&init_user_ns / no idmapped mount /, 10000004); \| 10000004 = from_kuid(0(&init_user_ns, 10000004); \| \|_________________________________________________ \| } \| \| \| \|> posix_acl_fix_xattr_to_user() \| { V 10000004 = make_kuid(0:0:4k / init_user_ns /, 10000004); / SUCCESS / 4 = from_kuid(0:10000000:65536 / caller's idmappings /, 10000004); } The last remaining problem we need to fix here is ovl_get_acl(). During ovl_permission() overlayfs will call: ovl_permission() -> generic_permission() -> acl_permission_check() -> check_acl() -> get_acl() -> inode->i_op->get_acl() == ovl_get_acl() > get_acl() / on the underlying filesystem) ->inode->i_op->get_acl() == /lower filesystem callback / -> posix_acl_permission() passing through the get_acl request to the underlying filesystem. This will retrieve the acls stored in the lower filesystem without taking the idmapping of the underlying mount into account as this would mean altering the cached values for the lower filesystem. The simple solution is to have ovl_get_acl() simply duplicate the ACLs, update the values according to the idmapped mount and return it to acl_permission_check() so it can be used in posix_acl_permission(). Since overlayfs doesn't cache ACLs they'll be released right after. Link: https://github.com/brauner/mount-idmapped/issues/9 Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: linux-unionfs@vger.kernel.org Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Fixes: `bc70682a49` ("ovl: support idmapped layers") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-07-08 15:48:31 +02:00
Pavel Begunkov	bdb2c48e4b	io_uring: explicit sqe padding for ioctl commands 32 bit sqe->cmd_op is an union with 64 bit values. It's always a good idea to do padding explicitly. Also zero check it in prep, so it can be used in the future if needed without compatibility concerns. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e6b95a05e970af79000435166185e85b196b2ba2.1657202417.git.asml.silence@gmail.com [axboe: turn bitwise OR into logical variant] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-07 17:33:01 -06:00
David Howells	85e4ea1049	fscache: Fix invalidation/lookup race If an NFS file is opened for writing and closed, fscache_invalidate() will be asked to invalidate the file - however, if the cookie is in the LOOKING_UP state (or the CREATING state), then request to invalidate doesn't get recorded for fscache_cookie_state_machine() to do something with. Fix this by making __fscache_invalidate() set a flag if it sees the cookie is in the LOOKING_UP state to indicate that we need to go to invalidation. Note that this requires a count on the n_accesses counter for the state machine, which that will release when it's done. fscache_cookie_state_machine() then shifts to the INVALIDATING state if it sees the flag. Without this, an nfs file can get corrupted if it gets modified locally and then read locally as the cache contents may not get updated. Fixes: `d24af13e2e` ("fscache: Implement cookie invalidation") Reported-by: Max Kellermann <mk@cm4all.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Max Kellermann <mk@cm4all.com> Link: https://lore.kernel.org/r/YlWWbpW5Foynjllo@rabbit.intern.cm-ag [1]	2022-07-05 16:12:55 +01:00
Jia Zhu	65aa5f6fd8	cachefiles: narrow the scope of flushed requests when releasing fd When an anonymous fd is released, only flush the requests associated with it, rather than all of requests in xarray. Fixes: `9032b6e858` ("cachefiles: implement on-demand read") Signed-off-by: Jia Zhu <zhujia.zj@bytedance.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://listman.redhat.com/archives/linux-cachefs/2022-June/006937.html	2022-07-05 16:12:21 +01:00
Yue Hu	5c4588aea6	fscache: Introduce fscache_cookie_is_dropped() FSCACHE_COOKIE_STATE_DROPPED will be read more than once, so let's add a helper to avoid code duplication. Signed-off-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://listman.redhat.com/archives/linux-cachefs/2022-May/006919.html	2022-07-05 16:12:20 +01:00
Yue Hu	bf17455b9c	fscache: Fix if condition in fscache_wait_on_volume_collision() After waiting for the volume to complete the acquisition with timeout, the if condition under which potential volume collision occurs should be acquire the volume is still pending rather than not pending so that we will continue to wait until the pending flag is cleared. Also, use the existing test pending wrapper directly instead of test_bit(). Fixes: `62ab633523` ("fscache: Implement volume registration") Signed-off-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://listman.redhat.com/archives/linux-cachefs/2022-May/006918.html	2022-07-05 16:12:20 +01:00
Ryusuke Konishi	5924e6ec15	nilfs2: fix incorrect masking of permission flags for symlinks The permission flags of newly created symlinks are wrongly dropped on nilfs2 with the current umask value even though symlinks should have 777 (rwxrwxrwx) permissions: $ umask 0022 $ touch file && ln -s file symlink; ls -l file symlink -rw-r--r--. 1 root root 0 Jun 23 16:29 file lrwxr-xr-x. 1 root root 4 Jun 23 16:29 symlink -> file This fixes the bug by inserting a missing check that excludes symlinks. Link: https://lkml.kernel.org/r/1655974441-5612-1-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: Tommy Pettersson <ptp@lysator.liu.se> Reported-by: Ciprian Craciun <ciprian.craciun@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-07-03 15:42:33 -07:00
Linus Torvalds	20855e4cb3	Fixes for 5.19-rc5: - Fix statfs blocking on background inode gc workers - Fix some broken inode lock assertion code - Fix xattr leaf buffer leaks when cancelling a deferred xattr update operation - Clean up xattr recovery to make it easier to understand. - Fix xattr leaf block verifiers tripping over empty blocks. - Remove complicated and error prone xattr leaf block bholding mess. - Fix a bug where an rt extent crossing EOF was treated as "posteof" blocks and cleaned unnecessarily. - Fix a UAF when log shutdown races with unmount. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmK/kVMACgkQ+H93GTRK tOs0tQ/+PYRhEDKrgocxZGJFNvnxqPRdEDu9k5XCnO2Y/DZRAF52F0JZaPtuiFH4 12e9vzYYRNrE9KifzPWo4j2L067kFszt4XcAjytJuf5f6k/duX7XbsdMb17Qxd28 mZDtBBSQCc9fcQo21u5SdZlPaD1SC1843jB4Oe7Sbo3AFvVAMwuBUgnp2TSDA8V0 0q25PUD0ZvWP3UTQS4M4fW4WhFa5wF+GnLR1DZjryFIzuUp9JwdCQZHIFnp6cHq9 TZMDJ4WhD9igMSzicRfgPoC8z/D3Mm0cFmRoURbG3GLzAeJ+e7PJ43rvlwq6Ajcv v5DhyQvFkiVjKLsrtJyvvUGSpkLL/touNG8MUE9I0heiiwb0QbP108aHWU8AS1Dr q7XHIxPaOhvlzVZN1uTuZE4N51/0NWITGKBwF0XU1b5D3wLyvOY6fbI7KLfkX2Sa 4zHKn4QpHUIE9fs5Na3H6L+ndlJclo2DJA6lF26pLgmrT7NLmJG+r97XagBsp/pr X8qOvVMg1XJA37Vy1bTN5cfEYzTTksJk/fQ3AvSKHDCeP5u87kiZ6hqNnW6dD0YF D8VTX29rVQr5HavbcGCmAyBZpk4CfclCsWCQrZu9MCnQSW37HnObXPJkIWvzt8Mn j6emhPcYHy5TwSChxdpzl733ZX0KdkdOAgkWgqtod2E/7fe+g7Q= =8QeL -----END PGP SIGNATURE----- Merge tag 'xfs-5.19-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: "This fixes some stalling problems and corrects the last of the problems (I hope) observed during testing of the new atomic xattr update feature. - Fix statfs blocking on background inode gc workers - Fix some broken inode lock assertion code - Fix xattr leaf buffer leaks when cancelling a deferred xattr update operation - Clean up xattr recovery to make it easier to understand. - Fix xattr leaf block verifiers tripping over empty blocks. - Remove complicated and error prone xattr leaf block bholding mess. - Fix a bug where an rt extent crossing EOF was treated as "posteof" blocks and cleaned unnecessarily. - Fix a UAF when log shutdown races with unmount" * tag 'xfs-5.19-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: prevent a UAF when log IO errors race with unmount xfs: dont treat rt extents beyond EOF as eofblocks to be cleared xfs: don't hold xattr leaf buffers across transaction rolls xfs: empty xattr leaf header blocks are not corruption xfs: clean up the end of xfs_attri_item_recover xfs: always free xattri_leaf_bp when cancelling a deferred op xfs: use invalidate_lock to check the state of mmap_lock xfs: factor out the common lock flags assert xfs: introduce xfs_inodegc_push() xfs: bound maximum wait time for inodegc work	2022-07-03 09:42:17 -07:00
Linus Torvalds	69cb6c6556	Notable regression fixes: - Fix NFSD crash during NFSv4.2 READ_PLUS operation - Fix incorrect status code returned by COMMIT operation -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmLAhFIACgkQM2qzM29m f5fK2w//TyUAzwoTZQEgmFbxNhXgUYC6uwWPqTCLMieStKtPwT8GsuXviY34QziF 2vER0NW9Am2GQyL3gtiYFM07OoHhQ4gr86ltaeIHAHhcwm3eIs879ARwEsyN6eDR +RDpqnONwtg+yaepfCMc4Bki9Jex+mmoXro86nFPmH+TDM5QiIRY0ncBWSLVWvYT YciAgvL6vfo2G79NYOzohoTb15ydotmy6m9H70nN+a2l6bKOIT8cF4S8lZETJZXA Rlj+R0eE0iXZTtp7VsAfVAHHfOzexGJjE85hpVzGiZWbxe6o0WNmBpoHICXs9VoP WRkq6vBC9P6wJ7EOu84/SRlR/rQaxfrYB7beqTC2kO6t5Ka5/xpJMKZOhfukCMV+ 7+uPAUnSIRqpHG0hWm1kPto00bfXm9fMETQbOEw7UQ9dH/322iOJnXwy03ZEXtJq 9G0k+yNcydoIs2g5OpNrw4f/wTQcdlbf1ZA5O9dAsxwS1ZTNpKUiE0Sd0Ez/0VIJ t5DZFWH44rs/60avxRYxtw965gNugTxb1YiKdXObvbFGf+xYtH0zePce++4kTJlE t886stLcHbuPYs4/XItwTxLMSnDM5UR22Icho7ElRHvPiWjZmc6o2fxCkWTzCOE2 ylZowLFMYZ/KbrP5mcqLARKEP6oFJERXt/U8U0CWeVbZomy9GVY= =z5W8 -----END PGP SIGNATURE----- Merge tag 'nfsd-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Notable regression fixes: - Fix NFSD crash during NFSv4.2 READ_PLUS operation - Fix incorrect status code returned by COMMIT operation" * tag 'nfsd-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix READ_PLUS crasher NFSD: restore EINVAL error translation in nfsd_commit()	2022-07-02 11:20:56 -07:00
Linus Torvalds	76ff294e16	More NFS Client Bugfixes for Linux 5.19-rc - Bugfixes: - Allocate a fattr for _nfs4_discover_trunking() - Fix module reference count leak in nfs4_run_state_manager() -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmK/NHYACgkQ18tUv7Cl QOtbXBAAhivn5bqZvrKz4WI4WjddTRcjyvITiW6m26GZZVgNHz1Sc2Tp6TA6QEL8 c7OLkj2SoA0SmO4kZX+gCpOzsapQA7FUWULFxpPxJFp/NCsgoxYqO5ZfX0qVpk6t 1w8lGFqsMve3LdRmcqvbaIrvzJPMdsVvixrZwXRQMe/atvtUMmgHo4pkBPuQ5nv9 FzWh0KRiohhEWSmncD0fjdYBCq4VOqrUEEn4BTeMoXwvg/noLj5GX3mS14CFH12Y +iOqQ05O48Ny8qhzeQ8bRat43t4cZoCpFUcwEPB0CWoNCqS4Qoqvn48Ic+4WMxpN nPg2CqkqaG2RUJozSJz8m+GQNbEohoGkruZXJh7TQaqWXrIJGRMBhKhI2b7hujBG meu0ypETzlbofjleCpevfvnNStoRTuakssMpcU/hfKjnfNIsHnADSYey0HWHWTAH ZaBT6N5Z3C6hRWz7wXIn3uTuWadZbDC9+HGvvyMpuP+PDQFM9exJfhoO0JQvQc/G dPB7SyINVj2Z9gguJaew4csRoSxZqxeLB28XHQ7yyYsmo7lbUWaeUgGGpynrrsZL Ysuh7JVoZyhSSdNb3b2vlnI7zK+F1qIkzg0/u2ZN8CLPfMaBCmPL031oFudpWJ2n TLnxoFHhsG2MTwua3CaOfk8QvNc2dT7ZsvxXHl7HxBlxGtBgSN4= =OYKI -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.19-3' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client fixes from Anna Schumaker: - Allocate a fattr for _nfs4_discover_trunking() - Fix module reference count leak in nfs4_run_state_manager() * tag 'nfs-for-5.19-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4: Add an fattr allocation to _nfs4_discover_trunking() NFS: restore module put when manager exits.	2022-07-01 11:11:32 -07:00
Linus Torvalds	6f8693ea2b	A filesystem fix, marked for stable. There appears to be a deeper issue on the MDS side, but for now we are going with this one-liner to avoid busy looping and potential soft lockups. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmK/C8MTHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi1cEB/9CiJoDsc1v+DrP/4Ud/AbI4LMffMcr tkHmUo8ZT5D4feUzSFE6iKgb3gRCJUkYKzesywQ7Xhv7Mr6/DKB4+t9QtrympZFd sAg775mHkL0NI6/OLnLSRva/r627PFk6f1v8OWENOjsw01PLOtWAB/B5FqlgN8tG EQLfX0G83o4AXt4NcPCcsucPh7FxC2iKe8XWqAE6VTjkKnyz3IQHvSLweWV68U8R ht6eun8H+slx8Kw1lSZfW/XoFGFO4uKntCh/CKKH28ZqaXrxrdsfmXSVOMlOi351 qxPfrTPgaSfvWQLbYQfPdQZCsfyyPgP2wdAVfpy56vk0yoxi2TLGBPsD =bu9O -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.19-rc5' of https://github.com/ceph/ceph-client Pull ceph fix from Ilya Dryomov: "A ceph filesystem fix, marked for stable. There appears to be a deeper issue on the MDS side, but for now we are going with this one-liner to avoid busy looping and potential soft lockups" * tag 'ceph-for-5.19-rc5' of https://github.com/ceph/ceph-client: ceph: wait on async create before checking caps for syncfs	2022-07-01 11:06:21 -07:00
Linus Torvalds	0a35d1622d	io_uring-5.19-2022-07-01 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmK+6CoQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpsZPD/9xPZTAJhX3/HNTjbi+FlSvTaJ/4rll98No 1pzW+nZyBVr4yesnHW2qtLwLRaYMNAFjdJmakn1BIUau4IT4Eqhb8NEz4ZCKnDD2 Kwi0q/9c0I/GxTnVXmwXPQzQkZarYLa8cppQr1L/L3el1xTU9qXUdpR7+vxPKi4J ADDP+7buRYp7Td2RfBD2lD4B7jNMpZYVC/2/Y3fixkuJvK4eYKuf+5K7zgmbahm5 YOm86k3P7QN7saTxUeyUrwR/G6CoY99Dd54KadQAS4XkU1f6XuNjF6IsYjPUEZ1B pKlhK4mhGieMlW8yBti0BdJLLTAHVsL9Pa0Aqsv1EdZ3x/Mfp9kmwig9RAGREyQX gNs316VgsfnZb+AdImZ9EItRnPZ/1Z0//VOWiDy7CijKABCZCSFXqOwQ+Yonyfab ZoVXlwlvOaxmiQAWhJe2XKxzRtAfeQgyirmF95N+c/wtIH6dWzJeIs2xFLPIKCaY tkv5Ah4IBGxofJj1SNqKNRUcv6N/Hr7zs/p6yTQpVEoUzsKqzh1eNz8PDA3ewrq4 C6nkXnZfidyqPuUZJIfOa02N/cPLUSclxdll6pHQfIMiwLBlV60pFcSsylgdYTE+ XT/iwiiaSTPUUIkCTYhyoUpfZnNX6IoVpxKOuh5gLOmTz/+xlRfcRjcjuXIoneHQ D9qlUWbYLA== =Edge -----END PGP SIGNATURE----- Merge tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Two minor tweaks: - While we still can, adjust the send/recv based flags to be in ->ioprio rather than in ->addr2. This is consistent with eg accept, and also doesn't waste a full 64-bit field for flags (Pavel) - 5.18-stable fix for re-importing provided buffers. Not much real world relevance here as it'll only impact non-pollable files gone async, which is more of a practical test case rather than something that is used in the wild (Dylan)" * tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-block: io_uring: fix provided buffer import io_uring: keep sendrecv flags in ioprio	2022-07-01 10:52:01 -07:00
Darrick J. Wong	7561cea5db	xfs: prevent a UAF when log IO errors race with unmount KASAN reported the following use after free bug when running generic/475: XFS (dm-0): Mounting V5 Filesystem XFS (dm-0): Starting recovery (logdev: internal) XFS (dm-0): Ending recovery (logdev: internal) Buffer I/O error on dev dm-0, logical block 20639616, async page read Buffer I/O error on dev dm-0, logical block 20639617, async page read XFS (dm-0): log I/O error -5 XFS (dm-0): Filesystem has been shut down due to log error (0x2). XFS (dm-0): Unmounting Filesystem XFS (dm-0): Please unmount the filesystem and rectify the problem(s). ================================================================== BUG: KASAN: use-after-free in do_raw_spin_lock+0x246/0x270 Read of size 4 at addr ffff888109dd84c4 by task 3:1H/136 CPU: 3 PID: 136 Comm: 3:1H Not tainted 5.19.0-rc4-xfsx #rc4 8e53ab5ad0fddeb31cee5e7063ff9c361915a9c4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 Workqueue: xfs-log/dm-0 xlog_ioend_work [xfs] Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x2b8/0x661 ? do_raw_spin_lock+0x246/0x270 kasan_report+0xab/0x120 ? do_raw_spin_lock+0x246/0x270 do_raw_spin_lock+0x246/0x270 ? rwlock_bug.part.0+0x90/0x90 xlog_force_shutdown+0xf6/0x370 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318] xlog_ioend_work+0x100/0x190 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318] process_one_work+0x672/0x1040 worker_thread+0x59b/0xec0 ? __kthread_parkme+0xc6/0x1f0 ? process_one_work+0x1040/0x1040 ? process_one_work+0x1040/0x1040 kthread+0x29e/0x340 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 154099: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 kmem_alloc+0x8d/0x2e0 [xfs] xlog_cil_init+0x1f/0x540 [xfs] xlog_alloc_log+0xd1e/0x1260 [xfs] xfs_log_mount+0xba/0x640 [xfs] xfs_mountfs+0xf2b/0x1d00 [xfs] xfs_fs_fill_super+0x10af/0x1910 [xfs] get_tree_bdev+0x383/0x670 vfs_get_tree+0x7d/0x240 path_mount+0xdb7/0x1890 __x64_sys_mount+0x1fa/0x270 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 154151: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x110/0x190 slab_free_freelist_hook+0xab/0x180 kfree+0xbc/0x310 xlog_dealloc_log+0x1b/0x2b0 [xfs] xfs_unmountfs+0x119/0x200 [xfs] xfs_fs_put_super+0x6e/0x2e0 [xfs] generic_shutdown_super+0x12b/0x3a0 kill_block_super+0x95/0xd0 deactivate_locked_super+0x80/0x130 cleanup_mnt+0x329/0x4d0 task_work_run+0xc5/0x160 exit_to_user_mode_prepare+0xd4/0xe0 syscall_exit_to_user_mode+0x1d/0x40 entry_SYSCALL_64_after_hwframe+0x46/0xb0 This appears to be a race between the unmount process, which frees the CIL and waits for in-flight iclog IO; and the iclog IO completion. When generic/475 runs, it starts fsstress in the background, waits a few seconds, and substitutes a dm-error device to simulate a disk falling out of a machine. If the fsstress encounters EIO on a pure data write, it will exit but the filesystem will still be online. The next thing the test does is unmount the filesystem, which tries to clean the log, free the CIL, and wait for iclog IO completion. If an iclog was being written when the dm-error switch occurred, it can race with log unmounting as follows: Thread 1 Thread 2 xfs_log_unmount xfs_log_clean xfs_log_quiesce xlog_ioend_work <observe error> xlog_force_shutdown test_and_set_bit(XLOG_IOERROR) xfs_log_force <log is shut down, nop> xfs_log_umount_write <log is shut down, nop> xlog_dealloc_log xlog_cil_destroy <wait for iclogs> spin_lock(&log->l_cilp->xc_push_lock) <KABOOM> Therefore, free the CIL after waiting for the iclogs to complete. I /think/ this race has existed for quite a few years now, though I don't remember the ~2014 era logging code well enough to know if it was a real threat then or if the actual race was exposed only more recently. Fixes: `ac983517ec` ("xfs: don't sleep in xlog_cil_force_lsn on shutdown") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-07-01 09:09:52 -07:00
Amir Goldstein	868f9f2f8e	vfs: fix copy_file_range() regression in cross-fs copies A regression has been reported by Nicolas Boichat, found while using the copy_file_range syscall to copy a tracefs file. Before commit `5dae222a5f` ("vfs: allow copy_file_range to copy across devices") the kernel would return -EXDEV to userspace when trying to copy a file across different filesystems. After this commit, the syscall doesn't fail anymore and instead returns zero (zero bytes copied), as this file's content is generated on-the-fly and thus reports a size of zero. Another regression has been reported by He Zhe - the assertion of WARN_ON_ONCE(ret == -EOPNOTSUPP) can be triggered from userspace when copying from a sysfs file whose read operation may return -EOPNOTSUPP. Since we do not have test coverage for copy_file_range() between any two types of filesystems, the best way to avoid these sort of issues in the future is for the kernel to be more picky about filesystems that are allowed to do copy_file_range(). This patch restores some cross-filesystem copy restrictions that existed prior to commit `5dae222a5f` ("vfs: allow copy_file_range to copy across devices"), namely, cross-sb copy is not allowed for filesystems that do not implement ->copy_file_range(). Filesystems that do implement ->copy_file_range() have full control of the result - if this method returns an error, the error is returned to the user. Before this change this was only true for fs that did not implement the ->remap_file_range() operation (i.e. nfsv3). Filesystems that do not implement ->copy_file_range() still fall-back to the generic_copy_file_range() implementation when the copy is within the same sb. This helps the kernel can maintain a more consistent story about which filesystems support copy_file_range(). nfsd and ksmbd servers are modified to fall-back to the generic_copy_file_range() implementation in case vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV, which preserves behavior of server-side-copy. fall-back to generic_copy_file_range() is not implemented for the smb operation FSCTL_DUPLICATE_EXTENTS_TO_FILE, which is arguably a correct change of behavior. Fixes: `5dae222a5f` ("vfs: allow copy_file_range to copy across devices") Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/ Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/ Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/ Link: https://lore.kernel.org/linux-fsdevel/20210630161320.29006-1-lhenriques@suse.de/ Reported-by: Nicolas Boichat <drinkcat@chromium.org> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Luis Henriques <lhenriques@suse.de> Fixes: `64bf5ff58d` ("vfs: no fallback for ->copy_file_range") Link: https://lore.kernel.org/linux-fsdevel/20f17f64-88cb-4e80-07c1-85cb96c83619@windriver.com/ Reported-by: He Zhe <zhe.he@windriver.com> Tested-by: Namjae Jeon <linkinjeon@kernel.org> Tested-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-06-30 15:16:38 -07:00
Scott Mayhew	4f40a5b554	NFSv4: Add an fattr allocation to _nfs4_discover_trunking() This was missed in `c3ed222745` ("NFSv4: Fix free of uninitialized nfs4_label on referral lookup.") and causes a panic when mounting with '-o trunkdiscovery': PID: 1604 TASK: ffff93dac3520000 CPU: 3 COMMAND: "mount.nfs" #0 [ffffb79140f738f8] machine_kexec at ffffffffaec64bee #1 [ffffb79140f73950] __crash_kexec at ffffffffaeda67fd #2 [ffffb79140f73a18] crash_kexec at ffffffffaeda76ed #3 [ffffb79140f73a30] oops_end at ffffffffaec2658d #4 [ffffb79140f73a50] general_protection at ffffffffaf60111e [exception RIP: nfs_fattr_init+0x5] RIP: ffffffffc0c18265 RSP: ffffb79140f73b08 RFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff93dac304a800 RCX: 0000000000000000 RDX: ffffb79140f73bb0 RSI: ffff93dadc8cbb40 RDI: d03ee11cfaf6bd50 RBP: ffffb79140f73be8 R8: ffffffffc0691560 R9: 0000000000000006 R10: ffff93db3ffd3df8 R11: 0000000000000000 R12: ffff93dac4040000 R13: ffff93dac2848e00 R14: ffffb79140f73b60 R15: ffffb79140f73b30 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffb79140f73b08] _nfs41_proc_get_locations at ffffffffc0c73d53 [nfsv4] #6 [ffffb79140f73bf0] nfs4_proc_get_locations at ffffffffc0c83e90 [nfsv4] #7 [ffffb79140f73c60] nfs4_discover_trunking at ffffffffc0c83fb7 [nfsv4] #8 [ffffb79140f73cd8] nfs_probe_fsinfo at ffffffffc0c0f95f [nfs] #9 [ffffb79140f73da0] nfs_probe_server at ffffffffc0c1026a [nfs] RIP: 00007f6254fce26e RSP: 00007ffc69496ac8 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6254fce26e RDX: 00005600220a82a0 RSI: 00005600220a64d0 RDI: 00005600220a6520 RBP: 00007ffc69496c50 R8: 00005600220a8710 R9: 003035322e323231 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc69496c50 R13: 00005600220a8440 R14: 0000000000000010 R15: 0000560020650ef9 ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b Fixes: `c3ed222745` ("NFSv4: Fix free of uninitialized nfs4_label on referral lookup.") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2022-06-30 16:13:00 -04:00
NeilBrown	080abad71e	NFS: restore module put when manager exits. Commit `f49169c97f` ("NFSD: Remove svc_serv_ops::svo_module") removed calls to module_put_and_kthread_exit() from threads that acted as SUNRPC servers and had a related svc_serv_ops structure. This was correct. It ALSO removed the module_put_and_kthread_exit() call from nfs4_run_state_manager() which is NOT a SUNRPC service. Consequently every time the NFSv4 state manager runs the module count increments and won't be decremented. So the nfsv4 module cannot be unloaded. So restore the module_put_and_kthread_exit() call. Fixes: `f49169c97f` ("NFSD: Remove svc_serv_ops::svo_module") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2022-06-30 16:13:00 -04:00
Dylan Yudaken	09007af2b6	io_uring: fix provided buffer import io_import_iovec uses the s pointer, but this was changed immediately after the iovec was re-imported and so it was imported into the wrong place. Change the ordering. Fixes: `2be2eb02e2` ("io_uring: ensure reads re-import for selected buffers") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630132006.2825668-1-dylany@fb.com [axboe: ensure we don't half-import as well] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-30 11:34:41 -06:00
Linus Torvalds	9fb3bb25d1	Merge tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify fix from Jan Kara: "A fix for recently added fanotify API to have stricter checks and refuse some invalid flag combinations to make our life easier in the future" * tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: refine the validation checks on non-dir inode mask	2022-06-30 09:57:18 -07:00
Pavel Begunkov	29c1ac230e	io_uring: keep sendrecv flags in ioprio We waste a u64 SQE field for flags even though we don't need as many bits and it can be used for something more useful later. Store io_uring specific send/recv flags in sqe->ioprio instead of ->addr2. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Fixes: `0455d4ccec` ("io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg") [axboe: change comment in io_uring.h as well] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-30 07:15:50 -06:00
Linus Torvalds	732f306943	six ksmbd server fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmK7GlQACgkQiiy9cAdy T1FIPQv/agDjMbpfqI4+3JrDzGofGyUgq31bQRwjKouM8dxC7iE0KRaerzDUYur1 oZSaqidCZPiFTrROQ2Kuzf8YAKFS0XubaH3HTJd5yu2/WiApsSNwelcDYtrolq5T OnC3t5tpkVDUcx8ynHAGrYe0mEkY5+IIYkKU9O9JduPlDX4qC796Iel9ESg5LoDR 0fx+HTdCr3OdfOipOEUl4HxmwWsx8u7RywmNPusaH21jbuC9w1LWSLBWuFIZywio bHKq57P9k/I9OXTGV7DuEaf2uvZd+ceH1ZahgXIvDdHzHM183ltdTWiyIbmtLDAj iYyos6UZsdK5IpN8+z9wY1zqUfj0btr+UVhbrXRwv5dIH+C0tnYCwDGNXszUh7Id XTeQWnaI+BaBA9VuqpvUtSXIJTsikaTYdnk4m/rLDYPaHVeuCKv1QkcitWpjHvIt FPlIvB95QMy6CrWtlGxjwDpNBLnca4NKigLTCESJaIKcvnn25kA8OtRG6tHbpvVM KiCkjH14 =8zjT -----END PGP SIGNATURE----- Merge tag '5.19-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd Pull ksmbd server fixes from Steve French: - seek null check (don't use f_seek op directly and blindly) - offset validation in FSCTL_SET_ZERO_DATA - fallocate fix (relates e.g. to xfstests generic/091 and 263) - two cleanup fixes - fix socket settings on some arch * tag '5.19-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: use vfs_llseek instead of dereferencing NULL ksmbd: check invalid FileOffset and BeyondFinalZero in FSCTL_ZERO_DATA ksmbd: set the range of bytes to zero without extending file size in FSCTL_ZERO_DATA ksmbd: remove duplicate flag set in smb2_write ksmbd: smbd: Remove useless license text when SPDX-License-Identifier is already used ksmbd: use SOCK_NONBLOCK type for kernel_accept()	2022-06-29 09:20:40 -07:00
Jeff Layton	8692969e91	ceph: wait on async create before checking caps for syncfs Currently, we'll call ceph_check_caps, but if we're still waiting on the reply, we'll end up spinning around on the same inode in flush_dirty_session_caps. Wait for the async create reply before flushing caps. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/55823 Fixes: `fbed7045f5` ("ceph: wait for async create reply before sending any cap messages") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-06-29 18:02:57 +02:00
Darrick J. Wong	8944c6fb8a	xfs: dont treat rt extents beyond EOF as eofblocks to be cleared On a system with a realtime volume and a 28k realtime extent, generic/491 fails because the test opens a file on a frozen filesystem and closing it causes xfs_release -> xfs_can_free_eofblocks to mistakenly think that the the blocks of the realtime extent beyond EOF are posteof blocks to be freed. Realtime extents cannot be partially unmapped, so this is pointless. Worse yet, this triggers posteof cleanup, which stalls on a transaction allocation, which is why the test fails. Teach the predicate to account for realtime extents properly. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2022-06-29 08:47:56 -07:00
Darrick J. Wong	e53bcffad0	xfs: don't hold xattr leaf buffers across transaction rolls Now that we've established (again!) that empty xattr leaf buffers are ok, we no longer need to bhold them to transactions when we're creating new leaf blocks. Get rid of the entire mechanism, which should simplify the xattr code quite a bit. The original justification for using bhold here was to prevent the AIL from trying to write the empty leaf block into the fs during the brief time that we release the buffer lock. The reason for /that/ was to prevent recovery from tripping over the empty ondisk block. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-06-29 08:47:56 -07:00
Darrick J. Wong	7be3bd8856	xfs: empty xattr leaf header blocks are not corruption TLDR: Revert commit `51e6104fdb` ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") because it was wrong. Every now and then we get a corruption report from the kernel or xfs_repair about empty leaf blocks in the extended attribute structure. We've long thought that these shouldn't be possible, but prior to 5.18 one would shake loose in the recoveryloop fstests about once a month. A new addition to the xattr leaf block verifier in 5.19-rc1 makes this happen every 7 minutes on my testing cloud. I added a ton of logging to detect any time we set the header count on an xattr leaf block to zero. This produced the following dmesg output on generic/388: XFS (sda4): ino 0x21fcbaf leaf 0x129bf78 hdcount==0! Call Trace: <TASK> dump_stack_lvl+0x34/0x44 xfs_attr3_leaf_create+0x187/0x230 xfs_attr_shortform_to_leaf+0xd1/0x2f0 xfs_attr_set_iter+0x73e/0xa90 xfs_xattri_finish_update+0x45/0x80 xfs_attr_finish_item+0x1b/0xd0 xfs_defer_finish_noroll+0x19c/0x770 __xfs_trans_commit+0x153/0x3e0 xfs_attr_set+0x36b/0x740 xfs_xattr_set+0x89/0xd0 __vfs_setxattr+0x67/0x80 __vfs_setxattr_noperm+0x6e/0x120 vfs_setxattr+0x97/0x180 setxattr+0x88/0xa0 path_setxattr+0xc3/0xe0 __x64_sys_setxattr+0x27/0x30 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 So now we know that someone is creating empty xattr leaf blocks as part of converting a sf xattr structure into a leaf xattr structure. The conversion routine logs any existing sf attributes in the same transaction that creates the leaf block, so we know this is a setxattr to a file that has no attributes at all. Next, g/388 calls the shutdown ioctl and cycles the mount to trigger log recovery. I also augmented buffer item recovery to call ->verify_struct on any attr leaf blocks and complain if it finds a failure: XFS (sda4): Unmounting Filesystem XFS (sda4): Mounting V5 Filesystem XFS (sda4): Starting recovery (logdev: internal) XFS (sda4): xattr leaf daddr 0x129bf78 hdrcount == 0! Call Trace: <TASK> dump_stack_lvl+0x34/0x44 xfs_attr3_leaf_verify+0x3b8/0x420 xlog_recover_buf_commit_pass2+0x60a/0x6c0 xlog_recover_items_pass2+0x4e/0xc0 xlog_recover_commit_trans+0x33c/0x350 xlog_recovery_process_trans+0xa5/0xe0 xlog_recover_process_data+0x8d/0x140 xlog_do_recovery_pass+0x19b/0x720 xlog_do_log_recovery+0x62/0xc0 xlog_do_recover+0x33/0x1d0 xlog_recover+0xda/0x190 xfs_log_mount+0x14c/0x360 xfs_mountfs+0x517/0xa60 xfs_fs_fill_super+0x6bc/0x950 get_tree_bdev+0x175/0x280 vfs_get_tree+0x1a/0x80 path_mount+0x6f5/0xaa0 __x64_sys_mount+0x103/0x140 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fc61e241eae And a moment later, the _delwri_submit of the recovered buffers trips the same verifier and recovery fails: XFS (sda4): Metadata corruption detected at xfs_attr3_leaf_verify+0x393/0x420 [xfs], xfs_attr3_leaf block 0x129bf78 XFS (sda4): Unmount and run xfs_repair XFS (sda4): First 128 bytes of corrupted metadata buffer: 00000000: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00 ........;....... 00000010: 00 00 00 00 01 29 bf 78 00 00 00 00 00 00 00 00 .....).x........ 00000020: a5 1b d0 02 b2 9a 49 df 8e 9c fb 8d f8 31 3e 9d ......I......1>. 00000030: 00 00 00 00 02 1f cb af 00 00 00 00 10 00 00 00 ................ 00000040: 00 50 0f b0 00 00 00 00 00 00 00 00 00 00 00 00 .P.............. 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ XFS (sda4): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x37f/0x3b0 [xfs] (fs/xfs/xfs_buf.c:1518). Shutting down filesystem. XFS (sda4): Please unmount the filesystem and rectify the problem(s) XFS (sda4): log mount/recovery failed: error -117 XFS (sda4): log mount failed I think I see what's going on here -- setxattr is racing with something that shuts down the filesystem: Thread 1 Thread 2 -------- -------- xfs_attr_sf_addname xfs_attr_shortform_to_leaf <create empty leaf> xfs_trans_bhold(leaf) xattri_dela_state = XFS_DAS_LEAF_ADD <roll transaction> <flush log> <shut down filesystem> xfs_trans_bhold_release(leaf) <discover fs is dead, bail> Thread 3 -------- <cycle mount, start recovery> xlog_recover_buf_commit_pass2 xlog_recover_do_reg_buffer <replay empty leaf buffer from recovered buf item> xfs_buf_delwri_queue(leaf) xfs_buf_delwri_submit _xfs_buf_ioapply(leaf) xfs_attr3_leaf_write_verify <trip over empty leaf buffer> <fail recovery> As you can see, the bhold keeps the leaf buffer locked and thus prevents the AIL from tripping over the ichdr.count==0 check in the write verifier. Unfortunately, it doesn't prevent the log from getting flushed to disk, which sets up log recovery to fail. So. It's clear that the kernel has always had the ability to persist attr leaf blocks with ichdr.count==0, which means that it's part of the ondisk format now. Unfortunately, this check has been added and removed multiple times throughout history. It first appeared in[1] kernel 3.10 as part of the early V5 format patches. The check was later discovered to break log recovery and hence disabled[2] during log recovery in kernel 4.10. Simultaneously, the check was added[3] to xfs_repair 4.9.0 to try to weed out the empty leaf blocks. This was still not correct because log recovery would recover an empty attr leaf block successfully only for regular xattr operations to trip over the empty block during of the block during regular operation. Therefore, the check was removed entirely[4] in kernel 5.7 but removal of the xfs_repair check was forgotten. The continued complaints from xfs_repair lead to us mistakenly re-adding[5] the verifier check for kernel 5.19. Remove it once again. [1] `517c22207b` ("xfs: add CRCs to attr leaf blocks") [2] `2e1d23370e` ("xfs: ignore leaf attr ichdr.count in verifier during log replay") [3] f7140161 ("xfs_repair: junk leaf attribute if count == 0") [4] `f28cef9e4d` ("xfs: don't fail verifier on empty attr3 leaf block") [5] `51e6104fdb` ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") Looking at the rest of the xattr code, it seems that files with empty leaf blocks behave as expected -- listxattr reports no attributes; getxattr on any xattr returns nothing as expected; removexattr does nothing; and setxattr can add attributes just fine. Original-bug: `517c22207b` ("xfs: add CRCs to attr leaf blocks") Still-not-fixed-by: `2e1d23370e` ("xfs: ignore leaf attr ichdr.count in verifier during log replay") Removed-in: `f28cef9e4d` ("xfs: don't fail verifier on empty attr3 leaf block") Fixes: `51e6104fdb` ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-06-29 08:47:56 -07:00
Amir Goldstein	8698e3bab4	fanotify: refine the validation checks on non-dir inode mask Commit `ceaf69f8ea` ("fanotify: do not allow setting dirent events in mask of non-dir") added restrictions about setting dirent events in the mask of a non-dir inode mark, which does not make any sense. For backward compatibility, these restictions were added only to new (v5.17+) APIs. It also does not make any sense to set the flags FAN_EVENT_ON_CHILD or FAN_ONDIR in the mask of a non-dir inode. Add these flags to the dir-only restriction of the new APIs as well. Move the check of the dir-only flags for new APIs into the helper fanotify_events_supported(), which is only called for FAN_MARK_ADD, because there is no need to error on an attempt to remove the dir-only flags from non-dir inode. Fixes: `ceaf69f8ea` ("fanotify: do not allow setting dirent events in mask of non-dir") Link: https://lore.kernel.org/linux-fsdevel/20220627113224.kr2725conevh53u4@quack3.lan/ Link: https://lore.kernel.org/r/20220627174719.2838175-1-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2022-06-28 11:18:13 +02:00
Alexey Khoroshilov	8a9ffb8c85	NFSD: restore EINVAL error translation in nfsd_commit() commit `555dbf1a9a` ("nfsd: Replace use of rwsem with errseq_t") incidentally broke translation of -EINVAL to nfserr_notsupp. The patch restores that. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Fixes: `555dbf1a9a` ("nfsd: Replace use of rwsem with errseq_t") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-06-27 10:33:05 -04:00
Darrick J. Wong	f94e08b602	xfs: clean up the end of xfs_attri_item_recover The end of this function could use some cleanup -- the EAGAIN conditionals make it harder to figure out what's going on with the disposal of xattri_leaf_bp, and the dual error/ret variables aren't needed. Turn the EAGAIN case into a separate block documenting all the subtleties of recovering in the middle of an xattr update chain, which makes the rest of the prologue much simpler. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-06-26 14:43:28 -07:00
Darrick J. Wong	b822ea17fd	xfs: always free xattri_leaf_bp when cancelling a deferred op While running the following fstest with logged xattrs DISabled, I noticed the following: # FSSTRESS_AVOID="-z -f unlink=1 -f rmdir=1 -f creat=2 -f mkdir=2 -f getfattr=3 -f listfattr=3 -f attr_remove=4 -f removefattr=4 -f setfattr=20 -f attr_set=60" ./check generic/475 INFO: task u9:1:40 blocked for more than 61 seconds. Tainted: G O 5.19.0-rc2-djwx #rc2 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:u9:1 state:D stack:12872 pid: 40 ppid: 2 flags:0x00004000 Workqueue: xfs-cil/dm-0 xlog_cil_push_work [xfs] Call Trace: <TASK> __schedule+0x2db/0x1110 schedule+0x58/0xc0 schedule_timeout+0x115/0x160 __down_common+0x126/0x210 down+0x54/0x70 xfs_buf_lock+0x2d/0xe0 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] xfs_buf_item_unpin+0x227/0x3a0 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] xfs_trans_committed_bulk+0x18e/0x320 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] xlog_cil_committed+0x2ea/0x360 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] xlog_cil_push_work+0x60f/0x690 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] process_one_work+0x1df/0x3c0 worker_thread+0x53/0x3b0 kthread+0xea/0x110 ret_from_fork+0x1f/0x30 </TASK> This appears to be the result of shortform_to_leaf creating a new leaf buffer as part of adding an xattr to a file. The new leaf buffer is held and attached to the xfs_attr_intent structure, but then the filesystem shuts down. Instead of the usual path (which adds the attr to the held leaf buffer which releases the hold), we instead cancel the entire deferred operation. Unfortunately, xfs_attr_cancel_item doesn't release any attached leaf buffers, so we leak the locked buffer. The CIL cannot do anything about that, and hangs. Fix this by teaching it to release leaf buffers, and make XFS a little more careful about not leaving a dangling reference. The prologue of xfs_attri_item_recover is (in this author's opinion) a little hard to figure out, so I'll clean that up in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-06-26 14:43:28 -07:00
Kaixu Xia	82af880639	xfs: use invalidate_lock to check the state of mmap_lock We should use invalidate_lock and XFS_MMAPLOCK_SHARED to check the state of mmap_lock rw_semaphore in xfs_isilocked(), rather than i_rwsem and XFS_IOLOCK_SHARED. Fixes: `2433480a7e` ("xfs: Convert to use invalidate_lock") Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-06-26 14:43:28 -07:00
Kaixu Xia	ca76a761ea	xfs: factor out the common lock flags assert There are similar lock flags assert in xfs_ilock(), xfs_ilock_nowait(), xfs_iunlock(), thus we can factor it out into a helper that is clear. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-06-26 14:43:28 -07:00
Linus Torvalds	413c1f1491	Minor things, mainly - mailmap updates, MAINTAINERS updates, etc. Fixes for post-5.18 changes: - fix for a damon boot hang, from SeongJae - fix for a kfence warning splat, from Jason Donenfeld - fix for zero-pfn pinning, from Alex Williamson - fix for fallocate hole punch clearing, from Mike Kravetz Fixes pre-5.18 material: - fix for a performance regression, from Marcelo - fix for a hwpoisining BUG from zhenwei pi -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYri4RgAKCRDdBJ7gKXxA jmhsAQDCvGqtIUhgkTwid8KBRNbowsg0LXd6k+gUjcxBhH403wEA0r0cxxkDAmgr QNXn/qZRzQP2ji+pdjH9NBOsd2g2XQA= =UGJ7 -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "Minor things, mainly - mailmap updates, MAINTAINERS updates, etc. Fixes for this merge window: - fix for a damon boot hang, from SeongJae - fix for a kfence warning splat, from Jason Donenfeld - fix for zero-pfn pinning, from Alex Williamson - fix for fallocate hole punch clearing, from Mike Kravetz Fixes for previous releases: - fix for a performance regression, from Marcelo - fix for a hwpoisining BUG from zhenwei pi" * tag 'mm-hotfixes-stable-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mailmap: add entry for Christian Marangi mm/memory-failure: disable unpoison once hw error happens hugetlbfs: zero partial pages during fallocate hole punch mm: memcontrol: reference to tools/cgroup/memcg_slabinfo.py mm: re-allow pinning of zero pfns mm/kfence: select random number before taking raw lock MAINTAINERS: add maillist information for LoongArch MAINTAINERS: update MM tree references MAINTAINERS: update Abel Vesa's email MAINTAINERS: add MEMORY HOT(UN)PLUG section and add David as reviewer MAINTAINERS: add Miaohe Lin as a memory-failure reviewer mailmap: add alias for jarkko@profian.com mm/damon/reclaim: schedule 'damon_reclaim_timer' only after 'system_wq' is initialized kthread: make it clear that kthread_create_on_node() might be terminated by any fatal signal mm: lru_cache_disable: use synchronize_rcu_expedited mm/page_isolation.c: fix one kernel-doc comment	2022-06-26 14:00:55 -07:00
Linus Torvalds	82708bb1eb	for-5.19-rc3-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmK4dV4ACgkQxWXV+ddt WDs4uQ/7B0XqPK05NJntJfwnuIoT/yOreKf47wt/6DyFV3CDMFte/qzaZwthwu6P F0GMpSYAlVszLlML5elvF9VXymlV+e+QROtbD6QCNLNW1IwHA7ZiF5fV/a1Rj930 XSuaDyVFPAK7892RR6yMQ20IeMBuvqiAhXWEzaIJ2tIcAHn+fP+VkY8Nc0aZj3iC mI+ep4n93karDxmnHVGUxJTxAe0l/uNopx+fYBWQDj7HuoMLo0Cu+rAdv0gRIxi2 RWUBkR4e4PBwV1OFScwNCsljjt6bHdUHrtdB3fo5Hzu9cO5hHdL7NEsKB1K2w7rV bgNuNqfj6Y4xUBchAfQO5CCJ9ISci5KoJ4RBpk6EprZR3QN40kN8GPlhi2519K7w F3d8jolDDHlkqxIsqoe47MYOcSepNEadVNsiYKb0rM6doilfxyXiu6dtTFMrC8Vy K2HDCdTyuIgw+TnwqT1puaUwxiIL8DFJf1CVyjwGuQ4UgaIEkHXKIsCssyyJ76Jh QkWX1aeRldbfkVArJWHQWqDQopx9pFBz1gjlws0YjAsU5YijOOXva464P9Rxg+Gq 4pRlgnO48joQam9bRirP2Z6yhqa4O6jkzKDOXSYduAUYD7IMfpsYnz09wKS95jj+ QCrR7VmKnpQdsXg5a/mqyacfIH30ph002VywRxPiFM89Syd25yo= =rUrf -----END PGP SIGNATURE----- Merge tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - zoned relocation fixes: - fix critical section end for extent writeback, this could lead to out of order write - prevent writing to previous data relocation block group if space gets low - reflink fixes: - fix race between reflinking and ordered extent completion - proper error handling when block reserve migration fails - add missing inode iversion/mtime/ctime updates on each iteration when replacing extents - fix deadlock when running fsync/fiemap/commit at the same time - fix false-positive KCSAN report regarding pid tracking for read locks and data race - minor documentation update and link to new site * tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Documentation: update btrfs list of features and link to readthedocs.io btrfs: fix deadlock with fsync+fiemap+transaction commit btrfs: don't set lock_owner when locking extent buffer for reading btrfs: zoned: fix critical section of relocation inode writeback btrfs: zoned: prevent allocation from previous data relocation BG btrfs: do not BUG_ON() on failure to migrate space when replacing extents btrfs: add missing inode updates on each iteration when replacing extents btrfs: fix race between reflinking and ordered extent completion	2022-06-26 10:11:36 -07:00
Linus Torvalds	97d4d02697	Description for this pull request: - Use updated exfat_chain directly instead of snapshot values in rename. -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmK3ryYWHGxpbmtpbmpl b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCLBLEADQV1zAIN3/NwvrHlsB/8fUDLoM DgnAfvGSQ61A5DYpx3clo5BtvpRhQIj22/3Jj3AMcUDXUkrEvsgLe1R7tQ6xUu/w PXwDPU89F5hI/nviyJYB6g8EjpWctzvbkkgR9eJO6ZxBna7VYHpz3tljnciW9slR v4tPD7iSlhaMkb+3qO4ll3LvMx1uKRUATu1C9sh5YbesYAN6A3De6fcSi7lPuXKN JefjWOWQEir+/9Hcrjxz/FNOdYzKS6CSE4ps6AJx+mqXkChcWLgwp5+Q+oDOYnAG SQ9Tk2pW37Ba7+WlM9HJc5vM2j1a0Ww4HFQEqAG/yzQmP1N97mX4Jv3/9nao0ojY OUEchgVIutPojFK1ykXUd4RZSDkLPq+LREtfQ++gfO2/oKlDhS6TwXYMg3Tg5kW5 q8TIWXzfd+waYCcHtP4MGsh6dVGT15REukbKHzFb6X+e/R1f9cGnI6a68187eDb5 bewEw/zIwx3GH8UT1k4sjwxvkGFln/HuAS11g5cLbcYG/htSehY1u/ir4ddmNn2u bRpUe0KuyzZfBdJ2bvOWJ37P7n4YOweRWudjMOIGENS5NwLFb4zd64NwHzklUmiw cpwMsaMxzzOdmf9TobjJ0ioRNh9eiltaxBme1KK22tJSNfGG/6+UnxryKLMFc5Rd 8KBPcuD8/rOKYdIi4Q== =GSFP -----END PGP SIGNATURE----- Merge tag 'exfat-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fix from Namjae Jeon: - Use updated exfat_chain directly instead of snapshot values in rename. * tag 'exfat-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: use updated exfat_chain directly during renaming	2022-06-26 08:41:04 -07:00
Linus Torvalds	918c30dffd	7 SMB3 fixes, addressing important multichannel, reconnect issues -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmK3e7AACgkQiiy9cAdy T1EYgwwArQqUZTxIAQ8zcM1AKP7CbCmZWKt1EMH3tYghnYc0D1oY7XAZ6obrbVNU jstw09T+fc2V/5lCKQ5FRIXFwS9DuG7/pnYeEWoft37zhe5mE/uVdIVbd139LE6t Ho96+OY6xkUL+cd2v6v2SyECxE+ahJQBvOmfmY3bvvr9MGWR1pC6aU182cQQCUKs sOoPEj/KTYRc2AutMHu0xJTIEkrGkQBaUrUd+YbQKfoMg48WFkTVHl+2XhKumkM+ 2uF97G5P+1J4WPlc/XlsnmJfA93J8H1Ex6rfv3NuMpBh0N6Q5YSCfcheXVTN5yN6 Px9r5Da1n+M1/WeGSUplm1r+z3jSYYJT9RRi0QPzYBzwlngyqdcfMsspU9qnBEdv Yya1QXn3cO0MY1y1SwwX0OqGc3AVdAvTAoPgBJG5fecnhf+X7UVe9dpcg7wRzXE/ bK97MOdDf4UE9u8NhFcTSmpubu7iplkY1CtMqGDd5VA4pSuRGNYsnQbo+7ZsaG0k 6Etl+sCz =Y554 -----END PGP SIGNATURE----- Merge tag '5.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs client fixes from Steve French: "Fixes addressing important multichannel, and reconnect issues. Multichannel mounts when the server network interfaces changed, or ip addresses changed, uncovered problems, especially in reconnect, but the patches for this were held up until recently due to some lock conflicts that are now addressed. Included in this set of fixes: - three fixes relating to multichannel reconnect, dynamically adjusting the list of server interfaces to avoid problems during reconnect - a lock conflict fix related to the above - two important fixes for negotiate on secondary channels (null netname can unintentionally cause multichannel to be disabled to some servers) - a reconnect fix (reporting incorrect IP address in some cases)" * tag '5.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update cifs_ses::ip_addr after failover cifs: avoid deadlocks while updating iface cifs: periodically query network interfaces from server cifs: during reconnect, update interface if necessary cifs: change iface_list from array to sorted linked list smb3: use netname when available on secondary channels smb3: fix empty netname context on secondary channels	2022-06-26 08:34:52 -07:00
Jason A. Donenfeld	067baa9a37	ksmbd: use vfs_llseek instead of dereferencing NULL By not checking whether llseek is NULL, this might jump to NULL. Also, it doesn't check FMODE_LSEEK. Fix this by using vfs_llseek(), which always does the right thing. Fixes: `f441584858` ("cifsd: add file operations") Cc: stable@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: Ronnie Sahlberg <lsahlber@redhat.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-25 19:52:49 -05:00
Linus Torvalds	29eeafc661	f2fs-fix-5.19 This includes some urgent fixes to avoid generating corrupted inodes caused by compressed and inline_data files. In addition, another patch tries to avoid wrong error report which prevents a roll-forward recovery. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmK2WREACgkQQBSofoJI UNJtuA//a9/7svQ32hK2/mGE9boK8V1tQEeOnTS79toMOh/AajAAlQyo7PmNuY3Z CkvT3wFJ7KzTgHZ7pHSAMdXX3grb+xs9vGqVdp6ICE4Le3p1QSdIaX7XCtTuhB3t p5u7yMuPorDFFKTJ9Ijq6/3xiS/qoKLCITAgzxMW8fdJzgJGU9qM2XMFw6r7fQnq sCQAJLGI0mZUkL0eDeb5iBTup9fSh3O5VEtXiOxqOI97tyUpeCt68PfTT3xW6viB u0QVaxTQYyM9/e61KpdgbhX7pfhz3mWsUgCvTZ9nH2siM9j0tWm3Q/vtMdnH1ETk bau2100B/hDywkulGrRYDmiYBbFQ/DZyPXxnE8kxe5AOejq47t1HDEmzd+fnac1x 1eHSSw/ZKVEMlQX0bGDSRBJM7hZBfCdq4cj5GbswQ8vsYJ/1FYKWTi8T6s8fYTD3 6QPkDxKDHemcbNbbFnHlBjxrb+L1QmVZK+WDqmTe9Nh2G1Er/nnhjM3T7D6iOJG9 9egE+37r90Z/I3CFOKelMxJ1cpVq7/baunCSe1sN7y40WwLfUOfkATctl8TyuN/1 gwLshYdTrvn6m5GKNkL/Nsu4o5HewIak+SJdP3HXahEk1ZMzVPWvz+xb5CnbziJk U0gc7rwhc8rpTjePTVYmOeaYwDJi6WTIjRQqhW6CxdkTYB2ttPA= =m3Fh -----END PGP SIGNATURE----- Merge tag 'f2fs-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "Some urgent fixes to avoid generating corrupted inodes caused by compressed and inline_data files. In addition, avoid a wrong error report which prevents a roll-forward recovery" * tag 'f2fs-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: do not count ENOENT for error case f2fs: fix iostat related lock protection f2fs: attach inline_data after setting compression	2022-06-25 09:19:51 -07:00
Paulo Alcantara	af3a6d1018	cifs: update cifs_ses::ip_addr after failover cifs_ses::ip_addr wasn't being updated in cifs_session_setup() when reconnecting SMB sessions thus returning wrong value in /proc/fs/cifs/DebugData. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-24 13:34:28 -05:00
Linus Torvalds	598f240487	io_uring-5.19-2022-06-24 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmK19YQQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpiycD/0TUfhJbMosCPjfIQXD4pHx+hWhN24B4RA6 /2EDMfmhtm8gHVNZzeh0Qyh9UHXbteZK3uhiAUE42MzMr4FJ1ow3Lt+28Ou9dkF5 tMvwANRiipXFjIJJez2v30zZ2nozhaJlPsdrSq9YY48kS2F9nYGVm07rPQ0gMdoI Awjwb515xK+VMciSDpYo9BcBj1LqDr+yYAyPELt8UlSuvEhZ0TauYzyP7VCSgByI aA8BWe5Gh5LLbEg3JoGAE1eG/Xs1OJjPAL/fY9C8k9umCmH3dQvpsOwtek1v359D JuL/Q1M/iPdq8TRg+Dj+ynv92EDVULuwnSQdOypAQIXKCVdCvCak4QwK0rQ8vn+c AinbHMaKpDc28P07ISBpPsvvpinktBd3fVfNLtq6tn2epkqYXvPcZa6n9La4Jrh8 zAt3YIzKt60LSbrOs8jervVF+YZpCU0xKt8WFbhwy5D8POIgRUX8Nu5sI5e8vFEL vdzhEzEJrL6HlOo2LOQbX4zMHG2IqPcUJQo5Yt2DXOIos5cJifPnhv8OMTQ1dZIG gS3N2DgH4AA0FP1awiB7C45sVltDDKb/DEgTUdde4UmP0I4Cy7LXjxrYn58kA1mi l+c/465D1US/fmfzc2sXxlKhMA932ICNeJldZwBJByTRdfV1gDCMWgY4B7XRlQMZ LuGKsxtUIw== =Z57a -----END PGP SIGNATURE----- Merge tag 'io_uring-5.19-2022-06-24' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "A few fixes that should go into the 5.19 release. All are fixing issues that either happened in this release, or going to stable. In detail: - A small series of fixlets for the poll handling, all destined for stable (Pavel) - Fix a merge error from myself that caused a potential -EINVAL for the recv/recvmsg flag setting (me) - Fix a kbuf recycling issue for partial IO (me) - Use the original request for the inflight tracking (me) - Fix an issue introduced this merge window with trace points using a custom decoder function, which won't work for perf (Dylan)" * tag 'io_uring-5.19-2022-06-24' of git://git.kernel.dk/linux-block: io_uring: use original request task for inflight tracking io_uring: move io_uring_get_opcode out of TP_printk io_uring: fix double poll leak on repolling io_uring: fix wrong arm_poll error handling io_uring: fail links when poll fails io_uring: fix req->apoll_events io_uring: fix merge error in checking send/recv addr2 flags io_uring: mark reissue requests with REQ_F_PARTIAL_IO	2022-06-24 11:02:26 -07:00
Shyam Prasad N	8da33fd11c	cifs: avoid deadlocks while updating iface We use cifs_tcp_ses_lock to protect a lot of things. Not only does it protect the lists of connections, sessions, tree connects, open file lists, etc., we also use it to protect some fields in each of it's entries. In this case, cifs_mark_ses_for_reconnect takes the cifs_tcp_ses_lock to traverse the lists, and then calls cifs_update_iface. However, that can end up calling cifs_put_tcp_session, which picks up the same lock again. Avoid this by taking a ref for the session, drop the lock, and then call update iface. Also, in cifs_update_iface, avoid nested locking of iface_lock and chan_lock, as much as possible. When unavoidable, we need to pick iface_lock first. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-24 09:17:56 -05:00
Namjae Jeon	b5e5f9dfc9	ksmbd: check invalid FileOffset and BeyondFinalZero in FSCTL_ZERO_DATA FileOffset should not be greater than BeyondFinalZero in FSCTL_ZERO_DATA. And don't call ksmbd_vfs_zero_data() if length is zero. Cc: stable@vger.kernel.org Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-23 23:30:46 -05:00
Namjae Jeon	18e39fb960	ksmbd: set the range of bytes to zero without extending file size in FSCTL_ZERO_DATA generic/091, 263 test failed since commit `f66f8b94e7` ("cifs: when extending a file with falloc we should make files not-sparse"). FSCTL_ZERO_DATA sets the range of bytes to zero without extending file size. The VFS_FALLOCATE_FL_KEEP_SIZE flag should be used even on non-sparse files. Cc: stable@vger.kernel.org Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-23 23:30:46 -05:00
Hyunchul Lee	745bbc0995	ksmbd: remove duplicate flag set in smb2_write The writethrough flag is set again if is_rdma_channel is false. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-23 23:30:46 -05:00
Dave Chinner	5e672cd69f	xfs: introduce xfs_inodegc_push() The current blocking mechanism for pushing the inodegc queue out to disk can result in systems becoming unusable when there is a long running inodegc operation. This is because the statfs() implementation currently issues a blocking flush of the inodegc queue and a significant number of common system utilities will call statfs() to discover something about the underlying filesystem. This can result in userspace operations getting stuck on inodegc progress, and when trying to remove a heavily reflinked file on slow storage with a full journal, this can result in delays measuring in hours. Avoid this problem by adding "push" function that expedites the flushing of the inodegc queue, but doesn't wait for it to complete. Convert xfs_fs_statfs() and xfs_qm_scall_getquota() to use this mechanism so they don't block but still ensure that queued operations are expedited. Fixes: `ab23a77687` ("xfs: per-cpu deferred inode inactivation queues") Reported-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Dave Chinner <dchinner@redhat.com> [djwong: fix _getquota_next to use _inodegc_push too] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-06-23 13:34:38 -07:00
Dave Chinner	7cf2b0f961	xfs: bound maximum wait time for inodegc work Currently inodegc work can sit queued on the per-cpu queue until the workqueue is either flushed of the queue reaches a depth that triggers work queuing (and later throttling). This means that we could queue work that waits for a long time for some other event to trigger flushing. Hence instead of just queueing work at a specific depth, use a delayed work that queues the work at a bound time. We can still schedule the work immediately at a given depth, but we no long need to worry about leaving a number of items on the list that won't get processed until external events prevail. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-06-23 13:34:38 -07:00
Linus Torvalds	fa1796a835	Tracing fixes: - Check for NULL in kretprobe_dispatcher() NULL can now be passed in, make sure it can handle it - Clean up unneeded #endif #ifdef of the same preprocessor check in the middle of the block. - Comment clean up - Remove unneeded initialization of the "ret" variable in __trace_uprobe_create() -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYrMu9hQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qpuZAP9gS8Xcd7nenV3i9j4lCFktWQrvQwvh wyNb9UuLqPVMUQEAkk4hzq38P2UvEOZ+v+WdJnXfOb3wpFhrxWFycz5ZVAw= =9WXA -----END PGP SIGNATURE----- Merge tag 'trace-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Check for NULL in kretprobe_dispatcher() NULL can now be passed in, make sure it can handle it - Clean up unneeded #endif #ifdef of the same preprocessor check in the middle of the block. - Comment clean up - Remove unneeded initialization of the "ret" variable in __trace_uprobe_create() * tag 'trace-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/uprobes: Remove unwanted initialization in __trace_uprobe_create() tracefs: Fix syntax errors in comments tracing: Simplify conditional compilation code in tracing_set_tracer() tracing/kprobes: Check whether get_kretprobe() returns NULL in kretprobe_dispatcher()	2022-06-23 12:24:49 -05:00
Jens Axboe	386e4fb696	io_uring: use original request task for inflight tracking In prior kernels, we did file assignment always at prep time. This meant that req->task == current. But after deferring that assignment and then pushing the inflight tracking back in, we've got the inflight tracking using current when it should in fact now be using req->task. Fixup that error introduced by adding the inflight tracking back after file assignments got modifed. Fixes: `9cae36a094` ("io_uring: reinstate the inflight tracking") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-23 11:06:43 -06:00
Shyam Prasad N	6e1c1c08cd	cifs: periodically query network interfaces from server Currently, we only query the server for network interfaces information at the time of mount, and never afterwards. This can be a problem, especially for services like Azure, where the IP address of the channel endpoints can change over time. With this change, we schedule a 600s polling of this info from the server for each tree connect. An alternative for periodic polling was to do this only at the time of reconnect. But this could delay the reconnect time slightly. Also, there are some challenges w.r.t how we have cifs_reconnect implemented today. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-22 19:51:43 -05:00
Shyam Prasad N	b54034a73b	cifs: during reconnect, update interface if necessary Going forward, the plan is to periodically query the server for it's interfaces (when multichannel is enabled). This change allows checking for inactive interfaces during reconnect, and reconnect to a new interface if necessary. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-22 19:51:43 -05:00
Shyam Prasad N	aa45dadd34	cifs: change iface_list from array to sorted linked list A server's published interface list can change over time, and needs to be updated. We've storing iface_list as a simple array, which makes it difficult to manipulate an existing list. With this change, iface_list is modified into a linked list of interfaces, which is kept sorted by speed. Also added a reference counter for an iface entry, so that each channel can maintain a backpointer to the iface and drop it easily when needed. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-22 19:51:43 -05:00
Shyam Prasad N	9de74996a7	smb3: use netname when available on secondary channels Some servers do not allow null netname contexts, which would cause multichannel to revert to single channel when mounting to some servers (e.g. Azure xSMB). The previous patch fixed that by avoiding incorrectly sending the netname context when there would be a null hostname sent in the netname context, while this patch fixes the null hostname for the secondary channel by using the hostname of the primary channel for the secondary channel. Fixes: `4c14d7043f` ("cifs: populate empty hostnames for extra channels") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-22 19:46:53 -05:00
Linus Torvalds	3abc3ae553	9p-for-5.19-rc4: fid refcount and fscache fixes This contains a couple of fixes: - fid refcounting was incorrect in some corner cases and would leak resources, only freed at umount time. The first three commits fix three such cases - cache=loose or fscache was broken when trying to write a partial page to a file with no read permission since the rework a few releases ago. The fix taken here is just to restore old behavior of using the special 'writeback_fid' for such reads, which is open as root/RDWR and such not get complains that we try to read on a WRONLY fid. Long-term it'd be nice to get rid of this and not issue the read at all (skip cache?) in such cases, but that direction hasn't progressed -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmKynEUACgkQq06b7GqY 5nBOwQ//c1AoCuzt8gXefaBy9dDvaq/Cg5a339bUGmsvRJS8dHWTx2/HO7ncf3wE 59uRh+ipLxXmHTkkLz13JtaVAFQ2HYlxKwmyvakBIjGVgDC+IYm9vkPHb2Z2yIBY D6XTuNufnb+/lrqekrmHiT2+eJOi2MhxPNyjXUAML7KKny6LpzdwymF/KIEsCbR8 EbRrSf+KTnCssIfJlrZUbbk2UkbW18uG/V1MgThN3rgj+bgG/oB+lU6BELCIOQc2 +0io2dg+ZgfJIK2fpBKF64vK2ILMSNEJ8obkfWgqOyI/LBOya38Z/cSbuzPMBwZd P2A2zQmjp8oYSbXM8EGaSFTXix28Lxljk5vvT/xbEipzyUU3UZAPJJE6UX9M66UF d/FHA8ljDVuRrknM0yDv5sqBYRB8uuEBtUiKGBO6k5zPTn0A7oEzEviryMCiEUF5 1fbe/PWrFLnZMB2hWZ1aiY0tyopivp67zo6mRY/qehCihb/QlpiVNLGCC1e3eMdu FHPR3pSD1B5jFurOB8Wn1zUMjsZsnIjvpOET4WiP9pU9SJpOCd2fsAo69POHZVfA NIJxZ9MqW+3/eK+7CDmwnJLhTNRvvrQmTH55Ex61HTcn+2KFIqizCr/I6sQUl/g0 teAB8T5UlS6+nDDWfZouUiXcm0He2C56RyJOCYlagHD1qYm//Gg= =2yZw -----END PGP SIGNATURE----- Merge tag '9p-for-5.19-rc4' of https://github.com/martinetd/linux Pull 9pfs fixes from Dominique Martinet: "A couple of fid refcount and fscache fixes: - fid refcounting was incorrect in some corner cases and would leak resources, only freed at umount time. The first three commits fix three such cases - 'cache=loose' or fscache was broken when trying to write a partial page to a file with no read permission since the rework a few releases ago. The fix taken here is just to restore old behavior of using the special 'writeback_fid' for such reads, which is open as root/RDWR and such not get complains that we try to read on a WRONLY fid. Long-term it'd be nice to get rid of this and not issue the read at all (skip cache?) in such cases, but that direction hasn't progressed" * tag '9p-for-5.19-rc4' of https://github.com/martinetd/linux: 9p: fix EBADF errors in cached mode 9p: Fix refcounting during full path walks for fid lookups 9p: fix fid refcount leak in v9fs_vfs_get_link 9p: fix fid refcount leak in v9fs_vfs_atomic_open_dotl	2022-06-22 08:09:49 -05:00
Pavel Begunkov	c0737fa9a5	io_uring: fix double poll leak on repolling We have re-polling for partial IO, so a request can be polled twice. If it used two poll entries the first time then on the second io_arm_poll_handler() it will find the old apoll entry and NULL kmalloc()'ed second entry, i.e. apoll->double_poll, so leaking it. Fixes: `10c873334f` ("io_uring: allow re-poll if we made progress") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fee2452494222ecc7f1f88c8fb659baef971414a.1655852245.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-21 17:24:37 -06:00
Pavel Begunkov	9d2ad2947a	io_uring: fix wrong arm_poll error handling Leaving ip.error set when a request was punted to task_work execution is problematic, don't forget to clear it. Fixes: `aa43477b04` ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a6c84ef4182c6962380aebe11b35bdcb25b0ccfb.1655852245.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-21 17:24:37 -06:00
Pavel Begunkov	c487a5ad48	io_uring: fail links when poll fails Don't forget to cancel all linked requests of poll request when __io_arm_poll_handler() failed. Fixes: `aa43477b04` ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a78aad962460f9fdfe4aa4c0b62425c88f9415bc.1655852245.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-21 17:24:37 -06:00
Linus Torvalds	ff872b76b3	for-5.19-rc3-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmKxvkkACgkQxWXV+ddt WDsQYhAAofZGaOdBwSDvGA4srB2ieDIFoMeNb1NYp2P5vafPo3Q5AAvgGAeKhp5x g2C7W/8q2GMJ+B9SjyiBkVufuQmCWbFKxStQM3QysYoj/EyKyp7SXtO4YMWHz2T3 nfMMlPo2aNpr7Z2s+tcjhthq/hIvVFi6kweRFNvacM2bb/17IxgAdqLpQBqK5xe9 /IGSUTw75jSd2sZSyzBqrqshKDonmJ7u4qCV2X5hTPi8w4AUDERJrm0bOnikNXHx 4LnNDmSIA0BEXybHwEAShoK0ge66z1kP1UspQNB7pKriJcyroNPjgm/fMZJiRKIc zEYEMSzTYQa5eDwhXCz5PCaPqY4y/ovfYCsmySVXt1a7wgplVl+vsOaesE2NFVCX FE36d58L+4I8iTJhpVCNmEU9N/spfvAr3mBAcKCkbp9WKyGJ9/2yJpRThkV8Pw2Y bzhFIYRs1CJvkK7P4Cp+FSfzJx6tvYAqblvE97VUt83PuqS1Fb49lKdr5DZnbplV vDkewmvXSmHH9Ic5xBeTJXJZ+yeibk/0LSNEKczWva6f60h0ubF0OI6BzmS+NZbN HyitKerX0ZyFi5VUOZ+PKzXfR3ZlX3SmjAcHrl9BjZjFOJkpxAx6yWBzdnkitb+O fYyT68H4IetxwkghPVBv8qFCkuNy/i9NsEILcAAXd8CHGQlfwDA= =eORM -----END PGP SIGNATURE----- Merge tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - print more error messages for invalid mount option values - prevent remount with v1 space cache for subpage filesystem - fix hang during unmount when block group reclaim task is running * tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: add error messages to all unrecognized mount options btrfs: prevent remounting to v1 space cache for subpage mount btrfs: fix hang during unmount when block group reclaim task is running	2022-06-21 12:06:04 -05:00
David Howells	cb78d1b5ef	afs: Fix dynamic root getattr The recent patch to make afs_getattr consult the server didn't account for the pseudo-inodes employed by the dynamic root-type afs superblock not having a volume or a server to access, and thus an oops occurs if such a directory is stat'd. Fix this by checking to see if the vnode->volume pointer actually points anywhere before following it in afs_getattr(). This can be tested by stat'ing a directory in /afs. It may be sufficient just to do "ls /afs" and the oops looks something like: BUG: kernel NULL pointer dereference, address: 0000000000000020 ... RIP: 0010:afs_getattr+0x8b/0x14b ... Call Trace: <TASK> vfs_statx+0x79/0xf5 vfs_fstatat+0x49/0x62 Fixes: `2aeb8c86d4` ("afs: Fix afs_getattr() to refetch file status if callback break occurred") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/165408450783.1031787.7941404776393751186.stgit@warthog.procyon.org.uk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-06-21 11:47:30 -05:00
Jaegeuk Kim	82c7863ed9	f2fs: do not count ENOENT for error case Otherwise, we can get a wrong cp_error mark. Cc: <stable@vger.kernel.org> Fixes: `a7b8618aa2` ("f2fs: avoid infinite loop to flush node pages") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-06-21 08:29:56 -07:00
Pavel Begunkov	aacf2f9f38	io_uring: fix req->apoll_events apoll_events should be set once in the beginning of poll arming just as poll->events and not change after. However, currently io_uring resets it on each __io_poll_execute() for no clear reason. There is also a place in __io_arm_poll_handler() where we add EPOLLONESHOT to downgrade a multishot, but forget to do the same thing with ->apoll_events, which is buggy. Fixes: `81459350d5` ("io_uring: cache req->apoll->events in req->cflags") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/0aef40399ba75b1a4d2c2e85e6e8fd93c02fc6e4.1655814213.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-21 07:49:05 -06:00
Jens Axboe	b60cac14bb	io_uring: fix merge error in checking send/recv addr2 flags With the dropping of the IOPOLL checking in the per-opcode handlers, we inadvertently left two checks in the recv/recvmsg and send/sendmsg prep handlers for the same thing, and one of them includes addr2 which holds the flags for these opcodes. Fix it up and kill the redundant checks. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-21 07:47:13 -06:00
Josef Bacik	bf7ba8ee75	btrfs: fix deadlock with fsync+fiemap+transaction commit We are hitting the following deadlock in production occasionally Task 1 Task 2 Task 3 Task 4 Task 5 fsync(A) start trans start commit falloc(A) lock 5m-10m start trans wait for commit fiemap(A) lock 0-10m wait for 5m-10m (have 0-5m locked) have btrfs_need_log_full_commit !full_sync wait_ordered_extents finish_ordered_io(A) lock 0-5m DEADLOCK We have an existing dependency of file extent lock -> transaction. However in fsync if we tried to do the fast logging, but then had to fall back to committing the transaction, we will be forced to call btrfs_wait_ordered_range() to make sure all of our extents are updated. This creates a dependency of transaction -> file extent lock, because btrfs_finish_ordered_io() will need to take the file extent lock in order to run the ordered extents. Fix this by stopping the transaction if we have to do the full commit and we attempted to do the fast logging. Then attach to the transaction and commit it if we need to. CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-06-21 14:47:08 +02:00
Zygo Blaxell	97e86631bc	btrfs: don't set lock_owner when locking extent buffer for reading In `196d59ab9c` "btrfs: switch extent buffer tree lock to rw_semaphore" the functions for tree read locking were rewritten, and in the process the read lock functions started setting eb->lock_owner = current->pid. Previously lock_owner was only set in tree write lock functions. Read locks are shared, so they don't have exclusive ownership of the underlying object, so setting lock_owner to any single value for a read lock makes no sense. It's mostly harmless because write locks and read locks are mutually exclusive, and none of the existing code in btrfs (btrfs_init_new_buffer and print_eb_refs_lock) cares what nonsense is written in lock_owner when no writer is holding the lock. KCSAN does care, and will complain about the data race incessantly. Remove the assignments in the read lock functions because they're useless noise. Fixes: `196d59ab9c` ("btrfs: switch extent buffer tree lock to rw_semaphore") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Signed-off-by: David Sterba <dsterba@suse.com>	2022-06-21 14:46:56 +02:00
Naohiro Aota	19ab78ca86	btrfs: zoned: fix critical section of relocation inode writeback We use btrfs_zoned_data_reloc_{lock,unlock} to allow only one process to write out to the relocation inode. That critical section must include all the IO submission for the inode. However, flush_write_bio() in extent_writepages() is out of the critical section, causing an IO submission outside of the lock. This leads to an out of the order IO submission and fail the relocation process. Fix it by extending the critical section. Fixes: `35156d8527` ("btrfs: zoned: only allow one process to add pages to a relocation inode") CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-06-21 14:46:30 +02:00
Naohiro Aota	343d8a3085	btrfs: zoned: prevent allocation from previous data relocation BG After commit `5f0addf7b8` ("btrfs: zoned: use dedicated lock for data relocation"), we observe IO errors on e.g, btrfs/232 like below. [09.0][T4038707] WARNING: CPU: 3 PID: 4038707 at fs/btrfs/extent-tree.c:2381 btrfs_cross_ref_exist+0xfc/0x120 [btrfs] <snip> [09.9][T4038707] Call Trace: [09.5][T4038707] <TASK> [09.3][T4038707] run_delalloc_nocow+0x7f1/0x11a0 [btrfs] [09.6][T4038707] ? test_range_bit+0x174/0x320 [btrfs] [09.2][T4038707] ? fallback_to_cow+0x980/0x980 [btrfs] [09.3][T4038707] ? find_lock_delalloc_range+0x33e/0x3e0 [btrfs] [09.5][T4038707] btrfs_run_delalloc_range+0x445/0x1320 [btrfs] [09.2][T4038707] ? test_range_bit+0x320/0x320 [btrfs] [09.4][T4038707] ? lock_downgrade+0x6a0/0x6a0 [09.2][T4038707] ? orc_find.part.0+0x1ed/0x300 [09.5][T4038707] ? __module_address.part.0+0x25/0x300 [09.0][T4038707] writepage_delalloc+0x159/0x310 [btrfs] <snip> [09.4][ C3] sd 10:0:1:0: [sde] tag#2620 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s [09.5][ C3] sd 10:0:1:0: [sde] tag#2620 Sense Key : Illegal Request [current] [09.9][ C3] sd 10:0:1:0: [sde] tag#2620 Add. Sense: Unaligned write command [09.5][ C3] sd 10:0:1:0: [sde] tag#2620 CDB: Write(16) 8a 00 00 00 00 00 02 f3 63 87 00 00 00 2c 00 00 [09.4][ C3] critical target error, dev sde, sector 396041272 op 0x1:(WRITE) flags 0x800 phys_seg 3 prio class 0 [09.9][ C3] BTRFS error (device dm-1): bdev /dev/mapper/dml_102_2 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0 The IO errors occur when we allocate a regular extent in previous data relocation block group. On zoned btrfs, we use a dedicated block group to relocate a data extent. Thus, we allocate relocating data extents (pre-alloc) only from the dedicated block group and vice versa. Once the free space in the dedicated block group gets tight, a relocating extent may not fit into the block group. In that case, we need to switch the dedicated block group to the next one. Then, the previous one is now freed up for allocating a regular extent. The BG is already not enough to allocate the relocating extent, but there is still room to allocate a smaller extent. Now the problem happens. By allocating a regular extent while nocow IOs for the relocation is still on-going, we will issue WRITE IOs (for relocation) and ZONE APPEND IOs (for the regular writes) at the same time. That mixed IOs confuses the write pointer and arises the unaligned write errors. This commit introduces a new bit 'zoned_data_reloc_ongoing' to the btrfs_block_group. We set this bit before releasing the dedicated block group, and no extent are allocated from a block group having this bit set. This bit is similar to setting block_group->ro, but is different from it by allowing nocow writes to start. Once all the nocow IO for relocation is done (hooked from btrfs_finish_ordered_io), we reset the bit to release the block group for further allocation. Fixes: `c2707a2556` ("btrfs: zoned: add a dedicated data relocation block group") CC: stable@vger.kernel.org # 5.16+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-06-21 14:43:48 +02:00
Filipe Manana	650c9caba3	btrfs: do not BUG_ON() on failure to migrate space when replacing extents At btrfs_replace_file_extents(), if we fail to migrate reserved metadata space from the transaction block reserve into the local block reserve, we trigger a BUG_ON(). This is because it should not be possible to have a failure here, as we reserved more space when we started the transaction than the space we want to migrate. However having a BUG_ON() is way too drastic, we can perfectly handle the failure and return the error to the caller. So just do that instead, and add a WARN_ON() to make it easier to notice the failure if it ever happens (which is particularly useful for fstests, and the warning will trigger a failure of a test case). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-06-21 14:43:27 +02:00
Filipe Manana	983d8209c6	btrfs: add missing inode updates on each iteration when replacing extents When replacing file extents, called during fallocate, hole punching, clone and deduplication, we may not be able to replace/drop all the target file extent items with a single transaction handle. We may get -ENOSPC while doing it, in which case we release the transaction handle, balance the dirty pages of the btree inode, flush delayed items and get a new transaction handle to operate on what's left of the target range. By dropping and replacing file extent items we have effectively modified the inode, so we should bump its iversion and update its mtime/ctime before we update the inode item. This is because if the transaction we used for partially modifying the inode gets committed by someone after we release it and before we finish the rest of the range, a power failure happens, then after mounting the filesystem our inode has an outdated iversion and mtime/ctime, corresponding to the values it had before we changed it. So add the missing iversion and mtime/ctime updates. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-06-21 14:43:21 +02:00
Filipe Manana	d4597898ba	btrfs: fix race between reflinking and ordered extent completion While doing a reflink operation, if an ordered extent for a file range that does not overlap with the source and destination ranges of the reflink operation happens, we can end up having a failure in the reflink operation and return -EINVAL to user space. The following sequence of steps explains how this can happen: 1) We have the page at file offset 315392 dirty (under delalloc); 2) A reflink operation for this file starts, using the same file as both source and destination, the source range is [372736, 409600) (length of 36864 bytes) and the destination range is [208896, 245760); 3) At btrfs_remap_file_range_prep(), we flush all delalloc in the source and destination ranges, and wait for any ordered extents in those range to complete; 4) Still at btrfs_remap_file_range_prep(), we then flush all delalloc in the inode, but we neither wait for it to complete nor any ordered extents to complete. This results in starting delalloc for the page at file offset 315392 and creating an ordered extent for that single page range; 5) We then move to btrfs_clone() and enter the loop to find file extent items to copy from the source range to destination range; 6) In the first iteration we end up at last file extent item stored in leaf A: (...) item 131 key (143616 108 315392) itemoff 5101 itemsize 53 extent data disk bytenr 1903988736 nr 73728 extent data offset 12288 nr 61440 ram 73728 This represents the file range [315392, 376832), which overlaps with the source range to clone. @datal is set to 61440, key.offset is 315392 and @next_key_min_offset is therefore set to 376832 (315392 + 61440). @off (372736) is > key.offset (315392), so @new_key.offset is set to the value of @destoff (208896). @new_key.offset == @last_dest_end (208896) so @drop_start is set to 208896 (@new_key.offset). @datal is adjusted to 4096, as @off is > @key.offset. So in this iteration we call btrfs_replace_file_extents() for the range [208896, 212991] (a single page, which is [@drop_start, @new_key.offset + @datal - 1]). @last_dest_end is set to 212992 (@new_key.offset + @datal = 208896 + 4096 = 212992). Before the next iteration of the loop, @key.offset is set to the value 376832, which is @next_key_min_offset; 7) On the second iteration btrfs_search_slot() leaves us again at leaf A, but this time pointing beyond the last slot of leaf A, as that's where a key with offset 376832 should be at if it existed. So end up calling btrfs_next_leaf(); 8) btrfs_next_leaf() releases the path, but before it searches again the tree for the next key/leaf, the ordered extent for the single page range at file offset 315392 completes. That results in trimming the file extent item we processed before, adjusting its key offset from 315392 to 319488, reducing its length from 61440 to 57344 and inserting a new file extent item for that single page range, with a key offset of 315392 and a length of 4096. Leaf A now looks like: (...) item 132 key (143616 108 315392) itemoff 4995 itemsize 53 extent data disk bytenr 1801666560 nr 4096 extent data offset 0 nr 4096 ram 4096 item 133 key (143616 108 319488) itemoff 4942 itemsize 53 extent data disk bytenr 1903988736 nr 73728 extent data offset 16384 nr 57344 ram 73728 9) When btrfs_next_leaf() returns, it gives us a path pointing to leaf A at slot 133, since it's the first key that follows what was the last key we saw (143616 108 315392). In fact it's the same item we processed before, but its key offset was changed, so it counts as a new key; 10) So now we have: @key.offset == 319488 @datal == 57344 @off (372736) is > key.offset (319488), so @new_key.offset is set to 208896 (@destoff value). @new_key.offset (208896) != @last_dest_end (212992), so @drop_start is set to 212992 (@last_dest_end value). @datal is adjusted to 4096 because @off > @key.offset. So in this iteration we call btrfs_replace_file_extents() for the invalid range of [212992, 212991] (which is [@drop_start, @new_key.offset + @datal - 1]). This range is empty, the end offset is smaller than the start offset so btrfs_replace_file_extents() returns -EINVAL, which we end up returning to user space and fail the reflink operation. This all happens because the range of this file extent item was already processed in the previous iteration. This scenario can be triggered very sporadically by fsx from fstests, for example with test case generic/522. So fix this by having btrfs_clone() skip file extent items that cover a file range that we have already processed. CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-06-21 14:43:13 +02:00
Steve French	73130a7b1a	smb3: fix empty netname context on secondary channels Some servers do not allow null netname contexts, which would cause multichannel to revert to single channel when mounting to some servers (e.g. Azure xSMB). Fixes: `4c14d7043f` ("cifs: populate empty hostnames for extra channels") Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-20 16:23:50 -05:00
Jens Axboe	1bacd264d3	io_uring: mark reissue requests with REQ_F_PARTIAL_IO If we mark for reissue, we assume that the buffer will remain stable. Hence if are using a provided buffer, we need to ensure that we stick with it for the duration of that request. This only affects block devices that use provided buffers, as those are the only ones that get marked with REQ_F_REISSUE. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-20 06:39:27 -06:00
Daeho Jeong	61803e9843	f2fs: fix iostat related lock protection Made iostat related locks safe to be called from irq context again. Cc: <stable@vger.kernel.org> Fixes: `a1e09b03e6` ("f2fs: use iomap for direct I/O") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Tested-by: Eddie Huang <eddie.huang@mediatek.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-06-19 15:16:12 -07:00
Jaegeuk Kim	4cde00d507	f2fs: attach inline_data after setting compression This fixes the below corruption. [345393.335389] F2FS-fs (vdb): sanity_check_inode: inode (ino=6d0, mode=33206) should not have inline_data, run fsck to fix Cc: <stable@vger.kernel.org> Fixes: `677a82b44e` ("f2fs: fix to do sanity check for inline inode") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-06-19 15:16:10 -07:00
Linus Torvalds	063232b6c4	Fixes for 5.19-rc3: - Fix a bug where inode flag changes would accidentally drop nrext64. - Fix a race condition when toggling LARP mode. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmKqyp4ACgkQ+H93GTRK tOtnURAAmJUASVXnixuuqRp8srbotuWc9EGJY+0/UFAfnfSlgasVeS1XB5bZ1CZP QhRYgDfPnuDvXwNrz3LHFL1ihll1whbJeXP2tYnCTolB8yFutk/xDLmwvXuRVR0y yzbbl6MtnHZ7SThhsXgUoJ3b0ItVxq8xN/0h1VVr0OI2zUryOR+Kd1c/G3VIPPZ6 ZXyigcdQFAqB1oB/f2D6yHIqtIZopS+kwtcMTBz0qr82Tvp4Vzh9OMCU6BwdtidG o/UIBSrliW8qgrXom5Asy5mmLCa3wou7JfQc176ADbG09XjxoL0djHF5ZcbpQT7i A3WRQwwsNPfTGmyukngk2rH9JoeVSzvhyXD2ArrLJB/Ra097reXpsH0ABm63ova3 YV8sX8BCoTjNzoN+abHq9jXxfcLaesJyZKfm6wU1bJ/0nkSYnGqwI9tWii18lRUQ GuVEShDMJAIUYWo2ysmm1fRhNM7I9+kE8ZprNBuUnK3ej9efZQPV20uOzqDI7H0Z 6IW1JKHZr4WHAHeymkl8AHKt6U6+tCBjSUT/CGlfph+NNvytd2XvvEAIW5oFMEvA fMvYSnuk40tb6LpBGQcXxRjl14BvgBgc2omkVZuJf1X3rkg7i6U9zJv9rp87CBhl PnEnLvDa86KHxmq2Jxs1rh0LYu2OzCNGsoxICf8w4mloZmEFIqA= =vvDX -----END PGP SIGNATURE----- Merge tag 'xfs-5.19-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: "There's not a whole lot this time around (I'm still on vacation) but here are some important fixes for new features merged in -rc1: - Fix a bug where inode flag changes would accidentally drop nrext64 - Fix a race condition when toggling LARP mode" * tag 'xfs-5.19-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: preserve DIFLAG2_NREXT64 when setting other inode attributes xfs: fix variable state usage xfs: fix TOCTOU race involving the new logged xattrs control knob	2022-06-19 09:24:49 -05:00
Linus Torvalds	354c6e071b	Fix a variety of bugs, many of which were found by folks using fuzzing or error injection. Also fix up how test_dummy_encryption mount option is handled for the new mount API. Finally, fix/cleanup a number of comments and ext4 Documentation files. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmKuYpcACgkQ8vlZVpUN gaMXwwf8DSHJ3gI2Lo0wrzJm7KSS0C+HK29/rtLCZdxECQsZR156ZzSF3zAFKOwK Yx3RJwiFxrciUUytY/MWTyalCk+M8oW1093SfRqNNZCbZNi33acnbTqioa7INnDw snFGGEU1y0M0AUduxNWPr71P80sTyQa0ZplIc4YeR98zzMvoWgi1dvo4wNdtJNQb Gb0FtBhgP+IeK50eBlK4O0Eg5kqd0V5OeTLUYUfsWqU28ap8dHYE48I6sIdHx6az sa6b2+YRuBxJUV61FNujuVtkDgUHXtXM97kkGpywRSLjo4iFxlQvX9Ew4lBD9RDI b0YHVzK/DU9M3VfiYgzGwShCb/M68w== =NtNY -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix a variety of bugs, many of which were found by folks using fuzzing or error injection. Also fix up how test_dummy_encryption mount option is handled for the new mount API. Finally, fix/cleanup a number of comments and ext4 Documentation files" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix a doubled word "need" in a comment ext4: add reserved GDT blocks check ext4: make variable "count" signed ext4: correct the judgment of BUG in ext4_mb_normalize_request ext4: fix bug_on ext4_mb_use_inode_pa ext4: fix up test_dummy_encryption handling for new mount API ext4: use kmemdup() to replace kmalloc + memcpy ext4: fix super block checksum incorrect after mount ext4: improve write performance with disabled delalloc ext4: fix warning when submitting superblock in ext4_commit_super() ext4, doc: remove unnecessary escaping ext4: fix incorrect comment in ext4_bio_write_page() fs: fix jbd2_journal_try_to_free_buffers() kernel-doc comment	2022-06-18 21:51:12 -05:00
Linus Torvalds	ace2045ed5	2 smb3 debugging improvements -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmKuOPcACgkQiiy9cAdy T1FNoAv/VZwWl1J5iFVAbZhLAt/LhkL/1Ee8TeMRxa7QExifBJ4latsi1duOXBnR bRQ+lFuDmg1cuma4aayH7bHGnZZMEoZku0bpj/h8MOTf/w+GLIUUH/0LSEOi1klz nmj3fbJ4TMF/rA0Elsz4/iJIZhka3QbTAS3y7l9SlsLlgktoKJuZpEEuRgFsYNEW zdQMbb7q53L2txDDZAnR5TqesDgzeXePnvVRZDPAar8HnYrOg4sC6ueqxJtUKKBP TcC/2956tXHqd+5EYyH2Vuspf38dGxYs5qIhsMokRoMx42dAQ824JeuFy+D7eps6 /hwDp+U1XIdllQW7qVD8MZ5CzIZlFKTZGu/B4Uh7GAtzluIAFyayGhVcDdj7LFVV fEaR8N9og9DEAmqUhsKLBZM656lhpu38cOslpGqNw0gCSZNxLyyp1hNkXrVlYv9L SwclZjoQbOBMPGriyv0h6rSaNoR+J7hps8cpW/eVnXMC5VNnsrXM+EYPJbu8EWYL SLJKZp6g =oBRl -----END PGP SIGNATURE----- Merge tag '5.19-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs client fixes from Steve French: "Two cifs debugging improvements - one found to deal with debugging a multichannel problem and one for a recent fallocate issue This does include the two larger multichannel reconnect (dynamically adjusting interfaces on reconnect) patches, because we recently found an additional problem with multichannel to one server type that I want to include at the same time" * tag '5.19-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: when a channel is not found for server, log its connection id smb3: add trace point for SMB2_set_eof	2022-06-18 21:44:44 -05:00
Xiang wangx	1f3ddff375	ext4: fix a doubled word "need" in a comment Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Link: https://lore.kernel.org/r/20220605091503.12513-1-wangxiang@cdjrlc.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-06-18 19:36:20 -04:00
Zhang Yi	b55c3cd102	ext4: add reserved GDT blocks check We capture a NULL pointer issue when resizing a corrupt ext4 image which is freshly clear resize_inode feature (not run e2fsck). It could be simply reproduced by following steps. The problem is because of the resize_inode feature was cleared, and it will convert the filesystem to meta_bg mode in ext4_resize_fs(), but the es->s_reserved_gdt_blocks was not reduced to zero, so could we mistakenly call reserve_backup_gdb() and passing an uninitialized resize_inode to it when adding new group descriptors. mkfs.ext4 /dev/sda 3G tune2fs -O ^resize_inode /dev/sda #forget to run requested e2fsck mount /dev/sda /mnt resize2fs /dev/sda 8G ======== BUG: kernel NULL pointer dereference, address: 0000000000000028 CPU: 19 PID: 3243 Comm: resize2fs Not tainted 5.18.0-rc7-00001-gfde086c5ebfd #748 ... RIP: 0010:ext4_flex_group_add+0xe08/0x2570 ... Call Trace: <TASK> ext4_resize_fs+0xbec/0x1660 __ext4_ioctl+0x1749/0x24e0 ext4_ioctl+0x12/0x20 __x64_sys_ioctl+0xa6/0x110 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f2dd739617b ======== The fix is simple, add a check in ext4_resize_begin() to make sure that the es->s_reserved_gdt_blocks is zero when the resize_inode feature is disabled. Cc: stable@kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220601092717.763694-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-06-18 19:36:08 -04:00
Ding Xiang	bc75a6eb85	ext4: make variable "count" signed Since dx_make_map() may return -EFSCORRUPTED now, so change "count" to be a signed integer so we can correctly check for an error code returned by dx_make_map(). Fixes: `46c116b920` ("ext4: verify dir block before splitting it") Cc: stable@kernel.org Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20220530100047.537598-1-dingxiang@cmss.chinamobile.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-06-18 19:35:57 -04:00
Baokun Li	cf4ff938b4	ext4: correct the judgment of BUG in ext4_mb_normalize_request ext4_mb_normalize_request() can move logical start of allocated blocks to reduce fragmentation and better utilize preallocation. However logical block requested as a start of allocation (ac->ac_o_ex.fe_logical) should always be covered by allocated blocks so we should check that by modifying and to or in the assertion. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20220528110017.354175-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-06-18 19:35:57 -04:00
Baokun Li	a08f789d2a	ext4: fix bug_on ext4_mb_use_inode_pa Hulk Robot reported a BUG_ON: ================================================================== kernel BUG at fs/ext4/mballoc.c:3211! [...] RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f [...] Call Trace: ext4_mb_new_blocks+0x9df/0x5d30 ext4_ext_map_blocks+0x1803/0x4d80 ext4_map_blocks+0x3a4/0x1a10 ext4_writepages+0x126d/0x2c30 do_writepages+0x7f/0x1b0 __filemap_fdatawrite_range+0x285/0x3b0 file_write_and_wait_range+0xb1/0x140 ext4_sync_file+0x1aa/0xca0 vfs_fsync_range+0xfb/0x260 do_fsync+0x48/0xa0 [...] ================================================================== Above issue may happen as follows: ------------------------------------- do_fsync vfs_fsync_range ext4_sync_file file_write_and_wait_range __filemap_fdatawrite_range do_writepages ext4_writepages mpage_map_and_submit_extent mpage_map_one_extent ext4_map_blocks ext4_mb_new_blocks ext4_mb_normalize_request >>> start + size <= ac->ac_o_ex.fe_logical ext4_mb_regular_allocator ext4_mb_simple_scan_group ext4_mb_use_best_found ext4_mb_new_preallocation ext4_mb_new_inode_pa ext4_mb_use_inode_pa >>> set ac->ac_b_ex.fe_len <= 0 ext4_mb_mark_diskspace_used >>> BUG_ON(ac->ac_b_ex.fe_len <= 0); we can easily reproduce this problem with the following commands: `fallocate -l100M disk` `mkfs.ext4 -b 1024 -g 256 disk` `mount disk /mnt` `fsstress -d /mnt -l 0 -n 1000 -p 1` The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP. Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur when the size is truncated. So start should be the start position of the group where ac_o_ex.fe_logical is located after alignment. In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP is very large, the value calculated by start_off is more accurate. Cc: stable@kernel.org Fixes: `cd648b8a8f` ("ext4: trim allocation requests to group size") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20220528110017.354175-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-06-18 19:35:43 -04:00
Eric Biggers	85456054e1	ext4: fix up test_dummy_encryption handling for new mount API Since ext4 was converted to the new mount API, the test_dummy_encryption mount option isn't being handled entirely correctly, because the needed fscrypt_set_test_dummy_encryption() helper function combines parsing/checking/applying into one function. That doesn't work well with the new mount API, which split these into separate steps. This was sort of okay anyway, due to the parsing logic that was copied from fscrypt_set_test_dummy_encryption() into ext4_parse_param(), combined with an additional check in ext4_check_test_dummy_encryption(). However, these overlooked the case of changing the value of test_dummy_encryption on remount, which isn't allowed but ext4 wasn't detecting until ext4_apply_options() when it's too late to fail. Another bug is that if test_dummy_encryption was specified multiple times with an argument, memory was leaked. Fix this up properly by using the new helper functions that allow splitting up the parse/check/apply steps for test_dummy_encryption. Fixes: `cebe85d570` ("ext4: switch to the new mount api") Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220526040412.173025-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-06-18 19:35:43 -04:00
Shuqi Zhang	4efd9f0d12	ext4: use kmemdup() to replace kmalloc + memcpy Replace kmalloc + memcpy with kmemdup() Signed-off-by: Shuqi Zhang <zhangshuqi3@huawei.com> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20220525030120.803330-1-zhangshuqi3@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-06-18 19:35:43 -04:00
Ye Bin	9b6641dd95	ext4: fix super block checksum incorrect after mount We got issue as follows: [home]# mount /dev/sda test EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended [home]# dmesg EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended EXT4-fs (sda): Errors on filesystem, clearing orphan list. EXT4-fs (sda): recovery complete EXT4-fs (sda): mounted filesystem with ordered data mode. Quota mode: none. [home]# debugfs /dev/sda debugfs 1.46.5 (30-Dec-2021) Checksum errors in superblock! Retrying... Reason is ext4_orphan_cleanup will reset ‘s_last_orphan’ but not update super block checksum. To solve above issue, defer update super block checksum after ext4_orphan_cleanup. Signed-off-by: Ye Bin <yebin10@huawei.com> Cc: stable@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20220525012904.1604737-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-06-18 19:35:24 -04:00
Shyam Prasad N	5d24968f5b	cifs: when a channel is not found for server, log its connection id cifs_ses_get_chan_index gets the index for a given server pointer. When a match is not found, we warn about a possible bug. However, printing details about the non-matching server could be more useful to debug here. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-18 14:55:06 -05:00
Xiang wangx	93a8c044b9	tracefs: Fix syntax errors in comments Delete the redundant word 'to'. Link: https://lkml.kernel.org/r/20220605092729.13010-1-wangxiang@cdjrlc.com Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2022-06-17 19:01:28 -04:00
Linus Torvalds	4b35035bcf	NFS Client Fixes for Linux 5.19-rc - Bugfixes: - Add FMODE_CAN_ODIRECT support to NFSv4 so opens don't fail - Fix trunking detection & cl_max_connect setting - Avoid pnfs_update_layout() livelocks - Don't keep retrying pNFS if the server replies with NFS4ERR_UNAVAILABLE -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmKs2ZwACgkQ18tUv7Cl QOsnGxAA6g0M7IU6g375rfqxq9a/XqSlvwIeuSOb14WNybh7D6qXOa+iInsHFIl7 d7coORg846SOdUl218hVcp8Ba1OTsj/XAruJllsrHQnB50raBJ/nUbIbQlrxGYKn WzMtVnyLQ8Ml+rvERINPqVcgUBJ0PGWMiRi/h8OcUlWylV5ZI/irpkaZiuXamzhe O0Wa04N/82bUU03dEmQ0ZuPuhMn5JbOMaSzciRvHEV8nLvqvRAhGIVBtvrrYF0VB UfZx/4DTRXDD5/RA65hX2vgixZh7/cLTv4pr26wmfDBofo1zDiFsQpPS8QaZo5bt Sw+UQK1c15kW/EJS+au90mazBmFnk5UX2BOyfN+Cg5/GlHjE0YHV1f2ejbHycsyh Rcsu8nxNa5T82mg2EjOCqK2YWy8mGHYr5MJTYftL/uE8NdqP/DSgNpqNSQhQD2Bm vzsG0wP5RP+i3pRWWQnXOlZE+GdaKxtXKtg2ZjHx1Wkb2QIUbgbxST59q/U5QnIN MJKS1nhbxQA0dGo3ClzYNe76S9It7DDE0A3mdvIzPwRSheQhgmF4UlzyTWgSdyfw lnT5EK3pQat6cvdaszjMn1f6vx4BvTuhXUE9eH35opVMYykvAU6hz4ypllxKoR/h BES6KMJPKXn77ICJJVC3RR5w2v756DRNMeOfkjCi18TiuTVFXek= =nTJk -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client fixes from Anna Schumaker: - Add FMODE_CAN_ODIRECT support to NFSv4 so opens don't fail - Fix trunking detection & cl_max_connect setting - Avoid pnfs_update_layout() livelocks - Don't keep retrying pNFS if the server replies with NFS4ERR_UNAVAILABLE * tag 'nfs-for-5.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4: Add FMODE_CAN_ODIRECT after successful open of a NFS4.x file sunrpc: set cl_max_connect when cloning an rpc_clnt pNFS: Avoid a live lock condition in pnfs_update_layout() pNFS: Don't keep retrying if the server replied NFS4ERR_LAYOUTUNAVAILABLE	2022-06-17 15:17:57 -05:00
Linus Torvalds	f8e174c307	io_uring-5.19-2022-06-16 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKsc6oQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpgGxD/9YB9O3Dw2WOlzE+bnbadDEL0/XdaMVSZQX t5pfz1YTBUf/KgF+HWo6cGgvNupNjo6a2FAGJiGIXaEx2lZbKw7gUEPXohqY18h6 alLPzt881whWESXTjpsDtc57PfCVY/K5/5ebqN5AhoXCgtl6CvlePJZH8uzBMq5F vGjcBgdofum677uNPSEpn6AzIGVtd9jI6Rg8r/a4iRdzeAJlkp1ifVh424qGgWtQ fQuoV83EPut/RTUodXZwJ/2XrdJwNDex98LEmp1Pi78IprGawrQ5F9JzsypQR2ie 8ajLe6xn4wiXuWFr3pE9paow3c1APuftJ/PRXqBHoh2X6sMI4G2B2UNDkKrlK6DD 9r5INcKzpMY390nN6GnSD1BSWBGNuglu9mASXDKFXL/JK+XNi6nYlaXdPn4uAhyR Cp41xx3gGf3r8aq8Pv+YNRej3kpNSi8oHKhYPToxn+EwPX8TpTdexnQC4ZKWNMbZ Mg1hY5Z0NxuhEyvKlTXZmOF8dlf2dTZYJoqHHeYhvcoZT9dWwjrINXqJvqsCyywB 2fPOPjdn1SuBwsugSkYkMlsbLm4rlyLCLnEL2SgcbzyQ2rubN5UFcp3ouJOEt5Nz HDZi4s7LBOZTGmnmtev5GOA7kDCQ2EqOcRZQOdWPSa5g5pOL11ahxRW0KESSsPik 1pTBDjTfxg== =55JE -----END PGP SIGNATURE----- Merge tag 'io_uring-5.19-2022-06-16' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Bigger than usual at this time, both because we missed -rc2, but also because of some reverts that we chose to do. In detail: - Adjust mapped buffer API while we still can (Dylan) - Mapped buffer fixes (Dylan, Hao, Pavel, me) - Fix for uring_cmd wrong API usage for task_work (Dylan) - Fix for bug introduced in fixed file closing (Hao) - Fix race in buffer/file resource handling (Pavel) - Revert the NOP support for CQE32 and buffer selection that was brought up during the merge window (Pavel) - Remove IORING_CLOSE_FD_AND_FILE_SLOT introduced in this merge window. The API needs further refining, so just yank it for now and we'll revisit for a later kernel. - Series cleaning up the CQE32 support added in this merge window, making it more integrated rather than sitting on the side (Pavel)" * tag 'io_uring-5.19-2022-06-16' of git://git.kernel.dk/linux-block: (21 commits) io_uring: recycle provided buffer if we punt to io-wq io_uring: do not use prio task_work_add in uring_cmd io_uring: commit non-pollable provided mapped buffers upfront io_uring: make io_fill_cqe_aux honour CQE32 io_uring: remove __io_fill_cqe() helper io_uring: fix ->extra{1,2} misuse io_uring: fill extra big cqe fields from req io_uring: unite fill_cqe and the 32B version io_uring: get rid of __io_fill_cqe{32}_req() io_uring: remove IORING_CLOSE_FD_AND_FILE_SLOT Revert "io_uring: add buffer selection support to IORING_OP_NOP" Revert "io_uring: support CQE32 for nop operation" io_uring: limit size of provided buffer ring io_uring: fix types in provided buffer ring io_uring: fix index calculation io_uring: fix double unlock for pbuf select io_uring: kbuf: fix bug of not consuming ring buffer in partial io case io_uring: openclose: fix bug of closing wrong fixed file io_uring: fix not locked access to fixed buf table io_uring: fix races with buffer table unregister ...	2022-06-17 11:14:07 -07:00
Linus Torvalds	5c0cd3d4a9	Merge tag 'fs_for_v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull writeback and ext2 fixes from Jan Kara: "A fix for writeback bug which prevented machines with kdevtmpfs from booting and also one small ext2 bugfix in IO error handling" * tag 'fs_for_v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: init: Initialize noop_backing_dev_info early ext2: fix fs corruption when trying to remove a non-empty directory with IO error	2022-06-17 10:09:24 -07:00
Jens Axboe	6436c770f1	io_uring: recycle provided buffer if we punt to io-wq io_arm_poll_handler() will recycle the buffer appropriately if we end up arming poll (or if we're ready to retry), but not for the io-wq case if we have attempted poll first. Explicitly recycle the buffer to avoid both hanging on to it too long, but also to avoid multiple reads grabbing the same one. This can happen for ring mapped buffers, since it hasn't necessarily been committed. Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Link: https://github.com/axboe/liburing/issues/605 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-17 06:24:26 -06:00
Mike Kravetz	68d32527d3	hugetlbfs: zero partial pages during fallocate hole punch hugetlbfs fallocate support was originally added with commit `70c3547e36` ("hugetlbfs: add hugetlbfs_fallocate()"). Initial support only operated on whole hugetlb pages. This makes sense for populating files as other interfaces such as mmap and truncate require hugetlb page size alignment. Only operating on whole hugetlb pages for the hole punch case was a simplification and there was no compelling use case to zero partial pages. In a recent discussion[1] it was assumed that hugetlbfs hole punch would zero partial hugetlb pages as that is in line with the man page description saying 'partial filesystem blocks are zeroed'. However, the hugetlbfs hole punch code actually does this: hole_start = round_up(offset, hpage_size); hole_end = round_down(offset + len, hpage_size); Modify code to zero partial hugetlb pages in hole punch range. It is possible that application code could note a change in behavior. However, that would imply the code is passing in an unaligned range and expecting only whole pages be removed. This is unlikely as the fallocate documentation states the opposite. The current hugetlbfs fallocate hole punch behavior is tested with the libhugetlbfs test fallocate_align[2]. This test will be updated to validate partial page zeroing. [1] https://lore.kernel.org/linux-mm/20571829-9d3d-0b48-817c-b6b15565f651@redhat.com/ [2] https://github.com/libhugetlbfs/libhugetlbfs/blob/master/tests/fallocate_align.c Link: https://lkml.kernel.org/r/YqeiMlZDKI1Kabfe@monkey Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: David Hildenbrand <david@redhat.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-06-16 19:11:32 -07:00
Steve French	7c05eae8db	smb3: add trace point for SMB2_set_eof In order to debug problems with file size being reported incorrectly temporarily (in this case xfstest generic/584 intermittent failure) we need to add trace point for the non-compounded code path where we set the file size (SMB2_set_eof). The new trace point is: "smb3_set_eof" Here is sample output from the tracepoint: TASK-PID CPU# \|\|\|\|\| TIMESTAMP FUNCTION \| \| \| \|\|\|\|\| \| \| xfs_io-75403 [002] ..... 95219.189835: smb3_set_eof: xid=221 sid=0xeef1cbd2 tid=0x27079ee6 fid=0x52edb58c offset=0x100000 aio-dio-append--75418 [010] ..... 95219.242402: smb3_set_eof: xid=226 sid=0xeef1cbd2 tid=0x27079ee6 fid=0xae89852d offset=0x0 Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-16 18:07:10 -05:00
Dominique Martinet	b0017602fd	9p: fix EBADF errors in cached mode cached operations sometimes need to do invalid operations (e.g. read on a write only file) Historic fscache had added a "writeback fid", a special handle opened RW as root, for this. The conversion to new fscache missed that bit. This commit reinstates a slightly lesser variant of the original code that uses the writeback fid for partial pages backfills if the regular user fid had been open as WRONLY, and thus would lack read permissions. Link: https://lkml.kernel.org/r/20220614033802.1606738-1-asmadeus@codewreck.org Fixes: `eb497943fa` ("9p: Convert to using the netfs helper lib to do reads and caching") Cc: stable@vger.kernel.org Cc: David Howells <dhowells@redhat.com> Reported-By: Christian Schoenebeck <linux_oss@crudebyte.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Tested-by: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-06-17 06:03:30 +09:00
Jan Kara	8d5459c11f	ext4: improve write performance with disabled delalloc When delayed allocation is disabled (either through mount option or because we are running low on free space), ext4_write_begin() allocates blocks with EXT4_GET_BLOCKS_IO_CREATE_EXT flag. With this flag extent merging is disabled and since ext4_write_begin() is called for each page separately, we end up with a lot of 1 block extents in the extent tree and following writeback is writing 1 block at a time which results in very poor write throughput (4 MB/s instead of 200 MB/s). These days when ext4_get_block_unwritten() is used only by ext4_write_begin(), ext4_page_mkwrite() and inline data conversion, we can safely allow extent merging to happen from these paths since following writeback will happen on different boundaries anyway. So use EXT4_GET_BLOCKS_CREATE_UNRIT_EXT instead which restores the performance. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220520111402.4252-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-06-16 12:17:56 -04:00
Zhang Yi	15baa7dcad	ext4: fix warning when submitting superblock in ext4_commit_super() We have already check the io_error and uptodate flag before submitting the superblock buffer, and re-set the uptodate flag if it has been failed to write out. But it was lockless and could be raced by another ext4_commit_super(), and finally trigger '!uptodate' WARNING when marking buffer dirty. Fix it by submit buffer directly. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20220520023216.3065073-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-06-16 11:50:48 -04:00
Dylan Yudaken	32fc810b36	io_uring: do not use prio task_work_add in uring_cmd io_req_task_prio_work_add has a strict assumption that it will only be used with io_req_task_complete. There is a codepath that assumes this is the case and will not even call the completion function if it is hit. For uring_cmd with an arbitrary completion function change the call to the correct non-priority version. Fixes: `ee692a21e9` ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Dylan Yudaken <dylany@fb.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20220616135011.441980-1-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-16 09:10:26 -06:00
Wang Jianjian	48e02e6113	ext4: fix incorrect comment in ext4_bio_write_page() Signed-off-by: Wang Jianjian <wangjianjian3@huawei.com> Link: https://lore.kernel.org/r/20220520022255.2120576-1-wangjianjian3@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-06-16 11:03:16 -04:00
Yang Li	4f5bf12732	fs: fix jbd2_journal_try_to_free_buffers() kernel-doc comment Add the description of @folio and remove @page in function kernel-doc comment to remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/jbd2/transaction.c:2149: warning: Function parameter or member 'folio' not described in 'jbd2_journal_try_to_free_buffers' fs/jbd2/transaction.c:2149: warning: Excess function parameter 'page' description in 'jbd2_journal_try_to_free_buffers' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20220512075432.31763-1-yang.lee@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2022-06-16 10:36:09 -04:00
Jens Axboe	a76c0b31ee	io_uring: commit non-pollable provided mapped buffers upfront For recv/recvmsg, IO either completes immediately or gets queued for a retry. This isn't the case for read/readv, if eg a normal file or a block device is used. Here, an operation can get queued with the block layer. If this happens, ring mapped buffers must get committed immediately to avoid that the next read can consume the same buffer. Check if we're dealing with pollable file, when getting a new ring mapped provided buffer. If it's not, commit it immediately rather than wait post issue. If we don't wait, we can race with completions coming in, or just plain buffer reuse by committing after a retry where others could have grabbed the same buffer. Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-16 07:14:44 -06:00
Ye Bin	27cfa25895	ext2: fix fs corruption when trying to remove a non-empty directory with IO error We got issue as follows: [home]# mount /dev/sdd test [home]# cd test [test]# ls dir1 lost+found [test]# rmdir dir1 ext2_empty_dir: inject fault [test]# ls lost+found [test]# cd .. [home]# umount test [home]# fsck.ext2 -fn /dev/sdd e2fsck 1.42.9 (28-Dec-2013) Pass 1: Checking inodes, blocks, and sizes Inode 4065, i_size is 0, should be 1024. Fix? no Pass 2: Checking directory structure Pass 3: Checking directory connectivity Unconnected directory inode 4065 (/???) Connect to /lost+found? no '..' in ... (4065) is / (2), should be <The NULL inode> (0). Fix? no Pass 4: Checking reference counts Inode 2 ref count is 3, should be 4. Fix? no Inode 4065 ref count is 2, should be 3. Fix? no Pass 5: Checking group summary information /dev/sdd: ******** WARNING: Filesystem still has errors ******** /dev/sdd: 14/128016 files (0.0% non-contiguous), 18477/512000 blocks Reason is same with commit `7aab5c84a0`. We can't assume directory is empty when read directory entry failed. Link: https://lore.kernel.org/r/20220615090010.1544152-1-yebin10@huawei.com Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz>	2022-06-16 10:55:45 +02:00
Darrick J. Wong	e89ab76d7e	xfs: preserve DIFLAG2_NREXT64 when setting other inode attributes It is vitally important that we preserve the state of the NREXT64 inode flag when we're changing the other flags2 fields. Fixes: `9b7d16e34b` ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>	2022-06-15 23:13:33 -07:00
Darrick J. Wong	10930b254d	xfs: fix variable state usage The variable @args is fed to a tracepoint, and that's the only place it's used. This is fine for the kernel, but for userspace, tracepoints are #define'd out of existence, which results in this warning on gcc 11.2: xfs_attr.c: In function ‘xfs_attr_node_try_addname’: xfs_attr.c:1440:42: warning: unused variable ‘args’ [-Wunused-variable] 1440 \| struct xfs_da_args *args = attr->xattri_da_args; \| ^~~~ Clean this up. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>	2022-06-15 23:13:32 -07:00
Darrick J. Wong	f4288f0182	xfs: fix TOCTOU race involving the new logged xattrs control knob I found a race involving the larp control knob, aka the debugging knob that lets developers enable logging of extended attribute updates: Thread 1 Thread 2 echo 0 > /sys/fs/xfs/debug/larp setxattr(REPLACE) xfs_has_larp (returns false) xfs_attr_set echo 1 > /sys/fs/xfs/debug/larp xfs_attr_defer_replace xfs_attr_init_replace_state xfs_has_larp (returns true) xfs_attr_init_remove_state <oops, wrong DAS state!> This isn't a particularly severe problem right now because xattr logging is only enabled when CONFIG_XFS_DEBUG=y, and developers should know what they're doing. However, the eventual intent is that callers should be able to ask for the assistance of the log in persisting xattr updates. This capability might not be required for /all/ callers, which means that dynamic control must work correctly. Once an xattr update has decided whether or not to use logged xattrs, it needs to stay in that mode until the end of the operation regardless of what subsequent parallel operations might do. Therefore, it is an error to continue sampling xfs_globals.larp once xfs_attr_change has made a decision about larp, and it was not correct for me to have told Allison that ->create_intent functions can sample the global log incompat feature bitfield to decide to elide a log item. Instead, create a new op flag for the xfs_da_args structure, and convert all other callers of xfs_has_larp and xfs_sb_version_haslogxattrs within the attr update state machine to look for the operations flag. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>	2022-06-15 23:13:32 -07:00
Dave Wysochanski	5ee3d10f84	NFSv4: Add FMODE_CAN_ODIRECT after successful open of a NFS4.x file Commit `a2ad63daa8` ("VFS: add FMODE_CAN_ODIRECT file flag") added the FMODE_CAN_ODIRECT flag for NFSv3 but neglected to add it for NFSv4.x. This causes direct io on NFSv4.x to fail open with EINVAL: mount -o vers=4.2 127.0.0.1:/export /mnt/nfs4 dd if=/dev/zero of=/mnt/nfs4/file.bin bs=128k count=1 oflag=direct dd: failed to open '/mnt/nfs4/file.bin': Invalid argument dd of=/dev/null if=/mnt/nfs4/file.bin bs=128k count=1 iflag=direct dd: failed to open '/mnt/dir1/file1.bin': Invalid argument Fixes: `a2ad63daa8` ("VFS: add FMODE_CAN_ODIRECT file flag") Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2022-06-15 15:03:12 -04:00
Linus Torvalds	979086f5e0	fs.fixes.v5.19-rc3 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYqmpKwAKCRCRxhvAZXjc ogvLAQCsgqKYjmqx1s9ta8PXH9qiTWLQh1/s3ONCAvSBe0rYRAD9HPwbUoxguqxr T2RzjuX2+rqzA5qTErjQqVEftn7DgAo= =+P6m -----END PGP SIGNATURE----- Merge tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull vfs idmapping fix from Christian Brauner: "This fixes an issue where we fail to change the group of a file when the caller owns the file and is a member of the group to change to. This is only relevant on idmapped mounts. There's a detailed description in the commit message and regression tests have been added to xfstests" * tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: fs: account for group membership	2022-06-15 09:04:55 -07:00
Pavel Begunkov	c5595975b5	io_uring: make io_fill_cqe_aux honour CQE32 Don't let io_fill_cqe_aux() post 16B cqes for CQE32 rings, neither the kernel nor the userspace expect this to happen. Fixes: `76c68fbf1a` ("io_uring: enable CQE32") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/64fae669fae1b7083aa15d0cd807f692b0880b9a.1655287457.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-15 05:06:56 -06:00
Pavel Begunkov	cd94903d3b	io_uring: remove __io_fill_cqe() helper In preparation for the following patch, inline __io_fill_cqe(), there is only one user. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/71dab9afc3cde3f8b64d26f20d3b60bdc40726ff.1655287457.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-15 05:06:42 -06:00
Pavel Begunkov	2caf9822f0	io_uring: fix ->extra{1,2} misuse We don't really know the state of req->extra{1,2] fields in __io_fill_cqe_req(), if an opcode handler is not aware of CQE32 option, it never sets them up properly. Track the state of those fields with a request flag. Fixes: `76c68fbf1a` ("io_uring: enable CQE32") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4b3e5be512fbf4debec7270fd485b8a3b014d464.1655287457.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-15 05:06:09 -06:00
Pavel Begunkov	29ede2014c	io_uring: fill extra big cqe fields from req The only user of io_req_complete32()-like functions is cmd requests. Instead of keeping the whole complete32 family, remove them and provide the extras in already added for inline completions req->extra{1,2}. When fill_cqe_res() finds CQE32 option enabled it'll use those fields to fill a 32B cqe. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/af1319eb661b1f9a0abceb51cbbf72b8002e019d.1655287457.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-15 05:06:09 -06:00
Pavel Begunkov	f43de1f888	io_uring: unite fill_cqe and the 32B version We want just one function that will handle both normal cqes and 32B cqes. Combine __io_fill_cqe_req() and __io_fill_cqe_req32(). It's still not entirely correct yet, but saves us from cases when we fill an CQE of a wrong size. Fixes: `76c68fbf1a` ("io_uring: enable CQE32") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8085c5b2f74141520f60decd45334f87e389b718.1655287457.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-15 05:06:09 -06:00
Pavel Begunkov	91ef75a7db	io_uring: get rid of __io_fill_cqe{32}_req() There are too many cqe filling helpers, kill __io_fill_cqe{32}_req(), use __io_fill_cqe{32}_req_filled() instead, and then rename it. It'll simplify fixing in following patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c18e0d191014fb574f24721245e4e3fddd0b6917.1655287457.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-15 05:06:09 -06:00
Tyler Hicks	2a3dcbccd6	9p: Fix refcounting during full path walks for fid lookups Decrement the refcount of the parent dentry's fid after walking each path component during a full path walk for a lookup. Failure to do so can lead to fids that are not clunked until the filesystem is unmounted, as indicated by this warning: 9pnet: found fid 3 not clunked The improper refcounting after walking resulted in open(2) returning -EIO on any directories underneath the mount point when using the virtio transport. When using the fd transport, there's no apparent issue until the filesytem is unmounted and the warning above is emitted to the logs. In some cases, the user may not yet be attached to the filesystem and a new root fid, associated with the user, is created and attached to the root dentry before the full path walk is performed. Increment the new root fid's refcount to two in that situation so that it can be safely decremented to one after it is used for the walk operation. The new fid will still be attached to the root dentry when v9fs_fid_lookup_with_uid() returns so a final refcount of one is correct/expected. Link: https://lkml.kernel.org/r/20220527000003.355812-2-tyhicks@linux.microsoft.com Link: https://lkml.kernel.org/r/20220612085330.1451496-4-asmadeus@codewreck.org Fixes: `6636b6dcc3` ("9p: add refcount to p9_fid struct") Cc: stable@vger.kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> [Dominique: fix clunking fid multiple times discussed in second link] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-06-15 12:05:33 +09:00
Dominique Martinet	e5690f2632	9p: fix fid refcount leak in v9fs_vfs_get_link we check for protocol version later than required, after a fid has been obtained. Just move the version check earlier. Link: https://lkml.kernel.org/r/20220612085330.1451496-3-asmadeus@codewreck.org Fixes: `6636b6dcc3` ("9p: add refcount to p9_fid struct") Cc: stable@vger.kernel.org Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-06-15 12:05:29 +09:00
Dominique Martinet	beca774fc5	9p: fix fid refcount leak in v9fs_vfs_atomic_open_dotl We need to release directory fid if we fail halfway through open This fixes fid leaking with xfstests generic 531 Link: https://lkml.kernel.org/r/20220612085330.1451496-2-asmadeus@codewreck.org Fixes: `6636b6dcc3` ("9p: add refcount to p9_fid struct") Cc: stable@vger.kernel.org Reported-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-06-15 12:05:21 +09:00
Pavel Begunkov	d884b6498d	io_uring: remove IORING_CLOSE_FD_AND_FILE_SLOT This partially reverts `a7c41b4687` Even though IORING_CLOSE_FD_AND_FILE_SLOT might save cycles for some users, but it tries to do two things at a time and it's not clear how to handle errors and what to return in a single result field when one part fails and another completes well. Kill it for now. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/837c745019b3795941eee4fcfd7de697886d645b.1655224415.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-14 10:57:40 -06:00
Pavel Begunkov	aa165d6d2b	Revert "io_uring: add buffer selection support to IORING_OP_NOP" This reverts commit `3d200242a6`. Buffer selection with nops was used for debugging and benchmarking but is useless in real life. Let's revert it before it's released. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c5012098ca6b51dfbdcb190f8c4e3c0bf1c965dc.1655224415.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-14 10:57:40 -06:00
Pavel Begunkov	8899ce4b2f	Revert "io_uring: support CQE32 for nop operation" This reverts commit `2bb04df7c2`. CQE32 nops were used for debugging and benchmarking but it doesn't target any real use case. Revert it, we can return it back if someone finds a good way to use it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5ff623d84ccb4b3f3b92a3ea41cdcfa612f3d96f.1655224415.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-14 10:57:40 -06:00
Christian Brauner	168f912893	fs: account for group membership When calling setattr_prepare() to determine the validity of the attributes the ia_{g,u}id fields contain the value that will be written to inode->i_{g,u}id. This is exactly the same for idmapped and non-idmapped mounts and allows callers to pass in the values they want to see written to inode->i_{g,u}id. When group ownership is changed a caller whose fsuid owns the inode can change the group of the inode to any group they are a member of. When searching through the caller's groups we need to use the gid mapped according to the idmapped mount otherwise we will fail to change ownership for unprivileged users. Consider a caller running with fsuid and fsgid 1000 using an idmapped mount that maps id 65534 to 1000 and 65535 to 1001. Consequently, a file owned by 65534:65535 in the filesystem will be owned by 1000:1001 in the idmapped mount. The caller now requests the gid of the file to be changed to 1000 going through the idmapped mount. In the vfs we will immediately map the requested gid to the value that will need to be written to inode->i_gid and place it in attr->ia_gid. Since this idmapped mount maps 65534 to 1000 we place 65534 in attr->ia_gid. When we check whether the caller is allowed to change group ownership we first validate that their fsuid matches the inode's uid. The inode->i_uid is 65534 which is mapped to uid 1000 in the idmapped mount. Since the caller's fsuid is 1000 we pass the check. We now check whether the caller is allowed to change inode->i_gid to the requested gid by calling in_group_p(). This will compare the passed in gid to the caller's fsgid and search the caller's additional groups. Since we're dealing with an idmapped mount we need to pass in the gid mapped according to the idmapped mount. This is akin to checking whether a caller is privileged over the future group the inode is owned by. And that needs to take the idmapped mount into account. Note, all helpers are nops without idmapped mounts. New regression test sent to xfstests. Link: https://github.com/lxc/lxd/issues/10537 Link: https://lore.kernel.org/r/20220613111517.2186646-1-brauner@kernel.org Fixes: `2f221d6f7b` ("attr: handle idmapped mounts") Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org # 5.15+ CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-06-14 12:18:47 +02:00
Jens Axboe	feaf625e70	Merge branch 'io_uring/io_uring-5.19' of https://github.com/isilence/linux into io_uring-5.19 Pull io_uring fixes from Pavel. * 'io_uring/io_uring-5.19' of https://github.com/isilence/linux: io_uring: fix double unlock for pbuf select io_uring: kbuf: fix bug of not consuming ring buffer in partial io case io_uring: openclose: fix bug of closing wrong fixed file io_uring: fix not locked access to fixed buf table io_uring: fix races with buffer table unregister io_uring: fix races with file table unregister	2022-06-13 06:52:52 -06:00
Dylan Yudaken	f9437ac0f8	io_uring: limit size of provided buffer ring The type of head and tail do not allow more than 2^15 entries in a provided buffer ring, so do not allow this. At 2^16 while each entry can be indexed, there is no way to disambiguate full vs empty. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220613101157.3687-4-dylany@fb.com Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-13 05:13:33 -06:00
Dylan Yudaken	c6e9fa5c0a	io_uring: fix types in provided buffer ring The type of head needs to match that of tail in order for rollover and comparisons to work correctly. Without this change the comparison of tail to head might incorrectly allow io_uring to use a buffer that userspace had not given it. Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220613101157.3687-3-dylany@fb.com Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-13 05:13:31 -06:00
Dylan Yudaken	97da4a5379	io_uring: fix index calculation When indexing into a provided buffer ring, do not subtract 1 from the index. Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220613101157.3687-2-dylany@fb.com Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-13 05:13:09 -06:00
Pavel Begunkov	fc9375e3f7	io_uring: fix double unlock for pbuf select io_buffer_select(), which is the only caller of io_ring_buffer_select(), fully handles locking, mutex unlock in io_ring_buffer_select() will lead to double unlock. Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>	2022-06-13 11:37:41 +01:00
Hao Xu	42db0c00e2	io_uring: kbuf: fix bug of not consuming ring buffer in partial io case When we use ring-mapped provided buffer, we should consume it before arm poll if partial io has been done. Otherwise the buffer may be used by other requests and thus we lost the data. Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Hao Xu <howeyxu@tencent.com> [pavel: 5.19 rebase] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>	2022-06-13 11:37:30 +01:00
Hao Xu	e71d7c56dd	io_uring: openclose: fix bug of closing wrong fixed file Don't update ret until fixed file is closed, otherwise the file slot becomes the error code. Fixes: `a7c41b4687` ("io_uring: let IORING_OP_FILES_UPDATE support choosing fixed file slots") Signed-off-by: Hao Xu <howeyxu@tencent.com> [pavel: 5.19 rebase] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>	2022-06-13 11:37:03 +01:00
Pavel Begunkov	05b538c176	io_uring: fix not locked access to fixed buf table We can look inside the fixed buffer table only while holding ->uring_lock, however in some cases we don't do the right async prep for IORING_OP_{WRITE,READ}_FIXED ending up with NULL req->imu forcing making an io-wq worker to try to resolve the fixed buffer without proper locking. Move req->imu setup into early req init paths, i.e. io_prep_rw(), which is called unconditionally for rw requests and under uring_lock. Fixes: `634d00df5e` ("io_uring: add full-fledged dynamic buffers support") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>	2022-06-13 09:53:41 +01:00
Pavel Begunkov	d11d31fc5d	io_uring: fix races with buffer table unregister Fixed buffer table quiesce might unlock ->uring_lock, potentially letting new requests to be submitted, don't allow those requests to use the table as they will race with unregistration. Reported-and-tested-by: van fantasy <g1042620637@gmail.com> Fixes: `bd54b6fe33` ("io_uring: implement fixed buffers registration similar to fixed files") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>	2022-06-13 09:53:27 +01:00
Pavel Begunkov	b0380bf6da	io_uring: fix races with file table unregister Fixed file table quiesce might unlock ->uring_lock, potentially letting new requests to be submitted, don't allow those requests to use the table as they will race with unregistration. Reported-and-tested-by: van fantasy <g1042620637@gmail.com> Fixes: `05f3fb3c53` ("io_uring: avoid ring quiesce for fixed file set unregister and update") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>	2022-06-13 09:53:07 +01:00
Linus Torvalds	2275c6babf	3 smb3 reconnect fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmKkwyEACgkQiiy9cAdy T1EqyAv/d8aDey0rQzGy918wzJd91gZrNFOJUpzVhIs3O5MakBgeoYn+S6rySl1+ xs6lXTQdSEyiL0edqTIq8iqA+iuhLPCBW2BWa/Zw089yHM/Ho3tjc5gBl5w38OcF 7NpFUInkg+yoBYWY9cCwjL83YaPxhcLKGY7S6WWptUxzf5Sg6eUqXCkMS7eUV6hb YniMa5uWZSJtqY4F6qw/NOw90QekodEmfL4lLU/GXOnDxlJ8v5Ztf3aGHITWNwsd ovhutUSai/tZz9fYHp6yOZYDcl4i0brOa3dIyU2tr52TdtzS73he8rE+Th4bu+uM XTXvrDCTwsnOTiRFyyBJcaVDF+6LpqPEcURqLEbVOf0xXHyoEQ4zEwVFQJIBYOP4 Oy8XeXQePRxCnToI2cFZaw85IkLikoZ+4PggFkbsaFdJfkboR7b+XVhkfGVr5jnn A6Unwrn3f6LS6MLbsDAjUpfzftwRyhbcvYCukeYKWz836xAx5tyOBIZA3m6keXah LyhX4qSb =mHWJ -----END PGP SIGNATURE----- Merge tag '5.19-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs client fixes from Steve French: "Three reconnect fixes, all for stable as well. One of these three reconnect fixes does address a problem with multichannel reconnect, but this does not include the additional fix (still being tested) for dynamically detecting multichannel adapter changes which will improve those reconnect scenarios even more" * tag '5.19-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: populate empty hostnames for extra channels cifs: return errors during session setup during reconnects cifs: fix reconnect on smb3 mount types	2022-06-12 11:05:44 -07:00
Christophe JAILLET	06ee1c0aeb	ksmbd: smbd: Remove useless license text when SPDX-License-Identifier is already used An SPDX-License-Identifier is already in place. There is no need to duplicate part of the corresponding license. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-11 11:18:26 -05:00
Namjae Jeon	fe0fde09e1	ksmbd: use SOCK_NONBLOCK type for kernel_accept() I found that normally it is O_NONBLOCK but there are different value for some arch. /include/linux/net.h: #ifndef SOCK_NONBLOCK #define SOCK_NONBLOCK O_NONBLOCK #endif /arch/alpha/include/asm/socket.h: #define SOCK_NONBLOCK 0x40000000 Use SOCK_NONBLOCK instead of O_NONBLOCK for kernel_accept(). Suggested-by: David Howells <dhowells@redhat.com> Signed-off-by: Namjae Jeon <linkinjeon@kerne.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-11 11:18:26 -05:00
Linus Torvalds	0885eacdc8	Notable changes: - There is now a backup maintainer for NFSD Notable fixes: - Prevent array overruns in svc_rdma_build_writes() - Prevent buffer overruns when encoding NFSv3 READDIR results - Fix a potential UAF in nfsd_file_put() -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmKjhbsACgkQM2qzM29m f5cgig/6A9gC2c9v4lR2fH6ufiCWvJBfuVaBbToubwktJHaDLvqH56JcvS3s/gKL PKGmbQTI/6lgmVgJqQSxJUnfe6wzHx8G1MdjlZEIwi3pUeiV+LpiprZz9TOhgYYV YDnXRGhb4wKOm75w8rb6X8k106XdKwdBaRQwb88FDawffWoEY0XNYrlmNmmWi8To ELlOlIwRCBbKJoJ6yEEWQRrBuVBXapbsn29tipZXbdo58g+vL0yDQq9s97b0mHhi C2apAN2+k18FiBJsA7b7pW1l/P6k9FNEeetvgWyN8OSMpPNmt0vz1HvKaIstPgg1 BX6rgWe5eQBFEk2KNvSGHrV3R+wAp7jeuVpHUMjxXvzmfj4exJV/H8lu+qZJNDGN ybCJatomR4APFxk+s1kptlzNo7zfyPz15L80HmWIngYJ/lrBOoKPIIi3bwPQcBwW q2Rc+SlvpqbJvEcomgF/lqQN6inmx44J+KpOSA/S8qSIdSkz0iaZsDahFxgZNe82 h+X/i1maRtnSIvWdGMR7O6kEFT5jky35WlTv/VutTOsUwA4mUU9vZUnufBBHJH07 nOdLMi/QS/O5GOnlyegrODtN75wi+IeKt+WMNmnN+JB8Tsg0kZwjOsc/dbQfyJqP PrQJ5AUP0TMm90B1873z8yKmhWeXtB71vgAI/d53aadBG4ZEstU= =6ugk -----END PGP SIGNATURE----- Merge tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Notable changes: - There is now a backup maintainer for NFSD Notable fixes: - Prevent array overruns in svc_rdma_build_writes() - Prevent buffer overruns when encoding NFSv3 READDIR results - Fix a potential UAF in nfsd_file_put()" * tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Remove pointer type casts from xdr_get_next_encode_buffer() SUNRPC: Clean up xdr_get_next_encode_buffer() SUNRPC: Clean up xdr_commit_encode() SUNRPC: Optimize xdr_reserve_space() SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer() SUNRPC: Trap RDMA segment overflows NFSD: Fix potential use-after-free in nfsd_file_put() MAINTAINERS: reciprocal co-maintainership for file locking and nfsd	2022-06-10 17:28:43 -07:00
Shyam Prasad N	4c14d7043f	cifs: populate empty hostnames for extra channels Currently, the secondary channels of a multichannel session also get hostname populated based on the info in primary channel. However, this will end up with a wrong resolution of hostname to IP address during reconnect. This change fixes this by not populating hostname info for all secondary channels. Fixes: `5112d80c16` ("cifs: populate server_hostname for extra channels") Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-10 18:55:02 -05:00
David Howells	40a8110120	netfs: Rename the netfs_io_request cleanup op and give it an op pointer The netfs_io_request cleanup op is now always in a position to be given a pointer to a netfs_io_request struct, so this can be passed in instead of the mapping and private data arguments (both of which are included in the struct). So rename the ->cleanup op to ->free_request (to match ->init_request) and pass in the I/O pointer. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com	2022-06-10 20:55:21 +01:00
Linus Torvalds	e81fb4198e	netfs: Further cleanups after struct netfs_inode wrapper introduced Change the signature of netfs helper functions to take a struct netfs_inode pointer rather than a struct inode pointer where appropriate, thereby relieving the need for the network filesystem to convert its internal inode format down to the VFS inode only for netfslib to bounce it back up. For type safety, it's better not to do that (and it's less typing too). Give netfs_write_begin() an extra argument to pass in a pointer to the netfs_inode struct rather than deriving it internally from the file pointer. Note that the ->write_begin() and ->write_end() ops are intended to be replaced in the future by netfslib code that manages this without the need to call in twice for each page. netfs_readpage() and similar are intended to be pointed at directly by the address_space_operations table, so must stick to the signature dictated by the function pointers there. Changes ======= - Updated the kerneldoc comments and documentation [DH]. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/CAHk-=wgkwKyNmNdKpQkqZ6DnmUL-x9hp0YBnUGjaPFEAdxDTbw@mail.gmail.com/	2022-06-10 20:55:21 +01:00
David Howells	102d841055	afs: Fix some checker issues Remove an unused global variable and make another static as reported by make C=1. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org	2022-06-10 20:55:21 +01:00
Linus Torvalds	ad6e076498	zonefs fixes for 5.19-rc2 * Fix handling of the explicit-open mount option, and in particular the conditions under which this option can be ignored. * Fix a problem with zonefs iomap_begin method, causing a hang in iomap_readahead() when a readahead request reaches the end of a file. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYqMh/wAKCRDdoc3SxdoY dvd+AP4jNRFhAedXl0mIutoP4k0XwblSz9RwrXLOYzkOtgpXGQD+Lps42w6EQliE wWuuL4syVgKamolj0WGcPLarGZC7LQA= =neot -----END PGP SIGNATURE----- Merge tag 'zonefs-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fixes from Damien Le Moal: - Fix handling of the explicit-open mount option, and in particular the conditions under which this option can be ignored. - Fix a problem with zonefs iomap_begin method, causing a hang in iomap_readahead() when a readahead request reaches the end of a file. * tag 'zonefs-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: fix zonefs_iomap_begin() for reads zonefs: Do not ignore explicit_open with active zone limit zonefs: fix handling of explicit_open option on mount	2022-06-10 10:56:28 -07:00
David Howells	874c8ca1e6	netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context While randstruct was satisfied with using an open-coded "void " offset cast for the netfs_i_context <-> inode casting, __builtin_object_size() as used by FORTIFY_SOURCE was not as easily fooled. This was causing the following complaint[1] from gcc v12: In file included from include/linux/string.h:253, from include/linux/ceph/ceph_debug.h:7, from fs/ceph/inode.c:2: In function 'fortify_memset_chk', inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2, inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2: include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 242 \| __write_overflow_field(p_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by embedding a struct inode into struct netfs_i_context (which should perhaps be renamed to struct netfs_inode). The struct inode vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode structs and vfs_inode is then simply changed to "netfs.inode" in those filesystems. Further, rename netfs_i_context to netfs_inode, get rid of the netfs_inode() function that converted a netfs_i_context pointer to an inode pointer (that can now be done with &ctx->inode) and rename the netfs_i_context() function to netfs_inode() (which is now a wrapper around container_of()). Most of the changes were done with: perl -p -i -e 's/vfs_inode/netfs.inode/'g \ `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/.[ch]` Kees suggested doing it with a pair structure[2] and a special declarator to insert that into the network filesystem's inode wrapper[3], but I think it's cleaner to embed it - and then it doesn't matter if struct randomisation reorders things. Dave Chinner suggested using a filesystem-specific VFS_I() function in each filesystem to convert that filesystem's own inode wrapper struct into the VFS inode struct[4]. Version #2: - Fix a couple of missed name changes due to a disabled cifs option. - Rename nfs_i_context to nfs_inode - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper structs. [ This also undoes commit `507160f46c` ("netfs: gcc-12: temporarily disable '-Wattribute-warning' for now") that is no longer needed ] Fixes: `bc899ee1c8` ("netfs: Add a netfs inode context") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> cc: Jonathan Corbet <corbet@lwn.net> cc: Eric Van Hensbergen <ericvh@gmail.com> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Steve French <smfrench@gmail.com> cc: William Kucharski <william.kucharski@oracle.com> cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> cc: Dave Chinner <david@fromorbit.com> cc: linux-doc@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: samba-technical@lists.samba.org cc: linux-fsdevel@vger.kernel.org cc: linux-hardening@vger.kernel.org Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2] Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3] Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4] Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-06-09 13:55:00 -07:00
Linus Torvalds	3d9f55c57b	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmKiO9UACgkQnJ2qBz9k QNk9+Af/RjaJEozyj/He7nqj1xncN6bIJzeyOqQVJNkHBsKYt7oDFvSuYI1Kbzk+ x7/x8dRtVR3kRZCO6VarETkzGp6Nw10RdzFKqT2FRmQ66wVZaXPQeqVZqwXSKdtR qgU892e9S2SqUH9EyUwk3D/HwLr1VNKKp6B0N+By7EwKmZdyTg5siFJ26+z+QpJQ wo84nN/m6GgHSm+c8kMFa+cs635tMY3+vP4nviUKyuDTxW3Yu6maIa5973WLiFqo EZSLtSfXYasjoOl5fN3AaO0dAl8fRJIh6wsgbeQI/NeUYMIqKWslW+5esq1SwreS r1+Xig8MmxDJ/1I3i/L/aDM7FipY9A== =kMe8 -----END PGP SIGNATURE----- Merge tag 'fs_for_v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, writeback, and quota fixes and cleanups from Jan Kara: "A fix for race in writeback code and two cleanups in quota and ext2" * tag 'fs_for_v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Prevent memory allocation recursion while holding dq_lock writeback: Fix inode->i_io_list not be protected by inode->i_lock error fs: Fix syntax errors in comments	2022-06-09 12:26:05 -07:00
Linus Torvalds	507160f46c	netfs: gcc-12: temporarily disable '-Wattribute-warning' for now This is a pure band-aid so that I can continue merging stuff from people while some of the gcc-12 fallout gets sorted out. In particular, gcc-12 is very unhappy about the kinds of pointer arithmetic tricks that netfs does, and that makes the fortify checks trigger in afs and ceph: In function ‘fortify_memset_chk’, inlined from ‘netfs_i_context_init’ at include/linux/netfs.h:327:2, inlined from ‘afs_set_netfs_context’ at fs/afs/inode.c:61:2, inlined from ‘afs_root_iget’ at fs/afs/inode.c:543:2: include/linux/fortify-string.h:258:25: warning: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 258 \| __write_overflow_field(p_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ and the reason is that netfs_i_context_init() is passed a 'struct inode' pointer, and then it does struct netfs_i_context ctx = netfs_i_context(inode); memset(ctx, 0, sizeof(ctx)); where that netfs_i_context() function just does pointer arithmetic on the inode pointer, knowing that the netfs_i_context is laid out immediately after it in memory. This is all truly disgusting, since the whole "netfs_i_context is laid out immediately after it in memory" is not actually remotely true in general, but is just made to be that way for afs and ceph. See for example fs/cifs/cifsglob.h: struct cifsInodeInfo { struct { /* These must be contiguous / struct inode vfs_inode; / the VFS's inode record / struct netfs_i_context netfs_ctx; / Netfslib context */ }; [...] and realize that this is all entirely wrong, and the pointer arithmetic that netfs_i_context() is doing is also very very wrong and wouldn't give the right answer if netfs_ctx had different alignment rules from a 'struct inode', for example). Anyway, that's just a long-winded way to say "the gcc-12 warning is actually quite reasonable, and our code happens to work but is pretty disgusting". This is getting fixed properly, but for now I made the mistake of thinking "the week right after the merge window tends to be calm for me as people take a breather" and I did a sustem upgrade. And I got gcc-12 as a result, so to continue merging fixes from people and not have the end result drown in warnings, I am fixing all these gcc-12 issues I hit. Including with these kinds of temporary fixes. Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/all/AEEBCF5D-8402-441D-940B-105AA718C71F@chromium.org/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-06-09 11:29:36 -07:00
Sungjong Seo	204e6ceaa1	exfat: use updated exfat_chain directly during renaming In order for a file to access its own directory entry set, exfat_inode_info(ei) has two copied values. One is ei->dir, which is a snapshot of exfat_chain of the parent directory, and the other is ei->entry, which is the offset of the start of the directory entry set in the parent directory. Since the parent directory can be updated after the snapshot point, it should be used only for accessing one's own directory entry set. However, as of now, during renaming, it could try to traverse or to allocate clusters via snapshot values, it does not make sense. This potential problem has been revealed when exfat_update_parent_info() was removed by commit `d8dad2588a` ("exfat: fix referencing wrong parent directory information after renaming"). However, I don't think it's good idea to bring exfat_update_parent_info() back. Instead, let's use the updated exfat_chain of parent directory diectly. Fixes: `d8dad2588a` ("exfat: fix referencing wrong parent directory information after renaming") Reported-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Tested-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-06-09 21:26:32 +09:00
Damien Le Moal	c1c1204c0d	zonefs: fix zonefs_iomap_begin() for reads If a readahead is issued to a sequential zone file with an offset exactly equal to the current file size, the iomap type is set to IOMAP_UNWRITTEN, which will prevent an IO, but the iomap length is calculated as 0. This causes a WARN_ON() in iomap_iter(): [17309.548939] WARNING: CPU: 3 PID: 2137 at fs/iomap/iter.c:34 iomap_iter+0x9cf/0xe80 [...] [17309.650907] RIP: 0010:iomap_iter+0x9cf/0xe80 [...] [17309.754560] Call Trace: [17309.757078] <TASK> [17309.759240] ? lock_is_held_type+0xd8/0x130 [17309.763531] iomap_readahead+0x1a8/0x870 [17309.767550] ? iomap_read_folio+0x4c0/0x4c0 [17309.771817] ? lockdep_hardirqs_on_prepare+0x400/0x400 [17309.778848] ? lock_release+0x370/0x750 [17309.784462] ? folio_add_lru+0x217/0x3f0 [17309.790220] ? reacquire_held_locks+0x4e0/0x4e0 [17309.796543] read_pages+0x17d/0xb60 [17309.801854] ? folio_add_lru+0x238/0x3f0 [17309.807573] ? readahead_expand+0x5f0/0x5f0 [17309.813554] ? policy_node+0xb5/0x140 [17309.819018] page_cache_ra_unbounded+0x27d/0x450 [17309.825439] filemap_get_pages+0x500/0x1450 [17309.831444] ? filemap_add_folio+0x140/0x140 [17309.837519] ? lock_is_held_type+0xd8/0x130 [17309.843509] filemap_read+0x28c/0x9f0 [17309.848953] ? zonefs_file_read_iter+0x1ea/0x4d0 [zonefs] [17309.856162] ? trace_contention_end+0xd6/0x130 [17309.862416] ? __mutex_lock+0x221/0x1480 [17309.868151] ? zonefs_file_read_iter+0x166/0x4d0 [zonefs] [17309.875364] ? filemap_get_pages+0x1450/0x1450 [17309.881647] ? __mutex_unlock_slowpath+0x15e/0x620 [17309.888248] ? wait_for_completion_io_timeout+0x20/0x20 [17309.895231] ? lock_is_held_type+0xd8/0x130 [17309.901115] ? lock_is_held_type+0xd8/0x130 [17309.906934] zonefs_file_read_iter+0x356/0x4d0 [zonefs] [17309.913750] new_sync_read+0x2d8/0x520 [17309.919035] ? __x64_sys_lseek+0x1d0/0x1d0 Furthermore, this causes iomap_readahead() to loop forever as iomap_readahead_iter() always returns 0, making no progress. Fix this by treating reads after the file size as access to holes, setting the iomap type to IOMAP_HOLE, the iomap addr to IOMAP_NULL_ADDR and using the length argument as is for the iomap length. To simplify the code with this change, zonefs_iomap_begin() is split into the read variant, zonefs_read_iomap_begin() and zonefs_read_iomap_ops, and the write variant, zonefs_write_iomap_begin() and zonefs_write_iomap_ops. Reported-by: Jorgen Hansen <Jorgen.Hansen@wdc.com> Fixes: `8dcc1a9d90` ("fs: New zonefs file system") Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Jorgen Hansen <Jorgen.Hansen@wdc.com>	2022-06-08 19:13:55 +09:00
Damien Le Moal	96eca145cb	zonefs: Do not ignore explicit_open with active zone limit A zoned device may have no limit on the number of open zones but may have a limit on the number of active zones it can support. In such case, the explicit_open mount option should not be ignored to ensure that the open() system call activates the zone with an explicit zone open command, thus guaranteeing that the zone can be written. Enforce this by ignoring the explicit_open mount option only for devices that have both the open and active zone limits equal to 0. Fixes: `87c9ce3ffe` ("zonefs: Add active seq file accounting") Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2022-06-08 15:38:44 +09:00
Damien Le Moal	a2a513be71	zonefs: fix handling of explicit_open option on mount Ignoring the explicit_open mount option on mount for devices that do not have a limit on the number of open zones must be done after the mount options are parsed and set in s_mount_opts. Move the check to ignore the explicit_open option after the call to zonefs_parse_options() in zonefs_fill_super(). Fixes: `b5c00e9757` ("zonefs: open/close zone on file open/close") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>	2022-06-08 15:38:42 +09:00
David Sterba	e3a4167c88	btrfs: add error messages to all unrecognized mount options Almost none of the errors stemming from a valid mount option but wrong value prints a descriptive message which would help to identify why mount failed. Like in the linked report: $ uname -r v4.19 $ mount -o compress=zstd /dev/sdb /mnt mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error. $ dmesg ... BTRFS error (device sdb): open_ctree failed Errors caused by memory allocation failures are left out as it's not a user error so reporting that would be confusing. Link: https://lore.kernel.org/linux-btrfs/9c3fec36-fc61-3a33-4977-a7e207c3fa4e@gmx.de/ CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-06-07 17:29:50 +02:00
Shyam Prasad N	8ea21823aa	cifs: return errors during session setup during reconnects During reconnects, we check the return value from cifs_negotiate_protocol, and have handlers for both success and failures. But if that passes, and cifs_setup_session returns any errors other than -EACCES, we do not handle that. This fix adds a handler for that, so that we don't go ahead and try a tree_connect on a failed session. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-06 18:23:38 -05:00
Trond Myklebust	880265c77a	pNFS: Avoid a live lock condition in pnfs_update_layout() If we're about to send the first layoutget for an empty layout, we want to make sure that we drain out the existing pending layoutget calls first. The reason is that these layouts may have been already implicitly returned to the server by a recall to which the client gave a NFS4ERR_NOMATCHING_LAYOUT response. The problem is that wait_var_event_killable() could in principle see the plh_outstanding count go back to '1' when the first process to wake up starts sending a new layoutget. If it fails to get a layout, then this loop can continue ad infinitum... Fixes: `0b77f97a7e` ("NFSv4/pnfs: Fix layoutget behaviour after invalidation") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2022-06-06 11:53:55 -04:00
Trond Myklebust	fe44fb23d6	pNFS: Don't keep retrying if the server replied NFS4ERR_LAYOUTUNAVAILABLE If the server tells us that a pNFS layout is not available for a specific file, then we should not keep pounding it with further layoutget requests. Fixes: `183d9e7b11` ("pnfs: rework LAYOUTGET retry handling") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2022-06-06 11:53:54 -04:00
Qu Wenruo	0591f04036	btrfs: prevent remounting to v1 space cache for subpage mount Upstream commit `9f73f1aef9` ("btrfs: force v2 space cache usage for subpage mount") forces subpage mount to use v2 cache, to avoid deprecated v1 cache which doesn't support subpage properly. But there is a loophole that user can still remount to v1 cache. The existing check will only give users a warning, but does not really prevent to do the remount. Although remounting to v1 will not cause any problems since the v1 cache will always be marked invalid when mounted with a different page size, it's still better to prevent v1 cache at all for subpage mounts. Fixes: `9f73f1aef9` ("btrfs: force v2 space cache usage for subpage mount") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-06-06 16:18:59 +02:00
Filipe Manana	31e70e5278	btrfs: fix hang during unmount when block group reclaim task is running When we start an unmount, at close_ctree(), if we have the reclaim task running and in the middle of a data block group relocation, we can trigger a deadlock when stopping an async reclaim task, producing a trace like the following: [629724.498185] task:kworker/u16:7 state:D stack: 0 pid:681170 ppid: 2 flags:0x00004000 [629724.499760] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] [629724.501267] Call Trace: [629724.501759] <TASK> [629724.502174] __schedule+0x3cb/0xed0 [629724.502842] schedule+0x4e/0xb0 [629724.503447] btrfs_wait_on_delayed_iputs+0x7c/0xc0 [btrfs] [629724.504534] ? prepare_to_wait_exclusive+0xc0/0xc0 [629724.505442] flush_space+0x423/0x630 [btrfs] [629724.506296] ? rcu_read_unlock_trace_special+0x20/0x50 [629724.507259] ? lock_release+0x220/0x4a0 [629724.507932] ? btrfs_get_alloc_profile+0xb3/0x290 [btrfs] [629724.508940] ? do_raw_spin_unlock+0x4b/0xa0 [629724.509688] btrfs_async_reclaim_metadata_space+0x139/0x320 [btrfs] [629724.510922] process_one_work+0x252/0x5a0 [629724.511694] ? process_one_work+0x5a0/0x5a0 [629724.512508] worker_thread+0x52/0x3b0 [629724.513220] ? process_one_work+0x5a0/0x5a0 [629724.514021] kthread+0xf2/0x120 [629724.514627] ? kthread_complete_and_exit+0x20/0x20 [629724.515526] ret_from_fork+0x22/0x30 [629724.516236] </TASK> [629724.516694] task:umount state:D stack: 0 pid:719055 ppid:695412 flags:0x00004000 [629724.518269] Call Trace: [629724.518746] <TASK> [629724.519160] __schedule+0x3cb/0xed0 [629724.519835] schedule+0x4e/0xb0 [629724.520467] schedule_timeout+0xed/0x130 [629724.521221] ? lock_release+0x220/0x4a0 [629724.521946] ? lock_acquired+0x19c/0x420 [629724.522662] ? trace_hardirqs_on+0x1b/0xe0 [629724.523411] __wait_for_common+0xaf/0x1f0 [629724.524189] ? usleep_range_state+0xb0/0xb0 [629724.524997] __flush_work+0x26d/0x530 [629724.525698] ? flush_workqueue_prep_pwqs+0x140/0x140 [629724.526580] ? lock_acquire+0x1a0/0x310 [629724.527324] __cancel_work_timer+0x137/0x1c0 [629724.528190] close_ctree+0xfd/0x531 [btrfs] [629724.529000] ? evict_inodes+0x166/0x1c0 [629724.529510] generic_shutdown_super+0x74/0x120 [629724.530103] kill_anon_super+0x14/0x30 [629724.530611] btrfs_kill_super+0x12/0x20 [btrfs] [629724.531246] deactivate_locked_super+0x31/0xa0 [629724.531817] cleanup_mnt+0x147/0x1c0 [629724.532319] task_work_run+0x5c/0xa0 [629724.532984] exit_to_user_mode_prepare+0x1a6/0x1b0 [629724.533598] syscall_exit_to_user_mode+0x16/0x40 [629724.534200] do_syscall_64+0x48/0x90 [629724.534667] entry_SYSCALL_64_after_hwframe+0x44/0xae [629724.535318] RIP: 0033:0x7fa2b90437a7 [629724.535804] RSP: 002b:00007ffe0b7e4458 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [629724.536912] RAX: 0000000000000000 RBX: 00007fa2b9182264 RCX: 00007fa2b90437a7 [629724.538156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000555d6cf20dd0 [629724.539053] RBP: 0000555d6cf20ba0 R08: 0000000000000000 R09: 00007ffe0b7e3200 [629724.539956] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [629724.540883] R13: 0000555d6cf20dd0 R14: 0000555d6cf20cb0 R15: 0000000000000000 [629724.541796] </TASK> This happens because: 1) Before entering close_ctree() we have the async block group reclaim task running and relocating a data block group; 2) There's an async metadata (or data) space reclaim task running; 3) We enter close_ctree() and park the cleaner kthread; 4) The async space reclaim task is at flush_space() and runs all the existing delayed iputs; 5) Before the async space reclaim task calls btrfs_wait_on_delayed_iputs(), the block group reclaim task which is doing the data block group relocation, creates a delayed iput at replace_file_extents() (called when COWing leaves that have file extent items pointing to relocated data extents, during the merging phase of relocation roots); 6) The async reclaim space reclaim task blocks at btrfs_wait_on_delayed_iputs(), since we have a new delayed iput; 7) The task at close_ctree() then calls cancel_work_sync() to stop the async space reclaim task, but it blocks since that task is waiting for the delayed iput to be run; 8) The delayed iput is never run because the cleaner kthread is parked, and no one else runs delayed iputs, resulting in a hang. So fix this by stopping the async block group reclaim task before we park the cleaner kthread. Fixes: `18bb8bbf13` ("btrfs: zoned: automatically reclaim zones") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-06-06 16:18:52 +02:00
Matthew Wilcox (Oracle)	537e11cdc7	quota: Prevent memory allocation recursion while holding dq_lock As described in commit `02117b8ae9` ("f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read"), we must not enter filesystem reclaim while holding the dq_lock. Prevent this more generally by using memalloc_nofs_save() while holding the lock. Link: https://lore.kernel.org/r/20220605143815.2330891-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz>	2022-06-06 10:08:10 +02:00
Jchao Sun	10e1407310	writeback: Fix inode->i_io_list not be protected by inode->i_lock error Commit `b35250c081` ("writeback: Protect inode->i_io_list with inode->i_lock") made inode->i_io_list not only protected by wb->list_lock but also inode->i_lock, but inode_io_list_move_locked() was missed. Add lock there and also update comment describing things protected by inode->i_lock. This also fixes a race where __mark_inode_dirty() could move inode under flush worker's hands and thus sync(2) could miss writing some inodes. Fixes: `b35250c081` ("writeback: Protect inode->i_io_list with inode->i_lock") Link: https://lore.kernel.org/r/20220524150540.12552-1-sunjunchao2870@gmail.com CC: stable@vger.kernel.org Signed-off-by: Jchao Sun <sunjunchao2870@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2022-06-06 09:54:30 +02:00
Xiang wangx	2aab03b867	fs: Fix syntax errors in comments Delete the redundant word 'not'. Link: https://lore.kernel.org/r/20220605125509.14837-1-wangxiang@cdjrlc.com Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: Jan Kara <jack@suse.cz>	2022-06-06 09:53:03 +02:00
Paulo Alcantara	c36ee7dab7	cifs: fix reconnect on smb3 mount types cifs.ko defines two file system types: cifs & smb3, and __cifs_get_super() was not including smb3 file system type when looking up superblocks, therefore failing to reconnect tcons in cifs_tree_connect(). Fix this by calling iterate_supers_type() on both file system types. Link: https://lore.kernel.org/r/CAFrh3J9soC36+BVuwHB=g9z_KB5Og2+p2_W+BBoBOZveErz14w@mail.gmail.com Cc: stable@vger.kernel.org Tested-by: Satadru Pramanik <satadru@gmail.com> Reported-by: Satadru Pramanik <satadru@gmail.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-06 01:04:12 -05:00
Linus Torvalds	6684cf4290	fix for breakage in #work.fd this window Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYpz/IQAKCRBZ7Krx/gZQ 666mAPwKOC/voemjl2m+RpSruxAbdlRvKei3IY8YxLfyv+rmUQD9HKLJJtQX2VRF QTFmQ3p7kx30ydwSbyY8Kpw3VMCDxgs= =1ZKm -----END PGP SIGNATURE----- Merge tag 'pull-work.fd-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull file descriptor fix from Al Viro: "Fix for breakage in #work.fd this window" * tag 'pull-work.fd-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix the breakage in close_fd_get_file() calling conventions change	2022-06-05 17:14:03 -07:00
Al Viro	40a1926022	fix the breakage in close_fd_get_file() calling conventions change It used to grab an extra reference to struct file rather than just transferring to caller the one it had removed from descriptor table. New variant doesn't, and callers need to be adjusted. Reported-and-tested-by: syzbot+47dd250f527cb7bebf24@syzkaller.appspotmail.com Fixes: `6319194ec5` ("Unify the primitives for file descriptor closing") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-06-05 15:03:03 -04:00
Linus Torvalds	952923ddc0	Several cleanups in fs/namei.c. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYpvrrgAKCRBZ7Krx/gZQ 6+eZAP9r0c8E1UnUxRI32kQrzndJO2z9mKq6rI9D7GgJe+MtogEAlfynzklqbHzo fFvczeD0tzLu4aDExtGn6GNGf5gSpw4= =1blQ -----END PGP SIGNATURE----- Merge tag 'pull-18-rc1-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pathname updates from Al Viro: "Several cleanups in fs/namei.c" * tag 'pull-18-rc1-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: namei: cleanup double word in comment get rid of dead code in legitimize_root() fs/namei.c:reserve_stack(): tidy up the call of try_to_unlazy()	2022-06-04 19:07:15 -07:00
Linus Torvalds	cbd76edeab	Cleanups (and one fix) around struct mount handling. The fix is usermode_driver.c one - once you've done kern_mount(), you must kern_unmount(); simple mntput() will end up with a leak. Several failure exits in there messed up that way... In practice you won't hit those particular failure exits without fault injection, though. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYpvrWQAKCRBZ7Krx/gZQ 6z29AP9EZVSyIvnwXleehpa2mEZhsp+KAKgV/ENaKHMn7jiH0wD/bfgnhxIDNuc5 108E2R5RWEYTynW5k7nnP5PsTsMq5Qc= =b3Wc -----END PGP SIGNATURE----- Merge tag 'pull-18-rc1-work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mount handling updates from Al Viro: "Cleanups (and one fix) around struct mount handling. The fix is usermode_driver.c one - once you've done kern_mount(), you must kern_unmount(); simple mntput() will end up with a leak. Several failure exits in there messed up that way... In practice you won't hit those particular failure exits without fault injection, though" * tag 'pull-18-rc1-work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: move mount-related externs from fs.h to mount.h blob_to_mnt(): kern_unmount() is needed to undo kern_mount() m->mnt_root->d_inode->i_sb is a weird way to spell m->mnt_sb... linux/mount.h: trim includes uninline may_mount() and don't opencode it in fspick(2)/fsopen(2)	2022-06-04 19:00:05 -07:00
Linus Torvalds	dbe0ee4661	Descriptor handling cleanups -----BEGIN PGP SIGNATURE----- iHQEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYpwEZAAKCRBZ7Krx/gZQ 691uAP0QnwO0lOYOa41MfQ6QnzPbiYcffqtUuTJBWyfs8+WnugD2NNGQP7Zjtin9 q0wcv2KA6yY7qgu7RCCbxU/7J0YdCg== =ywuh -----END PGP SIGNATURE----- Merge tag 'pull-18-rc1-work.fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull file descriptor updates from Al Viro. - Descriptor handling cleanups * tag 'pull-18-rc1-work.fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Unify the primitives for file descriptor closing fs: remove fget_many and fput_many interface io_uring_enter(): don't leave f.flags uninitialized	2022-06-04 18:52:00 -07:00
Linus Torvalds	d66016c5cd	Nine cifs/smb3 client fixes. Includes DFS fixes, some cleanup of leagcy SMB1 code, duplicated message cleanup and a double free and deadlock fix -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmKb4dcACgkQiiy9cAdy T1G0HAwArA5ZwygidbWP+mZC+iuvFFDJczxKi+z3VteqfJCV1L7s4VQHVGMwjCbe duiIQ+wD5dUKwM13C5jVO4TeRixR+3eQXppVlq0fCmufyH+t05nZMXU+/vRgIbuH S3Tg8t6H7TJI+HbBYEH6FgnC2Gb18VAehckw119a6NYRR8czNRkEqcmyXPAhYaV7 u6eRJIiJGnki85f1oGKT+LvAtmXWf9CKWkVy2KPzkWxMPm5GYDdL04wRHqSYpAAv oA+Z9t76tYhK8e4KBLhD16OzAgUMyYyHoJSZFZHH4jtTnGz4u1SRHxh0qQy/4WnJ 3t8xSW2VsSil5N7jeXbF59GoZrbmspZ5Lu5kwaUiRjOGRKdByXU/8NFz4hS3C2Fb XPV5ZbgxLMHT857t0aoSeiiT0OXC86mLSDfdITF+ImftPjsmyfw5BL2AEf2x0oEO CiYIQyiZaGFmh8dlLhf9ysUKOETsBmX0stKs+uQ46SR78oXBDKg8vo/wnFhXItkv o9N5ucfM =Jhdy -----END PGP SIGNATURE----- Merge tag '5.19-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs client fixes from Steve French: "Nine cifs/smb3 client fixes. Includes DFS fixes, some cleanup of leagcy SMB1 code, duplicated message cleanup and a double free and deadlock fix" * tag '5.19-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix uninitialized pointer in error case in dfs_cache_get_tgt_share cifs: skip trailing separators of prefix paths cifs: update internal module number cifs: version operations for smb20 unneeded when legacy support disabled cifs: do not build smb1ops if legacy support is disabled cifs: fix potential deadlock in direct reclaim cifs: when extending a file with falloc we should make files not-sparse cifs: remove repeated debug message on cifs_put_smb_ses() cifs: fix potential double free during failed mount	2022-06-04 17:42:33 -07:00
Steve French	ee3c8019cc	cifs: fix uninitialized pointer in error case in dfs_cache_get_tgt_share Set default value of ppath to null. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-04 13:33:42 -05:00
Linus Torvalds	1f95267583	Ntfs3 for 5.19 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmKY2fUACgkQqbAzH4Mk B7ZUaw/8CIuns8LuGG4a14Q2eN/smIBdBhnTFAhk0pjZPYHaBUauor6bR1QxJwVV Vbr4QBmrPHp+MJEZOd5FDC1Fd79TCVhe5d9SzbVn3tG0DHfe8bp5YZfkrAOxGURb pG2GaQ5Pq4AiDt1d2nia8pwxt7sCVPx8v5Bvi25WPCdrrSmlxduJRBftZx4RNXLv p6iDfakGq9p1LxgJX+YkvJPqKMloY52DoB0po4JgHkQTME2tKnnzZucxRNNPs1Jd 8hBfM9g33wmETvlNffHpoI9JuAAxdKCcuU0/n6e2R9WQZqAZrrAdb5Jj2GRzV0em F/bezX0qa09v+siMgjxfHVKreKn4pea98Gu6GWI683CN4lvilIvrIsqu7z5HrX4r gSrfm+Wf5MUXMXwaKU2e/1kpP5cmUEjo7eFQS0sJzi7cO0nNqUnYYP8LwMHZw1jJ USvRhlJdGJMzRBsTXHvlYl42IVNwPP+KPlQuwgitk6oIl81L95fKivom0A5lMaEs bJrNtutz2Y9Pd4ZffJtgzMtqGvqfvSOxco7AHQYoSq2jrm8b735x1s2xhzNcdhv3 OZbTZeKfpMw8Vxnv43yJ8zpHlZ3EgD3+T0UHaBcJEXjCvLbD+ibLzfMaQa9mqC5t TE5SaEhFWf+BskVufhrgG4VfwvJGj84VhF5u9k4roJkw8+yJRdY= =ut2G -----END PGP SIGNATURE----- Merge tag 'ntfs3_for_5.19' of https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 updates from Konstantin Komarov: - fix some memory leaks and panic - fixed xfstests (tested on x86_64): generic/092 generic/099 generic/228 generic/240 generic/307 generic/444 - fix some typos, dead code, etc * tag 'ntfs3_for_5.19' of https://github.com/Paragon-Software-Group/linux-ntfs3: fs/ntfs3: provide block_invalidate_folio to fix memory leak fs/ntfs3: Fix invalid free in log_replay fs/ntfs3: Update valid size if -EIOCBQUEUED fs/ntfs3: Check new size for limits fs/ntfs3: Fix fiemap + fix shrink file size (to remove preallocated space) fs/ntfs3: In function ntfs_set_acl_ex do not change inode->i_mode if called from function ntfs_init_acl fs/ntfs3: Optimize locking in ntfs_save_wsl_perm fs/ntfs3: Update i_ctime when xattr is added fs/ntfs3: Restore ntfs_xattr_get_acl and ntfs_xattr_set_acl functions fs/ntfs3: Keep preallocated only if option prealloc enabled fs/ntfs3: Fix some memory leaks in an error handling path of 'log_replay()'	2022-06-03 16:57:16 -07:00
Linus Torvalds	1ec6574a3c	This set of changes updates init and user mode helper tasks to be ordinary user mode tasks. In commit `40966e316f` ("kthread: Ensure struct kthread is present for all kthreads") caused init and the user mode helper threads that call kernel_execve to have struct kthread allocated for them. This struct kthread going away during execve in turned made a use after free of struct kthread possible. The commit `343f4c49f2` ("kthread: Don't allocate kthread_struct for init and umh") is enough to fix the use after free and is simple enough to be backportable. The rest of the changes pass struct kernel_clone_args to clean things up and cause the code to make sense. In making init and the user mode helpers tasks purely user mode tasks I ran into two complications. The function task_tick_numa was detecting tasks without an mm by testing for the presence of PF_KTHREAD. The initramfs code in populate_initrd_image was using flush_delayed_fput to ensuere the closing of all it's file descriptors was complete, and flush_delayed_fput does not work in a userspace thread. I have looked and looked and more complications and in my code review I have not found any, and neither has anyone else with the code sitting in linux-next. Link: https://lkml.kernel.org/r/87mtfu4up3.fsf@email.froward.int.ebiederm.org Eric W. Biederman (8): kthread: Don't allocate kthread_struct for init and umh fork: Pass struct kernel_clone_args into copy_thread fork: Explicity test for idle tasks in copy_thread fork: Generalize PF_IO_WORKER handling init: Deal with the init process being a user mode process fork: Explicitly set PF_KTHREAD fork: Stop allowing kthreads to call execve sched: Update task_tick_numa to ignore tasks without an mm arch/alpha/kernel/process.c \| 13 ++++++------ arch/arc/kernel/process.c \| 13 ++++++------ arch/arm/kernel/process.c \| 12 ++++++----- arch/arm64/kernel/process.c \| 12 ++++++----- arch/csky/kernel/process.c \| 15 ++++++------- arch/h8300/kernel/process.c \| 10 ++++----- arch/hexagon/kernel/process.c \| 12 ++++++----- arch/ia64/kernel/process.c \| 15 +++++++------ arch/m68k/kernel/process.c \| 12 ++++++----- arch/microblaze/kernel/process.c \| 12 ++++++----- arch/mips/kernel/process.c \| 13 ++++++------ arch/nios2/kernel/process.c \| 12 ++++++----- arch/openrisc/kernel/process.c \| 12 ++++++----- arch/parisc/kernel/process.c \| 18 +++++++++------- arch/powerpc/kernel/process.c \| 15 +++++++------ arch/riscv/kernel/process.c \| 12 ++++++----- arch/s390/kernel/process.c \| 12 ++++++----- arch/sh/kernel/process_32.c \| 12 ++++++----- arch/sparc/kernel/process_32.c \| 12 ++++++----- arch/sparc/kernel/process_64.c \| 12 ++++++----- arch/um/kernel/process.c \| 15 +++++++------ arch/x86/include/asm/fpu/sched.h \| 2 +- arch/x86/include/asm/switch_to.h \| 8 +++---- arch/x86/kernel/fpu/core.c \| 4 ++-- arch/x86/kernel/process.c \| 18 +++++++++------- arch/xtensa/kernel/process.c \| 17 ++++++++------- fs/exec.c \| 8 ++++--- include/linux/sched/task.h \| 8 +++++-- init/initramfs.c \| 2 ++ init/main.c \| 2 +- kernel/fork.c \| 46 +++++++++++++++++++++++++++++++++------- kernel/sched/fair.c \| 2 +- kernel/umh.c \| 6 +++--- 33 files changed, 234 insertions(+), 160 deletions(-) Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmKaR/MACgkQC/v6Eiaj j0Aayg/7Bx66872d9c6igkJ+MPCTuh+v9QKCGwiYEmiU4Q5sVAFB0HPJO27qC14u 630X0RFNZTkPzNNEJNIW4kw6Dj8s8YRKf+FgQAVt4SzdRwT7eIPDjk1nGraopPJ3 O04pjvuTmUyidyViRyFcf2ptx/pnkrwP8jUSc+bGTgfASAKAgAokqKE5ecjewbBc Y/EAkQ6QW7KxPjeSmpAHwI+t3BpBev9WEC4PbhRhsBCQFO2+PJiklvqdhVNBnIjv qUezll/1xv9UYgniB15Q4Nb722SmnWSU3r8as1eFPugzTHizKhufrrpyP+KMK1A0 tdtEJNs5t2DZF7ZbGTFSPqJWmyTYLrghZdO+lOmnaSjHxK4Nda1d4NzbefJ0u+FE tutewowvHtBX6AFIbx+H3O+DOJM2IgNMf+ReQDU/TyNyVf3wBrTbsr9cLxypIJIp zze8npoLMlB7B4yxVo5ES5e63EXfi3iHl0L3/1EhoGwriRz1kWgVLUX/VZOUpscL RkJHsW6bT8sqxPWAA5kyWjEN+wNR2PxbXi8OE4arT0uJrEBMUgDCzydzOv5tJB00 mSQdytxH9LVdsmxBKAOBp5X6WOLGA4yb1cZ6E/mEhlqXMpBDF1DaMfwbWqxSYi4q sp5zU3SBAW0qceiZSsWZXInfbjrcQXNV/DkDRDO9OmzEZP4m1j0= =x6fy -----END PGP SIGNATURE----- Merge tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull kthread updates from Eric Biederman: "This updates init and user mode helper tasks to be ordinary user mode tasks. Commit `40966e316f` ("kthread: Ensure struct kthread is present for all kthreads") caused init and the user mode helper threads that call kernel_execve to have struct kthread allocated for them. This struct kthread going away during execve in turned made a use after free of struct kthread possible. Here, commit `343f4c49f2` ("kthread: Don't allocate kthread_struct for init and umh") is enough to fix the use after free and is simple enough to be backportable. The rest of the changes pass struct kernel_clone_args to clean things up and cause the code to make sense. In making init and the user mode helpers tasks purely user mode tasks I ran into two complications. The function task_tick_numa was detecting tasks without an mm by testing for the presence of PF_KTHREAD. The initramfs code in populate_initrd_image was using flush_delayed_fput to ensuere the closing of all it's file descriptors was complete, and flush_delayed_fput does not work in a userspace thread. I have looked and looked and more complications and in my code review I have not found any, and neither has anyone else with the code sitting in linux-next" * tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: sched: Update task_tick_numa to ignore tasks without an mm fork: Stop allowing kthreads to call execve fork: Explicitly set PF_KTHREAD init: Deal with the init process being a user mode process fork: Generalize PF_IO_WORKER handling fork: Explicity test for idle tasks in copy_thread fork: Pass struct kernel_clone_args into copy_thread kthread: Don't allocate kthread_struct for init and umh	2022-06-03 16:03:05 -07:00
Linus Torvalds	744983d878	This pull request contains fixes for JFFS2, UBI and UBIFS JFFS2: - Fixes for a memory leak UBI: - Fixes for fastmap (UAF, high CPU usage) UBIFS: - Minor cleanups -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmKZ0F0WHHJpY2hhcmRA c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wVOsD/wMI9/RR3NQukIdj1besXnYWCGr 44PLTZsX+fwn/ndj/IDqtyZiStILQOX3gNK9bZ540bVQGkMxgYerQUjZH2A1CaBd BwisUhU9duv6t6ObB8IudnqYB3nYTM7mRGQQOnkw0ddGdCoN7S8WB7ed7ce6bLF2 X4cbzpIfBhuXghKzZ0d9k/uNEj4ZcTsTov+yiX2F9M0xBgfFIoIrP51mk+5l7PAj BmEqDFRfuTPm5u1WMUpThPX/RhSEu5PkZDLztSmD8WfiaF20KPFrWIMtKQmn/3vT IkRzJoHZSX8KmjoD1eBHssblJwcF6l5OGZNrS2NGqJMpVtRJ2uFWq3lx/WjgN88W /eqIoryYod8NTogTzl//y/mtDhSkaGK6LdDKjk5Z1gP1ferJNBw8k1+icR/YP4sj /dBXJTI7nIrb79EegQHIGNVAfE+Oi+rjEB/r8cqG9z5sJQvCK9p30+cVJSuK0/Na N6wX107HtgDh5kLaUKwTQjZBi/1TpV0dQCdAOkqgyn5YlzxXKFKISBrRIPuFtGXM a6fBwd+RSd9ccWGPE4R+6JQELFOYDpOj6cunwheb0B1YLaDGoV+mjAcO6uGdExXy +iErtxV1v+lXqslQKR09uO+N2gApQYF7B01TjDjyxWsKWABe55Tl3z8lGpPxGIcb FvkFMzuuIE4FlyUY6g== =QSbc -----END PGP SIGNATURE----- Merge tag 'for-linus-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull JFFS2, UBI and UBIFS updates from Richard Weinberger: "JFFS2: - Fixes for a memory leak UBI: - Fixes for fastmap (UAF, high CPU usage) UBIFS: - Minor cleanups" * tag 'for-linus-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: ubi_create_volume: Fix use-after-free when volume creation failed ubi: fastmap: Check wl_pool for free peb before wear leveling ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not empty ubifs: Use NULL instead of using plain integer as pointer ubifs: Simplify the return expression of run_gc() jffs2: fix memory leak in jffs2_do_fill_super jffs2: Use kzalloc instead of kmalloc/memset	2022-06-03 14:42:24 -07:00
Paulo Alcantara	ef605e8682	cifs: skip trailing separators of prefix paths During DFS failover, prefix paths may change, so make sure to not leave trailing separators when parsing thew in dfs_cache_get_tgt_share(). The separators of prefix paths are already handled by build_path_from_dentry_optional_prefix(). Consider the following DFS link: //dom/dfs/link: [\srv1\share\dir1, \srv2\share\dir1] Before commit: mount.cifs //dom/dfs/link tree connect to \\srv1\share; prefix_path=dir1 disconnect srv1; failover to srv2 tree connect to \\srv2\share; prefix_path=dir1\ mv foo bar ... SMB2 430 Create Request File: dir1\\foo;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request SMB2 582 Create Response File: dir1\\foo;GetInfo Response;Close Response SMB2 430 Create Request File: dir1\\bar;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request SMB2 286 Create Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;GetInfo Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;Close Response, Error: STATUS_OBJECT_NAME_NOT_FOUND SMB2 462 Create Request File: dir1\\foo;SetInfo Request FILE_INFO/SMB2_FILE_RENAME_INFO NewName:dir1\\bar;Close Request SMB2 478 Create Response File: dir1\\foo;SetInfo Response, Error: STATUS_OBJECT_NAME_INVALID;Close Response After commit: mount.cifs //dom/dfs/link tree connect to \\srv1\share; prefix_path=dir1 disconnect srv1; failover to srv2 tree connect to \\srv2\share; prefix_path=dir1 mv foo bar ... SMB2 430 Create Request File: dir1\foo;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request SMB2 582 Create Response File: dir1\foo;GetInfo Response;Close Response SMB2 430 Create Request File: dir1\bar;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request SMB2 286 Create Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;GetInfo Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;Close Response, Error: STATUS_OBJECT_NAME_NOT_FOUND SMB2 462 Create Request File: dir1\foo;SetInfo Request FILE_INFO/SMB2_FILE_RENAME_INFO NewName:dir1\bar;Close Request SMB2 478 Create Response File: dir1\foo;SetInfo Response;Close Response Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-03 14:14:58 -05:00
Linus Torvalds	500a434fc5	Driver core changes for 5.19-rc1 Here is the set of driver core changes for 5.19-rc1. Note, I'm not really happy with this pull request as-is, see below for details, but overall this is all good for everything but a small set of systems, which we have a fix for already. Lots of tiny driver core changes and cleanups happened this cycle, but the two major things were: - firmware_loader reorganization and additions including the ability to have XZ compressed firmware images and the ability for userspace to initiate the firmware load when it needs to, instead of being always initiated by the kernel. FPGA devices specifically want this ability to have their firmware changed over the lifetime of the system boot, and this allows them to work without having to come up with yet-another-custom-uapi interface for loading firmware for them. - physical location support added to sysfs so that devices that know this information, can tell userspace where they are located in a common way. Some ACPI devices already support this today, and more bus types should support this in the future. Smaller changes included: - driver_override api cleanups and fixes - error path cleanups and fixes - get_abi script fixes - deferred probe timeout changes. It's that last change that I'm the most worried about. It has been reported to cause boot problems for a number of systems, and I have a tested patch series that resolves this issue. But I didn't get it merged into my tree before 5.18-final came out, so it has not gotten any linux-next testing. I'll send the fixup patches (there are 2) as a follow-on series to this pull request if you want to take them directly, _OR_ I can just revert the probe timeout changes and they can wait for the next -rc1 merge cycle. Given that the fixes are tested, and pretty simple, I'm leaning toward that choice. Sorry this all came at the end of the merge window, I should have resolved this all 2 weeks ago, that's my fault as it was in the middle of some travel for me. All have been tested in linux-next for weeks, with no reported issues other than the above-mentioned boot time outs. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYpnv/A8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+yk/fACgvmenbo5HipqyHnOmTQlT50xQ9EYAn2eTq6ai GkjLXBGNWOPBa5cU52qf =yEi/ -----END PGP SIGNATURE----- Merge tag 'driver-core-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of driver core changes for 5.19-rc1. Lots of tiny driver core changes and cleanups happened this cycle, but the two major things are: - firmware_loader reorganization and additions including the ability to have XZ compressed firmware images and the ability for userspace to initiate the firmware load when it needs to, instead of being always initiated by the kernel. FPGA devices specifically want this ability to have their firmware changed over the lifetime of the system boot, and this allows them to work without having to come up with yet-another-custom-uapi interface for loading firmware for them. - physical location support added to sysfs so that devices that know this information, can tell userspace where they are located in a common way. Some ACPI devices already support this today, and more bus types should support this in the future. Smaller changes include: - driver_override api cleanups and fixes - error path cleanups and fixes - get_abi script fixes - deferred probe timeout changes. It's that last change that I'm the most worried about. It has been reported to cause boot problems for a number of systems, and I have a tested patch series that resolves this issue. But I didn't get it merged into my tree before 5.18-final came out, so it has not gotten any linux-next testing. I'll send the fixup patches (there are 2) as a follow-on series to this pull request. All have been tested in linux-next for weeks, with no reported issues other than the above-mentioned boot time-outs" * tag 'driver-core-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) driver core: fix deadlock in __device_attach kernfs: Separate kernfs_pr_cont_buf and rename_lock. topology: Remove unused cpu_cluster_mask() driver core: Extend deferred probe timeout on driver registration MAINTAINERS: add Russ Weight as a firmware loader maintainer driver: base: fix UAF when driver_attach failed test_firmware: fix end of loop test in upload_read_show() driver core: location: Add "back" as a possible output for panel driver core: location: Free struct acpi_pld_info pld driver core: Add "" wildcard support to driver_async_probe cmdline param driver core: location: Check for allocations failure arch_topology: Trace the update thermal pressure kernfs: Rename kernfs_put_open_node to kernfs_unlink_open_file. export: fix string handling of namespace in EXPORT_SYMBOL_NS rpmsg: use local 'dev' variable rpmsg: Fix calling device_lock() on non-initialized device firmware_loader: describe 'module' parameter of firmware_upload_register() firmware_loader: Move definitions from sysfs_upload.h to sysfs.h firmware_loader: Fix configs for sysfs split selftests: firmware: Add firmware upload selftests ...	2022-06-03 11:48:47 -07:00
Linus Torvalds	04d93b2b8b	SPDX changes for 5.19-rc1 Here are some SPDX (i.e. licensing) changes for 5.19-rc1 The SPDX-labeling effort has started to pick up again, so here are some changes for various parts of the tree that are related to this effort. Included in here are: - freevxfs license updates - spihash.c license cleanups - spdxcheck script updates to make things easier to work with going forward All of the license updates came from the original authors/copyright holders of the code involved. All of these have been in linux-next for weeks with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYpngmg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+yl41wCgzt9M0/9hLjVV9UIW2l2phyJQZPQAoK7u0RUU tYRRT2gSUwAHlu3khZSS =fjdf -----END PGP SIGNATURE----- Merge tag 'spdx-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX updates from Greg KH: "Here are some SPDX license marker changes. The SPDX-labeling effort has started to pick up again, so here are some changes for various parts of the tree that are related to this effort. Included in here are: - freevxfs license updates - spihash.c license cleanups - spdxcheck script updates to make things easier to work with going forward All of the license updates came from the original authors/copyright holders of the code involved. All of these have been in linux-next for weeks with no reported issues" * tag 'spdx-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: siphash: add SPDX tags as sole licensing authority scripts/spdxcheck: Exclude top-level README scripts/spdxcheck: Exclude MAINTAINERS/CREDITS scripts/spdxcheck: Exclude config directories scripts/spdxcheck: Put excluded files and directories into a separate file scripts/spdxcheck: Add option to display files without SPDX scripts/spdxcheck: Add [sub]directory statistics scripts/spdxcheck: Add directory statistics scripts/spdxcheck: Add percentage to statistics freevxfs: relicense to GPLv2 only	2022-06-03 10:34:34 -07:00
Linus Torvalds	5ac8bdb9ad	io_uring-5.19-2022-06-02 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKZmm8QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpka7EAC8aWPkU+s5qiRVD3pJRjQQb0kAIUHXwlgv Y4c7CdBqjIYcRbMkxZg5lJaBHr8cYh67X0RSrkxcgO4uEtF3DTMZcrlG7ZaV8jnB e44zjBNsWuI6//ef6vlACuTLqsb/ZRTxzDB2haPlqAZ/uMjK5TdUIQfFc83IDLAE UxrUWE25R49hvg84L3Apbn79kLsLpqbv2vANuctDhOE0bH10S0SS987PVTg4TUA+ tkIzTyqgB9vWFJwsCkARp13uGreW+XroHwSh7KwKXJR55lu2f1vD4Spg0UvgigQQ I6vSphlvd4GL7oqQM0pHUyuraYOQ/WChXcUN3Jitgv92S1W0+s73hp/RgBFZBRZM E1m3Qu465QlNziwfRxV5gpPC4TpVH7CeXE8RpaH76KhACsQSzYtDbcl+gYaZJnOz 6pp0kAbJ8Wrfn4P0bYwpCz/aPi1+P2cKNhzeUIF/wqz3yt8CsQkbTyrQ+TLQ4XtX VnMF10opDl6yt2ELP9KPt3vjJnzIbxEmlqdg1yj4rB5FWWj8CkBdoaeqFdnqcL2r AuLjB/yjtzxJTAOMZYV/gCnArRiGgGb/7JTbyymW+iXNEdnYhmG225HfEG9c7ytj XHsu1vdNonNsk+cXqrGRZ+AUm8eHDX4hZX4CG06UXUuPEBoOJ5qfEZTiXLIwOnGJ Q0IGHDGgaw== =OkKj -----END PGP SIGNATURE----- Merge tag 'io_uring-5.19-2022-06-02' of git://git.kernel.dk/linux-block Pull more io_uring updates from Jens Axboe: - A small series with some prep patches for the upcoming 5.20 split of the io_uring.c file. No functional changes here, just minor bits that are nice to get out of the way now (me) - Fix for a memory leak in high numbered provided buffer groups, introduced in the merge window (me) - Wire up the new socket opcode for allocated direct descriptors, making it consistent with the other opcodes that can instantiate a descriptor (me) - Fix for the inflight tracking, should go into 5.18-stable as well (me) - Fix for a deadlock for io-wq offloaded file slot allocations (Pavel) - Direct descriptor failure fput leak fix (Xiaoguang) - Fix for the direct descriptor allocation hinting in case of unsuccessful install (Xiaoguang) * tag 'io_uring-5.19-2022-06-02' of git://git.kernel.dk/linux-block: io_uring: reinstate the inflight tracking io_uring: fix deadlock on iowq file slot alloc io_uring: let IORING_OP_FILES_UPDATE support choosing fixed file slots io_uring: defer alloc_hint update to io_file_bitmap_set() io_uring: ensure fput() called correspondingly when direct install fails io_uring: wire up allocated direct descriptors for socket io_uring: fix a memory leak of buffer group list on exit io_uring: move shutdown under the general net section io_uring: unify calling convention for async prep handling io_uring: add io_op_defs 'def' pointer in req init and issue io_uring: make prep and issue side of req handlers named consistently io_uring: make timeout prep handlers consistent with other prep handlers	2022-06-03 10:10:38 -07:00
Chuck Lever	b6c71c66b0	NFSD: Fix potential use-after-free in nfsd_file_put() nfsd_file_put_noref() can free @nf, so don't dereference @nf immediately upon return from nfsd_file_put_noref(). Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Fixes: `999397926a` ("nfsd: Clean up nfsd_file_put()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-06-02 13:05:58 -04:00
Linus Torvalds	17d8e3d90b	A big pile of assorted fixes and improvements for the filesystem with nothing in particular standing out, except perhaps that the fact that the MDS never really maintained atime was made official and thus it's no longer updated on the client either. We also have a MAINTAINERS update: Jeff is transitioning his filesystem maintainership duties to Xiubo. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmKYs1wTHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi+PvCACIj47W4FapO672xcIkQ4920ZT1Jw/o 2BfKXUtNyVLpGgBlweJWSTd1tfXp0tl9MFg00t/zbVarHH0SGAgF1z6e/tM7rjA/ vyCkFQXJDuwB0kCbCtZ9xt5XIQkkvPPJOmyLSKYl7RqImch7pTRd5IwxgKGWqXDx FraVXqFqvr8L+szV/JCopdxdMVTFixWRD48z5pPlOReaOXiGjtTMoFIBIPp7GqVL UB7wyOtDmyzcGnUsRNqMQFrkUBsBW1IEDKf/yVtQNDjUxmr3uXm8vugeISpMOGBO cCkZACDeO0lpgHrXSo4UCf46bg3/HujxZu0nTc9HqPDiFdOmKmf58N4n =MAi2 -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.19-rc1' of https://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "A big pile of assorted fixes and improvements for the filesystem with nothing in particular standing out, except perhaps that the fact that the MDS never really maintained atime was made official and thus it's no longer updated on the client either. We also have a MAINTAINERS update: Jeff is transitioning his filesystem maintainership duties to Xiubo" * tag 'ceph-for-5.19-rc1' of https://github.com/ceph/ceph-client: (23 commits) MAINTAINERS: move myself from ceph "Maintainer" to "Reviewer" ceph: fix decoding of client session messages flags ceph: switch TASK_INTERRUPTIBLE to TASK_KILLABLE ceph: remove redundant variable ino ceph: try to queue a writeback if revoking fails ceph: fix statfs for subdir mounts ceph: fix possible deadlock when holding Fwb to get inline_data ceph: redirty the page for writepage on failure ceph: try to choose the auth MDS if possible for getattr ceph: disable updating the atime since cephfs won't maintain it ceph: flush the mdlog for filesystem sync ceph: rename unsafe_request_wait() libceph: use swap() macro instead of taking tmp variable ceph: fix statx AT_STATX_DONT_SYNC vs AT_STATX_FORCE_SYNC check ceph: no need to invalidate the fscache twice ceph: replace usage of found with dedicated list iterator variable ceph: use dedicated list iterator variable ceph: update the dlease for the hashed dentry when removing ceph: stop retrying the request when exceeding 256 times ceph: stop forwarding the request when exceeding 256 times ...	2022-06-02 08:59:39 -07:00
Jens Axboe	9cae36a094	io_uring: reinstate the inflight tracking After some debugging, it was realized that we really do still need the old inflight tracking for any file type that has io_uring_fops assigned. If we don't, then trivial circular references will mean that we never get the ctx cleaned up and hence it'll leak. Just bring back the inflight tracking, which then also means we can eliminate the conditional dropping of the file when task_work is queued. Fixes: `d5361233e9` ("io_uring: drop the old style inflight file tracking") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-06-01 23:57:02 -06:00
Steve French	096c956b0d	cifs: update internal module number To 2.37 Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-01 23:23:09 -05:00
Steve French	7ef93ffccd	cifs: version operations for smb20 unneeded when legacy support disabled We should not be including unused smb20 specific code when legacy support is disabled (CONFIG_CIFS_ALLOW_INSECURE_LEGACY turned off). For example smb2_operations and smb2_values aren't used in that case. Over time we can move more and more SMB1/CIFS and SMB2.0 code into the insecure legacy ifdefs Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-01 22:30:36 -05:00
Steve French	387ba9bf4c	cifs: do not build smb1ops if legacy support is disabled We should not be including unused SMB1/CIFS functions when legacy support is disabled (CONFIG_CIFS_ALLOW_INSECURE_LEGACY turned off), but especially obvious is not needing to build smb1ops.c at all when legacy support is disabled. Over time we can move more SMB1/CIFS and SMB2.0 legacy functions into ifdefs but this is a good start (and shrinks the module size a few percent). Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-06-01 21:49:27 -05:00

... 2 3 4 5 6 ...

77126 Commits