Commit Graph

211 Commits

Author SHA1 Message Date
Daniel Yang
c290fe508e exfat: resolve memory leak from exfat_create_upcase_table()
If exfat_load_upcase_table reaches end and returns -EINVAL,
allocated memory doesn't get freed and while
exfat_load_default_upcase_table allocates more memory, leading to a
memory leak.

Here's link to syzkaller crash report illustrating this issue:
https://syzkaller.appspot.com/text?tag=CrashReport&x=1406c201980000

Reported-by: syzbot+e1c69cadec0f1a078e3d@syzkaller.appspotmail.com
Fixes: a13d1a4de3 ("exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Yang <danielyangkang@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-23 21:38:13 +09:00
Yuezhang Mo
6630ea4910 exfat: move extend valid_size into ->page_mkwrite()
It is not a good way to extend valid_size to the end of the
mmap area by writing zeros in mmap. Because after calling mmap,
no data may be written, or only a small amount of data may be
written to the head of the mmap area.

This commit moves extending valid_size to exfat_page_mkwrite().
In exfat_page_mkwrite() only extend valid_size to the starting
position of new data writing, which reduces unnecessary writing
of zeros.

If the block is not mapped and is marked as new after being
mapped for writing, block_write_begin() will zero the page
cache corresponding to the block, so there is no need to call
zero_user_segment() in exfat_file_zeroed_range(). And after moving
extending valid_size to exfat_page_mkwrite(), the data written by
mmap will be copied to the page cache but the page cache may be
not mapped to the disk. Calling zero_user_segment() will cause
the data written by mmap to be cleared. So this commit removes
calling zero_user_segment() from exfat_file_zeroed_range() and
renames exfat_file_zeroed_range() to exfat_extend_valid_size().

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-23 21:38:11 +09:00
Yuezhang Mo
d2b537b3e5 exfat: fix memory leak in exfat_load_bitmap()
If the first directory entry in the root directory is not a bitmap
directory entry, 'bh' will not be released and reassigned, which
will cause a memory leak.

Fixes: 1e49a94cf7 ("exfat: add bitmap operations")
Cc: stable@vger.kernel.org
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-18 07:40:58 +09:00
Dongliang Cui
f761fcdd28 exfat: Implement sops->shutdown and ioctl
We found that when writing a large file through buffer write, if the
disk is inaccessible, exFAT does not return an error normally, which
leads to the writing process not stopping properly.

To easily reproduce this issue, you can follow the steps below:

1. format a device to exFAT and then mount (with a full disk erase)
2. dd if=/dev/zero of=/exfat_mount/test.img bs=1M count=8192
3. eject the device

You may find that the dd process does not stop immediately and may
continue for a long time.

The root cause of this issue is that during buffer write process,
exFAT does not need to access the disk to look up directory entries
or the FAT table (whereas FAT would do) every time data is written.
Instead, exFAT simply marks the buffer as dirty and returns,
delegating the writeback operation to the writeback process.

If the disk cannot be accessed at this time, the error will only be
returned to the writeback process, and the original process will not
receive the error, so it cannot be returned to the user side.

When the disk cannot be accessed normally, an error should be returned
to stop the writing process.

Implement sops->shutdown and ioctl to shut down the file system
when underlying block device is marked dead.

Signed-off-by: Dongliang Cui <dongliang.cui@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-18 07:40:56 +09:00
Yuezhang Mo
231eb762bb exfat: do not fallback to buffered write
After commit(11a347fb6c exfat: change to get file size from
DataLength), the remaining area or hole had been filled with
zeros before calling exfat_direct_IO(), so there is no need to
fallback to buffered write, and ->i_size_aligned is no longer
needed, drop it.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-17 15:09:50 +09:00
Yuezhang Mo
fba27cf005 exfat: drop ->i_size_ondisk
->i_size_ondisk is no longer used by exfat_write_begin() after
commit(11a347fb6c exfat: change to get file size from DataLength),
drop it.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-17 14:46:51 +09:00
Matthew Wilcox (Oracle)
1da86618bd
fs: Convert aops->write_begin to take a folio
Convert all callers from working on a page to working on one page
of a folio (support for working on an entire folio can come later).
Removes a lot of folio->page->folio conversions.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07 11:33:21 +02:00
Matthew Wilcox (Oracle)
a225800f32
fs: Convert aops->write_end to take a folio
Most callers have a folio, and most implementations operate on a folio,
so remove the conversion from folio->page->folio to fit through this
interface.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07 11:32:02 +02:00
Linus Torvalds
0260b0a744 Description for this pull request:
- Fix deadlock issue reported by syzbot.
 - Handle idmapped mounts.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmaVIHUWHGxpbmtpbmpl
 b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCIlYEACc4STnNTizxVM2wy2pWvBPYVyL
 ayBadYoSqK0nK2+tpBIanw294CYOXobnmdnFDfJofeuuvzUq2bUSyrB+WHdhNdQE
 YuDcEGYzjFMWtedefpxIV9i1mLIpegg+HjCLiHAOhWOKT7ewjLVeqcqjFBtLZw53
 JW4jzTAt/gp0NqeSRUs9lNqPt/tM5F/QL8gEorAAQcG/i0MW96uwl53KWdyIyuqD
 1bCEIaAOpUVtAkdBSJ5fTDrWIqe8QEkVAiQnujeAll08q2dyQiBHsK4WBNyH9TYO
 1XmIhDV6E/x2fxBkthoeCNFvFB9460OV3UveX0j8biXC4XVJMHpvvI6mXZnIT0C4
 AtzW16PBUjuRDqwJ8M///ApNu5BS0297LVTXHPv/2SgkqW+08JNtvMHQOk4f7iIB
 a+AfRltBRnrzQyNkh+SV14mmmxVp2EiTi73m0R5CjAbIuq8efox/RJX1v1MWqc9Q
 +NkFRM/fvt8RCsCWxo1N3Pu8djjegb/FH1hykk/PhJ0S8/36yosyX7SAzYAbvu9W
 m6QVbghBO/lJy4WW5JK6lxoh19EHDDCsv4rdGwRr4rfM1nwq4Rx4l2uxJ5jM6f74
 0ZHNLjCf/fQP/AgnAnCC3m2JpF6R7e+TykAYlmwx3Lxa3z6NHseSDLjdlKuIen6G
 G+dD68j5h3X30/7B2g==
 =xVc5
 -----END PGP SIGNATURE-----

Merge tag 'exfat-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

 - Fix deadlock issue reported by syzbot

 - Handle idmapped mounts

* tag 'exfat-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: fix potential deadlock on __exfat_get_dentry_set
  exfat: handle idmapped mounts
2024-07-17 12:53:47 -07:00
Sungjong Seo
89fc548767 exfat: fix potential deadlock on __exfat_get_dentry_set
When accessing a file with more entries than ES_MAX_ENTRY_NUM, the bh-array
is allocated in __exfat_get_entry_set. The problem is that the bh-array is
allocated with GFP_KERNEL. It does not make sense. In the following cases,
a deadlock for sbi->s_lock between the two processes may occur.

       CPU0                CPU1
       ----                ----
  kswapd
   balance_pgdat
    lock(fs_reclaim)
                      exfat_iterate
                       lock(&sbi->s_lock)
                       exfat_readdir
                        exfat_get_uniname_from_ext_entry
                         exfat_get_dentry_set
                          __exfat_get_dentry_set
                           kmalloc_array
                            ...
                            lock(fs_reclaim)
    ...
    evict
     exfat_evict_inode
      lock(&sbi->s_lock)

To fix this, let's allocate bh-array with GFP_NOFS.

Fixes: a3ff29a95f ("exfat: support dynamic allocate bh for exfat_entry_set_cache")
Cc: stable@vger.kernel.org # v6.2+
Reported-by: syzbot+412a392a2cd4a65e71db@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000fef47e0618c0327f@google.com
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-07-15 21:44:28 +09:00
Michael Jeanson
224821766f exfat: handle idmapped mounts
Pass the idmapped mount information to the different helper
functions. Adapt the uid/gid checks in exfat_setattr to use the
vfsuid/vfsgid helpers.

Based on the fat implementation in commit 4b78993681
("fat: handle idmapped mounts") by Christian Brauner.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-07-15 21:44:28 +09:00
Eric Sandeen
ffe1b94d74
exfat: Convert to new uid/gid option parsing helpers
Convert to new uid/gid option parsing helpers

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Link: https://lore.kernel.org/r/dda575de-11a7-4139-8a25-07957d311ed3@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-02 06:21:18 +02:00
Yuezhang Mo
f19257997d exfat: zero the reserved fields of file and stream extension dentries
From exFAT specification, the reserved fields should initialize
to zero and should not use for any purpose.

If create a new dentry set in the UNUSED dentries, all fields
had been zeroed when allocating cluster to parent directory.

But if create a new dentry set in the DELETED dentries, the
reserved fields in file and stream extension dentries may be
non-zero. Because only the valid bit of the type field of the
dentry is cleared in exfat_remove_entries(), if the type of
dentry is different from the original(For example, a dentry that
was originally a file name dentry, then set to deleted dentry,
and then set as a file dentry), the reserved fields is non-zero.

So this commit initializes the dentry to 0 before createing file
dentry and stream extension dentry.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-04-25 21:59:59 +09:00
Yuezhang Mo
d7ed5232f0 exfat: fix timing of synchronizing bitmap and inode
Commit(f55c096f62 exfat: do not zero the extended part) changed
the timing of synchronizing bitmap and inode in exfat_cont_expand().
The change caused xfstests generic/013 to fail if 'dirsync' or 'sync'
is enabled. So this commit restores the timing.

Fixes: f55c096f62 ("exfat: do not zero the extended part")
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-31 23:39:29 +09:00
Linus Torvalds
1b3e251373 Description for this pull request:
- Improve dirsync performance by syncing on a dentry-set rather than
     on a per-directory entry
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmX653kWHGxpbmtpbmpl
 b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCMEZD/9bpRN1V5YezmQUUh928yC0OWai
 aEDZqmzEsrJ7wXKUYmwjCzxO9/h3CwdtWTaJMctz2XlfJ9L62hiGXki4Cc27vfgs
 RGDV7fHpRmRq+JxgZN+UEnfrJx6kA0xrOaoyrbvT0t8pTCyK8yOY28YsltbI8wKd
 yQbWS4u4r1/Gugfry7PeGA5x6fxGcP/kbjB81Q1+/ilJetqELcUH9INOGSwvfzOh
 k9lgF+ujJVauzP1Pbw4fSZhcfXGYu0x4rbwcUAJUvuc3NXbotKEAW/ICJW1bH5En
 nN7IjiCYwMjEpK7H4zaZ61zrTIfe/MYgKLsq5XWYrvAmSL8QIlJiB4gbSpwd95gc
 7AK4d4mgYrF3oZPjYvb7kpbPtrNywQzIhef2W67E+fGifAKSsuI8gf+LYA2a7m5A
 HqTeL+3z1DnCwEGfLkNtEBy4xbv039dftjDcR8qzPs3WgwWU7G1nhwuQttcXC1qB
 p5eYxL1RvCbDXER9K6chdYQg7KkxCThlIGuG6shqQt4ybApyGbngmw7s3laUHMZQ
 3R3eD1UqtDb3vr04/PmvQ8NbLy0js8Q2lmjIowznyO/ahpCw/++OgX7T4Q+Sse6h
 xfGj8YkyeXNF7t6NwXhErdygvovoXQ4ADZIcxlhSc3NKnZUzL7RDEx5yMUzKggHu
 BlZlCJZoOoAm5Gk9Cw==
 =0fPf
 -----END PGP SIGNATURE-----

Merge tag 'exfat-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

 - Improve dirsync performance by syncing on a dentry-set rather than on
   a per-directory entry

* tag 'exfat-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: remove duplicate update parent dir
  exfat: do not sync parent dir if just update timestamp
  exfat: remove unused functions
  exfat: convert exfat_find_empty_entry() to use dentry cache
  exfat: convert exfat_init_ext_entry() to use dentry cache
  exfat: move free cluster out of exfat_init_ext_entry()
  exfat: convert exfat_remove_entries() to use dentry cache
  exfat: convert exfat_add_entry() to use dentry cache
  exfat: add exfat_get_empty_dentry_set() helper
  exfat: add __exfat_get_dentry_set() helper
2024-03-21 09:47:12 -07:00
Yuezhang Mo
dc38fdc51b exfat: remove duplicate update parent dir
For renaming, the directory only needs to be updated once if it
is in the same directory.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:56:10 +09:00
Yuezhang Mo
96cf51accc exfat: do not sync parent dir if just update timestamp
When sync or dir_sync is enabled, there is no need to sync the
parent directory's inode if only for updating its timestamp.

1. If an unexpected power failure occurs, the timestamp of the
   parent directory is not updated to the storage, which has no
   impact on the user.

2. The number of writes will be greatly reduced, which can not
   only improve performance, but also prolong device life.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:56:05 +09:00
Yuezhang Mo
4d71455976 exfat: remove unused functions
exfat_count_ext_entries() is no longer called, remove it.
exfat_update_dir_chksum() is no longer called, remove it and
rename exfat_update_dir_chksum_with_entry_set() to it.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:56:01 +09:00
Yuezhang Mo
af02c72d0b exfat: convert exfat_find_empty_entry() to use dentry cache
Before this conversion, each dentry traversed needs to be read
from the storage device or page cache. There are at least 16
dentries in a sector. This will result in frequent page cache
searches.

After this conversion, if all directory entries in a sector are
used, the sector only needs to be read once.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:54 +09:00
Yuezhang Mo
d97e060673 exfat: convert exfat_init_ext_entry() to use dentry cache
Before this conversion, in exfat_init_ext_entry(), to init
the dentries in a dentry set, the sync times is equals the
dentry number if 'dirsync' or 'sync' is enabled.
That affects not only performance but also device life.

After this conversion, only needs to be synchronized once if
'dirsync' or 'sync' is enabled.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:49 +09:00
Yuezhang Mo
4e1aa22fea exfat: move free cluster out of exfat_init_ext_entry()
exfat_init_ext_entry() is an init function, it's a bit strange
to free cluster in it. And the argument 'inode' will be removed
from exfat_init_ext_entry(). So this commit changes to free the
cluster in exfat_remove_entries().

Code refinement, no functional changes.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:45 +09:00
Yuezhang Mo
ff4343da02 exfat: convert exfat_remove_entries() to use dentry cache
Before this conversion, in exfat_remove_entries(), to mark the
dentries in a dentry set as deleted, the sync times is equals
the dentry numbers if 'dirsync' or 'sync' is enabled.
That affects not only performance but also device life.

After this conversion, only needs to be synchronized once if
'dirsync' or 'sync' is enabled.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:40 +09:00
Yuezhang Mo
cf8663fa99 exfat: convert exfat_add_entry() to use dentry cache
After this conversion, if "dirsync" or "sync" is enabled, the
number of synchronized dentries in exfat_add_entry() will change
from 2 to 1.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:36 +09:00
Yuezhang Mo
01da3a5176 exfat: add exfat_get_empty_dentry_set() helper
This helper is used to lookup empty dentry set. If there are
no enough empty dentries at the input location, this helper will
return the number of dentries that need to be skipped for the
next lookup.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:33 +09:00
Yuezhang Mo
7b6bab2359 exfat: add __exfat_get_dentry_set() helper
Since exfat_get_dentry_set() invokes the validate functions of
exfat_validate_entry(), it only supports getting a directory
entry set of an existing file, doesn't support getting an empty
entry set.

To remove the limitation, add this helper.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:28 +09:00
Linus Torvalds
f88c3fb81c mm, slab: remove last vestiges of SLAB_MEM_SPREAD
Yes, yes, I know the slab people were planning on going slow and letting
every subsystem fight this thing on their own.  But let's just rip off
the band-aid and get it over and done with.  I don't want to see a
number of unnecessary pull requests just to get rid of a flag that no
longer has any meaning.

This was mainly done with a couple of 'sed' scripts and then some manual
cleanup of the end result.

Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-12 20:32:19 -07:00
Linus Torvalds
3aec97e0c7 Description for this pull request:
- Fix ftruncate failure when allocating non-contiguous clusters.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmXhiBkWHGxpbmtpbmpl
 b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCFRdD/47DYPGhZPWt4+JamGs3ZR0WX8l
 555GnbSjzLmmGUYjEyDsMmJllL7I0gX3Hm2QNOJIce1ETh3a2oSQlp+O+uIJM7zW
 rQjV8l6u+JZXEvpZwbfIfpyQ2+p6wFgueFC8+HX+VwfiJXw0hKNfqIwugjyRiWUF
 P1pptVZvsne2d7dUM8nmwedFZClP2GhyFxPqOiAvPd6Yw5s6b2yOj534eU9JSWfp
 ph/ZVnZOZNwaKUPD57OWYFKb4hLHEDTpjWjpU1jXkdeKCAB9WuJyegJMtznJIaBW
 DtkDmaUeQLPccWFWlZXOPvZfueJuWU1rxzGCqYBOvP77SPDATxqgFh5yhRj5wqnx
 Cs//VqFY6fZC6HfXAURvbM6l+NblaH1m6Bm7o7PkbBScju9tyKByqMNGLtHHnH2V
 y8jD8mFj2wjvwD+6EBdIVD7ddr1M8uyPUpk++gYxklpSDwwzngkqpUItYCv1sKhD
 tlwyPY5efDXC1e38KWTmbhGh0UACz8pVNPbFZsQ7jVlvKlAoT+/T4SGz4t+4qXpA
 IEo36PYEznxNZllf5/kTQ/yzVjPuX2K0CVTvNb01nTdGfHE1p8Uvt1nWJr2nRall
 vSsOswEbJ78O+313+HCWL2WreqlN4yCVhh+cpyinGLZgrQwBMl4uJrgBb0XxwTD6
 0PjItbRcF4C+Bo4tGw==
 =LfJg
 -----END PGP SIGNATURE-----

Merge tag 'exfat-for-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat fix from Namjae Jeon:

 - Fix ftruncate failure when allocating non-contiguous clusters

* tag 'exfat-for-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: fix appending discontinuous clusters to empty file
2024-03-01 12:22:30 -08:00
Linus Torvalds
66a97c2ec9 We still have some races in filesystem methods when exposed to RCU
pathwalk.  This series is a result of code audit (the second round
 of it) and it should deal with most of that stuff.  Exceptions: ntfs3
 ->d_hash()/->d_compare() and ceph_d_revalidate().  Up to maintainers (a
 note for NTFS folks - when documentation says that a method may not block,
 it *does* imply that blocking allocations are to be avoided.  Really).
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZdroDAAKCRBZ7Krx/gZQ
 60dKAQCzp6rYr3ye4nylho9Rzu8LEpH04TuNf3C6JuyUaNHxHwEAvNLatZsyFnmV
 Ksp2Rg/IlKPNtQgYJ8xPxv9DFmNe8gI=
 =47Un
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull RCU pathwalk fixes from Al Viro:
 "We still have some races in filesystem methods when exposed to RCU
  pathwalk. This series is a result of code audit (the second round of
  it) and it should deal with most of that stuff.

  Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate().
  Up to maintainers (a note for NTFS folks - when documentation says
  that a method may not block, it *does* imply that blocking allocations
  are to be avoided. Really)"

[ More explanations for people who aren't familiar with the vagaries of
  RCU path walking: most of it is hidden from filesystems, but if a
  filesystem actively participates in the low-level path walking it
  needs to make sure the fields involved in that walk are RCU-safe.

  That "actively participate in low-level path walking" includes things
  like having its own ->d_hash()/->d_compare() routines, or by having
  its own directory permission function that doesn't just use the common
  helpers.  Having a ->d_revalidate() function will also have this issue.

  Note that instead of making everything RCU safe you can also choose to
  abort the RCU pathwalk if your operation cannot be done safely under
  RCU, but that obviously comes with a performance penalty. One common
  pattern is to allow the simple cases under RCU, and abort only if you
  need to do something more complicated.

  So not everything needs to be RCU-safe, and things like the inode etc
  that the VFS itself maintains obviously already are. But these fixes
  tend to be about properly RCU-delaying things like ->s_fs_info that
  are maintained by the filesystem and that got potentially released too
  early.   - Linus ]

* tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ext4_get_link(): fix breakage in RCU mode
  cifs_get_link(): bail out in unsafe case
  fuse: fix UAF in rcu pathwalks
  procfs: make freeing proc_fs_info rcu-delayed
  procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
  nfs: fix UAF on pathwalk running into umount
  nfs: make nfs_set_verifier() safe for use in RCU pathwalk
  afs: fix __afs_break_callback() / afs_drop_open_mmap() race
  hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
  exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
  affs: free affs_sb_info with kfree_rcu()
  rcu pathwalk: prevent bogus hard errors from may_lookup()
  fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
2024-02-25 09:29:05 -08:00
Al Viro
a13d1a4de3 exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have
a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem
that is in process of getting shut down.

Besides, having nls and upcase table cleanup moved from ->put_super() towards
the place where sbi is freed makes for simpler failure exits.

Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:31 -05:00
Yuezhang Mo
3a7845041e exfat: fix appending discontinuous clusters to empty file
Eric Hong found that when using ftruncate to expand an empty file,
exfat_ent_set() will fail if discontinuous clusters are allocated.
The reason is that the empty file does not have a cluster chain,
but exfat_ent_set() attempts to append the newly allocated cluster
to the cluster chain. In addition, exfat_find_last_cluster() only
supports finding the last cluster in a non-empty file.

So this commit adds a check whether the file is empty. If the file
is empty, exfat_find_last_cluster() and exfat_ent_set() are no longer
called as they do not need to be called.

Fixes: f55c096f62 ("exfat: do not zero the extended part")
Reported-by: Eric Hong <erichong@qnap.com>
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-02-18 14:41:18 +09:00
Yuezhang Mo
0991abedde exfat: fix zero the unwritten part for dio read
For dio read, bio will be leave in flight when a successful partial
aio read have been setup, blockdev_direct_IO() will return
-EIOCBQUEUED. In the case, iter->iov_offset will be not advanced,
the oops reported by syzbot will occur if revert iter->iov_offset
with iov_iter_revert(). The unwritten part had been zeroed by aio
read, so there is no need to zero it in dio read.

Reported-by: syzbot+fd404f6b03a58e8bc403@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fd404f6b03a58e8bc403
Fixes: 11a347fb6c ("exfat: change to get file size from DataLength")
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-18 23:01:51 +09:00
Yuezhang Mo
f55c096f62 exfat: do not zero the extended part
Since the read operation beyond the ValidDataLength returns zero,
if we just extend the size of the file, we don't need to zero the
extended part, but only change the DataLength without changing
the ValidDataLength.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-08 21:57:22 +09:00
Yuezhang Mo
11a347fb6c exfat: change to get file size from DataLength
In stream extension directory entry, the ValidDataLength
field describes how far into the data stream user data has
been written, and the DataLength field describes the file
size.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-08 21:57:22 +09:00
John Sanpe
34939ae005 exfat: using ffs instead of internal logic
Replaced the internal table lookup algorithm with ffs of
the bitops library with better performance.

Use it to increase the single processing length of the
exfat_find_free_bitmap function, from single-byte search to long type.

Signed-off-by: John Sanpe <sanpeqf@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-08 21:57:21 +09:00
John Sanpe
7423546040 exfat: using hweight instead of internal logic
Replace the internal table lookup algorithm with the hweight
library, which has instruction set acceleration capabilities.

Use it to increase the length of a single calculation of
the exfat_find_free_bitmap function to the long type.

Signed-off-by: John Sanpe <sanpeqf@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-08 21:57:21 +09:00
Yuezhang Mo
1373ca10ec exfat: fix ctime is not updated
Commit 4c72a36edd ("exfat: convert to new timestamp accessors")
removed attr_copy() from exfat_set_attr().
It causes xfstests generic/221 to fail. In xfstests generic/221,
it tests ctime should be updated even if futimens() update atime
only. But in this case, ctime will not be updated if attr_copy()
is removed.

attr_copy() may also update other attributes, and removing it may
cause other bugs, so this commit restores to call attr_copy() in
exfat_set_attr().

Fixes: 4c72a36edd ("exfat: convert to new timestamp accessors")
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-11-03 22:24:11 +09:00
Yuezhang Mo
fc12a722e6 exfat: fix setting uninitialized time to ctime/atime
An uninitialized time is set to ctime/atime in __exfat_write_inode().
It causes xfstests generic/003 and generic/192 to fail.

And since there will be a time gap between setting ctime/atime to
the inode and writing back the inode, so ctime/atime should not be
set again when writing back the inode.

Fixes: 4c72a36edd ("exfat: convert to new timestamp accessors")
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-11-03 22:24:07 +09:00
Yuezhang Mo
ee785c15b5 exfat: support create zero-size directory
This commit adds mount option 'zero_size_dir'. If this option
enabled, don't allocate a cluster to directory when creating
it, and set the directory size to 0.

On Windows, a cluster is allocated for a directory when it is
created, so the mount option is disabled by default.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-10-31 10:01:45 +09:00
Yuezhang Mo
dab48b8f2f exfat: support handle zero-size directory
After repairing a corrupted file system with exfatprogs' fsck.exfat,
zero-size directories may result. It is also possible to create
zero-size directories in other exFAT implementation, such as Paragon
ufsd dirver.

As described in the specification, the lower directory size limits
is 0 bytes.

Without this commit, sub-directories and files cannot be created
under a zero-size directory, and it cannot be removed.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-10-31 10:01:08 +09:00
Jan Cincera
0ab8ba7186 exfat: add ioctls for accessing attributes
Add GET and SET attributes ioctls to enable attribute modification.
We already do this in FAT and a few userspace utils made for it would
benefit from this also working on exFAT, namely fatattr.

Signed-off-by: Jan Cincera <hcincera@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-10-31 10:00:51 +09:00
Jeff Layton
4c72a36edd
exfat: convert to new timestamp accessors
Convert to using the new inode timestamp accessor functions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-31-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-18 13:26:21 +02:00
Linus Torvalds
3d3dfeb3ae for-6.6/block-2023-08-28
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmTs08EQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpqa4EACu/zKE+omGXBV0Q7kEpVsChjp0ElGtSDIJ
 tJfTuvnWqQjrqRv4ksmZvGdx8SkqFuXri4/7oBXlsaqeUVbIQdWJUpLErBye6nxa
 lUb6nXOFWwyG94cMRYs71lN0loosjb7aiVw7oVLAIhntq3p3doFl/cyy3ndMZrUE
 pZbsrWSt4QiOKhcO0TtIjfAwsr31AN51qFiNNITEiZl3UjXfkGRCK81X0yM2N8zZ
 7Y0h1ldPBsZ/olNWeRyaW1uB64nKM0buR7/nDxCV/NI05nndJ34bIgo/JIj4xy0v
 SiBj2+y86+oMJZt17yYENwOQdtX3hbyESGuVm9dCrO0t9/byVQxkUk0OMm65BM/l
 l2d+gmMQZTbHziqfLlgq9i3i9+B4C2hsb7iBpuo7SW/FPbM45POgi3lpiZycaZyu
 krQo1qwL4KSGXzGN9CabEuKDcJcXqLxqMDOyEDA3R5Kz06V9tNuM+Di/mr4vuZHK
 sVHUfHuWBO9ionLlGPdc3fH/CuMqic8SHjumiAm2menBZV6cSzRDxpm6H4CyLt7y
 tWmw7BNU7dfHFGd+Jw0Ld49sAuEybszEXq6qYv5uYBVfJNqDvOvEeVoQp0RN2jJA
 AG30hymcZgxn9n7gkIgkPQDgIGUjnzUR8B2mE2UFU1CYVHXYXAXU55CCI5oeTkbs
 d0Y/zCZf1A==
 =p1bd
 -----END PGP SIGNATURE-----

Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:
 "Pretty quiet round for this release. This contains:

   - Add support for zoned storage to ublk (Andreas, Ming)

   - Series improving performance for drivers that mark themselves as
     needing a blocking context for issue (Bart)

   - Cleanup the flush logic (Chengming)

   - sed opal keyring support (Greg)

   - Fixes and improvements to the integrity support (Jinyoung)

   - Add some exports for bcachefs that we can hopefully delete again in
     the future (Kent)

   - deadline throttling fix (Zhiguo)

   - Series allowing building the kernel without buffer_head support
     (Christoph)

   - Sanitize the bio page adding flow (Christoph)

   - Write back cache fixes (Christoph)

   - MD updates via Song:
      - Fix perf regression for raid0 large sequential writes (Jan)
      - Fix split bio iostat for raid0 (David)
      - Various raid1 fixes (Heinz, Xueshi)
      - raid6test build fixes (WANG)
      - Deprecate bitmap file support (Christoph)
      - Fix deadlock with md sync thread (Yu)
      - Refactor md io accounting (Yu)
      - Various non-urgent fixes (Li, Yu, Jack)

   - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li,
     Ming, Nitesh, Ruan, Tejun, Thomas, Xu)"

* tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits)
  block: use strscpy() to instead of strncpy()
  block: sed-opal: keyring support for SED keys
  block: sed-opal: Implement IOC_OPAL_REVERT_LSP
  block: sed-opal: Implement IOC_OPAL_DISCOVERY
  blk-mq: prealloc tags when increase tagset nr_hw_queues
  blk-mq: delete redundant tagset map update when fallback
  blk-mq: fix tags leak when shrink nr_hw_queues
  ublk: zoned: support REQ_OP_ZONE_RESET_ALL
  md: raid0: account for split bio in iostat accounting
  md/raid0: Fix performance regression for large sequential writes
  md/raid0: Factor out helper for mapping and submitting a bio
  md raid1: allow writebehind to work on any leg device set WriteMostly
  md/raid1: hold the barrier until handle_read_error() finishes
  md/raid1: free the r1bio before waiting for blocked rdev
  md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io()
  blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init
  drivers/rnbd: restore sysfs interface to rnbd-client
  md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid()
  raid6: test: only check for Altivec if building on powerpc hosts
  raid6: test: make sure all intermediate and artifact files are .gitignored
  ...
2023-08-29 20:21:42 -07:00
Linus Torvalds
511fb5bafe v6.6-vfs.super
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXpbgAKCRCRxhvAZXjc
 oi8PAQCtXelGZHmTcmevsO8p4Qz7hFpkonZ/TnxKf+RdnlNgPgD+NWi+LoRBpaAj
 xk4z8SqJaTTP4WXrG5JZ6o7EQkUL8gE=
 =2e9I
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull superblock updates from Christian Brauner:
 "This contains the super rework that was ready for this cycle. The
  first part changes the order of how we open block devices and allocate
  superblocks, contains various cleanups, simplifications, and a new
  mechanism to wait on superblock state changes.

  This unblocks work to ultimately limit the number of writers to a
  block device. Jan has already scheduled follow-up work that will be
  ready for v6.7 and allows us to restrict the number of writers to a
  given block device. That series builds on this work right here.

  The second part contains filesystem freezing updates.

  Overview:

  The generic superblock changes are rougly organized as follows
  (ignoring additional minor cleanups):

   (1) Removal of the bd_super member from struct block_device.

       This was a very odd back pointer to struct super_block with
       unclear rules. For all relevant places we have other means to get
       the same information so just get rid of this.

   (2) Simplify rules for superblock cleanup.

       Roughly, everything that is allocated during fs_context
       initialization and that's stored in fs_context->s_fs_info needs
       to be cleaned up by the fs_context->free() implementation before
       the superblock allocation function has been called successfully.

       After sget_fc() returned fs_context->s_fs_info has been
       transferred to sb->s_fs_info at which point sb->kill_sb() if
       fully responsible for cleanup. Adhering to these rules means that
       cleanup of sb->s_fs_info in fill_super() is to be avoided as it's
       brittle and inconsistent.

       Cleanup shouldn't be duplicated between sb->put_super() as
       sb->put_super() is only called if sb->s_root has been set aka
       when the filesystem has been successfully born (SB_BORN). That
       complexity should be avoided.

       This also means that block devices are to be closed in
       sb->kill_sb() instead of sb->put_super(). More details in the
       lower section.

   (3) Make it possible to lookup or create a superblock before opening
       block devices

       There's a subtle dependency on (2) as some filesystems did rely
       on fill_super() to be called in order to correctly clean up
       sb->s_fs_info. All these filesystems have been fixed.

   (4) Switch most filesystem to follow the same logic as the generic
       mount code now does as outlined in (3).

   (5) Use the superblock as the holder of the block device. We can now
       easily go back from block device to owning superblock.

   (6) Export and extend the generic fs_holder_ops and use them as
       holder ops everywhere and remove the filesystem specific holder
       ops.

   (7) Call from the block layer up into the filesystem layer when the
       block device is removed, allowing to shut down the filesystem
       without risk of deadlocks.

   (8) Get rid of get_super().

       We can now easily go back from the block device to owning
       superblock and can call up from the block layer into the
       filesystem layer when the device is removed. So no need to wade
       through all registered superblock to find the owning superblock
       anymore"

Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/

* tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits)
  super: use higher-level helper for {freeze,thaw}
  super: wait until we passed kill super
  super: wait for nascent superblocks
  super: make locking naming consistent
  super: use locking helpers
  fs: simplify invalidate_inodes
  fs: remove get_super
  block: call into the file system for ioctl BLKFLSBUF
  block: call into the file system for bdev_mark_dead
  block: consolidate __invalidate_device and fsync_bdev
  block: drop the "busy inodes on changed media" log message
  dasd: also call __invalidate_device when setting the device offline
  amiflop: don't call fsync_bdev in FDFMTBEG
  floppy: call disk_force_media_change when changing the format
  block: simplify the disk_force_media_change interface
  nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl
  xfs use fs_holder_ops for the log and RT devices
  xfs: drop s_umount over opening the log and RT devices
  ext4: use fs_holder_ops for the log device
  ext4: drop s_umount over opening the log device
  ...
2023-08-28 11:04:18 -07:00
Linus Torvalds
615e95831e v6.6-vfs.ctime
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXTKAAKCRCRxhvAZXjc
 oifJAQCzi/p+AdQu8LA/0XvR7fTwaq64ZDCibU4BISuLGT2kEgEAuGbuoFZa0rs2
 XYD/s4+gi64p9Z01MmXm2XO1pu3GPg0=
 =eJz5
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs timestamp updates from Christian Brauner:
 "This adds VFS support for multi-grain timestamps and converts tmpfs,
  xfs, ext4, and btrfs to use them. This carries acks from all relevant
  filesystems.

  The VFS always uses coarse-grained timestamps when updating the ctime
  and mtime after a change. This has the benefit of allowing filesystems
  to optimize away a lot of metadata updates, down to around 1 per
  jiffy, even when a file is under heavy writes.

  Unfortunately, this has always been an issue when we're exporting via
  NFSv3, which relies on timestamps to validate caches. A lot of changes
  can happen in a jiffy, so timestamps aren't sufficient to help the
  client decide to invalidate the cache.

  Even with NFSv4, a lot of exported filesystems don't properly support
  a change attribute and are subject to the same problems with timestamp
  granularity. Other applications have similar issues with timestamps
  (e.g., backup applications).

  If we were to always use fine-grained timestamps, that would improve
  the situation, but that becomes rather expensive, as the underlying
  filesystem would have to log a lot more metadata updates.

  This introduces fine-grained timestamps that are used when they are
  actively queried.

  This uses the 31st bit of the ctime tv_nsec field to indicate that
  something has queried the inode for the mtime or ctime. When this flag
  is set, on the next mtime or ctime update, the kernel will fetch a
  fine-grained timestamp instead of the usual coarse-grained one.

  As POSIX generally mandates that when the mtime changes, the ctime
  must also change the kernel always stores normalized ctime values, so
  only the first 30 bits of the tv_nsec field are ever used.

  Filesytems can opt into this behavior by setting the FS_MGTIME flag in
  the fstype. Filesystems that don't set this flag will continue to use
  coarse-grained timestamps.

  Various preparatory changes, fixes and cleanups are included:

   - Fixup all relevant places where POSIX requires updating ctime
     together with mtime. This is a wide-range of places and all
     maintainers provided necessary Acks.

   - Add new accessors for inode->i_ctime directly and change all
     callers to rely on them. Plain accesses to inode->i_ctime are now
     gone and it is accordingly rename to inode->__i_ctime and commented
     as requiring accessors.

   - Extend generic_fillattr() to pass in a request mask mirroring in a
     sense the statx() uapi. This allows callers to pass in a request
     mask to only get a subset of attributes filled in.

   - Rework timestamp updates so it's possible to drop the @now
     parameter the update_time() inode operation and associated helpers.

   - Add inode_update_timestamps() and convert all filesystems to it
     removing a bunch of open-coding"

* tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits)
  btrfs: convert to multigrain timestamps
  ext4: switch to multigrain timestamps
  xfs: switch to multigrain timestamps
  tmpfs: add support for multigrain timestamps
  fs: add infrastructure for multigrain timestamps
  fs: drop the timespec64 argument from update_time
  xfs: have xfs_vn_update_time gets its own timestamp
  fat: make fat_update_time get its own timestamp
  fat: remove i_version handling from fat_update_time
  ubifs: have ubifs_update_time use inode_update_timestamps
  btrfs: have it use inode_update_timestamps
  fs: drop the timespec64 arg from generic_update_time
  fs: pass the request_mask to generic_fillattr
  fs: remove silly warning from current_time
  gfs2: fix timestamp handling on quota inodes
  fs: rename i_ctime field to __i_ctime
  selinux: convert to ctime accessor functions
  security: convert to ctime accessor functions
  apparmor: convert to ctime accessor functions
  sunrpc: convert to ctime accessor functions
  ...
2023-08-28 09:31:32 -07:00
Christoph Hellwig
4abc9a43d9 exfat: free the sbi and iocharset in ->kill_sb
As a rule of thumb everything allocated to the fs_context and moved into
the super_block should be freed by ->kill_sb so that the teardown
handling doesn't need to be duplicated between the fill_super error
path and put_super.  Implement an exfat-specific kill_sb method to do
that and share the code with the mount contex free helper for the
mount error handling case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230809220545.1308228-11-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:55 +02:00
Christoph Hellwig
c934dc927e exfat: don't RCU-free the sbi
There are no RCU critical sections for accessing any information in the
sbi, so drop the call_rcu indirection for freeing the sbi.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230809220545.1308228-10-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:55 +02:00
Jeff Layton
0d72b92883 fs: pass the request_mask to generic_fillattr
generic_fillattr just fills in the entire stat struct indiscriminately
today, copying data from the inode. There is at least one attribute
(STATX_CHANGE_COOKIE) that can have side effects when it is reported,
and we're looking at adding more with the addition of multigrain
timestamps.

Add a request_mask argument to generic_fillattr and have most callers
just pass in the value that is passed to getattr. Have other callers
(e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of
STATX_CHANGE_COOKIE into generic_fillattr.

Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09 08:56:36 +02:00
Linus Torvalds
3e32715496
vfs: get rid of old '->iterate' directory operation
All users now just use '->iterate_shared()', which only takes the
directory inode lock for reading.

Filesystems that never got convered to shared mode now instead use a
wrapper that drops the lock, re-takes it in write mode, calls the old
function, and then downgrades the lock back to read mode.

This way the VFS layer and other callers no longer need to care about
filesystems that never got converted to the modern era.

The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
ntfs, ocfs2, overlayfs, and vboxsf.

Honestly, several of them look like they really could just iterate their
directories in shared mode and skip the wrapper entirely, but the point
of this change is to not change semantics or fix filesystems that
haven't been fixed in the last 7+ years, but to finally get rid of the
dual iterators.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06 15:08:35 +02:00
Christoph Hellwig
925c86a19b fs: add CONFIG_BUFFER_HEAD
Add a new config option that controls building the buffer_head code, and
select it from all file systems and stacking drivers that need it.

For the block device nodes and alternative iomap based buffered I/O path
is provided when buffer_head support is not enabled, and iomap needs a
a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call
into the buffer_head code when it doesn't exist.

Otherwise this is just Kconfig and ifdef changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20230801172201.1923299-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-02 09:13:09 -06:00
Sungjong Seo
ff84772fd4 exfat: release s_lock before calling dir_emit()
There is a potential deadlock reported by syzbot as below:

======================================================
WARNING: possible circular locking dependency detected
6.4.0-next-20230707-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor330/5073 is trying to acquire lock:
ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock_killable include/linux/mmap_lock.h:151 [inline]
ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: get_mmap_lock_carefully mm/memory.c:5293 [inline]
ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: lock_mm_and_find_vma+0x369/0x510 mm/memory.c:5344
but task is already holding lock:
ffff888019f760e0 (&sbi->s_lock){+.+.}-{3:3}, at: exfat_iterate+0x117/0xb50 fs/exfat/dir.c:232

which lock already depends on the new lock.

Chain exists of:
  &mm->mmap_lock --> mapping.invalidate_lock#3 --> &sbi->s_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sbi->s_lock);
                               lock(mapping.invalidate_lock#3);
                               lock(&sbi->s_lock);
  rlock(&mm->mmap_lock);

Let's try to avoid above potential deadlock condition by moving dir_emit*()
out of sbi->s_lock coverage.

Fixes: ca06197382 ("exfat: add directory operations")
Cc: stable@vger.kernel.org #v5.7+
Reported-by: syzbot+1741a5d9b79989c10bdc@syzkaller.appspotmail.com
Link: https://lore.kernel.org/lkml/00000000000078ee7e060066270b@google.com/T/#u
Tested-by: syzbot+1741a5d9b79989c10bdc@syzkaller.appspotmail.com
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-07-15 08:34:19 +09:00