linux

Author	SHA1	Message	Date
Kirill Tkhai	217316a601	fuse: Optimize request_end() by not taking fiq->waitq.lock We take the global fiq->waitq.lock every time we are in this function, but interrupted requests are just a small subset of all requests. This patch optimizes request_end() and makes it take the lock only when it's really needed. queue_interrupt() needs a small change for that. After req is linked to the interrupt list, we do smp_mb() and check for FR_FINISHED again. If the FR_FINISHED bit has appeared, we remove req and leave the function. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:13 +01:00
Kirill Tkhai	8da6e91832	fuse: Kill fasync only if interrupt is queued in queue_interrupt() We should send the signal only if the interrupt is really queued. Not a real problem, but this makes the code clearer and more intuitive. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:13 +01:00
Kirill Tkhai	340617508d	fuse: Remove stale comment in end_requests() Function end_requests() does not take fc->lock. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:12 +01:00
Kirill Tkhai	c5de16cca2	fuse: Replace page without copying in fuse_writepage_in_flight() It looks like we can optimize page replacement and avoid copying by simply updating the request's page. [SzM: swap with new request's tmp page to avoid use after free.] Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:12 +01:00
Miklos Szeredi	e2653bd53a	fuse: fix leaked aux requests Auxiliary requests chained on req->misc.write.next may be leaked on truncate. Free these as well if the parent request was truncated off. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:12 +01:00
Miklos Szeredi	419234d595	fuse: only reuse auxiliary request in fuse_writepage_in_flight() Don't reuse the queued request, even if it only contains a single page. This is needed because previous locking changes (splitting out fiq->waitq.lock from fc->lock) broke the assumption that the request will remain in FR_PENDING at least until the new page contents are copied. This fix removes a slight optimization for a rare corner case, so we really shouldn't care. Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com> Fixes: `fd22d62ed0` ("fuse: no fc->lock for iqueue parts") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:12 +01:00
Miklos Szeredi	7f305ca192	fuse: clean up fuse_writepage_in_flight() Restructure the function to better separate the locked and the unlocked parts. Use the "old_req" local variable to mean only the queued request, and not any auxiliary requests added onto its misc.write.next list. These changes are in preparation for the following patch. Also turn BUG_ON instances into WARN_ON and add a header comment explaining what the function does. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:12 +01:00
Miklos Szeredi	2fe93bd432	fuse: extract fuse_find_writeback() helper Call this from fuse_range_is_writeback() and fuse_writepage_in_flight(). Turn a BUG_ON() into a WARN_ON() in the process. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:12 +01:00
Miklos Szeredi	a2ebba8241	fuse: decrement NR_WRITEBACK_TEMP on the right page NR_WRITEBACK_TEMP is accounted on the temporary page in the request, not the page cache page. Fixes: `8b284dc472` ("fuse: writepages: handle same page rewrites") Cc: <stable@vger.kernel.org> # v3.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-01-16 10:27:59 +01:00
Jann Horn	9509941e9c	fuse: call pipe_buf_release() under pipe lock Some of the pipe_buf_release() handlers seem to assume that the pipe is locked - in particular, anon_pipe_buf_release() accesses pipe->tmp_page without taking any extra locks. From a glance through the callers of pipe_buf_release(), it looks like FUSE is the only one that calls pipe_buf_release() without having the pipe locked. This bug should only lead to a memory leak, nothing terrible. Fixes: `dd3bb14f44` ("fuse: support splice() writing to fuse device") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-01-16 10:27:59 +01:00
Miklos Szeredi	8a3177db59	cuse: fix ioctl cuse_process_init_reply() doesn't initialize fc->max_pages and thus all cuse based ioctls fail with ENOMEM. Reported-by: Andreas Steinmetz <ast@domdv.de> Fixes: `5da784cce4` ("fuse: add max_pages to init_out") Cc: <stable@vger.kernel.org> # v4.20 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-01-16 10:27:59 +01:00
Miklos Szeredi	97e1532ef8	fuse: handle zero sized retrieve correctly Dereferencing req->page_descs[0] will Oops if req->max_pages is zero. Reported-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com Tested-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com Fixes: `b2430d7567` ("fuse: add per-page descriptor <offset, length> to fuse_req") Cc: <stable@vger.kernel.org> # v3.9 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-01-16 10:27:59 +01:00
Arun KS	ca79b0c211	mm: convert totalram_pages and totalhigh_pages variables to atomic totalram_pages and totalhigh_pages are made static inline functions. The main motivation was that managed_page_count_lock handling was complicating things. It was discussed at length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seems better to remove the lock and convert the variables to atomic, with preventing potential store-to-read tearing as a bonus. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org Signed-off-by: Arun KS <arunks@codeaurora.org> Suggested-by: Michal Hocko <mhocko@suse.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-12-28 12:11:47 -08:00
Chad Austin	2e64ff154c	fuse: continue to send FUSE_RELEASEDIR when FUSE_OPEN returns ENOSYS When FUSE_OPEN returns ENOSYS, the no_open bit is set on the connection. Because the FUSE_RELEASE and FUSE_RELEASEDIR paths share code, this incorrectly caused the FUSE_RELEASEDIR request to be dropped and never sent to userspace. Pass an isdir bool to distinguish between FUSE_RELEASE and FUSE_RELEASEDIR inside of fuse_file_put. Fixes: `7678ac5061` ("fuse: support clients that don't implement 'open'") Cc: <stable@vger.kernel.org> # v3.14 Signed-off-by: Chad Austin <chadaustin@fb.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-12-11 21:47:28 +01:00
Takeshi Misawa	d72f70da60	fuse: Fix memory leak in fuse_dev_free() When ntfs is unmounted, the following leak is reported by kmemleak. kmemleak report: unreferenced object 0xffff880052bf4400 (size 4096): comm "mount.ntfs", pid 16530, jiffies 4294861127 (age 3215.836s) hex dump (first 32 bytes): 00 44 bf 52 00 88 ff ff 00 44 bf 52 00 88 ff ff .D.R.....D.R.... 10 44 bf 52 00 88 ff ff 10 44 bf 52 00 88 ff ff .D.R.....D.R.... backtrace: [<00000000bf4a2f8d>] fuse_fill_super+0xb22/0x1da0 [fuse] [<000000004dde0f0c>] mount_bdev+0x263/0x320 [<0000000025aebc66>] mount_fs+0x82/0x2bf [<0000000042c5a6be>] vfs_kern_mount.part.33+0xbf/0x480 [<00000000ed10cd5b>] do_mount+0x3de/0x2ad0 [<00000000d59ff068>] ksys_mount+0xba/0xd0 [<000000001bda1bcc>] __x64_sys_mount+0xba/0x150 [<00000000ebe26304>] do_syscall_64+0x151/0x490 [<00000000d25f2b42>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<000000002e0abd2c>] 0xffffffffffffffff fuse_dev_alloc() allocates fud->pq.processing, but this hash table is not freed. Fix this by freeing fud->pq.processing. Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `be2ff42c5d` ("fuse: Use hash table to link processing request")	2018-12-10 09:57:54 +01:00
Miklos Szeredi	d233c7dd16	fuse: fix revalidation of attributes for permission check fuse_invalidate_attr() now sets fi->inval_mask instead of fi->i_time, hence we need to check the inval mask in fuse_permission() as well. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `2f1e81965f` ("fuse: allow fine grained attr cache invaldation")	2018-12-03 10:14:43 +01:00
Miklos Szeredi	a9c2d1e82f	fuse: fix fsync on directory Commit `ab2257e994` ("fuse: reduce size of struct fuse_inode") moved parts of fields related to writeback on regular file and to directory caching into a union. However fuse_fsync_common() called from fuse_dir_fsync() touches some writeback related fields, resulting in a crash. Move writeback related parts from fuse_fsync_common() to fuse_fsync(). Reported-by: Brett Girton <btgirton@gmail.com> Tested-by: Brett Girton <btgirton@gmail.com> Fixes: `ab2257e994` ("fuse: reduce size of struct fuse_inode") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-12-03 10:14:43 +01:00
Myungho Jung	4fc4bb796b	fuse: Add bad inode check in fuse_destroy_inode() make_bad_inode() sets inode->i_mode to S_IFREG if I/O error is detected in fuse_do_getattr()/fuse_do_setattr(). If the inode is not a regular file, write_files and queued_writes in fuse_inode are not initialized and have NULL or invalid pointers written by other members in a union. So, list_empty() returns false in fuse_destroy_inode(). Add is_bad_inode() to check if make_bad_inode() was called. Reported-by: syzbot+b9c89b84423073226299@syzkaller.appspotmail.com Fixes: `ab2257e994` ("fuse: reduce size of struct fuse_inode") Signed-off-by: Myungho Jung <mhjungk@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-11-22 10:20:19 +01:00
Lukas Czerner	ebacb81273	fuse: fix use-after-free in fuse_direct_IO() In async IO blocking case the additional reference to the io is taken for it to survive fuse_aio_complete(). In non blocking case this additional reference is not needed, however we still reference io to figure out whether to wait for completion or not. This is wrong and will lead to use-after-free. Fix it by storing blocking information in separate variable. This was spotted by KASAN when running generic/208 fstest. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `744742d692` ("fuse: Add reference counting for fuse_io_priv") Cc: <stable@vger.kernel.org> # v4.6	2018-11-09 15:52:17 +01:00
Miklos Szeredi	2d84a2d19b	fuse: fix possibly missed wake-up after abort In current fuse_drop_waiting() implementation it's possible that fuse_wait_aborted() will not be woken up in the unlikely case that fuse_abort_conn() + fuse_wait_aborted() runs in between checking fc->connected and calling atomic_dec(&fc->num_waiting). Do the atomic_dec_and_test() unconditionally, which also provides the necessary barrier against reordering with the fc->connected check. The explicit smp_mb() in fuse_wait_aborted() is not actually needed, since the spin_unlock() in fuse_abort_conn() provides the necessary RELEASE barrier after resetting fc->connected. However, this is not a performance sensitive path, and adding the explicit barrier makes it easier to document. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `b8f95e5d13` ("fuse: umount should wait for all requests") Cc: <stable@vger.kernel.org> #v4.19	2018-11-09 15:52:16 +01:00
Miklos Szeredi	7fabaf3034	fuse: fix leaked notify reply fuse_request_send_notify_reply() may fail if the connection was reset for some reason (e.g. fs was unmounted). Don't leak request reference in this case. Besides leaking memory, this resulted in fc->num_waiting not being decremented and hence fuse_wait_aborted() left in a hanging and unkillable state. Fixes: `2d45ba381a` ("fuse: add retrieve request") Fixes: `b8f95e5d13` ("fuse: umount should wait for all requests") Reported-and-tested-by: syzbot+6339eda9cb4ebbc4c37b@syzkaller.appspotmail.com Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> #v2.6.36	2018-11-09 15:52:16 +01:00
Linus Torvalds	9931a07d51	Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull AFS updates from Al Viro: "AFS series, with some iov_iter bits included" * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) missing bits of "iov_iter: Separate type from direction and use accessor functions" afs: Probe multiple fileservers simultaneously afs: Fix callback handling afs: Eliminate the address pointer from the address list cursor afs: Allow dumping of server cursor on operation failure afs: Implement YFS support in the fs client afs: Expand data structure fields to support YFS afs: Get the target vnode in afs_rmdir() and get a callback on it afs: Calc callback expiry in op reply delivery afs: Fix FS.FetchStatus delivery from updating wrong vnode afs: Implement the YFS cache manager service afs: Remove callback details from afs_callback_break struct afs: Commit the status on a new file/dir/symlink afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS afs: Don't invoke the server to read data beyond EOF afs: Add a couple of tracepoints to log I/O errors afs: Handle EIO from delivery function afs: Fix TTL on VL server and address lists afs: Implement VL server rotation afs: Improve FS server rotation error handling ...	2018-11-01 19:58:52 -07:00
David Howells	00e2370744	iov_iter: Use accessor function Use accessor functions to access an iterator's type and direction. This allows for the possibility of using some other method of determining the type of iterator than if-chains with bitwise-AND conditions. Signed-off-by: David Howells <dhowells@redhat.com>	2018-10-24 00:40:44 +01:00
Dan Schatzberg	5571f1e654	fuse: enable caching of symlinks FUSE file reads are cached in the page cache, but symlink reads are not. This patch enables FUSE READLINK operations to be cached which can improve performance of some FUSE workloads. In particular, I'm working on a FUSE filesystem for access to source code and discovered that about a 10% improvement to build times is achieved with this patch (there are a lot of symlinks in the source tree). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-15 15:43:07 +02:00
Miklos Szeredi	9a2eb24d1a	fuse: only invalidate atime in direct read After sending a synchronous READ request from __fuse_direct_read() we only need to invalidate atime; none of the other attributes should be changed by a read(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-15 15:43:06 +02:00
Miklos Szeredi	802dc0497b	fuse: don't need GETATTR after every READ If 'auto_inval_data' mode is active, then fuse_file_read_iter() will call fuse_update_attributes(), which will check the attribute validity and send a GETATTR request if some of the attributes are no longer valid. The page cache is then invalidated if the size or mtime have changed. Then, if a READ request was sent and reply received (which is the case if the data wasn't cached yet, or if the file is opened for O_DIRECT), the atime attribute is invalidated. This will result in the next read() also triggering a GETATTR, ... This can be fixed by only sending GETATTR if the mode or size are invalid, we don't need to do a refresh if only atime is invalid. More generally, none of the callers of fuse_update_attributes() need an up-to-date atime value, so for now just remove STATX_ATIME from the request mask when attributes are updated for internal use. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-15 15:43:06 +02:00
Miklos Szeredi	2f1e81965f	fuse: allow fine grained attr cache invaldation This patch adds the infrastructure for more fine grained attribute invalidation. Currently only 'atime' is invalidated separately. The use of this infrastructure is extended to the statx(2) interface, which for now means that if only 'atime' is invalid and STATX_ATIME is not specified in the mask argument, then no GETATTR request will be generated. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-15 15:43:06 +02:00
Miklos Szeredi	e52a825048	fuse: realloc page array Writeback caching currently allocates requests with the maximum number of possible pages, while the actual number of pages per request depends on a couple of factors that cannot be determined when the request is allocated (whether page is already under writeback, whether page is contiguous with previous pages already added to a request). This patch allows such requests to start with no page allocation (all pages inline) and grow the page array on demand. If the max_pages tunable remains the default value, then this will mean just one allocation that is the same size as before. If the tunable is larger, then this adds at most 3 additional memory allocations (which is generously compensated by the improved performance from the larger request). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:06 +02:00
Constantine Shulyupin	5da784cce4	fuse: add max_pages to init_out Replace FUSE_MAX_PAGES_PER_REQ with the configurable parameter max_pages to improve performance. Old RFC with detailed description of the problem and many fixes by Mitsuo Hayasaka (mitsuo.hayasaka.hu@hitachi.com): - https://lkml.org/lkml/2012/7/5/136 We've encountered performance degradation and fixed it on a big and complex virtual environment. Environment to reproduce degradation and improvement: 1. Add lag to user mode FUSE Add nanosleep(&(struct timespec){ 0, 1000 }, NULL); to xmp_write_buf in passthrough_fh.c 2. patch UM fuse with configurable max_pages parameter. The patch will be provided later. 3. run test script and perform test on tmpfs fuse_test() { cd /tmp mkdir -p fusemnt passthrough_fh -o max_pages=$1 /tmp/fusemnt grep fuse /proc/self/mounts dd conv=fdatasync oflag=dsync if=/dev/zero of=fusemnt/tmp/tmp \ count=1K bs=1M 2>&1 | grep -v records rm fusemnt/tmp/tmp killall passthrough_fh } Test results: passthrough_fh /tmp/fusemnt fuse.passthrough_fh \ rw,nosuid,nodev,relatime,user_id=0,group_id=0 0 0 1073741824 bytes (1.1 GB) copied, 1.73867 s, 618 MB/s passthrough_fh /tmp/fusemnt fuse.passthrough_fh \ rw,nosuid,nodev,relatime,user_id=0,group_id=0,max_pages=256 0 0 1073741824 bytes (1.1 GB) copied, 1.15643 s, 928 MB/s Obviously with bigger lag the difference between 'before' and 'after' will be more significant. Mitsuo Hayasaka, in 2012 (https://lkml.org/lkml/2012/7/5/136), observed improvement from 400-550 to 520-740. Signed-off-by: Constantine Shulyupin <const@MakeLinux.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:06 +02:00
Miklos Szeredi	8a7aa286ab	fuse: allocate page array more efficiently When allocating page array for a request the array for the page pointers and the array for page descriptors are allocated by two separate kmalloc() calls. Merge these into one allocation. Also instead of initializing the request and the page arrays with memset(), use the zeroing allocation variants. Reserved requests never carry pages (page array size is zero). Make that explicit by initializing the page array pointers to NULL and make sure the assumption remains true by adding a WARN_ON(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:05 +02:00
Miklos Szeredi	ab2257e994	fuse: reduce size of struct fuse_inode Do this by grouping fields used for cached writes and putting them into a union with fields used for cached readdir (with obviously no overlap, since we don't have hybrid objects). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:05 +02:00
Miklos Szeredi	261aaba72f	fuse: use iversion for readdir cache verification Use the internal iversion counter to make sure modifications of the directory through this filesystem are not missed by the mtime check (due to mtime granularity). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:05 +02:00
Miklos Szeredi	7118883b44	fuse: use mtime for readdir cache verification Store the modification time of the directory in the cache, obtained before starting to fill the cache. When reading the cache, verify that the directory hasn't changed, by checking if current modification time is the same as the one stored in the cache. This only needs to be done when the current file position is at the beginning of the directory, as mandated by POSIX. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:04 +02:00
Miklos Szeredi	3494927e09	fuse: add readdir cache version Allow the cache to be invalidated when page(s) have gone missing. In this case increment the version of the cache and reset to an empty state. Add a version number to the directory stream in struct fuse_file as well, indicating the version of the cache it's supposed to be reading. If the cache version doesn't match the stream's version, then reset the stream to the beginning of the cache. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:04 +02:00
Miklos Szeredi	5d7bc7e868	fuse: allow using readdir cache The cache is only used if it's completed, not while it's still being filled; this constraint could be lifted later, if it turns out to be useful. Introduce state in struct fuse_file that indicates the position within the cache. After a seek, reset the position to the beginning of the cache and search the cache for the current position. If the current position is not found in the cache, then fall back to uncached readdir. It can also happen that page(s) disappear from the cache, in which case we must also fall back to uncached readdir. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:04 +02:00
Miklos Szeredi	69e3455115	fuse: allow caching readdir This patch just adds the cache filling functions, which are invoked if FOPEN_CACHE_DIR flag is set in the OPENDIR reply. Cache reading and cache invalidation are added by subsequent patches. The directory cache uses the page cache. Directory entries are packed into a page in the same format as in the READDIR reply. A page only contains whole entries, the space at the end of the page is cleared. The page is locked while being modified. Multiple parallel readdirs on the same directory can fill the cache; the only constraint is that continuity must be maintained (d_off of last entry points to position of current entry). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:04 +02:00
Miklos Szeredi	18172b10b6	fuse: extract fuse_emit() helper Prepare for cache filling by introducing a helper for emitting a single directory entry. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:23 +02:00
Miklos Szeredi	d123d8e183	fuse: split out readdir.c Directory reading code is about to grow larger, so split it out from dir.c into a new source file. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:23 +02:00
Kirill Tkhai	be2ff42c5d	fuse: Use hash table to link processing request We noticed a performance bottleneck in FUSE running our Virtuozzo storage over rdma. On some types of workload we observe 20% of time spent in request_find() in the profiler. This function iterates over a long list of requests, and it scales badly. The patch introduces a hash table to reduce the number of iterations we do in this function. The hash generating algorithm is taken from the hash_add() function, while a 256-line table is used to store pending requests. This fixes the problem and improves performance. Reported-by: Alexey Kuznetsov <kuznet@virtuozzo.com> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:23 +02:00
Kirill Tkhai	3a5358d1a1	fuse: kill req->intr_unique This field is not needed after the previous patch, since we can easily convert request ID to interrupt request ID and vice versa. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:23 +02:00
Kirill Tkhai	c59fd85e4f	fuse: change interrupt requests allocation algorithm Using two unconnected IDs, req->in.h.unique and req->intr_unique, does not allow linking requests into a hash table. We can't use either of them as a key to calculate the hash. This patch changes the algorithm of allocation of IDs for a request. Plain requests obtain even IDs, while interrupt requests are encoded in the low bit. So, in the next patches we will be able to use the rest of the ID bits to calculate the hash, and the hash will be the same for plain and interrupt requests. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:23 +02:00
Kirill Tkhai	63825b4e1d	fuse: do not take fc->lock in fuse_request_send_background() Currently, we take fc->lock there only to check for fc->connected. But this flag is changed only on connection abort, which is a very rare operation. So allow checking fc->connected under just fc->bg_lock and use this lock (as well as fc->lock) when resetting fc->connected. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:23 +02:00
Kirill Tkhai	ae2dffa394	fuse: introduce fc->bg_lock To reduce contention of fc->lock, this patch introduces bg_lock for protection of fields related to the background queue. These are: max_background, congestion_threshold, num_background, active_background, bg_queue and blocked. This allows the next patch to make async reads not require fc->lock, so async reads and writes will have better performance when executed in parallel. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:22 +02:00
Kirill Tkhai	2b30a53314	fuse: add locking to max_background and congestion_threshold changes Function sequences like request_end()->flush_bg_queue() require that max_background and congestion_threshold are constant during their execution. Otherwise, checks like if (fc->num_background == fc->max_background) made at different times may not behave as expected. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:22 +02:00
Kirill Tkhai	2a23f2b8ad	fuse: use READ_ONCE on congestion_threshold and max_background Since they are of unsigned int type, it's allowed to read them unlocked during reporting to userspace. Let's underline this fact with READ_ONCE() macros. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:22 +02:00
Kirill Tkhai	e287179afe	fuse: use list_first_entry() in flush_bg_queue() This cleanup patch makes the function use the primitive instead of direct dereferencing. Also, move fiq dereferencing out of the loop, since it's always constant. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:22 +02:00
Niels de Vos	88bc7d5097	fuse: add support for copy_file_range() There are several FUSE filesystems that can implement server-side copy or other efficient copy/duplication/clone methods. The copy_file_range() syscall is the standard interface that users have access to while not depending on external libraries that bypass FUSE. Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:22 +02:00
Miklos Szeredi	908a572b80	fuse: fix blocked_waitq wakeup Using waitqueue_active() is racy. Make sure we issue a wake_up() unconditionally after storing into fc->blocked. After that it's okay to optimize with waitqueue_active() since the first wake up provides the necessary barrier for all waiters, not just the woken one. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `3c18ef8117` ("fuse: optimize wake_up") Cc: <stable@vger.kernel.org> # v3.10	2018-09-28 16:43:22 +02:00
Miklos Szeredi	4c316f2f3f	fuse: set FR_SENT while locked Otherwise fuse_dev_do_write() could come in and finish off the request, and the set_bit(FR_SENT, ...) could trigger the WARN_ON(test_bit(FR_SENT, ...)) in request_end(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reported-by: syzbot+ef054c4d3f64cd7f7cec@syzkaller.appspotmai Fixes: `46c34a348b` ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2	2018-09-28 16:43:22 +02:00
Kirill Tkhai	d2d2d4fb1f	fuse: Fix use-after-free in fuse_dev_do_write() After we found req in request_find() and released the lock, everything may happen with the req in parallel: cpu0 cpu1 fuse_dev_do_write() fuse_dev_do_write() req = request_find(fpq, ...) ... spin_unlock(&fpq->lock) ... ... req = request_find(fpq, oh.unique) ... spin_unlock(&fpq->lock) queue_interrupt(&fc->iq, req); ... ... ... ... ... request_end(fc, req); fuse_put_request(fc, req); ... queue_interrupt(&fc->iq, req); Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `46c34a348b` ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2	2018-09-28 16:43:21 +02:00

920 Commits