linux

Author	SHA1	Message	Date
Miklos Szeredi	6314efee3c	fuse: readdirplus: fix RCU walk Doing dput(parent) is not valid in RCU walk mode. In RCU mode it would probably be okay to update the parent flags, but it's actually not necessary most of the time... So only set the FUSE_I_ADVISE_RDPLUS flag on the parent when the entry was recently initialized by READDIRPLUS. This is achieved by setting FUSE_I_INIT_RDPLUS on entries added by READDIRPLUS and only dropping out of RCU mode if this flag is set. FUSE_I_INIT_RDPLUS is cleared once the FUSE_I_ADVISE_RDPLUS flag is set in the parent. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org	2013-10-01 16:41:22 +02:00
Maxim Patlasov	06a7c3c278	fuse: hotfix truncate_pagecache() issue The way how fuse calls truncate_pagecache() from fuse_change_attributes() is completely wrong. Because, w/o i_mutex held, we never sure whether 'oldsize' and 'attr->size' are valid by the time of execution of truncate_pagecache(inode, oldsize, attr->size). In fact, as soon as we released fc->lock in the middle of fuse_change_attributes(), we completely loose control of actions which may happen with given inode until we reach truncate_pagecache. The list of potentially dangerous actions includes mmap-ed reads and writes, ftruncate(2) and write(2) extending file size. The typical outcome of doing truncate_pagecache() with outdated arguments is data corruption from user point of view. This is (in some sense) acceptable in cases when the issue is triggered by a change of the file on the server (i.e. externally wrt fuse operation), but it is absolutely intolerable in scenarios when a single fuse client modifies a file without any external intervention. A real life case I discovered by fsx-linux looked like this: 1. Shrinking ftruncate(2) comes to fuse_do_setattr(). The latter sends FUSE_SETATTR to the server synchronously, but before getting fc->lock ... 2. fuse_dentry_revalidate() is asynchronously called. It sends FUSE_LOOKUP to the server synchronously, then calls fuse_change_attributes(). The latter updates i_size, releases fc->lock, but before comparing oldsize vs attr->size.. 3. fuse_do_setattr() from the first step proceeds by acquiring fc->lock and updating attributes and i_size, but now oldsize is equal to outarg.attr.size because i_size has just been updated (step 2). Hence, fuse_do_setattr() returns w/o calling truncate_pagecache(). 4. As soon as ftruncate(2) completes, the user extends file size by write(2) making a hole in the middle of file, then reads data from the hole either by read(2) or mmap-ed read. The user expects to get zero data from the hole, but gets stale data because truncate_pagecache() is not executed yet. The scenario above illustrates one side of the problem: not truncating the page cache even though we should. Another side corresponds to truncating page cache too late, when the state of inode changed significantly. Theoretically, the following is possible: 1. As in the previous scenario fuse_dentry_revalidate() discovered that i_size changed (due to our own fuse_do_setattr()) and is going to call truncate_pagecache() for some 'new_size' it believes valid right now. But by the time that particular truncate_pagecache() is called ... 2. fuse_do_setattr() returns (either having called truncate_pagecache() or not -- it doesn't matter). 3. The file is extended either by write(2) or ftruncate(2) or fallocate(2). 4. mmap-ed write makes a page in the extended region dirty. The result will be the lost of data user wrote on the fourth step. The patch is a hotfix resolving the issue in a simplistic way: let's skip dangerous i_size update and truncate_pagecache if an operation changing file size is in progress. This simplistic approach looks correct for the cases w/o external changes. And to handle them properly, more sophisticated and intrusive techniques (e.g. NFS-like one) would be required. I'd like to postpone it until the issue is well discussed on the mailing list(s). Changed in v2: - improved patch description to cover both sides of the issue. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org	2013-09-03 13:41:58 +02:00
Miklos Szeredi	60b9df7a54	fuse: add flag to turn on async direct IO Without async DIO write requests to a single file were always serialized. With async DIO that's no longer the case. So don't turn on async DIO by default for fear of breaking backward compatibility. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-05-01 14:37:21 +02:00
Maxim Patlasov	efb9fa9e91	fuse: truncate file if async dio failed The patch improves error handling in fuse_direct_IO(): if we successfully submitted several fuse requests on behalf of synchronous direct write extending file and some of them failed, let's try to do our best to clean-up. Changed in v2: reuse fuse_do_setattr(). Thanks to Brian for suggestion. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-04-18 10:55:24 +02:00
Maxim Patlasov	36cf66ed9f	fuse: make fuse_direct_io() aware about AIO The patch implements passing "struct fuse_io_priv *io" down the stack up to fuse_send_read/write where it is used to submit request asynchronously. io->async==0 designates synchronous processing. Non-trivial part of the patch is changes in fuse_direct_io(): resources like fuse requests and user pages cannot be released immediately in async case. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-04-17 21:50:59 +02:00
Maxim Patlasov	01e9d11a3e	fuse: add support of async IO The patch implements a framework to process an IO request asynchronously. The idea is to associate several fuse requests with a single kiocb by means of fuse_io_priv structure. The structure plays the same role for FUSE as 'struct dio' for direct-io.c. The framework is supposed to be used like this: - someone (who wants to process an IO asynchronously) allocates fuse_io_priv and initializes it setting 'async' field to non-zero value. - as soon as fuse request is filled, it can be submitted (in non-blocking way) by fuse_async_req_send() - when all submitted requests are ACKed by userspace, io->reqs drops to zero triggering aio_complete() In case of IO initiated by libaio, aio_complete() will finish processing the same way as in case of dio_complete() calling aio_complete(). But the framework may be also used for internal FUSE use when initial IO request was synchronous (from user perspective), but it's beneficial to process it asynchronously. Then the caller should wait on kiocb explicitly and aio_complete() will wake the caller up. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-04-17 21:50:59 +02:00
Maxim Patlasov	796523fb24	fuse: add flag fc->initialized Existing flag fc->blocked is used to suspend request allocation both in case of many background request submitted and period of time before init_reply arrives from userspace. Next patch will skip blocking allocations of synchronous request (disregarding fc->blocked). This is mostly OK, but we still need to suspend allocations if init_reply is not arrived yet. The patch introduces flag fc->initialized which will serve this purpose. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-04-17 12:31:44 +02:00
Maxim Patlasov	8b41e6715e	fuse: make request allocations for background processing explicit There are two types of processing requests in FUSE: synchronous (via fuse_request_send()) and asynchronous (via adding to fc->bg_queue). Fortunately, the type of processing is always known in advance, at the time of request allocation. This preparatory patch utilizes this fact making fuse_get_req() aware about the type. Next patches will use it. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-04-17 12:31:44 +02:00
Eric Wong	634734b63a	fuse: allow control of adaptive readdirplus use For some filesystems (e.g. GlusterFS), the cost of performing a normal readdir and readdirplus are identical. Since adaptively using readdirplus has no benefit for those systems, give users/filesystems the option to control adaptive readdirplus use. v2 of this patch incorporates Miklos's suggestion to simplify the code, as well as improving consistency of macro names and documentation. Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-02-07 14:25:44 +01:00
Feng Shuo	4582a4ab2a	FUSE: Adapt readdirplus to application usage patterns Use the same adaptive readdirplus mechanism as NFS: http://permalink.gmane.org/gmane.linux.nfs/49299 If the user space implementation wants to disable readdirplus temporarily, it could just return ENOTSUPP. Then kernel will recall it with readdir. Signed-off-by: Feng Shuo <steve.shuo.feng@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-01-31 17:08:11 +01:00
Anatol Pomozov	c2132c1bc7	Do not use RCU for current process credentials Commit `c69e8d9c0` added rcu lock to fuse/dir.c It was assuming that 'task' is some other process but in fact this parameter always equals to 'current'. Inline this parameter to make it more readable and remove RCU lock as it is not needed when access current process credentials. Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-01-31 17:08:10 +01:00
Miklos Szeredi	fb05f41f5f	fuse: cleanup fuse_direct_io() Fix the following sparse warnings: fs/fuse/file.c:1216:43: warning: cast removes address space of expression fs/fuse/file.c:1216:43: warning: incorrect type in initializer (different address spaces) fs/fuse/file.c:1216:43: expected void [noderef] <asn:1>iov_base fs/fuse/file.c:1216:43: got void <noident> fs/fuse/file.c:1241:43: warning: cast removes address space of expression fs/fuse/file.c:1241:43: warning: incorrect type in initializer (different address spaces) fs/fuse/file.c:1241:43: expected void [noderef] <asn:1>iov_base fs/fuse/file.c:1241:43: got void <noident> fs/fuse/file.c:1267:43: warning: cast removes address space of expression fs/fuse/file.c:1267:43: warning: incorrect type in initializer (different address spaces) fs/fuse/file.c:1267:43: expected void [noderef] <asn:1>iov_base fs/fuse/file.c:1267:43: got void <noident> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-01-24 16:21:28 +01:00
Maxim Patlasov	b2430d7567	fuse: add per-page descriptor <offset, length> to fuse_req The ability to save page pointers along with lengths and offsets in fuse_req will be useful to cover several iovec-s with a single fuse_req. Per-request page_offset is removed because anybody who need it can use req->page_descs[0].offset instead. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-01-24 16:21:27 +01:00
Maxim Patlasov	b111c8c0e3	fuse: categorize fuse_get_req() The patch categorizes all fuse_get_req() invocations into two categories: - fuse_get_req_nopages(fc) - when caller doesn't care about req->pages - fuse_get_req(fc, n) - when caller need n page pointers (n > 0) Adding fuse_get_req_nopages() helps to avoid numerous fuse_get_req(fc, 0) scattered over code. Now it's clear from the first glance when a caller need fuse_req with page pointers. The patch doesn't make any logic changes. In multi-page case, it silly allocates array of FUSE_MAX_PAGES_PER_REQ page pointers. This will be amended by future patches. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-01-24 16:21:25 +01:00
Maxim Patlasov	4250c0668e	fuse: general infrastructure for pages[] of variable size The patch removes inline array of FUSE_MAX_PAGES_PER_REQ page pointers from fuse_req. Instead of that, req->pages may now point either to small inline array or to an array allocated dynamically. This essentially means that all callers of fuse_request_alloc[_nofs] should pass the number of pages needed explicitly. The patch doesn't make any logic changes. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-01-24 16:21:25 +01:00
Anand V. Avati	0b05b18381	fuse: implement NFS-like readdirplus support This patch implements readdirplus support in FUSE, similar to NFS. The payload returned in the readdirplus call contains 'fuse_entry_out' structure thereby providing all the necessary inputs for 'faking' a lookup() operation on the spot. If the dentry and inode already existed (for e.g. in a re-run of ls -l) then just the inode attributes timeout and dentry timeout are refreshed. With a simple client->network->server implementation of a FUSE based filesystem, the following performance observations were made: Test: Performing a filesystem crawl over 20,000 files with sh# time ls -lR /mnt Without readdirplus: Run 1: 18.1s Run 2: 16.0s Run 3: 16.2s With readdirplus: Run 1: 4.1s Run 2: 3.8s Run 3: 3.8s The performance improvement is significant as it avoided 20,000 upcalls calls (lookup). Cache consistency is no worse than what already is. Signed-off-by: Anand V. Avati <avati@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-01-24 16:21:25 +01:00
Eric W. Biederman	499dcf2024	userns: Support fuse interacting with multiple user namespaces Use kuid_t and kgid_t in struct fuse_conn and struct fuse_mount_data. The connection between between a fuse filesystem and a fuse daemon is established when a fuse filesystem is mounted and provided with a file descriptor the fuse daemon created by opening /dev/fuse. For now restrict the communication of uids and gids between the fuse filesystem and the fuse daemon to the initial user namespace. Enforce this by verifying the file descriptor passed to the mount of fuse was opened in the initial user namespace. Ensuring the mount happens in the initial user namespace is not necessary as mounts from non-initial user namespaces are not yet allowed. In fuse_req_init_context convert the currrent fsuid and fsgid into the initial user namespace for the request that will be sent to the fuse daemon. In fuse_fill_attr convert the uid and gid passed from the fuse daemon from the initial user namespace into kuids and kgids. In iattr_to_fattr called from fuse_setattr convert kuids and kgids into the uids and gids in the initial user namespace before passing them to the fuse filesystem. In fuse_change_attributes_common called from fuse_dentry_revalidate, fuse_permission, fuse_geattr, and fuse_setattr, and fuse_iget convert the uid and gid from the fuse daemon into a kuid and a kgid to store on the fuse inode. By default fuse mounts are restricted to task whose uid, suid, and euid matches the fuse user_id and whose gid, sgid, and egid matches the fuse group id. Convert the user_id and group_id mount options into kuids and kgids at mount time, and use uid_eq and gid_eq to compare the in fuse_allow_task. Cc: Miklos Szeredi <miklos@szeredi.hu> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-14 22:05:33 -08:00
Brian Foster	72d0d248ca	fuse: add FUSE_AUTO_INVAL_DATA init flag FUSE_AUTO_INVAL_DATA is provided to enable updated/auto cache invalidation logic. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2012-07-18 16:09:40 +02:00
Pavel Shilovsky	45c72cd73c	fuse: fix stat call on 32 bit platforms Now we store attr->ino at inode->i_ino, return attr->ino at the first time and then return inode->i_ino if the attribute timeout isn't expired. That's wrong on 32 bit platforms because attr->ino is 64 bit and inode->i_ino is 32 bit in this case. Fix this by saving 64 bit ino in fuse_inode structure and returning it every time we call getattr. Also squash attr->ino into inode->i_ino explicitly. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2012-05-14 17:06:42 +02:00
Miklos Szeredi	519c6040ce	fuse: optimize fallocate on permanent failure If userspace filesystem doesn't support fallocate, remember this and don't send request next time. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2012-04-26 10:56:36 +02:00
Linus Torvalds	6733e54b66	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: FUSE: Notifying the kernel of deletion. fuse: support ioctl on directories fuse: Use kcalloc instead of kzalloc to allocate array fuse: llseek optimize SEEK_CUR and SEEK_SET	2012-01-12 12:39:21 -08:00
Al Viro	541af6a074	fuse: propagate umode_t Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:55:07 -05:00
John Muir	451d0f5999	FUSE: Notifying the kernel of deletion. Allows a FUSE file-system to tell the kernel when a file or directory is deleted. If the specified dentry has the specified inode number, the kernel will unhash it. The current 'fuse_notify_inval_entry' does not cause the kernel to clean up directories that are in use properly, and as a result the users of those directories see incorrect semantics from the file-system. The error condition seen when 'fuse_notify_inval_entry' is used to notify of a deleted directory is avoided when 'fuse_notify_delete' is used instead. The following scenario demonstrates the difference: 1. User A chdirs into 'testdir' and starts reading 'testfile'. 2. User B rm -rf 'testdir'. 3. User B creates 'testdir'. 4. User C chdirs into 'testdir'. If you run the above within the same machine on any file-system (including fuse file-systems), there is no problem: user C is able to chdir into the new testdir. The old testdir is removed from the dentry tree, but still open by user A. If operations 2 and 3 are performed via the network such that the fuse file-system uses one of the notify functions to tell the kernel that the nodes are gone, then the following error occurs for user C while user A holds the original directory open: muirj@empacher:~> ls /test/testdir ls: cannot access /test/testdir: No such file or directory The issue here is that the kernel still has a dentry for testdir, and so it is requesting the attributes for the old directory, while the file-system is responding that the directory no longer exists. If on the other hand, if the file-system can notify the kernel that the directory is deleted using the new 'fuse_notify_delete' function, then the above ls will find the new directory as expected. Signed-off-by: John Muir <john@jmuir.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2011-12-13 11:58:49 +01:00
Miklos Szeredi	b18da0c56e	fuse: support ioctl on directories Multiplexing filesystems may want to support ioctls on the underlying files and directores (e.g. FS_IOC_{GET,SET}FLAGS). Ioctl support on directories was missing so add it now. Reported-by: Antonio SJ Musumeci <bile@landofbile.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2011-12-13 11:58:49 +01:00
Linus Torvalds	051732bcbe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message fuse: mark pages accessed when written to fuse: delete dead .write_begin and .write_end aops fuse: fix flock fuse: fix non-ANSI void function notation	2011-08-24 09:14:42 -07:00
Miklos Szeredi	37fb3a30b4	fuse: fix flock Commit `a9ff4f87` "fuse: support BSD locking semantics" overlooked a number of issues with supporing flock locks over existing POSIX locking infrastructure: - it's not backward compatible, passing flock(2) calls to userspace unconditionally (if userspace sets FUSE_POSIX_LOCKS) - it doesn't cater for the fact that flock locks are automatically unlocked on file release - it doesn't take into account the fact that flock exclusive locks (write locks) don't need an fd opened for write. The last one invalidates the original premise of the patch that flock locks can be emulated with POSIX locks. This patch fixes the first two issues. The last one needs to be fixed in userspace if the filesystem assumed that a write lock will happen only on a file operned for write (as in the case of the current fuse library). Reported-by: Sebastian Pipping <webmaster@hartwork.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2011-08-08 16:08:08 +02:00
Josef Bacik	02c24a8218	fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push down taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:59 -04:00
Miklos Szeredi	07d5f69b45	fuse: reduce size of struct fuse_request Reduce the size of struct fuse_request by removing cuse_init_out from the request structure and allocating it dinamically instead. CC: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2011-03-21 13:58:05 +01:00
Miklos Szeredi	5a18ec176c	fuse: fix hang of single threaded fuseblk filesystem Single threaded NTFS-3G could get stuck if a delayed RELEASE reply triggered a DESTROY request via path_put(). Fix this by a) making RELEASE requests synchronous, whenever possible, on fuseblk filesystems b) if not possible (triggered by an asynchronous read/write) then do the path_put() in a separate thread with schedule_work(). Reported-by: Oliver Neukum <oneukum@suse.de> Cc: stable@kernel.org Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2011-02-25 14:44:58 +01:00
Miklos Szeredi	02c048b919	fuse: allow batching of FORGET requests Terje Malmedal reports that a fuse filesystem with 32 million inodes on a machine with lots of memory can take up to 30 minutes to process FORGET requests when all those inodes are evicted from the icache. To solve this, create a BATCH_FORGET request that allows up to about 8000 FORGET requests to be sent in a single message. This request is only sent if userspace supports interface version 7.16 or later, otherwise fall back to sending individual FORGET messages. Reported-by: Terje Malmedal <terje.malmedal@usit.uio.no> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2010-12-07 20:16:56 +01:00
Miklos Szeredi	07e77dca8a	fuse: separate queue for FORGET requests Terje Malmedal reports that a fuse filesystem with 32 million inodes on a machine with lots of memory can go unresponsive for up to 30 minutes when all those inodes are evicted from the icache. The reason is that FORGET messages, sent when the inode is evicted, are queued up together with regular filesystem requests, and while the huge queue of FORGET messages are processed no other filesystem operation can proceed. Since a full fuse request structure is allocated for each inode, these take up quite a bit of memory as well. To solve these issues, create a slim 'fuse_forget_link' structure containing just the minimum of information required to send the FORGET request and chain these on a separate queue. When userspace is asking for a request make sure that FORGET and non-FORGET requests are selected fairly: for each 8 non-FORGET allow 16 FORGET requests. This will make sure FORGETs do not pile up, yet other requests are also allowed to proceed while the queued FORGETs are processed. Reported-by: Terje Malmedal <terje.malmedal@usit.uio.no> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2010-12-07 20:16:56 +01:00
Miklos Szeredi	2d45ba381a	fuse: add retrieve request Userspace filesystem can request data to be retrieved from the inode's mapping. This request is synchronous and the retrieved data is queued as a new request. If the write to the fuse device returns an error then the retrieve request was not completed and a reply will not be sent. Only present pages are returned in the retrieve reply. Retrieving stops when it finds a non-present page and only data prior to that is returned. This request doesn't change the dirty state of pages. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2010-07-12 14:41:40 +02:00
Miklos Szeredi	a1d75f2582	fuse: add store request Userspace filesystem can request data to be stored in the inode's mapping. This request is synchronous and has no reply. If the write to the fuse device returns an error then the store request was not fully completed (but may have updated some pages). If the stored data overflows the current file size, then the size is extended, similarly to a write(2) on the filesystem. Pages which have been completely stored are marked uptodate. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2010-07-12 14:41:40 +02:00
Linus Torvalds	003386fff3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: mm: export generic_pipe_buf_() to modules fuse: support splice() reading from fuse device fuse: allow splice to move pages mm: export remove_from_page_cache() to modules mm: export lru_cache_add_() to modules fuse: support splice() writing to fuse device fuse: get page reference for readpages fuse: use get_user_pages_fast() fuse: remove unneeded variable	2010-05-30 09:16:14 -07:00
Christoph Hellwig	7ea8085910	drop unused dentry argument to ->fsync Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:05:02 -04:00
Miklos Szeredi	ce534fb052	fuse: allow splice to move pages When splicing buffers to the fuse device with SPLICE_F_MOVE, try to move pages from the pipe buffer into the page cache. This allows populating the fuse filesystem's cache without ever touching the page contents, i.e. zero copy read capability. The following steps are performed when trying to move a page into the page cache: - buf->ops->confirm() to make sure the new page is uptodate - buf->ops->steal() to try to remove the new page from it's previous place - remove_from_page_cache() on the old page - add_to_page_cache_locked() on the new page If any of the above steps fail (non fatally) then the code falls back to copying the page. In particular ->steal() will fail if there are external references (other than the page cache and the pipe buffer) to the page. Also since the remove_from_page_cache() + add_to_page_cache_locked() are non-atomic it is possible that the page cache is repopulated in between the two and add_to_page_cache_locked() will fail. This could be fixed by creating a new atomic replace_page_cache_page() function. fuse_readpages_end() needed to be reworked so it works even if page->mapping is NULL for some or all pages which can happen if the add_to_page_cache_locked() failed. A number of sanity checks were added to make sure the stolen pages don't have weird flags set, etc... These could be moved into generic splice/steal code. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2010-05-25 15:06:07 +02:00
npiggin@suse.de	c08d3b0e33	truncate: use new helpers Update some fs code to make use of new helper functions introduced in the previous patch. Should be no significant change in behaviour (except CIFS now calls send_sig under i_lock, via inode_newsize_ok). Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Cc: linux-nfs@vger.kernel.org Cc: Trond.Myklebust@netapp.com Cc: linux-cifs-client@lists.samba.org Cc: sfrench@samba.org Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-09-24 08:41:47 -04:00
Csaba Henk	79a9d99434	fuse: add fusectl interface to max_background Make the max_background and congestion_threshold parameters of a FUSE mount tunable at runtime by adding the respective knobs to its directory within the fusectl filesystem. Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-09-16 14:15:29 +02:00
Csaba Henk	7a6d3c8b30	fuse: make the number of max background requests and congestion threshold tunable The practical values for these limits depend on the design of the filesystem server so let userspace set them at initialization time. Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-07-07 17:28:52 +02:00
John Muir	3b463ae0c6	fuse: invalidation reverse calls Add notification messages that allow the filesystem to invalidate VFS caches. Two notifications are added: 1) inode invalidation - invalidate cached attributes - invalidate a range of pages in the page cache (this is optional) 2) dentry invalidation - try to invalidate a subtree in the dentry cache Care must be taken while accessing the 'struct super_block' for the mount, as it can go away while an invalidation is in progress. To prevent this, introduce a rw-semaphore, that is taken for read during the invalidation and taken for write in the ->kill_sb callback. Cc: Csaba Henk <csaba@gluster.com> Cc: Anand Avati <avati@zresearch.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-06-30 20:12:24 +02:00
Miklos Szeredi	e0a43ddcc0	fuse: allow umask processing in userspace This patch lets filesystems handle masking the file mode on creation. This is needed if filesystem is using ACLs. - The CREATE, MKDIR and MKNOD requests are extended with a "umask" parameter. - A new FUSE_DONT_MASK flag is added to the INIT request/reply. With this the filesystem may request that the create mode is not masked. CC: Jean-Pierre André <jean-pierre.andre@wanadoo.fr> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-06-30 20:12:23 +02:00
Tejun Heo	08cbf542bf	fuse: export symbols to be used by CUSE Export the following symbols for CUSE. fuse_conn_put() fuse_conn_get() fuse_conn_kill() fuse_send_init() fuse_do_open() fuse_sync_release() fuse_direct_io() fuse_do_ioctl() fuse_file_poll() fuse_request_alloc() fuse_get_req() fuse_put_request() fuse_request_send() fuse_abort_conn() fuse_dev_release() fuse_dev_operations Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-04-28 16:56:42 +02:00
Tejun Heo	a325f9b922	fuse: update fuse_conn_init() and separate out fuse_conn_kill() Update fuse_conn_init() such that it doesn't take @sb and move bdi registration into a separate function. Also separate out fuse_conn_kill() from fuse_put_super(). These will be used to implement cuse. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-04-28 16:56:41 +02:00
Miklos Szeredi	8b0797a498	fuse: don't use inode in fuse_sync_release() Make fuse_sync_release() a generic helper function that doesn't need a struct inode pointer. This makes it suitable for use by CUSE. Change return value of fuse_release_common() from int to void. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-04-28 16:56:39 +02:00
Miklos Szeredi	91fe96b403	fuse: create fuse_do_open() helper for CUSE Create a helper for sending an OPEN request that doesn't need a struct inode pointer. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-04-28 16:56:37 +02:00
Miklos Szeredi	c7b7143c63	fuse: clean up args in fuse_finish_open() and fuse_release_fill() Move setting ff->fh, ff->nodeid and file->private_data outside fuse_finish_open(). Add ->open_flags member to struct fuse_file. This simplifies the argument passing to fuse_finish_open() and fuse_release_fill(), and paves the way for creating an open helper that doesn't need an inode pointer. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-04-28 16:56:37 +02:00
Miklos Szeredi	2106cb1893	fuse: don't use inode in helpers called by fuse_direct_io() Use ff->fc and ff->nodeid instead of passing down the inode. This prepares this function for use by CUSE, where the inode is not owned by a fuse filesystem. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-04-28 16:56:37 +02:00
Miklos Szeredi	da5e471457	fuse: add members to struct fuse_file Add new members ->fc and ->nodeid to struct fuse_file. This will aid in converting functions for use by CUSE, where the inode is not owned by a fuse filesystem. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-04-28 16:56:36 +02:00
Miklos Szeredi	b0be46ebf7	fuse: use struct path in release structure Use struct path instead of separate dentry and vfsmount in req->misc.release. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-04-28 16:56:36 +02:00
Al Viro	4269590a72	constify dentry_operations: FUSE Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:01 -04:00
Tejun Heo	43901aabd7	fuse: add fuse_conn->release() Add fuse_conn->release() so that fuse_conn can be embedded in other structures. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2008-11-26 12:03:56 +01:00
Tejun Heo	0d179aa592	fuse: separate out fuse_conn_init() from new_conn() Separate out fuse_conn_init() from new_conn() and while at it initialize fuse_conn->entry during conn initialization. This will be used by CUSE. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2008-11-26 12:03:55 +01:00
Tejun Heo	b93f858ab2	fuse: add fuse_ prefix to several functions Add fuse_ prefix to request_send*() and get_root_inode() as some of those functions will be exported for CUSE. With or without CUSE export, having the function names scoped is a good idea for debuggability. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2008-11-26 12:03:55 +01:00
Tejun Heo	95668a69a4	fuse: implement poll support Implement poll support. Polled files are indexed using kh in a RB tree rooted at fuse_conn->polled_files. Client should send FUSE_NOTIFY_POLL notification once after processing FUSE_POLL which has FUSE_POLL_SCHEDULE_NOTIFY set. Sending notification unconditionally after the latest poll or everytime file content might have changed is inefficient but won't cause malfunction. fuse_file_poll() can sleep and requires patches from the following thread which allows f_op->poll() to sleep. http://thread.gmane.org/gmane.linux.kernel/726176 Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2008-11-26 12:03:55 +01:00
Tejun Heo	acf99433d9	fuse: add file kernel handle The file handle, fuse_file->fh, is opaque value supplied by userland FUSE server and uniqueness is not guaranteed. Add file kernel handle, fuse_file->kh, which is allocated by the kernel on file allocation and guaranteed to be unique. This will be used by poll to match notification to the respective file but can be used for other purposes where unique file handle is necessary. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2008-11-26 12:03:55 +01:00
Miklos Szeredi	1729a16c2c	fuse: style fixes Fix coding style errors reported by checkpatch and others. Uptdate copyright date to 2008. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2008-11-26 12:03:54 +01:00
Tejun Heo	29d434b39c	fuse: add include protectors Add include protectors to include/linux/fuse.h and fs/fuse/fuse_i.h. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2008-10-16 16:08:57 +02:00
Miklos Szeredi	33670fa296	fuse: nfs export special lookups Implement the get_parent export operation by sending a LOOKUP request with ".." as the name. Implement looking up an inode by node ID after it has been evicted from the cache. This is done by seding a LOOKUP request with "." as the name (for all file types, not just directories). The filesystem can set the FUSE_EXPORT_SUPPORT flag in the INIT reply, to indicate that it supports these special lookups. Thanks to John Muir for the original implementation of this feature. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Miklos Szeredi	dbd561d236	fuse: add export operations Implement export_operations, to allow fuse filesystems to be exported to NFS. This feature has been in the out-of-tree fuse module, and is widely used and tested. It has not been originally merged into mainline, because doing the NFS export in userspace was thought to be a cleaner and more efficient way of doing it, than through the kernel. While that is true, it would also have involved a lot of duplicated effort at reimplementing NFS exporting (all the different versions of the protocol). This effort was unfortunately not undertaken by anyone, so we are left with doing it the easy but less efficient way. If this feature goes in, the out-of-tree fuse module can go away, which would have several advantages: - not having to maintain two versions - less confusion for users - no bugs due to kernel API changes Comment from hch: - Use the same fh_type values as XFS, since we use the same fh encoding. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Miklos Szeredi	78bb6cb9a8	fuse: add flag to turn on big writes Prior to 2.6.26 fuse only supported single page write requests. In theory all fuse filesystem should be able support bigger than 4k writes, as there's nothing in the API to prevent it. Unfortunately there's a known case in NTFS-3G where big writes cause filesystem corruption. There could also be other filesystems, where the lack of testing with big write requests would result in bugs. To prevent such problems on a kernel upgrade, disable big writes by default, but let filesystems set a flag to turn it on. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Szabolcs Szakacsits <szaka@ntfs-3g.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-13 08:02:26 -07:00
Miklos Szeredi	b48badf013	fuse: fix node ID type Node ID is 64bit but it is passed as unsigned long to some functions. This breakage wasn't noticed, because libfuse uses unsigned long too. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:51 -07:00
Miklos Szeredi	5c5c5e51b2	fuse: update file size on short read If the READ request returned a short count, then either - cached size is incorrect - filesystem is buggy, as short reads are only allowed on EOF So assume that the size is wrong and refresh it, so that cached read() doesn't zero fill the missing chunk. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:50 -07:00
Miklos Szeredi	3be5a52b30	fuse: support writable mmap Quoting Linus (3 years ago, FUSE inclusion discussions): "User-space filesystems are hard to get right. I'd claim that they are almost impossible, unless you limit them somehow (shared writable mappings are the nastiest part - if you don't have those, you can reasonably limit your problems by limiting the number of dirty pages you accept through normal "write()" calls)." Instead of attempting the impossible, I've just waited for the dirty page accounting infrastructure to materialize (thanks to Peter Zijlstra and others). This nicely solved the biggest problem: limiting the number of pages used for write caching. Some small details remained, however, which this largish patch attempts to address. It provides a page writeback implementation for fuse, which is completely safe against VM related deadlocks. Performance may not be very good for certain usage patterns, but generally it should be acceptable. It has been tested extensively with fsx-linux and bash-shared-mapping. Fuse page writeback design -------------------------- fuse_writepage() allocates a new temporary page with GFP_NOFS\|__GFP_HIGHMEM. It copies the contents of the original page, and queues a WRITE request to the userspace filesystem using this temp page. The writeback is finished instantly from the MM's point of view: the page is removed from the radix trees, and the PageDirty and PageWriteback flags are cleared. For the duration of the actual write, the NR_WRITEBACK_TEMP counter is incremented. The per-bdi writeback count is not decremented until the actual write completes. On dirtying the page, fuse waits for a previous write to finish before proceeding. This makes sure, there can only be one temporary page used at a time for one cached page. This approach is wasteful in both memory and CPU bandwidth, so why is this complication needed? The basic problem is that there can be no guarantee about the time in which the userspace filesystem will complete a write. It may be buggy or even malicious, and fail to complete WRITE requests. We don't want unrelated parts of the system to grind to a halt in such cases. Also a filesystem may need additional resources (particularly memory) to complete a WRITE request. There's a great danger of a deadlock if that allocation may wait for the writepage to finish. Currently there are several cases where the kernel can block on page writeback: - allocation order is larger than PAGE_ALLOC_COSTLY_ORDER - page migration - throttle_vm_writeout (through NR_WRITEBACK) - sync(2) Of course in some cases (fsync, msync) we explicitly want to allow blocking. So for these cases new code has to be added to fuse, since the VM is not tracking writeback pages for us any more. As an extra safetly measure, the maximum dirty ratio allocated to a single fuse filesystem is set to 1% by default. This way one (or several) buggy or malicious fuse filesystems cannot slow down the rest of the system by hogging dirty memory. With appropriate privileges, this limit can be raised through '/sys/class/bdi/<bdi>/max_ratio'. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:50 -07:00
Miklos Szeredi	b6f2fcbcfc	mm: bdi: expose the BDI object in sysfs for FUSE Register FUSE's backing_dev_info under sysfs with the name "fuse-MAJOR:MINOR" Make the fuse control filesystem use s_dev instead of a fuse specific ID. This makes it easier to match directories under /sys/fs/fuse/connections/ with directories under /sys/class/bdi, and with actual mounts. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:49 -07:00
Miklos Szeredi	d12def1bcb	fuse: limit queued background requests Libfuse basically creates a new thread for each new request. This is fine for synchronous requests, which are naturally limited. However background requests (especially writepage) can cause a thread creation storm. To avoid this, limit the number of background requests available to userspace. This is done by introducing another queue for background requests, and a counter for the number of "active" requests, which are currently available for userspace. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:13 -08:00
Miklos Szeredi	b57d426445	fuse: save space in struct fuse_req Move the fields 'dentry' and 'vfsmount' into the request specific union, since these are only used for the RELEASE request. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:13 -08:00
Miklos Szeredi	a6643094e7	fuse: pass open flags to read and write Some open flags (O_APPEND, O_DIRECT) can be changed with fcntl(F_SETFL, ...) after open, but fuse currently only sends the flags to userspace in open. To make it possible to correcly handle changing flags, send the current value to userspace in each read and write. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:54 -08:00
Miklos Szeredi	bcb4be809d	fuse: fix reading past EOF Currently reading a fuse file will stop at cached i_size and return EOF, even though the file might have grown since the attributes were last updated. So detect if trying to read past EOF, and refresh the attributes before continuing with the read. Thanks to mpb for the report. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:54 -08:00
Miklos Szeredi	f33321141b	fuse: add support for mandatory locking For mandatory locking the userspace filesystem needs to know the lock ownership for read, write and truncate operations. This patch adds the necessary fields to the protocol. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	b25e82e567	fuse: add helper for asynchronous writes This patch adds a new helper function fuse_write_fill() which makes it possible to send WRITE requests asynchronously. A new flag for WRITE requests is also added which indicates that this a write from the page cache, and not a "normal" file write. This patch is in preparation for writable mmap support. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	93a8c3cd9e	fuse: add list of writable files to fuse_inode Each WRITE request must carry a valid file descriptor. When a page is written back from a memory mapping, the file through which the page was dirtied is not available, so a new mechananism is needed to find a suitable file in ->writepage(s). A list of fuse_files is added to fuse_inode. The file is removed from the list in fuse_release(). This patch is in preparation for writable mmap support. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	6ff958edbf	fuse: add atomic open+truncate support This patch allows fuse filesystems to implement open(..., O_TRUNC) as a single request, instead of separate truncate and open requests. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	1fb69e7817	fuse: fix race between getattr and write Getattr and lookup operations can be running in parallel to attribute changing operations, such as write and setattr. This means, that if for example getattr was slower than a write, the cached size attribute could be set to a stale value. To prevent this race, introduce a per-filesystem attribute version counter. This counter is incremented whenever cached attributes are modified, and the incremented value stored in the inode. Before storing new attributes in the cache, getattr and lookup check, using the version number, whether the attributes have been modified during the request's lifetime. If so, the returned attributes are not cached, because they might be stale. Thanks to Jakub Bogusz for the bug report and test program. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Jakub Bogusz <jakub.bogusz@gemius.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:30 -07:00
Miklos Szeredi	e57ac68378	fuse: fix allowing operations The following operation didn't check if sending the request was allowed: setattr listxattr statfs Some other operations don't explicitly do the check, but VFS calls ->permission() which checks this. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:29 -07:00
Miklos Szeredi	ebc14c4dbe	fuse: fix permission checking on sticky directories The VFS checks sticky bits on the parent directory even if the filesystem defines it's own ->permission(). In some situations (sshfs, mountlo, etc) the user does have permission to delete a file even if the attribute based checking would not allow it. So work around this by storing the permission bits separately and returning them in stat(), but cutting the permission bits off from inode->i_mode. This is slightly hackish, but it's probably not worth it to add new infrastructure in VFS and a slight performance penalty for all filesystems, just for the sake of fuse. [Jan Engelhardt] cosmetic fixes Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:04 -07:00
Miklos Szeredi	244f6385c2	fuse: refresh stale attributes in fuse_permission() fuse_permission() didn't refresh inode attributes before using them, even if the validity has already expired. Thanks to Junjiro Okajima for spotting this. Also remove some old code to unconditionally refresh the attributes on the root inode. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:04 -07:00
Miklos Szeredi	c756e0a4d7	fuse: add reference counting to fuse_file Make lifetime of 'struct fuse_file' independent from 'struct file' by adding a reference counter and destructor. This will enable asynchronous page writeback, where it cannot be guaranteed, that the file is not released while a request with this file handle is being served. The actual RELEASE request is only sent when there are no more references to the fuse_file. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:03 -07:00
Miklos Szeredi	de5e3dec42	fuse: fix reserved request wake up Use wake_up_all instead of wake_up in put_reserved_req(), otherwise it is possible that the right task is not woken up. Also create a separate reserved_req_waitq in addition to the blocked_waitq, since they fulfill totally separate functions. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:03 -07:00
Miklos Szeredi	f92b99b9dc	fuse: update backing_dev_info congestion state Set the read and write congestion state if the request queue is close to blocking, and clear it when it's not. This prevents unnecessary blocking in readahead and (when writable mmaps are allowed) writeback. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:03 -07:00
Timo Savola	a5bfffac64	[PATCH] fuse: validate rootmode mount option If rootmode isn't valid, we hit the BUG() in fuse_init_inode. Now EINVAL is returned. Signed-off-by: Timo Savola <tsavola@movial.fi> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-08 19:47:55 -07:00
Miklos Szeredi	0ec7ca41f6	[PATCH] fuse: add DESTROY operation Add a DESTROY operation for block device based filesystems. With the help of this operation, such a filesystem can flush dirty data to the device synchronously before the umount returns. This is needed in situations where the filesystem is assumed to be clean immediately after unmount (e.g. ejecting removable media). Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:32 -08:00
Miklos Szeredi	b2d2272fae	[PATCH] fuse: add bmap support Add support for the BMAP operation for block device based filesystems. This is needed to support swap-files and lilo. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:32 -08:00
Miklos Szeredi	d2a85164aa	[PATCH] fuse: fix handling of moved directory Fuse considered it an error (EIO) if lookup returned a directory inode, to which a dentry already refered. This is because directory aliases are not allowed. But in a network filesystem this could happen legitimately, if a directory is moved on a remote client. This patch attempts to relax the restriction by trying to first evict the offending alias from the cache. If this fails, it still returns an error (EBUSY). A rarer situation is if an mkdir races with an indenpendent lookup, which finds the newly created directory already moved. In this situation the mkdir should return success, but that would be incorrect, since the dentry cannot be instantiated, so return EBUSY. Previously checking for a directory alias and instantiation of the dentry weren't done atomically in lookup/mkdir, hence two such calls racing with each other could create aliased directories. To prevent this introduce a new per-connection mutex: fuse_conn->inst_mutex, which is taken for instantiations with a directory inode. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:45 -07:00
Miklos Szeredi	0a0898cf41	[PATCH] fuse: use jiffies_64 It is entirely possible (though rare) that jiffies half-wraps around, while a dentry/inode remains in the cache. This could mean that the dentry/inode is not invalidated for another half wraparound-time. To get around this problem, use 64-bit jiffies. The only problem with this is that dentry->d_time is 32 bits on 32-bit archs. So use d_fsdata as the high 32 bits. This is an ugly hack, but far simpler, than having to allocate private data just for this purpose. Since 64-bit jiffies can be assumed never to wrap around, simple comparison can be used, and a zero time value can represent "invalid". Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-31 13:28:43 -07:00
Miklos Szeredi	9c8ef5614d	[PATCH] fuse: scramble lock owner ID VFS uses current->files pointer as lock owner ID, and it wouldn't be prudent to expose this value to userspace. So scramble it with XTEA using a per connection random key, known only to the kernel. Only one direction needs to be implemented, since the ID is never sent in the reverse direction. The XTEA algorithm is implemented inline since it's simple enough to do so, and this adds less complexity than if the crypto API were used. Thanks to Jesper Juhl for the idea. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-25 10:01:20 -07:00
Miklos Szeredi	a4d27e75ff	[PATCH] fuse: add request interruption Add synchronous request interruption. This is needed for file locking operations which have to be interruptible. However filesystem may implement interruptibility of other operations (e.g. like NFS 'intr' mount option). Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-25 10:01:19 -07:00
Miklos Szeredi	f9a2842e56	[PATCH] fuse: rename the interrupted flag Rename the 'interrupted' flag to 'aborted', since it indicates exactly that, and next patch will introduce an 'interrupted' flag for a Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-25 10:01:19 -07:00
Miklos Szeredi	33649c91a3	[PATCH] fuse: ensure FLUSH reaches userspace All POSIX locks owned by the current task are removed on close(). If the FLUSH request resulting initiated by close() fails to reach userspace, there might be locks remaining, which cannot be removed. The only reason it could fail, is if allocating the request fails. In this case use the request reserved for RELEASE, or if that is currently used by another FLUSH, wait for it to become available. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-25 10:01:19 -07:00
Miklos Szeredi	7142125937	[PATCH] fuse: add POSIX file locking support This patch adds POSIX file locking support to the fuse interface. This implementation doesn't keep any locking state in kernel. Unlocking on close() is handled by the FLUSH message, which now contains the lock owner id. Mandatory locking is not supported. The filesystem may enfoce mandatory locking in userspace if needed. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-25 10:01:19 -07:00
Miklos Szeredi	bafa96541b	[PATCH] fuse: add control filesystem Add a control filesystem to fuse, replacing the attributes currently exported through sysfs. An empty directory '/sys/fs/fuse/connections' is still created in sysfs, and mounting the control filesystem here provides backward compatibility. Advantages of the control filesystem over the previous solution: - allows the object directory and the attributes to be owned by the filesystem owner, hence letting unpriviled users abort the filesystem connection - does not suffer from module unload race [akpm@osdl.org: fix this fs for recent dhowells depredations] [akpm@osdl.org: fix 64-bit printk warnings] Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-25 10:01:19 -07:00
Miklos Szeredi	51eb01e735	[PATCH] fuse: no backgrounding on interrupt Don't put requests into the background when a fatal interrupt occurs while the request is in userspace. This removes a major wart from the implementation. Backgrounding of requests was introduced to allow breaking of deadlocks. However now the same can be achieved by aborting the filesystem through the 'abort' sysfs attribute. This is a change in the interface, but should not cause problems, since these kinds of deadlocks never happen during normal operation. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-25 10:01:19 -07:00
Miklos Szeredi	5a5fb1ea74	Revert "[fuse] fix deadlock between fuse_put_super() and request_end()" This reverts `73ce8355c2` commit. It was wrong, because it didn't take into account the requirement, that iput() for background requests must be performed synchronously with ->put_super(), otherwise active inodes may remain after unmount. The right solution is to keep the sbput_sem and perform iput() within the locked region, but move fput() outside sbput_sem. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>	2006-04-26 10:48:55 +02:00
Miklos Szeredi	9bc5dddad1	[fuse] Fix accounting the number of waiting requests Properly accounting the number of waiting requests was forgotten in "clean up request accounting" patch. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>	2006-04-11 21:16:09 +02:00
Miklos Szeredi	73ce8355c2	[fuse] fix deadlock between fuse_put_super() and request_end() A deadlock was possible, when the last reference to the superblock was held due to a background request containing a file reference. Releasing the file would release the vfsmount which in turn would release the superblock. Since sbput_sem is held during the fput() and fuse_put_super() tries to acquire this same semaphore, a deadlock results. The chosen soltuion is to get rid of sbput_sem, and instead use the spinlock to ensure the referenced inodes/file are released only once. Since the actual release may sleep, defer these outside the locked region, but using local variables instead of the structure members. This is a much more rubust solution. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>	2006-04-11 21:14:26 +02:00
Miklos Szeredi	08a53cdce6	[PATCH] fuse: account background requests The previous patch removed limiting the number of outstanding requests. This patch adds a much simpler limiting, that is also compatible with file locking operations. A task may have at most one synchronous request allocated. So these requests need not be otherwise limited. However the number of background requests (release, forget, asynchronous reads, interrupted requests) can grow indefinitely. This can be used by a malicous user to cause FUSE to allocate arbitrary amounts of unswappable kernel memory, denying service. For this reason add a limit for the number of background requests, and block allocations of new requests until the number goes bellow the limit. Also use this mechanism to block all requests until the INIT reply is received. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-04-11 06:18:49 -07:00
Miklos Szeredi	ce1d5a491f	[PATCH] fuse: clean up request accounting FUSE allocated most requests from a fixed size pool filled at mount time. However in some cases (release/forget) non-pool requests were used. File locking operations aren't well served by the request pool, since they may block indefinetly thus exhausting the pool. This patch removes the request pool and always allocates requests on demand. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-04-11 06:18:49 -07:00
Miklos Szeredi	d713311464	[PATCH] fuse: use a per-mount spinlock Remove the global spinlock in favor of a per-mount one. This patch is basically find & replace. The difficult part has already been done by the previous patch. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-04-11 06:18:48 -07:00
Jeff Dike	385a17bfc3	[PATCH] fuse: add O_ASYNC support to FUSE device This adds asynchronous notification to FUSE - a FUSE server can request O_ASYNC on a /dev/fuse file descriptor and receive SIGIO when there is input available. One subtlety - fuse_dev_fasync, which is called when O_ASYNC is requested, does no locking, unlink the other methods. I think it's unnecessary, as the fuse_conn.fasync list is manipulated only by fasync_helper and kill_fasync, which provide their own locking. It would also be wrong to use the fuse_lock, as it's a spin lock and fasync_helper can sleep. My one concern with this is the fuse_conn going away underneath fuse_dev_fasync - sys_fcntl takes a reference on the file struct, so this seems not to be a problem. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-04-11 06:18:48 -07:00
Arjan van de Ven	4b6f5d20b0	[PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-28 09:16:06 -08:00
Miklos Szeredi	9cd6845511	[PATCH] fuse: fix async read for legacy filesystems While asynchronous reads mean a performance improvement in most cases, if the filesystem assumed that reads are synchronous, then async reads may degrade performance (filesystem may receive reads out of order, which can confuse it's own readahead logic). With sshfs a 1.5 to 4 times slowdown can be measured. There's also a need for userspace filesystems to know whether asynchronous reads are supported by the kernel or not. To achive these, negotiate in the INIT request whether async reads will be used and the maximum readahead value. Update interface version to 7.6 If userspace uses a version earlier than 7.6, then disable async reads, and set maximum readahead value to the maximum read size, as done in previous versions. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-02-01 08:53:09 -08:00

1 2 3 4

181 Commits