Make each fuse device clone refer to a separate processing queue. The only
constraint on userspace code is that the request answer must be written to
the same device clone as it was read off.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Allow fuse device clones to refer to be distinguished. This patch just
adds the infrastructure by associating a separate "struct fuse_dev" with
each clone.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Allow an open fuse device to be "cloned". Userspace can create a clone by:
newfd = open("/dev/fuse", O_RDWR)
ioctl(newfd, FUSE_DEV_IOC_CLONE, &oldfd);
At this point newfd will refer to the same fuse connection as oldfd.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
In fuse_abort_conn() when all requests are on private lists we no longer
need fc->lock protection.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
No longer need to call request_end() with the connection lock held. We
still protect the background counters and queue with fc->lock, so acquire
it if necessary.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Now that we atomically test having already done everything we no longer
need other protection.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
When the connection is aborted it is possible that request_end() will be
called twice. Use atomic test and set to do the actual ending only once.
test_and_set_bit() also provides the necessary barrier semantics so no
explicit smp_wmb() is necessary.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
When an unlocked request is aborted, it is moved from fpq->io to a private
list. Then, after unlocking fpq->lock, the private list is processed and
the requests are finished off.
To protect the private list, we need to mark the request with a flag, so if
in the meantime the request is unlocked the list is not corrupted.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Add a fpq->lock for protecting members of struct fuse_pqueue and FR_LOCKED
request flag.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
- locked list_add() + list_del_init() cancel out
- common handling of case when request is ended here in the read phase
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
This will allow checking ->connected just with the processing queue lock.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
This is just two fields: fc->io and fc->processing.
This patch just rearranges the fields, no functional change.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
wait_event_interruptible_exclusive_locked() will do everything
request_wait() does, so replace it.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Interrupt is only queued after the request has been sent to userspace.
This is either done in request_wait_answer() or fuse_dev_do_read()
depending on which state the request is in at the time of the interrupt.
If it's not yet sent, then queuing the interrupt is postponed until the
request is read. Otherwise (the request has already been read and is
waiting for an answer) the interrupt is queued immedidately.
We want to call queue_interrupt() without fc->lock protection, in which
case there can be a race between the two functions:
- neither of them queue the interrupt (thinking the other one has already
done it).
- both of them queue the interrupt
The first one is prevented by adding memory barriers, the second is
prevented by checking (under fiq->waitq.lock) if the interrupt has already
been queued.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Use fiq->waitq.lock for protecting members of struct fuse_iqueue and
FR_PENDING request flag, previously protected by fc->lock.
Following patches will remove fc->lock protection from these members.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
This will allow checking ->connected just with the input queue lock.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
The input queue contains normal requests (fc->pending), forgets
(fc->forget_*) and interrupts (fc->interrupts). There's also fc->waitq and
fc->fasync for waking up the readers of the fuse device when a request is
available.
The fc->reqctr is also moved to the input queue (assigned to the request
when the request is added to the input queue.
This patch just rearranges the fields, no functional change.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Use flags for representing the state in fuse_req. This is needed since
req->list will be protected by different locks in different states, hence
we'll want the state itself to be split into distinct bits, each protected
with the relevant lock in that state.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
FUSE_REQ_INIT is actually the same state as FUSE_REQ_PENDING and
FUSE_REQ_READING and FUSE_REQ_WRITING can be merged into a common
FUSE_REQ_IO state.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Only hold fc->lock over sections of request_wait_answer() that actually
need it. If wait_event_interruptible() returns zero, it means that the
request finished. Need to add memory barriers, though, to make sure that
all relevant data in the request is synchronized.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Since it's a 64bit counter, it's never gonna wrap around. Remove code
dealing with that possibility.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Splice fc->pending and fc->processing lists into a common kill list while
holding fc->lock.
By the time we release fc->lock, pending and processing lists are empty and
the io list contains only locked requests.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Finer grained locking will mean there's no single lock to protect
modification of bitfileds in fuse_req.
So move to using bitops. Can use the non-atomic variants for those which
happen while the request definitely has only one reference.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
- don't end the request while req->locked is true
- make unlock_request() return an error if the connection was aborted
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
fuse_abort_conn() does all the work done by fuse_dev_release() and more.
"More" consists of:
end_io_requests(fc);
wake_up_all(&fc->waitq);
kill_fasync(&fc->fasync, SIGIO, POLL_IN);
All of which should be no-op (WARN_ON's added).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
And the same with fuse_request_send_nowait_locked().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
fc->conn_error is set once in FUSE_INIT reply and never cleared. Check it
in request allocation, there's no sense in doing all the preparation if
sending will surely fail.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Move accounting of fc->num_waiting to the point where the request actually
starts waiting. This is earlier than the current queue_request() for
background requests, since they might be waiting on the fc->bg_queue before
being queued on fc->pending.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Reset req->waiting in fuse_put_request(). This is needed for correct
accounting in fc->num_waiting for reserved requests.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
request_end() expects fc->num_background and fc->active_background to have
been incremented, which is not the case in fuse_request_send_nowait()
failure path. So instead just call the ->end() callback (which is actually
set by all callers).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
fc->release is called from fuse_conn_put() which was used in the error
cleanup before fc->release was initialized.
[Jeremiah Mahler <jmmahler@gmail.com>: assign fc->release after calling
fuse_conn_init(fc) instead of before.]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fixes: a325f9b922 ("fuse: update fuse_conn_init() and separate out fuse_conn_kill()")
Cc: <stable@vger.kernel.org> #v2.6.31+
We used to read file_handle twice. Once to get the amount of extra
bytes, and once to fetch the entire structure.
This may be problematic since we do size verifications only after the
first read, so if the number of extra bytes changes in userspace between
the first and second calls, we'll have an incoherent view of
file_handle.
Instead, read the constant size once, and copy that over to the final
structure without having to re-read it again.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs fix from Al Viro:
"Off-by-one in d_walk()/__dentry_kill() race fix.
It's very hard to hit; possible in the same conditions as the original
bug, except that you need the skipped branch to contain all the
remaining evictables, so that the d_walk()-calling loop in
d_invalidate() decides there's nothing more to do and doesn't go for
another pass - otherwise that next pass will sweep the sucker.
So it's not too urgent, but seeing that the fix is obvious and the
original commit has spread into all -stable branches..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
d_walk() might skip too much
Changes in this update:
o regression fix for new rename whiteout code
o regression fixes for new superblock generic per-cpu counter code
o fix for incorrect error return sign introduced in 3.17
o metadata corruption fixes that need to go back to -stable kernels
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJVaO0JAAoJEK3oKUf0dfodd4UQANRdfXnUrpyQGhVS7HFoFoVt
FIQ52pPGbMu72+DqHc+Q41uvgAPe65LFB2VUL6CUGCMExstF72F5+QonzppMgkMo
unPER3eB8ya03SY+Kp+803ZGgzI2Nl2M6w8Kof730/RUk56PTGYIx4eLXd6iZSli
RsYjw8JDbeue5OQo5FPmLCSQ/Kr5ZJXbgWVPyWkKg9aCcXLN5YSJIV3xcMTK9Q2I
LqqODkyatnGc6YxGAKddS7Xzt1ntlZgbe5mndQw04a2g0Lf6emPH5r8b0UJXIu96
advOBX0pEbad4FeFS6Mo5D+nNCaaNP4WzN7wgdb+BYNVw3ss4Ebam7+yY6Gexg6y
bzZOEkk9saL4YeBDgyYICNu7kG4BRVKRQiiX220G6SFXM3nqbl7qBPb3kVFyDpcI
RRuFJ0ZV0kFJ+3IQ4xVnIh6nootceRk/mvZaK5HhLhQLzklpZ8fj4HF3oBDUAnvN
wNd+7GoZy7zldjCkbF4BP3GjUeW+b9ngrCNc+bFXi5cUbdECXAa2krjxyY+MlQF2
veNVVcsoRdfeM0VjJh2/piGJxMWIlRqXdKzPKsfMWnlIaJ6YyslfbSq+2K7LxgGR
Ho3Sjt0oUuPMZ9F/Mjj+XDqwmzgooUHXNyDBxhGXBNBPjApcRLcb2vQ2SrWEmeGJ
vZmC2R1ZoGdBJg8a55BT
=w5SP
-----END PGP SIGNATURE-----
Merge tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs fixes from Dave Chinner:
"This is a little larger than I'd like late in the release cycle, but
all the fixes are for regressions introduced in the 4.1-rc1 merge, or
are needed back in -stable kernels fairly quickly as they are
filesystem corruption or userspace visible correctness issues.
Changes in this update:
- regression fix for new rename whiteout code
- regression fixes for new superblock generic per-cpu counter code
- fix for incorrect error return sign introduced in 3.17
- metadata corruption fixes that need to go back to -stable kernels"
* tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: fix broken i_nlink accounting for whiteout tmpfile inode
xfs: xfs_iozero can return positive errno
xfs: xfs_attr_inactive leaves inconsistent attr fork state behind
xfs: extent size hints can round up extents past MAXEXTLEN
xfs: inode and free block counters need to use __percpu_counter_compare
percpu_counter: batch size aware __percpu_counter_compare()
xfs: use percpu_counter_read_positive for mp->m_icount
when we find that a child has died while we'd been trying to ascend,
we should go into the first live sibling itself, rather than its sibling.
Off-by-one in question had been introduced in "deal with deadlock in
d_walk()" and the fix needs to be backported to all branches this one
has been backported to.
Cc: stable@vger.kernel.org # 3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Both 'i' and 'bits_per_entry' are signed integers but the result is a
u64 block number. Cast i to u64 to avoid truncation on 32-bit targets.
Found by Coverity (CID 200679).
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The count variable is used to iterate down to (below) zero from the size
of the bitmap and handle the one-filling the remainder of the last
partial bitmap block. The loop conditional expects count to be signed
in order to detect when the final block is processed, after which count
goes negative.
Unfortunately, a recent change made this unsigned along with some other
related fields. The result of is this is that during mount,
omfs_get_imap will overrun the bitmap array and corrupt memory unless
number of blocks happens to be a multiple of 8 * blocksize.
Fix by changing count back to signed: it is guaranteed to fit in an s32
without overflow due to an enforced limit on the number of blocks in the
filesystem.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A static checker found the following issue in the error path for
omfs_fill_super:
fs/omfs/inode.c:552 omfs_fill_super()
warn: missing error code here? 'd_make_root()' failed. 'ret' = '0'
Fix by returning -ENOMEM in this case.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
match_token() expects a NULL terminator at the end of the token list so
that it would know where to stop. Not having one causes it to overrun
to invalid memory.
In practice, passing a mount option that omfs didn't recognize would
sometimes panic the system.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
load_elf_binary() returns `retval', not `error'.
Fixes: a87938b2e2 ("fs/binfmt_elf.c: fix bug in loading of PIE binaries")
Reported-by: James Hogan <james.hogan@imgtec.com>
Cc: Michael Davidson <md@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
XFS uses the internal tmpfile() infrastructure for the whiteout inode
used for RENAME_WHITEOUT operations. For tmpfile inodes, XFS allocates
the inode, drops di_nlink, adds the inode to the agi unlinked list,
calls d_tmpfile() which correspondingly drops i_nlink of the vfs inode,
and then finishes the common inode setup (e.g., clear I_NEW and unlock).
The d_tmpfile() call was originally made inxfs_create_tmpfile(), but was
pulled up out of that function as part of the following commit to
resolve a deadlock issue:
330033d6 xfs: fix tmpfile/selinux deadlock and initialize security
As a result, callers of xfs_create_tmpfile() are responsible for either
calling d_tmpfile() or fixing up i_nlink appropriately. The whiteout
tmpfile allocation helper does neither. As a result, the vfs ->i_nlink
becomes inconsistent with the on-disk ->di_nlink once xfs_rename() links
it back into the source dentry and calls xfs_bumplink().
Update the assert in xfs_rename() to help detect this problem in the
future and update xfs_rename_alloc_whiteout() to decrement the link
count as part of the manual tmpfile inode setup.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
It was missed when we converted everything in XFs to use negative error
numbers, so fix it now. Bug introduced in 3.17 by commit 2451337 ("xfs: global
error sign conversion"), and should go back to stable kernels.
Thanks to Brian Foster for noticing it.
cc: <stable@vger.kernel.org> # 3.17, 3.18, 3.19, 4.0
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>