Race-free addition and removal of a mark to a groups mark list would be easier
if we could lock the mark list of group before we lock the specific mark.
This patch changes the order used to add/remove marks to/from mark lists from
1. mark->lock
2. group->mark_lock
3. inode->i_lock
to
1. group->mark_lock
2. mark->lock
3. inode->i_lock
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Get a group ref for each mark that is added to the groups list and release that
ref when the mark is freed in fsnotify_put_mark().
We also use get a group reference for duplicated marks and for private event
data.
Now we dont free a group any more when the number of marks becomes 0 but when
the groups ref count does. Since this will only happen when all marks are removed
from a groups mark list, we dont have to set the groups number of marks to 1 at
group creation.
Beside clearing all marks in fsnotify_destroy_group() we do also flush the
groups event queue. This is since events may hold references to groups (due to
private event data) and we have to put those references first before we get a
chance to put the final ref, which will result in a call to
fsnotify_final_destroy_group().
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Introduce fsnotify_get_group() which increments the reference counter of a group.
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently in fsnotify_put_group() the ref count of a group is decremented and if
it becomes 0 fsnotify_destroy_group() is called. Since a groups ref count is only
at group creation set to 1 and never increased after that a call to fsnotify_put_group()
always results in a call to fsnotify_destroy_group().
With this patch fsnotify_destroy_group() is called directly.
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
IBM reported a deadlock in select_parent(). This was found to be caused
by taking rename_lock when already locked when restarting the tree
traversal.
There are two cases when the traversal needs to be restarted:
1) concurrent d_move(); this can only happen when not already locked,
since taking rename_lock protects against concurrent d_move().
2) racing with final d_put() on child just at the moment of ascending
to parent; rename_lock doesn't protect against this rare race, so it
can happen when already locked.
Because of case 2, we need to be able to handle restarting the traversal
when rename_lock is already held. This patch fixes all three callers of
try_to_ascend().
IBM reported that the deadlock is gone with this patch.
[ I rewrote the patch to be smaller and just do the "goto again" if the
lock was already held, but credit goes to Miklos for the real work.
- Linus ]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs fixes from Al Viro:
"A couple of fixes; one for automount/lazy umount race, another a
classic "we don't protect the refcount transition to zero with the
lock that protects looking for object in hash" kind of crap in lockd."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
close the race in nlmsvc_free_block()
do_add_mount()/umount -l races
"Search list for X" sounds like you're trying to find X on a list.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
normally we deal with lock_mount()/umount races by checking that
mountpoint to be is still in our namespace after lock_mount() has
been done. However, do_add_mount() skips that check when called
with MNT_SHRINKABLE in flags (i.e. from finish_automount()). The
reason is that ->mnt_ns may be a temporary namespace created exactly
to contain automounts a-la NFS4 referral handling. It's not the
namespace of the caller, though, so check_mnt() would fail here.
We still need to check that ->mnt_ns is non-NULL in that case,
though.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- fix a regression related to xfs_sync_worker racing with unmount.
- fix a race while discarding xfs buffers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJQWO3uAAoJENaLyazVq6ZOjfcP+gJkcJLS5+qmyNEcW2IUH0+E
4WptMdBCLgZGa54aGAJ2mwg0FysyyTiTXjOSETRiBU+N3bAhgweucRsxc8z+awen
L+InHr8YgQyAoY0nhEcXI/EuHaF9OlgVT6YCOqr/V4gtLO+aczovQS1wA3w/pjAk
RWa4z+VlH+D9KenatoCcHSY6PIPO9pLs4Gfb7D/9BLFN+f6OnIaUlkwIQSuumuaw
Lt/sw24/FEBYyzspmGfJT1fjDZK4VI4QoPEAVuvGiJCGFzSW2RDmlb48ZXsnGBbM
f83tKjB7praQhXnBt56/S5YThgWzt8eaJVIhSExtEh1tisb5iWNQzVPk+USXUE9t
DNTxtJjwiECbslyVYkTDUKnhdPGtHkpQSN96RBUDvQYfoLHQ/aXbxfPIZGEt24YM
A/TbCFDFQrI91Rn3TkAxygvfOkxWxE9TB1PmwfgrJGFDWNxg84OBiCX9IMNi3NUF
glqoKn6aI5fZH6gHVU7xA+bnfJYYRIxUtgIHJ1sYH6dH185G5Yj3m9bojcN7DnmM
x1kLf0lscumgdB3OGLgpe5IrrFKM+ncclkS24X3eWOCvnWiEXBwajPqA8LloekZA
X+IyGhoSfg2yRJAYEipRD+H0XouNM/AsLMcI/VbEoLGebxpsKCkg0VwCbd/4xISO
90Q9jWXC4dzUVRc60rPw
=ZcGP
-----END PGP SIGNATURE-----
Merge tag 'for-linus-v3.6-rc7' of git://oss.sgi.com/xfs/xfs
Pull xfs bugfixes from Ben Myers:
- fix a regression related to xfs_sync_worker racing with unmount.
- fix a race while discarding xfs buffers.
* tag 'for-linus-v3.6-rc7' of git://oss.sgi.com/xfs/xfs:
xfs: stop the sync worker before xfs_unmountfs
xfs: fix race while discarding buffers [V4]
The format_array_alloc() function is fundamentally racy, in that it
prints the array twice: once to figure out how much space to allocate
for the buffer, and the second time to actually print out the data.
If any of the array contents changes in between, the allocation size may
be wrong, and the end result may be truncated in odd ways.
Just don't do it. Allocate a maximum-sized array up-front, and just
format the array contents once. The only user of the u32_array
interfaces is the Xen spinlock statistics code, and it has 31 entries in
the arrays, so the maximum size really isn't that big, and the end
result is much simpler code without the bug.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
u32_array_open() is racy when multiple threads read from a file with a
seek position of zero, i.e. when two or more simultaneous reads are
occurring after the non-seekable files are created. It is possible that
file->private_data is double-freed because the threads races between
kfree(file->private-data);
and
file->private_data = NULL;
The fix is to only do format_array_alloc() when the file is opened and
free it when it is closed.
Note that because the file has always been non-seekable, you can't open
it and read it multiple times anyway, so the data has always been
generated just once. The difference is that now it is generated at open
time rather than at the time of the first read, and that avoids the
race.
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Raghavendra <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs. This prevents occasional crashes on unmount like so:
PID: 21602 TASK: ee9df060 CPU: 0 COMMAND: "kworker/0:3"
#0 [c5377d28] crash_kexec at c0292c94
#1 [c5377d80] oops_end at c07090c2
#2 [c5377d98] no_context at c06f614e
#3 [c5377dbc] __bad_area_nosemaphore at c06f6281
#4 [c5377df4] bad_area_nosemaphore at c06f629b
#5 [c5377e00] do_page_fault at c070b0cb
#6 [c5377e7c] error_code (via page_fault) at c070892c
EAX: f300c6a8 EBX: f300c6a8 ECX: 000000c0 EDX: 000000c0 EBP: c5377ed0
DS: 007b ESI: 00000000 ES: 007b EDI: 00000001 GS: ffffad20
CS: 0060 EIP: c0481ad0 ERR: ffffffff EFLAGS: 00010246
#7 [c5377eb0] atomic64_read_cx8 at c0481ad0
#8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
#9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
#14 [c5377f58] process_one_work at c024ee4c
#15 [c5377f98] worker_thread at c024f43d
#16 [c5377fbc] kthread at c025326b
#17 [c5377fe8] kernel_thread_helper at c070e834
PID: 26653 TASK: e79143b0 CPU: 3 COMMAND: "umount"
#0 [cde0fda0] __schedule at c0706595
#1 [cde0fe28] schedule at c0706b89
#2 [cde0fe30] schedule_timeout at c0705600
#3 [cde0fe94] __down_common at c0706098
#4 [cde0fec8] __down at c0706122
#5 [cde0fed0] down at c025936f
#6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
#7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
#8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
#9 [cde0ff1c] generic_shutdown_super at c0333d7a
#10 [cde0ff38] kill_block_super at c0333e0f
#11 [cde0ff48] deactivate_locked_super at c0334218
#12 [cde0ff58] deactivate_super at c033495d
#13 [cde0ff68] mntput_no_expire at c034bc13
#14 [cde0ff7c] sys_umount at c034cc69
#15 [cde0ffa0] sys_oldumount at c034ccd4
#16 [cde0ffb0] system_call at c0707e66
commit 11159a05 added this to xfs_log_unmount and needs to be cleaned up
at a later date.
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
This function returns the wrong value, which causes the callers to get
the length of the resulting pathname wrong when it contains non-ASCII
characters.
This seems to fix https://bugzilla.samba.org/show_bug.cgi?id=6767
Cc: <stable@vger.kernel.org>
Reported-by: Baldvin Kovacs <baldvin.kovacs@gmail.com>
Reported-and-Tested-by: Nicolas Lefebvre <nico.lefebvre@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
IBM reported a soft lockup after applying the fix for the rename_lock
deadlock. Commit c83ce989cb ("VFS: Fix the nfs sillyrename regression
in kernel 2.6.38") was found to be the culprit.
The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the
dentry was killed. This flag can be set on non-killed dentries too,
which results in infinite retries when trying to traverse the dentry
tree.
This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is
only set in d_kill() and makes try_to_ascend() test only this flag.
IBM reported successful test results with this patch.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The unregister_sysctl_table() function hangs if all references to its
ctl_table_header structure are not dropped.
This can happen sometimes because of a leak in proc_sys_lookup():
proc_sys_lookup() gets a reference to the table via lookup_entry(), but
it does not release it when a subsequent call to sysctl_follow_link()
fails.
This patch fixes this leak by making sure the reference is always
dropped on return.
See also commit 076c3eed2c ("sysctl: Rewrite proc_sys_lookup
introducing find_entry and lookup_entry") which reorganized this code in
3.4.
Tested in Linux 3.4.4.
Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull a btrfs revert from Chris Mason:
"My for-linus branch has one revert in the new quota code.
We're building up more fixes at etc for the next merge window, but I'm
keeping them out unless they are bigger regressions or have a huge
impact."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Revert "Btrfs: fix some error codes in btrfs_qgroup_inherit()"
Pull GFS2 fixes from Steven Whitehouse:
"Here are three GFS2 fixes for the current kernel tree. These are all
related to the block reservation code which was added at the merge
window. That code will be getting an update at the forthcoming merge
window too. In the mean time though there are a few smaller issues
which should be fixed.
The first patch resolves an issue with write sizes of greater than 32
bits with the size hinting code. The second ensures that the
allocation data structure is initialised when using xattrs and the
third takes into account allocations which may have been made by other
nodes which affect a reservation on the local node."
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
GFS2: Take account of blockages when using reserved blocks
GFS2: Fix missing allocation data for set/remove xattr
GFS2: Make write size hinting code common
shared memory mapping is dirtied and unmapped. The lower file was being
released when the eCryptfs file was closed and the dirtied pages could not be
written out.
- Adds a call to the lower filesystem's ->flush() from ecryptfs_flush().
- Fixes a regression, introduced in 2.6.39, when a file is renamed on top of
another file. The target file's inode was not being evicted and the space
taken by the file was not reclaimed until eCryptfs was unmounted.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCgAGBQJQU14UAAoJENaSAD2qAscKrpwQAK+Xu3RAHPLO16bEjhrFZaIr
FY2z8yK5geFzHfmhDxUT7jovw8d2ghAjrE3XHf+PgrqfUqIZ6vo57GFmR8ZNS+30
exNOb1DffaRqIejStrFkz+Ek4/FbnaPD/FvJYgC8D3Bv04YllyNI9Euq49VkkOiy
v6jxAy4H+/MCGzYvR4K0jmKlOLuC47TtJ322jXTm0iNeJJH2+kNXm0WvE8dY7dYV
2QMvv11M1HMI39S4yIY0eBPwbloP3AoO++H3Id7/XWcIlaaeZrVA7HmV8z04DEWq
Nt2vceSbX4FVBxmLab5TzTftorskJRDLCtMaSM8opve+3Sxgx+wooLskgrHqJjZH
8AtGfs/ZzvUkKcYMNv+iMax0/KqkGKOhdJMmqc37aI8qkBo5IYXpIPU1ZTy/S2T8
t3fM8jEI4AOKFn7GlG0kZg7EXo0wWPw3A8X73tCQx8W6sJy5STvv07hFZspX/4nn
+qhlM7Qzpr1XXPEv9cKo4oXzqOmlSi1Jd2ueGANNGJvche90uCWgmSZCfNNxGnPq
mLiiiV6KJ94xjjyf/ZWiKLryk5rv0fWiYXzutaORADn6WWwIVKX9IyvJfFI8y+M6
phREWNdOnyDFFA++h5QotWh3L7ZCL/HNeTpLDSD1ONxosVDhViviA2HrRlyl8LSc
soDzN7w+yOh+QjlN4sO/
=UC2V
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-3.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fixes from Tyler Hicks:
- Fixes a regression, introduced in 3.6-rc1, when a file is closed
before its shared memory mapping is dirtied and unmapped. The lower
file was being released when the eCryptfs file was closed and the
dirtied pages could not be written out.
- Adds a call to the lower filesystem's ->flush() from
ecryptfs_flush().
- Fixes a regression, introduced in 2.6.39, when a file is renamed on
top of another file. The target file's inode was not being evicted
and the space taken by the file was not reclaimed until eCryptfs was
unmounted.
* tag 'ecryptfs-3.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Copy up attributes of the lower target inode after rename
eCryptfs: Call lower ->flush() from ecryptfs_flush()
eCryptfs: Write out all dirty pages just before releasing the lower file
This reverts commit 5986802c2f.
Both paths are not error paths but regular cases where non-qgroup
subvols are involved.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We already use them for openat() and friends, but fstat() also wants to
be able to use O_PATH file descriptors. This should make it more
directly comparable to the O_SEARCH of Solaris.
Note that you could already do the same thing with "fstatat()" and an
empty path, but just doing "fstat()" directly is simpler and faster, so
there is no reason not to just allow it directly.
See also commit 332a2e1244, which did the same thing for fchdir, for
the same reasons.
Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org # O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After calling into the lower filesystem to do a rename, the lower target
inode's attributes were not copied up to the eCryptfs target inode. This
resulted in the eCryptfs target inode staying around, rather than being
evicted, because i_nlink was not updated for the eCryptfs inode. This
also meant that eCryptfs didn't do the final iput() on the lower target
inode so it stayed around, as well. This would result in a failure to
free up space occupied by the target file in the rename() operation.
Both target inodes would eventually be evicted when the eCryptfs
filesystem was unmounted.
This patch calls fsstack_copy_attr_all() after the lower filesystem
does its ->rename() so that important inode attributes, such as i_nlink,
are updated at the eCryptfs layer. ecryptfs_evict_inode() is now called
and eCryptfs can drop its final reference on the lower inode.
http://launchpad.net/bugs/561129
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Tested-by: Colin Ian King <colin.king@canonical.com>
Cc: <stable@vger.kernel.org> [2.6.39+]
Since eCryptfs only calls fput() on the lower file in
ecryptfs_release(), eCryptfs should call the lower filesystem's
->flush() from ecryptfs_flush().
If the lower filesystem implements ->flush(), then eCryptfs should try
to flush out any dirty pages prior to calling the lower ->flush(). If
the lower filesystem does not implement ->flush(), then eCryptfs has no
need to do anything in ecryptfs_flush() since dirty pages are now
written out to the lower filesystem in ecryptfs_release().
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Fixes a regression caused by:
821f749 eCryptfs: Revert to a writethrough cache model
That patch reverted some code (specifically, 32001d6f) that was
necessary to properly handle open() -> mmap() -> close() -> dirty pages
-> munmap(), because the lower file could be closed before the dirty
pages are written out.
Rather than reapplying 32001d6f, this approach is a better way of
ensuring that the lower file is still open in order to handle writing
out the dirty pages. It is called from ecryptfs_release(), while we have
a lock on the lower file pointer, just before the lower file gets the
final fput() and we overwrite the pointer.
https://launchpad.net/bugs/1047261
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Artemy Tregubenko <me@arty.name>
Tested-by: Artemy Tregubenko <me@arty.name>
Tested-by: Colin Ian King <colin.king@canonical.com>
The claim_reserved_blks() function was not taking account of
the possibility of "blockages" while performing allocation.
This can be caused by another node allocating something in
the same extent which has been reserved locally.
This patch tests for this condition and then skips the remainder
of the reservation in this case. This is a relatively rare event,
so that it should not affect the general performance improvement
which the block reservations provide.
The claim_reserved_blks() function also appears not to be able
to deal with reservations which cross bitmap boundaries, but
that can be dealt with in a future patch since we don't generate
boundary crossing reservations currently.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: David Teigland <teigland@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
This collects up the write size hinting code which is used by the
block reservation subsystem into a single function. At the same
time this also corrects the rounding for this calculation.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
- Final (hopefully) fix for the range checking code in NFSv4 getacl. This
should fix the Oopses being seen when the acl size is close to PAGE_SIZE.
- Fix a regression with the legacy binary mount code
- Fix a regression in the readdir cookieverf initialisation
- Fix an RPC over UDP regression
- Ensure that we report all errors in the NFSv4 open code
- Ensure that fsync() reports all relevant synchronisation errors.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQUN+KAAoJEGcL54qWCgDyHGcQAKj7MYVDIjhdmsVGGNWXUCnf
X0LVg/ajh+vjusK+hmquzcJikZqgce5IU5DW4vcFr1X8BgP+R51UVvU0KksByD5H
ourV2JVCztAQzQ4WWOsZAGqN0tooJUjyEjl4lEiDsQCF4Nk1HWbuCHeYuX74OToZ
jrgedj0EZ6zb7TOizvbgU/7lI+FKu3Hlw6+u27M9phtSuefJdYSHZHYVMOX81qPh
k0zgZ4tuLIaDuBB84iCrPwNt9icnevq6cIc+AGluI6xhDw+foPvUaUR+OUI420IZ
tunNzP2So+nNoyjEiyMVENaCdEyA75XAmmGHTUUdBiVOsMV4HF/TqvTtSsjk2mN1
FbZVvtjD6srjsQaKdVmqMIZBdhY9LSMLIQVqb4H2rYP6Mwq06WTuyCxf5YhzFfoy
2tai7JuqBkTAWfKB8ESWywV6Qk/MkUWRAOBO6ksS66gAwpcFDj6nfeAdwaEmoYKc
uzLUIRZaclPMZf661cs1fWeFV5XOnCL7je4owgTRGs7MHooWHPcC3273fEJqnhFz
5MkC7nfmUiGcdO1v0mfYTEtMj9Pp9icBoZcVTGn4eZIHzvhhZOx//8LhyBfS+jll
bKjaLZ1rErvIqwnSGcB7PK2yBYY9P6ZaxWjOrAAncZmiOxfhN0hvCo54jNOr/VZ+
atsDEAuqSTeK7ouBqyO4
=e5yE
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Final (hopefully) fix for the range checking code in NFSv4 getacl.
This should fix the Oopses being seen when the acl size is close to
PAGE_SIZE.
- Fix a regression with the legacy binary mount code
- Fix a regression in the readdir cookieverf initialisation
- Fix an RPC over UDP regression
- Ensure that we report all errors in the NFSv4 open code
- Ensure that fsync() reports all relevant synchronisation errors.
* tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: fsync() must exit with an error if page writeback failed
SUNRPC: Fix a UDP transport regression
NFS: return error from decode_getfh in decode open
NFSv4: Fix buffer overflow checking in __nfs4_get_acl_uncached
NFSv4: Fix range checking in __nfs4_get_acl_uncached and __nfs4_proc_set_acl
NFS: Fix a problem with the legacy binary mount code
NFS: Fix the initialisation of the readdir 'cookieverf' array
We need to ensure that if the call to filemap_write_and_wait_range()
fails, then we report that error back to the application.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pull FUSE fixes from Miklos Szeredi:
"This contains bugfixes for FUSE and CUSE and a compile warning fix."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix retrieve length
fuse: mark variables uninitialized
cuse: kill connection on initialization error
cuse: fix fuse_conn_kill()
Pull CIFS fixes from Steve French.
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix endianness conversion
CIFS: Fix error handling in cifs_push_mandatory_locks
Pull UDF and ext3 fixes from Jan Kara:
"One UDF data corruption fix and one ext3 fix where we didn't write
everything to disk on fsync in one corner case."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix data corruption for files in ICB
ext3: Fix fdatasync() for files with only i_size changes
If decode_getfh failed, nfs4_xdr_dec_open would return 0 since the last
decode_* call must have succeeded.
Cc: stable@vger.kernel.org
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pass the checks made by decode_getacl back to __nfs4_get_acl_uncached
so that it knows if the acl has been truncated.
The current overflow checking is broken, resulting in Oopses on
user-triggered nfs4_getfacl calls, and is opaque to the point
where several attempts at fixing it have failed.
This patch tries to clean up the code in addition to fixing the
Oopses by ensuring that the overflow checks are performed in
a single place (decode_getacl). If the overflow check failed,
we will still be able to report the acl length, but at least
we will no longer attempt to cache the acl or copy the
truncated contents to user space.
Reported-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Sachin Prabhu <sprabhu@redhat.com>
When a file is stored in ICB (inode), we overwrite part of the file, and
the page containing file's data is not in page cache, we end up corrupting
file's data by overwriting them with zeros. The problem is we use
simple_write_begin() which simply zeroes parts of the page which are not
written to. The problem has been introduced by be021ee4 (udf: convert to
new aops).
Fix the problem by providing a ->write_begin function which makes the page
properly uptodate.
CC: <stable@vger.kernel.org> # >= 2.6.24
Reported-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Ensure that the user supplied buffer size doesn't cause us to overflow
the 'pages' array.
Also fix up some confusion between the use of PAGE_SIZE and
PAGE_CACHE_SIZE when calculating buffer sizes. We're not using
the page cache for anything here.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Apparently, am-utils is still using the legacy binary mountdata interface,
and is having trouble parsing /proc/mounts due to the 'port=' field being
incorrectly set.
The following patch should fix up the regression.
Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
When the NFS_COOKIEVERF helper macro was converted into a static
inline function in commit 99fadcd764 (nfs: convert NFS_*(inode)
helpers to static inline), we broke the initialisation of the
readdir cookies, since that depended on doing a memset with an
argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore
changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *).
At this point, NFS_COOKIEVERF seems to be more of an obfuscation
than a helper, so the best thing would be to just get rid of it.
Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881
Reported-by: Andi Kleen <andi@firstfloor.org>
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
In some cases fuse_retrieve() would return a short byte count if offset was
non-zero. The data returned was correct, though.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
Code tracking when transaction needs to be committed on fdatasync(2) forgets
to handle a situation when only inode's i_size is changed. Thus in such
situations fdatasync(2) doesn't force transaction with new i_size to disk
and that can result in wrong i_size after a crash.
Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.
CC: <stable@vger.kernel.org> # >= 2.6.32
Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Signed-off-by: Jan Kara <jack@suse.cz>
gcc 4.6.3 complains about uninitialized variables in fs/fuse/control.c:
CC fs/fuse/control.o
fs/fuse/control.c: In function 'fuse_conn_congestion_threshold_write':
fs/fuse/control.c:165:29: warning: 'val' may be used uninitialized in this function [-Wuninitialized]
fs/fuse/control.c: In function 'fuse_conn_max_background_write':
fs/fuse/control.c:128:23: warning: 'val' may be used uninitialized in this function [-Wuninitialized]
fuse_conn_limit_write() will always return non-zero unless the &val
is modified, so the warning is misleading. Let the compiler know
about it by marking 'val' with 'uninitialized_var'.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Pull CIFS fixes from Steve French.
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix cifs_do_create error hadnling
cifs: print error code if smb signature verification fails
CIFS: Fix log messages in packet checking for SMB2
CIFS: Protect i_nlink from being negative
Luca Risolia reported that a CUSE daemon will continue to run even if
initialization of the emulated device failes for some reason (e.g. the device
number is already registered by another driver).
This patch disconnects the fuse device on error, which will make the userspace
CUSE daemon exit, albeit without indication about what the problem was.
Reported-by: Luca Risolia <luca.risolia@studio.unibo.it>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
fuse_conn_kill() removed fc->entry, called fuse_ctl_remove_conn() and
fuse_bdi_destroy(). None of which is appropriate for cuse cleanup.
The fuse_ctl_remove_conn() decrements the nlink on the control filesystem, which
is totally bogus. The others are harmless but unnecessary.
So move these out from fuse_conn_kill() to fuse_put_super() where they belong.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
While xfs_buftarg_shrink() is freeing buffers from the dispose list (filled with
buffers from lru list), there is a possibility to have xfs_buf_stale() racing
with it, and removing buffers from dispose list before xfs_buftarg_shrink() does
it.
This happens because xfs_buftarg_shrink() handle the dispose list without
locking and the test condition in xfs_buf_stale() checks for the buffer being in
*any* list:
if (!list_empty(&bp->b_lru))
If the buffer happens to be on dispose list, this causes the buffer counter of
lru list (btp->bt_lru_nr) to be decremented twice (once in xfs_buftarg_shrink()
and another in xfs_buf_stale()) causing a wrong account usage of the lru list.
This may cause xfs_buftarg_shrink() to return a wrong value to the memory
shrinker shrink_slab(), and such account error may also cause an underflowed
value to be returned; since the counter is lower than the current number of
items in the lru list, a decrement may happen when the counter is 0, causing
an underflow on the counter.
The fix uses a new flag field (and a new buffer flag) to serialize buffer
handling during the shrink process. The new flag field has been designed to use
btp->bt_lru_lock/unlock instead of xfs_buf_lock/unlock mechanism.
dchinner, sandeen, aquini and aris also deserve credits for this.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Pull btrfs fixes from Chris Mason:
"I've split out the big send/receive update from my last pull request
and now have just the fixes in my for-linus branch. The send/recv
branch will wander over to linux-next shortly though.
The largest patches in this pull are Josef's patches to fix DIO
locking problems and his patch to fix a crash during balance. They
are both well tested.
The rest are smaller fixes that we've had queued. The last rc came
out while I was hacking new and exciting ways to recover from a
misplaced rm -rf on my dev box, so these missed rc3."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
Btrfs: fix that repair code is spuriously executed for transid failures
Btrfs: fix ordered extent leak when failing to start a transaction
Btrfs: fix a dio write regression
Btrfs: fix deadlock with freeze and sync V2
Btrfs: revert checksum error statistic which can cause a BUG()
Btrfs: remove superblock writing after fatal error
Btrfs: allow delayed refs to be merged
Btrfs: fix enospc problems when deleting a subvol
Btrfs: fix wrong mtime and ctime when creating snapshots
Btrfs: fix race in run_clustered_refs
Btrfs: don't run __tree_mod_log_free_eb on leaves
Btrfs: increase the size of the free space cache
Btrfs: barrier before waitqueue_active
Btrfs: fix deadlock in wait_for_more_refs
btrfs: fix second lock in btrfs_delete_delayed_items()
Btrfs: don't allocate a seperate csums array for direct reads
Btrfs: do not strdup non existent strings
Btrfs: do not use missing devices when showing devname
Btrfs: fix that error value is changed by mistake
Btrfs: lock extents as we map them in DIO
...
If verify_parent_transid() fails for all mirrors, the current code
calls repair_io_failure() anyway which means:
- that the disk block is rewritten without repairing anything and
- that a kernel log message is printed which misleadingly claims
that a read error was corrected.
This is an example:
parent transid verify failed on 615015833600 wanted 110423 found 110424
parent transid verify failed on 615015833600 wanted 110423 found 110424
btrfs read error corrected: ino 1 off 615015833600 (dev /dev/...)
It is wrong to ignore the results from verify_parent_transid() and to
call repair_eb_io_failure() when the verification of the transids failed.
This commit fixes the issue.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>