When xfs_readsb() does the very first read of the superblock,
it makes a guess at the length of the buffer, based on the
sector size of the underlying storage. This may or may
not match the filesystem sector size in sb_sectsize, so
we can't i.e. do a CRC check on it; it might be too short.
In fact, mounting a filesystem with sb_sectsize larger
than the device sector size will cause a mount failure
if CRCs are enabled, because we are checksumming a length
which exceeds the buffer passed to it.
So always read twice; the first time we read with NULL
buffer ops to skip verification; then set the proper
read length, hook up the proper verifier, and give it
another go.
Once we are sure that we've got the right buffer length,
we can also use bp->b_length in the xfs_sb_read_verify,
rather than the less-trusted on-disk sectorsize for
secondary superblocks. Before this we ran the risk of
passing junk to the crc32c routines, which didn't always
handle extreme values.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
My earlier commit 10e6e65 deserves a layer or two of brown paper
bags. The logic in that commit means that a CRC failure on the
primary superblock will *never* result in an error return.
Hopefully this fixes it, so that we always return the error
if it's a primary superblock, otherwise only if the filesystem
has CRCs enabled.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Pull jfs fix from David Kleikamp:
"Another ACL regression. This one more subtle"
* tag 'jfs-3.14-rc4' of git://github.com/kleikamp/linux-shaggy:
jfs: set i_ctime when setting ACL
When using device mapper, there are many "bio: create slab" messages in
the log. Device mapper targets have different front_pad, so each time when
we load a target that wasn't loaded before, we allocate a slab with the
appropriate front_pad and there is associated "bio: create slab" message.
This patch removes these messages, there is no need for them.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull ext4 fixes from Ted Ts'o:
"Miscellaneous ext4 bug fixes for v3.14"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: fix use after free in jbd2_journal_start_reserved()
ext4: don't leave i_crtime.tv_sec uninitialized
ext4: fix online resize with a non-standard blocks per group setting
ext4: fix online resize with very large inode tables
ext4: don't try to modify s_flags if the the file system is read-only
ext4: fix error paths in swap_inode_boot_loader()
ext4: fix xfstest generic/299 block validity failures
There is a regression in
208d0ac 2014-01-07 nfsd4: break only delegations when appropriate
which deletes an nfserrno() call in nfsd_setattr() (by accident,
probably), and NFSD becomes ignoring an error from VFS.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
My rework of handling of notification events (namely commit 7053aee26a
"fsnotify: do not share events between notification groups") broke
sending of cookies with inotify events. We didn't propagate the value
passed to fsnotify() properly and passed 4 uninitialized bytes to
userspace instead (so it is also an information leak). Sadly I didn't
notice this during my testing because inotify cookies aren't used very
much and LTP inotify tests ignore them.
Fix the problem by passing the cookie value properly.
Fixes: 7053aee26a
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When !defined(CONFIG_EXT4_DEBUG), mb_debug() should be defined as a
no_printk() statement instead of an empty statement in order to suppress
the following compiler warning:
fs/ext4/mballoc.c: In function ‘ext4_mb_cleanup_pa’:
fs/ext4/mballoc.c:2659:47: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
mb_debug(1, "mballoc: %u PAs left\n", count);
Signed-off-by: Patrick Palka <patrick@parcs.ath.cx>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Mark functions as static in jbd2/journal.c because they are not used
outside this file.
This eliminates the following warning in jbd2/journal.c:
fs/jbd2/journal.c:125:5: warning: no previous prototype for ‘jbd2_verify_csum_type’ [-Wmissing-prototypes]
fs/jbd2/journal.c:146:5: warning: no previous prototype for ‘jbd2_superblock_csum_verify’ [-Wmissing-prototypes]
fs/jbd2/journal.c:154:6: warning: no previous prototype for ‘jbd2_superblock_csum_set’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
"err" is zero here, there is no need to check again.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
(Trivial patch.)
If the code is looking at the RCU-protected pointer itself, but not
dereferencing it, the rcu_dereference() functions can be downgraded to
rcu_access_pointer(). This commit makes this downgrade in __alloc_fd(),
which simply compares the RCU-protected pointer against NULL with no
dereferencing.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Pull Ceph fixes from Sage Weil:
"We have some patches fixing up ACL support issues from Zheng and
Guangliang and a mount option to enable/disable this support. (These
fixes were somewhat delayed by the Chinese holiday.)
There is also a small fix for cached readdir handling when directories
are fragmented"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: fix __dcache_readdir()
ceph: add acl, noacl options for cephfs mount
ceph: make ceph_forget_all_cached_acls() static inline
ceph: add missing init_acl() for mkdir() and atomic_open()
ceph: fix ceph_set_acl()
ceph: fix ceph_removexattr()
ceph: remove xattr when null value is given to setxattr()
ceph: properly handle XATTR_CREATE and XATTR_REPLACE
Pull CIFS fixes from Steve French:
"Three cifs fixes, the most important fixing the problem with passing
bogus pointers with writev (CVE-2014-0069).
Two additional cifs fixes are still in review (including the fix for
an append problem which Al also discovered)"
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix too big maxBuf size for SMB3 mounts
cifs: ensure that uncached writes handle unmapped areas correctly
[CIFS] Fix cifsacl mounts over smb2 to not call cifs
When FS-Cache allocates an object, the following sequence of events can
occur:
-->fscache_alloc_object()
-->cachefiles_alloc_object() [via cache->ops->alloc_object]
<--[returns new object]
-->fscache_attach_object()
<--[failed]
-->cachefiles_put_object() [via cache->ops->put_object]
-->fscache_object_destroy()
-->fscache_objlist_remove()
-->rb_erase() to remove the object from fscache_object_list.
resulting in a crash in the rbtree code.
The problem is that the object is only added to fscache_object_list on
the success path of fscache_attach_object() where it calls
fscache_objlist_add().
So if fscache_attach_object() fails, the object won't have been added to
the objlist rbtree. We do, however, unconditionally try to remove the
object from the tree.
Thanks to NeilBrown for finding this and suggesting this solution.
Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: (a customer of) NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This has been this way for years, and every time I stumble across it I
lose my lunch. After coming across it for the nth time in the Coverity
results, I had to overcome the bystander effect and do something about
it.
This ignores the 79 column limit in favor of making it look like C
instead of gibberish.
The correct thing to do here would be to lose some of the indentation by
breaking this function up into several smaller ones. I might do that at
some point if I have the stomach to look at this again.
(Also some of those overlong ternary operations would likely be more
readable as regular if's)
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If directory is fragmented, readdir() read its dirfrags one by one.
After reading all dirfrags, the corresponding dentries are sorted in
(frag_t, off) order in the dcache. If dentries of a directory are all
cached, __dcache_readdir() can use the cached dentries to satisfy
readdir syscall. But when checking if a given dentry is after the
position of readdir, __dcache_readdir() compares numerical value of
frag_t directly. This is wrong, it should use ceph_frag_compare().
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Make the 'acl' option dependent on having ACL support compiled in. Make
the 'noacl' option work even without it so that one can always ask it to
be off and not error out on mount when it is not supported.
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Signed-off-by: Sage Weil <sage@inktank.com>
If acl is equivalent to file mode permission bits, ceph_set_acl()
needs to remove any existing acl xattr. Use __ceph_setxattr() to
handle both setting and removing acl xattr cases, it doesn't return
-ENODATA when there is no acl xattr.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE
to distinguish null value case from the zero-length value case.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
return -EEXIST if XATTR_CREATE is set and xattr alread exists.
return -ENODATA if XATTR_REPLACE is set but xattr does not exist.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
We need to use the same net namespace that was used to resolve
the hostname and sockaddr arguments.
Fixes: 32e62b7c3e (NFS: Add nfs4_update_server)
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
This patch help us to cleanup the readahead code by merging ra_{sit,nat}_pages
function into ra_meta_pages.
Additionally the new function is used to readahead cp block in
recover_orphan_inodes.
Change log from v1:
o fix a deadloop bug pointed by Jaegeuk Kim.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Previously without protection of inode mutex, f2fs_falloc and other data
correlated operations will interfere with each other.
So let's use inode mutex to keep atomicity of f2fs_falloc.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
If f2fs entered errorneous checkpoint status, it should skip writing meta
pages instead of redirtying the pages out.
Otherwise, it cannot unmount the partition even though f2fs is under read-only
status.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
When a new directory is allocated, if an error is occurred, we should truncate
preallocated dentry pages too.
This bug was reported by Andrey Tsyvarev after a while as follows.
mkdir()->
f2fs_add_link()->
init_inode_metadata()->
f2fs_init_acl()->
f2fs_get_acl()->
f2fs_getxattr()->
read_all_xattrs() fails.
Also there was a BUG_ON triggered after the fault in
mkdir()->
f2fs_add_link()->
init_inode_metadata()->
remove_inode_page() ->
f2fs_bug_on(inode->i_blocks != 0 && inode->i_blocks != 1);
But, previous patch wasn't perfect to resolve that bug, so the following bug
report was also submitted.
kernel BUG at fs/f2fs/inode.c:274!
Call Trace:
[<ffffffff811fde03>] evict+0xa3/0x1a0
[<ffffffff811fe615>] iput+0xf5/0x180
[<ffffffffa01c7f63>] f2fs_mkdir+0xf3/0x150 [f2fs]
[<ffffffff811f2a77>] vfs_mkdir+0xb7/0x160
[<ffffffff811f36bf>] SyS_mkdir+0x5f/0xc0
[<ffffffff81680769>] system_call_fastpath+0x16/0x1b
Finally, this patch resolves all the issues like below.
If an error is occurred after make_empty_dir(),
1. truncate_inode_pages()
The make_bad_inode() prior to iput() will change i_mode to S_IFREG, which
means that f2fs will not decrement fi->dirty_dents during f2fs_evict_inode.
But, by calling it here, we can do that.
2. truncate_blocks()
Preallocated dentry pages are trucated here to sync i_blocks.
3. remove_dirty_dir_inode()
Remove this directory inode from the list.
Reported-and-Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch modifies flow a little bit to avoid the following build warnings.
src/fs/f2fs/recovery.c: In function ‘check_index_in_prev_nodes’:
src/fs/f2fs/recovery.c:288:51: warning: ‘sum.<U5390>.<U52f8>.ofs_in_node’ may
be used uninitialized in this function [-Wmaybe-uninitialized]
src/fs/f2fs/recovery.c:260:23: warning: ‘sum.nid’ may be used uninitialized
in this function [-Wmaybe-uninitialized]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This is the erroneous scenario.
i_size on-disk i_size i_blocks
__f2fs_add_link() 4096 4096 2
get_new_data_page 8192 4096 3
-ENOSPC = init_inode_metadata
checkpoint - 4096 3
POR and reboot
__f2fs_add_link() 4096 4096 3
page = get_new_data_page (page->index = 1 by NEW_ADDR)
add a dentry to the page successfully
f2fs_rmdir()
f2fs_empty_dir() 4096 4096 3
f2fs_unlink() goes, since there is no valid dentry due to i_size = 4096.
But, still there is one dentry in page->index = 1.
So this patch moves the code to write dir->i_size into on-disk i_size in order
to sync dir's i_size, on-disk i_size, and its i_blocks.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch modifies the use of bi_private to remove pointer chasing for sbi.
Previously, we had a bi_private structure, but it needs memory allocation.
So this patch uses bi_private by the sbi pointer and adds a completion pointer
into the sbi.
This can achieve no memory allocation and nice use of the bi_private.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
If a new xattr node page was allocated and its inode is fsynced, we should
recover the xattr node page during the roll-forward process after power-cut.
But, previously, f2fs didn't handle that case, resulting in kernel panic as
follows reported by Tom Li.
BUG: unable to handle kernel paging request at ffffc9001c861a98
IP: [<ffffffffa0295236>] check_index_in_prev_nodes+0x86/0x2d0 [f2fs]
Call Trace:
[<ffffffff815ece9b>] ? printk+0x48/0x4a
[<ffffffffa029626a>] recover_fsync_data+0xdca/0xf50 [f2fs]
[<ffffffffa02873ae>] f2fs_fill_super+0x92e/0x970 [f2fs]
[<ffffffff8112c9f8>] mount_bdev+0x1b8/0x200
[<ffffffffa0286a80>] ? f2fs_remount+0x130/0x130 [f2fs]
[<ffffffffa0285e40>] f2fs_mount+0x10/0x20 [f2fs]
[<ffffffff8112d4de>] mount_fs+0x3e/0x1b0
[<ffffffff810ef4eb>] ? __alloc_percpu+0xb/0x10
[<ffffffff8114761f>] vfs_kern_mount+0x6f/0x120
[<ffffffff811497b9>] do_mount+0x259/0xa90
[<ffffffff810ead1d>] ? memdup_user+0x3d/0x80
[<ffffffff810eadb3>] ? strndup_user+0x53/0x70
[<ffffffff8114a2c9>] SyS_mount+0x89/0xd0
[<ffffffff815feae2>] system_call_fastpath+0x16/0x1b
This patch adds a recovery function of xattr node pages.
Reported-by: Tom Li <biergaizi@members.fsf.org>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In order to make fs consistency, update_inode_page should not be failed all
the time. Otherwise, it is possible to lose some metadata in the inode like
a link count.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Pull btrfs fixes from Chris Mason:
"We have a small collection of fixes in my for-linus branch.
The big thing that stands out is a revert of a new ioctl. Users
haven't shipped yet in btrfs-progs, and Dave Sterba found a better way
to export the information"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: use right clone root offset for compressed extents
btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105
Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol
Btrfs: fix max_inline mount option
Btrfs: fix a lockdep warning when cleaning up aborted transaction
Revert "btrfs: add ioctl to export size of global metadata reservation"
The set_flexbg_block_bitmap() function assumed that the number of
blocks in a blockgroup was sb->blocksize * 8, which is normally true,
but not always! Use EXT4_BLOCKS_PER_GROUP(sb) instead, to fix block
bitmap corruption after:
mke2fs -t ext4 -g 3072 -i 4096 /dev/vdd 1G
mount -t ext4 /dev/vdd /vdd
resize2fs /dev/vdd 8G
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Jon Bernard <jbernard@tuxion.com>
Cc: stable@vger.kernel.org
If a file system has a large number of inodes per block group, all of
the metadata blocks in a flex_bg may be larger than what can fit in a
single block group. Unfortunately, ext4_alloc_group_tables() in
resize.c was never tested to see if it would handle this case
correctly, and there were a large number of bugs which caused the
following sequence to result in a BUG_ON:
kernel bug at fs/ext4/resize.c:409!
...
call trace:
[<ffffffff81256768>] ext4_flex_group_add+0x1448/0x1830
[<ffffffff81257de2>] ext4_resize_fs+0x7b2/0xe80
[<ffffffff8123ac50>] ext4_ioctl+0xbf0/0xf00
[<ffffffff811c111d>] do_vfs_ioctl+0x2dd/0x4b0
[<ffffffff811b9df2>] ? final_putname+0x22/0x50
[<ffffffff811c1371>] sys_ioctl+0x81/0xa0
[<ffffffff81676aa9>] system_call_fastpath+0x16/0x1b
code: c8 4c 89 df e8 41 96 f8 ff 44 89 e8 49 01 c4 44 29 6d d4 0
rip [<ffffffff81254fa1>] set_flexbg_block_bitmap+0x171/0x180
This can be reproduced with the following command sequence:
mke2fs -t ext4 -i 4096 /dev/vdd 1G
mount -t ext4 /dev/vdd /vdd
resize2fs /dev/vdd 8G
To fix this, we need to make sure the right thing happens when a block
group's inode table straddles two block groups, which means the
following bugs had to be fixed:
1) Not clearing the BLOCK_UNINIT flag in the second block group in
ext4_alloc_group_tables --- the was proximate cause of the BUG_ON.
2) Incorrectly determining how many block groups contained contiguous
free blocks in ext4_alloc_group_tables().
3) Incorrectly setting the start of the next block range to be marked
in use after a discontinuity in setup_new_flex_group_blocks().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
bin_attributes created/updated in create_files() (such as those listed
via (struct device).attribute_groups) were not placed under the
specified group, and instead appeared in the base kobj directory.
Fix this by making bin_attributes use creating code similar to normal
attributes.
A quick grep shows that no one is using bin_attrs in a named attribute
group yet, so we can do this without breaking anything in usespace.
Note that I do not add is_visible() support to
bin_attributes, though that could be done as well.
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For non compressed extents, iterate_extent_inodes() gives us offsets
that take into account the data offset from the file extent items, while
for compressed extents it doesn't. Therefore we have to adjust them before
placing them in a send clone instruction. Not doing this adjustment leads to
the receiving end requesting for a wrong a file range to the clone ioctl,
which results in different file content from the one in the original send
root.
Issue reproducible with the following excerpt from the test I made for
xfstests:
_scratch_mkfs
_scratch_mount "-o compress-force=lzo"
$XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
$XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo
$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
$XFS_IO_PROG -c "pwrite -S 0x3e -b 80000 200000 80000" $SCRATCH_MNT/foo
$BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
$XFS_IO_PROG -c "pwrite -S 0xdc -b 10000 250000 10000" $SCRATCH_MNT/foo
$XFS_IO_PROG -c "pwrite -S 0xff -b 10000 300000 10000" $SCRATCH_MNT/foo
# will be used for incremental send to be able to issue clone operations
$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/clones_snap
$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
$FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
$FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
-x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2
$FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \
-x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2
$BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
$BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap
$BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \
-c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap
_scratch_unmount
_scratch_mkfs
_scratch_mount
$BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
$FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full
$BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap
$FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full
$BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
$FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>