Commit Graph

31751 Commits

Author SHA1 Message Date
Ryusuke Konishi
136e8770cd nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary

DESCRIPTION:
There are use cases in which a NILFS2 file system (formatted with a block
size smaller than 4 KB) is remounted in RO mode after encountering a
"broken bmap" issue.

The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>:
 "The machine I've been trialling nilfs on is running Debian Testing,
  Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
  version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
  also reproduced it (identically) with Debian Unstable amd64 and Debian
  Experimental (using the 3.8-trunk kernel).  The problematic partitions
  were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."

SYMPTOMS:
(1) The system log contains error messages like:

    [63102.496756] nilfs_direct_assign: invalid pointer: 0
    [63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
    [63102.496798]
    [63102.524403] Remounting filesystem read-only

(2) The NILFS2 file system is remounted in RO mode.

REPRODUCING PATH:
(1) Create a volume group named "unencrypted" with the vgcreate utility.
(2) Run the script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):

----------------[BEGIN SCRIPT]--------------------

VG=unencrypted
lvcreate --size 2G --name ntest $VG
mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
mkdir /var/tmp/n
mkdir /var/tmp/n/ntest
mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
mkdir /var/tmp/n/ntest/thedir
cd /var/tmp/n/ntest/thedir
sleep 2
date
darcs init
sleep 2
dmesg|tail -n 5
date
darcs whatsnew || true
date
sleep 2
dmesg|tail -n 5
----------------[END SCRIPT]--------------------

REPRODUCIBILITY: 100%

INVESTIGATION:
The issue occurs during segment construction after the following
sequence of user-space operations:

  open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
  fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  ftruncate(7, 60)

The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
bmap (inode number=28)" appears when trying to get the block number for
the block of the file at logical offset 3072 bytes.  As the output above
shows, the file is only 60 bytes long, so a single block (1 KB in size)
is enough for the whole file.  Several blocks are processed instead of
one because nilfs_segctor_scan_file() finds several dirty buffers for
this file.

The root cause of this issue is in nilfs_set_page_dirty function which
is called just before writing to an mmapped page.

When nilfs_page_mkwrite function handles a page at EOF boundary, it
fills hole blocks only inside EOF through __block_page_mkwrite().

The __block_page_mkwrite() function calls set_page_dirty() after filling
hole blocks, thus nilfs_set_page_dirty function (=
a_ops->set_page_dirty) is called.  However, the current implementation
of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page
at EOF boundary.

As a result, buffers outside EOF are inconsistently marked dirty and
queued for write even though they are not mapped with nilfs_get_block
function.

FIX:
This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.

Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
for this issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Anthony Doggett <Anthony2486@interfaces.org.uk>
Reported-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:52 -07:00
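
The nilfs2 fix described above can be illustrated with a toy model. This is
only a sketch and not the kernel code: Buffer, make_page, set_page_dirty_old
and set_page_dirty_fixed are hypothetical names, and a 4 KB page with 1 KB
blocks is assumed to match the reproduction setup.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096   # assumed page size
BLOCK_SIZE = 1024  # matches "mkfs.nilfs2 -b 1024"


@dataclass
class Buffer:
    """Toy stand-in for one block-sized buffer attached to a page."""
    offset: int
    mapped: bool = False
    dirty: bool = False


def make_page(file_size):
    """Build the buffers of the first page; only blocks inside EOF are mapped."""
    return [
        Buffer(offset=offset, mapped=offset < file_size)
        for offset in range(0, PAGE_SIZE, BLOCK_SIZE)
    ]


def set_page_dirty_old(buffers):
    """Old behaviour as described: every buffer on the page is marked dirty."""
    for b in buffers:
        b.dirty = True
    return [b.offset for b in buffers if b.dirty]


def set_page_dirty_fixed(buffers):
    """Fixed behaviour as described: hole (unmapped) buffers are skipped."""
    for b in buffers:
        if b.mapped:
            b.dirty = True
    return [b.offset for b in buffers if b.dirty]
```

For the 60-byte file from the trace, the old model dirties the buffers at
offsets 0, 1024, 2048 and 3072, while the fixed model dirties only offset 0.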
Jeff Moyer
6900807c6b aio: fix io_getevents documentation
In reviewing man pages, I noticed that io_getevents is documented to
update the timeout that gets passed into the library call.  This doesn't
happen in kernel space or in the library (even though it's documented to
do so in both places).  Unless there is objection, I'd like to fix the
comments/docs to match the code (I will also update the man page upon
consensus).

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Cyril Hrubis <chrubis@suse.cz>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:52 -07:00
Jeff Mahoney
fb09c3733a hfs: avoid crash in hfs_bnode_create
Commit 634725a929 ("hfs: cleanup HFS+ prints") removed the BUG_ON in
hfs_bnode_create in hfsplus.  This patch removes it from the hfs version
and avoids an fsfuzzer crash.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:51 -07:00
Joseph Qi
afe1bb73f8 ocfs2: unlock rw lock if inode lock failed
In ocfs2_file_aio_write(), it does ocfs2_rw_lock() first and then
ocfs2_inode_lock().

But if ocfs2_inode_lock() failed, it goes to out_sems without unlocking
rw lock.  This will cause a bug in ocfs2_lock_res_free() when testing
res->l_ex_holders, which is increased in __ocfs2_cluster_lock() and
decreased in __ocfs2_cluster_unlock().

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: "Duyongfeng (B)" <du.duyongfeng@huawei.com>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:51 -07:00
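
The ocfs2 error path above can be sketched with a toy holder counter. This
is an illustration under assumptions, not the ocfs2 code: ToyLock,
write_buggy and write_fixed are hypothetical names, and the counter only
loosely mirrors the role of res->l_ex_holders.

```python
class ToyLock:
    """Hypothetical lock that only counts its holders."""

    def __init__(self):
        self.holders = 0

    def lock(self):
        self.holders += 1

    def unlock(self):
        self.holders -= 1


def write_buggy(rw_lock, inode_lock_ok):
    """Take the rw lock, then bail out on inode-lock failure without unlocking."""
    rw_lock.lock()
    if not inode_lock_ok:
        return -1  # the rw lock is leaked on this path
    rw_lock.unlock()
    return 0


def write_fixed(rw_lock, inode_lock_ok):
    """Same flow, but the rw lock is released on every exit path."""
    rw_lock.lock()
    try:
        if not inode_lock_ok:
            return -1
        return 0
    finally:
        rw_lock.unlock()
```

After a failed inode lock, the buggy variant leaves the holder count at 1,
which is the kind of leftover state a later check at lock teardown would
trip over; the fixed variant returns it to 0.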
OGAWA Hirofumi
7b92d03c32 fat: fix possible overflow for fat_clusters
The intermediate value of fat_clusters can overflow on 32-bit architectures.

Reported-by: Krzysztof Strasburger <strasbur@chkw386.ch.pwr.wroc.pl>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:50 -07:00
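
The kind of overflow mentioned in the fat commit can be shown with a small
model. This is a sketch under assumptions, not the actual patch: the formula
fat_length * block_size * 8 / fat_bits, the function names, and the sample
values are all chosen for illustration, and 32-bit arithmetic is emulated
with a mask.

```python
U32_MASK = 0xFFFFFFFF  # emulate a 32-bit unsigned long


def clusters_multiply_first(fat_length, block_size, fat_bits):
    """Multiply, then divide, wrapping the intermediate product to 32 bits."""
    product = (fat_length * block_size * 8) & U32_MASK
    return product // fat_bits


def clusters_divide_first(fat_length, block_size, fat_bits):
    """Divide first so the intermediate value stays small.

    Exact only when block_size * 8 is a multiple of fat_bits.
    """
    entries_per_block = (block_size * 8) // fat_bits
    return (entries_per_block * fat_length) & U32_MASK


def clusters_exact(fat_length, block_size, fat_bits):
    """Reference result using Python's unbounded integers."""
    return fat_length * block_size * 8 // fat_bits
```

With fat_length = 2000000, block_size = 512 and fat_bits = 32, the
intermediate product exceeds 2^32, so the multiply-first variant disagrees
with the exact result while the divide-first variant matches it.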
Linus Torvalds
a8432588fc Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fix from Steve French:
 "One cifs fix to merge now - fixes possible DFS oops (I expect to
  request a merge of 4 additional cifs fixes next week)"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: only set ops for inodes in I_NEW state
2013-05-24 10:45:59 -07:00
Steven Whitehouse
e97e548ba8 GFS2: Fix typo in gfs2_log_end_write loop
There was a missing _all in this loop iterator

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-05-24 13:48:09 +01:00
Randy Dunlap
75f96ce6e7 GFS2: fix DLM depends to fix build errors
Fix build errors by correcting DLM dependencies in GFS2.
Build errors happen when CONFIG_GFS2_FS_LOCKING_DLM=y and CONFIG_DLM=m:

fs/built-in.o: In function `gfs2_lock':
file.c:(.text+0xc7abd): undefined reference to `dlm_posix_get'
file.c:(.text+0xc7ad0): undefined reference to `dlm_posix_unlock'
file.c:(.text+0xc7ad9): undefined reference to `dlm_posix_lock'
fs/built-in.o: In function `gdlm_unmount':
lock_dlm.c:(.text+0xd6e5b): undefined reference to `dlm_release_lockspace'
fs/built-in.o: In function `sync_unlock':
lock_dlm.c:(.text+0xd6e9e): undefined reference to `dlm_unlock'
fs/built-in.o: In function `sync_lock':
lock_dlm.c:(.text+0xd6fb6): undefined reference to `dlm_lock'
fs/built-in.o: In function `gdlm_put_lock':
lock_dlm.c:(.text+0xd7238): undefined reference to `dlm_unlock'
fs/built-in.o: In function `gdlm_mount':
lock_dlm.c:(.text+0xd753e): undefined reference to `dlm_new_lockspace'
lock_dlm.c:(.text+0xd79d3): undefined reference to `dlm_release_lockspace'
fs/built-in.o: In function `gdlm_lock':
lock_dlm.c:(.text+0xd8179): undefined reference to `dlm_lock'
fs/built-in.o: In function `gdlm_cancel':
lock_dlm.c:(.text+0xd6b22): undefined reference to `dlm_unlock'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-05-24 13:47:53 +01:00
Bob Peterson
af21ca8ed5 GFS2: Use single-block reservations for directories
This patch changes the multi-block allocation code, such that
directory inodes only get a single block reserved in the bitmap.
That way, the bitmaps are more tightly packed together, and there
are fewer spans of free blocks for in-use block reservations.
This means it takes less time to find a free span of blocks in the
bitmap, which speeds things up. This increases the performance of
some workloads by almost 2X. In Nate's mockup.py script (which does
(1) create dir, (2) create dir in dir, (3) create file in that dir)
the test executes in 23 steps rather than 43 steps, a 47%
performance improvement.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-05-24 13:47:32 +01:00
Bob Peterson
37f715774e GFS2: two minor quota fixups
This patch fixes two regression problems that Abhi found in the
GFS2 quota code.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-05-24 13:47:13 +01:00
Linus Torvalds
130901ba33 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "Miao Xie has been very busy, fixing races and enospc problems and many
  other small but important pieces.

  Alexandre Oliva discovered some problems with how our error handling
  was interacting with the block layer and for now has disabled our
  partial handling of sub-page writes.  The real sub-page work is in a
  series of patches from IBM that we still need to integrate and test.
  The code Alexandre has turned off was really incomplete.

  Josef has more error handling fixes and an important fix for the new
  skinny extent format.

  This also has my fix for the tracepoint crash from late in 3.9.  It's
  the first stage in a larger clean up to get rid of btrfs_bio and make
  a proper bioset for all the items we need to tack into the bio.  For
  now the bioset only holds our mirror_num and stripe_index, but for the
  next merge window I'll shuffle more in."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
  Btrfs: use a btrfs bioset instead of abusing bio internals
  Btrfs: make sure roots are assigned before freeing their nodes
  Btrfs: explicitly use global_block_rsv for quota_tree
  btrfs: do away with non-whole_page extent I/O
  Btrfs: don't invoke btrfs_invalidate_inodes() in the spin lock context
  Btrfs: remove BUG_ON() in btrfs_read_fs_tree_no_radix()
  Btrfs: pause the space balance when remounting to R/O
  Btrfs: fix unprotected root node of the subvolume's inode rb-tree
  Btrfs: fix accessing a freed tree root
  Btrfs: return errno if possible when we fail to allocate memory
  Btrfs: update the global reserve if it is empty
  Btrfs: don't steal the reserved space from the global reserve if their space type is different
  Btrfs: optimize the error handle of use_block_rsv()
  Btrfs: don't use global block reservation for inode cache truncation
  Btrfs: don't abort the current transaction if there is no enough space for inode cache
  Correct allowed raid levels on balance.
  Btrfs: fix possible memory leak in replace_path()
  Btrfs: fix possible memory leak in the find_parent_nodes()
  Btrfs: don't allow device replace on RAID5/RAID6
  Btrfs: handle running extent ops with skinny metadata
  ...
2013-05-18 11:35:28 -07:00
Chris Mason
c5cb6a0573 Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next
2013-05-17 21:53:17 -04:00
Chris Mason
9be3395bcd Btrfs: use a btrfs bioset instead of abusing bio internals
Btrfs has been pointer tagging bi_private and using bi_bdev
to store the stripe index and mirror number of failed IOs.

As bios bubble back up through the call chain, we use these
to decide if and how to retry our IOs.  They are also used
to count IO failures on a per device basis.

A recently added bio tracepoint led to crashes because
we were abusing bi_bdev.

This commit adds a btrfs bioset, and creates explicit fields
for the mirror number and stripe index.  The plan is to
extend this structure for all of the fields currently in
struct btrfs_bio, which will mean one less kmalloc in
our IO path.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Reported-by: Tejun Heo <tj@kernel.org>
2013-05-17 21:52:52 -04:00
Josef Bacik
655b09fe54 Btrfs: make sure roots are assigned before freeing their nodes
If we fail to load the chunk tree we'll call free_root_pointers, except we may
not have assigned the roots for the dev_root/extent_root/csum_root yet, so we
could NULL pointer deref at this point.  Just add checks to make sure these
roots are set to keep us from panicking.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:38 -04:00
Stefan Behrens
3a6cad9009 Btrfs: explicitly use global_block_rsv for quota_tree
The quota_tree was previously set up to use the empty_block_rsv,
which would be problematic when the filesystem fills up
and ENOSPC happens during internal operations while the quota
tree is updated and COWed (when the btrfs_qgroup_info_item
items are written). In fact, use_block_rsv(), which is used
in btrfs_cow_block(), falls back to the global_block_rsv in
this case. But to make it clearer what is
happening, change it to explicitly use the global_block_rsv.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:36 -04:00
Alexandre Oliva
17a5adccf3 btrfs: do away with non-whole_page extent I/O
end_bio_extent_readpage computes whole_page based on bv_offset and
bv_len, without taking into account that blk_update_request may modify
them when some of the blocks to be read into a page produce a read
error.  This would cause the read to unlock only part of the file
range associated with the page, which would in turn leave the entire
page locked, which would not only keep the process blocked instead of
returning -EIO to it, but also prevent any further access to the file.

It turns out that btrfs always issues whole-page reads and writes.
The special handling of non-whole_page appears to be a mistake or a
left-over from a time when this wasn't the case.  Indeed,
end_bio_extent_writepage distinguished between whole_page and
non-whole_page writes but behaved identically in both cases!

I've replaced the whole_page computations with warnings, just to be
sure that we're not issuing partial page reads or writes.  The
warnings should probably just go away some time.

Signed-off-by: Alexandre Oliva <oliva@gnu.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:35 -04:00
Miao Xie
b216cbfb52 Btrfs: don't invoke btrfs_invalidate_inodes() in the spin lock context
btrfs_invalidate_inodes() may sleep, so we should not invoke it in the
spin lock context. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:34 -04:00
Miao Xie
314297c2a3 Btrfs: remove BUG_ON() in btrfs_read_fs_tree_no_radix()
We have checked if ->node is NULL or not, so it is unnecessary to
use BUG_ON() to check again. Remove it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:32 -04:00
Miao Xie
061594ef17 Btrfs: pause the space balance when remounting to R/O
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:31 -04:00
Miao Xie
e1409cef85 Btrfs: fix unprotected root node of the subvolume's inode rb-tree
The root node of the rb-tree may be changed, so we should get it under
the lock. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:30 -04:00
Miao Xie
89042e5ad2 Btrfs: fix accessing a freed tree root
inode_tree_del() will move the tree root into the dead root list, and
then the tree will be destroyed by the cleaner. So if we remove the
delayed node which is cached in the inode after inode_tree_del(),
we may access a freed tree root. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:29 -04:00
Liu Bo
b9aa55bed1 Btrfs: return errno if possible when we fail to allocate memory
We need to set return value explicitly, otherwise we'll lose the error
value.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:27 -04:00
Miao Xie
d88033dbf4 Btrfs: update the global reserve if it is empty
Before this patch, if we found the global reserve empty, we reserved
space for it by the minimum unit. This was unreasonable and
inefficient: a depleted global reserve implies that the size of the
global reserve was too small. In this case, we should
update the global reserve and fill it.

Cc: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:26 -04:00
Miao Xie
5881cfc924 Btrfs: don't steal the reserved space from the global reserve if their space type is different
If the type of the space we need is different from that of the global reserve,
we cannot steal the space from the global reserve, because we cannot allocate
the space from the free space cache that the global reserve points to.

Cc: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:25 -04:00
Miao Xie
b586b32374 Btrfs: optimize the error handle of use_block_rsv()
cc: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:24 -04:00
Miao Xie
7b61cd9224 Btrfs: don't use global block reservation for inode cache truncation
It is very likely that there are lots of subvolumes/snapshots in the filesystem,
so if we use global block reservation to do inode cache truncation, we may hog
all the free space that is reserved in global rsv. So it is better that we do
the free space reservation for inode cache truncation by ourselves.

Cc: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:22 -04:00
Miao Xie
7cfa9e51d2 Btrfs: don't abort the current transaction if there is no enough space for inode cache
The filesystem with inode cache was forced to be read-only when we umounted it.

Steps to reproduce:
 # mkfs.btrfs -f ${DEV}
 # mount -o inode_cache ${DEV} ${MNT}
 # dd if=/dev/zero of=${MNT}/file1 bs=1M count=8192
 # btrfs fi syn ${MNT}
 # dd if=${MNT}/file1 of=/dev/null bs=1M
 # rm -f ${MNT}/file1
 # btrfs fi syn ${MNT}
 # umount ${MNT}

This happened because there was not enough space to do inode cache truncation,
and we then aborted the current transaction.

But a no-space error is not a serious problem when we write out the inode cache,
and it is safe to simply skip this step when it occurs. So we need
not abort the current transaction.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:21 -04:00
Andreas Philipp
8250dabedb Correct allowed raid levels on balance.
Raid5 with 3 devices is well defined while the old logic allowed
raid5 only with a minimum of 4 devices when converting the block group
profile via btrfs balance. Creating a raid5 with just three devices
using mkfs.btrfs always worked as expected. This is now fixed and the
whole logic is rewritten.

Signed-off-by: Andreas Philipp <philipp.andreas@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:20 -04:00
Stefan Behrens
379cde741b Btrfs: fix possible memory leak in replace_path()
In replace_path(), if read_tree_block() fails, we cannot return
directly; we should free the allocated memory, otherwise a memory
leak happens.

Similar to Wang's "Btrfs: fix possible memory leak in the
find_parent_nodes()" patch, the current commit fixes an issue that
is related to the "Btrfs: fix all callers of read_tree_block"
commit.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:19 -04:00
Wang Shilong
c16c2e2e51 Btrfs: fix possible memory leak in the find_parent_nodes()
In find_parent_nodes(), if read_tree_block() fails, we cannot
return directly; we should free the allocated memory, otherwise
a memory leak happens.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:17 -04:00
Stefan Behrens
4968810752 Btrfs: don't allow device replace on RAID5/RAID6
This is not yet supported and causes crashes. One sad user reported
that it destroyed his filesystem.

One failure is in __btrfs_map_block+0xc1f calling kmalloc(0).

0x5f21f is in __btrfs_map_block (fs/btrfs/volumes.c:4923).
4918                            num_stripes = map->num_stripes;
4919                            max_errors = nr_parity_stripes(map);
4920
4921                            raid_map = kmalloc(sizeof(u64) * num_stripes,
4922                                               GFP_NOFS);
4923                            if (!raid_map) {
4924                                    ret = -ENOMEM;
4925                                    goto out;
4926                            }
4927

There might be more issues. Until this is really tested, don't allow
users to start the procedure on RAID5/RAID6 filesystems.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:16 -04:00
Josef Bacik
b1c79e0947 Btrfs: handle running extent ops with skinny metadata
Chris hit a bug where we weren't finding extent records when running extent ops.
This is because we use the delayed_ref_head when running the extent op, which
means we can't use the ->type checks to see if we are metadata.  We also lose
the level of the metadata we are working on.  So to fix this we can just check
the ->is_data section of the extent_op, and we can store the level of the buffer
we were modifying in the extent_op.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:15 -04:00
Josef Bacik
73e1e61fb8 Btrfs: remove warn on in free space cache writeout
This catches block groups that are too large to properly cache.  We deal with
this case fine, so the warning just confuses users.  Remove the warning.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:13 -04:00
Josef Bacik
69a85bd87c Btrfs: don't null pointer deref on abort
I'm sorry, there's no excuse for this sort of work.  We need to use
root->leafsize since eb may be NULL.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:12 -04:00
Gabriel de Perthuis
03b71c6ca6 btrfs: don't stop searching after encountering the wrong item
The search ioctl skips items that are too large for a result buffer, but
inline items of a certain size occurring before any search result is
found would trigger an overflow and stop the search entirely.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=57641

Cc: stable@vger.kernel.org
Signed-off-by: Gabriel de Perthuis <g2p.code+btrfs@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 21:40:10 -04:00
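
The difference between skipping an oversized item and stopping the search
can be sketched as follows. This is a toy model under assumptions, not the
search ioctl's code: items are (name, size) pairs, buf_size is a hypothetical
result buffer size, and both function names are made up for illustration.

```python
def collect_buggy(items, buf_size):
    """Treat any item that does not fit as a buffer overflow and stop."""
    used = 0
    results = []
    for name, size in items:
        if used + size > buf_size:
            break  # also hit by items that could never fit at all
        results.append(name)
        used += size
    return results


def collect_fixed(items, buf_size):
    """Skip items that can never fit; stop only when the buffer is full."""
    used = 0
    results = []
    for name, size in items:
        if size > buf_size:
            continue  # too large for any result buffer: skip, keep searching
        if used + size > buf_size:
            break  # genuine overflow: the caller would resume from here
        results.append(name)
        used += size
    return results
```

With an oversized item placed before any result, the buggy variant returns
nothing, while the fixed variant skips it and still returns the later items.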
Liu Bo
a52f4cd2b1 Btrfs: fix off-by-one in fiemap
lock_extent/unlock_extent expect an exclusive end.

Tested-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 16:27:26 -04:00
David Sterba
60b62978bc btrfs: annotate quota tree for lockdep
Quota tree has been missing from lockdep annotations, though no warning
has been seen in the wild.

There's currently one entry that does not belong there,
BTRFS_ORPHAN_OBJECTID.  No such tree exists, it's probably a copy &
paste mistake, the id is defined among tree ids.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-17 16:27:25 -04:00
Linus Torvalds
b973425cbb Fixed regressions (two stability regressions and a performance
regression) introduced during the 3.10-rc1 merge window.  Also
 included is a bug fix relating to allocating blocks after resizing an
 ext3 file system when using the ext4 file system driver.

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 update from Ted Ts'o:
 "Fixed regressions (two stability regressions and a performance
  regression) introduced during the 3.10-rc1 merge window.

  Also included is a bug fix relating to allocating blocks after
  resizing an ext3 file system when using the ext4 file system driver"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd,jbd2: fix oops in jbd2_journal_put_journal_head()
  ext4: revert "ext4: use io_end for multiple bios"
  ext4: limit group search loop for non-extent files
  ext4: fix fio regression
2013-05-14 09:30:54 -07:00
Theodore Ts'o
a549984b8c ext4: revert "ext4: use io_end for multiple bios"
This reverts commit 4eec708d26.

Multiple users have reported crashes which are apparently caused by
this commit.  Thanks to Dmitry Monakhov for bisecting it.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Jan Kara <jack@suse.cz>
2013-05-11 19:07:42 -04:00
Linus Torvalds
c4cc75c332 Merge git://git.infradead.org/users/eparis/audit
Pull audit changes from Eric Paris:
 "Al used to send pull requests every couple of years but he told me to
  just start pushing them to you directly.

  Our touching outside of core audit code is pretty straightforward.  A
  couple of interface changes which hit net/.  A simple argument bug
  calling audit functions in namei.c and the removal of some assembly
  branch prediction code on ppc"

* git://git.infradead.org/users/eparis/audit: (31 commits)
  audit: fix message spacing printing auid
  Revert "audit: move kaudit thread start from auditd registration to kaudit init"
  audit: vfs: fix audit_inode call in O_CREAT case of do_last
  audit: Make testing for a valid loginuid explicit.
  audit: fix event coverage of AUDIT_ANOM_LINK
  audit: use spin_lock in audit_receive_msg to process tty logging
  audit: do not needlessly take a lock in tty_audit_exit
  audit: do not needlessly take a spinlock in copy_signal
  audit: add an option to control logging of passwords with pam_tty_audit
  audit: use spin_lock_irqsave/restore in audit tty code
  helper for some session id stuff
  audit: use a consistent audit helper to log lsm information
  audit: push loginuid and sessionid processing down
  audit: stop pushing loginid, uid, sessionid as arguments
  audit: remove the old depricated kernel interface
  audit: make validity checking generic
  audit: allow checking the type of audit message in the user filter
  audit: fix build break when AUDIT_DEBUG == 2
  audit: remove duplicate export of audit_enabled
  Audit: do not print error when LSMs disabled
  ...
2013-05-11 14:29:11 -07:00
Linus Torvalds
2dbd3cac87 Merge branch 'for-3.10' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
 "Small fixes for two bugs and two warnings"

* 'for-3.10' of git://linux-nfs.org/~bfields/linux:
  nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error
  SUNRPC: fix decoding of optional gss-proxy xdr fields
  SUNRPC: Refactor gssx_dec_option_array() to kill uninitialized warning
  nfsd4: don't allow owner override on 4.1 CLAIM_FH opens
2013-05-10 09:28:55 -07:00
Linus Torvalds
3644bc2ec7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull stray syscall bits from Al Viro:
 "Several syscall-related commits that were missing from the original"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE
  unicore32: just use mmap_pgoff()...
  unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
  x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)
2013-05-10 09:21:05 -07:00
Linus Torvalds
6fad8d02ef Improve performance when AES-NI (and most likely other crypto accelerators) is
available by moving to the ablkcipher crypto API. The improvement is more
 apparent on faster storage devices. There's no noticeable change when hardware
 crypto is not available.

Merge tag 'ecryptfs-3.10-rc1-ablkcipher' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull eCryptfs update from Tyler Hicks:
 "Improve performance when AES-NI (and most likely other crypto
  accelerators) is available by moving to the ablkcipher crypto API.
  The improvement is more apparent on faster storage devices.

  There's no noticeable change when hardware crypto is not available"

* tag 'ecryptfs-3.10-rc1-ablkcipher' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: Use the ablkcipher crypto API
2013-05-10 09:20:01 -07:00
Linus Torvalds
977b58e1dd Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
 "The bulk of the changes are generalizing the ColdFire v3 core support
  and adding in 537x CPU support.  Also a couple of other bug fixes, one
  to fix a reintroduction of a past bug in the romfs filesystem nommu
  support."

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68knommu: enable Timer on coldfire 532x
  m68knommu: fix ColdFire 5373/5329 QSPI base address
  m68knommu: add support for configuring a Freescale M5373EVB board
  m68knommu: add support for the ColdFire 537x family of CPUs
  m68knommu: make ColdFire M532x platform support more v3 generic
  m68knommu: create and use a common M53xx ColdFire class of CPUs
  m68k: remove unused asm/dbg.h
  m68k: Set ColdFire ACR1 cache mode depending on kernel configuration
  romfs: fix nommu map length to keep inside filesystem
  m68k: clean up unused "config ROMVECSIZE"
2013-05-10 07:22:35 -07:00
Tyler Hicks
4dfea4f0d7 eCryptfs: Use the ablkcipher crypto API
Make the switch from the blkcipher kernel crypto interface to the
ablkcipher interface.

encrypt_scatterlist() and decrypt_scatterlist() now use the ablkcipher
interface but, from the eCryptfs standpoint, still treat the crypto
operation as a synchronous operation. They submit the async request and
then wait until the operation is finished before they return. Most of
the changes are contained inside those two functions.

Despite waiting for the completion of the crypto operation, the
ablkcipher interface provides performance increases in most cases when
used on AES-NI capable hardware.
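
The submit-then-wait pattern described above can be sketched in plain
userspace C. This is a minimal illustration under stated assumptions, not
the eCryptfs code: engine_submit(), engine_poll(), struct request and
xor_buffer_sync() are hypothetical names, XOR stands in for the real cipher,
and the busy poll stands in for sleeping on a kernel completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REQ_IN_PROGRESS 1	/* hypothetical stand-in for -EINPROGRESS */

struct request;
typedef void (*complete_fn)(struct request *req, int rc);

/* Hypothetical async request: buffer, key, and a completion callback. */
struct request {
	unsigned char *buf;
	size_t len;
	unsigned char key;
	complete_fn complete;
	void *data;		/* caller's private completion state */
};

/* One-slot queue standing in for an asynchronous crypto engine. */
static struct request *pending;

/* Queue the request and return immediately, before any work is done. */
static int engine_submit(struct request *req)
{
	if (pending)
		return -1;	/* engine busy */
	pending = req;
	return REQ_IN_PROGRESS;
}

/* Run the queued request, then report the result through the callback. */
static void engine_poll(void)
{
	struct request *req = pending;

	if (!req)
		return;
	pending = NULL;
	for (size_t i = 0; i < req->len; i++)
		req->buf[i] ^= req->key;	/* XOR stands in for the cipher */
	req->complete(req, 0);
}

struct sync_result {
	bool done;
	int rc;
};

/* Completion callback: record the result and mark the request finished. */
static void sync_complete(struct request *req, int rc)
{
	struct sync_result *res = req->data;

	res->rc = rc;
	res->done = true;
}

/* Synchronous wrapper: submit the async request, then wait for it. */
static int xor_buffer_sync(unsigned char *buf, size_t len, unsigned char key)
{
	struct sync_result res = { .done = false, .rc = -1 };
	struct request req = { buf, len, key, sync_complete, &res };
	int rc;

	assert(buf != NULL);
	rc = engine_submit(&req);
	if (rc != REQ_IN_PROGRESS)
		return rc;
	while (!res.done)
		engine_poll();	/* kernel code would sleep here instead */
	return res.rc;
}
```

The caller only ever sees xor_buffer_sync(), so it can keep treating the
operation as synchronous even though the engine interface is asynchronous.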

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Colin King <colin.king@canonical.com>
Reviewed-by: Zeev Zilberman <zeev@annapurnaLabs.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Thieu Le <thieule@google.com>
Cc: Li Wang <dragonylffly@163.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@iki.fi>
2013-05-09 16:55:07 -07:00
Linus Torvalds
70eba4226d Couple of pstore cleanups
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRiosmAAoJEKurIx+X31iBQ/IP/2D+JHfGcMzZbRo+2moRyhp9
 VYUrNOg+QA+Csm6ZO43GMB4JZtodI/tO12YLNGFbNh+YELdpox7XTqmG058+pABV
 El1xt7i2Ro2/PxnWBRnnvWVHoWSsPI9R65aVNgz8C2QyDxG4wY9/ZYfcZQgEajJO
 M4gyKx/d54WjKcD31OJBF5NGki2zZQcuBI8vWkjXZBPximNj+cJeC27VPnuNI2GC
 p32p9Q+pvBv43bkf3EPEFGsd/ZKdczZ75SOzLXVqOEJhmsxfEKgleQbBd3hiRlwe
 zwj8lCzjZS3xR9oCnvalzWgswFQMd9S1kQYbItdztIofU4Y9hP6wmsEzLofXku+0
 FshRxAOCtW1jSgRGo/BiDNfRQm8w+l+rJGafZafK6cQOtUHLD+Kig8AhFBIY3Nt1
 Wzsmjz5ERhMDImM2lVw4ypji5vWkQ50wst6sX5CQHiOdOEmX+HYJkpOXCHzZYNY2
 IrAa/qq6EKbQCfUf7btbsFMDMDIYk66gy6R596sC1Onwy14ecxWerY1HO0BFg3b8
 UUo9tGVHEBBLdogc1aErU6OBL2DLeO7TeURhFOu7UHo0PfJNtLyG2ekYOYjwgS+Q
 RKNJz9PHKsQnBOdI+dK3iHG2uqkbv6rg0pH5oIglCptabqflN9SE0RpM5b2PSLT3
 74fTPSlUDSuCcm1PXJ8O
 =Zw0E
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull trivial pstore update from Tony Luck:
 "Couple of pstore cleanups"

It turns out that the kmemdup() conversion ends up being undone by the
fact that the memory block also needed the ecc information (see commit
bd08ec33b5: "pstore/ram: Restore ecc information block"), so all that
remains after merging is the error return code change.
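
A userspace sketch of why a duplicate helper stops fitting once the block
needs extra room. This is an illustration only, not the pstore code:
buf_dup() and buf_dup_with_tail() are hypothetical names, and the "tail"
merely stands in for additional data (such as ecc information) that must
live in the same allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate helper in the spirit of kmemdup(): the allocation size and
 * the copy size are the same value, so one call replaces malloc + memcpy.
 */
static void *buf_dup(const void *src, size_t len)
{
	void *p;

	assert(src != NULL);
	p = malloc(len);
	if (!p)
		return NULL;
	memcpy(p, src, len);
	return p;
}

/*
 * Open-coded variant: the allocation is larger than the copied data, so
 * a plain duplicate helper no longer matches and the two steps stay
 * separate. The extra tail bytes are zero-filled by calloc().
 */
static void *buf_dup_with_tail(const void *src, size_t len, size_t tail)
{
	void *p;

	assert(src != NULL);
	if (tail > SIZE_MAX - len)
		return NULL;	/* len + tail would overflow */
	p = calloc(1, len + tail);
	if (!p)
		return NULL;
	memcpy(p, src, len);
	return p;
}
```

Both helpers return memory that the caller releases with free().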

* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore/ram: fix error return code in ramoops_probe()
  fs: pstore: Replaced calls to kmalloc and memcpy with kmemdup
2013-05-09 16:42:10 -07:00
Linus Torvalds
07e074503e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs fixes from Al Viro:
 "Regression fix from Geert + yet another open-coded kernel_read()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ecryptfs: don't open-code kernel_read()
  xtensa simdisk: Fix proc_create_data() conversion fallout
2013-05-09 13:44:35 -07:00
Linus Torvalds
983a5f84a4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs update from Chris Mason:
 "These are mostly fixes.  The biggest exceptions are Josef's skinny
  extents and Jan Schmidt's code to rebuild our quota indexes if they
  get out of sync (or you enable quotas on an existing filesystem).

  The skinny extents are off by default because they are a new variation
  on the extent allocation tree format.  btrfstune -x enables them, and
  the new format makes the extent allocation tree about 30% smaller.

  I rebased this a few days ago to rework Dave Sterba's crc checks on
  the super block, but almost all of these go back to rc6, since I
  thought 3.9 was due any minute.

  The biggest missing fix is the tracepoint bug that was hit late in
  3.9.  I ran into problems with that in overnight testing and I'm still
  tracking it down.  I'll definitely have that fixed for rc2."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (101 commits)
  Btrfs: allow superblock mismatch from older mkfs
  btrfs: enhance superblock checks
  btrfs: fix misleading variable name for flags
  btrfs: use unsigned long type for extent state bits
  Btrfs: improve the loop of scrub_stripe
  btrfs: read entire device info under lock
  btrfs: remove unused gfp mask parameter from release_extent_buffer callchain
  btrfs: handle errors returned from get_tree_block_key
  btrfs: make static code static & remove dead code
  Btrfs: deal with errors in write_dev_supers
  Btrfs: remove almost all of the BUG()'s from tree-log.c
  Btrfs: deal with free space cache errors while replaying log
  Btrfs: automatic rescan after "quota enable" command
  Btrfs: rescan for qgroups
  Btrfs: split btrfs_qgroup_account_ref into four functions
  Btrfs: allocate new chunks if the space is not enough for global rsv
  Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
  btrfs: move leak debug code to functions
  Btrfs: return free space in cow error path
  Btrfs: set UUID in root_item for created trees
  ...
2013-05-09 13:07:40 -07:00
Linus Torvalds
8769e078a9 xfs: update (#2) for v3.10-rc1
 * add CONFIG_XFS_WARN, a step between zero debugging and CONFIG_XFS_DEBUG.
 * fix attrmulti and attrlist to fall back to vmalloc when kmalloc fails.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRi/V+AAoJENaLyazVq6ZOgncP/i0p141SO7qAwf20ql5NMi5Q
 KdhzWymyILIZ8GQoxsvoDASXb3D5I/f+8qX1WgrAG7/k7fahdGOYCtFTJ5YqYLvy
 ocxNoKqkhmUrMBbx0BcoNB4rtqgwLuHMR9ihDGDJrJDsn1b6nZlr38VBngIMhcC3
 rWS8hU6ZwEnGm9hcmNzMBLJpjCJRfwqRr9CmHbENP/LspV0ZTSgltIuFggBfjK6q
 LK3FYdrlYfiX1md3c9zPpdVn8P3hxeM/3Jq+O3mLZ6hY4uq1+NBSKQbS5uwg1d6e
 3ib5hxBDlRk//+S0mzJJ3DZ+Qqa2zHpZ/jNKsjdOBUoyqEdRC5tI9hx3F3la5VxP
 4oktNP2rvhpziRqRny/EwZm5xHrGTrdP8Uwq2nO2nxevtZyVd+P73K/17sNgOeCU
 8Xm6d3Usxw4FUTbiHjw0LFoJlM8yfEUSiTr1K8TmDP5G4phE6RsNnlbDLSUZ6kgh
 6UodqGZtKqdXegYljtwysX75PELQcWWcrA5+7U2/yk+qKyEJ3HvMNIXHvW6MqGTL
 69gxdrdD/Ff+83N+ktA3Ks31aj9n0LYpy6shxXg+YpHl6Mny3UiEpiUEsVc8jn4E
 iJ4qVqQI73qLDCNLM9XShBpPz5N/aPzaFw7QPXbj3A9UH3wd5OmUKNieoL4t9ucH
 sAreKLnndOuriYoTAWA9
 =IPgF
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.10-rc1-2' of git://oss.sgi.com/xfs/xfs

Pull xfs update (#2) from Ben Myers:

 - add CONFIG_XFS_WARN, a step between zero debugging and
   CONFIG_XFS_DEBUG.

 - fix attrmulti and attrlist to fall back to vmalloc when kmalloc
   fails.
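
The allocation fallback in the second item can be sketched in userspace C.
This is a hedged illustration, not the xfs code: fast_alloc(), slow_alloc(),
struct buffer and FAST_LIMIT are hypothetical, and malloc() stands in for
both kernel allocators. In the kernel, kmalloc() needs physically contiguous
memory and can fail for large sizes, while vmalloc() only needs virtually
contiguous memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define FAST_LIMIT 4096	/* hypothetical size cap for the fast allocator */

struct buffer {
	void *ptr;
	bool from_fallback;	/* records which allocator supplied ptr */
};

/* Stand-in for kmalloc(): refuses requests above FAST_LIMIT. */
static void *fast_alloc(size_t len)
{
	return len <= FAST_LIMIT ? malloc(len) : NULL;
}

/* Stand-in for vmalloc(): accepts any size the system can provide. */
static void *slow_alloc(size_t len)
{
	return malloc(len);
}

/* Try the fast allocator first; fall back only when it fails. */
static int buffer_alloc(struct buffer *b, size_t len)
{
	assert(b != NULL);
	b->from_fallback = false;
	b->ptr = fast_alloc(len);
	if (!b->ptr) {
		b->ptr = slow_alloc(len);
		b->from_fallback = true;
	}
	return b->ptr ? 0 : -1;
}

/*
 * Both stand-ins use malloc(), so free() is correct for either here.
 * Real kernel code has to release the memory with the function that
 * matches the allocator that was used, which is why the origin is
 * tracked in from_fallback.
 */
static void buffer_free(struct buffer *b)
{
	assert(b != NULL);
	free(b->ptr);
	b->ptr = NULL;
}
```

A small request is served by the fast path, and a request above FAST_LIMIT
takes the fallback path without the caller needing to know which one ran.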

* tag 'for-linus-v3.10-rc1-2' of git://oss.sgi.com/xfs/xfs:
  xfs: fallback to vmalloc for large buffers in xfs_compat_attrlist_by_handle
  xfs: fallback to vmalloc for large buffers in xfs_attrlist_by_handle
  xfs: introduce CONFIG_XFS_WARN
2013-05-09 13:06:20 -07:00
Al Viro
91c2e0bcae unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-09 13:46:38 -04:00