Commit Graph

506919 Commits

Author SHA1 Message Date
Chris Mason
1bbc621ef2 Btrfs: allow block group cache writeout outside critical section in commit
We loop through all of the dirty block groups during commit and write
the free space cache.  In order to make sure the cache is currect, we do
this while no other writers are allowed in the commit.

If a large number of block groups are dirty, this can introduce long
stalls during the final stages of the commit, which can block new procs
trying to change the filesystem.

This commit changes the block group cache writeout to take appropriate
locks and allow it to run earlier in the commit.  We'll still have to
redo some of the block groups, but it means we can get most of the work
out of the way without blocking the entire FS.

Signed-off-by: Chris Mason <clm@fb.com>
2015-04-10 14:07:22 -07:00
Chris Mason
2b10826800 Btrfs: don't use highmem for free space cache pages
In order to create the free space cache concurrently with FS modifications,
we need to take a few block group locks.

The cache code also does kmap, which would schedule with the locks held.
Instead of going through kmap_atomic, lets just use lowmem for the cache
pages.

Signed-off-by: Chris Mason <clm@fb.com>
2015-04-10 14:07:18 -07:00
Chris Mason
c9dc4c6578 Btrfs: two stage dirty block group writeout
Block group cache writeout is currently waiting on the pages for each
block group cache before moving on to writing the next one.  This commit
switches things around to send down all the caches and then wait on them
in batches.

The end result is much faster, since we're keeping the disk pipeline
full.

Signed-off-by: Chris Mason <clm@fb.com>
2015-04-10 14:07:11 -07:00
Chris Mason
4c6d1d85ad btrfs: move struct io_ctl into ctree.h and rename it
We'll need to put the io_ctl into the block_group cache struct, so
name it struct btrfs_io_ctl and move it into ctree.h

Signed-off-by: Chris Mason <clm@fb.com>
2015-04-10 14:07:04 -07:00
Josef Bacik
3bce876fd5 Btrfs: don't steal from the global reserve if we don't have the space
btrfs_evict_inode() needs to be more careful about stealing from the
global_rsv.  We dont' want to end up aborting commit with ENOSPC just
because the evict_inode code was too greedy.

Signed-off-by: Chris Mason <clm@fb.com>
2015-04-10 14:06:59 -07:00
Josef Bacik
365c531377 Btrfs: don't commit the transaction in the async space flushing
We're triggering a huge number of commits from
btrfs_async_reclaim_metadata_space.  These aren't really requried,
because everyone calling the async reclaim code is going to end up
triggering a commit on their own.

Signed-off-by: Chris Mason <clm@fb.com>
2015-04-10 14:06:54 -07:00
Josef Bacik
cb723e4919 Btrfs: reserve space for block groups
This changes our delayed refs calculations to include the space needed
to write back dirty block groups.

Signed-off-by: Chris Mason <clm@fb.com>
2015-04-10 14:06:48 -07:00
Chris Mason
28f75a0e6c Btrfs: refill block reserves during truncate
When truncate starts, it allocates some space in the block reserves so
that we'll have enough to update metadata along the way.

For very large files, we can easily go through all of that space as we
loop through the extents.  This changes truncate to refill the space
reservation as it progresses through the file.

Signed-off-by: Chris Mason <clm@fb.com>
2015-04-10 14:06:34 -07:00
Josef Bacik
1262133b8d Btrfs: account for crcs in delayed ref processing
As we delete large extents, we end up doing huge amounts of COW in order
to delete the corresponding crcs.  This adds accounting so that we keep
track of that space and flushing of delayed refs so that we don't build
up too much delayed crc work.

This helps limit the delayed work that must be done at commit time and
tries to avoid ENOSPC aborts because the crcs eat all the global
reserves.

Signed-off-by: Chris Mason <clm@fb.com>
2015-04-10 14:04:47 -07:00
Chris Mason
28ed1345a5 btrfs: actively run the delayed refs while deleting large files
When we are deleting large files with large extents, we are building up
a huge set of delayed refs for processing.  Truncate isn't checking
often enough to see if we need to back off and process those, or let
a commit proceed.

The end result is long stalls after the rm, and very long commit times.
During the commits, other processes back up waiting to start new
transactions and we get into trouble.

Signed-off-by: Chris Mason <clm@fb.com>
2015-04-10 14:00:14 -07:00
Guenter Roeck
4a3d1caf8a fs: btrfs: Add missing include file
Building alpha:allmodconfig fails with

fs/btrfs/inode.c: In function 'check_direct_IO':
fs/btrfs/inode.c:8050:2: error: implicit declaration of function 'iov_iter_alignment'

due to a missing include file.

Fixes: 3737c63e1fb0 ("fs: move struct kiocb to fs.h")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-01 12:32:07 -07:00
Chris Mason
dd82525956 Btrfs: free and unlock our path before btrfs_free_and_pin_reserved_extent()
The error handling path for alloc_reserved_tree_block is calling
btrfs_free_and_pin_reserved_extent with a spinning tree lock held.  This
might sleep as we allocate extent_state objects:

 BUG: sleeping function called from invalid context at mm/slub.c:1268
 in_atomic(): 1, irqs_disabled(): 0, pid: 11093, name: kworker/u4:7
 5 locks held by kworker/u4:7/11093:
  #0:  ("%s-%s""btrfs", name){++++.+}, at: [<ffffffff81091d51>] process_one_work+0x151/0x520
  #1:  ((&work->normal_work)){+.+.+.}, at: [<ffffffff81091d51>] process_one_work+0x151/0x520
  #2:  (sb_internal){++++.+}, at: [<ffffffffa003a70e>] start_transaction+0x43e/0x590 [btrfs]
  #3:  (&head_ref->mutex){+.+...}, at: [<ffffffffa0089f8c>] btrfs_delayed_ref_lock+0x4c/0x240 [btrfs]
  #4:  (btrfs-extent-00){++++..}, at: [<ffffffffa007697b>] btrfs_clear_lock_blocking_rw+0x9b/0x150 [btrfs]
 CPU: 0 PID: 11093 Comm: kworker/u4:7 Tainted: G        W 4.0.0-rc6-default+ #246
 Hardware name: Intel Corporation Santa Rosa platform/Matanzas, BIOS TSRSCRB1.86C.0047.B00.0610170821 10/17/06
 Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
  00000000000004f4 ffff88006dd17848 ffffffff81ab0e3b ffff88006dd17848
  ffff88007a944760 ffff88006dd17868 ffffffff8109d516 ffff88006dd17898
  0000000000000000 ffff88006dd17898 ffffffff8109d5b2 ffffffff81aba2bb
 Call Trace:
  [<ffffffff81ab0e3b>] dump_stack+0x4f/0x6c
  [<ffffffff8109d516>] ___might_sleep+0xf6/0x140
  [<ffffffff8109d5b2>] __might_sleep+0x52/0x90
  [<ffffffff81aba2bb>] ? ftrace_call+0x5/0x34
  [<ffffffff81196363>] kmem_cache_alloc+0x163/0x1b0
  [<ffffffffa0056f31>] ? alloc_extent_state+0x31/0x150 [btrfs]
  [<ffffffffa0056f20>] ? alloc_extent_state+0x20/0x150 [btrfs]
  [<ffffffffa0056f31>] alloc_extent_state+0x31/0x150 [btrfs]
  [<ffffffffa005805b>] __set_extent_bit+0x37b/0x5d0 [btrfs]
  [<ffffffff81aba2bb>] ? ftrace_call+0x5/0x34
  [<ffffffffa005888d>] ? set_extent_bit+0xd/0x30 [btrfs]
  [<ffffffffa00588a3>] set_extent_bit+0x23/0x30 [btrfs]
  [<ffffffffa0058e80>] set_extent_dirty+0x20/0x30 [btrfs]
  [<ffffffffa00195ba>] pin_down_extent+0xaa/0x170 [btrfs]
  [<ffffffffa001d8ef>] __btrfs_free_reserved_extent+0xcf/0x160 [btrfs]
  [<ffffffffa0023856>] btrfs_free_and_pin_reserved_extent+0x16/0x20 [btrfs]
  [<ffffffffa002482a>] __btrfs_run_delayed_refs+0xfca/0x1290 [btrfs]
  [<ffffffffa0026eae>] btrfs_run_delayed_refs+0x6e/0x2e0 [btrfs]
  [<ffffffffa0027378>] delayed_ref_async_start+0x48/0xb0 [btrfs]
  [<ffffffffa006c883>] normal_work_helper+0x83/0x350 [btrfs]
  [<ffffffffa006cd79>] ? btrfs_extent_refs_helper+0x9/0x20 [btrfs]
  [<ffffffffa006cd82>] btrfs_extent_refs_helper+0x12/0x20 [btrfs]
  [<ffffffff81091dcb>] process_one_work+0x1cb/0x520
  [<ffffffff81091d51>] ? process_one_work+0x151/0x520
  [<ffffffff811c7abf>] ? seq_read+0x3f/0x400
  [<ffffffff8109260b>] worker_thread+0x5b/0x4e0
  [<ffffffff81097be2>] ? __kthread_parkme+0x12/0xa0
  [<ffffffff810925b0>] ? rescuer_thread+0x450/0x450
  [<ffffffff81098686>] kthread+0xf6/0x120
  [<ffffffff81098590>] ? flush_kthread_worker+0x1b0/0x1b0
  [<ffffffff81ab8088>] ret_from_fork+0x58/0x90
  [<ffffffff81098590>] ? flush_kthread_worker+0x1b0/0x1b0
 ------------[ cut here ]------------

This changes things to free the path first, which will also unlock the
extent buffer.

Signed-off-by: Chris Mason <clm@fb.com>
Reported-by: Dave Sterba <dsterba@suse.cz>
Tested-by: Dave Sterba <dsterba@suse.cz>
2015-04-01 08:36:05 -07:00
Liu Bo
e56a951e01 Btrfs: Remove the check for old-style mkfs
This was used to make sure that a fresh btrfs from an older mkfs.btrfs,
but it also allows us to mount a buggy btrfs if this btrfs has the right
superblock head part but has something wrong with chunk tree part[1], and
after that we can hit BUG_ON()s set in the code to prevent something
impossible.

Since David has released "Btrfs progs v3.19-rc2", just remove the check,
if anyone who wants to make a fresh btrfs, please use the latest one.

[1]: http://www.spinics.net/lists/linux-btrfs/msg42358.html

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Omar Sandoval <osandov@osandov.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 18:10:25 -07:00
Jeff Mahoney
727b9784b6 btrfs: cleanup orphans while looking up default subvolume
Orphans in the fs tree are cleaned up via open_ctree and subvolume
orphans are cleaned via btrfs_lookup_dentry -- except when a default
subvolume is in use.  The name for the default subvolume uses a manual
lookup that doesn't trigger orphan cleanup and needs to trigger it
manually as well. This doesn't apply to the remount case since the
subvolumes are cleaned up by walking the root radix tree.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 18:10:25 -07:00
Tom Van Braeckel
d862095829 btrfs: explicitly set control file's private_data
The private_data member of the Btrfs control device file
(/dev/btrfs-control) is used to hold the current transaction and needs
to be initialized to NULL to signify that no transaction is in progress.

We explicitly set the control file's private_data to NULL to be
independent of whatever value the misc subsystem initializes it to.

Backstory:
----------

The misc subsystem (which is used by /dev/btrfs-control) initializes
a file's private_data to point to the misc device when a driver has
registered a custom open file operation and initializes it to NULL
when a custom open file operation has *not* been provided.

This subtle quirk is confusing, to the point where kernel code registers
*empty* file open operations to have private_data point to the misc
device structure.

And it leads to bugs, where the addition or removal of a custom open
file operation surprisingly changes the initial contents of a file's
private_data structure.

To simplify things in the misc subsystem, a patch [1] has been proposed
to *always* set private_data to point to the misc device instead of
only doing this when a custom open file operation has been registered.

But before we can fix this in the misc subsystem itself, we need to
modify the (few) drivers that rely on this very subtle behavior.

[1] https://lkml.org/lkml/2014/12/4/939

Signed-off-by: Martin Kepplinger <martink@posteo.de>
Signed-off-by: Tom Van Braeckel <tomvanbraeckel@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 18:10:24 -07:00
Chengyu Song
26e726afe0 btrfs: incorrect handling for fiemap_fill_next_extent return
fiemap_fill_next_extent returns 0 on success, -errno on error, 1 if this was
the last extent that will fit in user array. If 1 is returned, the return
value may eventually returned to user space, which should not happen, according
to manpage of ioctl.

Signed-off-by: Chengyu Song <csong84@gatech.edu>
Reviewed-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 18:10:24 -07:00
David Sterba
3c3b04d10f btrfs: don't accept bare namespace as a valid xattr
Due to insufficient check in btrfs_is_valid_xattr, this unexpectedly
works:

 $ touch file
 $ setfattr -n user. -v 1 file
 $ getfattr -d file
user.="1"

ie. the missing attribute name after the namespace.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94291
Reported-by: William Douglas <william.douglas@intel.com>
CC: <stable@vger.kernel.org> # 2.6.29+
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 18:10:24 -07:00
Filipe Manana
dcc82f4783 Btrfs: fix log tree corruption when fs mounted with -o discard
While committing a transaction we free the log roots before we write the
new super block. Freeing the log roots implies marking the disk location
of every node/leaf (metadata extent) as pinned before the new super block
is written. This is to prevent the disk location of log metadata extents
from being reused before the new super block is written, otherwise we
would have a corrupted log tree if before the new super block is written
a crash/reboot happens and the location of any log tree metadata extent
ended up being reused and rewritten.

Even though we pinned the log tree's metadata extents, we were issuing a
discard against them if the fs was mounted with the -o discard option,
resulting in corruption of the log tree if a crash/reboot happened before
writing the new super block - the next time the fs was mounted, during
the log replay process we would find nodes/leafs of the log btree with
a content full of zeroes, causing the process to fail and require the
use of the tool btrfs-zero-log to wipeout the log tree (and all data
previously fsynced becoming lost forever).

Fix this by not doing a discard when pinning an extent. The discard will
be done later when it's safe (after the new super block is committed) at
extent-tree.c:btrfs_finish_extent_commit().

Fixes: e688b7252f (Btrfs: fix extent pinning bugs in the tree log)
CC: <stable@vger.kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 17:56:24 -07:00
Filipe Manana
2f2ff0ee5e Btrfs: fix metadata inconsistencies after directory fsync
We can get into inconsistency between inodes and directory entries
after fsyncing a directory. The issue is that while a directory gets
the new dentries persisted in the fsync log and replayed at mount time,
the link count of the inode that directory entries point to doesn't
get updated, staying with an incorrect link count (smaller then the
correct value). This later leads to stale file handle errors when
accessing (including attempt to delete) some of the links if all the
other ones are removed, which also implies impossibility to delete the
parent directories, since the dentries can not be removed.

Another issue is that (unlike ext3/4, xfs, f2fs, reiserfs, nilfs2),
when fsyncing a directory, new files aren't logged (their metadata and
dentries) nor any child directories. So this patch fixes this issue too,
since it has the same resolution as the incorrect inode link count issue
mentioned before.

This is very easy to reproduce, and the following excerpt from my test
case for xfstests shows how:

  _scratch_mkfs >> $seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create our main test file and directory.
  $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foo | _filter_xfs_io
  mkdir $SCRATCH_MNT/mydir

  # Make sure all metadata and data are durably persisted.
  sync

  # Add a hard link to 'foo' inside our test directory and fsync only the
  # directory. The btrfs fsync implementation had a bug that caused the new
  # directory entry to be visible after the fsync log replay but, the inode
  # of our file remained with a link count of 1.
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/foo_2

  # Add a few more links and new files.
  # This is just to verify nothing breaks or gives incorrect results after the
  # fsync log is replayed.
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/foo_3
  $XFS_IO_PROG -f -c "pwrite -S 0xff 0 64K" $SCRATCH_MNT/hello | _filter_xfs_io
  ln $SCRATCH_MNT/hello $SCRATCH_MNT/mydir/hello_2

  # Add some subdirectories and new files and links to them. This is to verify
  # that after fsyncing our top level directory 'mydir', all the subdirectories
  # and their files/links are registered in the fsync log and exist after the
  # fsync log is replayed.
  mkdir -p $SCRATCH_MNT/mydir/x/y/z
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/x/y/foo_y_link
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/x/y/z/foo_z_link
  touch $SCRATCH_MNT/mydir/x/y/z/qwerty

  # Now fsync only our top directory.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/mydir

  # And fsync now our new file named 'hello', just to verify later that it has
  # the expected content and that the previous fsync on the directory 'mydir' had
  # no bad influence on this fsync.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/hello

  # Simulate a crash/power loss.
  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  # Verify the content of our file 'foo' remains the same as before, 8192 bytes,
  # all with the value 0xaa.
  echo "File 'foo' content after log replay:"
  od -t x1 $SCRATCH_MNT/foo

  # Remove the first name of our inode. Because of the directory fsync bug, the
  # inode's link count was 1 instead of 5, so removing the 'foo' name ended up
  # deleting the inode and the other names became stale directory entries (still
  # visible to applications). Attempting to remove or access the remaining
  # dentries pointing to that inode resulted in stale file handle errors and
  # made it impossible to remove the parent directories since it was impossible
  # for them to become empty.
  echo "file 'foo' link count after log replay: $(stat -c %h $SCRATCH_MNT/foo)"
  rm -f $SCRATCH_MNT/foo

  # Now verify that all files, links and directories created before fsyncing our
  # directory exist after the fsync log was replayed.
  [ -f $SCRATCH_MNT/mydir/foo_2 ] || echo "Link mydir/foo_2 is missing"
  [ -f $SCRATCH_MNT/mydir/foo_3 ] || echo "Link mydir/foo_3 is missing"
  [ -f $SCRATCH_MNT/hello ] || echo "File hello is missing"
  [ -f $SCRATCH_MNT/mydir/hello_2 ] || echo "Link mydir/hello_2 is missing"
  [ -f $SCRATCH_MNT/mydir/x/y/foo_y_link ] || \
      echo "Link mydir/x/y/foo_y_link is missing"
  [ -f $SCRATCH_MNT/mydir/x/y/z/foo_z_link ] || \
      echo "Link mydir/x/y/z/foo_z_link is missing"
  [ -f $SCRATCH_MNT/mydir/x/y/z/qwerty ] || \
      echo "File mydir/x/y/z/qwerty is missing"

  # We expect our file here to have a size of 64Kb and all the bytes having the
  # value 0xff.
  echo "file 'hello' content after log replay:"
  od -t x1 $SCRATCH_MNT/hello

  # Now remove all files/links, under our test directory 'mydir', and verify we
  # can remove all the directories.
  rm -f $SCRATCH_MNT/mydir/x/y/z/*
  rmdir $SCRATCH_MNT/mydir/x/y/z
  rm -f $SCRATCH_MNT/mydir/x/y/*
  rmdir $SCRATCH_MNT/mydir/x/y
  rmdir $SCRATCH_MNT/mydir/x
  rm -f $SCRATCH_MNT/mydir/*
  rmdir $SCRATCH_MNT/mydir

  # An fsck, run by the fstests framework everytime a test finishes, also detected
  # the inconsistency and printed the following error message:
  #
  # root 5 inode 257 errors 2001, no inode item, link count wrong
  #    unresolved ref dir 258 index 2 namelen 5 name foo_2 filetype 1 errors 4, no inode ref
  #    unresolved ref dir 258 index 3 namelen 5 name foo_3 filetype 1 errors 4, no inode ref

  status=0
  exit

The expected golden output for the test is:

  wrote 8192/8192 bytes at offset 0
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  wrote 65536/65536 bytes at offset 0
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  File 'foo' content after log replay:
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0020000
  file 'foo' link count after log replay: 5
  file 'hello' content after log replay:
  0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  *
  0200000

Which is the output after this patch and when running the test against
ext3/4, xfs, f2fs, reiserfs or nilfs2. Without this patch, the test's
output is:

  wrote 8192/8192 bytes at offset 0
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  wrote 65536/65536 bytes at offset 0
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  File 'foo' content after log replay:
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0020000
  file 'foo' link count after log replay: 1
  Link mydir/foo_2 is missing
  Link mydir/foo_3 is missing
  Link mydir/x/y/foo_y_link is missing
  Link mydir/x/y/z/foo_z_link is missing
  File mydir/x/y/z/qwerty is missing
  file 'hello' content after log replay:
  0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  *
  0200000
  rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/x/y/z': No such file or directory
  rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/x/y': No such file or directory
  rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/x': No such file or directory
  rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/foo_2': Stale file handle
  rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/foo_3': Stale file handle
  rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir': Directory not empty

Fsck, without this fix, also complains about the wrong link count:

  root 5 inode 257 errors 2001, no inode item, link count wrong
      unresolved ref dir 258 index 2 namelen 5 name foo_2 filetype 1 errors 4, no inode ref
      unresolved ref dir 258 index 3 namelen 5 name foo_3 filetype 1 errors 4, no inode ref

So fix this by logging the inodes that the dentries point to when
fsyncing a directory.

A test case for xfstests follows.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 17:56:23 -07:00
Filipe Manana
bf69196045 Btrfs: change the insertion criteria for the qgroup operations rbtree
After looking at Liu Bo's recent patch (titled
"Btrfs: fix comp_oper to get right order") I realized the search made by
qgroup_oper_exists() was buggy because its rbtree navigation comparison
function, comp_oper_exist(), only looks at the fields bytenr and ref_root
of a tree node, ignoring the seq field completely. This was wrong because
when we insert a node into the rbtree we use comp_oper(), which takes a
decision based first on bytenr, then on seq and then on the ref_root field.
That means qgroup_oper_exists() could miss the fact that at least one
operation with given bytenr and ref_root exists.

Consider the following simple example of a 3 nodes qgroup operations
rbtree (created using comp_oper before this patch), where each node's key
is a tuple with the shape (bytenr, seq, ref_root, op):

                          [ (4096, 2, 20, op X) ]
                         /                       \
                        /                         \
   [ (4096, 1, 5, op Y) ]                         [ (4096, 3, 10, op Z) ]

qgroup_oper_exists() when called to search for an existing operation for
bytenr 4096 and ref root 10 wouldn't find anything because it would go to
the left subtree instead of the right subtree, since comp_oper_exits()
ignores the seq field completely.

Fix this by changing the insertion navigation function to use the ref_root
field right after using the bytenr field and before using the seq field,
so that qgroup_oper_exists() / comp_oper_exist() work as expected.

This patch applies on top of the patch mentioned above from Liu.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 17:55:52 -07:00
Filipe Manana
3d850dd448 Btrfs: add missing inode item update in fallocate()
If we fallocate(), without the keep size flag, into an area already covered
by an extent previously fallocated, we were updating the inode's i_size but
we weren't updating the inode item in the fs/subvol tree. A following umount
+ mount would result in a loss of the inode's size (and an fsync would miss
too the fact that the inode changed).

Reproducer:

  $ mkfs.btrfs -f /dev/sdd
  $ mount /dev/sdd /mnt
  $ fallocate -n -l 1M /mnt/foobar
  $ fallocate -l 512K /mnt/foobar
  $ umount /mnt
  $ mount /dev/sdd /mnt
  $ od -t x1 /mnt/foobar
  0000000

The expected result is:

  $ od -t x1 /mnt/foobar
  0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  2000000

A test case for fstests follows soon.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 17:55:52 -07:00
Filipe Manana
5f806c3ae2 Btrfs: incremental send, remove dead code
The logic to detect path loops when attempting to apply a pending
directory rename, introduced in commit
f959492fc1 (Btrfs: send, fix more issues related to directory renames)
is no longer needed, and the respective fstests test case for that commit,
btrfs/045, now passes without this code (as well as all the other test
cases for send/receive).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 17:55:52 -07:00
Filipe Manana
8996a48c0a Btrfs: incremental send, clear name from cache after orphanization
If a directory's reference ends up being orphanized, because the inode
currently being processed has a new path that matches that directory's
path, make sure we evict the name of the directory from the name cache.
This is because there might be descendent inodes (either directories or
regular files) that will be orphanized later too, and therefore the
orphan name of the ancestor must be used, otherwise we send issue rename
operations with a wrong path in the send stream.

Reproducer:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ mkdir -p /mnt/data/n1/n2/p1/p2
  $ mkdir /mnt/data/n4
  $ mkdir -p /mnt/data/p1/p2

  $ btrfs subvolume snapshot -r /mnt /mnt/snap1

  $ mv /mnt/data/p1/p2 /mnt/data
  $ mv /mnt/data/n1/n2/p1/p2 /mnt/data/p1
  $ mv /mnt/data/p2 /mnt/data/n1/n2/p1
  $ mv /mnt/data/n1/n2 /mnt/data/p1
  $ mv /mnt/data/p1 /mnt/data/n4
  $ mv /mnt/data/n4/p1/n2/p1 /mnt/data

  $ btrfs subvolume snapshot -r /mnt /mnt/snap2

  $ btrfs send /mnt/snap1 -f /tmp/1.send
  $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/2.send

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt2
  $ btrfs receive /mnt2 -f /tmp/1.send
  $ btrfs receive /mnt2 -f /tmp/2.send
  ERROR: rename data/p1/p2 -> data/n4/p1/p2 failed. no such file or directory

Directories data/p1 (inode 263) and data/p1/p2 (inode 264) in the parent
snapshot are both orphanized during the incremental send, and as soon as
data/p1 is orphanized, we must make sure that when orphanizing data/p1/p2
we use a source path of o263-6-o/p2 for the rename operation instead of
the old path data/p1/p2 (the one before the orphanization of inode 263).

A test case for xfstests follows soon.

Reported-by: Robbie Ko <robbieko@synology.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 17:55:51 -07:00
Filipe Manana
2f1f465ae6 Btrfs: send, don't leave without decrementing clone root's send_progress
If the clone root was not readonly or the dead flag was set on it, we were
leaving without decrementing the root's send_progress counter (and before
we just incremented it). If a concurrent snapshot deletion was in progress
and ended up being aborted, it would be impossible to later attempt to
delete again the snapshot, since the root's send_in_progress counter could
never go back to 0.

We were also setting clone_sources_to_rollback to i + 1 too early - if we
bailed out because the clone root we got is not readonly or flagged as dead
we ended up later derreferencing a null pointer because we didn't assign
the clone root to sctx->clone_roots[i].root:

		for (i = 0; sctx && i < clone_sources_to_rollback; i++)
			btrfs_root_dec_send_in_progress(
					sctx->clone_roots[i].root);

So just don't increment the send_in_progress counter if the root is readonly
or flagged as dead.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 17:55:51 -07:00
Filipe Manana
5cc2b17e80 Btrfs: send, add missing check for dead clone root
After we locked the root's root item, a concurrent snapshot deletion
call might have set the dead flag on it. So check if the dead flag
is set and abort if it is, just like we do for the parent root.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 17:55:51 -07:00
Filipe Manana
4f764e5153 Btrfs: remove deleted xattrs on fsync log replay
If we deleted xattrs from a file and fsynced the file, after a log replay
the xattrs would remain associated to the file. This was an unexpected
behaviour and differs from what other filesystems do, such as for example
xfs and ext3/4.

Fix this by, on fsync log replay, check if every xattr in the fs/subvol
tree (that belongs to a logged inode) has a matching xattr in the log,
and if it does not, delete it from the fs/subvol tree. This is a similar
approach to what we do for dentries when we replay a directory from the
fsync log.

This issue is trivial to reproduce, and the following excerpt from my
test for xfstests triggers the issue:

  _crash_and_mount()
  {
       # Simulate a crash/power loss.
       _load_flakey_table $FLAKEY_DROP_WRITES
       _unmount_flakey
       _load_flakey_table $FLAKEY_ALLOW_WRITES
       _mount_flakey
  }

  rm -f $seqres.full

  _scratch_mkfs >> $seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create out test file and add 3 xattrs to it.
  touch $SCRATCH_MNT/foobar
  $SETFATTR_PROG -n user.attr1 -v val1 $SCRATCH_MNT/foobar
  $SETFATTR_PROG -n user.attr2 -v val2 $SCRATCH_MNT/foobar
  $SETFATTR_PROG -n user.attr3 -v val3 $SCRATCH_MNT/foobar

  # Make sure everything is durably persisted.
  sync

  # Now delete the second xattr and fsync the inode.
  $SETFATTR_PROG -x user.attr2 $SCRATCH_MNT/foobar
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar

  _crash_and_mount

  # After the fsync log is replayed, the file should have only 2 xattrs, the ones
  # named user.attr1 and user.attr3. The btrfs fsync log replay bug left the file
  # with the 3 xattrs that we had before deleting the second one and fsyncing the
  # file.
  echo "xattr names and values after first fsync log replay:"
  $GETFATTR_PROG --absolute-names --dump $SCRATCH_MNT/foobar | _filter_scratch

  # Now write some data to our file, fsync it, remove the first xattr, add a new
  # hard link to our file and commit the fsync log by fsyncing some other new
  # file. This is to verify that after log replay our first xattr does not exist
  # anymore.
  echo "hello world!" >> $SCRATCH_MNT/foobar
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
  $SETFATTR_PROG -x user.attr1 $SCRATCH_MNT/foobar
  ln $SCRATCH_MNT/foobar $SCRATCH_MNT/foobar_link
  touch $SCRATCH_MNT/qwerty
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/qwerty

  _crash_and_mount

  # Now only the xattr with name user.attr3 should be set in our file.
  echo "xattr names and values after second fsync log replay:"
  $GETFATTR_PROG --absolute-names --dump $SCRATCH_MNT/foobar | _filter_scratch

  status=0
  exit

The expected golden output, which is produced with this patch applied or
when testing against xfs or ext3/4, is:

  xattr names and values after first fsync log replay:
  # file: SCRATCH_MNT/foobar
  user.attr1="val1"
  user.attr3="val3"

  xattr names and values after second fsync log replay:
  # file: SCRATCH_MNT/foobar
  user.attr3="val3"

Without this patch applied, the output is:

  xattr names and values after first fsync log replay:
  # file: SCRATCH_MNT/foobar
  user.attr1="val1"
  user.attr2="val2"
  user.attr3="val3"

  xattr names and values after second fsync log replay:
  # file: SCRATCH_MNT/foobar
  user.attr1="val1"
  user.attr2="val2"
  user.attr3="val3"

A patch with a test case for xfstests follows soon.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-03-26 17:55:51 -07:00
Chris Mason
9b305632cb Merge branch 'cleanup/divs' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1
Signed-off-by: Chris Mason <clm@fb.com>

Conflicts:
	fs/btrfs/volumes.c
2015-03-25 12:25:02 -07:00
Chris Mason
fc4c3c872f Merge branch 'cleanups-post-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1
Signed-off-by: Chris Mason <clm@fb.com>

Conflicts:
	fs/btrfs/disk-io.c
2015-03-25 10:52:48 -07:00
Chris Mason
9deed229fa Merge branch 'cleanups-for-4.1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1 2015-03-25 10:43:16 -07:00
Linus Torvalds
bc465aa9d0 Linux 4.0-rc5 2015-03-22 16:50:21 -07:00
Linus Torvalds
1b717b1af5 One fix for md in 4.0-rc4
Regression in recent patch causes crash on error path.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIVAwUAVQyiMDnsnt1WYoG5AQKV4A//SwSZQmnbSTDZ1H5DyERXBkNNKkyFkZCL
 9jwl60HaVA5U0N/fP97ku/757QtODYjjHLzlkYqgiiqMBGPFFIKa78D6Su0Z8uJK
 FtMJQaalhsWOCXHUz2R6DPHR1e9x2mlrJgqvnIbnbFXym1PZJ/fo3ccsLX3qqmNJ
 F4oFYpy79+yMCUSTn8Sv9czzSE19k5XC/Hz14drWi5ox9zpZmpFbMbQdNaKFa2Uy
 kphgEdvFNF9C2X+rJeeY91ofz/keSsZVL8HqxLThjxKXsZ+krpywsXa2nX7W00ya
 zt49VK78PL3v2eIAgf1XaBYO3oyjyzEaVuAkFOktv/LuLKAbFbup2Yi5JtHzieor
 NoaV9hfgCq1OqxkY8zI8hLcL6HS2vvk8vUXbb2Xqa6X5w6xuyjRMzzAyxqXuG4qm
 16jAU2RvITfdwAeNacyzq/xAWBg+jUgeg+lO/Z71O0qS7chGUNzve+hwQOtoeHzP
 fmEetiZa+gbOQEBZT3Ubsw4L9CqCaYVx7KOiE8Py5nbUVPMz2nF6jjt6WSPOgyWv
 JrflLIoEE5l6QffnEzcJb3OR920TKZx17/87KvBwct9XH3R0+UgJLFggHrYQAiUA
 54CnXVPl0MPq5vtROviX2ZWMThOk1s5HAsMhCBuTdqghtQQdNP7DWmGfg7ys5uUq
 152cPd+xGw4=
 =PM5i
 -----END PGP SIGNATURE-----

Merge tag 'md/4.0-rc4-fix' of git://neil.brown.name/md

Pull bugfix for md from Neil Brown:
 "One fix for md in 4.0-rc4

  Regression in recent patch causes crash on error path"

* tag 'md/4.0-rc4-fix' of git://neil.brown.name/md:
  md: fix problems with freeing private data after ->run failure.
2015-03-22 16:38:19 -07:00
Linus Torvalds
4541c22605 Driver core fixes for 4.0-rc5
Here are two bugfixes for things reported.  One regression in kernfs,
 and another issue fixed in the LZ4 code that was fixed in the "upstream"
 codebase that solves a reported kernel crash.
 
 Both have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlUOl6YACgkQMUfUDdst+ylRjQCfT46/we0U/S38/sTNYtElQXUI
 ZEsAnAwozBj6nIYOydxaKnA0BcujsSGz
 =unYd
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are two bugfixes for things reported.  One regression in kernfs,
  and another issue fixed in the LZ4 code that was fixed in the
  "upstream" codebase that solves a reported kernel crash

  Both have been in linux-next for a while"

* tag 'driver-core-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  LZ4 : fix the data abort issue
  kernfs: handle poll correctly on 'direct_read' files.
2015-03-22 12:07:47 -07:00
Linus Torvalds
b93dbeea7b Char/MISC fixes for 4.0-rc5
Here are three fixes for 4.0-rc5 that revert 3 PCMCIA patches that were
 merged in 4.0-rc1 that cause regressions.  So let's revert them for now
 and they will be reworked and resent sometime in the future.
 
 All have been tested in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlUOmJQACgkQMUfUDdst+ylPdwCdHY4X6DA2xIfbz/YgU2Xo8AOJ
 ZuYAoK8ZgmEzbleVX3p3qFyrsO2HVpmy
 =CzwC
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
 "Here are three fixes for 4.0-rc5 that revert 3 PCMCIA patches that
  were merged in 4.0-rc1 that cause regressions.  So let's revert them
  for now and they will be reworked and resent sometime in the future.

  All have been tested in linux-next for a while"

* tag 'char-misc-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Revert "pcmcia: add a new resource manager for non ISA systems"
  Revert "pcmcia: fix incorrect bracketing on a test"
  Revert "pcmcia: add missing include for new pci resource handler"
2015-03-22 12:03:14 -07:00
Linus Torvalds
704fa7f76f Staging driver fixes for 4.0-rc5
Here are 4 small staging driver fixes, all for the vt6656 and vt6655
 drivers, that resolve some reported issues with them.
 
 All of these patches have been in linux next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlUOmQ4ACgkQMUfUDdst+yl9zgCguQMirGlclUg8bAKQ4AxSJHyO
 vCQAnjSSrN6FcfxGkxYkxdLyU1TQ8vT1
 =WxGI
 -----END PGP SIGNATURE-----

Merge tag 'staging-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Here are four small staging driver fixes, all for the vt6656 and
  vt6655 drivers, that resolve some reported issues with them.

  All of these patches have been in linux next for a while"

* tag 'staging-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  vt6655: Fix late setting of byRFType.
  vt6655: RFbSetPower fix missing rate RATE_12M
  staging: vt6656: vnt_rf_setpower: fix missing rate RATE_12M
  staging: vt6655: vnt_tx_packet fix dma_idx selection.
2015-03-22 11:59:02 -07:00
Linus Torvalds
b2f45eeff2 TTY/Serial driver fix for 4.0-rc5
Here's a single 8250 serial driver that fixes a reported deadlock with
 the serial console and the tty driver.
 
 It's been in linux-next for a while now.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlUOmYYACgkQMUfUDdst+yn46gCgrU/ZfjUXJRm/VXJ/TQfWyQSL
 2kIAoJ033kZ0RqWNDkGjK9H1kLSHdVn1
 =ApFj
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fix from Greg KH:
 "Here's a single 8250 serial driver that fixes a reported deadlock with
  the serial console and the tty driver.

  It's been in linux-next for a while now"

* tag 'tty-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: 8250_dw: Fix deadlock in LCR workaround
2015-03-22 11:54:29 -07:00
Linus Torvalds
cedd5f659e USB / PHY driver fixes for 4.0-rc5
Here's a number of USB and PHY driver fixes for 4.0-rc5.  Largest thing
 here is a revert of a gadget function driver patch that removes 500
 lines of code.  Other than that, it's a number of reported bugs fixes
 and new quirk/id entries.
 
 All have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlUOmk4ACgkQMUfUDdst+ym94wCdGs4iVQbrTA9p+561H8jhCCxh
 79oAn3y24kql3ob9/iuV6+N36+HQsp+0
 =IPei
 -----END PGP SIGNATURE-----

Merge tag 'usb-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB / PHY driver fixes from Greg KH:
 "Here's a number of USB and PHY driver fixes for 4.0-rc5.

  The largest thing here is a revert of a gadget function driver patch
  that removes 500 lines of code.  Other than that, it's a number of
  reported bugs fixes and new quirk/id entries.

  All have been in linux-next for a while"

* tag 'usb-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
  usb: common: otg-fsm: only signal connect after switching to peripheral
  uas: Add US_FL_NO_ATA_1X for Initio Corporation controllers / devices
  USB: ehci-atmel: rework clk handling
  MAINTAINERS: add entry for USB OTG FSM
  usb: chipidea: otg: add a_alt_hnp_support response for B device
  phy: omap-usb2: Fix missing clk_prepare call when using old dt name
  phy: ti/omap: Fix modalias
  phy: core: Fixup return value of phy_exit when !pm_runtime_enabled
  phy: miphy28lp: Convert to devm_kcalloc and fix wrong sizof
  phy: miphy365x: Convert to devm_kcalloc and fix wrong sizeof
  phy: twl4030-usb: Remove redundant assignment for twl->linkstat
  phy: exynos5-usbdrd: Fix off-by-one valid value checking for args->args[0]
  phy: Find the right match in devm_phy_destroy()
  phy: rockchip-usb: Fixup rockchip_usb_phy_power_on failure path
  phy: ti-pipe3: Simplify ti_pipe3_dpll_wait_lock implementation
  phy: samsung-usb2: Remove NULL terminating entry from phys array
  phy: hix5hd2-sata: Check return value of platform_get_resource
  phy: exynos-dp-video: Kill exynos_dp_video_phy_pwr_isol function
  Revert "usb: gadget: zero: Add support for interrupt EP"
  Revert "xhci: Clear the host side toggle manually when endpoint is 'soft reset'"
  ...
2015-03-22 11:33:55 -07:00
Linus Torvalds
f897522468 Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave dmaengine fixes from Vinod Koul:
 "Four fixes for dw, pl08x, imx-sdma and at_hdmac driver.  Nothing
  unusual here, simple fixes to these drivers"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: pl08x: Define capabilities for generic capabilities reporting
  dmaengine: dw: append MODULE_ALIAS for platform driver
  dmaengine: imx-sdma: switch to dynamic context mode after script loaded
  dmaengine: at_hdmac: Fix calculation of the residual bytes
2015-03-21 13:05:37 -07:00
Linus Torvalds
3d7a6db537 Power management and ACPI fixes for v4.0-rc5
- Revert a recent PCI commit related to IRQ resources management
    that introduced a regression for drivers attempting to bind to
    devices whose previous drivers did not balance pci_enable_device()
    and pci_disable_device() as expected (Rafael J Wysocki).
 
  - Fix a deadlock in at91_rtc_interrupt() introduced by a typo in a
    recent commit related to wakeup interrupt handling (Dan Carpenter).
 
  - Allow the power capping RAPL (Running-Average Power Limit) driver
    to use different energy units for domains within one CPU package
    which is necessary to handle Intel Haswell EP processors correctly
    (Jacob Pan).
 
  - Improve the cpuidle mvebu driver's handling of Armada XP SoCs by
    updating the target residency and exit latency numbers for those
    chips (Sebastien Rannou).
 
  - Prevent the cpuidle mvebu driver from calling cpu_pm_enter() twice
    in a row before cpu_pm_exit() is called on the same CPU which
    breaks the core's assumptions regarding the usage of those
    functions (Gregory Clement).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJVDLwIAAoJEILEb/54YlRxSrMQAKn/DwfNMqVWP5vf/YWauSfL
 S51+E/emGvy+3fwBsa46KkddRQ0ysxc1wKIHcWXc1UPtA+lKNS3MoCUdD+isUFt7
 PUrMsblgjh/e6LiXOBqElAiuugVoH7JCVMKvlv5Tsn3qxY3AJEoxGwV7p4XyP6lJ
 PumqAvWFtaIFKThJFdKPGC511tYTQWoZ/3u843aEsHtpvmiytgUrvxpuCXlSSKT1
 vbOdHAJXi0QyQYWIZ0VNN+MZ2WvaU9t1QCpBJUnzZMi2kuG3HP9rzY40GOnoMn6/
 jXaxegeT7UX5JY5NWU9VrrVwKzppIpyKW6yckIRcKD+ovwKdGbMrfMco2iyK1xgV
 Q6B5h5guYTTynjBoi9XO3d7AWN3gM+8OYCPJgcRG2BMQEunlS0D+i3cRDqeHzW0M
 W+OaENK9MnxG9KVEq0PIrWomGZL1SlOtHfHm9xu8hpqGx4h1iTSgiAEFQQ+Zmmzh
 +g1OLgddHkWjkPoZ/Y8d1NpdnTf+kbkm8Wqm9Uyie1/HnUJMnHYNbzZTyF4ZjlV2
 MAl2P0zBqWhLEDb4STHWHdnBZVhvGCpg1J2pFaSRjDEn+EP0YBH+LscWi//xONNr
 5acBoVzid92co+JwrYn3/MYHctV8bBLdXqeGUiuKD6tk9u+aLke24RTBwm5frPBE
 SjHr1sLhmzubzXtzQIrp
 =4Oz9
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
 "These are fixes for recent regressions (PCI/ACPI resources and at91
  RTC locking), a stable-candidate powercap RAPL driver fix and two ARM
  cpuidle fixes (one stable-candidate too).

  Specifics:

   - Revert a recent PCI commit related to IRQ resources management that
     introduced a regression for drivers attempting to bind to devices
     whose previous drivers did not balance pci_enable_device() and
     pci_disable_device() as expected (Rafael J Wysocki).

   - Fix a deadlock in at91_rtc_interrupt() introduced by a typo in a
     recent commit related to wakeup interrupt handling (Dan Carpenter).

   - Allow the power capping RAPL (Running-Average Power Limit) driver
     to use different energy units for domains within one CPU package
     which is necessary to handle Intel Haswell EP processors correctly
     (Jacob Pan).

   - Improve the cpuidle mvebu driver's handling of Armada XP SoCs by
     updating the target residency and exit latency numbers for those
     chips (Sebastien Rannou).

   - Prevent the cpuidle mvebu driver from calling cpu_pm_enter() twice
     in a row before cpu_pm_exit() is called on the same CPU which
     breaks the core's assumptions regarding the usage of those
     functions (Gregory Clement)"

* tag 'pm+acpi-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "x86/PCI: Refine the way to release PCI IRQ resources"
  rtc: at91rm9200: double locking bug in at91_rtc_interrupt()
  powercap / RAPL: handle domains with different energy units
  cpuidle: mvebu: Update cpuidle thresholds for Armada XP SOCs
  cpuidle: mvebu: Fix the CPU PM notifier usage
2015-03-21 12:51:36 -07:00
Linus Torvalds
97448d5b3d Merge git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "A bunch of fixes across drivers:

  radeon:
     disable two ended allocation for now, it breaks some stuff

  amdkfd:
     misc fixes

  nouveau:
     fix irq loop problem, add basic support for GM206 (new hw)

  i915:
     fix some WARNs people were seeing

  exynos:
     fix some iommu interactions causing boot failures"

* git://people.freedesktop.org/~airlied/linux:
  drm/radeon: drop ttm two ended allocation
  drm/exynos: fix the initialization order in FIMD
  drm/exynos: fix typo config name correctly.
  drm/exynos: Check for NULL dereference of crtc
  drm/exynos: IS_ERR() vs NULL bug
  drm/exynos: remove unused files
  drm/i915: Make sure the primary plane is enabled before reading out the fb state
  drm/nouveau/bios: fix i2c table parsing for dcb 4.1
  drm/nouveau/device/gm100: Basic GM206 bring up (as copy of GM204)
  drm/nouveau/device: post write to NV_PMC_BOOT_1 when flipping endian switch
  drm/nouveau/gr/gf100: fix some accidental or'ing of buffer addresses
  drm/nouveau/fifo/nv04: remove the loop from the interrupt handler
  drm/radeon: Changing number of compute pipe lines
  drm/amdkfd: Fix SDMA queue init. in non-HWS mode
  drm/amdkfd: destroy mqd when destroying kernel queue
  drm/i915: Ensure plane->state->fb stays in sync with plane->fb
2015-03-21 12:41:50 -07:00
Linus Torvalds
bb8ef2fbb8 More DeviceTree fixes for 4.0:
- Revert setting stdout-path as preferred console. This caused
 regressions in PowerMACs and other systems.
 
 - Yet another fix for stdout-path option parsing.
 
 - Fix error path handling in of_irq_parse_one
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVCt/mAAoJEMhvYp4jgsXiVKUIALYn1mFKm/RJiTZ2inLeIL/9
 Bl2U8y/kMT4NoNdKknIvdD8IngQuggl5WK7mpJo7LRZ3P9/t7msutzT8uwdn1868
 6sDH4bIzR3/5x+ZFJubcv8UQXs8dbWiipXDGsemKUtUEF7Q4JBd9LCCEx+22WAnk
 f1uThOB8Jd8c7oLPu1wKSG0RSJYF4s1rjfDrjLkcg3QWAzEcThI/5ww5oeuFtee8
 25xTXPgL4qTCRBsogse0339/pby8QUzFGdVHMGUEP/tjp82NfZ8+QxPOalLi103R
 kC6E9LTr85DgfxsqH5GVmA3pUHZfRb7OjLzoax5Dzf8kpYt8ta8sEw2+1uUadxY=
 =+ZiW
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-fixes-for-4.0-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull more DeviceTree fixes vfom Rob Herring:

 - revert setting stdout-path as preferred console.  This caused
   regressions in PowerMACs and other systems.

 - yet another fix for stdout-path option parsing.

 - fix error path handling in of_irq_parse_one

* tag 'devicetree-fixes-for-4.0-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  Revert "of: Fix premature bootconsole disable with 'stdout-path'"
  of: handle both '/' and ':' in path strings
  of: unittest: Add option string test case with longer path
  of/irq: Fix of_irq_parse_one() returned error codes
2015-03-21 12:33:01 -07:00
Linus Torvalds
e477f3e013 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Here are current target-pending fixes for v4.0-rc5 code that have made
  their way into the queue over the last weeks.

  The fixes this round include:

   - Fix long-standing iser-target logout bug related to early
     conn_logout_comp completion, resulting in iscsi_conn use-after-tree
     OOpsen.  (Sagi + nab)

   - Fix long-standing tcm_fc bug in ft_invl_hw_context() failure
     handing for DDP hw offload.  (DanC)

   - Fix incorrect use of unprotected __transport_register_session() in
     tcm_qla2xxx + other single local se_node_acl fabrics.  (Bart)

   - Fix reference leak in target_submit_cmd() -> target_get_sess_cmd()
     for ack_kref=1 failure path.  (Bart)

   - Fix pSCSI backend ->get_device_type() statistics OOPs with
     un-configured device.  (Olaf + nab)

   - Fix virtual LUN=0 target_configure_device failure OOPs at modprobe
     time.  (Claudio + nab)

   - Fix FUA write false positive failure regression in v4.0-rc1 code.
     (Christophe Vu-Brugier + HCH)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: do not reject FUA CDBs when write cache is enabled but emulate_write_cache is 0
  target: Fix virtual LUN=0 target_configure_device failure OOPs
  target/pscsi: Fix NULL pointer dereference in get_device_type
  tcm_fc: missing curly braces in ft_invl_hw_context()
  target: Fix reference leak in target_get_sess_cmd() error path
  loop/usb/vhost-scsi/xen-scsiback: Fix use of __transport_register_session
  tcm_qla2xxx: Fix incorrect use of __transport_register_session
  iscsi-target: Avoid early conn_logout_comp for iser connections
  Revert "iscsi-target: Avoid IN_LOGOUT failure case for iser-target"
  target: Disallow changing of WRITE cache/FUA attrs after export
2015-03-21 11:24:38 -07:00
Linus Torvalds
da6b9a2049 A handful of stable fixes for DM:
- fix thin target to always zero-fill reads to unprovisioned blocks
 - fix to interlock device destruction's suspend from internal suspends
 - fix 2 snapshot exception store handover bugs
 - fix dm-io to cope with DISCARD and WRITE_SAME capabilities changing
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVDIJ8AAoJEMUj8QotnQNaynEIAK4Q3mvOeI6PIwuC2iOWRglJ
 C0xzTKMR6VDQeM/uMRLeiiqxdPs6b89OdSmJHKJYOrdsusjG0a4IBAO9vDayb6sW
 AxOlbnpXdoiH3cqQ4dISJIfS9qM49qGM40CAflcLKYsGfLnstVLt+4l9HaWaRrHj
 b2kAC+I6PDDybFlKD5KsMB56sjzhhqg/lnnwkY2omV4vLHf6RrhVhK5gcEPF7VN0
 SJ5HKSNBHQnpYSEECHXkxGvZ/ZKTe9n7q0t6Q3nnTSUFTXS6esgTsK1q2AykADDR
 3W1LkrAAFP31AWKPkUwCEe3C8Rk3rCsrq2H14woWX1/UxSXNPlMYCU50ak4TVmc=
 =Svyk
 -----END PGP SIGNATURE-----

Merge tag 'dm-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull devicemapper fixes from Mike Snitzer:
 "A handful of stable fixes for DM:
   - fix thin target to always zero-fill reads to unprovisioned blocks
   - fix to interlock device destruction's suspend from internal
     suspends
   - fix 2 snapshot exception store handover bugs
   - fix dm-io to cope with DISCARD and WRITE_SAME capabilities changing"

* tag 'dm-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm io: deal with wandering queue limits when handling REQ_DISCARD and REQ_WRITE_SAME
  dm snapshot: suspend merging snapshot when doing exception handover
  dm snapshot: suspend origin when doing exception handover
  dm: hold suspend_lock while suspending device during device deletion
  dm thin: fix to consistently zero-fill reads to unprovisioned blocks
2015-03-21 11:15:13 -07:00
Linus Torvalds
521d474631 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "Most of these are fixing extent reservation accounting, or corners
  with tree writeback during commit.

  Josef's set does add a test, which isn't strictly a fix, but it'll
  keep us from making this same mistake again"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix outstanding_extents accounting in DIO
  Btrfs: add sanity test for outstanding_extents accounting
  Btrfs: just free dummy extent buffers
  Btrfs: account merges/splits properly
  Btrfs: prepare block group cache before writing
  Btrfs: fix ASSERT(list_empty(&cur_trans->dirty_bgs_list)
  Btrfs: account for the correct number of extents for delalloc reservations
  Btrfs: fix merge delalloc logic
  Btrfs: fix comp_oper to get right order
  Btrfs: catch transaction abortion after waiting for it
  btrfs: fix sizeof format specifier in btrfs_check_super_valid()
2015-03-21 10:53:37 -07:00
Linus Torvalds
0d122f7430 Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux
Pull nfsd bufix from Bruce Fields:
 "This is a fix for a crash easily triggered by 4.1 activity to a server
  built with CONFIG_NFSD_PNFS.

  There are some more bugfixes queued up that I intend to pass along
  next week, but this is the most critical"

* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
  Subject: nfsd: don't recursively call nfsd4_cb_layout_fail
2015-03-21 10:41:15 -07:00
Linus Torvalds
c6ef814509 This pull request fixes a bug introduced during the v4.0 merge window where we
forgot to put braces where they should be.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVC8rvAAoJECmIfjd9wqK0DbUP/23tBzSS4rm1CB5X/MBDhyhr
 JUryy+pdlNq0WCbcljxg769NII1xdY3Ka+5rtW0IfGE9qfmeR2xek3qdaNtjjYuh
 QnvZrvqdjEotx2dZkqDVZwuT33LgXliZlqCDIfNqTzMfatwP90/53wARyUPF9rJY
 u3/67w5ompSRr4Okl5BErdGy+AyrE2o3fiI10CWrX74xjQAmhhmX3UH4y+PMyvwN
 3QIQ8n5ECNRphMBP7clcFQGRGcVVflGuEn2boApRICJVaDJugJAf813ZrHuDl9dt
 Tsw+jy6x/bCaFXXXxh35KzpZoB7LTgZ/7kisoGH77HS+7seQERCPS49vatKxXESD
 ZM/BaX/BZ9wu3EWr7dNmlc/NMU1ykbcUr/BRjkeQ/hXSexO2GYryD/ysMBJVd3wn
 TAxSgp4zjkXlkcntgtLM6wwR+kSKffQy5xpvoUQSK4QvVOVrsXWizvfj652toYQs
 +m5CW0fF/CIjeqmxFpdC6NGITmTAmblUAPZJrYpTzrV1zOF0hO8b2EKdziP5qyDY
 kLlaU90MxSPMImSuEcRwku5KQMtF+b2OesW2QrPKPKkwT4j/IJeme5h4aKsWNGGP
 MxEOBkKctOARAus2ryLB9g99hTXmYbmcWKoc/4PYyjOXh1KvpQ6+HiHNeilZ0HBP
 Gkw0axRGpjdQTItnTyOr
 =SIBr
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.0-rc5' of git://git.infradead.org/linux-ubifs

Pull UBI fix from Artem Bityutskiy:
 "This fixes a bug introduced during the v4.0 merge window where we
  forgot to put braces where they should be"

* tag 'upstream-4.0-rc5' of git://git.infradead.org/linux-ubifs:
  UBI: fix missing brace control flow
2015-03-21 10:36:44 -07:00
Linus Torvalds
60ed380eb8 arm64 fixes:
- mm switching fix where the kernel pgd ends up in the user TTBR0 after
   returning from an EFI run-time services call
 - fix __GFP_ZERO handling for atomic pool and CMA DMA allocations (the
   generic code does get the gfp flags, so it's left with the arch code
   to memzero accordingly)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVDU0uAAoJEGvWsS0AyF7x5T8P/3PWvBUUdcZfJye7iteaBjTZ
 7/Sc3nedfkrIeNv1SjtJwSkMqhLucC993w655tOBbsQYt2e2uAiIX05/4yfl/K6H
 +4RjJqt3b9njmfl16KxvJVFIBTQXW3fZFdKvrMMCVk+Nqm1gn/RusOcVem5aKAaH
 qiEZ3T5k1Mh8Fx0DDjfn7jZHnzs+zdeYpvWneQBYMlTn3CPALBnIm606WIDTtgHD
 2QgCdaK/vwSiTkH4jNyuSLXR8EVdSDkQrEIAMhp6+iVciCBRzQSKbwUnY2MGonYY
 okMcXG4LaNe4o0j/nbrlLn1tUHkWEBHDDGORVfy0Ud1TMiFfUjFuH5Incak2oQUa
 ttvtBlBkLY5OacA2rEuHSgYd8VpVEVoDIcDDJ19YaG503VFSdyU4hNEchgsQLH17
 nM2yXE81wddl15TMJijBCOJubObZQSKqOFsicpRaBRXcgrqQIvhWxCO/EIyUoWej
 PxOOsA2qhhVuIjt67qBr5jzQqjd1jB9CD9oedLyq1QBUD2qI0UTm3glTvtH8uMLz
 FWIcuzr93zSf3b3NPTORlskYFmC2hwhlr10dFTSnBZonxKD/axZ9PUEtc7jO8Kqe
 sJoN6dvYfvreGIPH1Ig3ALbZrYTaU7jmht5zu41dcNPj80jSdKrp7lJtD7m147iQ
 Q6BYxNOJVJJ/BAZOO8pK
 =XSnR
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - mm switching fix where the kernel pgd ends up in the user TTBR0 after
   returning from an EFI run-time services call

 - fix __GFP_ZERO handling for atomic pool and CMA DMA allocations (the
   generic code does get the gfp flags, so it's left with the arch code
   to memzero accordingly)

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Honor __GFP_ZERO in dma allocations
  arm64: efi: don't restore TTBR0 if active_mm points at init_mm
2015-03-21 10:24:10 -07:00
Linus Torvalds
62a202d749 Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 "Another few ARM fixes.  Fabrice fixed the L2 cache DT parsing to allow
  prefetch configuration to be specified even when the cache size
  parsing fails.

  Laura noticed that the setting of page attributes wasn't working for
  modules due to is_module_addr() always returning false.

  Marc Gonzalez (aka Mason) noticed a potential latent bug with the way
  we read one of the CPUID registers (where we could attempt to read a
  non-present CPUID register which may fault.)

  I've fixed an issue where 32-bit DMA masks were failing with memory
  which extended to the top of physical address space, and I've also
  added debugging output of the page tables when we hit a data access
  exception which we don't specifically handle - prompted by the lack of
  information in a bug report"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8313/1: Use read_cpuid_ext() macro instead of inline asm
  ARM: 8311/1: Don't use is_module_addr in setting page attributes
  ARM: 8310/1: l2c: Fix prefetch settings dt parsing
  ARM: dump pgd, pmd and pte states on unhandled data abort faults
  ARM: dma-api: fix off-by-one error in __dma_supported()
2015-03-21 10:03:22 -07:00
Rafael J. Wysocki
9c86286a60 Merge branches 'pm-cpuidle', 'powercap', 'irq-pm' and 'acpi-resources'
* pm-cpuidle:
  cpuidle: mvebu: Update cpuidle thresholds for Armada XP SOCs
  cpuidle: mvebu: Fix the CPU PM notifier usage

* powercap:
  powercap / RAPL: handle domains with different energy units

* irq-pm:
  rtc: at91rm9200: double locking bug in at91_rtc_interrupt()

* acpi-resources:
  Revert "x86/PCI: Refine the way to release PCI IRQ resources"
2015-03-21 00:39:12 +01:00
NeilBrown
0c35bd4723 md: fix problems with freeing private data after ->run failure.
If ->run() fails, it can either free the data structures it
allocated, or leave that task to ->free() which will be called
on failures.

However:
  md.c calls ->free() even if ->private_data is NULL, which
     causes problems in some personalities.
  raid0.c frees the data, but doesn't clear ->private_data,
     which will become a problem when we fix md.c

So better fix both these issues at once.

Reported-by: Richard W.M. Jones <rjones@redhat.com>
Fixes: 5aa61f427e
URL: https://bugzilla.kernel.org/show_bug.cgi?id=94381
Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-21 09:40:36 +11:00
Suzuki K. Poulose
7132813c38 arm64: Honor __GFP_ZERO in dma allocations
Current implementation doesn't zero out the pages allocated.
Honor the __GFP_ZERO flag and zero out if set.

Cc: <stable@vger.kernel.org> # v3.14+
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-03-20 18:18:54 +00:00