Commit Graph

39703 Commits

Author SHA1 Message Date
Jan Schmidt
c3e0696523 Btrfs: always put insert_ptr modifications into the tree mod log
Several callers of insert_ptr set the tree_mod_log parameter to 0 to avoid
addition to the tree mod log. In fact, we need all of those operations. This
commit simply removes the additional parameter and makes addition to the
tree mod log unconditional.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:39 +02:00
Jan Schmidt
28da9fb446 Btrfs: fix tree mod log for root replacements at leaf level
For the tree mod log, we don't log any operations at leaf level. If the root
is at the leaf level (i.e. the tree consists only of the root), then
__tree_mod_log_oldest_root will find a ROOT_REPLACE operation in the log
(because we always log that one no matter which level), but no other
operations.

With this patch __tree_mod_log_oldest_root exits cleanly instead of
BUGging in this situation. get_old_root checks if its really a root at leaf
level in case we don't have any operations and WARNs if this assumption
breaks.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:38 +02:00
Jan Schmidt
9345457f4a Btrfs: support root level changes in __resolve_indirect_ref
With the tree mod log, we can have a tree that's two levels high, but
btrfs_search_old_slot may still return a path with the tree root at level
one instead. __resolve_indirect_ref must care for this and accept parents in
a lower level than expected.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:38 +02:00
Jan Schmidt
8ca78f3eda Btrfs: avoid waiting for delayed refs when we must not
We track two conditions to decide if we should sleep while waiting for more
delayed refs, the number of delayed refs (num_refs) and the first entry in
the list of blockers (first_seq).

When we suspect staleness, we save num_refs and do one more cycle. If
nothing changes, we then save first_seq for later comparison and do
wait_event. We ought to save first_seq the very same moment we're saving
num_refs. Otherwise we cannot be sure that nothing has changed and we might
start waiting when we shouldn't, which could lead to starvation.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:35 +02:00
Brian Norris
2d4cf5ae12 UBIFS: correct usage of IS_ENABLED()
Commit "818039c UBIFS: fix debugfs-less systems support" fixed one
regression but introduced a different regression - the debugfs is now always
compiled out. Root cause: IS_ENABLED() arguments should be used with the
CONFIG_* prefix.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-06-27 14:22:15 +03:00
Greg Kroah-Hartman
bcc66c0b88 Merge 3.5-rc4 into staging-next
This picks up the staging changes made in 3.5-rc4 so that everyone can sync up
properly.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-25 09:31:00 -07:00
Trond Myklebust
db3a3bcf08 NFSv2/v3: Remove incorrect dprintks from the readdir reply code
The actual size of the directory is unknown to the client, so it is
always requesting the maximum number it can handle. If the server
is replying with fewer entries than was requested, then that will
usually reflect the fact that we've hit the end of the directory.
Flagging it as an error is therefore incorrect.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-24 16:20:07 -04:00
Linus Torvalds
002b758b6d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
 "There are a couple of fixes from Yan for bad pointer dereferences in
  the messenger code and when fiddling with page->private after page
  migration, a fix from Alex for a use-after-free in the osd client
  code, and a couple fixes for the message refcounting and shutdown
  ordering."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: flush msgr queue during mon_client shutdown
  rbd: Clear ceph_msg->bio_iter for retransmitted message
  libceph: use con get/put ops from osd_client
  libceph: osd_client: don't drop reply reference too early
  ceph: check PG_Private flag before accessing page->private
2012-06-22 17:47:08 -07:00
Linus Torvalds
369c4f542f Merge tag 'for-linus-Jun-21-2012' of git://oss.sgi.com/xfs/xfs
Pull XFS fixes from Ben Myers:
 - Fix stale data exposure with unwritten extents
 - Fix a warning in xfs_alloc_vextent with ODEBUG
 - Fix overallocation and alignment of pages for xfs_bufs
 - Fix a cursor leak
 - Fix a log hang
 - Fix a crash related to xfs_sync_worker
 - Rename xfs log structure from struct log to struct xlog so we can use
   crash dumps effectively

* tag 'for-linus-Jun-21-2012' of git://oss.sgi.com/xfs/xfs:
  xfs: rename log structure to xlog
  xfs: shutdown xfs_sync_worker before the log
  xfs: Fix overallocation in xfs_buf_allocate_memory()
  xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near
  xfs: check for stale inode before acquiring iflock on push
  xfs: fix debug_object WARN at xfs_alloc_vextent()
  xfs: xfs_vm_writepage clear iomap_valid when !buffer_uptodate (REV2)
2012-06-22 11:07:55 -07:00
Linus Torvalds
636040b4ed Merge tag 'nfs-for-3.5-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 - Fix a write hang due to an uninitalised variable when
   !defined(CONFIG_NFS_V4)
 - Address upcall races in the legacy NFSv4 idmapper
 - Remove an O_DIRECT refcounting issue
 - Fix a pNFS refcounting bug when the file layout metadata server is
   also acting as a data server
 - Fix a pNFS module loading race.

* tag 'nfs-for-3.5-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Force the legacy idmapper to be single threaded
  NFS: Initialise commit_info.rpc_out when !defined(CONFIG_NFS_V4)
  NFS: Fix a refcounting issue in O_DIRECT
  NFSv4.1: Fix a race in set_pnfs_layoutdriver
  NFSv4.1: Fix umount when filelayout DS is also the MDS
2012-06-21 16:05:43 -07:00
Linus Torvalds
8874e812fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "This is a small pull with btrfs fixes.  The biggest of the bunch is
  another fix for the new backref walking code.

  We're still hammering out one btrfs dio vs buffered reads problem, but
  that one will have to wait for the next rc."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: delay iput with async extents
  Btrfs: add a missing spin_lock
  Btrfs: don't assume to be on the correct extent in add_all_parents
  Btrfs: introduce btrfs_next_old_item
2012-06-21 13:41:07 -07:00
Mark Tinguely
9a8d2fdbb4 xfs: remove xlog_t typedef
Remove the xlog_t type definitions.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 14:22:27 -05:00
Mark Tinguely
f7bdf03a99 xfs: rename log structure to xlog
Rename the XFS log structure to xlog to help crash distinquish it from the
other logs in Linux.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 14:21:11 -05:00
Ben Myers
8866fc6fa5 xfs: shutdown xfs_sync_worker before the log
Revert commit 1307bbd, which uses the s_umount semaphore to provide
exclusion between xfs_sync_worker and unmount, in favor of shutting down
the sync worker before freeing the log in xfs_log_unmount.  This is a
cleaner way of resolving the race between xfs_sync_worker and unmount
than using s_umount.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2012-06-21 14:20:48 -05:00
Jan Kara
59c84ed0dd xfs: Fix overallocation in xfs_buf_allocate_memory()
Commit de1cbee which removed b_file_offset in favor of b_bn introduced a bug
causing xfs_buf_allocate_memory() to overestimate the number of necessary
pages. The problem is that xfs_buf_alloc() sets b_bn to -1 and thus effectively
every buffer is straddling a page boundary which causes
xfs_buf_allocate_memory() to allocate two pages and use vmalloc() for access
which is unnecessary.

Dave says xfs_buf_alloc() doesn't need to set b_bn to -1 anymore since the
buffer is inserted into the cache only after being fully initialized now.
So just make xfs_buf_alloc() fill in proper block number from the beginning.

CC: David Chinner <dchinner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 14:20:36 -05:00
Dave Chinner
76d095388b xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near
When we fail to find an matching extent near the requested extent
specification during a left-right distance search in
xfs_alloc_ag_vextent_near, we fail to free the original cursor that
we used to look up the XFS_BTNUM_CNT tree and hence leak it.

Reported-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 14:20:20 -05:00
Brian Foster
9a3a5dab63 xfs: check for stale inode before acquiring iflock on push
An inode in the AIL can be flush locked and marked stale if
a cluster free transaction occurs at the right time. The
inode item is then marked as flushing, which causes xfsaild
to spin and leaves the filesystem stalled. This is
reproduced by running xfstests 273 in a loop for an
extended period of time.

Check for stale inodes before the flush lock. This marks
the inode as pinned, leads to a log flush and allows the
filesystem to proceed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 14:20:06 -05:00
Mark Tinguely
ad223e6030 xfs: rename log structure to xlog
Rename the XFS log structure to xlog to help crash distinquish it from the
other logs in Linux.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 13:49:39 -05:00
Ben Myers
11159a0500 xfs: shutdown xfs_sync_worker before the log
Revert commit 1307bbd, which uses the s_umount semaphore to provide
exclusion between xfs_sync_worker and unmount, in favor of shutting down
the sync worker before freeing the log in xfs_log_unmount.  This is a
cleaner way of resolving the race between xfs_sync_worker and unmount
than using s_umount.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2012-06-21 13:41:04 -05:00
Jan Kara
bcf62ab64d xfs: Fix overallocation in xfs_buf_allocate_memory()
Commit de1cbee which removed b_file_offset in favor of b_bn introduced a bug
causing xfs_buf_allocate_memory() to overestimate the number of necessary
pages. The problem is that xfs_buf_alloc() sets b_bn to -1 and thus effectively
every buffer is straddling a page boundary which causes
xfs_buf_allocate_memory() to allocate two pages and use vmalloc() for access
which is unnecessary.

Dave says xfs_buf_alloc() doesn't need to set b_bn to -1 anymore since the
buffer is inserted into the cache only after being fully initialized now.
So just make xfs_buf_alloc() fill in proper block number from the beginning.

CC: David Chinner <dchinner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 13:38:25 -05:00
Dave Chinner
079da28c64 xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near
When we fail to find an matching extent near the requested extent
specification during a left-right distance search in
xfs_alloc_ag_vextent_near, we fail to free the original cursor that
we used to look up the XFS_BTNUM_CNT tree and hence leak it.

Reported-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 12:32:43 -05:00
Brian Foster
76e8f13866 xfs: check for stale inode before acquiring iflock on push
An inode in the AIL can be flush locked and marked stale if
a cluster free transaction occurs at the right time. The
inode item is then marked as flushing, which causes xfsaild
to spin and leaves the filesystem stalled. This is
reproduced by running xfstests 273 in a loop for an
extended period of time.

Check for stale inodes before the flush lock. This marks
the inode as pinned, leads to a log flush and allows the
filesystem to proceed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 10:53:43 -05:00
Josef Bacik
cb77fcd885 Btrfs: delay iput with async extents
There is some concern that these iput()'s could be the final iputs and could
induce lockups on people waiting on writeback.  This would happen in the
rare case that we don't create ordered extents because of an error, but it
is theoretically possible and we already have a mechanism to deal with this
so just make them delayed iputs to negate any worry.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-21 07:19:36 -04:00
Josef Bacik
e18fca7342 Btrfs: add a missing spin_lock
When fixing up the locking in the delayed ref destruction work I accidently
broke the locking myself ;(.  Add back a spin_lock that should be there and
we are now all set.  Thanks,
Btrfs: add a missing spin_lock

When fixing up the locking in the delayed ref destruction work I accidently
broke the locking myself ;(.  Add back a spin_lock that should be there and
we are now all set.  Thanks,

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-21 07:19:35 -04:00
Alexander Block
69bca40d41 Btrfs: don't assume to be on the correct extent in add_all_parents
add_all_parents did assume that path is already at a correct extent data
item, which may not be true in case of data extents that were partly
rewritten and splitted.

We need to check if we're on a matching extent for every item and only
for the ones after the first. The loop is changed to do this now.

This patch also fixes a bug introduced with commit 3b127fd8 "Btrfs:
remove obsolete btrfs_next_leaf call from __resolve_indirect_ref".
The removal of next_leaf did sometimes result in slot==nritems when
the above described case happens, and thus resulting in invalid values
(e.g. wanted_obejctid) in add_all_parents (leading to missed backrefs
or even crashes).

Signed-off-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-21 07:19:34 -04:00
Alexander Block
1c8f52a5e9 Btrfs: introduce btrfs_next_old_item
We introduce btrfs_next_old_item that uses btrfs_next_old_leaf instead
of btrfs_next_leaf.

btrfs_next_item is also changed to simply call btrfs_next_old_item with
time_seq being 0.

Signed-off-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-21 07:19:34 -04:00
Anton Vorontsov
1e6a9e5625 pstore/ram_core: Better ECC size checking
- Instead of exploiting unsigned overflows (which doesn't work for all
  sizes), use straightforward checking for ECC total size not exceeding
  initial buffer size;

- Printing overflowed buffer_size is not informative. Instead, print
  ecc_size and buffer_size;

- No need for buffer_size argument in persistent_ram_init_ecc(),
  we can address prz->buffer_size directly.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-20 16:15:22 -07:00
Anton Vorontsov
beeb94321a pstore/ram_core: Proper checking for post_init errors (e.g. improper ECC size)
We will implement variable-sized ECC buffers soon, so post_init routine
might fail much more likely, so we'd better check for its errors.

To make error handling simple, modify persistent_ram_free() to it be safe
at all times.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-20 16:15:22 -07:00
Anton Vorontsov
90b58d9690 pstore/ram: Fix error handling during przs allocation
persistent_ram_new() returns ERR_PTR() value on errors, so during
freeing of the przs we should check for both NULL and IS_ERR() entries,
otherwise bad things will happen.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-20 16:15:22 -07:00
Anton Vorontsov
924d37118f pstore/ram: Probe as early as possible
Registering the platform driver before module_init allows us to log oopses
that happen during device probing.

This requires changing module_init to postcore_initcall, and switching
from platform_driver_probe to platform_driver_register because the
platform device is not registered when the platform driver is registered;
and because we use driver_register, now can't use create_bundle() (since
it will try to register the same driver once again), so we have to switch
to platform_device_register_data().

Also, some __init -> __devinit changes were needed.

Overall, the registration logic is now much clearer, since we have only
one driver registration point, and just an optional dummy device, which
is created from the module parameters.

Suggested-by: Colin Cross <ccross@android.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-20 16:15:22 -07:00
Linus Torvalds
bc259adc9b Merge tag 'staging-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging tree fixes from Greg Kroah-Hartman:
 "Here are a number of small fixes for the drivers/staging tree, as well
  as iio and pstore drivers (which came from the staging tree in the
  3.5-rc1 merge).  All of these are tiny, but resolve issues that people
  have been reporting.

  There's also a documentation update to reflect what the iio drivers
  really are doing, which is good to get straightened out.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'staging-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: r8712u: Add new USB IDs
  staging: gdm72xx: Release netlink socket properly
  iio: drop wrong reference from Kconfig
  pstore/inode: Make pstore_fill_super() static
  pstore/ram: Should zap persistent zone on unlink
  pstore/ram_core: Factor persistent_ram_zap() out of post_init()
  pstore/ram_core: Do not reset restored zone's position and size
  pstore/ram: Should update old dmesg buffer before reading
  staging:iio:ad7298: Fix linker error due to missing IIO kfifo buffer
  Revert "staging: usbip: bugfix for stack corruption on 64-bit architectures"
  staging: usbip: bugfix for stack corruption on 64-bit architectures
  staging/comedi: fix build for USB not enabled
  staging: omapdrm: fix crash when freeing bad fb
  staging:iio:ad7606: Re-add missing scale attribute
  iio: Fix potential use after free
  staging:iio: remove num_interrupt_lines from documentation
  iio: documentation: Add out_altvoltage and friends
2012-06-20 15:15:03 -07:00
Linus Torvalds
fe80352460 Merge tag 'driver-core-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core and printk fixes from Greg Kroah-Hartman:
 "Here are some fixes for 3.5-rc4 that resolve the kmsg problems that
  people have reported showing up after the printk and kmsg changes went
  into 3.5-rc1.  There are also a smattering of other tiny fixes for the
  extcon and hyper-v drivers that people have reported.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'driver-core-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  extcon: max8997: Add missing kfree for info->edev in max8997_muic_remove()
  extcon: Set platform drvdata in gpio_extcon_probe() and fix irq leak
  extcon: Fix wrong index in max8997_extcon_cable[]
  kmsg - kmsg_dump() fix CONFIG_PRINTK=n compilation
  printk: return -EINVAL if the message len is bigger than the buf size
  printk: use mutex lock to stop syslog_seq from going wild
  kmsg - kmsg_dump() use iterator to receive log buffer content
  vme: change maintainer e-mail address
  Extcon: Don't try to create duplicate link names
  driver core: fixup reversed deferred probe order
  printk: Fix alignment of buf causing crash on ARM EABI
  Tools: hv: verify origin of netlink connector message
2012-06-20 15:14:28 -07:00
Linus Torvalds
a2a2609c97 Merge branch 'akpm' (Andrew's patch-bomb)
* emailed from Andrew Morton <akpm@linux-foundation.org>: (21 patches)
  mm/memblock: fix overlapping allocation when doubling reserved array
  c/r: prctl: Move PR_GET_TID_ADDRESS to a proper place
  pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper
  pidns: guarantee that the pidns init will be the last pidns process reaped
  fault-inject: avoid call to random32() if fault injection is disabled
  Viresh has moved
  get_maintainer: Fix --help warning
  mm/memory.c: fix kernel-doc warnings
  mm: fix kernel-doc warnings
  mm: correctly synchronize rss-counters at exit/exec
  mm, thp: print useful information when mmap_sem is unlocked in zap_pmd_range
  h8300: use the declarations provided by <asm/sections.h>
  h8300: fix use of extinct _sbss and _ebss
  xtensa: use the declarations provided by <asm/sections.h>
  xtensa: use "test -e" instead of bashism "test -a"
  xtensa: replace xtensa-specific _f{data,text} by _s{data,text}
  memcg: fix use_hierarchy css_is_ancestor oops regression
  mm, oom: fix and cleanup oom score calculations
  nilfs2: ensure proper cache clearing for gc-inodes
  thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
  ...
2012-06-20 14:41:57 -07:00
Konstantin Khlebnikov
4fe7efdbdf mm: correctly synchronize rss-counters at exit/exec
do_exit() and exec_mmap() call sync_mm_rss() before mm_release() does
put_user(clear_child_tid) which can update task->rss_stat and thus make
mm->rss_stat inconsistent.  This triggers the "BUG:" printk in check_mm().

Let's fix this bug in the safest way, and optimize/cleanup this later.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-20 14:39:36 -07:00
Ryusuke Konishi
fbb24a3a91 nilfs2: ensure proper cache clearing for gc-inodes
A gc-inode is a pseudo inode used to buffer the blocks to be moved by
garbage collection.

Block caches of gc-inodes must be cleared every time a garbage collection
function (nilfs_clean_segments) completes.  Otherwise, stale blocks
buffered in the caches may be wrongly reused in successive calls of the GC
function.

For user files, this is not a problem because their gc-inodes are
distinguished by a checkpoint number as well as an inode number.  They
never buffer different blocks if either an inode number, a checkpoint
number, or a block offset differs.

However, gc-inodes of sufile, cpfile and DAT file can store different data
for the same block offset.  Thus, the nilfs_clean_segments function can
move incorrect block for these meta-data files if an old block is cached.
I found this is really causing meta-data corruption in nilfs.

This fixes the issue by ensuring cache clear of gc-inodes and resolves
reported GC problems including checkpoint file corruption, b-tree
corruption, and the following warning during GC.

  nilfs_palloc_freev: entry number 307234 already freed.
  ...

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>	[2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-20 14:39:35 -07:00
Jeff Liu
3b876c8f2a xfs: fix debug_object WARN at xfs_alloc_vextent()
Fengguang reports:

[  780.529603] XFS (vdd): Ending clean mount
[  781.454590] ODEBUG: object is on stack, but not annotated
[  781.455433] ------------[ cut here ]------------
[  781.455433] WARNING: at /c/kernel-tests/sound/lib/debugobjects.c:301 __debug_object_init+0x173/0x1f1()
[  781.455433] Hardware name: Bochs
[  781.455433] Modules linked in:
[  781.455433] Pid: 26910, comm: kworker/0:2 Not tainted 3.4.0+ #51
[  781.455433] Call Trace:
[  781.455433]  [<ffffffff8106bc84>] warn_slowpath_common+0x83/0x9b
[  781.455433]  [<ffffffff8106bcb6>] warn_slowpath_null+0x1a/0x1c
[  781.455433]  [<ffffffff814919a5>] __debug_object_init+0x173/0x1f1
[  781.455433]  [<ffffffff81491c65>] debug_object_init+0x14/0x16
[  781.455433]  [<ffffffff8108842a>] __init_work+0x20/0x22
[  781.455433]  [<ffffffff8134ea56>] xfs_alloc_vextent+0x6c/0xd5

Use INIT_WORK_ONSTACK in xfs_alloc_vextent instead of INIT_WORK.

Reported-by: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-20 14:58:24 -05:00
Alain Renaud
66f9311381 xfs: xfs_vm_writepage clear iomap_valid when !buffer_uptodate (REV2)
On filesytems with a block size smaller than PAGE_SIZE we currently have
a problem with unwritten extents.  If a we have multi-block page for
which an unwritten extent has been allocated, and only some of the
buffers have been written to, and they are not contiguous, we can expose
stale data from disk in the blocks between the writes after extent
conversion.

Example of a page with unwritten and real data.
buffer  content
0       empty  b_state = 0
1       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
2       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
3       empty  b_state = 0
4       empty  b_state = 0
5       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
6       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
7       empty  b_state = 0

Buffers 1, 2, 5, and 6 have been written to, leaving 0, 3, 4, and 7
empty.  Currently buffers 1, 2, 5, and 6 are added to a single ioend,
and when IO has completed, extent conversion creates a real extent from
block 1 through block 6, leaving 0 and 7 unwritten.  However buffers 3
and 4 were not written to disk, so stale data is exposed from those
blocks on a subsequent read.

Fix this by setting iomap_valid = 0 when we find a buffer that is not
Uptodate.  This ensures that buffers 5 and 6 are not added to the same
ioend as buffers 1 and 2.  Later these blocks will be converted into two
separate real extents, leaving the blocks in between unwritten.

Signed-off-by: Alain Renaud <arenaud@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-20 14:57:28 -05:00
Bryan Schumaker
b1027439df NFS: Force the legacy idmapper to be single threaded
It was initially coded under the assumption that there would only be one
request at a time, so use a lock to enforce this requirement..

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: stable@vger.kernel.org [3.4+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-20 14:38:11 -04:00
J. Bruce Fields
4af825041b nfsd4: process_open2 cleanup
Note we can simplify the error handling a little by doing the truncate
earlier.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-20 08:59:43 -04:00
J. Bruce Fields
e1aaa8916f nfsd4: nfsd4_lock() cleanup
Share a little common logic.  And note the comments here are a little
out of date (e.g. we don't always create new state in the "new" case any
more.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-20 08:59:42 -04:00
J. Bruce Fields
9068bed1a3 nfsd4: remove unnecessary comment
For the most part readers of cl_cb_state only need a value that is
"eventually" right.  And the value is set only either 1) in response to
some change of state, in which case it's set to UNKNOWN and then a
callback rpc is sent to probe the real state, or b) in the handling of a
response to such a callback.  UNKNOWN is therefore always a "temporary"
state, and for the other states we're happy to accept last writer wins.

So I think we're OK here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-20 08:59:41 -04:00
Chuck Lever
7df302f75e NFSD: TEST_STATEID should not return NFS4ERR_STALE_STATEID
According to RFC 5661, the TEST_STATEID operation is not allowed to
return NFS4ERR_STALE_STATEID.  In addition, RFC 5661 says:

15.1.16.5.  NFS4ERR_STALE_STATEID (Error Code 10023)

   A stateid generated by an earlier server instance was used.  This
   error is moot in NFSv4.1 because all operations that take a stateid
   MUST be preceded by the SEQUENCE operation, and the earlier server
   instance is detected by the session infrastructure that supports
   SEQUENCE.

I triggered NFS4ERR_STALE_STATEID while testing the Linux client's
NOGRACE recovery.  Bruce suggested an additional test that could be
useful to client developers.

Lastly, RFC 5661, section 18.48.3 has this:

 o  Special stateids are always considered invalid (they result in the
    error code NFS4ERR_BAD_STATEID).

An explicit check is made for those state IDs to avoid printk noise.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-20 08:59:40 -04:00
Weston Andros Adamson
2411967305 nfsd: probe the back channel on new connections
Initiate a CB probe when a new connection with the correct direction is added
to a session (IFF backchannel is marked as down).  Without this a
BIND_CONN_TO_SESSION has no effect on the internal backchannel state, which
causes the server to reply to every SEQUENCE op with the
SEQ4_STATUS_CB_PATH_DOWN flag set until DESTROY_SESSION.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-20 08:59:39 -04:00
Yan, Zheng
61600ef848 ceph: check PG_Private flag before accessing page->private
I got lots of NULL pointer dereference Oops when compiling kernel on ceph.
The bug is because the kernel page migration routine replaces some pages
in the page cache with new pages, these new pages' private can be non-zero.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 28c0254ede)
2012-06-20 07:43:48 -05:00
Trond Myklebust
1a0de48ae5 NFS: Initialise commit_info.rpc_out when !defined(CONFIG_NFS_V4)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-06-19 18:42:28 -04:00
Trond Myklebust
5a695da263 NFS: Fix a refcounting issue in O_DIRECT
In nfs_direct_write_reschedule(), the requests from nfs_scan_commit_list
have a refcount of 2, whereas the operations in
nfs_direct_write_completion_ops expect them to have a refcount of 1.

This patch adds a call to release the extra references.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-06-19 18:42:14 -04:00
Trond Myklebust
0a9c63fae7 NFSv4.1: Fix a race in set_pnfs_layoutdriver
The call to try_module_get() dereferences ld_type outside the
spin locks, which means that it may be pointing to garbage if
a module unload was in progress.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-19 13:32:45 -04:00
Trond Myklebust
2a4c8994ee NFSv4.1: Fix umount when filelayout DS is also the MDS
Currently there is a 'chicken and egg' issue when the DS is also the mounted
MDS. The nfs_match_client() reference from nfs4_set_ds_client bumps the
cl_count, the nfs_client is not freed at umount, and nfs4_deviceid_purge_client
is not called to dereference the MDS usage of a deviceid which holds a
reference to the DS nfs_client.  The result is the umount program returns,
but the nfs_client is not freed, and the cl_session hearbeat continues.

The MDS (and all other nfs mounts) lose their last nfs_client reference in
nfs_free_server when the last nfs_server (fsid) is umounted.
The file layout DS lose their last nfs_client reference in destroy_ds
when the last deviceid referencing the data server is put and destroy_ds is
called. This is triggered by a call to nfs4_deviceid_purge_client which
removes references to a pNFS deviceid used by an MDS mount.

The fix is to track how many pnfs enabled filesystems are mounted from
this server, and then to purge the device id cache once that count reaches
zero.

Reported-by: Jorge Mora <Jorge.Mora@netapp.com>
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-18 08:45:16 -04:00
Dan Carpenter
1cfb727107 UBIFS: fix assertion
The asserts here never check anything because it uses '|' instead of
'&'.  Now if the flags are not set it prints a warning a a stack trace.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-06-18 14:17:08 +03:00
Matthew Garrett
7dea9665fe hfsplus: fix bless ioctl when used with hardlinks
HFS+ doesn't really implement hard links - instead, hardlinks are indicated
by a magic file type which refers to an indirect node in a hidden
directory. The spec indicates that stat() should return the inode number
of the indirect node, but it turns out that this doesn't satisfy the
firmware when it's looking for a bootloader - it wants the catalog ID of
the hardlink file instead. Fix up this case.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-17 14:39:59 -07:00