Commit Graph

360767 Commits

Author SHA1 Message Date
Heiko Carstens
0e803bafbb compat: restore timerfd settime and gettime compat syscalls
Both compat syscalls got lost with 9d94b9e2 "switch timerfd compat syscalls
to COMPAT_SYSCALL_DEFINE" because of a typo:
COMPAT instead of CONFIG_COMPAT.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-03-02 09:35:13 -05:00
Al Viro
dfbb83d32c [regression] braino in "sparc: convert to ksignal"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-03-02 02:55:16 -05:00
Al Viro
dcf787f391 constify path_get/path_put and fs_struct.c stuff
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-03-01 23:51:07 -05:00
Al Viro
26567cdbbf fix nommu breakage in shmem.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-03-01 23:50:45 -05:00
Al Viro
dd37978c50 cache the value of file_inode() in struct file
Note that this thing does *not* contribute to inode refcount;
it's pinned down by dentry.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-03-01 19:48:30 -05:00
Heinz Mauelshagen
8735a81347 dm cache: add cleaner policy
A simple cache policy that writes back all data to the origin.

This is used to decommission a dm cache by emptying it.

Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:52 +00:00
Joe Thornber
f283635281 dm cache: add mq policy
A cache policy that uses a multiqueue ordered by recent hit
count to select which blocks should be promoted and demoted.
This is meant to be a general purpose policy.  It prioritises
reads over writes.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:51 +00:00
Joe Thornber
c6b4fcbad0 dm: add cache target
Add a target that allows a fast device such as an SSD to be used as a
cache for a slower device such as a disk.

A plug-in architecture was chosen so that the decisions about which data
to migrate and when are delegated to interchangeable tunable policy
modules.  The first general purpose module we have developed, called
"mq" (multiqueue), follows in the next patch.  Other modules are
under development.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:51 +00:00
Joe Thornber
7a87edfee7 dm persistent data: add bitset
Add a persistent bitset as a wrapper around dm-array.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:51 +00:00
Joe Thornber
6513c29f44 dm persistent data: add transactional array
Add a transactional array.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:51 +00:00
Joe Thornber
025b96853f dm thin: remove cells from stack
This patch takes advantage of the new bio-prison interface where the
memory is now passed in rather than using a mempool in bio-prison.
This allows the map function to avoid performing potentially-blocking
allocations that could lead to deadlocks: We want to avoid the cell
allocation that is done in bio_detain.

(The potential for mempool deadlocks still remains in other functions
that use bio_detain.)

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:50 +00:00
Joe Thornber
6beca5eb6e dm bio prison: pass cell memory in
Change the dm_bio_prison interface so that instead of allocating memory
internally, dm_bio_detain is supplied with a pre-allocated cell each
time it is called.

This enables a subsequent patch to move the allocation of the struct
dm_bio_prison_cell outside the thin target's mapping function so it can
no longer block there.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:50 +00:00
Joe Thornber
4e7f1f9089 dm persistent data: add btree_walk
Add dm_btree_walk to iterate through the contents of a btree.
This will be used by the dm cache target.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:50 +00:00
Alasdair G Kergon
b0d8ed4d96 dm: add target num_write_bios fn
Add a num_write_bios function to struct target.

If an instance of a target sets this, it will be queried before the
target's mapping function is called on a write bio, and the response
controls the number of copies of the write bio that the target will
receive.

This provides a convenient way for a target to send the same data to
more than one device.  The new cache target uses this in writethrough
mode, to send the data both to the cache and the backing device.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:49 +00:00
Mikulas Patocka
df5d2e9089 dm kcopyd: introduce configurable throttling
This patch allows the administrator to reduce the rate at which kcopyd
issues I/O.

Each module that uses kcopyd acquires a throttle parameter that can be
set in /sys/module/*/parameters.

We maintain a history of kcopyd usage by each module in the variables
io_period and total_period in struct dm_kcopyd_throttle. The actual
kcopyd activity is calculated as a percentage of time equal to
"(100 * io_period / total_period)".  This is compared with the user-defined
throttle percentage threshold and if it is exceeded, we sleep.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:49 +00:00
Mikulas Patocka
a26062416e dm ioctl: allow message to return data
This patch introduces enhanced message support that allows the
device-mapper core to recognise messages that are common to all devices,
and for messages to return data to userspace.

Core messages are processed by the function "message_for_md".  If the
device mapper doesn't support the message, it is passed to the target
driver.

If the message returns data, the kernel sets the flag
DM_MESSAGE_OUT_FLAG.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:49 +00:00
Mikulas Patocka
02cde50b7e dm ioctl: optimize functions without variable params
Device-mapper ioctls receive and send data in a buffer supplied
by userspace.  The buffer has two parts.  The first part contains
a 'struct dm_ioctl' and has a fixed size.  The second part depends
on the ioctl and has a variable size.

This patch recognises the specific ioctls that do not use the variable
part of the buffer and skips allocating memory for it.

In particular, when a device is suspended and a resume ioctl is sent,
this now avoid memory allocation completely.

The variable "struct dm_ioctl tmp" is moved from the function
copy_params to its caller ctl_ioctl and renamed to param_kernel.
It is used directly when the ioctl function doesn't need any arguments.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:49 +00:00
Mikulas Patocka
e2914cc26b dm ioctl: introduce ioctl_flags
This patch introduces flags for each ioctl function.

So far, one flag is defined, IOCTL_FLAGS_NO_PARAMS.  It is set if the
function processing the ioctl doesn't take or produce any parameters in
the section of the data buffer that has a variable size.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:48 +00:00
Jun'ichi Nomura
5f01520415 dm: merge io_pool and tio_pool
This patch merges io_pool and tio_pool into io_pool and cleans up
related functions.

Though device-mapper used to have 2 pools of objects for each dm device,
the use of bioset frontbad for per-bio data has shrunk the number of
pools to 1 for both bio-based and request-based device types.
(See c0820cf5 "dm: introduce per_bio_data" and
 94818742 "dm: Use bioset's front_pad for dm_rq_clone_bio_info")

So dm no longer has to maintain 2 different pointers.

No functional changes.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:48 +00:00
Jun'ichi Nomura
23e5083b4d dm: remove unused _rq_bio_info_cache
Remove _rq_bio_info_cache, which is no longer used.
No functional changes.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:48 +00:00
Mike Christie
87eb5b21d9 dm: fix limits initialization when there are no data devices
dm_calculate_queue_limits will first reset the provided limits to
defaults using blk_set_stacking_limits; whereby defeating the purpose of
retaining the original live table's limits -- as was intended via commit
3ae7065616 ("dm: retain table limits when
swapping to new table with no devices").

Fix this improper limits initialization (in the no data devices case) by
avoiding the call to dm_calculate_queue_limits.

[patch header revised by Mike Snitzer]

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # v3.6+
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:48 +00:00
Mikulas Patocka
23cb21092e dm snapshot: add missing module aliases
Add module aliases so that autoloading works correctly if the user
tries to activate "snapshot-origin" or "snapshot-merge" targets.

Reference: https://bugzilla.redhat.com/889973

Reported-by: Chao Yang <chyang@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:47 +00:00
Mike Snitzer
018cede93c dm persistent data: set some btree fn parms const
Mark some constant parameters constant in some dm-btree functions.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:47 +00:00
Alasdair G Kergon
e4c938111f dm: refactor bio cloning
Refactor part of the bio splitting and cloning code to try to make it
easier to understand.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:47 +00:00
Alasdair G Kergon
14fe594d67 dm: rename bio cloning functions
Rename functions involved in splitting and cloning bios.

The sequence of functions is now:
  (1) __split_and_process* - entry point that selects the processing strategy
  (2) __send* - prepare the details for each bio needed and loop through them
  (3) __clone_and_map* - creates a clone and maps it

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:47 +00:00
Alasdair G Kergon
55a62eef8d dm: rename request variables to bios
Use 'bio' in the name of variables and functions that deal with
bios rather than 'request' to avoid confusion with the normal
block layer use of 'request'.

No functional changes.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:47 +00:00
Alasdair G Kergon
bd2a49b86d dm: clean up clone_bio
Remove the no-longer-used struct bio_set argument from clone_bio and split_bvec.
Use tio->ti in __map_bio() instead of passing in ti.
Factor out some code for setting up cloned bios.
Take target_request_nr as a parameter to alloc_tio().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:46 +00:00
Kees Cook
88ae4c5294 dm persistent data: remove CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:46 +00:00
Alasdair G Kergon
d57916a00f dm: remove CONFIG_EXPERIMENTAL
Remove EXPERIMENTAL from all existing device-mapper targets.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:46 +00:00
Mike Snitzer
58f77a2196 dm thin: use block_size_is_power_of_two
Use block_size_is_power_of_two() rather than checking
sectors_per_block_shift directly.  Also introduce local pool variable in
get_bio_block() to eliminate redundant tc->pool dereferences.

No functional change.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:45 +00:00
Mikulas Patocka
3daec3b447 dm bufio: use WRITE_FLUSH instead of REQ_FLUSH
Use WRITE_FLUSH instead of REQ_FLUSH for submitted requests to make it
consistent with the rest of the kernel. There is no functional change.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:45 +00:00
Wang Sheng-Hui
d2ce70a119 dm table: remove superfluous variable reset
If allocation fails, the local var *t is not used any more after kfree.
Don't need to reset it to NULL. Remove the unnecesary NULL set here.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:45 +00:00
Mike Snitzer
f13945d757 dm thin: support a non power of 2 discard_granularity
Support a non-power-of-2 discard granularity in dm-thin, now that the block
layer supports this(via 8dd2cb7e88 "block:
discard granularity might not be power of 2" and
59771079c1 "blk: avoid divide-by-zero with zero
discard granularity").

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:44 +00:00
Mikulas Patocka
fd7c092e71 dm: fix truncated status strings
Avoid returning a truncated table or status string instead of setting
the DM_BUFFER_FULL_FLAG when the last target of a table fills the
buffer.

When processing a table or status request, the function retrieve_status
calls ti->type->status. If ti->type->status returns non-zero,
retrieve_status assumes that the buffer overflowed and sets
DM_BUFFER_FULL_FLAG.

However, targets don't return non-zero values from their status method
on overflow. Most targets returns always zero.

If a buffer overflow happens in a target that is not the last in the
table, it gets noticed during the next iteration of the loop in
retrieve_status; but if a buffer overflow happens in the last target, it
goes unnoticed and erroneously truncated data is returned.

In the current code, the targets behave in the following way:
* dm-crypt returns -ENOMEM if there is not enough space to store the
  key, but it returns 0 on all other overflows.
* dm-thin returns errors from the status method if a disk error happened.
  This is incorrect because retrieve_status doesn't check the error
  code, it assumes that all non-zero values mean buffer overflow.
* all the other targets always return 0.

This patch changes the ti->type->status function to return void (because
most targets don't use the return code). Overflow is detected in
retrieve_status: if the status method fills up the remaining space
completely, it is assumed that buffer overflow happened.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:44 +00:00
Jun'ichi Nomura
16245bdc9d dm: do not replace bioset for request based dm
This patch fixes a regression introduced in v3.8, which causes oops
like this when dm-multipath is used:

general protection fault: 0000 [#1] SMP
RIP: 0010:[<ffffffff810fe754>]  [<ffffffff810fe754>] mempool_free+0x24/0xb0
Call Trace:
  <IRQ>
  [<ffffffff81187417>] bio_put+0x97/0xc0
  [<ffffffffa02247a5>] end_clone_bio+0x35/0x90 [dm_mod]
  [<ffffffff81185efd>] bio_endio+0x1d/0x30
  [<ffffffff811f03a3>] req_bio_endio.isra.51+0xa3/0xe0
  [<ffffffff811f2f68>] blk_update_request+0x118/0x520
  [<ffffffff811f3397>] blk_update_bidi_request+0x27/0xa0
  [<ffffffff811f343c>] blk_end_bidi_request+0x2c/0x80
  [<ffffffff811f34d0>] blk_end_request+0x10/0x20
  [<ffffffffa000b32b>] scsi_io_completion+0xfb/0x6c0 [scsi_mod]
  [<ffffffffa000107d>] scsi_finish_command+0xbd/0x120 [scsi_mod]
  [<ffffffffa000b12f>] scsi_softirq_done+0x13f/0x160 [scsi_mod]
  [<ffffffff811f9fd0>] blk_done_softirq+0x80/0xa0
  [<ffffffff81044551>] __do_softirq+0xf1/0x250
  [<ffffffff8142ee8c>] call_softirq+0x1c/0x30
  [<ffffffff8100420d>] do_softirq+0x8d/0xc0
  [<ffffffff81044885>] irq_exit+0xd5/0xe0
  [<ffffffff8142f3e3>] do_IRQ+0x63/0xe0
  [<ffffffff814257af>] common_interrupt+0x6f/0x6f
  <EOI>
  [<ffffffffa021737c>] srp_queuecommand+0x8c/0xcb0 [ib_srp]
  [<ffffffffa0002f18>] scsi_dispatch_cmd+0x148/0x310 [scsi_mod]
  [<ffffffffa000a38e>] scsi_request_fn+0x31e/0x520 [scsi_mod]
  [<ffffffff811f1e57>] __blk_run_queue+0x37/0x50
  [<ffffffff811f1f69>] blk_delay_work+0x29/0x40
  [<ffffffff81059003>] process_one_work+0x1c3/0x5c0
  [<ffffffff8105b22e>] worker_thread+0x15e/0x440
  [<ffffffff8106164b>] kthread+0xdb/0xe0
  [<ffffffff8142db9c>] ret_from_fork+0x7c/0xb0

The regression was introduced by the change
c0820cf5 "dm: introduce per_bio_data", where dm started to replace
bioset during table replacement.
For bio-based dm, it is good because clone bios do not exist during the
table replacement.
For request-based dm, however, (not-yet-mapped) clone bios may stay in
request queue and survive during the table replacement.
So freeing the old bioset could cause the oops in bio_put().

Since the size of front_pad may change only with bio-based dm,
it is not necessary to replace bioset for request-based dm.

Reported-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:44 +00:00
Randy Dunlap
8eae508b7c hsi: fix kernel-doc warnings
Fix kernel-doc warnings in hsi files:

  Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'e_handler' description in 'hsi_client'
  Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'pclaimed' description in 'hsi_client'
  Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'nb' description in 'hsi_client'
  Warning(drivers/hsi/hsi.c:434): No description found for parameter 'handler'
  Warning(drivers/hsi/hsi.c:434): Excess function parameter 'cb' description in 'hsi_register_port_event'

Don't document "private:" fields with kernel-doc notation.
If you want to leave them fully documented, that's OK, but
then don't mark them as "private:".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Carlos Chinea <carlos.chinea@nokia.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-omap@vger.kernel.org
Acked-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-01 13:39:00 -08:00
Russell King
ded3ef0fa7 ARM: Fix broken commit 0cc41e4a21 corrupting kernel messages
Commit 0cc41e4a21 (arch: remove direct definitions of KERN_<LEVEL>
uses) is broken - not enough thought was put into changing:

	.asciz	"string"

to

	.asciz	"string1" "string2"

The problem is that each string gets _separately_ NUL terminated, so
the result is a string containing:

	"string1\0string2\0"

rather than:

	"string1string2\0"

With our new printk levels, this ends up as - eg, KERN_DEBUG "string":

	0x01 0x00 0x07 0x00 "string" 0x00

which produces lots of \x01 in the kernel log.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-03-01 21:09:59 +00:00
Linus Torvalds
cc73dc04c7 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "Four cifs fixes (including for kernel bug #53221 and samba bug #9519)"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: bugfix for unreclaimed writeback pages in cifs_writev_requeue()
  cifs: set MAY_SIGN when sec=krb5
  POSIX extensions disabled on client due to illegal O_EXCL flag sent to Samba
  cifs: ensure that cifs_get_root() only traverses directories
2013-03-01 12:05:13 -08:00
Tim Gardner
5140a8ceaa autofs4 - autofs4_catatonic_mode(): remove redundant null check on kfree()
smatch analysis:

  fs/autofs4/waitq.c:46 autofs4_catatonic_mode() info: redundant null check on wq->name.name calling kfree()

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: autofs@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-01 12:04:39 -08:00
Peter Huewe
9d8072e7c3 autofs - Fix sparse warning: context imbalance in autofs4_d_automount() different lock contexts for basic block
Sparse complains:

  fs/autofs4/root.c:409:9: sparse: context imbalance in 'autofs4_d_automount' - different lock contexts for basic block

This was introduced by commit f55fb0c243 ("autofs4 - dont clear
DCACHE_NEED_AUTOMOUNT on rootless mount")

The function autofs4_d_automount can be left with the (&sbi->fs_lock)
held if sbi->version <= 4 and simple_empty(dentry) == false so the
warning seems valid.

--> Add an spin_unlock in this case before we jump to done

Unfortunately compile tested only.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-01 12:04:39 -08:00
Paul Gortmaker
180e001cd5 btrfs: fixup/remove module.h usage as required
We want to avoid module.h where posible, since it in turn includes
nearly all of header space.  This means removing it where it is not
required, and using export.h where we are only exporting symbols via
EXPORT_SYMBOL and friends.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-01 15:01:01 -05:00
Josef Bacik
124fe663f9 Btrfs: delete inline extents when we find them during logging
Apparently when we do inline extents we allow the data to overlap the last chunk
of the btrfs_file_extent_item, which means that we can possibly have a
btrfs_file_extent_item that isn't actually as large as a btrfs_file_extent_item.
This messes with us when we try to overwrite the extent when logging new extents
since we expect for it to be the right size.  To fix this just delete the item
and try to do the insert again which will give us the proper sized
btrfs_file_extent_item.  This fixes a panic where map_private_extent_buffer
would blow up because we're trying to write past the end of the leaf.  Thanks,

Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01 11:47:21 -05:00
Steven Noonan
45e27161c6 xenbus: fix compile failure on ARM with Xen enabled
Adding an include of linux/mm.h resolves this:
	drivers/xen/xenbus/xenbus_client.c: In function ‘xenbus_map_ring_valloc_hvm’:
	drivers/xen/xenbus/xenbus_client.c:532:66: error: implicit declaration of function ‘page_to_section’ [-Werror=implicit-function-declaration]

CC: stable@vger.kernel.org
Signed-off-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-03-01 10:55:00 -05:00
Konrad Rzeszutek Wilk
884ac2978a xen/pci: We don't do multiple MSI's.
There is no hypercall to setup multiple MSI per PCI device.
As such with these two new commits:
-  08261d87f7
   PCI/MSI: Enable multiple MSIs with pci_enable_msi_block_auto()
- 5ca72c4f7c
   AHCI: Support multiple MSIs

we would call the PHYSDEVOP_map_pirq 'nvec' times with the same
contents of the PCI device. Sander discovered that we would get
the same PIRQ value 'nvec' times and return said values to the
caller. That of course meant that the device was configured only
with one MSI and AHCI would fail with:

ahci 0000:00:11.0: version 3.0
xen: registering gsi 19 triggering 0 polarity 1
xen: --> pirq=19 -> irq=19 (gsi=19)
(XEN) [2013-02-27 19:43:07] IOAPIC[0]: Set PCI routing entry (6-19 -> 0x99 -> IRQ 19 Mode:1 Active:1)
ahci 0000:00:11.0: AHCI 0001.0200 32 slots 4 ports 6 Gbps 0xf impl SATA mode
ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part
ahci: probe of 0000:00:11.0 failed with error -22

That is b/c in ahci_host_activate the second call to
devm_request_threaded_irq  would return -EINVAL as we passed in
(on the second run) an IRQ that was never initialized.

CC: stable@vger.kernel.org
Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-03-01 10:54:21 -05:00
David Sterba
83c8266acc btrfs: try harder to allocate raid56 stripe cache
The stripe hash table is large, starting with allocation order 4 and can go as
high as order 7 in case lock debugging is turned on and structure padding
happens.

Observed mount failure:

mount: page allocation failure: order:7, mode:0x200050
Pid: 8234, comm: mount Tainted: G        W    3.8.0-default+ #267
Call Trace:
 [<ffffffff81114353>] warn_alloc_failed+0xf3/0x140
 [<ffffffff811171d2>] ? __alloc_pages_direct_compact+0x92/0x250
 [<ffffffff81117ac3>] __alloc_pages_nodemask+0x733/0x9d0
 [<ffffffff81152878>] ? cache_alloc_refill+0x3f8/0x840
 [<ffffffff811528bc>] cache_alloc_refill+0x43c/0x840
 [<ffffffff811302eb>] ? is_kernel_percpu_address+0x4b/0x90
 [<ffffffffa00a00ac>] ? btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs]
 [<ffffffff811531d7>] kmem_cache_alloc_trace+0x247/0x270
 [<ffffffffa00a00ac>] btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs]
 [<ffffffffa003133f>] open_ctree+0xb2f/0x1f90 [btrfs]
 [<ffffffff81397289>] ? string+0x49/0xe0
 [<ffffffff813987b3>] ? vsnprintf+0x443/0x5d0
 [<ffffffffa0007cb6>] btrfs_mount+0x526/0x600 [btrfs]
 [<ffffffff8115127c>] ? cache_alloc_debugcheck_after+0x4c/0x200
 [<ffffffff81162b90>] mount_fs+0x20/0xe0
 [<ffffffff8117db26>] vfs_kern_mount+0x76/0x120
 [<ffffffff811801b6>] do_mount+0x386/0x980
 [<ffffffff8112a5cb>] ? strndup_user+0x5b/0x80
 [<ffffffff81180840>] sys_mount+0x90/0xe0
 [<ffffffff81962e99>] system_call_fastpath+0x16/0x1b

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01 10:13:05 -05:00
Wang Shilong
88e081bf82 Btrfs: cleanup to make the function btrfs_delalloc_reserve_metadata more logic
The original code is a little confusing and not clear, The right
way to deal with the kernel code like this:
		[...]
		if (ret)
			goto out;
		[...]

So i move the common clean_up code to the place labeled with
out_fail, this will be easier to maintain.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01 10:13:04 -05:00
Wang Shilong
a9870c0e03 Btrfs: don't call btrfs_qgroup_free if just btrfs_qgroup_reserve fails
commit eb6b88d92c leads into another bug.
If it is just because qgroup_reserve fails, the function btrfs_qgroup_free
should not be called, otherwise, it will cause the wrong quota accounting.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01 10:13:04 -05:00
Wang Shilong
235fdb8ef2 Btrfs: remove reduplicate check about root in the function btrfs_clean_quota_tree
The check work has been done just before the function btrfs_clean_quota_tree
is called, it is not necessary to check it again, remove it.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01 10:13:04 -05:00
Wang Shilong
84cbe2f725 Btrfs: return ENOMEM rather than use BUG_ON when btrfs_alloc_path fails
Return ENOMEM rather trigger BUG_ON, fix it.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Zach Brown <zab@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01 10:13:04 -05:00
Wang Shilong
06b3a860dc Btrfs: fix missing deleted items in btrfs_clean_quota_tree
Steps to reproduce:

	i=0
	ncases=100

	mkfs.btrfs <disk>
	mount <disk> <mnt>
	btrfs quota enable <mnt>
	btrfs qgroup create 2/1 <mnt>
	while [ $i -le $ncases ]
	do
		btrfs qgroup create 1/$i <mnt>
		btrfs qgroup assign 1/$i 2/1 <mnt>
		i=$(($i+1))
	done

	btrfs quota disable <mnt>
	umount <mnt>
	btrfsck <mnt>

You can also use the commands:
	btrfs-debug-tree <disk> | grep QGROUP

You will find there are still items existed.The reasons why this happens
is because the original code just checks slots[0]==0 and returns.
We try to fix it by deleting the leaf one by one.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01 10:13:03 -05:00