Commit Graph

210440 Commits

Author SHA1 Message Date
Tao Ma
07eaac9438 ocfs2: Fix lockdep warning in reflink.
This patch change mutex_lock to a new subclass and
add a new inode lock subclass for the target inode
which caused this lockdep warning.

=============================================
[ INFO: possible recursive locking detected ]
2.6.35+ #5
---------------------------------------------
reflink/11086 is trying to acquire lock:
 (Meta){+++++.}, at: [<ffffffffa06f9d65>] ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2]

but task is already holding lock:
 (Meta){+++++.}, at: [<ffffffffa06f9aa0>] ocfs2_reflink_ioctl+0x5d3/0x1229 [ocfs2]

other info that might help us debug this:
6 locks held by reflink/11086:
 #0:  (&sb->s_type->i_mutex_key#15/1){+.+.+.}, at: [<ffffffff820e09ec>] lookup_create+0x26/0x97
 #1:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa06f99a0>] ocfs2_reflink_ioctl+0x4d3/0x1229 [ocfs2]
 #2:  (Meta){+++++.}, at: [<ffffffffa06f9aa0>] ocfs2_reflink_ioctl+0x5d3/0x1229 [ocfs2]
 #3:  (&oi->ip_xattr_sem){+.+.+.}, at: [<ffffffffa06f9b58>] ocfs2_reflink_ioctl+0x68b/0x1229 [ocfs2]
 #4:  (&oi->ip_alloc_sem){+.+.+.}, at: [<ffffffffa06f9b67>] ocfs2_reflink_ioctl+0x69a/0x1229 [ocfs2]
 #5:  (&sb->s_type->i_mutex_key#15/2){+.+...}, at: [<ffffffffa06f9d4f>] ocfs2_reflink_ioctl+0x882/0x1229 [ocfs2]

stack backtrace:
Pid: 11086, comm: reflink Not tainted 2.6.35+ #5
Call Trace:
 [<ffffffff82063dd9>] validate_chain+0x56e/0xd68
 [<ffffffff82062275>] ? mark_held_locks+0x49/0x69
 [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1
 [<ffffffff82065a81>] lock_acquire+0xc6/0xed
 [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2]
 [<ffffffffa06c9ade>] __ocfs2_cluster_lock+0x975/0xa0d [ocfs2]
 [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2]
 [<ffffffffa06e107b>] ? ocfs2_wait_for_recovery+0x15/0x8a [ocfs2]
 [<ffffffffa06cb6ea>] ocfs2_inode_lock_full_nested+0x1ac/0xdc5 [ocfs2]
 [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2]
 [<ffffffff820623a0>] ? trace_hardirqs_on_caller+0x10b/0x12f
 [<ffffffff82060193>] ? debug_mutex_free_waiter+0x4f/0x53
 [<ffffffffa06f9d65>] ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2]
 [<ffffffffa06ce24a>] ? ocfs2_file_lock_res_init+0x66/0x78 [ocfs2]
 [<ffffffff820bb2d2>] ? might_fault+0x40/0x8d
 [<ffffffffa06df9f6>] ocfs2_ioctl+0x61a/0x656 [ocfs2]
 [<ffffffff820ee5d3>] ? mntput_no_expire+0x1d/0xb0
 [<ffffffff820e07b3>] ? path_put+0x2c/0x31
 [<ffffffff820e53ac>] vfs_ioctl+0x2a/0x9d
 [<ffffffff820e5903>] do_vfs_ioctl+0x45d/0x4ae
 [<ffffffff8233a7f6>] ? _raw_spin_unlock+0x26/0x2a
 [<ffffffff8200299c>] ? sysret_check+0x27/0x62
 [<ffffffff820e59ab>] sys_ioctl+0x57/0x7a
 [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10 09:19:06 -07:00
Tao Ma
5e64b0d9e8 ocfs2/lockdep: Move ip_xattr_sem out of ocfs2_xattr_get_nolock.
As the name shows, we shouldn't have any lock in
ocfs2_xattr_get_nolock. so lift ip_xattr_sem to the caller.
This should be safe for us since the only 2 callers are:
1. ocfs2_xattr_get which will lock the resources.
2. ocfs2_mknod which don't need this locking.

And this also resolves the following lockdep warning.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.35+ #5
-------------------------------------------------------
reflink/30027 is trying to acquire lock:
 (&oi->ip_alloc_sem){+.+.+.}, at: [<ffffffffa0673b67>] ocfs2_reflink_ioctl+0x69a/0x1226 [ocfs2]

but task is already holding lock:
 (&oi->ip_xattr_sem){++++..}, at: [<ffffffffa0673b58>] ocfs2_reflink_ioctl+0x68b/0x1226 [ocfs2]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&oi->ip_xattr_sem){++++..}:
       [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1
       [<ffffffff82065a81>] lock_acquire+0xc6/0xed
       [<ffffffff82339650>] down_read+0x34/0x47
       [<ffffffffa0691cb8>] ocfs2_xattr_get_nolock+0xa0/0x4e6 [ocfs2]
       [<ffffffffa069d64f>] ocfs2_get_acl_nolock+0x5c/0x132 [ocfs2]
       [<ffffffffa069d9c7>] ocfs2_init_acl+0x60/0x243 [ocfs2]
       [<ffffffffa066499d>] ocfs2_mknod+0xae8/0xfea [ocfs2]
       [<ffffffffa0665041>] ocfs2_create+0x9d/0x105 [ocfs2]
       [<ffffffff820e1c83>] vfs_create+0x9b/0xf4
       [<ffffffff820e20bb>] do_last+0x2fd/0x5be
       [<ffffffff820e31c0>] do_filp_open+0x1fb/0x572
       [<ffffffff820d6cf6>] do_sys_open+0x5a/0xe7
       [<ffffffff820d6dac>] sys_open+0x1b/0x1d
       [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b

-> #2 (jbd2_handle){+.+...}:
       [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1
       [<ffffffff82065a81>] lock_acquire+0xc6/0xed
       [<ffffffffa0604ff8>] start_this_handle+0x4a3/0x4bc [jbd2]
       [<ffffffffa06051d6>] jbd2__journal_start+0xba/0xee [jbd2]
       [<ffffffffa0605218>] jbd2_journal_start+0xe/0x10 [jbd2]
       [<ffffffffa065ca34>] ocfs2_start_trans+0xb7/0x19b [ocfs2]
       [<ffffffffa06645f3>] ocfs2_mknod+0x73e/0xfea [ocfs2]
       [<ffffffffa0665041>] ocfs2_create+0x9d/0x105 [ocfs2]
       [<ffffffff820e1c83>] vfs_create+0x9b/0xf4
       [<ffffffff820e20bb>] do_last+0x2fd/0x5be
       [<ffffffff820e31c0>] do_filp_open+0x1fb/0x572
       [<ffffffff820d6cf6>] do_sys_open+0x5a/0xe7
       [<ffffffff820d6dac>] sys_open+0x1b/0x1d
       [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b

-> #1 (&journal->j_trans_barrier){.+.+..}:
       [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1
       [<ffffffff82064fa9>] lock_release_non_nested+0x1e5/0x24b
       [<ffffffff82065999>] lock_release+0x158/0x17a
       [<ffffffff823389f6>] __mutex_unlock_slowpath+0xbf/0x11b
       [<ffffffff82338a5b>] mutex_unlock+0x9/0xb
       [<ffffffffa0679673>] ocfs2_free_ac_resource+0x31/0x67 [ocfs2]
       [<ffffffffa067c6bc>] ocfs2_free_alloc_context+0x11/0x1d [ocfs2]
       [<ffffffffa0633de0>] ocfs2_write_begin_nolock+0x141e/0x159b [ocfs2]
       [<ffffffffa0635523>] ocfs2_write_begin+0x11e/0x1e7 [ocfs2]
       [<ffffffff820a1297>] generic_file_buffered_write+0x10c/0x210
       [<ffffffffa0653624>] ocfs2_file_aio_write+0x4cc/0x6d3 [ocfs2]
       [<ffffffff820d822d>] do_sync_write+0xc2/0x106
       [<ffffffff820d897b>] vfs_write+0xae/0x131
       [<ffffffff820d8e55>] sys_write+0x47/0x6f
       [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b

-> #0 (&oi->ip_alloc_sem){+.+.+.}:
       [<ffffffff82063f92>] validate_chain+0x727/0xd68
       [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1
       [<ffffffff82065a81>] lock_acquire+0xc6/0xed
       [<ffffffff82339694>] down_write+0x31/0x52
       [<ffffffffa0673b67>] ocfs2_reflink_ioctl+0x69a/0x1226 [ocfs2]
       [<ffffffffa06599f6>] ocfs2_ioctl+0x61a/0x656 [ocfs2]
       [<ffffffff820e53ac>] vfs_ioctl+0x2a/0x9d
       [<ffffffff820e5903>] do_vfs_ioctl+0x45d/0x4ae
       [<ffffffff820e59ab>] sys_ioctl+0x57/0x7a
       [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10 09:19:05 -07:00
Linus Torvalds
152831be91 Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm: (30 commits)
  ARM: Update mach-types
  ARM: Partially revert "Auto calculate ZRELADDR and provide option for exceptions"
  ARM: Ensure PTE modifications via dma_alloc_coherent are visible
  ARM: 6359/1: ep93xx: move clock initialization earlier
  Revert "[ARM] pxa: remove now unnecessary dma_needs_bounce()"
  ARM: 6352/1: perf: fix event validation
  ARM: 6344/1: Mark CPU_32v6K as depended on CPU_V7
  ARM: 6343/1: wire up fanotify and prlimit64 syscalls on ARM
  ARM: 6330/1: perf: reword comments relating to perf_event_do_pending
  ARM: pxa168fb: fix section mismatch
  ARM: pxa: Make id const in pwm_probe()
  ARM: pxa: fix CI_HSYNC and CI_VSYNC MFP defines for pxa300
  ARM: pxa: remove __init from cpufreq_driver->init()
  ARM: imx: set cache line size to 64 bytes for i.MX5
  mx5/clock: fix clear bit fields issue in _clk_ccgr_disable function
  mxc/tzic: add base address when accessing TZIC registers
  ARM: mach-shmobile: ap4evb: fix write protect for SDHI1
  ARM: mach-shmobile: ap4evb: modify FSI2 ID
  ARM: mach-shmobile: do not enable the PLLC2 clock on init
  ARM: mach-shmobile: Clock framework comment fix
  ...
2010-09-09 18:31:34 -07:00
Russell King
a14d040408 ARM: Update mach-types
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-09-09 22:49:26 +01:00
Russell King
9e84ed63dc ARM: Partially revert "Auto calculate ZRELADDR and provide option for exceptions"
Partially revert e69edc7, which introduced automatic zreladdr
support.  The change in the way the manual definition is defined
seems to be error and conflict prone.  Go back to the original way
we were handling this for the time being, while keeping the automatic
zreladdr facility.

Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-09-09 22:39:41 +01:00
Russell King
de9ea203d1 Merge branch 'origin' 2010-09-09 22:38:43 +01:00
Jonathan Corbet
a73f8844e1 lglock: make lg_lock_global() actually lock globally
lg_lock_global() currently only acquires spinlocks for online CPUs, but
it's meant to lock all possible CPUs.  Lglock-protected resources may be
associated with removed CPUs - and, indeed, that could happen with the
per-superblock open files lists.

At Nick's suggestion, change for_each_online_cpu() to
for_each_possible_cpu() to protect accesses to those resources.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 09:09:43 -07:00
Stefan Bader
39aa3cb3e8 mm: Move vma_stack_continue into mm.h
So it can be used by all that need to check for that.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 09:05:06 -07:00
Linus Torvalds
26a94e81de Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/nes: Fix hang with modified FIN handling on A0 cards
  RDMA/nes: Change state to closing after FIN
  RDMA/nes: Fix double CLOSE event indication crash
  RDMA/nes: Write correct register write to set TX pause param
  RDMA/cxgb3: Don't exceed the max HW CQ depth
2010-09-09 08:58:52 -07:00
Linus Torvalds
cad46744a3 Merge branch 'fixes' of git://oss.oracle.com/git/tma/linux-2.6
* 'fixes' of git://oss.oracle.com/git/tma/linux-2.6:
  ocfs2: Fix orphan add in ocfs2_create_inode_in_orphan
  ocfs2: split out ocfs2_prepare_orphan_dir() into locking and prep functions
  ocfs2: allow return of new inode block location before allocation of the inode
  ocfs2: use ocfs2_alloc_dinode_update_counts() instead of open coding
  ocfs2: split out inode alloc code from ocfs2_mknod_locked
  Ocfs2: Fix a regression bug from mainline commit(6b933c8e6f).
  ocfs2: Fix deadlock when allocating page
  ocfs2: properly set and use inode group alloc hint
  ocfs2: Use the right group in nfs sync check.
  ocfs2: Flush drive's caches on fdatasync
  ocfs2: make __ocfs2_page_mkwrite handle file end properly.
  ocfs2: Fix incorrect checksum validation error
  ocfs2: Fix metaecc error messages
2010-09-09 08:57:02 -07:00
Roland Dreier
17859d07c8 Merge branches 'cxgb3' and 'nes' into for-linus 2010-09-08 14:43:28 -07:00
Faisal Latif
29da03b9d1 RDMA/nes: Fix hang with modified FIN handling on A0 cards
Changing state to CLOSING when FIN is received causes A0 cards to
hang.  Fix this by checking for A0 cards in FIN handling.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-08 14:38:23 -07:00
Faisal Latif
67d7072115 RDMA/nes: Change state to closing after FIN
When the driver receives an AE for FIN received, it closes the
connection without changing the state of the connection in the
hardware to closing.  By changing the state to closing, hardware will
do a normal close sequence.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-08 14:38:04 -07:00
Faisal Latif
dae58728dc RDMA/nes: Fix double CLOSE event indication crash
During a stress testing in a large cluster, multiple close event are
detected and BUG() is hit in the iWARP core.  The cause is that the
active node gave up while waiting for an MPA response from the peer
and tried to close the connection by sending RST.  The passive node
driver receives the RST but is waiting for MPA response from the user.
When the MPA accept is received, the driver offloads the connection
and sends a CLOSE event.  The driver gets an AE indicating RESET
received and also sends a CLOSE event, hitting a BUG().

Fix this by correcting RESET handling and sending CLOSE events.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-08 14:35:48 -07:00
Chien Tung
70c9db0fdf RDMA/nes: Write correct register write to set TX pause param
Setting TX pause param writes to the wrong register location causing
the adapter to hang.  Correct the define used to write the reigster.

Addresses: https://bugs.openfabrics.org/show_bug.cgi?id=2116
Reported-by: Shiri Franchi <shirif@voltaire.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-08 14:29:19 -07:00
Linus Torvalds
cc491e27d3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: i8042 - fix device removal on unload
  Input: bcm5974 - adjust major/minor to scale
  Input: MT - initialize slots to unused
  Input: use PIT_TICK_RATE in vt beep ioctl
  Input: wacom - fix mousewheel handling for old wacom tablets
2010-09-08 11:19:18 -07:00
Linus Torvalds
2c20130f20 Merge branch 'semaphore-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'semaphore-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  semaphore: Add DEFINE_SEMAPHORE
2010-09-08 11:15:51 -07:00
Linus Torvalds
1faa6ec8cc Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks
  io-mapping: Fix the address space annotations
  x86: Fix the address space annotations of iomap_atomic_prot_pfn()
  x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
  x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_dev
2010-09-08 11:14:10 -07:00
Linus Torvalds
79637a41e4 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  gcc-4.6: kernel/*: Fix unused but set warnings
  mutex: Fix annotations to include it in kernel-locking docbook
  pid: make setpgid() system call use RCU read-side critical section
  MAINTAINERS: Add RCU's public git tree
2010-09-08 11:13:42 -07:00
Linus Torvalds
899edae615 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Try to handle unknown nmis with an enabled PMU
  perf, x86: Fix handle_irq return values
  perf, x86: Fix accidentally ack'ing a second event on intel perf counter
  oprofile, x86: fix init_sysfs() function stub
  lockup_detector: Sync touch_*_watchdog back to old semantics
  tracing: Fix a race in function profile
  oprofile, x86: fix init_sysfs error handling
  perf_events: Fix time tracking for events with pid != -1 and cpu != -1
  perf: Initialize callchains roots's childen hits
  oprofile: fix crash when accessing freed task structs
2010-09-08 11:13:16 -07:00
Linus Torvalds
c8c727db41 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix lock annotations
  fuse: flush background queue on connection close
2010-09-08 11:12:59 -07:00
Russell King
2be23c475a ARM: Ensure PTE modifications via dma_alloc_coherent are visible
Dave Hylands reports:
| We've observed a problem with dma_alloc_writecombine when the system
| is under heavy load (heavy bus traffic).  We've managed to reduce the
| problem to the following snippet, which is run from a kthread in a
| continuous loop:
|
|   void *virtAddr;
|   dma_addr_t physAddr;
|   unsigned int numBytes = 256;
|
|   for (;;) {
|       virtAddr = dma_alloc_writecombine(NULL,
|             numBytes, &physAddr, GFP_KERNEL);
|       if (virtAddr == NULL) {
|          printk(KERN_ERR "Running out of memory\n");
|          break;
|       }
|
|       /* access DMA memory allocated */
|       tmp = virtAddr;
|       *tmp = 0x77;
|
|       /* free DMA memory */
|       dma_free_writecombine(NULL,
|             numBytes, virtAddr, physAddr);
|
|         ...sleep here...
|     }
|
| By itself, the code will run forever with no issues. However, as we
| increase our bus traffic (typically using DMA) then the *tmp = 0x77
| line will eventually cause a page fault. If we add a small delay (a
| few microseconds) before the *tmp = 0x77, then we don't see a page
| fault, even under heavy load.

A dsb() is required after modifying the PTE entries to ensure that they
will always be visible.  Add this dsb().

Reported-by: Dave Hylands <dhylands@gmail.com>
Tested-by: Dave Hylands <dhylands@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-09-08 16:27:56 +01:00
Thomas Gleixner
febc88c594 semaphore: Add DEFINE_SEMAPHORE
The full cleanup of init_MUTEX[_LOCKED] and DECLARE_MUTEX has not been
done. Some of the users are real semaphores and we should name them as
such instead of confusing everyone with "MUTEX".

Provide the infrastructure to get finally rid of init_MUTEX[_LOCKED]
and DECLARE_MUTEX.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
LKML-Reference: <20100907125054.795929962@linutronix.de>
2010-09-08 15:04:10 +02:00
Mika Westerberg
a387f0f540 ARM: 6359/1: ep93xx: move clock initialization earlier
Commit 7cfe24947 ("ARM: AMBA: Add pclk support to AMBA bus
infrastructure") changed AMBA bus to handle the PCLK automatically.
However, in EP93xx clock initialization is arch_initcall which is done
later than AMBA device identification. This causes
amba_get_enable_pclk() to fail resulting device where UARTs are not
functional.

So change ep93xx_clock_init() to be postcore_initcall.

Signed-off-by: Mika Westerberg <mika.westerberg@iki.fi>
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-09-08 12:28:39 +01:00
Russell King
0485e18bc4 Revert "[ARM] pxa: remove now unnecessary dma_needs_bounce()"
This reverts commit 4fa5518, which causes a compilation regression for
IXP4xx platforms.

Reported-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-09-08 12:28:39 +01:00
Mark Fasheh
97b8f4a9df ocfs2: Fix orphan add in ocfs2_create_inode_in_orphan
ocfs2_create_inode_in_orphan() is used by reflink to create the newly
reflinked inode simultaneously in the orphan dir. This allows us to easily
handle partially-reflinked files during recovery cleanup.

We have a problem though - the orphan dir stringifies inode # to determine
a unique name under which the orphan entry dirent can be created. Since
ocfs2_create_inode_in_orphan() needs the space allocated in the orphan dir
before it can allocate the inode, we currently call into the orphan code:

       /*
        * We give the orphan dir the root blkno to fake an orphan name,
        * and allocate enough space for our insertion.
        */
       status = ocfs2_prepare_orphan_dir(osb, &orphan_dir,
                                         osb->root_blkno,
                                         orphan_name, &orphan_insert);

Using osb->root_blkno might work fine on unindexed directories, but the
orphan dir can have an index.  When it has that index, the above code fails
to allocate the proper index entry.  Later, when we try to remove the file
from the orphan dir (using the actual inode #), the reflink operation will
fail.

To fix this, I created a function ocfs2_alloc_orphaned_file() which uses the
newly split out orphan and inode alloc code to figure out what the inode
block number will be (once allocated) and then prepare the orphan dir from
that data.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 14:26:00 +08:00
Mark Fasheh
dd43bcde23 ocfs2: split out ocfs2_prepare_orphan_dir() into locking and prep functions
We do this because ocfs2_create_inode_in_orphan() wants to order locking of
the orphan dir with respect to locking of the inode allocator *before*
making any changes to the directory.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 14:26:00 +08:00
Mark Fasheh
e49e27674d ocfs2: allow return of new inode block location before allocation of the inode
This allows code which needs to know the eventual block number of an inode
but can't allocate it yet due to transaction or lock ordering. For example,
ocfs2_create_inode_in_orphan() currently gives a junk blkno for preparation
of the orphan dir because it can't yet know where the actual inode is placed
- that code is actually in ocfs2_mknod_locked. This is a problem when the
orphan dirs are indexed as the junk inode number will create an index entry
which goes unused (and fails the later removal from the orphan dir).  Now
with these interfaces, ocfs2_create_inode_in_orphan() can run the block
group search (and get back the inode block number) *before* any actual
allocation occurs.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 14:25:59 +08:00
Mark Fasheh
d51349829c ocfs2: use ocfs2_alloc_dinode_update_counts() instead of open coding
ocfs2_search_chain() makes the same updates as
ocfs2_alloc_dinode_update_counts to the alloc inode. Instead of open coding
the bitmap update, use our helper function.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 14:25:58 +08:00
Mark Fasheh
021960cab3 ocfs2: split out inode alloc code from ocfs2_mknod_locked
Do this by splitting the bulk of the function away from the inode allocation
code at the very tom of ocfs2_mknod_locked(). Existing callers don't need to
change and won't see any difference. The new function created,
__ocfs2_mknod_locked() will be used shortly.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 14:25:58 +08:00
Tristan Ye
81c8c82b5a Ocfs2: Fix a regression bug from mainline commit(6b933c8e6f).
The patch is to fix the regression bug brought from commit 6b933c8...( 'ocfs2:
Avoid direct write if we fall back to buffered I/O'):

http://oss.oracle.com/bugzilla/show_bug.cgi?id=1285

The commit 6b933c8e6f changed __generic_file_aio_write
to generic_file_buffered_write, which didn't call filemap_{write,wait}_range to  flush
the pagecaches when we were falling O_DIRECT writes back to buffered ones. it did hurt
the O_DIRECT semantics somehow in extented odirect writes.

This patch tries to guarantee O_DIRECT writes of 'fall back to buffered' to be correctly
flushed.

Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 14:25:57 +08:00
Jan Kara
9b4c0ff32c ocfs2: Fix deadlock when allocating page
We cannot call grab_cache_page() when holding filesystem locks or with
a transaction started as grab_cache_page() calls page allocation with
GFP_KERNEL flag and thus page reclaim can recurse back into the filesystem
causing deadlocks or various assertion failures. We have to use
find_or_create_page() instead and pass it GFP_NOFS as we do with other
allocations.

Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 14:25:57 +08:00
Mark Fasheh
b2b6ebf5f7 ocfs2: properly set and use inode group alloc hint
We were setting ac->ac_last_group in ocfs2_claim_suballoc_bits from
res->sr_bg_blkno.  Unfortunately, res->sr_bg_blkno is going to be zero under
normal (non-fragmented) circumstances. The discontig block group patches
effectively turned off that feature. Fix this by correctly calculating what
the next group hint should be.

Acked-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 14:25:56 +08:00
Tao Ma
889f004a8c ocfs2: Use the right group in nfs sync check.
We have added discontig block group now, and now an inode
can be allocated in an discontig block group. So get
it in ocfs2_get_suballoc_slot_bit.

The old ocfs2_test_suballoc_bit gets group block no
from the allocation inode which is wrong. Fix it by
passing the right group.

Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 14:25:56 +08:00
Jan Kara
04eda1a180 ocfs2: Flush drive's caches on fdatasync
When 'barrier' mount option is specified, we have to issue a cache flush
during fdatasync(2). We have to do this even if inode doesn't have
I_DIRTY_DATASYNC set because we still have to get written *data* to disk so
that they are not lost in case of crash.

Acked-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Singed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 14:25:55 +08:00
Tao Ma
f63afdb2c3 ocfs2: make __ocfs2_page_mkwrite handle file end properly.
__ocfs2_page_mkwrite now is broken in handling file end.
1. the last page should be the page contains i_size - 1.
2. the len in the last page is also calculated wrong.
So change them accordingly.

Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 14:25:55 +08:00
Sunil Mushran
f5ce5a08a4 ocfs2: Fix incorrect checksum validation error
For local mounts, ocfs2_read_locked_inode() calls ocfs2_read_blocks_sync() to
read the inode off the disk. The latter first checks to see if that block is
cached in the journal, and, if so, returns that block. That is ok.

But ocfs2_read_locked_inode() goes wrong when it tries to validate the checksum
of such blocks. Blocks that are cached in the journal may not have had their
checksum computed as yet. We should not validate the checksums of such blocks.

Fixes ossbz#1282
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1282

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Cc: stable@kernel.org
Singed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 14:25:54 +08:00
Sunil Mushran
dc696aced9 ocfs2: Fix metaecc error messages
Like tools, the checksum validate function now prints the values in hex.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Singed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08 14:25:53 +08:00
Linus Torvalds
4f63e3c5be Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux:
  nfsd4: mask out non-access bits in nfs4_access_to_omode
2010-09-07 19:21:02 -07:00
Linus Torvalds
e6f901bb85 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  ima: always maintain counters
  AppArmor: Fix locking from removal of profile namespace
  AppArmor: Fix splitting an fqname into separate namespace and profile names
  AppArmor: Fix security_task_setrlimit logic for 2.6.36 changes
  AppArmor: Drop hack to remove appended " (deleted)" string
2010-09-07 19:13:06 -07:00
Mimi Zohar
e950598d43 ima: always maintain counters
commit 8262bb85da allocated the inode integrity struct (iint) before any
inodes were created. Only after IMA was initialized in late_initcall were
the counters updated. This patch updates the counters, whether or not IMA
has been initialized, to resolve 'imbalance' messages.

This patch fixes the bug as reported in bugzilla: 15673.  When the i915
is builtin, the ring_buffer is initialized before IMA, causing the
imbalance message on suspend.

Reported-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Tested-by: Thomas Meyer <thomas@m3y3r.de>
Tested-by: David Safford<safford@watson.ibm.com>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: James Morris <jmorris@namei.org>
2010-09-08 09:51:41 +10:00
John Johansen
999b4f0aa2 AppArmor: Fix locking from removal of profile namespace
The locking for profile namespace removal is wrong, when removing a
profile namespace, it needs to be removed from its parent's list.
Lock the parent of namespace list instead of the namespace being removed.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
2010-09-08 09:19:34 +10:00
John Johansen
04ccd53f09 AppArmor: Fix splitting an fqname into separate namespace and profile names
As per Dan Carpenter <error27@gmail.com>
  If we have a ns name without a following profile then in the original
  code it did "*ns_name = &name[1];".  "name" is NULL so "*ns_name" is
  0x1.  That isn't useful and could cause an oops when this function is
  called from aa_remove_profiles().

Beyond this the assignment of the namespace name was wrong in the case
where the profile name was provided as it was being set to &name[1]
after name  = skip_spaces(split + 1);

Move the ns_name assignment before updating name for the split and
also add skip_spaces, making the interface more robust.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
2010-09-08 09:19:31 +10:00
John Johansen
3a2dc8382a AppArmor: Fix security_task_setrlimit logic for 2.6.36 changes
2.6.36 introduced the abilitiy to specify the task that is having its
rlimits set.  Update mediation to ensure that confined tasks can only
set their own group_leader as expected by current policy.

Add TODO note about extending policy to support setting other tasks
rlimits.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
2010-09-08 09:19:29 +10:00
John Johansen
e819ff519b AppArmor: Drop hack to remove appended " (deleted)" string
The 2.6.36 kernel has refactored __d_path() so that it no longer appends
" (deleted)" to unlinked paths.  So drop the hack that was used to detect
and remove the appended string.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
2010-09-08 09:19:24 +10:00
Linus Torvalds
d56557af19 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: bus speed strings should be const
  PCI hotplug: Fix build with CONFIG_ACPI unset
  PCI: PCIe: Remove the port driver module exit routine
  PCI: PCIe: Move PCIe PME code to the pcie directory
  PCI: PCIe: Disable PCIe port services during port initialization
  PCI: PCIe: Ask BIOS for control of all native services at once
  ACPI/PCI: Negotiate _OSC control bits before requesting them
  ACPI/PCI: Do not preserve _OSC control bits returned by a query
  ACPI/PCI: Make acpi_pci_query_osc() return control bits
  ACPI/PCI: Reorder checks in acpi_pci_osc_control_set()
  PCI: PCIe: Introduce commad line switch for disabling port services
  PCI: PCIe AER: Introduce pci_aer_available()
  x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set
  PCI: provide stub pci_domain_nr function for !CONFIG_PCI configs
2010-09-07 16:00:17 -07:00
Linus Torvalds
fa2925cf90 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: Make fiemap work with sparse files
  xfs: prevent 32bit overflow in space reservation
  xfs: Disallow 32bit project quota id
  xfs: improve buffer cache hash scalability
2010-09-07 15:44:28 -07:00
Linus Torvalds
98e52c373c Merge branch 'for-linus' of git://android.kernel.org/kernel/tegra
* 'for-linus' of git://android.kernel.org/kernel/tegra:
  [ARM] tegra: Add ZRELADDR default for ARCH_TEGRA
2010-09-07 14:48:44 -07:00
Linus Torvalds
add2b10f2b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha-2.6:
  alpha: Fix printk format errors
  alpha: convert perf_event to use local_t
  Fix call to replaced SuperIO functions
  alpha: remove homegrown L1_CACHE_ALIGN macro
2010-09-07 14:38:54 -07:00
Linus Torvalds
3c5dff7b5e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: potential ERR_PTR() dereference
2010-09-07 14:38:21 -07:00