Commit Graph

211602 Commits

Author SHA1 Message Date
Alex Elder
39dc948c69 Merge branch 'v2.6.36' 2010-10-21 08:29:34 -05:00
Linus Torvalds
f6f94e2ab1 Linux 2.6.36 2010-10-20 13:30:22 -07:00
Linus Torvalds
7d7c4d06be Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
  MIPS: O32 compat/N32: Fix to use compat syscall wrappers for AIO syscalls.
  MAINTAINERS: Change list for ioc_serial to linux-serial.
  SERIAL: ioc3_serial: Return -ENOMEM on memory allocation failure
  MIPS: jz4740: Fix Kbuild Platform file.
  MIPS: Repair Kbuild make clean breakage.
2010-10-20 13:18:21 -07:00
Amit Shah
531295e63b virtio: console: Don't block entire guest if host doesn't read data
If the host is slow in reading data or doesn't read data at all,
blocking write calls not only blocked the program that called write()
but the entire guest itself.

To overcome this, let's not block till the host signals it has given
back the virtio ring element we passed it.  Instead, send the buffer to
the host and return to userspace.  This operation then becomes similar
to how non-blocking writes work, so let's use the existing code for this
path as well.

This code change also ensures blocking write calls do get blocked if
there's not enough room in the virtio ring as well as they don't return
-EAGAIN to userspace.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
CC: stable@kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-20 13:18:04 -07:00
Linus Torvalds
30c278192f Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] bsg: fix incorrect device_status value
  [SCSI] Fix VPD inquiry page wrapper
2010-10-20 13:13:09 -07:00
Linus Torvalds
ef2533dae5 Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Fix fs/gs reload oops with invalid ldt
2010-10-20 09:00:44 -07:00
Michel Thebeau
e2cc502c3f MIPS: O32 compat/N32: Fix to use compat syscall wrappers for AIO syscalls.
[Ralf: Michel's original patch only fixed N32; I replicated the same fix
for O32.]

Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
Cc: paul.gortmaker@windriver.com
Cc: bruce.ashfield@windriver.com
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-10-19 18:32:41 +01:00
Ralf Baechle
d39e072166 MAINTAINERS: Change list for ioc_serial to linux-serial.
IOC3 is also being used on SGI MIPS systems but this particular driver is
only being used on IA64 systems so linux-mips made no sense as a list.  Pat
also thinks linux-serial@vger.kernel.org is the better list.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-10-19 18:32:41 +01:00
Julia Lawall
6cc0cc4a35 SERIAL: ioc3_serial: Return -ENOMEM on memory allocation failure
In this code, 0 is returned on memory allocation failure, even though other
failures return -ENOMEM or other similar values.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression ret;
expression x,e1,e2,e3;
@@

ret = 0
... when != ret = e1
*x = \(kmalloc\|kcalloc\|kzalloc\)(...)
... when != ret = e2
if (x == NULL) { ... when != ret = e3
  return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
To: Pat Gefre <pfg@sgi.com>
Cc: kernel-janitors@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/1704/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-10-19 18:32:40 +01:00
David Daney
08be7b2bc7 MIPS: jz4740: Fix Kbuild Platform file.
The platform specific files should be included via the platform-y
variable.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Patchwork: https://patchwork.linux-mips.org/patch/1719/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-10-19 18:32:39 +01:00
David Daney
ad4b2b627c MIPS: Repair Kbuild make clean breakage.
When running make clean, Kbuild doesn't process the .config file, so nothing
generates a platform-y variable.  We can get it to descend into the platform
directories by setting $(obj-).

The dec Platform file was unconditionally setting platform-, obliterating
its previous contents and preventing some directories from being cleaned.
This is change to an append operation '+=' to allow cavium-octeon to be
cleaned.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Patchwork: https://patchwork.linux-mips.org/patch/1718/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-10-19 18:32:39 +01:00
Linus Torvalds
51ea8a88aa Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon/kms: avivo cursor workaround applies to evergreen as well
2010-10-19 10:10:20 -07:00
Avi Kivity
9581d442b9 KVM: Fix fs/gs reload oops with invalid ldt
kvm reloads the host's fs and gs blindly, however the underlying segment
descriptors may be invalid due to the user modifying the ldt after loading
them.

Fix by using the safe accessors (loadsegment() and load_gs_index()) instead
of home grown unsafe versions.

This is CVE-2010-3698.

KVM-Stable-Tag.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-19 14:21:45 -02:00
Linus Torvalds
547af560dd Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
  MIPS: Enable ISA_DMA_API config to fix build failure
  MIPS: 32-bit: Fix build failure in asm/fcntl.h
  MIPS: Remove all generated vmlinuz* files on "make clean"
  MIPS: do_sigaltstack() expects userland pointers
  MIPS: Fix error values in case of bad_stack
  MIPS: Sanitize restart logics
  MIPS: secure_computing, syscall audit: syscall number should in r2, not r0.
  MIPS: Don't block signals if we'd failed to setup a sigframe
2010-10-18 13:10:36 -07:00
Linus Torvalds
b0579fc089 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: evdev - fix EVIOCSABS regression
  Input: evdev - fix Ooops in EVIOCGABS/EVIOCSABS
2010-10-18 13:10:08 -07:00
Linus Torvalds
7f81c56cf2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  firewire: ohci: fix TI TSB82AA2 regression since 2.6.35
2010-10-18 13:09:26 -07:00
Thomas Gleixner
a731cd116c xfs: semaphore cleanup
Get rid of init_MUTEX[_LOCKED]() and use sema_init() instead.

(Ported to current XFS code by <aelder@sgi.com>.)

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:09:09 -05:00
Sascha Hauer
63f1474c69 mxc_nand: do not depend on disabling the irq in the interrupt handler
This patch reverts the driver to enabling/disabling the NFC interrupt
mask rather than enabling/disabling the system interrupt.  This cleans
up the driver so that it doesn't rely on interrupts being disabled
within the interrupt handler.

For i.MX21 we keep the current behaviour, that is calling
enable_irq/disable_irq_nosync to enable/disable interrupts.  This patch
is based on earlier work by John Ogness.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: John Ogness <john.ogness@linutronix.de>
Tested-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-18 13:09:05 -07:00
Arkadiusz Mi?kiewicz
6743099ce5 xfs: Extend project quotas to support 32bit project ids
This patch adds support for 32bit project quota identifiers.

On disk format is backward compatible with 16bit projid numbers. projid
on disk is now kept in two 16bit values - di_projid_lo (which holds the
same position as old 16bit projid value) and new di_projid_hi (takes
existing padding) and converts from/to 32bit value on the fly.

xfs_admin (for existing fs), mkfs.xfs (for new fs) needs to be used
to enable PROJID32BIT support.

Signed-off-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:08:08 -05:00
Christoph Hellwig
1a1a3e97ba xfs: remove xfs_buf wrappers
Stop having two different names for many buffer functions and use
the more descriptive xfs_buf_* names directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:08:07 -05:00
Christoph Hellwig
6c77b0ea1b xfs: remove xfs_cred.h
We're not actually passing around credentials inside XFS for a while
now, so remove all xfs_cred.h with it's cred_t typedef and all
instances of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:08:06 -05:00
Christoph Hellwig
78a4b0961f xfs: remove xfs_globals.h
This header only provides one extern that isn't actually declared
anywhere, and shadowed by a macro.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:08:05 -05:00
Christoph Hellwig
668332e5fe xfs: remove xfs_version.h
It used to have a place when it contained an automatically generated
CVS version, but these days it's entirely superflous.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:08:04 -05:00
Christoph Hellwig
1ae4fe6dba xfs: remove xfs_refcache.h
This header has been completely unused for a couple of years.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:08:03 -05:00
Christoph Hellwig
4957a449a1 xfs: fix the xfs_trans_committed
Use the correct prototype for xfs_trans_committed instead of casting it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:08:02 -05:00
Christoph Hellwig
dfe188d428 xfs: remove unused t_callback field in struct xfs_trans
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:08:01 -05:00
Christoph Hellwig
d276734d93 xfs: fix bogus m_maxagi check in xfs_iget
These days inode64 should only control which AGs we allocate new
inodes from, while we still try to support reading all existing
inodes.  To make this actually work the check ontop of xfs_iget
needs to be relaxed to allow inodes in all allocation groups instead
of just those that we allow allocating inodes from.  Note that we
can't simply remove the check - it prevents us from accessing
invalid data when fed invalid inode numbers from NFS or bulkstat.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:08:01 -05:00
Christoph Hellwig
1b0407125f xfs: do not use xfs_mod_incore_sb_batch for per-cpu counters
Update the per-cpu counters manually in xfs_trans_unreserve_and_mod_sb
and remove support for per-cpu counters from xfs_mod_incore_sb_batch
to simplify it.  And added benefit is that we don't have to take
m_sb_lock for transactions that only modify per-cpu counters.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:08:00 -05:00
Christoph Hellwig
96540c7858 xfs: do not use xfs_mod_incore_sb for per-cpu counters
Export xfs_icsb_modify_counters and always use it for modifying
the per-cpu counters.  Remove support for per-cpu counters from
xfs_mod_incore_sb to simplify it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:59 -05:00
Christoph Hellwig
61ba35dea0 xfs: remove XFS_MOUNT_NO_PERCPU_SB
Fail the mount if we can't allocate memory for the per-CPU counters.
This is consistent with how we handle everything else in the mount
path and makes the superblock counter modification a lot simpler.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:58 -05:00
Dave Chinner
50f59e8eed xfs: pack xfs_buf structure more tightly
pahole reports the struct xfs_buf has quite a few holes in it, so
packing the structure better will reduce the size of it by 16 bytes.
Also, move all the fields used in cache lookups into the first
cacheline.

Before on x86_64:

        /* size: 320, cachelines: 5 */
	/* sum members: 298, holes: 6, sum holes: 22 */

After on x86_64:

        /* size: 304, cachelines: 5 */
	/* padding: 6 */
	/* last cacheline: 48 bytes */

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:57 -05:00
Dave Chinner
74f75a0cb7 xfs: convert buffer cache hash to rbtree
The buffer cache hash is showing typical hash scalability problems.
In large scale testing the number of cached items growing far larger
than the hash can efficiently handle. Hence we need to move to a
self-scaling cache indexing mechanism.

I have selected rbtrees for indexing becuse they can have O(log n)
search scalability, and insert and remove cost is not excessive,
even on large trees. Hence we should be able to cache large numbers
of buffers without incurring the excessive cache miss search
penalties that the hash is imposing on us.

To ensure we still have parallel access to the cache, we need
multiple trees. Rather than hashing the buffers by disk address to
select a tree, it seems more sensible to separate trees by typical
access patterns. Most operations use buffers from within a single AG
at a time, so rather than searching lots of different lists,
separate the buffer indexes out into per-AG rbtrees. This means that
searches during metadata operation have a much higher chance of
hitting cache resident nodes, and that updates of the tree are less
likely to disturb trees being accessed on other CPUs doing
independent operations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:56 -05:00
Dave Chinner
69b491c214 xfs: serialise inode reclaim within an AG
Memory reclaim via shrinkers has a terrible habit of having N+M
concurrent shrinker executions (N = num CPUs, M = num kswapds) all
trying to shrink the same cache. When the cache they are all working
on is protected by a single spinlock, massive contention an
slowdowns occur.

Wrap the per-ag inode caches with a reclaim mutex to serialise
reclaim access to the AG. This will block concurrent reclaim in each
AG but still allow reclaim to scan multiple AGs concurrently. Allow
shrinkers to move on to the next AG if it can't get the lock, and if
we can't get any AG, then start blocking on locks.

To prevent reclaimers from continually scanning the same inodes in
each AG, add a cursor that tracks where the last reclaim got up to
and start from that point on the next reclaim. This should avoid
only ever scanning a small number of inodes at the satart of each AG
and not making progress. If we have a non-shrinker based reclaim
pass, ignore the cursor and reset it to zero once we are done.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:55 -05:00
Dave Chinner
e3a20c0b02 xfs: batch inode reclaim lookup
Batch and optimise the per-ag inode lookup for reclaim to minimise
scanning overhead. This involves gang lookups on the radix trees to
get multiple inodes during each tree walk, and tighter validation of
what inodes can be reclaimed without blocking befor we take any
locks.

This is based on ideas suggested in a proof-of-concept patch
posted by Nick Piggin.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:54 -05:00
Dave Chinner
78ae525676 xfs: implement batched inode lookups for AG walking
With the reclaim code separated from the generic walking code, it is
simple to implement batched lookups for the generic walk code.
Separate out the inode validation from the execute operations and
modify the tree lookups to get a batch of inodes at a time.

Reclaim operations will be optimised separately.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:53 -05:00
Dave Chinner
e13de955ca xfs: split out inode walk inode grabbing
When doing read side inode cache walks, the code to validate and
grab an inode is common to all callers. Split it out of the execute
callbacks in preparation for batching lookups. Similarly, split out
the inode reference dropping from the execute callbacks into the
main lookup look to be symmetric with the grab.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:52 -05:00
Dave Chinner
65d0f20533 xfs: split inode AG walking into separate code for reclaim
The reclaim walk requires different locking and has a slightly
different walk algorithm, so separate it out so that it can be
optimised separately.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:52 -05:00
Dave Chinner
69d6cc76cf xfs: remove buftarg hash for external devices
For RT and external log devices, we never use hashed buffers on them
now.  Remove the buftarg hash tables that are set up for them.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:51 -05:00
Dave Chinner
1922c949c5 xfs: use unhashed buffers for size checks
When we are checking we can access the last block of each device, we
do not need to use cached buffers as they will be tossed away
immediately. Use uncached buffers for size checks so that all IO
prior to full in-memory structure initialisation does not use the
buffer cache.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:50 -05:00
Dave Chinner
26af655233 xfs: kill XBF_FS_MANAGED buffers
Filesystem level managed buffers are buffers that have their
lifecycle controlled by the filesystem layer, not the buffer cache.
We currently cache these buffers, which makes cleanup and cache
walking somewhat troublesome. Convert the fs managed buffers to
uncached buffers obtained by via xfs_buf_get_uncached(), and remove
the XBF_FS_MANAGED special cases from the buffer cache.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:49 -05:00
Dave Chinner
ebad861b57 xfs: store xfs_mount in the buftarg instead of in the xfs_buf
Each buffer contains both a buftarg pointer and a mount pointer. If
we add a mount pointer into the buftarg, we can avoid needing the
b_mount field in every buffer and grab it from the buftarg when
needed instead. This shrinks the xfs_buf by 8 bytes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:48 -05:00
Dave Chinner
5adc94c247 xfs: introduced uncached buffer read primitve
To avoid the need to use cached buffers for single-shot or buffers
cached at the filesystem level, introduce a new buffer read
primitive that bypasses the cache an reads directly from disk.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:47 -05:00
Dave Chinner
686865f76e xfs: rename xfs_buf_get_nodaddr to be more appropriate
xfs_buf_get_nodaddr() is really used to allocate a buffer that is
uncached. While it is not directly assigned a disk address, the fact
that they are not cached is a more important distinction. With the
upcoming uncached buffer read primitive, we should be consistent
with this disctinction.

While there, make page allocation in xfs_buf_get_nodaddr() safe
against memory reclaim re-entrancy into the filesystem by allowing
a flags parameter to be passed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:46 -05:00
Dave Chinner
dcd79a1423 xfs: don't use vfs writeback for pure metadata modifications
Under heavy multi-way parallel create workloads, the VFS struggles
to write back all the inodes that have been changed in age order.
The bdi flusher thread becomes CPU bound, spending 85% of it's time
in the VFS code, mostly traversing the superblock dirty inode list
to separate dirty inodes old enough to flush.

We already keep an index of all metadata changes in age order - in
the AIL - and continued log pressure will do age ordered writeback
without any extra overhead at all. If there is no pressure on the
log, the xfssyncd will periodically write back metadata in ascending
disk address offset order so will be very efficient.

Hence we can stop marking VFS inodes dirty during transaction commit
or when changing timestamps during transactions. This will keep the
inodes in the superblock dirty list to those containing data or
unlogged metadata changes.

However, the timstamp changes are slightly more complex than this -
there are a couple of places that do unlogged updates of the
timestamps, and the VFS need to be informed of these. Hence add a
new function xfs_trans_ichgtime() for transactional changes,
and leave xfs_ichgtime() for the non-transactional changes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-10-18 15:07:45 -05:00
Dave Chinner
e176579e70 xfs: lockless per-ag lookups
When we start taking a reference to the per-ag for every cached
buffer in the system, kernel lockstat profiling on an 8-way create
workload shows the mp->m_perag_lock has higher acquisition rates
than the inode lock and has significantly more contention. That is,
it becomes the highest contended lock in the system.

The perag lookup is trivial to convert to lock-less RCU lookups
because perag structures never go away. Hence the only thing we need
to protect against is tree structure changes during a grow. This can
be done simply by replacing the locking in xfs_perag_get() with RCU
read locking. This removes the mp->m_perag_lock completely from this
path.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:44 -05:00
Dave Chinner
bd32d25a7c xfs: remove debug assert for per-ag reference counting
When we start taking references per cached buffer to the the perag
it is cached on, it will blow the current debug maximum reference
count assert out of the water. The assert has never caught a bug,
and we have tracing to track changes if there ever is a problem,
so just remove it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:43 -05:00
Dave Chinner
d1583a3833 xfs: reduce the number of CIL lock round trips during commit
When commiting a transaction, we do a lock CIL state lock round trip
on every single log vector we insert into the CIL. This is resulting
in the lock being as hot as the inode and dcache locks on 8-way
create workloads. Rework the insertion loops to bring the number
of lock round trips to one per transaction for log vectors, and one
more do the busy extents.

Also change the allocation of the log vector buffer not to zero it
as we copy over the entire allocated buffer anyway.

This patch also includes a structural cleanup to the CIL item
insertion provided by Christoph Hellwig.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:42 -05:00
Poyo VL
9c169915ad xfs: eliminate some newly-reported gcc warnings
Ionut Gabriel Popescu <poyo_vl@yahoo.com> submitted a simple change
to eliminate some "may be used uninitialized" warnings when building
XFS.  The reported condition seems to be something that GCC did not
used to recognize or report.  The warnings were produced by:

    gcc version 4.5.0 20100604
    [gcc-4_5-branch revision 160292] (SUSE Linux)

Signed-off-by: Ionut Gabriel Popescu <poyo_vl@yahoo.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:39 -05:00
Christoph Hellwig
c0e59e1ac0 xfs: remove the ->kill_root btree operation
The implementation os ->kill_root only differ by either simply
zeroing out the now unused buffer in the btree cursor in the inode
allocation btree or using xfs_btree_setbuf in the allocation btree.

Initially both of them used xfs_btree_setbuf, but the use in the
ialloc btree was removed early on because it interacted badly with
xfs_trans_binval.

In addition to zeroing out the buffer in the cursor xfs_btree_setbuf
updates the bc_ra array in the btree cursor, and calls
xfs_trans_brelse on the buffer previous occupying the slot.

The bc_ra update should be done for the alloc btree updated too,
although the lack of it does not cause serious problems.  The
xfs_trans_brelse call on the other hand is effectively a no-op in
the end - it keeps decrementing the bli_recur refcount until it hits
zero, and then just skips out because the buffer will always be
dirty at this point.  So removing it for the allocation btree is
just fine.

So unify the code and move it to xfs_btree.c.  While we're at it
also replace the call to xfs_btree_setbuf with a NULL bp argument in
xfs_btree_del_cursor with a direct call to xfs_trans_brelse given
that the cursor is beeing freed just after this and the state
updates are superflous.  After this xfs_btree_setbuf is only used
with a non-NULL bp argument and can thus be simplified.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:38 -05:00
Christoph Hellwig
acecf1b5d8 xfs: stop using xfs_qm_dqtobp in xfs_qm_dqflush
In xfs_qm_dqflush we know that q_blkno must be initialized already from a
previous xfs_qm_dqread.  So instead of calling xfs_qm_dqtobp we can
simply read the quota buffer directly.  This also saves us from a duplicate
xfs_qm_dqcheck call check and allows xfs_qm_dqtobp to be simplified now
that it is always called for a newly initialized inode.  In addition to
that properly unwind all locks in xfs_qm_dqflush when xfs_qm_dqcheck
fails.

This mirrors a similar cleanup in the inode lookup done earlier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-18 15:07:37 -05:00