In the process of writing up the mechanical proof of correctness for the
dynticks/preemptable-RCU interface, I noticed misplaced memory barriers in
rcu_enter_nohz() and rcu_exit_nohz().
This patch puts them in the right place and adds a comment. The key thing to
keep in mind is that rcu_enter_nohz() is -exiting- the mode that can legally
execute RCU read-side critical sections.
The memory barrier must be between any potential RCU read-side critical
sections and the increment of the per-CPU dynticks_progress_counter, and thus
must come -before- this increment. And vice versa for rcu_exit_nohz().
The locking in the scheduler is probably saving us for the moment.
Also, switch to smp_mb() - we don't need a barrier for uniprocessor kernels.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since 4c7ffe0b9f ("fbdev: prevent drivers that
have hardware cursors from calling software cursor code") every call of
i810fb_cursor fails with -ENXIO because of a incorrect "!".
This hasn't struck until eaa0ff15c3 ("fix !
versus & precedence in various places") surrounded the expression with braces,
so that the intended behavior was inverted. That caused 'pixel waste' - the
same line of multi-colored pixels repeated over the whole screen - during
console switch.
This switches back to the original pre-4c7ffe0 behavior.
Signed-off-by: Stefan Bauer <stefan.bauer@cs.tu-chemnitz.de>
Tested-by: Stefan Bauer <stefan.bauer@cs.tu-chemnitz.de>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Antonino Daplas <adaplas@pol.net>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a long-standing typo (predating git) that will cause data corruption if a
journal data block needs unescaping. At the moment the wrong buffer head's
data is being unescaped.
To test this case mount a filesystem with data=journal, start creating and
deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then
pull the plug on the device. Without this patch the files will contain zeros
instead of the correct data after recovery.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a long-standing typo (predating git) that will cause data corruption if a
journal data block needs unescaping. At the moment the wrong buffer head's
data is being unescaped.
To test this case mount a filesystem with data=journal, start creating and
deleting a bunch of files containing only JFS_MAGIC_NUMBER (0xc03b3998), then
pull the plug on the device. Without this patch the files will contain zeros
instead of the correct data after recovery.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the SYSV ipc msgctl(),semctl(),shmctl() family, if the user passed *_INFO
as the desired operation, no specific object is meant to be controlled and
only system-wide information is returned. This leads to a NULL IPC object in
the LSM hooks if the _INFO flag is given.
Avoid dereferencing this NULL pointer in Smack ipc *ctl() methods.
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix up an error in iget removal in which romfs_lookup() making a successful
call to romfs_iget() continues through the negative/error handling (previously
the successful case jumped around the negative/error handling case):
(1) inode is initialised to NULL at the top of the function, eliminating the
need for specific negative-inode handling. This means the positive
success handling now flows straight through.
(2) Rename the labels to be clearer about what they mean.
Also make romfs_lookup()'s result variable of type long so as to avoid
32-bit/64-bit conversions with PTR_ERR() and friends.
Based upon a report and patch from Adam Richter.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: "Adam J. Richter" <adam@yggdrasil.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are several places where we make allocations with GFP_KERNEL while under
a transaction, which could lead to an assertion panic or lockup if under
memory pressure. This patch switches these problem areas to use GFP_NOFS to
keep these problems from happening.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ibmpex's temperature sensors report incorrect units. Apply a conversion
factor so that tempertures report correctly. Until now, no systems seemed to
report temperatures this way, but evidently QS2x blades do.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The check t->pid == t->pid is not the blessed way to check whether a task is a
group leader.
This is not about the code beautifulness only, but about pid namespaces fixes
- both the tgid and the pid fields on the task_struct are (slowly :( )
becoming deprecated.
Besides, the thread_group_leader() macro makes only one dereference :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Exposing the binary blob which is the md 'super-block' via sysfs doesn't
really fit with the whole sysfs model, and ever since commit
8118a859dc ("sysfs: fix off-by-one error
in fill_read_buffer()") it doesn't actually work at all (as the size of
the blob is often one page).
(akpm: as in, fs/sysfs/file.c:fill_read_buffer() goes BUG)
So just remove it altogether. It isn't really useful.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix various kernel-doc notation in mm/:
filemap.c: add function short description; convert 2 to kernel-doc
fremap.c: change parameter 'prot' to @prot
pagewalk.c: change "-" in function parameters to ":"
slab.c: fix short description of kmem_ptr_validate()
swap.c: fix description & parameters of put_pages_list()
swap_state.c: fix function parameters
vmalloc.c: change "@returns" to "Returns:" since that is not a parameter
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
My group ran into a AIO process hang on a 2.6.24 kernel with the process
sleeping indefinitely in io_getevents(2) waiting for the last wakeup to come
and it never would.
We ran the tests on x86_64 SMP. The hang only occurred on a Xeon box
("Clovertown") but not a Core2Duo ("Conroe"). On the Xeon, the L2 cache isn't
shared between all eight processors, but is L2 is shared between between all
two processors on the Core2Duo we use.
My analysis of the hang is if you go down to the second while-loop
in read_events(), what happens on processor #1:
1) add_wait_queue_exclusive() adds thread to ctx->wait
2) aio_read_evt() to check tail
3) if aio_read_evt() returned 0, call [io_]schedule() and sleep
In aio_complete() with processor #2:
A) info->tail = tail;
B) waitqueue_active(&ctx->wait)
C) if waitqueue_active() returned non-0, call wake_up()
The way the code is written, step 1 must be seen by all other processors
before processor 1 checks for pending events in step 2 (that were recorded by
step A) and step A by processor 2 must be seen by all other processors
(checked in step 2) before step B is done.
The race I believed I was seeing is that steps 1 and 2 were
effectively swapped due to the __list_add() being delayed by the L2
cache not shared by some of the other processors. Imagine:
proc 2: just before step A
proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet
proc 1, step 2: checks tail and sees no pending events
proc 2, step A: updates tail
proc 1, step 3: calls [io_]schedule() and sleeps
proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup
so proc 1 sleeps indefinitely
My patch adds a memory barrier between steps A and B. It ensures that the
update in step 1 gets seen on processor 2 before continuing. If processor 1
was just before step 1, the memory barrier makes sure that step A (update
tail) gets seen by the time processor 1 makes it to step 2 (check tail).
Before the patch our AIO process would hang virtually 100% of the time. After
the patch, we have yet to see the process ever hang.
Signed-off-by: Quentin Barnes <qbarnes+linux@yahoo-inc.com>
Reviewed-by: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: <stable@kernel.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ We should probably disallow that "if (waitqueue_active()) wake_up()"
coding pattern, because it's so often buggy wrt memory ordering ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The PCI bridge representing the PCIE root complex on Axon, contains
device BARs for a memory range and ROM that define inbound accesses.
This confuses the kernel resource management code -- the resources
need to be hidden when Axon is a host bridge.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The cell IOMMU code to parse the dma-ranges properties, used for the fixed
mapping, was broken in two ways for some devices.
Firstly it didn't cope with empty dma-ranges properties. An empty property
implies no translation so can be safely skipped.
The code also wrongly assumed it would be looking at PCI devices, and hard
coded the number of address and size cells.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The wrapper script didn't have entries for the TQM8540 board and the
SBC8548 or SBC8560 boards. I've assumed that the TQM8540 console is
8250 based and not CPM based by looking at its defconfig. There was
also a trailing * on the TQM8555 entry that I removed too.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since the PMU is an NMI now, it can come at any time we are only soft
disabled. We must hard disable around the two places we allow the kernel
stack SLB and r1 to go out of sync. Otherwise the PMU exception can
force a kernel stack SLB into another slot, which can lead to it
getting evicted, which can lead to a nasty unrecoverable SLB miss
in the exception entry code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The PTRACE_SETREGS request was only recently added on powerpc,
and gdb does not use it. So it slipped through without getting
all the testing it should have had.
The user_regset changes had a simple bug in storing to all of
the 32-bit general registers block on 64-bit kernels. This bug
only comes up with PTRACE_SETREGS, not PPC_PTRACE_SETREGS.
It causes a BUG_ON to hit, so this fix needs to go in ASAP.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Clean up: refactor the encoding of the opaque 16-byte private argument in
xdr_encode_mon(). This will be updated later to support IPv6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Switch to using the new mon_id encoder function.
Now that we've refactored the encoding of SM_MON requests, we've
discovered that the pre-computed buffer length maximums are
incorrect!
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: document the argument type that xdr_encode_common() is
marshalling by introducing a new function. The new function will replace
xdr_encode_common() in just a sec.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: introduce a new XDR encoder specifically for the my_id
argument of SM_MON requests.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: introduce a new XDR encoder specifically for the mon_name
argument of SM_MON requests. This will be updated later to support IPv6
addresses in addition to IPv4 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Introduce a special helper function to check the length of NSM strings
before they are placed on the wire.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean-up: replace __inline__ and use up-to-date function declaration
conventions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: RPC protocol version numbers are u32. Make sure we use an
appropriate type for NLM version numbers when calling nlm_lookup_host().
Eliminates a harmless mixed sign comparison in nlm_host_lookup().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bruce Fields says:
"By the way, we've got another config-related nit here:
http://bugzilla.linux-nfs.org/show_bug.cgi?id=156
You can build lockd without CONFIG_SYSCTL set, but then the module will
fail to load."
For now, disable the sysctl registration calls in lockd if CONFIG_SYSCTL
is not enabled. This allows the kernel to build properly if PROC_FS or
SYSCTL is not enabled, but an NFS client is desired.
In the long run, we would like to be able to build the kernel with an
NFS client but without lockd. This makes sense, for example, if you want
an NFSv4-only NFS client, as NFSv4 doesn't use NLM at all.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Most distros will want support for rpcbind protocols 3 and 4 to default off
until they have integrated user-space support for the new rpcbind daemon
which supports IPv6 RPC services.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: refresh the help text for Kconfig items related to the sunrpc
module. Remove obsolete URLs, and make the language consistent among
the options.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since O_DIRECT is a standard feature that is enabled in most distros,
eliminate the CONFIG_NFS_DIRECTIO build option, and change the
fs/nfs/Makefile to always build in the NFS direct I/O engine.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Save the value of the mountproto= mountport= mountvers= and mountaddr=
options so that these values can be displayed later via
nfs_show_options().
This preserves the intent of the original mount options, should the file
system need to be remounted based on what's displayed in /proc/mounts.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
During a remount based on the mount options displayed in /proc/mounts, we
want to preserve the original behavior of the mount request. Let's save
the original setting of the "port=" mount option in the mount's nfs_server
structure.
This allows us to simplify the default behavior of port setting for NFSv4
mounts: by default, NFSv2/3 mounts first try an RPC bind to determine the
NFS server's port, unless the user specified the "port=" mount option;
Users can force the client to skip the RPC bind by explicitly specifying
"port=<value>".
NFSv4, by contrast, assumes the NFS server port is 2049 and skips the RPC
bind, unless the user specifies "port=". Users can force an RPC bind for
NFSv4 by explicitly specifying "port=0".
I added a couple of extra comments to clarify this behavior.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: make data types of fields in nfs_parsed_mount_options more
consistent with other uses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: use %u instead of %d when displaying NFS mount options.
Nit: Fix reporting of "namlen=" option in nfs_show_mount_stats. The mount
option is called "namlen" without the "e".
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.
This patch does 3 things:
1) have malformed responses with no entries return error (-EIO)
2) preserve existing workaround for servers that send empty
responses with the EOF marker unset.
3) Add some comments to clarify the logic in decode_readdir().
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.
This patch does 3 things:
1) have malformed responses with no entries return error (-EIO)
2) preserve existing workaround for servers that send empty
responses with the EOF marker unset.
3) Add some comments to clarify the logic in nfs3_xdr_readdirres().
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.
This patch does 3 things:
1) have malformed responses with no entries return error (-EIO)
2) preserve existing workaround for servers that send empty
responses with the EOF marker unset.
3) Add some comments to clarify the logic in nfs_xdr_readdirres().
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Both flush functions have the same error handling routine. Pull
it out as a function.
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>