On 3.11-rc we are seeing cgroup directories left behind when they should
have been removed. Here's a trivial reproducer:
cd /sys/fs/cgroup/memory
mkdir parent parent/child; rmdir parent/child parent
rmdir: failed to remove `parent': Device or resource busy
It's because cgroup_destroy_locked() (step 1 of destruction) leaves
cgroup on parent's children list, letting cgroup_offline_fn() (step 2 of
destruction) remove it; but step 2 is run by work queue, which may not
yet have removed the children when parent destruction checks the list.
Fix that by checking through a non-empty list of children: if every one
of them has already been marked CGRP_DEAD, then it's safe to proceed:
those children are invisible to userspace, and should not obstruct rmdir.
(I didn't see any reason to keep the cgrp->children checks under the
unrelated css_set_lock, so moved them out.)
tj: Flattened nested ifs a bit and updated comment so that it's
correct on both for-3.11-fixes and for-3.12.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
If !PREEMPT, a kworker running work items back to back can hog CPU.
This becomes dangerous when a self-requeueing work item which is
waiting for something to happen races against stop_machine. Such
self-requeueing work item would requeue itself indefinitely hogging
the kworker and CPU it's running on while stop_machine would wait for
that CPU to enter stop_machine while preventing anything else from
happening on all other CPUs. The two would deadlock.
Jamie Liu reports that this deadlock scenario exists around
scsi_requeue_run_queue() and libata port multiplier support, where one
port may exclude command processing from other ports. With the right
timing, scsi_requeue_run_queue() can end up requeueing itself trying
to execute an IO which is asked to be retried while another device has
an exclusive access, which in turn can't make forward progress due to
stop_machine.
Fix it by invoking cond_resched() after executing each work item.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jamie Liu <jamieliu@google.com>
References: http://thread.gmane.org/gmane.linux.kernel/1552567
Cc: stable@vger.kernel.org
--
kernel/workqueue.c | 9 +++++++++
1 file changed, 9 insertions(+)
Merge fixes from Andrew Morton:
"Five fixes.
err, make that six. let me try again"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbers
memcg: check that kmem_cache has memcg_params before accessing it
drivers/base/memory.c: fix show_mem_removable() to handle missing sections
IPC: bugfix for msgrcv with msgtyp < 0
Omnikey Cardman 4000: pull in ioctl.h in user header
timer_list: correct the iterator for timer_list
While using pacemaker/corosync, the node numbers are generated using IP
address as opposed to serial node number generation. This may not fit
in a 8-byte string. Use a bigger string to print the complete node
number.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the system had a few memory groups and all of them were destroyed,
memcg_limited_groups_array_size has non-zero value, but all new caches
are created without memcg_params, because memcg_kmem_enabled() returns
false.
We try to enumirate child caches in a few places and all of them are
potentially dangerous.
For example my kernel is compiled with CONFIG_SLAB and it crashed when I
tryed to mount a NFS share after a few experiments with kmemcg.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8118166a>] do_tune_cpucache+0x8a/0xd0
PGD b942a067 PUD b999f067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: fscache(+) ip6table_filter ip6_tables iptable_filter ip_tables i2c_piix4 pcspkr virtio_net virtio_balloon i2c_core floppy
CPU: 0 PID: 357 Comm: modprobe Not tainted 3.11.0-rc7+ #59
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8800b9f98240 ti: ffff8800ba32e000 task.ti: ffff8800ba32e000
RIP: 0010:[<ffffffff8118166a>] [<ffffffff8118166a>] do_tune_cpucache+0x8a/0xd0
RSP: 0018:ffff8800ba32fb70 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: ffff8800b9f98910 RDI: 0000000000000246
RBP: ffff8800ba32fba0 R08: 0000000000000002 R09: 0000000000000004
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000010
R13: 0000000000000008 R14: 00000000000000d0 R15: ffff8800375d0200
FS: 00007f55f1378740(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f24feba57a0 CR3: 0000000037b51000 CR4: 00000000000006f0
Call Trace:
enable_cpucache+0x49/0x100
setup_cpu_cache+0x215/0x280
__kmem_cache_create+0x2fa/0x450
kmem_cache_create_memcg+0x214/0x350
kmem_cache_create+0x2b/0x30
fscache_init+0x19b/0x230 [fscache]
do_one_initcall+0xfa/0x1b0
load_module+0x1c41/0x26d0
SyS_finit_module+0x86/0xb0
system_call_fastpath+0x16/0x1b
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to 'man msgrcv': "If msgtyp is less than 0, the first message of
the lowest type that is less than or equal to the absolute value of msgtyp
shall be received."
Bug: The kernel only returns a message if its type is 1; other messages
with type < abs(msgtype) will never get returned.
Fix: After having traversed the list to find the first message with the
lowest type, we need to actually return that message.
This regression was introduced by commit daaf74cf08 ("ipc: refactor
msg list search into separate function")
Signed-off-by: Svenning Soerensen <sss@secomea.dk>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This file uses the ioctl helpers (_IOR/_IOW/etc...), so include ioctl.h
for the definitions.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Harald Welte <laforge@gnumonks.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Correct an issue with /proc/timer_list reported by Holger.
When reading from the proc file with a sufficiently small buffer, 2k so
not really that small, there was one could get hung trying to read the
file a chunk at a time.
The timer_list_start function failed to account for the possibility that
the offset was adjusted outside the timer_list_next.
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Reported-by: Holger Hans Peter Freyther <holger@freyther.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Berke Durak <berke.durak@xiphos.com>
Cc: Jeff Layton <jlayton@redhat.com>
Tested-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org> # 3.10.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This just replaces the dentry count/lock combination with the lockref
structure that contains both a count and a spinlock, and does the
mechanical conversion to use the lockref infrastructure.
There are no semantic changes here, it's purely syntactic. The
reference lockref implementation uses the spinlock exactly the same way
that the old dcache code did, and the bulk of this patch is just
expanding the internal "d_count" use in the dcache code to use
"d_lockref.count" instead.
This is purely preparation for the real change to make the reference
count updates be lockless during the 3.12 merge window.
[ As with the previous commit, this is a rewritten version of a concept
originally from Waiman, so credit goes to him, blame for any errors
goes to me.
Waiman's patch had some semantic differences for taking advantage of
the lockless update in dget_parent(), while this patch is
intentionally a pure search-and-replace change with no semantic
changes. - Linus ]
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This introduces a new "lockref" structure that supports the concept of
lockless updates of reference counts that still honor an attached
spinlock.
NOTE! This reference implementation is not the optimized lockless
version, rather it is the fallback implementation using standard
spinlocks. The actual optimized versions will be merged into 3.12, but
I wanted to get the infrastructure in place and document the new
interfaces.
[ Also note that this particular commit is drastically cut-down minimal
version of the original patch by Waiman. In order to properly credit
the original author I'm marking Waiman as the author here, but in the
end this patch bears little resemblance to the patch by Waiman. So
blame any errors on me editing things down to the point where I can
introduce the infrastructure before the merge window for 3.12 actually
opens. - Linus ]
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds another entry (HP hs2434 Mobile Broadband) to the list
of exceptional devices that require a zero length packet in order to
function properly. This list was added in commit 844e88f0. The hs2434
is manufactured by Sierra Wireless, who also produces the MC7710,
which the ZLP exception list was created for in the first place. So
hopefully it is just this one producer's devices that will need this
workaround.
Tested on a DM1-4310NR HP notebook, which does not function without this
change.
Signed-off-by: Rob Gardner <robmatic@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a cpu_relaxt to sk_busy_loop.
Julie Cummings reported performance issues when hyperthreading is on.
Arjan van de Ven observed that we should have a cpu_relax() in the
busy poll loop.
Reported-by: Julie Cummings <julie.a.cummings@intel.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixed the pbl(programmable burst length) setting
using DT. Even though the default pbl is 8, If there is no
pbl property in device tree file, pbl is set 0 and it causes
bandwidth degradation.
Signed-off-by: Byungho An <bh74.an@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netlink dump operations take module as parameter to hold
reference for entire netlink dump duration.
Currently it holds ref only on genl module which is not correct
when we use ops registered to genl from another module.
Following patch adds module pointer to genl_ops so that netlink
can hold ref count on it.
CC: Jesse Gross <jesse@nicira.com>
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of genl-family with parallel ops off, dumpif() callback
is expected to run under genl_lock, But commit def3117493
(genl: Allow concurrent genl callbacks.) changed this behaviour
where only first dumpit() op was called under genl-lock.
For subsequent dump, only nlk->cb_lock was taken.
Following patch fixes it by defining locked dumpit() and done()
callback which takes care of genl-locking.
CC: Jesse Gross <jesse@nicira.com>
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some architectures, such as ARM-32 do not return the same base address
when you call kmap_atomic() twice on the same page.
This causes problems for the memmove() call in the XDR helper routine
"_shift_data_right_pages()", since it defeats the detection of
overlapping memory ranges, and has been seen to corrupt memory.
The fix is to distinguish between the case where we're doing an
inter-page copy or not. In the former case of we know that the memory
ranges cannot possibly overlap, so we can additionally micro-optimise
by replacing memmove() with memcpy().
Reported-by: Mark Young <MYoung@nvidia.com>
Reported-by: Matt Craighead <mcraighead@nvidia.com>
Cc: Bruce Fields <bfields@fieldses.org>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Matt Craighead <mcraighead@nvidia.com>
This reverts commit bb2314b479.
It wasn't necessarily wrong per se, but we're still busily discussing
the exact details of this all, so I'm going to revert it for now.
It's true that you can already do flink() through /proc and that flink()
isn't new. But as Brad Spengler points out, some secure environments do
not mount proc, and flink adds a new interface that can avoid path
lookup of the source for those kinds of environments.
We may re-do this (and even mark it for stable backporting back in 3.11
and possibly earlier) once the whole discussion about the interface is done.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The net_device might be not set on the skb when we try refcounting.
This leads to a null pointer dereference in xdst_queue_output().
It turned out that the refcount to the net_device is not needed
after all. The dst_entry has a refcount to the net_device before
we queue the skb, so it can't go away. Therefore we can remove the
refcount on queueing to fix the null pointer dereference.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The recent commit to delay the release of kobject triggered NULL
dereferences of opti9xx drivers. The cause is that all
snd-opti92x-ad1848, snd-opti92x-cs4231 and snd-opti93x drivers
register the PnP card driver with the very same name, and also
snd-opti92x-ad1848 and -cs4231 drivers register the ISA driver with
the same name, too. When these drivers are built in, quick
"register-release-and-re-register" actions occur, and this results in
Oops because of the same name is assigned to the kobject.
The fix is simply to assign individual names. As a bonus, by using
KBUILD_MODNAME, the patch reduces more lines than it adds.
The fix is based on the suggestion by Russell King.
Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Since the PF gathers statistics for the VF, when the VF is about to unload
we must synchronize the release of its statistics buffer with the PF, so that
no DMA operation will be made to that address after the buffer release.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to incorrect VF/PF conditions, when unloading a VF it will not release
part of the memory it has previously allocated.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check on return code of bnx2x_vfop_config_vlan0() would lead to error
handling flow as the return value indicating an existing pending ramrod would
be erroneously considered as an error.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If driver will fail to allocate all queues, it will shrink the number of
queues and move the storage queue to its correct place (i.e., the last
queue among the newly supported number).
When changing the pointers of the new location of the FCoE queue, we need
to pay special attention to the aggregations pointer - that memory is allocated
during probe and released upon driver removal. Current implementation has 2
pointers pointing to the same chunk of allocated memory, meaning upon removal
there will be two kfree() of the same chunk while the other won't be released.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Solve issue where no stats were being collected for VF devices due to missing
configuration in the stats' atomic synchronization mechanism.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following we will begin to add memcg dirty page accounting around
__set_page_dirty_{buffers,nobuffers} in vfs layer, so we'd better use vfs interface to
avoid exporting those details to filesystems.
Since vfs set_page_dirty() should be called under page lock, here we don't need elaborate
codes to handle racy anymore, and two WARN_ON() are added to detect such exceptions.
Thanks very much for Sage and Yan Zheng's coaching!
I tested it in a two server's ceph environment that one is client and the other is
mds/osd/mon, and run the following fsx test from xfstests:
./fsx 1MB -N 50000 -p 10000 -l 1048576
./fsx 10MB -N 50000 -p 10000 -l 10485760
./fsx 100MB -N 50000 -p 10000 -l 104857600
The fsx does lots of mmap-read/mmap-write/truncate operations and the tests completed
successfully without triggering any of WARN_ON.
Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
Reviewed-by: Sage Weil <sage@inktank.com>
John W. Linville says:
====================
This is one more set of fixes intended for the 3.11 stream...
For the mac80211 bits, Johannes says:
"I have three more patches for the 3.11 stream: Felix's fix for the
fairly visible brcmsmac crash, a fix from Simon for an IBSS join bug I
found and a fix for a channel context bug in IBSS I'd introduced."
Along with those...
Sujith Manoharan makes a minor change to not use a PLL hang workaroun
for AR9550. This one-liner fixes a couple of bugs reported in the Red Hat
bugzilla.
Helmut Schaa addresses an ath9k_htc bug that mangles frame headers
during Tx. This fix is small, tested by the bug reported and isolated
to ath9k_htc.
Stanislaw Gruszka reverts a recent iwl4965 change that broke rfkill
notification to user space.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
For sync_read/write, it may do multi stripe operations.If one of those
met erro, we return the former successed size rather than a error value.
There is a exception for write-operation met -EOLDSNAPC.If this occur,we
retry the whole write again.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
cephfs . show_layout
>layyout.data_pool: 0
>layout.object_size: 4194304
>layout.stripe_unit: 4194304
>layout.stripe_count: 1
TestA:
>dd if=/dev/urandom of=test bs=1M count=2 oflag=direct
>dd if=/dev/urandom of=test bs=1M count=2 seek=4 oflag=direct
>dd if=test of=/dev/null bs=6M count=1 iflag=direct
The messages from func striped_read are:
ceph: file.c:350 : striped_read 0~6291456 (read 0) got 2097152 HITSTRIPE SHORT
ceph: file.c:350 : striped_read 2097152~4194304 (read 2097152) got 0 HITSTRIPE SHORT
ceph: file.c:381 : zero tail 4194304
ceph: file.c:390 : striped_read returns 6291456
The hole of file is from 2M--4M.But actualy it zero the last 4M include
the last 2M area which isn't a hole.
Using this patch, the messages are:
ceph: file.c:350 : striped_read 0~6291456 (read 0) got 2097152 HITSTRIPE SHORT
ceph: file.c:358 : zero gap 2097152 to 4194304
ceph: file.c:350 : striped_read 4194304~2097152 (read 4194304) got 2097152
ceph: file.c:384 : striped_read returns 6291456
TestB:
>echo majianpeng > test
>dd if=test of=/dev/null bs=2M count=1 iflag=direct
The messages are:
ceph: file.c:350 : striped_read 0~6291456 (read 0) got 11 HITSTRIPE SHORT
ceph: file.c:350 : striped_read 11~6291445 (read 11) got 0 HITSTRIPE SHORT
ceph: file.c:390 : striped_read returns 11
For this case,it did once more striped_read.It's no meaningless.
Using this patch, the message are:
ceph: file.c:350 : striped_read 0~6291456 (read 0) got 11 HITSTRIPE SHORT
ceph: file.c:384 : striped_read returns 11
Big thanks to Yan Zheng for the patch.
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
create_singlethread_workqueue() returns NULL on error, and it doesn't
return ERR_PTRs.
I tweaked the error handling a little to be consistent with earlier in
the function.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Sage Weil <sage@inktank.com>
There are two places where we read "nr_maps" if both of them are set to
zero then we would hit a NULL dereference here.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Sage Weil <sage@inktank.com>
We've tried to fix the error paths in this function before, but there
is still a hidden goto in the ceph_decode_need() macro which goes to the
wrong place. We need to release the "req" and unlock a mutex before
returning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Since commit 82dc3c63 ("net: introduce NAPI_POLL_WEIGHT")
netif_napi_add() produces an error message if a NAPI poll weight
greater than 64 is requested.
GELIC_NET_NAPI_WEIGHT is defined to GELIC_NET_RX_DESCRIPTORS,
which is 128.
Use the standard NAPI weight.
v2: proper reference to the related commit
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 82dc3c63 ("net: introduce NAPI_POLL_WEIGHT")
netif_napi_add() produces an error message if a NAPI poll weight
greater than 64 is requested.
Use the standard NAPI weight.
v2: proper reference to the related commit
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 82dc3c63 ("net: introduce NAPI_POLL_WEIGHT")
netif_napi_add() produces an error message if a NAPI poll weight
greater than 64 is requested.
jme requests a quarter of the rx ring size as the NAPI weight.
jme's rx ring size is 1 << 9 = 512.
Use the standard NAPI weight.
v2: proper reference to the related commit
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nsproxy.pid_ns is *not* the task's pid namespace. The name should clarify
that.
This makes it more obvious that setns on a pid namespace is weird --
it won't change the pid namespace shown in procfs.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a security bug.
The follow-up will fix nsproxy to discourage this type of issue from
happening again.
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two changes here:
- Fix a bug in the rbtree code which could cause it to create two
different cache entries for the same register by adding a single
register at a time to the cache. This isn't awesome for performance
but it's non-invasive which we need for this late in the release
cycle and the I/O costs we're trying to avoid are high.
- Add another header used in the !CONFIG_REGMAP stubs where we had been
relying on implicit inclusion.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSHMPxAAoJELSic+t+oim9gTcP/iKwWc/yO33EGXx8adrDdy71
UNjN3KMpYxWxEK/26cetNtlOTv1p4XeXJsaML52V2e6/aRfYJwAai2cBBEUvlIvD
+QS5oWGSWO/7lqvUndOhI2SlcI9Ewzuc23aTfz1LZUiug3Eo3EKc4FnWASmWjRoC
vXO63OJ1EK5yOnffiOQkSiMZ4fNWAOp7j/LbEkfmWQ3yYXpVXZr5og9qSn/yHmJN
RiDL+wCrQx4LJpPhq/DinWsaa9R1LdAeRESFgwibxBh4YDUb2VWvLiU6x+WA/1Sf
SiEFUx3kkhsfjyWJCqWi/jBd47v5yOdzIFwTJFE0v0W2TLR8y1qEE3KlFLNZUyr4
9VL9pdRBJTlaCmDRnc//WKoZ8r0dSnulMZoDX1KvFPZus2Xn9FbeXmiGRAMwnUM0
yBF0gXX8YbCSVac7S9GdUtsDZIgr86nRiyfk3fLn42Glw/wUnXz9VDZphGh+49fD
k6VTbaDGgr85Aj2BR8m22WOEyX9ZdSpyeKWR0Ysvd/NrVt5Wwtn64FOTpxDJCGLb
bL730gfIUbcby5G2nWMhMmLOqBT5eaB9Sj0HkG6evrUraMHCemqzg7BnUJH07JkR
p6EKFBxr8D6GKauXv8RSXeJzI39SeoyXVPDoUS5JS9RiQgstzJGucm5RkF3JXgdL
fp7tSRUzH+NM2iAanvTt
=8r8F
-----END PGP SIGNATURE-----
Merge tag 'regmap-v3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
"Two changes here:
- Fix a bug in the rbtree code which could cause it to create two
different cache entries for the same register by adding a single
register at a time to the cache. This isn't awesome for
performance but it's non-invasive which we need for this late in
the release cycle and the I/O costs we're trying to avoid are high.
- Add another header used in the !CONFIG_REGMAP stubs where we had
been relying on implicit inclusion"
* tag 'regmap-v3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: rbtree: Fix overlapping rbnodes.
regmap: Add another missing header for !CONFIG_REGMAP stubs
Pull powerpc fixes from Ben Herrenschmidt:
"Here are 3 bug fixes that should probably go into 3.11 since I'm also
tagging them for stable.
Once fixes our old /proc/powerpc/lparcfg file which provides partition
informations when running under our hypervisor and also acts as a
user-triggerable Oops when hot :-(
The other two respectively are a one liner to fix a HVSI protocol
handshake problem causing the console to fail to show up on a bunch of
machines until we reach userspace, which I deem annoying enough to
warrant going to stable, and a nasty gcc miscompile causing us to pass
virtual instead of physical addresses to the firmware under some
circumstances"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/hvsi: Increase handshake timeout from 200ms to 400ms.
powerpc: Work around gcc miscompilation of __pa() on 64-bit
powerpc: Don't Oops when accessing /proc/powerpc/lparcfg without hypervisor
Dave reported corrupted swap entries
| [ 4588.541886] swap_free: Unused swap offset entry 00002d15
| [ 4588.541952] BUG: Bad page map in process trinity-kid12 pte:005a2a80 pmd:22c01f067
and Hugh pointed that in move_ptes _PAGE_SOFT_DIRTY bit set regardless
the type of entry pte consists of. The trick here is that when we carry
soft dirty status in swap entries we are to use _PAGE_SWP_SOFT_DIRTY
instead, because this is the only place in pte which can be used for own
needs without intersecting with bits owned by swap entry type/offset.
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Analyzed-by: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This solves a problem observed in kexec'ed kernel where 200ms timeout is
too short and bootconsole fails to initialize. Console did eventually
become workable but much later into the boot process.
Observed timeout was around 260ms, but I decided to make it a little bigger
for more reliability.
This has been tested on Power7 machine with Petitboot as a primary
bootloader and PowerNV firmware.
CC: <stable@vger.kernel.org>
Signed-off-by: Eugene Surovegin <surovegin@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On 64-bit, __pa(&static_var) gets miscompiled by recent versions of
gcc as something like:
addis 3,2,.LANCHOR1+4611686018427387904@toc@ha
addi 3,3,.LANCHOR1+4611686018427387904@toc@l
This ends up effectively ignoring the offset, since its bottom 32 bits
are zero, and means that the result of __pa() still has 0xC in the top
nibble. This happens with gcc 4.8.1, at least.
To work around this, for 64-bit we make __pa() use an AND operator,
and for symmetry, we make __va() use an OR operator. Using an AND
operator rather than a subtraction ends up with slightly shorter code
since it can be done with a single clrldi instruction, whereas it
takes three instructions to form the constant (-PAGE_OFFSET) and add
it on. (Note that MEMORY_START is always 0 on 64-bit.)
CC: <stable@vger.kernel.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
/proc/powerpc/lparcfg is an ancient facility (though still actively used)
which allows access to some informations relative to the partition when
running underneath a PAPR compliant hypervisor.
It makes no sense on non-pseries machines. However, currently, not only
can it be created on these if the kernel has pseries support, but accessing
it on such a machine will crash due to trying to do hypervisor calls.
In fact, it should also not do HV calls on older pseries that didn't have
an hypervisor either.
Finally, it has the plumbing to be a module but is a "bool" Kconfig option.
This fixes the whole lot by turning it into a machine_device_initcall
that is only created on pseries, and adding the necessary hypervisor
check before calling the H_GET_EM_PARMS hypercall
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Here is a single bugfix that resolves the "can not build the OHCI driver
with CONFIG_PM disabled" problem that lots of people have been reporting
with 3.11-rc7. Sorry about that one, it missed my build tests, and it
seems, a number of others as well.
Thank goodness for Guenter :)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.21 (GNU/Linux)
iEYEABECAAYFAlIcDYsACgkQMUfUDdst+ykhqwCaAykv4ypLeDQa9cuKoxZcDeRb
wEEAoLWUzWk/ic2XrXbNZ0/pJ69bMmzs
=2yMG
-----END PGP SIGNATURE-----
Merge tag 'usb-3.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB bugfix from Greg KH:
"Here is a single bugfix that resolves the "can not build the OHCI
driver with CONFIG_PM disabled" problem that lots of people have been
reporting with 3.11-rc7. Sorry about that one, it missed my build
tests, and it seems, a number of others as well.
Thank goodness for Guenter :)"
* tag 'usb-3.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: OHCI: fix build error related to ohci_suspend/resume
client reporting a readdir loop.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.21 (GNU/Linux)
iQIcBAABAgAGBQJSG9IHAAoJEDaohF61QIxkptgP/jTEooTZ2uMmIouqj6rhrc81
avrqtB26Ww74XBzuyiTuSBLUUJXGaLveC/i9rR2XOyA7cXJUbrqvqpZ2OExOURsf
gHeLX20RKwKgaRQ6R9Xcri6wjWql4YHL/z3tI/DGpDkMfA0siocbHW+GZSfmMZ8c
EOcAECY+tqrAgFL1oESTinglGNZ2V1f/IyKB8DiUDgDuMkV6Zw3083Ph7xB/bw9T
Jffg6nvkST2mzZNTKgU8W3extW/X9GxxleYLED2jwEYioOFVAlvSKZTFD65NVUya
1l3YH68jROnDSoHmgOup0C6i51/e7QFuBNdyR/8UK2OyOXkJ1wneXutUIiysWWty
dY9+kxSSdpNuZvflGrZlP7yoEjGgRO/893owUcusiSgLTpQpNs2y7OVjCBwsHGa7
AIWyu+FyOnNnmO6oiYkmNQlE3bAoz3z0CO7IuP5lm5HRgMIxp+k2k8yOHon/PcuF
juQsbMydGcNVAjJlQuxCDh1uijOGLbDol/NekpUnyL02oy294raiWdy4IUaqWIHG
Y5i1yeVwFbr7QsI8RCbEliCRwP5YMWSp4irEUozJTDplAn4AnJ5AJdYq9KM5JHMm
qBHXl/asWbxGQZhmig2xzojZ+HVPFVWilpBz7OvsMXJaulrRqxV3DgAudIlZPP4B
mB8DPzctwmgzmlgCqzRW
=+aYs
-----END PGP SIGNATURE-----
Merge tag 'jfs-3.11-rc8' of git://github.com/kleikamp/linux-shaggy
Pull jfs fix from Dave Kleikamp:
"One JFS patch to fix an incompatibility with NFSv4 resulting in the
nfs client reporting a readdir loop"
* tag 'jfs-3.11-rc8' of git://github.com/kleikamp/linux-shaggy:
jfs: fix readdir cookie incompatibility with NFSv4
Commit 9a11899c5e (USB: OHCI: add missing PCI PM callbacks to
ohci-pci.c) added missing ohci_suspend and ohci_resume callback
pointers, but forgot that these callbacks are declared and defined
only when CONFIG_PM is enabled.
This patch adds a preprocessor conditional to avoid build errors when
PM is disabled.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Meelis Roos <mroos@linux.ee>,
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In suspend-resume sequence, the OS could attempt to initialize the controller
before it is ready, check for POST state before going ahead.
Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>