If interrupts were enabled when taking the spinlock, we can leave them
enabled while blocking to get the lock.
If we can enable interrupts while waiting for the lock to become
available, and we take an interrupt before entering the poll,
and the handler takes a spinlock which ends up going into
the slow state (invalidating the per-cpu "lock" and "want" values),
then when the interrupt handler returns the event channel will
remain pending so the poll will return immediately, causing it to
return out to the main spinlock loop.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-12-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Maintain a flag in the LSB of the ticket lock tail which indicates
whether anyone is in the lock slowpath and may need kicking when
the current holder unlocks. The flags are set when the first locker
enters the slowpath, and cleared when unlocking to an empty queue (ie,
no contention).
In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking. We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:
Unlocker Locker
test for lock pickup
-> fail
unlock
test slowpath
-> false
set slowpath flags
block
Whereas this works in any ordering:
Unlocker Locker
set slowpath flags
test for lock pickup
-> fail
block
unlock
test slowpath
-> true, kick
If the unlocker finds that the lock has the slowpath flag set but it is
actually uncontended (ie, head == tail, so nobody is waiting), then it
clears the slowpath flag.
The unlock code uses a locked add to update the head counter. This also
acts as a full memory barrier so that its safe to subsequently
read back the slowflag state, knowing that the updated lock is visible
to the other CPUs. If it were an unlocked add, then the flag read may
just be forwarded from the store buffer before it was visible to the other
CPUs, which could result in a deadlock.
Unfortunately this means we need to do a locked instruction when
unlocking with PV ticketlocks. However, if PV ticketlocks are not
enabled, then the old non-locked "add" is the only unlocking code.
Note: this code relies on gcc making sure that unlikely() code is out of
line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it
doesn't the generated code isn't too bad, but its definitely suboptimal.
Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied
on an inaccurate reading of the x86 memory ordering rules.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Commit b202952075 ("perf, core: Rate limit
perf_sched_events jump_label patching") introduced rate limiting
for jump label disabling. The changes were made in the jump label code
in order to be more widely available and to keep things tidier. This is
all fine, except now jump_label.h includes linux/workqueue.h, which
makes it impossible to include jump_label.h from anything that
workqueue.h needs. For example, it's now impossible to include
jump_label.h from asm/spinlock.h, which is done in proposed
pv-ticketlock patches. This patch splits out the rate limiting related
changes from jump_label.h into a new file, jump_label_ratelimit.h, to
resolve the issue.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Link: http://lkml.kernel.org/r/1376058122-8248-10-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Increment ticket head/tails by 2 rather than 1 to leave the LSB free
to store a "is in slowpath state" bit. This halves the number
of possible CPUs for a given ticket size, but this shouldn't matter
in practice - kernels built for 32k+ CPU systems are probably
specially built for the hardware rather than a generic distro
kernel.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-9-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path. To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-7-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Replace the old Xen implementation of PV spinlocks with and implementation
of xen_lock_spinning and xen_unlock_kick.
xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
adds itself to the waiting_cpus set, and blocks on an event channel
until the channel becomes pending.
xen_unlock_kick searches the cpus in waiting_cpus looking for the one
which next wants this lock with the next ticket, if any. If found,
it kicks it by making its event channel pending, which wakes it up.
We need to make sure interrupts are disabled while we're relying on the
contents of the per-cpu lock_waiting values, otherwise an interrupt
handler could come in, try to take some other lock, block, and overwrite
our values.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-6-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[ Raghavendra: use function + enum instead of macro, cmpxchg for zero status reset
Reintroduce break since we know the exact vCPU to send IPI as suggested by Konrad.]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
There's no need to do it at very early init, and doing it there
makes it impossible to use the jump_label machinery.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-5-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-4-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The code size expands somewhat, and its better to just call
a function rather than inline it.
Thanks Jeremy for original version of ARCH_NOINLINE_SPIN_UNLOCK config patch,
which is simplified.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1376058122-8248-3-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow patch (long spin on lock, unlocking
a contended lock).
Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments. They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.
(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
To address this, we add two hooks:
- __ticket_spin_lock which is called after the cpu has been
spinning on the lock for a significant number of iterations but has
failed to take the lock (presumably because the cpu holding the lock
has been descheduled). The lock_spinning pvop is expected to block
the cpu until it has been kicked by the current lock holder.
- __ticket_spin_unlock, which on releasing a contended lock
(there are more cpus with tail tickets), it looks to see if the next
cpu is blocked and wakes it if so.
When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.
Results:
=======
setup: 32 core machine with 32 vcpu KVM guest (HT off) with 8GB RAM
base = 3.11-rc
patched = base + pvspinlock V12
+-----------------+----------------+--------+
dbench (Throughput in MB/sec. Higher is better)
+-----------------+----------------+--------+
| base (stdev %)|patched(stdev%) | %gain |
+-----------------+----------------+--------+
| 15035.3 (0.3) |15150.0 (0.6) | 0.8 |
| 1470.0 (2.2) | 1713.7 (1.9) | 16.6 |
| 848.6 (4.3) | 967.8 (4.3) | 14.0 |
| 652.9 (3.5) | 685.3 (3.7) | 5.0 |
+-----------------+----------------+--------+
pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases,
and undercommits results are flat
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
[ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pull dmaengine fixes from Vinod Koul:
"Two fixes for slave dmaengine. The first fixes cyclic dma transfers
for pl330 and the second one makes us return the correct error code on
probe"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dma: pl330: Fix cyclic transfers
pch_dma: fix error return code in pch_dma_probe()
Pull drm fix from Dave Airlie:
"Just a quick fix that a few people have reported, be nice to have in
asap"
The drm tree seems to be very confused about 64-bit divides. Here it
uses a slow 64-by-64 bit divide to divide by a small constant. Oh well.
Doesn't look performance-critical, just stupid.
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: fix 64 bit divide in SI spm code
Commit 46a1c2c7ae ("vfs: export lseek_execute() to modules") broke the
tmpfs SEEK_DATA/SEEK_HOLE implementation, because vfs_setpos() converts
the carefully prepared -ENXIO to -EINVAL. Other filesystems avoid it in
error cases: do the same in tmpfs.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All small regression or small fixes, nothing surprising at this stage.
- regression fix for Mac MINI quirk
- compress ioctl error fix
- ASoC fixes for control change notifications, some UI fixes,
driver-specific fixes (resource leak, build errors, etc)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJR/g7fAAoJEGwxgFQ9KSmkMwgP/RTXP4/40RYOpU+M2EQJzrJd
OZQnbRR7QLNB4Rec803k3LGj2Mrfsw/fUhNAHLF5ioOrOzwXvwTpZKu1/kQ31WKL
s4Bo2kGPqW4E9+1daOtFjcJLtXRf1Ae9ugoFV4QW/H08fllRg6gNlvgIhxpfVnd4
Bh8UN7mZBB+qTcfnaDxW0qA0O9C7xrLOiSXVp97x7KnvRcUYn4pM0lCxqPYoAFhO
ZDAGawMYa1WcWhseVhzSFu5oypfwmsH4lOXts3OPMP7n2UtGh5xJSH5cbb1CUH3n
77fI9AddlwbS07No5fWZZrXx/fFXk+8kpC47eq1cpW2OLPf2PTntAsbM14YcPDgR
MjJjFujsPv9y104ClQISjzQvHcgKQRHyu5ffQYU2SGIVRGGuqmofit0bnjQOSObj
6mX78yDYfScU7jY+l8crgIKTjC1xESNhu3ZxAvZ0ouDRB1As8zzbqg5Wn9n7o69i
kTRrRXIB2Dhj8gQJOOgenUaUjPnHdoNQGBb2VPD8nsXWiF74rBmEDwGGfB042BL3
J2R0ucwOYTRDx8/B2OPGQ0qTbDExWlagnQnscvWs7M8/p7ZG+4IDrDF1NFuuVvmf
1ztzvm5zdAlSrD0sdjYXsa6fdvgqcAjwZS+kZAeS3QTKoWNC/rw1mJCIK70q5Sb6
UerHKw6bE1MnKcMVWWQQ
=buMS
-----END PGP SIGNATURE-----
Merge tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All small regression or small fixes, nothing surprising at this stage.
- regression fix for intel Mac Mini quirk
- compress ioctl error fix
- ASoC fixes for control change notifications, some UI fixes,
driver-specific fixes (resource leak, build errors, etc)"
* tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix missing fixup for Mac Mini with STAC9221
ASoC: wm0010: Fix resource leak
ASoC: au1x: Fix build
ASoC: bf5xx-ac97: Fix compile error with SND_BF5XX_HAVE_COLD_RESET
ASoC: bfin-ac97: Fix prototype error following AC'97 refactoring
ALSA: compress: fix the return value for SNDRV_COMPRESS_VERSION
ASoC: dapm: Fix return value of snd_soc_dapm_put_{volsw,enum_virt}()
Pull networking fixes from David Miller:
1) Don't ignore user initiated wireless regulatory settings on cards
with custom regulatory domains, from Arik Nemtsov.
2) Fix length check of bluetooth information responses, from Jaganath
Kanakkassery.
3) Fix misuse of PTR_ERR in btusb, from Adam Lee.
4) Handle rfkill properly while iwlwifi devices are offline, from
Emmanuel Grumbach.
5) Fix r815x devices DMA'ing to stack buffers, from Hayes Wang.
6) Kernel info leak in ATM packet scheduler, from Dan Carpenter.
7) 8139cp doesn't check for DMA mapping errors, from Neil Horman.
8) Fix bridge multicast code to not snoop when no querier exists,
otherwise mutlicast traffic is lost. From Linus Lüssing.
9) Avoid soft lockups in fib6_run_gc(), from Michal Kubecek.
10) Fix races in automatic address asignment on ipv6, which can result
in incorrect lifetime assignments. From Jiri Benc.
11) Cure build bustage when CONFIG_NET_LL_RX_POLL is not set and rename
it CONFIG_NET_RX_BUSY_POLL to eliminate the last reference to the
original naming of this feature. From Cong Wang.
12) Fix crash in TIPC when server socket creation fails, from Ying Xue.
13) macvlan_changelink() silently succeeds when it shouldn't, from
Michael S Tsirkin.
14) HTB packet scheduler can crash due to sign extension, fix from
Stephen Hemminger.
15) With the cable unplugged, r8169 prints out a message every 10
seconds, make it netif_dbg() instead of netif_warn(). From Peter
Wu.
16) Fix memory leak in rtm_to_ifaddr(), from Daniel Borkmann.
17) sis900 gets spurious TX queue timeouts due to mismanagement of link
carrier state, from Denis Kirjanov.
18) Validate somaxconn sysctl to make sure it fits inside of a u16.
From Roman Gushchin.
19) Fix MAC address filtering on qlcnic, from Shahed Shaikh.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (68 commits)
qlcnic: Fix for flash update failure on 83xx adapter
qlcnic: Fix link speed and duplex display for 83xx adapter
qlcnic: Fix link speed display for 82xx adapter
qlcnic: Fix external loopback test.
qlcnic: Removed adapter series name from warning messages.
qlcnic: Free up memory in error path.
qlcnic: Fix ingress MAC learning
qlcnic: Fix MAC address filter issue on 82xx adapter
net: ethernet: davinci_emac: drop IRQF_DISABLED
netlabel: use domain based selectors when address based selectors are not available
net: check net.core.somaxconn sysctl values
sis900: Fix the tx queue timeout issue
net: rtm_to_ifaddr: free ifa if ifa_cacheinfo processing fails
r8169: remove "PHY reset until link up" log spam
net: ethernet: cpsw: drop IRQF_DISABLED
htb: fix sign extension bug
macvlan: handle set_promiscuity failures
macvlan: better mode validation
tipc: fix oops when creating server socket fails
net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL
...
Flash update routine was improperly checking register read API return value.
Modify register read API and perform proper error check.
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Set link speed and duplex to unknown when link is not up.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Do not obtain link speed from register when adapter
link is down.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver was not handling external loopback diagnostic
test request.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pratik Pujar <pratik.pujar@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Delete MAC address from the adapter's filter table
if the source MAC address of ingress packet matches.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver was passing the address of a pointer instead of
the pointer itself.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IRQF_DISABLED is a no-op by now and should be removed.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull nfsd bugfixes from Bruce Fields:
"Most of this is due to a screwup on my part -- some gss-proxy crashes
got fixed before the merge window but somehow never made it out of a
temporary git repo on my laptop...."
* 'for-3.11' of git://linux-nfs.org/~bfields/linux:
svcrpc: set cr_gss_mech from gss-proxy as well as legacy upcall
svcrpc: fix kfree oops in gss-proxy code
svcrpc: fix gss-proxy xdr decoding oops
svcrpc: fix gss_rpc_upcall create error
NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure.
Pull arm fixes fixes from Russell King:
"This fixes a couple of problems with commit 48be69a026 ("ARM: move
signal handlers into a vdso-like page"), one of which was originally
discovered via my testing originally, but the fix for it was never
actually committed.
The other shows up on noMMU builds, and such platforms are extremely
rare and as such are not part of my nightly testing"
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: fix nommu builds with 48be69a02 (ARM: move signal handlers into a vdso-like page)
ARM: fix a cockup in 48be69a02 (ARM: move signal handlers into a vdso-like page)
Without this patch, the values for ideality (register 0x4b) and ideality
selection mask (register 0x4c) are inverted.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: stable@vger.kernel.org # 3.9+
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Olof reports that noMMU builds error out with:
arch/arm/kernel/signal.c: In function 'setup_return':
arch/arm/kernel/signal.c:413:25: error: 'mm_context_t' has no member named 'sigpage'
This shows one of the evilnesses of IS_ENABLED(). Get rid of it here
and replace it with #ifdef's - and as no noMMU platform can make use
of sigpage, depend on CONIFG_MMU not CONFIG_ARM_MPU.
Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Unfortunately, I never committed the fix to a nasty oops which can
occur as a result of that commit:
------------[ cut here ]------------
kernel BUG at /home/olof/work/batch/include/linux/mm.h:414!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 490 Comm: killall5 Not tainted 3.11.0-rc3-00288-gabe0308 #53
task: e90acac0 ti: e9be8000 task.ti: e9be8000
PC is at special_mapping_fault+0xa4/0xc4
LR is at __do_fault+0x68/0x48c
This doesn't show up unless you do quite a bit of testing; a simple
boot test does not do this, so all my nightly tests were passing fine.
The reason for this is that install_special_mapping() expects the
page array to stick around, and as this was only inserting one page
which was stored on the kernel stack, that's why this was blowing up.
Reported-by: Olof Johansson <olof@lixom.net>
Tested-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
NetLabel has the ability to selectively assign network security labels
to outbound traffic based on either the LSM's "domain" (different for
each LSM), the network destination, or a combination of both. Depending
on the type of traffic, local or forwarded, and the type of traffic
selector, domain or address based, different hooks are used to label the
traffic; the goal being minimal overhead.
Unfortunately, there is a bug such that a system using NetLabel domain
based traffic selectors does not correctly label outbound local traffic
that is not assigned to a socket. The issue is that in these cases
the associated NetLabel hook only looks at the address based selectors
and not the domain based selectors. This patch corrects this by
checking both the domain and address based selectors so that the correct
labeling is applied, regardless of the configuration type.
In order to acomplish this fix, this patch also simplifies some of the
NetLabel domainhash structures to use a more common outbound traffic
mapping type: struct netlbl_dommap_def. This simplifies some of the code
in this patch and paves the way for further simplifications in the
future.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's possible to assign an invalid value to the net.core.somaxconn
sysctl variable, because there is no checks at all.
The sk_max_ack_backlog field of the sock structure is defined as
unsigned short. Therefore, the backlog argument in inet_listen()
shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall
is truncated to the somaxconn value. So, the somaxconn value shouldn't
exceed 65535 (USHRT_MAX).
Also, negative values of somaxconn are meaningless.
before:
$ sysctl -w net.core.somaxconn=256
net.core.somaxconn = 256
$ sysctl -w net.core.somaxconn=65536
net.core.somaxconn = 65536
$ sysctl -w net.core.somaxconn=-100
net.core.somaxconn = -100
after:
$ sysctl -w net.core.somaxconn=256
net.core.somaxconn = 256
$ sysctl -w net.core.somaxconn=65536
error: "Invalid argument" setting key "net.core.somaxconn"
$ sysctl -w net.core.somaxconn=-100
error: "Invalid argument" setting key "net.core.somaxconn"
Based on a prior patch from Changli Gao.
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Reported-by: Changli Gao <xiaosuo@gmail.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fixes for the newly merged mlx5 hardware driver
- Stack info leak fixes from Dan Carpenter
- Fixes for pkey table handling with SR-IOV
- A few other small things
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJR+9nLAAoJEENa44ZhAt0hBPQP/RSRLYaN4k9apw8mOdiQo8hD
LDFt/DTSSZz/0/VtCgQoOblqAmsDeHp8Tj/WS7bm1zi8ajBPLmUBuQ9GWza0KDFT
BDbhcdndvg6FO47GdyF8KWQrBB/WRfGg1ucqeelqRB1WoMtUnP/bX3umFrn7adNG
/W38KNfx+G2/WwQMbUWl7jxJuqS2tGGKBQEvdO0LjsNTfU4LSqI8L9WrLXhwAV4V
GXNc3rV1EsLqoHe9IznKGw3Ls02ZuFw7b/+WEVY6PjSf73sqfAm3LJnGO6h9Jd79
CKLr2ev0cadosorMrdL9GdOGZX/gGw0Ub9YBLiRdNNq1aPIHaWzbQbsW2nQHYQ/l
bZufcye2A/10LaP9JsX4RaK3mE5dm7jvZKoT5sqX+j6k3KgDuCmBjsGVaWaP+NyZ
iqrIcyJxQfB6Cpm2vELrBeCusvVv9/Z5v4yeMQYhnAdVSEsqFEmE200Udnm3viiq
sHtSe9fN7fkN7aB+immIlrOV1tMZARwbJ6yz/3Gw4gu6O8Yfjo74dIiu0m2feiUl
tNlzPzPiIPx1PTEmjTKAKtuauTxHNVxp7yQS8IJLzmtU9BmYmau11CLI08e1gnce
HY7hMfeq65pxbNu4pcn6V7HUmU4u6kUeQkygzi+f6w6FMLH5CWbF0Hw7Ly7wpSNL
g56DkcamFKr86ex/J80z
=ZwEX
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband/rdma fixes from Roland Dreier:
- Fixes for the newly merged mlx5 hardware driver
- Stack info leak fixes from Dan Carpenter
- Fixes for pkey table handling with SR-IOV
- A few other small things
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IPoIB: Fix pkey change flow for virtualization environments
IPoIB: Make sure child devices use valid/proper pkeys
IB/core: Create QP1 using the pkey index which contains the default pkey
mlx5_core: Variable may be used uninitialized
mlx5_core: Implement new initialization sequence
mlx5_core: Fix use after free in mlx5_cmd_comp_handler()
IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()
IB/mlx5: Fix error return code in init_one()
IB/mlx4: Use default pkey when creating tunnel QPs
RDMA/cma: Only call cma_save_ib_info() for CM REQs
RDMA/cma: Fix accessing invalid private data for UD
RDMA/cma: Fix gcc warning
Revert "RDMA/nes: Fix compilation error when nes_debug is enabled"
IB/qib: Add err_decode() call for ring dump
RDMA/cxgb3: Fix stack info leak in iwch_create_cq()
RDMA/nes: Fix info leaks in nes_create_qp() and nes_create_cq()
RDMA/ocrdma: Fix several stack info leaks
RDMA/cxgb4: Fix stack info leak in c4iw_create_qp()
RDMA/ocrdma: Remove unused include
- Revert the OMAP fixes that caused more problems than they solved.
- Fix a build error.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJR+h+XAAoJEEEQszewGV1zIMYQAK7VFsYq1ZNkzpoTYSkglnaa
7twmEFqyV4U+jWtJRjpygJ1ZP6uL/ZdI/ouDlvOXlNf0dFJUtaSiqrL1eB2l7OXN
9t463se/0h4/aoN4i09aN0CiDntmfrFxHnu6xe5oO4ukcw1EJu3gWekKBFZ6lX79
1cayVAnrk2dudIjpNqBxJP+SzNGOI0yn1Ig+iznXw5JHhfjkZ0Heh1UQYpnVOjjC
attVs4++lxj7Tj4qkdPibztEnvOzMFV4zAqMFsCiqv/8IqSui9J7zHBYdde05ICZ
yynf3VWuhDjmhMYkNr+wUbfpBJA75u2ViEc/hKJihTykOOxuU+XvX/KGsBwbK7J+
EENTzBDZmyH3OYntKHWJlHkFy6Ym6hWPVGIZ3AJoiMFX2SQzWMgqW/yVUvmeWNj+
2Ds/of9n3VkN9j1dDnAdvrR1TqjpTDHCP+ZSgPSO1X4bBAbWi2HajwvMWZNKUIIP
md2Qu+KWUuDBoq1FpndJWubmEzfRXAZJn4Ghk2ldoy8Q2zI6pSeG1yUHI0IFTLaD
QsCpE84lUoBA1isLt9X2nRTNAKwBjb/9+x5VvT4tACd5V2nYk8TKiuE9DCVKDDNd
7/KNO9p9HgisV+ZqSJyT8hPz42d0PXapKr+jW/dikX+7fXDJj3EY+s/jt0bTQlkS
1Ut/Dhm0HAAzKBh+dHUV
=PQ04
-----END PGP SIGNATURE-----
Merge tag 'gpio-for-v3.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Yet another GPIO pull request, fixing the fix from the last one. It
turns out that fixing the boot path for device tree boots on OMAP
breaks out antique systems (such as OMAP1) and we need to find a
better way. So we're reverting that "fix" for the moment and thinking
about something better.
Also fixing a build issue on the MSM driver"
* tag 'gpio-for-v3.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio_msm: Fix build error due to missing err.h
Revert "gpio/omap: don't create an IRQ mapping for every GPIO on DT"
Revert "gpio/omap: auto request GPIO as input if used as IRQ via DT"
Revert "gpio/omap: fix build error when OF_GPIO is not defined."
Commit 5c766d642 ("ipv4: introduce address lifetime") leaves the ifa
resource that was allocated via inet_alloc_ifa() unfreed when returning
the function with -EINVAL. Thus, free it first via inet_free_ifa().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
This message was added in commit a7154cb8 (June 2004, [PATCH] r8169:
link handling and phy reset rework) and is printed every ten seconds
when no cable is connected and runtime power management is disabled.
(Before that commit, "Reset RTL8169s PHY" would be printed instead.)
Signed-off-by: Peter Wu <lekensteyn@gmail.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IRQF_DISABLED is a no-op by now and should be
removed.
Signed-off-by: Felipe Balbi <balbi@ti.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When userspace passes a large priority value
the assignment of the unsigned value hopt->prio
to signed int cl->prio causes cl->prio to become negative and the
comparison is with TC_HTB_NUMPRIO is always false.
The result is that HTB crashes by referencing outside
the array when processing packets. With this patch the large value
wraps around like other values outside the normal range.
See: https://bugzilla.kernel.org/show_bug.cgi?id=60669
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull powerpc fixes from Ben Herrenschmidt:
"Here is not quite a handful of powerpc fixes for rc3.
The windfarm fix is a regression fix (though not a new one), the PMU
interrupt rename is not a fix per-se but has been submitted a long
time ago and I kept forgetting to put it in (it puts us back in sync
with x86), the other perf bit is just about putting an API/ABI bit
definition in the right place for userspace to consume, and finally,
we have a fix for the VPHN (Virtual Partition Home Node) feature
(notification that the hypervisor is moving nodes around) which could
cause lockups so we may as well fix it now"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/windfarm: Fix noisy slots-fan on Xserve (rm31)
powerpc: VPHN topology change updates all siblings
powerpc/perf: Export PERF_EVENT_CONFIG_EBB_SHIFT to userspace
powerpc: Rename PMU interrupts from CNT to PMI
Pull ARM fixes from Russell King:
"I've thought long and hard about what to say for this pull request,
and I really can't work out anything sane to say to summarise much of
these commits. The problem is, for most of these are, yet again, lots
of small bits scattered around the place without any real overall
theme to them"
Most notable is probably the kuser page helper improvements.
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm: (22 commits)
ARM: Add .text annotations where required after __CPUINIT removal
ARM: 7803/1: Fix deadlock scenario with smp_send_stop()
ARM: make vectors page inaccessible from userspace
ARM: move signal handlers into a vdso-like page
ARM: allow kuser helpers to be removed from the vector page
ARM: update FIQ support for relocation of vectors
ARM: use linker magic for vectors and vector stubs
ARM: move vector stubs
ARM: poison memory between kuser helpers
ARM: poison the vectors page
ARM: 7801/1: v6: prevent gcc 4.5 from reordering extended CP15 reads above is_smp() test
ARM: 7800/1: ARMv7-M: Fix name of NVIC handler function
ARM: Fix sorting of machine- initializers
ARM: 7791/1: a.out: remove partial a.out support
ARM: 7790/1: Fix deferred mm switch on VIVT processors
ARM: 7789/1: Do not run dummy_flush_tlb_a15_erratum() on non-Cortex-A15
ARM: 7787/1: virt: ensure visibility of __boot_cpu_mode
ARM: 7788/1: elf: fix lpae hwcap feature reporting in proc/cpuinfo
ARM: 7786/1: hyp: fix macro parameterisation
ARM: 7785/1: mm: restrict early_alloc to section-aligned memory
...
Pull parisc updates from Helge Deller:
"The majority of lines changed are due the addition of a defconfig for
the C8000 machine. Even the fix in parisc/kernel/cache.c file is
actually ony a 10-line fix, but the change became bigger (and much
nicer) to avoid errors of the checkpatch script.
Here is the short-changelog:
This round of parisc updates includes mostly fixes for the C8000
workstation. We have a new defconfig file for this machine, as well
as fixes for it's serial port, the AGP driver and the cache routines
to cope with the vmas of the FireGL card in a C8000. The sys32.h
header file was not used and as such it's now gone"
* 'parisc-3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix interrupt routing for C8000 serial ports
parisc: Remove arch/parisc/kernel/sys32.h header
parisc: add defconfig for c8000 machine
parisc: agp/parisc-agp: allow binding of user memory to the AGP GART
parisc: Fix cache routines to ignore vma's with an invalid pfn
Hotplug
PCI: pciehp: Fix null pointer deref when hot-removing SR-IOV device
PCI: hotplug: Convert to be builtin only, not modular
PCI: pciehp: Convert pciehp to be builtin only, not modular
Resource allocation
PCI: Retry allocation of only the resource type that failed
ARM
PCI: mvebu: Disable prefetchable memory support in PCI-to-PCI bridge
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJR+/RRAAoJEFmIoMA60/r8iloP/is7mx667v9F9WRlYMJss+ig
K0cVOauokmr/pyS7Tgr0jfhtulXOmN/VJTvCw6h7+qYtsO/DQ7Y5LPXoMW83s5hK
/N33HDZcm2/1hUxv/4GS3omtO9VefWcmzXmPM7l1fEBQyACTg/zj1Mb97U8j8mhh
vr6y/pLDLwwaRcQNP/Y2F6H8RsDDWE/IyC9MN1qEz+b7Qve5fAPPrDBgExakzmu/
UiiJV3nl7fqBeITA/LSWDCgsOeavmmabqZuaXnKWN0L5PaEa3/8Of6kbsG5ZGgdZ
Y5/Qu0HRtAhaFNAd1670IzcMThC6TJV685z09/OQ+4uKpZ2jJYM26ISSdiMG2He4
FQmLQcgkxGX0xYxdD7K37VC7O17NH+3jEouM2SNSCXGz5RQ3qvW5F4IvSctHmIO8
Q0m2HladNYHtOYk1eGNSlxPd3U0nyQwlXSTgJrKYd/cJgUqYjs+Q9zuAmUj+4QPR
ywwO3rgpetaROY6avw31fWFGqQFAQEQeXvMwTrAJbdIcxvVz77yRiOrD72X61Z7h
A6owzbMmuYKPuym5EbBr0GoCJWAuHc8GIvXKHQQ9QMGuRAj9X4Qrk+BdudeqlTxD
e9WKkKGPmPxR6IuSlZdLCNqcTDDlQZFTnq5WJ989pPsUMKFu8LjybKntBEbPtIue
ydWvibCrxNbTEMyyg+c6
=z2eX
-----END PGP SIGNATURE-----
Merge tag 'pci-v3.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Yinghai fixed a couple regressions: one resource assignment problem
introduced in v3.10 that showed up with SR-IOV on powerpc, and another
SR-IOV hot-remove issue related to refcounting changes we merged for
v3.11.
Yinghai is still working on another SR-IOV-related fix or two, which
will be simpler if pciehp is non-modular, so I included the Kconfig
changes now to get them in earlier.
Finally, a minor fix for the ARM Marvell EBU host bridge driver that
was merged for v3.11
Hotplug:
PCI: pciehp: Fix null pointer deref when hot-removing SR-IOV device
PCI: hotplug: Convert to be builtin only, not modular
PCI: pciehp: Convert pciehp to be builtin only, not modular
Resource allocation:
PCI: Retry allocation of only the resource type that failed
ARM:
PCI: mvebu: Disable prefetchable memory support in PCI-to-PCI bridge"
* tag 'pci-v3.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: mvebu: Disable prefetchable memory support in PCI-to-PCI bridge
PCI: Retry allocation of only the resource type that failed
PCI: pciehp: Convert pciehp to be builtin only, not modular
PCI: hotplug: Convert to be builtin only, not modular
PCI: pciehp: Fix null pointer deref when hot-removing SR-IOV device
- Revert two cpuidle commits added during the 3.8 development cycle that
turn out to have introduced a significant performance regression as
requested by Jeremy Eder.
- The recent patches that made the freezer less heavy-weight introduced
a regression causing user-space-driven hibernation using the ioctl()
interface to block indefinitely when the hibernate process executes
try_to_freeze(). Fix from Colin Cross addresses this by adding a
process flag to mark the hibernate/suspend process to inform the
freezer that that process should be ignored.
- One of the recent cpufreq reverts uncovered a problem in the core
causing the cpufreq driver module refcount to become negative after
a system suspend-resume cycle. Fix from Rafael J Wysocki.
- The evaluation of the ACPI battery _BIX method has never worked
correctly, because the commit that added support for it forgot to
take the "Revision" field in the return package into account. As
a result, the reading of battery info doesn't work at all on some
systems, which is addressed by a fix from Lan Tianyu.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJR+6ptAAoJEKhOf7ml8uNsRpIP/0P2HbCFM52/4Rv/Iltnt4fI
9Vo2dyuL7JKP2U8jtHxfhFGg3oMdYQoUIdnpjtKr4O3obhzl4vHwE9vtrRlhHpRZ
SnHGe0W5v0eQOdCbVzdwS1NrJwckkTy1JuybV+PH66T84Usu0QoxE4iNveK2LX23
eJvOgWGBoyEEWJb+1/KJNIcKk77A0Cnc2CCLMN5bmhwH1QGDRZdzSnrjK5fGniF0
akCGq8jJhBaI1xJF/42LgNBiPpAYk42SPuiSOqniKzweUK1P6YzHjArh0qaTBoUj
27HRkZlY6Y8WLFxqQio7zvbbLSdRuwosESofw2kCFkAAEnCc71kw2nbebNr3sCap
MqrmEMcxqT803PiB2RGyS53WNE7mM3NFCPRLOPL+cWeNQhoYzbZ+UiNx4Dw667cr
Ow+egCY+jyAZm5TFqY6Y75lG61UM6oCs6M6iIwiv/BOmJqCmkTjvNBxHWrVcWxin
YhiLJGyt7iAcIaxhy+fCs2j2a7B0Ai62kZ6YLqaEtNBzjuDbm6sr61A6Nu8bpOTU
C7e76AocyfuDpdU99uawDvuazCGWEg+f8eH8C/ij19jF1/Mrlr0x+4x9MmMm9Iz5
ux0uroTteEuswz9aHmY270qdDLIuSGUsmqD05RoaO61U8dVigWw+ZKqUCImrAM7x
4bK1+2eOig794g9vSsen
=7x7r
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
- Revert two cpuidle commits added during the 3.8 development cycle
that turn out to have introduced a significant performance regression
as requested by Jeremy Eder.
- The recent patches that made the freezer less heavy-weight introduced
a regression causing user-space-driven hibernation using the ioctl()
interface to block indefinitely when the hibernate process executes
try_to_freeze(). Fix from Colin Cross addresses this by adding a
process flag to mark the hibernate/suspend process to inform the
freezer that that process should be ignored.
- One of the recent cpufreq reverts uncovered a problem in the core
causing the cpufreq driver module refcount to become negative after a
system suspend-resume cycle. Fix from Rafael J Wysocki.
- The evaluation of the ACPI battery _BIX method has never worked
correctly, because the commit that added support for it forgot to
take the "Revision" field in the return package into account. As a
result, the reading of battery info doesn't work at all on some
systems, which is addressed by a fix from Lan Tianyu.
* tag 'pm+acpi-3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processes
ACPI / battery: Fix parsing _BIX return value
cpufreq: Fix cpufreq driver module refcount balance after suspend/resume
Revert "cpuidle: Quickly notice prediction failure for repeat mode"
Revert "cpuidle: Quickly notice prediction failure in general case"