When device is being removed (e.g during VPI port link type change
from ETH to IB), tasks for gid table changes should not be executed.
Flush the current queue of tasks and block further tasks from entering the queue.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Chuck Lever reported the following stack trace:
=================================
[ INFO: inconsistent lock state ]
3.16.0-rc2-00024-g2e78883 #17 Tainted: G E
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
(&(&iboe->lock)->rlock){+.?...}, at: [<ffffffffa065f68b>] mlx4_ib_addr_event+0xdb/0x1a0 [mlx4_ib]
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810b3110>] mark_irqflags+0x110/0x170
[<ffffffff810b4806>] __lock_acquire+0x2c6/0x5b0
[<ffffffff810b4bd9>] lock_acquire+0xe9/0x120
[<ffffffff815f7f6e>] _raw_spin_lock+0x3e/0x80
[<ffffffffa0661084>] mlx4_ib_scan_netdevs+0x34/0x260 [mlx4_ib]
[<ffffffffa06612db>] mlx4_ib_netdev_event+0x2b/0x40 [mlx4_ib]
[<ffffffff81522219>] register_netdevice_notifier+0x99/0x1e0
[<ffffffffa06626e3>] mlx4_ib_add+0x743/0xbc0 [mlx4_ib]
[<ffffffffa05ec168>] mlx4_add_device+0x48/0xa0 [mlx4_core]
[<ffffffffa05ec2c3>] mlx4_register_interface+0x73/0xb0 [mlx4_core]
[<ffffffffa05c505e>] cm_req_handler+0x13e/0x460 [ib_cm]
[<ffffffff810002e2>] do_one_initcall+0x112/0x1c0
[<ffffffff810e8264>] do_init_module+0x34/0x190
[<ffffffff810ea62f>] load_module+0x5cf/0x740
[<ffffffff810ea939>] SyS_init_module+0x99/0xd0
[<ffffffff815f8fd2>] system_call_fastpath+0x16/0x1b
irq event stamp: 336142
hardirqs last enabled at (336142): [<ffffffff810612f5>] __local_bh_enable_ip+0xb5/0xc0
hardirqs last disabled at (336141): [<ffffffff81061296>] __local_bh_enable_ip+0x56/0xc0
softirqs last enabled at (336004): [<ffffffff8106123a>] _local_bh_enable+0x4a/0x50
softirqs last disabled at (336005): [<ffffffff810617a4>] irq_exit+0x44/0xd0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&iboe->lock)->rlock);
<Interrupt>
lock(&(&iboe->lock)->rlock);
*** DEADLOCK ***
The above problem was caused by the spin lock being taken both in the process
context and in a soft-irq context (in a netdev notifier handler).
The required fix is to use spin_lock/unlock_bh() instead of spin_lock/unlock
on the iboe lock.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When a RoCE port becomes active and the netdev of the port has upper
device (e.g bond/team), GIDs derived from the upper dev should appear
in the port's RoCE GID table.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
There's no need to reset the gid table twice and we need to do it only
for Ethernet ports. Also, no need to actively scan ndetdevs since it's
being done immediatly after we register netdev notifiers.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When reading the IPv6 addresses from the net-device, make sure to
avoid adding a duplicate entry to the GID table because of equality
between the default GID we generate and the default IPv6 link-local
address of the device.
Fixes: acc4fccf4e ("IB/mlx4: Make sure GID index 0 is always occupied")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When Ethernet netdev is not present for a port (e.g. when the link
layer type of the port is InfiniBand) it's possible to dereference a
null pointer when we do netdevice scanning.
To fix that, we move a section of code that needs to run only when
netdev is present to a proper if () statement.
Fixes: ad4885d279 ("IB/mlx4: Build the port IBoE GID table properly under bonding")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch addresses feedback from Sagi Grimberg on the rereg_mr
implementation of mlx4. The following are fixed:
1. Set the correct pd_flags
2. Make sure we change the iova and size MR fields only after
successful write and allocation of the MTTs.
3. Make the error checking more robust
Fixes: e630664c83 ("mlx4_core: Add helper functions to support MR re-registration")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
According to <http://marc.info/?t=138347640900004&r=1&w=2>, revision
A0 of Connect-X does not correctly assemble TSO packets. Disable that
feature on that hardware revision.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Highlights:
- More fixes for read/write codepath regressions
- Sleeping while holding the inode lock
- Stricter enforcement of page contiguity when coalescing requests
- Fix up error handling in the page coalescing code
- Don't busy wait on SIGKILL in the file locking code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJT+0LpAAoJEGcL54qWCgDyWfsP/imrpge47aZywi95chV8vgjM
O85ITZbupTFwXbB7kE63CrcaxRGhFrSStk4UDhDCDkHfFb1ksjZaPR1mnkwvkR2p
4+JUoq0fkPfeX21+rqKCYmnhstpne/N8K8FJBsEs3/TqiCBWxWOelLXdyWun4H5B
9JBYQ7FYitUazeSiSiDXcl7Di/E09cFPi0H5VPKRyuNdYxySabnsBOELBE/28iXr
egW1I9UKQR2EtBrvgazBbWE5XmB9XAm4X3sD1l0QD65mfSNkbnNhPFSiCdT7f/d6
9uxECR0Y4wNYgYAfVLBew5/MXJajcv03BFMKmTUeGj9fOQzycpBT4Dx2KxEWqfnt
Xk2nNbISxBnO0koMflmo+LPv2lv+Br3kQ+eZCHHKknvBrX2a6bJdTCZkwACVtND9
LdbAveFQpdaeLrm/28TnRoE927r+VeAVM19yOSG8sNAskFFg4Yy51tR0e1GivkJT
+qmmTRx+l78HjHvoPXOYdNgBC954r6APH5ST7su/7WxNClM36fEK6XxA9xbDLJWm
wUzlGKvpwEeBJJhgjbQLwuU8BiksjFz/CaiObNvPOpc/d2GoKIhnTg19kNhg2R//
UCDa2d5fep4z0Bo9p0s1KZm9pSBkkLjvRp9dm8WEIxLcdaF1jBK3dJECepm6ccvw
dmEmEfjbMudVdt/ZhapJ
=2wRt
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
"Highlights:
- more fixes for read/write codepath regressions
* sleeping while holding the inode lock
* stricter enforcement of page contiguity when coalescing requests
* fix up error handling in the page coalescing code
- don't busy wait on SIGKILL in the file locking code"
* tag 'nfs-for-3.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: Don't busy-wait on SIGKILL in __nfs_iocounter_wait
nfs: can_coalesce_requests must enforce contiguity
nfs: disallow duplicate pages in pgio page vectors
nfs: don't sleep with inode lock in lock_and_join_requests
nfs: fix error handling in lock_and_join_requests
nfs: use blocking page_group_lock in add_request
nfs: fix nonblocking calls to nfs_page_group_lock
nfs: change nfs_page_group_lock argument
* Confine SH_INTC to platforms that need it
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJT+oswAAoJENfPZGlqN0++/A8QAK2yp96zl7gABtddgkunNJiV
xWCm4wcP+7dIrLpwizxtt/6HMPj9ZJHYtcxRRKVlzquwcukeVCIkfZ3Qvn2JilWM
+b4rVRGzQ+z0M7SCpNGjtgFc1IFrVrzuxZpPwgseQ5I6HQsZJKUi3qn5rRJSEax2
w+ANis2eZfmSBdu3Qsx2QBDXvS7ZPlLpimoAE+rjr60dZnljcrvBrVHQvSgpEHxn
4rXPbYrrkIee32J4lxLRDPtn5fvAsrr+vcLXEYqyFr2292U2PAxIrjZgrRzZYNtD
L6xmhZAhdmIpC0gLDLyKQhwj/9pfGFxFErZX9jeGdPT2KTEEYPkSxb75kdTwmxP4
DEtZG9Nuu6CQCBldrMt6kpujn+XCYRl94cSyUffJ5KVhNUiqB+e+JP80Yj9HB7QU
nGUfs2hQ/azY8XEiwvnls0314thkB5gN7KHxmlaFa9Wz7cgYFwWAguHH8qrq839U
+i+7dUg7nlmVoIAvzYIIA+L7x3xX4gEwsf88x3i04DPp0A8/oDAwGrqM5QzsaZpD
ghodnKRG7foc+29RLYka5CxF/MqKzpJu675sSBCVB3uW9NZpoGSwml/NN9yUu1JD
uvU2FECcbxF5OU6RHfq8FFql0fAOVeVWsMCEXgkjpZOXynKU15VTf1K2d+qkUZ/Y
gV45Jj+yCZhYs0wbXUml
=QSyy
-----END PGP SIGNATURE-----
Merge tag 'renesas-sh-drivers-for-v3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas
Pull SH driver fix from Simon Horman:
"Confine SH_INTC to platforms that need it"
* tag 'renesas-sh-drivers-for-v3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
sh: intc: Confine SH_INTC to platforms that need it
Pull MIPS fixes from Ralf Baechle:
"Pretty much all across the field so with this we should be in
reasonable shape for the upcoming -rc2"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: OCTEON: make get_system_type() thread-safe
MIPS: CPS: Initialize EVA before bringing up VPEs from secondary cores
MIPS: Malta: EVA: Rename 'eva_entry' to 'platform_eva_init'
MIPS: EVA: Add new EVA header
MIPS: scall64-o32: Fix indirect syscall detection
MIPS: syscall: Fix AUDIT value for O32 processes on MIPS64
MIPS: Loongson: Fix COP2 usage for preemptible kernel
MIPS: NL: Fix nlm_xlp_defconfig build error
MIPS: Remove race window in page fault handling
MIPS: Malta: Improve system memory detection for '{e, }memsize' >= 2G
MIPS: Alchemy: Fix db1200 PSC clock enablement
MIPS: BCM47XX: Fix reboot problem on BCM4705/BCM4785
MIPS: Remove duplicated include from numa.c
MIPS: Add common plat_irq_dispatch declaration
MIPS: MSP71xx: remove unused plat_irq_dispatch() argument
MIPS: GIC: Remove useless parens from GICBIS().
MIPS: perf: Mark pmu interupt IRQF_NO_THREAD
separate trampolines had a design flaw with the interaction between
the function and function_graph tracers.
The main flaw was the simplification of the use of multiple tracers having
the same filter (like function and function_graph, that use the
set_ftrace_filter file to filter their code). The design assumed that the
two tracers could never run simultaneously as only one tracer can be
used at a time. The problem with this assumption was that the function
profiler could be implemented on top of the function graph tracer, and
the function profiler could run at the same time as the function tracer.
This caused the assumption to be broken and when ftrace detected this
failed assumpiton it would spit out a nasty warning and shut itself down.
Instead of using a single ftrace_ops that switches between the function
and function_graph callbacks, the two tracers can again use their own
ftrace_ops. But instead of having a complex hierarchy of ftrace_ops,
the filter fields are placed in its own structure and the ftrace_ops
can carefully use the same filter. This change took a bit to be able
to allow for this and currently only the global_ops can share the same
filter, but this new design can easily be modified to allow for any
ftrace_ops to share its filter with another ftrace_ops.
The first four patches deal with the change of allowing the ftrace_ops
to share the filter (and this needs to go to 3.16 as well).
The fifth patch fixes a bug that was also caused by the new changes
but only for archs other than x86, and only if those archs implement
a direct call to the function_graph tracer which they do not do yet
but will in the future. It does not need to go to stable, but needs
to be fixed before the other archs update their code to allow direct
calls to the function_graph trampoline.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJT+hqSAAoJEKQekfcNnQGulvcH/0O4NMXX4HH1dQlYgKEaSYxE
Nh8WdiewopF5iaeNvo+8Nzdq8D2k3KgMOqSlzJ4JVmzd7gjOBSGeKDfqFwR+IbTk
9LcaJJCI3oG3MEf6m7gZMdjKPKyxkeYHDtG7kRHo8z94eliV9pKC6fUnEWayQO3o
Kv6IBupdkF8ICAiKRae5Uo0c9wjZ9YP0bZS7fxI2hJw3h/NMFnhnhUL03URIx8e3
dqgpweYg+P3KPfp2Jz6safdJqLTPK9rqqhkZhylbDl7o78xEzRN7wCyB6Nak00xz
swRgsW6vFP7ci/YSNx+B6HCIf7NTm3WLDrrIhitNHcJUZwUMU3CRO9IJHGsTuEE=
=J5lZ
-----END PGP SIGNATURE-----
Merge tag 'trace-fixes-v3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull fix for ftrace function tracer/profiler conflict from Steven Rostedt:
"The rewrite of the ftrace code that makes it possible to allow for
separate trampolines had a design flaw with the interaction between
the function and function_graph tracers.
The main flaw was the simplification of the use of multiple tracers
having the same filter (like function and function_graph, that use the
set_ftrace_filter file to filter their code). The design assumed that
the two tracers could never run simultaneously as only one tracer can
be used at a time. The problem with this assumption was that the
function profiler could be implemented on top of the function graph
tracer, and the function profiler could run at the same time as the
function tracer. This caused the assumption to be broken and when
ftrace detected this failed assumpiton it would spit out a nasty
warning and shut itself down.
Instead of using a single ftrace_ops that switches between the
function and function_graph callbacks, the two tracers can again use
their own ftrace_ops. But instead of having a complex hierarchy of
ftrace_ops, the filter fields are placed in its own structure and the
ftrace_ops can carefully use the same filter. This change took a bit
to be able to allow for this and currently only the global_ops can
share the same filter, but this new design can easily be modified to
allow for any ftrace_ops to share its filter with another ftrace_ops.
The first four patches deal with the change of allowing the ftrace_ops
to share the filter (and this needs to go to 3.16 as well).
The fifth patch fixes a bug that was also caused by the new changes
but only for archs other than x86, and only if those archs implement a
direct call to the function_graph tracer which they do not do yet but
will in the future. It does not need to go to stable, but needs to be
fixed before the other archs update their code to allow direct calls
to the function_graph trampoline"
* tag 'trace-fixes-v3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Use current addr when converting to nop in __ftrace_replace_code()
ftrace: Fix function_profiler and function tracer together
ftrace: Fix up trampoline accounting with looping on hash ops
ftrace: Update all ftrace_ops for a ftrace_hash_ops update
ftrace: Allow ftrace_ops to use the hashes from other ops
Pull x86 fixes from Ingo Molnar:
"A couple of EFI fixes, plus misc fixes all around the map"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/arm64: Store Runtime Services revision
firmware: Do not use WARN_ON(!spin_is_locked())
x86_32, entry: Clean up sysenter_badsys declaration
x86/doc: Fix the 'tlb_single_page_flush_ceiling' sysconfig path
x86/mm: Fix sparse 'tlb_single_page_flush_ceiling' warning and make the variable read-mostly
x86/mm: Fix RCU splat from new TLB tracepoints
Pull perf fixes from Ingo Molnar:
"A kprobes and a perf compat ioctl fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Handle compat ioctl
kprobes: Skip kretprobe hit in NMI context to avoid deadlock
A collection of fixes from this week, it's been pretty quiet and nothing
really stands out as particularly noteworthy here -- mostly minor fixes
across the field:
- ODROID booting was fixed due to PMIC interrupts missing in DT
- A collection of i.MX fixes
- Minor Tegra fix for regulators
- Rockchip fix and addition of SoC-specific mailing list to make it
easier to find posted patches.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJT+jEaAAoJEIwa5zzehBx3gPIP/2EMvkRiFL73z6u67d6AhqCR
UPpv65Lk0FzvKZYvpNdNDdPatS48SpEh4Ppj58HegK31jaZIYhsvGHFsAwrHpZXF
2D2H8Vqy6wl+UL+BQGTHJ2rhqgg40PSZQh1KksaBrjb9MDWUbAI0V9AysoT6ecIP
/02mzRkNPL7V5ISikksa82FWbZwI36KJGTM19ZDp6CYQnV2L4eG0LefGjgEzdE07
iur1PO/TUi0ibQex3It9D+ynNlza0sZJDR0AsGFTS/96fvtX6SQzvxtEsr4jUfyT
jceqD5KIWS9N1OmRudJ+e3awpzGGuRIkdq36eiJbhSe426LHgDNbIS4RU+YRNFIf
9/bK4blcxGNnddsDTLUIyi+vykAm1ObAfGNNrKeA4z9lDw0QVuoG5VwrgNKcIk3J
kugj3RLUQ5yd9iIFJyrPxlpB5mTo5SaPCSxjuDKzftwDQtF+SJI58V//Nde0Ocfw
K7VpmY26uYUmf6AltiyFxOCUASzUC3Bp+/cf0lYTWE1+iIuOTJRlYmYwKDMrJ+c0
mtz3uCiQhTzaHje6AksA1wlKhv3KS/opGN1oNVSILgjExiRGh94MSkROLqtJ35cE
WV/nuzZUPdmZ6tGQ6b7FEa0elbEElaioDiQraOeSehKijEjfhK+tHNJczoO94s2R
Swfff4NwRe346ZMk/L/2
=+oYu
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A collection of fixes from this week, it's been pretty quiet and
nothing really stands out as particularly noteworthy here -- mostly
minor fixes across the field:
- ODROID booting was fixed due to PMIC interrupts missing in DT
- a collection of i.MX fixes
- minor Tegra fix for regulators
- Rockchip fix and addition of SoC-specific mailing list to make it
easier to find posted patches"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
bus: arm-ccn: Fix warning message
ARM: shmobile: koelsch: Remove non-existent i2c6 pinmux
ARM: tegra: apalis/colibri t30: fix on-module 5v0 supplies
MAINTAINERS: add new Rockchip SoC list
ARM: dts: rockchip: readd missing mmc0 pinctrl settings
ARM: dts: ODROID i2c improvements
ARM: dts: Enable PMIC interrupts on ODROID
ARM: dts: imx6sx: fix the pad setting for uart CTS_B
ARM: dts: i.MX53: fix apparent bug in VPU clks
ARM: imx: correct gpu2d_axi and gpu3d_axi clock setting
ARM: dts: imx6: edmqmx6: change enet reset pin
ARM: dts: vf610-twr: Fix pinctrl_esdhc1 pin definitions.
ARM: imx: remove unnecessary ARCH_HAS_OPP select
ARM: imx: fix TLB missing of IOMUXC base address during suspend
ARM: imx6: fix SMP compilation again
ARM: dt: sun6i: Add #address-cells and #size-cells to i2c controller nodes
- A largeish fix for the IRQ handling in the new Zynq driver.
The quite verbose commit message gives the exact details.
- Move some defines for gpiod flags outside an ifdef to make
stub functions work again.
- Various minor fixes that we can accept for -rc1.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJT+ZCiAAoJEEEQszewGV1zOZEQALkGrQacC9pZdhw93+wumj/d
goA4B1VSwi+vlQ3DcF7oXbTcbaYfGu58ZAlShc9XhVINaWEkVfc1GnBCq9rsnjNh
/rPaqDpqqBo7tfghjsMIZFdlNH151u/4rCAFMUB/STOAlUckSCiqWdxd7wMhAPlZ
yLD5oKqm3mWu/PX2n334rkLb1UxT1vIniB3DgQRhUtrKeYarH9OpzJnWnCCmspCr
GeVWpq4Z851fHMok5D76EbLL57JykTCC7VjHyK9jrmbjpMFX6+lYmWWMCOgUHypi
zA5ZEL7PjhhrZlwTN5yn8QbuSfbmT14tTUP/OLtItE0CYjF1nheyxueKErHT+DP4
piLDZgdv79ZYBSbvfbpjeO5VLkZz3m7EAGp/0kWy89uGcwjVsk+zsDSDc0v09tTk
hYRhu5/gc8zp3rd7v4DspSBbpxK+iI7uMzaGuv32AEdPvFzDVy4SJmRN/qTt/W9E
KVAdk4OJdx7xb9ee9Z2OfdbteOt0zFQjmTIneZ0urlqOZCbvbFH4PvuPB4L81SCV
XQ023OZqbBYb72Kz3Fw46NjMvRql3rkDph62oikEtfSKmSziMDY9z6bY42b0PCTq
mSovoQduN06wRDy5VQA8H1h7VF3k8NXjTI2hMEV85VhxlCzhabukBf+HY37TQIhT
pLQp0fwKmf8iZWtuF4Of
=7iYA
-----END PGP SIGNATURE-----
Merge tag 'gpio-v3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull gpio fixes from Linus Walleij:
- a largeish fix for the IRQ handling in the new Zynq driver. The
quite verbose commit message gives the exact details.
- move some defines for gpiod flags outside an ifdef to make stub
functions work again.
- various minor fixes that we can accept for -rc1.
* tag 'gpio-v3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio-lynxpoint: enable input sensing in resume
gpio: move GPIOD flags outside #ifdef
gpio: delete unneeded test before of_node_put
gpio: zynq: Fix IRQ handlers
gpiolib: devres: use correct structure type name in sizeof
MAINTAINERS: Change maintainer for gpio-bcm-kona.c
Pull drm fixes from Dave Airlie:
"Intel and radeon fixes.
Post KS/LC git requests from i915 and radeon stacked up. They are all
fixes along with some new pci ids for radeon, and one maintainers file
entry.
- i915: display fixes and irq fixes
- radeon: pci ids, and misc gpuvm, dpm and hdp cache"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (29 commits)
MAINTAINERS: Add entry for Renesas DRM drivers
drm/radeon: add additional SI pci ids
drm/radeon: add new bonaire pci ids
drm/radeon: add new KV pci id
Revert "drm/radeon: Use write-combined CPU mappings of ring buffers with PCIe"
drm/radeon: fix active_cu mask on SI and CIK after re-init (v3)
drm/radeon: fix active cu count for SI and CIK
drm/radeon: re-enable selective GPUVM flushing
drm/radeon: Sync ME and PFP after CP semaphore waits v4
drm/radeon: fix display handling in radeon_gpu_reset
drm/radeon: fix pm handling in radeon_gpu_reset
drm/radeon: Only flush HDP cache for indirect buffers from userspace
drm/radeon: properly document reloc priority mask
drm/i915: don't try to retrain a DP link on an inactive CRTC
drm/i915: make sure VDD is turned off during system suspend
drm/i915: cancel hotplug and dig_port work during suspend and unload
drm/i915: fix HPD IRQ reenable work cancelation
drm/i915: take display port power domain in DP HPD handler
drm/i915: Don't try to enable cursor from setplane when crtc is disabled
drm/i915: Skip load detect when intel_crtc->new_enable==true
...
As reported by Dan Aloni, commit f8567a3845 ("aio: fix aio request
leak when events are reaped by userspace") introduces a regression when
user code attempts to perform io_submit() with more events than are
available in the ring buffer. Reverting that commit would reintroduce a
regression when user space event reaping is used.
Fixing this bug is a bit more involved than the previous attempts to fix
this regression. Since we do not have a single point at which we can
count events as being reaped by user space and io_getevents(), we have
to track event completion by looking at the number of events left in the
event ring. So long as there are as many events in the ring buffer as
there have been completion events generate, we cannot call
put_reqs_available(). The code to check for this is now placed in
refill_reqs_available().
A test program from Dan and modified by me for verifying this bug is available
at http://www.kvack.org/~bcrl/20140824-aio_bug.c .
Reported-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Dan Aloni <dan@kernelim.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: stable@vger.kernel.org # v3.16 and anything that f8567a3845 was backported to
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A message warning a user about wrong vc value was printing
out port instead.
Reported-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
On r8a7791, i2c6 (aka iic3) doesn't need pinmux, but the koelsch dts
refers to non-existent pinmux configuration data:
pinmux core: sh-pfc does not support function i2c6
sh-pfc e6060000.pfc: invalid function i2c6 in map table
Remove it to fix this.
Fixes: commit 1d41f36a68 ("ARM: shmobile:
koelsch dts: Add VDD MPU regulator for DVFS")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Olof Johansson <olof@lixom.net>
Working on Gigabit/PCIe support in U-Boot for Apalis T30 I realised
that the current device tree source includes for our modules only
happen to work due to referencing the on-carrier 5v0 supply from USB
which is not at all available on-module. The modules actually contain
TPS60150 charge pumps to generate the PMIC required 5 volts from the
one and only 3.3 volt module supply. This patch fixes this.
(Note: When back-porting this to v3.16 stable releases, simply drop the
change to tegra30-apalis.dtsi; that file was added in v3.17)
Cc: <stable@vger.kernel.org> #v3.16+
Signed-off-by: Marcel Ziswiler <marcel@ziswiler.com>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
dts files and addition of the new Rockchip list to MAINTAINERS.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCAAGBQJT+hwkAAoJEPOmecmc0R2Bda4IAJ7rxypWxi8RtfxULHu1oGxZ
cdmAsaedAkjTn6tye/99lv1WkLzUD+Q3AxLX6Glu8PgTkocbwtkVTP+gMbv7zwfN
Z9bA6kazPL4Rskr6Am8BFKkGhXHCXChGP1J3z9Gyh07Vu2+3zsvRzyM7hySt3TqZ
EA2dL/uODlukw6EPzFaqGWZLn1OuJmkHXDfnZ3lBI/GFZD9qt5DnYswMBlDlIwr6
D08jmcleRA3wpnY1HXYR2cN1sJmcEP6xVE8ApXd71X0MtumEy49ZTR+4T5/Sh/XN
OPheoH3UUVtXVrAa5fsoUE2xsKs3PZgnIHk1kQklpbw+HArbN+aW0Jh6CySb65Q=
=aNJj
-----END PGP SIGNATURE-----
Merge tag 'v3.17-rockchip-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into fixes
Merge "ARM: rockchip: fix for 3.17" from Heiko Stubner:
Pinctrl that got accidentially dropped when reorganizing the
dts files and addition of the new Rockchip list to MAINTAINERS.
* tag 'v3.17-rockchip-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
MAINTAINERS: add new Rockchip SoC list
ARM: dts: rockchip: readd missing mmc0 pinctrl settings
Signed-off-by: Olof Johansson <olof@lixom.net>
This pull just contains some new pci ids.
* 'drm-fixes-3.17' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: add additional SI pci ids
drm/radeon: add new bonaire pci ids
drm/radeon: add new KV pci id
During the restructuring of the Rockchip Cortex-A9 dtsi files it seems
like the pinctrl settings vanished at some point from the mmc0 support.
This of course renders them unusable, so readd the necessary pinctrl
properties.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Only a single patch in here that fixes a DTC warning.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJT2hD6AAoJEBx+YmzsjxAgV9YP/1wML2m04CODnLmLhKyx1GyM
VyBoEHb+o7P7TRyK7ePIVkGhGnQjYGMyXAI2KfOiBJpAl23uxXCXSNKc6WY+nBAw
6T6qL6w+tPIPW4B79fAjmoPksqhoVf9sAKAkdu6+ETwKMHRaF0IesPRTg9t21Wcd
yHHDY4z/qaFCMV59zeOIWMxM0V0gvUa46+bXC2flyTcTz5RgyUswCOXKHLvXfK1L
w8B+V34AeBtm2EzG6co1iOJVRKXlrjzUg25+adbRissNfvmVkPwYkQJNoBO6r/PO
Rxf7Mq50oovPl/Oc83e3RZl3tE8ds12t+ScS0qUFq6+tBvyq53IYP+51IhDOGiR3
a6PB1nPEXtvUt09xv5SPJiRsB+BFrh5hx+qWbDvUNQL5FZrqBxa63H6qArybFQXg
GaY8Zv5gdiMvAU0tdk7v4pduJkvvoG2GALGHPrcKmrpg2+qQ4LDKOJxErIEqOUi2
ERp7c0kODDz18F9OmjGYU2R7XT6Ji8h4MS/hbCiLajcboQMe3yClrcVB8ts+AUKJ
NSDWl5Q0/8uHhk40FAaGYEqVLakk8vhXwdg1hpcXzIg3dJg+P09dYK5nIOBzKIok
+eVXZ8xjcRnoBAumljZ3eGbTuysGDO95E56coDPE4IiAe4Esi7kl3fUS/NICoOGs
MC9tjeyMoscy+hoGsP57
=PL45
-----END PGP SIGNATURE-----
Merge tag 'sunxi-dt-for-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into fixes
Merge "Allwinner DT changes, take 2" from Maxime Ripard:
Only a single patch in here that fixes a DTC warning.
* tag 'sunxi-dt-for-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux:
ARM: dt: sun6i: Add #address-cells and #size-cells to i2c controller nodes
Signed-off-by: Olof Johansson <olof@lixom.net>
In __ftrace_replace_code(), when converting the call to a nop in a function
it needs to compare against the "curr" (current) value of the ftrace ops, and
not the "new" one. It currently does not affect x86 which is the only arch
to do the trampolines with function graph tracer, but when other archs that do
depend on this code implement the function graph trampoline, it can crash.
Here's an example when ARM uses the trampolines (in the future):
------------[ cut here ]------------
WARNING: CPU: 0 PID: 9 at kernel/trace/ftrace.c:1716 ftrace_bug+0x17c/0x1f4()
Modules linked in: omap_rng rng_core ipv6
CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.16.0-test-10959-gf0094b28f303-dirty #52
[<c02188f4>] (unwind_backtrace) from [<c021343c>] (show_stack+0x20/0x24)
[<c021343c>] (show_stack) from [<c095a674>] (dump_stack+0x78/0x94)
[<c095a674>] (dump_stack) from [<c02532a0>] (warn_slowpath_common+0x7c/0x9c)
[<c02532a0>] (warn_slowpath_common) from [<c02532ec>] (warn_slowpath_null+0x2c/0x34)
[<c02532ec>] (warn_slowpath_null) from [<c02cbac4>] (ftrace_bug+0x17c/0x1f4)
[<c02cbac4>] (ftrace_bug) from [<c02cc44c>] (ftrace_replace_code+0x80/0x9c)
[<c02cc44c>] (ftrace_replace_code) from [<c02cc658>] (ftrace_modify_all_code+0xb8/0x164)
[<c02cc658>] (ftrace_modify_all_code) from [<c02cc718>] (__ftrace_modify_code+0x14/0x1c)
[<c02cc718>] (__ftrace_modify_code) from [<c02c7244>] (multi_cpu_stop+0xf4/0x134)
[<c02c7244>] (multi_cpu_stop) from [<c02c6e90>] (cpu_stopper_thread+0x54/0x130)
[<c02c6e90>] (cpu_stopper_thread) from [<c0271cd4>] (smpboot_thread_fn+0x1ac/0x1bc)
[<c0271cd4>] (smpboot_thread_fn) from [<c026ddf0>] (kthread+0xe0/0xfc)
[<c026ddf0>] (kthread) from [<c020f318>] (ret_from_fork+0x14/0x20)
---[ end trace dc9ce72c5b617d8f ]---
[ 65.047264] ftrace failed to modify [<c0208580>] asm_do_IRQ+0x10/0x1c
[ 65.054070] actual: 85:1b:00:eb
Fixes: 7413af1fb7 "ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The latest rewrite of ftrace removed the separate ftrace_ops of
the function tracer and the function graph tracer and had them
share the same ftrace_ops. This simplified the accounting by removing
the multiple layers of functions called, where the global_ops func
would call a special list that would iterate over the other ops that
were registered within it (like function and function graph), which
itself was registered to the ftrace ops list of all functions
currently active. If that sounds confusing, the code that implemented
it was also confusing and its removal is a good thing.
The problem with this change was that it assumed that the function
and function graph tracer can never be used at the same time.
This is mostly true, but there is an exception. That is when the
function profiler uses the function graph tracer to profile.
The function profiler can be activated the same time as the function
tracer, and this breaks the assumption and the result is that ftrace
will crash (it detects the error and shuts itself down, it does not
cause a kernel oops).
To solve this issue, a previous change allowed the hash tables
for the functions traced by a ftrace_ops to be a pointer and let
multiple ftrace_ops share the same hash. This allows the function
and function_graph tracer to have separate ftrace_ops, but still
share the hash, which is what is done.
Now the function and function graph tracers have separate ftrace_ops
again, and the function tracer can be run while the function_profile
is active.
Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If a SIGKILL is sent to a task waiting in __nfs_iocounter_wait,
it will busy-wait or soft lockup in its while loop.
nfs_wait_bit_killable won't sleep, and the loop won't exit on
the error return.
Stop the busy-wait by breaking out of the loop when
nfs_wait_bit_killable returns an error.
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Commit 6094f83864
"nfs: allow coalescing of subpage requests" got rid of the requirement
that requests cover whole pages, but it made some incorrect assumptions.
It turns out that callers of this interface can map adjacent requests
(by file position as seen by req_offset + req->wb_bytes) to different pages,
even when they could share a page. An example is the direct I/O interface -
iov_iter_get_pages_alloc may return one segment with a partial page filled
and the next segment (which is adjacent in the file position) starts with a
new page.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Adjacent requests that share the same page are allowed, but should only
use one entry in the page vector. This avoids overruning the page
vector - it is sized based on how many bytes there are, not by
request count.
This fixes issues that manifest as "Redzone overwritten" bugs (the
vector overrun) and hangs waiting on page read / write, as it waits on
the same page more than once.
This also adds bounds checking to the page vector with a graceful failure
(WARN_ON_ONCE and pgio error returned to application).
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
This handles the 'nonblock=false' case in nfs_lock_and_join_requests.
If the group is already locked and blocking is allowed, drop the inode lock
and wait for the group lock to be cleared before trying it all again.
This should fix warnings found in peterz's tree (sched/wait branch), where
might_sleep() checks are added to wait.[ch].
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Reviewed-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
This fixes handling of errors from nfs_page_group_lock in
nfs_lock_and_join_requests. It now releases the inode lock and the
reference to the head request.
Reported-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Reviewed-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
__nfs_pageio_add_request was calling nfs_page_group_lock nonblocking, but
this can return -EAGAIN which would end up passing -EIO to the application.
There is no reason not to block in this path, so change the two calls to
do so. Also, there is no need to check the return value of
nfs_page_group_lock when nonblock=false, so remove the error handling code.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Reviewed-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
nfs_page_group_lock was calling wait_on_bit_lock even when told not to
block. Fix by first trying test_and_set_bit, followed by wait_on_bit_lock
if and only if blocking is allowed. Return -EAGAIN if nonblocking and the
test_and_set of the bit was already locked.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Reviewed-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Flip the meaning of the second argument from 'wait' to 'nonblock' to
match related functions. Update all five calls to reflect this change.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Reviewed-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Just one bugfix for the PWM lookup table code that would cause a PWM
channel to be set to the wrong period and polarity for non-perfect
matches.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJT8cLVAAoJEN0jrNd/PrOhRiQP/10av7t2Ahdn3Qi2imz5j3UO
lA73tzU36MEchsNoDkYnLwZik6x/3GYk7QUkPoyMFYcay1Wu/USj5hTuZl2phOtF
9tdqkKosrV7APJfpzfuoj6W2FtKBIV4iaxez+ZrqXXKj4BdKXGFvv72w4xf/EWE9
aPabqg3lvorZY42adqbqH5kbATd61FJPZktwzKfmg7O01Wnp2GL3xCPApq9CsBEQ
c7i9TR1ttEQZNM6RRs7auwgRNgbuxFZkXRSP5VFbFb1TB3OMCDcGY+PXab42SYLR
ztlUao93jZP9Dz7abIGHcZDgRpj7i6veu09RAH6C1Lr0ovvcTor69LlsgvaWQKKb
8CMiKGpLVF3Sg3wLwrSRgUb7FMNVc/R1lR//BtMMTxFcVNTvxc18Tl41azx3kZEt
UobQ3IzpalOlJTj1ADzUwws9alcgnD5hD6SEQJwwuqEzJTB4FeTepnrr4VgAA1oM
HU8+TzSdZLV0lDIl43rKj0kZ93ds3i2lM/FU6e0Z4bZT1K53J9a4iQsKJIFy83An
bcT0lr1kgBTHoBvSnCLsSB6ZmWZ3rmnQ6kIWv/nfzcQKQdNLMpyEbb1xComM/SOI
VjaIf/OaTa9h1QswSS2kYZpOguNiQkzRQtDsH4Pr0UgG7g7A5kofjsaP1VZRjcCq
FEFj0Zi+aHYX/UKDNa72
=SmTb
-----END PGP SIGNATURE-----
Merge tag 'pwm/for-3.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm fix from Thierry Reding:
"Just one bugfix for the PWM lookup table code that would cause a PWM
channel to be set to the wrong period and polarity for non-perfect
matches"
* tag 'pwm/for-3.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: Fix period and polarity in pwm_get() for non-perfect matches
The new_ctx pointer is set only for non-chanctx drivers. This yielded a
crash for chanctx-based drivers during channel switch finalization:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]
Use an adequate chanctx pointer to fix this.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
"Here are some bug fixes that have piled up during ksummit/linuxcon.
1) Fix endian problems in ibmveth, from Anton Blanchard.
2) IPV6 routing code does GFP_KERNEL allocation in atomic, fix from
Benjamin Block.
3) SCTP association fixes from Daniel Borkmann.
4) When multiple VLAN headers are present we have to make sure the
second and subsequent ones are pullable in the SKB otherwise we
blindly dereference garbage. From Jiri Benc.
5) The argument adjustment of the signature of hlist_add_after*()
introduced a regression in the batman-adv code, fix from Sven
Eckelmann.
6) Fix TX hang handling to avoid a panic in i40e, from Anjali Singhai
Jain.
7) PTP flag test is inverted in i40e driver, from Jesse Brandeburg.
8) ATM LEC driver needs to hold RTNL mutex over MTU changes, from
Chas Williams.
9) Truncate packets larger then the TPACKET_V3 format configured
buffers, otherwise we overwrite past the end of said buffers.
From Eric Dumazet.
10) Fix endianness bugs in qlcnic firmware handling, from Rajesh
Borundia and Shahed Shaikh.
11) CXGB4 sometimes doesn't get all of the TX completion events it
should resulting in SKBs getting stuck in the TX queue, from
Hariprasad Shenai.
12) When the FEC chip's PTP clock is disabled, you can't access the
register. Add necessary checks to avoid the resulting hang, from
Fugang Duan"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits)
drivers: isdn: eicon: xdi_msg.h: Fix typo in #ifndef
net: sctp: fix suboptimal edge-case on non-active active/retrans path selection
net: sctp: spare unnecessary comparison in sctp_trans_elect_best
net: ethernet: broadcom: bnx2x: Remove redundant #ifdef
ibmveth: Fix endian issues with rx_no_buffer statistic
net: xgene: fix possible NULL dereference in xgene_enet_free_desc_rings()
openvswitch: fix panic with multiple vlan headers
net: ipv6: fib: don't sleep inside atomic lock
net: fec: ptp: avoid register access when ipg clock is disabled
cxgb4: Free completed tx skbs promptly
cxgb4: Fix race condition in cleanup
sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probe
bnx2x: Revert UNDI flushing mechanism
qlcnic: Fix endianess issue in firmware load from file operation
qlcnic: Fix endianess issue in FW dump template header
qlcnic: Fix flash access interface to application
MAINTAINERS: Add section for MRF24J40 IEEE 802.15.4 radio driver
macvlan: Allow setting multicast filter on all macvlan types
packet: handle too big packets for PACKET_V3
MAINTAINERS: add entry for ec_bhf driver
...
Now that a ftrace_hash can be shared by multiple ftrace_ops, they can dec
the rec->flags by more than once (one per those that share the ftrace_hash).
This means that the tramp_hash may not have a hash item when it was added.
For example, if two ftrace_ops share a hash for a ftrace record, and the
first ops has a trampoline, when it adds itself it will set the rec->flags
TRAMP flag and increments its nr_trampolines counter. When the second ops
is added, it must clear that tramp flag but also decrement the other ops
that shares its hash. As the update to the function callbacks has not yet
been performed, the other ops will not have the tramp hash set yet and it
can not be used to know to decrement its nr_trampolines.
Luckily, the tramp_hash does not need to be used. As the ftrace_mutex is
held, a ops with a trampoline to a record during an update of another ops
that shares the record will have its func_hash pointing to it. Since a
trampoline can only be set for a record if only one ops is attached to it,
we can just check if the record has a trampoline (the FTRACE_FL_TRAMP flag
is set) and then find the ops that has this record in its hashes.
Also added some output to help debug when things go wrong.
Cc: stable@vger.kernel.org # 3.16+ (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Test for definedness of the macro which is actually defined (the
change is hard to see: it is s/SSS/SSA/).
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
In SCTP, selection of active (T.ACT) and retransmission (T.RET)
transports is being done whenever transport control operations
(UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport().
Commits 4c47af4d5e ("net: sctp: rework multihoming retransmission
path selection to rfc4960") and a7288c4dd5 ("net: sctp: improve
sctp_select_active_and_retran_path selection") have both improved
it towards a more fine-grained and optimal path selection.
Currently, the selection algorithm for T.ACT and T.RET is as follows:
1) Elect the two most recently used ACTIVE transports T1, T2 for
T.ACT, T.RET, where T.ACT<-T1 and T1 is most recently used
2) In case primary path T.PRI not in {T1, T2} but ACTIVE, set
T.ACT<-T.PRI and T.RET<-T1
3) If only T1 is ACTIVE from the set, set T.ACT<-T1 and T.RET<-T1
4) If none is ACTIVE, set T.ACT<-best(T.PRI, T.RET, T3) where
T3 is the most recently used (if avail) in PF, set T.RET<-T.PRI
Prior to above commits, 4) was simply a camp on T.ACT<-T.PRI and
T.RET<-T.PRI, ignoring possible paths in PF. Camping on T.PRI is
still slightly suboptimal as it can lead to the following scenario:
Setup:
<A> <B>
T1: p1p1 (10.0.10.10) <==> .'`) <==> p1p1 (10.0.10.12) <= T.PRI
T2: p1p2 (10.0.10.20) <==> (_ . ) <==> p1p2 (10.0.10.22)
net.sctp.rto_min = 1000
net.sctp.path_max_retrans = 2
net.sctp.pf_retrans = 0
net.sctp.hb_interval = 1000
T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to
link flapping). Here, the first time transmission is sent over PF path
T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks
are sent over the INACTIVE path T1 (T.PRI), which is not good.
After the patch, it's choosing better transports in both cases by
modifying step 4):
4) If none is ACTIVE, set T.ACT_new<-best(T.ACT_old, T3) where T3 is
the most recently used (if avail) in PF, set T.RET<-T.ACT_new
This will still select a best possible path in PF if available (which
can also include T.PRI/T.RET), and set both T.ACT/T.RET to it.
In case sctp_assoc_control_transport() *just* put T.ACT_old into INACTIVE
as it transitioned from ACTIVE->PF->INACTIVE and stays in INACTIVE just
for a very short while before going back ACTIVE, it will guarantee that
this path will be reselected for T.ACT/T.RET since T3 (PF) is not
available.
Previously, this was not possible, as we would only select between T.PRI
and T.RET, and a possible T3 would be NULL due to the fact that we have
just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE
and would select a suboptimal path when T.PRI/T.RET have worse properties.
In the case that T.ACT_old permanently went to INACTIVE during this
transition and there's no PF path available, plus T.PRI and T.RET are
INACTIVE as well, we would now camp on T.ACT_old, but if everything is
being INACTIVE there's really not much we can do except hoping for a
successful HB to bring one of the transports back up again and, thus
cause a new selection through sctp_assoc_control_transport().
Now both tests work fine:
Case 1:
1. T1 S(ACTIVE) T.ACT
T2 S(ACTIVE) T.RET
2. T1 S(ACTIVE) T.ACT, T.RET
T2 S(PF)
3. T1 S(ACTIVE) T.ACT, T.RET
T2 S(INACTIVE)
5. T1 S(PF) T.ACT, T.RET
T2 S(INACTIVE)
[ 5.1 T1 S(INACTIVE) T.ACT, T.RET
T2 S(INACTIVE) ]
6. T1 S(ACTIVE) T.ACT, T.RET
T2 S(INACTIVE)
7. T1 S(ACTIVE) T.ACT
T2 S(ACTIVE) T.RET
Case 2:
1. T1 S(ACTIVE) T.ACT
T2 S(ACTIVE) T.RET
2. T1 S(PF)
T2 S(ACTIVE) T.ACT, T.RET
3. T1 S(INACTIVE)
T2 S(ACTIVE) T.ACT, T.RET
5. T1 S(INACTIVE)
T2 S(PF) T.ACT, T.RET
[ 5.1 T1 S(INACTIVE)
T2 S(INACTIVE) T.ACT, T.RET ]
6. T1 S(INACTIVE)
T2 S(ACTIVE) T.ACT, T.RET
7. T1 S(ACTIVE) T.ACT
T2 S(ACTIVE) T.RET
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When both transports are the same, we don't have to go down that
road only to realize that we will return the very same transport.
We are guaranteed that curr is always non-NULL. Therefore, just
short-circuit this special case.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nothing defines _ASM_GENERIC_INT_L64_H, it is a weird way to check for
64 bit longs, and u64 should be printed using %llx anyway.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hidden away in the last 8 bytes of the buffer_list page is a solitary
statistic. It needs to be byte swapped or else ethtool -S will
produce numbers that terrify the user.
Since we do this in multiple places, create a helper function with a
comment explaining what is going on.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
A NULL pointer dereference is possible for the argument ring->buf_pool
which is passed to xgene_enet_free_desc_ring(), as ring could be NULL.
And now since NULL pointers are being checked for before the calls to
xgene_enet_free_desc_ring(), might as well take advantage of them and
not call the function if the argument would be NULL.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When there are multiple vlan headers present in a received frame, the first
one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
skb beyond the VLAN TPID may be still non-linear, including the inner TCI
and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.
This leads to two things:
1. Part of the resulting ethernet header is in the non-linear part of the
skb. When eth_type_trans is called later as the result of
OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
is in fact accessing random data when it reads past the TPID.
2. network_header points into the ethernet header instead of behind it.
mac_len is set to a wrong value (10), too.
Reported-by: Yulong Pei <ypei@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>