Commit Graph

704772 Commits

Author SHA1 Message Date
Thierry Reding
dfebb5f43a usb: chipidea: Add support for Tegra20/30/114/124
All of these Tegra SoC generations have a ChipIdea UDC IP block that can
be used for device mode communication with a host. Implement rudimentary
support that doesn't allow switching between host and device modes.

Tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Thierry Reding <treding@nvidia.com>
[digetx@gmail.com: rebased patches and added DMA alignment quirk for Tegra20]
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
2017-08-24 17:40:52 +08:00
Dmitry Osipenko
581821ae7f usb: chipidea: udc: Support SKB alignment quirk
NVIDIA Tegra20 UDC can't cope with unaligned DMA and require a USB gadget
quirk that avoids SKB buffer alignment to be set in order to make Ethernet
Gadget working. Later Tegra generations do not require that quirk. Let's
add a new platform data flag that allows to enable USB gadget quirk for
platforms that require it.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
2017-08-24 17:40:42 +08:00
Nicholas Piggin
2fe59f507a timers: Fix excessive granularity of new timers after a nohz idle
When a timer base is idle, it is forwarded when a new timer is added
to ensure that granularity does not become excessive. When not idle,
the timer tick is expected to increment the base.

However there are several problems:

- If an existing timer is modified, the base is forwarded only after
  the index is calculated.

- The base is not forwarded by add_timer_on.

- There is a window after a timer is restarted from a nohz idle, after
  it is marked not-idle and before the timer tick on this CPU, where a
  timer may be added but the ancient base does not get forwarded.

These result in excessive granularity (a 1 jiffy timeout can blow out
to 100s of jiffies), which cause the rcu lockup detector to trigger,
among other things.

Fix this by keeping track of whether the timer base has been idle
since it was last run or forwarded, and if so then forward it before
adding a new timer.

There is still a case where mod_timer optimises the case of a pending
timer mod with the same expiry time, where the timer can see excessive
granularity relative to the new, shorter interval. A comment is added,
but it's not changed because it is an important fastpath for
networking.

This has been tested and found to fix the RCU softlockup messages.

Testing was also done with tracing to measure requested versus
achieved wakeup latencies for all non-deferrable timers in an idle
system (with no lockup watchdogs running). Wakeup latency relative to
absolute latency is calculated (note this suffers from round-up skew
at low absolute times) and analysed:

             max     avg      std
upstream   506.0    1.20     4.68
patched      2.0    1.08     0.15

The bug was noticed due to the lockup detector Kconfig changes
dropping it out of people's .configs and resulting in larger base
clk skew When the lockup detectors are enabled, no CPU can go idle for
longer than 4 seconds, which limits the granularity errors.
Sub-optimal timer behaviour is observable on a smaller scale in that
case:

	     max     avg      std
upstream     9.0    1.05     0.19
patched      2.0    1.04     0.11

Fixes: Fixes: a683f390b9 ("timers: Forward the wheel clock whenever possible")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: David Miller <davem@davemloft.net>
Cc: dzickus@redhat.com
Cc: sfr@canb.auug.org.au
Cc: mpe@ellerman.id.au
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: linuxarm@huawei.com
Cc: abdhalee@linux.vnet.ibm.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: akpm@linux-foundation.org
Cc: paulmck@linux.vnet.ibm.com
Cc: torvalds@linux-foundation.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.com
2017-08-24 11:40:18 +02:00
Kalle Valo
90bc7dfdcb Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
ath.git patches for 4.14. Major changes:

ath10k

* initial UBS bus support (no full support yet)

* add tdls support for 10.4 firmware

ath9k

* add Dell Wireless 1802

wil6210

* support FW RSSI reporting
2017-08-24 12:26:17 +03:00
Ingo Molnar
c7f4f994de Merge tag 'perf-core-for-mingo-4.14-20170823' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

- Expression parser enhancements for metrics (Andi Kleen)

- Fix buffer overflow while freeing events in 'perf stat' (Andi Kleen)

- Fix static linking with elfutils's libdf and with libunwind
  in Debian/Ubuntu (Konstantin Khlebnikov)

- Tighten detection of BPF events, avoiding matching some other PMU
  events such as 'cpu/uops_executed.core,cmask=1/' as a .c source
  file that ended up being considered a BPF event (Andi Kleen)

- Add Skylake server uncore JSON vendor events (Andi Kleen)

- Add support for printing new mem_info encodings, including
  'perf test' checks (Andi Kleen)

- Really install manpages via 'make install-man' (Konstantin Khlebnikov)

- Fix documentation for perf_event_paranoid and perf_event_mlock_kb
  sysctls (Konstantin Khlebnikov)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-24 10:12:59 +02:00
Ingo Molnar
93da8b221d Merge branch 'linus' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-24 10:12:33 +02:00
Juergen Gross
ecda85e702 x86/lguest: Remove lguest support
Lguest seems to be rather unused these days. It has seen only patches
ensuring it still builds the last two years and its official state is
"Odd Fixes".

Remove it in order to be able to clean up the paravirt code.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: boris.ostrovsky@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: rusty@rustcorp.com.au
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170816173157.8633-3-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-24 09:57:28 +02:00
Juergen Gross
edcb5cf84f x86/paravirt/xen: Remove xen_patch()
Xen's paravirt patch function xen_patch() does some special casing for
irq_ops functions to apply relocations when those functions can be
patched inline instead of calls.

Unfortunately none of the special case function replacements is small
enough to be patched inline, so the special case never applies.

As xen_patch() will call paravirt_patch_default() in all cases it can
be just dropped. xen-asm.h doesn't seem necessary without xen_patch()
as the only thing left in it would be the definition of XEN_EFLAGS_NMI
used only once. So move that definition and remove xen-asm.h.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: rusty@rustcorp.com.au
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170816173157.8633-2-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-24 09:57:24 +02:00
Hanna Hawa
bf32f2aeb2 arm64: dts: marvell: add Device Tree files for Armada-8KP
This commit adds the base Device Tree files for the Armada 8KPlus.
The Armada 8KP SoCs include several hardware blocks, and this
commit only adds support for the AP810 block, that contains the CPU
core and basic peripherals.

AP810 is a high-performance die, includes octal core application
processor based ARMv8-A architecture, two standard high speed DDR4
interface, and GIC-600 interrupt controller.
AP810 Built as part of Marvell’s MoChi AP family products.

Armada-8080 (8KPlus family), include an AP810 block that contains
the CPU core and basic peripherals.

This commit creates the following hierarchy:
 * armada-ap810-ap0.dtsi - definitions common to AP810
 	* armada-ap810-ap0-octa-core.dtsi - description of the octa cores
		* armada-8080.dtsi - description of the 8080 SoC
			* armada-8080-db.dts - description of the 8080 board

Signed-off-by: Hanna Hawa <hannah@marvell.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
2017-08-24 09:56:41 +02:00
Takashi Sakamoto
b8e2204b25 ALSA: control: TLV data is unavailable at initial state of user-defined element set
For user-defined element set, in its initial state, TLV data is not
registered. It's firstly available when any application register it by
an additional operation. However, in current implementation, it's available
in its initial state. As a result, applications get -ENXIO to read it.

This commit controls its readability to manage info flags properly. In an
initial state, elements don't have SND_CTL_ELEM_ACCESS_TLV_READ flag. Once
TLV write operation is executed, they get the flag.

Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-08-24 09:15:15 +02:00
Takashi Sakamoto
da4288287b ALSA: control: queue TLV event for a set of user-defined element
In a design of user-defined element set, applications allow to change TLV
data on the set. This operation doesn't only affects to a target element,
but also to elements in the set.

This commit generates TLV event for all of elements in the set when the TLV
data is changed.

Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-08-24 09:15:14 +02:00
Takashi Sakamoto
fb8027ebfd ALSA: control: delegate TLV eventing to each driver
In a design of ALSA control core, a set of elements is represented by
'struct snd_kcontrol' to share common attributes. The set of elements
shares TLV (Type-Length-Value) data, too.

On the other hand, in ALSA control interface/protocol for applications,
a TLV operation is committed to an element. Totally, the operation can
have sub-effect to the other elements in the set. For example, TLV_WRITE
operation is expected to change TLV data, which returns to applications.
Applications attempt to change the TLV data per element, but in the above
design, they can effect to elements in the same set.

As a default, ALSA control core has no implementation except for TLV_READ
operation. Thus, the above design looks to have no issue. However, in
kernel APIs of ALSA control component, developers can program a handler
for any request of the TLV operation. Therefore, for elements in a set
which has the handler, applications can commit TLV_WRITE and TLV_COMMAND
requests.

For the above scenario, ALSA control core assist notification. When the
handler returns positive value, the core queueing an event for a requested
element. However, this includes design defects that the event is not
queued for the other element in a set. Actually, developers can program
the handlers to keep per-element TLV data, but it depends on each driver.

As of v4.13-rc6, there's no driver in tree to utilize the notification,
except for user-defined element set. This commit delegates the notification
into each driver to prevent developers from the design defects.

Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-08-24 09:15:13 +02:00
Arvind Yadav
5d3806eea2 ALSA: nm256: constify snd_ac97_res_table
snd_ac97_res_table are not supposed to change at runtime. All functions
working with snd_ac97_res_table provided by <sound/ac97_codec.h> work with
const snd_ac97_res_table. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-08-24 09:13:43 +02:00
Naveen N. Rao
2dea1d9c38 powerpc/uprobes: Implement arch_uretprobe_is_alive()
This helper is used to detect if a uprobe'd function has returned
through a setjmp/longjmp, rather than branching to the LR that was
updated previously by us. This fixes a SIGSEGV that gets generated when
programs use setjmp/longjmp with uretprobes.

We use the arm64 model (arch/arm64/kernel/probes/uprobes.c:
arch_uretprobe_is_alive()) for detecting when stack frames have been
removed from under us.

Reference:
https://marc.info/?l=linux-kernel&m=143748610330073
commit 7b868e4802 ("uprobes/x86: Reimplement arch_uretprobe_is_alive()")
commit db087ef69a ("uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more
clever")

Tested with the test program from:
https://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=blob;f=testsuite/systemtap.base/bz5274.c;hb=HEAD

And this script:
    $ cat test.sh
    #!/bin/bash

    perf probe -x ./bz5274 -a bz5274_main_return=main%return
    perf probe -x ./bz5274 -a bz5274_funca_return=funca%return
    perf probe -x ./bz5274 -a bz5274_funcb_return=funcb%return
    perf probe -x ./bz5274 -a bz5274_funcc_return=funcc%return
    perf probe -x ./bz5274 -a bz5274_funcd_return=funcd%return

    perf record -e 'probe_bz5274:*' -aR ./bz5274

Reported-by: Gustavo Luiz Duarte <gduarte@redhat.com>
Reported-by: zsun@redhat.com
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-24 16:19:21 +10:00
Naveen N. Rao
ec4189c4e8 powerpc/kprobes: Don't save/restore DAR/DSISR to/from pt_regs for optprobes
We don't save/restore these across a trap, or with KPROBES_ON_FTRACE.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-24 16:19:01 +10:00
Daniel Borkmann
a5e2da6e97 bpf: netdev is never null in __dev_map_flush
No need to test for it in fast-path, every dev in bpf_dtab_netdev
is guaranteed to be non-NULL, otherwise dev_map_update_elem() will
fail in the first place.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:43:40 -07:00
David S. Miller
d0273ef3b4 Merge branch 'bnxt_en-bug-fixes'
Michael Chan says:

====================
bnxt_en: bug fixes.

3 bug fixes related to XDP ring accounting in bnxt_setup_tc(), freeing
MSIX vectors when bnxt_re unregisters, and preserving the user-administered
PF MAC address when disabling SRIOV.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:42:43 -07:00
Michael Chan
a22a6ac2ff bnxt_en: Do not setup MAC address in bnxt_hwrm_func_qcaps().
bnxt_hwrm_func_qcaps() is called during probe to get all device
resources and it also sets up the factory MAC address.  The same function
is called when SRIOV is disabled to reclaim all resources.  If
the MAC address has been overridden by a user administered MAC
address, calling this function will overwrite it.

Separate the logic that sets up the default MAC address into a new
function bnxt_init_mac_addr() that is only called during probe time.

Fixes: 4a21b49b34 ("bnxt_en: Improve VF resource accounting.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:42:43 -07:00
Michael Chan
146ed3c5b8 bnxt_en: Free MSIX vectors when unregistering the device from bnxt_re.
Take back ownership of the MSIX vectors when unregistering the device
from bnxt_re.

Fixes: a588e4580a ("bnxt_en: Add interface to support RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:42:42 -07:00
Michael Chan
87e9b3778c bnxt_en: Fix .ndo_setup_tc() to include XDP rings.
When the number of TX rings is changed in bnxt_setup_tc(), we need to
include the XDP rings in the total TX ring count.

Fixes: 3841340627 ("bnxt_en: Add support for XDP_TX action.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:42:42 -07:00
Jakub Kicinski
46f1c52e66 nfp: TX time stamp packets before HW doorbell is rung
TX completion may happen any time after HW queue was kicked.
We can't access the skb afterwards.  Move the time stamping
before ringing the doorbell.

Fixes: 4c3523623d ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:40:49 -07:00
Shubham Bansal
d2aaa3dc41 bpf, doc: Add arm32 as arch supporting eBPF JIT
As eBPF JIT support for arm32 was added recently with
commit 39c13c204b, it seems appropriate to
add arm32 as arch with support for eBPF JIT in bpf and sysctl docs as well.

Signed-off-by: Shubham Bansal <illusionist.neo@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:40:12 -07:00
David S. Miller
81152518e9 Merge branch 'bpf-verifier-fixes'
Edward Cree says:

====================
bpf: verifier fixes

Fix a couple of bugs introduced in my recent verifier patches.
Patch #2 does slightly increase the insn count on bpf_lxc.o, but only by
 about a hundred insns (i.e. 0.2%).

v2: added test for write-marks bug (patch #1); reworded comment on
 propagate_liveness() for clarity.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:38:08 -07:00
Edward Cree
8e9cd9ce90 bpf/verifier: document liveness analysis
The liveness tracking algorithm is quite subtle; add comments to explain it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:38:08 -07:00
Edward Cree
1b688a19a9 bpf/verifier: remove varlen_map_value_access flag
The optimisation it does is broken when the 'new' register value has a
 variable offset and the 'old' was constant.  I broke it with my pointer
 types unification (see Fixes tag below), before which the 'new' value
 would have type PTR_TO_MAP_VALUE_ADJ and would thus not compare equal;
 other changes in that patch mean that its original behaviour (ignore
 min/max values) cannot be restored.
Tests on a sample set of cilium programs show no change in count of
 processed instructions.

Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:38:08 -07:00
Alexei Starovoitov
df20cb7ec1 selftests/bpf: add a test for a pruning bug in the verifier
The test makes a read through a map value pointer, then considers pruning
 a branch where the register holds an adjusted map value pointer.  It
 should not prune, but currently it does.

Signed-off-by: Alexei Starovoitov <ast@fb.com>
[ecree@solarflare.com: added test-name and patch description]
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:38:07 -07:00
Edward Cree
63f45f8406 bpf/verifier: when pruning a branch, ignore its write marks
The fact that writes occurred in reaching the continuation state does
 not screen off its reads from us, because we're not really its parent.
So detect 'not really the parent' in do_propagate_liveness, and ignore
 write marks in that case.

Fixes: dc503a8ad9 ("bpf/verifier: track liveness for pruning")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:38:07 -07:00
Edward Cree
d893dc26e3 selftests/bpf: add a test for a bug in liveness-based pruning
Writes in straight-line code should not prevent reads from propagating
 along jumps.  With current verifier code, the jump from 3 to 5 does not
 add a read mark on 3:R0 (because 5:R0 has a write mark), meaning that
 the jump from 1 to 3 gets pruned as safe even though R0 is NOT_INIT.

Verifier output:
0: (61) r2 = *(u32 *)(r1 +0)
1: (35) if r2 >= 0x0 goto pc+1
 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
2: (b7) r0 = 0
3: (35) if r2 >= 0x0 goto pc+1
 R0=inv0 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
4: (b7) r0 = 0
5: (95) exit

from 3 to 5: safe

from 1 to 3: safe
processed 8 insns, stack depth 0

Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:38:07 -07:00
Colin Ian King
60890e0460 gre: remove duplicated assignment of iph
iph is being assigned the same value twice; remove the redundant
first assignment. (Thanks to Nikolay Aleksandrov for pointing out
that the first asssignment should be removed and not the second)

Fixes warning:
net/ipv4/ip_gre.c:265:2: warning: Value stored to 'iph' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:36:03 -07:00
Stefano Brivio
ee6c88bb75 sctp: Avoid out-of-bounds reads from address storage
inet_diag_msg_sctp{,l}addr_fill() and sctp_get_sctp_info() copy
sizeof(sockaddr_storage) bytes to fill in sockaddr structs used
to export diagnostic information to userspace.

However, the memory allocated to store sockaddr information is
smaller than that and depends on the address family, so we leak
up to 100 uninitialized bytes to userspace. Just use the size of
the source structs instead, in all the three cases this is what
userspace expects. Zero out the remaining memory.

Unused bytes (i.e. when IPv4 addresses are used) in source
structs sctp_sockaddr_entry and sctp_transport are already
cleared by sctp_add_bind_addr() and sctp_transport_new(),
respectively.

Noticed while testing KASAN-enabled kernel with 'ss':

[ 2326.885243] BUG: KASAN: slab-out-of-bounds in inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] at addr ffff881be8779800
[ 2326.896800] Read of size 128 by task ss/9527
[ 2326.901564] CPU: 0 PID: 9527 Comm: ss Not tainted 4.11.0-22.el7a.x86_64 #1
[ 2326.909236] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
[ 2326.917585] Call Trace:
[ 2326.920312]  dump_stack+0x63/0x8d
[ 2326.924014]  kasan_object_err+0x21/0x70
[ 2326.928295]  kasan_report+0x288/0x540
[ 2326.932380]  ? inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
[ 2326.938500]  ? skb_put+0x8b/0xd0
[ 2326.942098]  ? memset+0x31/0x40
[ 2326.945599]  check_memory_region+0x13c/0x1a0
[ 2326.950362]  memcpy+0x23/0x50
[ 2326.953669]  inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
[ 2326.959596]  ? inet_diag_msg_sctpasoc_fill+0x460/0x460 [sctp_diag]
[ 2326.966495]  ? __lock_sock+0x102/0x150
[ 2326.970671]  ? sock_def_wakeup+0x60/0x60
[ 2326.975048]  ? remove_wait_queue+0xc0/0xc0
[ 2326.979619]  sctp_diag_dump+0x44a/0x760 [sctp_diag]
[ 2326.985063]  ? sctp_ep_dump+0x280/0x280 [sctp_diag]
[ 2326.990504]  ? memset+0x31/0x40
[ 2326.994007]  ? mutex_lock+0x12/0x40
[ 2326.997900]  __inet_diag_dump+0x57/0xb0 [inet_diag]
[ 2327.003340]  ? __sys_sendmsg+0x150/0x150
[ 2327.007715]  inet_diag_dump+0x4d/0x80 [inet_diag]
[ 2327.012979]  netlink_dump+0x1e6/0x490
[ 2327.017064]  __netlink_dump_start+0x28e/0x2c0
[ 2327.021924]  inet_diag_handler_cmd+0x189/0x1a0 [inet_diag]
[ 2327.028045]  ? inet_diag_rcv_msg_compat+0x1b0/0x1b0 [inet_diag]
[ 2327.034651]  ? inet_diag_dump_compat+0x190/0x190 [inet_diag]
[ 2327.040965]  ? __netlink_lookup+0x1b9/0x260
[ 2327.045631]  sock_diag_rcv_msg+0x18b/0x1e0
[ 2327.050199]  netlink_rcv_skb+0x14b/0x180
[ 2327.054574]  ? sock_diag_bind+0x60/0x60
[ 2327.058850]  sock_diag_rcv+0x28/0x40
[ 2327.062837]  netlink_unicast+0x2e7/0x3b0
[ 2327.067212]  ? netlink_attachskb+0x330/0x330
[ 2327.071975]  ? kasan_check_write+0x14/0x20
[ 2327.076544]  netlink_sendmsg+0x5be/0x730
[ 2327.080918]  ? netlink_unicast+0x3b0/0x3b0
[ 2327.085486]  ? kasan_check_write+0x14/0x20
[ 2327.090057]  ? selinux_socket_sendmsg+0x24/0x30
[ 2327.095109]  ? netlink_unicast+0x3b0/0x3b0
[ 2327.099678]  sock_sendmsg+0x74/0x80
[ 2327.103567]  ___sys_sendmsg+0x520/0x530
[ 2327.107844]  ? __get_locked_pte+0x178/0x200
[ 2327.112510]  ? copy_msghdr_from_user+0x270/0x270
[ 2327.117660]  ? vm_insert_page+0x360/0x360
[ 2327.122133]  ? vm_insert_pfn_prot+0xb4/0x150
[ 2327.126895]  ? vm_insert_pfn+0x32/0x40
[ 2327.131077]  ? vvar_fault+0x71/0xd0
[ 2327.134968]  ? special_mapping_fault+0x69/0x110
[ 2327.140022]  ? __do_fault+0x42/0x120
[ 2327.144008]  ? __handle_mm_fault+0x1062/0x17a0
[ 2327.148965]  ? __fget_light+0xa7/0xc0
[ 2327.153049]  __sys_sendmsg+0xcb/0x150
[ 2327.157133]  ? __sys_sendmsg+0xcb/0x150
[ 2327.161409]  ? SyS_shutdown+0x140/0x140
[ 2327.165688]  ? exit_to_usermode_loop+0xd0/0xd0
[ 2327.170646]  ? __do_page_fault+0x55d/0x620
[ 2327.175216]  ? __sys_sendmsg+0x150/0x150
[ 2327.179591]  SyS_sendmsg+0x12/0x20
[ 2327.183384]  do_syscall_64+0xe3/0x230
[ 2327.187471]  entry_SYSCALL64_slow_path+0x25/0x25
[ 2327.192622] RIP: 0033:0x7f41d18fa3b0
[ 2327.196608] RSP: 002b:00007ffc3b731218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 2327.205055] RAX: ffffffffffffffda RBX: 00007ffc3b731380 RCX: 00007f41d18fa3b0
[ 2327.213017] RDX: 0000000000000000 RSI: 00007ffc3b731340 RDI: 0000000000000003
[ 2327.220978] RBP: 0000000000000002 R08: 0000000000000004 R09: 0000000000000040
[ 2327.228939] R10: 00007ffc3b730f30 R11: 0000000000000246 R12: 0000000000000003
[ 2327.236901] R13: 00007ffc3b731340 R14: 00007ffc3b7313d0 R15: 0000000000000084
[ 2327.244865] Object at ffff881be87797e0, in cache kmalloc-64 size: 64
[ 2327.251953] Allocated:
[ 2327.254581] PID = 9484
[ 2327.257215]  save_stack_trace+0x1b/0x20
[ 2327.261485]  save_stack+0x46/0xd0
[ 2327.265179]  kasan_kmalloc+0xad/0xe0
[ 2327.269165]  kmem_cache_alloc_trace+0xe6/0x1d0
[ 2327.274138]  sctp_add_bind_addr+0x58/0x180 [sctp]
[ 2327.279400]  sctp_do_bind+0x208/0x310 [sctp]
[ 2327.284176]  sctp_bind+0x61/0xa0 [sctp]
[ 2327.288455]  inet_bind+0x5f/0x3a0
[ 2327.292151]  SYSC_bind+0x1a4/0x1e0
[ 2327.295944]  SyS_bind+0xe/0x10
[ 2327.299349]  do_syscall_64+0xe3/0x230
[ 2327.303433]  return_from_SYSCALL_64+0x0/0x6a
[ 2327.308194] Freed:
[ 2327.310434] PID = 4131
[ 2327.313065]  save_stack_trace+0x1b/0x20
[ 2327.317344]  save_stack+0x46/0xd0
[ 2327.321040]  kasan_slab_free+0x73/0xc0
[ 2327.325220]  kfree+0x96/0x1a0
[ 2327.328530]  dynamic_kobj_release+0x15/0x40
[ 2327.333195]  kobject_release+0x99/0x1e0
[ 2327.337472]  kobject_put+0x38/0x70
[ 2327.341266]  free_notes_attrs+0x66/0x80
[ 2327.345545]  mod_sysfs_teardown+0x1a5/0x270
[ 2327.350211]  free_module+0x20/0x2a0
[ 2327.354099]  SyS_delete_module+0x2cb/0x2f0
[ 2327.358667]  do_syscall_64+0xe3/0x230
[ 2327.362750]  return_from_SYSCALL_64+0x0/0x6a
[ 2327.367510] Memory state around the buggy address:
[ 2327.372855]  ffff881be8779700: fc fc fc fc 00 00 00 00 00 00 00 00 fc fc fc fc
[ 2327.380914]  ffff881be8779780: fb fb fb fb fb fb fb fb fc fc fc fc 00 00 00 00
[ 2327.388972] >ffff881be8779800: 00 00 00 00 fc fc fc fc fb fb fb fb fb fb fb fb
[ 2327.397031]                                ^
[ 2327.401792]  ffff881be8779880: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
[ 2327.409850]  ffff881be8779900: 00 00 00 00 00 04 fc fc fc fc fc fc 00 00 00 00
[ 2327.417907] ==================================================================

This fixes CVE-2017-7558.

References: https://bugzilla.redhat.com/show_bug.cgi?id=1480266
Fixes: 8f840e47f1 ("sctp: add the sctp_diag.c file")
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:35:15 -07:00
Arvind Yadav
042a90106b net: tipc: constify genl_ops
genl_ops are not supposed to change at runtime. All functions
working with genl_ops provided by <net/genetlink.h> work with
const genl_ops. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:31:38 -07:00
Colin Ian King
5719e5eb31 net: hinic: make functions set_ctrl0 and set_ctrl1 static
The functions set_ctrl0 and set_ctrl1 are local to the source and do
not need to be in global scope, so make them static.

Cleans up sparse warnings:
symbol 'set_ctrl0' was not declared. Should it be static?
symbol 'set_ctrl1' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:20:28 -07:00
Cédric Le Goater
a9dadc1c51 powerpc/xive: Fix the size of the cpumask used in xive_find_target_in_mask()
When called from xive_irq_startup(), the size of the cpumask can be
larger than nr_cpu_ids. This can result in a WARN_ON such as:

  WARNING: CPU: 10 PID: 1 at ../arch/powerpc/sysdev/xive/common.c:476 xive_find_target_in_mask+0x110/0x2f0
  ...
  NIP [c00000000008a310] xive_find_target_in_mask+0x110/0x2f0
  LR [c00000000008a2e4] xive_find_target_in_mask+0xe4/0x2f0
  Call Trace:
    xive_find_target_in_mask+0x74/0x2f0 (unreliable)
    xive_pick_irq_target.isra.1+0x200/0x230
    xive_irq_startup+0x60/0x180
    irq_startup+0x70/0xd0
    __setup_irq+0x7bc/0x880
    request_threaded_irq+0x14c/0x2c0
    request_event_sources_irqs+0x100/0x180
    __machine_initcall_pseries_init_ras_IRQ+0x104/0x134
    do_one_initcall+0x68/0x1d0
    kernel_init_freeable+0x290/0x374
    kernel_init+0x24/0x170
    ret_from_kernel_thread+0x5c/0x74

This happens because we're being called with our affinity mask set to
irq_default_affinity. That in turn was populated using
cpumask_setall(), which sets NR_CPUs worth of bits, not nr_cpu_ids
worth. Finally cpumask_weight() will return > nr_cpu_ids when passed a
mask which has > nr_cpu_ids bits set.

Fix it by limiting the value returned by cpumask_weight().

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[mpe: Add change log details on actual cause]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-24 15:20:18 +10:00
Paolo Abeni
257a73031d net/sock: allow the user to set negative peek offset
This is necessary to allow the user to disable peeking with
offset once it's enabled.
Unix sockets already allow the above, with this patch we
permit it for udp[6] sockets, too.

Fixes: 627d2d6b55 ("udp: enable MSG_PEEK at non-zero offset")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:18:44 -07:00
Eric Dumazet
2b33bc8aa2 net: dsa: use consume_skb()
Two kfree_skb() should be consume_skb(), to be friend with drop monitor
(perf record ... -e skb:kfree_skb)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:13:34 -07:00
David S. Miller
110d8465a6 Merge branch 'mlxsw-multichain-tc-offload'
Jiri Pirko says:

====================
mlxsw: spectrum: Introduce multichain TC offload

This patchset introduces offloading of rules added to chain with
non-zero index, which was previously forbidden. Also, goto_chain
termination action is offloaded allowing to jump to processing
of desired chain.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:44:32 -07:00
Jiri Pirko
0ede6ba2a1 mlxsw: spectrum_flower: Offload goto_chain termination action
If action is gact goto_chain, offload it to HW by jumping to another
ruleset.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:44:32 -07:00
Jiri Pirko
dbec8ee95a mlxsw: spectrum_acl: Provide helper to lookup ruleset
We need to lookup ruleset in order to offload goto_chain termination
action. This patch adds it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:44:32 -07:00
Jiri Pirko
0ade3b6457 mlxsw: spectrum_acl: Allow to get group_id value for a ruleset
For goto_chain action we need to know group_id of a ruleset to jump to.
Provide infrastructure in order to get it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:44:32 -07:00
Jiri Pirko
e457d86ada net: sched: add couple of goto_chain helpers
Add helpers to find out if a gact instance is goto_chain termination
action and to get chain index.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:44:32 -07:00
Jiri Pirko
45b62742df mlxsw: spectrum: Offload multichain TC rules
Reflect chain index coming down from TC core and create a ruleset per
chain. Note that only chain 0, being the implicit chain, is bound to the
device for processing. The rest of chains have to be "jumped-to" by
actions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:44:32 -07:00
David S. Miller
ae99e18892 Merge branch 'mvpp2-software-TSO-support'
Antoine Tenart says:

====================
net: mvpp2: software TSO support

This series adds the s/w TSO support in the PPv2 driver, in addition to
two cosmetic commits. As stated in patch 3/3:

Using iperf and 10G ports, using TSO shows a significant performance
improvement by a factor 2 to reach around 9.5Gbps in TX; as well as a
significant CPU usage drop (from 25% to 15%).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:42:10 -07:00
Antoine Ténart
186cd4d4e4 net: mvpp2: software tso support
The patch uses the tso API to implement the tso functionality in Marvell
PPv2 driver.

Using iperf and 10G ports, using TSO shows a significant performance
improvement by a factor 2 to reach around 9.5Gbps in TX; as well as a
significant CPU usage drop (from 25% to 15%).

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:42:10 -07:00
Antoine Ténart
85affd7e29 net: mvpp2: unify the txq size define use
The txq size is defined by MVPP2_AGGR_TXQ_SIZE, which is sometime not
used directly but through variables. As it is a fixed value use the
define everywhere in the driver.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:42:09 -07:00
Antoine Ténart
f9cbe9a556 net: define the TSO header size in net/tso.h
The TSO header size was defined in many drivers. Factorize the code and
define its size in net/tso.h.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:42:09 -07:00
David S. Miller
40e607cbee Merge branch 'nfp-fixes'
Jakub Kicinski says:

====================
nfp: fix SR-IOV deadlock and representor bugs

This series tackles the bug I've already tried to fix in commit
6d48ceb27a ("nfp: allocate a private workqueue for driver work").
I created a separate workqueue to avoid possible deadlock, and
the lockdep error disappeared, coincidentally.  The way workqueues
are operating, separate workqueue doesn't necessarily mean separate
thread of execution.  Luckily we can safely forego the lock.

Second fix changes the order in which vNIC netdevs and representors
are created/destroyed.  The fix is kept small and should be sufficient
for net because of how flower uses representors, a more thorough fix
will be targeted at net-next.

Third fix avoids leaking mapped frame buffers if FW sent a frame with
unknown portid.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:39:50 -07:00
Jakub Kicinski
1691a4c0f4 nfp: avoid buffer leak when representor is missing
When driver receives a muxed frame, but it can't find the representor
netdev it is destined to it will try to "drop" that frame, i.e. reuse
the buffer.  The issue is that the replacement buffer has already been
allocated at this point, and reusing the buffer from received frame
will leak it.  Change the code to put the new buffer on the ring
earlier and not reuse the old buffer (make the buffer parameter
to nfp_net_rx_drop() a NULL).

Fixes: 91bf82ca9e ("nfp: add support for tx/rx with metadata portid")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:39:44 -07:00
Jakub Kicinski
326ce60301 nfp: make sure representors are destroyed before their lower netdev
App start/stop callbacks can perform application initialization.
Unfortunately, flower app started using them for creating and
destroying representors.  This can lead to a situation where
lower vNIC netdev is destroyed while representors still try
to pass traffic.  This will most likely lead to a NULL-dereference
on the lower netdev TX path.

Move the start/stop callbacks, so that representors are created/
destroyed when vNICs are fully initialized.

Fixes: 5de73ee467 ("nfp: general representor implementation")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:39:44 -07:00
Jakub Kicinski
d6e1ab9ea3 nfp: don't hold PF lock while enabling SR-IOV
Enabling SR-IOV VFs will cause the PCI subsystem to schedule a
work and flush its workqueue.  Since the nfp driver schedules its
own work we can't enable VFs while holding driver load.  Commit
6d48ceb27a ("nfp: allocate a private workqueue for driver work")
tried to avoid this deadlock by creating a separate workqueue.
Unfortunately, due to the architecture of workqueue subsystem this
does not guarantee a separate thread of execution.  Luckily
we can simply take pci_enable_sriov() from under the driver lock.

Take pci_disable_sriov() from under the lock too for symmetry.

Fixes: 6d48ceb27a ("nfp: allocate a private workqueue for driver work")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:39:44 -07:00
Xin Long
5f9ae3d9e7 ipv4: do metrics match when looking up and deleting a route
Now when ipv4 route inserts a fib_info, it memcmp fib_metrics.
It means ipv4 route identifies one route also with metrics.

But when removing a route, it tries to find the route without
caring about the metrics. It will cause that the route with
right metrics can't be removed.

Thomas noticed this issue when doing the testing:

1. add:
   # ip route append 192.168.7.0/24 dev v window 1000
   # ip route append 192.168.7.0/24 dev v window 1001
   # ip route append 192.168.7.0/24 dev v window 1002
   # ip route append 192.168.7.0/24 dev v window 1003
2. delete:
   # ip route delete 192.168.7.0/24 dev v window 1002
3. show:
     192.168.7.0/24 proto boot scope link window 1001
     192.168.7.0/24 proto boot scope link window 1002
     192.168.7.0/24 proto boot scope link window 1003

The one with window 1002 wasn't deleted but the first one was.

This patch is to do metrics match when looking up and deleting
one route.

Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 20:37:10 -07:00