Commit Graph

704772 Commits

Author SHA1 Message Date
Dan Carpenter
c6b4ee9eba sparc64: vcc: Check for IS_ERR() instead of NULL
The tty_alloc_driver() function never returns NULL, it returns error
pointers on error.

Fixes: ce808b7463 ("sparc64: vcc: TTY driver initialization and cleanup")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:39:55 -07:00
Dexuan Cui
ae0078fcf0 hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It uses VMBus ringbuffer as the
transportation layer.

With hv_sock, applications between the host (Windows 10, Windows Server
2016 or newer) and the guest can talk with each other using the traditional
socket APIs.

More info about Hyper-V Sockets is available here:

"Make your own integration services":
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service

The patch implements the necessary support in Linux guest by introducing a new
vsock transport for AF_VSOCK.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Andy King <acking@vmware.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: George Zhang <georgezhang@vmware.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Reilly Grant <grantr@vmware.com>
Cc: Asias He <asias@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:38:18 -07:00
Jakub Kicinski
7cadf2cbe8 selftests/bpf: check the instruction dumps are populated
Add a basic test for checking whether kernel is populating
the jited and xlated BPF images.  It was used to confirm
the behaviour change from commit d777b2ddbe ("bpf: don't
zero out the info struct in bpf_obj_get_info_by_fd()"),
which made bpf_obj_get_info_by_fd() usable for retrieving
the image dumps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:35:18 -07:00
Wei Wang
4e587ea71b ipv6: fix sparse warning on rt6i_node
Commit c5cff8561d adds rcu grace period before freeing fib6_node. This
generates a new sparse warning on rt->rt6i_node related code:
  net/ipv6/route.c:1394:30: error: incompatible types in comparison
  expression (different address spaces)
  ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
  expression (different address spaces)

This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
rcu API is used for it.
After this fix, sparse no longer generates the above warning.

Fixes: c5cff8561d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:34:40 -07:00
Yazen Ghannam
1d5d820b8f ACPI, APEI, EINJ: Subtract any matching Register Region from Trigger resources
ACPI defines a number of instructions to use for triggering errors. However
we are currently removing the address resources from the trigger resources
for only the WRITE_REGISTER_VALUE instruction. This leads to a resource
conflict for any other valid instruction.

Check that the instruction is less than or equal to the
WRITE_REGISTER_VALUE instruction. This allows all valid memory access
instructions and protects against invalid instructions.

Fixes: b4e008dc53 (ACPI, APEI, EINJ, Refine the fix of resource conflict)
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-29 00:32:49 +02:00
Rafael J. Wysocki
0a9f429ffb Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq
Pull devfreq changes for v4.14 from MyungJoo Ham.

* 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq:
  PM / devfreq: Fix memory leak when fail to register device
  PM / devfreq: Add dependency on PM_OPP
  PM / devfreq: Move private devfreq_update_stats() into devfreq
  PM / devfreq: Convert to using %pOF instead of full_name
2017-08-29 00:27:12 +02:00
Stefano Brivio
0f3086868e cxgb4: Fix stack out-of-bounds read due to wrong size to t4_record_mbox()
Passing commands for logging to t4_record_mbox() with size
MBOX_LEN, when the actual command size is actually smaller,
causes out-of-bounds stack accesses in t4_record_mbox() while
copying command words here:

	for (i = 0; i < size / 8; i++)
		entry->cmd[i] = be64_to_cpu(cmd[i]);

Up to 48 bytes from the stack are then leaked to debugfs.

This happens whenever we send (and log) commands described by
structs fw_sched_cmd (32 bytes leaked), fw_vi_rxmode_cmd (48),
fw_hello_cmd (48), fw_bye_cmd (48), fw_initialize_cmd (48),
fw_reset_cmd (48), fw_pfvf_cmd (32), fw_eq_eth_cmd (16),
fw_eq_ctrl_cmd (32), fw_eq_ofld_cmd (32), fw_acl_mac_cmd(16),
fw_rss_glb_config_cmd(32), fw_rss_vi_config_cmd(32),
fw_devlog_cmd(32), fw_vi_enable_cmd(48), fw_port_cmd(32),
fw_sched_cmd(32), fw_devlog_cmd(32).

The cxgb4vf driver got this right instead.

When we call t4_record_mbox() to log a command reply, a MBOX_LEN
size can be used though, as get_mbox_rpl() will fill cmd_rpl up
completely.

Fixes: 7f080c3f2f ("cxgb4: Add support to enable logging of firmware mailbox commands")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:24:23 -07:00
Dan Carpenter
f740c34ee5 bpf: fix oops on allocation failure
"err" is set to zero if bpf_map_area_alloc() fails so it means we return
ERR_PTR(0) which is NULL.  The caller, find_and_alloc_map(), is not
expecting NULL returns and will oops.

Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:23:34 -07:00
Leonard Crestez
fded5fc841 cpufreq: imx6q: Fix imx6sx low frequency support
This patch contains the minimal changes required to support imx6sx OPP
of 198 Mhz. Without this patch cpufreq still reports success but the
frequency is not changed, the "arm" clock will still be at 396000000 in
clk_summary.

In order to do this PLL1 needs to be still kept enabled while changing
the ARM clock. This is a hardware requirement: when ARM_PODF is changed
in CCM we need to check the busy bit of CCM_CDHIPR bit 16 arm_podf_busy,
and this bit is sync with PLL1 clock, so if PLL1 NOT enabled, this
bit will never get clear.

Keep pll1_sys explicitly enabled until after the rate is change to deal
with this. Otherwise from the clk framework perspective pll1_sys is
unused and gets turned off.

Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-29 00:22:52 +02:00
Maxime Ripard
ad4540cc5a net: stmmac: sun8i: Remove the compatibles
Since the bindings have been controversial, and we follow the DT stable ABI
rule, we shouldn't let a driver with a DT binding that might change slip
through in a stable release.

Remove the compatibles to make sure the driver will not probe and no-one
will start using the binding currently implemented. This commit will
obviously need to be reverted in due time.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:22:42 -07:00
Colin Ian King
843791bb6c cpufreq: speedstep-lib: make several arrays static, makes code smaller
Don't populate arrays on the stack, instead make them static.
Makes the object code smaller by over 860 bytes:

Before:
   text	   data	    bss	    dec	    hex	filename
  10716	   5196	      0	  15912	   3e28	drivers/cpufreq/speedstep-lib.o

After:
   text	   data	    bss	    dec	    hex	filename
   9690	   5356	      0	  15046	   3ac6	drivers/cpufreq/speedstep-lib.o

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-29 00:21:41 +02:00
David S. Miller
c73c8a8e07 Merge branch 'nfp-flow-dissector-layer'
Pieter Jansen van Vuuren says:

====================
nfp: fix layer calculation and flow dissector use

Previously when calculating the supported key layers MPLS, IPv4/6
TTL and TOS were not considered. Formerly flow dissectors were referenced
without first checking that they are in use and correctly populated by TC.
Additionally this patch set fixes the incorrect use of mask field for vlan
matching.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:20:25 -07:00
Pieter Jansen van Vuuren
6afd33e438 nfp: remove incorrect mask check for vlan matching
Previously the vlan tci field was incorrectly exact matched. This patch
fixes this by using the flow dissector to populate the vlan tci field.

Fixes: 5571e8c9f2 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:20:24 -07:00
Pieter Jansen van Vuuren
74af597510 nfp: fix supported key layers calculation
Previously when calculating the supported key layers MPLS, IPv4/6
TTL and TOS were not considered. This patch checks that the TTL and
TOS fields are masked out before offloading. Additionally this patch
checks that MPLS packets are correctly handled, by not offloading them.

Fixes: af9d842c13 ("nfp: extend flower add flow offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:20:24 -07:00
Pieter Jansen van Vuuren
a7cd39e0c7 nfp: fix unchecked flow dissector use
Previously flow dissectors were referenced without first checking that
they are in use and correctly populated by TC. This patch fixes this by
checking each flow dissector key before referencing them.

Fixes: 5571e8c9f2 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:20:24 -07:00
Liav Rehana
28923f6b74 ARC: [plat-eznps] handle extra aux regs #2: kernel/entry exit
Preserve eflags and gpa1 aux during entry/exit into kernel as these
could be modified by kernel mode

These registers used by compare exchange instructions.
  - GPA1 is used for compare value,
  - EFLAGS got bit reflects atomic operation response.

EFLAGS is zeroed for each new user task so it won't get its
parent value.

Signed-off-by: Liav Rehana <liavr@mellanox.com>
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
Noam Camus
5b2189ab6e ARC: [plat-eznps] handle extra aux regs #1: save/restore on context switch
save EFLAGS, and GPA1 auxiliary registers during context switch,
since they may be changed by the new task in kernel mode, while using atomic
ops e.g. cmpxchg.

Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
Elad Kanfi
3f9cd874dc ARC: [plat-eznps] avoid toggling of DPC register
HW bug description: in case of HW thread context switch
the dpc configuration of the exiting thread is dragged
one cycle into the next thread.
In order to avoid the consequences of this bug, the DPC register
is set to an initial value, and not changed afterwards.

Signed-off-by: Elad Kanfi <eladkan@mellanox.com>
Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
Liav Rehana
abd8926bff ARC: [plat-eznps] Update the init sequence of aux regs per cpu.
This commit add new configuration that enables us to distinguish
between building the kernel for platforms that have a different set
of auxiliary registers for each cpu and platforms that have a shared
set of auxiliary registers across every thread in each core.
On platforms that implement a different set of auxiliary registers
disabling this configuration insures that we initialize registers on
every cpu and not just for the first thread of the core.
Example for non shared registers is working with EZsim (non silicon)

Signed-off-by: Liav Rehana <liavr@mellanox.com>
Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
Noam Camus
35b55ef2b8 ARC: [plat-eznps] new command line argument for HW scheduler at MTM
We add ability for all cores at NPS SoC to control the number of cycles
HW thread can execute before it is replace with another eligible
HW thread within the same core. The replacement is done by the
HW scheduler.

Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: simplified handlign of out of range argument value]
2017-08-28 15:17:36 -07:00
Noam Camus
18ee4becb5 ARC: set boot print log level to PR_INFO
Some of the boot printing code had printk() w/o explicit log level.

This patch introduces consistency allowing platforms to switch to less
verbose console logging using cmdline.

NPS400 with 4K CPUs needs to avoid the cpu info printing for faster
bootup.

Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
Noam Camus
983394959f ARC: [plat-eznps] Handle user memory error same in simulation and silicon
On ARC700 (and nSIM), user mode memory error triggers an L2 interrupt
which is handled gracefully by kernel (or it tries to despite this being
imprecise, and error could get charged to kernel itself). The offending
task is killed and kernel moves on.

NPS hardware however raises a Machine Check exception for same error
which is NOT recoverable by kernel.

This patch aligns kernel handling for nSIM case, to same as hardware by
overriding the default user space bus error handler.

Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Elad Kanfi <eladkan@mellanox.com>
[vgupta: rewrote changelog]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
Noam Camus
644fa02b39 ARC: [plat-eznps] use schd.wft instruction instead of sleep at idle task
When HW threads are active we want CPU to enter idle state only
for the calling HW thread and not to put on sleep all HW threads
sharing this core. For this need the NPS400 got dedicated instruction
so only calling thread is entring sleep and all other are still awake
and can execute instructions.

Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: reworked patch to not use inline ifdef but a new function itself]
2017-08-28 15:17:36 -07:00
Vineet Gupta
64f42cec84 ARC: create cpu specific version of arch_cpu_idle()
This paves way for creating a 3rd variant needed for NPS ARC700 without
littering ifdey'ery all over the place

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
Noam Camus
1112c3b2ce ARC: [plat-eznps] spinlock aware for MTM
This way when we execute "ex" during trying to hold lock we can switch to
other HW thread and utilize the core intead of just spinning on a lock.

We noticed about 10% improvement of execution time with hackbench test.

Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
Vineet Gupta
c2bdac146b ARC: spinlock: Document the EX based spin_unlock
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
Noam Camus
ab1e8660c1 ARC: [plat-eznps] disabled stall counter due to a HW bug
This counter represents threshold for consecutive stall which
would trigger HW threads scheduling. However when enabled, low
threshhold values cause performance degradation and in the
worst case even livelock.

So disable it by resorting to HW reset value

Signed-off-by: Noam Camus <noamca@mellanox.com>
Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: fixed changelog]
2017-08-28 15:17:36 -07:00
Noam Camus
30b7af252e ARC: [plat-eznps] Fix TLB Errata
Due to a HW bug in NPS400 we get from time to time false TLB miss.
Workaround this by validating each miss.

Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
Noam Camus
9e9395525b ARC: [plat-eznps] typo fix at Kconfig
Signed-off-by: Noam Camus <noamca@mellanox.com>
Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
Liav Rehana
9405530469 ARC: typos fix in kernel/entry-compact.S
Signed-off-by: Liav Rehana <liavr@mellanox.com>
Signed-off-by: Noam Camus <noamca@mellanox.com>
Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
Liav Rehana
ddf720f86e ARC: typo fix in mm/fault.c
Signed-off-by: Liav Rehana <liavr@mellanox.com>
Signed-off-by: Noam Camus <noamca@mellanox.com>
Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
David Ahern
a8e3bb347d net: Add comment that early_demux can change via sysctl
Twice patches trying to constify inet{6}_protocol have been reverted:
39294c3df2 ("Revert "ipv6: constify inet6_protocol structures"") to
revert 3a3a4e3054 and then 03157937fe ("Revert "ipv4: make
net_protocol const"") to revert aa8db499ea.

Add a comment that the structures can not be const because the
early_demux field can change based on a sysctl.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:17:29 -07:00
Rafael J. Wysocki
7aa7a0360a PM: docs: Delete the obsolete states.txt document
The Documentation/power/states.txt document is now redundant and
sonewhat outdated, so delete it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-29 00:16:06 +02:00
Rafael J. Wysocki
cba630b6f5 Merge branch 'pm-sleep' into pm-docs 2017-08-29 00:15:47 +02:00
Rafael J. Wysocki
0c0b6b7bc4 PM: docs: Describe high-level PM strategies and sleep states
Reorganize the power management part of admin-guide by adding a
description of major power management strategies supported by the
kernel (system-wide and working-state power management) to it and
dividing the rest of the material into the system-wide PM and
working-state PM chapters.

On top of that, add a description of system sleep states to the
system-wide PM chapter.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
2017-08-29 00:15:32 +02:00
Willem de Bruijn
cc8737a5fe xen-netback: update ubuf_info initialization to anonymous union
The xen driver initializes struct ubuf_info fields using designated
initializers. I recently moved these fields inside a nested anonymous
struct inside an anonymous union. I had missed this use case.

This breaks compilation of xen-netback with older compilers.
>From kbuild bot with gcc-4.4.7:

   drivers/net//xen-netback/interface.c: In function
   'xenvif_init_queue':
   >> drivers/net//xen-netback/interface.c:554: error: unknown field 'ctx' specified in initializer
   >> drivers/net//xen-netback/interface.c:554: warning: missing braces around initializer
      drivers/net//xen-netback/interface.c:554: warning: (near initialization for '(anonymous).<anonymous>')
   >> drivers/net//xen-netback/interface.c:554: warning: initialization makes integer from pointer without a cast
   >> drivers/net//xen-netback/interface.c:555: error: unknown field 'desc' specified in initializer

Add double braces around the designated initializers to match their
nested position in the struct. After this, compilation succeeds again.

Fixes: 4ab6c99d99 ("sock: MSG_ZEROCOPY notification coalescing")
Reported-by: kbuild bot <lpk@intel.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:11:50 -07:00
David S. Miller
d5a915b978 Merge branch 'gre-add-collect_md-mode-for-ERSPAN-tunnel'
William Tu says:

====================
gre: add collect_md mode for ERSPAN tunnel

This patch series provide collect_md mode for ERSPAN tunnel.  The fist patch
refactors the existing gre_fb_xmit function by exacting the route cache
portion into a new function called prepare_fb_xmit.  The second patch
introduces the collect_md mode for ERSPAN tunnel, by calling the
prepare_fb_xmit function and adding ERSPAN specific logic.  The final patch
adds the test case using bpf_skb_{set,get}_tunnel_{key,opt}.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:04:52 -07:00
William Tu
ef88f89c83 samples/bpf: extend test_tunnel_bpf.sh with ERSPAN
Extend existing tests for vxlan, gre, geneve, ipip to
include ERSPAN tunnel.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:04:52 -07:00
William Tu
1a66a836da gre: add collect_md mode to ERSPAN tunnel
Similar to gre, vxlan, geneve, ipip tunnels, allow ERSPAN tunnels to
operate in 'collect metadata' mode.  bpf_skb_[gs]et_tunnel_key() helpers
can make use of it right away.  OVS can use it as well in the future.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:04:52 -07:00
William Tu
862a03c35e gre: refactor the gre_fb_xmit
The patch refactors the gre_fb_xmit function, by creating
prepare_fb_xmit function for later ERSPAN collect_md mode patch.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:04:52 -07:00
Oza Pawandeep
39b7a4ff93 PCI: iproc: Work around Stingray CRS defects
Configuration Request Retry Status ("CRS") completions are a required part
of PCIe.  A PCIe device may respond to config a request with a CRS
completion to indicate that it needs more time to initialize.  A Root Port
that receives a CRS completion may automatically retry the request, or it
may treat the request as a failed transaction.  For a failed read, it will
likely synthesize all 1's data, i.e., 0xffffffff, to complete the read to
the CPU.

CRS Software Visibility ("CRS SV") is an optional feature.  Per PCIe r3.1,
sec 2.3.2, if supported and enabled, a Root Port that receives a CRS
completion for a config read of the Vendor ID will synthesize 0x0001 data
(an invalid Vendor ID) instead of retrying or failing the transaction.  The
0x0001 data makes the CRS completion visible to software, so it can perform
other tasks while waiting for the device.

The iProc "Stingray" PCIe controller does not support CRS completions
correctly.  From the Stingray PCIe Controller spec:

  4.7.3.3. Retry Status On Configuration Cycle

  Endpoints are allowed to generate retry status on configuration cycles.
  In this case, the RC needs to re-issue the request. The IP does not
  handle this because the number of configuration cycles needed will
  probably be less than the total number of non-posted operations needed.

  When a retry status is received on the User RX interface for a
  configuration request that was sent on the User TX interface, it will be
  indicated with a completion with the CMPL_STATUS field set to 2=CRS, and
  the user will have to find the address and data values and send a new
  transaction on the User TX interface.  When the internal configuration
  space returns a retry status during a configuration cycle (user_cscfg =
  1) on the Command/Status interface, the pcie_cscrs will assert with the
  pcie_csack signal to indicate the CRS status.

  When the CRS Software Visibility Enable register in the Root Control
  register is enabled, the IP will return the data value to 0x0001 for the
  Vendor ID value and 0xffff  (all 1’s) for the rest of the data in the
  request for reads of offset 0 that return with CRS status.  This is true
  for both the User RX Interface and for the Command/Status interface.
  When CRS Software Visibility is enabled, the CMPL_STATUS field of the
  completion on the User RX Interface will not be 2=CRS and the pcie_cscrs
  signal will not assert on the Command/Status interface.

The Stingray hardware never reissues configuration requests when it
receives CRS completions.  Contrary to what sec 4.7.3.3 above says, when it
receives a CRS completion, it synthesizes 0xffff0001 data regardless of the
address of the read or the value of the CRS SV enable bit.

This is broken in two ways:

  1) When CRS SV is disabled, the Root Port should never synthesize the
  0x0001 value.  If it receives a CRS completion, it should fail the
  transaction and synthesize all 1's data.

  2) When CRS SV is enabled, the Root Port should only synthesize 0x0001
  data if it receives a CRS completion for a read of the Vendor ID.  If it
  receives a CRS completion for any other read, it should fail the
  transaction and synthesize all 1's data.

This breaks pci_flr_wait(), which reads the Command register and expects to
see all 1's data if the read fails because of CRS completions.  On
Stingray, it sees the incorrect 0xffff0001 data instead.

It also breaks config registers that contain the 0xffff0001 value.  If we
read such a register, software can't distinguish a CRS completion from the
actual value read from the device.

On Stingray, if we read 0xffff0001 data, assume this indicates a CRS
completion and retry the read for 500ms.  If we time out, return all 1's
(0xffffffff) data.  Note that this corrupts registers that happen to
contain 0xffff0001.

Stingray advertises CRS SV support in its Root Capabilities register, and
the CRS SV enable bit is writable (even though the hardware ignores it).
Mask out PCI_EXP_RTCAP_CRSVIS so software doesn't try to use CRS SV.

Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
[bhelgaas: changelog, add probe-time warning about corruption, don't
advertise CRS SV support, remove duplicate pci_generic_config_read32(),
fix alignment based on patch from Arnd Bergmann <arnd@arndb.de>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-08-28 16:43:30 -05:00
Oza Pawandeep
d005045bcf PCI: iproc: Factor out memory-mapped config access address calculation
Factor out the address calculation for memory-mapped config accesses as a
separate function.  No functional change intended.

Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-08-28 16:43:24 -05:00
David Ahern
03157937fe Revert "ipv4: make net_protocol const"
This reverts commit aa8db499ea.

Early demux structs can not be made const. Doing so results in:
[   84.967355] BUG: unable to handle kernel paging request at ffffffff81684b10
[   84.969272] IP: proc_configure_early_demux+0x1e/0x3d
[   84.970544] PGD 1a0a067
[   84.970546] P4D 1a0a067
[   84.971212] PUD 1a0b063
[   84.971733] PMD 80000000016001e1

[   84.972669] Oops: 0003 [#1] SMP
[   84.973065] Modules linked in: ip6table_filter ip6_tables veth vrf
[   84.973833] CPU: 0 PID: 955 Comm: sysctl Not tainted 4.13.0-rc6+ #22
[   84.974612] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   84.975855] task: ffff88003854ce00 task.stack: ffffc900005a4000
[   84.976580] RIP: 0010:proc_configure_early_demux+0x1e/0x3d
[   84.977253] RSP: 0018:ffffc900005a7dd0 EFLAGS: 00010246
[   84.977891] RAX: ffffffff81684b10 RBX: 0000000000000001 RCX: 0000000000000000
[   84.978759] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000000
[   84.979628] RBP: ffffc900005a7dd0 R08: 0000000000000000 R09: 0000000000000000
[   84.980501] R10: 0000000000000001 R11: 0000000000000008 R12: 0000000000000001
[   84.981373] R13: ffffffffffffffea R14: ffffffff81a9b4c0 R15: 0000000000000002
[   84.982249] FS:  00007feb237b7700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[   84.983231] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.983941] CR2: ffffffff81684b10 CR3: 0000000038492000 CR4: 00000000000406f0
[   84.984817] Call Trace:
[   84.985133]  proc_tcp_early_demux+0x29/0x30

I think this is the second time such a patch has been reverted.

Cc: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 14:30:46 -07:00
Bhumika Goyal
dfbde55249 nbd: make device_attribute const
Make this const as is is only passed as an argument to the
function device_create_file and device_remove_file and the corresponding
arguments are of type const.
Done using Coccinelle

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-28 15:21:27 -06:00
Jens Axboe
b3c3051220 null_blk: use available 'dev' in nullb_device_power_store()
We already have this pointer, no need to use to_nullb_device()
again.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-28 15:06:31 -06:00
Shaohua Li
060fd198a3 block/nullb: delete unnecessary memory free
Commit 2984c86(nullb: factor disk parameters) has a typo. The
nullb_device allocation/free is done outside of null_add_dev. The commit
accidentally frees the nullb_device in error code path.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-28 15:06:17 -06:00
Jason Ekstrand
3e6fb72d6c drm/syncobj: Add a syncobj_array_find helper
The wait ioctl has a bunch of code to read an syncobj handle array from
userspace and turn it into an array of syncobj pointers.  We're about to
add two new IOCTLs which will need to work with arrays of syncobj
handles so let's make some helpers.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-08-29 06:28:23 +10:00
Jason Ekstrand
e7aca5031a drm/syncobj: Allow wait for submit and signal behavior (v5)
Vulkan VkFence semantics require that the application be able to perform
a CPU wait on work which may not yet have been submitted.  This is
perfectly safe because the CPU wait has a timeout which will get
triggered eventually if no work is ever submitted.  This behavior is
advantageous for multi-threaded workloads because, so long as all of the
threads agree on what fences to use up-front, you don't have the extra
cross-thread synchronization cost of thread A telling thread B that it
has submitted its dependent work and thread B is now free to wait.

Within a single process, this can be implemented in the userspace driver
by doing exactly the same kind of tracking the app would have to do
using posix condition variables or similar.  However, in order for this
to work cross-process (as is required by VK_KHR_external_fence), we need
to handle this in the kernel.

This commit adds a WAIT_FOR_SUBMIT flag to DRM_IOCTL_SYNCOBJ_WAIT which
instructs the IOCTL to wait for the syncobj to have a non-null fence and
then wait on the fence.  Combined with DRM_IOCTL_SYNCOBJ_RESET, you can
easily get the Vulkan behavior.

v2:
 - Fix a bug in the invalid syncobj error path
 - Unify the wait-all and wait-any cases
v3:
 - Unify the timeout == 0 case a bit with the timeout > 0 case
 - Use wait_event_interruptible_timeout
v4:
 - Use proxy fence
v5:
 - Revert to a combination of v2 and v3
 - Don't use proxy fences
 - Don't use wait_event_interruptible_timeout because it just adds an
   extra layer of callbacks

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-08-29 06:28:17 +10:00
Jason Ekstrand
1fc08218ed drm/syncobj: Add a CREATE_SIGNALED flag
This requests that the driver create the sync object such that it
already has a signaled dma_fence attached.  Because we don't need
anything in particular (just something signaled), we use a dummy null
fence.  This is useful for Vulkan which has a similar flag that can be
passed to vkCreateFence.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-08-29 06:27:41 +10:00
Jason Ekstrand
9c19fb10a5 drm/syncobj: Add a callback mechanism for replace_fence (v3)
It is useful in certain circumstances to know when the fence is replaced
in a syncobj.  Specifically, it may be useful to know when the fence
goes from NULL to something valid.  This does make syncobj_replace_fence
a little more expensive because it has to take a lock but, in the common
case where there is no callback list, it spends a very short amount of
time inside the lock.

v2:
 - Don't lock in drm_syncobj_fence_get.  We only really need to lock
   around fence_replace to make the callback work.
v3:
 - Fix the cb_list comment to make kbuild happy

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-08-29 06:26:42 +10:00