Commit Graph

694589 Commits

Author SHA1 Message Date
Vasundhara Volam
56f0fd80d1 bnxt_en: assign CPU affinity hints to bnxt_en IRQs
This patch provides hints to irqbalance to map bnxt_en device IRQs
to specific CPU cores. cpumask_local_spread() is used, which first
maps IRQs to near NUMA cores; when those cores are exhausted, IRQs
are mapped to far NUMA cores.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:57:09 -07:00
Michael Chan
98fdbe73bf bnxt_en: Improve tx ring reservation logic.
When the number of TX rings is changed (e.g. ethtool -L, enabling XDP TX
rings, etc), the current code tries to reserve the new number of TX rings
before closing and re-opening the NIC.  If we are unable to reserve the
new TX rings, we abort the operation and keep the current TX rings.

The problem is that the firmware will disable the current TX rings even
when it cannot reserve the new set of TX rings.  We fix it as follows:

1. Instead of reserving the new set of TX rings, just ask the firmware
to check if the new set of TX rings is available.  There is a flag in
the firmware message to do that.  If not available, abort and the
current TX rings will not be disabled.

2. Do the actual TX ring reservation in the path that opens the NIC.
We keep the number of TX rings currently successfully reserved.  If the
number of TX rings is different than the reserved TX rings, we call
firmware and reserve again.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:57:09 -07:00
Michael Chan
6a17eb27bf bnxt_en: Update firmware interface spec. to 1.8.1.4.
Flow APIs are added in this firmware interface.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:57:09 -07:00
David S. Miller
33b86ba057 Merge branch 'NCSI-vlan-filtering'
Samuel Mendoza-Jonas says:

====================
NCSI VLAN Filtering Support

This series (mainly patch 2) adds VLAN filtering to the NCSI implementation.
A fair amount of code already exists in the NCSI stack for VLAN filtering but
none of it is actually hooked up. This goes the final mile and fixes a few
bugs in the existing code found along the way (patch 1).

Patch 3 adds the appropriate flag and callbacks to the ftgmac100 driver to
enable filtering as it's a large consumer of NCSI (and what I've been
testing on).

v3:	- Add comment describing change to ncsi_find_filter()
	- Catch NULL in clear_one_vid() from ncsi_get_filter()
	- Simplify state changes when kicking updated channel
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:49:49 -07:00
Samuel Mendoza-Jonas
51564585d8 ftgmac100: Support NCSI VLAN filtering when available
Register the ndo_vlan_rx_{add,kill}_vid callbacks and set the
NETIF_F_HW_VLAN_CTAG_FILTER if NCSI is available.
This allows the VLAN core to notify the NCSI driver when changes occur
so that the remote NCSI channel can be properly configured to filter on
the set VLAN tags.

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:49:49 -07:00
Samuel Mendoza-Jonas
21acf63013 net/ncsi: Configure VLAN tag filter
Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI
stack process new VLAN tags and configure the channel VLAN filter
appropriately.
Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent
for each one, meaning the ncsi_dev_state_config_svf state must be
repeated. An internal list of VLAN tags is maintained, and compared
against the current channel's ncsi_channel_filter in order to keep track
within the state. VLAN filters are removed in a similar manner, with the
introduction of the ncsi_dev_state_config_clear_vids state. The maximum
number of VLAN tag filters is determined by the "Get Capabilities"
response from the channel.

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:49:49 -07:00
Samuel Mendoza-Jonas
8579a67e13 net/ncsi: Fix several packet definitions
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:49:49 -07:00
David S. Miller
a74e344a99 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-08-27

This series contains updates to i40e and i40evf only.

Sudheer updates code comments and state variable so that adminq_subtask
will have accutate information whenever it gets scheduled.

Mariusz stores information about FEC modes, to be used to printing link
states information, so that we do not need to call admin queue when
reporting link status.  Adds VF support for controlling VLAN tag
stripping via ethtool.

Jake provides the majority of changes in this series, starting with
increasing the size of the prefix buffer so that it can hold enough
characters for every possible input, which prevents snprintf truncation.
Fixed other string truncation errors/warnings produced by GCC 7.x.
Removed an unnecessary workaround for resetting XPS.  Fixed an issue
where there is a mismatched affinity mask value, so initialize the value
to cpu_possible_mask and invert the logic for checking incorrect CPU vs
IRQ affinity so that the exceptional case is handled at the check.
Removed ULTRA latency mode due to several issues found and will be
looking at better solution for small packet workloads.

Akeem fixes an issue where the incorrect flag was being used to set
promiscuous mode for unicast, which was enabling promiscuous mode only
for multicast instead of unicast.

Carolyn fixes an issue where an error return value is set, but this
value can be overwritten before we actually do exit the function.  So
remove the error code assignment and add code comments for better
understanding on why we do not need to set and return the error.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:46:25 -07:00
Aviad Krawczyk
cde66f24c3 net-next/hinic: fix comparison of a uint16_t type with -1
Remove the search for index of constant buffer size

Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:44:39 -07:00
Aviad Krawczyk
52f31422d4 net-next/hinic: Fix MTU limitation
Fix the hw MTU limitation by setting max_mtu

Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:43:46 -07:00
David S. Miller
70f5c98228 Merge branch 'irda-move-to-staging'
Greg Kroah-Hartman says:

====================
irda: move it to drivers/staging so we can delete it

The IRDA code has long been obsolete and broken.  So, to keep people
from trying to use it, and to prevent people from having to maintain it,
let's move it to drivers/staging/ so that we can delete it entirely from
the kernel in a few releases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:42:57 -07:00
Greg Kroah-Hartman
333b3d070f staging: irda: add a TODO file.
The irda code will be deleted in a future kernel release, so no need to
have anyone do any new work on it.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:42:57 -07:00
Greg Kroah-Hartman
5bf916ee0a irda: move include/net/irda into staging subdirectory
And finally, move the irda include files into
drivers/staging/irda/include/net/irda.  Yes, it's a long path, but it
makes it easy for us to just add a Makefile directory path addition and
all of the net and drivers code "just works".

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:42:57 -07:00
Greg Kroah-Hartman
6c391ff758 irda: move drivers/net/irda to drivers/staging/irda/drivers
Move the irda drivers from drivers/net/irda/ to
drivers/staging/irda/drivers as they will be deleted in a future kernel
release.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:42:57 -07:00
Greg Kroah-Hartman
1ca163afb6 irda: move net/irda/ to drivers/staging/irda/net/
It's time to get rid of IRDA.  It's long been broken, and no one seems
to use it anymore.  So move it to staging and after a while, we can
delete it from there.

To start, move the network irda core from net/irda to
drivers/staging/irda/net/

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:42:56 -07:00
David S. Miller
9d8c7e5a87 Merge branch 'dpaa_eth-rss'
Madalin Bucur says:

====================
Add RSS to DPAA 1.x Ethernet driver

This patch set introduces Receive Side Scaling for the DPAA Ethernet
driver. Documentation is updated with details related to the new
feature and limitations that apply.
Added also a small fix.

v2: removed a C++ style comment
v3: move struct fman to header file to avoid exporting a function
v4: addressed compilation issues introduced in v3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:41:01 -07:00
Madalin Bucur
52600dcc9e dpaa_eth: check allocation result
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:41:01 -07:00
Madalin Bucur
0659191630 Documentation: networking: add RSS information
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:41:01 -07:00
Madalin Bucur
056057e288 dpaa_eth: add NETIF_F_RXHASH
Set the skb hash when then FMan Keygen hash result is available.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:41:00 -07:00
Madalin Bucur
bcf0994b23 dpaa_eth: enable Rx hashing control
Allow ethtool control of the Rx flow hashing. By default RSS is
enabled, this allows to turn it off by bypassing the FMan Keygen
block and sending all traffic on the default Rx frame queue.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:41:00 -07:00
Madalin Bucur
3150b7c20b dpaa_eth: use multiple Rx frame queues
Add a block of 128 Rx frame queues per port. The FMan hardware will
send traffic on one of these queues based on the FMan port Parse
Classify Distribute setup. The hash computed by the FMan Keygen
block will select the Rx FQ.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:41:00 -07:00
Iordache Florinel-R70177
7472f4f281 fsl/fman: enable FMan Keygen
Add support for the FMan Keygen with a hardcoded scheme to spread
incoming traffic on a FQ range based on source and destination IPs
and ports.

Signed-off-by: Iordache Florinel <florinel.iordache@nxp.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:41:00 -07:00
Madalin Bucur
ca58ce5766 fsl/fman: move struct fman to header file
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:41:00 -07:00
Himanshu Jha
0df49584ed net: ethernet: broadcom: Remove null check before kfree
Kfree on NULL pointer is a no-op and therefore checking is redundant.

Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:53:23 -07:00
Gao Feng
f9ab7425b3 sched: sfq: drop packets after root qdisc lock is released
The commit 520ac30f45 ("net_sched: drop packets after root qdisc lock
is released) made a big change of tc for performance. But there are
some points which are not changed in SFQ enqueue operation.
1. Fail to find the SFQ hash slot;
2. When the queue is full;

Now use qdisc_drop instead free skb directly.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:51:42 -07:00
David S. Miller
f7423eea52 Merge branch 'mlxsw-dpipe-fixes'
Jiri Pirko says:

====================
mlxsw: spectrum: Fix couple of dpipe ipv4 host table bugs

Arkadi Sharshevsky (1):
  mlxsw: spectrum_dpipe: Fix host table dump

Jiri Pirko (1):
  mlxsw: spectrum: compile-in dpipe support only if devlink is enabled
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:41:15 -07:00
Arkadi Sharshevsky
18fed7e15d mlxsw: spectrum_dpipe: Fix host table dump
During the neighbor traversal the neighbors from different families
should be ignored.

Fixes: c58035a74aba ("mlxsw: spectrum_dpipe: Add support for IPv4 host table dump")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:41:15 -07:00
Jiri Pirko
10bfec0a2b mlxsw: spectrum: compile-in dpipe support only if devlink is enabled
Makes no sense to have dpipe compiled in when devlink is not enabled,
because the devlink dpipe registation is noop function. So don't compile
it in. This also fixes missing extern structs errors.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: a86f030915 ("mlxsw: spectrum_dpipe: Add support for IPv4 host table dump")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:41:15 -07:00
Dexuan Cui
ae0078fcf0 hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It uses VMBus ringbuffer as the
transportation layer.

With hv_sock, applications between the host (Windows 10, Windows Server
2016 or newer) and the guest can talk with each other using the traditional
socket APIs.

More info about Hyper-V Sockets is available here:

"Make your own integration services":
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service

The patch implements the necessary support in Linux guest by introducing a new
vsock transport for AF_VSOCK.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Andy King <acking@vmware.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: George Zhang <georgezhang@vmware.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Reilly Grant <grantr@vmware.com>
Cc: Asias He <asias@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:38:18 -07:00
Jakub Kicinski
7cadf2cbe8 selftests/bpf: check the instruction dumps are populated
Add a basic test for checking whether kernel is populating
the jited and xlated BPF images.  It was used to confirm
the behaviour change from commit d777b2ddbe ("bpf: don't
zero out the info struct in bpf_obj_get_info_by_fd()"),
which made bpf_obj_get_info_by_fd() usable for retrieving
the image dumps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:35:18 -07:00
Dan Carpenter
f740c34ee5 bpf: fix oops on allocation failure
"err" is set to zero if bpf_map_area_alloc() fails so it means we return
ERR_PTR(0) which is NULL.  The caller, find_and_alloc_map(), is not
expecting NULL returns and will oops.

Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:23:34 -07:00
David Ahern
a8e3bb347d net: Add comment that early_demux can change via sysctl
Twice patches trying to constify inet{6}_protocol have been reverted:
39294c3df2 ("Revert "ipv6: constify inet6_protocol structures"") to
revert 3a3a4e3054 and then 03157937fe ("Revert "ipv4: make
net_protocol const"") to revert aa8db499ea.

Add a comment that the structures can not be const because the
early_demux field can change based on a sysctl.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:17:29 -07:00
Willem de Bruijn
cc8737a5fe xen-netback: update ubuf_info initialization to anonymous union
The xen driver initializes struct ubuf_info fields using designated
initializers. I recently moved these fields inside a nested anonymous
struct inside an anonymous union. I had missed this use case.

This breaks compilation of xen-netback with older compilers.
>From kbuild bot with gcc-4.4.7:

   drivers/net//xen-netback/interface.c: In function
   'xenvif_init_queue':
   >> drivers/net//xen-netback/interface.c:554: error: unknown field 'ctx' specified in initializer
   >> drivers/net//xen-netback/interface.c:554: warning: missing braces around initializer
      drivers/net//xen-netback/interface.c:554: warning: (near initialization for '(anonymous).<anonymous>')
   >> drivers/net//xen-netback/interface.c:554: warning: initialization makes integer from pointer without a cast
   >> drivers/net//xen-netback/interface.c:555: error: unknown field 'desc' specified in initializer

Add double braces around the designated initializers to match their
nested position in the struct. After this, compilation succeeds again.

Fixes: 4ab6c99d99 ("sock: MSG_ZEROCOPY notification coalescing")
Reported-by: kbuild bot <lpk@intel.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:11:50 -07:00
David S. Miller
d5a915b978 Merge branch 'gre-add-collect_md-mode-for-ERSPAN-tunnel'
William Tu says:

====================
gre: add collect_md mode for ERSPAN tunnel

This patch series provide collect_md mode for ERSPAN tunnel.  The fist patch
refactors the existing gre_fb_xmit function by exacting the route cache
portion into a new function called prepare_fb_xmit.  The second patch
introduces the collect_md mode for ERSPAN tunnel, by calling the
prepare_fb_xmit function and adding ERSPAN specific logic.  The final patch
adds the test case using bpf_skb_{set,get}_tunnel_{key,opt}.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:04:52 -07:00
William Tu
ef88f89c83 samples/bpf: extend test_tunnel_bpf.sh with ERSPAN
Extend existing tests for vxlan, gre, geneve, ipip to
include ERSPAN tunnel.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:04:52 -07:00
William Tu
1a66a836da gre: add collect_md mode to ERSPAN tunnel
Similar to gre, vxlan, geneve, ipip tunnels, allow ERSPAN tunnels to
operate in 'collect metadata' mode.  bpf_skb_[gs]et_tunnel_key() helpers
can make use of it right away.  OVS can use it as well in the future.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:04:52 -07:00
William Tu
862a03c35e gre: refactor the gre_fb_xmit
The patch refactors the gre_fb_xmit function, by creating
prepare_fb_xmit function for later ERSPAN collect_md mode patch.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:04:52 -07:00
David Ahern
03157937fe Revert "ipv4: make net_protocol const"
This reverts commit aa8db499ea.

Early demux structs can not be made const. Doing so results in:
[   84.967355] BUG: unable to handle kernel paging request at ffffffff81684b10
[   84.969272] IP: proc_configure_early_demux+0x1e/0x3d
[   84.970544] PGD 1a0a067
[   84.970546] P4D 1a0a067
[   84.971212] PUD 1a0b063
[   84.971733] PMD 80000000016001e1

[   84.972669] Oops: 0003 [#1] SMP
[   84.973065] Modules linked in: ip6table_filter ip6_tables veth vrf
[   84.973833] CPU: 0 PID: 955 Comm: sysctl Not tainted 4.13.0-rc6+ #22
[   84.974612] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   84.975855] task: ffff88003854ce00 task.stack: ffffc900005a4000
[   84.976580] RIP: 0010:proc_configure_early_demux+0x1e/0x3d
[   84.977253] RSP: 0018:ffffc900005a7dd0 EFLAGS: 00010246
[   84.977891] RAX: ffffffff81684b10 RBX: 0000000000000001 RCX: 0000000000000000
[   84.978759] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000000
[   84.979628] RBP: ffffc900005a7dd0 R08: 0000000000000000 R09: 0000000000000000
[   84.980501] R10: 0000000000000001 R11: 0000000000000008 R12: 0000000000000001
[   84.981373] R13: ffffffffffffffea R14: ffffffff81a9b4c0 R15: 0000000000000002
[   84.982249] FS:  00007feb237b7700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[   84.983231] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.983941] CR2: ffffffff81684b10 CR3: 0000000038492000 CR4: 00000000000406f0
[   84.984817] Call Trace:
[   84.985133]  proc_tcp_early_demux+0x29/0x30

I think this is the second time such a patch has been reverted.

Cc: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 14:30:46 -07:00
Bhumika Goyal
8209432a5e RDS: make rhashtable_params const
Make this const as it is either used during a copy operation or passed
to a const argument of the function rhltable_init

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:30:02 -07:00
Bhumika Goyal
aa8db499ea ipv4: make net_protocol const
Make these const as they are only passed to a const argument of the
function inet_add_protocol.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:30:02 -07:00
Bhumika Goyal
eb73ddebe9 bridge: make ebt_table const
Make this const as it is only passed to a const argument of the function
ebt_register_table.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:30:02 -07:00
David S. Miller
7a483899b5 Merge branch 'sockmap-uapi-updates-and-fixes'
John Fastabend says:

====================
sockmap UAPI updates and fixes

This series updates sockmap UAPI, adds additional test cases and
provides a couple fixes.

First the UAPI changes. The original API added two sockmap specific
API artifacts (a) a new map_flags field with a sockmap specific update
command and (b) a new sockmap specific attach field in the attach data
structure. After this series instead of attaching programs with a
single command now two commands are used to attach programs to maps
individually. This allows us to add new programs easily in the future
and avoids any specific sockmap data structure additions. The
map_flags field is also removed and instead we allow socks to be
added to multiple maps that may or may not have programs attached.
This allows users to decide if a sock should run a SK_SKB program type
on receive based on the map it is attached to. This is a nice
improvement. See patches for specific details.

More test cases were added to test above changes and also stress test
the interface.

Finally two fixes/improvements were made. First a missing rcu
section was added. Second now sockmap can build without KCM being
used to trigger 'y' on CONFIG_STREAM_PARSER by selecting a new
BPF config option.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:13:22 -07:00
John Fastabend
3f0d6a1698 bpf: test_maps add sockmap stress test
Sockmap is a bit different than normal stress tests that can run
in parallel as is. We need to reuse the same socket pool and map
pool to get good stress test cases.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:13:22 -07:00
John Fastabend
0884824663 bpf: sockmap requires STREAM_PARSER add Kconfig entry
SOCKMAP uses strparser code (compiled with Kconfig option
CONFIG_STREAM_PARSER) to run the parser BPF program. Without this
config option set sockmap wont be compiled. However, at the moment
the only way to pull in the strparser code is to enable KCM.

To resolve this create a BPF specific config option to pull
only the strparser piece in that sockmap needs. This also
allows folks who want to use BPF/syscall/maps but don't need
sockmap to easily opt out.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:13:22 -07:00
John Fastabend
78aeaaef99 bpf: sockmap indicate sock events to listeners
After userspace pushes sockets into a sockmap it may not be receiving
data (assuming stream_{parser|verdict} programs are attached). But, it
may still want to manage the socks. A common pattern is to poll/select
for a POLLRDHUP event so we can close the sock.

This patch adds the logic to wake up these listeners.

Also add TCP_SYN_SENT to the list of events to handle. We don't want
to break the connection just because we happen to be in this state.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:13:22 -07:00
John Fastabend
81374aaa26 bpf: harden sockmap program attach to ensure correct map type
When attaching a program to sockmap we need to check map type
is correct.

Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:13:22 -07:00
John Fastabend
ed85054d34 bpf: more SK_SKB selftests
Tests packet read/writes and additional skb fields.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:13:22 -07:00
John Fastabend
6fd28865c2 bpf: additional sockmap self tests
Add some more sockmap tests to cover,

 - forwarding to NULL entries
 - more than two maps to test list ops
 - forwarding to different map

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:13:21 -07:00
John Fastabend
d26e597d87 bpf: sockmap add missing rcu_read_(un)lock in smap_data_ready
References to psock must be done inside RCU critical section.

Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:13:21 -07:00
John Fastabend
2f857d0460 bpf: sockmap, remove STRPARSER map_flags and add multi-map support
The addition of map_flags BPF_SOCKMAP_STRPARSER flags was to handle a
specific use case where we want to have BPF parse program disabled on
an entry in a sockmap.

However, Alexei found the API a bit cumbersome and I agreed. Lets
remove the STRPARSER flag and support the use case by allowing socks
to be in multiple maps. This allows users to create two maps one with
programs attached and one without. When socks are added to maps they
now inherit any programs attached to the map. This is a nice
generalization and IMO improves the API.

The API rules are less ambiguous and do not need a flag:

  - When a sock is added to a sockmap we have two cases,

     i. The sock map does not have any attached programs so
        we can add sock to map without inheriting bpf programs.
        The sock may exist in 0 or more other maps.

    ii. The sock map has an attached BPF program. To avoid duplicate
        bpf programs we only add the sock entry if it does not have
        an existing strparser/verdict attached, returning -EBUSY if
        a program is already attached. Otherwise attach the program
        and inherit strparser/verdict programs from the sock map.

This allows for socks to be in a multiple maps for redirects and
inherit a BPF program from a single map.

Also this patch simplifies the logic around BPF_{EXIST|NOEXIST|ANY}
flags. In the original patch I tried to be extra clever and only
update map entries when necessary. Now I've decided the complexity
is not worth it. If users constantly update an entry with the same
sock for no reason (i.e. update an entry without actually changing
any parameters on map or sock) we still do an alloc/release. Using
this and allowing multiple entries of a sock to exist in a map the
logic becomes much simpler.

Note: Now that multiple maps are supported the "maps" pointer called
when a socket is closed becomes a list of maps to remove the sock from.
To keep the map up to date when a sock is added to the sockmap we must
add the map/elem in the list. Likewise when it is removed we must
remove it from the list. This results in searching the per psock list
on delete operation. On TCP_CLOSE events we walk the list and remove
the psock from all map/entry locations. I don't see any perf
implications in this because at most I have a psock in two maps. If
a psock were to be in many maps its possibly this might be noticeable
on delete but I can't think of a reason to dup a psock in many maps.
The sk_callback_lock is used to protect read/writes to the list. This
was convenient because in all locations we were taking the lock
anyways just after working on the list. Also the lock is per sock so
in normal cases we shouldn't see any contention.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:13:21 -07:00