This patch provides hints to irqbalance to map bnxt_en device IRQs
to specific CPU cores. cpumask_local_spread() is used, which first
maps IRQs to near NUMA cores; when those cores are exhausted, IRQs
are mapped to far NUMA cores.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the number of TX rings is changed (e.g. ethtool -L, enabling XDP TX
rings, etc), the current code tries to reserve the new number of TX rings
before closing and re-opening the NIC. If we are unable to reserve the
new TX rings, we abort the operation and keep the current TX rings.
The problem is that the firmware will disable the current TX rings even
when it cannot reserve the new set of TX rings. We fix it as follows:
1. Instead of reserving the new set of TX rings, just ask the firmware
to check if the new set of TX rings is available. There is a flag in
the firmware message to do that. If not available, abort and the
current TX rings will not be disabled.
2. Do the actual TX ring reservation in the path that opens the NIC.
We keep the number of TX rings currently successfully reserved. If the
number of TX rings is different than the reserved TX rings, we call
firmware and reserve again.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flow APIs are added in this firmware interface.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Mendoza-Jonas says:
====================
NCSI VLAN Filtering Support
This series (mainly patch 2) adds VLAN filtering to the NCSI implementation.
A fair amount of code already exists in the NCSI stack for VLAN filtering but
none of it is actually hooked up. This goes the final mile and fixes a few
bugs in the existing code found along the way (patch 1).
Patch 3 adds the appropriate flag and callbacks to the ftgmac100 driver to
enable filtering as it's a large consumer of NCSI (and what I've been
testing on).
v3: - Add comment describing change to ncsi_find_filter()
- Catch NULL in clear_one_vid() from ncsi_get_filter()
- Simplify state changes when kicking updated channel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Register the ndo_vlan_rx_{add,kill}_vid callbacks and set the
NETIF_F_HW_VLAN_CTAG_FILTER if NCSI is available.
This allows the VLAN core to notify the NCSI driver when changes occur
so that the remote NCSI channel can be properly configured to filter on
the set VLAN tags.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI
stack process new VLAN tags and configure the channel VLAN filter
appropriately.
Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent
for each one, meaning the ncsi_dev_state_config_svf state must be
repeated. An internal list of VLAN tags is maintained, and compared
against the current channel's ncsi_channel_filter in order to keep track
within the state. VLAN filters are removed in a similar manner, with the
introduction of the ncsi_dev_state_config_clear_vids state. The maximum
number of VLAN tag filters is determined by the "Get Capabilities"
response from the channel.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-08-27
This series contains updates to i40e and i40evf only.
Sudheer updates code comments and state variable so that adminq_subtask
will have accutate information whenever it gets scheduled.
Mariusz stores information about FEC modes, to be used to printing link
states information, so that we do not need to call admin queue when
reporting link status. Adds VF support for controlling VLAN tag
stripping via ethtool.
Jake provides the majority of changes in this series, starting with
increasing the size of the prefix buffer so that it can hold enough
characters for every possible input, which prevents snprintf truncation.
Fixed other string truncation errors/warnings produced by GCC 7.x.
Removed an unnecessary workaround for resetting XPS. Fixed an issue
where there is a mismatched affinity mask value, so initialize the value
to cpu_possible_mask and invert the logic for checking incorrect CPU vs
IRQ affinity so that the exceptional case is handled at the check.
Removed ULTRA latency mode due to several issues found and will be
looking at better solution for small packet workloads.
Akeem fixes an issue where the incorrect flag was being used to set
promiscuous mode for unicast, which was enabling promiscuous mode only
for multicast instead of unicast.
Carolyn fixes an issue where an error return value is set, but this
value can be overwritten before we actually do exit the function. So
remove the error code assignment and add code comments for better
understanding on why we do not need to set and return the error.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the search for index of constant buffer size
Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the hw MTU limitation by setting max_mtu
Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman says:
====================
irda: move it to drivers/staging so we can delete it
The IRDA code has long been obsolete and broken. So, to keep people
from trying to use it, and to prevent people from having to maintain it,
let's move it to drivers/staging/ so that we can delete it entirely from
the kernel in a few releases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The irda code will be deleted in a future kernel release, so no need to
have anyone do any new work on it.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
And finally, move the irda include files into
drivers/staging/irda/include/net/irda. Yes, it's a long path, but it
makes it easy for us to just add a Makefile directory path addition and
all of the net and drivers code "just works".
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the irda drivers from drivers/net/irda/ to
drivers/staging/irda/drivers as they will be deleted in a future kernel
release.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's time to get rid of IRDA. It's long been broken, and no one seems
to use it anymore. So move it to staging and after a while, we can
delete it from there.
To start, move the network irda core from net/irda to
drivers/staging/irda/net/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur says:
====================
Add RSS to DPAA 1.x Ethernet driver
This patch set introduces Receive Side Scaling for the DPAA Ethernet
driver. Documentation is updated with details related to the new
feature and limitations that apply.
Added also a small fix.
v2: removed a C++ style comment
v3: move struct fman to header file to avoid exporting a function
v4: addressed compilation issues introduced in v3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the skb hash when then FMan Keygen hash result is available.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow ethtool control of the Rx flow hashing. By default RSS is
enabled, this allows to turn it off by bypassing the FMan Keygen
block and sending all traffic on the default Rx frame queue.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a block of 128 Rx frame queues per port. The FMan hardware will
send traffic on one of these queues based on the FMan port Parse
Classify Distribute setup. The hash computed by the FMan Keygen
block will select the Rx FQ.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the FMan Keygen with a hardcoded scheme to spread
incoming traffic on a FQ range based on source and destination IPs
and ports.
Signed-off-by: Iordache Florinel <florinel.iordache@nxp.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kfree on NULL pointer is a no-op and therefore checking is redundant.
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 520ac30f45 ("net_sched: drop packets after root qdisc lock
is released) made a big change of tc for performance. But there are
some points which are not changed in SFQ enqueue operation.
1. Fail to find the SFQ hash slot;
2. When the queue is full;
Now use qdisc_drop instead free skb directly.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During the neighbor traversal the neighbors from different families
should be ignored.
Fixes: c58035a74aba ("mlxsw: spectrum_dpipe: Add support for IPv4 host table dump")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Makes no sense to have dpipe compiled in when devlink is not enabled,
because the devlink dpipe registation is noop function. So don't compile
it in. This also fixes missing extern structs errors.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: a86f030915 ("mlxsw: spectrum_dpipe: Add support for IPv4 host table dump")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It uses VMBus ringbuffer as the
transportation layer.
With hv_sock, applications between the host (Windows 10, Windows Server
2016 or newer) and the guest can talk with each other using the traditional
socket APIs.
More info about Hyper-V Sockets is available here:
"Make your own integration services":
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service
The patch implements the necessary support in Linux guest by introducing a new
vsock transport for AF_VSOCK.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Andy King <acking@vmware.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: George Zhang <georgezhang@vmware.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Reilly Grant <grantr@vmware.com>
Cc: Asias He <asias@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a basic test for checking whether kernel is populating
the jited and xlated BPF images. It was used to confirm
the behaviour change from commit d777b2ddbe ("bpf: don't
zero out the info struct in bpf_obj_get_info_by_fd()"),
which made bpf_obj_get_info_by_fd() usable for retrieving
the image dumps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"err" is set to zero if bpf_map_area_alloc() fails so it means we return
ERR_PTR(0) which is NULL. The caller, find_and_alloc_map(), is not
expecting NULL returns and will oops.
Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Twice patches trying to constify inet{6}_protocol have been reverted:
39294c3df2 ("Revert "ipv6: constify inet6_protocol structures"") to
revert 3a3a4e3054 and then 03157937fe ("Revert "ipv4: make
net_protocol const"") to revert aa8db499ea.
Add a comment that the structures can not be const because the
early_demux field can change based on a sysctl.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The xen driver initializes struct ubuf_info fields using designated
initializers. I recently moved these fields inside a nested anonymous
struct inside an anonymous union. I had missed this use case.
This breaks compilation of xen-netback with older compilers.
>From kbuild bot with gcc-4.4.7:
drivers/net//xen-netback/interface.c: In function
'xenvif_init_queue':
>> drivers/net//xen-netback/interface.c:554: error: unknown field 'ctx' specified in initializer
>> drivers/net//xen-netback/interface.c:554: warning: missing braces around initializer
drivers/net//xen-netback/interface.c:554: warning: (near initialization for '(anonymous).<anonymous>')
>> drivers/net//xen-netback/interface.c:554: warning: initialization makes integer from pointer without a cast
>> drivers/net//xen-netback/interface.c:555: error: unknown field 'desc' specified in initializer
Add double braces around the designated initializers to match their
nested position in the struct. After this, compilation succeeds again.
Fixes: 4ab6c99d99 ("sock: MSG_ZEROCOPY notification coalescing")
Reported-by: kbuild bot <lpk@intel.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu says:
====================
gre: add collect_md mode for ERSPAN tunnel
This patch series provide collect_md mode for ERSPAN tunnel. The fist patch
refactors the existing gre_fb_xmit function by exacting the route cache
portion into a new function called prepare_fb_xmit. The second patch
introduces the collect_md mode for ERSPAN tunnel, by calling the
prepare_fb_xmit function and adding ERSPAN specific logic. The final patch
adds the test case using bpf_skb_{set,get}_tunnel_{key,opt}.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend existing tests for vxlan, gre, geneve, ipip to
include ERSPAN tunnel.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to gre, vxlan, geneve, ipip tunnels, allow ERSPAN tunnels to
operate in 'collect metadata' mode. bpf_skb_[gs]et_tunnel_key() helpers
can make use of it right away. OVS can use it as well in the future.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch refactors the gre_fb_xmit function, by creating
prepare_fb_xmit function for later ERSPAN collect_md mode patch.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make this const as it is either used during a copy operation or passed
to a const argument of the function rhltable_init
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make these const as they are only passed to a const argument of the
function inet_add_protocol.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make this const as it is only passed to a const argument of the function
ebt_register_table.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend says:
====================
sockmap UAPI updates and fixes
This series updates sockmap UAPI, adds additional test cases and
provides a couple fixes.
First the UAPI changes. The original API added two sockmap specific
API artifacts (a) a new map_flags field with a sockmap specific update
command and (b) a new sockmap specific attach field in the attach data
structure. After this series instead of attaching programs with a
single command now two commands are used to attach programs to maps
individually. This allows us to add new programs easily in the future
and avoids any specific sockmap data structure additions. The
map_flags field is also removed and instead we allow socks to be
added to multiple maps that may or may not have programs attached.
This allows users to decide if a sock should run a SK_SKB program type
on receive based on the map it is attached to. This is a nice
improvement. See patches for specific details.
More test cases were added to test above changes and also stress test
the interface.
Finally two fixes/improvements were made. First a missing rcu
section was added. Second now sockmap can build without KCM being
used to trigger 'y' on CONFIG_STREAM_PARSER by selecting a new
BPF config option.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sockmap is a bit different than normal stress tests that can run
in parallel as is. We need to reuse the same socket pool and map
pool to get good stress test cases.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SOCKMAP uses strparser code (compiled with Kconfig option
CONFIG_STREAM_PARSER) to run the parser BPF program. Without this
config option set sockmap wont be compiled. However, at the moment
the only way to pull in the strparser code is to enable KCM.
To resolve this create a BPF specific config option to pull
only the strparser piece in that sockmap needs. This also
allows folks who want to use BPF/syscall/maps but don't need
sockmap to easily opt out.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After userspace pushes sockets into a sockmap it may not be receiving
data (assuming stream_{parser|verdict} programs are attached). But, it
may still want to manage the socks. A common pattern is to poll/select
for a POLLRDHUP event so we can close the sock.
This patch adds the logic to wake up these listeners.
Also add TCP_SYN_SENT to the list of events to handle. We don't want
to break the connection just because we happen to be in this state.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When attaching a program to sockmap we need to check map type
is correct.
Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tests packet read/writes and additional skb fields.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some more sockmap tests to cover,
- forwarding to NULL entries
- more than two maps to test list ops
- forwarding to different map
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
References to psock must be done inside RCU critical section.
Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The addition of map_flags BPF_SOCKMAP_STRPARSER flags was to handle a
specific use case where we want to have BPF parse program disabled on
an entry in a sockmap.
However, Alexei found the API a bit cumbersome and I agreed. Lets
remove the STRPARSER flag and support the use case by allowing socks
to be in multiple maps. This allows users to create two maps one with
programs attached and one without. When socks are added to maps they
now inherit any programs attached to the map. This is a nice
generalization and IMO improves the API.
The API rules are less ambiguous and do not need a flag:
- When a sock is added to a sockmap we have two cases,
i. The sock map does not have any attached programs so
we can add sock to map without inheriting bpf programs.
The sock may exist in 0 or more other maps.
ii. The sock map has an attached BPF program. To avoid duplicate
bpf programs we only add the sock entry if it does not have
an existing strparser/verdict attached, returning -EBUSY if
a program is already attached. Otherwise attach the program
and inherit strparser/verdict programs from the sock map.
This allows for socks to be in a multiple maps for redirects and
inherit a BPF program from a single map.
Also this patch simplifies the logic around BPF_{EXIST|NOEXIST|ANY}
flags. In the original patch I tried to be extra clever and only
update map entries when necessary. Now I've decided the complexity
is not worth it. If users constantly update an entry with the same
sock for no reason (i.e. update an entry without actually changing
any parameters on map or sock) we still do an alloc/release. Using
this and allowing multiple entries of a sock to exist in a map the
logic becomes much simpler.
Note: Now that multiple maps are supported the "maps" pointer called
when a socket is closed becomes a list of maps to remove the sock from.
To keep the map up to date when a sock is added to the sockmap we must
add the map/elem in the list. Likewise when it is removed we must
remove it from the list. This results in searching the per psock list
on delete operation. On TCP_CLOSE events we walk the list and remove
the psock from all map/entry locations. I don't see any perf
implications in this because at most I have a psock in two maps. If
a psock were to be in many maps its possibly this might be noticeable
on delete but I can't think of a reason to dup a psock in many maps.
The sk_callback_lock is used to protect read/writes to the list. This
was convenient because in all locations we were taking the lock
anyways just after working on the list. Also the lock is per sock so
in normal cases we shouldn't see any contention.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>