In preparation to make DSA master devices point to their corresponding
CPU port instead of the whole tree, add copies of dst and rcv in the
dsa_port structure so that we keep fast access in the receive hot path.
Also keep the copies at the beginning of the dsa_port structure in order
to ensure they are available in cacheline 1.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DSA tagging protocol operations are specific to each CPU port,
thus the dsa_device_ops pointer belongs to the dsa_port structure.
>From now on assign a slave's xmit copy from its CPU port tagging
operations. This will ease the future support for multiple CPU ports.
Also keep the tag_ops at the beginning of the dsa_port structure so that
we ensure copies for hot path are in cacheline 1.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When resolving the DSA tagging protocol used by a CPU switch, use a
temporary "tag_ops" variable to store the dsa_device_ops instead of
using directly dst->tag_ops. This will make the future patches moving
this pointer around easier to read.
There is no functional changes.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make it clear that the master device is linked to a CPU port by using
"cpu_dp" for the dsa_port variable in master.c instead of "port", then
use a "port" variable to describe the port index, as usually seen in
other places of DSA core.
This will make the future patch touching dsa_ptr more readable. There is
no functional changes.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DSA tagging code does not need to know about the DSA architecture,
it only needs to return the slave device corresponding to the source
port index (and eventually the source device index for cascade-capable
switches) parsed from the frame received on the master device.
For this purpose, provide an inline dsa_master_get_slave helper which
validates the device and port indexes and look up the slave device.
This makes the tagging rcv functions more concise and robust, and also
makes dsa_get_cpu_port obsolete.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pointer ndev is being dereferenced with the call to netif_running
before it is being null checked. Re-order the code to only dereference
ndev after it has been null checked.
Detected by CoverityScan, CID#1457206 ("Dereference before null check")
Fixes: 9df8f79a4d ("net: hns3: Add DCB support when interacting with network stack")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the numbers of events for stop_queue and wake_queue in
ethtool stats.
Example:
ethtool -S eth0
NIC statistics:
...
stop_queue: 7
wake_queue: 7
...
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch uses u64_to_user_ptr() to cast info.map_ids to a userspace ptr.
It also tags the user_map_ids with '__user' for sparse check.
Fixes: cb4d2b3f03 ("bpf: Add name, load_time, uid and map_ids to bpf_prog_info")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The assignment of -EINVAL to variable ret is redundant as it
is being overwritten on the following error exit paths or
to the return value from the following call to basic_set_parms.
Fix this up by removing it. Cleans up clang warning message:
net/sched/cls_basic.c:185:2: warning: Value stored to 'err' is never read
Fixes: 1d8134fea2 ("net_sched: use idr to allocate basic filter handles")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function ipmr_notifier_init is local to the source and does
not need to be in global scope, so make it static.
Cleans up sparse warning:
warning: symbol 'ipmr_notifier_init' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
State is initially reported as UNKNOWN. Before register call
netif_carrier_off(). Once the device is opened, call netif_carrier_on() in
order to set the state to UP.
Signed-off-by: Mick Tarsel <mjtarsel@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit e85ec74ace ("net: dsa: bcm_sf2: Defer port
enabling to calling port_enable") because this now makes an unbind
followed by a bind to fail connecting to the ingrated PHY.
What this patch missed is that we need the PHY to be enabled with
bcm_sf2_gphy_enable_set() before probing it on the MDIO bus. This is
correctly done in the ops->setup() function, but by the time
ops->port_enable() runs, this is too late. Upon unbind we would power
down the PHY, and so when we would bind again, the PHY would be left
powered off.
Fixes: e85ec74ace ("net: dsa: bcm_sf2: Defer port enabling to calling port_enable")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-09-29
This series contains updates to i40e and i40evf only.
Jake provides several of the changes starting with the renaming of a
variable to clarify what the value is actually calculating. Found we
were misusing the __I40E_RECOVERY_PENDING bit to determine when we
should actually request a new IRQ in i40e_setup_misc_vector(), which
lead to a design mistake, so to resolve the issue, use a separate
state bit for miscellaneous IRQ setup and fix up the design while we
are at it. Cleaned up the old legacy PM support in the driver since
we support the newer generic PM callbacks. Fixed a failure to
hibernate issue, where on some platforms with a large number of CPUs,
we would allocate many IRQ vectors which we would try to migrate to
CPU0 when hibernating.
Sudheer cleans up a check for unqualified module inside i40e_up_complete()
because the link state information is in flux at time, so log messages
are getting logged with incorrect link state information. Also provided
additional log message cleanups and simplify member variable access in
the printing of the link messages.
Mariusz relaxes the firmware check since Fortville and Fort Park NICs
can and do have different firmware versions, so only warn for older
Fortville firmware. Fixed an errata with a flow director statistic that
was not wrapping as expected, simply reset after reading.
Mitch prevents consternation by lowering the log level to debug on a
message seen regularly on VF reset or unload, which is meaningless under
normal circumstances. Refactor the firmware version checking since
Fortville and Fort Park devices can have different firmware versions.
Alan fixes a ring to vector mapping, where the past implementation
attempted to map each Tx and Rx ring to its own vector, however we use
combined queues so we should be mapping the Tx/Rx rings together on one
vector. Adds the ability for the VF to request a different number of
queues allocated to it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The check on len is redundant as it is always greater than 1,
so just remove it and make the printk less complex.
Detected by CoverityScan, CID#1226729 ("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far we've been relying on sockopt(SOL_IP, IP_FREEBIND) being usable
even on IPv6 sockets.
However, it turns out it is perfectly reasonable to want to set freebind
on an AF_INET6 SOCK_RAW socket - but there is no way to set any SOL_IP
socket option on such a socket (they're all blindly errored out).
One use case for this is to allow spoofing src ip on a raw socket
via sendmsg cmsg.
Tested:
built, and booted
# python
>>> import socket
>>> SOL_IP = socket.SOL_IP
>>> SOL_IPV6 = socket.IPPROTO_IPV6
>>> IP_FREEBIND = 15
>>> IPV6_FREEBIND = 78
>>> s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0)
>>> s.getsockopt(SOL_IP, IP_FREEBIND)
0
>>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND)
0
>>> s.setsockopt(SOL_IPV6, IPV6_FREEBIND, 1)
>>> s.getsockopt(SOL_IP, IP_FREEBIND)
1
>>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND)
1
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NS for DAD are sent on admin up as long as a valid qdisc is found.
A race condition exists by which these packets will not egress the
interface if the operational state of the lower device is not yet up.
The solution is to delay DAD until the link is operationally up
according to RFC2863. Rather than only doing this, follow the existing
code checks by deferring IPv6 device initialization altogether. The fix
allows DAD on devices like tunnels that are controlled by userspace
control plane. The fix has no impact on regular deployments, but means
that there is no IPv6 connectivity until the port has been opened in
the case of port-based network access control, which should be
desirable.
Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The i40e driver now supports two different devices with two different
firmware versions. So be smart about how we handle these. Move the FW
version macros to the appropriate header file, and add a convenience
macro that checks the version based on the device. Then use this macro
to check whether or not the driver can use the new link info API.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently the PF allocates a default number of queues for each VF and
cannot be changed. This patch enables the VF to request a different
number of queues allocated to it. This patch also adds a new virtchnl
op and capability flag to facilitate this negotiation.
After the PF receives a request message, it will set a requested number
of queues for that VF. Then when the VF resets, its VSI will get a new
number of queues allocated to it.
This is a best effort request and since we only allocate a guaranteed
default number, if the VF tries to ask for more than the guaranteed
number, there may not be enough in HW to accommodate it unless other
queues for other VFs are freed. It should also be noted decreasing the
number queues allocated to a VF to below the default will NOT enable the
allocation of more than 32 VFs per PF and will not free queues guaranteed
to each VF by default.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The current implementation for mapping queues to vectors is broken
because it attempts to map each Tx and Rx ring to its own vector,
however we use combined queues so we should actually be mapping the
Tx/Rx rings together on one vector.
Also in the current implementation, in the case where we have more
queues than vectors, we attempt to group the queues together into
'chunks' and map each 'chunk' of queues to a vector. Chunking them
together would be more ideal if, and only if, we only had RSS because of
the way the hashing algorithm works but in the case of a future patch
that enables VF ADq, round robin assignment is better and still works
with RSS.
This patch resolves both those issues and simplifies the code needed to
accomplish this. Instead of treating the case where we have more queues
than vectors as special, if we notice our vector index is greater than
vectors, reset the vector index to zero and continue mapping. This
should ensure that in both cases, whether we have enough vectors for
each queue or not, the queues get appropriately mapped.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
On some platforms with a large number of CPUs, we will allocate many IRQ
vectors. When hibernating, the system will attempt to migrate all of the
vectors back to CPU0 when shutting down all the other CPUs. It is
possible that we have so many vectors that it cannot re-assign them to
CPU0. This is even more likely if we have many devices installed in one
platform.
The end result is failure to hibernate, as it is not possible to
shutdown the CPUs. We can avoid this by disabling MSI-X and clearing our
interrupt scheme when the device is suspended. A more ideal solution
would be some method for the stack to properly handle this for all
drivers, rather than on a case-by-case basis for each driver to fix
itself.
However, until this more ideal solution exists, we can do our part and
shutdown our IRQs during suspend, which should allow systems with
a large number of CPUs to safely suspend or hibernate.
It may be worth investigating if we should shut down even further when
we suspend as it may make the path cleaner, but this was the minimum fix
for the hibernation issue mentioned here.
Testing-hints:
This affects systems with a large number of CPUs, and with multiple
devices enabled. Without this change, those platforms are unable to
hibernate at all.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Although the service task does check the suspended status before
running, it might already be part way through running when we go to
suspend. Lets ensure that the service task is stopped and will not be
restarted again until we finish resuming. This ensures that service task
code does not cause strange interactions with the suspend/resume
handlers.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When handling suspend and resume callbacks we want to make sure that (a)
we don't suspend again if we're already suspended and (b) we don't
resume again if we're already resuming. Lets make sure we test_and_set
the __I40E_SUSPENDED bit in i40e_suspend which ensures that a suspend
call when already suspended will exit early. Additionally, if
__I40E_SUSPENDED is not set when we begin resuming, exit early as well.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Stop using the old legacy PM support, since we now have stable support
for the newer generic PM callbacks.
This has several advantages. First, we no longer have to manage our
own pci_save_state() and power changes, as it's preferred to have the
PCI stack do this. Second, these routines get called for both hibernate
and suspend to ram, so we can have the driver properly handle all the
suspend/resume flows that it needs to.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We currently (mis)use the __I40E_RECOVERY_PENDING bit to determine when
we should actually request a new IRQ in i40e_setup_misc_vector().
This led to a design mistake where we open-coded the re-setup of the
miscellaneous vector in i40e_resume() instead of using the function
provided. If we did not open-code this and instead tried to use the
i40e_setup_misc_vector() function, it would lead to never reallocating
the IRQ.
This would lead to a second i40e_suspend() call failing to free the
vector due to a NULL pointer dereference.
A future patch is going to re-work how the i40e_suspend() and
i40e_resume() flows work to clear all IRQ vectors, which would require
us to use i40e_setup_misc_vector() directly. Since during this time the
__I40E_RECOVERY_PENDING bit is set, we'll never re-allocate the vector.
Rather than leaving the open-coded setup in i40e_resume() lets just fix
the problem properly in i40e_setup_misc_vector().
Introduce a new state bit which indicates when the IRQ has been
assigned, which will be set when i40e_setup_misc_vector is first called.
This ultimately resolves the issue of re-requesting the vector, without
overloading the __I40E_RECOVERY_PENDING state. This ensures that the
suspend/resume cycle can use the setup function instead of open-coding
the re-request during resume.
Additionally, since the only callers of i40e_stop_misc_vector also want
to free it, move this code directly into the function to avoid
duplication. Due to the new functionality, rename it to
i40e_free_misc_vector().
This lets us drop the extra calls to free and re-enable the vector
during i40e_suspend() and i40e_resume(). We don't need to call
i40e_setup_misc_Vector() in i40e_resume() because it gets called by the
i40e_rebuild() call.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We see this message regularly on VF reset or unload (which invokes a
reset). It's essentially meaningless unless it's happening constantly.
To prevent consternation, lower the log level to debug so it's not seen
under normal circumstance.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
An errata with GLQF_PCNT causes it to not wrap as expected. This
can cause an error in flow director statistics. This patch resets
affected counters just after reading.
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fortville and Fort Park devices are often on different firmware release
schedules. This change relaxes the minor version warning message,
so it is only displayed for older FW warning version for old
firmware Fortville 3 or earlier.
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This commit replaces usage of vsi->back in i40e_print_link_message()
(which is actually a PF pointer) with temp variable.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e_print_link_message() is intended to compare new
link state with current link state and print log message
only if the new state is different from current state.
However in current driver the new state does not get updated
when link is going down because of the if condition. When an
interface is brought down, vsi->state is set to I40E_VSI_DOWN
in i40e_vsi_close() and later i40e_print_link_message() does
not get invoked in i40e_link_event due to if condition. Hence
link down message doesn't appear when link is going down. The
down state is seen later during i40e_open() and old state
gets printed. The actual link state doesn't get updated in
i40e_close() or i40e_open() but when i40e_handle_link_event is
called inside i40e_clean_adminq_subtask.
This change allows i40e_print_link_message() to be called when
interface is going down and keeps the state information updated.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In current driver, when ifconfig ethx up is done, the link state
doesn't transition to UP inside i40e_open(). It changes after AQ
command response is handled in i40e_handle_link_event().
When pf->hw.phy.link_info.link_info is DOWN inside i40e_open(),
The state is transient and invalid. So log message gets printed
based on incorrect info (i.e link_info and an_info).
This commit removes check for unqualified module inside
i40e_up_complete(). The existing check in i40e_handle_link_event()
logs the error message based on correct link state information.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This value is not calculating bytes_per_int, which would actually just
be bytes/ITR_COUNTDOWN_START, but rather it's calculating bytes/usecs.
Rename the variable for clarity so that future developers understand
what the value is actually calculating.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
fib_check_nh does not use the fib_info arg; remove t.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_weight in fib_info is set but not used. Remove it and the
helpers for setting it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau says:
====================
bpf: Extend bpf_{prog,map}_info
This patch series adds more fields to bpf_prog_info and bpf_map_info.
Please see individual patch for details.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch tests newly added fields of the bpf_attr,
bpf_prog_info and bpf_map_info.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch swaps the checking order. It now checks the map_info
first and then prog_info. It is a prep work for adding
test to the newly added fields (the map_ids of prog_info field
in particular).
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch extends the libbpf to provide API support to
allow specifying BPF object name.
In tools/lib/bpf/libbpf, the C symbol of the function
and the map is used. Regarding section name, all maps are
under the same section named "maps". Hence, section name
is not a good choice for map's name. To be consistent with
map, bpf_prog also follows and uses its function symbol as
the prog's name.
This patch adds logic to collect function's symbols in libbpf.
There is existing codes to collect the map's symbols and no change
is needed.
The bpf_load_program_name() and bpf_map_create_name() are
added to take the name argument. For the other bpf_map_create_xxx()
variants, a name argument is directly added to them.
In samples/bpf, bpf_load.c in particular, the symbol is also
used as the map's name and the map symbols has already been
collected in the existing code. For bpf_prog, bpf_load.c does
not collect the function symbol name. We can consider to collect
them later if there is a need to continue supporting the bpf_load.c.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows userspace to specify a name for a map
during BPF_MAP_CREATE.
The map's name can later be exported to user space
via BPF_OBJ_GET_INFO_BY_FD.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch adds name and load_time to struct bpf_prog_aux. They
are also exported to bpf_prog_info.
The bpf_prog's name is passed by userspace during BPF_PROG_LOAD.
The kernel only stores the first (BPF_PROG_NAME_LEN - 1) bytes
and the name stored in the kernel is always \0 terminated.
The kernel will reject name that contains characters other than
isalnum() and '_'. It will also reject name that is not null
terminated.
The existing 'user->uid' of the bpf_prog_aux is also exported to
the bpf_prog_info as created_by_uid.
The existing 'used_maps' of the bpf_prog_aux is exported to
the newly added members 'nr_map_ids' and 'map_ids' of
the bpf_prog_info. On the input, nr_map_ids tells how
big the userspace's map_ids buffer is. On the output,
nr_map_ids tells the exact user_map_cnt and it will only
copy up to the userspace's map_ids buffer is allowed.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the commit 76174004a0 (tcp: do not slow start when cwnd equals
ssthresh), the comparison to the reduced cwnd in tcp_vegas_ssthresh() would
under-evaluate the ssthresh.
Signed-off-by: Hoang Tran <hoang.tran@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to be able to transparently forward most link-local frames via
tunnels (e.g. vxlan, qinq). Currently the bridge's group_fwd_mask has a
mask which restricts the forwarding of STP and LACP, but we need to be able
to forward these over tunnels and control that forwarding on a per-port
basis thus add a new per-port group_fwd_mask option which only disallows
mac pause frames to be forwarded (they're always dropped anyway).
The patch does not change the current default situation - all of the others
are still restricted unless configured for forwarding.
We have successfully tested this patch with LACP and STP forwarding over
VxLAN and qinq tunnels.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin says:
====================
Add support for DCB feature in hns3 driver
The patchset contains some enhancement related to DCB before
adding support for DCB feature.
This patchset depends on the following patchset:
https://patchwork.ozlabs.org/cover/815646/https://patchwork.ozlabs.org/cover/816145/
High Level Architecture:
[ lldpad ]
|
|
|
[ hns3_dcbnl ]
|
|
|
[ hclge_dcb ]
/ \
/ \
/ \
[ hclge_main ] [ hclge_tm ]
Current patch-set support following functionality:
Use of lldptool to configure the tc schedule mode, tc
bandwidth(if schedule mode is ETS), prio_tc_map and
PFC parameter.
V3: Drop mqprio support
V2: Fix for not defining variables in local loop.
V1: Initial Submit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When using lldptool to configure DCB parameter, hclge_dcb module
call the client_ops->setup_tc to tell network stack which queue
and priority is using for specific tc.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the DCB feature is supported, fc_mode and dcb enable flag
must be set according to the DCB parameter.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add dcb netlink interface by calling the interface from
hclge_dcb module.
This patch also update Makefile in order to build hns3_dcbnl module.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hclge_dcb module calls the interface from hclge_main/tm
and provide interface for the dcb netlink interface.
This patch also update Makefiles required to build the DCB
supported code in HNS3 Ethernet driver and update the existing
Kconfig file in the hisilicon folder.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add some interface and export some interface from
hclge_tm and hclgc_main to support the upcoming DCB feature.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sriov is enabled and TM is in tc-based mode, vf's TM
parameters is not set in TM initialization process.
This patch add the tc_based TM support for sriov enabled
using the information in vport struct.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add a tm_port_shaper cmd and set port shaper
to HCLGE_ETHER_MAX_RATE on TM initialization process.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add a pfc_pause_en cmd, and use it to configure
PFC option according to fc_mode in hdev->tm_info.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>