Only non-NPAR PFs need to actively check and manage unsupported link
speeds. NPAR functions and VFs do not control the link speed and
should skip the unsupported speed detection logic, to avoid warning
messages from firmware rejecting the unsupported firmware calls.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent change to reduce delay granularity waiting for firmware
reponse has caused a regression. With a tighter delay loop,
the driver may see the beginning part of the response faster.
The original 5 usec delay to wait for the rest of the message
is not long enough and some messages are detected as invalid.
Increase the maximum wait time from 5 usec to 20 usec. Also, fix
the debug message that shows the total delay time for the response
when the message times out. With the new logic, the delay time
is not fixed per iteration of the loop, so we define a macro to
show the total delay time.
Fixes: 9751e8e714 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add logic to reserve default rings at driver open time if none was
reserved during probe time. This will happen when the PF driver did
not provision minimum rings to the VF, due to more limited resources.
Driver open will only succeed if some minimum rings can be reserved.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For completeness and correctness, the VF driver needs to reserve these
RSS and L2 contexts.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When rings are more limited and the PF has not provisioned minimum
guaranteed rings to the VF, do not reserve rings during driver probe.
Wait till device open before reserving rings when they will be used.
Device open will succeed if some minimum rings can be successfully
reserved and allocated.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds debugfs support for bnxt_en with the purpose of allowing users
to examine the current DIM profile in use for each receive queue. This
was instrumental in debugging issues found with DIM and ensuring that
the profiles we expect to use are the profiles being used.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Testing with DIM enabled on older kernels indicated that firmware calls
were slower than expected. More detailed analysis indicated that the
default 25us delay was higher than necessary. Reducing the time spend in
usleep_range() for the first several calls would reduce the overall
latency of firmware calls on newer Intel processors.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This keeps the RING_IDLE flag set in hardware for higher coalesce
settings by default and improved latency.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace switch statements printing different messages for every ring type
with a common message.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Older firmware will reject this call and cause an error message to
be printed by the VF driver.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current driver maps MQPRIO traffic classes directly 1:1 to the
internal hardware queues (TC0 maps to hardware queue 0, etc). This
direct mapping requires the internal hardware queues to be reconfigured
from lossless to lossy and vice versa when necessary. This
involves reconfiguring internal buffer thresholds which is
disruptive and not always reliable.
Implement a new scheme to map TCs to internal hardware queues by
matching up their PFC requirements. This will eliminate the need
to reconfigure a hardware queue internal buffers at run time. After
remapping, the NIC is closed and opened for the new TC to hardware
queues to take effect.
This patch only adds the basic mapping logic.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When open fails during ethtool -L ring change, for example, the driver
may crash at bnxt_free_irq() because bp->bnapi is NULL.
If we fail to allocate all the new rings, bnxt_open_nic() will free
all the memory including bp->bnapi. Subsequent call to bnxt_close_nic()
will try to dereference bp->bnapi in bnxt_free_irq().
Fix it by checking for !bp->bnapi in bnxt_free_irq().
Fixes: e5811b8c09 ("bnxt_en: Add IRQ remapping logic.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With recent changes to reserve both L2 and RDMA rings, we need to include
the RDMA rings in bnxt_check_rings(). Otherwise we will under-estimate
the rings we need during ethtool -L and may lead to failure.
Fixes: fbcfc8e467 ("bnxt_en: Reserve completion rings and MSIX for bnxt_re RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the driver needs to re-initailize the IRQ vectors, we make the
new ulp_irq_stop() call to tell the RDMA driver to disable and free
the IRQ vectors. After IRQ vectors have been re-initailized, we
make the ulp_irq_restart() call to tell the RDMA driver that
IRQs can be restarted.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add additional logic to reserve completion rings for the bnxt_re driver
when it requests MSIX vectors. The function bnxt_cp_rings_in_use()
will return the total number of completion rings used by both drivers
that need to be reserved. If the network interface in up, we will
close and open the NIC to reserve the new set of completion rings and
re-initialize the vectors.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor bnxt_need_reserve_rings() slightly so that __bnxt_reserve_rings()
can call it and remove some duplicated code.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add remapping logic so that bnxt_en can use any arbitrary MSIX vectors.
This will allow the driver to reserve one range of MSIX vectors to be
used by both bnxt_en and bnxt_re. bnxt_en can now skip over the MSIX
vectors used by bnxt_re.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the current code, the range of MSIX vectors allocated for the RDMA
driver is disjoint from the network driver. This creates a problem
for the new firmware ring reservation scheme. The new scheme requires
the reserved completion rings/MSIX vectors to be in a contiguous
range.
Change the logic to allocate RDMA MSIX vectors to be contiguous with
the vectors used by bnxt_en on new firmware using the new scheme.
The new function bnxt_get_num_msix() calculates the exact number of
vectors needed by both drivers.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the driver code makes some assumptions about the group index
and the map index of rings. This makes the code more difficult to
understand and less flexible.
Improve it by adding the grp_idx and map_idx fields explicitly to the
bnxt_ring_struct as a union. The grp_idx is initialized for each tx ring
and rx agg ring during init. time. We do the same for the map_idx for
each cmpl ring.
The grp_idx ties the tx ring to the ring group. The map_idx is the
doorbell index of the ring. With this new infrastructure, we can change
the ring index mapping scheme easily in the future.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When firmware sends a DMA response to the driver, the last byte of the
message will be set to 1 to indicate that the whole response is valid.
The driver waits for the message to be valid before reading the message.
The firmware spec allows these response messages to increase in
length by adding new fields to the end of these messages. The
older spec's valid location may become a new field in a newer
spec. To guarantee compatibility, the driver should zero the valid
byte before interpreting the entire message so that any new fields not
implemented by the older spec will be read as zero.
For messages that are forwarded to VFs, we need to set the length
and re-instate the valid bit so the VF will see the valid response.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When checking for the maximum pre-set TX channels for ethtool -l, we
need to check the current max_tx_scheduler_inputs parameter from firmware.
This parameter specifies the max input for the internal QoS nodes currently
available to this function. The function's TX rings will be capped by this
parameter. By adding this logic, we provide a more accurate pre-set max
TX channels to the user.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gather periodic extended port statistics, if the device is PF and
link is up.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trusted VFs are allowed to modify MAC address, even when PF
has assigned one.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the RDMA driver is registered, use a new VNIC mode that allows
RDMA traffic to be seen on the netdev in promiscuous mode.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the default ring logic to select default number of rings to be up to
8 per port if the default rings x NIC ports <= total CPUs.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor changes, such as new extended port statistics.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.
This ends up CPU observing two barriers back to back before executing the
register write.
Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a wmb().
Since code already has an explicit barrier call, changing writel() to
writel_relaxed().
Also add mmiowb() so that write code doesn't move outside of scope.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During initialization, if we encounter errors, there is a code path that
calls bnxt_hwrm_vnic_set_tpa() with invalid VNIC ID. This may cause a
warning in firmware logs.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_restore_pf_fw_resources routine frees PF resources by calling
close_nic and allocates the resources back, by doing open_nic. However,
this is not needed, if the PF is already in closed state.
This bug causes the driver to call open the device and call request_irq()
when it is not needed. Ultimately, pci_disable_msix() will crash
when bnxt_en is unloaded.
This patch fixes the problem by skipping __bnxt_close_nic and
__bnxt_open_nic inside bnxt_restore_pf_fw_resources routine, if the
interface is not running.
Fixes: 80fcaf46c0 ("bnxt_en: Restore MSIX after disabling SRIOV.")
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent changes added the bnxt_init_int_mode() call in the driver's open
path whenever ring reservations are changed. This call was previously
only called in the probe path. In the open path, if MQPRIO TC has been
setup, the bnxt_init_int_mode() call would reset and mess up the MQPRIO
per TC rings.
Fix it by not re-initilizing bp->tx_nr_rings_per_tc in
bnxt_init_int_mode(). Instead, initialize it in the probe path only
after the bnxt_init_int_mode() call.
Fixes: 674f50a5b0 ("bnxt_en: Implement new method to reserve rings.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When receiving a packet with VLAN tag, pass the entire 16-bit TCI to the
stack when calling __vlan_hwaccel_put_tag(). The current code is only
passing the 12-bit tag and it is missing the priority bits.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of vnics to check must be determined ahead of time because
only standard RX rings require vnics to support RFS. The logic is
similar to the ring reservation logic and we can now use the
refactored common functions to do most of the work in setting up
the firmware message.
Fixes: 8f23d638b3 ("bnxt_en: Expand bnxt_check_rings() to check all resources.")
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bnxt_hwrm_reserve_{pf|vf}_rings() functions are very similar to
the bnxt_hwrm_check_{pf|vf}_rings() functions. Refactor the former
so that the latter can make use of common code in the next patch.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure to cancel any pending work that might update driver coalesce
settings when taking down an interface.
Fixes: 6a8788f256 ("bnxt_en: add support for software dynamic interrupt moderation")
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of tc_cls_can_offload_and_chain0() to set extack msg in case
ethtool tc offload flag is not set or chain unsupported.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the driver exports different switchdev PARENT_IDs for
representors belonging to different SR-IOV PF-pools of an adapter.
This is not correct as the adapter can switch across all vports
of an adapter. This patch fixes this by exporting a common switchdev
PARENT_ID for all reps of an adapter. The PCIE DSN is used as the id.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The chip supports 64-byte and 128-byte cache line size for more optimal
DMA performance when matched to the CPU cache line size. The default is 64.
If the system is using 128-byte cache line size, set it to 128.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Forward hwrm_func_vf_cfg command from VF to PF driver, to store
VF MAC address in PF's context. This will allow "ip link show"
to display all VF MAC addresses.
Maintain 2 locations of MAC address in VF info structure, one for
a PF assigned MAC and one for VF assigned MAC.
Display VF assigned MAC in "ip link show", only if PF assigned MAC is
not valid.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_check_rings() is called by ethtool, XDP setup, and ndo_setup_tc()
to see if there are enough resources to support the new configuration.
Expand the call to test all resources if the firmware supports the new
API. With the more flexible resource allocation scheme, this call must
be made to check that all resources are available before committing to
allocate the resources.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of the old method of evenly dividing the resources to the VFs,
use the new firmware API to specify min and max resources for each VF.
This way, there is more flexibility for each VF to allocate more or less
resources.
The min is the absolute minimum for each VF to function. The max is the
global resources minus the resources used by the PF. Each VF is
guaranteed the min. Up to max resources may be available for some VFs.
The PF driver can use one of 2 strategies specified in NVRAM to assign
the resources. The old legacy strategy of evenly dividing the resources
or the new flexible strategy.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In bnxt_rfs_capable(), add call to reserve vnic resources to support
NTUPLE. Return true if we can successfully reserve enough vnics.
Otherwise, reserve the minimum 1 VNIC for normal operations not
supporting NTUPLE and return false.
Also, suppress warning message about not enough resources for NTUPLE when
only 1 RX ring is in use. NTUPLE filters by definition require multiple
RX rings.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new method will call firmware to reserve the desired tx, rx, cmpl
rings, ring groups, stats context, and vnic resources. A second query
call will check the actual resources that firmware is able to reserve.
The driver will then trim and adjust based on the actual resources
provided by firmware. The driver will then reserve the final resources
in use.
This method is a more flexible way of using hardware resources. The
resources are not fixed and can by adjusted by firmware. The driver
adapts to the available resources that the firmware can reserve for
the driver.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In combined mode, the driver is currently not setting RX and TX ring
numbers the same when firmware can allocate more RX than TX or vice versa.
This will confuse the user as the ethtool convention assumes they are the
same in combined mode. Fix it by adding bnxt_trim_dflt_sh_rings() to trim
RX and TX ring numbers to be the same as the completion ring number in
combined mode.
Note that if TCs are enabled and/or XDP is enabled, the number of TX rings
will not be the same as RX rings in combined mode.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new API HWRM_FUNC_RESOURCE_QCAPS provides min and max hardware
resources. Use the new API when it is supported by firmware.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for new firmware APIs to allocate hardware resources,
add a new struct bnxt_hw_resc to hold various min, max and reserved
resources. This new structure is common for PFs and VFs.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After SRIOV has been enabled and disabled, the MSIX vectors assigned to
the VFs have to be re-initialized. Otherwise they cannot be re-used by
the PF. For example, increasing the number of PF rings after disabling
SRIOV may fail if the PF uses MSIX vectors previously assigned to the VFs.
To fix this, we add logic in bnxt_restore_pf_fw_resources() to close the
NIC, clear and re-init MSIX, and re-open the NIC.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new __bnxt_close_nic() function to do all the work previously done
in bnxt_close_nic() except waiting for SRIOV configuration. The new
function will be used in the next patch as part of SRIOV cleanup.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The version has new firmware APIs to allocate PF/VF resources more
flexibly.
New toolchains were used to generate this file, resulting in a one-time
large diffstat.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently in the cases where cmp_type == CMP_TYPE_RX_L2_TPA_START_CMP or
CMP_TYPE_RX_L2_TPA_END_CMP the exit path updates cpr->rx_bytes with an
uninitialized length len. Fix this by adding a new exit path that does
not update the cpr stats with the bogus length len and remove the unused
label next_rx_no_prod.
Detected by CoverityScan, CID#1463807 ("Uninitialized scalar variable")
Fixes: 6a8788f256 ("bnxt_en: add support for software dynamic interrupt moderation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements the changes needed for the bnxt_en driver to add support
for dynamic interrupt moderation per ring.
This does add additional counters in the receive path, but testing shows
that any additional instructions are offset by throughput gain when the
default configuration is for low latency.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver hook points for xdp_rxq_info:
* reg : bnxt_alloc_rx_rings
* unreg: bnxt_free_rx_rings
This driver should be updated to re-register when changing
allocation mode of RX rings.
Tested on actual hardware.
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Advertise NETIF_F_GRO_HW in hw_features if hardware GRO is supported.
In bnxt_fix_features(), disable GRO_HW and LRO if current hardware
configuration does not allow it. GRO_HW depends on GRO. GRO_HW is
also mutually exclusive with LRO. XDP setup will now rely on
bnxt_fix_features() to turn off aggregation. During chip init, turn on
or off hardware GRO based on NETIF_F_GRO_HW in features flag.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After applying 2270bc5da3 ("bnxt_en: Fix netpoll handling") and
903649e718 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."),
we still see the following WARN fire:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160
bnxt_poll+0x0/0xd0 exceeded budget in poll
<snip>
Call Trace:
[<ffffffff814be5cd>] dump_stack+0x4d/0x70
[<ffffffff8107e013>] __warn+0xd3/0xf0
[<ffffffff8107e07f>] warn_slowpath_fmt+0x4f/0x60
[<ffffffff8179519a>] netpoll_poll_dev+0x15a/0x160
[<ffffffff81795f38>] netpoll_send_skb_on_dev+0x168/0x250
[<ffffffff817962fc>] netpoll_send_udp+0x2dc/0x440
[<ffffffff815fa9be>] write_ext_msg+0x20e/0x250
[<ffffffff810c8125>] call_console_drivers.constprop.23+0xa5/0x110
[<ffffffff810c9549>] console_unlock+0x339/0x5b0
[<ffffffff810c9a88>] vprintk_emit+0x2c8/0x450
[<ffffffff810c9d5f>] vprintk_default+0x1f/0x30
[<ffffffff81173df5>] printk+0x48/0x50
[<ffffffffa0197713>] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core]
[<ffffffffa0197b9b>] edac_mc_handle_error+0x42b/0x6e0 [edac_core]
[<ffffffffa01c3a60>] sbridge_mce_output_error+0x410/0x10d0 [sb_edac]
[<ffffffffa01c47cc>] sbridge_check_error+0xac/0x130 [sb_edac]
[<ffffffffa0197f3c>] edac_mc_workq_function+0x3c/0x90 [edac_core]
[<ffffffff81095f8b>] process_one_work+0x19b/0x480
[<ffffffff810967ca>] worker_thread+0x6a/0x520
[<ffffffff8109c7c4>] kthread+0xe4/0x100
[<ffffffff81884c52>] ret_from_fork+0x22/0x40
This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting
in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually
given a non-zero budget.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some dual port NICs, the 2 ports have to be configured with compatible
link speeds. Under some conditions, a port's configured speed may no
longer be supported. The firmware will send a message to the driver
when this happens.
Improve this logic that prints out the warning by only printing it if
we can determine the link speed that is no longer supported. If the
speed is unknown or it is in autoneg mode, skip the warning message.
Reported-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Small overlapping change conflict ('net' changed a line,
'net-next' added a line right afterwards) in flexcan.c
Signed-off-by: David S. Miller <davem@davemloft.net>
short_input variable is assigned to another data pointer which is
referred out of its scope. Fix it by moving short_input definition
to the beginning of bnxt_hwrm_do_send_msg() function.
No failure has been reported so far due to this issue.
Fixes: e605db801b ("bnxt_en: Support for Short Firmware Message")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current 'bnxt_shutdown' implementation only invokes
'bnxt_ulp_shutdown' to shut down RoCE in the case when the system is in
the path of power off (SYSTEM_POWER_OFF). While this may work in most
cases, it does not work in the smart NIC case, when Linux 'reboot'
command is initiated from the Linux that runs on the ARM cores of the
NIC card. In this particular case, Linux 'reboot' results in a system
'L3' level reset where the entire ARM and associated subsystems are
being reset, but at the same time, Nitro core is being kept in sane state
(to allow external PCIe connected servers to continue to work). Without
properly shutting down RoCE and freeing all associated resources, it
results in the ARM core to hang immediately after the 'reboot'
By always invoking 'bnxt_ulp_shutdown' in 'bnxt_shutdown', it fixes the
above issue
Fixes: 0efd2fc65c ("bnxt_en: Add a callback to inform RDMA driver during PCI shutdown.")
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since day one of XDP drivers had to remember to free the program
on the remove path. This leads to code duplication and is error
prone. Make the stack query the installed programs on unregister
and if something is installed, remove the program. Freeing of
program attached to XDP generic is moved from free_netdev() as well.
Because the remove will now be called before notifiers are
invoked, BPF offload state of the program will not get destroyed
before uninstall.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
On 32-bit architectures, rtc_time_to_tm() returns incorrect results
in 2038 or later, and do_gettimeofday() is broken for the same reason.
This changes the code to use ktime_get_real_seconds() and time64_to_tm()
instead, both of them are 2038-safe, and we can also get rid of the
CONFIG_RTC_LIB dependency that way.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO to match the new
convention.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ndo_xdp is a control path callback for setting up XDP in the
driver. We can reuse it for other forms of communication
between the eBPF stack and the drivers. Rename the callback
and associated structures and definitions.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent IRQ coalescing clean up has removed a guard-rail for the max DMA
buffer coalescing value. This is a 6-bit value and must not be 0. We
already have a check for 0 but 64 is equivalent to 0 and will cause
non-stop interrupts. Fix it by adding the proper check.
Fixes: f8503969d2 ("bnxt_en: Refactor and simplify coalescing code.")
Reported-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This restores the original behaviour before the block callbacks were
introduced. Allow the drivers to do binding of block always, no matter
if the NETIF_F_HW_TC feature is on or off. Move the check to the block
callback which is called for rule insertion.
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TC flower is not enabled on VFs and when there's no FW support.
Alloc the tc_info{} struct at init time only when TC flower is being
enabled.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements periodic querying of cfa flow stats
in batches to compute the 'lastused' attribute of TC flow stats.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mapping of the ethtool coalescing parameters to hardware parameters
is now done in bnxt_hwrm_set_coal_params(). The same function can
handle both RX and TX settings. The code is now more clear. Some
adjustments have been made to get better hardware settings. The
coal_frames setting is now accurately set in hardware. The max_timer
is set to coal_ticks value.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current IRQ coalescing logic is a little messy. The ethtool
parameters are mapped to hardware parameters in a way that is difficult
to understand. The first step is to better organize the parameters
by adding the new structure bnxt_coal. The structure is used by both
the RX and TX sets of coalescing parameters.
Adjust the default coal_ticks to 14 us and 28 us for RX and TX.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some NICs have a firmware enforced maximum MTU setting by management
firmware. Set up netdev->max_mtu accordingly.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to call bnxt_approve_mac() which will send a message to the
PF if the MAC address hasn't changed.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code retrieves the firmware package version from firmware
everytime ethtool -i is run. There is no reason to do that as the
firmware will not change while the driver is loaded. Get the version
once at init time.
Also, display the full 4-part firmware version string and remove the
less useful interface spec version.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Rob Miller <rmiller@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new PCIe device ID and chip number for bcm58804
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There were quite a few overlapping sets of changes here.
Daniel's bug fix for off-by-ones in the new BPF branch instructions,
along with the added allowances for "data_end > ptr + x" forms
collided with the metadata additions.
Along with those three changes came veritifer test cases, which in
their final form I tried to group together properly. If I had just
trimmed GIT's conflict tags as-is, this would have split up the
meta tests unnecessarily.
In the socketmap code, a set of preemption disabling changes
overlapped with the rename of bpf_compute_data_end() to
bpf_compute_data_pointers().
Changes were made to the mv88e6060.c driver set addr method
which got removed in net-next.
The hyperv transport socket layer had a locking change in 'net'
which overlapped with a change of socket state macro usage
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Moving generic devlink code (registration) out of VF-R code
into new bnxt_devlink file, in preparation for future work
to add additional devlink functionality to bnxt.
Signed-off-by: Steve Lin <steven.lin1@broadcom.com>
Acked-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All drivers are converted to use block callbacks for TC_SETUP_CLS*.
So it is now safe to remove the calls to ndo_setup_tc from cls_*
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for flower offloads to block callbacks.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In bnxt_find_nvram_item(), it is copying firmware response data after
releasing the mutex. This can cause the firmware response data
to be corrupted if the next firmware response overwrites the response
buffer. The rare problem shows up when running ethtool -i repeatedly.
Fix it by calling the new variant _hwrm_send_message_silent() that requires
the caller to take the mutex and to release it after the response data has
been copied.
Fixes: 3ebf6f0a09 ("bnxt_en: Add installed-package version reporting via Ethtool GDRVINFO")
Reported-by: Sarveswara Rao Mygapula <sarveswararao.mygapula@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PCIE PCIE_EP_REG_LINK_STATUS_CONTROL register is only defined in PF
config space, so we must read it from the PF.
Fixes: 90c4f788f6 ("bnxt_en: Report PCIe link speed and width during driver load")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a further improvement to the PF/VF link change logic, use a private
mutex instead of the rtnl lock to protect link change logic. With the
new mutex, we don't have to take the rtnl lock in the workqueue when
we have to handle link related functions. If the VF and PF drivers
are running on the same host and both take the rtnl lock and one is
waiting for the other, it will cause timeout. This patch fixes these
timeouts.
Fixes: 90c694bb71 ("bnxt_en: Fix RTNL lock usage on bnxt_update_link().")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Link status query firmware messages originating from the VFs are forwarded
to the PF. The driver handles these interactions in a workqueue for the
VF and PF. The VF driver waits for the response from the PF in the
workqueue. If the PF and VF driver are running on the same host and the
work for both PF and VF are queued on the same workqueue, the VF driver
may not get the response if the PF work item is queued behind it on the
same workqueue. This will lead to the VF link query message timing out.
To prevent this, we create a private workqueue for PFs instead of using
the common workqueue. The VF query and PF response will never be on
the same workqueue.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IS_ERR() already implies unlikely(), so it can be omitted.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use setup_timer function instead of initializing timer with the
function and data fields.
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for offloading TC based flow
rules and actions for the 'flower' classifier in the bnxt_en driver.
It includes logic to parse flow rules and actions received from the
TC subsystem, store them and issue the corresponding
hwrm_cfa_flow_alloc/free FW cmds. L2/IPv4/IPv6 flows and drop,
redir, vlan push/pop actions are supported in this patch.
In this patch the hwrm_cfa_flow_xxx routines are just stubs.
The code for these routines is introduced in the next patch for easier
review. Also, the code to query the TC/flower action stats will
be introduced in a subsequent patch.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce default rings from 8 to 4 on multi-port cards to reduce memory
usage.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we cannot allocate RX buffers in the NAPI poll loop when processing
an RX event, the current code does not count that event towards the NAPI
budget. This can cause us to potentially loop forever in NAPI if we
consistently cannot allocate new buffers. Improve it by counting
-ENOMEM event as 1 towards the NAPI budget.
Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
initialize board_info values with proper enums for defensive programming
purposes. This will avoid any errors of the enums being declared not
lining up with the board_info array.
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add PCIe device ID for bcm58802 and bcm58808. Also add chip number
update to declare bcm588xx as chip class phase 4 and later
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides hints to irqbalance to map bnxt_en device IRQs
to specific CPU cores. cpumask_local_spread() is used, which first
maps IRQs to near NUMA cores; when those cores are exhausted, IRQs
are mapped to far NUMA cores.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the number of TX rings is changed (e.g. ethtool -L, enabling XDP TX
rings, etc), the current code tries to reserve the new number of TX rings
before closing and re-opening the NIC. If we are unable to reserve the
new TX rings, we abort the operation and keep the current TX rings.
The problem is that the firmware will disable the current TX rings even
when it cannot reserve the new set of TX rings. We fix it as follows:
1. Instead of reserving the new set of TX rings, just ask the firmware
to check if the new set of TX rings is available. There is a flag in
the firmware message to do that. If not available, abort and the
current TX rings will not be disabled.
2. Do the actual TX ring reservation in the path that opens the NIC.
We keep the number of TX rings currently successfully reserved. If the
number of TX rings is different than the reserved TX rings, we call
firmware and reserve again.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_hwrm_func_qcaps() is called during probe to get all device
resources and it also sets up the factory MAC address. The same function
is called when SRIOV is disabled to reclaim all resources. If
the MAC address has been overridden by a user administered MAC
address, calling this function will overwrite it.
Separate the logic that sets up the default MAC address into a new
function bnxt_init_mac_addr() that is only called during probe time.
Fixes: 4a21b49b34 ("bnxt_en: Improve VF resource accounting.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the number of TX rings is changed in bnxt_setup_tc(), we need to
include the XDP rings in the total TX ring count.
Fixes: 3841340627 ("bnxt_en: Add support for XDP_TX action.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of struct tc_to_netdev which is now just unnecessary container
and rather pass per-type structures down to drivers directly.
Along with that, consolidate the naming of per-type structure variables
in cls_*.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the return value from -EINVAL to -EOPNOTSUPP. The rest of the
drivers have it like that, so be aligned.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As ndo_setup_tc is generic offload op for whole tc subsystem, does not
really make sense to have cls-specific args. So move them under
cls_common structurure which is embedded in all cls structs.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the type is always present, push it to be a separate argument to
ndo_setup_tc. On the way, name the type enum and use it for arg type.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the phys_port_name for the external physical port to be in
"pA" format and that of VF-rep to be in "pCvfD" format as
suggested by Jakub Kicinski.
Fixes: c124a62ff2 ("bnxt_en: add support for port_attr_get and get_phys_port_name")
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a couple of warnings where variable ‘txq’ set but not used
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>v, i);
Signed-off-by: David S. Miller <davem@davemloft.net>
Suggested by Jakub Kicinski.
Fixes: c124a62ff2 ("bnxt_en: add support for port_attr_get and and get_phys_port_name")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for the switchdev_port_attr_get() and
ndo_get_phys_port_name() methods for the PF and the VF-reps.
Using this support a user application can deduce that the PF
(when in the ESWITCH_SWDEV mode) and it's VF-reps form a switch.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the RX/TX and a simple netdev implementation
for VF-reps. The VF-reps use the RX/TX rings of the PF. For each VF-rep
the PF driver issues a VFR_ALLOC FW cmd that returns "cfa_code"
and "cfa_action" values. The FW sets up the filter tables in such
a way that VF traffic by default (in absence of other rules)
gets punted to the parent PF. The cfa_code value in the RX-compl
informs the driver of the source VF. For traffic being transmitted
from the VF-rep, the TX BD is tagged with a cfa_action value that
informs the HW to punt it to the corresponding VF.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a part of a patch-set that introduces support for
VF-reps in the bnxt_en driver. The driver registers eswitch mode
get/set methods with the devlink interface that allow a user to
enable SRIOV switchdev mode. When enabled, the driver registers
a VF-rep netdev object for each VF with the stack. This can
essentially bring the VFs unders the management perview of the
hypervisor and applications such as OVS.
The next patch in the series, adds the RX/TX routines and a slim
netdev implementation for the VF-reps.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report DCB_CAP_DCBX_LLD_MANAGED only if the firmware DCBX agent is enabled
and running for PF or VF. Otherwise, if both LLDP and DCBX agents are
disabled in firmware, we report DCB_CAP_DCBX_LLD_HOST and allow host
IEEE DCB settings. This patch refines the current logic in the driver.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For debugging purpose, it is sometimes useful to disable periodic
port statistics updates, so that the firmware logs will not be
filled with statistics update messages.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To allow users to set the hardware bridging mode to VEB or VEPA. Only
single function PF can change the bridging mode.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Retrieve and store the hardware bridge mode, so that we can implement
ndo_bridge_{get|set)link methods in the next patch.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VF representors and PTP are added features in the new firmware spec.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PF driver sets up a list of firmware commands from the VF driver that
needs to be forwarded to the PF for approval. This list is a 256-bit
bitmap. The code that sets up the bitmap falls apart on big-endian
architecture. __set_bit() does not work because it operates on long types
whereas the firmware interface is defined in u32 types, causing bits in
the wrong 32-bit word to be set.
Fix it by setting the proper bits on an array of u32.
Fixes: de68f5de56 ("bnxt_en: Fix bitmap declaration to work on 32-bit arches.")
Reported-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When changing channels from combined to rx/tx or vice versa, the code
uses the wrong "sh" parameter to determine if we are reserving rings
for shared or non-shared mode. It should be using the ethtool requested
"sh" parameter instead of the current "sh" parameter.
Fix it by passing the "sh" parameter to bnxt_reserve_rings(). For
ethtool, we will pass in the requested "sh" parameter.
Fixes: 391be5c273 ("bnxt_en: Implement new scheme to reserve tx rings.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
.ndo_get_stats64() may not be protected by RTNL and can race with
.ndo_stop() or other ethtool operations that can free the statistics
memory. Fix it by setting a new flag BNXT_STATE_READ_STATS and then
proceeding to read statistics memory only if the state is OPEN. The
close path that frees the memory clears the OPEN state and then waits
for the BNXT_STATE_READ_STATS to clear before proceeding to free the
statistics memory.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To handle netpoll properly, the driver must only handle TX packets
during NAPI. Handling RX events cause warnings and errors in
netpoll mode. The ndo_poll_controller() method should call
napi_schedule() directly so that a NAPI weight of zero will be used
during netpoll mode.
The bnxt_en driver supports 2 ring modes: combined, and separate rx/tx.
In separate rx/tx mode, the ndo_poll_controller() method will only
process the tx rings. In combined mode, the rx and tx completion
entries are mixed in the completion ring and we need to drop the rx
entries and recycle the rx buffers.
Add a function bnxt_force_rx_discard() to handle this in netpoll mode
when we see rx entries in combined ring mode.
Reported-by: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we get a TPA_END completion to handle a completed LRO packet, it
is possible that hardware would indicate errors. The current code is
not checking for the error condition. Define the proper error bits and
the macro to check for this error and abort properly.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to push the chain index down to the drivers, so they have the
information to which chain the rule belongs. For now, no driver supports
multichain offload, so only chain 0 is supported. This is needed to
prevent chain squashes during offload for now. Later this will be used
to implement multichain offload.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to write the doorbell if BQL has stopped the queue and
skb->xmit_more is set. Otherwise it is possible for the tx queue to
rot and cause tx timeout.
Fixes: 4d172f21ce ("bnxt_en: Implement xmit_more.")
Suggested-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the existing code, the local variable sh is hardcoded to true to
calculate default rings for shared ring configuration. It is better
to have the caller determine the value of sh.
Reported-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not write the TX doorbell if skb->xmit_more is set unless the TX
queue is full.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Older chips require the doorbells to be written twice, but newer chips
do not. Add a new common function bnxt_db_write() to write all
doorbells appropriately depending on the chip. Eliminating the extra
doorbell on newer chips has a significant performance improvement
on pktgen.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add additional chip definitions and macros for all supported chips.
Add a new macro BNXT_CHIP_P4_PLUS for the newer generation of chips and
use the macro to properly determine the features supported by these
newer chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When bnxt_en gets a PCI shutdown call, we need to have a new callback
to inform the RDMA driver to do proper shutdown and removal.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new short message format is used on the new BCM57454 VFs. Each
firmware message is a fixed 16-byte message sent using the standard
firmware communication channel. The short message has a DMA address
pointing to the legacy long firmware message.
Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the current code, bnxt_dcb_init() is called too early before we
determine if the firmware DCBX agent is running or not. As a result,
we are not setting the DCB_CAP_DCBX_HOST and DCB_CAP_DCBX_LLD_MANAGED
flags properly to report to DCBNL.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the SPARC platform we need to use the DMA_ATTR_WEAK_ORDERING attribute
in our Rx path dma mapping in order to get the expected performance out
of the receive path. Adding it to the Tx path has little effect, so
that's not a part of this patch.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Tom Saeger <tom.saeger@oracle.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have the number of longs, but we need to calculate the number of
bytes required.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change restricts the PF in multi-host mode from setting any port
level PHY configuration. The settings are controlled by firmware in
Multi-Host mode.
Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the additional flag in bnxt_hwrm_func_qcfg() before allowing
DCBX to be done in host mode.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for 100G link speed reporting for Broadcom BCM57454
ASIC in ethtool command.
Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com>
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mostly simple cases of overlapping changes (adding code nearby,
a function whose name changes, for example).
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code enables up to the maximum MSIX vectors in the PCIE
config space without considering the max completion rings available.
An MSIX vector is only useful when it has an associated completion
ring, so it is better to cap it.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mac loopback self test operates in polling mode. To support that,
we need to add functions to open and close the NIC half way. The half
open mode allows the rings to operate without IRQ and NAPI. We
use the XDP transmit function to send the loopback packet.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the basic infrastructure and only firmware tests initially.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add suspend/resume callbacks using the newer dev_pm_ops method.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And add functions to set and free magic packet filter.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add pci shutdown method to put device in the proper WoL and power state.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add code to driver probe function to check if the device is WoL capable
and if Magic packet WoL filter is currently set.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In bnxt_free_rx_skbs(), which is called to free up all RX buffers during
shutdown, we need to unmap the page if we are running in XDP mode.
Fixes: c61fb99cae ("bnxt_en: Add RX page mode support.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Net device reset can fail when the h/w or f/w is in a bad state.
Subsequent netdevice open fails in bnxt_hwrm_stat_ctx_alloc().
The cleanup invokes bnxt_hwrm_resource_free() which inturn
calls bnxt_disable_int(). In this routine, the code segment
if (ring->fw_ring_id != INVALID_HW_RING_ID)
BNXT_CP_DB(cpr->cp_doorbell, cpr->cp_raw_cons);
results in NULL pointer dereference as cpr->cp_doorbell is not yet
initialized, and fw_ring_id is zero.
The fix is to initialize cpr fw_ring_id to INVALID_HW_RING_ID before
bnxt_init_chip() is invoked.
Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The configurable priority to traffic class mapping and the user specified
queue ranges are used to configure the traffic class, overriding the
hardware defaults when the 'hw' option is set to 0. However, when the 'hw'
option is non-zero, the hardware QOS defaults are used.
This patch makes it so that we can pass the data the user provided to
ndo_setup_tc. This allows us to pull in the queue configuration if the
user requested it as well as any additional hardware offload type
requested by using a value other than 1 for the hw value.
Finally it also provides a means for the device driver to return the level
supported for the offload type via the qopt->hw value. Previously we were
just always assuming the value to be 1, in the future values beyond just 1
may be supported.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some situations, the firmware will return 0 for autoneg supported
speed. This may happen if the firmware detects no SFP module, for
example. The driver should ignore this so that we don't end up with
an invalid autoneg setting with nothing advertised. When SFP module
is inserted, we'll get the updated settings from firmware at that time.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set DCB_CAP_DCBX_HOST capability flag only if the firmware LLDP agent
is not running.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we call bnxt_reset_task() due to tx timeout, we should call
bnxt_ulp_stop() to inform the RDMA driver about the error and the
impending reset.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The firmware call to do function reset is done too late. It is causing
the rings that have been reserved to be freed. In NPAR mode, this bug
is causing us to run out of rings.
Fixes: 391be5c273 ("bnxt_en: Implement new scheme to reserve tx rings.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use eth_hw_addr_random() to set a random MAC address in order to make
sure bp->dev->addr_assign_type will be properly set to NET_ADDR_RANDOM.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the bnxt_init_one() failure path, bar1 and bar2 are not
being unmapped. This commit fixes this issue. Reorganize the
code so that bnxt_init_one()'s failure path and bnxt_remove_one()
can call the same function to do the PCI cleanup.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>