We've got the number of longs, yes, but we should multiply by
sizeof(long) to get the number of bytes needed.
Fixes: e4917d46a6 ("qede: Add aRFS support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for aRFS for TCP and UDP
protocols with IPv4/IPv6.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds necessary APIs to interface with
qede aRFS support in successive patch.
It also reserves separate PTT entry for aRFS,
[as being in fastpath flow] for hardware access instead of
trying to acquire it at run time from the ptt pool.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case an XDP program is attached, reserve XDP_PACKET_HEADROOM
bytes at the beginning of the packet for the program to play
with.
Modify the XDP logic in the driver to fill-in the missing bits
and re-calculate offsets and length after the program has finished
running to properly reflect the current status of the packet.
We can then go and remove the limitation of not supporting XDP programs
where xdp_adjust_head is set.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver currently doesn't support any headroom; The only 'available'
space it has in the head of the buffer is due to the placement
offset.
In order to allow [later] support of XDP adjustment of headroom,
modify the the ingress flow to properly handle a scenario where
the packets would have such.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current implementation of VFs is very tight in regard to queue
resources. VFs support for XDP would require quite a bit of additional
infrastructure in qede and qed [sharing of queue-zones between queues,
more VF cids, mapping of the doorbell bar, etc.].
For now, prevent XDP programs from being attached to VFs.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver is currently using dma_unmap_single() with the address it
passed to device for the purpose of forwarding, but the XDP
transmission buffer was originally a page allocated for the rx-queue.
The mapped address is likely to differ from the original mapped
address due to the placement offset.
This difference is going to get even bigger once we support headroom.
Cache the original mapped address of the page, and use it for unmapping
of the buffer when completion arrives for the XDP forwarded packet.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, each time an ingress packet is passed to networking stack
the driver increments a per-queue SW statistic.
As we want to have additional fields in the first cache-line of the
Rx-queue struct, change flow so this statistic would be updated once per
NAPI run. We will later push the statistic to a different cache line.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to maintain the various open archipelagos as a list -
The maximal number of them is known, and we can use the CID
as key for random-access into the array.
Signed-off-by: Michal Kalderon <Michal.Kalderon@caviumc.om>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Management firmware can query for some basic iSCSI-related statistics.
Provide those just as we do for other protocols.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that management firmware is capable of telling us the number of CQs
available for a given PF, qed needs to communicate the number to qedi
so it would know have many to use.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware provides a statistic for the number of out-of-order isles
it used - fill it in the iscsi-related statistics.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before initializing the chip's engine, driver currently closes a set
of registers on the HW's ingress flow to prevent packets from slipping
in while they're not supposed to.
This configuration is insufficient, as there are some scenarios where
packets would still arrive even when said registers are set,
but the management firmware already closes other per-port registers
that do suffice, making this setting unnecessray.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Default HW configuration is optimal for an architecture where cache
line size is 64B.
During chip initialization, properly initialize the cache line size
in HW to avoid possible redundant PCI transactions.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to access HW registers driver needs to acquire a PTT entry
[mapping between bar memory and internal chip address].
Since acquiring PTT entries could fail [at least in theory] as their
number is finite and other flows can hold them, we reserve special PTT
entries for 'important' enough flows - ones we want to guarantee that
would not be susceptible to such issues.
One such special entry is the 'main' PTT which is meant to be used in
flows such as chip initialization and de-initialization.
However, there are other flows that are also using that same entry
for their own purpose, and might run concurrently with the original
flows [notice that for most cases using the main-ptt by mistake, such
a race is still impossible, at least today].
This patch re-organizes the various functions that currently use the
main_ptt in one of two ways:
- If a function shouldn't use the main_ptt it starts acquiring and
releasing it's own PTT entry and use it instead. Notice if those
functions previously couldn't fail, they now can [as acquisition
might fail].
- Change the prototypes so that the main_ptt would be received as
a parameter [instead of explicitly accessing it].
This prevents the future risk of adding codes that introduces new
use-cases for flows using the main_ptt, ones that might be in race
with the actual 'main' flows.
Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PTT entries are per-hwfn; If some errneous flow is trying
to use a PTT belonging to a differnet hwfn warn user, as this
can break every register accessing flow later and is very hard
to root-cause.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While unlikely, this makes sure the workqueue name won't be processed
as a format string.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When qedr is enabled, qed would try dividing the msi-x vectors between
L2 and RoCE, starting with L2 and providing it with sufficient vectors
for its queues.
Problem is qed would also do that for storage partitions, and as those
don't need queues it would lead qed to award those partitions with 0
msi-x vectors, causing them to believe theye're using INTa and
preventing them from operating.
Fixes: 51ff17251c ("qed: Add support for RoCE hw init")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There seems to be a missing break on the OOO_LB_TC case, pq_id
is being assigned and then re-assigned on the fall through default
case and that seems suspect.
Detected by CoverityScan, CID#1424402 ("Missing break in switch")
Fixes: b5a9ee7cf3 ("qed: Revise QM cofiguration")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should be returning -ENOMEM if qed_mcp_cmd_add_elem() fails. The
current code returns success.
Fixes: 4ed1eea82a ("qed: Revise MFW command locking")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's possible some configurations would prevent driver from utilizing
all the Memory Regions due to a lack of ILT lines.
In such a case, calculate how many memory regions would have to be
dropped due to limit, and manage without those.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As RoCE doesn't need to use the SRC, allocating ILT memory
on behalf of RoCE is wasting available ILT lines.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As of today there's no protocol supported that requires
support from the TM hardware block and enables SRIOV,
but we should still correct the calculation to reflect
the lines required for such future VFs instead of changing
the PF's own lines.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When configuring the HW timers block we should set the number of CIDs
up until the last CID that require timers, instead of only those CIDs
whose protocol needs timers support.
Today, the protocols that require HW timers' support have their CIDs
before any other protocol, but that would change in future [when we
add iWARP support].
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor and clean up the queue manager initialization logic.
Also, this adds support for RoC low latency queues, which later
would be used for improving RoCE latency in high throughput scenarios.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now, qed used some port-defined value as BDQ index for both iSCSI
and FCoE.
As management firmware now treats BDQ as a resource and tells each PF
its BDQ-range, start using a valure from that range instead.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Management firmware is used as an arbiter between the various PFs
in matters of resources, but some of the resources that need to
be divided are dependent on the non-management firmware used,
so management firmware first needs to be told how many resources
there are before trying to divide them.
As part of the initialization sequence, driver would first inform
the management firmware of the available resources under
a dedicated resource lock, and afterwards request for various
resources which might be based on the previous set values.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Global locking can't properly be used to synchronize between different
PFs in all scenarios, as those instances might reside in different
logical partitions [e.g., when a PF is assigned via PDA to some VM].
The management firmware provides a generic infrastructure for
device locks. For each 'resource', it's guaranteed it could be acquired
by at most a single PF at any given time [or by management firmware].
This patch adds the necessary logic in qed for utilizing said
infrastructure, implementing lock/unlock internal APIs.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During HW initialization, driver would set various registers to their
needed values - but it assumes all registers start at their reset-value,
so there's no need to re-configure a register's default value.
This assumption might be incorrect, e.g., in case of preboot driver
running and initializing the driver prior to our driver.
To overcome this, we now ask management firmware to initiate a PF-flr
early during the initialization sequence. That would return everything
in the PF's scope back to default and prevent previous configurations
from still being applied.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Management firmware is used as an arbiter between the various PFs
in regard to loading - it causes the various PFs to load/unload
sequentially and informs each of its appropriate rule in the init.
But the existing flow is too weak to handle some scenarios where
PFs aren't properly cleaned prior to loading.
The significant scenarios falling under this criteria:
a. Preboot drivers in some environment can't properly unload.
b. Unexpected driver replacement [kdump, PDA].
Modern management firmware supports a more intricate loading flow,
where the driver has the ability to overcome previous limitations.
This moves qed into using this newer scheme.
Notice new scheme is backward compatible, so new drivers would
still be able to load properly on top of older management firmwares
and vice versa.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We'll soon need additional information, so start by changing
the infrastructure to receive the initializing variables
via a parameter struct.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Management firmware is used as arbiter between different PFs
which are loading/unloading, but in order to use the synchronization
it offers the contending configurations need to be applied either
between their LOAD_REQ <-> LOAD_DONE or UNLOAD_REQ <-> UNLOAD_DONE
management firmware commands.
Existing HW stop flow utilizes 2 different functions: qed_hw_stop() and
qed_hw_reset() which don't abide this requirement; Most of the closure
is doing outside the scope of the unload request.
This patch removes qed_hw_reset() and places the relevant stop
functionality underneath the management firmware protection.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Align the driver feature distribution with the flow utilized
by the management firmware - first reserve L2 queues for
VFs and use all the remaining for the PF.
The current distribution might lead to PFs with an enormous
amount of queues, but at the same time leave us with insufficient
resources for starting all VFs.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When RoCE is enabled on a given L2 interface, the interrupt lines
are divided equally between L2 and RoCE -
But in case number of lines needed for RoCE is limited by number
of available CNQs, we can utilize the additional lines for L2.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Management firmware and driver are meant to be both backward and forward
compatibile with each other.
If a new mangement firmware would work with an older driver,
it's possible that driver would receive indications which are meaningless
to it. That's perfectly acceptible from the firmware part - so no need to
log such messages at default verbosity; That would only serve to confuse
users.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The management firmware is running on a Big Endian processor,
and when running on LE platform HW is configured to swap access
to memory shared between management firmware and driver on
32-bit granulariy.
As a result, for matters of simplicity most of the APIs between
driver and management firmware are based on 32-bit variables.
MAC settings are one exception, as driver needs to fill a byte
array when indicating to management firmware that primary MAC
has changed.
Due to the swap, driver must make sure that the mac that was
provided in byte-order would be translated into native order,
otherwise after the swap the management firmware would read
it swapped.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver interaction with management firmware involves a union
of all the data-members relating to the commands the driver prepares.
Current interface assumes the caller always passes such a union -
but thats cumbersome as well as risky [chancing a stack corruption
in case caller accidentally passes a smaller member instead of union].
Change implementation so that caller could pass a pointer to any
of the members instead of the union.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Interaction of driver -> management firmware is based
on a one-pending mailbox [per interface], and various
mailbox commands need to be synchronized.
Current scheme is messy, and there's a difficulty extending
it as it deals differently with various commands as well as
making assumption on the required behavior for load/unload
requests.
Drop the current scheme into a completion-list-based approach;
Each flow would try sending the command when possible,
allowing one flow to complete another flow's completion and
relieve the mailbox before sending its own command.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The link information exists only on the leading hwfn,
but some of its derivatives [e.g., min/max rate] need to
be configured for each hwfn.
When re-basing the VF link view, use the leading hwfn
information as basis for all existing hwfns to allow
said configurations to stick.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Malicious VF existance should be interesting enough for the
hyperuser. Change the PF indication that one of its child VF
became malicious to appear by default.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PF<->VF interface allows for the VF to request
multiple queues closure via a single message, but this has
never been used by any official driver.
We now deprecate this option, forcing each queue close
to arrive via a different command; This would be required
for future TLVs that are going to extend the queue TLVs with
additional information on a per-queue basis.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PF needs to validate the status of VF queues before asking firmware
to configure anything for them, but that validation is done in various
different forms - sometimes inadequate.
Add auxillary functions that can be used for testing of the queue
state and convert the various flows to use those instead of current
existing flows; Also, add missing validations where needed.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When starting the VF's vport, the PF would first configure
the status blocks of the VF and then reset them.
That would cause some of the configured information to be lost -
specifically it would mean that all the VFs queues would use
the Rx coalescing state-machine of the status block.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When PF responds to the VF requests it also cleans the HW-channel
indication in firmware to allow further VF messages to arrive,
but the order currently applied is wrong -
The PF is copying by DMAE the response the VF is polling on for
completion, and only afterwards sets the HW-channel to ready state.
This creates a race condition where the VF would be able to send
an additional message to the PF before the channel would get ready
again, causing the firmware to consider the VF as malicious.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a VF is considered malicious, driver handling of the VF
FLR flow would clean said indication - but not if the FLR is
part of an sriov-disable flow.
That leads to further issues, as PF wouldn't re-enable the
previously malicious VF when sriov is re-enabled.
No reason for that - simply clean malicious indications in
the sriov-disable flow as well.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VFs are currently logging errors when communicating
with their PFs in a too-low verbosity that wouldn't
be shown by default. As timeouts and failed commands
are crucial for VF operability, make them appear by
default.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the end of the timeout loop, retries will always be zero so
the check for zero is redundant so remove it. Also replace
printk with pr_err as recommended by checkpatch.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/broadcom/genet/bcmgenet.c
net/core/sock.c
Conflicts were overlapping changes in bcmgenet and the
lockdep handling of sockets.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Ensure that mtu is at least IPV6_MIN_MTU in ipv6 VTI tunnel driver,
from Steffen Klassert.
2) Fix crashes when user tries to get_next_key on an LPM bpf map, from
Alexei Starovoitov.
3) Fix detection of VLAN fitlering feature for bnx2x VF devices, from
Michal Schmidt.
4) We can get a divide by zero when TCP socket are morphed into
listening state, fix from Eric Dumazet.
5) Fix socket refcounting bugs in skb_complete_wifi_ack() and
skb_complete_tx_timestamp(). From Eric Dumazet.
6) Use after free in dccp_feat_activate_values(), also from Eric
Dumazet.
7) Like bonding team needs to use ETH_MAX_MTU as netdev->max_mtu, from
Jarod Wilson.
8) Fix use after free in vrf_xmit(), from David Ahern.
9) Don't do UDP Fragmentation Offload on IPComp ipsec packets, from
Alexey Kodanev.
10) Properly check napi_complete_done() return value in order to decide
whether to re-enable IRQs or not in amd-xgbe driver, from Thomas
Lendacky.
11) Fix double free of hwmon device in marvell phy driver, from Andrew
Lunn.
12) Don't crash on malformed netlink attributes in act_connmark, from
Etienne Noss.
13) Don't remove routes with a higher metric in ipv6 ECMP route replace,
from Sabrina Dubroca.
14) Don't write into a cloned SKB in ipv6 fragmentation handling, from
Florian Westphal.
15) Fix routing redirect races in dccp and tcp, basically the ICMP
handler can't modify the socket's cached route in it's locked by the
user at this moment. From Jon Maxwell.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (108 commits)
qed: Enable iSCSI Out-of-Order
qed: Correct out-of-bound access in OOO history
qed: Fix interrupt flags on Rx LL2
qed: Free previous connections when releasing iSCSI
qed: Fix mapping leak on LL2 rx flow
qed: Prevent creation of too-big u32-chains
qed: Align CIDs according to DORQ requirement
mlxsw: reg: Fix SPVMLR max record count
mlxsw: reg: Fix SPVM max record count
net: Resend IGMP memberships upon peer notification.
dccp: fix memory leak during tear-down of unsuccessful connection request
tun: fix premature POLLOUT notification on tun devices
dccp/tcp: fix routing redirect race
ucc/hdlc: fix two little issue
vxlan: fix ovs support
net: use net->count to check whether a netns is alive or not
bridge: drop netfilter fake rtable unconditionally
ipv6: avoid write to a possibly cloned skb
net: wimax/i2400m: fix NULL-deref at probe
isdn/gigaset: fix NULL-deref at probe
...