Don't hard code the function names in the diagnostic output when these
reset related routines fail. Instead, use %s and __func__ so that future
refactors don't need to change the print outs.
Additionally, while we are here, add missing function header comments
for the new reset_prepare and reset_done function handlers.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Correct the backward logic using !net_ratelimit()
Miscellanea:
o Add a blank line before the error return label
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Now that we have a working MAC/VLAN queue for handling MAC/VLAN messages
from the netdev, replace the default handler for the VF<->PF messages.
This new handler is very similar to the default code, but uses the
MAC/VLAN queue instead of sending the message directly. Unfortunately we
can't easily re-use the default code, so we'll just replace the entire
function.
This ensures that a VF requesting a large number of VLANs or MAC
addresses does not start a reset cycle, as explained in the commit which
introduced the message queue.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Ngai-mint Kwan <ngai-mint.kwan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Under some circumstances, when dealing with a large number of MAC
address or VLAN updates at once, the fm10k driver, particularly the VFs
can overload the mailbox with too many messages at once.
This results in a mailbox timeout, which causes the driver to initiate
a reset. During the reset, we re-send all the same messages that
originally caused the timeout. This results in a cycle of resets each
triggering a future reset.
To fix or avoid this, we introduce a workqueue item which monitors
a queue of MAC and VLAN requests. These requests are queued to the end
of the list, and we process as a FIFO periodically.
Initially we only handle requests for the netdev, but we do handle
unicast MAC addresses, multicast MAC addresses, and update VLAN
requests.
A future patch will add support to use this queue for handling MAC
update requests from the VF<->PF mailbox.
The MAC/VLAN work item will keep checking to make sure that each request
does not overflow the mailbox and cause a timeout. If it might, then the
work item will reschedule itself a short time later. This avoids any
reset cycle, since we never send the message if the mailbox is not
ready.
As an alternative, we tried increasing the mailbox message FIFO, but
this just delays the problem and results in needless memory waste on the
system. Our new message queue is dynamically allocated so only uses as
much memory as it needs. Additionally, it need not be contiguous like
the Tx and Rx FIFOs.
Note that this patch chose to only create a queue for MAC and VLAN
messages, since these are the only messages sent in a large enough
volume to cause the reset loop. Other messages are very unlikely to
overflow the mailbox Tx FIFO so easily.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Replace the PCI specific legacy power management hooks with the new
generic power management hooks which work properly for both suspend and
hibernate. The new generic system is better and properly handles the
lower level PCIe power management rather than forcing the driver to
handle it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Lets not re-invent the locking wheel. Remove our bitlock and use
a proper spinlock instead.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If we lose PCIe link, such as when an unannounced PFLR event occurs, or
when a device is surprise removed, we currently detach the device and
close the netdev. This unfortunately leaves a lot of things still
active, such as the msix_mbx_pf IRQ, and Tx/Rx resources.
This can cause problems because the register reads will return
potentially invalid values which may result in unknown driver behavior.
Begin the process of resetting using fm10k_prepare_for_reset(), much in
the same way as the suspend and resume cycle does. This will attempt to
shutdown as much as possible, in order to prevent possible issues.
A naive implementation for this has issues, because there are now
multiple flows calling the reset logic and setting a reset bit. This
would cause problems, because the "re-attach" routine might call
fm10k_handle_reset() prior to the reset actually finishing. Instead,
we'll add state bits to indicate which flow actually initiated the
reset.
For the general reset flow, we'll assume that if someone else is
resetting that we do not need to handle it at all, so it does not need
its own state bit. For the suspend case, we will simply issue a warning
indicating that we are attempting to recover from this case when
resuming.
For the detached subtask, we'll simply refuse to re-attach until we've
actually initiated a reset as part of that flow.
Finally, we'll stop attempting to manage the mailbox subtask when we're
detached, since there's nothing we can do if we don't have a PCIe
address.
Overall this produces a much cleaner shutdown and recovery cycle for
a PCIe surprise remove event.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Although very unlikely, it is possible that cancel_work_sync() may stop
the service_task before it actually started. In this case, the
__FM10K_SERVICE_SCHED bit will never be cleared. This results in the
service task being unable to reschedule in the future. Add a helper
function which sets the service disable bit, waits for the service task
to stop and clears the schedule bit, thus avoiding the race condition.
We know the schedule bit is safe to clear because the cancel_work_sync()
guarantees the service task is not running.
Add a helper function also to restart the service task, for symmetry.
This is not strictly needed but helps the mental model of how to stop
and start the service task.
This race could only happen in fm10k_suspend/fm10k_resume as this is the
only place where the service task is actually restarted. Thus,
suspend/resume testing would be ideal. However, note that the chance of
this happening is very slim as the service event is scheduled for
immediate execution, and you would have to trigger a suspend at almost
the exact same time as the service task was scheduled.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A future patch needs these functions defined earlier in the file. Move
them closer to above where they will be called.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It is possible that under rare circumstances the device is undergoing
a reset, such as when a PFLR occurs, and the device may be transmitting
simultaneously. In this case, we might attempt to divide by zero when
finding the proper r_idx. Instead, lets read the num_tx_queues once,
and make sure it's non-zero.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We've always had a really weird looping construction for resetting VFs.
We read the VFLRE register and reset the VF if the corresponding bit is
set, which makes sense. However we loop continuously until we no longer
have any bits left unset. At first this makes sense, as a sort of "keep
trying until we succeed" concept.
Unfortunately this causes a problem if we happen to surprise remove
while this code is executing, because in this case we'll always read all
1s for the VFLRE register. This results in a hard lockup on the CPU
because the loop will never terminate.
Because our own reset function will clear the VFLR event register
always, (except when we've lost PCIe link obviously) there is no real
reason to loop. In practice, we'll loop over once and find that no VFs
are pending anymore.
Lets just check once. Since we're clear the notification when we reset
there's no benefit to the loop. Additionally, there shouldn't be a race
as future VLFRE events should trigger an interrupt. Additionally, we
didn't warn or do anything in the looped case anyways.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We're doing a really convoluted bitshift and read for the PFVFLRE
register. Just reading the PFVFLRE(1), shifting it by 32, then reading
PFVFLRE(0) should be sufficient.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When we load the driver, we set the last_reset to be in the future,
which delays the initial driver reset. Additionally, the service task
isn't scheduled to run automatically until the timer runs out. This
causes a needless delay of the first reset to begin talking to the
switch manager.
We can avoid this by simply not setting last_reset and immediately
scheduling the service task while in probe. This allows the device to
wake up faster, and avoids this delay.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Newer versions of GCC starting with 7 now additionally warn when a case
statement may fall through without an explicit comment mentioning it.
Add such a comment to silence the warning, as this is expected.
Unfortunately the comment must come directly before the next case
statement, so we put it outside the #ifdef. Otherwise, the compiler
cannot properly detect it and thus the warning is displayed regardless.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
New versions of GCC since version 7 began warning about possible
truncation of calls to snprintf. We can fix this and avoid false
positives. First, we should pass the full buffer size to snprintf,
because it guarantees a NULL character as part of its passed length, so
passing len-1 is simply wasting a byte of possible storage.
Second, if we make the ri and ti variables unsigned, the compiler is
able to correctly reason that the value never gets larger than 256, so
it doesn't need to warn about the full space required to print a signed
integer.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Newer versions of GCC since version 7 now warn when a case statement may
fall through without an explicit comment. "Fallthough" does not count as
it is misspelled. Fix the typos for these comments to appease the new
warnings.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In fm10k_get_host_state_generic, we check the mailbox tx_read() function
to ensure that the mailbox is still open. This function also checks to
make sure we have space to transmit another message. Unfortunately, if
we just recently sent a bunch of messages (such as enabling hundreds of
VLANs on a VF) this can result in a race where the watchdog task thinks
the link went down just because we haven't had time to process all these
messages yet.
Instead, lets just check whether the mailbox is still open. This ensures
that we don't race with the Tx FIFO, and we only link down once the
mailbox is not open.
This is safe, because if the FIFO fills up and we're unable to send
a message for too long, we'll end up triggering the timeout detection
which results in a reset. Additionally, since we still check to ensure
the mailbox state is OPEN, we'll transition to link down whenever the
mailbox closes as well.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Two single characters should be put into a sequence.
Thus use the corresponding function "seq_putc".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When we are handling PF<->VF mailbox messages, it is possible that the
VF will send us so many messages that the PF<->SM FIFO will fill up. In
this case, we stop the loop and wait until the service event is
rescheduled.
Normally this should happen due to an interrupt. But it is possible that
we don't get another interrupt for a while and it isn't until the
service timer actually reschedules us. Instead, simply reschedule
immediately which will cause the service event to be run again as soon
as we exit.
This ensures that we promptly handle all of the PF<->VF messages with
minimal delay, while still giving time for the SM mailbox to drain.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When we process VF mailboxes, the driver is likely going to also queue
up messages to the switch manager. This process merely queues up the
FIFO, but doesn't actually begin the transmission process. Because we
hold the mailbox lock during this VF processing, the PF<->SM mailbox is
not getting processed at this time. Ensure that we actually process the
PF<->SM mailbox in between each PF<->VF mailbox.
This should ensure prompt transmission of the messages queued up after
each VF message is received and handled.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Get rid of struct tc_to_netdev which is now just unnecessary container
and rather pass per-type structures down to drivers directly.
Along with that, consolidate the naming of per-type structure variables
in cls_*.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the return value from -EINVAL to -EOPNOTSUPP. The rest of the
drivers have it like that, so be aligned.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As ndo_setup_tc is generic offload op for whole tc subsystem, does not
really make sense to have cls-specific args. So move them under
cls_common structurure which is embedded in all cls structs.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the type is always present, push it to be a separate argument to
ndo_setup_tc. On the way, name the type enum and use it for arg type.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pci_error_handlers->reset_notify() method had a flag to indicate
whether to prepare for or clean up after a reset. The prepare and done
cases have no shared functionality whatsoever, so split them into separate
methods.
[bhelgaas: changelog, update locking comments]
Link: http://lkml.kernel.org/r/20170601111039.8913-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
We need to push the chain index down to the drivers, so they have the
information to which chain the rule belongs. For now, no driver supports
multichain offload, so only chain 0 is supported. This is needed to
prevent chain squashes during offload for now. Later this will be used
to implement multichain offload.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Interfaces will reset whenever the TX mailbox FIFO has become full. This
occurs more frequently whenever the IES API application is not running
to process and clear the messages in the FIFO. Thus, this could lead to
situations where the interface would enter an infinite reset loop. That
is: if the interface is trying to synchronize a huge number of unicast
and multicast entries with the IES API application, the TX mailbox FIFO
will become full and the interface resets. Once the interface exits
reset, it'll try to synchronize the unicast and multicast entries again.
Ergo, this creates an infinite loop. Other actions such as multiple
mulitcast mode or up/down transitions will fill the TX mailbox FIFO and
induce the interface to reset. To correct these situations, check if the
interface's "host_ready" flag is enabled before enqueuing any messages
to the TX mailbox FIFO. This check will be conducted by a function call.
Lastly, this issue mainly affects the PF and, thus, the VF is exempt.
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Write to RXQCTL register to disable the receive queue when configuring
the RX ring.
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Re-word the comment to avoid stating that we return a value for this
void function. Additionally, there is no need to mention older kernels,
since this is the upstream kernel.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If some code path executes fm10k_service_event_schedule(), it is
guaranteed that we only queue the service task once, since we use
__FM10K_SERVICE_SCHED flag. Unfortunately this has a side effect that if
a service request occurs while we are currently running the watchdog, it
is possible that we will fail to notice the request and ignore it until
the next time the request occurs.
This can cause problems with pf/vf mailbox communication and other
service event tasks. To avoid this, introduce a FM10K_SERVICE_REQUEST
bit. When we successfully schedule (and set the _SCHED bit) the service
task, we will clear this bit. However, if we are unable to currently
schedule the service event, we just set the new SERVICE_REQUEST bit.
Finally, after the service event completes, we will re-schedule if the
request bit has been set.
This should ensure that we do not miss any service event schedules,
since we will re-schedule it once the currently running task finishes.
This means that for each request, we will always schedule the service
task to run at least once in full after the request came in.
This will avoid timing issues that can occur with the service event
scheduling. We do pay a cost in re-running many tasks, but all the
service event tasks use either flags to avoid duplicate work, or are
tolerant of being run multiple times.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This ensures that future programmers do not have to remember to re-size
the bitmaps due to adding new values. Although this is unlikely for this
driver, it may happen and it's best to prevent it from ever being an
issue.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Replace bitwise operators and #defines with a BITMAP and enumeration
values. This is similar to how we handle the "state" values as well.
This has two distinct advantages over the old method. First, we ensure
correctness of operations which are currently problematic due to race
conditions. Suppose that two kernel threads are running, such as the
watchdog and an ethtool ioctl, and both modify flags. We'll say that the
watchdog is CPU A, and the ethtool ioctl is CPU B.
CPU A sets FLAG_1, which can be seen as
CPU A read FLAGS
CPU A write FLAGS | FLAG_1
CPU B sets FLAG_2, which can be seen as
CPU B read FLAGS
CPU A write FLAGS | FLAG_2
However, "|=" and "&=" operators are not actually atomic. So this could
be ordered like the following:
CPU A read FLAGS -> variable
CPU B read FLAGS -> variable
CPU A write FLAGS (variable | FLAG_1)
CPU B write FLAGS (variable | FLAG_2)
Notice how the 2nd write from CPU B could actually undo the write from
CPU A because it isn't guaranteed that the |= operation is atomic.
In practice the race windows for most flag writes is incredibly narrow
so it is not easy to isolate issues. However, the more flags we have,
the more likely they will cause problems. Additionally, if such
a problem were to arise, it would be incredibly difficult to track down.
Second, there is an additional advantage beyond code correctness. We can
now automatically size the BITMAP if more flags were added, so that we
do not need to remember that flags is u32 and thus if we added too many
flags we would over-run the variable. This is not a likely occurrence
for fm10k driver, but this patch can serve as an example for other
drivers which have many more flags.
This particular change does have a bit of trouble converting some of the
idioms previously used with the #defines for flags. Specifically, when
converting FM10K_FLAG_RSS_FIELD_IPV[46]_UDP flags. This whole operation
was actually quite problematic, because we actually stored flags
separately. This could more easily show the problem of the above
re-ordering issue.
This is really difficult to test whether atomics make a difference in
practical scenarios, but you can ensure that basic functionality remains
the same. This patch has a lot of code coverage, but most of it is
relatively simple.
While we are modifying these files, update their copyright year.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
FM10K_REMOVED expects a hardware address, not a 'struct fm10k_hw'.
Fixes: 5cb8db4a4c ("fm10k: Add support for VF")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
These files all use functions declared in interrupt.h, but currently rely
on implicit inclusion of this file (via netns/xfrm.h).
That won't work anymore when the flow cache is removed so include that
header where needed.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The configurable priority to traffic class mapping and the user specified
queue ranges are used to configure the traffic class, overriding the
hardware defaults when the 'hw' option is set to 0. However, when the 'hw'
option is non-zero, the hardware QOS defaults are used.
This patch makes it so that we can pass the data the user provided to
ndo_setup_tc. This allows us to pull in the queue configuration if the
user requested it as well as any additional hardware offload type
requested by using a value other than 1 for the hw value.
Finally it also provides a means for the device driver to return the level
supported for the offload type via the qopt->hw value. Previously we were
just always assuming the value to be 1, in the future values beyond just 1
may be supported.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.
Fix all drivers with ndo_get_stats64 to have a void function.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The debug statistics were removed due to complications with the ethtool
statistics API which are not possible to resolve without a new
statistics interface. The flag was left behind, but we no longer need
it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This was accidentally removed when we defeatured the full 1588 Clock
support. We need to report the Rx descriptor timestamp value so that
applications built on top of the IES API can function properly.
Additionally, remove the FM10K_FLAG_RX_TS_ENABLED, as it is not used now
that 1588 functionality has been removed.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
On packet RX, we perform a dma sync for cpu before passing the
packet up. Here we limit that sync to the actual length of the
incoming packet, rather than always syncing the entire buffer.
Signed-off-by: Scott Peterson <scott.d.peterson@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Partially revert commit 5e93cbadd3 ("fm10k: Reset mailbox global
interrupts", 2016-06-07)
The register bits related to this commit are now solely being handled by
the IES API. Recent changes in the IES API will allow an automatic
recovery from improper handling of these bits.
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Multiple IES API resets can cause a race condition where the mailbox
interrupt request bits can be cleared before being handled. This can
leave certain mailbox messages from the PF to be untreated and the PF
will enter in some inactive state. If this situation occurs, the IES API
will initiate a mailbox version reset which, then, trigger a mailbox
state change. Once this mailbox transition occurs (from OPEN to CONNECT
state), a request for reset will be returned.
This ensures that PF will undergo a reset whenever IES API encounters an
unknown global mailbox interrupt event or whenever the IES API
terminates.
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We don't need to typecast a u8 * into a char *, so just remove the extra
variable.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since a pointer "mac" to fm10k_mac_info structure exists, use it to
access the contents of its members.
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
e100: min_mtu 68, max_mtu 1500
- remove e100_change_mtu entirely, is identical to old eth_change_mtu,
and no longer serves a purpose. No need to set min_mtu or max_mtu
explicitly, as ether_setup() will already set them to 68 and 1500.
e1000: min_mtu 46, max_mtu 16110
e1000e: min_mtu 68, max_mtu varies based on adapter
fm10k: min_mtu 68, max_mtu 15342
- remove fm10k_change_mtu entirely, does nothing now
i40e: min_mtu 68, max_mtu 9706
i40evf: min_mtu 68, max_mtu 9706
igb: min_mtu 68, max_mtu 9216
- There are two different "max" frame sizes claimed and both checked in
the driver, the larger value wasn't relevant though, so I've set max_mtu
to the smaller of the two values here to retain identical behavior.
igbvf: min_mtu 68, max_mtu 9216
- Same issue as igb duplicated
ixgb: min_mtu 68, max_mtu 16114
- Also remove pointless old == new check, as that's done in dev_set_mtu
ixgbe: min_mtu 68, max_mtu 9710
ixgbevf: min_mtu 68, max_mtu dependent on hardware/firmware
- Some hw can only handle up to max_mtu 1504 on a vf, others 9710
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial change here to cleanup a checkpatch.pl warning that got
introduced when changing to alloc_workqueue.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This generic callback is for drivers which have software Tx timestamp
support enabled. Without this, PTP applications requesting software
timestamps may complain that the requested mode is not supported.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Introduce new rtnl UAPI that exposes a list of vlans per VF, giving
the ability for user-space application to specify it for the VF, as an
option to support 802.1ad.
We adjusted IP Link tool to support this option.
For future use cases, the new UAPI supports multiple vlans. For now we
limit the list size to a single vlan in kernel.
Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward
compatibility with older versions of IP Link tool.
Add a vlan protocol parameter to the ndo_set_vf_vlan callback.
We kept 802.1Q as the drivers' default vlan protocol.
Suitable ip link tool command examples:
Set vf vlan protocol 802.1ad:
ip link set eth0 vf 1 vlan 100 proto 802.1ad
Set vf to VST (802.1Q) mode:
ip link set eth0 vf 1 vlan 100 proto 802.1Q
Or by omitting the new parameter
ip link set eth0 vf 1 vlan 100
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the PF assigns a new MAC address to a VF it uses the base address
registers to store the MAC address. This allows a VF which loads after
this setup the ability to get the initial address without having to wait
for a mailbox message. Unfortunately to do this, the PF must take queue
ownership away from the VF, which can cause fault errors when there is
already an active VF driver.
This queue ownership assignment causes race condition between the PF and
the VF such that potentially a VF can cause FUM fault errors due to
normal PF/VF driver behavior.
It is not safe to simply allow the PF to write the base address
registers without taking queue ownership back as the PF must also
disable the queues, and this would impact active VF use. The current
code is safe because the queue ownership will prevent the VF from
actually writing but does trigger the FUM fault.
We can do better by simply avoiding the register write process when
a mailbox message suffices. If the message can be sent over the mailbox,
then we will not perform the queue ownership assignment and we won't
update the base address to be the same as the MAC address.
We do still have to write the TXQCTL registers in order to update the
VID of the queue. This is necessary because the TXQCTL register is
read-only from the VF, and thus the VF cannot do this for itself. This
register does not need to wait for the Tx queue to be disabled and is
safe for the PF to write during normal VF operation, so we move this
write to the top of the function above the mailbox message. Without
this, the TXQCTL register would be misconfigured and cause the VF to Tx
hang.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ensure that other bits in the RXQCTL register do not get cleared. This
ensures that bits related to queue ownership are maintained.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Similar to how we handle VXLAN offload, enable support for a single
Geneve tunnel.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In preparation for adding Geneve Rx offload support, refactor the
current VXLAN offload flow to be a bit more generic so that it will be
easier to add the new Geneve code. The fm10k hardware supports one VXLAN
and one Geneve tunnel, so we will eventually treat the VXLAN and Geneve
tunnels identically. To this end, factor out the code that handles the
current list so that we can use the generic flow for both tunnels in the
next patch.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In the event of a surprise remove, we expect the driver to go down,
which includes calling .stop_hw(). However, this function will return an
error because the queues won't appear to cleanly disable. Prevent this
and avoid the unnecessary checks by just returning when
FM10K_REMOVED(hw->hw_addr) is true.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In the event of an uncorrectable AER error occurring when the driver has
not loaded, the recovery routines are not done. This is done because
future loads of the driver may not be aware of the IO state and may not
be able to recover at all. In this case, when we next load the driver it
fails due to what appears to be a surprise remove event. Instead, add
a check to ensure that the device is in the normal IO state before
continuing to probe. This allows us to give a more descriptive message
of what is wrong.
Without this change, the driver will attempt to probe up to our first
call of .reset_hw() which will be unable to read registers and act as if
a surprise remove event occurred.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When fm10k_poll fully cleans rings it returns 0. This is incorrect as it
messes up the budget accounting in the core NAPI code. Fix this by
returning actual work done, capped at budget - 1 since the core doesn't
expect a return of the full budget when the driver modifies the NAPI
status.
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
While technically not needed, as all our uses of ACCESS_ONCE are scalar
types, we already use READ_ONCE in a few places, and for code
readability we can swap all the uses of the older ACCESS_ONCE into
READ_ONCE.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The function is only used in fm10k_ethtool.c, so make it static.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A previous patch added support to check for hardware Tx pending in the
fm10k_down routine. This support was intended to ensure that we
accurately check what the hardware state is. However, checking for Tx
hangs in this manor during the hotpath results in a large performance
hit. Avoid this by making the hotpath check use the SW counters instead.
Fixes: a0f53cf49cb0 ("fm10k: use actual hardware registers when checking for pending Tx", 2016-06-08)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A previous patch removed the pci_disable_device() call in
.io_error_detected. This call corresponded to a pci_enable_device_mem()
call within .io_slot_reset handler. Change the call here to
a pci_reenable_device() so that it does not increment and leak the
enable_cnt reference count for the device. Without this change, VF
devices may fail during an unbind/bind, and we'll never zero the
reference counter for the pci_dev structure.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The pci_enable_msix_range() function returns a positive value of the
number of allocated vectors if it succeeds. On failure it returns
a negative error code. Return this code properly so that the error
message printed by the driver will show the actual error code instead of
being masked by -ENOMEM.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When we resume from an AER recovery with many active VFs, the PF sees
many spurious link up and link down events. Prevent this by delaying
link down for at least one second after the resume event.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If the fm10k interface is brought up, but the switch manager software is
not running, the driver will continuously request the lport map every
few seconds in the base driver watchdog routine. Eventually after
several minutes the switch mailbox Tx fifo will fill up and the mailbox
will timeout, resulting in a reset. This reset will appear as if for no
reason, and occurs regularly every few minutes until the switch manager
software is loaded.
Prevent this from happening by only requesting the lport map after we've
verified the switch mailbox is tx_ready. In order to simplify code logic
and reduce code duplication, implement this as a new function pointer
"mac.ops.request_lport_map" which the VF will not implement. Otherwise,
we have to duplicate the tx_ready check outside of
fm10k_get_host_state_generic, or re-implement most of
fm10k_get_host_state_generic in the pf version.
The resulting code is simpler and easier to understand, and prevents the
PF from continuously requesting lport map and filling the Tx fifo of
a switch mailbox that isn't ready.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sometimes, a VF driver will lose PCIe address access, such as due to
a PF FLR event. In fm10k_detach_subtask, poll and check whether the
PCIe register space is active again and restore the device when it has.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If an FLR occurs, VF devices will be knocked out of bus master mode, and
the driver will be unable to recover from the reset properly, resulting
in malicious driver events and an infinite reset loop. In the normal
case, the bus master mode will already be enabled and this call will
essentially be a no-op. Since we're doing this every reset, it is
possible we could remove the other calls to pci_set_master() but it
seems not harmful to just leave them in place.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Continuing the effort to commonize the similar suspend/resume flows,
finish up by using the new fm10k_handle_suspand and fm10k_handle_resume
functions for the standard suspend/resume flow.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When a function level PCI reset is triggered using sysfs, it calls the
driver's .reset_notify error handler. Implement a handler based on the
now split fm10k_prepare_for_reset and fm10k_handle_reset functions, so
that we fully reset the driver when the PCI function level reset occurs.
This also ensures the reset is handled in a clean way by first disabling
all the driver bits first and then restoring them after the function
reset. Previously the stack simply performed a blind function reset and
our driver didn't take any part in the process.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Now that we have extracted the necessary steps for a split
suspend/resume flow, re-use these functions instead of using the current
open coded flow. This ensures that we don't miss any steps. It also
ensures that we have the correct driver states set.
Since we'll be handling all of the reset flow ourselves, we no longer
need to request a reset in the io_slot_reset() function.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Implement fm10k_prepare_suspend and fm10k_handle_resume functions which
abstract around the now existing fm10k_prepare_for_reset and
fm10k_handle_reset. The new functions also handle stopping the service
task, which is something that the original re-init flow does not need.
Every other location that does a suspend/resume type flow is expected to
use these functions, because otherwise they may have conflicts with the
running watchdog routines. This also has the effect of preventing
possible surprise remove events during handling of FLR events and PCIe
errors.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There are several flows in the driver which perform the similar function
of tearing down software and restoring software to recover from certain
errors or PCIe events, including:
* fm10k_reinit
* fm10k_suspend/resume
* fm10k_io_error_detected/fm10k_io_resume
In addition, we want to implement a .reset_notify() handler as well
which will also perform similar function.
Rework how the driver codes reset and resume flows by separating out the
reinit logic into two functions "fm10k_prepare_for_reset" and
"fm10k_handle_reset". This first step will allow us to re-use this
functionality in the similar blocks of code instead of re-coding the
same sequence of events slightly different.
The end result should be more maintainable and correct, fixing several
inconsistencies with the work flow.
The new functions expect to take the rtnl_lock() themselves, and it does
have the unfortunate side effect of having the reinit flow take then
release then take the rtnl_lock. However, this minor downside is
out weighted by the benefits of code reduction and reducing needless
difference between these flows.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It turns out that sometimes during a reset the Tx queues will be
temporarily stuck longer than .stop_hw() expects. Work around this issue
by attempting to .stop_hw() first. If it tails, wait a number of
attempts until the Tx queues appear to be drained. After this, attempt
stop_hw() again. This ensures that we avoid waiting if we don't need to,
such as during the first initialization of a VF, and give the proper
amount of time necessary to recover from most situations. It is possible
that the hardware is actually stuck. For PFs, this is usually fixed by
a datapath reset. Unfortunately the VF cannot request a similar reset
for itself.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When stop_hw() routine fails with FM10K_ERR_REQUESTS_PENDING, this
indicates that the Tx or Rx queues did not shutdown within the time
limit. Print a more suitable message at the dev_info level instead of
dev_err.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A while ago, an additional check for the switch being ready was added to
reset_hw. A recent refactor accidentally made this check return an error
code on failure which caused fm10k_probe to fail when the switch wasn't
brought up first. The original reasoning for the check was to prevent
additional data path reset when the fabric wasn't ready yet. However,
there isn't a compelling reason to keep the check, as the data path
reset will restore hardware to a known good state. Remove the check and
perform the data path reset regardless of the switch manager state.
An alternative fix is to return FM10K_SUCCESS instead, and bypass the
actual data path reset. This should be fine as we will perform
a reset_hw once the switch is active. However, since data path reset
will reset many parts of the hardware it seems better to just perform
the reset regardless of switch state.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don't report FM10K_ERR_REQUESTS_PENDING when we fail to disable queues
within the timeout. This can occur due to a hardware Tx hang, or when
the switch ethernet fabric is resetting while we are transmitting
traffic. It can sometimes take up to 500ms before the Tx DMA engine
gives up. Instead, just skip the DMA engine check and perform
a data-path reset anyways. Add a statistic counter to keep track of the
number of resets occurring while we have pending DMA on the rings.
In order to prevent having to re-assign err to 0, re-order the
last few items of the reset_hw_pf function so that we don't perform
"return err" at the end.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When a data path reset is initiated, write control to the PCIE_GMBX is
yanked from the switch manager. The switch manager writes to this
register to clear mailbox global interrupt bits as part of its mailbox
interrupt handling routine. When the device recovers from the data path
reset and these bits are not cleared, it will prevent future mailbox
global interrupts from being triggered. Upon confirming that the device
has exited from a data path reset, clear these bits to ensure the proper
functioning of the mailbox global interrupt.
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Also prevent updating stats while the interface is down. If we're
already updating stats, just return doing nothing. When we take the
device down, block stat updates until we come back up. This ensures that
we avoid tearing down rings when we're updating statistics, and prevents
updating statistics until we're up.
We can't re-use the __FM10K_DOWN for this because it wouldn't prevent
multiple threads from accessing statistics. Neither does it prevent the
case where we start updating stats and then start going down in another
thread.
The fm10k_get_stats64 is except from this, because it has a completely
different flow which does not suffer from the same issues as
fm10k_update_stats might.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It's currently possible for fm10k_update_stats to be called during the
window when we go down and the rings are removed. This can result in
a null pointer dereference. In fm10k_get_stats64 we work around this by
using ACCESS_ONCE and a null pointer check inside the loop. Use this
same flow in the fm10k_update_stats to avoid the potential null pointer.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Return early from fm10k_down() when we are already down, since that
means another thread is either already finished or has started going
down, so shouldn't conflict with them.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Some comments weren't updated to reflect the renaming of ndo's and the
change of arguments.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
alloc_workqueue replaces deprecated create_workqueue().
A dedicated workqueue has been used since the workitem (viz
fm10k_service_task, which manages and runs other subtasks) is involved in
normal device operation and requires forward progress under memory
pressure.
create_workqueue has been replaced with alloc_workqueue with max_active
as 0 since there is no need for throttling the number of active work
items.
Since network devices may be used in memory reclaim path,
WQ_MEM_RECLAIM has been set to guarantee forward progress.
flush_workqueue is unnecessary since destroy_workqueue() itself calls
drain_workqueue() which flushes repeatedly till the workqueue
becomes empty. Hence the call to flush_workqueue() has been dropped.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The index calculated when looping through the indir array passed to
fm10k_write_reta was incorrectly calculated as the first part i needs to
be multiplied by 4.
Fixes: 0cfea7a65738 ("fm10k: fix possible null pointer deref after kcalloc", 2016-04-13)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
While reviewing the i40e driver changes to support page based receive I
realized that I had overlooked the fact that the fm10k hardware required a
512 byte alignment for Rx buffers. This patch is meant to address that by
changing the alignment for Rx buffers to 512 bytes instead of allowing it
to be L1 cache aligned.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The FM10K_MAX_DATA_PER_TXD is really just using a bitshift as a power of
2 operation in an efficient manner. We shouldn't represent this as a BIT()
because that obscures the intention of the operation.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Now that we do have pci_request_mem_regions() and pci_release_mem_regions()
at hand, use it in the Intel ethernet drivers.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: David S. Miller <davem@davemloft.net>
This change replaces the network device operations for adding or removing a
VXLAN port with operations that are more generically defined to be used for
any UDP offload port but provide a type. As such by just adding a line to
verify that the offload type if VXLAN we can maintain the same
functionality.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts were two cases of simple overlapping changes,
nothing serious.
In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.
Signed-off-by: David S. Miller <davem@davemloft.net>
fm10k_open requires rtnl_lock to be held.
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Shannon Nelson <shannon.nelson@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Cc: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check for and handle IPv6 extended headers so that Tx checksum offload
can be done. Also use skb_checksum_help for unexpected cases. This was
originally discovered in ixgbe.
Reported-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Update every header file and other locations to consistently use
Intel(R) instead of just Intel. Also update copyright year of files
which we modified.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When writing a new default redirection table, we needed to populate
a new RSS table using ethtool_rxfh_indir_default. We populated this
table into a region of memory allocated using kcalloc, but never checked
this for NULL. Fix this by moving the default table generation into
fm10k_write_reta. If this function is passed a table, use it. Otherwise,
generate the default table using ethtool_rxfh_indir_default, 4 at at
time.
Fixes: 0ea7fae440 ("fm10k: use ethtool_rxfh_indir_default for default redirection table")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Deleting lport when multicast mode is configured to
FM10K_XCAST_MODE_ALLMULTI or FM10K_XCAST_MODE_PROMISC will result in
generating orphaned multicast-group entries in the switch manager.
Before deleting the lport, reset multicast mode to FM10K_XCAST_MODE_NONE
to flush out these multicast-group entries.
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The original comment may be read incorrectly as referring to checking
the *entire* length is zero. However, it merely checks only the reserved
bits of both length and reserved in a small amount of code. Update the
comment to indicate this is a clever trick and clearly spell out that it
only checks the reserve bits.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>