The initial function and setup tables can be marked as constant.
Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Noticed that the legacy Interrupt handler didn't have the same
ECC warning as did the MSI. So this patch adds it.
Signed-off-by: Don Skidmore <donald.c.skidmore>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The X540 thermal sensor interrupt isn't a General Purpose Interrupt
so doesn't need to be enabled in ixgbe_setup_gpie(). Likewise X540 doesn't
use the SDP0 for thermal sensor so it doesn't need to be enabled for any
device other than 82599.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add code to enable thermal sensors for the x540 hardware, as well as a
thermal interrupt check which will exit with a critical message of a
thermal overheat is detected. Intent of code allows other mac types to
be added with different configuration in the future.
Fixed in this version is the addition of setting the temp_sensor
capable flag which was previously only set for a specific mac.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Revise high and low threshold marks wrt flow control to account
for the X540 devices and latency introduced by the loopback
switch.
Without this it was in theory possible to drop frames on a
supposedly lossless link with X540 or SR-IOV enabled.
Previously we used a magic number in a define to calculate the
threshold values. This made it difficult to sort out exactly
which latencies were or were not being accounted for. Here
I was overly explicit and tried to used #define names that would
be recognizable after reading the IEEE 802.1Qbb specification.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Disable LLI for FCoE since regular interrupt
and their moderation rate works slightly better
for FCoE also.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch is meant to help cleanup the interrupt throttle rate logic by
storing the interrupt throttle rate as a value in microseconds instead of
interrupts per second. The advantage to this approach is that the value
can now be stored in an 16 bit field and doesn't require as much math to
flip the value back and forth since the hardware already used microseconds
when setting the rate.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Changes to clean up the vlan rx path broke trunk vlan. Trunk vlans in
a VF driver are those set using:
"ip link set <pfdev> vf <n> <vlanid>"
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
CC: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Doing an 'ifconfig ethN down' followed by an 'ifconfig ethN up' on a qemu-kvm
guest system configured with two e1000 NICs can result in an 'unable to handle
kernel paging request at 0000000100000000' or 'bad page map in process ...' or
something similar.
These result from a 4096-byte page being corrupted with the following two-word
pattern (16-bytes) repeated throughout the entire page:
0x0000000000000000
0x0000000100000000
There can be other bits set as well. What is a constant is that the 2nd word
has the 32nd bit set. So one could see:
:
0x0000000000000000
0x0000000100000000
0x0000000000000000
0x0000000172adc067 <<< bad pte
0x800000006ec60067
0x0000000700000040
0x0000000000000000
0x0000000100000000
:
Which came from from a process' page table I dumped out when the marked line
was seen as bad by print_bad_pte().
The repeating pattern represents the e1000's two-word receive descriptor:
struct e1000_rx_desc {
__le64 buffer_addr; /* Address of the descriptor's data buffer */
__le16 length; /* Length of data DMAed into data buffer */
__le16 csum; /* Packet checksum */
u8 status; /* Descriptor status */
u8 errors; /* Descriptor Errors */
__le16 special;
};
And the 32nd bit of the 2nd word maps to the 'u8 status' member, and
corresponds to E1000_RXD_STAT_DD which indicates the descriptor is done.
The corruption appears to result from the following...
. An 'ifconfig ethN down' gets us into e1000_close(), which through a number
of subfunctions results in:
1. E1000_RCTL_EN being cleared in RCTL register. [e1000_down()]
2. dma_free_coherent() being called. [e1000_free_rx_resources()]
. An 'ifconfig ethN up' gets us into e1000_open(), which through a number of
subfunctions results in:
1. dma_alloc_coherent() being called. [e1000_setup_rx_resources()]
2. E1000_RCTL_EN being set in RCTL register. [e1000_setup_rctl()]
3. E1000_RCTL_EN being cleared in RCTL register. [e1000_configure_rx()]
4. RDLEN, RDBAH and RDBAL registers being set to reflect the dma page
allocated in step 1. [e1000_configure_rx()]
5. E1000_RCTL_EN being set in RCTL register. [e1000_configure_rx()]
During the 'ifconfig ethN up' there is a window opened, starting in step 2
where the receives are enabled up until they are disabled in step 3, in which
the address of the receive descriptor dma page known by the NIC is still the
previous one which was freed during the 'ifconfig ethN down'. If this memory
has been reallocated for some other use and the NIC feels so inclined, it will
write to that former dma page with predictably unpleasant results.
I realize that in the guest, we're dealing with an e1000 NIC that is software
emulated by qemu-kvm. The problem doesn't appear to occur on bare-metal. Andy
suspects that this is because in the emulator link-up is essentially instant
and traffic can start flowing immediately. Whereas on bare-metal, link-up
usually seems to take at least a few milliseconds. And this might be enough
to prevent traffic from flowing into the device inside the window where
E1000_RCTL_EN is set.
So perhaps a modification needs to be made to the qemu-kvm e1000 NIC emulator
to delay the link-up. But in defense of the emulator, it seems like a bad idea
to enable dma operations before the address of the memory to be involved has
been made known.
The following patch no longer enables receives in e1000_setup_rctl() but leaves
them however they were. It only enables receives in e1000_configure_rx(), and
only after the dma address has been made known to the hardware.
There are two places where e1000_setup_rctl() gets called. The one in
e1000_configure() is followed immediately by a call to e1000_configure_rx(), so
there's really no change functionally (except for the removal of the problem
window. The other is in __e1000_shutdown() and is not followed by a call to
e1000_configure_rx(), so there is a change functionally. But consider...
. An 'ifconfig ethN down' (just as described above).
. A 'suspend' of the system, which (I'm assuming) will find its way into
e1000_suspend() which calls __e1000_shutdown() resulting in:
1. E1000_RCTL_EN being set in RCTL register. [e1000_setup_rctl()]
And again we've re-opened the problem window for some unknown amount of time.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Finish conversion to unified ethtool ops: convert get_flags.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reloading FW during resets can cause issues. Remove the full reset
as it is not needed.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add support for WOL as determined by the EEPROM.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is meant to avoid a hardware lockup when Tx work is still
pending and we request a reset.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support for configuring the priority to
traffic class mapping.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We don't need SFP+ plugable support for X540 hardware (copper only) so
don't enable the SFP+ interrupts.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The DCB CEE command set_state() will complete successfully
but is misleading because it enables IEEE mode. After
this patch the command is failed.
And IEEE PFC/ETS is managed from ieee paths now instead
of using CEE primitives.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use the PCI device flag indicating if a VF is assigned to a guest VM
to guard against destroying VFs upon driver removal. Implement
additional feature to detect if VFs already exist when the driver
is loaded and if so configure them and set the driver state to
SR-IOV enabled.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Instead of using the multi_tx_table to map possible Tx queues to Tx rings
we can just do simple subtraction for the unlikely event that the Tx queue
provided exceeds the number of Tx rings.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since igb only uses advanced descriptors we might as well just use an IGB
specific define and drop the _ADV suffix for the descriptor declarations.
In addition this can be further reduced by assuming that it will be working
on pointers since that is normally how the Tx descriptors are handled.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Many of the function names in the hot path are carrying an extra "_adv"
suffix on the end of them to represent the fact that they are using
advanced descriptors instead of legacy descriptors. However since all igb
uses are advanced descriptors adding the extra suffix doesn't really add
any additional data. Since this is the case it is easiest to just drop the
suffix and save us from having to store the extra characters.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is meant to be a general cleanup and performance improvement
for clean_rx_irq. The previous patch should have updated the allocation so
that the rings can be treated as read-only within the clean_rx_irq
function. In addition I am re-ordering the operations such that several
goals are accomplished including reducing the overhead for packet
accounting, reducing the number of items on the stack, and improving
overall performance.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is meant to improve performance by splitting the Tx and Rx
rings into 3 sections. The first is primarily a read only section
containing basic things like the indexes, a pointer to the dev and netdev
structures, and basic information. The second section contains the stats
and next_to_use and next_to_clean values. The third section is primarily
unused values that can just be placed at the end of the ring and are not
used in the hot path.
The adapter structure has several sections that are read in the hot path.
In order to improve performance there I am combining the frequent read
hot path items into a single cache line.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is meant to streamline the Rx buffer allocation and cleanup.
This is accomplished by reducing the number of writes by only having the Rx
descriptor ring written by software during allocation, and it will only be
read during cleanup.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change removes support for single buffer mode from igb and makes the
driver function in packet split always. The advantage to doing this is
that we can reduce total memory allocation overhead significantly as we
will only need to allocate one 1K slab per packet and then make use of a
reusable half page instead of allocating a 2K slab per packet.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch modifies the max_frame_size in order account for an optional
VLAN tag. In order to support this we must also increase the
MAX_STD_JUMBO_FRAME_SIZE to account for the 4 extra bytes.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change cleans up the RXDCTL and TXDCTL configurations and optimizes RX
performance by allowing back write-backs on all hardware other than 82576.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
netif_tx_start_all_queues() is already called in ixgbe_up_complete, no need
to do it twice.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Remove duplicate inc of hwstats->ruc
Introduce separate loops for 8 and 16 register reads.
Consolidate mac checks under one case.
Make sure registers are cleared on read.
Reported-by: Jonathan Lynch <jonathan.lynch@thenowfactory.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
CC: Jonathan Lynch <jonathan.lynch@thenowfactory.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch improves the memory utilization with RSC when in one-buffer
mode. This is accomplished by making the default buffer sizes match up
with the standard memory allocation sizes minus 1K for shared info and
padding overhead. By doing this CPU utilization when doing large receives
can be reduced by as much as 8%.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The adapter structure was removed from the call so it can be dropped from
the ixgbe_fso documentation.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
One of the 82598 phys was not being correctly identified as being SFP.
This change corrects that.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change adds a small bit of missing code for enabling the overheat sensor
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ixgbe_up and ixgbe_up_complete will always return 0. Since this doesn't
provide any useful information we might as well just make them both void
and save ourselves from having to return an unused value.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change fixes an issue in which the incorrect amount of headroom was
being reserved for flow director filters.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change fixes a minor redundancy in that tx_sample_rate was set twice.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Private rx_csum flags are now duplicate of netdev->features & NETIF_F_RXCSUM.
Removing this needs deeper surgery.
Things noticed:
- ixgb has RX csum disabled by default
- HW VLAN acceleration probably can be toggled, but it's left as is
- the resets on RX csum offload change can probably be avoided
- there is A LOT of copy-and-pasted code here
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A user-space process must use ETHTOOL_GRXCLSRLCNT to find the number
of classification rules, then allocate a buffer of the right size,
then use ETHTOOL_GRXCLSRLALL to fill the buffer. If some other
process inserts or deletes a rule between those two operations,
the user buffer might turn out to be the wrong size.
If it's too small, the return value will be -EMSGSIZE. But if it's
too large, there is no indication of this. Fix this by updating
the rule_cnt field on return.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct the description of ethtool_rxnfc::rule_locs; it is an array
of currently used locations, not all possible valid locations.
Add note that drivers must not use ethtool_rxnfc::rule_locs.
The rule_locs argument to ethtool_ops::get_rxnfc is either NULL or a
pointer to an array of u32, so change the parameter type accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was possible to inadvertently add additional interrupt causes to the
MSI-X other interrupt. This occurred when things such as RX buffer overrun
events were being triggered at the same time as an event such as a Flow
Director table reinit request. In order to avoid this we should be
explicitly programming only the interrupts that we want enabled. In
addition I am renaming the ixgbe_msix_lsc function and interrupt to drop
any implied meaning of this being a link status only interrupt.
Unfortunately the patch is a bit ugly due to the fact that ixgbe_irq_enable
needed to be moved up before ixgbe_msix_other in order to have things
defined in the correct order.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is meant to cleanup some of the code related to SR-IOV and the
interrupt registers. Specifically I am moving the EITRSEL configuration
into the MSI-X configuration section instead of enablement. Also I am
fixing the VF shutdown path since it had operations in the incorrect order.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The reset paths are overly complicated and are either missing steps or
contain extra unnecessary steps such as reading MAC address twice. This
change is meant to help clean up the reset paths an get things functioning
correctly.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change updated the TXDCTL configuration. The main goal is to be much
more explicit about the configuration and avoid a possible fake TX hang
when the interrupt throttle rate is set to 0.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch is a minor whitespace cleanup to compress the device ID
declaration and board type declaration onto the same line. It seems to
make sense since all of the combinations of the two are less than 80
characters and it makes the overall layout a bit more readable.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch drops a set of unnecessary dereferences to the hardware structure
since we already have a local copy of the hardware pointer.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>