Adds a new function t4vf_fl_pkt_align() and use the same in SGE
initialization code to find out freelist packet alignment
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
T4 and T5 hardware will not coalesce Free List PCI-E Fetch Requests if
the Host Driver provides more Free List Pointers than the Fetch Burst
Minimum value. So if we set FBMIN to 64 bytes and the Host Driver
supplies 128 bytes of Free List Pointer data, the hardware will issue two
64-byte PCI-E Fetch Requests rather than a single coallesced 128-byte
Fetch Request. T6 fixes this. So, for T4/T5 we set the FBMIN value to 128
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use freelist capacity instead of freelist size while checking, if
freelist needs to be refilled
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VF administrative mac addresses (stored in the PF driver) are
initialized to zero when the PF driver starts up.
These addresses may be modified in the PF driver through ndo calls
initiated by iproute2 or libvirt.
While we allow the PF/host to change the VF admin mac address from zero
to a valid unicast mac, we do not allow restoring the VF admin mac to
zero. We currently only allow changing this mac to a different unicast mac.
This leads to problems when libvirt scripts are used to deal with
VF mac addresses, and libvirt attempts to revoke the mac so this
host will not use it anymore.
Fix this by allowing resetting a VF administrative MAC back to zero.
Fixes: 8f7ba3ca12 ('net/mlx4: Add set VF mac address support')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reported-by: Moshe Levi <moshele@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The limit of 63 is only for virtual functions while the actual enforcement
was for VFs plus physical functions, fix that.
Fixes: e57968a10b ('net/mlx4_core: Support the HA mode for SRIOV VFs too')
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the mac and vlan register/unregister/replace functions, the driver locks
the mac table mutex (or vlan table mutex) on both ports.
We move to use mutex_lock_nested() to prevent warnings, such as the one below.
[ 101.828445] =============================================
[ 101.834820] [ INFO: possible recursive locking detected ]
[ 101.841199] 4.5.0-rc2+ #49 Not tainted
[ 101.850251] ---------------------------------------------
[ 101.856621] modprobe/3054 is trying to acquire lock:
[ 101.862514] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c10e>] __mlx4_register_mac+0x87e/0xa90 [mlx4_core]
[ 101.874598]
[ 101.874598] but task is already holding lock:
[ 101.881703] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c0f0>] __mlx4_register_mac+0x860/0xa90 [mlx4_core]
[ 101.893776]
[ 101.893776] other info that might help us debug this:
[ 101.901658] Possible unsafe locking scenario:
[ 101.901658]
[ 101.908859] CPU0
[ 101.911923] ----
[ 101.914985] lock(&table->mutex#2);
[ 101.919595] lock(&table->mutex#2);
[ 101.924199]
[ 101.924199] * DEADLOCK *
[ 101.924199]
[ 101.931643] May be due to missing lock nesting notation
Fixes: 5f61385d2e ('net/mlx4_core: Keep VLAN/MAC tables mirrored in multifunc HA mode')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Suggested-by: Doron Tsur <doront@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using the HW VPort counters for traffic (rx/tx packets/bytes)
statistics is wrong. This is because frames dropped due to steering or
out of buffer will be counted as received. To fix that, we move to use
the packet/bytes accounting done by the driver for what the netdev
reports out.
Fixes: f62b8bb8f2 ('net/mlx5: Extend mlx5_core to support [...]')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sum up rx/tx bytes in software as we do for rx/tx packets, to be reported
in upcoming statistics fix.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon changing num_channels, reset the RSS indirection table to
match the new value.
Fixes: 2d75b2bc8a ('net/mlx5e: Add ethtool RSS configuration options')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should modify TIRs explicitly to apply the new RSS configuration.
The light ndo close/open calls do not "refresh" them.
Fixes: 2d75b2bc8a ('net/mlx5e: Add ethtool RSS configuration options')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Readers/Writers lock for SW timecounter was acquired without disabling
interrupts on local CPU.
The problematic scenario:
* HW timestamping is enabled
* Timestamp overflow periodic service task is running on local CPU and
holding write_lock for SW timecounter
* Completion arrives, triggers interrupt for local CPU.
Interrupt routine calls napi_schedule(), which triggers rx/tx
skb process.
An attempt to read SW timecounter using read_lock is done, which is
already locked by a writer on the same CPU and cause soft lockup.
Add irqsave/irqrestore for when using the readers/writers lock for
writing.
Fixes: ef9814deaf ('net/mlx5e: Add HW timestamping (TS) support')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ethtool LRO enable/disable is broken, as of today we only modify TCP
TIRs in order to apply the requested configuration.
Hardware requires that all TIRs pointing to the same RQ should share the
same LRO configuration. For that all other TIRs' LRO fields must be
modified as well.
Fixes: 5c50368f38 ('net/mlx5e: Light-weight netdev open/stop')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the MLX5E_CQ_HAS_CQES optimization flag, the following buggy
flow might occur:
- Suppose RX is always busy, TX has a single packet every second.
- We poll a single TX cqe and clear its flag.
- We never arm it again as RX is always busy.
- TX CQ flag is never changed, and new TX cqes are not polled.
We revert this optimization.
Fixes: e586b3b0ba ('net/mlx5: Ethernet Datapath files')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch just updates the driver to the version fully
tested on STi platforms. This version is Oct_2015.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a threshold now used to also limit the skb allocation
when use zero-copy. This is to avoid that there are incoherence
in the ring due to a failure on skb allocation under very
aggressive testing and under low memory conditions.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to allow this driver to copy tiny frames during the reception
process. This is giving more stability while stressing the driver on STi
embedded systems.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
phy_bus_name can be NULL when "fixed-link" property isn't used.
Then, since "stmmac: do not poll phy handler when attach a switch",
phy_bus_name ptr needs to be checked before strcmp is called.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch avoids to call the stmmac_adjust_link when
the driver is connected to a switch by using the FIXED_PHY
support. Prior this patch the phydev->irq was set as PHY_POLL
so periodically the phy handler was invoked spending useless
time because the link cannot actually change.
Note that the stmmac_adjust_link will be called just one
time and this guarantees that the ST glue logic will be
setup according to the mode and speed fixed.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to fill the first descriptor just before granting
the DMA engine so at the end of the xmit.
The patch takes care about the algorithm adopted to mitigate the
interrupts, then it fixes the last segment in case of no fragments.
Moreover, this new implementation does not pass any "ter" field when
prepare the descriptors because this is not necessary.
The patch also details the memory barrier in the xmit.
As final results, this patch guarantees the same performances
but fixing a case if small datagram are sent. In fact, this
kind of test is impacted if no coalesce is done.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dirty index can be updated out of the loop where all the
tx resources are claimed. This will help on performances too.
Also a useless debug printk has been removed from the main loop.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch "inline" get_tx_owner and get_ls routines. It Results in a
unique read to tdes0, instead of three, to check TX_OWN and LS bits,
and other status bits.
It helps improve driver TX path by removing two uncached read/writes
inside TX clean loop for enhanced descriptors but not for normal ones
because the des1 must be read in any case.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to optimize the way to manage the TDES inside the
xmit function. When prepare the frame, some settings (e.g. OWN
bit) can be merged. This has been reworked to improve the tx
performances.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RDES0 register can be read several times while doing RX of a
packet.
This patch slightly improves RX path performance by reading rdes0
once for two operation: check rx owner, get rx status bits.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Optimize tx_clean by avoiding a des3 read in stmmac_clean_desc3().
In ring mode, TX, des3 seems only used when xmit a jumbo frame.
In case of normal descriptors, it may also be used for time
stamping.
Clean it in the above two case, without reading it.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
last_segment field is read twice from dma descriptors in stmmac_clean().
Add last_segment to dma data so that this flag is from priv
structure in cache instead of memory.
It avoids reading twice from memory for each loop in stmmac_clean().
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the code pulls out the length field when
unmapping a buffer directly from the descriptor. This will result
in an uncached read to a dma_alloc_coherent() region. There is no
need to do this, so this patch simply puts the value directly into
a data structure which will hit the cache.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to rework the ring management now optimized.
The indexes into the ring buffer are always incremented, and
the entry is accessed via doing a modulo to find the "real"
position in the ring.
It is inefficient, modulo is an expensive operation.
The formula [(entry + 1) & (size - 1)] is now adopted on
a ring that is power-of-2 in size.
Then, the number of elements cannot be set by command line but
it is fixed.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch completely changes the descriptor layout to improve
the whole performances due to the single read usage of the
descriptors in critical paths.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch restructures the DMA bus settings and this is done
by introducing a new platform structure used for programming
the AXI Bus Mode Register inside the DMA module.
This structure can be populated from device-tree as documented in the
binding txt file.
After initializing the DMA, the AXI register can be optionally tuned
for platform drivers based.
This patch also reworks some parameters to make coherent the DMA
configuration now that AXI register is introduced.
For example, the burst_len is managed by using the mentioned axi
support above; so the snps,burst-len parameter has been removed.
It makes sense to provide the AAL parameter from DT to Address-Aligned
Beats inside the Register0 and review the PBL settings when initialize
the engine.
For PCI glue, rebuilding the story of this setting, it
was added to align a configuration so not for fixing some
known problem. No issue raised after this patch.
It is safe to use the default burst length instead of
tuning it to the maximum value
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to share the same reset procedure between dwmac100 and
dwmac1000 chips.
This will also help on enhancing the driver and support new chips.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of problems when initializing the chip, the error flows aren't
being properly done. Specifically, it's possible that the chip would be
left in a configuration allowing it [internally] to access the host
memory, causing fatal problems in the device that would require power
cycle to overcome.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current statistics logic is meant for L2, not for all future protocols.
Move this content to the proper designated file.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BB_A0 is a development model that is will not reach actual clients.
In fact, future firmware would simply fail to initialize such chip.
This changes the configuration into B0 instead of A0, and adds a safeguard
against the slim chance someone would actually try this with an A0 adapter
in which case probe would gracefully fail.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver learns the inner bar sized from a register configured by management
firmware, but older versions are not setting this register.
But since we know which values were configured back then, use them instead.
Signed-off-by: Ram Amrani <Ram.Amrani@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For RTL8168G/RTL8168H/RTL8411B/RTL8107E, enable this flag to eliminate
message "AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0002
address=0x0000000000003000 flags=0x0050] in dmesg.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use managed resource functions devm_kzalloc and pcim_enable_device
to simplify error handling. Subsequently, remove unnecessary
kfree, pci_disable_device and pci_release_regions.
To be compatible with the change, various gotos are replaced with
direct returns and unneeded labels are dropped.
Also, `sc` was only being freed in the probe function and not the
remove function before the change. By using devm_kzalloc this patch
also fixes this memory leak.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For consistency with other event data structs and to lessen
the chance of a mistake should one of the reserved fields become
used in the future, define the reserved fields as little-endian.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There were no missing endianness conversions in this case, but the
fields of struct cfc_del_event_data should be defined as little-endian
to get rid of the ugly (__force __le32) casts.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not really a bug, but it was odd that bnx2x_eq_int() read the
message data as if it were a cfc_del_event regardless of the event type.
It's cleaner to access only the appropriate member of union event_data
after checking the event opcode.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On ppc64 the PF did not receive messages from VFs correctly.
Fields of struct vf_pf_event_data are little-endian.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a VF is sending a message to the PF, it needs to trigger the PF
to tell it the message is ready.
The trigger did not work on ppc64. No interrupt appeared in the PF.
The bug is due to confusion about the layout of struct trigger_vf_zone.
In bnx2x_send_msg2pf() the trigger is written using writeb(), not
writel(), so the attempt to define the struct with a reversed layout on
big-endian is counter-productive.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2x crashes during the initialization of the 8021q module on ppc64.
The bug is a missing conversion from le32 in
bnx2x_handle_classification_eqe() when obtaining the cid value from
struct eth_event_data.
The fields in struct eth_event_data should all be declared as
little-endian and conversions added where missing.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch utilizes the attention infrastructure to log additional
information that relates only to specific HW blocks.
For some of those HW blocks, it also stops automatically disabling the
attention generation as the attention is considered benign and thus
should only be logged; No fear of it flooding the system.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each HW block contains common information about attention reasons,
raising a bit for each one of the different sub-reasons that caused it
to raise an attention.
This patch extends the infrastructure by allowing logging of the various
reasons causing the HW blocks to generate an attention.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HW is capable of generating attentnions for a multitude of reasons,
but current driver is enabling attention generation only for management
firmware [required for link notifications].
This patch enables almost all of the possible reasons for HW attentions,
logging the HW block generating the attention and preventing further
attentions from that source [to prevent possible attention flood].
It also lays the infrastructure for additional exploration of the various
attentions.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid double mapping of io mapped memory, Device page may be
mapped to non-cached(NC) or to write-combining(WC).
The code before this fix tries to map it both to WC and NC
contrary to what stated in Intel's software developer manual.
Here we remove the global WC mapping of all UARS
"dev->priv.bf_mapping", since UAR mapping should be decided
per UAR (e.g we want different mappings for EQs, CQs vs QPs).
Caller will now have to choose whether to map via
write-combining API or not.
mlx5e SQs will choose write-combining in order to perform
BlueFlame writes.
Fixes: 88a85f99e5 ('TX latency optimization to save DMA reads')
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Reviewed-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling mlx5e_set_coalesce while the interface is down will result in
modifying CQs that don't exist.
Fixes: f62b8bb8f2 ('net/mlx5: Extend mlx5_core to support ConnectX-4
Ethernet functionality')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If CQ moderation is not supported by the device, print a warning on
netdevice load, and return error when trying to modify/query cq
moderation via ethtool.
Fixes: f62b8bb8f2 ('net/mlx5: Extend mlx5_core to support ConnectX-4
Ethernet functionality')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By its role, there is no need to set all the other parameters
for the drop RQ.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For data cache locality considerations, we moved the nop and
csum_offload_inner within sq_stats struct as they are more
commonly accessed in xmit path.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of the pair (channel, tc), we now use a single number that
goes over all tx queues of a TC, for all TCs.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
More proper to declare carrier state UP only after the channels
are ready for traffic.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We only need to flush the irq handler to make sure it does not
queue a work into the global work queue after we start to flush it.
So using synchronize_irq() is more appropriate than a spin lock.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Disable auto negotiation on init to properly detect an already plugged
cable at boot.
At boot, when the phy is started, it is in the PHY_UP state.
However, if a cable is plugged at boot, because auto negociation is already
enabled at the time we get the first interrupt, the phy is already running.
But the state machine then switches from PHY_UP to PHY_AN and calls
phy_start_aneg(). phy_start_aneg() will not do anything because aneg is
already enabled on the phy. It will then wait for a interrupt before going
further. This interrupt will never happen unless the cable is unplugged and
then replugged.
It was working properly before 321beec504 (net: phy: Use interrupts when
available in NOLINK state) because switching to NOLINK meant starting
polling the phy, even if IRQ were enabled.
Fixes: 321beec504 (net: phy: Use interrupts when available in NOLINK state)
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At least on ksz8081, when getting back from power down, interrupts are
disabled. ensure they are reenabled if they were previously enabled.
This fixes resuming which is failing on the xplained boards from atmel
since 321beec504 (net: phy: Use interrupts when available in NOLINK
state)
Fixes: 321beec504 (net: phy: Use interrupts when available in NOLINK state)
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement port_vlan_filtering in the driver to toggle the related port
802.1Q mode between DISABLED and SECURE, on user request.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that ports isolation is correctly configured when joining or leaving
a bridge, there is no need to rely on reserved VLANs to isolate
unbridged ports anymore. Thus remove them, and disable 802.1Q on setup.
This restores the expected behavior of hardware bridging for systems
without 802.1Q or VLAN filtering enabled.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The In Chip Port Based VLAN Table contains bits used to restrict which
output ports this input port can send frames to.
With the VLAN filtering enabled, these tables work in conjunction with
the VLAN Table Unit to allow egressing frames.
In order to remove the current dependency to BRIDGE_VLAN_FILTERING for
basic hardware bridging to work, it is necessary to restore a fine
control of each port's VLANTable, on setup and when a port joins or
leaves a bridge.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Give a new bridge a fresh FDB, assign it to its members, and restore a
fresh FDB to a port leaving a bridge.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Restore per-port FDB. Assign them on setup, allow adding and deleting
addresses into them, and dump them.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a _mv88e6xxx_fid_new function which gives and flushes the lowest FID
available. Call it when preparing a new VTU entry.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move out the code which dumps a single FDB to its own function.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename _mv88e6xxx_vlan_init in _mv88e6xxx_vtu_new, eventually called
from a new _mv88e6xxx_vtu_get function, which abstracts the VTU GetNext
VID-1 trick to retrieve a single entry.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ppp_read() and ppp_poll() can be called concurrently with ppp_ioctl().
In this case, ppp_ioctl() might call ppp_ccp_closed(), which may update
ppp->flags while ppp_read() or ppp_poll() is reading it.
The update done by ppp_ccp_closed() isn't atomic due to the bit mask
operation ('ppp->flags &= ~(SC_CCP_OPEN | SC_CCP_UP)'), so concurrent
readers might get transient values.
Reading incorrect ppp->flags may disturb the 'ppp->flags & SC_LOOP_TRAFFIC'
test in ppp_read() and ppp_poll(), which in turn can lead to improper
decision on whether the PPP unit file is ready for reading or not.
Since ppp_ccp_closed() is protected by the Rx and Tx locks (with
ppp_lock()), taking the Rx lock is enough for ppp_read() and ppp_poll()
to guarantee that ppp_ccp_closed() won't update ppp->flags
concurrently.
The same reasoning applies to ppp->n_channels. The 'n_channels' field
can also be written to concurrently by ppp_ioctl() (through
ppp_connect_channel() or ppp_disconnect_channel()). These writes aren't
atomic (simple increment/decrement), but are protected by both the Rx
and Tx locks (like in the ppp->flags case). So holding the Rx lock
before reading ppp->n_channels also prevents concurrent writes.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow a user to split or unsplit a port using the newly introduced
devlink ops.
Once split, the original netdev is destroyed and 2 or 4 others are
created, according to user configuration. The new ports are like any
other port, with the sole difference of supporting a lower maximum
speed. When unsplit, the reverse process takes place.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When splitting and unsplitting we'll destroy usable ports on the fly, so
mark them using a NULL pointer to indicate that their local port number
is free and can be re-used.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The port netdevs are each associated with a different local port number
in the device. These local ports are grouped into groups of 4 (e.g.
(1-4), (5-8)) called clusters. The cluster constitutes the one of two
possible modules they can be mapped to. This mapping is board-specific
and done by the device's firmware during init.
When splitting a port by 4, the device requires us to first unmap all
the ports in the cluster and then map each to a single lane in the module
associated with the port netdev used as the handle for the operation.
This means that two port netdevs will disappear, as only 100Gb/s (4
lanes) ports can be split and we are guaranteed to have two of these
((1, 3), (5, 7) etc.) in a cluster.
When unsplit occurs we need to reinstantiate the two original 100Gb/s
ports and map each to its origianl module. Therefore, during driver init
store the initial local port to module mapping, so it can be used later
during unsplitting.
Note that a by 2 split doesn't require us to store the mapping, as we
only need to reinstantiate one port whose module is known.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When splitting a port we replace it with 2 or 4 other ports. To be able
to do that we need to remove the original port netdev and unmap it from
its module. However, we first mark it as disabled, as active ports
cannot be unmapped.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add middle layer in mlxsw core code to forward port split/unsplit calls
into specific ASIC drivers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, there has been an mlx4-specific sysfs file allowing user to
change port type to either Ethernet of InfiniBand. This is very
inconvenient.
Allow to expose the same ability to set port type in a generic way
using devlink interface.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
v2->v3:
-add dev param to devlink_register (api change)
Signed-off-by: David S. Miller <davem@davemloft.net>
In the original series drivers would get offload requests for cls_u32
rules even if the feature bit is disabled. This meant the driver had
to do a boiler plate check on the feature bit before adding/deleting
the rule.
This patch lifts the check into the core code and removes it from the
driver specific case.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rx headroom for veth dev is the peer device needed_headroom.
Avoid ping-pong updates setting the private flag IFF_PHONY_HEADROOM.
This avoids skb head reallocation when forwarding from a veth dev
towards a device adding some kind of encapsulation.
When transmitting frames below the MTU size towards a vxlan device,
this gives about 10% performance speed-up when OVS is used to connect
the veth and the vxlan device and a little more when using a
plain Linux bridge.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ndo_set_rx_headroom controls the align value used by tun devices to
allocate skbs on frame reception.
When the xmit device adds a large encapsulation, this avoids an skb
head reallocation on forwarding.
The measured improvement when forwarding towards a vxlan dev with
frame size below the egress device MTU is as follow:
vxlan over ipv6, bridged: +6%
vxlan over ipv6, ovs: +7%
In case of ipv4 tunnels there is no improvement, since the tun
device default alignment provides enough headroom to avoid the skb
head reallocation.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is used to send NVM_FIND_DIR_ENTRY messages which can return error
if the entry is not found. This is normal and the error message will
cause unnecessary alarm, so silence it.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new function bnxt_do_send_msg() to do essentially the same thing
with an additional paramter to silence error response messages. All
current callers will set silent to false.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For everything to fit, we remove the PHY microcode version and replace it
with the firmware package version in the fw_version string.
Signed-off-by: Rob Swindell <swindell@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use appropriate firmware request header structure to prepare the
firmware messages. This avoids the unnecessary conversion of the
fields to 32-bit fields. Add appropriate endian conversion when
printing out the message fields in dmesg so that they appear correct
in the log.
Reported-by: Rob Swindell <swindell@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before this patch, we used a hardcoded value of 500 msec as the default
value for firmware message response timeout. For better portability with
future hardware or debug platforms, use the value provided by firmware in
the first response and store it for all susequent messages. Redefine the
macro HWRM_CMD_TIMEOUT to the stored value. Since we don't have the
value yet in the first message, use the 500 ms default if the stored value
is zero.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When tx and rx rings don't share the same completion ring, tx coalescing
parameters can be set differently from the rx coalescing parameters.
Otherwise, use rx coalescing parameters on shared completion rings.
Adjust rx coalescing default values to lower interrupt rate.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a function to set all the coalescing parameters. The function can
be used later to set both rx and tx coalescing parameters.
v2: Fixed function parameters formatting requested by DaveM.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't convert these to internal hardware tick values before storing
them. This avoids the confusion of ethtool -c returning slightly
different values than the ones set using ethtool -C when we convert
hardware tick values back to micro seconds. Add better comments for
the hardware settings.
Also, rename the current set of coalescing fields with rx_ prefix.
The next patch will add support of tx coalescing values.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During remove_one() when SRIOV is enabled, the PF driver
should broadcast PF driver unload notification to all
VFs that are attached to VMs. Upon receiving the PF
driver unload notification, the VF driver should print
a warning message to message log. Certain operations on the
VF may not succeed after the PF has unloaded.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the VF to setup its own MAC address if the PF has not administratively
set it for the VF. To do that, we should always store the MAC address
from the firmware. There are 2 cases:
1. The MAC address is valid. This MAC address is assigned by the PF and
it needs to override the current VF MAC address.
2. The MAC address is zero. The VF will use a random MAC address by default.
By storing this 0 MAC address in the VF structure, it will allow the VF
user to change the MAC address later using ndo_set_mac_address() when
it sees that the stored MAC address is 0.
v2: Expanded descriptions and added more comments.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use list_move_tail() to move MAC address entry from list of pending
to list of active entries. Simple list_add_tail() leaves the entry
also in the first list, this leads to list corruption.
Cc: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEcBAABCgAGBQJW0AKVAAoJED07qiWsqSVqlwYH/1wxn03OA5V3zsp7zjt+4095
ToUY7RFlRB5zLNCWE3uz37pRsiDv4TcJ/VwgtMlkvskjdFmQQOveEhJYKx1PVq/C
l1hOcPRdvn9Ji7WQp5gQGlT4zymQWU0//6oQGx5MkuaLhRhLXoRSIaj5WDNya6Iy
AqVJhi3hMoYWiMuV2bkvSicPZ39OCtDejJbGL4D1u6wGvHje3emGtxNoIzx3dJ81
jBiE5aNRnC5c5jtz4veEVIRFhYndvfUGE5deDG2K7wgLt4IGE0zLZAuf4NgbpbV7
Xi/Nt5LCRYepol9AoI3qd1vuH8LK7YAFcyB2NCqcZbvsqo1RXsuLZWQExSpsmfc=
=634Q
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-4.6-20160226' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2016-02-26
this is a pull request of 3 patch for net-next/master.
There are two patches by Simon Horman, in which the device tree support
for the rcar_can driver is improved. One patch by me fixes the bad
coding style of the ems_usb driver which was introduced recently.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEcBAABCgAGBQJW0ACIAAoJED07qiWsqSVqraYIALM+0BeB5V2g2kyjvsQS2Zah
1139UfEwVvb3djUhEvq68+zsXwPVrBdf2Kk8KgQxOzB3ctKRp8JtN7HPLWLXrgLw
WSFRhye/xUuPrUgiuwZ8cGR1QiBAuA/xRgFR8re7+pQ2nEzlx53neWolnjuesuWo
5MKoyPK7LSIgkZD7VteJklpScDQDqGcbc6w6pe1yg8gwQZoDycD1+Hqj4JKehqJN
J0W6puBL6urs9KtCzZRzMzEk09rH5BanQsAlCf4NkE0zvnPmtjptmsgRAo4rpAz3
6L6C+B1gph04eb+dXKsIRAVdN+kCHt1Sa7PlmfxXrY2hgVLvb+pXwJ6p3O6DRXE=
=L95V
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-4.5-20160226' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2016-02-26
this is a pull request of one patch for net.
The patch by Maximilain Schneider fixes a kfree() problem during disconnect in
the gs_usb driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ethtool operations of set_pauseram and get_pauseparm.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not required after commit cd772de358
("phy: keep pause flags in phy driver features")
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace devid to chipid & chiprev for easy access.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the user to set and retrieve speed and duplex of the
hv_netvsc device via ethtool.
Example:
$ ethtool eth0
Settings for eth0:
...
Speed: Unknown!
Duplex: Unknown! (255)
...
$ ethtool -s eth0 speed 1000 duplex full
$ ethtool eth0
Settings for eth0:
...
Speed: 1000Mb/s
Duplex: Full
...
This is based on patches by Roopa Prabhu and Nikolay Aleksandrov.
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently, I fixed a bug in 3c59x:
commit 6e144419e4
Author: Neil Horman <nhorman@tuxdriver.com>
Date: Wed Jan 13 12:43:54 2016 -0500
3c59x: fix another page map/single unmap imbalance
Which correctly rebalanced dma mapping and unmapping types. Unfortunately it
introduced a new bug which causes oopses on older systems.
When mapping dma regions, the last entry for a packet in the 3c59x tx ring
encodes a LAST_FRAG bit, which is encoded as the high order bit of the buffers
length field. When it is unmapped the LAST_FRAG bit is cleared prior to being
passed to the unmap function. Unfortunately the commit above fails to do that
masking. It was missed in testing because the system on which I tested it had
an intel iommu, the driver for which ignores the size field, using only the DMA
address as the token to identify the mapping to be released. However, on older
systems that rely on swiotlb (or other dma drivers that key off that length
field), not masking off that LAST_FRAG high order bit results in parsing a huge
size to be release, leading to all sorts of odd corruptions and the like.
Fix is easy, just mask the length with 0xFFF. It should really be
&(LAST_FRAG-1), but 0xFFF is the style of the file, and I'd like to make this
fix minimal and correct before making it prettier.
Appies to the net tree cleanly. All testing on both iommu and swiommu based
systems produce good results
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 5b6490def9 ("3c59x: Use setup_timer()") Amitoj
removed add_timer which sets up the epires timer. In this patch
the behavior is restore but it uses mod_timer which is a bit more
compact.
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We intended to return PTR_ERR() here instead of 1.
Fixes: 1f9993f682 ('rocker: fix a neigh entry leak issue')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the send skbuff reaches the end, nlmsg_put and friends returns
-EMSGSIZE but it is silently thrown away in ndo_fdb_dump. It is called
within a for_each_netdev loop and the first fdb entry of a following
netdev could fit in the remaining skbuff. This breaks the mechanism
of cb->args[0] and idx to keep track of the entries that are already
dumped, which results missing entries in bridge fdb show command.
Signed-off-by: Minoura Makoto <minoura@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>