Commit Graph

576492 Commits

Author SHA1 Message Date
Hariprasad Shenai
cb440364c7 cxgb4vf: Make sge init code more readable
Adds a new function t4vf_fl_pkt_align() and use the same in SGE
initialization code to find out freelist packet alignment

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:46:29 -05:00
Hariprasad Shenai
edadad80d6 cxgb4/cxgb4vf: For T6 adapter, set FBMIN to 64 bytes
T4 and T5 hardware will not coalesce Free List PCI-E Fetch Requests if
the Host Driver provides more Free List Pointers than the Fetch Burst
Minimum value.  So if we set FBMIN to 64 bytes and the Host Driver
supplies 128 bytes of Free List Pointer data, the hardware will issue two
64-byte PCI-E Fetch Requests rather than a single coallesced 128-byte
Fetch Request. T6 fixes this. So, for T4/T5 we set the FBMIN value to 128

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:46:29 -05:00
Hariprasad Shenai
da08e4259f cxgb4/cxgb4vf: Use fl capacity to check if fl needs to be replenished
Use freelist capacity instead of freelist size while checking, if
freelist needs to be refilled

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:46:29 -05:00
David S. Miller
11351bf762 Merge branch 'mlx4-fixes'
Or Gerlitz says:

====================
Mellanox 10/40G mlx4 driver fixes for 4.5-rc6

This series contains two fixes for the SRIOV HW LAG that was
introduced in 4.5-rc1 and one fix that allows to revoke the
administrative MAC that was assigned to VF through the PF.

The VF mac fix needs to go for stable too.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:42:46 -05:00
Jack Morgenstein
6e5224224f net/mlx4_core: Allow resetting VF admin mac to zero
The VF administrative mac addresses (stored in the PF driver) are
initialized to zero when the PF driver starts up.

These addresses may be modified in the PF driver through ndo calls
initiated by iproute2 or libvirt.

While we allow the PF/host to change the VF admin mac address from zero
to a valid unicast mac, we do not allow restoring the VF admin mac to
zero. We currently only allow changing this mac to a different unicast mac.

This leads to problems when libvirt scripts are used to deal with
VF mac addresses, and libvirt attempts to revoke the mac so this
host will not use it anymore.

Fix this by allowing resetting a VF administrative MAC back to zero.

Fixes: 8f7ba3ca12 ('net/mlx4: Add set VF mac address support')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reported-by: Moshe Levi <moshele@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:42:46 -05:00
Moni Shoua
00ada91039 net/mlx4_core: Check the correct limitation on VFs for HA mode
The limit of 63 is only for virtual functions while the actual enforcement
was for VFs plus physical functions, fix that.

Fixes: e57968a10b ('net/mlx4_core: Support the HA mode for SRIOV VFs too')
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:42:45 -05:00
Jack Morgenstein
03a79f31ef net/mlx4_core: Fix lockdep warning in handling of mac/vlan tables
In the mac and vlan register/unregister/replace functions, the driver locks
the mac table mutex (or vlan table mutex) on both ports.

We move to use mutex_lock_nested() to prevent warnings, such as the one below.

[ 101.828445] =============================================
[ 101.834820] [ INFO: possible recursive locking detected ]
[ 101.841199] 4.5.0-rc2+  #49 Not tainted
[ 101.850251] ---------------------------------------------
[ 101.856621] modprobe/3054 is trying to acquire lock:
[ 101.862514] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c10e>] __mlx4_register_mac+0x87e/0xa90 [mlx4_core]
[ 101.874598]
[ 101.874598] but task is already holding lock:
[ 101.881703] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c0f0>] __mlx4_register_mac+0x860/0xa90 [mlx4_core]
[ 101.893776]
[ 101.893776] other info that might help us debug this:
[ 101.901658] Possible unsafe locking scenario:
[ 101.901658]
[ 101.908859] CPU0
[ 101.911923] ----
[ 101.914985] lock(&table->mutex#2);
[ 101.919595] lock(&table->mutex#2);
[ 101.924199]
[ 101.924199] * DEADLOCK *
[ 101.924199]
[ 101.931643] May be due to missing lock nesting notation

Fixes: 5f61385d2e ('net/mlx4_core: Keep VLAN/MAC tables mirrored in multifunc HA mode')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Suggested-by: Doron Tsur <doront@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:42:45 -05:00
David S. Miller
ebc363f577 Merge branch 'mlx5-fixes'
Saeed Mahameed says:

====================
Mellanox 100G mlx5 driver fixes

This series has few bug fixes for the mlx5 Ethernet driver.

Eran fixed a locking issue with time-stamping that could cause a
soft-lockup when time-stamping is enabled.

Gal fixed the rx/tx packets/bytes counters returned by the driver to
actually went through the network stack.

Tariq removed a poll CQ optimization which could lead the driver to
stop getting interrupts for some of the rings, and a did also fix to
HW LRO which is currently broken.

He also provided RSS and RX hash fixes for the case of changing the
number of rx rings the RX hash/RSS configuration will be out of sync.

The time stamping fix from Eran is not for -stable as the feature was
only introduced in 4.5 but all of the others are.

Changes fro V0:
	- Eran addressed the irqsave/restore comments from "Dave" and fixed them.

This series is generated against net commit 4c0b6eaf37 'net:
thunderx: Fix for Qset error due to CQ full'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:37:27 -05:00
Gal Pressman
faf4478bf8 net/mlx5e: Provide correct packet/bytes statistics
Using the HW VPort counters for traffic (rx/tx packets/bytes)
statistics is wrong. This is because frames dropped due to steering or
out of buffer will be counted as received. To fix that, we move to use
the packet/bytes accounting done by the driver for what the netdev
reports out.

Fixes: f62b8bb8f2 ('net/mlx5: Extend mlx5_core to support [...]')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:37:26 -05:00
Gal Pressman
b081da5ee1 net/mlx5e: Add rx/tx bytes software counters
Sum up rx/tx bytes in software as we do for rx/tx packets, to be reported
in upcoming statistics fix.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:37:26 -05:00
Tariq Toukan
85082dba0a net/mlx5e: Correctly handle RSS indirection table when changing number of channels
Upon changing num_channels, reset the RSS indirection table to
match the new value.

Fixes: 2d75b2bc8a ('net/mlx5e: Add ethtool RSS configuration options')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:37:26 -05:00
Tariq Toukan
bdfc028de1 net/mlx5e: Fix ethtool RX hash func configuration change
We should modify TIRs explicitly to apply the new RSS configuration.
The light ndo close/open calls do not "refresh" them.

Fixes: 2d75b2bc8a ('net/mlx5e: Add ethtool RSS configuration options')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:37:25 -05:00
Eran Ben Elisha
0ad9b20415 net/mlx5e: Fix soft lockup when HW Timestamping is enabled
Readers/Writers lock for SW timecounter was acquired without disabling
interrupts on local CPU.

The problematic scenario:
* HW timestamping is enabled
* Timestamp overflow periodic service task is running on local CPU and
  holding write_lock for SW timecounter
* Completion arrives, triggers interrupt for local CPU.
  Interrupt routine calls napi_schedule(), which triggers rx/tx
  skb process.
  An attempt to read SW timecounter using read_lock is done, which is
  already locked by a writer on the same CPU and cause soft lockup.

Add irqsave/irqrestore for when using the readers/writers lock for
writing.

Fixes: ef9814deaf ('net/mlx5e: Add HW timestamping (TS) support')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:37:25 -05:00
Tariq Toukan
ab0394fe2c net/mlx5e: Fix LRO modify
Ethtool LRO enable/disable is broken, as of today we only modify TCP
TIRs in order to apply the requested configuration.

Hardware requires that all TIRs pointing to the same RQ should share the
same LRO configuration. For that all other TIRs' LRO fields must be
modified as well.

Fixes: 5c50368f38 ('net/mlx5e: Light-weight netdev open/stop')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:37:25 -05:00
Tariq Toukan
59a7c2fd33 net/mlx5e: Remove wrong poll CQ optimization
With the MLX5E_CQ_HAS_CQES optimization flag, the following buggy
flow might occur:
- Suppose RX is always busy, TX has a single packet every second.
- We poll a single TX cqe and clear its flag.
- We never arm it again as RX is always busy.
- TX CQ flag is never changed, and new TX cqes are not polled.

We revert this optimization.

Fixes: e586b3b0ba ('net/mlx5: Ethernet Datapath files')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:37:24 -05:00
David S. Miller
d93e4093fb Merge branch 'stmmac-optimizations'
Alexandre TORGUE says:

====================
stmmac: enhance driver performances and update the version

According to Giuseppe, I send the v3 series.

This is a subset of patches to rework the driver in order to improve its
performances and make it more robust under stress conditions.

All patches have been ported on STi mainstream kernel branch and
tested on ARM STiH4xx platforms and newer ones.

This series also updates the driver version and prepares it
to include further development to support new chips.

In detail, these patches are:

o to rework and improve the internal DMA bus settings

  Fine tuning is mandatory on some platforms for both
  performance and stability issues.

o to rework and optimize the descriptor management.

  This will help a lot on performance side and preparing
  the inclusion on the GMAC4.x.

o to add a set of optimizations for both xmit and rx functions.

  These will help a lot on performance side and making the driver
  more robust in case of low memory conditions and under some
  stress test, performed for example on IP-STB.

Below some throughput figures obtained on some boxes before and after
the patches.

                       nuttcp (mbps)       iperf (Mbps)
------------------------------------------------------------------
                      tcp     udp          tcp      udp
                   tx   rx   tx  rx      tx   rx   tx  rx
                    ------------------------------------------
   old             680   800 480  506    760  800   600  700
   new             830   880 540  630    840  880   700   800

V2: - rx_copybreak is now managed by using ethtool.
V3: - improve comments on PCIe detailing that there are no regressions
    - rework some APIs to properly define some params as bool as expected
    - rework the formula to get the element inside the ring. Comparing V2,
	patches 4 and 13 have been merged because the same formula have been
	used. After this rework, no evident benefit has been noticed in terms
	of performances so the table above is still valid. Disassembling the
	code for SH4 and ARM, with the new formula just an instr is saved
	(depending on compiler flags) and this gives us not so relevanti gain,
	for example, on SH4 where some instr are executed in the same pipeline
	stage.
	Ring sizes are now fixed and maybe they can be reworked to be tuned
	w/o using stmmaceth= cmdline option. Indeed, nobody change these sizes
	and indeed the numbers selected by default respect the budget and
	avoid to pass invalid setup. These are the best driver default sizes
	for ring and chain.

====================
2016-03-02 14:21:34 -05:00
Giuseppe Cavallaro
3796e44ddc stmmac: update version to Oct_2015
This patch just updates the driver to the version fully
tested on STi platforms. This version is Oct_2015.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:34 -05:00
Giuseppe Cavallaro
120e87f91e stmmac: tune rx copy via threshold.
There is a threshold now used to also limit the skb allocation
when use zero-copy. This is to avoid that there are incoherence
in the ring due to a failure on skb allocation under very
aggressive testing and under low memory conditions.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:33 -05:00
Giuseppe Cavallaro
22ad383815 stmmac: do not perform zero-copy for rx frames
This patch is to allow this driver to copy tiny frames during the reception
process. This is giving more stability while stressing the driver on STi
embedded systems.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:33 -05:00
Fabrice Gasnier
8ecd80a5f6 stmmac: fix phy init when attached to a phy
phy_bus_name can be NULL when "fixed-link" property isn't used.
Then, since "stmmac: do not poll phy handler when attach a switch",
phy_bus_name ptr needs to be checked before strcmp is called.

Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:33 -05:00
Giuseppe Cavallaro
8e99fc5f88 stmmac: do not poll phy handler when attach a switch
This patch avoids to call the stmmac_adjust_link when
the driver is connected to a switch by using the FIXED_PHY
support. Prior this patch the phydev->irq was set as PHY_POLL
so periodically the phy handler was invoked spending useless
time because the link cannot actually change.
Note that the stmmac_adjust_link will be called just one
time and this guarantees that the ST glue logic will be
setup according to the mode and speed fixed.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:33 -05:00
Giuseppe Cavallaro
0e80bdc9a7 stmmac: first frame prep at the end of xmit routine
This patch is to fill the first descriptor just before granting
the DMA engine so at the end of the xmit.
The patch takes care about the algorithm adopted to mitigate the
interrupts, then it fixes the last segment in case of no fragments.
Moreover, this new implementation does not pass any "ter" field when
prepare the descriptors because this is not necessary.
The patch also details the memory barrier in the xmit.

As final results, this patch guarantees the same performances
but fixing a case if small datagram are sent. In fact, this
kind of test is impacted if no coalesce is done.

Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:33 -05:00
Giuseppe Cavallaro
fbc80823a9 stmmac: set dirty index out of the loop
The dirty index can be updated out of the loop where all the
tx resources are claimed. This will help on performances too.
Also a useless debug printk has been removed from the main loop.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:32 -05:00
Fabrice Gasnier
c363b6586c stmmac: optimize tx clean function
This patch "inline" get_tx_owner and get_ls routines. It Results in a
unique read to tdes0, instead of three, to check TX_OWN and LS bits,
and other status bits.

It helps improve driver TX path by removing two uncached read/writes
inside TX clean loop for enhanced descriptors but not for normal ones
because the des1 must be read in any case.

Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:32 -05:00
Giuseppe Cavallaro
be434d5075 stmmac: optimize tx desc management
This patch is to optimize the way to manage the TDES inside the
xmit function. When prepare the frame, some settings (e.g. OWN
bit) can be merged. This has been reworked to improve the tx
performances.

Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:32 -05:00
Fabrice Gasnier
c1fa3212be stmmac: merge get_rx_owner into rx_status routine.
The RDES0 register can be read several times while doing RX of a
packet.
This patch slightly improves RX path performance by reading rdes0
once for two operation: check rx owner, get rx status bits.

Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:32 -05:00
Giuseppe Cavallaro
96951366ce stmmac: add is_jumbo field to dma data
Optimize tx_clean by avoiding a des3 read in stmmac_clean_desc3().

In ring mode, TX, des3 seems only used when xmit a jumbo frame.
In case of normal descriptors, it may also be used for time
stamping.
Clean it in the above two case, without reading it.

Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:31 -05:00
Giuseppe Cavallaro
2a6d8e1726 stmmac: add last_segment field to dma data
last_segment field is read twice from dma descriptors in stmmac_clean().
Add last_segment to dma data so that this flag is from priv
structure in cache instead of memory.
It avoids reading twice from memory for each loop in stmmac_clean().

Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:31 -05:00
Giuseppe Cavallaro
553e2ab313 stmmac: add length field to dma data
Currently, the code pulls out the length field when
unmapping a buffer directly from the descriptor. This will result
in an uncached read to a dma_alloc_coherent() region. There is no
need to do this, so this patch simply puts the value directly into
a data structure which will hit the cache.

Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:31 -05:00
Giuseppe Cavallaro
e3ad57c967 stmmac: review RX/TX ring management
This patch is to rework the ring management now optimized.
The indexes into the ring buffer are always incremented, and
the entry is accessed via doing a modulo to find the "real"
position in the ring.
It is inefficient, modulo is an expensive operation.

The formula [(entry + 1) & (size - 1)] is now adopted on
a ring that is power-of-2 in size.
Then, the number of elements cannot be set by command line but
it is fixed.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:31 -05:00
Giuseppe Cavallaro
293e4365a1 stmmac: change descriptor layout
This patch completely changes the descriptor layout to improve
the whole performances due to the single read usage of the
descriptors in critical paths.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:30 -05:00
Giuseppe Cavallaro
afea03656a stmmac: rework DMA bus setting and introduce new platform AXI structure
This patch restructures the DMA bus settings and this is done
by introducing a new platform structure used for programming
the AXI Bus Mode Register inside the DMA module.
This structure can be populated from device-tree as documented in the
binding txt file.

After initializing the DMA, the AXI register can be optionally tuned
for platform drivers based.
This patch also reworks some parameters to make coherent the DMA
configuration now that AXI register is introduced.
For example, the burst_len is managed by using the mentioned axi
support above; so the snps,burst-len parameter has been removed.
It makes sense to provide the AAL parameter from DT to Address-Aligned
Beats inside the Register0 and review the PBL settings when initialize
the engine.

For PCI glue, rebuilding the story of this setting, it
was added to align a configuration so not for fixing some
known problem. No issue raised after this patch.
It is safe to use the default burst length instead of
tuning it to the maximum value

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:30 -05:00
Giuseppe Cavallaro
495db27302 stmmac: share reset function between dwmac100 and dwmac1000
This patch is to share the same reset procedure between dwmac100 and
dwmac1000 chips.
This will also help on enhancing the driver and support new chips.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:21:30 -05:00
David S. Miller
fcb3f55f66 Merge branch 'rds-support-FRMR-and-cleanups'
Santosh Shilimkar says:

====================
RDS: Major clean-up with couple of new features for 4.6

v3:
Re-generated the same series by omitting "-D" option from git format-patch
command. Since first patch has file removals, git apply/am can't deal
with it when formated with '-D' option.

v2:
Dropped module parameter from [PATCH 11/13] as suggested by David Miller

Series is generated against net-next but also applies against Linus's tip
cleanly. Entire patchset is available at below git tree:

 git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git for_4.6/net-next/rds_v2

The diff-stat looks bit scary since almost ~4K lines of code is
getting removed. Brief summary of the series:

- Drop the stale iWARP support:
	RDS iWarp support code has become stale and non testable for
	sometime.  As discussed and agreed earlier on list, am dropping
	its support for good. If new iWarp user(s) shows up in future,
	the plan is to adapt existing IB RDMA with special sink case.
- RDS gets SO_TIMESTAMP support
- Long due RDS maintainer entry gets updated
- Some RDS IB code refactoring towards new FastReg Memory registration (FRMR)
- Lastly the initial support for FRMR

RDS IB RDMA performance with FRMR is not yet as good as FMR and I do have
some patches in progress to address that. But they are not ready for 4.6
so I left them out of this series.

Also am keeping eye on new CQ API adaptations like other ULPs doing and
will try to adapt RDS for the same most likely in 4.7+ timeframe.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:20 -05:00
Avinash Repaka
1659185fb4 RDS: IB: Support Fastreg MR (FRMR) memory registration mode
Fastreg MR(FRMR) is another method with which one can
register memory to HCA. Some of the newer HCAs supports only fastreg
mr mode, so we need to add support for it to have RDS functional
on them.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:19 -05:00
santosh.shilimkar@oracle.com
ad6832f950 RDS: IB: allocate extra space on queues for FRMR support
Fastreg MR(FRMR) memory registration and invalidation makes use
of work request and completion queues for its operation. Patch
allocates extra queue space towards these operation(s).

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:19 -05:00
santosh.shilimkar@oracle.com
2cb2912d65 RDS: IB: add Fastreg MR (FRMR) detection support
Discovere Fast Memmory Registration support using IB device
IB_DEVICE_MEM_MGT_EXTENSIONS. Certain HCA might support just FRMR
or FMR or both FMR and FRWR. In case both mr type are supported,
default FMR is used.

Default MR is still kept as FMR against what everyone else
is following. Default will be changed to FRMR once the
RDS performance with FRMR is comparable with FMR. The
work is in progress for the same.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:19 -05:00
santosh.shilimkar@oracle.com
db42753adb RDS: IB: add mr reused stats
Add MR reuse statistics to RDS IB transport.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:19 -05:00
santosh.shilimkar@oracle.com
37ea401e9c RDS: IB: handle the RDMA CM time wait event
Drop the RDS connection on RDMA_CM_EVENT_TIMEWAIT_EXIT so that
it can reconnect and resume.

While testing fastreg, this error happened in couple of tests but
was getting un-noticed.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:19 -05:00
santosh.shilimkar@oracle.com
d4de76da5c RDS: IB: add connection info to ibmr
Preperatory patch for FRMR support. From connection info,
we can retrieve cm_id which contains qp handled needed for
work request posting.

We also need to drop the RDS connection on QP error states
where connection handle becomes useful.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:18 -05:00
santosh.shilimkar@oracle.com
490ea5967b RDS: IB: move FMR code to its own file
No functional change.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:18 -05:00
santosh.shilimkar@oracle.com
a69365a39c RDS: IB: create struct rds_ib_fmr
Keep fmr related filed in its own struct. Fastreg MR structure
will be added to the union.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:18 -05:00
santosh.shilimkar@oracle.com
f6df683f32 RDS: IB: Re-organise ibmr code
No functional changes. This is in preperation towards adding
fastreg memory resgitration support.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:18 -05:00
santosh.shilimkar@oracle.com
dcfd041c87 RDS: IB: Remove the RDS_IB_SEND_OP dependency
This helps to combine asynchronous fastreg MR completion handler
with send completion handler.

No functional change.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:17 -05:00
santosh.shilimkar@oracle.com
72f26eee51 MAINTAINERS: update RDS entry
Acked-by: Chien Yen <chien.yen@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:17 -05:00
santosh.shilimkar@oracle.com
5711f8b353 RDS: Add support for SO_TIMESTAMP for incoming messages
The SO_TIMESTAMP generates time stamp for each incoming RDS messages
User app can enable it by using SO_TIMESTAMP setsocketopt() at
SOL_SOCKET level. CMSG data of cmsg type SO_TIMESTAMP contains the
time stamp in struct timeval format.

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:17 -05:00
santosh.shilimkar@oracle.com
dcdede0406 RDS: Drop stale iWARP RDMA transport
RDS iWarp support code has become stale and non testable. As
indicated earlier, am dropping the support for it.

If new iWarp user(s) shows up in future, we can adapat the RDS IB
transprt for the special RDMA READ sink case. iWarp needs an MR
for the RDMA READ sink.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:17 -05:00
Pablo Neira Ayuso
8a6bf5da1a netfilter: nft_masq: support port range
Complete masquerading support by allowing port range selection.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-02 20:05:27 +01:00
Florian Westphal
af4610c395 netfilter: don't call hooks unless needed
With the previous patches in place, a netns nf_hook_list might be empty,
even if e.g. init_net performs filtering.

Thus change nf_hook_thresh to check the hook_list as well before
initializing hook_state and calling nf_hook_slow().

We still make use of static keys; if no netfilter modules are loaded
list is guaranteed to be empty.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-02 20:05:26 +01:00
Florian Westphal
5f6c253ebe netfilter: bridge: register hooks only when bridge interface is added
This moves bridge hooks to a register-when-needed scheme.

We use a device notifier to register the 'call-iptables' netfilter hooks
only once a bridge gets added.

This means that if the initial namespace uses a bridge, newly created
network namespaces no longer get the PRE_ROUTING ipt_sabotage hook.

It will registered in that network namespace once a bridge is created
within that namespace.

A few modules still use global hooks:

- conntrack
- bridge PF_BRIDGE hooks
- IPVS
- CLUSTER match (deprecated)
- SYNPROXY

As long as these modules are not loaded/used, a new network namespace has
empty hook list and NF_HOOK() will boil down to single list_empty test even
if initial namespace does stateless packet filtering.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-02 20:05:25 +01:00