Commit Graph

660 Commits

Author SHA1 Message Date
Linus Torvalds
ae045e2455 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Steady transitioning of the BPF instructure to a generic spot so
      all kernel subsystems can make use of it, from Alexei Starovoitov.

   2) SFC driver supports busy polling, from Alexandre Rames.

   3) Take advantage of hash table in UDP multicast delivery, from David
      Held.

   4) Lighten locking, in particular by getting rid of the LRU lists, in
      inet frag handling.  From Florian Westphal.

   5) Add support for various RFC6458 control messages in SCTP, from
      Geir Ola Vaagland.

   6) Allow to filter bridge forwarding database dumps by device, from
      Jamal Hadi Salim.

   7) virtio-net also now supports busy polling, from Jason Wang.

   8) Some low level optimization tweaks in pktgen from Jesper Dangaard
      Brouer.

   9) Add support for ipv6 address generation modes, so that userland
      can have some input into the process.  From Jiri Pirko.

  10) Consolidate common TCP connection request code in ipv4 and ipv6,
      from Octavian Purdila.

  11) New ARP packet logger in netfilter, from Pablo Neira Ayuso.

  12) Generic resizable RCU hash table, with intial users in netlink and
      nftables.  From Thomas Graf.

  13) Maintain a name assignment type so that userspace can see where a
      network device name came from (enumerated by kernel, assigned
      explicitly by userspace, etc.) From Tom Gundersen.

  14) Automatic flow label generation on transmit in ipv6, from Tom
      Herbert.

  15) New packet timestamping facilities from Willem de Bruijn, meant to
      assist in measuring latencies going into/out-of the packet
      scheduler, latency from TCP data transmission to ACK, etc"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits)
  cxgb4 : Disable recursive mailbox commands when enabling vi
  net: reduce USB network driver config options.
  tg3: Modify tg3_tso_bug() to handle multiple TX rings
  amd-xgbe: Perform phy connect/disconnect at dev open/stop
  amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask
  net: sun4i-emac: fix memory leak on bad packet
  sctp: fix possible seqlock seadlock in sctp_packet_transmit()
  Revert "net: phy: Set the driver when registering an MDIO bus device"
  cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine
  team: Simplify return path of team_newlink
  bridge: Update outdated comment on promiscuous mode
  net-timestamp: ACK timestamp for bytestreams
  net-timestamp: TCP timestamping
  net-timestamp: SCHED timestamp on entering packet scheduler
  net-timestamp: add key to disambiguate concurrent datagrams
  net-timestamp: move timestamp flags out of sk_flags
  net-timestamp: extend SCM_TIMESTAMPING ancillary data struct
  cxgb4i : Move stray CPL definitions to cxgb4 driver
  tcp: reduce spurious retransmits due to transient SACK reneging
  qlcnic: Initialize dcbnl_ops before register_netdev
  ...
2014-08-06 09:38:14 -07:00
Linus Torvalds
e7fda6c4c3 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and time updates from Thomas Gleixner:
 "A rather large update of timers, timekeeping & co

   - Core timekeeping code is year-2038 safe now for 32bit machines.
     Now we just need to fix all in kernel users and the gazillion of
     user space interfaces which rely on timespec/timeval :)

   - Better cache layout for the timekeeping internal data structures.

   - Proper nanosecond based interfaces for in kernel users.

   - Tree wide cleanup of code which wants nanoseconds but does hoops
     and loops to convert back and forth from timespecs.  Some of it
     definitely belongs into the ugly code museum.

   - Consolidation of the timekeeping interface zoo.

   - A fast NMI safe accessor to clock monotonic for tracing.  This is a
     long standing request to support correlated user/kernel space
     traces.  With proper NTP frequency correction it's also suitable
     for correlation of traces accross separate machines.

   - Checkpoint/restart support for timerfd.

   - A few NOHZ[_FULL] improvements in the [hr]timer code.

   - Code move from kernel to kernel/time of all time* related code.

   - New clocksource/event drivers from the ARM universe.  I'm really
     impressed that despite an architected timer in the newer chips SoC
     manufacturers insist on inventing new and differently broken SoC
     specific timers.

[ Ed. "Impressed"? I don't think that word means what you think it means ]

   - Another round of code move from arch to drivers.  Looks like most
     of the legacy mess in ARM regarding timers is sorted out except for
     a few obnoxious strongholds.

   - The usual updates and fixlets all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  timekeeping: Fixup typo in update_vsyscall_old definition
  clocksource: document some basic timekeeping concepts
  timekeeping: Use cached ntp_tick_length when accumulating error
  timekeeping: Rework frequency adjustments to work better w/ nohz
  timekeeping: Minor fixup for timespec64->timespec assignment
  ftrace: Provide trace clocks monotonic
  timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
  seqcount: Add raw_write_seqcount_latch()
  seqcount: Provide raw_read_seqcount()
  timekeeping: Use tk_read_base as argument for timekeeping_get_ns()
  timekeeping: Create struct tk_read_base and use it in struct timekeeper
  timekeeping: Restructure the timekeeper some more
  clocksource: Get rid of cycle_last
  clocksource: Move cycle_last validation to core code
  clocksource: Make delta calculation a function
  wireless: ath9k: Get rid of timespec conversions
  drm: vmwgfx: Use nsec based interfaces
  drm: i915: Use nsec based interfaces
  timekeeping: Provide ktime_get_raw()
  hangcheck-timer: Use ktime_get_ns()
  ...
2014-08-05 17:46:42 -07:00
Jack Morgenstein
4d2f9bbb65 mlx5: Adjust events to use unsigned long param instead of void *
In the event flow, we currently pass only a port number in the
void *data argument.  Rather than pass a pointer to the event handlers,
we should use an "unsigned long" parameter, and pass the port number
value directly.

In the future, if necessary for some events, we can use the unsigned long
parameter to pass a pointer.

Based on a patch by Eli Cohen <eli@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30 14:00:06 -07:00
Jack Morgenstein
f241e7497e mlx5: minor fixes (mainly avoidance of hidden casts)
There were many places where parameters which should be u8/u16 were
integer type.

Additionally, in 2 places, a check for a non-null pointer was added
before dereferencing the pointer (this is actually a bug fix).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30 14:00:06 -07:00
Jack Morgenstein
9603b61de1 mlx5: Move pci device handling from mlx5_ib to mlx5_core
In preparation for a new mlx5 device which is VPI (i.e., ports can be
either IB or ETH), move the pci device functionality from mlx5_ib
to mlx5_core.

This involves the following changes:
1. Move mlx5_core_dev struct out of mlx5_ib_dev. mlx5_core_dev
   is now an independent structure maintained by mlx5_core.
   mlx5_ib_dev now has a pointer to that struct.
   This requires changing a lot of places where the core_dev
   struct was accessed via mlx5_ib_dev (now, this needs to
   be a pointer dereference).
2. All PCI initializations are now done in mlx5_core. Thus,
   it is now mlx5_core which does pci_register_device (and not
   mlx5_ib, as was previously).
3. mlx5_ib now registers itself with mlx5_core as an "interface"
   driver. This is very similar to the mechanism employed for
   the mlx4 (ConnectX) driver. Once the HCA is initialized
   (by mlx5_core), it invokes the interface drivers to do
   their initializations.
4. There is a new event handler which the core registers:
   mlx5_core_event(). This event handler invokes the
   event handlers registered by the interfaces.

Based on a patch by Eli Cohen <eli@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30 14:00:06 -07:00
Fengguang Wu
3f6148e76a net/mlx4_en: mlx4_en_[gs]et_priv_flags() can be static
Fixes sparse warning intrduced by commit 0fef9d0 ("net/mlx4_en: Disable
blueflame using ethtool private flags")

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-24 23:36:32 -07:00
Thomas Gleixner
14a7004671 net: mlx5: Use ktime_get_ns()
This code is beyond silly:

     struct timespec ts = ktime_get_ts();
     ktime_t ktime = timespec_to_ktime(ts);

Further down the code builds the delta of two ktime_t values and
converts the result to nanoseconds.

Use ktime_get_ns() and replace all the nonsense.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2014-07-23 15:01:43 -07:00
Amir Vadai
ea1c1af139 net/mlx4_en: Reduce memory consumption on kdump kernel
When memory is limited, reduce number of rx and tx rings.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22 19:53:14 -07:00
Amir Vadai
2599d8580f net/mlx4_core: Use low memory profile on kdump kernel
When running in kdump kernel, reduce number of resources allocated for
the hardware. This will enable the NIC to operate in this low memory
environment at the expense of performance and some features not related
to the basic NIC functionality.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22 19:53:14 -07:00
Amir Vadai
0fef9d0308 net/mlx4_en: Disable blueflame using ethtool private flags
Enable the user to turn off the hardware feature called BlueFlame.
Since it is something specific to mlx4_en hardware, we control
the feature via ethtool private flags.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22 19:53:14 -07:00
Eyal Perry
b94901f3ed net/mlx4_en: current_mac isn't updated in port up
When port is down dev_addr is changed (e.g. by bonding) but current_mac
is not touched. When port is up again, hash_mac is updated to dev_addr,
but current_mac isn't. This leads to inconsistency between current_mac
and mac_hash. Because of that, mlx4_en_replace_mac() fails to find
current_mac in mac_hash.

Fix is to reset current_mac to dev_addr when port is up - as we do for
mac_hash.

Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22 19:53:13 -07:00
David S. Miller
8fd90bb889 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/infiniband/hw/cxgb4/device.c

The cxgb4 conflict was simply overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22 00:44:59 -07:00
Linus Torvalds
15ba2236f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Null termination fix in dns_resolver got the pointer dereferncing
    wrong, fix from Ben Hutchings.

 2) ip_options_compile() has a benign but real buffer overflow when
    parsing options.  From Eric Dumazet.

 3) Table updates can crash in netfilter's nftables if none of the state
    flags indicate an actual change, from Pablo Neira Ayuso.

 4) Fix race in nf_tables dumping, also from Pablo.

 5) GRE-GRO support broke the forwarding path because the segmentation
    state was not fully initialized in these paths, from Jerry Chu.

 6) sunvnet driver leaks objects and potentially crashes on module
    unload, from Sowmini Varadhan.

 7) We can accidently generate the same handle for several u32
    classifier filters, fix from Cong Wang.

 8) Several edge case bug fixes in fragment handling in xen-netback,
    from Zoltan Kiss.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
  ipv4: fix buffer overflow in ip_options_compile()
  batman-adv: fix TT VLAN inconsistency on VLAN re-add
  batman-adv: drop QinQ claim frames in bridge loop avoidance
  dns_resolver: Null-terminate the right string
  xen-netback: Fix pointer incrementation to avoid incorrect logging
  xen-netback: Fix releasing header slot on error path
  xen-netback: Fix releasing frag_list skbs in error path
  xen-netback: Fix handling frag_list on grant op error path
  net_sched: avoid generating same handle for u32 filters
  net: huawei_cdc_ncm: add "subclass 3" devices
  net: qmi_wwan: add two Sierra Wireless/Netgear devices
  wan/x25_asy: integer overflow in x25_asy_change_mtu()
  net: ppp: fix creating PPP pass and active filters
  net/mlx4_en: cq->irq_desc wasn't set in legacy EQ's
  sunvnet: clean up objects created in vnet_new() on vnet_exit()
  r8169: Enable RX_MULTI_EN for RTL_GIGA_MAC_VER_40
  net-gre-gro: Fix a bug that breaks the forwarding path
  netfilter: nf_tables: 64bit stats need some extra synchronization
  netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale
  netfilter: nf_tables: safe RCU iteration on list when dumping
  ...
2014-07-21 22:46:01 -07:00
Linus Torvalds
b579fcca32 InfiniBand/RDMA fixes for 3.16
- cxgb4 hardware driver regression fixes
  - mlx5 hardware driver regression fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJTyYWiAAoJEENa44ZhAt0hKkMP/ROnFwRtk0DKNGbyZLx/gLpK
 hEvzNeqZKn8OS2eHMvkP+pbKWrm1GoPcPEMZVfZv8dqwS7PzcmHVsA7WjzJF0mXh
 PLuPZfLHu4UZAZttP7Hm8XV+p7uthD/Fvk60x/uzboMZhSjFt4Khb0tEvezZ8Jyh
 5bbkiCp2PokD4z32fFoErTxa5XE8bYuqL70G9YyMuwdZWS2W4WbjihYHbi8vBzFd
 IsAvl4Ms/Fs8B5aeK0beeuals0yLYishPQY/F+r10GUQ3Q3JlD1/5j/MML3SZOlX
 FfYC3BDm3zgVHdaDTohlXbGVvJbziI5xkj0cUxVIk5f3SVZC36/BR+AxtjlhLjvR
 GEh9wwCP1cfXt7tRFBOS11M869oUOjG/5SQozsnJTAztpll85wxIWOyCCupRSokX
 90YNFtdGTBMkkm5vbddpekw+ku/qhrG8oNhKLMXoP1v4nV/8HiVG+oZdzQYrPPOp
 KMkNtC6ItLCXBRF0hUGGAszUaFap0ZFCfWk9d8W+xpOnFzVgE8pkG2kRH1fty5ja
 HbI/0OhiX5x+lYhyzGUb+AnjNTT5yYba1M+t1Sr3JC5AxBzwiS4FYq7TRTnNGt+p
 H6/+WMw+PcCpCaOcVCivbFCl1gceA7bKCMuwjjFoO0CjDT8Tr6pX3gRcDvrxHVGr
 rLi5EcHdgJEsPfaUACbv
 =JbA3
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma fixes from Roland Dreier:
 - cxgb4 hardware driver regression fixes
 - mlx5 hardware driver regression fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx5: Enable "block multicast loopback" for kernel consumers
  RDMA/cxgb4: Call iwpm_init() only once
  mlx5_core: Fix possible race between mr tree insert/delete
  RDMA/cxgb4: Initialize the device status page
  RDMA/cxgb4: Clean up connection on ARP error
  RDMA/cxgb4: Fix skb_leak in reject_cr()
2014-07-18 20:39:34 -10:00
Amir Vadai
858e6c3210 net/mlx4_en: cq->irq_desc wasn't set in legacy EQ's
Fix a regression introduced by commit 35f6f45 ("net/mlx4_en: Don't use
irq_affinity_notifier to track changes in IRQ affinity map").
When core is started in legacy EQ's (number of IRQ's < rx rings), cq->irq_desc
was NULL.  This caused a kernel crash under heavy traffic - when having more
than rx NAPI budget completions.
Fixed to have it set for both EQ modes.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 23:28:32 -07:00
Saeed Mahameed
1645a54082 net/mlx4_core: Remove MCG in case it is attached to promiscuous QPs only
In B0 steering mode if promiscuous QP asks to be detached from MCG entry,
and it is the only one in this entry then the entry will never be deleted.
This is a wrong behavior since we don't want to keep those entries after
the promiscuous QP becomes non-promiscuous. Therefore remove steering
entry containing only promiscuous QP.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 23:26:25 -07:00
Eugenia Emantayev
816e59846b net/mlx4_core: In SR-IOV mode host should add promisc QP to default entry only
In current situation host is adding the promiscuous QP to all steering
entries and the default entry as well. In this case when having PV
and SR-IOV on the same setup bridge will receive all traffic that is
targeted to the other VMs. This is bad.
Solution: In SR-IOV mode host can add promiscuous QP to default entry only.
The above problem and fix are relevant for B0 steering mode only.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 23:26:25 -07:00
Alexander Guller
7590837603 net/mlx4_core: Make sure the max number of QPs per MCG isn't exceeded
In B0 steering mode when adding QPs to the default MCG entry need
to check that maximal number of QPs per MCG entry was not exceeded.

Signed-off-by: Alexander Guller <alexg@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.co.il>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 23:26:24 -07:00
Dotan Barak
aab2bb0e29 net/mlx4_core: Make sure that negative array index isn't used
To make sure that the array index isn't used in the code with
negative value, we stop using the for loop integer iterator
outside of it.
>From now on use members count to swap the last QP with removed one.
Fix also the second occurrence of this flow in mlx4_qp_detach_common().
In mlx4_qp_detach_common() use members_count instead of
loop iterator outside of the for loop.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 23:26:24 -07:00
Yevgeny Petrilin
0d7869acb2 net/mlx4_core: Fix leakage of SW multicast entries
When removing multicast address in B0 steering mode there is
a bug in cases where there is a single QP registered for the address,
and this QP is also promiscuous. In such cases the entry wouldn't be
deleted from the SW structure representing all Ethernet MCG entries,
but would be removed in HW. This way when driver goes to remove it
from SW and HW structures the HW deletion fails.
Moreover the same index could later be used for registering
different address, which can be Infiniband.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 23:26:24 -07:00
David S. Miller
1a98c69af1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 14:09:34 -07:00
Jason Wang
32b333fe99 mlx4: mark napi id for gro_skb
Napi id was not marked for gro_skb, this will lead rx busy loop won't
work correctly since they stack never try to call low latency receive
method because of a zero socket napi id. Fix this by marking napi id
for gro_skb.

The transaction rate of 1 byte netperf tcp_rr gets about 50% increased
(from 20531.68 to 30610.88).

Cc: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 16:14:16 -07:00
Sagi Grimberg
6ef07a9f36 mlx5_core: Fix possible race between mr tree insert/delete
In mlx5_core_destroy_mkey(), we must first remove the mr from the
radix tree and then destroy it.  Otherwise we might hit a race if the
key was reallocated and we attempted to insert it to the radix tree.

Also handle radix tree insert/delete failures.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Eli Cohen <elic@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-09 16:58:58 -07:00
Amir Vadai
fbc6daf197 net/mlx4_en: Ignore budget on TX napi polling
It is recommended that TX work not count against the quota.
The cost of TX packet liberation is a minute percentage of what it costs to
process an RX frame. Furthermore, that SKB freeing makes memory available for
other paths in the stack.

Give the TX a larger budget and be more aggressive about cleaning up the Tx
descriptors this budget could be changed using ethtool:
$ ethtool -C eth1 tx-frames-irq <budget>

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 20:00:49 -07:00
Noa Osherovich
2695bab2a6 net/mlx4_en: Fix mac_hash database inconsistency
Using a local copy of dev_addr in mlx4_en_set_mac() to prevent dev_addr
from being modified during error flow or when dev_addr is modified in
another context (which is another problem that is being discussed over
the mailing list [1]).
Also fixing bad naming of priv->prev_mac into priv->current_mac.

[1] - http://patchwork.ozlabs.org/patch/351489/

Reviewed-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 19:58:45 -07:00
Yishai Hadas
d5b8dff007 net/mlx4_en: Do not count LLC/SNAP in MTU calculation
LLC/SNAP 8 bytes should not be added as part of header calculation.
If used, payload will be decreased accordingly. For MTU of 1500
we'll set 1522 instead of 1523.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Liran Liss <liranl@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 19:58:44 -07:00
Eugenia Emantayev
49a1e4f6b7 net/mlx4_en: Do not disable vlan filter during promiscuous mode
Promiscous mode is only for MACs.
Should not disable/enable VLAN filter when entering/leaving promisuous mode.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.co.il>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 19:58:44 -07:00
Eugenia Emantayev
143b3efb40 net/mlx4: Verify port number in __mlx4_unregister_mac
Verify port number to avoid crashes if port number is outside the range.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 19:58:44 -07:00
Eugenia Emantayev
4359db1e0d net/mlx4_en: Run loopback test only when port is up
Loopback can't work when port is down.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 19:58:44 -07:00
Eugenia Emantayev
523ece889e net/mlx4_en: Fix set port ratelimit for 40GE
In 40GE we can't use the default bw units for set ratelimit (100 Mbps)
since the max is 255*100 Mbps = 25 Gbps (not suited for 40GE), thus we need 1 Gbps units.
But for 10GE 1 Gbps units might be too bruit so we use the following solution.

For user set ratelimit <= 25 Gbps:
        use 100 Mbps units * user_ratelimit (* 10).

For user set ratelimit > 25 Gbps:
        use 1 Gbps units * user_ratelimit.

For user set unlimited ratelimit (0 Gbps):
        use 1 Gbps units * MAX_RATELIMIT_DEFAULT (57)

Note: any value > 58 will damage the FW ratelimit computation, so we allow
      a max and any higher value will be pulled down to 57.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08 19:58:44 -07:00
Or Gerlitz
e326f2f13b net/mlx4_en: Don't configure the HW vxlan parser when vxlan offloading isn't set
The add_vxlan_port ndo driver code was wrongly testing whether HW vxlan offloads
are supported by the device instead of checking if they are currently enabled.

This causes the driver to configure the HW parser to conduct matching for vxlan
packets but since no steering rules were set, vxlan packets are dropped on RX.

Fix that by doing the right test, as done in the del_vxlan_port ndo handler.

Fixes: 1b136de ('net/mlx4: Implement vxlan ndo calls')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 21:39:18 -07:00
Amir Vadai
bb273617a6 net/mlx4_en: IRQ affinity hint is not cleared on port down
Need to remove affinity hint at mlx4_en_deactivate_cq() and not at
mlx4_en_destroy_cq() - since affinity_mask might be free'd while still
being used by procfs.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 18:29:23 -07:00
Amir Vadai
35f6f45368 net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map
IRQ affinity notifier can only have a single notifier - cpu_rmap
notifier. Can't use it to track changes in IRQ affinity map.
Detect IRQ affinity changes by comparing CPU to current IRQ affinity map
during NAPI poll thread.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ben Hutchings <ben@decadent.org.uk>
Fixes: 2eacc23 ("net/mlx4_core: Enforce irq affinity changes immediatly")
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 18:29:23 -07:00
Or Gerlitz
960b1f454e net/mlx4_core: Fix the error flow when probing with invalid VF configuration
Single ported VF are currently not supported on configurations where
one or both ports are IB. When we hit this case, the relevant flow in
the driver didn't return error and jumped to the wrong label. Fix that.

Fixes: dd41cc3 ('net/mlx4: Adapt num_vfs/probed_vf params for single port VF')
Reported-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-22 17:13:40 -07:00
Linus Torvalds
f9da455b93 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

 4) BPF now has a "random" opcode, from Chema Gonzalez.

 5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

 6) Support TCP fastopen over ipv6, from Daniel Lee.

 7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers.  From Ezequiel Garcia.

 8) Support software TSO in fec driver too, from Nimrod Andy.

 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
  rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
  tcp: fixing TLP's FIN recovery
  net: fec: Add software TSO support
  net: fec: Add Scatter/gather support
  net: fec: Increase buffer descriptor entry number
  net: fec: Factorize feature setting
  net: fec: Enable IP header hardware checksum
  net: fec: Factorize the .xmit transmit function
  bridge: fix compile error when compiling without IPv6 support
  bridge: fix smatch warning / potential null pointer dereference
  via-rhine: fix full-duplex with autoneg disable
  bnx2x: Enlarge the dorq threshold for VFs
  bnx2x: Check for UNDI in uncommon branch
  bnx2x: Fix 1G-baseT link
  bnx2x: Fix link for KR with swapped polarity lane
  sctp: Fix sk_ack_backlog wrap-around problem
  net/core: Add VF link state control policy
  net/fsl: xgmac_mdio is dependent on OF_MDIO
  net/fsl: Make xgmac_mdio read error message useful
  net_sched: drr: warn when qdisc is not work conserving
  ...
2014-06-12 14:27:40 -07:00
Yuval Atias
9e311e77a8 net/mlx4_en: Use affinity hint
The “affinity hint” mechanism is used by the user space
daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
Irqbalancer can use this hint to balance the irqs between the
cpus indicated by the mask.

We wish the HCA to preferentially map the IRQs it uses to numa cores
close to it.  To accomplish this, we use cpumask_set_cpu_local_first(), that
sets the affinity hint according the following policy:
First it maps IRQs to “close” numa cores.  If these are exhausted, the
remaining IRQs are mapped to “far” numa cores.

Signed-off-by: Yuval Atias <yuvala@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 14:58:16 -07:00
Wei Yang
da1de8dfff net/mlx4_core: Keep only one driver entry release mlx4_priv
Following commit befdf89 "net/mlx4_core: Preserve pci_dev_data after
__mlx4_remove_one()", there are two mlx4 pci callbacks which will
attempt to release the mlx4_priv object -- .shutdown and .remove.

This leads to a use-after-free access to the already freed mlx4_priv
instance and trigger a "Kernel access of bad area" crash when both
.shutdown and .remove are called.

During reboot or kexec, .shutdown is called, with the VFs probed to
the host going through shutdown first and then the PF. Later, the PF
will trigger VFs' .remove since VFs still have driver attached.

Fix that by keeping only one driver entry which releases mlx4_priv.

Fixes: befdf89 ('net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()')
CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 00:32:46 -07:00
Jack Morgenstein
95646373c9 net/mlx4_core: Fix SRIOV free-pool management when enforcing resource quotas
The Hypervisor driver tracks free slots and reserved slots at the global level
and tracks allocated slots and guaranteed slots per VF.

Guaranteed slots are treated as reserved by the driver, so the total
reserved slots is the sum of all guaranteed slots over all the VFs.

As VFs allocate resources, free (global) is decremented and allocated (per VF)
is incremented for those resources. However, reserved (global) is never changed.

This means that effectively, when a VF allocates a resource from its
guaranteed pool, it is actually reducing that resource's free pool (since
the global reserved count was not also reduced).

The fix for this problem is the following: For each resource, as long as a
VF's allocated count is <= its guaranteed number, when allocating for that
VF, the reserved count (global) should be reduced by the allocation as well.

When the global reserved count reaches zero, the remaining global free count
is still accessible as the free pool for that resource.

When the VF frees resources, the reverse happens: the global reserved count
for a resource is incremented only once the VFs allocated number falls below
its guaranteed number.

This fix was developed by Rick Kready <kready@us.ibm.com>

Reported-by: Rick Kready <kready@us.ibm.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 00:32:46 -07:00
Linus Torvalds
1d21b1bf53 Main batch of InfiniBand/RDMA changes for 3.16:
- Add iWARP port mapper to avoid conflicts between RDMA and normal
    stack TCP connections.
 
  - Fixes for i386 / x86-64 structure padding differences (ABI
    compatibility for 32-on-64) from Yann Droneaud.
 
  - A pile of SRP initiator fixes from Bart Van Assche.
 
  - Fixes for a writeback / memory allocation deadlock with NFS over
    IPoIB connected mode from Jiri Kosina.
 
  - The usual fixes and cleanups to mlx4, mlx5, cxgb4 and other
    low-level drivers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJTlzyEAAoJEENa44ZhAt0h9yoP/1UeXlejOpCJyiNdtJZ+ilcU
 cb0PEzsjzqACyDqcoQ0EpQM3/3emccVIC3uUXK12mzlTIXOFYTeRLays/TbxZDLt
 FK5D/NrMmmJmciPt1ZRgUX82kFFRGScEfpkXYs7jxtRaNT7CW5KwSNQr6aFXskUz
 1gpdK1ARCN5rWcGl2HJx5o9C4c/Fa/Vov8lOsAkUZXD1SuPNT/fFN0u1pRzU68g0
 k3oj81XnZq5ejOBQKXEHImcmjXwaJ2yjmzxhSsKebqDWDdXuS/F9e4taKneHTZmr
 AdwJaLLJPWmAGi/vYYhkuLKpzIDpzMCqwr39lEabmjWvznYOlnjfVUXwUTE2nwNC
 DIXuHOLFrSvF2cNxh8ZeEYKS8AV+PjAOahPC5whkWkY256Q67uB7cy9ilWAK+7xS
 QcQ5Inr6iXvxIGYA4hNwUo8aK0NuKFwhkVVFEbkPaurbQZPqiKwyVE3w2FOws/Qp
 0kLLCVvpRQYjKzkxyof2tb1AcNuVNKXHrYk6RaBDJ9mjxHbhvY4OSt4CBxAAXBu6
 zoedUydN1Nz1UgAB1jDsBdyE2QQnXockA1+JJKNq6gM5Dz0DUdAylzQ2NqY9tnYz
 RTzihEPYIiQUkV3B8ErbqsuO6z7M830AXO5AR6bLZn1zgJ0cbMLBaKLA8LRufJI/
 qxNVwL32Uv1PjKZ+yX1x
 =Wcdc
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull main InfiniBand/RDMA updates from Roland Dreier:

 - add iWARP port mapper to avoid conflicts between RDMA and normal
   stack TCP connections.

 - fixes for i386 / x86-64 structure padding differences (ABI
   compatibility for 32-on-64) from Yann Droneaud.

 - a pile of SRP initiator fixes from Bart Van Assche.

 - fixes for a writeback / memory allocation deadlock with NFS over
   IPoIB connected mode from Jiri Kosina.

 - the usual fixes and cleanups to mlx4, mlx5, cxgb4 and other low-level
   drivers.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (61 commits)
  RDMA/cxgb4: Add support for iWARP Port Mapper user space service
  RDMA/nes: Add support for iWARP Port Mapper user space service
  RDMA/core: Add support for iWARP Port Mapper user space service
  IB/mlx4: Fix gfp passing in create_qp_common()
  IB/umad: Fix use-after-free on close
  IB/core: Fix kobject leak on device register error flow
  RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_resp
  mlx4_core: Fix GFP flags parameters to be gfp_t
  IB/core: Fix port kobject deletion during error flow
  IB/core: Remove unneeded kobject_get/put calls
  IB/core: Fix sparse warnings about redeclared functions
  IB/mad: Fix sparse warning about gfp_t use
  IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO
  IB: Add a QP creation flag to use GFP_NOIO allocations
  IB: Return error for unsupported QP creation flags
  IB: Allow build of hw/ and ulp/ subdirectories independently
  mlx4_core: Move handling of MLX4_QP_ST_MLX to proper switch statement
  RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp
  IB/srp: Avoid problems if a header uses pr_fmt
  IB/umad: Fix error handling
  ...
2014-06-10 10:41:33 -07:00
Roland Dreier
eeaddf3670 Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', 'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next 2014-06-10 10:12:14 -07:00
Jiri Pirko
537fae0101 net: use SPEED_UNKNOWN and DUPLEX_UNKNOWN when appropriate
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-06 16:24:07 -07:00
Roland Dreier
4e2c341b86 mlx4_core: Fix GFP flags parameters to be gfp_t
Otherwise sparse gives a bunch of warnings like

    drivers/net/ethernet/mellanox/mlx4/srq.c:110:66: sparse: incorrect type in argument 4 (different base types)
    drivers/net/ethernet/mellanox/mlx4/srq.c:110:66:    expected int [signed] gfp
    drivers/net/ethernet/mellanox/mlx4/srq.c:110:66:    got restricted gfp_t

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04 10:19:13 -07:00
David S. Miller
c99f7abf0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	include/net/inetpeer.h
	net/ipv6/output_core.c

Changes in net were fixing bugs in code removed in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03 23:32:12 -07:00
David S. Miller
014b20133b Merge branch 'ethtool-rssh-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/net-next
Ben Hutchings says:

====================
Pull request: Fixes for new ethtool RSS commands

This addresses several problems I previously identified with the new
ETHTOOL_{G,S}RSSH commands:

1. Missing validation of reserved parameters
2. Vague documentation
3. Use of unnamed magic number
4. No consolidation with existing driver operations

I don't currently have access to suitable network hardware, but have
tested these changes with a dummy driver that can support various
combinations of operations and sizes, together with (a) Debian's ethtool
3.13 (b) ethtool 3.14 with the submitted patch to use ETHTOOL_{G,S}RSSH
and minor adjustment for fixes 1 and 3.

v2: Update RSS operations in vmxnet3 too
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 23:07:02 -07:00
Ben Hutchings
fe62d00137 ethtool: Replace ethtool_ops::{get,set}_rxfh_indir() with {get,set}_rxfh()
ETHTOOL_{G,S}RXFHINDIR and ETHTOOL_{G,S}RSSH should work for drivers
regardless of whether they expose the hash key, unless you try to
set a hash key for a driver that doesn't expose it.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-06-03 02:42:44 +01:00
Jiri Kosina
40f2287bd5 IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO
Modify the various routines used to allocate memory resources which
serve QPs in mlx4 to get an input GFP directive.  Have the Ethernet
driver to use GFP_KERNEL in it's QP allocations as done prior to this
commit, and the IB driver to use GFP_NOIO when the IB verbs
IB_QP_CREATE_USE_GFP_NOIO QP creation flag is provided.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02 14:58:11 -07:00
David S. Miller
96b2e73c54 Revert "net/mlx4_en: Use affinity hint"
This reverts commit 70a640d0da.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 00:18:48 -07:00
Yuval Atias
70a640d0da net/mlx4_en: Use affinity hint
The “affinity hint” mechanism is used by the user space
daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
Irqbalancer can use this hint to balance the irqs between the
cpus indicated by the mask.

We wish the HCA to preferentially map the IRQs it uses to numa cores
close to it.  To accomplish this, we use cpumask_set_cpu_local_first(), that
sets the affinity hint according the following policy:
First it maps IRQs to “close” numa cores.  If these are exhausted, the
remaining IRQs are mapped to “far” numa cores.

Signed-off-by: Yuval Atias <yuvala@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-01 19:16:29 -07:00
Jack Morgenstein
111c6094bd net/mlx4_core: Reset RoCE VF gids when guest driver goes down
Reset the GIDs assigned to a VF in the port RoCE GID table when
that guest goes down (either crashes or goes down cleanly).

As part of this fix, we refactor the RoCE gid table driver copy,
moving it to the mlx4_port_info structure (together with the MAC
and VLAN tables).

As with the MAC and VLAN tables, we now use a mutex per port
for the GID table so that modifying the driver copy and
modifying the firmware copy of a port GID table becomes an
atomic operation (thus avoiding driver-copy/FW-copy mismatches).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 16:57:52 -07:00
Roland Dreier
165cb465f7 mlx4_core: Move handling of MLX4_QP_ST_MLX to proper switch statement
The handling of MLX4_QP_ST_MLX in verify_qp_parameters() was
accidentally put inside the inner switch statement (that handles which
transition of RC/UC/XRC QPs is happening).  Fix this by moving the case
to the outer switch statement.

The compiler pointed this out with:

    drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'verify_qp_parameters':
 >> drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:2875:3: warning: case value '7' not in enumerated type 'enum qp_transition' [-Wswitch]
       case MLX4_QP_ST_MLX:

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 99ec41d0a4 ("mlx4: Add infrastructure for selecting VFs to enable QP0 via MLX proxy QPs")
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-30 15:38:58 -07:00