Commit Graph

528 Commits

Author SHA1 Message Date
Vlad Yasevich
75a75ee46b mlx4: Remove driver specific fdb handlers.
Remove driver specific fdb hadlers since they are the same as
the default ones.

CC: Amir Vadai <amirv@mellanox.com>
CC: Yan Burman <yanb@mellanox.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 15:29:46 -05:00
Sasha Levin
b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Linus Torvalds
1cef9350cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) ping_err() ICMP error handler looks at wrong ICMP header, from Li
    Wei.

 2) TCP socket hash function on ipv6 is too weak, from Eric Dumazet.

 3) netif_set_xps_queue() forgets to drop mutex on errors, fix from
    Alexander Duyck.

 4) sum_frag_mem_limit() can deadlock due to lack of BH disabling, fix
    from Eric Dumazet.

 5) TCP SYN data is miscalculated in tcp_send_syn_data(), because the
    amount of TCP option space was not taken into account properly in
    this code path.  Fix from yuchung Cheng.

 6) MLX4 driver allocates device queues with the wrong size, from Kleber
    Sacilotto.

 7) sock_diag can access past the end of the sock_diag_handlers[] array,
    from Mathias Krause.

 8) vlan_set_encap_proto() makes incorrect assumptions about where
    skb->data points, rework the logic so that it works regardless of
    where skb->data happens to be.  From Jesse Gross.

 9) Fix gianfar build failure with NET_POLL enabled, from Paul
    Gortmaker.

10) Fix Ipv4 ID setting and checksum calculations in GRE driver, from
   Pravin B Shelar.

11) bgmac driver does:

        int i;

        for (i = 0; ...; ...) {
                ...
                for (i = 0; ...; ...) {

    effectively corrupting the outer loop index, use a seperate
    variable for the inner loops.  From Rafał Miłecki.

12) Fix suspend bugs in smsc95xx driver, from Ming Lei.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
  usbnet: smsc95xx: rename FEATURE_AUTOSUSPEND
  usbnet: smsc95xx: fix broken runtime suspend
  usbnet: smsc95xx: fix suspend failure
  bgmac: fix indexing of 2nd level loops
  b43: Fix lockdep splat on module unload
  Revert "ip_gre: propogate target device GSO capability to the tunnel device"
  IP_GRE: Fix GRE_CSUM case.
  VXLAN: Use tunnel_ip_select_ident() for tunnel IP-Identification.
  IP_GRE: Fix IP-Identification.
  net/pasemi: Fix missing coding style
  vmxnet3: fix ethtool ring buffer size setting
  vmxnet3: make local function static
  bnx2x: remove dead code and make local funcs static
  gianfar: fix compile fail for NET_POLL=y due to struct packing
  vlan: adjust vlan_set_encap_proto() for its callers
  sock_diag: Simplify sock_diag_handlers[] handling in __sock_diag_rcv_msg
  sock_diag: Fix out-of-bounds access to sock_diag_handlers[]
  vxlan: remove depends on CONFIG_EXPERIMENTAL
  mlx4_en: fix allocation of CPU affinity reverse-map
  mlx4_en: fix allocation of device tx_cq
  ...
2013-02-26 11:44:11 -08:00
Linus Torvalds
70a3a06d01 Main batch of InfiniBand/RDMA changes for 3.9:
- SRP error handling fixes from Bart Van Assche
  - Implementation of memory windows for mlx4 from Shani Michaeli
  - Lots of cxgb4 HW driver fixes from Vipul Pandya
  - Make iSER work for virtual functions, other fixes from Or Gerlitz
  - Fix for bug in qib HW driver from Mike Marciniszyn
  - IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman
  - Various cleanups and warning fixes from Julia Lawall, Paul Bolle, Wei Yongjun
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJRLPNoAAoJEENa44ZhAt0h0bMP/1xqlgDR3DGdoUoV4yTd6O9G
 Uccdb7og5o5tedVo+8xZz01y4at99P8FWW4hJ4os6k8n6yKoBVwo7qjN+BOR+JG5
 q8+O+ynUSIg4tGrb5sXcMnKXAXbw/vkftMWYNA41cbrM24DTYzB/2SLpvhwbFoTT
 tdQc2tgz5QaDqzWbagyCR4+k/IgO+Llrz/RvIdtz4dsTnTDogN7QCoSffX8n/Lpb
 DxtyXK4sdl3DAtd3CsIdsB/TSMb3RkHLCoSvmrWlLnqMdsbRxVnCVfBm4BOghW3J
 Y2K3joRoCjjIZSRNs/i0FMFkT/jbCXg1oXg9ek/a6YFNcgyk7z8iGyXrRY7fOnno
 8U2SfxJ69YpVYeJr+DSjaeHcmjsaYU7NN7JPxzvPKcJOIsxQJ/euJDXAXau3lEQY
 o9/p4JsGty0WHi1NanyygvghvBAoP1C5/59Sl4bHH5gckPyJT1kinPSCTT76YXGS
 WkSHg2mDhiJHy7Pnuy85iZldPoy2/5z09/I4aGMeL+8kUZbD4iFqzXIJU0HTsAim
 EONoRXDhIcN5DNVSVH1ig6nJ2a7Vhov4Z0r/vB8P4KhslBcqFwf2leC0eCoe5mNt
 SzcKhqosZDXoL8AwzpntzGIOid8pWmHbUx/PgIcoVXPjtl0h2ULNIFoYYyMZ3cyU
 AyN2tSiUZVddTV1/aKGL
 =RAQw
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband update from Roland Dreier:
 "Main batch of InfiniBand/RDMA changes for 3.9:

   - SRP error handling fixes from Bart Van Assche

   - Implementation of memory windows for mlx4 from Shani Michaeli

   - Lots of cxgb4 HW driver fixes from Vipul Pandya

   - Make iSER work for virtual functions, other fixes from Or Gerlitz

   - Fix for bug in qib HW driver from Mike Marciniszyn

   - IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman

   - Various cleanups and warning fixes from Julia Lawall, Paul Bolle,
     Wei Yongjun"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (41 commits)
  IB/mlx4: Advertise MW support
  IB/mlx4: Support memory window binding
  mlx4: Implement memory windows allocation and deallocation
  mlx4_core: Enable memory windows in {INIT, QUERY}_HCA
  mlx4_core: Disable memory windows for virtual functions
  IPoIB: Free ipoib neigh on path record failure so path rec queries are retried
  IB/srp: Fail I/O requests if the transport is offline
  IB/srp: Avoid endless SCSI error handling loop
  IB/srp: Avoid sending a task management function needlessly
  IB/srp: Track connection state properly
  IB/mlx4: Remove redundant NULL check before kfree
  IB/mlx4: Fix compiler warning about uninitialized 'vlan' variable
  IB/mlx4: Convert is_xxx variables in build_mlx_header() to bool
  IB/iser: Enable iser when FMRs are not supported
  IB/iser: Avoid error prints on EAGAIN registration failures
  IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML
  IB/uverbs: Implement memory windows support in uverbs
  IB/core: Add "type 2" memory windows support
  mlx4_core: Propagate MR deregistration failures to caller
  mlx4_core: Rename MPT-related functions to have mpt_ prefix
  ...
2013-02-26 11:41:08 -08:00
Shani Michaeli
804d6a89a5 mlx4: Implement memory windows allocation and deallocation
Implement MW allocation and deallocation in mlx4_core and mlx4_ib.
Pass down the enable bind flag when registering memory regions.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 10:44:32 -08:00
Shani Michaeli
e448834e35 mlx4_core: Enable memory windows in {INIT, QUERY}_HCA
Add memory windows-related code to INIT_HCA and QUERY_HCA.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 10:44:31 -08:00
Shani Michaeli
cc1ade94ee mlx4_core: Disable memory windows for virtual functions
Do not enable memory windows allocation for virtual functions.

In addition, add a few safety checks, such as:

* Verifying the PD of a new MPT matches the VF.
* Making sure binding memory window isn't enabled for FMRs, and
  that new memory windows are not FMR themselves.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 10:44:31 -08:00
Kleber Sacilotto de Souza
3770699675 mlx4_en: fix allocation of CPU affinity reverse-map
The mlx4_en driver allocates the number of objects for the CPU affinity
reverse-map based on the number of rx rings of the device. However,
mlx4_assign_eq() calls irq_cpu_rmap_add() as many times as IRQ's are
assigned to EQ's, which can be as large as mlx4_dev->caps.comp_pool. If
caps.comp_pool is larger than rx_ring_num we will eventually hit the
BUG_ON() in cpu_rmap_add().

Fix this problem by allocating space for the maximum number of CPU
affinity reverse-map objects we might want to add.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-23 13:51:54 -05:00
Kleber Sacilotto de Souza
427a96252d mlx4_en: fix allocation of device tx_cq
The memory to hold the network device tx_cq is not being allocated with
the correct size in mlx4_en_init_netdev(). It should use MAX_TX_RINGS
instead of MAX_RX_RINGS. This can cause problems if the number of tx
rings being used is greater than MAX_RX_RINGS.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-23 13:51:54 -05:00
Linus Torvalds
9afa3195b9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "Assorted tiny fixes queued in trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (22 commits)
  DocBook: update EXPORT_SYMBOL entry to point at export.h
  Documentation: update top level 00-INDEX file with new additions
  ARM: at91/ide: remove unsused at91-ide Kconfig entry
  percpu_counter.h: comment code for better readability
  x86, efi: fix comment typo in head_32.S
  IB: cxgb3: delay freeing mem untill entirely done with it
  net: mvneta: remove unneeded version.h include
  time: x86: report_lost_ticks doesn't exist any more
  pcmcia: avoid static analysis complaint about use-after-free
  fs/jfs: Fix typo in comment : 'how may' -> 'how many'
  of: add missing documentation for of_platform_populate()
  btrfs: remove unnecessary cur_trans set before goto loop in join_transaction
  sound: soc: Fix typo in sound/codecs
  treewide: Fix typo in various drivers
  btrfs: fix comment typos
  Update ibmvscsi module name in Kconfig.
  powerpc: fix typo (utilties -> utilities)
  of: fix spelling mistake in comment
  h8300: Fix home page URL in h8300/README
  xtensa: Fix home page URL in Kconfig
  ...
2013-02-21 17:40:58 -08:00
Shani Michaeli
6108372070 mlx4_core: Propagate MR deregistration failures to caller
MR deregistration fails when memory windows are bound to the MR.
Handle such failures by propagating them to the caller ULP.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-21 11:38:43 -08:00
Shani Michaeli
b20e519a81 mlx4_core: Rename MPT-related functions to have mpt_ prefix
The MPT - Memory Protection Table - is used by both memory windows and
memory regions.  Hence, all MPT references are relevant for both types
of memory objects.  Rename the relevant functions to start with mpt_
instead of the current mr_ prefix.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-21 11:37:23 -08:00
Vlad Yasevich
1690be63a2 bridge: Add vlan support to static neighbors
When a user adds bridge neighbors, allow him to specify VLAN id.
If the VLAN id is not specified, the neighbor will be added
for VLANs currently in the ports filter list.  If no VLANs are
configured on the port, we use vlan 0 and only add 1 entry.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-13 19:42:16 -05:00
David S. Miller
fd5023111c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Synchronize with 'net' in order to sort out some l2tp, wireless, and
ipv6 GRE fixes that will be built on top of in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-08 18:02:14 -05:00
Joe Perches
14f8dc4953 drivers: net: Remove remaining alloc/OOM messages
alloc failures already get standardized OOM
messages and a dump_stack.

For the affected mallocs around these OOM messages:

Converted kmallocs with multiplies to kmalloc_array.
Converted a kmalloc/memcpy to kmemdup.
Removed now unused stack variables.
Removed unnecessary parentheses.
Neatened alignment.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Arend van Spriel <arend@broadcom.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-08 17:44:39 -05:00
Tom Herbert
41b749201b mlx4_en: Fix BQL reset TX queue call point
Fix issue in Mellanox driver related to BQL.  netdev_tx_reset_queue
was not being called in certain situations where the device was
being start and stopped.  Moved netdev_tx_reset_queue from the reset
device path to mlx4_en_free_tx_buf which is where the rings are
cleaned in a reset (specifically from device being stopped).

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-07 23:33:51 -05:00
Yan Burman
0ccddcd1c2 net/mlx4_en: Implement ndo fdb functionality
Add support for setting embedded switch fdb in case of SRIOV, by
implementing ndo_fdb_{add, del, dump}. This will allow to use
bridged configuration with multi-function. In order to add VM MAC
to the eSwitch fdb, the following command may be used over the relevant function interface:
bridge fdb add <MAC> permanent self dev <IFACE>

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-07 23:26:13 -05:00
Yan Burman
cc5387f734 net/mlx4_en: Add unicast MAC filtering
Implement and advertise unicast MAC filtering, such that setting macvlan
instance over mlx4_en interfaces will not require the networking core
to put mlx4_en devices in promiscuous mode.

If for some reason adding a unicast address filter fails e.g as of missing space in
the HW mac table, the device forces itself into promiscuous mode (and out of this
forced state when enough space is available).

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-07 23:26:13 -05:00
Yan Burman
c07cb4b0ab net/mlx4_en: Manage hash of MAC addresses per port
As a preparation step for supporting multiple unicast addresses, store MAC addresses in hash table.
Remove the radix tree for MAC addresses per QP, as it's not in use.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-07 23:26:13 -05:00
Yan Burman
90bbb74af6 net/mlx4_en: Save previous MAC address of the port so we can replace it later
In preparation to having more than one unicast MAC per port, we need to keep track
of the previous MAC address in the flow of ndo_set_mac_address,
so that mlx4_en_replace_mac will know what to replace.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-07 23:26:13 -05:00
Yan Burman
0eb74fdda4 net/mlx4_en: Re-arrange ndo_set_rx_mode related code
Currently, mlx4_en_do_set_multicast serves as the ndo_set_rx_mode entry for mlx4_en,
doing all related work. Split it to few calls, one per required functionality
(e.g multicast, promiscuous, etc) and rename some structures and calls
to use rx_mode notation instead of multicast.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-07 23:26:13 -05:00
Yan Burman
16a10ffd20 net/mlx4: Move Ethernet related functionality from mlx4_core to mlx4_en
Move low level code that deals with management of Ethernet MACs and QPs from mlx4_core to mlx4_en.
Also convert the new functions to deal with MACs in form of char array instead of u64.

Actual functions moved:
mlx4_replace_mac
mlx4_get_eth_qp
mlx4_put_eth_qp

To conduct this change, some functionality had to be exported from the core,
the following functions were added:
mlx4_get_base_qp
__mlx4_replace_mac (low level function for CX1/A0 compatibility)

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-07 23:26:12 -05:00
Yan Burman
48e551ff3d net/mlx4_en: Cleanup multiline strings
Make the code consistent in regard to error messages
not spanning multiple lines.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-07 23:26:12 -05:00
Yan Burman
6bbb6d99f3 net/mlx4_en: Optimize Rx fast path filter checks
Currently, RX path code that does RX filtering is not optimized
and does an expensive conversion. In order to use ether_addr_equal_64bits
which is optimized for such cases, we need the MAC address kept by the device
to be in the form of unsigned char array instead of u64. Store the MAC address
as unsigned char array and convert to/from u64 out of the fast path when needed.
Side effect of this is that we no longer need priv->mac, since it's the same
as dev->dev_addr.

This optimization was suggested by Eric Dumazet <eric.dumazet@gmail.com>

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-07 23:26:12 -05:00
Yan Burman
79aeaccd91 net/mlx4_en: Optimize loopback related checks in data path
Currently there are relatively complex conditional checks in the fast path,
for TX loopback enabling and resulting RX filter logic.
Move elaborate if's out of data path, replace them with a single flag
for each state and update that state from appropriate places.
Also, in native (non SRIOV) mode and not in loopback or in selftest,
there is no need to try and filter out packets that HW loopback-ed,
as in native mode we do not loopback packets anymore.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-07 23:26:12 -05:00
Linus Torvalds
bb5204c2eb IB regression fixes for 3.8:
- Fix mlx4 VFs not working on old guests because of 64B CQE changes
  - Fix ill-considered sparse fix for qib
  - Fix IPoIB crash due to skb double destruct introduced in 3.8-rc1
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJREutcAAoJEENa44ZhAt0hjFwP/3SCQr/eboXyJV9GBlmbU9y2
 X7t6JS1n9R5KxBj54XBL8ZA7qcaw7rQj8VgPC4qlMWTR1/fXHsrtiRtQU1VMBcBP
 Eh50iE5BEq4kpK93IYZZei+I7J/0O1Hpj1JwUuvGr7/hQltMWXItuPTlO4Ror5Kk
 Vvu/waQh9DDp1uQRSPbSqAhEx7cGbl27UT7BLPqszVla59GA8UOUcfit8I9CyTmk
 ySP2zrDC7JtPoOPYy6w32K4NSjp3KTR4EHWX0G3t/b0LvwEHARwQZ3RI4ZjNMqLl
 mtKfqaYjqCeSlaT6MAODlN0aTp38GFAU0RaGePL5GurxQExwGnVZfTRUJDkNGTGO
 vPDq6+L6XwPHgYTs1knafs3OT24nwv/vzZ/SLV7gcssbxdL8Cru16E4CO3Vpryrl
 5B0w2+ld+L1lw/m4rSuqzQYpS6NpW35ATKzMhQNwk9cLCLNCOqv247WDvhBZDnpV
 lhLQ+RGs6DK7CQQ8w4rYLFBVk+1kPlZYILV0Rjni6vv7w9S/byVrshqE8eIkQwqE
 BEl0gMc83VZj5WH5s5MJEx+T5H2lZ80rIDKuamSz7wEduXWWENEqj5k7mBHa66Sn
 0aHcrXDe26Cj1TUCGbrgeFPMlucVAK+fSjvEzZrzQwxLspnKXlFw5v0DvqmTqBok
 hO0iE4ajfXl9RfIC7KrK
 =bmKU
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull IB regression fixes from Roland Dreier:

 - Fix mlx4 VFs not working on old guests because of 64B CQE changes

 - Fix ill-considered sparse fix for qib

 - Fix IPoIB crash due to skb double destruct introduced in 3.8-rc1

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/qib: Fix for broken sparse warning fix
  mlx4_core: Fix advertisement of wrong PF context behaviour
  IPoIB: Fix crash due to skb double destruct
2013-02-08 12:15:14 +11:00
Or Gerlitz
f97b4b5d46 mlx4_core: Fix advertisement of wrong PF context behaviour
Commit 08ff32352d ("mlx4: 64-byte CQE/EQE support") introduced a
regression where older guest VF drivers failed to load even when
64-byte EQEs/CQEs are disabled, since the PF wrongly advertises the
new context behaviour anyway.  The failure looks like:

    mlx4_core 0000:00:07.0: Unknown pf context behaviour
    mlx4_core 0000:00:07.0: Failed to obtain slave caps
    mlx4_core: probe of 0000:00:07.0 failed with error -38

Fix this by basing this advertisement on dev->caps.flags, which is the
operational capabilities used by the QUERY_FUNC_CAP command wrapper
(dev_cap->flags holds the firmware capabilities).

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-05 09:35:41 -08:00
Hadar Hen Zion
f9d96862ca net/mlx4_en: Fix compilation error when CONFIG_INET isn't defined
ip_eth_mc_map function can't be used when CONFIG_INET isn't defined.
Fixed compilation error by adding CONFIG_INET define check before using the
function.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-04 13:26:50 -05:00
Hadar Hen Zion
377d97393d net/mlx4_en: Fix error propagation for ethtool helper function
Propagate return value of mlx4_en_ethtool_add_mac_rule_by_ipv4 in case of
failure.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-04 13:26:50 -05:00
Joe Perches
b2adaca92c ethernet: Remove unnecessary alloc/OOM messages, alloc cleanups
alloc failures already get standardized OOM
messages and a dump_stack.

Convert kzalloc's with multiplies to kcalloc.
Convert kmalloc's with multiplies to kmalloc_array.
Fix a few whitespace defects.
Convert a constant 6 to ETH_ALEN.
Use parentheses around sizeof.
Convert vmalloc/memset to vzalloc.
Remove now unused size variables.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-04 13:22:33 -05:00
Amir Vadai
3484aac161 net/mlx4_en: Fix transmit timeout when driver restarts port
Under heavy CPU load, changing, ring size/mtu/etc. could result in transmit
timeout, since stop-start port might take more than 10 seconds.
Calling netif_detach_device to prevent tx queue transmit timeout.

netif_detach_device() is not called under ndo_stop, because netif_carrier_off
will prevent the timeout, and device should not be marked as not present, or
else user won't be able to start it later on.

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 12:48:48 -05:00
Matan Barak
955154fa33 net/mlx4_en: Don't reassign port mac address on firmware that supports it
Mac reassignments should only be done when not supported by the firmware. To
accomplish that, checking firmware capability bit to know whether we should
reassign macs in the driver.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 12:48:47 -05:00
Hadar Hen Zion
23537b732f net/mlx4_core: Use firmware driven flow steering hash mode
The Firmware dynamically changes flow steering hash configuration from covering
L2 only to "full" L2/L3/L4 mode needed.  The dynamic change allows the driver
to set hard coded hash configuration which is changed by the firmware from L2
to L2/L3/L4 when attaching the first L3/L4 flow steering rule and back to L2
when there are no more such rules.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 12:48:47 -05:00
Hadar Hen Zion
0d256c0e93 net/mlx4_en: Fix ethtool rules leftovers after module unloaded
As part of the driver unload flow, all steering rules must be deleted,
make sure to remove the rules that were set through ethtool.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 12:48:47 -05:00
Hadar Hen Zion
280fce1e3e net/mlx4_en: Block insertion of ethtool steering rules while the interface is down
Attaching steering rules while the interface is down is an invalid operation, block it.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 12:48:47 -05:00
Hadar Hen Zion
8258bd2713 net/mlx4_en: Fix vlan mask for ethtool steering rules
The vlan mask field should be validated and assigned according to the field
size which is 12 bits. Also replace the numeric 0xfff mask with existing kernel
macro.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 12:48:47 -05:00
Hadar Hen Zion
69d7126b7f net/mlx4_en: Validate VLAN IDs provided in ethtool flow steering rules
When attaching flow steering rules via Ethtool accept only valid vlans IDs e.g
in the range: [0,4095].

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 12:48:46 -05:00
Hadar Hen Zion
f90a36734a net/mlx4_en: Fix ip/udp steering rules multicast mac when attached via ethtool
Destination mac is a mandatory specification for ip/udp steering rules.
When attaching multicast steering rules via ethtool the unicast mac of the
interface was added to the rule specification instead of the multicast mac.
The following commit sets the corresponding multicast mac for the rule multicast ip.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 12:48:46 -05:00
Hadar Hen Zion
248c62aa12 net/mlx4_core: Set correctly allow_loopback flag
The allow_loopback flag was wrongly set using arithmetic bit operation, change
the code to use logical bit operation.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 12:48:46 -05:00
Hadar Hen Zion
015465f851 net/mlx4_core: Directly expose fields of HW flow steering rule control segment
Some of the fields for struct mlx4_net_trans_rule_hw_ctrl were packed into u32
and accessed through bit field operations. Expose and access them directly as
u8.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 12:48:46 -05:00
David S. Miller
f1e7b73acc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Bring in the 'net' tree so that we can get some ipv4/ipv6 bug
fixes that some net-next work will build upon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 15:32:13 -05:00
Jiri Kosina
617677295b Merge branch 'master' into for-next
Conflicts:
	drivers/devfreq/exynos4_bus.c

Sync with Linus' tree to be able to apply patches that are
against newer code (mvneta).
2013-01-29 10:48:30 +01:00
Amir Vadai
78fb2de711 net/mlx4_en: Initialize RFS filters lock and list in init_netdev
filters_lock might have been used while it was re-initialized.
Moved filters_lock and filters_list initialization to init_netdev instead of
alloc_resources which is called every time the device is configured.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28 00:14:28 -05:00
Amir Vadai
7225922558 net/mlx4_en: Fix a race when closing TX queue
There is a possible race where the TX completion handler can clean the
entire TX queue between the decision that the queue is full and actually
closing it. To avoid this situation, check again if the queue is really
full, if not, reopen the transmit and continue with sending the packet.

CC: Eric Dumazet <edumazet@google.com>

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28 00:14:24 -05:00
Jack Morgenstein
f356fcbe12 net/mlx4_core: Return proper error code when __mlx4_add_one fails
Returning 0 (success) when in fact we are aborting the load, leads to kernel
panic when unloading the module. Fix that by returning the actual error code.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28 00:13:57 -05:00
Eugenia Emantayev
dbd501a806 net/mlx4_en: Use the correct netif lock on ndo_set_rx_mode
The device multicast list is protected by netif_addr_lock_bh in the networking core, we should
use this locking practice in mlx4_en too.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28 00:13:56 -05:00
Aviad Yehezkel
db0e7cba6d net/mlx4_en: Fix traffic loss under promiscuous mode
When port is stopped and flow steering mode is not device managed: promisc QP
rule wasn't removed from MCG table.
Added code to remove it in all flow steering modes.
In addition, promsic rule removal should be in stop port and not in start
port - moved it accordingly.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28 00:13:56 -05:00
Eugenia Emantayev
2d51837fa1 net/mlx4_en: Issue the dump eth statistics command under lock
Performing the DUMP_ETH_STATS firmware command outside the lock leads to kernel
panic when data structures such as RX/TX rings are freed in parallel, e.g when
one changes the mtu or ring sizes.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28 00:13:56 -05:00
Eric Dumazet
546bfedbe6 net/mlx4_en: remove redundant code
remove redundant code from build_inline_wqe()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-20 23:10:52 -05:00
Or Gerlitz
ca4c7b35f7 net/mlx4_core: Set number of msix vectors under SRIOV mode to firmware defaults
The lines

	if (mlx4_is_mfunc(dev)) {
		nreq = 2;
	} else {

which hard code the number of requested msi-x vectors under multi-function
mode to two can be removed completely, since the firmware sets num_eqs and
reserved_eqs appropriately Thus, the code line:

	nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs, nreq);

is by itself sufficient and correct for all cases. Currently, for mfunc
mode num_eqs = 32 and reserved_eqs = 28, hence four vectors will be enabled.

This triples (one vector is used for the async events and commands EQ) the
horse power provided for processing of incoming packets on netdev RSS scheme,
IO initiators/targets commands processing flows, etc.

Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18 14:25:28 -05:00
Yan Burman
213815a1e6 net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode
Commit 5b4c4d3686 "mlx4_en: Allow communication between functions on
same host" introduced a regression under which a bridge acting as vSwitch
whose uplink is an mlx4 Ethernet device become non-operative in native
(non sriov) mode. This happens since broadcast ARP requests sent by VMs
were loopback-ed by the HW and hence the bridge learned VM source MACs
on both the VM and the uplink ports.

The fix is to place the DMAC in the send WQE only under SRIOV/eSwitch
configuration or when the device is in selftest.

Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-18 14:25:28 -05:00
Jiri Pirko
aaeb6cdfa5 remove init of dev->perm_addr in drivers
perm_addr is initialized correctly in register_netdevice() so to init it in
drivers is no longer needed.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-08 18:00:48 -08:00
Jorrit Schippers
d82603c6da treewide: Replace incomming with incoming in all comments and strings
Signed-off-by: Jorrit Schippers <jorrit@ncode.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-01-03 16:15:49 +01:00
Jack Morgenstein
3c439b5586 mlx4_core: Allow choosing flow steering mode
Device managed flow steering will be enabled only under administrator
directive provided through setting the existing module parameter
log_num_mgm_entry_size to -1 (if the device actually supports flow
steering).  If flow steering isn't requested or not available, the
driver will use the value of log_num_mgm_entry_size and B0 steering.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19 11:47:22 -08:00
Jack Morgenstein
7b8157bedc mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV
Separate flow steering capability detection from the decision to activate.

For the master (and for native), detect the flow steering capability
in mlx4_dev_cap, but activate the appropriate steering type in a new
function choose_flow_steering() based on detected data.

For VFs, activate flow steering based on what was actually activated
by the master, where that info is obtained via QUERY_HCA. This fixes
the current VF detection which is wrongly based on QUERY_DEV_CAP.

Also, for SR-IOV mode, if flow steering may be activated, do so only
if the max number of QPs per rule is sufficient to satisfy one
subscription per VF.  If not, fall back to B0 mode. This is needed to
serve registrations done by L2 network drivers such as mlx4_en and
IPoIB when the network stack attempts to join to multicast groups such
as all-hosts or the IPoIB broadcast group.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19 11:47:14 -08:00
Hadar Hen Zion
2065b38bc2 mlx4_core: Fix error flow in the flow steering wrapper
The error flow of the flow steering wrapper had a typo which caused
the wrong firmware command to be called, fix it.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19 09:44:00 -08:00
Hadar Hen Zion
a9c01e7ac1 mlx4_core: Add QPN enforcement for flow steering rules set by VFs
Since VFs may be mapped to VMs which aren't trusted entities, flow
steering rules attached through the wrapper on behalf of VFs must be
checked to make sure that the specified QP number is assigned to that
VF.  Also, make sure to keep the QP busy till the end of the operation
from the resource tracker point of view.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19 09:43:59 -08:00
Linus Torvalds
f132c54e3a First batch of InfiniBand/RDMA changes for the 3.8 merge window:
- A good chunk of Bart Van Assche's SRP fixes
  - UAPI disintegration from David Howells
  - mlx4 support for "64-byte CQE" hardware feature from Or Gerlitz
  - Other miscellaneous fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQxstjAAoJEENa44ZhAt0hURUQAJd7HumReKTdRqzIzXPc+rgl
 pRR5eqplPY2anfJMqLDiFphVjfCiKyhudomdo+RUbBFFnUVLlBzk80A0/IZ3g3PZ
 MHOT+pX4PGDd+3FQxV2AaQCMwgGbvC0haInXyQDVZGm0fbMjRd699yGVWBiA8rOI
 VNhUi5WMmynSINYokM8UxrhfoUfy3QxsOvZBZ3XUD1zjJB0IMd5HRdiDUG7ur0q+
 rfpWKv51DXT81ux36MXbdPBhLRbzx4B7EwuPWOFPqJe1KwK2cD8iA6DwEKC9KMxS
 Kj2+CxB5Bfpfz8bhLi2VZcMgAKiSIQDXUtiKz8h0yFVhvADYZLU7zdGN49mCqKcY
 9dwX8+0aIVez6WB2jH+ir2FSG65NsnvqESwQ4LLQ9bhArgf9fapVGlypHwcKi5hh
 3j2ipO/RyT56nLQeI0gz1P5mQneFSWlY96CD8WP+9OxO/mVnxViajzevSwT/cLE6
 IOMks8DPhsQK88JXSx0XKVxn3zrJ9SXbYDhRWJ6f4w/fxraRXlFdQi0UfcsAajkX
 5qmM4e8Oy97TJYiY1RkAmb7aV182xMWVjtDx2FFTQ5ukgDea/DklIM/JNQ475027
 N7zMW1tP6+gnnDyMEkteVuPdbl1fzwI3RdXCh0mFZHZ5tvegkdxbw0XxERcevnQN
 LZfME8wCuC7+RtmE38Li
 =TQK2
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband upate from Roland Dreier:
 "First batch of InfiniBand/RDMA changes for the 3.8 merge window:
   - A good chunk of Bart Van Assche's SRP fixes
   - UAPI disintegration from David Howells
   - mlx4 support for "64-byte CQE" hardware feature from Or Gerlitz
   - Other miscellaneous fixes"

Fix up trivial conflict in mellanox/mlx4 driver.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (33 commits)
  RDMA/nes: Fix for crash when registering zero length MR for CQ
  RDMA/nes: Fix for terminate timer crash
  RDMA/nes: Fix for BUG_ON due to adding already-pending timer
  IB/srp: Allow SRP disconnect through sysfs
  srp_transport: Document sysfs attributes
  srp_transport: Simplify attribute initialization code
  srp_transport: Fix attribute registration
  IB/srp: Document sysfs attributes
  IB/srp: send disconnect request without waiting for CM timewait exit
  IB/srp: destroy and recreate QP and CQs when reconnecting
  IB/srp: Eliminate state SRP_TARGET_DEAD
  IB/srp: Introduce the helper function srp_remove_target()
  IB/srp: Suppress superfluous error messages
  IB/srp: Process all error completions
  IB/srp: Introduce srp_handle_qp_err()
  IB/srp: Simplify SCSI error handling
  IB/srp: Keep processing commands during host removal
  IB/srp: Eliminate state SRP_TARGET_CONNECTING
  IB/srp: Increase block layer timeout
  RDMA/cm: Change return value from find_gid_port()
  ...
2012-12-13 19:19:09 -08:00
Linus Torvalds
a2013a13e6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial branch from Jiri Kosina:
 "Usual stuff -- comment/printk typo fixes, documentation updates, dead
  code elimination."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  HOWTO: fix double words typo
  x86 mtrr: fix comment typo in mtrr_bp_init
  propagate name change to comments in kernel source
  doc: Update the name of profiling based on sysfs
  treewide: Fix typos in various drivers
  treewide: Fix typos in various Kconfig
  wireless: mwifiex: Fix typo in wireless/mwifiex driver
  messages: i2o: Fix typo in messages/i2o
  scripts/kernel-doc: check that non-void fcts describe their return value
  Kernel-doc: Convention: Use a "Return" section to describe return values
  radeon: Fix typo and copy/paste error in comments
  doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
  various: Fix spelling of "asynchronous" in comments.
  Fix misspellings of "whether" in comments.
  eisa: Fix spelling of "asynchronous".
  various: Fix spelling of "registered" in comments.
  doc: fix quite a few typos within Documentation
  target: iscsi: fix comment typos in target/iscsi drivers
  treewide: fix typo of "suport" in various comments and Kconfig
  treewide: fix typo of "suppport" in various comments
  ...
2012-12-13 12:00:02 -08:00
Yan Burman
520dfe3a36 net/mlx4_en: Add support for destination MAC in steering rules
Implement destination MAC rule extension for L3/L4 rules in
flow steering. Usefull for vSwitch/macvlan configurations.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12 13:02:30 -05:00
Yan Burman
c402b9477b net/mlx4_en: Use generic etherdevice.h functions.
Get rid of full_mac, zero_mac in favour of
is_zero_ether_addr and is_broadcast_ether_addr.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12 13:02:30 -05:00
Greg Kroah-Hartman
1dd06ae8db drivers/net: fix up function prototypes after __dev* removals
The __dev* removal patches for the network drivers ended up messing up
the function prototypes for a bunch of drivers.  This patch fixes all of
them back up to be properly aligned.

Bonus is that this almost removes 100 lines of code, always a nice
surprise.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07 14:22:22 -05:00
Bill Pemberton
f57e68488e mlx4_core: remove __dev* attributes
CONFIG_HOTPLUG is going away as an option.  As result the __dev*
markings will be going away.

Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst,
and __devexit.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-03 11:16:44 -08:00
Amir Vadai
d317966bd3 net/mlx4_en: Set number of rx/tx channels using ethtool
Add support to changing number of rx/tx channels using
ethtool ('ethtool -[lL]'). Where the number of tx channels specified in ethtool
is the number of rings per user priority - not total number of tx rings.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-02 20:22:59 -05:00
Amir Vadai
79c54b6bbf net/mlx4_en: Fix TX moderation info loss after set_ringparam is called
We need to re-set tx moderation information after calling set_ringparam
else default tx moderation will be used.
Also avoid related code duplication, by putting it in a utility function.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-02 20:22:58 -05:00
Jack Morgenstein
311f813a2d mlx4_core: Fix potential deadlock in mlx4_eq_int()
The slave_state_lock spinlock is used in both interrupt context and
process context, hence irq locking must be used.  Found by lockdep.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-29 12:14:54 -08:00
David S. Miller
8a2cf062b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29 12:51:17 -05:00
Amir Vadai
29bb8f4a8d net/mlx4_en: Can set maxrate only for TC0
Had a typo in memcpy.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 11:15:32 -05:00
Or Gerlitz
08ff32352d mlx4: 64-byte CQE/EQE support
ConnectX-3 devices can use either 64- or 32-byte completion queue
entries (CQEs) and event queue entries (EQEs).  Using 64-byte
EQEs/CQEs performs better because each entry is aligned to a complete
cacheline.  This patch queries the HCA's capabilities, and if it
supports 64-byte CQEs and EQES the driver will configure the HW to
work in 64-byte mode.

The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.

Since this mode is global, userspace (libmlx4) must be updated to work
with the configured CQE size, and guests using SR-IOV virtual
functions need to know both EQE and CQE size.

In case one of the 64-byte CQE/EQE capabilities is activated, the
patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that
they need an update to be able to work with the PPF. This is done by
changing the returned pf_context_behaviour not to be zero any more. In
case none of these capabilities is activated that value remains zero
and older guest drivers can run OK.

The SRIOV related flow is as follows

1. the PPF does the detection of the new capabilities using
   QUERY_DEV_CAP command.

2. the PPF activates the new capabilities using INIT_HCA.

3. the VF detects if the PPF activated the capabilities using
   QUERY_HCA, and if this is the case activates them for itself too.

Note that the VF detects that it must be aware to the new PF behaviour
using QUERY_FUNC_CAP.  Steps 1 and 2 apply also for native mode.

User space notification is done through a new field introduced in
struct mlx4_ib_ucontext which holds device capabilities for which user
space must take action. This changes the binary interface so the ABI
towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
when **needed** i.e. only when the driver does use 64-byte CQEs or
future device capabilities which must be in sync by user space. This
practice allows to work with unmodified libmlx4 on older devices (e.g
A0, B0) which don't support 64-byte CQEs.

In order to keep existing systems functional when they update to a
newer kernel that contains these changes in VF and userspace ABI, a
module parameter enable_64b_cqe_eqe must be set to enable 64-byte
mode; the default is currently false.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-26 10:19:17 -08:00
Ben Hutchings
ff33c0e188 net: Remove bogus dependencies on INET
Various drivers depend on INET because they used to select INET_LRO,
but they have all been converted to use GRO which has no such
dependency.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 19:13:59 -05:00
Ben Hutchings
f1d29a3fa6 mlx4_en: Remove remnants of LRO support
Commit fa37a9586f ('mlx4_en: Moving to
work with GRO') left behind the Kconfig depends/select, some dead
code and comments referring to LRO.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 19:13:59 -05:00
Adam Buchbinder
b3834be5c4 various: Fix spelling of "asynchronous" in comments.
"Asynchronous" is misspelled in some comments. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-19 14:32:13 +01:00
David S. Miller
d4185bbf62 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c

Minor conflict between the BCM_CNIC define removal in net-next
and a bug fix added to net.  Based upon a conflict resolution
patch posted by Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-10 18:32:51 -05:00
Eric Dumazet
ecfd2ce1a9 mlx4: change TX coalescing defaults
mlx4 currently uses a too high tx coalescing setting, deferring
TX completion interrupts by up to 128 us.

With the recent skb_orphan() removal in commit 8112ec3b87,
performance of a single TCP flow is capped to ~4 Gbps, unless
we increase tcp_limit_output_bytes.

I suggest using 16 us instead of 128 us, allowing a finer control.

Performance of a single TCP flow is restored to previous levels,
while keeping TCP small queues fully enabled with default sysctl.

This patch is also a BQL prereq.

Reported-by: Vimalkumar <j.vimal@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-07 15:30:19 -05:00
Linus Torvalds
e657e078d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "This is what we usually expect at this stage of the game, lots of
  little things, mostly in drivers.  With the occasional 'oops didn't
  mean to do that' kind of regressions in the core code."

 1) Uninitialized data in __ip_vs_get_timeouts(), from Arnd Bergmann

 2) Reject invalid ACK sequences in Fast Open sockets, from Jerry Chu.

 3) Lost error code on return from _rtl_usb_receive(), from Christian
    Lamparter.

 4) Fix reset resume on USB rt2x00, from Stanislaw Gruszka.

 5) Release resources on error in pch_gbe driver, from Veaceslav Falico.

 6) Default hop limit not set correctly in ip6_template_metrics[], fix
    from Li RongQing.

 7) Gianfar PTP code requests wrong kind of resource during probe, fix
    from Wei Yang.

 8) Fix VHOST net driver on big-endian, from Michael S Tsirkin.

 9) Mallenox driver bug fixes from Jack Morgenstein, Or Gerlitz, Moni
    Shoua, Dotan Barak, and Uri Habusha.

10) usbnet leaks memory on TX path, fix from Hemant Kumar.

11) Use socket state test, rather than presence of FIN bit packet, to
    determine FIONREAD/SIOCINQ value.  Fix from Eric Dumazet.

12) Fix cxgb4 build failure, from Vipul Pandya.

13) Provide a SYN_DATA_ACKED state to complement SYN_FASTOPEN in socket
    info dumps.  From Yuchung Cheng.

14) Fix leak of security path in kfree_skb_partial().  Fix from Eric
    Dumazet.

15) Handle RX FIFO overflows more resiliently in pch_gbe driver, from
    Veaceslav Falico.

16) Fix MAINTAINERS file pattern for networking drivers, from Jean
    Delvare.

17) Add iPhone5 IDs to IPHETH driver, from Jay Purohit.

18) VLAN device type change restriction is too strict, and should not
    trigger for the automatically generated vlan0 device.  Fix from Jiri
    Pirko.

19) Make PMTU/redirect flushing work properly again in ipv4, from
    Steffen Klassert.

20) Fix memory corruptions by using kfree_rcu() in netlink_release().
    From Eric Dumazet.

21) More qmi_wwan device IDs, from Bjørn Mork.

22) Fix unintentional change of SNAT/DNAT hooks in generic NAT
    infrastructure, from Elison Niven.

23) Fix 3.6.x regression in xt_TEE netfilter module, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
  tilegx: fix some issues in the SW TSO support
  qmi_wwan/cdc_ether: move Novatel 551 and E362 to qmi_wwan
  net: usb: Fix memory leak on Tx data path
  net/mlx4_core: Unmap UAR also in the case of error flow
  net/mlx4_en: Don't use vlan tag value as an indication for vlan presence
  net/mlx4_en: Fix double-release-range in tx-rings
  bas_gigaset: fix pre_reset handling
  vhost: fix mergeable bufs on BE hosts
  gianfar_ptp: use iomem, not ioports resource tree in probe
  ipv6: Set default hoplimit as zero.
  NET_VENDOR_TI: make available for am33xx as well
  pch_gbe: fix error handling in pch_gbe_up()
  b43: Fix oops on unload when firmware not found
  mwifiex: clean up scan state on error
  mwifiex: return -EBUSY if specific scan request cannot be honored
  brcmfmac: fix potential NULL dereference
  Revert "ath9k_hw: Updated AR9003 tx gain table for 5GHz"
  ath9k_htc: Add PID/VID for a Ubiquiti WiFiStation
  rt2x00: usb: fix reset resume
  rtlwifi: pass rx setup error code to caller
  ...
2012-10-26 15:00:48 -07:00
Dotan Barak
bfc0d8c3de net/mlx4_core: Unmap UAR also in the case of error flow
If a failure takes place during the EQ creation, we need to unmap the
UAR memory block too.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Uri Habusha <urih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26 03:34:15 -04:00
Moni Shoua
2b39a06198 net/mlx4_en: Don't use vlan tag value as an indication for vlan presence
The vlan tag can be zero. This is why it can't serve as an indication
that packet requires VLAN header in the TX flow.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26 03:34:15 -04:00
Jack Morgenstein
7208ca3007 net/mlx4_en: Fix double-release-range in tx-rings
The QP range is reserved as a single block. However, when freeing the
en resources, the tx-ring QPs are released both in mlx4_en_destroy_tx_ring
(one at a time) and in mlx4_en_free_resources (as a block release).

Fix by eliminating the one-at-a-time release in mlx4_en_destroy_tx_ring.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26 03:34:15 -04:00
Dotan Barak
41929ed265 mlx4_core: Perform correct resource cleanup if mlx4_QUERY_ADAPTER() fails
Fixed the resource cleanup to act correctly and prevent a kernel oops when
mlx4_QUERY_ADAPTER() fails.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-23 09:03:37 -07:00
Or Gerlitz
3cf164c8de mlx4_core: Remove annoying debug messages from SR-IOV flow
These debug prints left behind by commits c82e9aa0a8 ("mlx4_core:
resource tracking for HCA resources used by guests"), 54679e1482
("mlx4: Implement QP paravirtualization and maintain phys_pkey_cache
for smp_snoop") and 993c401e20 ("mlx4_core: Add IB port-state
machine and port mgmt event propagation") make it pretty hard to
actually use the mlx4_core debug messages when running in SRIOV/IB
mode -- for example, the module load sequence of a device with one VF
yielded 631 debug prints, with 408 of them being from this set.  Let's
just remove them.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-23 09:03:30 -07:00
Jack Morgenstein
60396683fe mlx4_core: Adjust flow steering attach wrapper so that IB works on SR-IOV VFs
Currently, the InfiniBand stack does not support flow steering at the
verbs level -- the only usage of flow steering in the IB driver is for
L2 multicast attaches.  We need to add the IB case to procedure
mlx4_QP_FLOW_STEERING_ATTACH_wrapper() to allow IPoIB to work on VFs
on ConnectX-3 when flow steering is enabled.

Currently, the IB case in mlx4_QP_FLOW_STEERING_ATTACH_wrapper() is missing,
so the procedure returns -EINVAL and IPoIB on VFs breaks.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-03 14:11:20 -07:00
Linus Torvalds
7a9a2970b5 First batch of InfiniBand/RDMA changes for the 3.7 merge window:
- mlx4 IB support for SR-IOV
  - A couple of SRP initiator fixes
  - Batch of nes hardware driver fixes
  - Fix for long-standing use-after-free crash in IPoIB
  - Other miscellaneous fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQav4jAAoJEENa44ZhAt0hmL0QAJTuMdSOzYFd/NB38owJCNM2
 kz/N1GlBm3z98fIlGo8u+lzgV2qxqZSAzsJsouMeK38KiAX3CL8HKe44A1QvTM6v
 dXTNL4JFX24/YF+nlmMY8Av518I9Mkte3BZCnpYkBjVFBWe0ePwoRC/btfBXPDIV
 0snq4OtjoBAn00dOOyuZ5PoyY9xf0z4UB0Gple9sM4mzEb8wVWdNDDPOiuPJc6fA
 L+gk6HLkZDg54+QswafdKYwpeTq45wIKLmCdS3oUNmppMLVhZY8rECOwzSa+KiTr
 /Yo19n+zl+IBlvjQHhmUqGHvdD17PaGlr+TckAsQqmVfXUH5qqpEnkF8FoEK59c5
 YA3lVU8Sj4BPhJ7qX54CuN3767mZizakkxCr9iPRzABFTgzWVgcSgCrE8jjx4i0h
 Pam+L5bmANFStgmGR8PmXiNgcrCUcEqYHsOWDDAnHa5ekb2nyv1JL1c18hlY9hC3
 Xb1YTMZFwvofGza89hBu7oHrMbLOUc5kW2lBpvUn2nlyf3i0F8ISlVbVbNjFA54p
 60/jHa2VOQ2CcJUJKnJOk4ajOOEfHnPtMn2q96XJ69Dp8+eSYEO/G+0i1OlChq4h
 ClnG0Yp+NkT1o8WXMd7guDR+RsXt+DXIij5TiUWRIqnIlopIsMTRhNH28tMu4jQL
 fgN5n987wru91ewdX4gW
 =PAcy
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband updates from Roland Dreier:
 "First batch of InfiniBand/RDMA changes for the 3.7 merge window:
   - mlx4 IB support for SR-IOV
   - A couple of SRP initiator fixes
   - Batch of nes hardware driver fixes
   - Fix for long-standing use-after-free crash in IPoIB
   - Other miscellaneous fixes"

This merge also removes a new use of __cancel_delayed_work(), and
replaces it with the regular cancel_delayed_work() that is now irq-safe
thanks to the workqueue updates.

That said, I suspect the sequence in question should probably use
"mod_delayed_work()".  I just did the minimal "don't use deprecated
functions" fixup, though.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits)
  IB/qib: Fix local access validation for user MRs
  mlx4_core: Disable SENSE_PORT for multifunction devices
  mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs
  mlx4_core: Stash PCI ID driver_data in mlx4_priv structure
  IB/srp: Avoid having aborted requests hang
  IB/srp: Fix use-after-free in srp_reset_req()
  IB/qib: Add a qib driver version
  RDMA/nes: Fix compilation error when nes_debug is enabled
  RDMA/nes: Print hardware resource type
  RDMA/nes: Fix for crash when TX checksum offload is off
  RDMA/nes: Cosmetic changes
  RDMA/nes: Fix for incorrect MSS when TSO is on
  RDMA/nes: Fix incorrect resolving of the loopback MAC address
  mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem
  mlx4_core: Trivial cleanups to driver log messages
  mlx4_core: Trivial readability fix: "0X30" -> "0x30"
  IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes
  mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
  mlx4: Paravirtualize Node Guids for slaves
  mlx4: Activate SR-IOV mode for IB
  ...
2012-10-02 17:20:40 -07:00
Linus Torvalds
aecdc33e11 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

 1) GRE now works over ipv6, from Dmitry Kozlov.

 2) Make SCTP more network namespace aware, from Eric Biederman.

 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

 4) Make openvswitch network namespace aware, from Pravin B Shelar.

 5) IPV6 NAT implementation, from Patrick McHardy.

 6) Server side support for TCP Fast Open, from Jerry Chu and others.

 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

 8) Increate the loopback default MTU to 64K, from Eric Dumazet.

 9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic.  This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail.  Benefits are
    a) less segments for driver to process b) less calls to page
    allocator c) less waste of space.

    From Eric Dumazet.

11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.

Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
  hyperv: Add buffer for extended info after the RNDIS response message.
  hyperv: Report actual status in receive completion packet
  hyperv: Remove extra allocated space for recv_pkt_list elements
  hyperv: Fix page buffer handling in rndis_filter_send_request()
  hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
  hyperv: Fix the max_xfer_size in RNDIS initialization
  vxlan: put UDP socket in correct namespace
  vxlan: Depend on CONFIG_INET
  sfc: Fix the reported priorities of different filter types
  sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
  sfc: Fix loopback self-test with separate_tx_channels=1
  sfc: Fix MCDI structure field lookup
  sfc: Add parentheses around use of bitfield macro arguments
  sfc: Fix null function pointer in efx_sriov_channel_type
  vxlan: virtual extensible lan
  igmp: export symbol ip_mc_leave_group
  netlink: add attributes to fdb interface
  tg3: unconditionally select HWMON support when tg3 is enabled.
  Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
  gre: fix sparse warning
  ...
2012-10-02 13:38:27 -07:00
Linus Torvalds
033d9959ed Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue changes from Tejun Heo:
 "This is workqueue updates for v3.7-rc1.  A lot of activities this
  round including considerable API and behavior cleanups.

   * delayed_work combines a timer and a work item.  The handling of the
     timer part has always been a bit clunky leading to confusing
     cancelation API with weird corner-case behaviors.  delayed_work is
     updated to use new IRQ safe timer and cancelation now works as
     expected.

   * Another deficiency of delayed_work was lack of the counterpart of
     mod_timer() which led to cancel+queue combinations or open-coded
     timer+work usages.  mod_delayed_work[_on]() are added.

     These two delayed_work changes make delayed_work provide interface
     and behave like timer which is executed with process context.

   * A work item could be executed concurrently on multiple CPUs, which
     is rather unintuitive and made flush_work() behavior confusing and
     half-broken under certain circumstances.  This problem doesn't
     exist for non-reentrant workqueues.  While non-reentrancy check
     isn't free, the overhead is incurred only when a work item bounces
     across different CPUs and even in simulated pathological scenario
     the overhead isn't too high.

     All workqueues are made non-reentrant.  This removes the
     distinction between flush_[delayed_]work() and
     flush_[delayed_]_work_sync().  The former is now as strong as the
     latter and the specified work item is guaranteed to have finished
     execution of any previous queueing on return.

   * In addition to the various bug fixes, Lai redid and simplified CPU
     hotplug handling significantly.

   * Joonsoo introduced system_highpri_wq and used it during CPU
     hotplug.

  There are two merge commits - one to pull in IRQ safe timer from
  tip/timers/core and the other to pull in CPU hotplug fixes from
  wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."

Fixed a number of trivial conflicts, but the more interesting conflicts
were silent ones where the deprecated interfaces had been used by new
code in the merge window, and thus didn't cause any real data conflicts.

Tejun pointed out a few of them, I fixed a couple more.

* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
  workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
  workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
  workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
  workqueue: remove @delayed from cwq_dec_nr_in_flight()
  workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
  workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
  workqueue: use __cpuinit instead of __devinit for cpu callbacks
  workqueue: rename manager_mutex to assoc_mutex
  workqueue: WORKER_REBIND is no longer necessary for idle rebinding
  workqueue: WORKER_REBIND is no longer necessary for busy rebinding
  workqueue: reimplement idle worker rebinding
  workqueue: deprecate __cancel_delayed_work()
  workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
  workqueue: use mod_delayed_work() instead of __cancel + queue
  workqueue: use irqsafe timer for delayed_work
  workqueue: clean up delayed_work initializers and add missing one
  workqueue: make deferrable delayed_work initializer names consistent
  workqueue: cosmetic whitespace updates for macro definitions
  workqueue: deprecate system_nrt[_freezable]_wq
  workqueue: deprecate flush[_delayed]_work_sync()
  ...
2012-10-02 09:54:49 -07:00
Roland Dreier
d172f5a4ab Merge branches 'cma', 'cxgb4', 'ipoib', 'mlx4', 'mlx4-sriov', 'nes', 'qib' and 'srp' into for-linus 2012-10-02 07:43:59 -07:00
Eric Dumazet
8112ec3b87 mlx4: dont orphan skbs in mlx4_en_xmit()
After commit e22979d96a (mlx4_en: Moving to Interrupts for TX
completions) we no longer need to orphan skbs in mlx4_en_xmit()
since skb wont stay a long time in TX ring before their release.

Orphaning skbs in ndo_start_xmit() should be avoided as much as
possible, since it breaks TCP Small Queue or other flow control
mechanisms (per socket limits)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-01 17:01:57 -04:00
Linus Torvalds
fdb2f9c2eb PCI changes for the 3.7 merge window:
Host bridge hotplug
     - Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi)
     - Clear host bridge resource info to avoid issue when releasing (Yinghai Lu)
     - Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu)
     - Use standard list ops for acpi_pci_drivers (Jiang Liu)
 
   Device hotplug
     - Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang Liu)
     - Remove fakephp driver (Bjorn Helgaas)
     - Fix VGA ref count in hotplug remove path (Yinghai Lu)
     - Allow acpiphp to handle PCIe ports without native hotplug (Jiang Liu)
     - Implement resume regardless of pciehp_force param (Oliver Neukum)
     - Make pci_fixup_irqs() work after init (Thierry Reding)
 
   Miscellaneous
     - Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang)
     - Factor out PCI Express Capability accessors (Jiang Liu)
     - Add pcibios_window_alignment() so powerpc EEH can use generic resource assignment (Gavin Shan)
     - Make pci_error_handlers const (Stephen Hemminger)
     - Cleanup drivers/pci/remove.c (Bjorn Helgaas)
     - Improve Vendor-Specific Extended Capability support (Bjorn Helgaas)
     - Use standard list ops for bus->devices (Bjorn Helgaas)
     - Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang)
     - Reassign invalid bus number ranges (Intel DP43BF workaround) (Yinghai Lu)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQac4hAAoJEPGMOI97Hn6zjZYP/iaqU9kjmgTsBbSyzB4oApv/
 RRxo3I+ad9GF6XlMQfVAtyx1pgCD1gdGAtoDgGSCTqgdYD3CO10AxKU+yleAk1wo
 dbMxLifJNTrT3G1mZ/NL16yEGhCwvhfwzRtB1VoZmCT4lSApO/7cJkXl2DzHfA/i
 pmltOOiQCN8kbUcJbVPtUyTVPi2zl/8bsyCyTkS7YG0VXeGRM+ZUvPWZJ7MnWYYB
 5qoCdrw5ENCCiDQ9yw5SAfgL23b+0p6OI/x3Lkex0QQOWwSqGSiaHt4b7eitrC5b
 2eAJg32f/AzZke1YbKLMfdsL0VJP3GAswhDVHlgmo63rZkOZChm+97dgZ35Mcv5v
 kEXkWyBb1xJ3t8rZir6Qer9Iv2wOB+MkZ5qtU/Vf+l0wLQLYTrRVsKngrEDREONk
 dXbokp6iVSPeA1sTSdH9MmHlTUIj82ZLSGcxcjTsN8NWZjxx6g3rNx1uay+5MYOW
 4ET9zNu5snrAqN6N4Tb81gvtG8qYfxzdvVfrA9AaGKI6xxB7jkqgFJRp55JiEcFc
 x4cmWkhvdlhVsG2TQwFxYNfswOqD+7NCs6V4kSVZX6ezpDrH7I5VvcnnhstF7C8l
 KZul0EV7OW+kDK23pNe24lVP2xtOv6G8eK/3PmeKIXWl1V83nqre/oLufRzTfs+Z
 SxkILwY/MFpuCFteKE1t
 =haBu
 -----END PGP SIGNATURE-----

Merge tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI changes from Bjorn Helgaas:
 "Host bridge hotplug
    - Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi)
    - Clear host bridge resource info to avoid issue when releasing
      (Yinghai Lu)
    - Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu)
    - Use standard list ops for acpi_pci_drivers (Jiang Liu)

  Device hotplug
    - Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang
      Liu)
    - Remove fakephp driver (Bjorn Helgaas)
    - Fix VGA ref count in hotplug remove path (Yinghai Lu)
    - Allow acpiphp to handle PCIe ports without native hotplug (Jiang
      Liu)
    - Implement resume regardless of pciehp_force param (Oliver Neukum)
    - Make pci_fixup_irqs() work after init (Thierry Reding)

  Miscellaneous
    - Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang)
    - Factor out PCI Express Capability accessors (Jiang Liu)
    - Add pcibios_window_alignment() so powerpc EEH can use generic
      resource assignment (Gavin Shan)
    - Make pci_error_handlers const (Stephen Hemminger)
    - Cleanup drivers/pci/remove.c (Bjorn Helgaas)
    - Improve Vendor-Specific Extended Capability support (Bjorn
      Helgaas)
    - Use standard list ops for bus->devices (Bjorn Helgaas)
    - Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang)
    - Reassign invalid bus number ranges (Intel DP43BF workaround)
      (Yinghai Lu)"

* tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (102 commits)
  PCI: acpiphp: Handle PCIe ports without native hotplug capability
  PCI/ACPI: Use acpi_driver_data() rather than searching acpi_pci_roots
  PCI/ACPI: Protect acpi_pci_roots list with mutex
  PCI/ACPI: Use acpi_pci_root info rather than looking it up again
  PCI/ACPI: Pass acpi_pci_root to acpi_pci_drivers' add/remove interface
  PCI/ACPI: Protect acpi_pci_drivers list with mutex
  PCI/ACPI: Notify acpi_pci_drivers when hot-plugging PCI root bridges
  PCI/ACPI: Use normal list for struct acpi_pci_driver
  PCI/ACPI: Use DEVICE_ACPI_HANDLE rather than searching acpi_pci_roots
  PCI: Fix default vga ref_count
  ia64/PCI: Clear host bridge aperture struct resource
  x86/PCI: Clear host bridge aperture struct resource
  PCI: Stop all children first, before removing all children
  Revert "PCI: Use hotplug-safe pci_get_domain_bus_and_slot()"
  PCI: Provide a default pcibios_update_irq()
  PCI: Discard __init annotations for pci_fixup_irqs() and related functions
  PCI: Use correct type when freeing bus resource list
  PCI: Check P2P bridge for invalid secondary/subordinate range
  PCI: Convert "new_id"/"remove_id" into generic pci_bus driver attributes
  xen-pcifront: Use hotplug-safe pci_get_domain_bus_and_slot()
  ...
2012-10-01 12:05:36 -07:00
Roland Dreier
aadf4f3f66 mlx4_core: Disable SENSE_PORT for multifunction devices
In the current driver, the SENSE_PORT firmware command is issued as a
"wrapped" command, but the command handling code doesn't have a
wrapper, so it will never do anything other than log an error message.
The latest ConnectX-3 2.11.500 firmware reports the SENSE_PORT
capability even in multi-function (SR-IOV) mode, so the driver will
try to issue the command.

At least until the driver has a proper wrapper for SENSE_PORT, make
sure we disable the command for multi-function devices.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-01 02:10:45 -07:00
Roland Dreier
ca3e57a599 mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs
Instead of having a hard-coded "PCI device ID != 0x1003" (which
obviously breaks as newer devices with ID != 0x1003 become available),
instead let's set a flag in our PCI device table for the older devices
where we're supposed to force using SENSE_PORT.  This also avoids
enabling SENSE_PORT for virtual functions by mistake.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-01 02:10:44 -07:00
Roland Dreier
839f12434c mlx4_core: Stash PCI ID driver_data in mlx4_priv structure
That way we can check flags later on, when we've finished with the
pci_device_id structure.  Also convert the "is VF" flag to an enum:
"Never do in the preprocessor what can be done in C."

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-01 02:10:39 -07:00
Roland Dreier
f3d4c89ee4 mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem
On an SR-IOV master device, __mlx4_init_one() calls mlx4_init_hca()
before mlx4_multi_func_init().  However, for unlucky configurations,
mlx4_init_hca() might call mlx4_SENSE_PORT() (via mlx4_dev_cap()), and
that calls mlx4_cmd_imm() with MLX4_CMD_WRAPPED set.

However, on a multifunction device with MLX4_CMD_WRAPPED, __mlx4_cmd()
calls into mlx4_slave_cmd(), and that immediately tries to do

	down(&priv->cmd.slave_sem);

but priv->cmd.slave_sem isn't initialized until mlx4_multi_func_init()
(which we haven't called yet).  The next thing it tries to do is access
priv->mfunc.vhcr, but that hasn't been allocated yet.

Fix this by moving the initialization of slave_sem and vhcr up into
mlx4_cmd_init(). Also, since slave_sem is really just being used as a
mutex, convert it into a slave_cmd_mutex.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:46 -07:00
Roland Dreier
84b1f1538f mlx4_core: Trivial cleanups to driver log messages
Also put format string onto one line.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:46 -07:00
Roland Dreier
69612b9ff4 mlx4_core: Trivial readability fix: "0X30" -> "0x30"
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:45 -07:00
Jack Morgenstein
47605df953 mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
Previously, the structure of a guest's proxy QPs followed the
structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
qp1 port 2, ...).  The guest then did offset calculations on the
sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().

This is now changed so that the guest does no offset calculations
regarding proxy or tunnel QPs to use.  This change frees the PPF from
needing to adhere to a specific order in allocating proxy and tunnel
QPs.

Now QUERY_FUNC_CAP provides each port individually with its proxy
qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
used directly where required (with no offset calculations).

To accomplish this change, several fields were added to the phys_caps
structure for use by the PPF and by non-SR-IOV mode:

    base_sqpn -- in non-sriov mode, this was formerly sqp_start.
    base_proxy_sqpn -- the first physical proxy qp number -- used by PPF
    base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF.

The current code in the PPF still adheres to the previous layout of
sqps, proxy-sqps and tunnel-sqps.  However, the PPF can change this
layout without affecting VF or (paravirtualized) PF code.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:43 -07:00
Jack Morgenstein
afa8fd1db9 mlx4: Paravirtualize Node Guids for slaves
This is necessary in order to support > 1 VF/PF in a VM for software
that uses the node guid as a discriminator, such as librdmacm.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:43 -07:00
Jack Morgenstein
026149cbaa mlx4: Activate SR-IOV mode for IB
Remove the error returns for IB ports from mlx4_ib_add,
mlx4_INIT_PORT_wrapper, and mlx4_CLOSE_PORT_wrapper.

Currently, SRIOV is supported only for devices for which the
link layer is IB on all ports; RoCE support will be added later.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:42 -07:00
Jack Morgenstein
992e8e6e87 IB/mlx4: Miscellaneous adjustments for SR-IOV IB support
1. Allow only master to change node description.
2. Prevent AH leakage in send mads.
3. Take device part number from PCI structure, so that guests see the
   VF part number (and not the PF part number).
4. Place the device revision ID into caps structure at startup.
5. SET_PORT in update_gids_task needs to go through wrapper on master.
6. In mlx4_ib_event(), PORT_MGMT_EVENT needs be handled in a work
   queue on the master, since it propagates events to slaves using
   GEN_EQE.
7. Do not support FMR on slaves.
8. Add spinlock to slave_event(), since it is called both in interrupt
   context and in process context (due to 6 above, and also if
   smp_snoop is used).  This fix was found and implemented by Saeed
   Mahameed <saeedm@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:41 -07:00
Jack Morgenstein
980e90010f mlx4_core: INIT/CLOSE port logic for IB ports in SR-IOV mode
Normally, INIT_PORT and CLOSE_PORT are invoked when special QP0
transitions to RTR, or transitions to ERR/RESET respectively.

In SR-IOV mode, however, the master is also paravirtualized.  This in
turn requires that we not do INIT_PORT until the entire QP0 path (real
QP0 and proxy QP0) is ready to receive.  When the real QP0 goes down,
we should indicate that the port is not active.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:40 -07:00
Jack Morgenstein
efcd235d73 net/mlx4_core: Adjustments to SET_PORT for IB SR-IOV
1. Slaves may not set the IS_SM capability for the port.
2. DEV_MGMT may not be set in multifunction mode.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:40 -07:00
Jack Morgenstein
2a4fae148c IB/mlx4: Propagate P_Key and guid change port management events to slaves
P_Key change and guid change events are not of interest to all slaves,
but only to those slaves which "see" the table slots whose contents
have change.

For example, if the guid at port 1, index 5 has changed in the PPF, we
wish to propagate the gid-change event only to the function which has
that guid index mapped to its port/guid table (in this case it is
slave #5). Other functions should not get the event, since the event
does not affect them.

Similarly with P_Keys -- P_Key change events are forwarded only to
slaves which have that P_Key index mapped to their virtual P_Key table.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:38 -07:00
Jack Morgenstein
a0c64a17ab mlx4: Add alias_guid mechanism
For IB ports, we paravirtualize the GUID at index 0 on slaves.  The
GUID at index 0 seen by a slave is the actual GUID occupying the GUID
table at the slave-id index.

The driver, by default, requests at startup time that subnet manager
populate its entire guid table with GUIDs. These guids are then mapped
(paravirtualized) to the slaves, and appear for each slave as its GUID
at index 0.

Until each slave has such a guid, its port status is DOWN.

The guid table is cached to support special QP paravirtualization, and
event propagation to slaves on guid change (we test to see if the guid
really changed before propagating an event to the slave).

To support this caching, add capability to __mlx4_ib_query_gid() to
obtain the network view (i.e., physical view) gid at index X, not just
the host (paravirtualized) view.

Based on a patch from Erez Shitrit <erezsh@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:37 -07:00
Jack Morgenstein
993c401e20 mlx4_core: Add IB port-state machine and port mgmt event propagation
For an IB port, a slave should not show port active until that slave
has a valid alias-guid (provided by the subnet manager).  Therefore
the port-up event should be passed to a slave only after both the port
is up, and the slave's alias-guid has been set.

Also, provide the infrastructure for propagating port-management
events (client-reregister, etc) to slaves.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:37 -07:00
Jack Morgenstein
0a9a01884d mlx4: MAD_IFC paravirtualization
The MAD_IFC firmware command fulfills two functions.

First, it is used in the QP0/QP1 MAD-handling flow to obtain
information from the FW (for answering queries), and for setting
variables in the HCA (MAD SET packets).

For this, MAD_IFC should provide the FW (physical) view of the data.
This is the view that OpenSM needs.  We call this the "network view".

In the second case, MAD_IFC is used by various verbs to obtain data
regarding the local HCA (e.g., ib_query_device()).  We call this the
"host view".

This data needs to be paravirtualized.

MAD_IFC therefore needs a wrapper function, and also needs another
flag indicating whether it should provide the network view (when it is
called by ib_process_mad in special-qp packet handling), or the host
view (when it is called while implementing a verb).

There are currently 2 flag parameters in mlx4_MAD_IFC already:
ignore_bkey and ignore_mkey.  These two parameters are replaced by a
single "mad_ifc_flags" parameter, with different bits set for each
flag.  A third flag is added: "network-view/host-view".

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:34 -07:00
Jack Morgenstein
54679e1482 mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop
This requires:

1. Replacing the paravirtualized P_Key index (inserted by the guest)
   with the real P_Key index.

2. For UD QPs, placing the guest's true source GID index in the
   address path structure mgid field, and setting the ud_force_mgid
   bit so that the mgid is taken from the QP context and not from the
   WQE when posting sends.

3. For UC and RC QPs, placing the guest's true source GID index in the
   address path structure mgid field.

4. For tunnel and proxy QPs, setting the Q_Key value reserved for that
   proxy/tunnel pair.

Since not all the above adjustments occur in all the QP transitions,
the QP transitions require separate wrapper functions.

Secondly, initialize the P_Key virtualization table to its default
values: Master virtualized table is 1-1 with the real P_Key table,
guest virtualized table has P_Key index 0 mapped to the real P_Key
index 0, and all the other P_Key indices mapped to the reserved
(invalid) P_Key at index 127.

Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache.
and generating events on the master only if a P_Key actually changed.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:33 -07:00
Jack Morgenstein
fc06573dfa IB/mlx4: Initialize SR-IOV IB support for slaves in master context
Allocate SR-IOV paravirtualization resources and MAD demuxing contexts
on the master.

This has two parts.  The first part is to initialize the structures to
contain the contexts.  This is done at master startup time in
mlx4_ib_init_sriov().

The second part is to actually create the tunneling resources required
on the master to support a slave.  This is performed the master
detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event
generated when a slave initializes its comm channel).

For the master, there is no such startup event, so it creates its own
tunneling resources when it starts up.  In addition, the master also
creates the real special QPs.  The ib_core layer on the master causes
creation of proxy special QPs, since the master is also
paravirtualized at the ib_core layer.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:32 -07:00
Jack Morgenstein
e2c76824ca mlx4_core: Add proxy and tunnel QPs to the reserved QP area
In addition, pass the proxy and tunnel QP numbers to slaves so the
driver can perform special QP paravirtualization.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:31 -07:00
Or Gerlitz
a084feebd2 mlx4_core: Remove annoying debug message in the resource tracker
This innocent print makes it very hard to actually use the mlx4_core
debug messages -- for example, the module load sequence of a device
with two VFs yielded 3200 debug prints, with 2800 of them being this
one.  Let's just remove it.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:09 -07:00
Dotan Barak
426dd00d76 mlx4_core: Fix wrong offset in parsing query device caps response
The wrong offset was used when parsing the number of XRCs in
mlx4_QUERY_DEV_CAP().

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:08 -07:00
Linus Torvalds
2ade0b7f79 Last set of InfiniBand/RDMA fixes for 3.6:
- Couple more IPoIB fixes for regressions introduced by path database conversion
  - Minor other fixes to low-level drivers (cxgb4, mlx4, qib, ocrdma)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQV0eiAAoJEENa44ZhAt0hTScP/3+Y8v4w3Nyop7iNj9uvnsCn
 PhgfY9XUp9d0zOHM7XK+pTGLJheLLiyXfcCFTwgoC6tIyj7J1ulhYv+RBr7GbS+e
 K7aG7QtLDyM17ETgOBVIAvkz1CFKwdI843xRQKo0/oVMQJ9XLHg0nvzd+XaxZuud
 C+7Lfl5RH9Bl1tHJuOZmdyJinSQyLO/uV+tqwL5upF5YMTCJ2XsmVa1HQKhOAkBE
 bH4zXtFViNEKVuKXkpGc1yJFKZiwysirluYXedDkkEdr2jf9Mysw5JvZKVUOCh+4
 qYMjEAWQaUBia/tFUs9k6iCwE8ws7tRC4OSisb3cVzCUaqkfmx1Sp7vAQcDqx4xB
 r2dy4oc+vEFdrmjerOVEgmdgTwnDrWMjUlPuGyRX9935I7ybKA7NQPGlImKU8oiq
 8vyptsTN6r7AYc7kw6aYaE9kqGPnST7I3cZg71lHfwCuHKMNYfkxd1Mqxx/W6rMZ
 G1VnstVdioQK2HvcFIeArXQ6H6QjUAY92WMoyhIsrU4KwEx2Zpr0pfKWCD0eIOuB
 f4s9v0f5/6DqSwJ9uhhpAW/5bmV+RUCDjm8Ep9i70bGk7SliwT0MNPhHbaMfm7aU
 c+awy3HWpWKGCtod9zUzJH/4NsHzpggdHUlyAOuU5nqsh9GarYb0cvLjTNfsfdyV
 haQtMR+fSUK7Durd9MQl
 =nqvR
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA fixes from Roland Dreier:
 - A couple more IPoIB fixes for regressions introduced by path database
   conversion
 - Minor other fixes to low-level drivers (cxgb4, mlx4, qib, ocrdma)

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/qib: Fix failure of compliance test C14-024#06_LocalPortNum
  RDMA/ocrdma: Fix CQE expansion of unsignaled WQE
  mlx4_core: Fix integer overflows so 8TBs of memory registration works
  IPoIB: Fix AB-BA deadlock when deleting neighbours
  IPoIB: Fix memory leak in the neigh table deletion flow
  RDMA/cxgb4: Move dereference below NULL test
2012-09-17 13:21:02 -07:00
Yishai Hadas
dd03e73481 mlx4_core: Fix integer overflows so 8TBs of memory registration works
This patch adds on the fixes done in commits 89dd86db78 ("mlx4_core:
Allow large mlx4_buddy bitmaps") and 3de819e6b6 ("mlx4_core: Fix
integer overflow issues around MTT table") so that memory registration
of up to 8TB (log_num_mtt=31) finally works.

It fixes integer overflows in a few mlx4_table_yyy routines in icm.c
by using a u64 intermediate variable, and int/uint issues that caused
table indexes to become nagive by setting some variables to be u32
instead of int.  These problems cause crashes when a user attempted to
register > 512GB of RAM.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-13 17:52:02 -07:00
Bjorn Helgaas
78890b5989 Merge commit 'v3.6-rc5' into next
* commit 'v3.6-rc5': (1098 commits)
  Linux 3.6-rc5
  HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured
  Remove user-triggerable BUG from mpol_to_str
  xen/pciback: Fix proper FLR steps.
  uml: fix compile error in deliver_alarm()
  dj: memory scribble in logi_dj
  Fix order of arguments to compat_put_time[spec|val]
  xen: Use correct masking in xen_swiotlb_alloc_coherent.
  xen: fix logical error in tlb flushing
  xen/p2m: Fix one-off error in checking the P2M tree directory.
  powerpc: Don't use __put_user() in patch_instruction
  powerpc: Make sure IPI handlers see data written by IPI senders
  powerpc: Restore correct DSCR in context switch
  powerpc: Fix DSCR inheritance in copy_thread()
  powerpc: Keep thread.dscr and thread.dscr_inherit in sync
  powerpc: Update DSCR on all CPUs when writing sysfs dscr_default
  powerpc/powernv: Always go into nap mode when CPU is offline
  powerpc: Give hypervisor decrementer interrupts their own handler
  powerpc/vphn: Fix arch_update_cpu_topology() return value
  ARM: gemini: fix the gemini build
  ...

Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
	drivers/rapidio/devices/tsi721.c
2012-09-13 08:41:01 -06:00
Bjorn Helgaas
1959ec5f82 Merge branch 'pci/stephen-const' into next
* pci/stephen-const:
  make drivers with pci error handlers const
  scsi: make pci error handlers const
  netdev: make pci_error_handlers const
  PCI: Make pci_error_handlers const
2012-09-12 13:54:10 -06:00
Stephen Hemminger
3646f0e5c9 netdev: make pci_error_handlers const
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-09-07 16:35:00 -06:00
Eugenia Emantayev
521130d11f net/mlx4_core: Return the error value in case of command initialization failure
If mlx4_cmd_init() failed, the init_one function returned
success, although no resources were opened.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 12:55:59 -04:00
Aviad Yehezkel
bef772eb06 net/mlx4_core: Fixing error flow in case of QUERY_FW failure
The order of operations was wrong on the teardown flow.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 12:55:59 -04:00
Aviad Yehezkel
60d31c1475 net/mlx4_core: Looking for promiscuous entries on the correct port
The search for promisc entries was always done on the first port,
While the addition is done on the correct port.
This lead to resource leackage of promisc entries on the second
port and brought to a state where we could no longer enter to
promiscuous mode after enough iterations of "ifconfig promisc"
on the second port.
Fix that by using the correct port when searching.

Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 12:55:59 -04:00
Hadar Hen Zion
7fb40f87c4 net/mlx4_core: Add security check / enforcement for flow steering rules set for VMs
Since VFs may be mapped to VMs which aren't trusted entities,  flow
steering rules attached through the wrapper on behalf of VFs must be
checked to make sure that their L2 specification relate to MAC address
assigned to that VF, and add L2 specification if its missing.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 12:55:59 -04:00
Hadar Hen Zion
a8edc3bf05 net/mlx4_core: Put Firmware flow steering structures in common header files
To allow for usage of the flow steering Firmware structures in more locations over the driver,
such as the resource tracker, move them from mcg.c to common header files.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 12:55:59 -04:00
Jiang Liu
fadd1daa0b mlx4: Use PCI Express Capability accessors
Use PCI Express Capability access functions to simplify mlx4 driver.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-08-23 10:11:13 -06:00
Linus Torvalds
8f8ba75ee2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking update from David Miller:
 "A couple weeks of bug fixing in there.  The largest chunk is all the
  broken crap Amerigo Wang found in the netpoll layer."

 1) netpoll and it's users has several serious bugs:
    a) uses GFP_KERNEL with locks held
    b) interfaces requiring interrupts disabled are called with them
       enabled
    c) and vice versa
    d) VLAN tag demuxing, as per all other RX packet input paths, is not
       applied

    All from Amerigo Wang.

 2) Hopefully cure the ipv4 mapped ipv6 address TCP early demux bugs for
    good, from Neal Cardwell.

 3) Unlike AF_UNIX, AF_PACKET sockets don't set a default credentials
    when the user doesn't specify one explicitly during sendmsg().
    Instead we attach an empty (zero) SCM credential block which is
    definitely not what we want.  Fix from Eric Dumazet.

 4) IPv6 illegally invokes netdevice notifiers with RCU lock held, fix
    from Ben Hutchings.

 5) inet_csk_route_child_sock() checks wrong inet options pointer, fix
    from Christoph Paasch.

 6) When AF_PACKET is used for transmit, packet loopback doesn't behave
    properly when a socket fanout is enabled, from Eric Leblond.

 7) On bluetooth l2cap channel create failure, we leak the socket, from
    Jaganath Kanakkassery.

 8) Fix all the netprio file handling bugs found by Al Viro, from John
    Fastabend.

 9) Several error return and NULL deref bug fixes in networking drivers
    from Julia Lawall.

10) A large smattering of struct padding et al.  kernel memory leaks to
    userspace found of Mathias Krause.

11) Conntrack expections in netfilter can access an uninitialized timer,
    fix from Pablo Neira Ayuso.

12) Several netfilter SIP tracker bug fixes from Patrick McHardy.

13) IPSEC ipv6 routes are not initialized correctly all the time,
    resulting in an OOPS in inet_putpeer().  Also from Patrick McHardy.

14) Bridging does rcu_dereference() outside of RCU protected area, from
    Stephen Hemminger.

15) Fix routing cache removal performance regression when looking up
    output routes that have a local destination.  From Zheng Yan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  af_netlink: force credentials passing [CVE-2012-3520]
  ipv4: fix ip header ident selection in __ip_make_skb()
  ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()
  tcp: fix possible socket refcount problem
  net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child()
  net/core/dev.c: fix kernel-doc warning
  netconsole: remove a redundant netconsole_target_put()
  net: ipv6: fix oops in inet_putpeer()
  net/stmmac: fix issue of clk_get for Loongson1B.
  caif: Do not dereference NULL in chnl_recv_cb()
  af_packet: don't emit packet on orig fanout group
  drivers/net/irda: fix error return code
  drivers/net/wan/dscc4.c: fix error return code
  drivers/net/wimax/i2400m/fw.c: fix error return code
  smsc75xx: add missing entry to MAINTAINERS
  net: qmi_wwan: new devices: UML290 and K5006-Z
  net: sh_eth: Add eth support for R8A7779 device
  netdev/phy: skip disabled mdio-mux nodes
  dt: introduce for_each_available_child_of_node, of_get_next_available_child
  net: netprio: fix cgrp create and write priomap race
  ...
2012-08-21 16:46:08 -07:00
Tejun Heo
203b42f731 workqueue: make deferrable delayed_work initializer names consistent
Initalizers for deferrable delayed_work are confused.

* __DEFERRED_WORK_INITIALIZER()
* DECLARE_DEFERRED_WORK()
* INIT_DELAYED_WORK_DEFERRABLE()

Rename them to

* __DEFERRABLE_WORK_INITIALIZER()
* DECLARE_DEFERRABLE_WORK()
* INIT_DEFERRABLE_WORK()

This patch doesn't cause any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-21 13:18:23 -07:00
Linus Torvalds
846b99964a Grab bag of InfiniBand/RDMA fixes:
- IPoIB fixes for regressions introduced by path database conversion
  - mlx4 fixes for bugs with large memory systems and regressions from SR-IOV patches
  - RDMA CM fix for passing bad event up to userspace
  - Other minor fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQLo9cAAoJEENa44ZhAt0hPeUP/Ag1i164UE2e64sTfbVUfbEM
 7a+acsBo2ByQhVaXd65PZqm6+aYjosSwkStl0u4cb9Q3qQvYiZeLJj/qmInVV1hv
 qHuNytbnXl6W2f6vyz391TKx+X6uastlH7O4f4v0zS3zkrlkN9kCrN5dI1Fxm9ba
 5hluyggs7pVmHPX7KDGnQNZEjCsKoJF2kzB0hXwqGo1JxB2sEGq3DV/uMTHPgKKM
 HlrTJEEb7pOeHHf2Zt8MQiHMC0YBJ6MmpVIidR02TdQdCgZQcpoYnn20odmNLK5D
 HTT4jbCSsbOiHPpdAn0YkgzXzL58a2/b1Tt/pNq+Rud7VKwFVuFhNElPyAxR1/1K
 tPGUbDA6eOfosU++QbDJ0QHxDTgBTyXwh4yMkwPjmdWJ2jwTQev74iiDXgDf3QbS
 tMmWSwfC38hBxvfomNA6QTPvY1c3NNvj3uO+xq4/q6HloGqSlnIrPBL9hF0BiVQ6
 KteZ6gF+R1rHEIQraEnbNmh5Rxvi2RdgN0Xa0U9w/yr0LCowmUbFK35XoyTFyo/8
 rF7xp85s1wH5OuaA0NQeivQEKbuBxl7nvJlC2RBdRT/B9U5now+Y2XKOX0+FfRsg
 Qm0By/K8ss2XftX+lNfip+ScSmu9p+kCBME0X8Ra590VDt0IcPiJsC9ozhKyywRy
 KfSGj3iZG/nDaec6i1aN
 =hN/H
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma fixes from Roland Dreier:
 "Grab bag of InfiniBand/RDMA fixes:
   - IPoIB fixes for regressions introduced by path database conversion
   - mlx4 fixes for bugs with large memory systems and regressions from
     SR-IOV patches
   - RDMA CM fix for passing bad event up to userspace
   - Other minor fixes"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Check iboe netdev pointer before dereferencing it
  mlx4_core: Clean up buddy bitmap allocation
  mlx4_core: Fix integer overflow issues around MTT table
  mlx4_core: Allow large mlx4_buddy bitmaps
  IB/srp: Fix a race condition
  IB/qib: Fix error return code in qib_init_7322_variables()
  IB: Fix typos in infiniband drivers
  IB/ipoib: Fix RCU pointer dereference of wrong object
  IB/ipoib: Add missing locking when CM object is deleted
  RDMA/ucma.c: Fix for events with wrong context on iWARP
  RDMA/ocrdma: Don't call vlan_dev_real_dev() for non-VLAN netdevs
  IB/mlx4: Fix possible deadlock on sm_lock spinlock
2012-08-17 11:45:58 -07:00
Roland Dreier
96f17d5900 mlx4_core: Clean up buddy bitmap allocation
- Use kcalloc() / vzalloc() instead of an extra bitmap_zero().
 - Add __GFP_NOWARN to kcalloc() since we'll try vzalloc() if it fails.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 21:05:27 -07:00
Yishai Hadas
3de819e6b6 mlx4_core: Fix integer overflow issues around MTT table
Fix some issues around int variables used in data structures related
to memory registration.

Handle int overflow in mlx4_init_icm_table by using a u64 intermediate
variable and changing struct mlx4_icm_table num_obj field to be u32.

Change some more fields/variables to use u32 instead of int to prevent
a case where the variable becomes negative when bit 31 is set.

Also subtract log_mtts_per_seg from the exponent when computing
num_mtt, since its added later on in that very same code area.

This and the previous commit fixes some issues which actually prevent
commit db5a7a65c0 ("mlx4_core: Scale size of MTT table with system
RAM") from working.  Now, when the number of MTTs is scaled with the
size of the RAM we can map up to 8TB.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 21:05:26 -07:00
Yishai Hadas
89dd86db78 mlx4_core: Allow large mlx4_buddy bitmaps
mlx4_buddy_init uses kmalloc() to allocate bitmaps, which fails when
the required size is beyond the max supported value (or when memory is
too fragmented to handle a huge allocation).  Extend this to use use
vmalloc() if kmalloc() fails, and take that into account when freeing
the bitmaps as well.

This fixes a driver load failure when log num mtt is 26 or higher, and
is a step in the direction of allowing to register huge amounts of
memory on large memory systems.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 21:05:20 -07:00
Julia Lawall
499b95f6b3 drivers/net/ethernet/mellanox/mlx4/mcg.c: fix error return code
Convert a 0 error return code to a negative one, as returned elsewhere in the
function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret;
expression e,e1,e2,e3,e4,x;
@@

(
if (\(ret != 0\|ret < 0\) || ...) { ... return ...; }
|
ret = 0
)
... when != ret = e1
*x = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\|ioremap\|ioremap_nocache\|devm_ioremap\|devm_ioremap_nocache\)(...);
... when != x = e2
    when != ret = e3
*if (x == NULL || ...)
{
  ... when != ret = e4
*  return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 17:00:57 -07:00
Yevgeny Petrilin
2207b60ffb net/mlx4_core: Remove port type restrictions
Port1=Eth, Port2=IB restriction is no longer required.
Having RoCE, there will always rdma port initialized over ConnectX
physical port, no matter whether the link layer is IB or Ethernet.
So we always have dual port IB device.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-03 16:49:40 -07:00
Yevgeny Petrilin
c18520bd1b net/mlx4_en: Fixing TX queue stop/wake flow
Removing the ring->blocked flag, it is redundant and leads to a race:

We close the TX queue and then set the "blocked" flag.
Between those 2 operations the completion function can check the "blocked"
flag, sees that it is 0, and wouldn't open the TX queue.

Using netif_tx_queue_stopped to check the state of the queue to avoid this race.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-03 16:49:02 -07:00
Amir Vadai
c8c40b7f32 net/mlx4_en: loopbacked packets are dropped when SMAC=DMAC
Should NOT check SMAC=DMAC when:
1. loopback is turned on
2. validate_loopback is true.

Fixed it accordingly.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-03 16:49:02 -07:00
Linus Torvalds
1e30c1b386 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates and fixes from David Miller:

1) Reinstate the no-ref optimization for input route lookups in ipv4 to
   fix some routing cache removal perf regressions.

2) Make TCP socket pre-demux work on ipv6 side too, from Eric Dumazet.

3) Get RX hash value from correct place in be2net driver, from
   Sarveshwar Bandi.

4) Validation of FIB cached routes missing critical check, from Eric
   Dumazet.

5) EEH support in mlx4 driver, from Kleber Sacilotto de Souza.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
  ipv6: Early TCP socket demux
  ipv4: Fix input route performance regression.
  pch_gbe: vlan skb len fix
  pch_gbe: add extra clean tx
  pch_gbe: fix transmit watchdog timeout
  ixgbe: fix panic while dumping packets on Tx hang with IOMMU
  be2net: Fix to parse RSS hash from Receive completions correctly.
  net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER
  hyperv: Add error handling to rndis_filter_device_add()
  hyperv: Add a check for ring_size value
  ipv4: rt_cache_valid must check expired routes
  net/pch_gpe: Cannot disable ethernet autonegation
  qeth: repair crash in qeth_l3_vlan_rx_kill_vid()
  netiucv: cleanup attribute usage
  net: wiznet add missing HAS_IOMEM dependency
  be2net: Missing byteswap in be_get_fw_log_level causes oops on PowerPC
  mlx4: Add support for EEH error recovery
  cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN
  wanmain: comparing array with NULL
  caif: fix NULL pointer check
  ...
2012-07-26 18:09:01 -07:00
Amir Vadai
ee64c0ee51 net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER
RFS filter id can't have the special value RPS_NO_FILTER,
need to skip it when allocating id's.

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 00:23:55 -07:00
Kleber Sacilotto de Souza
57dbf29a54 mlx4: Add support for EEH error recovery
Currently the mlx4 drivers don't have the necessary callbacks to
implement EEH errors detection and recovery, so the PCI layer uses the
probe and remove callbacks to try to recover the device after an error on
the bus. However, these callbacks have race conditions with the internal
catastrophic error recovery functions, which will also detect the error
and this can cause the system to crash if both EEH and catas functions
try to reset the device.

This patch adds the necessary error recovery callbacks and makes sure
that the internal catastrophic error functions will not try to reset the
device in such scenarios. It also adds some calls to
pci_channel_offline() to suppress reads/writes on the bus when the slot
cannot accept I/O operations so we prevent unnecessary accesses to the
bus and speed up the device removal.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-25 15:24:13 -07:00
Linus Torvalds
5dedb9f3bd InfiniBand/RDMA changes for the 3.6 merge window:
- Updates to the qib low-level driver
  - First chunk of changes for SR-IOV support for mlx4 IB
  - RDMA CM support for IPv6-only binding
  - Other misc cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQDXhUAAoJEENa44ZhAt0hZxwQAJydS9f9me31AGa45SAq8rA8
 6e3LgnQ6jS38he7LZZdSsT7g25jROCKOcTj6VUkNSVAVSkK8zcp7wngjDxw50IK6
 FF9hNeljMkWBflOhxzB34DRGCW4b4J+Yt1o7v1RtawiG/Mhri/6imKL0Aqjt4GwX
 a1MPZn+xI2osLujfdHJtATPWWB9jCaXdFe4DJUNPdqJhS6TN7s8OP3XMiqoJtdaV
 ptHeSSbjdUR1mg/h2LU2FVmWHXNSBxn7MEsrBBRkQVyiEkXieFBLwPTp4DqmAUJf
 xugm6Hf4sbiQ+QuU0baJODt56wveuYVQ4HKUzE0urUFXyU4TUB9blrehWZnKsRte
 wo/w4nvVlnXgGhHH0Igq76RDX8aCwc/6uQJ/29oChWjrei3HE0LjmIlPAu0vAhyw
 ViLe02/r2gKXQv1NxIqhPmGsJTZizg2mUk2eEJHHPQb/NGL7iNo6b+141AfURoqQ
 goGmlGdzffCRpmo6FXFZ57RPVRS4gwMunCY/Pmvq5a2t4oZh8899l2+V3N7bCfvH
 +JdavxAjia9U4IlgPsAqVaz8z8TyHOY/lEd75Wnw8q9www3kLx7hASvHsdwrS4LL
 ihzECzsaXOSoIXQzghs+6iyp1pzmzE9ve8OqbIGJStzlrOLyn7Gjo+Ixwm0SLsTO
 I7h9iJ1OMKFZ8bzmHsWb
 =f09j
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Updates to the qib low-level driver
 - First chunk of changes for SR-IOV support for mlx4 IB
 - RDMA CM support for IPv6-only binding
 - Other misc cleanups and fixes

Fix up some add-add conflicts in include/linux/mlx4/device.h and
drivers/net/ethernet/mellanox/mlx4/main.c

* tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits)
  IB/qib: checkpatch fixes
  IB/qib: Add congestion control agent implementation
  IB/qib: Reduce sdma_lock contention
  IB/qib: Fix an incorrect log message
  IB/qib: Fix QP RCU sparse warnings
  mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
  mlx4_core: Allow guests to have IB ports
  mlx4_core: Implement mechanism for reserved Q_Keys
  net/mlx4_core: Free ICM table in case of error
  IB/cm: Destroy idr as part of the module init error flow
  mlx4_core: Remove double function declarations
  IB/mlx4: Fill the masked_atomic_cap attribute in query device
  IB/mthca: Fill in sq_sig_type in query QP
  IB/mthca: Warning about event for non-existent QPs should show event type
  IB/qib: Fix sparse RCU warnings in qib_keys.c
  net/mlx4_core: Initialize IB port capabilities for all slaves
  mlx4: Use port management change event instead of smp_snoop
  IB/qib: RCU locking for MR validation
  IB/qib: Avoid returning EBUSY from MR deregister
  IB/qib: Fix UC MR refs for immediate operations
  ...
2012-07-24 13:56:26 -07:00
Roland Dreier
089117e1ad Merge branches 'cma', 'cxgb4', 'misc', 'mlx4-sriov', 'mlx-cleanups', 'ocrdma' and 'qib' into for-linus 2012-07-22 23:26:17 -07:00
Thadeu Lima de Souza Cascardo
4cce66cdd1 mlx4_en: map entire pages to increase throughput
In its receive path, mlx4_en driver maps each page chunk that it pushes
to the hardware and unmaps it when pushing it up the stack. This limits
throughput to about 3Gbps on a Power7 8-core machine.

One solution is to map the entire allocated page at once. However, this
requires that we keep track of every page fragment we give to a
descriptor. We also need to work with the discipline that all fragments will
be released (in the sense that it will not be reused by the driver
anymore) in the order they are allocated to the driver.

This requires that we don't reuse any fragments, every single one of
them must be reallocated. We do that by releasing all the fragments that
are processed and only after finished processing the descriptors, we
start the refill.

We also must somehow guarantee that we either refill all fragments in a
descriptor or none at all, without resorting to giving up a page
fragment that we would have already given. Otherwise, we would break the
discipline of only releasing the fragments in the order they were
allocated.

This has passed page allocation fault injections (restricted to the
driver by using required-start and required-end) and device hotplug
while 16 TCP streams were able to deliver more than 9Gbps.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:53:13 -07:00
Amir Vadai
1eb8c695bd net/mlx4_en: Add accelerated RFS support
Use RFS infrastructure and flow steering in HW to keep CPU
affinity of rx interrupts and application per TCP stream.

A flow steering filter is added to the HW whenever the RFS
ndo callback is invoked by core networking code.

Because the invocation takes place in interrupt context, the
actual setup of HW is done using workqueue. Whenever new filter
is added, the driver checks for expiry of existing filters.

Since there's window in time between the point where the core
RFS code invoked the ndo callback, to the point where the HW
is configured from the workqueue context, the 2nd, 3rd etc
packets from that stream will cause the net core to invoke
the callback again and again.

To prevent inefficient/double configuration of the HW, the filters
are kept in a database which is indexed using hash function to enable
fast access.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Amir Vadai
d9236c3f10 {NET,IB}/mlx4: Add rmap support to mlx4_assign_eq
Enable callers of mlx4_assign_eq to supply a pointer to cpu_rmap.
If supplied, the assigned IRQ is tracked using rmap infrastructure.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Amir Vadai
af22d9de45 net/mlx4: Move MAC_MASK to a common place
Define this macro is one common place instead of duplicating it over the code

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Dan Carpenter
9c64508af2 net/mlx4_en: dereferencing freed memory
We dereferenced "mclist" after the kfree().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-16 22:58:07 -07:00
Dan Carpenter
447458c01f net/mlx4: off by one in parse_trans_rule()
This should be ">=" here instead of ">".  MLX4_NET_TRANS_RULE_NUM is 6.
We use "spec->id" as an array offset into the __rule_hw_sz[] and
__sw_id_hw[] arrays which have 6 elements.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-16 22:57:43 -07:00
Jack Morgenstein
6634961c14 mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
To allow easy paravirtualization of P_Key and GID table sizes, keep
paravirtualized sizes in mlx4_dev->caps, but save the actual physical
sizes from FW in struct: mlx4_dev->phys_cap.

In addition, in SR-IOV mode, do the following:

1. Reduce reported P_Key table size by 1.
   This is done to reserve the highest P_Key index for internal use,
   for declaring an invalid P_Key in P_Key paravirtualization.
   We require a P_Key index which always contain an invalid P_Key
   value for this purpose (i.e., one which cannot be modified by
   the subnet manager).  The way to do this is to reduce the
   P_Key table size reported to the subnet manager by 1, so that
   it will not attempt to access the P_Key at index #127.

2. Paravirtualize the GID table size to 1. Thus, each guest sees
   only a single GID (at its paravirtualized index 0).

In addition, since we are paravirtualizing the GID table size to 1, we
add paravirtualization of the master GID event here (i.e., we do not
do ib_dispatch_event() for the GUID change event on the master, since
its (only) GUID never changes).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:52:23 -07:00
Jack Morgenstein
105c320f6a mlx4_core: Allow guests to have IB ports
Modify mlx4_dev_cap to allow IB support when SR-IOV is active.  Modify
mlx4_slave_cap to set the "rdma-supported" bit in its flags area, and
pass that to the guests (this is done in QUERY_FUNC_CAP and its
wrapper).

However, we don't activate IB support quite yet -- we leave the error
return at the start of mlx4_ib_add in the mlx4_ib driver.

In addition, set "protected fmr supported" bit to zero in the
QUERY_FUNC_CAP wrapper.

Finally, in the QUERY_FUNC_CAP wrapper, we needed to add code which
checks for the port type (IB or Ethernet).  Previously, this was not
an issue, since only Ethernet ports were supported.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:51:37 -07:00
Jack Morgenstein
396f2feb05 mlx4_core: Implement mechanism for reserved Q_Keys
The SR-IOV special QP tunneling mechanism uses proxy special QPs
(instead of the real special QPs) for MADs on guests.  These proxy QPs
send their packets to a "tunnel" QP owned by the master.  The master
then forwards the MAD (after any required paravirtualization) to the
real special QP, which sends out the MAD.

For security reasons (i.e., to prevent guests from sending MADs to
tunnel QPs belonging to other guests), each proxy-tunnel QP pair is
assigned a unique, reserved, Q_Key.  These Q_Keys are available only
for proxy and tunnel QPs -- if the guest tries to use these Q_Keys
with other QPs, it will fail.

This patch introduces a mechanism for reserving a block of 64K Q_Keys
for proxy/tunneling use.

The patch introduces also two new fields into mlx4_dev: base_sqpn and
base_tunnel_sqpn.

In SR-IOV mode, the QP numbers for the "real," proxy, and tunnel sqps
are added to the reserved QPN area (so that they will not change).
There are 8 special QPs per port in the HCA, and each of them is
assigned both a proxy and a tunnel QP, for each VF and for the PF as
well in SR-IOV mode.

The QPNs for these QPs are arranged as follows:
 1. The real SQP numbers (8)
 2. The proxy SQPs (8 * (max number of VFs + max number of PFs)
 3. The tunnel SQPs (8 * (max number of VFs + max number of PFs)

To support these QPs, two new fields are added to struct mlx4_dev:

  base_sqp:  this is the QP number of the first of the real SQPs
  base_tunnel_sqp: this is the qp number of the first qp in the tunnel
                   sqp region. (On guests, this is the first tunnel
                   sqp of the 8 which are assigned to that guest).

In addition, in SR-IOV mode, sqp_start is the number of the first
proxy SQP in the proxy SQP region.  (In guests, this is the first
proxy SQP of the 8 which are assigned to that guest)

Note that in non-SR-IOV mode, there are no proxies and no tunnels.
In this case, sqp_start is set to sqp_base -- which minimizes code
changes.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:35:55 -07:00
Dotan Barak
240a9207aa net/mlx4_core: Free ICM table in case of error
In mlx4_init_icm_table(), free the allocated table if we failed to
allocate memory to its entries.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 09:22:58 -07:00
Dotan Barak
f457ce471c mlx4_core: Remove double function declarations
Spotted four duplicate declarations in icm.h, remove them.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 09:22:58 -07:00
Jack Morgenstein
2aca1172c2 net/mlx4_core: Initialize IB port capabilities for all slaves
With IB SR-IOV, each slave has its own separate copy of the port
capabilities flags.  For example, the master can run a subnet manager
(which causes the IsSM bit to be set in the master's port
capabilities) without affecting the port capabilities seen by the
slaves (the IsSM bit will be seen as cleared in the slaves).

Also add a static inline mlx4_master_func_num() to enhance readability
of the code.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-10 09:57:06 -07:00
Jack Morgenstein
00f5ce99dc mlx4: Use port management change event instead of smp_snoop
The port management change event can replace smp_snoop.  If the
capability bit for this event is set in dev-caps, the event is used
(by the driver setting the PORT_MNG_CHG_EVENT bit in the async event
mask in the MAP_EQ fw command).  In this case, when the driver passes
incoming SMP PORT_INFO SET mads to the FW, the FW generates port
management change events to signal any changes to the driver.

If the FW generates these events, smp_snoop shouldn't be invoked in
ib_process_mad(), or duplicate events will occur (once from the
FW-generated event, and once from smp_snoop).

In the case where the FW does not generate port management change
events smp_snoop needs to be invoked to create these events.  The flow
in smp_snoop has been modified to make use of the same procedures as
in the fw-generated-event event case to generate the port management
events (LID change, Client-rereg, Pkey change, and/or GID change).

Port management change event handling required changing the
mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument
(last argument) had to be changed to unsigned long in order to
accomodate passing the EQE pointer.

We also needed to move the definition of struct mlx4_eqe from
net/mlx4.h to file device.h -- to make it available to the IB driver,
to handle port management change events.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-10 09:47:10 -07:00
Jack Morgenstein
752a50cab6 mlx4_core: Pass an invalid PCI id number to VFs
Currently, VFs have 0 in their dev->caps.function field.  This is a
valid pci id (usually of the PF).  Instead, pass an invalid PCI id to
the VF via QUERY_FW, so that if the value gets accessed in the VF
driver, we'll catch the problem.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:05:05 -07:00
Hadar Hen Zion
cabdc8ee37 net/mlx4_en: Add support for drop action through ethtool
The drop action is implemented by allocating a QP and keeping it in a reset state
such that the HW drops any packets which are steered to that QP. When a drop action
is requested, we attach the relevant flow to that QP.

Sign-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:06 -07:00
Hadar Hen Zion
820672812f net/mlx4_en: Manage flow steering rules with ethtool
Implement the ethtool APIs for attaching L2/L3/L4 based flow steering
rules to the netdevice RX rings. Added set_rxnfc callback and enhanced
the existing get_rxnfc callback.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:06 -07:00
Hadar Hen Zion
592e49dda8 net/mlx4: Implement promiscuous mode with device managed flow-steering
The device managed flow steering API has three promiscuous modes:

1. Uplink - captures all the packets that arrive to the port.
2. Allmulti - captures all multicast packets arriving to the port.
3. Function port - for future use, this mode is not implemented yet.

Use these modes with the flow_attach and flow_detach firmware commands
according to the promiscuous state of the netdevice.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:06 -07:00
Hadar Hen Zion
1b9c6b064e net/mlx4_core: Add resource tracking for device managed flow steering rules
As with other device resources, the resource tracker is needed for supporting
device managed flow steering rules under SRIOV: make sure virtual functions
delete only rules created by them, and clean all rules attached by a crashed VF.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:06 -07:00
Hadar Hen Zion
0ff1fb654b {NET, IB}/mlx4: Add device managed flow steering firmware API
The driver is modified to support three operation modes.

If supported by firmware use the device managed flow steering
API, that which we call device managed steering mode. Else, if
the firmware supports the B0 steering mode use it, and finally,
if none of the above, use the A0 steering mode.

When the steering mode is device managed, the code is modified
such that L2 based rules set by the mlx4_en driver for Ethernet
unicast and multicast, and the IB stack multicast attach calls
done through the mlx4_ib driver are all routed to use the device
managed API.

When attaching rule using device managed flow steering API,
the firmware returns a 64 bit registration id, which is to be
provided during detach.

Currently the firmware is always programmed during HCA initialization
to use standard L2 hashing. Future work should be done to allow
configuring the flow-steering hash function with common, non
proprietary means.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion
8fcfb4db74 net/mlx4_core: Add firmware commands to support device managed flow steering
Add support for firmware commands to attach/detach a new device managed
steering mode. Such network steering rules allow the user to provide an
L2/L3/L4 flow specification to the firmware and have the device to steer
traffic that matches that specification to the provided QP.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion
c96d97f4d1 net/mlx4: Set steering mode according to device capabilities
Instead of checking the firmware supported steering mode in various
places in the code, add a dedicated field in the mlx4 device capabilities
structure which is written once during the initialization flow and read
across the code.

This also set the grounds for add new steering modes. Currently two modes
are supported, and are named after the ConnectX HW versions A0 and B0.

A0 steering uses mac_index, vlan_index and priority to steer traffic
into pre-defined range of QPs.

B0 steering uses Ethernet L2 hashing rules and is enabled only
if the firmware supports both unicast and multicast B0 steering,

The current steering modes are relevant for Ethernet traffic only,
such that Infiniband steering remains untouched.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Yevgeny Petrilin
6d19993788 net/mlx4_en: Re-design multicast attachments flow
Currently, for every change in the net device multicast list, the driver
detaches all the addresses from the HW device, and then attaches the
updated list. This behavior is wrong from two aspects: first, it causes
a load of firmware commands and second, there is period of time where
the correct addresses are not attached, which turned into packet loss.

To improve - a copy of the multicast list is saved by the driver. For
every change in the multicast list, the multicast list copy is used
to find the delta between those two lists and add or remove multicast
addresses as needed.

Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion
aa1ec3dde1 net/mlx4_core: Change resource tracking ID to be 64 bit
Currently the IDs used by the resource tracker are of type u32, so far this was
ok since all the different resources we were tracking could be encoded in 32bit.

As a preparation step for tracking of resources whose IDs need > 32 bits such
as network flow steering rules, who are 64 bit in size, move to use 64 bit
based resource IDs.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion
4af1c0488d net/mlx4_core: Change resource tracking mechanism to use red-black tree
Change the data structure used for managing the SRIOV resource tracking
mechanism from radix tree to red-black tree. This is preparation step
for supporting resource IDs which are 64bit long, such as network flow
steering rules. Such IDs can't be used as radix-tree keys on 32bit
architectures and hence the reason for the change.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Yuval Mintz
90b1ebe7af mlx4: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 03:06:44 -07:00
David S. Miller
b26d344c6b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/caif/caif_hsi.c
	drivers/net/usb/qmi_wwan.c

The qmi_wwan merge was trivial.

The caif_hsi.c, on the other hand, was not.  It's a conflict between
1c385f1fdf ("caif-hsi: Replace platform
device with ops structure.") in the net-next tree and commit
39abbaef19 ("caif-hsi: Postpone init of
HIS until open()") in the net tree.

I did my best with that one and will ask Sjur to check it out.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-28 17:37:00 -07:00
Yevgeny Petrilin
044ca2a5f2 net/mlx4_en: Release QP range in free_resources
Add a missing resource release in ring cleanup.
Not doing this leaves a range of QPs that are being reserved,
and no one can use them.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:30:12 -07:00
Yevgeny Petrilin
9858d2d1ac net/mlx4: Use single completion vector after NOP failure
Fix a crash at the error flow of NOP command which caused the driver to try and use
a completion vector which wasn't allocated.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:30:12 -07:00
Yevgeny Petrilin
5c8e904666 net/mlx4_en: Set correct port parameters during device initialization
Set valid port parameters: MTU and flow control configuration when
configuring the port during HW device initialization,
prior to the net device open() being called.
Using  invalid parameters (such as all zeros)
could lead to bad firmware behavior.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:30:12 -07:00
David S. Miller
7e52b33bd5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv6/route.c

This deals with a merge conflict between the net-next addition of the
inetpeer network namespace ops, and Thomas Graf's bug fix in
2a0c451ade which makes sure we don't
register /proc/net/ipv6_route before it is actually safe to do so.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15 15:51:55 -07:00
Linus Torvalds
ae501be0f6 InfiniBand/RDMA fixes for 3.5-rc2, all in hardware drivers:
- Fix crash in cxgb4
  - Fixes to new ocrdma driver
  - Regression fixes for mlx4
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPz5V+AAoJEENa44ZhAt0hsXoQAIFBNbsLWLft4J9jpyDAFj5z
 SpspcZAVgJW+K7I6j38SdQXMsR/XmcwM8Yh3iJIa/YG0P/6bWXDNU20itN62mAcI
 D+LphECUv/T8PGxEcAFlwjLYy5kcgFBTs/t7AgVS57OL4z8YuDLDr7uhdcnPdq7t
 AMIptqhXZgSnM/peIkf2SWBYxHSUMhukrlu5uNB9uc9GC9VRGhioXxT3eBPwr54O
 7BwHxQGDYtFZ4fYKmFxN9sJJmFf9nl/WhYM2HwCMPEe4UzlxxfbP7/c17QNS7YGo
 e072Jvs+U2ttb4J7J2yBRcpiiSjYDAVi+7fE9OtAdWyfPDa9MmcajVq7PUnohZLc
 muNSv1RGOebj2mSjE+oc1oWZKkM/nSEwbIE7PJOvWX2RdqOs8xNNC0RjgvsXDvyf
 +lARPcyolxpJXtvIlLY/Ua1KhJf52zu+Kp1QCbEnloU/NuGcc09It6Wq6X1yU/Gs
 N29qjuFOBoHQxDzfWPc3Uhp1/eNI8UJiTfO9CSYHv2ZHJzW8BqoblbKlE2l43TDg
 frH1l1/9RYY7gTGoCVvfJByvSWM7rBJdNfpu+yJDm5en86xcF6HB7GG0nJ/okpw2
 dUqQsLu+nTKZn8KCM3jzgP/ANT+PWk1UZlS1fIaMFMwa1SLWYqbaW/KEbRrY5IeP
 H0q/TFhJH8PccBjPbcsT
 =zl8u
 -----END PGP SIGNATURE-----

Merge tag 'rdma-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA fixes from Roland Dreier:
 "All in hardware drivers:
   - Fix crash in cxgb4
   - Fixes to new ocrdma driver
   - Regression fixes for mlx4"

* tag 'rdma-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Fix max_wqe capacity reported from query device
  mlx4_core: Fix setting VL_cap in mlx4_SET_PORT wrapper flow
  IB/mlx4: Fix EQ deallocation in legacy mode
  RDMA/cxgb4: Fix crash when peer address is 0.0.0.0
  RDMA/ocrdma: Remove unnecessary version.h includes
  RDMA/ocrdma: Fix signaled event for SRQ_LIMIT_REACHED
  RDMA/ocrdma: Correct queue free count math
2012-06-06 10:45:21 -07:00
Jack Morgenstein
edc4a67e15 mlx4_core: Fix setting VL_cap in mlx4_SET_PORT wrapper flow
Commit 096335b3f9 ("mlx4_core: Allow dynamic MTU configuration for
IB ports") modifies the port VL setting.  This exposes a bug in
mlx4_common_set_port(), where the VL cap value passed in (inside the
command mailbox) is incorrectly zeroed-out:

mlx4_SET_PORT modifies the VL_cap field (byte 3 of the mailbox).
Since the SET_PORT command is paravirtualized on the master as well as
on the slaves, mlx4_SET_PORT_wrapper() is invoked on the master.  This
calls mlx4_common_set_port() where mailbox byte 3 gets overwritten by
code which should only set a single bit in that byte (for the reset
qkey counter flag) -- but instead overwrites the entire byte.

The result is that when running in SR-IOV mode, the VL_cap will be set
to zero -- fix this.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-06 10:07:54 -07:00
Joe Perches
6469933605 ethernet: Remove casts to same type
Adding casts of objects to the same type is unnecessary
and confusing for a human reader.

For example, this cast:

        int y;
        int *p = (int *)&y;

I used the coccinelle script below to find and remove these
unnecessary casts.  I manually removed the conversions this
script produces of casts with __force, __iomem and __user.

@@
type T;
T *p;
@@

-       (T *)p
+       p

A function in atl1e_main.c was passed a const pointer
when it actually modified elements of the structure.

Change the argument to a non-const pointer.

A function in stmmac needed a __force to avoid a sparse
warning.  Added it.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-06 09:31:33 -07:00
Jack Morgenstein
401453a31e net/mlx4_core: Fix obscure mlx4_cmd_box parameter in QUERY_DEV_CAP
The "!mlx4_is_slave" is totally confusing.  Fix with
constant MLX4_CMD_NATIVE, which is the intended behavior.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein
6230bb234d net/mlx4_core: Check port out-of-range before using in mlx4_slave_cap
The range check was performed after using the port number.

Reverse this to prevent a potential array overflow.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein
b91cb3ebcd net/mlx4_core: Fixes for VF / Guest startup flow
- pass the following parameters:
  - firmware version (added QUERY_FW paravirtualization for that)

  - disable Blueflame on slaves. KVM disables write combining on guests,
    and we get better performance without BF in this case. (This requires
    QUERY_DEV_CAP paravirtualization, also in this commit)

  - max qp rdma as destination

- get rid of a chunk of "if (0)" dead code

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein
13bf58b760 net/mlx4_en: Fix improper use of "port" parameter in mlx4_en_event
Port is used as an array index before we know if that is proper.

For example, in the catas event case, port is zero; however,
the port index should lie in the range (1..2).

Fix this by using 'port' only in the events where it is of interest.

Test for port out of range in the default (unhandled event) case,
and do not output a message if it is not an ethernet port.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Marcel Apfelbaum
3fc929e2d6 net/mlx4_core: Fix number of EQs used in ICM initialisation
In SRIOV mode, the number of EQs used when computing the total ICM size
was incorrect.

To fix this, we do the following:
1. We add a new structure to mlx4_dev, mlx4_phys_caps, to contain physical HCA
   capabilities.  The PPF uses the phys capabilities when it computes things
   like ICM size.

   The dev_caps structure will then contain the paravirtualized values, making
   bookkeeping much easier in SRIOV mode. We add a structure rather than a
   single parameter because there will be other fields in the phys_caps.

   The first field we add to the mlx4_phys_caps structure is num_phys_eqs.

2. In INIT_HCA, when running in SRIOV mode, the "log_num_eqs" parameter
   passed to the FW is the number of EQs per VF/PF; each function (PF or VF)
   has this number of EQs available.

   However, the total number of EQs which must be allowed for in the ICM is
   (1 << log_num_eqs) * (#VFs + #PFs).  Rather than compute this quantity,
   we allocate ICM space for 1024 EQs (which is the device maximum
   number of EQs, and which is the value we place in the mlx4_phys_caps structure).

   For INIT_HCA, however, we use the per-function number of EQs as described
   above.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein
30f7c73bed net/mlx4_core: Fix the slave_id out-of-range test in mlx4_eq_int
Ths fixes the comparison in the FLR (Function Level Reset) event case.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:15 -04:00
Linus Torvalds
c23ddf7857 InfiniBand/RDMA changes for the 3.5 merge window:
- Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
  - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
    applications to send and receive arbitrary packets directly to/from
    the hardware
  - Add "doorbell drop" handling to the cxgb4 driver
  - A fairly large batch of qib hardware driver changes
  - A few fixes for lockdep-detected issues
  - A few other miscellaneous fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPumbZAAoJEENa44ZhAt0h8LgP/0fXe7Szm3n6P6UvMAVqkagM
 4PpreH3mpWUFpzqeQE1JPDtgx700R6aPipbHqgIN+k61RWMpLjICGcNx7iwxn1I+
 zqdquGygWgjceLz+BLVlk+iBmJt3vZ3fPRAXc7fdP+jhIarWkNIOy1pXWTUuRvED
 jL8jIaxhCcgAVzm/zNyt6IPxkaHvCz7K9wqmpyU0dsO9OyPdGvWA9+CkGXwmOCPq
 mxSVhWnfGsMkPBsL7EgTC5KP/ox2PKq6rFgysmVVS+rKCpP0L8BEVQyGX3Gf8KA8
 yV+KdTi9ofDnFrv6R7Wz0v7HRUih8GRssakzBu7Y7HLfK1M/QwMG0GUAibXGZObc
 vUXuQ3uRJ/cIzMPXqKeGYwpb5t+TmxyjhWu44OjCUQkNau91+9BSbA69S88KXc49
 aTJiCZlhPuGf4uGMWJJuPLcE2xO2QCZj+8ckL2STYrIip6GWlCH02kJaQmRkuWH2
 UhMOeJDBC4nvh4EQT/WwHpGzyhkavE2ayfo5YemxBJXo+P5Mdbf7WIDRQDLUEeQH
 F8sPoccH4hDiAorN/SkTsm14jVTP7oWW1M40Ont59Nhbgm88MsVkvjoneHnfBvbD
 HjK92soCWnYTAoREfj0G4xUxZgMdOZcezWrX0rx5LJ8Ju9y4zAi3cKGr7lg6hs4X
 syKfN0VjiDRtJ+pxayi3
 =yWfr
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
 - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
   applications to send and receive arbitrary packets directly to/from
   the hardware
 - Add "doorbell drop" handling to the cxgb4 driver
 - A fairly large batch of qib hardware driver changes
 - A few fixes for lockdep-detected issues
 - A few other miscellaneous fixes and cleanups

Fix up trivial conflict in drivers/net/ethernet/emulex/benet/be.h.

* tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (53 commits)
  RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree
  IB/mlx4: Fix mlx4_ib_add() error flow
  IB/core: Fix IB_SA_COMP_MASK macro
  IB/iser: Fix error flow in iser ep connection establishment
  IB/mlx4: Increase the number of vectors (EQs) available for ULPs
  RDMA/cxgb4: Add query_qp support
  RDMA/cxgb4: Remove kfifo usage
  RDMA/cxgb4: Use vmalloc() for debugfs QP dump
  RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues
  RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch()
  RDMA/cxgb4: Add DB Overflow Avoidance
  RDMA/cxgb4: Add debugfs RDMA memory stats
  cxgb4: DB Drop Recovery for RDMA and LLD queues
  cxgb4: Common platform specific changes for DB Drop Recovery
  cxgb4: Detect DB FULL events and notify RDMA ULD
  RDMA/cxgb4: Drop peer_abort when no endpoint found
  RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr()
  mlx4_core: Change bitmap allocator to work in round-robin fashion
  RDMA/nes: Don't call event handler if pointer is NULL
  RDMA/nes: Fix for the ORD value of the connecting peer
  ...
2012-05-21 17:54:55 -07:00
Amir Vadai
bc6a4744b8 net/mlx4_en: num cores tx rings for every UP
Change the TX ring scheme such that the number of rings for untagged packets
and for tagged packets (per each of the vlan priorities) is the same, unlike
the current situation where for tagged traffic there's one ring per priority
and for untagged rings as the number of core.

Queue selection is done as follows:

If the mqprio qdisc is operates on the interface, such that the core networking
code invoked the device setup_tc ndo callback, a mapping of skb->priority =>
queue set is forced - for both, tagged and untagged traffic.

Else, the egress map skb->priority =>  User priority is used for tagged traffic, and
all untagged traffic is sent through tx rings of UP 0.

The patch follows the convergence of discussing that issue with John Fastabend
over this thread http://comments.gmane.org/gmane.linux.network/229877

Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Liran Liss <liranl@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17 16:17:50 -04:00
Jack Morgenstein
eb71d0d63f net/mlx4_core: Fixed error flow in rem_slave_eqs
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:59 -04:00
Jack Morgenstein
ba062d5219 net/mlx4_core: Add XRC domains and counters to resource tracker
Add missing resource tracking for XRC domains and complete the tracking for HCA
network flow counters.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:59 -04:00
Jack Morgenstein
b8924951f6 net/mlx4_core: Fix potential kernel Oops in res tracker during Dom0 driver unload
Currently the slave and master resources are deleted after master freed
all bitmaps. If any resources were not properly cleaned up during the
shutdown process, an Oops would result.

Fix so that delete slave (only) resources during cleanup. Master resources
are cleaned up during unload process, and need not separately be cleaned.

Note that during cleanup, we need to split the resource-tracker freeing
functionality.

Before removing all the bitmaps, we free any leftover slave resources.
However, we can only remove the resource tracker linked list after
all bitmap frees, since some of the freeing functions (e.g.,
mlx4_cleanup_eq_table) use paravirtualized FW commands which expect
the resource tracker linked list to be present.

Found-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Jack Morgenstein
681372a7a3 net/mlx4_core: Do not reset module-parameter num_vfs when fail to enable sriov
Consider the following scenario: 2 HCAs, where only one of which can run SRIOV.

If we reset the module parameter, all the VFs of the SRIOV HCA will be
claimed by the PPF host (-- the code relies on num_vfs being non-zero
to avoid this claiming, and num_vfs was reset when pci_enable_sriov failed
for the non-SRIOV HCA).

The solution is not to touch the num_vfs parameter.

Also, eliminate the unneeded check of num_vfs when disabling sriov
(the dev flag bit is sufficient).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Jack Morgenstein
b9985f410a net/mlx4_core: Remove unused *_str functions from the resource tracker
Removed unsued *_str helper functions from resource_tracker.c

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Jack Morgenstein
5e92d803bf net/mlx4_core: Change SYNC_TPT to be native (not wrapped)
The "wrapped" was incorrect, since no wrapper function was defined.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Jack Morgenstein
8bac9ede68 net/mlx4_core: Fix init_port mask state for slaves
In function mlx4_INIT_PORT_wrapper, the port state mask for the
slave is only set if we are invoking the INIT_PORT fw command.

However, the reference count for the (initialized) port is
incremented anyway.

This creates a problem in that when we have multiple slaves,
then the CLOSE_PORT command will never be invoked. The
reason is that in the CLOSE_PORT wrapper, if the port-state
mask is zero for the slave (which it is), the wrapper returns
without doing anything. The only slave which will not return
immediately in the CLOSE_PORT wrapper is that slave for which
INIT_PORT was invoked.

The fix is to not have the port-state mask setting depend
on the logic for calling the INIT_PORT fw command.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Or Gerlitz
162344ed2c net/mlx4: Address build warnings on set but not used variables
Handle the compiler warnings on variables which are set but not used
by removing the relevant variable or casting a return value which is
ignored on purpose to void.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Jack Morgenstein
f4ec9e9531 mlx4_core: Change bitmap allocator to work in round-robin fashion
Under most circumstances, the bitmap allocator does not allocate the
same full 24-bit QP number immediately after a QP is destroyed.

This works by using the upper bits of a 24-bit QP number, beyond the
number of QPs that are actually available in the low level driver.
For example, say that the HCA is willing to allocate a maximum of 64K
qps.  We use the bits 23..16 as a "counter" which is incremented by 1
at each allocation so that even if the same physical QP is
re-allocated, it will not receive the same 24-bit QP number.

However, we have seen the following scenario:
1. Allocate, say, 255 QPs in succession.  This will cause a wrap of the "counter".
2. Destroy the first QP allocated, then allocate a new QP.  The new QP,
   because of the counter wraparound, will get the same FULL QP number as
   the QP just destroyed!

This is a problem because packets in transit can be erroneously
delivered to the new QP when they were meant for the old (destroyed)
QP, because the full QP number of the new QP is identical to the
destroyed QP.  (The "counter" mechanism is meant to prevent this by
having the full 24-bit QP numbers differ even if the physical QP on
the HCA is the same.  As we see above, however, this mechanism does
not always work).

The best fix for this problem is to allocate QPs in round-robin mode,
so that the physical QP numbers are not immediately re-used.

Found-by:  Matthew Finlay <matt@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14 13:44:38 -07:00
Shlomo Pongratz
b3416f4476 mlx4_core: Add second capabilities flags field
This patch adds a 64-bit flags2 features member to struct mlx4_dev to
export further features of the hardware.  The original flags field
tracks features whose support bits are advertised by the firmware in
offsets 0x40 and 0x44 of the query device capabilities command.
flags2 will track features whose support bits are scattered at various
offsets.

RSS support is the first feature to be exported through flags2.  RSS
capabilities are located at offset 0x2e.  The size of the RSS
indirection table is also given in this offset.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-08 11:54:32 -07:00
Yevgeny Petrilin
5b263f5374 mlx4_en: Byte Queue Limit support
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:34:02 -04:00
Yevgeny Petrilin
e22979d96a mlx4_en: Moving to Interrupts for TX completions
Moving to interrupts instead of polling fpr TX completions
Avoiding situations where skb can be held in by the driver for
a long time (till timer expires).
The change is also necessary for supporting BQL.

Removing comp_lock that was required because we could handle TX
completions from several contexts: Interrupts, timer, polling.
Now there is only interrupts

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:34:02 -04:00
Yevgeny Petrilin
a19a848a45 mlx4_en: Added Ethtool support for TX Interrupt coalescing
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:34:02 -04:00
Paul Gortmaker
43c880dff3 drivers/net: fix unresolved 64bit math in mellanox/mlx4/en_dcb_nl.c
Commit 109d244605

    "net/mlx4_en: Set max rate-limit for a TC"

introduced 64 bit math operations into mlx4_en_dcbnl_ieee_setmaxrate()

causing the following final link failure on an x86_32 allmodconfig

  ERROR: "__udivdi3" [drivers/net/ethernet/mellanox/mlx4/mlx4_en.ko] undefined!

Convert it to use div_u64() instead.

Cc: Amir Vadai <amirv@mellanox.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-16 02:12:11 -04:00
Masanari Iida
fd9071ec61 net: Fix spelling typo in net
Correct spelling typo within drivers/net.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 15:29:02 -04:00
David S. Miller
06eb4eafbd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-04-10 14:30:45 -04:00
Amir Vadai
109d244605 net/mlx4_en: Set max rate-limit for a TC
This patch is using the DCB netlink to set rate limit per ETS TC
Values are accepted in Kbps and rounded up to the nearest multiply of 100Mbps.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:04 -04:00
Amir Vadai
897d7846b4 net/mlx4_en: sk_prio <=> UP for untagged traffic
Since vlan egress map is only good for tagged traffic, need to have other
mapping to be used by untagged traffic.
For that, the driver uses sch_mqprio mapping. This mapping could be set by
using tc tool from iproute2 package.
Mapped UP will be used by the HW for QoS purposes, but won't go out on the
wire.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:04 -04:00
Amir Vadai
564c274c3d net/mlx4_en: DCB QoS support
Set TSA, promised BW and PFC using IEEE 802.1qaz netlink commands.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:04 -04:00
Amir Vadai
e5395e92a4 net/mlx4_core: set port QoS attributes
Adding QoS firmware commands:
- mlx4_en_SET_PORT_PRIO2TC - set UP <=> TC
- mlx4_en_SET_PORT_SCHEDULER - set promised BW, max BW and PG number

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:03 -04:00
Amir Vadai
0e98b523c4 net/mlx4_en: Force user priority by QP attribute
Instead of relying on HW to change schedule queue by UP, schedule
queue is fixed for a tx_ring, and UP in WQE is ignored in this aspect.  This
resolves two issues with untagged traffic:
1. untagged traffic has no UP in packet which is needed for QoS. The change
   above allows setting the schedule queue (and by that the UP) of such a stream.
2. BlueFlame uses the same field used by vlan tag. So forcing UP from QPC
   allows using BF for untagged but prioritized traffic.

In old firmware that force UP is not supported, untagged traffic will not subject to
QoS.

Because UP is set by QP, need to always have a tx ring per UP, even if pfcrx
module paramter is false.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:03 -04:00
Thadeu Lima de Souza Cascardo
117980c4c9 mlx4: allocate just enough pages instead of always 4 pages
The driver uses a 2-order allocation, which is too much on architectures
like ppc64, which has a 64KiB page. This particular allocation is used
for large packet fragments that may have a size of 512, 1024, 4096 or
fill the whole allocation. So, a minimum size of 16384 is good enough
and will be the same size that is used in architectures of 4KiB sized
pages.

This will avoid allocation failures that we see when the system is under
stress, but still has plenty of memory, like the one below.

This will also allow us to set the interface MTU to higher values like
9000, which was not possible on ppc64 without this patch.

Node 1 DMA: 737*64kB 37*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 51904kB
83137 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 10420096kB
Total swap = 10420096kB
107776 pages RAM
1184 pages reserved
147343 pages shared
28152 pages non-shared
netstat: page allocation failure. order:2, mode:0x4020
Call Trace:
[c0000001a4fa3770] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
[c0000001a4fa3820] [c00000000016af38] .__alloc_pages_nodemask+0x618/0x930
[c0000001a4fa39a0] [c0000000001a71a0] .alloc_pages_current+0xb0/0x170
[c0000001a4fa3a40] [d00000000dcc3e00] .mlx4_en_alloc_frag+0x200/0x240 [mlx4_en]
[c0000001a4fa3b10] [d00000000dcc3f8c] .mlx4_en_complete_rx_desc+0x14c/0x250 [mlx4_en]
[c0000001a4fa3be0] [d00000000dcc4eec] .mlx4_en_process_rx_cq+0x62c/0x850 [mlx4_en]
[c0000001a4fa3d20] [d00000000dcc5150] .mlx4_en_poll_rx_cq+0x40/0x90 [mlx4_en]
[c0000001a4fa3dc0] [c0000000004e2bb8] .net_rx_action+0x178/0x450
[c0000001a4fa3eb0] [c00000000009c9b8] .__do_softirq+0x118/0x290
[c0000001a4fa3f90] [c000000000031df8] .call_do_softirq+0x14/0x24
[c000000184c3b520] [c00000000000e700] .do_softirq+0xf0/0x110
[c000000184c3b5c0] [c00000000009c6d4] .irq_exit+0xb4/0xc0
[c000000184c3b640] [c00000000000e964] .do_IRQ+0x144/0x230

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Tested-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 20:34:29 -04:00
Linus Torvalds
0c2fe82a9b InfiniBand/RDMA changes for the 3.4 merge window. Nothing big really
stands out; by patch count lots of fixes to the mlx4 driver plus some
 cleanups and fixes to the core and other drivers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPZ2THAAoJEENa44ZhAt0hMgkP/AwXjwzz6lYlIizytQ5KOH11
 CT/VoHLIGIwY31aRhZRHggWLEbJi8akgfh04K9PiSh/muFRPYk5KJDoQEoJV5Odv
 12krqtpZeL41YcAPq0wiyP3Ki5AZDCDumzWbOtmxE9m93mVfi3yFTTWDHFo/7SOx
 A2kUITdKuZhgBpCFYS23Gs8QTWcMPUlwNlM76edG90BjSySHsK15tbqTUaEOC6Ux
 SPG0p0c2YjlZCPmRObrCL65o5LFRVajinZ4rXN7rre4LEU+IuXgW+mQU2Kwy8z6a
 oMPE2l4OMSqj270r2PlVK087gGH69nKfLgLQLJiP0Id+KwdYUk7vQcA+Fu0jwjP3
 Ci/ahllvB76IiaNbax0vTy2ohCJmilLLotAP46NW0vPR2/KnmVy0Mqk8pXQQddw4
 tnAFNnafn/MjTty8GwqyXR/enJZs29ePcSxcYnKjXYRIgG4ldejL2wktgSra8+gb
 3/qFag1HRgZzcICkXGpHj7uWJ2KDe++1m6KTb0THUmrjuiel6ATFZpYoeRed3GfS
 ZMqpjZvuXM8nA9CrKMkzm5vIiUFF8Qq0hUTLR38F8FiNEhmC08JqH5PqjJ3Z18if
 R1yG4W4xmp/0ICHg4zyg6LWV3YsweDD+jSCJpW+KNBvu3HtYb41HIlK+i2TTmtPD
 3etEZZRwROAyYKcwN01p
 =DbLx
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes for the 3.4 merge window from Roland Dreier:
 "Nothing big really stands out; by patch count lots of fixes to the
  mlx4 driver plus some cleanups and fixes to the core and other
  drivers."

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (28 commits)
  mlx4_core: Scale size of MTT table with system RAM
  mlx4_core: Allow dynamic MTU configuration for IB ports
  IB/mlx4: Fix info returned when querying IBoE ports
  IB/mlx4: Fix possible missed completion event
  mlx4_core: Report thermal error events
  mlx4_core: Fix one more static exported function
  IB: Change CQE "csum_ok" field to a bit flag
  RDMA/iwcm: Reject connect requests if cmid is not in LISTEN state
  RDMA/cxgb3: Don't pass irq flags to flush_qp()
  mlx4_core: Get rid of redundant ext_port_cap flags
  RDMA/ucma: Fix AB-BA deadlock
  IB/ehca: Fix ilog2() compile failure
  IB: Use central enum for speed instead of hard-coded values
  IB/iser: Post initial receive buffers before sending the final login request
  IB/iser: Free IB connection resources in the proper place
  IB/srp: Consolidate repetitive sysfs code
  IB/srp: Use pr_fmt() and pr_err()/pr_warn()
  IB/core: Fix SDR rates in sysfs
  mlx4: Enforce device max FMR maps in FMR alloc
  IB/mlx4: Set bad_wr for invalid send opcode
  ...
2012-03-21 10:33:42 -07:00
Eugenia Emantayev
58a3de0592 mlx4_core: fix race on comm channel
Prevent race condition between commands on comm channel.
Happened while unloading the driver when switching from
event to polling mode. VF got completion on the last command
before switching to polling mode, but toggle was not changed.
After the fix - VF will not write the next command before
toggle is updated.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-19 18:02:05 -04:00
Roland Dreier
42872c7a5e Merge branches 'misc' and 'mlx4' into for-next
Conflicts:
	drivers/infiniband/hw/mlx4/main.c
	drivers/net/ethernet/mellanox/mlx4/main.c
	include/linux/mlx4/device.h
2012-03-12 16:25:28 -07:00
Roland Dreier
db5a7a65c0 mlx4_core: Scale size of MTT table with system RAM
The current driver defaults to 1M MTT segments, where each segment holds
8 MTT entries.  This limits the total memory registered to 8M * PAGE_SIZE
which is 32GB with 4K pages.  Since systems that have much more memory
are pretty common now (at least among systems with InfiniBand hardware),
this limit ends up getting hit in practice quite a bit.

Handle this by having the driver allocate at least enough MTT entries to
cover 2 * totalram pages.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-12 16:24:59 -07:00
Or Gerlitz
096335b3f9 mlx4_core: Allow dynamic MTU configuration for IB ports
Set the MTU for IB ports in the driver instead of using the firmware
default of 2KB (the driver defaults to 4KB).  Allow for dynamic mtu
configuration through a new, per-port sysfs entry.

Since there's a dependency between the port MTU and the max number of
HW VLs the port can support, apply a mim/max approach, using a loop
that goes down from the highest possible number of VLs to the lowest,
using the firmware return status to know whether the requested number
of VLs is possible with a given MTU.

For now, as with the dynamic link type change / VPI support, the sysfs
entry to change the mtu is exposed only when NOT running in SR-IOV
mode.  To allow changing the MTU for the master in SR-IOV mode,
primary-function-initiated FLR (Function Level Reset) needs to be
implemented.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-12 16:24:59 -07:00
Jack Morgenstein
5984be9004 mlx4_core: Report thermal error events
Print an error message when a thermal error async event is reported by the HW.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Dotan Barak <dotanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-12 16:24:59 -07:00
Roland Dreier
e10903b087 mlx4_core: Fix one more static exported function
Commit 22c8bff6fa ("mlx4_core: Exported functions can't be static")
fixed most of this up, but forgot about mlx4_is_slave_active().  Fix
this one too.

Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-12 16:24:58 -07:00
David S. Miller
b2d3298e09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-03-09 14:34:20 -08:00
Jack Morgenstein
dcf353b170 mlx4_core: fix bug in modify_cq wrapper for resize flow.
The actual FW command is called in procedure "handle_resize".
Code incorrectly invoked the FW command again (in good flow), in
the modify_cq wrapper function.

Fix by skipping second FW invocation unconditionally for resize.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-08 00:28:01 -08:00
Or Gerlitz
8154c07fe1 mlx4_core: Get rid of redundant ext_port_cap flags
While doing the work for commit a6f7feae6d ("IB/mlx4: pass SMP
vendor-specific attribute MADs to firmware") we realized that the
firmware would respond on all sorts of vendor-specific MADs.
Therefore commit 97285b7817 ("mlx4_core: Add extended port
capabilities support") adds redundant code into the driver, since
there's no real reaon to maintain the extended capabilities of the
port, as they can be queried on demand (e.g the FDR10 capability).

This patch reverts commit 97285b7817 and removes the check for
extended caps from the mlx4_ib driver port query flow.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-06 17:25:18 -08:00
Yevgeny Petrilin
66431a7d45 net/mlx4: defining functions as static
Fixing sparse warnings, the 2 functions are only used in same
file. Defining them as static and not exporting them.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:18 -05:00
Yevgeny Petrilin
be6736ba1f net/mlx4: remove unused functions
Removing functions that are no longer in use, but still exist

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:18 -05:00
Yevgeny Petrilin
9a9a232a92 net/mlx4: fixing sparse warnings for not declared, functions
The SET_PORT functions are implemented in port.c, which is part
of mlx4_core, these functions are exported. The functions are in use by
the mlx4_en module (were originally part of mlx4_en).
Their declaration remained in mlx4_en module, moving the declaration to the right location.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:18 -05:00
Yevgeny Petrilin
2ab573c586 net/mlx4: fixing sparse warnings when copying mac, address to gid entry
The mac should be written as __be64 the gid. The warning was because
we changed the mac parameter, which is u64, by writing result of cpu_to_be64
into it. Fixing by using new variable of type __be64.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:17 -05:00
Yevgeny Petrilin
39b2c4ebb4 net/mlx4: fix sparse warnings on wrong type for RSS keys
The keys used for the hardware RSS topelitz hash are of type __be32
where the values provided by the driver are from array of u32,
this triggered sparse warning on incorrect type in assignment as of different base types.
Since these values are picked randomly,
the fix is to transform the key to __be32 by executing cpu_to_be_32()

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:17 -05:00
Or Gerlitz
966684d581 net/mlx4: fix sparse warnings on TX blue flame buffer
The blue flame buffer is defined to be of type void __iomem *
but was passed to mlx4_bf_copy which gets unsigned long * .
This triggered sparse warning on different address spaces,
fix that by changing mlx4_bf_copy first param to be of type void __iomem * .

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:17 -05:00
Or Gerlitz
4ef2a435be net/mlx4: fix sparse warnings on TX control flags, endianess
Fix sparse warnings on incompatibility between the endianess of the ctrl_flags
field of struct mlx4_en_priv to the srcrb_flags field of struct
mlx4_wqe_ctrl_seg by changing the former to be __be32 instead of u32.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:17 -05:00
Yevgeny Petrilin
ebf8c9aa03 net/mlx4_en: Saving mem access on data path
Localized the pdev->dev, and using dma_map instead of pci_map
There are multiple map/unmap operations on data path,
optimizing those by saving redundant pointer access.
Those places were identified as hot-spots when running kernel profiling
during some benchmarks.
The fixes had most impact when testing packet rate with small packets,
reducing several % from CPU load, and in some case being the difference
between reaching wire speed or being CPU bound.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:17 -05:00
Yevgeny Petrilin
1d4526e037 mlx4_core: remove buggy sched_queue masking
Fixes a bug introduced by commit fe9a2603c, where the priority bits
in the schedule queue field were masked out.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 14:43:50 -05:00
Eric Dumazet
18f973af3e mlx4_en: remove sparse errors
Fix new sparse errors introduced in commit 6221217199 (mlx4_en: dont
change mac_header on xmit)

Reported-by: Or Gerlitz <or.gerlitz@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-05 16:42:45 -05:00
David S. Miller
ff4783ce78 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/sfc/rx.c

Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change
the rx_buf->is_page boolean into a set of u16 flags, and another to
adjust how ->ip_summed is initialized.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 21:55:51 -05:00
Linus Torvalds
203738e548 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
1) ICMP sockets leave err uninitialized but we try to return it for the
   unsupported MSG_OOB case, reported by Dave Jones.

2) Add new Zaurus device ID entries, from Dave Jones.

3) Pointer calculation in hso driver memset is wrong, from Dan
   Carpenter.

4) ks8851_probe() checks unsigned value as negative, fix also from Dan
   Carpenter.

5) Fix crashes in atl1c driver due to TX queue handling, from Eric
   Dumazet.  I anticipate some TX side locking fixes coming in the near
   future for this driver as well.

6) The inline directive fix in Bluetooth which was breaking the build
   only with very new versions of GCC, from Johan Hedberg.

7) Fix crashes in the ATP CLIP code due to ARP cleanups this merge
   window, reported by Meelis Roos and fixed by Eric Dumazet.

8) JME driver doesn't flush RX FIFO correctly, from Guo-Fu Tseng.

9) Some ip6_route_output() callers test the return value for NULL, but
   this never happens as the convention is to return a dst entry with
   dst->error set.  Fixes from RonQing Li.

10) Logitech Harmony 900 should be handled by zaurus driver not
   cdc_ether, update white lists and black lists accordingly.  From
   Scott Talbert.

11) Receiving from certain kinds of devices there won't be a MAC header,
   so there is no MAC header to fixup in the IPSEC code, and if we try
   to do it we'll crash.  Fix from Eric Dumazet.

12) Port type array indexing off-by-one in mlx4 driver, fix from Yevgeny
   Petrilin.

13) Fix regression in link-down handling in davinci_emac which causes
   all RX descriptors to be freed up and therefore RX to wedge
   completely, from Christian Riesch.

14) It took two attempts, but ctnetlink soft lockups seem to be
   cured now, from Pablo Neira Ayuso.

15) Endianness bug fix in ENIC driver, from Santosh Nayak.

16) The long ago conversion of the PPP fragmentation code over to
   abstracted SKB list handling wasn't perfect, once we get an
   out of sequence SKB we don't flush the rest of them like we
   should.  From Ben McKeegan.

17) Fix regression of ->ip_summed initialization in sfc driver.
   From Ben Hutchings.

18) Bluetooth timeout mistakenly using msecs instead of jiffies,
   from Andrzej Kaczmarek.

19) Using _sync variant of work cancellation results in deadlocks,
   use the non _sync variants instead.  From Andre Guedes.

20) Bluetooth rfcomm code had reference counting problems leading
   to crashes, fix from Octavian Purdila.

21) The conversion of netem over to classful qdisc handling added
   two bugs to netem_dequeue(), fixes from Eric Dumazet.

22) Missing pci_iounmap() in ATM Solos driver.  Fix from Julia Lawall.

23) b44_pci_exit() should not have __exit tag since it's invoked from
   non-__exit code.  From Nikola Pajkovsky.

24) The conversion of the neighbour hash tables over to RCU added a
   race, fixed here by adding the necessary reread of tbl->nht, fix
   from Michel Machado.

25) When we added VF (virtual function) attributes for network device
   dumps, this potentially bloats up the size of the dump of one
   network device such that the dump size is too large for the buffer
   allocated by properly written netlink applications.

   In particular, if you add 255 VFs to a network device, parts of
   GLIBC stop working.

   To fix this, we add an attribute that is used to turn on these
   extended portions of the network device dump.  Sophisticaed
   applications like 'ip' that want to see this stuff  will be changed
   to set the attribute, whereas things like GLIBC that don't care
   about VFs simply will not, and therefore won't be busted by the
   mere presence of VFs on a network device.

   Thanks to the tireless work of Greg Rose on this fix.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
  sfc: Fix assignment of ip_summed for pre-allocated skbs
  ppp: fix 'ppp_mp_reconstruct bad seq' errors
  enic: Fix endianness bug.
  gre: fix spelling in comments
  netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
  Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
  davinci_emac: Do not free all rx dma descriptors during init
  mlx4_core: Fixing array indexes when setting port types
  phy: IC+101G and PHY_HAS_INTERRUPT flag
  netdev/phy/icplus: Correct broken phy_init code
  ipsec: be careful of non existing mac headers
  Move Logitech Harmony 900 from cdc_ether to zaurus
  hso: memsetting wrong data in hso_get_count()
  netfilter: ip6_route_output() never returns NULL.
  ethernet/broadcom: ip6_route_output() never returns NULL.
  ipv6: ip6_route_output() never returns NULL.
  jme: Fix FIFO flush issue
  atm: clip: remove clip_tbl
  ipv4: ping: Fix recvmsg MSG_OOB error handling.
  rtnetlink: Fix problem with buffer allocation
  ...
2012-02-26 12:47:17 -08:00
Eric Dumazet
6221217199 mlx4_en: dont change mac_header on xmit
A driver xmit function is not allowed to change skb without special
care.

mlx4_en_xmit() should not call skb_reset_mac_header() and instead should
use skb->data to access ethernet header.

This removes a dumb test : if (ethh && ethh->h_dest)

Also remove this slow mlx4_en_mac_to_u64() call, we can use
get_unaligned() to get faster code.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 14:22:05 -05:00
Eli Cohen
a5bbe892da mlx4: Enforce device max FMR maps in FMR alloc
ConnectX devices have a limit on the number of mappings that can be
done on an FMR before having to call sync_tpt.  The current
mlx4_ib driver reports the limit correctly in max_map_per_fmr in
.query_device(), but mlx4_core doesn't check it when actually
allocating FMRs.

Add a max_fmr_maps field to struct mlx4_caps and enforce this maximum
value on FMR allocations.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-26 01:43:37 -08:00
Linus Torvalds
b52b80023f One InfiniBand/RDMA regression fix for 3.3:
- mlx4 SR-IOV changes added static exported functions, which doesn't
    build on powerpc at least.  Fix from Doug Ledford for this.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPSEe5AAoJEENa44ZhAt0hIoUP/jwpxojruN8c7PuJesp3sne2
 45xxH0OsPSFdYTBnNZGtL3MWnzp88YjoggRvrKGmiEc/vitDTveL8/j+jFatl/d5
 o5yBq9wsSKgU3jJiczT21WBOIg2I8KGgdWozNSO8rUl++fHJGgH1sCSmf8biomnk
 dlHZbA4ZszH1bxHh406GK/+cP5jjKlTLqkixz/156fsopMxzHBaJycOmzSPpHl9s
 ykrVv3n/mhrc3zBgx5y9aU+LhcchZH6CBQOzLBks1c4w8AFXTxIMAQRhkBVnDksi
 zZhg5E05zRkynr27zebNnu6Y6hfWamfCHkM6krJLXL/QRfN0LmoLvdtW2HRffrHW
 9eOzEdmh9i5wHSeH5zhQAE17yttpWQCNN1thGv3oe2uGj7ItmtA1yc/5q8opwBU3
 bl4/saNVdK8DGtpGZjCmPBnsOl9IgMmYMn5haRmOhDvD4/B93OHl+2iQi50vPJHS
 bBvYG03OTHYs9QexDANDv3Q9VmxGRCBXcZkQblAqofgO3dLgCQdddwn0VHt9xDfK
 5Hp31vPs5pGfT6AkPHREHNNvj27Ve1erjrODNVI/Qr2CIkaoiQ/6oR+9ZMrAFsnb
 uyhNDzt4yuMrfA3T/7sAITL6hFLkVHYL2lSorFr5BXZ4cusYlzUYHRbtiR17Vsye
 1KNTElxLI9fmD5xJSY6N
 =mHhD
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

One InfiniBand/RDMA regression fix for 3.3:

 - mlx4 SR-IOV changes added static exported functions, which doesn't
   build on powerpc at least.  Fix from Doug Ledford for this.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  mlx4_core: Exported functions can't be static
2012-02-24 20:03:14 -08:00
Yevgeny Petrilin
1e0f03d57d mlx4_core: Fixing array indexes when setting port types
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-24 01:52:45 -05:00
Doug Ledford
22c8bff6fa mlx4_core: Exported functions can't be static
At least on powerpc, it breaks the build if exported functions are
static.  Fix some static exported functions introduced with the mlx4
SR-IOV support added in 3.3-rc1.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-22 23:00:38 -08:00
Yevgeny Petrilin
3d8f93083b mlx4: Setting new port types after all interfaces unregistered
In port type change flow, need to set the new port types only after
all interfaces have finished the unregister process.
Otherwise, during unregister, one of the interfaces might issue a SET_PORT
command with wrong port types, it can cause bad FW behavior.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 15:27:24 -05:00
Yevgeny Petrilin
730c41d5ba mlx4: Replacing pool_lock with mutex
Under the spinlock we call request_irq(), which allocates memory with GFP_KERNEL,
This causes the following trace when DEBUG_SPINLOCK is enabled, it can cause
the following trace:

 BUG: spinlock wrong CPU on CPU#2, ethtool/2595
 lock: ffff8801f9cbc2b0, .magic: dead4ead, .owner: ethtool/2595, .owner_cpu: 0
 Pid: 2595, comm: ethtool Not tainted 3.0.18 #2
 Call Trace:
 spin_bug+0xa2/0xf0
 do_raw_spin_unlock+0x71/0xa0
 _raw_spin_unlock+0xe/0x10
 mlx4_assign_eq+0x12b/0x190 [mlx4_core]
 mlx4_en_activate_cq+0x252/0x2d0 [mlx4_en]
 ? mlx4_en_activate_rx_rings+0x227/0x370 [mlx4_en]
 mlx4_en_start_port+0x189/0xb90 [mlx4_en]
 mlx4_en_set_ringparam+0x29a/0x340 [mlx4_en]
 dev_ethtool+0x816/0xb10
 ? dev_get_by_name_rcu+0xa4/0xe0
 dev_ioctl+0x2b5/0x470
 handle_mm_fault+0x1cd/0x2d0
 sock_do_ioctl+0x5d/0x70
 sock_ioctl+0x79/0x2f0
 do_vfs_ioctl+0x8c/0x340
 sys_ioctl+0xa1/0xb0
 system_call_fastpath+0x16/0x1b

Replacing with mutex, which is enough in this case.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 15:27:23 -05:00
Jack Morgenstein
3d7474734b mlx4_core: Do not map BF area if capability is 0
BF can be disabled in some cases, the capability field, bf_reg_size is set
to zero in this case. Don't map the BF area in this case, it would cause
failures.  In addition, leaving the BF area unmapped
also alerts the ETH driver to not use BF.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-20 19:26:34 -05:00
David S. Miller
32efe08d77 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c

Small minor conflict in bnx2x, wherein one commit changed how
statistics were stored in software, and another commit
fixed endianness bugs wrt. reading the values provided by
the chip in memory.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-19 16:03:15 -05:00
Eugenia Emantayev
9f5b6c632e mlx4: add unicast steering entries to resource_tracker
Add unicast steering entries to resource tracker.
Do qp_detach also for these entries when VF doesn't shut down gracefully.
Otherwise there is leakage of these resources, since they are not tracked.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-15 14:50:16 -05:00
Eugenia Emantayev
2531188b47 mlx4: fix QP tree trashing
When adding new unicast steer entry, before moving qp to state ready,
actually before calling mlx4_RST2INIT_QP_wrapper(), there were added
a lot of entries with local_qpn=0 into radix tree.
This fact impacted the get_res() function and proper functioning
of resource tracker in addition to adding trash entries into radix tree.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@melllanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-15 14:50:16 -05:00
Eugenia Emantayev
75c6062cb7 mlx4: fix buffer overrun
When passing MLX4_UC_STEER=1 it was translated to value 2
after mlx4_QP_ATTACH_wrapper. Therefore in new_steering_entry()
unicast steer entries were added to index 2 of array of size 2.
Fixing this bug by shift right to one position.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-15 14:50:15 -05:00
Eugenia Emantayev
eb40d89276 mlx4: add unicast steering entries to resource_tracker
Add unicast steering entries to resource tracker.
Do qp_detach also for these entries when VF doesn't shut down gracefully.
Otherwise there is leakage of these resources, since they are not tracked.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:59 -05:00
Eugenia Emantayev
f1f75f0e2b mlx4: attach multicast with correct flag
mlx4_multicast_attach/detach() should use always MLX4_MC_STEER flag

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:58 -05:00
Eugenia Emantayev
de9b43dbb8 mlx4: remove redundant adding of steering type to gid
mlx4_uc_steer_add/release() should not add MLX4_UC_STEER flag to gid.
It is added in mlx4_unicast_attach/detach().

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:58 -05:00
Eugenia Emantayev
deb8b3e849 mlx4: remove unnecessary variables and arguments
mlx4_qp_attach/detach_common() don't use hash variable, move it to find_entry()
static find_entry() in mcg.c doesn't use steer argument

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:58 -05:00
Eugenia Emantayev
45b5136551 mlx4: remove unused field high_prios
Remove unnecessary field high_prios from mlx4_steer struct and initialization

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:58 -05:00
Eugenia Emantayev
0ee9f1dd7b mlx4: fix QP tree trashing
When adding new unicast steer entry, before moving qp to state ready,
actually before calling mlx4_RST2INIT_QP_wrapper(), there were added
a lot of entries with local_qpn=0 into radix tree.
This fact impacted the get_res() function and proper functioning
of resource tracker in addition to adding trash entries into radix tree.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@melllanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:58 -05:00
Eugenia Emantayev
58a30d6a3c mlx4_core: fix buffer overrun
When passing MLX4_UC_STEER=1 it was translated to value 2
after mlx4_QP_ATTACH_wrapper. Therefore in new_steering_entry()
unicast steer entries were added to index 2 of array of size 2.
Fixing this bug by shift right to one position.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:57 -05:00
Axel Lin
758ff235b3 mlx4: Fix kcalloc parameters swapped
The first parameter should be "number of elements" and the second parameter
should be "element size".

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-13 16:00:58 -05:00
David S. Miller
d5ef8a4d87 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/infiniband/hw/nes/nes_cm.c

Simple whitespace conflict.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 23:32:28 -05:00
Thadeu Lima de Souza Cascardo
7e2eb99cc6 mlx4: fix DMA mapping leak when allocation fails
mlx4_en_prepare_rx_desc does not correctly clean up after it finds an
allocation failure. It should unmap a page before calling put_page, but
it only calls the later.

This bug would prevent a device removal using hotplug after setting the
device MTU to 9000 and opening the network interface. After the fix, we
still see the allocation failure with MTU 9000, but we are able to
remove the device.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 14:42:28 -05:00
Thadeu Lima de Souza Cascardo
68355f7113 mlx4: allow device removal by fixing dma unmap size
After opening the network interface, Mellanox ConnectX device cannot be
removed by hotplug because it has not properly unmapped all DMA memory.

It happens that mlx4_en_activate_rx_rings overrides the variable that
keeps the size of the memory mapped.

This is fixed by passing to mlx4_en_destroy_rx_ring the same size that is
given to mlx4_en_create_rx_ring.

After applying this patch, hot unplugging the device works after opening
the interface.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 14:42:28 -05:00
Eugenia Emantayev
4c41b36737 mlx4_core: use correct port for steering
Use port number for correct steering (list per port).
Before the fix all steering entries (for both physical ports)
were managed in first port structures, so we had leakage of resources
for port 2.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 12:10:11 -05:00
Eugenia Emantayev
4df9950406 mlx4_core: use correct flag for unicast_promisc
Use MLX4_DEV_CAP_FLAG_VEP_UC_STEER for unicast_promisc_add/remove
Unicast entries were managed in wrong data structures.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 12:10:11 -05:00
Eugenia Emantayev
f08ad06c05 mlx4_core: fix memory leak at multi_func_cleanup
Perform cleanup also in non-master flow.
The VFs use communication channel as well.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 12:10:11 -05:00
Pradeep A Dalvi
c056b734e5 netdev: ethernet dev_alloc_skb to netdev_alloc_skb
Replaced deprecating dev_alloc_skb with netdev_alloc_skb in drivers/net/ethernet
  - Removed extra skb->dev = dev after netdev_alloc_skb

Signed-off-by: Pradeep A Dalvi <netdev@pradeepdalvi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 11:52:27 -05:00
Masanari Iida
8d9eb069ea mlx4: Fix typo in cmd.c
Correct spelling "reseting" to "resetting" in
drivers/net/ethernet/mellanox/mlx4/cmd.c

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-04 16:31:47 -05:00
Joe Perches
41de8d4cff drivers/net: Remove alloc_etherdev error messages
alloc_etherdev has a generic OOM/unable to alloc message.
Remove the duplicative messages after alloc_etherdev calls.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-31 16:20:48 -05:00
Joe Perches
e404decb0f drivers/net: Remove unnecessary k.alloc/v.alloc OOM messages
alloc failures use dump_stack so emitting an additional
out-of-memory message is an unnecessary duplication.

Remove the allocation failure messages.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-31 16:20:21 -05:00
Marcel Apfelbaum
803143fbda mlx4_core: map async events to arbitrary slave eqs
Slave async events were mapped to single eq. This patch fixes this issue, so
the slaves can map the async events to any eq.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:44 -05:00