linux

Author	SHA1	Message	Date
Jack Morgenstein	afa8fd1db9	mlx4: Paravirtualize Node Guids for slaves This is necessary in order to support > 1 VF/PF in a VM for software that uses the node guid as a discriminator, such as librdmacm. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:43 -07:00
Jack Morgenstein	026149cbaa	mlx4: Activate SR-IOV mode for IB Remove the error returns for IB ports from mlx4_ib_add, mlx4_INIT_PORT_wrapper, and mlx4_CLOSE_PORT_wrapper. Currently, SRIOV is supported only for devices for which the link layer is IB on all ports; RoCE support will be added later. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:42 -07:00
Jack Morgenstein	992e8e6e87	IB/mlx4: Miscellaneous adjustments for SR-IOV IB support 1. Allow only master to change node description. 2. Prevent AH leakage in send mads. 3. Take device part number from PCI structure, so that guests see the VF part number (and not the PF part number). 4. Place the device revision ID into caps structure at startup. 5. SET_PORT in update_gids_task needs to go through wrapper on master. 6. In mlx4_ib_event(), PORT_MGMT_EVENT needs be handled in a work queue on the master, since it propagates events to slaves using GEN_EQE. 7. Do not support FMR on slaves. 8. Add spinlock to slave_event(), since it is called both in interrupt context and in process context (due to 6 above, and also if smp_snoop is used). This fix was found and implemented by Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:41 -07:00
Jack Morgenstein	980e90010f	mlx4_core: INIT/CLOSE port logic for IB ports in SR-IOV mode Normally, INIT_PORT and CLOSE_PORT are invoked when special QP0 transitions to RTR, or transitions to ERR/RESET respectively. In SR-IOV mode, however, the master is also paravirtualized. This in turn requires that we not do INIT_PORT until the entire QP0 path (real QP0 and proxy QP0) is ready to receive. When the real QP0 goes down, we should indicate that the port is not active. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:40 -07:00
Jack Morgenstein	efcd235d73	net/mlx4_core: Adjustments to SET_PORT for IB SR-IOV 1. Slaves may not set the IS_SM capability for the port. 2. DEV_MGMT may not be set in multifunction mode. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:40 -07:00
Jack Morgenstein	2a4fae148c	IB/mlx4: Propagate P_Key and guid change port management events to slaves P_Key change and guid change events are not of interest to all slaves, but only to those slaves which "see" the table slots whose contents have change. For example, if the guid at port 1, index 5 has changed in the PPF, we wish to propagate the gid-change event only to the function which has that guid index mapped to its port/guid table (in this case it is slave #5). Other functions should not get the event, since the event does not affect them. Similarly with P_Keys -- P_Key change events are forwarded only to slaves which have that P_Key index mapped to their virtual P_Key table. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:38 -07:00
Jack Morgenstein	a0c64a17ab	mlx4: Add alias_guid mechanism For IB ports, we paravirtualize the GUID at index 0 on slaves. The GUID at index 0 seen by a slave is the actual GUID occupying the GUID table at the slave-id index. The driver, by default, requests at startup time that subnet manager populate its entire guid table with GUIDs. These guids are then mapped (paravirtualized) to the slaves, and appear for each slave as its GUID at index 0. Until each slave has such a guid, its port status is DOWN. The guid table is cached to support special QP paravirtualization, and event propagation to slaves on guid change (we test to see if the guid really changed before propagating an event to the slave). To support this caching, add capability to __mlx4_ib_query_gid() to obtain the network view (i.e., physical view) gid at index X, not just the host (paravirtualized) view. Based on a patch from Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:37 -07:00
Jack Morgenstein	993c401e20	mlx4_core: Add IB port-state machine and port mgmt event propagation For an IB port, a slave should not show port active until that slave has a valid alias-guid (provided by the subnet manager). Therefore the port-up event should be passed to a slave only after both the port is up, and the slave's alias-guid has been set. Also, provide the infrastructure for propagating port-management events (client-reregister, etc) to slaves. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:37 -07:00
Jack Morgenstein	0a9a01884d	mlx4: MAD_IFC paravirtualization The MAD_IFC firmware command fulfills two functions. First, it is used in the QP0/QP1 MAD-handling flow to obtain information from the FW (for answering queries), and for setting variables in the HCA (MAD SET packets). For this, MAD_IFC should provide the FW (physical) view of the data. This is the view that OpenSM needs. We call this the "network view". In the second case, MAD_IFC is used by various verbs to obtain data regarding the local HCA (e.g., ib_query_device()). We call this the "host view". This data needs to be paravirtualized. MAD_IFC therefore needs a wrapper function, and also needs another flag indicating whether it should provide the network view (when it is called by ib_process_mad in special-qp packet handling), or the host view (when it is called while implementing a verb). There are currently 2 flag parameters in mlx4_MAD_IFC already: ignore_bkey and ignore_mkey. These two parameters are replaced by a single "mad_ifc_flags" parameter, with different bits set for each flag. A third flag is added: "network-view/host-view". Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:34 -07:00
Jack Morgenstein	54679e1482	mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop This requires: 1. Replacing the paravirtualized P_Key index (inserted by the guest) with the real P_Key index. 2. For UD QPs, placing the guest's true source GID index in the address path structure mgid field, and setting the ud_force_mgid bit so that the mgid is taken from the QP context and not from the WQE when posting sends. 3. For UC and RC QPs, placing the guest's true source GID index in the address path structure mgid field. 4. For tunnel and proxy QPs, setting the Q_Key value reserved for that proxy/tunnel pair. Since not all the above adjustments occur in all the QP transitions, the QP transitions require separate wrapper functions. Secondly, initialize the P_Key virtualization table to its default values: Master virtualized table is 1-1 with the real P_Key table, guest virtualized table has P_Key index 0 mapped to the real P_Key index 0, and all the other P_Key indices mapped to the reserved (invalid) P_Key at index 127. Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache. and generating events on the master only if a P_Key actually changed. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:33 -07:00
Jack Morgenstein	fc06573dfa	IB/mlx4: Initialize SR-IOV IB support for slaves in master context Allocate SR-IOV paravirtualization resources and MAD demuxing contexts on the master. This has two parts. The first part is to initialize the structures to contain the contexts. This is done at master startup time in mlx4_ib_init_sriov(). The second part is to actually create the tunneling resources required on the master to support a slave. This is performed the master detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event generated when a slave initializes its comm channel). For the master, there is no such startup event, so it creates its own tunneling resources when it starts up. In addition, the master also creates the real special QPs. The ib_core layer on the master causes creation of proxy special QPs, since the master is also paravirtualized at the ib_core layer. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:32 -07:00
Jack Morgenstein	e2c76824ca	mlx4_core: Add proxy and tunnel QPs to the reserved QP area In addition, pass the proxy and tunnel QP numbers to slaves so the driver can perform special QP paravirtualization. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:31 -07:00
Linus Torvalds	2ade0b7f79	Last set of InfiniBand/RDMA fixes for 3.6: - Couple more IPoIB fixes for regressions introduced by path database conversion - Minor other fixes to low-level drivers (cxgb4, mlx4, qib, ocrdma) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABCAAGBQJQV0eiAAoJEENa44ZhAt0hTScP/3+Y8v4w3Nyop7iNj9uvnsCn PhgfY9XUp9d0zOHM7XK+pTGLJheLLiyXfcCFTwgoC6tIyj7J1ulhYv+RBr7GbS+e K7aG7QtLDyM17ETgOBVIAvkz1CFKwdI843xRQKo0/oVMQJ9XLHg0nvzd+XaxZuud C+7Lfl5RH9Bl1tHJuOZmdyJinSQyLO/uV+tqwL5upF5YMTCJ2XsmVa1HQKhOAkBE bH4zXtFViNEKVuKXkpGc1yJFKZiwysirluYXedDkkEdr2jf9Mysw5JvZKVUOCh+4 qYMjEAWQaUBia/tFUs9k6iCwE8ws7tRC4OSisb3cVzCUaqkfmx1Sp7vAQcDqx4xB r2dy4oc+vEFdrmjerOVEgmdgTwnDrWMjUlPuGyRX9935I7ybKA7NQPGlImKU8oiq 8vyptsTN6r7AYc7kw6aYaE9kqGPnST7I3cZg71lHfwCuHKMNYfkxd1Mqxx/W6rMZ G1VnstVdioQK2HvcFIeArXQ6H6QjUAY92WMoyhIsrU4KwEx2Zpr0pfKWCD0eIOuB f4s9v0f5/6DqSwJ9uhhpAW/5bmV+RUCDjm8Ep9i70bGk7SliwT0MNPhHbaMfm7aU c+awy3HWpWKGCtod9zUzJH/4NsHzpggdHUlyAOuU5nqsh9GarYb0cvLjTNfsfdyV haQtMR+fSUK7Durd9MQl =nqvR -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA fixes from Roland Dreier: - A couple more IPoIB fixes for regressions introduced by path database conversion - Minor other fixes to low-level drivers (cxgb4, mlx4, qib, ocrdma) * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB/qib: Fix failure of compliance test C14-024#06_LocalPortNum RDMA/ocrdma: Fix CQE expansion of unsignaled WQE mlx4_core: Fix integer overflows so 8TBs of memory registration works IPoIB: Fix AB-BA deadlock when deleting neighbours IPoIB: Fix memory leak in the neigh table deletion flow RDMA/cxgb4: Move dereference below NULL test	2012-09-17 13:21:02 -07:00
Yishai Hadas	dd03e73481	mlx4_core: Fix integer overflows so 8TBs of memory registration works This patch adds on the fixes done in commits `89dd86db78` ("mlx4_core: Allow large mlx4_buddy bitmaps") and `3de819e6b6` ("mlx4_core: Fix integer overflow issues around MTT table") so that memory registration of up to 8TB (log_num_mtt=31) finally works. It fixes integer overflows in a few mlx4_table_yyy routines in icm.c by using a u64 intermediate variable, and int/uint issues that caused table indexes to become nagive by setting some variables to be u32 instead of int. These problems cause crashes when a user attempted to register > 512GB of RAM. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-13 17:52:02 -07:00
Eugenia Emantayev	521130d11f	net/mlx4_core: Return the error value in case of command initialization failure If mlx4_cmd_init() failed, the init_one function returned success, although no resources were opened. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-07 12:55:59 -04:00
Aviad Yehezkel	bef772eb06	net/mlx4_core: Fixing error flow in case of QUERY_FW failure The order of operations was wrong on the teardown flow. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-07 12:55:59 -04:00
Aviad Yehezkel	60d31c1475	net/mlx4_core: Looking for promiscuous entries on the correct port The search for promisc entries was always done on the first port, While the addition is done on the correct port. This lead to resource leackage of promisc entries on the second port and brought to a state where we could no longer enter to promiscuous mode after enough iterations of "ifconfig promisc" on the second port. Fix that by using the correct port when searching. Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-07 12:55:59 -04:00
Hadar Hen Zion	7fb40f87c4	net/mlx4_core: Add security check / enforcement for flow steering rules set for VMs Since VFs may be mapped to VMs which aren't trusted entities, flow steering rules attached through the wrapper on behalf of VFs must be checked to make sure that their L2 specification relate to MAC address assigned to that VF, and add L2 specification if its missing. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-07 12:55:59 -04:00
Hadar Hen Zion	a8edc3bf05	net/mlx4_core: Put Firmware flow steering structures in common header files To allow for usage of the flow steering Firmware structures in more locations over the driver, such as the resource tracker, move them from mcg.c to common header files. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-07 12:55:59 -04:00
Linus Torvalds	8f8ba75ee2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking update from David Miller: "A couple weeks of bug fixing in there. The largest chunk is all the broken crap Amerigo Wang found in the netpoll layer." 1) netpoll and it's users has several serious bugs: a) uses GFP_KERNEL with locks held b) interfaces requiring interrupts disabled are called with them enabled c) and vice versa d) VLAN tag demuxing, as per all other RX packet input paths, is not applied All from Amerigo Wang. 2) Hopefully cure the ipv4 mapped ipv6 address TCP early demux bugs for good, from Neal Cardwell. 3) Unlike AF_UNIX, AF_PACKET sockets don't set a default credentials when the user doesn't specify one explicitly during sendmsg(). Instead we attach an empty (zero) SCM credential block which is definitely not what we want. Fix from Eric Dumazet. 4) IPv6 illegally invokes netdevice notifiers with RCU lock held, fix from Ben Hutchings. 5) inet_csk_route_child_sock() checks wrong inet options pointer, fix from Christoph Paasch. 6) When AF_PACKET is used for transmit, packet loopback doesn't behave properly when a socket fanout is enabled, from Eric Leblond. 7) On bluetooth l2cap channel create failure, we leak the socket, from Jaganath Kanakkassery. 8) Fix all the netprio file handling bugs found by Al Viro, from John Fastabend. 9) Several error return and NULL deref bug fixes in networking drivers from Julia Lawall. 10) A large smattering of struct padding et al. kernel memory leaks to userspace found of Mathias Krause. 11) Conntrack expections in netfilter can access an uninitialized timer, fix from Pablo Neira Ayuso. 12) Several netfilter SIP tracker bug fixes from Patrick McHardy. 13) IPSEC ipv6 routes are not initialized correctly all the time, resulting in an OOPS in inet_putpeer(). Also from Patrick McHardy. 14) Bridging does rcu_dereference() outside of RCU protected area, from Stephen Hemminger. 15) Fix routing cache removal performance regression when looking up output routes that have a local destination. From Zheng Yan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits) af_netlink: force credentials passing [CVE-2012-3520] ipv4: fix ip header ident selection in __ip_make_skb() ipv4: Use newinet->inet_opt in inet_csk_route_child_sock() tcp: fix possible socket refcount problem net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child() net/core/dev.c: fix kernel-doc warning netconsole: remove a redundant netconsole_target_put() net: ipv6: fix oops in inet_putpeer() net/stmmac: fix issue of clk_get for Loongson1B. caif: Do not dereference NULL in chnl_recv_cb() af_packet: don't emit packet on orig fanout group drivers/net/irda: fix error return code drivers/net/wan/dscc4.c: fix error return code drivers/net/wimax/i2400m/fw.c: fix error return code smsc75xx: add missing entry to MAINTAINERS net: qmi_wwan: new devices: UML290 and K5006-Z net: sh_eth: Add eth support for R8A7779 device netdev/phy: skip disabled mdio-mux nodes dt: introduce for_each_available_child_of_node, of_get_next_available_child net: netprio: fix cgrp create and write priomap race ...	2012-08-21 16:46:08 -07:00
Linus Torvalds	846b99964a	Grab bag of InfiniBand/RDMA fixes: - IPoIB fixes for regressions introduced by path database conversion - mlx4 fixes for bugs with large memory systems and regressions from SR-IOV patches - RDMA CM fix for passing bad event up to userspace - Other minor fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABCAAGBQJQLo9cAAoJEENa44ZhAt0hPeUP/Ag1i164UE2e64sTfbVUfbEM 7a+acsBo2ByQhVaXd65PZqm6+aYjosSwkStl0u4cb9Q3qQvYiZeLJj/qmInVV1hv qHuNytbnXl6W2f6vyz391TKx+X6uastlH7O4f4v0zS3zkrlkN9kCrN5dI1Fxm9ba 5hluyggs7pVmHPX7KDGnQNZEjCsKoJF2kzB0hXwqGo1JxB2sEGq3DV/uMTHPgKKM HlrTJEEb7pOeHHf2Zt8MQiHMC0YBJ6MmpVIidR02TdQdCgZQcpoYnn20odmNLK5D HTT4jbCSsbOiHPpdAn0YkgzXzL58a2/b1Tt/pNq+Rud7VKwFVuFhNElPyAxR1/1K tPGUbDA6eOfosU++QbDJ0QHxDTgBTyXwh4yMkwPjmdWJ2jwTQev74iiDXgDf3QbS tMmWSwfC38hBxvfomNA6QTPvY1c3NNvj3uO+xq4/q6HloGqSlnIrPBL9hF0BiVQ6 KteZ6gF+R1rHEIQraEnbNmh5Rxvi2RdgN0Xa0U9w/yr0LCowmUbFK35XoyTFyo/8 rF7xp85s1wH5OuaA0NQeivQEKbuBxl7nvJlC2RBdRT/B9U5now+Y2XKOX0+FfRsg Qm0By/K8ss2XftX+lNfip+ScSmu9p+kCBME0X8Ra590VDt0IcPiJsC9ozhKyywRy KfSGj3iZG/nDaec6i1aN =hN/H -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma fixes from Roland Dreier: "Grab bag of InfiniBand/RDMA fixes: - IPoIB fixes for regressions introduced by path database conversion - mlx4 fixes for bugs with large memory systems and regressions from SR-IOV patches - RDMA CM fix for passing bad event up to userspace - Other minor fixes" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB/mlx4: Check iboe netdev pointer before dereferencing it mlx4_core: Clean up buddy bitmap allocation mlx4_core: Fix integer overflow issues around MTT table mlx4_core: Allow large mlx4_buddy bitmaps IB/srp: Fix a race condition IB/qib: Fix error return code in qib_init_7322_variables() IB: Fix typos in infiniband drivers IB/ipoib: Fix RCU pointer dereference of wrong object IB/ipoib: Add missing locking when CM object is deleted RDMA/ucma.c: Fix for events with wrong context on iWARP RDMA/ocrdma: Don't call vlan_dev_real_dev() for non-VLAN netdevs IB/mlx4: Fix possible deadlock on sm_lock spinlock	2012-08-17 11:45:58 -07:00
Roland Dreier	96f17d5900	mlx4_core: Clean up buddy bitmap allocation - Use kcalloc() / vzalloc() instead of an extra bitmap_zero(). - Add __GFP_NOWARN to kcalloc() since we'll try vzalloc() if it fails. Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-08-15 21:05:27 -07:00
Yishai Hadas	3de819e6b6	mlx4_core: Fix integer overflow issues around MTT table Fix some issues around int variables used in data structures related to memory registration. Handle int overflow in mlx4_init_icm_table by using a u64 intermediate variable and changing struct mlx4_icm_table num_obj field to be u32. Change some more fields/variables to use u32 instead of int to prevent a case where the variable becomes negative when bit 31 is set. Also subtract log_mtts_per_seg from the exponent when computing num_mtt, since its added later on in that very same code area. This and the previous commit fixes some issues which actually prevent commit `db5a7a65c0` ("mlx4_core: Scale size of MTT table with system RAM") from working. Now, when the number of MTTs is scaled with the size of the RAM we can map up to 8TB. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-08-15 21:05:26 -07:00
Yishai Hadas	89dd86db78	mlx4_core: Allow large mlx4_buddy bitmaps mlx4_buddy_init uses kmalloc() to allocate bitmaps, which fails when the required size is beyond the max supported value (or when memory is too fragmented to handle a huge allocation). Extend this to use use vmalloc() if kmalloc() fails, and take that into account when freeing the bitmaps as well. This fixes a driver load failure when log num mtt is 26 or higher, and is a step in the direction of allowing to register huge amounts of memory on large memory systems. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-08-15 21:05:20 -07:00
Julia Lawall	499b95f6b3	drivers/net/ethernet/mellanox/mlx4/mcg.c: fix error return code Convert a 0 error return code to a negative one, as returned elsewhere in the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier ret; expression e,e1,e2,e3,e4,x; @@ ( if (\(ret != 0\\|ret < 0\) \|\| ...) { ... return ...; } \| ret = 0 ) ... when != ret = e1 x = \(kmalloc\\|kzalloc\\|kcalloc\\|devm_kzalloc\\|ioremap\\|ioremap_nocache\\|devm_ioremap\\|devm_ioremap_nocache\)(...); ... when != x = e2 when != ret = e3 if (x == NULL \|\| ...) { ... when != ret = e4 * return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 17:00:57 -07:00
Yevgeny Petrilin	2207b60ffb	net/mlx4_core: Remove port type restrictions Port1=Eth, Port2=IB restriction is no longer required. Having RoCE, there will always rdma port initialized over ConnectX physical port, no matter whether the link layer is IB or Ethernet. So we always have dual port IB device. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-03 16:49:40 -07:00
Yevgeny Petrilin	c18520bd1b	net/mlx4_en: Fixing TX queue stop/wake flow Removing the ring->blocked flag, it is redundant and leads to a race: We close the TX queue and then set the "blocked" flag. Between those 2 operations the completion function can check the "blocked" flag, sees that it is 0, and wouldn't open the TX queue. Using netif_tx_queue_stopped to check the state of the queue to avoid this race. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-03 16:49:02 -07:00
Amir Vadai	c8c40b7f32	net/mlx4_en: loopbacked packets are dropped when SMAC=DMAC Should NOT check SMAC=DMAC when: 1. loopback is turned on 2. validate_loopback is true. Fixed it accordingly. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-03 16:49:02 -07:00
Linus Torvalds	1e30c1b386	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking updates and fixes from David Miller: 1) Reinstate the no-ref optimization for input route lookups in ipv4 to fix some routing cache removal perf regressions. 2) Make TCP socket pre-demux work on ipv6 side too, from Eric Dumazet. 3) Get RX hash value from correct place in be2net driver, from Sarveshwar Bandi. 4) Validation of FIB cached routes missing critical check, from Eric Dumazet. 5) EEH support in mlx4 driver, from Kleber Sacilotto de Souza. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits) ipv6: Early TCP socket demux ipv4: Fix input route performance regression. pch_gbe: vlan skb len fix pch_gbe: add extra clean tx pch_gbe: fix transmit watchdog timeout ixgbe: fix panic while dumping packets on Tx hang with IOMMU be2net: Fix to parse RSS hash from Receive completions correctly. net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER hyperv: Add error handling to rndis_filter_device_add() hyperv: Add a check for ring_size value ipv4: rt_cache_valid must check expired routes net/pch_gpe: Cannot disable ethernet autonegation qeth: repair crash in qeth_l3_vlan_rx_kill_vid() netiucv: cleanup attribute usage net: wiznet add missing HAS_IOMEM dependency be2net: Missing byteswap in be_get_fw_log_level causes oops on PowerPC mlx4: Add support for EEH error recovery cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN wanmain: comparing array with NULL caif: fix NULL pointer check ...	2012-07-26 18:09:01 -07:00
Amir Vadai	ee64c0ee51	net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER RFS filter id can't have the special value RPS_NO_FILTER, need to skip it when allocating id's. CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 00:23:55 -07:00
Kleber Sacilotto de Souza	57dbf29a54	mlx4: Add support for EEH error recovery Currently the mlx4 drivers don't have the necessary callbacks to implement EEH errors detection and recovery, so the PCI layer uses the probe and remove callbacks to try to recover the device after an error on the bus. However, these callbacks have race conditions with the internal catastrophic error recovery functions, which will also detect the error and this can cause the system to crash if both EEH and catas functions try to reset the device. This patch adds the necessary error recovery callbacks and makes sure that the internal catastrophic error functions will not try to reset the device in such scenarios. It also adds some calls to pci_channel_offline() to suppress reads/writes on the bus when the slot cannot accept I/O operations so we prevent unnecessary accesses to the bus and speed up the device removal. Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Acked-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-25 15:24:13 -07:00
Linus Torvalds	5dedb9f3bd	InfiniBand/RDMA changes for the 3.6 merge window: - Updates to the qib low-level driver - First chunk of changes for SR-IOV support for mlx4 IB - RDMA CM support for IPv6-only binding - Other misc cleanups and fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABCAAGBQJQDXhUAAoJEENa44ZhAt0hZxwQAJydS9f9me31AGa45SAq8rA8 6e3LgnQ6jS38he7LZZdSsT7g25jROCKOcTj6VUkNSVAVSkK8zcp7wngjDxw50IK6 FF9hNeljMkWBflOhxzB34DRGCW4b4J+Yt1o7v1RtawiG/Mhri/6imKL0Aqjt4GwX a1MPZn+xI2osLujfdHJtATPWWB9jCaXdFe4DJUNPdqJhS6TN7s8OP3XMiqoJtdaV ptHeSSbjdUR1mg/h2LU2FVmWHXNSBxn7MEsrBBRkQVyiEkXieFBLwPTp4DqmAUJf xugm6Hf4sbiQ+QuU0baJODt56wveuYVQ4HKUzE0urUFXyU4TUB9blrehWZnKsRte wo/w4nvVlnXgGhHH0Igq76RDX8aCwc/6uQJ/29oChWjrei3HE0LjmIlPAu0vAhyw ViLe02/r2gKXQv1NxIqhPmGsJTZizg2mUk2eEJHHPQb/NGL7iNo6b+141AfURoqQ goGmlGdzffCRpmo6FXFZ57RPVRS4gwMunCY/Pmvq5a2t4oZh8899l2+V3N7bCfvH +JdavxAjia9U4IlgPsAqVaz8z8TyHOY/lEd75Wnw8q9www3kLx7hASvHsdwrS4LL ihzECzsaXOSoIXQzghs+6iyp1pzmzE9ve8OqbIGJStzlrOLyn7Gjo+Ixwm0SLsTO I7h9iJ1OMKFZ8bzmHsWb =f09j -----END PGP SIGNATURE----- Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - Updates to the qib low-level driver - First chunk of changes for SR-IOV support for mlx4 IB - RDMA CM support for IPv6-only binding - Other misc cleanups and fixes Fix up some add-add conflicts in include/linux/mlx4/device.h and drivers/net/ethernet/mellanox/mlx4/main.c * tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits) IB/qib: checkpatch fixes IB/qib: Add congestion control agent implementation IB/qib: Reduce sdma_lock contention IB/qib: Fix an incorrect log message IB/qib: Fix QP RCU sparse warnings mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them mlx4_core: Allow guests to have IB ports mlx4_core: Implement mechanism for reserved Q_Keys net/mlx4_core: Free ICM table in case of error IB/cm: Destroy idr as part of the module init error flow mlx4_core: Remove double function declarations IB/mlx4: Fill the masked_atomic_cap attribute in query device IB/mthca: Fill in sq_sig_type in query QP IB/mthca: Warning about event for non-existent QPs should show event type IB/qib: Fix sparse RCU warnings in qib_keys.c net/mlx4_core: Initialize IB port capabilities for all slaves mlx4: Use port management change event instead of smp_snoop IB/qib: RCU locking for MR validation IB/qib: Avoid returning EBUSY from MR deregister IB/qib: Fix UC MR refs for immediate operations ...	2012-07-24 13:56:26 -07:00
Roland Dreier	089117e1ad	Merge branches 'cma', 'cxgb4', 'misc', 'mlx4-sriov', 'mlx-cleanups', 'ocrdma' and 'qib' into for-linus	2012-07-22 23:26:17 -07:00
Thadeu Lima de Souza Cascardo	4cce66cdd1	mlx4_en: map entire pages to increase throughput In its receive path, mlx4_en driver maps each page chunk that it pushes to the hardware and unmaps it when pushing it up the stack. This limits throughput to about 3Gbps on a Power7 8-core machine. One solution is to map the entire allocated page at once. However, this requires that we keep track of every page fragment we give to a descriptor. We also need to work with the discipline that all fragments will be released (in the sense that it will not be reused by the driver anymore) in the order they are allocated to the driver. This requires that we don't reuse any fragments, every single one of them must be reallocated. We do that by releasing all the fragments that are processed and only after finished processing the descriptors, we start the refill. We also must somehow guarantee that we either refill all fragments in a descriptor or none at all, without resorting to giving up a page fragment that we would have already given. Otherwise, we would break the discipline of only releasing the fragments in the order they were allocated. This has passed page allocation fault injections (restricted to the driver by using required-start and required-end) and device hotplug while 16 TCP streams were able to deliver more than 9Gbps. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-19 10:53:13 -07:00
Amir Vadai	1eb8c695bd	net/mlx4_en: Add accelerated RFS support Use RFS infrastructure and flow steering in HW to keep CPU affinity of rx interrupts and application per TCP stream. A flow steering filter is added to the HW whenever the RFS ndo callback is invoked by core networking code. Because the invocation takes place in interrupt context, the actual setup of HW is done using workqueue. Whenever new filter is added, the driver checks for expiry of existing filters. Since there's window in time between the point where the core RFS code invoked the ndo callback, to the point where the HW is configured from the workqueue context, the 2nd, 3rd etc packets from that stream will cause the net core to invoke the callback again and again. To prevent inefficient/double configuration of the HW, the filters are kept in a database which is indexed using hash function to enable fast access. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-19 08:34:37 -07:00
Amir Vadai	d9236c3f10	{NET,IB}/mlx4: Add rmap support to mlx4_assign_eq Enable callers of mlx4_assign_eq to supply a pointer to cpu_rmap. If supplied, the assigned IRQ is tracked using rmap infrastructure. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-19 08:34:37 -07:00
Amir Vadai	af22d9de45	net/mlx4: Move MAC_MASK to a common place Define this macro is one common place instead of duplicating it over the code Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-19 08:34:37 -07:00
Dan Carpenter	9c64508af2	net/mlx4_en: dereferencing freed memory We dereferenced "mclist" after the kfree(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-16 22:58:07 -07:00
Dan Carpenter	447458c01f	net/mlx4: off by one in parse_trans_rule() This should be ">=" here instead of ">". MLX4_NET_TRANS_RULE_NUM is 6. We use "spec->id" as an array offset into the __rule_hw_sz[] and __sw_id_hw[] arrays which have 6 elements. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-16 22:57:43 -07:00
Jack Morgenstein	6634961c14	mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them To allow easy paravirtualization of P_Key and GID table sizes, keep paravirtualized sizes in mlx4_dev->caps, but save the actual physical sizes from FW in struct: mlx4_dev->phys_cap. In addition, in SR-IOV mode, do the following: 1. Reduce reported P_Key table size by 1. This is done to reserve the highest P_Key index for internal use, for declaring an invalid P_Key in P_Key paravirtualization. We require a P_Key index which always contain an invalid P_Key value for this purpose (i.e., one which cannot be modified by the subnet manager). The way to do this is to reduce the P_Key table size reported to the subnet manager by 1, so that it will not attempt to access the P_Key at index #127. 2. Paravirtualize the GID table size to 1. Thus, each guest sees only a single GID (at its paravirtualized index 0). In addition, since we are paravirtualizing the GID table size to 1, we add paravirtualization of the master GID event here (i.e., we do not do ib_dispatch_event() for the GUID change event on the master, since its (only) GUID never changes). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-11 11:52:23 -07:00
Jack Morgenstein	105c320f6a	mlx4_core: Allow guests to have IB ports Modify mlx4_dev_cap to allow IB support when SR-IOV is active. Modify mlx4_slave_cap to set the "rdma-supported" bit in its flags area, and pass that to the guests (this is done in QUERY_FUNC_CAP and its wrapper). However, we don't activate IB support quite yet -- we leave the error return at the start of mlx4_ib_add in the mlx4_ib driver. In addition, set "protected fmr supported" bit to zero in the QUERY_FUNC_CAP wrapper. Finally, in the QUERY_FUNC_CAP wrapper, we needed to add code which checks for the port type (IB or Ethernet). Previously, this was not an issue, since only Ethernet ports were supported. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-11 11:51:37 -07:00
Jack Morgenstein	396f2feb05	mlx4_core: Implement mechanism for reserved Q_Keys The SR-IOV special QP tunneling mechanism uses proxy special QPs (instead of the real special QPs) for MADs on guests. These proxy QPs send their packets to a "tunnel" QP owned by the master. The master then forwards the MAD (after any required paravirtualization) to the real special QP, which sends out the MAD. For security reasons (i.e., to prevent guests from sending MADs to tunnel QPs belonging to other guests), each proxy-tunnel QP pair is assigned a unique, reserved, Q_Key. These Q_Keys are available only for proxy and tunnel QPs -- if the guest tries to use these Q_Keys with other QPs, it will fail. This patch introduces a mechanism for reserving a block of 64K Q_Keys for proxy/tunneling use. The patch introduces also two new fields into mlx4_dev: base_sqpn and base_tunnel_sqpn. In SR-IOV mode, the QP numbers for the "real," proxy, and tunnel sqps are added to the reserved QPN area (so that they will not change). There are 8 special QPs per port in the HCA, and each of them is assigned both a proxy and a tunnel QP, for each VF and for the PF as well in SR-IOV mode. The QPNs for these QPs are arranged as follows: 1. The real SQP numbers (8) 2. The proxy SQPs (8 * (max number of VFs + max number of PFs) 3. The tunnel SQPs (8 * (max number of VFs + max number of PFs) To support these QPs, two new fields are added to struct mlx4_dev: base_sqp: this is the QP number of the first of the real SQPs base_tunnel_sqp: this is the qp number of the first qp in the tunnel sqp region. (On guests, this is the first tunnel sqp of the 8 which are assigned to that guest). In addition, in SR-IOV mode, sqp_start is the number of the first proxy SQP in the proxy SQP region. (In guests, this is the first proxy SQP of the 8 which are assigned to that guest) Note that in non-SR-IOV mode, there are no proxies and no tunnels. In this case, sqp_start is set to sqp_base -- which minimizes code changes. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-11 11:35:55 -07:00
Dotan Barak	240a9207aa	net/mlx4_core: Free ICM table in case of error In mlx4_init_icm_table(), free the allocated table if we failed to allocate memory to its entries. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-11 09:22:58 -07:00
Dotan Barak	f457ce471c	mlx4_core: Remove double function declarations Spotted four duplicate declarations in icm.h, remove them. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-11 09:22:58 -07:00
Jack Morgenstein	2aca1172c2	net/mlx4_core: Initialize IB port capabilities for all slaves With IB SR-IOV, each slave has its own separate copy of the port capabilities flags. For example, the master can run a subnet manager (which causes the IsSM bit to be set in the master's port capabilities) without affecting the port capabilities seen by the slaves (the IsSM bit will be seen as cleared in the slaves). Also add a static inline mlx4_master_func_num() to enhance readability of the code. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-10 09:57:06 -07:00
Jack Morgenstein	00f5ce99dc	mlx4: Use port management change event instead of smp_snoop The port management change event can replace smp_snoop. If the capability bit for this event is set in dev-caps, the event is used (by the driver setting the PORT_MNG_CHG_EVENT bit in the async event mask in the MAP_EQ fw command). In this case, when the driver passes incoming SMP PORT_INFO SET mads to the FW, the FW generates port management change events to signal any changes to the driver. If the FW generates these events, smp_snoop shouldn't be invoked in ib_process_mad(), or duplicate events will occur (once from the FW-generated event, and once from smp_snoop). In the case where the FW does not generate port management change events smp_snoop needs to be invoked to create these events. The flow in smp_snoop has been modified to make use of the same procedures as in the fw-generated-event event case to generate the port management events (LID change, Client-rereg, Pkey change, and/or GID change). Port management change event handling required changing the mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument (last argument) had to be changed to unsigned long in order to accomodate passing the EQE pointer. We also needed to move the definition of struct mlx4_eqe from net/mlx4.h to file device.h -- to make it available to the IB driver, to handle port management change events. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-10 09:47:10 -07:00
Jack Morgenstein	752a50cab6	mlx4_core: Pass an invalid PCI id number to VFs Currently, VFs have 0 in their dev->caps.function field. This is a valid pci id (usually of the PF). Instead, pass an invalid PCI id to the VF via QUERY_FW, so that if the value gets accessed in the VF driver, we'll catch the problem. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-08 18:05:05 -07:00
Hadar Hen Zion	cabdc8ee37	net/mlx4_en: Add support for drop action through ethtool The drop action is implemented by allocating a QP and keeping it in a reset state such that the HW drops any packets which are steered to that QP. When a drop action is requested, we attach the relevant flow to that QP. Sign-off-by: Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-07 16:23:06 -07:00
Hadar Hen Zion	820672812f	net/mlx4_en: Manage flow steering rules with ethtool Implement the ethtool APIs for attaching L2/L3/L4 based flow steering rules to the netdevice RX rings. Added set_rxnfc callback and enhanced the existing get_rxnfc callback. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-07 16:23:06 -07:00
Hadar Hen Zion	592e49dda8	net/mlx4: Implement promiscuous mode with device managed flow-steering The device managed flow steering API has three promiscuous modes: 1. Uplink - captures all the packets that arrive to the port. 2. Allmulti - captures all multicast packets arriving to the port. 3. Function port - for future use, this mode is not implemented yet. Use these modes with the flow_attach and flow_detach firmware commands according to the promiscuous state of the netdevice. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-07 16:23:06 -07:00

1 2 3 4 5

226 Commits