Remove driver specific fdb hadlers since they are the same as
the default ones.
CC: Amir Vadai <amirv@mellanox.com>
CC: Yan Burman <yanb@mellanox.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) ping_err() ICMP error handler looks at wrong ICMP header, from Li
Wei.
2) TCP socket hash function on ipv6 is too weak, from Eric Dumazet.
3) netif_set_xps_queue() forgets to drop mutex on errors, fix from
Alexander Duyck.
4) sum_frag_mem_limit() can deadlock due to lack of BH disabling, fix
from Eric Dumazet.
5) TCP SYN data is miscalculated in tcp_send_syn_data(), because the
amount of TCP option space was not taken into account properly in
this code path. Fix from yuchung Cheng.
6) MLX4 driver allocates device queues with the wrong size, from Kleber
Sacilotto.
7) sock_diag can access past the end of the sock_diag_handlers[] array,
from Mathias Krause.
8) vlan_set_encap_proto() makes incorrect assumptions about where
skb->data points, rework the logic so that it works regardless of
where skb->data happens to be. From Jesse Gross.
9) Fix gianfar build failure with NET_POLL enabled, from Paul
Gortmaker.
10) Fix Ipv4 ID setting and checksum calculations in GRE driver, from
Pravin B Shelar.
11) bgmac driver does:
int i;
for (i = 0; ...; ...) {
...
for (i = 0; ...; ...) {
effectively corrupting the outer loop index, use a seperate
variable for the inner loops. From Rafał Miłecki.
12) Fix suspend bugs in smsc95xx driver, from Ming Lei.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
usbnet: smsc95xx: rename FEATURE_AUTOSUSPEND
usbnet: smsc95xx: fix broken runtime suspend
usbnet: smsc95xx: fix suspend failure
bgmac: fix indexing of 2nd level loops
b43: Fix lockdep splat on module unload
Revert "ip_gre: propogate target device GSO capability to the tunnel device"
IP_GRE: Fix GRE_CSUM case.
VXLAN: Use tunnel_ip_select_ident() for tunnel IP-Identification.
IP_GRE: Fix IP-Identification.
net/pasemi: Fix missing coding style
vmxnet3: fix ethtool ring buffer size setting
vmxnet3: make local function static
bnx2x: remove dead code and make local funcs static
gianfar: fix compile fail for NET_POLL=y due to struct packing
vlan: adjust vlan_set_encap_proto() for its callers
sock_diag: Simplify sock_diag_handlers[] handling in __sock_diag_rcv_msg
sock_diag: Fix out-of-bounds access to sock_diag_handlers[]
vxlan: remove depends on CONFIG_EXPERIMENTAL
mlx4_en: fix allocation of CPU affinity reverse-map
mlx4_en: fix allocation of device tx_cq
...
- SRP error handling fixes from Bart Van Assche
- Implementation of memory windows for mlx4 from Shani Michaeli
- Lots of cxgb4 HW driver fixes from Vipul Pandya
- Make iSER work for virtual functions, other fixes from Or Gerlitz
- Fix for bug in qib HW driver from Mike Marciniszyn
- IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman
- Various cleanups and warning fixes from Julia Lawall, Paul Bolle, Wei Yongjun
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJRLPNoAAoJEENa44ZhAt0h0bMP/1xqlgDR3DGdoUoV4yTd6O9G
Uccdb7og5o5tedVo+8xZz01y4at99P8FWW4hJ4os6k8n6yKoBVwo7qjN+BOR+JG5
q8+O+ynUSIg4tGrb5sXcMnKXAXbw/vkftMWYNA41cbrM24DTYzB/2SLpvhwbFoTT
tdQc2tgz5QaDqzWbagyCR4+k/IgO+Llrz/RvIdtz4dsTnTDogN7QCoSffX8n/Lpb
DxtyXK4sdl3DAtd3CsIdsB/TSMb3RkHLCoSvmrWlLnqMdsbRxVnCVfBm4BOghW3J
Y2K3joRoCjjIZSRNs/i0FMFkT/jbCXg1oXg9ek/a6YFNcgyk7z8iGyXrRY7fOnno
8U2SfxJ69YpVYeJr+DSjaeHcmjsaYU7NN7JPxzvPKcJOIsxQJ/euJDXAXau3lEQY
o9/p4JsGty0WHi1NanyygvghvBAoP1C5/59Sl4bHH5gckPyJT1kinPSCTT76YXGS
WkSHg2mDhiJHy7Pnuy85iZldPoy2/5z09/I4aGMeL+8kUZbD4iFqzXIJU0HTsAim
EONoRXDhIcN5DNVSVH1ig6nJ2a7Vhov4Z0r/vB8P4KhslBcqFwf2leC0eCoe5mNt
SzcKhqosZDXoL8AwzpntzGIOid8pWmHbUx/PgIcoVXPjtl0h2ULNIFoYYyMZ3cyU
AyN2tSiUZVddTV1/aKGL
=RAQw
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband update from Roland Dreier:
"Main batch of InfiniBand/RDMA changes for 3.9:
- SRP error handling fixes from Bart Van Assche
- Implementation of memory windows for mlx4 from Shani Michaeli
- Lots of cxgb4 HW driver fixes from Vipul Pandya
- Make iSER work for virtual functions, other fixes from Or Gerlitz
- Fix for bug in qib HW driver from Mike Marciniszyn
- IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman
- Various cleanups and warning fixes from Julia Lawall, Paul Bolle,
Wei Yongjun"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (41 commits)
IB/mlx4: Advertise MW support
IB/mlx4: Support memory window binding
mlx4: Implement memory windows allocation and deallocation
mlx4_core: Enable memory windows in {INIT, QUERY}_HCA
mlx4_core: Disable memory windows for virtual functions
IPoIB: Free ipoib neigh on path record failure so path rec queries are retried
IB/srp: Fail I/O requests if the transport is offline
IB/srp: Avoid endless SCSI error handling loop
IB/srp: Avoid sending a task management function needlessly
IB/srp: Track connection state properly
IB/mlx4: Remove redundant NULL check before kfree
IB/mlx4: Fix compiler warning about uninitialized 'vlan' variable
IB/mlx4: Convert is_xxx variables in build_mlx_header() to bool
IB/iser: Enable iser when FMRs are not supported
IB/iser: Avoid error prints on EAGAIN registration failures
IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML
IB/uverbs: Implement memory windows support in uverbs
IB/core: Add "type 2" memory windows support
mlx4_core: Propagate MR deregistration failures to caller
mlx4_core: Rename MPT-related functions to have mpt_ prefix
...
Implement MW allocation and deallocation in mlx4_core and mlx4_ib.
Pass down the enable bind flag when registering memory regions.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add memory windows-related code to INIT_HCA and QUERY_HCA.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Do not enable memory windows allocation for virtual functions.
In addition, add a few safety checks, such as:
* Verifying the PD of a new MPT matches the VF.
* Making sure binding memory window isn't enabled for FMRs, and
that new memory windows are not FMR themselves.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The mlx4_en driver allocates the number of objects for the CPU affinity
reverse-map based on the number of rx rings of the device. However,
mlx4_assign_eq() calls irq_cpu_rmap_add() as many times as IRQ's are
assigned to EQ's, which can be as large as mlx4_dev->caps.comp_pool. If
caps.comp_pool is larger than rx_ring_num we will eventually hit the
BUG_ON() in cpu_rmap_add().
Fix this problem by allocating space for the maximum number of CPU
affinity reverse-map objects we might want to add.
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The memory to hold the network device tx_cq is not being allocated with
the correct size in mlx4_en_init_netdev(). It should use MAX_TX_RINGS
instead of MAX_RX_RINGS. This can cause problems if the number of tx
rings being used is greater than MAX_RX_RINGS.
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull trivial tree from Jiri Kosina:
"Assorted tiny fixes queued in trivial tree"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (22 commits)
DocBook: update EXPORT_SYMBOL entry to point at export.h
Documentation: update top level 00-INDEX file with new additions
ARM: at91/ide: remove unsused at91-ide Kconfig entry
percpu_counter.h: comment code for better readability
x86, efi: fix comment typo in head_32.S
IB: cxgb3: delay freeing mem untill entirely done with it
net: mvneta: remove unneeded version.h include
time: x86: report_lost_ticks doesn't exist any more
pcmcia: avoid static analysis complaint about use-after-free
fs/jfs: Fix typo in comment : 'how may' -> 'how many'
of: add missing documentation for of_platform_populate()
btrfs: remove unnecessary cur_trans set before goto loop in join_transaction
sound: soc: Fix typo in sound/codecs
treewide: Fix typo in various drivers
btrfs: fix comment typos
Update ibmvscsi module name in Kconfig.
powerpc: fix typo (utilties -> utilities)
of: fix spelling mistake in comment
h8300: Fix home page URL in h8300/README
xtensa: Fix home page URL in Kconfig
...
MR deregistration fails when memory windows are bound to the MR.
Handle such failures by propagating them to the caller ULP.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The MPT - Memory Protection Table - is used by both memory windows and
memory regions. Hence, all MPT references are relevant for both types
of memory objects. Rename the relevant functions to start with mpt_
instead of the current mr_ prefix.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When a user adds bridge neighbors, allow him to specify VLAN id.
If the VLAN id is not specified, the neighbor will be added
for VLANs currently in the ports filter list. If no VLANs are
configured on the port, we use vlan 0 and only add 1 entry.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Synchronize with 'net' in order to sort out some l2tp, wireless, and
ipv6 GRE fixes that will be built on top of in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
alloc failures already get standardized OOM
messages and a dump_stack.
For the affected mallocs around these OOM messages:
Converted kmallocs with multiplies to kmalloc_array.
Converted a kmalloc/memcpy to kmemdup.
Removed now unused stack variables.
Removed unnecessary parentheses.
Neatened alignment.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Arend van Spriel <arend@broadcom.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix issue in Mellanox driver related to BQL. netdev_tx_reset_queue
was not being called in certain situations where the device was
being start and stopped. Moved netdev_tx_reset_queue from the reset
device path to mlx4_en_free_tx_buf which is where the rings are
cleaned in a reset (specifically from device being stopped).
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for setting embedded switch fdb in case of SRIOV, by
implementing ndo_fdb_{add, del, dump}. This will allow to use
bridged configuration with multi-function. In order to add VM MAC
to the eSwitch fdb, the following command may be used over the relevant function interface:
bridge fdb add <MAC> permanent self dev <IFACE>
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement and advertise unicast MAC filtering, such that setting macvlan
instance over mlx4_en interfaces will not require the networking core
to put mlx4_en devices in promiscuous mode.
If for some reason adding a unicast address filter fails e.g as of missing space in
the HW mac table, the device forces itself into promiscuous mode (and out of this
forced state when enough space is available).
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a preparation step for supporting multiple unicast addresses, store MAC addresses in hash table.
Remove the radix tree for MAC addresses per QP, as it's not in use.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation to having more than one unicast MAC per port, we need to keep track
of the previous MAC address in the flow of ndo_set_mac_address,
so that mlx4_en_replace_mac will know what to replace.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, mlx4_en_do_set_multicast serves as the ndo_set_rx_mode entry for mlx4_en,
doing all related work. Split it to few calls, one per required functionality
(e.g multicast, promiscuous, etc) and rename some structures and calls
to use rx_mode notation instead of multicast.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move low level code that deals with management of Ethernet MACs and QPs from mlx4_core to mlx4_en.
Also convert the new functions to deal with MACs in form of char array instead of u64.
Actual functions moved:
mlx4_replace_mac
mlx4_get_eth_qp
mlx4_put_eth_qp
To conduct this change, some functionality had to be exported from the core,
the following functions were added:
mlx4_get_base_qp
__mlx4_replace_mac (low level function for CX1/A0 compatibility)
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the code consistent in regard to error messages
not spanning multiple lines.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, RX path code that does RX filtering is not optimized
and does an expensive conversion. In order to use ether_addr_equal_64bits
which is optimized for such cases, we need the MAC address kept by the device
to be in the form of unsigned char array instead of u64. Store the MAC address
as unsigned char array and convert to/from u64 out of the fast path when needed.
Side effect of this is that we no longer need priv->mac, since it's the same
as dev->dev_addr.
This optimization was suggested by Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there are relatively complex conditional checks in the fast path,
for TX loopback enabling and resulting RX filter logic.
Move elaborate if's out of data path, replace them with a single flag
for each state and update that state from appropriate places.
Also, in native (non SRIOV) mode and not in loopback or in selftest,
there is no need to try and filter out packets that HW loopback-ed,
as in native mode we do not loopback packets anymore.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix mlx4 VFs not working on old guests because of 64B CQE changes
- Fix ill-considered sparse fix for qib
- Fix IPoIB crash due to skb double destruct introduced in 3.8-rc1
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJREutcAAoJEENa44ZhAt0hjFwP/3SCQr/eboXyJV9GBlmbU9y2
X7t6JS1n9R5KxBj54XBL8ZA7qcaw7rQj8VgPC4qlMWTR1/fXHsrtiRtQU1VMBcBP
Eh50iE5BEq4kpK93IYZZei+I7J/0O1Hpj1JwUuvGr7/hQltMWXItuPTlO4Ror5Kk
Vvu/waQh9DDp1uQRSPbSqAhEx7cGbl27UT7BLPqszVla59GA8UOUcfit8I9CyTmk
ySP2zrDC7JtPoOPYy6w32K4NSjp3KTR4EHWX0G3t/b0LvwEHARwQZ3RI4ZjNMqLl
mtKfqaYjqCeSlaT6MAODlN0aTp38GFAU0RaGePL5GurxQExwGnVZfTRUJDkNGTGO
vPDq6+L6XwPHgYTs1knafs3OT24nwv/vzZ/SLV7gcssbxdL8Cru16E4CO3Vpryrl
5B0w2+ld+L1lw/m4rSuqzQYpS6NpW35ATKzMhQNwk9cLCLNCOqv247WDvhBZDnpV
lhLQ+RGs6DK7CQQ8w4rYLFBVk+1kPlZYILV0Rjni6vv7w9S/byVrshqE8eIkQwqE
BEl0gMc83VZj5WH5s5MJEx+T5H2lZ80rIDKuamSz7wEduXWWENEqj5k7mBHa66Sn
0aHcrXDe26Cj1TUCGbrgeFPMlucVAK+fSjvEzZrzQwxLspnKXlFw5v0DvqmTqBok
hO0iE4ajfXl9RfIC7KrK
=bmKU
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull IB regression fixes from Roland Dreier:
- Fix mlx4 VFs not working on old guests because of 64B CQE changes
- Fix ill-considered sparse fix for qib
- Fix IPoIB crash due to skb double destruct introduced in 3.8-rc1
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/qib: Fix for broken sparse warning fix
mlx4_core: Fix advertisement of wrong PF context behaviour
IPoIB: Fix crash due to skb double destruct
Commit 08ff32352d ("mlx4: 64-byte CQE/EQE support") introduced a
regression where older guest VF drivers failed to load even when
64-byte EQEs/CQEs are disabled, since the PF wrongly advertises the
new context behaviour anyway. The failure looks like:
mlx4_core 0000:00:07.0: Unknown pf context behaviour
mlx4_core 0000:00:07.0: Failed to obtain slave caps
mlx4_core: probe of 0000:00:07.0 failed with error -38
Fix this by basing this advertisement on dev->caps.flags, which is the
operational capabilities used by the QUERY_FUNC_CAP command wrapper
(dev_cap->flags holds the firmware capabilities).
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
ip_eth_mc_map function can't be used when CONFIG_INET isn't defined.
Fixed compilation error by adding CONFIG_INET define check before using the
function.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Propagate return value of mlx4_en_ethtool_add_mac_rule_by_ipv4 in case of
failure.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
alloc failures already get standardized OOM
messages and a dump_stack.
Convert kzalloc's with multiplies to kcalloc.
Convert kmalloc's with multiplies to kmalloc_array.
Fix a few whitespace defects.
Convert a constant 6 to ETH_ALEN.
Use parentheses around sizeof.
Convert vmalloc/memset to vzalloc.
Remove now unused size variables.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Under heavy CPU load, changing, ring size/mtu/etc. could result in transmit
timeout, since stop-start port might take more than 10 seconds.
Calling netif_detach_device to prevent tx queue transmit timeout.
netif_detach_device() is not called under ndo_stop, because netif_carrier_off
will prevent the timeout, and device should not be marked as not present, or
else user won't be able to start it later on.
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mac reassignments should only be done when not supported by the firmware. To
accomplish that, checking firmware capability bit to know whether we should
reassign macs in the driver.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Firmware dynamically changes flow steering hash configuration from covering
L2 only to "full" L2/L3/L4 mode needed. The dynamic change allows the driver
to set hard coded hash configuration which is changed by the firmware from L2
to L2/L3/L4 when attaching the first L3/L4 flow steering rule and back to L2
when there are no more such rules.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of the driver unload flow, all steering rules must be deleted,
make sure to remove the rules that were set through ethtool.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Attaching steering rules while the interface is down is an invalid operation, block it.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vlan mask field should be validated and assigned according to the field
size which is 12 bits. Also replace the numeric 0xfff mask with existing kernel
macro.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When attaching flow steering rules via Ethtool accept only valid vlans IDs e.g
in the range: [0,4095].
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Destination mac is a mandatory specification for ip/udp steering rules.
When attaching multicast steering rules via ethtool the unicast mac of the
interface was added to the rule specification instead of the multicast mac.
The following commit sets the corresponding multicast mac for the rule multicast ip.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The allow_loopback flag was wrongly set using arithmetic bit operation, change
the code to use logical bit operation.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the fields for struct mlx4_net_trans_rule_hw_ctrl were packed into u32
and accessed through bit field operations. Expose and access them directly as
u8.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bring in the 'net' tree so that we can get some ipv4/ipv6 bug
fixes that some net-next work will build upon.
Signed-off-by: David S. Miller <davem@davemloft.net>
filters_lock might have been used while it was re-initialized.
Moved filters_lock and filters_list initialization to init_netdev instead of
alloc_resources which is called every time the device is configured.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a possible race where the TX completion handler can clean the
entire TX queue between the decision that the queue is full and actually
closing it. To avoid this situation, check again if the queue is really
full, if not, reopen the transmit and continue with sending the packet.
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Returning 0 (success) when in fact we are aborting the load, leads to kernel
panic when unloading the module. Fix that by returning the actual error code.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device multicast list is protected by netif_addr_lock_bh in the networking core, we should
use this locking practice in mlx4_en too.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When port is stopped and flow steering mode is not device managed: promisc QP
rule wasn't removed from MCG table.
Added code to remove it in all flow steering modes.
In addition, promsic rule removal should be in stop port and not in start
port - moved it accordingly.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Performing the DUMP_ETH_STATS firmware command outside the lock leads to kernel
panic when data structures such as RX/TX rings are freed in parallel, e.g when
one changes the mtu or ring sizes.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
remove redundant code from build_inline_wqe()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The lines
if (mlx4_is_mfunc(dev)) {
nreq = 2;
} else {
which hard code the number of requested msi-x vectors under multi-function
mode to two can be removed completely, since the firmware sets num_eqs and
reserved_eqs appropriately Thus, the code line:
nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs, nreq);
is by itself sufficient and correct for all cases. Currently, for mfunc
mode num_eqs = 32 and reserved_eqs = 28, hence four vectors will be enabled.
This triples (one vector is used for the async events and commands EQ) the
horse power provided for processing of incoming packets on netdev RSS scheme,
IO initiators/targets commands processing flows, etc.
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5b4c4d3686 "mlx4_en: Allow communication between functions on
same host" introduced a regression under which a bridge acting as vSwitch
whose uplink is an mlx4 Ethernet device become non-operative in native
(non sriov) mode. This happens since broadcast ARP requests sent by VMs
were loopback-ed by the HW and hence the bridge learned VM source MACs
on both the VM and the uplink ports.
The fix is to place the DMAC in the send WQE only under SRIOV/eSwitch
configuration or when the device is in selftest.
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
perm_addr is initialized correctly in register_netdevice() so to init it in
drivers is no longer needed.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Device managed flow steering will be enabled only under administrator
directive provided through setting the existing module parameter
log_num_mgm_entry_size to -1 (if the device actually supports flow
steering). If flow steering isn't requested or not available, the
driver will use the value of log_num_mgm_entry_size and B0 steering.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Separate flow steering capability detection from the decision to activate.
For the master (and for native), detect the flow steering capability
in mlx4_dev_cap, but activate the appropriate steering type in a new
function choose_flow_steering() based on detected data.
For VFs, activate flow steering based on what was actually activated
by the master, where that info is obtained via QUERY_HCA. This fixes
the current VF detection which is wrongly based on QUERY_DEV_CAP.
Also, for SR-IOV mode, if flow steering may be activated, do so only
if the max number of QPs per rule is sufficient to satisfy one
subscription per VF. If not, fall back to B0 mode. This is needed to
serve registrations done by L2 network drivers such as mlx4_en and
IPoIB when the network stack attempts to join to multicast groups such
as all-hosts or the IPoIB broadcast group.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The error flow of the flow steering wrapper had a typo which caused
the wrong firmware command to be called, fix it.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Since VFs may be mapped to VMs which aren't trusted entities, flow
steering rules attached through the wrapper on behalf of VFs must be
checked to make sure that the specified QP number is assigned to that
VF. Also, make sure to keep the QP busy till the end of the operation
from the resource tracker point of view.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- A good chunk of Bart Van Assche's SRP fixes
- UAPI disintegration from David Howells
- mlx4 support for "64-byte CQE" hardware feature from Or Gerlitz
- Other miscellaneous fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJQxstjAAoJEENa44ZhAt0hURUQAJd7HumReKTdRqzIzXPc+rgl
pRR5eqplPY2anfJMqLDiFphVjfCiKyhudomdo+RUbBFFnUVLlBzk80A0/IZ3g3PZ
MHOT+pX4PGDd+3FQxV2AaQCMwgGbvC0haInXyQDVZGm0fbMjRd699yGVWBiA8rOI
VNhUi5WMmynSINYokM8UxrhfoUfy3QxsOvZBZ3XUD1zjJB0IMd5HRdiDUG7ur0q+
rfpWKv51DXT81ux36MXbdPBhLRbzx4B7EwuPWOFPqJe1KwK2cD8iA6DwEKC9KMxS
Kj2+CxB5Bfpfz8bhLi2VZcMgAKiSIQDXUtiKz8h0yFVhvADYZLU7zdGN49mCqKcY
9dwX8+0aIVez6WB2jH+ir2FSG65NsnvqESwQ4LLQ9bhArgf9fapVGlypHwcKi5hh
3j2ipO/RyT56nLQeI0gz1P5mQneFSWlY96CD8WP+9OxO/mVnxViajzevSwT/cLE6
IOMks8DPhsQK88JXSx0XKVxn3zrJ9SXbYDhRWJ6f4w/fxraRXlFdQi0UfcsAajkX
5qmM4e8Oy97TJYiY1RkAmb7aV182xMWVjtDx2FFTQ5ukgDea/DklIM/JNQ475027
N7zMW1tP6+gnnDyMEkteVuPdbl1fzwI3RdXCh0mFZHZ5tvegkdxbw0XxERcevnQN
LZfME8wCuC7+RtmE38Li
=TQK2
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband upate from Roland Dreier:
"First batch of InfiniBand/RDMA changes for the 3.8 merge window:
- A good chunk of Bart Van Assche's SRP fixes
- UAPI disintegration from David Howells
- mlx4 support for "64-byte CQE" hardware feature from Or Gerlitz
- Other miscellaneous fixes"
Fix up trivial conflict in mellanox/mlx4 driver.
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (33 commits)
RDMA/nes: Fix for crash when registering zero length MR for CQ
RDMA/nes: Fix for terminate timer crash
RDMA/nes: Fix for BUG_ON due to adding already-pending timer
IB/srp: Allow SRP disconnect through sysfs
srp_transport: Document sysfs attributes
srp_transport: Simplify attribute initialization code
srp_transport: Fix attribute registration
IB/srp: Document sysfs attributes
IB/srp: send disconnect request without waiting for CM timewait exit
IB/srp: destroy and recreate QP and CQs when reconnecting
IB/srp: Eliminate state SRP_TARGET_DEAD
IB/srp: Introduce the helper function srp_remove_target()
IB/srp: Suppress superfluous error messages
IB/srp: Process all error completions
IB/srp: Introduce srp_handle_qp_err()
IB/srp: Simplify SCSI error handling
IB/srp: Keep processing commands during host removal
IB/srp: Eliminate state SRP_TARGET_CONNECTING
IB/srp: Increase block layer timeout
RDMA/cm: Change return value from find_gid_port()
...
Pull trivial branch from Jiri Kosina:
"Usual stuff -- comment/printk typo fixes, documentation updates, dead
code elimination."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
HOWTO: fix double words typo
x86 mtrr: fix comment typo in mtrr_bp_init
propagate name change to comments in kernel source
doc: Update the name of profiling based on sysfs
treewide: Fix typos in various drivers
treewide: Fix typos in various Kconfig
wireless: mwifiex: Fix typo in wireless/mwifiex driver
messages: i2o: Fix typo in messages/i2o
scripts/kernel-doc: check that non-void fcts describe their return value
Kernel-doc: Convention: Use a "Return" section to describe return values
radeon: Fix typo and copy/paste error in comments
doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
various: Fix spelling of "asynchronous" in comments.
Fix misspellings of "whether" in comments.
eisa: Fix spelling of "asynchronous".
various: Fix spelling of "registered" in comments.
doc: fix quite a few typos within Documentation
target: iscsi: fix comment typos in target/iscsi drivers
treewide: fix typo of "suport" in various comments and Kconfig
treewide: fix typo of "suppport" in various comments
...
Implement destination MAC rule extension for L3/L4 rules in
flow steering. Usefull for vSwitch/macvlan configurations.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of full_mac, zero_mac in favour of
is_zero_ether_addr and is_broadcast_ether_addr.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The __dev* removal patches for the network drivers ended up messing up
the function prototypes for a bunch of drivers. This patch fixes all of
them back up to be properly aligned.
Bonus is that this almost removes 100 lines of code, always a nice
surprise.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_HOTPLUG is going away as an option. As result the __dev*
markings will be going away.
Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst,
and __devexit.
Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add support to changing number of rx/tx channels using
ethtool ('ethtool -[lL]'). Where the number of tx channels specified in ethtool
is the number of rings per user priority - not total number of tx rings.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to re-set tx moderation information after calling set_ringparam
else default tx moderation will be used.
Also avoid related code duplication, by putting it in a utility function.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The slave_state_lock spinlock is used in both interrupt context and
process context, hence irq locking must be used. Found by lockdep.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
ConnectX-3 devices can use either 64- or 32-byte completion queue
entries (CQEs) and event queue entries (EQEs). Using 64-byte
EQEs/CQEs performs better because each entry is aligned to a complete
cacheline. This patch queries the HCA's capabilities, and if it
supports 64-byte CQEs and EQES the driver will configure the HW to
work in 64-byte mode.
The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.
Since this mode is global, userspace (libmlx4) must be updated to work
with the configured CQE size, and guests using SR-IOV virtual
functions need to know both EQE and CQE size.
In case one of the 64-byte CQE/EQE capabilities is activated, the
patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that
they need an update to be able to work with the PPF. This is done by
changing the returned pf_context_behaviour not to be zero any more. In
case none of these capabilities is activated that value remains zero
and older guest drivers can run OK.
The SRIOV related flow is as follows
1. the PPF does the detection of the new capabilities using
QUERY_DEV_CAP command.
2. the PPF activates the new capabilities using INIT_HCA.
3. the VF detects if the PPF activated the capabilities using
QUERY_HCA, and if this is the case activates them for itself too.
Note that the VF detects that it must be aware to the new PF behaviour
using QUERY_FUNC_CAP. Steps 1 and 2 apply also for native mode.
User space notification is done through a new field introduced in
struct mlx4_ib_ucontext which holds device capabilities for which user
space must take action. This changes the binary interface so the ABI
towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
when **needed** i.e. only when the driver does use 64-byte CQEs or
future device capabilities which must be in sync by user space. This
practice allows to work with unmodified libmlx4 on older devices (e.g
A0, B0) which don't support 64-byte CQEs.
In order to keep existing systems functional when they update to a
newer kernel that contains these changes in VF and userspace ABI, a
module parameter enable_64b_cqe_eqe must be set to enable 64-byte
mode; the default is currently false.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Various drivers depend on INET because they used to select INET_LRO,
but they have all been converted to use GRO which has no such
dependency.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit fa37a9586f ('mlx4_en: Moving to
work with GRO') left behind the Kconfig depends/select, some dead
code and comments referring to LRO.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"Asynchronous" is misspelled in some comments. No code changes.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
Minor conflict between the BCM_CNIC define removal in net-next
and a bug fix added to net. Based upon a conflict resolution
patch posted by Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx4 currently uses a too high tx coalescing setting, deferring
TX completion interrupts by up to 128 us.
With the recent skb_orphan() removal in commit 8112ec3b87,
performance of a single TCP flow is capped to ~4 Gbps, unless
we increase tcp_limit_output_bytes.
I suggest using 16 us instead of 128 us, allowing a finer control.
Performance of a single TCP flow is restored to previous levels,
while keeping TCP small queues fully enabled with default sysctl.
This patch is also a BQL prereq.
Reported-by: Vimalkumar <j.vimal@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"This is what we usually expect at this stage of the game, lots of
little things, mostly in drivers. With the occasional 'oops didn't
mean to do that' kind of regressions in the core code."
1) Uninitialized data in __ip_vs_get_timeouts(), from Arnd Bergmann
2) Reject invalid ACK sequences in Fast Open sockets, from Jerry Chu.
3) Lost error code on return from _rtl_usb_receive(), from Christian
Lamparter.
4) Fix reset resume on USB rt2x00, from Stanislaw Gruszka.
5) Release resources on error in pch_gbe driver, from Veaceslav Falico.
6) Default hop limit not set correctly in ip6_template_metrics[], fix
from Li RongQing.
7) Gianfar PTP code requests wrong kind of resource during probe, fix
from Wei Yang.
8) Fix VHOST net driver on big-endian, from Michael S Tsirkin.
9) Mallenox driver bug fixes from Jack Morgenstein, Or Gerlitz, Moni
Shoua, Dotan Barak, and Uri Habusha.
10) usbnet leaks memory on TX path, fix from Hemant Kumar.
11) Use socket state test, rather than presence of FIN bit packet, to
determine FIONREAD/SIOCINQ value. Fix from Eric Dumazet.
12) Fix cxgb4 build failure, from Vipul Pandya.
13) Provide a SYN_DATA_ACKED state to complement SYN_FASTOPEN in socket
info dumps. From Yuchung Cheng.
14) Fix leak of security path in kfree_skb_partial(). Fix from Eric
Dumazet.
15) Handle RX FIFO overflows more resiliently in pch_gbe driver, from
Veaceslav Falico.
16) Fix MAINTAINERS file pattern for networking drivers, from Jean
Delvare.
17) Add iPhone5 IDs to IPHETH driver, from Jay Purohit.
18) VLAN device type change restriction is too strict, and should not
trigger for the automatically generated vlan0 device. Fix from Jiri
Pirko.
19) Make PMTU/redirect flushing work properly again in ipv4, from
Steffen Klassert.
20) Fix memory corruptions by using kfree_rcu() in netlink_release().
From Eric Dumazet.
21) More qmi_wwan device IDs, from Bjørn Mork.
22) Fix unintentional change of SNAT/DNAT hooks in generic NAT
infrastructure, from Elison Niven.
23) Fix 3.6.x regression in xt_TEE netfilter module, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
tilegx: fix some issues in the SW TSO support
qmi_wwan/cdc_ether: move Novatel 551 and E362 to qmi_wwan
net: usb: Fix memory leak on Tx data path
net/mlx4_core: Unmap UAR also in the case of error flow
net/mlx4_en: Don't use vlan tag value as an indication for vlan presence
net/mlx4_en: Fix double-release-range in tx-rings
bas_gigaset: fix pre_reset handling
vhost: fix mergeable bufs on BE hosts
gianfar_ptp: use iomem, not ioports resource tree in probe
ipv6: Set default hoplimit as zero.
NET_VENDOR_TI: make available for am33xx as well
pch_gbe: fix error handling in pch_gbe_up()
b43: Fix oops on unload when firmware not found
mwifiex: clean up scan state on error
mwifiex: return -EBUSY if specific scan request cannot be honored
brcmfmac: fix potential NULL dereference
Revert "ath9k_hw: Updated AR9003 tx gain table for 5GHz"
ath9k_htc: Add PID/VID for a Ubiquiti WiFiStation
rt2x00: usb: fix reset resume
rtlwifi: pass rx setup error code to caller
...
If a failure takes place during the EQ creation, we need to unmap the
UAR memory block too.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Uri Habusha <urih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vlan tag can be zero. This is why it can't serve as an indication
that packet requires VLAN header in the TX flow.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The QP range is reserved as a single block. However, when freeing the
en resources, the tx-ring QPs are released both in mlx4_en_destroy_tx_ring
(one at a time) and in mlx4_en_free_resources (as a block release).
Fix by eliminating the one-at-a-time release in mlx4_en_destroy_tx_ring.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed the resource cleanup to act correctly and prevent a kernel oops when
mlx4_QUERY_ADAPTER() fails.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
These debug prints left behind by commits c82e9aa0a8 ("mlx4_core:
resource tracking for HCA resources used by guests"), 54679e1482
("mlx4: Implement QP paravirtualization and maintain phys_pkey_cache
for smp_snoop") and 993c401e20 ("mlx4_core: Add IB port-state
machine and port mgmt event propagation") make it pretty hard to
actually use the mlx4_core debug messages when running in SRIOV/IB
mode -- for example, the module load sequence of a device with one VF
yielded 631 debug prints, with 408 of them being from this set. Let's
just remove them.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Currently, the InfiniBand stack does not support flow steering at the
verbs level -- the only usage of flow steering in the IB driver is for
L2 multicast attaches. We need to add the IB case to procedure
mlx4_QP_FLOW_STEERING_ATTACH_wrapper() to allow IPoIB to work on VFs
on ConnectX-3 when flow steering is enabled.
Currently, the IB case in mlx4_QP_FLOW_STEERING_ATTACH_wrapper() is missing,
so the procedure returns -EINVAL and IPoIB on VFs breaks.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJQav4jAAoJEENa44ZhAt0hmL0QAJTuMdSOzYFd/NB38owJCNM2
kz/N1GlBm3z98fIlGo8u+lzgV2qxqZSAzsJsouMeK38KiAX3CL8HKe44A1QvTM6v
dXTNL4JFX24/YF+nlmMY8Av518I9Mkte3BZCnpYkBjVFBWe0ePwoRC/btfBXPDIV
0snq4OtjoBAn00dOOyuZ5PoyY9xf0z4UB0Gple9sM4mzEb8wVWdNDDPOiuPJc6fA
L+gk6HLkZDg54+QswafdKYwpeTq45wIKLmCdS3oUNmppMLVhZY8rECOwzSa+KiTr
/Yo19n+zl+IBlvjQHhmUqGHvdD17PaGlr+TckAsQqmVfXUH5qqpEnkF8FoEK59c5
YA3lVU8Sj4BPhJ7qX54CuN3767mZizakkxCr9iPRzABFTgzWVgcSgCrE8jjx4i0h
Pam+L5bmANFStgmGR8PmXiNgcrCUcEqYHsOWDDAnHa5ekb2nyv1JL1c18hlY9hC3
Xb1YTMZFwvofGza89hBu7oHrMbLOUc5kW2lBpvUn2nlyf3i0F8ISlVbVbNjFA54p
60/jHa2VOQ2CcJUJKnJOk4ajOOEfHnPtMn2q96XJ69Dp8+eSYEO/G+0i1OlChq4h
ClnG0Yp+NkT1o8WXMd7guDR+RsXt+DXIij5TiUWRIqnIlopIsMTRhNH28tMu4jQL
fgN5n987wru91ewdX4gW
=PAcy
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband updates from Roland Dreier:
"First batch of InfiniBand/RDMA changes for the 3.7 merge window:
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes"
This merge also removes a new use of __cancel_delayed_work(), and
replaces it with the regular cancel_delayed_work() that is now irq-safe
thanks to the workqueue updates.
That said, I suspect the sequence in question should probably use
"mod_delayed_work()". I just did the minimal "don't use deprecated
functions" fixup, though.
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits)
IB/qib: Fix local access validation for user MRs
mlx4_core: Disable SENSE_PORT for multifunction devices
mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs
mlx4_core: Stash PCI ID driver_data in mlx4_priv structure
IB/srp: Avoid having aborted requests hang
IB/srp: Fix use-after-free in srp_reset_req()
IB/qib: Add a qib driver version
RDMA/nes: Fix compilation error when nes_debug is enabled
RDMA/nes: Print hardware resource type
RDMA/nes: Fix for crash when TX checksum offload is off
RDMA/nes: Cosmetic changes
RDMA/nes: Fix for incorrect MSS when TSO is on
RDMA/nes: Fix incorrect resolving of the loopback MAC address
mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem
mlx4_core: Trivial cleanups to driver log messages
mlx4_core: Trivial readability fix: "0X30" -> "0x30"
IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes
mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
mlx4: Paravirtualize Node Guids for slaves
mlx4: Activate SR-IOV mode for IB
...
Pull networking changes from David Miller:
1) GRE now works over ipv6, from Dmitry Kozlov.
2) Make SCTP more network namespace aware, from Eric Biederman.
3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.
4) Make openvswitch network namespace aware, from Pravin B Shelar.
5) IPV6 NAT implementation, from Patrick McHardy.
6) Server side support for TCP Fast Open, from Jerry Chu and others.
7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
Borkmann.
8) Increate the loopback default MTU to 64K, from Eric Dumazet.
9) Use a per-task rather than per-socket page fragment allocator for
outgoing networking traffic. This benefits processes that have very
many mostly idle sockets, which is quite common.
From Eric Dumazet.
10) Use up to 32K for page fragment allocations, with fallbacks to
smaller sizes when higher order page allocations fail. Benefits are
a) less segments for driver to process b) less calls to page
allocator c) less waste of space.
From Eric Dumazet.
11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.
12) VXLAN device driver, one way to handle VLAN issues such as the
limitation of 4096 VLAN IDs yet still have some level of isolation.
From Stephen Hemminger.
13) As usual there is a large boatload of driver changes, with the scale
perhaps tilted towards the wireless side this time around.
Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
hyperv: Add buffer for extended info after the RNDIS response message.
hyperv: Report actual status in receive completion packet
hyperv: Remove extra allocated space for recv_pkt_list elements
hyperv: Fix page buffer handling in rndis_filter_send_request()
hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
hyperv: Fix the max_xfer_size in RNDIS initialization
vxlan: put UDP socket in correct namespace
vxlan: Depend on CONFIG_INET
sfc: Fix the reported priorities of different filter types
sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
sfc: Fix loopback self-test with separate_tx_channels=1
sfc: Fix MCDI structure field lookup
sfc: Add parentheses around use of bitfield macro arguments
sfc: Fix null function pointer in efx_sriov_channel_type
vxlan: virtual extensible lan
igmp: export symbol ip_mc_leave_group
netlink: add attributes to fdb interface
tg3: unconditionally select HWMON support when tg3 is enabled.
Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
gre: fix sparse warning
...
Pull workqueue changes from Tejun Heo:
"This is workqueue updates for v3.7-rc1. A lot of activities this
round including considerable API and behavior cleanups.
* delayed_work combines a timer and a work item. The handling of the
timer part has always been a bit clunky leading to confusing
cancelation API with weird corner-case behaviors. delayed_work is
updated to use new IRQ safe timer and cancelation now works as
expected.
* Another deficiency of delayed_work was lack of the counterpart of
mod_timer() which led to cancel+queue combinations or open-coded
timer+work usages. mod_delayed_work[_on]() are added.
These two delayed_work changes make delayed_work provide interface
and behave like timer which is executed with process context.
* A work item could be executed concurrently on multiple CPUs, which
is rather unintuitive and made flush_work() behavior confusing and
half-broken under certain circumstances. This problem doesn't
exist for non-reentrant workqueues. While non-reentrancy check
isn't free, the overhead is incurred only when a work item bounces
across different CPUs and even in simulated pathological scenario
the overhead isn't too high.
All workqueues are made non-reentrant. This removes the
distinction between flush_[delayed_]work() and
flush_[delayed_]_work_sync(). The former is now as strong as the
latter and the specified work item is guaranteed to have finished
execution of any previous queueing on return.
* In addition to the various bug fixes, Lai redid and simplified CPU
hotplug handling significantly.
* Joonsoo introduced system_highpri_wq and used it during CPU
hotplug.
There are two merge commits - one to pull in IRQ safe timer from
tip/timers/core and the other to pull in CPU hotplug fixes from
wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."
Fixed a number of trivial conflicts, but the more interesting conflicts
were silent ones where the deprecated interfaces had been used by new
code in the merge window, and thus didn't cause any real data conflicts.
Tejun pointed out a few of them, I fixed a couple more.
* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
workqueue: remove @delayed from cwq_dec_nr_in_flight()
workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
workqueue: use __cpuinit instead of __devinit for cpu callbacks
workqueue: rename manager_mutex to assoc_mutex
workqueue: WORKER_REBIND is no longer necessary for idle rebinding
workqueue: WORKER_REBIND is no longer necessary for busy rebinding
workqueue: reimplement idle worker rebinding
workqueue: deprecate __cancel_delayed_work()
workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
workqueue: use mod_delayed_work() instead of __cancel + queue
workqueue: use irqsafe timer for delayed_work
workqueue: clean up delayed_work initializers and add missing one
workqueue: make deferrable delayed_work initializer names consistent
workqueue: cosmetic whitespace updates for macro definitions
workqueue: deprecate system_nrt[_freezable]_wq
workqueue: deprecate flush[_delayed]_work_sync()
...
After commit e22979d96a (mlx4_en: Moving to Interrupts for TX
completions) we no longer need to orphan skbs in mlx4_en_xmit()
since skb wont stay a long time in TX ring before their release.
Orphaning skbs in ndo_start_xmit() should be avoided as much as
possible, since it breaks TCP Small Queue or other flow control
mechanisms (per socket limits)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Host bridge hotplug
- Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi)
- Clear host bridge resource info to avoid issue when releasing (Yinghai Lu)
- Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu)
- Use standard list ops for acpi_pci_drivers (Jiang Liu)
Device hotplug
- Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang Liu)
- Remove fakephp driver (Bjorn Helgaas)
- Fix VGA ref count in hotplug remove path (Yinghai Lu)
- Allow acpiphp to handle PCIe ports without native hotplug (Jiang Liu)
- Implement resume regardless of pciehp_force param (Oliver Neukum)
- Make pci_fixup_irqs() work after init (Thierry Reding)
Miscellaneous
- Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang)
- Factor out PCI Express Capability accessors (Jiang Liu)
- Add pcibios_window_alignment() so powerpc EEH can use generic resource assignment (Gavin Shan)
- Make pci_error_handlers const (Stephen Hemminger)
- Cleanup drivers/pci/remove.c (Bjorn Helgaas)
- Improve Vendor-Specific Extended Capability support (Bjorn Helgaas)
- Use standard list ops for bus->devices (Bjorn Helgaas)
- Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang)
- Reassign invalid bus number ranges (Intel DP43BF workaround) (Yinghai Lu)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJQac4hAAoJEPGMOI97Hn6zjZYP/iaqU9kjmgTsBbSyzB4oApv/
RRxo3I+ad9GF6XlMQfVAtyx1pgCD1gdGAtoDgGSCTqgdYD3CO10AxKU+yleAk1wo
dbMxLifJNTrT3G1mZ/NL16yEGhCwvhfwzRtB1VoZmCT4lSApO/7cJkXl2DzHfA/i
pmltOOiQCN8kbUcJbVPtUyTVPi2zl/8bsyCyTkS7YG0VXeGRM+ZUvPWZJ7MnWYYB
5qoCdrw5ENCCiDQ9yw5SAfgL23b+0p6OI/x3Lkex0QQOWwSqGSiaHt4b7eitrC5b
2eAJg32f/AzZke1YbKLMfdsL0VJP3GAswhDVHlgmo63rZkOZChm+97dgZ35Mcv5v
kEXkWyBb1xJ3t8rZir6Qer9Iv2wOB+MkZ5qtU/Vf+l0wLQLYTrRVsKngrEDREONk
dXbokp6iVSPeA1sTSdH9MmHlTUIj82ZLSGcxcjTsN8NWZjxx6g3rNx1uay+5MYOW
4ET9zNu5snrAqN6N4Tb81gvtG8qYfxzdvVfrA9AaGKI6xxB7jkqgFJRp55JiEcFc
x4cmWkhvdlhVsG2TQwFxYNfswOqD+7NCs6V4kSVZX6ezpDrH7I5VvcnnhstF7C8l
KZul0EV7OW+kDK23pNe24lVP2xtOv6G8eK/3PmeKIXWl1V83nqre/oLufRzTfs+Z
SxkILwY/MFpuCFteKE1t
=haBu
-----END PGP SIGNATURE-----
Merge tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI changes from Bjorn Helgaas:
"Host bridge hotplug
- Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi)
- Clear host bridge resource info to avoid issue when releasing
(Yinghai Lu)
- Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu)
- Use standard list ops for acpi_pci_drivers (Jiang Liu)
Device hotplug
- Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang
Liu)
- Remove fakephp driver (Bjorn Helgaas)
- Fix VGA ref count in hotplug remove path (Yinghai Lu)
- Allow acpiphp to handle PCIe ports without native hotplug (Jiang
Liu)
- Implement resume regardless of pciehp_force param (Oliver Neukum)
- Make pci_fixup_irqs() work after init (Thierry Reding)
Miscellaneous
- Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang)
- Factor out PCI Express Capability accessors (Jiang Liu)
- Add pcibios_window_alignment() so powerpc EEH can use generic
resource assignment (Gavin Shan)
- Make pci_error_handlers const (Stephen Hemminger)
- Cleanup drivers/pci/remove.c (Bjorn Helgaas)
- Improve Vendor-Specific Extended Capability support (Bjorn
Helgaas)
- Use standard list ops for bus->devices (Bjorn Helgaas)
- Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang)
- Reassign invalid bus number ranges (Intel DP43BF workaround)
(Yinghai Lu)"
* tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (102 commits)
PCI: acpiphp: Handle PCIe ports without native hotplug capability
PCI/ACPI: Use acpi_driver_data() rather than searching acpi_pci_roots
PCI/ACPI: Protect acpi_pci_roots list with mutex
PCI/ACPI: Use acpi_pci_root info rather than looking it up again
PCI/ACPI: Pass acpi_pci_root to acpi_pci_drivers' add/remove interface
PCI/ACPI: Protect acpi_pci_drivers list with mutex
PCI/ACPI: Notify acpi_pci_drivers when hot-plugging PCI root bridges
PCI/ACPI: Use normal list for struct acpi_pci_driver
PCI/ACPI: Use DEVICE_ACPI_HANDLE rather than searching acpi_pci_roots
PCI: Fix default vga ref_count
ia64/PCI: Clear host bridge aperture struct resource
x86/PCI: Clear host bridge aperture struct resource
PCI: Stop all children first, before removing all children
Revert "PCI: Use hotplug-safe pci_get_domain_bus_and_slot()"
PCI: Provide a default pcibios_update_irq()
PCI: Discard __init annotations for pci_fixup_irqs() and related functions
PCI: Use correct type when freeing bus resource list
PCI: Check P2P bridge for invalid secondary/subordinate range
PCI: Convert "new_id"/"remove_id" into generic pci_bus driver attributes
xen-pcifront: Use hotplug-safe pci_get_domain_bus_and_slot()
...
In the current driver, the SENSE_PORT firmware command is issued as a
"wrapped" command, but the command handling code doesn't have a
wrapper, so it will never do anything other than log an error message.
The latest ConnectX-3 2.11.500 firmware reports the SENSE_PORT
capability even in multi-function (SR-IOV) mode, so the driver will
try to issue the command.
At least until the driver has a proper wrapper for SENSE_PORT, make
sure we disable the command for multi-function devices.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Instead of having a hard-coded "PCI device ID != 0x1003" (which
obviously breaks as newer devices with ID != 0x1003 become available),
instead let's set a flag in our PCI device table for the older devices
where we're supposed to force using SENSE_PORT. This also avoids
enabling SENSE_PORT for virtual functions by mistake.
Signed-off-by: Roland Dreier <roland@purestorage.com>
That way we can check flags later on, when we've finished with the
pci_device_id structure. Also convert the "is VF" flag to an enum:
"Never do in the preprocessor what can be done in C."
Signed-off-by: Roland Dreier <roland@purestorage.com>
On an SR-IOV master device, __mlx4_init_one() calls mlx4_init_hca()
before mlx4_multi_func_init(). However, for unlucky configurations,
mlx4_init_hca() might call mlx4_SENSE_PORT() (via mlx4_dev_cap()), and
that calls mlx4_cmd_imm() with MLX4_CMD_WRAPPED set.
However, on a multifunction device with MLX4_CMD_WRAPPED, __mlx4_cmd()
calls into mlx4_slave_cmd(), and that immediately tries to do
down(&priv->cmd.slave_sem);
but priv->cmd.slave_sem isn't initialized until mlx4_multi_func_init()
(which we haven't called yet). The next thing it tries to do is access
priv->mfunc.vhcr, but that hasn't been allocated yet.
Fix this by moving the initialization of slave_sem and vhcr up into
mlx4_cmd_init(). Also, since slave_sem is really just being used as a
mutex, convert it into a slave_cmd_mutex.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Previously, the structure of a guest's proxy QPs followed the
structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
qp1 port 2, ...). The guest then did offset calculations on the
sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().
This is now changed so that the guest does no offset calculations
regarding proxy or tunnel QPs to use. This change frees the PPF from
needing to adhere to a specific order in allocating proxy and tunnel
QPs.
Now QUERY_FUNC_CAP provides each port individually with its proxy
qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
used directly where required (with no offset calculations).
To accomplish this change, several fields were added to the phys_caps
structure for use by the PPF and by non-SR-IOV mode:
base_sqpn -- in non-sriov mode, this was formerly sqp_start.
base_proxy_sqpn -- the first physical proxy qp number -- used by PPF
base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF.
The current code in the PPF still adheres to the previous layout of
sqps, proxy-sqps and tunnel-sqps. However, the PPF can change this
layout without affecting VF or (paravirtualized) PF code.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This is necessary in order to support > 1 VF/PF in a VM for software
that uses the node guid as a discriminator, such as librdmacm.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Remove the error returns for IB ports from mlx4_ib_add,
mlx4_INIT_PORT_wrapper, and mlx4_CLOSE_PORT_wrapper.
Currently, SRIOV is supported only for devices for which the
link layer is IB on all ports; RoCE support will be added later.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
1. Allow only master to change node description.
2. Prevent AH leakage in send mads.
3. Take device part number from PCI structure, so that guests see the
VF part number (and not the PF part number).
4. Place the device revision ID into caps structure at startup.
5. SET_PORT in update_gids_task needs to go through wrapper on master.
6. In mlx4_ib_event(), PORT_MGMT_EVENT needs be handled in a work
queue on the master, since it propagates events to slaves using
GEN_EQE.
7. Do not support FMR on slaves.
8. Add spinlock to slave_event(), since it is called both in interrupt
context and in process context (due to 6 above, and also if
smp_snoop is used). This fix was found and implemented by Saeed
Mahameed <saeedm@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Normally, INIT_PORT and CLOSE_PORT are invoked when special QP0
transitions to RTR, or transitions to ERR/RESET respectively.
In SR-IOV mode, however, the master is also paravirtualized. This in
turn requires that we not do INIT_PORT until the entire QP0 path (real
QP0 and proxy QP0) is ready to receive. When the real QP0 goes down,
we should indicate that the port is not active.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
1. Slaves may not set the IS_SM capability for the port.
2. DEV_MGMT may not be set in multifunction mode.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
P_Key change and guid change events are not of interest to all slaves,
but only to those slaves which "see" the table slots whose contents
have change.
For example, if the guid at port 1, index 5 has changed in the PPF, we
wish to propagate the gid-change event only to the function which has
that guid index mapped to its port/guid table (in this case it is
slave #5). Other functions should not get the event, since the event
does not affect them.
Similarly with P_Keys -- P_Key change events are forwarded only to
slaves which have that P_Key index mapped to their virtual P_Key table.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>