- A good chunk of Bart Van Assche's SRP fixes
- UAPI disintegration from David Howells
- mlx4 support for "64-byte CQE" hardware feature from Or Gerlitz
- Other miscellaneous fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJQxstjAAoJEENa44ZhAt0hURUQAJd7HumReKTdRqzIzXPc+rgl
pRR5eqplPY2anfJMqLDiFphVjfCiKyhudomdo+RUbBFFnUVLlBzk80A0/IZ3g3PZ
MHOT+pX4PGDd+3FQxV2AaQCMwgGbvC0haInXyQDVZGm0fbMjRd699yGVWBiA8rOI
VNhUi5WMmynSINYokM8UxrhfoUfy3QxsOvZBZ3XUD1zjJB0IMd5HRdiDUG7ur0q+
rfpWKv51DXT81ux36MXbdPBhLRbzx4B7EwuPWOFPqJe1KwK2cD8iA6DwEKC9KMxS
Kj2+CxB5Bfpfz8bhLi2VZcMgAKiSIQDXUtiKz8h0yFVhvADYZLU7zdGN49mCqKcY
9dwX8+0aIVez6WB2jH+ir2FSG65NsnvqESwQ4LLQ9bhArgf9fapVGlypHwcKi5hh
3j2ipO/RyT56nLQeI0gz1P5mQneFSWlY96CD8WP+9OxO/mVnxViajzevSwT/cLE6
IOMks8DPhsQK88JXSx0XKVxn3zrJ9SXbYDhRWJ6f4w/fxraRXlFdQi0UfcsAajkX
5qmM4e8Oy97TJYiY1RkAmb7aV182xMWVjtDx2FFTQ5ukgDea/DklIM/JNQ475027
N7zMW1tP6+gnnDyMEkteVuPdbl1fzwI3RdXCh0mFZHZ5tvegkdxbw0XxERcevnQN
LZfME8wCuC7+RtmE38Li
=TQK2
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband upate from Roland Dreier:
"First batch of InfiniBand/RDMA changes for the 3.8 merge window:
- A good chunk of Bart Van Assche's SRP fixes
- UAPI disintegration from David Howells
- mlx4 support for "64-byte CQE" hardware feature from Or Gerlitz
- Other miscellaneous fixes"
Fix up trivial conflict in mellanox/mlx4 driver.
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (33 commits)
RDMA/nes: Fix for crash when registering zero length MR for CQ
RDMA/nes: Fix for terminate timer crash
RDMA/nes: Fix for BUG_ON due to adding already-pending timer
IB/srp: Allow SRP disconnect through sysfs
srp_transport: Document sysfs attributes
srp_transport: Simplify attribute initialization code
srp_transport: Fix attribute registration
IB/srp: Document sysfs attributes
IB/srp: send disconnect request without waiting for CM timewait exit
IB/srp: destroy and recreate QP and CQs when reconnecting
IB/srp: Eliminate state SRP_TARGET_DEAD
IB/srp: Introduce the helper function srp_remove_target()
IB/srp: Suppress superfluous error messages
IB/srp: Process all error completions
IB/srp: Introduce srp_handle_qp_err()
IB/srp: Simplify SCSI error handling
IB/srp: Keep processing commands during host removal
IB/srp: Eliminate state SRP_TARGET_CONNECTING
IB/srp: Increase block layer timeout
RDMA/cm: Change return value from find_gid_port()
...
The __dev* removal patches for the network drivers ended up messing up
the function prototypes for a bunch of drivers. This patch fixes all of
them back up to be properly aligned.
Bonus is that this almost removes 100 lines of code, always a nice
surprise.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_HOTPLUG is going away as an option. As result the __dev*
markings will be going away.
Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst,
and __devexit.
Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ConnectX-3 devices can use either 64- or 32-byte completion queue
entries (CQEs) and event queue entries (EQEs). Using 64-byte
EQEs/CQEs performs better because each entry is aligned to a complete
cacheline. This patch queries the HCA's capabilities, and if it
supports 64-byte CQEs and EQES the driver will configure the HW to
work in 64-byte mode.
The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.
Since this mode is global, userspace (libmlx4) must be updated to work
with the configured CQE size, and guests using SR-IOV virtual
functions need to know both EQE and CQE size.
In case one of the 64-byte CQE/EQE capabilities is activated, the
patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that
they need an update to be able to work with the PPF. This is done by
changing the returned pf_context_behaviour not to be zero any more. In
case none of these capabilities is activated that value remains zero
and older guest drivers can run OK.
The SRIOV related flow is as follows
1. the PPF does the detection of the new capabilities using
QUERY_DEV_CAP command.
2. the PPF activates the new capabilities using INIT_HCA.
3. the VF detects if the PPF activated the capabilities using
QUERY_HCA, and if this is the case activates them for itself too.
Note that the VF detects that it must be aware to the new PF behaviour
using QUERY_FUNC_CAP. Steps 1 and 2 apply also for native mode.
User space notification is done through a new field introduced in
struct mlx4_ib_ucontext which holds device capabilities for which user
space must take action. This changes the binary interface so the ABI
towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
when **needed** i.e. only when the driver does use 64-byte CQEs or
future device capabilities which must be in sync by user space. This
practice allows to work with unmodified libmlx4 on older devices (e.g
A0, B0) which don't support 64-byte CQEs.
In order to keep existing systems functional when they update to a
newer kernel that contains these changes in VF and userspace ABI, a
module parameter enable_64b_cqe_eqe must be set to enable 64-byte
mode; the default is currently false.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fixed the resource cleanup to act correctly and prevent a kernel oops when
mlx4_QUERY_ADAPTER() fails.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJQav4jAAoJEENa44ZhAt0hmL0QAJTuMdSOzYFd/NB38owJCNM2
kz/N1GlBm3z98fIlGo8u+lzgV2qxqZSAzsJsouMeK38KiAX3CL8HKe44A1QvTM6v
dXTNL4JFX24/YF+nlmMY8Av518I9Mkte3BZCnpYkBjVFBWe0ePwoRC/btfBXPDIV
0snq4OtjoBAn00dOOyuZ5PoyY9xf0z4UB0Gple9sM4mzEb8wVWdNDDPOiuPJc6fA
L+gk6HLkZDg54+QswafdKYwpeTq45wIKLmCdS3oUNmppMLVhZY8rECOwzSa+KiTr
/Yo19n+zl+IBlvjQHhmUqGHvdD17PaGlr+TckAsQqmVfXUH5qqpEnkF8FoEK59c5
YA3lVU8Sj4BPhJ7qX54CuN3767mZizakkxCr9iPRzABFTgzWVgcSgCrE8jjx4i0h
Pam+L5bmANFStgmGR8PmXiNgcrCUcEqYHsOWDDAnHa5ekb2nyv1JL1c18hlY9hC3
Xb1YTMZFwvofGza89hBu7oHrMbLOUc5kW2lBpvUn2nlyf3i0F8ISlVbVbNjFA54p
60/jHa2VOQ2CcJUJKnJOk4ajOOEfHnPtMn2q96XJ69Dp8+eSYEO/G+0i1OlChq4h
ClnG0Yp+NkT1o8WXMd7guDR+RsXt+DXIij5TiUWRIqnIlopIsMTRhNH28tMu4jQL
fgN5n987wru91ewdX4gW
=PAcy
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband updates from Roland Dreier:
"First batch of InfiniBand/RDMA changes for the 3.7 merge window:
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes"
This merge also removes a new use of __cancel_delayed_work(), and
replaces it with the regular cancel_delayed_work() that is now irq-safe
thanks to the workqueue updates.
That said, I suspect the sequence in question should probably use
"mod_delayed_work()". I just did the minimal "don't use deprecated
functions" fixup, though.
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits)
IB/qib: Fix local access validation for user MRs
mlx4_core: Disable SENSE_PORT for multifunction devices
mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs
mlx4_core: Stash PCI ID driver_data in mlx4_priv structure
IB/srp: Avoid having aborted requests hang
IB/srp: Fix use-after-free in srp_reset_req()
IB/qib: Add a qib driver version
RDMA/nes: Fix compilation error when nes_debug is enabled
RDMA/nes: Print hardware resource type
RDMA/nes: Fix for crash when TX checksum offload is off
RDMA/nes: Cosmetic changes
RDMA/nes: Fix for incorrect MSS when TSO is on
RDMA/nes: Fix incorrect resolving of the loopback MAC address
mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem
mlx4_core: Trivial cleanups to driver log messages
mlx4_core: Trivial readability fix: "0X30" -> "0x30"
IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes
mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
mlx4: Paravirtualize Node Guids for slaves
mlx4: Activate SR-IOV mode for IB
...
Host bridge hotplug
- Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi)
- Clear host bridge resource info to avoid issue when releasing (Yinghai Lu)
- Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu)
- Use standard list ops for acpi_pci_drivers (Jiang Liu)
Device hotplug
- Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang Liu)
- Remove fakephp driver (Bjorn Helgaas)
- Fix VGA ref count in hotplug remove path (Yinghai Lu)
- Allow acpiphp to handle PCIe ports without native hotplug (Jiang Liu)
- Implement resume regardless of pciehp_force param (Oliver Neukum)
- Make pci_fixup_irqs() work after init (Thierry Reding)
Miscellaneous
- Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang)
- Factor out PCI Express Capability accessors (Jiang Liu)
- Add pcibios_window_alignment() so powerpc EEH can use generic resource assignment (Gavin Shan)
- Make pci_error_handlers const (Stephen Hemminger)
- Cleanup drivers/pci/remove.c (Bjorn Helgaas)
- Improve Vendor-Specific Extended Capability support (Bjorn Helgaas)
- Use standard list ops for bus->devices (Bjorn Helgaas)
- Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang)
- Reassign invalid bus number ranges (Intel DP43BF workaround) (Yinghai Lu)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJQac4hAAoJEPGMOI97Hn6zjZYP/iaqU9kjmgTsBbSyzB4oApv/
RRxo3I+ad9GF6XlMQfVAtyx1pgCD1gdGAtoDgGSCTqgdYD3CO10AxKU+yleAk1wo
dbMxLifJNTrT3G1mZ/NL16yEGhCwvhfwzRtB1VoZmCT4lSApO/7cJkXl2DzHfA/i
pmltOOiQCN8kbUcJbVPtUyTVPi2zl/8bsyCyTkS7YG0VXeGRM+ZUvPWZJ7MnWYYB
5qoCdrw5ENCCiDQ9yw5SAfgL23b+0p6OI/x3Lkex0QQOWwSqGSiaHt4b7eitrC5b
2eAJg32f/AzZke1YbKLMfdsL0VJP3GAswhDVHlgmo63rZkOZChm+97dgZ35Mcv5v
kEXkWyBb1xJ3t8rZir6Qer9Iv2wOB+MkZ5qtU/Vf+l0wLQLYTrRVsKngrEDREONk
dXbokp6iVSPeA1sTSdH9MmHlTUIj82ZLSGcxcjTsN8NWZjxx6g3rNx1uay+5MYOW
4ET9zNu5snrAqN6N4Tb81gvtG8qYfxzdvVfrA9AaGKI6xxB7jkqgFJRp55JiEcFc
x4cmWkhvdlhVsG2TQwFxYNfswOqD+7NCs6V4kSVZX6ezpDrH7I5VvcnnhstF7C8l
KZul0EV7OW+kDK23pNe24lVP2xtOv6G8eK/3PmeKIXWl1V83nqre/oLufRzTfs+Z
SxkILwY/MFpuCFteKE1t
=haBu
-----END PGP SIGNATURE-----
Merge tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI changes from Bjorn Helgaas:
"Host bridge hotplug
- Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi)
- Clear host bridge resource info to avoid issue when releasing
(Yinghai Lu)
- Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu)
- Use standard list ops for acpi_pci_drivers (Jiang Liu)
Device hotplug
- Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang
Liu)
- Remove fakephp driver (Bjorn Helgaas)
- Fix VGA ref count in hotplug remove path (Yinghai Lu)
- Allow acpiphp to handle PCIe ports without native hotplug (Jiang
Liu)
- Implement resume regardless of pciehp_force param (Oliver Neukum)
- Make pci_fixup_irqs() work after init (Thierry Reding)
Miscellaneous
- Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang)
- Factor out PCI Express Capability accessors (Jiang Liu)
- Add pcibios_window_alignment() so powerpc EEH can use generic
resource assignment (Gavin Shan)
- Make pci_error_handlers const (Stephen Hemminger)
- Cleanup drivers/pci/remove.c (Bjorn Helgaas)
- Improve Vendor-Specific Extended Capability support (Bjorn
Helgaas)
- Use standard list ops for bus->devices (Bjorn Helgaas)
- Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang)
- Reassign invalid bus number ranges (Intel DP43BF workaround)
(Yinghai Lu)"
* tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (102 commits)
PCI: acpiphp: Handle PCIe ports without native hotplug capability
PCI/ACPI: Use acpi_driver_data() rather than searching acpi_pci_roots
PCI/ACPI: Protect acpi_pci_roots list with mutex
PCI/ACPI: Use acpi_pci_root info rather than looking it up again
PCI/ACPI: Pass acpi_pci_root to acpi_pci_drivers' add/remove interface
PCI/ACPI: Protect acpi_pci_drivers list with mutex
PCI/ACPI: Notify acpi_pci_drivers when hot-plugging PCI root bridges
PCI/ACPI: Use normal list for struct acpi_pci_driver
PCI/ACPI: Use DEVICE_ACPI_HANDLE rather than searching acpi_pci_roots
PCI: Fix default vga ref_count
ia64/PCI: Clear host bridge aperture struct resource
x86/PCI: Clear host bridge aperture struct resource
PCI: Stop all children first, before removing all children
Revert "PCI: Use hotplug-safe pci_get_domain_bus_and_slot()"
PCI: Provide a default pcibios_update_irq()
PCI: Discard __init annotations for pci_fixup_irqs() and related functions
PCI: Use correct type when freeing bus resource list
PCI: Check P2P bridge for invalid secondary/subordinate range
PCI: Convert "new_id"/"remove_id" into generic pci_bus driver attributes
xen-pcifront: Use hotplug-safe pci_get_domain_bus_and_slot()
...
In the current driver, the SENSE_PORT firmware command is issued as a
"wrapped" command, but the command handling code doesn't have a
wrapper, so it will never do anything other than log an error message.
The latest ConnectX-3 2.11.500 firmware reports the SENSE_PORT
capability even in multi-function (SR-IOV) mode, so the driver will
try to issue the command.
At least until the driver has a proper wrapper for SENSE_PORT, make
sure we disable the command for multi-function devices.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Instead of having a hard-coded "PCI device ID != 0x1003" (which
obviously breaks as newer devices with ID != 0x1003 become available),
instead let's set a flag in our PCI device table for the older devices
where we're supposed to force using SENSE_PORT. This also avoids
enabling SENSE_PORT for virtual functions by mistake.
Signed-off-by: Roland Dreier <roland@purestorage.com>
That way we can check flags later on, when we've finished with the
pci_device_id structure. Also convert the "is VF" flag to an enum:
"Never do in the preprocessor what can be done in C."
Signed-off-by: Roland Dreier <roland@purestorage.com>
On an SR-IOV master device, __mlx4_init_one() calls mlx4_init_hca()
before mlx4_multi_func_init(). However, for unlucky configurations,
mlx4_init_hca() might call mlx4_SENSE_PORT() (via mlx4_dev_cap()), and
that calls mlx4_cmd_imm() with MLX4_CMD_WRAPPED set.
However, on a multifunction device with MLX4_CMD_WRAPPED, __mlx4_cmd()
calls into mlx4_slave_cmd(), and that immediately tries to do
down(&priv->cmd.slave_sem);
but priv->cmd.slave_sem isn't initialized until mlx4_multi_func_init()
(which we haven't called yet). The next thing it tries to do is access
priv->mfunc.vhcr, but that hasn't been allocated yet.
Fix this by moving the initialization of slave_sem and vhcr up into
mlx4_cmd_init(). Also, since slave_sem is really just being used as a
mutex, convert it into a slave_cmd_mutex.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Previously, the structure of a guest's proxy QPs followed the
structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
qp1 port 2, ...). The guest then did offset calculations on the
sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().
This is now changed so that the guest does no offset calculations
regarding proxy or tunnel QPs to use. This change frees the PPF from
needing to adhere to a specific order in allocating proxy and tunnel
QPs.
Now QUERY_FUNC_CAP provides each port individually with its proxy
qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
used directly where required (with no offset calculations).
To accomplish this change, several fields were added to the phys_caps
structure for use by the PPF and by non-SR-IOV mode:
base_sqpn -- in non-sriov mode, this was formerly sqp_start.
base_proxy_sqpn -- the first physical proxy qp number -- used by PPF
base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF.
The current code in the PPF still adheres to the previous layout of
sqps, proxy-sqps and tunnel-sqps. However, the PPF can change this
layout without affecting VF or (paravirtualized) PF code.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This is necessary in order to support > 1 VF/PF in a VM for software
that uses the node guid as a discriminator, such as librdmacm.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This requires:
1. Replacing the paravirtualized P_Key index (inserted by the guest)
with the real P_Key index.
2. For UD QPs, placing the guest's true source GID index in the
address path structure mgid field, and setting the ud_force_mgid
bit so that the mgid is taken from the QP context and not from the
WQE when posting sends.
3. For UC and RC QPs, placing the guest's true source GID index in the
address path structure mgid field.
4. For tunnel and proxy QPs, setting the Q_Key value reserved for that
proxy/tunnel pair.
Since not all the above adjustments occur in all the QP transitions,
the QP transitions require separate wrapper functions.
Secondly, initialize the P_Key virtualization table to its default
values: Master virtualized table is 1-1 with the real P_Key table,
guest virtualized table has P_Key index 0 mapped to the real P_Key
index 0, and all the other P_Key indices mapped to the reserved
(invalid) P_Key at index 127.
Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache.
and generating events on the master only if a P_Key actually changed.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In addition, pass the proxy and tunnel QP numbers to slaves so the
driver can perform special QP paravirtualization.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* commit 'v3.6-rc5': (1098 commits)
Linux 3.6-rc5
HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured
Remove user-triggerable BUG from mpol_to_str
xen/pciback: Fix proper FLR steps.
uml: fix compile error in deliver_alarm()
dj: memory scribble in logi_dj
Fix order of arguments to compat_put_time[spec|val]
xen: Use correct masking in xen_swiotlb_alloc_coherent.
xen: fix logical error in tlb flushing
xen/p2m: Fix one-off error in checking the P2M tree directory.
powerpc: Don't use __put_user() in patch_instruction
powerpc: Make sure IPI handlers see data written by IPI senders
powerpc: Restore correct DSCR in context switch
powerpc: Fix DSCR inheritance in copy_thread()
powerpc: Keep thread.dscr and thread.dscr_inherit in sync
powerpc: Update DSCR on all CPUs when writing sysfs dscr_default
powerpc/powernv: Always go into nap mode when CPU is offline
powerpc: Give hypervisor decrementer interrupts their own handler
powerpc/vphn: Fix arch_update_cpu_topology() return value
ARM: gemini: fix the gemini build
...
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
drivers/rapidio/devices/tsi721.c
If mlx4_cmd_init() failed, the init_one function returned
success, although no resources were opened.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The order of operations was wrong on the teardown flow.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Port1=Eth, Port2=IB restriction is no longer required.
Having RoCE, there will always rdma port initialized over ConnectX
physical port, no matter whether the link layer is IB or Ethernet.
So we always have dual port IB device.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates and fixes from David Miller:
1) Reinstate the no-ref optimization for input route lookups in ipv4 to
fix some routing cache removal perf regressions.
2) Make TCP socket pre-demux work on ipv6 side too, from Eric Dumazet.
3) Get RX hash value from correct place in be2net driver, from
Sarveshwar Bandi.
4) Validation of FIB cached routes missing critical check, from Eric
Dumazet.
5) EEH support in mlx4 driver, from Kleber Sacilotto de Souza.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
ipv6: Early TCP socket demux
ipv4: Fix input route performance regression.
pch_gbe: vlan skb len fix
pch_gbe: add extra clean tx
pch_gbe: fix transmit watchdog timeout
ixgbe: fix panic while dumping packets on Tx hang with IOMMU
be2net: Fix to parse RSS hash from Receive completions correctly.
net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER
hyperv: Add error handling to rndis_filter_device_add()
hyperv: Add a check for ring_size value
ipv4: rt_cache_valid must check expired routes
net/pch_gpe: Cannot disable ethernet autonegation
qeth: repair crash in qeth_l3_vlan_rx_kill_vid()
netiucv: cleanup attribute usage
net: wiznet add missing HAS_IOMEM dependency
be2net: Missing byteswap in be_get_fw_log_level causes oops on PowerPC
mlx4: Add support for EEH error recovery
cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN
wanmain: comparing array with NULL
caif: fix NULL pointer check
...
Currently the mlx4 drivers don't have the necessary callbacks to
implement EEH errors detection and recovery, so the PCI layer uses the
probe and remove callbacks to try to recover the device after an error on
the bus. However, these callbacks have race conditions with the internal
catastrophic error recovery functions, which will also detect the error
and this can cause the system to crash if both EEH and catas functions
try to reset the device.
This patch adds the necessary error recovery callbacks and makes sure
that the internal catastrophic error functions will not try to reset the
device in such scenarios. It also adds some calls to
pci_channel_offline() to suppress reads/writes on the bus when the slot
cannot accept I/O operations so we prevent unnecessary accesses to the
bus and speed up the device removal.
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Updates to the qib low-level driver
- First chunk of changes for SR-IOV support for mlx4 IB
- RDMA CM support for IPv6-only binding
- Other misc cleanups and fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJQDXhUAAoJEENa44ZhAt0hZxwQAJydS9f9me31AGa45SAq8rA8
6e3LgnQ6jS38he7LZZdSsT7g25jROCKOcTj6VUkNSVAVSkK8zcp7wngjDxw50IK6
FF9hNeljMkWBflOhxzB34DRGCW4b4J+Yt1o7v1RtawiG/Mhri/6imKL0Aqjt4GwX
a1MPZn+xI2osLujfdHJtATPWWB9jCaXdFe4DJUNPdqJhS6TN7s8OP3XMiqoJtdaV
ptHeSSbjdUR1mg/h2LU2FVmWHXNSBxn7MEsrBBRkQVyiEkXieFBLwPTp4DqmAUJf
xugm6Hf4sbiQ+QuU0baJODt56wveuYVQ4HKUzE0urUFXyU4TUB9blrehWZnKsRte
wo/w4nvVlnXgGhHH0Igq76RDX8aCwc/6uQJ/29oChWjrei3HE0LjmIlPAu0vAhyw
ViLe02/r2gKXQv1NxIqhPmGsJTZizg2mUk2eEJHHPQb/NGL7iNo6b+141AfURoqQ
goGmlGdzffCRpmo6FXFZ57RPVRS4gwMunCY/Pmvq5a2t4oZh8899l2+V3N7bCfvH
+JdavxAjia9U4IlgPsAqVaz8z8TyHOY/lEd75Wnw8q9www3kLx7hASvHsdwrS4LL
ihzECzsaXOSoIXQzghs+6iyp1pzmzE9ve8OqbIGJStzlrOLyn7Gjo+Ixwm0SLsTO
I7h9iJ1OMKFZ8bzmHsWb
=f09j
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
- Updates to the qib low-level driver
- First chunk of changes for SR-IOV support for mlx4 IB
- RDMA CM support for IPv6-only binding
- Other misc cleanups and fixes
Fix up some add-add conflicts in include/linux/mlx4/device.h and
drivers/net/ethernet/mellanox/mlx4/main.c
* tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits)
IB/qib: checkpatch fixes
IB/qib: Add congestion control agent implementation
IB/qib: Reduce sdma_lock contention
IB/qib: Fix an incorrect log message
IB/qib: Fix QP RCU sparse warnings
mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
mlx4_core: Allow guests to have IB ports
mlx4_core: Implement mechanism for reserved Q_Keys
net/mlx4_core: Free ICM table in case of error
IB/cm: Destroy idr as part of the module init error flow
mlx4_core: Remove double function declarations
IB/mlx4: Fill the masked_atomic_cap attribute in query device
IB/mthca: Fill in sq_sig_type in query QP
IB/mthca: Warning about event for non-existent QPs should show event type
IB/qib: Fix sparse RCU warnings in qib_keys.c
net/mlx4_core: Initialize IB port capabilities for all slaves
mlx4: Use port management change event instead of smp_snoop
IB/qib: RCU locking for MR validation
IB/qib: Avoid returning EBUSY from MR deregister
IB/qib: Fix UC MR refs for immediate operations
...
To allow easy paravirtualization of P_Key and GID table sizes, keep
paravirtualized sizes in mlx4_dev->caps, but save the actual physical
sizes from FW in struct: mlx4_dev->phys_cap.
In addition, in SR-IOV mode, do the following:
1. Reduce reported P_Key table size by 1.
This is done to reserve the highest P_Key index for internal use,
for declaring an invalid P_Key in P_Key paravirtualization.
We require a P_Key index which always contain an invalid P_Key
value for this purpose (i.e., one which cannot be modified by
the subnet manager). The way to do this is to reduce the
P_Key table size reported to the subnet manager by 1, so that
it will not attempt to access the P_Key at index #127.
2. Paravirtualize the GID table size to 1. Thus, each guest sees
only a single GID (at its paravirtualized index 0).
In addition, since we are paravirtualizing the GID table size to 1, we
add paravirtualization of the master GID event here (i.e., we do not
do ib_dispatch_event() for the GUID change event on the master, since
its (only) GUID never changes).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Modify mlx4_dev_cap to allow IB support when SR-IOV is active. Modify
mlx4_slave_cap to set the "rdma-supported" bit in its flags area, and
pass that to the guests (this is done in QUERY_FUNC_CAP and its
wrapper).
However, we don't activate IB support quite yet -- we leave the error
return at the start of mlx4_ib_add in the mlx4_ib driver.
In addition, set "protected fmr supported" bit to zero in the
QUERY_FUNC_CAP wrapper.
Finally, in the QUERY_FUNC_CAP wrapper, we needed to add code which
checks for the port type (IB or Ethernet). Previously, this was not
an issue, since only Ethernet ports were supported.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The SR-IOV special QP tunneling mechanism uses proxy special QPs
(instead of the real special QPs) for MADs on guests. These proxy QPs
send their packets to a "tunnel" QP owned by the master. The master
then forwards the MAD (after any required paravirtualization) to the
real special QP, which sends out the MAD.
For security reasons (i.e., to prevent guests from sending MADs to
tunnel QPs belonging to other guests), each proxy-tunnel QP pair is
assigned a unique, reserved, Q_Key. These Q_Keys are available only
for proxy and tunnel QPs -- if the guest tries to use these Q_Keys
with other QPs, it will fail.
This patch introduces a mechanism for reserving a block of 64K Q_Keys
for proxy/tunneling use.
The patch introduces also two new fields into mlx4_dev: base_sqpn and
base_tunnel_sqpn.
In SR-IOV mode, the QP numbers for the "real," proxy, and tunnel sqps
are added to the reserved QPN area (so that they will not change).
There are 8 special QPs per port in the HCA, and each of them is
assigned both a proxy and a tunnel QP, for each VF and for the PF as
well in SR-IOV mode.
The QPNs for these QPs are arranged as follows:
1. The real SQP numbers (8)
2. The proxy SQPs (8 * (max number of VFs + max number of PFs)
3. The tunnel SQPs (8 * (max number of VFs + max number of PFs)
To support these QPs, two new fields are added to struct mlx4_dev:
base_sqp: this is the QP number of the first of the real SQPs
base_tunnel_sqp: this is the qp number of the first qp in the tunnel
sqp region. (On guests, this is the first tunnel
sqp of the 8 which are assigned to that guest).
In addition, in SR-IOV mode, sqp_start is the number of the first
proxy SQP in the proxy SQP region. (In guests, this is the first
proxy SQP of the 8 which are assigned to that guest)
Note that in non-SR-IOV mode, there are no proxies and no tunnels.
In this case, sqp_start is set to sqp_base -- which minimizes code
changes.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
With IB SR-IOV, each slave has its own separate copy of the port
capabilities flags. For example, the master can run a subnet manager
(which causes the IsSM bit to be set in the master's port
capabilities) without affecting the port capabilities seen by the
slaves (the IsSM bit will be seen as cleared in the slaves).
Also add a static inline mlx4_master_func_num() to enhance readability
of the code.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The driver is modified to support three operation modes.
If supported by firmware use the device managed flow steering
API, that which we call device managed steering mode. Else, if
the firmware supports the B0 steering mode use it, and finally,
if none of the above, use the A0 steering mode.
When the steering mode is device managed, the code is modified
such that L2 based rules set by the mlx4_en driver for Ethernet
unicast and multicast, and the IB stack multicast attach calls
done through the mlx4_ib driver are all routed to use the device
managed API.
When attaching rule using device managed flow steering API,
the firmware returns a 64 bit registration id, which is to be
provided during detach.
Currently the firmware is always programmed during HCA initialization
to use standard L2 hashing. Future work should be done to allow
configuring the flow-steering hash function with common, non
proprietary means.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of checking the firmware supported steering mode in various
places in the code, add a dedicated field in the mlx4 device capabilities
structure which is written once during the initialization flow and read
across the code.
This also set the grounds for add new steering modes. Currently two modes
are supported, and are named after the ConnectX HW versions A0 and B0.
A0 steering uses mac_index, vlan_index and priority to steer traffic
into pre-defined range of QPs.
B0 steering uses Ethernet L2 hashing rules and is enabled only
if the firmware supports both unicast and multicast B0 steering,
The current steering modes are relevant for Ethernet traffic only,
such that Infiniband steering remains untouched.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a crash at the error flow of NOP command which caused the driver to try and use
a completion vector which wasn't allocated.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The range check was performed after using the port number.
Reverse this to prevent a potential array overflow.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- pass the following parameters:
- firmware version (added QUERY_FW paravirtualization for that)
- disable Blueflame on slaves. KVM disables write combining on guests,
and we get better performance without BF in this case. (This requires
QUERY_DEV_CAP paravirtualization, also in this commit)
- max qp rdma as destination
- get rid of a chunk of "if (0)" dead code
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In SRIOV mode, the number of EQs used when computing the total ICM size
was incorrect.
To fix this, we do the following:
1. We add a new structure to mlx4_dev, mlx4_phys_caps, to contain physical HCA
capabilities. The PPF uses the phys capabilities when it computes things
like ICM size.
The dev_caps structure will then contain the paravirtualized values, making
bookkeeping much easier in SRIOV mode. We add a structure rather than a
single parameter because there will be other fields in the phys_caps.
The first field we add to the mlx4_phys_caps structure is num_phys_eqs.
2. In INIT_HCA, when running in SRIOV mode, the "log_num_eqs" parameter
passed to the FW is the number of EQs per VF/PF; each function (PF or VF)
has this number of EQs available.
However, the total number of EQs which must be allowed for in the ICM is
(1 << log_num_eqs) * (#VFs + #PFs). Rather than compute this quantity,
we allocate ICM space for 1024 EQs (which is the device maximum
number of EQs, and which is the value we place in the mlx4_phys_caps structure).
For INIT_HCA, however, we use the per-function number of EQs as described
above.
Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
- Add generic and mlx4 support for "raw" QPs: allow suitably privileged
applications to send and receive arbitrary packets directly to/from
the hardware
- Add "doorbell drop" handling to the cxgb4 driver
- A fairly large batch of qib hardware driver changes
- A few fixes for lockdep-detected issues
- A few other miscellaneous fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJPumbZAAoJEENa44ZhAt0h8LgP/0fXe7Szm3n6P6UvMAVqkagM
4PpreH3mpWUFpzqeQE1JPDtgx700R6aPipbHqgIN+k61RWMpLjICGcNx7iwxn1I+
zqdquGygWgjceLz+BLVlk+iBmJt3vZ3fPRAXc7fdP+jhIarWkNIOy1pXWTUuRvED
jL8jIaxhCcgAVzm/zNyt6IPxkaHvCz7K9wqmpyU0dsO9OyPdGvWA9+CkGXwmOCPq
mxSVhWnfGsMkPBsL7EgTC5KP/ox2PKq6rFgysmVVS+rKCpP0L8BEVQyGX3Gf8KA8
yV+KdTi9ofDnFrv6R7Wz0v7HRUih8GRssakzBu7Y7HLfK1M/QwMG0GUAibXGZObc
vUXuQ3uRJ/cIzMPXqKeGYwpb5t+TmxyjhWu44OjCUQkNau91+9BSbA69S88KXc49
aTJiCZlhPuGf4uGMWJJuPLcE2xO2QCZj+8ckL2STYrIip6GWlCH02kJaQmRkuWH2
UhMOeJDBC4nvh4EQT/WwHpGzyhkavE2ayfo5YemxBJXo+P5Mdbf7WIDRQDLUEeQH
F8sPoccH4hDiAorN/SkTsm14jVTP7oWW1M40Ont59Nhbgm88MsVkvjoneHnfBvbD
HjK92soCWnYTAoREfj0G4xUxZgMdOZcezWrX0rx5LJ8Ju9y4zAi3cKGr7lg6hs4X
syKfN0VjiDRtJ+pxayi3
=yWfr
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
- Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
- Add generic and mlx4 support for "raw" QPs: allow suitably privileged
applications to send and receive arbitrary packets directly to/from
the hardware
- Add "doorbell drop" handling to the cxgb4 driver
- A fairly large batch of qib hardware driver changes
- A few fixes for lockdep-detected issues
- A few other miscellaneous fixes and cleanups
Fix up trivial conflict in drivers/net/ethernet/emulex/benet/be.h.
* tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (53 commits)
RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree
IB/mlx4: Fix mlx4_ib_add() error flow
IB/core: Fix IB_SA_COMP_MASK macro
IB/iser: Fix error flow in iser ep connection establishment
IB/mlx4: Increase the number of vectors (EQs) available for ULPs
RDMA/cxgb4: Add query_qp support
RDMA/cxgb4: Remove kfifo usage
RDMA/cxgb4: Use vmalloc() for debugfs QP dump
RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues
RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch()
RDMA/cxgb4: Add DB Overflow Avoidance
RDMA/cxgb4: Add debugfs RDMA memory stats
cxgb4: DB Drop Recovery for RDMA and LLD queues
cxgb4: Common platform specific changes for DB Drop Recovery
cxgb4: Detect DB FULL events and notify RDMA ULD
RDMA/cxgb4: Drop peer_abort when no endpoint found
RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr()
mlx4_core: Change bitmap allocator to work in round-robin fashion
RDMA/nes: Don't call event handler if pointer is NULL
RDMA/nes: Fix for the ORD value of the connecting peer
...
Add missing resource tracking for XRC domains and complete the tracking for HCA
network flow counters.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the slave and master resources are deleted after master freed
all bitmaps. If any resources were not properly cleaned up during the
shutdown process, an Oops would result.
Fix so that delete slave (only) resources during cleanup. Master resources
are cleaned up during unload process, and need not separately be cleaned.
Note that during cleanup, we need to split the resource-tracker freeing
functionality.
Before removing all the bitmaps, we free any leftover slave resources.
However, we can only remove the resource tracker linked list after
all bitmap frees, since some of the freeing functions (e.g.,
mlx4_cleanup_eq_table) use paravirtualized FW commands which expect
the resource tracker linked list to be present.
Found-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consider the following scenario: 2 HCAs, where only one of which can run SRIOV.
If we reset the module parameter, all the VFs of the SRIOV HCA will be
claimed by the PPF host (-- the code relies on num_vfs being non-zero
to avoid this claiming, and num_vfs was reset when pci_enable_sriov failed
for the non-SRIOV HCA).
The solution is not to touch the num_vfs parameter.
Also, eliminate the unneeded check of num_vfs when disabling sriov
(the dev flag bit is sufficient).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a 64-bit flags2 features member to struct mlx4_dev to
export further features of the hardware. The original flags field
tracks features whose support bits are advertised by the firmware in
offsets 0x40 and 0x44 of the query device capabilities command.
flags2 will track features whose support bits are scattered at various
offsets.
RSS support is the first feature to be exported through flags2. RSS
capabilities are located at offset 0x2e. The size of the RSS
indirection table is also given in this offset.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
stands out; by patch count lots of fixes to the mlx4 driver plus some
cleanups and fixes to the core and other drivers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJPZ2THAAoJEENa44ZhAt0hMgkP/AwXjwzz6lYlIizytQ5KOH11
CT/VoHLIGIwY31aRhZRHggWLEbJi8akgfh04K9PiSh/muFRPYk5KJDoQEoJV5Odv
12krqtpZeL41YcAPq0wiyP3Ki5AZDCDumzWbOtmxE9m93mVfi3yFTTWDHFo/7SOx
A2kUITdKuZhgBpCFYS23Gs8QTWcMPUlwNlM76edG90BjSySHsK15tbqTUaEOC6Ux
SPG0p0c2YjlZCPmRObrCL65o5LFRVajinZ4rXN7rre4LEU+IuXgW+mQU2Kwy8z6a
oMPE2l4OMSqj270r2PlVK087gGH69nKfLgLQLJiP0Id+KwdYUk7vQcA+Fu0jwjP3
Ci/ahllvB76IiaNbax0vTy2ohCJmilLLotAP46NW0vPR2/KnmVy0Mqk8pXQQddw4
tnAFNnafn/MjTty8GwqyXR/enJZs29ePcSxcYnKjXYRIgG4ldejL2wktgSra8+gb
3/qFag1HRgZzcICkXGpHj7uWJ2KDe++1m6KTb0THUmrjuiel6ATFZpYoeRed3GfS
ZMqpjZvuXM8nA9CrKMkzm5vIiUFF8Qq0hUTLR38F8FiNEhmC08JqH5PqjJ3Z18if
R1yG4W4xmp/0ICHg4zyg6LWV3YsweDD+jSCJpW+KNBvu3HtYb41HIlK+i2TTmtPD
3etEZZRwROAyYKcwN01p
=DbLx
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes for the 3.4 merge window from Roland Dreier:
"Nothing big really stands out; by patch count lots of fixes to the
mlx4 driver plus some cleanups and fixes to the core and other
drivers."
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (28 commits)
mlx4_core: Scale size of MTT table with system RAM
mlx4_core: Allow dynamic MTU configuration for IB ports
IB/mlx4: Fix info returned when querying IBoE ports
IB/mlx4: Fix possible missed completion event
mlx4_core: Report thermal error events
mlx4_core: Fix one more static exported function
IB: Change CQE "csum_ok" field to a bit flag
RDMA/iwcm: Reject connect requests if cmid is not in LISTEN state
RDMA/cxgb3: Don't pass irq flags to flush_qp()
mlx4_core: Get rid of redundant ext_port_cap flags
RDMA/ucma: Fix AB-BA deadlock
IB/ehca: Fix ilog2() compile failure
IB: Use central enum for speed instead of hard-coded values
IB/iser: Post initial receive buffers before sending the final login request
IB/iser: Free IB connection resources in the proper place
IB/srp: Consolidate repetitive sysfs code
IB/srp: Use pr_fmt() and pr_err()/pr_warn()
IB/core: Fix SDR rates in sysfs
mlx4: Enforce device max FMR maps in FMR alloc
IB/mlx4: Set bad_wr for invalid send opcode
...
Set the MTU for IB ports in the driver instead of using the firmware
default of 2KB (the driver defaults to 4KB). Allow for dynamic mtu
configuration through a new, per-port sysfs entry.
Since there's a dependency between the port MTU and the max number of
HW VLs the port can support, apply a mim/max approach, using a loop
that goes down from the highest possible number of VLs to the lowest,
using the firmware return status to know whether the requested number
of VLs is possible with a given MTU.
For now, as with the dynamic link type change / VPI support, the sysfs
entry to change the mtu is exposed only when NOT running in SR-IOV
mode. To allow changing the MTU for the master in SR-IOV mode,
primary-function-initiated FLR (Function Level Reset) needs to be
implemented.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 22c8bff6fa ("mlx4_core: Exported functions can't be static")
fixed most of this up, but forgot about mlx4_is_slave_active(). Fix
this one too.
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
While doing the work for commit a6f7feae6d ("IB/mlx4: pass SMP
vendor-specific attribute MADs to firmware") we realized that the
firmware would respond on all sorts of vendor-specific MADs.
Therefore commit 97285b7817 ("mlx4_core: Add extended port
capabilities support") adds redundant code into the driver, since
there's no real reaon to maintain the extended capabilities of the
port, as they can be queried on demand (e.g the FDR10 capability).
This patch reverts commit 97285b7817 and removes the check for
extended caps from the mlx4_ib driver port query flow.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Conflicts:
drivers/net/ethernet/sfc/rx.c
Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change
the rx_buf->is_page boolean into a set of u16 flags, and another to
adjust how ->ip_summed is initialized.
Signed-off-by: David S. Miller <davem@davemloft.net>
ConnectX devices have a limit on the number of mappings that can be
done on an FMR before having to call sync_tpt. The current
mlx4_ib driver reports the limit correctly in max_map_per_fmr in
.query_device(), but mlx4_core doesn't check it when actually
allocating FMRs.
Add a max_fmr_maps field to struct mlx4_caps and enforce this maximum
value on FMR allocations.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In port type change flow, need to set the new port types only after
all interfaces have finished the unregister process.
Otherwise, during unregister, one of the interfaces might issue a SET_PORT
command with wrong port types, it can cause bad FW behavior.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Under the spinlock we call request_irq(), which allocates memory with GFP_KERNEL,
This causes the following trace when DEBUG_SPINLOCK is enabled, it can cause
the following trace:
BUG: spinlock wrong CPU on CPU#2, ethtool/2595
lock: ffff8801f9cbc2b0, .magic: dead4ead, .owner: ethtool/2595, .owner_cpu: 0
Pid: 2595, comm: ethtool Not tainted 3.0.18 #2
Call Trace:
spin_bug+0xa2/0xf0
do_raw_spin_unlock+0x71/0xa0
_raw_spin_unlock+0xe/0x10
mlx4_assign_eq+0x12b/0x190 [mlx4_core]
mlx4_en_activate_cq+0x252/0x2d0 [mlx4_en]
? mlx4_en_activate_rx_rings+0x227/0x370 [mlx4_en]
mlx4_en_start_port+0x189/0xb90 [mlx4_en]
mlx4_en_set_ringparam+0x29a/0x340 [mlx4_en]
dev_ethtool+0x816/0xb10
? dev_get_by_name_rcu+0xa4/0xe0
dev_ioctl+0x2b5/0x470
handle_mm_fault+0x1cd/0x2d0
sock_do_ioctl+0x5d/0x70
sock_ioctl+0x79/0x2f0
do_vfs_ioctl+0x8c/0x340
sys_ioctl+0xa1/0xb0
system_call_fastpath+0x16/0x1b
Replacing with mutex, which is enough in this case.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
BF can be disabled in some cases, the capability field, bf_reg_size is set
to zero in this case. Don't map the BF area in this case, it would cause
failures. In addition, leaving the BF area unmapped
also alerts the ETH driver to not use BF.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unnecessary field high_prios from mlx4_steer struct and initialization
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Num mtts from profile is really the number of mtt segments.
Thus, in make profile, to get the proper number of MTT entries,
must multiply num_mtts by mtts per segment.
Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Virtual Functions should not be aware their function number.
Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.
It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.
(Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
New FW can give clues to driver regarding default port type
and whether or not we should default to link sensing on the port.
2 bits are added to QUERY_PORT command:
1. suggested_type: This bit gives a hint whether the default port type should be
IB or Ethernet.
The driver will use this hint in case the user didn't specify explicitly the link layer
type he wants to set.
2. default_sense: If this bit is set, we would sense the port type on start-up
and default the port to link sensing
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
For ConnectX3 devices, we allow link sensing only if FW explicitly
reported it supports the feature.
For older versions (ConnectX1 and 2), if the card supports both link layer types
(Ethenet and Infiniband), link sensing is supported.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Added module parameters sr_iov and probe_vf for controlling enablement of
SRIOV mode.
2. Increased default max num-qps, num-mpts and log_num_macs to accomodate
SRIOV mode
3. Added port_type_array as a module parameter to allow driver startup with
ports configured as desired.
In SRIOV mode, only ETH is supported, and this array is ignored; otherwise,
for the case where the FW supports both port types (ETH and IB), the
port_type_array parameter is used.
By default, the port_type_array is set to configure both ports as IB.
4. When running in sriov mode, the master needs to initialize the ICM eq table
to hold the eq's for itself and also for all the slaves.
5. mlx4_set_port_mask() now invoked from mlx4_init_hca, instead of in mlx4_dev_cap.
6. Introduced sriov VF (slave) device startup/teardown logic (mainly procedures
mlx4_init_slave, mlx4_slave_exit, mlx4_slave_cap, mlx4_slave_exit and flow
modifications in __mlx4_init_one, mlx4_init_hca, and mlx4_setup_hca).
VFs obtain their startup information from the PF (master) device via the
comm channel.
7. In SRIOV mode (both PF and VF), MSI_X must be enabled, or the driver
aborts loading the device.
8. Do not allow setting port type via sysfs when running in SRIOV mode.
9. mlx4_get_ownership: Currently, only one PF is supported by the driver.
If the HCA is burned with FW which enables more than one PF, only one
of the PFs is allowed to run. The first one up grabs a FW ownership
semaphone -- all other PFs will find that semaphore taken, and the
driver will not allow them to run.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Liran Liss <liranl@mellanox.co.il>
Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the previous implementation mtts are managed by:
1. order - log(mtt segments), 'mtt segment' groups several mtts together.
2. first_seg - segment location relative to mtt table.
In the current implementation:
1. order - log(mtts) rather than segments
2. offset - mtt index in mtt table
Note: The actual mtt allocation is made in segments but it is
transparent to callers.
Rational: The mtt resource holders are not interested on how the allocation
of mtt is done, but rather on how they will use it.
Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let multicast/unicast attaching flow go through resource tracker.
The PF is the one responsible for managing all the steering entries.
Define and use module parameter that determines the number of qps
per multicast group.
Minor changes in function calls according to changed prototype.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Port mask now has additional state.
Port can be set as "none". In this case neither the mlx4_en or mlx4_ib
drivers take ownership of the port.
In multifunction mode there is an option to set the vfs as single ported devices.
(in single function mode, both physical ports belong to same function)
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
mlx4_core: Deprecate log_num_vlan module param
IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
IB/mlx4: Enable 4K mtu for IBoE
RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
RDMA/cxgb4: Serialize calls to CQ's comp_handler
RDMA/cxgb3: Serialize calls to CQ's comp_handler
IB/qib: Fix issue with link states and QSFP cables
IB/mlx4: Configure extended active speeds
mlx4_core: Add extended port capabilities support
IB/qib: Hold links until tuning data is available
IB/qib: Clean up checkpatch issue
IB/qib: Remove s_lock around header validation
IB/qib: Precompute timeout jiffies to optimize latency
IB/qib: Use RCU for qpn lookup
IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
IB/qib: Decode path MTU optimization
IB/qib: Optimize RC/UC code by IB operation
IPoIB: Use the right function to do DMA unmap pages
RDMA/cxgb4: Use correct QID in insert_recv_cqe()
RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
...
Moves the Mellanox driver into drivers/net/ethernet/mellanox/ and
make the necessary Kconfig and Makefile changes.
CC: Roland Dreier <roland@kernel.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>