linux/drivers
Dennis Dalessandro ba7d8117f3 IB/core, ipoib: Do not overreact to SM LID change event
When IPoIB receives an SM LID change event, it reacts by flushing its
path record cache and rejoining multicast groups. This is the same
behavior it performs when it receives a reregistration event. This
behavior is unnecessary as an SM may have database backup or
synchronization mechanisms which permit the SM location or LID to change
without loss of multicast membership and without impact to path records.

Both opensm and the OPA FM issue reregistration events if a new SM is
started (or restarted with a new config) or an SM event occurs which
results in loss of multicast membership records by the SM (such as
opensm failover) or the SM encounters new nodes with Active ports (such
as after joining 2 fabrics by connecting switches via ISLs). Hence this
event can be depended on as the trigger for IPoIB cache and multicast
flushing.

It appears that some drivers, such as qib, and hfi1 issue the
IB_EVENT_SM_CHANGE but other drivers such as mlx4 and mlx5 do not.
Empirical testing on Mellanox EDR using ibv_asyncwatch has confirmed
that Mellanox EDR HCAs do not generate SM change events and that opensm
does generate reregistration.

An SM LID change event is generated by the mentioned drivers to reflect
that sm_lid and/or sm_sl in the local port info has changed. The intent
of this event is to permit applications and ULPs which have a local copy
of this information (or an address handle using it) to update their
information.

The intent is that the reregistration event (caused by the SM via a bit
in Set(PortInfo)) be used to inform nodes that they need to rejoin
multicast groups, resubscribe for notices and potentially update path
records.

When an SM migrates or fails over, a SM LID change event can occur. In
response IPoIB discards path records and multicast membership and loses
connectivity until these records are restored via SA requests. In very
large fabrics, it may take minutes for the SM to be ready and for the SA
responses to be supplied.  This can result in undesirable and
unnecessary IPoIB connectivity impacts. It also can result in an
unnecessary storm of SA queries from all nodes in a cluster potentially
followed by yet another storm if the SM issues the reregistration
request.

The fact the Mellanox HCAs do not even generate this event, is further
evidence that on modern IB fabrics there will be no ill side effects
from the proposed changes below to reduce the reaction by 3 kernel
components to this event. So these changes should be benign for Mellanox
IB fabrics and will benefit OPA fabrics while also making ib_core and
ULP behavor "correct" as intended by the IBTA spec and kernel RDMA event
APIs.

Address these issues by removing IB_EVENT_SM_CHANGE handling from ipoib.
IPoIB does not locally store sm_lid nor sm_sl, so it does not need to do
anything on SM LID change. IPoIB makes use of other ib_core components
to issue SA requests for it and those components correctly track SM LID
and SM LID changes.

Also in ib_core multicast handling,  remove the test for
IB_EVENT_SM_CHANGE. This code is moving all multicast groups to the
error state, which will trigger rejoins. This code is used by IPoIB as
well as the connection manager and other clients of multicast groups.
This kernel module centralizes group membership status and joins since a
node can only join a given group once but multiple ULPs or applications
may want to join the same group. It makes use of the sa_query.c
component in ib_core, which correctly trackes SM LID and SL. This
component does not track SM LID nor SL itself and hence need not react
to their changes.

Similarly in the ib_core cache code remove the handling for the
IB_EVENT_SM_CHANGE.  In this function. The ib_cache_update function
which is ultimately called is updating local copies of the pkey table,
gid table and lmc. It does not update nor retain sm_lid nor sm_sl. As
such it does not need to be called on an SM LID change. It technically
also does not need to be called on a reregistration. The LID_CHANGE,
PKEY_CHANGE, GID_CHANGE and port state change events (PORT_ERR,
PORT_ACTICE) should be sufficient triggers.

It is worth noting that the alternative of simply having the hfi1 and
qib drivers not generate the SM LID change event was explored. While
this would duplicate what Mellanox drivers do now, it is not the correct
behavior and removes the ability for an SM to migrate without requiring
reregistration. Since both opensm and OPA SM have mechanisms to backup
or synchronize registration information, it is desirable to let them
perform SM migrations (with LID or SL changes) without requiring
reregistration when they deem it appropriate.

Suggested-by: Todd Rimmer <todd.rimmer@intel.com>
Tested-by: Michael Brooks <michael.brooks@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Todd Rimmer <todd.rimmer@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-07 16:06:03 -03:00
..
accessibility
acpi ACPICA: Namespace: remove address node from global list after method termination 2019-04-09 10:05:11 +02:00
amba ARM: 8836/1: drivers: amba: Update component matching to use the CoreSight UCI values. 2019-02-26 11:23:49 +00:00
android binder: fix race between munmap() and direct reclaim 2019-03-21 06:51:32 +01:00
ata libata: fix using DMA buffers on stack 2019-03-28 08:16:04 -06:00
atm
auxdisplay auxdisplay: charlcd: make backlight initial state configurable 2019-03-17 08:48:45 +01:00
base Device properties framework fix for 5.1-rc2 2019-03-22 12:08:52 -07:00
bcma
block virtio-blk: limit number of hw queues by nr_cpu_ids 2019-04-10 08:18:24 -06:00
bluetooth Bluetooth: btusb: request wake pin with NOAUTOEN 2019-04-09 17:38:24 -10:00
bus ARM: SoC driver updates for 5.1 2019-03-06 09:41:12 -08:00
cdrom
char tpm: Fix the type of the return value in calc_tpm2_event_size() 2019-04-08 15:58:54 -07:00
clk clk: imx: Fix PLL_1416X not rounding rates 2019-04-12 14:21:43 -07:00
clocksource clocksource/drivers/clps711x: Remove board support 2019-03-24 11:30:11 +01:00
connector connector: fix unsafe usage of ->real_parent 2019-03-08 15:06:38 -08:00
cpufreq cpufreq/intel_pstate: Load only on Intel hardware 2019-04-01 23:39:23 +02:00
cpuidle cpuidle: governor: Add new governors to cpuidle_governors again 2019-03-12 23:46:55 +01:00
crypto crypto: caam - fix copy of next buffer for xcbc and cmac 2019-03-28 13:54:32 +08:00
dax device-dax for 5.1 2019-03-16 13:05:32 -07:00
dca
devfreq
dio
dma dmaengine: stm32-mdma: Revert "dmaengine: stm32-mdma: Add a check on read_u32_array" 2019-03-25 21:56:54 +05:30
dma-buf
edac Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-03-08 09:11:39 -08:00
eisa
extcon
firewire
firmware memblock: drop memblock_alloc_*_nopanic() variants 2019-03-12 10:04:02 -07:00
fmc
fpga
fsi
gnss gnss: add driver for mediatek receivers 2019-02-15 16:54:38 +01:00
gpio gpio fixes for v5.1-rc3 2019-03-29 03:04:47 +01:00
gpu - Revert back to max link rate and lane count on eDP. 2019-04-12 13:39:32 +10:00
hid HID: input: add mapping for Assistant key 2019-04-03 13:33:25 +02:00
hsi HSI: omap_ssi_port: fix debugfs_simple_attr.cocci warnings 2019-02-14 12:36:21 +01:00
hv Char/Misc driver patches for 5.1-rc1 2019-03-06 14:18:59 -08:00
hwmon hwmon: (ntc_thermistor) Fix temperature type reporting 2019-03-29 09:51:44 -07:00
hwspinlock
hwtracing ARM updates for 5.1-rc1 2019-03-15 14:37:46 -07:00
i2c i2c: imx: don't leak the i2c adapter on error 2019-04-06 17:54:28 +02:00
i3c - Add a /* fall-through */ comment in the dw-i3c-master driver 2019-03-04 19:05:02 -08:00
ide Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide 2019-03-11 09:34:00 -07:00
idle intel_idle: add support for Jacobsville 2019-02-15 10:49:14 +01:00
iio - New Drivers 2019-03-08 10:02:58 -08:00
infiniband IB/core, ipoib: Do not overreact to SM LID change event 2019-05-07 16:06:03 -03:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2019-03-11 10:57:11 -07:00
interconnect
iommu iommu/amd: Set exclusion range correctly 2019-04-12 12:59:45 +02:00
ipack
irqchip irqchip/irq-ls1x: Missing error code in ls1x_intc_of_init() 2019-04-05 14:37:56 +02:00
isdn mISDN: hfcpci: Test both vendor & device ID for Digium HFC4S 2019-03-18 18:32:44 -07:00
leds leds: trigger: netdev: use memcpy in device_name_store 2019-03-30 19:09:32 +01:00
lightnvm lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecs 2019-04-10 12:17:01 -06:00
macintosh treewide: add checks for the return value of memblock_alloc*() 2019-03-12 10:04:02 -07:00
mailbox mailbox: imx: keep MU irq working during suspend/resume 2019-03-11 02:51:43 -05:00
mcb
md dm integrity: fix deadlock with overlapping I/O 2019-04-05 18:49:08 -04:00
media DMA mapping updates for 5.1 2019-03-10 11:54:48 -07:00
memory
memstick
message
mfd mfd: sun6i-prcm: Allow to compile with COMPILE_TEST 2019-04-03 08:38:07 +01:00
misc 5.1 Merge Window Pull Request 2019-03-09 15:53:03 -08:00
mmc mmc: sdhci-omap: Don't finish_mrq() on a command error during tuning 2019-04-11 12:40:32 +02:00
mtd mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer 2019-04-05 00:39:19 +02:00
mux
net Merge branch 'mlx5_tir_icm' into rdma.git for-next 2019-04-25 10:33:00 -03:00
nfc
ntb Fixes for switchtec debugability and mapping table entries, NTB 2019-03-15 14:32:59 -07:00
nubus
nvdimm device-dax for 5.1 2019-03-16 13:05:32 -07:00
nvme nvmet: fix discover log page when offsets are used 2019-04-11 17:28:30 +02:00
nvmem Char/Misc driver patches for 5.1-rc1 2019-03-06 14:18:59 -08:00
of of: fix kmemleak crash caused by imbalance in early memory reservation 2019-03-12 10:04:02 -07:00
opp PM / OPP: Update performance state when freq == old_freq 2019-03-12 09:45:56 +01:00
oprofile
parisc Revert: parisc: Use F_EXTEND() macro in iosapic code 2019-04-06 19:07:55 +02:00
parport Revert "parport: daisy: use new parport device model" 2019-03-25 14:49:00 -07:00
pci PCI: pciehp: Ignore Link State Changes after powering off a slot 2019-04-10 16:06:43 -05:00
pcmcia
perf arm64 updates for 5.1: 2019-03-10 10:17:23 -07:00
phy phy: sun4i-usb: Support set_mode to USB_HOST for non-OTG PHYs 2019-03-26 16:48:55 +09:00
pinctrl This is the bulk of pin control changes for the v5.1 kernel cycle. 2019-03-11 11:12:50 -07:00
platform Here's more than a handful of clk driver fixes for changes that came in 2019-04-13 14:33:56 -07:00
pnp ACPI/ACPICA: Trivial: fix spelling mistakes and fix whitespace formatting 2019-02-24 21:12:01 +01:00
power power: reset: at91-reset: add support for sam9x60 SoC 2019-02-20 00:41:01 +01:00
powercap powercap/intel_rapl: add Ice Lake mobile 2019-02-18 11:31:39 +01:00
pps
ps3
ptp Merge branch 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-03-05 14:08:26 -08:00
pwm pwm: atmel: Remove useless symbolic definitions 2019-03-04 12:52:49 +01:00
rapidio rapidio/mport_cdev: mark expected switch fall-through 2019-03-07 18:32:02 -08:00
ras
regulator regulator: mc13xxx: Constify regulator_ops variables 2019-03-04 00:01:08 +00:00
remoteproc remoteproc updates for v5.1 2019-03-14 09:00:06 -07:00
reset reset: meson-audio-arb: Fix missing .owner setting of reset_controller_dev 2019-03-25 16:22:10 +01:00
rpmsg rpmsg: virtio: change header file sort style 2019-02-20 21:15:54 -08:00
rtc rtc: da9063: set uie_unsupported when relevant 2019-04-02 23:33:09 +02:00
s390 SCSI fixes on 20190329 2019-03-29 14:58:49 -07:00
sbus
scsi for-linus-20190412 2019-04-13 16:23:16 -07:00
sfi
sh
siox
slimbus
sn
soc This pull request brings in a build fix for arm64 with bcm2835 2019-03-18 10:31:24 -07:00
soundwire
spi pci-v5.1-changes 2019-03-09 14:57:08 -08:00
spmi spmi: pmic-arb: select IRQ_DOMAIN_HIERARCHY in Kconfig 2019-02-14 09:14:50 +01:00
ssb
staging staging: vt6655: Remove vif check from vnt_interrupt 2019-03-29 17:25:45 +01:00
target SCSI misc on 20190315 2019-03-16 12:51:50 -07:00
tc
tee ARM: SoC driver updates for 5.1 2019-03-06 09:41:12 -08:00
thermal Merge branches 'fixes' and 'thermal-intel' into next 2019-03-18 22:37:44 +08:00
thunderbolt
tty tty: fix NULL pointer issue when tty_port ops is not set 2019-03-28 01:21:21 +09:00
uio
usb USB-serial fixes for 5.1-rc3 2019-03-29 15:31:16 +01:00
uwb
vfio vfio/type1: Limit DMA mappings per container 2019-04-03 12:43:05 -06:00
vhost virtio: fixes, cleanups 2019-03-10 12:47:57 -07:00
video fbdev changes for v5.1: 2019-03-15 14:22:59 -07:00
virt virt: vbox: Implement passing requestor info to the host for VirtualBox 6.0.x 2019-03-28 01:55:18 +09:00
virtio virtio: Honour 'may_reduce_num' in vring_create_virtqueue 2019-04-08 17:05:52 -04:00
visorbus
vlynq
vme
w1
watchdog linux-watchdog 5.1-rc1 tag 2019-03-11 11:22:15 -07:00
xen xen: fixes for 5.1-rc4 2019-04-07 06:12:10 -10:00
zorro
Kconfig
Makefile IOMMU Updates for Linux v5.1 2019-03-10 12:29:52 -07:00