Pull the trivial tree from Jiri Kosina:
"Tiny usual fixes all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
doc: fix old config name of kprobetrace
fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc
btrfs: fix the commment for the action flags in delayed-ref.h
btrfs: fix trivial typo for the comment of BTRFS_FREE_INO_OBJECTID
vfs: fix kerneldoc for generic_fh_to_parent()
treewide: fix comment/printk/variable typos
ipr: fix small coding style issues
doc: fix broken utf8 encoding
nfs: comment fix
platform/x86: fix asus_laptop.wled_type module parameter
mfd: printk/comment fixes
doc: getdelays.c: remember to close() socket on error in create_nl_socket()
doc: aliasing-test: close fd on write error
mmc: fix comment typos
dma: fix comments
spi: fix comment/printk typos in spi
Coccinelle: fix typo in memdup_user.cocci
tmiofb: missing NULL pointer checks
tools: perf: Fix typo in tools/perf
tools/testing: fix comment / output typos
...
- Support for putting regulators into bypass mode where they simply
switch their input to the output (mainly used for low power retention).
- A new API for setting voltages based on a voltage plus tolerance
rather than an explicit voltage range.
- Lots of cleanups and API updates from Axel Lin.
- New driver for MAX8907.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQZanxAAoJEOoSHmUN5Tg4X+0P/AvZNTJMNya8qapG1uOjdgML
w+gQTGw6g+7libJ8m3+M8YfoR0KRF3tGihGU9yZt8M7VkvZht/Q88eDOM0xw5FQo
XfpQjOb7+jc6CcVF8qGWz3DkXteqThRzwcqLR47/pv6EiNQB/2F/SWOp/btf/9hR
7YLFlKpv2LV4DwDnONGfJibe5Rrt0mj6dsy8ndmj/RMs7AVxN3H9V/L9q/kGNYIL
nnyRJIWTZ2R7L04shZBLaInnvARPbnrGiSUymZMwYO+5Op6qyuSWDf5IlfHZqsAc
pOWksUSHxvxAmh2MmH9cV4OT31e04qhAHxwxOzH4vpKMHdN1X92X56NwiF5K15/s
HJ86+TVU3JRnsHAPxVVWuplL4cY1UbaueMTt3FpA1xF9Y6W7DvIK/Y0S5FXHmYzN
RClrlvtlIU8m71QaroQmVaprLz0odrC0gyIO9TdW4KP+fkomKKNQLiSr7Wkv1Deg
lnjwlXSForW3Rqf9GLx3SJI29nHuy8F4RG55KvaxoXKm2TZdgtDG5x3ol1RW0/bd
OxF7SC/gEgnU0S4oOOdmrLOuUME2LY3wzi9TZT31kqieCQ3L257LHRtjBS8oZJpk
jnU5Fk+q+7nIxdxawgv11BS/z28T3ucRq1gkIC2Bxv138cF5UDsNNw6uKqjUcK7x
AmvUqMPwVCk8SIEAs2Oh
=Kg66
-----END PGP SIGNATURE-----
Merge tag 'regulator-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
- Support for putting regulators into bypass mode where they simply
switch their input to the output (mainly used for low power
retention).
- A new API for setting voltages based on a voltage plus tolerance
rather than an explicit voltage range.
- Lots of cleanups and API updates from Axel Lin.
- New driver for MAX8907.
* tag 'regulator-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (71 commits)
regulator: arizona-ldo: Remove top voltage
regulator: tps6586x: remove regulator-compatible from DT docs
regulator: tps65217.txt: remove regulator-compatible from DT docs
regulator: deprecate regulator-compatible DT property
regulator: fan53555: remove vsel_max not used
regulator: aat2870: Don't explicitly initialise the first field
extcon: arizona: Use bypass mode for MICVDD
regulator: wm831x-ldo: Add bypass support
regulator: arizona-micsupp: Support get/set bypass
regulator: arizona-ldo: Support get/set bypass
regulator: core: Provide regmap get/set bypass operations
regulator: core: Support bypass mode
regulator: Fairchild fan53555 support
regulator: twl: Remove another unused variable warning
regulator: core: Try using the parent device for the default regmap
regulator: core: Fast path non-deferred disables
regulator: core: Report microvolts in sysfs even with only list_voltage()
regulator: tps6586x: add support for SYS rail
regulator: lp872x: remove unnecessary function
regulator: lp872x: fix NULL pointer access problem
...
A quiet release for the regmap core, essentially all the activity is in
the shared interrupt controller which is being more and more widely
used and has been enhanced to support a wider range of masking types and
wake handling methods, plus integration with runtime PM for devices
making aggressive use of that.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQXeGvAAoJEOoSHmUN5Tg4kT8P/A9vmvBUdqGDrjpZh54c5r4u
lhefKwhvJLKvrakZyrlc1vcRs21Zf8w92bRQNVkkBuWaCSCcfwUgMTDvzm8un7sa
KJLohCFMAnaao8StmM14T/p8qoMlsrCD1BfngWRke5jhdj+i9SAgrOep+3vmACRt
A44G53Sj2ff0LCcJXLSrsHhexTERqg2/We2haDlVDsmwRXOmfpxPGfM7TO/0mBox
4XeogIIPsfg9PizR3kuAawSA4tIOu0NaKOYnj/o9HohJwIBEZ5ua1WQg+zXj72F/
3TbDiYqhB4qkDPErp3MRd9JDp2Au88763LmR4VByFoHnjUu7OV/eITFGE7U0DIbg
gCcmdIBKJN4r06x2wX/dIF202sCiQTvenYFowmHcaL/imXGesPFH3zZjDNVVwDbb
I7JgzFxyTuSxY2ZfMGL0UHQflucaTGNbdvm6bWrx1l+ZmUg01q4ugi2cSJ/wkzUu
Gi5UdmhFw9y2hhYgkxR35Ke6sBsiMFBh92K35x5Vp+R0nVINGOJeMBxNRdst2s1i
4PI5nWkS0tNDkRI5LXl1mRbO+aGyCyHK22qsRfEFifHTXvbyFmSy64xXPh6AWkk/
fWuJjgXa3U5aV0LL/VGJCtcrjLRQnKZDwYsIfC49h0yFNvNTy+AYUvouc8TtTUUj
p9j1bFbR79h7HeUeLLns
=l7V7
-----END PGP SIGNATURE-----
Merge tag 'regmap-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"A quiet release for the regmap core, essentially all the activity is
in the shared interrupt controller which is being more and more widely
used and has been enhanced to support a wider range of masking types
and wake handling methods, plus integration with runtime PM for
devices making aggressive use of that."
* tag 'regmap-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: no need primary handler for nested irq
regmap: irq: Add mask invert flag for enable register
mfd: wm8994: Flag the interrupt block as requiring runtime PM be enabled
regmap: irq: Enable devices for runtime PM while handling interrupts
regmap: irq: initialize all irqs to wake disabled
regmap: set MASK_ON_SUSPEND/SKIP_SET_WAKE if no wake_base
regmap: name irq_chip based on regmap_irq_chip's name
regmap: store irq_chip inside regmap_irq_chip_data
regmap: irq: Only update mask bits when doing initial mask
regmap: fix some error messages to take account of irq_reg_stride
regmap: Don't lock in regmap_reinit_cache()
Pull GFS2 updates from Steven Whitehouse:
"The major feature this time is the "rbm" conversion in the resource
group code. The new struct gfs2_rbm specifies the location of an
allocatable block in (resource group, bitmap, offset) form. There are
a number of added helper functions, and later patches then rewrite
some of the resource group code in terms of this new structure. Not
only does this give us a nice code clean up, but it also removes some
of the previous restrictions where extents could not cross bitmap
boundaries, for example.
In addition to that, there are a few bug fixes and clean ups, but the
rbm work is by far the majority of this patch set in terms of number
of changed lines."
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (27 commits)
GFS2: Write out dirty inode metadata in delayed deletes
GFS2: fix s_writers.counter imbalance in gfs2_ail_empty_gl
GFS2: Fix infinite loop in rbm_find
GFS2: Consolidate free block searching functions
GFS2: Get rid of I_MUTEX_QUOTA usage
GFS2: Stop block extents at the end of bitmaps
GFS2: Fix unclaimed_blocks() wrapping bug and clean up
GFS2: Improve block reservation tracing
GFS2: Fall back to ignoring reservations, if there are no other blocks left
GFS2: Fix ->show_options() for statfs slow
GFS2: Use rbm for gfs2_setbit()
GFS2: Use rbm for gfs2_testbit()
GFS2: Eliminate unnecessary check for state > 3 in bitfit
GFS2: Eliminate redundant calls to may_grant
GFS2: Combine functions gfs2_glock_dq_wait and wait_on_demote
GFS2: Combine functions gfs2_glock_wait and wait_on_holder
GFS2: inline __gfs2_glock_schedule_for_reclaim
GFS2: change function gfs2_direct_IO to use a normal gfs2_glock_dq
GFS2: rbm code cleanup
GFS2: Fix case where reservation finished at end of rgrp
...
The current SPI driver has many issues. Examples are:
* Segfaulting on most transfers due to expecting all transfers to have
both RX and TX buffers.
* Hanging on TX transfers since the whole driver flow is driven by RX
DMA completion, but the HW is only told to enable RX for RX transfers.
* Use of clk_disable_unprepare() from atomic context.
* Once those and other minor issues are fixed, the driver still doesn't
actually work.
* The driver also implements a deprecated API to the SPI core.
For this reason, simply remove the driver completely. This has two
advantages:
1) This will remove the last use of Tegra's <mach/dma.h>, which will
allow that file to be removed, which is required for single zImage
work.
2) The downstream driver is significaly different from the current
code. I believe a patch to re-add the downstream driver (with
appropriate cleanup) will be much simpler to review if it's a new
file rather than randomly interspered with essentially unrelated
existing code.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
In the current driver, the SENSE_PORT firmware command is issued as a
"wrapped" command, but the command handling code doesn't have a
wrapper, so it will never do anything other than log an error message.
The latest ConnectX-3 2.11.500 firmware reports the SENSE_PORT
capability even in multi-function (SR-IOV) mode, so the driver will
try to issue the command.
At least until the driver has a proper wrapper for SENSE_PORT, make
sure we disable the command for multi-function devices.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Instead of having a hard-coded "PCI device ID != 0x1003" (which
obviously breaks as newer devices with ID != 0x1003 become available),
instead let's set a flag in our PCI device table for the older devices
where we're supposed to force using SENSE_PORT. This also avoids
enabling SENSE_PORT for virtual functions by mistake.
Signed-off-by: Roland Dreier <roland@purestorage.com>
That way we can check flags later on, when we've finished with the
pci_device_id structure. Also convert the "is VF" flag to an enum:
"Never do in the preprocessor what can be done in C."
Signed-off-by: Roland Dreier <roland@purestorage.com>
When a device is unplugged, wait for all processes that have opened the device
to close before deallocating the device.
Signed-off-by: Ratan Nalumasu <ratan@google.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Matthieu CASTET adjusted picolcd_debug_out_report() to only operate when
there is an active listener on debugfs for events.
His change got lost while splitting hid_picolcd.c, restore it.
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Fix the touch-up no response problem on GeneralTouch twofingers touchscreen and
modify the driver for new GeneralTouch PWT touchscreen.
Signed-off-by: Xianhan Yu <aroundight77@gmail.com>
Reviewed-by Benjamin Tissoires <benjamin.tissoires@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The Sony PS3 Blue-ray Disc Remote Control used to be supported by the
BlueZ project's user space, but the code that handled it was recently
removed as its functionality conflicted with a real HSP implementation
and the mapping was thought to be better handled in the kernel. This is
a port of the mapping logic from the fakehid driver by Marcel Holtmann
to the in-kernel HID layer.
We also add support for the Logitech Harmony Adapter for PS3, which
emulates the BD Remote.
Signed-off-by: David Dillow <dave@thedillows.org>
Signed-off-by: Antonio Ospite <ospite@studenti.unina.it>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The dev_rdesc member of the hid_device structure is meant to store the original
report descriptor received from the device, but it is currently passed to any
report_fixup method before it is copied to the rdesc member. This patch uses a
temporary buffer to shield dev_rdesc from the side effects of many HID drivers'
report_fixup implementations.
usbhid's hid_post_reset checks the report descriptor currently returned by the
device against a descriptor that may have been modified by a driver's
report_fixup method. That leaves some devices nonfunctional after a resume, with
a "reset_resume error 1" reported. This patch checks the new descriptor against
the unmodified dev_rdesc instead and uses the original, instead of modified,
report size.
BugLink: http://bugs.launchpad.net/bugs/1049623
Signed-off-by: Kevin Daughtridge <kevin@kdau.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The BCM2835 GPIO module is a combined GPIO controller, (GPIO) interrupt
controller, and pinmux/control device.
Original driver by Simon Arlott.
Rewrite including GPIO chip device by Chris Boot.
Upstreaming changes by Stephen Warren:
* Wrote DT binding documentation.
* Changed brcm,function to an integer to more directly match the
datasheet, and to match brcm,pins being an integer.
* Implemented pull-up/down pin config.
* Removed read-only DT property and related code. The restriction this
implemented are driven by the board, not the GPIO HW block, so don't
really make sense of a HW block binding, were in general incomplete
(since they could only know about the few pins hard-coded into the
Raspberry Pi B board design and not the uncommitted GPIOS), and are
better represented simply by not writing incorrect data into pin
configuration nodes.
* Don't set GPIO_IN function select in gpio_request_enable() to avoid
glitches; defer this to gpio_set_direction(). Consequently, removed
empty bcm2835_pmx_gpio_request_enable().
* Simplified enabled_irq_map[]; make it explicitly 1 entry per bank.
* Lifted use of enabled_irq_map[] outside the per-interrupt loop in
IRQ handler, thus fixing an issue where the code was indexing into
enabled_irq_map[] by intra-bank GPIO ID, not global GPIO ID.
* Removed locking in IRQ handler, since all other code uses
spin_lock_irqsave() and so guarantees it doesn't run concurrently
with the handler.
* Moved duplicated BUILD_BUG_ON()s into probe(). Also check size of
bcm2835_gpio_pins[].
* Remove range-checking from bcm2835_pctl_get_groups_count() since we've
decided to trust the pinctrl core.
* Made bcm2835_pmx_gpio_disable_free() call bcm2835_pinctrl_fsel_set()
directly for simplicity.
* Fixed body of dt_free_map() to match latest dt_node_to_map().
* Removed GPIO ownership check from bcm2835_pmx_enable() since the pinctrl
core owns doing this.
* Made irq_chip and pinctrl_gpio_range .name == MODULE_NAME so it's more
descriptive.
* Simplified remove(); removed call to non-existent
pinctrl_remove_gpio_range(), remove early return on error.
* Don't force gpiochip's base to 0. Set gpio_range.base to gpiochip's
base GPIO number.
* Error-handling cleanups in probe().
* Switched to module_platform_driver() rather than open-coding.
* Made pin, group, and function names lower-case.
* s/broadcom/brcm/ in DT property names.
* s/2708/2835/.
* Fixed a couple minor checkpatch warnings, and other minor cleanup.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: Chris Boot <bootc@bootc.net>
Signed-off-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This pure documentation fix tries to align the "idle" and
"sleep" pin states to the idle and suspend states from
runtime PM.
Cc: Patrice Chotard <patrice.chotard@st.com>
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
We need to call scsi_done() for commands after we abort them.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
srp_free_req() uses the scsi_cmnd structure contents to unmap
buffers, so we must invoke srp_free_req() before we release
ownership of that structure.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Removing old variables caused a compile error from nes_debug().
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Hardware resource types are added and when a resource isn't available,
its type is printed.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When TX checksum offload is disabled for an iWarp connection,
skb->ip_summed needs to be set to CHECKSUM_NONE.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- Remove unnecessary statement "if (1)"
- Refactor a statement (wqe_misc |= NES_NIC_SQ_WQE_COMPLETION) out of
if/else statement, because it is independant of the flow.
- Define netdev->features in one line for clarity.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In TSO handling code, skb_shared_info() is used to get the MSS
instead of the bool function skb_is_gso() (which always returns 1).
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
On an SR-IOV master device, __mlx4_init_one() calls mlx4_init_hca()
before mlx4_multi_func_init(). However, for unlucky configurations,
mlx4_init_hca() might call mlx4_SENSE_PORT() (via mlx4_dev_cap()), and
that calls mlx4_cmd_imm() with MLX4_CMD_WRAPPED set.
However, on a multifunction device with MLX4_CMD_WRAPPED, __mlx4_cmd()
calls into mlx4_slave_cmd(), and that immediately tries to do
down(&priv->cmd.slave_sem);
but priv->cmd.slave_sem isn't initialized until mlx4_multi_func_init()
(which we haven't called yet). The next thing it tries to do is access
priv->mfunc.vhcr, but that hasn't been allocated yet.
Fix this by moving the initialization of slave_sem and vhcr up into
mlx4_cmd_init(). Also, since slave_sem is really just being used as a
mutex, convert it into a slave_cmd_mutex.
Signed-off-by: Roland Dreier <roland@purestorage.com>
When we have VFs and PFs on same host, the VFs are activated within
the mlx4_core module before the mlx4_ib kernel module is loaded.
When the mlx4_ib module initializes the PF (master), it now creates
MAD paravirtualization contexts for any VFs that already active.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Previously, the structure of a guest's proxy QPs followed the
structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
qp1 port 2, ...). The guest then did offset calculations on the
sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().
This is now changed so that the guest does no offset calculations
regarding proxy or tunnel QPs to use. This change frees the PPF from
needing to adhere to a specific order in allocating proxy and tunnel
QPs.
Now QUERY_FUNC_CAP provides each port individually with its proxy
qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
used directly where required (with no offset calculations).
To accomplish this change, several fields were added to the phys_caps
structure for use by the PPF and by non-SR-IOV mode:
base_sqpn -- in non-sriov mode, this was formerly sqp_start.
base_proxy_sqpn -- the first physical proxy qp number -- used by PPF
base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF.
The current code in the PPF still adheres to the previous layout of
sqps, proxy-sqps and tunnel-sqps. However, the PPF can change this
layout without affecting VF or (paravirtualized) PF code.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This is necessary in order to support > 1 VF/PF in a VM for software
that uses the node guid as a discriminator, such as librdmacm.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Remove the error returns for IB ports from mlx4_ib_add,
mlx4_INIT_PORT_wrapper, and mlx4_CLOSE_PORT_wrapper.
Currently, SRIOV is supported only for devices for which the
link layer is IB on all ports; RoCE support will be added later.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
1. Allow only master to change node description.
2. Prevent AH leakage in send mads.
3. Take device part number from PCI structure, so that guests see the
VF part number (and not the PF part number).
4. Place the device revision ID into caps structure at startup.
5. SET_PORT in update_gids_task needs to go through wrapper on master.
6. In mlx4_ib_event(), PORT_MGMT_EVENT needs be handled in a work
queue on the master, since it propagates events to slaves using
GEN_EQE.
7. Do not support FMR on slaves.
8. Add spinlock to slave_event(), since it is called both in interrupt
context and in process context (due to 6 above, and also if
smp_snoop is used). This fix was found and implemented by Saeed
Mahameed <saeedm@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Normally, INIT_PORT and CLOSE_PORT are invoked when special QP0
transitions to RTR, or transitions to ERR/RESET respectively.
In SR-IOV mode, however, the master is also paravirtualized. This in
turn requires that we not do INIT_PORT until the entire QP0 path (real
QP0 and proxy QP0) is ready to receive. When the real QP0 goes down,
we should indicate that the port is not active.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
1. Slaves may not set the IS_SM capability for the port.
2. DEV_MGMT may not be set in multifunction mode.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This directory is added only for the master -- slaves do not have it.
The sysfs iov directory is used to manage and examine the port P_Key
and guid paravirtualization.
Under iov/ports, the administrator may examine the gid and P_Key tables
as they are present in the device (and as are seen in the "network
view" presented to the SM).
Under the iov/<pci slot number> directories, the admin may map the
index numbers in the physical tables (as under iov/ports) to the
paravirtualized index numbers that guests see.
For example, if the administrator, for port 1 on guest 2 maps physical
pkey index 10 to virtual index 1, then that guest, whenever it uses
its pkey index 1, will actually be using the real pkey index 10.
Based on patch from Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
P_Key change and guid change events are not of interest to all slaves,
but only to those slaves which "see" the table slots whose contents
have change.
For example, if the guid at port 1, index 5 has changed in the PPF, we
wish to propagate the gid-change event only to the function which has
that guid index mapped to its port/guid table (in this case it is
slave #5). Other functions should not get the event, since the event
does not affect them.
Similarly with P_Keys -- P_Key change events are forwarded only to
slaves which have that P_Key index mapped to their virtual P_Key table.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
For IB ports, we paravirtualize the GUID at index 0 on slaves. The
GUID at index 0 seen by a slave is the actual GUID occupying the GUID
table at the slave-id index.
The driver, by default, requests at startup time that subnet manager
populate its entire guid table with GUIDs. These guids are then mapped
(paravirtualized) to the slaves, and appear for each slave as its GUID
at index 0.
Until each slave has such a guid, its port status is DOWN.
The guid table is cached to support special QP paravirtualization, and
event propagation to slaves on guid change (we test to see if the guid
really changed before propagating an event to the slave).
To support this caching, add capability to __mlx4_ib_query_gid() to
obtain the network view (i.e., physical view) gid at index X, not just
the host (paravirtualized) view.
Based on a patch from Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
For an IB port, a slave should not show port active until that slave
has a valid alias-guid (provided by the subnet manager). Therefore
the port-up event should be passed to a slave only after both the port
is up, and the slave's alias-guid has been set.
Also, provide the infrastructure for propagating port-management
events (client-reregister, etc) to slaves.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In CM para-virtualization:
1. Incoming requests are steered to the correct vHCA according to the
embedded GID.
2. Communication IDs on outgoing requests are replaced by a globally
unique ID, generated by the PPF, since there is no synchronization
of ID generation between guests (and so these IDs are not
guaranteed to be globally unique). The guest's comm ID is stored,
and is returned to the response MAD when it arrives.
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
MCG paravirtualization support includes:
- Creating multicast groups by VFs, and keeping accounting of them
- Leaving multicast groups by VFs
- Updating SM only with real changes in the overall picture of MCGs status
- Creation of MGID=0 groups (let SM choose MGID)
Note that the MCG module maintains its own internal MCG object
reference counts. The reason for this is that the IB core is used to
track only the multicast groups joins generated by the PF it runs
over. The PF IB core layer is unaware of slaves, so it cannot be used
to keep track of MCG joins they generate.
Signed-off-by: Oren Duer <oren@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The MAD_IFC firmware command fulfills two functions.
First, it is used in the QP0/QP1 MAD-handling flow to obtain
information from the FW (for answering queries), and for setting
variables in the HCA (MAD SET packets).
For this, MAD_IFC should provide the FW (physical) view of the data.
This is the view that OpenSM needs. We call this the "network view".
In the second case, MAD_IFC is used by various verbs to obtain data
regarding the local HCA (e.g., ib_query_device()). We call this the
"host view".
This data needs to be paravirtualized.
MAD_IFC therefore needs a wrapper function, and also needs another
flag indicating whether it should provide the network view (when it is
called by ib_process_mad in special-qp packet handling), or the host
view (when it is called while implementing a verb).
There are currently 2 flag parameters in mlx4_MAD_IFC already:
ignore_bkey and ignore_mkey. These two parameters are replaced by a
single "mad_ifc_flags" parameter, with different bits set for each
flag. A third flag is added: "network-view/host-view".
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Special QPs are paravirtualized.
vHCAs are not given direct access to QP0/1. Rather, these QPs are
operated by a special context hosted by the PF, which mediates access
to/from vHCAs. This is done by opening a "tunnel" per vHCA port per
QP0/1. A tunnel comprises a pair of UD QPs: a "Tunnel QP" in the
PF-context and a "Proxy QP" in the vHCA. All vHCA MAD traffic must
pass through the corresponding tunnel. vHCA QPs cannot be assigned to
VL15 and are denied of the well-known QKey.
Outgoing messages are "de-multiplexed" (i.e., directed to the wire via
the real special QP).
Incoming messages are "multiplexed" (i.e. steered by the PPF to the
correct VF or to the PF)
QP0 access is restricted to the PF vHCA. VF vHCAs also have (virtual)
QP0s, but they never receive any SMPs and all SMPs sent are discarded.
QP1 traffic is allowed for all vHCAs, but special care is required to
bridge the gap between the host and network views.
Specifically:
- Transaction IDs are mapped to guarantee uniqueness among vHCAs
- CM para-virtualization
o Incoming requests are steered to the correct vHCA according to the embedded GID
o Local communication IDs are mapped to ensure uniqueness among vHCAs
(see the patch that adds CM paravirtualization.)
- Multicast para-virtualization
o The PF context aggregates membership state from all vHCAs
o The SA is contacted only when the aggregate membership changes
o If the aggregate does not change, the PF context will provide the
requesting vHCA with the proper response.
(see the patch that adds multicast group paravirtualization)
Incoming MADs are steered according to:
- the DGID If a GRH is present
- the mapped transaction ID for response MADs
- the embedded GID in CM requests
- the remote communication ID in other CM messages
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This requires:
1. Replacing the paravirtualized P_Key index (inserted by the guest)
with the real P_Key index.
2. For UD QPs, placing the guest's true source GID index in the
address path structure mgid field, and setting the ud_force_mgid
bit so that the mgid is taken from the QP context and not from the
WQE when posting sends.
3. For UC and RC QPs, placing the guest's true source GID index in the
address path structure mgid field.
4. For tunnel and proxy QPs, setting the Q_Key value reserved for that
proxy/tunnel pair.
Since not all the above adjustments occur in all the QP transitions,
the QP transitions require separate wrapper functions.
Secondly, initialize the P_Key virtualization table to its default
values: Master virtualized table is 1-1 with the real P_Key table,
guest virtualized table has P_Key index 0 mapped to the real P_Key
index 0, and all the other P_Key indices mapped to the reserved
(invalid) P_Key at index 127.
Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache.
and generating events on the master only if a P_Key actually changed.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Allocate SR-IOV paravirtualization resources and MAD demuxing contexts
on the master.
This has two parts. The first part is to initialize the structures to
contain the contexts. This is done at master startup time in
mlx4_ib_init_sriov().
The second part is to actually create the tunneling resources required
on the master to support a slave. This is performed the master
detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event
generated when a slave initializes its comm channel).
For the master, there is no such startup event, so it creates its own
tunneling resources when it starts up. In addition, the master also
creates the real special QPs. The ib_core layer on the master causes
creation of proxy special QPs, since the master is also
paravirtualized at the ib_core layer.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In addition, pass the proxy and tunnel QP numbers to slaves so the
driver can perform special QP paravirtualization.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
1. Introduce the basic SR-IOV parvirtualization context objects for
multiplexing and demultiplexing MADs.
2. Introduce support for the new proxy and tunnel QP types.
This patch introduces the objects required by the master for managing
QP paravirtualization for guests.
struct mlx4_ib_sriov is created by the master only.
It is a container for the following:
1. All the info required by the PPF to multiplex and de-multiplex MADs
(including those from the PF). (struct mlx4_ib_demux_ctx demux)
2. All the info required to manage alias GUIDs (i.e., the GUID at
index 0 that each guest perceives. In fact, this is not the GUID
which is actually at index 0, but is, in fact, the GUID which is at
index[<VF number>] in the physical table.
3. structures which are used to manage CM paravirtualization
4. structures for managing the real special QPs when running in SR-IOV
mode. The real SQPs are controlled by the PPF in this case. All
SQPs created and controlled by the ib core layer are proxy SQP.
struct mlx4_ib_demux_ctx contains the information per port needed
to manage paravirtualization:
1. All multicast paravirt info
2. All tunnel-qp paravirt info for the port.
3. GUID-table and GUID-prefix for the port
4. work queues.
struct mlx4_ib_demux_pv_ctx contains all the info for managing the
paravirtualized QPs for one slave/port.
struct mlx4_ib_demux_pv_qp contains the info need to run an individual
QP (either tunnel qp or real SQP).
Note: We made use of the 2 most significant bits in enum
mlx4_ib_qp_flags (based on enum ib_qp_create_flags in ib_verbs.h).
We need these bits in the low-level driver for internal purposes.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>