Commit Graph

781771 Commits

Author SHA1 Message Date
Darrick J. Wong
9e037cb797 xfs: check for unknown v5 feature bits in superblock write verifier
Make sure we never try to write the superblock with unknown feature bits
set.  We checked those at mount time, so if they're set now then memory
is corrupt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-07-31 13:18:09 -07:00
Darrick J. Wong
69775fd15d xfs: verify icount in superblock write
Add a helper predicate to check the inode count for sanity, then use it
in the superblock write verifier to inspect sb_icount.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-07-31 13:18:09 -07:00
Bill O'Donnell
8756a5af18 libxfs: add more bounds checking to sb sanity checks
Current sb verifier doesn't check bounds on sb_fdblocks and sb_ifree.
Add sanity checks for these parameters.

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
[darrick: port to refactored sb validation predicates]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-07-31 13:18:09 -07:00
Darrick J. Wong
eca383fcd6 xfs: refactor superblock verifiers
Split the superblock verifier into the common checks, the read-time
checks, and the write-time check functions.  No functional changes, but
we're setting up to add more write-only checks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-07-31 13:18:09 -07:00
Darrick J. Wong
86d969b425 xfs: refactor the xrep_extent_list into xfs_bitmap
As mentioned previously, the xrep_extent_list basically implements a
bitmap with two functions: set and disjoint union.  Rename all these
functions to xfs_bitmap to shorten the name and make it more obvious
what we're doing.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-07-31 13:18:08 -07:00
Linus Torvalds
37b71411b7 Merge tag 'audit-pr-20180731' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
 "A single small audit fix to guard against memory allocation failures
  when logging information about a kernel module load.

  It's small, easy to understand, and self-contained; while nothing is
  zero risk, this should be pretty low"

* tag 'audit-pr-20180731' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: fix potential null dereference 'context->module.name'
2018-07-31 13:17:46 -07:00
Arthur Fabre
fbeb1603bf bpf: verifier: MOV64 don't mark dst reg unbounded
When check_alu_op() handles a BPF_MOV64 between two registers,
it calls check_reg_arg(DST_OP) on the dst register, marking it
as unbounded. If the src and dst register are the same, this
marks the src as unbounded, which can lead to unexpected errors
for further checks that rely on bounds info. For example:

	BPF_MOV64_IMM(BPF_REG_2, 0),
	BPF_MOV64_REG(BPF_REG_2, BPF_REG_2),
	BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_2),
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_EXIT_INSN(),

Results in:

	"math between ctx pointer and register with unbounded
	min value is not allowed"

check_alu_op() now uses check_reg_arg(DST_OP_NO_MARK), and MOVs
that need to mark the dst register (MOVIMM, MOV32) do so.

Added a test case for MOV64 dst == src, and dst != src.

Signed-off-by: Arthur Fabre <afabre@cloudflare.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-31 22:09:33 +02:00
Anna-Maria Gleixner
80d20d35af nohz: Fix local_timer_softirq_pending()
local_timer_softirq_pending() checks whether the timer softirq is
pending with: local_softirq_pending() & TIMER_SOFTIRQ.

This is wrong because TIMER_SOFTIRQ is the softirq number and not a
bitmask. So the test checks for the wrong bit.

Use BIT(TIMER_SOFTIRQ) instead.

Fixes: 5d62c183f9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()")
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Cc: bigeasy@linutronix.de
Cc: peterz@infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de
2018-07-31 22:08:44 +02:00
Feras Daoud
8e1d162d8e net/mlx5e: IPoIB, Set the netdevice sw mtu in ipoib enhanced flow
After introduction of the cited commit, mlx5e_build_nic_params
receives the netdevice mtu in order to set the sw_mtu of mlx5e_params.
For enhanced IPoIB, the netdevice mtu is not set in this stage,
therefore, the initial sw_mtu equals zero. As a result, the hw_mtu
of the receive queue will be calculated incorrectly causing traffic
issues.

To fix this issue, query for port mtu before building the nic params.

Fixes: 472a1e44b3 ("net/mlx5e: Save MTU in channels params")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-31 12:58:45 -07:00
Adi Nissim
eacecf2760 net/mlx5e: Fix null pointer access when setting MTU of vport representor
MTU helper function is used by both conventional mlx5e
instances (PF/VF) and the eswitch representors. The representor
shouldn't change the nic vport context MTU, the VF is responsible for
that. Therefore set_mtu_cb has a null value when changing the
representor MTU.

Fixes: 250a42b6a7 ("net/mlx5e: Support configurable MTU for vport representors")
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-31 12:58:44 -07:00
Or Gerlitz
2e8e70d249 net/mlx5e: Set port trust mode to PCP as default
The hairpin offload code has dependency on the trust mode being PCP.

Hence we should set PCP as the default for handling cases where we are
disallowed to read the trust mode from the FW, or failed to initialize it.

Fixes: 106be53b6b ('net/mlx5e: Set per priority hairpin pairs')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-31 12:58:43 -07:00
Eli Cohen
5f5991f36d net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager
Execute mlx5_eswitch_init() only if we have MLX5_ESWITCH_MANAGER
capabilities.
Do the same for mlx5_eswitch_cleanup().

Fixes: a9f7705ffd ("net/mlx5: Unify vport manager capability check")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-31 12:58:43 -07:00
Christoph Hellwig
e6476c2144 net: remove bogus RCU annotations on socket.wq
We never use RCU protection for it, just a lot of cargo-cult
rcu_deference_protects calls.

Note that we do keep the kfree_rcu call for it, as the references through
struct sock are RCU protected and thus might require a grace period before
freeing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31 12:40:22 -07:00
Karthikeyan Ramasubramanian
37692de5d5 i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller
This bus driver supports the GENI based i2c hardware controller in the
Qualcomm SOCs. The Qualcomm Generic Interface (GENI) is a programmable
module supporting a wide range of serial interfaces including I2C. The
driver supports FIFO mode and DMA mode of transfer and switches modes
dynamically depending on the size of the transfer.

Signed-off-by: Karthikeyan Ramasubramanian <kramasub@codeaurora.org>
Signed-off-by: Sagar Dharia <sdharia@codeaurora.org>
Signed-off-by: Girish Mahadevan <girishm@codeaurora.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Stephen Boyd <swboyd@chromium.org>
[wsa: squashed the MAINTAINER addition and a RPM fix by Evan Green]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-07-31 21:17:20 +02:00
Lukas Wunner
4e6a13356f PCI: pciehp: Deduplicate presence check on probe & resume
On driver probe and on resume from system sleep, pciehp checks the
Presence Detect State bit in the Slot Status register to bring up an
occupied slot or bring down an unoccupied slot.  Both code paths are
identical, so deduplicate them per Mika's request.

On probe, an additional check is performed to disable power of an
unoccupied slot.  This can e.g. happen if power was enabled by BIOS.
It cannot happen once pciehp has taken control, hence is not necessary
on resume:  The Slot Control register is set to the same value that it
had on suspend by pci_restore_state(), so if the slot was occupied,
power is enabled and if it wasn't, power is disabled.  Should occupancy
have changed during the system sleep transition, power is adjusted by
bringing up or down the slot per the paragraph above.

To allow for deduplication of the presence check, move the power check
to pcie_init().  This seems safer anyway, because right now it is
performed while interrupts are already enabled, and although I can't
think of a scenario where pciehp_power_off_slot() and the IRQ thread
collide, it does feel brittle.

However this means that pcie_init() may now write to the Slot Control
register before the IRQ is requested.  If both the CCIE and HPIE bits
happen to be set, pcie_wait_cmd() will wait for an interrupt (instead
of polling the Command Completed bit) and eventually emit a timeout
message.  Additionally, if a level-triggered INTx interrupt is used,
the user may see a spurious interrupt splat.  Avoid by disabling
interrupts before disabling power.  (Normally the HPIE and CCIE bits
should be clear on probe, but conceivably they may already have been
set e.g. by BIOS.)

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-31 13:27:24 -05:00
Lukas Wunner
8bb46b079d PCI: pciehp: Avoid implicit fallthroughs in switch statements
Per Mika's request, add an explicit break to the last case of switch
statements everywhere in pciehp to be more defensive towards future
amendments.

Per Gustavo's request, mark all non-empty implicit fallthroughs with a
comment to silence warnings triggered by -Wimplicit-fallthrough=2.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2018-07-31 13:26:33 -05:00
Andrew Lunn
6751e7c66c net: dsa: mv88e6xxx: Fix SERDES support on 88E6141/6341
Version 1 of the patch adding SERDES support to the 88E6141/6341
correctly added the ops to the 88E6141/6341. However, by the time
version 3 was committed, the ops had moved to the 88E6085/6175. Put
them back where they belong.

Fixes: 5bafeb6e7e ("net: dsa: mv88e6xxx: 88E6141/6341 SERDES support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31 10:36:59 -07:00
Alexandre Belloni
84a7f564fa mips: dts: mscc: enable spi and NOR flash support on ocelot PCB123
Ocelot PCB123 has a SPI NOR connected on its SPI bus.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20103/
Cc: Mark Brown <broonie@kernel.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-spi@vger.kernel.org
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Allan Nielsen <allan.nielsen@microsemi.com>
2018-07-31 10:34:34 -07:00
Alexandre Belloni
9eaf3ba5e0 mips: dts: mscc: Add spi on Ocelot
Add support for the SPI controller

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20101/
Cc: Mark Brown <broonie@kernel.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-spi@vger.kernel.org
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Allan Nielsen <allan.nielsen@microsemi.com>
2018-07-31 10:34:08 -07:00
David S. Miller
9249069590 Merge tag 'wireless-drivers-for-davem-2018-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:

====================
wireless-drivers fixes for 4.18

Last set of fixes before 4.18 is released

iwlwifi

* add new IDs for cards already available on the market

brcmfmac

* fix a regression introduced in v4.17
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31 10:33:43 -07:00
Srinivas Kandagatla
c8cb5f775c ASoC: wcd9335: add CLASS-H Controller support
CLASS-H controller/Amplifier is common accorss Qualcomm WCD codec series.
This patchset adds basic CLASS-H controller apis for WCD codecs after
wcd9335 to use.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-07-31 18:17:37 +01:00
Srinivas Kandagatla
e57d4ca882 ASoC: wcd9335: add support to wcd9335 codec
Qualcomm WCD9335 Codec is a standalone Hi-Fi audio codec IC,
It supports both I2S/I2C and SLIMbus audio interfaces.
On slimbus interface it supports two data lanes; 16 Tx ports
and 8 Rx ports. It has Seven DACs and nine dedicated interpolators,
Seven (six audio ADCs, and one VBAT ADC), Multibutton headset
control (MBHC), Active noise cancellation and Sidetone paths
and processing.

This patchset adds very basic support for playback and capture
via the 9 interpolators and ADC respectively.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-07-31 18:17:32 +01:00
Srinivas Kandagatla
1877c9fda1 ASoC: dt-bindings: add dt bindings for wcd9335 audio codec
This patch adds bindings for wcd9335 audio codec which can support both SLIMbus
and I2S/I2C interface.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-07-31 18:17:18 +01:00
Jason Wang
d46eeeaf99 virtio-net: get rid of unnecessary container of rq stats
We don't maintain tx counters in rx stats any more. There's no need
for an extra container of rq stats.

Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31 10:02:01 -07:00
Jason Wang
ca9e83b4a5 virtio-net: correctly update XDP_TX counters
Commit 5b8f3c8d30 ("virtio_net: Add XDP related stats") tries to
count TX XDP stats in virtnet_receive(). This will cause several
issues:

- virtnet_xdp_sq() was called without checking whether or not XDP is
  set. This may cause out of bound access when there's no enough txq
  for XDP.
- Stats were updated even if there's no XDP/XDP_TX.

Fixing this by reusing virtnet_xdp_xmit() for XDP_TX which can counts
TX XDP counter itself and remove the unnecessary tx stats embedded in
rx stats.

Reported-by: syzbot+604f8271211546f5b3c7@syzkaller.appspotmail.com
Fixes: 5b8f3c8d30 ("virtio_net: Add XDP related stats")
Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31 10:02:01 -07:00
Colin Ian King
2789e83c93 xen/gntdev: don't dereference a null gntdev_dmabuf on allocation failure
Currently when the allocation of gntdev_dmabuf fails, the error exit
path will call dmabuf_imp_free_storage and causes a null pointer
dereference on gntdev_dmabuf.  Fix this by adding an error exit path
that won't free gntdev_dmabuf.

Detected by CoverityScan, CID#1472124 ("Dereference after null check")

Fixes: bf8dc55b13 ("xen/gntdev: Implement dma-buf import functionality")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-07-31 12:59:13 -04:00
Stephen Hemminger
8fdee4cc95 sunrpc: whitespace fixes
Remove trailing whitespace and blank line at EOF

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-31 12:53:40 -04:00
Bill Baker
0f90be132c NFSv4 client live hangs after live data migration recovery
After a live data migration event at the NFS server, the client may send
I/O requests to the wrong server, causing a live hang due to repeated
recovery events.  On the wire, this will appear as an I/O request failing
with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly.
NFS4ERR_BADSSESSION is returned because the session ID being used was
issued by the other server and is not valid at the old server.

The failure is caused by async worker threads having cached the transport
(xprt) in the rpc_task structure.  After the migration recovery completes,
the task is redispatched and the task resends the request to the wrong
server based on the old value still present in tk_xprt.

The solution is to recompute the tk_xprt field of the rpc_task structure
so that the request goes to the correct server.

Signed-off-by: Bill Baker <bill.baker@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Helen Chao <helen.chao@oracle.com>
Fixes: fb43d17210 ("SUNRPC: Use the multipath iterator to assign a ...")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-31 12:53:40 -04:00
Olga Kornievskaia
32cd3ee511 NFSv4.0 fix client reference leak in callback
If there is an error during processing of a callback message, it leads
to refrence leak on the client structure and eventually an unclean
superblock.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-31 12:53:40 -04:00
Dan Carpenter
1a54c0cfcb sunrpc: kstrtoul() can also return -ERANGE
Smatch complains that "num" can be uninitialized when kstrtoul() returns
-ERANGE.  It's true enough, but basically harmless in this case.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-31 12:53:40 -04:00
Dan Carpenter
379ebf0796 NFS: silence a harmless uninitialized variable warning
kstrtoul() can return -ERANGE so Smatch complains that "num" can be
uninitialized.  We check that it's within bounds so it's not a huge
deal.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-31 12:53:40 -04:00
Dave Wysochanski
016583d703 sunrpc: Change rpc_print_iostats to rpc_clnt_show_stats and handle rpc_clnt clones
The existing rpc_print_iostats has a few shortcomings.  First, the naming
is not consistent with other functions in the kernel that display stats.
Second, it is really displaying stats for an rpc_clnt structure as it
displays both xprt stats and per-op stats.  Third, it does not handle
rpc_clnt clones, which is important for the one in-kernel tree caller
of this function, the NFS client's nfs_show_stats function.

Fix all of the above by renaming the rpc_print_iostats to
rpc_clnt_show_stats and looping through any rpc_clnt clones via
cl_parent.

Once this interface is fixed, this addresses a problem with NFSv4.
Before this patch, the /proc/self/mountstats always showed incorrect
counts for NFSv4 lease and session related opcodes such as SEQUENCE,
RENEW, SETCLIENTID, CREATE_SESSION, etc.  These counts were always 0
even though many ops would go over the wire.  The reason for this is
there are multiple rpc_clnt structures allocated for any given NFSv4
mount, and inside nfs_show_stats() we callled into rpc_print_iostats()
which only handled one of them, nfs_server->client.  Fix these counts
by calling sunrpc's new rpc_clnt_show_stats() function, which handles
cloned rpc_clnt structs and prints the stats together.

Note that one side-effect of the above is that multiple mounts from
the same NFS server will show identical counts in the above ops due
to the fact the one rpc_clnt (representing the NFSv4 client state)
is shared across mounts.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-07-31 12:53:35 -04:00
David S. Miller
6293e4d674 Merge branch 'xsk-improvements-to-RX-queue-check-and-replace'
Jakub Kicinski says:

====================
xsk: improvements to RX queue check and replace

First 3 patches of my recent RFC.  The first one make the check against
real_num_rx_queues slightly more reliable, while the latter two redefine
XDP_QUERY_XSK_UMEM slightly to disallow replacing UMEM in the driver at
the stack level.

I'm not sure where this lays on the bpf vs net trees scale, but there
should be no conflicts with either tree.
====================

Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31 09:48:22 -07:00
Jakub Kicinski
84c6b86875 xsk: don't allow umem replace at stack level
Currently drivers have to check if they already have a umem
installed for a given queue and return an error if so.  Make
better use of XDP_QUERY_XSK_UMEM and move this functionality
to the core.

We need to keep rtnl across the calls now.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31 09:48:21 -07:00
Jakub Kicinski
f734607e81 xsk: refactor xdp_umem_assign_dev()
Return early and only take the ref on dev once there is no possibility
of failing.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31 09:48:21 -07:00
Jakub Kicinski
c29c2ebd2a net: update real_num_rx_queues even when !CONFIG_SYSFS
We used to depend on real_num_rx_queues as a upper bound for sanity
checks.  For AF_XDP socket validation it's useful if the check behaves
the same regardless of CONFIG_SYSFS setting.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31 09:48:21 -07:00
Linus Torvalds
c1d61e7fe3 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Nine fixes, five in the qla2xxx driver, the most serious of which is
  the uninitialized list head crash which can be observed in most
  systems under a sufficiently loaded low memory environment.

  The two sg fixes are minor but obvious and two target ones which seem
  reasonable but not high impact"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qla2xxx: Return error when TMF returns
  scsi: qla2xxx: Fix ISP recovery on unload
  scsi: qla2xxx: Fix driver unload by shutting down chip
  scsi: qla2xxx: Fix NPIV deletion by calling wait_for_sess_deletion
  scsi: qla2xxx: Fix unintialized List head crash
  scsi: sg: update comment for blk_get_request()
  scsi: sg: fix minor memory leak in error path
  scsi: libiscsi: fix possible NULL pointer dereference in case of TMF
  scsi: target: iscsi: cxgbit: fix max iso npdu calculation
2018-07-31 09:46:36 -07:00
Jesper Dangaard Brouer
39c64d8c87 mlx5: handle DMA mapping error case for XDP redirect
Commit 58b99ee3e3 ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
forgot to return/free the xdp_frame in case the DMA mapping failed, correct this.

Also DMA unmap the frame in case mlx5e_xmit_xdp_frame() fails.

Fixes: 58b99ee3e3 ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31 09:44:41 -07:00
Linus Torvalds
095c3633f1 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
 "Some bugfixes that seem important and safe enough to merge at the last
  minute"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_balloon: fix another race between migration and ballooning
  tools/virtio: add kmalloc_array stub
  tools/virtio: add dma barrier stubs
2018-07-31 09:35:32 -07:00
Linus Torvalds
c786e4052c Merge tag 'acpi-urgent-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
 "These fix a recent ACPICA regression affecting control method
  execution at the table level and an earlier hibernation regression in
  the ACPI driver for Intel SoCs (LPSS) that was missed by a previous
  fix in this cycle.

  Specifics:

   - Fix a recent ACPICA regression introduced by a previous fix that
     caused control method execution at the table level to be mishandled
     by mistake (Erik Schmauss).

   - Fix a hibernation regression from the 4.15 cycle in the ACPI driver
     for Intel SoCs (LPSS) that caused the platform firmware to be
     confused during resume from hibernation by the driver's PM quirks
     which was fixed for system-wide suspend/resume (ACPI S3) earlier in
     this cycle, but that previous fix missed the hibernation (ACPI S4)
     case (Rafael Wysocki)"

* tag 'acpi-urgent-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPICA: AML Parser: ignore control method status in module-level code
  ACPI / LPSS: Avoid PM quirks on suspend and resume from hibernation
2018-07-31 09:31:18 -07:00
Hari Vyas
44bda4b7d2 PCI: Fix is_added/is_busmaster race condition
When a PCI device is detected, pdev->is_added is set to 1 and proc and
sysfs entries are created.

When the device is removed, pdev->is_added is checked for one and then
device is detached with clearing of proc and sys entries and at end,
pdev->is_added is set to 0.

is_added and is_busmaster are bit fields in pci_dev structure sharing same
memory location.

A strange issue was observed with multiple removal and rescan of a PCIe
NVMe device using sysfs commands where is_added flag was observed as zero
instead of one while removing device and proc,sys entries are not cleared.
This causes issue in later device addition with warning message
"proc_dir_entry" already registered.

Debugging revealed a race condition between the PCI core setting the
is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
setting the is_busmaster bit in pci_set_master().  As these fields are not
handled atomically, that clears the is_added bit.

Move the is_added bit to a separate private flag variable and use atomic
functions to set and retrieve the device addition state.  This avoids the
race because is_added no longer shares a memory location with is_busmaster.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 11:27:54 -05:00
Lukas Wunner
47a8e237ed PCI: Whitelist Thunderbolt ports for runtime D3
Thunderbolt controllers can be runtime suspended to D3cold to save ~1.5W.
This requires that runtime D3 is allowed on its PCIe ports, so whitelist
them.

The 2015 BIOS cutoff that we've instituted for runtime D3 on PCIe ports
is unnecessary on Thunderbolt because we know that even the oldest
controller, Light Ridge (2010), is able to suspend its ports to D3 just
fine -- specifically including its hotplug ports.  And the power saving
should be afforded to machines even if their BIOS predates 2015.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andreas Noever <andreas.noever@gmail.com>
2018-07-31 11:09:36 -05:00
Lukas Wunner
eb3b5bf1a8 PCI: Whitelist native hotplug ports for runtime D3
Previously we blacklisted PCIe hotplug ports for runtime D3 because:

(a) Ports handled by the firmware must not be transitioned to D3 by the
    OS behind the firmware's back:
    https://bugzilla.kernel.org/show_bug.cgi?id=53811

(b) Ports handled natively by the OS lacked runtime D3 support in the
    pciehp driver.

We've just rectified the latter, so allow users to manually enable and
test it by passing pcie_port_pm=force on the command line.  Vendors are
thus put in a position to validate hotplug ports for runtime D3 and
perhaps we can someday enable it by default, but with a BIOS cutoff date.

Ashok Raj tested runtime D3 on hotplug ports of a SkyLake Xeon-SP in
2017 and encountered Hardware Error NMIs, so this feature clearly cannot
be enabled for everyone yet:
https://lkml.kernel.org/r/20170503180426.GA4058@otc-nc-03

While at it, remove an erroneous code comment I added with 97a90aee5d
("PCI: Consolidate conditions to allow runtime PM on PCIe ports") which
claims that parents of a hotplug port must stay awake lest interrupts
cannot be delivered.  That has turned out to be wrong at least for
Thunderbolt hotplug ports.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-31 11:09:36 -05:00
Lukas Wunner
82c3fbff6e PCI: sysfs: Resume to D0 on function reset
When performing a function reset via sysfs, the device's config space is
accessed in places such as pcie_flr() and its MMIO space is accessed e.g.
in reset_ivb_igd(), so ensure accessibility by resuming the device to D0.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-31 11:09:36 -05:00
Lukas Wunner
4417aa45c1 PCI: pciehp: Resume parent to D0 on config space access
Ensure accessibility of a hotplug port's config space when accessed via
sysfs by resuming its parent to D0.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-31 11:09:36 -05:00
Lukas Wunner
8350307454 PCI: pciehp: Resume to D0 on enable/disable
pciehp's IRQ thread ensures accessibility of the port by runtime resuming
its parent to D0.  However when the slot is enabled/disabled, the port
itself needs to be in D0 because its secondary bus is accessed in:

    pciehp_check_link_status(),
    pciehp_configure_device() (both called from board_added())
and
    pciehp_unconfigure_device() (called from remove_board()).

Thus, acquire a runtime PM ref on enable/disablement of the slot.

Yinghai Lu additionally discovered that some SkyLake servers feature a
Power Controller for their PCIe hotplug ports (PCIe r3.1, sec 6.7.1.8)
which requires the port to be in D0 when invoking

    pciehp_power_on_slot() (likewise called from board_added()).

If slot power is turned on while in D3hot, link training later fails:
https://lkml.kernel.org/r/20170205073454.GA253@wunner.de

The spec is silent about such a requirement, but it seems prudent to
assume that any hotplug port with a Power Controller may need this.

The present commit holds a runtime PM ref whenever slot power is turned
on and off, but it doesn't keep the port in D0 as long as slot power is
on.  If vendors determine that's necessary, they need to amend pciehp to
acquire a runtime PM ref in pciehp_power_on_slot() and release one in
pciehp_power_off_slot().

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-31 11:09:36 -05:00
Lukas Wunner
6b08c3854c PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment.  After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?

It turns out that assumption is wrong at least for Thunderbolt:  Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts.  Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.

If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events.  For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.

That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port.  The worst case is a Thunderbolt downstream port at the
end of a daisy chain:  There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port.  That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.

Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)

PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."

This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended.  pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there.  On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled.  (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)

(Side note:  Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector".  That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports.  This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec:  "The Ports on a Switch that
are not the Upstream Port are Downstream Ports.  All Ports on a Root
Complex are Downstream Ports.")

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-31 11:08:56 -05:00
Lukas Wunner
469e764c4a PCI: pciehp: Obey compulsory command delay after resume
Upon resume from system sleep, the Slot Control register is written via:

  pci_pm_resume_noirq()
    pci_pm_default_resume_early()
      pci_restore_state()
        pci_restore_pcie_state()

PCIe r4.0, sec 6.7.3.2 says that after "issuing a write transaction that
targets any portion of the Port's Slot Control register, [...] software
must wait for [the] command to complete before issuing the next command".

pciehp currently fails to enforce that rule after the above-mentioned
write.  Fix it.

(Moving restoration of the Slot Control register to pciehp doesn't seem
to make sense because the other PCIe hotplug drivers may need it as
well.)

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-31 11:07:59 -05:00
Lukas Wunner
7903782460 PCI: pciehp: Clear spurious events earlier on resume
Thunderbolt hotplug ports that were occupied before system sleep resume
with their downstream link in "off" state.  Only after the Thunderbolt
controller has reestablished the PCIe tunnels does the link go up.
As a result, a spurious Presence Detect Changed and/or Data Link Layer
State Changed event occurs.

The events are not immediately acted upon because tunnel reestablishment
happens in the ->resume_noirq phase, when interrupts are still disabled.
Also, notification of events may initially be disabled in the Slot
Control register when coming out of system sleep and is reenabled in the
->resume_noirq phase through:

  pci_pm_resume_noirq()
    pci_pm_default_resume_early()
      pci_restore_state()
        pci_restore_pcie_state()

It is not guaranteed that the events are acted upon at all:  PCIe r4.0,
sec 6.7.3.4 says that "a port may optionally send an MSI when there are
hot-plug events that occur while interrupt generation is disabled, and
interrupt generation is subsequently enabled."  Note the "optionally".

If an MSI is sent, pciehp will gratuitously turn the slot off and back
on once the ->resume_early phase has commenced.

If an MSI is not sent, the extant, unacknowledged events in the Slot
Status register will prevent future notification of presence or link
changes.

Commit 13c65840fe ("PCI: pciehp: Clear Presence Detect and Data Link
Layer Status Changed on resume") fixed the latter by clearing the events
in the ->resume phase.  Move this to the ->resume_noirq phase to also
fix the gratuitous disable/enablement of the slot.

The commit further restored the Slot Control register in the ->resume
phase, but that's dispensable because as shown above it's already been
done in the ->resume_noirq phase.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-31 11:07:59 -05:00
Lukas Wunner
6ccb127ba6 PCI: portdrv: Deduplicate PM callback iterator
Replace suspend_iter() and resume_iter() with a single function pm_iter()
to allow addition of port service callbacks for further power management
phases without having to add another iterator each time.

No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-31 11:07:59 -05:00