Commit Graph

723112 Commits

Author SHA1 Message Date
Chuck Lever
b4744e00a3 xprtrdma: Add trace points for connect events
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-23 09:44:37 -05:00
Chuck Lever
1c443effa3 xprtrdma: Add trace points to instrument MR allocation and recovery
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-23 09:44:36 -05:00
Chuck Lever
2937fede11 xprtrdma: Add trace points to instrument memory invalidation
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-23 09:44:36 -05:00
Chuck Lever
e11b7c9655 xprtrdma: Add trace points in reply decoder path
This includes decoding Write and Reply chunks, and fixing up inline
payloads.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-23 09:44:36 -05:00
Chuck Lever
58f10ad40d xprtrdma: Add trace points to instrument memory registration
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-23 09:44:35 -05:00
Chuck Lever
b4a7f91c1d xprtrdma: Add trace points in the RPC Reply handler paths
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-23 09:44:35 -05:00
Chuck Lever
ab03eff58e xprtrdma: Add trace points in RPC Call transmit paths
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-23 09:44:35 -05:00
Chuck Lever
e48f083e19 rpcrdma: infrastructure for static trace points in rpcrdma.ko
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-23 09:44:25 -05:00
Chuck Lever
6b3a60ae7b rdma/ib: Add trace point macros to display human-readable values
These can be shared with all kernel ULPs, and more can easily be
added as needed.

Note: checkpatch.pl has some heartburn with the TRACE_DEFINE_ENUM
macros and the LIST macros. These follow the same style as other
header files under include/tracing/events , thus should be
considered acceptable exceptions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-23 09:44:14 -05:00
Chuck Lever
ec12e479e3 xprtrdma: Introduce rpcrdma_mw_unmap_and_put
Clean up: Code review suggested that a common bit of code can be
placed into a helper function, and this gives us fewer places to
stick an "I DMA unmapped something" trace point.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:52 -05:00
Chuck Lever
96ceddea37 xprtrdma: Remove usage of "mw"
Clean up: struct rpcrdma_mw was named after Memory Windows, but
xprtrdma no longer supports a Memory Window registration mode.
Rename rpcrdma_mw and its fields to reduce confusion and make
the code more sensible to read.

Renaming "mw" was suggested by Tom Talpey, the author of the
original xprtrdma implementation. It's a good idea, but I haven't
done this until now because it's a huge diffstat for no benefit
other than code readability.

However, I'm about to introduce static trace points that expose
a few of xprtrdma's internal data structures. They should make sense
in the trace report, and it's reasonable to treat trace points as a
kernel API contract which might be difficult to change later.

While I'm churning things up, two additional changes:
- rename variables unhelpfully called "r" to "mr", to improve code
  clarity, and
- rename the MR-related helper functions using the form
  "rpcrdma_mr_<verb>", to be consistent with other areas of the
  code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:51 -05:00
Chuck Lever
ce5b371782 xprtrdma: Replace all usage of "frmr" with "frwr"
Clean up: Over time, the industry has adopted the term "frwr"
instead of "frmr". The term "frwr" is now more widely recognized.

For the past couple of years I've attempted to add new code using
"frwr" , but there still remains plenty of older code that still
uses "frmr". Replace all usage of "frmr" to avoid confusion.

While we're churning code, rename variables unhelpfully called "f"
to "frwr", to improve code clarity.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:50 -05:00
Chuck Lever
30b5416bf0 xprtrdma: Don't clear RPC_BC_PA_IN_USE on pre-allocated rpc_rqst's
No need for the overhead of atomically setting and clearing this bit
flag for every use of a pre-allocated backchannel rpc_rqst. These
are a distinct pool of rpc_rqsts that are used only for callback
operations, so it is safe to simply leave the bit set.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:49 -05:00
Chuck Lever
cf73daf527 xprtrdma: Split xprt_rdma_send_request
Clean up. @rqst is set up differently for backchannel Replies. For
example, rqst->rq_task and task->tk_client are both NULL. So it is
easier to understand and maintain this code path if it is separated.

Also, we can get rid of the confusing rl_connect_cookie hack in
rpcrdma_bc_receive_call.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:48 -05:00
Chuck Lever
6c537f2c7c xprtrdma: buf_free not called for CB replies
Since commit 5a6d1db455 ("SUNRPC: Add a transport-specific private
field in rpc_rqst"), the rpc_rqst's for RPC-over-RDMA backchannel
operations leave rq_buffer set to NULL.

xprt_release does not invoke ->op->buf_free when rq_buffer is NULL.
The RPCRDMA_REQ_F_BACKCHANNEL check in xprt_rdma_free is therefore
redundant because xprt_rdma_free is not invoked for backchannel
requests.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:47 -05:00
Chuck Lever
a2b6470b1c xprtrdma: Move unmap-safe logic to rpcrdma_marshal_req
Clean up. This logic is related to marshaling the request, and I'd
like to keep everything that touches req->rl_registered close
together, for CPU cache efficiency.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:47 -05:00
Chuck Lever
20035edf3c xprtrdma: Support IPv6 in xprt_rdma_set_port
Clean up a harmless oversight. xprtrdma's ->set_port method has
never properly supported IPv6.

This issue has never been a problem because NFS/RDMA mounts have
always required "port=20049", thus so far, rpcbind is not invoked
for these mounts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:46 -05:00
Chuck Lever
dd229cee4e xprtrdma: Remove another sockaddr_storage field (cdata::addr)
Save more space in struct rpcrdma_xprt by removing the redundant
"addr" field from struct rpcrdma_create_data_internal. Wherever
we have rpcrdma_xprt, we also have the rpc_xprt, which has a
sockaddr_storage field with the same content.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:45 -05:00
Chuck Lever
d461f1f2fb xprtrdma: Initialize the xprt address string array earlier
This makes the address strings available for debugging messages in
earlier stages of transport set up.

The first benefit is to get rid of the single-use rep_remote_addr
field, saving 128+ bytes in struct rpcrdma_ep.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:45 -05:00
Chuck Lever
104927042c xprtrdma: Remove unused padding variables
Clean up. Remove fields that should have been removed by
commit b3221d6a53 ("xprtrdma: Remove logic that constructs
RDMA_MSGP type calls").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:44 -05:00
Chuck Lever
3f0e3edd6a xprtrdma: Remove ri_reminv_expected
Clean up.

Commit b5f0afbea4 ("xprtrdma: Per-connection pad optimization")
should have removed this.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:43 -05:00
Chuck Lever
c34416182f xprtrdma: Per-mode handling for Remote Invalidation
Refactoring change: Remote Invalidation is particular to the memory
registration mode that is use. Use a callout instead of a generic
function to handle Remote Invalidation.

This gets rid of the 8-byte flags field in struct rpcrdma_mw, of
which only a single bit flag has been allocated.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:43 -05:00
Chuck Lever
42b9f5c58a xprtrdma: Eliminate unnecessary lock cycle in xprt_rdma_send_request
The rpcrdma_req is not shared yet, and its associated Send hasn't
been posted, thus RMW should be safe. There's no need for the
expense of a lock cycle here.

Fixes: 0ba6f37012 ("xprtrdma: Refactor rpcrdma_deferred_completion")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:42 -05:00
Chuck Lever
d698c4a02e xprtrdma: Fix backchannel allocation of extra rpcrdma_reps
The backchannel code uses rpcrdma_recv_buffer_put to add new reps
to the free rep list. This also decrements rb_recv_count, which
spoofs the receive overrun logic in rpcrdma_buffer_get_rep.

Commit 9b06688bc3 ("xprtrdma: Fix additional uses of
spin_lock_irqsave(rb_lock)") replaced the original open-coded
list_add with a call to rpcrdma_recv_buffer_put(), but then a year
later, commit 05c974669e ("xprtrdma: Fix receive buffer
accounting") added rep accounting to rpcrdma_recv_buffer_put.
It was an oversight to let the backchannel continue to use this
function.

The fix this, let's combine the "add to free list" logic with
rpcrdma_create_rep.

Also, do not allocate RPCRDMA_MAX_BC_REQUESTS rpcrdma_reps in
rpcrdma_buffer_create and then allocate additional rpcrdma_reps in
rpcrdma_bc_setup_reps. Allocating the extra reps during backchannel
set-up is sufficient.

Fixes: 05c974669e ("xprtrdma: Fix receive buffer accounting")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:41 -05:00
Chuck Lever
03ac1a76ce xprtrdma: Fix buffer leak after transport set up failure
This leak has been around forever, and is exceptionally rare.

EINVAL causes mount to fail with "an incorrect mount option was
specified" although it's not likely that one of the mount
options is incorrect. Instead, return ENODEV in this case, as this
appears to be an issue with system or device configuration rather
than a specific mount option.

Some obsolete comments are also removed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16 11:19:41 -05:00
Linus Torvalds
1291a0d504 Linux 4.15-rc4 2017-12-17 18:59:59 -08:00
Kees Cook
779f4e1c6c Revert "exec: avoid RLIMIT_STACK races with prlimit()"
This reverts commit 04e35f4495.

SELinux runs with secureexec for all non-"noatsecure" domain transitions,
which means lots of processes end up hitting the stack hard-limit change
that was introduced in order to fix a race with prlimit(). That race fix
will need to be redesigned.

Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Tomáš Trnka <trnka@scm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-17 14:26:25 -08:00
Linus Torvalds
f8940a0f20 Merge branch 'WIP.x86-pti.base-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Page Table Isolation (PTI) v4.14 backporting base tree from Ingo Molnar:
 "This tree contains the v4.14 PTI backport preparatory tree, which
  consists of four merges of upstream trees and 7 cherry-picked commits,
  which the upcoming PTI work depends on"

NOTE! The resulting tree is exactly the same as the original base tree
(ie the diff between this commit and its immediate first parent is
empty).

The only reason for this merge is literally to have a common point for
the actual PTI changes so that the commits can be shared in both the
4.15 and 4.14 trees.

* 'WIP.x86-pti.base-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow
  locking/barriers: Convert users of lockless_dereference() to READ_ONCE()
  locking/barriers: Add implicit smp_read_barrier_depends() to READ_ONCE()
  bpf: fix build issues on um due to mising bpf_perf_event.h
  perf/x86: Enable free running PEBS for REGS_USER/INTR
  x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD
  x86/cpufeature: Add User-Mode Instruction Prevention definitions
2017-12-17 13:57:08 -08:00
Linus Torvalds
6ba64feff6 Merge branch 'WIP.x86-pti.base.prep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Page Table Isolation (PTI) preparatory tree from Ingo Molnar:
 "This does a rename to free up linux/pti.h to be used by the upcoming
  page table isolation feature"

* 'WIP.x86-pti.base.prep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  drivers/misc/intel/pti: Rename the header file to free up the namespace
2017-12-17 13:54:31 -08:00
Linus Torvalds
2ffb448ccb Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "A single bugfix which prevents arbitrary sigev_notify values in
  posix-timers"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-timer: Properly check sigevent->sigev_notify
2017-12-17 13:48:50 -08:00
Linus Torvalds
c43727908f dmaengine fixes for v4.15-rc4
Here are fixes for this round
 - Fix for disable clk on error path in fsl-edma driver
 - Disable clk fail fix in jz4740 driver
 - Fix long pending bug in dmatest driver for dangling pointer
 - Fix potential NULL pointer dereference in at_hdmac driver
 - Error handling path in ioat driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJaNp2PAAoJEHwUBw8lI4NHNfEP/RcERW0adeh4HXIges3FMW9N
 cqhRQpnUWmYIOTsfrWDH6WyR7pmEBOLryBzZXQ6Y8I3AOW4FTwBOG/W8NqFj+aiA
 JjeKRkXdpHD1eZpav72FUUhf4inb7BTtpGN5lBEFFk+gA3rWLscUw33OJLZmxsb5
 Vi82pJARfst42XB/jZUBDq+YNYlgG+NspYgUWvS5EnNXJJmy+47c5TS9EQUovctM
 X0yrKNomSW2qEO/IEXTUvXtArxpmxBrZI1q9RU21JJvF3izElqYO9C3OhGijM27u
 ERHWM1iStOxOl7zrzp8JJmVswdHyW/rT5uSlhl4689S2UZEJrqBGPo67Nh5w0Isi
 Nk+2X8I7Z/tNByggXASHX/xXymQGJF/mxAs1KypnSE0DpCJQi7oJAG6UY7lbIOz8
 CRGRt0gM8Xe/GrZa+7ZXmc7BoCrNPaGmW070XX3BoMcqT9m+SFOD2DhT351AVoOI
 uIPw5gAw0OCYhzkPP1c6AZPdLLFlVGrmPARRto/Af7w4Fnw4UlWGQ/+d6oQlvEyG
 G0kPxKE3e6uHXOizgm3GkbqpQWjTz649JMNw+xVey9e/pSVXORYIIomPmwahbh8x
 7Rfojaccq3w0YYG58z/Q5+WmzWf0jU//G/Cq5LsOm/5ghZTYRna7olA6lxFcn+ab
 6WQBW1rV2y8XUpDl4ScB
 =2egq
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-4.15-rc4' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "This time consisting of fixes in a bunch of drivers and the dmatest
  module:

   - Fix for disable clk on error path in fsl-edma driver
   - Disable clk fail fix in jz4740 driver
   - Fix long pending bug in dmatest driver for dangling pointer
   - Fix potential NULL pointer dereference in at_hdmac driver
   - Error handling path in ioat driver"

* tag 'dmaengine-fix-4.15-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: fsl-edma: disable clks on all error paths
  dmaengine: jz4740: disable/unprepare clk if probe fails
  dmaengine: dmatest: move callback wait queue to thread context
  dmaengine: at_hdmac: fix potential NULL pointer dereference in atc_prep_dma_interleaved
  dmaengine: ioat: Fix error handling path
2017-12-17 13:28:49 -08:00
Arnd Bergmann
b9f5fb1800 cramfs: fix MTD dependency
With CONFIG_MTD=m and CONFIG_CRAMFS=y, we now get a link failure:

  fs/cramfs/inode.o: In function `cramfs_mount': inode.c:(.text+0x220): undefined reference to `mount_mtd'
  fs/cramfs/inode.o: In function `cramfs_mtd_fill_super':
  inode.c:(.text+0x6d8): undefined reference to `mtd_point'
  inode.c:(.text+0xae4): undefined reference to `mtd_unpoint'

This adds a more specific Kconfig dependency to avoid the broken
configuration.

Alternatively we could make CRAMFS itself depend on "MTD || !MTD" with a
similar result.

Fixes: 99c18ce580 ("cramfs: direct memory access support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-17 12:20:58 -08:00
Linus Torvalds
73d080d374 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "The alloc_super() one is a regression in this merge window, lazytime
  thing is older..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Handle lazytime in do_mount()
  alloc_super(): do ->s_umount initialization earlier
2017-12-17 12:18:35 -08:00
Linus Torvalds
1c6b942d7d Fix a regression which caused us to fail to interpret symlinks in very
ancient ext3 file system images.  Also fix two xfstests failures, one
 of which could cause a OOPS, plus an additional bug fix caught by fuzz
 testing.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlo1y3EACgkQ8vlZVpUN
 gaNFOQf/bMf6ynai1dGGRwef+UcT874NZ2Hqm+UqI6pxusz0ZeKWm8HWfPfg31Fa
 o+OnUsZ7NXFBIHyfXKFJzdOgutjZ5eY0vMu+NrlyBdd6W+ZcHwn1PvQsLapFYvqK
 Rt+8nWTKqtnksSfh0vyODmUYgItOULOPPepjnIPm/Pd0DinJwo0GY/8MzLkz4SpX
 g6R60ou0ToEYNqBXAKIBnZ4aq8KWMtCMGcD270U5eAm/63Pt4riRwJbjITxZPAH1
 wKzivP4Ce5ce8W2g2/6mFFlBFWvtlB491T+BsgHUEv3OLze+kYS2PcxQthhEmBR8
 zeZ2o2/0tTxejE//cyJ4gCe3fYGRDg==
 =xqLC
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix a regression which caused us to fail to interpret symlinks in very
  ancient ext3 file system images.

  Also fix two xfstests failures, one of which could cause an OOPS, plus
  an additional bug fix caught by fuzz testing"

* tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix crash when a directory's i_size is too small
  ext4: add missing error check in __ext4_new_inode()
  ext4: fix fdatasync(2) after fallocate(2) operation
  ext4: support fast symlinks from ext3 file systems
2017-12-17 12:14:33 -08:00
Andrey Ryabinin
2aeb07365b x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow
[ Note, this is a Git cherry-pick of the following commit:

    d17a1d97dc: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")

  ... for easier x86 PTI code testing and back-porting. ]

The KASAN shadow is currently mapped using vmemmap_populate() since that
provides a semi-convenient way to map pages into init_top_pgt.  However,
since that no longer zeroes the mapped pages, it is not suitable for
KASAN, which requires zeroed shadow memory.

Add kasan_populate_shadow() interface and use it instead of
vmemmap_populate().  Besides, this allows us to take advantage of
gigantic pages and use them to populate the shadow, which should save us
some memory wasted on page tables and reduce TLB pressure.

Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:57:26 +01:00
Will Deacon
3382290ed2 locking/barriers: Convert users of lockless_dereference() to READ_ONCE()
[ Note, this is a Git cherry-pick of the following commit:

    506458efaf ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()")

  ... for easier x86 PTI code testing and back-porting. ]

READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it
can be used instead of lockless_dereference() without any change in
semantics.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:57:15 +01:00
Will Deacon
c2bc66082e locking/barriers: Add implicit smp_read_barrier_depends() to READ_ONCE()
[ Note, this is a Git cherry-pick of the following commit:

    76ebbe78f7 ("locking/barriers: Add implicit smp_read_barrier_depends() to READ_ONCE()")

  ... for easier x86 PTI code testing and back-porting. ]

In preparation for the removal of lockless_dereference(), which is the
same as READ_ONCE() on all architectures other than Alpha, add an
implicit smp_read_barrier_depends() to READ_ONCE() so that it can be
used to head dependency chains on all architectures.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508840570-22169-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:57:01 +01:00
Daniel Borkmann
ab95477e7c bpf: fix build issues on um due to mising bpf_perf_event.h
[ Note, this is a Git cherry-pick of the following commit:

    a23f06f06d ("bpf: fix build issues on um due to mising bpf_perf_event.h")

  ... for easier x86 PTI code testing and back-porting. ]

Since c895f6f703 ("bpf: correct broken uapi for
BPF_PROG_TYPE_PERF_EVENT program type") um (uml) won't build
on i386 or x86_64:

  [...]
    CC      init/main.o
  In file included from ../include/linux/perf_event.h:18:0,
                   from ../include/linux/trace_events.h:10,
                   from ../include/trace/syscall.h:7,
                   from ../include/linux/syscalls.h:82,
                   from ../init/main.c:20:
  ../include/uapi/linux/bpf_perf_event.h:11:32: fatal error:
  asm/bpf_perf_event.h: No such file or directory #include
  <asm/bpf_perf_event.h>
  [...]

Lets add missing bpf_perf_event.h also to um arch. This seems
to be the only one still missing.

Fixes: c895f6f703 ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Richard Weinberger <richard@sigma-star.at>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Richard Weinberger <richard@sigma-star.at>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:56:34 +01:00
Andi Kleen
2fe1bc1f50 perf/x86: Enable free running PEBS for REGS_USER/INTR
[ Note, this is a Git cherry-pick of the following commit:

    a47ba4d77e ("perf/x86: Enable free running PEBS for REGS_USER/INTR")

  ... for easier x86 PTI code testing and back-porting. ]

Currently free running PEBS is disabled when user or interrupt
registers are requested. Most of the registers are actually
available in the PEBS record and can be supported.

So we just need to check for the supported registers and then
allow it: it is all except for the segment register.

For user registers this only works when the counter is limited
to ring 3 only, so this also needs to be checked.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170831214630.21892-1-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:55:17 +01:00
Rudolf Marek
f2dbad36c5 x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD
[ Note, this is a Git cherry-pick of the following commit:

    2b67799bdf25 ("x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD")

  ... for easier x86 PTI code testing and back-porting. ]

The latest AMD AMD64 Architecture Programmer's Manual
adds a CPUID feature XSaveErPtr (CPUID_Fn80000008_EBX[2]).

If this feature is set, the FXSAVE, XSAVE, FXSAVEOPT, XSAVEC, XSAVES
/ FXRSTOR, XRSTOR, XRSTORS always save/restore error pointers,
thus making the X86_BUG_FXSAVE_LEAK workaround obsolete on such CPUs.

Signed-Off-By: Rudolf Marek <r.marek@assembler.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/bdcebe90-62c5-1f05-083c-eba7f08b2540@assembler.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:55:02 +01:00
Ricardo Neri
a8b4db562e x86/cpufeature: Add User-Mode Instruction Prevention definitions
[ Note, this is a Git cherry-pick of the following commit: (limited to the cpufeatures.h file)

    3522c2a6a4 ("x86/cpufeature: Add User-Mode Instruction Prevention definitions")

  ... for easier x86 PTI code testing and back-porting. ]

User-Mode Instruction Prevention is a security feature present in new
Intel processors that, when set, prevents the execution of a subset of
instructions if such instructions are executed in user mode (CPL > 0).
Attempting to execute such instructions causes a general protection
exception.

The subset of instructions comprises:

 * SGDT - Store Global Descriptor Table
 * SIDT - Store Interrupt Descriptor Table
 * SLDT - Store Local Descriptor Table
 * SMSW - Store Machine Status Word
 * STR  - Store Task Register

This feature is also added to the list of disabled-features to allow
a cleaner handling of build-time configuration.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: ricardo.neri@intel.com
Link: http://lkml.kernel.org/r/1509935277-22138-7-git-send-email-ricardo.neri-calderon@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:54:34 +01:00
Ingo Molnar
e5d77a73f3 Merge commit 'upstream-x86-virt' into WIP.x86/mm
Merge a minimal set of virt cleanups, for a base for the MM isolation patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:50:01 +01:00
Ingo Molnar
2ec077c186 Merge branch 'upstream-acpi-fixes' into WIP.x86/pti.base
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:09:31 +01:00
Ingo Molnar
650400b2cc Merge branch 'upstream-x86-selftests' into WIP.x86/pti.base
Conflicts:
	arch/x86/kernel/cpu/Makefile

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:04:28 +01:00
Ingo Molnar
0fd2e9c53d Merge commit 'upstream-x86-entry' into WIP.x86/mm
Pull in a minimal set of v4.15 entry code changes, for a base for the MM isolation patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 12:58:53 +01:00
Ingo Molnar
1784f9144b drivers/misc/intel/pti: Rename the header file to free up the namespace
We'd like to use the 'PTI' acronym for 'Page Table Isolation' - free up the
namespace by renaming the <linux/pti.h> driver header to <linux/intel-pti.h>.

(Also standardize the header guard name while at it.)

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: J Freyensee <james_p_freyensee@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 12:52:34 +01:00
Linus Torvalds
f3b5ad89de Second pull request for 4.15-rc
- Fix for SELinux on the umad SMI path. Some old hardware
   does not fill the PKey properly exposing another bug in the newer
   SELinux code.
 - Check the input port as we can exceed array bounds from this
   user supplied value
 - Users are unable to use the hash field support as they want due to
   incorrect checks on the field restrictions, correct that so the
   feature works as intended
 - User triggerable oops in the NETLINK_RDMA handler
 - cxgb4 driver fix for a bad interaction with CQ flushing in iser
   caused by patches in this merge window, and bad CQ flushing during
   normal close.
 - Unbalanced memalloc_noio in ipoib in an error path.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJaNGJ9AAoJEDht9xV+IJsawK4P/iVlUR8DReXKVPkxYOQk15bI
 GEKG3t2Ce1GaeFZY7TBmsdHBRTHhxf2osEM57TbBmWv6N/pG83GLresE6xxOhRHz
 3s2hzWElJXpYnM0QttHCNjJvySIzjzLZiaQyhiWFqs0+cPVUM9zQd0G77LwHngRf
 1gO7toTMYk8eZkJt0ClQwHMeH6qR892o+zDUtorX/Ez4Ly4tT3I/RwRpbZ1HHpsA
 uWMYcsge7lRzFbZnC+lDoeqozcv20B7n9UBEcAHJkVSh5JFC+TByRmCAZ/hPzjXb
 Pr2E4gTYT+ULUsPRECtIwupT30xfFdByFYBAl+EQ+fiJvGgBxcgVjdLDQ3Ddlb6n
 ga5UEverYKivizitKowtpMCJ0nVH6R4qLt5vcPwxuoHKQmUtXQFeg/haZPWCiPwr
 B4Ahm371yRx8xo4AITBFX4L4PdtmdAueyrrjz/MxJm5YM2eRy08OONFVlBqXTuqT
 EdbtHFCbXtE3aAIiWGmUA0jbswKN9fkUct/wMwkny8T3h/XPKhBqA/SWN5SX1KC3
 EHAjczAcX+MOS52pyhf07C3Z/oq4gpXSCQQHSat9es8oxst4w0CWcCGqIhyqyu2q
 s5CZG3Ok+OvTmKWRJkaEAJXHRoTB1OjkgEod13xRDQmONW/cabeBKe/BC1zmvctM
 g2eyl4amP8MGaRTldpou
 =oyIl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "More fixes from testing done on the rc kernel, including more SELinux
  testing. Looking forward, lockdep found regression today in ipoib
  which is still being fixed.

  Summary:

   - Fix for SELinux on the umad SMI path. Some old hardware does not
     fill the PKey properly exposing another bug in the newer SELinux
     code.

   - Check the input port as we can exceed array bounds from this user
     supplied value

   - Users are unable to use the hash field support as they want due to
     incorrect checks on the field restrictions, correct that so the
     feature works as intended

   - User triggerable oops in the NETLINK_RDMA handler

   - cxgb4 driver fix for a bad interaction with CQ flushing in iser
     caused by patches in this merge window, and bad CQ flushing during
     normal close.

   - Unbalanced memalloc_noio in ipoib in an error path"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/ipoib: Restore MM behavior in case of tx_ring allocation failure
  iw_cxgb4: only insert drain cqes if wq is flushed
  iw_cxgb4: only clear the ARMED bit if a notification is needed
  RDMA/netlink: Fix general protection fault
  IB/mlx4: Fix RSS hash fields restrictions
  IB/core: Don't enforce PKey security on SMI MADs
  IB/core: Bound check alternate path port number
2017-12-16 13:43:08 -08:00
Linus Torvalds
f25e229560 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "Two bugfixes for the AT24 I2C eeprom driver and some minor corrections
  for I2C bus drivers"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: piix4: Fix port number check on release
  i2c: stm32: Fix copyrights
  i2c-cht-wc: constify platform_device_id
  eeprom: at24: change nvmem stride to 1
  eeprom: at24: fix I2C device selection for runtime PM
2017-12-16 13:34:38 -08:00
Linus Torvalds
d025fbf1a2 NFS client fixes for Linux 4.15-rc4
Stable bugfixes:
 - NFS: Avoid a BUG_ON() in nfs_commit_inode() by not waiting for a
        commit in the case that there were no commit requests.
 - SUNRPC: Fix a race in the receive code path
 
 Other fixes:
 - NFS: Fix a deadlock in nfs client initialization
 - xprtrdma: Fix a performance regression for small IOs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlo0PdMACgkQ18tUv7Cl
 QOvlUg/+KoXWXNwItHIyyegYgRXcAPpaCtdnCjjOP6R9HEJ+clnLcaqDxdDKVWQ/
 oDvEcQcsBpywbUi7vVrvdar4mofwuyjXPpbcZPlDP1Ru4yyAlyylftwIuQW/nzdd
 vX2tZaVf+B9y1XvSD5NI+2EKWmp7MVrPdNhYxAB39TQZnAAvYDFHhywtZ0UR7vJt
 7YVcZoPtKUhg15jhCOr73eaCT0884/tlgedfd6DkDGR6bCtSQC2PySfqq9Lnnl/1
 ruDzzcgTARzSEzvta/uyBRspOLBHeeBhTdQUp79lMfekC4+68Tx6DFWnydIUttuE
 G7LphN6hfbJLF20U/ENb2H8v10WZsKvGEuxM+fp5PXGcIMSlX4qoJUe/egJFiiSL
 IaikgibvfiKmYSJvwdxTlOcr793X2Ej19HNciNjJQp4pviDOdZixgtGvVVHJBmh6
 LYzE5q9jgbW9wQXwTTeWHp/nyqL80NslX0UARYnS2Ua0B96GRCESXqCUFtxK6tKR
 wbYiHzKc4dOfSxpNlKI+FlX63m5oSAmTEii3ODsWZjObbwYHNX2Zqj2cVFiSLCpv
 ZXgmpNL+tL2zBWxPvn6rzYhpaXo++PqlHK7vv2QVBI6XM2J8ztpj5Wr5zneRoJaE
 ejk8nw/mR43bfdQuUGZRKh/Z+FTqL0/2WbDgJMXl09c+zRz7J2c=
 =XhEC
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "This has two stable bugfixes, one to fix a BUG_ON() when
  nfs_commit_inode() is called with no outstanding commit requests and
  another to fix a race in the SUNRPC receive codepath.

  Additionally, there are also fixes for an NFS client deadlock and an
  xprtrdma performance regression.

  Summary:

  Stable bugfixes:
   - NFS: Avoid a BUG_ON() in nfs_commit_inode() by not waiting for a
     commit in the case that there were no commit requests.
   - SUNRPC: Fix a race in the receive code path

  Other fixes:
   - NFS: Fix a deadlock in nfs client initialization
   - xprtrdma: Fix a performance regression for small IOs"

* tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  SUNRPC: Fix a race in the receive code path
  nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests
  xprtrdma: Spread reply processing over more CPUs
  nfs: fix a deadlock in nfs client initialization
2017-12-16 13:12:53 -08:00
Linus Torvalds
f6f3732162 Revert "mm: replace p??_write with pte_access_permitted in fault + gup paths"
This reverts commits 5c9d2d5c26, c7da82b894, and e7fe7b5cae.

We'll probably need to revisit this, but basically we should not
complicate the get_user_pages_fast() case, and checking the actual page
table protection key bits will require more care anyway, since the
protection keys depend on the exact state of the VM in question.

Particularly when doing a "remote" page lookup (ie in somebody elses VM,
not your own), you need to be much more careful than this was.  Dave
Hansen says:

 "So, the underlying bug here is that we now a get_user_pages_remote()
  and then go ahead and do the p*_access_permitted() checks against the
  current PKRU. This was introduced recently with the addition of the
  new p??_access_permitted() calls.

  We have checks in the VMA path for the "remote" gups and we avoid
  consulting PKRU for them. This got missed in the pkeys selftests
  because I did a ptrace read, but not a *write*. I also didn't
  explicitly test it against something where a COW needed to be done"

It's also not entirely clear that it makes sense to check the protection
key bits at this level at all.  But one possible eventual solution is to
make the get_user_pages_fast() case just abort if it sees protection key
bits set, which makes us fall back to the regular get_user_pages() case,
which then has a vma and can do the check there if we want to.

We'll see.

Somewhat related to this all: what we _do_ want to do some day is to
check the PAGE_USER bit - it should obviously always be set for user
pages, but it would be a good check to have back.  Because we have no
generic way to test for it, we lost it as part of moving over from the
architecture-specific x86 GUP implementation to the generic one in
commit e585513b76 ("x86/mm/gup: Switch GUP to the generic
get_user_page_fast() implementation").

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-15 18:53:22 -08:00