Commit Graph

688457 Commits

Author SHA1 Message Date
Linus Torvalds
2cc7b4ca7d Various fixes and tweaks for the pstore subsystem. Highlights:
- use memdup_user() instead of open-coded copies (Geliang Tang)
 - fix record memory leak during initialization (Douglas Anderson)
 - avoid confused compressed record warning (Ankit Kumar)
 - prepopulate record timestamp and remove redundant logic from backends
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJZXGq0AAoJEIly9N/cbcAmecQP/iw5ngoGaB5pQD8Jq8srzWJK
 nGysSHuEQmDMSmTpXKllmi+AVotMXtvLzeEy0ThmtaTYJPUF2NYi4BIv0SonAu/v
 6Jds4AP9OYBZAxe95Xdk/VlDpo3LW2DxDk3URC3kmDCWqr91zH2a8RQCfr1ArGb0
 7vI0fEKuc4rDTnOIw4hlJ60UyYX+PsD7m/s/9p///mFN7nIhCvm1w9ToIIwNovX7
 4Hvgs135ZanBjLkvPEKPMQRoizCGEeznZPNhn0WFe+AKFIW0KLME+XArgcrCg5w+
 UZr3p706fqMe54ZuZhzlaoHZKuEEfsSda8XamgSA1tMuHm983DZJ0k9nskyXRqtT
 tGBSaFbrArAim3JvI5diJ6LB7QGGThGWjUc8tkbTMyJyxwZeDvoPIyirzTnignRz
 RbnL3DJDAnKqNuf0RyX6a6iz6JobXRz52SZqOWZ/CWrDnBtsXnvPz/enMANgKLZn
 5Hq3ngapIa+DdK6jipppgPMY2woHrb3Jr6E0xhU6PDXQFMNI8cnD0+6H8h3//XG0
 4q6bGsDMy6G6o6RvxIFN+Nr7Xrff8CSlujClIQBPSgn0fPcxvOnZTnYjN0UQ0RMW
 OCh68vb4eJgi3diYLqQ/1m25fIRsxC8O0uu089bH4uGJgtZUfEX+D6L5UtBGt+fe
 BXbX1HbaVFeatVB/o0el
 =VRl5
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore updates from Kees Cook:
 "Various fixes and tweaks for the pstore subsystem.

  Highlights:

   - use memdup_user() instead of open-coded copies (Geliang Tang)

   - fix record memory leak during initialization (Douglas Anderson)

   - avoid confused compressed record warning (Ankit Kumar)

   - prepopulate record timestamp and remove redundant logic from
     backends"

* tag 'pstore-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  powerpc/nvram: use memdup_user
  pstore: use memdup_user
  pstore: Fix format string to use %u for record id
  pstore: Populate pstore record->time field
  pstore: Create common record initializer
  efi-pstore: Refactor erase routine
  pstore: Avoid potential infinite loop
  pstore: Fix leaked pstore_record in pstore_get_backend_records()
  pstore: Don't warn if data is uncompressed and type is not PSTORE_TYPE_DMESG
2017-07-05 11:43:47 -07:00
Linus Torvalds
e24dd9ee53 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer updates from James Morris:

 - a major update for AppArmor. From JJ:

     * several bug fixes and cleanups

     * the patch to add symlink support to securityfs that was floated
       on the list earlier and the apparmorfs changes that make use of
       securityfs symlinks

     * it introduces the domain labeling base code that Ubuntu has been
       carrying for several years, with several cleanups applied. And it
       converts the current mediation over to using the domain labeling
       base, which brings domain stacking support with it. This finally
       will bring the base upstream code in line with Ubuntu and provide
       a base to upstream the new feature work that Ubuntu carries.

     * This does _not_ contain any of the newer apparmor mediation
       features/controls (mount, signals, network, keys, ...) that
       Ubuntu is currently carrying, all of which will be RFC'd on top
       of this.

 - Notable also is the Infiniband work in SELinux, and the new file:map
   permission. From Paul:

      "While we're down to 21 patches for v4.13 (it was 31 for v4.12),
       the diffstat jumps up tremendously with over 2k of line changes.

       Almost all of these changes are the SELinux/IB work done by
       Daniel Jurgens; some other noteworthy changes include a NFS v4.2
       labeling fix, a new file:map permission, and reporting of policy
       capabilities on policy load"

   There's also now genfscon labeling support for tracefs, which was
   lost in v4.1 with the separation from debugfs.

 - Smack incorporates a safer socket check in file_receive, and adds a
   cap_capable call in privilege check.

 - TPM as usual has a bunch of fixes and enhancements.

 - Multiple calls to security_add_hooks() can now be made for the same
   LSM, to allow LSMs to have hook declarations across multiple files.

 - IMA now supports different "ima_appraise=" modes (eg. log, fix) from
   the boot command line.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (126 commits)
  apparmor: put back designators in struct initialisers
  seccomp: Switch from atomic_t to recount_t
  seccomp: Adjust selftests to avoid double-join
  seccomp: Clean up core dump logic
  IMA: update IMA policy documentation to include pcr= option
  ima: Log the same audit cause whenever a file has no signature
  ima: Simplify policy_func_show.
  integrity: Small code improvements
  ima: fix get_binary_runtime_size()
  ima: use ima_parse_buf() to parse template data
  ima: use ima_parse_buf() to parse measurements headers
  ima: introduce ima_parse_buf()
  ima: Add cgroups2 to the defaults list
  ima: use memdup_user_nul
  ima: fix up #endif comments
  IMA: Correct Kconfig dependencies for hash selection
  ima: define is_ima_appraise_enabled()
  ima: define Kconfig IMA_APPRAISE_BOOTPARAM option
  ima: define a set of appraisal rules requiring file signatures
  ima: extend the "ima_policy" boot command line to support multiple policies
  ...
2017-07-05 11:26:35 -07:00
Linus Torvalds
7391786a64 Merge branch 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit
Pull audit updates from Paul Moore:
 "Things are relatively quiet on the audit front for v4.13, just five
  patches for a total diffstat of 102 lines.

  There are two patches from Richard to consistently record the POSIX
  capabilities and add the ambient capability information as well.

  I also chipped in two patches to fix a race condition with the auditd
  tracking code and ensure we don't skip sending any records to the
  audit multicast group.

  Finally a single style fix that I accepted because I must have been in
  a good mood that day.

  Everything passes our test suite, and should be relatively harmless,
  please merge for v4.13"

* 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit:
  audit: make sure we never skip the multicast broadcast
  audit: fix a race condition with the auditd tracking code
  audit: style fix
  audit: add ambient capabilities to CAPSET and BPRM_FCAPS records
  audit: unswing cap_* fields in PATH records
2017-07-05 11:24:05 -07:00
Linus Torvalds
eed1fc8779 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:

 - Store printk() messages into the main log buffer directly even in NMI
   when the lock is available. It is the best effort to print even large
   chunk of text. It is handy, for example, when all ftrace messages are
   printed during the system panic in NMI.

 - Add missing annotations to calm down compiler warnings

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  printk: add __printf attributes to internal functions
  printk: Use the main logbuf in NMI when logbuf_lock is available
2017-07-05 11:11:26 -07:00
Jeffrey Hugo
65a4433aeb sched/fair: Fix load_balance() affinity redo path
If load_balance() fails to migrate any tasks because all tasks were
affined, load_balance() removes the source CPU from consideration and
attempts to redo and balance among the new subset of CPUs.

There is a bug in this code path where the algorithm considers all active
CPUs in the system (minus the source that was just masked out).  This is
not valid for two reasons: some active CPUs may not be in the current
scheduling domain and one of the active CPUs is dst_cpu. These CPUs should
not be considered, as we cannot pull load from them.

Instead of failing out of load_balance(), we may end up redoing the search
with no valid CPUs and incorrectly concluding the domain is balanced.
Additionally, if the group_imbalance flag was just set, it may also be
incorrectly unset, thus the flag will not be seen by other CPUs in future
load_balance() runs as that algorithm intends.

Fix the check by removing CPUs not in the current domain and the dst_cpu
from considertation, thus limiting the evaluation to valid remaining CPUs
from which load might be migrated.

Co-authored-by: Austin Christ <austinwc@codeaurora.org>
Co-authored-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Austin Christ <austinwc@codeaurora.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Timur Tabi <timur@codeaurora.org>
Link: http://lkml.kernel.org/r/1496863138-11322-2-git-send-email-jhugo@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 16:28:48 +02:00
Ingo Molnar
86d35afb8e MAINTAINERS: Add Frederic Weisbecker as nohz/dyntics maintainer
Frederic has been improving and maintaining the nohz/dynticks kernel features
for years, so make his de facto maintainership official.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 16:26:37 +02:00
Steven Rostedt (VMware)
69d71879d2 ftrace: Test for NULL iter->tr in regex for stack_trace_filter changes
As writing into stack_trace_filter, the iter-tr is not set and is NULL.
Check if it is NULL before dereferencing it in ftrace_regex_release().

Fixes: 8c08f0d5c6 ("ftrace: Have cached module filters be an active filter")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-07-05 09:52:18 -04:00
Steven Rostedt (VMware)
4dce17b26b Merge commit '0f17976568b3f72e676450af0c0db6f8752253d6' into trace/ftrace/core
Need to get the changes from 0f17976568 ("ftrace: Fix regression with
module command in stack_trace_filter") as it is required to fix some other
changes with stack_trace_filter and the new development code.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-07-05 09:51:24 -04:00
Rob Herring
a4485b545e Merge branch 'dt/property-move' into dt/next 2017-07-05 08:31:52 -05:00
Rob Herring
b8ba92b101 Merge branch 'topic/of-graph-base' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound into dt/property-move
OF graph changes for ALSA conflict with the move of graph functions into
property.c.
2017-07-05 08:24:05 -05:00
Arvind Yadav
29695254ec GFS2: constify attribute_group structures.
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.

File size before:
   text	   data	    bss	    dec	    hex	filename
   5259	   1344	      8	   6611	   19d3	fs/gfs2/sys.o

File size After adding 'const':
   text	   data	    bss	    dec	    hex	filename
   5371	   1216	      8	   6595	   19c3	fs/gfs2/sys.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:21:14 -05:00
Andreas Gruenbacher
e0b62e21b7 gfs2: gfs2_create_inode: Keep glock across iput
On failure, keep the inode glock across the final iput of the new inode
so that gfs2_evict_inode doesn't have to re-acquire the glock.  That
way, gfs2_evict_inode won't need to revalidate the block type.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:21:07 -05:00
Andreas Gruenbacher
6b0c7440bc gfs2: Clean up glock work enqueuing
This patch adds a standardized queueing mechanism for glock work
with spin_lock protection to prevent races.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:21:00 -05:00
Andreas Gruenbacher
6f6597baae gfs2: Protect gl->gl_object by spin lock
Put all remaining accesses to gl->gl_object under the
gl->gl_lockref.lock spinlock to prevent races.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:20:52 -05:00
Andreas Gruenbacher
4fd1a57952 gfs2: Get rid of flush_delayed_work in gfs2_evict_inode
So far, gfs2_evict_inode clears gl->gl_object and then flushes the glock
work queue to make sure that inode glops which dereference gl->gl_object
have finished running before the inode is destroyed.  However, flushing
the work queue may do more work than needed, and in particular, it may
call into DLM, which we want to avoid here.  Use a bit lock
(GIF_GLOP_PENDING) to synchronize between the inode glops and
gfs2_evict_inode instead to get rid of the flushing.

In addition, flush the work queues of existing glocks before reusing
them for new inodes to get those glocks into a known state: the glock
state engine currently doesn't handle glock re-appropriation correctly.
(We may be able to fix the glock state engine instead later.)

Based on a patch by Steven Whitehouse <swhiteho@redhat.com>.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:20:24 -05:00
Kirill Tkhai
a0c4acd2c2 locking/rwsem-spinlock: Fix EINTR branch in __down_write_common()
If a writer could been woken up, the above branch

	if (sem->count == 0)
		break;

would have moved us to taking the sem. So, it's
not the time to wake a writer now, and only readers
are allowed now. Thus, 0 must be passed to __rwsem_do_wake().

Next, __rwsem_do_wake() wakes readers unconditionally.
But we mustn't do that if the sem is owned by writer
in the moment. Otherwise, writer and reader own the sem
the same time, which leads to memory corruption in
callers.

rwsem-xadd.c does not need that, as:

  1) the similar check is made lockless there,
  2) in __rwsem_mark_wake::try_reader_grant we test,

that sem is not owned by writer.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Niklas Cassel <niklas.cassel@axis.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 17fcbd590d "locking/rwsem: Fix down_write_killable() for CONFIG_RWSEM_GENERIC_SPINLOCK=y"
Link: http://lkml.kernel.org/r/149762063282.19811.9129615532201147826.stgit@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 12:26:29 +02:00
Russell King
98becb781e Merge branches 'fixes' and 'misc' into for-linus 2017-07-05 11:06:59 +01:00
David S. Miller
0e72582270 Merge branch 'phy-dp83867-workaround-incorrect-RX_CTRL-pin-strap'
Sekhar Nori says:

====================
net: phy: dp83867: workaround incorrect RX_CTRL pin strap

This patch series adds workaround for incorrect RX_CTRL pin strap
setting that can be found on some TI boards.

This is required to be complaint to PHY datamanual specification.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:23:53 +01:00
Murali Karicheri
371444764b net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
The data manual for DP83867IR/CR, SNLS484E[1], revised march 2017,
advises that strapping RX_DV/RX_CTRL pin in mode 1 and 2 is not
supported (see note below Table 5 (4-Level Strap Pins)).

There are some boards which have the pin strapped this way and need
software workaround suggested by the data manual. Bit[7] of
Configuration Register 4 (address 0x0031) must be cleared to 0. This
ensures proper operation of the PHY.

Implement driver support for device-tree property meant to advertise
the wrong strapping.

[1] http://www.ti.com/lit/ds/snls484e/snls484e.pdf

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
[nsekhar@ti.com: rebase to mainline, code simplification]
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:23:53 +01:00
Murali Karicheri
908a773325 dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
The data manual for DP83867IR/CR, SNLS484E[1], revised march 2017,
advises that strapping RX_DV/RX_CTRL pin in mode 1 and 2 is not
supported (see note below Table 5 (4-Level Strap Pins)).

It further advises that if a board has this pin strapped in mode 1 and
mode 2, then to ensure proper operation of the PHY, a software workaround
must be implemented.

Since it is not possible to detect in software if RX_DV/RX_CTRL pin is
incorrectly strapped, add a device-tree property for the board to
advertise this and allow corrective action in software.

[1] http://www.ti.com/lit/ds/snls484e/snls484e.pdf

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
[nsekhar@ti.com: rebase to mainline, split documentation into separate patch]
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:23:53 +01:00
David S. Miller
a778427efc Merge branch 'cxgb4-ptp'
Atul Gupta says:

====================
cxgb4: Add PTP Hardware Clock (PHC) support

V4:
	Splitting the patch again
V3:
        Releasing lock in the exit paths
V2:
        Splitting the patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:21:54 +01:00
Atul Gupta
c3ff08eba9 cxgb4: Support for get_ts_info ethtool method
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:21:54 +01:00
Atul Gupta
9c33e4208b cxgb4: Add PTP Hardware Clock (PHC) support
Add PTP IEEE-1588 support and make it accessible via PHC subsystem.
The functionality is enabled for T5/T6 adapters. Driver interfaces with
Firmware to program and adjust the clock offset.

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:21:54 +01:00
Atul Gupta
a456950445 cxgb4: time stamping interface for PTP
Supports hardware and software time stamping via the
Linux SO_TIMESTAMPING socket option.

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:21:54 +01:00
David S. Miller
97d731d810 Merge branch 'nfp-port-enumeration-change-and-FW-ABI-adjustment'
Jakub Kicinski says:

====================
nfp: port enumeration change and FW ABI adjustment

This set changes the way ports are numbered internally to avoid MAC address
changes and invalid link information when breakout is configured.  Second
patch gets rid of old way of looking up MAC addresses in device information
which caused all this confusion.

Patch 3 is a small adjustment to the new FW ABI version we introduced in
this release cycle.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:13:08 +01:00
Jakub Kicinski
64a919a944 nfp: default to chained metadata prepend format
ABI 4.x introduced the chained metadata format and made it the
only one possible.  There are cases, however, where the old
format is preferred - mostly to make interoperation with VFs
using ABI 3.x easier for the datapath.  In ABI 5.x we allowed
for more flexibility by selecting the metadata format based
on capabilities.  The default was left to non-chained.

In case of fallback traffic, there is no capability telling the
driver there may be chained metadata.  With a very stripped-
-down FW the default old metadata format would be selected
making the driver drop all fallback traffic.

This patch changes the default selection in the driver. It
should not hurt with old firmwares, because if they don't
advertise RSS they will not produce metadata anyway.  New
firmwares advertising ABI 5.x, however, can depend on the
driver defaulting to chained format.

Fixes: f9380629fa ("nfp: advertise support for NFD ABI 0.5")
Suggested-by: Michael Rapson <michael.rapson@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:13:07 +01:00
Jakub Kicinski
cb2cda4848 nfp: remove legacy MAC address lookup
The legacy MAC address lookup doesn't work well with breakout
cables.  We are probably better off picking random addresses
than the wrong ones in the theoretical scenario where management
FW didn't tell us what the port config is.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:13:07 +01:00
Jakub Kicinski
2eb333c4b4 nfp: improve order of interfaces in breakout mode
For historical reasons we enumerate the vNICs in order.  This means
that if user configures breakout on a multiport card, the first
interface of the second port will have its MAC address changed.

What's worse, when moved from static information (HWInfo) to using
management FW (NSP), more features started depending on the port ids.
Right now in case of breakout first subport of the second port and
second subport of the first port will have their link info swapped.

Revise the ordering scheme so that first subport maintains its address.
Side effect of this change is that we will use base lane ids in
devlink (i.e. 40G ports will be 4 ids apart), e.g.:

pci/0000:04:00.0/0: type eth netdev p6p1
pci/0000:04:00.0/4: type eth netdev p6p2

Note that behaviour of phys_port_id is not changed since there is
a separate id number for the subport there.

Fixes: ec8b1fbe68 ("nfp: support port splitting via devlink")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:13:07 +01:00
Colin Ian King
42af627b40 net: macb: remove extraneous return when MACB_EXT_DESC is defined
When macro MACB_EXT_DESC is defined we end up with two identical
return statements and just one is sufficient. Remove the extra
return.

Detected by CoverityScan, CID#1449361 ("Structurally dead code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:09:36 +01:00
Chen Yu
12df216c61 x86/boot/e820: Introduce the bootloader provided e820_table_firmware[] table
Add the real e820_tabel_firmware[] that will not be modified by the kernel
or the EFI boot stub under any circumstance.

In addition to that modify the code so that e820_table_firmwarep[] is
exposed via sysfs to represent the real firmware memory layout,
rather than exposing the e820_table_kexec[] table.

This fixes a hibernation bug/warning, which uses e820_table_kexec[] to check
RAM layout consistency across hibernation/resume:

  The suspend kernel:
  [    0.000000] e820: update [mem 0x76671018-0x76679457] usable ==> usable

  The resume kernel:
  [    0.000000] e820: update [mem 0x7666f018-0x76677457] usable ==> usable
  ...
  [   15.752088] PM: Using 3 thread(s) for decompression.
  [   15.752088] PM: Loading and decompressing image data (471870 pages)...
  [   15.764971] Hibernate inconsistent memory map detected!
  [   15.770833] PM: Image mismatch: architecture specific data

Actually it is safe to restore these pages because E820_TYPE_RAM and
E820_TYPE_RESERVED_KERN are treated the same during hibernation, so
the original e820 table provided by the bootloader is used for
hibernation MD5 fingerprint checking.

The side effect is that, this newly introduced variable might increase the
kernel size at compile time.

Suggested-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:09:02 +02:00
Chen Yu
a09bae0f8a x86/boot/e820: Rename the e820_table_firmware to e820_table_kexec
Currently the e820_table_firmware[] table is mainly used by the kexec,
and it is not what it's supposed to be - despite its name it might be
modified by the kernel.

So change its name to e820_table_kexec[]. In the next patch we will
introduce the real e820_table_firmware[] table.

No functional change.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:09:02 +02:00
Chen Yu
b7a67e02cd x86/boot/e820: Avoid overwriting e820_table_firmware
The following commit in 2013:

  77ea8c9489 ("x86: Reserve setup_data ranges late after parsing memmap cmdline")

has fixed the issue of losing setup_data information by deferring the
e820_reserve_setup_data() call until the early params have been parsed.

But this also introduced a new problem that, during early params parsing,
the kexec kernel might fake a mptable and saves it into the e820_table_firmware[]
table (without saving the mptable to the e820_table[]), however the subsequent
invoking of e820_reserve_setup_data() will overwrite the e820_table_firmware[]
according to the e820_table[], thus the fake mptable information is lost.

Fix this issue by updating the e820_table_firmware[] according to
the setup_data information, but without overwriting it.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:09:02 +02:00
Colin Ian King
6d3f06a004 bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
There appears to be a missing break in the TCP_BPF_SNDCWND_CLAMP case.
Currently the non-error path where val is greater than zero falls through
to the default case that sets the error return to -EINVAL. Add in
the missing break.

Detected by CoverityScan, CID#1449376 ("Missing break in switch")

Fixes: 13bf96411a ("bpf: Adds support for setting sndcwnd clamp")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:08:54 +01:00
Lawrence Brakmo
f856e46978 bpf: fix return in load_bpf_file
The function load_bpf_file ignores the return value of
load_and_attach(), so even if load_and_attach() returns an error,
load_bpf_file() will return 0.

Now, load_bpf_file() can call load_and_attach() multiple times and some
can succeed and some could fail. I think the correct behavor is to
return error on the first failed load_and_attach().

v2: Added missing SOB

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:05:28 +01:00
Roopa Prabhu
ca4a1cd930 mpls: fix rtm policy in mpls_getroute
fix rtm policy name typo in mpls_getroute and also remove
export of rtm_ipv4_policy

Fixes: 397fc9e5ce ("mpls: route get support")
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:04:32 +01:00
Wanpeng Li
2a42eb9594 sched/cputime: Accumulate vtime on top of nsec clocksource
Currently the cputime source used by vtime is jiffies. When we cross
a context boundary and jiffies have changed since the last snapshot, the
pending cputime is accounted to the switching out context.

This system works ok if the ticks are not aligned across CPUs. If they
instead are aligned (ie: all fire at the same time) and the CPUs run in
userspace, the jiffies change is only observed on tick exit and therefore
the user cputime is accounted as system cputime. This is because the
CPU that maintains timekeeping fires its tick at the same time as the
others. It updates jiffies in the middle of the tick and the other CPUs
see that update on IRQ exit:

    CPU 0 (timekeeper)                  CPU 1
    -------------------              -------------
                      jiffies = N
    ...                              run in userspace for a jiffy
    tick entry                       tick entry (sees jiffies = N)
    set jiffies = N + 1
    tick exit                        tick exit (sees jiffies = N + 1)
                                                account 1 jiffy as stime

Fix this with using a nanosec clock source instead of jiffies. The
cputime is then accumulated and flushed everytime the pending delta
reaches a jiffy in order to mitigate the accounting overhead.

[ fweisbec: changelog, rebase on struct vtime, field renames, add delta
  on cputime readers, keep idle vtime as-is (low overhead accounting),
  harmonize clock sources. ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1498756511-11714-6-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 09:54:15 +02:00
Frederic Weisbecker
bac5b6b6b1 sched/cputime: Move the vtime task fields to their own struct
We are about to add vtime accumulation fields to the task struct. Let's
avoid more bloatification and gather vtime information to their own
struct.

Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1498756511-11714-5-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 09:54:15 +02:00
Frederic Weisbecker
60a9ce57e7 sched/cputime: Rename vtime fields
The current "snapshot" based naming on vtime fields suggests we record
some past event but that's a low level picture of their actual purpose
which comes out blurry. The real point of these fields is to run a basic
state machine that tracks down cputime entry while switching between
contexts.

So lets reflect that with more meaningful names.

Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1498756511-11714-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 09:54:14 +02:00
Frederic Weisbecker
9fa57cf5a5 sched/cputime: Always set tsk->vtime_snap_whence after accounting vtime
Even though it doesn't have functional consequences, setting
the task's new context state after we actually accounted the pending
vtime from the old context state makes more sense from a review
perspective.

vtime_user_exit() is the only function that doesn't follow that rule
and that can bug the reviewer for a little while until he realizes there
is no reason for this special case.

Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1498756511-11714-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 09:54:14 +02:00
Frederic Weisbecker
1c3eda01a7 vtime, sched/cputime: Remove vtime_account_user()
It's an unnecessary function between vtime_user_exit() and
account_user_time().

Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1498756511-11714-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 09:54:14 +02:00
Herbert Xu
035f901eac Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Merge the crypto tree to pull in fixes for the next merge window.
2017-07-05 15:17:26 +08:00
Ingo Molnar
524b62fdbe perf/urgent fixes:
User visible:
 
 - Fix max attr.precise_ip probing to make perf use the best cycles:p
   available in the processor for non root users (Arnaldo Carvalho de Melo)
 
 - Fix processing of MMAP events for 32-bit binaries on 64-bit systems
   when unwind support is not fully integrated, fixing DSO and symbol
   resolution (Jiri Olsa)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZW+uQAAoJENZQFvNTUqpARHYP/jzthQY9jH6BtLwntItNc8EY
 7zweMpPbdRxXqILfkeEpGiGsH13j8+ZdUlg7Q07yuS2hSJJLYn3WPLb5kfVjDmKP
 IeSkJSsGi448RCWr9yQiOIeVbU07vCb3fdcDK6E485n5gU6jjlfvVF9odt1Yg+PY
 jNs6XSZDbi5GGMA2CpqHYFjRCQbst7ucF6MLlEF3CK6qY9TtiM1UrWwEoJgiKoab
 rnwP9pe2om1KKbBaKE8jSS42yw1KJkvrE6To3cD0HwrnWBufWDZmwrF4Ba5UsClK
 2q3uMJMOMzFOZaWAw1NkW5JfSb0iBxOMnFZXgng+zKsubHlkAkY7j+GhgV6ndSXX
 viuBR6fFhC37xwHSzgw2z0LIwj8VMmsppZWrgB+El9PUiRJ3qIkooXVPaSV+Eet4
 4XfIg4m1L1hlzp5OskV4H6Rh5cqp2g0mmr0iDMSJZVfMC4sCNBSQOT7l7Sks0EWV
 4MeLdFcwIXFYjecu/MZ3H+EI2OIi4/KmuPgwyQmVW09tI00qXIQGFHnq9Q4InneU
 jAwBxn2qNerXM2DS0Zw55rUWakjYQH2wq6MXFPTmyZXlibiFpbBhQjE9SyafoURN
 bC2ago2QXny12WZEwPp3PAYJb6VWhL3W/z00vyawdso1iQJU2QLU4N54+qPq/iXn
 4ng+dI3ZeaZQXAT8X6KS
 =VehJ
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo-4.12-20170704' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

User visible changes:

 - Fix max attr.precise_ip probing to make perf use the best cycles:p
   available in the processor for non root users (Arnaldo Carvalho de Melo)

 - Fix processing of MMAP events for 32-bit binaries on 64-bit systems
   when unwind support is not fully integrated, fixing DSO and symbol
   resolution (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 09:10:37 +02:00
Mikulas Patocka
99c13b8c88 x86/mm/pat: Don't report PAT on CPUs that don't support it
The pat_enabled() logic is broken on CPUs which do not support PAT and
where the initialization code fails to call pat_init(). Due to that the
enabled flag stays true and pat_enabled() returns true wrongfully.

As a consequence the mappings, e.g. for Xorg, are set up with the wrong
caching mode and the required MTRR setups are omitted.

To cure this the following changes are required:

  1) Make pat_enabled() return true only if PAT initialization was
     invoked and successful.

  2) Invoke init_cache_modes() unconditionally in setup_arch() and
     remove the extra callsites in pat_disable() and the pat disabled
     code path in pat_init().

Also rename __pat_enabled to pat_disabled to reflect the real purpose of
this variable.

Fixes: 9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bernhard Held <berny156@gmx.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: "Luis R. Rodriguez" <mcgrof@suse.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1707041749300.3456@file01.intranet.prod.int.rdu2.redhat.com
2017-07-05 09:01:24 +02:00
Cornelia Huck
0c6b2975a9 Update my email address
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-05 07:35:30 +02:00
Jiri Olsa
c46fc0424c s390/syscalls: Fix out of bounds arguments access
Zorro reported following crash while having enabled
syscall tracing (CONFIG_FTRACE_SYSCALLS):

  Unable to handle kernel pointer dereference at virtual ...
  Oops: 0011 [#1] SMP DEBUG_PAGEALLOC

  SNIP

  Call Trace:
  ([<000000000024d79c>] ftrace_syscall_enter+0xec/0x1d8)
   [<00000000001099c6>] do_syscall_trace_enter+0x236/0x2f8
   [<0000000000730f1c>] sysc_tracesys+0x1a/0x32
   [<000003fffcf946a2>] 0x3fffcf946a2
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
   [<000000000022dd44>] rb_event_data+0x34/0x40
  ---[ end trace 8c795f86b1b3f7b9 ]---

The crash happens in syscall_get_arguments function for
syscalls with zero arguments, that will try to access
first argument (args[0]) in event entry, but it's not
allocated.

Bail out of there are no arguments.

Cc: stable@vger.kernel.org
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-05 07:35:30 +02:00
Sebastian Ott
c14b7a85be s390/vfio_ccw: remove unused variable
Fix this set but not used warning:

drivers/s390/cio/vfio_ccw_drv.c: In function 'vfio_ccw_sch_io_todo':
drivers/s390/cio/vfio_ccw_drv.c:72:21: warning: variable 'sch' set but not used [-Wunused-but-set-variable]
  struct subchannel *sch;
                     ^

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-05 07:35:29 +02:00
Sebastian Ott
4bca698ffe s390/dasd: remove unneeded code
Fix these set but not used warnings:

drivers/s390/block/dasd.c:3933:6: warning: variable 'rc' set but not used [-Wunused-but-set-variable]
drivers/s390/block/dasd_alias.c:757:6: warning: variable 'rc' set but not used [-Wunused-but-set-variable]

In addition to that remove the test if an unsigned is < 0:

drivers/s390/block/dasd_devmap.c:153:11: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-05 07:35:29 +02:00
Michael Holzheu
cd0ae1d395 s390/crash: Remove unused KEXEC_NOTE_BYTES
After commmit 692f66f26a ("crash: move crashkernel parsing and vmcore
related code under CONFIG_CRASH_CORE") the KEXEC_NOTE_BYTES macro is not
used anymore and for s390 we create the ELF header in the new kernel
anyway. Therefore remove the macro.

Reported-by: Xunlei Pang <xpang@redhat.com>
Reviewed-by: Mikhail Zaslonko <zaslonko@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-05 07:35:29 +02:00
Harald Freudenberger
792e0e0022 s390/zcrypt: Fix missing newlines at some debug feature messages.
On some debug feature invocations the newline was missing.

Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-05 07:35:29 +02:00
Jan Höppner
9d2be0c1d4 s390/dasd: Make raw I/O usable without prefix support
The Prefix CCW is not mandatory and raw I/O can also be issued without
it. Check whether the Prefix CCW is supported and if not use the
combination of Define Extent and Locate Record Extended instead.

While at it, sort the variable declarations, replace the gotos with
early exits, and remove an error check at the end which is irrelevant.
Also, remove the XRC check as it is not relevant for raw I/O.

Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-05 07:35:29 +02:00