Commit Graph

26649 Commits

Author SHA1 Message Date
Michal Hocko
d6cb41cc44 mm, hugetlb: remove hugepages_treat_as_movable sysctl
hugepages_treat_as_movable has been introduced by 396faf0303 ("Allow
huge page allocations to use GFP_HIGH_MOVABLE") to allow hugetlb
allocations from ZONE_MOVABLE even when hugetlb pages were not
migrateable.  The purpose of the movable zone was different at the time.
It aimed at reducing memory fragmentation and hugetlb pages being long
lived and large werre not contributing to the fragmentation so it was
acceptable to use the zone back then.

Things have changed though and the primary purpose of the zone became
migratability guarantee.  If we allow non migrateable hugetlb pages to
be in ZONE_MOVABLE memory hotplug might fail to offline the memory.

Remove the knob and only rely on hugepage_migration_supported to allow
movable zones.

Mel said:

: Primarily it was aimed at allowing the hugetlb pool to safely shrink with
: the ability to grow it again.  The use case was for batched jobs, some of
: which needed huge pages and others that did not but didn't want the memory
: useless pinned in the huge pages pool.
:
: I suspect that more users rely on THP than hugetlbfs for flexible use of
: huge pages with fallback options so I think that removing the option
: should be ok.

Link: http://lkml.kernel.org/r/20171003072619.8654-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Alexandru Moise <00moses.alexander00@gmail.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Alexandru Moise <00moses.alexander00@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:37 -08:00
Andrew Morton
d70f2a14b7 include/linux/sched/mm.h: uninline mmdrop_async(), etc
mmdrop_async() is only used in fork.c.  Move that and its support
functions into fork.c, uninline it all.

Quite a lot of code gets moved around to avoid forward declarations.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:36 -08:00
Linus Torvalds
28bc6fb959 SCSI misc on 20180131
This is mostly updates of the usual driver suspects: arcmsr,
 scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas,
 hisi_sas.  We also have a rework of the libsas hotplug handling to
 make it more robust, a slew of 32 bit time conversions and fixes, and
 a host of the usual minor updates and style changes.  The biggest
 potential for regressions is the libsas hotplug changes, but so far
 they seem stable under testing.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWnH+5SYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishWxuAP0UvuJp
 MNR/yU/wv/emSzOc48Ldwd7I0xD2XxSnloGUgwD+IGZZT5yNUQA1THCbm+en4hkB
 WvyBieQs9qRit+2czd4=
 =gJMf
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This is mostly updates of the usual driver suspects: arcmsr,
  scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas,
  hisi_sas.

  We also have a rework of the libsas hotplug handling to make it more
  robust, a slew of 32 bit time conversions and fixes, and a host of the
  usual minor updates and style changes. The biggest potential for
  regressions is the libsas hotplug changes, but so far they seem stable
  under testing"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (313 commits)
  scsi: qla2xxx: Fix logo flag for qlt_free_session_done()
  scsi: arcmsr: avoid do_gettimeofday
  scsi: core: Add VENDOR_SPECIFIC sense code definitions
  scsi: qedi: Drop cqe response during connection recovery
  scsi: fas216: fix sense buffer initialization
  scsi: ibmvfc: Remove unneeded semicolons
  scsi: hisi_sas: fix a bug in hisi_sas_dev_gone()
  scsi: hisi_sas: directly attached disk LED feature for v2 hw
  scsi: hisi_sas: devicetree: bindings: add LED feature for v2 hw
  scsi: megaraid_sas: NVMe passthrough command support
  scsi: megaraid: use ktime_get_real for firmware time
  scsi: fnic: use 64-bit timestamps
  scsi: qedf: Fix error return code in __qedf_probe()
  scsi: devinfo: fix format of the device list
  scsi: qla2xxx: Update driver version to 10.00.00.05-k
  scsi: qla2xxx: Add XCB counters to debugfs
  scsi: qla2xxx: Fix queue ID for async abort with Multiqueue
  scsi: qla2xxx: Fix warning for code intentation in __qla24xx_handle_gpdb_event()
  scsi: qla2xxx: Fix warning during port_name debug print
  scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
  ...
2018-01-31 11:23:28 -08:00
Linus Torvalds
8b0fdf631c Merge branch 'work.mqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mqueue/bpf vfs cleanups from Al Viro:
 "mqueue and bpf go through rather painful and similar contortions to
  create objects in their dentry trees. Provide a primitive for doing
  that without abusing ->mknod(), switch bpf and mqueue to it.

  Another mqueue-related thing that has ended up in that branch is
  on-demand creation of internal mount (based upon the work of Giuseppe
  Scrivano)"

* 'work.mqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mqueue: switch to on-demand creation of internal mount
  tidy do_mq_open() up a bit
  mqueue: clean prepare_open() up
  do_mq_open(): move all work prior to dentry_open() into a helper
  mqueue: fold mq_attr_ok() into mqueue_get_inode()
  move dentry_open() calls up into do_mq_open()
  mqueue: switch to vfs_mkobj(), quit abusing ->d_fsdata
  bpf_obj_do_pin(): switch to vfs_mkobj(), quit abusing ->mknod()
  new primitive: vfs_mkobj()
2018-01-30 18:32:21 -08:00
Linus Torvalds
168fe32a07 Merge branch 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull poll annotations from Al Viro:
 "This introduces a __bitwise type for POLL### bitmap, and propagates
  the annotations through the tree. Most of that stuff is as simple as
  'make ->poll() instances return __poll_t and do the same to local
  variables used to hold the future return value'.

  Some of the obvious brainos found in process are fixed (e.g. POLLIN
  misspelled as POLL_IN). At that point the amount of sparse warnings is
  low and most of them are for genuine bugs - e.g. ->poll() instance
  deciding to return -EINVAL instead of a bitmap. I hadn't touched those
  in this series - it's large enough as it is.

  Another problem it has caught was eventpoll() ABI mess; select.c and
  eventpoll.c assumed that corresponding POLL### and EPOLL### were
  equal. That's true for some, but not all of them - EPOLL### are
  arch-independent, but POLL### are not.

  The last commit in this series separates userland POLL### values from
  the (now arch-independent) kernel-side ones, converting between them
  in the few places where they are copied to/from userland. AFAICS, this
  is the least disruptive fix preserving poll(2) ABI and making epoll()
  work on all architectures.

  As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
  it will trigger only on what would've triggered EPOLLWRBAND on other
  architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
  at all on sparc. With this patch they should work consistently on all
  architectures"

* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
  make kernel-side POLL... arch-independent
  eventpoll: no need to mask the result of epi_item_poll() again
  eventpoll: constify struct epoll_event pointers
  debugging printk in sg_poll() uses %x to print POLL... bitmap
  annotate poll(2) guts
  9p: untangle ->poll() mess
  ->si_band gets POLL... bitmap stored into a user-visible long field
  ring_buffer_poll_wait() return value used as return value of ->poll()
  the rest of drivers/*: annotate ->poll() instances
  media: annotate ->poll() instances
  fs: annotate ->poll() instances
  ipc, kernel, mm: annotate ->poll() instances
  net: annotate ->poll() instances
  apparmor: annotate ->poll() instances
  tomoyo: annotate ->poll() instances
  sound: annotate ->poll() instances
  acpi: annotate ->poll() instances
  crypto: annotate ->poll() instances
  block: annotate ->poll() instances
  x86: annotate ->poll() instances
  ...
2018-01-30 17:58:07 -08:00
Linus Torvalds
13ddd1667e Merge branch 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "Nothing too interesting. Documentation updates and trivial changes;
  however, this pull request does containt he previusly discussed
  dropping of __must_check from strscpy()"

* 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  Documentation: Fix 'file_mapped' -> 'mapped_file'
  string: drop __must_check from strscpy() and restore strscpy() usages in cgroup
  cgroup, docs: document the root cgroup behavior of cpu and io controllers
  cgroup-v2.txt: fix typos
  cgroup: Update documentation reference
  Documentation/cgroup-v1: fix outdated programming details
  cgroup, docs: document cgroup v2 device controller
2018-01-30 15:09:47 -08:00
Linus Torvalds
f8cc87b6c1 Merge branch 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Workqueue has an early init trick where workqueues can be created and
  work items queued on them before the workqueue subsystem is online.
  This helps simplifying early init and operation of low level
  subsystems which use workqueues for managerial things which aren't
  depended upon early during boot.

  Out of laziness, the early init didn't cover workqueues with
  WQ_MEM_RECLAIM, which is inconsistent and confusing because adding the
  flag simply makes the system fail to boot. Cover WQ_MEM_RECLAIM too.

  This was originally brought up for RCU but RCU didn't actually need
  this. I still think it's a good idea to cover it"

* 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: allow WQ_MEM_RECLAIM on early init workqueues
  workqueue: separate out init_rescuer()
2018-01-30 14:45:39 -08:00
Linus Torvalds
2afe738fc0 Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns updates from Eric Biederman:
 "Between the holidays and other distractions only a small amount of
  namespace work made it into my tree this time.

  Just a final cleanup from a revert several kernels ago and a small
  typo fix from Wolffhardt Schwabe"

* 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  fix typo in assignment of fs default overflow gid
  autofs4: Modify autofs_wait to use current_uid() and current_gid()
  userns: Don't fail follow_automount based on s_user_ns
2018-01-30 14:43:12 -08:00
Linus Torvalds
d4173023e6 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo cleanups from Eric Biederman:
 "Long ago when 2.4 was just a testing release copy_siginfo_to_user was
  made to copy individual fields to userspace, possibly for efficiency
  and to ensure initialized values were not copied to userspace.

  Unfortunately the design was complex, it's assumptions unstated, and
  humans are fallible and so while it worked much of the time that
  design failed to ensure unitialized memory is not copied to userspace.

  This set of changes is part of a new design to clean up siginfo and
  simplify things, and hopefully make the siginfo handling robust enough
  that a simple inspection of the code can be made to ensure we don't
  copy any unitializied fields to userspace.

  The design is to unify struct siginfo and struct compat_siginfo into a
  single definition that is shared between all architectures so that
  anyone adding to the set of information shared with struct siginfo can
  see the whole picture. Hopefully ensuring all future si_code
  assignments are arch independent.

  The design is to unify copy_siginfo_to_user32 and
  copy_siginfo_from_user32 so that those function are complete and cope
  with all of the different cases documented in signinfo_layout. I don't
  think there was a single implementation of either of those functions
  that was complete and correct before my changes unified them.

  The design is to introduce a series of helpers including
  force_siginfo_fault that take the values that are needed in struct
  siginfo and build the siginfo structure for their callers. Ensuring
  struct siginfo is built correctly.

  The remaining work for 4.17 (unless someone thinks it is post -rc1
  material) is to push usage of those helpers down into the
  architectures so that architecture specific code will not need to deal
  with the fiddly work of intializing struct siginfo, and then when
  struct siginfo is guaranteed to be fully initialized change copy
  siginfo_to_user into a simple wrapper around copy_to_user.

  Further there is work in progress on the issues that have been
  documented requires arch specific knowledge to sort out.

  The changes below fix or at least document all of the issues that have
  been found with siginfo generation. Then proceed to unify struct
  siginfo the 32 bit helpers that copy siginfo to and from userspace,
  and generally clean up anything that is not arch specific with regards
  to siginfo generation.

  It is a lot but with the unification you can of siginfo you can
  already see the code reduction in the kernel"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (45 commits)
  signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr
  mm/memory_failure: Remove unused trapno from memory_failure
  signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed
  signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap
  signal: Helpers for faults with specialized siginfo layouts
  signal: Add send_sig_fault and force_sig_fault
  signal: Replace memset(info,...) with clear_siginfo for clarity
  signal: Don't use structure initializers for struct siginfo
  signal/arm64: Better isolate the COMPAT_TASK portion of ptrace_hbptriggered
  ptrace: Use copy_siginfo in setsiginfo and getsiginfo
  signal: Unify and correct copy_siginfo_to_user32
  signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32
  signal: Unify and correct copy_siginfo_from_user32
  signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED
  signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h
  signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h
  signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h
  signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h
  signal/powerpc: Remove redefinition of NSIGTRAP on powerpc
  signal: Move addr_lsb into the _sigfault union for clarity
  ...
2018-01-30 14:18:52 -08:00
Linus Torvalds
0aebc6a440 arm64 updates for 4.16:
- Security mitigations:
   - variant 2: invalidating the branch predictor with a call to secure firmware
   - variant 3: implementing KPTI for arm64
 
 - 52-bit physical address support for arm64 (ARMv8.2)
 
 - arm64 support for RAS (firmware first only) and SDEI (software
   delegated exception interface; allows firmware to inject a RAS error
   into the OS)
 
 - Perf support for the ARM DynamIQ Shared Unit PMU
 
 - CPUID and HWCAP bits updated for new floating point multiplication
   instructions in ARMv8.4
 
 - Removing some virtual memory layout printks during boot
 
 - Fix initial page table creation to cope with larger than 32M kernel
   images when 16K pages are enabled
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlpwxDMACgkQa9axLQDI
 XvF55BAAniMpxPXnYNfv6l7/4O8eKo1lJIaG1wbej4JRZ/rT3K4Z3OBXW1dKHO8d
 /PTbVmZ90IqIGROkoDrE+6xyjjn9yK3uuW4ytN2zQkBa8VFaHAnHlX+zKQcuwy9f
 yxwiHk+C7vK5JR7mpXTazjRknsUv1MPtlTt7DQrSdq0KRDJVDNFC+grmbew2rz0X
 cjQDqZqgzuFyrKxdiQVjDmc3zH9NsNBhDo0hlGHf2jK6bGJsAPtI8M2JcLrK8ITG
 Ye/dD7BJp1mWD8ff0BPaMxu24qfAMNLH8f2dpTa986/H78irVz7i/t5HG0/1+5Jh
 EE4OFRTKZ59Qgyo1zWcaJvdp8YjiaX/L4PWJg8CxM5OhP9dIac9ydcFQfWzpKpUs
 xyZfmK6XliGFReAkVOOf5tEqFUDhMtsqhzPYmbmU1lp61wmSYIZ8CTenpWWCJSRO
 NOGyG1X2uFBvP69+iPNlfTGz1r7tg1URY5iO8fUEIhY8LrgyORkiqw4OvPEgnMXP
 Ngy+dXhyvnps2AAWbSX0O4puRlTgEYLT5KaMLzH/+gWsXATT0rzUCD/aOwUQq/Y7
 SWXZHkb3jpmOZZnzZsLL2MNzEIPCFBwSUE9fSv4dA9d/N6tUmlmZALJjHkfzCDpj
 +mPsSmAMTj72kUYzm0b5GCtOu/iQ2kDWOZjOM1m4+v/B+f7JoEE=
 =iEjP
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "The main theme of this pull request is security covering variants 2
  and 3 for arm64. I expect to send additional patches next week
  covering an improved firmware interface (requires firmware changes)
  for variant 2 and way for KPTI to be disabled on unaffected CPUs
  (Cavium's ThunderX doesn't work properly with KPTI enabled because of
  a hardware erratum).

  Summary:

   - Security mitigations:
      - variant 2: invalidate the branch predictor with a call to
        secure firmware
      - variant 3: implement KPTI for arm64

   - 52-bit physical address support for arm64 (ARMv8.2)

   - arm64 support for RAS (firmware first only) and SDEI (software
     delegated exception interface; allows firmware to inject a RAS
     error into the OS)

   - perf support for the ARM DynamIQ Shared Unit PMU

   - CPUID and HWCAP bits updated for new floating point multiplication
     instructions in ARMv8.4

   - remove some virtual memory layout printks during boot

   - fix initial page table creation to cope with larger than 32M kernel
     images when 16K pages are enabled"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (104 commits)
  arm64: Fix TTBR + PAN + 52-bit PA logic in cpu_do_switch_mm
  arm64: Turn on KPTI only on CPUs that need it
  arm64: Branch predictor hardening for Cavium ThunderX2
  arm64: Run enable method for errata work arounds on late CPUs
  arm64: Move BP hardening to check_and_switch_context
  arm64: mm: ignore memory above supported physical address size
  arm64: kpti: Fix the interaction between ASID switching and software PAN
  KVM: arm64: Emulate RAS error registers and set HCR_EL2's TERR & TEA
  KVM: arm64: Handle RAS SErrors from EL2 on guest exit
  KVM: arm64: Handle RAS SErrors from EL1 on guest exit
  KVM: arm64: Save ESR_EL2 on guest SError
  KVM: arm64: Save/Restore guest DISR_EL1
  KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2.
  KVM: arm/arm64: mask/unmask daif around VHE guests
  arm64: kernel: Prepare for a DISR user
  arm64: Unconditionally enable IESB on exception entry/return for firmware-first
  arm64: kernel: Survive corrected RAS errors notified by SError
  arm64: cpufeature: Detect CPU RAS Extentions
  arm64: sysreg: Move to use definitions for all the SCTLR bits
  arm64: cpufeature: __this_cpu_has_cap() shouldn't stop early
  ...
2018-01-30 13:57:43 -08:00
Linus Torvalds
af8c5e2d60 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Implement frequency/CPU invariance and OPP selection for
     SCHED_DEADLINE (Juri Lelli)

   - Tweak the task migration logic for better multi-tasking
     workload scalability (Mel Gorman)

   - Misc cleanups, fixes and improvements"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Make bandwidth enforcement scale-invariant
  sched/cpufreq: Move arch_scale_{freq,cpu}_capacity() outside of #ifdef CONFIG_SMP
  sched/cpufreq: Remove arch_scale_freq_capacity()'s 'sd' parameter
  sched/cpufreq: Always consider all CPUs when deciding next freq
  sched/cpufreq: Split utilization signals
  sched/cpufreq: Change the worker kthread to SCHED_DEADLINE
  sched/deadline: Move CPU frequency selection triggering points
  sched/cpufreq: Use the DEADLINE utilization signal
  sched/deadline: Implement "runtime overrun signal" support
  sched/fair: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache
  sched/fair: Correct obsolete comment about cpufreq_update_util()
  sched/fair: Remove impossible condition from find_idlest_group_cpu()
  sched/cpufreq: Don't pass flags to sugov_set_iowait_boost()
  sched/cpufreq: Initialize sg_cpu->flags to 0
  sched/fair: Consider RT/IRQ pressure in capacity_spare_wake()
  sched/fair: Use 'unsigned long' for utilization, consistently
  sched/core: Rework and clarify prepare_lock_switch()
  sched/fair: Remove unused 'curr' parameter from wakeup_gran
  sched/headers: Constify object_is_on_stack()
2018-01-30 11:55:56 -08:00
Linus Torvalds
d8b91dde38 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Clean up the x86 instruction decoder (Masami Hiramatsu)

   - Add new uprobes optimization for PUSH instructions on x86 (Yonghong
     Song)

   - Add MSR_IA32_THERM_STATUS to the MSR events (Stephane Eranian)

   - Fix misc bugs, update documentation, plus various cleanups (Jiri
     Olsa)

  There's a large number of tooling side improvements:

   - Intel-PT/BTS improvements (Adrian Hunter)

   - Numerous 'perf trace' improvements (Arnaldo Carvalho de Melo)

   - Introduce an errno code to string facility (Hendrik Brueckner)

   - Various build system improvements (Jiri Olsa)

   - Add support for CoreSight trace decoding by making the perf tools
     use the external openCSD (Mathieu Poirier, Tor Jeremiassen)

   - Add ARM Statistical Profiling Extensions (SPE) support (Kim
     Phillips)

   - libtraceevent updates (Steven Rostedt)

   - Intel vendor event JSON updates (Andi Kleen)

   - Introduce 'perf report --mmaps' and 'perf report --tasks' to show
     info present in 'perf.data' (Jiri Olsa, Arnaldo Carvalho de Melo)

   - Add infrastructure to record first and last sample time to the
     perf.data file header, so that when processing all samples in a
     'perf record' session, such as when doing build-id processing, or
     when specifically requesting that that info be recorded, use that
     in 'perf report --time', that also got support for percent slices
     in addition to absolute ones.

     I.e. now it is possible to ask for the samples in the 10%-20% time
     slice of a perf.data file (Jin Yao)

   - Allow system wide 'perf stat --per-thread', sorting the result (Jin
     Yao)

     E.g.:

      [root@jouet ~]# perf stat --per-thread --metrics IPC
      ^C
       Performance counter stats for 'system wide':

                  make-22229  23,012,094,032  inst_retired.any   #  0.8 IPC
                   cc1-22419     692,027,497  inst_retired.any   #  0.8 IPC
                   gcc-22418     328,231,855  inst_retired.any   #  0.9 IPC
                   cc1-22509     220,853,647  inst_retired.any   #  0.8 IPC
                   gcc-22486     199,874,810  inst_retired.any   #  1.0 IPC
                    as-22466     177,896,365  inst_retired.any   #  0.9 IPC
                   cc1-22465     150,732,374  inst_retired.any   #  0.8 IPC
                   gcc-22508     112,555,593  inst_retired.any   #  0.9 IPC
                   cc1-22487     108,964,079  inst_retired.any   #  0.7 IPC
       qemu-system-x86-2697       21,330,550  inst_retired.any   #  0.3 IPC
       systemd-journal-551        20,642,951  inst_retired.any   #  0.4 IPC
       docker-containe-17651       9,552,892  inst_retired.any   #  0.5 IPC
       dockerd-current-9809        7,528,586  inst_retired.any   #  0.5 IPC
                  make-22153  12,504,194,380  inst_retired.any   #  0.8 IPC
               python2-22429  12,081,290,954  inst_retired.any   #  0.8 IPC
      <SNIP>
               python2-22429  15,026,328,103  cpu_clk_unhalted.thread
                   cc1-22419     826,660,193  cpu_clk_unhalted.thread
                   gcc-22418     365,321,295  cpu_clk_unhalted.thread
                   cc1-22509     279,169,362  cpu_clk_unhalted.thread
                   gcc-22486     210,156,950  cpu_clk_unhalted.thread
      <SNIP>

           5.638075538 seconds time elapsed

     [root@jouet ~]#

   - Improve shell auto-completion of perf events (Jin Yao)

   - 'perf probe' improvements (Masami Hiramatsu)

   - Improve PMU infrastructure to support amp64's ThunderX2
     implementation defined core events (Ganapatrao Kulkarni)

   - Various annotation related improvements and fixes (Thomas Richter)

   - Clarify usage of 'overwrite' and 'backward' in the evlist/mmap
     code, removing the 'overwrite' parameter from several functions as
     it was always used it as 'false' (Wang Nan)

   - Fix/improve 'perf record' reverse recording support (Wang Nan)

   - Improve command line options documentation (Sihyeon Jang)

   - Optimize sample parsing for ordering events, where we don't need to
     parse all the PERF_SAMPLE_ bits, just the ones leading to the
     timestamp needed to reorder events (Jiri Olsa)

   - Generalize the annotation code to support other source information
     besides objdump/DWARF obtained ones, starting with python scripts,
     that will is slated to be merged soon (Jiri Olsa)

   - ... and a lot more that I failed to list, see the shortlog and
     changelog for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits)
  perf trace beauty flock: Move to separate object file
  perf evlist: Remove fcntl.h from evlist.h
  perf trace beauty futex: Beautify FUTEX_BITSET_MATCH_ANY
  perf trace: Do not print from time delta for interrupted syscall lines
  perf trace: Add --print-sample
  perf bpf: Remove misplaced __maybe_unused attribute
  MAINTAINERS: Adding entry for CoreSight trace decoding
  perf tools: Add mechanic to synthesise CoreSight trace packets
  perf tools: Add full support for CoreSight trace decoding
  pert tools: Add queue management functionality
  perf tools: Add functionality to communicate with the openCSD decoder
  perf tools: Add support for decoding CoreSight trace data
  perf tools: Add decoder mechanic to support dumping trace data
  perf tools: Add processing of coresight metadata
  perf tools: Add initial entry point for decoder CoreSight traces
  perf tools: Integrating the CoreSight decoding library
  perf vendor events intel: Update IvyTown files to V20
  perf vendor events intel: Update IvyBridge files to V20
  perf vendor events intel: Update BroadwellDE events to V7
  perf vendor events intel: Update SkylakeX events to V1.06
  ...
2018-01-30 11:15:14 -08:00
Linus Torvalds
5e7481a25e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes relate to making lock_is_held() et al (and external
  wrappers of them) work on const data types - this requires const
  propagation through the depths of lockdep.

  This removes a number of ugly type hacks the external helpers used"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Convert some users to const
  lockdep: Make lockdep checking constant
  lockdep: Assign lock keys on registration
2018-01-30 10:44:56 -08:00
Linus Torvalds
d772794637 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main RCU changes in this cycle were:

   - Updates to use cond_resched() instead of cond_resched_rcu_qs()
     where feasible (currently everywhere except in kernel/rcu and in
     kernel/torture.c). Also a couple of fixes to avoid sending IPIs to
     offline CPUs.

   - Updates to simplify RCU's dyntick-idle handling.

   - Updates to remove almost all uses of smp_read_barrier_depends() and
     read_barrier_depends().

   - Torture-test updates.

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  torture: Save a line in stutter_wait(): while -> for
  torture: Eliminate torture_runnable and perf_runnable
  torture: Make stutter less vulnerable to compilers and races
  locking/locktorture: Fix num reader/writer corner cases
  locking/locktorture: Fix rwsem reader_delay
  torture: Place all torture-test modules in one MAINTAINERS group
  rcutorture/kvm-build.sh: Skip build directory check
  rcutorture: Simplify functions.sh include path
  rcutorture: Simplify logging
  rcutorture/kvm-recheck-*: Improve result directory readability check
  rcutorture/kvm.sh: Support execution from any directory
  rcutorture/kvm.sh: Use consistent help text for --qemu-args
  rcutorture/kvm.sh: Remove unused variable, `alldone`
  rcutorture: Remove unused script, config2frag.sh
  rcutorture/configinit: Fix build directory error message
  rcutorture: Preempt RCU-preempt readers more vigorously
  torture: Reduce #ifdefs for preempt_schedule()
  rcu: Remove have_rcu_nocb_mask from tree_plugin.h
  rcu: Add comment giving debug strategy for double call_rcu()
  tracing, rcu: Hide trace event rcu_nocb_wake when not used
  ...
2018-01-30 10:15:30 -08:00
Linus Torvalds
6304672b7f Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/pti updates from Thomas Gleixner:
 "Another set of melted spectrum related changes:

   - Code simplifications and cleanups for RSB and retpolines.

   - Make the indirect calls in KVM speculation safe.

   - Whitelist CPUs which are known not to speculate from Meltdown and
     prepare for the new CPUID flag which tells the kernel that a CPU is
     not affected.

   - A less rigorous variant of the module retpoline check which merily
     warns when a non-retpoline protected module is loaded and reflects
     that fact in the sysfs file.

   - Prepare for Indirect Branch Prediction Barrier support.

   - Prepare for exposure of the Speculation Control MSRs to guests, so
     guest OSes which depend on those "features" can use them. Includes
     a blacklist of the broken microcodes. The actual exposure of the
     MSRs through KVM is still being worked on"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Simplify indirect_branch_prediction_barrier()
  x86/retpoline: Simplify vmexit_fill_RSB()
  x86/cpufeatures: Clean up Spectre v2 related CPUID flags
  x86/cpu/bugs: Make retpoline module warning conditional
  x86/bugs: Drop one "mitigation" from dmesg
  x86/nospec: Fix header guards names
  x86/alternative: Print unadorned pointers
  x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support
  x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
  x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
  x86/msr: Add definitions for new speculation control MSRs
  x86/cpufeatures: Add AMD feature bits for Speculation Control
  x86/cpufeatures: Add Intel feature bits for Speculation Control
  x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
  module/retpoline: Warn about missing retpoline in module
  KVM: VMX: Make indirect call speculation safe
  KVM: x86: Make indirect calls in emulator speculation safe
2018-01-29 19:08:02 -08:00
Linus Torvalds
a46d3f9b1c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The timer departement presents:

   - A rather large rework of the hrtimer infrastructure which
     introduces softirq based hrtimers to replace the spread of
     hrtimer/tasklet combos which force the actual callback execution
     into softirq context. The approach is completely different from the
     initial implementation which you cursed at 10 years ago rightfully.

     The softirq based timers have their own queues and there is no
     nasty indirection and list reshuffling in the hard interrupt
     anymore. This comes with conversion of some of the hrtimer/tasklet
     users, the rest and the final removal of that horrible interface
     will come towards the end of the merge window or go through the
     relevant maintainer trees.

     Note: The top commit merged the last minute bugfix for the 10 years
     old CPU hotplug bug as I wanted to make sure that I fatfinger the
     merge conflict resolution myself.

   - The overhaul of the STM32 clocksource/clockevents driver

   - A new driver for the Spreadtrum SC9860 timer

   - A new driver dor the Actions Semi S700 timer

   - The usual set of fixes and updates all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  usb/gadget/NCM: Replace tasklet with softirq hrtimer
  ALSA/dummy: Replace tasklet with softirq hrtimer
  hrtimer: Implement SOFT/HARD clock base selection
  hrtimer: Implement support for softirq based hrtimers
  hrtimer: Prepare handling of hard and softirq based hrtimers
  hrtimer: Add clock bases and hrtimer mode for softirq context
  hrtimer: Use irqsave/irqrestore around __run_hrtimer()
  hrtimer: Factor out __hrtimer_next_event_base()
  hrtimer: Factor out __hrtimer_start_range_ns()
  hrtimer: Remove the 'base' parameter from hrtimer_reprogram()
  hrtimer: Make remote enqueue decision less restrictive
  hrtimer: Unify remote enqueue handling
  hrtimer: Unify hrtimer removal handling
  hrtimer: Make hrtimer_force_reprogramm() unconditionally available
  hrtimer: Make hrtimer_reprogramm() unconditional
  hrtimer: Make hrtimer_cpu_base.next_timer handling unconditional
  hrtimer: Make the remote enqueue check unconditional
  hrtimer: Use accesor functions instead of direct access
  hrtimer: Make the hrtimer_cpu_base::hres_active field unconditional, to simplify the code
  hrtimer: Make room in 'struct hrtimer_cpu_base'
  ...
2018-01-29 16:50:58 -08:00
Linus Torvalds
7bcd342594 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "A rather small set of irq updates this time:

   - removal of the old and now obsolete irq domain debugging code

   - the new Goldfish PIC driver

   - the usual pile of small fixes and updates"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqdomain: Kill CONFIG_IRQ_DOMAIN_DEBUG
  irq/work: Improve the flag definitions
  irqchip/gic-v3: Fix the driver probe() fail due to disabled GICC entry
  irqchip/irq-goldfish-pic: Add Goldfish PIC driver
  dt-bindings/goldfish-pic: Add device tree binding for Goldfish PIC driver
  irqchip/ompic: fix return value check in ompic_of_init()
  dt-bindings/bcm283x: Define polarity of per-cpu interrupts
  irqchip/irq-bcm2836: Add support for DT interrupt polarity
  dt-bindings/bcm2836-l1-intc: Add interrupt polarity support
2018-01-29 16:47:21 -08:00
Linus Torvalds
0a4b6e2f80 Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
 "This is the main pull request for block IO related changes for the
  4.16 kernel. Nothing major in this pull request, but a good amount of
  improvements and fixes all over the map. This contains:

   - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and
     Paolo.

   - Support for SMR zones for deadline and mq-deadline from Damien and
     Christoph.

   - Set of fixes for bcache by way of Michael Lyle, including fixes
     from himself, Kent, Rui, Tang, and Coly.

   - Series from Matias for lightnvm with fixes from Hans Holmberg,
     Javier, and Matias. Mostly centered around pblk, and the removing
     rrpc 1.2 in preparation for supporting 2.0.

   - A couple of NVMe pull requests from Christoph. Nothing major in
     here, just fixes and cleanups, and support for command tracing from
     Johannes.

   - Support for blk-throttle for tracking reads and writes separately.
     From Joseph Qi. A few cleanups/fixes also for blk-throttle from
     Weiping.

   - Series from Mike Snitzer that enables dm to register its queue more
     logically, something that's alwways been problematic on dm since
     it's a stacked device.

   - Series from Ming cleaning up some of the bio accessor use, in
     preparation for supporting multipage bvecs.

   - Various fixes from Ming closing up holes around queue mapping and
     quiescing.

   - BSD partition fix from Richard Narron, fixing a problem where we
     can't mount newer (10/11) FreeBSD partitions.

   - Series from Tejun reworking blk-mq timeout handling. The previous
     scheme relied on atomic bits, but it had races where we would think
     a request had timed out if it to reused at the wrong time.

   - null_blk now supports faking timeouts, to enable us to better
     exercise and test that functionality separately. From me.

   - Kill the separate atomic poll bit in the request struct. After
     this, we don't use the atomic bits on blk-mq anymore at all. From
     me.

   - sgl_alloc/free helpers from Bart.

   - Heavily contended tag case scalability improvement from me.

   - Various little fixes and cleanups from Arnd, Bart, Corentin,
     Douglas, Eryu, Goldwyn, and myself"

* 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
  block: remove smart1,2.h
  nvme: add tracepoint for nvme_complete_rq
  nvme: add tracepoint for nvme_setup_cmd
  nvme-pci: introduce RECONNECTING state to mark initializing procedure
  nvme-rdma: remove redundant boolean for inline_data
  nvme: don't free uuid pointer before printing it
  nvme-pci: Suspend queues after deleting them
  bsg: use pr_debug instead of hand crafted macros
  blk-mq-debugfs: don't allow write on attributes with seq_operations set
  nvme-pci: Fix queue double allocations
  block: Set BIO_TRACE_COMPLETION on new bio during split
  blk-throttle: use queue_is_rq_based
  block: Remove kblockd_schedule_delayed_work{,_on}()
  blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
  blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
  lib/scatterlist: Fix chaining support in sgl_alloc_order()
  blk-throttle: track read and write request individually
  block: add bdev_read_only() checks to common helpers
  block: fail op_is_write() requests to read-only partitions
  blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
  ...
2018-01-29 11:51:49 -08:00
Linus Torvalds
7f3fdd40a7 Power management updates for v4.16-rc1
- Define a PM driver flag allowing drivers to request that their
    devices be left in suspend after system-wide transitions to the
    working state if possible and add support for it to the PCI bus
    type and the ACPI PM domain (Rafael Wysocki).
 
  - Make the PM core carry out optimizations for devices with driver
    PM flags set in some cases and make a few drivers set those flags
    (Rafael Wysocki).
 
  - Fix and clean up wrapper routines allowing runtime PM device
    callbacks to be re-used for system-wide PM, change the generic
    power domains (genpd) framework to stop using those routines
    incorrectly and fix up a driver depending on that behavior of
    genpd (Rafael Wysocki, Ulf Hansson, Geert Uytterhoeven).
 
  - Fix and clean up the PM core's device wakeup framework and
    re-factor system-wide PM core code related to device wakeup
    (Rafael Wysocki, Ulf Hansson, Brian Norris).
 
  - Make more x86-based systems use the Low Power Sleep S0 _DSM
    interface by default (to fix power button wakeup from
    suspend-to-idle on Surface Pro3) and add a kernel command line
    switch to tell it to ignore the system sleep blacklist in the
    ACPI core (Rafael Wysocki).
 
  - Fix a race condition related to cpufreq governor module removal
    and clean up the governor management code in the cpufreq core
    (Rafael Wysocki).
 
  - Drop the unused generic code related to the handling of the static
    power energy usage model in the CPU cooling thermal driver along
    with the corresponding documentation (Viresh Kumar).
 
  - Add mt2712 support to the Mediatek cpufreq driver (Andrew-sh Cheng).
 
  - Add a new operating point to the imx6ul and imx6q cpufreq drivers
    and switch the latter to using clk_bulk_get() (Anson Huang, Dong
    Aisheng).
 
  - Add support for multiple regulators to the TI cpufreq driver along
    with a new DT binding related to that and clean up that driver
    somewhat (Dave Gerlach).
 
  - Fix a powernv cpufreq driver regression leading to incorrect CPU
    frequency reporting, fix that driver to deal with non-continguous
    P-states correctly and clean it up (Gautham Shenoy, Shilpasri Bhat).
 
  - Add support for frequency scaling on Armada 37xx SoCs through the
    generic DT cpufreq driver (Gregory CLEMENT).
 
  - Fix error code paths in the mvebu cpufreq driver (Gregory CLEMENT).
 
  - Fix a transition delay setting regression in the longhaul cpufreq
    driver (Viresh Kumar).
 
  - Add Skylake X (server) support to the intel_pstate cpufreq driver
    and clean up that driver somewhat (Srinivas Pandruvada).
 
  - Clean up the cpufreq statistics collection code (Viresh Kumar).
 
  - Drop cluster terminology and dependency on physical_package_id
    from the PSCI driver and drop dependency on arm_big_little from
    the SCPI cpufreq driver (Sudeep Holla).
 
  - Add support for system-wide suspend and resume to the RAPL power
    capping driver and drop a redundant semicolon from it (Zhen Han,
    Luis de Bethencourt).
 
  - Make SPI domain validation (in the SCSI SPI transport driver) and
    system-wide suspend mutually exclusive as they rely on the same
    underlying mechanism and cannot be carried out at the same time
    (Bart Van Assche).
 
  - Fix the computation of the amount of memory to preallocate in the
    hibernation core and clean up one function in there (Rainer Fiebig,
    Kyungsik Lee).
 
  - Prepare the Operating Performance Points (OPP) framework for being
    used with power domains and clean up one function in it (Viresh
    Kumar, Wei Yongjun).
 
  - Clean up the generic sysfs interface for device PM (Andy Shevchenko).
 
  - Fix several minor issues in power management frameworks and clean
    them up a bit (Arvind Yadav, Bjorn Andersson, Geert Uytterhoeven,
    Gustavo Silva, Julia Lawall, Luis de Bethencourt, Paul Gortmaker,
    Sergey Senozhatsky, gaurav jindal).
 
  - Make it easier to disable PM via Kconfig (Mark Brown).
 
  - Clean up the cpupower and intel_pstate_tracer utilities (Doug
    Smythies, Laura Abbott).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJaYw2iAAoJEILEb/54YlRxLHwP/iabmAcbXBeg30/wSCKcWB6f
 Ar785YbkFedNP7b2dypR7bcKIkaV55EExNHHoVuvC6gKrW+zx3F39v9QzK3HBKfw
 DgLWMjxR5Xdm9o8o2chsBEMl0itSRB9s864s+AAAElP+qjyT6kmbFyRFgVYLiNH0
 v9jNhPF9EmirViwES/syELa/P1AJDMxCb/SbRY+Xp1sPhGKlx2J/2eQsVDs7G+wL
 2BJeyBqwL9D78U/eY2bvpCoZLpmZmklx1eY5iK3Mzo6LZKYMaSypgkGuRfh//K+a
 8vFLwOBsOlpZ8lsPBRatV5+SMu8qMQMTnstui1m3/9bOPFfjymat6u0lLw4BV2hv
 zrNfqWOiwTAt/fczR1/naYuuSeRCLABvYDKjs/9iYdrCZYJ+n+ZzU/wi5geswDtD
 cQKDMOdOBrnfkN0Vqpw6ZBqun0RDldNT/+6oy93tHWBlF0CA4mMq5jr8q3iH35CW
 8TA1GCkurHZXTyYdYXR5SUHxPbOgZC87GAb7RlFEJJnvvkmy3jmBng675Hl5XAn7
 D8eJp3d4h5n121pkMLGcBc7K036T2uFsjrHWx+QsjKFUBWUBnuRfInRrLA5WnGo2
 U+KIEUPepdnbFFvYNv+kTgz2uE6FOqycEmnUKUKWUZYPN0GDAOw/V3813uxVRYtq
 27omIOL7PJp1wWjQnfXK
 =dnb7
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "This includes some infrastructure changes in the PM core, mostly
  related to integration between runtime PM and system-wide suspend and
  hibernation, plus some driver changes depending on them and fixes for
  issues in that area which have become quite apparent recently.

  Also included are changes making more x86-based systems use the Low
  Power Sleep S0 _DSM interface by default, which turned out to be
  necessary to handle power button wakeups from suspend-to-idle on
  Surface Pro3.

  On the cpufreq front we have fixes and cleanups in the core, some new
  hardware support, driver updates and the removal of some unused code
  from the CPU cooling thermal driver.

  Apart from this, the Operating Performance Points (OPP) framework is
  prepared to be used with power domains in the future and there is a
  usual bunch of assorted fixes and cleanups.

  Specifics:

   - Define a PM driver flag allowing drivers to request that their
     devices be left in suspend after system-wide transitions to the
     working state if possible and add support for it to the PCI bus
     type and the ACPI PM domain (Rafael Wysocki).

   - Make the PM core carry out optimizations for devices with driver PM
     flags set in some cases and make a few drivers set those flags
     (Rafael Wysocki).

   - Fix and clean up wrapper routines allowing runtime PM device
     callbacks to be re-used for system-wide PM, change the generic
     power domains (genpd) framework to stop using those routines
     incorrectly and fix up a driver depending on that behavior of genpd
     (Rafael Wysocki, Ulf Hansson, Geert Uytterhoeven).

   - Fix and clean up the PM core's device wakeup framework and
     re-factor system-wide PM core code related to device wakeup
     (Rafael Wysocki, Ulf Hansson, Brian Norris).

   - Make more x86-based systems use the Low Power Sleep S0 _DSM
     interface by default (to fix power button wakeup from
     suspend-to-idle on Surface Pro3) and add a kernel command line
     switch to tell it to ignore the system sleep blacklist in the ACPI
     core (Rafael Wysocki).

   - Fix a race condition related to cpufreq governor module removal and
     clean up the governor management code in the cpufreq core (Rafael
     Wysocki).

   - Drop the unused generic code related to the handling of the static
     power energy usage model in the CPU cooling thermal driver along
     with the corresponding documentation (Viresh Kumar).

   - Add mt2712 support to the Mediatek cpufreq driver (Andrew-sh
     Cheng).

   - Add a new operating point to the imx6ul and imx6q cpufreq drivers
     and switch the latter to using clk_bulk_get() (Anson Huang, Dong
     Aisheng).

   - Add support for multiple regulators to the TI cpufreq driver along
     with a new DT binding related to that and clean up that driver
     somewhat (Dave Gerlach).

   - Fix a powernv cpufreq driver regression leading to incorrect CPU
     frequency reporting, fix that driver to deal with non-continguous
     P-states correctly and clean it up (Gautham Shenoy, Shilpasri
     Bhat).

   - Add support for frequency scaling on Armada 37xx SoCs through the
     generic DT cpufreq driver (Gregory CLEMENT).

   - Fix error code paths in the mvebu cpufreq driver (Gregory CLEMENT).

   - Fix a transition delay setting regression in the longhaul cpufreq
     driver (Viresh Kumar).

   - Add Skylake X (server) support to the intel_pstate cpufreq driver
     and clean up that driver somewhat (Srinivas Pandruvada).

   - Clean up the cpufreq statistics collection code (Viresh Kumar).

   - Drop cluster terminology and dependency on physical_package_id from
     the PSCI driver and drop dependency on arm_big_little from the SCPI
     cpufreq driver (Sudeep Holla).

   - Add support for system-wide suspend and resume to the RAPL power
     capping driver and drop a redundant semicolon from it (Zhen Han,
     Luis de Bethencourt).

   - Make SPI domain validation (in the SCSI SPI transport driver) and
     system-wide suspend mutually exclusive as they rely on the same
     underlying mechanism and cannot be carried out at the same time
     (Bart Van Assche).

   - Fix the computation of the amount of memory to preallocate in the
     hibernation core and clean up one function in there (Rainer Fiebig,
     Kyungsik Lee).

   - Prepare the Operating Performance Points (OPP) framework for being
     used with power domains and clean up one function in it (Viresh
     Kumar, Wei Yongjun).

   - Clean up the generic sysfs interface for device PM (Andy
     Shevchenko).

   - Fix several minor issues in power management frameworks and clean
     them up a bit (Arvind Yadav, Bjorn Andersson, Geert Uytterhoeven,
     Gustavo Silva, Julia Lawall, Luis de Bethencourt, Paul Gortmaker,
     Sergey Senozhatsky, gaurav jindal).

   - Make it easier to disable PM via Kconfig (Mark Brown).

   - Clean up the cpupower and intel_pstate_tracer utilities (Doug
     Smythies, Laura Abbott)"

* tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
  PCI / PM: Remove spurious semicolon
  cpufreq: scpi: remove arm_big_little dependency
  drivers: psci: remove cluster terminology and dependency on physical_package_id
  powercap: intel_rapl: Fix trailing semicolon
  dmaengine: rcar-dmac: Make DMAC reinit during system resume explicit
  PM / runtime: Allow no callbacks in pm_runtime_force_suspend|resume()
  PM / hibernate: Drop unused parameter of enough_swap
  PM / runtime: Check ignore_children in pm_runtime_need_not_resume()
  PM / runtime: Rework pm_runtime_force_suspend/resume()
  PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume()
  cpufreq: powernv: Dont assume distinct pstate values for nominal and pmin
  cpufreq: intel_pstate: Add Skylake servers support
  cpufreq: intel_pstate: Replace bxt_funcs with core_funcs
  platform/x86: surfacepro3: Support for wakeup from suspend-to-idle
  ACPI / PM: Use Low Power S0 Idle on more systems
  PM / wakeup: Print warn if device gets enabled as wakeup source during sleep
  PM / domains: Don't skip driver's ->suspend|resume_noirq() callbacks
  PM / core: Propagate wakeup_path status flag in __device_suspend_late()
  PM / core: Re-structure code for clearing the direct_complete flag
  powercap: add suspend and resume mechanism for SOC power limit
  ...
2018-01-29 09:47:41 -08:00
Linus Torvalds
49f9c3552c init_task out-of-lining
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAWl80tvSw1s6N8H32AQJq8A//ViRN5fExrd678Eh2Bz1ytrJYMUfYY3Hv
 QTH5TH9zFyLFyWLB1Iwe13sdLVTTM88O0qcDb54Lx9fWUqeMZyYvBhLtWPc00lTU
 0m3EyYR87MFWaEV+VxaVWgWaWkMDkd39KubDitcS+YIBDszTuMpYodhPUsgLt7lr
 pePX7eurXKdQPTh4NUOjGA2NaZot3tga76J6D8NKruGYUstQCGxpP1ryiFfACnwf
 NLWNO8ZBMtlDwX1mHYOOMFMaBzFzXorPm7jY4HJDf3mUM84xI3ach6CuH9RTSzfq
 A+qB1U3QILPVFo2HtqOHui4bFjRwqOf6uIrI/KcnioJ37w1O+KFcMJeDnX2I211q
 f2lXehJLQA7kPmxQw8T3//HDRaLXc0Qxt7IPZRFinrlkcN4oh3DD5euMfCFBSoZG
 PTbjxlgMfzJPoZtqAcy0rV5L54a/F4h915OQPJCKLwujIsXD2nT993vNmGDyq4zh
 BzNMxSXJC8p+jYvQpNhWyyxwDBBT/YsVQo/ACwg4eJnD3blVTAioRT9ZZcAcsY0F
 0z1eWW5RiknzIaXQWvjfK0gYKpO+aMSu9+gipHfMbU3yXG+sPj/H6zAHYzqX3uQZ
 jb5Iujjnu49W/YD+RiMenuu59lNXUnLSeRnlV7dw0qxGK1FzGo24+ZzKFhJhKvzG
 tdfUsev1Mc8=
 =jhWg
 -----END PGP SIGNATURE-----

Merge tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull init_task initializer cleanups from David Howells:
 "It doesn't seem useful to have the init_task in a header file rather
  than in a normal source file. We could consolidate init_task handling
  instead and expand out various macros.

  Here's a series of patches that consolidate init_task handling:

   (1) Make THREAD_SIZE available to vmlinux.lds for cris, hexagon and
       openrisc.

   (2) Alter the INIT_TASK_DATA linker script macro to set
       init_thread_union and init_stack rather than defining these in C.

       Insert init_task and init_thread_into into the init_stack area in
       the linker script as appropriate to the configuration, with
       different section markers so that they end up correctly ordered.

       We can then get merge ia64's init_task.c into the main one.

       We then have a bunch of single-use INIT_*() macros that seem only
       to be macros because they used to be used per-arch. We can then
       expand these in place of the user and get rid of a few lines and
       a lot of backslashes.

   (3) Expand INIT_TASK() in place.

   (4) Expand in place various small INIT_*() macros that are defined
       conditionally. Expand them and surround them by #if[n]def/#endif
       in the .c file as it takes fewer lines.

   (5) Expand INIT_SIGNALS() and INIT_SIGHAND() in place.

   (6) Expand INIT_STRUCT_PID in place.

  These macros can then be discarded"

* tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  Expand INIT_STRUCT_PID and remove
  Expand the INIT_SIGNALS and INIT_SIGHAND macros and remove
  Expand various INIT_* macros and remove
  Expand INIT_TASK() in init/init_task.c and remove
  Construct init thread stack in the linker script rather than by union
  openrisc: Make THREAD_SIZE available to vmlinux.lds
  hexagon: Make THREAD_SIZE available to vmlinux.lds
  cris: Make THREAD_SIZE available to vmlinux.lds
2018-01-29 09:08:34 -08:00
Linus Torvalds
07b0137c02 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "A single fix for a ~10 years old problem which causes high resolution
  timers to stop after a CPU unplug/plug cycle due to a stale flag in
  the per CPU hrtimer base struct.

  Paul McKenney was hunting this for about a year, but the heisenbug
  nature made it resistant against debug attempts for quite some time"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Reset hrtimer cpu base proper on CPU hotplug
2018-01-28 12:17:35 -08:00
Linus Torvalds
624441927f Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
 "A single bug fix to prevent a subtle deadlock in the scheduler core
  code vs cpu hotplug"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix cpu.max vs. cpuhotplug deadlock
2018-01-28 11:51:45 -08:00
Linus Torvalds
39e383626c Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Four patches which all address lock inversions and deadlocks in the
  perf core code and the Intel debug store"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix perf,x86,cpuhp deadlock
  perf/core: Fix ctx::mutex deadlock
  perf/core: Fix another perf,trace,cpuhp lock inversion
  perf/core: Fix lock inversion between perf,trace,cpuhp
2018-01-28 11:48:25 -08:00
Linus Torvalds
8c76e31a6a Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
 "Two final locking fixes for 4.15:

   - Repair the OWNER_DIED logic in the futex code which got wreckaged
     with the recent fix for a subtle race condition.

   - Prevent the hard lockup detector from triggering when dumping all
     held locks in the system"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Avoid triggering hardlockup from debug_show_all_locks()
  futex: Fix OWNER_DEAD fixup
2018-01-28 11:20:35 -08:00
Thomas Gleixner
303c146df1 Merge branch 'timers/urgent' into timers/core
Pick up urgent bug fix and resolve the conflict.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-01-27 15:35:29 +01:00
Thomas Gleixner
d5421ea43d hrtimer: Reset hrtimer cpu base proper on CPU hotplug
The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents that a long delayed hrtimer interrupt causes a
continous retriggering of interrupts which prevent the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.

If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag. As a consequence the CPU does not receive hrtimer
interrupts and no timers expire on that CPU which results in RCU stalls and
other malfunctions.

Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.

Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.

Fixes: 41d2e49493 ("hrtimer: Tune hrtimer_interrupt hang logic")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
2018-01-27 15:12:22 +01:00
Andi Kleen
caf7501a1b module/retpoline: Warn about missing retpoline in module
There's a risk that a kernel which has full retpoline mitigations becomes
vulnerable when a module gets loaded that hasn't been compiled with the
right compiler or the right option.

To enable detection of that mismatch at module load time, add a module info
string "retpoline" at build time when the module was compiled with
retpoline support. This only covers compiled C source, but assembler source
or prebuilt object files are not checked.

If a retpoline enabled kernel detects a non retpoline protected module at
load time, print a warning and report it in the sysfs vulnerability file.

[ tglx: Massaged changelog ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: gregkh@linuxfoundation.org
Cc: torvalds@linux-foundation.org
Cc: jeyu@kernel.org
Cc: arjan@linux.intel.com
Link: https://lkml.kernel.org/r/20180125235028.31211-1-andi@firstfloor.org
2018-01-26 15:03:56 +01:00
Peter Zijlstra
0c7296cad6 perf/core: Fix ctx::mutex deadlock
Lockdep noticed the following 3-way lockup scenario:

	sys_perf_event_open()
	  perf_event_alloc()
	    perf_try_init_event()
 #0	      ctx = perf_event_ctx_lock_nested(1)
	      perf_swevent_init()
		swevent_hlist_get()
 #1		  mutex_lock(&pmus_lock)

	perf_event_init_cpu()
 #1	  mutex_lock(&pmus_lock)
 #2	  mutex_lock(&ctx->mutex)

	sys_perf_event_open()
	  mutex_lock_double()
 #2	   mutex_lock()
 #0	   mutex_lock_nested()

And while we need that perf_event_ctx_lock_nested() for HW PMUs such
that they can iterate the sibling list, trying to match it to the
available counters, the software PMUs need do no such thing. Exclude
them.

In particular the swevent triggers the above invertion, while the
tpevent PMU triggers a more elaborate one through their event_mutex.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-25 14:48:30 +01:00
Peter Zijlstra
43fa87f7de perf/core: Fix another perf,trace,cpuhp lock inversion
Lockdep noticed the following 3-way lockup race:

        perf_trace_init()
 #0       mutex_lock(&event_mutex)
          perf_trace_event_init()
            perf_trace_event_reg()
              tp_event->class->reg() := tracepoint_probe_register
 #1              mutex_lock(&tracepoints_mutex)
                  trace_point_add_func()
 #2                  static_key_enable()

 #2	do_cpu_up()
	  perf_event_init_cpu()
 #3	    mutex_lock(&pmus_lock)
 #4	    mutex_lock(&ctx->mutex)

	perf_ioctl()
 #4	  ctx = perf_event_ctx_lock()
	  _perf_iotcl()
	    ftrace_profile_set_filter()
 #0	      mutex_lock(&event_mutex)

Fudge it for now by noting that the tracepoint state does not depend
on the event <-> context relation. Ugly though :/

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-25 14:48:30 +01:00
Peter Zijlstra
82d94856fa perf/core: Fix lock inversion between perf,trace,cpuhp
Lockdep gifted us with noticing the following 4-way lockup scenario:

        perf_trace_init()
 #0       mutex_lock(&event_mutex)
          perf_trace_event_init()
            perf_trace_event_reg()
              tp_event->class->reg() := tracepoint_probe_register
 #1             mutex_lock(&tracepoints_mutex)
                  trace_point_add_func()
 #2                 static_key_enable()

 #2     do_cpu_up()
          perf_event_init_cpu()
 #3         mutex_lock(&pmus_lock)
 #4         mutex_lock(&ctx->mutex)

        perf_event_task_disable()
          mutex_lock(&current->perf_event_mutex)
 #4       ctx = perf_event_ctx_lock()
 #5       perf_event_for_each_child()

        do_exit()
          task_work_run()
            __fput()
              perf_release()
                perf_event_release_kernel()
 #4               mutex_lock(&ctx->mutex)
 #5               mutex_lock(&event->child_mutex)
                  free_event()
                    _free_event()
                      event->destroy() := perf_trace_destroy
 #0                     mutex_lock(&event_mutex);

Fix that by moving the free_event() out from under the locks.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-25 14:48:29 +01:00
Marc Zyngier
c5baa1be8f irqdomain: Kill CONFIG_IRQ_DOMAIN_DEBUG
CONFIG_IRQ_DOMAIN_DEBUG is similar to CONFIG_GENERIC_IRQ_DEBUGFS,
just with less information.

Spring cleanup time.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yang Shunyong <shunyong.yang@hxt-semitech.com>
Link: https://lkml.kernel.org/r/20180117142647.23622-1-marc.zyngier@arm.com
2018-01-24 12:32:58 +01:00
Peter Zijlstra
ce48c14649 sched/core: Fix cpu.max vs. cpuhotplug deadlock
Tejun reported the following cpu-hotplug lock (percpu-rwsem) read recursion:

  tg_set_cfs_bandwidth()
    get_online_cpus()
      cpus_read_lock()

    cfs_bandwidth_usage_inc()
      static_key_slow_inc()
        cpus_read_lock()

Reported-by: Tejun Heo <tj@kernel.org>
Tested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180122215328.GP3397@worktop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-24 10:03:44 +01:00
Tejun Heo
88f1c87de1 locking/lockdep: Avoid triggering hardlockup from debug_show_all_locks()
debug_show_all_locks() iterates all tasks and print held locks whole
holding tasklist_lock.  This can take a while on a slow console device
and may end up triggering NMI hardlockup detector if someone else ends
up waiting for tasklist_lock.

Touch the NMI watchdog while printing the held locks to avoid
spuriously triggering the hardlockup detector.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20180122220055.GB1771050@devbig577.frc2.facebook.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-24 10:00:09 +01:00
Peter Zijlstra
a97cb0e7b3 futex: Fix OWNER_DEAD fixup
Both Geert and DaveJ reported that the recent futex commit:

  c1e2f0eaf0 ("futex: Avoid violating the 10th rule of futex")

introduced a problem with setting OWNER_DEAD. We set the bit on an
uninitialized variable and then entirely optimize it away as a
dead-store.

Move the setting of the bit to where it is more useful.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: c1e2f0eaf0 ("futex: Avoid violating the 10th rule of futex")
Link: http://lkml.kernel.org/r/20180122103947.GD2228@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-24 09:58:18 +01:00
Steven Rostedt (VMware)
2ee5b92a25 tracing: Update stack trace skipping for ORC unwinder
With the addition of ORC unwinder and FRAME POINTER unwinder, the stack
trace skipping requirements have changed.

I went through the tracing stack trace dumps with ORC and with frame
pointers and recalculated the proper values.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 15:57:00 -05:00
Steven Rostedt (VMware)
6be7fa3c74 ftrace, orc, x86: Handle ftrace dynamically allocated trampolines
The function tracer can create a dynamically allocated trampoline that is
called by the function mcount or fentry hook that is used to call the
function callback that is registered. The problem is that the orc undwinder
will bail if it encounters one of these trampolines. This breaks the stack
trace of function callbacks, which include the stack tracer and setting the
stack trace for individual functions.

Since these dynamic trampolines are basically copies of the static ftrace
trampolines defined in ftrace_*.S, we do not need to create new orc entries
for the dynamic trampolines. Finding the return address on the stack will be
identical as the functions that were copied to create the dynamic
trampolines. When encountering a ftrace dynamic trampoline, we can just use
the orc entry of the ftrace static function that was copied for that
trampoline.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 15:56:55 -05:00
Eric W. Biederman
f71dd7dc2d signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed
There are so many places that build struct siginfo by hand that at
least one of them is bound to get it wrong.  A handful of cases in the
kernel arguably did just that when using the errno field of siginfo to
pass no errno values to userspace.  The usage is limited to a single
si_code so at least does not mess up anything else.

Encapsulate this questionable pattern in a helper function so
that the userspace ABI is preserved.

Update all of the places that use this pattern to use the new helper
function.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-22 19:07:11 -06:00
Eric W. Biederman
382467358a signal: Helpers for faults with specialized siginfo layouts
The helpers added are:
send_sig_mceerr
force_sig_mceerr
force_sig_bnderr
force_sig_pkuerr

Filling out siginfo properly can ge tricky.  Especially for these
specialized cases where the temptation is to share code with other
cases which use a different subset of siginfo fields.  Unfortunately
that code sharing frequently results in bugs with the wrong siginfo
fields filled in, and makes it harder to verify that the siginfo
structure was properly initialized.

Provide these helpers instead that get all of the details right, and
guarantee that siginfo is properly initialized.

send_sig_mceerr and force_sig_mceer are a little special as two si
codes BUS_MCEERR_AO and BUS_MCEER_AR both use the same extended
signinfo layout.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-22 19:07:10 -06:00
Eric W. Biederman
f8ec66014f signal: Add send_sig_fault and force_sig_fault
The vast majority of signals sent from architecture specific code are
simple faults.  Encapsulate this reality with two helper functions so
that the nit-picky implementation of preparing a siginfo does not need
to be repeated many times on each architecture.

As only some architectures support the trapno field, make the trapno
arguement only present on those architectures.

Similary as ia64 has three fields: imm, flags, and isr that
are specific to it.  Have those arguments always present on ia64
and no where else.

This ensures the architecture specific code always remembers which
fields it needs to pass into the siginfo structure.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-22 19:07:09 -06:00
Eric W. Biederman
3b10db2b06 signal: Replace memset(info,...) with clear_siginfo for clarity
The function clear_siginfo is just a nice wrapper around memset so
this results in no functional change.  This change makes mistakes
a little more difficult and it makes it clearer what is going on.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-22 19:07:08 -06:00
Eric W. Biederman
5f74972ce6 signal: Don't use structure initializers for struct siginfo
The siginfo structure has all manners of holes with the result that a
structure initializer is not guaranteed to initialize all of the bits.
As we have to copy the structure to userspace don't even try to use
a structure initializer.  Instead use clear_siginfo followed by initializing
selected fields.  This gives a guarantee that uninitialized kernel memory
is not copied to userspace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-22 19:07:08 -06:00
Linus Torvalds
66f8162418 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
 "A single fix for the new matrix allocator to prevent vector exhaustion
  by certain network drivers which allocate gazillions of unused vectors
  which cannot be put into reservation mode due to MSI and the lack of
  MSI entry masking.

  The fix/workaround is to spread the vectors across CPUs by searching
  the supplied target CPU mask for the CPU with the smallest number of
  allocated vectors"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irq/matrix: Spread interrupts on allocation
2018-01-21 10:39:58 -08:00
Linus Torvalds
ec835f8104 Two more small fixes
- The conversion of enums into their actual numbers to display
    in the event format file had an off-by-one bug, that could cause
    an enum not to be converted, and break user space parsing tools.
 
  - A fix to a previous fix to bring back the context recursion checks.
    The interrupt case checks for NMI, IRQ and softirq, but the softirq
    returned the same number regardless if it was set or not, although
    the logic would force it to be set if it were hit.
 -----BEGIN PGP SIGNATURE-----
 
 iQHIBAABCgAyFiEEPm6V/WuN2kyArTUe1a05Y9njSUkFAlpiEd8UHHJvc3RlZHRA
 Z29vZG1pcy5vcmcACgkQ1a05Y9njSUndmQv9EdmMa/CXyz6IFTEHaQHizNHak5yx
 hyMsKgcyylY+GlS9p1285Gn6d5bdOpa3rbv311YVu4/8lTPlZmWQdgVrmTt/jdiK
 NbKZSG5mXYP8F6n3phenM9PFIFHkrrrS2RKq7J17gwUbE6YTT3HFxCAkK+i9UmRo
 VoRAWSEBDONma1dgh1ezkEFLQ9fucFZMIur3PiHJlZmtuhSGsImzQQBzZO0Gj1gi
 iBPPf90+55IOzPAaSAlj5RafjPFsaVgJC9I+KtSLL+waPWNTEHc9wg8XvogM5mJv
 t2Q8A91NcM4AQ37HP+Tyhot32RprZVrEWBZ6fnrC7RXFLk0HwVk6mCTwBYhoWJoN
 QiHtx5h6BtQTzNMjKBweMrEZlqWds9+y81ZrP5g/EA67yT2FQFC57IuuyH6fzfc6
 ECn6XNyGVafBjUZWhxUG6uwp2P6lsRSvFnT+tAOdocgoUqnDySoyEUIZh6Wa7CQW
 ZtmYjRISjc9220i8m5HWTTj+0imYm0p2W3w8
 =O7m4
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.15-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Two more small fixes

   - The conversion of enums into their actual numbers to display in the
     event format file had an off-by-one bug, that could cause an enum
     not to be converted, and break user space parsing tools.

   - A fix to a previous fix to bring back the context recursion checks.
     The interrupt case checks for NMI, IRQ and softirq, but the softirq
     returned the same number regardless if it was set or not, although
     the logic would force it to be set if it were hit"

* tag 'trace-v4.15-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix converting enum's from the map in trace_event_eval_update()
  ring-buffer: Fix duplicate results in mapping context to bits in recursive lock
2018-01-19 11:38:19 -08:00
Linus Torvalds
8b335c7d22 Merge branch 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
 "cgroup.threads should be delegatable (ie. a container should be able
  to write to it from inside) but was missing the flag.

  The change is very low risk"

* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: make cgroup.threads delegatable
2018-01-19 11:25:17 -08:00
Linus Torvalds
a2c9c1c035 Merge branch 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixlet from Tejun Heo:
 "One patch to add touch_nmi_watchdog() while dumping workqueue debug
  messages to avoid triggering the lockup detector spuriously.

  The change is very low risk"

* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: avoid hard lockups in show_workqueue_state()
2018-01-19 11:23:39 -08:00
Linus Torvalds
726ba84b50 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix BPF divides by zero, from Eric Dumazet and Alexei Starovoitov.

 2) Reject stores into bpf context via st and xadd, from Daniel
    Borkmann.

 3) Fix a memory leak in TUN, from Cong Wang.

 4) Disable RX aggregation on a specific troublesome configuration of
    r8152 in a Dell TB16b dock.

 5) Fix sw_ctx leak in tls, from Sabrina Dubroca.

 6) Fix program replacement in cls_bpf, from Daniel Borkmann.

 7) Fix uninitialized station_info structures in cfg80211, from Johannes
    Berg.

 8) Fix miscalculation of transport header offset field in flow
    dissector, from Eric Dumazet.

 9) Fix LPM tree leak on failure in mlxsw driver, from Ido Schimmel.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
  ibmvnic: Fix IPv6 packet descriptors
  ibmvnic: Fix IP offload control buffer
  ipv6: don't let tb6_root node share routes with other node
  ip6_gre: init dev->mtu and dev->hard_header_len correctly
  mlxsw: spectrum_router: Free LPM tree upon failure
  flow_dissector: properly cap thoff field
  fm10k: mark PM functions as __maybe_unused
  cfg80211: fix station info handling bugs
  netlink: reset extack earlier in netlink_rcv_skb
  can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
  can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
  bpf: mark dst unknown on inconsistent {s, u}bounds adjustments
  bpf: fix cls_bpf on filter replace
  Net: ethernet: ti: netcp: Fix inbound ping crash if MTU size is greater than 1500
  tls: reset crypto_info when do_tls_setsockopt_tx fails
  tls: return -EBUSY if crypto_info is already set
  tls: fix sw_ctx leak
  net/tls: Only attach to sockets in ESTABLISHED state
  net: fs_enet: do not call phy_stop() in interrupts
  r8152: disable RX aggregation on Dell TB16 dock
  ...
2018-01-19 09:30:33 -08:00
Tejun Heo
08a77676f9 string: drop __must_check from strscpy() and restore strscpy() usages in cgroup
e7fd37ba12 ("cgroup: avoid copying strings longer than the buffers")
converted possibly unsafe strncpy() usages in cgroup to strscpy().
However, although the callsites are completely fine with truncated
copied, because strscpy() is marked __must_check, it led to the
following warnings.

  kernel/cgroup/cgroup.c: In function ‘cgroup_file_name’:
  kernel/cgroup/cgroup.c:1400:10: warning: ignoring return value of ‘strscpy’, declared with attribute warn_unused_result [-Wunused-result]
     strscpy(buf, cft->name, CGROUP_FILE_NAME_MAX);
	       ^

To avoid the warnings, 50034ed496 ("cgroup: use strlcpy() instead of
strscpy() to avoid spurious warning") switched them to strlcpy().

strlcpy() is worse than strlcpy() because it unconditionally runs
strlen() on the source string, and the only reason we switched to
strlcpy() here was because it was lacking __must_check, which doesn't
reflect any material differences between the two function.  It's just
that someone added __must_check to strscpy() and not to strlcpy().

These basic string copy operations are used in variety of ways, and
one of not-so-uncommon use cases is safely handling truncated copies,
where the caller naturally doesn't care about the return value.  The
__must_check doesn't match the actual use cases and forces users to
opt for inferior variants which lack __must_check by happenstance or
spread ugly (void) casts.

Remove __must_check from strscpy() and restore strscpy() usages in
cgroup.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
2018-01-19 08:51:36 -08:00
Steven Rostedt (VMware)
1ebe1eaf2f tracing: Fix converting enum's from the map in trace_event_eval_update()
Since enums do not get converted by the TRACE_EVENT macro into their values,
the event format displaces the enum name and not the value. This breaks
tools like perf and trace-cmd that need to interpret the raw binary data. To
solve this, an enum map was created to convert these enums into their actual
numbers on boot up. This is done by TRACE_EVENTS() adding a
TRACE_DEFINE_ENUM() macro.

Some enums were not being converted. This was caused by an optization that
had a bug in it.

All calls get checked against this enum map to see if it should be converted
or not, and it compares the call's system to the system that the enum map
was created under. If they match, then they call is processed.

To cut down on the number of iterations needed to find the maps with a
matching system, since calls and maps are grouped by system, when a match is
made, the index into the map array is saved, so that the next call, if it
belongs to the same system as the previous call, could start right at that
array index and not have to scan all the previous arrays.

The problem was, the saved index was used as the variable to know if this is
a call in a new system or not. If the index was zero, it was assumed that
the call is in a new system and would keep incrementing the saved index
until it found a matching system. The issue arises when the first matching
system was at index zero. The next map, if it belonged to the same system,
would then think it was the first match and increment the index to one. If
the next call belong to the same system, it would begin its search of the
maps off by one, and miss the first enum that should be converted. This left
a single enum not converted properly.

Also add a comment to describe exactly what that index was for. It took me a
bit too long to figure out what I was thinking when debugging this issue.

Link: http://lkml.kernel.org/r/717BE572-2070-4C1E-9902-9F2E0FEDA4F8@oracle.com

Cc: stable@vger.kernel.org
Fixes: 0c564a538a ("tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values")
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Teste-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-18 15:53:10 -05:00
Steven Rostedt (VMware)
0164e0d7e8 ring-buffer: Fix duplicate results in mapping context to bits in recursive lock
In bringing back the context checks, the code checks first if its normal
(non-interrupt) context, and then for NMI then IRQ then softirq. The final
check is redundant. Since the if branch is only hit if the context is one of
NMI, IRQ, or SOFTIRQ, if it's not NMI or IRQ there's no reason to check if
it is SOFTIRQ. The current code returns the same result even if its not a
SOFTIRQ. Which is confusing.

  pc & SOFTIRQ_OFFSET ? 2 : RB_CTX_SOFTIRQ

Is redundant as RB_CTX_SOFTIRQ *is* 2!

Fixes: a0e3a18f4b ("ring-buffer: Bring back context level recursive checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-18 15:45:48 -05:00
David S. Miller
7155f8f391 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-01-18

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a divide by zero due to wrong if (src_reg == 0) check in
   64-bit mode. Properly handle this in interpreter and mask it
   also generically in verifier to guard against similar checks
   in JITs, from Eric and Alexei.

2) Fix a bug in arm64 JIT when tail calls are involved and progs
   have different stack sizes, from Daniel.

3) Reject stores into BPF context that are not expected BPF_STX |
   BPF_MEM variant, from Daniel.

4) Mark dst reg as unknown on {s,u}bounds adjustments when the
   src reg has derived bounds from dead branches, from Daniel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-18 09:17:04 -05:00