Commit Graph

110212 Commits

Author SHA1 Message Date
Linus Torvalds
d70b3ef54c Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Ingo Molnar:
 "There were so many changes in the x86/asm, x86/apic and x86/mm topics
  in this cycle that the topical separation of -tip broke down somewhat -
  so the result is a more traditional architecture pull request,
  collected into the 'x86/core' topic.

  The topics were still maintained separately as far as possible, so
  bisectability and conceptual separation should still be pretty good -
  but there were a handful of merge points to avoid excessive
  dependencies (and conflicts) that would have been poorly tested in the
  end.

  The next cycle will hopefully be much more quiet (or at least will
  have fewer dependencies).

  The main changes in this cycle were:

   * x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas
     Gleixner)

     - This is the second and most intrusive part of changes to the x86
       interrupt handling - full conversion to hierarchical interrupt
       domains:

          [IOAPIC domain]   -----
                                 |
          [MSI domain]      --------[Remapping domain] ----- [ Vector domain ]
                                 |   (optional)          |
          [HPET MSI domain] -----                        |
                                                         |
          [DMAR domain]     -----------------------------
                                                         |
          [Legacy domain]   -----------------------------

       This now reflects the actual hardware and allowed us to distangle
       the domain specific code from the underlying parent domain, which
       can be optional in the case of interrupt remapping.  It's a clear
       separation of functionality and removes quite some duct tape
       constructs which plugged the remap code between ioapic/msi/hpet
       and the vector management.

     - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt
       injection into guests (Feng Wu)

   * x86/asm changes:

     - Tons of cleanups and small speedups, micro-optimizations.  This
       is in preparation to move a good chunk of the low level entry
       code from assembly to C code (Denys Vlasenko, Andy Lutomirski,
       Brian Gerst)

     - Moved all system entry related code to a new home under
       arch/x86/entry/ (Ingo Molnar)

     - Removal of the fragile and ugly CFI dwarf debuginfo annotations.
       Conversion to C will reintroduce many of them - but meanwhile
       they are only getting in the way, and the upstream kernel does
       not rely on them (Ingo Molnar)

     - NOP handling refinements. (Borislav Petkov)

   * x86/mm changes:

     - Big PAT and MTRR rework: making the code more robust and
       preparing to phase out exposing direct MTRR interfaces to drivers -
       in favor of using PAT driven interfaces (Toshi Kani, Luis R
       Rodriguez, Borislav Petkov)

     - New ioremap_wt()/set_memory_wt() interfaces to support
       Write-Through cached memory mappings.  This is especially
       important for good performance on NVDIMM hardware (Toshi Kani)

   * x86/ras changes:

     - Add support for deferred errors on AMD (Aravind Gopalakrishnan)

       This is an important RAS feature which adds hardware support for
       poisoned data.  That means roughly that the hardware marks data
       which it has detected as corrupted but wasn't able to correct, as
       poisoned data and raises an APIC interrupt to signal that in the
       form of a deferred error.  It is the OS's responsibility then to
       take proper recovery action and thus prolonge system lifetime as
       far as possible.

     - Add support for Intel "Local MCE"s: upcoming CPUs will support
       CPU-local MCE interrupts, as opposed to the traditional system-
       wide broadcasted MCE interrupts (Ashok Raj)

     - Misc cleanups (Borislav Petkov)

   * x86/platform changes:

     - Intel Atom SoC updates

  ... and lots of other cleanups, fixlets and other changes - see the
  shortlog and the Git log for details"

* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits)
  x86/hpet: Use proper hpet device number for MSI allocation
  x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
  x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled
  x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled
  x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
  genirq: Prevent crash in irq_move_irq()
  genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain
  iommu, x86: Properly handle posted interrupts for IOMMU hotplug
  iommu, x86: Provide irq_remapping_cap() interface
  iommu, x86: Setup Posted-Interrupts capability for Intel iommu
  iommu, x86: Add cap_pi_support() to detect VT-d PI capability
  iommu, x86: Avoid migrating VT-d posted interrupts
  iommu, x86: Save the mode (posted or remapped) of an IRTE
  iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
  iommu: dmar: Provide helper to copy shared irte fields
  iommu: dmar: Extend struct irte for VT-d Posted-Interrupts
  iommu: Add new member capability to struct irq_remap_ops
  x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code
  x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation
  x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry()
  ...
2015-06-22 17:59:09 -07:00
Linus Torvalds
650ec5a6bd Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 warning fixlet from Ingo Molnar:
 "A build fix for certain (rare) variants of binutils that did not make
  it into v4.1"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Fix overflow warning with 32-bit binutils
2015-06-22 17:51:59 -07:00
Linus Torvalds
35ffccdb7e Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pul x86 microcode updates from Ingo Molnar:
 "x86 microcode loader updates from Borislav Petkov:

   - early parsing of the built-in microcode

   - cleanups

   - misc smaller fixes"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Correct CPU family related variable types
  x86/microcode: Disable builtin microcode loading on 32-bit for now
  x86/microcode/intel: Rename get_matching_sig()
  x86/microcode/intel: Simplify get_matching_sig()
  x86/microcode/intel: Simplify update_match_cpu()
  x86/microcode/intel: Rename get_matching_microcode
  x86/cpu/microcode: Zap changelog
  x86/microcode: Parse built-in microcode early
  x86/microcode/intel: Remove unused @rev arg of get_matching_sig()
  x86/microcode/intel: Get rid of revision_is_newer()
2015-06-22 17:46:14 -07:00
Linus Torvalds
e2172d8fd5 Merge branch 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kdump updates from Ingo Molnar:
 "Three kdump robustness related improvements (Joerg Roedel)"

* 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/crash: Allocate enough low memory when crashkernel=high
  x86/swiotlb: Try coherent allocations with __GFP_NOWARN
  swiotlb: Warn on allocation failure in swiotlb_alloc_coherent()
2015-06-22 17:40:55 -07:00
Linus Torvalds
e75c73ad64 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU updates from Ingo Molnar:
 "This tree contains two main changes:

   - The big FPU code rewrite: wide reaching cleanups and reorganization
     that pulls all the FPU code together into a clean base in
     arch/x86/fpu/.

     The resulting code is leaner and faster, and much easier to
     understand.  This enables future work to further simplify the FPU
     code (such as removing lazy FPU restores).

     By its nature these changes have a substantial regression risk: FPU
     code related bugs are long lived, because races are often subtle
     and bugs mask as user-space failures that are difficult to track
     back to kernel side backs.  I'm aware of no unfixed (or even
     suspected) FPU related regression so far.

   - MPX support rework/fixes.  As this is still not a released CPU
     feature, there were some buglets in the code - should be much more
     robust now (Dave Hansen)"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (250 commits)
  x86/fpu: Fix double-increment in setup_xstate_features()
  x86/mpx: Allow 32-bit binaries on 64-bit kernels again
  x86/mpx: Do not count MPX VMAs as neighbors when unmapping
  x86/mpx: Rewrite the unmap code
  x86/mpx: Support 32-bit binaries on 64-bit kernels
  x86/mpx: Use 32-bit-only cmpxchg() for 32-bit apps
  x86/mpx: Introduce new 'directory entry' to 'addr' helper function
  x86/mpx: Add temporary variable to reduce masking
  x86: Make is_64bit_mm() widely available
  x86/mpx: Trace allocation of new bounds tables
  x86/mpx: Trace the attempts to find bounds tables
  x86/mpx: Trace entry to bounds exception paths
  x86/mpx: Trace #BR exceptions
  x86/mpx: Introduce a boot-time disable flag
  x86/mpx: Restrict the mmap() size check to bounds tables
  x86/mpx: Remove redundant MPX_BNDCFG_ADDR_MASK
  x86/mpx: Clean up the code by not passing a task pointer around when unnecessary
  x86/mpx: Use the new get_xsave_field_ptr()API
  x86/fpu/xstate: Wrap get_xsave_addr() to make it safer
  x86/fpu/xstate: Fix up bad get_xsave_addr() assumptions
  ...
2015-06-22 17:16:11 -07:00
Linus Torvalds
cfe3eceb7a Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI updates from Ingo Molnar:
 "EFI changes:

   - Use idiomatic negative error values in efivar_create_sysfs_entry()
     instead of returning '1' to indicate error (Dan Carpenter)

   - Implement new support to expose the EFI System Resource Tables in
     sysfs, which provides information for performing firmware updates
     (Peter Jones)

   - Documentation cleanup in the EFI handover protocol section which
     falsely claimed that 'cmdline_size' needed to be filled out by the
     boot loader (Alex Smith)

   - Align the order of SMBIOS tables in /sys/firmware/efi/systab to
     match the way that we do things for ACPI and add documentation to
     Documentation/ABI (Jean Delvare)"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: Work around ia64 build problem with ESRT driver
  efi: Add 'systab' information to Documentation/ABI
  efi: dmi: List SMBIOS3 table before SMBIOS table
  efi/esrt: Fix some compiler warnings
  x86, doc: Remove cmdline_size from list of fields to be filled in for EFI handover
  efi: Add esrt support
  efi: efivar_create_sysfs_entry() should return negative error codes
2015-06-22 17:10:44 -07:00
Linus Torvalds
b3ba283d83 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPU features from Ingo Molnar:
 "Various CPU feature support related changes: in particular the
  /proc/cpuinfo model name sanitization change should be monitored, it
  has a chance to break stuff.  (but really shouldn't and there are no
  regression reports)"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/amd: Give access to the number of nodes in a physical package
  x86/cpu: Trim model ID whitespace
  x86/cpu: Strip any /proc/cpuinfo model name field whitespace
  x86/cpu/amd: Set X86_FEATURE_EXTD_APICID for future processors
  x86/gart: Check for GART support before accessing GART registers
2015-06-22 16:43:01 -07:00
Linus Torvalds
d43e4f44ba Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Clean up types in xlate_dev_mem_ptr() some more
  x86: Deinline dma_free_attrs()
  x86: Deinline dma_alloc_attrs()
  x86: Remove unused TI_cpu
  x86: Merge common 32-bit values in asm-offsets.c
2015-06-22 16:23:00 -07:00
Linus Torvalds
23b7776290 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes are:

   - lockless wakeup support for futexes and IPC message queues
     (Davidlohr Bueso, Peter Zijlstra)

   - Replace spinlocks with atomics in thread_group_cputimer(), to
     improve scalability (Jason Low)

   - NUMA balancing improvements (Rik van Riel)

   - SCHED_DEADLINE improvements (Wanpeng Li)

   - clean up and reorganize preemption helpers (Frederic Weisbecker)

   - decouple page fault disabling machinery from the preemption
     counter, to improve debuggability and robustness (David
     Hildenbrand)

   - SCHED_DEADLINE documentation updates (Luca Abeni)

   - topology CPU masks cleanups (Bartosz Golaszewski)

   - /proc/sched_debug improvements (Srikar Dronamraju)"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (79 commits)
  sched/deadline: Remove needless parameter in dl_runtime_exceeded()
  sched: Remove superfluous resetting of the p->dl_throttled flag
  sched/deadline: Drop duplicate init_sched_dl_class() declaration
  sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target
  sched/deadline: Make init_sched_dl_class() __init
  sched/deadline: Optimize pull_dl_task()
  sched/preempt: Add static_key() to preempt_notifiers
  sched/preempt: Fix preempt notifiers documentation about hlist_del() within unsafe iteration
  sched/stop_machine: Fix deadlock between multiple stop_two_cpus()
  sched/debug: Add sum_sleep_runtime to /proc/<pid>/sched
  sched/debug: Replace vruntime with wait_sum in /proc/sched_debug
  sched/debug: Properly format runnable tasks in /proc/sched_debug
  sched/numa: Only consider less busy nodes as numa balancing destinations
  Revert 095bebf61a ("sched/numa: Do not move past the balance point if unbalanced")
  sched/fair: Prevent throttling in early pick_next_task_fair()
  preempt: Reorganize the notrace definitions a bit
  preempt: Use preempt_schedule_context() as the official tracing preemption point
  sched: Make preempt_schedule_context() function-tracing safe
  x86: Remove cpu_sibling_mask() and cpu_core_mask()
  x86: Replace cpu_**_mask() with topology_**_cpumask()
  ...
2015-06-22 15:52:04 -07:00
Linus Torvalds
6bc4c3ad36 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "These are the left over fixes from the v4.1 cycle"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Fix build breakage if prefix= is specified
  perf/x86: Honor the architectural performance monitoring version
  perf/x86/intel: Fix PMI handling for Intel PT
  perf/x86/intel/bts: Fix DS area sharing with x86_pmu events
  perf/x86: Add more Broadwell model numbers
  perf: Fix ring_buffer_attach() RCU sync, again
2015-06-22 15:45:41 -07:00
Linus Torvalds
c58267e9fa Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes mostly consist of work on x86 PMU drivers:

   - x86 Intel PT (hardware CPU tracer) improvements (Alexander
     Shishkin)

   - x86 Intel CQM (cache quality monitoring) improvements (Thomas
     Gleixner)

   - x86 Intel PEBSv3 support (Peter Zijlstra)

   - x86 Intel PEBS interrupt batching support for lower overhead
     sampling (Zheng Yan, Kan Liang)

   - x86 PMU scheduler fixes and improvements (Peter Zijlstra)

  There's too many tooling improvements to list them all - here are a
  few select highlights:

  'perf bench':

      - Introduce new 'perf bench futex' benchmark: 'wake-parallel', to
        measure parallel waker threads generating contention for kernel
        locks (hb->lock). (Davidlohr Bueso)

  'perf top', 'perf report':

      - Allow disabling/enabling events dynamicaly in 'perf top':
        a 'perf top' session can instantly become a 'perf report'
        one, i.e. going from dynamic analysis to a static one,
        returning to a dynamic one is possible, to toogle the
        modes, just press 'f' to 'freeze/unfreeze' the sampling. (Arnaldo Carvalho de Melo)

      - Make Ctrl-C stop processing on TUI, allowing interrupting the load of big
        perf.data files (Namhyung Kim)

  'perf probe': (Masami Hiramatsu)

      - Support glob wildcards for function name
      - Support $params special probe argument: Collect all function arguments
      - Make --line checks validate C-style function name.
      - Add --no-inlines option to avoid searching inline functions
      - Greatly speed up 'perf probe --list' by caching debuginfo.
      - Improve --filter support for 'perf probe', allowing using its arguments
        on other commands, as --add, --del, etc.

  'perf sched':

      - Add option in 'perf sched' to merge like comms to lat output (Josef Bacik)

  Plus tons of infrastructure work - in particular preparation for
  upcoming threaded perf report support, but also lots of other work -
  and fixes and other improvements.  See (much) more details in the
  shortlog and in the git log"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (305 commits)
  perf tools: Configurable per thread proc map processing time out
  perf tools: Add time out to force stop proc map processing
  perf report: Fix sort__sym_cmp to also compare end of symbol
  perf hists browser: React to unassigned hotkey pressing
  perf top: Tell the user how to unfreeze events after pressing 'f'
  perf hists browser: Honour the help line provided by builtin-{top,report}.c
  perf hists browser: Do not exit when 'f' is pressed in 'report' mode
  perf top: Replace CTRL+z with 'f' as hotkey for enable/disable events
  perf annotate: Rename source_line_percent to source_line_samples
  perf annotate: Display total number of samples with --show-total-period
  perf tools: Ensure thread-stack is flushed
  perf top: Allow disabling/enabling events dynamicly
  perf evlist: Add toggle_enable() method
  perf trace: Fix race condition at the end of started workloads
  perf probe: Speed up perf probe --list by caching debuginfo
  perf probe: Show usage even if the last event is skipped
  perf tools: Move libtraceevent dynamic list to separated LDFLAGS variable
  perf tools: Fix a problem when opening old perf.data with different byte order
  perf tools: Ignore .config-detected in .gitignore
  perf probe: Fix to return error if no probe is added
  ...
2015-06-22 15:19:21 -07:00
Linus Torvalds
1bf7067c6e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes are:

   - 'qspinlock' support, enabled on x86: queued spinlocks - these are
     now the spinlock variant used by x86 as they outperform ticket
     spinlocks in every category.  (Waiman Long)

   - 'pvqspinlock' support on x86: paravirtualized variant of queued
     spinlocks.  (Waiman Long, Peter Zijlstra)

   - 'qrwlock' support, enabled on x86: queued rwlocks.  Similar to
     queued spinlocks, they are now the variant used by x86:

       CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
       CONFIG_QUEUED_SPINLOCKS=y
       CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
       CONFIG_QUEUED_RWLOCKS=y

   - various lockdep fixlets

   - various locking primitives cleanups, further WRITE_ONCE()
     propagation"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  locking/lockdep: Remove hard coded array size dependency
  locking/qrwlock: Don't contend with readers when setting _QW_WAITING
  lockdep: Do not break user-visible string
  locking/arch: Rename set_mb() to smp_store_mb()
  locking/arch: Add WRITE_ONCE() to set_mb()
  rtmutex: Warn if trylock is called from hard/softirq context
  arch: Remove __ARCH_HAVE_CMPXCHG
  locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG
  locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS
  locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS
  locking/pvqspinlock: Replace xchg() by the more descriptive set_mb()
  locking/pvqspinlock, x86: Enable PV qspinlock for Xen
  locking/pvqspinlock, x86: Enable PV qspinlock for KVM
  locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching
  locking/pvqspinlock: Implement simple paravirt support for the qspinlock
  locking/qspinlock: Revert to test-and-set on hypervisors
  locking/qspinlock: Use a simple write to grab the lock
  locking/qspinlock: Optimize for smaller NR_CPUS
  locking/qspinlock: Extract out code snippets for the next patch
  locking/qspinlock: Add pending bit
  ...
2015-06-22 14:54:22 -07:00
Linus Torvalds
fc934d4017 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:

 - Continued initialization/Kconfig updates: hide most Kconfig options
   from unsuspecting users.

   There's now a single high level configuration option:

        *
        * RCU Subsystem
        *
        Make expert-level adjustments to RCU configuration (RCU_EXPERT) [N/y/?] (NEW)

   Which if answered in the negative, leaves us with a single
   interactive configuration option:

        Offload RCU callback processing from boot-selected CPUs (RCU_NOCB_CPU) [N/y/?] (NEW)

   All the rest of the RCU options are configured automatically.  Later
   on we'll remove this single leftover configuration option as well.

 - Remove all uses of RCU-protected array indexes: replace the
   rcu_[access|dereference]_index_check() APIs with READ_ONCE() and
   rcu_lockdep_assert()

 - RCU CPU-hotplug cleanups

 - Updates to Tiny RCU: a race fix and further code shrinkage.

 - RCU torture-testing updates: fixes, speedups, cleanups and
   documentation updates.

 - Miscellaneous fixes

 - Documentation updates

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  rcutorture: Allow repetition factors in Kconfig-fragment lists
  rcutorture: Display "make oldconfig" errors
  rcutorture: Update TREE_RCU-kconfig.txt
  rcutorture: Make rcutorture scripts force RCU_EXPERT
  rcutorture: Update configuration fragments for rcutree.rcu_fanout_exact
  rcutorture: TASKS_RCU set directly, so don't explicitly set it
  rcutorture: Test SRCU cleanup code path
  rcutorture: Replace barriers with smp_store_release() and smp_load_acquire()
  locktorture: Change longdelay_us to longdelay_ms
  rcutorture: Allow negative values of nreaders to oversubscribe
  rcutorture: Exchange TREE03 and TREE08 NR_CPUS, speed up CPU hotplug
  rcutorture: Exchange TREE03 and TREE04 geometries
  locktorture: fix deadlock in 'rw_lock_irq' type
  rcu: Correctly handle non-empty Tiny RCU callback list with none ready
  rcutorture: Test both RCU-sched and RCU-bh for Tiny RCU
  rcu: Further shrink Tiny RCU by making empty functions static inlines
  rcu: Conditionally compile RCU's eqs warnings
  rcu: Remove prompt for RCU implementation
  rcu: Make RCU able to tolerate undefined CONFIG_RCU_KTHREAD_PRIO
  rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT_LEAF
  ...
2015-06-22 14:01:01 -07:00
Rafael J. Wysocki
3bcda76d9d Merge branch 'pm-sleep'
* pm-sleep:
  x86: Load __USER_DS into DS/ES after resume
2015-06-22 14:40:28 +02:00
Ingo Molnar
ffa64eff95 x86: Load __USER_DS into DS/ES after resume
Srinivas Pandruvada reported a problem with system resume from
suspend-to-RAM on 32-bit x86 systems where the DS register of
the CPU is set to __KERNEL_DS instead of __USER_DS on return
to user space which cases a General Protection Fault to occur.

The issue is that DS is set to __KERNEL_DS by the ACPI resume code
path while the SYSEXIT path never reloads DS/ES.  It assumes they
are still __USER_DS set at the SYSENTER time (Brian Gerst), so if
the return to user space happens to be through SYSEXIT, it will lead
to the reported GPF.

Fix the problem by setting the DS and ES registers to __USER_DS
as expected by the SYSEXIT path.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=61781
Link: http://marc.info/?l=linux-pm&m=143406648920385&w=2
Acked-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-22 14:40:03 +02:00
Ingo Molnar
7ef3d7d58d Merge branches 'x86/apic', 'x86/asm', 'x86/mm' and 'x86/platform' into x86/core, to merge last updates
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-22 09:15:03 +02:00
Thomas Gleixner
cb17b2a674 x86/hpet: Use proper hpet device number for MSI allocation
hpet_assign_irq() is called with hpet_device->num as "hardware
interrupt number", but hpet_device->num is initialized after the
interrupt has been assigned, so it's always 0. As a consequence only
the first MSI allocation succeeds, the following ones fail because the
"hardware interrupt number" already exists.

Move the initialization of dev->num and other fields before the call
to hpet_assign_irq(), which is the ordering before the offending
commit which introduced that regression.

Fixes: "3cb96f0c9733 x86/hpet: Enhance HPET IRQ to support hierarchical irqdomains"
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1506211635010.4107@nanos
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
2015-06-21 16:38:40 +02:00
Jiang Liu
bafac298fb x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
irq == 0 is not a valid irq for a irqdomain MSI allocation, but hpet
code checks only for negative return values.

Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/558447AF.30703@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-20 12:00:58 +02:00
Vladimir Murzin
86dca36e6b arm64: use private ratelimit state along with show_unhandled_signals
printk_ratelimit() shares the ratelimiting state with other callers what
may lead to scenarios where at the time we want to print out debug
information we already limited, so nothing appears in the dmesg - this
makes exception-trace quite poor helper in debugging.

Additionally, we have imbalance with some messages limited with global
ratelimit state and other messages limited with their private state
defined via pr_*_ratelimited().

To address this inconsistency show_unhandled_signals_ratelimited()
macro is introduced and caller sites are converted to use it.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-19 16:26:15 +01:00
Vladimir Murzin
9e793ab84e arm64: show unhandled SP/PC alignment faults
Report unhandled SP/PC alignment faults if the show_unhandled_signals
variable is set (via /proc/sys/debug/exception-trace).

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-19 16:20:10 +01:00
Wei Huang
e5af058aac KVM: x86/vPMU: reorder PMU functions
Keep called functions closer to their callers, and init/destroy
functions next to each other.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:30 +02:00
Wei Huang
e84cfe4ce0 KVM: x86/vPMU: whitespace and stylistic adjustments in PMU code
Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:30 +02:00
Wei Huang
212dba1267 KVM: x86/vPMU: use the new macros to go between PMC, PMU and VCPU
Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:30 +02:00
Wei Huang
474a5bb944 KVM: x86/vPMU: introduce pmu.h header
This will be used for private function used by AMD- and Intel-specific
PMU implementations.

Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:29 +02:00
Wei Huang
c6702c9dcf KVM: x86/vPMU: rename a few PMU functions
Before introducing a pmu.h header for them, make the naming more
consistent.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:29 +02:00
Xiao Guangrong
6a39bbc5da KVM: MTRR: do not map huge page for non-consistent range
Based on Intel's SDM, mapping huge page which do not have consistent
memory cache for each 4k page will cause undefined behavior

In order to avoiding this kind of undefined behavior, we force to use
4k pages under this case

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:29 +02:00
Xiao Guangrong
fa61213746 KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type
mtrr_for_each_mem_type() is ready now, use it to simplify
kvm_mtrr_get_guest_memory_type()

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:29 +02:00
Xiao Guangrong
f571c0973e KVM: MTRR: introduce mtrr_for_each_mem_type
It walks all MTRRs and gets all the memory cache type setting for the
specified range also it checks if the range is fully covered by MTRRs

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Adjust for range_size->range_shift change. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:29 +02:00
Xiao Guangrong
f7bfb57b3e KVM: MTRR: introduce fixed_mtrr_addr_* functions
Two functions are introduced:
- fixed_mtrr_addr_to_seg() translates the address to the fixed
  MTRR segment

- fixed_mtrr_addr_seg_to_range_index() translates the address to
  the index of kvm_mtrr.fixed_ranges[]

They will be used in the later patch

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Adjust for range_size->range_shift change. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:28 +02:00
Xiao Guangrong
19efffa244 KVM: MTRR: sort variable MTRRs
Sort all valid variable MTRRs based on its base address, it will help us to
check a range to see if it's fully contained in variable MTRRs

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Fix list insertion sort, simplify var_mtrr_range_is_valid to just
 test the V bit. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:28 +02:00
Xiao Guangrong
a13842dc66 KVM: MTRR: introduce var_mtrr_range
It gets the range for the specified variable MTRR

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Simplify boolean operations. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:28 +02:00
Xiao Guangrong
de9aef5e1a KVM: MTRR: introduce fixed_mtrr_segment table
This table summarizes the information of fixed MTRRs and introduce some APIs
to abstract its operation which helps us to clean up the code and will be
used in later patches

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Change range_size to range_shift, in order to avoid udivdi3 errors.
 - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:28 +02:00
Xiao Guangrong
3f3f78b614 KVM: MTRR: improve kvm_mtrr_get_guest_memory_type
- kvm_mtrr_get_guest_memory_type() only checks one page in MTRRs so
   that it's unnecessary to check to see if the range is partially
   covered in MTRR

 - optimize the check of overlap memory type and add some comments
   to explain the precedence

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:28 +02:00
Xiao Guangrong
86fd52701c KVM: MTRR: do not split 64 bits MSR content
Variable MTRR MSRs are 64 bits which are directly accessed with full length,
no reason to split them to two 32 bits

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:27 +02:00
Xiao Guangrong
10fac2dc2b KVM: MTRR: clean up mtrr default type
Drop kvm_mtrr->enable, omit the decode/code workload and get rid of
all the hard code

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:27 +02:00
Xiao Guangrong
910a6aae4e KVM: MTRR: exactly define the size of variable MTRRs
Only KVM_NR_VAR_MTRR variable MTRRs are available in KVM guest

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:27 +02:00
Xiao Guangrong
70109e7d9d KVM: MTRR: remove mtrr_state.have_fixed
vMTRR does not depend on any host MTRR feature and fixed MTRRs have always
been implemented, so drop this field

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:27 +02:00
Xiao Guangrong
eb839917a7 KVM: MTRR: handle MSR_MTRRcap in kvm_mtrr_get_msr
MSR_MTRRcap is a MTRR msr so move the handler to the common place, also
add some comments to make the hard code more readable

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:27 +02:00
Xiao Guangrong
ff53604b40 KVM: x86: move MTRR related code to a separate file
MTRR code locates in x86.c and mmu.c so that move them to a separate file to
make the organization more clearer and it will be the place where we fully
implement vMTRR

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:26 +02:00
Xiao Guangrong
b18d5431ac KVM: x86: fix CR0.CD virtualization
Currently, CR0.CD is not checked when we virtualize memory cache type for
noncoherent_dma guests, this patch fixes it by :

- setting UC for all memory if CR0.CD = 1
- zapping all the last sptes in MMU if CR0.CD is changed

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:26 +02:00
Bandan Das
f104765b4f KVM: nSVM: Check for NRIPS support before updating control field
If hardware doesn't support DecodeAssist - a feature that provides
more information about the intercept in the VMCB, KVM decodes the
instruction and then updates the next_rip vmcb control field.
However, NRIP support itself depends on cpuid Fn8000_000A_EDX[NRIPS].
Since skip_emulated_instruction() doesn't verify nrip support
before accepting control.next_rip as valid, avoid writing this
field if support isn't present.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 17:16:26 +02:00
Paolo Bonzini
05fe125fa3 KVM/ARM changes for v4.2:
- Proper guest time accounting
 - FP access fix for 32bit
 - The usual pile of GIC fixes
 - PSCI fixes
 - Random cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVg/c2AAoJECPQ0LrRPXpDjSoQALjxhdY0ROAz2yjVqcYD5Goj
 8XHLbwDxlXQIqXulEakpx/2CtQs3tYsnt7gg3HoO3aY0TqxKY+0nh9RuZ/b15KcP
 hailUK/hbdLo9sDmoPBRBBP26gqjnXVMYTNqSFk4u2KADYUYgl5auhXcTNARH6Gx
 +c3LJG9HUbbvKkTG+RB+Xgl3f9N0uaIaV15oTX+WsolNIQVmA8Sh5X7nXPnFXV4r
 klb9pcR3FPWA9m+fDW8VszoLZ39a12kfQsK0yshKy5ep/AA1liPEYDVc6Wp/cEqz
 ybDfGuO9GPoPLfzhYq1Cw/SeKaAFruu6QD1ugsSlh1DVZ2evme+OsEvzRsfCqEYc
 VkKkX2Rc8DQTjsJoaTCOmNfU+v0dWwriCXmfLPBj01+55M4oCDcntTasrEsDAc3o
 YkTk68T9HJxsNesWpkZDo0pYzGeu3GDS1Hg/ElESminCNfK/S/GBTtidLq7Rs7KU
 VJCaUeob6CoQnRdmNEm+nHInh5pSoOlKzdduFM9IcvOO5B12YlNMcVnr48Y16ea7
 ElYGkxR/n/uLt33A3Yvaj8+PeJbwxQ1RcWRDJy8Fw77AFLEaT2VTr2kZgJIqTatQ
 ENnCLMxNP2uAt1TrGhEempndYZdWK/RiuaA9MjnxSscFkrXPfn4bMJzV58ghF9L+
 62hk2S8I8q69SvqbPTgv
 =VQWP
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM changes for v4.2:

- Proper guest time accounting
- FP access fix for 32bit
- The usual pile of GIC fixes
- PSCI fixes
- Random cleanups
2015-06-19 17:15:24 +02:00
Herbert Xu
c0b59fafe3 Merge branch 'mvebu/drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Merge the mvebu/drivers branch of the arm-soc tree which contains
just a single patch bfa1ce5f38 ("bus:
mvebu-mbus: add mv_mbus_dram_info_nooverlap()") that happens to be
a prerequisite of the new marvell/cesa crypto driver.
2015-06-19 22:07:07 +08:00
Borislav Petkov
04c17341b4 x86/boot: Fix overflow warning with 32-bit binutils
When building the kernel with 32-bit binutils built with support
only for the i386 target, we get the following warning:

  arch/x86/kernel/head_32.S:66: Warning: shift count out of range (32 is not between 0 and 31)

The problem is that in that case, binutils' internal type
representation is 32-bit wide and the shift range overflows.

In order to fix this, manipulate the shift expression which
creates the 4GiB constant to not overflow the shift count.

Suggested-by: Michael Matz <matz@suse.de>
Reported-and-tested-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19 16:03:26 +02:00
Will Deacon
6f1a6ae87c arm64: vdso: work-around broken ELF toolchains in Makefile
When building the kernel with a bare-metal (ELF) toolchain, the -shared
option may not be passed down to collect2, resulting in silent corruption
of the vDSO image (in particular, the DYNAMIC section is omitted).

The effect of this corruption is that the dynamic linker fails to find
the vDSO symbols and libc is instead used for the syscalls that we
intended to optimise (e.g. gettimeofday). Functionally, there is no
issue as the sigreturn trampoline is still intact and located by the
kernel.

This patch fixes the problem by explicitly passing -shared to the linker
when building the vDSO.

Cc: <stable@vger.kernel.org>
Reported-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Reported-by: James Greenlaigh <james.greenhalgh@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-19 14:54:10 +01:00
Sudeep Holla
af391b15f7 arm64: kernel: rename __cpu_suspend to keep it aligned with arm
This patch renames __cpu_suspend to cpu_suspend so that it's aligned
with ARM32. It also removes the redundant wrapper created.

This is in preparation to implement generic PSCI system suspend using
the cpu_{suspend,resume} which now has the same interface on both ARM
and ARM64.

Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-19 14:46:39 +01:00
Palik, Imre
2c33645d36 perf/x86: Honor the architectural performance monitoring version
Architectural performance monitoring, version 1, doesn't support fixed counters.

Currently, even if a hypervisor advertises support for architectural
performance monitoring version 1, perf may still try to use the fixed
counters, as the constraints are set up based on the CPU model.

This patch ensures that perf honors the architectural performance monitoring
version returned by CPUID, and it only uses the fixed counters for version 2
and above.

(Some of the ideas in this patch came from Peter Zijlstra.)

Signed-off-by: Imre Palik <imrep@amazon.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433767609-1039-1-git-send-email-imrep.amz@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19 09:38:48 +02:00
Alexander Shishkin
1b7b938f18 perf/x86/intel: Fix PMI handling for Intel PT
Intel PT is a separate PMU and it is not using any of the x86_pmu
code paths, which means in particular that the active_events counter
remains intact when new PT events are created.

However, PT uses the generic x86_pmu PMI handler for its PMI handling needs.

The problem here is that the latter checks active_events and in case of it
being zero, exits without calling the actual x86_pmu.handle_nmi(), which
results in unknown NMI errors and massive data loss for PT.

The effect is not visible if there are other perf events in the system
at the same time that keep active_events counter non-zero, for instance
if the NMI watchdog is running, so one needs to disable it to reproduce
the problem.

At the same time, the active_events counter besides doing what the name
suggests also implicitly serves as a PMC hardware and DS area reference
counter.

This patch adds a separate reference counter for the PMC hardware, leaving
active_events for actually counting the events and makes sure it also
counts PT and BTS events.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Link: http://lkml.kernel.org/r/87k2v92t0s.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19 09:38:47 +02:00
Alexander Shishkin
6b099d9b04 perf/x86/intel/bts: Fix DS area sharing with x86_pmu events
Currently, the intel_bts driver relies on the DS area allocated by the x86_pmu
code in its event_init() path, which is a bug: creating a BTS event while
no x86_pmu events are present results in a NULL pointer dereference.

The same DS area is also used by PEBS sampling, which makes it quite a bit
trickier to have a separate one for intel_bts' purposes.

This patch makes intel_bts driver use the same DS allocation and reference
counting code as x86_pmu to make sure it is always present when either
intel_bts or x86_pmu need it.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Link: http://lkml.kernel.org/r/1434024837-9916-2-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19 09:38:47 +02:00
Andi Kleen
4b36f1a413 perf/x86: Add more Broadwell model numbers
This patch adds additional model numbers for Broadwell to perf.
Support for Broadwell with Iris Pro (Intel Core i7-57xxC)
and support for Broadwell Server Xeon.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1434055942-28253-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19 09:38:46 +02:00
Michael Ellerman
6096f88451 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include more 8xx optimizations, an e6500 hugetlb optimization,
QMan device tree nodes, t1024/t1023 support, and various fixes and
cleanup."
2015-06-19 17:23:48 +10:00
Alexey Kardashevskiy
5c89a87d13 powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma()
When pnv_pci_ioda_fixup() is called during PHB fixup time, each PE in
the sorted list of PEs (phb::pe_dma_list) is iterated to setup the PE's
DMA32 space by pnv_ioda_setup_bus_dma() if the PE's DMA32 weight is bigger
than zero. The function also assigns all the subordinate PCI devices of
the PE's primary bus with the PE's DMA32 IOMMU table. It causes the PCI
devicess in the child PEs, which don't have DMA weight, receives wrong
IOMMU table and then IOMMU group.

The patch fixes above issue by more check on the PE's coverage and don't
assign IOMMU table to those PCI devices, which belong to the child PEs.
The problem was found on Firestone platform initially.

Suggested-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-19 17:10:29 +10:00
Aneesh Kumar K.V
e148315852 powerpc/mm: Change the swap encoding in pte.
Current swap encoding in pte can't support large pfns
above 4TB. Change the swap encoding such that we put
the swap type in the PTE bits. Also add build checks
to make sure we don't overlap with HPTEFLAGS.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-19 17:10:29 +10:00
Aneesh Kumar K.V
02505cebc1 powerpc/mm: PTE_RPN_MAX is not used, remove the same
Remove the unused #define

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-19 17:10:28 +10:00
Sam bobroff
b4b56f9eca powerpc/tm: Abort syscalls in active transactions
This patch changes the syscall handler to doom (tabort) active
transactions when a syscall is made and return very early without
performing the syscall and keeping side effects to a minimum (no CPU
accounting or system call tracing is performed). Also included is a
new HWCAP2 bit, PPC_FEATURE2_HTM_NOSC, to indicate this
behaviour to userspace.

Currently, the system call instruction automatically suspends an
active transaction which causes side effects to persist when an active
transaction fails.

This does change the kernel's behaviour, but in a way that was
documented as unsupported.  It doesn't reduce functionality as
syscalls will still be performed after tsuspend; it just requires that
the transaction be explicitly suspended.  It also provides a
consistent interface and makes the behaviour of user code
substantially the same across powerpc and platforms that do not
support suspended transactions (e.g. x86 and s390).

Performance measurements using
http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of
a normal (non-aborted) system call increases by about 0.25%.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-19 17:10:28 +10:00
Rafael J. Wysocki
b306976ef7 Merge branches 'pm-clk', 'pm-domains' and 'powercap'
* pm-clk:
  PM / clk: Print acquired clock name in addition to con_id
  PM / clk: Fix clock error check in __pm_clk_add()
  drivers: sh: remove boilerplate code and use USE_PM_CLK_RUNTIME_OPS
  arm: davinci: remove boilerplate code and use USE_PM_CLK_RUNTIME_OPS
  arm: omap1: remove boilerplate code and use USE_PM_CLK_RUNTIME_OPS
  arm: keystone: remove boilerplate code and use USE_PM_CLK_RUNTIME_OPS
  PM / clock_ops: Provide default runtime ops to users

* pm-domains:
  PM / Domains: Skip timings during syscore suspend/resume

* powercap:
  powercap / RAPL: Support Knights Landing
  powercap / RAPL: Floor frequency setting in Atom SoC
2015-06-19 01:18:30 +02:00
Rafael J. Wysocki
ab232ba570 Merge branches 'pm-sleep' and 'pm-runtime'
* pm-sleep:
  PM / sleep: trace_device_pm_callback coverage in dpm_prepare/complete
  PM / wakeup: add a dummy wakeup_source to record statistics
  PM / sleep: Make suspend-to-idle-specific code depend on CONFIG_SUSPEND
  PM / sleep: Return -EBUSY from suspend_enter() on wakeup detection
  PM / tick: Add tracepoints for suspend-to-idle diagnostics
  PM / sleep: Fix symbol name in a comment in kernel/power/main.c
  leds / PM: fix hibernation on arm when gpio-led used with CPU led trigger
  ARM: omap-device: use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS
  bus: omap_l3_noc: add missed callbacks for suspend-to-disk
  PM / sleep: Add macro to define common noirq system PM callbacks
  PM / sleep: Refine diagnostic messages in enter_state()
  PM / wakeup: validate wakeup source before activating it.

* pm-runtime:
  PM / Runtime: Update last_busy in rpm_resume
  PM / runtime: add note about re-calling in during device probe()
2015-06-19 01:18:02 +02:00
Sebastian Ott
7fc611ff3f s390/pci: improve handling of hotplug event 0x301
Hypervisors may deliver event 0x301 not only for standby
but also for reserved devices.
Just handle event 0x301 regardless of the device's state.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-06-18 15:45:54 +02:00
Sebastian Ott
0c36b8ac70 s390/setup: fix DMA_API_DEBUG warnings
Addresses from the usable space in [_ehead, _stext] lead to false
positives in DMA_API_DEBUG code (which will complain when an address
is in [_text, _etext]).

Avoid these warnings by not using that memory in case of
CONFIG_DMA_API_DEBUG=y.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-06-18 15:45:46 +02:00
Russell King
7f77c5c39d ARM: gemini: Fix race in installing GPIO chained IRQ handler
The gemini code was installing its chained interrupt handler (which
enables the interrupt) before it was setting its data, which is bad if
the IRQ was previously pending.  Avoid this problem by converting it to
irq_set_chained_handler_and_data().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Hans Ulli Kroll <ulli.kroll@googlemail.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/E1Z4z07-0002SO-Gv@rmk-PC.arm.linux.org.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-18 14:03:08 +02:00
Russell King
056c0acf87 ARM: sa1100: convert SA11x0 related code to use new chained handler helper
Convert SA11x0 (Neponset, SA1111, and UCB1x00 code) to use the new
irq_set_chained_handler_and_data() helper.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Hans Ulli Kroll <ulli.kroll@googlemail.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/E1Z4yzx-0002S6-7p@rmk-PC.arm.linux.org.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-18 14:03:08 +02:00
Aravind Gopalakrishnan
cc2749e409 x86/cpu/amd: Give access to the number of nodes in a physical package
Stash the number of nodes in a physical processor package
locally and add an accessor to be called by interested parties.
The first user is the MCE injection module which uses it to find
the node base core in a package for injecting a certain type of
errors.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Rewrote the commit message, merged it with the accessor patch and unified naming. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@osg.samsung.com
Link: http://lkml.kernel.org/r/1433868317-18417-2-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-18 11:16:06 +02:00
Feng Tang
b58d930750 x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
This question has been asked many times, and finally I found the
official document which explains the problem of HPET on Baytrail,
that it will halt in deep idle states.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john.stultz@linaro.org
Cc: len.brown@intel.com
Cc: matthew.lee@intel.com
Link: http://lkml.kernel.org/r/1434361201-31743-1-git-send-email-feng.tang@intel.com
[ Prettified things a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-18 10:57:38 +02:00
Linus Torvalds
32e0e382ee Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm bugfix from Marcelo Tosatti:
 "Rrestore APIC migration functionality"

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: fix lapic.timer_mode on restore
2015-06-17 20:54:47 -10:00
Mark Brown
9a8d141d5a Merge remote-tracking branches 'spi/topic/ath79', 'spi/topic/atmel' and 'spi/topic/davinci' into spi-next 2015-06-18 00:19:50 +01:00
Bjorn Helgaas
9f3d162071 Merge branch 'pci/resource' into next
* pci/resource:
  x86/PCI: Use host bridge _CRS info on systems with >32 bit addressing
  x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A
2015-06-17 17:24:32 -05:00
Bjorn Helgaas
3d9fecf6bf x86/PCI: Use host bridge _CRS info on systems with >32 bit addressing
We enable _CRS on all systems from 2008 and later.  On older systems, we
ignore _CRS and assume the whole physical address space (excluding RAM and
other devices) is available for PCI devices, but on systems that support
physical address spaces larger than 4GB, it's doubtful that the area above
4GB is really available for PCI.

After d56dbf5bab ("PCI: Allocate 64-bit BARs above 4G when possible"), we
try to use that space above 4GB *first*, so we're more likely to put a
device there.

On Juan's Toshiba Satellite Pro U200, BIOS left the graphics, sound, 1394,
and card reader devices unassigned (but only after Windows had been
booted).  Only the sound device had a 64-bit BAR, so it was the only device
placed above 4GB, and hence the only device that didn't work.

Keep _CRS enabled even on pre-2008 systems if they support physical address
space larger than 4GB.

Fixes: d56dbf5bab ("PCI: Allocate 64-bit BARs above 4G when possible")
Reported-and-tested-by: Juan Dayer <jdayer@outlook.com>
Reported-and-tested-by: Alan Horsfield <alan@hazelgarth.co.uk>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99221
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=907092
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org	# v3.14+
2015-06-17 17:24:06 -05:00
Alexey Kardashevskiy
b5926430df powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off
The pnv_pci_ioda2_unset_window() function is used to do the final
cleanup of a DMA window being released:
- via VFIO ioctl by the guest request;
- via unplugging a virtual PCI function.
However the function was under #ifdef CONFIG_IOMMU_API and was missing.

This moves the helper outside of IOMMU_API block and enables it
for either or both IOMMU_API and PCI_IOV.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-18 07:16:02 +10:00
Jeremy Kerr
cdf2bc1c28 powerpc/include: Add opal-prd to installed uapi headers
We'll want to build the opal-prd daemon with the prd headers, so include
this in the uapi headers list.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-18 07:16:02 +10:00
Jeremy Kerr
7185795a62 powerpc/powernv: fix construction of opal PRD messages
We currently have a bug in the PRD code, where the contents of an
incoming message (beyond the header) will be overwritten by the list
item manipulations when adding to to the prd_msg_queue.

This change reorders struct opal_prd_msg_queue_item, so that the
message body doesn't overlap the list_head.

We also clarify the memcpy of the message, as we're copying unnecessary
bytes at the end of the message data.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-18 07:16:01 +10:00
Alistair Popple
02b6505c8f powerpc/powernv: Increase opal-irqchip initcall priority
The eeh subsystem for powernv requires the opal event irqchip to be
initialised prior to initialisation or the following errors are
produced (and eeh doesn't work as expected):

irq: XICS didn't like hwirq-0x9 to VIRQ17 mapping (rc=-22)
pnv_eeh_post_init: Can't request OPAL event interrupt (0)

On powernv eeh is initialised from a subsys_initcall due to a check
for machine_is(powernv) in eeh_init(). This patch increases the
initcall priority of opal_event_init() to an arch_initcall to ensure
the opal event interface is initialised prior to any users of it.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-18 07:16:01 +10:00
Vladimir Murzin
4e2ee96a63 arm64: compat: print compat_sp instead of sp
We check against compat_sp, but print out arm64's sp - fix it.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-17 14:30:53 +01:00
Dave P Martin
b9bcc91993 arm64: mm: Fix freeing of the wrong memmap entries with !SPARSEMEM_VMEMMAP
The memmap freeing code in free_unused_memmap() computes the end of
each memblock by adding the memblock size onto the base.  However,
if SPARSEMEM is enabled then the value (start) used for the base
may already have been rounded downwards to work out which memmap
entries to free after the previous memblock.

This may cause memmap entries that are in use to get freed.

In general, you're not likely to hit this problem unless there
are at least 2 memblocks and one of them is not aligned to a
sparsemem section boundary.  Note that carve-outs can increase
the number of memblocks by splitting the regions listed in the
device tree.

This problem doesn't occur with SPARSEMEM_VMEMMAP, because the
vmemmap code deals with freeing the unused regions of the memmap
instead of requiring the arch code to do it.

This patch gets the memblock base out of the memblock directly when
computing the block end address to ensure the correct value is used.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-17 14:29:34 +01:00
Mark Rutland
46b0567c85 arm64: entry: fix context tracking for el0_sp_pc
Commit 6c81fe7925 ("arm64: enable context tracking") did not
update el0_sp_pc to use ct_user_exit, but this appears to have been
unintentional. In commit 6ab6463aeb ("arm64: adjust el0_sync so
that a function can be called") we made x0 available, and in the return
to userspace we call ct_user_enter in the kernel_exit macro.

Due to this, we currently don't correctly inform RCU of the user->kernel
transition, and may erroneously account for time spent in the kernel as
if we were in an extended quiescent state when CONFIG_CONTEXT_TRACKING
is enabled.

As we do record the kernel->user transition, a userspace application
making accesses from an unaligned stack pointer can demonstrate the
imbalance, provoking the following warning:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 3660 at kernel/context_tracking.c:75 context_tracking_enter+0xd8/0xe4()
Modules linked in:
CPU: 2 PID: 3660 Comm: a.out Not tainted 4.1.0-rc7+ #8
Hardware name: ARM Juno development board (r0) (DT)
Call trace:
[<ffffffc000089914>] dump_backtrace+0x0/0x124
[<ffffffc000089a48>] show_stack+0x10/0x1c
[<ffffffc0005b3cbc>] dump_stack+0x84/0xc8
[<ffffffc0000b3214>] warn_slowpath_common+0x98/0xd0
[<ffffffc0000b330c>] warn_slowpath_null+0x14/0x20
[<ffffffc00013ada4>] context_tracking_enter+0xd4/0xe4
[<ffffffc0005b534c>] preempt_schedule_irq+0xd4/0x114
[<ffffffc00008561c>] el1_preempt+0x4/0x28
[<ffffffc0001b8040>] exit_files+0x38/0x4c
[<ffffffc0000b5b94>] do_exit+0x430/0x978
[<ffffffc0000b614c>] do_group_exit+0x40/0xd4
[<ffffffc0000c0208>] get_signal+0x23c/0x4f4
[<ffffffc0000890b4>] do_signal+0x1ac/0x518
[<ffffffc000089650>] do_notify_resume+0x5c/0x68
---[ end trace 963c192600337066 ]---

This patch adds the missing ct_user_exit to the el0_sp_pc entry path,
correcting the context tracking for this case.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Fixes: 6c81fe7925 ("arm64: enable context tracking")
Cc: <stable@vger.kernel.org> # v3.17+
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-17 11:53:19 +01:00
Marc Zyngier
4642019dc4 arm/arm64: KVM: vgic: Do not save GICH_HCR / ICH_HCR_EL2
The GIC Hypervisor Configuration Register is used to enable
the delivery of virtual interupts to a guest, as well as to
define in which conditions maintenance interrupts are delivered
to the host.

This register doesn't contain any information that we need to
read back (the EOIcount is utterly useless for us).

So let's save ourselves some cycles, and not save it before
writing zero to it.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 09:59:55 +01:00
Lorenzo Pieralisi
e2d997366d ARM: kvm: psci: fix handling of unimplemented functions
According to the PSCI specification and the SMC/HVC calling
convention, PSCI function_ids that are not implemented must
return NOT_SUPPORTED as return value.

Current KVM implementation takes an unhandled PSCI function_id
as an error and injects an undefined instruction into the guest
if PSCI implementation is called with a function_id that is not
handled by the resident PSCI version (ie it is not implemented),
which is not the behaviour expected by a guest when calling a
PSCI function_id that is not implemented.

This patch fixes this issue by returning NOT_SUPPORTED whenever
the kvm PSCI call is executed for a function_id that is not
implemented by the PSCI kvm layer.

Cc: <stable@vger.kernel.org> # 3.18+
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 09:46:29 +01:00
Alex Bennée
921ef1e16c KVM: arm64: fix misleading comments in save/restore
The elr_el2 and spsr_el2 registers in fact contain the processor state
before entry into EL2. In the case of guest state it could be in either
el0 or el1.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 09:46:29 +01:00
Kim Phillips
8889583c03 KVM: arm/arm64: Enable the KVM-VFIO device
The KVM-VFIO device is used by the QEMU VFIO device. It is used to
record the list of in-use VFIO groups so that KVM can manipulate
them.

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 09:46:29 +01:00
Christoffer Dall
1b3d546daf arm/arm64: KVM: Properly account for guest CPU time
Until now we have been calling kvm_guest_exit after re-enabling
interrupts when we come back from the guest, but this has the
unfortunate effect that CPU time accounting done in the context of timer
interrupts occurring while the guest is running doesn't properly notice
that the time since the last tick was spent in the guest.

Inspired by the comment in the x86 code, move the kvm_guest_exit() call
below the local_irq_enable() call and change __kvm_guest_exit() to
kvm_guest_exit(), because we are now calling this function with
interrupts enabled.  We have to now explicitly disable preemption and
not enable preemption before we've called kvm_guest_exit(), since
otherwise we could be preempted and everything happening before we
eventually get scheduled again would be accounted for as guest time.

At the same time, move the trace_kvm_exit() call outside of the atomic
section, since there is no reason for us to do that with interrupts
disabled.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 09:46:28 +01:00
Tiejun Chen
ea2c6d9745 kvm: remove one useless check extension
We already check KVM_CAP_IRQFD in generic once enable CONFIG_HAVE_KVM_IRQFD,

kvm_vm_ioctl_check_extension_generic()
    |
    + switch (arg) {
    +   ...
    +   #ifdef CONFIG_HAVE_KVM_IRQFD
    +       case KVM_CAP_IRQFD:
    +   #endif
    +   ...
    +   return 1;
    +   ...
    + }
    |
    + kvm_vm_ioctl_check_extension()

So its not necessary to check this in arch again, and also fix one typo,
s/emlation/emulation.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-06-17 09:46:28 +01:00
Marc Zyngier
85e84ba310 arm: KVM: force execution of HCPTR access on VM exit
On VM entry, we disable access to the VFP registers in order to
perform a lazy save/restore of these registers.

On VM exit, we restore access, test if we did enable them before,
and save/restore the guest/host registers if necessary. In this
sequence, the FPEXC register is always accessed, irrespective
of the trapping configuration.

If the guest didn't touch the VFP registers, then the HCPTR access
has now enabled such access, but we're missing a barrier to ensure
architectural execution of the new HCPTR configuration. If the HCPTR
access has been delayed/reordered, the subsequent access to FPEXC
will cause a trap, which we aren't prepared to handle at all.

The same condition exists when trapping to enable VFP for the guest.

The fix is to introduce a barrier after enabling VFP access. In the
vmexit case, it can be relaxed to only takes place if the guest hasn't
accessed its view of the VFP registers, making the access to FPEXC safe.

The set_hcptr macro is modified to deal with both vmenter/vmexit and
vmtrap operations, and now takes an optional label that is branched to
when the guest hasn't touched the VFP registers.

Reported-by: Vikram Sethi <vikrams@codeaurora.org>
Cc: stable@kernel.org	# v3.9+
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17 09:40:14 +01:00
Shreyas B. Prabhu
3609d819a3 powerpc: Make doorbell check preemption safe
Doorbell can be used to cause ipi on cpus which are sibling threads on
the same core. So icp_native_cause_ipi checks if the destination cpu
is a sibling thread of the current cpu and uses doorbell in such cases.

But while running with CONFIG_PREEMPT=y, since this section is
preemtible, we can run into issues if after we check if the destination
cpu is a sibling cpu, the task gets migrated from a sibling cpu to a
cpu on another core.

Fix this by using get_cpu()/ put_cpu()

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-17 08:01:03 +10:00
Bjorn Helgaas
bf933dbb84 Merge branches 'pci/host-designware', 'pci/host-designware-common', 'pci/host-generic', 'pci/host-imx6', 'pci/host-iproc' and 'pci/host-xgene' into next
* pci/host-designware:
  PCI: designware: Use iATU0 for cfg and IO, iATU1 for MEM
  PCI: designware: Consolidate outbound iATU programming functions
  PCI: designware: Add support for x8 links

* pci/host-designware-common:
  PCI: designware: Wait for link to come up with consistent style
  PCI: layerscape: Factor out ls_pcie_establish_link()
  PCI: layerscape: Use dw_pcie_link_up() consistently
  PCI: dra7xx: Use dw_pcie_link_up() consistently
  PCI: imx6: Rename imx6_pcie_start_link() to imx6_pcie_establish_link()

* pci/host-generic:
  of/pci: Fix pci_address_to_pio() conversion of CPU address to I/O port

* pci/host-imx6:
  PCI: imx6: Add #define PCIE_RC_LCSR
  PCI: imx6: Use "u32", not "uint32_t"
  PCI: imx6: Add speed change timeout message

* pci/host-iproc:
  PCI: iproc: Free resource list after registration
  PCI: iproc: Directly add PCI resources
  PCI: iproc: Add BCMA PCIe driver
  PCI: iproc: Allow override of device tree IRQ mapping function

* pci/host-xgene:
  arm64: dts: Add APM X-Gene PCIe MSI nodes
  PCI: xgene: Add APM X-Gene v1 PCIe MSI/MSIX termination driver
2015-06-16 08:19:55 -05:00
Radim Krčmář
b6ac069532 KVM: x86: fix lapic.timer_mode on restore
lapic.timer_mode was not properly initialized after migration, which
broke few useful things, like login, by making every sleep eternal.

Fix this by calling apic_update_lvtt in kvm_apic_post_state_restore.

There are other slowpaths that update lvtt, so this patch makes sure
something similar doesn't happen again by calling apic_update_lvtt
after every modification.

Cc: stable@vger.kernel.org
Fixes: f30ebc312c ("KVM: x86: optimize some accesses to LVTT and SPIV")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-06-16 01:01:10 -03:00
Mark Rutland
822bf4833e arm64: defconfig: enable memtest
The kernel memtest utility is incredibly useful for detecting memory
problems, but sadly isn't in defconfig.

The memtest itself is only run when the user has explicitly passed a
memtest option on the kernel command line, so simply enabling the option
should not have a negative impact.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-15 16:57:46 +01:00
Suthikulpanit, Suravee
b6197b93fa arm64 : Introduce support for ACPI _CCA object
section 6.2.17 _CCA states that ARM platforms require ACPI _CCA
object to be specified for DMA-cabpable devices. Therefore, this patch
specifies ACPI_CCA_REQUIRED in arm64 Kconfig.

In addition, to handle the case when _CCA is missing, arm64 would assign
dummy_dma_ops to disable DMA capability of the device.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-15 14:40:49 +02:00
Jeremiah Mahler
de1e00871d crypto: aesni - fix crypto_fpu_exit() section mismatch
The '__init aesni_init()' function calls the '__exit crypto_fpu_exit()'
function directly.  Since they are in different sections, this generates
a warning.

  make CONFIG_DEBUG_SECTION_MISMATCH=y
  ...
  WARNING: arch/x86/crypto/aesni-intel.o(.init.text+0x12b): Section
  mismatch in reference from the function init_module() to the function
  .exit.text:crypto_fpu_exit()
  The function __init init_module() references
  a function __exit crypto_fpu_exit().
  This is often seen when error handling in the init function
  uses functionality in the exit path.
  The fix is often to remove the __exit annotation of
  crypto_fpu_exit() so it may be used outside an exit section.

Fix the warning by removing the __exit annotation.

Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-15 18:15:58 +08:00
Geert Uytterhoeven
4e0a64124f s390/mm: s/specifiation/specification/, s/an specification/a specification/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-06-15 10:50:48 +02:00
Michael Ellerman
4bece972fc powerpc/powernv: pnv_init_idle_states() should only run on powernv
Although this init call checks for device tree properties before doing
anything, it should still only run on powernv machines.

Reviewed-by: Shreyas B Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-15 16:45:12 +10:00
Linus Torvalds
1f1e34f723 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull more MIPS fixes from Ralf Baechle:
 "Another round of 4.1 MIPS fixes, one fix to a MIPS-specific #if
  condition in lib/mpi, one fix to the MIPS GIC irqchip driver and one
  SSB fix.

  Details:
   - fix handling of clock in chipco SSB driver.
   - fix two MIPS-specific #if conditions to correctly work for GCC 5.1.
   - fix damage to R6 pgtable bits done by XPA support.
   - fix possible crash due to unloading modules that contain statically
     defined platform devices.
   - fix disabling of the MSA ASE on context switch to also work
     correctly when a new thread/process has the CPU for the very first
     time.

  This is part of linux-next and has been beaten to death on
  Imagination's test farm.

  While things are not looking too grim this pull request also means the
  rate of fixes for 4.1 remains nearly constant so I'd not be unhappy if
  you'd delay the release"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MPI: MIPS: Fix compilation error with GCC 5.1
  IRQCHIP: mips-gic: Don't nest calls to do_IRQ()
  MIPS: MSA: bugfix - disable MSA correctly for new threads/processes.
  MIPS: Loongson: Do not register 8250 platform device from module.
  MIPS: Cobalt: Do not build MTD platform device registration code as module.
  SSB: Fix handling of ssb_pmu_get_alp_clock()
  MIPS: pgtable-bits: Fix XPA damage to R6 definitions.
2015-06-14 15:38:57 -10:00
Linus Torvalds
d37479aac5 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "A regression fix for a crash, and a Intel HSW uncore PMU driver fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization"
  perf/x86/intel/uncore: Fix CBOX bit wide and UBOX reg on Haswell-EP
2015-06-14 14:00:13 -10:00
David S. Miller
25c43bf13b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-13 23:56:52 -07:00
Vladimir Murzin
c7d6b573fe arm64: mm: remove reference to tlb.S from comment block
tlb.S has been removed since fa48e6f "arm64: mm: Optimise tlb flush logic
where we have >4K granule", so align comment with that.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-12 15:36:52 +01:00
Catalin Marinas
565630d503 arm64: Do not attempt to use init_mm in reset_context()
After secondary CPU boot or hotplug, the active_mm of the idle thread is
&init_mm. The init_mm.pgd (swapper_pg_dir) is only meant for TTBR1_EL1
and must not be set in TTBR0_EL1. Since when active_mm == &init_mm the
TTBR0_EL1 is already set to the reserved value, there is no need to
perform any context reset.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
2015-06-12 15:36:18 +01:00
Marc Zyngier
8a14849b4a arm64: KVM: Switch vgic save/restore to alternative_insn
So far, we configured the world-switch by having a small array
of pointers to the save and restore functions, depending on the
GIC used on the platform.

Loading these values each time is a bit silly (they never change),
and it makes sense to rely on the instruction patching instead.

This leads to a nice cleanup of the code.

Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-12 15:12:08 +01:00
Marc Zyngier
94a9e04aa1 arm64: alternative: Introduce feature for GICv3 CPU interface
Add a new item to the feature set (ARM64_HAS_SYSREG_GIC_CPUIF)
to indicate that we have a system register GIC CPU interface

This will help KVM switching to alternative instruction patching.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-12 15:11:50 +01:00
Feng Wu
959c870f73 iommu, x86: Provide irq_remapping_cap() interface
Add a new interface irq_remapping_cap() to detect whether irq
remapping supports new features, such as VT-d Posted-Interrupts.

Export the function, so that KVM code can check this and use this
mechanism properly.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/1433827237-3382-10-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-12 11:33:52 +02:00
Feng Wu
8541186faf iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
Interrupt chip callback to set the VCPU affinity for posted interrupts.

[ tglx: Use the helper function to copy from the remap irte instead of
        open coding it. Massage the comment as well ]

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: joro@8bytes.org
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/1433827237-3382-5-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-12 11:33:52 +02:00
Feng Wu
6f28192394 iommu: Add new member capability to struct irq_remap_ops
Add a new member 'capability' to struct irq_remap_ops for storing
information about available capabilities such as VT-d
Posted-Interrupts.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/1433827237-3382-2-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-12 11:33:51 +02:00
Dave Hansen
a842400367 x86/fpu: Fix double-increment in setup_xstate_features()
I noticed that my MPX tracepoints were producing garbage for the
lower and upper bounds:

	mpx_bounds_register_exception: address referenced: 0x00007fffffffccb7 bounds: lower: 0x0 ~upper: 0xffffffffffffffff
	mpx_bounds_register_exception: address referenced: 0x00007fffffffccbf bounds: lower: 0x0 ~upper: 0xffffffffffffffff

This is, of course, bogus because 0x00007fffffffccbf is *within*
the bounds.  I assumed that my instruction decoder was bad and
went looking at it.  But I eventually realized that I was
getting a '0' offset back from xstate_offsets[BNDREGS].

It was being skipped in the initialization, which is obviously
bogus, so remove the extra leaf++.

This also goes an initializes xstate_offsets/sizes[] to -1 so
so that bugs like this will oops instead of silently failing
in interesting ways.

This was introduced by:

	39f1acd ("x86/fpu/xstate: Don't assume the first zero xfeatures zero bit means the end")

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@sr71.net
Link: http://lkml.kernel.org/r/20150611193400.2E0B00DB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-12 10:48:12 +02:00