Commit Graph

5680 Commits

Author SHA1 Message Date
Nadav Amit
c205fb7d7d KVM: x86: #PF error-code on R/W operations is wrong
When emulating an instruction that reads the destination memory operand (i.e.,
instructions without the Mov flag in the emulator), the operand is first read.
If a page-fault is detected in this phase, the error-code which would be
delivered to the VM does not indicate that the access that caused the exception
is a write one. This does not conform with real hardware, and may cause the VM
to enter the page-fault handler twice for no reason (once for read, once for
write).

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-09 10:24:11 +01:00
Marcelo Tosatti
7c6a98dfa1 KVM: x86: add method to test PIR bitmap vector
kvm_x86_ops->test_posted_interrupt() returns true/false depending
whether 'vector' is set.

Next patch makes use of this interface.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:18 +01:00
Eugene Korenevsky
e9ac033e6b KVM: nVMX: Improve nested msr switch checking
This patch improve checks required by Intel Software Developer Manual.
 - SMM MSRs are not allowed.
 - microcode MSRs are not allowed.
 - check x2apic MSRs only when LAPIC is in x2apic mode.
 - MSR switch areas must be aligned to 16 bytes.
 - address of first and last byte in MSR switch areas should not set any bits
   beyond the processor's physical-address width.

Also it adds warning messages on failures during MSR switch. These messages
are useful for people who debug their VMMs in nVMX.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:15 +01:00
Wincy Van
ff651cb613 KVM: nVMX: Add nested msr load/restore algorithm
Several hypervisors need MSR auto load/restore feature.
We read MSRs from VM-entry MSR load area which specified by L1,
and load them via kvm_set_msr in the nested entry.
When nested exit occurs, we get MSRs via kvm_get_msr, writing
them to L1`s MSR store area. After this, we read MSRs from VM-exit
MSR load area, and load them via kvm_set_msr.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:14 +01:00
Luck, Tony
d4812e169d x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks
We now switch to the kernel stack when a machine check interrupts
during user mode.  This means that we can perform recovery actions
in the tail of do_machine_check()

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-07 07:47:42 -08:00
Andy Lutomirski
bced35b65a x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic
In some IST handlers, if the interrupt came from user mode,
we can safely enable preemption.  Add helpers to do it safely.

This is intended to be used my the memory failure code in
do_machine_check.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-02 10:22:46 -08:00
Andy Lutomirski
83653c16da x86: Clean up current_stack_pointer
There's no good reason for it to be a macro, and x86_64 will want to
use it, so it should be in a header.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-02 10:22:46 -08:00
Andy Lutomirski
9592747538 x86, traps: Track entry into and exit from IST context
We currently pretend that IST context is like standard exception
context, but this is incorrect.  IST entries from userspace are like
standard exceptions except that they use per-cpu stacks, so they are
atomic.  IST entries from kernel space are like NMIs from RCU's
perspective -- they are not quiescent states even if they
interrupted the kernel during a quiescent state.

Add and use ist_enter and ist_exit to track IST context.  Even
though x86_32 has no IST stacks, we track these interrupts the same
way.

This fixes two issues:

 - Scheduling from an IST interrupt handler will now warn.  It would
   previously appear to work as long as we got lucky and nothing
   overwrote the stack frame.  (I don't know of any bugs in this
   that would trigger the warning, but it's good to be on the safe
   side.)

 - RCU handling in IST context was dangerous.  As far as I know,
   only machine checks were likely to trigger this, but it's good to
   be on the safe side.

Note that the machine check handlers appears to have been missing
any context tracking at all before this patch.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-02 10:22:46 -08:00
Ingo Molnar
2aba73a614 This is hopefully the last vdso fix for 3.19. It should be very
safe (it just adds a volatile).
 
 I don't think it fixes an actual bug (the __getcpu calls in the
 pvclock code may not have been needed in the first place), but
 discussion on that point is ongoing.
 
 It also fixes a big performance issue in 3.18 and earlier in which
 the lsl instructions in vclock_gettime got hoisted so far up the
 function that they happened even when the function they were in was
 never called.  n 3.19, the performance issue seems to be gone due to
 the whims of my compiler and some interaction with a branch that's
 now gone.
 
 I'll hopefully have a much bigger overhaul of the pvclock code
 for 3.20, but it needs careful review.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUmduGAAoJEK9N98ZeDfrk874H/RkkP+y6/DmdKVR1dTOUQW4u
 f1wPU0/sc5xywGjNcfR3XwUuyBJyd3s81WVaE5XXHfCHnbjG2Z4CNTqga27hL1D0
 io01Q2s3dh1Y5c0cccVmJmyw//YVzMUOzGTNM9R0NKQNXmYUz6jgQaqk+wWORdD6
 JXCU3/LI5VT0fjNPLj1M9l59eC2Qg/V4GqY2xRJ1AfbwkX1CFZTcWUPb+4FScVYv
 9gds/vOoFg54MypVJD4SeIC9I8U0qcim9gV7gGFdzyDNCXS5J4P+02sEOFNu8oYy
 HVK1B0LXhswT08Ho1yRxXUhFxpqEGeGJvTlDTvwy+r/yuKE2AVBtlhLQBqMPhnY=
 =u3d2
 -----END PGP SIGNATURE-----

Merge tag 'pr-20141223-x86-vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/urgent

Pull VDSO fix from Andy Lutomirski:

 "This is hopefully the last vdso fix for 3.19.  It should be very
  safe (it just adds a volatile).

  I don't think it fixes an actual bug (the __getcpu calls in the
  pvclock code may not have been needed in the first place), but
  discussion on that point is ongoing.

  It also fixes a big performance issue in 3.18 and earlier in which
  the lsl instructions in vclock_gettime got hoisted so far up the
  function that they happened even when the function they were in was
  never called.  n 3.19, the performance issue seems to be gone due to
  the whims of my compiler and some interaction with a branch that's
  now gone.

  I'll hopefully have a much bigger overhaul of the pvclock code
  for 3.20, but it needs careful review."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-01 22:21:22 +01:00
Andy Lutomirski
1ddf0b1b11 x86, vdso: Use asm volatile in __getcpu
In Linux 3.18 and below, GCC hoists the lsl instructions in the
pvclock code all the way to the beginning of __vdso_clock_gettime,
slowing the non-paravirt case significantly.  For unknown reasons,
presumably related to the removal of a branch, the performance issue
is gone as of

e76b027e64 x86,vdso: Use LSL unconditionally for vgetcpu

but I don't trust GCC enough to expect the problem to stay fixed.

There should be no correctness issue, because the __getcpu calls in
__vdso_vlock_gettime were never necessary in the first place.

Note to stable maintainers: In 3.18 and below, depending on
configuration, gcc 4.9.2 generates code like this:

     9c3:       44 0f 03 e8             lsl    %ax,%r13d
     9c7:       45 89 eb                mov    %r13d,%r11d
     9ca:       0f 03 d8                lsl    %ax,%ebx

This patch won't apply as is to any released kernel, but I'll send a
trivial backported version if needed.

Fixes: 51c19b4f59 x86: vdso: pvclock gettime support
Cc: stable@vger.kernel.org # 3.8+
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2014-12-23 13:05:30 -08:00
Borislav Petkov
6ca7a8a150 x86/fpu: Use a symbolic name for asm operand
Fix up the else-case in fpu_fxsave() which seems like it has
been overlooked. Correct comment style in restore_fpu_checking()
while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1419170543-11393-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-23 10:41:12 +01:00
Li Bin
b5bfc51707 livepatch: move x86 specific ftrace handler code to arch/x86
The execution flow redirection related implemention in the livepatch
ftrace handler is depended on the specific architecture. This patch
introduces klp_arch_set_pc(like kgdb_arch_set_pc) interface to change
the pt_regs.

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-12-22 15:40:49 +01:00
Seth Jennings
b700e7f03d livepatch: kernel: add support for live patching
This commit introduces code for the live patching core.  It implements
an ftrace-based mechanism and kernel interface for doing live patching
of kernel and kernel module functions.

It represents the greatest common functionality set between kpatch and
kgraft and can accept patches built using either method.

This first version does not implement any consistency mechanism that
ensures that old and new code do not run together.  In practice, ~90% of
CVEs are safe to apply in this way, since they simply add a conditional
check.  However, any function change that can not execute safely with
the old version of the function can _not_ be safely applied in this
version.

[ jkosina@suse.cz: due to the number of contributions that got folded into
  this original patch from Seth Jennings, add SUSE's copyright as well, as
  discussed via e-mail ]

Signed-off-by: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-12-22 15:40:49 +01:00
Linus Torvalds
60815cf2e0 kernel: Provide READ_ONCE and ASSIGN_ONCE
As discussed on LKML http://marc.info/?i=54611D86.4040306%40de.ibm.com
 ACCESS_ONCE might fail with specific compilers for non-scalar accesses.
 
 Here is a set of patches to tackle that problem.
 
 The first patch introduce READ_ONCE and ASSIGN_ONCE. If the data structure
 is larger than the machine word size memcpy is used and a warning is emitted.
 The next patches fix up several in-tree users of ACCESS_ONCE on non-scalar
 types.
 
 This merge does not yet contain a patch that forces ACCESS_ONCE to work only
 on scalar types. This is targetted for the next merge window as Linux next
 already contains new offenders regarding ACCESS_ONCE vs. non-scalar types.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJUkrVGAAoJEBF7vIC1phx8stkP/2LmN5y6LOseoEW06xa5MX4m
 cbIKsZNtsGHl7EDcTzzuWs6Sq5/Cj7V3yzeBF7QGbUKOqvFWU3jvpUBCCfjMg37C
 77/Vf0ZPrxTXXxeJ4Ykdy2CGvuMtuYY9TWkrRNKmLU0xex7lGblEzCt9z6+mZviw
 26/DN8ctjkHRvIUAi+7RfQBBc3oSMYAC1mzxYKBAsAFLV+LyFmsGU/4iofZMAsdt
 XFyVXlrLn0Bjx/MeceGkOlMDiVx4FnfccfFaD4hhuTLBJXWitkUK/MRa4JBiXWzH
 agY8942A8/j9wkI2DFp/pqZYqA/sTXLndyOWlhE//ZSti0n0BSJaOx3S27rTLkAc
 5VmZEVyIrS3hyOpyyAi0sSoPkDnjeCHmQg9Rqn34/poKLd7JDrW2UkERNCf/T3eh
 GI2rbhAlZz3v5mIShn8RrxzslWYmOObpMr3HYNUdRk8YUfTf6d6aZ3txHp2nP4mD
 VBAEzsvP9rcVT2caVhU2dnBzeaZAj3zeDxBtjcb3X2osY9tI7qgLc9Fa/fWKgILk
 2evkLcctsae2mlLNGHyaK3Dm/ZmYJv+57MyaQQEZNfZZgeB1y4k0DkxH4w1CFmCi
 s8XlH5voEHgnyjSQXXgc/PNVlkPAKr78ZyTiAfiKmh8rpe41/W4hGcgao7L9Lgiu
 SI0uSwKibuZt4dHGxQuG
 =IQ5o
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux

Pull ACCESS_ONCE cleanup preparation from Christian Borntraeger:
 "kernel: Provide READ_ONCE and ASSIGN_ONCE

  As discussed on LKML http://marc.info/?i=54611D86.4040306%40de.ibm.com
  ACCESS_ONCE might fail with specific compilers for non-scalar
  accesses.

  Here is a set of patches to tackle that problem.

  The first patch introduce READ_ONCE and ASSIGN_ONCE.  If the data
  structure is larger than the machine word size memcpy is used and a
  warning is emitted.  The next patches fix up several in-tree users of
  ACCESS_ONCE on non-scalar types.

  This does not yet contain a patch that forces ACCESS_ONCE to work only
  on scalar types.  This is targetted for the next merge window as Linux
  next already contains new offenders regarding ACCESS_ONCE vs.
  non-scalar types"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux:
  s390/kvm: REPLACE barrier fixup with READ_ONCE
  arm/spinlock: Replace ACCESS_ONCE with READ_ONCE
  arm64/spinlock: Replace ACCESS_ONCE READ_ONCE
  mips/gup: Replace ACCESS_ONCE with READ_ONCE
  x86/gup: Replace ACCESS_ONCE with READ_ONCE
  x86/spinlock: Replace ACCESS_ONCE with READ_ONCE
  mm: replace ACCESS_ONCE with READ_ONCE or barriers
  kernel: Provide READ_ONCE and ASSIGN_ONCE
2014-12-20 16:48:59 -08:00
Srinidhi Kasagar
e7ddf4b7b3 cpufreq: Add SFI based cpufreq driver support
This adds the SFI based cpu freq driver for some of the Intel's
Silvermont based Atom architectures like Z34xx and Z35xx.

Signed-off-by: Rudramuni, Vishwesh M <vishwesh.m.rudramuni@intel.com>
Signed-off-by: Srinidhi Kasagar <srinidhi.kasagar@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2014-12-20 13:37:37 -05:00
Linus Torvalds
e589c9e13a Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Thomas Gleixner:
 "After stopping the full x86/apic branch, I took some time to go
  through the first block of patches again, which are mostly cleanups
  and preparatory work for the irqdomain conversion and ioapic hotplug
  support.

  Unfortunaly one of the real problematic commits was right at the
  beginning, so I rebased this portion of the pending patches without
  the offenders.

  It would be great to get this into 3.19.  That makes reworking the
  problematic parts simpler.  The usual tip testing did not unearth any
  issues and it is fully bisectible now.

  I'm pretty confident that this wont affect the calmness of the xmas
  season.

  Changes:
   - Split the convoluted io_apic.c code into domain specific parts
     (vector, ioapic, msi, htirq)
   - Introduce proper helper functions to retrieve irq specific data
     instead of open coded dereferencing of pointers
   - Preparatory work for ioapic hotplug and irqdomain conversion
   - Removal of the non functional pci-ioapic driver
   - Removal of unused irq entry stubs
   - Make native_smp_prepare_cpus() preemtible to avoid GFP_ATOMIC
     allocations for everything which is called from there.
   - Small cleanups and fixes"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  iommu/amd: Use helpers to access irq_cfg data structure associated with IRQ
  iommu/vt-d: Use helpers to access irq_cfg data structure associated with IRQ
  x86: irq_remapping: Use helpers to access irq_cfg data structure associated with IRQ
  x86, irq: Use helpers to access irq_cfg data structure associated with IRQ
  x86, irq: Make MSI and HT_IRQ indepenent of X86_IO_APIC
  x86, irq: Move IRQ initialization routines from io_apic.c into vector.c
  x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h
  x86, irq: Move HT IRQ related code from io_apic.c into htirq.c
  x86, irq: Move PCI MSI related code from io_apic.c into msi.c
  x86, irq: Replace printk(KERN_LVL) with pr_lvl() utilities
  x86, irq: Make UP version of irq_complete_move() an inline stub
  x86, irq: Move local APIC related code from io_apic.c into vector.c
  x86, irq: Introduce helpers to access struct irq_cfg
  x86, irq: Protect __clear_irq_vector() with vector_lock
  x86, irq: Rename local APIC related functions in io_apic.c as apic_xxx()
  x86, irq: Refine hw_irq.h to prepare for irqdomain support
  x86, irq: Convert irq_2_pin list to generic list
  x86, irq: Kill useless parameter 'irq_attr' of IO_APIC_get_PCI_irq_vector()
  x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI
  x86, irq: Introduce helper to check whether an IOAPIC has been registered
  ...
2014-12-19 14:02:02 -08:00
Linus Torvalds
1092b596a5 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
 "This contains a single TLS ABI validation fix from Andy Lutomirski"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tls: Don't validate lm in set_thread_area() after all
2014-12-19 13:18:31 -08:00
Linus Torvalds
66dcff86ba 3.19 changes for KVM:
- spring cleaning: removed support for IA64, and for hardware-assisted
 virtualization on the PPC970
 - ARM, PPC, s390 all had only small fixes
 
 For x86:
 - small performance improvements (though only on weird guests)
 - usual round of hardware-compliancy fixes from Nadav
 - APICv fixes
 - XSAVES support for hosts and guests.  XSAVES hosts were broken because
 the (non-KVM) XSAVES patches inadvertently changed the KVM userspace
 ABI whenever XSAVES was enabled; hence, this part is going to stable.
 Guest support is just a matter of exposing the feature and CPUID leaves
 support.
 
 Right now KVM is broken for PPC BookE in your tree (doesn't compile).
 I'll reply to the pull request with a patch, please apply it either
 before the pull request or in the merge commit, in order to preserve
 bisectability somewhat.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJUkpg+AAoJEL/70l94x66DUmoH/jzXYkptSW9NGgm79KqxGJlD
 lzLnLBkitVvx++Mz5YBhdJEhKKLUlCtifFT1zPJQ/pthQhIRSaaAwZyNGgUs5w5x
 yMGKHiPQFyZRbmQtZhCInW0BftJoYHHciO3nUfHCZnp34My9MP2D55W7/z+fYFfQ
 DuqBSE9ThyZJtZ4zh8NRA9fCOeuqwVYRyoBs820Wbsh4cpIBoIK63Dg7k+CLE+ZV
 MZa/mRL6bAfsn9W5bnOUAgHJ3SPznnWbO3/g0aV+roL/5pffblprJx9lKNR08xUM
 6hDFLop2gDehDJesDkY/o8Ckp1hEouvfsVpSShry4vcgtn0hgh2O5/6Orbmj6vE=
 =Zwq1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM update from Paolo Bonzini:
 "3.19 changes for KVM:

   - spring cleaning: removed support for IA64, and for hardware-
     assisted virtualization on the PPC970

   - ARM, PPC, s390 all had only small fixes

  For x86:
   - small performance improvements (though only on weird guests)
   - usual round of hardware-compliancy fixes from Nadav
   - APICv fixes
   - XSAVES support for hosts and guests.  XSAVES hosts were broken
     because the (non-KVM) XSAVES patches inadvertently changed the KVM
     userspace ABI whenever XSAVES was enabled; hence, this part is
     going to stable.  Guest support is just a matter of exposing the
     feature and CPUID leaves support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits)
  KVM: move APIC types to arch/x86/
  KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
  KVM: PPC: Book3S HV: Improve H_CONFER implementation
  KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
  KVM: PPC: Book3S HV: Remove code for PPC970 processors
  KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
  KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
  arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
  arch: powerpc: kvm: book3s_pr.c: Remove unused function
  arch: powerpc: kvm: book3s.c: Remove some unused functions
  arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
  KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
  KVM: PPC: Book3S HV: ptes are big endian
  KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
  KVM: PPC: Book3S HV: Fix KSM memory corruption
  KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
  KVM: PPC: Book3S HV: Fix computation of tlbie operand
  KVM: PPC: Book3S HV: Add missing HPTE unlock
  KVM: PPC: BookE: Improve irq inject tracepoint
  arm/arm64: KVM: Require in-kernel vgic for the arch timers
  ...
2014-12-18 16:05:28 -08:00
Andy Lutomirski
3fb2f4237b x86/tls: Don't validate lm in set_thread_area() after all
It turns out that there's a lurking ABI issue.  GCC, when
compiling this in a 32-bit program:

struct user_desc desc = {
	.entry_number    = idx,
	.base_addr       = base,
	.limit           = 0xfffff,
	.seg_32bit       = 1,
	.contents        = 0, /* Data, grow-up */
	.read_exec_only  = 0,
	.limit_in_pages  = 1,
	.seg_not_present = 0,
	.useable         = 0,
};

will leave .lm uninitialized.  This means that anything in the
kernel that reads user_desc.lm for 32-bit tasks is unreliable.

Revert the .lm check in set_thread_area().  The value never did
anything in the first place.

Fixes: 0e58af4e1d ("x86/tls: Disallow unusual TLS segments")
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # Only if 0e58af4e1d is backported
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-18 12:12:26 +01:00
Christian Borntraeger
4f9d1382e6 x86/spinlock: Replace ACCESS_ONCE with READ_ONCE
ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)

Change the spinlock code to replace ACCESS_ONCE with READ_ONCE.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-12-18 09:54:38 +01:00
Paolo Bonzini
cb5281a572 KVM: move APIC types to arch/x86/
They are not used anymore by IA64, move them away.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-18 09:39:51 +01:00
Linus Torvalds
eb64c3c6cd xen: additional features for 3.19-rc0
- Linear p2m for x86 PV guests which simplifies the p2m code, improves
   performance and will allow for > 512 GB PV guests in the future.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJUjx7OAAoJEFxbo/MsZsTRXLIH/ishF/xDCL6F5r0I0SKDuaz5
 C/BediDcFzbzh4/t3x2PrPooHk4gPmeyIg688ZGgBAxHRXC5OJ2U5tdtZ/qUCnwf
 0J1pdp/yoAOVRJT+Sax10lN4+G8YV7+6Ptikz0C7glXBAg8SgFL3Y6tfBS0jNwYR
 wQph09S9n7gMZTodSBLbb0ymtJMhl16DrETJsYV73sU7bAL5sFDVkMQvY3SxkusX
 GNFeALfqM0cSK9mDI6O9avGJKoIdKlzt7VWHdlc+yKTlQsoyg/cSH3AaihhG6af9
 IElRxwH9Z40VFLKip0gNMOIrUwAjFGSw6N+Uhik27tlmvfI3Dll/+gsMz/5sHc8=
 =OyoK
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.19-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull additional xen update from David Vrabel:
 "Xen: additional features for 3.19-rc0

   - Linear p2m for x86 PV guests which simplifies the p2m code,
     improves performance and will allow for > 512 GB PV guests in the
     future.

  A last-minute, configuration specific issue was discovered with this
  change which is why it was not included in my previous pull request.
  This is now been fixed and tested"

* tag 'stable/for-linus-3.19-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: switch to post-init routines in xen mmu.c earlier
  Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
  xen: annotate xen_set_identity_and_remap_chunk() with __init
  xen: introduce helper functions to do safe read and write accesses
  xen: Speed up set_phys_to_machine() by using read-only mappings
  xen: switch to linear virtual mapped sparse p2m list
  xen: Hide get_phys_to_machine() to be able to tune common path
  x86: Introduce function to get pmd entry pointer
  xen: Delay invalidating extra memory
  xen: Delay m2p_override initialization
  xen: Delay remapping memory of pv-domain
  xen: use common page allocation function in p2m.c
  xen: Make functions static
  xen: fix some style issues in p2m.c
2014-12-16 13:23:03 -08:00
Jiang Liu
11d686e956 x86, irq: Move IRQ initialization routines from io_apic.c into vector.c
Move IRQ initialization routines from io_apic.c into vector.c,
preparing for enabling hierarchy irqdomain.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414397531-28254-15-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:17 +01:00
Jiang Liu
8643e28da2 x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h
Clean up code by moving IOAPIC related declarations from hw_irq.h into
io_apic.h.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Aubrey <aubrey.li@linux.intel.com>
Cc: Ryan Desfosses <ryan@desfo.org>
Cc: Quentin Lambert <lambert.quentin@gmail.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: http://lkml.kernel.org/r/1414397531-28254-14-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:17 +01:00
Jiang Liu
443809828c x86, irq: Move PCI MSI related code from io_apic.c into msi.c
Create arch/x86/kernel/apic/msi.c to host MSI related code,
preparing for enabling hierarchy irqdomain.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414397531-28254-12-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:17 +01:00
Thomas Gleixner
f0e5bf7583 x86, irq: Make UP version of irq_complete_move() an inline stub
No point for having an empty real function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:16 +01:00
Jiang Liu
74afab7af7 x86, irq: Move local APIC related code from io_apic.c into vector.c
Create arch/x86/kernel/apic/vector.c to host local APIC related code,
prepare for making MSI/HT_IRQ independent of IOAPIC.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1414397531-28254-10-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:16 +01:00
Jiang Liu
55a0e2b122 x86, irq: Introduce helpers to access struct irq_cfg
Change irq_cfg() from static to extern, also introduce helper function
irqd_cfg(). Later we can rewrite these two helpers when enabling
hierarchy irqdomain.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1414397531-28254-9-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:16 +01:00
Jiang Liu
cb39288cd6 x86, irq: Rename local APIC related functions in io_apic.c as apic_xxx()
Rename local APIC related functions in io_apic.c as apic_xxx() instead
of ioapic_xxx(), later they will be moved into separate file.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1414397531-28254-7-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:16 +01:00
Jiang Liu
26011eee04 x86, irq: Refine hw_irq.h to prepare for irqdomain support
Refine hw_irq.h to prepare for irqdomain support by:

1) guarding common APIC related interfaces with CONFIG_X86_LOCAL_APIC
2) guarding interrupt remapping related interfaces with CONFIG_IRQ_REMAP
3) guarding IOAPIC related interfaces with CONFIG_X86_IO_APIC

No functional changes.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414397531-28254-6-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:16 +01:00
Yinghai Lu
a178b87b20 x86, irq: Convert irq_2_pin list to generic list
Use generic list to replace private list implementation so we can use
the existing helper functions.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1414397531-28254-5-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Joerg Roedel <joro@8bytes.org>
2014-12-16 14:08:16 +01:00
Jiang Liu
25d0d35ed7 x86, irq: Kill useless parameter 'irq_attr' of IO_APIC_get_PCI_irq_vector()
None of the callers requires irq_attr to be filled
in. IO_APIC_get_PCI_irq_vector() does not do anything useful with it
either.

Remove the parameter and fixup the call sites.

[ tglx: Massaged changelog ]

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Ryan Desfosses <ryan@desfo.org>
Cc: Quentin Lambert <lambert.quentin@gmail.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: http://lkml.kernel.org/r/1414397531-28254-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:16 +01:00
Jiang Liu
e89900c9ad x86, irq: Introduce helper to check whether an IOAPIC has been registered
Introduce acpi_ioapic_registered() to check whether an IOAPIC has already
been registered, it will be used when enabling IOAPIC hotplug.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-18-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:15 +01:00
Jiang Liu
15516a3b86 x86, irq, ACPI: Implement interfaces to support ACPI based IOAPIC hot-removal
Implement acpi_unregister_ioapic() to support ACPI based IOAPIC hot-removal.
An IOAPIC could only be removed when all its pins are unused.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-17-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:15 +01:00
Jiang Liu
35ef9c941c x86, irq: Refine mp_register_ioapic() to prepare for IOAPIC hotplug
Refine mp_register_ioapic() to prepare for IOAPIC hotplug by:
1) change return value from void to int.
2) check for gsi range conflicts
3) check for IOAPIC physical address conflicts
4) enhance the way to allocate IOAPIC index

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-14-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:15 +01:00
Jiang Liu
67dc5e701f x86, irq: Remove __init marker for functions will be used by IOAPIC hotplug
Remove __init marker for functions which will be used by IOAPIC hotplug
at runtime.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-12-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:15 +01:00
Jan Beulich
2414e021ac x86: Avoid building unused IRQ entry stubs
When X86_LOCAL_APIC (i.e. unconditionally on x86-64),
first_system_vector will never end up being higher than
LOCAL_TIMER_VECTOR (0xef), and hence building stubs for vectors
0xef...0xff is pointlessly reducing code density. Deal with this at
build time already.

Taking into consideration that X86_64 implies X86_LOCAL_APIC, also
simplify (and hence make easier to read and more consistent with the
change done here) some #if-s in arch/x86/kernel/irqinit.c.

While we could further improve the packing of the IRQ entry stubs (the
four ones now left in the last set could be fit into the four padding
bytes each of the final four sets have) this doesn't seem to provide
any real benefit: Both irq_entries_start and common_interrupt getting
cache line aligned, eliminating the 30th set would just produce 32
bytes of padding between the 29th and common_interrupt.

[ tglx: Folded lguest fix from Dan Carpenter ]

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: lguest@lists.ozlabs.org
Cc: Rusty Russell <rusty@rustcorp.com.au>
Link: http://lkml.kernel.org/r/54574D5F0200007800044389@mail.emea.novell.com
Link: http://lkml.kernel.org/r/20141115185718.GB6530@mwanda
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:14 +01:00
Jan Beulich
e106798259 x86: irq: Fix placement of mp_should_keep_irq()
While f3761db164 ("x86, irq: Fix build error caused by
9eabc99a635a77cbf09") addressed the original build problem,
declaration, inline stub, and definition still seem misplaced: It isn't
really IO-APIC related, and it's being used solely in arch/x86/pci/.
This also means stubbing it out when !CONFIG_X86_IO_APIC was at least
questionable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/545747BE020000780004436E@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:14 +01:00
Jiang Liu
e32c67e0cb x86, irq: Provide empty send_cleanup_vector() stub for UP builds
Define an empty send_cleanup_vector() for UP kernel to fix link error
of undefined reference, which is used by uv_irq and irq_remapping.

[ tglx: Made it an inline stub and moved it ahead of the file split
  	changes ]

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1414397531-28254-21-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:14 +01:00
Linus Torvalds
536e89ee53 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes (mainly Andy's TLS fixes), plus a cleanup"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tls: Disallow unusual TLS segments
  x86/tls: Validate TLS entries to protect espfix
  MAINTAINERS: Add me as x86 VDSO submaintainer
  x86/asm: Unify segment selector defines
  x86/asm: Guard against building the 32/64-bit versions of the asm-offsets*.c file directly
  x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  x86/mm: Use min() instead of min_t() in the e820 printout code
  x86/mm: Fix zone ranges boot printout
  x86/doc: Update documentation after file shuffling
2014-12-14 11:51:50 -08:00
Linus Torvalds
f96fe22567 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull another networking update from David Miller:
 "Small follow-up to the main merge pull from the other day:

  1) Alexander Duyck's DMA memory barrier patch set.

  2) cxgb4 driver fixes from Karen Xie.

  3) Add missing export of fixed_phy_register() to modules, from Mark
     Salter.

  4) DSA bug fixes from Florian Fainelli"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
  net/macb: add TX multiqueue support for gem
  linux/interrupt.h: remove the definition of unused tasklet_hi_enable
  jme: replace calls to redundant function
  net: ethernet: davicom: Allow to select DM9000 for nios2
  net: ethernet: smsc: Allow to select SMC91X for nios2
  cxgb4: Add support for QSA modules
  libcxgbi: fix freeing skb prematurely
  cxgb4i: use set_wr_txq() to set tx queues
  cxgb4i: handle non-pdu-aligned rx data
  cxgb4i: additional types of negative advice
  cxgb4/cxgb4i: set the max. pdu length in firmware
  cxgb4i: fix credit check for tx_data_wr
  cxgb4i: fix tx immediate data credit check
  net: phy: export fixed_phy_register()
  fib_trie: Fix trie balancing issue if new node pushes down existing node
  vlan: Add ability to always enable TSO/UFO
  r8169:update rtl8168g pcie ephy parameter
  net: dsa: bcm_sf2: force link for all fixed PHY devices
  fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads
  r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
  ...
2014-12-12 16:11:12 -08:00
Linus Torvalds
9d050966e2 xen: features and fixes for 3.19-rc0
- Fully support non-coherent devices on ARM by introducing the
   mechanisms to request the hypervisor to perform the required cache
   maintainance operations.
 
 - A number of pciback bug fixes and cleanups.  Notably a deadlock fix
   if a PCI device was manually uunbound and a fix for incorrectly
   restoring state after a function reset.
 
 - In x86 PVHVM guests, use the APIC for interrupts if this has been
   virtualized by the hardware.  This reduces the number of interrupt-
   related VM exits on such hardware.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJUiYb+AAoJEFxbo/MsZsTRwmEH+gNaJz5r8gIJlq8Q51+nOIs4
 Gw6HdjUB5MOT47vDV4treEOx0Bk8hYTfgWUWvAC81JMJ1sMWOVrUGuG/0lmzaomW
 zXvSk+o0n4LafwEhHb8LIccZMbaH7f9o3PNdNchrTkPrIl8Gf2nmBXCkDsT4mRye
 5ZFpc4ntgBrznh3baPYDS8PCAmlyZ0uVEnz1ofYI6S80dC13siEiPG0c9TrNEKzO
 glhvgCRmR0C4ZNLblM36HWBEqrdLuGCoNJSH+7okygyP2TLD3aO4R+9aD5JWYNdf
 fO2WmivX/zK+UGVAElrLx+rb8R2dv3ddeaE5piZhIBUieopIWJd32L3LhQORdtc=
 =N6DP
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.19-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen features and fixes from David Vrabel:

 - Fully support non-coherent devices on ARM by introducing the
   mechanisms to request the hypervisor to perform the required cache
   maintainance operations.

 - A number of pciback bug fixes and cleanups.  Notably a deadlock fix
   if a PCI device was manually uunbound and a fix for incorrectly
   restoring state after a function reset.

 - In x86 PVHVM guests, use the APIC for interrupts if this has been
   virtualized by the hardware.  This reduces the number of interrupt-
   related VM exits on such hardware.

* tag 'stable/for-linus-3.19-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (26 commits)
  Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
  xen/pci: Use APIC directly when APIC virtualization hardware is available
  xen/pci: Defer initialization of MSI ops on HVM guests
  xen-pciback: drop SR-IOV VFs when PF driver unloads
  xen/pciback: Restore configuration space when detaching from a guest.
  PCI: Expose pci_load_saved_state for public consumption.
  xen/pciback: Remove tons of dereferences
  xen/pciback: Print out the domain owning the device.
  xen/pciback: Include the domain id if removing the device whilst still in use
  driver core: Provide an wrapper around the mutex to do lockdep warnings
  xen/pciback: Don't deadlock when unbinding.
  swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single
  swiotlb-xen: call xen_dma_sync_single_for_device when appropriate
  swiotlb-xen: remove BUG_ON in xen_bus_to_phys
  swiotlb-xen: pass dev_addr to xen_dma_unmap_page and xen_dma_sync_single_for_cpu
  xen/arm: introduce GNTTABOP_cache_flush
  xen/arm/arm64: introduce xen_arch_need_swiotlb
  xen/arm/arm64: merge xen/mm32.c into xen/mm.c
  xen/arm: use hypercall to flush caches in map_page
  xen: add a dma_addr_t dev_addr argument to xen_dma_map_page
  ...
2014-12-11 18:15:33 -08:00
Alexander Duyck
1077fa36f2 arch: Add lightweight memory barriers dma_rmb() and dma_wmb()
There are a number of situations where the mandatory barriers rmb() and
wmb() are used to order memory/memory operations in the device drivers
and those barriers are much heavier than they actually need to be.  For
example in the case of PowerPC wmb() calls the heavy-weight sync
instruction when for coherent memory operations all that is really needed
is an lsync or eieio instruction.

This commit adds a coherent only version of the mandatory memory barriers
rmb() and wmb().  In most cases this should result in the barrier being the
same as the SMP barriers for the SMP case, however in some cases we use a
barrier that is somewhere in between rmb() and smp_rmb().  For example on
ARM the rmb barriers break down as follows:

  Barrier   Call     Explanation
  --------- -------- ----------------------------------
  rmb()     dsb()    Data synchronization barrier - system
  dma_rmb() dmb(osh) data memory barrier - outer sharable
  smp_rmb() dmb(ish) data memory barrier - inner sharable

These new barriers are not as safe as the standard rmb() and wmb().
Specifically they do not guarantee ordering between coherent and incoherent
memories.  The primary use case for these would be to enforce ordering of
reads and writes when accessing coherent memory that is shared between the
CPU and a device.

It may also be noted that there is no dma_mb().  Most architectures don't
provide a good mechanism for performing a coherent only full barrier without
resorting to the same mechanism used in mb().  As such there isn't much to
be gained in trying to define such a function.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 21:15:06 -05:00
Alexander Duyck
8a44971841 arch: Cleanup read_barrier_depends() and comments
This patch is meant to cleanup the handling of read_barrier_depends and
smp_read_barrier_depends.  In multiple spots in the kernel headers
read_barrier_depends is defined as "do {} while (0)", however we then go
into the SMP vs non-SMP sections and have the SMP version reference
read_barrier_depends, and the non-SMP define it as yet another empty
do/while.

With this commit I went through and cleaned out the duplicate definitions
and reduced the number of definitions down to 2 per header.  In addition I
moved the 50 line comments for the macro from the x86 and mips headers that
defined it as an empty do/while to those that were actually defining the
macro, alpha and blackfin.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 21:15:05 -05:00
Linus Torvalds
70e71ca0af Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

 2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers.  Thanks to Al Viro
    and Herbert Xu.

 3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

 4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

 5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

 6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

 7) Add a variable of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

 8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

 9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets.  From Alexei
    Starovoitov.

10) Support TSO/LSO in sunvnet driver, from David L Stevens.

11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

12) Remote checksum offload, from Tom Herbert.

13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

14) Add MPLS support to openvswitch, from Simon Horman.

15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet.  This tries to resolve the conflicting goals between the
    desired handling of bulk vs.  RPC-like traffic.

17) Allow userspace to ask for the CPU upon what a packet was
    received/steered, via SO_INCOMING_CPU.  From Eric Dumazet.

18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

21) Add VLAN packet scheduler action, from Jiri Pirko.

22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
  Fix race condition between vxlan_sock_add and vxlan_sock_release
  net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
  net/mlx4: Add support for A0 steering
  net/mlx4: Refactor QUERY_PORT
  net/mlx4_core: Add explicit error message when rule doesn't meet configuration
  net/mlx4: Add A0 hybrid steering
  net/mlx4: Add mlx4_bitmap zone allocator
  net/mlx4: Add a check if there are too many reserved QPs
  net/mlx4: Change QP allocation scheme
  net/mlx4_core: Use tasklet for user-space CQ completion events
  net/mlx4_core: Mask out host side virtualization features for guests
  net/mlx4_en: Set csum level for encapsulated packets
  be2net: Export tunnel offloads only when a VxLAN tunnel is created
  gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
  cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
  net: fec: only enable mdio interrupt before phy device link up
  net: fec: clear all interrupt events to support i.MX6SX
  net: fec: reset fep link status in suspend function
  net: sock: fix access via invalid file descriptor
  net: introduce helper macro for_each_cmsghdr
  ...
2014-12-11 14:27:06 -08:00
Linus Torvalds
bae41e45b7 sound updates for 3.19-rc1
This became a fairly large pull request.  In addition to the usual
 driver updates / fixes, there have been a high amount of cleanups in
 ASoC area, as well as control API helpers and kernel documentations
 fixes touching through the whole tree.
 
 In the driver side, the biggest changes are the support for new Intel
 SoC found on new x86 machines, and the updates of FireWire dice and
 oxfw drivers.
 
 Some remarkable items are below:
 
 * ALSA core
  - PCM mmap code cleanup, removal of arch-dependent codes
  - PCM xrun injection support
  - PCM hwptr tracepoint support
  - Refactoring of snd_pcm_action(), simplification of PCM locking
  - Robustified sequecner auto-load functionality
  - New control API helpers and lots of cleanups along with them
  - Lots of kerneldoc fixes and cleanups
 
 * USB-audio
  - The mixer resume code was largely rewritten, and the devices with
    quirks are resumed properly.
  - New hardware support: Focusrite Scarlett, Digidesign Mbox1,
    Denon/Marantz DACs, Zoom R16/24
 
 * FireWire
  - DICE driver updates with better duplex and sync support, including
    MIDI support
  - New OXFW driver for Oxford Semiconductor FW970/971 chipset,
    including the previous LaCie Speakers device.  Fullduplex and MIDI
    support included as well as DICE driver.
 
 * HD-audio
  - Refactoring the driver-caps quirk handling in snd-hda-intel
  - More consistent control names representing the topology better
  - Fixups: HP mute LED with ALC268 codec, Ideapad S210 built-in mic
    fix, ASUS Z99He laptop EAPD
 
 * ASoC
  - Conversion of AC'97 drivers to use regmap, bringing us closer to
    the removal of the ASoC level I/O code
  - Clean up a lot of old drivers that were open coding things that
    have subsequently been implemented in the core
  - Some DAPM performance improvements
  - Removal of the now seldom used CODEC mutex
  - Lots of updates for the newer Intel SoC support, including support
    for the DSP and some Cherrytrail and Braswell machine drivers
  - Support for Samsung boards using rt5631 as the CODEC
  - Removal of the obsolete AFEB9260 machine driver
  - Driver support for the TI TS3A227E headset driver used in some
    Chrombeooks
 
 * Others
  - ASIHPI driver update and cleanups
  - Lots of dev_*() printk conversions
  - Lots of trivial cleanups for the codes spotted by Coccinelle
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJUiYaqAAoJEGwxgFQ9KSmkeo0P/2aDx2w8iVi8n7Og/7VBubkm
 VZkk08IOpP3h1ojyQRsBQPI0H5AquqQTZN1TJUDcy+6PD9vckYYcag9JWhA+0RBr
 I+BfTMLB3E4umIkzOjxeoyOzheL7GoZ+eZYEm8DkAhaue+cFhjNJz+S6g8ENkxJ9
 lSjErXQxyiowc39I0v1WBZcuq6glX1psEsVup9U8m7KhNx6lexj28A2MkqicW4hs
 DZE6pYrk57W7y3+/NWxaBiglrItvScBAPpPqoyDm9zuDNTmAtGjf1uMRmRyHe30Z
 iunHXki8Fc2yBBapmfYrcLC2jyIyZykcxniF8Hd4nXUvddisFUEFFhNmB6v392d0
 4/NXSqTnsq48vm0Ezjia2LySWKZZVQtam8t9262BKHcosKYObxirekD6vijSoWO8
 ZWoXa+U1oWSFEoOAFDsu6GFqFHFRi5VhqBgIaPEIxrT2MQGHL3KU1bp8CJi/5CTU
 pNh0wC9SMtnSJJXBIP/nYH81WQxaik3c4eiHFPN4+0McBZQiIaIqMG6x+iiVNvPB
 MNLLVAzk0QiWeCmSo8OBdjOV0/T+pfQ7lrTCn2B1jdJi1CkAO8m2SwQrG4PpRx8k
 lUTBd4zTx5DYR+yPF69OyoCQg0XKjW9g62Qo5rmxrQreiidROZOBS1bljWzIPeft
 otupLmK5kz67n3eB2eto
 =sB6v
 -----END PGP SIGNATURE-----

Merge tag 'sound-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound updates from Takashi Iwai:
 "This became a fairly large pull request.  In addition to the usual
  driver updates / fixes, there have been a high amount of cleanups in
  ASoC area, as well as control API helpers and kernel documentations
  fixes touching through the whole tree.

  In the driver side, the biggest changes are the support for new Intel
  SoC found on new x86 machines, and the updates of FireWire dice and
  oxfw drivers.

  Some remarkable items are below:

  ALSA core:
   - PCM mmap code cleanup, removal of arch-dependent codes
   - PCM xrun injection support
   - PCM hwptr tracepoint support
   - Refactoring of snd_pcm_action(), simplification of PCM locking
   - Robustified sequecner auto-load functionality
   - New control API helpers and lots of cleanups along with them
   - Lots of kerneldoc fixes and cleanups

  USB-audio:
   - The mixer resume code was largely rewritten, and the devices with
     quirks are resumed properly.
   - New hardware support: Focusrite Scarlett, Digidesign Mbox1,
     Denon/Marantz DACs, Zoom R16/24

  FireWire:
   - DICE driver updates with better duplex and sync support, including
     MIDI support
   - New OXFW driver for Oxford Semiconductor FW970/971 chipset,
     including the previous LaCie Speakers device.  Fullduplex and MIDI
     support included as well as DICE driver.

  HD-audio:
   - Refactoring the driver-caps quirk handling in snd-hda-intel
   - More consistent control names representing the topology better
   - Fixups: HP mute LED with ALC268 codec, Ideapad S210 built-in mic
     fix, ASUS Z99He laptop EAPD

  ASoC:
   - Conversion of AC'97 drivers to use regmap, bringing us closer to
     the removal of the ASoC level I/O code
   - Clean up a lot of old drivers that were open coding things that
     have subsequently been implemented in the core
   - Some DAPM performance improvements
   - Removal of the now seldom used CODEC mutex
   - Lots of updates for the newer Intel SoC support, including support
     for the DSP and some Cherrytrail and Braswell machine drivers
   - Support for Samsung boards using rt5631 as the CODEC
   - Removal of the obsolete AFEB9260 machine driver
   - Driver support for the TI TS3A227E headset driver used in some
     Chrombeooks

  Others:
   - ASIHPI driver update and cleanups
   - Lots of dev_*() printk conversions
   - Lots of trivial cleanups for the codes spotted by Coccinelle"

* tag 'sound-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (594 commits)
  ALSA: pcxhr: NULL dereference on probe failure
  ALSA: lola: NULL dereference on probe failure
  ALSA: hda - Add "eapd" model string for AD1986A codec
  ALSA: hda - Add EAPD fixup for ASUS Z99He laptop
  ALSA: oxfw: Add hwdep interface
  ALSA: oxfw: Add support for capture/playback MIDI messages
  ALSA: oxfw: add support for capturing PCM samples
  ALSA: oxfw: Add support AMDTP in-stream
  ALSA: oxfw: Add support for Behringer/Mackie devices
  ALSA: oxfw: Change the way to start stream
  ALSA: oxfw: Add proc interface for debugging purpose
  ALSA: oxfw: Change the way to make PCM rules/constraints
  ALSA: oxfw: Add support for AV/C stream format command to get/set supported stream formation
  ALSA: oxfw: Change the way to name card
  ALSA: dice: Add support for MIDI capture/playback
  ALSA: dice: Add support for capturing PCM samples
  ALSA: dice: Support for non SYT-Match sampling clock source mode
  ALSA: dice: Add support for duplex streams with synchronization
  ALSA: dice: Change the way to start stream
  ALSA: jack: Add dummy snd_jack_set_key() definition
  ...
2014-12-11 13:20:50 -08:00
Borislav Petkov
be9d1738b1 x86/asm: Unify segment selector defines
Those are identical on 32- and 64-bit, unify them. No functional
change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1418127959-29902-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-11 11:45:03 +01:00
Xishi Qiu
c072b90c8d x86/mm: Fix zone ranges boot printout
This is the usual physical memory layout boot printout:
	...
	[    0.000000] Zone ranges:
	[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
	[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
	[    0.000000]   Normal   [mem 0x100000000-0xc3fffffff]
	[    0.000000] Movable zone start for each node
	[    0.000000] Early memory node ranges
	[    0.000000]   node   0: [mem 0x00001000-0x00099fff]
	[    0.000000]   node   0: [mem 0x00100000-0xbf78ffff]
	[    0.000000]   node   0: [mem 0x100000000-0x63fffffff]
	[    0.000000]   node   1: [mem 0x640000000-0xc3fffffff]
	...

This is the log when we set "mem=2G" on the boot cmdline:
	...
	[    0.000000] Zone ranges:
	[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
	[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]  // should be 0x7fffffff, right?
	[    0.000000]   Normal   empty
	[    0.000000] Movable zone start for each node
	[    0.000000] Early memory node ranges
	[    0.000000]   node   0: [mem 0x00001000-0x00099fff]
	[    0.000000]   node   0: [mem 0x00100000-0x7fffffff]
	...

This patch fixes the printout, the following log shows the right
ranges:
	...
	[    0.000000] Zone ranges:
	[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
	[    0.000000]   DMA32    [mem 0x01000000-0x7fffffff]
	[    0.000000]   Normal   empty
	[    0.000000] Movable zone start for each node
	[    0.000000] Early memory node ranges
	[    0.000000]   node   0: [mem 0x00001000-0x00099fff]
	[    0.000000]   node   0: [mem 0x00100000-0x7fffffff]
	...

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Linux MM <linux-mm@kvack.org>
Cc: <dave@sr71.net>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/r/5487AB3D.6070306@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-11 11:35:02 +01:00
Linus Torvalds
92a578b064 ACPI and power management updates for 3.19-rc1
This time we have some more new material than we used to have during
 the last couple of development cycles.
 
 The most important part of it to me is the introduction of a unified
 interface for accessing device properties provided by platform
 firmware.  It works with Device Trees and ACPI in a uniform way and
 drivers using it need not worry about where the properties come
 from as long as the platform firmware (either DT or ACPI) makes
 them available.  It covers both devices and "bare" device node
 objects without struct device representation as that turns out to
 be necessary in some cases.  This has been in the works for quite
 a few months (and development cycles) and has been approved by
 all of the relevant maintainers.
 
 On top of that, some drivers are switched over to the new interface
 (at25, leds-gpio, gpio_keys_polled) and some additional changes are
 made to the core GPIO subsystem to allow device drivers to manipulate
 GPIOs in the "canonical" way on platforms that provide GPIO information
 in their ACPI tables, but don't assign names to GPIO lines (in which
 case the driver needs to do that on the basis of what it knows about
 the device in question).  That also has been approved by the GPIO
 core maintainers and the rfkill driver is now going to use it.
 
 Second is support for hardware P-states in the intel_pstate driver.
 It uses CPUID to detect whether or not the feature is supported by
 the processor in which case it will be enabled by default.  However,
 it can be disabled entirely from the kernel command line if necessary.
 
 Next is support for a platform firmware interface based on ACPI
 operation regions used by the PMIC (Power Management Integrated
 Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms.
 That interface is used for manipulating power resources and for
 thermal management: sensor temperature reporting, trip point setting
 and so on.
 
 Also the ACPI core is now going to support the _DEP configuration
 information in a limited way.  Basically, _DEP it supposed to reflect
 off-the-hierarchy dependencies between devices which may be very
 indirect, like when AML for one device accesses locations in an
 operation region handled by another device's driver (usually, the
 device depended on this way is a serial bus or GPIO controller).
 The support added this time is sufficient to make the ACPI battery
 driver work on Asus T100A, but it is general enough to be able to
 cover some other use cases in the future.
 
 Finally, we have a new cpufreq driver for the Loongson1B processor.
 
 In addition to the above, there are fixes and cleanups all over the
 place as usual and a traditional ACPICA update to a recent upstream
 release.
 
 As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver
 for Intel platforms should be able to handle power management of
 the DMA engine correctly, the cpufreq-dt driver should interact
 with the thermal subsystem in a better way and the ACPI backlight
 driver should handle some more corner cases, among other things.
 
 On top of the ACPICA update there are fixes for race conditions
 in the ACPICA's interrupt handling code which might lead to some
 random and strange looking failures on some systems.
 
 In the cleanups department the most visible part is the series
 of commits targeted at getting rid of the CONFIG_PM_RUNTIME
 configuration option.  That was triggered by a discussion
 regarding the generic power domains code during which we realized
 that trying to support certain combinations of PM config options
 was painful and not really worth it, because nobody would use them
 in production anyway.  For this reason, we decided to make
 CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and that lead to the
 conclusion that the latter became redundant and CONFIG_PM could
 be used instead of it.  The material here makes that replacement
 in a major part of the tree, but there will be at least one more
 batch of that in the second part of the merge window.
 
 Specifics:
 
  - Support for retrieving device properties information from ACPI
    _DSD device configuration objects and a unified device properties
    interface for device drivers (and subsystems) on top of that.
    As stated above, this works with Device Trees and ACPI and allows
    device drivers to be written in a platform firmware (DT or ACPI)
    agnostic way.  The at25, leds-gpio and gpio_keys_polled drivers
    are now going to use this new interface and the GPIO subsystem
    is additionally modified to allow device drivers to assign names
    to GPIO resources returned by ACPI _CRS objects (in case _DSD is
    not present or does not provide the expected data).  The changes
    in this set are mostly from Mika Westerberg, Rafael J Wysocki,
    Aaron Lu, and Darren Hart with some fixes from others (Fabio Estevam,
    Geert Uytterhoeven).
 
  - Support for Hardware Managed Performance States (HWP) as described
    in Volume 3, section 14.4, of the Intel SDM in the intel_pstate
    driver.  CPUID is used to detect whether or not the feature is
    supported by the processor.  If supported, it will be enabled
    automatically unless the intel_pstate=no_hwp switch is present in
    the kernel command line.  From Dirk Brandewie.
 
  - New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie).
 
  - Support for firmware interface based on ACPI operation regions
    used by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR
    platforms for power resource control and thermal management
    (Aaron Lu).
 
  - Limited support for retrieving off-the-hierarchy dependencies
    between devices from ACPI _DEP device configuration objects
    and deferred probing support for the ACPI battery driver based
    on the _DEP information to make that driver work on Asus T100A
    (Lan Tianyu).
 
  - New cpufreq driver for the Loongson1B processor (Kelvin Cheung).
 
  - ACPICA update to upstream revision 20141107 which only affects
    tools (Bob Moore).
 
  - Fixes for race conditions in the ACPICA's interrupt handling
    code and in the ACPI code related to system suspend and resume
    (Lv Zheng and Rafael J Wysocki).
 
  - ACPI core fix for an RCU-related issue in the ioremap() regions
    management code that slowed down significantly after CPUs had
    been allowed to enter idle states even if they'd had RCU callbakcs
    queued and triggered some problems in certain proprietary graphics
    driver (and elsewhere).  The fix replaces synchronize_rcu() in
    that code with synchronize_rcu_expedited() which makes the issue
    go away.  From Konstantin Khlebnikov.
 
  - ACPI LPSS (Low-Power Subsystem) driver fix to handle power
    management of the DMA engine included into the LPSS correctly.
    The problem is that the DMA engine doesn't have ACPI PM support
    of its own and it simply is turned off when the last LPSS device
    having ACPI PM support goes into D3cold.  To work around that,
    the PM domain used by the ACPI LPSS driver is redesigned so at
    least one device with ACPI PM support will be on as long as the
    DMA engine is in use.  From Andy Shevchenko.
 
  - ACPI backlight driver fix to avoid using it on "Win8-compatible"
    systems where it doesn't work and where it was used by default by
    mistake (Aaron Lu).
 
  - Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki,
    Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and
    Ashwin Chaugule (mostly related to the upcoming ARM64 support).
 
  - Intel RAPL (Running Average Power Limit) power capping driver
    fixes and improvements including new processor IDs (Jacob Pan).
 
  - Generic power domains modification to power up domains after
    attaching devices to them to meet the expectations of device
    drivers and bus types assuming devices to be accessible at
    probe time (Ulf Hansson).
 
  - Preliminary support for controlling device clocks from the
    generic power domains core code and modifications of the
    ARM/shmobile platform to use that feature (Ulf Hansson).
 
  - Assorted minor fixes and cleanups of the generic power
    domains core code (Ulf Hansson, Geert Uytterhoeven).
 
  - Assorted minor fixes and cleanups of the device clocks control
    code in the PM core (Geert Uytterhoeven, Grygorii Strashko).
 
  - Consolidation of device power management Kconfig options by making
    CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter
    which is now redundant (Rafael J Wysocki and Kevin Hilman).  That
    is the first batch of the changes needed for this purpose.
 
  - Core device runtime power management support code cleanup related
    to the execution of callbacks (Andrzej Hajda).
 
  - cpuidle ARM support improvements (Lorenzo Pieralisi).
 
  - cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and
    a new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and
    Bartlomiej Zolnierkiewicz).
 
  - New cpufreq driver callback (->ready) to be executed when the
    cpufreq core is ready to use a given policy object and cpufreq-dt
    driver modification to use that callback for cooling device
    registration (Viresh Kumar).
 
  - cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu,
    James Geboski, Tomeu Vizoso).
 
  - Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate,
    cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao,
    Stefan Wahren, Petr Cvek).
 
  - OPP (Operating Performance Points) framework modification to
    allow OPPs to be removed too and update of a few cpufreq drivers
    (cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added
    during initialization) on driver removal (Viresh Kumar).
 
  - Hibernation core fixes and cleanups (Tina Ruchandani and
    Markus Elfring).
 
  - PM Kconfig fix related to CPU power management (Pankaj Dubey).
 
  - cpupower tool fix (Prarit Bhargava).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJUhj6JAAoJEILEb/54YlRxTM4P/j5g5SfqvY0QKsn7sR7MGZ6v
 nsgCBhJAqTw3ocNC7EAs8z9h2GWy1KbKpakKYWAh9Fs1yZoey7tFSlcv/Rgjlp70
 uU5sDQHtpE9mHKiymdsowiQuWgpl962L4k+k8hUslhlvgk1PvVbpajR6OqG8G+pD
 asuIW9eh1APNkLyXmRJ3ZPomzs0VmRdZJ0NEs0lKX9mJskqEvxPIwdaxq3iaJq9B
 Fo0J345zUDcJnxWblDRdHlOigCimglElfN5qJwaC4KpwUKuBvLRKbp4f69+wfT0c
 kYFiR29X5KjJ2kLfP/wKsLyuDCYYXRq3tCia5M1tAqOjZ+UA89H/GDftx/5lntmv
 qUlBa35VfdS1SX4HyApZitOHiLgo+It/hl8Z9bJnhyVw66NxmMQ8JYN2imb8Lhqh
 XCLR7BxLTah82AapLJuQ0ZDHPzZqMPG2veC2vAzRMYzVijict/p4Y2+qBqONltER
 4rs9uRVn+hamX33lCLg8BEN8zqlnT3rJFIgGaKjq/wXHAU/zpE9CjOrKMQcAg9+s
 t51XMNPwypHMAYyGVhEL89ImjXnXxBkLRuquhlmEpvQchIhR+mR3dLsarGn7da44
 WPIQJXzcsojXczcwwfqsJCR4I1FTFyQIW+UNh02GkDRgRovQqo+Jk762U7vQwqH+
 LBdhvVaS1VW4v+FWXEoZ
 =5dox
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management updates from Rafael Wysocki:
 "This time we have some more new material than we used to have during
  the last couple of development cycles.

  The most important part of it to me is the introduction of a unified
  interface for accessing device properties provided by platform
  firmware.  It works with Device Trees and ACPI in a uniform way and
  drivers using it need not worry about where the properties come from
  as long as the platform firmware (either DT or ACPI) makes them
  available.  It covers both devices and "bare" device node objects
  without struct device representation as that turns out to be necessary
  in some cases.  This has been in the works for quite a few months (and
  development cycles) and has been approved by all of the relevant
  maintainers.

  On top of that, some drivers are switched over to the new interface
  (at25, leds-gpio, gpio_keys_polled) and some additional changes are
  made to the core GPIO subsystem to allow device drivers to manipulate
  GPIOs in the "canonical" way on platforms that provide GPIO
  information in their ACPI tables, but don't assign names to GPIO lines
  (in which case the driver needs to do that on the basis of what it
  knows about the device in question).  That also has been approved by
  the GPIO core maintainers and the rfkill driver is now going to use
  it.

  Second is support for hardware P-states in the intel_pstate driver.
  It uses CPUID to detect whether or not the feature is supported by the
  processor in which case it will be enabled by default.  However, it
  can be disabled entirely from the kernel command line if necessary.

  Next is support for a platform firmware interface based on ACPI
  operation regions used by the PMIC (Power Management Integrated
  Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms.
  That interface is used for manipulating power resources and for
  thermal management: sensor temperature reporting, trip point setting
  and so on.

  Also the ACPI core is now going to support the _DEP configuration
  information in a limited way.  Basically, _DEP it supposed to reflect
  off-the-hierarchy dependencies between devices which may be very
  indirect, like when AML for one device accesses locations in an
  operation region handled by another device's driver (usually, the
  device depended on this way is a serial bus or GPIO controller).  The
  support added this time is sufficient to make the ACPI battery driver
  work on Asus T100A, but it is general enough to be able to cover some
  other use cases in the future.

  Finally, we have a new cpufreq driver for the Loongson1B processor.

  In addition to the above, there are fixes and cleanups all over the
  place as usual and a traditional ACPICA update to a recent upstream
  release.

  As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver for
  Intel platforms should be able to handle power management of the DMA
  engine correctly, the cpufreq-dt driver should interact with the
  thermal subsystem in a better way and the ACPI backlight driver should
  handle some more corner cases, among other things.

  On top of the ACPICA update there are fixes for race conditions in the
  ACPICA's interrupt handling code which might lead to some random and
  strange looking failures on some systems.

  In the cleanups department the most visible part is the series of
  commits targeted at getting rid of the CONFIG_PM_RUNTIME configuration
  option.  That was triggered by a discussion regarding the generic
  power domains code during which we realized that trying to support
  certain combinations of PM config options was painful and not really
  worth it, because nobody would use them in production anyway.  For
  this reason, we decided to make CONFIG_PM_SLEEP select
  CONFIG_PM_RUNTIME and that lead to the conclusion that the latter
  became redundant and CONFIG_PM could be used instead of it.  The
  material here makes that replacement in a major part of the tree, but
  there will be at least one more batch of that in the second part of
  the merge window.

  Specifics:

   - Support for retrieving device properties information from ACPI _DSD
     device configuration objects and a unified device properties
     interface for device drivers (and subsystems) on top of that.  As
     stated above, this works with Device Trees and ACPI and allows
     device drivers to be written in a platform firmware (DT or ACPI)
     agnostic way.  The at25, leds-gpio and gpio_keys_polled drivers are
     now going to use this new interface and the GPIO subsystem is
     additionally modified to allow device drivers to assign names to
     GPIO resources returned by ACPI _CRS objects (in case _DSD is not
     present or does not provide the expected data).  The changes in
     this set are mostly from Mika Westerberg, Rafael J Wysocki, Aaron
     Lu, and Darren Hart with some fixes from others (Fabio Estevam,
     Geert Uytterhoeven).

   - Support for Hardware Managed Performance States (HWP) as described
     in Volume 3, section 14.4, of the Intel SDM in the intel_pstate
     driver.  CPUID is used to detect whether or not the feature is
     supported by the processor.  If supported, it will be enabled
     automatically unless the intel_pstate=no_hwp switch is present in
     the kernel command line.  From Dirk Brandewie.

   - New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie).

   - Support for firmware interface based on ACPI operation regions used
     by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR
     platforms for power resource control and thermal management (Aaron
     Lu).

   - Limited support for retrieving off-the-hierarchy dependencies
     between devices from ACPI _DEP device configuration objects and
     deferred probing support for the ACPI battery driver based on the
     _DEP information to make that driver work on Asus T100A (Lan
     Tianyu).

   - New cpufreq driver for the Loongson1B processor (Kelvin Cheung).

   - ACPICA update to upstream revision 20141107 which only affects
     tools (Bob Moore).

   - Fixes for race conditions in the ACPICA's interrupt handling code
     and in the ACPI code related to system suspend and resume (Lv Zheng
     and Rafael J Wysocki).

   - ACPI core fix for an RCU-related issue in the ioremap() regions
     management code that slowed down significantly after CPUs had been
     allowed to enter idle states even if they'd had RCU callbakcs
     queued and triggered some problems in certain proprietary graphics
     driver (and elsewhere).  The fix replaces synchronize_rcu() in that
     code with synchronize_rcu_expedited() which makes the issue go
     away.  From Konstantin Khlebnikov.

   - ACPI LPSS (Low-Power Subsystem) driver fix to handle power
     management of the DMA engine included into the LPSS correctly.  The
     problem is that the DMA engine doesn't have ACPI PM support of its
     own and it simply is turned off when the last LPSS device having
     ACPI PM support goes into D3cold.  To work around that, the PM
     domain used by the ACPI LPSS driver is redesigned so at least one
     device with ACPI PM support will be on as long as the DMA engine is
     in use.  From Andy Shevchenko.

   - ACPI backlight driver fix to avoid using it on "Win8-compatible"
     systems where it doesn't work and where it was used by default by
     mistake (Aaron Lu).

   - Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki,
     Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and Ashwin
     Chaugule (mostly related to the upcoming ARM64 support).

   - Intel RAPL (Running Average Power Limit) power capping driver fixes
     and improvements including new processor IDs (Jacob Pan).

   - Generic power domains modification to power up domains after
     attaching devices to them to meet the expectations of device
     drivers and bus types assuming devices to be accessible at probe
     time (Ulf Hansson).

   - Preliminary support for controlling device clocks from the generic
     power domains core code and modifications of the ARM/shmobile
     platform to use that feature (Ulf Hansson).

   - Assorted minor fixes and cleanups of the generic power domains core
     code (Ulf Hansson, Geert Uytterhoeven).

   - Assorted minor fixes and cleanups of the device clocks control code
     in the PM core (Geert Uytterhoeven, Grygorii Strashko).

   - Consolidation of device power management Kconfig options by making
     CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter
     which is now redundant (Rafael J Wysocki and Kevin Hilman).  That
     is the first batch of the changes needed for this purpose.

   - Core device runtime power management support code cleanup related
     to the execution of callbacks (Andrzej Hajda).

   - cpuidle ARM support improvements (Lorenzo Pieralisi).

   - cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and a
     new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and
     Bartlomiej Zolnierkiewicz).

   - New cpufreq driver callback (->ready) to be executed when the
     cpufreq core is ready to use a given policy object and cpufreq-dt
     driver modification to use that callback for cooling device
     registration (Viresh Kumar).

   - cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu, James
     Geboski, Tomeu Vizoso).

   - Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate,
     cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao,
     Stefan Wahren, Petr Cvek).

   - OPP (Operating Performance Points) framework modification to allow
     OPPs to be removed too and update of a few cpufreq drivers
     (cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added
     during initialization) on driver removal (Viresh Kumar).

   - Hibernation core fixes and cleanups (Tina Ruchandani and Markus
     Elfring).

   - PM Kconfig fix related to CPU power management (Pankaj Dubey).

   - cpupower tool fix (Prarit Bhargava)"

* tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (120 commits)
  i2c-omap / PM: Drop CONFIG_PM_RUNTIME from i2c-omap.c
  dmaengine / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  tools: cpupower: fix return checks for sysfs_get_idlestate_count()
  drivers: sh / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME
  MMC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  MFD / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  misc / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  media / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  input / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  leds: leds-gpio: Fix multiple instances registration without 'label' property
  iio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  hsi / OMAP / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  i2c-hid / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  drm / exynos / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  gpio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  hwrandom / exynos / PM: Use CONFIG_PM in #ifdef
  block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  USB / PM: Drop CONFIG_PM_RUNTIME from the USB core
  PM: Merge the SET*_RUNTIME_PM_OPS() macros
  ...
2014-12-10 21:17:00 -08:00
Linus Torvalds
1dd7dcb6ea There was a lot of clean ups and minor fixes. One of those clean ups was
to the trace_seq code. It also removed the return values to the
 trace_seq_*() functions and use trace_seq_has_overflowed() to see if
 the buffer filled up or not. This is similar to work being done to the
 seq_file code as well in another tree.
 
 Some of the other goodies include:
 
  o Added some "!" (NOT) logic to the tracing filter.
 
  o Fixed the frame pointer logic to the x86_64 mcount trampolines
 
  o Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
    That is, the ftrace trampoline can be dynamically allocated
    and be called directly by functions that only have a single hook
    to them.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUhbLGAAoJEEjnJuOKh9ldRV4H/3NcLbgGB2iu96la1zdYE6pG
 Q7cDJMxXK80YIIL70h9G0IItcD4t62LMb72lfBnMGRj3msgFb3AgISW57EuI0Pxk
 xk24wuIPoTG2S7v9sc3SboNFwO8qbtIjxD2OBmqIUrGo2sZIiGjyj3gX7mCY3uzL
 WB2bUOSFz/22OgaANinR5EELHA3pZZCf54Vz1K9ndmtK0xp0j1a7xJShD6TrMdYv
 mZ3zH5ViIhW4A3mdcMceh6fy2JLQAiEKF0uPTvcMMz7NlVul0mxyL/+10P7AE/3R
 Ehw4fzmm4NDshPDtBOkKH0LsppgXzuItFuQUTpact3JlqTg++bV6onSsrkt1hlY=
 =Z7Cm
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "There was a lot of clean ups and minor fixes.  One of those clean ups
  was to the trace_seq code.  It also removed the return values to the
  trace_seq_*() functions and use trace_seq_has_overflowed() to see if
  the buffer filled up or not.  This is similar to work being done to
  the seq_file code as well in another tree.

  Some of the other goodies include:

   - Added some "!" (NOT) logic to the tracing filter.

   - Fixed the frame pointer logic to the x86_64 mcount trampolines

   - Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
     That is, the ftrace trampoline can be dynamically allocated and be
     called directly by functions that only have a single hook to them"

* tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits)
  tracing: Truncated output is better than nothing
  tracing: Add additional marks to signal very large time deltas
  Documentation: describe trace_buf_size parameter more accurately
  tracing: Allow NOT to filter AND and OR clauses
  tracing: Add NOT to filtering logic
  ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter
  ftrace/x86: Get rid of ftrace_caller_setup
  ftrace/x86: Have save_mcount_regs macro also save stack frames if needed
  ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs
  ftrace/x86: Simplify save_mcount_regs on getting RIP
  ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter
  ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments
  ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file
  ftrace/x86: Have static tracing also use ftrace_caller_setup
  ftrace/x86: Have static function tracing always test for function graph
  kprobes: Add IPMODIFY flag to kprobe_ftrace_ops
  ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
  kprobes/ftrace: Recover original IP if pre_handler doesn't change it
  tracing/trivial: Fix typos and make an int into a bool
  tracing: Deletion of an unnecessary check before iput()
  ...
2014-12-10 19:58:13 -08:00
Linus Torvalds
b6da0076ba Merge branch 'akpm' (patchbomb from Andrew)
Merge first patchbomb from Andrew Morton:
 - a few minor cifs fixes
 - dma-debug upadtes
 - ocfs2
 - slab
 - about half of MM
 - procfs
 - kernel/exit.c
 - panic.c tweaks
 - printk upates
 - lib/ updates
 - checkpatch updates
 - fs/binfmt updates
 - the drivers/rtc tree
 - nilfs
 - kmod fixes
 - more kernel/exit.c
 - various other misc tweaks and fixes

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
  exit: pidns: fix/update the comments in zap_pid_ns_processes()
  exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
  exit: exit_notify: re-use "dead" list to autoreap current
  exit: reparent: call forget_original_parent() under tasklist_lock
  exit: reparent: avoid find_new_reaper() if no children
  exit: reparent: introduce find_alive_thread()
  exit: reparent: introduce find_child_reaper()
  exit: reparent: document the ->has_child_subreaper checks
  exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper()
  exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting
  exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting
  exit: proc: don't try to flush /proc/tgid/task/tgid
  exit: release_task: fix the comment about group leader accounting
  exit: wait: drop tasklist_lock before psig->c* accounting
  exit: wait: don't use zombie->real_parent
  exit: wait: cleanup the ptrace_reparented() checks
  usermodehelper: kill the kmod_thread_locker logic
  usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper()
  fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp
  nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
  ...
2014-12-10 18:34:42 -08:00
Kirill A. Shutemov
c164e038ee mm: fix huge zero page accounting in smaps report
As a small zero page, huge zero page should not be accounted in smaps
report as normal page.

For small pages we rely on vm_normal_page() to filter out zero page, but
vm_normal_page() is not designed to handle pmds.  We only get here due
hackish cast pmd to pte in smaps_pte_range() -- pte and pmd format is not
necessary compatible on each and every architecture.

Let's add separate codepath to handle pmds.  follow_trans_huge_pmd() will
detect huge zero page for us.

We would need pmd_dirty() helper to do this properly.  The patch adds it
to THP-enabled architectures which don't yet have one.

[akpm@linux-foundation.org: use do_div to fix 32-bit build]
Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengwei Yin <yfw.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:08 -08:00
Linus Torvalds
3a5dc1fafb Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Reload microcode when resuming and the case when only the early
     loader has been utilized.  (Borislav Petkov)

   - Also, do not load the driver on paravirt guests.  (Boris
     Ostrovsky)"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/intel: Fish out the stashed microcode for the BSP
  x86, microcode: Reload microcode on resume
  x86, microcode: Don't initialize microcode code on paravirt
  x86, microcode, intel: Drop unused parameter
  x86, microcode, AMD: Do not use smp_processor_id() in preemtible context
2014-12-10 15:01:43 -08:00
Linus Torvalds
3100e448e7 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso updates from Ingo Molnar:
 "Various vDSO updates from Andy Lutomirski, mostly cleanups and
  reorganization to improve maintainability, but also some
  micro-optimizations and robustization changes"

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64/vsyscall: Restore orig_ax after vsyscall seccomp
  x86_64: Add a comment explaining the TASK_SIZE_MAX guard page
  x86_64,vsyscall: Make vsyscall emulation configurable
  x86_64, vsyscall: Rewrite comment and clean up headers in vsyscall code
  x86_64, vsyscall: Turn vsyscalls all the way off when vsyscall==none
  x86,vdso: Use LSL unconditionally for vgetcpu
  x86: vdso: Fix build with older gcc
  x86_64/vdso: Clean up vgetcpu init and merge the vdso initcalls
  x86_64/vdso: Remove jiffies from the vvar page
  x86/vdso: Make the PER_CPU segment 32 bits
  x86/vdso: Make the PER_CPU segment start out accessed
  x86/vdso: Change the PER_CPU segment to use struct desc_struct
  x86_64/vdso: Move getcpu code from vsyscall_64.c to vdso/vma.c
  x86_64/vsyscall: Move all of the gate_area code to vsyscall_64.c
2014-12-10 14:24:20 -08:00
Linus Torvalds
c9f861c772 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "The biggest change in this cycle is better support for UCNA
  (UnCorrected No Action) events:

    "Handle all uncorrected error reports in the same way (soft
     offline the page). We used to only do that for SRAO
     (software recoverable action optional) machine checks, but
     it makes sense to also do it for UCNA (UnCorrected No
     Action) logs found by CMCI or polling."

  plus various x86 MCE handling updates and fixes"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Spell "panicked" correctly
  x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll
  x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
  x86, MCE, AMD: Assign interrupt handler only when bank supports it
  x86, MCE, AMD: Drop software-defined bank in error thresholding
  x86, MCE, AMD: Move invariant code out from loop body
  x86, MCE, AMD: Correct thresholding error logging
  x86, MCE, AMD: Use macros to compute bank MSRs
  RAS, HWPOISON: Fix wrong error recovery status
  GHES: Make ghes_estatus_caches static
  APEI, GHES: Cleanup unnecessary function for lockless list
2014-12-10 14:20:10 -08:00
Linus Torvalds
a023748d53 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm tree changes from Ingo Molnar:
 "The biggest change is full PAT support from Jürgen Gross:

     The x86 architecture offers via the PAT (Page Attribute Table) a
     way to specify different caching modes in page table entries.  The
     PAT MSR contains 8 entries each specifying one of 6 possible cache
     modes.  A pte references one of those entries via 3 bits:
     _PAGE_PAT, _PAGE_PWT and _PAGE_PCD.

     The Linux kernel currently supports only 4 different cache modes.
     The PAT MSR is set up in a way that the setting of _PAGE_PAT in a
     pte doesn't matter: the top 4 entries in the PAT MSR are the same
     as the 4 lower entries.

     This results in the kernel not supporting e.g. write-through mode.
     Especially this cache mode would speed up drivers of video cards
     which now have to use uncached accesses.

     OTOH some old processors (Pentium) don't support PAT correctly and
     the Xen hypervisor has been using a different PAT MSR configuration
     for some time now and can't change that as this setting is part of
     the ABI.

     This patch set abstracts the cache mode from the pte and introduces
     tables to translate between cache mode and pte bits (the default
     cache mode "write back" is hard-wired to PAT entry 0).  The tables
     are statically initialized with values being compatible to old
     processors and current usage.  As soon as the PAT MSR is changed
     (or - in case of Xen - is read at boot time) the tables are changed
     accordingly.  Requests of mappings with special cache modes are
     always possible now, in case they are not supported there will be a
     fallback to a compatible but slower mode.

     Summing it up, this patch set adds the following features:

      - capability to support WT and WP cache modes on processors with
        full PAT support

      - processors with no or uncorrect PAT support are still working as
        today, even if WT or WP cache mode are selected by drivers for
        some pages

      - reduction of Xen special handling regarding cache mode

  Another change is a boot speedup on ridiculously large RAM systems,
  plus other smaller fixes"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  x86: mm: Move PAT only functions to mm/pat.c
  xen: Support Xen pv-domains using PAT
  x86: Enable PAT to use cache mode translation tables
  x86: Respect PAT bit when copying pte values between large and normal pages
  x86: Support PAT bit in pagetable dump for lower levels
  x86: Clean up pgtable_types.h
  x86: Use new cache mode type in memtype related functions
  x86: Use new cache mode type in mm/ioremap.c
  x86: Use new cache mode type in setting page attributes
  x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
  x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
  x86: Use new cache mode type in mm/iomap_32.c
  x86: Use new cache mode type in asm/pgtable.h
  x86: Use new cache mode type in arch/x86/mm/init_64.c
  x86: Use new cache mode type in arch/x86/pci
  x86: Use new cache mode type in drivers/video/fbdev/vermilion
  x86: Use new cache mode type in drivers/video/fbdev/gbefb.c
  x86: Use new cache mode type in include/asm/fb.h
  x86: Make page cache mode a real type
  x86: mm: Use 2GB memory block size on large-memory x86-64 systems
  ...
2014-12-10 13:59:34 -08:00
Linus Torvalds
773fed910d Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform changes from Ingo Molnar:
 "A handful of numachip APIC driver updates/fixes, and two small SGI/UV
  fixes"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: numachip: APIC driver cleanups
  x86: numachip: Elide self-IPI ICR polling
  x86: numachip: Fix 16-bit APIC ID truncation

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: UV BAU: Increase maximum CPUs per socket/hub
  x86: UV BAU: Avoid NULL pointer reference in ptc_seq_show
2014-12-10 13:40:11 -08:00
Linus Torvalds
8139548136 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "Changes in this cycle are:

   - support module unload for efivarfs (Mathias Krause)

   - another attempt at moving x86 to libstub taking advantage of the
     __pure attribute (Ard Biesheuvel)

   - add EFI runtime services section to ptdump (Mathias Krause)"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, ptdump: Add section for EFI runtime services
  efi/x86: Move x86 back to libstub
  efivarfs: Allow unloading when build as module
2014-12-10 12:42:16 -08:00
Daniel Borkmann
0cb6c969ed net, lib: kill arch_fast_hash library bits
As there are now no remaining users of arch_fast_hash(), lets kill
it entirely.

This basically reverts commit 71ae8aac3e ("lib: introduce arch
optimized hash library") and follow-up work, that is f.e., commit
237217546d ("lib: hash: follow-up fixups for arch hash"),
commit e3fec2f74f ("lib: Add missing arch generic-y entries for
asm-generic/hash.h") and last but not least commit 6a02652df5
("perf tools: Fix include for non x86 architectures").

Cc: Francesco Fusco <fusco@ntop.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:17:46 -05:00
Linus Torvalds
b6444bd0a1 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot and percpu updates from Ingo Molnar:
 "This tree contains a bootable images documentation update plus three
  slightly misplaced x86/asm percpu changes/optimizations"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-64: Use RIP-relative addressing for most per-CPU accesses
  x86-64: Handle PC-relative relocations on per-CPU data
  x86: Convert a few more per-CPU items to read-mostly ones
  x86, boot: Document intermediates more clearly
2014-12-10 12:10:24 -08:00
Linus Torvalds
9d0cf6f564 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "Misc changes:

   - context switch micro-optimization
   - debug printout micro-optimization
   - comment enhancements and typo fix"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Replace seq_printf() with seq_puts()
  x86/asm: Fix typo in arch/x86/kernel/asm_offset_64.c
  sched/x86: Add a comment clarifying LDT context switching
  sched/x86_64: Don't save flags on context switch
2014-12-10 12:09:26 -08:00
Linus Torvalds
3eb5b893eb Merge branch 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MPX support from Thomas Gleixner:
 "This enables support for x86 MPX.

  MPX is a new debug feature for bound checking in user space.  It
  requires kernel support to handle the bound tables and decode the
  bound violating instruction in the trap handler"

* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  asm-generic: Remove asm-generic arch_bprm_mm_init()
  mm: Make arch_unmap()/bprm_mm_init() available to all architectures
  x86: Cleanly separate use of asm-generic/mm_hooks.h
  x86 mpx: Change return type of get_reg_offset()
  fs: Do not include mpx.h in exec.c
  x86, mpx: Add documentation on Intel MPX
  x86, mpx: Cleanup unused bound tables
  x86, mpx: On-demand kernel allocation of bounds tables
  x86, mpx: Decode MPX instruction to get bound violation information
  x86, mpx: Add MPX-specific mmap interface
  x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: Add MPX to disabled features
  ia64: Sync struct siginfo with general version
  mips: Sync struct siginfo with general version
  mpx: Extend siginfo structure to include bound violation information
  x86, mpx: Rename cfg_reg_u and status_reg
  x86: mpx: Give bndX registers actual names
  x86: Remove arbitrary instruction size limit in instruction decoder
2014-12-10 09:34:43 -08:00
Linus Torvalds
9e66645d72 Merge branch 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq domain updates from Thomas Gleixner:
 "The real interesting irq updates:

   - Support for hierarchical irq domains:

     For complex interrupt routing scenarios where more than one
     interrupt related chip is involved we had no proper representation
     in the generic interrupt infrastructure so far.  That made people
     implement rather ugly constructs in their nested irq chip
     implementations.  The main offenders are x86 and arm/gic.

     To distangle that mess we have now hierarchical irqdomains which
     seperate the various interrupt chips and connect them via the
     hierarchical domains.  That keeps the domain specific details
     internal to the particular hierarchy level and removes the
     criss/cross referencing of chip internals.  The resulting hierarchy
     for a complex x86 system will look like this:

        vector          mapped: 74
          msi-0         mapped: 2
          dmar-ir-1     mapped: 69
            ioapic-1    mapped: 4
            ioapic-0    mapped: 20
            pci-msi-2   mapped: 45
          dmar-ir-0     mapped: 3
            ioapic-2    mapped: 1
            pci-msi-1   mapped: 2
          htirq         mapped: 0

     Neither ioapic nor pci-msi know about the dmar interrupt remapping
     between themself and the vector domain.  If interrupt remapping is
     disabled ioapic and pci-msi become direct childs of the vector
     domain.

     In hindsight we should have done that years ago, but in hindsight
     we always know better :)

   - Support for generic MSI interrupt domain handling

     We have more and more non PCI related MSI interrupts, so providing
     a generic infrastructure for this is better than having all
     affected architectures implementing their own private hacks.

   - Support for PCI-MSI interrupt domain handling, based on the generic
     MSI support.

     This part carries the pci/msi branch from Bjorn Helgaas pci tree to
     avoid a massive conflict.  The PCI/MSI parts are acked by Bjorn.

  I have two more branches on top of this.  The full conversion of x86
  to hierarchical domains and a partial conversion of arm/gic"

* 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  genirq: Move irq_chip_write_msi_msg() helper to core
  PCI/MSI: Allow an msi_controller to be associated to an irq domain
  PCI/MSI: Provide mechanism to alloc/free MSI/MSIX interrupt from irqdomain
  PCI/MSI: Enhance core to support hierarchy irqdomain
  PCI/MSI: Move cached entry functions to irq core
  genirq: Provide default callbacks for msi_domain_ops
  genirq: Introduce msi_domain_alloc/free_irqs()
  asm-generic: Add msi.h
  genirq: Add generic msi irq domain support
  genirq: Introduce callback irq_chip.irq_write_msi_msg
  genirq: Work around __irq_set_handler vs stacked domains ordering issues
  irqdomain: Introduce helper function irq_domain_add_hierarchy()
  irqdomain: Implement a method to automatically call parent domains alloc/free
  genirq: Introduce helper irq_domain_set_info() to reduce duplicated code
  genirq: Split out flow handler typedefs into seperate header file
  genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip
  genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip
  genirq: Add more helper functions to support stacked irq_chip
  genirq: Introduce helper functions to support stacked irq_chip
  irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF
  ...
2014-12-10 09:01:01 -08:00
Linus Torvalds
86c6a2fddf Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.

     This instruments might_sleep() checks to catch places that nest
     blocking primitives - such as mutex usage in a wait loop.  Such
     bugs can result in hard to debug races/hangs.

     Another category of invalid nesting that this facility will detect
     is the calling of blocking functions from within schedule() ->
     sched_submit_work() -> blk_schedule_flush_plug().

     There's some potential for false positives (if secondary blocking
     primitives themselves are not ready yet for this facility), but the
     kernel will warn once about such bugs per bootup, so the warning
     isn't much of a nuisance.

     This feature comes with a number of fixes, for problems uncovered
     with it, so no messages are expected normally.

   - Another round of sched/numa optimizations and refinements, for
     CONFIG_NUMA_BALANCING=y.

   - Another round of sched/dl fixes and refinements.

  Plus various smaller fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched: Add missing rcu protection to wake_up_all_idle_cpus
  sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
  sched/numa: Init numa balancing fields of init_task
  sched/deadline: Remove unnecessary definitions in cpudeadline.h
  sched/cpupri: Remove unnecessary definitions in cpupri.h
  sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
  sched/fair: Fix stale overloaded status in the busiest group finding logic
  sched: Move p->nr_cpus_allowed check to select_task_rq()
  sched/completion: Document when to use wait_for_completion_io_*()
  sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
  sched/fair: Kill task_struct::numa_entry and numa_group::task_list
  sched: Refactor task_struct to use numa_faults instead of numa_* pointers
  sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
  sched/deadline: Reschedule from switched_from_dl() after a successful pull
  sched/deadline: Push task away if the deadline is equal to curr during wakeup
  sched/deadline: Add deadline rq status print
  sched/deadline: Fix artificial overrun introduced by yield_task_dl()
  sched/rt: Clean up check_preempt_equal_prio()
  sched/core: Use dl_bw_of() under rcu_read_lock_sched()
  sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
  ...
2014-12-09 21:21:34 -08:00
Linus Torvalds
5706ffd045 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events update from Ingo Molnar:
 "On the kernel side there's few changes, the one that stands out is
  PEBS machine state sampling support on x86, by Stephane Eranian.

  On the tooling side:

  User visible tooling changes:

   - Don't open the DWARF info multiple times, keeping instead a dwfl
     handle in struct dso, greatly speeding up 'perf report' on powerpc.
     (Sukadev Bhattiprolu)

   - Introduce PARSE_OPT_DISABLED option flag and use it to avoid
     showing undersired options in tools that provides frontends to
     'perf record', like sched, kvm, etc (Namhyung Kim)

   - Fallback to kallsyms when using the minimal 'ELF' loader (Arnaldo
     Carvalho de Melo)

   - Fix annotation with kcore (Adrian Hunter)

   - Support source line numbers in annotate using a hotkey (Andi Kleen)

   - Callchain improvements including:
     * Enable printing the srcline in the history
     * Make get_srcline fall back to sym+offset (Andi Kleen)

   - TUI hist_entry browser fixes, including showing missing overhead
     value for first level callchain.  Detected comparing the output of
     --stdio/--gui (that matched) with --tui, that had this problem.
     (Namhyung Kim)

   - Support handling complete branch stacks as histograms (Andi Kleen)

  Tooling infrastructure changes:

   - Prep work for supporting per-pkg and snapshot counters in 'perf
     stat' (Jiri Olsa)

   - 'perf stat' refactorings, moving stuff from it to evsel.c to use in
     per-pkg/snapshot format changes (Jiri Olsa)

   - Add per-pkg format file parsing (Matt Fleming)

   - Clean up libelf feature support code (Namhyung Kim)

   - Add gzip decompression support for kernel modules (Namhyung Kim)

   - More prep patches for Intel PT, including a a thread stack and more
     stuff made available via the database export mechanism (Adrian
     Hunter)

   - More Intel PT work, including a facility to export sample data
     (comms, threads, symbol names, etc) in a database friendly way,
     with an script to use this to create a postgresql database.
     (Adrian Hunter)

   - Make sure that thread->mg->machine points to the machine where the
     thread exists (it was being set only for the kmaps kernel modules
     case, do it as well for the mmaps) and use it to shorten function
     signatures (Arnaldo Carvalho de Melo)

  ... and lots of other fixes and smaller improvements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits)
  perf report: In branch stack mode use address history sorting
  perf report: Add --branch-history option
  perf callchain: Support handling complete branch stacks as histograms
  perf stat: Add support for snapshot counters
  perf stat: Add support for per-pkg counters
  perf tools: Remove perf_evsel__read interface
  perf stat: Use read_counter in read_counter_aggr
  perf stat: Make read_counter work over the thread dimension
  perf stat: Use perf_evsel__read_cb in read_counter
  perf tools: Add snapshot format file parsing
  perf tools: Add per-pkg format file parsing
  perf evsel: Introduce perf_evsel__read_cb function
  perf evsel: Introduce perf_counts_values__scale function
  perf evsel: Introduce perf_evsel__compute_deltas function
  perf tools: Allow to force redirect pr_debug to stderr.
  perf tools: Fix segfault due to invalid kernel dso access
  perf callchain: Make get_srcline fall back to sym+offset
  perf symbols: Move bfd_demangle stubbing to its only user
  perf callchain: Enable printing the srcline in the history
  perf tools: Collapse first level callchain entry if it has sibling
  ...
2014-12-09 20:55:37 -08:00
Linus Torvalds
9c37f95936 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking tree changes from Ingo Molnar:
 "Two changes: a documentation update and a ticket locks live lock fix"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ticketlock: Fix spin_unlock_wait() livelock
  locking/lglocks: Add documentation of current lglocks implementation
2014-12-09 19:59:22 -08:00
Linus Torvalds
a0e4467726 asm-generic: asm/io.h rewrite
While there normally is no reason to have a pull request for asm-generic
 but have all changes get merged through whichever tree needs them, I do
 have a series for 3.19. There are two sets of patches that change
 significant portions of asm/io.h, and this branch contains both in order
 to resolve the conflicts:
 
 - Will Deacon has done a set of patches to ensure that all architectures
   define {read,write}{b,w,l,q}_relaxed() functions or get them by
   including asm-generic/io.h. These functions are commonly used on ARM
   specific drivers to avoid expensive L2 cache synchronization implied by
   the normal {read,write}{b,w,l,q}, but we need to define them on all
   architectures in order to share the drivers across architectures and
   to enable CONFIG_COMPILE_TEST configurations for them
 
 - Thierry Reding has done an unrelated set of patches that extends
   the asm-generic/io.h file to the degree necessary to make it useful
   on ARM64 and potentially other architectures.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAVIdwNmCrR//JCVInAQJWuw/9FHt2ThMnI1J1Jqy4CVwtyjWTSa6Y/uVj
 xSytS7AOvmU/nw1quSoba5mN9fcUQUtK9kqjqNcq71WsQcDE6BF9SFpi9cWtjWcI
 ZfWsC+5kqry/mbnuHefENipem9RqBrLbOBJ3LARf5M8rZJuTz1KbdZs9r9+1QsCX
 ou8jeqVvNKUn9J1WyekJBFSrPOtZ4bCUpeyh23JHRfPtJeAHNOuPuymj6WceAz98
 uMV1icRaCBMySsf9HgsHRYW5HwuCm3MrrYj6ukyPpgxYz7FRq4hJLDs6GnlFtAGb
 71g87NpFdB32qbW+y1ntfYaJyUryMHMVHBWcV5H9m0btdHTRHYZjoOGOPuyLHHO8
 +l4/FaOQhnDL8cNDj0HKfhdlyaFylcWgs1wzj68nv31c1dGjcJcQiyCDwry9mJhr
 erh4EewcerUvWzbBMQ4JP1f8syKMsKwbo1bVU61a1RQJxEqVCzJMLweGSOFmqMX2
 6E4ZJVWv81UFLoFTzYx+7+M45K4NWywKNQdzwKmqKHc4OQyvq4ALJI0A7SGFJdDR
 HJ7VqDiLaSdBitgJcJUxNzKcyXij6wE9jE1fBe3YDFE4LrnZXFVLN+MX6hs7AIFJ
 vJM1UpxRxQUMGIH2m7rbDNazOAsvQGxINOjNor23cNLuf6qLY1LrpHVPQDAfJVvA
 6tROM77bwIQ=
 =xUv6
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic asm/io.h rewrite from Arnd Bergmann:
 "While there normally is no reason to have a pull request for
  asm-generic but have all changes get merged through whichever tree
  needs them, I do have a series for 3.19.

  There are two sets of patches that change significant portions of
  asm/io.h, and this branch contains both in order to resolve the
  conflicts:

   - Will Deacon has done a set of patches to ensure that all
     architectures define {read,write}{b,w,l,q}_relaxed() functions or
     get them by including asm-generic/io.h.

     These functions are commonly used on ARM specific drivers to avoid
     expensive L2 cache synchronization implied by the normal
     {read,write}{b,w,l,q}, but we need to define them on all
     architectures in order to share the drivers across architectures
     and to enable CONFIG_COMPILE_TEST configurations for them

   - Thierry Reding has done an unrelated set of patches that extends
     the asm-generic/io.h file to the degree necessary to make it useful
     on ARM64 and potentially other architectures"

* tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (29 commits)
  ARM64: use GENERIC_PCI_IOMAP
  sparc: io: remove duplicate relaxed accessors on sparc32
  ARM: sa11x0: Use void __iomem * in MMIO accessors
  arm64: Use include/asm-generic/io.h
  ARM: Use include/asm-generic/io.h
  asm-generic/io.h: Implement generic {read,write}s*()
  asm-generic/io.h: Reconcile I/O accessor overrides
  /dev/mem: Use more consistent data types
  Change xlate_dev_{kmem,mem}_ptr() prototypes
  ARM: ixp4xx: Properly override I/O accessors
  ARM: ixp4xx: Fix build with IXP4XX_INDIRECT_PCI
  ARM: ebsa110: Properly override I/O accessors
  ARC: Remove redundant PCI_IOBASE declaration
  documentation: memory-barriers: clarify relaxed io accessor semantics
  x86: io: implement dummy relaxed accessor macros for writes
  tile: io: implement dummy relaxed accessor macros for writes
  sparc: io: implement dummy relaxed accessor macros for writes
  powerpc: io: implement dummy relaxed accessor macros for writes
  parisc: io: implement dummy relaxed accessor macros for writes
  mn10300: io: implement dummy relaxed accessor macros for writes
  ...
2014-12-09 17:25:00 -08:00
Mark Brown
addaeea9ee Merge remote-tracking branches 'asoc/topic/hdmi', 'asoc/topic/intel', 'asoc/topic/jack', 'asoc/topic/jz4740' and 'asoc/topic/lm49453' into asoc-next 2014-12-08 13:12:00 +00:00
Juergen Gross
90fff3ea15 xen: introduce helper functions to do safe read and write accesses
Introduce two helper functions to safely read and write unsigned long
values from or to memory when the access may fault because the mapping
is non-present or read-only.

These helpers can be used instead of open coded uses of __get_user()
and __put_user() avoiding the need to do casts to fix sparse warnings.

Use the helpers in page.h and p2m.c. This will fix the sparse
warnings when doing "make C=1".

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-08 10:53:59 +00:00
Ingo Molnar
2a2662bf88 Merge branch 'perf/core-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into perf/hw_breakpoints
Pull AMD range breakpoints support from Frederic Weisbecker:

" - Extend breakpoint tools and core to support address range through perf
    event with initial backend support for AMD extended breakpoints.

    Syntax is:

           perf record -e mem:addr/len:type

    For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512)

           perf record -e mem:0x1000/512:w

 - Clean up a bit breakpoint code validation

 It has been acked by Jiri and Oleg. "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-08 11:50:24 +01:00
Oleg Nesterov
78bff1c868 x86/ticketlock: Fix spin_unlock_wait() livelock
arch_spin_unlock_wait() looks very suboptimal, to the point I
think this is just wrong and can lead to livelock: if the lock
is heavily contended we can never see head == tail.

But we do not need to wait for arch_spin_is_locked() == F. If it
is locked we only need to wait until the current owner drops
this lock. So we could simply spin until old_head !=
lock->tickets.head in this case, but .head can overflow and thus
we can't check "unlocked" only once before the main loop.

Also, the "unlocked" check can ignore TICKET_SLOWPATH_FLAG bit.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Paul E.McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <Waiman.Long@hp.com>
Link: http://lkml.kernel.org/r/20141201213417.GA5842@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-08 11:36:44 +01:00
Borislav Petkov
fbae4ba8c4 x86, microcode: Reload microcode on resume
Normally, we do reapply microcode on resume. However, in the cases where
that microcode comes from the early loader and the late loader hasn't
been utilized yet, there's no easy way for us to go and apply the patch
applied during boot by the early loader.

Thus, reuse the patch stashed by the early loader for the BSP.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-06 13:03:03 +01:00
Wanpeng Li
203000993d kvm: vmx: add MSR logic for XSAVES
Add logic to get/set the XSS model-specific register.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:39 +01:00
Wanpeng Li
f53cd63c2d kvm: x86: handle XSAVES vmcs and vmexit
Initialize the XSS exit bitmap.  It is zero so there should be no XSAVES
or XRSTORS exits.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:33 +01:00
Wanpeng Li
55412b2eda kvm: x86: Add kvm_x86_ops hook that enables XSAVES for guest
Expose the XSAVES feature to the guest if the kvm_x86_ops say it is
available.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:16 +01:00
Juergen Gross
054954eb05 xen: switch to linear virtual mapped sparse p2m list
At start of the day the Xen hypervisor presents a contiguous mfn list
to a pv-domain. In order to support sparse memory this mfn list is
accessed via a three level p2m tree built early in the boot process.
Whenever the system needs the mfn associated with a pfn this tree is
used to find the mfn.

Instead of using a software walked tree for accessing a specific mfn
list entry this patch is creating a virtual address area for the
entire possible mfn list including memory holes. The holes are
covered by mapping a pre-defined  page consisting only of "invalid
mfn" entries. Access to a mfn entry is possible by just using the
virtual base address of the mfn list and the pfn as index into that
list. This speeds up the (hot) path of determining the mfn of a
pfn.

Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
showed following improvements:

Elapsed time: 32:50 ->  32:35
System:       18:07 ->  17:47
User:        104:00 -> 103:30

Tested with following configurations:
- 64 bit dom0, 8GB RAM
- 64 bit dom0, 128 GB RAM, PCI-area above 4 GB
- 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without
                                    the patch)
- 32 bit domU, ballooning up and down
- 32 bit domU, save and restore
- 32 bit domU with PCI passthrough
- 64 bit domU, 8 GB, 2049 MB, 5000 MB
- 64 bit domU, ballooning up and down
- 64 bit domU, save and restore
- 64 bit domU with PCI passthrough

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:09:15 +00:00
Juergen Gross
0aad568983 xen: Hide get_phys_to_machine() to be able to tune common path
Today get_phys_to_machine() is always called when the mfn for a pfn
is to be obtained. Add a wrapper __pfn_to_mfn() as inline function
to be able to avoid calling get_phys_to_machine() when possible as
soon as the switch to a linear mapped p2m list has been done.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:09:09 +00:00
Juergen Gross
792230c3a6 x86: Introduce function to get pmd entry pointer
Introduces lookup_pmd_address() to get the address of the pmd entry
related to a virtual address in the current address space. This
function is needed for support of a virtual mapped sparse p2m list
in xen pv domains, as we need the address of the pmd entry, not the
one of the pte in that case.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:09:04 +00:00
Juergen Gross
5b8e7d8054 xen: Delay invalidating extra memory
When the physical memory configuration is initialized the p2m entries
for not pouplated memory pages are set to "invalid". As those pages
are beyond the hypervisor built p2m list the p2m tree has to be
extended.

This patch delays processing the extra memory related p2m entries
during the boot process until some more basic memory management
functions are callable. This removes the need to create new p2m
entries until virtual memory management is available.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:08:59 +00:00
Juergen Gross
1f3ac86b4c xen: Delay remapping memory of pv-domain
Early in the boot process the memory layout of a pv-domain is changed
to match the E820 map (either the host one for Dom0 or the Xen one)
regarding placement of RAM and PCI holes. This requires removing memory
pages initially located at positions not suitable for RAM and adding
them later at higher addresses where no restrictions apply.

To be able to operate on the hypervisor supported p2m list until a
virtual mapped linear p2m list can be constructed, remapping must
be delayed until virtual memory management is initialized, as the
initial p2m list can't be extended unlimited at physical memory
initialization time due to it's fixed structure.

A further advantage is the reduction in complexity and code volume as
we don't have to be careful regarding memory restrictions during p2m
updates.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:08:48 +00:00
Juergen Gross
820c4db2be xen: Make functions static
Some functions in arch/x86/xen/p2m.c are used locally only. Make them
static. Rearrange the functions in p2m.c to avoid forward declarations.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 14:08:37 +00:00
Boris Ostrovsky
14520c92cb xen/pci: Use APIC directly when APIC virtualization hardware is available
When hardware supports APIC/x2APIC virtualization we don't need to use
pirqs for MSI handling and instead use APIC since most APIC accesses
(MMIO or MSR) will now be processed without VMEXITs.

As an example, netperf on the original code produces this profile
(collected wih 'xentrace -e 0x0008ffff -T 5'):

    342 cpu_change
    260 CPUID
  34638 HLT
  64067 INJ_VIRQ
  28374 INTR
  82733 INTR_WINDOW
     10 NPF
  24337 TRAP
 370610 vlapic_accept_pic_intr
 307528 VMENTRY
 307527 VMEXIT
 140998 VMMCALL
    127 wrap_buffer

After applying this patch the same test shows

    230 cpu_change
    260 CPUID
  36542 HLT
    174 INJ_VIRQ
  27250 INTR
    222 INTR_WINDOW
     20 NPF
  24999 TRAP
 381812 vlapic_accept_pic_intr
 166480 VMENTRY
 166479 VMEXIT
  77208 VMMCALL
     81 wrap_buffer

ApacheBench results (ab -n 10000 -c 200) improve by about 10%

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-12-04 13:02:43 +00:00
Stefano Stabellini
a4dba13089 xen/arm/arm64: introduce xen_arch_need_swiotlb
Introduce an arch specific function to find out whether a particular dma
mapping operation needs to bounce on the swiotlb buffer.

On ARM and ARM64, if the page involved is a foreign page and the device
is not coherent, we need to bounce because at unmap time we cannot
execute any required cache maintenance operations (we don't know how to
find the pfn from the mfn).

No change of behaviour for x86.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-12-04 12:41:54 +00:00
Stefano Stabellini
a0f2dee0cd xen: add a dma_addr_t dev_addr argument to xen_dma_map_page
dev_addr is the machine address of the page.

The new parameter can be used by the ARM and ARM64 implementations of
xen_dma_map_page to find out if the page is a local page (pfn == mfn) or
a foreign page (pfn != mfn).

dev_addr could be retrieved again from the physical address, using
pfn_to_mfn, but it requires accessing an rbtree. Since we already have
the dev_addr in our hands at the call site there is no need to get the
mfn twice.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-12-04 12:41:51 +00:00
Jacob Shin
d6d55f0b9d perf/x86/amd: AMD support for bp_len > HW_BREAKPOINT_LEN_8
Implement hardware breakpoint address mask for AMD Family 16h and
above processors. CPUID feature bit indicates hardware support for
DRn_ADDR_MASK MSRs. These masks further qualify DRn/DR7 hardware
breakpoint addresses to allow matching of larger addresses ranges.

Valuable advice and pseudo code from Oleg Nesterov <oleg@redhat.com>

Signed-off-by: Jacob Shin <jacob.w.shin@gmail.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xiakaixu <xiakaixu@huawei.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-12-03 15:14:26 +01:00
Steven Rostedt (Red Hat)
4bcdf1522f ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file
Linus pointed out that MCOUNT_SAVE_FRAME is used in only a single file
and that there's no reason that it should be in a header file.
Move the macro to the code that uses it.

Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-01 14:07:16 -05:00
Borislav Petkov
2ef84b3bb9 x86, microcode, AMD: Do not use smp_processor_id() in preemtible context
Hand down the cpu number instead, otherwise lockdep screams when doing

echo 1 > /sys/devices/system/cpu/microcode/reload.

BUG: using smp_processor_id() in preemptible [00000000] code: amd64-microcode/2470
caller is debug_smp_processor_id+0x12/0x20
CPU: 1 PID: 2470 Comm: amd64-microcode Not tainted 3.18.0-rc6+ #26
...

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1417428741-4501-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-01 11:51:05 +01:00
Rafael J. Wysocki
8a497cfdc0 Merge back earlier cpufreq material for 3.19-rc1. 2014-12-01 02:46:24 +01:00
Paolo Bonzini
2b4a273b42 kvm: x86: avoid warning about potential shift wrapping bug
cs.base is declared as a __u64 variable and vector is a u32 so this
causes a static checker warning.  The user indeed can set "sipi_vector"
to any u32 value in kvm_vcpu_ioctl_x86_set_vcpu_events(), but the
value should really have 8-bit precision only.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-24 16:53:50 +01:00
Paolo Bonzini
c9eab58f64 KVM: x86: move device assignment out of kvm_host.h
Create a new header, and hide the device assignment functions there.
Move struct kvm_assigned_dev_kernel to assigned-dev.c by modifying
arch/x86/kvm/iommu.c to take a PCI device struct.

Based on a patch by Radim Krcmar <rkrcmark@redhat.com>.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-24 16:53:50 +01:00
Andy Lutomirski
82975bc6a6 uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME
x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but
not on non-paranoid returns.  I suspect that this is a mistake and that
the code only works because int3 is paranoid.

Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
for the x86 bug.  With that bug fixed, we can remove _TIF_NOTIFY_RESUME
from the uprobes code.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-23 14:25:28 -08:00
Andy Lutomirski
6f442be2fb x86_64, traps: Stop using IST for #SS
On a 32-bit kernel, this has no effect, since there are no IST stacks.

On a 64-bit kernel, #SS can only happen in user code, on a failed iret
to user space, a canonical violation on access via RSP or RBP, or a
genuine stack segment violation in 32-bit kernel code.  The first two
cases don't need IST, and the latter two cases are unlikely fatal bugs,
and promoting them to double faults would be fine.

This fixes a bug in which the espfix64 code mishandles a stack segment
violation.

This saves 4k of memory per CPU and a tiny bit of code.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-23 13:56:19 -08:00
Radim Krčmář
c274e03af7 kvm: x86: move assigned-dev.c and iommu.c to arch/x86/
Now that ia64 is gone, we can hide deprecated device assignment in x86.

Notable changes:
 - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl()

The easy parts were removed from generic kvm code, remaining
 - kvm_iommu_(un)map_pages() would require new code to be moved
 - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-23 18:33:36 +01:00
Paolo Bonzini
6ef768fac9 kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/
ia64 does not need them anymore.  Ack notifiers become x86-specific
too.

Suggested-by: Gleb Natapov <gleb@kernel.org>
Reviewed-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-21 18:02:37 +01:00
Chen Yucong
e3480271f5 x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for AMD platform.

This patch extends the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.

In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.

Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-11-19 10:55:43 -08:00
Dave Hansen
a1ea1c032b x86: Cleanly separate use of asm-generic/mm_hooks.h
asm-generic/mm_hooks.h provides some generic fillers for the 90%
of architectures that do not need to hook some mmap-manipulation
functions.  A comment inside says:

> Define generic no-op hooks for arch_dup_mmap and
> arch_exit_mmap, to be included in asm-FOO/mmu_context.h
> for any arch FOO which doesn't need to hook these.

So, does x86 need to hook these?  It depends on CONFIG_PARAVIRT.
We *conditionally* include this generic header if we have
CONFIG_PARAVIRT=n.  That's madness.

With this patch, x86 stops using asm-generic/mmu_hooks.h entirely.
We use our own copies of the functions.  The paravirt code
provides some stubs if it is disabled, and we always call those
stubs in our x86-private versions of arch_exit_mmap() and
arch_dup_mmap().

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141118182349.14567FA5@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-19 11:54:13 +01:00
Rafael J. Wysocki
bd2a0f6754 Merge back cpufreq material for 3.19-rc1. 2014-11-18 01:22:29 +01:00
Dave Hansen
1de4fa14ee x86, mpx: Cleanup unused bound tables
The previous patch allocates bounds tables on-demand.  As noted in
an earlier description, these can add up to *HUGE* amounts of
memory.  This has caused OOMs in practice when running tests.

This patch adds support for freeing bounds tables when they are no
longer in use.

There are two types of mappings in play when unmapping tables:
 1. The mapping with the actual data, which userspace is
    munmap()ing or brk()ing away, etc...
 2. The mapping for the bounds table *backing* the data
    (is tagged with VM_MPX, see the patch "add MPX specific
    mmap interface").

If userspace use the prctl() indroduced earlier in this patchset
to enable the management of bounds tables in kernel, when it
unmaps the first type of mapping with the actual data, the kernel
needs to free the mapping for the bounds table backing the data.
This patch hooks in at the very end of do_unmap() to do so.
We look at the addresses being unmapped and find the bounds
directory entries and tables which cover those addresses.  If
an entire table is unused, we clear associated directory entry
and free the table.

Once we unmap the bounds table, we would have a bounds directory
entry pointing at empty address space. That address space might
now be allocated for some other (random) use, and the MPX
hardware might now try to walk it as if it were a bounds table.
That would be bad.  So any unmapping of an enture bounds table
has to be accompanied by a corresponding write to the bounds
directory entry to invalidate it.  That write to the bounds
directory can fault, which causes the following problem:

Since we are doing the freeing from munmap() (and other paths
like it), we hold mmap_sem for write. If we fault, the page
fault handler will attempt to acquire mmap_sem for read and
we will deadlock.  To avoid the deadlock, we pagefault_disable()
when touching the bounds directory entry and use a
get_user_pages() to resolve the fault.

The unmapping of bounds tables happends under vm_munmap().  We
also (indirectly) call vm_munmap() to _do_ the unmapping of the
bounds tables.  We avoid unbounded recursion by disallowing
freeing of bounds tables *for* bounds tables.  This would not
occur normally, so should not have any practical impact.  Being
strict about it here helps ensure that we do not have an
exploitable stack overflow.

Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151831.E4531C4A@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:54 +01:00
Dave Hansen
fe3d197f84 x86, mpx: On-demand kernel allocation of bounds tables
This is really the meat of the MPX patch set.  If there is one patch to
review in the entire series, this is the one.  There is a new ABI here
and this kernel code also interacts with userspace memory in a
relatively unusual manner.  (small FAQ below).

Long Description:

This patch adds two prctl() commands to provide enable or disable the
management of bounds tables in kernel, including on-demand kernel
allocation (See the patch "on-demand kernel allocation of bounds tables")
and cleanup (See the patch "cleanup unused bound tables"). Applications
do not strictly need the kernel to manage bounds tables and we expect
some applications to use MPX without taking advantage of this kernel
support. This means the kernel can not simply infer whether an application
needs bounds table management from the MPX registers.  The prctl() is an
explicit signal from userspace.

PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to
require kernel's help in managing bounds tables.

PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't
want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel
won't allocate and free bounds tables even if the CPU supports MPX.

PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds
directory out of a userspace register (bndcfgu) and then cache it into
a new field (->bd_addr) in  the 'mm_struct'.  PR_MPX_DISABLE_MANAGEMENT
will set "bd_addr" to an invalid address.  Using this scheme, we can
use "bd_addr" to determine whether the management of bounds tables in
kernel is enabled.

Also, the only way to access that bndcfgu register is via an xsaves,
which can be expensive.  Caching "bd_addr" like this also helps reduce
the cost of those xsaves when doing table cleanup at munmap() time.
Unfortunately, we can not apply this optimization to #BR fault time
because we need an xsave to get the value of BNDSTATUS.

==== Why does the hardware even have these Bounds Tables? ====

MPX only has 4 hardware registers for storing bounds information.
If MPX-enabled code needs more than these 4 registers, it needs to
spill them somewhere. It has two special instructions for this
which allow the bounds to be moved between the bounds registers
and some new "bounds tables".

They are similar conceptually to a page fault and will be raised by
the MPX hardware during both bounds violations or when the tables
are not present. This patch handles those #BR exceptions for
not-present tables by carving the space out of the normal processes
address space (essentially calling the new mmap() interface indroduced
earlier in this patch set.) and then pointing the bounds-directory
over to it.

The tables *need* to be accessed and controlled by userspace because
the instructions for moving bounds in and out of them are extremely
frequent. They potentially happen every time a register pointing to
memory is dereferenced. Any direct kernel involvement (like a syscall)
to access the tables would obviously destroy performance.

==== Why not do this in userspace? ====

This patch is obviously doing this allocation in the kernel.
However, MPX does not strictly *require* anything in the kernel.
It can theoretically be done completely from userspace. Here are
a few ways this *could* be done. I don't think any of them are
practical in the real-world, but here they are.

Q: Can virtual space simply be reserved for the bounds tables so
   that we never have to allocate them?
A: As noted earlier, these tables are *HUGE*. An X-GB virtual
   area needs 4*X GB of virtual space, plus 2GB for the bounds
   directory. If we were to preallocate them for the 128TB of
   user virtual address space, we would need to reserve 512TB+2GB,
   which is larger than the entire virtual address space today.
   This means they can not be reserved ahead of time. Also, a
   single process's pre-popualated bounds directory consumes 2GB
   of virtual *AND* physical memory. IOW, it's completely
   infeasible to prepopulate bounds directories.

Q: Can we preallocate bounds table space at the same time memory
   is allocated which might contain pointers that might eventually
   need bounds tables?
A: This would work if we could hook the site of each and every
   memory allocation syscall. This can be done for small,
   constrained applications. But, it isn't practical at a larger
   scale since a given app has no way of controlling how all the
   parts of the app might allocate memory (think libraries). The
   kernel is really the only place to intercept these calls.

Q: Could a bounds fault be handed to userspace and the tables
   allocated there in a signal handler instead of in the kernel?
A: (thanks to tglx) mmap() is not on the list of safe async
   handler functions and even if mmap() would work it still
   requires locking or nasty tricks to keep track of the
   allocation state there.

Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand in
the kernel.

Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Dave Hansen
fcc7ffd679 x86, mpx: Decode MPX instruction to get bound violation information
This patch sets bound violation fields of siginfo struct in #BR
exception handler by decoding the user instruction and constructing
the faulting pointer.

We have to be very careful when decoding these instructions.  They
are completely controlled by userspace and may be changed at any
time up to and including the point where we try to copy them in to
the kernel.  They may or may not be MPX instructions and could be
completely invalid for all we know.

Note: This code is based on Qiaowei Ren's specialized MPX
decoder, but uses the generic decoder whenever possible.  It was
tested for robustness by generating a completely random data
stream and trying to decode that stream.  I also unmapped random
pages inside the stream to test the "partial instruction" short
read code.

We kzalloc() the siginfo instead of stack allocating it because
we need to memset() it anyway, and doing this makes it much more
clear when it got initialized by the MPX instruction decoder.

Changes from the old decoder:
 * Use the generic decoder instead of custom functions.  Saved
   ~70 lines of code overall.
 * Remove insn->addr_bytes code (never used??)
 * Make sure never to possibly overflow the regoff[] array, plus
   check the register range correctly in 32 and 64-bit modes.
 * Allow get_reg() to return an error and have mpx_get_addr_ref()
   handle when it sees errors.
 * Only call insn_get_*() near where we actually use the values
   instead if trying to call them all at once.
 * Handle short reads from copy_from_user() and check the actual
   number of read bytes against what we expect from
   insn_get_length().  If a read stops in the middle of an
   instruction, we error out.
 * Actually check the opcodes intead of ignoring them.
 * Dynamically kzalloc() siginfo_t so we don't leak any stack
   data.
 * Detect and handle decoder failures instead of ignoring them.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151828.5BDD0915@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Qiaowei Ren
57319d80e1 x86, mpx: Add MPX-specific mmap interface
We have chosen to perform the allocation of bounds tables in
kernel (See the patch "on-demand kernel allocation of bounds
tables") and to mark these VMAs with VM_MPX.

However, there is currently no suitable interface to actually do
this.  Existing interfaces, like do_mmap_pgoff(), have no way to
set a modified ->vm_ops or ->vm_flags and don't hold mmap_sem
long enough to let a caller do it.

This patch wraps mmap_region() and hold mmap_sem long enough to
make the modifications to the VMA which we need.

Also note the 32/64-bit #ifdef in the header.  We actually need
to do this at runtime eventually.  But, for now, we don't support
running 32-bit binaries on 64-bit kernels.  Support for this will
come in later patches.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151827.CE440F67@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Dave Hansen
95290cf13e x86, mpx: Add MPX to disabled features
This allows us to use cpu_feature_enabled(X86_FEATURE_MPX) as
both a runtime and compile-time check.

When CONFIG_X86_INTEL_MPX is disabled,
cpu_feature_enabled(X86_FEATURE_MPX) will evaluate at
compile-time to 0. If CONFIG_X86_INTEL_MPX=y, then the cpuid
flag will be checked at runtime.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151823.B358EAD2@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Dave Hansen
62e7759b1b x86, mpx: Rename cfg_reg_u and status_reg
According to Intel SDM extension, MPX configuration and status registers
should be BNDCFGU and BNDSTATUS. This patch renames cfg_reg_u and
status_reg to bndcfgu and bndstatus.

[ tglx: Renamed 'struct bndscr_struct' to 'struct bndscr' ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Link: http://lkml.kernel.org/r/20141114151817.031762AC@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Dave Hansen
c04e051ccc x86: mpx: Give bndX registers actual names
Consider the bndX MPX registers.  There 4 registers each
containing a 64-bit lower and a 64-bit upper bound.  That's 8*64
bits and we declare it thusly:

	struct bndregs_struct {
		u64 bndregs[8];
	}
    
Let's say you want to read the upper bound from the MPX register
bnd2 out of the xsave buf.  You do:

	bndregno = 2;
	upper_bound = xsave_buf->bndregs.bndregs[2*bndregno+1];

That kinda sucks.  Every time you access it, you need to know:
1. Each bndX register is two entries wide in "bndregs"
2. The lower comes first followed by upper.  We do the +1 to get
   upper vs. lower.

This replaces the old definition.  You can now access them
indexed by the register number directly, and with a meaningful
name for the lower and upper bound:

	bndregno = 2;
	xsave_buf->bndreg[bndregno].upper_bound;

It's now *VERY* clear that there are 4 registers.  The programmer
now doesn't have to care what order the lower and upper bounds
are in, and it's harder to get it wrong.

[ tglx: Changed ub/lb to upper_bound/lower_bound and renamed struct
bndreg_struct to struct bndreg ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: "Yu, Fenghua" <fenghua.yu@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141031215820.5EA5E0EC@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:52 +01:00
Dave Hansen
6ba48ff46f x86: Remove arbitrary instruction size limit in instruction decoder
The current x86 instruction decoder steps along through the
instruction stream but always ensures that it never steps farther
than the largest possible instruction size (MAX_INSN_SIZE).

The MPX code is now going to be doing some decoding of userspace
instructions.  We copy those from userspace in to the kernel and
they're obviously completely untrusted coming from userspace.  In
addition to the constraint that instructions can only be so long,
we also have to be aware of how long the buffer is that came in
from userspace.  This _looks_ to be similar to what the perf and
kprobes is doing, but it's unclear to me whether they are
affected.

The whole reason we need this is that it is perfectly valid to be
executing an instruction within MAX_INSN_SIZE bytes of an
unreadable page. We should be able to gracefully handle short
reads in those cases.

This adds support to the decoder to record how long the buffer
being decoded is and to refuse to "validate" the instruction if
we would have gone over the end of the buffer to decode it.

The kprobes code probably needs to be looked at here a bit more
carefully.  This patch still respects the MAX_INSN_SIZE limit
there but the kprobes code does look like it might be able to
be a bit more strict than it currently is.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:52 +01:00
Thomas Gleixner
0dbcae8847 x86: mm: Move PAT only functions to mm/pat.c
Commit e00c8cc93c "x86: Use new cache mode type in memtype related
functions" broke the ARCH=um build.

 arch/x86/include/asm/cacheflush.h:67:36: error: return type is an incomplete type
 static inline enum page_cache_mode get_page_memtype(struct page *pg)

The reason is simple. get_page_memtype() and set_page_memtype()
require enum page_cache_mode now, which is defined in
asm/pgtable_types.h. UM does not include that file for obvious reasons.

The simple solution is to move that functions to arch/x86/mm/pat.c
where the only callsites of this are located. They should have been
there in the first place.

Fixes: e00c8cc93c "x86: Use new cache mode type in memtype related functions"
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Weinberger <richard@nod.at>
2014-11-16 18:59:19 +01:00
Ingo Molnar
f108c898dd Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:31:17 +01:00
Juergen Gross
bd809af16e x86: Enable PAT to use cache mode translation tables
Update the translation tables from cache mode to pgprot values
according to the PAT settings. This enables changing the cache
attributes of a PAT index in just one place without having to change
at the users side.

With this change it is possible to use the same kernel with different
PAT configurations, e.g. supporting Xen.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-18-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
87ad0b713b x86: Clean up pgtable_types.h
Remove no longer used defines from pgtable_types.h as they are not
used any longer.

Switch __PAGE_KERNEL_NOCACHE to use cache mode type instead of pte
bits.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-15-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
e00c8cc93c x86: Use new cache mode type in memtype related functions
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-14-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
b14097bd91 x86: Use new cache mode type in mm/ioremap.c
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-13-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
49a3b3cbdf x86: Use new cache mode type in mm/iomap_32.c
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type. This requires to change
io_reserve_memtype() as well.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-9-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
d85f33342a x86: Use new cache mode type in asm/pgtable.h
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type. This requires changing some callers of
is_new_memtype_allowed() to be changed as well.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-8-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
c27ce0af89 x86: Use new cache mode type in include/asm/fb.h
Instead of directly using cache mode bits in the pte switch to usage of
the new cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-3-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:24 +01:00
Juergen Gross
281d4078be x86: Make page cache mode a real type
At the moment there are a lot of places that handle setting or getting
the page cache mode by treating the pgprot bits equal to the cache mode.
This is only true because there are a lot of assumptions about the setup
of the PAT MSR. Otherwise the cache type needs to get translated into
pgprot bits and vice versa.

This patch tries to prepare for that by introducing a separate type
for the cache mode and adding functions to translate between those and
pgprot values.

To avoid too much performance penalty the translation between cache mode
and pgprot values is done via tables which contain the relevant
information.  Write-back cache mode is hard-wired to be 0, all other
modes are configurable via those tables. For large pages there are
translation functions as the PAT bit is located at different positions
in the ptes of 4k and large pages.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-2-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:24 +01:00
Ingo Molnar
595247f61f * Support module unload for efivarfs - Mathias Krause
* Another attempt at moving x86 to libstub taking advantage of the
    __pure attribute - Ard Biesheuvel
 
  * Add EFI runtime services section to ptdump - Mathias Krause
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUZnQGAAoJEC84WcCNIz1VfyUP/1MCVt4vepl7+0JzdUP/eVPs
 CwPM6gBOvgx1PviWrtvSU8UyjtYDqZx7jnCvyvbmlgixAqqIoFm80x5sd9DfyJBj
 vSrmavXaJgQomJN3N+fvaIpGJXp8NQmeNT87++UMb6VE5nYvx7suDcwfTqOaxcYt
 yDwKatTXTvQxDLlGgtymp2UhgVKBICs9WVo8weevB5LPmpt4TFCi1GDSimJkfg+0
 JTvkKF+QxPmVqgwY7bgdFFcfhsYCux5VgtbD4DJKS3LgfLJLAMKPOt83DbeOSmIa
 8zqtlF3eMwHgecKrMLSfIZH3XIl2gsIdPdvT6iBQkwwGZuSzG93JgwJ90HYiKoDm
 yNlffnhmdgI2RXO97UZJpzqGor+eNc0auuS4485PcE8NtZ1tbo20A/OCpfIzK8j8
 Vk7sfZxaaHKF5PUtRe6vo3myRlUCofMIuSEWSF8d709R6AEEuia6RZ3Y45EJPROn
 fKOiLsf7Og1Mk43Iy2lb7kFT766OsUnZZHU/xiIZj/v94HPWFWoFPtxPC0IURvPx
 24oiJxCnyXWGtoyn+SSprl+NAPuPsxVFYriTwaq2RBuoY0NAdy7NIXKe2HTp/WI0
 oSTtYkCRcwiHv0aSrg+yQHmwH7y7m39S3yIS4t5LXenn2G4ObUUjhcAQdF2Ft0Pr
 MT/l+stTt390cpfpcQbE
 =he2O
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi

Pull EFI updates for v3.19 from Matt Fleming:

 - Support module unload for efivarfs - Mathias Krause

 - Another attempt at moving x86 to libstub taking advantage of the
   __pure attribute - Ard Biesheuvel

 - Add EFI runtime services section to ptdump - Mathias Krause

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:48:53 +01:00
Igor Mammedov
1d4e7e3c0b kvm: x86: increase user memory slots to 509
With the 3 private slots, this gives us 512 slots total.
Motivation for this is in addition to assigned devices
support more memory hotplug slots, where 1 slot is
used by a hotplugged memory stick.
It will allow to support upto 256 hotplug memory
slots and leave 253 slots for assigned devices and
other devices that use them.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-14 10:02:40 +01:00
Aravind Gopalakrishnan
904cb3677f perf/x86/amd/ibs: Update IBS MSRs and feature definitions
New Fam15h models carry extra feature bits and extend
the MSR register space for IBS ops. Adding them here.

While at it, add functionality to read IbsBrTarget and
OpData4 depending on their availability if user wants a
PERF_SAMPLE_RAW.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <paulus@samba.org>
Cc: <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415651066-13523-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-12 15:12:32 +01:00
Dirk Brandewie
2f86dc4cdd intel_pstate: Add support for HWP
Add support of Hardware Managed Performance States (HWP) described in Volume 3
section 14.4 of the SDM.

With HWP enbaled intel_pstate will no longer be responsible for selecting P
states for the processor. intel_pstate will continue to register to
the cpufreq core as the scaling driver for CPUs implementing
HWP. In HWP mode intel_pstate provides three functions reporting
frequency to the cpufreq core, support for the set_policy() interface
from the core and maintaining the intel_pstate sysfs interface in
/sys/devices/system/cpu/intel_pstate.  User preferences expressed via
the set_policy() interface or the sysfs interface are forwared to the
CPU via the HWP MSR interface.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-12 00:04:38 +01:00
Dirk Brandewie
7787388772 x86: Add support for Intel HWP feature detection.
Add support of Hardware Managed Performance States (HWP) described in Volume 3
section 14.4 of the SDM.

One bit CPUID.06H:EAX[bit 7] expresses the presence of the HWP feature on
the processor. The remaining bits CPUID.06H:EAX[bit 8-11] denote the
presense of various HWP features.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-12 00:04:37 +01:00
Mathias Krause
8266e31ed0 x86, ptdump: Add section for EFI runtime services
In commit 3891a04aaf ("x86-64, espfix: Don't leak bits 31:16 of %esp
returning..") the "ESPFix Area" was added to the page table dump special
sections. That area, though, has a limited amount of entries printed.

The EFI runtime services are, unfortunately, located in-between the
espfix area and the high kernel memory mapping. Due to the enforced
limitation for the espfix area, the EFI mappings won't be printed in the
page table dump.

To make the ESP runtime service mappings visible again, provide them a
dedicated entry.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-11-11 22:28:57 +00:00
Ard Biesheuvel
243b6754cd efi/x86: Move x86 back to libstub
This reverts commit 84be880560, which itself reverted my original
attempt to move x86 from #include'ing .c files from across the tree
to using the EFI stub built as a static library.

The issue that affected the original approach was that splitting
the implementation into several .o files resulted in the variable
'efi_early' becoming a global with external linkage, which under
-fPIC implies that references to it must go through the GOT. However,
dealing with this additional GOT entry turned out to be troublesome
on some EFI implementations. (GCC's visibility=hidden attribute is
supposed to lift this requirement, but it turned out not to work on
the 32-bit build.)

Instead, use a pure getter function to get a reference to efi_early.
This approach results in no additional GOT entries being generated,
so there is no need for any changes in the early GOT handling.

Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-11-11 22:23:11 +00:00
Yijing Wang
03f56e42d0 Revert "PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()"
The problem fixed by 0e4ccb1505 ("PCI: Add x86_msi.msi_mask_irq() and
msix_mask_irq()") has been fixed in a simpler way by a previous commit
("PCI/MSI: Add pci_msi_ignore_mask to prevent writes to MSI/MSI-X Mask
Bits").

The msi_mask_irq() and msix_mask_irq() x86_msi_ops added by 0e4ccb1505
are no longer needed, so revert the commit.

default_msi_mask_irq() and default_msix_mask_irq() were added by
0e4ccb1505 and are still used by s390, so keep them for now.

[bhelgaas: changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: xen-devel@lists.xenproject.org
2014-11-11 15:14:30 -07:00
Arnd Bergmann
1c8d29696f Merge branch 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into asm-generic
* 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
  documentation: memory-barriers: clarify relaxed io accessor semantics
  x86: io: implement dummy relaxed accessor macros for writes
  tile: io: implement dummy relaxed accessor macros for writes
  sparc: io: implement dummy relaxed accessor macros for writes
  powerpc: io: implement dummy relaxed accessor macros for writes
  parisc: io: implement dummy relaxed accessor macros for writes
  mn10300: io: implement dummy relaxed accessor macros for writes
  m68k: io: implement dummy relaxed accessor macros for writes
  m32r: io: implement dummy relaxed accessor macros for writes
  ia64: io: implement dummy relaxed accessor macros for writes
  cris: io: implement dummy relaxed accessor macros for writes
  frv: io: implement dummy relaxed accessor macros for writes
  xtensa: io: remove dummy relaxed accessor macros for reads
  s390: io: remove dummy relaxed accessor macros for reads
  microblaze: io: remove dummy relaxed accessor macros
  asm-generic: io: implement relaxed accessor macros as conditional wrappers

Conflicts:
	include/asm-generic/io.h

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2014-11-11 19:55:45 +01:00
Thierry Reding
4707a341b4 /dev/mem: Use more consistent data types
The xlate_dev_{kmem,mem}_ptr() functions take either a physical address
or a kernel virtual address, so data types should be phys_addr_t and
void *. They both return a kernel virtual address which is only ever
used in calls to copy_{from,to}_user(), so make variables that store it
void * rather than char * for consistency.

Also only define a weak unxlate_dev_mem_ptr() function if architectures
haven't overridden them in the asm/io.h header file.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2014-11-10 15:59:21 +01:00
Boris Ostrovsky
54279552bd x86/core, x86/xen/smp: Use 'die_complete' completion when taking CPU down
Commit 2ed53c0d6c ("x86/smpboot: Speed up suspend/resume by
avoiding 100ms sleep for CPU offline during S3") introduced
completions to CPU offlining process. These completions are not
initialized on Xen kernels causing a panic in
play_dead_common().

Move handling of die_complete into common routines to make them
available to Xen guests.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: tianyu.lan@intel.com
Cc: konrad.wilk@oracle.com
Cc: xen-devel@lists.xenproject.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414770572-7950-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-10 11:16:40 +01:00
Andy Lutomirski
07114f0f1c x86_64: Add a comment explaining the TASK_SIZE_MAX guard page
That guard page is absolutely necessary; explain why for
posterity.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/23320cb5017c2da8475ec20fcde8089d82aa2699.1415144745.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-10 10:43:13 +01:00
Nadav Amit
9d88fca71a KVM: x86: MOV to CR3 can set bit 63
Although Intel SDM mentions bit 63 is reserved, MOV to CR3 can have bit 63 set.
As Intel SDM states in section 4.10.4 "Invalidation of TLBs and
Paging-Structure Caches": " MOV to CR3. ... If CR4.PCIDE = 1 and bit 63 of the
instruction’s source operand is 0 ..."

In other words, bit 63 is not reserved. KVM emulator currently consider bit 63
as reserved. Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07 15:44:07 +01:00
Nadav Amit
82b32774c2 KVM: x86: Breakpoints do not consider CS.base
x86 debug registers hold a linear address. Therefore, breakpoints detection
should consider CS.base, and check whether instruction linear address equals
(CS.base + RIP). This patch introduces a function to evaluate RIP linear
address and uses it for breakpoints detection.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07 15:44:04 +01:00
Jan Beulich
97b67ae559 x86-64: Use RIP-relative addressing for most per-CPU accesses
Observing that per-CPU data (in the SMP case) is reachable by
exploiting 64-bit address wraparound (building on the default kernel
load address being at 16Mb), the one byte shorter RIP-relative
addressing form can be used for most per-CPU accesses. The one
exception are the "stable" reads, where the use of the "P" operand
modifier prevents the compiler from using RIP-relative addressing, but
is unavoidable due to the use of the "p" constraint (side note: with
gcc 4.9.x the intended effect of this isn't being achieved anymore,
see gcc bug 63637).

With the dependency on the minimum kernel load address, arbitrarily
low values for CONFIG_PHYSICAL_START are now no longer possible. A
link time assertion is being added, directing to the need to increase
that value when it triggers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5458A1780200007800044A9D@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-04 20:43:14 +01:00
Jan Beulich
2c773dd31f x86: Convert a few more per-CPU items to read-mostly ones
Both this_cpu_off and cpu_info aren't getting modified post boot, yet
are being accessed on enough code paths that grouping them with other
frequently read items seems desirable. For cpu_info this at the same
time implies removing the cache line alignment (which afaict became
pointless when it got converted to per-CPU data years ago).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/54589BD20200007800044A84@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-04 20:13:28 +01:00
Andy Lutomirski
1ad83c858c x86_64,vsyscall: Make vsyscall emulation configurable
This adds CONFIG_X86_VSYSCALL_EMULATION, guarded by CONFIG_EXPERT.
Turning it off completely disables vsyscall emulation, saving ~3.5k
for vsyscall_64.c, 4k for vsyscall_emu_64.S (the fake vsyscall
page), some tiny amount of core mm code that supports a gate area,
and possibly 4k for a wasted pagetable.  The latter is because the
vsyscall addresses are misaligned and fit poorly in the fixmap.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/406db88b8dd5f0cbbf38216d11be34bbb43c7eae.1414618407.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-03 21:44:57 +01:00
James Custer
3ab0c49fd6 x86: UV BAU: Increase maximum CPUs per socket/hub
We have encountered hardware with 18 cores/socket that gives 36 CPUs/socket
with hyperthreading enabled. This exceeds the current MAX_CPUS_PER_SOCKET
causing a failure in get_cpu_topology. Increase MAX_CPUS_PER_SOCKET to 64
and MAX_CPUS_PER_UVHUB to 128.

Signed-off-by: James Custer <jcuster@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Link: http://lkml.kernel.org/r/1414952199-185319-1-git-send-email-jcuster@sgi.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-03 13:49:24 +01:00
Andy Lutomirski
e76b027e64 x86,vdso: Use LSL unconditionally for vgetcpu
LSL is faster than RDTSCP and works everywhere; there's no need to
switch between them depending on CPU.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Andi Kleen <andi@firstfloor.org>
Link: http://lkml.kernel.org/r/72f73d5ec4514e02bba345b9759177ef03742efb.1414706021.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-03 13:41:53 +01:00
Minfei Huang
63e7b6d90c x86: mm: Re-use the early_ioremap fixed area
The temp fixed area is only used during boot for early_ioremap(), and
it is unused when ioremap() is functional. vmalloc/pkmap area become
available after early boot so the temp fixed area is available for
re-use.

The virtual address is more precious on i386, especially turning on
high memory. So we can re-use the virtual address space.

Remove the now unused defines FIXADDR_BOOT_START and FIXADDR_BOOT_SIZE.

Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Cc: pbonzini@redhat.com
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/1414582717-32729-1-git-send-email-mnfhuang@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-03 13:40:44 +01:00
Chao Peng
612263b30c KVM: x86: Enable Intel AVX-512 for guest
Expose Intel AVX-512 feature bits to guest. Also add checks for
xcr0 AVX512 related bits according to spec:
http://download-software.intel.com/sites/default/files/managed/71/2e/319433-017.pdf

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03 12:07:30 +01:00
Nadav Amit
16f8a6f979 KVM: vmx: Unavailable DR4/5 is checked before CPL
If DR4/5 is accessed when it is unavailable (since CR4.DE is set), then #UD
should be generated even if CPL>0. This is according to Intel SDM Table 6-2:
"Priority Among Simultaneous Exceptions and Interrupts".

Note, that this may happen on the first DR access, even if the host does not
sets debug breakpoints. Obviously, it occurs when the host debugs the guest.

This patch moves the DR4/5 checks from __kvm_set_dr/_kvm_get_dr to handle_dr.
The emulator already checks DR4/5 availability in check_dr_read. Nested
virutalization related calls to kvm_set_dr/kvm_get_dr would not like to inject
exceptions to the guest.

As for SVM, the patch follows the previous logic as much as possible. Anyhow,
it appears the DR interception code might be buggy - even if the DR access
may cause an exception, the instruction is skipped.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03 12:07:26 +01:00
Nadav Amit
394457a928 KVM: x86: some apic broadcast modes does not work
KVM does not deliver x2APIC broadcast messages with physical mode.  Intel SDM
(10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of
FFFF_FFFFH is used for broadcast of interrupts in both logical destination and
physical destination modes."

In addition, the local-apic enables cluster mode broadcast. As Intel SDM
10.6.2.2 says: "Broadcast to all local APICs is achieved by setting all
destination bits to one." This patch enables cluster mode broadcast.

The fix tries to combine broadcast in different modes through a unified code.

One rare case occurs when the source of IPI has its APIC disabled.  In such
case, the source can still issue IPIs, but since the source is not obliged to
have the same LAPIC mode as the enabled ones, we cannot rely on it.
Since it is a rare case, it is unoptimized and done on the slow-path.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>
[As per Radim's review, use unsigned int for X2APIC_BROADCAST, return bool from
 kvm_apic_broadcast. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03 12:07:22 +01:00
Minfei Huang
96e70f8328 x86/mm: Avoid overlap the fixmap area on i386
It is a problem when configuring high memory off where the
vmalloc reserve area could end up overlapping the early_ioremap
fixmap area on i386.

The ordering of the VMALLOC_RESERVE space is:

 FIXADDR_TOP
                       fixed_addresses
 FIXADDR_START
                       early_ioremap fixed addresses
 FIXADDR_BOOT_START
                       Persistent kmap area
 PKMAP_BASE
 VMALLOC_END
                       Vmalloc area
 VMALLOC_START
 high_memory

The available address we can use is lower than
FIXADDR_BOOT_START. So we will set the kmap boundary below the
FIXADDR_BOOT_START, if we configure high memory.

If we configure high memory, the vmalloc reserve area should
end up to PKMAP_BASE, otherwise should end up to
FIXADDR_BOOT_START.

Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/6B680A9E-6CE9-4C96-934B-CB01DCB58278@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 12:21:48 +01:00
Andy Lutomirski
61a492fb17 x86_64/vdso: Remove jiffies from the vvar page
I think that the jiffies vvar was once used for the vgetcpu
cache. That code is long gone, so let's just make jiffies be a
normal variable.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/fcfee6f8749af14d96373a9e2656354ad0b95499.1411494540.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 11:22:13 +01:00
Andy Lutomirski
c4a7bba29b sched/x86: Add a comment clarifying LDT context switching
The code is correct, but only for a rather subtle reason.  This
confused me for quite a while when I read switch_mm, so clarify
the code to avoid confusing other people, too.

TBH, I wouldn't be surprised if this code was only correct by
accident.

[ I wouldn't normally send a comment-only patch, but it took me a long
  time to first figure out wtf was going on here, and then to figure
  out why this wasn't exploitable by malicious code, and then to
  figure out why this oddity had no user-visible effect at all.  Let's
  spare future readers the same confusion. ]

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/36275c99801a87d8dcf0502a41cf4e2ad81aae46.1412623954.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 11:11:57 +01:00
Andy Lutomirski
2c7577a758 sched/x86_64: Don't save flags on context switch
Now that the kernel always runs with clean flags (in particular,
NT is clear), there is no need to save and restore flags on
every context switch.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Sebastian Lackner <sebastian@fds-team.de>
Cc: Anish Bhatt <anish@chelsio.com>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/bf6fb790787eb95b922157838f52712c25dda157.1412187233.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 11:11:30 +01:00
Oleg Nesterov
e2336f6e51 sched: Kill task_preempt_count()
task_preempt_count() is pointless if preemption counter is per-cpu,
currently this is x86 only. It is only valid if the task is not
running, and even in this case the only info it can provide is the
state of PREEMPT_ACTIVE bit.

Change its single caller to check p->on_rq instead, this should be
the same if p->state != TASK_RUNNING, and kill this helper.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Alexander Graf <agraf@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20141008183348.GC17495@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:47:56 +01:00
Oleg Nesterov
009f60e276 sched: stop the unbound recursion in preempt_schedule_context()
preempt_schedule_context() does preempt_enable_notrace() at the end
and this can call the same function again; exception_exit() is heavy
and it is quite possible that need-resched is true again.

1. Change this code to dec preempt_count() and check need_resched()
   by hand.

2. As Linus suggested, we can use the PREEMPT_ACTIVE bit and avoid
   the enable/disable dance around __schedule(). But in this case
   we need to move into sched/core.c.

3. Cosmetic, but x86 forgets to declare this function. This doesn't
   really matter because it is only called by asm helpers, still it
   make sense to add the declaration into asm/preempt.h to match
   preempt_schedule().

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Graf <agraf@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20141005202322.GB27962@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:46:05 +01:00
Subhransu S. Prusty
9a80e8f597 ASoC: Intel: mrfld: Define sst_res_info for acpi
To query resources in ACPI systems we need to define a data structure. This
would be set as platform_info for the devices.

Signed-off-by: Subhransu S. Prusty <subhransu.s.prusty@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2014-10-27 18:02:38 +00:00
Subhransu S. Prusty
43c5e23197 ASoC: Intel: mrfld - Define ipc_info structure
This will be used to abstract the differances in ipc offsets for different
chips.

Signed-off-by: Subhransu S. Prusty <subhransu.s.prusty@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2014-10-27 18:02:38 +00:00
Linus Torvalds
96971e9aa9 This is a pretty large update. I think it is roughly as big
as what I usually had for the _whole_ rc period.
 
 There are a few bad bugs where the guest can OOPS or crash the host.  We
 have also started looking at attack models for nested virtualization;
 bugs that usually result in the guest ring 0 crashing itself become
 more worrisome if you have nested virtualization, because the nested
 guest might bring down the non-nested guest as well.  For current
 uses of nested virtualization these do not really have a security
 impact, but you never know and bugs are bugs nevertheless.
 
 A lot of these bugs are in 3.17 too, resulting in a large number of
 stable@ Ccs.  I checked that all the patches apply there with no
 conflicts.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJUSjmSAAoJEL/70l94x66D2cYH/3JKWsTzhXjHGxZcXQQ85CwR
 49hp/crCLWJ2YRKzyAOkvwPI0/SgYKM5wJ8kgtKlpLxrPZKYwhGd1S9tKf6EdAib
 5gc/SDDAgHmkqL3IrXmkyKzUVeUWvgD/IFi1Sqalko1blpRlaN/JyJV0mjjGCbA+
 yH3Qi5tD0X00u00ycuZCB6mrFH0PH87BmKFiz6bSSJ43tsgD9AVD64BZid6c6hwm
 iaIfNcIuShavlv1TKG80cSez2qtNXjRLeTN8A10gVZo3hof/wP8aRm+LxF/1JEZX
 OsoNCjOhhL29qafcZOg3j/atbiAzWtSGV3vjU+iWh5mnN5oFZHcPgIGucQsuFec=
 =9oQY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "This is a pretty large update.  I think it is roughly as big as what I
  usually had for the _whole_ rc period.

  There are a few bad bugs where the guest can OOPS or crash the host.
  We have also started looking at attack models for nested
  virtualization; bugs that usually result in the guest ring 0 crashing
  itself become more worrisome if you have nested virtualization,
  because the nested guest might bring down the non-nested guest as
  well.  For current uses of nested virtualization these do not really
  have a security impact, but you never know and bugs are bugs
  nevertheless.

  A lot of these bugs are in 3.17 too, resulting in a large number of
  stable@ Ccs.  I checked that all the patches apply there with no
  conflicts"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: vfio: fix unregister kvm_device_ops of vfio
  KVM: x86: Wrong assertion on paging_tmpl.h
  kvm: fix excessive pages un-pinning in kvm_iommu_map error path.
  KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag
  KVM: x86: Emulator does not decode clflush well
  KVM: emulate: avoid accessing NULL ctxt->memopp
  KVM: x86: Decoding guest instructions which cross page boundary may fail
  kvm: x86: don't kill guest on unknown exit reason
  kvm: vmx: handle invvpid vm exit gracefully
  KVM: x86: Handle errors when RIP is set during far jumps
  KVM: x86: Emulator fixes for eip canonical checks on near branches
  KVM: x86: Fix wrong masking on relative jump/call
  KVM: x86: Improve thread safety in pit
  KVM: x86: Prevent host from panicking on shared MSR writes.
  KVM: x86: Check non-canonical addresses upon WRMSR
2014-10-24 12:42:55 -07:00
Petr Matousek
a642fc3050 kvm: vmx: handle invvpid vm exit gracefully
On systems with invvpid instruction support (corresponding bit in
IA32_VMX_EPT_VPID_CAP MSR is set) guest invocation of invvpid
causes vm exit, which is currently not handled and results in
propagation of unknown exit to userspace.

Fix this by installing an invvpid vm exit handler.

This is CVE-2014-3646.

Cc: stable@vger.kernel.org
Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:17 +02:00
Andy Honig
8b3c3104c3 KVM: x86: Prevent host from panicking on shared MSR writes.
The previous patch blocked invalid writes directly when the MSR
is written.  As a precaution, prevent future similar mistakes by
gracefulling handle GPs caused by writes to shared MSRs.

Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig <ahonig@google.com>
[Remove parts obsoleted by Nadav's patch. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:08 +02:00
Nadav Amit
854e8bb1aa KVM: x86: Check non-canonical addresses upon WRMSR
Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
written to certain MSRs. The behavior is "almost" identical for AMD and Intel
(ignoring MSRs that are not implemented in either architecture since they would
anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
non-canonical address is written on Intel but not on AMD (which ignores the top
32-bits).

Accordingly, this patch injects a #GP on the MSRs which behave identically on
Intel and AMD.  To eliminate the differences between the architecutres, the
value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to
canonical value before writing instead of injecting a #GP.

Some references from Intel and AMD manuals:

According to Intel SDM description of WRMSR instruction #GP is expected on
WRMSR "If the source register contains a non-canonical address and ECX
specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."

According to AMD manual instruction manual:
LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
LSTAR and CSTAR registers.  If an RIP written by WRMSR is not in canonical
form, a general-protection exception (#GP) occurs."
IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
base field must be in canonical form or a #GP fault will occur."
IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
be in canonical form."

This patch fixes CVE-2014-3610.

Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-24 13:21:08 +02:00
Linus Torvalds
8c81f48e16 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI updates from Peter Anvin:
 "This patchset falls under the "maintainers that grovel" clause in the
  v3.18-rc1 announcement.  We had intended to push it late in the merge
  window since we got it into the -tip tree relatively late.

  Many of these are relatively simple things, but there are a couple of
  key bits, especially Ard's and Matt's patches"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  rtc: Disable EFI rtc for x86
  efi: rtc-efi: Export platform:rtc-efi as module alias
  efi: Delete the in_nmi() conditional runtime locking
  efi: Provide a non-blocking SetVariable() operation
  x86/efi: Adding efi_printks on memory allocationa and pci.reads
  x86/efi: Mark initialization code as such
  x86/efi: Update comment regarding required phys mapped EFI services
  x86/efi: Unexport add_efi_memmap variable
  x86/efi: Remove unused efi_call* macros
  efi: Resolve some shadow warnings
  arm64: efi: Format EFI memory type & attrs with efi_md_typeattr_format()
  ia64: efi: Format EFI memory type & attrs with efi_md_typeattr_format()
  x86: efi: Format EFI memory type & attrs with efi_md_typeattr_format()
  efi: Introduce efi_md_typeattr_format()
  efi: Add macro for EFI_MEMORY_UCE memory attribute
  x86/efi: Clear EFI_RUNTIME_SERVICES if failing to enter virtual mode
  arm64/efi: Do not enter virtual mode if booting with efi=noruntime or noefi
  arm64/efi: uefi_init error handling fix
  efi: Add kernel param efi=noruntime
  lib: Add a generic cmdline parse function parse_option_str
  ...
2014-10-23 14:45:09 -07:00
Borislav Petkov
a3a529d104 x86, MCE, AMD: Drop software-defined bank in error thresholding
Aravind had the good question about why we're assigning a
software-defined bank when reporting error thresholding errors instead
of simply using the bank which reports the last error causing the
overflow.

Digging through git history, it pointed to

9526866439 ("[PATCH] x86_64: mce_amd support for family 0x10 processors")

which added that functionality. The problem with this, however, is that
tools don't know about software-defined banks and get puzzled. So drop
that K8_MCE_THRESHOLD_BASE and simply use the hw bank reporting the
thresholding interrupt.

Save us a couple of MSR reads while at it.

Reported-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: https://lkml.kernel.org/r/5435B206.60402@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-21 22:28:48 +02:00
Will Deacon
cbc908ef8e x86: io: implement dummy relaxed accessor macros for writes
write{b,w,l,q}_relaxed are implemented by some architectures in order to
permit memory-mapped I/O accesses with weaker barrier semantics than the
non-relaxed variants.

This patch adds dummy macros for the write accessors to x86, in the
same vein as the dummy definitions for the relaxed read accessors.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2014-10-20 18:49:18 +01:00
Vinod Koul
163d2089d2 ASoC: Intel: mrfld - add the dsp sst driver
The SST driver is the missing piece in our driver stack not upstreamed,
so push it now :) This driver currently supports PCI device on Merrifield.
Future updates will bring support for ACPI device as well as future update
to PCI devices as well

In subsequent patches support is added for DSP loading using memcpy,
pcm operations and compressed ops.

Signed-off-by: Subhransu S. Prusty <subhransu.s.prusty@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2014-10-20 12:20:34 +01:00
Linus Torvalds
1f6075f990 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more perf updates from Ingo Molnar:
 "A second (and last) round of late coming fixes and changes, almost all
  of them in perf tooling:

  User visible tooling changes:

   - Add period data column and make it default in 'perf script' (Jiri
     Olsa)

   - Add a visual cue for toggle zeroing of samples in 'perf top'
     (Taeung Song)

   - Improve callchains when using libunwind (Namhyung Kim)

  Tooling fixes and infrastructure changes:

   - Fix for double free in 'perf stat' when using some specific invalid
     command line combo (Yasser Shalabi)

   - Fix off-by-one bugs in map->end handling (Stephane Eranian)

   - Fix off-by-one bug in maps__find(), also related to map->end
     handling (Namhyung Kim)

   - Make struct symbol->end be the first addr after the symbol range,
     to make it match the convention used for struct map->end.  (Arnaldo
     Carvalho de Melo)

   - Fix perf_evlist__add_pollfd() error handling in 'perf kvm stat
     live' (Jiri Olsa)

   - Fix python test build by moving callchain_param to an object linked
     into the python binding (Jiri Olsa)

   - Document sysfs events/ interfaces (Cody P Schafer)

   - Fix typos in perf/Documentation (Masanari Iida)

   - Add missing 'struct option' forward declaration (Arnaldo Carvalho
     de Melo)

   - Add option to copy events when queuing for sorting across cpu
     buffers and enable it for 'perf kvm stat live', to avoid having
     events left in the queue pointing to the ring buffer be rewritten
     in high volume sessions.  (Alexander Yarygin, improving work done
     by David Ahern):

   - Do not include a struct hists per perf_evsel, untangling the
     histogram code from perf_evsel, to pave the way for exporting a
     minimalistic tools/lib/api/perf/ library usable by tools/perf and
     initially by the rasd daemon being developed by Borislav Petkov,
     Robert Richter and Jean Pihet.  (Arnaldo Carvalho de Melo)

   - Make perf_evlist__open(evlist, NULL, NULL), i.e. without cpu and
     thread maps mean syswide monitoring, reducing the boilerplate for
     tools that only want system wide mode.  (Arnaldo Carvalho de Melo)

   - Move exit stuff from perf_evsel__delete to perf_evsel__exit, delete
     should be just a front end for exit + free (Arnaldo Carvalho de
     Melo)

   - Add support to new style format of kernel PMU event.  (Kan Liang)

  and other misc fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  perf script: Add period as a default output column
  perf script: Add period data column
  perf evsel: No need to drag util/cgroup.h
  perf evlist: Add missing 'struct option' forward declaration
  perf evsel: Move exit stuff from __delete to __exit
  kprobes/x86: Remove stale ARCH_SUPPORTS_KPROBES_ON_FTRACE define
  perf kvm stat live: Enable events copying
  perf session: Add option to copy events when queueing
  perf Documentation: Fix typos in perf/Documentation
  perf trace: Use thread_{,_set}_priv helpers
  perf kvm: Use thread_{,_set}_priv helpers
  perf callchain: Create an address space per thread
  perf report: Set callchain_param.record_mode for future use
  perf evlist: Fix for double free in tools/perf stat
  perf test: Add test case for pmu event new style format
  perf tools: Add support to new style format of kernel PMU event
  perf tools: Parse the pmu event prefix and suffix
  Revert "perf tools: Default to cpu// for events v5"
  perf Documentation: Remove Ruplicated docs for powerpc cpu specific events
  perf Documentation: sysfs events/ interfaces
  ...
2014-10-19 11:55:41 -07:00
Linus Torvalds
857b50f5d0 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
 "This is the MIPS pull request for the next kernel:

   - Zubair's patch series adds CMA support for MIPS.  Doing so it also
     touches ARM64 and x86.
   - remove the last instance of IRQF_DISABLED from arch/mips
   - updates to two of the MIPS defconfig files.
   - cleanup of how cache coherency bits are handled on MIPS and
     implement support for write-combining.
   - platform upgrades for Alchemy
   - move MIPS DTS files to arch/mips/boot/dts/"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (24 commits)
  MIPS: ralink: remove deprecated IRQF_DISABLED
  MIPS: pgtable.h: Implement the pgprot_writecombine function for MIPS
  MIPS: cpu-probe: Set the write-combine CCA value on per core basis
  MIPS: pgtable-bits: Define the CCA bit for WC writes on Ingenic cores
  MIPS: pgtable-bits: Move the CCA bits out of the core's ifdef blocks
  MIPS: DMA: Add cma support
  x86: use generic dma-contiguous.h
  arm64: use generic dma-contiguous.h
  asm-generic: Add dma-contiguous.h
  MIPS: BPF: Add new emit_long_instr macro
  MIPS: ralink: Move device-trees to arch/mips/boot/dts/
  MIPS: Netlogic: Move device-trees to arch/mips/boot/dts/
  MIPS: sead3: Move device-trees to arch/mips/boot/dts/
  MIPS: Lantiq: Move device-trees to arch/mips/boot/dts/
  MIPS: Octeon: Move device-trees to arch/mips/boot/dts/
  MIPS: Add support for building device-tree binaries
  MIPS: Create common infrastructure for building built-in device-trees
  MIPS: SEAD3: Enable DEVTMPFS
  MIPS: SEAD3: Regenerate defconfigs
  MIPS: Alchemy: DB1300: Add touch penirq support
  ...
2014-10-18 14:24:36 -07:00
Anton Blanchard
691286b556 kprobes/x86: Remove stale ARCH_SUPPORTS_KPROBES_ON_FTRACE define
Commit e7dbfe349d ("kprobes/x86: Move ftrace-based kprobe code
into kprobes-ftrace.c") switched from using
ARCH_SUPPORTS_KPROBES_ON_FTRACE to CONFIG_KPROBES_ON_FTRACE but
missed removing the define.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: masami.hiramatsu.pt@hitachi.com
Cc: ananth@in.ibm.com
Cc: a.p.zijlstra@chello.nl
Cc: fweisbec@gmail.com
Cc: rostedt@goodmis.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-17 07:18:34 +02:00
Linus Torvalds
0429fbc0bd Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
 "Way back, before the current percpu allocator was implemented, static
  and dynamic percpu memory areas were allocated and handled separately
  and had their own accessors.  The distinction has been gone for many
  years now; however, the now duplicate two sets of accessors remained
  with the pointer based ones - this_cpu_*() - evolving various other
  operations over time.  During the process, we also accumulated other
  inconsistent operations.

  This pull request contains Christoph's patches to clean up the
  duplicate accessor situation.  __get_cpu_var() uses are replaced with
  with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

  Unfortunately, the former sometimes is tricky thanks to C being a bit
  messy with the distinction between lvalues and pointers, which led to
  a rather ugly solution for cpumask_var_t involving the introduction of
  this_cpu_cpumask_var_ptr().

  This converts most of the uses but not all.  Christoph will follow up
  with the remaining conversions in this merge window and hopefully
  remove the obsolete accessors"

* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
  irqchip: Properly fetch the per cpu offset
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
  ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
  Revert "powerpc: Replace __get_cpu_var uses"
  percpu: Remove __this_cpu_ptr
  clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
  sparc: Replace __get_cpu_var uses
  avr32: Replace __get_cpu_var with __this_cpu_write
  blackfin: Replace __get_cpu_var uses
  tile: Use this_cpu_ptr() for hardware counters
  tile: Replace __get_cpu_var uses
  powerpc: Replace __get_cpu_var uses
  alpha: Replace __get_cpu_var
  ia64: Replace __get_cpu_var uses
  s390: cio driver &__get_cpu_var replacements
  s390: Replace __get_cpu_var uses
  mips: Replace __get_cpu_var uses
  MIPS: Replace __get_cpu_var uses in FPU emulator.
  arm: Replace __this_cpu_ptr with raw_cpu_ptr
  ...
2014-10-15 07:48:18 +02:00
Linus Torvalds
2fd7476de9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc smaller fixes that missed the v3.17 cycle"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Add arch/x86/purgatory/ make generated files to gitignore
  x86: Fix section conflict for numachip
  x86: Reject x32 executables if x32 ABI not supported
  x86_64, entry: Filter RFLAGS.NT on entry from userspace
  x86, boot, kaslr: Fix nuisance warning on 32-bit builds
2014-10-14 02:28:16 +02:00
Linus Torvalds
ba1a96fc7d Merge branch 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 seccomp changes from Ingo Molnar:
 "This tree includes x86 seccomp filter speedups and related preparatory
  work, which touches core seccomp facilities as well.

  The main idea is to split seccomp into two phases, to be able to enter
  a simple fast path for syscalls with ptrace side effects.

  There's no substantial user-visible (and ABI) effects expected from
  this, except a change in how we emit a better audit record for
  SECCOMP_RET_TRACE events"

* 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls
  x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls
  x86: Split syscall_trace_enter into two phases
  x86, entry: Only call user_exit if TIF_NOHZ
  x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
  seccomp: Document two-phase seccomp and arch-provided seccomp_data
  seccomp: Allow arch code to provide seccomp_data
  seccomp: Refactor the filter callback and the API
  seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing
2014-10-14 02:27:06 +02:00
Linus Torvalds
df133e8fa8 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "This tree includes the following changes:

   - fix memory hotplug
   - fix hibernation bootup memory layout assumptions
   - fix hyperv numa guest kernel messages
   - remove dead code
   - update documentation"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Update memory map description to list hypervisor-reserved area
  x86/mm, hibernate: Do not assume the first e820 area to be RAM
  x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data()
  x86/mm/hotplug: Modify PGD entry when removing memory
  x86/mm/hotplug: Pass sync_global_pgds() a correct argument in remove_pagetable()
  x86: Remove set_pmd_pfn
2014-10-14 02:22:41 +02:00
Linus Torvalds
e3438330f5 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Ingo Molnar:
 "Misc smaller cleanups"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, intel: Fix total_size computation
  x86, microcode, intel: Rename apply_microcode and declare it static
  x86, microcode, intel: Fix typos
  x86, microcode, intel: Add missing static declarations
  x86, microcode, amd: Fix missing static declaration
2014-10-14 02:21:51 +02:00
Linus Torvalds
c7b228adca Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU updates from Ingo Molnar:
 "x86 FPU handling fixes, cleanups and enhancements from Oleg.

  The signal handling race fix and the __restore_xstate_sig() preemption
  fix for eager-mode is marked for -stable as well"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: copy_thread: Don't nullify ->ptrace_bps twice
  x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct()
  x86, fpu: copy_process: Sanitize fpu->last_cpu initialization
  x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math()
  x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()
  x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
  x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
2014-10-14 02:20:50 +02:00
Linus Torvalds
708d0b41a2 Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpufeature updates from Ingo Molnar:
 "This tree includes the following changes:

   - Introduce DISABLED_MASK to list disabled CPU features, to simplify
     CPU feature handling and avoid excessive #ifdefs

   - Remove the lightly used cpu_has_pae() primitive"

* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Add more disabled features
  x86: Introduce disabled-features
  x86: Axe the lightly-used cpu_has_pae
2014-10-14 02:19:47 +02:00
Linus Torvalds
bf10fa857f Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Three small cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tty/serial/8250: Clean up the asm/serial.h include file a bit
  x86/tty/serial/8250: Resolve missing-field-initializers warnings
  x86: Remove obsolete comment in uapi/e820.h
2014-10-13 18:19:01 +02:00
Linus Torvalds
9d9420f120 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side updates:

   - Fix and enhance poll support (Jiri Olsa)

   - Re-enable inheritance optimization (Jiri Olsa)

   - Enhance Intel memory events support (Stephane Eranian)

   - Refactor the Intel uncore driver to be more maintainable (Zheng
     Yan)

   - Enhance and fix Intel CPU and uncore PMU drivers (Peter Zijlstra,
     Andi Kleen)

   - [ plus various smaller fixes/cleanups ]

  User visible tooling updates:

   - Add +field argument support for --field option, so that one can add
     fields to the default list of fields to show, ie now one can just
     do:

         perf report --fields +pid

     And the pid will appear in addition to the default fields (Jiri
     Olsa)

   - Add +field argument support for --sort option (Jiri Olsa)

   - Honour -w in the report tools (report, top), allowing to specify
     the widths for the histogram entries columns (Namhyung Kim)

   - Properly show submicrosecond times in 'perf kvm stat' (Christian
     Borntraeger)

   - Add beautifier for mremap flags param in 'trace' (Alex Snast)

   - perf script: Allow callchains if any event samples them

   - Don't truncate Intel style addresses in 'annotate' (Alex Converse)

   - Allow profiling when kptr_restrict == 1 for non root users, kernel
     samples will just remain unresolved (Andi Kleen)

   - Allow configuring default options for callchains in config file
     (Namhyung Kim)

   - Support operations for shared futexes.  (Davidlohr Bueso)

   - "perf kvm stat report" improvements by Alexander Yarygin:
       -  Save pid string in opts.target.pid
       -  Enable the target.system_wide flag
       -  Unify the title bar output

   - [ plus lots of other fixes and small improvements.  ]

  Tooling infrastructure changes:

   - Refactor unit and scale function parameters for PMU parsing
     routines (Matt Fleming)

   - Improve DSO long names lookup with rbtree, resulting in great
     speedup for workloads with lots of DSOs (Waiman Long)

   - We were not handling POLLHUP notifications for event file
     descriptors

     Fix it by filtering entries in the events file descriptor array
     after poll() returns, refcounting mmaps so that when the last fd
     pointing to a perf mmap goes away we do the unmap (Arnaldo Carvalho
     de Melo)

   - Intel PT prep work, from Adrian Hunter, including:
       - Let a user specify a PMU event without any config terms
       - Add perf-with-kcore script
       - Let default config be defined for a PMU
       - Add perf_pmu__scan_file()
       - Add a 'perf test' for tracking with sched_switch
       - Add 'flush' callback to scripting API

   - Use ring buffer consume method to look like other tools (Arnaldo
     Carvalho de Melo)

   - hists browser (used in top and report) refactorings, getting rid of
     unused variables and reducing source code size by handling similar
     cases in a fewer functions (Namhyung Kim).

   - Replace thread unsafe strerror() with strerror_r() accross the
     whole tools/perf/ tree (Masami Hiramatsu)

   - Rename ordered_samples to ordered_events and allow setting a queue
     size for ordering events (Jiri Olsa)

   - [ plus lots of fixes, cleanups and other improvements ]"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (198 commits)
  perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment
  perf/x86/intel/uncore: Fix minor race in box set up
  perf record: Fix error message for --filter option not coming after tracepoint
  perf tools: Fix build breakage on arm64 targets
  perf symbols: Improve DSO long names lookup speed with rbtree
  perf symbols: Encapsulate dsos list head into struct dsos
  perf bench futex: Sanitize -q option in requeue
  perf bench futex: Support operations for shared futexes
  perf trace: Fix mmap return address truncation to 32-bit
  perf tools: Refactor unit and scale function parameters
  perf tools: Fix line number in the config file error message
  perf tools: Convert {record,top}.call-graph option to call-graph.record-mode
  perf tools: Introduce perf_callchain_config()
  perf callchain: Move some parser functions to callchain.c
  perf tools: Move callchain config from record_opts to callchain_param
  perf hists browser: Fix callchain print bug on TUI
  perf tools: Use ACCESS_ONCE() instead of volatile cast
  perf tools: Modify error code for when perf_session__new() fails
  perf tools: Fix perf record as non root with kptr_restrict == 1
  perf stat: Fix --per-core on multi socket systems
  ...
2014-10-13 15:58:15 +02:00
Linus Torvalds
6d5f0ebfc0 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking updates from Ingo Molnar:
 "The main updates in this cycle were:

   - mutex MCS refactoring finishing touches: improve comments, refactor
     and clean up code, reduce debug data structure footprint, etc.

   - qrwlock finishing touches: remove old code, self-test updates.

   - small rwsem optimization

   - various smaller fixes/cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Revert qrwlock recusive stuff
  locking/rwsem: Avoid double checking before try acquiring write lock
  locking/rwsem: Move EXPORT_SYMBOL() lines to follow function definition
  locking/rwlock, x86: Delete unused asm/rwlock.h and rwlock.S
  locking/rwlock, x86: Clean up asm/spinlock*.h to remove old rwlock code
  locking/semaphore: Resolve some shadow warnings
  locking/selftest: Support queued rwlock
  locking/lockdep: Restrict the use of recursive read_lock() with qrwlock
  locking/spinlocks: Always evaluate the second argument of spin_lock_nested()
  locking/Documentation: Update locking/mutex-design.txt disadvantages
  locking/Documentation: Move locking related docs into Documentation/locking/
  locking/mutexes: Use MUTEX_SPIN_ON_OWNER when appropriate
  locking/mutexes: Refactor optimistic spinning code
  locking/mcs: Remove obsolete comment
  locking/mutexes: Document quick lock release when unlocking
  locking/mutexes: Standardize arguments in lock/unlock slowpaths
  locking: Remove deprecated smp_mb__() barriers
2014-10-13 15:51:40 +02:00
Linus Torvalds
dbb885fecc Merge branch 'locking-arch-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull arch atomic cleanups from Ingo Molnar:
 "This is a series kept separate from the main locking tree, which
  cleans up and improves various details in the atomics type handling:

   - Remove the unused atomic_or_long() method

   - Consolidate and compress atomic ops implementations between
     architectures, to reduce linecount and to make it easier to add new
     ops.

   - Rewrite generic atomic support to only require cmpxchg() from an
     architecture - generate all other methods from that"

* 'locking-arch-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read()
  locking, mips: Fix atomics
  locking, sparc64: Fix atomics
  locking,arch: Rewrite generic atomic support
  locking,arch,xtensa: Fold atomic_ops
  locking,arch,sparc: Fold atomic_ops
  locking,arch,sh: Fold atomic_ops
  locking,arch,powerpc: Fold atomic_ops
  locking,arch,parisc: Fold atomic_ops
  locking,arch,mn10300: Fold atomic_ops
  locking,arch,mips: Fold atomic_ops
  locking,arch,metag: Fold atomic_ops
  locking,arch,m68k: Fold atomic_ops
  locking,arch,m32r: Fold atomic_ops
  locking,arch,ia64: Fold atomic_ops
  locking,arch,hexagon: Fold atomic_ops
  locking,arch,cris: Fold atomic_ops
  locking,arch,avr32: Fold atomic_ops
  locking,arch,arm64: Fold atomic_ops
  locking,arch,arm: Fold atomic_ops
  ...
2014-10-13 15:48:00 +02:00
Linus Torvalds
81ae31d782 xen: features and fixes for 3.18-rc0
- Add pvscsi frontend and backend drivers.
 - Remove _PAGE_IOMAP PTE flag, freeing it for alternate uses.
 - Try and keep memory contiguous during PV memory setup (reduces
   SWIOTLB usage).
 - Allow front/back drivers to use threaded irqs.
 - Support large initrds in PV guests.
 - Fix PVH guests in preparation for Xen 4.5
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJUNonmAAoJEFxbo/MsZsTRHAQH/inCjpCT+pkvTB0YAVfVvgMI
 gUogT8G+iB2MuCNpMffGIt8TAVXwcVtnOLH9ABH3IBVehzgipIbIiVEM9YhjrYvU
 1rgIKBpmZqSpjDHoIHpdHeCH67cVnRzA/PyoxZWLxPNmQ0t6bNf9yeAcCXK9PfUc
 7EAblUDmPGSx9x/EUnOKNNaZSEiUJZHDBXbMBLllk1+5H1vfKnpFCRGMG0IrfI44
 KVP2NX9Gfa05edMZYtH887FYyjFe2KNV6LJvE7+w7h2Dy0yIzf7y86t0l4n8gETb
 plvEUJ/lu9RYzTiZY/RxgBFYVTV59EqT45brSUtoe2Jcp8GSwiHslTHdfyFBwSo=
 =gw4d
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.18-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen updates from David Vrabel:
 "Features and fixes:

   - Add pvscsi frontend and backend drivers.
   - Remove _PAGE_IOMAP PTE flag, freeing it for alternate uses.
   - Try and keep memory contiguous during PV memory setup (reduces
     SWIOTLB usage).
   - Allow front/back drivers to use threaded irqs.
   - Support large initrds in PV guests.
   - Fix PVH guests in preparation for Xen 4.5"

* tag 'stable/for-linus-3.18-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (22 commits)
  xen: remove DEFINE_XENBUS_DRIVER() macro
  xen/xenbus: Remove BUG_ON() when error string trucated
  xen/xenbus: Correct the comments for xenbus_grant_ring()
  x86/xen: Set EFER.NX and EFER.SCE in PVH guests
  xen: eliminate scalability issues from initrd handling
  xen: sync some headers with xen tree
  xen: make pvscsi frontend dependant on xenbus frontend
  arm{,64}/xen: Remove "EXPERIMENTAL" in the description of the Xen options
  xen-scsifront: don't deadlock if the ring becomes full
  x86: remove the Xen-specific _PAGE_IOMAP PTE flag
  x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappings
  x86: skip check for spurious faults for non-present faults
  xen/efi: Directly include needed headers
  xen-scsiback: clean up a type issue in scsiback_make_tpg()
  xen-scsifront: use GFP_ATOMIC under spin_lock
  MAINTAINERS: Add xen pvscsi maintainer
  xen-scsiback: Add Xen PV SCSI backend driver
  xen-scsifront: Add Xen PV SCSI frontend driver
  xen: Add Xen pvSCSI protocol description
  xen/events: support threaded irqs for interdomain event channels
  ...
2014-10-11 20:29:01 -04:00
Mel Gorman
6a33979d5b mm: remove misleading ARCH_USES_NUMA_PROT_NONE
ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented
_PAGE_NUMA using _PROT_NONE.  This saved using an additional PTE bit and
relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting
fault scanner.  This was found to be conceptually confusing with a lot of
implicit assumptions and it was asked that an alternative be found.

Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the
PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE
bits and shrunk the maximum possible swap size but it did not go far
enough.  There are no architectures that reuse _PROT_NONE as _PROT_NUMA
but the relics still exist.

This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary
duplication in powerpc vs the generic implementation by defining the types
the core NUMA helpers expected to exist from x86 with their ppc64
equivalent.  This necessitated that a PTE bit mask be created that
identified the bits that distinguish present from NUMA pte entries but it
is expected this will only differ between arches based on _PAGE_PROTNONE.
The naming for the generic helpers was taken from x86 originally but ppc64
has types that are equivalent for the purposes of the helper so they are
mapped instead of duplicating code.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:52 -04:00
Linus Torvalds
afa3536be8 Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "Main changes:

  - Fix the deadlock reported by Dave Jones et al
  - Clean up and fix nohz_full interaction with arch abilities
  - nohz init code consolidation/cleanup"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz: nohz full depends on irq work self IPI support
  nohz: Consolidate nohz full init code
  arm64: Tell irq work about self IPI support
  arm: Tell irq work about self IPI support
  x86: Tell irq work about self IPI support
  irq_work: Force raised irq work to run on irq work interrupt
  irq_work: Introduce arch_irq_work_has_interrupt()
  nohz: Move nohz full init call to tick init
2014-10-09 06:30:57 -04:00
Linus Torvalds
e4e65676f2 Fixes and features for 3.18.
Apart from the usual cleanups, here is the summary of new features:
 
 - s390 moves closer towards host large page support
 
 - PowerPC has improved support for debugging (both inside the guest and
   via gdbstub) and support for e6500 processors
 
 - ARM/ARM64 support read-only memory (which is necessary to put firmware
   in emulated NOR flash)
 
 - x86 has the usual emulator fixes and nested virtualization improvements
   (including improved Windows support on Intel and Jailhouse hypervisor
   support on AMD), adaptive PLE which helps overcommitting of huge guests.
   Also included are some patches that make KVM more friendly to memory
   hot-unplug, and fixes for rare caching bugs.
 
 Two patches have trivial mm/ parts that were acked by Rik and Andrew.
 
 Note: I will soon switch to a subkey for signing purposes.  To verify
 future signed pull requests from me, please update my key with
 "gpg --recv-keys 9B4D86F2".  You should see 3 new subkeys---the
 one for signing will be a 2048-bit RSA key, 4E6B09D7.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJUL5sPAAoJEBvWZb6bTYbyfkEP/3MNhSyn6HCjPjtjLNPAl9KL
 WpExZSUFL2+4CztpdGIsek1BeJYHmqv3+c5S+WvaWVA1aqh2R7FT1D1ErBLjgLQq
 lq23IOr+XxmC3dXQUEEk+TlD+283UzypzEG4l4UD3JYg79fE3UrXAz82SeyewJDY
 x7aPYhkZG3RHu+wAyMPasG6E3zS5LySdUtGWbiPwz5BejrhBJoJdeb2WIL/RwnUK
 7ppSLB5EoFj/uMkuyeAAdAbdfSrhHA6faDZxNdxS9k9wGutrhhfUoQ49ONrKG4dV
 sFo1tSPTVgRs8QFYUZ2fJUPBAmUVddsgqh2K9d0NftGTq7b8YszaCsfFrs2/Y4MU
 YxssWEhxsfszerCu12bbAJrv6JBZYQ7TwGvI9L7P0iFU6IVw/djmukU4AkM9/e91
 YS/cue/PN+9Pn2ccXzL9J7xRtZb8FsOuRsCXTCmbOwDkLmrKPDBN2t3RUbeF+Eam
 ABrpWnLKX13kZSo4LKU+/niarzmPMp7odQfHVdr8ea0fiYLp4iN8puA20WaSPIgd
 CLvm+RAvXe5Lm91L4mpFotJ2uFyK6QlIYJV4FsgeWv/0D0qppWQi0Utb/aCNHCgy
 z8MyUMD48y7EpoQrFYr/7cddXIu0/NegnM8I1coVjIPEk4NfeebGUlCJ/V3D8wMG
 BgEfS2x6jRc5zB3hjwDr
 =iEVi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Fixes and features for 3.18.

  Apart from the usual cleanups, here is the summary of new features:

   - s390 moves closer towards host large page support

   - PowerPC has improved support for debugging (both inside the guest
     and via gdbstub) and support for e6500 processors

   - ARM/ARM64 support read-only memory (which is necessary to put
     firmware in emulated NOR flash)

   - x86 has the usual emulator fixes and nested virtualization
     improvements (including improved Windows support on Intel and
     Jailhouse hypervisor support on AMD), adaptive PLE which helps
     overcommitting of huge guests.  Also included are some patches that
     make KVM more friendly to memory hot-unplug, and fixes for rare
     caching bugs.

  Two patches have trivial mm/ parts that were acked by Rik and Andrew.

  Note: I will soon switch to a subkey for signing purposes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits)
  kvm: do not handle APIC access page if in-kernel irqchip is not in use
  KVM: s390: count vcpu wakeups in stat.halt_wakeup
  KVM: s390/facilities: allow TOD-CLOCK steering facility bit
  KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
  arm/arm64: KVM: Report correct FSC for unsupported fault types
  arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc
  kvm: Fix kvm_get_page_retry_io __gup retval check
  arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset
  kvm: x86: Unpin and remove kvm_arch->apic_access_page
  kvm: vmx: Implement set_apic_access_page_addr
  kvm: x86: Add request bit to reload APIC access page address
  kvm: Add arch specific mmu notifier for page invalidation
  kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static
  kvm: Fix page ageing bugs
  kvm/x86/mmu: Pass gfn and level to rmapp callback.
  x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
  kvm: x86: use macros to compute bank MSRs
  KVM: x86: Remove debug assertion of non-PAE reserved bits
  kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
  kvm: Faults which trigger IO release the mmap_sem
  ...
2014-10-08 05:27:39 -04:00
Ben Hutchings
0e6d3112a4 x86: Reject x32 executables if x32 ABI not supported
It is currently possible to execve() an x32 executable on an x86_64
kernel that has only ia32 compat enabled.  However all its syscalls
will fail, even _exit().  This usually causes it to segfault.

Change the ELF compat architecture check so that x32 executables are
rejected if we don't support the x32 ABI.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Link: http://lkml.kernel.org/r/1410120305.6822.9.camel@decadent.org.uk
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-08 11:17:42 +02:00
Linus Torvalds
74da38631a Tinification for 3.18
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJUL0J0AAoJEA7Zo9+K/4c9w40P/iMFPfCethdBtPz5rI88CVr2
 7yU99TdbEPoRJm+rU4ohvHdB73p2KWINIKvpSThvegvjXbEcKxQkdpVWHsFJZeHS
 bZiYmhjxdCBvJGLrYo5IwqH0PrSjokTPzMUekUCk7BkUKNJRaDjfUBHvUmKsinUR
 dQL+3KE3edy6W3DL+FOd0QZwSOgmOfEibTWpfmg+n16kFNa75Kg/QLwjYRvtQplP
 eElywDZN07IhAeBFqKhKvlKmDSAeqMd8RfoPPo9Ts+reeIrWYjVNbl9ISOqXqy2x
 JoLeZQmwSXj/C9Ehr5e+aId2eO8In5xueQfXP8SS8dCC7VLwRbnNgyAQQZEslEBk
 QH0GhT6GqTamBdiNI3I+usfs65cEaialXh2afcoLwGS/iGD8MhZ8Dt+m4iyXNxEZ
 kT9VA4974mPjJ1g0mDDnYIxNjxF43m+SD5K1sR/XGpMcA8NdqMUmvKNcbePCobVa
 WTutIemQqGipNeWE94XwZEbc0B+aWwH7eiZOBMVGhWsHInd7QeTBTbfZlctyBkzf
 AswgsFjC5FW05CWK6J1Lf/UI1FD9PmHMKpmQUPED1+7okDTfqGjKjdREWgZSixUt
 LIRfWqWEaNpRRBFbDyt0C+F4pBRPLiRDaOyNhwEdtXuVGKRXb1G3qX7nFOJAZo6G
 GDTZo9iIRNSfm/M4tJ+n
 =2VyW
 -----END PGP SIGNATURE-----

Merge tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux

Pull "tinification" patches from Josh Triplett.

Work on making smaller kernels.

* tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux:
  bloat-o-meter: Ignore syscall aliases SyS_ and compat_SyS_
  mm: Support compiling out madvise and fadvise
  x86: Support compiling out human-friendly processor feature names
  x86: Drop support for /proc files when !CONFIG_PROC_FS
  x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK
  x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE
  x86, boot: Use the usual -y -n mechanism for objects in vmlinux
  x86: Add "make tinyconfig" to configure the tiniest possible kernel
  x86, platform, kconfig: move kvmconfig functionality to a helper
2014-10-07 08:51:59 -04:00
Matt Fleming
75b128573b Merge branch 'next' into efi-next-merge
Conflicts:
	arch/x86/boot/compressed/eboot.c
2014-10-03 22:15:56 +01:00
Matt Fleming
60b4dc7720 efi: Delete the in_nmi() conditional runtime locking
commit 5dc3826d9f08 ("efi: Implement mandatory locking for UEFI Runtime
Services") implemented some conditional locking when accessing variable
runtime services that Ingo described as "pretty disgusting".

The intention with the !efi_in_nmi() checks was to avoid live-locks when
trying to write pstore crash data into an EFI variable. Such lockless
accesses are allowed according to the UEFI specification when we're in a
"non-recoverable" state, but whether or not things are implemented
correctly in actual firmware implementations remains an unanswered
question, and so it would seem sensible to avoid doing any kind of
unsynchronized variable accesses.

Furthermore, the efi_in_nmi() tests are inadequate because they don't
account for the case where we call EFI variable services from panic or
oops callbacks and aren't executing in NMI context. In other words,
live-locking is still possible.

Let's just remove the conditional locking altogether. Now we've got the
->set_variable_nonblocking() EFI variable operation we can abort if the
runtime lock is already held. Aborting is by far the safest option.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:41:03 +01:00
Mathias Krause
4e78eb0561 x86/efi: Mark initialization code as such
The 32 bit and 64 bit implementations differ in their __init annotations
for some functions referenced from the common EFI code. Namely, the 32
bit variant is missing some of the __init annotations the 64 bit variant
has.

To solve the colliding annotations, mark the corresponding functions in
efi_32.c as initialization code, too -- as it is such.

Actually, quite a few more functions are only used during initialization
and therefore can be marked __init. They are therefore annotated, too.
Also add the __init annotation to the prototypes in the efi.h header so
users of those functions will see it's meant as initialization code
only.

This patch also fixes the "prelog" typo. ("prologue" / "epilogue" might
be more appropriate but this is C code after all, not an opera! :D)

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:41:03 +01:00
Mathias Krause
6092068547 x86/efi: Unexport add_efi_memmap variable
This variable was accidentally exported, even though it's only used in
this compilation unit and only during initialization.

Remove the bogus export, make the variable static instead and mark it
as __initdata.

Fixes: 200001eb14 ("x86 boot: only pick up additional EFI memmap...")
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:41:02 +01:00
Mathias Krause
24ffd84b60 x86/efi: Remove unused efi_call* macros
Complement commit 62fa6e69a4 ("x86/efi: Delete most of the efi_call*
macros") and delete the stub macros for the !CONFIG_EFI case, too. In
fact, there are no EFI calls in this case so we don't need a dummy for
efi_call() even.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:41:02 +01:00
Ard Biesheuvel
161485e827 efi: Implement mandatory locking for UEFI Runtime Services
According to section 7.1 of the UEFI spec, Runtime Services are not fully
reentrant, and there are particular combinations of calls that need to be
serialized. Use a spinlock to serialize all Runtime Services with respect
to all others, even if this is more than strictly needed.

We've managed to get away without requiring a runtime services lock
until now because most of the interactions with EFI involve EFI
variables, and those operations are already serialised with
__efivars->lock.

Some of the assumptions underlying the decision whether locks are
needed or not (e.g., SetVariable() against ResetSystem()) may not
apply universally to all [new] architectures that implement UEFI.
Rather than try to reason our way out of this, let's just implement at
least what the spec requires in terms of locking.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:40:57 +01:00
Pranith Kumar
2291059c85 locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read()
Use the much more reader friendly ACCESS_ONCE() instead of the cast to volatile.
This is purely a stylistic change.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/1411482607-20948-1-git-send-email-bobby.prani@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03 06:06:23 +02:00
Linus Torvalds
cd40fab6db Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This has:

   - EFI revert to fix a boot regression
   - early_ioremap() fix for boot failure
   - KASLR fix for possible boot failures
   - EFI fix for corrupted string printing
   - remove a misleading EFI bootup 'failed!' error message

  Unfortunately it's all rather close to the merge window"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Truncate 64-bit values when calling 32-bit OutputString()
  x86/efi: Delete misleading efi_printk() error message
  Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>"
  x86/kaslr: Avoid the setup_data area when picking location
  x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
2014-09-27 14:23:13 -07:00
Ingo Molnar
29282ac0bd * Revert the static library changes from the merge window since they're
causing issues for Macbooks and Fedora + Grub2 - Matt Fleming
 
  * Delete the misleading "setup_efi_pci() failed!" message which some
    people are seeing when booting EFI - Matt Fleming
 
  * Fix printing strings from the 32-bit EFI boot stub by only passing
    32-bit addresses to the firmware - Matt Fleming
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUIzOFAAoJEC84WcCNIz1VpSwQAJhp9Yu60dWXyqRpV+ER1yau
 MWHXunkeaccbBXNABkzuDUqb2a6DRIJow+/n+dYjIGY6Nf9zzLQFQ+s/EsS+IiyY
 s4rRvAqzfGYk1d6xzgvEccrdD4fRP32kFKnSlZpfTRuJZHZieD+f2y6TP7D33Ja3
 HV/ivPQHZNjxgsExpcE8Rz/QyOZpqRacTr9Gr7IusBRL6IMCyycfCTmt6d5pf1iD
 kQGSGwIBvgN4xMqPUdxTNo31bQZA5ZeywNOh9WhdSCL7FAIDfG9TmXt/J7ckq5ax
 0f6X92qgCs3peLY+/szgSZ2LsZI6I/FM1udc2SFiIOPvCwwytcJ94Wro5pLYzZ4i
 SnqB2xLLEmsR2J3MXIeY0aVy2VtHT4bYRnXYNd9G0eaVfrlJ+4lgwqJavAJmtDZx
 88ey1R8LKRQr+ueSv/BnOvE6T2+38HrjrMooFQsPvolRR0S6MITBr8I2hoRASkUt
 YsA+7s6+tO2QBmQYrKCYSAi9A7onMA9Fh93dmv7XLqFw/SsfVm3RnrNhOVsO9kPC
 zIsWZoS+PGwb4RRvM2i7JAEqUCbuLpIAHYEU6gqprWm1ERHsX9mfFYfsJQHzHuOY
 rg6+wtWQ9MGxek8POac4d2mC+PwC4DA0AkaTTZQBcdAJu+h/gZNt4w7mpk5v4Th1
 QYr/otShMvc84Zd+RMeV
 =5mVJ
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fixes from Matt Fleming:

  * Revert the static library changes from the merge window since they're
    causing issues for Macbooks and Fedora + Grub2 (Matt Fleming)

  * Delete the misleading "setup_efi_pci() failed!" message which some
    people are seeing when booting EFI (Matt Fleming)

  * Fix printing strings from the 32-bit EFI boot stub by only passing
    32-bit addresses to the firmware (Matt Fleming)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-25 16:40:08 +02:00
Tang Chen
c24ae0dcd3 kvm: x86: Unpin and remove kvm_arch->apic_access_page
In order to make the APIC access page migratable, stop pinning it in
memory.

And because the APIC access page is not pinned in memory, we can
remove kvm_arch->apic_access_page.  When we need to write its
physical address into vmcs, we use gfn_to_page() to get its page
struct, which is needed to call page_to_phys(); the page is then
immediately unpinned.

Suggested-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:08:01 +02:00
Tang Chen
4256f43f9f kvm: x86: Add request bit to reload APIC access page address
Currently, the APIC access page is pinned by KVM for the entire life
of the guest.  We want to make it migratable in order to make memory
hot-unplug available for machines that run KVM.

This patch prepares to handle this in generic code, through a new
request bit (that will be set by the MMU notifier) and a new hook
that is called whenever the request bit is processed.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:08:00 +02:00
Tang Chen
fe71557afb kvm: Add arch specific mmu notifier for page invalidation
This will be used to let the guest run while the APIC access page is
not pinned.  Because subsequent patches will fill in the function
for x86, place the (still empty) x86 implementation in the x86.c file
instead of adding an inline function in kvm_host.h.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:59 +02:00
Andres Lagar-Cavilla
5712846808 kvm: Fix page ageing bugs
1. We were calling clear_flush_young_notify in unmap_one, but we are
within an mmu notifier invalidate range scope. The spte exists no more
(due to range_start) and the accessed bit info has already been
propagated (due to kvm_pfn_set_accessed). Simply call
clear_flush_young.

2. We clear_flush_young on a primary MMU PMD, but this may be mapped
as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
This required expanding the interface of the clear_flush_young mmu
notifier, so a lot of code has been trivially touched.

3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
the access bit by blowing the spte. This requires proper synchronizing
with MMU notifier consumers, like every other removal of spte's does.

Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:58 +02:00
Paolo Bonzini
c1118b3602 x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA.
In that case, KVM will fail to patch VMCALL instructions to VMMCALL
as required on AMD processors.

The failure mode is currently a divide-by-zero exception, which obviously
is a KVM bug that has to be fixed.  However, picking the right instruction
between VMCALL and VMMCALL will be faster and will help if you cannot upgrade
the hypervisor.

Reported-by: Chris Webb <chris@arachsys.com>
Tested-by: Chris Webb <chris@arachsys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:57 +02:00
Liang Chen
77c3913b74 KVM: x86: directly use kvm_make_request again
A one-line wrapper around kvm_make_request is not particularly
useful. Replace kvm_mmu_flush_tlb() with kvm_make_request().

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:51 +02:00
Matt Fleming
84be880560 Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>"
This reverts commit f23cf8bd5c ("efi/x86: efistub: Move shared
dependencies to <asm/efi.h>") as well as the x86 parts of commit
f4f75ad574 ("efi: efistub: Convert into static library").

The road leading to these two reverts is long and winding.

The above two commits were merged during the v3.17 merge window and
turned the common EFI boot stub code into a static library. This
necessitated making some symbols global in the x86 boot stub which
introduced new entries into the early boot GOT.

The problem was that we weren't fixing up the newly created GOT entries
before invoking the EFI boot stub, which sometimes resulted in hangs or
resets. This failure was reported by Maarten on his Macbook pro.

The proposed fix was commit 9cb0e39423 ("x86/efi: Fixup GOT in all
boot code paths"). However, that caused issues for Linus when booting
his Sony Vaio Pro 11. It was subsequently reverted in commit
f3670394c2.

So that leaves us back with Maarten's Macbook pro not booting.

At this stage in the release cycle the least risky option is to revert
the x86 EFI boot stub to the pre-merge window code structure where we
explicitly #include efi-stub-helper.c instead of linking with the static
library. The arm64 code remains unaffected.

We can take another swing at the x86 parts for v3.18.

Conflicts:
	arch/x86/include/asm/efi.h

Tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Tested-by: Leif Lindholm <leif.lindholm@linaro.org> [arm64]
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-23 22:01:55 +01:00
David Vrabel
f955371ca9 x86: remove the Xen-specific _PAGE_IOMAP PTE flag
The _PAGE_IO_MAP PTE flag was only used by Xen PV guests to mark PTEs
that were used to map I/O regions that are 1:1 in the p2m.  This
allowed Xen to obtain the correct PFN when converting the MFNs read
from a PTE back to their PFN.

Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn()
returns the correct PFN by using a combination of the m2p and p2m to
determine if an MFN corresponds to a 1:1 mapping in the the p2m.

Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for
future uses of the PTE flag.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
2014-09-23 13:36:20 +00:00
Zubair Lutfullah Kakakhel
8057b30814 x86: use generic dma-contiguous.h
dma-contiguous.h is now in asm-generic. Use that to avoid code
repetition in x86.

Signed-off-by: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: catalin.marinas@arm.com
Cc: will.deacon@arm.com
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: arnd@arndb.de
Cc: gregkh@linuxfoundation.org
Cc: m.szyprowski@samsung.com
Cc: x86@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-arch@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/7359/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-09-22 13:35:52 +02:00
Linus Torvalds
7a5e87867e Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

  EFI fixes, a build fix, a page table dumping (debug) fix and a clang
  build fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/arm64: Fix fdt-related memory reservation
  x86/mm: Apply the section attribute to the variable, not its type
  x86/efi: Fixup GOT in all boot code paths
  x86/efi: Only load initrd above 4g on second try
  x86-64, ptdump: Mark espfix area only if existent
  x86, irq: Fix build error caused by 9eabc99a63
2014-09-19 09:07:47 -07:00
Ingo Molnar
43657ffb79 * Increase the number of early_ioremap slots to fix a regression with
earlyprintk=efi after recent changes to the ACPI code - Dave Young
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUG/pjAAoJEC84WcCNIz1VQToP+wfz21mZLPyurzCXycbrXi3I
 h2Jg99HhCe2TL1+9slSEJZvCFGsnspbkZtfA8OB0Ms7eomt3mY9yTwsfMVfyUZRM
 2DAWjK0eg9ZEAXcmU0tGbI4Iy9Kf/U0OPEsZ8ZKJkG1QGA+pHrKqhfdrKzP/pVV6
 iQ94OgBa1BZqljYureERvIourghN6hcwlJHqTLF66kzbFtL//2NQJIidHBcJRO0k
 CA/2U65WNEFXAtUenHHUnSM8hhN5H2NiSxPKg6p4iK6nsNcBL80rhEmMHYW80VHp
 PP7lXoCAbGARIfXbjPPNvn7aHL0qzpsF3fvNrtQRSkEwGjEod8gxHAqqzguLrK58
 G0HovhfUvEADXzjDiWCWUAIT/ryHutJLxXo6Fn2UijtAso+W/wUorqJlKNDG1dHX
 PRJKO0kk7TCKPBYxNRkds9Gt8g8uEoInr+V13qvURjwb6H9hFd8fyuL25bRUftdV
 TE4/pzm5F9YjLU2548RfXKcz5snSuOT9EYf3s4C8wTtm4tJv7GxQ+/vz000Hres3
 YlWaY2CywsrZfdBZaaF9gEtsjsdBVH+PspOGtodY0jjnz/I71H7oZNdgb8vslRLS
 6iH9+swCEm6cdytzqFXO1qp7DKLpxZJmwTHKhAvdSYzXZBvf1JMCQ6zdoXNfdNRR
 R8vYh1pvBd1tRJQ9tpJe
 =t4a9
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fix from Matt Fleming:

  * Increase the number of early_ioremap() slots to fix a regression with
    earlyprintk=efi after recent changes to the ACPI code (Dave Young)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-19 12:49:20 +02:00
Tang Chen
a255d4795f kvm: Remove ept_identity_pagetable from struct kvm_arch.
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memroy.
      As a result, it cannot be migrated/hot-removed. After this patch, since
      kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
      is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-17 13:10:11 +02:00
Luiz Capitulino
8b375f64dc x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data()
The setup_node_data() function allocates a pg_data_t object,
inserts it into the node_data[] array and initializes the
following fields: node_id, node_start_pfn and
node_spanned_pages.

However, a few function calls later during the kernel boot,
free_area_init_node() re-initializes those fields, possibly with
setup_node_data() is not used.

This causes a small glitch when running Linux as a hyperv numa
guest:

  SRAT: PXM 0 -> APIC 0x00 -> Node 0
  SRAT: PXM 0 -> APIC 0x01 -> Node 0
  SRAT: PXM 1 -> APIC 0x02 -> Node 1
  SRAT: PXM 1 -> APIC 0x03 -> Node 1
  SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
  SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff]
  SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff]
  NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff]
  Initmem setup node 0 [mem 0x00000000-0x7fffffff]
    NODE_DATA [mem 0x7ffdc000-0x7ffeffff]
  Initmem setup node 1 [mem 0x80800000-0x1081fffff]
    NODE_DATA [mem 0x1081ea000-0x1081fdfff]
  crashkernel: memory value expected
   [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0
   [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1
  Zone ranges:
    DMA      [mem 0x00001000-0x00ffffff]
    DMA32    [mem 0x01000000-0xffffffff]
    Normal   [mem 0x100000000-0x1081fffff]
  Movable zone start for each node
  Early memory node ranges
    node   0: [mem 0x00001000-0x0009efff]
    node   0: [mem 0x00100000-0x7ffeffff]
    node   1: [mem 0x80200000-0xf7ffffff]
    node   1: [mem 0x100000000-0x1081fffff]
  On node 0 totalpages: 524174
    DMA zone: 64 pages used for memmap
    DMA zone: 21 pages reserved
    DMA zone: 3998 pages, LIFO batch:0
    DMA32 zone: 8128 pages used for memmap
    DMA32 zone: 520176 pages, LIFO batch:31
  On node 1 totalpages: 524288
    DMA32 zone: 7672 pages used for memmap
    DMA32 zone: 491008 pages, LIFO batch:31
    Normal zone: 520 pages used for memmap
    Normal zone: 33280 pages, LIFO batch:7

In this dmesg, the SRAT table reports that the memory range for
node 1 starts at 0x80200000.  However, the line starting with
"Initmem" reports that node 1 memory range starts at 0x80800000.
 The "Initmem" line is reported by setup_node_data() and is
wrong, because the kernel ends up using the range as reported in
the SRAT table.

This commit drops all that dead code from setup_node_data(),
renames it to alloc_node_data() and adds a printk() to
free_area_init_node() so that we report a node's memory range
accurately.

Here's the same dmesg section with this patch applied:

   SRAT: PXM 0 -> APIC 0x00 -> Node 0
   SRAT: PXM 0 -> APIC 0x01 -> Node 0
   SRAT: PXM 1 -> APIC 0x02 -> Node 1
   SRAT: PXM 1 -> APIC 0x03 -> Node 1
   SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
   SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff]
   SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff]
   NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff]
   NODE_DATA(0) allocated [mem 0x7ffdc000-0x7ffeffff]
   NODE_DATA(1) allocated [mem 0x1081ea000-0x1081fdfff]
   crashkernel: memory value expected
    [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0
    [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1
   Zone ranges:
     DMA      [mem 0x00001000-0x00ffffff]
     DMA32    [mem 0x01000000-0xffffffff]
     Normal   [mem 0x100000000-0x1081fffff]
   Movable zone start for each node
   Early memory node ranges
     node   0: [mem 0x00001000-0x0009efff]
     node   0: [mem 0x00100000-0x7ffeffff]
     node   1: [mem 0x80200000-0xf7ffffff]
     node   1: [mem 0x100000000-0x1081fffff]
   Initmem setup node 0 [mem 0x00001000-0x7ffeffff]
   On node 0 totalpages: 524174
     DMA zone: 64 pages used for memmap
     DMA zone: 21 pages reserved
     DMA zone: 3998 pages, LIFO batch:0
     DMA32 zone: 8128 pages used for memmap
     DMA32 zone: 520176 pages, LIFO batch:31
   Initmem setup node 1 [mem 0x80200000-0x1081fffff]
   On node 1 totalpages: 524288
     DMA32 zone: 7672 pages used for memmap
     DMA32 zone: 491008 pages, LIFO batch:31
     Normal zone: 520 pages used for memmap
     Normal zone: 33280 pages, LIFO batch:7

This commit was tested on a two node bare-metal NUMA machine and
Linux as a numa guest on hyperv and qemu/kvm.

PS: The wrong memory range reported by setup_node_data() seems to be
    harmless in the current kernel because it's just not used.  However,
    that bad range is used in kernel 2.6.32 to initialize the old boot
    memory allocator, which causes a crash during boot.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-16 08:55:10 +02:00
Yasuaki Ishimatsu
9661d5bcd0 x86/mm/hotplug: Modify PGD entry when removing memory
When hot-adding/removing memory, sync_global_pgds() is called
for synchronizing PGD to PGD entries of all processes MM.  But
when hot-removing memory, sync_global_pgds() does not work
correctly.

At first, sync_global_pgds() checks whether target PGD is none
or not.  And if PGD is none, the PGD is skipped.  But when
hot-removing memory, PGD may be none since PGD may be cleared by
free_pud_table().  So when sync_global_pgds() is called after
hot-removing memory, sync_global_pgds() should not skip PGD even
if the PGD is none.  And sync_global_pgds() must clear PGD
entries of all processes MM.

Currently sync_global_pgds() does not clear PGD entries of all
processes MM when hot-removing memory.  So when hot adding
memory which is same memory range as removed memory after
hot-removing memory, following call traces are shown:

 kernel BUG at arch/x86/mm/init_64.c:206!
 ...
 [<ffffffff815e0c80>] kernel_physical_mapping_init+0x1b2/0x1d2
 [<ffffffff815ced94>] init_memory_mapping+0x1d4/0x380
 [<ffffffff8104aebd>] arch_add_memory+0x3d/0xd0
 [<ffffffff815d03d9>] add_memory+0xb9/0x1b0
 [<ffffffff81352415>] acpi_memory_device_add+0x1af/0x28e
 [<ffffffff81325dc4>] acpi_bus_device_attach+0x8c/0xf0
 [<ffffffff813413b9>] acpi_ns_walk_namespace+0xc8/0x17f
 [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
 [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
 [<ffffffff813418ed>] acpi_walk_namespace+0x95/0xc5
 [<ffffffff81326b4c>] acpi_bus_scan+0x9a/0xc2
 [<ffffffff81326bff>] acpi_scan_bus_device_check+0x8b/0x12e
 [<ffffffff81326cb5>] acpi_scan_device_check+0x13/0x15
 [<ffffffff81320122>] acpi_os_execute_deferred+0x25/0x32
 [<ffffffff8107e02b>] process_one_work+0x17b/0x460
 [<ffffffff8107edfb>] worker_thread+0x11b/0x400
 [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400
 [<ffffffff81085aef>] kthread+0xcf/0xe0
 [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
 [<ffffffff815fc76c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140

This patch clears PGD entries of all processes MM when
sync_global_pgds() is called after hot-removing memory

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-16 08:55:09 +02:00
Dave Young
3eddc69ffe x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
3.16 kernel boot fail with earlyprintk=efi, it keeps scrolling at the
bottom line of screen.

Bisected, the first bad commit is below:
commit 86dfc6f339
Author: Lv Zheng <lv.zheng@intel.com>
Date:   Fri Apr 4 12:38:57 2014 +0800

    ACPICA: Tables: Fix table checksums verification before installation.

I did some debugging by enabling both serial and efi earlyprintk, below is
some debug dmesg, seems early_ioremap fails in scroll up function due to
no free slot, see below dmesg output:

  WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:116 __early_ioremap+0x90/0x1c4()
  __early_ioremap(ed00c800, 00000c80) not found slot
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 3.17.0-rc1+ #204
  Hardware name: Hewlett-Packard HP Z420 Workstation/1589, BIOS J61 v03.15 05/09/2013
  Call Trace:
    dump_stack+0x4e/0x7a
    warn_slowpath_common+0x75/0x8e
    ? __early_ioremap+0x90/0x1c4
    warn_slowpath_fmt+0x47/0x49
    __early_ioremap+0x90/0x1c4
    ? sprintf+0x46/0x48
    early_ioremap+0x13/0x15
    early_efi_map+0x24/0x26
    early_efi_scroll_up+0x6d/0xc0
    early_efi_write+0x1b0/0x214
    call_console_drivers.constprop.21+0x73/0x7e
    console_unlock+0x151/0x3b2
    ? vprintk_emit+0x49f/0x532
    vprintk_emit+0x521/0x532
    ? console_unlock+0x383/0x3b2
    printk+0x4f/0x51
    acpi_os_vprintf+0x2b/0x2d
    acpi_os_printf+0x43/0x45
    acpi_info+0x5c/0x63
    ? __acpi_map_table+0x13/0x18
    ? acpi_os_map_iomem+0x21/0x147
    acpi_tb_print_table_header+0x177/0x186
    acpi_tb_install_table_with_override+0x4b/0x62
    acpi_tb_install_standard_table+0xd9/0x215
    ? early_ioremap+0x13/0x15
    ? __acpi_map_table+0x13/0x18
    acpi_tb_parse_root_table+0x16e/0x1b4
    acpi_initialize_tables+0x57/0x59
    acpi_table_init+0x50/0xce
    acpi_boot_table_init+0x1e/0x85
    setup_arch+0x9b7/0xcc4
    start_kernel+0x94/0x42d
    ? early_idt_handlers+0x120/0x120
    x86_64_start_reservations+0x2a/0x2c
    x86_64_start_kernel+0xf3/0x100

Quote reply from Lv.zheng about the early ioremap slot usage in this case:

"""
In early_efi_scroll_up(), 2 mapping entries will be used for the src/dst screen buffer.
In drivers/acpi/acpica/tbutils.c, we've improved the early table loading code in acpi_tb_parse_root_table().
We now need 2 mapping entries:
1. One mapping entry is used for RSDT table mapping. Each RSDT entry contains an address for another ACPI table.
2. For each entry in RSDP, we need another mapping entry to map the table to perform necessary check/override before installing it.

When acpi_tb_parse_root_table() prints something through EFI earlyprintk console, we'll have 4 mapping entries used.
The current 4 slots setting of early_ioremap() seems to be too small for such a use case.
"""

Thus increase the slot to 8 in this patch to fix this issue.
boot-time mappings become 512 page with this patch.

Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org> # v3.16
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-14 15:24:31 +01:00
Linus Torvalds
72d9310460 Make ARCH_HAS_FAST_MULTIPLIER a real config variable
It used to be an ad-hoc hack defined by the x86 version of
<asm/bitops.h> that enabled a couple of library routines to know whether
an integer multiply is faster than repeated shifts and additions.

This just makes it use the real Kconfig system instead, and makes x86
(which was the only architecture that did this) select the option.

NOTE! Even for x86, this really is kind of wrong.  If we cared, we would
probably not enable this for builds optimized for netburst (P4), where
shifts-and-adds are generally faster than multiplies.  This patch does
*not* change that kind of logic, though, it is purely a syntactic change
with no code changes.

This was triggered by the fact that we have other places that really
want to know "do I want to expand multiples by constants by hand or
not", particularly the hash generation code.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-13 11:14:53 -07:00
Frederic Weisbecker
3010279f0f x86: Tell irq work about self IPI support
x86 supports irq work self-IPIs when local apic is available. This is
partly known on runtime so lets implement arch_irq_work_has_interrupt()
accordingly.

This should be safely called after setup_arch().

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-09-13 18:38:29 +02:00
Peter Zijlstra
c5c38ef3d7 irq_work: Introduce arch_irq_work_has_interrupt()
The nohz full code needs irq work to trigger its own interrupt so that
the subsystem can work even when the tick is stopped.

Lets introduce arch_irq_work_has_interrupt() that archs can override to
tell about their support for this ability.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-09-13 18:38:07 +02:00
Linus Torvalds
7ee2d2d671 xen: bug fixes for 3.17-rc4
- Fix for PVHVM suspend/resume and migration
 - Don't pointlessly retry certain ballooning ops
 - Fix gntalloc when grefs have run out.
 - Fix PV boot if KSALR is enable or very large modules are used.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJUEZw4AAoJEFxbo/MsZsTRoHkIAJ5h287Wer2Yf4ALlFI47Esl
 AhcgVKi7WMJ6mBQUk/xpbSS3MZEuTuPJVbuiabJwRaZk9vX7w8q08yrAD8SOrCwY
 6iGooaWEZ96P5DPI5fmWBuOgt5f2wfqzAp//wl/dK/kzr6Kw63huTXa0H0fmd8BY
 Gy13pN4JEPDyvjjZWFvDcrcEUA9eTcTaXQZisBUGuGictXySr5ovz9G3EKOtJP0D
 vkmDzApijFEEjJwZMGazgipng4mwh94fW4+7bs57s6iREpMzOJDTRKrl9suaXdza
 5HA2ed7Au/qL8IwiQAFGxQpMBqNQxEGerlFfPBhRPuGQ+Ek3/pZF3BTU56w/nt4=
 =y/Ol
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.17-b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen bug fixes from David Vrabel:
 - fix for PVHVM suspend/resume and migration
 - don't pointlessly retry certain ballooning ops
 - fix gntalloc when grefs have run out.
 - fix PV boot if KSALR is enable or very large modules are used.

* tag 'stable/for-linus-3.17-b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: don't copy bogus duplicate entries into kernel page tables
  xen/gntalloc: safely delete grefs in add_grefs() undo path
  xen/gntalloc: fix oops after runnning out of grant refs
  xen/balloon: cancel ballooning if adding new memory failed
  xen/manage: Always freeze/thaw processes when suspend/resuming
2014-09-11 16:52:29 -07:00
Dave Hansen
9298b815ef x86: Add more disabled features
The original motivation for these patches was for an Intel CPU
feature called MPX.  The patch to add a disabled feature for it
will go in with the other parts of the support.

But, in the meantime, there are a few other features than MPX
that we can make assumptions about at compile-time based on
compile options.  Add them to disabled-features.h and check them
with cpu_feature_enabled().

Note that this gets rid of the last things that needed an #ifdef
CONFIG_X86_64 in cpufeature.h.  Yay!

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140911211524.C0EC332A@viggo.jf.intel.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-11 14:30:17 -07:00
Dave Hansen
381aa07a9b x86: Introduce disabled-features
I believe the REQUIRED_MASK aproach was taken so that it was
easier to consult in assembly (arch/x86/kernel/verify_cpu.S).
DISABLED_MASK does not have the same restriction, but I
implemented it the same way for consistency.

We have a REQUIRED_MASK... which does two things:
1. Keeps a list of cpuid bits to check in very early boot and
   refuse to boot if those are not present.
2. Consulted during cpu_has() checks, which allows us to
   optimize out things at compile-time.  In other words, if we
   *KNOW* we will not boot with the feature off, then we can
   safely assume that it will be present forever.

But, we don't have a similar mechanism for CPU features which
may be present but that we know we will not use.  We simply
use our existing mechanisms to repeatedly check the status of
the bit at runtime (well, the alternatives patching helps here
but it does not provide compile-time optimization).

Adding a feature to disabled-features.h allows the bit to be
checked via a new macro: cpu_feature_enabled().  Note that
for features in DISABLED_MASK, checks with this macro have
all of the benefits of an #ifdef.  Before, we would have done
this in a header:

#ifdef CONFIG_X86_INTEL_MPX
#define cpu_has_mpx cpu_has(X86_FEATURE_MPX)
#else
#define cpu_has_mpx 0
#endif

and this in the code:

	if (cpu_has_mpx)
		do_some_mpx_thing();

Now, just add your feature to DISABLED_MASK and you can do this
everywhere, and get the same benefits you would have from
#ifdefs:

	if (cpu_feature_enabled(X86_FEATURE_MPX))
		do_some_mpx_thing();

We need a new function and *not* a modification to cpu_has()
because there are cases where we actually need to check the CPU
itself, despite what features the kernel supports.  The best
example of this is a hypervisor which has no control over what
features its guests are using and where the guest does not depend
on the host for support.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140911211513.9E35E931@viggo.jf.intel.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-11 14:30:02 -07:00
Dave Hansen
c8128cceb4 x86: Axe the lightly-used cpu_has_pae
cpu_has_pae is only referenced in one place: the X86_32 kexec
code (in a file not even built on 64-bit).  It hardly warrants
its own macro, or the trouble we go to ensuring that it can't
be called in X86_64 code.

Axe the macro and replace it with a direct cpu feature check.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140911211511.AD76E774@viggo.jf.intel.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-11 14:30:01 -07:00
Stefan Bader
0b5a50635f x86/xen: don't copy bogus duplicate entries into kernel page tables
When RANDOMIZE_BASE (KASLR) is enabled; or the sum of all loaded
modules exceeds 512 MiB, then loading modules fails with a warning
(and hence a vmalloc allocation failure) because the PTEs for the
newly-allocated vmalloc address space are not zero.

  WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128
           vmap_page_range_noflush+0x2a1/0x360()

This is caused by xen_setup_kernel_pagetables() copying
level2_kernel_pgt into level2_fixmap_pgt, overwriting many non-present
entries.

Without KASLR, the normal kernel image size only covers the first half
of level2_kernel_pgt and module space starts after that.

L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[  0..255]->kernel
                                                  [256..511]->module
                          [511]->level2_fixmap_pgt[  0..505]->module

This allows 512 MiB of of module vmalloc space to be used before
having to use the corrupted level2_fixmap_pgt entries.

With KASLR enabled, the kernel image uses the full PUD range of 1G and
module space starts in the level2_fixmap_pgt. So basically:

L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel
                          [511]->level2_fixmap_pgt[0..505]->module

And now no module vmalloc space can be used without using the corrupt
level2_fixmap_pgt entries.

Fix this by properly converting the level2_fixmap_pgt entries to MFNs,
and setting level1_fixmap_pgt as read-only.

A number of comments were also using the the wrong L3 offset for
level2_kernel_pgt.  These have been corrected.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: stable@vger.kernel.org
2014-09-10 15:23:42 +01:00
Waiman Long
6157c7e1bb locking/rwlock, x86: Delete unused asm/rwlock.h and rwlock.S
This patch removes the unused asm/rwlock.h and rwlock.S files.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1408037251-45918-3-git-send-email-Waiman.Long@hp.com
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Francesco Fusco <ffusco@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-10 11:46:39 +02:00
Waiman Long
2ff810a7ef locking/rwlock, x86: Clean up asm/spinlock*.h to remove old rwlock code
As the x86 architecture now uses qrwlock for its read/write lock
implementation, it is no longer necessary to keep the old rwlock code
around. This patch removes the old rwlock code in the asm/spinlock.h
and asm/spinlock_types.h files. Now the ARCH_USE_QUEUE_RWLOCK
config parameter cannot be removed from x86/Kconfig or there will be
a compilation error.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Waiman Long <Waiman.Long@hp.com>
Link: http://lkml.kernel.org/r/1408037251-45918-2-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-10 11:46:38 +02:00
Ingo Molnar
bdea534db8 Linux 3.17-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUDOW+AAoJEHm+PkMAQRiGOXYH/00TPKm8PdM5cXXG2YYYv9eT
 W99K7KD2i0/qiVtlGgjjvB7fO3K0HcZusTd2jmVd8IWntXvauq7Zpw5YZkjwu4KX
 Y1HCwwCd2aw0FoqgrJhNP3+j5Cr1BD/HLtbffjCe+A3tppOIis4Bwt2wJOoYlXpS
 hU9Jxxc4lcRo8YKbffouDo7PIneWeJy8N+WGpUR5BfJIEK0ZZtCUqn3/3WLX4FYu
 fE6uiF/bACTpKXU/mo4dDbhZp439H/QdwQc9B0F8+8CBDMXKaNHrPV7kN36T2SWa
 fD4boikTsi/yh9Ks1fvHbvNq2N0ihoMnja+vLRyvjAcAQv2fKG3OZtYgFWSdghU=
 =Xknd
 -----END PGP SIGNATURE-----

Merge tag 'v3.17-rc4' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-09 06:48:07 +02:00
Andy Lutomirski
54eea9957f x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls
For slowpath syscalls, we initialize regs->ax to -ENOSYS and stick
the syscall number into regs->orig_ax prior to any possible tracing
and syscall execution.  This is user-visible ABI used by ptrace
syscall emulation and seccomp.

For fastpath syscalls, there's no good reason not to do the same
thing.  It's even slightly simpler than what we're currently doing.
It probably has no measureable performance impact.  It should have
no user-visible effect.

The purpose of this patch is to prepare for two-phase syscall
tracing, in which the first phase might modify the saved RAX without
leaving the fast path.  This change is just subtle enough that I'm
keeping it separate.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/01218b493f12ae2f98034b78c9ae085e38e94350.1409954077.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-08 14:14:08 -07:00
Andy Lutomirski
e0ffbaabc4 x86: Split syscall_trace_enter into two phases
This splits syscall_trace_enter into syscall_trace_enter_phase1 and
syscall_trace_enter_phase2.  Only phase 2 has full pt_regs, and only
phase 2 is permitted to modify any of pt_regs except for orig_ax.

The intent is that phase 1 can be called from the syscall fast path.

In this implementation, phase1 can handle any combination of
TIF_NOHZ (RCU context tracking), TIF_SECCOMP, and TIF_SYSCALL_AUDIT,
unless seccomp requests a ptrace event, in which case phase2 is
forced.

In principle, this could yield a big speedup for TIF_NOHZ as well as
for TIF_SECCOMP if syscall exit work were similarly split up.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2df320a600020fda055fccf2b668145729dd0c04.1409954077.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-08 14:14:03 -07:00
Ingo Molnar
196cf35842 x86/tty/serial/8250: Clean up the asm/serial.h include file a bit
- correct spelling
 - align fields vertically to make things more readable
 - make the layout of magic defines more obvious

Cc: Mark Rustad <mark.d.rustad@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Link: http://lkml.kernel.org/r/1409972149-26272-1-git-send-email-jeffrey.t.kirsher@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-06 10:20:55 +02:00
Mark Rustad
9ea029f12a x86/tty/serial/8250: Resolve missing-field-initializers warnings
Resolve some missing-field-initializers warnings by using
designated initialization in the expansion of the
SERIAL_PORT_DFNS macro.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Link: http://lkml.kernel.org/r/1409972149-26272-1-git-send-email-jeffrey.t.kirsher@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-06 10:20:53 +02:00
Paolo Bonzini
54987b7afa KVM: x86: propagate exception from permission checks on the nested page fault
Currently, if a permission error happens during the translation of
the final GPA to HPA, walk_addr_generic returns 0 but does not fill
in walker->fault.  To avoid this, add an x86_exception* argument
to the translate_gpa function, and let it fill in walker->fault.
The nested_page_fault field will be true, since the walk_mmu is the
nested_mmu and translate_gpu instead operates on the "outer" (NPT)
instance.

Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-05 12:01:13 +02:00
Paolo Bonzini
ef54bcfeea KVM: x86: skip writeback on injection of nested exception
If a nested page fault happens during emulation, we will inject a vmexit,
not a page fault.  However because writeback happens after the injection,
we will write ctxt->eip from L2 into the L1 EIP.  We do not write back
if an instruction caused an interception vmexit---do the same for page
faults.

Suggested-by: Gleb Natapov <gleb@kernel.org>
Reviewed-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-05 12:01:06 +02:00
David Matlack
56f17dd3fb kvm: x86: fix stale mmio cache bug
The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
up to userspace:

(1) Guest accesses gpa X without a memory slot. The gfn is cached in
struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
the SPTE write-execute-noread so that future accesses cause
EPT_MISCONFIGs.

(2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
covering the page just accessed.

(3) Guest attempts to read or write to gpa X again. On Intel, this
generates an EPT_MISCONFIG. The memory slot generation number that
was incremented in (2) would normally take care of this but we fast
path mmio faults through quickly_check_mmio_pf(), which only checks
the per-vcpu mmio cache. Since we hit the cache, KVM passes a
KVM_EXIT_MMIO up to userspace.

This patch fixes the issue by using the memslot generation number
to validate the mmio cache.

Cc: stable@vger.kernel.org
Signed-off-by: David Matlack <dmatlack@google.com>
[xiaoguangrong: adjust the code to make it simpler for stable-tree fix.]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-03 10:03:42 +02:00
Oleg Nesterov
31d963389f x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()
__thread_fpu_begin() checks X86_FEATURE_EAGER_FPU by hand, we have
a helper for that.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140902175720.GA21656@redhat.com
Reviewed-by: Suresh Siddha <sbsiddha@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-02 14:51:15 -07:00
Matthew Wilcox
bb693f13a0 x86: Remove set_pmd_pfn
The last user of set_pmd_pfn() went away in commit f03574f2d5, so this
has been dead code for over a year.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

 arch/x86/include/asm/pgtable_32.h |    3 ---
 arch/x86/mm/pgtable_32.c          |   35 -----------------------------------
 2 files changed, 38 deletions(-)
2014-09-01 10:15:31 +02:00
Jiang Liu
f3761db164 x86, irq: Fix build error caused by 9eabc99a63
Commit 9eabc99a63 causes following build error when
IOAPIC is disabled.
   arch/x86/pci/irq.c: In function 'pirq_disable_irq':
>> arch/x86/pci/irq.c:1259:2: error: implicit declaration of function 'mp_should_keep_irq' [-Werror=implicit-function-declaration]
     if (io_apic_assign_pci_irqs && !mp_should_keep_irq(&dev->dev) &&
     ^
   cc1: some warnings being treated as errors

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1409382916-10649-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-09-01 10:12:03 +02:00
Linus Torvalds
fd5984d7c8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "One patch to avoid assigning interrupts we don't actually have on
  non-PC platforms, and two patches that addresses bugs in the new
  IOAPIC assignment code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, irq, PCI: Keep IRQ assignment for runtime power management
  x86: irq: Fix bug in setting IOAPIC pin attributes
  x86: Fix non-PC platform kernel crash on boot due to NULL dereference
2014-08-29 17:22:27 -07:00
Hugh Dickins
b38af4721f x86,mm: fix pte_special versus pte_numa
Sasha Levin has shown oopses on ffffea0003480048 and ffffea0003480008 at
mm/memory.c:1132, running Trinity on different 3.16-rc-next kernels:
where zap_pte_range() checks page->mapping to see if PageAnon(page).

Those addresses fit struct pages for pfns d2001 and d2000, and in each
dump a register or a stack slot showed d2001730 or d2000730: pte flags
0x730 are PCD ACCESSED PROTNONE SPECIAL IOMAP; and Sasha's e820 map has
a hole between cfffffff and 100000000, which would need special access.

Commit c46a7c817e ("x86: define _PAGE_NUMA by reusing software bits on
the PMD and PTE levels") has broken vm_normal_page(): a PROTNONE SPECIAL
pte no longer passes the pte_special() test, so zap_pte_range() goes on
to try to access a non-existent struct page.

Fix this by refining pte_special() (SPECIAL with PRESENT or PROTNONE) to
complement pte_numa() (SPECIAL with neither PRESENT nor PROTNONE).  A
hint that this was a problem was that c46a7c817e added pte_numa() test
to vm_normal_page(), and moved its is_zero_pfn() test from slow to fast
path: This was papering over a pte_special() snag when the zero page was
encountered during zap.  This patch reverts vm_normal_page() to how it
was before, relying on pte_special().

It still appears that this patch may be incomplete: aren't there other
places which need to be handling PROTNONE along with PRESENT?  For
example, pte_mknuma() clears _PAGE_PRESENT and sets _PAGE_NUMA, but on a
PROT_NONE area, that would make it pte_special().  This is side-stepped
by the fact that NUMA hinting faults skipped PROT_NONE VMAs and there
are no grounds where a NUMA hinting fault on a PROT_NONE VMA would be
interesting.

Fixes: c46a7c817e ("x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels")
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: <stable@vger.kernel.org>	[3.16]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29 16:28:16 -07:00
Radim Krčmář
13a34e067e KVM: remove garbage arg to *hardware_{en,dis}able
In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.

Remove unnecessary arguments that stem from this.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 16:35:55 +02:00
Paolo Bonzini
656473003b KVM: forward declare structs in kvm_types.h
Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid
"'struct foo' declared inside parameter list" warnings (and consequent
breakage due to conflicting types).

Move them from individual files to a generic place in linux/kvm_types.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 16:35:53 +02:00
Jiang Liu
9eabc99a63 x86, irq, PCI: Keep IRQ assignment for runtime power management
Now IOAPIC driver dynamically allocates IRQ numbers for IOAPIC pins.
We need to keep IRQ assignment for PCI devices during runtime power
management, otherwise it may cause failure of device wakeups.

Commit 3eec595235 "x86, irq, PCI: Keep IRQ assignment for PCI
devices during suspend/hibernation" has fixed the issue for suspend/
hibernation, we also need the same fix for runtime device sleep too.

Fix: https://bugzilla.kernel.org/show_bug.cgi?id=83271
Reported-and-Tested-by: EmanueL Czirai <amanual@openmailbox.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: EmanueL Czirai <amanual@openmailbox.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1409304383-18806-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-08-29 13:38:00 +02:00
Christoph Lameter
4ba2968420 percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
__get_cpu_var can paper over differences in the definitions of
cpumask_var_t and either use the address of the cpumask variable
directly or perform a fetch of the address of the struct cpumask
allocated elsewhere. This is important particularly when using per cpu
cpumask_var_t declarations because in one case we have an offset into
a per cpu area to handle and in the other case we need to fetch a
pointer from the offset.

This patch introduces a new macro

this_cpu_cpumask_var_ptr()

that is defined where cpumask_var_t is defined and performs the proper
actions. All use cases where __get_cpu_var is used with cpumask_var_t
are converted to the use of this_cpu_cpumask_var_ptr().

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-28 08:58:57 -04:00
Christoph Lameter
e16321709c uv: Replace __get_cpu_var
Use __this_cpu_read instead.

Cc: Hedi Berriche <hedi@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26 13:45:50 -04:00
Christoph Lameter
89cbc76768 x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x).  This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area.  __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset.  Thereby address calculations are avoided and less registers
are used when code is generated.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y)

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y)
	__get_cpu_var(y) = x;

   Converts to

	__this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++

   Converts to

	__this_cpu_inc(y)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26 13:45:49 -04:00
Ingo Molnar
83bc90e115 Merge branch 'linus' into perf/core, to fix conflicts
Conflicts:
	arch/x86/kernel/cpu/perf_event_intel_uncore*.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-24 22:32:24 +02:00
Ingo Molnar
44afe60294 A bunch of cleanups from Henrique.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJT9z1RAAoJEBLB8Bhh3lVKA0UP/0sCiH8JtSaP+qdvuU5xtldM
 L1O76MrQ1De6BO1XqjucsxClbVPPxBE3nBGzxfSoIMdil0A8PmHmJj6/5GILAosd
 kX7LxoFVOFOQg+hekXkcd4t4lmiTPcmfgzywOnLykB9Iv4Bn12RpPhGjkgywHhGC
 6Rvh9cawqJb2rAui2zseK9RAHHhTasXqo7vWAm5kQPFE98fV6gp4Pcf/iJb8Rd09
 7Rp5f5vSvWkmCKx3nCzNGGkZgvwVD+sJb9jQJ6QFaoPYmNa0Te3mJ5nl32aNrsct
 5s6BPEimHJc4lt5IUMc1MRhg2DNrdmcUxYaonOraeG59XRtSxH9OvOSCS291qd/m
 lZOADMjxD/d0mvpY937UOOiZ2yC0RDNx2OQDnQTEWYtGJnzqq1myVVEAo6bkPYhd
 2JrqtElUY9o2iO2yaFge9/Po4rAtvHlgzKcdldSUOk6xLkj1FpWZK+PqO5vqKets
 KltJQupwUMgXCQ9jCF+Y4mdyuncRm/GhhCYsiWr16Xjp/znX6hS6i1DK9j70uim3
 7Uu09fdlP7V3dFiX2cuKwH2tQ+jcAIOoZjtQKwhSLOrva/BWqSOFbX9W3UxQH0X9
 5JVxTn6Rutr6pn7rm9zG82Xq+UueQMEzE1kklwjmmiNI2044PCvuYtWvfTY4pJ71
 oW09NvqBAlMMaWDJ92ps
 =brfb
 -----END PGP SIGNATURE-----

Merge tag 'microcode_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/microcode

Pull x86/microcode updates from Borislav Petkov:

   "A bunch of cleanups from Henrique."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-24 11:27:42 +02:00
Radim Krčmář
ae97a3b818 KVM: x86: introduce sched_in to kvm_x86_ops
sched_in preempt notifier is available for x86, allow its use in
specific virtualization technlogies as well.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-21 18:45:22 +02:00
Ross Zwisler
448466b723 x86: Remove obsolete comment in uapi/e820.h
A comment introduced by this old commit:

  028b785888 ("x86 boot: extend some internal memory map arrays to handle larger EFI input")

had to do with some nested preprocessor directives.  The
directives were split into separate files by this commit:

  af170c5061 ("UAPI: (Scripted) Disintegrate arch/x86/include/asm")

The comment explaining their interaction was retained and is now
present in arch/x86/include/uapi/asm/e820.h.  This comment is no
longer correct, so delete it.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Link: http://lkml.kernel.org/r/1400521824-21040-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-21 08:43:39 +02:00
Paolo Bonzini
0d234daf7e Revert "KVM: x86: Increase the number of fixed MTRR regs to 10"
This reverts commit 682367c494,
which causes 32-bit SMP Windows 7 guests to panic.

SeaBIOS has a limit on the number of MTRRs that it can handle,
and this patch exceeded the limit.  Better revert it.
Thanks to Nadav Amit for debugging the cause.

Cc: stable@nongnu.org
Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:28 +02:00
Wanpeng Li
4473b570a7 KVM: x86: drop fpu_activate hook
The only user of the fpu_activate hook was dropped in commit
2d04a05bd7 (KVM: x86 emulator: emulate CLTS internally, 2011-04-20).
vmx_fpu_activate and svm_fpu_activate are still called on #NM (and for
Intel CLTS), but never from common code; hence, there's no need for
a hook.

Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-19 15:12:28 +02:00
Josh Triplett
9def39be4e x86: Support compiling out human-friendly processor feature names
The table mapping CPUID bits to human-readable strings takes up a
non-trivial amount of space, and only exists to support /proc/cpuinfo
and a couple of kernel messages.  Since programs depend on the format of
/proc/cpuinfo, force inclusion of the table when building with /proc
support; otherwise, support omitting that table to save space, in which
case the kernel messages will print features numerically instead.

In addition to saving 1408 bytes out of vmlinux, this also saves 1373
bytes out of the uncompressed setup code, which contributes directly to
the size of bzImage.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
2014-08-17 15:54:00 -07:00
Linus Torvalds
49899007b9 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull idle update from Len Brown:
 "Two Intel-platform-specific updates to intel_idle, and a cosmetic
  tweak to the turbostat utility"

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: tweak whitespace in output format
  intel_idle: Broadwell support
  intel_idle: Disable Baytrail Core and Module C6 auto-demotion
2014-08-16 09:25:34 -06:00
Len Brown
8c058d53f6 intel_idle: Disable Baytrail Core and Module C6 auto-demotion
Power efficiency improves on Baytrail (Intel Atom Processor E3000)
when Linux disables C6 auto-demotion.

Based on work by Srinidhi Kasagar <srinidhi.kasagar@intel.com>.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
2014-08-15 17:06:14 -04:00
Peter Zijlstra
f6b4ecee0e locking,x86: Kill atomic_or_long()
There are no users, kill it.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20140508135851.768177189@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-14 12:48:02 +02:00
Linus Torvalds
81c02a21b2 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/apic updates from Thomas Gleixner:
 "This is a major overhaul to the x86 apic subsystem consisting of the
  following parts:

   - Remove obsolete APIC driver abstractions (David Rientjes)

   - Use the irqdomain facilities to dynamically allocate IRQs for
     IOAPICs.  This is a prerequisite to enable IOAPIC hotplug support,
     and it also frees up wasted vectors (Jiang Liu)

   - Misc fixlets.

  Despite the hickup in Ingos previous pull request - caused by the
  missing fixup for the suspend/resume issue reported by Borislav - I
  strongly recommend that this update finds its way into 3.17.  Some
  history for you:

  This is preparatory work for physical IOAPIC hotplug.  The first
  attempt to support this was done by Yinghai and I shot it down because
  it just added another layer of obscurity and complexity to the already
  existing mess without tackling the underlying shortcomings of the
  current implementation.

  After quite some on- and offlist discussions, I requested that the
  design of this functionality must use generic infrastructure, i.e.
  irq domains, which provide all the mechanisms to dynamically map linux
  interrupt numbers to physical interrupts.

  Jiang picked up the idea and did a great job of consolidating the
  existing interfaces to manage the x86 (IOAPIC) interrupt system by
  utilizing irq domains.

  The testing in tip, Linux-next and inside of Intel on various machines
  did not unearth any oddities until Borislav exposed it to one of his
  oddball machines.  The issue was resolved quickly, but unfortunately
  the fix fell through the cracks and did not hit the tip tree before
  Ingo sent the pull request.  Not entirely Ingos fault, I also assumed
  that the fix was already merged when Ingo asked me whether he could
  send it.

  Nevertheless this work has a proper design, has undergone several
  rounds of review and the final fallout after applying it to tip and
  integrating it into Linux-next has been more than moderate.  It's the
  ground work not only for IOAPIC hotplug, it will also allow us to move
  the lowlevel vector allocation into the irqdomain hierarchy, which
  will benefit other architectures as well.  Patches are posted already,
  but they are on hold for two weeks, see below.

  I really appreciate the competence and responsiveness Jiang has shown
  in course of this endavour.  So I'm sure that any fallout of this will
  be addressed in a timely manner.

  FYI, I'm vanishing for 2 weeks into my annual kids summer camp kitchen
  duty^Wvacation, while you folks are drooling at KS/LinuxCon :) But HPA
  will have a look at the hopefully zero fallout until I'm back"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation
  x86/apic/vsmp: Make is_vsmp_box() static
  x86, apic: Remove enable_apic_mode callback
  x86, apic: Remove setup_portio_remap callback
  x86, apic: Remove multi_timer_check callback
  x86, apic: Replace noop_check_apicid_used
  x86, apic: Remove check_apicid_present callback
  x86, apic: Remove mps_oem_check callback
  x86, apic: Remove smp_callin_clear_local_apic callback
  x86, apic: Replace trampoline physical addresses with defaults
  x86, apic: Remove x86_32_numa_cpu_node callback
  x86: intel-mid: Use the new io_apic interfaces
  x86, vsmp: Remove is_vsmp_box() from apic_is_clustered_box()
  x86, irq: Clean up irqdomain transition code
  x86, irq, devicetree: Release IOAPIC pin when PCI device is disabled
  x86, irq, SFI: Release IOAPIC pin when PCI device is disabled
  x86, irq, mpparse: Release IOAPIC pin when PCI device is disabled
  x86, irq, ACPI: Release IOAPIC pin when PCI device is disabled
  x86, irq: Introduce helper functions to release IOAPIC pin
  x86, irq: Simplify the way to handle ISA IRQ
  ...
2014-08-13 18:23:32 -06:00
Linus Torvalds
7453f33b2e Merge branch 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/xsave changes from Peter Anvin:
 "This is a patchset to support the XSAVES instruction required to
  support context switch of supervisor-only features in upcoming
  silicon.

  This patchset missed the 3.16 merge window, which is why it is based
  on 3.15-rc7"

* 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, xsave: Add forgotten inline annotation
  x86/xsaves: Clean up code in xstate offsets computation in xsave area
  x86/xsave: Make it clear that the XSAVE macros use (%edi)/(%rdi)
  Define kernel API to get address of each state in xsave area
  x86/xsaves: Enable xsaves/xrstors
  x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf
  x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time
  x86/xsaves: Add xsaves and xrstors support for booting time
  x86/xsaves: Clear reserved bits in xsave header
  x86/xsaves: Use xsave/xrstor for saving and restoring user space context
  x86/xsaves: Use xsaves/xrstors for context switch
  x86/xsaves: Use xsaves/xrstors to save and restore xsave area
  x86/xsaves: Define a macro for handling xsave/xrstor instruction fault
  x86/xsaves: Define macros for xsave instructions
  x86/xsaves: Change compacted format xsave area header
  x86/alternative: Add alternative_input_2 to support alternative with two features and input
  x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors
2014-08-13 18:20:04 -06:00
Andi Kleen
86a04461a9 perf/x86: Revamp PEBS event selection
The basic idea is that it does not make sense to list all PEBS
events individually. The list is very long, sometimes outdated
and the hardware doesn't need it. If an event does not support
PEBS it will just not count, there is no security issue.

We need to only list events that something special, like
supporting load or store addresses.

This vastly simplifies the PEBS event selection. It also
speeds up the scheduling because the scheduler doesn't
have to walk as many constraints.

Bugs fixed:

 - We do not allow setting forbidden flags with PEBS anymore
   (SDM 18.9.4), except for the special cycle event.
   This is done using a new constraint macro that also
   matches on the event flags.

 - Correct DataLA and load/store/na flags reporting on Haswell
   [Requires a followon patch]

 - We did not allow all PEBS events on Haswell:
   We were missing some valid subevents in d1-d2 (MEM_LOAD_UOPS_RETIRED.*,
   MEM_LOAD_UOPS_RETIRED_L3_HIT_RETIRED.*)

This includes the changes proposed by Stephane earlier and obsoletes
his patchkit (except for some changes on pre Sandy Bridge/Silvermont
CPUs)

I only did Sandy Bridge and Silvermont and later so far, mostly because these
are the parts I could directly confirm the hardware behavior with hardware
architects. Also I do not believe the older CPUs have any
missing events in their PEBS list, so there's no pressing
need to change them.

I did not implement the flag proposed by Peter to allow
setting forbidden flags. If really needed this could
be implemented on to of this patch.

v2: Fix broken store events on SNB/IVB (Stephane Eranian)
v3: More fixes. Rename some arguments (Stephane Eranian)
v4: List most Haswell events individually again to report
memory operation type correctly.
Add new flags to describe load/store/na for datala.
Update description.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1407785233-32193-2-git-send-email-eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Mark Davies <junk@eslaf.co.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-13 07:51:13 +02:00
Vivek Goyal
dd5f726076 kexec: support for kexec on panic using new system call
This patch adds support for loading a kexec on panic (kdump) kernel usning
new system call.

It prepares ELF headers for memory areas to be dumped and for saved cpu
registers.  Also prepares the memory map for second kernel and limits its
boot to reserved areas only.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:33 -07:00
Vivek Goyal
27f48d3e63 kexec-bzImage64: support for loading bzImage using 64bit entry
This is loader specific code which can load bzImage and set it up for
64bit entry.  This does not take care of 32bit entry or real mode entry.

32bit mode entry can be implemented if somebody needs it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:33 -07:00
Andy Lutomirski
a6c19dfe39 arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area
The core mm code will provide a default gate area based on
FIXADDR_USER_START and FIXADDR_USER_END if
!defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR).

This default is only useful for ia64.  arm64, ppc, s390, sh, tile, 64-bit
UML, and x86_32 have their own code just to disable it.  arm, 32-bit UML,
and x86_64 have gate areas, but they have their own implementations.

This gets rid of the default and moves the code into ia64.

This should save some code on architectures without a gate area: it's now
possible to inline the gate_area functions in the default case.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle]
Acked-by: Richard Weinberger <richard@nod.at> [for um]
Acked-by: Will Deacon <will.deacon@arm.com> [for arm64]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nathan Lynch <Nathan_Lynch@mentor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:27 -07:00
Laura Abbott
308c09f17d lib/scatterlist: make ARCH_HAS_SG_CHAIN an actual Kconfig
Rather than have architectures #define ARCH_HAS_SG_CHAIN in an
architecture specific scatterlist.h, make it a proper Kconfig option and
use that instead.  At same time, remove the header files are are now
mostly useless and just include asm-generic/scatterlist.h.

[sfr@canb.auug.org.au: powerpc files now need asm/dma.h]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>			[x86]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	[powerpc]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:26 -07:00
Linus Torvalds
7725131982 ACPI and power management updates for 3.17-rc1
- ACPICA update to upstream version 20140724.  That includes
    ACPI 5.1 material (support for the _CCA and _DSD predefined names,
    changes related to the DMAR and PCCT tables and ARM support among
    other things) and cleanups related to using ACPICA's header files.
    A major part of it is related to acpidump and the core code used
    by that utility.  Changes from Bob Moore, David E Box, Lv Zheng,
    Sascha Wildner, Tomasz Nowicki, Hanjun Guo.
 
  - Radix trees for memory bitmaps used by the hibernation core from
    Joerg Roedel.
 
  - Support for waking up the system from suspend-to-idle (also known
    as the "freeze" sleep state) using ACPI-based PCI wakeup signaling
    (Rafael J Wysocki).
 
  - Fixes for issues related to ACPI button events (Rafael J Wysocki).
 
  - New device ID for an ACPI-enumerated device included into the
    Wildcat Point PCH from Jie Yang.
 
  - ACPI video updates related to backlight handling from Hans de Goede
    and Linus Torvalds.
 
  - Preliminary changes needed to support ACPI on ARM from Hanjun Guo
    and Graeme Gregory.
 
  - ACPI PNP core cleanups from Arjun Sreedharan and Zhang Rui.
 
  - Cleanups related to ACPI_COMPANION() and ACPI_HANDLE() macros
    (Rafael J Wysocki).
 
  - ACPI-based device hotplug cleanups from Wei Yongjun and
    Rafael J Wysocki.
 
  - Cleanups and improvements related to system suspend from
    Lan Tianyu, Randy Dunlap and Rafael J Wysocki.
 
  - ACPI battery cleanup from Wei Yongjun.
 
  - cpufreq core fixes from Viresh Kumar.
 
  - Elimination of a deadband effect from the cpufreq ondemand
    governor and intel_pstate driver cleanups from Stratos Karafotis.
 
  - 350MHz CPU support for the powernow-k6 cpufreq driver from
    Mikulas Patocka.
 
  - Fix for the imx6 cpufreq driver from Anson Huang.
 
  - cpuidle core and governor cleanups from Daniel Lezcano,
    Sandeep Tripathy and Mohammad Merajul Islam Molla.
 
  - Build fix for the big_little cpuidle driver from Sachin Kamat.
 
  - Configuration fix for the Operation Performance Points (OPP)
    framework from Mark Brown.
 
  - APM cleanup from Jean Delvare.
 
  - cpupower utility fixes and cleanups from Peter Senna Tschudin,
    Andrey Utkin, Himangi Saraogi, Rickard Strandqvist, Thomas Renninger.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJT4nhtAAoJEILEb/54YlRxtZEP/2rtVQFSFdAW8l0Xm1SeSsl4
 EnZpSNT1TFn+NdG23vSIot5Jzdz1/dLfeoJEbXpoVt4DPC9/PK4HPlv5FEDQYfh5
 srftvvGcAva969sXzSBRNUeR+M8Yd2RdoYCfmqTEUjzf8GJLL4jC0VAIwMtsQklt
 EbiQX8JaHQS7RIql7MDg1N2vaTo+zxkf39Kkcl56usmO/uATP7cAPjFreF/xQ3d8
 OyBhz1cOXIhPw7bd9Dv9AgpJzA8WFpktDYEgy2sluBWMv+mLYjdZRCFkfpIRzmea
 pt+hJDeAy8ZL6/bjWCzz2x6wG7uJdDLblreI28sgnJx/VHR3Co6u4H1BqUBj18ct
 CHV6zQ55WFmx9/uJqBtwFy333HS2ysJziC5ucwmg8QjkvAn4RK8S0qHMfRvSSaHj
 F9ejnHGxyrc3zzfsngUf/VXIp67FReaavyKX3LYxjHjMPZDMw2xCtCWEpUs52l2o
 fAbkv8YFBbUalIv0RtELH5XnKQ2ggMP8UgvT74KyfXU6LaliH8lEV20FFjMgwrPI
 sMr2xk04eS8mNRNAXL8OMMwvh6DY/Qsmb7BVg58RIw6CdHeFJl834yztzcf7+j56
 4oUmA16QYBCFA3udGQ3Tb07mi8XTfrMdTOGA0koQG9tjswKXuLUXUk9WAXZe4vml
 ItRpZKE86BCs3mLJMYre
 =ZODv
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management updates from Rafael Wysocki:
 "Again, ACPICA leads the pack (47 commits), followed by cpufreq (18
  commits) and system suspend/hibernation (9 commits).

  From the new code perspective, the ACPICA update brings ACPI 5.1 to
  the table, including a new device configuration object called _DSD
  (Device Specific Data) that will hopefully help us to operate device
  properties like Device Trees do (at least to some extent) and changes
  related to supporting ACPI on ARM.

  Apart from that we have hibernation changes making it use radix trees
  to store memory bitmaps which should speed up some operations carried
  out by it quite significantly.  We also have some power management
  changes related to suspend-to-idle (the "freeze" sleep state) support
  and more preliminary changes needed to support ACPI on ARM (outside of
  ACPICA).

  The rest is fixes and cleanups pretty much everywhere.

  Specifics:

   - ACPICA update to upstream version 20140724.  That includes ACPI 5.1
     material (support for the _CCA and _DSD predefined names, changes
     related to the DMAR and PCCT tables and ARM support among other
     things) and cleanups related to using ACPICA's header files.  A
     major part of it is related to acpidump and the core code used by
     that utility.  Changes from Bob Moore, David E Box, Lv Zheng,
     Sascha Wildner, Tomasz Nowicki, Hanjun Guo.

   - Radix trees for memory bitmaps used by the hibernation core from
     Joerg Roedel.

   - Support for waking up the system from suspend-to-idle (also known
     as the "freeze" sleep state) using ACPI-based PCI wakeup signaling
     (Rafael J Wysocki).

   - Fixes for issues related to ACPI button events (Rafael J Wysocki).

   - New device ID for an ACPI-enumerated device included into the
     Wildcat Point PCH from Jie Yang.

   - ACPI video updates related to backlight handling from Hans de Goede
     and Linus Torvalds.

   - Preliminary changes needed to support ACPI on ARM from Hanjun Guo
     and Graeme Gregory.

   - ACPI PNP core cleanups from Arjun Sreedharan and Zhang Rui.

   - Cleanups related to ACPI_COMPANION() and ACPI_HANDLE() macros
     (Rafael J Wysocki).

   - ACPI-based device hotplug cleanups from Wei Yongjun and Rafael J
     Wysocki.

   - Cleanups and improvements related to system suspend from Lan
     Tianyu, Randy Dunlap and Rafael J Wysocki.

   - ACPI battery cleanup from Wei Yongjun.

   - cpufreq core fixes from Viresh Kumar.

   - Elimination of a deadband effect from the cpufreq ondemand governor
     and intel_pstate driver cleanups from Stratos Karafotis.

   - 350MHz CPU support for the powernow-k6 cpufreq driver from Mikulas
     Patocka.

   - Fix for the imx6 cpufreq driver from Anson Huang.

   - cpuidle core and governor cleanups from Daniel Lezcano, Sandeep
     Tripathy and Mohammad Merajul Islam Molla.

   - Build fix for the big_little cpuidle driver from Sachin Kamat.

   - Configuration fix for the Operation Performance Points (OPP)
     framework from Mark Brown.

   - APM cleanup from Jean Delvare.

   - cpupower utility fixes and cleanups from Peter Senna Tschudin,
     Andrey Utkin, Himangi Saraogi, Rickard Strandqvist, Thomas
     Renninger"

* tag 'pm+acpi-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (118 commits)
  ACPI / LPSS: add LPSS device for Wildcat Point PCH
  ACPI / PNP: Replace faulty is_hex_digit() by isxdigit()
  ACPICA: Update version to 20140724.
  ACPICA: ACPI 5.1: Update for PCCT table changes.
  ACPICA/ARM: ACPI 5.1: Update for GTDT table changes.
  ACPICA/ARM: ACPI 5.1: Update for MADT changes.
  ACPICA/ARM: ACPI 5.1: Update for FADT changes.
  ACPICA: ACPI 5.1: Support for the _CCA predifined name.
  ACPICA: ACPI 5.1: New notify value for System Affinity Update.
  ACPICA: ACPI 5.1: Support for the _DSD predefined name.
  ACPICA: Debug object: Add current value of Timer() to debug line prefix.
  ACPICA: acpihelp: Add UUID support, restructure some existing files.
  ACPICA: Utilities: Fix local printf issue.
  ACPICA: Tables: Update for DMAR table changes.
  ACPICA: Remove some extraneous printf arguments.
  ACPICA: Update for comments/formatting. No functional changes.
  ACPICA: Disassembler: Add support for the ToUUID opererator (macro).
  ACPICA: Remove a redundant cast to acpi_size for ACPI_OFFSET() macro.
  ACPICA: Work around an ancient GCC bug.
  ACPI / processor: Make it possible to get local x2apic id via _MAT
  ...
2014-08-06 20:34:19 -07:00
Linus Torvalds
930e0312bc sound updates for 3.17-rc1
There've been many updates in ASoC side at this time, especially the
 framework enhancement for multiple CODECs on a single DAI and more
 componentization works.  The only major change in ALSA core is the
 addition of timestamp type in sw_params field.  This should behave in
 backward compatible way.  Other than that, there are lots of small
 changes and new drivers in wide range, including a large code cut in
 HD-audio driver for deprecated static quirks.  Some highlights are
 below:
 
 ALSA Core:
 - Add the new timestamp type field to sw_params to choose
   MONOTONIC_RAW type
 
 HD-audio:
 - Continued conversion to standard printk macros, generic code
   cleanups
 - Removal of obsoleted static quirk codes for Conexant and C-Media
   codecs
 - Fixups for HP Envy TS, Dell XPS 15, HP and Dell mute/mic LED,
   Gigabyte BXBT-2807 mobo
 - Intel Braswell support
 
 ASoC:
 - Support for multiple CODECs attached to a single DAI, enabling
   systems with for example multiple DAC/speaker drivers on a single
  link, contributed by Benoit Cousson based on work from Misael Lopez
  Cruz
 - Support for byte controls larger than 256 bytes based on the use of
   TLVs contributed by Omair Mohammed Abdullah
 - More componentisation work from Lars-Peter Clausen
 - The remainder of the conversions of CODEC drivers to params_width()
   by Mark Brown
 - Drivers for Cirrus Logic CS4265, Freescale i.MX ASRC blocks, Realtek
   RT286 and RT5670, Rockchip RK3xxx I2S controllers and Texas
   Instruments TAS2552
 - Lots of updates and fixes, especially to the DaVinci, Intel,
   Freescale, Realtek, and rcar drivers
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJT4fj0AAoJEGwxgFQ9KSmkXQ0QAIiRmVg40aiJoEdOLGgzNZtq
 r/nXj69AuB6JSy0hKbFyyijjCcRpyCCGvjDYlogjT75M3c35Npz/m85oZHx2tajD
 SB5OA+QxO4EQ3C0GjITIRHJROm4MM8/rnbnNYTsWnEGRkobTFTl0rHbSkA85RGFt
 0zZqqs1R0s/nO9PMQ+5PA5x9xVFiZs2COeCK0CFA9s2ACf/hbxJBRIqYpIFWOo78
 9L41jBOFuC/hIb4qwjgmsCWbKe1KQysTAf+Wty0CKipJ6VhfCbPn1Qn1zXGeUOxc
 mj4eZ6LpJTrVMr/UN02c5vgPOiaBrQ7fWZo3dVHLlIjC6cEI1tUvNYAin7CMEzx8
 DUsvo9p30OheA+ijc9wKaYFY6YmmJZRtpnnMd39i0oPG+bhvoV7vjXjJSB1sLJt1
 o82xLpVL4Th8H+DMDVwA7UIBvvZGZBusw1qsNGfcOPrmExi4ScGhA0gSOO6W2y1z
 VQLRbiXB/HtJGxeqWL6RqJOcLBOlJNmsk4UZMOSCu2OZrWd5I8MuRrNWeHDqhX1H
 +VDEJVhFmM21vMpnobzEPxWsMgTVIAVf3Thh+WgaPxL4Krh0vkpZsgZk16VVmy/o
 OJJF3n41FND4n9zSjOe4MkuL8UCOUpKCaBdqj9K1s6UKwOEKuDNslyT/zqutRWK5
 x1uApU5y+E4iQT/b7cmA
 =RL72
 -----END PGP SIGNATURE-----

Merge tag 'sound-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound updates from Takashi Iwai:
 "There've been many updates in ASoC side at this time, especially the
  framework enhancement for multiple CODECs on a single DAI and more
  componentization works.

  The only major change in ALSA core is the addition of timestamp type
  in sw_params field.  This should behave in backward compatible way.

  Other than that, there are lots of small changes and new drivers in
  wide range, including a large code cut in HD-audio driver for
  deprecated static quirks.  Some highlights are below:

  ALSA Core:
   - Add the new timestamp type field to sw_params to choose
     MONOTONIC_RAW type

  HD-audio:
   - Continued conversion to standard printk macros, generic code
     cleanups
   - Removal of obsoleted static quirk codes for Conexant and C-Media
     codecs
   - Fixups for HP Envy TS, Dell XPS 15, HP and Dell mute/mic LED,
     Gigabyte BXBT-2807 mobo
   - Intel Braswell support

  ASoC:
   - Support for multiple CODECs attached to a single DAI, enabling
     systems with for example multiple DAC/speaker drivers on a single
     link, contributed by Benoit Cousson based on work from Misael Lopez
     Cruz
   - Support for byte controls larger than 256 bytes based on the use of
     TLVs contributed by Omair Mohammed Abdullah
   - More componentisation work from Lars-Peter Clausen
   - The remainder of the conversions of CODEC drivers to params_width()
     by Mark Brown
   - Drivers for Cirrus Logic CS4265, Freescale i.MX ASRC blocks,
     Realtek RT286 and RT5670, Rockchip RK3xxx I2S controllers and Texas
     Instruments TAS2552
   - Lots of updates and fixes, especially to the DaVinci, Intel,
     Freescale, Realtek, and rcar drivers"

* tag 'sound-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (402 commits)
  ALSA: usb-audio: Whitespace cleanups for sound/usb/midi.*
  ALSA: usb-audio: Respond to suspend and resume callbacks for MIDI input
  sound/oss/pss: Remove typedefs pss_mixerdata and pss_confdata
  sound/oss/opl3: Remove typedef opl_devinfo
  ALSA: fireworks: fix specifiers in format strings for propper output
  ASoC: imx-audmux: Use uintptr_t for port numbers
  ASoC: davinci: Enable menuconfig entry for McASP
  ASoC: fsl_asrc: Don't access members of config before checking it
  ASoC: fsl_sarc_dma: Check pair before using it
  ASoC: adau1977: Fix truncation warning on 64 bit architectures
  ALSA: virtuoso: add Xonar Essence STX II support
  ALSA: riptide: fix %d confusingly prefixed with 0x in format strings
  ALSA: fireworks: fix %d confusingly prefixed with 0x in format strings
  ALSA: hda - add codec ID for Braswell display audio codec
  ALSA: hda - add PCI IDs for Intel Braswell
  ALSA: usb-audio: Adjust Gamecom 780 volume level
  ALSA: usb-audio: improve dmesg source grepability
  ASoC: rt5670: Fix duplicate const warnings
  ASoC: rt5670: Staticise non-exported symbols
  ASoC: Intel: update stream only on stream IPC msgs
  ...
2014-08-06 20:07:24 -07:00
Linus Torvalds
98a96f2022 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso updates from Ingo Molnar:
 "Further simplifications and improvements to the VDSO code, by Andy
  Lutomirski"

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64/vsyscall: Fix warn_bad_vsyscall log output
  x86/vdso: Set VM_MAYREAD for the vvar vma
  x86, vdso: Get rid of the fake section mechanism
  x86, vdso: Move the vvar area before the vdso text
2014-08-04 17:27:47 -07:00
Linus Torvalds
5637a2a3e9 Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 UV TLB update from Ingo Molnar:
 "UV TLB shootdown logic updates for version of the UV architecture"

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/uv: Update the UV3 TLB shootdown logic
2014-08-04 17:24:56 -07:00
Linus Torvalds
8556d44fee Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Intel SOC driver updates, by Aubrey Li.

   - TS5500 platform updates, by Vivien Didelot"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pmc_atom: Silence shift wrapping warnings in pmc_sleep_tmr_show()
  x86/pmc_atom: Expose PMC device state and platform sleep state
  x86/pmc_atom: Eisable a few S0ix wake up events for S0ix residency
  x86/platform: New Intel Atom SOC power management controller driver
  x86/platform/ts5500: Add support for TS-5400 boards
  x86/platform/ts5500: Add a 'name' sysfs attribute
  x86/platform/ts5500: Use the DEVICE_ATTR_RO() macro
2014-08-04 17:20:08 -07:00
Linus Torvalds
ce47479632 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "The main change in this cycle is the rework of the TLB range flushing
  code, to simplify, fix and consolidate the code.  By Dave Hansen"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Set TLB flush tunable to sane value (33)
  x86/mm: New tunable for single vs full TLB flush
  x86/mm: Add tracepoints for TLB flushes
  x86/mm: Unify remote INVLPG code
  x86/mm: Fix missed global TLB flush stat
  x86/mm: Rip out complicated, out-of-date, buggy TLB flushing
  x86/mm: Clean up the TLB flushing code
  x86/smep: Be more informative when signalling an SMEP fault
2014-08-04 17:15:45 -07:00
Linus Torvalds
76f09aa464 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI changes from Ingo Molnar:
 "Main changes in this cycle are:

   - arm64 efi stub fixes, preservation of FP/SIMD registers across
     firmware calls, and conversion of the EFI stub code into a static
     library - Ard Biesheuvel

   - Xen EFI support - Daniel Kiper

   - Support for autoloading the efivars driver - Lee, Chun-Yi

   - Use the PE/COFF headers in the x86 EFI boot stub to request that
     the stub be loaded with CONFIG_PHYSICAL_ALIGN alignment - Michael
     Brown

   - Consolidate all the x86 EFI quirks into one file - Saurabh Tangri

   - Additional error logging in x86 EFI boot stub - Ulf Winkelvos

   - Support loading initrd above 4G in EFI boot stub - Yinghai Lu

   - EFI reboot patches for ACPI hardware reduced platforms"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  efi/arm64: Handle missing virtual mapping for UEFI System Table
  arch/x86/xen: Silence compiler warnings
  xen: Silence compiler warnings
  x86/efi: Request desired alignment via the PE/COFF headers
  x86/efi: Add better error logging to EFI boot stub
  efi: Autoload efivars
  efi: Update stale locking comment for struct efivars
  arch/x86: Remove efi_set_rtc_mmss()
  arch/x86: Replace plain strings with constants
  xen: Put EFI machinery in place
  xen: Define EFI related stuff
  arch/x86: Remove redundant set_bit(EFI_MEMMAP) call
  arch/x86: Remove redundant set_bit(EFI_SYSTEM_TABLES) call
  efi: Introduce EFI_PARAVIRT flag
  arch/x86: Do not access EFI memory map if it is not available
  efi: Use early_mem*() instead of early_io*()
  arch/ia64: Define early_memunmap()
  x86/reboot: Add EFI reboot quirk for ACPI Hardware Reduced flag
  efi/reboot: Allow powering off machines using EFI
  efi/reboot: Add generic wrapper around EfiResetSystem()
  ...
2014-08-04 17:13:50 -07:00
Linus Torvalds
e9c9eecaba Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpufeature updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Continued cleanups of CPU bugs mis-marked as 'missing features', by
     Borislav Petkov.

   - Detect the xsaves/xrstors feature and releated cleanup, by Fenghua
     Yu"

* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cpu: Kill cpu_has_mp
  x86, amd: Cleanup init_amd
  x86/cpufeature: Add bug flags to /proc/cpuinfo
  x86, cpufeature: Convert more "features" to bugs
  x86/xsaves: Detect xsaves/xrstors feature
  x86/cpufeature.h: Reformat x86 feature macros
2014-08-04 17:12:45 -07:00
Linus Torvalds
19d402c1e7 Merge branches 'x86-build-for-linus', 'x86-cleanups-for-linus' and 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build/cleanup/debug updates from Ingo Molnar:
 "Robustify the build process with a quirk to avoid GCC reordering
  related bugs.

  Two code cleanups.

  Simplify entry_64.S CFI annotations, by Jan Beulich"

* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, build: Change code16gcc.h from a C header to an assembly header

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Simplify __HAVE_ARCH_CMPXCHG tests
  x86/tsc: Get rid of custom DIV_ROUND() macro

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/debug: Drop several unnecessary CFI annotations
2014-08-04 16:56:16 -07:00
Linus Torvalds
ef35ad26f8 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Kernel side changes:

   - Consolidate the PMU interrupt-disabled code amongst architectures
     (Vince Weaver)

   - misc fixes

  Tooling changes (new features, user visible changes):

   - Add support for pagefault tracing in 'trace', please see multiple
     examples in the changeset messages (Stanislav Fomichev).

   - Add pagefault statistics in 'trace' (Stanislav Fomichev)

   - Add header for columns in 'top' and 'report' TUI browsers (Jiri
     Olsa)

   - Add pagefault statistics in 'trace' (Stanislav Fomichev)

   - Add IO mode into timechart command (Stanislav Fomichev)

   - Fallback to syscalls:* when raw_syscalls:* is not available in the
     perl and python perf scripts.  (Daniel Bristot de Oliveira)

   - Add --repeat global option to 'perf bench' to be used in benchmarks
     such as the existing 'futex' one, that was modified to use it
     instead of a local option.  (Davidlohr Bueso)

   - Fix fd -> pathname resolution in 'trace', be it using /proc or a
     vfs_getname probe point.  (Arnaldo Carvalho de Melo)

   - Add suggestion of how to set perf_event_paranoid sysctl, to help
     non-root users trying tools like 'trace' to get a working
     environment.  (Arnaldo Carvalho de Melo)

   - Updates from trace-cmd for traceevent plugin_kvm plus args cleanup
     (Steven Rostedt, Jan Kiszka)

   - Support S/390 in 'perf kvm stat' (Alexander Yarygin)

  Tooling infrastructure changes:

   - Allow reserving a row for header purposes in the hists browser
     (Arnaldo Carvalho de Melo)

   - Various fixes and prep work related to supporting Intel PT (Adrian
     Hunter)

   - Introduce multiple debug variables control (Jiri Olsa)

   - Add callchain and additional sample information for python scripts
     (Joseph Schuchart)

   - More prep work to support Intel PT: (Adrian Hunter)
     - Polishing 'script' BTS output
     - 'inject' can specify --kallsym
     - VDSO is per machine, not a global var
     - Expose data addr lookup functions previously private to 'script'
     - Large mmap fixes in events processing

   - Include standard stringify macros in power pc code (Sukadev
     Bhattiprolu)

  Tooling cleanups:

   - Convert open coded equivalents to asprintf() (Andy Shevchenko)

   - Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo)

   - Cache the is_exit syscall test in 'trace) (Arnaldo Carvalho de
     Melo)

   - No need to reimplement err() in 'perf bench sched-messaging', drop
     barf().  (Davidlohr Bueso).

   - Remove ev_name argument from perf_evsel__hists_browse, can be
     obtained from the other parameters.  (Jiri Olsa)

  Tooling fixes:

   - Fix memory leak in the 'sched-messaging' perf bench test.
     (Davidlohr Bueso)

   - The -o and -n 'perf bench mem' options are mutually exclusive, emit
     error when both are specified.  (Davidlohr Bueso)

   - Fix scrollbar refresh row index in the ui browser, problem exposed
     now that headers will be added and will be allowed to be switched
     on/off.  (Jiri Olsa)

   - Handle the num array type in python properly (Sebastian Andrzej
     Siewior)

   - Fix wrong condition for allocation failure (Jiri Olsa)

   - Adjust callchain based on DWARF debug info on powerpc (Sukadev
     Bhattiprolu)

   - Fix a risk for doing free on uninitialized pointer in traceevent
     lib (Rickard Strandqvist)

   - Update attr test with PERF_FLAG_FD_CLOEXEC flag (Jiri Olsa)

   - Enable close-on-exec flag on perf file descriptor (Yann Droneaud)

   - Fix build on gcc 4.4.7 (Arnaldo Carvalho de Melo)

   - Event ordering fixes (Jiri Olsa)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (123 commits)
  Revert "perf tools: Fix jump label always changing during tracing"
  perf tools: Fix perf usage string leftover
  perf: Check permission only for parent tracepoint event
  perf record: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds
  perf record: Always force PERF_RECORD_FINISHED_ROUND event
  perf inject: Add --kallsyms parameter
  perf tools: Expose 'addr' functions so they can be reused
  perf session: Fix accounting of ordered samples queue
  perf powerpc: Include util/util.h and remove stringify macros
  perf tools: Fix build on gcc 4.4.7
  perf tools: Add thread parameter to vdso__dso_findnew()
  perf tools: Add dso__type()
  perf tools: Separate the VDSO map name from the VDSO dso name
  perf tools: Add vdso__new()
  perf machine: Fix the lifetime of the VDSO temporary file
  perf tools: Group VDSO global variables into a structure
  perf session: Add ability to skip 4GiB or more
  perf session: Add ability to 'skip' a non-piped event stream
  perf tools: Pass machine to vdso__dso_findnew()
  perf tools: Add dso__data_size()
  ...
2014-08-04 16:09:53 -07:00
Linus Torvalds
8efb90cf1e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle are:

   - big rtmutex and futex cleanup and robustification from Thomas
     Gleixner
   - mutex optimizations and refinements from Jason Low
   - arch_mutex_cpu_relax() removal and related cleanups
   - smaller lockdep tweaks"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  arch, locking: Ciao arch_mutex_cpu_relax()
  locking/lockdep: Only ask for /proc/lock_stat output when available
  locking/mutexes: Optimize mutex trylock slowpath
  locking/mutexes: Try to acquire mutex only if it is unlocked
  locking/mutexes: Delete the MUTEX_SHOW_NO_WAITER macro
  locking/mutexes: Correct documentation on mutex optimistic spinning
  rtmutex: Make the rtmutex tester depend on BROKEN
  futex: Simplify futex_lock_pi_atomic() and make it more robust
  futex: Split out the first waiter attachment from lookup_pi_state()
  futex: Split out the waiter check from lookup_pi_state()
  futex: Use futex_top_waiter() in lookup_pi_state()
  futex: Make unlock_pi more robust
  rtmutex: Avoid pointless requeueing in the deadlock detection chain walk
  rtmutex: Cleanup deadlock detector debug logic
  rtmutex: Confine deadlock logic to futex
  rtmutex: Simplify remove_waiter()
  rtmutex: Document pi chain walk
  rtmutex: Clarify the boost/deboost part
  rtmutex: No need to keep task ref for lock owner check
  rtmutex: Simplify and document try_to_take_rtmutex()
  ...
2014-08-04 16:09:06 -07:00
Linus Torvalds
8533ce7271 These are the x86, MIPS and s390 changes; PPC and ARM will come in a
few days.
 
 MIPS and s390 have little going on this release; just bugfixes, some
 small, some larger.
 
 The highlights for x86 are nested VMX improvements (Jan Kiszka), optimizations
 for old processor (up to Nehalem, by me and Bandan Das), and a lot of x86
 emulator bugfixes (Nadav Amit).
 
 Stephen Rothwell reported a trivial conflict with the tracing branch.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJT300XAAoJEBvWZb6bTYby3V8QAJz+XyajnhJ8wH55Vxczz22L
 i2gtUGmBLhEXsBcaVKO4BBfek88lLzg0SGLjfW5wCMQmKtxVlrwTCXNkBoPGjapd
 NwHtWkMKym44PDhRovn7zkSumkxC43uFIBR/ebrhP6Bvhh9s+MnkQUxfw9ILB+YV
 EeKyEG8sSgxFCciuHbp3mIXpDcO6r/ldy6I7009OdyhLoMY+Kvmk7kRe9wtAivdg
 CGJi60QvGOn2RGRPOCEtF6UWr8Ae8fe1t84o0hkXPv/j3jtabzAatXKJa4dYNbIs
 7Mp4NQpxaGV6rq3WCYVeZRxGs+UReGDAS3Il4Z8C9eTOTooSfxdVr8acpM8PY6I8
 UmLT6ECLGycc4ELXrETtR+QLmiXACyJqyVxz4aiLV3kWSWfamKD3hBeQK9NizNcE
 VoPDl+PyISvR1tW4KstBuzfUWAEXi+gO78cqqFr/VW6cl7HKpA1DFQaPfGkYKDae
 2CPwcLwI5/M6RtSgkyXTkEqNZLc2BjldqSeM1lmWjhZVW56X2iqePUL46Vab3Yvt
 U+sELtwEE560NLN3hbaHUsLR1tcUix5w8vTzcXPxgoHQBszHCcAZTWd1XHulr64F
 rp/cangqtkPKcu5j1mNhQs38oLjHI1MUsbQrqFoD4tmHjQ75iXHRFzYGoIVKXyHG
 AnGbQzJzBcdAANhm3LW0
 =UXxV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM changes from Paolo Bonzini:
 "These are the x86, MIPS and s390 changes; PPC and ARM will come in a
  few days.

  MIPS and s390 have little going on this release; just bugfixes, some
  small, some larger.

  The highlights for x86 are nested VMX improvements (Jan Kiszka),
  optimizations for old processor (up to Nehalem, by me and Bandan Das),
  and a lot of x86 emulator bugfixes (Nadav Amit).

  Stephen Rothwell reported a trivial conflict with the tracing branch"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (104 commits)
  x86/kvm: Resolve shadow warnings in macro expansion
  KVM: s390: rework broken SIGP STOP interrupt handling
  KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table
  KVM: vmx: remove duplicate vmx_mpx_supported() prototype
  KVM: s390: Fix memory leak on busy SIGP stop
  x86/kvm: Resolve shadow warning from min macro
  kvm: Resolve missing-field-initializers warnings
  Replace NR_VMX_MSR with its definition
  KVM: x86: Assertions to check no overrun in MSR lists
  KVM: x86: set rflags.rf during fault injection
  KVM: x86: Setting rflags.rf during rep-string emulation
  KVM: x86: DR6/7.RTM cannot be written
  KVM: nVMX: clean up nested_release_vmcs12 and code around it
  KVM: nVMX: fix lifetime issues for vmcs02
  KVM: x86: Defining missing x86 vectors
  KVM: x86: emulator injects #DB when RFLAGS.RF is set
  KVM: x86: Cleanup of rflags.rf cleaning
  KVM: x86: Clear rflags.rf on emulated instructions
  KVM: x86: popf emulation should not change RF
  KVM: x86: Clearing rflags.rf upon skipped emulated instruction
  ...
2014-08-04 12:16:46 -07:00
Linus Torvalds
b8c0aa46b3 This pull request has a lot of work done. The main thing is the changes
to the ftrace function callback infrastructure. It's introducing a
 way to allow different functions to call directly different trampolines
 instead of all calling the same "mcount" one.
 
 The only user of this for now is the function graph tracer, which always
 had a different trampoline, but the function tracer trampoline was called
 and did basically nothing, and then the function graph tracer trampoline
 was called. The difference now, is that the function graph tracer
 trampoline can be called directly if a function is only being traced by
 the function graph trampoline. If function tracing is also happening on
 the same function, the old way is still done.
 
 The accounting for this takes up more memory when function graph tracing
 is activated, as it needs to keep track of which functions it uses.
 I have a new way that wont take as much memory, but it's not ready yet
 for this merge window, and will have to wait for the next one.
 
 Another big change was the removal of the ftrace_start/stop() calls that
 were used by the suspend/resume code that stopped function tracing when
 entering into suspend and resume paths. The stop of ftrace was done
 because there was some function that would crash the system if one called
 smp_processor_id()! The stop/start was a big hammer to solve the issue
 at the time, which was when ftrace was first introduced into Linux.
 Now ftrace has better infrastructure to debug such issues, and I found
 the problem function and labeled it with "notrace" and function tracing
 can now safely be activated all the way down into the guts of suspend
 and resume.
 
 Other changes include clean ups of uprobe code.
 Clean up of the trace_seq() code.
 And other various small fixes and clean ups to ftrace and tracing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT35zXAAoJEKQekfcNnQGuOz0H/38zqM0nLFhrgvz3EPk2UOjn
 xqpX8qyb2V7TJZL+IqeXU2a5cQZl5ba0D4WtBGpxbTae3CJYiuQ87iKUNFoH0om5
 FDpn80igb368k8V3qRdRsziKVCCf0XBd/NkHJXc0ZkfXGyzB2Ga4bBxALxp2gj9y
 bnO+vKo6+tWYKG4hyQb4P3LRXUrK8/LWEsPr39cH2QH1Rdj69Lx9CgrCdUVJmwcb
 Bj8hEiLXL/RYCFNn79A3wNTUvW0rG/AOIf4SLqXtasSRZ0ToaU0ZyDnrNv+0Ol47
 rX8tSk+LfXchL9hpIvjCf1vlAYq3pO02favteR/jip3lx/dTjEDE4RJ9qtJzZ4Q=
 =fwQY
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This pull request has a lot of work done.  The main thing is the
  changes to the ftrace function callback infrastructure.  It's
  introducing a way to allow different functions to call directly
  different trampolines instead of all calling the same "mcount" one.

  The only user of this for now is the function graph tracer, which
  always had a different trampoline, but the function tracer trampoline
  was called and did basically nothing, and then the function graph
  tracer trampoline was called.  The difference now, is that the
  function graph tracer trampoline can be called directly if a function
  is only being traced by the function graph trampoline.  If function
  tracing is also happening on the same function, the old way is still
  done.

  The accounting for this takes up more memory when function graph
  tracing is activated, as it needs to keep track of which functions it
  uses.  I have a new way that wont take as much memory, but it's not
  ready yet for this merge window, and will have to wait for the next
  one.

  Another big change was the removal of the ftrace_start/stop() calls
  that were used by the suspend/resume code that stopped function
  tracing when entering into suspend and resume paths.  The stop of
  ftrace was done because there was some function that would crash the
  system if one called smp_processor_id()! The stop/start was a big
  hammer to solve the issue at the time, which was when ftrace was first
  introduced into Linux.  Now ftrace has better infrastructure to debug
  such issues, and I found the problem function and labeled it with
  "notrace" and function tracing can now safely be activated all the way
  down into the guts of suspend and resume

  Other changes include clean ups of uprobe code, clean up of the
  trace_seq() code, and other various small fixes and clean ups to
  ftrace and tracing"

* tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits)
  ftrace: Add warning if tramp hash does not match nr_trampolines
  ftrace: Fix trampoline hash update check on rec->flags
  ring-buffer: Use rb_page_size() instead of open coded head_page size
  ftrace: Rename ftrace_ops field from trampolines to nr_trampolines
  tracing: Convert local function_graph functions to static
  ftrace: Do not copy old hash when resetting
  tracing: let user specify tracing_thresh after selecting function_graph
  ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on()
  tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST
  s390/ftrace: remove check of obsolete variable function_trace_stop
  arm64, ftrace: Remove check of obsolete variable function_trace_stop
  Blackfin: ftrace: Remove check of obsolete variable function_trace_stop
  metag: ftrace: Remove check of obsolete variable function_trace_stop
  microblaze: ftrace: Remove check of obsolete variable function_trace_stop
  MIPS: ftrace: Remove check of obsolete variable function_trace_stop
  parisc: ftrace: Remove check of obsolete variable function_trace_stop
  sh: ftrace: Remove check of obsolete variable function_trace_stop
  sparc64,ftrace: Remove check of obsolete variable function_trace_stop
  tile: ftrace: Remove check of obsolete variable function_trace_stop
  ftrace: x86: Remove check of obsolete variable function_trace_stop
  ...
2014-08-04 11:50:00 -07:00
Linus Torvalds
f2a84170ed Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu updates from Tejun Heo:

 - Major reorganization of percpu header files which I think makes
   things a lot more readable and logical than before.

 - percpu-refcount is updated so that it requires explicit destruction
   and can be reinitialized if necessary.  This was pulled into the
   block tree to replace the custom percpu refcnting implemented in
   blk-mq.

 - In the process, percpu and percpu-refcount got cleaned up a bit

* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits)
  percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()
  percpu-refcount: require percpu_ref to be exited explicitly
  percpu-refcount: use unsigned long for pcpu_count pointer
  percpu-refcount: add helpers for ->percpu_count accesses
  percpu-refcount: one bit is enough for REF_STATUS
  percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc()
  workqueue: stronger test in process_one_work()
  workqueue: clear POOL_DISASSOCIATED in rebind_workers()
  percpu: Use ALIGN macro instead of hand coding alignment calculation
  percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations
  percpu: preffity percpu header files
  percpu: use raw_cpu_*() to define __this_cpu_*()
  percpu: reorder macros in percpu header files
  percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h
  percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h
  percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops
  percpu: reorganize include/linux/percpu-defs.h
  percpu: move accessors from include/linux/percpu.h to percpu-defs.h
  percpu: include/asm-generic/percpu.h should contain only arch-overridable parts
  percpu: introduce arch_raw_cpu_ptr()
  ...
2014-08-04 10:09:27 -07:00
Linus Torvalds
f74ad8df4e PCI changes for the v3.17 merge window:
Resource management
     - Support BAR sizes up to 128GB (Yinghai Lu)
     - Keep original resource if we fail to expand it (Guo Chao)
     - Return conventional error values from pci_revert_fw_address() (Bjorn Helgaas)
     - Tidy resource assignment messages (Bjorn Helgaas)
     - Don't exclude low BIOS area for non-PCI cards (Christoph Schulz)
 
   PCI device hotplug
     - Prevent NULL dereference during pciehp probe (Andreas Noever)
     - Make pciehp pcie_wait_cmd() self-contained (Bjorn Helgaas)
     - Wait for pciehp hotplug command completion lazily (Bjorn Helgaas)
     - Compute pciehp timeout from hotplug command start time (Bjorn Helgaas)
     - Remove pciehp assumptions about which commands cause completion events (Bjorn Helgaas)
     - Clear pciehp Data Link Layer State Changed during init (Myron Stowe)
     - Remove pciehp struct controller.no_cmd_complete (Rajat Jain)
     - Remove cpqphp unnecessary null test (Fabian Frederick)
     - Remove "invalid IRQ" warning for hot-added PCIe ports (Jiang Liu)
 
   IOMMU
     - Add DMA alias quirk for Intel 82801 bridge (Alex Williamson)
 
   MSI
     - Add internal msix_clear_and_set_ctrl() (Yijing Wang)
     - Remove unused msi_enabled_mask() (Yijing Wang)
     - Cache Multiple Message Capable in struct msi_desc (Yijing Wang)
     - Add msi_setup_entry() to clean up initialization (Yijing Wang)
     - Remove unused msi_remove_pci_irq_vectors() (Yijing Wang)
     - Retrieve first MSI IRQ from msi_desc rather than pci_dev (Yijing Wang)
     - Remove unused list access in __pci_restore_msix_state() (Yijing Wang)
     - Use irq_get_msi_desc() to simplify code (Yijing Wang)
 
   Generic host bridge driver
     - Fix GPL v2 license string typo (Bjorn Helgaas)
 
   Marvell MVEBU
     - Fix GPL v2 license string typo (Thierry Reding)
 
   NVIDIA Tegra
     - Use correct initial HW settings (Phil Edworthy)
     - Remove rcar_pcie_setup_window() resource argument (Phil Edworthy)
     - Fix GPL v2 license string typo (Thierry Reding)
 
   Renesas R-Car
     - Remove redundant config accessor register checks (Sergei Shtylyov)
     - Fix GPL v2 license string typo (Bjorn Helgaas)
 
   Virtualization
     - Factor secondary bus reset logic (Gavin Shan)
     - Remove duplicate powerpc reset logic (Gavin Shan)
 
   Miscellaneous
     - Rework default VGA detection for EFI (Bruno Prémont)
     - Fix sysfs "acpi_index" and "label" errors for NIC renaming (Simone Gotti)
     - Configure ASPM at pci_enable_device()-time (Vidya Sagar)
     - Add include/linux/pci_ids.h include guard (Rasmus Villemoes)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTzppBAAoJEFmIoMA60/r8mtQP/jgVWCSU+0ulHjoxVSRLu4Lc
 UGKQFjS03oWWflHdvW6wZFqN82Ynva9fYCLMtiKdPg7cgTosSRT3I4DjAIm80ZI/
 kZvHxSmi6DBYmchZBsWzj60zxNiYZeEgd7CevzcJRHuwbKNMr2y12s6hjJbyl5lF
 ygaXWpDveKjsEDjyk9vKjUGwul/NJKynar253Yh178XaoypdGuiEIw3D1lQFMZZp
 ADcRijIi+CD2BENtDr6fbldbj+yQ93yyUSloEnaKtWZD+Ao5IsHngN0IyRu+l1Wl
 LFob0AsopeYVFKdw22Gn1KAq9Jj01acsSBRXjgrauU+tLY512Vkbp1lFYl85B/38
 /Z0VNHncmIh29rq9Tl2xQwEeI3Ja27FfnMjC70dLM5YjWf8vsYnDEQZHyxAAe15D
 p3H3YuuDjmvHkoSrHY/68DLfDl9ubw3/BFUlCMqijL7444ZWLXathrnCV8ZJimmr
 PlF/m7GtXYF4wIw19m9KQqNBUPJJEsVHExKzICOY4v5/nMlvx4ZkBDR3tPNEH1sk
 3AYKjLDw21Nle7yKcAlxDI/TYWZqxuph23UpevzlQd16tutq2i2FqpauiqI3DFm4
 VfYVbOVQwfeUJt11VOCgxvE7RsTxCk5QefB+YKVAdVK6vMZHeZxsetYvrCDptnea
 cId/NfiEFnmr+u3mAyPM
 =U5Ip
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "I'll be on vacation until Aug 11, and I suspect the merge window will
  open before then, so I'm sending this to you early.  There are more
  things I'd like to get into v3.17, so I hope to send another pull
  request soon after I return.

  The most notable pieces here are:

   - Support BARs up to 128GB (up from 8GB)
   - Fix SR-IOV resource assignment when we fail to expand a resource
   - Rework pciehp to handle a common hardware erratum
   - Cleanup MSI
   - Fix NIC renaming issue
   - Fix VGA default device issue on EFI systems
   - Fix ASPM configuration (previously we didn't enable it as expected)

  Alex Williamson has graciously agreed to take care of any major issues
  with this if you take it before I return.

  Details:

  Resource management
    - Support BAR sizes up to 128GB (Yinghai Lu)
    - Keep original resource if we fail to expand it (Guo Chao)
    - Return conventional error values from pci_revert_fw_address() (Bjorn Helgaas)
    - Tidy resource assignment messages (Bjorn Helgaas)
    - Don't exclude low BIOS area for non-PCI cards (Christoph Schulz)

  PCI device hotplug
    - Prevent NULL dereference during pciehp probe (Andreas Noever)
    - Make pciehp pcie_wait_cmd() self-contained (Bjorn Helgaas)
    - Wait for pciehp hotplug command completion lazily (Bjorn Helgaas)
    - Compute pciehp timeout from hotplug command start time (Bjorn Helgaas)
    - Remove pciehp assumptions about which commands cause completion events (Bjorn Helgaas)
    - Clear pciehp Data Link Layer State Changed during init (Myron Stowe)
    - Remove pciehp struct controller.no_cmd_complete (Rajat Jain)
    - Remove cpqphp unnecessary null test (Fabian Frederick)
    - Remove "invalid IRQ" warning for hot-added PCIe ports (Jiang Liu)

  IOMMU
    - Add DMA alias quirk for Intel 82801 bridge (Alex Williamson)

  MSI
    - Add internal msix_clear_and_set_ctrl() (Yijing Wang)
    - Remove unused msi_enabled_mask() (Yijing Wang)
    - Cache Multiple Message Capable in struct msi_desc (Yijing Wang)
    - Add msi_setup_entry() to clean up initialization (Yijing Wang)
    - Remove unused msi_remove_pci_irq_vectors() (Yijing Wang)
    - Retrieve first MSI IRQ from msi_desc rather than pci_dev (Yijing Wang)
    - Remove unused list access in __pci_restore_msix_state() (Yijing Wang)
    - Use irq_get_msi_desc() to simplify code (Yijing Wang)

  Generic host bridge driver
    - Fix GPL v2 license string typo (Bjorn Helgaas)

  Marvell MVEBU
    - Fix GPL v2 license string typo (Thierry Reding)

  NVIDIA Tegra
    - Use correct initial HW settings (Phil Edworthy)
    - Remove rcar_pcie_setup_window() resource argument (Phil Edworthy)
    - Fix GPL v2 license string typo (Thierry Reding)

  Renesas R-Car
    - Remove redundant config accessor register checks (Sergei Shtylyov)
    - Fix GPL v2 license string typo (Bjorn Helgaas)

  Virtualization
    - Factor secondary bus reset logic (Gavin Shan)
    - Remove duplicate powerpc reset logic (Gavin Shan)

  Miscellaneous
    - Rework default VGA detection for EFI (Bruno Prémont)
    - Fix sysfs "acpi_index" and "label" errors for NIC renaming (Simone Gotti)
    - Configure ASPM at pci_enable_device()-time (Vidya Sagar)
    - Add include/linux/pci_ids.h include guard (Rasmus Villemoes)"

* tag 'pci-v3.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (38 commits)
  PCI/MSI: Use irq_get_msi_desc() to simplify code
  PCI/MSI: Remove unused list access in __pci_restore_msix_state()
  PCI/MSI: Retrieve first MSI IRQ from msi_desc rather than pci_dev
  PCI/MSI: Remove unused function msi_remove_pci_irq_vectors()
  PCI/MSI: Add msi_setup_entry() to clean up MSI initialization
  PCI: Configure ASPM when enabling device
  x86: don't exclude low BIOS area when allocating address space for non-PCI cards
  PCI: generic: Fix GPL v2 license string typo
  PCI: rcar: Fix GPL v2 license string typo
  PCI: tegra: Fix GPL v2 license string typo
  PCI: mvebu: Fix GPL v2 license string typo
  PCI: Add include guard to include/linux/pci_ids.h
  x86, ia64: Move EFI_FB vga_default_device() initialization to pci_vga_fixup()
  PCI: Tidy resource assignment messages
  PCI: Return conventional error values from pci_revert_fw_address()
  PCI: Cleanup control flow
  PCI: Support BAR sizes up to 128GB
  PCI: cpqphp: Remove unnecessary null test before debugfs_remove()
  PCI: pciehp: Clear Data Link Layer State Changed during init
  PCI: Add bridge DMA alias quirk for Intel 82801 bridge
  ...
2014-08-04 09:29:37 -07:00
Mark Brown
a6ce305207 Merge remote-tracking branches 'asoc/topic/intel', 'asoc/topic/kirkwood', 'asoc/topic/max98090' and 'asoc/topic/mc13783' into asoc-next 2014-08-04 16:31:45 +01:00
Dave Hansen
d17d8f9ded x86/mm: Add tracepoints for TLB flushes
We don't have any good way to figure out what kinds of flushes
are being attempted.  Right now, we can try to use the vm
counters, but those only tell us what we actually did with the
hardware (one-by-one vs full) and don't tell us what was actually
_requested_.

This allows us to select out "interesting" TLB flushes that we
might want to optimize (like the ranged ones) and ignore the ones
that we have very little control over (the ones at context
switch).

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154059.4C96CBA5@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-31 08:48:51 -07:00
Dave Hansen
e9f4e0a9fe x86/mm: Rip out complicated, out-of-date, buggy TLB flushing
I think the flush_tlb_mm_range() code that tries to tune the
flush sizes based on the CPU needs to get ripped out for
several reasons:

1. It is obviously buggy.  It uses mm->total_vm to judge the
   task's footprint in the TLB.  It should certainly be using
   some measure of RSS, *NOT* ->total_vm since only resident
   memory can populate the TLB.
2. Haswell, and several other CPUs are missing from the
   intel_tlb_flushall_shift_set() function.  Thus, it has been
   demonstrated to bitrot quickly in practice.
3. It is plain wrong in my vm:
	[    0.037444] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
	[    0.037444] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0
	[    0.037444] tlb_flushall_shift: 6
   Which leads to it to never use invlpg.
4. The assumptions about TLB refill costs are wrong:
	http://lkml.kernel.org/r/1337782555-8088-3-git-send-email-alex.shi@intel.com
    (more on this in later patches)
5. I can not reproduce the original data: https://lkml.org/lkml/2012/5/17/59
   I believe the sample times were too short.  Running the
   benchmark in a loop yields times that vary quite a bit.

Note that this leaves us with a static ceiling of 1 page.  This
is a conservative, dumb setting, and will be revised in a later
patch.

This also removes the code which attempts to predict whether we
are flushing data or instructions.  We expect instruction flushes
to be relatively rare and not worth tuning for explicitly.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154055.ABC88E89@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-31 08:48:50 -07:00
David Rientjes
2f078b9cb8 x86, apic: Remove enable_apic_mode callback
The enable_apic_mode() apic callback is never called, so remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302352320.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:44 -07:00
David Rientjes
11a8318ef5 x86, apic: Remove setup_portio_remap callback
Since commit b5660ba76b ("x86, platforms: Remove NUMAQ") removed NUMAQ,
the setup_portio_remap() apic callback has been obsolete.  Remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302351480.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:44 -07:00
David Rientjes
e76661ba09 x86, apic: Remove multi_timer_check callback
Since commit b5660ba76b ("x86, platforms: Remove NUMAQ") removed NUMAQ,
the multi_timer_check() apic callback has been obsolete.  Remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302351120.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:43 -07:00
David Rientjes
658ffd7e6f x86, apic: Remove check_apicid_present callback
The check_apicid_present() apic callback is never called, so remove it
and functions that implement it.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302350160.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:42 -07:00
David Rientjes
c460b5d340 x86, apic: Remove mps_oem_check callback
Since commit b5660ba76b ("x86, platforms: Remove NUMAQ") removed NUMAQ,
the mps_oem_check() apic callback has been obsolete.  Remove it.

This allows generic_mps_oem_check() to be removed as well.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302349390.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:42 -07:00
David Rientjes
300eddf967 x86, apic: Remove smp_callin_clear_local_apic callback
Since commit b5660ba76b ("x86, platforms: Remove NUMAQ") removed NUMAQ,
the smp_callin_clear_local_apic() apic callback has been obsolete.
Remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302349040.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:41 -07:00
David Rientjes
6ab1b27c84 x86, apic: Replace trampoline physical addresses with defaults
The trampoline_phys_{high,low} members of struct apic are always
initialized to DEFAULT_TRAMPOLINE_PHYS_HIGH and TRAMPOLINE_PHYS_LOW,
respectively.  Hardwire the constants and remove the unneeded members.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302348330.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:41 -07:00
David Rientjes
80a2670379 x86, apic: Remove x86_32_numa_cpu_node callback
Since commit b5660ba76b ("x86, platforms: Remove NUMAQ") removed NUMAQ,
the x86_32_numa_cpu_node() apic callback has been obsolete.  Remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302348060.17503@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-31 08:05:40 -07:00
Andy Lutomirski
7209a75d20 x86_64/entry/xen: Do not invoke espfix64 on Xen
This moves the espfix64 logic into native_iret.  To make this work,
it gets rid of the native patch for INTERRUPT_RETURN:
INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.

This changes the 16-bit SS behavior on Xen from OOPSing to leaking
some bits of the Xen hypervisor's RSP (I think).

[ hpa: this is a nonzero cost on native, but probably not enough to
  measure. Xen needs to fix this in their own code, probably doing
  something equivalent to espfix64. ]

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2014-07-28 15:25:40 -07:00
Henrique de Moraes Holschuh
ebc14ddcc9 x86, microcode, intel: Fix total_size computation
According to the Intel SDM vol 3A (order code 253668-051US, June 2014),
on section 9.11.1, page 9-28:

"For microcode updates with a data size field equal to 00000000H, the
size of the microcode update is 2048 bytes. The first 48 bytes contain
the microcode update header. The remaining 2000 bytes contain encrypted
data."

"For microcode updates with a data size not equal to 00000000H, the total
size field specifies the size of the microcode update."

Up to 2002/2003, Intel used an "old format" for the microcode update
containers that was always 2048 bytes in size. That old format did not
have Data Size and Total Size fields, the quadwords at those positions
in the microcode container header were "reserved". The microcode header
of the "old format" microcode container has a hrdver of 0x01. You can
hunt down an old copy of the Intel SDM to validate this through its
order number (#243192). I found one from 1999 through a Google search.

Sometime in 2002/2003 (AFAICT, for the Prescott processors), Intel
documented a new format for the microcode containers and contributed in
2003 some code to the Linux kernel microcode driver implementing support
for the new format. This new format has Data Size and Total Size fields,
as well as the optional extended signature table. However, it reuses the
same hrdver as the old format (0x01), and it can only be told apart from
the old format by a non-zero Data Size field.

In fact, the only reason we can even trust a Data Size of zero to mean
that the microcode container is in the old format, is because Intel
reatroatively promised that the old format would always have a zero
there when they wrote the documentation for the _new_ format.

This is a very old bug, dating back to 2003. It has been dormant
ever since, as Intel seems to set all reserved fields to zero on the
microcode updates they distribute: I could not find a public microcode
update that would trigger this bug.

Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Link: http://lkml.kernel.org/r/1406146251-8540-1-git-send-email-hmh@hmh.eng.br
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-07-28 16:08:02 +02:00
Ingo Molnar
5030c69755 Linux 3.16-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT1VYNAAoJEHm+PkMAQRiGQJwIAKSYp1Uqz5O/e5r0V1TlZKT4
 1B4Njopl57PwSrJQWcGEuH2yHyM896vfPO4L6BJIOfyWzh8kwpQqclDt6uhXoF/v
 OsO1zb/7/j+n/pDZsePqP9AyIgErsHEBgUbhecDqzjN++ITPcZjQ6TIMPglZaumN
 jFAdAZuAaEwqAk8jqN2wlm689Fh9MuUEarHXbXLCqu5RgLrWhFGhp/cTWY62aqnZ
 XfEeQ9KtpRZmlR/IYjerbb1eRH7ZdJsZ88WngLX9dj/JdNxHWBkWQBXGAusXk5Fk
 y6LsIV3TjyBdrRKJ1Ifyg/2EIXHNBs8HxTFGXpjtp2HPuMLDxZOWOWikb9URtNg=
 =Fjf4
 -----END PGP SIGNATURE-----

Merge tag 'v3.16-rc7' into perf/core, to merge in the latest fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28 10:00:33 +02:00
Rafael J. Wysocki
8989e1cc35 Merge branch 'acpi-config'
* acpi-config:
  ACPI / processor: Introduce ARCH_MIGHT_HAVE_ACPI_PDC
  ACPI: Don't use acpi_lapic in ACPI core code
  ACPI: add config for BIOS table scan
2014-07-27 23:52:48 +02:00
Li, Aubrey
f855911c1f x86/pmc_atom: Expose PMC device state and platform sleep state
Add the following interfaces to exposes PMC device state and sleep
state residency via debugfs:
	/sys/kernel/debugfs/pmc_atom/dev_state
	/sys/kernel/debugfs/pmc_atom/sleep_state

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Link: http://lkml.kernel.org/r/53B0FF59.8000600@linux.intel.com
Signed-off-by: Kasagar, Srinidhi <srinidhi.kasagar@intel.com>
Reviewed-by: Rudramuni, Vishwesh M <vishwesh.m.rudramuni@intel.com>
Reviewed-by: Joe Perches <joe@perches.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-25 14:12:14 -07:00
Li, Aubrey
b00055cade x86/pmc_atom: Eisable a few S0ix wake up events for S0ix residency
Disable PMC S0IX_WAKE_EN events coming from LPC block(unused) and
also from GPIO_SUS ored dedicated IRQs (must be disabled as per PMC
programming rule), GPIOSCORE ored dedicated IRQs (must be disabled
as per PMC programming rule), GPIO_SUS shared IRQ (not necessary
since the IOAPIC_DS wake event will still work), GPIO_SCORE shared
IRQ (not necessary since the IOAPIC_DS wake event will still work).

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Link: http://lkml.kernel.org/r/53B0FF22.5080403@linux.intel.com
Signed-off-by: Olivier Leveque <olivier.leveque@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-25 14:11:58 -07:00
Li, Aubrey
93e5eadd1f x86/platform: New Intel Atom SOC power management controller driver
The Power Management Controller (PMC) controls many of the power
management features present in the Atom SoC. This driver provides
a native power off function via PMC PCI IO port.

On some ACPI hardware-reduced platforms(e.g. ASUS-T100), ACPI sleep
registers are not valid so that (*pm_power_off)() is not hooked by
acpi_power_off(). The power off function in this driver is installed
only when pm_power_off is NULL.

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Link: http://lkml.kernel.org/r/53B0FEEA.3010805@linux.intel.com
Signed-off-by: Lejun Zhu <lejun.zhu@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-25 14:11:29 -07:00
Lv Zheng
d334c823b2 ACPICA: Linux: Add support to exclude <asm/acenv.h> inclusion.
The forthcoming patch will make <acpi/acpi.h> to be visible to all kernel
source code. Thus for the architectures that do not support ACPI and
haven't implemented <asm/acenv.h>, we need to make it excluded.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-23 01:10:44 +02:00
Nadav Amit
6f43ed01e8 KVM: x86: DR6/7.RTM cannot be written
Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is
not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the
current KVM implementation. This bug is apparent only if the MOV-DR instruction
is emulated or the host also debugs the guest.

This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and
set respectively. It also sets DR6.RTM upon every debug exception. Obviously,
it is not a complete fix, as debugging of RTM is still unsupported.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 17:17:52 +02:00
Nadav Amit
c9cdd085bb KVM: x86: Defining missing x86 vectors
Defining XE, XM and VE vector numbers.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 14:18:51 +02:00
Graeme Gregory
b50154d53e ACPI: Don't use acpi_lapic in ACPI core code
Now ARM64 support is being added to ACPI so architecture specific
values can not be used in core ACPI code.

Following on the patch "ACPI / processor: Check if LAPIC is present
during initialization" which uses acpi_lapic in acpi_processor.c,
on ARM64 platform, GIC is used instead of local APIC, so acpi_lapic
is not a suitable value for ARM64.

What is actually important at this point is if there is/are CPU
entry/entries (Local APIC/SAPIC, GICC) in MADT, so introduce
acpi_has_cpu_in_madt() to be arch specific and generic.

Signed-off-by: Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21 13:50:58 +02:00
Matt Fleming
44be28e9dd x86/reboot: Add EFI reboot quirk for ACPI Hardware Reduced flag
It appears that the BayTrail-T class of hardware requires EFI in order
to powerdown and reboot and no other reliable method exists.

This quirk is generally applicable to all hardware that has the ACPI
Hardware Reduced bit set, since usually ACPI would be the preferred
method.

Cc: Len Brown <len.brown@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-18 21:23:52 +01:00
Steven Rostedt (Red Hat)
1026ff9b8e ftrace/x86: Have function graph tracer use its own trampoline
The function graph trampoline is called from the function trampoline
and both do a save and restore of registers. The save of registers
done by the function trampoline when only the function graph tracer
is running is a waste of CPU cycles.

As the function graph tracer trampoline in x86 is dependent from
the function trampoline, we can call it directly when a function
is only being traced by the function graph trampoline.

Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-17 09:44:37 -04:00
Davidlohr Bueso
3a6bfbc91d arch, locking: Ciao arch_mutex_cpu_relax()
The arch_mutex_cpu_relax() function, introduced by 34b133f, is
hacky and ugly. It was added a few years ago to address the fact
that common cpu_relax() calls include yielding on s390, and thus
impact the optimistic spinning functionality of mutexes. Nowadays
we use this function well beyond mutexes: rwsem, qrwlock, mcs and
lockref. Since the macro that defines the call is in the mutex header,
any users must include mutex.h and the naming is misleading as well.

This patch (i) renames the call to cpu_relax_lowlatency  ("relax, but
only if you can do it with very low latency") and (ii) defines it in
each arch's asm/processor.h local header, just like for regular cpu_relax
functions. On all archs, except s390, cpu_relax_lowlatency is simply cpu_relax,
and thus we can take it out of mutex.h. While this can seem redundant,
I believe it is a good choice as it allows us to move out arch specific
logic from generic locking primitives and enables future(?) archs to
transparently define it, similarly to System Z.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bharat Bhushan <r65777@freescale.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Stratos Karafotis <stratosk@semaphore.gr>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Kulikov <segoon@openwall.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: adi-buildroot-devel@lists.sourceforge.net
Cc: linux390@de.ibm.com
Cc: linux-alpha@vger.kernel.org
Cc: linux-am33-list@redhat.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-cris-kernel@axis.com
Cc: linux-hexagon@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux@lists.openrisc.net
Cc: linux-m32r-ja@ml.linux-m32r.org
Cc: linux-m32r@ml.linux-m32r.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-metag@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-17 12:32:47 +02:00
Ingo Molnar
b5e4111f02 Merge branch 'locking/urgent' into locking/core, before applying larger changes and to refresh the branch with fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-17 11:45:29 +02:00
Alexander Yarygin
44b3802122 perf kvm: Use defines of kvm events
Currently perf-kvm uses string literals for kvm event names, but it
works only for x86, because other architectures may have other names for
those events.

To reduce dependence on architecture, we add <asm/kvm_perf.h> file with
defines for:

- kvm_entry and kvm_exit events,
- exit reason field name in kvm_exit event,
- length of exit reasons strings,
- vcpu_id field name in kvm trace events,

and replace literals in perf-kvm.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1404397747-20939-2-git-send-email-yarygin@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-07-16 17:57:32 -03:00
Borislav Petkov
af0fa6f6b5 x86, cpu: Kill cpu_has_mp
It was used only for checking for some K7s which didn't have MP support,
see

http://www.hardwaresecrets.com/article/How-to-Transform-an-Athlon-XP-into-an-Athlon-MP/24

and it was unconditionally set on 64-bit for no reason. Kill it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1403609105-8332-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-14 12:21:40 -07:00
Borislav Petkov
80a208bd39 x86/cpufeature: Add bug flags to /proc/cpuinfo
Dump the flags which denote we have detected and/or have applied bug
workarounds to the CPU we're executing on, in a similar manner to the
feature flags.

The advantage is that those are not accumulating over time like the CPU
features.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1403609105-8332-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-14 12:21:39 -07:00
Oren Twaig
411cf9ee29 x86, vsmp: Remove is_vsmp_box() from apic_is_clustered_box()
When a vSMP Foundation box is detected, the function apic_cluster_num() counts
the number of APIC clusters found. If more than one found, a multi board
configuration is assumed, and TSC marked as unstable. This behavior is
incorrect as vSMP Foundation may use processors from single node only, attached
to memory of other nodes - and such node may have more than one APIC cluster
(typically any recent intel box has more than single APIC_CLUSTERID(x)).

To fix this, we simply remove the code which detects a vSMP Foundation box and
affects apic_is_clusted_box() return value. This can be done because later the
kernel checks by itself if the TSC is stable using the
check_tsc_sync_[source|target]() functions and marks TSC as unstable if needed.

Acked-by: Shai Fultheim <shai@scalemp.com>
Signed-off-by: Oren Twaig <oren@scalemp.com>
Link: http://lkml.kernel.org/r/1404036068-11674-1-git-send-email-oren@scalemp.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-07-13 17:48:03 -07:00
Borislav Petkov
b08ee5f7e4 x86: Simplify __HAVE_ARCH_CMPXCHG tests
Both the 32-bit and 64-bit cmpxchg.h header define __HAVE_ARCH_CMPXCHG
and there's ifdeffery which checks it. But since both bitness define it,
we can just as well move it up to the main cmpxchg header and simpify a
bit of code in doing that.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20140711104338.GB17083@pd.tnic
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-11 17:28:51 -07:00
Andy Lutomirski
e6577a7ce9 x86, vdso: Move the vvar area before the vdso text
Putting the vvar area after the vdso text is rather complicated: it
only works of the total length of the vdso text mapping is known at
vdso link time, and the linker doesn't allow symbol addresses to
depend on the sizes of non-allocatable data after the PT_LOAD
segment.

Moving the vvar area before the vdso text will allow is to safely
map non-allocatable data after the vdso text, which is a nice
simplification.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/156c78c0d93144ff1055a66493783b9e56813983.1405040914.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-11 16:57:51 -07:00
Paolo Bonzini
17052f16a5 KVM: emulate: put pointers in the fetch_cache
This simplifies the code a bit, especially the overflow checks.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:03 +02:00
Bandan Das
41061cdb98 KVM: emulate: do not initialize memopp
rip_relative is only set if decode_modrm runs, and if you have ModRM
you will also have a memopp.  We can then access memopp unconditionally.
Note that rip_relative cannot be hoisted up to decode_modrm, or you
break "mov $0, xyz(%rip)".

Also, move typecast on "out of range value" of mem.ea to decode_modrm.

Together, all these optimizations save about 50 cycles on each emulated
instructions (4-6%).

Signed-off-by: Bandan Das <bsd@redhat.com>
[Fix immediate operands with rip-relative addressing. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:01 +02:00
Bandan Das
573e80fe04 KVM: emulate: rework seg_override
x86_decode_insn already sets a default for seg_override,
so remove it from the zeroed area. Also replace set/get functions
with direct access to the field.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:01 +02:00
Bandan Das
c44b4c6ab8 KVM: emulate: clean up initializations in init_decode_cache
A lot of initializations are unnecessary as they get set to
appropriate values before actually being used. Optimize
placement of fields in x86_emulate_ctxt

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:00 +02:00
Bandan Das
1498507a47 KVM: emulate: move init_decode_cache to emulate.c
Core emulator functions all belong in emulator.c,
x86 should have no knowledge of emulator internals

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:13:59 +02:00
Paolo Bonzini
54cfdb3e95 KVM: emulate: speed up emulated moves
We can just blindly move all 16 bytes of ctxt->src's value to ctxt->dst.
write_register_operand will take care of writing only the lower bytes.

Avoiding a call to memcpy (the compiler optimizes it out) gains about
200 cycles on kvm-unit-tests for register-to-register moves, and makes
them about as fast as arithmetic instructions.

We could perhaps get a larger speedup by moving all instructions _except_
moves out of x86_emulate_insn, removing opcode_len, and replacing the
switch statement with an inlined em_mov.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:13:58 +02:00
Paolo Bonzini
37ccdcbe07 KVM: x86: return all bits from get_interrupt_shadow
For the next patch we will need to know the full state of the
interrupt shadow; we will then set KVM_REQ_EVENT when one bit
is cleared.

However, right now get_interrupt_shadow only returns the one
corresponding to the emulated instruction, or an unconditional
0 if the emulated instruction does not have an interrupt shadow.
This is confusing and does not allow us to check for cleared
bits as mentioned above.

Clean the callback up, and modify toggle_interruptibility to
match the comment above the call.  As a small result, the
call to set_interrupt_shadow will be skipped in the common
case where int_shadow == 0 && mask == 0.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:13:56 +02:00
Bruno Prémont
20cde69402 x86, ia64: Move EFI_FB vga_default_device() initialization to pci_vga_fixup()
Commit b4aa016305 ("efifb: Implement vga_default_device() (v2)") added
efifb vga_default_device() so EFI systems that do not load shadow VBIOS or
setup VGA get proper value for boot_vga PCI sysfs attribute on the
corresponding PCI device.

Xorg doesn't detect devices when boot_vga=0, e.g., on some EFI systems such
as MacBookAir2,1.  Xorg detects the GPU and finds the DRI device but then
bails out with "no devices detected".

Note: When vga_default_device() is set boot_vga PCI sysfs attribute
reflects its state.  When unset this attribute is 1 whenever
IORESOURCE_ROM_SHADOW flag is set.

With introduction of sysfb/simplefb/simpledrm efifb is getting obsolete
while having native drivers for the GPU also makes selecting sysfb/efifb
optional.

Remove the efifb implementation of vga_default_device() and initialize
vgaarb's vga_default_device() with the PCI GPU that matches boot
screen_info in pci_fixup_video().

[bhelgaas: remove unused "dev" in efifb_setup()]
Fixes: b4aa016305 ("efifb: Implement vga_default_device() (v2)")
Tested-by: Anibal Francisco Martinez Cortina <linuxkid.zeuz@gmail.com>
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Matthew Garrett <matthew.garrett@nebula.com>
CC: stable@vger.kernel.org	# v3.5+
2014-07-10 16:48:48 -06:00
Tomasz Grabiec
0d3da0d26e KVM: x86: fix TSC matching
I've observed kvmclock being marked as unstable on a modern
single-socket system with a stable TSC and qemu-1.6.2 or qemu-2.0.0.

The culprit was failure in TSC matching because of overflow of
kvm_arch::nr_vcpus_matched_tsc in case there were multiple TSC writes
in a single synchronization cycle.

Turns out that qemu does multiple TSC writes during init, below is the
evidence of that (qemu-2.0.0):

The first one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm]
 0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm]

The second one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]

 #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
 #1  kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
 #2  kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
 #3  kvm_cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641
 #4  cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330
 #5  cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521
 #6  main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390

The third one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]

 #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
 #1  kvm_put_msrs  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
 #2  kvm_arch_put_registers  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
 #3  kvm_cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635
 #4  cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323
 #5  cpu_synchronize_all_post_reset () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:512
 #6  main  at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482

The fix is to count each vCPU only once when matched, so that
nr_vcpus_matched_tsc holds the size of the matched set. This is
achieved by reusing generation counters. Every vCPU with
this_tsc_generation == cur_tsc_generation is in the matched set. The
match set is cleared by setting cur_tsc_generation to a value which no
other vCPU is set to (by incrementing it).

I needed to bump up the counter size form u8 to u64 to ensure it never
overflows. Otherwise in cases TSC is not written the same number of
times on each vCPU the counter could overflow and incorrectly indicate
some vCPUs as being in the matched set. This scenario seems unlikely
but I'm not sure if it can be disregarded.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09 18:09:57 +02:00
Jan Kiszka
6cbc5f5a80 KVM: nSVM: Set correct port for IOIO interception evaluation
Obtaining the port number from DX is bogus as a) there are immediate
port accesses and b) user space may have changed the register content
while processing the PIO access. Forward the correct value from the
instruction emulator instead.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-09 18:09:56 +02:00
Ard Biesheuvel
f23cf8bd5c efi/x86: efistub: Move shared dependencies to <asm/efi.h>
This moves definitions depended upon both by code under arch/x86/boot
and under drivers/firmware/efi to <asm/efi.h>. This is in preparation of
turning the stub code under drivers/firmware/efi into a static library.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-07 20:29:46 +01:00
Tejun Heo
b9cd18de4d ptrace,x86: force IRET path after a ptrace_stop()
The 'sysret' fastpath does not correctly restore even all regular
registers, much less any segment registers or reflags values.  That is
very much part of why it's faster than 'iret'.

Normally that isn't a problem, because the normal ptrace() interface
catches the process using the signal handler infrastructure, which
always returns with an iret.

However, some paths can get caught using ptrace_event() instead of the
signal path, and for those we need to make sure that we aren't going to
return to user space using 'sysret'.  Otherwise the modifications that
may have been done to the register set by the tracer wouldn't
necessarily take effect.

Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
arch_ptrace_stop_needed() which is invoked from ptrace_stop().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03 17:27:23 -07:00
Linus Torvalds
4f23174981 A bunch of one-liners (except the s390 one).
The two more serious bugs ("KVM: SVM: Fix CPL export via SS.DPL" and
 "KVM: s390: add sie.h uapi header file to Kbuild and remove header
 dependency") were introduced in the 3.16 merge window.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTso6OAAoJEBvWZb6bTYbyB5IP/j/1d0hKsVjOGMco+dJ3uwjh
 X2gYBVxT3Nm9fyjAjSM6OjbFF2mj9zFNEGu0NvEaDnIlWtifvtXckFB1asp/o3/M
 UTbeaaN5US9Ou9W87KpJQLp7Wp9ENMgXeFsywpf9qMNyV04OHSP5cCwIspShCkNH
 r21oEwgvrnc4Trh4oBVUaykuPuU4mzAMBSiXxbQWVXkkPYBVBjGYNzdas7K3EfS9
 YmY1NgS1HDrUvRuM0b3guHqEizA717xFxpVpXAYhuxRqb1fRWMDuIy7q21hEoXxE
 RtuFjztfpAWIgK9O2j4mTuqR32nedzsieVMzF486oMPXRrUs+oQ16/AYV7K5eOZF
 Q1QJ5zx7890ncfxjXYMdUTI7d5sDFCc7F3DmRwtWh1jYrsNh8p+VRDTbNdmiNuIa
 1wXkoNpTsFHidXA6Uhl2pfI+o9OOWleEP3bB746RanS4bk54cDrx6UK2gJK6MMHl
 bekjzQrXRlh3qN1mqS+nMShq/vd2G6cCG0Y9ez8/aHrJoU5DPIOQ7IcOt2IZGtwZ
 MiBZAoWgHuYpEV4tXzqjHQy8IAddvGnM3RqWfNc0XLlVwcKosdI4fusg8wKnR02Z
 sLYRfe5BTnHG/ieIlx/iQmRXN04hhIUJiggFZrLizGemGYi16SaN/ixwb4YoA3nH
 ksZxlGKUi9SUftnv0Ph6
 =Rjb4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "A bunch of one-liners (except the s390 one).

  The two more serious bugs ("KVM: SVM: Fix CPL export via SS.DPL" and
  "KVM: s390: add sie.h uapi header file to Kbuild and remove header
  dependency") were introduced in the 3.16 merge window"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Fix CPL export via SS.DPL
  KVM: s390: add sie.h uapi header file to Kbuild and remove header dependency
  MIPS: KVM: Fix memory leak on VCPU
  KVM: x86: preserve the high 32-bits of the PAT register
  kvm: fix wrong address when writing Hyper-V tsc page
  KVM: x86: Increase the number of fixed MTRR regs to 10
2014-07-01 09:27:34 -07:00
Aaron Tomlin
f3aca3d095 nmi: provide the option to issue an NMI back trace to every cpu but current
Sometimes it is preferred not to use the trigger_all_cpu_backtrace()
routine when one wants to avoid capturing a back trace for current.  For
instance if one was previously captured recently.

This patch provides a new routine namely
trigger_allbutself_cpu_backtrace() which offers the flexibility to issue
an NMI to every cpu but current and capture a back trace accordingly.

Patch x86 and sparc to support new routine.

[dzickus@redhat.com: add stub in #else clause]
[dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion]
[sfr@canb.auug.org.au: undo C99ism]
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 16:47:44 -07:00
Vinod Koul
61b165caa6 ASoC: Intel: add mrfld pipelines
Merrifield DSP used various pipelines to identify the streams and processing
modules. Add these defination in the pcm driver and also add a table for device
entries to firmware pipeline id conversion

Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
2014-06-23 12:24:27 +01:00
Jiang Liu
df334bead7 x86, irq: Introduce helper functions to release IOAPIC pin
Introduce function mp_unmap_irq() to release IOAPIC IRQ when IRQ is not
used any more, which will typically called by pcibios_disabled_irq.

And function mp_irqdomain_unmap() is a common implementation of
irq_domain_ops.unmap for IOAPIC.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1402302011-23642-38-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 23:05:44 +02:00
Jiang Liu
9f354b0252 x86, irq: Clean up unused IOAPIC interface
Now we have converted all x86 platforms to use the common irqdomain map
interface. There's no caller of io_apic_set_pci_routing(),
setup_IO_APIC_irq_extra() and io_apic_setup_irq_pin_once() any more,
so kill them.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1402302011-23642-35-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 23:05:44 +02:00
Jiang Liu
15a3c7cc91 x86, irq: Introduce two helper functions to support irqdomain map operation
Currently there are multiple entries to program IOAPIC pins, such as
io_apic_setup_irq_pin_once(), io_apic_set_pci_routing() and
setup_IO_APIC_irq_extra() etc.

This patch introduces two functions to help consolidate the code to
program IOAPIC pins. Function mp_set_pin_attr() is used to optionally
set trigger, polarity and NUMA node property for an IOAPIC pin.
If mp_set_pin_attr() is not invoked for a pin, the default configuration
from BIOS will be used.

Function mp_irqdomain_map() is an common implementation of irqdomain map()
operation. It figures out attribures for pin and then actually programs
the IOAPIC pin. We hope this will be the only entrance for programming
IOAPIC pin.

And the flow will:
1) caller such as xxx_pci_irq_enable figures out pin attributes.
2) Invoke mp_set_pin_attr() to set attributes for a pin. If the pin has
   already bin programmed,  mp_set_pin_attr() will aslo detects attribute
   confictions.
3) Invoke mp_map_pin_to_irq()
3.1) If IRQ has already been assigned, return irq_find_mapping()
3.2) Else irq_create_mapping()
		->irq_domain_associate()
			->mp_irqdomain_map()
				->io_apic_setup_irq_pin()

So every pin will only programmed once by mp_irqdomain_map(), so we
could kill io_apic_setup_irq_pin_once(), io_apic_set_pci_routing() and
setup_IO_APIC_irq_extra() etc.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1402302011-23642-30-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 23:05:43 +02:00
Jiang Liu
facd8fdb25 x86, devicetree, irq: Use common mechanism to support irqdomain
Now the ioapic driver provides a common interface to create irqdomain,
so replace the private implementation.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Tony Lindgren <tony@atomide.com>
Link: http://lkml.kernel.org/r/1402302011-23642-29-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 23:05:43 +02:00
Jiang Liu
44767bfaae x86, irq: Enhance mp_register_ioapic() to support irqdomain
Enhance function mp_register_ioapic() to support irqdomain.
When registering IOAPIC, caller may provide callbacks and parameters
for creating irqdomain. The IOAPIC core will create irqdomain later
if caller has passed in corresponding parameters.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: sfi-devel@simplefirmware.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Tony Lindgren <tony@atomide.com>
Link: http://lkml.kernel.org/r/1402302011-23642-25-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 23:05:42 +02:00
Jiang Liu
d7f3d47818 x86, irq: Introduce mechanisms to support dynamically allocate IRQ for IOAPIC
Currently x86 support identity mapping between GSI(IOAPIC pin) and IRQ
number, so continous IRQs at low end are statically allocated to IOAPICs
at boot time. This design causes trouble to support IOAPIC hotplug.

This patch implements basic mechanism to dynamically allocate IRQ on
demand for IOAPIC pins by using irqdomain framework.

It first adds several fields into struct ioapic to support irqdomain.
Then it implements an algorithm to dynamically allocate IRQ number
for IOAPIC pins on demand.

Currently it supports three types of irqdomain:
1) LEGACY: used to support IOAPIC hosting legacy IRQs and building
   identity mapping for legacy IRQs. A speical case, we dynamically
   allocate IRQ number for IOAPIC pin which has GSI number below
   nr_legacy_irqs() but isn't legacy IRQ. This is for backward
   compatibility and avoid regression.
2) STRICT: build identity mapping between GSI and IRQ nubmer.
3) DYNAMIC: dynamically allocate IRQ number for IOAPIC pin on demand.

Legacy(ISA) IRQs is not managed by irqdomain because there may be
multiple pins sharing the same IRQ number and current irqdomain only
supports 1:1 mapping between pins and IRQ.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Link: http://lkml.kernel.org/r/1402302011-23642-24-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 23:05:42 +02:00
Jiang Liu
6b9fb70824 x86, ACPI, irq: Consolidate algorithm of mapping (ioapic, pin) to IRQ number
Currently ACPI and ioapic both implement algorithms to map (ioapic, pin)
to IRQ number. So consolidate the common part into one place, which is
also preparing for irqdomain support.

It introduces mp_map_gsi_to_irq(), which will be used to allocate IRQ
number IOAPIC pins when irqdomain is enabled.

Also rename gsi_to_irq() to map_gsi_to_irq(), later we will introduce
unmap_gsi_to_irq() when enabling IOAPIC hotplug.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Link: http://lkml.kernel.org/r/1402380812-32446-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 23:05:42 +02:00
Jiang Liu
95d76acc75 x86, irq: Count legacy IRQs by legacy_pic->nr_legacy_irqs instead of NR_IRQS_LEGACY
Some platforms, such as Intel MID and mshypv, do not support legacy
interrupt controllers. So count legacy IRQs by legacy_pic->nr_legacy_irqs
instead of hard-coded NR_IRQS_LEGACY.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: xen-devel@lists.xenproject.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Tony Lindgren <tony@atomide.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Link: http://lkml.kernel.org/r/1402302011-23642-20-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 23:05:42 +02:00
Jiang Liu
18e4855186 x86, irq: Introduce some helper utilities to improve readability
It also fixes an off by one bug in
	if ((ioapic_idx > 0) && (irq > NR_IRQS_LEGACY))
It should be
	if ((ioapic_idx > 0) && (irq >= NR_IRQS_LEGACY))

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1402302011-23642-17-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 23:05:41 +02:00
Jiang Liu
4035ed0134 x86, ioapic: Kill unused global variable timer_through_8259
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1402302011-23642-12-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 23:05:41 +02:00
Jiang Liu
3eb2be5f49 x86, irq, trivial: Minor improvements of IRQ related code
1) Kill unused MAX_HARDIRQS_PER_CPU.
2) Improve function prototype declararions.
3) Simple typo fix, change "gsit" to "gsi".
4) Use macro VECTOR_UNDEFINED instead of hard-coded -1.
5) Kill redundant comments.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jiri Kosina <trivial@kernel.org>
Link: http://lkml.kernel.org/r/1402302011-23642-11-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 23:05:41 +02:00
Jiang Liu
a491cc902c x86, mpparse: Simplify arch/x86/include/asm/mpspec.h
Simplify arch/x86/include/asm/mpspec.h by
1) Change max_physical_apicid to static as it's only used in apic.c.
2) Kill declaration of mpc_default_type, it's never defined.
3) Delete default_acpi_madt_oem_check(), it has already been declared
   in apic.h.
4) Make default_acpi_madt_oem_check() depends on CONFIG_X86_LOCAL_APIC
   instead of CONFIG_X86_64 to support i386.
5) Change mp_override_legacy_irq(), mp_config_acpi_legacy_irqs() and
   mp_register_gsi() as static because they are only used in acpi/boot.c.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1402302011-23642-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 23:05:40 +02:00
Paolo Bonzini
7cb060a91c KVM: x86: preserve the high 32-bits of the PAT register
KVM does not really do much with the PAT, so this went unnoticed for a
long time.  It is exposed however if you try to do rdmsr on the PAT
register.

Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 13:43:44 +02:00
Jan Kiszka
560b7ee12c KVM: nVMX: Fix returned value of MSR_IA32_VMX_PROCBASED_CTLS
SDM says bits 1, 4-6, 8, 13-16, and 26 have to be set.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:12 +02:00
Jan Kiszka
3dbcd8da7b KVM: nVMX: Advertise support for MSR_IA32_VMX_TRUE_*_CTLS
We already implemented them but failed to advertise them. Currently they
all return the identical values to the capability MSRs they are
augmenting. So there is no change in exposed features yet.

Drop related comments at this chance that are partially incorrect and
redundant anyway.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:11 +02:00
Jan Kiszka
e4aa5288ff KVM: x86: Fix constant value of VM_{EXIT_SAVE,ENTRY_LOAD}_DEBUG_CONTROLS
The spec says those controls are at bit position 2 - makes 4 as value.

The impact of this mistake is effectively zero as we only use them to
ensure that these features are set at position 2 (or, previously, 1) in
MSR_IA32_VMX_{EXIT,ENTRY}_CTLS - which is and will be always true
according to the spec.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 12:52:11 +02:00
Saurabh Tangri
eeb9db09f7 x86/efi: Move all workarounds to a separate file quirks.c
Currently, it's difficult to find all the workarounds that are
applied when running on EFI, because they're littered throughout
various code paths. This change moves all of them into a separate
file with the hope that it will be come the single location for all
our well documented quirks.

Signed-off-by: Saurabh Tangri <saurabh.tangri@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-06-19 11:14:33 +01:00
Nadav Amit
682367c494 KVM: x86: Increase the number of fixed MTRR regs to 10
Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
sometime make assumptions on CPUs while they ignore capability MSRs, it is
better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
actually supported has no functional implications.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19 11:41:14 +02:00
Borislav Petkov
9b13a93df2 x86, cpufeature: Convert more "features" to bugs
X86_FEATURE_FXSAVE_LEAK, X86_FEATURE_11AP and
X86_FEATURE_CLFLUSH_MONITOR are not really features but synthetic bits
we use for applying different bug workarounds. Call them what they
really are, and make sure they get the proper cross-CPU behavior (OR
rather than AND).

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1403042783-23278-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-06-18 15:27:04 -07:00
H. Peter Anvin
03ab3da3b2 Linux 3.16-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTnmhkAAoJEHm+PkMAQRiGYyoIAJ+dCxQjx0Jmu5VLs48yvUkx
 AVbnJeEq0ScgFPBAlfrGnnXczOyZGD+wveNRwb3bN4f6hGkWHPncfExAeTU35QQ+
 92kCLScPU6kn4tnTqjac4ZrUhBCfgg5hFFamXOPidVDG4f5wYwql3IvP4O3eMOcZ
 IOx7kgweqoPm4A/LjXQ3L21kghs88NXySWMwwfWFdXaV++ey07slCWLzGF5rMDh/
 xfBDfgTS4gdrbGeE+1z/qkoWyHwKnCad8Uh2Fu5CcprElwCjLLhrLPccYSRKO2IR
 2ZSj/mcMb1FhH7AOyXBYMVbjhOH5MCIlHvJYYp7kwHfs66UTmnkczOJxq1ynKP0=
 =Whty
 -----END PGP SIGNATURE-----

Merge tag 'v3.16-rc1' into x86/cpufeature

Linux 3.16-rc1

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-06-18 15:26:19 -07:00
Borislav Petkov
d0f2dd1861 x86, xsave: Add forgotten inline annotation
Add a missing inline annotation on a static function, in order to shut
up a bunch of warnings like:

In file included from arch/x86/crypto/camellia_aesni_avx_glue.c:23:0:
./arch/x86/include/asm/xsave.h:73:12: warning: ‘xsave_state_booting’ defined but not used [-Wunused-function]
 static int xsave_state_booting(struct xsave_struct *fx, u64 mask)
            ^
In file included from arch/x86/crypto/camellia_aesni_avx2_glue.c:23:0:
./arch/x86/include/asm/xsave.h:73:12: warning: ‘xsave_state_booting’ defined but not used [-Wunused-function]
 static int xsave_state_booting(struct xsave_struct *fx, u64 mask)
            ^
...

Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1403000468-30094-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-06-18 15:19:59 -07:00
Peter Zijlstra
4f3aaf2c2b x86, locking: Use no more OOSTORE nonsense
Paul reported that X86_OOSTORE is dead, yay! Update a comment and
remove a newly added reference.

Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20140611090509.GC3588@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-18 18:41:22 +02:00
Nadav Amit
67f4d4288c KVM: x86: rdpmc emulation checks the counter incorrectly
The rdpmc emulation checks that the counter (ECX) is not higher than 2, without
taking into considerations bits 30:31 role (e.g., bit 30 marks whether the
counter is fixed). The fix uses the pmu information for checking the validity
of the pmu counter.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 17:46:18 +02:00
Tejun Heo
6fbc07bbe2 percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations
__verify_pcpu_ptr() is used to verify that a specified parameter is
actually an percpu pointer by percpu accessor and operation
implementations.  Currently, where it's called isn't clearly defined
and we just ensure that it's invoked at least once for all accessors
and operations.

The lack of clarity on when it should be called isn't nice and given
that this is a completely generic issue, there's no reason to make
archs worry about it.

This patch updates __verify_pcpu_ptr() invocations such that it's
always invoked from the final generic wrapper once per access or
operation.  As this is already the case for {raw|this}_cpu_*()
definitions through __pcpu_size_*(), only the {raw|per|this}_cpu_ptr()
accessors need to be updated.

This change makes it unnecessary for archs to worry about
__verify_pcpu_ptr().  x86's arch_raw_cpu_ptr() is updated accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2014-06-17 19:12:40 -04:00
Tejun Heo
bbc344e1e3 percpu: introduce arch_raw_cpu_ptr()
Currently, archs can override raw_cpu_ptr() directly; however, we
wanna build a layer of indirection in the generic part of percpu so
that we can implement generic features there without affecting archs.

Introduce arch_raw_cpu_ptr() which is used to define raw_cpu_ptr() by
generic percpu code.  The two are identical for now.  x86 is currently
the only arch which overrides raw_cpu_ptr() and is converted to
define arch_raw_cpu_ptr() instead.

This doesn't introduce any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2014-06-17 19:12:34 -04:00
Linus Torvalds
3737a12761 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more perf updates from Ingo Molnar:
 "A second round of perf updates:

   - wide reaching kprobes sanitization and robustization, with the hope
     of fixing all 'probe this function crashes the kernel' bugs, by
     Masami Hiramatsu.

   - uprobes updates from Oleg Nesterov: tmpfs support, corner case
     fixes and robustization work.

   - perf tooling updates and fixes from Jiri Olsa, Namhyung Ki, Arnaldo
     et al:
        * Add support to accumulate hist periods (Namhyung Kim)
        * various fixes, refactorings and enhancements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
  perf: Differentiate exec() and non-exec() comm events
  perf: Fix perf_event_comm() vs. exec() assumption
  uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates
  perf/documentation: Add description for conditional branch filter
  perf/x86: Add conditional branch filtering support
  perf/tool: Add conditional branch filter 'cond' to perf record
  perf: Add new conditional branch filter 'PERF_SAMPLE_BRANCH_COND'
  uprobes: Teach copy_insn() to support tmpfs
  uprobes: Shift ->readpage check from __copy_insn() to uprobe_register()
  perf/x86: Use common PMU interrupt disabled code
  perf/ARM: Use common PMU interrupt disabled code
  perf: Disable sampled events if no PMU interrupt
  perf: Fix use after free in perf_remove_from_context()
  perf tools: Fix 'make help' message error
  perf record: Fix poll return value propagation
  perf tools: Move elide bool into perf_hpp_fmt struct
  perf tools: Remove elide setup for SORT_MODE__MEMORY mode
  perf tools: Fix "==" into "=" in ui_browser__warning assignment
  perf tools: Allow overriding sysfs and proc finding with env var
  perf tools: Consider header files outside perf directory in tags target
  ...
2014-06-12 19:18:49 -07:00
Linus Torvalds
c29deef32e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more locking changes from Ingo Molnar:
 "This is the second round of locking tree updates for v3.16, offering
  large system scalability improvements:

 - optimistic spinning for rwsems, from Davidlohr Bueso.

 - 'qrwlocks' core code and x86 enablement, from Waiman Long and PeterZ"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, locking/rwlocks: Enable qrwlocks on x86
  locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks
  locking/mutexes: Documentation update/rewrite
  locking/rwsem: Fix checkpatch.pl warnings
  locking/rwsem: Fix warnings for CONFIG_RWSEM_GENERIC_SPINLOCK
  locking/rwsem: Support optimistic spinning
2014-06-12 18:48:15 -07:00
Linus Torvalds
f9da455b93 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

 4) BPF now has a "random" opcode, from Chema Gonzalez.

 5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

 6) Support TCP fastopen over ipv6, from Daniel Lee.

 7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers.  From Ezequiel Garcia.

 8) Support software TSO in fec driver too, from Nimrod Andy.

 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
  rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
  tcp: fixing TLP's FIN recovery
  net: fec: Add software TSO support
  net: fec: Add Scatter/gather support
  net: fec: Increase buffer descriptor entry number
  net: fec: Factorize feature setting
  net: fec: Enable IP header hardware checksum
  net: fec: Factorize the .xmit transmit function
  bridge: fix compile error when compiling without IPv6 support
  bridge: fix smatch warning / potential null pointer dereference
  via-rhine: fix full-duplex with autoneg disable
  bnx2x: Enlarge the dorq threshold for VFs
  bnx2x: Check for UNDI in uncommon branch
  bnx2x: Fix 1G-baseT link
  bnx2x: Fix link for KR with swapped polarity lane
  sctp: Fix sk_ack_backlog wrap-around problem
  net/core: Add VF link state control policy
  net/fsl: xgmac_mdio is dependent on OF_MDIO
  net/fsl: Make xgmac_mdio read error message useful
  net_sched: drr: warn when qdisc is not work conserving
  ...
2014-06-12 14:27:40 -07:00
Oleg Nesterov
36fac0a214 signals: kill sigfindinword()
It has no users and it doesn't look useful.  I do not know why/when it was
introduced, I can't even find any user in the git history.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:11 -07:00
Waiman Long
bd01ec1a13 x86, locking/rwlocks: Enable qrwlocks on x86
Make x86 use the fair rwlock_t.

Implement the custom queue_write_unlock() for best performance.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
[peterz: near complete rewrite]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
Link: http://lkml.kernel.org/n/tip-r1xuzmdysvuhl3h86n5fbxi7@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-06 07:58:29 +02:00
Ingo Molnar
ec00010972 Merge branch 'perf/urgent' into perf/core, to resolve conflict and to prepare for new patches
Conflicts:
	arch/x86/kernel/traps.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-06 07:55:06 +02:00
Linus Torvalds
046f153343 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 EFI updates from Peter Anvin:
 "A collection of EFI changes.  The perhaps most important one is to
  fully save and restore the FPU state around each invocation of EFI
  runtime, and to not choke on non-ASCII characters in the boot stub"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efivars: Add compatibility code for compat tasks
  efivars: Refactor sanity checking code into separate function
  efivars: Stop passing a struct argument to efivar_validate()
  efivars: Check size of user object
  efivars: Use local variables instead of a pointer dereference
  x86/efi: Save and restore FPU context around efi_calls (i386)
  x86/efi: Save and restore FPU context around efi_calls (x86_64)
  x86/efi: Implement a __efi_call_virt macro
  x86, fpu: Extend the use of static_cpu_has_safe
  x86/efi: Delete most of the efi_call* macros
  efi: x86: Handle arbitrary Unicode characters
  efi: Add get_dram_base() helper function
  efi: Add shared printk wrapper for consistent prefixing
  efi: create memory map iteration helper
  efi: efi-stub-helper cleanup
2014-06-05 08:16:29 -07:00
Linus Torvalds
a0abcf2e8f Merge branch 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 cdso updates from Peter Anvin:
 "Vdso cleanups and improvements largely from Andy Lutomirski.  This
  makes the vdso a lot less ''special''"

* 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso, build: Make LE access macros clearer, host-safe
  x86/vdso, build: Fix cross-compilation from big-endian architectures
  x86/vdso, build: When vdso2c fails, unlink the output
  x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
  x86, mm: Replace arch_vma_name with vm_ops->name for vsyscalls
  x86, mm: Improve _install_special_mapping and fix x86 vdso naming
  mm, fs: Add vm_ops->name as an alternative to arch_vma_name
  x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
  x86, vdso: Remove vestiges of VDSO_PRELINK and some outdated comments
  x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
  x86, vdso: Move the 32-bit vdso special pages after the text
  x86, vdso: Reimplement vdso.so preparation in build-time C
  x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c
  x86, vdso: Clean up 32-bit vs 64-bit vdso params
  x86, mm: Ensure correct alignment of the fixmap
2014-06-05 08:05:29 -07:00
Linus Torvalds
2071b3e34f Merge branch 'x86/espfix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86-64 espfix changes from Peter Anvin:
 "This is the espfix64 code, which fixes the IRET information leak as
  well as the associated functionality problem.  With this code applied,
  16-bit stack segments finally work as intended even on a 64-bit
  kernel.

  Consequently, this patchset also removes the runtime option that we
  added as an interim measure.

  To help the people working on Linux kernels for very small systems,
  this patchset also makes these compile-time configurable features"

* 'x86/espfix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"
  x86, espfix: Make it possible to disable 16-bit support
  x86, espfix: Make espfix64 a Kconfig option, fix UML
  x86, espfix: Fix broken header guard
  x86, espfix: Move espfix definitions into a separate header file
  x86-32, espfix: Remove filter for espfix32 due to race
  x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
2014-06-05 07:46:15 -07:00
Ingo Molnar
8c6e549a44 Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
Pull uprobes tmpfs support patches from Oleg Nesterov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 16:35:30 +02:00
Oleg Nesterov
5cdb76d6f0 uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates
Purely cosmetic, no changes in .o,

1. As Jim pointed out arch_uprobe->def looks ambiguous, rename it to
   ->defparam.

2. Add the comment into default_post_xol_op() to explain "regs->sp +=".

3. Remove the stale part of the comment in arch_uprobe_analyze_insn().

Suggested-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-06-05 16:21:57 +02:00
Cliff Wickman
a26fd71953 x86/uv: Update the UV3 TLB shootdown logic
Update of TLB shootdown code for UV3.

Kernel function native_flush_tlb_others() calls
uv_flush_tlb_others() on UV to invalidate tlb page definitions
on remote cpus. The UV systems have a hardware 'broadcast assist
unit' which can be used to broadcast shootdown messages to all
cpu's of selected nodes.

The behavior of the BAU has changed only slightly with UV3:

  - UV3 is recognized with is_uv3_hub().
  - UV2 functions and structures (uv2_xxx) are in most cases
    simply renamed to uv2_3_xxx.
  - Some UV2 error workarounds are not needed for UV3.
    (see uv_bau_message_interrupt and enable_timeouts)

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Link: http://lkml.kernel.org/r/E1WkgWh-0001yJ-3K@eag09.americas.sgi.com
[ Removed a few linebreak uglies. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 14:17:20 +02:00
Ingo Molnar
10b0256496 Merge branch 'perf/kprobes' into perf/core
Conflicts:
	arch/x86/kernel/traps.c

The kprobes enhancements are fully cooked, ship them upstream.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 12:26:50 +02:00
Linus Torvalds
00170fdd08 Merge branch 'akpm' (patchbomb from Andrew) into next
Merge misc updates from Andrew Morton:

 - a few fixes for 3.16.  Cc'ed to stable so they'll get there somehow.

 - various misc fixes and cleanups

 - most of the ocfs2 queue.  Review is slow...

 - most of MM.  The MM queue is pretty huge this time, but not much in
   the way of feature work.

 - some tweaks under kernel/

 - printk maintenance work

 - updates to lib/

 - checkpatch updates

 - tweaks to init/

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (276 commits)
  fs/autofs4/dev-ioctl.c: add __init to autofs_dev_ioctl_init
  fs/ncpfs/getopt.c: replace simple_strtoul by kstrtoul
  init/main.c: remove an ifdef
  kthreads: kill CLONE_KERNEL, change kernel_thread(kernel_init) to avoid CLONE_SIGHAND
  init/main.c: add initcall_blacklist kernel parameter
  init/main.c: don't use pr_debug()
  fs/binfmt_flat.c: make old_reloc() static
  fs/binfmt_elf.c: fix bool assignements
  fs/efs: convert printk(KERN_DEBUG to pr_debug
  fs/efs: add pr_fmt / use __func__
  fs/efs: convert printk to pr_foo()
  scripts/checkpatch.pl: device_initcall is not the only __initcall substitute
  checkpatch: check stable email address
  checkpatch: warn on unnecessary void function return statements
  checkpatch: prefer kstrto<foo> to sscanf(buf, "%<lhuidx>", &bar);
  checkpatch: add warning for kmalloc/kzalloc with multiply
  checkpatch: warn on #defines ending in semicolon
  checkpatch: make --strict a default for files in drivers/net and net/
  checkpatch: always warn on missing blank line after variable declaration block
  checkpatch: fix wildcard DT compatible string checking
  ...
2014-06-04 16:55:13 -07:00
Fabian Frederick
f6187769da sys_sgetmask/sys_ssetmask: add CONFIG_SGETMASK_SYSCALL
sys_sgetmask and sys_ssetmask are obsolete system calls no longer
supported in libc.

This patch replaces architecture related __ARCH_WANT_SYS_SGETMAX by expert
mode configuration.That option is enabled by default for those
architectures.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:14 -07:00
Chen Yucong
65eb71823b hwpoison: remove unused global variable in do_machine_check()
Remove an unused global variable mce_entry and relative operations in
do_machine_check().

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:11 -07:00
Cyrill Gorcunov
2bf01f9f0c mm: x86 pgtable: require X86_64 for soft-dirty tracker
Tracking dirty status on 2 level pages requires very ugly macros and
taking into account how old the machines who can operate without PAE
mode only are, lets drop soft dirty tracker from them for code
simplicity (note I can't drop all the macros from 2 level pages by now
since _PAGE_BIT_PROTNONE and _PAGE_BIT_FILE are still used even without
tracker).

Linus proposed to completely rip off softdirty support on x86-32 (even
with PAE) and since for CRIU we're not planning to support native x86-32
mode, lets do that.

(Softdirty tracker is relatively new feature which is mostly used by
CRIU so I don't expect if such API change would cause problems for
userspace).

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:05 -07:00
Cyrill Gorcunov
2373eaecff mm: x86 pgtable: drop unneeded preprocessor ifdef
_PAGE_BIT_FILE (bit 6) is always less than _PAGE_BIT_PROTNONE (bit 8), so
drop redundant #ifdef.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:05 -07:00
Akinobu Mita
9c5a362142 x86: enable DMA CMA with swiotlb
The DMA Contiguous Memory Allocator support on x86 is disabled when
swiotlb config option is enabled.  So DMA CMA is always disabled on
x86_64 because swiotlb is always enabled.  This attempts to support for
DMA CMA with enabling swiotlb config option.

The contiguous memory allocator on x86 is integrated in the function
dma_generic_alloc_coherent() which is .alloc callback in nommu_dma_ops
for dma_alloc_coherent().

x86_swiotlb_alloc_coherent() which is .alloc callback in swiotlb_dma_ops
tries to allocate with dma_generic_alloc_coherent() firstly and then
swiotlb_alloc_coherent() is called as a fallback.

The main part of supporting DMA CMA with swiotlb is that changing
x86_swiotlb_free_coherent() which is .free callback in swiotlb_dma_ops
for dma_free_coherent() so that it can distinguish memory allocated by
dma_generic_alloc_coherent() from one allocated by
swiotlb_alloc_coherent() and release it with dma_generic_free_coherent()
which can handle contiguous memory.  This change requires making
is_swiotlb_buffer() global function.

This also needs to change .free callback in the dma_map_ops for amd_gart
and sta2x11, because these dma_ops are also using
dma_generic_alloc_coherent().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:57 -07:00
Mel Gorman
c46a7c817e x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels
_PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
faults on x86.  Care is taken such that _PAGE_NUMA is used only in
situations where the VMA flags distinguish between NUMA hinting faults
and prot_none faults.  This decision was x86-specific and conceptually
it is difficult requiring special casing to distinguish between PROTNONE
and NUMA ptes based on context.

Fundamentally, we only need the _PAGE_NUMA bit to tell the difference
between an entry that is really unmapped and a page that is protected
for NUMA hinting faults as if the PTE is not present then a fault will
be trapped.

Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset.
This patch shrinks the maximum possible swap size and uses the bit to
uniquely distinguish between NUMA hinting ptes and swap ptes.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:55 -07:00
Linus Torvalds
d09cc3659d Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull core irq updates from Thomas Gleixner:
 "The irq department delivers:

   - Another tree wide update to get rid of the horrible create_irq
     interface along with its even more horrible variants.  That also
     gets rid of the last leftovers of the initial sparse irq hackery.
     arch/driver specific changes have been either acked or ignored.

   - A fix for the spurious interrupt detection logic with threaded
     interrupts.

   - A new ARM SoC interrupt controller

   - The usual pile of fixes and improvements all over the place"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  Documentation: brcmstb-l2: Add Broadcom STB Level-2 interrupt controller binding
  irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller
  genirq: Improve documentation to match current implementation
  ARM: iop13xx: fix msi support with sparse IRQ
  genirq: Provide !SMP stub for irq_set_affinity_notifier()
  irqchip: armada-370-xp: Move the devicetree binding documentation
  irqchip: gic: Use mask field in GICC_IAR
  genirq: Remove dynamic_irq mess
  ia64: Use irq_init_desc
  genirq: Replace dynamic_irq_init/cleanup
  genirq: Remove irq_reserve_irq[s]
  genirq: Replace reserve_irqs in core code
  s390: Avoid call to irq_reserve_irqs()
  s390: Remove pointless arch_show_interrupts()
  s390: pci: Check return value of alloc_irq_desc() proper
  sh: intc: Remove pointless irq_reserve_irqs() invocation
  x86, irq: Remove pointless irq_reserve_irqs() call
  genirq: Make create/destroy_irq() ia64 private
  tile: Use SPARSE_IRQ
  tile: pci: Use irq_alloc/free_hwirq()
  ...
2014-06-04 15:59:13 -07:00
Linus Torvalds
4dc4226f99 ACPI and power management updates for 3.16-rc1
- ACPICA update to upstream version 20140424.  That includes a
    number of fixes and improvements related to things like GPE
    handling, table loading, headers, memory mapping and unmapping,
    DSDT/SSDT overriding, and the Unload() operator.  The acpidump
    utility from upstream ACPICA is included too.  From Bob Moore,
    Lv Zheng, David Box, David Binderman, and Colin Ian King.
 
  - Fixes and cleanups related to ACPI video and backlight interfaces
    from Hans de Goede.  That includes blacklist entries for some new
    machines and using native backlight by default.
 
  - ACPI device enumeration changes to create platform devices
    rather than PNP devices for ACPI device objects with _HID by
    default.  PNP devices will still be created for the ACPI device
    object with device IDs corresponding to real PNP devices, so
    that change should not break things left and right, and we're
    expecting to see more and more ACPI-enumerated platform devices
    in the future.  From Zhang Rui and Rafael J Wysocki.
 
  - Updates for the ACPI LPSS (Low-Power Subsystem) driver allowing
    it to handle system suspend/resume on Asus T100 correctly.
    From Heikki Krogerus and Rafael J Wysocki.
 
  - PM core update introducing a mechanism to allow runtime-suspended
    devices to stay suspended over system suspend/resume transitions
    if certain additional conditions related to coordination within
    device hierarchy are met.  Related PM documentation update and
    ACPI PM domain support for the new feature.  From Rafael J Wysocki.
 
  - Fixes and improvements related to the "freeze" sleep state. They
    affect several places including cpuidle, PM core, ACPI core, and
    the ACPI battery driver.  From Rafael J Wysocki and Zhang Rui.
 
  - Miscellaneous fixes and updates of the ACPI core from Aaron Lu,
    Bjørn Mork, Hanjun Guo, Lan Tianyu, and Rafael J Wysocki.
 
  - Fixes and cleanups for the ACPI processor and ACPI PAD (Processor
    Aggregator Device) drivers from Baoquan He, Manuel Schölling,
    Tony Camuso, and Toshi Kani.
 
  - System suspend/resume optimization in the ACPI battery driver from
    Lan Tianyu.
 
  - OPP (Operating Performance Points) subsystem updates from
    Chander Kashyap, Mark Brown, and Nishanth Menon.
 
  - cpufreq core fixes, updates and cleanups from Srivatsa S Bhat,
    Stratos Karafotis, and Viresh Kumar.
 
  - Updates, fixes and cleanups for the Tegra, powernow-k8, imx6q,
    s5pv210, nforce2, and powernv cpufreq drivers from Brian Norris,
    Jingoo Han, Paul Bolle, Philipp Zabel, Stratos Karafotis, and
    Viresh Kumar.
 
  - intel_pstate driver fixes and cleanups from Dirk Brandewie,
    Doug Smythies, and Stratos Karafotis.
 
  - Enabling the big.LITTLE cpufreq driver on arm64 from Mark Brown.
 
  - Fix for the cpuidle menu governor from Chander Kashyap.
 
  - New ARM clps711x cpuidle driver from Alexander Shiyan.
 
  - Hibernate core fixes and cleanups from Chen Gang, Dan Carpenter,
    Fabian Frederick, Pali Rohár, and Sebastian Capella.
 
  - Intel RAPL (Running Average Power Limit) driver updates from
    Jacob Pan.
 
  - PNP subsystem updates from Bjorn Helgaas and Fabian Frederick.
 
  - devfreq core updates from Chanwoo Choi and Paul Bolle.
 
  - devfreq updates for exynos4 and exynos5 from Chanwoo Choi and
    Bartlomiej Zolnierkiewicz.
 
  - turbostat tool fix from Jean Delvare.
 
  - cpupower tool updates from Prarit Bhargava, Ramkumar Ramachandra
    and Thomas Renninger.
 
  - New ACPI ec_access.c tool for poking at the EC in a safe way
    from Thomas Renninger.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTjl16AAoJEILEb/54YlRxeKgP/RRQSV7lFtf582Dw/5M/iWOg
 qYeNtuYFLArEmJ7SpxHdKsU1ZRm3CahAS1j7grvQMQasUxTzoavMcSBNZefeaoNK
 d01LVNqcyKCZs3+izRezk5N1IY+AjdrOcqCdIk8rfgFnc6kOttYUrVcIzKuIKAvJ
 MsJ5s/uqP8G69FsAA3Ttdtr0HKiQhN4skSt424wntQRDeJNZPBs74mPKBGh8bxlO
 Zr/VCDibKQ2Z8jS7x+TzwZrOxgE1/9x0Cub6GAdTvAfS8A+utPwSkneUyopNqpQ+
 tJ5rz5R+HpmPMerizBuU+5s+tvjDPtH4/OZvOPSpYraQSFLOwx3hAm+a5k7fOGmc
 XWjXnXWT0i0V3iQkwrspTNjX1RgywbsHbmXrcWn192HResvMQ9zk2gH2ch6m8JhN
 yTV5V51dOZicpPuaTCvIkJpsV33p6vRz+EdPBiXoEdua5KKtOg8EnQ470dNaMR92
 3ZtWmIvSgGlyPyHlSHLfGXbPUwTYvDNV3aheIoXp9E6WY3WJN9J3WXm4EHKBNVaI
 H83kwuk1s92cgqh22H5Pcb0CmDcrbkUdP6hhsPS/aL80/EJMljRP2AYW1Y+l1LAf
 pzMLmekHFqQEDjFQltwGvFV/EjFeMHnqOgQONx9ygMaayCGGTYSDx3FbRDesf8t9
 qhoFcTPSxoo0XjrGrR6b
 =tpdF
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm into next

Pull ACPI and power management updates from Rafael Wysocki:
 "ACPICA is the leader this time (63 commits), followed by cpufreq (28
  commits), devfreq (15 commits), system suspend/hibernation (12
  commits), ACPI video and ACPI device enumeration (10 commits each).

  We have no major new features this time, but there are a few
  significant changes of how things work.  The most visible one will
  probably be that we are now going to create platform devices rather
  than PNP devices by default for ACPI device objects with _HID.  That
  was long overdue and will be really necessary to be able to use the
  same drivers for the same hardware blocks on ACPI and DT-based systems
  going forward.  We're not expecting fallout from this one (as usual),
  but it's something to watch nevertheless.

  The second change having a chance to be visible is that ACPI video
  will now default to using native backlight rather than the ACPI
  backlight interface which should generally help systems with broken
  Win8 BIOSes.  We're hoping that all problems with the native backlight
  handling that we had previously have been addressed and we are in a
  good enough shape to flip the default, but this change should be easy
  enough to revert if need be.

  In addition to that, the system suspend core has a new mechanism to
  allow runtime-suspended devices to stay suspended throughout system
  suspend/resume transitions if some extra conditions are met
  (generally, they are related to coordination within device hierarchy).
  However, enabling this feature requires cooperation from the bus type
  layer and for now it has only been implemented for the ACPI PM domain
  (used by ACPI-enumerated platform devices mostly today).

  Also, the acpidump utility that was previously shipped as a separate
  tool will now be provided by the upstream ACPICA along with the rest
  of ACPICA code, which will allow it to be more up to date and better
  supported, and we have one new cpuidle driver (ARM clps711x).

  The rest is improvements related to certain specific use cases,
  cleanups and fixes all over the place.

  Specifics:

   - ACPICA update to upstream version 20140424.  That includes a number
     of fixes and improvements related to things like GPE handling,
     table loading, headers, memory mapping and unmapping, DSDT/SSDT
     overriding, and the Unload() operator.  The acpidump utility from
     upstream ACPICA is included too.  From Bob Moore, Lv Zheng, David
     Box, David Binderman, and Colin Ian King.

   - Fixes and cleanups related to ACPI video and backlight interfaces
     from Hans de Goede.  That includes blacklist entries for some new
     machines and using native backlight by default.

   - ACPI device enumeration changes to create platform devices rather
     than PNP devices for ACPI device objects with _HID by default.  PNP
     devices will still be created for the ACPI device object with
     device IDs corresponding to real PNP devices, so that change should
     not break things left and right, and we're expecting to see more
     and more ACPI-enumerated platform devices in the future.  From
     Zhang Rui and Rafael J Wysocki.

   - Updates for the ACPI LPSS (Low-Power Subsystem) driver allowing it
     to handle system suspend/resume on Asus T100 correctly.  From
     Heikki Krogerus and Rafael J Wysocki.

   - PM core update introducing a mechanism to allow runtime-suspended
     devices to stay suspended over system suspend/resume transitions if
     certain additional conditions related to coordination within device
     hierarchy are met.  Related PM documentation update and ACPI PM
     domain support for the new feature.  From Rafael J Wysocki.

   - Fixes and improvements related to the "freeze" sleep state.  They
     affect several places including cpuidle, PM core, ACPI core, and
     the ACPI battery driver.  From Rafael J Wysocki and Zhang Rui.

   - Miscellaneous fixes and updates of the ACPI core from Aaron Lu,
     Bjørn Mork, Hanjun Guo, Lan Tianyu, and Rafael J Wysocki.

   - Fixes and cleanups for the ACPI processor and ACPI PAD (Processor
     Aggregator Device) drivers from Baoquan He, Manuel Schölling, Tony
     Camuso, and Toshi Kani.

   - System suspend/resume optimization in the ACPI battery driver from
     Lan Tianyu.

   - OPP (Operating Performance Points) subsystem updates from Chander
     Kashyap, Mark Brown, and Nishanth Menon.

   - cpufreq core fixes, updates and cleanups from Srivatsa S Bhat,
     Stratos Karafotis, and Viresh Kumar.

   - Updates, fixes and cleanups for the Tegra, powernow-k8, imx6q,
     s5pv210, nforce2, and powernv cpufreq drivers from Brian Norris,
     Jingoo Han, Paul Bolle, Philipp Zabel, Stratos Karafotis, and
     Viresh Kumar.

   - intel_pstate driver fixes and cleanups from Dirk Brandewie, Doug
     Smythies, and Stratos Karafotis.

   - Enabling the big.LITTLE cpufreq driver on arm64 from Mark Brown.

   - Fix for the cpuidle menu governor from Chander Kashyap.

   - New ARM clps711x cpuidle driver from Alexander Shiyan.

   - Hibernate core fixes and cleanups from Chen Gang, Dan Carpenter,
     Fabian Frederick, Pali Rohár, and Sebastian Capella.

   - Intel RAPL (Running Average Power Limit) driver updates from Jacob
     Pan.

   - PNP subsystem updates from Bjorn Helgaas and Fabian Frederick.

   - devfreq core updates from Chanwoo Choi and Paul Bolle.

   - devfreq updates for exynos4 and exynos5 from Chanwoo Choi and
     Bartlomiej Zolnierkiewicz.

   - turbostat tool fix from Jean Delvare.

   - cpupower tool updates from Prarit Bhargava, Ramkumar Ramachandra
     and Thomas Renninger.

   - New ACPI ec_access.c tool for poking at the EC in a safe way from
     Thomas Renninger"

* tag 'pm+acpi-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (187 commits)
  ACPICA: Namespace: Remove _PRP method support.
  intel_pstate: Improve initial busy calculation
  intel_pstate: add sample time scaling
  intel_pstate: Correct rounding in busy calculation
  intel_pstate: Remove C0 tracking
  PM / hibernate: fixed typo in comment
  ACPI: Fix x86 regression related to early mapping size limitation
  ACPICA: Tables: Add mechanism to control early table checksum verification.
  ACPI / scan: use platform bus type by default for _HID enumeration
  ACPI / scan: always register ACPI LPSS scan handler
  ACPI / scan: always register memory hotplug scan handler
  ACPI / scan: always register container scan handler
  ACPI / scan: Change the meaning of missing .attach() in scan handlers
  ACPI / scan: introduce platform_id device PNP type flag
  ACPI / scan: drop unsupported serial IDs from PNP ACPI scan handler ID list
  ACPI / scan: drop IDs that do not comply with the ACPI PNP ID rule
  ACPI / PNP: use device ID list for PNPACPI device enumeration
  ACPI / scan: .match() callback for ACPI scan handlers
  ACPI / battery: wakeup the system only when necessary
  power_supply: allow power supply devices registered w/o wakeup source
  ...
2014-06-04 08:57:16 -07:00
Linus Torvalds
b05d59dfce At over 200 commits, covering almost all supported architectures, this
was a pretty active cycle for KVM.  Changes include:
 
 - a lot of s390 changes: optimizations, support for migration,
   GDB support and more
 
 - ARM changes are pretty small: support for the PSCI 0.2 hypercall
   interface on both the guest and the host (the latter acked by Catalin)
 
 - initial POWER8 and little-endian host support
 
 - support for running u-boot on embedded POWER targets
 
 - pretty large changes to MIPS too, completing the userspace interface
   and improving the handling of virtualized timer hardware
 
 - for x86, a larger set of changes is scheduled for 3.17.  Still,
   we have a few emulator bugfixes and support for running nested
   fully-virtualized Xen guests (para-virtualized Xen guests have
   always worked).  And some optimizations too.
 
 The only missing architecture here is ia64.  It's not a coincidence
 that support for KVM on ia64 is scheduled for removal in 3.17.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTjtlBAAoJEBvWZb6bTYbyMOUP/2NAePghE3IjG99ikHFdn+BX
 BfrURsuR6GD0AhYQnBidBmpFbAmN/LwSJxv/M7sV7OBRWLu3qbt69DrPTU2e/FK1
 j9q25peu8jRyHzJ1q9rBroo74nD9lQYuVr3uXNxxcg0DRnw14JHGlM3y8LDEknO8
 W+gpWTeAQ+2AuOX98MpRbCRMuzziCSv5bP5FhBVnsWHiZfvMbcUrbeJt+zYSiDAZ
 0tHm/5dFKzfj/vVrrnjD4EZcRr688Bs5rztG96hY6aoVJryjZGLtLp92wCWkRRmH
 CCvZwd245NmNthuKHzcs27/duSWfU0uOlu7AMrD44QYhzeDGyB/2nbCxbGqLLoBA
 nnOviXH4cC65/CnisZ79zfo979HbZcX+Lzg747EjBgCSxJmLlwgiG8yXtDvk5otB
 TH6GUeGDiEEPj//JD3XtgSz0sF2NvjREWRyemjDMvhz6JC/bLytXKb3sn+NXSj8m
 ujzF9eQoa4qKDcBL4IQYGTJ4z5nY3Pd68dHFIPHB7n82OxFLSQUBKxXw8/1fb5og
 VVb8PL4GOcmakQlAKtTMlFPmuy4bbL2r/2iV5xJiOZKmXIu8Hs1JezBE3SFAltbl
 3cAGwSM9/dDkKxUbTFblyOE9bkKbg4WYmq0LkdzsPEomb3IZWntOT25rYnX+LrBz
 bAknaZpPiOrW11Et1htY
 =j5Od
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next

Pull KVM updates from Paolo Bonzini:
 "At over 200 commits, covering almost all supported architectures, this
  was a pretty active cycle for KVM.  Changes include:

   - a lot of s390 changes: optimizations, support for migration, GDB
     support and more

   - ARM changes are pretty small: support for the PSCI 0.2 hypercall
     interface on both the guest and the host (the latter acked by
     Catalin)

   - initial POWER8 and little-endian host support

   - support for running u-boot on embedded POWER targets

   - pretty large changes to MIPS too, completing the userspace
     interface and improving the handling of virtualized timer hardware

   - for x86, a larger set of changes is scheduled for 3.17.  Still, we
     have a few emulator bugfixes and support for running nested
     fully-virtualized Xen guests (para-virtualized Xen guests have
     always worked).  And some optimizations too.

  The only missing architecture here is ia64.  It's not a coincidence
  that support for KVM on ia64 is scheduled for removal in 3.17"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits)
  KVM: add missing cleanup_srcu_struct
  KVM: PPC: Book3S PR: Rework SLB switching code
  KVM: PPC: Book3S PR: Use SLB entry 0
  KVM: PPC: Book3S HV: Fix machine check delivery to guest
  KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
  KVM: PPC: Book3S HV: Make sure we don't miss dirty pages
  KVM: PPC: Book3S HV: Fix dirty map for hugepages
  KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address
  KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
  KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number
  KVM: PPC: Book3S: Add ONE_REG register names that were missed
  KVM: PPC: Add CAP to indicate hcall fixes
  KVM: PPC: MPIC: Reset IRQ source private members
  KVM: PPC: Graciously fail broken LE hypercalls
  PPC: ePAPR: Fix hypercall on LE guest
  KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler
  KVM: PPC: BOOK3S: Always use the saved DAR value
  PPC: KVM: Make NX bit available with magic page
  KVM: PPC: Disable NX for old magic page using guests
  KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest
  ...
2014-06-04 08:47:12 -07:00
David S. Miller
c99f7abf0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	include/net/inetpeer.h
	net/ipv6/output_core.c

Changes in net were fixing bugs in code removed in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03 23:32:12 -07:00
Linus Torvalds
3de0ef8d0d Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86/UV changes from Ingo Molnar:
 "Continued updates for SGI UV 3 hardware support"

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/UV: Fix conditional in gru_exit()
  x86/UV: Set n_lshift based on GAM_GR_CONFIG MMR for UV3
2014-06-03 15:48:23 -07:00
Linus Torvalds
4aef77b2fe Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 IOSF platform updates from Ingo Molnar:
 "IOSF (Intel OnChip System Fabric) updates:

   - generalize the IOSF interface to allow mixed mode drivers: non-IOSF
     drivers to utilize of IOSF features on IOSF platforms.

   - add 'Quark X1000' IOSF/MBI support

   - clean up BayTrail and Quark PCI ID enumeration"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, iosf: Add PCI ID macros for better readability
  x86, iosf: Add Quark X1000 PCI ID
  x86, iosf: Added Quark MBI identifiers
  x86, iosf: Make IOSF driver modular and usable by more drivers
2014-06-03 15:46:38 -07:00
Linus Torvalds
c33c40549e Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 microcode changes from Ingo Molnar:
 "A microcode-debugging boot flag plus related refactoring"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode: Add a disable chicken bit
  x86, boot: Carve out early cmdline parsing function
2014-06-03 15:44:50 -07:00
Linus Torvalds
e30c631be6 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 irq cleanup from Ingo Molnar:
 "A single, trivial cleanup"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Clean up VECTOR_UNDEFINED and VECTOR_RETRIGGERED definition
2014-06-03 15:43:38 -07:00
Rafael J. Wysocki
0e36d43c9c Merge branch 'acpica'
* acpica: (63 commits)
  ACPICA: Namespace: Remove _PRP method support.
  ACPI: Fix x86 regression related to early mapping size limitation
  ACPICA: Tables: Add mechanism to control early table checksum verification.
  ACPICA: acpidump: Fix repetitive table dump in -n mode.
  ACPI: Clean up acpi_os_map/unmap_memory() to eliminate __iomem.
  ACPICA: Clean up redudant definitions already defined elsewhere
  ACPICA: Linux headers: Add <asm/acenv.h> to remove mis-ordered inclusion of <asm/acpi.h>
  ACPICA: Linux headers: Add <acpi/platform/aclinuxex.h>
  ACPICA: Linux headers: Remove ACPI_PREEMPTION_POINT() due to no usages.
  ACPICA: Update version to 20140424.
  ACPICA: Comment/format update, no functional change.
  ACPICA: Events: Update GPE handling and initialization code.
  ACPICA: Remove extraneous error message for large number of GPEs.
  ACPICA: Tables: Remove old mechanism to validate if XSDT contains NULL entries.
  ACPICA: Tables: Add new mechanism to skip NULL entries in RSDT and XSDT.
  ACPICA: acpidump: Add support to force using RSDT.
  ACPICA: Back port of improvements on exception code.
  ACPICA: Back port of _PRP update.
  ACPICA: acpidump: Fix truncated RSDP signature validation.
  ACPICA: Linux header: Add support for stubbed externals.
  ...
2014-06-03 23:12:27 +02:00
Linus Torvalds
c84a1e32ee Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull scheduler updates from Ingo Molnar:
 "The main scheduling related changes in this cycle were:

   - various sched/numa updates, for better performance

   - tree wide cleanup of open coded nice levels

   - nohz fix related to rq->nr_running use

   - cpuidle changes and continued consolidation to improve the
     kernel/sched/idle.c high level idle scheduling logic.  As part of
     this effort I pulled cpuidle driver changes from Rafael as well.

   - standardized idle polling amongst architectures

   - continued work on preparing better power/energy aware scheduling

   - sched/rt updates

   - misc fixlets and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
  sched/numa: Decay ->wakee_flips instead of zeroing
  sched/numa: Update migrate_improves/degrades_locality()
  sched/numa: Allow task switch if load imbalance improves
  sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code
  sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice()
  sched: Initialize rq->age_stamp on processor start
  sched, nohz: Change rq->nr_running to always use wrappers
  sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance()
  sched: Use clamp() and clamp_val() to make sys_nice() more readable
  sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()
  sched/numa: Fix initialization of sched_domain_topology for NUMA
  sched: Call select_idle_sibling() when not affine_sd
  sched: Simplify return logic in sched_read_attr()
  sched: Simplify return logic in sched_copy_attr()
  sched: Fix exec_start/task_hot on migrated tasks
  arm64: Remove TIF_POLLING_NRFLAG
  metag: Remove TIF_POLLING_NRFLAG
  sched/idle: Make cpuidle_idle_call() void
  sched/idle: Reflow cpuidle_idle_call()
  sched/idle: Delay clearing the polling bit
  ...
2014-06-03 14:00:15 -07:00
Linus Torvalds
3d521f9151 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull perf updates from Ingo Molnar:
 "The tooling changes maintained by Jiri Olsa until Arnaldo is on
  vacation:

  User visible changes:
   - Add -F option for specifying output fields (Namhyung Kim)
   - Propagate exit status of a command line workload for record command
     (Namhyung Kim)
   - Use tid for finding thread (Namhyung Kim)
   - Clarify the output of perf sched map plus small sched command
     fixes (Dongsheng Yang)
   - Wire up perf_regs and unwind support for ARM64 (Jean Pihet)
   - Factor hists statistics counts processing which in turn also fixes
     several bugs in TUI report command (Namhyung Kim)
   - Add --percentage option to control absolute/relative percentage
     output (Namhyung Kim)
   - Add --list-cmds to 'kmem', 'mem', 'lock' and 'sched', for use by
     completion scripts (Ramkumar Ramachandra)

  Development/infrastructure changes and fixes:
   - Android related fixes for pager and map dso resolving (Michael
     Lentine)
   - Add libdw DWARF post unwind support for ARM (Jean Pihet)
   - Consolidate types.h for ARM and ARM64 (Jean Pihet)
   - Fix possible null pointer dereference in session.c (Masanari Iida)
   - Cleanup, remove unused variables in map_switch_event() (Dongsheng
     Yang)
   - Remove nr_state_machine_bugs in perf latency (Dongsheng Yang)
   - Remove usage of trace_sched_wakeup(.success) (Peter Zijlstra)
   - Cleanups for perf.h header (Jiri Olsa)
   - Consolidate types.h and export.h within tools (Borislav Petkov)
   - Move u64_swap union to its single user's header, evsel.h (Borislav
     Petkov)
   - Fix for s390 to properly parse tracepoints plus test code
     (Alexander Yarygin)
   - Handle EINTR error for readn/writen (Namhyung Kim)
   - Add a test case for hists filtering (Namhyung Kim)
   - Share map_groups among threads of the same group (Arnaldo Carvalho
     de Melo, Jiri Olsa)
   - Making some code (cpu node map and report parse callchain callback)
     global to be usable by upcomming changes (Don Zickus)
   - Fix pmu object compilation error (Jiri Olsa)

  Kernel side changes:
   - intrusive uprobes fixes from Oleg Nesterov.  Since the interface is
     admin-only, and the bug only affects user-space ("any probed
     jmp/call can kill the application"), we queued these fixes via the
     development tree, as a special exception.
   - more fuzzer motivated race fixes and related refactoring and
     robustization.
   - allow PMU drivers to be built as modules.  (No actual module yet,
     because the x86 Intel uncore module wasn't ready in time for this)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  perf tools: Add automatic remapping of Android libraries
  perf tools: Add cat as fallback pager
  perf tests: Add a testcase for histogram output sorting
  perf tests: Factor out print_hists_*()
  perf tools: Introduce reset_output_field()
  perf tools: Get rid of obsolete hist_entry__sort_list
  perf hists: Reset width of output fields with header length
  perf tools: Skip elided sort entries
  perf top: Add --fields option to specify output fields
  perf report/tui: Fix a bug when --fields/sort is given
  perf tools: Add ->sort() member to struct sort_entry
  perf report: Add -F option to specify output fields
  perf tools: Call perf_hpp__init() before setting up GUI browsers
  perf tools: Consolidate management of default sort orders
  perf tools: Allow hpp fields to be sort keys
  perf ui: Get rid of callback from __hpp__fmt()
  perf tools: Consolidate output field handling to hpp format routines
  perf tools: Use hpp formats to sort final output
  perf tools: Support event grouping in hpp ->sort()
  perf tools: Use hpp formats to sort hist entries
  ...
2014-06-03 13:18:00 -07:00
Linus Torvalds
776edb5931 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - reduced/streamlined smp_mb__*() interface that allows more usecases
     and makes the existing ones less buggy, especially in rarer
     architectures

   - add rwsem implementation comments

   - bump up lockdep limits"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  rwsem: Add comments to explain the meaning of the rwsem's count field
  lockdep: Increase static allocations
  arch: Mass conversion of smp_mb__*()
  arch,doc: Convert smp_mb__*()
  arch,xtensa: Convert smp_mb__*()
  arch,x86: Convert smp_mb__*()
  arch,tile: Convert smp_mb__*()
  arch,sparc: Convert smp_mb__*()
  arch,sh: Convert smp_mb__*()
  arch,score: Convert smp_mb__*()
  arch,s390: Convert smp_mb__*()
  arch,powerpc: Convert smp_mb__*()
  arch,parisc: Convert smp_mb__*()
  arch,openrisc: Convert smp_mb__*()
  arch,mn10300: Convert smp_mb__*()
  arch,mips: Convert smp_mb__*()
  arch,metag: Convert smp_mb__*()
  arch,m68k: Convert smp_mb__*()
  arch,m32r: Convert smp_mb__*()
  arch,ia64: Convert smp_mb__*()
  ...
2014-06-03 12:57:53 -07:00
Linus Torvalds
425553209b PCI changes for the v3.16 merge window:
Enumeration
     - Notify driver before and after device reset (Keith Busch)
     - Use reset notification in NVMe (Keith Busch)
 
   NUMA
     - Warn if we have to guess host bridge node information (Myron Stowe)
     - Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee Suthikulpanit)
     - Clean up and mark early_root_info_init() as deprecated (Suravee Suthikulpanit)
 
   Driver binding
     - Add "driver_override" for force specific binding (Alex Williamson)
     - Fail "new_id" addition for devices we already know about (Bandan Das)
 
   Resource management
     - Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox)
     - Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas)
     - Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
     - Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas)
     - Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas)
     - Don't convert BAR address to resource if dma_addr_t is too small (Bjorn Helgaas)
     - Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas)
     - Don't print anything while decoding is disabled (Bjorn Helgaas)
     - Don't add disabled subtractive decode bus resources (Bjorn Helgaas)
     - Add resource allocation comments (Bjorn Helgaas)
     - Restrict 64-bit prefetchable bridge windows to 64-bit resources (Yinghai Lu)
     - Assign i82875p_edac PCI resources before adding device (Yinghai Lu)
 
   PCI device hotplug
     - Remove unnecessary "dev->bus" test (Bjorn Helgaas)
     - Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas)
     - Fix rphahp endianess issues (Laurent Dufour)
     - Acknowledge spurious "cmd completed" event (Rajat Jain)
     - Allow hotplug service drivers to operate in polling mode (Rajat Jain)
     - Fix cpqphp possible NULL dereference (Rickard Strandqvist)
 
   MSI
     - Replace pci_enable_msi_block() by pci_enable_msi_exact() (Alexander Gordeev)
     - Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev)
     - Simplify populate_msi_sysfs() (Jan Beulich)
 
   Virtualization
     - Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson)
     - Mark RTL8110SC INTx masking as broken (Alex Williamson)
 
   Generic host bridge driver
     - Add generic PCI host controller driver (Will Deacon)
 
   Freescale i.MX6
     - Use new clock names (Lucas Stach)
     - Drop old IRQ mapping (Lucas Stach)
     - Remove optional (and unused) IRQs (Lucas Stach)
     - Add support for MSI (Lucas Stach)
     - Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat)
 
   Renesas R-Car
     - Add gen2 device tree support (Ben Dooks)
     - Use new OF interrupt mapping when possible (Lucas Stach)
     - Add PCIe driver (Phil Edworthy)
     - Add PCIe MSI support (Phil Edworthy)
     - Add PCIe device tree bindings (Phil Edworthy)
 
   Samsung Exynos
     - Remove unnecessary OOM messages (Jingoo Han)
     - Fix add_pcie_port() section mismatch warning (Sachin Kamat)
 
   Synopsys DesignWare
     - Make MSI ISR shared IRQ aware (Lucas Stach)
 
   Miscellaneous
     - Check for broken config space aliasing (Alex Williamson)
     - Update email address (Ben Hutchings)
     - Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas)
     - Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas)
     - Remove unnecessary __ref annotations (Bjorn Helgaas)
     - Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns (Bjorn Helgaas)
     - Fix use of uninitialized MPS value (Bjorn Helgaas)
     - Tidy x86/gart messages (Bjorn Helgaas)
     - Fix return value from pci_user_{read,write}_config_*() (Gavin Shan)
     - Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo)
     - Remove unused serial device IDs (Jean Delvare)
     - Use designated initialization in PCI_VDEVICE (Mark Rustad)
     - Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu)
     - Configure MPS on ARM (Murali Karicheri)
     - Remove unnecessary includes of <linux/init.h> (Paul Gortmaker)
     - Move Open Firmware devspec attribute to PCI common code (Sebastian Ott)
     - Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott)
     - Remove pcibios_add_platform_entries() (Sebastian Ott)
     - Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch)
     - Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang)
     - Add and use new pci_is_bridge() interface (Yijing Wang)
     - Make pci_bus_add_device() void (Yijing Wang)
 
   DMA API
     - Clarify physical/bus address distinction in docs (Bjorn Helgaas)
     - Fix typos in docs (Emilio López)
     - Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim)
     - Change dma_declare_coherent_memory() CPU address to phys_addr_t (Bjorn Helgaas)
     - Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory() (Bjorn Helgaas)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTjMaQAAoJEFmIoMA60/r8XncQAKX7cD6btXCZnrcYo7inseyp
 3rwOlrsNkWyHqSj/RqqzE1NY6L1h5G2uliI6xg1SKenuHPDcosm5d8FYO0ORKiUs
 xrqBkmZJHXN7fck//tJwsTXiYh5u42RO8QWbvZVr5UqXe40LyaMHMh9Y7VarrU/o
 sM2ADzFKagv1qMQ13nmYxqT+Zl+CqpimyLP+ep6Nfqxi6ils+KJ6b9SKYqrpqE6t
 Mcq2K5ShqU5SaYub1JIXLcQ9XylID+t1M9+cwixcs7a87HbJiktfkGqQvQJoUIuK
 Q5U+abcIGk4vfOnDCctSnoRyrcbTAZ/vqfo0vpX22TokESjwrD8hFOX5HPOFtD+4
 wIDbYurW/8oBhLRaJ0uTPzSH8bXjXTynAwxHZgIrEur5908eECKQ/WiFCxyrovvv
 r4ThAN0FaobllEr0XOFESOzDNSt/ME00WWI7+puAJ/KJkFEtcXt9othLmLmvLz8H
 2GWXrm/aOR0WUO7foGUxI3bXYlDN6NbSKpfuZsLAi2VAyJJ6L6yVSo/fT0X07e3z
 qRy9LOohuiwIKv/I4F2SEq2REfGGsnkrJBoeQi/oBZDcBy1Lsi7P9LWIERhLQEM+
 Hm+30lC/f326nI3hoyThj2k2xxZOQzCIvrt658xP4qd9Zfe1bvCH58FF8K62CoOd
 p8XAf7Sl6v6YUodUrT/t
 =km55
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into next

Pull PCI changes from Bjorn Helgaas:
 "Enumeration
    - Notify driver before and after device reset (Keith Busch)
    - Use reset notification in NVMe (Keith Busch)

  NUMA
    - Warn if we have to guess host bridge node information (Myron Stowe)
    - Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee
      Suthikulpanit)
    - Clean up and mark early_root_info_init() as deprecated (Suravee
      Suthikulpanit)

  Driver binding
    - Add "driver_override" for force specific binding (Alex Williamson)
    - Fail "new_id" addition for devices we already know about (Bandan
      Das)

  Resource management
    - Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox)
    - Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas)
    - Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
    - Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas)
    - Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas)
    - Don't convert BAR address to resource if dma_addr_t is too small
      (Bjorn Helgaas)
    - Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas)
    - Don't print anything while decoding is disabled (Bjorn Helgaas)
    - Don't add disabled subtractive decode bus resources (Bjorn Helgaas)
    - Add resource allocation comments (Bjorn Helgaas)
    - Restrict 64-bit prefetchable bridge windows to 64-bit resources
      (Yinghai Lu)
    - Assign i82875p_edac PCI resources before adding device (Yinghai Lu)

  PCI device hotplug
    - Remove unnecessary "dev->bus" test (Bjorn Helgaas)
    - Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas)
    - Fix rphahp endianess issues (Laurent Dufour)
    - Acknowledge spurious "cmd completed" event (Rajat Jain)
    - Allow hotplug service drivers to operate in polling mode (Rajat Jain)
    - Fix cpqphp possible NULL dereference (Rickard Strandqvist)

  MSI
    - Replace pci_enable_msi_block() by pci_enable_msi_exact()
      (Alexander Gordeev)
    - Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev)
    - Simplify populate_msi_sysfs() (Jan Beulich)

  Virtualization
    - Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson)
    - Mark RTL8110SC INTx masking as broken (Alex Williamson)

  Generic host bridge driver
    - Add generic PCI host controller driver (Will Deacon)

  Freescale i.MX6
    - Use new clock names (Lucas Stach)
    - Drop old IRQ mapping (Lucas Stach)
    - Remove optional (and unused) IRQs (Lucas Stach)
    - Add support for MSI (Lucas Stach)
    - Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat)

  Renesas R-Car
    - Add gen2 device tree support (Ben Dooks)
    - Use new OF interrupt mapping when possible (Lucas Stach)
    - Add PCIe driver (Phil Edworthy)
    - Add PCIe MSI support (Phil Edworthy)
    - Add PCIe device tree bindings (Phil Edworthy)

  Samsung Exynos
    - Remove unnecessary OOM messages (Jingoo Han)
    - Fix add_pcie_port() section mismatch warning (Sachin Kamat)

  Synopsys DesignWare
    - Make MSI ISR shared IRQ aware (Lucas Stach)

  Miscellaneous
    - Check for broken config space aliasing (Alex Williamson)
    - Update email address (Ben Hutchings)
    - Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas)
    - Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas)
    - Remove unnecessary __ref annotations (Bjorn Helgaas)
    - Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns
      (Bjorn Helgaas)
    - Fix use of uninitialized MPS value (Bjorn Helgaas)
    - Tidy x86/gart messages (Bjorn Helgaas)
    - Fix return value from pci_user_{read,write}_config_*() (Gavin Shan)
    - Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo)
    - Remove unused serial device IDs (Jean Delvare)
    - Use designated initialization in PCI_VDEVICE (Mark Rustad)
    - Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu)
    - Configure MPS on ARM (Murali Karicheri)
    - Remove unnecessary includes of <linux/init.h> (Paul Gortmaker)
    - Move Open Firmware devspec attribute to PCI common code (Sebastian Ott)
    - Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott)
    - Remove pcibios_add_platform_entries() (Sebastian Ott)
    - Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch)
    - Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang)
    - Add and use new pci_is_bridge() interface (Yijing Wang)
    - Make pci_bus_add_device() void (Yijing Wang)

  DMA API
    - Clarify physical/bus address distinction in docs (Bjorn Helgaas)
    - Fix typos in docs (Emilio López)
    - Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim)
    - Change dma_declare_coherent_memory() CPU address to phys_addr_t
      (Bjorn Helgaas)
    - Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory()
      (Bjorn Helgaas)"

* tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (92 commits)
  MAINTAINERS: Add generic PCI host controller driver
  PCI: generic: Add generic PCI host controller driver
  PCI: imx6: Add support for MSI
  PCI: designware: Make MSI ISR shared IRQ aware
  PCI: imx6: Remove optional (and unused) IRQs
  PCI: imx6: Drop old IRQ mapping
  PCI: imx6: Use new clock names
  i82875p_edac: Assign PCI resources before adding device
  ARM/PCI: Call pcie_bus_configure_settings() to set MPS
  PCI: imx6: Fix imx6_add_pcie_port() section mismatch warning
  PCI: Make pci_bus_add_device() void
  PCI: exynos: Fix add_pcie_port() section mismatch warning
  PCI: Introduce new device binding path using pci_dev.driver_override
  PCI: rcar: Add gen2 device tree support
  PCI: cpqphp: Fix possible null pointer dereference
  PCI: rcar: Add R-Car PCIe device tree bindings
  PCI: rcar: Add MSI support for PCIe
  PCI: rcar: Add Renesas R-Car PCIe driver
  PCI: Fix return value from pci_user_{read,write}_config_*()
  PCI: exynos: Remove unnecessary OOM messages
  ...
2014-06-02 12:15:19 -07:00
Linus Torvalds
9f888b3a10 xen: features and fixes for 3.16-rc0
- Support foreign mappings in PVH domains (needed when dom0 is PVH)
 
 - Fix mapping high MMIO regions in x86 PV guests (this is also the
   first half of removing the PAGE_IOMAP PTE flag).
 
 - ARM suspend/resume support.
 
 - ARM multicall support.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJTjE5MAAoJEFxbo/MsZsTRtl8H/2lfS9w05e60vRxjolPV0vRc
 5k9DcYFeJ+k2cz/2T3mNlIvKdfBTesSfgVquH+28GhQz+uKFQ1OrJpYNDTougSw5
 Wv0Ae8e+7eLABvJ9XMiZdDsPzsICw2wqWOvqrnQi2qR3SIimBc5tBigR4+Rccv+e
 btuBLlYT4WPQ8qgNyCBPgxzuyxteu5wK/0XryX6NcbrxeEbAzQAeDKkmvCD4fSvx
 KxrwTO3mwV4Lefmf/WS4Z9fDcPujQOUqKEtUWanw/2JalO1BzDPo+1wvYs0LduLC
 QI/YJN4SL3UeGOmbX2tyIaRgMsAcQVVrYkTm1cp8eD7vcRuvXaqy6dxuX05+V4g=
 =cxfG
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.16-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into next

Pull Xen updates from David Vrabel:
 "xen: features and fixes for 3.16-rc0
   - support foreign mappings in PVH domains (needed when dom0 is PVH)

   - fix mapping high MMIO regions in x86 PV guests (this is also the
     first half of removing the PAGE_IOMAP PTE flag).

   - ARM suspend/resume support.

   - ARM multicall support"

* tag 'stable/for-linus-3.16-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: map foreign pfns for autotranslated guests
  xen-acpi-processor: Don't display errors when we get -ENOSYS
  xen/pciback: Document the entry points for 'pcistub_put_pci_dev'
  xen/pciback: Document when the 'unbind' and 'bind' functions are called.
  xen-pciback: Document when we FLR an PCI device.
  xen-pciback: First reset, then free.
  xen-pciback: Cleanup up pcistub_put_pci_dev
  x86/xen: do not use _PAGE_IOMAP in xen_remap_domain_mfn_range()
  x86/xen: set regions above the end of RAM as 1:1
  x86/xen: only warn once if bad MFNs are found during setup
  x86/xen: compactly store large identity ranges in the p2m
  x86/xen: fix set_phys_range_identity() if pfn_e > MAX_P2M_PFN
  x86/xen: rename early_p2m_alloc() and early_p2m_alloc_middle()
  xen/x86: set panic notifier priority to minimum
  arm,arm64/xen: introduce HYPERVISOR_suspend()
  xen: refactor suspend pre/post hooks
  arm: xen: export HYPERVISOR_multicall to modules.
  arm64: introduce virt_to_pfn
  arm/xen: Remove definiition of virt_to_pfn in asm/xen/page.h
  arm: xen: implement multicall hypercall support.
2014-06-02 08:24:12 -07:00
Minchan Kim
6538b8ea88 x86_64: expand kernel stack to 16K
While I play inhouse patches with much memory pressure on qemu-kvm,
3.14 kernel was randomly crashed. The reason was kernel stack overflow.

When I investigated the problem, the callstack was a little bit deeper
by involve with reclaim functions but not direct reclaim path.

I tried to diet stack size of some functions related with alloc/reclaim
so did a hundred of byte but overflow was't disappeard so that I encounter
overflow by another deeper callstack on reclaim/allocator path.

Of course, we might sweep every sites we have found for reducing
stack usage but I'm not sure how long it saves the world(surely,
lots of developer start to add nice features which will use stack
agains) and if we consider another more complex feature in I/O layer
and/or reclaim path, it might be better to increase stack size(
meanwhile, stack usage on 64bit machine was doubled compared to 32bit
while it have sticked to 8K. Hmm, it's not a fair to me and arm64
already expaned to 16K. )

So, my stupid idea is just let's expand stack size and keep an eye
toward stack consumption on each kernel functions via stacktrace of ftrace.
For example, we can have a bar like that each funcion shouldn't exceed 200K
and emit the warning when some function consumes more in runtime.
Of course, it could make false positive but at least, it could make a
chance to think over it.

I guess this topic was discussed several time so there might be
strong reason not to increase kernel stack size on x86_64, for me not
knowing so Ccing x86_64 maintainers, other MM guys and virtio
maintainers.

Here's an example call trace using up the kernel stack:

         Depth    Size   Location    (51 entries)
         -----    ----   --------
   0)     7696      16   lookup_address
   1)     7680      16   _lookup_address_cpa.isra.3
   2)     7664      24   __change_page_attr_set_clr
   3)     7640     392   kernel_map_pages
   4)     7248     256   get_page_from_freelist
   5)     6992     352   __alloc_pages_nodemask
   6)     6640       8   alloc_pages_current
   7)     6632     168   new_slab
   8)     6464       8   __slab_alloc
   9)     6456      80   __kmalloc
  10)     6376     376   vring_add_indirect
  11)     6000     144   virtqueue_add_sgs
  12)     5856     288   __virtblk_add_req
  13)     5568      96   virtio_queue_rq
  14)     5472     128   __blk_mq_run_hw_queue
  15)     5344      16   blk_mq_run_hw_queue
  16)     5328      96   blk_mq_insert_requests
  17)     5232     112   blk_mq_flush_plug_list
  18)     5120     112   blk_flush_plug_list
  19)     5008      64   io_schedule_timeout
  20)     4944     128   mempool_alloc
  21)     4816      96   bio_alloc_bioset
  22)     4720      48   get_swap_bio
  23)     4672     160   __swap_writepage
  24)     4512      32   swap_writepage
  25)     4480     320   shrink_page_list
  26)     4160     208   shrink_inactive_list
  27)     3952     304   shrink_lruvec
  28)     3648      80   shrink_zone
  29)     3568     128   do_try_to_free_pages
  30)     3440     208   try_to_free_pages
  31)     3232     352   __alloc_pages_nodemask
  32)     2880       8   alloc_pages_current
  33)     2872     200   __page_cache_alloc
  34)     2672      80   find_or_create_page
  35)     2592      80   ext4_mb_load_buddy
  36)     2512     176   ext4_mb_regular_allocator
  37)     2336     128   ext4_mb_new_blocks
  38)     2208     256   ext4_ext_map_blocks
  39)     1952     160   ext4_map_blocks
  40)     1792     384   ext4_writepages
  41)     1408      16   do_writepages
  42)     1392      96   __writeback_single_inode
  43)     1296     176   writeback_sb_inodes
  44)     1120      80   __writeback_inodes_wb
  45)     1040     160   wb_writeback
  46)      880     208   bdi_writeback_workfn
  47)      672     144   process_one_work
  48)      528     112   worker_thread
  49)      416     240   kthread
  50)      176     176   ret_from_fork

[ Note: the problem is exacerbated by certain gcc versions that seem to
  generate much bigger stack frames due to apparently bad coalescing of
  temporaries and generating too many spills.  Rusty saw gcc-4.6.4 using
  35% more stack on the virtio path than 4.8.2 does, for example.

  Minchan not only uses such a bad gcc version (4.6.3 in his case), but
  some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
  what causes that kernel_map_pages() frame, for example). But we're
  clearly getting too close.

  The VM code also seems to have excessive stack frames partly for the
  same compiler reason, triggered by excessive inlining and lots of
  function arguments.

  We need to improve on our stack use, but in the meantime let's do this
  simple stack increase too.  Unlike most earlier reports, there is
  nothing simple that stands out as being really horribly wrong here,
  apart from the fact that the stack frames are just bigger than they
  should need to be.        - Linus ]

Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S Tsirkin <mst@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-30 11:52:51 -07:00
H. Peter Anvin
c9e5a5a703 x86/xsave: Make it clear that the XSAVE macros use (%edi)/(%rdi)
The XSAVE instruction family takes a memory argment.  The macros use
(%edi)/(%rdi) as that memory argument - make that clear to the reader.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-7-git-send-email-fenghua.yu@intel.com
2014-05-30 08:19:21 -07:00
Fenghua Yu
7496d6458f Define kernel API to get address of each state in xsave area
In standard form, each state is saved in the xsave area in fixed offset.
But in compacted form, offset of each saved state only can be calculated during
run time because some xstates may not be enabled and saved.

We define kernel API get_xsave_addr() returns address of a given state saved in a xsave area.

It can be called in kernel to get address of each xstate in xsave area in
either standard format or compacted format.

It's useful when kernel wants to directly access each state in xsave area.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-17-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:33:09 -07:00
Fenghua Yu
f41d830fa8 x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time
__save_fpu() can be called during early booting time when cpu caps are not
enabled and alternative can not be used yet. Therefore, it calls
xsave_state_booting() during booting time to save xstate to task's xsave area.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-14-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:33:04 -07:00
Fenghua Yu
adb9d526e9 x86/xsaves: Add xsaves and xrstors support for booting time
Since boot_cpu_data and cpu capabilities are not enabled yet during early
booting time, alternative can not be used in some functions to access xsave
area. Therefore, we define two new functions xrstor_state_booting() and
xsave_state_booting() to access xsave area just during early booting time.

xrstor_state_booting restores xstate from xsave area during early booting time.
xsave_state_booting saves xstate to xsave area during early booting time.

The two functions are similar to xrstor_state and xsave_state respectively.
But the two functions don't use alternatives because alternatives are not
enabled when they are called in such early booting time.

xrstor_state_booting is called only by functions defined as __init. So it's
defined as __init and will be removed from memory after booting time. There
is no extra memory cost caused by this function during running time.

But because xsave_state_booting can be called by run-time function __save_fpu(),
it's not defined as __init and will stay in memory during running time although
it will not be called anymore during running time. It is not ideal to
have this function stay in memory during running time. But it's a pretty small
function and the memory cost will be small. By doing in this way, we can
avoid to change a lot of code to just remove this small function and save a
bit memory for running time.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-13-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:33:02 -07:00
Fenghua Yu
facbf4d91a x86/xsaves: Use xsave/xrstor for saving and restoring user space context
We use legacy xsave/xrstor to save and restore standard form of xsave area
in user space context. No xsaveopt or xsaves is used here for two reasons.

First, we don't want to use modified optimization which is implemented in
xsaveopt and xsaves because xrstor/xrstors might track a wrong user space
application.

Secondly, we don't use compacted format of xsave area for backward
compatibility because legacy user space applications only don't understand
the compacted format of the xsave area.

Using standard form of the xsave area may allocate more memory for
user context than compacted form, but preserves compatibility with
legacy applications.  Furthermore, even with holes, the relevant cache
lines don't get touched and thus the performance impact is limited.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-11-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:32:57 -07:00
Fenghua Yu
f9de314b34 x86/xsaves: Use xsaves/xrstors for context switch
If xsaves is eanbled, use xsaves/xrstors for context switch to support
compacted format xsave area to occupy less memory and modified optimization
to improve saving performance.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-10-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:31:25 -07:00
Fenghua Yu
f31a9f7c71 x86/xsaves: Use xsaves/xrstors to save and restore xsave area
If xsaves is eanbled, use xsaves/xrstors instrucitons to save and restore
xstate. xsaves and xrstors support compacted format, init optimization,
modified optimization, and supervisor states.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:31:21 -07:00
Fenghua Yu
b84e70552e x86/xsaves: Define a macro for handling xsave/xrstor instruction fault
Define a macro to handle fault generated by xsave, xsaveopt, xsaves, xrstor,
and xrstors instructions. It is used in functions like xsave_state() etc.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:31:18 -07:00
Fenghua Yu
200b08a970 x86/xsaves: Define macros for xsave instructions
Define macros for xsave, xsaveopt, xsaves, xrstor, and xrstors inline
instructions. The instructions will be used for saving and restoring xstate.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:31:16 -07:00
Fenghua Yu
0b29643a58 x86/xsaves: Change compacted format xsave area header
The XSAVE area header is changed to support both compacted format and
standard format of xsave area.

The XSAVE header of an xsave area comprises the 64 bytes starting at offset
512 from the area base address:

- Bytes 7:0 of the xsave header is a state-component bitmap called
  xstate_bv. It identifies the state components in the xsave area.

- Bytes 15:8 of the xsave header is a state-component bitmap called
  xcomp_bv. It is used as follows:
  - xcomp_bv[63] indicates the format of the extended region of
    the xsave area. If it is clear, the standard format is used.
    If it is set, the compacted format is used.
  - xcomp_bv[62:0] indicate which features (starting at feature 2)
    have space allocated for them in the compacted format.

- Bytes 63:16 of the xsave header are reserved.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-6-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:31:10 -07:00
Fenghua Yu
5b3e83f46a x86/alternative: Add alternative_input_2 to support alternative with two features and input
alternative_input_2() replaces old instruction with new instructions with
input based on two features.

In alternative_input_2(oldinstr, newinstr1, feature1, newinstr2, feature2,
		input...),

feature2 has higher priority to replace oldinstr than feature1.

If CPU has feature2, newinstr2 replaces oldinstr and newinstr2 is
executed during run time.

If CPU doesn't have feature2, but it has feature1, newinstr1 replaces oldinstr
and newinstr1 is executed during run time.

If CPU doesn't have feature2 and feature1, oldinstr is executed during run
time.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:24:53 -07:00
Fenghua Yu
6229ad278c x86/xsaves: Detect xsaves/xrstors feature
Detect the xsaveopt, xsavec, xgetbv, and xsaves features in processor extended
state enumberation sub-leaf (eax=0x0d, ecx=1):
Bit 00: XSAVEOPT is available
Bit 01: Supports XSAVEC and the compacted form of XRSTOR if set
Bit 02: Supports XGETBV with ECX = 1 if set
Bit 03: Supports XSAVES/XRSTORS and IA32_XSS if set

The above features are defined in the new word 10 in cpu features.

The IA32_XSS MSR (index DA0H) contains a state-component bitmap that specifies
the state components that software has enabled xsaves and xrstors to manage.
If the bit corresponding to a state component is clear in XCR0 | IA32_XSS,
xsaves and xrstors will not operate on that state component, regardless of
the value of the instruction mask.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:24:28 -07:00
Fenghua Yu
446fd806f5 x86/cpufeature.h: Reformat x86 feature macros
In each X86 feature macro definition, add one space in front of the word
number which is a one-digit number currently.

The purpose of reformatting the macros is to align one-digit and two-digit
word numbers.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 12:37:10 -07:00
Hanjun Guo
a43ae58c84 PCI: Turn pcibios_penalize_isa_irq() into a weak function
pcibios_penalize_isa_irq() is only implemented by x86 now, and legacy ISA
is not used by some architectures.  Make pcibios_penalize_isa_irq() a
__weak function to simplify the code.  This removes the need for new
platforms to add stub implementations of pcibios_penalize_isa_irq().

[bhelgaas: changelog, comments]
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2014-05-27 16:23:58 -06:00
Lv Zheng
92985ef1db ACPICA: Clean up redudant definitions already defined elsewhere
Since mis-order issues have been solved, we can cleanup redundant
definitions that already have defaults in <acpi/platform/acenv.h>.

This patch removes redudant environments for __KERNEL__ surrounded code.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-27 18:13:08 +02:00
Lv Zheng
07d8391433 ACPICA: Linux headers: Add <asm/acenv.h> to remove mis-ordered inclusion of <asm/acpi.h>
There is a mis-order inclusion for <asm/acpi.h>.

As we will enforce including <linux/acpi.h> for all Linux ACPI users, we
can find the inclusion order is as follows:

<linux/acpi.h>
  <acpi/acpi.h>
   <acpi/platform/acenv.h>
    (acenv.h before including aclinux.h)
    <acpi/platform/aclinux.h>
...........................................................................
     (aclinux.h before including asm/acpi.h)
     <asm/acpi.h>                             @Redundant@
      (ACPICA specific stuff)
...........................................................................
...........................................................................
      (Linux ACPI specific stuff) ? - - - - - - - - - - - - +
     (aclinux.h after including asm/acpi.h)   @Invisible@   |
    (acenv.h after including aclinux.h)       @Invisible@   |
   other ACPICA headers                       @Invisible@   |
............................................................|..............
  <acpi/acpi_bus.h>                                         |
  <acpi/acpi_drivers.h>                                     |
  <asm/acpi.h> (Excluded)                                   |
   (Linux ACPI specific stuff) ! <- - - - - - - - - - - - - +

NOTE that, in ACPICA, <acpi/platform/acenv.h> is more like Kconfig
generated <generated/autoconf.h> for Linux, it is meant to be included
before including any ACPICA code.

In the above figure, there is a question mark for "Linux ACPI specific
stuff" in <asm/acpi.h> which should be included after including all other
ACPICA header files.  Thus they really need to be moved to the position
marked with exclaimation mark or the definitions in the blocks marked with
"@Invisible@" will be invisible to such architecture specific "Linux ACPI
specific stuff" header blocks.  This leaves 2 issues:
1. All environmental definitions in these blocks should have a copy in the
   area marked with "@Redundant@" if they are required by the "Linux ACPI
   specific stuff".
2. We cannot use any ACPICA defined types in <asm/acpi.h>.

This patch splits architecture specific ACPICA stuff from <asm/acpi.h> to
fix this issue.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-27 18:13:07 +02:00
David S. Miller
54e5c4def0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_alb.c
	drivers/net/ethernet/altera/altera_msgdma.c
	drivers/net/ethernet/altera/altera_sgdma.c
	net/ipv6/xfrm6_output.c

Several cases of overlapping changes.

The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.

In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.

Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-24 00:32:30 -04:00
Dave Hansen
65a7f03f6b x86: fix page fault tracing when KVM guest support enabled
I noticed on some of my systems that page fault tracing doesn't
work:

	cd /sys/kernel/debug/tracing
	echo 1 > events/exceptions/enable
	cat trace;
	# nothing shows up

I eventually traced it down to CONFIG_KVM_GUEST.  At least in a
KVM VM, enabling that option breaks page fault tracing, and
disabling fixes it.  I tried on some old kernels and this does
not appear to be a regression: it never worked.

There are two page-fault entry functions today.  One when tracing
is on and another when it is off.  The KVM code calls do_page_fault()
directly instead of calling the traced version:

> dotraplinkage void __kprobes
> do_async_page_fault(struct pt_regs *regs, unsigned long
> error_code)
> {
>         enum ctx_state prev_state;
>
>         switch (kvm_read_and_reset_pf_reason()) {
>         default:
>                 do_page_fault(regs, error_code);
>                 break;
>         case KVM_PV_REASON_PAGE_NOT_PRESENT:

I'm also having problems with the page fault tracing on bare
metal (same symptom of no trace output).  I'm unsure if it's
related.

Steven had an alternative to this which has zero overhead when
tracing is off where this includes the standard noops even when
tracing is disabled.  I'm unconvinced that the extra complexity
of his apporach:

	http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home

is worth it, expecially considering that the KVM code is already
making page fault entry slower here.  This solution is
dirt-simple.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22 17:47:17 +02:00
Paolo Bonzini
ae9fedc793 KVM: x86: get CPL from SS.DPL
CS.RPL is not equal to the CPL in the few instructions between
setting CR0.PE and reloading CS.  And CS.DPL is also not equal
to the CPL for conforming code segments.

However, SS.DPL *is* always equal to the CPL except for the weird
case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
value in the STAR MSR, but force CPL=3 (Intel instead forces
SS.DPL=SS.RPL=CPL=3).

So this patch:

- modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
the above case with SYSRET is not broken further, and the way
to fix it would be to pass the CPL to userspace and back

- modifies VMX to always return the CPL from SS.DPL (except
forcing it to 0 if we are emulating real mode via vm86 mode;
in vm86 mode all DPLs have to be 3, but real mode does allow
privileged instructions).  It also removes the CPL cache,
which becomes a duplicate of the SS access rights cache.

This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
CR0.PE=1 but before CS has been reloaded.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22 17:47:17 +02:00
Paolo Bonzini
fb5e336b97 KVM: x86: drop set_rflags callback
Not needed anymore now that the CPL is computed directly
during task switch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22 17:47:16 +02:00
Ingo Molnar
65c2ce7004 Linux 3.15-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTfR2zAAoJEHm+PkMAQRiG3noH/2s+KUge3qO2M+AmxttUo74B
 +npAMdbqYR3MdEiwxYZfsHcMu4Ye/IKLcrh4pydB5hI2mdjtGkH1bnmia0f1ve/c
 Z/a0256+W8gWp7mcUBqSNztqLPAWa7wKOqNdLjj5idr1BSj6u8im+fQ9FBh2woki
 1fyYAuq/60lq4CMOKJvkA95V1Ome/jO+8tS4PguOgsCETQxCVFGurZcBbG3Mx5Y3
 v+ioCqeRc6GvxPFR6YngnTZCrsLxSRT3tnO2Qy5zX7dxjIQkCEbvIckpBQv01Y3R
 wNUaX+2Jae207igxrEv8CjmCFnmZFuUI15aWWCy6fOS/j8bjuk6ThYJO8N4ZBM0=
 =2ShG
 -----END PGP SIGNATURE-----

Merge tag 'v3.15-rc6' into sched/core, to pick up the latest fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:28:56 +02:00
H. Peter Anvin
03c1b4e8e5 Merge remote-tracking branch 'origin/x86/espfix' into x86/vdso
Merge x86/espfix into x86/vdso, due to changes in the vdso setup code
that otherwise cause conflicts.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-21 17:36:33 -07:00
H. Peter Anvin
e6ab9a20e7 Merge commit '7ed6fb9b5a5510e4ef78ab27419184741169978a' into x86/espfix
Merge in Linus' tree with:

fa81511bb0 x86-64, modify_ldt: Make support for 16-bit segments a runtime option

... reverted, to avoid a conflict.  This commit is no longer necessary
with the proper fix in place.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-21 15:23:19 -07:00
Borislav Petkov
65cef1311d x86, microcode: Add a disable chicken bit
Add a cmdline param which disables the microcode loader. This is useful
mostly in debugging situations where we want to turn off microcode
loading, both early from the initrd and late, as a means to be able to
rule out its influence on the machine.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1400525957-11525-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-20 20:21:27 -07:00
Borislav Petkov
1b1ded57a4 x86, boot: Carve out early cmdline parsing function
Carve out early cmdline parsing function into .../lib/cmdline.c so it
can be used by early code in the kernel proper as well.

Adapted from arch/x86/boot/cmdline.c.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1400525957-11525-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-20 20:21:24 -07:00
Andy Lutomirski
a62c34bd2a x86, mm: Improve _install_special_mapping and fix x86 vdso naming
Using arch_vma_name to give special mappings a name is awkward.  x86
currently implements it by comparing the start address of the vma to
the expected address of the vdso.  This requires tracking the start
address of special mappings and is probably buggy if a special vma
is split or moved.

Improve _install_special_mapping to just name the vma directly.  Use
it to give the x86 vvar area a name, which should make CRIU's life
easier.

As a side effect, the vvar area will show up in core dumps.  This
could be considered weird and is fixable.

[hpa: I say we accept this as-is but be prepared to deal with knocking
 out the vvars from core dumps if this becomes a problem.]

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/276b39b6b645fb11e345457b503f17b83c2c6fd0.1400538962.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-20 11:38:42 -07:00
Thomas Gleixner
a553b142b8 iommu: dmar: Provide arch specific irq allocation
ia64 and x86 share this driver. x86 is moving to a different irq
allocation and ia64 keeps its private irq_create/destroy stuff.

Use macros to redirect to one or the other. Yes, macros to avoid
include hell.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Joerg Roedel <joro@8bytes.org>
Cc: x86@kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20140507154336.372289825@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:19 +02:00
Thomas Gleixner
d07c9f1875 x86: Get rid of get_nr_irqs_gsi()
No need to expose this outside of the ioapic code. The dynamic
allocations are guaranteed not to happen in the gsi space. See commit
62a08ae2a.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20140507154335.959870037@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:19 +02:00
Oleg Nesterov
5e1b05beec x86/traps: Make math_error() static
Trivial, make math_error() static.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14 13:57:26 +02:00
Denys Vlasenko
50204c6f6d uprobes/x86: Simplify rip-relative handling
It is possible to replace rip-relative addressing mode with addressing
mode of the same length: (reg+disp32). This eliminates the need to fix
up immediate and correct for changing instruction length.

And we can kill arch_uprobe->def.riprel_target.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14 13:57:25 +02:00
Anthony Iliopoulos
9844f54623 x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()
The invalidation is required in order to maintain proper semantics
under CoW conditions. In scenarios where a process clones several
threads, a thread operating on a core whose DTLB entry for a
particular hugepage has not been invalidated, will be reading from
the hugepage that belongs to the forked child process, even after
hugetlb_cow().

The thread will not see the updated page as long as the stale DTLB
entry remains cached, the thread attempts to write into the page,
the child process exits, or the thread gets migrated to a different
processor.

Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com>
Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp
Suggested-by: Shay Goikhman <shay.goikhman@huawei.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v2.6.16+ (!)
2014-05-13 16:34:09 -07:00
Ong Boon Leong
7ef1def800 x86, iosf: Added Quark MBI identifiers
Added all the MBI units below and their associated read/write
opcodes:
 - Host Bridge Arbiter
 - Host Bridge
 - Remote Management Unit
 - Memory Manager & eSRAM
 - SoC Unit

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Link: http://lkml.kernel.org/r/1399668248-24199-3-git-send-email-david.e.box@linux.intel.com
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09 14:57:08 -07:00
David E. Box
6b8f0c8780 x86, iosf: Make IOSF driver modular and usable by more drivers
Currently drivers that run on non-IOSF systems (Core/Xeon) can't use the IOSF
driver on SOC's without selecting it which forces an unnecessary and limiting
dependency. Provides dummy functions to allow these modules to conditionally
use the driver on IOSF equipped platforms without impacting their ability to
compile and load on non-IOSF platforms. Build default m to ensure availability
on x86 SOC's.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1399668248-24199-2-git-send-email-david.e.box@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09 14:56:15 -07:00
Andres Freund
c45f77364b x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macro
The spuriously added semicolon didn't have any effect because the
macro isn't currently in use.

c0a639ad0b

Signed-off-by: Andres Freund <andres@anarazel.de>
Link: http://lkml.kernel.org/r/1399598957-7011-3-git-send-email-andres@anarazel.de
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-09 08:42:47 -07:00
Peter Zijlstra
f80c5b39b8 sched/idle, x86: Switch from TS_POLLING to TIF_POLLING_NRFLAG
Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that
both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word.
This will allow us, using fetch_or(), to both set NEED_RESCHED and
check for POLLING_NRFLAG in a single operation and avoid pointless
wakeups.

Changing from the non-atomic thread_info::status flags to the atomic
thread_info::flags shouldn't be a big issue since most polling state
changes were followed/preceded by a full memory barrier anyway.

Also, fix up the apm_32 idle function, clearly that was forgotten in
the last conversion. The default idle state is !POLLING so just kill
the lot.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Link: http://lkml.kernel.org/n/tip-7yksmqtlv4nfowmlqr1rifoi@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-08 09:16:56 +02:00
Feng Tang
f10f383d84 x86/hpet: Make boot_hpet_disable extern
HPET on some platform has accuracy problem. Making
"boot_hpet_disable" extern so that we can runtime disable
the HPET timer by using quirk to check the platform.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1398327498-13163-1-git-send-email-feng.tang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-08 08:15:34 +02:00
Andy Lutomirski
f40c330091 x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
This makes the 64-bit and x32 vdsos use the same mechanism as the
32-bit vdso.  Most of the churn is deleting all the old fixmap code.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/8af87023f57f6bb96ec8d17fce3f88018195b49b.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:19:01 -07:00
Andy Lutomirski
18d0a6fd22 x86, vdso: Move the 32-bit vdso special pages after the text
This unifies the vdso mapping code and teaches it how to map special
pages at addresses corresponding to symbols in the vdso image.  The
new code is used for all vdso variants, but so far only the 32-bit
variants use the new vvar page position.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/b6d7858ad7b5ac3fd3c29cab6d6d769bc45d195e.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:56 -07:00
Andy Lutomirski
6f121e548f x86, vdso: Reimplement vdso.so preparation in build-time C
Currently, vdso.so files are prepared and analyzed by a combination
of objcopy, nm, some linker script tricks, and some simple ELF
parsers in the kernel.  Replace all of that with plain C code that
runs at build time.

All five vdso images now generate .c files that are compiled and
linked in to the kernel image.

This should cause only one userspace-visible change: the loaded vDSO
images are stripped more heavily than they used to be.  Everything
outside the loadable segment is dropped.  In particular, this causes
the section table and section name strings to be missing.  This
should be fine: real dynamic loaders don't load or inspect these
tables anyway.  The result is roughly equivalent to eu-strip's
--strip-sections option.

The purpose of this change is to enable the vvar and hpet mappings
to be moved to the page following the vDSO load segment.  Currently,
it is possible for the section table to extend into the page after
the load segment, so, if we map it, it risks overlapping the vvar or
hpet page.  This happens whenever the load segment is just under a
multiple of PAGE_SIZE.

The only real subtlety here is that the old code had a C file with
inline assembler that did 'call VDSO32_vsyscall' and a linker script
that defined 'VDSO32_vsyscall = __kernel_vsyscall'.  This most
likely worked by accident: the linker script entry defines a symbol
associated with an address as opposed to an alias for the real
dynamic symbol __kernel_vsyscall.  That caused ld to relocate the
reference at link time instead of leaving an interposable dynamic
relocation.  Since the VDSO32_vsyscall hack is no longer needed, I
now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it
work.  vdso2c will generate an error and abort the build if the
resulting image contains any dynamic relocations, so we won't
silently generate bad vdso images.

(Dynamic relocations are a problem because nothing will even attempt
to relocate the vdso.)

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:51 -07:00
Andy Lutomirski
cfda7bb9ec x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c
This code is used during CPU setup, and it isn't strictly speaking
related to the 32-bit vdso.  It's easier to understand how this
works when the code is closer to its callers.

This also lets syscall32_cpu_init be static, which might save some
trivial amount of kernel text.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/4e466987204e232d7b55a53ff6b9739f12237461.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:47 -07:00
Andy Lutomirski
3d7ee969bf x86, vdso: Clean up 32-bit vs 64-bit vdso params
Rather than using 'vdso_enabled' and an awful #define, just call the
parameters vdso32_enabled and vdso64_enabled.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/87913de56bdcbae3d93917938302fc369b05caee.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:40 -07:00
Tom Herbert
4405b4d635 net: Change x86_64 add32_with_carry to allow memory operand
Note add32_with_carry(a, b) is suboptimal, as it forces
a and b in registers.

b could be a memory or a register operand.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:26:29 -04:00
Tom Herbert
a278534406 x86_64: csum_add for x86_64
Add csum_add function for x86_64.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:26:29 -04:00
H. Peter Anvin
20b68535cd x86, espfix: Fix broken header guard
Header guard is #ifndef, not #ifdef...

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-02 11:34:17 -07:00
H. Peter Anvin
e1fe9ed8d2 x86, espfix: Move espfix definitions into a separate header file
Sparse warns that the percpu variables aren't declared before they are
defined.  Rather than hacking around it, move espfix definitions into
a proper header file.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-01 14:16:15 -07:00
H. Peter Anvin
3891a04aaf x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer.  This
causes some 16-bit software to break, but it also leaks kernel state
to user space.  We have a software workaround for that ("espfix") for
the 32-bit kernel, but it relies on a nonzero stack segment base which
is not available in 64-bit mode.

In checkin:

    b3b42ac2cb x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
the logic that 16-bit support is crippled on 64-bit kernels anyway (no
V86 support), but it turns out that people are doing stuff like
running old Win16 binaries under Wine and expect it to work.

This works around this by creating percpu "ministacks", each of which
is mapped 2^16 times 64K apart.  When we detect that the return SS is
on the LDT, we copy the IRET frame to the ministack and use the
relevant alias to return to userspace.  The ministacks are mapped
readonly, so if IRET faults we promote #GP to #DF which is an IST
vector and thus has its own stack; we then do the fixup in the #DF
handler.

(Making #GP an IST exception would make the msr_safe functions unsafe
in NMI/MC context, and quite possibly have other effects.)

Special thanks to:

- Andy Lutomirski, for the suggestion of using very small stack slots
  and copy (as opposed to map) the IRET frame there, and for the
  suggestion to mark them readonly and let the fault promote to #DF.
- Konrad Wilk for paravirt fixup and testing.
- Borislav Petkov for testing help and useful comments.

Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrew Lutomriski <amluto@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dirk Hohndel <dirk@hohndel.org>
Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
Cc: comex <comexk@gmail.com>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <stable@vger.kernel.org> # consider after upstream merge
2014-04-30 14:14:28 -07:00
Oleg Nesterov
1dc76e6eac uprobes/x86: Kill adjust_ret_addr(), simplify UPROBE_FIX_CALL logic
The only insn which could have both UPROBE_FIX_IP and UPROBE_FIX_CALL
was 0xe8 "call relative", and now it is handled by branch_xol_ops.

So we can change default_post_xol_op(UPROBE_FIX_CALL) to simply push
the address of next insn == utask->vaddr + insn.length, just we need
to record insn.length into the new auprobe->def.ilen member.

Note: if/when we teach branch_xol_ops to support jcxz/loopz we can
remove the "correction" logic, UPROBE_FIX_IP can use the same address.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30 19:10:39 +02:00
Oleg Nesterov
78d9af4cd3 uprobes/x86: Cleanup the usage of arch_uprobe->def.fixups, make it u8
handle_riprel_insn() assumes that nobody else could modify ->fixups
before. This is correct but fragile, change it to use "|=".

Also make ->fixups u8, we are going to add the new members into the
union. It is not clear why UPROBE_FIX_RIP_.X lived in the upper byte,
redefine them so that they can fit into u8.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30 19:10:38 +02:00
Oleg Nesterov
97aa5cddbe uprobes/x86: Move default_xol_ops's data into arch_uprobe->def
Finally we can move arch_uprobe->fixups/rip_rela_target_address
into the new "def" struct and place this struct in the union, they
are only used by default_xol_ops paths.

The patch also renames rip_rela_target_address to riprel_target just
to make this name shorter.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30 19:10:37 +02:00
Ian Campbell
5e40704ed2 arm: xen: implement multicall hypercall support.
As part of this make the usual change to xen_ulong_t in place of unsigned long.
This change has no impact on x86.

The Linux definition of struct multicall_entry.result differs from the Xen
definition, I think for good reasons, and used a long rather than an unsigned
long. Therefore introduce a xen_long_t, which is a long on x86 architectures
and a signed 64-bit integer on ARM.

Use uint32_t nr_calls on x86 for consistency with the ARM definition.

Build tested on amd64 and i386 builds. Runtime tested on ARM.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-04-24 13:09:46 +01:00
Masami Hiramatsu
9326638cbe kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation
Use NOKPROBE_SYMBOL macro for protecting functions
from kprobes instead of __kprobes annotation under
arch/x86.

This applies nokprobe_inline annotation for some cases,
because NOKPROBE_SYMBOL() will inhibit inlining by
referring the symbol address.

This just folds a bunch of previous NOKPROBE_SYMBOL()
cleanup patches for x86 to one patch.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20140417081814.26341.51656.stgit@ltc230.yrl.intra.hitachi.co.jp
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Lebon <jlebon@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24 10:26:38 +02:00
Masami Hiramatsu
6f6343f53d kprobes/x86: Call exception handlers directly from do_int3/do_debug
To avoid a kernel crash by probing on lockdep code, call
kprobe_int3_handler() and kprobe_debug_handler()(which was
formerly called post_kprobe_handler()) directly from
do_int3 and do_debug.

Currently kprobes uses notify_die() to hook the int3/debug
exceptoins. Since there is a locking code in notify_die,
the lockdep code can be invoked. And because the lockdep
involves printk() related things, theoretically, we need to
prohibit probing on such code, which means much longer blacklist
we'll have. Instead, hooking the int3/debug for kprobes before
notify_die() can avoid this problem.

Anyway, most of the int3 handlers in the kernel are already
called from do_int3 directly, e.g. ftrace_int3_handler,
poke_int3_handler, kgdb_ll_trap. Actually only
kprobe_exceptions_notify is on the notifier_call_chain.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jonathan Lebon <jlebon@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/20140417081733.26341.24423.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24 10:02:59 +02:00
Masami Hiramatsu
376e242429 kprobes: Introduce NOKPROBE_SYMBOL() macro to maintain kprobes blacklist
Introduce NOKPROBE_SYMBOL() macro which builds a kprobes
blacklist at kernel build time.

The usage of this macro is similar to EXPORT_SYMBOL(),
placed after the function definition:

  NOKPROBE_SYMBOL(function);

Since this macro will inhibit inlining of static/inline
functions, this patch also introduces a nokprobe_inline macro
for static/inline functions. In this case, we must use
NOKPROBE_SYMBOL() for the inline function caller.

When CONFIG_KPROBES=y, the macro stores the given function
address in the "_kprobe_blacklist" section.

Since the data structures are not fully initialized by the
macro (because there is no "size" information),  those
are re-initialized at boot time by using kallsyms.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20140417081705.26341.96719.stgit@ltc230.yrl.intra.hitachi.co.jp
Cc: Alok Kataria <akataria@vmware.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jan-Simon Möller <dl9pf@gmx.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-sparse@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24 10:02:56 +02:00
Nadav Amit
346874c950 KVM: x86: Fix CR3 reserved bits
According to Intel specifications, PAE and non-PAE does not have any reserved
bits.  In long-mode, regardless to PCIDE, only the high bits (above the
physical address) are reserved.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:46:57 -03:00
Peter Zijlstra
d00a569284 arch,x86: Convert smp_mb__*()
x86 is strongly ordered and all its atomic ops imply a full barrier.

Implement the two new primitives as the old ones were.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-knswsr5mldkr0w1lrdxvc81w@git.kernel.org
Cc: Dave Jones <davej@redhat.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 14:20:46 +02:00
Ingo Molnar
1111b680d3 Merge branch 'perf/urgent' into perf/core, to pick up PMU driver fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:14:55 +02:00
Oleg Nesterov
8e89c0be17 uprobes/x86: Emulate relative call's
See the previous "Emulate unconditional relative jmp's" which explains
why we can not execute "jmp" out-of-line, the same applies to "call".

Emulating of rip-relative call is trivial, we only need to additionally
push the ret-address. If this fails, we execute this instruction out of
line and this should trigger the trap, the probed application should die
or the same insn will be restarted if a signal handler expands the stack.
We do not even need ->post_xol() for this case.

But there is a corner (and almost theoretical) case: another thread can
expand the stack right before we execute this insn out of line. In this
case it hit the same problem we are trying to solve. So we simply turn
the probed insn into "call 1f; 1:" and add ->post_xol() which restores
->sp and restarts.

Many thanks to Jonathan who finally found the standalone reproducer,
otherwise I would never resolve the "random SIGSEGV's under systemtap"
bug-report. Now that the problem is clear we can write the simplified
test-case:

	void probe_func(void), callee(void);

	int failed = 1;

	asm (
		".text\n"
		".align 4096\n"
		".globl probe_func\n"
		"probe_func:\n"
		"call callee\n"
		"ret"
	);

	/*
	 * This assumes that:
	 *
	 *	- &probe_func = 0x401000 + a_bit, aligned = 0x402000
	 *
	 *	- xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
	 *	  as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
	 *
	 * so we can target the non-canonical address from xol_vma using
	 * the simple math below, 100 * 4096 is just the random offset
	 */
	asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1  + 100 * 4096\n");

	void callee(void)
	{
		failed = 0;
	}

	int main(void)
	{
		probe_func();
		return failed;
	}

It SIGSEGV's if you probe "probe_func" (although this is not very reliable,
randomize_va_space/etc can change the placement of xol area).

Note: as Denys Vlasenko pointed out, amd and intel treat "callw" (0x66 0xe8)
differently. This patch relies on lib/insn.c and thus implements the intel's
behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use
this insn, so we postpone the fix until we decide what should we do; emulate
or not, support or not, etc.

Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:23 +02:00
Oleg Nesterov
7ba6db2d68 uprobes/x86: Emulate unconditional relative jmp's
Currently we always execute all insns out-of-line, including relative
jmp's and call's. This assumes that even if regs->ip points to nowhere
after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will
update it correctly.

However, this doesn't work if this regs->ip == xol_vaddr + insn_offset
is not canonical. In this case CPU generates #GP and general_protection()
kills the task which tries to execute this insn out-of-line.

Now that we have uprobe_xol_ops we can teach uprobes to emulate these
insns and solve the problem. This patch adds branch_xol_ops which has
a single branch_emulate_op() hook, so far it can only handle rel8/32
relative jmp's.

TODO: move ->fixup into the union along with rip_rela_target_address.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17 21:58:22 +02:00
Oleg Nesterov
8ad8e9d3fd uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops
Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops",
move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default
set of methods and change arch_uprobe_pre/post_xol() accordingly.

This way we can add the new uprobe_xol_ops's to handle the insns
which need the special processing (rip-relative jmp/call at least).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
2014-04-17 21:58:19 +02:00
Ricardo Neri
b738c6ea49 x86/efi: Save and restore FPU context around efi_calls (i386)
Do a complete FPU context save/restore around the EFI calls. This required
as runtime EFI firmware may potentially use the FPU.

This change covers only the i386 configuration.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-17 13:26:33 +01:00
Ricardo Neri
de05764e0b x86/efi: Save and restore FPU context around efi_calls (x86_64)
Do a complete FPU context save/restore around the EFI calls. This required
as runtime EFI firmware may potentially use the FPU.

This change covers only the x86_64 configuration.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-17 13:26:32 +01:00
Ricardo Neri
982e239cd2 x86/efi: Implement a __efi_call_virt macro
For i386, all the EFI system runtime services functions return efi_status_t
except efi_reset_system_system. Therefore, not all functions can be covered
by the same macro in case the macro needs to do more than calling the function
(i.e., return a value). The purpose of the __efi_call_virt macro is to be used
when no return value is expected.

For x86_64, this macro would not be needed as all the runtime services return
u64. However, the same code is used for both x86_64 and i386. Thus, the macro
__efi_call_virt is also defined to not break compilation.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-17 13:26:32 +01:00
Matt Fleming
c6b4069192 x86, fpu: Extend the use of static_cpu_has_safe
It may be necessary to save and restore the FPU context during EFI runtime
system services calls. However, this may happen during boot and before
alternatives have run. Thus, we need to use static_cpu_has_safe instead.

The rationale behind the use of static_cpu_has_safe is the same as in
commit 5f8c421814 ("x86, fpu: Use static_cpu_has_safe
before alternatives") by Borislav Petkov.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
2014-04-17 13:26:31 +01:00
Matt Fleming
62fa6e69a4 x86/efi: Delete most of the efi_call* macros
We really only need one phys and one virt function call, and then only
one assembly function to make firmware calls.

Since we are not using the C type system anyway, we're not really losing
much by deleting the macros apart from no longer having a check that
we are passing the correct number of parameters. The lack of duplicated
code seems like a worthwhile trade-off.

Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-17 13:26:30 +01:00
Linus Torvalds
55101e2d6c Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Marcelo Tosatti:
 - Fix for guest triggerable BUG_ON (CVE-2014-0155)
 - CR4.SMAP support
 - Spurious WARN_ON() fix

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: remove WARN_ON from get_kernel_ns()
  KVM: Rename variable smep to cr4_smep
  KVM: expose SMAP feature to guest
  KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
  KVM: Add SMAP support when setting CR4
  KVM: Remove SMAP bit from CR4_RESERVED_BITS
  KVM: ioapic: try to recover if pending_eoi goes out of range
  KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
2014-04-14 16:21:28 -07:00
Feng Wu
56d6efc2de KVM: Remove SMAP bit from CR4_RESERVED_BITS
This patch removes SMAP bit from CR4_RESERVED_BITS.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:33 -03:00
Prarit Bhargava
79a51b25ba x86/irq: Clean up VECTOR_UNDEFINED and VECTOR_RETRIGGERED definition
During another patch review, David Rientjes noted that
VECTOR_UNDEFINED and VECTOR_RETRIGGERED should be defined with ()s
so that they are not erroneously used in an arithmetic operation.

Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Yang Zhang <yang.z.zhang@Intel.com>
Link: http://lkml.kernel.org/r/1396440827-18352-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-14 13:42:05 +02:00
Linus Torvalds
0b747172dc Merge git://git.infradead.org/users/eparis/audit
Pull audit updates from Eric Paris.

* git://git.infradead.org/users/eparis/audit: (28 commits)
  AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
  audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
  audit: do not cast audit_rule_data pointers pointlesly
  AUDIT: Allow login in non-init namespaces
  audit: define audit_is_compat in kernel internal header
  kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
  sched: declare pid_alive as inline
  audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
  syscall_get_arch: remove useless function arguments
  audit: remove stray newline from audit_log_execve_info() audit_panic() call
  audit: remove stray newlines from audit_log_lost messages
  audit: include subject in login records
  audit: remove superfluous new- prefix in AUDIT_LOGIN messages
  audit: allow user processes to log from another PID namespace
  audit: anchor all pid references in the initial pid namespace
  audit: convert PPIDs to the inital PID namespace.
  pid: get pid_t ppid of task in init_pid_ns
  audit: rename the misleading audit_get_context() to audit_take_context()
  audit: Add generic compat syscall support
  audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
  ...
2014-04-12 12:38:53 -07:00
Mark Salter
5b7c73e009 x86: use generic early_ioremap
Move x86 over to the generic early ioremap implementation.

Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:15 -07:00
Dave Young
6b550f6f20 x86/mm: sparse warning fix for early_memremap
This patch series takes the common bits from the x86 early ioremap
implementation and creates a generic implementation which may be used by
other architectures.  The early ioremap interfaces are intended for
situations where boot code needs to make temporary virtual mappings
before the normal ioremap interfaces are available.  Typically, this
means before paging_init() has run.

This patch (of 6):

There's a lot of sparse warnings for code like below: void *a =
early_memremap(phys_addr, size);

early_memremap intend to map kernel memory with ioremap facility, the
return pointer should be a kernel ram pointer instead of iomem one.

For making the function clearer and supressing sparse warnings this patch
do below two things:
1. cast to (__force void *) for the return value of early_memremap
2. add early_memunmap function and pass (__force void __iomem *) to iounmap

From Boris:
  "Ingo told me yesterday, it makes sense too.  I'd guess we can try it.
   FWIW, all callers of early_memremap use the memory they get remapped
   as normal memory so we should be safe"

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:14 -07:00
Christoph Lameter
b3ca1c10d7 percpu: add raw_cpu_ops
The kernel has never been audited to ensure that this_cpu operations are
consistently used throughout the kernel.  The code generated in many
places can be improved through the use of this_cpu operations (which
uses a segment register for relocation of per cpu offsets instead of
performing address calculations).

The patch set also addresses various consistency issues in general with
the per cpu macros.

A. The semantics of __this_cpu_ptr() differs from this_cpu_ptr only
   because checks are skipped. This is typically shown through a raw_
   prefix. So this patch set changes the places where __this_cpu_ptr()
   is used to raw_cpu_ptr().

B. There has been the long term wish by some that __this_cpu operations
   would check for preemption. However, there are cases where preemption
   checks need to be skipped. This patch set adds raw_cpu operations that
   do not check for preemption and then adds preemption checks to the
   __this_cpu operations.

C. The use of __get_cpu_var is always a reference to a percpu variable
   that can also be handled via a this_cpu operation. This patch set
   replaces all uses of __get_cpu_var with this_cpu operations.

D. We can then use this_cpu RMW operations in various places replacing
   sequences of instructions by a single one.

E. The use of this_cpu operations throughout will allow other arches than
   x86 to implement optimized references and RMV operations to work with
   per cpu local data.

F. The use of this_cpu operations opens up the possibility to
   further optimize code that relies on synchronization through
   per cpu data.

The patch set works in a couple of stages:

I. Patch 1 adds the additional raw_cpu operations and raw_cpu_ptr().
    Also converts the existing __this_cpu_xx_# primitive in the x86
    code to raw_cpu_xx_#.

II. Patch 2-4 use the raw_cpu operations in places that would give
     us false positives once they are enabled.

III. Patch 5 adds preemption checks to __this_cpu operations to allow
    checking if preemption is properly disabled when these functions
    are used.

IV. Patches 6-20 are patches that simply replace uses of __get_cpu_var
   with this_cpu_ptr. They do not depend on any changes to the percpu
   code. No preemption tests are skipped if they are applied.

V. Patches 21-46 are conversion patches that use this_cpu operations
   in various kernel subsystems/drivers or arch code.

VI.  Patches 47/48 (not included in this series) remove no longer used
    functions (__this_cpu_ptr and __get_cpu_var).  These should only be
    applied after all the conversion patches have made it and after we
    have done additional passes through the kernel to ensure that none of
    the uses of these functions remain.

This patch (of 46):

The patches following this one will add preemption checks to __this_cpu
ops so we need to have an alternative way to use this_cpu operations
without preemption checks.

raw_cpu_ops will be the basis for all other ops since these will be the
operations that do not implement any checks.

Primitive operations are renamed by this patch from __this_cpu_xxx to
raw_cpu_xxxx.

Also change the uses of the x86 percpu primitives in preempt.h.
These depend directly on asm/percpu.h (header #include nesting issue).

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Alex Shi <alex.shi@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bryan Wu <cooloney@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: David Daney <david.daney@cavium.com>
Cc: David Miller <davem@davemloft.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Robert Richter <rric@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:13 -07:00
Josh Triplett
b06dd879f5 x86: always define BUG() and HAVE_ARCH_BUG, even with !CONFIG_BUG
This ensures that BUG() always has a definition that causes a trap (via
an undefined instruction), and that the compiler still recognizes the
code following BUG() as unreachable, avoiding warnings that would
otherwise appear (such as on non-void functions that don't return a
value after BUG()).

In addition to saving a few bytes over the generic infinite-loop
implementation, this implementation traps rather than looping, which
potentially allows for better error-recovery behavior (such as by
rebooting).

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:10 -07:00
Linus Torvalds
d1d9cfc330 A number of cleanups plus support for the RDSEED instruction, which
will be showing up in Intel Broadwell CPU's.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTPaDyAAoJENNvdpvBGATwIVkP/Ao77ATdIBKr7kJJ+7q/jroo
 hfg2EOBzYwFSTZ4EUCwTwCNOqcAnr8xBYRWDDTxoiT0ORQN+4QkIXlukTBZbeCUg
 5ZCvRYhNcrieCBX69X/0aQ7yAkMR5/KSjccww3Tr6bLEItEBS7/gkeGv0H/E1DOg
 /1C3vbDDRPchM2qGM/TS/2Wd3WU5jWJk4cURJUlweBWHGXItOk6yHIY/+h3t4Bj1
 eJfxwG21BLJ1t159uvMj5Sw9Ketwnzgc4hsU6Ai/o37+Q12LYJ6xRCHsAatr6aA8
 dsObrfnD15T3gH/elVzB0tpBkor9pESkJAvUxWdMNMNGVU01ONuBz6rbSUqLdKu4
 GXA+9iHq4Ti43f4M886PweSweFr1+lFIiQrdAQaWZdrooLCMJCgjMNM+7wnlu8ho
 Eit65rq4k6y5Fb5GTB8Am/TCb3HNYhXnrMERA0meKx+/K8yaMygzHtPoT+DHdn8z
 GWDwNiMYrtTeiJ5nduljO3M2lYJDIWLkCD9SwupXe5HEj3qSHS/aiT5ng6YP+oOK
 mmUiWWgpVv3unO6pJw49pbzuAWXnbHVWMV2VDILVxn7mZZ0NAOL3OGAmK7US+Z/J
 qww9s89A2oba8vTbdrW7JuiP9ESlujpfrLQQgnW6WInMHY2dfo5hSo84ULLeLMXQ
 SqT1yywl4U888jqBYioB
 =RtMT
 -----END PGP SIGNATURE-----

Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull /dev/random changes from Ted Ts'o:
 "A number of cleanups plus support for the RDSEED instruction, which
  will be showing up in Intel Broadwell CPU's"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
  random: Add arch_has_random[_seed]()
  random: If we have arch_get_random_seed*(), try it before blocking
  random: Use arch_get_random_seed*() at init time and once a second
  x86, random: Enable the RDSEED instruction
  random: use the architectural HWRNG for the SHA's IV in extract_buf()
  random: clarify bits/bytes in wakeup thresholds
  random: entropy_bytes is actually bits
  random: simplify accounting code
  random: tighten bound on random_read_wakeup_thresh
  random: forget lock in lockless accounting
  random: simplify accounting logic
  random: fix comment on "account"
  random: simplify loop in random_read
  random: fix description of get_random_bytes
  random: fix comment on proc_do_uuid
  random: fix typos / spelling errors in comments
2014-04-04 21:25:28 -07:00
Linus Torvalds
a372c967a3 Support PCI devices with multiple MSIs, performance improvement for
kernel-based backends (by not populated m2p overrides when mapping),
 and assorted minor bug fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQEcBAABAgAGBQJTPTGyAAoJEFxbo/MsZsTRnjgH/10j5CbOK1RFvIyCSslGTf4G
 slhK8P8dhhplGAxwXXji322lWNYEx9Jd+V0Bhxnvr4drSlsP/qkWuBWf+u1LBvRq
 AVPM99tk0XHCVAuvMMNo/lc62dTIR9IpQvnY6WhHSHnSlfqyVcdnbaGk8/LRuxWJ
 u2F0MXzDNH00b/kt6hDBt3F7CkHfjwsEn43LCkkxyHPp5MJGD7bGDIe+bKtnjv9u
 D9VJtCWQkrjWQ6jNpjdP833JCNCGQrXtVO3DeTAGs3T1tGmiEsqp6kT6Gp5zCFnh
 oaQk9jfQL2S+IVnVhHVMW9nTwNPPrnIrD69FlgTrK301mcYW1mKoFotTogzHu+0=
 =2IG+
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen features and fixes from David Vrabel:
 "Support PCI devices with multiple MSIs, performance improvement for
  kernel-based backends (by not populated m2p overrides when mapping),
  and assorted minor bug fixes and cleanups"

* tag 'stable/for-linus-3.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/acpi-processor: fix enabling interrupts on syscore_resume
  xen/grant-table: Refactor gnttab_[un]map_refs to avoid m2p_override
  xen: remove XEN_PRIVILEGED_GUEST
  xen: add support for MSI message groups
  xen-pciback: Use pci_enable_msix_exact() instead of pci_enable_msix()
  xen/xenbus: remove unused xenbus_bind_evtchn()
  xen/events: remove unnecessary call to bind_evtchn_to_cpu()
  xen/events: remove the unused resend_irq_on_evtchn()
  drivers:xen-selfballoon:reset 'frontswap_inertia_counter' after frontswap_shrink
  drivers: xen: Include appropriate header file in pcpu.c
  drivers: xen: Mark function as static in platform-pci.c
2014-04-03 14:01:37 -07:00
Linus Torvalds
7cbb39d4d4 Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "PPC and ARM do not have much going on this time.  Most of the cool
  stuff, instead, is in s390 and (after a few releases) x86.

  ARM has some caching fixes and PPC has transactional memory support in
  guests.  MIPS has some fixes, with more probably coming in 3.16 as
  QEMU will soon get support for MIPS KVM.

  For x86 there are optimizations for debug registers, which trigger on
  some Windows games, and other important fixes for Windows guests.  We
  now expose to the guest Broadwell instruction set extensions and also
  Intel MPX.  There's also a fix/workaround for OS X guests, nested
  virtualization features (preemption timer), and a couple kvmclock
  refinements.

  For s390, the main news is asynchronous page faults, together with
  improvements to IRQs (floating irqs and adapter irqs) that speed up
  virtio devices"

* tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits)
  KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8
  KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset
  KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode
  KVM: PPC: Book3S HV: Return ENODEV error rather than EIO
  KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code
  KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state
  KVM: PPC: Book3S HV: Add transactional memory support
  KVM: Specify byte order for KVM_EXIT_MMIO
  KVM: vmx: fix MPX detection
  KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n
  KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE
  KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write
  KVM: s390: clear local interrupts at cpu initial reset
  KVM: s390: Fix possible memory leak in SIGP functions
  KVM: s390: fix calculation of idle_mask array size
  KVM: s390: randomize sca address
  KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP
  KVM: Bump KVM_MAX_IRQ_ROUTES for s390
  KVM: s390: irq routing for adapter interrupts.
  KVM: s390: adapter interrupt sources
  ...
2014-04-02 14:50:10 -07:00
Linus Torvalds
467cbd207a Merge branch 'x86-nuke-platforms-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 old platform removal from Peter Anvin:
 "This patchset removes support for several completely obsolete
  platforms, where the maintainers either have completely vanished or
  acked the removal.  For some of them it is questionable if there even
  exists functional specimens of the hardware"

Geert Uytterhoeven apparently thought this was a April Fool's pull request ;)

* 'x86-nuke-platforms-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, platforms: Remove NUMAQ
  x86, platforms: Remove SGI Visual Workstation
  x86, apic: Remove support for IBM Summit/EXA chipset
  x86, apic: Remove support for ia32-based Unisys ES7000
2014-04-02 13:15:58 -07:00
Linus Torvalds
c6f21243ce Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso changes from Peter Anvin:
 "This is the revamp of the 32-bit vdso and the associated cleanups.

  This adds timekeeping support to the 32-bit vdso that we already have
  in the 64-bit vdso.  Although 32-bit x86 is legacy, it is likely to
  remain in the embedded space for a very long time to come.

  This removes the traditional COMPAT_VDSO support; the configuration
  variable is reused for simply removing the 32-bit vdso, which will
  produce correct results but obviously suffer a performance penalty.
  Only one beta version of glibc was affected, but that version was
  unfortunately included in one OpenSUSE release.

  This is not the end of the vdso cleanups.  Stefani and Andy have
  agreed to continue work for the next kernel cycle; in fact Andy has
  already produced another set of cleanups that came too late for this
  cycle.

  An incidental, but arguably important, change is that this ensures
  that unused space in the VVAR page is properly zeroed.  It wasn't
  before, and would contain whatever garbage was left in memory by BIOS
  or the bootloader.  Since the VVAR page is accessible to user space
  this had the potential of information leaks"

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86, vdso: Fix the symbol versions on the 32-bit vDSO
  x86, vdso, build: Don't rebuild 32-bit vdsos on every make
  x86, vdso: Actually discard the .discard sections
  x86, vdso: Fix size of get_unmapped_area()
  x86, vdso: Finish removing VDSO32_PRELINK
  x86, vdso: Move more vdso definitions into vdso.h
  x86: Load the 32-bit vdso in place, just like the 64-bit vdsos
  x86, vdso32: handle 32 bit vDSO larger one page
  x86, vdso32: Disable stack protector, adjust optimizations
  x86, vdso: Zero-pad the VVAR page
  x86, vdso: Add 32 bit VDSO time support for 64 bit kernel
  x86, vdso: Add 32 bit VDSO time support for 32 bit kernel
  x86, vdso: Patch alternatives in the 32-bit VDSO
  x86, vdso: Introduce VVAR marco for vdso32
  x86, vdso: Cleanup __vdso_gettimeofday()
  x86, vdso: Replace VVAR(vsyscall_gtod_data) by gtod macro
  x86, vdso: __vdso_clock_gettime() cleanup
  x86, vdso: Revamp vclock_gettime.c
  mm: Add new func _install_special_mapping() to mmap.c
  x86, vdso: Make vsyscall_gtod_data handling x86 generic
  ...
2014-04-02 12:26:43 -07:00
Linus Torvalds
158e0d3621 Driver core / sysfs patches for 3.15-rc1
Here's the big driver core / sysfs update for 3.15-rc1.
 
 Lots of kernfs updates to make it useful for other subsystems, and a few
 other tiny driver core patches.
 
 All have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlM7A0wACgkQMUfUDdst+ynJNACfZlY+KNKIhNFt1OOW8rQfSZzy
 1PYAnjYuOoly01JlPrpJD5b4TdxaAq71
 =GVUg
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core and sysfs updates from Greg KH:
 "Here's the big driver core / sysfs update for 3.15-rc1.

  Lots of kernfs updates to make it useful for other subsystems, and a
  few other tiny driver core patches.

  All have been in linux-next for a while"

* tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (42 commits)
  Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()"
  kernfs: cache atomic_write_len in kernfs_open_file
  numa: fix NULL pointer access and memory leak in unregister_one_node()
  Revert "driver core: synchronize device shutdown"
  kernfs: fix off by one error.
  kernfs: remove duplicate dir.c at the top dir
  x86: align x86 arch with generic CPU modalias handling
  cpu: add generic support for CPU feature based module autoloading
  sysfs: create bin_attributes under the requested group
  driver core: unexport static function create_syslog_header
  firmware: use power efficient workqueue for unloading and aborting fw load
  firmware: give a protection when map page failed
  firmware: google memconsole driver fixes
  firmware: fix google/gsmi duplicate efivars_sysfs_init()
  drivers/base: delete non-required instances of include <linux/init.h>
  kernfs: fix kernfs_node_from_dentry()
  ACPI / platform: drop redundant ACPI_HANDLE check
  kernfs: fix hash calculation in kernfs_rename_ns()
  kernfs: add CONFIG_KERNFS
  sysfs, kobject: add sysfs wrapper for kernfs_enable_ns()
  ...
2014-04-01 16:28:19 -07:00
Linus Torvalds
4b1779c2cf PCI changes for the v3.15 merge window:
Enumeration
     - Increment max correctly in pci_scan_bridge() (Andreas Noever)
     - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever)
     - Assign CardBus bus number only during the second pass (Andreas Noever)
     - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever)
     - Make sure bus number resources stay within their parents bounds (Andreas Noever)
     - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever)
     - Check for child busses which use more bus numbers than allocated (Andreas Noever)
     - Don't scan random busses in pci_scan_bridge() (Andreas Noever)
     - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas)
     - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas)
     - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas)
     - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas)
     - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas)
 
   NUMA
     - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas)
     - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas)
     - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas)
     - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas)
     - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas)
     - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas)
     - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas)
     - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas)
 
   Resource management
     - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas)
     - Add resource_contains() (Bjorn Helgaas)
     - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas)
     - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas)
     - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas)
     - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas)
     - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas)
     - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas)
     - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas)
     - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas)
     - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas)
     - s390: Use generic pci_enable_resources() (Bjorn Helgaas)
     - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas)
     - Set type in __request_region() (Bjorn Helgaas)
     - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas)
     - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas)
     - Log IDE resource quirk in dmesg (Bjorn Helgaas)
     - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas)
 
   PCI device hotplug
     - Make check_link_active() non-static (Rajat Jain)
     - Use link change notifications for hot-plug and removal (Rajat Jain)
     - Enable link state change notifications (Rajat Jain)
     - Don't disable the link permanently during removal (Rajat Jain)
     - Don't check adapter or latch status while disabling (Rajat Jain)
     - Disable link notification across slot reset (Rajat Jain)
     - Ensure very fast hotplug events are also processed (Rajat Jain)
     - Add hotplug_lock to serialize hotplug events (Rajat Jain)
     - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain)
     - Don't turn slot off when hot-added device already exists (Yijing Wang)
 
   MSI
     - Keep pci_enable_msi() documentation (Alexander Gordeev)
     - ahci: Fix broken single MSI fallback (Alexander Gordeev)
     - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev)
     - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman)
     - Fix leak of msi_attrs (Greg Kroah-Hartman)
     - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida)
 
   Virtualization
     - Device-specific ACS support (Alex Williamson)
 
   Freescale i.MX6
     - Wait for retraining (Marek Vasut)
 
   Marvell MVEBU
     - Use Device ID and revision from underlying endpoint (Andrew Lunn)
     - Fix incorrect size for PCI aperture resources (Jason Gunthorpe)
     - Call request_resource() on the apertures (Jason Gunthorpe)
     - Fix potential issue in range parsing (Jean-Jacques Hiblot)
 
   Renesas R-Car
     - Check platform_get_irq() return code (Ben Dooks)
     - Add error interrupt handling (Ben Dooks)
     - Fix bridge logic configuration accesses (Ben Dooks)
     - Register each instance independently (Magnus Damm)
     - Break out window size handling (Magnus Damm)
     - Make the Kconfig dependencies more generic (Magnus Damm)
 
   Synopsys DesignWare
     - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar)
 
   Miscellaneous
     - Remove unused SR-IOV VF Migration support (Bjorn Helgaas)
     - Enable INTx if BIOS left them disabled (Bjorn Helgaas)
     - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter)
     - Clean up par-arch object file list (Liviu Dudau)
     - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom)
     - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang)
     - Fix pci_bus_b() build failure (Paul Gortmaker)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTOdAZAAoJEFmIoMA60/r8VYUQALRrReyMBk3pjRt/fKIX4Kwi
 ydSo/YJeeKTN8K93fLw8bb8bdPItJScJFTfEa4Q2SpZezR/ecGXLowisy0BBaPHK
 qtOyB8EqjkLS17GfyecIe9Nd2SIAI2De/0bchK3kDtIX1YlZB/k/tD3eCPMHDnnl
 m8c5kAHKPQYd8g01I+S8nrtGHk/A33grfYpJXPZbcqyhE0lWU3SI8KDAGbcKzNHE
 23Do0yNyd4nHIdixWlhETcNvzHn35Q/O38JJwW9Mf1aI9gusYuml6GFefCgu/iov
 lxqp3CEW7iPZgQEgNbrQ0HzWn/durL2Trd6S/Yh6f2xbm1LGYKWh3LZUFLd3AQDd
 INEpUgKsyb//nF3dtiyGnZlp0QykoqFyLo2AEDrb+ILTd4up5DeRY/m1UpjAXR5p
 QicBmrDksHrSivPmMZwLx1DFQYKjQbdx5lOqy9hQM/Jmsr+N3/l7QBrbQWXks3JZ
 NNAyn4RZHQB7UDQS/MmVPArs+JK5qaEDQD57QuOTlqgP19VY9C9E/l/aEqefjdFo
 XOAm7CwGpB/iBAkIbE6ROEDiJArigRVHEfxLYeE/jtGOdRDCD1deWk+g3S8DWD7m
 ZxWSgIVB00PMAmomczdg59YVFBhocgwPUa8/cw6yqzx2QKP4mWXIFZ/Sjau5I3tn
 WWoxXlUirZfTJc29XnVy
 =3mNS
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI changes from Bjorn Helgaas:
 "Enumeration
   - Increment max correctly in pci_scan_bridge() (Andreas Noever)
   - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever)
   - Assign CardBus bus number only during the second pass (Andreas Noever)
   - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever)
   - Make sure bus number resources stay within their parents bounds (Andreas Noever)
   - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever)
   - Check for child busses which use more bus numbers than allocated (Andreas Noever)
   - Don't scan random busses in pci_scan_bridge() (Andreas Noever)
   - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas)
   - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas)
   - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas)
   - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas)
   - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas)

  NUMA
   - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas)
   - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas)
   - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas)
   - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas)
   - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas)
   - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas)
   - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas)
   - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas)

  Resource management
   - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas)
   - Add resource_contains() (Bjorn Helgaas)
   - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas)
   - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas)
   - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas)
   - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas)
   - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas)
   - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas)
   - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas)
   - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas)
   - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas)
   - s390: Use generic pci_enable_resources() (Bjorn Helgaas)
   - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas)
   - Set type in __request_region() (Bjorn Helgaas)
   - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas)
   - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas)
   - Log IDE resource quirk in dmesg (Bjorn Helgaas)
   - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas)

  PCI device hotplug
   - Make check_link_active() non-static (Rajat Jain)
   - Use link change notifications for hot-plug and removal (Rajat Jain)
   - Enable link state change notifications (Rajat Jain)
   - Don't disable the link permanently during removal (Rajat Jain)
   - Don't check adapter or latch status while disabling (Rajat Jain)
   - Disable link notification across slot reset (Rajat Jain)
   - Ensure very fast hotplug events are also processed (Rajat Jain)
   - Add hotplug_lock to serialize hotplug events (Rajat Jain)
   - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain)
   - Don't turn slot off when hot-added device already exists (Yijing Wang)

  MSI
   - Keep pci_enable_msi() documentation (Alexander Gordeev)
   - ahci: Fix broken single MSI fallback (Alexander Gordeev)
   - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev)
   - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman)
   - Fix leak of msi_attrs (Greg Kroah-Hartman)
   - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida)

  Virtualization
   - Device-specific ACS support (Alex Williamson)

  Freescale i.MX6
   - Wait for retraining (Marek Vasut)

  Marvell MVEBU
   - Use Device ID and revision from underlying endpoint (Andrew Lunn)
   - Fix incorrect size for PCI aperture resources (Jason Gunthorpe)
   - Call request_resource() on the apertures (Jason Gunthorpe)
   - Fix potential issue in range parsing (Jean-Jacques Hiblot)

  Renesas R-Car
   - Check platform_get_irq() return code (Ben Dooks)
   - Add error interrupt handling (Ben Dooks)
   - Fix bridge logic configuration accesses (Ben Dooks)
   - Register each instance independently (Magnus Damm)
   - Break out window size handling (Magnus Damm)
   - Make the Kconfig dependencies more generic (Magnus Damm)

  Synopsys DesignWare
   - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar)

  Miscellaneous
   - Remove unused SR-IOV VF Migration support (Bjorn Helgaas)
   - Enable INTx if BIOS left them disabled (Bjorn Helgaas)
   - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter)
   - Clean up par-arch object file list (Liviu Dudau)
   - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom)
   - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang)
   - Fix pci_bus_b() build failure (Paul Gortmaker)"

* tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (108 commits)
  Revert "[PATCH] Insert GART region into resource map"
  PCI: Log IDE resource quirk in dmesg
  PCI: Change pci_bus_alloc_resource() type_mask to unsigned long
  PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region()
  resources: Set type in __request_region()
  PCI: Don't check resource_size() in pci_bus_alloc_resource()
  s390/PCI: Use generic pci_enable_resources()
  tile PCI RC: Use default pcibios_enable_device()
  sparc/PCI: Use default pcibios_enable_device() (Leon only)
  sh/PCI: Use default pcibios_enable_device()
  microblaze/PCI: Use default pcibios_enable_device()
  alpha/PCI: Use default pcibios_enable_device()
  PCI: Add "weak" generic pcibios_enable_device() implementation
  PCI: Don't enable decoding if BAR hasn't been assigned an address
  PCI: Enable INTx in pci_reenable_device() only when MSI/MSI-X not enabled
  PCI: Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit
  PCI: Don't try to claim IORESOURCE_UNSET resources
  PCI: Check IORESOURCE_UNSET before updating BAR
  PCI: Don't clear IORESOURCE_UNSET when updating BAR
  PCI: Mark resources as IORESOURCE_UNSET if we can't assign them
  ...

Conflicts:
	arch/x86/include/asm/topology.h
	drivers/ata/ahci.c
2014-04-01 15:14:04 -07:00
Linus Torvalds
683b6c6f82 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq code updates from Thomas Gleixner:
 "The irq department proudly presents:

   - Another tree wide sweep of irq infrastructure abuse.  Clear winner
     of the trainwreck engineering contest was:
         #include "../../../kernel/irq/settings.h"

   - Tree wide update of irq_set_affinity() callbacks which miss a cpu
     online check when picking a single cpu out of the affinity mask.

   - Tree wide consolidation of interrupt statistics.

   - Updates to the threaded interrupt infrastructure to allow explicit
     wakeup of the interrupt thread and a variant of synchronize_irq()
     which synchronizes only the hard interrupt handler.  Both are
     needed to replace the homebrewn thread handling in the mmc/sdhci
     code.

   - New irq chip callbacks to allow proper support for GPIO based irqs.
     The GPIO based interrupts need to request/release GPIO resources
     from request/free_irq.

   - A few new ARM interrupt chips.  No revolutionary new hardware, just
     differently wreckaged variations of the scheme.

   - Small improvments, cleanups and updates all over the place"

I was hoping that that trainwreck engineering contest was a April Fools'
joke.  But no.

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits)
  irqchip: sun7i/sun6i: Disable NMI before registering the handler
  ARM: sun7i/sun6i: dts: Fix IRQ number for sun6i NMI controller
  ARM: sun7i/sun6i: irqchip: Update the documentation
  ARM: sun7i/sun6i: dts: Add NMI irqchip support
  ARM: sun7i/sun6i: irqchip: Add irqchip driver for NMI controller
  genirq: Export symbol no_action()
  arm: omap: Fix typo in ams-delta-fiq.c
  m68k: atari: Fix the last kernel_stat.h fallout
  irqchip: sun4i: Simplify sun4i_irq_ack
  irqchip: sun4i: Use handle_fasteoi_irq for all interrupts
  genirq: procfs: Make smp_affinity values go+r
  softirq: Add linux/irq.h to make it compile again
  m68k: amiga: Add linux/irq.h to make it compile again
  irqchip: sun4i: Don't ack IRQs > 0, fix acking of IRQ 0
  irqchip: sun4i: Fix a comment about mask register initialization
  irqchip: sun4i: Fix irq 0 not working
  genirq: Add a new IRQCHIP_EOI_THREADED flag
  genirq: Document IRQCHIP_ONESHOT_SAFE flag
  ARM: sunxi: dt: Convert to the new irq controller compatibles
  irqchip: sunxi: Change compatibles
  ...
2014-04-01 11:22:57 -07:00
Linus Torvalds
99f7b025bf Merge branch 'x86-threadinfo-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 threadinfo changes from Ingo Molnar:
 "The main change here is the consolidation/unification of 32 and 64 bit
  thread_info handling methods, from Steve Rostedt"

* 'x86-threadinfo-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, threadinfo: Redo "x86: Use inline assembler to get sp"
  x86: Clean up dumpstack_64.c code
  x86: Keep thread_info on thread stack in x86_32
  x86: Prepare removal of previous_esp from i386 thread_info structure
  x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386
  x86: Nuke the supervisor_stack field in i386 thread_info
2014-04-01 10:17:18 -07:00
Linus Torvalds
a21e40877a Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Ingo Molnar:
 "The main purpose is to fix a full dynticks bug related to
  virtualization, where steal time accounting appears to be zero in
  /proc/stat even after a few seconds of competing guests running busy
  loops in a same host CPU.  It's not a regression though as it was
  there since the beginning.

  The other commits are preparatory work to fix the bug and various
  cleanups"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arch: Remove stub cputime.h headers
  sched: Remove needless round trip nsecs <-> tick conversion of steal time
  cputime: Fix jiffies based cputime assumption on steal accounting
  cputime: Bring cputime -> nsecs conversion
  cputime: Default implementation of nsecs -> cputime conversion
  cputime: Fix nsecs_to_cputime() return type cast
2014-04-01 10:16:10 -07:00
Linus Torvalds
b9b16a7922 Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpufeature update from Ingo Molnar:
 "Two refinements to clflushopt support"

* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cpufeature: If we disable CLFLUSH, we should disable CLFLUSHOPT
  x86, cpufeature: Rename X86_FEATURE_CLFLSH to X86_FEATURE_CLFLUSH
2014-04-01 10:11:21 -07:00
Dimitri Sivanich
5f40f7d938 x86/UV: Set n_lshift based on GAM_GR_CONFIG MMR for UV3
The value of n_lshift for UV is currently set based on the
socket m_val.

For UV3, set the n_lshift value based on the GAM_GR_CONFIG MMR.
This will allow bios to control the n_lshift value independent
of the socket m_val. Then n_lshift can be assigned a fixed value
across a multi-partition system, allowing for a fixed common
global physical address format that is independent of socket
m_val.

Cleanup unneeded macros.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/20140331143700.GB29916@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-01 12:10:44 +02:00
Linus Torvalds
190f918660 Merge branch 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 compat wrapper rework from Heiko Carstens:
 "S390 compat system call wrapper simplification work.

  The intention of this work is to get rid of all hand written assembly
  compat system call wrappers on s390, which perform proper sign or zero
  extension, or pointer conversion of compat system call parameters.
  Instead all of this should be done with C code eg by using Al's
  COMPAT_SYSCALL_DEFINEx() macro.

  Therefore all common code and s390 specific compat system calls have
  been converted to the COMPAT_SYSCALL_DEFINEx() macro.

  In order to generate correct code all compat system calls may only
  have eg compat_ulong_t parameters, but no unsigned long parameters.
  Those patches which change parameter types from unsigned long to
  compat_ulong_t parameters are separate in this series, but shouldn't
  cause any harm.

  The only compat system calls which intentionally have 64 bit
  parameters (preadv64 and pwritev64) in support of the x86/32 ABI
  haven't been changed, but are now only available if an architecture
  defines __ARCH_WANT_COMPAT_SYS_PREADV64/PWRITEV64.

  System calls which do not have a compat variant but still need proper
  zero extension on s390, like eg "long sys_brk(unsigned long brk)" will
  get a proper wrapper function with the new s390 specific
  COMPAT_SYSCALL_WRAPx() macro:

     COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk);

  which generates the following code (simplified):

     asmlinkage long sys_brk(unsigned long brk);
     asmlinkage long compat_sys_brk(long brk)
     {
         return sys_brk((u32)brk);
     }

  Given that the C file which contains all the COMPAT_SYSCALL_WRAP lines
  includes both linux/syscall.h and linux/compat.h, it will generate
  build errors, if the declaration of sys_brk() doesn't match, or if
  there exists a non-matching compat_sys_brk() declaration.

  In addition this will intentionally result in a link error if
  somewhere else a compat_sys_brk() function exists, which probably
  should have been used instead.  Two more BUILD_BUG_ONs make sure the
  size and type of each compat syscall parameter can be handled
  correctly with the s390 specific macros.

  I converted the compat system calls step by step to verify the
  generated code is correct and matches the previous code.  In fact it
  did not always match, however that was always a bug in the hand
  written asm code.

  In result we get less code, less bugs, and much more sanity checking"

* 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits)
  s390/compat: add copyright statement
  compat: include linux/unistd.h within linux/compat.h
  s390/compat: get rid of compat wrapper assembly code
  s390/compat: build error for large compat syscall args
  mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  ipc/compat: convert to COMPAT_SYSCALL_DEFINE
  fs/compat: convert to COMPAT_SYSCALL_DEFINE
  security/compat: convert to COMPAT_SYSCALL_DEFINE
  mm/compat: convert to COMPAT_SYSCALL_DEFINE
  net/compat: convert to COMPAT_SYSCALL_DEFINE
  kernel/compat: convert to COMPAT_SYSCALL_DEFINE
  fs/compat: optional preadv64/pwrite64 compat system calls
  ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_t
  s390/compat: partial parameter conversion within syscall wrappers
  s390/compat: automatic zero, sign and pointer conversion of syscalls
  s390/compat: add sync_file_range and fallocate compat syscalls
  ...
2014-03-31 14:32:17 -07:00
Linus Torvalds
7cc3afdf43 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI changes from Ingo Molnar:
 "The main changes:

  - Add debug code to the dump EFI pagetable - Borislav Petkov

  - Make 1:1 runtime mapping robust when booting on machines with lots
    of memory - Borislav Petkov

  - Move the EFI facilities bits out of 'x86_efi_facility' and into
    efi.flags which is the standard architecture independent place to
    keep EFI state, by Matt Fleming.

  - Add 'EFI mixed mode' support: this allows 64-bit kernels to be
    booted from 32-bit firmware.  This needs a bootloader that supports
    the 'EFI handover protocol'.  By Matt Fleming"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  x86, efi: Abstract x86 efi_early calls
  x86/efi: Restore 'attr' argument to query_variable_info()
  x86/efi: Rip out phys_efi_get_time()
  x86/efi: Preserve segment registers in mixed mode
  x86/boot: Fix non-EFI build
  x86, tools: Fix up compiler warnings
  x86/efi: Re-disable interrupts after calling firmware services
  x86/boot: Don't overwrite cr4 when enabling PAE
  x86/efi: Wire up CONFIG_EFI_MIXED
  x86/efi: Add mixed runtime services support
  x86/efi: Firmware agnostic handover entry points
  x86/efi: Split the boot stub into 32/64 code paths
  x86/efi: Add early thunk code to go from 64-bit to 32-bit
  x86/efi: Build our own EFI services pointer table
  efi: Add separate 32-bit/64-bit definitions
  x86/efi: Delete dead code when checking for non-native
  x86/mm/pageattr: Always dump the right page table in an oops
  x86, tools: Consolidate #ifdef code
  x86/boot: Cleanup header.S by removing some #ifdefs
  efi: Use NULL instead of 0 for pointer
  ...
2014-03-31 12:26:05 -07:00
Linus Torvalds
918d80a136 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu handling changes from Ingo Molnar:
 "Bigger changes:

   - Intel CPU hardware-enablement: new vector instructions support
     (AVX-512), by Fenghua Yu.

   - Support the clflushopt instruction and use it in appropriate
     places.  clflushopt is similar to clflush but with more relaxed
     ordering, by Ross Zwisler.

   - MSR accessor cleanups, by Borislav Petkov.

   - 'forcepae' boot flag for those who have way too much time to spend
     on way too old Pentium-M systems and want to live way too
     dangerously, by Chris Bainbridge"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M
  Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC
  x86, intel: Make MSR_IA32_MISC_ENABLE bit constants systematic
  x86, Intel: Convert to the new bit access MSR accessors
  x86, AMD: Convert to the new bit access MSR accessors
  x86: Add another set of MSR accessor functions
  x86: Use clflushopt in drm_clflush_virt_range
  x86: Use clflushopt in drm_clflush_page
  x86: Use clflushopt in clflush_cache_range
  x86: Add support for the clflushopt instruction
  x86, AVX-512: Enable AVX-512 States Context Switch
  x86, AVX-512: AVX-512 Feature Detection
2014-03-31 12:00:45 -07:00
Linus Torvalds
6ed7705167 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic changes from Ingo Molnar:
 "An xAPIC CPU hotplug race fix, plus cleanups and minor fixes"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Plug racy xAPIC access of CPU hotplug code
  x86/apic: Always define nox2apic and define it as initdata
  x86/apic: Remove unused function prototypes
  x86/apic: Switch wait_for_init_deassert() to a bool flag
  x86/apic: Only use default_wait_for_init_deassert()
2014-03-31 11:58:45 -07:00
Linus Torvalds
971eae7c99 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "Bigger changes:

   - sched/idle restructuring: they are WIP preparation for deeper
     integration between the scheduler and idle state selection, by
     Nicolas Pitre.

   - add NUMA scheduling pseudo-interleaving, by Rik van Riel.

   - optimize cgroup context switches, by Peter Zijlstra.

   - RT scheduling enhancements, by Thomas Gleixner.

  The rest is smaller changes, non-urgnt fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits)
  sched: Clean up the task_hot() function
  sched: Remove double calculation in fix_small_imbalance()
  sched: Fix broken setscheduler()
  sparc64, sched: Remove unused sparc64_multi_core
  sched: Remove unused mc_capable() and smt_capable()
  sched/numa: Move task_numa_free() to __put_task_struct()
  sched/fair: Fix endless loop in idle_balance()
  sched/core: Fix endless loop in pick_next_task()
  sched/fair: Push down check for high priority class task into idle_balance()
  sched/rt: Fix picking RT and DL tasks from empty queue
  trace: Replace hardcoding of 19 with MAX_NICE
  sched: Guarantee task priority in pick_next_task()
  sched/idle: Remove stale old file
  sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED
  cpuidle/arm64: Remove redundant cpuidle_idle_call()
  cpuidle/powernv: Remove redundant cpuidle_idle_call()
  sched, nohz: Exclude isolated cores from load balancing
  sched: Fix select_task_rq_fair() description comments
  workqueue: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
  sys: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
  ...
2014-03-31 11:21:19 -07:00
Linus Torvalds
8c292f1174 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Main changes:

  Kernel side changes:

   - Add SNB/IVB/HSW client uncore memory controller support (Stephane
     Eranian)

   - Fix various x86/P4 PMU driver bugs (Don Zickus)

  Tooling, user visible changes:

   - Add several futex 'perf bench' microbenchmarks (Davidlohr Bueso)

   - Speed up thread map generation (Don Zickus)

   - Introduce 'perf kvm --list-cmds' command line option for use by
     scripts (Ramkumar Ramachandra)

   - Print the evsel name in the annotate stdio output, prep to fix
     support outputting annotation for multiple events, not just for the
     first one (Arnaldo Carvalho de Melo)

   - Allow setting preferred callchain method in .perfconfig (Jiri Olsa)

   - Show in what binaries/modules 'perf probe's are set (Masami
     Hiramatsu)

   - Support distro-style debuginfo for uprobe in 'perf probe' (Masami
     Hiramatsu)

  Tooling, internal changes and fixes:

   - Use tid in mmap/mmap2 events to find maps (Don Zickus)

   - Record the reason for filtering an address_location (Namhyung Kim)

   - Apply all filters to an addr_location (Namhyung Kim)

   - Merge al->filtered with hist_entry->filtered in report/hists
     (Namhyung Kim)

   - Fix memory leak when synthesizing thread records (Namhyung Kim)

   - Use ui__has_annotation() in 'report' (Namhyung Kim)

   - hists browser refactorings to reuse code accross UIs (Namhyung Kim)

   - Add support for the new DWARF unwinder library in elfutils (Jiri
     Olsa)

   - Fix build race in the generation of bison files (Jiri Olsa)

   - Further streamline the feature detection display, trimming it a bit
     to show just the libraries detected, using VF=1 gets a more verbose
     output, showing the less interesting feature checks as well (Jiri
     Olsa).

   - Check compatible symtab type before loading dso (Namhyung Kim)

   - Check return value of filename__read_debuglink() (Stephane Eranian)

   - Move some hashing and fs related code from tools/perf/util/ to
     tools/lib/ so that it can be used by more tools/ living utilities
     (Borislav Petkov)

   - Prepare DWARF unwinding code for using an elfutils alternative
     unwinding library (Jiri Olsa)

   - Fix DWARF unwind max_stack processing (Jiri Olsa)

   - Add dwarf unwind 'perf test' entry (Jiri Olsa)

   - 'perf probe' improvements including memory leak fixes, sharing the
     intlist class with other tools, uprobes/kprobes code sharing and
     use of ref_reloc_sym (Masami Hiramatsu)

   - Shorten sample symbol resolving by adding cpumode to struct
     addr_location (Arnaldo Carvalho de Melo)

   - Fix synthesizing mmaps for threads (Don Zickus)

   - Fix invalid output on event group stdio report (Namhyung Kim)

   - Fixup header alignment in 'perf sched latency' output (Ramkumar
     Ramachandra)

   - Fix off-by-one error in 'perf timechart record' argv handling
     (Ramkumar Ramachandra)

  Tooling, cleanups:

   - Remove unused thread__find_map function (Jiri Olsa)

   - Remove unused simple_strtoul() function (Ramkumar Ramachandra)

  Tooling, documentation updates:

   - Update function names in debug messages (Ramkumar Ramachandra)

   - Update some code references in design.txt (Ramkumar Ramachandra)

   - Clarify load-latency information in the 'perf mem' docs (Andi
     Kleen)

   - Clarify x86 register naming in 'perf probe' docs (Andi Kleen)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (96 commits)
  perf tools: Remove unused simple_strtoul() function
  perf tools: Update some code references in design.txt
  perf evsel: Update function names in debug messages
  perf tools: Remove thread__find_map function
  perf annotate: Print the evsel name in the stdio output
  perf report: Use ui__has_annotation()
  perf tools: Fix memory leak when synthesizing thread records
  perf tools: Use tid in mmap/mmap2 events to find maps
  perf report: Merge al->filtered with hist_entry->filtered
  perf symbols: Apply all filters to an addr_location
  perf symbols: Record the reason for filtering an address_location
  perf sched: Fixup header alignment in 'latency' output
  perf timechart: Fix off-by-one error in 'record' argv handling
  perf machine: Factor machine__find_thread to take tid argument
  perf tools: Speed up thread map generation
  perf kvm: introduce --list-cmds for use by scripts
  perf ui hists: Pass evsel to hpp->header/width functions explicitly
  perf symbols: Introduce thread__find_cpumode_addr_location
  perf session: Change header.misc dump from decimal to hex
  perf ui/tui: Reuse generic __hpp__fmt() code
  ...
2014-03-31 11:13:25 -07:00
Linus Torvalds
462bf234a8 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking updates from Ingo Molnar:
 "The biggest change is the MCS spinlock generalization changes from Tim
  Chen, Peter Zijlstra, Jason Low et al.  There's also lockdep
  fixes/enhancements from Oleg Nesterov, in particular a false negative
  fix related to lockdep_set_novalidate_class() usage"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  locking/mutex: Fix debug checks
  locking/mutexes: Add extra reschedule point
  locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
  locking/mutexes: Unlock the mutex without the wait_lock
  locking/mutexes: Modify the way optimistic spinners are queued
  locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
  locking: Move mcs_spinlock.h into kernel/locking/
  m68k: Skip futex_atomic_cmpxchg_inatomic() test
  futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
  Revert "sched/wait: Suppress Sparse 'variable shadowing' warning"
  lockdep: Change lockdep_set_novalidate_class() to use _and_name
  lockdep: Change mark_held_locks() to check hlock->check instead of lockdep_no_validate
  lockdep: Don't create the wrong dependency on hlock->check == 0
  lockdep: Make held_lock->check and "int check" argument bool
  locking/mcs: Allow architecture specific asm files to be used for contended case
  locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order
  sched/wait: Suppress Sparse 'variable shadowing' warning
  hung_task/Documentation: Fix hung_task_warnings description
  locking/mcs: Allow architectures to hook in to contended paths
  locking/mcs: Micro-optimize the MCS code, add extra comments
  ...
2014-03-31 10:59:39 -07:00
Artem Fetishev
825600c0f2 x86: fix boot on uniprocessor systems
On x86 uniprocessor systems topology_physical_package_id() returns -1
which causes rapl_cpu_prepare() to leave rapl_pmu variable uninitialized
which leads to GPF in rapl_pmu_init().

See arch/x86/kernel/cpu/perf_event_intel_rapl.c.

It turns out that physical_package_id and core_id can actually be
retreived for uniprocessor systems too.  Enabling them also fixes
rapl_pmu code.

Signed-off-by: Artem Fetishev <artem_fetishev@epam.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-28 13:56:58 -07:00
David Vrabel
5926f87fda Revert "xen: properly account for _PAGE_NUMA during xen pte translations"
This reverts commit a9c8e4beee.

PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT
is set and pseudo-physical addresses is _PAGE_PRESENT is clear.

This is because during a domain save/restore (migration) the page
table entries are "canonicalised" and uncanonicalised". i.e., MFNs are
converted to PFNs during domain save so that on a restore the page
table entries may be rewritten with the new MFNs on the destination.
This canonicalisation is only done for PTEs that are present.

This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or
_PAGE_NUMA) was set but _PAGE_PRESENT was clear.  These PTEs would be
migrated as-is which would result in unexpected behaviour in the
destination domain.  Either a) the MFN would be translated to the
wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE
because the MFN is no longer owned by the domain; or c) the present
bit would not get set.

Symptoms include "Bad page" reports when munmapping after migrating a
domain.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org>        [3.12+]
2014-03-25 11:11:42 +00:00
Andy Lutomirski
3c1b63b9e4 x86, vdso: Finish removing VDSO32_PRELINK
It's a declaration of a nonexistent symbol.  We can get rid of the
64-bit versions, too, but that's more intrusive.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/2ce2ce18447d8a0b78d44a278a066b6c0af06b32.1395366931.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-03-20 20:20:18 -07:00
Andy Lutomirski
9e6f450f94 x86, vdso: Move more vdso definitions into vdso.h
This fixes the Xen build and gets rid of a silly header file.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/1df77311795aff75f5742c787d277518314a38d3.1395366931.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-03-20 20:20:08 -07:00
Andy Lutomirski
b67e612cef x86: Load the 32-bit vdso in place, just like the 64-bit vdsos
This replaces a decent amount of incomprehensible and buggy code
with much more straightforward code.  It also brings the 32-bit vdso
more in line with the 64-bit vdsos, so maybe someday they can share
even more code.

This wastes a small amount of kernel .data and .text space, but it
avoids a couple of allocations on startup, so it should be more or
less a wash memory-wise.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/b8093933fad09ce181edb08a61dcd5d2592e9814.1395352498.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-20 15:19:14 -07:00
Eric Paris
579ec9e1ab audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
The syscall.h headers were including linux/audit.h but really only
needed the uapi/linux/audit.h to get the requisite defines.  Switch to
the uapi headers.

Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux-s390@vger.kernel.org
Cc: x86@kernel.org
2014-03-20 10:11:59 -04:00
Eric Paris
5e937a9ae9 syscall_get_arch: remove useless function arguments
Every caller of syscall_get_arch() uses current for the task and no
implementors of the function need args.  So just get rid of both of
those things.  Admittedly, since these are inline functions we aren't
wasting stack space, but it just makes the prototypes better.

Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux390@de.ibm.com
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
2014-03-20 10:11:59 -04:00
H. Peter Anvin
7b878d4b48 random: Add arch_has_random[_seed]()
Add predicate functions for having arch_get_random[_seed]*().  The
only current use is to avoid the loop in arch_random_refill() when
arch_get_random_seed_long() is unavailable.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-03-19 22:24:08 -04:00
H. Peter Anvin
d20f78d252 x86, random: Enable the RDSEED instruction
Upcoming Intel silicon adds a new RDSEED instruction, which is similar
to RDRAND but provides a stronger guarantee: unlike RDRAND, RDSEED
will always reseed the PRNG from the true random number source between
each read.  Thus, the output of RDSEED is guaranteed to be 100%
entropic, unlike RDRAND which is only architecturally guaranteed to be
1/512 entropic (although in practice is much more.)

The RDSEED instruction takes the same time to execute as RDRAND, but
RDSEED unlike RDRAND can legitimately return failure (CF=0) due to
entropy exhaustion if too many threads on too many cores are hammering
the RDSEED instruction at the same time.  Therefore, we have to be
more conservative and only use it in places where we can tolerate
failures.

This patch introduces the primitives arch_get_random_seed_{int,long}()
but does not use it yet.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-03-19 22:22:06 -04:00
Stefani Seibold
7c03156f34 x86, vdso: Add 32 bit VDSO time support for 64 bit kernel
This patch add the VDSO time support for the IA32 Emulation Layer.

Due the nature of the kernel headers and the LP64 compiler where the
size of a long and a pointer differs against a 32 bit compiler, there
is some type hacking necessary for optimal performance.

The vsyscall_gtod_data struture must be a rearranged to serve 32- and
64-bit code access at the same time:

- The seqcount_t was replaced by an unsigned, this makes the
  vsyscall_gtod_data intedepend of kernel configuration and internal functions.
- All kernel internal structures are replaced by fix size elements
  which works for 32- and 64-bit access
- The inner struct clock was removed to pack the whole struct.

The "unsigned seq" would be handled by functions derivated from seqcount_t.

Signed-off-by: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/1395094933-14252-11-git-send-email-stefani@seibold.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-18 12:52:41 -07:00
Stefani Seibold
7a59ed415f x86, vdso: Add 32 bit VDSO time support for 32 bit kernel
This patch add the time support for 32 bit a VDSO to a 32 bit kernel.

For 32 bit programs running on a 32 bit kernel, the same mechanism is
used as for 64 bit programs running on a 64 bit kernel.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/1395094933-14252-10-git-send-email-stefani@seibold.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-18 12:52:37 -07:00
Andy Lutomirski
b4b541a610 x86, vdso: Patch alternatives in the 32-bit VDSO
We need the alternatives mechanism for rdtsc_barrier() to work.

Signed-off-by: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/1395094933-14252-9-git-send-email-stefani@seibold.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-18 12:52:33 -07:00
Stefani Seibold
ef721987ae x86, vdso: Introduce VVAR marco for vdso32
This patch revamps the vvar.h for introduce the VVAR macro for vdso32.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/1395094933-14252-8-git-send-email-stefani@seibold.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-18 12:52:29 -07:00
Stefani Seibold
d2312e3379 x86, vdso: Make vsyscall_gtod_data handling x86 generic
This patch move the vsyscall_gtod_data handling out of vsyscall_64.c
into an additonal file vsyscall_gtod.c to make the functionality
available for x86 32 bit kernel.

It also adds a new vsyscall_32.c which setup the VVAR page.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/1395094933-14252-2-git-send-email-stefani@seibold.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-18 12:51:52 -07:00
Zoltan Kiss
1429d46df4 xen/grant-table: Refactor gnttab_[un]map_refs to avoid m2p_override
The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
for blkback and future netback patches it just cause a lock contention, as
those pages never go to userspace. Therefore this series does the following:
- the bulk of the original function (everything after the mapping hypercall)
  is moved to arch-dependent set/clear_foreign_p2m_mapping
- the "if (xen_feature(XENFEAT_auto_translated_physmap))" branch goes to ARM
- therefore the ARM function could be much smaller, the m2p_override stubs
  could be also removed
- on x86 the set_phys_to_machine calls were moved up to this new funcion
  from m2p_override functions
- and m2p_override functions are only called when there is a kmap_ops param

It also removes a stray space from arch/x86/include/asm/xen/page.h.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Suggested-by: Anthony Liguori <aliguori@amazon.com>
Suggested-by: David Vrabel <david.vrabel@citrix.com>
Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-03-18 14:40:19 +00:00
Andy Lutomirski
7dda038756 x86_32, mm: Remove user bit from identity map PDE
The only reason that the user bit was set was to support userspace
access to the compat vDSO in the fixmap.  The compat vDSO is gone,
so the user bit can be removed.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/e240a977f3c7cbd525a091fd6521499ec4b8e94f.1394751608.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13 16:20:17 -07:00
Andy Lutomirski
b0b49f2673 x86, vdso: Remove compat vdso support
The compat vDSO is a complicated hack that's needed to maintain
compatibility with a small range of glibc versions.

This removes it and replaces it with a much simpler hack: a config
option to disable the 32-bit vDSO by default.

This also changes the default value of CONFIG_COMPAT_VDSO to n --
users configuring kernels from scratch almost certainly want that
choice.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/4bb4690899106eb11430b1186d5cc66ca9d1660c.1394751608.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13 16:20:09 -07:00
H. Peter Anvin
0b131be8d4 x86, intel: Make MSR_IA32_MISC_ENABLE bit constants systematic
Replace somewhat arbitrary constants for bits in MSR_IA32_MISC_ENABLE
with verbose but systematic ones.  Add _BIT defines for all the rest
of them, too.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13 15:55:46 -07:00
Borislav Petkov
c0a639ad0b x86, Intel: Convert to the new bit access MSR accessors
... and save some lines of code.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1394384725-10796-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13 15:35:09 -07:00
Borislav Petkov
22085a66c2 x86: Add another set of MSR accessor functions
We very often need to set or clear a bit in an MSR as a result of doing
some sort of a hardware configuration. Add generic versions of that
repeated functionality in order to save us a bunch of duplicated code in
the early CPU vendor detection/config code.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1394384725-10796-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13 15:34:45 -07:00
Frederic Weisbecker
073d8224d2 arch: Remove stub cputime.h headers
Many architectures have a stub cputime.h that only include the default
cputime.h

Lets remove the useless headers, we only need to mention that we want
the default headers on the Kbuild files.

Cc: Archs <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-03-13 16:09:30 +01:00
Thomas Gleixner
ffb12cf002 Merge branch 'irq/for-gpio' into irq/core
Merge the request/release callbacks which are in a separate branch for
consumption by the gpio folks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-12 16:01:07 +01:00
Dave Jones
09df7c4c80 x86: Remove CONFIG_X86_OOSTORE
This was an optimization that made memcpy type benchmarks a little
faster on ancient (Circa 1998) IDT Winchip CPUs.  In real-life
workloads, it wasn't even noticable, and I doubt anyone is running
benchmarks on 16 year old silicon any more.

Given this code has likely seen very little use over the last decade,
let's just remove it.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-11 10:16:18 -07:00
Bjorn Helgaas
36fc5500bb sched: Remove unused mc_capable() and smt_capable()
Remove mc_capable() and smt_capable().  Neither is used.

Both were added by 5c45bf279d ("sched: mc/smt power savings sched
policy").  Uses of both were removed by 8e7fbcbc22 ("sched: Remove stale
power aware scheduling remnants and dysfunctional knobs").

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: http://lkml.kernel.org/r/20140304210737.16893.54289.stgit@bhelgaas-glaptop.roam.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:05:45 +01:00
Ingo Molnar
0066f3b93e Merge branch 'perf/urgent' into perf/core
Merge the latest fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:53:50 +01:00
Paolo Bonzini
c77fb5fe6f KVM: x86: Allow the guest to run with dirty debug registers
When not running in guest-debug mode, the guest controls the debug
registers and having to take an exit for each DR access is a waste
of time.  If the guest gets into a state where each context switch
causes DR to be saved and restored, this can take away as much as 40%
of the execution time from the guest.

After this patch, VMX- and SVM-specific code can set a flag in
switch_db_regs, telling vcpu_enter_guest that on the next exit the debug
registers might be dirty and need to be reloaded (syncing will be taken
care of by a new callback in kvm_x86_ops).  This flag can be set on the
first access to a debug registers, so that multiple accesses to the
debug registers only cause one vmexit.

Note that since the guest will be able to read debug registers and
enable breakpoints in DR7, we need to ensure that they are synchronized
on entry to the guest---including DR6 that was not synced before.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:02 +01:00
Paolo Bonzini
360b948d88 KVM: x86: change vcpu->arch.switch_db_regs to a bit mask
The next patch will add another bit that we can test with the
same "if".

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:02 +01:00
Jan Kiszka
c9a7953f09 KVM: x86: Remove return code from enable_irq/nmi_window
It's no longer possible to enter enable_irq_window in guest mode when
L1 intercepts external interrupts and we are entering L2. This is now
caught in vcpu_enter_guest. So we can remove the check from the VMX
version of enable_irq_window, thus the need to return an error code from
both enable_irq_window and enable_nmi_window.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 08:41:47 +01:00
Jan Kiszka
b6b8a1451f KVM: nVMX: Rework interception of IRQs and NMIs
Move the check for leaving L2 on pending and intercepted IRQs or NMIs
from the *_allowed handler into a dedicated callback. Invoke this
callback at the relevant points before KVM checks if IRQs/NMIs can be
injected. The callback has the task to switch from L2 to L1 if needed
and inject the proper vmexit events.

The rework fixes L2 wakeups from HLT and provides the foundation for
preemption timer emulation.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 08:41:45 +01:00
Steven Rostedt
198d208df4 x86: Keep thread_info on thread stack in x86_32
x86_64 uses a per_cpu variable kernel_stack to always point to
the thread stack of current. This is where the thread_info is stored
and is accessed from this location even when the irq or exception stack
is in use. This removes the complexity of having to maintain the
thread info on the stack when interrupts are running and having to
copy the preempt_count and other fields to the interrupt stack.

x86_32 uses the old method of copying the thread_info from the thread
stack to the exception stack just before executing the exception.

Having the two different requires #ifdefs and also the x86_32 way
is a bit of a pain to maintain. By converting x86_32 to the same
method of x86_64, we can remove #ifdefs, clean up the x86_32 code
a little, and remove the overhead of the copy.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:55 -08:00
Steven Rostedt
0788aa6a23 x86: Prepare removal of previous_esp from i386 thread_info structure
The i386 thread_info contains a previous_esp field that is used
to daisy chain the different stacks for dump_stack()
(ie. irq, softirq, thread stacks).

The goal is to eventual make i386 handling of thread_info the same
as x86_64, which means that the thread_info will not be in the stack
but as a per_cpu variable. We will no longer depend on thread_info
being able to daisy chain different stacks as it will only exist
in one location (the thread stack).

By moving previous_esp to the end of thread_info and referencing
it as an offset instead of using a thread_info field, this becomes
a stepping stone to moving the thread_info.

The offset to get to the previous stack is rather ugly in this
patch, but this is only temporary and the prev_esp will be changed
in the next commit. This commit is more for sanity checks of the
change.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012353.891757693@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.608754481@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:54 -08:00
Steven Rostedt (Red Hat)
b807902a88 x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386
According to a git log -p, GET_THREAD_INFO_WITH_ESP() has only been defined
and never been used. Get rid of it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140206144321.409045251@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:54 -08:00
Steven Rostedt
2432e1364b x86: Nuke the supervisor_stack field in i386 thread_info
Nothing references the supervisor_stack in the thread_info field,
and it does not exist in x86_64. To make the two more the same,
it is being removed.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012353.546183789@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.203619611@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:54 -08:00
Heiko Carstens
378a10f3ae fs/compat: optional preadv64/pwrite64 compat system calls
The preadv64/pwrite64 have been implemented for the x32 ABI, in order
to allow passing 64 bit arguments from user space without splitting
them into two 32 bit parameters, like it would be necessary for usual
compat tasks.
Howevert these two system calls are only being used for the x32 ABI,
so add __ARCH_WANT_COMPAT defines for these two compat syscalls and
make these two only visible for x86.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-06 15:35:09 +01:00
Thomas Gleixner
7ff42473eb x86: hardirq: Make irq_hv_callback_count available for CONFIG_HYPERV=m as well
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-06 12:08:37 +01:00
Matt Fleming
994448f1af Merge remote-tracking branch 'tip/x86/efi-mixed' into efi-for-mingo
Conflicts:
	arch/x86/kernel/setup.c
	arch/x86/platform/efi/efi.c
	arch/x86/platform/efi/efi_64.c
2014-03-05 18:15:37 +00:00
Matt Fleming
4fd69331ad Merge remote-tracking branch 'tip/x86/urgent' into efi-for-mingo
Conflicts:
	arch/x86/include/asm/efi.h
2014-03-05 17:31:41 +00:00
Thomas Gleixner
76d388cd72 x86: hyperv: Fixup the (brain) damage caused by the irq cleanup
Compiling last minute changes without setting the proper config
options is not really clever.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-05 13:42:14 +01:00
H. Peter Anvin
3c0b566334 * Disable the new EFI 1:1 virtual mapping for SGI UV because using it
causes a crash during boot - Borislav Petkov
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTFmV4AAoJEC84WcCNIz1VeSgP/1gykrBiH3Vr4H4c32la/arZ
 ktXRAT+RHdiebEXopt6+A1Pv6iyRYz3fBB1cKKb7q8fhDmVVmefGVkyO4qg4NbgL
 Ic1dP6uPgiFidcYML9/c6+UIovbwDD7f5wwJEjXxoZKg7b7P0TIykd8z8YPQ/6A6
 fIY5Z2L9A8eVt4k1m6Dg2PUJ2B+XKOLa5BbL7gXB6u9avgAVAoLM9oQStss+V9Pv
 JQu+BZxeEwuxgi0LOxk1sFWXaAoRtgVDNH6nPK93CRvO2H83voWFf+OcW9hQWsF5
 tLam6aMkvjuM5Dv1IA69PBAWrE/jxcMvjEdJyrGMWRKWtH/iTN4ez4EoFiO38o4R
 IgzXh9L5xbP3o+g3rntIu4h3/5yde9TRM18mER0lLTdNFZ8QzmLt4L0TptntRoBv
 bTXffILACq0uQU6T10P+EwseT472HphMeswaWVDkRxkN2hTCipFqQX0ekb0qbw3O
 yQeRyz1/t7DeA77iAGG96SfeziMdr44d6u6zDQPrTAJV+H1ZZ3XNIlJDi01CAg3b
 PDT6nHb/9V0tPZQebntZnczVRP+5EBdn5RbJaAMldEwVgjdjlFNY5slsF01KeCSF
 6Cx6UyAI9XKuZMoayC3QDHKKWi+BTOwbKcVAdtZ1c9AMLt9pyxsDfdoOAV4zBiQa
 PsVs9t/l7TZoL1MQHlEJ
 =YIab
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * Disable the new EFI 1:1 virtual mapping for SGI UV because using it
   causes a crash during boot - Borislav Petkov

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-04 15:50:06 -08:00
Borislav Petkov
a5d90c923b x86/efi: Quirk out SGI UV
Alex reported hitting the following BUG after the EFI 1:1 virtual
mapping work was merged,

 kernel BUG at arch/x86/mm/init_64.c:351!
 invalid opcode: 0000 [#1] SMP
 Call Trace:
  [<ffffffff818aa71d>] init_extra_mapping_uc+0x13/0x15
  [<ffffffff818a5e20>] uv_system_init+0x22b/0x124b
  [<ffffffff8108b886>] ? clockevents_register_device+0x138/0x13d
  [<ffffffff81028dbb>] ? setup_APIC_timer+0xc5/0xc7
  [<ffffffff8108b620>] ? clockevent_delta2ns+0xb/0xd
  [<ffffffff818a3a92>] ? setup_boot_APIC_clock+0x4a8/0x4b7
  [<ffffffff8153d955>] ? printk+0x72/0x74
  [<ffffffff818a1757>] native_smp_prepare_cpus+0x389/0x3d6
  [<ffffffff818957bc>] kernel_init_freeable+0xb7/0x1fb
  [<ffffffff81535530>] ? rest_init+0x74/0x74
  [<ffffffff81535539>] kernel_init+0x9/0xff
  [<ffffffff81541dfc>] ret_from_fork+0x7c/0xb0
  [<ffffffff81535530>] ? rest_init+0x74/0x74

Getting this thing to work with the new mapping scheme would need more
work, so automatically switch to the old memmap layout for SGI UV.

Acked-by: Russ Anderson <rja@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 23:43:33 +00:00
Matt Fleming
7d453eee36 x86/efi: Wire up CONFIG_EFI_MIXED
Add the Kconfig option and bump the kernel header version so that boot
loaders can check whether the handover code is available if they want.

The xloadflags field in the bzImage header is also updated to reflect
that the kernel supports both entry points by setting both of
XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
guaranteed to be addressable with 32-bits.

Note that no boot loaders should be using the bits set in xloadflags to
decide which entry point to jump to. The entire scheme is based on the
concept that 32-bit bootloaders always jump to ->handover_offset and
64-bit loaders always jump to ->handover_offset + 512. We set both bits
merely to inform the boot loader that it's safe to use the native
handover offset even if the machine type in the PE/COFF header claims
otherwise.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:43:57 +00:00
Matt Fleming
4f9dbcfc40 x86/efi: Add mixed runtime services support
Setup the runtime services based on whether we're booting in EFI native
mode or not. For non-native mode we need to thunk from 64-bit into
32-bit mode before invoking the EFI runtime services.

Using the runtime services after SetVirtualAddressMap() is slightly more
complicated because we need to ensure that all the addresses we pass to
the firmware are below the 4GB boundary so that they can be addressed
with 32-bit pointers, see efi_setup_page_tables().

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:43:14 +00:00
Matt Fleming
b8ff87a615 x86/efi: Firmware agnostic handover entry points
The EFI handover code only works if the "bitness" of the firmware and
the kernel match, i.e. 64-bit firmware and 64-bit kernel - it is not
possible to mix the two. This goes against the tradition that a 32-bit
kernel can be loaded on a 64-bit BIOS platform without having to do
anything special in the boot loader. Linux distributions, for one thing,
regularly run only 32-bit kernels on their live media.

Despite having only one 'handover_offset' field in the kernel header,
EFI boot loaders use two separate entry points to enter the kernel based
on the architecture the boot loader was compiled for,

    (1) 32-bit loader: handover_offset
    (2) 64-bit loader: handover_offset + 512

Since we already have two entry points, we can leverage them to infer
the bitness of the firmware we're running on, without requiring any boot
loader modifications, by making (1) and (2) valid entry points for both
CONFIG_X86_32 and CONFIG_X86_64 kernels.

To be clear, a 32-bit boot loader will always use (1) and a 64-bit boot
loader will always use (2). It's just that, if a single kernel image
supports (1) and (2) that image can be used with both 32-bit and 64-bit
boot loaders, and hence both 32-bit and 64-bit EFI.

(1) and (2) must be 512 bytes apart at all times, but that is already
part of the boot ABI and we could never change that delta without
breaking existing boot loaders anyhow.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:25:06 +00:00
Matt Fleming
426e34cc4f x86/mm/pageattr: Always dump the right page table in an oops
Now that we have EFI-specific page tables we need to lookup the pgd when
dumping those page tables, rather than assuming that swapper_pgdir is
the current pgdir.

Remove the double underscore prefix, which is usually reserved for
static functions.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:23:36 +00:00
Michael Opdenacker
d20d2efbf2 x86: Remove deprecated IRQF_DISABLED
This patch removes the IRQF_DISABLED flag from x86 architecture
code. It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Cc: venki@google.com
Link: http://lkml.kernel.org/r/1393965305-17248-1-git-send-email-michael.opdenacker@free-electrons.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-04 21:47:51 +01:00
Thomas Gleixner
1aec169673 x86: Hyperv: Cleanup the irq mess
The vmbus/hyperv interrupt handling is another complete trainwreck and
probably the worst of all currently in tree.

If CONFIG_HYPERV=y then the interrupt delivery to the vmbus happens
via the direct HYPERVISOR_CALLBACK_VECTOR. So far so good, but:

  The driver requests first a normal device interrupt. The only reason
  to do so is to increment the interrupt stats of that device
  interrupt. For no reason it also installs a private flow handler.

  We have proper accounting mechanisms for direct vectors, but of
  course it's too much effort to add that 5 lines of code.

  Aside of that the alloc_intr_gate() is not protected against
  reallocation which makes module reload impossible.

Solution to the problem is simple to rip out the whole mess and
implement it correctly.

First of all move all that code to arch/x86/kernel/cpu/mshyperv.c and
merily install the HYPERVISOR_CALLBACK_VECTOR with proper reallocation
protection and use the proper direct vector accounting mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linuxdrivers <devel@linuxdriverproject.org>
Cc: x86 <x86@kernel.org>
Link: http://lkml.kernel.org/r/20140223212739.028307673@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-04 17:37:54 +01:00
Thomas Gleixner
929320e4b4 x86: Add proper vector accounting for HYPERVISOR_CALLBACK_VECTOR
HyperV abuses a device interrupt to account for the
HYPERVISOR_CALLBACK_VECTOR.

Provide proper accounting as we have for the other vectors as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: x86 <x86@kernel.org>
Link: http://lkml.kernel.org/r/20140223212738.681855582@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-04 17:37:54 +01:00
Borislav Petkov
b7b898ae0c x86/efi: Make efi virtual runtime map passing more robust
Currently, running SetVirtualAddressMap() and passing the physical
address of the virtual map array was working only by a lucky coincidence
because the memory was present in the EFI page table too. Until Toshi
went and booted this on a big HP box - the krealloc() manner of resizing
the memmap we're doing did allocate from such physical addresses which
were not mapped anymore and boom:

http://lkml.kernel.org/r/1386806463.1791.295.camel@misato.fc.hp.com

One way to take care of that issue is to reimplement the krealloc thing
but with pages. We start with contiguous pages of order 1, i.e. 2 pages,
and when we deplete that memory (shouldn't happen all that often but you
know firmware) we realloc the next power-of-two pages.

Having the pages, it is much more handy and easy to map them into the
EFI page table with the already existing mapping code which we're using
for building the virtual mappings.

Thanks to Toshi Kani and Matt for the great debugging help.

Reported-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:17:18 +00:00
Borislav Petkov
42a5477251 x86, pageattr: Export page unmapping interface
We will use it in efi so expose it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:17:17 +00:00
Borislav Petkov
11cc851254 x86/efi: Dump the EFI page table
This is very useful for debugging issues with the recently added
pagetable switching code for EFI virtual mode.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:17:17 +00:00
Borislav Petkov
ef6bea6ddf x86, ptdump: Add the functionality to dump an arbitrary pagetable
With reusing the ->trampoline_pgd page table for mapping EFI regions in
order to use them after having switched to EFI virtual mode, it is very
useful to be able to dump aforementioned page table in dmesg. This adds
that functionality through the walk_pgd_level() interface which can be
called from somewhere else.

The original functionality of dumping to debugfs remains untouched.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:16:17 +00:00
Matt Fleming
3e90959921 efi: Move facility flags to struct efi
As we grow support for more EFI architectures they're going to want the
ability to query which EFI features are available on the running system.
Instead of storing this information in an architecture-specific place,
stick it in the global 'struct efi', which is already the central
location for EFI state.

While we're at it, let's change the return value of efi_enabled() to be
bool and replace all references to 'facility' with 'feature', which is
the usual word used to describe the attributes of the running system.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:16:16 +00:00
Paolo Bonzini
1c2af4968e Merge tag 'kvm-for-3.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into kvm-next 2014-03-04 15:58:00 +01:00
Andrew Jones
332967a3ea x86: kvm: introduce periodic global clock updates
commit 0061d53daf introduced a mechanism to execute a global clock
update for a vm. We can apply this periodically in order to propagate
host NTP corrections. Also, if all vcpus of a vm are pinned, then
without an additional trigger, no guest NTP corrections can propagate
either, as the current trigger is only vcpu cpu migration.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-04 11:50:54 +01:00
Andrew Jones
7e44e4495a x86: kvm: rate-limit global clock updates
When we update a vcpu's local clock it may pick up an NTP correction.
We can't wait an indeterminate amount of time for other vcpus to pick
up that correction, so commit 0061d53daf introduced a global clock
update. However, we can't request a global clock update on every vcpu
load either (which is what happens if the tsc is marked as unstable).
The solution is to rate-limit the global clock updates. Marcelo
calculated that we should delay the global clock updates no more
than 0.1s as follows:

Assume an NTP correction c is applied to one vcpu, but not the other,
then in n seconds the delta of the vcpu system_timestamps will be
c * n. If we assume a correction of 500ppm (worst-case), then the two
vcpus will diverge 50us in 0.1s, which is a considerable amount.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-04 11:50:47 +01:00
Heiko Carstens
0473c9b5f0 compat: let architectures define __ARCH_WANT_COMPAT_SYS_GETDENTS64
For architecture dependent compat syscalls in common code an architecture
must define something like __ARCH_WANT_<WHATEVER> if it wants to use the
code.
This however is not true for compat_sys_getdents64 for which architectures
must define __ARCH_OMIT_COMPAT_SYS_GETDENTS64 if they do not want the code.

This leads to the situation where all architectures, except mips, get the
compat code but only x86_64, arm64 and the generic syscall architectures
actually use it.

So invert the logic, so that architectures actively must do something to
get the compat code.

This way a couple of architectures get rid of otherwise dead code.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-04 09:05:33 +01:00
Greg Kroah-Hartman
13df797743 Merge 3.14-rc5 into driver-core-next
We want the fixes in here.
2014-03-02 20:09:08 -08:00
H. Peter Anvin
840d2830e6 x86, cpufeature: Rename X86_FEATURE_CLFLSH to X86_FEATURE_CLFLUSH
We call this "clflush" in /proc/cpuinfo, and have
cpu_has_clflush()... let's be consistent and just call it that.

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-mlytfzjkvuf739okyn40p8a5@git.kernel.org
2014-02-27 08:31:30 -08:00
Ross Zwisler
171699f763 x86: Add support for the clflushopt instruction
Add support for the new clflushopt instruction.  This instruction was
announced in the document "Intel Architecture Instruction Set Extensions
Programming Reference" with Ref # 319433-018.

http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf

[ hpa: changed the feature flag to simply X86_FEATURE_CLFLUSHOPT - if
  that is what we want to report in /proc/cpuinfo anyway... ]

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Link: http://lkml.kernel.org/r/1393441612-19729-2-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-27 08:23:28 -08:00
H. Peter Anvin
b5660ba76b x86, platforms: Remove NUMAQ
The NUMAQ support seems to be unmaintained, remove it.

Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/n/530CFD6C.7040705@zytor.com
2014-02-27 08:07:39 -08:00
H. Peter Anvin
c5f9ee3d66 x86, platforms: Remove SGI Visual Workstation
The SGI Visual Workstation seems to be dead; remove support so we
don't have to continue maintaining it.

Cc: Andrey Panin <pazke@donpac.ru>
Cc: Michael Reed <mdr@sgi.com>
Link: http://lkml.kernel.org/r/530CFD6C.7040705@zytor.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-27 08:07:39 -08:00
Ingo Molnar
ff5a7088f0 Merge branch 'perf/urgent' into perf/core
Merge the latest fixes before queueing up new changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27 12:41:17 +01:00
Liu, Jinsong
da8999d318 KVM: x86: Intel MPX vmx and msr handle
From caddc009a6d2019034af8f2346b2fd37a81608d0 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Mon, 24 Feb 2014 18:11:11 +0800
Subject: [PATCH v5 1/3] KVM: x86: Intel MPX vmx and msr handle

This patch handle vmx and msr of Intel MPX feature.

Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-24 12:14:00 +01:00
Liu, Jinsong
56c103ec04 KVM: x86: Fix xsave cpuid exposing bug
From 00c920c96127d20d4c3bb790082700ae375c39a0 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Fri, 21 Feb 2014 23:47:18 +0800
Subject: [PATCH] KVM: x86: Fix xsave cpuid exposing bug

EBX of cpuid(0xD, 0) is dynamic per XCR0 features enable/disable.
Bit 63 of XCR0 is reserved for future expansion.

Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-22 15:53:34 +01:00
Fenghua Yu
c2bc11f10a x86, AVX-512: Enable AVX-512 States Context Switch
This patch enables Opmask, ZMM_Hi256, and Hi16_ZMM AVX-512 states for
xstate context switch.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1392931491-33237-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # hw enabling
2014-02-20 13:56:55 -08:00
Fenghua Yu
8e5780fdee x86, AVX-512: AVX-512 Feature Detection
AVX-512 is an extention of AVX2. Its spec can be found at:
http://download-software.intel.com/sites/default/files/managed/71/2e/319433-017.pdf

This patch detects AVX-512 features by CPUID.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1392931491-33237-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # hw enabling
2014-02-20 13:56:55 -08:00
Thomas Gleixner
5f0e030930 x86, tsc: Fallback to normal calibration if fast MSR calibration fails
If we cannot calibrate TSC via MSR based calibration
try_msr_calibrate_tsc() stores zero to fast_calibrate and returns that
to the caller. This value gets then propagated further to clockevents
code resulting division by zero oops like the one below:

 divide error: 0000 [#1] PREEMPT SMP
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0+ #47
 task: ffff880075508000 ti: ffff880075506000 task.ti: ffff880075506000
 RIP: 0010:[<ffffffff810aec14>]  [<ffffffff810aec14>] clockevents_config.part.3+0x24/0xa0
 RSP: 0000:ffff880075507e58  EFLAGS: 00010246
 RAX: ffffffffffffffff RBX: ffff880079c0cd80 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff
 RBP: ffff880075507e70 R08: 0000000000000001 R09: 00000000000000be
 R10: 00000000000000bd R11: 0000000000000003 R12: 000000000000b008
 R13: 0000000000000008 R14: 000000000000b010 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff880079c00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: ffff880079fff000 CR3: 0000000001c0b000 CR4: 00000000001006f0
 Stack:
  ffff880079c0cd80 000000000000b008 0000000000000008 ffff880075507e88
  ffffffff810aecb0 ffff880079c0cd80 ffff880075507e98 ffffffff81030168
  ffff880075507ed8 ffffffff81d1104f 00000000000000c3 0000000000000000
 Call Trace:
  [<ffffffff810aecb0>] clockevents_config_and_register+0x20/0x30
  [<ffffffff81030168>] setup_APIC_timer+0xc8/0xd0
  [<ffffffff81d1104f>] setup_boot_APIC_clock+0x4cc/0x4d8
  [<ffffffff81d0f5de>] native_smp_prepare_cpus+0x3dd/0x3f0
  [<ffffffff81d02ee9>] kernel_init_freeable+0xc3/0x205
  [<ffffffff8177c910>] ? rest_init+0x90/0x90
  [<ffffffff8177c91e>] kernel_init+0xe/0x120
  [<ffffffff8178deec>] ret_from_fork+0x7c/0xb0
  [<ffffffff8177c910>] ? rest_init+0x90/0x90

Prevent this from happening by:
 1) Modifying try_msr_calibrate_tsc() to return calibration value or zero
    if it fails.
 2) Check this return value in native_calibrate_tsc() and in case of zero
    fallback to use normal non-MSR based calibration.

[mw: Added subject and changelog]

Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bin Gao <bin.gao@linux.intel.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1392810750-18660-1-git-send-email-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-19 17:12:24 +01:00
Ard Biesheuvel
2b9c1f0327 x86: align x86 arch with generic CPU modalias handling
The x86 CPU feature modalias handling existed before it was reimplemented
generically. This patch aligns the x86 handling so that it
(a) reuses some more code that is now generic;
(b) uses the generic format for the modalias module metadata entry, i.e., it
    now uses 'cpu:type:x86,venVVVVfamFFFFmodMMMM:feature:,XXXX,YYYY' instead of
    the 'x86cpu:vendor:VVVV👪FFFF:model:MMMM:feature:,XXXX,YYYY' that was
    used before.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-18 12:45:38 -08:00
Linus Torvalds
7fc9280462 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI fixes from Peter Anvin:
 "A few more EFI-related fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Check status field to validate BGRT header
  x86/efi: Fix 32-bit fallout
2014-02-15 15:02:28 -08:00
Borislav Petkov
c55d016f7a x86/efi: Fix 32-bit fallout
We do not enable the new efi memmap on 32-bit and thus we need to run
runtime_code_page_mkexec() unconditionally there. Fix that.

Reported-and-tested-by: Lejun Zhu <lejun.zhu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-02-14 09:30:19 +00:00
Mel Gorman
a9c8e4beee xen: properly account for _PAGE_NUMA during xen pte translations
Steven Noonan forwarded a users report where they had a problem starting
vsftpd on a Xen paravirtualized guest, with this in dmesg:

  BUG: Bad page map in process vsftpd  pte:8000000493b88165 pmd:e9cc01067
  page:ffffea00124ee200 count:0 mapcount:-1 mapping:     (null) index:0x0
  page flags: 0x2ffc0000000014(referenced|dirty)
  addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping:          (null) index:7f97eea74
  CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
  Call Trace:
    dump_stack+0x45/0x56
    print_bad_pte+0x22e/0x250
    unmap_single_vma+0x583/0x890
    unmap_vmas+0x65/0x90
    exit_mmap+0xc5/0x170
    mmput+0x65/0x100
    do_exit+0x393/0x9e0
    do_group_exit+0xcc/0x140
    SyS_exit_group+0x14/0x20
    system_call_fastpath+0x1a/0x1f
  Disabling lock debugging due to kernel taint
  BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1
  BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1

The issue could not be reproduced under an HVM instance with the same
kernel, so it appears to be exclusive to paravirtual Xen guests.  He
bisected the problem to commit 1667918b64 ("mm: numa: clear numa
hinting information on mprotect") that was also included in 3.12-stable.

The problem was related to how xen translates ptes because it was not
accounting for the _PAGE_NUMA bit.  This patch splits pte_present to add
a pteval_present helper for use by xen so both bare metal and xen use
the same code when checking if a PTE is present.

[mgorman@suse.de: wrote changelog, proposed minor modifications]
[akpm@linux-foundation.org: fix typo in comment]
Reported-by: Steven Noonan <steven@uplinklabs.net>
Tested-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org>	[3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:41 -08:00
Tim Chen
ddf1d169c0 locking/mcs: Allow architecture specific asm files to be used for contended case
This patch allows each architecture to add its specific assembly optimized
arch_mcs_spin_lock_contended and arch_mcs_spinlock_uncontended for
MCS lock and unlock functions.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: AswinChandramouleeswaran <aswin@hp.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rik vanRiel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: MichelLespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Figo.zhang" <figo1802@gmail.com>
Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390347382.3138.67.camel@schen9-DESK
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 21:18:52 +01:00
David Rientjes
dc9788f40a x86/apic: Always define nox2apic and define it as initdata
The "nox2apic" variable can be defined as __initdata since it is
only used for bootstrap.  It can now unconditionally be defined
since it will later be freed.

At the same time, it is also better off as a bool.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042354380.7839@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:15:11 +01:00
David Rientjes
6d4989835e x86/apic: Remove unused function prototypes
Some function prototypes declared in asm/apic.h are never
defined, so remove them.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042354210.7839@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:15:09 +01:00
David Rientjes
465822cfc8 x86/apic: Switch wait_for_init_deassert() to a bool flag
Now that there is only a single wait_for_init_deassert()
function, just convert the member of struct apic to a bool to
determine whether we need to wait for init_deassert to become
non-zero.

There are no more callers of default_wait_for_init_deassert(),
so fold it into the caller.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042354010.7839@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:15:08 +01:00
David Rientjes
d3c63ae1e2 x86/apic: Only use default_wait_for_init_deassert()
es7000_wait_for_init_deassert() is functionally equivalent to
default_wait_for_init_deassert(), so remove the duplicate code
and use only a single function.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042353030.7839@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:15:07 +01:00
Peter Zijlstra
e90c785352 x86/nmi: Push duration printk() to irq context
Calling printk() from NMI context is bad (TM), so move it to IRQ
context.

In doing so we slightly change (probably wreck) the debugfs
nmi_longest_ns thingy, in that it doesn't update to reflect the
longest, nor does writing to it reset the count.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-rdw0au56a5ymis1u8p48c12d@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:17:22 +01:00
Linus Torvalds
c1ff84317f Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "Quite a varied little collection of fixes.  Most of them are
  relatively small or isolated; the biggest one is Mel Gorman's fixes
  for TLB range flushing.

  A couple of AMD-related fixes (including not crashing when given an
  invalid microcode image) and fix a crash when compiled with gcov"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Unify valid container checks
  x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
  x86/efi: Allow mapping BGRT on x86-32
  x86: Fix the initialization of physnode_map
  x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()
  x86/intel/mid: Fix X86_INTEL_MID dependencies
  arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLIT
  mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge
  x86: mm: change tlb_flushall_shift for IvyBridge
  x86/mm: Eliminate redundant page table walk during TLB range flushing
  x86/mm: Clean up inconsistencies when flushing TLB ranges
  mm, x86: Account for TLB flushes only when debugging
  x86/AMD/NB: Fix amd_set_subcaches() parameter type
  x86/quirks: Add workaround for AMD F16h Erratum792
  x86, doc, kconfig: Fix dud URL for Microcode data
2014-02-08 11:54:43 -08:00
H. Peter Anvin
a3b072cd18 * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit).
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJS9P2pAAoJEC84WcCNIz1VABsP/j0fwiWdaoEn/9p9EUQcBcMn
 CILuiKI8GoBR+0nH7vm/jSgUKfUxzM6T0+qjoZPVkO/u/qup+JCT5JxoywUyEbfb
 +wPNIhBgKSwJdWXPd4ZvObA9jknrP6r5sJTsixpESVpVWmcC8egPgkyBFILjokKS
 BCJqqruHZcXfWaDQTocYErl3T217J7RfnCG1o7yP96g4jteX8EdvIkPjEf2mM83I
 GXacHLy8IhGhdyPKSXcRqZ0ivhZ2WX1iFQYIY19FhmzBs364WulyXve+oV+l/4h4
 Kx7ks4Ob5AlUIizchGNwVz7058F9o/v7m0CezTgi4Q/RrZi34samxbjk/95/Xc60
 JSvWkekm3/jOODubad5zj+ZnJmG83/ZlCUuTaqsE47ftbaedSUHBN9QSpm8iHEex
 n3d4J3AdP/3amcP33kZ5MRALDYIFKb4ZxtDkADqDcXhS56COivGAdZe5hnyCpb/9
 RPUDXTOlxfJQK/y2Atcdb400JjJ/Yr9Kew81LRIt0UMZU3dKSh05UZ+a7Ym0yCkt
 3k0NNkgsFCZbYTO/Z3aPDcprwU5Lq9UrwjB17U2ev/qK+qRYDzCzSR0XGPrLMRv7
 C5Bnov6uCn/0ZG/NlAx8UXK9wdWDsLhp1QkBz+daX3sGwRAS+OiKBv6+l8dqsdOc
 1L8PMkTX2rgtELiv4PJ/
 =hLCC
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit).

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-07 11:27:30 -08:00
Linus Torvalds
1cd731df09 Bug-fixes:
- Revert "xen/grant-table: Avoid m2p_override during mapping" as it broke Xen ARM build.
  - Fix CR4 not being set on AP processors in Xen PVH mode.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJS8AyQAAoJEFjIrFwIi8fJbD4IAJssMuaLI5CRsSWBgDFHHDFt
 srVJpDOYQiDr/TxkwFCVcL4sFy9Htb3KMArU4eIBl6uMqQbGa+3rHyXcHYI219YY
 XH3D8RG+9JChwsxtaeUEzwx1C8ehcygD34vtdcoQXa7eBuEi4TL3HeLifR+HrXKO
 UdFrTA34FmvpVFbSuRXkZh5sd6ca9et9xHuQHM8SIY6pVokY6xaEYOp17tfPZpwM
 7A6LFjUjXeugHC2L3+/H8UOHA9nSZQvnMiZOWq2Cusc2Dt2V7emzgk2wcc2CHttf
 EA6GbtiJzHqMPmt5EjubI9hHdSMB31HpY4hnQE38+ucl+BwiSdRE9z2Rm4TYClg=
 =IX4M
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen fixes from Konrad Rzeszutek Wilk:
 "Bug-fixes:
   - Revert "xen/grant-table: Avoid m2p_override during mapping" as it
     broke Xen ARM build.
   - Fix CR4 not being set on AP processors in Xen PVH mode"

* tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pvh: set CR4 flags for APs
  Revert "xen/grant-table: Avoid m2p_override during mapping"
2014-02-05 16:01:11 -08:00
Marcelo Tosatti
4f34d683e5 KVM: x86: remove unused last_kernel_ns variable
Remove unused last_kernel_ns variable.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-04 04:20:42 +01:00
Bjorn Helgaas
25453e9e52 x86/PCI: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node()
There are no callers of get_mp_bus_to_node(), so we no longer need
mp_bus_to_node[], get_mp_bus_to_node(), or set_mp_bus_to_node().
This removes them.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-03 10:38:46 -07:00
Bjorn Helgaas
afcf21c2be x86/PCI: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus
The AMD early_fill_mp_bus_info() already allocates a struct pci_root_info
for each PCI host bridge it finds, and that structure contains the NUMA
node number.  We don't need to keep the same information in the
mp_bus_to_node[] table.

This adds x86_pci_root_bus_node(), which returns the NUMA node number, or
NUMA_NO_NODE if the node is unknown.

Note that unlike get_mp_bus_to_node(), x86_pci_root_bus_node() only works
for root buses.  For example, if amd_bus.c finds a host bridge on node 1 to
[bus 00-0f], get_mp_bus_to_node() returns 1 for any bus between 00 and 0f,
but x86_pci_root_bus_node() returns 1 for bus 00 and NUMA_NO_NODE for buses
01-0f.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-03 10:38:35 -07:00
Bjorn Helgaas
49886cf4c4 x86/PCI: Drop return value of pcibios_scan_root()
Nobody really uses the return value of pcibios_scan_root() (one place uses
it to control a printk, but the printk is not very useful).  This converts
pcibios_scan_root() to a void function.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-03 10:38:29 -07:00
Bjorn Helgaas
289a24a699 x86/PCI: Merge pci_scan_bus_on_node() into pcibios_scan_root()
pci_scan_bus_on_node() is only called by pcibios_scan_root().
This merges pci_scan_bus_on_node() into pcibios_scan_root() and removes
pci_scan_bus_on_node().

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-03 10:38:23 -07:00
Bjorn Helgaas
8d7d818676 x86/PCI: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata()
pci_scan_bus_with_sysdata() and pcibios_scan_root() are quite similar:

  pci_scan_bus_with_sysdata
    pci_scan_bus_on_node(..., &pci_root_ops, -1)

  pcibios_scan_root
    pci_scan_bus_on_node(..., &pci_root_ops, get_mp_bus_to_node(busnum))

get_mp_bus_to_node() returns -1 if it couldn't find the node number, so
this removes pci_scan_bus_with_sysdata() and uses pcibios_scan_root()
instead.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-03 10:38:11 -07:00
Konrad Rzeszutek Wilk
e85fc98055 Revert "xen/grant-table: Avoid m2p_override during mapping"
This reverts commit 08ece5bb23.

As it breaks ARM builds and needs more attention
on the ARM side.

Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-03 06:44:49 -05:00
Linus Torvalds
ab5318788c Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core debug changes from Ingo Molnar:
 "This contains mostly kernel debugging related updates:

   - make hung_task detection more configurable to distros
   - add final bits for x86 UV NMI debugging, with related KGDB changes
   - update the mailing-list of MAINTAINERS entries I'm involved with"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hung_task: Display every hung task warning
  sysctl: Add neg_one as a standard constraint
  x86/uv/nmi, kgdb/kdb: Fix UV NMI handler when KDB not configured
  x86/uv/nmi: Fix Sparse warnings
  kgdb/kdb: Fix no KDB config problem
  MAINTAINERS: Restore "L: linux-kernel@vger.kernel.org" entries
2014-01-31 08:59:46 -08:00
Linus Torvalds
14164b46fc Bug-fixes:
- Xen ARM couldn't use the new FIFO events
  - Xen ARM couldn't use the SWIOTLB if compiled as 32-bit with 64-bit PCIe devices.
  - Grant table were doing needless M2P operations.
  - Ratchet down the self-balloon code so it won't OOM.
  - Fix misplaced kfree in Xen PVH error code paths.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJS68IQAAoJEFjIrFwIi8fJAWgH/j4HStEey3rgGcqwIWSHkkap
 +t55wsrT8Ylq6CzZjaUtCo3pB7HotW526x/0rA2pxVqHn/8oCN/1EtdrNtYm/umX
 qOoda+db5NIjAEGVLWSLqGyokJQDrX/brXIWfYR300e9fnJi7yT/rFC4QHoZVUYl
 5LME8XH/jE012vvYelNu6DbbodlRmVCT8hctJS+eB5ER2WmtD9Pkw4GybEXPVYJz
 hE0Ts1DN91nKP2FGJb+mfB9UFT5X8i00akAK+Qc1R3sRnRh6eRoNV8dgyCnudKpO
 UPEdiAZvgij+mzlgIYSz6nKH0U/VbvRsG3lc3i5Si3o+vR3CYPCkvzOGX2d0rjw=
 =7cxW
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.14-rc0-late-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen bugfixes from Konrad Rzeszutek Wilk:
 "Bug-fixes for the new features that were added during this cycle.

  There are also two fixes for long-standing issues for which we have a
  solution: grant-table operations extra work that was not needed
  causing performance issues and the self balloon code was too
  aggressive causing OOMs.

  Details:
   - Xen ARM couldn't use the new FIFO events
   - Xen ARM couldn't use the SWIOTLB if compiled as 32-bit with 64-bit PCIe devices.
   - Grant table were doing needless M2P operations.
   - Ratchet down the self-balloon code so it won't OOM.
   - Fix misplaced kfree in Xen PVH error code paths"

* tag 'stable/for-linus-3.14-rc0-late-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pvh: Fix misplaced kfree from xlated_setup_gnttab_pages
  drivers: xen: deaggressive selfballoon driver
  xen/grant-table: Avoid m2p_override during mapping
  xen/gnttab: Use phys_addr_t to describe the grant frame base address
  xen: swiotlb: handle sizeof(dma_addr_t) != sizeof(phys_addr_t)
  arm/xen: Initialize event channels earlier
2014-01-31 08:38:18 -08:00
Linus Torvalds
e2a0f813e0 Second batch of KVM updates. Some minor x86 fixes,
two s390 guest features that need some handling in the host,
 and all the PPC changes.  The PPC changes include support for
 little-endian guests and enablement for new POWER8 features.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJS6UF5AAoJEBvWZb6bTYby55kP/AgTJnyu7avN653/2aSHkjkx
 KgYSMYhZPIFoY5LyZuNetXaoXFRvCykux1VYSZ6V6s35h2PZ+hdJNbHGjFYKPGTq
 FQ92xQVNuWCAPxmFCjDNuDV/0BauG5y08/Orh/jpjz+GAfH43LruUQGbtXUuyJ8u
 vf+yTHniU5gguqsAmodqjHUgbf+GoPJ1j7hmRoWwt8IWm7Ns3v/IK4l0p6G0h26a
 RjE6aK+Tm208Yr5hD/dRAqeTbBNt3c4xub+QPsKoiEMaZBSuAOiux7D3Kx+If1gp
 WsmqEQxoymihVtkZhUFO9ONLJepvmG2QwJVVyMSUW9iqxX9rraXsvVyVMwcQAhog
 JuOAYxBftH07xu6Fs4eym5KvCFghM+EaJvxxt+kgnvdD4htK1+eK5trntc2zygSi
 /qGiIrkqjXpkskW8kujLayF0eAU3CrZvFWveEPBfFgYiOGX/2wzJCtSm/bt9Jo0M
 v60qgNFK3LNqAyeEfnm9VtlwGr6ZgsAB6DHNPX4fM5s2IBjL+qloXk/e/+aVKkW0
 I3yeRdy/ExhLAab6w81JtMeR7G3YS0UNuAEVvcoxzNb5wIBY8qnpfUzTKyMxQR94
 64EVpxWEYO1s55eCCyMujWrSvc+YAwhJcWHGKgC4K7mxxLD3FVyQXX6YZvgRozMX
 HjQju+DToj9CskyrFlRL
 =yd0Z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "Second batch of KVM updates.  Some minor x86 fixes, two s390 guest
  features that need some handling in the host, and all the PPC changes.

  The PPC changes include support for little-endian guests and
  enablement for new POWER8 features"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (45 commits)
  x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101
  x86, kvm: cache the base of the KVM cpuid leaves
  kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdef
  KVM: PPC: Book3S PR: Cope with doorbell interrupts
  KVM: PPC: Book3S HV: Add software abort codes for transactional memory
  KVM: PPC: Book3S HV: Add new state for transactional memory
  powerpc/Kconfig: Make TM select VSX and VMX
  KVM: PPC: Book3S HV: Basic little-endian guest support
  KVM: PPC: Book3S HV: Add support for DABRX register on POWER7
  KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells
  KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8
  KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs
  KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from nap
  KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8
  KVM: PPC: Book3S HV: Add handler for HV facility unavailable
  KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8
  KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs
  KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers
  KVM: PPC: Book3S HV: Don't set DABR on POWER8
  kvm/ppc: IRQ disabling cleanup
  ...
2014-01-31 08:37:32 -08:00
Zoltan Kiss
08ece5bb23 xen/grant-table: Avoid m2p_override during mapping
The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
for blkback and future netback patches it just cause a lock contention, as
those pages never go to userspace. Therefore this series does the following:
- the original functions were renamed to __gnttab_[un]map_refs, with a new
  parameter m2p_override
- based on m2p_override either they follow the original behaviour, or just set
  the private flag and call set_phys_to_machine
- gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
  m2p_override false
- a new function gnttab_[un]map_refs_userspace provides the old behaviour

It also removes a stray space from page.h and change ret to 0 if
XENFEAT_auto_translated_physmap, as that is the only possible return value
there.

v2:
- move the storing of the old mfn in page->index to gnttab_map_refs
- move the function header update to a separate patch

v3:
- a new approach to retain old behaviour where it needed
- squash the patches into one

v4:
- move out the common bits from m2p* functions, and pass pfn/mfn as parameter
- clear page->private before doing anything with the page, so m2p_find_override
  won't race with this

v5:
- change return value handling in __gnttab_[un]map_refs
- remove a stray space in page.h
- add detail why ret = 0 now at some places

v6:
- don't pass pfn to m2p* functions, just get it locally

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Suggested-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-31 09:48:32 -05:00
Linus Torvalds
aa2e7100e3 Merge branch 'akpm' (patches from Andrew Morton)
Merge misc fixes from Andrew Morton:
 "A few hotfixes and various leftovers which were awaiting other merges.

  Mainly movement of zram into mm/"

* emailed patches fron Andrew Morton <akpm@linux-foundation.org>: (25 commits)
  memcg: fix mutex not unlocked on memcg_create_kmem_cache fail path
  Documentation/filesystems/vfs.txt: update file_operations documentation
  mm, oom: base root bonus on current usage
  mm: don't lose the SOFT_DIRTY flag on mprotect
  mm/slub.c: fix page->_count corruption (again)
  mm/mempolicy.c: fix mempolicy printing in numa_maps
  zram: remove zram->lock in read path and change it with mutex
  zram: remove workqueue for freeing removed pending slot
  zram: introduce zram->tb_lock
  zram: use atomic operation for stat
  zram: remove unnecessary free
  zram: delay pending free request in read path
  zram: fix race between reset and flushing pending work
  zsmalloc: add maintainers
  zram: add zram maintainers
  zsmalloc: add copyright
  zram: add copyright
  zram: remove old private project comment
  zram: promote zram from staging
  zsmalloc: move it under mm
  ...
2014-01-30 18:44:44 -08:00
Linus Torvalds
12f2bbd609 Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asmlinkage (LTO) changes from Peter Anvin:
 "This patchset adds more infrastructure for link time optimization
  (LTO).

  This patchset was pulled into my tree late because of a
  miscommunication (part of the patchset was picked up by other
  maintainers).  However, the patchset is strictly build-related and
  seems to be okay in testing"

* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, asmlinkage, xen: Fix type of NMI
  x86, asmlinkage, xen, kvm: Make {xen,kvm}_lock_spinning global and visible
  x86: Use inline assembler instead of global register variable to get sp
  x86, asmlinkage, paravirt: Make paravirt thunks global
  x86, asmlinkage, paravirt: Don't rely on local assembler labels
  x86, asmlinkage, lguest: Fix C functions used by inline assembler
2014-01-30 18:15:32 -08:00
Andrey Vagin
24f91eba18 mm: don't lose the SOFT_DIRTY flag on mprotect
The SOFT_DIRTY bit shows that the content of memory was changed after a
defined point in the past.  mprotect() doesn't change the content of
memory, so it must not change the SOFT_DIRTY bit.

This bug causes a malfunction: on the first iteration all pages are
dumped.  On other iterations only pages with the SOFT_DIRTY bit are
dumped.  So if the SOFT_DIRTY bit is cleared from a page by mistake, the
page is not dumped and its content will be restored incorrectly.

This patch does nothing with _PAGE_SWP_SOFT_DIRTY, becase pte_modify()
is called only for present pages.

Fixes commit 0f8975ec4d ("mm: soft-dirty bits for user memory changes
tracking").

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-30 16:56:56 -08:00
Andi Kleen
dff38e3e93 x86: Use inline assembler instead of global register variable to get sp
LTO in gcc 4.6/47. has trouble with global register variables. They were used
to read the stack pointer. Use a simple inline assembler statement with
a mov instead.

This also helps LLVM/clang, which does not support global register
variables.

[ hpa: Ideally this should become a builtin in both gcc and clang. ]

v2: More general asm constraint. Fix description (Jan Beulich)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1382458079-24450-6-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-29 22:17:17 -08:00
Andi Kleen
a2e7f0e3a4 x86, asmlinkage, paravirt: Make paravirt thunks global
The paravirt thunks use a hack of using a static reference to a static
function to reference that function from the top level statement.

This assumes that gcc always generates static function names in a specific
format, which is not necessarily true.

Simply make these functions global and asmlinkage or __visible. This way the
static __used variables are not needed and everything works.

Functions with arguments are __visible to keep the register calling
convention on 32bit.

Changed in paravirt and in all users (Xen and vsmp)

v2: Use __visible for functions with arguments

Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ido Yariv <ido@wizery.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1382458079-24450-5-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-29 22:17:17 -08:00
Andi Kleen
824a287009 x86, asmlinkage, paravirt: Don't rely on local assembler labels
The paravirt patching code assumes that it can reference a
local assembler label between two different top level assembler
statements. This does not work with LTO
where the assembler code may end up in different assembler files.

Replace it with extern / global /asm linkage labels.

This also removes one redundant copy of the macro.

Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1382458079-24450-4-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-29 22:17:17 -08:00
Linus Torvalds
cca21640d2 Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more x32 uabi type fixes from Peter Anvin:
 "Despite the branch name, **most of these changes are to generic
  code**.  They change types so that they make an increasing amount of
  the exported uapi kernel headers usable for libc.

  The ARM64 people are also interested in these changes for their ILP32
  ABI"

* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  uapi: Use __kernel_long_t in struct mq_attr
  uapi: Use __kernel_ulong_t in shmid64_ds/shminfo64/shm_info
  x86, uapi, x32: Use __kernel_ulong_t in x86 struct semid64_ds
  uapi: Use __kernel_ulong_t in struct msqid64_ds
  uapi: Use __kernel_long_t in struct msgbuf
  uapi, asm-generic: Use __kernel_ulong_t in uapi struct ipc64_perm
  uapi: Use __kernel_long_t/__kernel_ulong_t in <linux/resource.h>
  uapi: Use __kernel_long_t in struct timex
2014-01-29 18:22:16 -08:00
Paolo Bonzini
77f01bdfa5 x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101
When Hyper-V hypervisor leaves are present, KVM must relocate
its own leaves at 0x40000100, because Windows does not look for
Hyper-V leaves at indices other than 0x40000000.  In this case,
the KVM features are at 0x40000101, but the old code would always
look at 0x40000001.

Fix by using kvm_cpuid_base().  This also requires making the
function non-inline, since kvm_cpuid_base() is static.

Fixes: 1085ba7f55
Cc: stable@vger.kernel.org
Cc: mtosatti@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-29 18:11:55 +01:00
Paolo Bonzini
1c300a4077 x86, kvm: cache the base of the KVM cpuid leaves
It is unnecessary to go through hypervisor_cpuid_base every time
a leaf is found (which will be every time a feature is requested
after the next patch).

Fixes: 1085ba7f55
Cc: stable@vger.kernel.org
Cc: mtosatti@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-29 18:11:54 +01:00
Yinghai Lu
4ce7a8697c x86: revert wrong memblock current limit setting
Dave reported big numa system booting is broken.

It turns out that commit 5b6e529521 ("x86: memblock: set current limit
to max low memory address") sets the limit to low wrongly.

max_low_pfn_mapped is different from max_pfn_mapped.
max_low_pfn_mapped is always under 4G.

That will memblock_alloc_nid all go under 4G.

Revert 5b6e529521 to fix a no-boot regression which was triggered by
457ff1de2d ("lib/swiotlb.c: use memblock apis for early memory
allocations").

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:38 -08:00
Linus Torvalds
4ba9920e5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BPF debugger and asm tool by Daniel Borkmann.

 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

 3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

 4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match.  From Ben Hutchings.

 5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

 6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

 9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI.  From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
    Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
    address.  From Christoph Paasch.

21) Support 10G in generic phylib.  From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided.  From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
  net/cxgb4: Fix referencing freed adapter
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  fib_frontend: fix possible NULL pointer dereference
  rtnetlink: remove IFLA_BOND_SLAVE definition
  rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
  qlcnic: update version to 5.3.55
  qlcnic: Enhance logic to calculate msix vectors.
  qlcnic: Refactor interrupt coalescing code for all adapters.
  qlcnic: Update poll controller code path
  qlcnic: Interrupt code cleanup
  qlcnic: Enhance Tx timeout debugging.
  qlcnic: Use bool for rx_mac_learn.
  bonding: fix u64 division
  rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
  sfc: Use the correct maximum TX DMA ring size for SFC9100
  Add Shradha Shah as the sfc driver maintainer.
  net/vxlan: Share RX skb de-marking and checksum checks with ovs
  tulip: cleanup by using ARRAY_SIZE()
  ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
  net/cxgb4: Don't retrieve stats during recovery
  ...
2014-01-25 11:17:34 -08:00
Ingo Molnar
2b45e0f9f3 Merge branch 'linus' into x86/urgent
Merge in the x86 changes to apply a fix.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 09:16:14 +01:00
Mel Gorman
ec65993443 mm, x86: Account for TLB flushes only when debugging
Bisection between 3.11 and 3.12 fingered commit 9824cf97 ("mm:
vmstats: tlb flush counters") to cause overhead problems.

The counters are undeniably useful but how often do we really
need to debug TLB flush related issues?  It does not justify
taking the penalty everywhere so make it a debugging option.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-XzxjntugxuwpxXhcrxqqh53b@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 09:10:41 +01:00
Mike Travis
74c93f9d39 x86/uv/nmi: Fix Sparse warnings
Make uv_register_nmi_notifier() and uv_handle_nmi_ping() static
to address sparse warnings.

Fix problem where uv_nmi_kexec_failed is unused when
CONFIG_KEXEC is not defined.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20140114162551.480872353@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 08:55:10 +01:00
Dan Carpenter
2993ae3305 x86/AMD/NB: Fix amd_set_subcaches() parameter type
This is under CAP_SYS_ADMIN, but Smatch complains that mask comes
from the user and the test for "mask > 0xf" can underflow.

The fix is simple: amd_set_subcaches() should hand down not an 'int'
but an 'unsigned long' like it was originally indended to do.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/20140121072209.GA22095@elgon.mountain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 08:50:09 +01:00
Ard Biesheuvel
cf0744021c firmware/dmi_scan: generalize for use by other archs
This patch makes a couple of changes to the SMBIOS/DMI scanning
code so it can be used on other archs (such as ARM and arm64):
(a) wrap the calls to ioremap()/iounmap(), this allows the use of a
    flavor of ioremap() more suitable for random unaligned access;
(b) allow the non-EFI fallback probe into hardcoded physical address
    0xF0000 to be disabled.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Grant Likely <grant.likely@linaro.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:57 -08:00
Mark Salter
114cefc87e x86: use generic fixmap.h
Signed-off-by: Mark Salter <msalter@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:54 -08:00
Linus Torvalds
84621c9b18 Features:
- FIFO event channels. Key advantages: support for over 100,000 events (2^17),
    16 different event priorities, improved fairness in event latency through
    the use of FIFOs.
  - Xen PVH support. "It’s a fully PV kernel mode, running with paravirtualized
    disk and network, paravirtualized interrupts and timers, no emulated devices
    of any kind (and thus no qemu), no BIOS or legacy boot — but instead of
    requiring PV MMU, it uses the HVM hardware extensions to virtualize the
    pagetables, as well as system calls and other privileged operations."
    (from "The Paravirtualization Spectrum, Part 2: From poles to a spectrum")
 Bug-fixes:
  - Fixes in balloon driver (refactor and make it work under ARM)
  - Allow xenfb to be used in HVM guests.
  - Allow xen_platform_pci=0 to work properly.
  - Refactors in event channels.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJS4BmLAAoJEFjIrFwIi8fJ4SAH/iNGESowgMhfW64vRA8pBWq+
 NRJpUjYjjwmbxpwoNl6NPwn15cIXFyc3sMtvvrDD3taRDyko2RFuT+NTjpO05xPh
 d/cRpRXpXERHoiFgPf/WTp7ONBDhvPtHG0+BzJKwgqEIOUYXdbhD+gEjaVlFJScS
 CAY68OLmk7XYMSZBNzPfKNbSCyhVgZF7wpaimK9lxZBKsFRCDIq6jIyrAsC8epIL
 6V/V4l2S6lk/uUeGB6ULphYeINjI2kkpbSfCd1vyenLfWpVscc2o8uWEYFcZMAxy
 V4HpsoseuqrfdDqgPfud3VgogdISvbkCvDfW85rzfDP4MWxei2mVHFtJ/gSBV+g=
 =ToNG
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.14-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen updates from Konrad Rzeszutek Wilk:
 "Two major features that Xen community is excited about:

  The first is event channel scalability by David Vrabel - we switch
  over from an two-level per-cpu bitmap of events (IRQs) - to an FIFO
  queue with priorities.  This lets us be able to handle more events,
  have lower latency, and better scalability.  Good stuff.

  The other is PVH by Mukesh Rathor.  In short, PV is a mode where the
  kernel lets the hypervisor program page-tables, segments, etc.  With
  EPT/NPT capabilities in current processors, the overhead of doing this
  in an HVM (Hardware Virtual Machine) container is much lower than the
  hypervisor doing it for us.

  In short we let a PV guest run without doing page-table, segment,
  syscall, etc updates through the hypervisor - instead it is all done
  within the guest container.  It is a "hybrid" PV - hence the 'PVH'
  name - a PV guest within an HVM container.

  The major benefits are less code to deal with - for example we only
  use one function from the the pv_mmu_ops (which has 39 function
  calls); faster performance for syscall (no context switches into the
  hypervisor); less traps on various operations; etc.

  It is still being baked - the ABI is not yet set in stone.  But it is
  pretty awesome and we are excited about it.

  Lastly, there are some changes to ARM code - you should get a simple
  conflict which has been resolved in #linux-next.

  In short, this pull has awesome features.

  Features:
   - FIFO event channels.  Key advantages: support for over 100,000
     events (2^17), 16 different event priorities, improved fairness in
     event latency through the use of FIFOs.
   - Xen PVH support.  "It’s a fully PV kernel mode, running with
     paravirtualized disk and network, paravirtualized interrupts and
     timers, no emulated devices of any kind (and thus no qemu), no BIOS
     or legacy boot — but instead of requiring PV MMU, it uses the HVM
     hardware extensions to virtualize the pagetables, as well as system
     calls and other privileged operations." (from "The
     Paravirtualization Spectrum, Part 2: From poles to a spectrum")

  Bug-fixes:
   - Fixes in balloon driver (refactor and make it work under ARM)
   - Allow xenfb to be used in HVM guests.
   - Allow xen_platform_pci=0 to work properly.
   - Refactors in event channels"

* tag 'stable/for-linus-3.14-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (52 commits)
  xen/pvh: Set X86_CR0_WP and others in CR0 (v2)
  MAINTAINERS: add git repository for Xen
  xen/pvh: Use 'depend' instead of 'select'.
  xen: delete new instances of __cpuinit usage
  xen/fb: allow xenfb initialization for hvm guests
  xen/evtchn_fifo: fix error return code in evtchn_fifo_setup()
  xen-platform: fix error return code in platform_pci_init()
  xen/pvh: remove duplicated include from enlighten.c
  xen/pvh: Fix compile issues with xen_pvh_domain()
  xen: Use dev_is_pci() to check whether it is pci device
  xen/grant-table: Force to use v1 of grants.
  xen/pvh: Support ParaVirtualized Hardware extensions (v3).
  xen/pvh: Piggyback on PVHVM XenBus.
  xen/pvh: Piggyback on PVHVM for grant driver (v4)
  xen/grant: Implement an grant frame array struct (v3).
  xen/grant-table: Refactor gnttab_init
  xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init.
  xen/pvh: Piggyback on PVHVM for event channels (v2)
  xen/pvh: Update E820 to work with PVH (v2)
  xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
  ...
2014-01-22 22:00:18 -08:00
Linus Torvalds
7ebd3faa9b First round of KVM updates for 3.14; PPC parts will come next week.
Nothing major here, just bugfixes all over the place.  The most
 interesting part is the ARM guys' virtualized interrupt controller
 overhaul, which lets userspace get/set the state and thus enables
 migration of ARM VMs.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJS3TVKAAoJEBvWZb6bTYbyIFgP/2cmt4ifCuFMaZv4+G1S8jZU
 uC9ZB/+7vzht/p6zAy+4BxurKbHmSBFkC1OKcxYuy7yB4CQkHabzj4V2vRtqFdwH
 5lExP9qh3kqaVLuhnvxLTmkktR3EW4PFy6OI53l5kRNktOXSuZ0aN6K3V7tCg/X0
 iL7ASo4bJKlxeWcDpmuVrNgAajmZVfXrjKY7robgBQno+yIsgKhRZRBQHjozA6B8
 FpCo/k48RZd/EzIbV/PDDRI4hmmry/lgrO9SKjzq56wSqff2bd/k/KYze4dbAPfd
 Ps60enPTuHmeEjjb4MMMU4EKHVdTQFUMx/xZCmT4xzoh8s4of6RHphXbfE0SUznQ
 dTveyEQAR7E3JNS0k1+3WEX5fWlFesp0hO2NeE0wzUq4TAr9ztgVO9NQ6Si15e7Z
 2HysO0T5Ojtt0lY08/PvS6i48eCAuuBomrejJS8hLW4SUZ5adn+yW4Qo7Fp9JeBR
 l9a3LsVT8BZMtUWrUuFcVhlM4MbzElUPjDbgWhR8UYU/kpfVZOQu8qWgGKR4UWXy
 X7/t9l/tjR99CmfMJBAOzJid+ScSpAfg77BdaKiQrVfVIJmsjEjlO8vUMyj5b1HF
 hPX5wNyJjHAOfridLeHSs4Rdm4a8sk8Az5d4h76pLVz8M4jyTi2v0rO3N4/dU/pu
 x7N8KR5hAj+mLBoM9/Al
 =8sYU
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "First round of KVM updates for 3.14; PPC parts will come next week.

  Nothing major here, just bugfixes all over the place.  The most
  interesting part is the ARM guys' virtualized interrupt controller
  overhaul, which lets userspace get/set the state and thus enables
  migration of ARM VMs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits)
  kvm: make KVM_MMU_AUDIT help text more readable
  KVM: s390: Fix memory access error detection
  KVM: nVMX: Update guest activity state field on L2 exits
  KVM: nVMX: Fix nested_run_pending on activity state HLT
  KVM: nVMX: Clean up handling of VMX-related MSRs
  KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject
  KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
  KVM: nVMX: Leave VMX mode on clearing of feature control MSR
  KVM: VMX: Fix DR6 update on #DB exception
  KVM: SVM: Fix reading of DR6
  KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS
  add support for Hyper-V reference time counter
  KVM: remove useless write to vcpu->hv_clock.tsc_timestamp
  KVM: x86: fix tsc catchup issue with tsc scaling
  KVM: x86: limit PIT timer frequency
  KVM: x86: handle invalid root_hpa everywhere
  kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub
  kvm: vfio: silence GCC warning
  KVM: ARM: Remove duplicate include
  arm/arm64: KVM: relax the requirements of VMA alignment for THP
  ...
2014-01-22 21:40:43 -08:00
Linus Torvalds
e1ba84597c PCI changes for the v3.14 merge window:
Resource management
     - Change pci_bus_region addresses to dma_addr_t (Bjorn Helgaas)
     - Support 64-bit AGP BARs (Bjorn Helgaas, Yinghai Lu)
     - Add pci_bus_address() to get bus address of a BAR (Bjorn Helgaas)
     - Use pci_resource_start() for CPU address of AGP BARs (Bjorn Helgaas)
     - Enforce bus address limits in resource allocation (Yinghai Lu)
     - Allocate 64-bit BARs above 4G when possible (Yinghai Lu)
     - Convert pcibios_resource_to_bus() to take pci_bus, not pci_dev (Yinghai Lu)
 
   PCI device hotplug
     - Major rescan/remove locking update (Rafael J. Wysocki)
     - Make ioapic builtin only (not modular) (Yinghai Lu)
     - Fix release/free issues (Yinghai Lu)
     - Clean up pciehp (Bjorn Helgaas)
     - Announce pciehp slot info during enumeration (Bjorn Helgaas)
 
   MSI
     - Add pci_msi_vec_count(), pci_msix_vec_count() (Alexander Gordeev)
     - Add pci_enable_msi_range(), pci_enable_msix_range() (Alexander Gordeev)
     - Deprecate "tri-state" interfaces: fail/success/fail+info (Alexander Gordeev)
     - Export MSI mode using attributes, not kobjects (Greg Kroah-Hartman)
     - Drop "irq" param from *_restore_msi_irqs() (DuanZhenzhong)
 
   SR-IOV
     - Clear NumVFs when disabling SR-IOV in sriov_init() (ethan.zhao)
 
   Virtualization
     - Add support for save/restore of extended capabilities (Alex Williamson)
     - Add Virtual Channel to save/restore support (Alex Williamson)
     - Never treat a VF as a multifunction device (Alex Williamson)
     - Add pci_try_reset_function(), et al (Alex Williamson)
 
   AER
     - Ignore non-PCIe error sources (Betty Dall)
     - Support ACPI HEST error sources for domains other than 0 (Betty Dall)
     - Consolidate HEST error source parsers (Bjorn Helgaas)
     - Add a TLP header print helper (Borislav Petkov)
 
   Freescale i.MX6
     - Remove unnecessary code (Fabio Estevam)
     - Make reset-gpio optional (Marek Vasut)
     - Report "link up" only after link training completes (Marek Vasut)
     - Start link in Gen1 before negotiating for Gen2 mode (Marek Vasut)
     - Fix PCIe startup code (Richard Zhu)
 
   Marvell MVEBU
     - Remove duplicate of_clk_get_by_name() call (Andrew Lunn)
     - Drop writes to bridge Secondary Status register (Jason Gunthorpe)
     - Obey bridge PCI_COMMAND_MEM and PCI_COMMAND_IO bits (Jason Gunthorpe)
     - Support a bridge with no IO port window (Jason Gunthorpe)
     - Use max_t() instead of max(resource_size_t,) (Jingoo Han)
     - Remove redundant of_match_ptr (Sachin Kamat)
     - Call pci_ioremap_io() at startup instead of dynamically (Thomas Petazzoni)
 
   NVIDIA Tegra
     - Disable Gen2 for Tegra20 and Tegra30 (Eric Brower)
 
   Renesas R-Car
     - Add runtime PM support (Valentine Barshak)
     - Fix rcar_pci_probe() return value check (Wei Yongjun)
 
   Synopsys DesignWare
     - Fix crash in dw_msi_teardown_irq() (Bjørn Erik Nilsen)
     - Remove redundant call to pci_write_config_word() (Bjørn Erik Nilsen)
     - Fix missing MSI IRQs (Harro Haan)
     - Add dw_pcie prefix before cfg_read/write (Pratyush Anand)
     - Fix I/O transfers by using CPU (not realio) address (Pratyush Anand)
     - Whitespace cleanup (Jingoo Han)
 
   EISA
     - Call put_device() if device_register() fails (Levente Kurusa)
     - Revert EISA initialization breakage ((Bjorn Helgaas)
 
   Miscellaneous
     - Remove unused code, including PCIe 3.0 interfaces (Stephen Hemminger)
     - Prevent bus conflicts while checking for bridge apertures (Bjorn Helgaas)
     - Stop clearing bridge Secondary Status when setting up I/O aperture (Bjorn Helgaas)
     - Use dev_is_pci() to identify PCI devices (Yijing Wang)
     - Deprecate DEFINE_PCI_DEVICE_TABLE (Joe Perches)
     - Update documentation 00-INDEX (Erik Ekman)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJS3ujEAAoJEFmIoMA60/r8A4EQAK9AZSUSVNWvlKdC1PrBfT3w
 7fVILx5A4KWsOU8eoFwCPQLrgvUtMltg16yN2tbCjqpKEdrVc36biMO9bwhnXSyZ
 KopHKMWnn0sza/z2H8mcGy+0azGdWcIjcErX/a8WeS6zyWBjm+yzckrHNVpPu4Ca
 SpCBhfgBMjKyIZyLtP6juFSH34S2DfQex4oUSyPC+gjqPy5wW/xw/kBxZfOXl+yU
 P9pQT+geMIc31pETMdG9wd/TT+47YAui4ieSggoVxfVrphCXv6S8mOMCMuQc2bAy
 MHy9uFm1jbvKZZIYrzJ+9HFiiU/6MNiOO3Ygua52xuSp1Zrcjwi2CLD9/QBXbDVs
 pTKU5JIO7q43llkQUpIXTwBvEApSZRhuqzXegsMAYIg4AWmbfm/2fXkfWlQThYMp
 J48blAJZ4t0vhMr9usgwbtdBe8F5euExOxpwH0QMCMABbuu8/B3TLm39+LTcIbsw
 Efgm3N9iUTyiV5fe9Rr62nflhyqXjTevPl4wbZZe4OOdm0MXZY+/BzuNJhg3wyY8
 QKz2J3FB6OR7BCLHCp80l50s5+Ih4F5kmOXwFKjT1D1MFRaNaPDmp9BY6TitU6hg
 zj55gP4c8x6n3alakbf972Yhgs/4oi1va8cZL+pCYWb8nPO5ldaMiT7QBBLUreQV
 BtDtC7u/AFWJ5e73+jVO
 =La1R
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "PCI changes for the v3.14 merge window:

  Resource management
    - Change pci_bus_region addresses to dma_addr_t (Bjorn Helgaas)
    - Support 64-bit AGP BARs (Bjorn Helgaas, Yinghai Lu)
    - Add pci_bus_address() to get bus address of a BAR (Bjorn Helgaas)
    - Use pci_resource_start() for CPU address of AGP BARs (Bjorn Helgaas)
    - Enforce bus address limits in resource allocation (Yinghai Lu)
    - Allocate 64-bit BARs above 4G when possible (Yinghai Lu)
    - Convert pcibios_resource_to_bus() to take pci_bus, not pci_dev (Yinghai Lu)

  PCI device hotplug
    - Major rescan/remove locking update (Rafael J. Wysocki)
    - Make ioapic builtin only (not modular) (Yinghai Lu)
    - Fix release/free issues (Yinghai Lu)
    - Clean up pciehp (Bjorn Helgaas)
    - Announce pciehp slot info during enumeration (Bjorn Helgaas)

  MSI
    - Add pci_msi_vec_count(), pci_msix_vec_count() (Alexander Gordeev)
    - Add pci_enable_msi_range(), pci_enable_msix_range() (Alexander Gordeev)
    - Deprecate "tri-state" interfaces: fail/success/fail+info (Alexander Gordeev)
    - Export MSI mode using attributes, not kobjects (Greg Kroah-Hartman)
    - Drop "irq" param from *_restore_msi_irqs() (DuanZhenzhong)

  SR-IOV
    - Clear NumVFs when disabling SR-IOV in sriov_init() (ethan.zhao)

  Virtualization
    - Add support for save/restore of extended capabilities (Alex Williamson)
    - Add Virtual Channel to save/restore support (Alex Williamson)
    - Never treat a VF as a multifunction device (Alex Williamson)
    - Add pci_try_reset_function(), et al (Alex Williamson)

  AER
    - Ignore non-PCIe error sources (Betty Dall)
    - Support ACPI HEST error sources for domains other than 0 (Betty Dall)
    - Consolidate HEST error source parsers (Bjorn Helgaas)
    - Add a TLP header print helper (Borislav Petkov)

  Freescale i.MX6
    - Remove unnecessary code (Fabio Estevam)
    - Make reset-gpio optional (Marek Vasut)
    - Report "link up" only after link training completes (Marek Vasut)
    - Start link in Gen1 before negotiating for Gen2 mode (Marek Vasut)
    - Fix PCIe startup code (Richard Zhu)

  Marvell MVEBU
    - Remove duplicate of_clk_get_by_name() call (Andrew Lunn)
    - Drop writes to bridge Secondary Status register (Jason Gunthorpe)
    - Obey bridge PCI_COMMAND_MEM and PCI_COMMAND_IO bits (Jason Gunthorpe)
    - Support a bridge with no IO port window (Jason Gunthorpe)
    - Use max_t() instead of max(resource_size_t,) (Jingoo Han)
    - Remove redundant of_match_ptr (Sachin Kamat)
    - Call pci_ioremap_io() at startup instead of dynamically (Thomas Petazzoni)

  NVIDIA Tegra
    - Disable Gen2 for Tegra20 and Tegra30 (Eric Brower)

  Renesas R-Car
    - Add runtime PM support (Valentine Barshak)
    - Fix rcar_pci_probe() return value check (Wei Yongjun)

  Synopsys DesignWare
    - Fix crash in dw_msi_teardown_irq() (Bjørn Erik Nilsen)
    - Remove redundant call to pci_write_config_word() (Bjørn Erik Nilsen)
    - Fix missing MSI IRQs (Harro Haan)
    - Add dw_pcie prefix before cfg_read/write (Pratyush Anand)
    - Fix I/O transfers by using CPU (not realio) address (Pratyush Anand)
    - Whitespace cleanup (Jingoo Han)

  EISA
    - Call put_device() if device_register() fails (Levente Kurusa)
    - Revert EISA initialization breakage ((Bjorn Helgaas)

  Miscellaneous
    - Remove unused code, including PCIe 3.0 interfaces (Stephen Hemminger)
    - Prevent bus conflicts while checking for bridge apertures (Bjorn Helgaas)
    - Stop clearing bridge Secondary Status when setting up I/O aperture (Bjorn Helgaas)
    - Use dev_is_pci() to identify PCI devices (Yijing Wang)
    - Deprecate DEFINE_PCI_DEVICE_TABLE (Joe Perches)
    - Update documentation 00-INDEX (Erik Ekman)"

* tag 'pci-v3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (119 commits)
  Revert "EISA: Initialize device before its resources"
  Revert "EISA: Log device resources in dmesg"
  vfio-pci: Use pci "try" reset interface
  PCI: Check parent kobject in pci_destroy_dev()
  xen/pcifront: Use global PCI rescan-remove locking
  powerpc/eeh: Use global PCI rescan-remove locking
  PCI: Fix pci_check_and_unmask_intx() comment typos
  PCI: Add pci_try_reset_function(), pci_try_reset_slot(), pci_try_reset_bus()
  MPT / PCI: Use pci_stop_and_remove_bus_device_locked()
  platform / x86: Use global PCI rescan-remove locking
  PCI: hotplug: Use global PCI rescan-remove locking
  pcmcia: Use global PCI rescan-remove locking
  ACPI / hotplug / PCI: Use global PCI rescan-remove locking
  ACPI / PCI: Use global PCI rescan-remove locking in PCI root hotplug
  PCI: Add global pci_lock_rescan_remove()
  PCI: Cleanup pci.h whitespace
  PCI: Reorder so actual code comes before stubs
  PCI/AER: Support ACPI HEST AER error sources for PCI domains other than 0
  ACPICA: Add helper macros to extract bus/segment numbers from HEST table.
  PCI: Make local functions static
  ...
2014-01-22 16:39:28 -08:00
Santosh Shilimkar
5b6e529521 x86: memblock: set current limit to max low memory address
The memblock current limit value is used to limit early boot memory
allocations below max low memory address by default, as the kernel can
access only to the low memory.

Hence, set memblock current limit value to the max mapped low memory
address instead of max mapped memory address.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Paul Walmsley <paul@pwsan.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:46 -08:00
Linus Torvalds
15c8102620 Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x32 uapi changes from Peter Anvin:
 "This is the first few of a set of patches by H.J.  Lu to make the
  kernel uapi headers usable for x32, as required by some non-glibc
  libcs.

  These particular patches make the stat and statfs structures usable"

* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, x32: Use __kernel_long_t for __statfs_word
  x86, x32: Use __kernel_long_t/__kernel_ulong_t in x86-64 stat.h
2014-01-20 15:12:49 -08:00
Linus Torvalds
c9cdd9a6ae Merge branch 'x86/mpx' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpufeature and mpx updates from Peter Anvin:
 "This includes the basic infrastructure for MPX (Memory Protection
  Extensions) support, but does not include MPX support itself.  It is,
  however, a prerequisite for KVM support for MPX, which I believe will
  be pushed later this merge window by the KVM team.

  This includes moving the functionality in
  futex_atomic_cmpxchg_inatomic() into a new function in uaccess.h so it
  can be reused - this will be used by the final MPX patches.

  The actual MPX functionality (map management and so on) will be pushed
  in a future merge window, when ready"

* 'x86/mpx' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel/mpx: Remove unused LWP structure
  x86, mpx: Add MPX related opcodes to the x86 opcode map
  x86: replace futex_atomic_cmpxchg_inatomic() with user_atomic_cmpxchg_inatomic
  x86: add user_atomic_cmpxchg_inatomic at uaccess.h
  x86, xsave: Support eager-only xsave features, add MPX support
  x86, cpufeature: Define the Intel MPX feature flag
2014-01-20 14:46:32 -08:00
Linus Torvalds
f4bcd8ccdd Merge branch 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kernel address space randomization support from Peter Anvin:
 "This enables kernel address space randomization for x86"

* 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET
  x86, kaslr: Remove unused including <linux/version.h>
  x86, kaslr: Use char array to gain sizeof sanity
  x86, kaslr: Add a circular multiply for better bit diffusion
  x86, kaslr: Mix entropy sources together as needed
  x86/relocs: Add percpu fixup for GNU ld 2.23
  x86, boot: Rename get_flags() and check_flags() to *_cpuflags()
  x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64
  x86, kaslr: Report kernel offset on panic
  x86, kaslr: Select random position from e820 maps
  x86, kaslr: Provide randomness functions
  x86, kaslr: Return location from decompress_kernel
  x86, boot: Move CPU flags out of cpucheck
  x86, relocs: Add more per-cpu gold special cases
2014-01-20 14:45:50 -08:00
H.J. Lu
386916598e x86, uapi, x32: Use __kernel_ulong_t in x86 struct semid64_ds
Both x32 and x86-64 use the same struct semid64_ds for system calls.
But x32 long is 32-bit. This patch replaces unsigned long with
__kernel_ulong_t in x86 struct semid64_ds.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/r/1388182464-28428-7-git-send-email-hjl.tools@gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-01-20 14:45:13 -08:00
Linus Torvalds
7fe67a1180 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull leftover x86 fixes from Ingo Molnar:
 "Two leftover fixes that did not make it into v3.13"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Add check for number of available vectors before CPU down
  x86, cpu, amd: Add workaround for family 16h, erratum 793
2014-01-20 12:11:41 -08:00
Linus Torvalds
74e8ee8262 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Intel SoC changes from Ingo Molnar:
 "Improved Intel SoC platform support"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, tsc, apic: Unbreak static (MSR) calibration when CONFIG_X86_LOCAL_APIC=n
  x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs
  arch: x86: New MailBox support driver for Intel SOC's
2014-01-20 12:09:31 -08:00
Linus Torvalds
8bd6964cbd Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "A cleanup, a fix and ASLR support for hugetlb mappings"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/numa: Fix 32-bit kernel NUMA boot
  x86/mm: Implement ASLR for hugetlb mappings
  x86/mm: Unify pte_to_pgoff() and pgoff_to_pte() helpers
2014-01-20 12:08:51 -08:00
Linus Torvalds
2bb2c5e235 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loader updates from Ingo Molnar:
 "There are two main changes in this tree:

   - AMD microcode early loading fixes
   - some microcode loader source files reorganization"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode: Move to a proper location
  x86, microcode, AMD: Fix early ucode loading
  x86, microcode: Share native MSR accessing variants
  x86, ramdisk: Export relocated ramdisk VA
2014-01-20 12:07:54 -08:00
Linus Torvalds
4500cf60db Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Intel MID updates from Ingo Molnar:
 "This tree improves Intel MID (Mobile Internet Device) platform
  support:

   - Merrifield platform support (David Cohen)
   - Clovertrail platform support (Kuppuswamy Sathyanarayanan)
   - Various cleanups and fixes (David Cohen)"

* 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, intel_mid: Replace memcpy with struct assignment
  x86, intel-mid: Return proper error code from get_gpio_by_name()
  x86, intel-mid: Check get_gpio_by_name() error code on platform code
  x86, intel-mid: sfi_handle_*_dev() should check for pdata error code
  x86, intel-mid: Remove deprecated X86_MDFLD and X86_WANT_INTEL_MID configs
  x86, intel-mid: Add Merrifield platform support
  x86, intel-mid: Add Clovertrail platform support
  x86, intel-mid: Move Medfield code out of intel-mid.c core file
2014-01-20 12:06:50 -08:00
Linus Torvalds
972d5e7e5b Merge branch 'x86-efi-kexec-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI changes from Ingo Molnar:
 "This consists of two main parts:

   - New static EFI runtime services virtual mapping layout which is
     groundwork for kexec support on EFI (Borislav Petkov)

   - EFI kexec support itself (Dave Young)"

* 'x86-efi-kexec-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/efi: parse_efi_setup() build fix
  x86: ksysfs.c build fix
  x86/efi: Delete superfluous global variables
  x86: Reserve setup_data ranges late after parsing memmap cmdline
  x86: Export x86 boot_params to sysfs
  x86: Add xloadflags bit for EFI runtime support on kexec
  x86/efi: Pass necessary EFI data for kexec via setup_data
  efi: Export EFI runtime memory mapping to sysfs
  efi: Export more EFI table variables to sysfs
  x86/efi: Cleanup efi_enter_virtual_mode() function
  x86/efi: Fix off-by-one bug in EFI Boot Services reservation
  x86/efi: Add a wrapper function efi_map_region_fixed()
  x86/efi: Remove unused variables in __map_region()
  x86/efi: Check krealloc return value
  x86/efi: Runtime services virtual mapping
  x86/mm/cpa: Map in an arbitrary pgd
  x86/mm/pageattr: Add last levels of error path
  x86/mm/pageattr: Add a PUD error unwinding path
  x86/mm/pageattr: Add a PTE pagetable populating function
  x86/mm/pageattr: Add a PMD pagetable populating function
  ...
2014-01-20 12:05:30 -08:00
Linus Torvalds
5d4863e4cc Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 TLB detection update from Ingo Molnar:
 "A single change that extends our TLB cache size detection+reporting
  code"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cpu: Detect more TLB configuration
2014-01-20 12:04:45 -08:00
Linus Torvalds
2a0fede97f Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cpu, amd: Fix a shadowed variable situation
  um, x86: Fix vDSO build
  x86: Delete non-required instances of include <linux/init.h>
  x86, realmode: Pointer walk cleanups, pull out invariant use of __pa()
  x86/traps: Clean up error exception handler definitions
2014-01-20 12:03:57 -08:00
Linus Torvalds
06bc0f4a2e Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/build changes from Ingo Molnar:
 "Misc smaller improvements"

* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, boot: Move intcall() to the .inittext section
  x86, boot: Use .code16 instead of .code16gcc
  x86, sparse: Do not force removal of __user when calling copy_to/from_user_nocheck()
2014-01-20 12:03:21 -08:00
Linus Torvalds
4cd4156994 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/asm changes from Ingo Molnar:
 "Misc optimizations"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Slightly tweak the access_ok() C variant for better code
  x86: Replace assembly access_ok() with a C variant
  x86-64, copy_user: Use leal to produce 32-bit results
  x86-64, copy_user: Remove zero byte check before copy user buffer.
2014-01-20 11:51:00 -08:00
Ingo Molnar
741e3902cd x86/intel/mpx: Remove unused LWP structure
We don't support LWP yet, don't give the impression that we do:
represent the LWP state as opaque 128 bytes, the way Linux sees it
currently.

Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-ecarmjtfKpanpAapfck6dj6g@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-20 19:57:39 +01:00
Linus Torvalds
a0fa1dd3cd Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:

 - Add the initial implementation of SCHED_DEADLINE support: a real-time
   scheduling policy where tasks that meet their deadlines and
   periodically execute their instances in less than their runtime quota
   see real-time scheduling and won't miss any of their deadlines.
   Tasks that go over their quota get delayed (Available to privileged
   users for now)

 - Clean up and fix preempt_enable_no_resched() abuse all around the
   tree

 - Do sched_clock() performance optimizations on x86 and elsewhere

 - Fix and improve auto-NUMA balancing

 - Fix and clean up the idle loop

 - Apply various cleanups and fixes

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  sched: Fix __sched_setscheduler() nice test
  sched: Move SCHED_RESET_ON_FORK into attr::sched_flags
  sched: Fix up attr::sched_priority warning
  sched: Fix up scheduler syscall LTP fails
  sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls
  sched/core: Fix htmldocs warnings
  sched/deadline: No need to check p if dl_se is valid
  sched/deadline: Remove unused variables
  sched/deadline: Fix sparse static warnings
  m68k: Fix build warning in mac_via.h
  sched, thermal: Clean up preempt_enable_no_resched() abuse
  sched, net: Fixup busy_loop_us_clock()
  sched, net: Clean up preempt_enable_no_resched() abuse
  sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
  sched/preempt, locking: Rework local_bh_{dis,en}able()
  sched/clock, x86: Avoid a runtime condition in native_sched_clock()
  sched/clock: Fix up clear_sched_clock_stable()
  sched/clock, x86: Use a static_key for sched_clock_stable
  sched/clock: Remove local_irq_disable() from the clocks
  sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
  ...
2014-01-20 10:42:08 -08:00
Linus Torvalds
2cc3f16cad Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull IRQ changes from Ingo Molnar:
 "The only change in this cycle is a CPU hotplug related spurious
  warning fix"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Fix kbuild warning in smp_irq_move_cleanup_interrupt()
  x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs
2014-01-20 10:27:52 -08:00
Linus Torvalds
6ffbe7d1fa Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking changes from Ingo Molnar:
 - futex performance increases: larger hashes, smarter wakeups
 - mutex debugging improvements
 - lots of SMP ordering documentation updates
 - introduce the smp_load_acquire(), smp_store_release() primitives.
   (There are WIP patches that make use of them - not yet merged)
 - lockdep micro-optimizations
 - lockdep improvement: better cover IRQ contexts
 - liblockdep at last. We'll continue to monitor how useful this is

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
  futexes: Fix futex_hashsize initialization
  arch: Re-sort some Kbuild files to hopefully help avoid some conflicts
  futexes: Avoid taking the hb->lock if there's nothing to wake up
  futexes: Document multiprocessor ordering guarantees
  futexes: Increase hash table size for better performance
  futexes: Clean up various details
  arch: Introduce smp_load_acquire(), smp_store_release()
  arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h
  arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h
  locking/doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE
  mutexes: Give more informative mutex warning in the !lock->owner case
  powerpc: Full barrier for smp_mb__after_unlock_lock()
  rcu: Apply smp_mb__after_unlock_lock() to preserve grace periods
  Documentation/memory-barriers.txt: Downgrade UNLOCK+BLOCK
  locking: Add an smp_mb__after_unlock_lock() for UNLOCK+BLOCK barrier
  Documentation/memory-barriers.txt: Document ACCESS_ONCE()
  Documentation/memory-barriers.txt: Prohibit speculative writes
  Documentation/memory-barriers.txt: Add long atomic examples to memory-barriers.txt
  Documentation/memory-barriers.txt: Add needed ACCESS_ONCE() calls to memory-barriers.txt
  Revert "smp/cpumask: Make CONFIG_CPUMASK_OFFSTACK=y usable without debug dependency"
  ...
2014-01-20 10:23:08 -08:00
David S. Miller
4180442058 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
	net/ipv4/tcp_metrics.c

Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.

Minor overlapping changes in bnx2x driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 00:55:41 -08:00
Jan Kiszka
cae501397a KVM: nVMX: Clean up handling of VMX-related MSRs
This simplifies the code and also stops issuing warning about writing to
unhandled MSRs when VMX is disabled or the Feature Control MSR is
locked - we do handle them all according to the spec.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-17 10:22:16 +01:00
Jan Kiszka
73aaf249ee KVM: SVM: Fix reading of DR6
In contrast to VMX, SVM dose not automatically transfer DR6 into the
VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor
hook to obtain the current value. And as SVM now picks the DR6 state
from its VMCB, we also need a set callback in order to write updates of
DR6 back.

Fixes a regression of 020df0794f.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-17 10:22:10 +01:00
Vadim Rozenfeld
e984097b55 add support for Hyper-V reference time counter
Signed-off: Peter Lieven <pl@kamp.de>
Signed-off: Gleb Natapov
Signed-off: Vadim Rozenfeld <vrozenfe@redhat.com>

After some consideration I decided to submit only Hyper-V reference
counters support this time. I will submit iTSC support as a separate
patch as soon as it is ready.

v1 -> v2
1. mark TSC page dirty as suggested by
    Eric Northup <digitaleric@google.com> and Gleb
2. disable local irq when calling get_kernel_ns,
    as it was done by Peter Lieven <pl@amp.de>
3. move check for TSC page enable from second patch
    to this one.

v3 -> v4
    Get rid of ref counter offset.

v4 -> v5
    replace __copy_to_user with kvm_write_guest
    when updateing iTSC page.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-17 10:22:08 +01:00
Bin Gao
7da7c15613 x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs
On SoCs that have the calibration MSRs available, either there is no
PIT, HPET or PMTIMER to calibrate against, or the PIT/HPET/PMTIMER is
driven from the same clock as the TSC, so calibration is redundant and
just slows down the boot.

TSC rate is caculated by this formula:
<maximum core-clock to bus-clock ratio> * <maximum resolved frequency>
The ratio and the resolved frequency ID can be obtained from MSR.
See Intel 64 and IA-32 System Programming Guid section 16.12 and 30.11.5
for details.

Signed-off-by: Bin Gao <bin.gao@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-rgm7xmg7k6qnjlw3ynkcjsmh@git.kernel.org
2014-01-15 22:28:48 -08:00
Prarit Bhargava
da6139e49c x86: Add check for number of available vectors before CPU down
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791

When a cpu is downed on a system, the irqs on the cpu are assigned to
other cpus.  It is possible, however, that when a cpu is downed there
aren't enough free vectors on the remaining cpus to account for the
vectors from the cpu that is being downed.

This results in an interesting "overflow" condition where irqs are
"assigned" to a CPU but are not handled.

For example, when downing cpus on a 1-64 logical processor system:

<snip>
[  232.021745] smpboot: CPU 61 is now offline
[  238.480275] smpboot: CPU 62 is now offline
[  245.991080] ------------[ cut here ]------------
[  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x246/0x250()
[  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
[  246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas scsi_transport_sas
[  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
[  246.044451] Hardware name: Intel Corporation S4600LH ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[  246.057371]  0000000000000009 ffff88081fa03d40 ffffffff8164fbf6 ffff88081fa0ee48
[  246.065728]  ffff88081fa03d90 ffff88081fa03d80 ffffffff81054ecc ffff88081fa13040
[  246.074073]  0000000000000000 ffff88200cce0000 0000000000000040 0000000000000000
[  246.082430] Call Trace:
[  246.085174]  <IRQ>  [<ffffffff8164fbf6>] dump_stack+0x46/0x58
[  246.091633]  [<ffffffff81054ecc>] warn_slowpath_common+0x8c/0xc0
[  246.098352]  [<ffffffff81054fb6>] warn_slowpath_fmt+0x46/0x50
[  246.104786]  [<ffffffff815710d6>] dev_watchdog+0x246/0x250
[  246.110923]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
[  246.119097]  [<ffffffff8106092a>] call_timer_fn+0x3a/0x110
[  246.125224]  [<ffffffff8106280f>] ? update_process_times+0x6f/0x80
[  246.132137]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
[  246.140308]  [<ffffffff81061db0>] run_timer_softirq+0x1f0/0x2a0
[  246.146933]  [<ffffffff81059a80>] __do_softirq+0xe0/0x220
[  246.152976]  [<ffffffff8165fedc>] call_softirq+0x1c/0x30
[  246.158920]  [<ffffffff810045f5>] do_softirq+0x55/0x90
[  246.164670]  [<ffffffff81059d35>] irq_exit+0xa5/0xb0
[  246.170227]  [<ffffffff8166062a>] smp_apic_timer_interrupt+0x4a/0x60
[  246.177324]  [<ffffffff8165f40a>] apic_timer_interrupt+0x6a/0x70
[  246.184041]  <EOI>  [<ffffffff81505a1b>] ? cpuidle_enter_state+0x5b/0xe0
[  246.191559]  [<ffffffff81505a17>] ? cpuidle_enter_state+0x57/0xe0
[  246.198374]  [<ffffffff81505b5d>] cpuidle_idle_call+0xbd/0x200
[  246.204900]  [<ffffffff8100b7ae>] arch_cpu_idle+0xe/0x30
[  246.210846]  [<ffffffff810a47b0>] cpu_startup_entry+0xd0/0x250
[  246.217371]  [<ffffffff81646b47>] rest_init+0x77/0x80
[  246.223028]  [<ffffffff81d09e8e>] start_kernel+0x3ee/0x3fb
[  246.229165]  [<ffffffff81d0989f>] ? repair_env_string+0x5e/0x5e
[  246.235787]  [<ffffffff81d095a5>] x86_64_start_reservations+0x2a/0x2c
[  246.242990]  [<ffffffff81d0969f>] x86_64_start_kernel+0xf8/0xfc
[  246.249610] ---[ end trace fb74fdef54d79039 ]---
[  246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout
[  246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter
Last login: Mon Nov 11 08:35:14 from 10.18.17.119
[root@(none) ~]# [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
[  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
[  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX

(last lines keep repeating.  ixgbe driver is dead until module reload.)

If the downed cpu has more vectors than are free on the remaining cpus on the
system, it is possible that some vectors are "orphaned" even though they are
assigned to a cpu.  In this case, since the ixgbe driver had a watchdog, the
watchdog fired and notified that something was wrong.

This patch adds a function, check_vectors(), to compare the number of vectors
on the CPU going down and compares it to the number of vectors available on
the system.  If there aren't enough vectors for the CPU to go down, an
error is returned and propogated back to userspace.

v2: Do not need to look at percpu irqs
v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
    Priority Mode
v4: Additional changes suggested by Gong Chen.
v5/v6/v7/v8: Updated comment text

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1389613861-3853-1-git-send-email-prarit@redhat.com
Reviewed-by: Gong Chen <gong.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Janet Morgan <janet.morgan@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ruiv Wang <ruiv.wang@gmail.com>
Cc: Gong Chen <gong.chen@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2014-01-15 22:24:02 -08:00
David Cohen
bc20aa48bb x86, intel-mid: Add Merrifield platform support
This code was partially based on Mark Brown's previous work.

Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1387224459-25746-4-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: Fei Yang <fei.yang@intel.com>
Cc: Mark F. Brown <mark.f.brown@intel.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-15 14:38:58 -08:00
Kuppuswamy Sathyanarayanan
85611e3feb x86, intel-mid: Add Clovertrail platform support
This patch adds Clovertrail support on intel-mid and makes it more
flexible to support other SoCs.

Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1387224459-25746-3-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-15 14:38:58 -08:00
Borislav Petkov
3b56496865 x86, cpu, amd: Add workaround for family 16h, erratum 793
This adds the workaround for erratum 793 as a precaution in case not
every BIOS implements it.  This addresses CVE-2013-6885.

Erratum text:

[Revision Guide for AMD Family 16h Models 00h-0Fh Processors,
document 51810 Rev. 3.04 November 2013]

793 Specific Combination of Writes to Write Combined Memory Types and
Locked Instructions May Cause Core Hang

Description

Under a highly specific and detailed set of internal timing
conditions, a locked instruction may trigger a timing sequence whereby
the write to a write combined memory type is not flushed, causing the
locked instruction to stall indefinitely.

Potential Effect on System

Processor core hang.

Suggested Workaround

BIOS should set MSR
C001_1020[15] = 1b.

Fix Planned

No fix planned

[ hpa: updated description, fixed typo in MSR name ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic
Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-14 16:39:07 -08:00
Borislav Petkov
5335ba5cf4 x86, microcode, AMD: Fix early ucode loading
The original idea to use the microcode cache for the APs doesn't pan out
because we do memory allocation there very early and with IRQs disabled
and we don't want to involve GFP_ATOMIC allocations. Not if it can be
helped.

Thus, extend the caching of the BSP patch approach to the APs and
iterate over the ucode in the initrd instead of using the cache. We
still save the relevant patches to it but later, right before we
jettison the initrd.

While at it, fix early ucode loading on 32-bit too.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
2014-01-13 19:59:38 +01:00
Borislav Petkov
e1b43e3f13 x86, microcode: Share native MSR accessing variants
We want to use those in AMD's early loading path too. Also, add a
native_wrmsrl variant.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
2014-01-13 19:57:27 +01:00
Borislav Petkov
5aa3d718f2 x86, ramdisk: Export relocated ramdisk VA
The ramdisk can possibly get relocated if the whole image is not mapped.
And since we're going over it in the microcode loader and fishing out
the relevant microcode patches, we want access it at its new location.
Thus, export it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
2014-01-13 19:56:25 +01:00
Peter Zijlstra
8cb75e0c4e sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
With various drivers wanting to inject idle time; we get people
calling idle routines outside of the idle loop proper.

Therefore we need to be extra careful about not missing
TIF_NEED_RESCHED -> PREEMPT_NEED_RESCHED propagations.

While looking at this, I also realized there's a small window in the
existing idle loop where we can miss TIF_NEED_RESCHED; when it hits
right after the tif_need_resched() test at the end of the loop but
right before the need_resched() test at the start of the loop.

So move preempt_fold_need_resched() out of the loop where we're
guaranteed to have TIF_NEED_RESCHED set.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-x9jgh45oeayzajz2mjt0y7d6@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 17:38:55 +01:00
Ingo Molnar
c9c8986847 Merge branch 'x86/idle' into sched/core
Merge these x86 specific bits - we are going to add generic bits as well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 17:37:05 +01:00
Peter Zijlstra
20d1c86a57 sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
Use a ring-buffer like multi-version object structure which allows
always having a coherent object; we use this to avoid having to
disable IRQs while reading sched_clock() and avoids a problem when
getting an NMI while changing the cyc2ns data.

                        MAINLINE   PRE        POST

    sched_clock_stable: 1          1          1
    (cold) sched_clock: 329841     331312     257223
    (cold) local_clock: 301773     310296     309889
    (warm) sched_clock: 38375      38247      25280
    (warm) local_clock: 100371     102713     85268
    (warm) rdtsc:       27340      27289      24247
    sched_clock_stable: 0          0          0
    (cold) sched_clock: 382634     372706     301224
    (cold) local_clock: 396890     399275     399870
    (warm) sched_clock: 38194      38124      25630
    (warm) local_clock: 143452     148698     129629
    (warm) rdtsc:       27345      27365      24307

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 15:13:06 +01:00
Peter Zijlstra
57c67da274 sched/clock, x86: Move some cyc2ns() code around
There are no __cycles_2_ns() users outside of arch/x86/kernel/tsc.c,
so move it there.

There are no cycles_2_ns() users.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-01lslnavfgo3kmbo4532zlcj@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 13:47:39 +01:00
Peter Zijlstra
5dd12c2152 sched/clock, x86: Use mul_u64_u32_shr() for native_sched_clock()
Use mul_u64_u32_shr() so that x86_64 can use a single 64x64->128 mul.

Before:

0000000000000560 <native_sched_clock>:
 560:   44 8b 1d 00 00 00 00    mov    0x0(%rip),%r11d        # 567 <native_sched_clock+0x7>
 567:   55                      push   %rbp
 568:   48 89 e5                mov    %rsp,%rbp
 56b:   45 85 db                test   %r11d,%r11d
 56e:   75 4f                   jne    5bf <native_sched_clock+0x5f>
 570:   0f 31                   rdtsc
 572:   89 c0                   mov    %eax,%eax
 574:   48 c1 e2 20             shl    $0x20,%rdx
 578:   48 c7 c1 00 00 00 00    mov    $0x0,%rcx
 57f:   48 09 c2                or     %rax,%rdx
 582:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
 589:   65 8b 04 25 00 00 00    mov    %gs:0x0,%eax
 590:   00
 591:   48 98                   cltq
 593:   48 8b 34 c5 00 00 00    mov    0x0(,%rax,8),%rsi
 59a:   00
 59b:   48 89 d0                mov    %rdx,%rax
 59e:   81 e2 ff 03 00 00       and    $0x3ff,%edx
 5a4:   48 c1 e8 0a             shr    $0xa,%rax
 5a8:   48 0f af 14 0e          imul   (%rsi,%rcx,1),%rdx
 5ad:   48 0f af 04 0e          imul   (%rsi,%rcx,1),%rax
 5b2:   5d                      pop    %rbp
 5b3:   48 03 04 3e             add    (%rsi,%rdi,1),%rax
 5b7:   48 c1 ea 0a             shr    $0xa,%rdx
 5bb:   48 01 d0                add    %rdx,%rax
 5be:   c3                      retq

After:

0000000000000550 <native_sched_clock>:
 550:   8b 3d 00 00 00 00       mov    0x0(%rip),%edi        # 556 <native_sched_clock+0x6>
 556:   55                      push   %rbp
 557:   48 89 e5                mov    %rsp,%rbp
 55a:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
 55e:   85 ff                   test   %edi,%edi
 560:   75 2c                   jne    58e <native_sched_clock+0x3e>
 562:   0f 31                   rdtsc
 564:   89 c0                   mov    %eax,%eax
 566:   48 c1 e2 20             shl    $0x20,%rdx
 56a:   48 09 c2                or     %rax,%rdx
 56d:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
 574:   00 00
 576:   89 c0                   mov    %eax,%eax
 578:   48 f7 e2                mul    %rdx
 57b:   65 48 8b 0c 25 00 00    mov    %gs:0x0,%rcx
 582:   00 00
 584:   c9                      leaveq
 585:   48 0f ac d0 0a          shrd   $0xa,%rdx,%rax
 58a:   48 01 c8                add    %rcx,%rax
 58d:   c3                      retq

                        MAINLINE   POST

    sched_clock_stable: 1	   1
    (cold) sched_clock: 329841     331312
    (cold) local_clock: 301773     310296
    (warm) sched_clock: 38375      38247
    (warm) local_clock: 100371     102713
    (warm) rdtsc:       27340      27289
    sched_clock_stable: 0          0
    (cold) sched_clock: 382634     372706
    (cold) local_clock: 396890     399275
    (warm) sched_clock: 38194      38124
    (warm) local_clock: 143452     148698
    (warm) rdtsc:       27345      27365

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-piu203ses5y1g36bnyw2n16x@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 13:47:38 +01:00
Ingo Molnar
1c62448e39 Linux 3.13-rc8
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJS0miqAAoJEHm+PkMAQRiGbfgIAJSWEfo8ludknhPcHJabBtxu
 75SQAKJlL3sBVnxEc58Rtt8gsKYQIrm4IY5Slunklsn04RxuDUIQMgFoAYR5gQwz
 +Myqkw/HOqDe5VStGxtLYpWnfglxVwGDCd7ISfL9AOVy5adMWBxh4Tv+qqQc7aIZ
 eF7dy+DD+C6Q3Z5OoV8s0FZDxse29vOf17Nki7+7t8WMqyegYwjoOqNeqocGKsPi
 eHLrJgTl4T6jB4l9LKKC154DSKjKOTSwZMWgwK8mToyNLT/ufCiKgXloIjEvZZcY
 VVKUtncdHiTf+iqVojgpGBzOEeB5DM83iiapFeDiJg8C9yBzvT8lBtA9aPb5Wgw=
 =lEeV
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc8' into core/locking

Refresh the tree with the latest fixes, before applying new changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 11:44:41 +01:00
Prarit Bhargava
9345005f4e x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs
During heavy CPU-hotplug operations the following spurious kernel warnings
can trigger:

  do_IRQ: No ... irq handler for vector (irq -1)

  [ See: https://bugzilla.kernel.org/show_bug.cgi?id=64831 ]

When downing a cpu it is possible that there are unhandled irqs
left in the APIC IRR register.  The following code path shows
how the problem can occur:

 1. CPU 5 is to go down.

 2. cpu_disable() on CPU 5 executes with interrupt flag cleared
    by local_irq_save() via stop_machine().

 3. IRQ 12 asserts on CPU 5, setting IRR but not ISR because
    interrupt flag is cleared (CPU unabled to handle the irq)

 4. IRQs are migrated off of CPU 5, and the vectors' irqs are set
    to -1. 5. stop_machine() finishes cpu_disable()

 6. cpu_die() for CPU 5 executes in normal context.

 7. CPU 5 attempts to handle IRQ 12 because the IRR is set for
    IRQ 12.  The code attempts to find the vector's IRQ and cannot
    because it has been set to -1. 8. do_IRQ() warning displays
    warning about CPU 5 IRQ 12.

I added a debug printk to output which CPU & vector was
retriggered and discovered that that we are getting bogus
events.  I see a 100% correlation between this debug printk in
fixup_irqs() and the do_IRQ() warning.

This patchset resolves this by adding definitions for
VECTOR_UNDEFINED(-1) and VECTOR_RETRIGGERED(-2) and modifying
the code to use them.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=64831
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Rui Wang <rui.y.wang@intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: janet.morgan@Intel.com
Cc: tony.luck@Intel.com
Cc: ruiv.wang@gmail.com
Link: http://lkml.kernel.org/r/1388938252-16627-1-git-send-email-prarit@redhat.com
[ Cleaned up the code a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 13:13:02 +01:00
Peter Zijlstra
47933ad41a arch: Introduce smp_load_acquire(), smp_store_release()
A number of situations currently require the heavyweight smp_mb(),
even though there is no need to order prior stores against later
loads.  Many architectures have much cheaper ways to handle these
situations, but the Linux kernel currently has no portable way
to make use of them.

This commit therefore supplies smp_load_acquire() and
smp_store_release() to remedy this situation.  The new
smp_load_acquire() primitive orders the specified load against
any subsequent reads or writes, while the new smp_store_release()
primitive orders the specifed store against any prior reads or
writes.  These primitives allow array-based circular FIFOs to be
implemented without an smp_mb(), and also allow a theoretical
hole in rcu_assign_pointer() to be closed at no additional
expense on most architectures.

In addition, the RCU experience transitioning from explicit
smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
and rcu_assign_pointer(), respectively resulted in substantial
improvements in readability.  It therefore seems likely that
replacing other explicit barriers with smp_load_acquire() and
smp_store_release() will provide similar benefits.  It appears
that roughly half of the explicit barriers in core kernel code
might be so replaced.

[Changelog by PaulMck]

Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 10:37:17 +01:00
Linus Torvalds
26bef1318a x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround
Before we do an EMMS in the AMD FXSAVE information leak workaround we
need to clear any pending exceptions, otherwise we trap with a
floating-point exception inside this code.

Reported-by: halfdog <me@halfdog.net>
Tested-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-01-11 19:15:52 -08:00
Bjorn Helgaas
96702be560 Merge branch 'pci/resource' into next
* pci/resource:
  PCI: Allocate 64-bit BARs above 4G when possible
  PCI: Enforce bus address limits in resource allocation
  PCI: Split out bridge window override of minimum allocation address
  agp/ati: Use PCI_COMMAND instead of hard-coded 4
  agp/intel: Use CPU physical address, not bus address, for ioremap()
  agp/intel: Use pci_bus_address() to get GTTADR bus address
  agp/intel: Use pci_bus_address() to get MMADR bus address
  agp/intel: Support 64-bit GMADR
  agp/intel: Rename gtt_bus_addr to gtt_phys_addr
  drm/i915: Rename gtt_bus_addr to gtt_phys_addr
  agp: Use pci_resource_start() to get CPU physical address for BAR
  agp: Support 64-bit APBASE
  PCI: Add pci_bus_address() to get bus address of a BAR
  PCI: Convert pcibios_resource_to_bus() to take a pci_bus, not a pci_dev
  PCI: Change pci_bus_region addresses to dma_addr_t
2014-01-10 14:23:15 -07:00
David E. Box
4618441536 arch: x86: New MailBox support driver for Intel SOC's
Current Intel SOC cores use a MailBox Interface (MBI) to provide access to
configuration registers on devices (called units) connected to the system
fabric. This is a support driver that implements access to this interface on
those platforms that can enumerate the device using PCI. Initial support is for
BayTrail, for which port definitons are provided. This is a requirement for
implementing platform specific features (e.g. RAPL driver requires this to
perform platform specific power management using the registers in PUNIT).
Dependant modules should select IOSF_MBI in their respective Kconfig
configuraiton. Serialized access is handled by all exported routines with
spinlocks.

The API includes 3 functions for access to unit registers:

int iosf_mbi_read(u8 port, u8 opcode, u32 offset, u32 *mdr)
int iosf_mbi_write(u8 port, u8 opcode, u32 offset, u32 mdr)
int iosf_mbi_modify(u8 port, u8 opcode, u32 offset, u32 mdr, u32 mask)

port:	indicating the unit being accessed
opcode:	the read or write port specific opcode
offset:	the register offset within the port
mdr:	the register data to be read, written, or modified
mask:	bit locations in mdr to change

Returns nonzero on error

Note: GPU code handles access to the GFX unit. Therefore access to that unit
with this driver is disallowed to avoid conflicts.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1389216471-734-1-git-send-email-david.e.box@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
2014-01-08 14:36:29 -08:00
Yinghai Lu
f75b99d5a7 PCI: Enforce bus address limits in resource allocation
When allocating space for 32-bit BARs, we previously limited RESOURCE
addresses so they would fit in 32 bits.  However, the BUS address need not
be the same as the resource address, and it's the bus address that must fit
in the 32-bit BAR.

This patch adds:

  - pci_clip_resource_to_region(), which clips a resource so it contains
    only the range that maps to the specified bus address region, e.g., to
    clip a resource to 32-bit bus addresses, and

  - pci_bus_alloc_from_region(), which allocates space for a resource from
    the specified bus address region,

and changes pci_bus_alloc_resource() to allocate space for 64-bit BARs from
the entire bus address region, and space for 32-bit BARs from only the bus
address region below 4GB.

If we had this window:

  pci_root HWP0002:0a: host bridge window [mem 0xf0180000000-0xf01fedfffff] (bus address [0x80000000-0xfedfffff])

we previously could not put a 32-bit BAR there, because the CPU addresses
don't fit in 32 bits.  This patch fixes this, so we can use this space for
32-bit BARs.

It's also possible (though unlikely) to have resources with 32-bit CPU
addresses but bus addresses above 4GB.  In this case the previous code
would allocate space that a 32-bit BAR could not map.

Remove PCIBIOS_MAX_MEM_32, which is no longer used.

[bhelgaas: reworked starting from http://lkml.kernel.org/r/1386658484-15774-3-git-send-email-yinghai@kernel.org]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-01-07 16:24:33 -07:00
Paul Gortmaker
663b55b9b3 x86: Delete non-required instances of include <linux/init.h>
None of these files are actually using any __init type directives
and hence don't need to include <linux/init.h>.  Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.

[ hpa: undid incorrect removal from arch/x86/kernel/head_32.S ]

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1389054026-12947-1-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-01-06 21:25:18 -08:00
David S. Miller
56a4342dfe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
	net/ipv6/ip6_tunnel.c
	net/ipv6/ip6_vti.c

ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.

qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 17:37:45 -05:00
Konrad Rzeszutek Wilk
efaf30a335 xen/grant: Implement an grant frame array struct (v3).
The 'xen_hvm_resume_frames' used to be an 'unsigned long'
and contain the virtual address of the grants. That was OK
for most architectures (PVHVM, ARM) were the grants are contiguous
in memory. That however is not the case for PVH - in which case
we will have to do a lookup for each virtual address for the PFN.

Instead of doing that, lets make it a structure which will contain
the array of PFNs, the virtual address and the count of said PFNs.

Also provide a generic functions: gnttab_setup_auto_xlat_frames and
gnttab_free_auto_xlat_frames to populate said structure with
appropriate values for PVHVM and ARM.

To round it off, change the name from 'xen_hvm_resume_frames' to
a more descriptive one - 'xen_auto_xlat_grant_frames'.

For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver"
we will populate the 'xen_auto_xlat_grant_frames' by ourselves.

v2 moves the xen_remap in the gnttab_setup_auto_xlat_frames
and also introduces xen_unmap for gnttab_free_auto_xlat_frames.

Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v3: Based on top of 'asm/xen/page.h: remove redundant semicolon']
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06 10:44:20 -05:00
Mukesh Rathor
fc590efe66 xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn.
Most of the functions in page.h are prefaced with
	if (xen_feature(XENFEAT_auto_translated_physmap))
		return mfn;

Except the mfn_to_local_pfn. At a first sight, the function
should work without this patch - as the 'mfn_to_mfn' has
a similar check. But there are no such check in the
'get_phys_to_machine' function - so we would crash in there.

This fixes it by following the convention of having the
check for auto-xlat in these static functions.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06 10:43:56 -05:00
Ingo Molnar
ef0b8b9a52 Linux 3.13-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJSyJVbAAoJEHm+PkMAQRiGa28H/0m7GpZSpT8mvBthITxzqWCq
 JRkSPS4KTurAWlA5CJMJePyCM30DgN90s06bYUen9sTecZUwnL+qSV5OqAmg2r+0
 PrfwtXtGZR6/Y12XlZ/3oFxVfUxjmgJyDAS76TIH1IvIum52nvJmLrR+6AyVphIX
 DkgBOuapdA7lia+U+ZM1cRkeHxUOKTUEw9v611VgoN3LYZyzyRb6d0rB7JtZN1RV
 dnXRi27enaPhwxelsCnORioRjsByMwD40CERxfLHmr5CGhmvCehBjO6bJ+KAdp14
 52bfwWcNdbFMzUobcR7qlfS3Hy3AYJci+P6JzeeZ+kWEdv/eh5/1lvNuXtBJRlc=
 =iwzJ
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc7' into x86/efi-kexec to resolve conflicts

Conflicts:
	arch/x86/platform/efi/efi.c
	drivers/firmware/efi/Kconfig

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-05 12:34:29 +01:00
Steven Rostedt
df90ca9690 x86, sparse: Do not force removal of __user when calling copy_to/from_user_nocheck()
Commit ff47ab4ff3 "x86: Add 1/2/4/8 byte optimization to 64bit
__copy_{from,to}_user_inatomic" added a "_nocheck" call in between
the copy_to/from_user() and copy_user_generic(). As both the
normal and nocheck versions of theses calls use the proper __user
annotation, a typecast to remove it should not be added.
This causes sparse to spin out the following warnings:

arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
arch/x86/include/asm/uaccess_64.h:207:47:    expected void const [noderef] <asn:1>*src
arch/x86/include/asm/uaccess_64.h:207:47:    got void const *<noident>
arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
arch/x86/include/asm/uaccess_64.h:207:47:    expected void const [noderef] <asn:1>*src
arch/x86/include/asm/uaccess_64.h:207:47:    got void const *<noident>
arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
arch/x86/include/asm/uaccess_64.h:207:47:    expected void const [noderef] <asn:1>*src
arch/x86/include/asm/uaccess_64.h:207:47:    got void const *<noident>
arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
arch/x86/include/asm/uaccess_64.h:207:47:    expected void const [noderef] <asn:1>*src
arch/x86/include/asm/uaccess_64.h:207:47:    got void const *<noident>

Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140103164500.5f6478f5@gandalf.local.home
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-04 13:54:50 -08:00
Kirill A. Shutemov
dd360393f4 x86, cpu: Detect more TLB configuration
The Intel Software Developer’s Manual covers few more TLB
configurations exposed as CPUID 2 descriptors:

61H Instruction TLB: 4 KByte pages, fully associative, 48 entries
63H Data TLB: 1 GByte pages, 4-way set associative, 4 entries
76H Instruction TLB: 2M/4M pages, fully associative, 8 entries
B5H Instruction TLB: 4KByte pages, 8-way set associative, 64 entries
B6H Instruction TLB: 4KByte pages, 8-way set associative, 128 entries
C1H Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries
C2H DTLB DTLB: 2 MByte/$MByte pages, 4-way associative, 16 entries

Let's detect them as well.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/1387801018-14499-1-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-03 14:35:42 -08:00
Dave Young
5c12af0c41 x86/efi: parse_efi_setup() build fix
In case without CONFIG_EFI, there will be below build error:

   arch/x86/built-in.o: In function `setup_arch':
  (.init.text+0x9dc): undefined reference to `parse_efi_setup'

Thus fix it by adding blank inline function in asm/efi.h
Also remove an unused declaration for variable efi_data_len.

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-01-03 14:38:18 +00:00
Dave Young
456a29ddad x86: Add xloadflags bit for EFI runtime support on kexec
Old kexec-tools can not load new kernels. The reason is kexec-tools does
not fill efi_info in x86 setup header previously, thus EFI failed to
initialize.  In new kexec-tools it will by default to fill efi_info and
pass other EFI required infomation to 2nd kernel so kexec kernel EFI
initialization can succeed finally.

To prevent from breaking userspace, add a new xloadflags bit so
kexec-tools can check the flag and switch to old logic.

Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-29 13:09:06 +00:00
Dave Young
1fec053369 x86/efi: Pass necessary EFI data for kexec via setup_data
Add a new setup_data type SETUP_EFI for kexec use.  Passing the saved
fw_vendor, runtime, config tables and EFI runtime mappings.

When entering virtual mode, directly mapping the EFI runtime regions
which we passed in previously. And skip the step to call
SetVirtualAddressMap().

Specially for HP z420 workstation we need save the smbios physical
address.  The kernel boot sequence proceeds in the following order.
Step 2 requires efi.smbios to be the physical address.  However, I found
that on HP z420 EFI system table has a virtual address of SMBIOS in step
1.  Hence, we need set it back to the physical address with the smbios
in efi_setup_data.  (When it is still the physical address, it simply
sets the same value.)

1. efi_init() - Set efi.smbios from EFI system table
2. dmi_scan_machine() - Temporary map efi.smbios to access SMBIOS table
3. efi_enter_virtual_mode() - Map EFI ranges

Tested on ovmf+qemu, lenovo thinkpad, a dell laptop and an
HP z420 workstation.

Signed-off-by: Dave Young <dyoung@redhat.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-29 13:09:05 +00:00
H. Peter Anvin
a740576a4a x86: Slightly tweak the access_ok() C variant for better code
gcc can under very specific circumstances realize that the code
sequence:

	foo += bar;
	if (foo < bar) ...

... is equivalent to a carry out from the addition.  Tweak the
implementation of access_ok() (specifically __chk_range_not_ok()) to
make it more likely that gcc will make that connection.  It isn't
fool-proof (sometimes gcc seems to think it can make better code with
lea, and ends up with a second comparison), still, but it seems to be
able to connect the two more frequently this way.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/CA%2B55aFzPBdbfKovMT8Edr4SmE2_=%2BOKJFac9XW2awegogTkVTA@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-12-27 17:02:53 -08:00
Linus Torvalds
c5fe5d8068 x86: Replace assembly access_ok() with a C variant
It turns out that the assembly variant doesn't actually produce that
good code, presumably partly because it creates a long dependency
chain with no scheduling, and partly because we cannot get a flags
result out of gcc (which could be fixed with asm goto, but it turns
out not to be worth it.)

The C code allows gcc to schedule and generate multiple (easily
predictable) branches, and as a side benefit we can really optimize
the case where the size is constant.

Link: http://lkml.kernel.org/r/CA%2B55aFzPBdbfKovMT8Edr4SmE2_=%2BOKJFac9XW2awegogTkVTA@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-12-27 16:58:17 -08:00
Dave Young
3b2664964b x86/efi: Add a wrapper function efi_map_region_fixed()
Kexec kernel will use saved runtime virtual mapping, so add a new
function efi_map_region_fixed() for directly mapping a md to md->virt.

The md is passed in from 1st kernel, the virtual addr is saved in
md->virt_addr.

Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-21 15:09:51 +00:00
H.J. Lu
b70fedc158 x86, x32: Use __kernel_long_t/__kernel_ulong_t in x86-64 stat.h
Both x32 and x86-64 use the same stat system call interface.  But x32
long is 32-bit.  This patch changes x86 uapi <asm/stat.h> to use
 __kernel_long_t/__kernel_ulong_t in x86-64 stat.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/r/CAMe9rOquPtWEro0GQ=Z95pZJ=c7GGkSHynjN4FbiB4p445x-Ng@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-12-20 16:04:35 -08:00
H. Peter Anvin
7e98b71920 x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers
Use static_cpu_has() to conditionalize the CLFLUSH workaround, and add
memory barriers around it since the documentation is explicit that
CLFLUSH is only ordered with respect to MFENCE.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/CA%2B55aFzGxcML7j8CEvQPYzh0W81uVoAAVmGctMOUZ7CZ1yYd2A@mail.gmail.com
2013-12-19 11:58:16 -08:00
Peter Zijlstra
1682425539 x86, acpi, idle: Restructure the mwait idle routines
People seem to delight in writing wrong and broken mwait idle routines;
collapse the lot.

This leaves mwait_play_dead() the sole remaining user of __mwait() and
new __mwait() users are probably doing it wrong.

Also remove __sti_mwait() as its unused.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jacob Jun Pan <jacob.jun.pan@linux.intel.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Len Brown <lenb@kernel.org>
Cc: Rui Zhang <rui.zhang@intel.com>
Acked-by: Rafael Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131212141654.616820819@infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-12-19 11:54:44 -08:00
Ingo Molnar
3331e924e7 Linux 3.13-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJSrhGrAAoJEHm+PkMAQRiGsNoH/jIK3CsQ2lbW7yRLXmfgtbzz
 i2Kep6D4SDvmaLpLYOVC8xNYTiE8jtTbSXHomwP5wMZ63MQDhBfnEWsEWqeZ9+D9
 3Q46p0QWuoBgYu2VGkoxTfygkT6hhSpwWIi3SeImbY4fg57OHiUil/+YGhORM4Qc
 K4549OCTY3sIrgmWL77gzqjRUo+pQ4C73NKqZ3+5nlOmYBZC1yugk8mFwEpQkwhK
 4NRNU760Fo+XIht/bINqRiPMddzC15p0mxvJy3cDW8bZa1tFSS9SB7AQUULBbcHL
 +2dFlFOEb5SV1sNiNPrJ0W+h2qUh2e7kPB0F8epaBppgbwVdyQoC2u4uuLV2ZN0=
 =lI2r
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc4' into x86/mm

Pick up the latest fixes before applying new patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-19 13:56:10 +01:00
Rik van Riel
2084140594 mm: fix TLB flush race between migration, and change_protection_range
There are a few subtle races, between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.

The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed.

During that time, a CPU may continue writing to the page.

This is fine most of the time, however compaction or the NUMA migration
code may come in, and migrate the page away.

When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.

This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).

The basic race looks like this:

CPU A			CPU B			CPU C

						load TLB entry
make entry PTE/PMD_NUMA
			fault on entry
						read/write old page
			start migrating page
			change PTE/PMD to new page
						read/write old page [*]
flush TLB
						reload TLB from new entry
						read/write new page
						lose data

[*] the old page may belong to a new user at this point!

The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
still be accessible if there is a TLB flush pending for the mm.

This should fix both NUMA migration and compaction.

[mgorman@suse.de: fix build]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18 19:04:51 -08:00
David S. Miller
143c905494 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/i40e/i40e_main.c
	drivers/net/macvtap.c

Both minor merge hassles, simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:42:06 -05:00
Francesco Fusco
71ae8aac3e lib: introduce arch optimized hash library
We introduce a new hashing library that is meant to be used in
the contexts where speed is more important than uniformity of the
hashed values. The hash library leverages architecture specific
implementation to achieve high performance and fall backs to
jhash() for the generic case.

On Intel-based x86 architectures, the library can exploit the crc32l
instruction, part of the Intel SSE4.2 instruction set, if the
instruction is supported by the processor. This implementation
is twice as fast as the jhash() implementation on an i7 processor.

Additional architectures, such as Arm64 provide instructions for
accelerating the computation of CRC, so they could be added as well
in follow-up work.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:27:17 -05:00
Qiaowei Ren
0ee3b6f87d x86: replace futex_atomic_cmpxchg_inatomic() with user_atomic_cmpxchg_inatomic
futex_atomic_cmpxchg_inatomic() is simply the 32-bit implementation of
user_atomic_cmpxchg_inatomic(), which in turn is simply a
generalization of the original code in
futex_atomic_cmpxchg_inatomic().

Use the newly generalized user_atomic_cmpxchg_inatomic() as the futex
implementation, too.

[ hpa: retain the inline in futex.h rather than changing it to a macro ]

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Link: http://lkml.kernel.org/r/1387002303-6620-2-git-send-email-qiaowei.ren@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
2013-12-16 09:08:13 -08:00
Qiaowei Ren
f09174c501 x86: add user_atomic_cmpxchg_inatomic at uaccess.h
This patch adds user_atomic_cmpxchg_inatomic() to use CMPXCHG
instruction against a user space address.

This generalizes the already existing futex_atomic_cmpxchg_inatomic()
so it can be used in other contexts.  This will be used in the
upcoming support for Intel MPX (Memory Protection Extensions.)

[ hpa: replaced #ifdef inside a macro with IS_ENABLED() ]

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Link: http://lkml.kernel.org/r/1387002303-6620-1-git-send-email-qiaowei.ren@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
2013-12-16 09:07:57 -08:00
DuanZhenzhong
ac8344c4c0 PCI: Drop "irq" param from *_restore_msi_irqs()
Change x86_msi.restore_msi_irqs(struct pci_dev *dev, int irq) to
x86_msi.restore_msi_irqs(struct pci_dev *dev).

restore_msi_irqs() restores multiple MSI-X IRQs, so param 'int irq' is
unneeded.  This makes code more consistent between vm and bare metal.

Dom0 MSI-X restore code can also be optimized as XEN only has a hypercall
to restore all MSI-X vectors at one time.

Tested-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-12-13 08:44:30 -07:00
Jan Kiszka
6dfacadd58 KVM: nVMX: Add support for activity state HLT
We can easily emulate the HLT activity state for L1: If it decides that
L2 shall be halted on entry, just invoke the normal emulation of halt
after switching to L2. We do not depend on specific host features to
provide this, so we can expose the capability unconditionally.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-12-12 10:49:56 +01:00
Peter Zijlstra
ba1f14fbe7 sched: Remove PREEMPT_NEED_RESCHED from generic code
While hunting a preemption issue with Alexander, Ben noticed that the
currently generic PREEMPT_NEED_RESCHED stuff is horribly broken for
load-store architectures.

We currently rely on the IPI to fold TIF_NEED_RESCHED into
PREEMPT_NEED_RESCHED, but when this IPI lands while we already have
a load for the preempt-count but before the store, the store will erase
the PREEMPT_NEED_RESCHED change.

The current preempt-count only works on load-store archs because
interrupts are assumed to be completely balanced wrt their preempt_count
fiddling; the previous preempt_count load will match the preempt_count
state after the interrupt and therefore nothing gets lost.

This patch removes the PREEMPT_NEED_RESCHED usage from generic code and
pushes it into x86 arch code; the generic code goes back to relying on
TIF_NEED_RESCHED.

Boot tested on x86_64 and compile tested on ppc64.

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-and-Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131128132641.GP10022@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-11 15:52:32 +01:00
Qiaowei Ren
e7d820a5e5 x86, xsave: Support eager-only xsave features, add MPX support
Some features, like Intel MPX, work only if the kernel uses eagerfpu
model.  So we should force eagerfpu on unless the user has explicitly
disabled it.

Add definitions for Intel MPX and add it to the supported list.

[ hpa: renamed XSTATE_FLEXIBLE to XSTATE_LAZY and added comments ]

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Link: http://lkml.kernel.org/r/9E0BE1322F2F2246BD820DA9FC397ADE014A6115@SHSMSX102.ccr.corp.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-12-06 17:17:42 -08:00
Qiaowei Ren
191f57c137 x86, cpufeature: Define the Intel MPX feature flag
Define the Intel MPX (Memory Protection Extensions) CPU feature flag
in the cpufeature list.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Link: http://lkml.kernel.org/r/1386375658-2191-2-git-send-email-qiaowei.ren@intel.com
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-12-06 10:21:44 -08:00
Linus Torvalds
53c6de5026 Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 and EFI fixes from Peter Anvin:
 "Half of these are EFI-related:

  The by far biggest change is the change to hold off the deletion of a
  sysfs entry while a backend scan is in progress.  This is to avoid
  calling kmemdup() while under a spinlock.

  The other major change is for each entry in the EFI pstore backend to
  get a unique identifier, as required by the pstore filesystem proper.

  The other changes are:

  A fix to the recent consolidation and optimization of using "asm goto"
  with read-modify-write operation, which broke the bitops; specifically
  in such a way that we could end up generating invalid code.

  A build hack to make sure we compile with -mno-sse.  icc, and most
  likely future versions of gcc, can generate SSE instructions unless we
  tell it not to.

  A comment-only patch to a change the was due in part to an unpublished
  erratum; now when the erratum is published we want to add a comment
  explaining why"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic, doc: Justification for disabling IO APIC before Local APIC
  x86, bitops: Correct the assembly constraints to testing bitops
  x86-64, build: Always pass in -mno-sse
  efi-pstore: Make efi-pstore return a unique id
  x86/efi: Fix earlyprintk off-by-one bug
  efivars, efi-pstore: Hold off deletion of sysfs entry until the scan is completed
2013-12-04 21:45:21 -08:00
H. Peter Anvin
e0f6dec35f x86, bitops: Correct the assembly constraints to testing bitops
In checkin:

0c44c2d0f4 x86: Use asm goto to implement better modify_and_test() functions

the various functions which do modify and test were unified and
optimized using "asm goto".  However, this change missed the detail
that the bitops require an "Ir" constraint rather than an "er"
constraint ("I" = integer constant from 0-31, "e" = signed 32-bit
integer constant).  This would cause code to miscompile if these
functions were used on constant bit positions 32-255 and the build to
fail if used on constant bit positions above 255.

Add the constraints as a parameter to the GEN_BINARY_RMWcc() macro to
avoid this problem.

Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/529E8719.4070202@zytor.com
2013-12-04 14:31:28 -08:00
Linus Torvalds
e321ae4c20 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc kernel and tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tools lib traceevent: Fix conversion of pointer to integer of different size
  perf/trace: Properly use u64 to hold event_id
  perf: Remove fragile swevent hlist optimization
  ftrace, perf: Avoid infinite event generation loop
  tools lib traceevent: Fix use of multiple options in processing field
  perf header: Fix possible memory leaks in process_group_desc()
  perf header: Fix bogus group name
  perf tools: Tag thread comm as overriden
2013-12-02 10:13:09 -08:00
Ingo Molnar
61d0669775 * New static EFI runtime services virtual mapping layout which is
groundwork for kexec support on EFI - Borislav Petkov
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJSfMDjAAoJEC84WcCNIz1V0LIP/2WyTbJR6bL0HXwnGLpxmxag
 v0VgnKRhypNboA3WEu4a9as6AdExqB7qsiWIipHuDSMj/vkfZgHAKTd2f1iRxmsJ
 RZxzwV6YzvsWkdXjvpCoLWKSsvQDug++BAIti1PitW6RXRjo01t3ymo/Ho1CQrpI
 hNJbB3bbihMF+uqFvdSpO0KZtZE6EtnylrfBeuo0GzqqJdTGe1MmqlWmyUlEy5JW
 ZiHV8E/xTjh3N675tWPcT9hGVfCyOXXu/kPrXsJTXrdYyZL9qgA9b8SVRLs6DctX
 wVgL9lNv4wobsmZJ5DxkYl9+TaF7rbshUeIJbzrQyMVJjb3TpXk/ZpspDMAEjL7e
 bb76c1bAx4xZuUatR/f1ykkWKAEryAhXHkvwcbIBjebW33if1MgGJLk5udJQQv6H
 j+J9ROH38MDr0Geg+pM2RnCyTz8l+q+8Mfu4Yh9TSte+ttB6fr9phs3/G+fdSUn1
 0vI627v1HWzDcBh4eZjjslzJviR8PldsGVT3EsIaOnHGtk/9FPz/7n4efph4v7+9
 yqTkLvQHxsAx7f0tR/qRpkEIQ9WTMXO0IO79OC13QTSATJSl+WSPTJM7ccqOgn+P
 h89ssBnzlwmFHkuvTi599KVHdzOZrWTsB2zROn+NnSchJ+YAY6TXwznk3/MNo9rB
 d+euVcffL4yfWZ7Bzj8Z
 =w6ff
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi

Pull EFI virtual mapping changes from Matt Fleming:

  * New static EFI runtime services virtual mapping layout which is
    groundwork for kexec support on EFI. (Borislav Petkov)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-26 12:23:04 +01:00
Linus Torvalds
26b265cd29 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 - Made x86 ablk_helper generic for ARM
 - Phase out chainiv in favour of eseqiv (affects IPsec)
 - Fixed aes-cbc IV corruption on s390
 - Added constant-time crypto_memneq which replaces memcmp
 - Fixed aes-ctr in omap-aes
 - Added OMAP3 ROM RNG support
 - Add PRNG support for MSM SoC's
 - Add and use Job Ring API in caam
 - Misc fixes

[ NOTE! This pull request was sent within the merge window, but Herbert
  has some questionable email sending setup that makes him public enemy
  #1 as far as gmail is concerned.  So most of his emails seem to be
  trapped by gmail as spam, resulting in me not seeing them.  - Linus ]

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (49 commits)
  crypto: s390 - Fix aes-cbc IV corruption
  crypto: omap-aes - Fix CTR mode counter length
  crypto: omap-sham - Add missing modalias
  padata: make the sequence counter an atomic_t
  crypto: caam - Modify the interface layers to use JR API's
  crypto: caam - Add API's to allocate/free Job Rings
  crypto: caam - Add Platform driver for Job Ring
  hwrng: msm - Add PRNG support for MSM SoC's
  ARM: DT: msm: Add Qualcomm's PRNG driver binding document
  crypto: skcipher - Use eseqiv even on UP machines
  crypto: talitos - Simplify key parsing
  crypto: picoxcell - Simplify and harden key parsing
  crypto: ixp4xx - Simplify and harden key parsing
  crypto: authencesn - Simplify key parsing
  crypto: authenc - Export key parsing helper function
  crypto: mv_cesa: remove deprecated IRQF_DISABLED
  hwrng: OMAP3 ROM Random Number Generator support
  crypto: sha256_ssse3 - also test for BMI2
  crypto: mv_cesa - Remove redundant of_match_ptr
  crypto: sahara - Remove redundant of_match_ptr
  ...
2013-11-23 16:18:25 -08:00
Linus Torvalds
82023bb7f7 More ACPI and power management updates for 3.13-rc1
- ACPI-based device hotplug fixes for issues introduced recently and
   a fix for an older error code path bug in the ACPI PCI host bridge
   driver.
 
 - Fix for recently broken OMAP cpufreq build from Viresh Kumar.
 
 - Fix for a recent hibernation regression related to s2disk.
 
 - Fix for a locking-related regression in the ACPI EC driver from
   Puneet Kumar.
 
 - System suspend error code path fix related to runtime PM and
   runtime PM documentation update from Ulf Hansson.
 
 - cpufreq's conservative governor fix from Xiaoguang Chen.
 
 - New processor IDs for intel_idle and turbostat and removal of
   an obsolete Kconfig option from Len Brown.
 
 - New device IDs for the ACPI LPSS (Low-Power Subsystem) driver and
   ACPI-based PCI hotplug (ACPIPHP) cleanup from Mika Westerberg.
 
 - Removal of several ACPI video DMI blacklist entries that are not
   necessary any more from Aaron Lu.
 
 - Rework of the ACPI companion representation in struct device and
   code cleanup related to that change from Rafael J Wysocki,
   Lan Tianyu and Jarkko Nikula.
 
 - Fixes for assigning names to ACPI-enumerated I2C and SPI devices
   from Jarkko Nikula.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJSjLYNAAoJEILEb/54YlRxkEQP/1pmFWNwSsxLtTHd+PEs0Xbo
 QccqvjQrnw/c8GcmK4eZrz6/xyuepmmjy9kfRKj2ENZniy0NEsSFqkTdSO3vYlva
 8HKWUj7MV3evhFERXAF6Tu0b4Enx4jOP7VMtmYxJo3qrSnKRUcUzc6DGv/ACsUT1
 Nkj0Lhdsg053Z+YzIXrl50w0tCDEMhVmWlMHBtYgr+dMNVnkfPBGkqMblMkKCXT2
 w/yHvauZlxQHtI+8bVqTuGgNN0CPzdlpFGiuUF+5mDf6dRX8zlSn56Ia+Wyw1k9X
 dQp4jYQOgPRo03rNKqQPDiPxUdc7T0RAHRvDB51Ncweuh5PfZGguQe71p6/LKY2W
 i6zblZ0f/vc13hTiMrP+qzKcwZvgPB5DH7SfnHr61JKV7GNFCdYAqoceS5hYMzR9
 d2Fd+txgm763IHWewXfDS/G2cU492R5qr4jpmUIACBQKWDZcqmSRDwRj83t56Ltb
 jgFBMbg4vZxG7IARhind74xsALxdhsgmFjPmx+0qPWjYxcU8otQZpXbgGNI9iOuW
 pxIQv5WPQW0tTmwO4HSuVCOwDPLPz5R0jkev7SvSj3Ek3TeD7He4LmnK055CATiC
 puq+6dp1FISPOPJYk+0DI61qN/CB/qNwRp8LU3ctZwudPVhznIE9FFQ3iN1FdBg2
 X8VDcT9t7VvVuxSBjgkj
 =QMp+
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-2-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more ACPI and power management updates from Rafael Wysocki:

 - ACPI-based device hotplug fixes for issues introduced recently and a
   fix for an older error code path bug in the ACPI PCI host bridge
   driver

 - Fix for recently broken OMAP cpufreq build from Viresh Kumar

 - Fix for a recent hibernation regression related to s2disk

 - Fix for a locking-related regression in the ACPI EC driver from
   Puneet Kumar

 - System suspend error code path fix related to runtime PM and runtime
   PM documentation update from Ulf Hansson

 - cpufreq's conservative governor fix from Xiaoguang Chen

 - New processor IDs for intel_idle and turbostat and removal of an
   obsolete Kconfig option from Len Brown

 - New device IDs for the ACPI LPSS (Low-Power Subsystem) driver and
   ACPI-based PCI hotplug (ACPIPHP) cleanup from Mika Westerberg

 - Removal of several ACPI video DMI blacklist entries that are not
   necessary any more from Aaron Lu

 - Rework of the ACPI companion representation in struct device and code
   cleanup related to that change from Rafael J Wysocki, Lan Tianyu and
   Jarkko Nikula

 - Fixes for assigning names to ACPI-enumerated I2C and SPI devices from
   Jarkko Nikula

* tag 'pm+acpi-2-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits)
  PCI / hotplug / ACPI: Drop unused acpiphp_debug declaration
  ACPI / scan: Set flags.match_driver in acpi_bus_scan_fixed()
  ACPI / PCI root: Clear driver_data before failing enumeration
  ACPI / hotplug: Fix PCI host bridge hot removal
  ACPI / hotplug: Fix acpi_bus_get_device() return value check
  cpufreq: governor: Remove fossil comment in the cpufreq_governor_dbs()
  ACPI / video: clean up DMI table for initial black screen problem
  ACPI / EC: Ensure lock is acquired before accessing ec struct members
  PM / Hibernate: Do not crash kernel in free_basic_memory_bitmaps()
  ACPI / AC: Remove struct acpi_device pointer from struct acpi_ac
  spi: Use stable dev_name for ACPI enumerated SPI slaves
  i2c: Use stable dev_name for ACPI enumerated I2C slaves
  ACPI: Provide acpi_dev_name accessor for struct acpi_device device name
  ACPI / bind: Use (put|get)_device() on ACPI device objects too
  ACPI: Eliminate the DEVICE_ACPI_HANDLE() macro
  ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node
  cpufreq: OMAP: Fix compilation error 'r & ret undeclared'
  PM / Runtime: Fix error path for prepare
  PM / Runtime: Update documentation around probe|remove|suspend
  cpufreq: conservative: set requested_freq to policy max when it is over policy max
  ...
2013-11-20 13:25:04 -08:00
Linus Torvalds
4007162647 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq cleanups from Ingo Molnar:
 "This is a multi-arch cleanup series from Thomas Gleixner, which we
  kept to near the end of the merge window, to not interfere with
  architecture updates.

  This series (motivated by the -rt kernel) unifies more aspects of IRQ
  handling and generalizes PREEMPT_ACTIVE"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  preempt: Make PREEMPT_ACTIVE generic
  sparc: Use preempt_schedule_irq
  ia64: Use preempt_schedule_irq
  m32r: Use preempt_schedule_irq
  hardirq: Make hardirq bits generic
  m68k: Simplify low level interrupt handling code
  genirq: Prevent spurious detection for unconditionally polled interrupts
2013-11-19 10:40:00 -08:00
Peter Zijlstra
d5b5f391d4 ftrace, perf: Avoid infinite event generation loop
Vince's perf-trinity fuzzer found yet another 'interesting' problem.

When we sample the irq_work_exit tracepoint with period==1 (or
PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an
infinite event generation loop:

  ,-> <IPI>
  |     irq_work_exit() ->
  |       trace_irq_work_exit() ->
  |         ...
  |           __perf_event_overflow() -> (due to fasync)
  |             irq_work_queue() -> (irq_work_list must be empty)
  '---------      arch_irq_work_raise()

Similar things can happen due to regular poll() wakeups if we exceed
the ring-buffer wakeup watermark, or have an event_limit.

To avoid this, dis-allow sampling this particular tracepoint.

In order to achieve this, create a special perf_perm function pointer
for each event and call this (when set) on trying to create a
tracepoint perf event.

[ roasted: use expr... to allow for ',' in your expression ]

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-19 16:57:40 +01:00
Kirill A. Shutemov
fd8526ad14 x86/mm: Implement ASLR for hugetlb mappings
Matthew noticed that hugetlb mappings don't participate in ASLR on x86-64:

  %  for i in `seq 3`; do
  > tools/testing/selftests/vm/map_hugetlb | grep address
  > done
  Returned address is 0x2aaaaac00000
  Returned address is 0x2aaaaac00000
  Returned address is 0x2aaaaac00000

/proc/PID/maps entries for the mapping are always the same
(except inode number):

  2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 8200              /anon_hugepage (deleted)
  2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 256               /anon_hugepage (deleted)
  2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 7180              /anon_hugepage (deleted)

The reason is the generic hugetlb_get_unmapped_area() function
which is used on x86-64.  It doesn't support randomization and
use bottom-up unmapped area lookup, instead of usual top-down
on x86-64.

x86 has arch-specific hugetlb_get_unmapped_area(), but it's used
only on x86-32.

Let's use arch-specific hugetlb_get_unmapped_area() on x86-64
too. That adds ASLR and switches hugetlb mappings to use top-down
unmapped area lookup:

  %  for i in `seq 3`; do
  > tools/testing/selftests/vm/map_hugetlb | grep address
  > done
  Returned address is 0x7f4f08a00000
  Returned address is 0x7fdda4200000
  Returned address is 0x7febe0000000

/proc/PID/maps entries:

  7f4f08a00000-7f4f18a00000 rw-p 00000000 00:0c 1168              /anon_hugepage (deleted)
  7fdda4200000-7fddb4200000 rw-p 00000000 00:0c 7092              /anon_hugepage (deleted)
  7febe0000000-7febf0000000 rw-p 00000000 00:0c 7183              /anon_hugepage (deleted)

Unmapped area lookup policy for hugetlb mappings is consistent
with normal mappings now -- the only difference is alignment
requirements for huge pages.

libhugetlbfs test-suite didn't detect any regressions with the
patch applied (although it shows few failures on my machine
regardless the patch).

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/20131119131750.EA45CE0090@blue.fi.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-19 14:24:50 +01:00
Cyrill Gorcunov
5305ca10e7 x86/mm: Unify pte_to_pgoff() and pgoff_to_pte() helpers
Use unified pte_bitop() helper to manipulate bits in pte/pgoff
bitfields and convert pte_to_pgoff()/pgoff_to_pte() to inlines.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-19 14:24:34 +01:00
Rafael J. Wysocki
6431b43097 Merge branch 'pm-tools'
* pm-tools:
  tools / power turbostat: Support Silvermont
2013-11-19 01:06:38 +01:00
Linus Torvalds
f080480488 Here are the 3.13 KVM changes. There was a lot of work on the PPC
side: the HV and emulation flavors can now coexist in a single kernel
 is probably the most interesting change from a user point of view.
 On the x86 side there are nested virtualization improvements and a
 few bugfixes.  ARM got transparent huge page support, improved
 overcommit, and support for big endian guests.
 
 Finally, there is a new interface to connect KVM with VFIO.  This
 helps with devices that use NoSnoop PCI transactions, letting the
 driver in the guest execute WBINVD instructions.  This includes
 some nVidia cards on Windows, that fail to start without these
 patches and the corresponding userspace changes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJShPAhAAoJEBvWZb6bTYbyl48P/297GgmELHAGBgjvb6q7yyGu
 L8+eHjKbh4XBAkPwyzbvUjuww5z2hM0N3JQ0BDV9oeXlO+zwwCEns/sg2Q5/NJXq
 XxnTeShaKnp9lqVBnE6G9rAOUWKoyLJ2wItlvUL8JlaO9xJ0Vmk0ta4n2Nv5GqDp
 db6UD7vju6rHtIAhNpvvAO51kAOwc01xxRixCVb7KUYOnmO9nvpixzoI/S0Rp1gu
 w/OWMfCosDzBoT+cOe79Yx1OKcpaVW94X6CH1s+ShCw3wcbCL2f13Ka8/E3FIcuq
 vkZaLBxio7vjUAHRjPObw0XBW4InXEbhI1DjzIvm8dmc4VsgmtLQkTCG8fj+jINc
 dlHQUq6Do+1F4zy6WMBUj8tNeP1Z9DsABp98rQwR8+BwHoQpGQBpAxW0TE0ZMngC
 t1caqyvjZ5pPpFUxSrAV+8Kg4AvobXPYOim0vqV7Qea07KhFcBXLCfF7BWdwq/Jc
 0CAOlsLL4mHGIQWZJuVGw0YGP7oATDCyewlBuDObx+szYCoV4fQGZVBEL0KwJx/1
 7lrLN7JWzRyw6xTgJ5VVwgYE1tUY4IFQcHu7/5N+dw8/xg9KWA3f4PeMavIKSf+R
 qteewbtmQsxUnvuQIBHLs8NRWPnBPy+F3Sc2ckeOLIe4pmfTte6shtTXcLDL+LqH
 NTmT/cfmYp2BRkiCfCiS
 =rWNf
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM changes from Paolo Bonzini:
 "Here are the 3.13 KVM changes.  There was a lot of work on the PPC
  side: the HV and emulation flavors can now coexist in a single kernel
  is probably the most interesting change from a user point of view.

  On the x86 side there are nested virtualization improvements and a few
  bugfixes.

  ARM got transparent huge page support, improved overcommit, and
  support for big endian guests.

  Finally, there is a new interface to connect KVM with VFIO.  This
  helps with devices that use NoSnoop PCI transactions, letting the
  driver in the guest execute WBINVD instructions.  This includes some
  nVidia cards on Windows, that fail to start without these patches and
  the corresponding userspace changes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
  kvm, vmx: Fix lazy FPU on nested guest
  arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu
  arm/arm64: KVM: MMIO support for BE guest
  kvm, cpuid: Fix sparse warning
  kvm: Delete prototype for non-existent function kvm_check_iopl
  kvm: Delete prototype for non-existent function complete_pio
  hung_task: add method to reset detector
  pvclock: detect watchdog reset at pvclock read
  kvm: optimize out smp_mb after srcu_read_unlock
  srcu: API for barrier after srcu read unlock
  KVM: remove vm mmap method
  KVM: IOMMU: hva align mapping page size
  KVM: x86: trace cpuid emulation when called from emulator
  KVM: emulator: cleanup decode_register_operand() a bit
  KVM: emulator: check rex prefix inside decode_register()
  KVM: x86: fix emulation of "movzbl %bpl, %eax"
  kvm_host: typo fix
  KVM: x86: emulate SAHF instruction
  MAINTAINERS: add tree for kvm.git
  Documentation/kvm: add a 00-INDEX file
  ...
2013-11-15 13:51:36 +09:00
Linus Torvalds
eda670c626 Features:
- SWIOTLB has tracing added when doing bounce buffer.
  - Xen ARM/ARM64 can use Xen-SWIOTLB. This work allows Linux to
    safely program real devices for DMA operations when running as
    a guest on Xen on ARM, without IOMMU support.*1
  - xen_raw_printk works with PVHVM guests if needed.
 Bug-fixes:
  - Make memory ballooning work under HVM with large MMIO region.
  - Inform hypervisor of MCFG regions found in ACPI DSDT.
  - Remove deprecated IRQF_DISABLED.
  - Remove deprecated __cpuinit.
 
 [*1]:
 "On arm and arm64 all Xen guests, including dom0, run with second stage
 translation enabled. As a consequence when dom0 programs a device for a
 DMA operation is going to use (pseudo) physical addresses instead
 machine addresses. This work introduces two trees to track physical to
 machine and machine to physical mappings of foreign pages. Local pages
 are assumed mapped 1:1 (physical address == machine address).  It
 enables the SWIOTLB-Xen driver on ARM and ARM64, so that Linux can
 translate physical addresses to machine addresses for dma operations
 when necessary. " (Stefano).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJSgS86AAoJEFjIrFwIi8fJpY4H/R2gke1A1p9UvTwbkaDhgPs/
 u/mkI6aH+ktgvu5QZNprki660uydtc4Ck7y8leeLGYw+ed1Ys559SJhRc/x8jBYZ
 Hh2chnplld0LAjSpdIDTTePArE1xBo4Gz+fT0zc5cVh0leJwOXn92Kx8N5AWD/T3
 gwH4Ok4K1dzZBIls7imM2AM/L1xcApcx3Dl/QpNcoePQtR4yLuPWMUbb3LM8pbUY
 0B6ZVN4GOhtJ84z8HRKnh4uMnBYmhmky6laTlHVa6L+j1fv7aAPCdNbePjIt/Pvj
 HVYB1O/ht73yHw0zGfK6lhoGG8zlu+Q7sgiut9UsGZZfh34+BRKzNTypqJ3ezQo=
 =xc43
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.13-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen updates from Konrad Rzeszutek Wilk:
 "This has tons of fixes and two major features which are concentrated
  around the Xen SWIOTLB library.

  The short <blurb> is that the tracing facility (just one function) has
  been added to SWIOTLB to make it easier to track I/O progress.
  Additionally under Xen and ARM (32 & 64) the Xen-SWIOTLB driver
  "is used to translate physical to machine and machine to physical
  addresses of foreign[guest] pages for DMA operations" (Stefano) when
  booting under hardware without proper IOMMU.

  There are also bug-fixes, cleanups, compile warning fixes, etc.

  The commit times for some of the commits is a bit fresh - that is b/c
  we wanted to make sure we have the Ack's from the ARM folks - which
  with the string of back-to-back conferences took a bit of time.  Rest
  assured - the code has been stewing in #linux-next for some time.

  Features:
   - SWIOTLB has tracing added when doing bounce buffer.
   - Xen ARM/ARM64 can use Xen-SWIOTLB.  This work allows Linux to
     safely program real devices for DMA operations when running as a
     guest on Xen on ARM, without IOMMU support. [*1]
   - xen_raw_printk works with PVHVM guests if needed.

  Bug-fixes:
   - Make memory ballooning work under HVM with large MMIO region.
   - Inform hypervisor of MCFG regions found in ACPI DSDT.
   - Remove deprecated IRQF_DISABLED.
   - Remove deprecated __cpuinit.

  [*1]:
  "On arm and arm64 all Xen guests, including dom0, run with second
   stage translation enabled.  As a consequence when dom0 programs a
   device for a DMA operation is going to use (pseudo) physical
   addresses instead machine addresses.  This work introduces two trees
   to track physical to machine and machine to physical mappings of
   foreign pages.  Local pages are assumed mapped 1:1 (physical address
   == machine address).  It enables the SWIOTLB-Xen driver on ARM and
   ARM64, so that Linux can translate physical addresses to machine
   addresses for dma operations when necessary.  " (Stefano)"

* tag 'stable/for-linus-3.13-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (32 commits)
  xen/arm: pfn_to_mfn and mfn_to_pfn return the argument if nothing is in the p2m
  arm,arm64/include/asm/io.h: define struct bio_vec
  swiotlb-xen: missing include dma-direction.h
  pci-swiotlb-xen: call pci_request_acs only ifdef CONFIG_PCI
  arm: make SWIOTLB available
  xen: delete new instances of added __cpuinit
  xen/balloon: Set balloon's initial state to number of existing RAM pages
  xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas.
  xen: remove deprecated IRQF_DISABLED
  x86/xen: remove deprecated IRQF_DISABLED
  swiotlb-xen: fix error code returned by xen_swiotlb_map_sg_attrs
  swiotlb-xen: static inline xen_phys_to_bus, xen_bus_to_phys, xen_virt_to_bus and range_straddles_page_boundary
  grant-table: call set_phys_to_machine after mapping grant refs
  arm,arm64: do not always merge biovec if we are running on Xen
  swiotlb: print a warning when the swiotlb is full
  swiotlb-xen: use xen_dma_map/unmap_page, xen_dma_sync_single_for_cpu/device
  xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device
  tracing/events: Fix swiotlb tracepoint creation
  swiotlb-xen: use xen_alloc/free_coherent_pages
  xen: introduce xen_alloc/free_coherent_pages
  ...
2013-11-15 13:34:37 +09:00
Kirill A. Shutemov
9491846fca x86, mm: enable split page table lock for PMD level
Enable PMD split page table lock for X86_64 and PAE.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:15 +09:00
Rafael J. Wysocki
7b1998116b ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node
Modify struct acpi_dev_node to contain a pointer to struct acpi_device
associated with the given device object (that is, its ACPI companion
device) instead of an ACPI handle corresponding to it.  Introduce two
new macros for manipulating that pointer in a CONFIG_ACPI-safe way,
ACPI_COMPANION() and ACPI_COMPANION_SET(), and rework the
ACPI_HANDLE() macro to take the above changes into account.
Drop the ACPI_HANDLE_SET() macro entirely and rework its users to
use ACPI_COMPANION_SET() instead.  For some of them who used to
pass the result of acpi_get_child() directly to ACPI_HANDLE_SET()
introduce a helper routine acpi_preset_companion() doing an
equivalent thing.

The main motivation for doing this is that there are things
represented by struct acpi_device objects that don't have valid
ACPI handles (so called fixed ACPI hardware features, such as
power and sleep buttons) and we would like to create platform
device objects for them and "glue" them to their ACPI companions
in the usual way (which currently is impossible due to the
lack of valid ACPI handles).  However, there are more reasons
why it may be useful.

First, struct acpi_device pointers allow of much better type checking
than void pointers which are ACPI handles, so it should be more
difficult to write buggy code using modified struct acpi_dev_node
and the new macros.  Second, the change should help to reduce (over
time) the number of places in which the result of ACPI_HANDLE() is
passed to acpi_bus_get_device() in order to obtain a pointer to the
struct acpi_device associated with the given "physical" device,
because now that pointer is returned by ACPI_COMPANION() directly.
Finally, the change should make it easier to write generic code that
will build both for CONFIG_ACPI set and unset without adding explicit
compiler directives to it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> # on Haswell
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Aaron Lu <aaron.lu@intel.com> # for ATA and SDIO part
2013-11-14 23:14:43 +01:00
Linus Torvalds
d320e203ba Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull two x86 fixes from Ingo Molnar.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/amd: Tone down printk(), don't treat a missing firmware file as an error
  x86/dumpstack: Fix printk_address for direct addresses
2013-11-14 16:55:56 +09:00
Linus Torvalds
7971e23a66 Merge branch 'x86-trace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/trace changes from Ingo Molnar:
 "This adds page fault tracepoints which have zero runtime cost in the
  disabled case via IDT trickery (no NOPs in the page fault hotpath)"

* 'x86-trace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, trace: Change user|kernel_page_fault to page_fault_user|kernel
  x86, trace: Add page fault tracepoints
  x86, trace: Delete __trace_alloc_intr_gate()
  x86, trace: Register exception handler to trace IDT
  x86, trace: Remove __alloc_intr_gate()
2013-11-14 16:25:10 +09:00
Linus Torvalds
2f466d33f5 PCI changes for the v3.13 merge window:
Resource management
     - Fix host bridge window coalescing (Alexey Neyman)
     - Pass type, width, and prefetchability for window alignment (Wei Yang)
 
   PCI device hotplug
     - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu)
 
   Power management
     - Remove pci_pm_complete() (Liu Chuansheng)
 
   MSI
     - Fail initialization if device is not in PCI_D0 (Yijing Wang)
 
   MPS (Max Payload Size)
     - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang)
     - Use pcie_set_readrq() to simplify code (Yijing Wang)
     - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang)
 
   SR-IOV
     - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas)
     - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang)
 
   Virtualization
     - Add x86 MSI masking ops (Konrad Rzeszutek Wilk)
 
   Freescale i.MX6
     - Support i.MX6 PCIe controller (Sean Cross)
     - Increase link startup timeout (Marek Vasut)
     - Probe PCIe in fs_initcall() (Marek Vasut)
     - Fix imprecise abort handler (Tim Harvey)
     - Remove redundant of_match_ptr (Sachin Kamat)
 
   Renesas R-Car
     - Support Gen2 internal PCIe controller (Valentine Barshak)
 
   Samsung Exynos
     - Add MSI support (Jingoo Han)
     - Turn off power when link fails (Jingoo Han)
     - Add Jingoo Han as maintainer (Jingoo Han)
     - Add clk_disable_unprepare() on error path (Wei Yongjun)
     - Remove redundant of_match_ptr (Sachin Kamat)
 
   Synopsys DesignWare
     - Add irq_create_mapping() (Pratyush Anand)
     - Add header guards (Seungwon Jeon)
 
   Miscellaneous
     - Enable native PCIe services by default on non-ACPI (Andrew Murray)
     - Cleanup _OSC usage and messages (Bjorn Helgaas)
     - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas)
     - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman)
     - Remove unused pci_mem_start (Myron Stowe)
     - Make sysfs functions static (Sachin Kamat)
     - Warn on invalid return from driver probe (Stephen M. Cameron)
     - Remove Intel Haswell D3 delays (Todd E Brandt)
     - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu)
     - Use pci_is_pcie() to simplify code (Yijing Wang)
     - Use PCIe capability accessors to simplify code (Yijing Wang)
     - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang)
     - Removed unused "is_pcie" from struct pci_dev (Yijing Wang)
     - Simplify sysfs CPU affinity implementation (Yijing Wang))
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSgUzsAAoJEFmIoMA60/r8wmsQAJhwmtkUYR2L4T1g9smAyjJz
 bLm5zoC6WdywFcbTpTBfsTrS1CHIQG5akRgkEXGdr99epiho5F2lwmagWsUR4ijL
 39Qn3knAUMgtNjoVXXI106h/DfTyxSmkZBfih2AQFyWobJq+0kg7hjQQA3+836b4
 8ssWr1+NSl6JJTqYQ0Paw1kSqvvYoXsu5rWFEfCHk8D0s/1bvr5ldAUpk2jTg93I
 uo9/5+O264yt1YoKZOMqAMZLUfd5DaWY1mV3yeF0Uauy1pBmol5csE8ckqJPDrES
 PRdJT1+PhBeLYWcgXANOBZsW58ddxA0pQ5jQV6VJHQWsm5cE82OBpYJf6xUZ2moV
 o6DZ0KRnCPVA3NllYYR16H+wbMfADwwO83QoA+QTIZJy/WgpDH3Cst+m8KePGqbL
 uFgDdXSws9Bs1BCFs7bfYzAM3OdkBFnn+ac7JoPXKP5ibgAp9nDlurgK2r90zRnp
 j15vHMx0mV+e8B8/iwiW5eRtg7NoCHYiNfFy7JalOlsPmYr2KFazBVKclp13Hng7
 fe/Jy6X4UhWoQPdqsy4ftvSQb0gm1MClxFJeZ3VAt6LY9j8OP6S/Vdf6lpAL85KR
 lAQoQzB+lOhTPdXxFY2xgGkITkqPDOQMjPfowYUYFwybqBuG6BHXZPJobL+niBlb
 Nh+M2WlUUA9Z3V6rWJB6
 =CTPk
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI changes from Bjorn Helgaas:
 "Resource management
    - Fix host bridge window coalescing (Alexey Neyman)
    - Pass type, width, and prefetchability for window alignment (Wei Yang)

  PCI device hotplug
    - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu)

  Power management
    - Remove pci_pm_complete() (Liu Chuansheng)

  MSI
    - Fail initialization if device is not in PCI_D0 (Yijing Wang)

  MPS (Max Payload Size)
    - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang)
    - Use pcie_set_readrq() to simplify code (Yijing Wang)
    - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang)

  SR-IOV
    - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas)
    - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang)

  Virtualization
    - Add x86 MSI masking ops (Konrad Rzeszutek Wilk)

  Freescale i.MX6
    - Support i.MX6 PCIe controller (Sean Cross)
    - Increase link startup timeout (Marek Vasut)
    - Probe PCIe in fs_initcall() (Marek Vasut)
    - Fix imprecise abort handler (Tim Harvey)
    - Remove redundant of_match_ptr (Sachin Kamat)

  Renesas R-Car
    - Support Gen2 internal PCIe controller (Valentine Barshak)

  Samsung Exynos
    - Add MSI support (Jingoo Han)
    - Turn off power when link fails (Jingoo Han)
    - Add Jingoo Han as maintainer (Jingoo Han)
    - Add clk_disable_unprepare() on error path (Wei Yongjun)
    - Remove redundant of_match_ptr (Sachin Kamat)

  Synopsys DesignWare
    - Add irq_create_mapping() (Pratyush Anand)
    - Add header guards (Seungwon Jeon)

  Miscellaneous
    - Enable native PCIe services by default on non-ACPI (Andrew Murray)
    - Cleanup _OSC usage and messages (Bjorn Helgaas)
    - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas)
    - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman)
    - Remove unused pci_mem_start (Myron Stowe)
    - Make sysfs functions static (Sachin Kamat)
    - Warn on invalid return from driver probe (Stephen M. Cameron)
    - Remove Intel Haswell D3 delays (Todd E Brandt)
    - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu)
    - Use pci_is_pcie() to simplify code (Yijing Wang)
    - Use PCIe capability accessors to simplify code (Yijing Wang)
    - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang)
    - Removed unused "is_pcie" from struct pci_dev (Yijing Wang)
    - Simplify sysfs CPU affinity implementation (Yijing Wang)"

* tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (79 commits)
  PCI: Enable upstream bridges even for VFs on virtual buses
  PCI: Add pci_upstream_bridge()
  PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()
  PCI: Warn on driver probe return value greater than zero
  PCI: Drop warning about drivers that don't use pci_set_master()
  PCI: Workaround missing pci_set_master in pci drivers
  powerpc/pci: Use pci_is_pcie() to simplify code [fix]
  PCI: Update pcie_ports 'auto' behavior for non-ACPI platforms
  PCI: imx6: Probe the PCIe in fs_initcall()
  PCI: Add R-Car Gen2 internal PCI support
  PCI: imx6: Remove redundant of_match_ptr
  PCI: Report pci_pme_active() kmalloc failure
  mn10300/PCI: Remove useless pcibios_last_bus
  frv/PCI: Remove pcibios_last_bus
  PCI: imx6: Increase link startup timeout
  PCI: exynos: Remove redundant of_match_ptr
  PCI: imx6: Fix imprecise abort handler
  PCI: Fail MSI/MSI-X initialization if device is not in PCI_D0
  PCI: imx6: Remove redundant dev_err() in imx6_pcie_probe()
  x86/PCI: Coalesce multiple overlapping host bridge windows
  ...
2013-11-14 14:02:00 +09:00
Linus Torvalds
f9300eaaac ACPI and power management updates for 3.13-rc1
- New power capping framework and the the Intel Running Average Power
    Limit (RAPL) driver using it from Srinivas Pandruvada and Jacob Pan.
 
  - Addition of the in-kernel switching feature to the arm_big_little
    cpufreq driver from Viresh Kumar and Nicolas Pitre.
 
  - cpufreq support for iMac G5 from Aaro Koskinen.
 
  - Baytrail processors support for intel_pstate from Dirk Brandewie.
 
  - cpufreq support for Midway/ECX-2000 from Mark Langsdorf.
 
  - ARM vexpress/TC2 cpufreq support from Sudeep KarkadaNagesha.
 
  - ACPI power management support for the I2C and SPI bus types from
    Mika Westerberg and Lv Zheng.
 
  - cpufreq core fixes and cleanups from Viresh Kumar, Srivatsa S Bhat,
    Stratos Karafotis, Xiaoguang Chen, Lan Tianyu.
 
  - cpufreq drivers updates (mostly fixes and cleanups) from Viresh Kumar,
    Aaro Koskinen, Jungseok Lee, Sudeep KarkadaNagesha, Lukasz Majewski,
    Manish Badarkhe, Hans-Christian Egtvedt, Evgeny Kapaev.
 
  - intel_pstate updates from Dirk Brandewie and Adrian Huang.
 
  - ACPICA update to version 20130927 includig fixes and cleanups and
    some reduction of divergences between the ACPICA code in the kernel
    and ACPICA upstream in order to improve the automatic ACPICA patch
    generation process.  From Bob Moore, Lv Zheng, Tomasz Nowicki,
    Naresh Bhat, Bjorn Helgaas, David E Box.
 
  - ACPI IPMI driver fixes and cleanups from Lv Zheng.
 
  - ACPI hotplug fixes and cleanups from Bjorn Helgaas, Toshi Kani,
    Zhang Yanfei, Rafael J Wysocki.
 
  - Conversion of the ACPI AC driver to the platform bus type and
    multiple driver fixes and cleanups related to ACPI from Zhang Rui.
 
  - ACPI processor driver fixes and cleanups from Hanjun Guo, Jiang Liu,
    Bartlomiej Zolnierkiewicz, Mathieu Rhéaume, Rafael J Wysocki.
 
  - Fixes and cleanups and new blacklist entries related to the ACPI
    video support from Aaron Lu, Felipe Contreras, Lennart Poettering,
    Kirill Tkhai.
 
  - cpuidle core cleanups from Viresh Kumar and Lorenzo Pieralisi.
 
  - cpuidle drivers fixes and cleanups from Daniel Lezcano, Jingoo Han,
    Bartlomiej Zolnierkiewicz, Prarit Bhargava.
 
  - devfreq updates from Sachin Kamat, Dan Carpenter, Manish Badarkhe.
 
  - Operation Performance Points (OPP) core updates from Nishanth Menon.
 
  - Runtime power management core fix from Rafael J Wysocki and update
    from Ulf Hansson.
 
  - Hibernation fixes from Aaron Lu and Rafael J Wysocki.
 
  - Device suspend/resume lockup detection mechanism from Benoit Goby.
 
  - Removal of unused proc directories created for various ACPI drivers
    from Lan Tianyu.
 
  - ACPI LPSS driver fix and new device IDs for the ACPI platform scan
    handler from Heikki Krogerus and Jarkko Nikula.
 
  - New ACPI _OSI blacklist entry for Toshiba NB100 from Levente Kurusa.
 
  - Assorted fixes and cleanups related to ACPI from Andy Shevchenko,
    Al Stone, Bartlomiej Zolnierkiewicz, Colin Ian King, Dan Carpenter,
    Felipe Contreras, Jianguo Wu, Lan Tianyu, Yinghai Lu, Mathias Krause,
    Liu Chuansheng.
 
  - Assorted PM fixes and cleanups from Andy Shevchenko, Thierry Reding,
    Jean-Christophe Plagniol-Villard.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJSfPKLAAoJEILEb/54YlRxH6YQAJwDKi25RCZziFSIenXuqzC/
 c6JxoH/tSnDHJHhcTgqh7H7Raa+zmatMDf0m2oEv2Wjfx4Lt4BQK4iefhe/zY4lX
 yJ8uXDg+U8DYhDX2XwbwnFpd1M1k/A+s2gIHDTHHGnE0kDngXdd8RAFFktBmooTZ
 l5LBQvOrTlgX/ZfqI/MNmQ6lfY6kbCABGSHV1tUUsDA6Kkvk/LAUTOMSmptv1q22
 hcs6k55vR34qADPkUX5GghjmcYJv+gNtvbDEJUjcmCwVoPWouF415m7R5lJ8w3/M
 49Q8Tbu5HELWLwca64OorS8qh/P7sgUOf1BX5IDzHnJT+TGeDfvcYbMv2Z275/WZ
 /bqhuLuKBpsHQ2wvEeT+lYV3FlifKeTf1FBxER3ApjzI3GfpmVVQ+dpEu8e9hcTh
 ZTPGzziGtoIsHQ0unxb+zQOyt1PmIk+cU4IsKazs5U20zsVDMcKzPrb19Od49vMX
 gCHvRzNyOTqKWpE83Ss4NGOVPAG02AXiXi/BpuYBHKDy6fTH/liKiCw5xlCDEtmt
 lQrEbupKpc/dhCLo5ws6w7MZzjWJs2eSEQcNR4DlR++pxIpYOOeoPTXXrghgZt2X
 mmxZI2qsJ7GAvPzII8OBeF3CRO3fabZ6Nez+M+oEZjGe05ZtpB3ccw410HwieqBn
 dYpJFt/BHK189odhV9CM
 =JCxk
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management updates from Rafael J Wysocki:

 - New power capping framework and the the Intel Running Average Power
   Limit (RAPL) driver using it from Srinivas Pandruvada and Jacob Pan.

 - Addition of the in-kernel switching feature to the arm_big_little
   cpufreq driver from Viresh Kumar and Nicolas Pitre.

 - cpufreq support for iMac G5 from Aaro Koskinen.

 - Baytrail processors support for intel_pstate from Dirk Brandewie.

 - cpufreq support for Midway/ECX-2000 from Mark Langsdorf.

 - ARM vexpress/TC2 cpufreq support from Sudeep KarkadaNagesha.

 - ACPI power management support for the I2C and SPI bus types from Mika
   Westerberg and Lv Zheng.

 - cpufreq core fixes and cleanups from Viresh Kumar, Srivatsa S Bhat,
   Stratos Karafotis, Xiaoguang Chen, Lan Tianyu.

 - cpufreq drivers updates (mostly fixes and cleanups) from Viresh
   Kumar, Aaro Koskinen, Jungseok Lee, Sudeep KarkadaNagesha, Lukasz
   Majewski, Manish Badarkhe, Hans-Christian Egtvedt, Evgeny Kapaev.

 - intel_pstate updates from Dirk Brandewie and Adrian Huang.

 - ACPICA update to version 20130927 includig fixes and cleanups and
   some reduction of divergences between the ACPICA code in the kernel
   and ACPICA upstream in order to improve the automatic ACPICA patch
   generation process.  From Bob Moore, Lv Zheng, Tomasz Nowicki, Naresh
   Bhat, Bjorn Helgaas, David E Box.

 - ACPI IPMI driver fixes and cleanups from Lv Zheng.

 - ACPI hotplug fixes and cleanups from Bjorn Helgaas, Toshi Kani, Zhang
   Yanfei, Rafael J Wysocki.

 - Conversion of the ACPI AC driver to the platform bus type and
   multiple driver fixes and cleanups related to ACPI from Zhang Rui.

 - ACPI processor driver fixes and cleanups from Hanjun Guo, Jiang Liu,
   Bartlomiej Zolnierkiewicz, Mathieu Rhéaume, Rafael J Wysocki.

 - Fixes and cleanups and new blacklist entries related to the ACPI
   video support from Aaron Lu, Felipe Contreras, Lennart Poettering,
   Kirill Tkhai.

 - cpuidle core cleanups from Viresh Kumar and Lorenzo Pieralisi.

 - cpuidle drivers fixes and cleanups from Daniel Lezcano, Jingoo Han,
   Bartlomiej Zolnierkiewicz, Prarit Bhargava.

 - devfreq updates from Sachin Kamat, Dan Carpenter, Manish Badarkhe.

 - Operation Performance Points (OPP) core updates from Nishanth Menon.

 - Runtime power management core fix from Rafael J Wysocki and update
   from Ulf Hansson.

 - Hibernation fixes from Aaron Lu and Rafael J Wysocki.

 - Device suspend/resume lockup detection mechanism from Benoit Goby.

 - Removal of unused proc directories created for various ACPI drivers
   from Lan Tianyu.

 - ACPI LPSS driver fix and new device IDs for the ACPI platform scan
   handler from Heikki Krogerus and Jarkko Nikula.

 - New ACPI _OSI blacklist entry for Toshiba NB100 from Levente Kurusa.

 - Assorted fixes and cleanups related to ACPI from Andy Shevchenko, Al
   Stone, Bartlomiej Zolnierkiewicz, Colin Ian King, Dan Carpenter,
   Felipe Contreras, Jianguo Wu, Lan Tianyu, Yinghai Lu, Mathias Krause,
   Liu Chuansheng.

 - Assorted PM fixes and cleanups from Andy Shevchenko, Thierry Reding,
   Jean-Christophe Plagniol-Villard.

* tag 'pm+acpi-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (386 commits)
  cpufreq: conservative: fix requested_freq reduction issue
  ACPI / hotplug: Consolidate deferred execution of ACPI hotplug routines
  PM / runtime: Use pm_runtime_put_sync() in __device_release_driver()
  ACPI / event: remove unneeded NULL pointer check
  Revert "ACPI / video: Ignore BIOS initial backlight value for HP 250 G1"
  ACPI / video: Quirk initial backlight level 0
  ACPI / video: Fix initial level validity test
  intel_pstate: skip the driver if ACPI has power mgmt option
  PM / hibernate: Avoid overflow in hibernate_preallocate_memory()
  ACPI / hotplug: Do not execute "insert in progress" _OST
  ACPI / hotplug: Carry out PCI root eject directly
  ACPI / hotplug: Merge device hot-removal routines
  ACPI / hotplug: Make acpi_bus_hot_remove_device() internal
  ACPI / hotplug: Simplify device ejection routines
  ACPI / hotplug: Fix handle_root_bridge_removal()
  ACPI / hotplug: Refuse to hot-remove all objects with disabled hotplug
  ACPI / scan: Start matching drivers after trying scan handlers
  ACPI: Remove acpi_pci_slot_init() headers from internal.h
  ACPI / blacklist: fix name of ThinkPad Edge E530
  PowerCap: Fix build error with option -Werror=format-security
  ...

Conflicts:
	arch/arm/mach-omap2/opp.c
	drivers/Kconfig
	drivers/spi/spi.c
2013-11-14 13:41:48 +09:00
Thomas Gleixner
00d1a39e69 preempt: Make PREEMPT_ACTIVE generic
No point in having this bit defined by architecture.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130917183629.090698799@linutronix.de
2013-11-13 20:21:47 +01:00
Linus Torvalds
5cbb3d216e Merge branch 'akpm' (patches from Andrew Morton)
Merge first patch-bomb from Andrew Morton:
 "Quite a lot of other stuff is banked up awaiting further
  next->mainline merging, but this batch contains:

   - Lots of random misc patches
   - OCFS2
   - Most of MM
   - backlight updates
   - lib/ updates
   - printk updates
   - checkpatch updates
   - epoll tweaking
   - rtc updates
   - hfs
   - hfsplus
   - documentation
   - procfs
   - update gcov to gcc-4.7 format
   - IPC"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (269 commits)
  ipc, msg: fix message length check for negative values
  ipc/util.c: remove unnecessary work pending test
  devpts: plug the memory leak in kill_sb
  ./Makefile: export initial ramdisk compression config option
  init/Kconfig: add option to disable kernel compression
  drivers: w1: make w1_slave::flags long to avoid memory corruption
  drivers/w1/masters/ds1wm.cuse dev_get_platdata()
  drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
  drivers/memstick/core/mspro_block.c: fix attributes array allocation
  drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
  kernel/panic.c: reduce 1 byte usage for print tainted buffer
  gcov: reuse kbasename helper
  kernel/gcov/fs.c: use pr_warn()
  kernel/module.c: use pr_foo()
  gcov: compile specific gcov implementation based on gcc version
  gcov: add support for gcc 4.7 gcov format
  gcov: move gcov structs definitions to a gcc version specific file
  kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
  kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
  kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
  ...
2013-11-13 15:45:43 +09:00
Linus Torvalds
c08acff054 Merge branch 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu changes from Tejun Heo:
 "Two smallish changes for percpu.  Two patches to remove unused
  this_cpu_xor() and one to fix a bug in percpu init failure path so
  that it can reach the proper BUG() instead of oopsing earlier"

* 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  x86: remove this_cpu_xor() implementation
  percpu: remove this_cpu_xor() implementation
  percpu: fix bootmem error handling in pcpu_page_first_chunk()
2013-11-13 15:17:16 +09:00
Vineet Gupta
c375f15a43 x86: move fpu_counter into ARCH specific thread_struct
Only a couple of arches (sh/x86) use fpu_counter in task_struct so it can
be moved out into ARCH specific thread_struct, reducing the size of
task_struct for other arches.

Compile tested i386_defconfig + gcc 4.7.3

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mundt <paul.mundt@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:13 +09:00
Len Brown
144b44b135 tools / power turbostat: Support Silvermont
Support the next generation Intel Atom processor
mirco-architecture, formerly called Silvermont.

The server version, formerly called "Avoton",
is named the "Intel(R) Atom(TM) Processor C2000 Product Family".

The client version, formerly called "Bay Trail",
is named the "Intel Atom Processor Z3000 Series",
as well as various "Intel Pentium Processor"
and "Intel Celeron Processor" brands, depending
on form-factor.

Silvermont has a set of MSRs not far off from NHM,
but the RAPL register set is a sub-set of those previously supported.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-11-12 23:16:02 +01:00
Jiri Slaby
5f01c98859 x86/dumpstack: Fix printk_address for direct addresses
Consider a kernel crash in a module, simulated the following way:

 static int my_init(void)
 {
         char *map = (void *)0x5;
         *map = 3;
         return 0;
 }
 module_init(my_init);

When we turn off FRAME_POINTERs, the very first instruction in
that function causes a BUG. The problem is that we print IP in
the BUG report using %pB (from printk_address). And %pB
decrements the pointer by one to fix printing addresses of
functions with tail calls.

This was added in commit 71f9e59800 ("x86, dumpstack: Use
%pB format specifier for stack trace") to fix the call stack
printouts.

So instead of correct output:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
  IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173]

We get:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
  IP: [<ffffffffa0152000>] 0xffffffffa0151fff

To fix that, we use %pS only for stack addresses printouts (via
newly added printk_stack_address) and %pB for regs->ip (via
printk_address). I.e. we revert to the old behaviour for all
except call stacks. And since from all those reliable is 1, we
remove that parameter from printk_address.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: joe@perches.com
Cc: jirislaby@gmail.com
Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-12 21:06:06 +01:00
Linus Torvalds
10d0c9705e DeviceTree updates for 3.13. This is a bit larger pull request than
usual for this cycle with lots of clean-up.
 
 - Cross arch clean-up and consolidation of early DT scanning code.
 - Clean-up and removal of arch prom.h headers. Makes arch specific
   prom.h optional on all but Sparc.
 - Addition of interrupts-extended property for devices connected to
   multiple interrupt controllers.
 - Refactoring of DT interrupt parsing code in preparation for deferred
   probe of interrupts.
 - ARM cpu and cpu topology bindings documentation.
 - Various DT vendor binding documentation updates.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0
 R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi
 huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20
 PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN
 2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa
 Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8=
 =GCbY
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:
 "DeviceTree updates for 3.13.  This is a bit larger pull request than
  usual for this cycle with lots of clean-up.

   - Cross arch clean-up and consolidation of early DT scanning code.
   - Clean-up and removal of arch prom.h headers.  Makes arch specific
     prom.h optional on all but Sparc.
   - Addition of interrupts-extended property for devices connected to
     multiple interrupt controllers.
   - Refactoring of DT interrupt parsing code in preparation for
     deferred probe of interrupts.
   - ARM cpu and cpu topology bindings documentation.
   - Various DT vendor binding documentation updates"

* tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
  powerpc: add missing explicit OF includes for ppc
  dt/irq: add empty of_irq_count for !OF_IRQ
  dt: disable self-tests for !OF_IRQ
  of: irq: Fix interrupt-map entry matching
  MIPS: Netlogic: replace early_init_devtree() call
  of: Add Panasonic Corporation vendor prefix
  of: Add Chunghwa Picture Tubes Ltd. vendor prefix
  of: Add AU Optronics Corporation vendor prefix
  of/irq: Fix potential buffer overflow
  of/irq: Fix bug in interrupt parsing refactor.
  of: set dma_mask to point to coherent_dma_mask
  of: add vendor prefix for PHYTEC Messtechnik GmbH
  DT: sort vendor-prefixes.txt
  of: Add vendor prefix for Cadence
  of: Add empty for_each_available_child_of_node() macro definition
  arm/versatile: Fix versatile irq specifications.
  of/irq: create interrupts-extended property
  microblaze/pci: Drop PowerPC-ism from irq parsing
  of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
  of/irq: Use irq_of_parse_and_map()
  ...
2013-11-12 16:52:17 +09:00
Linus Torvalds
9b66bfb280 Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 UV debug changes from Ingo Molnar:
 "Various SGI UV debuggability improvements, amongst them KDB support,
  with related core KDB enabling patches changing kernel/debug/kdb/"

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "x86/UV: Add uvtrace support"
  x86/UV: Add call to KGDB/KDB from NMI handler
  kdb: Add support for external NMI handler to call KGDB/KDB
  x86/UV: Check for alloc_cpumask_var() failures properly in uv_nmi_setup()
  x86/UV: Add uvtrace support
  x86/UV: Add kdump to UV NMI handler
  x86/UV: Add summary of cpu activity to UV NMI handler
  x86/UV: Update UV support for external NMI signals
  x86/UV: Move NMI support
2013-11-12 12:01:14 +09:00
Linus Torvalds
c2136301e4 Merge branch 'x86-uaccess-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 uaccess changes from Ingo Molnar:
 "A single change that micro-optimizes __copy_*_user_inatomic(), used by
  the futex code"

* 'x86-uaccess-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic
2013-11-12 11:46:06 +09:00
Linus Torvalds
340286cd4e Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS changes from Ingo Molnar:
 "The biggest change adds support for Intel 'CPER' (UEFI Common Platform
  Error Record) error logging, which builds upon an enhanced error
  logging mechanism available on Xeon processors.

  Full description is here:

    http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html

  This change provides a module (and support code) to check for an
  extended error log and prints extra details about the error on the
  console"

* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ACPI, x86: Fix extended error log driver to depend on CONFIG_X86_LOCAL_APIC
  dmi: Avoid unaligned memory access in save_mem_devices()
  Move cper.c from drivers/acpi/apei to drivers/firmware/efi
  EDAC, GHES: Update ghes error record info
  ACPI, APEI, CPER: Cleanup CPER memory error output format
  ACPI, APEI, CPER: Enhance memory reporting capability
  ACPI, APEI, CPER: Add UEFI 2.4 support for memory error
  DMI: Parse memory device (type 17) in SMBIOS
  ACPI, x86: Extended error log driver for x86 platform
  bitops: Introduce a more generic BITMASK macro
  ACPI, CPER: Update cper info
  ACPI, APEI, CPER: Fix status check during error printing
2013-11-12 11:16:44 +09:00
Linus Torvalds
dba538ff56 Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/intel-mid changes from Ingo Molnar:
 "Update the 'intel mid' (mobile internet device) platform code as Intel
  is rolling out more SoC designs.

  This gets rid of most of the 'MRST' platform code in the process,
  mostly by renaming and shuffling code around into their respective
  'intel-mid' platform drivers"

* 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, intel-mid: Do not re-introduce usage of obsolete __cpuinit
  intel_mid: Move platform device setups to their own platform_<device>.* files
  x86: intel-mid: Add section for sfi device table
  intel-mid: sfi: Allow struct devs_id.get_platform_data to be NULL
  intel_mid: Moved SFI related code to sfi.c
  intel_mid: Added custom handler for ipc devices
  intel_mid: Added custom device_handler support
  intel_mid: Refactored sfi_parse_devs() function
  intel_mid: Renamed *mrst* to *intel_mid*
  pci: intel_mid: Return true/false in function returning bool
  intel_mid: Renamed *mrst* to *intel_mid*
  mrst: Fixed indentation issues
  mrst: Fixed printk/pr_* related issues
2013-11-12 11:12:22 +09:00
Linus Torvalds
2dc1733fd4 Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/hyperv changes from Ingo Molnar:
 "These changes enable Linux guests to boot as 'Modern VM' guest kernels
  on MS-Hyperv hosts"

* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, hyperv: Move a variable to avoid an unused variable warning
  x86, hyperv: Fix build error due to missing <asm/apic.h> include
  x86, hyperv: Correctly guard the local APIC calibration code
  x86, hyperv: Get the local APIC timer frequency from the hypervisor
2013-11-12 11:07:49 +09:00
Linus Torvalds
69019d77c7 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI changes from Ingo Molnar:
 "Main changes:

   - Add support for earlyprintk=efi which uses the EFI framebuffer.
     Very useful for debugging boot problems.

   - EFI stub support for large memory maps (more than 128 entries)

   - EFI ARM support - this was mostly done by generalizing x86 <-> ARM
     platform differences, such as by moving x86 EFI code into
     drivers/firmware/efi/ and sharing it with ARM.

   - Documentation updates

   - misc fixes"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  x86/efi: Add EFI framebuffer earlyprintk support
  boot, efi: Remove redundant memset()
  x86/efi: Fix config_table_type array termination
  x86 efi: bugfix interrupt disabling sequence
  x86: EFI stub support for large memory maps
  efi: resolve warnings found on ARM compile
  efi: Fix types in EFI calls to match EFI function definitions.
  efi: Renames in handle_cmdline_files() to complete generalization.
  efi: Generalize handle_ramdisks() and rename to handle_cmdline_files().
  efi: Allow efi_free() to be called with size of 0
  efi: use efi_get_memory_map() to get final map for x86
  efi: generalize efi_get_memory_map()
  efi: Rename __get_map() to efi_get_memory_map()
  efi: Move unicode to ASCII conversion to shared function.
  efi: Generalize relocate_kernel() for use by other architectures.
  efi: Move relocate_kernel() to shared file.
  efi: Enforce minimum alignment of 1 page on allocations.
  efi: Rename memory allocation/free functions
  efi: Add system table pointer argument to shared functions.
  efi: Move common EFI stub code from x86 arch code to common location
  ...
2013-11-12 10:48:30 +09:00
Linus Torvalds
014d595c23 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot changes from Ingo Molnar:
 "Two changes that prettify and compactify the SMP bootup output from:

     smpboot: Booting Node   0, Processors  #1 #2 #3 OK
     smpboot: Booting Node   1, Processors  #4 #5 #6 #7 OK
     smpboot: Booting Node   2, Processors  #8 #9 #10 #11 OK
     smpboot: Booting Node   3, Processors  #12 #13 #14 #15 OK
     Brought up 16 CPUs

  to something like:

     x86: Booting SMP configuration:
     .... node  #0, CPUs:        #1  #2  #3
     .... node  #1, CPUs:    #4  #5  #6  #7
     .... node  #2, CPUs:    #8  #9 #10 #11
     .... node  #3, CPUs:   #12 #13 #14 #15
     x86: Booted up 4 nodes, 16 CPUs"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Further compress CPUs bootup message
  x86: Improve the printout of the SMP bootup CPU table
2013-11-12 10:41:10 +09:00
Linus Torvalds
ae795fe760 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 user access changes from Ingo Molnar:
 "This tree contains two copy_[from/to]_user() build time checking
  changes/enhancements from Jan Beulich.

  The desired outcome is to get better compiler warnings with
  CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y, to keep people from
  introducing bugs such as overflows and information leaks"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Unify copy_to_user() and add size checking to it
  x86: Unify copy_from_user() size checking
2013-11-12 10:39:08 +09:00
Linus Torvalds
39cf275a1a Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The main changes in this cycle are:

   - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
     van Riel, Peter Zijlstra et al.  Yay!

   - optimize preemption counter handling: merge the NEED_RESCHED flag
     into the preempt_count variable, by Peter Zijlstra.

   - wait.h fixes and code reorganization from Peter Zijlstra

   - cfs_bandwidth fixes from Ben Segall

   - SMP load-balancer cleanups from Peter Zijstra

   - idle balancer improvements from Jason Low

   - other fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
  ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
  stop_machine: Fix race between stop_two_cpus() and stop_cpus()
  sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
  sched: Fix asymmetric scheduling for POWER7
  sched: Move completion code from core.c to completion.c
  sched: Move wait code from core.c to wait.c
  sched: Move wait.c into kernel/sched/
  sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
  sched: Avoid throttle_cfs_rq() racing with period_timer stopping
  sched: Guarantee new group-entities always have weight
  sched: Fix hrtimer_cancel()/rq->lock deadlock
  sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
  sched: Fix race on toggling cfs_bandwidth_used
  sched: Remove extra put_online_cpus() inside sched_setaffinity()
  sched/rt: Fix task_tick_rt() comment
  sched/wait: Fix build breakage
  sched/wait: Introduce prepare_to_wait_event()
  sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
  sched: Remove get_online_cpus() usage
  sched: Fix race in migrate_swap_stop()
  ...
2013-11-12 10:20:12 +09:00
Ingo Molnar
b5dfcb09de Revert "x86/UV: Add uvtrace support"
This reverts commit 8eba18428a.

uv_trace() is not used by anything, nor is uv_trace_nmi_func, nor
uv_trace_func.

That's not how we do instrumentation code in the kernel: we add
tracepoints, printk()s, etc. so that everyone not just those with
magic kernel modules can debug a system.

So remove this unused (and misguied) piece of code.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/n/tip-tumfBffmr4jmnt8Gyxanoblg@git.kernel.org
2013-11-11 19:53:42 +01:00
H. Peter Anvin
a4f61dec55 x86, trace: Change user|kernel_page_fault to page_fault_user|kernel
Tracepoints are named hierachially, and it makes more sense to keep a
general flow of information level from general to specific from left
to right, i.e.

	x86_exceptions.page_fault_user|kernel

rather than

	x86_exceptions.user|kernel_page_fault

Suggested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/20131111082955.GB12405@gmail.com
2013-11-11 08:15:40 -08:00
Seiji Aguchi
d34603b07c x86, trace: Add page fault tracepoints
This patch introduces page fault tracepoints to x86 architecture
by switching IDT.

  Two events, for user and kernel spaces, are introduced at the beginning
  of page fault handler for tracing.

  - User space event
    There is a request of page fault event for user space as below.

    https://lkml.kernel.org/r/1368079520-11015-2-git-send-email-fdeslaur+()+gmail+!+com
    https://lkml.kernel.org/r/1368079520-11015-1-git-send-email-fdeslaur+()+gmail+!+com

  - Kernel space event:
    When we measure an overhead in kernel space for investigating performance
    issues, we can check if it comes from the page fault events.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/52716E67.6090705@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-11-08 14:15:49 -08:00
Seiji Aguchi
ac7956e269 x86, trace: Delete __trace_alloc_intr_gate()
Currently irq vector handlers for tracing are registered in both set_intr_gate()
 and __trace_alloc_intr_gate() in alloc_intr_gate().
But, we don't need to do that twice.
So, let's delete __trace_alloc_intr_gate().

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/52716E1B.7090205@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-11-08 14:15:47 -08:00
Seiji Aguchi
25c74b10ba x86, trace: Register exception handler to trace IDT
This patch registers exception handlers for tracing to a trace IDT.

To implemented it in set_intr_gate(), this patch does followings.
 - Register the exception handlers to
   the trace IDT by prepending "trace_" to the handler's names.
 - Also, newly introduce trace_page_fault() to add tracepoints
   in a subsequent patch.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/52716DEC.5050204@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-11-08 14:15:45 -08:00
Seiji Aguchi
959c071f09 x86, trace: Remove __alloc_intr_gate()
Prepare to move set_intr_gate() into a macro by removing
__alloc_intr_gate().

The purpose is to avoid failing a kernel build after applying a
subsequent patch which changes set_intr_gate() into a macro.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/52716DB8.1080702@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-11-08 14:15:44 -08:00
Rob Herring
b5480950c6 Merge remote-tracking branch 'grant/devicetree/next' into for-next 2013-11-07 10:34:46 -06:00
Konrad Rzeszutek Wilk
0e4ccb1505 PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()
Certain platforms do not allow writes in the MSI-X BARs to setup or tear
down vector values.  To combat against the generic code trying to write to
that and either silently being ignored or crashing due to the pagetables
being marked R/O this patch introduces a platform override.

Note that we keep two separate, non-weak, functions default_mask_msi_irqs()
and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs()
and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI
code.

For Xen, which does not allow the guest to write to MSI-X tables - as the
hypervisor is solely responsible for setting the vector values - we
implement two nops.

This fixes a Xen guest crash when passing a PCI device with MSI-X to the
guest.  See the bugzilla for more details.

[bhelgaas: add bugzilla info]
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=64581
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
2013-11-06 16:32:19 -07:00
Oleg Nesterov
8a8de66c4f uprobes: Introduce arch_uprobe->ixol
Currently xol_get_insn_slot() assumes that we should simply copy
arch_uprobe->insn[] which is (ignoring arch_uprobe_analyze_insn)
just the copy of the original insn.

This is not true for arm which needs to create another insn to
execute it out-of-line.

So this patch simply adds the new member, ->ixol into the union.
This doesn't make any difference for x86 and powerpc, but arm
can divorce insn/ixol and initialize the correct xol insn in
arch_uprobe_analyze_insn().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-11-06 20:00:05 +01:00
David A. Long
3820b4d278 uprobes: Move function declarations out of arch
Move the function declarations from the arch headers to the common
header, since only the function bodies are architecture-specific.
These changes are from Vincent Rabin's uprobes patch.

[ oleg: update arch/powerpc/include/asm/uprobes.h ]

Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-11-06 19:59:37 +01:00
Josh Triplett
a890b6fefd kvm: Delete prototype for non-existent function kvm_check_iopl
The prototype for kvm_check_iopl appeared in commit
f850e2e603 ("KVM: x86 emulator: Check IOPL
level during io instruction emulation"), but the function never actually
existed.  Remove the prototype.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-11-06 11:07:09 +02:00
Josh Triplett
35a5121b58 kvm: Delete prototype for non-existent function complete_pio
complete_pio ceased to exist in commit
7972995b0c ("KVM: x86 emulator: Move
string pio emulation into emulator.c"), but the prototype remained.
Remove its prototype.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-11-06 11:06:32 +02:00
Marcelo Tosatti
d63285e94a pvclock: detect watchdog reset at pvclock read
Implement reset of kernel watchdogs at pvclock read time. This avoids
adding special code to every watchdog.

This is possible for watchdogs which measure time based on sched_clock() or
ktime_get() variants.

Suggested by Don Zickus.

Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-11-06 09:48:43 +02:00
Borislav Petkov
d2f7cbe7b2 x86/efi: Runtime services virtual mapping
We map the EFI regions needed for runtime services non-contiguously,
with preserved alignment on virtual addresses starting from -4G down
for a total max space of 64G. This way, we provide for stable runtime
services addresses across kernels so that a kexec'd kernel can still use
them.

Thus, they're mapped in a separate pagetable so that we don't pollute
the kernel namespace.

Add an efi= kernel command line parameter for passing miscellaneous
options and chicken bits from the command line.

While at it, add a chicken bit called "efi=old_map" which can be used as
a fallback to the old runtime services mapping method in case there's
some b0rkage with a particular EFI implementation (haha, it is hard to
hold up the sarcasm here...).

Also, add the UEFI RT VA space to Documentation/x86/x86_64/mm.txt.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-11-02 11:09:36 +00:00
Ingo Molnar
fb10d5b7ef Merge branch 'linus' into sched/core
Resolve cherry-picking conflicts:

Conflicts:
	mm/huge_memory.c
	mm/memory.c
	mm/mprotect.c

See this upstream merge commit for more details:

  52469b4fcd Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-01 08:24:41 +01:00
Greg Thelen
bd09d9a351 percpu: fix this_cpu_sub() subtrahend casting for unsigneds
this_cpu_sub() is implemented as negation and addition.

This patch casts the adjustment to the counter type before negation to
sign extend the adjustment.  This helps in cases where the counter type
is wider than an unsigned adjustment.  An alternative to this patch is
to declare such operations unsupported, but it seemed useful to avoid
surprises.

This patch specifically helps the following example:
  unsigned int delta = 1
  preempt_disable()
  this_cpu_write(long_counter, 0)
  this_cpu_sub(long_counter, delta)
  preempt_enable()

Before this change long_counter on a 64 bit machine ends with value
0xffffffff, rather than 0xffffffffffffffff.  This is because
this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
which is basically:
  long_counter = 0 + 0xffffffff

Also apply the same cast to:
  __this_cpu_sub()
  __this_cpu_sub_return()
  this_cpu_sub_return()

All percpu_test.ko passes, especially the following cases which
previously failed:

  l -= ui_one;
  __this_cpu_sub(long_counter, ui_one);
  CHECK(l, long_counter, -1);

  l -= ui_one;
  this_cpu_sub(long_counter, ui_one);
  CHECK(l, long_counter, -1);
  CHECK(l, long_counter, 0xffffffffffffffff);

  ul -= ui_one;
  __this_cpu_sub(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, -1);
  CHECK(ul, ulong_counter, 0xffffffffffffffff);

  ul = this_cpu_sub_return(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, 2);

  ul = __this_cpu_sub_return(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, 1);

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 14:27:03 -07:00
Alex Williamson
e0f0bbc527 kvm: Create non-coherent DMA registeration
We currently use some ad-hoc arch variables tied to legacy KVM device
assignment to manage emulation of instructions that depend on whether
non-coherent DMA is present.  Create an interface for this, adapting
legacy KVM device assignment and adding VFIO via the KVM-VFIO device.
For now we assume that non-coherent DMA is possible any time we have a
VFIO group.  Eventually an interface can be developed as part of the
VFIO external user interface to query the coherency of a group.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30 19:02:23 +01:00
Alex Williamson
d96eb2c6f4 kvm/x86: Convert iommu_flags to iommu_noncoherent
Default to operating in coherent mode.  This simplifies the logic when
we switch to a model of registering and unregistering noncoherent I/O
with KVM.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30 19:02:13 +01:00
Borislav Petkov
b51e974fcd kvm, emulator: Rename VendorSpecific flag
Call it EmulateOnUD which is exactly what we're trying to do with
vendor-specific instructions.

Rename ->only_vendor_specific_insn to something shorter, while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30 18:54:40 +01:00
Borislav Petkov
1ce19dc16c kvm, emulator: Use opcode length
Add a field to the current emulation context which contains the
instruction opcode length. This will streamline handling of opcodes of
different length.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30 18:54:39 +01:00
Borislav Petkov
9c15bb1d0a kvm: Add KVM_GET_EMULATED_CPUID
Add a kvm ioctl which states which system functionality kvm emulates.
The format used is that of CPUID and we return the corresponding CPUID
bits set for which we do emulate functionality.

Make sure ->padding is being passed on clean from userspace so that we
can use it for something in the future, after the ioctl gets cast in
stone.

s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at
it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30 18:54:39 +01:00
Matt Fleming
72548e836b x86/efi: Add EFI framebuffer earlyprintk support
It's incredibly difficult to diagnose early EFI boot issues without
special hardware because earlyprintk=vga doesn't work on EFI systems.

Add support for writing to the EFI framebuffer, via earlyprintk=efi,
which will actually give users a chance of providing debug output.

Cc: H. Peter Anvin <hpa@zytor.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-10-28 18:09:58 +00:00
Rafael J. Wysocki
ce6bceabae Merge branch 'powercap'
* powercap:
  PowerCap: Convert class code to use dev_groups
  PowerCap: Introduce Intel RAPL power capping driver
  bitops: Introduce BIT_ULL
  x86 / msr: add 64bit _on_cpu access functions
  PowerCap: Add to drivers Kconfig and Makefile
  PowerCap: Add class driver
  PowerCap: Documentation
2013-10-28 01:21:49 +01:00
Rafael J. Wysocki
3f66c315f5 Merge branch 'acpi-tables'
* acpi-tables:
  ACPI / x86: Increase override tables number limit
2013-10-28 01:13:29 +01:00
Rafael J. Wysocki
3fbc4d6374 Merge branch 'acpi-processor'
* acpi-processor:
  ACPI / processor: fixed a brace coding style issue
  ACPI / processor: Remove outdated comments
  ACPI / processor: remove unnecessary if (!pr) check
  ACPI / processor: remove some dead code in acpi_processor_get_info()
  x86 / ACPI: simplify _acpi_map_lsapic()
  ACPI / processor: use apic_id and remove duplicated _MAT evaluation
  ACPI / processor: Introduce apic_id in struct processor to save parsed APIC id
2013-10-28 01:11:24 +01:00
Heiko Carstens
90f2492cf9 x86: remove this_cpu_xor() implementation
Remove the unused x86 implementation of this_cpu_xor().

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-10-27 09:03:46 -04:00
Jan Beulich
7a3d9b0f3a x86: Unify copy_to_user() and add size checking to it
Similarly to copy_from_user(), where the range check is to
protect against kernel memory corruption, copy_to_user() can
benefit from such checking too: Here it protects against kernel
information leaks.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/5265059502000078000FC4F6@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
2013-10-26 12:27:37 +02:00
Jan Beulich
3df7b41aa5 x86: Unify copy_from_user() size checking
Commits 4a31276930 ("x86: Turn the
copy_from_user check into an (optional) compile time warning")
and 63312b6a6f ("x86: Add a
Kconfig option to turn the copy_from_user warnings into errors")
touched only the 32-bit variant of copy_from_user(), whereas the
original commit 9f0cf4adb6 ("x86:
Use __builtin_object_size() to validate the buffer size for
copy_from_user()") also added the same code to the 64-bit one.

Further the earlier conversion from an inline WARN() to the call
to copy_from_user_overflow() went a little too far: When the
number of bytes to be copied is not a constant (e.g. [looking at
3.11] in drivers/net/tun.c:__tun_chr_ioctl() or
drivers/pci/pcie/aer/aer_inject.c:aer_inject_write()), the
compiler will always have to keep the funtion call, and hence
there will always be a warning. By using __builtin_constant_p()
we can avoid this.

And then this slightly extends the effect of
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS in that apart from
converting warnings to errors in the constant size case, it
retains the (possibly wrong) warnings in the non-constant size
case, such that if someone is prepared to get a few false
positives, (s)he'll be able to recover the current behavior
(except that these diagnostics now will never be converted to
errors).

Since the 32-bit variant (intentionally) didn't call
might_fault(), the unification results in this being called
twice now. Adding a suitable #ifdef would be the alternative if
that's a problem.

I'd like to point out though that with
__compiletime_object_size() being restricted to gcc before 4.6,
the whole construct is going to become more and more pointless
going forward. I would question however that commit
2fb0815c9e ("gcc4: disable
__compiletime_object_size for GCC 4.6+") was really necessary,
and instead this should have been dealt with as is done here
from the beginning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5265056D02000078000FC4F3@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-26 12:27:36 +02:00
Stefano Stabellini
7100b077ab xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device
Introduce xen_dma_map_page, xen_dma_unmap_page,
xen_dma_sync_single_for_cpu and xen_dma_sync_single_for_device.
They have empty implementations on x86 and ia64 but they call the
corresponding platform dma_ops function on arm and arm64.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Changes in v9:
- xen_dma_map_page return void, avoid page_to_phys.
2013-10-25 10:39:49 +00:00
Chen, Gong
4b3db708b1 ACPI, x86: Extended error log driver for x86 platform
This H/W error log driver (a.k.a eMCA driver) is implemented based on
http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html

After errors are captured, more detailed platform specific information
can be got via this new enhanced H/W error log driver. Most notably we
can track memory errors back to the DIMM slot silk screen label.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-10-23 10:09:07 -07:00
David Cohen
40a96d54ee intel_mid: Move platform device setups to their own platform_<device>.* files
As Intel rolling out more SoC's after Moorestown, we need to
re-structure the code in a way that is backward compatible and easy to
expand. This patch implements a flexible way to support multiple boards
and devices.

This patch does not add any new functional support. It just refactors
the existing code to increase the modularity and decrease the code
duplication for supporting multiple soc's and boards.

Currently intel-mid.c has both board and soc related code in one file.
This patch moves the board related code to new files and let linker
script to create SFI devite table following this:

1. Move the SFI device specific code to
   arch/x86/platform/intel-mid/device-libs/platform_<device>.*
   A new device file is added for every supported device. This code will
   get conditionally compiled by using corresponding device driver
   CONFIG option.

2. Move the device_ids location to .x86_intel_mid_dev.init section by
   using new sfi_device() macro.

This patch was based on previous code from Sathyanarayanan Kuppuswamy.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: http://lkml.kernel.org/r/1382049336-21316-13-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-17 16:41:50 -07:00
Kuppuswamy Sathyanarayanan
aeedb370e7 intel_mid: Moved SFI related code to sfi.c
Moved SFI specific parsing/handling code to sfi.c. This will enable us
to reuse our intel-mid code for platforms that supports firmware
interfaces other than SFI (like ACPI).

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: http://lkml.kernel.org/r/1382049336-21316-10-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-17 16:40:51 -07:00
Kuppuswamy Sathyanarayanan
49c72a0a8a intel_mid: Added custom handler for ipc devices
Added a custom handler for medfield based ipc devices and
moved devs_id structure defintion to header file.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: http://lkml.kernel.org/r/1382049336-21316-9-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-17 16:40:50 -07:00
Kuppuswamy Sathyanarayanan
712b6aa873 intel_mid: Renamed *mrst* to *intel_mid*
mrst is used as common name to represent all intel_mid type
soc's. But moorsetwon is just one of the intel_mid soc. So
renamed them to use intel_mid.

This patch mainly renames the variables and related
functions that uses *mrst* prefix with *intel_mid*.

To ensure that there are no functional changes, I have compared
the objdump of related files before and after rename and found
the only difference is symbol and name changes.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: http://lkml.kernel.org/r/1382049336-21316-6-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-17 16:40:47 -07:00
Kuppuswamy Sathyanarayanan
05454c26eb intel_mid: Renamed *mrst* to *intel_mid*
Following files contains code that is common to all intel mid
soc's. So renamed them as below.

mrst/mrst.c              -> intel-mid/intel-mid.c
mrst/vrtc.c              -> intel-mid/intel_mid_vrtc.c
mrst/early_printk_mrst.c -> intel-mid/intel_mid_vrtc.c
pci/mrst.c               -> pci/intel_mid_pci.c

Also, renamed the corresponding header files and made changes
to the driver files that included these header files.

To ensure that there are no functional changes, I have compared
the objdump of renamed files before and after rename and found
that the only difference is file name change.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: http://lkml.kernel.org/r/1382049336-21316-4-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-17 16:40:36 -07:00
Gleb Natapov
13acfd5715 Powerpc KVM work is based on a commit after rc4.
Merging master into next to satisfy the dependencies.

Conflicts:
	arch/arm/kvm/reset.c
2013-10-17 17:41:49 +03:00
Jacob Pan
1a6b991a98 x86 / msr: add 64bit _on_cpu access functions
Having 64-bit MSR access methods on given CPU can avoid shifting and
simplify MSR content manipulation. We already have other combinations
of rdmsrl_xxx and wrmsrl_xxx but missing the _on_cpu version.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-17 00:36:06 +02:00
Christoffer Dall
6d9d41e574 KVM: Move gfn_to_index to x86 specific code
The gfn_to_index function relies on huge page defines which either may
not make sense on systems that don't support huge pages or are defined
in an unconvenient way for other architectures.  Since this is
x86-specific, move the function to arch/x86/include/asm/kvm_host.h.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-10-14 10:11:44 +03:00
Kees Cook
6145cfe394 x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64
On 64-bit, this raises the maximum location to -1 GiB (from -1.5 GiB),
the upper limit currently, since the kernel fixmap page mappings need
to be moved to use the other 1 GiB (which would be the theoretical
limit when building with -mcmodel=kernel).

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1381450698-28710-7-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-13 03:13:13 -07:00
Kees Cook
5bfce5ef55 x86, kaslr: Provide randomness functions
Adds potential sources of randomness: RDRAND, RDTSC, or the i8254.

This moves the pre-alternatives inline rdrand function into the header so
both pieces of code can use it. Availability of RDRAND is then controlled
by CONFIG_ARCH_RANDOM, if someone wants to disable it even for kASLR.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1381450698-28710-4-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-13 03:12:12 -07:00
Ingo Molnar
88f182dd77 x86: Apply the asm_volatile_goto() compiler quirk
Apply the asm_volatile_goto() compiler quirk to the new rmwcc.h
file as well, introduced in:

   c2daa3bed5 sched, x86: Provide a per-cpu preempt_count implementation

Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-11 07:41:43 +02:00
Ingo Molnar
ec0ad3d01f Merge branch 'core/urgent' into sched/core
Merge in asm goto fix, to be able to apply the asm/rmwcc.h fix.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-11 07:39:37 +02:00
Ingo Molnar
3f0116c323 compiler/gcc4: Add quirk for 'asm goto' miscompilation bug
Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down
a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto'
constructs, as outlined here:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670

Implement a workaround suggested by Jakub Jelinek.

Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-11 07:39:14 +02:00
K. Y. Srinivasan
9e7827b5ea x86, hyperv: Get the local APIC timer frequency from the hypervisor
Hyper-V supports a mechanism for retrieving the local APIC frequency.
Use this and bypass the calibration code in the kernel . This would
allow us to boot the Linux kernel as a "modern VM" on Hyper-V where
many of the legacy devices (such as PIT) are not emulated.

I would like to thank Olaf Hering <olaf@aepfle.de>, Jan Beulich <JBeulich@suse.com> and
H. Peter Anvin <h.peter.anvin@intel.com> for their help in this effort.

In this version of the patch, I have addressed Jan's comments.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1380554932-9888-1-git-send-email-olaf@aepfle.de
Tested-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-10 11:44:12 -07:00
Arthur Chunqi Li
7854cbca81 KVM: nVMX: Fully support nested VMX preemption timer
This patch contains the following two changes:
1. Fix the bug in nested preemption timer support. If vmexit L2->L0
with some reasons not emulated by L1, preemption timer value should
be save in such exits.
2. Add support of "Save VMX-preemption timer value" VM-Exit controls
to nVMX.

With this patch, nested VMX preemption timer features are fully
supported.

Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-10 18:22:54 +02:00
Rob Herring
32df8dca50 of: remove HAVE_ARCH_DEVTREE_FIXUPS
HAVE_ARCH_DEVTREE_FIXUPS appears to always be needed except for sparc,
but it is only used for /proc/device-teee and sparc does not enable
/proc/device-tree. So this option is redundant. Remove the option and
always enable it. This has the side effect of fixing /proc/device-tree
on arches such as arm64 which failed to define this option.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
2013-10-09 20:04:08 -05:00
Rob Herring
25ff79443c of: implement pci_address_to_pio as weak function
Implement pci_address_to_pio as weak function to remove the dependency on
asm/prom.h. This is in preparation to make prom.h optional.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Grant Likely <grant.likely@linaro.org>
2013-10-09 20:04:06 -05:00
Stefano Stabellini
d6fe76c58c xen: introduce xen_alloc/free_coherent_pages
xen_swiotlb_alloc_coherent needs to allocate a coherent buffer for cpu
and devices. On native x86 is sufficient to call __get_free_pages in
order to get a coherent buffer, while on ARM (and potentially ARM64) we
need to call the native dma_ops->alloc implementation.

Introduce xen_alloc_coherent_pages to abstract the arch specific buffer
allocation.

Similarly introduce xen_free_coherent_pages to free a coherent buffer:
on x86 is simply a call to free_pages while on ARM and ARM64 is
arm_dma_ops.free.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


Changes in v7:
- rename __get_dma_ops to __generic_dma_ops;
- call __generic_dma_ops(hwdev)->alloc/free on arm64 too.

Changes in v6:
- call __get_dma_ops to get the native dma_ops pointer on arm.
2013-10-09 17:18:14 +00:00
Ingo Molnar
37bf06375c Linux 3.12-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSUc9zAAoJEHm+PkMAQRiG9DMH/AtpuAF6LlMRPjrCeuJQ1pyh
 T0IUO+CsLKO6qtM5IyweP8V6zaasNjIuW1+B6IwVIl8aOrM+M7CwRiKvpey26ldM
 I8G2ron7hqSOSQqSQs20jN2yGAqQGpYIbTmpdGLAjQ350NNNvEKthbP5SZR5PAmE
 UuIx5OGEkaOyZXvCZJXU9AZkCxbihlMSt2zFVxybq2pwnGezRUYgCigE81aeyE0I
 QLwzzMVdkCxtZEpkdJMpLILAz22jN4RoVDbXRa2XC7dA9I2PEEXI9CcLzqCsx2Ii
 8eYS+no2K5N2rrpER7JFUB2B/2X8FaVDE+aJBCkfbtwaYTV9UYLq3a/sKVpo1Cs=
 =xSFJ
 -----END PGP SIGNATURE-----

Merge tag 'v3.12-rc4' into sched/core

Merge Linux v3.12-rc4 to fix a conflict and also to refresh the tree
before applying more scheduler patches.

Conflicts:
	arch/avr32/include/asm/Kbuild

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 12:36:13 +02:00
Paolo Bonzini
8a3c1a3347 KVM: mmu: change useless int return types to void
kvm_mmu initialization is mostly filling in function pointers, there is
no way for it to fail.  Clean up unused return values.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-10-03 15:44:02 +03:00
Paolo Bonzini
d8d173dab2 KVM: mmu: remove uninteresting MMU "new_cr3" callbacks
The new_cr3 MMU callback has been a wrapper for mmu_free_roots since commit
e676505 (KVM: MMU: Force cr3 reload with two dimensional paging on mov
cr3 emulation, 2012-07-08).

The commit message mentioned that "mmu_free_roots() is somewhat of an overkill,
but fixing that is more complicated and will be done after this minimal fix".
One year has passed, and no one really felt the need to do a different fix.
Wrap the call with a kvm_mmu_new_cr3 function for clarity, but remove the
callback.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-10-03 15:43:59 +03:00
Paolo Bonzini
206260941f KVM: mmu: remove uninteresting MMU "free" callbacks
The free MMU callback has been a wrapper for mmu_free_roots since mmu_free_roots
itself was introduced (commit 17ac10a, [PATCH] KVM: MU: Special treatment
for shadow pae root pages, 2007-01-05), and has always been the same for all
MMU cases.  Remove the indirection as it is useless.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-10-03 15:43:56 +03:00
Paolo Bonzini
4344ee981e KVM: x86: only copy XSAVE state for the supported features
This makes the interface more deterministic for userspace, which can expect
(after configuring only the features it supports) to get exactly the same
state from the kernel, independent of the host CPU and kernel version.

Suggested-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-10-03 12:29:09 +03:00
Paolo Bonzini
d7876f1be4 KVM: x86: prevent setting unsupported XSAVE states
A guest can still attempt to save and restore XSAVE states even if they
have been masked in CPUID leaf 0Dh.  This usually is not visible to
the guest, but is still wrong: "Any attempt to set a reserved bit (as
determined by the contents of EAX and EDX after executing CPUID with
EAX=0DH, ECX= 0H) in XCR0 for a given processor will result in a #GP
exception".

The patch also performs the same checks as __kvm_set_xcr in KVM_SET_XSAVE.
This catches migration from newer to older kernel/processor before the
guest starts running.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-10-03 12:29:07 +03:00
Borislav Petkov
646e29a178 x86: Improve the printout of the SMP bootup CPU table
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Libin <huawei.libin@huawei.com>
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-28 10:10:26 +02:00
Peter Zijlstra
75f93fed50 sched: Revert need_resched() to look at TIF_NEED_RESCHED
Yuanhan reported a serious throughput regression in his pigz
benchmark. Using the ftrace patch I found that several idle
paths need more TLC before we can switch the generic
need_resched() over to preempt_need_resched.

The preemption paths benefit most from preempt_need_resched and
do indeed use it; all other need_resched() users don't really
care that much so reverting need_resched() back to
tif_need_resched() is the simple and safe solution.

Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: lkp@linux.intel.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130927153003.GF15690@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-28 10:04:47 +02:00
Linus Torvalds
4b97280675 Bug-fixes:
- Fix PV spinlocks triggering jump_label code bug
  - Remove extraneous code in the tpm front driver
  - Fix ballooning out of pages when non-preemptible
  - Fix deadlock when using a 32-bit initial domain with large amount of memory.
  - Add xen_nopvpsin parameter to the documentation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSQvzCAAoJEFjIrFwIi8fJyCIIAMENABapdLhrOiRdQ1Y7T5v1
 4bogPDLwpVxHzwo/vnHcNpl35/dUZrC6wQa51Bkoqq0V8o1XmjFy3SY/EBGjEAvw
 hh4qxGY0p0NNi6hKrWC8mH9u2TcluZGm1uecabkXUhl9mrAB5oBsfJdbBZ5N69gO
 QXXt0j7Xwv1APwH86T0e1Lz+lulhdw2ItXP4osYkEbRYNSaaGnuwsd0Jxcb4DeMk
 qhKgP7QMn3C7zDDaapJo1axeYQRBNEtv5M8+0wwMleX4yX1+IBRZeQTsRfMr7RB/
 8FhssWiH15xU6Gmzgi/VR8xhTEIbQh5GWsVReGf6pqIYSxGSYTvvyhm0bVRH4JI=
 =c+7u
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen fixes from Konrad Rzeszutek Wilk:
 "Bug-fixes and one update to the kernel-paramters.txt documentation.

   - Fix PV spinlocks triggering jump_label code bug
   - Remove extraneous code in the tpm front driver
   - Fix ballooning out of pages when non-preemptible
   - Fix deadlock when using a 32-bit initial domain with large amount
     of memory
   - Add xen_nopvpsin parameter to the documentation"

* tag 'stable/for-linus-3.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/spinlock: Document the xen_nopvspin parameter.
  xen/p2m: check MFN is in range before using the m2p table
  xen/balloon: don't alloc page while non-preemptible
  xen: Do not enable spinlocks before jump_label_init() has executed
  tpm: xen-tpmfront: Remove the locality sysfs attribute
  tpm: xen-tpmfront: Fix default durations
2013-09-25 15:50:53 -07:00
Yinghai Lu
bee7f9c83e ACPI / x86: Increase override tables number limit
Current ACPI tables in initrd is limited to 10, that is too small.
64 should be good enough as we have 35 sigs and could have several
SSDT.

Two problems in current code prevent us from increasing limit:
 1. The cpio file info array is put in stack, as every element is 32
    bytes, could run out of stack if we have that array size to 64.
    We can move it out from stack, make it global and put it into the
    __initdata section.
 2. early_ioremap() only can remap 256k one time. Current code maps
    10 tables at a time. If we increased that limit, the whole size
    could be more than 256k, so early_ioremap() would fail with that.
    We can map chunks one by one during copying, instead of mapping
    all of them together.

Signed-off-by: Yinghai <yinghai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 16:59:39 +02:00
David Vrabel
0160676bba xen/p2m: check MFN is in range before using the m2p table
On hosts with more than 168 GB of memory, a 32-bit guest may attempt
to grant map an MFN that is error cannot lookup in its mapping of the
m2p table.  There is an m2p lookup as part of m2p_add_override() and
m2p_remove_override().  The lookup falls off the end of the mapped
portion of the m2p and (because the mapping is at the highest virtual
address) wraps around and the lookup causes a fault on what appears to
be a user space address.

do_page_fault() (thinking it's a fault to a userspace address), tries
to lock mm->mmap_sem.  If the gntdev device is used for the grant map,
m2p_add_override() is called from from gnttab_mmap() with mm->mmap_sem
already locked.  do_page_fault() then deadlocks.

The deadlock would most commonly occur when a 64-bit guest is started
and xenconsoled attempts to grant map its console ring.

Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
mapped portion of the m2p table before accessing the table and use
this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
(which already had the correct range check).

All faults caused by accessing the non-existant parts of the m2p are
thus within the kernel address space and exception_fixup() is called
without trying to lock mm->mmap_sem.

This means that for MFNs that are outside the mapped range of the m2p
then mfn_to_pfn() will always look in the m2p overrides.  This is
correct because it must be a foreign MFN (and the PFN in the m2p in
this case is only relevant for the other domain).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
--
v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
    range as it's probably foreign.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2013-09-25 09:00:03 -04:00
Peter Zijlstra
1a338ac32c sched, x86: Optimize the preempt_schedule() call
Remove the bloat of the C calling convention out of the
preempt_enable() sites by creating an ASM wrapper which allows us to
do an asm("call ___preempt_schedule") instead.

calling.h bits by Andi Kleen

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-tk7xdi1cvvxewixzke8t8le1@git.kernel.org
[ Fixed build error. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:23:07 +02:00
Peter Zijlstra
c2daa3bed5 sched, x86: Provide a per-cpu preempt_count implementation
Convert x86 to use a per-cpu preemption count. The reason for doing so
is that accessing per-cpu variables is a lot cheaper than accessing
thread_info variables.

We still need to save/restore the actual preemption count due to
PREEMPT_ACTIVE so we place the per-cpu __preempt_count variable in the
same cache-line as the other hot __switch_to() variables such as
current_task.

NOTE: this save/restore is required even for !PREEMPT kernels as
cond_resched() also relies on preempt_count's PREEMPT_ACTIVE to ignore
task_struct::state.

Also rename thread_info::preempt_count to ensure nobody is
'accidentally' still poking at it.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-gzn5rfsf8trgjoqx8hyayy3q@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:07:57 +02:00
Peter Zijlstra
a787870924 sched, arch: Create asm/preempt.h
In order to prepare to per-arch implementations of preempt_count move
the required bits into an asm-generic header and use this for all
archs.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-h5j0c1r3e3fk015m30h8f1zx@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:07:50 +02:00
Peter Zijlstra
0c44c2d0f4 x86: Use asm goto to implement better modify_and_test() functions
Linus suggested using asm goto to get rid of the typical SETcc + TEST
instruction pair -- which also clobbers an extra register -- for our
typical modify_and_test() functions.

Because asm goto doesn't allow output fields it has to include an
unconditinal memory clobber when it changes a memory variable to force
a reload.

Luckily all atomic ops already imply a compiler barrier to go along
with their memory barrier semantics.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-0mtn9siwbeo1d33bap1422se@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 13:53:08 +02:00
Mike Travis
8eba18428a x86/UV: Add uvtrace support
This patch adds support for the uvtrace module by providing a
skeleton call to the registered trace function.  It also
provides another separate 'NMI' tracer that is triggered by the
system wide 'power nmi' command.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212501.185052551@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:04 +02:00
Mike Travis
0d12ef0c90 x86/UV: Update UV support for external NMI signals
The current UV NMI handler has not been updated for the changes
in the system NMI handler and the perf operations.  The UV NMI
handler reads an MMR in the UV Hub to check to see if the NMI
event was caused by the external 'system NMI' that the operator
can initiate on the System Mgmt Controller.

The problem arises when the perf tools are running, causing
millions of perf events per second on very large CPU count
systems.  Previously this was okay because the perf NMI handler
ran at a higher priority on the NMI call chain and if the NMI
was a perf event, it would stop calling other NMI handlers
remaining on the NMI call chain.

Now the system NMI handler calls all the handlers on the NMI
call chain including the UV NMI handler.  This causes the UV NMI
handler to read the MMRs at the same millions per second rate.
This can lead to significant performance loss and possible
system failures.  It also can cause thousands of 'Dazed and
Confused' messages being sent to the system console.  This
effectively makes perf tools unusable on UV systems.

To avoid this excessive overhead when perf tools are running,
this code has been optimized to minimize reading of the MMRs as
much as possible, by moving to the NMI_UNKNOWN notifier chain.
This chain is called only when all the users on the standard
NMI_LOCAL call chain have been called and none of them have
claimed this NMI.

There is an exception where the NMI_LOCAL notifier chain is
used.  When the perf tools are in use, it's possible that the UV
NMI was captured by some other NMI handler and then either
ignored or mistakenly processed as a perf event.  We set a
per_cpu ('ping') flag for those CPUs that ignored the initial
NMI, and then send them an IPI NMI signal.  The NMI_LOCAL
handler on each cpu does not need to read the MMR, but instead
checks the in memory flag indicating it was pinged.  There are
two module variables, 'ping_count' indicating how many requested
NMI events occurred, and 'ping_misses' indicating how many stray
NMI events.  These most likely are perf events so it shows the
overhead of the perf NMI interrupts and how many MMR reads were avoided.

This patch also minimizes the reads of the MMRs by having the
first cpu entering the NMI handler on each node set a per HUB
in-memory atomic value.  (Having a per HUB value avoids sending
lock traffic over NumaLink.)  Both types of UV NMIs from the SMI
layer are supported.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.353547733@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:02 +02:00
Mike Travis
1e019421bc x86/UV: Move NMI support
This patch moves the UV NMI support from the x2apic file to a
new separate uv_nmi.c file in preparation for the next sequence
of patches. It prevents upcoming bloat of the x2apic file, and
has the added benefit of putting the upcoming /sys/module
parameters under the name 'uv_nmi' instead of 'x2apic_uv_x',
which was obscure.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.183295611@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:02 +02:00
Jiang Liu
7e1f85f96d x86 / ACPI: simplify _acpi_map_lsapic()
In acpi_register_lapic(), it will generates a new logical cpu
number and maps to the local APIC id, this logical cpu number
can be returned to simplify _acpi_map_lsapic() implementation.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-24 01:39:40 +02:00
Ard Biesheuvel
801201aa25 crypto: move x86 to the generic version of ablk_helper
Move all users of ablk_helper under x86/ to the generic version
and delete the x86 specific version.

Acked-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-09-24 06:02:24 +10:00
Cyrill Gorcunov
fa0f281cf9 mm: make sure _PAGE_SWP_SOFT_DIRTY bit is not set on present pte
_PAGE_SOFT_DIRTY bit should never be set on present pte so add VM_BUG_ON
to catch any potential future abuse.

Also add a comment on _PAGE_SWP_SOFT_DIRTY definition explaining scope of
its usage.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:06 -07:00
Dave Hansen
6df46865ff mm: vmstats: track TLB flush stats on UP too
The previous patch doing vmstats for TLB flushes ("mm: vmstats: tlb flush
counters") effectively missed UP since arch/x86/mm/tlb.c is only compiled
for SMP.

UP systems do not do remote TLB flushes, so compile those counters out on
UP.

arch/x86/kernel/cpu/mtrr/generic.c calls __flush_tlb() directly.  This is
probably an optimization since both the mtrr code and __flush_tlb() write
cr4.  It would probably be safe to make that a flush_tlb_all() (and then
get these statistics), but the mtrr code is ancient and I'm hesitant to
touch it other than to just stick in the counters.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:09 -07:00
Linus Torvalds
442e0973e9 Merge branch 'x86/jumplabel' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 jumplabel changes from Peter Anvin:
 "One more x86 tree for this merge window.  This tree improves the
  handling of jump labels, so that most of the time we don't have to do
  a massive initial patching run.

  Furthermore, we will error out of the jump label is not what is
  expected, eg if it has been corrupted or tampered with"

* 'x86/jumplabel' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/jump-label: Show where and what was wrong on errors
  x86/jump-label: Add safety checks to jump label conversions
  x86/jump-label: Do not bother updating nops if they are correct
  x86/jump-label: Use best default nops for inital jump label calls
2013-09-10 19:43:23 -07:00
Andi Kleen
ff47ab4ff3 x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic
The 64bit __copy_{from,to}_user_inatomic always called
copy_from_user_generic, but skipped the special optimizations for 1/2/4/8
byte accesses.

This especially hurts the futex call, which accesses the 4 byte futex
user value with a complicated fast string operation in a function call,
instead of a single movl.

Use __copy_{from,to}_user for _inatomic instead to get the same
optimizations. The only problem was the might_fault() in those functions.
So move that to new wrapper and call __copy_{f,t}_user_nocheck()
from *_inatomic directly.

32bit already did this correctly by duplicating the code.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1376687844-19857-2-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-09-10 15:27:43 -07:00
Linus Torvalds
64c353864e Merge branch 'for-v3.12' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull DMA mapping update from Marek Szyprowski:
 "This contains an addition of Device Tree support for reserved memory
  regions (Contiguous Memory Allocator is one of the drivers for it) and
  changes required by the KVM extensions for PowerPC architectue"

* 'for-v3.12' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  ARM: init: add support for reserved memory defined by device tree
  drivers: of: add initialization code for dma reserved memory
  drivers: of: add function to scan fdt nodes given by path
  drivers: dma-contiguous: clean source code and prepare for device tree
2013-09-09 10:26:33 -07:00
Konrad Rzeszutek Wilk
65320fceda Linux 3.11-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJSGqS5AAoJEHm+PkMAQRiGFxEH/3VrqF6WAkcviNiW/0DCdO8k
 v6Wi7Sp5LxVkwzmOCHCV1tTHwLRlH3cB9YmJlGQ0kHCREaAuEQAB0xJXIW7dnyYj
 Qq7KoRZEMe3wizmjEsj8qsrhfMLzHjBw67hBz2znwW/4P7YdgzwD7KRiEat+yRC9
 ON3nNL2zIqpfk92RXvVrSVl4KMEM+WNbOfiffgBiEP24Ja1MJMFH1d4i6hNOaB0x
 9Pb3Lw8let92x+8Ao5jnjKdKMgVsoZWbN/TgQR8zZOHM38AGGiDgk18vMz+L+hpS
 jqfjckxj1m30jGq0qZ9ZbMZx3IGif4KccVr30MqNHJpwi6Q24qXvT3YfA3HkstM=
 =nAab
 -----END PGP SIGNATURE-----

Merge tag 'v3.11-rc7' into stable/for-linus-3.12

Linux 3.11-rc7

As we need the git commit 28817e9de4f039a1a8c1fe1df2fa2df524626b9e
Author: Chuck Anderson <chuck.anderson@oracle.com>
Date:   Tue Aug 6 15:12:19 2013 -0700

    xen/smp: initialize IPI vectors before marking CPU online

* tag 'v3.11-rc7': (443 commits)
  Linux 3.11-rc7
  ARC: [lib] strchr breakage in Big-endian configuration
  VFS: collect_mounts() should return an ERR_PTR
  bfs: iget_locked() doesn't return an ERR_PTR
  efs: iget_locked() doesn't return an ERR_PTR()
  proc: kill the extra proc_readfd_common()->dir_emit_dots()
  cope with potentially long ->d_dname() output for shmem/hugetlb
  usb: phy: fix build breakage
  USB: OHCI: add missing PCI PM callbacks to ohci-pci.c
  staging: comedi: bug-fix NULL pointer dereference on failed attach
  lib/lz4: correct the LZ4 license
  memcg: get rid of swapaccount leftovers
  nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection
  nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error
  drivers/platform/olpc/olpc-ec.c: initialise earlier
  ipv4: expose IPV4_DEVCONF
  ipv6: handle Redirect ICMP Message with no Redirected Header option
  be2net: fix disabling TX in be_close()
  Revert "ACPI / video: Always call acpi_video_init_brightness() on init"
  Revert "genetlink: fix family dump race"
  ...

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-09-09 12:05:37 -04:00
Konrad Rzeszutek Wilk
c3f31f6a6f Merge branch 'x86/spinlocks' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into stable/for-linus-3.12
* 'x86/spinlocks' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kvm/guest: Fix sparse warning: "symbol 'klock_waiting' was not declared as static"
  kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  kvm guest: Add configuration support to enable debug information for KVM Guests
  kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi
  xen, pvticketlock: Allow interrupts to be enabled while blocking
  x86, ticketlock: Add slowpath logic
  jump_label: Split jumplabel ratelimit
  x86, pvticketlock: When paravirtualizing ticket locks, increment by 2
  x86, pvticketlock: Use callee-save for lock_spinning
  xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks
  xen, pvticketlock: Xen implementation for PV ticket locks
  xen: Defer spinlock setup until boot CPU setup
  x86, ticketlock: Collapse a layer of functions
  x86, ticketlock: Don't inline _spin_unlock when using paravirt spinlocks
  x86, spinlock: Replace pv spinlocks with pv ticketlocks
2013-09-09 12:01:15 -04:00
Linus Torvalds
6be48f2940 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 3.12:

   - Added MODULE_SOFTDEP to allow pre-loading of modules.
   - Reinstated crct10dif driver using the module softdep feature.
   - Allow via rng driver to be auto-loaded.

   - Split large input data when necessary in nx.
   - Handle zero length messages correctly for GCM/XCBC in nx.
   - Handle SHA-2 chunks bigger than block size properly in nx.

   - Handle unaligned lengths in omap-aes.
   - Added SHA384/SHA512 to omap-sham.
   - Added OMAP5/AM43XX SHAM support.
   - Added OMAP4 TRNG support.

   - Misc fixes"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (66 commits)
  Reinstate "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework"
  hwrng: via - Add MODULE_DEVICE_TABLE
  crypto: fcrypt - Fix bitoperation for compilation with clang
  crypto: nx - fix SHA-2 for chunks bigger than block size
  crypto: nx - fix GCM for zero length messages
  crypto: nx - fix XCBC for zero length messages
  crypto: nx - fix limits to sg lists for AES-CCM
  crypto: nx - fix limits to sg lists for AES-XCBC
  crypto: nx - fix limits to sg lists for AES-GCM
  crypto: nx - fix limits to sg lists for AES-CTR
  crypto: nx - fix limits to sg lists for AES-CBC
  crypto: nx - fix limits to sg lists for AES-ECB
  crypto: nx - add offset to nx_build_sg_lists()
  padata - Register hotcpu notifier after initialization
  padata - share code between CPU_ONLINE and CPU_DOWN_FAILED, same to CPU_DOWN_PREPARE and CPU_UP_CANCELED
  hwrng: omap - reorder OMAP TRNG driver code
  crypto: omap-sham - correct dma burst size
  crypto: omap-sham - Enable Polling mode if DMA fails
  crypto: tegra-aes - bitwise vs logical and
  crypto: sahara - checking the wrong variable
  ...
2013-09-07 14:31:18 -07:00
Herbert Xu
eeca9fad52 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
Merge upstream tree in order to reinstate crct10dif.
2013-09-07 12:53:35 +10:00
Linus Torvalds
b4b50fd78b ARM: SoC platform changes for 3.12
This branch contains mostly additions and changes to platform enablement
 and SoC-level drivers. Since there's sometimes a dependency on device-tree
 changes, there's also a fair amount of those in this branch.
 
 Pieces worth mentioning are:
 
 - Mbus driver for Marvell platforms, allowing kernel configuration
   and resource allocation of on-chip peripherals.
 - Enablement of the mbus infrastructure from Marvell PCI-e drivers.
 - Preparation of MSI support for Marvell platforms.
 - Addition of new PCI-e host controller driver for Tegra platforms
 - Some churn caused by sharing of macro names between i.MX 6Q and 6DL
   platforms in the device tree sources and header files.
 - Various suspend/PM updates for Tegra, including LP1 support.
 - Versatile Express support for MCPM, part of big little support.
 - Allwinner platform support for A20 and A31 SoCs (dual and quad Cortex-A7)
 - OMAP2+ support for DRA7, a new Cortex-A15-based SoC.
 
 The code that touches other architectures are patches moving
 MSI arch-specific functions over to weak symbols and removal of
 ARCH_SUPPORTS_MSI, acked by PCI maintainers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSKhYmAAoJEIwa5zzehBx322AP/1ONYs8o8f7/Gzq6lZvTN6T3
 0pBTApg6Jfioi3lwKvUAEIcsW82YKQ+UZkbW66GQH6+Ri4aZJKZHuz0+JPU67OJ4
 LtSLuzVWrymy2VOOUvAnS/SXkOZw/pHhU4cLNHn1dMndhUL1Uqp9/XwuiHEQyFsP
 uOkpcBtIu0EWElov0PKKZ5SWBg8JJs2vy5ydiViGelWHCrZvDDZkWzIsDcBQxJLQ
 juzT4+JE+KOu7vKmfw78o6iHoCS2TBRAN9YUCajRb8Wl+out1hrTahHnDWaZ5Mce
 EskcQNkJROqFbjD4k3ABN4XGTv2VDmrztIwFe0SEQ7Dz/9ypCrBGT69uI9xIqTXr
 GwVRIwAUFTpMupK0gy93z1ajV3N0CXV79out9+jQNUQybYE+czp8QOyhmuc1tZx0
 8fn9jlBQe9Vy6yrs39gEcE7nUwrayeyQ+6UvqqwsE2pWZabNAnCMSPX5+QIu+T/3
 tQ7+jYmfFeserp1sIDOHOnxfhtW9EI6U9d1h/DUCwrsuFdkL9ha4M/vh9Pwgye98
 tBdz0T4yE39AJQwwFWRkv1jcQKcGu6WqJanmvS4KRBksGwuLWxy+ewOnkz2ifS25
 ZYSyxAryZRBvQRqlOK11rXPfRcbGcY0MG9lkKX96rGcyWEizgE1DdjxXD8HoIleN
 R8heV6GX5OzlFLGX2tKK
 =fJ5x
 -----END PGP SIGNATURE-----

Merge tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC platform changes from Olof Johansson:
 "This branch contains mostly additions and changes to platform
  enablement and SoC-level drivers.  Since there's sometimes a
  dependency on device-tree changes, there's also a fair amount of
  those in this branch.

  Pieces worth mentioning are:

   - Mbus driver for Marvell platforms, allowing kernel configuration
     and resource allocation of on-chip peripherals.
   - Enablement of the mbus infrastructure from Marvell PCI-e drivers.
   - Preparation of MSI support for Marvell platforms.
   - Addition of new PCI-e host controller driver for Tegra platforms
   - Some churn caused by sharing of macro names between i.MX 6Q and 6DL
     platforms in the device tree sources and header files.
   - Various suspend/PM updates for Tegra, including LP1 support.
   - Versatile Express support for MCPM, part of big little support.
   - Allwinner platform support for A20 and A31 SoCs (dual and quad
     Cortex-A7)
   - OMAP2+ support for DRA7, a new Cortex-A15-based SoC.

  The code that touches other architectures are patches moving MSI
  arch-specific functions over to weak symbols and removal of
  ARCH_SUPPORTS_MSI, acked by PCI maintainers"

* tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (266 commits)
  tegra-cpuidle: provide stub when !CONFIG_CPU_IDLE
  PCI: tegra: replace devm_request_and_ioremap by devm_ioremap_resource
  ARM: tegra: Drop ARCH_SUPPORTS_MSI and sort list
  ARM: dts: vf610-twr: enable i2c0 device
  ARM: dts: i.MX51: Add one more I2C2 pinmux entry
  ARM: dts: i.MX51: Move pins configuration under "iomuxc" label
  ARM: dtsi: imx6qdl-sabresd: Add USB OTG vbus pin to pinctrl_hog
  ARM: dtsi: imx6qdl-sabresd: Add USB host 1 VBUS regulator
  ARM: dts: imx27-phytec-phycore-som: Enable AUDMUX
  ARM: dts: i.MX27: Disable AUDMUX in the template
  ARM: dts: wandboard: Add support for SDIO bcm4329
  ARM: i.MX5 clocks: Remove optional clock setup (CKIH1) from i.MX51 template
  ARM: dts: imx53-qsb: Make USBH1 functional
  ARM i.MX6Q: dts: Enable I2C1 with EEPROM and PMIC on Phytec phyFLEX-i.MX6 Ouad module
  ARM i.MX6Q: dts: Enable SPI NOR flash on Phytec phyFLEX-i.MX6 Ouad module
  ARM: dts: imx6qdl-sabresd: Add touchscreen support
  ARM: imx: add ocram clock for imx53
  ARM: dts: imx: ocram size is different between imx6q and imx6dl
  ARM: dts: imx27-phytec-phycore-som: Fix regulator settings
  ARM: dts: i.MX27: Remove clock name from CPU node
  ...
2013-09-06 13:30:06 -07:00
Linus Torvalds
ae7a835cc5 Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Gleb Natapov:
 "The highlights of the release are nested EPT and pv-ticketlocks
  support (hypervisor part, guest part, which is most of the code, goes
  through tip tree).  Apart of that there are many fixes for all arches"

Fix up semantic conflicts as discussed in the pull request thread..

* 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (88 commits)
  ARM: KVM: Add newlines to panic strings
  ARM: KVM: Work around older compiler bug
  ARM: KVM: Simplify tracepoint text
  ARM: KVM: Fix kvm_set_pte assignment
  ARM: KVM: vgic: Bump VGIC_NR_IRQS to 256
  ARM: KVM: Bugfix: vgic_bytemap_get_reg per cpu regs
  ARM: KVM: vgic: fix GICD_ICFGRn access
  ARM: KVM: vgic: simplify vgic_get_target_reg
  KVM: MMU: remove unused parameter
  KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate()
  KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls
  KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX
  KVM: x86: update masterclock when kvmclock_offset is calculated (v2)
  KVM: PPC: Book3S: Fix compile error in XICS emulation
  KVM: PPC: Book3S PR: return appropriate error when allocation fails
  arch: powerpc: kvm: add signed type cast for comparation
  KVM: x86: add comments where MMIO does not return to the emulator
  KVM: vmx: count exits to userspace during invalid guest emulation
  KVM: rename __kvm_io_bus_sort_cmp to kvm_io_bus_cmp
  kvm: optimize away THP checks in kvm_is_mmio_pfn()
  ...
2013-09-04 18:15:06 -07:00
Linus Torvalds
cf39c8e535 Features:
- Xen Trusted Platform Module (TPM) frontend driver - with the backend in MiniOS.
  - Scalability improvements in event channel.
  - Two extra Xen co-maintainers (David, Boris) and one going away (Jeremy)
 Bug-fixes:
  - Make the 1:1 mapping work during early bootup on selective regions.
  - Add scratch page to balloon driver to deal with unexpected code still holding
    on stale pages.
  - Allow NMIs on PV guests (64-bit only)
  - Remove unnecessary TLB flush in M2P code.
  - Fixes duplicate callbacks in Xen granttable code.
  - Fixes in PRIVCMD_MMAPBATCH ioctls to allow retries
  - Fix for events being lost due to rescheduling on different VCPUs.
  - More documentation.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSJgGgAAoJEFjIrFwIi8fJ4asH+gKp0aauPEdHtmn7rLfZUUJ5
 uuvWBiXiVVYMFz81NXlZ1WoAMuDuVA45Eu785uPRb9oUHDi0W8LO4Dqr+9lJTrXJ
 KiMvTXmOLSfSdjRlDI4jCoxBdg8tpbT3oJkXsFcHnrd5d4oTFGb0uuo5nFYPDicZ
 BGogDclzcqtlYl/2LUb+6vUXUQd77n0oW7RQ4yAaw3Qdj381om3Dmoeat8QU9Kdo
 Q4dhsHS6YAGR5R+G0zPfVOoKvSGoGV0NUdXr19QpYArGxKXcmiPjrgAJ/NGLsxvm
 8AbPjmQzOFJmUclHiiej6kvBsh2ZTYAesJMSAFLWD7EndXii7zljyJv0PIJ//uQ=
 =hNDW
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.12-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen updates from Konrad Rzeszutek Wilk:
 "A couple of features and a ton of bug-fixes.  There is also some
  maintership changes.  Jeremy is enjoying the full-time work at the
  startup and as much as he would love to help - he can't find the time.
  I have a bunch of other things that I promised to work on - paravirt
  diet, get SWIOTLB working everywhere, etc, but haven't been able to
  find the time.

  As such both David Vrabel and Boris Ostrovsky have graciously
  volunteered to help with the maintership role.  They will keep the lid
  on regressions, bug-fixes, etc.  I will be in the background to help -
  but eventually there will be less of me doing the Xen GIT pulls and
  more of them.  Stefano is still doing the ARM/ARM64 and will continue
  on doing so.

  Features:
   - Xen Trusted Platform Module (TPM) frontend driver - with the
     backend in MiniOS.
   - Scalability improvements in event channel.
   - Two extra Xen co-maintainers (David, Boris) and one going away (Jeremy)

  Bug-fixes:
   - Make the 1:1 mapping work during early bootup on selective regions.
   - Add scratch page to balloon driver to deal with unexpected code
     still holding on stale pages.
   - Allow NMIs on PV guests (64-bit only)
   - Remove unnecessary TLB flush in M2P code.
   - Fixes duplicate callbacks in Xen granttable code.
   - Fixes in PRIVCMD_MMAPBATCH ioctls to allow retries
   - Fix for events being lost due to rescheduling on different VCPUs.
   - More documentation"

* tag 'stable/for-linus-3.12-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (23 commits)
  hvc_xen: Remove unnecessary __GFP_ZERO from kzalloc
  drivers/xen-tpmfront: Fix compile issue with missing option.
  xen/balloon: don't set P2M entry for auto translated guest
  xen/evtchn: double free on error
  Xen: Fix retry calls into PRIVCMD_MMAPBATCH*.
  xen/pvhvm: Initialize xen panic handler for PVHVM guests
  xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mapping
  xen: fix ARM build after 6efa20e4
  MAINTAINERS: Remove Jeremy from the Xen subsystem.
  xen/events: document behaviour when scanning the start word for events
  x86/xen: during early setup, only 1:1 map the ISA region
  x86/xen: disable premption when enabling local irqs
  swiotlb-xen: replace dma_length with sg_dma_len() macro
  swiotlb: replace dma_length with sg_dma_len() macro
  xen/balloon: set a mapping for ballooned out pages
  xen/evtchn: improve scalability by using per-user locks
  xen/p2m: avoid unneccesary TLB flush in m2p_remove_override()
  MAINTAINERS: Add in two extra co-maintainers of the Xen tree.
  MAINTAINERS: Update the Xen subsystem's with proper mailing list.
  xen: replace strict_strtoul() with kstrtoul()
  ...
2013-09-04 17:45:39 -07:00
Linus Torvalds
816434ec4a Merge branch 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 spinlock changes from Ingo Molnar:
 "The biggest change here are paravirtualized ticket spinlocks (PV
  spinlocks), which bring a nice speedup on various benchmarks.

  The KVM host side will come to you via the KVM tree"

* 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kvm/guest: Fix sparse warning: "symbol 'klock_waiting' was not declared as static"
  kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  kvm guest: Add configuration support to enable debug information for KVM Guests
  kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi
  xen, pvticketlock: Allow interrupts to be enabled while blocking
  x86, ticketlock: Add slowpath logic
  jump_label: Split jumplabel ratelimit
  x86, pvticketlock: When paravirtualizing ticket locks, increment by 2
  x86, pvticketlock: Use callee-save for lock_spinning
  xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks
  xen, pvticketlock: Xen implementation for PV ticket locks
  xen: Defer spinlock setup until boot CPU setup
  x86, ticketlock: Collapse a layer of functions
  x86, ticketlock: Don't inline _spin_unlock when using paravirt spinlocks
  x86, spinlock: Replace pv spinlocks with pv ticketlocks
2013-09-04 11:55:10 -07:00
Linus Torvalds
f357a82048 Merge branch 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SMAP fixes from Ingo Molnar:
 "Fixes for Intel SMAP support, to fix SIGSEGVs during bootup"

* 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Introduce [compat_]save_altstack_ex() to unbreak x86 SMAP
  x86, smap: Handle csum_partial_copy_*_user()
2013-09-04 11:08:32 -07:00
Linus Torvalds
b20c99eb66 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS changes from Ingo Molnar:
 "[ The reason for drivers/ updates is that Boris asked for the
    drivers/edac/ changes to go via x86/ras in this cycle ]

  Main changes:

   - AMD CPUs:
      . Add ECC event decoding support for new F15h models
      . Various erratum fixes
      . Fix single-channel on dual-channel-controllers bug.

   - Intel CPUs:
      . UC uncorrectable memory error parsing fix
      . Add support for CMC (Corrected Machine Check) 'FF' (Firmware
        First) flag in the APEI HEST

   - Various cleanups and fixes"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  amd64_edac: Fix incorrect wraparounds
  amd64_edac: Correct erratum 505 range
  cpc925_edac: Use proper array termination
  x86/mce, acpi/apei: Only disable banks listed in HEST if mce is configured
  amd64_edac: Get rid of boot_cpu_data accesses
  amd64_edac: Add ECC decoding support for newer F15h models
  x86, amd_nb: Clarify F15h, model 30h GART and L3 support
  pci_ids: Add PCI device ID functions 3 and 4 for newer F15h models.
  x38_edac: Make a local function static
  i3200_edac: Make a local function static
  x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors
  APEI/ERST: Fix error message formatting
  amd64_edac: Fix single-channel setups
  EDAC: Replace strict_strtol() with kstrtol()
  mce: acpi/apei: Soft-offline a page on firmware GHES notification
  mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
  mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC
2013-09-04 11:07:04 -07:00
Linus Torvalds
05eebfb26b Merge branch 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt changes from Ingo Molnar:
 "Hypervisor signature detection cleanup and fixes - the goal is to make
  KVM guests run better on MS/Hyperv and to generalize and factor out
  the code a bit"

* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Correctly detect hypervisor
  x86, kvm: Switch to use hypervisor_cpuid_base()
  xen: Switch to use hypervisor_cpuid_base()
  x86: Introduce hypervisor_cpuid_base()
2013-09-04 11:05:13 -07:00
Linus Torvalds
cb3e4330e6 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "Misc smaller fixes:

   - a parse_setup_data() boot crash fix

   - a memblock and an __early_ioremap cleanup

   - turn the always-on CONFIG_ARCH_MEMORY_PROBE=y into a configurable
     option and turn it off - it's an unrobust debug facility, it
     shouldn't be enabled by default"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: avoid remapping data in parse_setup_data()
  x86: Use memblock_set_current_limit() to set limit for memblock.
  mm: Remove unused variable idx0 in __early_ioremap()
  mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default
2013-09-04 09:39:26 -07:00
Linus Torvalds
aafcd5d757 Merge branch 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 relocation changes from Ingo Molnar:
 "This tree contains a single change, ELF relocation handling in C - one
  of the kernel randomization patches that makes sense even without
  randomization present upstream"

* 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, relocs: Move ELF relocation handling to C
2013-09-04 09:38:10 -07:00
Linus Torvalds
228abe73ad Merge branch 'x86-fb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fb changes from Ingo Molnar:
 "This tree includes preparatory patches for SimpleDRM driver support,
  by David Herrmann.  They clean up x86 framebuffer support by creating
  simplefb devices wherever possible.  More background can be found at

     http://lwn.net/Articles/558104/"

* 'x86-fb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  fbdev: fbcon: select VT_HW_CONSOLE_BINDING
  fbdev: efifb: bind to efi-framebuffer
  fbdev: vesafb: bind to platform-framebuffer device
  fbdev: simplefb: add common x86 RGB formats
  x86: sysfb: move EFI quirks from efifb to sysfb
  x86: provide platform-devices for boot-framebuffers
  fbdev: simplefb: mark as fw and allocate apertures
  fbdev: simplefb: add init through platform_data
2013-09-04 09:12:17 -07:00
Linus Torvalds
1f9c52e16b Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu feature fixes from Ingo Molnar:
 "Two small cpufeature support updates"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix override new_cpu_data.x86 with 486
  x86, cpufeature: Use new CC_HAVE_ASM_GOTO
2013-09-04 09:11:16 -07:00
Linus Torvalds
2a475501b8 Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/asmlinkage changes from Ingo Molnar:
 "As a preparation for Andi Kleen's LTO patchset (link time
  optimizations using GCC's -flto which build time optimization has
  steadily increased in quality over the past few years and might
  eventually be usable for the kernel too) this tree includes a handful
  of preparatory patches that make function calling convention
  annotations consistent again:

   - Mark every function without arguments (or 64bit only) that is used
     by assembly code with asmlinkage()

   - Mark every function with parameters or variables that is used by
     assembly code as __visible.

  For the vanilla kernel this has documentation, consistency and
  debuggability advantages, for the time being"

* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asmlinkage: Fix warning in xen asmlinkage change
  x86, asmlinkage, vdso: Mark vdso variables __visible
  x86, asmlinkage, power: Make various symbols used by the suspend asm code visible
  x86, asmlinkage: Make dump_stack visible
  x86, asmlinkage: Make 64bit checksum functions visible
  x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops
  x86, asmlinkage, apm: Make APM data structure used from assembler visible
  x86, asmlinkage: Make syscall tables visible
  x86, asmlinkage: Make several variables used from assembler/linker script visible
  x86, asmlinkage: Make kprobes code visible and fix assembler code
  x86, asmlinkage: Make various syscalls asmlinkage
  x86, asmlinkage: Make 32bit/64bit __switch_to visible
  x86, asmlinkage: Make _*_start_kernel visible
  x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible
  x86, asmlinkage: Change dotraplinkage into __visible on 32bit
  x86: Fix sys_call_table type in asm/syscall.h
2013-09-04 08:42:44 -07:00
Linus Torvalds
3d7e5fc37f Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/asm changes from Ingo Molnar:
 "Main changes:

   - Apply low level mutex optimization on x86-64, by Wedson Almeida
     Filho.

   - Change bitops to be naturally 'long', by H Peter Anvin.

   - Add TSX-NI opcodes support to the x86 (instrumentation) decoder, by
     Masami Hiramatsu.

   - Add clang compatibility adjustments/workarounds, by Jan-Simon
     Möller"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, doc: Update uaccess.h comment to reflect clang changes
  x86, asm: Fix a compilation issue with clang
  x86, asm: Extend definitions of _ASM_* with a raw format
  x86, insn: Add new opcodes as of June, 2013
  x86/ia32/asm: Remove unused argument in macro
  x86, bitops: Change bitops to be native operand size
  x86: Use asm-goto to implement mutex fast path on x86-64
2013-09-04 08:39:38 -07:00
Linus Torvalds
6924a4672d Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/apic changes from Ingo Molnar:
 "Smaller fixes"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioapic: Check attr against the previous setting when programmed more than once
  x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lock
  x86/acpi: Fix incorrect sanity check in acpi_register_lapic()
2013-09-04 08:39:05 -07:00
Linus Torvalds
5e0b3a4e88 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "Various optimizations, cleanups and smaller fixes - no major changes
  in scheduler behavior"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix the sd_parent_degenerate() code
  sched/fair: Rework and comment the group_imb code
  sched/fair: Optimize find_busiest_queue()
  sched/fair: Make group power more consistent
  sched/fair: Remove duplicate load_per_task computations
  sched/fair: Shrink sg_lb_stats and play memset games
  sched: Clean-up struct sd_lb_stat
  sched: Factor out code to should_we_balance()
  sched: Remove one division operation in find_busiest_queue()
  sched/cputime: Use this_cpu_add() in task_group_account_field()
  cpumask: Fix cpumask leak in partition_sched_domains()
  sched/x86: Optimize switch_mm() for multi-threaded workloads
  generic-ipi: Kill unnecessary variable - csd_flags
  numa: Mark __node_set() as __always_inline
  sched/fair: Cleanup: remove duplicate variable declaration
  sched/__wake_up_sync_key(): Fix nr_exclusive tasks which lead to WF_SYNC clearing
2013-09-04 08:36:35 -07:00
Linus Torvalds
0d99b70873 Merge branches 'perf-urgent-for-linus' and 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "As a first remark I'd like to point out that the obsolete '-f'
  (--force) option, which has not done anything for several releases,
  has been removed from 'perf record' and related utilities.  Everyone
  please update muscle memory accordingly! :-)

  Main changes on the perf kernel side:

   - Performance optimizations:
        . for trace events, by Steve Rostedt.
        . for time values, by Peter Zijlstra

   - New hardware support:
        . for Intel Silvermont (22nm Atom) CPUs, by Zheng Yan
        . for Intel SNB-EP uncore PMUs, by Zheng Yan

   - Enhanced hardware support:
        . for Intel uncore PMUs: add filter support for QPI boxes, by Zheng Yan

   - Core perf events code enhancements and fixes:
        . for full-nohz feature handling, by Frederic Weisbecker
        . for group events, by Jiri Olsa
        . for call chains, by Frederic Weisbecker
        . for event stream parsing, by Adrian Hunter

   - New ABI details:
        . Add attr->mmap2 attribute, by Stephane Eranian
        . Add PERF_EVENT_IOC_ID ioctl to return event ID, by Jiri Olsa
        . Export u64 time_zero on the mmap header page to allow TSC
          calculation, by Adrian Hunter
        . Add dummy software event, by Adrian Hunter.
        . Add a new PERF_SAMPLE_IDENTIFIER to make samples always
          parseable, by Adrian Hunter.
        . Make Power7 events available via sysfs, by Runzhen Wang.

   - Code cleanups and refactorings:
        . for nohz-full, by Frederic Weisbecker
        . for group events, by Jiri Olsa

   - Documentation updates:
        . for perf_event_type, by Peter Zijlstra

  Main changes on the perf tooling side (some of these tooling changes
  utilize the above kernel side changes):

   - Lots of 'perf trace' enhancements:

        . Make 'perf trace' command line arguments consistent with
          'perf record', by David Ahern.

        . Allow specifying syscalls a la strace, by Arnaldo Carvalho de Melo.

        . Add --verbose and -o/--output options, by Arnaldo Carvalho de Melo.

        . Support ! in -e expressions, to filter a list of syscalls,
          by Arnaldo Carvalho de Melo.

        . Arg formatting improvements to allow masking arguments in
          syscalls such as futex and open, where the some arguments are
          ignored and thus should not be printed depending on other args,
          by Arnaldo Carvalho de Melo.

        . Beautify futex open, openat, open_by_handle_at, lseek and futex
          syscalls, by Arnaldo Carvalho de Melo.

        . Add option to analyze events in a file versus live, so that
          one can do:

           [root@zoo ~]# perf record -a -e raw_syscalls:* sleep 1
           [ perf record: Woken up 0 times to write data ]
           [ perf record: Captured and wrote 25.150 MB perf.data (~1098836 samples) ]
           [root@zoo ~]# perf trace -i perf.data -e futex --duration 1
              17.799 ( 1.020 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, ua
             113.344 (95.429 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 4294967
             133.778 ( 1.042 ms): 18004 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 429496
           [root@zoo ~]#

          By David Ahern.

        . Honor target pid / tid options when analyzing a file, by David Ahern.

        . Introduce better formatting of syscall arguments, including so
          far beautifiers for mmap, madvise, syscall return values,
          by Arnaldo Carvalho de Melo.

        . Handle HUGEPAGE defines in the mmap beautifier, by David Ahern.

   - 'perf report/top' enhancements:

        . Do annotation using /proc/kcore and /proc/kallsyms when
          available, removing the forced need for a vmlinux file kernel
          assembly annotation. This also improves this use case because
          vmlinux has just the initial kernel image, not what is actually
          in use after various code patchings by things like alternatives.
          By Adrian Hunter.

        . Add --ignore-callees=<regex> option to collapse undesired parts
          of call graphs, by Greg Price.

        . Simplify symbol filtering by doing it at machine class level,
          by Adrian Hunter.

        . Add support for callchains in the gtk UI, by Namhyung Kim.

        . Add --objdump option to 'perf top', by Sukadev Bhattiprolu.

   - 'perf kvm' enhancements:

        . Add option to print only events that exceed a specified time
          duration, by David Ahern.

        . Improve stack trace printing, by David Ahern.

        . Update documentation of the live command, by David Ahern

        . Add perf kvm stat live mode that combines aspects of 'perf kvm
          stat' record and report, by David Ahern.

        . Add option to analyze specific VM in perf kvm stat report, by
          David Ahern.

        . Do not require /lib/modules/* on a guest, by Jason Wessel.

   - 'perf script' enhancements:

        . Fix symbol offset computation for some dsos, by David Ahern.

        . Fix named threads support, by David Ahern.

        . Don't install scripting files files when perl/python support
          is disabled, by Arnaldo Carvalho de Melo.

   - 'perf test' enhancements:

        . Add various improvements and fixes to the "vmlinux matches
          kallsyms" 'perf test' entry, related to the /proc/kcore
          annotation feature. By Adrian Hunter.

        . Add sample parsing test, by Adrian Hunter.

        . Add test for reading object code, by Adrian Hunter.

        . Add attr record group sampling test, by Jiri Olsa.

        . Misc testing infrastructure improvements and other details,
          by Jiri Olsa.

   - 'perf list' enhancements:

        . Skip unsupported hardware events, by Namhyung Kim.

        . List pmu events, by Andi Kleen.

   - 'perf diff' enhancements:

        . Add support for more than two files comparison, by Jiri Olsa.

   - 'perf sched' enhancements:

        . Various improvements, including removing reliance on some
          scheduler tracepoints that provide the same information as the
          PERF_RECORD_{FORK,EXIT} events. By David Ahern.

        . Remove odd build stall by moving a large struct initialization
          from a local variable to a global one, by Namhyung Kim.

   - 'perf stat' enhancements:

        . Add --initial-delay option to skip measuring for a defined
          startup phase, by Andi Kleen.

   - Generic perf tooling infrastructure/plumbing changes:

        . Tidy up sample parsing validation, by Adrian Hunter.

        . Fix up jobserver setup in libtraceevent Makefile.
          by Arnaldo Carvalho de Melo.

        . Debug improvements, by Adrian Hunter.

        . Fix correlation of samples coming after PERF_RECORD_EXIT event,
          by David Ahern.

        . Improve robustness of the topology parsing code,
          by Stephane Eranian.

        . Add group leader sampling, that allows just one event in a group
          to sample while the other events have just its values read,
          by Jiri Olsa.

        . Add support for a new modifier "D", which requests that the
          event, or group of events, be pinned to the PMU.
          By Michael Ellerman.

        . Support callchain sorting based on addresses, by Andi Kleen

        . Prep work for multi perf data file storage, by Jiri Olsa.

        . libtraceevent cleanups, by Namhyung Kim.

  And lots and lots of other fixes and code reorganizations that did not
  make it into the list, see the shortlog, diffstat and the Git log for
  details!"

[ Also merge a leftover from the 3.11 cycle ]

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Prevent race in unthrottling code

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (237 commits)
  perf trace: Tell arg formatters the arg index
  perf trace: Add beautifier for open's flags arg
  perf trace: Add beautifier for lseek's whence arg
  perf tools: Fix symbol offset computation for some dsos
  perf list: Skip unsupported events
  perf tests: Add 'keep tracking' test
  perf tools: Add support for PERF_COUNT_SW_DUMMY
  perf: Add a dummy software event to keep tracking
  perf trace: Add beautifier for futex 'operation' parm
  perf trace: Allow syscall arg formatters to mask args
  perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()
  perf: Export struct perf_branch_entry to userspace
  perf: Add attr->mmap2 attribute to an event
  perf/x86: Add Silvermont (22nm Atom) support
  perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X
  perf trace: Handle missing HUGEPAGE defines
  perf trace: Honor target pid / tid options when analyzing a file
  perf trace: Add option to analyze events in a file versus live
  perf evlist: Add tracepoint lookup by name
  perf tests: Add a sample parsing test
  ...
2013-09-04 08:25:35 -07:00
Linus Torvalds
40031da445 ACPI and power management updates for 3.12-rc1
1) ACPI-based PCI hotplug (ACPIPHP) subsystem rework and introduction
     of Intel Thunderbolt support on systems that use ACPI for signalling
     Thunderbolt hotplug events.  This also should make ACPIPHP work in
     some cases in which it was known to have problems.  From
     Rafael J Wysocki, Mika Westerberg and Kirill A Shutemov.
 
  2) ACPI core code cleanups and dock station support cleanups from
     Jiang Liu and Rafael J Wysocki.
 
  3) Fixes for locking problems related to ACPI device hotplug from
     Rafael J Wysocki.
 
  4) ACPICA update to version 20130725 includig fixes, cleanups, support
     for more than 256 GPEs per GPE block and a change to make the ACPI
     PM Timer optional (we've seen systems without the PM Timer in the
     field already).  One of the fixes, related to the DeRefOf operator,
     is necessary to prevent some Windows 8 oriented AML from causing
     problems to happen.  From Bob Moore, Lv Zheng, and Jung-uk Kim.
 
  5) Removal of the old and long deprecated /proc/acpi/event interface
     and related driver changes from Thomas Renninger.
 
  6) ACPI and Xen changes to make the reduced hardware sleep work with
     the latter from Ben Guthro.
 
  7) ACPI video driver cleanups and a blacklist of systems that should
     not tell the BIOS that they are compatible with Windows 8 (or ACPI
     backlight and possibly other things will not work on them).  From
     Felipe Contreras.
 
  8) Assorted ACPI fixes and cleanups from Aaron Lu, Hanjun Guo,
     Kuppuswamy Sathyanarayanan, Lan Tianyu, Sachin Kamat, Tang Chen,
     Toshi Kani, and Wei Yongjun.
 
  9) cpufreq ondemand governor target frequency selection change to
     reduce oscillations between min and max frequencies (essentially,
     it causes the governor to choose target frequencies proportional
     to load) from Stratos Karafotis.
 
 10) cpufreq fixes allowing sysfs attributes file permissions to be
     preserved over suspend/resume cycles Srivatsa S Bhat.
 
 11) Removal of Device Tree parsing for CPU device nodes from multiple
     cpufreq drivers that required some changes related to
     of_get_cpu_node() to be made in a few architectures and in the
     driver core.  From Sudeep KarkadaNagesha.
 
 12) cpufreq core fixes and cleanups related to mutual exclusion and
     driver module references from Viresh Kumar, Lukasz Majewski and
     Rafael J Wysocki.
 
 13) Assorted cpufreq fixes and cleanups from Amit Daniel Kachhap,
     Bartlomiej Zolnierkiewicz, Hanjun Guo, Jingoo Han, Joseph Lo,
     Julia Lawall, Li Zhong, Mark Brown, Sascha Hauer, Stephen Boyd,
     Stratos Karafotis, and Viresh Kumar.
 
 14) Fixes to prevent race conditions in coupled cpuidle from happening
     from Colin Cross.
 
 15) cpuidle core fixes and cleanups from Daniel Lezcano and
     Tuukka Tikkanen.
 
 16) Assorted cpuidle fixes and cleanups from Daniel Lezcano,
     Geert Uytterhoeven, Jingoo Han, Julia Lawall, Linus Walleij,
     and Sahara.
 
 17) System sleep tracing changes from Todd E Brandt and Shuah Khan.
 
 18) PNP subsystem conversion to using struct dev_pm_ops for power
     management from Shuah Khan.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJSJcKhAAoJEKhOf7ml8uNsplIQAJSOshxhkkemvFOuHZ+0YIbh
 R9aufjXeDkMDBi8YtU+tB7ERth1j+0LUSM0NTnP51U7e+7eSGobA9s5jSZQj2l7r
 HFtnSOegLuKAfqwgfSLK91xa1rTFdfW0Kych9G2nuHtBIt6P0Oc59Cb5M0oy6QXs
 nVtaDEuU//tmO71+EF5HnMJHabRTrpvtn/7NbDUpU7LZYpWJrHJFT9xt1rXNab7H
 YRCATPm3kXGRg58Doc3EZE4G3D7DLvq74jWMaI089X/m5Pg1G6upqArypOy6oxdP
 p2FEzYVrb2bi8fakXp7BBeO1gCJTAqIgAkbSSZHLpGhFaeEMmb9/DWPXdm2TjzMV
 c1EEucvsqZWoprXgy12i5Hk814xN8d8nBBLg/UYiRJ44nc/hevXfyE9ZYj6bkseJ
 +GNHmZIa1QYC05nnGli4+W4kHns8EZf/gmvIxnPuco1RN2yMWagrud5/G6Dr9M2B
 hzJV6qauLVzgZso4oe79zv9aVxe/dPHKANLD/sg23WBiJJbJF1ocBlnj2Xlbpqze
 pmMUWGiO/gUiS0fmpW/lAJauza5jFmSCjE4E8R0Gyn0j4YXjmMhdEanaU6J3VuCi
 yVgEzYEth4sowq4AflMMLKYN+WmozDnK7taZRGmT0t+EKRFKLT6EgnNrkQgs1vKl
 oawD9LM4fZ8E0yroOEme
 =CgqW
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management updates from Rafael Wysocki:

 1) ACPI-based PCI hotplug (ACPIPHP) subsystem rework and introduction
    of Intel Thunderbolt support on systems that use ACPI for signalling
    Thunderbolt hotplug events.  This also should make ACPIPHP work in
    some cases in which it was known to have problems.  From
    Rafael J Wysocki, Mika Westerberg and Kirill A Shutemov.

 2) ACPI core code cleanups and dock station support cleanups from
    Jiang Liu and Rafael J Wysocki.

 3) Fixes for locking problems related to ACPI device hotplug from
    Rafael J Wysocki.

 4) ACPICA update to version 20130725 includig fixes, cleanups, support
    for more than 256 GPEs per GPE block and a change to make the ACPI
    PM Timer optional (we've seen systems without the PM Timer in the
    field already).  One of the fixes, related to the DeRefOf operator,
    is necessary to prevent some Windows 8 oriented AML from causing
    problems to happen.  From Bob Moore, Lv Zheng, and Jung-uk Kim.

 5) Removal of the old and long deprecated /proc/acpi/event interface
    and related driver changes from Thomas Renninger.

 6) ACPI and Xen changes to make the reduced hardware sleep work with
    the latter from Ben Guthro.

 7) ACPI video driver cleanups and a blacklist of systems that should
    not tell the BIOS that they are compatible with Windows 8 (or ACPI
    backlight and possibly other things will not work on them).  From
    Felipe Contreras.

 8) Assorted ACPI fixes and cleanups from Aaron Lu, Hanjun Guo,
    Kuppuswamy Sathyanarayanan, Lan Tianyu, Sachin Kamat, Tang Chen,
    Toshi Kani, and Wei Yongjun.

 9) cpufreq ondemand governor target frequency selection change to
    reduce oscillations between min and max frequencies (essentially,
    it causes the governor to choose target frequencies proportional
    to load) from Stratos Karafotis.

10) cpufreq fixes allowing sysfs attributes file permissions to be
    preserved over suspend/resume cycles Srivatsa S Bhat.

11) Removal of Device Tree parsing for CPU device nodes from multiple
    cpufreq drivers that required some changes related to
    of_get_cpu_node() to be made in a few architectures and in the
    driver core.  From Sudeep KarkadaNagesha.

12) cpufreq core fixes and cleanups related to mutual exclusion and
    driver module references from Viresh Kumar, Lukasz Majewski and
    Rafael J Wysocki.

13) Assorted cpufreq fixes and cleanups from Amit Daniel Kachhap,
    Bartlomiej Zolnierkiewicz, Hanjun Guo, Jingoo Han, Joseph Lo,
    Julia Lawall, Li Zhong, Mark Brown, Sascha Hauer, Stephen Boyd,
    Stratos Karafotis, and Viresh Kumar.

14) Fixes to prevent race conditions in coupled cpuidle from happening
    from Colin Cross.

15) cpuidle core fixes and cleanups from Daniel Lezcano and
    Tuukka Tikkanen.

16) Assorted cpuidle fixes and cleanups from Daniel Lezcano,
    Geert Uytterhoeven, Jingoo Han, Julia Lawall, Linus Walleij,
    and Sahara.

17) System sleep tracing changes from Todd E Brandt and Shuah Khan.

18) PNP subsystem conversion to using struct dev_pm_ops for power
    management from Shuah Khan.

* tag 'pm+acpi-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (217 commits)
  cpufreq: Don't use smp_processor_id() in preemptible context
  cpuidle: coupled: fix race condition between pokes and safe state
  cpuidle: coupled: abort idle if pokes are pending
  cpuidle: coupled: disable interrupts after entering safe state
  ACPI / hotplug: Remove containers synchronously
  driver core / ACPI: Avoid device hot remove locking issues
  cpufreq: governor: Fix typos in comments
  cpufreq: governors: Remove duplicate check of target freq in supported range
  cpufreq: Fix timer/workqueue corruption due to double queueing
  ACPI / EC: Add ASUSTEK L4R to quirk list in order to validate ECDT
  ACPI / thermal: Add check of "_TZD" availability and evaluating result
  cpufreq: imx6q: Fix clock enable balance
  ACPI: blacklist win8 OSI for buggy laptops
  cpufreq: tegra: fix the wrong clock name
  cpuidle: Change struct menu_device field types
  cpuidle: Add a comment warning about possible overflow
  cpuidle: Fix variable domains in get_typical_interval()
  cpuidle: Fix menu_device->intervals type
  cpuidle: CodingStyle: Break up multiple assignments on single line
  cpuidle: Check called function parameter in get_typical_interval()
  ...
2013-09-03 15:59:39 -07:00
Linus Torvalds
542a086ac7 Driver core patches for 3.12-rc1
Here's the big driver core pull request for 3.12-rc1.
 
 Lots of tiny changes here fixing up the way sysfs attributes are
 created, to try to make drivers simpler, and fix a whole class race
 conditions with creations of device attributes after the device was
 announced to userspace.
 
 All the various pieces are acked by the different subsystem maintainers.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.21 (GNU/Linux)
 
 iEYEABECAAYFAlIlIPcACgkQMUfUDdst+ynUMwCaAnITsxyDXYQ4DqEsz8EcOtMk
 718AoLrgnUZs3B+70AT34DVktg4HSThk
 =USl9
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core patches from Greg KH:
 "Here's the big driver core pull request for 3.12-rc1.

  Lots of tiny changes here fixing up the way sysfs attributes are
  created, to try to make drivers simpler, and fix a whole class race
  conditions with creations of device attributes after the device was
  announced to userspace.

  All the various pieces are acked by the different subsystem
  maintainers"

* tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (119 commits)
  firmware loader: fix pending_fw_head list corruption
  drivers/base/memory.c: introduce help macro to_memory_block
  dynamic debug: line queries failing due to uninitialized local variable
  sysfs: sysfs_create_groups returns a value.
  debugfs: provide debugfs_create_x64() when disabled
  rbd: convert bus code to use bus_groups
  firmware: dcdbas: use binary attribute groups
  sysfs: add sysfs_create/remove_groups for when SYSFS is not enabled
  driver core: add #include <linux/sysfs.h> to core files.
  HID: convert bus code to use dev_groups
  Input: serio: convert bus code to use drv_groups
  Input: gameport: convert bus code to use drv_groups
  driver core: firmware: use __ATTR_RW()
  driver core: core: use DEVICE_ATTR_RO
  driver core: bus: use DRIVER_ATTR_WO()
  driver core: create write-only attribute macros for devices and drivers
  sysfs: create __ATTR_WO()
  driver-core: platform: convert bus code to use dev_groups
  workqueue: convert bus code to use dev_groups
  MEI: convert bus code to use dev_groups
  ...
2013-09-03 11:37:15 -07:00
Linus Torvalds
bc08b449ee lockref: implement lockless reference count updates using cmpxchg()
Instead of taking the spinlock, the lockless versions atomically check
that the lock is not taken, and do the reference count update using a
cmpxchg() loop.  This is semantically identical to doing the reference
count update protected by the lock, but avoids the "wait for lock"
contention that you get when accesses to the reference count are
contended.

Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
Even when the lockref reference counts are updated atomically with
cmpxchg, the fact that they also verify the state of the spinlock means
that the lockless updates can never happen while somebody else holds the
spinlock.

So while "lockref_put_or_lock()" looks a lot like just another name for
"atomic_dec_and_lock()", and both optimize to lockless updates, they are
fundamentally different: the decrement done by atomic_dec_and_lock() is
truly independent of any lock (as long as it doesn't decrement to zero),
so a locked region can still see the count change.

The lockref structure, in contrast, really is a *locked* reference
count.  If you hold the spinlock, the reference count will be stable and
you can modify the reference count without using atomics, because even
the lockless updates will see and respect the state of the lock.

In order to enable the cmpxchg lockless code, the architecture needs to
do three things:

 (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
     in an aligned u64, and have a "cmpxchg()" implementation that works
     on such a u64 data type.

 (2) define a helper function to test for a spinlock being unlocked
     ("arch_spin_value_unlocked()")

 (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
     Kconfig file.

This enables it for x86-64 (but not 32-bit, we'd need to make sure
cmpxchg() turns into the proper cmpxchg8b in order to enable it for
32-bit mode).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-02 12:12:15 -07:00
H. Peter Anvin
7263dda41b x86, smap: Handle csum_partial_copy_*_user()
Add SMAP annotations to csum_partial_copy_to/from_user().  These
functions legitimately access user space and thus need to set the AC
flag.

TODO: add explicit checks that the side with the kernel space pointer
really points into kernel space.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-2aps0u00eer658fd5xyanan7@git.kernel.org
Cc: <stable@vger.kernel.org> # v3.7+
2013-09-01 14:09:48 -07:00
H. Peter Anvin
f69fa9a91f x86, doc: Update uaccess.h comment to reflect clang changes
Update comment in uaccess.h to reflect the changes for clang support:
gcc only cares about the base register (most architectures don't
encode the size of the operation in the operands like x86 does, and so
it is treated effectively like a register number), whereas clang tries
to enforce the size -- but not for register pairs.

Link: http://lkml.kernel.org/r/1377803585-5913-3-git-send-email-dl9pf@gmx.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jan-Simon Möller <dl9pf@gmx.de>
2013-08-29 13:34:50 -07:00
Jan-Simon Möller
bdfc017eea x86, asm: Fix a compilation issue with clang
Clang does not support the "shortcut" we're taking here for gcc (see below).
The patch uses the macro _ASM_DX to do the job.

From arch/x86/include/asm/uaccess.h:
/*
 * Careful: we have to cast the result to the type of the pointer
 * for sign reasons.
 *
 * The use of %edx as the register specifier is a bit of a
 * simplification, as gcc only cares about it as the starting point
 * and not size: for a 64-bit value it will use %ecx:%edx on 32 bits
 * (%ecx being the next register in gcc's x86 register sequence), and
 * %rdx on 64 bits.
 */

[ hpa: I consider this a compatibility bug in clang as this reflects a
  bit of a misunderstanding about how register strings are used by
  gcc, but the workaround is straightforward and there is no
  particular reason to not do it. ]

Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de>
Link: http://lkml.kernel.org/r/1377803585-5913-3-git-send-email-dl9pf@gmx.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-29 13:26:33 -07:00
Jan-Simon Möller
3e9b2327b5 x86, asm: Extend definitions of _ASM_* with a raw format
The __ASM_* macros (e.g. __ASM_DX) are used to return the proper
register name (e.g. edx for 32bit / rdx for 64bit). We want to use
this also in arch/x86/include/asm/uaccess.h / get_user() .  For this
to work, we need a raw form as both gcc and clang choke on the
whitespace in a register asm() statement, and the __ASM_FORM macro
surrounds the argument with blanks.  A new macro, __ASM_FORM_RAW was
added and we change __ASM_REG to use the new RAW form.

Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de>
Link: http://lkml.kernel.org/r/1377803585-5913-2-git-send-email-dl9pf@gmx.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-29 13:26:32 -07:00
Ingo Molnar
aee2bce3cf Merge branch 'linus' into perf/core
Pick up the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-29 12:02:08 +02:00
Marek Szyprowski
a254738039 drivers: dma-contiguous: clean source code and prepare for device tree
This patch cleans the initialization of dma contiguous framework. The
all-in-one dma_declare_contiguous() function is now separated into
dma_contiguous_reserve_area() which only steals the the memory from
memblock allocator and dma_contiguous_add_device() function, which
assigns given device to the specified reserved memory area. This improves
the flexibility in defining contiguous memory areas and assigning device
to them, because now it is possible to assign more than one device to
the given contiguous memory area. Such split in initialization procedure
is also required for upcoming device tree support.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Tomasz Figa <t.figa@samsung.com>
2013-08-27 09:18:29 +02:00
Rafael J. Wysocki
7a330a5416 Merge branch 'pm-cpufreq'
* pm-cpufreq: (60 commits)
  cpufreq: pmac32-cpufreq: remove device tree parsing for cpu nodes
  cpufreq: pmac64-cpufreq: remove device tree parsing for cpu nodes
  cpufreq: maple-cpufreq: remove device tree parsing for cpu nodes
  cpufreq: arm_big_little: remove device tree parsing for cpu nodes
  cpufreq: kirkwood-cpufreq: remove device tree parsing for cpu nodes
  cpufreq: spear-cpufreq: remove device tree parsing for cpu nodes
  cpufreq: highbank-cpufreq: remove device tree parsing for cpu nodes
  cpufreq: cpufreq-cpu0: remove device tree parsing for cpu nodes
  cpufreq: imx6q-cpufreq: remove device tree parsing for cpu nodes
  drivers/bus: arm-cci: avoid parsing DT for cpu device nodes
  ARM: mvebu: remove device tree parsing for cpu nodes
  ARM: topology: remove hwid/MPIDR dependency from cpu_capacity
  of/device: add helper to get cpu device node from logical cpu index
  driver/core: cpu: initialize of_node in cpu's device struture
  ARM: DT/kernel: define ARM specific arch_match_cpu_phys_id
  of: move of_get_cpu_node implementation to DT core library
  powerpc: refactor of_get_cpu_node to support other architectures
  openrisc: remove undefined of_get_cpu_node declaration
  microblaze: remove undefined of_get_cpu_node declaration
  cpufreq: fix bad unlock balance on !CONFIG_SMP
  ...
2013-08-27 01:44:40 +02:00
Srivatsa Vaddagiri
6aef266c6e kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
kvm_hc_kick_cpu allows the calling vcpu to kick another vcpu out of halt state.
the presence of these hypercalls is indicated to guest via
kvm_feature_pv_unhalt.

Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration
During migration, any vcpu that got kicked but did not become runnable
(still in halted state) should be runnable after migration.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
[Raghu: Apic related changes, folding pvunhalted into vcpu_runnable
 Added flags for future use (suggested by Gleb)]
[ Raghu: fold pv_unhalt flag as suggested by Eric Northup]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26 12:47:09 +03:00
Raghavendra K T
4b0a867085 kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi
this is needed by both guest and host.

Originally-from: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26 12:46:01 +03:00
Rafael J. Wysocki
4eb5178c9c Merge back earlier 'pm-cpufreq' material. 2013-08-23 00:55:13 +02:00
Kevin Hilman
bfa664f21b ARM: tegra: core SoC enhancements for 3.12
This branch includes a number of enhancements to core SoC support for
 Tegra devices. The major new features are:
 
 * Adds a new CPU-power-gated cpuidle state for Tegra114.
 * Adds initial system suspend support for Tegra114, initially supporting
   just CPU-power-gating during suspend.
 * Adds "LP1" suspend mode support for all of Tegra20/30/114. This mode
   both gates CPU power, and places the DRAM into self-refresh mode.
 * A new DT-driven PCIe driver to Tegra20/30. The driver is also moved
   from arch/arm/mach-tegra/ to drivers/pci/host/.
 
 The PCIe driver work depends on the following tag from Thomas Petazzoni:
 git://git.infradead.org/linux-mvebu.git mis-3.12.2
 ... which is merged into the middle of this pull request.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSDlwwAAoJEMzrak5tbycxR68QAJZ/Izc9Izj0JH8hmCEvMNfi
 ub1DQfWAy3oXk0ttkk+BMvuyD8JTvBr8LSK8GqjZs//rFGlW81A4NHTvCwoKZjKe
 hgrRgI2B1wj3Um1sp8le9D0klKrTcfmpXrOxH8ALgz0BIpMge8AGZHkV0SrfQa1z
 bKiISFVAw12WJCVrQ2nbzpZGU51lbyJ/+RghttM1a8LuS2P03CZgt2kqiytk3UVK
 uiGEy3sCkjXLFO3EsUvM6ha623S6BumCAYjNfgDowTVKaoEe1r2TD4bFeU6lGcXJ
 mlVTv0Kywazf4Q2gKzkbDz8UQMArW4hok2iILHzz+sf/Rn0hie5XVqhFlbBlcae8
 vyWsHmqvmE9BJAK2G2RLs9cJCTzEpEyAjUWfE3sIIa3ztSguT5+PHndDLR/d76aS
 j8L3FYReICZ1NuNw1JSQPFs9g2EWJbNRiy+8o9O2elsJMpLDBj/FcV6TVpudbBTI
 z7hvN+XSVYUaCVD4e8ma9YoC3VGseiAZvd+Y8hPd2MFBECVPNpy2bOacieU6Bgxh
 zjSBXZ/URxN3rTkv9+F3BLWAOfVmJYN0rKV9YfM/rqpWjc9iQx30m1fRZDnXWhvd
 ps8eFIYsKqc6v9AAugl/RexFy4Laav9eREjb0k2LA8ClLhK/qLLuiisVmKWS/grh
 lX9tzPEG2nZcjxSYaEjz
 =ve9i
 -----END PGP SIGNATURE-----

Merge tag 'tegra-for-3.12-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/swarren/linux-tegra into next/soc

From: Stephen Warren:
ARM: tegra: core SoC enhancements for 3.12

This branch includes a number of enhancements to core SoC support for
Tegra devices. The major new features are:

* Adds a new CPU-power-gated cpuidle state for Tegra114.
* Adds initial system suspend support for Tegra114, initially supporting
  just CPU-power-gating during suspend.
* Adds "LP1" suspend mode support for all of Tegra20/30/114. This mode
  both gates CPU power, and places the DRAM into self-refresh mode.
* A new DT-driven PCIe driver to Tegra20/30. The driver is also moved
  from arch/arm/mach-tegra/ to drivers/pci/host/.

The PCIe driver work depends on the following tag from Thomas Petazzoni:
git://git.infradead.org/linux-mvebu.git mis-3.12.2
... which is merged into the middle of this pull request.

* tag 'tegra-for-3.12-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/swarren/linux-tegra: (33 commits)
  ARM: tegra: disable LP2 cpuidle state if PCIe is enabled
  MAINTAINERS: Add myself as Tegra PCIe maintainer
  PCI: tegra: set up PADS_REFCLK_CFG1
  PCI: tegra: Add Tegra 30 PCIe support
  PCI: tegra: Move PCIe driver to drivers/pci/host
  PCI: msi: add default MSI operations for !HAVE_GENERIC_HARDIRQS platforms
  ARM: tegra: add LP1 suspend support for Tegra114
  ARM: tegra: add LP1 suspend support for Tegra20
  ARM: tegra: add LP1 suspend support for Tegra30
  ARM: tegra: add common LP1 suspend support
  clk: tegra114: add LP1 suspend/resume support
  ARM: tegra: config the polarity of the request of sys clock
  ARM: tegra: add common resume handling code for LP1 resuming
  ARM: pci: add ->add_bus() and ->remove_bus() hooks to hw_pci
  of: pci: add registry of MSI chips
  PCI: Introduce new MSI chip infrastructure
  PCI: remove ARCH_SUPPORTS_MSI kconfig option
  PCI: use weak functions for MSI arch-specific functions
  ARM: tegra: unify Tegra's Kconfig a bit more
  ARM: tegra: remove the limitation that Tegra114 can't support suspend
  ...

Signed-off-by: Kevin Hilman <khilman@linaro.org>
2013-08-21 10:17:18 -07:00
John Haxby
edb6f29464 crypto: xor - Check for osxsave as well as avx in crypto/xor
This affects xen pv guests with sufficiently old versions of xen and
sufficiently new hardware.  On such a system, a guest with a btrfs
root won't even boot.

Signed-off-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-08-21 21:08:35 +10:00
Yoshihiro YUNOMAE
17405453f4 x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lock
Prevent crash_kexec() from deadlocking on ioapic_lock. When
crash_kexec() is executed on a CPU, the CPU will take ioapic_lock
in disable_IO_APIC(). So if the cpu gets an NMI while locking
ioapic_lock, a deadlock will happen.

In this patch, ioapic_lock is zapped/initialized before disable_IO_APIC().

You can reproduce this deadlock the following way:

1. Add mdelay(1000) after raw_spin_lock_irqsave() in
   native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c

   Although the deadlock can occur without this modification, it will increase
   the potential of the deadlock problem.

2. Build and install the kernel

3. Set up the OS which will run panic() and kexec when NMI is injected
    # echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf
    # vim /etc/default/grub
      add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line
    # grub2-mkconfig

4. Reboot the OS

5. Run following command for each vcpu on the guest
    # while true; do echo <CPU num> > /proc/irq/<IO-APIC-edge or IO-APIC-fasteoi>/smp_affinitity; done;
   By running this command, cpus will get ioapic_lock for setting affinity.

6. Inject NMI (push a dump button or execute 'virsh inject-nmi <domain>' if you
   use VM). After injecting NMI, panic() is called in an nmi-handler context.
   Then, kexec will normally run in panic(), but the operation will be stopped
   by deadlock on ioapic_lock in crash_kexec()->machine_crash_shutdown()->
   native_machine_crash_shutdown()->disable_IO_APIC()->clear_IO_APIC()->
   clear_IO_APIC_pin()->ioapic_read_entry().

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/20130820070107.28245.83806.stgit@yunodevel
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-20 09:26:33 +02:00
Linus Torvalds
7067552dfb Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two AMD microcode loader fixes and an OLPC firmware support fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Fix early microcode loading
  x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86
  x86: Don't clear olpc_ofw_header when sentinel is detected
2013-08-19 09:18:29 -07:00
Linus Torvalds
f1d6e17f54 Merge branch 'akpm' (patches from Andrew Morton)
Merge a bunch of fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  fs/proc/task_mmu.c: fix buffer overflow in add_page_map()
  arch: *: Kconfig: add "kernel/Kconfig.freezer" to "arch/*/Kconfig"
  ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id()
  x86 get_unmapped_area(): use proper mmap base for bottom-up direction
  ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page
  ocfs2: Revert 40bd62e to avoid regression in extended allocation
  drivers/rtc/rtc-stmp3xxx.c: provide timeout for potentially endless loop polling a HW bit
  hugetlb: fix lockdep splat caused by pmd sharing
  aoe: adjust ref of head for compound page tails
  microblaze: fix clone syscall
  mm: save soft-dirty bits on file pages
  mm: save soft-dirty bits on swapped pages
  memcg: don't initialize kmem-cache destroying work for root caches
2013-08-14 10:04:43 -07:00
Srivatsa Vaddagiri
92b75202e5 kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisor
During smp_boot_cpus  paravirtualied KVM guest detects if the hypervisor has
required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so,
support for pv-ticketlocks is registered via pv_lock_ops.

Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130810193849.GA25260@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
[Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related changes,
addition of safe_halt for irq enabled case, bailout spinning in nmi case(Gleb)]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-14 13:12:35 +02:00
Ingo Molnar
ccb1f55e71 Those are basically two fixes which correct the AMD early ucode loader
from accessing cpu_data too early, i.e. before smp_store_cpu_info()
 has copied the boot_cpu_data ontop and overwritten an already empty
 structure (which we shouldn't access that early in the first place
 anyway).
 
 The second patch is kinda largish for that late in the game but it
 shouldn't be problematic because we're simply switching from using
 cpu_data to use the CPU family number directly and thus again, not use
 uninitialized cpu_data structure.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSCT0UAAoJEBLB8Bhh3lVKWNsQAJbu1y79mc2vHR15NqMGXVoP
 F6sUNC3iUnNjxlUxWn3PPynsJS9zFTBEo1S6ILnsfuMWliDoH+4psHLMmG5F20TV
 XJCSS9s3fdsaJZiNa7JXggdI80xU3y8uB/z3nacRDKgcdw3tX+ITP4SdPgNF5F3k
 B/XGAFxyXfl9M9SRFMIO4GS9UyFMsDQP0VWMRhcosY1/Fj5bc9Uds1ngP9E4XMe9
 jB1pIzbvtDvtolYSdShwFE9M0os90TYPHVSgriYYXHLZPryZC8mWkc7GwSTl4bG9
 EwBGXv79SJWics9vWM2n9EQbFt46fUAyhpjyOo/fqVp4L5XjKuV7UyeT7X+bsY0q
 b3z1jWzBxGZ5ltE2BmZ8tVnfKgwYJlKgAf8FbSrylhtBInP80Wau1aYTzw50h9D2
 lmjO/rSOhm2FszdeFHG/IeVbSZJbSFrUdwm51nx2xF1qG0MXldy9ehJ4pO3Ui2XA
 d0z2y83ZxOYV723fTCa1/5C5xAes7LjvxftM32G9QTu7R5XnvVrfehVfPh9K1DJA
 nIEH6pbM358/PT+/q7BCPTnH7KVi+mE633YERDOfAgz/NFnVKfKzLBM0+AyFNHjk
 a4xFrOqVbTctW1Chv8Nh1457A2+gcT8yeju1XjbLE8IMqKOGyMt0gnRZlgMYj8BH
 WYKWi2q0LaDwsOcaQ9bm
 =cxbY
 -----END PGP SIGNATURE-----

Merge tag 'amd_ucode_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent

Pull AMD microcode fixes from Borislav Petkov:

 " Those are basically two fixes which correct the AMD early ucode loader
   from accessing cpu_data too early, i.e. before smp_store_cpu_info()
   has copied the boot_cpu_data ontop and overwritten an already empty
   structure (which we shouldn't access that early in the first place
   anyway).

   The second patch is kinda largish for that late in the game but it
   shouldn't be problematic because we're simply switching from using
   cpu_data to use the CPU family number directly and thus again, not use
   uninitialized cpu_data structure. "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-14 12:16:28 +02:00
Linn Crosetto
30e46b574a x86: avoid remapping data in parse_setup_data()
Type SETUP_PCI, added by setup_efi_pci(), may advertise a ROM size
larger than early_memremap() is able to handle, which is currently
limited to 256kB. If this occurs it leads to a NULL dereference in
parse_setup_data().

To avoid this, remap the setup_data header and allow parsing functions
for individual types to handle their own data remapping.

Signed-off-by: Linn Crosetto <linn@hp.com>
Link: http://lkml.kernel.org/r/1376430401-67445-1-git-send-email-linn@hp.com
Acked-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-13 23:29:19 -07:00
Cyrill Gorcunov
41bb3476b3 mm: save soft-dirty bits on file pages
Andy reported that if file page get reclaimed we lose the soft-dirty bit
if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address get
encoded into pte entry.  Thus when #pf happens on such non-present pte
we can restore it back.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-13 17:57:48 -07:00
Cyrill Gorcunov
179ef71cbc mm: save soft-dirty bits on swapped pages
Andy Lutomirski reported that if a page with _PAGE_SOFT_DIRTY bit set
get swapped out, the bit is getting lost and no longer available when
pte read back.

To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is saved in
pte entry for the page being swapped out.  When such page is to be read
back from a swap cache we check for bit presence and if it's there we
clear it and restore the former _PAGE_SOFT_DIRTY bit back.

One of the problem was to find a place in pte entry where we can save
the _PTE_SWP_SOFT_DIRTY bit while page is in swap.  The _PAGE_PSE was
chosen for that, it doesn't intersect with swap entry format stored in
pte.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-13 17:57:47 -07:00
Oleg Nesterov
e0acd0a68e sched: fix the theoretical signal_wake_up() vs schedule() race
This is only theoretical, but after try_to_wake_up(p) was changed
to check p->state under p->pi_lock the code like

	__set_current_state(TASK_INTERRUPTIBLE);
	schedule();

can miss a signal. This is the special case of wait-for-condition,
it relies on try_to_wake_up/schedule interaction and thus it does
not need mb() between __set_current_state() and if(signal_pending).

However, this __set_current_state() can move into the critical
section protected by rq->lock, now that try_to_wake_up() takes
another lock we need to ensure that it can't be reordered with
"if (signal_pending(current))" check inside that section.

The patch is actually one-liner, it simply adds smp_wmb() before
spin_lock_irq(rq->lock). This is what try_to_wake_up() already
does by the same reason.

We turn this wmb() into the new helper, smp_mb__before_spinlock(),
for better documentation and to allow the architectures to change
the default implementation.

While at it, kill smp_mb__after_lock(), it has no callers.

Perhaps we can also add smp_mb__before/after_spinunlock() for
prepare_to_wait().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-13 08:19:26 -07:00
Jan Kiszka
cc2df20c7c KVM: x86: Update symbolic exit codes
Add decoding for INVEPT and reorder the list according to the reason
numbers.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-08-13 16:58:42 +02:00
Ingo Molnar
6356bb0ad6 Bit 12 may or may not be set in MCi_STATUS.MCACOD when
an uncorrected error is reported. Ignore it when checking
 error signatures.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR/912AAoJEKurIx+X31iBz/oP/javBL5Q098ir8qj2GdlkSV0
 sPZkJ0Qd4AmmcvYgiYfE3wVL0F8mL2R3gFLcppN7wBTNqAjFOEOJb2wofwByiv+T
 0eOM9OWxPZTsoxdiCHd5pvoNuw6lkegUfmxbb9vr9Xs0JaJuhjQY/bbbUbi2ue2V
 WoghLZvcSyZBGAaFJ3tWNdPYroHp4OulbvVdgbb4+iLgaLz73zzpv1HQvBnw6CLW
 x+P5guI5FawZnS6FjPwfjIs1mKqU5fdBxGl4vHB55IPyexWOVY6i+zc9VG0EuzDE
 za+FW2v8ohJzzxNP3/u4W3ousJoXkZSrwlZvDvuEkozrM8izibmDr2JglYqnQ2sK
 5WMXL1aXHyTISWpHYiH0/218hljUhxaC5+TfNVeiZYytFULSyZOES8Em3fENx0gN
 HohTZkyBH0jNd2OZytSqS69/dvPgDIFD7qGR6KB2CaBAU+c/qH9M6g8vfaXt5Y/6
 2a0oKrZ+WXiXYgEq3PynnTHTiaHYDo2rjBN+yQDrauomopZv0qpEMEicS66G0lE3
 7+zh3CQOiv6WL9pYQxYjeIiP46H7tb+BSpbsDYDQ23++nLj61by1YXrLBJ2MsGep
 2xEDkoVE7jW5+vskcSIUfavmp7pNNnIpRsU2cb7bem44iTc2DkuHYXZtHstvQIDN
 qIwoXyi3JE2siMTs4icc
 =LbVP
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce-f-bit' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull MCE-uncorrected-error fix from Tony Luck:

 "Bit 12 may or may not be set in MCi_STATUS.MCACOD when
  an uncorrected error is reported. Ignore it when checking
  error signatures."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12 19:51:43 +02:00
Torsten Kaiser
84516098b5 x86, microcode, AMD: Fix early microcode loading
load_microcode_amd() (and the helper it is using) should not have an
cpu parameter. The microcode loading does not depend on the CPU wrt the
patches loaded since they will end up in a global list for all CPUs
anyway.

The change from cpu to x86family in load_microcode_amd()
now allows to drop the code messing with cpu_data(cpu) from
collect_cpu_info_amd_early(), which is wrong anyway because at that
point the per-cpu cpu_info is not yet setup (These values would later be
overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info()).

Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(),
because its only used at one place and without the cpuinfo_x86 accesses
it was not much left.

Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
[ Fengguang: build fix ]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[ Boris: adapt it to current tree. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12 18:32:45 +02:00
Ingo Molnar
0237d7f355 Merge branch 'x86/mce' into x86/ras
Pursue a single RAS/MCE topic branch on x86.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12 17:54:05 +02:00
Thomas Petazzoni
4287d824f2 PCI: use weak functions for MSI arch-specific functions
Until now, the MSI architecture-specific functions could be overloaded
using a fairly complex set of #define and compile-time
conditionals. In order to prepare for the introduction of the msi_chip
infrastructure, it is desirable to switch all those functions to use
the 'weak' mechanism. This commit converts all the architectures that
were overidding those MSI functions to use the new strategy.

Note that we keep two separate, non-weak, functions
default_teardown_msi_irqs() and default_restore_msi_irqs() for the
default behavior of the arch_teardown_msi_irqs() and
arch_restore_msi_irqs(), as the default behavior is needed by x86 PCI
code.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Daniel Price <daniel.price@gmail.com>
Tested-by: Thierry Reding <thierry.reding@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: linux-s390@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David S. Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2013-08-12 15:26:39 +00:00
Daniel Drake
d55e37bb0f x86: Don't clear olpc_ofw_header when sentinel is detected
OpenFirmware wasn't quite following the protocol described in boot.txt
and the kernel has detected this through use of the sentinel value
in boot_params. OFW does zero out almost all of the stuff that it should
do, but not the sentinel.

This causes the kernel to clear olpc_ofw_header, which breaks x86 OLPC
support.

OpenFirmware has now been fixed. However, it would be nice if we could
maintain Linux compatibility with old firmware versions. To do that, we just
have to avoid zeroing out olpc_ofw_header.

OFW does not write to any other parts of the header that are being zapped
by the sentinel-detection code, and all users of olpc_ofw_header are
somewhat protected through checking for the OLPC_OFW_SIG magic value
before using it. So this should not cause any problems for anyone.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/20130809221420.618E6FAB03@dev.laptop.org
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.9+
2013-08-09 15:29:48 -07:00
Konrad Rzeszutek Wilk
6efa20e49b xen: Support 64-bit PV guest receiving NMIs
This is based on a patch that Zhenzhong Duan had sent - which
was missing some of the remaining pieces. The kernel has the
logic to handle Xen-type-exceptions using the paravirt interface
in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME -
pv_irq_ops.adjust_exception_frame and and INTERRUPT_RETURN -
pv_cpu_ops.iret).

That means the nmi handler (and other exception handlers) use
the hypervisor iret.

The other changes that would be neccessary for this would
be to translate the NMI_VECTOR to one of the entries on the
ipi_vector and make xen_send_IPI_mask_allbutself use different
events.

Fortunately for us commit 1db01b4903
(xen: Clean up apic ipi interface) implemented this and we piggyback
on the cleanup such that the apic IPI interface will pass the right
vector value for NMI.

With this patch we can trigger NMIs within a PV guest (only tested
x86_64).

For this to work with normal PV guests (not initial domain)
we need the domain to be able to use the APIC ops - they are
already implemented to use the Xen event channels. For that
to be turned on in a PV domU we need to remove the masking
of X86_FEATURE_APIC.

Incidentally that means kgdb will also now work within
a PV guest without using the 'nokgdbroundup' workaround.

Note that the 32-bit version is different and this patch
does not enable that.

CC: Lisa Nguyen <lisa@xenapiadmin.com>
CC: Ben Guthro <benjamin.guthro@citrix.com>
CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Fixed up per David Vrabel comments]
Reviewed-by: Ben Guthro <benjamin.guthro@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2013-08-09 10:55:47 -04:00
Raghavendra K T
3a3bb00d5c kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi
These are needed by both guest and host.

Originally-from: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1376058122-8248-13-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Acked-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09 07:54:12 -07:00
Jeremy Fitzhardinge
96f853eaa8 x86, ticketlock: Add slowpath logic
Maintain a flag in the LSB of the ticket lock tail which indicates
whether anyone is in the lock slowpath and may need kicking when
the current holder unlocks.  The flags are set when the first locker
enters the slowpath, and cleared when unlocking to an empty queue (ie,
no contention).

In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking.  We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:

Unlocker			Locker
				test for lock pickup
					-> fail
unlock
test slowpath
	-> false
				set slowpath flags
				block

Whereas this works in any ordering:

Unlocker			Locker
				set slowpath flags
				test for lock pickup
					-> fail
				block
unlock
test slowpath
	-> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is
actually uncontended (ie, head == tail, so nobody is waiting), then it
clears the slowpath flag.

The unlock code uses a locked add to update the head counter.  This also
acts as a full memory barrier so that its safe to subsequently
read back the slowflag state, knowing that the updated lock is visible
to the other CPUs.  If it were an unlocked add, then the flag read may
just be forwarded from the store buffer before it was visible to the other
CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when
unlocking with PV ticketlocks.  However, if PV ticketlocks are not
enabled, then the old non-locked "add" is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of
line of the fastpath, which only happens when OPTIMIZE_SIZE=n.  If it
doesn't the generated code isn't too bad, but its definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied
on an inaccurate reading of the x86 memory ordering rules.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09 07:54:00 -07:00
Jeremy Fitzhardinge
4a1ed4ca68 x86, pvticketlock: When paravirtualizing ticket locks, increment by 2
Increment ticket head/tails by 2 rather than 1 to leave the LSB free
to store a "is in slowpath state" bit.  This halves the number
of possible CPUs for a given ticket size, but this shouldn't matter
in practice - kernels built for 32k+ CPU systems are probably
specially built for the hardware rather than a generic distro
kernel.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Link: http://lkml.kernel.org/r/1376058122-8248-9-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09 07:53:50 -07:00