linux

Author	SHA1	Message	Date
Andy Lutomirski	d0a0de21f8	x86/asm/entry: Remove INIT_TSS and fold the definitions into 'cpu_tss' The INIT_TSS is unnecessary. Just define the initial TSS where 'cpu_tss' is defined. While we're at it, merge the 32-bit and 64-bit definitions. The only syntactic change is that 32-bit kernels were computing sp0 as long, but now they compute it as unsigned long. Verified by objdump: the contents and relocations of .data..percpu..shared_aligned are unchanged on 32-bit and 64-bit kernels. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8fc39fa3f6c5d635e93afbdd1a0fe0678a6d7913.1425611534.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-06 08:32:58 +01:00
Andy Lutomirski	24933b82c0	x86/asm/entry: Rename 'init_tss' to 'cpu_tss' It has nothing to do with init -- there's only one TSS per cpu. Other names considered include: - current_tss: Confusing because we never switch the tss. - singleton_tss: Too long. This patch was generated with 's/init_tss/cpu_tss/g'. Followup patches will fix INIT_TSS and INIT_TSS_IST by hand. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-06 08:32:58 +01:00
Andy Lutomirski	9d0c914c60	x86/asm/entry/64/compat: Change the 32-bit sysenter code to use sp0 The ia32 sysenter code loaded the top of the kernel stack into rsp by loading kernel_stack and then adjusting it. It can be simplified to just read sp0 directly. This requires the addition of a new asm-offsets entry for sp0. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/88ff9006163d296a0665338585c36d9bfb85235d.1425611534.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-06 08:32:58 +01:00
Andy Lutomirski	75182b1632	x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0() This will make modifying the semantics of kernel_stack easier. The change to ist_begin_non_atomic() is necessary because sp0 no longer points to the same THREAD_SIZE-aligned region as RSP; it's one byte too high for that. At Denys' suggestion, rather than offsetting it, just check explicitly that we're in the correct range ending at sp0. This has the added benefit that we no longer assume that the thread stack is aligned to THREAD_SIZE. Suggested-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/ef8254ad414cbb8034c9a56396eeb24f5dd5b0de.1425611534.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-06 08:32:57 +01:00
Andy Lutomirski	8ef46a672a	x86/asm/entry: Add this_cpu_sp0() to read sp0 for the current cpu We currently store references to the top of the kernel stack in multiple places: kernel_stack (with an offset) and init_tss.x86_tss.sp0 (no offset). The latter is defined by hardware and is a clean canonical way to find the top of the stack. Add an accessor so we can start using it. This needs minor paravirt tweaks. On native, sp0 defines the top of the kernel stack and is therefore always correct. On Xen and lguest, the hypervisor tracks the top of the stack, but we want to start reading sp0 in the kernel. Fixing this is simple: just update our local copy of sp0 as well as the hypervisor's copy on task switches. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8d675581859712bee09a055ed8f785d80dac1eca.1425611534.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-06 08:32:57 +01:00
Rafael J. Wysocki	8204680c7b	Merge branch 'acpi-resources' * acpi-resources: x86/PCI/ACPI: Relax ACPI resource descriptor checks to work around BIOS bugs x86/PCI/ACPI: Ignore resources consumed by host bridge itself PCI: versatile: Update for list_for_each_entry() API change	2015-03-05 23:14:40 +01:00
Linus Torvalds	99aedde086	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: EFI fixes, an Intel Quark fix, an asm fix and an FPU handling fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu/xsaves: Fix improper uses of __ex_table x86/intel/quark: Select COMMON_CLK x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization firmware: dmi_scan: Fix dmi_len type efi/libstub: Fix boundary checking in efi_high_alloc() firmware: dmi_scan: Fix dmi scan to handle "End of Table" structure	2015-03-05 11:25:23 -08:00
Quentin Casasnovas	06c8173eb9	x86/fpu/xsaves: Fix improper uses of __ex_table Commit: `f31a9f7c71` ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area") introduced alternative instructions for XSAVES/XRSTORS and commit: `adb9d526e9` ("x86/xsaves: Add xsaves and xrstors support for booting time") added support for the XSAVES/XRSTORS instructions at boot time. Unfortunately both failed to properly protect them against faulting: The 'xstate_fault' macro will use the closest label named '1' backward and that ends up in the .altinstr_replacement section rather than in .text. This means that the kernel will never find in the __ex_table the .text address where this instruction might fault, leading to serious problems if userspace manages to trigger the fault. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Jamie Iles <jamie.iles@oracle.com> [ Improved the changelog, fixed some whitespace noise. ] Acked-by: Borislav Petkov <bp@alien8.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Cc: Allan Xavier <mr.a.xavier@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `adb9d526e9` ("x86/xsaves: Add xsaves and xrstors support for booting time") Fixes: `f31a9f7c71` ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-05 18:20:36 +01:00
Andy Shevchenko	9ab6eb51ef	x86/intel/quark: Select COMMON_CLK The commit `8bbc2a135b` ("x86/intel/quark: Add Intel Quark platform support") introduced a minimal support of Intel Quark SoC. That allows to use core parts of the SoC. However, the SPI, I2C, and GPIO drivers can't be selected by kernel configuration because they depend on COMMON_CLK. The patch adds a COMMON_CLK selection to the platfrom definition to allow user choose the drivers. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Ong, Boon Leong <boon.leong.ong@intel.com> Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Darren Hart <dvhart@linux.intel.com> Fixes: `8bbc2a135b` ("x86/intel/quark: Add Intel Quark platform support") Link: http://lkml.kernel.org/r/1425569044-2867-1-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-05 17:44:53 +01:00
Ingo Molnar	c709feda56	x86/mm/pat: Initialize __cachemode2pte_tbl[] and __pte2cachemode_tbl[] in a bit more readable fashion The initialization of these two arrays is a bit difficult to follow: restructure it optically so that a 2D structure shows which bit in the PTE is set and which not. Also improve on comments a bit. No code or data changed: # arch/x86/mm/init.o: text data bss dec hex filename 4585 424 29776 34785 87e1 init.o.before 4585 424 29776 34785 87e1 init.o.after md5: a82e11ff58bcfd0af3a94662a701f65d init.o.before.asm a82e11ff58bcfd0af3a94662a701f65d init.o.after.asm Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150305082135.GB5969@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-05 09:48:17 +01:00
Ingo Molnar	e61980a702	x86/mm: Simplify probe_page_size_mask() Now that we've simplified the gbpages config space, move the 'page_size_mask' initialization into probe_page_size_mask(), right next to the PSE and PGE enablement lines. Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: JBeulich@suse.com Cc: Jan Beulich <JBeulich@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Lindgren <tony@atomide.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: julia.lawall@lip6.fr Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-05 09:23:12 +01:00
Ingo Molnar	10971ab269	x86/mm: Further simplify 1 GB kernel linear mappings handling It's a bit pointless to allow Kconfig configuration for 1GB kernel mappings, it's already hidden behind a 'default y' and CONFIG_EXPERT. Remove this complication and simplify the code by renaming CONFIG_ENABLE_DIRECT_GBPAGES to CONFIG_X86_DIRECT_GBPAGES and document the DEBUG_PAGE_ALLOC and KMEMCHECK quirks. Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: JBeulich@suse.com Cc: Jan Beulich <JBeulich@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Lindgren <tony@atomide.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: julia.lawall@lip6.fr Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-05 09:23:04 +01:00
Luis R. Rodriguez	73c8c861dc	x86/mm: Use early_param_on_off() for direct_gbpages The enabler / disabler is pretty simple, just use the provided wrappers, this lets us easily relate the variable to the associated Kconfig entry. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: JBeulich@suse.com Cc: Jan Beulich <JBeulich@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Lindgren <tony@atomide.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: julia.lawall@lip6.fr Link: http://lkml.kernel.org/r/1425518654-3403-5-git-send-email-mcgrof@do-not-panic.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-05 08:02:12 +01:00
Luis R. Rodriguez	e5008abe92	x86/mm: Simplify enabling direct_gbpages direct_gbpages can be force enabled as an early parameter but not really have taken effect when DEBUG_PAGEALLOC or KMEMCHECK is enabled. You can also enable direct_gbpages right now if you have an x86_64 architecture but your CPU doesn't really have support for this feature. In both cases PG_LEVEL_1G won't actually be enabled but direct_gbpages is used in other areas under the assumptions that PG_LEVEL_1G was set. Fix this by putting together all requirements which make this feature sensible to enable under, and only enable both finally flipping on PG_LEVEL_1G and leaving PG_LEVEL_1G set when this is true. We only enable this feature then to be possible on sensible builds defined by the new ENABLE_DIRECT_GBPAGES. If the CPU has support for it you can either enable this by using the DIRECT_GBPAGES option or using the early kernel parameter. If a platform had support for this you can always force disable it as well. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: JBeulich@suse.com Cc: Jan Beulich <JBeulich@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Lindgren <tony@atomide.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: julia.lawall@lip6.fr Link: http://lkml.kernel.org/r/1425518654-3403-3-git-send-email-mcgrof@do-not-panic.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-05 08:02:12 +01:00
Luis R. Rodriguez	d9fd579c21	x86/mm: Use IS_ENABLED() for direct_gbpages Replace #ifdef eyesore with IS_ENABLED() use. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: JBeulich@suse.com Cc: Jan Beulich <JBeulich@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Lindgren <tony@atomide.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: julia.lawall@lip6.fr Link: http://lkml.kernel.org/r/1425518654-3403-2-git-send-email-mcgrof@do-not-panic.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-05 08:02:11 +01:00
Rusty Russell	d089f8e97d	x86: fix up obsolete cpu function usage. Thanks to spatch, plus manual removal of "&*". Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: x86@kernel.org	2015-03-05 15:25:05 +10:30
Andy Lutomirski	956421fbb7	x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization 'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and the related state make sense for 'ret_from_sys_call'. This is entirely the wrong check. TS_COMPAT would make a little more sense, but there's really no point in keeping this optimization at all. This fixes a return to the wrong user CS if we came from int 0x80 in a 64-bit task. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net [ Backported from tip:x86/asm. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-05 01:12:23 +01:00
Wang Nan	5eca7453d6	x86/traps: Separate set_intr_gate() and clean up early_trap_init() As early_trap_init() doesn't use IST, replace set_intr_gate_ist() and set_system_intr_gate_ist() with their standard counterparts. set_intr_gate() requires a trace_debug symbol which we don't have and won't use. This patch separates set_intr_gate() into two parts, and uses base version in early_trap_init(). Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: <dave.hansen@linux.intel.com> Cc: <lizefan@huawei.com> Cc: <masami.hiramatsu.pt@hitachi.com> Cc: <oleg@redhat.com> Cc: <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1425010789-13714-1-git-send-email-wangnan0@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-05 00:47:29 +01:00
Andy Lutomirski	1e3fbb8a1d	x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization 'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and the related state make sense for 'ret_from_sys_call'. This is entirely the wrong check. TS_COMPAT would make a little more sense, but there's really no point in keeping this optimization at all. This fixes a return to the wrong user CS if we came from int 0x80 in a 64-bit task. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:53 +01:00
Denys Vlasenko	d441c1f2b7	x86/asm/entry/64: Simplify optimistic SYSRET Avoid redundant load of %r11 (it is already loaded a few instructions before). Also simplify %rsp restoration, instead of two steps: add $0x80, %rsp mov 0x18(%rsp), %rsp we can do a simplified single step to restore user-space RSP: mov 0x98(%rsp), %rsp and get the same result. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> [ Clarified the changelog. ] Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1aef69b346a6db0d99cdfb0f5ba83e8c985e27d7.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:52 +01:00
Denys Vlasenko	b3ab90b333	x86/asm/entry/64/compat: Use more readable constant The last instance of "mysterious" SS+8 constant is replaced by SIZEOF_PTREGS. Message-Id: <1424822419-10267-1-git-send-email-dvlasenk@redhat.com> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/d35aeba3059407ac54f472ddcfbea767ff8916ac.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:52 +01:00
Denys Vlasenko	911d2bb5cc	x86/asm/entry/64: Use more readable constants Constants such as SS+8 or SS+8-RIP are mysterious. In most cases, SS+8 is just meant to be SIZEOF_PTREGS, SS+8-RIP is RIP's offset in the iret frame. This patch changes some of these constants to be less mysterious. No code changes (verified with objdump). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1d20491384773bd606e23a382fac23ddb49b5178.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:52 +01:00
Denys Vlasenko	14f6e9532d	x86/asm/entry/64/compat: Fold the IA32_ARG_FIXUP macro into its callers Use of a small macro - one with conditional expansion - does more harm than good. It obfuscates code, with minimal code reuse. For example, because of obfuscation it's not obvious that in 'ia32_sysenter_target', we can optimize loading of r9 - currently it is loaded with a detour through ebp. This patch folds the IA32_ARG_FIXUP macro into its callers. No code changes. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/4da092094cd78734384ac31e0d4ec1d8f69145a2.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:52 +01:00
Denys Vlasenko	ebfc453e27	x86/asm/entry/64: Clean up and document various entry code details This patch does a lot of cleanup in comments and formatting, but it does not change any code: - Rename 'save_paranoid' to 'paranoid_entry': this makes naming similar to its "non-paranoid" sibling, 'error_entry', and to its counterpart, 'paranoid_exit'. - Use the same CFI annotation atop 'paranoid_entry' and 'error_entry'. - Fix irregular indentation of assembler operands. - Add/fix comments on top of 'paranoid_entry' and 'error_entry'. - Remove stale comment about "oldrax". - Make comments about "no swapgs" flag in ebx more prominent. - Deindent wrongly indented top-level comment atop 'paranoid_exit'. - Indent wrongly deindented comment inside 'error_entry'. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/4640f9fcd5ea46eb299b1cd6d3f5da3167d2f78d.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:51 +01:00
Denys Vlasenko	1eeb207f87	x86/asm/entry/64: Move 'save_paranoid' and 'ret_from_fork' closer to their users For some odd reason, these two functions are at the very top of the file. "save_paranoid"'s caller is approximately in the middle of it, move it there. Move 'ret_from_fork' to be right after fork/exec helpers. This is a pure block move, nothing is changed in the function bodies. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/6446bbfe4094532623a5b83779b7015fec167a9d.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:51 +01:00
Denys Vlasenko	b87cf63e2a	x86/asm/entry: Add comments about various syscall instructions SYSCALL/SYSRET and SYSENTER/SYSEXIT have weird semantics. Moreover, they differ in 32- and 64-bit mode. What is saved? What is not? Is rsp set? Are interrupts disabled? People tend to not remember these details well enough. This patch adds comments which explain in detail what registers are modified by each of these instructions. The comments are placed immediately before corresponding entry and exit points. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/a94b98b63527797c871a81402ff5060b18fa880a.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:51 +01:00
Andy Lutomirski	050273d19b	x86/asm/entry/64: Remove 'int_check_syscall_exit_work' Nothing references it anymore. Reported-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `96b6352c12` ("x86_64, entry: Remove the syscall exit audit and schedule optimizations") Link: http://lkml.kernel.org/r/dd2a4d26ecc7a5db61b476727175cd99ae2b32a4.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:50 +01:00
Denys Vlasenko	f2db9382c1	x86/asm/entry: Do mass removal of 'ARGOFFSET' ARGOFFSET is zero now, removing it changes no code. A few macros lost "offset" parameter, since it is always zero now too. No code changes - verified with objdump. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/8689f937622d9d2db0ab8be82331fa15e4ed4713.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:50 +01:00
Denys Vlasenko	0d55083698	x86/asm/entry/64: Shrink code in 'paranoid_exit' RESTORE_EXTRA_REGS + RESTORE_C_REGS looks small, but it's a lot of instructions (fourteen). Let's reuse them. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> [ Cleaned up the labels. ] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1421272101-16847-2-git-send-email-dvlasenk@redhat.com Link: http://lkml.kernel.org/r/59d71848cee3ec9eb48c0252e602efd6bd560e3c.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:50 +01:00
Denys Vlasenko	e90e147cbc	x86/asm/entry/64: Fix comments - Misleading and slightly incorrect comments in "struct pt_regs" are fixed (four instances). - Fix incorrect comment atop EMPTY_FRAME macro. - Explain in more detail what we do with stack layout during hw interrupt. - Correct comments about "partial stack frame" which are no longer true. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1423778052-21038-3-git-send-email-dvlasenk@redhat.com Link: http://lkml.kernel.org/r/e1f4429c491fe6ceeddb879dea2786e0f8920f9c.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:49 +01:00
Denys Vlasenko	76f5df43ca	x86/asm/entry/64: Always allocate a complete "struct pt_regs" on the kernel stack The 64-bit entry code was using six stack slots less by not saving/restoring registers which are callee-preserved according to the C ABI, and was not allocating space for them. Only when syscalls needed a complete "struct pt_regs" was the complete area allocated and filled in. As an additional twist, on interrupt entry a "slightly less truncated pt_regs" trick is used, to make nested interrupt stacks easier to unwind. This proved to be a source of significant obfuscation and subtle bugs. For example, 'stub_fork' had to pop the return address, extend the struct, save registers, and push return address back. Ugly. 'ia32_ptregs_common' pops return address and "returns" via jmp insn, throwing a wrench into CPU return stack cache. This patch changes the code to always allocate a complete "struct pt_regs" on the kernel stack. The saving of registers is still done lazily. "Partial pt_regs" trick on interrupt stack is retained. Macros which manipulate "struct pt_regs" on stack are reworked: - ALLOC_PT_GPREGS_ON_STACK allocates the structure. - SAVE_C_REGS saves to it those registers which are clobbered by C code. - SAVE_EXTRA_REGS saves to it all other registers. - Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros reverse it. 'ia32_ptregs_common', 'stub_fork' and friends lost their ugly dance with the return pointer. LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets instead of magic numbers. 'error_entry' and 'save_paranoid' now use SAVE_C_REGS + SAVE_EXTRA_REGS instead of having it open-coded yet again. Patch was run-tested: 64-bit executables, 32-bit executables, strace works. Timing tests did not show measurable difference in 32-bit and 64-bit syscalls. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1423778052-21038-2-git-send-email-dvlasenk@redhat.com Link: http://lkml.kernel.org/r/b89763d354aa23e670b9bdf3a40ae320320a7c2e.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:49 +01:00
Denys Vlasenko	6e1327bd2b	x86/asm/entry/64: Fix incorrect symbolic constant usage: R11->ARGOFFSET Since the last fix of this nature, a few more instances have crept in. Fix them up. No object code changes (constants have the same value). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1423778052-21038-1-git-send-email-dvlasenk@redhat.com Link: http://lkml.kernel.org/r/f5e1c4084319a42e5f14d41e2d638949ce66bc08.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:49 +01:00
Denys Vlasenko	49db46a67b	x86/asm: Introduce push/pop macros which generate CFI_REL_OFFSET and CFI_RESTORE Sequences: pushl_cfi %reg CFI_REL_OFFSET reg, 0 and: popl_cfi %reg CFI_RESTORE reg happen quite often. This patch adds macros which generate them. No assembly changes (verified with objdump -dr vmlinux.o). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1421017655-25561-1-git-send-email-dvlasenk@redhat.com Link: http://lkml.kernel.org/r/2202eb90f175cf45d1b2d1c64dbb5676a8ad07ad.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:49 +01:00
Denys Vlasenko	69e8544cd0	x86/asm/64: Open-code register save/restore in trace_hardirqs() thunks This is a preparatory patch for change in "struct pt_regs" handling in entry_64.S. trace_hardirqs() thunks were (ab)using a part of the 'pt_regs' handling code, namely the SAVE_ARGS/RESTORE_ARGS macros, to save/restore registers across C function calls. Since SAVE_ARGS is going to be changed, open-code register saving/restoring here. Incidentally, this removes a bit of dead code: one SAVE_ARGS was used just to emit a CFI annotation, but it also generated unreachable assembly instructions. Take a page from thunk_32.S and use push/pop instructions instead of movq, they are far shorter: 1 or 2 bytes versus 5, and no need for instructions to adjust %rsp: text data bss dec hex filename 333 40 0 373 175 thunk_64_movq.o 104 40 0 144 90 thunk_64_push_pop.o [ This is ugly as sin, but we'll fix up the ugliness in the next patch. I see no point in reordering patches just to avoid an ugly intermediate state. --Andy ] Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1420927210-19738-4-git-send-email-dvlasenk@redhat.com Link: http://lkml.kernel.org/r/4c979ad604f0f02c5ade3b3da308b53eabd5e198.1424989793.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 22:50:48 +01:00
Linus Torvalds	b8e81a3b68	Merge git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Marcelo Tosatti: "KVM bug fixes, including a SVM interrupt injection regression fix, MIPS and ARM bug fixes" * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: MIPS: Enable after disabling interrupt KVM: MIPS: Fix trace event to save PC directly KVM: SVM: fix interrupt injection (apic->isr_count always 0) KVM: emulate: fix CMPXCHG8B on 32-bit hosts KVM: VMX: fix build without CONFIG_SMP arm/arm64: KVM: Add exit reaons to kvm_exit event tracing ARM: KVM: Fix size check in __coherent_cache_guest_page	2015-03-04 09:54:10 -08:00
Jiang Liu	63f1789ec7	x86/PCI/ACPI: Ignore resources consumed by host bridge itself When parsing resources for PCI host bridge, we should ignore resources consumed by host bridge itself and only report window resources available to child PCI busses. Fixes: `593669c2ac` (x86/PCI/ACPI: Use common ACPI resource interfaces ...) Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-03-04 14:09:44 +01:00
Ingo Molnar	f8e92fb4b0	A more involved rework of the alternatives framework to be able to pad instructions and thus make using the alternatives macros more straightforward and without having to figure out old and new instruction sizes but have the toolchain figure that out for us. Furthermore, it optimizes JMPs used so that fetch and decode can be relieved with smaller versions of the JMPs, where possible. Some stats: x86_64 defconfig: Alternatives sites total: 2478 Total padding added (in Bytes): 6051 The padding is currently done for: X86_FEATURE_ALWAYS X86_FEATURE_ERMS X86_FEATURE_LFENCE_RDTSC X86_FEATURE_MFENCE_RDTSC X86_FEATURE_SMAP This is with the latest version of the patchset. Of course, on each machine the alternatives sites actually being patched are a proper subset of the total number. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU9ekpAAoJEBLB8Bhh3lVKyjYP/AiHEiHkkjnpwTt49kUtUMI6 GIlGfJVNjp5LLnSRD/fkL/wdkBgQtMzr9O1g8Qi/lbFqxsOFteU9f1OtLx34ZwZw MhtdiHcrKGMsaIxTJh4FaqPHBT5ussm2yn1jlAX+LgILd3dpqe3oytsO8JihcK9j t2u9V/Lq92TV7zXxGgWJsPc86WhhgdldlU3X96S++Di18bnDaKbGkzthU6WzZG/H qtFZ5bfK8TlVHYduft+D9ZPzFYGp1WCOa03qU4+Djaxw02HDB6Ltysend9zg0lB1 RT/BP0PwHD3mOL11qpgtV1ChCbR8FJMN/z5+YdSNJgzDQA0H5Sf0UueTweosfAz+ /iC5t/wkegdYtqtA0nKVypYOJCS+UdfMZXenYgtSUJl6drB6I5BCW4mVft3AuWo+ EilPGpblvmjWRx1HiF4/Q/5zrSWHzmKQDyXuyxI9m0OUxAGAM0+8CY6wOqRA5pX+ /f5MjZ1hXELQGhl5Qdj4nqJacICGevJ8WYdZ53B+uYVxz7fbXk9hSYcZKT94UshD qSdaV4XJSuC7pDKqiWoNWXp5N1g+D2BgfwoQEr/RnodFZRlfc+cmOv/visak0OLr E/pp1vJvCi3+T3ImX1MCDiXmflQtFctiL3hNgMXYK2IGhJb2RDC2bFeZkksOHuAE BGgrn+usQDjVlikEnfI3 =0KXp -----END PGP SIGNATURE----- Merge tag 'alternatives_padding' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/asm Pull alternative instructions framework improvements from Borislav Petkov: "A more involved rework of the alternatives framework to be able to pad instructions and thus make using the alternatives macros more straightforward and without having to figure out old and new instruction sizes but have the toolchain figure that out for us. Furthermore, it optimizes JMPs used so that fetch and decode can be relieved with smaller versions of the JMPs, where possible. Some stats: x86_64 defconfig: Alternatives sites total: 2478 Total padding added (in Bytes): 6051 The padding is currently done for: X86_FEATURE_ALWAYS X86_FEATURE_ERMS X86_FEATURE_LFENCE_RDTSC X86_FEATURE_MFENCE_RDTSC X86_FEATURE_SMAP This is with the latest version of the patchset. Of course, on each machine the alternatives sites actually being patched are a proper subset of the total number." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 06:36:15 +01:00
Ingo Molnar	d2c032e3dc	Linux 4.0-rc2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJU9enEAAoJEHm+PkMAQRiG/ewIAJ4MW4tcAhaVj6ndCF3+uL/b RaVm1apUjsTloe5Fl0TT9J5CO3zdOetmMNToy2sf0W4MJDIyHf21o83l7eniV/6q al/c3fQ6HVtNjiSUNghTtzVlL+gUD1F60b9BGYi1V5h2Mp8u0NG1alTGLQfCB8sE ArB+v2aWEdSPn7mZDA0Yuc1In+8bkpht3oy+OLD/8JNkqqLnml9YOyPjM1cuRpBr NxKCLcPzSHH9/nR3T6XtkxXYV5xD3+CDm9roJhfHukoFmfT/G3C65Zcp2KEed/Cw QQpu+ox7fpUs10F/Fbfm8AE+tRB4o2sGh97sprXrO5oaFdx6FPIBo4WN8i/Vy68= =qpY+ -----END PGP SIGNATURE----- Merge tag 'v4.0-rc2' into x86/asm, to refresh the tree Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 06:35:43 +01:00
Brian Gerst	7e8e385aaf	x86/compat: Remove sys32_vm86_warning The check against lastcomm is racy, and the message it produces isn't necessary. vm86 support can be disabled on a 32-bit kernel also, and doesn't have this message. Switch to sys_ni_syscall instead. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1425439896-8322-4-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 06:16:21 +01:00
Brian Gerst	2aa4a71092	x86/compat: Merge native and compat 32-bit syscall tables Combine the 32-bit syscall tables into one file. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1425439896-8322-3-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 06:16:21 +01:00
Brian Gerst	29a5ff97fa	x86/compat: Remove compat_ni_syscall() compat_ni_syscall() does the same thing as sys_ni_syscall(). Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1425439896-8322-2-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-04 06:16:21 +01:00
Ingo Molnar	25efdcb43c	The first part of the scrubbing of the intel early microcode loader. There's more work to come but let's unload this pile first. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU9LsjAAoJEBLB8Bhh3lVKVIMP/0xUfRb/wV8P+HtJ6St41G8/ OygkO/D4UJcZiZP2xrxNix5/waExpegtEIqJPbO6Wlyq0N+0imCZxcsPsgmdw2Zo CQv6eu4p0on/v8EUFTJV9+ZHqs6zhch1tNQMfLuq+nXH7f8okSzbDL25RWoo8QU5 qOOHhHwjyQzivC1KpEodwVte1nT/KLNFio3moRwONKM0/1xCBHyvK4us6QqifWow hsDnVNdoXqTzqhY43u7zxNcSzo/RMCq/4sc90augdCdZFAVbKzPGM0o0Pq5FQTmv MbMuGF80LhfMFln0tBv30IGFuEc54BBD/x3d7YYadyeX0jIE4T27Pe4xbVLoTDTM T8PnaNn/sUyiBUGYO5ff5niMwzpqnuwgKi3wSe4fJ0HIHVE/SAHQogoomP1EKb59 n66RTWV5eE9KkzdZCTdXhm8aalLK8QfbSlElOrbqwr7/qfFslNpnNUzRwhaHoN1k kk5PJ8PipZR/YmWapIU7K6lEZQoRixAb+StMiOvX5n++d+Z7d2/Mu3UqebpqlvUc nVFpbPpB9FuH0XQPfICQEUvfWf+MTP9cPw5OkMu+Zo4ok8fIt7MdHCtQJVa9d0R/ 3ZMc9daDgwU8bEYoMRn6xHSGaUjtyO6AUeFhDM7b7If9YWDg59H7nufa5ySkM5KB xhWDhQYiSjhdvToKKF/e =Jlz9 -----END PGP SIGNATURE----- Merge tag 'intel_microcode_cleanup_p1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/microcode Pull x86 microcode loader code cleanups from Borislav Petkov: "The first part of the scrubbing of the intel early microcode loader. There's more work to come but let's unload this pile first." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-03 13:53:16 +01:00
Ingo Molnar	6d4d1984df	Two small fixes to the stack dumper, a cleanup and sustaining the previous log level after a newline. (Adrien Schildknecht) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU9L+WAAoJEBLB8Bhh3lVKJqIQAKizB0nODvJi8YV2PCMRfXoP ht30vwzqhLdmFQg0vC9ARbb1OB1Eq02Iq4wBmVfXRuNH+cMuLFK0BZjxKuAbRqWi cOhvvIIdqV+f/yEXtncG7q0JrD4JrtMPVTOeEC7q1yFsXlPyR7oeW6KcrRZIZUiz Rs+04QJELM1ZkLdKh/oNsA9A8IIPysoZ0elODnMb37RX/+8Rz6Lr/lG26t07xA9n 8Bb7i1oeL34GgvnZICFtON11L3iSB6vFlv3pgshqZNV0VaN0yzlk8oJBAy09srgS vLfuEW/q2GGeOkeim48tfAvoScMS8qQFRT+U92cOzNOFtULCVH9MRy4Ymq4vVtBv EmRLv3OgI0IaLBKFLNqJdvQRvMo8Ru4XW8LCbAesLAJsKTD0YSOpWNowG+wJLVv6 DJU8jUnT8zuNYQbe2Sa3XADkwWCohatLOljd6BpkyA2qGczixqYw43iNcAA5U0WH Q7taSpx2Srmi8NxT/tRbA1DdOsXATMZN1pX7lKpQUprdC3XRTQ0GQtL+TTpo5qPX 7gdNcQOdO1Jz2cf3CLn8dmDujZFNeJo10oZ/1ShBd6YJhqhm6kJv6I8ABbNW7yPj bh5FScnYPuiikxO56CaquDexEcI9NzxMaifwtTyvtHpamHknV1ciTNPO6PqFxBc/ 2K4oIGyt1fTNElgicoLL =9DIN -----END PGP SIGNATURE----- Merge tag 'tip_x86_kernel' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/debug Pull x86 debugging updates from Borislav Petkov: "Two small fixes to the stack dumper, a cleanup and sustaining the previous log level after a newline. (Adrien Schildknecht)" Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-03 12:14:58 +01:00
Radim Krčmář	f563db4bdb	KVM: SVM: fix interrupt injection (apic->isr_count always 0) In commit `b4eef9b36d`, we started to use hwapic_isr_update() != NULL instead of kvm_apic_vid_enabled(vcpu->kvm). This didn't work because SVM had it defined and "apicv" path in apic_{set,clear}_isr() does not change apic->isr_count, because it should always be 1. The initial value of apic->isr_count was based on kvm_apic_vid_enabled(vcpu->kvm), which is always 0 for SVM, so KVM could have injected interrupts when it shouldn't. Fix it by implicitly setting SVM's hwapic_isr_update to NULL and make the initial isr_count depend on hwapic_isr_update() for good measure. Fixes: `b4eef9b36d` ("kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv") Reported-and-tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-02 19:04:40 -03:00
Borislav Petkov	a858b5e504	x86/microcode/intel: Fix printing of microcode blobs in show_saved_mc() When doing echo 1 > /sys/devices/system/cpu/microcode/reload in order to reload microcode, I get: microcode: Total microcode saved: 1 BUG: using smp_processor_id() in preemptible [00000000] code: bash/2606 caller is debug_smp_processor_id+0x17/0x20 CPU: 1 PID: 2606 Comm: bash Not tainted 3.19.0-rc7+ #9 Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012 ffffffff81a4266d ffff8802131db808 ffffffff81666588 0000000000000007 0000000000000001 ffff8802131db838 ffffffff812e6eef ffff8802131db868 00000000000306a9 0000000000000010 0000000000000015 ffff8802131db848 Call Trace: dump_stack check_preemption_disabled debug_smp_processor_id show_saved_mc ? save_microcode.constprop.8 save_mc_for_early ? print_context_stack ? dump_trace ? __bfs ? mark_held_locks ? get_page_from_freelist ? trace_hardirqs_on_caller ? trace_hardirqs_on ? __alloc_pages_nodemask ? __get_vm_area_node ? map_vm_area ? __vmalloc_node_range ? generic_load_microcode generic_load_microcode ? microcode_fini_cpu request_microcode_fw reload_store dev_attr_store sysfs_kf_write kernfs_fop_write vfs_write ? sysret_check SyS_write system_call_fastpath microcode: CPU1: sig=0x306a9, pf=0x10, rev=0x15 microcode: mc_saved[0]: sig=0x306a9, pf=0x12, rev=0x1b, toal size=0x3000, date = 2014-05-29 because we're using smp_processor_id() in preemtible context. And we don't really need to use it there because the microcode container we're dumping is global and CPU-specific info is irrelevant. While at it, make pr_* stuff use "microcode: " prefix for easier grepping and document how to enable the DEBUG build. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:32:34 +01:00
Borislav Petkov	4f1f605cfe	x86/microcode/intel: Check scan_microcode()'s retval ... and do not attempt to load anything in case of error. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:32:20 +01:00
Borislav Petkov	140f74fced	x86/microcode/intel: Sanitize microcode_pointer() Shorten variable names and rename it to what it does. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:32:16 +01:00
Borislav Petkov	e3d8f67476	x86/microcode/intel: Move mc arg last in get_matching_{microcode\|sig} ... arguments list so that it comes more natural for those functions to have the signature, processor flags and revision together, before the rest of the args. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:32:13 +01:00
Borislav Petkov	9e02bb46d3	x86/microcode/intel: Simplify generic_load_microcode_early() * remove state variable and out label * get rid of completely unused mc_size * shorten variable names * get rid of local variables * don't do assignments in local var declarations for less cluttered code * finally rename it to the shorter and perfectly fine load_microcode_early() No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:32:10 +01:00
Borislav Petkov	58ce8d6d3a	x86/microcode: Consolidate family,model, ... code ... to the header. Split the family acquiring function into a main one, doing CPUID and a helper which computes the extended family and is used in multiple places. Get rid of the locally-grown get_x86_{family,model}(). While at it, rename local variables to something more descriptive and vertically align assignments for better readability. There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:32:07 +01:00
Borislav Petkov	4f5e5f2b57	x86/microcode/intel: Rename update_match_revision() ... to revision_is_newer() and push it up into the header and make it an inline function. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:32:03 +01:00
Borislav Petkov	c868570e74	x86/microcode/intel: Sanitize _save_mc() Shorten local variable names for better readability and flatten loop indentation levels. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:32:00 +01:00
Borislav Petkov	a5de5e242b	x86/microcode/intel: Make _save_mc() return the updated saved count ... of microcode patches instead of handing in a pointer which is used for I/O in an otherwise void function. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:31:56 +01:00
Borislav Petkov	02f35177fb	x86/microcode/intel: Simplify load_ucode_intel_bsp() Don't compute start and end from start and size in order to compute size again down the path in scan_microcode(). So pass size directly instead and simplify a bunch. Shorten variable names and remove useless ones. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:31:51 +01:00
Borislav Petkov	2d48bb9b6e	x86/microcode/intel: Get rid of last arg to load_ucode_intel_bsp() Allocate it on the helper's _load_ucode_intel_bsp() stack instead and do not hand it down. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:31:48 +01:00
Borislav Petkov	f9524e6f54	x86/microcode/intel: Do the mc_saved_src NULL check first ... and only then deref it. Also, shorten some variable names and rename others so as to diminish the ubiquitous presence of the "mc_" prefix everywhere and make it a bit more readable. Use kcalloc so that we don't kfree() uninitialized memory on the unwind path, as suggested by Quentin. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>	2015-03-02 20:31:11 +01:00
Borislav Petkov	776d3cdc93	x86/microcode/intel: Check if microcode was found before applying We should check the return value of the routines fishing out the proper microcode and not try to apply if we haven't found a suitable blob. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:31:03 +01:00
Quentin Casasnovas	d496a002ae	x86/microcode/intel: Fix out of bounds memory access to the extended header Improper pointer arithmetics when calculating the address of the extended header could lead to an out of bounds memory read and kernel panic. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Link: http://lkml.kernel.org/r/20150225094125.GB30434@chrystal.uk.oracle.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-02 20:30:42 +01:00
Arnaldo Carvalho de Melo	33be4ef116	Merge 'tip/perf/urgent' into perf/core to pick fixes Needed to build perf/core buildable in some cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2015-03-02 11:45:49 -03:00
Rusty Russell	020b37ac66	x86: Fix up obsolete __cpu_set() function usage Thanks to spatch, plus manual removal of "&*". Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1425296150-4722-8-git-send-email-rusty@rustcorp.com.au Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-02 14:28:17 +01:00
Linus Torvalds	a38ecbbd0b	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A CR4-shadow 32-bit init fix, plus two typo fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs too x86/platform/intel-mid: Fix trivial printk message typo in intel_mid_arch_setup() x86/cpu/intel: Fix trivial typo in intel_tlb_table[]	2015-03-01 12:22:44 -08:00
Linus Torvalds	d7b48fec35	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Two kprobes fixes and a handful of tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Make sparc64 arch point to sparc perf symbols: Define EM_AARCH64 for older OSes perf top: Fix SIGBUS on sparc64 perf tools: Fix probing for PERF_FLAG_FD_CLOEXEC flag perf tools: Fix pthread_attr_setaffinity_np build error perf tools: Define _GNU_SOURCE on pthread_attr_setaffinity_np feature check perf bench: Fix order of arguments to memcpy_alloc_mem kprobes/x86: Check for invalid ftrace location in __recover_probed_insn() kprobes/x86: Use 5-byte NOP when the code might be modified by ftrace	2015-03-01 11:56:13 -08:00
Tadeusz Struk	81e397d937	crypto: aesni - make driver-gcm-aes-aesni helper a proper aead alg Changed the __driver-gcm-aes-aesni to be a proper aead algorithm. This required a valid setkey and setauthsize functions to be added and also some changes to make sure that math context is not corrupted when the alg is used directly. Note that the __driver-gcm-aes-aesni should not be used directly by modules that can use it in interrupt context as we don't have a good fallback mechanism in this case. Signed-off-by: Adrian Hoban <adrian.hoban@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-02-28 23:31:35 +13:00
Daniel Borkmann	6bbb614ec4	x86/mm: Unexport set_memory_ro() and set_memory_rw() This effectively unexports set_memory_ro() and set_memory_rw() functions, and thus reverts: `a03352d2c1` ("x86: export set_memory_ro and set_memory_rw"). They have been introduced for debugging purposes in e1000e, but no module user is in mainline kernel (anymore?) and we explicitly do not want modules to use these functions, as they i.e. protect eBPF (interpreted & JIT'ed) images from malicious modifications or bugs. Outside of eBPF scope, I believe also other set_memory_*() functions should be unexported on x86 for modules. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Bruce Allan <bruce.w.allan@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Link: http://lkml.kernel.org/r/a064393a0a5d319eebde5c761cfd743132d4f213.1425040940.git.daniel@iogearbox.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-28 10:41:59 +01:00
Steven Rostedt	5b2bdbc845	x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs too Commit: `1e02ce4ccc` ("x86: Store a per-cpu shadow copy of CR4") added a shadow CR4 such that reads and writes that do not modify the CR4 execute much faster than always reading the register itself. The change modified cpu_init() in common.c, so that the shadow CR4 gets initialized before anything uses it. Unfortunately, there's two cpu_init()s in common.c. There's one for 64-bit and one for 32-bit. The commit only added the shadow init to the 64-bit path, but the 32-bit path needs the init too. Link: http://lkml.kernel.org/r/20150227125208.71c36402@gandalf.local.home Fixes: `1e02ce4ccc` "x86: Store a per-cpu shadow copy of CR4" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150227145019.2bdd4354@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-28 08:04:20 +01:00
Ingo Molnar	5838d18955	Merge branch 'linus' into x86/urgent, to merge dependent patch Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-28 08:03:10 +01:00
Juergen Gross	b8f05c8803	x86/xen: correct bug in p2m list initialization Commit `054954eb05` ("xen: switch to linear virtual mapped sparse p2m list") introduced an error. During initialization of the p2m list a p2m identity area mapped by a complete identity pmd entry has to be split up into smaller chunks sometimes, if a non-identity pfn is introduced in this area. If this non-identity pfn is not at index 0 of a p2m page the new p2m page needed is initialized with wrong identity entries, as the identity pfns don't start with the value corresponding to index 0, but with the initial non-identity pfn. This results in weird wrong mappings. Correct the wrong initialization by starting with the correct pfn. Cc: stable@vger.kernel.org # 3.19 Reported-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-02-27 14:53:19 +00:00
Lad, Prabhakar	66c046b407	crypto: sha-mb - Fix big integer constant sparse warning this patch fixes following sparse warning: sha1_mb_mgr_init_avx2.c:59:31: warning: constant 0xF76543210 is so big it is long Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-02-27 22:48:49 +13:00
Wang Nan	b4d8327024	x86/traps: Enable DEBUG_STACK after cpu_init() for TRAP_DB/BP Before this patch early_trap_init() installs DEBUG_STACK for X86_TRAP_BP and X86_TRAP_DB. However, DEBUG_STACK doesn't work correctly until cpu_init() <-- trap_init(). This patch passes 0 to set_intr_gate_ist() and set_system_intr_gate_ist() instead of DEBUG_STACK to let it use same stack as kernel, and installs DEBUG_STACK for them in trap_init(). As core runs at ring 0 between early_trap_init() and trap_init(), there is no chance to get a bad stack before trap_init(). As NMI is also enabled in trap_init(), we don't need to care about is_debug_stack() and related things used in arch/x86/kernel/nmi.c. Signed-off-by: Wang Nan <wangnan0@huawei.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: <dave.hansen@linux.intel.com> Cc: <lizefan@huawei.com> Cc: <luto@amacapital.net> Cc: <oleg@redhat.com> Link: http://lkml.kernel.org/r/1424929779-13174-1-git-send-email-wangnan0@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-26 12:29:20 +01:00
Ingo Molnar	e9e4e44309	Linux 34.0-rc1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJU6pFJAAoJEHm+PkMAQRiG2OwH/24nDK+l9zkaRs0xJsVh+qiW 8A2N1od0ickz43iMk48jfeWGkFOkd4izyvan/daJshJOE1Y5lCdSs7jq/OXVOv9L G0+KQUoC5NL0hqYKn1XJPFluNQ1yqMvrDwQt99grDGzruNGBbwHuBhAQmgzpj1nU do8KrGjr7ft1Rzm4mOAdET/ExWiF+mRSJSxxOv598HbsIRdM5wgn0hHjPlqDxmLN KH4r3YYEm0cHyjf4Krse0+YdhqdamRGJlmYxJgEsYNwCoMwkmHlLTc71diseUhrg r/VYIYQvpAA6Yvgw8rJ0N5gk/sJJig+WyyPhfQuc2bD5sbL9eO7mPnz2UP7z7ss= =vXB6 -----END PGP SIGNATURE----- Merge tag 'v4.0-rc1' into perf/core, to refresh the tree Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-26 12:24:50 +01:00
Matt Fleming	59bf7fd45c	perf/x86/intel: Enable conflicting event scheduling for CQM We can leverage the workqueue that we use for RMID rotation to support scheduling of conflicting monitoring events. Allowing events that monitor conflicting things is done at various other places in the perf subsystem, so there's precedent there. An example of two conflicting events would be monitoring a cgroup and simultaneously monitoring a task within that cgroup. This uses the cache_groups list as a queuing mechanism, where every event that reaches the front of the list gets the chance to be scheduled in, possibly descheduling any conflicting events that are running. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/1422038748-21397-10-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-25 13:53:36 +01:00
Matt Fleming	bff671dba7	perf/x86/intel: Perform rotation on Intel CQM RMIDs There are many use cases where people will want to monitor more tasks than there exist RMIDs in the hardware, meaning that we have to perform some kind of multiplexing. We do this by "rotating" the RMIDs in a workqueue, and assigning an RMID to a waiting event when the RMID becomes unused. This scheme reserves one RMID at all times for rotation. When we need to schedule a new event we give it the reserved RMID, pick a victim event from the front of the global CQM list and wait for the victim's RMID to drop to zero occupancy, before it becomes the new reserved RMID. We put the victim's RMID onto the limbo list, where it resides for a "minimum queue time", which is intended to save ourselves an expensive smp IPI when the RMID is unlikely to have a occupancy value below __intel_cqm_threshold. If we fail to recycle an RMID, even after waiting the minimum queue time then we need to increment __intel_cqm_threshold. There is an upper bound on this threshold, __intel_cqm_max_threshold, which is programmable from userland as /sys/devices/intel_cqm/max_recycling_threshold. The comments above __intel_cqm_rmid_rotate() have more details. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/1422038748-21397-9-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-25 13:53:35 +01:00
Matt Fleming	bfe1fcd268	perf/x86/intel: Support task events with Intel CQM Add support for task events as well as system-wide events. This change has a big impact on the way that we gather LLC occupancy values in intel_cqm_event_read(). Currently, for system-wide (per-cpu) events we defer processing to userspace which knows how to discard all but one cpu result per package. Things aren't so simple for task events because we need to do the value aggregation ourselves. To do this, we defer updating the LLC occupancy value in event->count from intel_cqm_event_read() and do an SMP cross-call to read values for all packages in intel_cqm_event_count(). We need to ensure that we only do this for one task event per cache group, otherwise we'll report duplicate values. If we're a system-wide event we want to fallback to the default perf_event_count() implementation. Refactor this into a common function so that we don't duplicate the code. Also, introduce PERF_TYPE_INTEL_CQM, since we need a way to track an event's task (if the event isn't per-cpu) inside of the Intel CQM PMU driver. This task information is only availble in the upper layers of the perf infrastructure. Other perf backends stash the target task in event->hw.*target so we need to do something similar. The task is used to determine whether events should share a cache group and an RMID. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/1422038748-21397-8-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-25 13:53:34 +01:00
Matt Fleming	35298e554c	perf/x86/intel: Implement LRU monitoring ID allocation for CQM It's possible to run into issues with re-using unused monitoring IDs because there may be stale cachelines associated with that ID from a previous allocation. This can cause the LLC occupancy values to be inaccurate. To attempt to mitigate this problem we place the IDs on a least recently used list, essentially a FIFO. The basic idea is that the longer the time period between ID re-use the lower the probability that stale cachelines exist in the cache. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/1422038748-21397-7-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-25 13:53:33 +01:00
Matt Fleming	4afbb24ce5	perf/x86/intel: Add Intel Cache QoS Monitoring support Future Intel Xeon processors support a Cache QoS Monitoring feature that allows tracking of the LLC occupancy for a task or task group, i.e. the amount of data in pulled into the LLC for the task (group). Currently the PMU only supports per-cpu events. We create an event for each cpu and read out all the LLC occupancy values. Because this results in duplicate values being written out to userspace, we also export a .per-pkg event file so that the perf tools only accumulate values for one cpu per package. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/1422038748-21397-6-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-25 13:53:32 +01:00
Peter P Waskiewicz Jr	cbc82b1726	x86: Add support for Intel Cache QoS Monitoring (CQM) detection This patch adds support for the new Cache QoS Monitoring (CQM) feature found in future Intel Xeon processors. It includes the new values to track CQM resources to the cpuinfo_x86 structure, plus the CPUID detection routines for CQM. CQM allows a process, or set of processes, to be tracked by the CPU to determine the cache usage of that task group. Using this data from the CPU, software can be written to extract this data and report cache usage and occupancy for a particular process, or group of processes. More information about Cache QoS Monitoring can be found in the Intel (R) x86 Architecture Software Developer Manual, section 17.14. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Webb <chris@arachsys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Honeyman <stevenhoneyman@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/1422038748-21397-5-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-25 13:53:31 +01:00
Andy Lutomirski	72c6fb4f74	x86/ia32-compat: Fix CLONE_SETTLS bitness of copy_thread() CLONE_SETTLS is expected to write a TLS entry in the GDT for 32-bit callers and to set FSBASE for 64-bit callers. The correct check is is_ia32_task(), which returns true in the context of a 32-bit syscall. TIF_IA32 is set if the task itself has a 32-bit personality, which is not the same thing. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Link: http://lkml.kernel.org/r/45e2d0d695393d76406a0c7225b82c76223e0cc5.1424822291.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-25 08:27:50 +01:00
Andy Lutomirski	08571f1ae3	x86/ptrace: Remove checks for TIF_IA32 when changing CS and SS The ability for modified CS and/or SS to be useful has nothing to do with TIF_IA32. Similarly, if there's an exploit involving changing CS or SS, it's exploitable with or without a TIF_IA32 check. So just delete the check. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Link: http://lkml.kernel.org/r/71c7ab36456855d11ae07edd4945a7dfe80f9915.1424822291.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-25 08:27:49 +01:00
Linus Torvalds	b24e2bdde4	xen: regression and bug fixes for 4.0-rc1 - Fix two regression introduced in 4.0-rc1 affecting PV/PVH guests in certain configurations. - Prevent pvscsi frontends bypassing backend checks. - Allow privcmd hypercalls to be preempted even on kernel with voluntary preemption. This fixes soft-lockups with long running toolstack hypercalls (e.g., when creating/destroying large domains). -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJU7GDMAAoJEFxbo/MsZsTRj7AIAIFD2Dgu9p56nx6rUA6JOWP9 2O1VlINEkggUFNCekTJ3HkAjHreC2Ir67nk1PTTfL3ZN+PqxEv+IoUF1XKJxE/G3 0KchszK0MVnc7UBrUJWRUEpGo92iIC9QBc8MyDN6dGbfV5lT4aHB5ug/aSZqG8HV 8ITOftfwnVE8ukOK9bSRPZRejnoRMbGUmqpr0X17sE4Zb8+LEf0iO5Yh88NEjzxM OD5hL3TGN5uXh6rYHS+U82M1g83G4Q7v10WIN5O96QhA6e5rVbVNsohHpL8IHY5q kazw7wBY//SKGCGniACYnh+vZDeNddwCz8chj6S49h272zanMleyolrUjlcMGzQ= =xmcy -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-4.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bugfixes from David Vrabel: "Xen regression and bug fixes for 4.0-rc1 - Fix two regressions introduced in 4.0-rc1 affecting PV/PVH guests in certain configurations. - Prevent pvscsi frontends bypassing backend checks. - Allow privcmd hypercalls to be preempted even on kernel with voluntary preemption. This fixes soft-lockups with long running toolstack hypercalls (e.g., when creating/destroying large domains)" * tag 'stable/for-linus-4.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: Initialize cr4 shadow for 64-bit PV(H) guests xen-scsiback: mark pvscsi frontend request consumed only after last read x86/xen: allow privcmd hypercalls to be preempted x86/xen: Make sure X2APIC_ENABLE bit of MSR_IA32_APICBASE is not set	2015-02-24 09:18:48 -08:00
Linus Torvalds	84257ce6b7	Lguest weird config build fix, and update to the documentation. Thanks, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU69NiAAoJENkgDmzRrbjxKIEQAJe/Us6fBCToMFt71DNMtwHf 3SsyDCUciO/lu4H2/hVRI0vK0gYEAmhPZHGTKMKvsg46NJ5Y+BXvg2ZICEbn52Oh 1guQdNJhJbjNtmHhZQbNzOLJfEz0/0IJmcCKZoJ3hVGyQ54qTMila26FUJ3qWAzO s+Wlc7sw1FScSy6RMYA1UQ7foTxpIQZZPc9QKZCMU16ZnK8OutQemQOPJp/pAmm9 LKuOLAIeF18AbuiC0BChG+wm4U7v8Tci5kEI2gRb77VUjGLcccpZPoJyaP1ZaJP/ cvkraLvERrWe+PPF/9CCB8yMDsOMfGDMPo0FrdSpSWZ6JnfZmLzKZLmtMlyGjVVA y7zn8eBGIJP519Y+WK9/ySO/FL+ibbep13wjwTYOv6AQHkwpH0VvtKOo7EDYAWht Prk+xB7u0l1SBX5LPzwN9QAu8WVeECjgYpjY7Oa3Pss/RLoZlNkV6Gk7Bhe9b7Zy NOW8nIlKRCQWS5KsM2XYRTkjB0BNoYJyHULIFSrIOBRgRVxnF1vOqRSGnIEzmrBI O2ewyUirFmTQI2RZzUL9UPidBo1DbdcUSYNV1Ex4vW+y/++v2VS7RaKDDvqU9JKG TahKm2a9Xi/0dDkKu6IwFPCtEjdu5wqRawvryz8AztjpNQbucO1qBpxbc1AmAM3g jlBf1S7OtPXf5bOk8QOy =pJ4p -----END PGP SIGNATURE----- Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull lguest fixes from Rusty Russell: "Lguest weird config build fix, and update to the documentation" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: lguest: update help text. lguest: now depends on PCI	2015-02-24 09:14:43 -08:00
Juergen Gross	954e12f7a8	x86/mm, efi: Use early_ioremap() in arch/x86/platform/efi/efi-bgrt.c Use early_ioremap() to map an I/O-area instead of early_memremap(). Signed-off-by: Juergen Gross <jgross@suse.com> Cc: matt.fleming@intel.com Link: http://lkml.kernel.org/r/1424769211-11378-5-git-send-email-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-24 15:58:07 +01:00
Juergen Gross	8d4a40bc06	x86/mm: Use early_memunmap() instead of early_iounmap() Memory mapped via early_memremap() should be unmapped with early_memunmap() instead of early_iounmap(). Signed-off-by: Juergen Gross <jgross@suse.com> Cc: matt.fleming@intel.com Link: http://lkml.kernel.org/r/1424769211-11378-2-git-send-email-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-24 15:58:06 +01:00
Ingo Molnar	a1fb6696c6	Linux 34.0-rc1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJU6pFJAAoJEHm+PkMAQRiG2OwH/24nDK+l9zkaRs0xJsVh+qiW 8A2N1od0ickz43iMk48jfeWGkFOkd4izyvan/daJshJOE1Y5lCdSs7jq/OXVOv9L G0+KQUoC5NL0hqYKn1XJPFluNQ1yqMvrDwQt99grDGzruNGBbwHuBhAQmgzpj1nU do8KrGjr7ft1Rzm4mOAdET/ExWiF+mRSJSxxOv598HbsIRdM5wgn0hHjPlqDxmLN KH4r3YYEm0cHyjf4Krse0+YdhqdamRGJlmYxJgEsYNwCoMwkmHlLTc71diseUhrg r/VYIYQvpAA6Yvgw8rJ0N5gk/sJJig+WyyPhfQuc2bD5sbL9eO7mPnz2UP7z7ss= =vXB6 -----END PGP SIGNATURE----- Merge tag 'v4.0-rc1' into x86/mm, to refresh the tree Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-24 15:55:28 +01:00
Yannick Guerrini	579deee571	x86/platform/intel-mid: Fix trivial printk message typo in intel_mid_arch_setup() Change 'Uknown' to 'Unknown' Signed-off-by: Yannick Guerrini <yguerrini@tomshardware.fr> Cc: trivial@kernel.org Link: http://lkml.kernel.org/r/1424710358-10140-1-git-send-email-yguerrini@tomshardware.fr Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-24 08:52:37 +01:00
Paolo Bonzini	4ff6f8e61e	KVM: emulate: fix CMPXCHG8B on 32-bit hosts This has been broken for a long time: it broke first in 2.6.35, then was almost fixed in 2.6.36 but this one-liner slipped through the cracks. The bug shows up as an infinite loop in Windows 7 (and newer) boot on 32-bit hosts without EPT. Windows uses CMPXCHG8B to write to page tables, which causes a page fault if running without EPT; the emulator is then called from kvm_mmu_page_fault. The loop then happens if the higher 4 bytes are not 0; the common case for this is that the NX bit (bit 63) is 1. Fixes: `6550e1f165` Fixes: `16518d5ada` Cc: stable@vger.kernel.org # 2.6.35+ Reported-by: Erik Rull <erik.rull@rdsoftware.de> Tested-by: Erik Rull <erik.rull@rdsoftware.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-23 22:28:48 +01:00
Radim Krčmář	21bc8dc5b7	KVM: VMX: fix build without CONFIG_SMP 'apic' is not defined if !CONFIG_X86_64 && !CONFIG_X86_LOCAL_APIC. Posted interrupt makes no sense without CONFIG_SMP, and CONFIG_X86_LOCAL_APIC will be set with it. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-23 22:28:48 +01:00
Adrien Schildknecht	04769ae3ac	x86/kernel: Use kstack_end() in dumpstack_64.c i386 is already using kstack_end() in dumpstack_32.c. We should also use it to make the code clearer and unify the stack printing logic some more. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Adrien Schildknecht <adrien+dev@schischi.me> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: c: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1424618638-6375-1-git-send-email-adrien+dev@schischi.me Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 18:37:13 +01:00
Adrien Schildknecht	1fc7f61c3e	x86/kernel: Fix output of show_stack_log_lvl() show_stack_log_lvl() does not set the log level after a new line, the following messages printed with pr_cont() are thus assigned to the default log level. This patch prepends the log level to the next message following a new line. print_trace_address() uses printk(log_lvl). Using printk() with just a log level is ignored and thus has no effect on the next pr_cont(). We need to prepend the log level directly into the message. Signed-off-by: Adrien Schildknecht <adrien+dev@schischi.me> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1424399661-20327-1-git-send-email-adrien+dev@schischi.me Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 18:34:42 +01:00
Boris Ostrovsky	5054daa285	x86/xen: Initialize cr4 shadow for 64-bit PV(H) guests Commit `1e02ce4ccc` ("x86: Store a per-cpu shadow copy of CR4") introduced CR4 shadows. These shadows are initialized in early boot code. The commit missed initialization for 64-bit PV(H) guests that this patch adds. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-02-23 16:30:26 +00:00
David Vrabel	fdfd811ddd	x86/xen: allow privcmd hypercalls to be preempted Hypercalls submitted by user space tools via the privcmd driver can take a long time (potentially many 10s of seconds) if the hypercall has many sub-operations. A fully preemptible kernel may deschedule such as task in any upcall called from a hypercall continuation. However, in a kernel with voluntary or no preemption, hypercall continuations in Xen allow event handlers to be run but the task issuing the hypercall will not be descheduled until the hypercall is complete and the ioctl returns to user space. These long running tasks may also trigger the kernel's soft lockup detection. Add xen_preemptible_hcall_begin() and xen_preemptible_hcall_end() to bracket hypercalls that may be preempted. Use these in the privcmd driver. When returning from an upcall, call xen_maybe_preempt_hcall() which adds a schedule point if if the current task was within a preemptible hypercall. Since _cond_resched() can move the task to a different CPU, clear and set xen_in_preemptible_hcall around the call. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2015-02-23 16:30:24 +00:00
Boris Ostrovsky	31795b470b	x86/xen: Make sure X2APIC_ENABLE bit of MSR_IA32_APICBASE is not set Commit `d524165cb8` ("x86/apic: Check x2apic early") tests X2APIC_ENABLE bit of MSR_IA32_APICBASE when CONFIG_X86_X2APIC is off and panics the kernel when this bit is set. Xen's PV guests will pass this MSR read to the hypervisor which will return its version of the MSR, where this bit might be set. Make sure we clear it before returning MSR value to the caller. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-02-23 16:30:23 +00:00
Oleg Nesterov	110d7f7513	x86/fpu: Don't abuse FPU in kernel threads if use_eager_fpu() AFAICS, there is no reason why kernel threads should have FPU context even if use_eager_fpu() == T. Now that interrupted_kernel_fpu_idle() does not check __thread_has_fpu() in the use_eager_fpu() case, we can remove the init_fpu() code from eager_fpu_init() and change flush_thread() called by do_execve() to initialize FPU. Note: of course, the change in flush_thread() is horrible and must be cleanuped. We need the new helper, and flush_thread() should return the error if init_fpu() fails. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/20150119185212.GD16427@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 15:50:45 +01:00
Oleg Nesterov	4b2e762e2e	x86/fpu: Always allow FPU in interrupt if use_eager_fpu() The __thread_has_fpu() check in interrupted_kernel_fpu_idle() was needed to prevent the nested kernel_fpu_begin(). Now that we have in_kernel_fpu and !__thread_has_fpu() case in __kernel_fpu_begin() does not depend on use_eager_fpu() (except clts) we can remove it. __thread_has_fpu() can be false even if use_eager_fpu(), but this case does not differ from !use_eager_fpu() case except we should not worry about X86_CR0_TS, __kernel_fpu_begin()/end() will not touch this bit. Note: I think we can kill all irq_fpu_usable() checks except in_kernel_fpu, just we need to record the state of X86_CR0_TS in __kernel_fpu_begin() and conditionalize stts() in __kernel_fpu_end(), but this needs another patch. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/20150119185151.GC16427@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 15:50:41 +01:00
Oleg Nesterov	7aeccb83e7	x86/fpu: __kernel_fpu_begin() should clear fpu_owner_task even if use_eager_fpu() __kernel_fpu_begin() does nothing if !__thread_has_fpu() && use_eager_fpu(), perhaps it assumes that this case is simply impossible. This is certainly not possible if in_interrupt() == T; interrupted_user_mode() should have FPU, and interrupted_kernel_fpu_idle() should fail if !__thread_has_fpu(). However, even if use_eager_fpu() == T a task can do drop_fpu(), then switch to another thread which becomes fpu_owner_task, then resume and call some function which does kernel_fpu_begin(). Say, an exiting task does a lot of things after exit_thread(), it is not safe to assume that it can't use FPU in these paths. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Pekka Riikonen <priikone@iki.fi> Link: http://lkml.kernel.org/r/20150119185132.GB16427@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 15:50:28 +01:00
Borislav Petkov	e0bc8d179e	x86/lib/memcpy_64.S: Convert memcpy to ALTERNATIVE_2 macro Make REP_GOOD variant the default after alternatives have run. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:55:52 +01:00
Borislav Petkov	a77600cd03	x86/lib/memmove_64.S: Convert memmove() to ALTERNATIVE macro Make it execute the ERMS version if support is present and we're in the forward memmove() part and remove the unfolded alternatives section definition. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:54:14 +01:00
Borislav Petkov	84d95ad4cb	x86/lib/memset_64.S: Convert to ALTERNATIVE_2 macro Make alternatives replace single JMPs instead of whole memset functions, thus decreasing the amount of instructions copied during patching time at boot. While at it, make it use the REP_GOOD version by default which means alternatives NOP out the JMP to the other versions, as REP_GOOD is set by default on the majority of relevant x86 processors. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:50:59 +01:00
Borislav Petkov	a930dc4543	x86/asm: Cleanup prefetch primitives This is based on a patch originally by hpa. With the current improvements to the alternatives, we can simply use %P1 as a mem8 operand constraint and rely on the toolchain to generate the proper instruction sizes. For example, on 32-bit, where we use an empty old instruction we get: apply_alternatives: feat: 632+8, old: (c104648b, len: 4), repl: (c195566c, len: 4) c104648b: alt_insn: 90 90 90 90 c195566c: rpl_insn: 0f 0d 4b 5c ... apply_alternatives: feat: 632+8, old: (c18e09b4, len: 3), repl: (c1955948, len: 3) c18e09b4: alt_insn: 90 90 90 c1955948: rpl_insn: 0f 0d 08 ... apply_alternatives: feat: 6*32+8, old: (c1190cf9, len: 7), repl: (c1955a79, len: 7) c1190cf9: alt_insn: 90 90 90 90 90 90 90 c1955a79: rpl_insn: 0f 0d 0d a0 d4 85 c1 all with the proper padding done depending on the size of the replacement instruction the compiler generates. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@linux.intel.com>	2015-02-23 13:44:17 +01:00
Borislav Petkov	c70e1b475f	x86/asm: Use alternative_2() in rdtsc_barrier() ... now that we have it. Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:44:17 +01:00
Borislav Petkov	6620ef28c8	x86/lib/clear_page_64.S: Convert to ALTERNATIVE_2 macro Move clear_page() up so that we can get 2-byte forward JMPs when patching: apply_alternatives: feat: 3*32+16, old: (ffffffff8130adb0, len: 5), repl: (ffffffff81d0b859, len: 5) ffffffff8130adb0: alt_insn: 90 90 90 90 90 recompute_jump: new_displ: 0x0000003e ffffffff81d0b859: rpl_insn: eb 3e 66 66 90 even though the compiler generated 5-byte JMPs which we padded with 5 NOPs. Also, make the REP_GOOD version be the default as the majority of machines set REP_GOOD. This way we get to save ourselves the JMP: old insn VA: 0xffffffff813038b0, CPU feat: X86_FEATURE_REP_GOOD, size: 5, padlen: 0 clear_page: ffffffff813038b0 <clear_page>: ffffffff813038b0: e9 0b 00 00 00 jmpq ffffffff813038c0 repl insn: 0xffffffff81cf0e92, size: 0 old insn VA: 0xffffffff813038b0, CPU feat: X86_FEATURE_ERMS, size: 5, padlen: 0 clear_page: ffffffff813038b0 <clear_page>: ffffffff813038b0: e9 0b 00 00 00 jmpq ffffffff813038c0 repl insn: 0xffffffff81cf0e92, size: 5 ffffffff81cf0e92: e9 69 2a 61 ff jmpq ffffffff81303900 ffffffff813038b0 <clear_page>: ffffffff813038b0: e9 69 2a 61 ff jmpq ffffffff8091631e Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:44:16 +01:00
Borislav Petkov	8e65f6e03a	x86/entry_32: Convert X86_INVD_BUG to ALTERNATIVE macro Booting a 486 kernel on an AMD guest with this patch applied, says: apply_alternatives: feat: 0*32+25, old: (c160a475, len: 5), repl: (c19557d4, len: 5) c160a475: alt_insn: 68 10 35 00 c1 c19557d4: rpl_insn: 68 80 39 00 c1 which is: old insn VA: 0xc160a475, CPU feat: X86_FEATURE_XMM, size: 5 simd_coprocessor_error: c160a475: 68 10 35 00 c1 push $0xc1003510 <do_general_protection> repl insn: 0xc19557d4, size: 5 c160a475: 68 80 39 00 c1 push $0xc1003980 <do_simd_coprocessor_error> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:44:15 +01:00
Borislav Petkov	669f8a9001	x86/smap: Use ALTERNATIVE macro ... and drop unfolded version. No need for ASM_NOP3 anymore either as the alternatives do the proper padding at build time and insert proper NOPs at boot time. There should be no apparent operational change from this patch. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:44:14 +01:00
Borislav Petkov	de2ff88884	x86/lib/copy_user_64.S: Convert to ALTERNATIVE_2 Use the asm macro and drop the locally grown version. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:44:13 +01:00
Borislav Petkov	090a3f6155	x86/lib/copy_page_64.S: Use generic ALTERNATIVE macro ... instead of the semi-version with the spelled out sections. What is more, make the REP_GOOD version be the default copy_page() version as the majority of the relevant x86 CPUs do set X86_FEATURE_REP_GOOD. Thus, copy_page gets compiled to: ffffffff8130af80 <copy_page>: ffffffff8130af80: e9 0b 00 00 00 jmpq ffffffff8130af90 <copy_page_regs> ffffffff8130af85: b9 00 02 00 00 mov $0x200,%ecx ffffffff8130af8a: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi) ffffffff8130af8d: c3 retq ffffffff8130af8e: 66 90 xchg %ax,%ax ffffffff8130af90 <copy_page_regs>: ... and after the alternatives have run, the JMP to the old, unrolled version gets NOPed out: ffffffff8130af80 <copy_page>: ffffffff8130af80: 66 66 90 xchg %ax,%ax ffffffff8130af83: 66 90 xchg %ax,%ax ffffffff8130af85: b9 00 02 00 00 mov $0x200,%ecx ffffffff8130af8a: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi) ffffffff8130af8d: c3 retq On modern uarches, those NOPs are cheaper than the unconditional JMP previously. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:44:12 +01:00
Borislav Petkov	4fd4b6e553	x86/alternatives: Use optimized NOPs for padding Alternatives allow now for an empty old instruction. In this case we go and pad the space with NOPs at assembly time. However, there are the optimal, longer NOPs which should be used. Do that at patching time by adding alt_instr.padlen-sized NOPs at the old instruction address. Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:44:12 +01:00
Borislav Petkov	48c7a2509f	x86/alternatives: Make JMPs more robust Up until now we had to pay attention to relative JMPs in alternatives about how their relative offset gets computed so that the jump target is still correct. Or, as it is the case for near CALLs (opcode e8), we still have to go and readjust the offset at patching time. What is more, the static_cpu_has_safe() facility had to forcefully generate 5-byte JMPs since we couldn't rely on the compiler to generate properly sized ones so we had to force the longest ones. Worse than that, sometimes it would generate a replacement JMP which is longer than the original one, thus overwriting the beginning of the next instruction at patching time. So, in order to alleviate all that and make using JMPs more straight-forward we go and pad the original instruction in an alternative block with NOPs at build time, should the replacement(s) be longer. This way, alternatives users shouldn't pay special attention so that original and replacement instruction sizes are fine but the assembler would simply add padding where needed and not do anything otherwise. As a second aspect, we go and recompute JMPs at patching time so that we can try to make 5-byte JMPs into two-byte ones if possible. If not, we still have to recompute the offsets as the replacement JMP gets put far away in the .altinstr_replacement section leading to a wrong offset if copied verbatim. For example, on a locally generated kernel image old insn VA: 0xffffffff810014bd, CPU feat: X86_FEATURE_ALWAYS, size: 2 __switch_to: ffffffff810014bd: eb 21 jmp ffffffff810014e0 repl insn: size: 5 ffffffff81d0b23c: e9 b1 62 2f ff jmpq ffffffff810014f2 gets corrected to a 2-byte JMP: apply_alternatives: feat: 332+21, old: (ffffffff810014bd, len: 2), repl: (ffffffff81d0b23c, len: 5) alt_insn: e9 b1 62 2f ff recompute_jumps: next_rip: ffffffff81d0b241, tgt_rip: ffffffff810014f2, new_displ: 0x00000033, ret len: 2 converted to: eb 33 90 90 90 and a 5-byte JMP: old insn VA: 0xffffffff81001516, CPU feat: X86_FEATURE_ALWAYS, size: 2 __switch_to: ffffffff81001516: eb 30 jmp ffffffff81001548 repl insn: size: 5 ffffffff81d0b241: e9 10 63 2f ff jmpq ffffffff81001556 gets shortened into a two-byte one: apply_alternatives: feat: 332+21, old: (ffffffff81001516, len: 2), repl: (ffffffff81d0b241, len: 5) alt_insn: e9 10 63 2f ff recompute_jumps: next_rip: ffffffff81d0b246, tgt_rip: ffffffff81001556, new_displ: 0x0000003e, ret len: 2 converted to: eb 3e 90 90 90 ... and so on. This leads to a net win of around 40ish replacements * 3 bytes savings =~ 120 bytes of I$ on an AMD guest which means some savings of precious instruction cache bandwidth. The padding to the shorter 2-byte JMPs are single-byte NOPs which on smart microarchitectures means discarding NOPs at decode time and thus freeing up execution bandwidth. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:44:11 +01:00
Borislav Petkov	4332195c56	x86/alternatives: Add instruction padding Up until now we have always paid attention to make sure the length of the new instruction replacing the old one is at least less or equal to the length of the old instruction. If the new instruction is longer, at the time it replaces the old instruction it will overwrite the beginning of the next instruction in the kernel image and cause your pants to catch fire. So instead of having to pay attention, teach the alternatives framework to pad shorter old instructions with NOPs at buildtime - but only in the case when len(old instruction(s)) < len(new instruction(s)) and add nothing in the >= case. (In that case we do add_nops() when patching). This way the alternatives user shouldn't have to care about instruction sizes and simply use the macros. Add asm ALTERNATIVE* flavor macros too, while at it. Also, we need to save the pad length in a separate struct alt_instr member for NOP optimization and the way to do that reliably is to carry the pad length instead of trying to detect whether we're looking at single-byte NOPs or at pathological instruction offsets like e9 90 90 90 90, for example, which is a valid instruction. Thanks to Michael Matz for the great help with toolchain questions. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:44:00 +01:00
Borislav Petkov	db477a3386	x86/alternatives: Cleanup DPRINTK macro Make it pass __func__ implicitly. Also, dump info about each replacing we're doing. Fixup comments and style while at it. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:35:50 +01:00
Borislav Petkov	338ea55579	x86/lib/copy_user_64.S: Remove FIX_ALIGNMENT define It is unconditionally enabled so remove it. No object file change. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-23 13:35:49 +01:00
Yannick Guerrini	a927792c19	x86/cpu/intel: Fix trivial typo in intel_tlb_table[] Change 'ssociative' to 'associative' Signed-off-by: Yannick Guerrini <yguerrini@tomshardware.fr> Cc: Borislav Petkov <bp@suse.de> Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Chris Bainbridge <chris.bainbridge@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Steven Honeyman <stevenhoneyman@gmail.com> Cc: trivial@kernel.org Link: http://lkml.kernel.org/r/1424558510-1420-1-git-send-email-yguerrini@tomshardware.fr Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-22 08:55:58 +01:00
Linus Torvalds	f9677375b0	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull Intel Quark SoC support from Ingo Molnar: "This adds support for Intel Quark X1000 SoC boards, used in the low power 32-bit x86 Intel Galileo microcontroller board intended for the Arduino space. There's been some preparatory core x86 patches for Quark CPU quirks merged already, but this rounds it all up and adds Kconfig enablement. It's a clean hardware enablement addition tree at this point" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel/quark: Fix simple_return.cocci warnings x86/intel/quark: Fix ptr_ret.cocci warnings x86/intel/quark: Add Intel Quark platform support x86/intel/quark: Add Isolated Memory Regions for Quark X1000	2015-02-21 11:12:07 -08:00
Linus Torvalds	10436cf881	Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Two fixes: the paravirt spin_unlock() corruption/crash fix, and an rtmutex NULL dereference crash fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/spinlocks/paravirt: Fix memory corruption on unlock locking/rtmutex: Avoid a NULL pointer dereference on deadlock	2015-02-21 10:45:03 -08:00
Linus Torvalds	5fbe4c224c	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: "This contains: - EFI fixes - a boot printout fix - ASLR/kASLR fixes - intel microcode driver fixes - other misc fixes Most of the linecount comes from an EFI revert" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/ASLR: Avoid PAGE_SIZE redefinition for UML subarch x86/microcode/intel: Handle truncated microcode images more robustly x86/microcode/intel: Guard against stack overflow in the loader x86, mm/ASLR: Fix stack randomization on 64-bit systems x86/mm/init: Fix incorrect page size in init_memory_mapping() printks x86/mm/ASLR: Propagate base load address calculation Documentation/x86: Fix path in zero-page.txt x86/apic: Fix the devicetree build in certain configs Revert "efi/libstub: Call get_memory_map() to obtain map and desc sizes" x86/efi: Avoid triple faults during EFI mixed mode calls	2015-02-21 10:41:29 -08:00
Linus Torvalds	b5aeca54d0	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 uprobe/kprobe fixes from Ingo Molnar: "This contains two uprobes fixes, an uprobes comment update and a kprobes fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes/x86: Mark 2 bytes NOP as boostable uprobes/x86: Fix 2-byte opcode table uprobes/x86: Fix 1-byte opcode tables uprobes/x86: Add comment with insn opcodes, mnemonics and why we dont support them	2015-02-21 10:39:16 -08:00
Linus Torvalds	3f4d9925e9	Merge branches 'core-urgent-for-linus' and 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rcu fix and x86 irq fix from Ingo Molnar: - Fix a bug that caused an RCU warning splat. - Two x86 irq related fixes: a hotplug crash fix and an ACPI IRQ registry fix. * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Clear need_qs flag to prevent splat * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable() x86/irq: Fix regression caused by commit `b568b8601f`	2015-02-21 10:36:06 -08:00
Petr Mladek	2a6730c8b6	kprobes/x86: Check for invalid ftrace location in __recover_probed_insn() __recover_probed_insn() should always be called from an address where an instructions starts. The check for ftrace_location() might help to discover a potential inconsistency. This patch adds WARN_ON() when the inconsistency is detected. Also it adds handling of the situation when the original code can not get recovered. Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Petr Mladek <pmladek@suse.cz> Cc: Ananth NMavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1424441250-27146-3-git-send-email-pmladek@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-21 10:33:31 +01:00
Petr Mladek	650b7b23cb	kprobes/x86: Use 5-byte NOP when the code might be modified by ftrace can_probe() checks if the given address points to the beginning of an instruction. It analyzes all the instructions from the beginning of the function until the given address. The code might be modified by another Kprobe. In this case, the current code is read into a buffer, int3 breakpoint is replaced by the saved opcode in the buffer, and can_probe() analyzes the buffer instead. There is a bug that __recover_probed_insn() tries to restore the original code even for Kprobes using the ftrace framework. But in this case, the opcode is not stored. See the difference between arch_prepare_kprobe() and arch_prepare_kprobe_ftrace(). The opcode is stored by arch_copy_kprobe() only from arch_prepare_kprobe(). This patch makes Kprobe to use the ideal 5-byte NOP when the code can be modified by ftrace. It is the original instruction, see ftrace_make_nop() and ftrace_nop_replace(). Note that we always need to use the NOP for ftrace locations. Kprobes do not block ftrace and the instruction might get modified at anytime. It might even be in an inconsistent state because it is modified step by step using the int3 breakpoint. The patch also fixes indentation of the touched comment. Note that I found this problem when playing with Kprobes. I did it on x86_64 with gcc-4.8.3 that supported -mfentry. I modified samples/kprobes/kprobe_example.c and added offset 5 to put the probe right after the fentry area: static struct kprobe kp = { .symbol_name = "do_fork", + .offset = 5, }; Then I was able to load kprobe_example before jprobe_example but not the other way around: $> modprobe jprobe_example $> modprobe kprobe_example modprobe: ERROR: could not insert 'kprobe_example': Invalid or incomplete multibyte or wide character It did not make much sense and debugging pointed to the bug described above. Signed-off-by: Petr Mladek <pmladek@suse.cz> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ananth NMavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1424441250-27146-2-git-send-email-pmladek@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-21 10:33:30 +01:00
Jiri Kosina	570e1aa84c	x86/mm/ASLR: Avoid PAGE_SIZE redefinition for UML subarch Commit `f47233c2d3` ("x86/mm/ASLR: Propagate base load address calculation") causes PAGE_SIZE redefinition warnings for UML subarch builds. This is caused by added includes that were leftovers from previous patch versions are are not actually needed (especially page_types.h inlcude in module.c). Drop those stray includes. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1502201017240.28769@pobox.suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-20 10:55:32 +01:00
Ross Zwisler	719d359dc7	x86/asm: Add support for the pcommit instruction Add support for the new pcommit (persistent commit) instruction. This instruction was announced in the document "Intel Architecture Instruction Set Extensions Programming Reference" with reference number 319433-022: https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf The pcommit instruction ensures that data that has been flushed from the processor's cache hierarchy with clwb, clflushopt or clflush is accepted to memory and is durable on the DIMM. The primary use case for this is persistent memory. This function shows how to properly use clwb/clflushopt/clflush and pcommit with appropriate fencing: void flush_and_commit_buffer(void vaddr, unsigned int size) { void vend = vaddr + size - 1; for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size) clwb(vaddr); /* Flush any possible final partial cacheline / clwb(vend); / * sfence to order clwb/clflushopt/clflush cache flushes * mfence via mb() also works / wmb(); / pcommit and the required sfence for ordering / pcommit_sfence(); } After this function completes the data pointed to by vaddr is has been accepted to memory and will be durable if the vaddr points to persistent memory. Pcommit must always be ordered by an mfence or sfence, so to help simplify things we include both the pcommit and the required sfence in the alternatives generated by pcommit_sfence(). The other option is to keep them separated, but on platforms that don't support pcommit this would then turn into: void flush_and_commit_buffer(void vaddr, unsigned int size) { void vend = vaddr + size - 1; for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size) clwb(vaddr); / Flush any possible final partial cacheline / clwb(vend); / * sfence to order clwb/clflushopt/clflush cache flushes * mfence via mb() also works / wmb(); nop(); / from pcommit(), via alternatives / / * sfence to order pcommit * mfence via mb() also works */ wmb(); } This is still correct, but now you've got two fences separated by only a nop. With the commit and the fence together in pcommit_sfence() you avoid the final unneeded fence. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1424367448-24254-1-git-send-email-ross.zwisler@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-20 09:43:36 +01:00
David Vrabel	e3a1f6cac1	x86: pte_protnone() and pmd_protnone() must check entry is not present Since _PAGE_PROTNONE aliases _PAGE_GLOBAL it is only valid if _PAGE_PRESENT is clear. Make pte_protnone() and pmd_protnone() check for this. This fixes a 64-bit Xen PV guest regression introduced by `8a0516ed8b` ("mm: convert p[te\|md]_numa users to p[te\|md]_protnone_numa"). Any userspace process would endlessly fault. In a 64-bit PV guest, userspace page table entries have _PAGE_GLOBAL set by the hypervisor. This meant that any fault on a present userspace entry (e.g., a write to a read-only mapping) would be misinterpreted as a NUMA hinting fault and the fault would not be correctly handled, resulting in the access endlessly faulting. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-19 15:04:49 -08:00
Linus Torvalds	27a22ee4c7	Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild updates from Michal Marek: - several cleanups in kbuild - serialize multiple config targets so that 'make defconfig kvmconfig' works - The cc-ifversion macro got support for an else-branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: kbuild,gcov: simplify kernel/gcov/Makefile more kbuild: allow cc-ifversion to have the argument for false condition kbuild,gcov: simplify kernel/gcov/Makefile kbuild,gcov: remove unnecessary workaround kbuild: do not add $(call ...) to invoke cc-version or cc-fullversion kbuild: fix cc-ifversion macro kbuild: drop $(version_h) from MRPROPER_FILES kbuild: use mixed-targets when two or more config targets are given kbuild: remove redundant line from bounds.h/asm-offsets.h kbuild: merge bounds.h and asm-offsets.h rules kbuild: Drop support for clean-rule	2015-02-19 10:07:08 -08:00
Ingo Molnar	1fbe23e0de	* Two fixes hardening microcode data handling. (Quentin Casasnovas) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU5cyVAAoJEBLB8Bhh3lVKJWEP/1eK+XyiVdxV7FRuPmmgjGUC mD6MypFCwc942orTdltm9vlRFTU6OE1AkfEVX3NKawy+lzt/mE+TbWzwx+mr26un pqyKgSGGKqACDBADgUiVxubXffhAx9Ke5obScZoFA/Yp+l7os8wkwr6AMjwU+XgU FMGKWra0yeZsfCSkQgQ+q+RjQe2TOjh3YYVcwpPRaU6jkJ3CR+MNQ2tVmJnEVMAq Q3xEce8mMN+xpuyTlCyvpSIid8M9klAeXb5kjqfffJGSBmtVJ+nn3mDV1a0ejeYQ aA6X6SBwpIBPPjhwJrsgUcGC0GeF4X0TKjg1F6ZEW0lN9/mipiM+t3OEgxcBH0G7 SOAUtQTRDasj0bJd5qKOhAWWmFoXjSc61XiMYUreOWDPoaje76oql+iN1auZsRSh RS6KCwYgdqQYscN05L/l4iHgJXeGUTm45BJ6rJb1wEJ9OldO5yK4O42Tn0IZyQ4g w10poQY4jkjPnHVUWvk5IQpu7AcBiZtov201a89QpRyPGFoGgOOu7n5y0nDLxuvK m3L8LrEve8xO8xdqyidQKE3KGLnDcuuTx9XscbEGtoNWQ8oGIYuYW9DvKzCK/kmU u24tx65tygcQ6NJoUW/S3mIwnlyM1egqziXjpzmfR2TvGraqNCkIkocYyf1etVjh c0Mem02eJTvarxEpvHAa =qoLG -----END PGP SIGNATURE----- Merge tag 'microcode_fixes_for-3.21' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent Pull microcode fixes from Borislav Petkov: - Two fixes hardening microcode data handling. (Quentin Casasnovas) Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 13:32:42 +01:00
Ingo Molnar	fa45a45ca3	Merge tag 'ras_for_3.21' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull RAS updates from Borislav Petkov: "- Enable AMD thresholding IRQ by default if supported. (Aravind Gopalakrishnan) - Unify mce_panic() message pattern. (Derek Che) - A bit more involved simplification of the CMCI logic after yet another report about race condition with the adaptive logic. (Borislav Petkov) - ACPI APEI EINJ fleshing out of the user documentation. (Borislav Petkov) - Minor cleanup. (Jan Beulich.)" Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 13:31:33 +01:00
Aravind Gopalakrishnan	d79f931f1c	x86/MCE/AMD: Enable thresholding interrupts by default if supported We setup APIC vectors for threshold errors if interrupt_capable. However, we don't set interrupt_enable by default. Rework threshold_restart_bank() so that when we set up lvt_offset, we also set IntType to APIC and also enable thresholding interrupts for banks which support it by default. User is still allowed to disable interrupts through sysfs. While at it, check if status is valid before we proceed to log error using mce_log. This is because, in multi-node platforms, only the NBC (Node Base Core, i.e. the first core in the node) has valid status info in its MCA registers. So, the decoding of status values on the non-NBC leads to noise on kernel logs like so: EDAC DEBUG: amd64_inject_write_store: section=0x80000000 word_bits=0x10020001 [Hardware Error]: Corrected error, no action required. [Hardware Error]: CPU:25 (15:2:0) MC4_STATUS[-\|CE\|-\|-\|- [Hardware Error]: Corrected error, no action required. [Hardware Error]: CPU:26 (15:2:0) MC4_STATUS[-\|CE\|-\|-\|- <...> WARNING: CPU: 25 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0() WARNING: CPU: 26 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0() Something is rotten in the state of Denmark. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1422896561-7695-1-git-send-email-aravind.gopalakrishnan@amd.com [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 13:24:47 +01:00
Derek Che	8af7043a3c	x86/MCE: Make mce_panic() fatal machine check msg in the same pattern There is another mce_panic call with "Fatal machine check on current CPU" in the same mce.c file, why not keep them all in same pattern mce_panic("Fatal machine check on current CPU", &m, msg); Signed-off-by: Derek Che <drc@yahoo-inc.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 13:24:47 +01:00
Borislav Petkov	3f2f0680d1	x86/MCE/intel: Cleanup CMCI storm logic Initially, this started with the yet another report about a race condition in the CMCI storm adaptive period length thing. Yes, we have to admit, it is fragile and error prone. So let's simplify it. The simpler logic is: now, after we enter storm mode, we go straight to polling with CMCI_STORM_INTERVAL, i.e. once a second. We remain in storm mode as long as we see errors being logged while polling. Theoretically, if we see an uninterrupted error stream, we will remain in storm mode indefinitely and keep polling the MSRs. However, when the storm is actually a burst of errors, once we have logged them all, we back out of it after ~5 mins of polling and no more errors logged. If we encounter an error during those 5 minutes, we reset the polling interval to 5 mins. Making machine_check_poll() return a bool and denoting whether it has seen an error or not lets us simplify a bunch of code and move the storm handling private to mce_intel.c. Some minor cleanups while at it. Reported-by: Calvin Owens <calvinowens@fb.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1417746575-23299-1-git-send-email-calvinowens@fb.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 13:24:25 +01:00
Quentin Casasnovas	35a9ff4eec	x86/microcode/intel: Handle truncated microcode images more robustly We do not check the input data bounds containing the microcode before copying a struct microcode_intel_header from it. A specially crafted microcode could cause the kernel to read invalid memory and lead to a denial-of-service. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1422964824-22056-3-git-send-email-quentin.casasnovas@oracle.com [ Made error message differ from the next one and flipped comparison. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 12:42:23 +01:00
Quentin Casasnovas	f84598bd7c	x86/microcode/intel: Guard against stack overflow in the loader mc_saved_tmp is a static array allocated on the stack, we need to make sure mc_saved_count stays within its bounds, otherwise we're overflowing the stack in _save_mc(). A specially crafted microcode header could lead to a kernel crash or potentially kernel execution. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1422964824-22056-1-git-send-email-quentin.casasnovas@oracle.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 12:41:37 +01:00
Ingo Molnar	a267b0a349	Merge branch 'tip-x86-kaslr' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent Pull ASLR and kASLR fixes from Borislav Petkov: - Add a global flag announcing KASLR state so that relevant code can do informed decisions based on its setting. (Jiri Kosina) - Fix a stack randomization entropy decrease bug. (Hector Marco-Gisbert) Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 12:31:34 +01:00
Jan Beulich	2cd4c303a7	x86/MCE/AMD: Drop bogus const modifier from AMD's bank4_names() The compiler validly warns about it being ignored. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/54C21511020000780005890E@mail.emea.novell.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 12:30:47 +01:00
Hector Marco-Gisbert	4e7c22d447	x86, mm/ASLR: Fix stack randomization on 64-bit systems The issue is that the stack for processes is not properly randomized on 64 bit architectures due to an integer overflow. The affected function is randomize_stack_top() in file "fs/binfmt_elf.c": static unsigned long randomize_stack_top(unsigned long stack_top) { unsigned int random_variable = 0; if ((current->flags & PF_RANDOMIZE) && !(current->personality & ADDR_NO_RANDOMIZE)) { random_variable = get_random_int() & STACK_RND_MASK; random_variable <<= PAGE_SHIFT; } return PAGE_ALIGN(stack_top) + random_variable; return PAGE_ALIGN(stack_top) - random_variable; } Note that, it declares the "random_variable" variable as "unsigned int". Since the result of the shifting operation between STACK_RND_MASK (which is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64): random_variable <<= PAGE_SHIFT; then the two leftmost bits are dropped when storing the result in the "random_variable". This variable shall be at least 34 bits long to hold the (22+12) result. These two dropped bits have an impact on the entropy of process stack. Concretely, the total stack entropy is reduced by four: from 2^28 to 2^30 (One fourth of expected entropy). This patch restores back the entropy by correcting the types involved in the operations in the functions randomize_stack_top() and stack_maxrandom_size(). The successful fix can be tested with: $ for i in `seq 1 10`; do cat /proc/self/maps \| grep stack; done 7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0 [stack] 7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0 [stack] 7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0 [stack] 7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0 [stack] ... Once corrected, the leading bytes should be between 7ffc and 7fff, rather than always being 7fff. Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es> Signed-off-by: Ismael Ripoll <iripoll@upv.es> [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: CVE-2015-1593 Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 12:21:36 +01:00
Ingo Molnar	ee408b4207	Merge branch 'tip-x86-mm' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent Pull boot printout fix from Borislav Petkov. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 11:59:18 +01:00
Dave Hansen	f15e05186c	x86/mm/init: Fix incorrect page size in init_memory_mapping() printks With 32-bit non-PAE kernels, we have 2 page sizes available (at most): 4k and 4M. Enabling PAE replaces that 4M size with a 2M one (which 64-bit systems use too). But, when booting a 32-bit non-PAE kernel, in one of our early-boot printouts, we say: init_memory_mapping: [mem 0x00000000-0x000fffff] [mem 0x00000000-0x000fffff] page 4k init_memory_mapping: [mem 0x37000000-0x373fffff] [mem 0x37000000-0x373fffff] page 2M init_memory_mapping: [mem 0x00100000-0x36ffffff] [mem 0x00100000-0x003fffff] page 4k [mem 0x00400000-0x36ffffff] page 2M init_memory_mapping: [mem 0x37400000-0x377fdfff] [mem 0x37400000-0x377fdfff] page 4k Which is obviously wrong. There is no 2M page available. This is probably because of a badly-named variable: in the map_range code: PG_LEVEL_2M. Instead of renaming all the PG_LEVEL_2M's. This patch just fixes the printout: init_memory_mapping: [mem 0x00000000-0x000fffff] [mem 0x00000000-0x000fffff] page 4k init_memory_mapping: [mem 0x37000000-0x373fffff] [mem 0x37000000-0x373fffff] page 4M init_memory_mapping: [mem 0x00100000-0x36ffffff] [mem 0x00100000-0x003fffff] page 4k [mem 0x00400000-0x36ffffff] page 4M init_memory_mapping: [mem 0x37400000-0x377fdfff] [mem 0x37400000-0x377fdfff] page 4k BRK [0x03206000, 0x03206fff] PGTABLE Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20150210212030.665EC267@viggo.jf.intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:45:27 +01:00
Jiri Kosina	f47233c2d3	x86/mm/ASLR: Propagate base load address calculation Commit: `e2b32e6785` ("x86, kaslr: randomize module base load address") makes the base address for module to be unconditionally randomized in case when CONFIG_RANDOMIZE_BASE is defined and "nokaslr" option isn't present on the commandline. This is not consistent with how choose_kernel_location() decides whether it will randomize kernel load base. Namely, CONFIG_HIBERNATION disables kASLR (unless "kaslr" option is explicitly specified on kernel commandline), which makes the state space larger than what module loader is looking at. IOW CONFIG_HIBERNATION && CONFIG_RANDOMIZE_BASE is a valid config option, kASLR wouldn't be applied by default in that case, but module loader is not aware of that. Instead of fixing the logic in module.c, this patch takes more generic aproach. It introduces a new bootparam setup data_type SETUP_KASLR and uses that to pass the information whether kaslr has been applied during kernel decompression, and sets a global 'kaslr_enabled' variable accordingly, so that any kernel code (module loading, livepatching, ...) can make decisions based on its value. x86 module loader is converted to make use of this flag. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Kees Cook <keescook@chromium.org> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Link: https://lkml.kernel.org/r/alpine.LNX.2.00.1502101411280.10719@pobox.suse.cz [ Always dump correct kaslr status when panicking ] Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:38:54 +01:00
Ingo Molnar	f353e61230	Merge branch 'tip-x86-fpu' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/fpu Pull FPU updates from Borislav Petkov: "A round of updates to the FPU maze from Oleg and Rik. It should make the code a bit more understandable/readable/streamlined and a preparation for more cleanups and improvements in that area." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 11:19:05 +01:00
Rik van Riel	728e53fef4	x86/fpu: Also check fpu_lazy_restore() when use_eager_fpu() With Oleg's patch: `33a3ebdc07` ("x86, fpu: Don't abuse has_fpu in __kernel_fpu_begin/end()") kernel threads no longer have an FPU state even on systems with use_eager_fpu(). That in turn means that a task may still have its FPU state loaded in the FPU registers, if the task only got interrupted by kernel threads from when it went to sleep, to when it woke up again. In that case, there is no need to restore the FPU state for this task, since it is still in the registers. The kernel can simply use the same logic to determine this as is used for !use_eager_fpu() systems. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/1423252925-14451-9-git-send-email-riel@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:15:55 +01:00
Rik van Riel	6a5fe8952b	x86/fpu: Use task_disable_lazy_fpu_restore() helper Replace magic assignments of fpu.last_cpu = ~0 with more explicit task_disable_lazy_fpu_restore() calls. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1423252925-14451-8-git-send-email-riel@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:15:55 +01:00
Rik van Riel	1361ef29c7	x86/fpu: Use an explicit if/else in switch_fpu_prepare() Use an explicit if/else branch after __save_init_fpu(old) in switch_fpu_prepare(). This makes substituting the assignment with a call in task_disable_lazy_fpu_restore() in the next patch easier to review. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/1423252925-14451-7-git-send-email-riel@redhat.com [ Space out stuff for more readability. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:15:54 +01:00
Rik van Riel	33e03dedd7	x86/fpu: Introduce task_disable_lazy_fpu_restore() helper Currently there are a few magic assignments sprinkled through the code that disable lazy FPU state restoring, some more effective than others, and all equally mystifying. It would be easier to have a helper to explicitly disable lazy FPU state restoring for a task. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/1423252925-14451-6-git-send-email-riel@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:15:53 +01:00
Rik van Riel	1c927eea4c	x86/fpu: Move lazy restore functions up a few lines We need another lazy restore related function, that will be called from a function that is above where the lazy restore functions are now. It would be nice to keep all three functions grouped together. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/1423252925-14451-5-git-send-email-riel@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:15:53 +01:00
Oleg Nesterov	08a744c6bf	x86/fpu: Change math_error() to use unlazy_fpu(), kill (now) unused save_init_fpu() math_error() calls save_init_fpu() after conditional_sti(), this means that the caller can be preempted. If !use_eager_fpu() we can hit the WARN_ON_ONCE(!__thread_has_fpu(tsk)) and/or save the wrong FPU state. Change math_error() to use unlazy_fpu() and kill save_init_fpu(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1423252925-14451-4-git-send-email-riel@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:15:03 +01:00
Oleg Nesterov	1a2a7f4ec8	x86/fpu: Don't do __thread_fpu_end() if use_eager_fpu() unlazy_fpu()->__thread_fpu_end() doesn't look right if use_eager_fpu(). Unconditional __thread_fpu_end() is only correct if we know that this thread can't return to user-mode and use FPU. Fortunately it has only 2 callers. fpu_copy() checks use_eager_fpu(), and init_fpu(current) can be only called by the coredumping thread via regset->get(). But it is exported to modules, and imo this should be fixed anyway. And if we check use_eager_fpu() we can use __save_fpu() like fpu_copy() and save_init_fpu() do. - It seems that even !use_eager_fpu() case doesn't need the unconditional __thread_fpu_end(), we only need it if __save_init_fpu() returns 0. - It is still not clear to me if __save_init_fpu() can safely nest with another save + restore from __kernel_fpu_begin(). If not, we can use kernel_fpu_disable() to fix the race. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1423252925-14451-3-git-send-email-riel@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:12:46 +01:00
Oleg Nesterov	a9241ea5fd	x86/fpu: Don't reset thread.fpu_counter The "else" branch clears ->fpu_counter as a remnant of the lazy FPU usage counting: `e07e23e1fd` ("[PATCH] non lazy "sleazy" fpu implementation") However, switch_fpu_prepare() does this now so that else branch is superfluous. If we do use_eager_fpu(), then this has no effect. Otherwise, if we actually wanted to prevent fpu preload after the context switch we would need to reset it unconditionally, even if __thread_has_fpu(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1423252925-14451-2-git-send-email-riel@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 11:12:40 +01:00
Alexander Kuleshov	fb148d83ec	x86/asm/boot: Use already defined KEEP_SEGMENTS macro in head_{32,64}.S There is already defined macro KEEP_SEGMENTS in <asm/bootparam.h>, let's use it instead of hardcoded constants. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1424331298-7456-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 10:05:04 +01:00
Fengguang Wu	c11a25f443	x86/intel/quark: Fix simple_return.cocci warnings arch/x86/platform/intel-quark/imr.c:129:1-4: WARNING: end returns can be simpified Simplify a trivial if-return sequence. Possibly combine with a preceding function call. Generated by: scripts/coccinelle/misc/simple_return.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Andy Shevchenko <andy.schevchenko@gmail.com> Cc: Ong, Boon Leong <boon.leong.ong@intel.com> Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Darren Hart <dvhart@linux.intel.com> Cc: kbuild-all@01.org Link: http://lkml.kernel.org/r/20150219081432.GA21996@waimea Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 10:00:55 +01:00
Fengguang Wu	32d39169d7	x86/intel/quark: Fix ptr_ret.cocci warnings arch/x86/platform/intel-quark/imr.c:280:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Andy Shevchenko <andy.schevchenko@gmail.com> Cc: Ong, Boon Leong <boon.leong.ong@intel.com> Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Darren Hart <dvhart@linux.intel.com> Cc: kbuild-all@01.org Link: http://lkml.kernel.org/r/20150219081432.GA21983@waimea Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 10:00:54 +01:00
Rusty Russell	f476893459	lguest: update help text. We now add about 10k, not 6k, when lguest support is compiled in. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-02-19 14:44:32 +10:30
Rusty Russell	b0bd96fe9a	lguest: now depends on PCI Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-02-19 12:31:14 +10:30
Valdis Kletnieks	9261dc1de1	x86/build/defconfig: Enable USB_EHCI_TT_NEWSCHED=y Some Gentoo users are encountering problems because USB_EHCI_TT_NEWSCHED isn't set in the defconfig (and Gentoo differs from other distros in not providing a distro .config). Alan Stern has said there's no reason to not set it, and the ability to turn it off at all should probably be yanked: http://article.gmane.org/gmane.linux.usb.general/119920 This addresses issue: https://bugs.gentoo.org/show_bug.cgi?id=533472 (The problem also theoretically affects the sh, arm, mips, powerpc, and sparc archs, but those would be other patches if this one that fixes 98% of the problem is accepted). Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 02:21:14 +01:00
Sylvain BERTRAND	e85bd9892c	x86/build: Fix mkcapflags.sh bash-ism Chocked while compiling linux with dash shell instead of bash shell. See: http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_09_05 Signed-off-by: Sylvain BERTRAND <sylvain.bertrand@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20141229154324.GA27533@dhcppc1 Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 02:21:00 +01:00
Jan Beulich	0cdb81bef2	x86-64: Also clear _PAGE_GLOBAL from __supported_pte_mask if !cpu_has_pge Not just setting it when the feature is available is for consistency, and may allow Xen to drop its custom clearing of the flag (unless it needs it cleared earlier than this code executes). Note that the change is benign to ix86, as the flag starts out clear there. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/54C215D10200007800058912@mail.emea.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 02:18:26 +01:00
Pavel Machek	1f40a8bfa9	x86/mm/pat: Ensure different messages in STRICT_DEVMEM and PAT cases STRICT_DEVMEM and PAT produce same failure accessing /dev/mem, which is quite confusing to the user. Make printk messages different to lessen confusion. Signed-off-by: Pavel Machek <pavel@ucw.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 02:09:49 +01:00
Fenghua Yu	1db491f77b	x86/mm: Reduce PAE-mode per task pgd allocation overhead from 4K to 32 bytes With more embedded systems emerging using Quark, among other things, 32-bit kernel matters again. 32-bit machine and kernel uses PAE paging, which currently wastes at least 4K of memory per process on Linux where we have to reserve an entire page to support a single 32-byte PGD structure. It would be a very good thing if we could eliminate that wastage. PAE paging is used to access more than 4GB memory on x86-32. And it is required for NX. In this patch, we still allocate one page for pgd for a Xen domain and 64-bit kernel because one page pgd is assumed in these cases. But we can save memory space by only allocating 32-byte pgd for 32-bit PAE kernel when it is not running as a Xen domain. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Glenn Williamson <glenn.p.williamson@intel.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1421382601-46912-1-git-send-email-fenghua.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 01:28:38 +01:00
Alexander Kuleshov	79287cf877	x86/boot/video: Move the 'video_segment' variable to video.c video.c is the only real user of the 'video_segment' variable, so move it to video.c and make it static. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Martin Mares <mj@ucw.cz> Link: http://lkml.kernel.org/r/1422123092-28750-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 00:25:05 +01:00
Alexander Kuleshov	5b171e8218	x86/asm/boot: Fix path in comments Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Martin Mares <mj@ucw.cz> Link: http://lkml.kernel.org/r/1422382588-10367-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 00:03:30 +01:00
Andy Lutomirski	91e5ed49fc	x86/asm/decoder: Fix and enforce max instruction size in the insn decoder x86 instructions cannot exceed 15 bytes, and the instruction decoder should enforce that. Prior to `6ba48ff46f`, the instruction length limit was implicitly set to 16, which was an approximation of 15, but there is currently no limit at all. Fix MAX_INSN_SIZE (it should be 15, not 16), and fix the decoder to reject instructions that exceed MAX_INSN_SIZE. Other than potentially confusing some of the decoder sanity checks, I'm not aware of any actual problems that omitting this check would cause, nor am I aware of any practical problems caused by the MAX_INSN_SIZE error. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Fixes: `6ba48ff46f` ("x86: Remove arbitrary instruction size limit ... Link: http://lkml.kernel.org/r/f8f0bc9b8c58cfd6830f7d88400bf1396cbdcd0f.1422403511.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 00:01:24 +01:00
Bryan O'Donoghue	8bbc2a135b	x86/intel/quark: Add Intel Quark platform support Add Intel Quark platform support. Quark needs to pull down all unlocked IMRs to ensure agreement with the EFI memory map post boot. This patch adds an entry in Kconfig for Quark as a platform and makes IMR support mandatory if selected. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Tested-by: Ong, Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Reviewed-by: Andy Shevchenko <andy.schevchenko@gmail.com> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Ong, Boon Leong <boon.leong.ong@intel.com> Cc: dvhart@infradead.org Link: http://lkml.kernel.org/r/1422635379-12476-3-git-send-email-pure.logic@nexus-software.ie Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 23:22:55 +01:00
Bryan O'Donoghue	28a375df16	x86/intel/quark: Add Isolated Memory Regions for Quark X1000 Intel's Quark X1000 SoC contains a set of registers called Isolated Memory Regions. IMRs are accessed over the IOSF mailbox interface. IMRs are areas carved out of memory that define read/write access rights to the various system agents within the Quark system. For a given agent in the system it is possible to specify if that agent may read or write an area of memory defined by an IMR with a granularity of 1 KiB. Quark_SecureBootPRM_330234_001.pdf section 4.5 details the concept of IMRs quark-x1000-datasheet.pdf section 12.7.4 details the implementation of IMRs in silicon. eSRAM flush, CPU Snoop write-only, CPU SMM Mode, CPU non-SMM mode, RMU and PCIe Virtual Channels (VC0 and VC1) can have individual read/write access masks applied to them for a given memory region in Quark X1000. This enables IMRs to treat each memory transaction type listed above on an individual basis and to filter appropriately based on the IMR access mask for the memory region. Quark supports eight IMRs. Since all of the DMA capable SoC components in the X1000 are mapped to VC0 it is possible to define sections of memory as invalid for DMA write operations originating from Ethernet, USB, SD and any other DMA capable south-cluster component on VC0. Similarly it is possible to mark kernel memory as non-SMM mode read/write only or to mark BIOS runtime memory as SMM mode accessible only depending on the particular memory footprint on a given system. On an IMR violation Quark SoC X1000 systems are configured to reset the system, so ensuring that the IMR memory map is consistent with the EFI provided memory map is critical to ensure no IMR violations reset the system. The API for accessing IMRs is based on MTRR code but doesn't provide a /proc or /sys interface to manipulate IMRs. Defining the size and extent of IMRs is exclusively the domain of in-kernel code. Quark firmware sets up a series of locked IMRs around pieces of memory that firmware owns such as ACPI runtime data. During boot a series of unlocked IMRs are placed around items in memory to guarantee no DMA modification of those items can take place. Grub also places an unlocked IMR around the kernel boot params data structure and compressed kernel image. It is necessary for the kernel to tear down all unlocked IMRs in order to ensure that the kernel's view of memory passed via the EFI memory map is consistent with the IMR memory map. Without tearing down all unlocked IMRs on boot transitory IMRs such as those used to protect the compressed kernel image will cause IMR violations and system reboots. The IMR init code tears down all unlocked IMRs and sets a protective IMR around the kernel .text and .rodata as one contiguous block. This sanitizes the IMR memory map with respect to the EFI memory map and protects the read-only portions of the kernel from unwarranted DMA access. Tested-by: Ong, Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Reviewed-by: Andy Shevchenko <andy.schevchenko@gmail.com> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Ong, Boon Leong <boon.leong.ong@intel.com> Cc: andy.shevchenko@gmail.com Cc: dvhart@infradead.org Link: http://lkml.kernel.org/r/1422635379-12476-2-git-send-email-pure.logic@nexus-software.ie Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 23:22:47 +01:00
Ricardo Ribalda Delgado	b273c2c2f2	x86/apic: Fix the devicetree build in certain configs Without this patch: LD init/built-in.o arch/x86/built-in.o: In function `dtb_lapic_setup': kernel/devicetree.c:155: undefined reference to `apic_force_enable' Makefile:923: recipe for target 'vmlinux' failed make: *** [vmlinux] Error 1 Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: David Rientjes <rientjes@google.com> Cc: Jan Beulich <JBeulich@suse.com> Link: http://lkml.kernel.org/r/1422905231-16067-1-git-send-email-ricardo.ribalda@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 22:30:19 +01:00
Jan Beulich	50849eefea	x86/Kconfig: Simplify X86_UP_APIC handling We don't really need a helper symbol for that. For one, it's pointlessly getting set to Y for all configurations (even 64-bit ones). And then the purpose can be fulfilled by suitably adjusting X86_UP_APIC: Hide its prompt when PCI_MSI, and default it to PCI_MSI. Tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/54D39AFC020000780005D684@mail.emea.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 22:10:54 +01:00
Jan Beulich	b1da1e715d	x86/Kconfig: Simplify X86_IO_APIC dependencies Since dependencies are transitive, we don't really need to repeat those of X86_UP_IOAPIC. Furthermore avoid the symbol getting entered into .config when it is off by having the default simply Y and the dependencies solely handled via the intended for that purpose "depends on". Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/54D39BC9020000780005D688@mail.emea.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 22:09:33 +01:00
Jan Beulich	e0fd24a3b4	x86/Kconfig: Avoid issuing pointless turned off entries to .config Settings without prompts shouldn't normally have defaults other than Y, as otherwise they (a) needlessly enlarge the resulting .config and (b) if they ever get a prompt added later, the tracked setting of off will prevent the devloper from then being prompted for his/her choice when doing an incremental update of the configuration (make oldconfig). Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/54D39CC6020000780005D6AE@mail.emea.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 22:08:46 +01:00
Wang Nan	b7e37567d0	kprobes/x86: Mark 2 bytes NOP as boostable Currently, x86 kprobes is unable to boost 2 bytes nop like: nopl 0x0(%rax,%rax,1) which is 0x0f 0x1f 0x44 0x00 0x00. Such nops have exactly 5 bytes to hold a relative jmp instruction. Boosting them should be obviously safe. This patch enable boosting such nops by simply updating twobyte_is_boostable[] array. Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1423532045-41049-1-git-send-email-wangnan0@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 21:50:12 +01:00
Denys Vlasenko	cbb53b9623	x86/asm/decoder: Explain CALLW discrepancy between Intel and AMD In 64-bit mode, AMD and Intel CPUs treat 0x66 prefix before branch insns differently. For near branches, it affects decode too since immediate offset's width is different. See these empirical tests: http://marc.info/?l=linux-kernel&m=139714939728946&w=2 Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/1423768017-31766-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 21:01:59 +01:00
Denys Vlasenko	8a764a875f	x86/asm/decoder: Create artificial 3rd byte for 2-byte VEX Before this patch, users need to do this to fetch vex.vvvv: if (insn->vex_prefix.nbytes == 2) { vex_vvvv = ((insn->vex_prefix.bytes[1] >> 3) & 0xf) ^ 0xf; } if (insn->vex_prefix.nbytes == 3) { vex_vvvv = ((insn->vex_prefix.bytes[2] >> 3) & 0xf) ^ 0xf; } Make it so that insn->vex_prefix.bytes[2] always contains vex.wvvvvLpp bits. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1423767879-31691-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 21:01:50 +01:00
Denys Vlasenko	5154d4f2ad	uprobes/x86: Fix 2-byte opcode table Enabled probing of lar, lsl, popcnt, lddqu, prefetch insns. They should be safe to probe, they throw no exceptions. Enabled probing of 3-byte opcodes 0f 38-3f xx - these are vector isns, so should be safe. Enabled probing of many currently undefined 0f xx insns. At the rate new vector instructions are getting added, we don't want to constantly enable more bits. We want to only occasionally disable ones which for some reason can't be probed. This includes 0f 24,26 opcodes, which are undefined since Pentium. On 486, they were "mov to/from test register". Explained more fully what 0f 78,79 opcodes are. Explained what 0f ae opcode is. (It's unclear why we don't allow probing it, but let's not change it for now). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1423768732-32194-3-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 20:55:53 +01:00
Denys Vlasenko	67fc809217	uprobes/x86: Fix 1-byte opcode tables This change fixes 1-byte opcode tables so that only insns for which we have real reasons to disallow probing are marked with unset bits. To that end: Set bits for all prefix bytes. Their setting is ignored anyway - we check the bitmap against OPCODE1(insn), not against first byte. Keeping them set to 0 only confuses code reader with "why we don't support that opcode" question. Thus: enable bytes c4,c5 in 64-bit mode (VEX prefixes). Byte 62 (EVEX prefix) is not yet enabled since insn decoder does not support that yet. For 32-bit mode, enable probing of opcodes 63 (arpl) and d6 (salc). They don't require any special handling. For 64-bit mode, disable 9a and ea - these undefined opcodes were mistakenly left enabled. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1423768732-32194-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 20:55:51 +01:00
Denys Vlasenko	097f4e5e83	uprobes/x86: Add comment with insn opcodes, mnemonics and why we dont support them After adding these, it's clear we have some awkward choices there. Some valid instructions are prohibited from uprobing while several invalid ones are allowed. Hopefully future edits to the good-opcode tables will fix wrong bits or explain why those bits are not wrong. No actual code changes. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1423768732-32194-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 20:55:46 +01:00
Miroslav Benes	4421f8f0fa	livepatch: remove extern specifier from header files Storage-class specifier 'extern' is redundant in front of the function declaration. According to the C specification it has the same meaning as if not present at all. So remove it. Signed-off-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-02-18 20:50:05 +01:00
Linus Torvalds	eaa0eda562	asm-generic: uaccess.h cleanup Like in 3.19, I once more have a multi-stage cleanup for one asm-generic header file, this time the work was done by Michael Tsirkin and cleans up the uaccess.h file in asm-generic, as well as all architectures for which the respective maintainers did not pick up his patches directly. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIVAwUAVONFpmCrR//JCVInAQIoYRAA1T3ID1bQLqdi8TU1X+vzutXzGFRhRFii u18GYeN6sGTcfqQD0GsNSaH7G8XehF3cgJ9eo4h9YkRPIG/0T0FO+dqdB0uRh8iy GKcUqVhgvCFpOBDUJC6FgMvgWWyVrgSUBqG6qSXck/PDcMSsUa/m/GcLhR/sHWGn EGEAzYNvJgdOaJ1z0vfPFK6mPwFwmYzIss5XFuoBAKKN856fBlxofkQqdpKjGDFH n0UziaJ5tbCdlZ9M9Y5JN9RU8yBCcOmGHnHUAQHz3BXOt9sD7o5jDuzsUbj+vUGJ gzNc8kee9Pyy8ZA1F959gspaxe5Oumq7NLgs3HDjK6ZDRKpJvZb6iXi56f15chlZ dItTbFSxCHOFs0d8XJKNbmPt44pJ/qKO+03lMIGttMkIm7hXfvyMWSPZV9G0Pu1y zbWEDgW2Mdrdt0saNSD46IEp+c7E5P3D9JSctQRdQjReoCbOHwqrSHi1Zeg97XL4 I1E0KwDqFUw3P1dXr5ahXmR50ZigBGjN5Fz3N7GmJt2x4PRSS2Sw92hyCrL0YM8J 56FdRA7UJ0V/SzmAko3F5wWmhabc6L+qrVA42R6U3SNSjU8hwppOkYKDINNhPZfL SGy1oQS6Jj10WxLOVp66NC7XxXzBybDcQnatz4XtNN8P5sfekUGSGBeMyMsHl7IJ 9MT3xym+DWU= =LROx -----END PGP SIGNATURE----- Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic uaccess.h cleanup from Arnd Bergmann: "Like in 3.19, I once more have a multi-stage cleanup for one asm-generic header file, this time the work was done by Michael Tsirkin and cleans up the uaccess.h file in asm-generic, as well as all architectures for which the respective maintainers did not pick up his patches directly" * tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (37 commits) sparc32: nocheck uaccess coding style tweaks sparc64: nocheck uaccess coding style tweaks xtensa: macro whitespace fixes sh: macro whitespace fixes parisc: macro whitespace fixes m68k: macro whitespace fixes m32r: macro whitespace fixes frv: macro whitespace fixes cris: macro whitespace fixes avr32: macro whitespace fixes arm64: macro whitespace fixes arm: macro whitespace fixes alpha: macro whitespace fixes blackfin: macro whitespace fixes sparc64: uaccess_64 macro whitespace fixes sparc32: uaccess_32 macro whitespace fixes avr32: whitespace fix sh: fix put_user sparse errors metag: fix put_user sparse errors ia64: fix put_user sparse errors ...	2015-02-18 10:02:24 -08:00
Linus Torvalds	53861af9a1	OK, this has the big virtio 1.0 implementation, as specified by OASIS. On top of tht is the major rework of lguest, to use PCI and virtio 1.0, to double-check the implementation. Then comes the inevitable fixes and cleanups from that work. Thanks, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU5B9cAAoJENkgDmzRrbjxPacP/jajliXX353JJ/g/hkZ6oDN5 o7FhELBKiUMr7enVZYwj2BBYk5OM36nB9pQkiqHMSbjJGoS5IK70enxb4YRxSHBn YCLblZMNqutGS0kclZ9DDysztjAhxH7CvLM6pMZ7eHP0f3+FM/QhbxHfbG9DTBUH 2U/nybvd3M/+YBe7ptwQdrH8aOCAD6RTIsXellfm99dNMK6K/5lqnWQ98WSXmNXq vyvdaAQsqqUkmxtajjcBumaCH4/SehOJJjUqojCMsR3aBkgOBWDZJURMek+KA5Dt X996fBsTAlvTtCUKRrmLTb2ScDH7fu+jwbWRqMYDk8zpEr3XqiLTTPV4/TiHGmi7 Wiw3g1wIY1YbETlZyongB5MIoVyUfmDAd+bT8nBsj3KIITD84gOUQFDMl6d63c0I z6A9Pu/UzpJGsXZT3WoFLi6TO67QyhOseqZnhS4wBgLabjxffNM7yov9RVKUVH/n JHunnpUk2iTtSgscBarOBz5867dstuurnaUIspZthVBo6y6N0z+GrU+agJ8Y4DXx mvwzeYLhQH2208PjxPFiah/kA/gHNm1m678TbpS+CUsgmpQiJ4gTwtazDSi4TwZY Hs9T9GulkzpZIzEyKL3qG2TsfyDhW5Avn+GvKInAT9+Fkig4BnP3DUONBxcwGZ78 eI3FDUWsE36NqE5ECWmz =ivCe -----END PGP SIGNATURE----- Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio updates from Rusty Russell: "OK, this has the big virtio 1.0 implementation, as specified by OASIS. On top of tht is the major rework of lguest, to use PCI and virtio 1.0, to double-check the implementation. Then comes the inevitable fixes and cleanups from that work" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (80 commits) virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice. virtio_net: unconditionally define struct virtio_net_hdr_v1. tools/lguest: don't use legacy definitions for net device in example launcher. virtio: Don't expose legacy net features when VIRTIO_NET_NO_LEGACY defined. tools/lguest: use common error macros in the example launcher. tools/lguest: give virtqueues names for better error messages tools/lguest: more documentation and checking of virtio 1.0 compliance. lguest: don't look in console features to find emerg_wr. tools/lguest: don't start devices until DRIVER_OK status set. tools/lguest: handle indirect partway through chain. tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI) tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI) tools/lguest: rename virtio_pci_cfg_cap field to match spec. tools/lguest: fix features_accepted logic in example launcher. tools/lguest: handle device reset correctly in example launcher. virtual: Documentation: simplify and generalize paravirt_ops.txt lguest: remove NOTIFY call and eventfd facility. lguest: remove NOTIFY facility from demonstration launcher. lguest: use the PCI console device's emerg_wr for early boot messages. lguest: always put console in PCI slot #1. ...	2015-02-18 09:24:01 -08:00
Peter Zijlstra	2c44b1936b	perf/x86/intel: Expose LBR callstack to user space tooling With LBR call stack feature enable, there are three callchain options. Enable the 3rd callchain option (LBR callstack) to user space tooling. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/20141105093759.GQ10501@worktop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:16:15 +01:00
Yan, Zheng	aa54ae9b87	perf/x86/intel: Discard zero length call entries in LBR call stack "Zero length call" uses the attribute of the call instruction to push the immediate instruction pointer on to the stack and then pops off that address into a register. This is accomplished without any matching return instruction. It confuses the hardware and make the recorded call stack incorrect. We can partially resolve this issue by: decode call instructions and discard any zero length call entry in the LBR stack. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-16-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:16:14 +01:00
Yan, Zheng	2c70d0086e	perf/x86/intel: Disable FREEZE_LBRS_ON_PMI when LBR operates in callstack mode LBR callstack is designed for PEBS, It does not work well with FREEZE_LBRS_ON_PMI for non PEBS event. If FREEZE_LBRS_ON_PMI is set for non PEBS event, PMIs near call/return instructions may cause superfluous increase/decrease of LBR_TOS. This patch modifies __intel_pmu_lbr_enable() to not enable FREEZE_LBRS_ON_PMI when LBR operates in callstack mode. We currently don't use LBR callstack to capture kernel space callchain, so disabling FREEZE_LBRS_ON_PMI should not be a problem. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-15-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:16:13 +01:00
Yan, Zheng	4b85490099	perf/x86/intel: Re-organize code that implicitly enables LBR/PEBS Make later patch more readable, no logic change. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-13-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:16:12 +01:00
Yan, Zheng	a46a230001	perf: Simplify the branch stack check Use event->attr.branch_sample_type to replace intel_pmu_needs_lbr_smpl() for avoiding duplicated code that implicitly enables the LBR. Currently, branch stack can be enabled by user explicitly requesting branch sampling or implicit branch sampling to correct PEBS skid. For user explicitly requested branch sampling, the branch_sample_type is explicitly set by user. For PEBS case, the branch_sample_type is also implicitly set to PERF_SAMPLE_BRANCH_ANY in x86_pmu_hw_config. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-11-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:16:11 +01:00
Yan, Zheng	76cb2c617f	perf/x86/intel: Save/restore LBR stack during context switch When the LBR call stack is enabled, it is necessary to save/restore the LBR stack on context switch. The solution is saving/restoring the LBR stack to/from task's perf event context. The LBR stack is saved/restored only when there are events that use the LBR call stack. If no event uses LBR call stack, the LBR stack is reset when task is scheduled in. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-10-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:16:10 +01:00
Yan, Zheng	63f0c1d841	perf/x86/intel: Track number of events that use the LBR callstack When enabling/disabling an event, check if the event uses the LBR callstack feature, adjust the LBR callstack usage count accordingly. Later patch will use the usage count to decide if LBR stack should be saved/restored. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-9-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:16:09 +01:00
Yan, Zheng	e18bf52642	perf/x86/intel: Allocate space for storing LBR stack When the LBR call stack is enabled, it is necessary to save/restore the LBR stack on context switch. We can use pmu specific data to store LBR stack when task is scheduled out. This patch adds code that allocates the pmu specific data. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-8-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:16:08 +01:00
Yan, Zheng	e9d7f7cd97	perf/x86/intel: Add basic Haswell LBR call stack support Haswell has a new feature that utilizes the existing LBR facility to record call chains. To enable this feature, bits (JCC, NEAR_IND_JMP, NEAR_REL_JMP, FAR_BRANCH, EN_CALLSTACK) in LBR_SELECT must be set to 1, bits (NEAR_REL_CALL, NEAR-IND_CALL, NEAR_RET) must be cleared. Due to a hardware bug of Haswell, this feature doesn't work well with FREEZE_LBRS_ON_PMI. When the call stack feature is enabled, the LBR stack will capture unfiltered call data normally, but as return instructions are executed, the last captured branch record is flushed from the on-chip registers in a last-in first-out (LIFO) manner. Thus, branch information relative to leaf functions will not be captured, while preserving the call stack information of the main line execution path. This patch defines a separate lbr_sel map for Haswell. The map contains a new entry for the call stack feature. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-5-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:16:04 +01:00
Yan, Zheng	2a0ad3b326	perf/x86/intel: Use context switch callback to flush LBR stack Previous commit introduces context switch callback, its function overlaps with the flush branch stack callback. So we can use the context switch callback to flush LBR stack. This patch adds code that uses the flush branch callback to flush the LBR stack when task is being scheduled in. The callback is enabled only when there are events use the LBR hardware. This patch also removes all old flush branch stack code. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-4-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:16:03 +01:00
Yan, Zheng	ba532500c5	perf: Introduce pmu context switch callback The callback is invoked when process is scheduled in or out. It provides mechanism for later patches to save/store the LBR stack. For the schedule in case, the callback is invoked at the same place that flush branch stack callback is invoked. So it also can replace the flush branch stack callback. To avoid unnecessary overhead, the callback is enabled only when there are events use the LBR stack. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-3-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:16:02 +01:00
Yan, Zheng	27ac905b8f	perf/x86/intel: Reduce lbr_sel_map[] size The index of lbr_sel_map is bit value of perf branch_sample_type. PERF_SAMPLE_BRANCH_MAX is 1024 at present, so each lbr_sel_map uses 4096 bytes. By using bit shift as index, we can reduce lbr_sel_map size to 40 bytes. This patch defines 'bit shift' for branch types, and use 'bit shift' to define lbr_sel_maps. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: jolsa@redhat.com Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/1415156173-10035-2-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:16:01 +01:00
Aravind Gopalakrishnan	c796b205b8	perf/x86/amd/ibs: Convert force_ibs_eilvt_setup() to void The caller of force_ibs_eilvt_setup() is ibs_eilvt_setup() which does not care about the return values. So mark it void and clean up the return statements. Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <hpa@zytor.com> Cc: <paulus@samba.org> Cc: <tglx@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422037175-20957-1-git-send-email-aravind.gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:01:46 +01:00
Markus Elfring	8e57c586c6	perf/x86/intel/uncore: Delete an unnecessary check before pci_dev_put() call The pci_dev_put() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/54D0B59C.2060106@users.sourceforge.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 17:01:42 +01:00
Joerg Roedel	d97eb8966c	x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable() When an interrupt is migrated away from a cpu it will stay in its vector_irq array until smp_irq_move_cleanup_interrupt succeeded. The cfg->move_in_progress flag is cleared already when the IPI was sent. When the interrupt is destroyed after migration its 'struct irq_desc' is freed and the vector_irq arrays are cleaned up. But since cfg->move_in_progress is already 0 the references at cpus before the last migration will not be cleared. So this would leave a reference to an already destroyed irq alive. When the cpu is taken down at this point, the check_irq_vectors_for_cpu_disable() function finds a valid irq number in the vector_irq array, but gets NULL for its descriptor and dereferences it, causing a kernel panic. This has been observed on real systems at shutdown. Add a check to check_irq_vectors_for_cpu_disable() for a valid 'struct irq_desc' to prevent this issue. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Yinghai Lu <yinghai@kernel.org> Cc: alnovak@suse.com Cc: joro@8bytes.org Link: http://lkml.kernel.org/r/20150204132754.GA10078@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 15:01:42 +01:00
Jiang Liu	1ea76fbadd	x86/irq: Fix regression caused by commit `b568b8601f` Commit `b568b8601f` ("Treat SCI interrupt as normal GSI interrupt") accidently removes support of legacy PIC interrupt when fixing a regression for Xen, which causes a nasty regression on HP/Compaq nc6000 where we fail to register the ACPI interrupt, and thus lose eg. thermal notifications leading a potentially overheated machine. So reintroduce support of legacy PIC based ACPI SCI interrupt. Reported-by: Ville Syrjälä <syrjala@sci.fi> Tested-by: Ville Syrjälä <syrjala@sci.fi> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <stable@vger.kernel.org> # 3.19+ Cc: H. Peter Anvin <hpa@zytor.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: linux-pm@vger.kernel.org Link: http://lkml.kernel.org/r/1424052673-22974-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 15:01:41 +01:00
Raghavendra K T	d6abfdb202	x86/spinlocks/paravirt: Fix memory corruption on unlock Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); / add_smp() is a full mb() / if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg): Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we also avoid the need to access both the head and tail tickets on unlock. 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus): We further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x 0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: Hinted missing TICKET_LOCK_INC for kick] [Oleg: Moved slowpath flag to head, ticket_equals idea] [PeterZ: Added detailed changelog] Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Jones <drjones@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Waiman Long <Waiman.Long@hp.com> Cc: a.ryabinin@samsung.com Cc: dave@stgolabs.net Cc: hpa@zytor.com Cc: jasowang@redhat.com Cc: jeremy@goop.org Cc: paul.gortmaker@windriver.com Cc: riel@redhat.com Cc: tglx@linutronix.de Cc: waiman.long@hp.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20150215173043.GA7471@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 14:53:49 +01:00
Ingo Molnar	829bf7af64	* Leave a valid 64-bit IDT installed during runtime EFI mixed mode calls to avoid triple faults if an NMI/MCE is received. * Revert Ard's change to the libstub get_memory_map() that went into the v3.20 merge window because it causes boot regressions on Qemu and Xen. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU5HnBAAoJEC84WcCNIz1Vln4P/08oA3V/GziY2cJGn1TBtBkA M7GzElF3ceojRVlfp7tVQpmT9oXqXsUB9ccqlOMKF8EO1s7EKnguVV/nkIah/5wu HIPYFgDC2s57cf7MGA53xOjTJNSu6PzTkUsJF7KhtZmb0c/GyWMN38Ggsdi7zuA3 5WD3O1CSLn77IqRXYtCr8aimQI3QJeTnN8IXQxyoDmQ8XK2uV1qRwFI96+2AjFJM EG+xw+p6DCpoJXbci1kZxPT/iV5P8fLwzvcRT2G/qMa5RqWALLN5VeP3yBe+VqPU s9PQK8jLablQcsglIemPHRnfLcOWz13yEx5Z6S1lyzOJrQsk7rp9dLO7GKiK22ex 1CPu+Cudk1ETn+pDyjADl6wcvZJfh1krnD4Gzm6VsSUWC924/sovYvH67sPSWc5a RxylE4pSuHYADnQZh1YqH719KMWpMKb+9UeYstq3PfebeyKJkAqXqPBTWRldI1N7 YWLweED35dg3mN8g8mYEmiIOXe/dYNoaWJw0m2FrEMxJ5x4Z1ukDSccFN5+pAkn/ Nn1wXxGzt0sU40+t5bYA6CnJAEsU1pP/kPcZeqNzB3AGYIiA+rtH45jQMVO4xorM BA2COqRrC+UdUHbiy5IgF6EGxIKy4Yb+aE/EvEd8e4GborI4in6mK8xKAllvr4+M hD5nwJviAXQkviZZeOq5 =I/nB -----END PGP SIGNATURE----- Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull EFI fixes from Matt Fleming: " - Leave a valid 64-bit IDT installed during runtime EFI mixed mode calls to avoid triple faults if an NMI/MCE is received. - Revert Ard's change to the libstub get_memory_map() that went into the v3.20 merge window because it causes boot regressions on Qemu and Xen. " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-18 14:40:40 +01:00
Linus Torvalds	d96c757efa	Fix regression - functions on the mce notifier chain should not be able to decide that an event should not be logged -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU47aGAAoJEKurIx+X31iBiX4P+wfq7uUKwQ4riD2jFppvhrcm W2Qx/iIv9QN77ZIw5I45VqGbDKXmlThl41ISem9BlKd8jKaldY3lUlQMfrPC5V11 9bl/7LsZoQLlbuwYR6uiLdKqW9wd6d5Y1mczdSDM5wCtfMw/s+C/ETzuRVHsQZjF 1LTB0rb0NPouX+y3D8aDrvk9Os5ozZsz3N/y6e3TsI/wV8d3rqwH8C8x3RjB2Evx 3WRSwoSOq9kHEbeg1r7PMKYKWAoJs97Kwo4EgJELqn8fxYMWnSsoDZGr9P2PX8oT TKgSFPnhgCLw+qlWy81MM8hutnHKnN6oXcJKzE0nHtD8JlJ/M/HdDAPIg8G9aLIn ABxPg6OORs/4YJQYGFA8ixx3TfIMspMU2m9KGoCcerpGaHCHhrlylJyheUvhRkPP u8pjGz+31d3bVVRzCLJt1eqo3H/y0wcURWaemk23lcUIsdDqisjZDzZrZxyZuWaH eDTKmHsZB/I4wnOs4Ke+U7oo/u+NtBzPmBSJcshgKSONLPd7bSJtjckLoa3wSf5I q5DkZgxrUYkO6tIoAAi/N2tc/2qkjTOug79BP9YKN3elmv3nxW4gieSaUb5p16bd ORpZLt3SDKEPv+79a/1e7ZyR7ik9Evhzc+/72M1IZvmdr3uOjj+xkuE1uRU7vuvX 44Jmx6mEtPK1mVBfwkdG =jAUs -----END PGP SIGNATURE----- Merge tag 'please-pull-fixmcelog' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull mcelog regression fix from Tony Luck: "Fix regression - functions on the mce notifier chain should not be able to decide that an event should not be logged" * tag 'please-pull-fixmcelog' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: x86/mce: Fix regression. All error records should report via /dev/mcelog	2015-02-17 17:03:07 -08:00
Linus Torvalds	37507717de	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf updates from Ingo Molnar: "This series tightens up RDPMC permissions: currently even highly sandboxed x86 execution environments (such as seccomp) have permission to execute RDPMC, which may leak various perf events / PMU state such as timing information and other CPU execution details. This 'all is allowed' RDPMC mode is still preserved as the (non-default) /sys/devices/cpu/rdpmc=2 setting. The new default is that RDPMC access is only allowed if a perf event is mmap-ed (which is needed to correctly interpret RDPMC counter values in any case). As a side effect of these changes CR4 handling is cleaned up in the x86 code and a shadow copy of the CR4 value is added. The extra CR4 manipulation adds ~ <50ns to the context switch cost between rdpmc-capable and rdpmc-non-capable mms" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks perf/x86: Only allow rdpmc if a perf_event is mapped perf: Pass the event to arch_perf_update_userpage() perf: Add pmu callbacks to track event mapping and unmapping x86: Add a comment clarifying LDT context switching x86: Store a per-cpu shadow copy of CR4 x86: Clean up cr4 manipulation	2015-02-16 14:58:12 -08:00
Linus Torvalds	a9724125ad	TTY/Serial driver patches for 3.20-rc1 Here's the big tty/serial driver update for 3.20-rc1. Nothing huge here, just lots of driver updates and some core tty layer fixes as well. All have been in linux-next with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlTgtgkACgkQMUfUDdst+ykXbACg14oFAmeYjO9RsdIHPXBvKseO 47QAn0foy91bpNQ5UFOxWS5L6Fzj2ZND =syx2 -----END PGP SIGNATURE----- Merge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver patches from Greg KH: "Here's the big tty/serial driver update for 3.20-rc1. Nothing huge here, just lots of driver updates and some core tty layer fixes as well. All have been in linux-next with no reported issues" * tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (119 commits) serial: 8250: Fix UART_BUG_TXEN workaround serial: driver for ETRAX FS UART tty: remove unused variable sprop serial: of-serial: fetch line number from DT serial: samsung: earlycon support depends on CONFIG_SERIAL_SAMSUNG_CONSOLE tty/serial: serial8250_set_divisor() can be static tty/serial: Add Spreadtrum sc9836-uart driver support Documentation: DT: Add bindings for Spreadtrum SoC Platform serial: samsung: remove redundant interrupt enabling tty: Remove external interface for tty_set_termios() serial: omap: Fix RTS handling serial: 8250_omap: Use UPSTAT_AUTORTS for RTS handling serial: core: Rework hw-assisted flow control support tty/serial: 8250_early: Add support for PXA UARTs tty/serial: of_serial: add support for PXA/MMP uarts tty/serial: of_serial: add DT alias ID handling serial: 8250: Prevent concurrent updates to shadow registers serial: 8250: Use canary to restart console after suspend serial: 8250: Refactor XR17V35X divisor calculation serial: 8250: Refactor divisor programming ...	2015-02-15 11:37:02 -08:00
Linus Torvalds	4ba63072b9	Char / Misc patches for 3.20-rc1 Here's the big char/misc driver update for 3.20-rc1. Lots of little things in here, all described in the changelog. Nothing major or unusual, except maybe the binder selinux stuff, which was all acked by the proper selinux people and they thought it best to come through this tree. All of this has been in linux-next with no reported issues for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlTgs80ACgkQMUfUDdst+yn86gCeMLbxANGExVLd+PR46GNsAUQb SJ4AmgIqrkIz+5LCwZWM02ldbYhPeBVf =lfmM -----END PGP SIGNATURE----- Merge tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc patches from Greg KH: "Here's the big char/misc driver update for 3.20-rc1. Lots of little things in here, all described in the changelog. Nothing major or unusual, except maybe the binder selinux stuff, which was all acked by the proper selinux people and they thought it best to come through this tree. All of this has been in linux-next with no reported issues for a while" * tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (90 commits) coresight: fix function etm_writel_cp14() parameter order coresight-etm: remove check for unknown Kconfig macro coresight: fixing CPU hwid lookup in device tree coresight: remove the unnecessary function coresight_is_bit_set() coresight: fix the debug AMBA bus name coresight: remove the extra spaces coresight: fix the link between orphan connection and newly added device coresight: remove the unnecessary replicator property coresight: fix the replicator subtype value pdfdocs: Fix 'make pdfdocs' failure for 'uio-howto.tmpl' mcb: Fix error path of mcb_pci_probe virtio/console: verify device has config space ti-st: clean up data types (fix harmless memory corruption) mei: me: release hw from reset only during the reset flow mei: mask interrupt set bit on clean reset bit extcon: max77693: Constify struct regmap_config extcon: adc-jack: Release IIO channel on driver remove extcon: Remove duplicated include from extcon-class.c Drivers: hv: vmbus: hv_process_timer_expiration() can be static Drivers: hv: vmbus: serialize Offer and Rescind offer ...	2015-02-15 10:48:44 -08:00
Linus Torvalds	c833e17e27	Tighten rules for ACCESS_ONCE This series tightens the rules for ACCESS_ONCE to only work on scalar types. It also contains the necessary fixups as indicated by build bots of linux-next. Now everything is in place to prevent new non-scalar users of ACCESS_ONCE and we can continue to convert code to READ_ONCE/WRITE_ONCE. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJU2H5MAAoJEBF7vIC1phx8Jm4QALPqKOMDSUBCrqJFWJeujtv2 ILxJKsnjrAlt3dxnlVI3q6e5wi896hSce75PcvZ/vs/K3GdgMxOjrakBJGTJ2Qjg 5njW9aGJDDr/SYFX33MLWfqy222TLtpxgSz379UgXjEzB0ymMWbJJ3FnGjVqQJdp RXDutpncRySc/rGHh9UPREIRR5GvimONsWE2zxgXjUzB8vIr2fCGvHTXfIb6RKbQ yaFoihzn0m+eisc5Gy4tQ1qhhnaYyWEGrINjHTjMFTQOWTlH80BZAyQeLdbyj2K5 qloBPS/VhBTr/5TxV5onM+nVhu0LiblVNrdMHVeb7jyST4LeFOCaWK98lB3axSB5 v/2D1YKNb3g1U1x3In/oNGQvs36zGiO1uEdMF1l8ZFXgCvHmATSFSTWBtqUhb5Ew JA3YyqMTG6dpRTMSnmu3/frr4wDqnxlB/ktQC1pf3tDp87mr1ZYEy/dQld+tltjh 9Z5GSdrw0nf91wNI3DJf+26ZDdz5B+EpDnPnOKG8anI1lc/mQneI21/K/xUteFXw UZ1XGPLV2vbv9/a13u44SdjenHvQs1egsGeebMxVPoj6WmDLVmcIqinyS6NawYzn IlDGy/b3bSnXWMBP0ZVBX94KWLxqDDc4a/ayxsmxsP1tPZ+jDXjVDa7E3zskcHxG Uj5ULCPyU087t8Sl76mv =Dj70 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux Pull ACCESS_ONCE() rule tightening from Christian Borntraeger: "Tighten rules for ACCESS_ONCE This series tightens the rules for ACCESS_ONCE to only work on scalar types. It also contains the necessary fixups as indicated by build bots of linux-next. Now everything is in place to prevent new non-scalar users of ACCESS_ONCE and we can continue to convert code to READ_ONCE/WRITE_ONCE" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux: kernel: Fix sparse warning for ACCESS_ONCE next: sh: Fix compile error kernel: tighten rules for ACCESS ONCE mm/gup: Replace ACCESS_ONCE with READ_ONCE x86/spinlock: Leftover conversion ACCESS_ONCE->READ_ONCE x86/xen/p2m: Replace ACCESS_ONCE with READ_ONCE ppc/hugetlbfs: Replace ACCESS_ONCE with READ_ONCE ppc/kvm: Replace ACCESS_ONCE with READ_ONCE	2015-02-14 10:54:28 -08:00
Linus Torvalds	fee5429e02	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "Here is the crypto update for 3.20: - Added 192/256-bit key support to aesni GCM. - Added MIPS OCTEON MD5 support. - Fixed hwrng starvation and race conditions. - Added note that memzero_explicit is not a subsitute for memset. - Added user-space interface for crypto_rng. - Misc fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits) crypto: tcrypt - do not allocate iv on stack for aead speed tests crypto: testmgr - limit IV copy length in aead tests crypto: tcrypt - fix buflen reminder calculation crypto: testmgr - mark rfc4106(gcm(aes)) as fips_allowed crypto: caam - fix resource clean-up on error path for caam_jr_init crypto: caam - pair irq map and dispose in the same function crypto: ccp - terminate ccp_support array with empty element crypto: caam - remove unused local variable crypto: caam - remove dead code crypto: caam - don't emit ICV check failures to dmesg hwrng: virtio - drop extra empty line crypto: replace scatterwalk_sg_next with sg_next crypto: atmel - Free memory in error path crypto: doc - remove colons in comments crypto: seqiv - Ensure that IV size is at least 8 bytes crypto: cts - Weed out non-CBC algorithms MAINTAINERS: add linux-crypto to hw random crypto: cts - Remove bogus use of seqiv crypto: qat - don't need qat_auth_state struct crypto: algif_rng - fix sparse non static symbol warning ...	2015-02-14 09:47:01 -08:00
Andrey Ryabinin	bebf56a1b1	kasan: enable instrumentation of global variables This feature let us to detect accesses out of bounds of global variables. This will work as for globals in kernel image, so for globals in modules. Currently this won't work for symbols in user-specified sections (e.g. __init, __read_mostly, ...) The idea of this is simple. Compiler increases each global variable by redzone size and add constructors invoking __asan_register_globals() function. Information about global variable (address, size, size with redzone ...) passed to __asan_register_globals() so we could poison variable's redzone. This patch also forces module_alloc() to return 8*PAGE_SIZE aligned address making shadow memory handling ( kasan_module_alloc()/kasan_module_free() ) more simple. Such alignment guarantees that each shadow page backing modules address space correspond to only one module_alloc() allocation. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 21:21:42 -08:00
Andrey Ryabinin	cb9e3c292d	mm: vmalloc: pass additional vm_flags to __vmalloc_node_range() For instrumenting global variables KASan will shadow memory backing memory for modules. So on module loading we will need to allocate memory for shadow and map it at address in shadow that corresponds to the address allocated in module_alloc(). __vmalloc_node_range() could be used for this purpose, except it puts a guard hole after allocated area. Guard hole in shadow memory should be a problem because at some future point we might need to have a shadow memory at address occupied by guard hole. So we could fail to allocate shadow for module_alloc(). Now we have VM_NO_GUARD flag disabling guard page, so we need to pass into __vmalloc_node_range(). Add new parameter 'vm_flags' to __vmalloc_node_range() function. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 21:21:42 -08:00
Andrey Ryabinin	c420f167db	kasan: enable stack instrumentation Stack instrumentation allows to detect out of bounds memory accesses for variables allocated on stack. Compiler adds redzones around every variable on stack and poisons redzones in function's prologue. Such approach significantly increases stack usage, so all in-kernel stacks size were doubled. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 21:21:41 -08:00
Andrey Ryabinin	393f203f5f	x86_64: kasan: add interceptors for memset/memmove/memcpy functions Recently instrumentation of builtin functions calls was removed from GCC 5.0. To check the memory accessed by such functions, userspace asan always uses interceptors for them. So now we should do this as well. This patch declares memset/memmove/memcpy as weak symbols. In mm/kasan/kasan.c we have our own implementation of those functions which checks memory before accessing it. Default memset/memmove/memcpy now now always have aliases with '__' prefix. For files that built without kasan instrumentation (e.g. mm/slub.c) original mem* replaced (via #define) with prefixed variants, cause we don't want to check memory accesses there. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 21:21:41 -08:00
Andrey Ryabinin	ef7f0d6a6c	x86_64: add KASan support This patch adds arch specific code for kernel address sanitizer. 16TB of virtual addressed used for shadow memory. It's located in range [ffffec0000000000 - fffffc0000000000] between vmemmap and %esp fixup stacks. At early stage we map whole shadow region with zero page. Latter, after pages mapped to direct mapping address range we unmap zero pages from corresponding shadow (see kasan_map_shadow()) and allocate and map a real shadow memory reusing vmemmap_populate() function. Also replace __pa with __pa_nodebug before shadow initialized. __pa with CONFIG_DEBUG_VIRTUAL=y make external function call (__phys_addr) __phys_addr is instrumented, so __asan_load could be called before shadow area initialized. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Jim Davis <jim.epost@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 21:21:41 -08:00
Tejun Heo	bf58b4879c	x86: use %pb[l] to print bitmaps including cpumasks and nodemasks printk and friends can now format bitmaps using '%pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. * Unnecessary buffer size calculation and condition on the lenght removed from intel_cacheinfo.c::show_shared_cpu_map_func(). * uv_nmi_nr_cpus_pr() got overly smart and implemented "..." abbreviation if the output stretched over the predefined 1024 byte buffer. Replaced with plain printk. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Mike Travis <travis@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 21:21:37 -08:00
Linus Torvalds	8329aa9fff	Revert "x86/apic: Only disable CPU x2apic mode when necessary" This reverts commit `5fcee53ce7`. It causes the suspend to fail on at least the Chromebook Pixel, possibly other platforms too. Joerg Roedel points out that the logic should probably have been if (max_physical_apicid > 255 \|\| !(IS_ENABLED(CONFIG_HYPERVISOR_GUEST) && hypervisor_x2apic_available())) { instead, but since the code is not in any fast-path, so we can just live without that optimization and just revert to the original code. Acked-by: Joerg Roedel <joro@8bytes.org> Acked-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 10:26:18 -08:00
Linus Torvalds	b9085bcbf5	Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: the highlights are support for GICv3 emulation and dirty page tracking s390: several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. ARM has other conflicts where functions are added in the same place by 3.19-rc and 3.20 patches. These are not large though, and entirely within KVM. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJU28rkAAoJEL/70l94x66DXqQH/1TDOfJIjW7P2kb0Sw7Fy1wi cEX1KO/VFxAqc8R0E/0Wb55CXyPjQJM6xBXuFr5cUDaIjQ8ULSktL4pEwXyyv/s5 DBDkN65mriry2w5VuEaRLVcuX9Wy+tqLQXWNkEySfyb4uhZChWWHvKEcgw5SqCyg NlpeHurYESIoNyov3jWqvBjr4OmaQENyv7t2c6q5ErIgG02V+iCux5QGbphM2IC9 LFtPKxoqhfeB2xFxTOIt8HJiXrZNwflsTejIlCl/NSEiDVLLxxHCxK2tWK/tUXMn JfLD9ytXBWtNMwInvtFm4fPmDouv2VDyR0xnK2db+/axsJZnbxqjGu1um4Dqbak= =7gdx -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM update from Paolo Bonzini: "Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: The highlights are support for GICv3 emulation and dirty page tracking s390: Several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. Powerpc: Nothing yet. The KVM/PPC changes will come in through the PPC maintainers, because I haven't received them yet and I might end up being offline for some part of next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: ia64: drop kvm.h from installed user headers KVM: x86: fix build with !CONFIG_SMP KVM: x86: emulate: correct page fault error code for NoWrite instructions KVM: Disable compat ioctl for s390 KVM: s390: add cpu model support KVM: s390: use facilities and cpu_id per KVM KVM: s390/CPACF: Choose crypto control block format s390/kernel: Update /proc/sysinfo file with Extended Name and UUID KVM: s390: reenable LPP facility KVM: s390: floating irqs: fix user triggerable endless loop kvm: add halt_poll_ns module parameter kvm: remove KVM_MMIO_SIZE KVM: MIPS: Don't leak FPU/DSP to guest KVM: MIPS: Disable HTW while in guest KVM: nVMX: Enable nested posted interrupt processing KVM: nVMX: Enable nested virtual interrupt delivery KVM: nVMX: Enable nested apic register virtualization KVM: nVMX: Make nested control MSRs per-cpu KVM: nVMX: Enable nested virtualize x2apic mode KVM: nVMX: Prepare for using hardware MSR bitmap ...	2015-02-13 09:55:09 -08:00
Matt Fleming	96738c69a7	x86/efi: Avoid triple faults during EFI mixed mode calls Andy pointed out that if an NMI or MCE is received while we're in the middle of an EFI mixed mode call a triple fault will occur. This can happen, for example, when issuing an EFI mixed mode call while running perf. The reason for the triple fault is that we execute the mixed mode call in 32-bit mode with paging disabled but with 64-bit kernel IDT handlers installed throughout the call. At Andy's suggestion, stop playing the games we currently do at runtime, such as disabling paging and installing a 32-bit GDT for __KERNEL_CS. We can simply switch to the __KERNEL32_CS descriptor before invoking firmware services, and run in compatibility mode. This way, if an NMI/MCE does occur the kernel IDT handler will execute correctly, since it'll jump to __KERNEL_CS automatically. However, this change is only possible post-ExitBootServices(). Before then the firmware "owns" the machine and expects for its 32-bit IDT handlers to be left intact to service interrupts, etc. So, we now need to distinguish between early boot and runtime invocations of EFI services. During early boot, we need to restore the GDT that the firmware expects to be present. We can only jump to the __KERNEL32_CS code segment for mixed mode calls after ExitBootServices() has been invoked. A liberal sprinkling of comments in the thunking code should make the differences in early and late environments more apparent. Reported-by: Andy Lutomirski <luto@amacapital.net> Tested-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2015-02-13 15:42:56 +00:00
Rusty Russell	55c2d7884e	lguest: don't look in console features to find emerg_wr. The 1.0 spec clearly states that you must set the ACKNOWLEDGE and DRIVER status bits before accessing the feature bits. This is a problem for the early console code, which doesn't really want to acknowledge the device (the spec specifically excepts writing to the console's emerg_wr from the usual ordering constrains). Instead, we check that the size of the device configuration is sufficient to hold emerg_wr: at worst (if the device doesn't support the VIRTIO_CONSOLE_F_EMERG_WRITE feature), it will ignore the writes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-02-13 17:15:51 +10:30
Linus Torvalds	818099574b	Merge branch 'akpm' (patches from Andrew) Merge third set of updates from Andrew Morton: - the rest of MM [ This includes getting rid of the numa hinting bits, in favor of just generic protnone logic. Yay. - Linus ] - core kernel - procfs - some of lib/ (lots of lib/ material this time) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (104 commits) lib/lcm.c: replace include lib/percpu_ida.c: remove redundant includes lib/strncpy_from_user.c: replace module.h include lib/stmp_device.c: replace module.h include lib/sort.c: move include inside #if 0 lib/show_mem.c: remove redundant include lib/radix-tree.c: change to simpler include lib/plist.c: remove redundant include lib/nlattr.c: remove redundant include lib/kobject_uevent.c: remove redundant include lib/llist.c: remove redundant include lib/md5.c: simplify include lib/list_sort.c: rearrange includes lib/genalloc.c: remove redundant include lib/idr.c: remove redundant include lib/halfmd4.c: simplify includes lib/dynamic_queue_limits.c: simplify includes lib/sort.c: use simpler includes lib/interval_tree.c: simplify includes hexdump: make it return number of bytes placed in buffer ...	2015-02-12 18:54:28 -08:00
Rasmus Villemoes	02f1f2170d	kernel.h: remove ancient __FUNCTION__ hack __FUNCTION__ hasn't been treated as a string literal since gcc 3.4, so this only helps people who only test-compile using 3.3 (compiler-gcc3.h barks at anything older than that). Besides, there are almost no occurrences of __FUNCTION__ left in the tree. [akpm@linux-foundation.org: convert remaining __FUNCTION__ references] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-12 18:54:13 -08:00
Andy Lutomirski	f56141e3e2	all arches, signal: move restart_block to struct task_struct If an attacker can cause a controlled kernel stack overflow, overwriting the restart block is a very juicy exploit target. This is because the restart_block is held in the same memory allocation as the kernel stack. Moving the restart block to struct task_struct prevents this exploit by making the restart_block harder to locate. Note that there are other fields in thread_info that are also easy targets, at least on some architectures. It's also a decent simplification, since the restart code is more or less identical on all architectures. [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: David Miller <davem@davemloft.net> Acked-by: Richard Weinberger <richard@nod.at> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Steven Miao <realmz6@gmail.com> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-12 18:54:12 -08:00
Mel Gorman	c819f37e7e	x86: mm: restore original pte_special check Commit `b38af4721f` ("x86,mm: fix pte_special versus pte_numa") adjusted the pte_special check to take into account that a special pte had SPECIAL and neither PRESENT nor PROTNONE. Now that NUMA hinting PTEs are no longer modifying _PAGE_PRESENT it should be safe to restore the original pte_special behaviour. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-12 18:54:08 -08:00
Mel Gorman	21d9ee3eda	mm: remove remaining references to NUMA hinting bits and helpers This patch removes the NUMA PTE bits and associated helpers. As a side-effect it increases the maximum possible swap space on x86-64. One potential source of problems is races between the marking of PTEs PROT_NONE, NUMA hinting faults and migration. It must be guaranteed that a PTE being protected is not faulted in parallel, seen as a pte_none and corrupting memory. The base case is safe but transhuge has problems in the past due to an different migration mechanism and a dependance on page lock to serialise migrations and warrants a closer look. task_work hinting update parallel fault ------------------------ -------------- change_pmd_range change_huge_pmd __pmd_trans_huge_lock pmdp_get_and_clear __handle_mm_fault pmd_none do_huge_pmd_anonymous_page read? pmd_lock blocks until hinting complete, fail !pmd_none test write? __do_huge_pmd_anonymous_page acquires pmd_lock, checks pmd_none pmd_modify set_pmd_at task_work hinting update parallel migration ------------------------ ------------------ change_pmd_range change_huge_pmd __pmd_trans_huge_lock pmdp_get_and_clear __handle_mm_fault do_huge_pmd_numa_page migrate_misplaced_transhuge_page pmd_lock waits for updates to complete, recheck pmd_same pmd_modify set_pmd_at Both of those are safe and the case where a transhuge page is inserted during a protection update is unchanged. The case where two processes try migrating at the same time is unchanged by this series so should still be ok. I could not find a case where we are accidentally depending on the PTE not being cleared and flushed. If one is missed, it'll manifest as corruption problems that start triggering shortly after this series is merged and only happen when NUMA balancing is enabled. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Cc: Mark Brown <broonie@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-12 18:54:08 -08:00
Mel Gorman	8a0516ed8b	mm: convert p[te\|md]_numa users to p[te\|md]_protnone_numa Convert existing users of pte_numa and friends to the new helper. Note that the kernel is broken after this patch is applied until the other page table modifiers are also altered. This patch layout is to make review easier. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-12 18:54:08 -08:00
Mel Gorman	e7bb4b6d16	mm: add p[te\|md] protnone helpers for use by NUMA balancing This is a preparatory patch that introduces protnone helpers for automatic NUMA balancing. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-12 18:54:08 -08:00
Linus Torvalds	5d8e7fb691	md updates for 3.20 - assorted locking changes so that access to /proc/mdstat and much of /sys/block/mdXX/md/* is protected by a spinlock rather than a mutex and will never block indefinitely. - Make an 'if' condition in RAID5 - which has been implicated in recent bugs - more readable. - misc minor fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIVAwUAVNwaHTnsnt1WYoG5AQLssw//TniaqtlsK6RPd4PbWdyKFBfUx6bPurtx ZRQvw7FcX8sFW/QtxKyJZsOhKcCc1CQEByzZzvuR5SLE8scyw/SxC8eFqMY+vSWF HmqJtoK3LdxY/PPC2qRYcdzpU4DzYuIu3ChkJvdXJa4mULKJhutR8nbp1k52JXN8 U2KPoE+hScVLKCuhwkvp6h4Fg/Caa9MjSpk3uMEkGDm75Iwl175EIb/fV95pIfqd cXs2wJBW3TA/YQecMeHC/YmVSIjF8nobCfhFlWfWYg0ySrSh1YSPLjb707Ohs5cF efEGoIWfm2uKd0XmsKEDclYDTNtwqgO7A3QDuWFP5ab9cTpi0Fuj1CE9LNBCf5Hf a5mwggBQ2KkpGwgtBuz6MJRaVu1xtOOjyFG8qzzeDOHLKxHRvZgiUnkb5U2I+YxB iMmyk7ijm9PF5M+wm3AqzFmG8icrAf6ixkSZidQy0PfAIFFFph+jQvdHqiXnLOGD 6S/xpG0avHxdC5pC4R0iRsNMNl3IESqy6qLkTOYjQ+yoZ9A+LY19nsMf9LBnwgdY hz9WxV+EvFVggVl19U5Bg+3z1EwPt3kNr4Se1bkPI8reHH/adacNmHWBBLQ5KHle WYdqcNXiTwaLje/7Qw5TKQFSGgLMAPc8ciDA7QOX/ZFoyUmWFn89YAoDdGqvvDd9 g5mQc/Amxt8= =/FmP -----END PGP SIGNATURE----- Merge tag 'md/3.20' of git://neil.brown.name/md Pull md updates from Neil Brown: - assorted locking changes so that access to /proc/mdstat and much of /sys/block/mdXX/md/* is protected by a spinlock rather than a mutex and will never block indefinitely. - Make an 'if' condition in RAID5 - which has been implicated in recent bugs - more readable. - misc minor fixes * tag 'md/3.20' of git://neil.brown.name/md: (28 commits) md/raid10: fix conversion from RAID0 to RAID10 md: wakeup thread upon rdev_dec_pending() md: make reconfig_mutex optional for writes to md sysfs files. md: move mddev_lock and related to md.h md: use mddev->lock to protect updates to resync_{min,max}. md: minor cleanup in safe_delay_store. md: move GET_BITMAP_FILE ioctl out from mddev_lock. md: tidy up set_bitmap_file md: remove unnecessary 'buf' from get_bitmap_file. md: remove mddev_lock from rdev_attr_show() md: remove mddev_lock() from md_attr_show() md/raid5: use ->lock to protect accessing raid5 sysfs attributes. md: remove need for mddev_lock() in md_seq_show() md/bitmap: protect clearing of ->bitmap by mddev->lock md: protect ->pers changes with mddev->lock md: level_store: group all important changes into one place. md: rename ->stop to ->free md: split detach operation out from ->stop. md/linear: remove rcu protections in favour of suspend/resume md: make merge_bvec_fn more robust in face of personality changes. ...	2015-02-12 11:05:49 -08:00
Linus Torvalds	42cf0f203e	Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm Pull ARM updates from Russell King: - clang assembly fixes from Ard - optimisations and cleanups for Aurora L2 cache support - efficient L2 cache support for secure monitor API on Exynos SoCs - debug menu cleanup from Daniel Thompson to allow better behaviour for multiplatform kernels - StrongARM SA11x0 conversion to irq domains, and pxa_timer - kprobes updates for older ARM CPUs - move probes support out of arch/arm/kernel to arch/arm/probes - add inline asm support for the rbit (reverse bits) instruction - provide an ARM mode secondary CPU entry point (for Qualcomm CPUs) - remove the unused ARMv3 user access code - add driver_override support to AMBA Primecell bus * 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (55 commits) ARM: 8256/1: driver coamba: add device binding path 'driver_override' ARM: 8301/1: qcom: Use secondary_startup_arm() ARM: 8302/1: Add a secondary_startup that assumes ARM mode ARM: 8300/1: teach __asmeq that r11 == fp and r12 == ip ARM: kprobes: Fix compilation error caused by superfluous '*' ARM: 8297/1: cache-l2x0: optimize aurora range operations ARM: 8296/1: cache-l2x0: clean up aurora cache handling ARM: 8284/1: sa1100: clear RCSR_SMR on resume ARM: 8283/1: sa1100: collie: clear PWER register on machine init ARM: 8282/1: sa1100: use handle_domain_irq ARM: 8281/1: sa1100: move GPIO-related IRQ code to gpio driver ARM: 8280/1: sa1100: switch to irq_domain_add_simple() ARM: 8279/1: sa1100: merge both GPIO irqdomains ARM: 8278/1: sa1100: split irq handling for low GPIOs ARM: 8291/1: replace magic number with PAGE_SHIFT macro in fixup_pv code ARM: 8290/1: decompressor: fix a wrong comment ARM: 8286/1: mm: Fix dma_contiguous_reserve comment ARM: 8248/1: pm: remove outdated comment ARM: 8274/1: Fix DEBUG_LL for multi-platform kernels (without PL01X) ARM: 8273/1: Seperate DEBUG_UART_PHYS from DEBUG_LL on EP93XX ...	2015-02-12 08:51:56 -08:00
Andrea Arcangeli	a7b780750e	mm: gup: use get_user_pages_unlocked within get_user_pages_fast This allows the get_user_pages_fast slow path to release the mmap_sem before blocking. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Peter Feiner <pfeiner@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-11 17:06:05 -08:00
Kirill A. Shutemov	dc6c9a35b6	mm: account pmd page tables to the process Dave noticed that unprivileged process can allocate significant amount of memory -- >500 MiB on x86_64 -- and stay unnoticed by oom-killer and memory cgroup. The trick is to allocate a lot of PMD page tables. Linux kernel doesn't account PMD tables to the process, only PTE. The use-cases below use few tricks to allocate a lot of PMD page tables while keeping VmRSS and VmPTE low. oom_score for the process will be 0. #include <errno.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> #include <sys/prctl.h> #define PUD_SIZE (1UL << 30) #define PMD_SIZE (1UL << 21) #define NR_PUD 130000 int main(void) { char addr = NULL; unsigned long i; prctl(PR_SET_THP_DISABLE); for (i = 0; i < NR_PUD ; i++) { addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE\|PROT_READ, MAP_ANONYMOUS\|MAP_PRIVATE, -1, 0); if (addr == MAP_FAILED) { perror("mmap"); break; } addr = 'x'; munmap(addr, PMD_SIZE); mmap(addr, PMD_SIZE, PROT_WRITE\|PROT_READ, MAP_ANONYMOUS\|MAP_PRIVATE\|MAP_FIXED, -1, 0); if (addr == MAP_FAILED) perror("re-mmap"), exit(1); } printf("PID %d consumed %lu KiB in PMD page tables\n", getpid(), i * 4096 >> 10); return pause(); } The patch addresses the issue by account PMD tables to the process the same way we account PTE. The main place where PMD tables is accounted is __pmd_alloc() and free_pmd_range(). But there're few corner cases: - HugeTLB can share PMD page tables. The patch handles by accounting the table to all processes who share it. - x86 PAE pre-allocates few PMD tables on fork. - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust sanity check on exit(2). Accounting only happens on configuration where PMD page table's level is present (PMD is not folded). As with nr_ptes we use per-mm counter. The counter value is used to calculate baseline for badness score by oom-killer. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: David Rientjes <rientjes@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-11 17:06:04 -08:00
Kirill A. Shutemov	d016bf7ece	mm: make FIRST_USER_ADDRESS unsigned long on all archs LKP has triggered a compiler warning after my recent patch "mm: account pmd page tables to the process": mm/mmap.c: In function 'exit_mmap': >> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default] The code: > 2857 WARN_ON(mm_nr_pmds(mm) > 2858 round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT); In this, on tile, we have FIRST_USER_ADDRESS defined as 0. round_up() has the same type -- int. PUD_SHIFT. I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned long. On every arch for consistency. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-11 17:06:03 -08:00
Naoya Horiguchi	cbef8478be	mm/hugetlb: pmd_huge() returns true for non-present hugepage Migrating hugepages and hwpoisoned hugepages are considered as non-present hugepages, and they are referenced via migration entries and hwpoison entries in their page table slots. This behavior causes race condition because pmd_huge() doesn't tell non-huge pages from migrating/hwpoisoned hugepages. follow_page_mask() is one example where the kernel would call follow_page_pte() for such hugepage while this function is supposed to handle only normal pages. To avoid this, this patch makes pmd_huge() return true when pmd_none() is true and pmd_present() is false. We don't have to worry about mixing up non-present pmd entry with normal pmd (pointing to leaf level pte entry) because pmd_present() is true in normal pmd. The same race condition could happen in (x86-specific) gup_pmd_range(), where this patch simply adds pmd_present() check instead of pmd_huge(). This is because gup_pmd_range() is fast path. If we have non-present hugepage in this function, we will go into gup_huge_pmd(), then return 0 at flag mask check, and finally fall back to the slow path. Fixes: `290408d4a2` ("hugetlb: hugepage migration core") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: <stable@vger.kernel.org> [2.6.36+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-11 17:06:01 -08:00
Naoya Horiguchi	61f77eda9b	mm/hugetlb: reduce arch dependent code around follow_huge_* Currently we have many duplicates in definitions around follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this patch tries to remove the m. The basic idea is to put the default implementation for these functions in mm/hugetlb.c as weak symbols (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETL B), and to implement arch-specific code only when the arch needs it. For follow_huge_addr(), only powerpc and ia64 have their own implementation, and in all other architectures this function just returns ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as default. As for follow_huge_(pmd\|pud)(), if (pmd\|pud)_huge() is implemented to always return 0 in your architecture (like in ia64 or sparc,) it's never called (the callsite is optimized away) no matter how implemented it is. So in such architectures, we don't need arch-specific implementation. In some architecture (like mips, s390 and tile,) their current arch-specific follow_huge_(pmd\|pud)() are effectively identical with the common code, so this patch lets these architecture use the common code. One exception is metag, where pmd_huge() could return non-zero but it expects follow_huge_pmd() to always return NULL. This means that we need arch-specific implementation which returns NULL. This behavior looks strange to me (because non-zero pmd_huge() implies that the architecture supports PMD-based hugepage, so follow_huge_pmd() can/should return some relevant value,) but that's beyond this cleanup patch, so let's keep it. Justification of non-trivial changes: - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE is true when follow_huge_pmd() can be called (note that pmd_huge() has the same check and always returns 0 for !MACHINE_HAS_HPAGE.) - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common code. This patch forces these archs use PMD_MASK, but it's OK because they are identical in both archs. In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20. In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and PMD_SHIFT is define as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but PTE_ORDER is always 0, so these are identical. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-11 17:06:01 -08:00
Rusty Russell	d9bab50aa4	lguest: remove NOTIFY call and eventfd facility. Disappointing, as this was kind of neat (especially getting to use RCU to manage the address -> eventfd mapping). But now the devices are PCI handled in userspace, we get rid of both the NOTIFY hypercall and the interface to connect an eventfd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-02-11 16:47:46 +10:30
Rusty Russell	a561adfaec	lguest: use the PCI console device's emerg_wr for early boot messages. This involves manually checking the console device (which is always in slot 1 of bus 0) and using the window in VIRTIO_PCI_CAP_PCI_CFG to program it (as we can't map the BAR yet). We could in fact do this much earlier, but we wait for the first write from the virtio_cons_early_init() facility. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-02-11 16:47:44 +10:30
Rusty Russell	e1b83e2788	lguest: Override pcibios_enable_irq/pcibios_disable_irq to our stupid PIC This lets us deliver interrupts for our emulated PCI devices using our dumb PIC, and not emulate an 8259 and PCI irq mapping tables or whatever. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-02-11 16:47:34 +10:30
Rusty Russell	ee72576c14	lguest: disable ACPI explicitly. Once we add PCI, it starts trying to manage our interrupts. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-02-11 16:47:34 +10:30
Rusty Russell	d1c29465b8	lguest: don't disable iospace. This no longer speeds up boot (IDE got better, I guess), but it does stop us probing for a PCI bus. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-02-11 16:47:32 +10:30
Linus Torvalds	29afc4e9a4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree changes from Jiri Kosina: "Patches from trivial.git that keep the world turning around. Mostly documentation and comment fixes, and a two corner-case code fixes from Alan Cox" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: kexec, Kconfig: spell "architecture" properly mm: fix cleancache debugfs directory path blackfin: mach-common: ints-priority: remove unused function doubletalk: probe failure causes OOPS ARM: cache-l2x0.c: Make it clear that cache-l2x0 handles L310 cache controller msdos_fs.h: fix 'fields' in comment scsi: aic7xxx: fix comment ARM: l2c: fix comment ibmraid: fix writeable attribute with no store method dynamic_debug: fix comment doc: usbmon: fix spelling s/unpriviledged/unprivileged/ x86: init_mem_mapping(): use capital BIOS in comment	2015-02-10 18:57:15 -08:00
Linus Torvalds	1d9c5d79e6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull live patching infrastructure from Jiri Kosina: "Let me provide a bit of history first, before describing what is in this pile. Originally, there was kSplice as a standalone project that implemented stop_machine()-based patching for the linux kernel. This project got later acquired, and the current owner is providing live patching as a proprietary service, without any intentions to have their implementation merged. Then, due to rising user/customer demand, both Red Hat and SUSE started working on their own implementation (not knowing about each other), and announced first versions roughly at the same time [1] [2]. The principle difference between the two solutions is how they are making sure that the patching is performed in a consistent way when it comes to different execution threads with respect to the semantic nature of the change that is being introduced. In a nutshell, kPatch is issuing stop_machine(), then looking at stacks of all existing processess, and if it decides that the system is in a state that can be patched safely, it proceeds insterting code redirection machinery to the patched functions. On the other hand, kGraft provides a per-thread consistency during one single pass of a process through the kernel and performs a lazy contignuous migration of threads from "unpatched" universe to the "patched" one at safe checkpoints. If interested in a more detailed discussion about the consistency models and its possible combinations, please see the thread that evolved around [3]. It pretty quickly became obvious to the interested parties that it's absolutely impractical in this case to have several isolated solutions for one task to co-exist in the kernel. During a dedicated Live Kernel Patching track at LPC in Dusseldorf, all the interested parties sat together and came up with a joint aproach that would work for both distro vendors. Steven Rostedt took notes [4] from this meeting. And the foundation for that aproach is what's present in this pull request. It provides a basic infrastructure for function "live patching" (i.e. code redirection), including API for kernel modules containing the actual patches, and API/ABI for userspace to be able to operate on the patches (look up what patches are applied, enable/disable them, etc). It's relatively simple and minimalistic, as it's making use of existing kernel infrastructure (namely ftrace) as much as possible. It's also self-contained, in a sense that it doesn't hook itself in any other kernel subsystem (it doesn't even touch any other code). It's now implemented for x86 only as a reference architecture, but support for powerpc, s390 and arm is already in the works (adding arch-specific support basically boils down to teaching ftrace about regs-saving). Once this common infrastructure gets merged, both Red Hat and SUSE have agreed to immediately start porting their current solutions on top of this, abandoning their out-of-tree code. The plan basically is that each patch will be marked by flag(s) that would indicate which consistency model it is willing to use (again, the details have been sketched out already in the thread at [3]). Before this happens, the current codebase can be used to patch a large group of secruity/stability problems the patches for which are not too complex (in a sense that they don't introduce non-trivial change of function's return value semantics, they don't change layout of data structures, etc) -- this corresponds to LEAVE_FUNCTION && SWITCH_FUNCTION semantics described at [3]. This tree has been in linux-next since December. [1] https://lkml.org/lkml/2014/4/30/477 [2] https://lkml.org/lkml/2014/7/14/857 [3] https://lkml.org/lkml/2014/11/7/354 [4] http://linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_LivePatching.txt [ The core code is introduced by the three commits authored by Seth Jennings, which got a lot of changes incorporated during numerous respins and reviews of the initial implementation. All the followup commits have materialized only after public tree has been created, so they were not folded into initial three commits so that the public tree doesn't get rebased ]" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add missing newline to error message livepatch: rename config to CONFIG_LIVEPATCH livepatch: fix uninitialized return value livepatch: support for repatching a function livepatch: enforce patch stacking semantics livepatch: change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING livepatch: fix deferred module patching order livepatch: handle ancient compilers with more grace livepatch: kconfig: use bool instead of boolean livepatch: samples: fix usage example comments livepatch: MAINTAINERS: add git tree location livepatch: use FTRACE_OPS_FL_IPMODIFY livepatch: move x86 specific ftrace handler code to arch/x86 livepatch: samples: add sample live patching module livepatch: kernel: add support for live patching livepatch: kernel: add TAINT_LIVEPATCH	2015-02-10 18:35:40 -08:00
Linus Torvalds	992de5a8ec	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: "Bite-sized chunks this time, to avoid the MTA ratelimiting woes. - fs/notify updates - ocfs2 - some of MM" That laconic "some MM" is mainly the removal of remap_file_pages(), which is a big simplification of the VM, and which gets rid of a lot of random cruft and special cases because we no longer support the non-linear mappings that it used. From a user interface perspective, nothing has changed, because the remap_file_pages() syscall still exists, it's just done by emulating the old behavior by creating a lot of individual small mappings instead of one non-linear one. The emulation is slower than the old "native" non-linear mappings, but nobody really uses or cares about remap_file_pages(), and simplifying the VM is a big advantage. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits) memcg: zap memcg_slab_caches and memcg_slab_mutex memcg: zap memcg_name argument of memcg_create_kmem_cache memcg: zap __memcg_{charge,uncharge}_slab mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check mm: hugetlb: fix type of hugetlb_treat_as_movable variable mm, hugetlb: remove unnecessary lower bound on sysctl handlers"? mm: memory: merge shared-writable dirtying branches in do_wp_page() mm: memory: remove ->vm_file check on shared writable vmas xtensa: drop _PAGE_FILE and pte_file()-related helpers x86: drop _PAGE_FILE and pte_file()-related helpers unicore32: drop pte_file()-related helpers um: drop _PAGE_FILE and pte_file()-related helpers tile: drop pte_file()-related helpers sparc: drop pte_file()-related helpers sh: drop _PAGE_FILE and pte_file()-related helpers score: drop _PAGE_FILE and pte_file()-related helpers s390: drop pte_file()-related helpers parisc: drop _PAGE_FILE and pte_file()-related helpers openrisc: drop _PAGE_FILE and pte_file()-related helpers nios2: drop _PAGE_FILE and pte_file()-related helpers ...	2015-02-10 16:45:56 -08:00
Linus Torvalds	872912352c	ACPI and power management updates for v3.20-rc1 - Rework of the core ACPI resources parsing code to fix issues in it and make using resource offsets more convenient and consolidation of some resource-handing code in a couple of places that have grown analagous data structures and code to cover the the same gap in the core (Jiang Liu, Thomas Gleixner, Lv Zheng). - ACPI-based IOAPIC hotplug support on top of the resources handling rework (Jiang Liu, Yinghai Lu). - ACPICA update to upstream release 20150204 including an interrupt handling rework that allows drivers to install raw handlers for ACPI GPEs which then become entirely responsible for the given GPE and the ACPICA core code won't touch it (Lv Zheng, David E Box, Octavian Purdila). - ACPI EC driver rework to fix several concurrency issues and other problems related to events handling on top of the ACPICA's new support for raw GPE handlers (Lv Zheng). - New ACPI driver for AMD SoCs analogous to the LPSS (Low-Power Subsystem) driver for Intel chips (Ken Xue). - Two minor fixes of the ACPI LPSS driver (Heikki Krogerus, Jarkko Nikula). - Two new blacklist entries for machines (Samsung 730U3E/740U3E and 510R) where the native backlight interface doesn't work correctly while the ACPI one does (Hans de Goede). - Rework of the ACPI processor driver's handling of idle states to make the code more straightforward and less bloated overall (Rafael J Wysocki). - Assorted minor fixes related to ACPI and SFI (Andreas Ruprecht, Andy Shevchenko, Hanjun Guo, Jan Beulich, Rafael J Wysocki, Yaowei Bai). - PCI core power management modification to avoid resuming (some) runtime-suspended devices during system suspend if they are in the right states already (Rafael J Wysocki). - New SFI-based cpufreq driver for Intel platforms using SFI (Srinidhi Kasagar). - cpufreq core fixes, cleanups and simplifications (Viresh Kumar, Doug Anderson, Wolfram Sang). - SkyLake CPU support and other updates for the intel_pstate driver (Kristen Carlson Accardi, Srinivas Pandruvada). - cpufreq-dt driver cleanup (Markus Elfring). - Init fix for the ARM big.LITTLE cpuidle driver (Sudeep Holla). - Generic power domains core code fixes and cleanups (Ulf Hansson). - Operating Performance Points (OPP) core code cleanups and kernel documentation update (Nishanth Menon). - New dabugfs interface to make the list of PM QoS constraints available to user space (Nishanth Menon). - New devfreq driver for Tegra Activity Monitor (Tomeu Vizoso). - New devfreq class (devfreq_event) to provide raw utilization data to devfreq governors (Chanwoo Choi). - Assorted minor fixes and cleanups related to power management (Andreas Ruprecht, Krzysztof Kozlowski, Rickard Strandqvist, Pavel Machek, Todd E Brandt, Wonhong Kwon). - turbostat updates (Len Brown) and cpupower Makefile improvement (Sriram Raghunathan). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJU2neOAAoJEILEb/54YlRx51QP/jrv1Wb5eMaemzMksPIWI5Zn I8IbxzToxu7wDDsrTBRv+LuyllMPrnppFOHHvB35gUYu7Y6I066s3ErwuqeFlbmy +VicmyGMahv3yN74qg49MXzWtaJZa8hrFXn8ItujiUIcs08yELi0vBQFlZImIbTB PdQngO88VfiOVjDvmKkYUU//9Sc9LCU0ZcdUQXSnA1oNOxuUHjiARz98R03hhSqu BWR+7M0uaFbu6XeK+BExMXJTpKicIBZ1GAF6hWrS8V4aYg+hH1cwjf2neDAzZkcU UkXieJlLJrCq+ZBNcy7WEhkWQkqJNWei5WYiy6eoQeQpNoliY2V+2OtSMJaKqDye PIiMwXstyDc5rgyULN0d1UUzY6mbcUt2rOL0VN2bsFVIJ1HWCq8mr8qq689pQUYv tcH18VQ2/6r2zW28sTO/ByWLYomklD/Y6bw2onMhGx3Knl0D8xYJKapVnTGhr5eY d4k41ybHSWNKfXsZxdJc+RxndhPwj9rFLfvY/CZEhLcW+2pAiMarRDOPXDoUI7/l aJpmPzy/6mPXGBnTfr6jKDSY3gXNazRIvfPbAdiGayKcHcdRM4glbSbNH0/h1Iq6 HKa8v9Fx87k1X5r4ZbhiPdABWlxuKDiM7725rfGpvjlWC3GNFOq7YTVMOuuBA225 Mu9PRZbOsZsnyNkixBpX =zZER -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "We have a few new features this time, including a new SFI-based cpufreq driver, a new devfreq driver for Tegra Activity Monitor, a new devfreq class for providing its governors with raw utilization data and a new ACPI driver for AMD SoCs. Still, the majority of changes here are reworks of existing code to make it more straightforward or to prepare it for implementing new features on top of it. The primary example is the rework of ACPI resources handling from Jiang Liu, Thomas Gleixner and Lv Zheng with support for IOAPIC hotplug implemented on top of it, but there is quite a number of changes of this kind in the cpufreq core, ACPICA, ACPI EC driver, ACPI processor driver and the generic power domains core code too. The most active developer is Viresh Kumar with his cpufreq changes. Specifics: - Rework of the core ACPI resources parsing code to fix issues in it and make using resource offsets more convenient and consolidation of some resource-handing code in a couple of places that have grown analagous data structures and code to cover the the same gap in the core (Jiang Liu, Thomas Gleixner, Lv Zheng). - ACPI-based IOAPIC hotplug support on top of the resources handling rework (Jiang Liu, Yinghai Lu). - ACPICA update to upstream release 20150204 including an interrupt handling rework that allows drivers to install raw handlers for ACPI GPEs which then become entirely responsible for the given GPE and the ACPICA core code won't touch it (Lv Zheng, David E Box, Octavian Purdila). - ACPI EC driver rework to fix several concurrency issues and other problems related to events handling on top of the ACPICA's new support for raw GPE handlers (Lv Zheng). - New ACPI driver for AMD SoCs analogous to the LPSS (Low-Power Subsystem) driver for Intel chips (Ken Xue). - Two minor fixes of the ACPI LPSS driver (Heikki Krogerus, Jarkko Nikula). - Two new blacklist entries for machines (Samsung 730U3E/740U3E and 510R) where the native backlight interface doesn't work correctly while the ACPI one does (Hans de Goede). - Rework of the ACPI processor driver's handling of idle states to make the code more straightforward and less bloated overall (Rafael J Wysocki). - Assorted minor fixes related to ACPI and SFI (Andreas Ruprecht, Andy Shevchenko, Hanjun Guo, Jan Beulich, Rafael J Wysocki, Yaowei Bai). - PCI core power management modification to avoid resuming (some) runtime-suspended devices during system suspend if they are in the right states already (Rafael J Wysocki). - New SFI-based cpufreq driver for Intel platforms using SFI (Srinidhi Kasagar). - cpufreq core fixes, cleanups and simplifications (Viresh Kumar, Doug Anderson, Wolfram Sang). - SkyLake CPU support and other updates for the intel_pstate driver (Kristen Carlson Accardi, Srinivas Pandruvada). - cpufreq-dt driver cleanup (Markus Elfring). - Init fix for the ARM big.LITTLE cpuidle driver (Sudeep Holla). - Generic power domains core code fixes and cleanups (Ulf Hansson). - Operating Performance Points (OPP) core code cleanups and kernel documentation update (Nishanth Menon). - New dabugfs interface to make the list of PM QoS constraints available to user space (Nishanth Menon). - New devfreq driver for Tegra Activity Monitor (Tomeu Vizoso). - New devfreq class (devfreq_event) to provide raw utilization data to devfreq governors (Chanwoo Choi). - Assorted minor fixes and cleanups related to power management (Andreas Ruprecht, Krzysztof Kozlowski, Rickard Strandqvist, Pavel Machek, Todd E Brandt, Wonhong Kwon). - turbostat updates (Len Brown) and cpupower Makefile improvement (Sriram Raghunathan)" * tag 'pm+acpi-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (151 commits) tools/power turbostat: relax dependency on APERF_MSR tools/power turbostat: relax dependency on invariant TSC Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS tools/power turbostat: relax dependency on root permission ACPI / video: Add disable_native_backlight quirk for Samsung 510R ACPI / PM: Remove unneeded nested #ifdef USB / PM: Remove unneeded #ifdef and associated dead code intel_pstate: provide option to only use intel_pstate with HWP ACPI / EC: Add GPE reference counting debugging messages ACPI / EC: Add query flushing support ACPI / EC: Refine command storm prevention support ACPI / EC: Add command flushing support. ACPI / EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag ACPI: add AMD ACPI2Platform device support for x86 system ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse() ACPI / EC: Update revision due to raw handler mode. ACPI / EC: Reduce ec_poll() by referencing the last register access timestamp. ACPI / EC: Fix several GPE handling issues by deploying ACPI_GPE_DISPATCH_RAW_HANDLER mode. ACPICA: Events: Enable APIs to allow interrupt/polling adaptive request based GPE handling model ...	2015-02-10 15:09:41 -08:00
Linus Torvalds	c08f846793	PCI changes for the v3.20 merge window: Enumeration - Move domain assignment from arm64 to generic code (Lorenzo Pieralisi) - ARM: Remove artificial dependency on pci_sys_data domain (Lorenzo Pieralisi) - ARM: Move to generic PCI domains (Lorenzo Pieralisi) - Generate uppercase hex for modalias var in uevent (Ricardo Ribalda Delgado) - Add and use generic config accessors on ARM, PowerPC (Rob Herring) Resource management - Free resources on failure in of_pci_get_host_bridge_resources() (Lorenzo Pieralisi) - Fix infinite loop with ROM image of size 0 (Michel Dänzer) PCI device hotplug - Handle surprise add even if surprise removal isn't supported (Bjorn Helgaas) Virtualization - Mark AMD/ATI VGA devices that don't reset on D3hot->D0 transition (Alex Williamson) - Add DMA alias quirk for Adaptec 3405 (Alex Williamson) - Add Wellsburg (X99) to Intel PCH root port ACS quirk (Alex Williamson) - Add ACS quirk for Emulex NICs (Vasundhara Volam) MSI - Fail MSI-X mappings if there's no space assigned to MSI-X BAR (Yijing Wang) Freescale Layerscape host bridge driver - Fix platform_no_drv_owner.cocci warnings (Julia Lawall) NVIDIA Tegra host bridge driver - Remove unnecessary tegra_pcie_fixup_bridge() (Lucas Stach) Renesas R-Car host bridge driver - Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov) TI Keystone host bridge driver - Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov) - Fix misspelling of current function in debug output (Julia Lawall) Xilinx AXI host bridge driver - Fix harmless format string warning (Arnd Bergmann) Miscellaneous - Use standard parsing functions for ASPM sysfs setters (Chris J Arges) - Add pci_device_to_OF_node() stub for !CONFIG_OF (Kevin Hao) - Delete unnecessary NULL pointer checks (Markus Elfring) - Add and use defines for PCIe Max_Read_Request_Size (Rafał Miłecki) - Include clk.h instead of clk-private.h (Stephen Boyd) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU1idjAAoJEFmIoMA60/r8C+gP/3p1Ya/cW+liX7e0Pz6wsrkB pAk9Af9Iz7RHYb0ODAs1XnlvuJJtJ6nXb9iXTFefhWKfVK0dF1i0w2VqsLUa+iLS V65XzkdtrEa+uJj5plvzehrHQhOPh5U2WtZvsAgeC6yu9F/LhnJywOIZjaYdCYwE /lXBgLPJiqXfDyEpKT6TqObwPpY7Ly+7+yNZ4LcO84AuBwb6lBq88Eyl7Ft+K57m dIJVh8ZTQMzVy5EcGbLoIYF4Mg8zdQbxju73bfeBNerxwd7QD7l0mfiQ3yIexRrQ FvzgIerDYdabKgYcbC3cQzMR4V0TEcWs0E7VqiokU4onor4VnK4A0PtbMEWcK8YN OZnQ8d4imHhJN4HdJeMhiKIIk+Cr52A1fC/AKmL0Ddw8yKusgjPz2Ux0pHpXMR1a NodymVV4XWcDBKWPX0DLESe8wJC4fN+v8bwMVWg20BC709BaK61yA7lGqJ70HmJ+ u5mWokjgiycTQmJBiJmkEM9b5YVHLjrQ0PwDiYEtkxhMmd/ti+o12fhyNU8Epkr7 fgDEI/OTURbhbrX0qiQ8RZiSSt5WXuy+ZWbw76rfo3SUH3Lt1WrcdbZrUd9em2hy dqOA1vV97JUtQMD/Dsg9apTObR9XJk2B/lyZs1YyCcb77KGSEsSw2x+xz3RRVHM4 uQDwVupB/QsRVAu4tubr =N6mH -----END PGP SIGNATURE----- Merge tag 'pci-v3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Enumeration - Move domain assignment from arm64 to generic code (Lorenzo Pieralisi) - ARM: Remove artificial dependency on pci_sys_data domain (Lorenzo Pieralisi) - ARM: Move to generic PCI domains (Lorenzo Pieralisi) - Generate uppercase hex for modalias var in uevent (Ricardo Ribalda Delgado) - Add and use generic config accessors on ARM, PowerPC (Rob Herring) Resource management - Free resources on failure in of_pci_get_host_bridge_resources() (Lorenzo Pieralisi) - Fix infinite loop with ROM image of size 0 (Michel Dänzer) PCI device hotplug - Handle surprise add even if surprise removal isn't supported (Bjorn Helgaas) Virtualization - Mark AMD/ATI VGA devices that don't reset on D3hot->D0 transition (Alex Williamson) - Add DMA alias quirk for Adaptec 3405 (Alex Williamson) - Add Wellsburg (X99) to Intel PCH root port ACS quirk (Alex Williamson) - Add ACS quirk for Emulex NICs (Vasundhara Volam) MSI - Fail MSI-X mappings if there's no space assigned to MSI-X BAR (Yijing Wang) Freescale Layerscape host bridge driver - Fix platform_no_drv_owner.cocci warnings (Julia Lawall) NVIDIA Tegra host bridge driver - Remove unnecessary tegra_pcie_fixup_bridge() (Lucas Stach) Renesas R-Car host bridge driver - Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov) TI Keystone host bridge driver - Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov) - Fix misspelling of current function in debug output (Julia Lawall) Xilinx AXI host bridge driver - Fix harmless format string warning (Arnd Bergmann) Miscellaneous - Use standard parsing functions for ASPM sysfs setters (Chris J Arges) - Add pci_device_to_OF_node() stub for !CONFIG_OF (Kevin Hao) - Delete unnecessary NULL pointer checks (Markus Elfring) - Add and use defines for PCIe Max_Read_Request_Size (Rafał Miłecki) - Include clk.h instead of clk-private.h (Stephen Boyd)" * tag 'pci-v3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (48 commits) PCI: Add pci_device_to_OF_node() stub for !CONFIG_OF PCI: xilinx: Convert to use generic config accessors PCI: xgene: Convert to use generic config accessors PCI: tegra: Convert to use generic config accessors PCI: rcar: Convert to use generic config accessors PCI: generic: Convert to use generic config accessors powerpc/powermac: Convert PCI to use generic config accessors powerpc/fsl_pci: Convert PCI to use generic config accessors ARM: ks8695: Convert PCI to use generic config accessors ARM: sa1100: Convert PCI to use generic config accessors ARM: integrator: Convert PCI to use generic config accessors PCI: versatile: Add DT-based ARM Versatile PB PCIe host driver ARM: dts: versatile: add PCI controller binding of/pci: Free resources on failure in of_pci_get_host_bridge_resources() PCI: versatile: Add DT docs for ARM Versatile PB PCIe driver PCI: Fail MSI-X mappings if there's no space assigned to MSI-X BAR r8169: use PCI define for Max_Read_Request_Size [SCSI] esas2r: use PCI define for Max_Read_Request_Size tile: use PCI define for Max_Read_Request_Size rapidio/tsi721: use PCI define for Max_Read_Request_Size ...	2015-02-10 14:31:28 -08:00
Kirill A. Shutemov	0a19136205	x86: drop _PAGE_FILE and pte_file()-related helpers We've replaced remap_file_pages(2) implementation with emulation. Nobody creates non-linear mapping anymore. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-10 14:30:33 -08:00
Kirill A. Shutemov	ece84b390a	hugetlb, x86: register 1G page size if we can allocate them at runtime After commit `944d9fec8d` ("hugetlb: add support for gigantic page allocation at runtime") we can allocate 1G pages at runtime if CMA is enabled. Let's register 1G pages into hugetlb even if the user hasn't requested them explicitly at boot time with hugepagesz=1G. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-10 14:30:28 -08:00
Linus Torvalds	bdccc4edeb	xen: features and fixes for 3.20-rc0 - Reworked handling for foreign (grant mapped) pages to simplify the code, enable a number of additional use cases and fix a number of long-standing bugs. - Prefer the TSC over the Xen PV clock when dom0 (and the TSC is stable). - Assorted other cleanup and minor bug fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJU2JC+AAoJEFxbo/MsZsTRIvAH/1lgQ0EQlxaZtEFWY8cJBzxY dXaTMfyGQOddGYDCW0r42hhXJHeX7DWXSERSD3aW9DZOn/eYdneHq9gWRD4uPrGn hEFQ26J4jZWR5riGXaja0LqI2gJKLZ6BhHIQciLEbY+jw4ynkNBLNRPFehuwrCsZ WdBwJkyvXC3RErekncRl/aNhxdi4p1P6qeiaW/mo3UcSO/CFSKybOLwT65iePazg XuY9UiTn2+qcRkm/tjx8K9heHK8SBEGNWuoTcWYF1to8mwwUfKIAc4NO2UBDXJI+ rp7Z2lVFdII15JsQ08ATh3t7xDrMWLzCX/y4jCzmF3DBXLbSWdHCQMgI7TWt5pE= =PyJK -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.20-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen features and fixes from David Vrabel: - Reworked handling for foreign (grant mapped) pages to simplify the code, enable a number of additional use cases and fix a number of long-standing bugs. - Prefer the TSC over the Xen PV clock when dom0 (and the TSC is stable). - Assorted other cleanup and minor bug fixes. * tag 'stable/for-linus-3.20-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (25 commits) xen/manage: Fix USB interaction issues when resuming xenbus: Add proper handling of XS_ERROR from Xenbus for transactions. xen/gntdev: provide find_special_page VMA operation xen/gntdev: mark userspace PTEs as special on x86 PV guests xen-blkback: safely unmap grants in case they are still in use xen/gntdev: safely unmap grants in case they are still in use xen/gntdev: convert priv->lock to a mutex xen/grant-table: add a mechanism to safely unmap pages that are in use xen-netback: use foreign page information from the pages themselves xen: mark grant mapped pages as foreign xen/grant-table: add helpers for allocating pages x86/xen: require ballooned pages for grant maps xen: remove scratch frames for ballooned pages and m2p override xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() mm: add 'foreign' alias for the 'pinned' page flag mm: provide a find_special_page vma operation x86/xen: cleanup arch/x86/xen/mmu.c x86/xen: add some __init annotations in arch/x86/xen/mmu.c x86/xen: add some __init and static annotations in arch/x86/xen/setup.c x86/xen: use correct types for addresses in arch/x86/xen/setup.c ...	2015-02-10 13:56:56 -08:00
Rafael J. Wysocki	b5e82233ca	Merge branch 'pm-tools' * pm-tools: tools/power turbostat: relax dependency on APERF_MSR tools/power turbostat: relax dependency on invariant TSC tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS tools/power turbostat: relax dependency on root permission cpupower Makefile change to help run the tool without 'make install'	2015-02-10 16:11:26 +01:00
Rafael J. Wysocki	7bc95d4ef1	Merge branch 'pm-cpufreq' * pm-cpufreq: (46 commits) intel_pstate: provide option to only use intel_pstate with HWP cpufreq-dt: Drop unnecessary check before cpufreq_cooling_unregister() invocation cpufreq: Create for_each_governor() cpufreq: Create for_each_policy() cpufreq: Drop cpufreq_disabled() check from cpufreq_cpu_{get\|put}() cpufreq: Set cpufreq_cpu_data to NULL before putting kobject intel_pstate: honor user space min_perf_pct override on resume intel_pstate: respect cpufreq policy request intel_pstate: Add num_pstates to sysfs intel_pstate: expose turbo range to sysfs intel_pstate: Add support for SkyLake cpufreq: stats: drop unnecessary locking cpufreq: stats: don't update stats on false notifiers cpufreq: stats: don't update stats from show_trans_table() cpufreq: stats: time_in_state can't be NULL in cpufreq_stats_update() cpufreq: stats: create sysfs group once we are ready cpufreq: remove CPUFREQ_UPDATE_POLICY_CPU notifications cpufreq: stats: drop 'cpu' field of struct cpufreq_stats cpufreq: Remove (now) unused 'last_cpu' from struct cpufreq_policy cpufreq: stats: rename 'struct cpufreq_stats' objects as 'stats' ...	2015-02-10 16:10:44 +01:00
Rafael J. Wysocki	8fbcf5ecb3	Merge branch 'acpi-resources' * acpi-resources: (23 commits) Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources x86/irq, ACPI: Implement ACPI driver to support IOAPIC hotplug ACPI: Add interfaces to parse IOAPIC ID for IOAPIC hotplug x86/PCI: Refine the way to release PCI IRQ resources x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation x86/PCI: Fix the range check for IO resources PCI: Use common resource list management code instead of private implementation resources: Move struct resource_list_entry from ACPI into resource core ACPI: Introduce helper function acpi_dev_filter_resource_type() ACPI: Add field offset to struct resource_list_entry ACPI: Translate resource into master side address for bridge window resources ACPI: Return translation offset when parsing ACPI address space resources ACPI: Enforce stricter checks for address space descriptors ACPI: Set flag IORESOURCE_UNSET for unassigned resources ACPI: Normalize return value of resource parser functions ACPI: Fix a bug in parsing ACPI Memory24 resource ACPI: Add prefetch decoding to the address space parser ACPI: Move the window flag logic to the combined parser ACPI: Unify the parsing of address_space and ext_address_space ACPI: Let the parser return false for disabled resources ...	2015-02-10 16:05:16 +01:00
Rafael J. Wysocki	ca45c879c2	Merge branches 'acpi-doc', 'acpi-pm', 'acpi-pcc' and 'acpi-tables' * acpi-doc: MAINTAINERS / ACPI: add the necessary '/' according to entry rules ACPI / Documentation: add a missing '=' * acpi-pm: ACPI / sleep: mark acpi_sleep_dmi_check() __init * acpi-pcc: ACPI / PCC: Use pr_debug() for debug messages in pcc_init() * acpi-tables: ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse()	2015-02-10 16:04:12 +01:00
Rafael J. Wysocki	99e4d89afc	Merge branches 'acpi-video' and 'acpi-soc' * acpi-video: ACPI / video: Add disable_native_backlight quirk for Samsung 510R ACPI / video: Add disable_native_backlight quirk for Samsung 730U3E/740U3E * acpi-soc: ACPI: add AMD ACPI2Platform device support for x86 system ACPI / LPSS: Remove non-existing clock control from Intel Lynxpoint I2C ACPI / LPSS: check the result of ioremap()	2015-02-10 16:04:01 +01:00
Rafael J. Wysocki	55c39fc2b1	Merge branch 'acpica' * acpica: ACPICA: Events: Enable APIs to allow interrupt/polling adaptive request based GPE handling model ACPICA: Events: Introduce acpi_set_gpe()/acpi_finish_gpe() to reduce divergences ACPICA: Events: Introduce ACPI_GPE_DISPATCH_RAW_HANDLER to fix 2 issues for the current GPE APIs ACPICA: Update version to 20150204 ACPICA: Update Copyright headers to 2015 ACPICA: Hardware: Cast GPE enable_mask before storing ACPICA: Events: Cleanup GPE dispatcher type obtaining code ACPICA: Events: Cleanup to move acpi_gbl_global_event_handler invocation out of acpi_ev_gpe_dispatch() ACPICA: Events: Cleanup of resetting the GPE handler to NULL before removing ACPICA: Events: Fix uninitialized variable ACPICA: Events: Remove acpi_ev_valid_gpe_event() due to current restriction ACPICA: Events: Remove duplicated sanity check in acpi_ev_enable_gpe() ACPICA: Events: Back port "ACPICA: Save current masks of enabled GPEs after enable register writes" ACPICA: Resources: Provide common part for struct acpi_resource_address structures. ACPI: Introduce acpi_unload_parent_table() usages in Linux kernel ACPICA: take ACPI_MTX_INTERPRETER in acpi_unload_table_id()	2015-02-10 15:58:57 +01:00
Radim Krčmář	dab2087def	KVM: x86: fix build with !CONFIG_SMP <asm/apic.h> isn't included directly and without CONFIG_SMP, an option that automagically pulls it can't be enabled. Reported-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-10 08:53:18 +01:00
Linus Torvalds	e07e0d4cb0	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS update from Ingo Molnar: "The changes in this cycle were: - allow mmcfg access to APEI error injection handlers - improve MCE error messages - smaller cleanups" * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, mce: Fix sparse errors x86, mce: Improve timeout error messages ACPI, EINJ: Enhance error injection tolerance level	2015-02-09 18:22:04 -08:00
Linus Torvalds	57d3629410	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm cleanups from Ingo Molnar: "Two cleanups: simplify parse_setup_data() and sanitize_e820_map() usage" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, e820: Clean up sanitize_e820_map() users x86, setup: Let early_memremap() handle page alignment	2015-02-09 18:16:03 -08:00
Linus Torvalds	a8f7684214	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SoC updates from Ingo Molnar: "Various Intel Atom SoC updates (mostly to enhance debuggability), plus an apb_timer cleanup" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: pmc_atom: Expose contents of PSS x86: pmc_atom: Clean up init function x86: pmc-atom: Remove unused macro x86: pmc_atom: don%27t check for NULL twice x86: pmc-atom: Assign debugfs node as soon as possible x86/platform: Remove unused function from apb_timer.c	2015-02-09 18:11:28 -08:00
Linus Torvalds	c93ecedab3	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Ingo Molnar: "Initial round of kernel_fpu_begin/end cleanups from Oleg Nesterov, plus a cleanup from Borislav Petkov" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, fpu: Fix math_state_restore() race with kernel_fpu_begin() x86, fpu: Don't abuse has_fpu in __kernel_fpu_begin/end() x86, fpu: Introduce per-cpu in_kernel_fpu state x86/fpu: Use a symbolic name for asm operand	2015-02-09 18:01:52 -08:00
Linus Torvalds	80f33a5fdf	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/rtc: Remove duplicate const specifier x86, early_serial_console: Remove unnecessary check x86, early_serial_console: Remove unused macro XMTRDY x86, setup: Rename BOOT_ISDIGIT_H to BOOT_CTYPE_H x86, CPU: Fix trivial printk formatting issues with dmesg	2015-02-09 17:50:09 -08:00
Linus Torvalds	7453311d68	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm changes from Ingo Molnar: "The main changes in this cycle were the x86/entry and sysret enhancements from Andy Lutomirski, see merge commits `772a9aca12` and `b57c0b5175` for details" [ Exectutive summary: IST exceptions that interrupt user space will run on the regular kernel stack instead of the IST stack. Which simplifies things particularly on return to user space. The sysret cleanup ends up simplifying the logic on when we can use sysret vs when we have to use iret. - Linus ] * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_64, entry: Remove the syscall exit audit and schedule optimizations x86_64, entry: Use sysret to return to userspace when possible x86, traps: Fix ist_enter from userspace x86, vdso: teach 'make clean' remove vdso64 binaries x86_64 entry: Fix RCX for ptraced syscalls x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET x86: entry_64.S: delete unused code x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic x86: Clean up current_stack_pointer x86, traps: Track entry into and exit from IST context x86, entry: Switch stacks on a paranoid entry from userspace	2015-02-09 17:16:44 -08:00
Linus Torvalds	9d43bade34	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 APIC updates from Ingo Molnar: "Continued fallout of the conversion of the x86 IRQ code to the hierarchical irqdomain framework: more cleanups, simplifications, memory allocation behavior enhancements, mainly in the interrupt remapping and APIC code" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) x86, init: Fix UP boot regression on x86_64 iommu/amd: Fix irq remapping detection logic x86/acpi: Make acpi_[un]register_gsi_ioapic() depend on CONFIG_X86_LOCAL_APIC x86: Consolidate boot cpu timer setup x86/apic: Reuse apic_bsp_setup() for UP APIC setup x86/smpboot: Sanitize uniprocessor init x86/smpboot: Move apic init code to apic.c init: Get rid of x86isms x86/apic: Move apic_init_uniprocessor code x86/smpboot: Cleanup ioapic handling x86/apic: Sanitize ioapic handling x86/ioapic: Add proper checks to setp/enable_IO_APIC() x86/ioapic: Provide stub functions for IOAPIC%3Dn x86/smpboot: Move smpboot inlines to code x86/x2apic: Use state information for disable x86/x2apic: Split enable and setup function x86/x2apic: Disable x2apic from nox2apic setup x86/x2apic: Add proper state tracking x86/x2apic: Clarify remapping mode for x2apic enablement x86/x2apic: Move code in conditional region ...	2015-02-09 16:57:56 -08:00
Linus Torvalds	a4cbbf549a	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - AMD range breakpoints support: Extend breakpoint tools and core to support address range through perf event with initial backend support for AMD extended breakpoints. The syntax is: perf record -e mem:addr/len:type For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512) perf record -e mem:0x1000/512:w - event throttling/rotating fixes - various event group handling fixes, cleanups and general paranoia code to be more robust against bugs in the future. - kernel stack overhead fixes User-visible tooling side changes: - Show precise number of samples in at the end of a 'record' session, if processing build ids, since we will then traverse the whole perf.data file and see all the PERF_RECORD_SAMPLE records, otherwise stop showing the previous off-base heuristicly counted number of "samples" (Namhyung Kim). - Support to read compressed module from build-id cache (Namhyung Kim) - Enable sampling loads and stores simultaneously in 'perf mem' (Stephane Eranian) - 'perf diff' output improvements (Namhyung Kim) - Fix error reporting for evsel pgfault constructor (Arnaldo Carvalho de Melo) Tooling side infrastructure changes: - Cache eh/debug frame offset for dwarf unwind (Namhyung Kim) - Support parsing parameterized events (Cody P Schafer) - Add support for IP address formats in libtraceevent (David Ahern) Plus other misc fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) perf: Decouple unthrottling and rotating perf: Drop module reference on event init failure perf: Use POLLIN instead of POLL_IN for perf poll data in flag perf: Fix put_event() ctx lock perf: Fix move_group() order perf: Fix event->ctx locking perf: Add a bit of paranoia perf symbols: Convert lseek + read to pread perf tools: Use perf_data_file__fd() consistently perf symbols: Support to read compressed module from build-id cache perf evsel: Set attr.task bit for a tracking event perf header: Set header version correctly perf record: Show precise number of samples perf tools: Do not use __perf_session__process_events() directly perf callchain: Cache eh/debug frame offset for dwarf unwind perf tools: Provide stub for missing pthread_attr_setaffinity_np perf evsel: Don't rely on malloc working for sz 0 tools lib traceevent: Add support for IP address formats perf ui/tui: Show fatal error message only if exists perf tests: Fix typo in sample-parsing.c ...	2015-02-09 15:43:55 -08:00
Rafael J. Wysocki	c488ea4613	Merge branch 'sfi' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-cpufreq Pull SFI-based cpufreq driver for v3.20 from Len Brown. * 'sfi' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: cpufreq: Add SFI based cpufreq driver support SFI: fix compiler warnings	2015-02-09 23:43:53 +01:00
Linus Torvalds	23e8fe2e16	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main RCU changes in this cycle are: - Documentation updates. - Miscellaneous fixes. - Preemptible-RCU fixes, including fixing an old bug in the interaction of RCU priority boosting and CPU hotplug. - SRCU updates. - RCU CPU stall-warning updates. - RCU torture-test updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) rcu: Initialize tiny RCU stall-warning timeouts at boot rcu: Fix RCU CPU stall detection in tiny implementation rcu: Add GP-kthread-starvation checks to CPU stall warnings rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors rcu: Optionally run grace-period kthreads at real-time priority ksoftirqd: Use new cond_resched_rcu_qs() function ksoftirqd: Enable IRQs and call cond_resched() before poking RCU rcutorture: Add more diagnostics in rcu_barrier() test failure case torture: Flag console.log file to prevent holdovers from earlier runs torture: Add "-enable-kvm -soundhw pcspk" to qemu command line rcutorture: Handle different mpstat versions rcutorture: Check from beginning to end of grace period rcu: Remove redundant rcu_batches_completed() declaration rcutorture: Drop rcu_torture_completed() and friends rcu: Provide rcu_batches_completed_sched() for TINY_RCU rcutorture: Use unsigned for Reader Batch computations rcutorture: Make build-output parsing correctly flag RCU's warnings rcu: Make _batches_completed() functions return unsigned long rcutorture: Issue warnings on close calls due to Reader Batch blows documentation: Fix smp typo in memory-barriers.txt ...	2015-02-09 14:28:42 -08:00
Len Brown	3a9a941d0b	tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS The Processor generation code-named Haswell added MSR_{CORE \| GFX \| RING}_PERF_LIMIT_REASONS to explain when and how the processor limits frequency. turbostat -v will now decode these bits. Each MSR has an "Active" set of bits which describe current conditions, and a "Logged" set of bits, which describe what has happened since last cleared. Turbostat currently doesn't clear the log bits. Signed-off-by: Len Brown <len.brown@intel.com>	2015-02-09 16:44:24 -05:00
Linus Torvalds	b0c1936c44	spi: Updates for v3.20 The major highlight this release is a refactoring of the core to allow us to run synchronous transfers in the context of the caller when there is no contention for the bus. This improves performance in the very common case by eliminating context switches and reducing the number of hardware setup and teardown operations we need to perform. Other changes: - New drivers for DLN-2 USB-SPI adapter and ST SPI controllers. - A big round of cleanups, performance and feature improvements for the xilinx driver from Ricardo Ribalda Delgado. - A wide range of smaller cleanups, fixes and feature improvements throughout the subsystem. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJU2GNgAAoJECTWi3JdVIfQLiYH/0uLN43CunPp0gSWllQ2PY1O R1QiqXg1fr1uZKRuGy59QF0TkU/JlWPY+tpGiOH1jrnDsoecnWsxDx3YEeuYdV6U c//UrlK2uvESivbc48zVUTwCsgxsE8apG0JgqLjsfUpqZTEFxFpeSskepSJ2kIUz bsXHU8Xi0WkLalsk/8Ik8aUvOwVi5EtRE9OMvnU6QPqQMCszgv1TH4UbwbhqwwzZ U23WbNHQ262XDRwY2LKl/QROULeU5pd9F19wrveKMa42fkbu/e+kk6E3n7/Hd4mV CUjv1wTCpPZvzh3bTk50uXwA9XQOzv6ddw6jqsgLcV6jS8Ju3Z3Beya3fmdhOl0= =3ZQr -----END PGP SIGNATURE----- Merge tag 'spi-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "The major highlight this release is a refactoring of the core to allow us to run synchronous transfers in the context of the caller when there is no contention for the bus. This improves performance in the very common case by eliminating context switches and reducing the number of hardware setup and teardown operations we need to perform. Other changes: - New drivers for DLN-2 USB-SPI adapter and ST SPI controllers. - A big round of cleanups, performance and feature improvements for the xilinx driver from Ricardo Ribalda Delgado. - A wide range of smaller cleanups, fixes and feature improvements throughout the subsystem" * tag 'spi-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (68 commits) spi: mxs: cleanup wait_for_completion return handling spi: ti-qspi: cleanup wait_for_completion return handling spi: spi-imx: cleanup wait_for_completion handling spi: sh-msiof: cleanup wait_for_completion return handling spi: match var type to return type of wait_for_completion spi: spi-pxa2xx: only include mach/dma.h for legacy DMA spi: atmel: cleanup wait_for_completion return handling spi: fsl-dspi: Remove possible memory leak of 'chip' spi: sh-msiof: Update calculation of frequency dividing spi: spidev: Convert buf pointers for 32-bit compat SPI_IOC_MESSAGE(n) spi/xilinx: Fix access invalid memory on xilinx_spi_tx spi: Revert "spi/xilinx: Remove iowrite/ioread wrappers" spi/xilinx: Check number of slaves range spi/xilinx: Use polling mode on small transfers spi/xilinx: Remove remaining_words driver data variable spi/xilinx: Remove iowrite/ioread wrappers spi/xilinx: Convert bits_per_word in bytes_per_word spi/xilinx: Convert remainding_bytes in remaining words spi/xilinx: Make spi_tx and spi_rx simmetric spi/xilinx: Remove rx_fn and tx_fn pointer ...	2015-02-09 13:36:20 -08:00
Tony Luck	a2413d8b29	x86/mce: Fix regression. All error records should report via /dev/mcelog I'm getting complaints from validation teams that have updated their Linux kernels from ancient versions to current. They don't see the error logs they expect. I tell the to unload any EDAC drivers[1], and things start working again. The problem is that we short-circuit the logging process if any function on the decoder chain claims to have dealt with the problem: ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m); if (ret == NOTIFY_STOP) return; The logic we used when we added this code was that we did not want to confuse users with double reports of the same error. But it turns out users are not confused - they are upset that they don't see a log where their tools used to find a log. I could also get into a long description of how the consumer of this log does more than just decode model specific details of the error. It keeps counts, tracks thresholds, takes actions and runs scripts that can alert administrators to problems. [1] We've recently compounded the problem because the acpi_extlog driver also registers for this notifier and also returns NOTIFY_STOP. Signed-off-by: Tony Luck <tony.luck@intel.com>	2015-02-09 09:36:53 -08:00
Paolo Bonzini	d44e121223	KVM: x86: emulate: correct page fault error code for NoWrite instructions NoWrite instructions (e.g. cmp or test) never set the "write access" bit in the error code, even if one of the operands is treated as a destination. Fixes: `c205fb7d7d` Cc: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-09 13:36:01 +01:00
Mark Brown	fab4b42a9a	Merge remote-tracking branches 'spi/topic/atmel', 'spi/topic/config', 'spi/topic/dln2' and 'spi/topic/dw' into spi-next	2015-02-08 11:16:43 +08:00
Linus Torvalds	26cdd1f76a	Merge branches 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and x86 fix from Ingo Molnar: "A CLOCK_TAI early expiry fix and an x86 microcode driver oops fix" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Fix incorrect tai offset calculation for non high-res timer systems * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Return error from driver init code when loader is disabled	2015-02-06 13:56:02 -08:00
Ken Xue	92082a8886	ACPI: add AMD ACPI2Platform device support for x86 system This new feature is to interpret AMD specific ACPI device to platform device such as I2C, UART, GPIO found on AMD CZ and later chipsets. It based on example intel LPSS. Now, it can support AMD I2C, UART and GPIO. Signed-off-by: Ken Xue <Ken.Xue@amd.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-02-06 15:42:16 +01:00
Paolo Bonzini	f781951299	kvm: add halt_poll_ns module parameter This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-06 13:08:37 +01:00
Hanjun Guo	2fad93083e	ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse() In acpi_table_parse(), pointer of the table to pass to handler() is checked before handler() called, so remove all the duplicate NULL check in the handler function. CC: Tony Luck <tony.luck@intel.com> CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-02-06 01:34:47 +01:00
Linus Torvalds	f3c2352df1	PCI updates for v3.19: Enumeration - Scan all device numbers on NEC as well as Stratus (Charlotte Richardson) Resource management - Handle read-only BARs on AMD CS553x devices (Myron Stowe) Synopsys DesignWare - Reject MSI-X IRQs (Lucas Stach) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU0m7MAAoJEFmIoMA60/r8zQoP/iZpoDfkIrvJWhLjRO1agVCd 7Y5Hrzr+AmESq4n3D7IQt7c+wzaR+05gOG/orDvaFfpbUMamTHlFxUjsk7ib3lLv Qg+WUX/I444pt/jDxRlqZbqWC8AkvYrTWPZoR+fagdWMuIGnKYNvjIAdmzRgMwB0 C3AFGa1S61tqD0+b2Dc177vPZowOUgPBq+ta0Ld8RJ7zBfNnL/GdgCqWxo0OyPNv uSeQuCmKEL5J1Sn1D+j0EKL5FdSDN+LJeU4k5hdF3I+GaervFkOYVqstAqv1kqML 2I9LBjMndRueseArqq7oCAeMn+lCEqleiYr+Y0DtlLOo68eZzh2YWNRV/PcRGMnj xOJo59rDSw5amg4baL7DVdPFIW3NGZiVThNiM3ye5CuhfAl3U/PY1xiDxQ85KY2F htc5rhnm1b9dE5LrJN0iChX4fT60sYyc4IMHuu3wuS5fRNDgZWscs0TT6GSIXSCH ua1xiaq0hLc8XoWdV0A/7R9kX37KJ0kyjCmSFJV/uVWKgKpS3eGi6Ex67qCU1w0C Vw2JT/H8V7vbNCpF/+8fYhXYzsTx6Oa0qLfcBEgT7xxjPTMTsH0LUW0wAivL60rV J9PPcIBrDui6fBVLDkOlgMBjokkShkSay0MUuaUaZib3BdcDY14RVfm8K8MhYS/I ViaWaM1bQ4DGzjnz5dQg =hsLb -----END PGP SIGNATURE----- Merge tag 'pci-v3.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "Enumeration - Scan all device numbers on NEC as well as Stratus (Charlotte Richardson) Resource management - Handle read-only BARs on AMD CS553x devices (Myron Stowe) Synopsys DesignWare - Reject MSI-X IRQs (Lucas Stach)" * tag 'pci-v3.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Handle read-only BARs on AMD CS553x devices PCI: Add NEC variants to Stratus ftServer PCIe DMI check PCI: designware: Reject MSI-X IRQs	2015-02-05 10:23:12 -08:00
Jiang Liu	b4b55cda58	x86/PCI: Refine the way to release PCI IRQ resources Some PCI device drivers assume that pci_dev->irq won't change after calling pci_disable_device() and pci_enable_device() during suspend and resume. Commit `c03b3b0738` ("x86, irq, mpparse: Release IOAPIC pin when PCI device is disabled") frees PCI IRQ resources when pci_disable_device() is called and reallocate IRQ resources when pci_enable_device() is called again. This breaks above assumption. So commit `3eec595235` ("x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation") and `9eabc99a63` ("x86, irq, PCI: Keep IRQ assignment for runtime power management") fix the issue by avoiding freeing/reallocating IRQ resources during PCI device suspend/resume. They achieve this by checking dev.power.is_prepared and dev.power.runtime_status. PM maintainer, Rafael, then pointed out that it's really an ugly fix which leaking PM internal state information to IRQ subsystem. Recently David Vrabel <david.vrabel@citrix.com> also reports an regression in pciback driver caused by commit `cffe0a2b5a` ("x86, irq: Keep balance of IOAPIC pin reference count"). Please refer to: http://lkml.org/lkml/2015/1/14/546 So this patch refine the way to release PCI IRQ resources. Instead of releasing PCI IRQ resources in pci_disable_device()/ pcibios_disable_device(), we now release it at driver unbinding notification BUS_NOTIFY_UNBOUND_DRIVER. In other word, we only release PCI IRQ resources when there's no driver bound to the PCI device, and it keeps the assumption that pci_dev->irq won't through multiple invocation of pci_enable_device()/pci_disable_device(). Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-02-05 15:09:26 +01:00
Jiang Liu	593669c2ac	x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation Use common ACPI resource discovery interfaces to simplify PCI host bridge resource enumeration. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-02-05 15:09:26 +01:00
Jiang Liu	812dbd9994	x86/PCI: Fix the range check for IO resources The range check in setup_res() checks the IO range against iomem_resource. That's just wrong. Reworked based on Thomas original patch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-02-05 15:09:25 +01:00
Jiang Liu	14d76b68f2	PCI: Use common resource list management code instead of private implementation Use common resource list management data structure and interfaces instead of private implementation. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-02-05 15:09:25 +01:00
Tiejun Chen	1c2b364b22	kvm: remove KVM_MMIO_SIZE After `f78146b0f9`, "KVM: Fix page-crossing MMIO", and `87da7e66a4`, "KVM: x86: fix vcpu->mmio_fragments overflow", actually KVM_MMIO_SIZE is gone. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-05 12:26:14 +01:00
Andy Lutomirski	a66734297f	perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks While perfmon2 is a sufficiently evil library (it pokes MSRs directly) that breaking it is fair game, it's still useful, so we might as well try to support it. This allows users to write 2 to /sys/devices/cpu/rdpmc to disable all rdpmc protection so that hack like perfmon2 can continue to work. At some point, if perf_event becomes fast enough to replace perfmon2, then this can go. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/caac3c1c707dcca48ecbc35f4def21495856f479.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:49 +01:00
Andy Lutomirski	7911d3f7af	perf/x86: Only allow rdpmc if a perf_event is mapped We currently allow any process to use rdpmc. This significantly weakens the protection offered by PR_TSC_DISABLED, and it could be helpful to users attempting to exploit timing attacks. Since we can't enable access to individual counters, use a very coarse heuristic to limit access to rdpmc: allow access only when a perf_event is mmapped. This protects seccomp sandboxes. There is plenty of room to further tighen these restrictions. For example, this allows rdpmc for any x86_pmu event, but it's only useful for self-monitoring tasks. As a side effect, cap_user_rdpmc will now be false for AMD uncore events. This isn't a real regression, since .event_idx is disabled for these events anyway for the time being. Whenever that gets re-added, the cap_user_rdpmc code can be adjusted or refactored accordingly. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/a2bdb3cf3a1d70c26980d7c6dddfbaa69f3182bf.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:47 +01:00
Andy Lutomirski	c1317ec2b9	perf: Pass the event to arch_perf_update_userpage() Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/0fea9a7fac3c1eea86cb0a5954184e74f4213666.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:46 +01:00
Andy Lutomirski	22c4bd9fa9	x86: Add a comment clarifying LDT context switching The code is correct, but only for a rather subtle reason. This confused me for quite a while when I read switch_mm, so clarify the code to avoid confusing other people, too. TBH, I wouldn't be surprised if this code was only correct by accident. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/0db86397f968996fb772c443c251415b0b430ddd.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:43 +01:00
Andy Lutomirski	1e02ce4ccc	x86: Store a per-cpu shadow copy of CR4 Context switches and TLB flushes can change individual bits of CR4. CR4 reads take several cycles, so store a shadow copy of CR4 in a per-cpu variable. To avoid wasting a cache line, I added the CR4 shadow to cpu_tlbstate, which is already touched in switch_mm. The heaviest users of the cr4 shadow will be switch_mm and __switch_to_xtra, and __switch_to_xtra is called shortly after switch_mm during context switch, so the cacheline is likely to be hot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:42 +01:00
Andy Lutomirski	375074cc73	x86: Clean up cr4 manipulation CR4 manipulation was split, seemingly at random, between direct (write_cr4) and using a helper (set/clear_in_cr4). Unfortunately, the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code, which only a small subset of users actually wanted. This patch replaces all cr4 access in functions that don't leave cr4 exactly the way they found it with new helpers cr4_set_bits, cr4_clear_bits, and cr4_set_bits_and_update_boot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:41 +01:00
Josh Poimboeuf	12cf89b550	livepatch: rename config to CONFIG_LIVEPATCH Rename CONFIG_LIVE_PATCHING to CONFIG_LIVEPATCH to make the naming of the config and the code more consistent. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-02-04 11:25:51 +01:00
Ingo Molnar	0967160ad6	Merge branch 'x86/asm' into perf/x86, to avoid conflicts with upcoming patches Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 09:01:12 +01:00
Ingo Molnar	8f4bf4bcc4	Linux 3.19-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUzvgKAAoJEHm+PkMAQRiG8XQH/1qVbHI4pP0KcnzfZUHq/mXq RuS4aJMwLm/Y6cXFraXBDaPde1A3CPtwtpob2C6giKcfu2zXGunY65haOEeJWNpX lCbBsLkNC3oDNkygBpVr5Zd6yibaw63WBjjLnpAi7pn2G2Zm2zB8DfILWWWMb7yz MH8ZXV+/xIYCTkjNWGWA1iMjmdYqu0PQHPeOgLsYQ+u7rxfM1zb/wHEkjqUZS6iu IaaZv7PV2PnFYnqib/iIPYjAEDvSQ4vN/7b82zlFd2Culm9j/568KCCWUPhJTb2l X0u4QYs49GnMTWVRa3bgYxS/nTUaE/6DeWs2y2WzqTt0/XDntVUnok0blUeDxGk= =o2kS -----END PGP SIGNATURE----- Merge tag 'v3.19-rc7' into perf/core, to merge fixes before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 07:58:29 +01:00
Jan Beulich	75aaf4c3e6	x86/raid6: correctly check for assembler capabilities Just like for AVX2 (which simply needs an #if -> #ifdef conversion), SSSE3 assembler support should be checked for before using it. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-04 08:35:51 +11:00
Rafael J. Wysocki	b2cd5dd71a	Merge branch 'acpica' into acpi-resources	2015-02-03 22:27:01 +01:00
Wincy Van	705699a139	KVM: nVMX: Enable nested posted interrupt processing If vcpu has a interrupt in vmx non-root mode, injecting that interrupt requires a vmexit. With posted interrupt processing, the vmexit is not needed, and interrupts are fully taken care of by hardware. In nested vmx, this feature avoids much more vmexits than non-nested vmx. When L1 asks L0 to deliver L1's posted interrupt vector, and the target VCPU is in non-root mode, we use a physical ipi to deliver POSTED_INTR_NV to the target vCPU. Using POSTED_INTR_NV avoids unexpected interrupts if a concurrent vmexit happens and L1's vector is different with L0's. The IPI triggers posted interrupt processing in the target physical CPU. In case the target vCPU was not in guest mode, complete the posted interrupt delivery on the next entry to L2. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-03 17:15:08 +01:00
Wincy Van	608406e290	KVM: nVMX: Enable nested virtual interrupt delivery With virtual interrupt delivery, the hardware lets KVM use a more efficient mechanism for interrupt injection. This is an important feature for nested VMX, because it reduces vmexits substantially and they are much more expensive with nested virtualization. This is especially important for throughput-bound scenarios. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-03 17:07:38 +01:00
Wincy Van	82f0dd4b27	KVM: nVMX: Enable nested apic register virtualization We can reduce apic register virtualization cost with this feature, it is also a requirement for virtual interrupt delivery and posted interrupt processing. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-03 17:07:03 +01:00
Wincy Van	b9c237bb1d	KVM: nVMX: Make nested control MSRs per-cpu To enable nested apicv support, we need per-cpu vmx control MSRs: 1. If in-kernel irqchip is enabled, we can enable nested posted interrupt, we should set posted intr bit in the nested_vmx_pinbased_ctls_high. 2. If in-kernel irqchip is disabled, we can not enable nested posted interrupt, the posted intr bit in the nested_vmx_pinbased_ctls_high will be cleared. Since there would be different settings about in-kernel irqchip between VMs, different nested control MSRs are needed. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-03 17:06:51 +01:00
Wincy Van	f2b93280ed	KVM: nVMX: Enable nested virtualize x2apic mode When L2 is using x2apic, we can use virtualize x2apic mode to gain higher performance, especially in apicv case. This patch also introduces nested_vmx_check_apicv_controls for the nested apicv patches. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-03 17:06:17 +01:00
Wincy Van	3af18d9c5f	KVM: nVMX: Prepare for using hardware MSR bitmap Currently, if L1 enables MSR_BITMAP, we will emulate this feature, all of L2's msr access is intercepted by L0. Features like "virtualize x2apic mode" require that the MSR bitmap is enabled, or the hardware will exit and for example not virtualize the x2apic MSRs. In order to let L1 use these features, we need to build a merged bitmap that only not cause a VMEXIT if 1) L1 requires that 2) the bit is not required by the processor for APIC virtualization. For now the guests are still run with MSR bitmap disabled, but this patch already introduces nested_vmx_merge_msr_bitmap for future use. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-03 17:02:32 +01:00
Ingo Molnar	b57c0b5175	x86: Entry cleanups and a bugfix for 3.20 This fixes a bug in the RCU code I added in ist_enter. It also includes the sysret stuff discussed here: http://lkml.kernel.org/g/cover.1421453410.git.luto%40amacapital.net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUzhZ0AAoJEK9N98ZeDfrksUEH/j7wkUlMGan5h1AQIZQW6gKk OjlE1a4rfcgKocgkc0ix6UMc8Ks/NAUWKpeHR08eqR+Xi6Yk29cqLkboTEmAdYJ3 jQvKjGu51kiprNjAGqF5wdqxvCT3oBSdm7CWdtY4zHkEr+2W93Ht9PM7xZhj4r+P ekUC8mIKQrhyhlC7g7VpXLAi3Bk4mO+f499T7XBVsVoywWpgVpOMYMhtUobV1reW V7/zul/dMerzNLB0t3amvdgCLphHBQTQ0fHBAN62RY78UvSDt36EZFyS65isirsR LhO4FpWzF5YNMRk8Dep/fB8jYlhsCi40ZIlOtGSE6kNJyLhPt+oLnkpgOwWAMQc= =uiRw -----END PGP SIGNATURE----- Merge tag 'pr-20150201-x86-entry' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm Pull "x86: Entry cleanups and a bugfix for 3.20" from Andy Lutomirski: " This fixes a bug in the RCU code I added in ist_enter. It also includes the sysret stuff discussed here: http://lkml.kernel.org/g/cover.1421453410.git.luto%40amacapital.net " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-03 12:24:08 +01:00
Ingo Molnar	ad6e46869a	x86, vdso: One trivial last-minute VDSO build improvement Andrey noticed that the VDSO build wasn't cleaning itself up. This one-liner fixes it. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUzhiRAAoJEK9N98ZeDfrkpoIH/iS3dJ8IkWMhAcgYro87Hy+0 8YVJjHj+5DgSb4+SSB7B6DXCrmzvVFEXGxz9RraPxshDKLHWNh2qiuy0XqsGwdf8 Z7qkrbfcF1Dkp5RQG9vdkkHwwsh29Ky6R+1ewJVnPz/41Iy7G3uSI6QXLseytI2J myWSI/4yVEbL0CmKU+XXtV/oy1D6816vJSS9HG/Zm6WsO3FzR3NjPct1Wm6Br1/4 hDEGWfKEUg5l0fztCfbCQJO450mNfp/bXEdsy5wPuJTahj3p4MRz/ZoDpLcvFhyL Zldofc2XmHlBKwzbPs/5fO0l3pxko6y0kFWcCWa7wMy7TLhwoXsuRqiSowCy4AQ= =cf7M -----END PGP SIGNATURE----- Merge tag 'pr-20150201-x86-vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm Pull VDSO fix fro Andy Lutomirski: "x86, vdso: One trivial last-minute VDSO build improvement Andrey noticed that the VDSO build wasn't cleaning itself up. This one-liner fixes it." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-03 12:22:47 +01:00
Ingo Molnar	8dbcb8737c	Linux 3.19-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUzvgKAAoJEHm+PkMAQRiG8XQH/1qVbHI4pP0KcnzfZUHq/mXq RuS4aJMwLm/Y6cXFraXBDaPde1A3CPtwtpob2C6giKcfu2zXGunY65haOEeJWNpX lCbBsLkNC3oDNkygBpVr5Zd6yibaw63WBjjLnpAi7pn2G2Zm2zB8DfILWWWMb7yz MH8ZXV+/xIYCTkjNWGWA1iMjmdYqu0PQHPeOgLsYQ+u7rxfM1zb/wHEkjqUZS6iu IaaZv7PV2PnFYnqib/iIPYjAEDvSQ4vN/7b82zlFd2Culm9j/568KCCWUPhJTb2l X0u4QYs49GnMTWVRa3bgYxS/nTUaE/6DeWs2y2WzqTt0/XDntVUnok0blUeDxGk= =o2kS -----END PGP SIGNATURE----- Merge tag 'v3.19-rc7' into x86/asm, to refresh the branch before pulling in new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-03 12:22:18 +01:00
Stuart R. Anderson	ea9e9d8029	Specify PCI based UART for earlyprintk Add support for specifying PCI based UARTs for earlyprintk using a syntax like "earlyprintk=pciserial,00:18.1,115200", where 00:18.1 is the BDF of a UART device. [Slightly tidied from Stuart's original patch] Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-02-02 10:11:27 -08:00
Andy Shevchenko	874e52086f	x86, mrst: remove Moorestown specific serial drivers Intel Moorestown platform support was removed few years ago. This is a follow up which removes Moorestown specific code for the serial devices. It includes mrst_max3110 and earlyprintk bits. This was used on SFI (Medfield, Clovertrail) based platforms as well, though new ones use normal serial interface for the console service. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-02-02 10:11:24 -08:00
Marcelo Tosatti	2e6d015799	KVM: x86: revert "add method to test PIR bitmap vector" Revert `7c6a98dfa1`, given that testing PIR is not necessary anymore. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-02 18:36:34 +01:00
Marcelo Tosatti	f933986038	KVM: x86: fix lapic_timer_int_injected with APIC-v With APICv, LAPIC timer interrupt is always delivered via IRR: apic_find_highest_irr syncs PIR to IRR. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-02 18:36:25 +01:00
Charlotte Richardson	51ac3d2f0c	PCI: Add NEC variants to Stratus ftServer PCIe DMI check NEC OEMs the same platforms as Stratus does, which have multiple devices on some PCIe buses under downstream ports. Link: https://bugzilla.kernel.org/show_bug.cgi?id=51331 Fixes: `1278998f8f` ("PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)") Signed-off-by: Charlotte Richardson <charlotte.richardson@stratus.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v3.5+ CC: Myron Stowe <myron.stowe@redhat.com>	2015-02-02 09:36:23 -06:00
Andy Lutomirski	96b6352c12	x86_64, entry: Remove the syscall exit audit and schedule optimizations We used to optimize rescheduling and audit on syscall exit. Now that the full slow path is reasonably fast, remove these optimizations. Syscall exit auditing is now handled exclusively by syscall_trace_leave. This adds something like 10ns to the previously optimized paths on my computer, presumably due mostly to SAVE_REST / RESTORE_REST. I think that we should eventually replace both the syscall and non-paranoid interrupt exit slow paths with a pair of C functions along the lines of the syscall entry hooks. Link: http://lkml.kernel.org/r/22f2aa4a0361707a5cfb1de9d45260b39965dead.1421453410.git.luto@amacapital.net Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-02-01 04:03:02 -08:00
Andy Lutomirski	2a23c6b8a9	x86_64, entry: Use sysret to return to userspace when possible The x86_64 entry code currently jumps through complex and inconsistent hoops to try to minimize the impact of syscall exit work. For a true fast-path syscall, almost nothing needs to be done, so returning is just a check for exit work and sysret. For a full slow-path return from a syscall, the C exit hook is invoked if needed and we join the iret path. Using iret to return to userspace is very slow, so the entry code has accumulated various special cases to try to do certain forms of exit work without invoking iret. This is error-prone, since it duplicates assembly code paths, and it's dangerous, since sysret can malfunction in interesting ways if used carelessly. It's also inefficient, since a lot of useful cases aren't optimized and therefore force an iret out of a combination of paranoia and the fact that no one has bothered to write even more asm code to avoid it. I would argue that this approach is backwards. Rather than trying to avoid the iret path, we should instead try to make the iret path fast. Under a specific set of conditions, iret is unnecessary. In particular, if RIP==RCX, RFLAGS==R11, RIP is canonical, RF is not set, and both SS and CS are as expected, then movq 32(%rsp),%rsp;sysret does the same thing as iret. This set of conditions is nearly always satisfied on return from syscalls, and it can even occasionally be satisfied on return from an irq. Even with the careful checks for sysret applicability, this cuts nearly 80ns off of the overhead from syscalls with unoptimized exit work. This includes tracing and context tracking, and any return that invokes KVM's user return notifier. For example, the cost of getpid with CONFIG_CONTEXT_TRACKING_FORCE=y drops from ~360ns to ~280ns on my computer. This may allow the removal and even eventual conversion to C of a respectable amount of exit asm. This may require further tweaking to give the full benefit on Xen. It may be worthwhile to adjust signal delivery and exec to try hit the sysret path. This does not optimize returns to 32-bit userspace. Making the same optimization for CS == __USER32_CS is conceptually straightforward, but it will require some tedious code to handle the differences between sysretl and sysexitl. Link: http://lkml.kernel.org/r/71428f63e681e1b4aa1a781e3ef7c27f027d1103.1421453410.git.luto@amacapital.net Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-02-01 04:03:01 -08:00
Andy Lutomirski	b926e6f61a	x86, traps: Fix ist_enter from userspace context_tracking_user_exit() has no effect if in_interrupt() returns true, so ist_enter() didn't work. Fix it by calling exception_enter(), and thus context_tracking_user_exit(), before incrementing the preempt count. This also adds an assertion that will catch the problem reliably if CONFIG_PROVE_RCU=y to help prevent the bug from being reintroduced. Link: http://lkml.kernel.org/r/261ebee6aee55a4724746d0d7024697013c40a08.1422709102.git.luto@amacapital.net Fixes: `9592747538` x86, traps: Track entry into and exit from IST context Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-02-01 04:02:53 -08:00
Linus Torvalds	6155bc1431	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, but also an event groups fix, two PMU driver fixes and a CPU model variant addition" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Tighten (and fix) the grouping condition perf/x86/intel: Add model number for Airmont perf/rapl: Fix crash in rapl_scale() perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization perf probe: Fix probing kretprobes perf symbols: Introduce 'for' method to iterate over the symbols with a given name perf probe: Do not rely on map__load() filter to find symbols perf symbols: Introduce method to iterate symbols ordered by name perf symbols: Return the first entry with a given name in find_by_name method perf annotate: Fix memory leaks in LOCK handling perf annotate: Handle ins parsing failures perf scripting perl: Force to use stdbool perf evlist: Remove extraneous 'was' on error message	2015-01-30 14:34:55 -08:00
Linus Torvalds	1f59fe7667	The ARM changes are largish, but not too scary. And a simple fix for x86 (bug introduced in 3.19). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJUy2ulAAoJEL/70l94x66D18kIAJhuh2k5Mt3TfP/zfhi2Y6ER IAZqyFODs8txZ3v432PB8yWWvr2XfJ3gwfjvurLygQJ3jCGZqDrmucbUUXzEaPUk mPnLpxV0ZEmNweS2HLGPX9HJ6zfsZ1dHRk55Tko9ynAO731q7yPjj6HC0th8wzvE BRv5y/18rY2zyar+5Azpj5wpOSllq0ynMgjWXGSlaTLbQoyvgZtzbqNY6nsAGrKw e8hSUPogfGUmZkBHHHVDYKpgHvWS1hARyuGFo8LeKXKPo7qhYxZHCDpch8TXnq2y 21IvQfYddGpcMsaTroA5qyXFigxCX+1j3po6MS3ZH9GGXS5fC3sI8t0EDxKiO6Q= =O4X0 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "The ARM changes are largish, but not too scary. And a simple fix for x86 (bug introduced in 3.19)" (Paolo sayus these are the "Final" fixes. We'll see). * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: check LAPIC presence when building apic_map arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault arm/arm64: KVM: Invalidate data cache on unmap arm/arm64: KVM: Use set/way op trapping to track the state of the caches	2015-01-30 10:45:24 -08:00
Paolo Bonzini	ad15a29647	kvm: vmx: fix oops with explicit flexpriority=0 option A function pointer was not NULLed, causing kvm_vcpu_reload_apic_access_page to go down the wrong path and OOPS when doing put_page(NULL). This did not happen on old processors, only when setting the module option explicitly. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 16:18:49 +01:00
Radim Krčmář	df04d1d191	KVM: x86: check LAPIC presence when building apic_map We forgot to re-check LAPIC after splitting the loop in commit `173beedc16` (KVM: x86: Software disabled APIC should still deliver NMIs, 2014-11-02). Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Fixes: `173beedc16` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:28:31 +01:00
Radim Krčmář	8a395363e2	KVM: x86: fix x2apic logical address matching We cannot hit the bug now, but future patches will expose this path. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:26:46 +01:00
Radim Krčmář	3697f302ab	KVM: x86: replace 0 with APIC_DEST_PHYSICAL To make the code self-documenting. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:26:46 +01:00
Radim Krčmář	9368b56762	KVM: x86: cleanup kvm_apic_match_*() The majority of this patch turns result = 0; if (CODE) result = 1; return result; into return CODE; because we return bool now. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:26:45 +01:00
Radim Krčmář	52c233a440	KVM: x86: return bool from kvm_apic_match*() And don't export the internal ones while at it. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 12:26:45 +01:00
Kai Huang	843e433057	KVM: VMX: Add PML support in VMX This patch adds PML support in VMX. A new module parameter 'enable_pml' is added to allow user to enable/disable it manually. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 09:39:54 +01:00
Linus Torvalds	33692f2759	vm: add VM_FAULT_SIGSEGV handling support The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit `fee7e49d45` ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-01-29 10:51:32 -08:00
Kai Huang	88178fd4f7	KVM: x86: Add new dirty logging kvm_x86_ops for PML This patch adds new kvm_x86_ops dirty logging hooks to enable/disable dirty logging for particular memory slot, and to flush potentially logged dirty GPAs before reporting slot->dirty_bitmap to userspace. kvm x86 common code calls these hooks when they are available so PML logic can be hidden to VMX specific. SVM won't be impacted as these hooks remain NULL there. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:31:41 +01:00
Kai Huang	1c91cad423	KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access This patch changes the second parameter of kvm_mmu_slot_remove_write_access from 'slot id' to 'struct kvm_memory_slot ' to align with kvm_x86_ops dirty logging hooks, which will be introduced in further patch. Better way is to change second parameter of kvm_arch_commit_memory_region from 'struct kvm_userspace_memory_region ' to 'struct kvm_memory_slot * new', but it requires changes on other non-x86 ARCH too, so avoid it now. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:31:37 +01:00
Kai Huang	9b51a63024	KVM: MMU: Explicitly set D-bit for writable spte. This patch avoids unnecessary dirty GPA logging to PML buffer in EPT violation path by setting D-bit manually prior to the occurrence of the write from guest. We only set D-bit manually in set_spte, and leave fast_page_fault path unchanged, as fast_page_fault is very unlikely to happen in case of PML. For the hva <-> pa change case, the spte is updated to either read-only (host pte is read-only) or be dropped (host pte is writeable), and both cases will be handled by above changes, therefore no change is necessary. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:31:33 +01:00
Kai Huang	f4b4b18086	KVM: MMU: Add mmu help functions to support PML This patch adds new mmu layer functions to clear/set D-bit for memory slot, and to write protect superpages for memory slot. In case of PML, CPU logs the dirty GPA automatically to PML buffer when CPU updates D-bit from 0 to 1, therefore we don't have to write protect 4K pages, instead, we only need to clear D-bit in order to log that GPA. For superpages, we still write protect it and let page fault code to handle dirty page logging, as we still need to split superpage to 4K pages in PML. As PML is always enabled during guest's lifetime, to eliminate unnecessary PML GPA logging, we set D-bit manually for the slot with dirty logging disabled. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:31:29 +01:00
Kai Huang	3b0f1d01e5	KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty We don't have to write protect guest memory for dirty logging if architecture supports hardware dirty logging, such as PML on VMX, so rename it to be more generic. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:30:38 +01:00
Ingo Molnar	6d84d1d130	One final fix for 3.19 to address a wrongful deregistering of the microcode loader module. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUyMQuAAoJEBLB8Bhh3lVKoUcP/2NW+guS0v8/X7vBtO0R1TR6 zhjXKOcTIHlM/IBES/vszxKnq/jiiq1X+ZFwucx644RxmPo5fS0+zGIoNNchQgb2 g4Y9RChANXUG66CkMW5zhjOdu6XTuLuJnoY9B3cw8Ro7lTIj6XfYq60Is/U1avJX l7d99uTEt2mUGjlvYXnpSyFwOWiMOFDruQ9wtXFJ4BfJXKsObe6NHcQYzaY2Tu93 xYMbakh3i+EmVhA4gmhtYlpm6LAXZ21vdSEOselfulgoyQm/SaU2/BGJ384RNNmC AdfLTM9qRfxUwUbA/jXak6YUDca3RznPPcBSyYhssLJkUvx8q4D6/CXIlh4ygPWr j2fc3gJt2KXzZzUvMx5MYMSyCtGm7Whx4XMLXkZBrRQK0TKwpTFqHReL/bY7nQHC iq22AloRA49rPo7GFYYm6xPOTCZUVo9VlVRIcAVqcIgjtkutwmwFyoVmuSrFlnpg tDQcG8pexxtmbbRHdlIYpN+BeKNikA0y+aiyoP8SSn0D3dduAnQ4lKZazE+i+fnT /hMz9eJVjk0ccCaCHC/gyLOgBWJlLUyfYz7nfCvQE4dKMTmyDJZZE1hH9Jr1OPQW zmTge8KqRtXbFNqnfNEE3UK/oBSuD45kx/oSa7BLlzZCjyVsfa1xjhv3rJFw0gHc TeMp8vkcTVgdX4EONupN =vHs7 -----END PGP SIGNATURE----- Merge tag 'microcode_fix_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent Pull microcode fix from Borislav Petkov: "One final fix for 3.19 to address a wrongful deregistering of the microcode loader module." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-29 07:51:20 +01:00
Andrey Skvortsov	050835e9d3	x86, vdso: teach 'make clean' remove vdso64 binaries After 'make clean' vdso64.so and vdso64.dbg.so were left in arch/x86/vdso/. Link: http://lkml.kernel.org/r/1422453867-17326-1-git-send-email-andrej.skvortzov@gmail.com Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-01-28 18:44:18 -08:00
Yijing Wang	6a878e5085	PCI: Fail MSI-X mappings if there's no space assigned to MSI-X BAR Unlike MSI, which is configured via registers in the MSI capability in Configuration Space, MSI-X is configured via tables in Memory Space. These MSI-X tables are mapped by a device BAR, and if no Memory Space has been assigned to the BAR, MSI-X cannot be used. Fail MSI-X setup if no space has been assigned for the BAR. Previously, we ioremapped the MSI-X table even if the resource hadn't been assigned. In this case, the resource address is undefined (and is often zero), which may lead to warnings or oopses in this path: pci_enable_msix msix_capability_init msix_map_region ioremap_nocache The PCI core sets resource flags to zero when it can't assign space for the resource (see reset_resource()). There are also some cases where it sets the IORESOURCE_UNSET flag, e.g., pci_reassigndev_resource_alignment(), pci_assign_resource(), etc. So we must check for both cases. [bhelgaas: changelog] Reported-by: Zhang Jukuo <zhangjukuo@huawei.com> Tested-by: Zhang Jukuo <zhangjukuo@huawei.com> Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2015-01-28 09:25:57 -06:00
Ingo Molnar	b3890e4704	Merge branch 'perf/hw_breakpoints' into perf/core The new hw_breakpoint bits are now ready for v3.20, merge them into the main branch, to avoid conflicts. Conflicts: tools/perf/Documentation/perf-record.txt Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 15:48:59 +01:00
Ingo Molnar	772a9aca12	This is my accumulated x86 entry work, part 1, for 3.20. The meat of this is an IST rework. When an IST exception interrupts user space, we will handle it on the per-thread kernel stack instead of on the IST stack. This sounds messy, but it actually simplifies the IST entry/exit code, because it eliminates some ugly games we used to play in order to handle rescheduling, signal delivery, etc on the way out of an IST exception. The IST rework introduces proper context tracking to IST exception handlers. I haven't seen any bug reports, but the old code could have incorrectly treated an IST exception handler as an RCU extended quiescent state. The memory failure change (included in this pull request with Borislav and Tony's permission) eliminates a bunch of code that is no longer needed now that user memory failure handlers are called in process context. Finally, this includes a few on Denys' uncontroversial and Obviously Correct (tm) cleanups. The IST and memory failure changes have been in -next for a while. LKML references: IST rework: http://lkml.kernel.org/r/cover.1416604491.git.luto@amacapital.net Memory failure change: http://lkml.kernel.org/r/54ab2ffa301102cd6e@agluck-desk.sc.intel.com Denys' cleanups: http://lkml.kernel.org/r/1420927210-19738-1-git-send-email-dvlasenk@redhat.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUtvkFAAoJEK9N98ZeDfrkcfsIAJxZ0UBUCEDvulbqgk/iPGOa fIpKLMowS7CpKtw6Wdc/YvAIkeHXWm1vU44Hj0TrjSrXCgVF8yCngs/xlXtOjoa1 dosXQqgqVJJ+hyui7chAEWyalLW7bEO8raq/6snhiMrhiuEkVKpEr7Fer4FVVCZL 4VALmNQQsbV+Qq4pXIhuagZC0Nt/XKi/+/cKvhS4p//q1F/TbHTz0FpDUrh0jPMh 18WFy0jWgxdkMRnSp/wJhekvdXX6PwUy5BdES9fjw8LQJZxxFpqN3Fe1kgfyzV0k yuvEHw1hPt2aBGj3q69wQvDVyyn4OqMpRDBhk4S+GJYmVh7mFyFMN4BDMEy/EY8= =LXVl -----END PGP SIGNATURE----- Merge tag 'pr-20150114-x86-entry' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm Pull x86/entry enhancements from Andy Lutomirski: " This is my accumulated x86 entry work, part 1, for 3.20. The meat of this is an IST rework. When an IST exception interrupts user space, we will handle it on the per-thread kernel stack instead of on the IST stack. This sounds messy, but it actually simplifies the IST entry/exit code, because it eliminates some ugly games we used to play in order to handle rescheduling, signal delivery, etc on the way out of an IST exception. The IST rework introduces proper context tracking to IST exception handlers. I haven't seen any bug reports, but the old code could have incorrectly treated an IST exception handler as an RCU extended quiescent state. The memory failure change (included in this pull request with Borislav and Tony's permission) eliminates a bunch of code that is no longer needed now that user memory failure handlers are called in process context. Finally, this includes a few on Denys' uncontroversial and Obviously Correct (tm) cleanups. The IST and memory failure changes have been in -next for a while. LKML references: IST rework: http://lkml.kernel.org/r/cover.1416604491.git.luto@amacapital.net Memory failure change: http://lkml.kernel.org/r/54ab2ffa301102cd6e@agluck-desk.sc.intel.com Denys' cleanups: http://lkml.kernel.org/r/1420927210-19738-1-git-send-email-dvlasenk@redhat.com " This tree semantically depends on and is based on the following RCU commit: `734d168013` ("rcu: Make rcu_nmi_enter() handle nesting") ... and for that reason won't be pushed upstream before the RCU bits hit Linus's tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 15:33:26 +01:00
Ingo Molnar	41ca5d4e9b	Merge commit `3669ef9fa7` ("x86, tls: Interpret an all-zero struct user_desc as 'no segment'") into x86/asm Pick up the latestest asm fixes before advancing it any further. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 15:30:32 +01:00
Jennifer Herbert	8da7633f16	xen: mark grant mapped pages as foreign Use the "foreign" page flag to mark pages that have a grant map. Use page->private to store information of the grant (the granting domain and the grant reference). Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 14:03:12 +00:00
Jennifer Herbert	0ae65f49af	x86/xen: require ballooned pages for grant maps Ballooned pages are always used for grant maps which means the original frame does not need to be saved in page->index nor restored after the grant unmap. This allows the workaround in netback for the conflicting use of the (unionized) page->index and page->pfmemalloc to be removed. Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 14:03:11 +00:00
David Vrabel	0bb599fd30	xen: remove scratch frames for ballooned pages and m2p override The scratch frame mappings for ballooned pages and the m2p override are broken. Remove them in preparation for replacing them with simpler mechanisms that works. The scratch pages did not ensure that the page was not in use. In particular, the foreign page could still be in use by hardware. If the guest reused the frame the hardware could read or write that frame. The m2p override did not handle the same frame being granted by two different grant references. Trying an M2P override lookup in this case is impossible. With the m2p override removed, the grant map/unmap for the kernel mappings (for x86 PV) can be easily batched in set_foreign_p2m_mapping() and clear_foreign_p2m_mapping(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2015-01-28 14:03:10 +00:00
David Vrabel	853d028934	xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() When unmapping grants, instead of converting the kernel map ops to unmap ops on the fly, pre-populate the set of unmap ops. This allows the grant unmap for the kernel mappings to be trivially batched in the future. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2015-01-28 14:03:10 +00:00
Kan Liang	ef454caeb7	perf/x86/intel: Add model number for Airmont Intel Airmont supports the same architectural and non-architectural performance monitoring events as Silvermont. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421913053-99803-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 13:17:32 +01:00
Stephane Eranian	98b008dff8	perf/rapl: Fix crash in rapl_scale() This patch fixes a systematic crash in rapl_scale() due to an invalid pointer. The bug was introduced by commit: `89cbc76768` ("x86: Replace __get_cpu_var uses") The fix is simple. Just put the parenthesis where it needs to be, i.e., around rapl_pmu. To my surprise, the compiler was not complaining about passing an integer instead of a pointer. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Fixes: `89cbc76768` ("x86: Replace __get_cpu_var uses") Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: cl@linux.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150122203834.GA10228@thinkpad Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 13:04:35 +01:00
Kan Liang	c05199e5a5	perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization There were some issues about the uncore driver tried to access non-existing boxes, which caused boot crashes. These issues have been all fixed. But we should avoid boot failures if that ever happens again. This patch intends to prevent this kind of potential issues. It moves uncore_box_init out of driver initialization. The box will be initialized when it's first enabled. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421729665-5912-1-git-send-email-kan.liang@intel.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-28 13:04:34 +01:00
Juergen Gross	270b79338e	x86/xen: cleanup arch/x86/xen/mmu.c Remove a nested ifdef. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 10:01:11 +00:00
Juergen Gross	bf9d834a9b	x86/xen: add some __init annotations in arch/x86/xen/mmu.c The file arch/x86/xen/mmu.c has some functions that can be annotated with "__init". Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 10:00:51 +00:00
Juergen Gross	a3f5239650	x86/xen: add some __init and static annotations in arch/x86/xen/setup.c Some more functions in arch/x86/xen/setup.c can be made "__init". xen_ignore_unusable() can be made "static". Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 10:00:36 +00:00
Juergen Gross	3ba5c867ca	x86/xen: use correct types for addresses in arch/x86/xen/setup.c In many places in arch/x86/xen/setup.c wrong types are used for physical addresses (u64 or unsigned long long). Use phys_addr_t instead. Use macros already defined instead of open coding them. Correct some other type mismatches. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 10:00:10 +00:00
Juergen Gross	f0feed10aa	x86/xen: cleanup arch/x86/xen/setup.c Remove extern declarations in arch/x86/xen/setup.c which are either not used or redundant. Move needed other extern declarations to xen-ops.h Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-28 09:59:46 +00:00
Boris Ostrovsky	da63865a01	x86, microcode: Return error from driver init code when loader is disabled Commits `65cef1311d` ("x86, microcode: Add a disable chicken bit") and `a18a0f6850` ("x86, microcode: Don't initialize microcode code on paravirt") allow microcode driver skip initialization when microcode loading is not permitted. However, they don't prevent the driver from being loaded since the init code returns 0. If at some point later the driver gets unloaded this will result in an oops while trying to deregister the (never registered) device. To avoid this, make init code return an error on paravirt or when microcode loading is disabled. The driver will then never be loaded. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1422411669-25147-1-git-send-email-boris.ostrovsky@oracle.com Reported-by: James Digwall <james@dingwall.me.uk> Cc: stable@vger.kernel.org # 3.18 Signed-off-by: Borislav Petkov <bp@suse.de>	2015-01-28 09:23:40 +01:00
Joerg Roedel	128ca093cc	kvm: iommu: Add cond_resched to legacy device assignment code When assigning devices to large memory guests (>=128GB guest memory in the failure case) the functions to create the IOMMU page-tables for the whole guest might run for a very long time. On non-preemptible kernels this might cause Soft-Lockup warnings. Fix these by adding a cond_resched() to the mapping and unmapping loops. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-27 21:31:12 +01:00
Kees Cook	d69911a68c	x86, build: replace Perl script with Shell script Commit `e6023367d7` ("x86, kaslr: Prevent .bss from overlaping initrd") added Perl to the required build environment. This reimplements in shell the Perl script used to find the size of the kernel with bss and brk added. Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Rob Landley <rob@landley.net> Acked-by: Rob Landley <rob@landley.net> Cc: Anca Emanuel <anca.emanuel@gmail.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Junjie Mao <eternal.n08@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-01-26 13:37:18 -08:00
Lv Zheng	a45de93eb1	ACPICA: Resources: Provide common part for struct acpi_resource_address structures. struct acpi_resource_address and struct acpi_resource_extended_address64 share substracts just at different offsets. To unify the parsing functions, OSPMs like Linux need a new ACPI_ADDRESS64_ATTRIBUTE as their substructs, so they can extract the shared data. This patch also synchronizes the structure changes to the Linux kernel. The usages are searched by matching the following keywords: 1. acpi_resource_address 2. acpi_resource_extended_address 3. ACPI_RESOURCE_TYPE_ADDRESS 4. ACPI_RESOURCE_TYPE_EXTENDED_ADDRESS And we found and fixed the usages in the following files: arch/ia64/kernel/acpi-ext.c arch/ia64/pci/pci.c arch/x86/pci/acpi.c arch/x86/pci/mmconfig-shared.c drivers/xen/xen-acpi-memhotplug.c drivers/acpi/acpi_memhotplug.c drivers/acpi/pci_root.c drivers/acpi/resource.c drivers/char/hpet.c drivers/pnp/pnpacpi/rsparser.c drivers/hv/vmbus_drv.c Build tests are passed with defconfig/allnoconfig/allyesconfig and defconfig+CONFIG_ACPI=n. Original-by: Thomas Gleixner <tglx@linutronix.de> Original-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-01-26 16:09:56 +01:00
Nadav Amit	82268083fa	KVM: x86: Emulation of call may use incorrect stack size On long-mode, when far call that changes cs.l takes place, the stack size is determined by the new mode. For instance, if we go from 32-bit mode to 64-bit mode, the stack-size if 64. KVM uses the old stack size. Fix it. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:17:34 +01:00
Nadav Amit	bac155310b	KVM: x86: 32-bit wraparound read/write not emulated correctly If we got a wraparound of 32-bit operand, and the limit is 0xffffffff, read and writes should be successful. It just needs to be done in two segments. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:15:18 +01:00
Nadav Amit	2b42fce695	KVM: x86: Fix defines in emulator.c Unnecassary define was left after commit `7d882ffa81` ("KVM: x86: Revert NoBigReal patch in the emulator"). Commit `39f062ff51` ("KVM: x86: Generate #UD when memory operand is required") was missing undef. Fix it. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:15:03 +01:00
Nadav Amit	2276b5116e	KVM: x86: ARPL emulation can cause spurious exceptions ARPL and MOVSXD are encoded the same and their execution depends on the execution mode. The operand sizes of each instruction are different. Currently, ARPL is detected too late, after the decoding was already done, and therefore may result in spurious exception (instead of failed emulation). Introduce a group to the emulator to handle instructions according to execution mode (32/64 bits). Note: in order not to make changes that may affect performance, the new ModeDual can only be applied to instructions with ModRM. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:14:49 +01:00
Nadav Amit	801806d956	KVM: x86: IRET emulation does not clear NMI masking The IRET instruction should clear NMI masking, but the current implementation does not do so. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:14:42 +01:00
Nadav Amit	16794aaaab	KVM: x86: Wrong operand size for far ret Indeed, Intel SDM specifically states that for the RET instruction "In 64-bit mode, the default operation size of this instruction is the stack-address size, i.e. 64 bits." However, experiments show this is not the case. Here is for example objdump of small 64-bit asm: 4004f1: ca 14 00 lret $0x14 4004f4: 48 cb lretq 4004f6: 48 ca 14 00 lretq $0x14 Therefore, remove the Stack flag from far-ret instructions. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:14:25 +01:00
Nadav Amit	2fcf5c8ae2	KVM: x86: Dirty the dest op page on cmpxchg emulation Intel SDM says for CMPXCHG: "To simplify the interface to the processorâ€™s bus, the destination operand receives a write cycle without regard to the result of the comparison.". This means the destination page should be dirtied. Fix it to by writing back the original value if cmpxchg failed. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:14:18 +01:00
Davidlohr Bueso	57b6b99bac	x86,xen: use current->state helpers Call __set_current_state() instead of assigning the new state directly. These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP environments, keeping track of who changed the state. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-26 10:21:26 +00:00
Linus Torvalds	14746306af	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Hopefully the last round of fixes for 3.19 - regression fix for the LDT changes - regression fix for XEN interrupt handling caused by the APIC changes - regression fixes for the PAT changes - last minute fixes for new the MPX support - regression fix for 32bit UP - fix for a long standing relocation issue on 64bit tagged for stable - functional fix for the Hyper-V clocksource tagged for stable - downgrade of a pr_err which tends to confuse users Looks a bit on the large side, but almost half of it are valuable comments" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Change Fast TSC calibration failed from error to info x86/apic: Re-enable PCI_MSI support for non-SMP X86_32 x86, mm: Change cachemode exports to non-gpl x86, tls: Interpret an all-zero struct user_desc as "no segment" x86, tls, ldt: Stop checking lm in LDT_empty x86, mpx: Strictly enforce empty prctl() args x86, mpx: Fix potential performance issue on unmaps x86, mpx: Explicitly disable 32-bit MPX support on 64-bit kernels x86, hyperv: Mark the Hyper-V clocksource as being continuous x86: Don't rely on VMWare emulating PAT MSR correctly x86, irq: Properly tag virtualization entry in /proc/interrupts x86, boot: Skip relocs when load address unchanged x86/xen: Override ACPI IRQ management callback __acpi_unregister_gsi ACPI: pci: Do not clear pci_dev->irq in acpi_pci_irq_disable() x86/xen: Treat SCI interrupt as normal GSI interrupt	2015-01-25 18:11:17 -08:00
K. Y. Srinivasan	4061ed9e2a	Drivers: hv: vmbus: Implement a clockevent device Implement a clockevent device based on the timer support available on Hyper-V. In this version of the patch I have addressed Jason's review comments. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-25 09:17:57 -08:00
Thomas Gleixner	ba360f887a	x86, init: Fix UP boot regression on x86_64 Commit `30b8b0066c` "init: Get rid of x86isms" broke the UP boot on x86_64. That happens because CONFIG_UP_LATE_INIT depends on CONFIG_X86_UP_APIC. X86_UP_APIC is a 32bit only config switch and therefor not set on 64bit UP builds. As a consequence the UP init of the local APIC and the IOAPIC is not called, which results in a boot failure. Make it depend on !SMP && X86_LOCAL_APIC instead. Fixes: `30b8b0066c` init: Get rid of x86isms Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-24 10:34:46 +01:00
Linus Torvalds	550695925d	PCI updates for v3.19: Resource management - Clip bridge windows to fit in upstream windows (Yinghai Lu) Virtualization - Mark Atheros AR93xx to avoid using bus reset (Alex Williamson) Miscellaneous - Update Richard Zhu's email address (Lucas Stach) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUwqpDAAoJEFmIoMA60/r8ykIQAINgkP/iPaFMTPkTSfzTJCMY oQVNGha4FDt6Ic1UWGyS/sYUpywSnALxlYWxVZTm5r+sGQ2yJBo6veuxvCI09YFw lWqf6lfvkFSthWCo7pHLoNaIjKJUNCy4a2han31aAIScMCNX4YF60YorMSjQBST8 smLMG75U3U9VWaXYsV1e5gTvLa5IQh4lgaTgMAOXqd+6WcAR4WwOgD2sR06o2X43 63JF2U+ieuA789Xu2IS92TmMMESD5haEZATqdGPtxpnqyHxmBNu0Y4JkkBWD2S92 HvveOoLBT2TBfICkftvCJscBLHh7PZMIx9nLx58SnijVzX+hzVr4Zfc96MZU50MK DuNbbZn3sO902ukOEpfih7Mg0tDxCxNytleEdAnXmZuqf+odbd/Y4AA0Hg6w7GEY OsVGbQAT/knlTfsSZsivtmUl7l1SXzrozv+q4f4szY95v34S9pm0sWzz0IBn7oKj h7N9Vslr3lyEudOUo1OrFq+0arDw53kwOOkIavMUH0nvTqKs4cmXBcGMfo1EfMa+ 3YhjwbgpvtZ3AXi2NSBk4gIGZEmQslvgRStLhgXVDl+9DieK+sw1Vx4cKe8gu9mD c7zPStEsJBJgd3v+8s8avwo8R0oPZb6MsCKFjjaYojTvpfFmfX0YyWE/TzYoUm6Z +BTyA8t0+3jTArTs/Zid =HRy7 -----END PGP SIGNATURE----- Merge tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "These are fixes for: - a resource management problem that causes a Radeon "Fatal error during GPU init" on machines where the BIOS programmed an invalid Root Port window. This was a regression in v3.16. - an Atheros AR93xx device that doesn't handle PCI bus resets correctly. This was a regression in v3.14. - an out-of-date email address" * tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: MAINTAINERS: Update Richard Zhu's email address sparc/PCI: Clip bridge windows to fit in upstream windows powerpc/PCI: Clip bridge windows to fit in upstream windows parisc/PCI: Clip bridge windows to fit in upstream windows mn10300/PCI: Clip bridge windows to fit in upstream windows microblaze/PCI: Clip bridge windows to fit in upstream windows ia64/PCI: Clip bridge windows to fit in upstream windows frv/PCI: Clip bridge windows to fit in upstream windows alpha/PCI: Clip bridge windows to fit in upstream windows x86/PCI: Clip bridge windows to fit in upstream windows PCI: Add pci_claim_bridge_resource() to clip window if necessary PCI: Add pci_bus_clip_resource() to clip to fit upstream window PCI: Pass bridge device, not bus, when updating bridge windows PCI: Mark Atheros AR93xx to avoid bus reset PCI: Add flag for devices where we can't use bus reset	2015-01-24 10:58:47 +12:00
Linus Torvalds	2e3810da41	Three small fixes. Two for x86 and one avoids that sparse bails out. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJUwkVXAAoJEL/70l94x66DwPEH/RPBmxJ+lD0nRyXVECSWxjN6 DYJvp4HsLV8BhBx/ATjAkjiVPKTUk9vQBjfgl72YatjASP9aNIkBqnN0AOVdVQ2i 04ZvYaSw3jY0A5PSecdFQZ4u8MAvaRS4AYNOYM3Kpf0EOrIwanXFpEfVRGT8ichT uBK/mbN7vDO1SsgAnB00fCew4wFrHIa7fJ8eLNnebDOuC72oUZA+2nKx8ApWq4ca ZaziqkI2CFaV2rqJokKDun2arxI2Q6/L87g7qyo+HMd1b+aepLTWYNOs1vH0YoSc 73aHg+3crIqx75XmnaxKP5SPOr6vpmnloux9yre8u1tvejBIbCMz1g9Mdl0YOmA= =YRTn -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "Three small fixes. Two for x86 and one avoids that sparse bails out" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: SYSENTER emulation is broken KVM: x86: Fix of previously incomplete fix for CVE-2014-8480 KVM: fix sparse warning in include/trace/events/kvm.h	2015-01-24 09:58:17 +12:00
WANG Chao	d574ffa106	x86, e820: Clean up sanitize_e820_map() users The argument 3 of sanitize_e820_map() will only be updated upon a successful sanitization. Some of the callers have extra conditionals for the same purpose. Clean them up. default_machine_specific_memory_setup() must keep the extra conditional because boot_params.e820_entries is an u8 and not an u32, so the direct update would overwrite other fields in boot_params. [ tglx: Massaged changelog ] Signed-off-by: WANG Chao <chaowang@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Lee Chun-Yi <joeyli.kernel@gmail.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Link: http://lkml.kernel.org/r/1420601859-18439-1-git-send-email-chaowang@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-23 16:14:27 +01:00
WANG Chao	7389882c81	x86, setup: Let early_memremap() handle page alignment early_memremap() takes care of page alignment and map size, so we can just remap the required data size and get rid of the adjustments in the setup code. [tglx: Massaged changelog ] Signed-off-by: WANG Chao <chaowang@redhat.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie> Link: http://lkml.kernel.org/r/1420628150-16872-1-git-send-email-chaowang@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-23 16:14:26 +01:00
Paolo Bonzini	8fff5e374a	KVM: s390: fixes and features for kvm/next (3.20) 1. Generic - sparse warning (make function static) - optimize locking - bugfixes for interrupt injection - fix MVPG addressing modes 2. hrtimer/wakeup fun A recent change can cause KVM hangs if adjtime is used in the host. The hrtimer might wake up too early or too late. Too early is fatal as vcpu_block will see that the wakeup condition is not met and sleep again. This CPU might never wake up again. This series addresses this problem. adjclock slowing down the host clock will result in too late wakeups. This will require more work. In addition to that we also change the hrtimer from REALTIME to MONOTONIC to avoid similar problems with timedatectl set-time. 3. sigp rework We will move all "slow" sigps to QEMU (protected with a capability that can be enabled) to avoid several races between concurrent SIGP orders. 4. Optimize the shadow page table Provide an interface to announce the maximum guest size. The kernel will use that to make the pagetable 2,3,4 (or theoretically) 5 levels. 5. Provide an interface to set the guest TOD We now use two vm attributes instead of two oneregs, as oneregs are vcpu ioctl and we don't want to call them from other threads. 6. Protected key functions The real HMC allows to enable/disable protected key CPACF functions. Lets provide an implementation + an interface for QEMU to activate this the protected key instructions. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJUwj60AAoJEBF7vIC1phx8iV0QAKq1LZRTmgTLS2fd0oyWKZeN ShWUIUiB+7IUiuogYXZMfqOm61oogxwc95Ti+3tpSWYwkzUWagpS/RJQze7E1HOc 3pHpXwrR01ueUT6uVV4xc/vmVIlQAIl/ScRDDPahlAT2crCleWcKVC9l0zBs/Kut IrfzN9pJcrkmXD178CDP8/VwXsn02ptLQEpidGibGHCd03YVFjp3X0wfwNdQxMbU qOwNYCz3SLfDm5gsybO2DG+aVY3AbM2ZOJt/qLv2j4Phz4XB4t4W9iJnAefSz7JA W4677wbMQpfZlUQYhI78H/Cl9SfWAuLug1xk83O/+lbEiR5u+8zLxB69dkFTiBaH 442OY957T6TQZ/V9d0jDo2XxFrcaU9OONbVLsfBQ56Vwv5cAg9/7zqG8eqH7Nq9R gU3fQesgD4N0Kpa77T9k45TT/hBRnUEtsGixAPT6QYKyE6cK4AJATHKSjMSLbdfj ELbt0p2mVtKhuCcANfEx54U2CxOrg5ElBmPz8hRw0OkXdwpqh1sGKmt0govcHP1I BGSzE9G4mswwI1bQ7cqcyTk/lwL8g3+KQmRJoOcgCveQlnY12X5zGD5DhuPMPiIT VENqbcTzjlxdu+4t7Enml+rXl7ySsewT9L231SSrbLsTQVgCudD1B9m72WLu5ZUT 9/Z6znv6tkeKV5rM9DYE =zLjR -----END PGP SIGNATURE----- Merge tag 'kvm-s390-next-20150122' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next KVM: s390: fixes and features for kvm/next (3.20) 1. Generic - sparse warning (make function static) - optimize locking - bugfixes for interrupt injection - fix MVPG addressing modes 2. hrtimer/wakeup fun A recent change can cause KVM hangs if adjtime is used in the host. The hrtimer might wake up too early or too late. Too early is fatal as vcpu_block will see that the wakeup condition is not met and sleep again. This CPU might never wake up again. This series addresses this problem. adjclock slowing down the host clock will result in too late wakeups. This will require more work. In addition to that we also change the hrtimer from REALTIME to MONOTONIC to avoid similar problems with timedatectl set-time. 3. sigp rework We will move all "slow" sigps to QEMU (protected with a capability that can be enabled) to avoid several races between concurrent SIGP orders. 4. Optimize the shadow page table Provide an interface to announce the maximum guest size. The kernel will use that to make the pagetable 2,3,4 (or theoretically) 5 levels. 5. Provide an interface to set the guest TOD We now use two vm attributes instead of two oneregs, as oneregs are vcpu ioctl and we don't want to call them from other threads. 6. Protected key functions The real HMC allows to enable/disable protected key CPACF functions. Lets provide an implementation + an interface for QEMU to activate this the protected key instructions.	2015-01-23 14:33:36 +01:00
Nadav Amit	f3747379ac	KVM: x86: SYSENTER emulation is broken SYSENTER emulation is broken in several ways: 1. It misses the case of 16-bit code segments completely (CVE-2015-0239). 2. MSR_IA32_SYSENTER_CS is checked in 64-bit mode incorrectly (bits 0 and 1 can still be set without causing #GP). 3. MSR_IA32_SYSENTER_EIP and MSR_IA32_SYSENTER_ESP are not masked in legacy-mode. 4. There is some unneeded code. Fix it. Cc: stable@vger.linux.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-23 13:57:15 +01:00
Nadav Amit	63ea0a49ae	KVM: x86: Fix of previously incomplete fix for CVE-2014-8480 STR and SLDT with rip-relative operand can cause a host kernel oops. Mark them as DstMem as well. Cc: stable@vger.linux.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-23 13:56:56 +01:00
Paolo Bonzini	1c6007d59a	KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added trace symbols, and adding an explicit VGIC init device control IOCTL. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUwhsKAAoJEEtpOizt6ddyuSEH/ia2uf07N0i+C1dPKYiqhKEd nFqBvgrhAMVztWLmy1Wq4SnO9YNd+CrPYATrfCiYsYQ9aKc09+qDq+uo06bVpZXz KsHjVGUsdyJ4qRqjDixkPvZviGIXa6C//+hcwg1XH2nit1uHmXVupzB9dDz3ZM2l GCwApdRdaaUVDt5Ud2ljqIWZa18Qf/5/HD8MdPXpmotDOKucL6pBr/1R1XWueCU/ ejRs/qy3EFyMWdEdfGFAMCa0ZvHbPmsJmvB/EgkyUnuJj77ptA0jNo1jtzSfEyis 53x4ffWnIsPl9yqhk0oKerIALVUvV4A7/me2ya6tsQ5fiBX7lJ3+qwggvCkWQzw= =fMS2 -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added trace symbols, and adding an explicit VGIC init device control IOCTL. Conflicts: arch/arm64/include/asm/kvm_arm.h arch/arm64/kvm/handle_exit.c	2015-01-23 13:39:51 +01:00
Dominik Dingel	31928aa586	KVM: remove unneeded return value of vcpu_postcreate The return value of kvm_arch_vcpu_postcreate is not checked in its caller. This is okay, because only x86 provides vcpu_postcreate right now and it could only fail if vcpu_load failed. But that is not possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so just get rid of the unchecked return value. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2015-01-23 13:24:52 +01:00
Alexandre Demers	520452172e	x86/tsc: Change Fast TSC calibration failed from error to info Many users see this message when booting without knowning that it is of no importance and that TSC calibration may have succeeded by another way. As explained by Paul Bolle in http://lkml.kernel.org/r/1348488259.1436.22.camel@x61.thuisdomein "Fast TSC calibration failed" should not be considered as an error since other calibration methods are being tried afterward. At most, those send a warning if they fail (not an error). So let's change the message from error to warning. [ tglx: Make if pr_info. It's really not important at all ] Fixes: `c767a54ba0` x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level> Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1418106470-6906-1-git-send-email-alexandre.f.demers@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-23 10:53:52 +01:00
Colin King	d505ad1d66	x86/rtc: Remove duplicate const specifier Building with clang: CC arch/x86/kernel/rtc.o arch/x86/kernel/rtc.c:173:29: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier] static const char * const const ids[] __initconst = Remove the duplicate const, it is not needed and causes a warning. Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: http://lkml.kernel.org/r/1421244475-313-1-git-send-email-colin.king@canonical.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-23 10:35:51 +01:00
Bryan O'Donoghue	38a1dfda8e	x86/apic: Re-enable PCI_MSI support for non-SMP X86_32 Commit `0dbc6078c0` ('x86, build, pci: Fix PCI_MSI build on !SMP') introduced the dependency that X86_UP_APIC is only available when PCI_MSI is false. This effectively prevents PCI_MSI support on 32bit UP systems because it disables both APIC and IO-APIC. But APIC support is architecturally required for PCI_MSI. The intention of the patch was to enforce APIC support when PCI_MSI is enabled, but failed to do so. Remove the !PCI_MSI dependency from X86_UP_APIC and enforce X86_UP_APIC when PCI_MSI support is enabled on 32bit UP systems. [ tglx: Massaged changelog ] Fixes `0dbc6078c0` 'x86, build, pci: Fix PCI_MSI build on !SMP' Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1421967529-9037-1-git-send-email-pure.logic@nexus-software.ie Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-23 10:20:30 +01:00
Juergen Gross	31bb772370	x86, mm: Change cachemode exports to non-gpl Commit `281d4078be` ("x86: Make page cache mode a real type") introduced the symbols __cachemode2pte_tbl and __pte2cachemode_tbl and exported them via EXPORT_SYMBOL_GPL. The exports are part of a replacement of code which has been EXPORT_SYMBOL before these changes resulting in build breakage of out-of-tree non-gpl modules. Change EXPORT_SYMBOL_GPL to EXPORT-SYMBOL for these two symbols. Fixes: `281d4078be` "x86: Make page cache mode a real type" Reported-and-tested-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/1421926997-28615-1-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 21:50:14 +01:00
Andy Lutomirski	3669ef9fa7	x86, tls: Interpret an all-zero struct user_desc as "no segment" The Witcher 2 did something like this to allocate a TLS segment index: struct user_desc u_info; bzero(&u_info, sizeof(u_info)); u_info.entry_number = (uint32_t)-1; syscall(SYS_set_thread_area, &u_info); Strictly speaking, this code was never correct. It should have set read_exec_only and seg_not_present to 1 to indicate that it wanted to find a free slot without putting anything there, or it should have put something sensible in the TLS slot if it wanted to allocate a TLS entry for real. The actual effect of this code was to allocate a bogus segment that could be used to exploit espfix. The set_thread_area hardening patches changed the behavior, causing set_thread_area to return -EINVAL and crashing the game. This changes set_thread_area to interpret this as a request to find a free slot and to leave it empty, which isn't quite what the game expects but should be close enough to keep it working. In particular, using the code above to allocate two segments will allocate the same segment both times. According to FrostbittenKing on Github, this fixes The Witcher 2. If this somehow still causes problems, we could instead allocate a limit==0 32-bit data segment, but that seems rather ugly to me. Fixes: `41bdc78544` x86/tls: Validate TLS entries to protect espfix Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: stable@vger.kernel.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 21:45:07 +01:00
Andy Lutomirski	e30ab185c4	x86, tls, ldt: Stop checking lm in LDT_empty 32-bit programs don't have an lm bit in their ABI, so they can't reliably cause LDT_empty to return true without resorting to memset. They shouldn't need to do this. This should fix a longstanding, if minor, issue in all 64-bit kernels as well as a potential regression in the TLS hardening code. Fixes: `41bdc78544` x86/tls: Validate TLS entries to protect espfix Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 21:11:06 +01:00
Dave Hansen	c922228efe	x86, mpx: Fix potential performance issue on unmaps The 3.19 merge window saw some TLB modifications merged which caused a performance regression. They were fixed in commit 045bbb9fa. Once that fix was applied, I also noticed that there was a small but intermittent regression still present. It was not present consistently enough to bisect reliably, but I'm fairly confident that it came from (my own) MPX patches. The source was reading a relatively unused field in the mm_struct via arch_unmap. I also noted that this code was in the main instruction flow of do_munmap() and probably had more icache impact than we want. This patch does two things: 1. Adds a static (via Kconfig) and dynamic (via cpuid) check for MPX with cpu_feature_enabled(). This keeps us from reading that cacheline in the mm and trades it for a check of the global CPUID variables at least on CPUs without MPX. 2. Adds an unlikely() to ensure that the MPX call ends up out of the main instruction flow in do_munmap(). I've added a detailed comment about why this was done and why we want it even on systems where MPX is present. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: luto@amacapital.net Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20150108223021.AEEAB987@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 21:11:06 +01:00
Dave Hansen	814564a0a1	x86, mpx: Explicitly disable 32-bit MPX support on 64-bit kernels We had originally planned on submitting MPX support in one patch set. We eventually broke it up in to two pieces for easier review. One of the features that didn't make the first round was supporting 32-bit binaries on 64-bit kernels. Once we split the set up, we never added code to restrict 32-bit binaries from _using_ MPX on 64-bit kernels. The 32-bit bounds tables are a different format than the 64-bit ones. Without this patch, the kernel will try to read a 32-bit binary's tables as if they were the 64-bit version. They will likely be noticed as being invalid rather quickly and the app will get killed, but that's kinda mean. This patch adds an explicit check, and will make a 64-bit kernel essentially behave as if it has no MPX support when called from a 32-bit binary. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20150108223020.9E9AA511@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 21:11:06 +01:00
Linus Torvalds	193934123c	Surprising number of fixes this merge window :( First two are minor fallout from the param rework which went in this merge window. Next three are a series which fixes a longstanding (but never previously reported and unlikely , so no CC stable) race between kallsyms and freeing the init section. Finally, a minor cleanup as our module refcount will now be -1 during unload. Thanks, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUwEmwAAoJENkgDmzRrbjx77kP/1cNQR2eG2sBwokg3q0tvHnQ IKqEXErW7NvxRa+RAMEmy2uQoGt6+uNklAbtyJEYM9oR1NieFbPi2yrt9Xn5SAXS Brp1S8WYBMilA3W3o6I0trFDRWHdpdtkKIQwLWgJNSEWjbTXh8bSwp/2X1rlOPyI ZmphCMOQMU2/uFEyJhTz1WMEV8eVXiRLN8OxSkPxToxdZoGln2U8IBCCCJC9OG+f Cf3eMgEcNdEXNcPKqr11NIcHkAx6M6qI/eMDOqk151PslHa8lbis6di9Z87aE0ps i8PyrkJGTmgM9cCjXwE8deNseeCmuKYlbPIF+NoxcqtvZstfaMrISwTIEuzV4JHi p13YhDxy4XiC3H6pKHub/jo7UCl+wWtFh9SqpqGgduFX/p6FtUHQJm0S0X/DFFZt C+2MFVSe6HRHE8B7bFz86+619Qd/rU7+806CLCE+NbYlYAKIBYKzWt/bml6VH3RJ OjwXhQqmznWhJjsfD3BUUUpZpHijmylI9gAe2F1oErb8YjRU6gIm7P8hlkOzD7AS TfGHPFq2raQcfAiGdVmvkbvvhvYZXnB3WVsAexrYoqrT9I8eEfRI+7SkL75MLR2E ikzhJS3SHkAUAd7fUVMt7xMwh0jmhsPjWCCqc13m6UUFoXhTaDgKgPGftltN0bI2 g85+enZ3/eca6xh/KxvW =Kf9b -----END PGP SIGNATURE----- Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module and param fixes from Rusty Russell: "Surprising number of fixes this merge window :( The first two are minor fallout from the param rework which went in this merge window. The next three are a series which fixes a longstanding (but never previously reported and unlikely , so no CC stable) race between kallsyms and freeing the init section. Finally, a minor cleanup as our module refcount will now be -1 during unload" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: module: make module_refcount() a signed integer. module: fix race in kallsyms resolution during module load success. module: remove mod arg from module_free, rename module_memfree(). module_arch_freeing_init(): new hook for archs before module->module_init freed. param: fix uninitialized read with CONFIG_DEBUG_LOCK_ALLOC param: initialize store function to NULL if not available.	2015-01-23 06:40:36 +12:00
Thomas Gleixner	2f82c9dc60	x86/acpi: Make acpi_[un]register_gsi_ioapic() depend on CONFIG_X86_LOCAL_APIC Get rid of the defined but not used warnings Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com>	2015-01-22 15:17:41 +01:00
Thomas Gleixner	9c4d9c73dd	x86: Consolidate boot cpu timer setup Now that the APIC bringup is consolidated we can move the setup call for the percpu clock event device to apic_bsp_setup(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211704.162567839@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	374aab339f	x86/apic: Reuse apic_bsp_setup() for UP APIC setup Extend apic_bsp_setup() so the same code flow can be used for APIC_init_uniprocessor(). Folded Jiangs fix to provide proper ordering of the UP setup. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211704.084765674@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	613c25efbd	x86/smpboot: Sanitize uniprocessor init The UP related setups for local apic are mangled into smp_sanity_check(). That results in duplicate calls to disable_smp() and makes the code hard to follow. Let smp_sanity_check() return dedicated values for the various exit reasons and handle them at the call site. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.987833932@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	05f7e46d2a	x86/smpboot: Move apic init code to apic.c We better provide proper functions which implement the required code flow in the apic code rather than letting the smpboot code open code it. That allows to make more functions static and confines the APIC functionality to apic.c where it belongs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.907616730@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	30b8b0066c	init: Get rid of x86isms The UP local API support can be set up from an early initcall. No need for horrible hackery in the init code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211703.827943883@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	e714a91f92	x86/apic: Move apic_init_uniprocessor code Move the code to a different place so we can make other functions inline. Preparatory patch for further cleanups. No change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.731329006@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	ef4c59a4b6	x86/smpboot: Cleanup ioapic handling smpboot is very creative with the ways to disable ioapic. smpboot_clear_io_apic() smpboot_clear_io_apic_irqs() and disable_ioapic_support() serve a similar purpose. smpboot_clear_io_apic_irqs() is the most useless of all functions as it clears a variable which has not been setup yet. Aside of that it has the same ifdef mess and conditionals around the ioapic related code, which can now be removed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.650280684@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:56 +01:00
Thomas Gleixner	35e4c6d30e	x86/apic: Sanitize ioapic handling We have proper stubs for the IOAPIC=n case and the setup/enable function have the required checks inside now. Remove the ifdeffery and the copy&pasted conditionals. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>C Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.569830549@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	a46f5c8927	x86/ioapic: Add proper checks to setp/enable_IO_APIC() No point to have the same checks at every call site. Add them to the functions, so they can be called unconditionally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.490719938@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	8686608336	x86/ioapic: Provide stub functions for IOAPIC%3Dn To avoid lots of ifdeffery provide proper stubs for setup_IO_APIC(), enable_IO_APIC() and setup_ioapic_dest(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.397170414@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	f77aa308e5	x86/smpboot: Move smpboot inlines to code No point for a separate header file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211703.304126687@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	6d2d49d2cd	x86/x2apic: Use state information for disable Use the state information to simplify the disable logic further. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211703.209387598@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	659006bf3a	x86/x2apic: Split enable and setup function enable_x2apic() is a convoluted unreadable mess because it is used for both enablement in early boot and for setup in cpu_init(). Split the code into x2apic_enable() for enablement and x2apic_setup() for setup of (secondary cpus). Make use of the new state tracking to simplify the logic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211703.129287153@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	44e25ff9e6	x86/x2apic: Disable x2apic from nox2apic setup There is no point in postponing the hardware disablement of x2apic. It can be disabled right away in the nox2apic setup function. Disable it right away and set the state to DISABLED . This allows to remove all the nox2apic conditionals all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211703.051214090@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	12e189d3cf	x86/x2apic: Add proper state tracking Having 3 different variables to track the state is just silly and error prone. Add a proper state tracking variable which covers the three possible states: ON/OFF/DISABLED. We cannot use x2apic_mode for this as this would require to change all users of x2apic_mode with explicit comparisons for a state value instead of treating it as boolean. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211702.955392443@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	62e61633da	x86/x2apic: Clarify remapping mode for x2apic enablement Rename the argument of try_to_enable_x2apic() so the purpose becomes more clear. Make the pr_warning more consistent and avoid the double print of "disabling". Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211702.876012628@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:55 +01:00
Thomas Gleixner	55eae7de72	x86/x2apic: Move code in conditional region No point in having try_to_enable_x2apic() outside of the CONFIG_X86_X2APIC section and having inline functions and more ifdefs to deal with it. Move the code into the existing ifdef section and remove the inline cruft. Fixup the printk about not enabling interrupt remapping as suggested by Boris. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211702.795388613@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	d524165cb8	x86/apic: Check x2apic early No point in delaying the x2apic detection for the CONFIG_X86_X2APIC=n case to enable_IR_x2apic(). We rather detect that before we try to setup anything there. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211702.702479404@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	9aa1636527	x86/apic: Make disable x2apic work really If x2apic_preenabled is not enabled, then disable_x2apic() is not called from various places which results in x2apic_disabled not being set. So other code pathes can happily reenable the x2apic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211702.621431109@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	2ca5b40479	x86/ioapic: Check x2apic really The x2apic_preenabled flag is just a horrible hack and if X2APIC support is disabled it does not reflect the actual hardware state. Check the hardware instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211702.541280622@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	bfb0507029	x86/apic: Move x2apic code to one place Having several disjunct pieces of code for x2apic support makes reading the code unnecessarily hard. Move it to one ifdeffed section. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211702.445212133@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	81a46dd824	x86/apic: Make x2apic_mode depend on CONFIG_X86_X2APIC No point in having a static variable around which is always 0. Let the compiler optimize code out if disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211702.363274310@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Thomas Gleixner	8d80696060	x86/apic: Avoid open coded x2apic detection enable_IR_x2apic() grew a open coded x2apic detection. Implement a proper helper function which shares the code with the already existing x2apic_enabled(). Made it use rdmsrl_safe as suggested by Boris. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20150115211702.285038186@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:10:54 +01:00
Borislav Petkov	cfaa790a3f	kvm: Fix CR3_PCID_INVD type on 32-bit arch/x86/kvm/emulate.c: In function ‘check_cr_write’: arch/x86/kvm/emulate.c:3552:4: warning: left shift count >= width of type rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD; happens because sizeof(UL) on 32-bit is 4 bytes but we shift it 63 bits to the left. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-21 15:59:09 +01:00
Ingo Molnar	f49028292c	Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Documentation updates. - Miscellaneous fixes. - Preemptible-RCU fixes, including fixing an old bug in the interaction of RCU priority boosting and CPU hotplug. - SRCU updates. - RCU CPU stall-warning updates. - RCU torture-test updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-21 06:12:21 +01:00
Marcelo Tosatti	54750f2cf0	KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp and rdtsc is larger than a given threshold: * If we get more than the below threshold into the future, we rerequest * the real time from the host again which has only little offset then * that we need to adjust using the TSC. * * For now that threshold is 1/5th of a jiffie. That should be good * enough accuracy for completely broken systems, but also give us swing * to not call out to the host all the time. */ #define PVCLOCK_DELTA_MAX ((1000000000ULL / HZ) / 5) Disable masterclock support (which increases said delta) in case the boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW. Upstreams kernels which support pvclock vsyscalls (and therefore make use of PVCLOCK_STABLE_BIT) use MSR_KVM_SYSTEM_TIME_NEW. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-20 20:38:39 +01:00
Fengguang Wu	69b0049a89	KVM: fix "Should it be static?" warnings from sparse arch/x86/kvm/x86.c:495:5: sparse: symbol 'kvm_read_nested_guest_page' was not declared. Should it be static? arch/x86/kvm/x86.c:646:5: sparse: symbol '__kvm_set_xcr' was not declared. Should it be static? arch/x86/kvm/x86.c:1183:15: sparse: symbol 'max_tsc_khz' was not declared. Should it be static? arch/x86/kvm/x86.c:1237:6: sparse: symbol 'kvm_track_tsc_matching' was not declared. Should it be static? Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-20 20:38:35 +01:00
Palik, Imre	94dd85f6a0	x86/xen: prefer TSC over xen clocksource for dom0 In Dom0's the use of the TSC clocksource (whenever it is stable enough to be used) instead of the Xen clocksource should not cause any issues, as Dom0 VMs never live-migrated. The TSC clocksource is somewhat more efficient than the Xen paravirtualised clocksource, thus it should have higher rating. This patch decreases the rating of the Xen clocksource in Dom0s to 275. Which is half-way between the rating of the TSC clocksource (300) and the hpet clocksource (250). Cc: Anthony Liguori <aliguori@amazon.com> Signed-off-by: Imre Palik <imrep@amazon.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-20 18:44:24 +00:00
Miroslav Benes	32b7eb8771	livepatch: change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING Change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING in Kconfigs. HAVE_ bools are prevalent there and we should go with the flow. Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-01-20 15:02:25 +01:00
K. Y. Srinivasan	32c6590d12	x86, hyperv: Mark the Hyper-V clocksource as being continuous The Hyper-V clocksource is continuous; mark it accordingly. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: jasowang@redhat.com Cc: gregkh@linuxfoundation.org Cc: devel@linuxdriverproject.org Cc: olaf@aepfle.de Cc: apw@canonical.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1421108762-3331-1-git-send-email-kys@microsoft.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 14:36:25 +01:00
Juergen Gross	9d34cfdf47	x86: Don't rely on VMWare emulating PAT MSR correctly VMWare seems not to emulate the PAT MSR correctly: reaeding MSR_IA32_CR_PAT returns 0 even after writing another value to it. Commit `bd809af16e` triggers this VMWare bug when the kernel is booted as a VMWare guest. Detect this bug and don't use the read value if it is 0. Fixes: `bd809af16e` "x86: Enable PAT to use cache mode translation tables" Reported-and-tested-by: Jongman Heo <jongman.heo@samsung.com> Acked-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Juergen Gross <jgross@suse.com> Link: http://lkml.kernel.org/r/1421039745-14335-1-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 14:33:45 +01:00
Oleg Nesterov	7575637ab2	x86, fpu: Fix math_state_restore() race with kernel_fpu_begin() math_state_restore() can race with kernel_fpu_begin() if irq comes right after __thread_fpu_begin(), __save_init_fpu() will overwrite fpu->state we are going to restore. Add 2 simple helpers, kernel_fpu_disable() and kernel_fpu_enable() which simply set/clear in_kernel_fpu, and change math_state_restore() to exclude kernel_fpu_begin() in between. Alternatively we could use local_irq_save/restore, but probably these new helpers can have more users. Perhaps they should disable/enable preemption themselves, in this case we can remove preempt_disable() in __restore_xstate_sig(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: matt.fleming@intel.com Cc: bp@suse.de Cc: pbonzini@redhat.com Cc: luto@amacapital.net Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/20150115192028.GD27332@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 13:53:07 +01:00
Oleg Nesterov	33a3ebdc07	x86, fpu: Don't abuse has_fpu in __kernel_fpu_begin/end() Now that we have in_kernel_fpu we can remove __thread_clear_has_fpu() in __kernel_fpu_begin(). And this allows to replace the asymmetrical and nontrivial use_eager_fpu + tsk_used_math check in kernel_fpu_end() with the same __thread_has_fpu() check. The logic becomes really simple; if _begin() does save() then _end() needs restore(), this is controlled by __thread_has_fpu(). Otherwise they do clts/stts unless use_eager_fpu(). Not only this makes begin/end symmetrical and imo more understandable, potentially this allows to change irq_fpu_usable() to avoid all other checks except "in_kernel_fpu". Also, with this patch __kernel_fpu_end() does restore_fpu_checking() and WARNs if it fails instead of math_state_restore(). I think this looks better because we no longer need __thread_fpu_begin(), and it would be better to report the failure in this case. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: matt.fleming@intel.com Cc: bp@suse.de Cc: pbonzini@redhat.com Cc: luto@amacapital.net Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/20150115192005.GC27332@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 13:53:07 +01:00
Oleg Nesterov	14e153ef75	x86, fpu: Introduce per-cpu in_kernel_fpu state interrupted_kernel_fpu_idle() tries to detect if kernel_fpu_begin() is safe or not. In particular it should obviously deny the nested kernel_fpu_begin() and this logic looks very confusing. If use_eager_fpu() == T we rely on a) __thread_has_fpu() check in interrupted_kernel_fpu_idle(), and b) on the fact that _begin() does __thread_clear_has_fpu(). Otherwise we demand that the interrupted task has no FPU if it is in kernel mode, this works because __kernel_fpu_begin() does clts() and interrupted_kernel_fpu_idle() checks X86_CR0_TS. Add the per-cpu "bool in_kernel_fpu" variable, and change this code to check/set/clear it. This allows to do more cleanups and fixes, see the next changes. The patch also moves WARN_ON_ONCE() under preempt_disable() just to make this_cpu_read() look better, this is not really needed. And in fact I think we should move it into __kernel_fpu_begin(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: matt.fleming@intel.com Cc: bp@suse.de Cc: pbonzini@redhat.com Cc: luto@amacapital.net Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/20150115191943.GB27332@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 13:53:07 +01:00
Andy Shevchenko	0e1540208e	x86: pmc_atom: Expose contents of PSS The PSS register reflects the power state of each island on SoC. It would be useful to know which of the islands is on or off at the momemnt. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Aubrey Li <aubrey.li@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com> Link: http://lkml.kernel.org/r/1421253575-22509-6-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 12:50:14 +01:00
Andy Shevchenko	4b25f42a37	x86: pmc_atom: Clean up init function There is no need to use err variable. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Aubrey Li <aubrey.li@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com> Link: http://lkml.kernel.org/r/1421253575-22509-5-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 12:50:14 +01:00
Andy Shevchenko	4922b9ce89	x86: pmc-atom: Remove unused macro DRIVER_NAME seems unused. This patch just removes it. There is no functional change. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Aubrey Li <aubrey.li@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com> Link: http://lkml.kernel.org/r/1421253575-22509-4-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 12:50:14 +01:00
Andy Shevchenko	d5df8fe34b	x86: pmc_atom: don%27t check for NULL twice debugfs_remove_recursive() is NULL-aware, thus, we may safely remove the check here. There is no need to assing NULL to variable since it will be not used anywhere. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Aubrey Li <aubrey.li@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com> Link: http://lkml.kernel.org/r/1421253575-22509-3-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 12:50:14 +01:00
Andy Shevchenko	1b43d7125f	x86: pmc-atom: Assign debugfs node as soon as possible pmc_dbgfs_unregister() will be called when pmc->dbgfs_dir is unconditionally NULL on error path in pmc_dbgfs_register(). To prevent this we move the assignment to where is should be. Fixes: `f855911c1f` (x86/pmc_atom: Expose PMC device state and platform sleep state) Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Aubrey Li <aubrey.li@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com> Link: http://lkml.kernel.org/r/1421253575-22509-2-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 12:50:14 +01:00
Jan Beulich	4a0d3107d6	x86, irq: Properly tag virtualization entry in /proc/interrupts The mis-naming likely was a copy-and-paste effect. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/54B9408B0200007800055E8B@mail.emea.novell.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 12:37:23 +01:00
Kees Cook	f285f4a21c	x86, boot: Skip relocs when load address unchanged On 64-bit, relocation is not required unless the load address gets changed. Without this, relocations do unexpected things when the kernel is above 4G. Reported-by: Baoquan He <bhe@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Thomas D. <whissi@whissi.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Junjie Mao <eternal.n08@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20150116005146.GA4212@www.outflux.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 12:37:23 +01:00
Jiang Liu	8abb850a03	x86/xen: Override ACPI IRQ management callback __acpi_unregister_gsi Xen overrides __acpi_register_gsi and leaves __acpi_unregister_gsi as is. That means, an IRQ allocated by acpi_register_gsi_xen_hvm() or acpi_register_gsi_xen() will be freed by acpi_unregister_gsi_ioapic(), which may cause undesired effects. So override __acpi_unregister_gsi to NULL for safety. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Cc: Tony Luck <tony.luck@intel.com> Cc: xen-devel@lists.xenproject.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Graeme Gregory <graeme.gregory@linaro.org> Cc: Lv Zheng <lv.zheng@intel.com> Link: http://lkml.kernel.org/r/1421720467-7709-4-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 11:44:41 +01:00
Jiang Liu	b568b8601f	x86/xen: Treat SCI interrupt as normal GSI interrupt Currently Xen Domain0 has special treatment for ACPI SCI interrupt, that is initialize irq for ACPI SCI at early stage in a special way as: xen_init_IRQ() ->pci_xen_initial_domain() ->xen_setup_acpi_sci() Allocate and initialize irq for ACPI SCI Function xen_setup_acpi_sci() calls acpi_gsi_to_irq() to get an irq number for ACPI SCI. But unfortunately acpi_gsi_to_irq() depends on IOAPIC irqdomains through following path acpi_gsi_to_irq() ->mp_map_gsi_to_irq() ->mp_map_pin_to_irq() ->check IOAPIC irqdomain For PV domains, it uses Xen event based interrupt manangement and doesn't make uses of native IOAPIC, so no irqdomains created for IOAPIC. This causes Xen domain0 fail to install interrupt handler for ACPI SCI and all ACPI events will be lost. Please refer to: https://lkml.org/lkml/2014/12/19/178 So the fix is to get rid of special treatment for ACPI SCI, just treat ACPI SCI as normal GSI interrupt as: acpi_gsi_to_irq() ->acpi_register_gsi() ->acpi_register_gsi_xen() ->xen_register_gsi() With above change, there's no need for xen_setup_acpi_sci() anymore. The above change also works with bare metal kernel too. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Cc: Tony Luck <tony.luck@intel.com> Cc: xen-devel@lists.xenproject.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: http://lkml.kernel.org/r/1421720467-7709-2-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-20 11:44:40 +01:00
Linus Torvalds	2262889091	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a regression that arose from the change to add a crypto prefix to module names which was done to prevent the loading of arbitrary modules through the Crypto API. In particular, a number of modules were missing the crypto prefix which meant that they could no longer be autoloaded" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: add missing crypto module aliases	2015-01-20 18:17:34 +12:00
Rusty Russell	be1f221c04	module: remove mod arg from module_free, rename module_memfree(). Nothing needs the module pointer any more, and the next patch will call it from RCU, where the module itself might no longer exist. Removing the arg is the safest approach. This just codifies the use of the module_alloc/module_free pattern which ftrace and bpf use. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Ley Foon Tan <lftan@altera.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: x86@kernel.org Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: linux-cris-kernel@axis.com Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: nios2-dev@lists.rocketboards.org Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Cc: netdev@vger.kernel.org	2015-01-20 11:38:33 +10:30
Christian Borntraeger	bccec2a0a2	x86/spinlock: Leftover conversion ACCESS_ONCE->READ_ONCE commit `78bff1c868` ("x86/ticketlock: Fix spin_unlock_wait() livelock") introduced two additional ACCESS_ONCE cases in x86 spinlock.h. Lets change those as well. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com>	2015-01-19 14:14:20 +01:00
Christian Borntraeger	1760f1eb7e	x86/xen/p2m: Replace ACCESS_ONCE with READ_ONCE ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145) Change the p2m code to replace ACCESS_ONCE with READ_ONCE. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: David Vrabel <david.vrabel@citrix.com>	2015-01-19 14:14:20 +01:00
Kai Huang	d91ffee9ec	Optimize TLB flush in kvm_mmu_slot_remove_write_access. No TLB flush is needed when there's no valid rmap in memory slot. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-19 11:09:37 +01:00
Rickard Strandqvist	0c55d6d931	x86: kvm: vmx: Remove some unused functions Removes some functions that are not used anywhere: cpu_has_vmx_eptp_writeback() cpu_has_vmx_eptp_uncacheable() This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-19 11:09:36 +01:00
Linus Torvalds	59b2858f57	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, but also two PMU driver fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools powerpc: Use dwfl_report_elf() instead of offline. perf tools: Fix segfault for symbol annotation on TUI perf test: Fix dwarf unwind using libunwind. perf tools: Avoid build splat for syscall numbers with uclibc perf tools: Elide strlcpy warning with uclibc perf tools: Fix statfs.f_type data type mismatch build error with uclibc tools: Remove bitops/hweight usage of bits in tools/perf perf machine: Fix __machine__findnew_thread() error path perf tools: Fix building error in x86_64 when dwarf unwind is on perf probe: Propagate error code when write(2) failed perf/x86/intel: Fix bug for "cycles:p" and "cycles:pp" on SLM perf/rapl: Fix sysfs_show() initialization for RAPL PMU	2015-01-18 06:24:30 +12:00
Andy Lutomirski	0fcedc8631	x86_64 entry: Fix RCX for ptraced syscalls The int_ret_from_sys_call and syscall tracing code disagrees with the sysret path as to the value of RCX. The Intel SDM, the AMD APM, and my laptop all agree that sysret returns with RCX == RIP. The syscall tracing code does not respect this property. For example, this program: int main() { extern const char syscall_rip[]; unsigned long rcx = 1; unsigned long orig_rcx = rcx; asm ("mov $-1, %%eax\n\t" "syscall\n\t" "syscall_rip:" : "+c" (rcx) : : "r11"); printf("syscall: RCX = %lX RIP = %lX orig RCX = %lx\n", rcx, (unsigned long)syscall_rip, orig_rcx); return 0; } prints: syscall: RCX = 400556 RIP = 400556 orig RCX = 1 Running it under strace gives this instead: syscall: RCX = FFFFFFFFFFFFFFFF RIP = 400556 orig RCX = 1 This changes FIXUP_TOP_OF_STACK to match sysret, causing the test to show RCX == RIP even under strace. It looks like this is a partial revert of: 88e4bc32686e ("[PATCH] x86-64 architecture specific sync for 2.5.8") from the historic git tree. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/c9a418c3dc3993cb88bb7773800225fd318a4c67.1421453410.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-17 11:02:53 +01:00
Linus Torvalds	23aa4b416a	This holds a few fixes to the ftrace infrastructure as well as the mixture of function graph tracing and kprobes. When jprobes and function graph tracing is enabled at the same time it will crash the system. # modprobe jprobe_example # echo function_graph > /sys/kernel/debug/tracing/current_tracer After the first fork (jprobe_example probes it), the system will crash. This is due to the way jprobes copies the stack frame and does not do a normal function return. This messes up with the function graph tracing accounting which hijacks the return address from the stack and replaces it with a hook function. It saves the return addresses in a separate stack to put back the correct return address when done. But because the jprobe functions do not do a normal return, their stack addresses are not put back until the function they probe is called, which means that the probed function will get the return address of the jprobe handler instead of its own. The simple fix here was to disable function graph tracing while the jprobe handler is being called. While debugging this I found two minor bugs with the function graph tracing. The first was about the function graph tracer sharing its function hash with the function tracer (they both get filtered by the same input). The changing of the set_ftrace_filter would not sync the function recording records after a change if the function tracer was disabled but the function graph tracer was enabled. This was due to the update only checking one of the ops instead of the shared ops to see if they were enabled and should perform the sync. This caused the ftrace accounting to break and a ftrace_bug() would be triggered, disabling ftrace until a reboot. The second was that the check to update records only checked one of the filter hashes. It needs to test both the "filter" and "notrace" hashes. The "filter" hash determines what functions to trace where as the "notrace" hash determines what functions not to trace (trace all but these). Both hashes need to be passed to the update code to find out what change is being done during the update. This also broke the ftrace record accounting and triggered a ftrace_bug(). This patch set also include two more fixes that were reported separately from the kprobe issue. One was that init_ftrace_syscalls() was called twice at boot up. This is not a major bug, but that call performed a rather large kmalloc (NR_syscalls * sizeof(syscalls_metadata)). The second call made the first one a memory leak, and wastes memory. The other fix is a regression caused by an update in the v3.19 merge window. The moving to enable events early, moved the enabling before PID 1 was created. The syscall events require setting the TIF_SYSCALL_TRACEPOINT for all tasks. But for_each_process_thread() does not include the swapper task (PID 0), and ended up being a nop. A suggested fix was to add the init_task() to have its flag set, but I didn't really want to mess with PID 0 for this minor bug. Instead I disable and re-enable events again at early_initcall() where it use to be enabled. This also handles any other event that might have its own reg function that could break at early boot up. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUt9vmAAoJEEjnJuOKh9ldLHEIAJ9XrPW2xMIY5yI69jT1F7pv PkSRqENnOK0l4UulD52SvIBecQTTBcEEjao4yVGkc7DCJBOws/1LZ5gW8OfNlKjq rMB8yaosL1tXJ1ARVPMjcQVy+228zkgTXznwEZCjku1g7LuScQ28qyXsXO7B6yiK xKoHqKjygmM/a2aVn+8tdiVKiDp6jdmkbYicbaFT4xP7XB5DaMmIiXRHxdvW6xdR azKrVfYiMyJqTZNt/EVSWUk2WjeaYhoXyNtvgPx515wTo/llCnzhjcsocXBtH2P/ YOtwl+1L7Z89ukV9oXqrtrUJZ6Ps7+g7I1flJuL7/1FlNGnklcP9JojD+t6HeT8= =vkec -----END PGP SIGNATURE----- Merge tag 'trace-fixes-v3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fixes from Steven Rostedt: "This holds a few fixes to the ftrace infrastructure as well as the mixture of function graph tracing and kprobes. When jprobes and function graph tracing is enabled at the same time it will crash the system: # modprobe jprobe_example # echo function_graph > /sys/kernel/debug/tracing/current_tracer After the first fork (jprobe_example probes it), the system will crash. This is due to the way jprobes copies the stack frame and does not do a normal function return. This messes up with the function graph tracing accounting which hijacks the return address from the stack and replaces it with a hook function. It saves the return addresses in a separate stack to put back the correct return address when done. But because the jprobe functions do not do a normal return, their stack addresses are not put back until the function they probe is called, which means that the probed function will get the return address of the jprobe handler instead of its own. The simple fix here was to disable function graph tracing while the jprobe handler is being called. While debugging this I found two minor bugs with the function graph tracing. The first was about the function graph tracer sharing its function hash with the function tracer (they both get filtered by the same input). The changing of the set_ftrace_filter would not sync the function recording records after a change if the function tracer was disabled but the function graph tracer was enabled. This was due to the update only checking one of the ops instead of the shared ops to see if they were enabled and should perform the sync. This caused the ftrace accounting to break and a ftrace_bug() would be triggered, disabling ftrace until a reboot. The second was that the check to update records only checked one of the filter hashes. It needs to test both the "filter" and "notrace" hashes. The "filter" hash determines what functions to trace where as the "notrace" hash determines what functions not to trace (trace all but these). Both hashes need to be passed to the update code to find out what change is being done during the update. This also broke the ftrace record accounting and triggered a ftrace_bug(). This patch set also include two more fixes that were reported separately from the kprobe issue. One was that init_ftrace_syscalls() was called twice at boot up. This is not a major bug, but that call performed a rather large kmalloc (NR_syscalls sizeof(syscalls_metadata)). The second call made the first one a memory leak, and wastes memory. The other fix is a regression caused by an update in the v3.19 merge window. The moving to enable events early, moved the enabling before PID 1 was created. The syscall events require setting the TIF_SYSCALL_TRACEPOINT for all tasks. But for_each_process_thread() does not include the swapper task (PID 0), and ended up being a nop. A suggested fix was to add the init_task() to have its flag set, but I didn't really want to mess with PID 0 for this minor bug. Instead I disable and re-enable events again at early_initcall() where it use to be enabled. This also handles any other event that might have its own reg function that could break at early boot up" tag 'trace-fixes-v3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix enabling of syscall events on the command line tracing: Remove extra call to init_ftrace_syscalls() ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing ftrace: Check both notrace and filter for old hash ftrace: Fix updating of filters for shared global_ops filters	2015-01-17 07:55:52 +13:00
Yinghai Lu	851b093692	x86/PCI: Clip bridge windows to fit in upstream windows Every PCI-PCI bridge window should fit inside an upstream bridge window because orphaned address space is unreachable from the primary side of the upstream bridge. If we inherit invalid bridge windows that overlap an upstream window from firmware, clip them to fit and update the bridge accordingly. [bhelgaas: changelog] Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491 Reported-by: Marek Kordik <kordikmarek@gmail.com> Tested-by: Marek Kordik <kordikmarek@gmail.com> Fixes: `5b28541552` ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources") Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: x86@kernel.org CC: stable@vger.kernel.org # v3.16+	2015-01-16 10:04:42 -06:00
Paolo Bonzini	e108ff2f80	KVM: x86: switch to kvm_get_dirty_log_protect We now have a generic function that does most of the work of kvm_vm_ioctl_get_dirty_log, now use it. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>	2015-01-16 14:40:14 +01:00
Kan Liang	33636732dc	perf/x86/intel: Fix bug for "cycles:p" and "cycles:pp" on SLM cycles:p and cycles:pp do not work on SLM since commit: `86a04461a9` ("perf/x86: Revamp PEBS event selection") UOPS_RETIRED.ALL is not a PEBS capable event, so it should not be used to count cycle number. Actually SLM calls intel_pebs_aliases_core2() which uses INST_RETIRED.ANY_P to count the number of cycles. It's a PEBS capable event. But inv and cmask must be set to count cycles. Considering SLM allows all events as PEBS with no flags, only INST_RETIRED.ANY_P, inv=1, cmask=16 needs to handled specially. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421084541-31639-1-git-send-email-kan.liang@intel.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-16 09:06:59 +01:00
Stephane Eranian	433678bdc6	perf/rapl: Fix sysfs_show() initialization for RAPL PMU This patch fixes a problem with the initialization of the sysfs_show() routine for the RAPL PMU. The current code was wrongly relying on the EVENT_ATTR_STR() macro which uses the events_sysfs_show() function in the x86 PMU code. That function itself was relying on the x86_pmu data structure. Yet RAPL and the core PMU (x86_pmu) have nothing to do with each other. They should therefore not interact with each other. The x86_pmu structure is initialized at boot time based on the host CPU model. When the host CPU is not supported, the x86_pmu remains uninitialized and some of the callbacks it contains are NULL. The false dependency with x86_pmu could potentially cause crashes in case the x86_pmu is not initialized while the RAPL PMU is. This may, for instance, be the case in virtualized environments. This patch fixes the problem by using a private sysfs_show() routine for exporting the RAPL PMU events. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150113225953.GA21525@thinkpad Cc: vincent.weaver@maine.edu Cc: jolsa@redhat.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-16 09:06:58 +01:00
Steven Rostedt (Red Hat)	237d28db03	ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing If the function graph tracer traces a jprobe callback, the system will crash. This can easily be demonstrated by compiling the jprobe sample module that is in the kernel tree, loading it and running the function graph tracer. # modprobe jprobe_example.ko # echo function_graph > /sys/kernel/debug/tracing/current_tracer # ls The first two commands end up in a nice crash after the first fork. (do_fork has a jprobe attached to it, so "ls" just triggers that fork) The problem is caused by the jprobe_return() that all jprobe callbacks must end with. The way jprobes works is that the function a jprobe is attached to has a breakpoint placed at the start of it (or it uses ftrace if fentry is supported). The breakpoint handler (or ftrace callback) will copy the stack frame and change the ip address to return to the jprobe handler instead of the function. The jprobe handler must end with jprobe_return() which swaps the stack and does an int3 (breakpoint). This breakpoint handler will then put back the saved stack frame, simulate the instruction at the beginning of the function it added a breakpoint to, and then continue on. For function tracing to work, it hijakes the return address from the stack frame, and replaces it with a hook function that will trace the end of the call. This hook function will restore the return address of the function call. If the function tracer traces the jprobe handler, the hook function for that handler will not be called, and its saved return address will be used for the next function. This will result in a kernel crash. To solve this, pause function tracing before the jprobe handler is called and unpause it before it returns back to the function it probed. Some other updates: Used a variable "saved_sp" to hold kcb->jprobe_saved_sp. This makes the code look a bit cleaner and easier to understand (various tries to fix this bug required this change). Note, if fentry is being used, jprobes will change the ip address before the function graph tracer runs and it will not be able to trace the function that the jprobe is probing. Link: http://lkml.kernel.org/r/20150114154329.552437962@goodmis.org Cc: stable@vger.kernel.org # 2.6.30+ Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-01-15 09:39:18 -05:00
Ingo Molnar	2372673c64	Minor cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUtScFAAoJEBLB8Bhh3lVKhUsP/37W3Sm+0kpQ/fo7h0BmiP6d U+3w8wvIRfqwp3e9id+YXaTpYBfD81ejcgqP/HV1HYYYPqpzpJY/cXfDpU9WEkKS rZZKQlw5KB+pA4nbs6GgroYjeoqBpW07Mz3FOHpIVyZ6W9wAyTtL76JK5nNIcBb4 db9H9cQOEs8mzaLMEDu36QGbnN/fQr4R3ULSAZCYBMrA7eDdpfd2mGJR57eOSutL o3XsQaIaIvnDKfjuGbBcLKkqdfwAgSZfVulKxgcBiSjdH6kN2blC9HHkQ+8vuEZp t+ouxzNZw4Ml1CbpzGU0hi9K3DkxXbhml7bMo9yZVlhjPglqyZXqVZU35rgIgEaB NxoenKSVybe2hi0K3S5kNtwig1GwadxUmK5S9M3HFbugu1OtKpgPvBp7GSy+fF28 aphe3pSh8o58rLmp6npigv0YTyIRkGKw1XYHKsP7cClvU2UbRmJrJpD6CGyMEBKC Npss2Sfon1+Ig4iP13VkUbJjxYsf/obbTSaLXsJ8mEWkv1nfNDeGmBaHWwlA1aMP i4toba9H0Ax264aApIQ4FAvict/Qvmh0Hh9sghG6ERpeeuXtzQMnfKy3Ts7tPc1s ZfmBq6IWs2ZkB2tIOUz2Caiw1ybWd2CSQdtbeu2B9wt0KzQW1xm3uTNGl27cmwR0 2MjjnO/uh0fZ1hKWFIr4 =Ut02 -----END PGP SIGNATURE----- Merge tag 'x86_queue' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/cleanups Pull minor x86 cleanups from Borislav Petkov. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-15 11:38:51 +01:00
Ingo Molnar	93d76c8026	When checking addresses in APEI action entries for validity, allow access the mmcfg space - some error injection functions need to do this. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUtWbqAAoJEKurIx+X31iBNjAP/1Yi3zKx87NmYf2juuhKKeo/ 39Pap/qw0BngSYcNh1okMFh7LdpPdQrGc/EN9/kprU9IgafiUeQHRUXyHy1OzMaI Ud6c4BiYBcfvVp81xCvfKiSUjAMB++vIamyt4daoDWqAhn0+MBNTgFoR1Nsyu9Pb bhp39n7kHQjyxTsfPnN5iBN3+fBz1zPJAWWGoPSfnL/MP3tossqlPRDqThF/BY2J lUe2BdASOx++ojE5fDPoeiJhvB/8Hovy4eN543qNvpAitBwBJLGlMcxlMEcYN3QQ skbML9ZQ4ipoLlWZ9i2siiLVF7Y2ME3/NbBGgquTZ8U3Kqi0kgCEePOFVnvxMfhA MFYVyc8lW72X8WyHdJXY0aHCyq6QFWfhiu7j/eu6bCbQnVqHWhMblzk2eEpPWwRV jK9arCHZGik9c9cYZJq+WzIukyrtJR1D+4jAcQyxUjR/wQJ1CLCe5xuDZUfiLDno GVwdarBoD1dtLIA4noc0VXwjVN51lS1MTBR7TL/cu6hj+p2Tq6FXa7QW30a80Udq 3fm2HdzhQEmfUAi2EznwZwtuy0Pt5c2DIAl/Ie4GDMJX19MuvKRRSt6SDCY+YSSQ SJB3iXm0t99MkIJrwycaQEb190DtFbVYk+gT26TtPrIuHbd5Y/5hwv0J+wSEDwom ErFpiJEhK6hnBdV+eHxB =3MwK -----END PGP SIGNATURE----- Merge tag 'please-pull-einj-mmcfg' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull RAS update from Tony Luck: "When checking addresses in APEI action entries for validity, allow access to the mmcfg space - some error injection functions need to do this." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-15 11:30:57 +01:00
Ingo Molnar	37e4d3b951	Nothing special this time, just an error messages improvement from Andy and a cleanup from me. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUtQlAAAoJEBLB8Bhh3lVKB5wQAIgnT9hZazgGtlvns8UIA1SY ZTM/4yEBGo/a5iIaD37ahfUm1Eb8Y/u3ITlKu8/DFyS7bEq9uEf+IGV9ZqeH8F9t QWe079ObsURsxUFz3jg9iW7Er4uWlvHoyaeWoGS8MCKpTB8oYLr70vZafNLLQ0Qy QYCV6SSFO51WZ2J3x7PBRXNANvvVPe8AhNx8rb3VOaP7aDlBXk8rzu3MVJwUJjOb /tgRz7uChp3HAW6PADehM+ELNDINMAW8wJB1XwbfHnwGJYTVrBdWpBnF2h/qULoN p/KU+zpuTZpopfkaHiNb7cgwR1B+Ig5DVvXHMTMkJCHe0y978ch4kwSy6nLGXvZ6 ig5h8yi83K1cXGdl6/HwRidge83Y97nceOi8hyqiVsOfGuOQMYbGw3lbzyLGPlVT RzRyaWToS7UlRtlr2qDvqzmLGujDt1bpSLhoLxNaQ3BKJ2tPZJ/TyH9BfvmE+Ed8 1zITTL88B7bXxJLIdyS8pgQxHeuoRuDutd0uRh0Uolm6Z6PHxOIt5euly99zhA9u s5Xl/7dv66VA8PmY7VoZlmCuxlU8uY/RUct/v7bwIspYO8NvUe7A7cSoFycge286 MON6rVXkvWqeJIxqXEs8K6tbtF+D6LhBUxUKGqXHbJvmslse0FwrwHoq0e4jmfth BgYBxhd/nH+CQO4OYBBF =r9gb -----END PGP SIGNATURE----- Merge tag 'ras_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras Pull RAS updates from Borislav Petkov: "Nothing special this time, just an error messages improvement from Andy and a cleanup from me." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-01-15 11:29:49 +01:00
Jiang Liu	c392f56c94	iommu/irq_remapping: Kill function irq_remapping_supported() and related code Simplify irq_remapping code by killing irq_remapping_supported() and related interfaces. Joerg posted a similar patch at https://lkml.org/lkml/2014/12/15/490, so assume an signed-off from Joerg. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Tested-by: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: iommu@lists.linux-foundation.org Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Oren Twaig <oren@scalemp.com> Link: http://lkml.kernel.org/r/1420615903-28253-14-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-15 11:24:23 +01:00
Jiang Liu	5fcee53ce7	x86/apic: Only disable CPU x2apic mode when necessary When interrupt remapping hardware is not in X2APIC, CPU X2APIC mode will be disabled if: 1) Maximum CPU APIC ID is bigger than 255 2) hypervisior doesn't support x2apic mode. But we should only check whether hypervisor supports X2APIC mode when hypervisor(CONFIG_HYPERVISOR_GUEST) is enabled, otherwise X2APIC will always be disabled when CONFIG_HYPERVISOR_GUEST is disabled and IR doesn't work in X2APIC mode. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: iommu@lists.linux-foundation.org Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Oren Twaig <oren@scalemp.com> Link: http://lkml.kernel.org/r/1420615903-28253-12-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-15 11:24:23 +01:00
Jiang Liu	ef1b2b8ad1	x86/apic: Handle XAPIC remap mode proper. If remapping is in XAPIC mode, the setup code just skips X2APIC initialization without checking max CPU APIC ID in system, which may cause problem if system has a CPU with APIC ID bigger than 255. Handle IR in XAPIC mode the same way as if remapping is disabled. [ tglx: Split out from previous patch ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: iommu@lists.linux-foundation.org Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Oren Twaig <oren@scalemp.com> Link: http://lkml.kernel.org/r/1420615903-28253-8-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-15 11:24:23 +01:00
Jiang Liu	07806c50bd	x86/apic: Refine enable_IR_x2apic() and related functions Refine enable_IR_x2apic() and related functions for better readability. [ tglx: Removed the XAPIC mode change and split it out into a seperate patch. Added comments. ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: iommu@lists.linux-foundation.org Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Oren Twaig <oren@scalemp.com> Link: http://lkml.kernel.org/r/1420615903-28253-8-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-15 11:24:23 +01:00
Jiang Liu	89356cf20e	x86/apic: Correctly detect X2APIC status in function enable_IR() X2APIC will be disabled if user specifies "nox2apic" on kernel command line, even when x2apic_preenabled is true. So correctly detect X2APIC status by using x2apic_enabled() instead of x2apic_preenabled. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: iommu@lists.linux-foundation.org Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Oren Twaig <oren@scalemp.com> Link: http://lkml.kernel.org/r/1420615903-28253-7-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-15 11:24:23 +01:00
Jiang Liu	7f530a2771	x86/apic: Kill useless variable x2apic_enabled in function enable_IR_x2apic() Local variable x2apic_enabled has been assigned to but never referred, so kill it. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: iommu@lists.linux-foundation.org Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Oren Twaig <oren@scalemp.com> Link: http://lkml.kernel.org/r/1420615903-28253-6-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-15 11:24:22 +01:00
Jiang Liu	2599094f6e	x86/apic: Panic if kernel doesn't support x2apic but BIOS has enabled x2apic When kernel doesn't support X2APIC but BIOS has enabled X2APIC, system may panic or hang without useful messages. On the other hand, it's hard to dynamically disable X2APIC when CONFIG_X86_X2APIC is disabled. So panic with a clear message in such a case. Now system panics as below when X2APIC is disabled and interrupt remapping is enabled: [ 0.316118] LAPIC pending interrupts after 512 EOI [ 0.322126] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 [ 0.368655] Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC [ 0.378300] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.18.0+ #340 [ 0.385300] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0051.L05.1406240953 06/24/2014 [ 0.396997] ffff88046dc03000 ffff88046c307dd8 ffffffff8179dada 00000000000043f2 [ 0.405629] ffffffff81a92158 ffff88046c307e58 ffffffff8179b757 0000000000000002 [ 0.414261] 0000000000000008 ffff88046c307e68 ffff88046c307e08 ffffffff813ad82b [ 0.422890] Call Trace: [ 0.425711] [<ffffffff8179dada>] dump_stack+0x45/0x57 [ 0.431533] [<ffffffff8179b757>] panic+0xc1/0x1f5 [ 0.436978] [<ffffffff813ad82b>] ? delay_tsc+0x3b/0x70 [ 0.442910] [<ffffffff8166fa2c>] panic_if_irq_remap+0x1c/0x20 [ 0.449524] [<ffffffff81d73645>] setup_IO_APIC+0x405/0x82e [ 0.464979] [<ffffffff81d6fcc2>] native_smp_prepare_cpus+0x2d9/0x31c [ 0.472274] [<ffffffff81d5d0ac>] kernel_init_freeable+0xd6/0x223 [ 0.479170] [<ffffffff81792ad0>] ? rest_init+0x80/0x80 [ 0.485099] [<ffffffff81792ade>] kernel_init+0xe/0xf0 [ 0.490932] [<ffffffff817a537c>] ret_from_fork+0x7c/0xb0 [ 0.497054] [<ffffffff81792ad0>] ? rest_init+0x80/0x80 [ 0.502983] ---[ end Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC System hangs as below when X2APIC and interrupt remapping are both disabled: [ 1.102782] pci 0000:00:02.0: System wakeup disabled by ACPI [ 1.109351] pci 0000:00:03.0: System wakeup disabled by ACPI [ 1.115915] pci 0000:00:03.2: System wakeup disabled by ACPI [ 1.122479] pci 0000:00:03.3: System wakeup disabled by ACPI [ 1.132274] pci 0000:00:1c.0: Enabling MPC IRBNCE [ 1.137620] pci 0000:00:1c.0: Intel PCH root port ACS workaround enabled [ 1.145239] pci 0000:00:1c.0: System wakeup disabled by ACPI [ 1.151790] pci 0000:00:1c.7: Enabling MPC IRBNCE [ 1.157128] pci 0000:00:1c.7: Intel PCH root port ACS workaround enabled [ 1.164748] pci 0000:00:1c.7: System wakeup disabled by ACPI [ 1.171447] pci 0000:00:1e.0: System wakeup disabled by ACPI [ 1.178612] acpiphp: Slot [8] registered [ 1.183095] pci 0000:00:02.0: PCI bridge to [bus 01] [ 1.188867] acpiphp: Slot [2] registered With this patch applied, the system panics in both cases with a proper panic message. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: iommu@lists.linux-foundation.org Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Oren Twaig <oren@scalemp.com> Link: http://lkml.kernel.org/r/1420615903-28253-5-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-15 11:24:22 +01:00
Thomas Gleixner	f7ccadac2d	x86/apic: Clear stale x2apic mode If x2apic got disabled on the kernel command line, then the following issue can happen: enable_IR_x2apic() .... x2apic_mode = 1; enable_x2apic(); if (x2apic_disabled) { __disable_x2apic(); return; } That leaves X2APIC disabled in hardware, but x2apic_mode stays 1. So all other code which checks x2apic_mode gets the wrong information. Set x2apic_mode to 0 after disabling it in hardware. This is just a hotfix. The proper solution is to rework this code so it has seperate functions for the initial setup on the boot processor and the secondary cpus, but that's beyond the scope of this fix. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: iommu@lists.linux-foundation.org Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Oren Twaig <oren@scalemp.com>	2015-01-15 11:24:22 +01:00
Thomas Gleixner	a1dafe857d	iommu, x86: Restructure setup of the irq remapping feature enable_IR_x2apic() calls setup_irq_remapping_ops() which by default installs the intel dmar remapping ops and then calls the amd iommu irq remapping prepare callback to figure out whether we are running on an AMD machine with irq remapping hardware. Right after that it calls irq_remapping_prepare() which pointlessly checks: if (!remap_ops \|\| !remap_ops->prepare) return -ENODEV; and then calls remap_ops->prepare() which is silly in the AMD case as it got called from setup_irq_remapping_ops() already a few microseconds ago. Simplify this and just collapse everything into irq_remapping_prepare(). The irq_remapping_prepare() remains still silly as it assigns blindly the intel ops, but that's not scope of this patch. The scope here is to move the preperatory work, i.e. memory allocations out of the atomic section which is required to enable irq remapping. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Acked-and-tested-by: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: iommu@lists.linux-foundation.org Cc: Joerg Roedel <jroedel@suse.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Oren Twaig <oren@scalemp.com> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20141205084147.232633738@linutronix.de Link: http://lkml.kernel.org/r/1420615903-28253-2-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-15 11:24:22 +01:00
Arnd Bergmann	643165c8bb	uaccess: fix sparse warning on get/put_user for bitwise types At the moment, if p and x are both tagged as bitwise types, some of get_user(x, p), put_user(x, p), __get_user(x, p), __put_user(x, p) might produce a sparse warning on many architectures. This is a false positive: p on these architectures is loaded into long (typically using asm), then cast back to typeof(p). When typeof(p) is a bitwise type (which is uncommon), such a cast needs __force, otherwise sparse produces a warning. Some architectures already have the __force tag, add it where it's missing. I verified that adding these __force casts does not supress any useful warnings. Specifically, vhost wants to read/write bitwise types in userspace memory using get_user/put_user. At the moment this triggers sparse errors, since the value is passed through an integer. For example: __le32 __user p; __u32 x; both put_user(x, p); and get_user(x, p); should be safe, but produce warnings on some architectures. While there, I noticed that a bunch of architectures violated coding style rules within uaccess macros. Included patches to fix them up. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUtS+YAAoJECgfDbjSjVRpQ/QIAKXOc6tMXo+r/F32YC0Fv74G W4VKIk7u9XQNjOzez9i+xce75YBDBKHk5R9kLCfAg6Zew+6NRgbBV+QjGVB8dpot 2GxajcVhOySgaR45sGK3Ldg5yVz5ficqZEyYWKNgYeyMWJdlpvUk+4W5q15TiPZe u+C57/KzfRMDHyv3UkwAbqrkYGE0h7vXBi0BmOdCJlbKjG+6kFoVU/dAWsByDD5p q54ji8UdIkh2oyH5qhSbAwQN4Cg5N37Agw86HwltjQFJAVvV3yPRUsv7MQnpRB1+ hKlPXPUarNozGVV7OlcvGa9Lvz8m3a2rNd9+1tgHY0Fpia1JYAY2UdubS99fl5E= =LVcN -----END PGP SIGNATURE----- Merge tag 'uaccess_for_upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost into asm-generic Merge "uaccess: fix sparse warning on get/put_user for bitwise types" from Michael S. Tsirkin: At the moment, if p and x are both tagged as bitwise types, some of get_user(x, p), put_user(x, p), __get_user(x, p), __put_user(x, p) might produce a sparse warning on many architectures. This is a false positive: p on these architectures is loaded into long (typically using asm), then cast back to typeof(p). When typeof(p) is a bitwise type (which is uncommon), such a cast needs __force, otherwise sparse produces a warning. Some architectures already have the __force tag, add it where it's missing. I verified that adding these __force casts does not supress any useful warnings. Specifically, vhost wants to read/write bitwise types in userspace memory using get_user/put_user. At the moment this triggers sparse errors, since the value is passed through an integer. For example: __le32 __user p; __u32 x; both put_user(x, p); and get_user(x, p); should be safe, but produce warnings on some architectures. While there, I noticed that a bunch of architectures violated coding style rules within uaccess macros. Included patches to fix them up. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> * tag 'uaccess_for_upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (37 commits) sparc32: nocheck uaccess coding style tweaks sparc64: nocheck uaccess coding style tweaks xtensa: macro whitespace fixes sh: macro whitespace fixes parisc: macro whitespace fixes m68k: macro whitespace fixes m32r: macro whitespace fixes frv: macro whitespace fixes cris: macro whitespace fixes avr32: macro whitespace fixes arm64: macro whitespace fixes arm: macro whitespace fixes alpha: macro whitespace fixes blackfin: macro whitespace fixes sparc64: uaccess_64 macro whitespace fixes sparc32: uaccess_32 macro whitespace fixes avr32: whitespace fix sh: fix put_user sparse errors metag: fix put_user sparse errors ia64: fix put_user sparse errors ...	2015-01-14 23:17:49 +01:00
Timothy McCaffrey	e31ac32d3b	crypto: aesni - Add support for 192 & 256 bit keys to AESNI RFC4106 These patches fix the RFC4106 implementation in the aesni-intel module so it supports 192 & 256 bit keys. Since the AVX support that was added to this module also only supports 128 bit keys, and this patch only affects the SSE implementation, changes were also made to use the SSE version if key sizes other than 128 are specified. RFC4106 specifies that 192 & 256 bit keys must be supported (section 8.4). Also, this should fix Strongswan issue 341 where the aesni module needs to be unloaded if 256 bit keys are used: http://wiki.strongswan.org/issues/341 This patch has been tested with Sandy Bridge and Haswell processors. With 128 bit keys and input buffers > 512 bytes a slight performance degradation was noticed (~1%). For input buffers of less than 512 bytes there was no performance impact. Compared to 128 bit keys, 256 bit key size performance is approx. .5 cycles per byte slower on Sandy Bridge, and .37 cycles per byte slower on Haswell (vs. SSE code). This patch has also been tested with StrongSwan IPSec connections where it worked correctly. I created this diff from a git clone of crypto-2.6.git. Any questions, please feel free to contact me. Signed-off-by: Timothy McCaffrey <timothy.mccaffrey@unisys.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-01-14 21:56:51 +11:00
Denys Vlasenko	f6f64681d9	x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user No code changes. This is a preparatory patch for change in "struct pt_regs" handling. CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Oleg Nesterov <oleg@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Andy Lutomirski <luto@amacapital.net> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: X86 ML <x86@kernel.org> CC: Alexei Starovoitov <ast@plumgrid.com> CC: Will Drewry <wad@chromium.org> CC: Kees Cook <keescook@chromium.org> CC: linux-kernel@vger.kernel.org Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-01-13 14:18:08 -08:00
Denys Vlasenko	6c3176a216	x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET The values of these two constants are the same, the meaning is different. Acked-by: Borislav Petkov <bp@suse.de> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Oleg Nesterov <oleg@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Borislav Petkov <bp@alien8.de> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: X86 ML <x86@kernel.org> CC: Alexei Starovoitov <ast@plumgrid.com> CC: Will Drewry <wad@chromium.org> CC: Kees Cook <keescook@chromium.org> CC: linux-kernel@vger.kernel.org Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-01-13 14:10:31 -08:00
Denys Vlasenko	af9cfe270d	x86: entry_64.S: delete unused code A define, two macros and an unreferenced bit of assembly are gone. Acked-by: Borislav Petkov <bp@suse.de> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Oleg Nesterov <oleg@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Andy Lutomirski <luto@amacapital.net> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: X86 ML <x86@kernel.org> CC: Alexei Starovoitov <ast@plumgrid.com> CC: Will Drewry <wad@chromium.org> CC: Kees Cook <keescook@chromium.org> CC: linux-kernel@vger.kernel.org Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-01-13 14:00:33 -08:00
Linus Torvalds	613d4cefbb	xen: bug fixes for 3.19-rc4 - Several critical linear p2m fixes that prevented some hosts from booting. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJUtQHNAAoJEFxbo/MsZsTR/qgH/iiW4k2T8dBGZ7TPyzt88iyT 4caWjuujp2OUaRqhBQdY7z05uai6XxgJLwDyqiO+qHaRUj+ZWCrjh/ZFPU1+09hK GdwPMWU7xMRs/7F2ANO03jJ/ktvsYXtazcVrV89Q3t+ZZJIQ/THovDkaoa+dF2lh W8d5H7N2UNCJLe9w2fm5iOq4SKoTsJOq6pVQ6gUBqJcgkSDWavd6bowXnTlcepZN tNaSMZsOt4CAvYQIa0nKPJo6Q4QN3buRQMWEOAOmGVT/RkVi68wirwk59uNzcS7E HjhqxFjhXYamNTuwHYZlchBrZutdbymSlucVucb1wAoxRAX+Wd1jk5EPl6zLv4w= =kFSE -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: "Several critical linear p2m fixes that prevented some hosts from booting" * tag 'stable/for-linus-3.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: properly retrieve NMI reason xen: check for zero sized area when invalidating memory xen: use correct type for physical addresses xen: correct race in alloc_p2m_pmd() xen: correct error for building p2m list on 32 bits x86/xen: avoid freeing static 'name' when kasprintf() fails x86/xen: add extra memory for remapped frames during setup x86/xen: don't count how many PFNs are identity mapped x86/xen: Free bootmem in free_p2m_page() during early boot x86/xen: Remove unnecessary BUG_ON(preemptible()) in xen_setup_timer()	2015-01-14 08:07:42 +13:00
Masami Hiramatsu	cbf6ab52ad	kprobes: Pass the original kprobe for preparing optimized kprobe Pass the original kprobe for preparing an optimized kprobe arch-dep part, since for some architecture (e.g. ARM32) requires the information in original kprobe. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Wang Nan <wangnan0@huawei.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>	2015-01-13 16:10:16 +00:00
Michael S. Tsirkin	e182c570e9	x86/uaccess: fix sparse errors virtio wants to read bitwise types from userspace using get_user. At the moment this triggers sparse errors, since the value is passed through an integer. Fix that up using __force. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>	2015-01-13 15:22:59 +02:00
Mathias Krause	d8219f52a7	crypto: x86/des3_ede - drop bogus module aliases This module implements variations of "des3_ede" only. Drop the bogus module aliases for "des". Cc: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-01-13 22:30:52 +11:00
Mathias Krause	3e14dcf7cb	crypto: add missing crypto module aliases Commit `5d26a105b5` ("crypto: prefix module autoloading with "crypto-"") changed the automatic module loading when requesting crypto algorithms to prefix all module requests with "crypto-". This requires all crypto modules to have a crypto specific module alias even if their file name would otherwise match the requested crypto algorithm. Even though commit `5d26a105b5` added those aliases for a vast amount of modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO annotations to those files to make them get loaded automatically, again. This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work with kernels v3.18 and below. Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former won't work for crypto modules any more. Fixes: `5d26a105b5` ("crypto: prefix module autoloading with "crypto-"") Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-01-13 22:29:11 +11:00
Alexander Kuleshov	b34630014d	x86, early_serial_console: Remove unnecessary check We do this check already a couple of lines up. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Link: http://lkml.kernel.org/r/1420009958-4803-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-01-13 12:14:44 +01:00
Alexander Kuleshov	e054273a9b	x86, early_serial_console: Remove unused macro XMTRDY There is no write to serial routine, no need for XMTRDY macro. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Link: http://lkml.kernel.org/r/1420034191-20721-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-01-13 12:13:11 +01:00
Alexander Kuleshov	60b217a03b	x86, setup: Rename BOOT_ISDIGIT_H to BOOT_CTYPE_H arch/x86/boot/isdigit.h was renamed to arch/x86/boot/ctype.h in `6238b47b58` ("x86, setup: move isdigit.h to ctype.h, header files on top.") Adjust guards too. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Link: http://lkml.kernel.org/r/1420267941-26390-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-01-13 11:59:04 +01:00
Jan Beulich	f221b04fe0	x86/xen: properly retrieve NMI reason Using the native code here can't work properly, as the hypervisor would normally have cleared the two reason bits by the time Dom0 gets to see the NMI (if passed to it at all). There's a shared info field for this, and there's an existing hook to use - just fit the two together. This is particularly relevant so that NMIs intended to be handled by APEI / GHES actually make it to the respective handler. Note that the hook can (and should) be used irrespective of whether being in Dom0, as accessing port 0x61 in a DomU would be even worse, while the shared info field would just hold zero all the time. Note further that hardware NMI handling for PVH doesn't currently work anyway due to missing code in the hypervisor (but it is expected to work the native rather than the PV way). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-13 09:39:50 +00:00
Juergen Gross	9a17ad7f3d	xen: check for zero sized area when invalidating memory With the introduction of the linear mapped p2m list setting memory areas to "invalid" had to be delayed. When doing the invalidation make sure no zero sized areas are processed. Signed-off-by: Juegren Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-12 10:09:55 +00:00
Juergen Gross	e86f949667	xen: use correct type for physical addresses When converting a pfn to a physical address be sure to use 64 bit wide types or convert the physical address to a pfn if possible. Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-12 10:09:48 +00:00
Juergen Gross	f241b0b891	xen: correct race in alloc_p2m_pmd() When allocating a new pmd for the linear mapped p2m list a check is done for not introducing another pmd when this just happened on another cpu. In this case the old pte pointer was returned which points to the p2m_missing or p2m_identity page. The correct value would be the pointer to the found new page. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-12 10:09:40 +00:00
Juergen Gross	82c92ed135	xen: correct error for building p2m list on 32 bits In xen_rebuild_p2m_list() for large areas of invalid or identity mapped memory the pmd entries on 32 bit systems are initialized wrong. Correct this error. Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-12 10:09:34 +00:00
Linus Torvalds	505569d208	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: two vdso fixes, two kbuild fixes and a boot failure fix with certain odd memory mappings" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, vdso: Use asm volatile in __getcpu x86/build: Clean auto-generated processor feature files x86: Fix mkcapflags.sh bash-ism x86: Fix step size adjustment during initial memory mapping x86_64, vdso: Fix the vdso address randomization algorithm	2015-01-11 11:53:46 -08:00
Linus Torvalds	ddb321a8dd	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, but also some kernel side fixes: uncore PMU driver fix, user regs sampling fix and an instruction decoder fix that unbreaks PEBS precise sampling" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes perf/x86_64: Improve user regs sampling perf: Move task_pt_regs sampling into arch code x86: Fix off-by-one in instruction decoder perf hists browser: Fix segfault when showing callchain perf callchain: Free callchains when hist entries are deleted perf hists: Fix children sort key behavior perf diff: Fix to sort by baseline field by default perf list: Fix --raw-dump option perf probe: Fix crash in dwarf_getcfi_elf perf probe: Fix to fall back to find probe point in symbols perf callchain: Append callchains only when requested perf ui/tui: Print backtrace symbols when segfault occurs perf report: Show progress bar for output resorting	2015-01-11 11:47:45 -08:00
Steven Honeyman	f94fe119f2	x86, CPU: Fix trivial printk formatting issues with dmesg dmesg (from util-linux) currently has two methods for reading the kernel message ring buffer: /dev/kmsg and syslog(2). Since kernel 3.5.0 kmsg has been the default, which escapes control characters (e.g. new lines) before they are shown. This change means that when dmesg is using /dev/kmsg, a 2 line printk makes the output messy, because the second line does not get a timestamp. For example: [ 0.012863] CPU0: Thermal monitoring enabled (TM1) [ 0.012869] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024 Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4 [ 0.012958] Freeing SMP alternatives memory: 28K (ffffffff81d86000 - ffffffff81d8d000) [ 0.014961] dmar: Host address width 39 Because printk.c intentionally escapes control characters, they should not be there in the first place. This patch fixes two occurrences of this. Signed-off-by: Steven Honeyman <stevenhoneyman@gmail.com> Link: https://lkml.kernel.org/r/1414856696-8094-1-git-send-email-stevenhoneyman@gmail.com [ Boris: make cpu_detect_tlb() static, while at it. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2015-01-11 01:54:54 +01:00
Masahiro Yamada	665d92e38f	kbuild: do not add $(call ...) to invoke cc-version or cc-fullversion The macros cc-version, cc-fullversion and ld-version take no argument. It is not necessary to add $(call ...) to invoke them. Signed-off-by: Masahiro Yamada <yamada.m@jp.panasonic.com> Acked-by: Helge Deller <deller@gmx.de> [parisc] Signed-off-by: Michal Marek <mmarek@suse.cz>	2015-01-09 17:25:44 +01:00

... 7 8 9 10 11 ...

21438 Commits