linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-18 09:02:17 +00:00

Author	SHA1	Message	Date
Xiao Guangrong	0d5367900a	KVM: MMU: introduce for_each_rmap_spte() It's used to walk all the sptes on the rmap to clean up the code Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-19 20:52:38 +02:00
Paolo Bonzini	c35ebbeade	Revert "kvmclock: set scheduler clock stable" This reverts commit `ff7bbb9c6a`. Sasha Levin is seeing odd jump in time values during boot of a KVM guest: [...] [ 0.000000] tsc: Detected 2260.998 MHz processor [3376355.247558] Calibrating delay loop (skipped) preset value.. [...] and bisected them to this commit. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-19 20:52:37 +02:00
Xiao Guangrong	edc90b7dc4	KVM: MMU: fix SMAP virtualization KVM may turn a user page to a kernel page when kernel writes a readonly user page if CR0.WP = 1. This shadow page entry will be reused after SMAP is enabled so that kernel is allowed to access this user page Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu once CR4.SMAP is updated Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-19 20:52:36 +02:00
Nadav Amit	428e3d0857	KVM: x86: Fix zero iterations REP-string When a REP-string is executed in 64-bit mode with an address-size prefix, ECX/EDI/ESI are used as counter and pointers. When ECX is initially zero, Intel CPUs clear the high 32-bits of RCX, and recent Intel CPUs update the high bits of the pointers in MOVS/STOS. This behavior is specific to Intel according to few experiments. As one may guess, this is an undocumented behavior. Yet, it is observable in the guest, since at least VMX traps REP-INS/OUTS even when ECX=0. Note that VMware appears to get it right. The behavior can be observed using the following code: #include <stdio.h> #define LOW_MASK (0xffffffff00000000ull) #define ALL_MASK (0xffffffffffffffffull) #define TEST(opcode) \ do { \ asm volatile(".byte 0xf2 \n\t .byte 0x67 \n\t .byte " opcode "\n\t" \ : "=S"(s), "=c"(c), "=D"(d) \ : "S"(ALL_MASK), "c"(LOW_MASK), "D"(ALL_MASK)); \ printf("opcode %s rcx=%llx rsi=%llx rdi=%llx\n", \ opcode, c, s, d); \ } while(0) void main() { unsigned long long s, d, c; iopl(3); TEST("0x6c"); TEST("0x6d"); TEST("0x6e"); TEST("0x6f"); TEST("0xa4"); TEST("0xa5"); TEST("0xa6"); TEST("0xa7"); TEST("0xaa"); TEST("0xab"); TEST("0xae"); TEST("0xaf"); } Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-19 20:52:36 +02:00
Nadav Amit	ee122a7109	KVM: x86: Fix update RCX/RDI/RSI on REP-string When REP-string instruction is preceded with an address-size prefix, ECX/EDI/ESI are used as the operation counter and pointers. When they are updated, the high 32-bits of RCX/RDI/RSI are cleared, similarly to the way they are updated on every 32-bit register operation. Fix it. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-19 20:52:35 +02:00
Nadav Amit	3db176d5b4	KVM: x86: Fix DR7 mask on task-switch while debugging If the host sets hardware breakpoints to debug the guest, and a task-switch occurs in the guest, the architectural DR7 will not be updated. The effective DR7 would be updated instead. This fix puts the DR7 update during task-switch emulation, so it now uses the standard DR setting mechanism instead of the one that was previously used. As a bonus, the update of DR7 will now be effective for AMD as well. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-19 20:52:35 +02:00
Paolo Bonzini	31fd9880a1	KVM: MMU: fix CR4.SMEP=1, CR0.WP=0 with shadow pages smep_andnot_wp is initialized in kvm_init_shadow_mmu and shadow pages should not be reused for different values of it. Thus, it has to be added to the mask in kvm_mmu_pte_write. Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-08 10:51:47 +02:00
Xiao Guangrong	ceee7df749	KVM: MMU: fix smap permission check Current permission check assumes that RSVD bit in PFEC is always zero, however, it is not true since MMIO #PF will use it to quickly identify MMIO access Fix it by clearing the bit if walking guest page table is needed Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-08 10:51:47 +02:00
Jan Kiszka	8a9781f7ad	KVM: nVMX: Fix host crash when loading MSRs with userspace irqchip vcpu->arch.apic is NULL when a userspace irqchip is active. But instead of letting the test incorrectly depend on in-kernel irqchip mode, open-code it to catch also userspace x2APICs. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-08 10:51:45 +02:00
Nadav Amit	acac6f8957	KVM: x86: Call-far should not be emulated as stack op Far call in 64-bit has a 32-bit operand size. Remove the marking of this operation as Stack so it can be emulated correctly in 64-bit. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-08 10:51:44 +02:00
Paolo Bonzini	4eb64dce8d	KVM: x86: dump VMCS on invalid entry Code and format roughly based on Xen's vmcs_dump_vcpu. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:30:40 +02:00
Marcelo Tosatti	a3eb97bd80	x86: kvmclock: drop rdtsc_barrier() Drop unnecessary rdtsc_barrier(), as has been determined empirically, see `057e6a8c66` for details. Noticed by Andy Lutomirski. Improves clock_gettime() by approximately 15% on Intel i7-3520M @ 2.90GHz. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:48 +02:00
Julia Lawall	d90e3a35e9	KVM: x86: drop unneeded null test If the null test is needed, the call to cancel_delayed_work_sync would have already crashed. Normally, the destroy function should only be called if the init function has succeeded, in which case ioapic is not null. Problem found using Coccinelle. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:47 +02:00
Radim Krčmář	74545705cb	KVM: x86: fix initial PAT value PAT should be 0007_0406_0007_0406h on RESET and not modified on INIT. VMX used a wrong value (host's PAT) and while SVM used the right one, it never got to arch.pat. This is not an issue with QEMU as it will force the correct value. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:46 +02:00
Rik van Riel	653f52c316	kvm,x86: load guest FPU context more eagerly Currently KVM will clear the FPU bits in CR0.TS in the VMCS, and trap to re-load them every time the guest accesses the FPU after a switch back into the guest from the host. This patch copies the x86 task switch semantics for FPU loading, with the FPU loaded eagerly after first use if the system uses eager fpu mode, or if the guest uses the FPU frequently. In the latter case, after loading the FPU for 255 times, the fpu_counter will roll over, and we will revert to loading the FPU on demand, until it has been established that the guest is still actively using the FPU. This mirrors the x86 task switch policy, which seems to work. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:45 +02:00
James Sullivan	d1ebdbf99a	kvm: x86: Deliver MSI IRQ to only lowest prio cpu if msi_redir_hint is true An MSI interrupt should only be delivered to the lowest priority CPU when it has RH=1, regardless of the delivery mode. Modified kvm_is_dm_lowest_prio() to check for either irq->delivery_mode == APIC_DM_LOWPRI or irq->msi_redir_hint. Moved kvm_is_dm_lowest_prio() into lapic.h and renamed to kvm_lowest_prio_delivery(). Changed a check in kvm_irq_delivery_to_apic_fast() from irq->delivery_mode == APIC_DM_LOWPRI to kvm_is_dm_lowest_prio(). Signed-off-by: James Sullivan <sullivan.james.f@gmail.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:45 +02:00
James Sullivan	93bbf0b8bc	kvm: x86: Extended struct kvm_lapic_irq with msi_redir_hint for MSI delivery Extended struct kvm_lapic_irq with bool msi_redir_hint, which will be used to determine if the delivery of the MSI should target only the lowest priority CPU in the logical group specified for delivery. (In physical dest mode, the RH bit is not relevant). Initialized the value of msi_redir_hint to true when RH=1 in kvm_set_msi_irq(), and initialized to false in all other cases. Added value of msi_redir_hint to a debug message dump of an IRQ in apic_send_ipi(). Signed-off-by: James Sullivan <sullivan.james.f@gmail.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:44 +02:00
Paolo Bonzini	b7cb223173	KVM: x86: tweak types of fields in kvm_lapic_irq Change to u16 if they only contain data in the low 16 bits. Change the level field to bool, since we assign 1 sometimes, but just mask icr_low with APIC_INT_ASSERT in apic_send_ipi. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:43 +02:00
Nadav Amit	d28bc9dd25	KVM: x86: INIT and reset sequences are different x86 architecture defines differences between the reset and INIT sequences. INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU, MSRs (in general), MTRRs machine-check, APIC ID, APIC arbitration ID and BSP. References (from Intel SDM): "If the MP protocol has completed and a BSP is chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP protocol to be repeated." [8.4.2: MP Initialization Protocol Requirements and Restrictions] [Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT] "If the processor is reset by asserting the INIT# pin, the x87 FPU state is not changed." [9.2: X87 FPU INITIALIZATION] "The state of the local APIC following an INIT reset is the same as it is after a power-up or hardware reset, except that the APIC ID and arbitration ID registers are not affected." [10.4.7.3: Local APIC State After an INIT Reset ("Wait-for-SIPI" State)] Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1428924848-28212-1-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:43 +02:00
Nadav Amit	90de4a1875	KVM: x86: Support for disabling quirks Introducing KVM_CAP_DISABLE_QUIRKS for disabling x86 quirks that were previous created in order to overcome QEMU issues. Those issue were mostly result of invalid VM BIOS. Currently there are two quirks that can be disabled: 1. KVM_QUIRK_LINT0_REENABLED - LINT0 was enabled after boot 2. KVM_QUIRK_CD_NW_CLEARED - CD and NW are cleared after boot These two issues are already resolved in recent releases of QEMU, and would therefore be disabled by QEMU. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1428879221-29996-1-git-send-email-namit@cs.technion.ac.il> [Report capability from KVM_CHECK_EXTENSION too. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:42 +02:00
Christian Borntraeger	ccf73aaf5a	KVM: arm/mips/x86/power use __kvm_guest_{enter\|exit} Use __kvm_guest_{enter\|exit} instead of kvm_guest_{enter\|exit} where interrupts are disabled. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:28:22 +02:00
Luiz Capitulino	ff7bbb9c6a	kvmclock: set scheduler clock stable If you try to enable NOHZ_FULL on a guest today, you'll get the following error when the guest tries to deactivate the scheduler tick: WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290() NO_HZ FULL will not work with unstable sched clock CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb #204 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: events flush_to_ldisc ffffffff8162a0c7 ffff88011f583e88 ffffffff814e6ba0 0000000000000002 ffff88011f583ed8 ffff88011f583ec8 ffffffff8104d095 ffff88011f583eb8 0000000000000000 0000000000000003 0000000000000001 0000000000000001 Call Trace: <IRQ> [<ffffffff814e6ba0>] dump_stack+0x4f/0x7b [<ffffffff8104d095>] warn_slowpath_common+0x85/0xc0 [<ffffffff8104d146>] warn_slowpath_fmt+0x46/0x50 [<ffffffff810bd2a9>] can_stop_full_tick+0xb9/0x290 [<ffffffff810bd9ed>] tick_nohz_irq_exit+0x8d/0xb0 [<ffffffff810511c5>] irq_exit+0xc5/0x130 [<ffffffff814f180a>] smp_apic_timer_interrupt+0x4a/0x60 [<ffffffff814eff5e>] apic_timer_interrupt+0x6e/0x80 <EOI> [<ffffffff814ee5d1>] ? _raw_spin_unlock_irqrestore+0x31/0x60 [<ffffffff8108bbc8>] __wake_up+0x48/0x60 [<ffffffff8134836c>] n_tty_receive_buf_common+0x49c/0xba0 [<ffffffff8134a6bf>] ? tty_ldisc_ref+0x1f/0x70 [<ffffffff81348a84>] n_tty_receive_buf2+0x14/0x20 [<ffffffff8134b390>] flush_to_ldisc+0xe0/0x120 [<ffffffff81064d05>] process_one_work+0x1d5/0x540 [<ffffffff81064c81>] ? process_one_work+0x151/0x540 [<ffffffff81065191>] worker_thread+0x121/0x470 [<ffffffff81065070>] ? process_one_work+0x540/0x540 [<ffffffff8106b4df>] kthread+0xef/0x110 [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0 [<ffffffff814ef4f2>] ret_from_fork+0x42/0x70 [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0 ---[ end trace 06e3507544a38866 ]--- However, it turns out that kvmclock does provide a stable sched_clock callback. So, let the scheduler know this which in turn makes NOHZ_FULL work in the guest. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:28:20 +02:00
Paolo Bonzini	73459e2a1a	x86: pvclock: Really remove the sched notifier for cross-cpu migrations This reverts commits `0a4e6be9ca` and `80f7fdb1c7`. The task migration notifier was originally introduced in order to support the pvclock vsyscall with non-synchronized TSC, but KVM only supports it with synchronized TSC. Hence, on KVM the race condition is only needed due to a bad implementation on the host side, and even then it's so rare that it's mostly theoretical. As far as KVM is concerned it's possible to fix the host, avoiding the additional complexity in the vDSO and the (re)introduction of the task migration notifier. Xen, on the other hand, hasn't yet implemented vsyscall support at all, so we do not care about its plans for non-synchronized TSC. Reported-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-27 15:49:30 +02:00
Radim Krčmář	5dca0d9147	kvm: x86: fix kvmclock update protocol The kvmclock spec says that the host will increment a version field to an odd number, then update stuff, then increment it to an even number. The host is buggy and doesn't do this, and the result is observable when one vcpu reads another vcpu's kvmclock data. There's no good way for a guest kernel to keep its vdso from reading a different vcpu's kvmclock data, but we don't need to care about changing VCPUs as long as we read a consistent data from kvmclock. (VCPU can change outside of this loop too, so it doesn't matter if we return a value not fit for this VCPU.) Based on a patch by Radim Krčmář. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-27 15:48:59 +02:00
Andy Lutomirski	61f01dd941	x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with SS == 0 results in an invalid usermode state in which SS is apparently equal to __USER_DS but causes #SS if used. Work around the issue by setting SS to __KERNEL_DS __switch_to, thus ensuring that SYSRET never happens with SS set to NULL. This was exposed by a recent vDSO cleanup. Fixes: `e7d6eefaaa` x86/vdso32/syscall.S: Do not load __USER32_DS to %ss Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Brian Gerst <brgerst@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-26 17:57:38 -07:00
Linus Torvalds	9ec3a646fe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only	2015-04-26 17:22:07 -07:00
Linus Torvalds	d89b3e19ef	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This push fixes a build problem with img-hash under non-standard configurations and a serious regression with sha512_ssse3 which can lead to boot failures" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: img-hash - CRYPTO_DEV_IMGTEC_HASH should depend on HAS_DMA crypto: x86/sha512_ssse3 - fixup for asm function prototype change	2015-04-26 13:51:05 -07:00
Linus Torvalds	eadf16a912	This mostly includes the PPC changes for 4.1, which this time cover Book3S HV only (debugging aids, minor performance improvements and some cleanups). But there are also bug fixes and small cleanups for ARM, x86 and s390. The task_migration_notifier revert and real fix is still pending review, but I'll send it as soon as possible after -rc1. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJVOONLAAoJEL/70l94x66DbsMIAIpZPsaqgXOC1sDEiZuYay+6 rD4n4id7j8hIAzcf3AlZdyf5XgLlr6I1Zyt62s1WcoRq/CCnL7k9EljzSmw31WFX P2y7/J0iBdkn0et+PpoNThfL2GsgTqNRCLOOQlKgEQwMP9Dlw5fnUbtC1UchOzTg eAMeBIpYwufkWkXhdMw4PAD4lJ9WxUZ1eXHEBRzJb0o0ZxAATJ1tPZGrFJzoUOSM WsVNTuBsNd7upT02kQdvA1TUo/OPjseTOEoksHHwfcORt6bc5qvpctL3jYfcr7sk /L6sIhYGVNkjkuredjlKGLfT2DDJjSEdJb1k2pWrDRsY76dmottQubAE9J9cDTk= =OAi2 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull second batch of KVM changes from Paolo Bonzini: "This mostly includes the PPC changes for 4.1, which this time cover Book3S HV only (debugging aids, minor performance improvements and some cleanups). But there are also bug fixes and small cleanups for ARM, x86 and s390. The task_migration_notifier revert and real fix is still pending review, but I'll send it as soon as possible after -rc1" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits) KVM: arm/arm64: check IRQ number on userland injection KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi KVM: VMX: Preserve host CR4.MCE value while in guest mode. KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C KVM: PPC: Book3S HV: Streamline guest entry and exit KVM: PPC: Book3S HV: Use bitmap of active threads rather than count KVM: PPC: Book3S HV: Use decrementer to wake napping threads KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu KVM: PPC: Book3S HV: Minor cleanups KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update KVM: PPC: Book3S HV: Accumulate timing information for real-mode code KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT KVM: PPC: Book3S HV: Add ICP real mode counters KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock KVM: PPC: Book3S HV: Add guest->host real mode completion counters KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte ...	2015-04-26 13:06:22 -07:00
Linus Torvalds	836ee4874e	Initial ACPI support for arm64: This series introduces preliminary ACPI 5.1 support to the arm64 kernel using the "hardware reduced" profile. We don't support any peripherals yet, so it's fairly limited in scope: - Memory init (UEFI) - ACPI discovery (RSDP via UEFI) - CPU init (FADT) - GIC init (MADT) - SMP boot (MADT + PSCI) - ACPI Kconfig options (dependent on EXPERT) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCgAGBQJVNOC2AAoJELescNyEwWM08dIH/1Pn5xa04wwNDn0MOpbuQMk2 kHM7hx69fbXflTJpnZRVyFBjRxxr5qilA7rljAFLnFeF8Fcll/s5VNy7ElHKLISq CB0ywgUfOd/sFJH57rcc67pC1b/XuqTbE1u1NFwvp2R3j1kGAEJWNA6SyxIP4bbc NO5jScx0lQOJ3rrPAXBW8qlGkeUk7TPOQJtMrpftNXlFLFrR63rPaEmMZ9dWepBF aRE4GXPvyUhpyv5o9RvlN5l8bQttiRJ3f9QjyG7NYhX0PXH3DyvGUzYlk2IoZtID v3ssCQH3uRsAZHIBhaTyNqFnUIaDR825bvGqyG/tj2Dt3kQZiF+QrfnU5D9TuMw= =zLJn -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull initial ACPI support for arm64 from Will Deacon: "This series introduces preliminary ACPI 5.1 support to the arm64 kernel using the "hardware reduced" profile. We don't support any peripherals yet, so it's fairly limited in scope: - MEMORY init (UEFI) - ACPI discovery (RSDP via UEFI) - CPU init (FADT) - GIC init (MADT) - SMP boot (MADT + PSCI) - ACPI Kconfig options (dependent on EXPERT) ACPI for arm64 has been in development for a while now and hardware has been available that can boot with either FDT or ACPI tables. This has been made possible by both changes to the ACPI spec to cater for ARM-based machines (known as "hardware-reduced" in ACPI parlance) but also a Linaro-driven effort to get this supported on top of the Linux kernel. This pull request is the result of that work. These changes allow us to initialise the CPUs, interrupt controller, and timers via ACPI tables, with memory information and cmdline coming from EFI. We don't support a hybrid ACPI/FDT scheme. Of course, there is still plenty of work to do (a serial console would be nice!) but I expect that to happen on a per-driver basis after this core series has been merged. Anyway, the diff stat here is fairly horrible, but splitting this up and merging it via all the different subsystems would have been extremely painful. Instead, we've got all the relevant Acks in place and I've not seen anything other than trivial (Kconfig) conflicts in -next (for completeness, I've included my resolution below). Nearly half of the insertions fall under Documentation/. So, we'll see how this goes. Right now, it all depends on EXPERT and I fully expect people to use FDT by default for the immediate future" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (31 commits) ARM64 / ACPI: make acpi_map_gic_cpu_interface() as void function ARM64 / ACPI: Ignore the return error value of acpi_map_gic_cpu_interface() ARM64 / ACPI: fix usage of acpi_map_gic_cpu_interface ARM64: kernel: acpi: honour acpi=force command line parameter ARM64: kernel: acpi: refactor ACPI tables init and checks ARM64: kernel: psci: let ACPI probe PSCI version ARM64: kernel: psci: factor out probe function ACPI: move arm64 GSI IRQ model to generic GSI IRQ layer ARM64 / ACPI: Don't unflatten device tree if acpi=force is passed ARM64 / ACPI: additions of ACPI documentation for arm64 Documentation: ACPI for ARM64 ARM64 / ACPI: Enable ARM64 in Kconfig XEN / ACPI: Make XEN ACPI depend on X86 ARM64 / ACPI: Select ACPI_REDUCED_HARDWARE_ONLY if ACPI is enabled on ARM64 clocksource / arch_timer: Parse GTDT to initialize arch timer irqchip: Add GICv2 specific ACPI boot support ARM64 / ACPI: Introduce ACPI_IRQ_MODEL_GIC and register device's gsi ACPI / processor: Make it possible to get CPU hardware ID via GICC ACPI / processor: Introduce phys_cpuid_t for CPU hardware ID ARM64 / ACPI: Parse MADT for SMP initialization ...	2015-04-24 08:23:45 -07:00
Linus Torvalds	d869844bd0	x86: fix special __probe_kernel_write() tail zeroing case Commit `cae2a173fe` ("x86: clean up/fix 'copy_in_user()' tail zeroing") fixed the failure case tail zeroing of one special case of the x86-64 generic user-copy routine, namely when used for the user-to-user case ("copy_in_user()"). But in the process it broke an even more unusual case: using the user copy routine for kernel-to-kernel copying. Now, normally kernel-kernel copies are obviously done using memcpy(), but we have a couple of special cases when we use the user-copy functions. One is when we pass a kernel buffer to a regular user-buffer routine, using set_fs(KERNEL_DS). That's a "normal" case, and continued to work fine, because it never takes any faults (with the possible exception of a silent and successful vmalloc fault). But Jan Beulich pointed out another, very unusual, special case: when we use the user-copy routines not because it's a path that expects a user pointer, but for a couple of ftrace/kgdb cases that want to do a kernel copy, but do so using "unsafe" buffers, and use the user-copy routine to gracefully handle faults. IOW, for probe_kernel_write(). And that broke for the case of a faulting kernel destination, because we saw the kernel destination and wanted to try to clear the tail of the buffer. Which doesn't work, since that's what faults. This only triggers for things like kgdb and ftrace users (eg trying setting a breakpoint on read-only memory), but it's definitely a bug. The fix is to not compare against the kernel address start (TASK_SIZE), but instead use the same limits "access_ok()" uses. Reported-and-tested-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org # 4.0 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-24 06:58:27 -07:00
Ard Biesheuvel	00425bb181	crypto: x86/sha512_ssse3 - fixup for asm function prototype change Patch `e68410ebf6` ("crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer") changed the prototypes of the core asm SHA-512 implementations so that they are compatible with the prototype used by the base layer. However, in one instance, the register that was used for passing the input buffer was reused as a scratch register later on in the code, and since the input buffer param changed places with the digest param -which needs to be written back before the function returns- this resulted in the scratch register to be dereferenced in a memory write operation, causing a GPF. Fix this by changing the scratch register to use the same register as the input buffer param again. Fixes: `e68410ebf6` ("crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer") Reported-By: Bobby Powers <bobbypowers@gmail.com> Tested-By: Bobby Powers <bobbypowers@gmail.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-04-24 20:09:01 +08:00
Linus Torvalds	b9bb6fb73b	Some virtio internal cleanups, a new virtio device "virtio input", and a change to allow the legacy virtio balloon. Most excitingly, some lguest work! No seriously, I got some cleanup patches. Cheers, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVN1SjAAoJENkgDmzRrbjxeDoP+wZnZdG4cHNc6ifiNPkSed9m cKWV7L6uTxczdFKcTNpDShn2MW0XqbcHc+VdBH9Exl3+cyick6fuhpi6SjLby0g6 a40RldysRMAc/K/dK40dG4qtSUT1uwDrOYNonMDjx1RAikO3DoTGUm4YgYZKSlM/ pKuCbAebM3dZ6EUVnaJICHWkJvY7Bk9JwGL6Z8RhF7lunVAGqIMHH9GklqSCyNiY LK+05hNXHv/OOIAkEO+ZmDrWSagogggGXEdRFom9s87xmu9GVse7Fzfq9pZ5nQre gickgBeC+gN8das1wvhlTp22F8XJslC0IRJhvbwLMQUd16hrH1YUIdvsqry/Qxds 04GgzLTVA/Z5VVEVm9MXcKWGwcsnUBu9EChsdEKZwNgBz9UF2gs39My8Co6AZ7U/ Ajcpksl22RXaR7OB65vRPIk23mh/NchGSzVGFbppzCwj2SkO9ONSFrDj3mAzfbhR 9NHi32Xm0+LdN444WCo1NzahKLAX5bYCv2ZSDs5JEBDQzmW2FWKO2ZaVJ84jpG6O O4XppI/X8cP+dxTs8xH91qh9GGmq9Aa41iuekZh/jG/8fLFT45rhlzLJfwh2B9rI djcaFFLFt+in5R6kgugM9dbCNALneXgGDnzlmqy5RwOrrCTwhyGn6DMwDqRz7EHn gsbiiv6eSsrgX4mLHP2n =Wj06 -----END PGP SIGNATURE----- Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio updates from Rusty Russell: "Some virtio internal cleanups, a new virtio device "virtio input", and a change to allow the legacy virtio balloon. Most excitingly, some lguest work! No seriously, I got some cleanup patches" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: virtio: drop virtio_device_is_legacy_only virtio_pci: support non-legacy balloon devices virtio_mmio: support non-legacy balloon devices virtio_ccw: support non-legacy balloon devices virtio: balloon might not be a legacy device virtio_balloon: transitional interface virtio_ring: Update weak barriers to use dma_wmb/rmb virtio_pci_modern: switch to type-safe io accessors virtio_pci_modern: type-safe io accessors lguest: handle traps on the "interrupt suppressed" iret instruction. virtio: drop a useless config read virtio_config: reorder functions Add virtio-input driver. lguest: suppress interrupts for single insn, not range. lguest: simplify lguest_iret lguest: rename i386_head.S in the comments lguest: explicitly set miscdevice's private_data NULL lguest: fix pending interrupt test.	2015-04-22 10:55:06 -07:00
Ben Serebrin	085e68eeaf	KVM: VMX: Preserve host CR4.MCE value while in guest mode. The host's decision to enable machine check exceptions should remain in force during non-root mode. KVM was writing 0 to cr4 on VCPU reset and passed a slightly-modified 0 to the vmcs.guest_cr4 value. Tested: Built. On earlier version, tested by injecting machine check while a guest is spinning. Before the change, if guest CR4.MCE==0, then the machine check is escalated to Catastrophic Error (CATERR) and the machine dies. If guest CR4.MCE==1, then the machine check causes VMEXIT and is handled normally by host Linux. After the change, injecting a machine check causes normal Linux machine check handling. Signed-off-by: Ben Serebrin <serebrin@google.com> Reviewed-by: Venkatesh Srinivas <venkateshs@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-21 19:01:44 +02:00
Linus Torvalds	1fc149933f	Char/Misc driver patches for 4.1-rc1 Here's the big char/misc driver patchset for 4.1-rc1. Lots of different driver subsystem updates here, nothing major, full details are in the shortlog below. All of this has been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlU2IMEACgkQMUfUDdst+yloDQCfbyIRL23WVAn9ckQse/y8gbjB OT4AoKTJbwndDP9Kb/lrj2tjd9QjNVrC =xhen -----END PGP SIGNATURE----- Merge tag 'char-misc-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here's the big char/misc driver patchset for 4.1-rc1. Lots of different driver subsystem updates here, nothing major, full details are in the shortlog. All of this has been in linux-next for a while" * tag 'char-misc-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (133 commits) mei: trace: remove unused TRACE_SYSTEM_STRING DTS: ARM: OMAP3-N900: Add lis3lv02d support Documentation: DT: lis302: update wakeup binding lis3lv02d: DT: add wakeup unit 2 and wakeup threshold lis3lv02d: DT: use s32 to support negative values Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case Drivers: hv: hv_balloon: correctly handle val.freeram<num_pages case mei: replace check for connection instead of transitioning mei: use mei_cl_is_connected consistently mei: fix mei_poll operation hv_vmbus: Add gradually increased delay for retries in vmbus_post_msg() Drivers: hv: hv_balloon: survive ballooning request with num_pages=0 Drivers: hv: hv_balloon: eliminate jumps in piecewiese linear floor function Drivers: hv: hv_balloon: do not online pages in offline blocks hv: remove the per-channel workqueue hv: don't schedule new works in vmbus_onoffer()/vmbus_onoffer_rescind() hv: run non-blocking message handlers in the dispatch tasklet coresight: moving to new "hwtracing" directory coresight-tmc: Adding a status interface to sysfs coresight: remove the unnecessary configuration coresight-default-sink ...	2015-04-21 09:42:58 -07:00
Linus Torvalds	41d5e08ea8	TTY/Serial patches for 4.1-rc1 Here's the big tty/serial driver update for 4.1-rc1. It was delayed for a bit due to some questions surrounding some of the console command line parsing changes that are in here. There's still one tiny regression for people who were previously putting multiple console command lines and expecting them all to be ignored for some odd reason, but Peter is working on fixing that. If not, I'll send a revert for the offending patch, but I have faith that Peter can address it. Other than the console work here, there's the usual serial driver updates and changes, and a buch of 8250 reworks to try to make that driver easier to maintain over time, and have it support more devices in the future. All of these have been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlU2IcUACgkQMUfUDdst+ylFqACcC8LPhFEZg9aHn0hNUoqGK3rE 5dUAnR4b8r/NYqjVoE9FJZgZfB/TqVi1 =lyN/ -----END PGP SIGNATURE----- Merge tag 'tty-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here's the big tty/serial driver update for 4.1-rc1. It was delayed for a bit due to some questions surrounding some of the console command line parsing changes that are in here. There's still one tiny regression for people who were previously putting multiple console command lines and expecting them all to be ignored for some odd reason, but Peter is working on fixing that. If not, I'll send a revert for the offending patch, but I have faith that Peter can address it. Other than the console work here, there's the usual serial driver updates and changes, and a buch of 8250 reworks to try to make that driver easier to maintain over time, and have it support more devices in the future. All of these have been in linux-next for a while" * tag 'tty-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (119 commits) n_gsm: Drop unneeded cast on netdev_priv sc16is7xx: expose RTS inversion in RS-485 mode serial: 8250_pci: port failed after wakeup from S3 earlycon: 8250: Document kernel command line options earlycon: 8250: Fix command line regression earlycon: Fix __earlycon_table stride tty: clean up the tty time logic a bit serial: 8250_dw: only get the clock rate in one place serial: 8250_dw: remove useless ACPI ID check dmaengine: hsu: move memory allocation to GFP_NOWAIT dmaengine: hsu: remove redundant pieces of code serial: 8250_pci: add Intel Tangier support dmaengine: hsu: add Intel Tangier PCI ID serial: 8250_pci: replace switch-case by formula for Intel MID serial: 8250_pci: replace switch-case by formula tty: cpm_uart: replace CONFIG_8xx by CONFIG_CPM1 serial: jsm: some off by one bugs serial: xuartps: Fix check in console_setup(). serial: xuartps: Get rid of register access macros. serial: xuartps: Fix iobase use. ...	2015-04-21 09:33:10 -07:00
Linus Torvalds	6496edfce9	This is the final removal (after several years!) of the obsolete cpus_* functions, prompted by their mis-use in staging. With these function removed, all cpu functions should only iterate to nr_cpu_ids, so we finally only allocate that many bits when cpumasks are allocated offstack. Thanks, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVNPMuAAoJENkgDmzRrbjx7ZIP/j65e6xs1jEyXR3WOYSdTU1x bMo6JcII6O1oEZLgyKXgx9KiBg6uIIDta1NG/H/XIe354dwfHVsHvj5HHHQR5Xof iRrjLOaHj4XglI3hvsk0eEEl3/OBBLgyo9bUwDvMF1fmr/9tW4caMs3Op6n7Evzm YIvoAyeJ0A8BfEtOU5lXhcVIGmnHtSw0x6mdGXpXIBmWYQPCtdQP868s4lnl44w0 bSNpAYdzEqg64Ph3SK0prgWPrn5+5EiaAhV7HZzENZ5+o0DAdIXWq/W7uHyCWPKH 536cJDojec+nSUQkPYngngGprxrKO02aBcMw/3JGJ0tdCDj8yw3XAyVAFzz4hmMb Lkmyv4QHHIILLvJ4ZRH5KHWCjjVBg41LNCs2H3HnoxFACdm0lZYKHsUAh2ucBVtU Wb/eHmLxOG43UIkpX4yrhy3SfE1ZdnOVzEzOzPXtr51t8ojqk+bLFe/hJ6EkzrQX X+90qHfBq+PMJlAnc3zdXHjxoJrL6KPWVwVvFrNeibgEKtVvy/BiwZkS6QceC1Ea TatOYA5r6awFVHHQCooN1DGAxN5Juvu2SmdnTUA9ymsCNDghj1YUoAKRNP81u8Sa pe3hco/63iCuPna+vlwNDU6SgsaMk9m0p+1n1BiDIfVJIkWYCNeG+u2gQkzbDKlQ AJuKKQv1QuZiF0ylZ0wq =VAgA -----END PGP SIGNATURE----- Merge tag 'cpumask-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull final removal of deprecated cpus_* cpumask functions from Rusty Russell: "This is the final removal (after several years!) of the obsolete cpus_* functions, prompted by their mis-use in staging. With these function removed, all cpu functions should only iterate to nr_cpu_ids, so we finally only allocate that many bits when cpumasks are allocated offstack" * tag 'cpumask-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits) cpumask: remove __first_cpu / __next_cpu cpumask: resurrect CPU_MASK_CPU0 linux/cpumask.h: add typechecking to cpumask_test_cpu cpumask: only allocate nr_cpumask_bits. Fix weird uses of num_online_cpus(). cpumask: remove deprecated functions. mips: fix obsolete cpumask_of_cpu usage. x86: fix more deprecated cpu function usage. ia64: remove deprecated cpus_ usage. powerpc: fix deprecated CPU_MASK_CPU0 usage. CPU_MASK_ALL/CPU_MASK_NONE: remove from deprecated region. staging/lustre/o2iblnd: Don't use cpus_weight staging/lustre/libcfs: replace deprecated cpus_ calls with cpumask_ staging/lustre/ptlrpc: Do not use deprecated cpus_* functions blackfin: fix up obsolete cpu function usage. parisc: fix up obsolete cpu function usage. tile: fix up obsolete cpu function usage. arm64: fix up obsolete cpu function usage. mips: fix up obsolete cpu function usage. x86: fix up obsolete cpu function usage. ...	2015-04-20 10:19:03 -07:00
Linus Torvalds	09d51602cf	Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat update from Len Brown: "Updates to the turbostat utility. Just one kernel dependency in this batch -- added a #define to msr-index.h" * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: correct dumped pkg-cstate-limit value tools/power turbostat: calculate TSC frequency from CPUID(0x15) on SKL tools/power turbostat: correct DRAM RAPL units on recent Xeon processors tools/power turbostat: Initial Skylake support tools/power turbostat: Use $(CURDIR) instead of $(PWD) and add support for O= option in Makefile tools/power turbostat: modprobe msr, if needed tools/power turbostat: dump MSR_TURBO_RATIO_LIMIT2 tools/power turbostat: use new MSR_TURBO_RATIO_LIMIT names x86 msr-index: define MSR_TURBO_RATIO_LIMIT,1,2 tools/power turbostat: label base frequency tools/power turbostat: update PERF_LIMIT_REASONS decoding tools/power turbostat: simplify default output	2015-04-19 14:31:41 -07:00
Len Brown	0b2bb6925e	tools/power turbostat: Initial Skylake support Skylake adds some additional residency counters. Skylake supports a different mix of RAPL registers from any previous product. In most other ways, Skylake is like Broadwell. Signed-off-by: Len Brown <len.brown@intel.com>	2015-04-18 14:20:51 -04:00
Linus Torvalds	34a984f7b0	Merge branch 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull PMEM driver from Ingo Molnar: "This is the initial support for the pmem block device driver: persistent non-volatile memory space mapped into the system's physical memory space as large physical memory regions. The driver is based on Intel code, written by Ross Zwisler, with fixes by Boaz Harrosh, integrated with x86 e820 memory resource management and tidied up by Christoph Hellwig. Note that there were two other separate pmem driver submissions to lkml: but apparently all parties (Ross Zwisler, Boaz Harrosh) are reasonably happy with this initial version. This version enables minimal support that enables persistent memory devices out in the wild to work as block devices, identified through a magic (non-standard) e820 flag and auto-discovered if CONFIG_X86_PMEM_LEGACY=y, or added explicitly through manipulating the memory maps via the "memmap=..." boot option with the new, special '!' modifier character. Limitations: this is a regular block device, and since the pmem areas are not struct page backed, they are invisible to the rest of the system (other than the block IO device), so direct IO to/from pmem areas, direct mmap() or XIP is not possible yet. The page cache will also shadow and double buffer pmem contents, etc. Initial support is for x86" * 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: drivers/block/pmem: Fix 32-bit build warning in pmem_alloc() drivers/block/pmem: Add a driver for persistent memory x86/mm: Add support for the non-standard protected e820 type	2015-04-18 11:42:49 -04:00
Linus Torvalds	90d1c08786	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "This tree includes: - an FPU related crash fix - a ptrace fix (with matching testcase in tools/testing/selftests/) - an x86 Kconfig DMA-config defaults tweak to better avoid non-working drivers" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected x86/fpu: Load xsave pointer after initialization x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal() x86, selftests: Add single_step_syscall test	2015-04-18 11:31:11 -04:00
Linus Torvalds	96b90f27bc	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "This update has mostly fixes, but also other bits: - perf tooling fixes - PMU driver fixes - Intel Broadwell PMU driver HW-enablement for LBR callstacks - a late coming 'perf kmem' tool update that enables it to also analyze page allocation data. Note, this comes with MM tracepoint changes that we believe to not break anything: because it changes the formerly opaque 'struct page ' field that uniquely identifies pages to 'pfn' which identifies pages uniquely too, but isn't as opaque and can be used for other purposes as well" 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/pt: Fix and clean up error handling in pt_event_add() perf/x86/intel: Add Broadwell support for the LBR callstack perf/x86/intel/rapl: Fix energy counter measurements but supporing per domain energy units perf/x86/intel: Fix Core2,Atom,NHM,WSM cycles:pp events perf/x86: Fix hw_perf_event::flags collision perf probe: Fix segfault when probe with lazy_line to file perf probe: Find compilation directory path for lazy matching perf probe: Set retprobe flag when probe in address-based alternative mode perf kmem: Analyze page allocator events also tracing, mm: Record pfn instead of pointer to struct page	2015-04-18 11:26:46 -04:00
Konrad Rzeszutek Wilk	a6dfa128ce	config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected A huge amount of NIC drivers use the DMA API, however if compiled under 32-bit an very important part of the DMA API can be ommitted leading to the drivers not working at all (especially if used with 'swiotlb=force iommu=soft'). As Prashant Sreedharan explains it: "the driver [tg3] uses DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of the dma "mapping" and dma_unmap_addr() to get the "mapping" value. On most of the platforms this is a no-op, but ... with "iommu=soft and swiotlb=force" this house keeping is required, ... otherwise we pass 0 while calling pci_unmap_/pci_dma_sync_ instead of the DMA address." As such enable this even when using 32-bit kernels. Reported-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Prashant Sreedharan <prashant@broadcom.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Chan <mchan@broadcom.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: cascardo@linux.vnet.ibm.com Cc: david.vrabel@citrix.com Cc: sanjeevb@broadcom.com Cc: siva.kallam@broadcom.com Cc: vyasevich@gmail.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/20150417190448.GA9462@l.oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-18 14:36:49 +02:00
Ingo Molnar	0c99241c93	perf/x86/intel/pt: Fix and clean up error handling in pt_event_add() Dan Carpenter reported that pt_event_add() has buggy error handling logic: it returns 0 instead of -EBUSY when it fails to start a newly added event. Furthermore, the control flow in this function is messy, with cleanup labels mixed with direct returns. Fix the bug and clean up the code by converting it to a straight fast path for the regular non-failing case, plus a clear sequence of cascading goto labels to do all cleanup. NOTE: I materially changed the existing clean up logic in the pt_event_start() failure case to use the direct perf_aux_output_end() path, not pt_event_del(), because perf_aux_output_end() is enough here. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150416103830.GB7847@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-18 13:31:26 +02:00
Linus Torvalds	54e514b91b	Merge branch 'akpm' (patches from Andrew) Merge third patchbomb from Andrew Morton: - various misc things - a couple of lib/ optimisations - provide DIV_ROUND_CLOSEST_ULL() - checkpatch updates - rtc tree - befs, nilfs2, hfs, hfsplus, fatfs, adfs, affs, bfs - ptrace fixes - fork() fixes - seccomp cleanups - more mmap_sem hold time reductions from Davidlohr * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (138 commits) proc: show locks in /proc/pid/fdinfo/X docs: add missing and new /proc/PID/status file entries, fix typos drivers/rtc/rtc-at91rm9200.c: make IO endian agnostic Documentation/spi/spidev_test.c: fix warning drivers/rtc/rtc-s5m.c: allow usage on device type different than main MFD type .gitignore: ignore *.tar MAINTAINERS: add Mediatek SoC mailing list tomoyo: reduce mmap_sem hold for mm->exe_file powerpc/oprofile: reduce mmap_sem hold for exe_file oprofile: reduce mmap_sem hold for mm->exe_file mips: ip32: add platform data hooks to use DS1685 driver lib/Kconfig: fix up HAVE_ARCH_BITREVERSE help text x86: switch to using asm-generic for seccomp.h sparc: switch to using asm-generic for seccomp.h powerpc: switch to using asm-generic for seccomp.h parisc: switch to using asm-generic for seccomp.h mips: switch to using asm-generic for seccomp.h microblaze: use asm-generic for seccomp.h arm: use asm-generic for seccomp.h seccomp: allow COMPAT sigreturn overrides ...	2015-04-17 09:04:38 -04:00
Kees Cook	8eb68bf75e	x86: switch to using asm-generic for seccomp.h Switch to using the newly created asm-generic/seccomp.h for the seccomp strict mode syscall definitions. The obsolete sigreturn syscall override is retained in 32-bit mode, and the ia32 syscall overrides are used in the compat case. Remaining definitions were identical. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-04-17 09:04:10 -04:00
Borislav Petkov	18ecb3bfa5	x86/fpu: Load xsave pointer after initialization So I was playing with gdb today and did this simple thing: gdb /bin/ls ... (gdb) run Box exploded with this splat: BUG: unable to handle kernel NULL pointer dereference at 00000000000001d0 IP: [<ffffffff8100fe5a>] xstateregs_get+0x7a/0x120 [...] Call Trace: ptrace_regset ptrace_request ? wait_task_inactive ? preempt_count_sub arch_ptrace ? ptrace_get_task_struct SyS_ptrace system_call_fastpath ... because we do cache &target->thread.fpu.state->xsave into the local variable xsave but that pointer is NULL at that time and it gets initialized later, in init_fpu(), see: `e7f180dcd8` ("x86/fpu: Change xstateregs_get()/set() to use ->xsave.i387 rather than ->fxsave") The fix is simple: load xsave after init_fpu() has run. Also do the same in xstateregs_set(), as suggested by Oleg Nesterov. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tavis Ormandy <taviso@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1429209697-5902-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-17 10:15:47 +02:00
Kan Liang	78d504bcd7	perf/x86/intel: Add Broadwell support for the LBR callstack Same as Haswell, Broadwell also support the LBR callstack. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1427962377-40955-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-17 09:59:07 +02:00
Jacob Pan	6455239601	perf/x86/intel/rapl: Fix energy counter measurements but supporing per domain energy units RAPL energy hardware unit can vary within a single CPU package, e.g. HSW server DRAM has a fixed energy unit of 15.3 uJ (2^-16) whereas the unit on other domains can be enumerated from power unit MSR. There might be other variations in the future, this patch adds per cpu model quirk to allow special handling of certain cpus. hw_unit is also removed from per cpu data since it is not per cpu and the sampling rate for energy counter is typically not high. Without this patch, DRAM domain on HSW servers will be counted 4x higher than the real energy counter. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1427405325-780-1-git-send-email-jacob.jun.pan@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-17 09:58:56 +02:00
Peter Zijlstra	517e6341fa	perf/x86/intel: Fix Core2,Atom,NHM,WSM cycles:pp events Ingo reported that cycles:pp didn't work for him on some machines. It turns out that in this commit: `af4bdcf675` perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events Andi forgot to explicitly allow that event when he disabled event flags for PEBS on those uarchs. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: `af4bdcf675` ("perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-17 09:58:47 +02:00
Peter Zijlstra	c857eb56e6	perf/x86: Fix hw_perf_event::flags collision Somehow we ended up with overlapping flags when merging the RDPMC control flag - this is bad, fix it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-17 09:50:43 +02:00

1 2 3 4 5 ...

21455 Commits