linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-23 19:31:53 +00:00

Author	SHA1	Message	Date
Tiejun Chen	f2c7648d91	kvm: x86: vmx: move down hardware_setup() and hardware_unsetup() Just move this pair of functions down to make sure later we can add something dependent on others. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-07 15:43:59 +01:00
Paolo Bonzini	173ede4ddd	KVM: s390: Fixes for kvm/next (3.19) and stable 1. We should flush TLBs for load control instruction emulation (stable) 2. A workaround for a compiler bug that renders ACCESS_ONCE broken (stable) 3. Fix program check handling for load control 4. Documentation Fix -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJUXJuyAAoJEBF7vIC1phx8/2kP/0UdpHpNWQA79ib1hDiMu9Sp Rs1b24wrAMoSuHTToD/MyyKdYQ8kHGV8QNqMUdssKO4UW1upHVGE4JVcMOLFoNxn VoQvz1ctfw9SXqwRsSQxA5zmVc/Nqa1urR+jxkDauQrJEJ2E19EOwWZzWAPtop3V oXYkjJMO6WO2nNuN2HxYtDkzJfeK863EBss4VYrCQFpwEPMPs/VTVtoi0KjzIEdY 8UwfAdT56ydzLNIr+eG2ZOgKaPgb34BTtYsZg9HA8+yuSbLYnVpdIpFFky4E7sjg MlEO/8yc4UWqG/YFnT2W1+NigYi2OYjDthotKABRA9qtI73+P/zwiX74jepOP10M U1ZwkTiQfGQ5V9KLJoksYUjcN9atTwwNk+Vzf0U/FAjmnqxGD0fQUqVlKVPD1CD8 U/vsoY5p+RKp3ZEkaApwH55YjvgrzLeDUk59ZiGcAyceEkUZXEIyi5TtmdNXtj2b INW5PyxlTdY3qq9AbhUtUZ5cs+5A1fLugBC6i8yxMYpTuj+fYYDtQvppKIRdvjzB DOxm9CoaJgxc/WnHY8QGNCbX7VuzX/cs+ZBSJ0ezUV7gWpnxhxJHqaqMP0SiuYl1 YopnelQ79w8qAs8snIMw1kx4VTBQlLbKD+Ixn1RNPacER/hy50ZUIXedFJylHAzz tXJfzacYL8eSoiAdBh3f =kFKq -----END PGP SIGNATURE----- Merge tag 'kvm-s390-next-20141107' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for kvm/next (3.19) and stable 1. We should flush TLBs for load control instruction emulation (stable) 2. A workaround for a compiler bug that renders ACCESS_ONCE broken (stable) 3. Fix program check handling for load control 4. Documentation Fix	2014-11-07 15:39:44 +01:00
Heiko Carstens	fc56eb66c3	KVM: s390: fix handling of lctl[g]/stctl[g] According to the architecture all instructions are suppressing if memory access is prohibited due to DAT protection, unless stated otherwise for an instruction. The lctl[g]/stctl[g] implementations handled this incorrectly since control register handling was done piecemeal, which means they had terminating instead of suppressing semantics. This patch fixes this. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2014-11-07 11:11:08 +01:00
Christian Borntraeger	2dca485f87	KVM: s390: flush CPU on load control some control register changes will flush some aspects of the CPU, e.g. POP explicitely mentions that for CR9-CR11 "TLBs may be cleared". Instead of trying to be clever and only flush on specific CRs, let play safe and flush on all lctl(g) as future machines might define new bits in CRs. Load control intercept should not happen that often. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: stable@vger.kernel.org	2014-11-07 11:10:52 +01:00
Christian Borntraeger	1365039d0c	KVM: s390: Fix ipte locking ipte_unlock_siif uses cmpxchg to replace the in-memory data of the ipte lock together with ACCESS_ONCE for the intial read. union ipte_control { unsigned long val; struct { unsigned long k : 1; unsigned long kh : 31; unsigned long kg : 32; }; }; [...] static void ipte_unlock_siif(struct kvm_vcpu vcpu) { union ipte_control old, new, ic; ic = &vcpu->kvm->arch.sca->ipte_control; do { new = old = ACCESS_ONCE(*ic); new.kh--; if (!new.kh) new.k = 0; } while (cmpxchg(&ic->val, old.val, new.val) != old.val); if (!new.kh) wake_up(&vcpu->kvm->arch.ipte_wq); } The new value, is loaded twice from memory with gcc 4.7.2 of fedora 18, despite the ACCESS_ONCE: ---> l %r4,0(%r3) <--- load first 32 bit of lock (k and kh) in r4 alfi %r4,2147483647 <--- add -1 to r4 llgtr %r4,%r4 <--- zero out the sign bit of r4 lg %r1,0(%r3) <--- load all 64 bit of lock into new lgr %r2,%r1 <--- load the same into old risbg %r1,%r4,1,31,32 <--- shift and insert r4 into the bits 1-31 of new llihf %r4,2147483647 ngrk %r4,%r1,%r4 jne aa0 <ipte_unlock+0xf8> nihh %r1,32767 lgr %r4,%r2 csg %r4,%r1,0(%r3) cgr %r2,%r4 jne a70 <ipte_unlock+0xc8> If the memory value changes between the first load (l) and the second load (lg) we are broken. If that happens VCPU threads will hang (unkillable) in handle_ipte_interlock. Andreas Krebbel analyzed this and tracked it down to a compiler bug in that version: "while it is not that obvious the C99 standard basically forbids duplicating the memory access also in that case. For an argumentation of a similiar case please see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278#c43 For the implementation-defined cases regarding volatile there are some GCC-specific clarifications which can be found here: https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html#Volatiles I've tracked down the problem with a reduced testcase. The problem was that during a tree level optimization (SRA - scalar replacement of aggregates) the volatile marker is lost. And an RTL level optimizer (CSE - common subexpression elimination) then propagated the memory read into its second use introducing another access to the memory location. So indeed Christian's suspicion that the union access has something to do with it is correct (since it triggered the SRA optimization). This issue has been reported and fixed in the GCC 4.8 development cycle: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145" This patch replaces the ACCESS_ONCE scheme with a barrier() based scheme that should work for all supported compilers. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: stable@vger.kernel.org # v3.16+	2014-11-07 11:10:26 +01:00
Tiejun Chen	c6338ce494	kvm: kvmclock: use get_cpu() and put_cpu() We can use get_cpu() and put_cpu() to replace preempt_disable()/cpu = smp_processor_id() and preempt_enable() for slightly better code. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:33 +01:00
Radim Krčmář	f30ebc312c	KVM: x86: optimize some accesses to LVTT and SPIV We mirror a subset of these registers in separate variables. Using them directly should be faster. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:32 +01:00
Radim Krčmář	a323b40982	KVM: x86: detect LVTT changes under APICv APIC-write VM exits are "trap-like": they save CS:RIP values for the instruction after the write, and more importantly, the handler will already see the new value in the virtual-APIC page. This means that apic_reg_write cannot use kvm_apic_get_reg to omit timer cancelation when mode changes. timer_mode_mask shouldn't be changing as it depends on cpuid. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:32 +01:00
Radim Krčmář	e462755cae	KVM: x86: detect SPIV changes under APICv APIC-write VM exits are "trap-like": they save CS:RIP values for the instruction after the write, and more importantly, the handler will already see the new value in the virtual-APIC page. This caused a bug if you used KVM_SET_IRQCHIP to set the SW-enabled bit in the SPIV register. The chain of events is as follows: * When the irqchip is added to the destination VM, the apic_sw_disabled static key is incremented (1) * When the KVM_SET_IRQCHIP ioctl is invoked, it is decremented (0) * When the guest disables the bit in the SPIV register, e.g. as part of shutdown, apic_set_spiv does not notice the change and the static key is _not_ incremented. * When the guest is destroyed, the static key is decremented (-1), resulting in this trace: WARNING: at kernel/jump_label.c:81 __static_key_slow_dec+0xa6/0xb0() jump label: negative count! [<ffffffff816bf898>] dump_stack+0x19/0x1b [<ffffffff8107c6f1>] warn_slowpath_common+0x61/0x80 [<ffffffff8107c76c>] warn_slowpath_fmt+0x5c/0x80 [<ffffffff811931e6>] __static_key_slow_dec+0xa6/0xb0 [<ffffffff81193226>] static_key_slow_dec_deferred+0x16/0x20 [<ffffffffa0637698>] kvm_free_lapic+0x88/0xa0 [kvm] [<ffffffffa061c63e>] kvm_arch_vcpu_uninit+0x2e/0xe0 [kvm] [<ffffffffa05ff301>] kvm_vcpu_uninit+0x21/0x40 [kvm] [<ffffffffa067cec7>] vmx_free_vcpu+0x47/0x70 [kvm_intel] [<ffffffffa061bc50>] kvm_arch_vcpu_free+0x50/0x60 [kvm] [<ffffffffa061ca22>] kvm_arch_destroy_vm+0x102/0x260 [kvm] [<ffffffff810b68fd>] ? synchronize_srcu+0x1d/0x20 [<ffffffffa06030d1>] kvm_put_kvm+0xe1/0x1c0 [kvm] [<ffffffffa06036f8>] kvm_vcpu_release+0x18/0x20 [kvm] [<ffffffff81215c62>] __fput+0x102/0x310 [<ffffffff81215f4e>] ____fput+0xe/0x10 [<ffffffff810ab664>] task_work_run+0xb4/0xe0 [<ffffffff81083944>] do_exit+0x304/0xc60 [<ffffffff816c8dfc>] ? _raw_spin_unlock_irq+0x2c/0x50 [<ffffffff810fd22d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [<ffffffff8108432c>] do_group_exit+0x4c/0xc0 [<ffffffff810843b4>] SyS_exit_group+0x14/0x20 [<ffffffff816d33a9>] system_call_fastpath+0x16/0x1b Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:31 +01:00
Chao Peng	612263b30c	KVM: x86: Enable Intel AVX-512 for guest Expose Intel AVX-512 feature bits to guest. Also add checks for xcr0 AVX512 related bits according to spec: http://download-software.intel.com/sites/default/files/managed/71/2e/319433-017.pdf Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:30 +01:00
Radim Krčmář	1e0ad70cc1	KVM: x86: fix deadline tsc interrupt injection The check in kvm_set_lapic_tscdeadline_msr() was trying to prevent a situation where we lose a pending deadline timer in a MSR write. Losing it is fine, because it effectively occurs before the timer fired, so we should be able to cancel or postpone it. Another problem comes from interaction with QEMU, or other userspace that can set deadline MSR without a good reason, when timer is already pending: one guest's deadline request results in more than one interrupt because one is injected immediately on MSR write from userspace and one through hrtimer later. The solution is to remove the injection when replacing a pending timer and to improve the usual QEMU path, we inject without a hrtimer when the deadline has already passed. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reported-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:28 +01:00
Radim Krčmář	5d87db7119	KVM: x86: add apic_timer_expired() Make the code reusable. If the timer was already pending, we shouldn't be waiting in a queue, so wake_up can be skipped, simplifying the path. There is no 'reinject' case => the comment is removed. Current race behaves correctly. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:27 +01:00
Nadav Amit	16f8a6f979	KVM: vmx: Unavailable DR4/5 is checked before CPL If DR4/5 is accessed when it is unavailable (since CR4.DE is set), then #UD should be generated even if CPL>0. This is according to Intel SDM Table 6-2: "Priority Among Simultaneous Exceptions and Interrupts". Note, that this may happen on the first DR access, even if the host does not sets debug breakpoints. Obviously, it occurs when the host debugs the guest. This patch moves the DR4/5 checks from __kvm_set_dr/_kvm_get_dr to handle_dr. The emulator already checks DR4/5 availability in check_dr_read. Nested virutalization related calls to kvm_set_dr/kvm_get_dr would not like to inject exceptions to the guest. As for SVM, the patch follows the previous logic as much as possible. Anyhow, it appears the DR interception code might be buggy - even if the DR access may cause an exception, the instruction is skipped. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:26 +01:00
Nadav Amit	c49c759f7a	KVM: x86: Emulator performs code segment checks on read access When read access is performed using a readable code segment, the "conforming" and "non-conforming" checks should not be done. As a result, read using non-conforming readable code segment fails. This is according to Intel SDM 5.6.1 ("Accessing Data in Code Segments"). The fix is not to perform the "non-conforming" checks if the access is not a fetch; the relevant checks are already done when loading the segment. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:25 +01:00
Nadav Amit	0e8a09969a	KVM: x86: Clear DR7.LE during task-switch DR7.LE should be cleared during task-switch. This feature is poorly documented. For reference, see: http://pdos.csail.mit.edu/6.828/2005/readings/i386/s12_02.htm SDM [17.2.4]: This feature is not supported in the P6 family processors, later IA-32 processors, and Intel 64 processors. AMD [2:13.1.1.4]: This bit is ignored by implementations of the AMD64 architecture. Intel's formulation could mean that it isn't even zeroed, but current hardware indeed does not behave like that. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:25 +01:00
Nadav Amit	518547b32a	KVM: x86: Emulator does not calculate address correctly In long-mode, when the address size is 4 bytes, the linear address is not truncated as the emulator mistakenly does. Instead, the offset within the segment (the ea field) should be truncated according to the address size. As Intel SDM says: "In 64-bit mode, the effective address components are added and the effective address is truncated ... before adding the full 64-bit segment base." Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:24 +01:00
Nadav Amit	6bdf06625d	KVM: x86: DR7.GD should be cleared upon any #DB exception Intel SDM 17.2.4 (Debug Control Register (DR7)) says: "The processor clears the GD flag upon entering to the debug exception handler." This sentence may be misunderstood as if it happens only on #DB due to debug-register protection, but it happens regardless to the cause of the #DB. Fix the behavior to match both real hardware and Bochs. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:23 +01:00
Nadav Amit	394457a928	KVM: x86: some apic broadcast modes does not work KVM does not deliver x2APIC broadcast messages with physical mode. Intel SDM (10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of FFFF_FFFFH is used for broadcast of interrupts in both logical destination and physical destination modes." In addition, the local-apic enables cluster mode broadcast. As Intel SDM 10.6.2.2 says: "Broadcast to all local APICs is achieved by setting all destination bits to one." This patch enables cluster mode broadcast. The fix tries to combine broadcast in different modes through a unified code. One rare case occurs when the source of IPI has its APIC disabled. In such case, the source can still issue IPIs, but since the source is not obliged to have the same LAPIC mode as the enabled ones, we cannot rely on it. Since it is a rare case, it is unoptimized and done on the slow-path. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com> [As per Radim's review, use unsigned int for X2APIC_BROADCAST, return bool from kvm_apic_broadcast. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:22 +01:00
Andy Lutomirski	52ce3c21ae	x86,kvm,vmx: Don't trap writes to CR4.TSD CR4.TSD is guest-owned; don't trap writes to it in VMX guests. This avoids a VM exit on context switches into or out of a PR_TSC_SIGSEGV task. I think that this fixes an unintentional side-effect of: `4c38609ac5` KVM: VMX: Make guest cr4 mask more conservative Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:22 +01:00
Nadav Amit	bf0b682c9b	KVM: x86: Sysexit emulation does not mask RIP/RSP If the operand size is not 64-bit, then the sysexit instruction should assign ECX to RSP and EDX to RIP. The current code assigns the full 64-bits. Fix it by masking. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:21 +01:00
Nadav Amit	58b7075d05	KVM: x86: Distinguish between stack operation and near branches In 64-bit, stack operations default to 64-bits, but can be overriden (to 16-bit) using opsize override prefix. In contrast, near-branches are always 64-bit. This patch distinguish between the different behaviors. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:20 +01:00
Nadav Amit	f7784046ab	KVM: x86: Getting rid of grp45 in emulator Breaking grp45 to the relevant functions to speed up the emulation and simplify the code. In addition, it is necassary the next patch will distinguish between far and near branches according to the flags. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:20 +01:00
Nadav Amit	4be4de7ef9	KVM: x86: Use new is_noncanonical_address in _linearize Replace the current canonical address check with the new function which is identical. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:19 +01:00
Paolo Bonzini	d09155d2f3	KVM: emulator: always inline __linearize The two callers have a lot of constant arguments that can be optimized out. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:18 +01:00
Paolo Bonzini	123b2dd10b	Merge remote-tracking branch 'origin/master' into HEAD Several important fixes went in between 3.18-rc1 and 3.18-rc3, so KVM/x86 work for 3.19 will be based on 3.18-rc3.	2014-11-03 12:06:21 +01:00
Linus Torvalds	3c43de0ffd	Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm Pull ARM fixes from Russell King: - add the new bpf syscall to ARM. - drop a redundant return statement in __iommu_alloc_remap() - fix a performance issue noticed by Thomas Petazzoni with kmap_atomic(). - fix an issue with the L2 cache OF parsing code which caused it to incorrectly print warnings on each boot, and make the warning text more consistent with the rest of the code * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: ARM: 8180/1: mm: implement no-highmem fast path in kmap_atomic_pfn() ARM: 8183/1: l2c: Improve l2c310_of_parse() error message ARM: 8181/1: Drop extra return statement ARM: 8182/1: l2c: Make l2x0_cache_size_of_parse() return 'int' ARM: enable bpf syscall	2014-11-02 12:56:20 -08:00
Linus Torvalds	7501a53329	A small set of x86 fixes. The most serious is an SRCU lockdep fix. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJUVd9KAAoJEL/70l94x66Dc1AH/0jdb8DsewyAuJzLKaJ/qJwK 9JMqglpDQ+Sm0f2puPyJkR8NQd2AMPK7J5aJjWAl/XxJjsDcn+TQur20okzUDXLJ 21sIbqo92hCgpSNs+RHLHlj7/iMQVYnMFh7bp6JcvzmhpN8F/D793BT+oOxdjMRg PLCQ794ugGhFboesDkV822VWgtQ26yG2aQDWbYgL9r5xPp5OpbzSiq85KopSEfS0 K+PPntI8yNI+EvOC9ta0FfEOMMfQoLDds+V0FXiEIRx43MV8bwAXpWzsB8ibd1F6 eY+cVvSPzWgDSCVLn3gfYkrRl3sWGdvyfxTe/cz507ZfXcuT2uHJhtbpH2KCGto= =FJ6/ -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "A small set of x86 fixes. The most serious is an SRCU lockdep fix. A bit late - needed some time to test the SRCU fix, which only came in on Friday" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: vmx: defer load of APIC access page address during reset KVM: nVMX: Disable preemption while reading from shadow VMCS KVM: x86: Fix far-jump to non-canonical check KVM: emulator: fix execution close to the segment limit KVM: emulator: fix error code for __linearize	2014-11-02 12:31:02 -08:00
Paolo Bonzini	a73896cb5b	KVM: vmx: defer load of APIC access page address during reset Most call paths to vmx_vcpu_reset do not hold the SRCU lock. Defer loading the APIC access page to the next vmentry. This avoids the following lockdep splat: [ INFO: suspicious RCU usage. ] 3.18.0-rc2-test2+ #70 Not tainted ------------------------------- include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-x86/2371: #0: (&vcpu->mutex){+.+...}, at: [<ffffffffa037d800>] vcpu_load+0x20/0xd0 [kvm] stack backtrace: CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70 Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 0000000000000001 ffff880209983ca8 ffffffff816f514f 0000000000000000 ffff8802099b8990 ffff880209983cd8 ffffffff810bd687 00000000000fee00 ffff880208a2c000 ffff880208a10000 ffff88020ef50040 ffff880209983d08 Call Trace: [<ffffffff816f514f>] dump_stack+0x4e/0x71 [<ffffffff810bd687>] lockdep_rcu_suspicious+0xe7/0x120 [<ffffffffa037d055>] gfn_to_memslot+0xd5/0xe0 [kvm] [<ffffffffa03807d3>] __gfn_to_pfn+0x33/0x60 [kvm] [<ffffffffa0380885>] gfn_to_page+0x25/0x90 [kvm] [<ffffffffa038aeec>] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm] [<ffffffffa08f0a9c>] vmx_vcpu_reset+0x20c/0x460 [kvm_intel] [<ffffffffa039ab8e>] kvm_vcpu_reset+0x15e/0x1b0 [kvm] [<ffffffffa039ac0c>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm] [<ffffffffa037f7e0>] kvm_vm_ioctl+0x1d0/0x780 [kvm] [<ffffffff810bc664>] ? __lock_is_held+0x54/0x80 [<ffffffff812231f0>] do_vfs_ioctl+0x300/0x520 [<ffffffff8122ee45>] ? __fget+0x5/0x250 [<ffffffff8122f0fa>] ? __fget_light+0x2a/0xe0 [<ffffffff81223491>] SyS_ioctl+0x81/0xa0 [<ffffffff816fed6d>] system_call_fastpath+0x16/0x1b Reported-by: Takashi Iwai <tiwai@suse.de> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com> Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com> Fixes: `38b9917350` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-02 08:37:18 +01:00
Jan Kiszka	282da870f4	KVM: nVMX: Disable preemption while reading from shadow VMCS In order to access the shadow VMCS, we need to load it. At this point, vmx->loaded_vmcs->vmcs and the actually loaded one start to differ. If we now get preempted by Linux, vmx_vcpu_put and, on return, the vmx_vcpu_load will work against the wrong vmcs. That can cause copy_shadow_to_vmcs12 to corrupt the vmcs12 state. Fix the issue by disabling preemption during the copy operation. copy_vmcs12_to_shadow is safe from this issue as it is executed by vmx_vcpu_run when preemption is already disabled before vmentry. This bug is exposed by running Jailhouse within KVM on CPUs with shadow VMCS support. Jailhouse never expects an interrupt pending vmexit, but the bug can cause it if, after copy_shadow_to_vmcs12 is preempted, the active VMCS happens to have the virtual interrupt pending flag set in the CPU-based execution controls. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-02 07:55:46 +01:00
Nadav Amit	7e46dddd6f	KVM: x86: Fix far-jump to non-canonical check Commit `d1442d85cc` ("KVM: x86: Handle errors when RIP is set during far jumps") introduced a bug that caused the fix to be incomplete. Due to incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may not trigger #GP. As we know, this imposes a security problem. In addition, the condition for two warnings was incorrect. Fixes: `d1442d85cc` Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> [Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-02 07:54:55 +01:00
Andy Lutomirski	653bc77af6	x86_64, entry: Fix out of bounds read on sysenter Rusty noticed a Really Bad Bug (tm) in my NT fix. The entry code reads out of bounds, causing the NT fix to be unreliable. But, and this is much, much worse, if your stack is somehow just below the top of the direct map (or a hole), you read out of bounds and crash. Excerpt from the crash: [ 1.129513] RSP: 0018:ffff88001da4bf88 EFLAGS: 00010296 2b:* f7 84 24 90 00 00 00 testl $0x4000,0x90(%rsp) That read is deterministically above the top of the stack. I thought I even single-stepped through this code when I wrote it to check the offset, but I clearly screwed it up. Fixes: `8c7aa698ba` ("x86_64, entry: Filter RFLAGS.NT on entry from userspace") Reported-by: Rusty Russell <rusty@ozlabs.org> Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-31 18:47:09 -07:00
Linus Torvalds	89453379aa	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "A bit has accumulated, but it's been a week or so since my last batch of post-merge-window fixes, so... 1) Missing module license in netfilter reject module, from Pablo. Lots of people ran into this. 2) Off by one in mac80211 baserate calculation, from Karl Beldan. 3) Fix incorrect return value from ax88179_178a driver's set_mac_addr op, which broke use of it with bonding. From Ian Morgan. 4) Checking of skb_gso_segment()'s return value was not all encompassing, it can return an SKB pointer, a pointer error, or NULL. Fix from Florian Westphal. This is crummy, and longer term will be fixed to just return error pointers or a real SKB. 6) Encapsulation offloads not being handled by skb_gso_transport_seglen(). From Florian Westphal. 7) Fix deadlock in TIPC stack, from Ying Xue. 8) Fix performance regression from using rhashtable for netlink sockets. The problem was the synchronize_net() invoked for every socket destroy. From Thomas Graf. 9) Fix bug in eBPF verifier, and remove the strong dependency of BPF on NET. From Alexei Starovoitov. 10) In qdisc_create(), use the correct interface to allocate ->cpu_bstats, otherwise the u64_stats_sync member isn't initialized properly. From Sabrina Dubroca. 11) Off by one in ip_set_nfnl_get_byindex(), from Dan Carpenter. 12) nf_tables_newchain() was erroneously expecting error pointers from netdev_alloc_pcpu_stats(). It only returna a valid pointer or NULL. From Sabrina Dubroca. 13) Fix use-after-free in _decode_session6(), from Li RongQing. 14) When we set the TX flow hash on a socket, we mistakenly do so before we've nailed down the final source port. Move the setting deeper to fix this. From Sathya Perla. 15) NAPI budget accounting in amd-xgbe driver was counting descriptors instead of full packets, fix from Thomas Lendacky. 16) Fix total_data_buflen calculation in hyperv driver, from Haiyang Zhang. 17) Fix bcma driver build with OF_ADDRESS disabled, from Hauke Mehrtens. 18) Fix mis-use of per-cpu memory in TCP md5 code. The problem is that something that ends up being vmalloc memory can't be passed to the crypto hash routines via scatter-gather lists. From Eric Dumazet. 19) Fix regression in promiscuous mode enabling in cdc-ether, from Olivier Blin. 20) Bucket eviction and frag entry killing can race with eachother, causing an unlink of the object from the wrong list. Fix from Nikolay Aleksandrov. 21) Missing initialization of spinlock in cxgb4 driver, from Anish Bhatt. 22) Do not cache ipv4 routing failures, otherwise if the sysctl for forwarding is subsequently enabled this won't be seen. From Nicolas Cavallari" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (131 commits) drivers: net: cpsw: Support ALLMULTI and fix IFF_PROMISC in switch mode drivers: net: cpsw: Fix broken loop condition in switch mode net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0 stmmac: pci: set default of the filter bins net: smc91x: Fix gpios for device tree based booting mpls: Allow mpls_gso to be built as module mpls: Fix mpls_gso handler. r8152: stop submitting intr for -EPROTO netfilter: nft_reject_bridge: restrict reject to prerouting and input netfilter: nft_reject_bridge: don't use IP stack to reject traffic netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routing drivers/net: macvtap and tun depend on INET drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets drivers/net: Disable UFO through virtio net: skb_fclone_busy() needs to detect orphaned skb gre: Use inner mac length when computing tunnel length mlx4: Avoid leaking steering rules on flow creation error flow net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN ...	2014-10-31 15:04:58 -07:00
Linus Torvalds	53429290a0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc Pull sparc update from David Miller: "Two changes: 1) It makes no sense to execute a VTOC partition table request in the Sun virtual block device driver and fail to load if it doesn't succeed because a) we don't use the result at all and b) it won't succeed if there is an EFI partition on the disk, for example. We read the partition table via the normal means in the block layer anyways, so this is really completely useless, so just remove it. From Dwight Engen. 2) Hook up new bpf system call" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sunvdc: don't call VD_OP_GET_VTOC sparc: Hook up bpf system call.	2014-10-31 15:00:48 -07:00
Linus Torvalds	9f58c62fcc	Microblaze patches for 3.18-rc3 - Wire-up new syscall - Fix PCI bug - Fix Kconfig warning -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iEYEABECAAYFAlRTi7YACgkQykllyylKDCGGIACgkMfyXR9QE6SDC09L7k2NoPTw qxMAn1/mKUPWGSCIK6NklOXgAEiILscA =Gtqs -----END PGP SIGNATURE----- Merge tag 'microblaze-3.18-rc3' of git://git.monstr.eu/linux-2.6-microblaze Pull Microblaze updates from Michal Simek: - wire-up new bpf syscall - fix PCI bug - fix Kconfig warning * tag 'microblaze-3.18-rc3' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Wire up bpf syscall microblaze: Fix IO space breakage after of_pci_range_to_resource() change microblaze: Fix missing NR_CPUS in menuconfig	2014-10-31 14:43:42 -07:00
Linus Torvalds	19e0d5f16a	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fixes from all around the place: - hyper-V 32-bit PAE guest kernel fix - two IRQ allocation fixes on certain x86 boards - intel-mid boot crash fix - intel-quark quirk - /proc/interrupts duplicate irq chip name fix - cma boot crash fix - syscall audit fix - boot crash fix with certain TSC configurations (seen on Qemu) - smpboot.c build warning fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi() x86, intel-mid: Create IRQs for APB timers and RTC timers x86: Don't enable F00F workaround on Intel Quark processors x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts x86, cma: Reserve DMA contiguous area after initmem_init() i386/audit: stop scribbling on the stack frame x86, apic: Handle a bad TSC more gracefully x86: ACPI: Do not translate GSI number if IOAPIC is disabled x86/smpboot: Move data structure to its primary usage scope	2014-10-31 14:30:16 -07:00
Linus Torvalds	f5fa363026	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Various scheduler fixes all over the place: three SCHED_DL fixes, three sched/numa fixes, two generic race fixes and a comment fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/dl: Fix preemption checks sched: Update comments for CLONE_NEWNS sched: stop the unbound recursion in preempt_schedule_context() sched/fair: Fix division by zero sysctl_numa_balancing_scan_size sched/fair: Care divide error in update_task_scan_period() sched/numa: Fix unsafe get_task_struct() in task_numa_assign() sched/deadline: Fix races between rt_mutex_setprio() and dl_task_timer() sched/deadline: Don't replenish from a !SCHED_DEADLINE entity sched: Fix race between task_group and sched_task_group	2014-10-31 14:05:35 -07:00
Linus Torvalds	5656b408ff	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, plus on the kernel side: - a revert for a newly introduced PMU driver which isn't complete yet and where we ran out of time with fixes (to be tried again in v3.19) - this makes up for a large chunk of the diffstat. - compilation warning fixes - a printk message fix - event_idx usage fixes/cleanups" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf probe: Trivial typo fix for --demangle perf tools: Fix report -F dso_from for data without branch info perf tools: Fix report -F dso_to for data without branch info perf tools: Fix report -F symbol_from for data without branch info perf tools: Fix report -F symbol_to for data without branch info perf tools: Fix report -F mispredict for data without branch info perf tools: Fix report -F in_tx for data without branch info perf tools: Fix report -F abort for data without branch info perf tools: Make CPUINFO_PROC an array to support different kernel versions perf callchain: Use global caching provided by libunwind perf/x86/intel: Revert incomplete and undocumented Broadwell client support perf/x86: Fix compile warnings for intel_uncore perf: Fix typos in sample code in the perf_event.h header perf: Fix and clean up initialization of pmu::event_idx perf: Fix bogus kernel printk perf diff: Add missing hists__init() call at tool start	2014-10-31 14:01:47 -07:00
Tony Lindgren	7d2911c438	net: smc91x: Fix gpios for device tree based booting With legacy booting, the platform init code was taking care of the configuring of GPIOs. With device tree based booting, things may or may not work depending what bootloader has configured or if the legacy platform code gets called. Let's add support for the pwrdn and reset GPIOs to the smc91x driver to fix the issues of smc91x not working properly when booted in device tree mode. And let's change n900 to use these settings as some versions of the bootloader do not configure things properly causing errors. Reported-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-31 15:54:18 -04:00
Linus Torvalds	a7ca10f263	Merge branch 'akpm' (incoming from Andrew Morton) Merge misc fixes from Andrew Morton: "21 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits) mm/balloon_compaction: fix deflation when compaction is disabled sh: fix sh770x SCIF memory regions zram: avoid NULL pointer access in concurrent situation mm/slab_common: don't check for duplicate cache names ocfs2: fix d_splice_alias() return code checking mm: rmap: split out page_remove_file_rmap() mm: memcontrol: fix missed end-writeback page accounting mm: page-writeback: inline account_page_dirtied() into single caller lib/bitmap.c: fix undefined shift in __bitmap_shift_{left\|right}() drivers/rtc/rtc-bq32k.c: fix register value memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node() drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock kernel/kmod: fix use-after-free of the sub_info structure drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc mm, thp: fix collapsing of hugepages on madvise drivers: of: add return value to of_reserved_mem_device_init() mm: free compound page with correct order gcov: add ARM64 to GCOV_PROFILE_ALL fsnotify: next_i is freed during fsnotify_unmount_inodes. mm/compaction.c: avoid premature range skip in isolate_migratepages_range ...	2014-10-29 16:38:48 -07:00
Andriy Skulysh	5417421b27	sh: fix sh770x SCIF memory regions Resources scif1_resources & scif2_resources overlap. Actual SCIF region size is 0x10. This is regression from commit `d850acf975` ("sh: Declare SCIF register base and IRQ as resources") Signed-off-by: Andriy Skulysh <askulysh@gmail.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-29 16:33:15 -07:00
Linus Torvalds	19be9e8aa7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: "There's some bug fixes or cleanups to facilitate fixes, a MAINTAINERS update, and a new syscall (bpf)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: powerpc/numa: ensure per-cpu NUMA mappings are correct on topology update powerpc/numa: use cached value of update->cpu in update_cpu_topology cxl: Fix PSL error due to duplicate segment table entries powerpc/mm: Use appropriate ESID mask in copro_calculate_slb() cxl: Refactor cxl_load_segment() and find_free_sste() cxl: Disable secondary hash in segment table Revert "powerpc/powernv: Fix endian bug in LPC bus debugfs accessors" powernv: Use _GLOBAL_TOC for opal wrappers powerpc: Wire up sys_bpf() syscall MAINTAINERS: nx-842 driver maintainer change powerpc/mm: Remove redundant #if case powerpc/mm: Fix build error with hugetlfs disabled	2014-10-29 11:11:44 -07:00
Thomas Petazzoni	9ff0bb5ba6	ARM: 8180/1: mm: implement no-highmem fast path in kmap_atomic_pfn() Since CONFIG_HIGHMEM got enabled on ARMv5 Kirkwood, we have noticed a very significant drop in networking performance. The test were conducted on an OpenBlocks A7 board. Without this patch, the outgoing performance measured with iperf are: - highmem OFF, TSO OFF 544 Mbit/s - highmem OFF, TSO ON 942 Mbit/s - highmem ON, TSO OFF 306 Mbit/s - highmem ON, TSO ON 246 Mbit/s On this Kirkwood platform, the L2 cache is a Feroceon cache, and with this cache, all the range operations have to be done on virtual addresses and not physical addresses. Therefore, whenever CONFIG_HIGHMEM is enabled, the cache maintenance operations call kmap_atomic_pfn() and kunmap_atomic(). However, kmap_atomic_pfn() does not implement the same fast path for non-highmem pages as the one implemented in kmap_atomic(), and this is one of the reason for the performance drop. While this patch does not fully restore the performances, it clearly improves them a lot: without patch with patch - highmem ON, TSO OFF 306 Mbit/s 387 Mbit/s - highmem ON, TSO ON 246 Mbit/s 434 Mbit/s We're still far from the !CONFIG_HIGHMEM performances, but it does improve a bit the situation. Thanks a lot to Ezequiel Garcia and Gregory Clement for all the testing work around this topic. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2014-10-29 17:22:07 +00:00
Fabio Estevam	6d0ec1dd90	ARM: 8183/1: l2c: Improve l2c310_of_parse() error message Russell King suggested [1]: "I'd ask for one change. Please make all these messages start with "L2C-310 OF" not "PL310 OF:". The device is described in ARM documentation as a L2C-310 not PL310. (Also note the : is dropped too - most of the other messages don't have the : either.) The: "PL310 OF: cache setting yield illegal associativity PL310 OF: -1073346556 calculated, only 8 and 16 legal" message could also be changed to something like: "L2C-310 OF cache associativity %d invalid, only 8 or 16 permittedn" [1] http://www.spinics.net/lists/arm-kernel/msg372776.html Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2014-10-29 17:20:55 +00:00
Laura Abbott	005757298f	ARM: 8181/1: Drop extra return statement Commit `513510ddba` (common: dma-mapping: introduce common remapping functions) managed to end up with an extra return statement from the original patch. Drop it. Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2014-10-29 17:20:51 +00:00
Paolo Bonzini	f62c95fd40	KVM: s390: Fixes and cleanups 1. A small fix regarding program check handling (cc stable as it overwrites the wrong guest memory) 2. Improve the ipte interlock scalability for older hardware 3. current->mm to mm cleanup (currently a no-op) 4. several SIGP rework patches (more to come) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJUT48bAAoJEBF7vIC1phx8P4EP/2QO6ZGHMKFXzvOex9mhesGt VUxmqJBwZe3xJqEr6LM7Fr7yHE4+UMg+LjV6ZeXL1KHFGZJ8mo8AOlQcx86r05+9 KARmX1K0RIBHUsGhWCAwjD9I5xWoIO56mH0jQetIuInjAMPU0TnQ4Ufh6WXTARG8 O1HRdqO91yikXRIkWw/9JN07MGaavdWbjyk+FhT39yIDqZckdzYkOmLyws9CskNS r3g9vl5kV251c28dVVr9+1j3Kd/OD06k3aGNUVenUGmbEjszl6/uoQn8swVri/eZ eLOCTYMaUBF3lbbl3H6g237x1/V49bwyfczDE1DxJ1y1fOcJlG8XQRf6g1p6jfSG j2yjl00YvsRnwuDTeW9OVmDwLK+r6IYP2QqDplW/XNS9BegTBTO3pqc/XDPqGZT7 a88vuFvD94coaqVJabXDVebzHfbskSEPkBI13QmbhxA9/a7nPpuNhws595dadn77 w9C7QKpSPtnBibDvN3atmKoekseY/3Iap/9KS3WT4mJPjHRTmt30ttZukDIcfaG7 oSi1sysXHFBmr20ZZhhqcIP2RFdlX3Efh++dZBYnAODzKIIZgjtdRutXagyXDlM/ w6Pb75/Jep19zlxVKcLNMudgxn67Ezf3f0jSbppPc16A53xkqK5f6wyTiNS1bv+q TL3JD2JA0O90ARd1AbEE =npsV -----END PGP SIGNATURE----- Merge tag 'kvm-s390-next-20141028' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes and cleanups 1. A small fix regarding program check handling (cc stable as it overwrites the wrong guest memory) 2. Improve the ipte interlock scalability for older hardware 3. current->mm to mm cleanup (currently a no-op) 4. several SIGP rework patches (more to come)	2014-10-29 13:31:32 +01:00
Jan Kiszka	41e7ed64d8	KVM: nVMX: Disable preemption while reading from shadow VMCS In order to access the shadow VMCS, we need to load it. At this point, vmx->loaded_vmcs->vmcs and the actually loaded one start to differ. If we now get preempted by Linux, vmx_vcpu_put and, on return, the vmx_vcpu_load will work against the wrong vmcs. That can cause copy_shadow_to_vmcs12 to corrupt the vmcs12 state. Fix the issue by disabling preemption during the copy operation. copy_vmcs12_to_shadow is safe from this issue as it is executed by vmx_vcpu_run when preemption is already disabled before vmentry. This bug is exposed by running Jailhouse within KVM on CPUs with shadow VMCS support. Jailhouse never expects an interrupt pending vmexit, but the bug can cause it if, after copy_shadow_to_vmcs12 is preempted, the active VMCS happens to have the virtual interrupt pending flag set in the CPU-based execution controls. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-29 13:13:52 +01:00
Nadav Amit	cd9b8e2c48	KVM: x86: Fix far-jump to non-canonical check Commit `d1442d85cc` ("KVM: x86: Handle errors when RIP is set during far jumps") introduced a bug that caused the fix to be incomplete. Due to incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may not trigger #GP. As we know, this imposes a security problem. In addition, the condition for two warnings was incorrect. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> [Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-29 13:13:51 +01:00
Paolo Bonzini	fd56e1546a	KVM: emulator: fix execution close to the segment limit Emulation of code that is 14 bytes to the segment limit or closer (e.g. RIP = 0xFFFFFFF2 after reset) is broken because we try to read as many as 15 bytes from the beginning of the instruction, and __linearize fails when the passed (address, size) pair reaches out of the segment. To fix this, let __linearize return the maximum accessible size (clamped to 2^32-1) for usage in __do_insn_fetch_bytes, and avoid the limit check by passing zero for the desired size. For expand-down segments, __linearize is performing a redundant check. (u32)(addr.ea + size - 1) <= lim can only happen if addr.ea is close to 4GB; in this case, addr.ea + size - 1 will also fail the check against the upper bound of the segment (which is provided by the D/B bit). After eliminating the redundant check, it is simple to compute the *max_size for expand-down segments too. Now that the limit check is done in __do_insn_fetch_bytes, we want to inject a general protection fault there if size < op_size (like __linearize would have done), instead of just aborting. This fixes booting Tiano Core from emulated flash with EPT disabled. Cc: stable@vger.kernel.org Fixes: `719d5a9b24` Reported-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-29 13:13:48 +01:00
Paolo Bonzini	3606189fa3	KVM: emulator: fix error code for __linearize The error code for #GP and #SS is zero when the segment is used to access an operand or an instruction. It is only non-zero when a segment register is being loaded; for limit checks this means cases such as: * for #GP, when RIP is beyond the limit on a far call (before the first instruction is executed). We do not implement this check, but it would be in em_jmp_far/em_call_far. * for #SS, if the new stack overflows during an inter-privilege-level call to a non-conforming code segment. We do not implement stack switching at all. So use an error code of zero. Reviewed-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-29 12:40:28 +01:00
Fabio Estevam	d0b92845e5	ARM: 8182/1: l2c: Make l2x0_cache_size_of_parse() return 'int' Since commit `f3354ab674` ("ARM: 8169/1: l2c: parse cache properties from ePAPR definitions") the following error is seen on imx6q: [ 0.000000] PL310 OF: cache setting yield illegal associativity [ 0.000000] PL310 OF: -2147097556 calculated, only 8 and 16 legal As imx6q does not pass the "cache-size" and "cache-sets" properties in DT, the function l2x0_cache_size_of_parse() returns early and keep the 'associativity' pointer uninitialized. To fix this problem, return error codes inside l2x0_cache_size_of_parse() and only use the 'associativity' pointer result if l2x0_cache_size_of_parse() succeeds. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2014-10-29 11:13:02 +00:00
Ingo Molnar	1776b10627	perf/x86/intel: Revert incomplete and undocumented Broadwell client support These patches: `86a349a28b` ("perf/x86/intel: Add Broadwell core support") `c46e665f03` ("perf/x86: Add INST_RETIRED.ALL workarounds") `fdda3c4aac` ("perf/x86/intel: Use Broadwell cache event list for Haswell") introduced magic constants and unexplained changes: https://lkml.org/lkml/2014/10/28/1128 https://lkml.org/lkml/2014/10/27/325 https://lkml.org/lkml/2014/8/27/546 https://lkml.org/lkml/2014/10/28/546 Peter Zijlstra has attempted to help out, to clean up the mess: https://lkml.org/lkml/2014/10/28/543 But has not received helpful and constructive replies which makes me doubt wether it can all be finished in time until v3.18 is released. Despite various review feedback the author (Andi Kleen) has answered only few of the review questions and has generally been uncooperative, only giving replies when prompted repeatedly, and only giving minimal answers instead of constructively explaining and helping along the effort. That kind of behavior is not acceptable. There's also a boot crash on Intel E5-1630 v3 CPUs reported for another commit from Andi Kleen: `e735b9db12` ("perf/x86/intel/uncore: Add Haswell-EP uncore support") https://lkml.org/lkml/2014/10/22/730 Which is not yet resolved. The uncore driver is independent in theory, but the crash makes me worry about how well all these patches were tested and makes me uneasy about the level of interminging that the Broadwell and Haswell code has received by the commits above. As a first step to resolve the mess revert the Broadwell client commits back to the v3.17 version, before we run out of time and problematic code hits a stable upstream kernel. ( If the Haswell-EP crash is not resolved via a simple fix then we'll have to revert the Haswell-EP uncore driver as well. ) The Broadwell client series has to be submitted in a clean fashion, with single, well documented changes per patch. If they are submitted in time and are accepted during review then they can possibly go into v3.19 but will need additional scrutiny due to the rocky history of this patch set. Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-29 11:07:58 +01:00
Dexuan Cui	d1cd121083	x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE pte_pfn() returns a PFN of long (32 bits in 32-PAE), so "long << PAGE_SHIFT" will overflow for PFNs above 4GB. Due to this issue, some Linux 32-PAE distros, running as guests on Hyper-V, with 5GB memory assigned, can't load the netvsc driver successfully and hence the synthetic network device can't work (we can use the kernel parameter mem=3000M to work around the issue). Cast pte_pfn() to phys_addr_t before shifting. Fixes: "commit `d765653445`: x86, mm: Create slow_virt_to_phys()" Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: gregkh@linuxfoundation.org Cc: linux-mm@kvack.org Cc: olaf@aepfle.de Cc: apw@canonical.com Cc: jasowang@redhat.com Cc: dave.hansen@intel.com Cc: riel@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1414580017-27444-1-git-send-email-decui@microsoft.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-10-29 10:57:21 +01:00
Jiang Liu	b77e8f4353	ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi() Function mp_register_gsi() returns blindly the GSI number for the ACPI SCI interrupt. That causes a regression when the GSI for ACPI SCI is shared with other devices. The regression was caused by commit `84245af729` "x86, irq, ACPI: Change __acpi_register_gsi to return IRQ number instead of GSI" and exposed on a SuperMicro system, which shares one GSI between ACPI SCI and PCI device, with following failure: http://sourceforge.net/p/linux1394/mailman/linux1394-user/?viewmonth=201410 [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 20 low level) [ 2.699224] firewire_ohci 0000:06:00.0: failed to allocate interrupt 20 Return mp_map_gsi_to_irq(gsi, 0) instead of the GSI number. Reported-and-Tested-by: Daniel Robbins <drobbins@funtoo.org> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: <stable@vger.kernel.org> # 3.17 Link: http://lkml.kernel.org/r/1414387308-27148-4-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-29 08:52:30 +01:00
Jiang Liu	f18298595a	x86, intel-mid: Create IRQs for APB timers and RTC timers Intel MID platforms has no legacy interrupts, so no IRQ descriptors preallocated. We need to call mp_map_gsi_to_irq() to create IRQ descriptors for APB timers and RTC timers, otherwise it may cause invalid memory access as: [ 0.116839] BUG: unable to handle kernel NULL pointer dereference at 0000003a [ 0.123803] IP: [<c1071c0e>] setup_irq+0xf/0x4d Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: <stable@vger.kernel.org> # 3.17 Link: http://lkml.kernel.org/r/1414387308-27148-3-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-29 08:52:23 +01:00
Dave Jones	d4e1a0af1d	x86: Don't enable F00F workaround on Intel Quark processors The Intel Quark processor is a part of family 5, but does not have the F00F bug present in Pentiums of the same family. Pentiums were models 0 through 8, Quark is model 9. Signed-off-by: Dave Jones <davej@redhat.com> Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie> Link: http://lkml.kernel.org/r/20141028175753.GA12743@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-29 08:52:09 +01:00
Russell King	2d605a3029	ARM: enable bpf syscall Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2014-10-29 00:18:20 +00:00
Nishanth Aravamudan	2c0a33f986	powerpc/numa: ensure per-cpu NUMA mappings are correct on topology update We received a report of warning in kernel/sched/core.c where the sched group was NULL on an LPAR after a topology update. This seems to occur because after the topology update has moved the CPUs, cpu_to_node is returning the old value still, which ends up breaking the consistency of the NUMA topology in the per-cpu maps. Ensure that we update the per-cpu fields when we re-map CPUs. Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-29 09:41:22 +11:00
Nishanth Aravamudan	49f8d8c043	powerpc/numa: use cached value of update->cpu in update_cpu_topology There isn't any need to keep referring to update->cpu, as we've already checked cpu == update->cpu at this point. Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-29 09:41:22 +11:00
Linus Torvalds	6e2028aaa1	Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm Pull ARM fixes from Russell King: "A couple of ARM fixes. We fix some printk formats for ptrdiff_t quantities which cause GCC 4.9 to complain, and we also blacklist known buggy GCC 4.8.x compilers as their miscompilation is serious enough to cause filesystem corruption, even through many distros have fixed their versions" * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: ARM: fix some printk formats ARM: Blacklist GCC 4.8.0 to GCC 4.8.2 - PR58854	2014-10-28 13:17:11 -07:00
David S. Miller	c20ce79303	sparc: Hook up bpf system call. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 11:30:43 -07:00
David Hildenbrand	a6cc310856	KVM: s390: sigp: split handling of SIGP STOP (AND STORE STATUS) In preparation for further code changes (e.g. getting rid of action_flags), this patch splits the handling of the two sigp orders SIGP STOP and SIGP STOP AND STORE STATUS by introducing a separate handler function for SIGP STOP AND STORE STATUS. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2014-10-28 13:09:14 +01:00
David Hildenbrand	07b0303540	KVM: s390: sigp: inject emergency calls in a separate function In preparation for further code changes, this patch moves the injection of emergency calls into a separate function and uses it for the processing of SIGP EMERGENCY CALL and SIGP CONDITIONAL EMERGENCY CALL. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2014-10-28 13:09:14 +01:00
David Hildenbrand	42cb0c9ff9	KVM: s390: sigp: instruction counters for all sigp orders This patch introduces instruction counters for all known sigp orders and also a separate one for unknown orders that are passed to user space. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2014-10-28 13:09:13 +01:00
David Hildenbrand	b898383082	KVM: s390: sigp: separate preparation handlers This patch introduces in preparation for further code changes separate handler functions for: - SIGP (RE)START - will not be allowed to terminate pending orders - SIGP (INITIAL) CPU RESET - will be allowed to terminate certain pending orders - unknown sigp orders All sigp orders that require user space intervention are logged. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2014-10-28 13:09:13 +01:00
David Hildenbrand	3d95c7d2d7	KVM: s390: sigp: move target cpu checks into dispatcher All sigp orders targeting one VCPU have to verify that the target is valid and available. Let's move the check from the single functions to the dispatcher. The destination VCPU is directly passed as a pointer - instead of the cpu address of the target. Please note that all SIGP orders except SIGP SET ARCHITECTURE - even unknown ones - will now check for the availability of the target VCPU. This is what the architecture documentation specifies. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2014-10-28 13:09:12 +01:00
David Hildenbrand	3526a66b66	KVM: s390: sigp: dispatch orders with one target in a separate function All sigp orders except SIGP SET ARCHITECTURE target exactly one vcpu. Let's move the dispatch code for these orders into a separate function to prepare for cleaner target availability checks. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2014-10-28 13:09:11 +01:00
Thomas Huth	a36c539326	KVM: s390: Fix size of monitor-class number field The monitor-class number field is only 16 bits, so we have to use a u16 pointer to access it. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> CC: stable@vger.kernel.org # v3.16+ Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2014-10-28 13:09:11 +01:00
Jason J. Herne	edeb69e537	KVM: s390: Cleanup usage of current->mm in set_guest_storage_key In set_guest_storage_key, we really want to reference the mm struct given as a parameter to the function. So replace the current->mm reference with the mm struct passed in by the caller. Signed-off-by: Jason J. Herne <jjherne@us.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2014-10-28 13:09:10 +01:00
Thomas Huth	a6b7e459ff	KVM: s390: Make the simple ipte mutex specific to a VM instead of global The ipte-locking should be done for each VM seperately, not globally. This way we avoid possible congestions when the simple ipte-lock is used and multiple VMs are running. Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2014-10-28 13:08:59 +01:00
Maciej W. Rozycki	60e684f0d6	x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts Fix duplicate XT-PIC seen in /proc/interrupts on x86 systems that make use of 8259A Programmable Interrupt Controllers. Specifically convert output like this: CPU0 0: 76573 XT-PIC-XT-PIC timer 1: 11 XT-PIC-XT-PIC i8042 2: 0 XT-PIC-XT-PIC cascade 4: 8 XT-PIC-XT-PIC serial 6: 3 XT-PIC-XT-PIC floppy 7: 0 XT-PIC-XT-PIC parport0 8: 1 XT-PIC-XT-PIC rtc0 10: 448 XT-PIC-XT-PIC fddi0 12: 23 XT-PIC-XT-PIC eth0 14: 2464 XT-PIC-XT-PIC ide0 NMI: 0 Non-maskable interrupts ERR: 0 to one like this: CPU0 0: 122033 XT-PIC timer 1: 11 XT-PIC i8042 2: 0 XT-PIC cascade 4: 8 XT-PIC serial 6: 3 XT-PIC floppy 7: 0 XT-PIC parport0 8: 1 XT-PIC rtc0 10: 145 XT-PIC fddi0 12: 31 XT-PIC eth0 14: 2245 XT-PIC ide0 NMI: 0 Non-maskable interrupts ERR: 0 that is one like we used to have from ~2.2 till it was changed sometime. The rationale is there is no value in this duplicate information, it merely clutters output and looks ugly. We only have one handler for 8259A interrupts so there is no need to give it a name separate from the name already given to irq_chip. We could define meaningful names for handlers based on bits in the ELCR register on systems that have it or the value of the LTIM bit we use in ICW1 otherwise (hardcoded to 0 though with MCA support gone), to tell edge-triggered and level-triggered inputs apart. While that information does not affect 8259A interrupt handlers it could help people determine which lines are shareable and which are not. That is material for a separate change though. Any tools that parse /proc/interrupts are supposed not to be affected since it was many years we used the format this change converts back to. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.11.1410260147190.21390@eddie.linux-mips.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 12:01:08 +01:00
Peter Zijlstra	7fb0f1de49	perf/x86: Fix compile warnings for intel_uncore The uncore drivers require PCI and generate compile time warnings when !CONFIG_PCI. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:51:03 +01:00
Peter Zijlstra	c719f56092	perf: Fix and clean up initialization of pmu::event_idx Andy reported that the current state of event_idx is rather confused. So remove all but the x86_pmu implementation and change the default to return 0 (the safe option). Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <cl@linux.com> Cc: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: Cody P Schafer <dev@codyps.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Himangi Saraogi <himangi774@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: sukadev@linux.vnet.ibm.com <sukadev@linux.vnet.ibm.com> Cc: Thomas Huth <thuth@linux.vnet.ibm.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux390@de.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:51:01 +01:00
Peter Zijlstra (Intel)	65d71fe137	perf: Fix bogus kernel printk Andy spotted the fail in what was intended as a conditional printk level. Reported-by: Andy Lutomirski <luto@amacapital.net> Fixes: `cc6cd47e73` ("perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141007124757.GH19379@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:51:01 +01:00
Oleg Nesterov	009f60e276	sched: stop the unbound recursion in preempt_schedule_context() preempt_schedule_context() does preempt_enable_notrace() at the end and this can call the same function again; exception_exit() is heavy and it is quite possible that need-resched is true again. 1. Change this code to dec preempt_count() and check need_resched() by hand. 2. As Linus suggested, we can use the PREEMPT_ACTIVE bit and avoid the enable/disable dance around __schedule(). But in this case we need to move into sched/core.c. 3. Cosmetic, but x86 forgets to declare this function. This doesn't really matter because it is only called by asm helpers, still it make sense to add the declaration into asm/preempt.h to match preempt_schedule(). Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Graf <agraf@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20141005202322.GB27962@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:46:05 +01:00
Ian Munsie	03f5439797	powerpc/mm: Use appropriate ESID mask in copro_calculate_slb() This patch makes copro_calculate_slb() mask the ESID by the correct mask for 1T vs 256M segments. This has no effect by itself as the extra bits were ignored, but it makes debugging the segment table entries easier and means that we can directly compare the ESID values for duplicates without needing to worry about masking in the comparison. This will be used to simplify a comparison in the following patch. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-28 19:52:45 +11:00
Weijie Yang	3c325f8233	x86, cma: Reserve DMA contiguous area after initmem_init() Fengguang Wu reported a boot crash on the x86 platform via the 0-day Linux Kernel Performance Test: cma: dma_contiguous_reserve: reserving 31 MiB for global area BUG: Int 6: CR2 (null) [<41850786>] dump_stack+0x16/0x18 [<41d2b1db>] early_idt_handler+0x6b/0x6b [<41072227>] ? __phys_addr+0x2e/0xca [<41d4ee4d>] cma_declare_contiguous+0x3c/0x2d7 [<41d6d359>] dma_contiguous_reserve_area+0x27/0x47 [<41d6d4d1>] dma_contiguous_reserve+0x158/0x163 [<41d33e0f>] setup_arch+0x79b/0xc68 [<41d2b7cf>] start_kernel+0x9c/0x456 [<41d2b2ca>] i386_start_kernel+0x79/0x7d (See details at: https://lkml.org/lkml/2014/10/8/708) It is because dma_contiguous_reserve() is called before initmem_init() in x86, the variable high_memory is not initialized but accessed by __pa(high_memory) in dma_contiguous_reserve(). This patch moves dma_contiguous_reserve() after initmem_init() so that high_memory is initialized before accessed. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: iamjoonsoo.kim@lge.com Cc: 'Linux-MM' <linux-mm@kvack.org> Cc: 'Weijie Yang' <weijie.yang.kh@gmail.com> Link: http://lkml.kernel.org/r/000101cfef69%2431e528a0%2495af79e0%24%25yang@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 07:36:50 +01:00
Michael Ellerman	bf19edd290	Revert "powerpc/powernv: Fix endian bug in LPC bus debugfs accessors" This reverts commit `bf7588a085`. Ben says although the code is not correct "[this] fix was completely wrong and does more damages than it fixes things." Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2014-10-28 15:17:48 +11:00
Michal Simek	a4f174dee4	microblaze: Wire up bpf syscall Add new bpf syscall. Signed-off-by: Michal Simek <michal.simek@xilinx.com>	2014-10-27 09:25:34 +01:00
Michal Simek	70dcd942dc	microblaze: Fix IO space breakage after of_pci_range_to_resource() change Commit `0b0b0893d4` "of/pci: Fix the conversion of IO ranges into IO resources" changed the behaviour of of_pci_range_to_resource(). The issue is described here: "powerpc/pci: Fix IO space breakage after of_pci_range_to_resource() change" (sha1: `aeba3731b1`) Signed-off-by: Michal Simek <michal.simek@xilinx.com>	2014-10-27 08:29:54 +01:00
Michal Simek	4cbbbb43d6	microblaze: Fix missing NR_CPUS in menuconfig The time Kconfig expects that NR_CPUS is defined. This patch remove this config warning: "kernel/time/Kconfig:163:warning: range is invalid" Signed-off-by: Michal Simek <michal.simek@xilinx.com>	2014-10-27 08:29:54 +01:00
Linus Torvalds	88e237610b	ARM: SoC fixes for -rc2 Another week, another small batch of fixes. Most of these make zynq, socfpga and sunxi platforms work a bit better: * Due to new requirements for regulators, DWMMC on socfpga broke past 3.17. * SMP spinup fix for socfpga * A few DT fixes for zynq * Another option (FIXED_REGULATOR) for sunxi is needed that used to be selected by other options but no longer is. * A couple of small DT fixes for at91 * ...and a couple for i.MX. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUTSwQAAoJEIwa5zzehBx3D0AP/3ktsJ9ORSSDDEbpGWUPndQN bLGOT4DGfWWn/BOlMYN9kM2k7Gr6ttxFzqepKoeb0Dl5myUeqC4C42t8FqrI78TB wf8e9f2lXI+j3wve55FarTDk9JSh6PbQdavgnNCzVLJQddA//JKz9vZhL4jVYC/s kh8VeoLOYKeE/sdcYeBF36zNAkmy0CfaGjC01SZEcd7BjVv8qq0TvkXXSP1bjsry ztH+DN8OJ3gg7IKB8IntfzaxSnDQl+zxlVeOsPaU1Lvahs6wSFgRqA849Nc6KXdl rpAuaTH6Pa5RNEd1zqhE2+o4xZymk/BM+JU77pizq4dP0o3JnDy5tzzMMd24FuMG sD+JZrSCP9o58L1y9W1jhVgoxmpnRGZNO1n8FhABcnSTL50W3iAzIvlpxnOIu0/z SzNMdItA3dtCn/Aec7wL7eGLUlyI73khMIt4heQ0jPY+IncGJ0yvdFe2m8SZKmS2 mDeQaChml8rjXvIdjiWIlDTagBpTkR1R1JX6aJh0lgZIF1K9qf1ZfzJ5dbLAXtZe xjGeoOe8hXRxR0spc1rRAJlPGJh/Fqkm0UeFLDwP0DOJISTcgz4daT/Y7zdDGRJ6 n+1kjrmwv/M481wNifFt33sdZEB1EcUO/uNAYfUV0Wlpv5ye7x2aLsfbsnMEh+qd H0a6R6NZu7473ewhWxRu =MTvh -----END PGP SIGNATURE----- Merge tag 'armsoc-for-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "Another week, another small batch of fixes. Most of these make zynq, socfpga and sunxi platforms work a bit better: - due to new requirements for regulators, DWMMC on socfpga broke past v3.17 - SMP spinup fix for socfpga - a few DT fixes for zynq - another option (FIXED_REGULATOR) for sunxi is needed that used to be selected by other options but no longer is. - a couple of small DT fixes for at91 - ...and a couple for i.MX" * tag 'armsoc-for-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: imx28-evk: Let i2c0 run at 100kHz ARM: i.MX6: Fix "emi" clock name typo ARM: multi_v7_defconfig: enable CONFIG_MMC_DW_ROCKCHIP ARM: sunxi_defconfig: enable CONFIG_REGULATOR_FIXED_VOLTAGE ARM: dts: socfpga: Add a 3.3V fixed regulator node ARM: dts: socfpga: Fix SD card detect ARM: dts: socfpga: rename gpio nodes ARM: at91/dt: sam9263: fix PLLB frequencies power: reset: at91-reset: fix power down register MAINTAINERS: add atmel ssc driver maintainer entry arm: socfpga: fix fetching cpu1start_addr for SMP ARM: zynq: DT: trivial: Fix mc node ARM: zynq: DT: Add cadence watchdog node ARM: zynq: DT: Add missing reference for memory-controller ARM: zynq: DT: Add missing reference for ADC ARM: zynq: DT: Add missing address for L2 pl310 ARM: zynq: DT: Remove 222 MHz OPP ARM: zynq: DT: Fix GEM register area size	2014-10-26 11:35:51 -07:00
Olof Johansson	efc176a8ee	The i.MX fixes for 3.18: - Revert one patch which increases I2C bus frequency on imx28-evk - Fix a typo on imx6q EIM clock name -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUTE8JAAoJEFBXWFqHsHzOGawH/0KGNaHbI3rj+Hx1HHtN056y 3rgHSsLZSLQB89+bMd8aEVPJ2z0RKYXfyI1IvkcgEZxsqmHwRY8Fwlof4D38/bfP tRHnyzT2E+znnyhvUZlH9yd9foTd3VkXbxFxbEssRHl2W2OxA0+3MbskknERPZqs qr22DcMLKyrTbUH39iiEjS43qcJhuf/6vZmoVGCGdZonZwkH8WccIQ+kKneOn8/Z 11U4ioB4pirqvhM1niYQ95RLG0TveBN6op3c1HWkhqY4EKOlraZHQb4EOoslSO/X vWoJqgB9DLH3eV+WTFI0FjGDK/6CFhgAth8q0FKVlHA3FFHr+fXdxv/+NLtagzQ= =elO/ -----END PGP SIGNATURE----- Merge tag 'imx-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes Merge "ARM: imx: fixes for 3.18" from Shawn Guo: The i.MX fixes for 3.18: - Revert one patch which increases I2C bus frequency on imx28-evk - Fix a typo on imx6q EIM clock name * tag 'imx-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: imx28-evk: Let i2c0 run at 100kHz ARM: i.MX6: Fix "emi" clock name typo Signed-off-by: Olof Johansson <olof@lixom.net>	2014-10-25 20:44:05 -07:00
Fabio Estevam	d1e61eb443	ARM: dts: imx28-evk: Let i2c0 run at 100kHz Commit `78b81f4666` ("ARM: dts: imx28-evk: Run I2C0 at 400kHz") caused issues when doing the following sequence in loop: - Boot the kernel - Perform audio playback - Reboot the system via 'reboot' command In many times the audio card cannot be probed, which causes playback to fail. After restoring to the original i2c0 frequency of 100kHz there is no such problem anymore. This reverts commit `78b81f4666`. Cc: <stable@vger.kernel.org> # 3.16+ Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>	2014-10-25 20:17:36 +08:00
Steve Longerbeam	a1fc198046	ARM: i.MX6: Fix "emi" clock name typo Fix a typo error, the "emi" names refer to the eim clocks. The change fixes typo in EIM and EIM_SLOW pre-output dividers and selectors clock names. Notably EIM_SLOW clock itself is named correctly. Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com> [vladimir_zapolskiy@mentor.com: ported to v3.17] Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Cc: Sascha Hauer <kernel@pengutronix.de> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>	2014-10-25 20:01:09 +08:00
Eric Paris	26c2d2b391	i386/audit: stop scribbling on the stack frame git commit `b4f0d3755c` was very very dumb. It was writing over %esp/pt_regs semi-randomly on i686 with the expected "system can't boot" results. As noted in: https://bugs.freedesktop.org/show_bug.cgi?id=85277 This patch stops fscking with pt_regs. Instead it sets up the registers for the call to __audit_syscall_entry in the most obvious conceivable way. It then does just a tiny tiny touch of magic. We need to get what started in PT_EDX into 0(%esp) and PT_ESI into 4(%esp). This is as easy as a pair of pushes. After the call to __audit_syscall_entry all we need to do is get that now useless junk off the stack (pair of pops) and reload %eax with the original syscall so other stuff can keep going about it's business. Reported-by: Paulo Zanoni <przanoni@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com> Link: http://lkml.kernel.org/r/1414037043-30647-1-git-send-email-eparis@redhat.com Cc: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-10-24 13:27:56 -07:00
H. Peter Anvin	db65bcfd95	Linux 3.18-rc1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJURGCfAAoJEHm+PkMAQRiG6toH/RUazjqZxqMvLlm1y+O6+7s9 OpFdcDl4ZQtrvymBRYipu46pbDUoAAsVbxQJllaLNtHE0UrvaQE76WihBQYM8qW/ WoESLsZRbNQqQYQixf55pOozX7uIuG+9LKHagC8JNfD1Bw/nQ+RleSXqFsBCdpMW i7SzcZBu2Iv+LnVmjvoGMOQa+loKzO6Pj1MpoHxxJQmeypH3dZR7mLVeBJNZQtLE BGY47gYraVzb9EjKnSbjrIKjpM9o0MIihoanrrjnq0JMrfm4pi6W5GgaGDUiaBVH w7Vmr5S2pjzrS41gKSVK9/XO1CrDG8tsp3QwA2+iIbjdR3wBDynyeG3UfnLABec= =hwbG -----END PGP SIGNATURE----- Merge tag 'v3.18-rc1' into x86/urgent Reason: Need to apply audit patch on top of v3.18-rc1. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-10-24 13:26:37 -07:00
Linus Torvalds	2cc91884b6	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus Pull MIPS fixes from Ralf Baechle: "This is the first round of fixes and tying up loose ends for MIPS. - plenty of fixes for build errors in specific obscure configurations - remove redundant code on the Lantiq platform - removal of a useless SEAD I2C driver that was causing a build issue - fix an earlier TLB exeption handler fix to also work on Octeon. - fix ISA level dependencies in FPU emulator's instruction decoding. - don't hardcode kernel command line in Octeon software emulator. - fix an earlier fix for the Loondson 2 clock setting" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: SEAD3: Fix I2C device registration. MIPS: SEAD3: Nuke PIC32 I2C driver. MIPS: ftrace: Fix a microMIPS build problem MIPS: MSP71xx: Fix build error MIPS: Malta: Do not build the malta-amon.c file if CMP is not enabled MIPS: Prevent compiler warning from cop2_{save,restore} MIPS: Kconfig: Add missing MIPS_CPS dependencies to PM and cpuidle MIPS: idle: Remove leftover __pastwait symbol and its references MIPS: Sibyte: Include the swarm subdir to the sb1250 LittleSur builds MIPS: ptrace.h: Add a missing include MIPS: ath79: Fix compilation error when CONFIG_PCI is disabled MIPS: MSP71xx: Remove compilation error when CONFIG_MIPS_MT is present MIPS: Octeon: Remove special case for simulator command line. MIPS: tlbex: Properly fix HUGE TLB Refill exception handler MIPS: loongson2_cpufreq: Fix CPU clock rate setting mismerge pci: pci-lantiq: remove duplicate check on resource MIPS: Lasat: Add missing CONFIG_PROC_FS dependency to PICVUE_PROC MIPS: cp1emu: Fix ISA restrictions for cop1x_op instructions	2014-10-24 12:48:47 -07:00
Linus Torvalds	cdc63a0595	arm64 fixes: - Enable 48-bit VA space now that KVM has been fixed, together with a couple of fixes for pgd allocation alignment and initial memblock current_limit. There is still a dependency on !ARM_SMMU which needs to be updated as it uses the page table manipulation macros of the host kernel - eBPF fixes following changes/conflicts during the merging window - Compat types affecting compat_elf_prpsinfo - Compilation error on UP builds - ASLR fix when /proc/sys/kernel/randomize_va_space == 0 - DT definitions for CLCD support on ARMv8 model platform -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUSowhAAoJEGvWsS0AyF7x4A8QAK/KJj3aEhRoJCFKtkrKcQer WydkHEVpJtk8Y+o9zIBU/J9HXDDakQlIZi3bNcWA+TQda1yr7zqEgVRZwhfaQMIu oXLzSLnZtiqe2HU7TaccJfFG293K+gysjTPRPixdAwWO/9hvoPOqJHnBRWKTDNzh 8D04PTM9dcpKXvVjPcRHIxbk2oH04a/tjOBeTpi5uWaUdZLWjHt2dTjWwP/q0af4 XsDrF5pYQaYEzCI9MczSbcQLwFPkxhS36JH+V+OhmVoCFv0PT7mm5o29DiU1N/Rt UsAwtBQ4oQV8seZMQaT5sVDNBqqqyfrYDAACdY0ewIr81PF7z8tdm5+G1P4JfQ0t iVguz3s1rJ6V0yXy0t18XHgpPFLLqpoEDEO6obYXYrhe2nTquQulgJoLaIu2qXmO jlL8R1rHWKRAQ7xIyLATjhUmW5dc2aK6xO+/3Xuz1+JOunNeOZW67xexpPzRU4Vh sw9S8sKwJmL5wH+ojqxsbg73WvTUs5dd4WoK7Tci8FZ0qfG14pyaX4s9iRhUZArQ 4vx8lfF7FQma8nZ0ytXSY/666dAedL/bXZrmPhjVl/XYoEA4IFaW7uDIIqGbUMWr oNOe4QahxZu5jHI8CYncXHw51RXst+03oE5Uon30x7F3ZD71perLZe2dMTTaezKv 2MX9/BULRrjpgkA4gW5L =QssH -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - enable 48-bit VA space now that KVM has been fixed, together with a couple of fixes for pgd allocation alignment and initial memblock current_limit. There is still a dependency on !ARM_SMMU which needs to be updated as it uses the page table manipulation macros of the host kernel - eBPF fixes following changes/conflicts during the merging window - Compat types affecting compat_elf_prpsinfo - Compilation error on UP builds - ASLR fix when /proc/sys/kernel/randomize_va_space == 0 - DT definitions for CLCD support on ARMv8 model platform * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix memblock current_limit with 64K pages and 48-bit VA arm64: ASLR: Don't randomise text when randomise_va_space == 0 arm64: vexpress: Add CLCD support to the ARMv8 model platform arm64: Fix compilation error on UP builds Documentation/arm64/memory.txt: fix typo net: bpf: arm64: minor fix of type in jited arm64: bpf: add 'load 64-bit immediate' instruction arm64: bpf: add 'shift by register' instructions net: bpf: arm64: address randomize and write protect JIT code arm64: mm: Correct fixmap pagetable types arm64: compat: fix compat types affecting struct compat_elf_prpsinfo arm64: Align less than PAGE_SIZE pgds naturally arm64: Allow 48-bits VA space without ARM_SMMU	2014-10-24 12:48:04 -07:00
Linus Torvalds	83da00fbc0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc Pull two sparc fixes from David Miller: 1) Fix boots with gcc-4.9 compiled sparc64 kernels. 2) Add missing __get_user_pages_fast() on sparc64 to fix hangs on futexes used in transparent hugepage areas. It's really idiotic to have a weak symbolled fallback that just returns zero, and causes this kind of bug. There should be no backup implementation and the link should fail if the architecture fails to provide __get_user_pages_fast() and supports transparent hugepages. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Implement __get_user_pages_fast(). sparc64: Fix register corruption in top-most kernel stack frame during boot.	2014-10-24 12:45:47 -07:00
Linus Torvalds	96971e9aa9	This is a pretty large update. I think it is roughly as big as what I usually had for the _whole_ rc period. There are a few bad bugs where the guest can OOPS or crash the host. We have also started looking at attack models for nested virtualization; bugs that usually result in the guest ring 0 crashing itself become more worrisome if you have nested virtualization, because the nested guest might bring down the non-nested guest as well. For current uses of nested virtualization these do not really have a security impact, but you never know and bugs are bugs nevertheless. A lot of these bugs are in 3.17 too, resulting in a large number of stable@ Ccs. I checked that all the patches apply there with no conflicts. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJUSjmSAAoJEL/70l94x66D2cYH/3JKWsTzhXjHGxZcXQQ85CwR 49hp/crCLWJ2YRKzyAOkvwPI0/SgYKM5wJ8kgtKlpLxrPZKYwhGd1S9tKf6EdAib 5gc/SDDAgHmkqL3IrXmkyKzUVeUWvgD/IFi1Sqalko1blpRlaN/JyJV0mjjGCbA+ yH3Qi5tD0X00u00ycuZCB6mrFH0PH87BmKFiz6bSSJ43tsgD9AVD64BZid6c6hwm iaIfNcIuShavlv1TKG80cSez2qtNXjRLeTN8A10gVZo3hof/wP8aRm+LxF/1JEZX OsoNCjOhhL29qafcZOg3j/atbiAzWtSGV3vjU+iWh5mnN5oFZHcPgIGucQsuFec= =9oQY -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "This is a pretty large update. I think it is roughly as big as what I usually had for the _whole_ rc period. There are a few bad bugs where the guest can OOPS or crash the host. We have also started looking at attack models for nested virtualization; bugs that usually result in the guest ring 0 crashing itself become more worrisome if you have nested virtualization, because the nested guest might bring down the non-nested guest as well. For current uses of nested virtualization these do not really have a security impact, but you never know and bugs are bugs nevertheless. A lot of these bugs are in 3.17 too, resulting in a large number of stable@ Ccs. I checked that all the patches apply there with no conflicts" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: vfio: fix unregister kvm_device_ops of vfio KVM: x86: Wrong assertion on paging_tmpl.h kvm: fix excessive pages un-pinning in kvm_iommu_map error path. KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag KVM: x86: Emulator does not decode clflush well KVM: emulate: avoid accessing NULL ctxt->memopp KVM: x86: Decoding guest instructions which cross page boundary may fail kvm: x86: don't kill guest on unknown exit reason kvm: vmx: handle invvpid vm exit gracefully KVM: x86: Handle errors when RIP is set during far jumps KVM: x86: Emulator fixes for eip canonical checks on near branches KVM: x86: Fix wrong masking on relative jump/call KVM: x86: Improve thread safety in pit KVM: x86: Prevent host from panicking on shared MSR writes. KVM: x86: Check non-canonical addresses upon WRMSR	2014-10-24 12:42:55 -07:00
Linus Torvalds	20ca57cde5	xen: bug fixes for 3.18-rc1 - Fix regression in xen_clocksource_read() which caused all Xen guests to crash early in boot. - Several fixes for super rare race conditions in the p2m. - Assorted other minor fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJUSh3nAAoJEFxbo/MsZsTRw6IH/imL2J++b8cafVvHjmVRt1T/ P7KuFYPh/Tym+LISDBfk7MeOXZWsffvUDP653cGQiIMgmumEgVrU1+vR2Z0qRiRe 95ZDIuQBmyGNBG9MiB0+zB7+STsvLECkPVWYDJCNbGVgrlHL6UHne06edrSpfr30 13PyZeJAojezrt2hzLO43V7bu9acRmLo6WNdh6N2stfJv8QSQYSQO87baRdRB+rO I1r2jP7TJp9ZRtzSTsYLfpyhCGLcvXY58bci+Tz9x6xWMJ/HH5HvfJjxO17HzbdD 2se6MKFVbOXT7DQK+BvQBDIO52t731DWZs4t7SJg24kDoINL7XiC/qSHC0vHJJM= =Cs0b -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.18-b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - Fix regression in xen_clocksource_read() which caused all Xen guests to crash early in boot. - Several fixes for super rare race conditions in the p2m. - Assorted other minor fixes. * tag 'stable/for-linus-3.18-b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pci: Allocate memory for physdev_pci_device_add's optarr x86/xen: panic on bad Xen-provided memory map x86/xen: Fix incorrect per_cpu accessor in xen_clocksource_read() x86/xen: avoid race in p2m handling x86/xen: delay construction of mfn_list_list x86/xen: avoid writing to freed memory after race in p2m handling xen/balloon: Don't continue ballooning when BP_ECANCELED is encountered	2014-10-24 12:41:50 -07:00
Catalin Marinas	3dec0fe48a	arm64: Fix memblock current_limit with 64K pages and 48-bit VA With 48-bit VA space, the 64K page configuration uses 3 levels instead of 2 and PUD_SIZE != PMD_SIZE. Since with 64K pages we only cover PMD_SIZE with the initial swapper_pg_dir populated in head.S, the memblock current_limit needs to be set accordingly in map_mem() to avoid allocating unmapped memory. The memblock current_limit is progressively increased as more blocks are mapped. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2014-10-24 18:16:47 +01:00
David S. Miller	06090e8ed8	sparc64: Implement __get_user_pages_fast(). It is not sufficient to only implement get_user_pages_fast(), you must also implement the atomic version __get_user_pages_fast() otherwise you end up using the weak symbol fallback implementation which simply returns zero. This is dangerous, because it causes the futex code to loop forever if transparent hugepages are supported (see get_futex_key()). Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-24 09:59:02 -07:00
David S. Miller	ef3e035c3a	sparc64: Fix register corruption in top-most kernel stack frame during boot. Meelis Roos reported that kernels built with gcc-4.9 do not boot, we eventually narrowed this down to only impacting machines using UltraSPARC-III and derivitive cpus. The crash happens right when the first user process is spawned: [ 54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 [ 54.451346] [ 54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab #96 [ 54.666431] Call Trace: [ 54.698453] [0000000000762f8c] panic+0xb0/0x224 [ 54.759071] [000000000045cf68] do_exit+0x948/0x960 [ 54.823123] [000000000042cbc0] fault_in_user_windows+0xe0/0x100 [ 54.902036] [0000000000404ad0] __handle_user_windows+0x0/0x10 [ 54.978662] Press Stop-A (L1-A) to return to the boot prom [ 55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 Further investigation showed that compiling only per_cpu_patch() with an older compiler fixes the boot. Detailed analysis showed that the function is not being miscompiled by gcc-4.9, but it is using a different register allocation ordering. With the gcc-4.9 compiled function, something during the code patching causes some of the %i* input registers to get corrupted. Perhaps we have a TLB miss path into the firmware that is deep enough to cause a register window spill and subsequent restore when we get back from the TLB miss trap. Let's plug this up by doing two things: 1) Stop using the firmware stack for client interface calls into the firmware. Just use the kernel's stack. 2) As soon as we can, call into a new function "start_early_boot()" to put a one-register-window buffer between the firmware's deepest stack frame and the top-most initial kernel one. Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-24 09:52:49 -07:00
Arun Chandran	92980405f3	arm64: ASLR: Don't randomise text when randomise_va_space == 0 When user asks to turn off ASLR by writing "0" to /proc/sys/kernel/randomize_va_space there should not be any randomization to mmap base, stack, VDSO, libs, text and heap Currently arm64 violates this behavior by randomising text. Fix this by defining a constant ELF_ET_DYN_BASE. The randomisation of mm->mmap_base is done by setup_new_exec -> arch_pick_mmap_layout -> mmap_base -> mmap_rnd. Signed-off-by: Arun Chandran <achandran@mvista.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2014-10-24 15:47:49 +01:00
Ralf Baechle	4846f11816	MIPS: SEAD3: Fix I2C device registration. This isn't a module and shouldn't be one. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2014-10-24 13:34:42 +02:00
Nadav Amit	1715d0dcb0	KVM: x86: Wrong assertion on paging_tmpl.h Even after the recent fix, the assertion on paging_tmpl.h is triggered. Apparently, the assertion wants to check that the PAE is always set on long-mode, but does it in incorrect way. Note that the assertion is not enabled unless the code is debugged by defining MMU_DEBUG. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-24 13:30:37 +02:00
Nadav Amit	3f6f1480d8	KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag The decode phase of the x86 emulator assumes that every instruction with the ModRM flag, and which can be used with RIP-relative addressing, has either SrcMem or DstMem. This is not the case for several instructions - prefetch, hint-nop and clflush. Adding SrcMem\|NoAccess for prefetch and hint-nop and SrcMem for clflush. This fixes CVE-2014-8480. Fixes: `41061cdb98` Cc: stable@vger.kernel.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-24 13:30:36 +02:00
Nadav Amit	13e457e0ee	KVM: x86: Emulator does not decode clflush well Currently, all group15 instructions are decoded as clflush (e.g., mfence, xsave). In addition, the clflush instruction requires no prefix (66/f2/f3) would exist. If prefix exists it may encode a different instruction (e.g., clflushopt). Creating a group for clflush, and different group for each prefix. This has been the case forever, but the next patch needs the cflush group in order to fix a bug introduced in 3.17. Fixes: `41061cdb98` Cc: stable@vger.kernel.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-24 13:30:36 +02:00
Paolo Bonzini	a430c91663	KVM: emulate: avoid accessing NULL ctxt->memopp A failure to decode the instruction can cause a NULL pointer access. This is fixed simply by moving the "done" label as close as possible to the return. This fixes CVE-2014-8481. Reported-by: Andy Lutomirski <luto@amacapital.net> Cc: stable@vger.kernel.org Fixes: `41061cdb98` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-24 13:30:35 +02:00

1 2 3 4 5 ...

102268 Commits