linux

Author	SHA1	Message	Date
Michael S. Tsirkin	f229a55c31	virtio/ringtest: fix up need_event math last kicked event index must be updated unconditionally: even if we don't need to kick, we do not want to re-check the same entry for events. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2018-01-31 01:47:35 +02:00
Arvind Yadav	31919140ff	virtio: virtio_mmio: make of_device_ids const. of_device_ids are not supposed to change at runtime. All functions working with of_device_ids provided by <linux/of.h> work with const of_device_ids. So mark the non-const structs as const. File size before: text data bss dec hex filename 3647 608 0 4255 109f drivers/virtio/virtio_mmio.o File size after constify virtio_mmio_match. text data bss dec hex filename 4063 192 0 4255 109f drivers/virtio/virtio_mmio.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-01-31 01:47:35 +02:00
Vasyl Gomonovych	0a9e63aa39	firmware: Use PTR_ERR_OR_ZERO() Fix ptr_ret.cocci warnings: drivers/firmware/efi/efi.c:610:8-14: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Gabriel Somlo <somlo@cmu.edu> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-01-31 01:47:34 +02:00
Vasyl Gomonovych	c2c9f9bc5b	virtio-mmio: Use PTR_ERR_OR_ZERO() Fix ptr_ret.cocci warnings: drivers/virtio/virtio_mmio.c:653:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-01-31 01:47:34 +02:00
Markus Elfring	473f0b15a4	vhost/scsi: Improve a size determination in four functions Replace the specification of four data structures by pointer dereferences as the parameter for the operator "sizeof" to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-01-31 01:47:33 +02:00
Tomáš Golembiovský	4d32029b8d	virtio_balloon: include disk/file caches memory statistics Add a new field VIRTIO_BALLOON_S_CACHES to virtio_balloon memory statistics protocol. The value represents all disk/file caches. In this case it corresponds to the sum of values Buffers+Cached+SwapCached from /proc/meminfo. Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-01-31 01:47:33 +02:00
Linus Torvalds	13ddd1667e	Merge branch 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing too interesting. Documentation updates and trivial changes; however, this pull request does containt he previusly discussed dropping of __must_check from strscpy()" * 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: Documentation: Fix 'file_mapped' -> 'mapped_file' string: drop __must_check from strscpy() and restore strscpy() usages in cgroup cgroup, docs: document the root cgroup behavior of cpu and io controllers cgroup-v2.txt: fix typos cgroup: Update documentation reference Documentation/cgroup-v1: fix outdated programming details cgroup, docs: document cgroup v2 device controller	2018-01-30 15:09:47 -08:00
Vitaly Kuznetsov	0092e4346f	x86/kvm: Support Hyper-V reenlightenment When running nested KVM on Hyper-V guests its required to update masterclocks for all guests when L1 migrates to a host with different TSC frequency. Implement the procedure in the following way: - Pause all guests. - Tell the host (Hyper-V) to stop emulating TSC accesses. - Update the gtod copy, recompute clocks. - Unpause all guests. This is somewhat similar to cpufreq but there are two important differences: - TSC emulation can only be disabled globally (on all CPUs) - The new TSC frequency is not known until emulation is turned off so there is no way to 'prepare' for the event upfront. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: devel@linuxdriverproject.org Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Mohammed Gamal <mmorsy@redhat.com> Link: https://lkml.kernel.org/r/20180124132337.30138-8-vkuznets@redhat.com	2018-01-30 23:55:34 +01:00
Vitaly Kuznetsov	b0c39dc68e	x86/kvm: Pass stable clocksource to guests when running nested on Hyper-V Currently, KVM is able to work in 'masterclock' mode passing PVCLOCK_TSC_STABLE_BIT to guests when the clocksource which is used on the host is TSC. When running nested on Hyper-V the guest normally uses a different one: TSC page which is resistant to TSC frequency changes on events like L1 migration. Add support for it in KVM. The only non-trivial change is in vgettsc(): when updating the gtod copy both the clock readout and tsc value have to be updated now. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: devel@linuxdriverproject.org Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Mohammed Gamal <mmorsy@redhat.com> Link: https://lkml.kernel.org/r/20180124132337.30138-7-vkuznets@redhat.com	2018-01-30 23:55:34 +01:00
Vitaly Kuznetsov	51d4e5daa3	x86/irq: Count Hyper-V reenlightenment interrupts Hyper-V reenlightenment interrupts arrive when the VM is migrated, While they are not interesting in general it's important when L2 nested guests are running. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: devel@linuxdriverproject.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Mohammed Gamal <mmorsy@redhat.com> Link: https://lkml.kernel.org/r/20180124132337.30138-6-vkuznets@redhat.com	2018-01-30 23:55:33 +01:00
Vitaly Kuznetsov	e7c4e36c44	x86/hyperv: Redirect reenlightment notifications on CPU offlining It is very unlikely for CPUs to get offlined when running on Hyper-V as there is a protection in the vmbus module which prevents it when the guest has any VMBus devices assigned. This, however, may change in future if an option to reassign an already active channel will be added. It is also possible to run without any Hyper-V devices or to have a CPU with no assigned channels. Reassign reenlightenment notifications to some other active CPU when the CPU which is assigned to them goes offline. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: devel@linuxdriverproject.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Mohammed Gamal <mmorsy@redhat.com> Link: https://lkml.kernel.org/r/20180124132337.30138-5-vkuznets@redhat.com	2018-01-30 23:55:33 +01:00
Vitaly Kuznetsov	93286261de	x86/hyperv: Reenlightenment notifications support Hyper-V supports Live Migration notification. This is supposed to be used in conjunction with TSC emulation: when a VM is migrated to a host with different TSC frequency for some short period the host emulates the accesses to TSC and sends an interrupt to notify about the event. When the guest is done updating everything it can disable TSC emulation and everything will start working fast again. These notifications weren't required until now as Hyper-V guests are not supposed to use TSC as a clocksource: in Linux the TSC is even marked as unstable on boot. Guests normally use 'tsc page' clocksource and host updates its values on migrations automatically. Things change when with nested virtualization: even when the PV clocksources (kvm-clock or tsc page) are passed through to the nested guests the TSC frequency and frequency changes need to be know.. Hyper-V Top Level Functional Specification (as of v5.0b) wrongly specifies EAX:BIT(12) of CPUID:0x40000009 as the feature identification bit. The right one to check is EAX:BIT(13) of CPUID:0x40000003. I was assured that the fix in on the way. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: devel@linuxdriverproject.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Mohammed Gamal <mmorsy@redhat.com> Link: https://lkml.kernel.org/r/20180124132337.30138-4-vkuznets@redhat.com	2018-01-30 23:55:32 +01:00
Vitaly Kuznetsov	e2768eaa1c	x86/hyperv: Add a function to read both TSC and TSC page value simulateneously This is going to be used from KVM code where both TSC and TSC page value are needed. Nothing is supposed to use the function when Hyper-V code is compiled out, just BUG(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: devel@linuxdriverproject.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Mohammed Gamal <mmorsy@redhat.com> Link: https://lkml.kernel.org/r/20180124132337.30138-3-vkuznets@redhat.com	2018-01-30 23:55:32 +01:00
Vitaly Kuznetsov	89a8f6d490	x86/hyperv: Check for required priviliges in hyperv_init() In hyperv_init() its presumed that it always has access to VP index and hypercall MSRs while according to the specification it should be checked if it's allowed to access the corresponding MSRs before accessing them. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: devel@linuxdriverproject.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Mohammed Gamal <mmorsy@redhat.com> Link: https://lkml.kernel.org/r/20180124132337.30138-2-vkuznets@redhat.com	2018-01-30 23:55:32 +01:00
Linus Torvalds	289104c9a4	Merge branch 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu update from Tejun Heo: "One trivial patch to convert the return type from int to bool" * 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: percpu_counter_initialized can be boolean	2018-01-30 14:50:36 -08:00
Linus Torvalds	76a250f9a5	Merge branch 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata updates from Tejun Heo: "Nothing too interesting. Several patches to convert mdelay() to usleep_range(), removal of unused pata_at32, and other low level driver specific changes" * 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ata: pata_pdc2027x: Replace mdelay with msleep ata: pata_it821x: Replace mdelay with usleep_range in it821x_firmware_command ata: sata_mv: Replace mdelay with usleep_range in mv_reset_channel ata: remove pata_at32 phy: brcm-sata: remove unused variable phy: brcm-sata: fix semicolon.cocci warnings ata: ahci_brcm: Recover from failures to identify devices phy: brcm-sata: Implement calibrate callback ahci: Add Intel Cannon Lake PCH-H PCI ID ata_piix: constify pci_bits libata:pata_atiixp: Don't use unconnected secondary port on SB600 ata: ahci_brcm: Avoid clobbering SATA_TOP_CTRL_BUS_CTRL ahci: Allow setting a default LPM policy for mobile chipsets ahci: Add PCI ids for Intel Bay Trail, Cherry Trail and Apollo Lake AHCI ahci: Annotate PCI ids for mobile Intel chipsets as such	2018-01-30 14:48:30 -08:00
Linus Torvalds	f8cc87b6c1	Merge branch 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Workqueue has an early init trick where workqueues can be created and work items queued on them before the workqueue subsystem is online. This helps simplifying early init and operation of low level subsystems which use workqueues for managerial things which aren't depended upon early during boot. Out of laziness, the early init didn't cover workqueues with WQ_MEM_RECLAIM, which is inconsistent and confusing because adding the flag simply makes the system fail to boot. Cover WQ_MEM_RECLAIM too. This was originally brought up for RCU but RCU didn't actually need this. I still think it's a good idea to cover it" * 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: allow WQ_MEM_RECLAIM on early init workqueues workqueue: separate out init_rescuer()	2018-01-30 14:45:39 -08:00
Linus Torvalds	2afe738fc0	Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns updates from Eric Biederman: "Between the holidays and other distractions only a small amount of namespace work made it into my tree this time. Just a final cleanup from a revert several kernels ago and a small typo fix from Wolffhardt Schwabe" * 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: fix typo in assignment of fs default overflow gid autofs4: Modify autofs_wait to use current_uid() and current_gid() userns: Don't fail follow_automount based on s_user_ns	2018-01-30 14:43:12 -08:00
Linus Torvalds	d4173023e6	Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo cleanups from Eric Biederman: "Long ago when 2.4 was just a testing release copy_siginfo_to_user was made to copy individual fields to userspace, possibly for efficiency and to ensure initialized values were not copied to userspace. Unfortunately the design was complex, it's assumptions unstated, and humans are fallible and so while it worked much of the time that design failed to ensure unitialized memory is not copied to userspace. This set of changes is part of a new design to clean up siginfo and simplify things, and hopefully make the siginfo handling robust enough that a simple inspection of the code can be made to ensure we don't copy any unitializied fields to userspace. The design is to unify struct siginfo and struct compat_siginfo into a single definition that is shared between all architectures so that anyone adding to the set of information shared with struct siginfo can see the whole picture. Hopefully ensuring all future si_code assignments are arch independent. The design is to unify copy_siginfo_to_user32 and copy_siginfo_from_user32 so that those function are complete and cope with all of the different cases documented in signinfo_layout. I don't think there was a single implementation of either of those functions that was complete and correct before my changes unified them. The design is to introduce a series of helpers including force_siginfo_fault that take the values that are needed in struct siginfo and build the siginfo structure for their callers. Ensuring struct siginfo is built correctly. The remaining work for 4.17 (unless someone thinks it is post -rc1 material) is to push usage of those helpers down into the architectures so that architecture specific code will not need to deal with the fiddly work of intializing struct siginfo, and then when struct siginfo is guaranteed to be fully initialized change copy siginfo_to_user into a simple wrapper around copy_to_user. Further there is work in progress on the issues that have been documented requires arch specific knowledge to sort out. The changes below fix or at least document all of the issues that have been found with siginfo generation. Then proceed to unify struct siginfo the 32 bit helpers that copy siginfo to and from userspace, and generally clean up anything that is not arch specific with regards to siginfo generation. It is a lot but with the unification you can of siginfo you can already see the code reduction in the kernel" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (45 commits) signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr mm/memory_failure: Remove unused trapno from memory_failure signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap signal: Helpers for faults with specialized siginfo layouts signal: Add send_sig_fault and force_sig_fault signal: Replace memset(info,...) with clear_siginfo for clarity signal: Don't use structure initializers for struct siginfo signal/arm64: Better isolate the COMPAT_TASK portion of ptrace_hbptriggered ptrace: Use copy_siginfo in setsiginfo and getsiginfo signal: Unify and correct copy_siginfo_to_user32 signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32 signal: Unify and correct copy_siginfo_from_user32 signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h signal/powerpc: Remove redefinition of NSIGTRAP on powerpc signal: Move addr_lsb into the _sigfault union for clarity ...	2018-01-30 14:18:52 -08:00
Tim Chen	18bf3c3ea8	x86/speculation: Use Indirect Branch Prediction Barrier in context switch Flush indirect branches when switching into a process that marked itself non dumpable. This protects high value processes like gpg better, without having too high performance overhead. If done naïvely, we could switch to a kernel idle thread and then back to the original process, such as: process A -> idle -> process A In such scenario, we do not have to do IBPB here even though the process is non-dumpable, as we are switching back to the same process after a hiatus. To avoid the redundant IBPB, which is expensive, we track the last mm user context ID. The cost is to have an extra u64 mm context id to track the last mm we were using before switching to the init_mm used by idle. Avoiding the extra IBPB is probably worth the extra memory for this common scenario. For those cases where tlb_defer_switch_to_init_mm() returns true (non PCID), lazy tlb will defer switch to init_mm, so we will not be changing the mm for the process A -> idle -> process A switch. So IBPB will be skipped for this case. Thanks to the reviewers and Andy Lutomirski for the suggestion of using ctx_id which got rid of the problem of mm pointer recycling. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ak@linux.intel.com Cc: karahmed@amazon.de Cc: arjan@linux.intel.com Cc: torvalds@linux-foundation.org Cc: linux@dominikbrodowski.net Cc: peterz@infradead.org Cc: bp@alien8.de Cc: luto@kernel.org Cc: pbonzini@redhat.com Cc: gregkh@linux-foundation.org Link: https://lkml.kernel.org/r/1517263487-3708-1-git-send-email-dwmw@amazon.co.uk	2018-01-30 23:09:21 +01:00
Linus Torvalds	0aebc6a440	arm64 updates for 4.16: - Security mitigations: - variant 2: invalidating the branch predictor with a call to secure firmware - variant 3: implementing KPTI for arm64 - 52-bit physical address support for arm64 (ARMv8.2) - arm64 support for RAS (firmware first only) and SDEI (software delegated exception interface; allows firmware to inject a RAS error into the OS) - Perf support for the ARM DynamIQ Shared Unit PMU - CPUID and HWCAP bits updated for new floating point multiplication instructions in ARMv8.4 - Removing some virtual memory layout printks during boot - Fix initial page table creation to cope with larger than 32M kernel images when 16K pages are enabled -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlpwxDMACgkQa9axLQDI XvF55BAAniMpxPXnYNfv6l7/4O8eKo1lJIaG1wbej4JRZ/rT3K4Z3OBXW1dKHO8d /PTbVmZ90IqIGROkoDrE+6xyjjn9yK3uuW4ytN2zQkBa8VFaHAnHlX+zKQcuwy9f yxwiHk+C7vK5JR7mpXTazjRknsUv1MPtlTt7DQrSdq0KRDJVDNFC+grmbew2rz0X cjQDqZqgzuFyrKxdiQVjDmc3zH9NsNBhDo0hlGHf2jK6bGJsAPtI8M2JcLrK8ITG Ye/dD7BJp1mWD8ff0BPaMxu24qfAMNLH8f2dpTa986/H78irVz7i/t5HG0/1+5Jh EE4OFRTKZ59Qgyo1zWcaJvdp8YjiaX/L4PWJg8CxM5OhP9dIac9ydcFQfWzpKpUs xyZfmK6XliGFReAkVOOf5tEqFUDhMtsqhzPYmbmU1lp61wmSYIZ8CTenpWWCJSRO NOGyG1X2uFBvP69+iPNlfTGz1r7tg1URY5iO8fUEIhY8LrgyORkiqw4OvPEgnMXP Ngy+dXhyvnps2AAWbSX0O4puRlTgEYLT5KaMLzH/+gWsXATT0rzUCD/aOwUQq/Y7 SWXZHkb3jpmOZZnzZsLL2MNzEIPCFBwSUE9fSv4dA9d/N6tUmlmZALJjHkfzCDpj +mPsSmAMTj72kUYzm0b5GCtOu/iQ2kDWOZjOM1m4+v/B+f7JoEE= =iEjP -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "The main theme of this pull request is security covering variants 2 and 3 for arm64. I expect to send additional patches next week covering an improved firmware interface (requires firmware changes) for variant 2 and way for KPTI to be disabled on unaffected CPUs (Cavium's ThunderX doesn't work properly with KPTI enabled because of a hardware erratum). Summary: - Security mitigations: - variant 2: invalidate the branch predictor with a call to secure firmware - variant 3: implement KPTI for arm64 - 52-bit physical address support for arm64 (ARMv8.2) - arm64 support for RAS (firmware first only) and SDEI (software delegated exception interface; allows firmware to inject a RAS error into the OS) - perf support for the ARM DynamIQ Shared Unit PMU - CPUID and HWCAP bits updated for new floating point multiplication instructions in ARMv8.4 - remove some virtual memory layout printks during boot - fix initial page table creation to cope with larger than 32M kernel images when 16K pages are enabled" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (104 commits) arm64: Fix TTBR + PAN + 52-bit PA logic in cpu_do_switch_mm arm64: Turn on KPTI only on CPUs that need it arm64: Branch predictor hardening for Cavium ThunderX2 arm64: Run enable method for errata work arounds on late CPUs arm64: Move BP hardening to check_and_switch_context arm64: mm: ignore memory above supported physical address size arm64: kpti: Fix the interaction between ASID switching and software PAN KVM: arm64: Emulate RAS error registers and set HCR_EL2's TERR & TEA KVM: arm64: Handle RAS SErrors from EL2 on guest exit KVM: arm64: Handle RAS SErrors from EL1 on guest exit KVM: arm64: Save ESR_EL2 on guest SError KVM: arm64: Save/Restore guest DISR_EL1 KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2. KVM: arm/arm64: mask/unmask daif around VHE guests arm64: kernel: Prepare for a DISR user arm64: Unconditionally enable IESB on exception entry/return for firmware-first arm64: kernel: Survive corrected RAS errors notified by SError arm64: cpufeature: Detect CPU RAS Extentions arm64: sysreg: Move to use definitions for all the SCTLR bits arm64: cpufeature: __this_cpu_has_cap() shouldn't stop early ...	2018-01-30 13:57:43 -08:00
John Pittman	9614e2ba91	dm cache: Documentation: update default migration_throttling value In commit `f8350daf7a` ("dm cache: tune migration throttling") the value for DEFAULT_MIGRATION_THRESHOLD was decreased from 204800 to 2048. Edit device-mapper/cache.txt to reflect the correct default value for migration_threshold. Signed-off-by: John Pittman <jpittman@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-01-30 16:55:47 -05:00
David Woodhouse	7fcae1118f	x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel Despite the fact that all the other code there seems to be doing it, just using set_cpu_cap() in early_intel_init() doesn't actually work. For CPUs with PKU support, setup_pku() calls get_cpu_cap() after c->c_init() has set those feature bits. That resets those bits back to what was queried from the hardware. Turning the bits off for bad microcode is easy to fix. That can just use setup_clear_cpu_cap() to force them off for all CPUs. I was less keen on forcing the feature bits on that way, just in case of inconsistencies. I appreciate that the kernel is going to get this utterly wrong if CPU features are not consistent, because it has already applied alternatives by the time secondary CPUs are brought up. But at least if setup_force_cpu_cap() isn't being used, we might have a chance of detecting the lack of the corresponding bit and either panicking or refusing to bring the offending CPU online. So ensure that the appropriate feature bits are set within get_cpu_cap() regardless of how many extra times it's called. Fixes: `2961298e` ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: karahmed@amazon.de Cc: peterz@infradead.org Cc: bp@alien8.de Link: https://lkml.kernel.org/r/1517322623-15261-1-git-send-email-dwmw@amazon.co.uk	2018-01-30 22:35:05 +01:00
Bjorn Helgaas	65d5e9135f	PCI/DPC: Reformat DPC register definitions Reformat DPC register definitions to follow the convention that register field masks indicate the register width, e.g., a field of a 16-bit register uses a mask of 4 hex digits, with leading zeros included as needed. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>	2018-01-30 15:26:30 -06:00
Bjorn Helgaas	01060e3d4e	PCI/DPC: Add and use DPC Status register field definitions Add definitions for DPC Status register fields and use them in the code. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>	2018-01-30 15:26:25 -06:00
Bjorn Helgaas	716f0f732f	PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error() dpc_process_rp_pio_error() only calls dpc_rp_pio_get_info(), so squash them together. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>	2018-01-30 15:26:20 -06:00
Bjorn Helgaas	f784c41f9c	PCI/DPC: Remove unnecessary RP PIO register structs We read and immediately print the RP PIO log registers. We don't save them, so there's no need to define structs for them. Remove the structs and read the registers into local variables instead. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>	2018-01-30 15:26:15 -06:00
Bjorn Helgaas	f5ec5a0737	PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info() Move the dpc->rp_pio_status assignment into dpc_rp_pio_get_info() since that's where we read rp_pio->status anway. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>	2018-01-30 15:26:09 -06:00
Bjorn Helgaas	a88b304e61	PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info() Separating dpc_rp_pio_print_error() doesn't really provide any useful abstraction, so squash it into its caller, dpc_rp_pio_get_info(). No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>	2018-01-30 15:26:01 -06:00
Bjorn Helgaas	64c3394efd	PCI/DPC: Make RP PIO log size check more generic In dpc_probe(), we set dpc->rp_log_size to zero if we think the hardware reports an invalid size. In this case, we could have dpc->rp_extensions set but dpc->rp_log_size == 0, and we should print the basic RP PIO registers but not the variable-size portion. We already checked for dpc->rp_log_size < 4 above, so this patch is just for consistency of style. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>	2018-01-30 15:25:56 -06:00
Bjorn Helgaas	a596a7bece	PCI/DPC: Rename local "status" to "dpc_status" In dpc_rp_pio_get_info() rename the local "status" variable to "dpc_status". This is to make room for another variable named "status" in a subsequent patch. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>	2018-01-30 15:25:51 -06:00
Bjorn Helgaas	0bbe0eb85f	PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error() Separating dpc_rp_pio_print_tlp_header() doesn't really provide any useful abstraction, so squash it into its caller, dpc_rp_pio_print_error(). No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>	2018-01-30 15:25:45 -06:00
Bjorn Helgaas	e68d281cee	PCI/DPC: Process RP PIO details only if RP PIO extensions supported The RP PIO registers (status, mask, severity, etc) are only implemented if the "RP Extensions for DPC" bit is set in the DPC Capabilities register. Previously we called dpc_process_rp_pio_error(), which reads and decodes those RP PIO registers, whenever the DPC Status register indicated an "RP PIO error" (Trigger Reason == 3 and Trigger Reason Extension == 0). It does seem reasonable to assume that DPC Status would only indicate an RP PIO error if the RP extensions are supported, but PCIe r4.0, sec 7.9.15.4, is actually not explicit about that: it does not say "Trigger Reason Extension == 0 is valid only for Root Ports that support RP Extensions for DPC." Check whether the RP Extensions for DPC are supported before trying to read the RP PIO registers. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>	2018-01-30 15:25:39 -06:00
Bjorn Helgaas	e3c44b8ddc	PCI/DPC: Read RP PIO Log Size once at probe The RP PIO Log Size is a read-only field in the DPC Capability, so it is constant and known at probe-time, but previously we read it every time we processed an RP PIO error. Read it once in dpc_probe() (if the RP Extensions for DPC are supported) and remember the size in struct dpc_dev. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>	2018-01-30 15:25:33 -06:00
Bjorn Helgaas	be3039a392	PCI/DPC: Rename struct dpc_dev.rp to rp_extensions "rp" is ambiguous: it might mean "this DPC device is a Root Port." But in fact, it means "this DPC device is a Root Port and it supports a set of DPC Extensions." Rename "rp" to "rp_extensions" to make this more clear. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>	2018-01-30 15:25:27 -06:00
Bjorn Helgaas	aa745effd0	PCI/DPC: Add local variable for DPC capability offset Add a local variable for DPC capability offset and replace repeated use of "dpc->cap_pos" with simply "cap". No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sinan Kaya <okaya@codeaurora.org>	2018-01-30 15:25:21 -06:00
Colin Ian King	e698dcdfcd	x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable" Trivial fix to spelling mistake in pr_err error message text. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: kernel-janitors@vger.kernel.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180130193218.9271-1-colin.king@canonical.com	2018-01-30 22:08:44 +01:00
Linus Torvalds	72906f3893	Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 hyperv update from Ingo Molnar: "Enable PCID support on Hyper-V guests" * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hyperv: Stop suppressing X86_FEATURE_PCID	2018-01-30 13:04:50 -08:00
Linus Torvalds	3ccabd6d9d	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Remove unused IOMMU_STRESS Kconfig x86/extable: Mark exception handler functions visible x86/timer: Don't inline __const_udelay x86/headers: Remove duplicate #includes	2018-01-30 13:01:09 -08:00
Linus Torvalds	5289d3005a	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic cleanup from Ingo Molnar: "A single change simplifying the APIC code bit" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Remove local var in flat_send_IPI_allbutself()	2018-01-30 12:59:12 -08:00
Dan Williams	edfbae53da	x86/spectre: Report get_user mitigation for spectre_v1 Reflect the presence of get_user(), __get_user(), and 'syscall' protections in sysfs. The expectation is that new and better tooling will allow the kernel to grow more usages of array_index_nospec(), for now, only claim mitigation for __user pointer de-references. Reported-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: torvalds@linux-foundation.org Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727420158.33451.11658324346540434635.stgit@dwillia2-desk3.amr.corp.intel.com	2018-01-30 21:54:32 +01:00
Dan Williams	259d8c1e98	nl80211: Sanitize array index in parse_txq_params Wireless drivers rely on parse_txq_params to validate that txq_params->ac is less than NL80211_NUM_ACS by the time the low-level driver's ->conf_tx() handler is called. Use a new helper, array_index_nospec(), to sanitize txq_params->ac with respect to speculation. I.e. ensure that any speculation into ->conf_tx() handlers is done with a value of txq_params->ac that is within the bounds of [0, NL80211_NUM_ACS). Reported-by: Christian Lamparter <chunkeey@gmail.com> Reported-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Johannes Berg <johannes@sipsolutions.net> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: linux-wireless@vger.kernel.org Cc: torvalds@linux-foundation.org Cc: "David S. Miller" <davem@davemloft.net> Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727419584.33451.7700736761686184303.stgit@dwillia2-desk3.amr.corp.intel.com	2018-01-30 21:54:32 +01:00
Dan Williams	56c30ba7b3	vfs, fdtable: Prevent bounds-check bypass via speculative execution 'fd' is a user controlled value that is used as a data dependency to read from the 'fdt->fd' array. In order to avoid potential leaks of kernel memory values, block speculative execution of the instruction stream that could issue reads based on an invalid 'file *' returned from __fcheck_files. Co-developed-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: torvalds@linux-foundation.org Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727418500.33451.17392199002892248656.stgit@dwillia2-desk3.amr.corp.intel.com	2018-01-30 21:54:32 +01:00
Dan Williams	2fbd7af5af	x86/syscall: Sanitize syscall table de-references under speculation The syscall table base is a user controlled function pointer in kernel space. Use array_index_nospec() to prevent any out of bounds speculation. While retpoline prevents speculating into a userspace directed target it does not stop the pointer de-reference, the concern is leaking memory relative to the syscall table base, by observing instruction cache behavior. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Andy Lutomirski <luto@kernel.org> Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com	2018-01-30 21:54:31 +01:00
Dan Williams	c7f631cb07	x86/get_user: Use pointer masking to limit speculation Quoting Linus: I do think that it would be a good idea to very expressly document the fact that it's not that the user access itself is unsafe. I do agree that things like "get_user()" want to be protected, but not because of any direct bugs or problems with get_user() and friends, but simply because get_user() is an excellent source of a pointer that is obviously controlled from a potentially attacking user space. So it's a prime candidate for then finding _subsequent_ accesses that can then be used to perturb the cache. Unlike the __get_user() case get_user() includes the address limit check near the pointer de-reference. With that locality the speculation can be mitigated with pointer narrowing rather than a barrier, i.e. array_index_nospec(). Where the narrowing is performed by: cmp %limit, %ptr sbb %mask, %mask and %mask, %ptr With respect to speculation the value of %ptr is either less than %limit or NULL. Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: torvalds@linux-foundation.org Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727417469.33451.11804043010080838495.stgit@dwillia2-desk3.amr.corp.intel.com	2018-01-30 21:54:31 +01:00
Dan Williams	304ec1b050	x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec Quoting Linus: I do think that it would be a good idea to very expressly document the fact that it's not that the user access itself is unsafe. I do agree that things like "get_user()" want to be protected, but not because of any direct bugs or problems with get_user() and friends, but simply because get_user() is an excellent source of a pointer that is obviously controlled from a potentially attacking user space. So it's a prime candidate for then finding _subsequent_ accesses that can then be used to perturb the cache. __uaccess_begin_nospec() covers __get_user() and copy_from_iter() where the limit check is far away from the user pointer de-reference. In those cases a barrier_nospec() prevents speculation with a potential pointer to privileged memory. uaccess_try_nospec covers get_user_try. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727416953.33451.10508284228526170604.stgit@dwillia2-desk3.amr.corp.intel.com	2018-01-30 21:54:31 +01:00
Dan Williams	b5c4ae4f35	x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end} In preparation for converting some __uaccess_begin() instances to __uacess_begin_nospec(), make sure all 'from user' uaccess paths are using the _begin(), _end() helpers rather than open-coded stac() and clac(). No functional changes. Suggested-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Kees Cook <keescook@chromium.org> Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: torvalds@linux-foundation.org Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727416438.33451.17309465232057176966.stgit@dwillia2-desk3.amr.corp.intel.com	2018-01-30 21:54:30 +01:00
Dan Williams	b3bbfb3fb5	x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec For __get_user() paths, do not allow the kernel to speculate on the value of a user controlled pointer. In addition to the 'stac' instruction for Supervisor Mode Access Protection (SMAP), a barrier_nospec() causes the access_ok() result to resolve in the pipeline before the CPU might take any speculative action on the pointer value. Given the cost of 'stac' the speculation barrier is placed after 'stac' to hopefully overlap the cost of disabling SMAP with the cost of flushing the instruction pipeline. Since __get_user is a major kernel interface that deals with user controlled pointers, the __uaccess_begin_nospec() mechanism will prevent speculative execution past an access_ok() permission check. While speculative execution past access_ok() is not enough to lead to a kernel memory leak, it is a necessary precondition. To be clear, __uaccess_begin_nospec() is addressing a class of potential problems near __get_user() usages. Note, that while the barrier_nospec() in __uaccess_begin_nospec() is used to protect __get_user(), pointer masking similar to array_index_nospec() will be used for get_user() since it incorporates a bounds check near the usage. uaccess_try_nospec provides the same mechanism for get_user_try. No functional changes. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Andi Kleen <ak@linux.intel.com> Suggested-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Kees Cook <keescook@chromium.org> Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727415922.33451.5796614273104346583.stgit@dwillia2-desk3.amr.corp.intel.com	2018-01-30 21:54:30 +01:00
Dan Williams	b3d7ad85b8	x86: Introduce barrier_nospec Rename the open coded form of this instruction sequence from rdtsc_ordered() into a generic barrier primitive, barrier_nospec(). One of the mitigations for Spectre variant1 vulnerabilities is to fence speculative execution after successfully validating a bounds check. I.e. force the result of a bounds check to resolve in the instruction pipeline to ensure speculative execution honors that result before potentially operating on out-of-bounds data. No functional changes. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Andi Kleen <ak@linux.intel.com> Suggested-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Kees Cook <keescook@chromium.org> Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727415361.33451.9049453007262764675.stgit@dwillia2-desk3.amr.corp.intel.com	2018-01-30 21:54:29 +01:00
Dan Williams	babdde2698	x86: Implement array_index_mask_nospec array_index_nospec() uses a mask to sanitize user controllable array indexes, i.e. generate a 0 mask if 'index' >= 'size', and a ~0 mask otherwise. While the default array_index_mask_nospec() handles the carry-bit from the (index - size) result in software. The x86 array_index_mask_nospec() does the same, but the carry-bit is handled in the processor CF flag without conditional instructions in the control flow. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727414808.33451.1873237130672785331.stgit@dwillia2-desk3.amr.corp.intel.com	2018-01-30 21:54:29 +01:00

... 24 25 26 27 28 ...

737467 Commits