Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- move BGRT handling to drivers/acpi so it can be shared between x86
and ARM
- bring the EFI stub's initrd and FDT allocation logic in line with
the latest changes to the arm64 boot protocol
- improvements and fixes to the EFI stub's command line parsing
routines
- randomize the virtual mapping of the UEFI runtime services on
ARM/arm64
- ... and other misc enhancements, cleanups and fixes"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub/arm: Don't use TASK_SIZE when randomizing the RT space
ef/libstub/arm/arm64: Randomize the base of the UEFI rt services region
efi/libstub/arm/arm64: Disable debug prints on 'quiet' cmdline arg
efi/libstub: Unify command line param parsing
efi/libstub: Fix harmless command line parsing bug
efi/arm32-stub: Allow boot-time allocations in the vmlinux region
x86/efi: Clean up a minor mistake in comment
efi/pstore: Return error code (if any) from efi_pstore_write()
efi/bgrt: Enable ACPI BGRT handling on arm64
x86/efi/bgrt: Move efi-bgrt handling out of arch/x86
efi/arm-stub: Round up FDT allocation to mapping size
efi/arm-stub: Correct FDT and initrd allocation rules for arm64
Reserving a runtime region results in splitting the EFI memory
descriptors for the runtime region. This results in runtime region
descriptors with bogus memory mappings, leading to interesting crashes
like the following during a kexec:
general protection fault: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1 #53
Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM05 09/30/2016
RIP: 0010:virt_efi_set_variable()
...
Call Trace:
efi_delete_dummy_variable()
efi_enter_virtual_mode()
start_kernel()
? set_init_arg()
x86_64_start_reservations()
x86_64_start_kernel()
start_cpu()
...
Kernel panic - not syncing: Fatal exception
Runtime regions will not be freed and do not need to be reserved, so
skip the memmap modification in this case.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 8e80632fb2 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
Link: http://lkml.kernel.org/r/20170412152719.9779-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
EFI allocates runtime services regions from EFI_VA_START, -4G, down
to -68G, EFI_VA_END - 64G altogether, top-down.
The mechanism was introduced in commit:
d2f7cbe7b2 ("x86/efi: Runtime services virtual mapping")
Fix the comment that still says bottom-up.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170404160245.27812-10-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now with open-source boot firmware (EDK2) supporting ACPI BGRT table
addition even for architectures like AARCH64, it makes sense to move
out the 'efi-bgrt.c' file and supporting infrastructure from 'arch/x86'
directory and house it inside 'drivers/firmware/efi', so that this common
code can be used across architectures.
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170404160245.27812-7-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is not obvious if the reserved boot area are added correctly, add a
efi_print_memmap() call to print the new memmap.
Tested-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Nicolai Stange <nicstange@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1485868902-20401-10-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Before invoking the arch specific handler, efi_mem_reserve() reserves
the given memory region through memblock.
efi_bgrt_init() will call efi_mem_reserve() after mm_init(), at which
time memblock is dead and should not be used anymore.
The EFI BGRT code depends on ACPI initialization to get the BGRT ACPI
table, so move parsing of the BGRT table to ACPI early boot code to
ensure that efi_mem_reserve() in EFI BGRT code still use memblock safely.
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-acpi@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1485868902-20401-9-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE which describes memory
protections that may be applied to the EFI Runtime code and data regions by
the kernel. This enables the kernel to map these regions more strictly thereby
increasing security.
Presently, the only valid bits for the attribute field of a memory descriptor
are EFI_MEMORY_RO and EFI_MEMORY_XP, hence use these bits to update the
mappings in efi_pgd.
The UEFI specification recommends to use this feature instead of
EFI_PROPERTIES_TABLE and hence while updating EFI mappings we first
check for EFI_MEMORY_ATTRIBUTES_TABLE and if it's present we update
the mappings according to this table and hence disregarding
EFI_PROPERTIES_TABLE even if it's published by the firmware. We consider
EFI_PROPERTIES_TABLE only when EFI_MEMORY_ATTRIBUTES_TABLE is absent.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Lee, Chun-Yi <jlee@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1485868902-20401-6-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
129766708 ("x86/efi: Only map RAM into EFI page tables if in mixed-mode")
stopped creating 1:1 mappings for all RAM, when running in native 64-bit mode.
It turns out though that there are 64-bit EFI implementations in the wild
(this particular problem has been reported on a Lenovo Yoga 710-11IKB),
which still make use of the first physical page for their own private use,
even though they explicitly mark it EFI_CONVENTIONAL_MEMORY in the memory
map.
In case there is no mapping for this particular frame in the EFI pagetables,
as soon as firmware tries to make use of it, a triple fault occurs and the
system reboots (in case of the Yoga 710-11IKB this is very early during bootup).
Fix that by always mapping the first page of physical memory into the EFI
pagetables. We're free to hand this page to the BIOS, as trim_bios_range()
will reserve the first page and isolate it away from memory allocators anyway.
Note that just reverting 129766708 alone is not enough on v4.9-rc1+ to fix the
regression on affected hardware, as this commit:
ab72a27da ("x86/efi: Consolidate region mapping logic")
later made the first physical frame not to be mapped anyway.
Reported-by: Hanka Pavlikova <hanka@ucw.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vojtech Pavlik <vojtech@ucw.cz>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: linux-efi@vger.kernel.org
Cc: stable@kernel.org # v4.8+
Fixes: 129766708 ("x86/efi: Only map RAM into EFI page tables if in mixed-mode")
Link: http://lkml.kernel.org/r/20170127222552.22336-1-matt@codeblueprint.co.uk
[ Tidied up the changelog and the comment. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
(2.28), include memory map entries with phys_addr=0x0 and num_pages=0.
These machines fail to boot after the following commit,
commit 8e80632fb2 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
Fix this by removing such bogus entries from the memory map.
Furthermore, currently the log output for this case (with efi=debug)
looks like:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)
This is clearly wrong, and also not as informative as it could be. This
patch changes it so that if we find obviously invalid memory map
entries, we print an error and skip those entries. It also detects the
display of the address range calculation overflow, so the new output is:
[ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0x0000000000000000] (invalid)
It also detects memory map sizes that would overflow the physical
address, for example phys_addr=0xfffffffffffff000 and
num_pages=0x0200000000000001, and prints:
[ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)
It then removes these entries from the memory map.
Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
[Matt: Include bugzilla info in commit log]
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the following commit:
4bc9f92e64 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
... efi_bgrt_init() calls into the memblock allocator through
efi_mem_reserve() => efi_arch_mem_reserve() *after* mm_init() has been called.
Indeed, KASAN reports a bad read access later on in efi_free_boot_services():
BUG: KASAN: use-after-free in efi_free_boot_services+0xae/0x24c
at addr ffff88022de12740
Read of size 4 by task swapper/0/0
page:ffffea0008b78480 count:0 mapcount:-127
mapping: (null) index:0x1 flags: 0x5fff8000000000()
[...]
Call Trace:
dump_stack+0x68/0x9f
kasan_report_error+0x4c8/0x500
kasan_report+0x58/0x60
__asan_load4+0x61/0x80
efi_free_boot_services+0xae/0x24c
start_kernel+0x527/0x562
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x157/0x17a
start_cpu+0x5/0x14
The instruction at the given address is the first read from the memmap's
memory, i.e. the read of md->type in efi_free_boot_services().
Note that the writes earlier in efi_arch_mem_reserve() don't splat because
they're done through early_memremap()ed addresses.
So, after memblock is gone, allocations should be done through the "normal"
page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
it from efi_arch_mem_reserve(), efi_free_boot_services() and, for the sake
of consistency, from efi_fake_memmap() as well.
Note that for the latter, the memmap allocations cease to be page aligned.
This isn't needed though.
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org> # v4.9
Cc: Dave Young <dyoung@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 4bc9f92e64 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
Link: http://lkml.kernel.org/r/20170105125130.2815-1-nicstange@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Booting an EFI mixed mode kernel has been crashing since commit:
e37e43a497 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)")
The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.
Enabling CONFIG_DEBUG_VIRTUAL=y immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.
Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20161112210424.5157-3-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix this when building on 32-bit:
arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’:
arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
(efi_memory_desc_t *)pa);
^
arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
(efi_memory_desc_t *)pa);
^
The @pa local variable is declared as phys_addr_t and that is a u64 when
CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE
build.)
However, its value comes from __pa() which is basically doing pointer
arithmetic and checking, and returns unsigned long as it is the native
pointer width.
So let's use an unsigned long too. It should be fine to do so because
the later users cast it to a pointer too.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20161112210424.5157-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 boot updates from Ingo Molnar:
"The changes in this cycle were:
- Save e820 table RAM footprint on larger kernel configurations.
(Denys Vlasenko)
- pmem related fixes (Dan Williams)
- theoretical e820 boundary condition fix (Wei Yang)"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation
x86/e820: Use much less memory for e820/e820_saved, save up to 120k
x86/e820: Prepare e280 code for switch to dynamic storage
x86/e820: Mark some static functions __init
x86/e820: Fix very large 'size' handling boundary condition
This patch turns e820 and e820_saved into pointers to e820 tables,
of the same size as before.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160917213927.1787-2-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
new EFI memory map reservation code didn't align reservations to
EFI_PAGE_SIZE boundaries causing bogus regions to be inserted into
the global EFI memory map - Matt Fleming
-----BEGIN PGP SIGNATURE-----
iQI2BAABCAAgBQJX4UshGRxtYXR0QGNvZGVibHVlcHJpbnQuY28udWsACgkQLzhZ
wI0jPVX2RA/9HiT48K98R1fkiig/V3Wh8H8y7lQbwO+qA1WZ7q/D+DYau62qI3zM
sLgTSoV2zzjBr+4JWuGIdbtpYBjhHci0HwEcOGp5EFSZmgJGOpc7VY2VdFaYyZ2X
xhowGLXv2jgfKoAPt9MfgZX2Qh5Jt4CRlC+UcilUJp0wy1+uD4W7XmrLM1Rg7Ug2
8XTL/BA5QxQa3wx8+/z6xV/ADlhzf3DQ7BwI3qkx5RHSXSuy1/E2XGTnyuDt/7Oo
KmMwQGGrrv92uSjyq5vJ53DuNxXDJ7nGtlXTT2u0WtcZbzJXsXIaCZItbCA5rqW/
wWGgsz5XSc406m1WNv8Zs8xx9NAEP/+KXQfftwuQHFqoylG4QqYuO060V5XOxQlh
cdRuyVUgMONZAvTzEAUC+5SaN3c+WDCL8oEnrs/tdMxSzBL9Rj7yaHMW+ozVe5Nf
GFxMnO0l5SG9+o8hSAkkn5RDfIDzSTC1W5imqWJQXhC2BXHEdmh4wVWIG4QgPM/K
gqaTpX72Nu04lBjCXXpfKSK5Xqv67hZOzxUDkkjxtq1PtDbL1dPpzAF3s3GFdreI
FfQgetg4jM4EPnWDQWqWVJxQUv+uCfpDV9g5y9kMinVb2OpyGjcZpNckGHKnqIXr
hyntPYqyy793W6dPGEjTqfcVP751qisMjp6iIIqk3h+qr7HyJSF5nqs=
=S9gJ
-----END PGP SIGNATURE-----
Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into efi/core
Pull EFI fix from Matt Fleming:
* Fix a boot crash reported by Mike Galbraith and Mike Krinkin. The
new EFI memory map reservation code didn't align reservations to
EFI_PAGE_SIZE boundaries causing bogus regions to be inserted into
the global EFI memory map (Matt Fleming)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mike Galbraith reported that his machine started rebooting during boot
after,
commit 8e80632fb2 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
The ESRT table on his machine is 56 bytes and at no point in the
efi_arch_mem_reserve() call path is that size rounded up to
EFI_PAGE_SIZE, nor is the start address on an EFI_PAGE_SIZE boundary.
Since the EFI memory map only deals with whole pages, inserting an EFI
memory region with 56 bytes results in a new entry covering zero
pages, and completely screws up the calculations for the old regions
that were trimmed.
Round all sizes upwards, and start addresses downwards, to the nearest
EFI_PAGE_SIZE boundary.
Additionally, efi_memmap_insert() expects the mem::range::end value to
be one less than the end address for the region.
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Tested-by: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Waiman reported that booting with CONFIG_EFI_MIXED enabled on his
multi-terabyte HP machine results in boot crashes, because the EFI
region mapping functions loop forever while trying to map those
regions describing RAM.
While this patch doesn't fix the underlying hang, there's really no
reason to map EFI_CONVENTIONAL_MEMORY regions into the EFI page tables
when mixed-mode is not in use at runtime.
Reported-by: Waiman Long <waiman.long@hpe.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Theodore Ts'o <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus reuse the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data type by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Commit 7b02d53e7852 ("efi: Allow drivers to reserve boot services forever")
introduced a new efi_mem_reserve to reserve the boot services memory
regions forever. This reservation involves allocating a new EFI memory
range descriptor. However, allocation can only succeed if there is memory
available for the allocation. Otherwise, error such as the following may
occur:
esrt: Reserving ESRT space from 0x000000003dd6a000 to 0x000000003dd6a010.
Kernel panic - not syncing: ERROR: Failed to allocate 0x9f0 bytes below \
0x0.
CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.0-rc5+ #503
0000000000000000 ffffffff81e03ce0 ffffffff8131dae8 ffffffff81bb6c50
ffffffff81e03d70 ffffffff81e03d60 ffffffff8111f4df 0000000000000018
ffffffff81e03d70 ffffffff81e03d08 00000000000009f0 00000000000009f0
Call Trace:
[<ffffffff8131dae8>] dump_stack+0x4d/0x65
[<ffffffff8111f4df>] panic+0xc5/0x206
[<ffffffff81f7c6d3>] memblock_alloc_base+0x29/0x2e
[<ffffffff81f7c6e3>] memblock_alloc+0xb/0xd
[<ffffffff81f6c86d>] efi_arch_mem_reserve+0xbc/0x134
[<ffffffff81fa3280>] efi_mem_reserve+0x2c/0x31
[<ffffffff81fa3280>] ? efi_mem_reserve+0x2c/0x31
[<ffffffff81fa40d3>] efi_esrt_init+0x19e/0x1b4
[<ffffffff81f6d2dd>] efi_init+0x398/0x44a
[<ffffffff81f5c782>] setup_arch+0x415/0xc30
[<ffffffff81f55af1>] start_kernel+0x5b/0x3ef
[<ffffffff81f55434>] x86_64_start_reservations+0x2f/0x31
[<ffffffff81f55520>] x86_64_start_kernel+0xea/0xed
---[ end Kernel panic - not syncing: ERROR: Failed to allocate 0x9f0
bytes below 0x0.
An inspection of the memblock configuration reveals that there is no memory
available for the allocation:
MEMBLOCK configuration:
memory size = 0x0 reserved size = 0x4f339c0
memory.cnt = 0x1
memory[0x0] [0x00000000000000-0xffffffffffffffff], 0x0 bytes on node 0\
flags: 0x0
reserved.cnt = 0x4
reserved[0x0] [0x0000000008c000-0x0000000008c9bf], 0x9c0 bytes flags: 0x0
reserved[0x1] [0x0000000009f000-0x000000000fffff], 0x61000 bytes\
flags: 0x0
reserved[0x2] [0x00000002800000-0x0000000394bfff], 0x114c000 bytes\
flags: 0x0
reserved[0x3] [0x000000304e4000-0x00000034269fff], 0x3d86000 bytes\
flags: 0x0
This situation can be avoided if we call efi_esrt_init after memblock has
memory regions for the allocation.
Also, the EFI ESRT driver makes use of early_memremap'pings. Therfore, we
do not want to defer efi_esrt_init for too long. We must call such function
while calls to early_memremap are still valid.
A good place to meet the two aforementioned conditions is right after
memblock_x86_fill, grouped with other EFI-related functions.
Reported-by: Scott Lawson <scott.lawson@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
This is a simple change to add in the physical mappings as well as the
virtual mappings in efi_map_region_fixed. The motivation here is to
get access to EFI runtime code that is only available via the 1:1
mappings on a kexec'd kernel.
The added call is essentially the kexec analog of the first __map_region
that Boris put in efi_map_region in commit d2f7cbe7b2 ("x86/efi:
Runtime services virtual mapping").
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
efi_mem_reserve() allows us to permanently mark EFI boot services
regions as reserved, which means we no longer need to copy the image
data out and into a separate buffer.
Leaving the data in the original boot services region has the added
benefit that BGRT images can now be passed across kexec reboot.
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Môshe van der Sterre <me@moshe.nl>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Now that efi.memmap is available all of the time there's no need to
allocate and build a separate copy of the EFI memory map.
Furthermore, efi.memmap contains boot services regions but only those
regions that have been reserved via efi_mem_reserve(). Using
efi.memmap allows us to pass boot services across kexec reboot so that
the ESRT and BGRT drivers will now work.
Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Today, it is not possible for drivers to reserve EFI boot services for
access after efi_free_boot_services() has been called on x86. For
ARM/arm64 it can be done simply by calling memblock_reserve().
Having this ability for all three architectures is desirable for a
couple of reasons,
1) It saves drivers copying data out of those regions
2) kexec reboot can now make use of things like ESRT
Instead of using the standard memblock_reserve() which is insufficient
to reserve the region on x86 (see efi_reserve_boot_services()), a new
API is introduced in this patch; efi_mem_reserve().
efi.memmap now always represents which EFI memory regions are
available. On x86 the EFI boot services regions that have not been
reserved via efi_mem_reserve() will be removed from efi.memmap during
efi_free_boot_services().
This has implications for kexec, since it is not possible for a newly
kexec'd kernel to access the same boot services regions that the
initial boot kernel had access to unless they are reserved by every
kexec kernel in the chain.
Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Drivers need a way to access the EFI memory map at runtime. ARM and
arm64 currently provide this by remapping the EFI memory map into the
vmalloc space before setting up the EFI virtual mappings.
x86 does not provide this functionality which has resulted in the code
in efi_mem_desc_lookup() where it will manually map individual EFI
memmap entries if the memmap has already been torn down on x86,
/*
* If a driver calls this after efi_free_boot_services,
* ->map will be NULL, and the target may also not be mapped.
* So just always get our own virtual map on the CPU.
*
*/
md = early_memremap(p, sizeof (*md));
There isn't a good reason for not providing a permanent EFI memory map
for runtime queries, especially since the EFI regions are not mapped
into the standard kernel page tables.
Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Every EFI architecture apart from ia64 needs to setup the EFI memory
map at efi.memmap, and the code for doing that is essentially the same
across all implementations. Therefore, it makes sense to factor this
out into the common code under drivers/firmware/efi/.
The only slight variation is the data structure out of which we pull
the initial memory map information, such as physical address, memory
descriptor size and version, etc. We can address this by passing a
generic data structure (struct efi_memory_map_data) as the argument to
efi_memmap_init_early() which contains the minimum info required for
initialising the memory map.
In the process, this patch also fixes a few undesirable implementation
differences:
- ARM and arm64 were failing to clear the EFI_MEMMAP bit when
unmapping the early EFI memory map. EFI_MEMMAP indicates whether
the EFI memory map is mapped (not the regions contained within) and
can be traversed. It's more correct to set the bit as soon as we
memremap() the passed in EFI memmap.
- Rename efi_unmmap_memmap() to efi_memmap_unmap() to adhere to the
regular naming scheme.
This patch also uses a read-write mapping for the memory map instead
of the read-only mapping currently used on ARM and arm64. x86 needs
the ability to update the memory map in-place when assigning virtual
addresses to regions (efi_map_region()) and tagging regions when
reserving boot services (efi_reserve_boot_services()).
There's no way for the generic fake_mem code to know which mapping to
use without introducing some arch-specific constant/hook, so just use
read-write since read-only is of dubious value for the EFI memory map.
Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
EFI regions are currently mapped in two separate places. The bulk of
the work is done in efi_map_regions() but when CONFIG_EFI_MIXED is
enabled the additional regions that are required when operating in
mixed mode are mapping in efi_setup_page_tables().
Pull everything into efi_map_regions() and refactor the test for
which regions should be mapped into a should_map_region() function.
Generously sprinkle comments to clarify the different cases.
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
On my Dell XPS 13 9350 with firmware 1.4.4 and SGX on, if I boot
Fedora 24's grub2-efi off a hard disk, my first 1MB of RAM looks
like:
efi: mem00: [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC] range=[0x0000000000000000-0x0000000000000fff] (0MB)
efi: mem01: [Boot Data | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000001000-0x0000000000027fff] (0MB)
efi: mem02: [Loader Data | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000028000-0x0000000000029fff] (0MB)
efi: mem03: [Reserved | | | | | | | | |WB|WT|WC|UC] range=[0x000000000002a000-0x000000000002bfff] (0MB)
efi: mem04: [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC] range=[0x000000000002c000-0x000000000002cfff] (0MB)
efi: mem05: [Loader Data | | | | | | | | |WB|WT|WC|UC] range=[0x000000000002d000-0x000000000002dfff] (0MB)
efi: mem06: [Conventional Memory| | | | | | | | |WB|WT|WC|UC] range=[0x000000000002e000-0x0000000000057fff] (0MB)
efi: mem07: [Reserved | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000058000-0x0000000000058fff] (0MB)
efi: mem08: [Conventional Memory| | | | | | | | |WB|WT|WC|UC] range=[0x0000000000059000-0x000000000009ffff] (0MB)
My EBDA is at 0x2c000, which blocks off everything from 0x2c000 and
up, and my trampoline is 0x6000 bytes (6 pages), so it doesn't fit
in the loader data range at 0x28000.
Without this patch, it panics due to a failure to allocate the
trampoline. With this patch, it works:
[ +0.001744] Base memory trampoline at [ffff880000001000] 1000 size 24576
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matt Fleming <mfleming@suse.de>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/998c77b3bf709f3dfed85cb30701ed1a5d8a438b.1470821230.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cleanups:
- huge cleanup of rtc-generic and char/genrtc this allowed to cleanup rtc-cmos,
rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
- move mn10300 to rtc-cmos
Subsystem:
- fix wakealarms after hibernate
- multiples fixes for rctest
- simplify implementations of .read_alarm
New drivers:
- Maxim MAX6916
Drivers:
- ds1307: fix weekday
- m41t80: add wakeup support
- pcf85063: add support for PCF85063A variant
- rv8803: extend i2c fix and other fixes
- s35390a: fix alarm reading, this fixes instant reboot after shutdown for QNAP
TS-41x
- s3c: clock fixes
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJXokhIAAoJENiigzvaE+LCZqQP+wWzintN/N1u3dKiVB7iSdwq
+S/jAXD9wW8OK9PI60/YUGRYeUXmZW9t4XYg1VKCxU9KpVC17LgOtDyXD8BufP1V
uREJEzZw9O7zCCjeHp/ICFjBkc62Net6ZDOO+ZyXPNfddpS1Xq1uUgXLZc/202UR
ID/kewu0pJRDnoxyqznWn9+8D33w/ygXs2slY2Ive0ONtjdgxGcsj2rNbb2RYn2z
OP7br3lLg7qkFh4TtXb61eh/9GYIk6wzP/CrX5l/jH4SjQnrIk5g/X/Cd1qQ/qso
JZzFoonOKvIp5Gw/+fZ9NP3YFcnkoRMv4NjZV8PAmsYLds+ibRiBcoB8u6FmiJV7
WW5uopgPkfCGN5BV3+QHwJDVe+WlgnlzaT5zPUCcP5KWusDts4fWIgzP7vrtAzf4
3OJLrgSGdBeOqWnJD21nxKUD27JOseX7D+BFtwxR4lMsXHqlHJfETpZ8gts1ZGH3
2U353j/jkZvGWmc6dMcuxOXT2K4VqpYeIIqs0IcLu6hM9crtR89zPR2Iu1AilfDW
h2NroF+Q//SgMMzWoTEG6Tn7RAc7MthgA/tRCFZF9CBMzNs988w0CTHnKsIHmjpU
UKkMeJGAC9YrPYIcqrg0oYsmLUWXc8JuZbGJBnei3BzbaMTlcwIN9qj36zfq6xWc
TMLpbWEoIsgFIZMP/hAP
=rpGB
-----END PGP SIGNATURE-----
Merge tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"RTC for 4.8
Cleanups:
- huge cleanup of rtc-generic and char/genrtc this allowed to cleanup
rtc-cmos, rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
- move mn10300 to rtc-cmos
Subsystem:
- fix wakealarms after hibernate
- multiples fixes for rctest
- simplify implementations of .read_alarm
New drivers:
- Maxim MAX6916
Drivers:
- ds1307: fix weekday
- m41t80: add wakeup support
- pcf85063: add support for PCF85063A variant
- rv8803: extend i2c fix and other fixes
- s35390a: fix alarm reading, this fixes instant reboot after
shutdown for QNAP TS-41x
- s3c: clock fixes"
* tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (65 commits)
rtc: rv8803: Clear V1F when setting the time
rtc: rv8803: Stop the clock while setting the time
rtc: rv8803: Always apply the I²C workaround
rtc: rv8803: Fix read day of week
rtc: rv8803: Remove the check for valid time
rtc: rv8803: Kconfig: Indicate rx8900 support
rtc: asm9260: remove .owner field for driver
rtc: at91sam9: Fix missing spin_lock_init()
rtc: m41t80: add suspend handlers for alarm IRQ
rtc: m41t80: make it a real error message
rtc: pcf85063: Add support for the PCF85063A device
rtc: pcf85063: fix year range
rtc: hym8563: in .read_alarm set .tm_sec to 0 to signal minute accuracy
rtc: explicitly set tm_sec = 0 for drivers with minute accurancy
rtc: s3c: Add s3c_rtc_{enable/disable}_clk in s3c_rtc_setfreq()
rtc: s3c: Remove unnecessary call to disable already disabled clock
rtc: abx80x: use devm_add_action_or_reset()
rtc: m41t80: use devm_add_action_or_reset()
rtc: fix a typo and reduce three empty lines to one
rtc: s35390a: improve two comments in .set_alarm
...
There was only one use of __initdata_refok and __exit_refok
__init_refok was used 46 times against 82 for __ref.
Those definitions are obsolete since commit 312b1485fb ("Introduce new
section reference annotations tags: __ref, __refdata, __refconst")
This patch removes the following compatibility definitions and replaces
them treewide.
/* compatibility defines */
#define __init_refok __ref
#define __initdata_refok __refdata
#define __exit_refok __ref
I can also provide separate patches if necessary.
(One patch per tree and check in 1 month or 2 to remove old definitions)
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 header cleanups from Ingo Molnar:
"This tree is a cleanup of the x86 tree reducing spurious uses of
module.h - which should improve build performance a bit"
* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, crypto: Restore MODULE_LICENSE() to glue_helper.c so it loads
x86/apic: Remove duplicated include from probe_64.c
x86/ce4100: Remove duplicated include from ce4100.c
x86/headers: Include spinlock_types.h in x8664_ksyms_64.c for missing spinlock_t
x86/platform: Delete extraneous MODULE_* tags fromm ts5500
x86: Audit and remove any remaining unnecessary uses of module.h
x86/kvm: Audit and remove any unnecessary uses of module.h
x86/xen: Audit and remove any unnecessary uses of module.h
x86/platform: Audit and remove any unnecessary uses of module.h
x86/lib: Audit and remove any unnecessary uses of module.h
x86/kernel: Audit and remove any unnecessary uses of module.h
x86/mm: Audit and remove any unnecessary uses of module.h
x86: Don't use module.h just for AUTHOR / LICENSE tags
Pull x86 mm updates from Ingo Molnar:
"Various x86 low level modifications:
- preparatory work to support virtually mapped kernel stacks (Andy
Lutomirski)
- support for 64-bit __get_user() on 32-bit kernels (Benjamin
LaHaise)
- (involved) workaround for Knights Landing CPU erratum (Dave Hansen)
- MPX enhancements (Dave Hansen)
- mremap() extension to allow remapping of the special VDSO vma, for
purposes of user level context save/restore (Dmitry Safonov)
- hweight and entry code cleanups (Borislav Petkov)
- bitops code generation optimizations and cleanups with modern GCC
(H. Peter Anvin)
- syscall entry code optimizations (Paolo Bonzini)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
x86/mm/cpa: Add missing comment in populate_pdg()
x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
x86/smp: Remove unnecessary initialization of thread_info::cpu
x86/smp: Remove stack_smp_processor_id()
x86/uaccess: Move thread_info::addr_limit to thread_struct
x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
x86/dumpstack: When OOPSing, rewind the stack before do_exit()
x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
x86/dumpstack: Try harder to get a call trace on stack overflow
x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
x86/mm: Use pte_none() to test for empty PTE
x86/mm: Disallow running with 32-bit PTEs to work around erratum
x86/mm: Ignore A/D bits in pte/pmd/pud_none()
x86/mm: Move swap offset/type up in PTE to work around erratum
x86/entry: Inline enter_from_user_mode()
...
kernel_unmap_pages_in_pgd() is dangerous: if a PGD entry in
init_mm.pgd were to be cleared, callers would need to ensure that
the pgd entry hadn't been propagated to any other pgd.
Its only caller was efi_cleanup_page_tables(), and that, in turn,
was unused, so just delete both functions. This leaves a couple of
other helpers unused, so delete them, too.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/77ff20fdde3b75cd393be5559ad8218870520248.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends. That changed
when we forked out support for the latter into the export.h file.
This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig. The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.
Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed.
One module.h was converted to moduleparam.h since the file had
multiple module_param() in it, and another file had an instance of
MODULE_DEVICE_TABLE deleted, since that is a no-op when builtin.
Finally, the 32 bit build coverage of olpc_ofw revealed a couple
implicit includes, which were pretty self evident to fix based on
what gcc was complaining about.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-6-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Nothing calls the efi_get_time() function on x86, but it does suffer
from the 32-bit time_t overflow in 2038.
This removes the function, we can always put it back in case we need
it later.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-8-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, the efi_thunk macro has some semi-duplicated code in it that
can be replaced with the arch_efi_call_virt_setup/teardown macros. This
commit simply replaces the duplicated code with those macros.
Suggested-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roy Franz <roy.franz@linaro.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-7-git-send-email-matt@codeblueprint.co.uk
[ Renamed variables to the standard __ prefix. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.
efi_alloc_page_tables uses __GFP_REPEAT but it allocates an order-0
page. This means that this flag has never been actually useful here
because it has always been used only for PAGE_ALLOC_COSTLY requests.
Link: http://lkml.kernel.org/r/1464599699-30131-4-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 3195ef59cb ("x86: Do full rtc synchronization with ntp") had
the side-effect of unconditionally enabling the RTC_LIB symbol on x86,
which in turn disables the selection of the CONFIG_RTC and
CONFIG_GEN_RTC drivers that contain a two older implementations of
the CONFIG_RTC_DRV_CMOS driver.
This removes x86 from the list for genrtc, and changes all references
to the asm/rtc.h header to instead point to the interfaces
from linux/mc146818rtc.h.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Drivers should not really include stuff from asm-generic directly,
and the PC-style cmos rtc driver does this in order to reuse the
mc146818 implementation of get_rtc_time/set_rtc_time rather than
the architecture specific one for the architecture it gets built for.
To make it more obvious what is going on, this moves and renames the
two functions into include/linux/mc146818rtc.h, which holds the
other mc146818 specific code. Ideally it would be in a .c file,
but that would require extra infrastructure as the functions are
called by multiple drivers with conflicting dependencies.
With this change, the asm-generic/rtc.h header also becomes much
more generic, so it can be reused more easily across any architecture
that still relies on the genrtc driver.
The only caller of the internal __get_rtc_time/__set_rtc_time
functions is in arch/alpha/kernel/rtc.c, and we just change those
over to the new naming.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Pull x86 fixes from Ingo Molnar:
"Misc fixes: EFI, entry code, pkeys and MPX fixes, TASK_SIZE cleanups
and a tsc frequency table fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Switch from TASK_SIZE to TASK_SIZE_MAX in the page fault code
x86/fsgsbase/64: Use TASK_SIZE_MAX for FSBASE/GSBASE upper limits
x86/mm/mpx: Work around MPX erratum SKD046
x86/entry/64: Fix stack return address retrieval in thunk
x86/efi: Fix 7-parameter efi_call()s
x86/cpufeature, x86/mm/pkeys: Fix broken compile-time disabling of pkeys
x86/tsc: Add missing Cherrytrail frequency to the table
Alex Thorlton reported that the SGI/UV code crashes in the efi_call()
code when invoked with 7 parameters, due to:
mov (%rsp), %rax
mov 8(%rax), %rax
...
mov %rax, 40(%rsp)
Offset 8 is only true if CONFIG_FRAME_POINTERS is disabled,
with frame pointers enabled it should be 16.
Furthermore, the SAVE_XMM code saves the old stack pointer, but
that's just crazy. It saves the stack pointer *AFTER* we've done
the:
FRAME_BEGIN
... which will have *changed* the stack pointer, depending on whether
stack frames are enabled or not.
So when the code then does:
mov (%rsp), %rax
... we now move that old stack pointer into %rax, but the offset off that
stack pointer will depend on whether that FRAME_BEGIN saved off %rbp
or not.
So that whole 8-vs-16 offset confusion depends on the frame pointer!
If frame pointers were enabled, it will be 16. If they weren't, it
will be 8.
The right fix is to just get rid of that silly conditional frame
pointer thing, and always use frame pointers in this stub function.
And then we don't need that (odd) load to get the old stack
pointer into %rax - we can just use the frame pointer.
Reported-by: Alex Thorlton <athorlton@sgi.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/CA%2B55aFzBS2v%3DWnEH83cUDg7XkOremFqJ30BJwF40dCYjReBkUQ@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- In-kernel ACPICA code update to the upstream release 20160422
adding support for ACPI 6.1 along with some previously missing
bits of ACPI 6.0 support, making a fair amount of fixes and
cleanups and reducing divergences between the upstream ACPICA
and the in-kernel code (Bob Moore, Lv Zheng, Al Stone, Aleksey
Makarov, Will Miles).
- ACPI Generic Event Device (GED) support and a fix for it (Sinan Kaya,
Paul Gortmaker).
- INT3406 thermal driver for display thermal management and ACPI
backlight support code reorganization related to it (Aaron Lu,
Arnd Bergmann).
- Support for exporting the value returned by the _HRV (hardware
revision) ACPI object via sysfs (Betty Dall).
- Removal of the EXPERT dependency for ACPI on ARM64 (Mark Brown).
- Rework of the handling of ACPI _OSI mechanism allowing the
_OSI("Darwin") support to be overridden from the kernel command
line among other things (Lv Zheng, Chen Yu).
- Rework of the ACPI tables override mechanism to prepare it for
the introduction of overlays support going forward (Lv Zheng,
Rafael Wysocki).
- Fixes related to the ECDT support and module-level execution
of AML (Lv Zheng).
- ACPI PCI interrupts management update to make it work better on
ARM64 mostly (Sinan Kaya).
- ACPI SRAT handling update to make the code process all entires
in the table order regardless of the entry type (Lukasz Anaczkowski).
- EFI power off support for full-hardware ACPI platforms that don't
support ACPI S5 (Chen Yu).
- Fixes and cleanups related to the ACPI core's sysfs interface
(Dan Carpenter, Betty Dall).
- acpi_dev_present() API rework to reduce possible confusion related
to it (Lukas Wunner).
- Removal of CLK_IS_ROOT from two ACPI drivers (Stephen Boyd).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJXOjM+AAoJEILEb/54YlRxNO4P/0FsajR2iXfHybiHyJq+Iddk
MX+Jealb5klnXXtuih90oOHft9NypV1ESO7bcmjSz+2tuSgoXifdI3GO0aFghj7v
h8SaVpCGzlm+u8y+Ppbxk+eWHAV1+ohV8uaO47yDUjuyZgG6c702QqrJVaqunQoq
KQd+kqK5bhcaLhrx9Ro0I4Jbz0TdFa8j7noUTRXtDfJ9V4xZ3a6QfXz3H6GU4L31
kNKjroxkFXpHMj2mYXuskqw2IWoRZw7Z7kpLv0dM44nko6c+oM8/9BIx4xh1IbR4
vvgn/C2QYe45fz4Or/qmrPzGZ/kQtLiiVC2B/GWbCTezu3Px9E3V2NI0xLktVe0g
Y/MsRdzMs0TInWSVezOlTONmfcqZgPhbSmsuI9PJ7izxmzOLVk6tjXARkzWe2gQ0
N/nOd7I8AMsTMdpBCvf6xjJXqHRl6jdXuHAIhcPC5DINQ0daz8FZ4Cw42MtVKo0I
2OiZ7ZnAnDDHrptV9VwtEvo60Uw/QG8EhdMWyQVaFWe1pFNM9nQtD0P2QeMWUHhZ
YL7Q63nM8flQIywcSj7jyMWroWZMOI/cFOLGxZjz+yXA3fRizl4J22kJ392gSQti
da1X8OBKsOvYQutkeGeQCNYWp4j5uKpoMoR4iR4dOLNqguWxaicDSZgsU8cAAk0k
W+lRS/E8l+we5rxEZYOd
=rAwm
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"The new features here are ACPI 6.1 support (and some previously
missing bits of ACPI 6.0 support) in ACPICA and two new drivers, a
driver for the ACPI Generic Event Device (GED) feature introduced by
ACPI 6.1 and the INT3406 thermal driver for display thermal
management. Also the value returned by the _HRV (hardware revision)
ACPI object will be exported to user space via sysfs now.
In addition to that, ACPI on ARM64 will not depend on EXPERT any more.
The rest is mostly fixes and cleanups and some code reorganization.
Specifics:
- In-kernel ACPICA code update to the upstream release 20160422
adding support for ACPI 6.1 along with some previously missing bits
of ACPI 6.0 support, making a fair amount of fixes and cleanups and
reducing divergences between the upstream ACPICA and the in-kernel
code (Bob Moore, Lv Zheng, Al Stone, Aleksey Makarov, Will Miles)
- ACPI Generic Event Device (GED) support and a fix for it (Sinan
Kaya, Paul Gortmaker)
- INT3406 thermal driver for display thermal management and ACPI
backlight support code reorganization related to it (Aaron Lu, Arnd
Bergmann)
- Support for exporting the value returned by the _HRV (hardware
revision) ACPI object via sysfs (Betty Dall)
- Removal of the EXPERT dependency for ACPI on ARM64 (Mark Brown)
- Rework of the handling of ACPI _OSI mechanism allowing the
_OSI("Darwin") support to be overridden from the kernel command
line among other things (Lv Zheng, Chen Yu)
- Rework of the ACPI tables override mechanism to prepare it for the
introduction of overlays support going forward (Lv Zheng, Rafael
Wysocki)
- Fixes related to the ECDT support and module-level execution of AML
(Lv Zheng)
- ACPI PCI interrupts management update to make it work better on
ARM64 mostly (Sinan Kaya)
- ACPI SRAT handling update to make the code process all entires in
the table order regardless of the entry type (Lukasz Anaczkowski)
- EFI power off support for full-hardware ACPI platforms that don't
support ACPI S5 (Chen Yu)
- Fixes and cleanups related to the ACPI core's sysfs interface (Dan
Carpenter, Betty Dall)
- acpi_dev_present() API rework to reduce possible confusion related
to it (Lukas Wunner)
- Removal of CLK_IS_ROOT from two ACPI drivers (Stephen Boyd)"
* tag 'acpi-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (82 commits)
ACPI / video: mark acpi_video_get_levels() inline
Thermal / ACPI / video: add INT3406 thermal driver
ACPI / GED: make evged.c explicitly non-modular
ACPI / tables: Fix DSDT override mechanism
ACPI / sysfs: fix error code in get_status()
ACPICA: Update version to 20160422
ACPICA: Move all ASCII utilities to a common file
ACPICA: ACPI 2.0, Hardware: Add access_width/bit_offset support for acpi_hw_write()
ACPICA: ACPI 2.0, Hardware: Add access_width/bit_offset support in acpi_hw_read()
ACPICA: Executer: Introduce a set of macros to handle bit width mask generation
ACPICA: Hardware: Add optimized access bit width support
ACPICA: Utilities: Add ACPI_IS_ALIGNED() macro
ACPICA: Renamed some #defined flag constants for clarity
ACPICA: ACPI 6.0, tools/iasl: Add support for new resource descriptors
ACPICA: ACPI 6.0: Update _BIX support for new package element
ACPICA: ACPI 6.1: Support for new PCCT subtable
ACPICA: Refactor evaluate_object to reduce nesting
ACPICA: Divergence: remove unwanted spaces for typedef
ACPI,PCI,IRQ: remove SCI penalize function
ACPI,PCI,IRQ: remove redundant code in acpi_irq_penalty_init()
..
The promise of pretty boot splashes from firmware via BGRT was at
best only that; a promise. The kernel diligently checks to make
sure the BGRT data firmware gives it is valid, and dutifully warns
the user when it isn't. However, it does so via the pr_err log
level which seems unnecessary. The user cannot do anything about
this and there really isn't an error on the part of Linux to
correct.
This lowers the log level by using pr_notice instead. Users will
no longer have their boot process uglified by the kernel reminding
us that firmware can and often is broken when the 'quiet' kernel
parameter is specified. Ironic, considering BGRT is supposed to
make boot pretty to begin with.
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Môshe van der Sterre <me@moshe.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462303781-8686-4-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This symbol is always set which makes it useless. Additionally we have
a kernel command-line switch, efi=debug, which actually controls the
printing of the memory map.
Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-16-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Our efi_memory_desc_t type is based on EFI_MEMORY_DESCRIPTOR version 1 in
the UEFI spec. No version updates are expected, but since we are about to
introduce support for new firmware tables that use the same descriptor
type, it makes sense to at least warn if we encounter other versions.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-9-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Abolish the poorly named EFI memory map, 'memmap'. It is shadowed by a
bunch of local definitions in various files and having two ways to
access the EFI memory map ('efi.memmap' vs. 'memmap') is rather
confusing.
Furthermore, IA64 doesn't even provide this global object, which has
caused issues when trying to write generic EFI memmap code.
Replace all occurrences with efi.memmap, and convert the remaining
iterator code to use for_each_efi_mem_desc().
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Luck, Tony <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-8-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Most of the users of for_each_efi_memory_desc() are equally happy
iterating over the EFI memory map in efi.memmap instead of 'memmap',
since the former is usually a pointer to the latter.
For those users that want to specify an EFI memory map other than
efi.memmap, that can be done using for_each_efi_memory_desc_in_map().
One such example is in the libstub code where the firmware is queried
directly for the memory map, it gets iterated over, and then freed.
This change goes part of the way toward deleting the global 'memmap'
variable, which is not universally available on all architectures
(notably IA64) and is rather poorly named.
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-7-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>