Drivers should not really include stuff from asm-generic directly,
and the PC-style cmos rtc driver does this in order to reuse the
mc146818 implementation of get_rtc_time/set_rtc_time rather than
the architecture specific one for the architecture it gets built for.
To make it more obvious what is going on, this moves and renames the
two functions into include/linux/mc146818rtc.h, which holds the
other mc146818 specific code. Ideally it would be in a .c file,
but that would require extra infrastructure as the functions are
called by multiple drivers with conflicting dependencies.
With this change, the asm-generic/rtc.h header also becomes much
more generic, so it can be reused more easily across any architecture
that still relies on the genrtc driver.
The only caller of the internal __get_rtc_time/__set_rtc_time
functions is in arch/alpha/kernel/rtc.c, and we just change those
over to the new naming.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Pull x86 fixes from Ingo Molnar:
"Misc fixes: EFI, entry code, pkeys and MPX fixes, TASK_SIZE cleanups
and a tsc frequency table fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Switch from TASK_SIZE to TASK_SIZE_MAX in the page fault code
x86/fsgsbase/64: Use TASK_SIZE_MAX for FSBASE/GSBASE upper limits
x86/mm/mpx: Work around MPX erratum SKD046
x86/entry/64: Fix stack return address retrieval in thunk
x86/efi: Fix 7-parameter efi_call()s
x86/cpufeature, x86/mm/pkeys: Fix broken compile-time disabling of pkeys
x86/tsc: Add missing Cherrytrail frequency to the table
Alex Thorlton reported that the SGI/UV code crashes in the efi_call()
code when invoked with 7 parameters, due to:
mov (%rsp), %rax
mov 8(%rax), %rax
...
mov %rax, 40(%rsp)
Offset 8 is only true if CONFIG_FRAME_POINTERS is disabled,
with frame pointers enabled it should be 16.
Furthermore, the SAVE_XMM code saves the old stack pointer, but
that's just crazy. It saves the stack pointer *AFTER* we've done
the:
FRAME_BEGIN
... which will have *changed* the stack pointer, depending on whether
stack frames are enabled or not.
So when the code then does:
mov (%rsp), %rax
... we now move that old stack pointer into %rax, but the offset off that
stack pointer will depend on whether that FRAME_BEGIN saved off %rbp
or not.
So that whole 8-vs-16 offset confusion depends on the frame pointer!
If frame pointers were enabled, it will be 16. If they weren't, it
will be 8.
The right fix is to just get rid of that silly conditional frame
pointer thing, and always use frame pointers in this stub function.
And then we don't need that (odd) load to get the old stack
pointer into %rax - we can just use the frame pointer.
Reported-by: Alex Thorlton <athorlton@sgi.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/CA%2B55aFzBS2v%3DWnEH83cUDg7XkOremFqJ30BJwF40dCYjReBkUQ@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- In-kernel ACPICA code update to the upstream release 20160422
adding support for ACPI 6.1 along with some previously missing
bits of ACPI 6.0 support, making a fair amount of fixes and
cleanups and reducing divergences between the upstream ACPICA
and the in-kernel code (Bob Moore, Lv Zheng, Al Stone, Aleksey
Makarov, Will Miles).
- ACPI Generic Event Device (GED) support and a fix for it (Sinan Kaya,
Paul Gortmaker).
- INT3406 thermal driver for display thermal management and ACPI
backlight support code reorganization related to it (Aaron Lu,
Arnd Bergmann).
- Support for exporting the value returned by the _HRV (hardware
revision) ACPI object via sysfs (Betty Dall).
- Removal of the EXPERT dependency for ACPI on ARM64 (Mark Brown).
- Rework of the handling of ACPI _OSI mechanism allowing the
_OSI("Darwin") support to be overridden from the kernel command
line among other things (Lv Zheng, Chen Yu).
- Rework of the ACPI tables override mechanism to prepare it for
the introduction of overlays support going forward (Lv Zheng,
Rafael Wysocki).
- Fixes related to the ECDT support and module-level execution
of AML (Lv Zheng).
- ACPI PCI interrupts management update to make it work better on
ARM64 mostly (Sinan Kaya).
- ACPI SRAT handling update to make the code process all entires
in the table order regardless of the entry type (Lukasz Anaczkowski).
- EFI power off support for full-hardware ACPI platforms that don't
support ACPI S5 (Chen Yu).
- Fixes and cleanups related to the ACPI core's sysfs interface
(Dan Carpenter, Betty Dall).
- acpi_dev_present() API rework to reduce possible confusion related
to it (Lukas Wunner).
- Removal of CLK_IS_ROOT from two ACPI drivers (Stephen Boyd).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJXOjM+AAoJEILEb/54YlRxNO4P/0FsajR2iXfHybiHyJq+Iddk
MX+Jealb5klnXXtuih90oOHft9NypV1ESO7bcmjSz+2tuSgoXifdI3GO0aFghj7v
h8SaVpCGzlm+u8y+Ppbxk+eWHAV1+ohV8uaO47yDUjuyZgG6c702QqrJVaqunQoq
KQd+kqK5bhcaLhrx9Ro0I4Jbz0TdFa8j7noUTRXtDfJ9V4xZ3a6QfXz3H6GU4L31
kNKjroxkFXpHMj2mYXuskqw2IWoRZw7Z7kpLv0dM44nko6c+oM8/9BIx4xh1IbR4
vvgn/C2QYe45fz4Or/qmrPzGZ/kQtLiiVC2B/GWbCTezu3Px9E3V2NI0xLktVe0g
Y/MsRdzMs0TInWSVezOlTONmfcqZgPhbSmsuI9PJ7izxmzOLVk6tjXARkzWe2gQ0
N/nOd7I8AMsTMdpBCvf6xjJXqHRl6jdXuHAIhcPC5DINQ0daz8FZ4Cw42MtVKo0I
2OiZ7ZnAnDDHrptV9VwtEvo60Uw/QG8EhdMWyQVaFWe1pFNM9nQtD0P2QeMWUHhZ
YL7Q63nM8flQIywcSj7jyMWroWZMOI/cFOLGxZjz+yXA3fRizl4J22kJ392gSQti
da1X8OBKsOvYQutkeGeQCNYWp4j5uKpoMoR4iR4dOLNqguWxaicDSZgsU8cAAk0k
W+lRS/E8l+we5rxEZYOd
=rAwm
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"The new features here are ACPI 6.1 support (and some previously
missing bits of ACPI 6.0 support) in ACPICA and two new drivers, a
driver for the ACPI Generic Event Device (GED) feature introduced by
ACPI 6.1 and the INT3406 thermal driver for display thermal
management. Also the value returned by the _HRV (hardware revision)
ACPI object will be exported to user space via sysfs now.
In addition to that, ACPI on ARM64 will not depend on EXPERT any more.
The rest is mostly fixes and cleanups and some code reorganization.
Specifics:
- In-kernel ACPICA code update to the upstream release 20160422
adding support for ACPI 6.1 along with some previously missing bits
of ACPI 6.0 support, making a fair amount of fixes and cleanups and
reducing divergences between the upstream ACPICA and the in-kernel
code (Bob Moore, Lv Zheng, Al Stone, Aleksey Makarov, Will Miles)
- ACPI Generic Event Device (GED) support and a fix for it (Sinan
Kaya, Paul Gortmaker)
- INT3406 thermal driver for display thermal management and ACPI
backlight support code reorganization related to it (Aaron Lu, Arnd
Bergmann)
- Support for exporting the value returned by the _HRV (hardware
revision) ACPI object via sysfs (Betty Dall)
- Removal of the EXPERT dependency for ACPI on ARM64 (Mark Brown)
- Rework of the handling of ACPI _OSI mechanism allowing the
_OSI("Darwin") support to be overridden from the kernel command
line among other things (Lv Zheng, Chen Yu)
- Rework of the ACPI tables override mechanism to prepare it for the
introduction of overlays support going forward (Lv Zheng, Rafael
Wysocki)
- Fixes related to the ECDT support and module-level execution of AML
(Lv Zheng)
- ACPI PCI interrupts management update to make it work better on
ARM64 mostly (Sinan Kaya)
- ACPI SRAT handling update to make the code process all entires in
the table order regardless of the entry type (Lukasz Anaczkowski)
- EFI power off support for full-hardware ACPI platforms that don't
support ACPI S5 (Chen Yu)
- Fixes and cleanups related to the ACPI core's sysfs interface (Dan
Carpenter, Betty Dall)
- acpi_dev_present() API rework to reduce possible confusion related
to it (Lukas Wunner)
- Removal of CLK_IS_ROOT from two ACPI drivers (Stephen Boyd)"
* tag 'acpi-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (82 commits)
ACPI / video: mark acpi_video_get_levels() inline
Thermal / ACPI / video: add INT3406 thermal driver
ACPI / GED: make evged.c explicitly non-modular
ACPI / tables: Fix DSDT override mechanism
ACPI / sysfs: fix error code in get_status()
ACPICA: Update version to 20160422
ACPICA: Move all ASCII utilities to a common file
ACPICA: ACPI 2.0, Hardware: Add access_width/bit_offset support for acpi_hw_write()
ACPICA: ACPI 2.0, Hardware: Add access_width/bit_offset support in acpi_hw_read()
ACPICA: Executer: Introduce a set of macros to handle bit width mask generation
ACPICA: Hardware: Add optimized access bit width support
ACPICA: Utilities: Add ACPI_IS_ALIGNED() macro
ACPICA: Renamed some #defined flag constants for clarity
ACPICA: ACPI 6.0, tools/iasl: Add support for new resource descriptors
ACPICA: ACPI 6.0: Update _BIX support for new package element
ACPICA: ACPI 6.1: Support for new PCCT subtable
ACPICA: Refactor evaluate_object to reduce nesting
ACPICA: Divergence: remove unwanted spaces for typedef
ACPI,PCI,IRQ: remove SCI penalize function
ACPI,PCI,IRQ: remove redundant code in acpi_irq_penalty_init()
..
The promise of pretty boot splashes from firmware via BGRT was at
best only that; a promise. The kernel diligently checks to make
sure the BGRT data firmware gives it is valid, and dutifully warns
the user when it isn't. However, it does so via the pr_err log
level which seems unnecessary. The user cannot do anything about
this and there really isn't an error on the part of Linux to
correct.
This lowers the log level by using pr_notice instead. Users will
no longer have their boot process uglified by the kernel reminding
us that firmware can and often is broken when the 'quiet' kernel
parameter is specified. Ironic, considering BGRT is supposed to
make boot pretty to begin with.
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Môshe van der Sterre <me@moshe.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462303781-8686-4-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This symbol is always set which makes it useless. Additionally we have
a kernel command-line switch, efi=debug, which actually controls the
printing of the memory map.
Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-16-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Our efi_memory_desc_t type is based on EFI_MEMORY_DESCRIPTOR version 1 in
the UEFI spec. No version updates are expected, but since we are about to
introduce support for new firmware tables that use the same descriptor
type, it makes sense to at least warn if we encounter other versions.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-9-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Abolish the poorly named EFI memory map, 'memmap'. It is shadowed by a
bunch of local definitions in various files and having two ways to
access the EFI memory map ('efi.memmap' vs. 'memmap') is rather
confusing.
Furthermore, IA64 doesn't even provide this global object, which has
caused issues when trying to write generic EFI memmap code.
Replace all occurrences with efi.memmap, and convert the remaining
iterator code to use for_each_efi_mem_desc().
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Luck, Tony <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-8-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Most of the users of for_each_efi_memory_desc() are equally happy
iterating over the EFI memory map in efi.memmap instead of 'memmap',
since the former is usually a pointer to the latter.
For those users that want to specify an EFI memory map other than
efi.memmap, that can be done using for_each_efi_memory_desc_in_map().
One such example is in the libstub code where the firmware is queried
directly for the memory map, it gets iterated over, and then freed.
This change goes part of the way toward deleting the global 'memmap'
variable, which is not universally available on all architectures
(notably IA64) and is rather poorly named.
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-7-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The EFI_SYSTEM_TABLES status bit is set by all EFI supporting architectures
upon discovery of the EFI system table, but the bit is never tested in any
code we have in the tree. So remove it.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Luck, Tony <tony.luck@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The problem is Linux registers pm_power_off = efi_power_off only if
we are in hardware reduced mode. Actually, what we also want is to do
this when ACPI S5 is simply not supported on non-legacy platforms.
Since some future Intel platforms are HW-full mode where the DSDT
fails to supply an _S5 object(without SLP_TYP), we should let such
kind of platform to leverage efi runtime service to poweroff.
This patch uses efi power off as first choice when S5 is unavailable,
even if there is a customized poweroff(driver provided, eg).
Meanwhile, the legacy platforms will not be affected because there is
no path for them to overwrite the pm_power_off to efi power off.
Suggested-by: Len Brown <len.brown@intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull EFI updates from Ingo Molnar:
"The main changes are:
- Use separate EFI page tables when executing EFI firmware code.
This isolates the EFI context from the rest of the kernel, which
has security and general robustness advantages. (Matt Fleming)
- Run regular UEFI firmware with interrupts enabled. This is already
the status quo under other OSs. (Ard Biesheuvel)
- Various x86 EFI enhancements, such as the use of non-executable
attributes for EFI memory mappings. (Sai Praneeth Prakhya)
- Various arm64 UEFI enhancements. (Ard Biesheuvel)
- ... various fixes and cleanups.
The separate EFI page tables feature got delayed twice already,
because it's an intrusive change and we didn't feel confident about
it - third time's the charm we hope!"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
x86/mm/pat: Fix boot crash when 1GB pages are not supported by the CPU
x86/efi: Only map kernel text for EFI mixed mode
x86/efi: Map EFI_MEMORY_{XP,RO} memory region bits to EFI page tables
x86/mm/pat: Don't implicitly allow _PAGE_RW in kernel_map_pages_in_pgd()
efi/arm*: Perform hardware compatibility check
efi/arm64: Check for h/w support before booting a >4 KB granular kernel
efi/arm: Check for LPAE support before booting a LPAE kernel
efi/arm-init: Use read-only early mappings
efi/efistub: Prevent __init annotations from being used
arm64/vmlinux.lds.S: Handle .init.rodata.xxx and .init.bss sections
efi/arm64: Drop __init annotation from handle_kernel_image()
x86/mm/pat: Use _PAGE_GLOBAL bit for EFI page table mappings
efi/runtime-wrappers: Run UEFI Runtime Services with interrupts enabled
efi: Reformat GUID tables to follow the format in UEFI spec
efi: Add Persistent Memory type name
efi: Add NV memory attribute
x86/efi: Show actual ending addresses in efi_print_memmap
x86/efi/bgrt: Don't ignore the BGRT if the 'valid' bit is 0
efivars: Use to_efivar_entry
efi: Runtime-wrapper: Get rid of the rtc_lock spinlock
...
Pull 'objtool' stack frame validation from Ingo Molnar:
"This tree adds a new kernel build-time object file validation feature
(ONFIG_STACK_VALIDATION=y): kernel stack frame correctness validation.
It was written by and is maintained by Josh Poimboeuf.
The motivation: there's a category of hard to find kernel bugs, most
of them in assembly code (but also occasionally in C code), that
degrades the quality of kernel stack dumps/backtraces. These bugs are
hard to detect at the source code level. Such bugs result in
incorrect/incomplete backtraces most of time - but can also in some
rare cases result in crashes or other undefined behavior.
The build time correctness checking is done via the new 'objtool'
user-space utility that was written for this purpose and which is
hosted in the kernel repository in tools/objtool/. The tool's (very
simple) UI and source code design is shaped after Git and perf and
shares quite a bit of infrastructure with tools/perf (which tooling
infrastructure sharing effort got merged via perf and is already
upstream). Objtool follows the well-known kernel coding style.
Objtool does not try to check .c or .S files, it instead analyzes the
resulting .o generated machine code from first principles: it decodes
the instruction stream and interprets it. (Right now objtool supports
the x86-64 architecture.)
From tools/objtool/Documentation/stack-validation.txt:
"The kernel CONFIG_STACK_VALIDATION option enables a host tool named
objtool which runs at compile time. It has a "check" subcommand
which analyzes every .o file and ensures the validity of its stack
metadata. It enforces a set of rules on asm code and C inline
assembly code so that stack traces can be reliable.
Currently it only checks frame pointer usage, but there are plans to
add CFI validation for C files and CFI generation for asm files.
For each function, it recursively follows all possible code paths
and validates the correct frame pointer state at each instruction.
It also follows code paths involving special sections, like
.altinstructions, __jump_table, and __ex_table, which can add
alternative execution paths to a given instruction (or set of
instructions). Similarly, it knows how to follow switch statements,
for which gcc sometimes uses jump tables."
When this new kernel option is enabled (it's disabled by default), the
tool, if it finds any suspicious assembly code pattern, outputs
warnings in compiler warning format:
warning: objtool: rtlwifi_rate_mapping()+0x2e7: frame pointer state mismatch
warning: objtool: cik_tiling_mode_table_init()+0x6ce: call without frame pointer save/setup
warning: objtool:__schedule()+0x3c0: duplicate frame pointer save
warning: objtool:__schedule()+0x3fd: sibling call from callable instruction with changed frame pointer
... so that scripts that pick up compiler warnings will notice them.
All known warnings triggered by the tool are fixed by the tree, most
of the commits in fact prepare the kernel to be warning-free. Most of
them are bugfixes or cleanups that stand on their own, but there are
also some annotations of 'special' stack frames for justified cases
such entries to JIT-ed code (BPF) or really special boot time code.
There are two other long-term motivations behind this tool as well:
- To improve the quality and reliability of kernel stack frames, so
that they can be used for optimized live patching.
- To create independent infrastructure to check the correctness of
CFI stack frames at build time. CFI debuginfo is notoriously
unreliable and we cannot use it in the kernel as-is without extra
checking done both on the kernel side and on the build side.
The quality of kernel stack frames matters to debuggability as well,
so IMO we can merge this without having to consider the live patching
or CFI debuginfo angle"
* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
objtool: Only print one warning per function
objtool: Add several performance improvements
tools: Copy hashtable.h into tools directory
objtool: Fix false positive warnings for functions with multiple switch statements
objtool: Rename some variables and functions
objtool: Remove superflous INIT_LIST_HEAD
objtool: Add helper macros for traversing instructions
objtool: Fix false positive warnings related to sibling calls
objtool: Compile with debugging symbols
objtool: Detect infinite recursion
objtool: Prevent infinite recursion in noreturn detection
objtool: Detect and warn if libelf is missing and don't break the build
tools: Support relative directory path for 'O='
objtool: Support CROSS_COMPILE
x86/asm/decoder: Use explicitly signed chars
objtool: Enable stack metadata validation on 64-bit x86
objtool: Add CONFIG_STACK_VALIDATION option
objtool: Add tool to perform compile-time stack metadata validation
x86/kprobes: Mark kretprobe_trampoline() stack frame as non-standard
sched: Always inline context_switch()
...
Some machines have EFI regions in page zero (physical address
0x00000000) and historically that region has been added to the e820
map via trim_bios_range(), and ultimately mapped into the kernel page
tables. It was not mapped via efi_map_regions() as one would expect.
Alexis reports that with the new separate EFI page tables some boot
services regions, such as page zero, are not mapped. This triggers an
oops during the SetVirtualAddressMap() runtime call.
For the EFI boot services quirk on x86 we need to memblock_reserve()
boot services regions until after SetVirtualAddressMap(). Doing that
while respecting the ownership of regions that may have already been
reserved by the kernel was the motivation behind this commit:
7d68dc3f10 ("x86, efi: Do not reserve boot services regions within reserved areas")
That patch was merged at a time when the EFI runtime virtual mappings
were inserted into the kernel page tables as described above, and the
trick of setting ->numpages (and hence the region size) to zero to
track regions that should not be freed in efi_free_boot_services()
meant that we never mapped those regions in efi_map_regions(). Instead
we were relying solely on the existing kernel mappings.
Now that we have separate page tables we need to make sure the EFI
boot services regions are mapped correctly, even if someone else has
already called memblock_reserve(). Instead of stashing a tag in
->numpages, set the EFI_MEMORY_RUNTIME bit of ->attribute. Since it
generally makes no sense to mark a boot services region as required at
runtime, it's pretty much guaranteed the firmware will not have
already set this bit.
For the record, the specific circumstances under which Alexis
triggered this bug was that an EFI runtime driver on his machine was
responding to the EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE event during
SetVirtualAddressMap().
The event handler for this driver looks like this,
sub rsp,0x28
lea rdx,[rip+0x2445] # 0xaa948720
mov ecx,0x4
call func_aa9447c0 ; call to ConvertPointer(4, & 0xaa948720)
mov r11,QWORD PTR [rip+0x2434] # 0xaa948720
xor eax,eax
mov BYTE PTR [r11+0x1],0x1
add rsp,0x28
ret
Which is pretty typical code for an EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE
handler. The "mov r11, QWORD PTR [rip+0x2424]" was the faulting
instruction because ConvertPointer() was being called to convert the
address 0x0000000000000000, which when converted is left unchanged and
remains 0x0000000000000000.
The output of the oops trace gave the impression of a standard NULL
pointer dereference bug, but because we're accessing physical
addresses during ConvertPointer(), it wasn't. EFI boot services code
is stored at that address on Alexis' machine.
Reported-by: Alexis Murzeau <amurzeau@gmail.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raphael Hertzog <hertzog@debian.org>
Cc: Roger Shimizu <rogershimizu@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1457695163-29632-2-git-send-email-matt@codeblueprint.co.uk
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815125
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Code which runs outside the kernel's normal mode of operation often does
unusual things which can cause a static analysis tool like objtool to
emit false positive warnings:
- boot image
- vdso image
- relocation
- realmode
- efi
- head
- purgatory
- modpost
Set OBJECT_FILES_NON_STANDARD for their related files and directories,
which will tell objtool to skip checking them. It's ok to skip them
because they don't affect runtime stack traces.
Also skip the following code which does the right thing with respect to
frame pointers, but is too "special" to be validated by a tool:
- entry
- mcount
Also skip the test_nx module because it modifies its exception handling
table at runtime, which objtool can't understand. Fortunately it's
just a test module so it doesn't matter much.
Currently objtool is the only user of OBJECT_FILES_NON_STANDARD, but it
might eventually be useful for other tools.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/366c080e3844e8a5b6a0327dc7e8c2b90ca3baeb.1456719558.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
efi_call() is a callable non-leaf function which doesn't honor
CONFIG_FRAME_POINTER, which can result in bad stack traces.
Create a stack frame for it when CONFIG_FRAME_POINTER is enabled.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/2294b6fad60eea4cc862eddc8e98a1324e6eeeca.1453405861.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The correct symbol to use when figuring out the size of the kernel
text is '_etext', not '_end' which is the symbol for the entire kernel
image includes data and debug sections.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1455712566-16727-14-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that we have EFI memory region bits that indicate which regions do
not need execute permission or read/write permission in the page tables,
let's use them.
We also check for EFI_NX_PE_DATA and only enforce the restrictive
mappings if it's present (to allow us to ignore buggy firmware that sets
bits it didn't mean to and to preserve backwards compatibility).
Instead of assuming that firmware would set appropriate attributes in
memory descriptor like EFI_MEMORY_RO for code and EFI_MEMORY_XP for
data, we can expect some firmware out there which might only set *type*
in memory descriptor to be EFI_RUNTIME_SERVICES_CODE or
EFI_RUNTIME_SERVICES_DATA leaving away attribute. This will lead to
improper mappings of EFI runtime regions. In order to avoid it, we check
attribute and type of memory descriptor to update mappings and moreover
Windows works this way.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lee, Chun-Yi <jlee@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1455712566-16727-13-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As part of the preparation for the EFI_MEMORY_RO flag added in the UEFI
2.5 specification, we need the ability to map pages in kernel page
tables without _PAGE_RW being set.
Modify kernel_map_pages_in_pgd() to require its callers to pass _PAGE_RW
if the pages need to be mapped read/write. Otherwise, we'll map the
pages as read-only.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lee, Chun-Yi <jlee@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1455712566-16727-12-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Adjust efi_print_memmap to print the real end address of each
range, not 1 byte beyond. This matches other prints like those
for SRAT and nosave memory.
While investigating grub persistent memory corruption issues, it
was helpful to make this table match the ending address
convention used by:
* the kernel's e820 table prints
BIOS-e820: [mem 0x0000001680000000-0x0000001c7fffffff] reserved
* the kernel's nosave memory prints
PM: Registered nosave memory: [mem 0x880000000-0xc7fffffff]
* the kernel's ACPI System Resource Affinity Table prints
SRAT: Node 1 PXM 1 [mem 0x480000000-0x87fffffff]
* grub's lsmmap and lsefimmap commands
reserved 0000001680000000-0000001c7fffffff 00600000 24GiB UC WC WT WB NV
* the UEFI shell's memmap command
Reserved 000000007FC00000-000000007FFFFFFF 0000000000000400 0000000000000001
For example, if you grep all the various logs for c7fffffff, you
won't find the kernel's line if it uses c80000000.
Also, change the closing ) to ] to match the opening [.
old:
efi: mem61: [Persistent Memory | | | | | | | |WB|WT|WC|UC] range=[0x0000000880000000-0x0000000c80000000) (16384MB)
new:
efi: mem61: [Persistent Memory | | | | | | | |WB|WT|WC|UC] range=[0x0000000880000000-0x0000000c7fffffff] (16384MB)
Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1454364428-494-12-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Unintuitively, the BGRT graphic is apparently meant to be usable
if the valid bit in not set. The valid bit only conveys
uncertainty about the validity in relation to the screen state.
Windows 10 actually uses the BGRT image for its boot screen even
if not 'valid', for example when the user triggered the boot
menu. Because it is unclear if all firmwares will provide a
usable graphic in this case, we now look at the BMP magic number
as an additional check.
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Môshe van der Sterre <me@moshe.nl>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: =?UTF-8?q?M=C3=B4she=20van=20der=20Sterre?= <me@moshe.nl>
Link: http://lkml.kernel.org/r/1454364428-494-10-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The function efi_query_variable_store() may be invoked by
efivar_entry_set_nonblocking(), which itself takes care to only
call a non-blocking version of the SetVariable() runtime
wrapper. However, efi_query_variable_store() may call the
SetVariable() wrapper directly, as well as the wrapper for
QueryVariableInfo(), both of which could deadlock in the same
way we are trying to prevent by calling
efivar_entry_set_nonblocking() in the first place.
So instead, modify efi_query_variable_store() to use the
non-blocking variants of QueryVariableInfo() (and give up rather
than free up space if the available space is below
EFI_MIN_RESERVE) if invoked with the 'nonblocking' argument set
to true.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1454364428-494-5-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The switch to using a new dedicated page table for EFI runtime
calls in commit commit 67a9108ed4 ("x86/efi: Build our own
page table structures") failed to take into account changes
required for the kexec code paths, which are unfortunately
duplicated in the EFI code.
Call the allocation and setup functions in
kexec_enter_virtual_mode() just like we do for
__efi_enter_virtual_mode() to avoid hitting NULL-pointer
dereferences when making EFI runtime calls.
At the very least, the call to efi_setup_page_tables() should
have existed for kexec before the following commit:
67a9108ed4 ("x86/efi: Build our own page table structures")
Things just magically worked because we were actually using
the kernel's page tables that contained the required mappings.
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453385519-11477-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit a5d90c923b ("x86/efi: Quirk out SGI UV") added a quirk
to efi_apply_memmap_quirks to force SGI UV systems to fall back
to the old EFI memmap mechanism. We have a BIOS fix for this
issue on all systems except for UV1. This commit fixes up the
EFI quirk/MMR mapping code so that we only apply the special
case to UV1 hardware.
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1449867585-189233-2-git-send-email-athorlton@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Môshe reported the following warning triggered on his machine since
commit 50a0cb5652 ("x86/efi-bgrt: Fix kernel panic when mapping BGRT
data"),
[ 0.026936] ------------[ cut here ]------------
[ 0.026941] WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:137 __early_ioremap+0x102/0x1bb()
[ 0.026941] Modules linked in:
[ 0.026944] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-rc1 #2
[ 0.026945] Hardware name: Dell Inc. XPS 13 9343/09K8G1, BIOS A05 07/14/2015
[ 0.026946] 0000000000000000 900f03d5a116524d ffffffff81c03e60 ffffffff813a3fff
[ 0.026948] 0000000000000000 ffffffff81c03e98 ffffffff810a0852 00000000d7b76000
[ 0.026949] 0000000000000000 0000000000000001 0000000000000001 000000000000017c
[ 0.026951] Call Trace:
[ 0.026955] [<ffffffff813a3fff>] dump_stack+0x44/0x55
[ 0.026958] [<ffffffff810a0852>] warn_slowpath_common+0x82/0xc0
[ 0.026959] [<ffffffff810a099a>] warn_slowpath_null+0x1a/0x20
[ 0.026961] [<ffffffff81d8c395>] __early_ioremap+0x102/0x1bb
[ 0.026962] [<ffffffff81d8c602>] early_memremap+0x13/0x15
[ 0.026964] [<ffffffff81d78361>] efi_bgrt_init+0x162/0x1ad
[ 0.026966] [<ffffffff81d778ec>] efi_late_init+0x9/0xb
[ 0.026968] [<ffffffff81d58ff5>] start_kernel+0x46f/0x49f
[ 0.026970] [<ffffffff81d58120>] ? early_idt_handler_array+0x120/0x120
[ 0.026972] [<ffffffff81d58339>] x86_64_start_reservations+0x2a/0x2c
[ 0.026974] [<ffffffff81d58485>] x86_64_start_kernel+0x14a/0x16d
[ 0.026977] ---[ end trace f9b3812eb8e24c58 ]---
[ 0.026978] efi_bgrt: Ignoring BGRT: failed to map image memory
early_memremap() has an upper limit on the size of mapping it can
handle which is ~200KB. Clearly the BGRT image on Môshe's machine is
much larger than that.
There's actually no reason to restrict ourselves to using the early_*
version of memremap() - the ACPI BGRT driver is invoked late enough in
boot that we can use the standard version, with the benefit that the
late version allows mappings of arbitrary size.
Reported-by: Môshe van der Sterre <me@moshe.nl>
Tested-by: Môshe van der Sterre <me@moshe.nl>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1450707172-12561-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
because the kobject API can do that for us - Rasmus Villemoes
* Update the arm64 file paths in Documentation/efi-stub.txt to match
the current tree - Alan Ott
* Consistently preface all print statements with "efi" arch/x86 so
that it's more obvious to users reporting problems which statements
in the kernel log are relevant for EFI - Matt Fleming
* Fix a boot crash in the ACPI BGRT driver and delete
efi_lookup_mapped_addr() since it's useless now that the EFI
mappings *only* exist in the 'efi_pgd' page table. Instead we
always early_memremap() the BGRT memory - Sai Praneeth Prakhya
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJWbuWUAAoJEC84WcCNIz1VJpcQAKqs09lCyZ3scgusZwk0MM4x
fnDiJ9BW6GjWskY9AJzgcQLmb/pJJtbenQNIioVeeLEy93Vsn5+JCiJWs3BVC4o6
T3caYbObL5gJiKoqxIsKemXIPJpzVzjlGrz1JWB9M6dQFj89y9pMa2Vx2/oNT40x
sEp8MlNrgGm0Zy6wSZBBj/qk6tVYNQfaUoIYiCtyvTRFsyw1MA+mX47qLj/W9KSp
9XYN6Cfy8EfKl0ioNxhD+JtH3MPqk6ao7TRfJQoL5RLMa5/hAnI6dUJnfoeWyGgI
NpXmPCHcRQiLEbrpYXu2Rm5E5u244VuJaczmMKNvBHAdhAlpVD9airM7u8tedrzr
DWe1uKSDr5sfyNHJHevFDuVOD2Uarut0YOZe69/hQN39aFSe8VoFtquGrBJpACwQ
6zWB97t2u0ZlxNFUN/6wy+g3HxPItJZglGlAuzACqmtjtZ6jYyGu7d/QIPZ73CCK
gyQFoedr6Gnm8wEgCTkEyVLssdSz9t1rchUR2s710hp9V/wptgzG+dorwJzDAzLb
Q1xrH1wPZPqmfNL9Yn7RoEiSlz/Tk4y4i1jGsNuzEYxn0g4ElwYCJ/n8v1SEhIEk
c4mUZLRJ4RssjQ5LqarVJE7bWvhhQLiiORNXiUeWFi8zoO0KU8lBXUiedfAunYwo
/yzndz6k6sahdDTuhfa+
=VYPl
-----END PGP SIGNATURE-----
Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi
Pull efi changes from Matt Fleming:
* We don't need to carry our own formatting code in the esrt driver
because the kobject API can do that for us - Rasmus Villemoes
* Update the arm64 file paths in Documentation/efi-stub.txt to match
the current tree - Alan Ott
* Consistently preface all print statements with "efi" arch/x86 so
that it's more obvious to users reporting problems which statements
in the kernel log are relevant for EFI - Matt Fleming
* Fix a boot crash in the ACPI BGRT driver and delete
efi_lookup_mapped_addr() since it's useless now that the EFI
mappings *only* exist in the 'efi_pgd' page table. Instead we
always early_memremap() the BGRT memory - Sai Praneeth Prakhya
Starting with this commit 35eb8b81edd4 ("x86/efi: Build our own page
table structures") efi regions have a separate page directory called
"efi_pgd". In order to access any efi region we have to first shift %cr3
to this page table. In the bgrt code we are trying to copy bgrt_header
and image, but these regions fall under "EFI_BOOT_SERVICES_DATA"
and to access these regions we have to shift %cr3 to efi_pgd and not
doing so will cause page fault as shown below.
[ 0.251599] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
[ 0.259126] Freeing SMP alternatives memory: 32K (ffffffff8230e000 - ffffffff82316000)
[ 0.271803] BUG: unable to handle kernel paging request at fffffffefce35002
[ 0.279740] IP: [<ffffffff821bca49>] efi_bgrt_init+0x144/0x1fd
[ 0.286383] PGD 300f067 PUD 0
[ 0.289879] Oops: 0000 [#1] SMP
[ 0.293566] Modules linked in:
[ 0.297039] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-rc1-eywa-eywa-built-in-47041+ #2
[ 0.306619] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2R1.R00.B104.B01.1511110114 11/11/2015
[ 0.320925] task: ffffffff820134c0 ti: ffffffff82000000 task.ti: ffffffff82000000
[ 0.329420] RIP: 0010:[<ffffffff821bca49>] [<ffffffff821bca49>] efi_bgrt_init+0x144/0x1fd
[ 0.338821] RSP: 0000:ffffffff82003f18 EFLAGS: 00010246
[ 0.344852] RAX: fffffffefce35000 RBX: fffffffefce35000 RCX: fffffffefce2b000
[ 0.352952] RDX: 000000008a82b000 RSI: ffffffff8235bb80 RDI: 000000008a835000
[ 0.361050] RBP: ffffffff82003f30 R08: 000000008a865000 R09: ffffffffff202850
[ 0.369149] R10: ffffffff811ad62f R11: 0000000000000000 R12: 0000000000000000
[ 0.377248] R13: ffff88016dbaea40 R14: ffffffff822622c0 R15: ffffffff82003fb0
[ 0.385348] FS: 0000000000000000(0000) GS:ffff88016d800000(0000) knlGS:0000000000000000
[ 0.394533] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.401054] CR2: fffffffefce35002 CR3: 000000000300c000 CR4: 00000000003406f0
[ 0.409153] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.417252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.425350] Stack:
[ 0.427638] ffffffffffffffff ffffffff82256900 ffff88016dbaea40 ffffffff82003f40
[ 0.436086] ffffffff821bbce0 ffffffff82003f88 ffffffff8219c0c2 0000000000000000
[ 0.444533] ffffffff8219ba4a ffffffff822622c0 0000000000083000 00000000ffffffff
[ 0.452978] Call Trace:
[ 0.455763] [<ffffffff821bbce0>] efi_late_init+0x9/0xb
[ 0.461697] [<ffffffff8219c0c2>] start_kernel+0x463/0x47f
[ 0.467928] [<ffffffff8219ba4a>] ? set_init_arg+0x55/0x55
[ 0.474159] [<ffffffff8219b120>] ? early_idt_handler_array+0x120/0x120
[ 0.481669] [<ffffffff8219b5ee>] x86_64_start_reservations+0x2a/0x2c
[ 0.488982] [<ffffffff8219b72d>] x86_64_start_kernel+0x13d/0x14c
[ 0.495897] Code: 00 41 b4 01 48 8b 78 28 e8 09 36 01 00 48 85 c0 48 89 c3 75 13 48 c7 c7 f8 ac d3 81 31 c0 e8 d7 3b fb fe e9 b5 00 00 00 45 84 e4 <44> 8b 6b 02 74 0d be 06 00 00 00 48 89 df e8 ae 34 0$
[ 0.518151] RIP [<ffffffff821bca49>] efi_bgrt_init+0x144/0x1fd
[ 0.524888] RSP <ffffffff82003f18>
[ 0.528851] CR2: fffffffefce35002
[ 0.532615] ---[ end trace 7b06521e6ebf2aea ]---
[ 0.537852] Kernel panic - not syncing: Attempted to kill the idle task!
As said above one way to fix this bug is to shift %cr3 to efi_pgd but we
are not doing that way because it leaks inner details of how we switch
to EFI page tables into a new call site and it also adds duplicate code.
Instead, we remove the call to efi_lookup_mapped_addr() and always
perform early_mem*() instead of early_io*() because we want to remap RAM
regions and not I/O regions. We also delete efi_lookup_mapped_addr()
because we are no longer using it.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Reported-by: Wendy Wang <wendy.wang@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
The pr_*() calls in the x86 EFI code may or may not include a
subsystem tag, which makes it difficult to grep the kernel log for all
relevant EFI messages and leads users to miss important information.
Recently, a bug reporter provided all the EFI print messages from the
kernel log when trying to diagnose an issue but missed the following
statement because it wasn't prefixed with anything indicating it was
related to EFI,
pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
Cc: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
With commit e1a58320a3 ("x86/mm: Warn on W^X mappings") all
users booting on 64-bit UEFI machines see the following warning,
------------[ cut here ]------------
WARNING: CPU: 7 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x5dc/0x780()
x86/mm: Found insecure W+X mapping at address ffff88000005f000/0xffff88000005f000
...
x86/mm: Checked W+X mappings: FAILED, 165660 W+X pages found.
...
This is caused by mapping EFI regions with RWX permissions.
There isn't much we can do to restrict the permissions for these
regions due to the way the firmware toolchains mix code and
data, but we can at least isolate these mappings so that they do
not appear in the regular kernel page tables.
In commit d2f7cbe7b2 ("x86/efi: Runtime services virtual
mapping") we started using 'trampoline_pgd' to map the EFI
regions because there was an existing identity mapping there
which we use during the SetVirtualAddressMap() call and for
broken firmware that accesses those addresses.
But 'trampoline_pgd' shares some PGD entries with
'swapper_pg_dir' and does not provide the isolation we require.
Notably the virtual address for __START_KERNEL_map and
MODULES_START are mapped by the same PGD entry so we need to be
more careful when copying changes over in
efi_sync_low_kernel_mappings().
This patch doesn't go the full mile, we still want to share some
PGD entries with 'swapper_pg_dir'. Having completely separate
page tables brings its own issues such as synchronising new
mappings after memory hotplug and module loading. Sharing also
keeps memory usage down.
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1448658575-17029-6-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This change is a prerequisite for pending patches that switch to
a dedicated EFI page table, instead of using 'trampoline_pgd'
which shares PGD entries with 'swapper_pg_dir'. The pending
patches make it impossible to dereference the runtime service
function pointer without first switching %cr3.
It's true that we now have duplicated switching code in
efi_call_virt() and efi_call_phys_{prolog,epilog}() but we are
sacrificing code duplication for a little more clarity and the
ease of writing the page table switching code in C instead of
asm.
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1448658575-17029-5-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are relying on the pre-existing mappings in 'trampoline_pgd'
when accessing function arguments in the EFI mixed mode thunking
code.
Instead let's map memory explicitly so that things will continue
to work when we move to a separate page table in the future.
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1448658575-17029-4-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The x86 pageattr code is confused about the data that is stored
in cpa->pfn, sometimes it's treated as a page frame number,
sometimes it's treated as an unshifted physical address, and in
one place it's treated as a pte.
The result of this is that the mapping functions do not map the
intended physical address.
This isn't a problem in practice because most of the addresses
we're mapping in the EFI code paths are already mapped in
'trampoline_pgd' and so the pageattr mapping functions don't
actually do anything in this case. But when we move to using a
separate page table for the EFI runtime this will be an issue.
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1448658575-17029-3-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We have been getting away with using a void* for the physical
address of the UEFI memory map, since, even on 32-bit platforms
with 64-bit physical addresses, no truncation takes place if the
memory map has been allocated by the firmware (which only uses
1:1 virtually addressable memory), which is usually the case.
However, commit:
0f96a99dab ("efi: Add "efi_fake_mem" boot option")
adds code that clones and modifies the UEFI memory map, and the
clone may live above 4 GB on 32-bit platforms.
This means our use of void* for struct efi_memory_map::phys_map has
graduated from 'incorrect but working' to 'incorrect and
broken', and we need to fix it.
So redefine struct efi_memory_map::phys_map as phys_addr_t, and
get rid of a bunch of casts that are now unneeded.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: izumi.taku@jp.fujitsu.com
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: linux-efi@vger.kernel.org
Cc: matt.fleming@intel.com
Link: http://lkml.kernel.org/r/1445593697-1342-1-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
non-modular by ripping out the module_* code since Kconfig doesn't
allow it to be built as a module anyway - Paul Gortmaker
* Make the x86 efi=debug kernel parameter, which enables EFI debug
code and output, generic and usable by arm64 - Leif Lindholm
* Add support to the x86 EFI boot stub for 64-bit Graphics Output
Protocol frame buffer addresses - Matt Fleming
* Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
in the firmware and set an efi.flags bit so the kernel knows when
it can apply more strict runtime mapping attributes - Ard Biesheuvel
* Auto-load the efi-pstore module on EFI systems, just like we
currently do for the efivars module - Ben Hutchings
* Add "efi_fake_mem" kernel parameter which allows the system's EFI
memory map to be updated with additional attributes for specific
memory ranges. This is useful for testing the kernel code that handles
the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
doesn't include support - Taku Izumi
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWG7OwAAoJEC84WcCNIz1VEEEP/0SsdrwJ66B4MfP5YNjqHYWm
+OTHR6Ovv2i10kc+NjOV/GN8sWPndnkLfIfJ4EqJ9BoQ9PDEYZilV2aleSQ4DrPm
H7uGwBXQkfd76tZKX9pMToK76mkhg6M7M2LR3Suv3OGfOEzuozAOt3Ez37lpksTN
2ByhHr/oGbhu99jC2ki5+k0ySH8PMqDBRxqrPbBzTD+FfB7bM11vAJbSNbSMQ21R
ZwX0acZBLqb9J2Vf7tDsW+fCfz0TFo8JHW8jdLRFm/y2dpquzxswkkBpODgA8+VM
0F5UbiUdkaIRug75I6N/OJ8+yLwdzuxm7ul+tbS3JrXGLAlK3850+dP2Pr5zQ2Ce
zaYGRUy+tD5xMXqOKgzpu+Ia8XnDRLhOlHabiRd5fG6ZC9nR8E9uK52g79voSN07
pADAJnVB03CGV/HdduDOI4C4UykUKubuArbQVkqWJcecV1Jic/tYI0gjeACmU1VF
v8FzXpBUe3U3A0jauOz8PBz8M+k5qky/GbIrnEvXreBtKdt999LN9fykTN7rBOpo
dk/6vTR1Jyv3aYc9EXHmRluktI6KmfWCqmRBOIgQveX1VhdRM+1w2LKC0+8co3dF
v/DBh19KDyfPI8eOvxKykhn164UeAt03EXqDa46wFGr2nVOm/JiShL/d+QuyYU4G
8xb/rET4JrhCG4gFMUZ7
=1Oee
-----END PGP SIGNATURE-----
Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi
Pull v4.4 EFI updates from Matt Fleming:
- Make the EFI System Resource Table (ESRT) driver explicitly
non-modular by ripping out the module_* code since Kconfig doesn't
allow it to be built as a module anyway. (Paul Gortmaker)
- Make the x86 efi=debug kernel parameter, which enables EFI debug
code and output, generic and usable by arm64. (Leif Lindholm)
- Add support to the x86 EFI boot stub for 64-bit Graphics Output
Protocol frame buffer addresses. (Matt Fleming)
- Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
in the firmware and set an efi.flags bit so the kernel knows when
it can apply more strict runtime mapping attributes - Ard Biesheuvel
- Auto-load the efi-pstore module on EFI systems, just like we
currently do for the efivars module. (Ben Hutchings)
- Add "efi_fake_mem" kernel parameter which allows the system's EFI
memory map to be updated with additional attributes for specific
memory ranges. This is useful for testing the kernel code that handles
the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
doesn't include support. (Taku Izumi)
Note: there is a semantic conflict between the following two commits:
8a53554e12 ("x86/efi: Fix multiple GOP device support")
ae2ee627dc ("efifb: Add support for 64-bit frame buffer addresses")
I fixed up the interaction in the merge commit, changing the type of
current_fb_base from u32 to u64.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch renames print_efi_memmap() to efi_print_memmap() and
make it global function so that we can invoke it outside of
arch/x86/platform/efi/efi.c
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
fed6cefe3b ("x86/efi: Add a "debug" option to the efi= cmdline")
adds the DBG flag, but does so for x86 only. Move this early param
parsing to core code.
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced
that signals that the firmware PE/COFF loader supports splitting
code and data sections of PE/COFF images into separate EFI
memory map entries. This allows the kernel to map those regions
with strict memory protections, e.g. EFI_MEMORY_RO for code,
EFI_MEMORY_XP for data, etc.
Unfortunately, an unwritten requirement of this new feature is
that the regions need to be mapped with the same offsets
relative to each other as observed in the EFI memory map. If
this is not done crashes like this may occur,
BUG: unable to handle kernel paging request at fffffffefe6086dd
IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
Call Trace:
[<ffffffff8104c90e>] efi_call+0x7e/0x100
[<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
[<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
[<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
[<ffffffff81f37e1b>] start_kernel+0x38a/0x417
[<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
Here 0xfffffffefe6086dd refers to an address the firmware
expects to be mapped but which the OS never claimed was mapped.
The issue is that included in these regions are relative
addresses to other regions which were emitted by the firmware
toolchain before the "splitting" of sections occurred at
runtime.
Needless to say, we don't satisfy this unwritten requirement on
x86_64 and instead map the EFI memory map entries in reverse
order. The above crash is almost certainly triggerable with any
kernel newer than v3.13 because that's when we rewrote the EFI
runtime region mapping code, in commit d2f7cbe7b2 ("x86/efi:
Runtime services virtual mapping"). For kernel versions before
v3.13 things may work by pure luck depending on the
fragmentation of the kernel virtual address space at the time we
map the EFI regions.
Instead of mapping the EFI memory map entries in reverse order,
where entry N has a higher virtual address than entry N+1, map
them in the same order as they appear in the EFI memory map to
preserve this relative offset between regions.
This patch has been kept as small as possible with the intention
that it should be applied aggressively to stable and
distribution kernels. It is very much a bugfix rather than
support for a new feature, since when EFI_PROPERTIES_TABLE is
enabled we must map things as outlined above to even boot - we
have no way of asking the firmware not to split the code/data
regions.
In fact, this patch doesn't even make use of the more strict
memory protections available in UEFI v2.5. That will come later.
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chun-Yi <jlee@suse.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Lee, Chun-Yi <jlee@suse.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are two kexec load syscalls, kexec_load another and kexec_file_load.
kexec_file_load has been splited as kernel/kexec_file.c. In this patch I
split kexec_load syscall code to kernel/kexec.c.
And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.
The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use
kexec_load syscall can bypass the checking.
Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.
Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig. Also updated general kernel code with to
kexec_load syscall.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
x86 and ia64 implement efi_mem_attributes() differently. This
function needs to be available for other architectures
(such as arm64) as well, such as for the purpose of ACPI/APEI.
ia64 EFI does not set up a 'memmap' variable and does not set
the EFI_MEMMAP flag, so it needs to have its unique implementation
of efi_mem_attributes().
Move efi_mem_attributes() implementation from x86 to the core
EFI code, and declare it with __weak.
It is recommended that other architectures should not override
the default implementation.
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Reviewed-by: Matt Fleming <matt.fleming@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438936621-5215-4-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It's totally legitimate, per the ACPI spec, for the firmware to
set the BGRT 'status' field to zero to indicate that the BGRT
image isn't being displayed, and we shouldn't be printing an
error message in that case because it's just noise for users. So
swap pr_err() for pr_debug().
However, Josh points that out it still makes sense to test the
validity of the upper 7 bits of the 'status' field, since
they're marked as "reserved" in the spec and must be zero. If
firmware violates this it really *is* an error.
Reported-by: Tom Yan <tom.ty89@gmail.com>
Tested-by: Tom Yan <tom.ty89@gmail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438936621-5215-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Even though it is documented how to specifiy efi parameters, it is
possible to cause a kernel panic due to a dereference of a NULL pointer when
parsing such parameters if "efi" alone is given:
PANIC: early exception 0e rip 10:ffffffff812fb361 error 0 cr2 0
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-rc1+ #450
[ 0.000000] ffffffff81fe20a9 ffffffff81e03d50 ffffffff8184bb0f 00000000000003f8
[ 0.000000] 0000000000000000 ffffffff81e03e08 ffffffff81f371a1 64656c62616e6520
[ 0.000000] 0000000000000069 000000000000005f 0000000000000000 0000000000000000
[ 0.000000] Call Trace:
[ 0.000000] [<ffffffff8184bb0f>] dump_stack+0x45/0x57
[ 0.000000] [<ffffffff81f371a1>] early_idt_handler_common+0x81/0xae
[ 0.000000] [<ffffffff812fb361>] ? parse_option_str+0x11/0x90
[ 0.000000] [<ffffffff81f4dd69>] arch_parse_efi_cmdline+0x15/0x42
[ 0.000000] [<ffffffff81f376e1>] do_early_param+0x50/0x8a
[ 0.000000] [<ffffffff8106b1b3>] parse_args+0x1e3/0x400
[ 0.000000] [<ffffffff81f37a43>] parse_early_options+0x24/0x28
[ 0.000000] [<ffffffff81f37691>] ? loglevel+0x31/0x31
[ 0.000000] [<ffffffff81f37a78>] parse_early_param+0x31/0x3d
[ 0.000000] [<ffffffff81f3ae98>] setup_arch+0x2de/0xc08
[ 0.000000] [<ffffffff8109629a>] ? vprintk_default+0x1a/0x20
[ 0.000000] [<ffffffff81f37b20>] start_kernel+0x90/0x423
[ 0.000000] [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
[ 0.000000] [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
[ 0.000000] RIP 0xffffffff81ba2efc
This panic is not reproducible with "efi=" as this will result in a non-NULL
zero-length string.
Thus, verify that the pointer to the parameter string is not NULL. This is
consistent with other parameter-parsing functions which check for NULL pointers.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
4 drivers / enabling modules:
NFIT:
Instantiates an "nvdimm bus" with the core and registers memory devices
(NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware Interface
table). After registering NVDIMMs the NFIT driver then registers
"region" devices. A libnvdimm-region defines an access mode and the
boundaries of persistent memory media. A region may span multiple
NVDIMMs that are interleaved by the hardware memory controller. In
turn, a libnvdimm-region can be carved into a "namespace" device and
bound to the PMEM or BLK driver which will attach a Linux block device
(disk) interface to the memory.
PMEM:
Initially merged in v4.1 this driver for contiguous spans of persistent
memory address ranges is re-worked to drive PMEM-namespaces emitted by
the libnvdimm-core. In this update the PMEM driver, on x86, gains the
ability to assert that writes to persistent memory have been flushed all
the way through the caches and buffers in the platform to persistent
media. See memcpy_to_pmem() and wmb_pmem().
BLK:
This new driver enables access to persistent memory media through "Block
Data Windows" as defined by the NFIT. The primary difference of this
driver to PMEM is that only a small window of persistent memory is
mapped into system address space at any given point in time. Per-NVDIMM
windows are reprogrammed at run time, per-I/O, to access different
portions of the media. BLK-mode, by definition, does not support DAX.
BTT:
This is a library, optionally consumed by either PMEM or BLK, that
converts a byte-accessible namespace into a disk with atomic sector
update semantics (prevents sector tearing on crash or power loss). The
sinister aspect of sector tearing is that most applications do not know
they have a atomic sector dependency. At least today's disk's rarely
ever tear sectors and if they do one almost certainly gets a CRC error
on access. NVDIMMs will always tear and always silently. Until an
application is audited to be robust in the presence of sector-tearing
the usage of BTT is recommended.
Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
Wysocki, and Bob Moore.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVjZGBAAoJEB7SkWpmfYgC4fkP/j+k6HmSRNU/yRYPyo7CAWvj
3P5P1i6R6nMZZbjQrQArAXaIyLlFk4sEQDYsciR6dmslhhFZAkR2eFwVO5rBOyx3
QN0yxEpyjJbroRFUrV/BLaFK4cq2oyJAFFHs0u7/pLHBJ4MDMqfRKAMtlnBxEkTE
LFcqXapSlvWitSbjMdIBWKFEvncaiJ2mdsFqT4aZqclBBTj00eWQvEG9WxleJLdv
+tj7qR/vGcwOb12X5UrbQXgwtMYos7A6IzhHbqwQL8IrOcJ6YB8NopJUpLDd7ZVq
KAzX6ZYMzNueN4uvv6aDfqDRLyVL7qoxM9XIjGF5R8SV9sF2LMspm1FBpfowo1GT
h2QMr0ky1nHVT32yspBCpE9zW/mubRIDtXxEmZZ53DIc4N6Dy9jFaNVmhoWtTAqG
b9pndFnjUzzieCjX5pCvo2M5U6N0AQwsnq76/CasiWyhSa9DNKOg8MVDRg0rbxb0
UvK0v8JwOCIRcfO3qiKcx+02nKPtjCtHSPqGkFKPySRvAdb+3g6YR26CxTb3VmnF
etowLiKU7HHalLvqGFOlDoQG6viWes9Zl+ZeANBOCVa6rL2O7ZnXJtYgXf1wDQee
fzgKB78BcDjXH4jHobbp/WBANQGN/GF34lse8yHa7Ym+28uEihDvSD1wyNLnefmo
7PJBbN5M5qP5tD0aO7SZ
=VtWG
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm
Pull libnvdimm subsystem from Dan Williams:
"The libnvdimm sub-system introduces, in addition to the
libnvdimm-core, 4 drivers / enabling modules:
NFIT:
Instantiates an "nvdimm bus" with the core and registers memory
devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware
Interface table).
After registering NVDIMMs the NFIT driver then registers "region"
devices. A libnvdimm-region defines an access mode and the
boundaries of persistent memory media. A region may span multiple
NVDIMMs that are interleaved by the hardware memory controller. In
turn, a libnvdimm-region can be carved into a "namespace" device and
bound to the PMEM or BLK driver which will attach a Linux block
device (disk) interface to the memory.
PMEM:
Initially merged in v4.1 this driver for contiguous spans of
persistent memory address ranges is re-worked to drive
PMEM-namespaces emitted by the libnvdimm-core.
In this update the PMEM driver, on x86, gains the ability to assert
that writes to persistent memory have been flushed all the way
through the caches and buffers in the platform to persistent media.
See memcpy_to_pmem() and wmb_pmem().
BLK:
This new driver enables access to persistent memory media through
"Block Data Windows" as defined by the NFIT. The primary difference
of this driver to PMEM is that only a small window of persistent
memory is mapped into system address space at any given point in
time.
Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access
different portions of the media. BLK-mode, by definition, does not
support DAX.
BTT:
This is a library, optionally consumed by either PMEM or BLK, that
converts a byte-accessible namespace into a disk with atomic sector
update semantics (prevents sector tearing on crash or power loss).
The sinister aspect of sector tearing is that most applications do
not know they have a atomic sector dependency. At least today's
disk's rarely ever tear sectors and if they do one almost certainly
gets a CRC error on access. NVDIMMs will always tear and always
silently. Until an application is audited to be robust in the
presence of sector-tearing the usage of BTT is recommended.
Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
Wysocki, and Bob Moore"
* tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits)
arch, x86: pmem api for ensuring durability of persistent memory updates
libnvdimm: Add sysfs numa_node to NVDIMM devices
libnvdimm: Set numa_node to NVDIMM devices
acpi: Add acpi_map_pxm_to_online_node()
libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only
pmem: flag pmem block devices as non-rotational
libnvdimm: enable iostat
pmem: make_request cleanups
libnvdimm, pmem: fix up max_hw_sectors
libnvdimm, blk: add support for blk integrity
libnvdimm, btt: add support for blk integrity
fs/block_dev.c: skip rw_page if bdev has integrity
libnvdimm: Non-Volatile Devices
tools/testing/nvdimm: libnvdimm unit test infrastructure
libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory
nd_btt: atomic sector updates
libnvdimm: infrastructure for btt devices
libnvdimm: write blk label set
libnvdimm: write pmem label set
libnvdimm: blk labels and namespace instantiation
...
UEFI GetMemoryMap() uses a new attribute bit to mark mirrored memory
address ranges. See UEFI 2.5 spec pages 157-158:
http://www.uefi.org/sites/default/files/resources/UEFI%202_5.pdf
On EFI enabled systems scan the memory map and tell memblock about any
mirrored ranges.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Xiexiuqi <xiexiuqi@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ACPI 6.0 formalizes e820-type-7 and efi-type-14 as persistent memory.
Mark it "reserved" and allow it to be claimed by a persistent memory
device driver.
This definition is in addition to the Linux kernel's existing type-12
definition that was recently added in support of shipping platforms with
NVDIMM support that predate ACPI 6.0 (which now classifies type-12 as
OEM reserved).
Note, /proc/iomem can be consulted for differentiating legacy
"Persistent Memory (legacy)" E820_PRAM vs standard "Persistent Memory"
E820_PMEM.
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add sysfs files for the EFI System Resource Table (ESRT) under
/sys/firmware/efi/esrt and for each EFI System Resource Entry under
entries/ as a subdir.
The EFI System Resource Table (ESRT) provides a read-only catalog of
system components for which the system accepts firmware upgrades via
UEFI's "Capsule Update" feature. This module allows userland utilities
to evaluate what firmware updates can be applied to this system, and
potentially arrange for those updates to occur.
The ESRT is described as part of the UEFI specification, in version 2.5
which should be available from http://uefi.org/specifications in early
2015. If you're a member of the UEFI Forum, information about its
addition to the standard is available as UEFI Mantis 1090.
For some hardware platforms, additional restrictions may be found at
http://msdn.microsoft.com/en-us/library/windows/hardware/jj128256.aspx ,
and additional documentation may be found at
http://download.microsoft.com/download/5/F/5/5F5D16CD-2530-4289-8019-94C6A20BED3C/windows-uefi-firmware-update-platform.docx
.
Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Pull x86 mm changes from Ingo Molnar:
"The main changes in this cycle were:
- reduce the x86/32 PAE per task PGD allocation overhead from 4K to
0.032k (Fenghua Yu)
- early_ioremap/memunmap() usage cleanups (Juergen Gross)
- gbpages support cleanups (Luis R Rodriguez)
- improve AMD Bulldozer (family 0x15) ASLR I$ aliasing workaround to
increase randomization by 3 bits (per bootup) (Hector
Marco-Gisbert)
- misc fixlets"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Improve AMD Bulldozer ASLR workaround
x86/mm/pat: Initialize __cachemode2pte_tbl[] and __pte2cachemode_tbl[] in a bit more readable fashion
init.h: Clean up the __setup()/early_param() macros
x86/mm: Simplify probe_page_size_mask()
x86/mm: Further simplify 1 GB kernel linear mappings handling
x86/mm: Use early_param_on_off() for direct_gbpages
init.h: Add early_param_on_off()
x86/mm: Simplify enabling direct_gbpages
x86/mm: Use IS_ENABLED() for direct_gbpages
x86/mm: Unexport set_memory_ro() and set_memory_rw()
x86/mm, efi: Use early_ioremap() in arch/x86/platform/efi/efi-bgrt.c
x86/mm: Use early_memunmap() instead of early_iounmap()
x86/mm/pat: Ensure different messages in STRICT_DEVMEM and PAT cases
x86/mm: Reduce PAE-mode per task pgd allocation overhead from 4K to 32 bytes
Currently x86-64 efi_call_phys_prolog() saves into a global variable (save_pgd),
and efi_call_phys_epilog() restores the kernel pagetables from that global
variable.
Change this to a cleaner save/restore pattern where the saving function returns
the saved object and the restore function restores that.
Apply the same concept to the 32-bit code as well.
Plus this approach, as an added bonus, allows us to express the
!efi_enabled(EFI_OLD_MEMMAP) situation in a clean fashion as well,
via a 'NULL' return value.
Cc: Tapasweni Pathak <tapaswenipathak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Tapasweni Pathak reported that we do a kmalloc() in efi_call_phys_prolog()
on x86-64 while having interrupts disabled, which is a big no-no, as
kmalloc() can sleep.
Solve this by removing the irq disabling from the prolog/epilog calls
around EFI calls: it's unnecessary, as in this stage we are single
threaded in the boot thread, and we don't ever execute this from
interrupt contexts.
Reported-by: Tapasweni Pathak <tapaswenipathak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
... and hide the memory regions dump behind it. Make it default-off.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20141209095843.GA3990@pd.tnic
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Andy pointed out that if an NMI or MCE is received while we're in the
middle of an EFI mixed mode call a triple fault will occur. This can
happen, for example, when issuing an EFI mixed mode call while running
perf.
The reason for the triple fault is that we execute the mixed mode call
in 32-bit mode with paging disabled but with 64-bit kernel IDT handlers
installed throughout the call.
At Andy's suggestion, stop playing the games we currently do at runtime,
such as disabling paging and installing a 32-bit GDT for __KERNEL_CS. We
can simply switch to the __KERNEL32_CS descriptor before invoking
firmware services, and run in compatibility mode. This way, if an
NMI/MCE does occur the kernel IDT handler will execute correctly, since
it'll jump to __KERNEL_CS automatically.
However, this change is only possible post-ExitBootServices(). Before
then the firmware "owns" the machine and expects for its 32-bit IDT
handlers to be left intact to service interrupts, etc.
So, we now need to distinguish between early boot and runtime
invocations of EFI services. During early boot, we need to restore the
GDT that the firmware expects to be present. We can only jump to the
__KERNEL32_CS code segment for mixed mode calls after ExitBootServices()
has been invoked.
A liberal sprinkling of comments in the thunking code should make the
differences in early and late environments more apparent.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
In commit 3891a04aaf ("x86-64, espfix: Don't leak bits 31:16 of %esp
returning..") the "ESPFix Area" was added to the page table dump special
sections. That area, though, has a limited amount of entries printed.
The EFI runtime services are, unfortunately, located in-between the
espfix area and the high kernel memory mapping. Due to the enforced
limitation for the espfix area, the EFI mappings won't be printed in the
page table dump.
To make the ESP runtime service mappings visible again, provide them a
dedicated entry.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
The 32 bit and 64 bit implementations differ in their __init annotations
for some functions referenced from the common EFI code. Namely, the 32
bit variant is missing some of the __init annotations the 64 bit variant
has.
To solve the colliding annotations, mark the corresponding functions in
efi_32.c as initialization code, too -- as it is such.
Actually, quite a few more functions are only used during initialization
and therefore can be marked __init. They are therefore annotated, too.
Also add the __init annotation to the prototypes in the efi.h header so
users of those functions will see it's meant as initialization code
only.
This patch also fixes the "prelog" typo. ("prologue" / "epilogue" might
be more appropriate but this is C code after all, not an opera! :D)
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Commit 3f4a7836e3 ("x86/efi: Rip out phys_efi_get_time()") left
set_virtual_address_map as the only runtime service needed with a
phys mapping but missed to update the preceding comment. Fix that.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This variable was accidentally exported, even though it's only used in
this compilation unit and only during initialization.
Remove the bogus export, make the variable static instead and mark it
as __initdata.
Fixes: 200001eb14 ("x86 boot: only pick up additional EFI memmap...")
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
If enter virtual mode failed due to some reason other than the efi call
the EFI_RUNTIME_SERVICES bit in efi.flags should be cleared thus users
of efi runtime services can check the bit and handle the case instead of
assume efi runtime is ok.
Per Matt, if efi call SetVirtualAddressMap fails we will be not sure
it's safe to make any assumptions about the state of the system. So
kernel panics instead of clears EFI_RUNTIME_SERVICES bit.
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
noefi kernel param means actually disabling efi runtime, Per suggestion
from Leif Lindholm efi=noruntime should be better. But since noefi is
already used in X86 thus just adding another param efi=noruntime for
same purpose.
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
There should be a generic function to parse params like a=b,c
Adding parse_option_str in lib/cmdline.c which will return true
if there's specified option set in the params.
Also updated efi=old_map parsing code to use the new function
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
noefi param can be used for arches other than X86 later, thus move it
out of x86 platform code.
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Gracefully handle failures to allocate memory for the image, which might
be arbitrarily large.
efi_bgrt_init can fail in various ways as well, usually because the
BIOS-provided BGRT structure does not match expectations. Add
appropriate error messages rather than failing silently.
Reported-by: Srihari Vijayaraghavan <linux.bug.reporting@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81321
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
We need a way to customize the behaviour of the EFI boot stub, in
particular, we need a way to disable the "chunking" workaround, used
when reading files from the EFI System Partition.
One of my machines doesn't cope well when reading files in 1MB chunks to
a buffer above the 4GB mark - it appears that the "chunking" bug
workaround triggers another firmware bug. This was only discovered with
commit 4bf7111f50 ("x86/efi: Support initrd loaded above 4G"), and
that commit is perfectly valid. The symptom I observed was a corrupt
initrd rather than any kind of crash.
efi= is now used to specify EFI parameters in two very different
execution environments, the EFI boot stub and during kernel boot.
There is also a slight performance optimization by enabling efi=nochunk,
but that's offset by the fact that you're more likely to run into
firmware issues, at least on x86. This is the rationale behind leaving
the workaround enabled by default.
Also provide some documentation for EFI_READ_CHUNK_SIZE and why we're
using the current value of 1MB.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Roy Franz <roy.franz@linaro.org>
Cc: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
efi_set_rtc_mmss() is never used to set RTC due to bugs found
on many EFI platforms. It is set directly by mach_set_rtc_mmss().
Hence, remove unused efi_set_rtc_mmss() function.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Remove redundant set_bit(EFI_MEMMAP, &efi.flags) call.
It is executed earlier in efi_memmap_init().
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Remove redundant set_bit(EFI_SYSTEM_TABLES, &efi.flags) call.
It is executed earlier in efi_systab_init().
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Introduce EFI_PARAVIRT flag. If it is set then kernel runs
on EFI platform but it has not direct control on EFI stuff
like EFI runtime, tables, structures, etc. If not this means
that Linux Kernel has direct access to EFI infrastructure
and everything runs as usual.
This functionality is used in Xen dom0 because hypervisor
has full control on EFI stuff and all calls from dom0 to
EFI must be requested via special hypercall which in turn
executes relevant EFI code in behalf of dom0.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Do not access EFI memory map if it is not available. At least
Xen dom0 EFI implementation does not have an access to it.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Use early_mem*() instead of early_io*() because all mapped EFI regions
are memory (usually RAM but they could also be ROM, EPROM, EEPROM, flash,
etc.) not I/O regions. Additionally, I/O family calls do not work correctly
under Xen in our case. early_ioremap() skips the PFN to MFN conversion
when building the PTE. Using it for memory will attempt to map the wrong
machine frame. However, all artificial EFI structures created under Xen
live in dom0 memory and should be mapped/unmapped using early_mem*() family
calls which map domain memory.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
It appears that the BayTrail-T class of hardware requires EFI in order
to powerdown and reboot and no other reliable method exists.
This quirk is generally applicable to all hardware that has the ACPI
Hardware Reduced bit set, since usually ACPI would be the preferred
method.
Cc: Len Brown <len.brown@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
In order for other archs (such as arm64) to be able to reuse the virtual
mode function call wrappers, move them to drivers/firmware/efi/runtime-wrappers.c.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
The kbuild reports the following sparse errors,
>> arch/x86/platform/efi/quirks.c:242:23: sparse: incorrect type in >> argument 1 (different address spaces)
arch/x86/platform/efi/quirks.c:242:23: expected void [noderef] <asn:2>*addr
arch/x86/platform/efi/quirks.c:242:23: got void *[assigned] tablep
>> arch/x86/platform/efi/quirks.c:245:23: sparse: incorrect type in >> argument 1 (different address spaces)
arch/x86/platform/efi/quirks.c:245:23: expected void [noderef] <asn:2>*addr
arch/x86/platform/efi/quirks.c:245:23: got struct efi_setup_data *[assigned] data
Dave Young had made previous attempts to convert the early_iounmap()
calls to early_memunmap() but ran into merge conflicts with commit
9e5c33d7ae ("mm: create generic early_ioremap() support").
Now that we've got that commit in place we can switch to using
early_memunmap() since we're already using early_memremap() in
efi_reuse_config().
Cc: Dave Young <dyoung@redhat.com>
Cc: Saurabh Tangri <saurabh.tangri@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Currently, it's difficult to find all the workarounds that are
applied when running on EFI, because they're littered throughout
various code paths. This change moves all of them into a separate
file with the hope that it will be come the single location for all
our well documented quirks.
Signed-off-by: Saurabh Tangri <saurabh.tangri@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Now that 3.15 is released, this merges the 'next' branch into 'master',
bringing us to the normal situation where my 'master' branch is the
merge window.
* accumulated work in next: (6809 commits)
ufs: sb mutex merge + mutex_destroy
powerpc: update comments for generic idle conversion
cris: update comments for generic idle conversion
idle: remove cpu_idle() forward declarations
nbd: zero from and len fields in NBD_CMD_DISCONNECT.
mm: convert some level-less printks to pr_*
MAINTAINERS: adi-buildroot-devel is moderated
MAINTAINERS: add linux-api for review of API/ABI changes
mm/kmemleak-test.c: use pr_fmt for logging
fs/dlm/debug_fs.c: replace seq_printf by seq_puts
fs/dlm/lockspace.c: convert simple_str to kstr
fs/dlm/config.c: convert simple_str to kstr
mm: mark remap_file_pages() syscall as deprecated
mm: memcontrol: remove unnecessary memcg argument from soft limit functions
mm: memcontrol: clean up memcg zoneinfo lookup
mm/memblock.c: call kmemleak directly from memblock_(alloc|free)
mm/mempool.c: update the kmemleak stack trace for mempool allocations
lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations
mm: introduce kmemleak_update_trace()
mm/kmemleak.c: use %u to print ->checksum
...
Pull x86 EFI updates from Peter Anvin:
"A collection of EFI changes. The perhaps most important one is to
fully save and restore the FPU state around each invocation of EFI
runtime, and to not choke on non-ASCII characters in the boot stub"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efivars: Add compatibility code for compat tasks
efivars: Refactor sanity checking code into separate function
efivars: Stop passing a struct argument to efivar_validate()
efivars: Check size of user object
efivars: Use local variables instead of a pointer dereference
x86/efi: Save and restore FPU context around efi_calls (i386)
x86/efi: Save and restore FPU context around efi_calls (x86_64)
x86/efi: Implement a __efi_call_virt macro
x86, fpu: Extend the use of static_cpu_has_safe
x86/efi: Delete most of the efi_call* macros
efi: x86: Handle arbitrary Unicode characters
efi: Add get_dram_base() helper function
efi: Add shared printk wrapper for consistent prefixing
efi: create memory map iteration helper
efi: efi-stub-helper cleanup
For ioremapped efi memory aka old_map the virt addresses are not persistant
across kexec reboot. kexec-tools will read the runtime maps from sysfs then
pass them to 2nd kernel and assuming kexec efi boot is ok. This will cause
kexec boot failure.
To address this issue do not export runtime maps in case efi old_map so
userspace can use no efi boot instead.
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
earlyprintk=efi,keep will cause kernel hangs while freeing initmem like
below:
VFS: Mounted root (ext4 filesystem) readonly on device 254:2.
devtmpfs: mounted
Freeing unused kernel memory: 880K (ffffffff817d4000 - ffffffff818b0000)
It is caused by efi earlyprintk use __init function which will be freed
later. Such as early_efi_write is marked as __init, also it will use
early_ioremap which is init function as well.
To fix this issue, I added early initcall early_efi_map_fb which maps
the whole efi fb for later use. OTOH, adding a wrapper function
early_efi_map which calls early_ioremap before ioremap is available.
With this patch applied efi boot ok with earlyprintk=efi,keep console=efi
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
earlyprintk=efi,keep will cause kernel hangs while freeing initmem like
below:
VFS: Mounted root (ext4 filesystem) readonly on device 254:2.
devtmpfs: mounted
Freeing unused kernel memory: 880K (ffffffff817d4000 - ffffffff818b0000)
It is caused by efi earlyprintk use __init function which will be freed
later. Such as early_efi_write is marked as __init, also it will use
early_ioremap which is init function as well.
To fix this issue, I added early initcall early_efi_map_fb which maps
the whole efi fb for later use. OTOH, adding a wrapper function
early_efi_map which calls early_ioremap before ioremap is available.
With this patch applied efi boot ok with earlyprintk=efi,keep console=efi
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
For i386, all the EFI system runtime services functions return efi_status_t
except efi_reset_system_system. Therefore, not all functions can be covered
by the same macro in case the macro needs to do more than calling the function
(i.e., return a value). The purpose of the __efi_call_virt macro is to be used
when no return value is expected.
For x86_64, this macro would not be needed as all the runtime services return
u64. However, the same code is used for both x86_64 and i386. Thus, the macro
__efi_call_virt is also defined to not break compilation.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
We really only need one phys and one virt function call, and then only
one assembly function to make firmware calls.
Since we are not using the C type system anyway, we're not really losing
much by deleting the macros apart from no longer having a check that
we are passing the correct number of parameters. The lack of duplicated
code seems like a worthwhile trade-off.
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
In the thunk patches the 'attr' argument was dropped to
query_variable_info(). Restore it otherwise the firmware will return
EFI_INVALID_PARAMETER.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Dan reported that phys_efi_get_time() is doing kmalloc(..., GFP_KERNEL)
under a spinlock which is very clearly a bug. Since phys_efi_get_time()
has no users let's just delete it instead of trying to fix it.
Note that since there are no users of phys_efi_get_time(), it is not
possible to actually trigger a GFP_KERNEL alloc under the spinlock.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
I was triggering a #GP(0) from userland when running with
CONFIG_EFI_MIXED and CONFIG_IA32_EMULATION, from what looked like
register corruption. Turns out that the mixed mode code was trashing the
contents of %ds, %es and %ss in __efi64_thunk().
Save and restore the contents of these segment registers across the call
to __efi64_thunk() so that we don't corrupt the CPU context.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Alex reported hitting the following BUG after the EFI 1:1 virtual
mapping work was merged,
kernel BUG at arch/x86/mm/init_64.c:351!
invalid opcode: 0000 [#1] SMP
Call Trace:
[<ffffffff818aa71d>] init_extra_mapping_uc+0x13/0x15
[<ffffffff818a5e20>] uv_system_init+0x22b/0x124b
[<ffffffff8108b886>] ? clockevents_register_device+0x138/0x13d
[<ffffffff81028dbb>] ? setup_APIC_timer+0xc5/0xc7
[<ffffffff8108b620>] ? clockevent_delta2ns+0xb/0xd
[<ffffffff818a3a92>] ? setup_boot_APIC_clock+0x4a8/0x4b7
[<ffffffff8153d955>] ? printk+0x72/0x74
[<ffffffff818a1757>] native_smp_prepare_cpus+0x389/0x3d6
[<ffffffff818957bc>] kernel_init_freeable+0xb7/0x1fb
[<ffffffff81535530>] ? rest_init+0x74/0x74
[<ffffffff81535539>] kernel_init+0x9/0xff
[<ffffffff81541dfc>] ret_from_fork+0x7c/0xb0
[<ffffffff81535530>] ? rest_init+0x74/0x74
Getting this thing to work with the new mapping scheme would need more
work, so automatically switch to the old memmap layout for SGI UV.
Acked-by: Russ Anderson <rja@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Some firmware appears to enable interrupts during boot service calls,
even if we've explicitly disabled them prior to the call. This is
actually allowed per the UEFI spec because boottime services expect to
be called with interrupts enabled.
So that's fine, we just need to ensure that we disable them again in
efi_enter32() before switching to a 64-bit GDT, otherwise an interrupt
may fire causing a 32-bit IRQ handler to run after we've left
compatibility mode.
Despite efi_enter32() being called both for boottime and runtime
services, this really only affects boottime because the runtime services
callchain is executed with interrupts disabled. See efi_thunk().
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Add the Kconfig option and bump the kernel header version so that boot
loaders can check whether the handover code is available if they want.
The xloadflags field in the bzImage header is also updated to reflect
that the kernel supports both entry points by setting both of
XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
guaranteed to be addressable with 32-bits.
Note that no boot loaders should be using the bits set in xloadflags to
decide which entry point to jump to. The entire scheme is based on the
concept that 32-bit bootloaders always jump to ->handover_offset and
64-bit loaders always jump to ->handover_offset + 512. We set both bits
merely to inform the boot loader that it's safe to use the native
handover offset even if the machine type in the PE/COFF header claims
otherwise.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Setup the runtime services based on whether we're booting in EFI native
mode or not. For non-native mode we need to thunk from 64-bit into
32-bit mode before invoking the EFI runtime services.
Using the runtime services after SetVirtualAddressMap() is slightly more
complicated because we need to ensure that all the addresses we pass to
the firmware are below the 4GB boundary so that they can be addressed
with 32-bit pointers, see efi_setup_page_tables().
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Implement the transition code to go from IA32e mode to protected mode in
the EFI boot stub. This is required to use 32-bit EFI services from a
64-bit kernel.
Since EFI boot stub is executed in an identity-mapped region, there's
not much we need to do before invoking the 32-bit EFI boot services.
However, we do reload the firmware's global descriptor table
(efi32_boot_gdt) in case things like timer events are still running in
the firmware.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Both efi_free_boot_services() and efi_enter_virtual_mode() are invoked
from init/main.c, but only if the EFI runtime services are available.
This is not the case for non-native boots, e.g. where a 64-bit kernel is
booted with 32-bit EFI firmware.
Delete the dead code.
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
... into a kexec flavor for better code readability and simplicity. The
original one was getting ugly with ifdeffery.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Currently, running SetVirtualAddressMap() and passing the physical
address of the virtual map array was working only by a lucky coincidence
because the memory was present in the EFI page table too. Until Toshi
went and booted this on a big HP box - the krealloc() manner of resizing
the memmap we're doing did allocate from such physical addresses which
were not mapped anymore and boom:
http://lkml.kernel.org/r/1386806463.1791.295.camel@misato.fc.hp.com
One way to take care of that issue is to reimplement the krealloc thing
but with pages. We start with contiguous pages of order 1, i.e. 2 pages,
and when we deplete that memory (shouldn't happen all that often but you
know firmware) we realloc the next power-of-two pages.
Having the pages, it is much more handy and easy to map them into the
EFI page table with the already existing mapping code which we're using
for building the virtual mappings.
Thanks to Toshi Kani and Matt for the great debugging help.
Reported-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This is very useful for debugging issues with the recently added
pagetable switching code for EFI virtual mode.
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Coalesce formats and remove spaces before tabs.
Move __initdata after the variable declaration.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
For now we only ensure about 5kb free space for avoiding our board
refusing boot. But the comment lies that we retain 50% space.
Signed-off-by: Madper Xie <cxie@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
It makes more sense to set the feature flag in the success path of the
detection function than it does to rely on the caller doing it. Apart
from it being more logical to group the code and data together, it sets
a much better example for new EFI architectures.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
As we grow support for more EFI architectures they're going to want the
ability to query which EFI features are available on the running system.
Instead of storing this information in an architecture-specific place,
stick it in the global 'struct efi', which is already the central
location for EFI state.
While we're at it, let's change the return value of efi_enabled() to be
bool and replace all references to 'facility' with 'feature', which is
the usual word used to describe the attributes of the running system.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Madper reported seeing the following crash,
BUG: unable to handle kernel paging request at ffffffffff340003
IP: [<ffffffff81d85ba4>] efi_bgrt_init+0x9d/0x133
Call Trace:
[<ffffffff81d8525d>] efi_late_init+0x9/0xb
[<ffffffff81d68f59>] start_kernel+0x436/0x450
[<ffffffff81d6892c>] ? repair_env_string+0x5c/0x5c
[<ffffffff81d68120>] ? early_idt_handlers+0x120/0x120
[<ffffffff81d685de>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff81d6871e>] x86_64_start_kernel+0x13e/0x14d
This is caused because the layout of the ACPI BGRT header on this system
doesn't match the definition from the ACPI spec, and so we get a bogus
physical address when dereferencing ->image_address in efi_bgrt_init().
Luckily the status field in the BGRT header clearly marks it as invalid,
so we can check that field and skip BGRT initialisation.
Reported-by: Madper Xie <cxie@redhat.com>
Suggested-by: Toshi Kani <toshi.kani@hp.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
We do not enable the new efi memmap on 32-bit and thus we need to run
runtime_code_page_mkexec() unconditionally there. Fix that.
Reported-and-tested-by: Lejun Zhu <lejun.zhu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
CONFIG_X86_32 doesn't map the boot services regions into the EFI memory
map (see commit 700870119f ("x86, efi: Don't map Boot Services on
i386")), and so efi_lookup_mapped_addr() will fail to return a valid
address. Executing the ioremap() path in efi_bgrt_init() causes the
following warning on x86-32 because we're trying to ioremap() RAM,
WARNING: CPU: 0 PID: 0 at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x2ad/0x2c0()
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-0.rc5.git0.1.2.fc21.i686 #1
Hardware name: DellInc. Venue 8 Pro 5830/09RP78, BIOS A02 10/17/2013
00000000 00000000 c0c0df08 c09a5196 00000000 c0c0df38 c0448c1e c0b41310
00000000 00000000 c0b37bc1 00000066 c043bbfd c043bbfd 00e7dfe0 00073eff
00073eff c0c0df48 c0448ce2 00000009 00000000 c0c0df9c c043bbfd 00078d88
Call Trace:
[<c09a5196>] dump_stack+0x41/0x52
[<c0448c1e>] warn_slowpath_common+0x7e/0xa0
[<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
[<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
[<c0448ce2>] warn_slowpath_null+0x22/0x30
[<c043bbfd>] __ioremap_caller+0x2ad/0x2c0
[<c0718f92>] ? acpi_tb_verify_table+0x1c/0x43
[<c0719c78>] ? acpi_get_table_with_size+0x63/0xb5
[<c087cd5e>] ? efi_lookup_mapped_addr+0xe/0xf0
[<c043bc2b>] ioremap_nocache+0x1b/0x20
[<c0cb01c8>] ? efi_bgrt_init+0x83/0x10c
[<c0cb01c8>] efi_bgrt_init+0x83/0x10c
[<c0cafd82>] efi_late_init+0x8/0xa
[<c0c9bab2>] start_kernel+0x3ae/0x3c3
[<c0c9b53b>] ? repair_env_string+0x51/0x51
[<c0c9b378>] i386_start_kernel+0x12e/0x131
Switch to using early_memremap(), which won't trigger this warning, and
has the added benefit of more accurately conveying what we're trying to
do - map a chunk of memory.
This patch addresses the following bug report,
https://bugzilla.kernel.org/show_bug.cgi?id=67911
Reported-by: Adam Williamson <awilliam@redhat.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
There's no need to save the runtime map details in global variables, the
values are only required to pass to efi_runtime_map_setup().
And because 'nr_efi_runtime_map' isn't needed, get_nr_runtime_map() can
be deleted along with 'efi_data_len'.
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Add a new setup_data type SETUP_EFI for kexec use. Passing the saved
fw_vendor, runtime, config tables and EFI runtime mappings.
When entering virtual mode, directly mapping the EFI runtime regions
which we passed in previously. And skip the step to call
SetVirtualAddressMap().
Specially for HP z420 workstation we need save the smbios physical
address. The kernel boot sequence proceeds in the following order.
Step 2 requires efi.smbios to be the physical address. However, I found
that on HP z420 EFI system table has a virtual address of SMBIOS in step
1. Hence, we need set it back to the physical address with the smbios
in efi_setup_data. (When it is still the physical address, it simply
sets the same value.)
1. efi_init() - Set efi.smbios from EFI system table
2. dmi_scan_machine() - Temporary map efi.smbios to access SMBIOS table
3. efi_enter_virtual_mode() - Map EFI ranges
Tested on ovmf+qemu, lenovo thinkpad, a dell laptop and an
HP z420 workstation.
Signed-off-by: Dave Young <dyoung@redhat.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
kexec kernel will need exactly same mapping for EFI runtime memory
ranges. Thus here export the runtime ranges mapping to sysfs,
kexec-tools will assemble them and pass to 2nd kernel via setup_data.
Introducing a new directory /sys/firmware/efi/runtime-map just like
/sys/firmware/memmap. Containing below attribute in each file of that
directory:
attribute num_pages phys_addr type virt_addr
Signed-off-by: Dave Young <dyoung@redhat.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Export fw_vendor, runtime and config table physical addresses to
/sys/firmware/efi/{fw_vendor,runtime,config_table} because kexec kernels
need them.
From EFI spec these 3 variables will be updated to virtual address after
entering virtual mode. But kernel startup code will need the physical
address.
Signed-off-by: Dave Young <dyoung@redhat.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Add two small functions:
efi_merge_regions() and efi_map_regions(), efi_enter_virtual_mode()
calls them instead of embedding two long for loop.
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Current code check boot service region with kernel text region by:
start+size >= __pa_symbol(_text)
The end of the above region should be start + size - 1 instead.
I see this problem in ovmf + Fedora 19 grub boot:
text start: 1000000 md start: 800000 md size: 800000
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Kexec kernel will use saved runtime virtual mapping, so add a new
function efi_map_region_fixed() for directly mapping a md to md->virt.
The md is passed in from 1st kernel, the virtual addr is saved in
md->virt_addr.
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
variables size and end is useless in this function, thus remove them.
Reported-by: Toshi Kani <toshi.kani@hp.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
UEFI time services are often broken once we're in virtual mode. We were
already refusing to use them on 64-bit systems, but it turns out that
they're also broken on some 32-bit firmware, including the Dell Venue.
Disable them for now, we can revisit once we have the 1:1 mappings code
incorporated.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Link: http://lkml.kernel.org/r/1385754283-2464-1-git-send-email-matthew.garrett@nebula.com
Cc: <stable@vger.kernel.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Dave reported seeing the following incorrect output on his Thinkpad T420
when using earlyprintk=efi,
[ 0.000000] efi: EFI v2.00 by Lenovo
ACPI=0xdabfe000 ACPI 2.0=0xdabfe014 SMBIOS=0xdaa9e000
The output should be on one line, not split over two. The cause is an
off-by-one error when checking that the efi_y coordinate hasn't been
incremented out of bounds.
Reported-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
groundwork for kexec support on EFI - Borislav Petkov
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJSfMDjAAoJEC84WcCNIz1V0LIP/2WyTbJR6bL0HXwnGLpxmxag
v0VgnKRhypNboA3WEu4a9as6AdExqB7qsiWIipHuDSMj/vkfZgHAKTd2f1iRxmsJ
RZxzwV6YzvsWkdXjvpCoLWKSsvQDug++BAIti1PitW6RXRjo01t3ymo/Ho1CQrpI
hNJbB3bbihMF+uqFvdSpO0KZtZE6EtnylrfBeuo0GzqqJdTGe1MmqlWmyUlEy5JW
ZiHV8E/xTjh3N675tWPcT9hGVfCyOXXu/kPrXsJTXrdYyZL9qgA9b8SVRLs6DctX
wVgL9lNv4wobsmZJ5DxkYl9+TaF7rbshUeIJbzrQyMVJjb3TpXk/ZpspDMAEjL7e
bb76c1bAx4xZuUatR/f1ykkWKAEryAhXHkvwcbIBjebW33if1MgGJLk5udJQQv6H
j+J9ROH38MDr0Geg+pM2RnCyTz8l+q+8Mfu4Yh9TSte+ttB6fr9phs3/G+fdSUn1
0vI627v1HWzDcBh4eZjjslzJviR8PldsGVT3EsIaOnHGtk/9FPz/7n4efph4v7+9
yqTkLvQHxsAx7f0tR/qRpkEIQ9WTMXO0IO79OC13QTSATJSl+WSPTJM7ccqOgn+P
h89ssBnzlwmFHkuvTi599KVHdzOZrWTsB2zROn+NnSchJ+YAY6TXwznk3/MNo9rB
d+euVcffL4yfWZ7Bzj8Z
=w6ff
-----END PGP SIGNATURE-----
Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi
Pull EFI virtual mapping changes from Matt Fleming:
* New static EFI runtime services virtual mapping layout which is
groundwork for kexec support on EFI. (Borislav Petkov)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 EFI changes from Ingo Molnar:
"Main changes:
- Add support for earlyprintk=efi which uses the EFI framebuffer.
Very useful for debugging boot problems.
- EFI stub support for large memory maps (more than 128 entries)
- EFI ARM support - this was mostly done by generalizing x86 <-> ARM
platform differences, such as by moving x86 EFI code into
drivers/firmware/efi/ and sharing it with ARM.
- Documentation updates
- misc fixes"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
x86/efi: Add EFI framebuffer earlyprintk support
boot, efi: Remove redundant memset()
x86/efi: Fix config_table_type array termination
x86 efi: bugfix interrupt disabling sequence
x86: EFI stub support for large memory maps
efi: resolve warnings found on ARM compile
efi: Fix types in EFI calls to match EFI function definitions.
efi: Renames in handle_cmdline_files() to complete generalization.
efi: Generalize handle_ramdisks() and rename to handle_cmdline_files().
efi: Allow efi_free() to be called with size of 0
efi: use efi_get_memory_map() to get final map for x86
efi: generalize efi_get_memory_map()
efi: Rename __get_map() to efi_get_memory_map()
efi: Move unicode to ASCII conversion to shared function.
efi: Generalize relocate_kernel() for use by other architectures.
efi: Move relocate_kernel() to shared file.
efi: Enforce minimum alignment of 1 page on allocations.
efi: Rename memory allocation/free functions
efi: Add system table pointer argument to shared functions.
efi: Move common EFI stub code from x86 arch code to common location
...
Check it just in case. We might just as well panic there because runtime
won't be functioning anyway.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
We map the EFI regions needed for runtime services non-contiguously,
with preserved alignment on virtual addresses starting from -4G down
for a total max space of 64G. This way, we provide for stable runtime
services addresses across kernels so that a kexec'd kernel can still use
them.
Thus, they're mapped in a separate pagetable so that we don't pollute
the kernel namespace.
Add an efi= kernel command line parameter for passing miscellaneous
options and chicken bits from the command line.
While at it, add a chicken bit called "efi=old_map" which can be used as
a fallback to the old runtime services mapping method in case there's
some b0rkage with a particular EFI implementation (haha, it is hard to
hold up the sarcasm here...).
Also, add the UEFI RT VA space to Documentation/x86/x86_64/mm.txt.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
It's incredibly difficult to diagnose early EFI boot issues without
special hardware because earlyprintk=vga doesn't work on EFI systems.
Add support for writing to the EFI framebuffer, via earlyprintk=efi,
which will actually give users a chance of providing debug output.
Cc: H. Peter Anvin <hpa@zytor.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Incorrect use of 0 in terminating entry of arch_tables[] causes the
following sparse warning,
arch/x86/platform/efi/efi.c:74:27: sparse: Using plain integer as NULL pointer
Replace with NULL.
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
[ Included sparse warning in commit message. ]
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Add patch to fix 32bit EFI service mapping (rhbz 726701)
Multiple people are reporting hitting the following WARNING on i386,
WARNING: at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x3d3/0x440()
Modules linked in:
Pid: 0, comm: swapper Not tainted 3.9.0-rc7+ #95
Call Trace:
[<c102b6af>] warn_slowpath_common+0x5f/0x80
[<c1023fb3>] ? __ioremap_caller+0x3d3/0x440
[<c1023fb3>] ? __ioremap_caller+0x3d3/0x440
[<c102b6ed>] warn_slowpath_null+0x1d/0x20
[<c1023fb3>] __ioremap_caller+0x3d3/0x440
[<c106007b>] ? get_usage_chars+0xfb/0x110
[<c102d937>] ? vprintk_emit+0x147/0x480
[<c1418593>] ? efi_enter_virtual_mode+0x1e4/0x3de
[<c102406a>] ioremap_cache+0x1a/0x20
[<c1418593>] ? efi_enter_virtual_mode+0x1e4/0x3de
[<c1418593>] efi_enter_virtual_mode+0x1e4/0x3de
[<c1407984>] start_kernel+0x286/0x2f4
[<c1407535>] ? repair_env_string+0x51/0x51
[<c1407362>] i386_start_kernel+0x12c/0x12f
Due to the workaround described in commit 916f676f8 ("x86, efi: Retain
boot service code until after switching to virtual mode") EFI Boot
Service regions are mapped for a period during boot. Unfortunately, with
the limited size of the i386 direct kernel map it's possible that some
of the Boot Service regions will not be directly accessible, which
causes them to be ioremap()'d, triggering the above warning as the
regions are marked as E820_RAM in the e820 memmap.
There are currently only two situations where we need to map EFI Boot
Service regions,
1. To workaround the firmware bug described in 916f676f8
2. To access the ACPI BGRT image
but since we haven't seen an i386 implementation that requires either,
this simple fix should suffice for now.
[ Added to changelog - Matt ]
Reported-by: Bryan O'Donoghue <bryan.odonoghue.lkml@nexus-software.ie>
Acked-by: Tom Zanussi <tom.zanussi@intel.com>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
efi_lookup_mapped_addr() is a handy utility for other platforms than
x86. Move it from arch/x86 to drivers/firmware. Add memmap pointer
to global efi structure, and initialise it on x86.
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Common to (U)EFI support on all platforms is the global "efi" data
structure, and the code that parses the System Table to locate
addresses to populate that structure with.
This patch adds both of these to the global EFI driver code and
removes the local definition of the global "efi" data structure from
the x86 and ia64 code.
Squashed into one big patch to avoid breaking bisection.
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This reverts commit 1acba98f81.
The firmware on both Dave's Thinkpad and Maarten's Macbook Pro appear to
rely on the old behaviour, and their machines fail to boot with the
above commit.
Reported-by: Dave Young <dyoung@redhat.com>
Reported-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Pull timer core updates from Thomas Gleixner:
"The timer changes contain:
- posix timer code consolidation and fixes for odd corner cases
- sched_clock implementation moved from ARM to core code to avoid
duplication by other architectures
- alarm timer updates
- clocksource and clockevents unregistration facilities
- clocksource/events support for new hardware
- precise nanoseconds RTC readout (Xen feature)
- generic support for Xen suspend/resume oddities
- the usual lot of fixes and cleanups all over the place
The parts which touch other areas (ARM/XEN) have been coordinated with
the relevant maintainers. Though this results in an handful of
trivial to solve merge conflicts, which we preferred over nasty cross
tree merge dependencies.
The patches which have been committed in the last few days are bug
fixes plus the posix timer lot. The latter was in akpms queue and
next for quite some time; they just got forgotten and Frederic
collected them last minute."
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
hrtimer: Remove unused variable
hrtimers: Move SMP function call to thread context
clocksource: Reselect clocksource when watchdog validated high-res capability
posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
posix_timers: fix racy timer delta caching on task exit
posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
selftests: add basic posix timers selftests
posix_cpu_timers: consolidate expired timers check
posix_cpu_timers: consolidate timer list cleanups
posix_cpu_timer: consolidate expiry time type
tick: Sanitize broadcast control logic
tick: Prevent uncontrolled switch to oneshot mode
tick: Make oneshot broadcast robust vs. CPU offlining
x86: xen: Sync the CMOS RTC as well as the Xen wallclock
x86: xen: Sync the wallclock when the system time is set
timekeeping: Indicate that clock was set in the pvclock gtod notifier
timekeeping: Pass flags instead of multiple bools to timekeeping_update()
xen: Remove clock_was_set() call in the resume path
hrtimers: Support resuming with two or more CPUs online (but stopped)
timer: Fix jiffies wrap behavior of round_jiffies_common()
...
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core
Frederic sayed: "Most of these patches have been hanging around for
several month now, in -mmotm for a significant chunk. They already
missed a few releases."
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 EFI changes from Ingo Molnar:
"Two fixes that should in principle increase robustness of our
interaction with the EFI firmware, and a cleanup"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, efi: retry ExitBootServices() on failure
efi: Convert runtime services function ptrs
UEFI: Don't pass boot services regions to SetVirtualAddressMap()
1. Check for allocation failure
2. Clear the buffer contents, as they may actually be written to flash
3. Don't leak the buffer
Compile-tested only.
[ Tested successfully on my buggy ASUS machine - Matt ]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
This patch reworks the UEFI anti-bricking code, including an effective
reversion of cc5a080c and 31ff2f20. It turns out that calling
QueryVariableInfo() from boot services results in some firmware
implementations jumping to physical addresses even after entering virtual
mode, so until we have 1:1 mappings for UEFI runtime space this isn't
going to work so well.
Reverting these gets us back to the situation where we'd refuse to create
variables on some systems because they classify deleted variables as "used"
until the firmware triggers a garbage collection run, which they won't do
until they reach a lower threshold. This results in it being impossible to
install a bootloader, which is unhelpful.
Feedback from Samsung indicates that the firmware doesn't need more than
5KB of storage space for its own purposes, so that seems like a reasonable
threshold. However, there's still no guarantee that a platform will attempt
garbage collection merely because it drops below this threshold. It seems
that this is often only triggered if an attempt to write generates a
genuine EFI_OUT_OF_RESOURCES error. We can force that by attempting to
create a variable larger than the remaining space. This should fail, but if
it somehow succeeds we can then immediately delete it.
I've tested this on the UEFI machines I have available, but I don't have
a Samsung and so can't verify that it avoids the bricking problem.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Lee, Chun-Y <jlee@suse.com> [ dummy variable cleanup ]
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
We need to map boot services regions during startup in order to avoid
firmware bugs, but we shouldn't be passing those regions to
SetVirtualAddressMap(). Ensure that we're only passing regions that are
marked as being mapped at runtime.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
All the virtualized platforms (KVM, lguest and Xen) have persistent
wallclocks that have more than one second of precision.
read_persistent_wallclock() and update_persistent_wallclock() allow
for nanosecond precision but their implementation on x86 with
x86_platform.get/set_wallclock() only allows for one second precision.
This means guests may see a wallclock time that is off by up to 1
second.
Make set_wallclock() and get_wallclock() take a struct timespec
parameter (which allows for nanosecond precision) so KVM and Xen
guests may start with a more accurate wallclock time and a Xen dom0
can maintain a more accurate wallclock for guests.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
That will be better initial the value of DataSize to zero for the input of
GetVariable(), otherwise we will feed a random value. The debug log of input
DataSize like this:
...
[ 195.915612] EFI Variables Facility v0.08 2004-May-17
[ 195.915819] efi: size: 18446744071581821342
[ 195.915969] efi: size': 18446744071581821342
[ 195.916324] efi: size: 18446612150714306560
[ 195.916632] efi: size': 18446612150714306560
[ 195.917159] efi: size: 18446612150714306560
[ 195.917453] efi: size': 18446612150714306560
...
The size' is value that was returned by BIOS.
After applied this patch:
[ 82.442042] EFI Variables Facility v0.08 2004-May-17
[ 82.442202] efi: size: 0
[ 82.442360] efi: size': 1039
[ 82.443828] efi: size: 0
[ 82.444127] efi: size': 2616
[ 82.447057] efi: size: 0
[ 82.447356] efi: size': 5832
...
Found on Acer Aspire V3 BIOS, it will not return the size of data if we input a
non-zero DataSize.
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Pull VFS updates from Al Viro,
Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).
7kloc removed.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...
Pull x86/efi changes from Peter Anvin:
"The bulk of these changes are cleaning up the efivars handling and
breaking it up into a tree of files. There are a number of fixes as
well.
The entire changeset is pretty big, but most of it is code movement.
Several of these commits are quite new; the history got very messed up
due to a mismerge with the urgent changes for rc8 which completely
broke IA64, and so Ingo requested that we rebase it to straighten it
out."
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: remove "kfree(NULL)"
efi: locking fix in efivar_entry_set_safe()
efi, pstore: Read data from variable store before memcpy()
efi, pstore: Remove entry from list when erasing
efi, pstore: Initialise 'entry' before iterating
efi: split efisubsystem from efivars
efivarfs: Move to fs/efivarfs
efivars: Move pstore code into the new EFI directory
efivars: efivar_entry API
efivars: Keep a private global pointer to efivars
efi: move utf16 string functions to efi.h
x86, efi: Make efi_memblock_x86_reserve_range more readable
efivarfs: convert to use simple_open()
Using this parameter one can disable the storage_size/2 check if
he is really sure that the UEFI does sane gc and fulfills the spec.
This parameter is useful if a devices uses more than 50% of the
storage by default.
The Intel DQSW67 desktop board is such a sucker for exmaple.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Fixes build with CONFIG_EFI_VARS=m which was broken after the commit
"x86, efivars: firmware bug workarounds should be in platform code".
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
EFI implementations distinguish between space that is actively used by a
variable and space that merely hasn't been garbage collected yet. Space
that hasn't yet been garbage collected isn't available for use and so isn't
counted in the remaining_space field returned by QueryVariableInfo().
Combined with commit 68d9298 this can cause problems. Some implementations
don't garbage collect until the remaining space is smaller than the maximum
variable size, and as a result check_var_size() will always fail once more
than 50% of the variable store has been used even if most of that space is
marked as available for garbage collection. The user is unable to create
new variables, and deleting variables doesn't increase the remaining space.
The problem that 68d9298 was attempting to avoid was one where certain
platforms fail if the actively used space is greater than 50% of the
available storage space. We should be able to calculate that by simply
summing the size of each available variable and subtracting that from
the total storage space. With luck this will fix the problem described in
https://bugzilla.kernel.org/show_bug.cgi?id=55471 without permitting
damage to occur to the machines 68d9298 was attempting to fix.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
EFI variables can be flagged as being accessible only within boot services.
This makes it awkward for us to figure out how much space they use at
runtime. In theory we could figure this out by simply comparing the results
from QueryVariableInfo() to the space used by all of our variables, but
that fails if the platform doesn't garbage collect on every boot. Thankfully,
calling QueryVariableInfo() while still inside boot services gives a more
reliable answer. This patch passes that information from the EFI boot stub
up to the efi platform code.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Some EFI implementations return always a MaximumVariableSize of 0,
check against max_size only if it is non-zero.
My Intel DQ67SW desktop board has such an implementation.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Let's not burden ia64 with checks in the common efivars code that we're not
writing too much data to the variable store. That kind of thing is an x86
firmware bug, plain and simple.
efi_query_variable_store() provides platforms with a wrapper in which they can
perform checks and workarounds for EFI variable storage bugs.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Every 11 minutes ntp attempts to update the x86 rtc with the current
system time. Currently, the x86 code only updates the rtc if the system
time is within +/-15 minutes of the current value of the rtc. This
was done originally to avoid setting the RTC if the RTC was in localtime
mode (common with Windows dualbooting). Other architectures do a full
synchronization and now that we have better infrastructure to detect
when the RTC is in localtime, there is no reason that x86 should be
software limited to a 30 minute window.
This patch changes the behavior of the kernel to do a full synchronization
(year, month, day, hour, minute, and second) of the rtc when ntp requests
a synchronization between the system time and the rtc.
I've used the RTC library functions in this patchset as they do all the
required bounds checking.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: x86@kernel.org
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: linux-efi@vger.kernel.org
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
[jstultz: Tweak commit message, fold in build fix found by fengguang
Also add select RTC_LIB to X86, per new dependency, as found by prarit]
Signed-off-by: John Stultz <john.stultz@linaro.org>
So basically this function copies EFI memmap stuff from boot_params into
the EFI memmap descriptor and reserves memory for it. Make it much more
readable.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Matthew Garret <mjg59@srcf.ucam.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Pull x86/EFI changes from Peter Anvin:
- Improve the initrd handling in the EFI boot stub by allowing forward
slashes in the pathname - from Chun-Yi Lee.
- Cleanup code duplication in the EFI mixed kernel/firmware code - from
Satoru Takeuchi.
- efivarfs bug fixes for more strict filename validation, with lots of
input from Al Viro.
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, efi: remove duplicate code in setup_arch() by using, efi_is_native()
efivarfs: guid part of filenames are case-insensitive
efivarfs: Validate filenames much more aggressively
efivarfs: Use sizeof() instead of magic number
x86, efi: Allow slash in file path of initrd