Merge yet more updates from Andrew Morton:
- the rest of ocfs2
- various hotfixes, mainly MM
- quite a bit of misc stuff - drivers, fork, exec, signals, etc.
- printk updates
- firmware
- checkpatch
- nilfs2
- more kexec stuff than usual
- rapidio updates
- w1 things
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits)
ipc: delete "nr_ipc_ns"
kcov: allow more fine-grained coverage instrumentation
init/Kconfig: add clarification for out-of-tree modules
config: add android config fragments
init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
relay: add global mode support for buffer-only channels
init: allow blacklisting of module_init functions
w1:omap_hdq: fix regression
w1: add helper macro module_w1_family
w1: remove need for ida and use PLATFORM_DEVID_AUTO
rapidio/switches: add driver for IDT gen3 switches
powerpc/fsl_rio: apply changes for RIO spec rev 3
rapidio: modify for rev.3 specification changes
rapidio: change inbound window size type to u64
rapidio/idt_gen2: fix locking warning
rapidio: fix error handling in mbox request/release functions
rapidio/tsi721_dma: advance queue processing from transfer submit call
rapidio/tsi721: add messaging mbox selector parameter
rapidio/tsi721: add PCIe MRRS override parameter
rapidio/tsi721_dma: add channel mask and queue size parameters
...
In general, there's no need for the "restore sigmask" flag to live in
ti->flags. alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
tile, and x86 use essentially identical alternative implementations,
placing the flag in ti->status.
Replace those optimized implementations with an equally good common
implementation that stores it in a bitfield in struct task_struct and
drop the custom implementations.
Additional architectures can opt in by removing their
TIF_RESTORE_SIGMASK defines.
Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
VGIC implementation.
- s390: support for trapping software breakpoints, nested virtualization
(vSIE), the STHYI opcode, initial extensions for CPU model support.
- MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups,
preliminary to this and the upcoming support for hardware virtualization
extensions.
- x86: support for execute-only mappings in nested EPT; reduced vmexit
latency for TSC deadline timer (by about 30%) on Intel hosts; support for
more than 255 vCPUs.
- PPC: bugfixes.
The ugly bit is the conflicts. A couple of them are simple conflicts due
to 4.7 fixes, but most of them are with other trees. There was definitely
too much reliance on Acked-by here. Some conflicts are for KVM patches
where _I_ gave my Acked-by, but the worst are for this pull request's
patches that touch files outside arch/*/kvm. KVM submaintainers should
probably learn to synchronize better with arch maintainers, with the
latter providing topic branches whenever possible instead of Acked-by.
This is what we do with arch/x86. And I should learn to refuse pull
requests when linux-next sends scary signals, even if that means that
submaintainers have to rebase their branches.
Anyhow, here's the list:
- arch/x86/kvm/vmx.c: handle_pcommit and EXIT_REASON_PCOMMIT was removed
by the nvdimm tree. This tree adds handle_preemption_timer and
EXIT_REASON_PREEMPTION_TIMER at the same place. In general all mentions
of pcommit have to go.
There is also a conflict between a stable fix and this patch, where the
stable fix removed the vmx_create_pml_buffer function and its call.
- virt/kvm/kvm_main.c: kvm_cpu_notifier was removed by the hotplug tree.
This tree adds kvm_io_bus_get_dev at the same place.
- virt/kvm/arm/vgic.c: a few final bugfixes went into 4.7 before the
file was completely removed for 4.8.
- include/linux/irqchip/arm-gic-v3.h: this one is entirely our fault;
this is a change that should have gone in through the irqchip tree and
pulled by kvm-arm. I think I would have rejected this kvm-arm pull
request. The KVM version is the right one, except that it lacks
GITS_BASER_PAGES_SHIFT.
- arch/powerpc: what a mess. For the idle_book3s.S conflict, the KVM
tree is the right one; everything else is trivial. In this case I am
not quite sure what went wrong. The commit that is causing the mess
(fd7bacbca4, "KVM: PPC: Book3S HV: Fix TB corruption in guest exit
path on HMI interrupt", 2016-05-15) touches both arch/powerpc/kernel/
and arch/powerpc/kvm/. It's large, but at 396 insertions/5 deletions
I guessed that it wasn't really possible to split it and that the 5
deletions wouldn't conflict. That wasn't the case.
- arch/s390: also messy. First is hypfs_diag.c where the KVM tree
moved some code and the s390 tree patched it. You have to reapply the
relevant part of commits 6c22c98637, plus all of e030c1125e, to
arch/s390/kernel/diag.c. Or pick the linux-next conflict
resolution from http://marc.info/?l=kvm&m=146717549531603&w=2.
Second, there is a conflict in gmap.c between a stable fix and 4.8.
The KVM version here is the correct one.
I have pushed my resolution at refs/heads/merge-20160802 (commit
3d1f53419842) at git://git.kernel.org/pub/scm/virt/kvm/kvm.git.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJXoGm7AAoJEL/70l94x66DugQIAIj703ePAFepB/fCrKHkZZia
SGrsBdvAtNsOhr7FQ5qvvjLxiv/cv7CymeuJivX8H+4kuUHUllDzey+RPHYHD9X7
U6n1PdCH9F15a3IXc8tDjlDdOMNIKJixYuq1UyNZMU6NFwl00+TZf9JF8A2US65b
x/41W98ilL6nNBAsoDVmCLtPNWAqQ3lajaZELGfcqRQ9ZGKcAYOaLFXHv2YHf2XC
qIDMf+slBGSQ66UoATnYV2gAopNlWbZ7n0vO6tE2KyvhHZ1m399aBX1+k8la/0JI
69r+Tz7ZHUSFtmlmyByi5IAB87myy2WQHyAPwj+4vwJkDGPcl0TrupzbG7+T05Y=
=42ti
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
- ARM: GICv3 ITS emulation and various fixes. Removal of the
old VGIC implementation.
- s390: support for trapping software breakpoints, nested
virtualization (vSIE), the STHYI opcode, initial extensions
for CPU model support.
- MIPS: support for MIPS64 hosts (32-bit guests only) and lots
of cleanups, preliminary to this and the upcoming support for
hardware virtualization extensions.
- x86: support for execute-only mappings in nested EPT; reduced
vmexit latency for TSC deadline timer (by about 30%) on Intel
hosts; support for more than 255 vCPUs.
- PPC: bugfixes.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
KVM: PPC: Introduce KVM_CAP_PPC_HTM
MIPS: Select HAVE_KVM for MIPS64_R{2,6}
MIPS: KVM: Reset CP0_PageMask during host TLB flush
MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
MIPS: KVM: Sign extend MFC0/RDHWR results
MIPS: KVM: Fix 64-bit big endian dynamic translation
MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
MIPS: KVM: Use 64-bit CP0_EBase when appropriate
MIPS: KVM: Set CP0_Status.KX on MIPS64
MIPS: KVM: Make entry code MIPS64 friendly
MIPS: KVM: Use kmap instead of CKSEG0ADDR()
MIPS: KVM: Use virt_to_phys() to get commpage PFN
MIPS: Fix definition of KSEGX() for 64-bit
KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
kvm: x86: nVMX: maintain internal copy of current VMCS
KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
KVM: arm64: vgic-its: Simplify MAPI error handling
KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
...
Pull x86 header cleanups from Ingo Molnar:
"This tree is a cleanup of the x86 tree reducing spurious uses of
module.h - which should improve build performance a bit"
* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, crypto: Restore MODULE_LICENSE() to glue_helper.c so it loads
x86/apic: Remove duplicated include from probe_64.c
x86/ce4100: Remove duplicated include from ce4100.c
x86/headers: Include spinlock_types.h in x8664_ksyms_64.c for missing spinlock_t
x86/platform: Delete extraneous MODULE_* tags fromm ts5500
x86: Audit and remove any remaining unnecessary uses of module.h
x86/kvm: Audit and remove any unnecessary uses of module.h
x86/xen: Audit and remove any unnecessary uses of module.h
x86/platform: Audit and remove any unnecessary uses of module.h
x86/lib: Audit and remove any unnecessary uses of module.h
x86/kernel: Audit and remove any unnecessary uses of module.h
x86/mm: Audit and remove any unnecessary uses of module.h
x86: Don't use module.h just for AUTHOR / LICENSE tags
Pull x86 microcode updates from Thomas Gleixner:
- more work to make the microcode loader robust
- a fix for the micro code load precedence
- fixes for initrd loading with randomized memory
- less printk noise on SMP machines
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm, x86/microcode: Add __PAGE_OFFSET_BASE define on 32-bit
x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
x86/microcode: Remove unused symbol exports
x86/microcode/intel: Do not issue microcode updates messages on each CPU
Documentation/microcode: Document some aspects for more clarity
x86/microcode/AMD: Make amd_ucode_patch[] static
x86/microcode/intel: Unexport save_mc_for_early()
x86/microcode/intel: Rename load_microcode_early() to find_microcode_patch()
x86/microcode: Propagate save_microcode_in_initrd() retval
x86/microcode: Get rid of find_cpio_data()'s dummy offset arg
lib/cpio: Make find_cpio_data()'s offset arg optional
x86/microcode: Fix suspend to RAM with builtin microcode
x86/microcode: Fix loading precedence
Pull x86 cpufeature updates from Thomas Gleixner:
- a workaround for the MONITOR instruction erratum of Goldmont CPUs
- small fixes and cleanups here and there
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add workaround for MONITOR instruction erratum on Goldmont based CPUs
x86/cpu: Rename "WESTMERE2" family to "NEHALEM_G"
x86/amd_nb: Clean up init path
x86/cpufeature: Add helper macro for mask check macros
x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated
x86/cpufeature: Update cpufeaure macros
Pull security subsystem updates from James Morris:
"Highlights:
- TPM core and driver updates/fixes
- IPv6 security labeling (CALIPSO)
- Lots of Apparmor fixes
- Seccomp: remove 2-phase API, close hole where ptrace can change
syscall #"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
tpm: Factor out common startup code
tpm: use devm_add_action_or_reset
tpm2_i2c_nuvoton: add irq validity check
tpm: read burstcount from TPM_STS in one 32-bit transaction
tpm: fix byte-order for the value read by tpm2_get_tpm_pt
tpm_tis_core: convert max timeouts from msec to jiffies
apparmor: fix arg_size computation for when setprocattr is null terminated
apparmor: fix oops, validate buffer size in apparmor_setprocattr()
apparmor: do not expose kernel stack
apparmor: fix module parameters can be changed after policy is locked
apparmor: fix oops in profile_unpack() when policy_db is not present
apparmor: don't check for vmalloc_addr if kvzalloc() failed
apparmor: add missing id bounds check on dfa verification
apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
apparmor: use list_next_entry instead of list_entry_next
apparmor: fix refcount race when finding a child profile
apparmor: fix ref count leak when profile sha1 hash is read
apparmor: check that xindex is in trans_table bounds
...
1/ Replace pcommit with ADR / directed-flushing:
The pcommit instruction, which has not shipped on any product, is
deprecated. Instead, the requirement is that platforms implement either
ADR, or provide one or more flush addresses per nvdimm. ADR
(Asynchronous DRAM Refresh) flushes data in posted write buffers to the
memory controller on a power-fail event. Flush addresses are defined in
ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure:
"Flush Hint Address Structure". A flush hint is an mmio address that
when written and fenced assures that all previous posted writes
targeting a given dimm have been flushed to media.
2/ On-demand ARS (address range scrub):
Linux uses the results of the ACPI ARS commands to track bad blocks
in pmem devices. When latent errors are detected we re-scrub the media
to refresh the bad block list, userspace can also request a re-scrub at
any time.
3/ Support for the Microsoft DSM (device specific method) command format.
4/ Support for EDK2/OVMF virtual disk device memory ranges.
5/ Various fixes and cleanups across the subsystem.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXmXBsAAoJEB7SkWpmfYgCEwwP/1IOt9ocP+iHLMDH9KE7VaTZ
NmUDR+Zy6g5cRQM7SgcuU5BXUcx+OsSrSrUTVF1cW994o9Gbz1mFotkv0ZAsPcYY
ZVRQxo2oqHrssyOcg+PsgKWiXn68rJOCgmpEyzaJywl5qTMst7pzsT1s1f7rSh6h
trCf4VaJJwxZR8fARGtlHUnnhPe2Orp99EZRKEWprAsIv2kPuWpPHSjRjuEgN1JG
KW8AYwWqFTtiLRUk86I4KBB0wcDrfctsjgN9Ogd6+aHyQBRnVSr2U+vDCFkC8KLu
qiDCpYp+yyxBjclnljz7tRRT3GtzfCUWd4v2KVWqgg2IaobUc0Lbukp/rmikUXQP
WLikT2OCQ994eFK5OX3Q3cIU/4j459TQnof8q14yVSpjAKrNUXVSR5puN7Hxa+V7
41wKrAsnsyY1oq+Yd/rMR8VfH7PHx3bFkrmRCGZCufLX1UQm4aYj+sWagDKiV3yA
DiudghbOnhfurfGsnXUVw7y7GKs+gNWNBmB6ndAD6ZEHmKoGUhAEbJDLCc3DnANl
b/2mv1MIdIcC1DlCmnbbcn6fv6bICe/r8poK3VrCK3UgOq/EOvKIWl7giP+k1JuC
6DdVYhlNYIVFXUNSLFAwz8OkLu8byx7WDm36iEqrKHtPw+8qa/2bWVgOU6OBgpjV
cN3edFVIdxvZeMgM5Ubq
=xCBG
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
- Replace pcommit with ADR / directed-flushing.
The pcommit instruction, which has not shipped on any product, is
deprecated. Instead, the requirement is that platforms implement
either ADR, or provide one or more flush addresses per nvdimm.
ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
to the memory controller on a power-fail event.
Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
A flush hint is an mmio address that when written and fenced assures
that all previous posted writes targeting a given dimm have been
flushed to media.
- On-demand ARS (address range scrub).
Linux uses the results of the ACPI ARS commands to track bad blocks
in pmem devices. When latent errors are detected we re-scrub the
media to refresh the bad block list, userspace can also request a
re-scrub at any time.
- Support for the Microsoft DSM (device specific method) command
format.
- Support for EDK2/OVMF virtual disk device memory ranges.
- Various fixes and cleanups across the subsystem.
* tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
nfit: do an ARS scrub on hitting a latent media error
nfit: move to nfit/ sub-directory
nfit, libnvdimm: allow an ARS scrub to be triggered on demand
libnvdimm: register nvdimm_bus devices with an nd_bus driver
pmem: clarify a debug print in pmem_clear_poison
x86/insn: remove pcommit
Revert "KVM: x86: add pcommit support"
nfit, tools/testing/nvdimm/: unify shutdown paths
libnvdimm: move ->module to struct nvdimm_bus_descriptor
nfit: cleanup acpi_nfit_init calling convention
nfit: fix _FIT evaluation memory leak + use after free
tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
tools/testing/nvdimm: add virtual ramdisk range
acpi, nfit: treat virtual ramdisk SPA as pmem region
pmem: kill __pmem address space
pmem: kill wmb_pmem()
libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
fs/dax: remove wmb_pmem()
libnvdimm, pmem: flush posted-write queues on shutdown
...
- ACPI support for guests on ARM platforms.
- Generic steal time support for arm and x86.
- Support cases where kernel cpu is not Xen VCPU number (e.g., if
in-guest kexec is used).
- Use the system workqueue instead of a custom workqueue in various
places.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXmLlrAAoJEFxbo/MsZsTRvRQH/1wOMF8BmlbZfR7H3qwDfjst
ApNifCiZE08xDtWBlwUaBFAQxyflQS9BBiNZDVK0sysIdXeOdpWV7V0ZjRoLL+xr
czsaaGXDcmXxJxApoMDVuT7FeP6rEk6LVAYRoHpVjJjMZGW3BbX1vZaMW4DXl2WM
9YNaF2Lj+rpc1f8iG31nUxwkpmcXFog6ct4tu7HiyCFT3hDkHt/a4ghuBdQItCkd
vqBa1pTpcGtQBhSmWzlylN/PV2+NKcRd+kGiwd09/O/rNzogTMCTTWeHKAtMpPYb
Cu6oSqJtlK5o0vtr0qyLSWEGIoyjE2gE92s0wN3iCzFY1PldqdsxUO622nIj+6o=
=G6q3
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from David Vrabel:
"Features and fixes for 4.8-rc0:
- ACPI support for guests on ARM platforms.
- Generic steal time support for arm and x86.
- Support cases where kernel cpu is not Xen VCPU number (e.g., if
in-guest kexec is used).
- Use the system workqueue instead of a custom workqueue in various
places"
* tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (47 commits)
xen: add static initialization of steal_clock op to xen_time_ops
xen/pvhvm: run xen_vcpu_setup() for the boot CPU
xen/evtchn: use xen_vcpu_id mapping
xen/events: fifo: use xen_vcpu_id mapping
xen/events: use xen_vcpu_id mapping in events_base
x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info
x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op
xen: introduce xen_vcpu_id mapping
x86/acpi: store ACPI ids from MADT for future usage
x86/xen: update cpuid.h from Xen-4.7
xen/evtchn: add IOCTL_EVTCHN_RESTRICT
xen-blkback: really don't leak mode property
xen-blkback: constify instance of "struct attribute_group"
xen-blkfront: prefer xenbus_scanf() over xenbus_gather()
xen-blkback: prefer xenbus_scanf() over xenbus_gather()
xen: support runqueue steal time on xen
arm/xen: add support for vm_assist hypercall
xen: update xen headers
xen-pciback: drop superfluous variables
xen-pciback: short-circuit read path used for merging write values
...
... in order to avoid #ifdeffery in code computing the ASLR randomization
offset. Remove that #ifdeffery in the microcode loader.
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160727120939.GA18911@nazgul.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge updates from Andrew Morton:
- a few misc bits
- ocfs2
- most(?) of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
thp: fix comments of __pmd_trans_huge_lock()
cgroup: remove unnecessary 0 check from css_from_id()
cgroup: fix idr leak for the first cgroup root
mm: memcontrol: fix documentation for compound parameter
mm: memcontrol: remove BUG_ON in uncharge_list
mm: fix build warnings in <linux/compaction.h>
mm, thp: convert from optimistic swapin collapsing to conservative
mm, thp: fix comment inconsistency for swapin readahead functions
thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
shmem: split huge pages beyond i_size under memory pressure
thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
khugepaged: add support of collapse for tmpfs/shmem pages
shmem: make shmem_inode_info::lock irq-safe
khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
thp: extract khugepaged from mm/huge_memory.c
shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
shmem: add huge pages support
shmem: get_unmapped_area align huge page
shmem: prepare huge= mount option and sysfs knob
mm, rmap: account shmem thp pages
...
- Support for ACPI SSDT overlays allowing Secondary System
Description Tables (SSDTs) to be loaded at any time from EFI
variables or via configfs (Octavian Purdila, Mika Westerberg).
- Support for the ACPI LPI (Low-Power Idle) feature introduced in
ACPI 6.0 and allowing processor idle states to be represented in
ACPI tables in a hierarchical way (with the help of Processor
Container objects) and support for ACPI idle states management
on ARM64, based on LPI (Sudeep Holla).
- General improvements of ACPI support for NUMA and ARM64 support
for ACPI-based NUMA (Hanjun Guo, David Daney, Robert Richter).
- General improvements of the ACPI table upgrade mechanism and
ARM64 support for that feature (Aleksey Makarov, Jon Masters).
- Support for the Boot Error Record Table (BERT) in APEI and
improvements of kernel messages printed by the error injection
code (Huang Ying, Borislav Petkov).
- New driver for the Intel Broxton WhiskeyCove PMIC operation
region and support for the REGS operation region on Broxton,
PMIC code cleanups (Bin Gao, Felipe Balbi, Paul Gortmaker).
- New driver for the power participant device which is part of the
Dynamic Power and Thermal Framework (DPTF) and DPTF-related code
reorganization (Srinivas Pandruvada).
- Support for the platform-initiated graceful shutdown feature
introduced in ACPI 6.1 (Prashanth Prakash).
- ACPI button driver update related to lid input events generated
automatically on initialization and system resume that have been
problematic for some time (Lv Zheng).
- ACPI EC driver cleanups (Lv Zheng).
- Documentation of the ACPICA release automation process and the
in-kernel ACPI AML debugger (Lv Zheng).
- New blacklist entry and two fixes for the ACPI backlight driver
(Alex Hung, Arvind Yadav, Ralf Gerbig).
- Cleanups of the ACPI pci_slot driver (Joe Perches, Paul Gortmaker).
- ACPI CPPC code changes to make it more robust against possible
defects in ACPI tables and new symbol definitions for PCC (Hoan
Tran).
- System reboot code modification to execute the ACPI _PTS (Prepare
To Sleep) method in addition to _TTS (Ocean He).
- ACPICA-related change to carry out lock ordering checks in ACPICA
if ACPICA debug is enabled in the kernel (Lv Zheng).
- Assorted minor fixes and cleanups (Andy Shevchenko, Baoquan He,
Bhaktipriya Shridhar, Paul Gortmaker, Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJXl8A7AAoJEILEb/54YlRxF0kQAI6mH0yan60Osu4598+VNvgv
wxOWl1TEbKd+LaJkofRZ+FPzZkQf5c/h/8Oo8Q3LEpFhjkARhhX7ThDjS5v2Nx6v
I/icQ64ynPUPrw6hGNVrmec9ofZjiAs3j3Rt2bEiae+YN6guvfhWE+kBCHo2G/nN
o4BSaxYjkphUTDSi4/5BfaocV2sl3apvwjtAj8zgGn4RD81bFFLnblynHkqJVcoN
HAfm7QTVjT01Zkv565OSZgK8CFcD8Ky2KKKBQvcIW8zQmD6IXaoTHSYSwL0SH+oK
bxUZUmWVfFWw4kDTAY9mw0QwtWz9ODTWh/WMhs3itWRRN5qHfogs99rCVYFtFufQ
ODVy4wpt4wmpzZVhyUDTTigAhznPAtCam6EpL1YeNbtyrRN4evvZVFHBZJnmhosX
zI9iLF4eqdnJZKvh+L1VFU+py8aAZpz1ZEOatNMI+xdhArbGm7v89cldzaRkJhuW
LZr+JqYQGaOZS5qSnymwJL1KfF66+2QGpzdvzJN5FNIDACoqanATbZ/Iie2ENcM+
WwCEWrGJFDmM30raBNNcvx0yHFtVkcNbOymla4paVg7i29nu88Ynw4Z6seIIP11C
DryzLFhw+3jdTg2zK/te/wkhciJ0F+iZjo6VXywSMnwatf36bpdp4r4JLUVfEo2t
8DOGKyFMLYY1zOPMK9Th
=YwbM
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"The new feaures here are the support for ACPI overlays (allowing ACPI
tables to be loaded at any time from EFI variables or via configfs)
and the LPI (Low-Power Idle) support. Also notable is the ACPI-based
NUMA support for ARM64.
Apart from that we have two new drivers, for the DPTF (Dynamic Power
and Thermal Framework) power participant device and for the Intel
Broxton WhiskeyCove PMIC, some more PMIC-related changes, support for
the Boot Error Record Table (BERT) in APEI and support for
platform-initiated graceful shutdown.
Plus two new pieces of documentation and usual assorted fixes and
cleanups in quite a few places.
Specifics:
- Support for ACPI SSDT overlays allowing Secondary System
Description Tables (SSDTs) to be loaded at any time from EFI
variables or via configfs (Octavian Purdila, Mika Westerberg).
- Support for the ACPI LPI (Low-Power Idle) feature introduced in
ACPI 6.0 and allowing processor idle states to be represented in
ACPI tables in a hierarchical way (with the help of Processor
Container objects) and support for ACPI idle states management on
ARM64, based on LPI (Sudeep Holla).
- General improvements of ACPI support for NUMA and ARM64 support for
ACPI-based NUMA (Hanjun Guo, David Daney, Robert Richter).
- General improvements of the ACPI table upgrade mechanism and ARM64
support for that feature (Aleksey Makarov, Jon Masters).
- Support for the Boot Error Record Table (BERT) in APEI and
improvements of kernel messages printed by the error injection code
(Huang Ying, Borislav Petkov).
- New driver for the Intel Broxton WhiskeyCove PMIC operation region
and support for the REGS operation region on Broxton, PMIC code
cleanups (Bin Gao, Felipe Balbi, Paul Gortmaker).
- New driver for the power participant device which is part of the
Dynamic Power and Thermal Framework (DPTF) and DPTF-related code
reorganization (Srinivas Pandruvada).
- Support for the platform-initiated graceful shutdown feature
introduced in ACPI 6.1 (Prashanth Prakash).
- ACPI button driver update related to lid input events generated
automatically on initialization and system resume that have been
problematic for some time (Lv Zheng).
- ACPI EC driver cleanups (Lv Zheng).
- Documentation of the ACPICA release automation process and the
in-kernel ACPI AML debugger (Lv Zheng).
- New blacklist entry and two fixes for the ACPI backlight driver
(Alex Hung, Arvind Yadav, Ralf Gerbig).
- Cleanups of the ACPI pci_slot driver (Joe Perches, Paul Gortmaker).
- ACPI CPPC code changes to make it more robust against possible
defects in ACPI tables and new symbol definitions for PCC (Hoan
Tran).
- System reboot code modification to execute the ACPI _PTS (Prepare
To Sleep) method in addition to _TTS (Ocean He).
- ACPICA-related change to carry out lock ordering checks in ACPICA
if ACPICA debug is enabled in the kernel (Lv Zheng).
- Assorted minor fixes and cleanups (Andy Shevchenko, Baoquan He,
Bhaktipriya Shridhar, Paul Gortmaker, Rafael Wysocki)"
* tag 'acpi-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (71 commits)
ACPI: enable ACPI_PROCESSOR_IDLE on ARM64
arm64: add support for ACPI Low Power Idle(LPI)
drivers: firmware: psci: initialise idle states using ACPI LPI
cpuidle: introduce CPU_PM_CPU_IDLE_ENTER macro for ARM{32, 64}
arm64: cpuidle: drop __init section marker to arm_cpuidle_init
ACPI / processor_idle: Add support for Low Power Idle(LPI) states
ACPI / processor_idle: introduce ACPI_PROCESSOR_CSTATE
ACPI / DPTF: move int340x_thermal.c to the DPTF folder
ACPI / DPTF: Add DPTF power participant driver
ACPI / lpat: make it explicitly non-modular
ACPI / dock: make dock explicitly non-modular
ACPI / PCI: make pci_slot explicitly non-modular
ACPI / PMIC: remove modular references from non-modular code
ACPICA: Linux: Enable ACPI_MUTEX_DEBUG for Linux kernel
ACPI: Rename configfs.c to acpi_configfs.c to prevent link error
ACPI / debugger: Add AML debugger documentation
ACPI: Add documentation describing ACPICA release automation
ACPI: add support for loading SSDTs via configfs
ACPI: add support for configfs
efi / ACPI: load SSTDs from EFI variables
...
- Rework the cpufreq governor interface to make it more straightforward
and modify the conservative governor to avoid using transition
notifications (Rafael Wysocki).
- Rework the handling of frequency tables by the cpufreq core to make
it more efficient (Viresh Kumar).
- Modify the schedutil governor to reduce the number of wakeups it
causes to occur in cases when the CPU frequency doesn't need to be
changed (Steve Muckle, Viresh Kumar).
- Fix some minor issues and clean up code in the cpufreq core and
governors (Rafael Wysocki, Viresh Kumar).
- Add Intel Broxton support to the intel_pstate driver (Srinivas
Pandruvada).
- Fix problems related to the config TDP feature and to the validity
of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka,
Srinivas Pandruvada).
- Make intel_pstate update the cpu_frequency tracepoint even if
the frequency doesn't change to avoid confusing powertop (Rafael
Wysocki).
- Clean up the usage of __init/__initdata in intel_pstate, mark some
of its internal variables as __read_mostly and drop an unused
structure element from it (Jisheng Zhang, Carsten Emde).
- Clean up the usage of some duplicate MSR symbols in intel_pstate
and turbostat (Srinivas Pandruvada).
- Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay
Adiga, Viresh Kumar, Ben Dooks).
- Fix a regression (introduced during the 4.5 cycle) in the
pcc-cpufreq driver by reverting the problematic commit (Andreas
Herrmann).
- Add support for Intel Denverton to intel_idle, clean up Broxton
support in it and make it explicitly non-modular (Jacob Pan,
Jan Beulich, Paul Gortmaker).
- Add support for Denverton and Ivy Bridge server to the Intel RAPL
power capping driver and make it more careful about the handing
of MSRs that may not be present (Jacob Pan, Xiaolong Wang).
- Fix resume from hibernation on x86-64 by making the CPU offline
during resume avoid using MONITOR/MWAIT in the "play dead" loop
which may lead to an inadvertent "revival" of a "dead" CPU and
a page fault leading to a kernel crash from it (Rafael Wysocki).
- Make memory management during resume from hibernation more
straightforward (Rafael Wysocki).
- Add debug features that should help to detect problems related
to hibernation and resume from it (Rafael Wysocki, Chen Yu).
- Clean up hibernation core somewhat (Rafael Wysocki).
- Prevent KASAN from instrumenting the hibernation core which leads
to large numbers of false-positives from it (James Morse).
- Prevent PM (hibernate and suspend) notifiers from being called
during the cleanup phase if they have not been called during the
corresponding preparation phase which is possible if one of the
other notifiers returns an error at that time (Lianwei Wang).
- Improve suspend-related debug printout in the tasks freezer and
clean up suspend-related console handling (Roger Lu, Borislav
Petkov).
- Update the AnalyzeSuspend script in the kernel sources to
version 4.2 (Todd Brandt).
- Modify the generic power domains framework to make it handle
system suspend/resume better (Ulf Hansson).
- Make the runtime PM framework avoid resuming devices synchronously
when user space changes the runtime PM settings for them and
improve its error reporting (Rafael Wysocki, Linus Walleij).
- Fix error paths in devfreq drivers (exynos, exynos-ppmu, exynos-bus)
and in the core, make some devfreq code explicitly non-modular and
change some of it into tristate (Bartlomiej Zolnierkiewicz,
Peter Chen, Paul Gortmaker).
- Add DT support to the generic PM clocks management code and make
it export some more symbols (Jon Hunter, Paul Gortmaker).
- Make the PCI PM core code slightly more robust against possible
driver errors (Andy Shevchenko).
- Make it possible to change DESTDIR and PREFIX in turbostat
(Andy Shevchenko).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJXl7/dAAoJEILEb/54YlRx+VgQAIQJOWvxKew3Yl02c/sdj9OT
5VNnFrzGzdcAPofvvG9qGq8B0Es1vYehJpwwOB21ri8EvYv0riIiU1yrqslObojQ
oaZOkSBpbIoKjGR4CpYA/A+feE+8EqIBdPGd+lx5a6oRdUi7tRVHBG9lyLO3FB/i
jan1q8dMpZsmu+Y+rVVHGnCVuIlIEqr2ZnZfCwDAulO2Arp/QFAh4kH08ELATvrl
bkPa25vq7/VMP/vCDzrfZKD5mUuKogIRu/J5wx4py1nE+FB35cKKyqBOgklLwAeY
UI8vjDhr/myNUs54AZlktOkq47TCYvjvhX9kmOxBjuWqFbRusU012IRek1fYPRIV
ZqbkqNX7UEVQwunAEg9AyFwyzEtOht93dQDT5RLEd4QzKuM76gmHpLeTGGMzE+nu
FnmF9JGl4DVwqpZl9yU2+hR2Mt3bP8OF8qYmNiGUB3KO4emPslhSd+6y8liA5Bx2
SJf0Gb//vaHCh3/uMnwAonYPqRkZvBLOMwuL1VUjNQfRMnQtDdgHMYB1aT/EglPA
8ww6j4J8rVRLAxvYQ3UEmNA/vBNclKXblRR18+JddEZP9/oX0ATfwnCCUpr839uk
xxyQhrm4/AI60+PHWCX4GG80YrKdOGTkF7LXCQZanVWjjuyF17rufegZ2YWLT07v
JU1Cmumfdy2jJluT8xsR
=uVGz
-----END PGP SIGNATURE-----
Merge tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"Again, the majority of changes go into the cpufreq subsystem, but
there are no big features this time. The cpufreq changes that stand
out somewhat are the governor interface rework and improvements
related to the handling of frequency tables. Apart from those, there
are fixes and new device/CPU IDs in drivers, cleanups and an
improvement of the new schedutil governor.
Next, there are some changes in the hibernation core, including a fix
for a nasty problem related to the MONITOR/MWAIT usage by CPU offline
during resume from hibernation, a few core improvements related to
memory management during resume, a couple of additional debug features
and cleanups.
Finally, we have some fixes and cleanups in the devfreq subsystem,
generic power domains framework improvements related to system
suspend/resume, support for some new chips in intel_idle and in the
power capping RAPL driver, a new version of the AnalyzeSuspend utility
and some assorted fixes and cleanups.
Specifics:
- Rework the cpufreq governor interface to make it more
straightforward and modify the conservative governor to avoid using
transition notifications (Rafael Wysocki).
- Rework the handling of frequency tables by the cpufreq core to make
it more efficient (Viresh Kumar).
- Modify the schedutil governor to reduce the number of wakeups it
causes to occur in cases when the CPU frequency doesn't need to be
changed (Steve Muckle, Viresh Kumar).
- Fix some minor issues and clean up code in the cpufreq core and
governors (Rafael Wysocki, Viresh Kumar).
- Add Intel Broxton support to the intel_pstate driver (Srinivas
Pandruvada).
- Fix problems related to the config TDP feature and to the validity
of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka,
Srinivas Pandruvada).
- Make intel_pstate update the cpu_frequency tracepoint even if the
frequency doesn't change to avoid confusing powertop (Rafael
Wysocki).
- Clean up the usage of __init/__initdata in intel_pstate, mark some
of its internal variables as __read_mostly and drop an unused
structure element from it (Jisheng Zhang, Carsten Emde).
- Clean up the usage of some duplicate MSR symbols in intel_pstate
and turbostat (Srinivas Pandruvada).
- Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay
Adiga, Viresh Kumar, Ben Dooks).
- Fix a regression (introduced during the 4.5 cycle) in the
pcc-cpufreq driver by reverting the problematic commit (Andreas
Herrmann).
- Add support for Intel Denverton to intel_idle, clean up Broxton
support in it and make it explicitly non-modular (Jacob Pan, Jan
Beulich, Paul Gortmaker).
- Add support for Denverton and Ivy Bridge server to the Intel RAPL
power capping driver and make it more careful about the handing of
MSRs that may not be present (Jacob Pan, Xiaolong Wang).
- Fix resume from hibernation on x86-64 by making the CPU offline
during resume avoid using MONITOR/MWAIT in the "play dead" loop
which may lead to an inadvertent "revival" of a "dead" CPU and a
page fault leading to a kernel crash from it (Rafael Wysocki).
- Make memory management during resume from hibernation more
straightforward (Rafael Wysocki).
- Add debug features that should help to detect problems related to
hibernation and resume from it (Rafael Wysocki, Chen Yu).
- Clean up hibernation core somewhat (Rafael Wysocki).
- Prevent KASAN from instrumenting the hibernation core which leads
to large numbers of false-positives from it (James Morse).
- Prevent PM (hibernate and suspend) notifiers from being called
during the cleanup phase if they have not been called during the
corresponding preparation phase which is possible if one of the
other notifiers returns an error at that time (Lianwei Wang).
- Improve suspend-related debug printout in the tasks freezer and
clean up suspend-related console handling (Roger Lu, Borislav
Petkov).
- Update the AnalyzeSuspend script in the kernel sources to version
4.2 (Todd Brandt).
- Modify the generic power domains framework to make it handle system
suspend/resume better (Ulf Hansson).
- Make the runtime PM framework avoid resuming devices synchronously
when user space changes the runtime PM settings for them and
improve its error reporting (Rafael Wysocki, Linus Walleij).
- Fix error paths in devfreq drivers (exynos, exynos-ppmu,
exynos-bus) and in the core, make some devfreq code explicitly
non-modular and change some of it into tristate (Bartlomiej
Zolnierkiewicz, Peter Chen, Paul Gortmaker).
- Add DT support to the generic PM clocks management code and make it
export some more symbols (Jon Hunter, Paul Gortmaker).
- Make the PCI PM core code slightly more robust against possible
driver errors (Andy Shevchenko).
- Make it possible to change DESTDIR and PREFIX in turbostat (Andy
Shevchenko)"
* tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency"
PM / hibernate: Introduce test_resume mode for hibernation
cpufreq: export cpufreq_driver_resolve_freq()
cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()
PCI / PM: check all fields in pci_set_platform_pm()
cpufreq: acpi-cpufreq: use cached frequency mapping when possible
cpufreq: schedutil: map raw required frequency to driver frequency
cpufreq: add cpufreq_driver_resolve_freq()
cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT
intel_pstate: Update cpu_frequency tracepoint every time
cpufreq: intel_pstate: clean remnant struct element
PM / tools: scripts: AnalyzeSuspend v4.2
x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
cpufreq: powernv: Replacing pstate_id with frequency table index
intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
PM / hibernate: Image data protection during restoration
PM / hibernate: Add missing braces in __register_nosave_region()
PM / hibernate: Clean up comments in snapshot.c
PM / hibernate: Clean up function headers in snapshot.c
PM / hibernate: Add missing braces in hibernate_setup()
...
Page tables can bite a relatively big chunk off system memory and their
allocations are easy to trigger from userspace, so they should be
accounted to kmemcg.
This patch marks page table allocations as __GFP_ACCOUNT for x86. Note
we must not charge allocations of kernel page tables, because they can
be shared among processes from different cgroups so accounting them to a
particular one can pin other cgroups for indefinitely long. So we clear
__GFP_ACCOUNT flag if a page table is allocated for the kernel.
Link: http://lkml.kernel.org/r/7d5c54f6a2bcbe76f03171689440003d87e6c742.1464079538.git.vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull perf fixes from Ingo Molnar:
"This tree contains tooling fixes plus some additions:
- fixes to the vdso2c build environment that Stephen Rothwell is
using for the linux-next build (Arnaldo Carvalho de Melo)
- AVX-512 instruction mappings (Adrian Hunter)
- misc fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "perf tools: event.h needs asm/perf_regs.h"
x86: Make the vdso2c compiler use the host architecture headers
tools build: Fix objtool build with ARCH=x86_64
objtool: Always use host headers
objtool: Use tools/scripts/Makefile.arch to get ARCH and HOSTARCH
tools build: Add HOSTARCH Makefile variable
perf tests kmod-path: Fix build on ubuntu:16.04-x-armhf
perf tools: Add AVX-512 instructions to the new instructions test
perf tools: Add AVX-512 support to the instruction decoder used by Intel PT
x86/insn: Add AVX-512 support to the instruction decoder
x86/insn: perf tools: Fix vcvtph2ps instruction decoding
Pull x86 timer updates from Ingo Molnar:
"The main change in this tree is the reworking, fixing and extension of
the TSC frequency enumeration code (by Len Brown)"
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Remove the unused check_tsc_disabled()
x86/tsc: Enumerate BXT tsc_khz via CPUID
x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID
x86/tsc_msr: Remove irqoff around MSR-based TSC enumeration
x86/tsc_msr: Add Airmont reference clock values
x86/tsc_msr: Correct Silvermont reference clock values
x86/tsc_msr: Update comments, expand definitions
x86/tsc_msr: Remove debugging messages
x86/tsc_msr: Identify Intel-specific code
Revert "x86/tsc: Add missing Cherrytrail frequency to the table"
Pull x86 platform updates from Ingo Molnar:
"The main changes in this cycle were:
- Intel-SoC enhancements (Andy Shevchenko)
- Intel CPU symbolic model definition rework (Dave Hansen)
- ... other misc changes"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
x86/sfi: Enable enumeration of SD devices
x86/pci: Use MRFLD abbreviation for Merrifield
x86/platform/intel-mid: Make vertical indentation consistent
x86/platform/intel-mid: Mark regulators explicitly defined
x86/platform/intel-mid: Rename mrfl.c to mrfld.c
x86/platform/intel-mid: Enable spidev on Intel Edison boards
x86/platform/intel-mid: Extend PWRMU to support Penwell
x86/pci, x86/platform/intel_mid_pci: Remove duplicate power off code
x86/platform/intel-mid: Add pinctrl for Intel Merrifield
x86/platform/intel-mid: Enable GPIO expanders on Edison
x86/platform/intel-mid: Add Power Management Unit driver
x86/platform/atom/punit: Enable support for Merrifield
x86/platform/intel_mid_pci: Rework IRQ0 workaround
x86, thermal: Clean up and fix CPU model detection for intel_soc_dts_thermal
x86, mmc: Use Intel family name macros for mmc driver
x86/intel_telemetry: Use Intel family name macros for telemetry driver
x86/acpi/lss: Use Intel family name macros for the acpi_lpss driver
x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver
x86/platform: Use new Intel model number macros
x86/intel_idle: Use Intel family macros for intel_idle
...
Pull x86 fpu updates from Ingo Molnar:
"The main x86 FPU changes in this cycle were:
- a large series of cleanups, fixes and enhancements to re-enable the
XSAVES instruction on Intel CPUs - which is the most advanced
instruction to do FPU context switches (Yu-cheng Yu, Fenghua Yu)
- Add FPU tracepoints for the FPU state machine (Dave Hansen)"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Do not BUG_ON() in early FPU code
x86/fpu/xstate: Re-enable XSAVES
x86/fpu/xstate: Fix fpstate_init() for XRSTORS
x86/fpu/xstate: Return NULL for disabled xstate component address
x86/fpu/xstate: Fix __fpu_restore_sig() for XSAVES
x86/fpu/xstate: Fix xstate_offsets, xstate_sizes for non-extended xstates
x86/fpu/xstate: Fix XSTATE component offset print out
x86/fpu/xstate: Fix PTRACE frames for XSAVES
x86/fpu/xstate: Fix supervisor xstate component offset
x86/fpu/xstate: Align xstate components according to CPUID
x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use
x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization
x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size'
x86/fpu/xstate: Define and use 'fpu_user_xstate_size'
x86/fpu: Add tracepoints to dump FPU state at key points
Pull x86 stackdump update from Ingo Molnar:
"A number of stackdump enhancements"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/dumpstack: Add show_stack_regs() and use it
printk: Make the printk*once() variants return a value
x86/dumpstack: Honor supplied @regs arg
Pull x86 build updates from Ingo Molnar:
"A build system fix and a cleanup"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kbuild: Remove stale asm-generic wrappers
kbuild, x86: Track generated headers with generated-y
Pull x86 boot updates from Ingo Molnar:
"The main changes:
- add initial commits to randomize kernel memory section virtual
addresses, enabled via a new kernel option: RANDOMIZE_MEMORY
(Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu)
- enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees
Cook)
- EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar)
- misc cleanups/fixes"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Simplify EBDA-vs-BIOS reservation logic
x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does
x86/boot: Reorganize and clean up the BIOS area reservation code
x86/mm: Do not reference phys addr beyond kernel
x86/mm: Add memory hotplug support for KASLR memory randomization
x86/mm: Enable KASLR for vmalloc memory regions
x86/mm: Enable KASLR for physical mapping memory regions
x86/mm: Implement ASLR for kernel memory regions
x86/mm: Separate variable for trampoline PGD
x86/mm: Add PUD VA support for physical mapping
x86/mm: Update physical mapping variable names
x86/mm: Refactor KASLR entropy functions
x86/KASLR: Fix boot crash with certain memory configurations
x86/boot/64: Add forgotten end of function marker
x86/KASLR: Allow randomization below the load address
x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
x86/KASLR: Randomize virtual address separately
x86/KASLR: Clarify identity map interface
x86/boot: Refuse to build with data relocations
x86/KASLR, x86/power: Remove x86 hibernation restrictions
Pull x86 mm updates from Ingo Molnar:
"Various x86 low level modifications:
- preparatory work to support virtually mapped kernel stacks (Andy
Lutomirski)
- support for 64-bit __get_user() on 32-bit kernels (Benjamin
LaHaise)
- (involved) workaround for Knights Landing CPU erratum (Dave Hansen)
- MPX enhancements (Dave Hansen)
- mremap() extension to allow remapping of the special VDSO vma, for
purposes of user level context save/restore (Dmitry Safonov)
- hweight and entry code cleanups (Borislav Petkov)
- bitops code generation optimizations and cleanups with modern GCC
(H. Peter Anvin)
- syscall entry code optimizations (Paolo Bonzini)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
x86/mm/cpa: Add missing comment in populate_pdg()
x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
x86/smp: Remove unnecessary initialization of thread_info::cpu
x86/smp: Remove stack_smp_processor_id()
x86/uaccess: Move thread_info::addr_limit to thread_struct
x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
x86/dumpstack: When OOPSing, rewind the stack before do_exit()
x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
x86/dumpstack: Try harder to get a call trace on stack overflow
x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
x86/mm: Use pte_none() to test for empty PTE
x86/mm: Disallow running with 32-bit PTEs to work around erratum
x86/mm: Ignore A/D bits in pte/pmd/pud_none()
x86/mm: Move swap offset/type up in PTE to work around erratum
x86/entry: Inline enter_from_user_mode()
...
Pull x86/apic updates from Ingo Molnar:
"Misc cleanups and a small fix"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Remove the unused struct apic::apic_id_mask field
x86/apic: Fix misspelled APIC
x86/ioapic: Simplify ioapic_setup_resources()
Pull perf updates from Ingo Molnar:
"With over 300 commits it's been a busy cycle - with most of the work
concentrated on the tooling side (as it should).
The main kernel side enhancements were:
- Add per event callchain limit: Recently we introduced a sysctl to
tune the max-stack for all events for which callchains were
requested:
$ sysctl kernel.perf_event_max_stack
kernel.perf_event_max_stack = 127
Now this patch introduces a way to configure this per event, i.e.
this becomes possible:
$ perf record -e sched:*/max-stack=2/ -e block:*/max-stack=10/ -a
allowing finer tuning of how much buffer space callchains use.
This uses an u16 from the reserved space at the end, leaving
another u16 for future use.
There has been interest in even finer tuning, namely to control the
max stack for kernel and userspace callchains separately. Further
discussion is needed, we may for instance use the remaining u16 for
that and when it is present, assume that the sample_max_stack
introduced in this patch applies for the kernel, and the u16 left
is used for limiting the userspace callchain (Arnaldo Carvalho de
Melo)
- Optimize AUX event (hardware assisted side-band event) delivery
(Kan Liang)
- Rework Intel family name macro usage (this is partially x86 arch
work) (Dave Hansen)
- Refine and fix Intel LBR support (David Carrillo-Cisneros)
- Add support for Intel 'TopDown' events (Andi Kleen)
- Intel uncore PMU driver fixes and enhancements (Kan Liang)
- ... other misc changes.
Here's an incomplete list of the tooling enhancements (but there's
much more, see the shortlog and the git log for details):
- Support cross unwinding, i.e. collecting '--call-graph dwarf'
perf.data files in one machine and then doing analysis in another
machine of a different hardware architecture. This enables, for
instance, to do:
$ perf record -a --call-graph dwarf
on a x86-32 or aarch64 system and then do 'perf report' on it on a
x86_64 workstation (He Kuang)
- Allow reading from a backward ring buffer (one setup via
sys_perf_event_open() with perf_event_attr.write_backward = 1)
(Wang Nan)
- Finish merging initial SDT (Statically Defined Traces) support, see
cset comments for details about how it all works (Masami Hiramatsu)
- Support attaching eBPF programs to tracepoints (Wang Nan)
- Add demangling of symbols in programs written in the Rust language
(David Tolnay)
- Add support for tracepoints in the python binding, including an
example, that sets up and parses sched:sched_switch events,
tools/perf/python/tracepoint.py (Jiri Olsa)
- Introduce --stdio-color to set up the color output mode selection
in 'annotate' and 'report', allowing emit color escape sequences
when redirecting the output of these tools (Arnaldo Carvalho de
Melo)
- Add 'callindent' option to 'perf script -F', to indent the Intel PT
call stack, making this output more ftrace-like (Adrian Hunter,
Andi Kleen)
- Allow dumping the object files generated by llvm when processing
eBPF scriptlet events (Wang Nan)
- Add stackcollapse.py script to help generating flame graphs (Paolo
Bonzini)
- Add --ldlat option to 'perf mem' to specify load latency for loads
event (e.g. cpu/mem-loads/ ) (Jiri Olsa)
- Tooling support for Intel TopDown counters, recently added to the
kernel (Andi Kleen)"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (303 commits)
perf tests: Add is_printable_array test
perf tools: Make is_printable_array global
perf script python: Fix string vs byte array resolving
perf probe: Warn unmatched function filter correctly
perf cpu_map: Add more helpers
perf stat: Balance opening and reading events
tools: Copy linux/{hash,poison}.h and check for drift
perf tools: Remove include/linux/list.h from perf's MANIFEST
tools: Copy the bitops files accessed from the kernel and check for drift
Remove: kernel unistd*h files from perf's MANIFEST, not used
perf tools: Remove tools/perf/util/include/linux/const.h
perf tools: Remove tools/perf/util/include/asm/byteorder.h
perf tools: Add missing linux/compiler.h include to perf-sys.h
perf jit: Remove some no-op error handling
perf jit: Add missing curly braces
objtool: Initialize variable to silence old compiler
objtool: Add -I$(srctree)/tools/arch/$(ARCH)/include/uapi
perf record: Add --tail-synthesize option
perf session: Don't warn about out of order event if write_backward is used
perf tools: Enable overwrite settings
...
Pull locking updates from Ingo Molnar:
"The locking tree was busier in this cycle than the usual pattern - a
couple of major projects happened to coincide.
The main changes are:
- implement the atomic_fetch_{add,sub,and,or,xor}() API natively
across all SMP architectures (Peter Zijlstra)
- add atomic_fetch_{inc/dec}() as well, using the generic primitives
(Davidlohr Bueso)
- optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
Waiman Long)
- optimize smp_cond_load_acquire() on arm64 and implement LSE based
atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
on arm64 (Will Deacon)
- introduce smp_acquire__after_ctrl_dep() and fix various barrier
mis-uses and bugs (Peter Zijlstra)
- after discovering ancient spin_unlock_wait() barrier bugs in its
implementation and usage, strengthen its semantics and update/fix
usage sites (Peter Zijlstra)
- optimize mutex_trylock() fastpath (Peter Zijlstra)
- ... misc fixes and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
locking/static_keys: Fix non static symbol Sparse warning
locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
locking/atomic, arch/tile: Fix tilepro build
locking/atomic, arch/m68k: Remove comment
locking/atomic, arch/arc: Fix build
locking/Documentation: Clarify limited control-dependency scope
locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
locking/atomic, arch/mips: Convert to _relaxed atomics
locking/atomic, arch/alpha: Convert to _relaxed atomics
locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
locking/atomic: Fix atomic64_relaxed() bits
locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
...
Pull EFI updates from Ingo Molnar:
"The biggest change in this cycle were SGI/UV related changes that
clean up and fix UV boot quirks and problems.
There's also various smaller cleanups and refinements"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Reorganize the GUID table to make it easier to read
x86/efi: Remove the unused efi_get_time() function
x86/efi: Update efi_thunk() to use the the arch_efi_call_virt*() macros
x86/uv: Update uv_bios_call() to use efi_call_virt_pointer()
efi: Convert efi_call_virt() to efi_call_virt_pointer()
x86/efi: Remove unused variable 'efi'
efi: Document #define FOO_PROTOCOL_GUID layout
efibc: Report more information in the error messages
Currently we don't save ACPI ids (unlike LAPIC ids which go to
x86_cpu_to_apicid) from MADT and we may need this information later.
Particularly, ACPI ids is the only existent way for a PVHVM Xen guest
to figure out Xen's idea of its vCPUs ids before these CPUs boot and
in some cases these ids diverge from Linux's cpu ids.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Update cpuid.h header from xen hypervisor tree to get
XEN_HVM_CPUID_VCPU_ID_PRESENT definition.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* pm-sleep:
PM / hibernate: Introduce test_resume mode for hibernation
x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
PM / hibernate: Image data protection during restoration
PM / hibernate: Add missing braces in __register_nosave_region()
PM / hibernate: Clean up comments in snapshot.c
PM / hibernate: Clean up function headers in snapshot.c
PM / hibernate: Add missing braces in hibernate_setup()
PM / hibernate: Recycle safe pages after image restoration
PM / hibernate: Simplify mark_unsafe_pages()
PM / hibernate: Do not free preallocated safe pages during image restore
PM / suspend: show workqueue state in suspend flow
PM / sleep: make PM notifiers called symmetrically
PM / sleep: Make pm_prepare_console() return void
PM / Hibernate: Don't let kasan instrument snapshot.c
* pm-tools:
PM / tools: scripts: AnalyzeSuspend v4.2
tools/turbostat: allow user to alter DESTDIR and PREFIX
* acpi-tables:
ACPI: Rename configfs.c to acpi_configfs.c to prevent link error
ACPI: add support for loading SSDTs via configfs
ACPI: add support for configfs
efi / ACPI: load SSTDs from EFI variables
spi / ACPI: add support for ACPI reconfigure notifications
i2c / ACPI: add support for ACPI reconfigure notifications
ACPI: add support for ACPI reconfiguration notifiers
ACPI / scan: fix enumeration (visited) flags for bus rescans
ACPI / documentation: add SSDT overlays documentation
ACPI: ARM64: support for ACPI_TABLE_UPGRADE
ACPI / tables: introduce ARCH_HAS_ACPI_TABLE_UPGRADE
ACPI / tables: move arch-specific symbol to asm/acpi.h
ACPI / tables: table upgrade: refactor function definitions
ACPI / tables: table upgrade: use cacheable map for tables
Conflicts:
arch/arm64/include/asm/acpi.h
The pcommit instruction is being deprecated in favor of either ADR
(asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
Firmware Interface Table).
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This reverts commit 8b3e34e46a.
Given the deprecation of the pcommit instruction, the relevant VMX
features and CPUID bits are not going to be rolled into the SDM. Remove
their usage from KVM.
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
It doesn't just control probing for the EBDA -- it controls whether we
detect and reserve the <1MB BIOS regions in general.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/55bd591115498440d461857a7b64f349a5d911f3.1469135598.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add support for Intel's AVX-512 instructions to the instruction decoder.
AVX-512 instructions are documented in Intel Architecture Instruction
Set Extensions Programming Reference (February 2016).
AVX-512 instructions are identified by a EVEX prefix which, for the
purpose of instruction decoding, can be treated as though it were a
4-byte VEX prefix.
Existing instructions which can now accept an EVEX prefix need not be
further annotated in the op code map (x86-opcode-map.txt). In the case
of new instructions, the op code map is updated accordingly.
Also add associated Mask Instructions that are used to manipulate mask
registers used in AVX-512 instructions.
The 'perf tools' instruction decoder is updated in a subsequent patch.
And a representative set of instructions is added to the perf tools new
instructions test in a subsequent patch.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: X86 ML <x86@kernel.org>
Link: http://lkml.kernel.org/r/1469003437-32706-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So the reserve_ebda_region() code has accumulated a number of
problems over the years that make it really difficult to read
and understand:
- The calculation of 'lowmem' and 'ebda_addr' is an unnecessarily
interleaved mess of first lowmem, then ebda_addr, then lowmem tweaks...
- 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other
parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it
super confusing to read.
- It does not help at all that we have various memory range markers, half of which
are 'start of range', half of which are 'end of range' - but this crucial
property is not obvious in the naming at all ... gave me a headache trying to
understand all this.
- Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is
obvious, all values here are addresses!), while it does not highlight that it's
the _start_ of the EBDA region ...
- 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value
that is a pointer to a value, not a memory range address!
- The function name itself is a misnomer: it says 'reserve_ebda_region()' while
its main purpose is to reserve all the firmware ROM typically between 640K and
1MB, while the 'EBDA' part is only a small part of that ...
- Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this
too should be about whether to reserve firmware areas in the paravirt case.
- In fact thinking about this as 'end of RAM' is confusing: what this function
*really* wants to reserve is firmware data and code areas! Once the thinking is
inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure
'reserved area' notion everything becomes a lot clearer.
To improve all this rewrite the whole code (without changing the logic):
- Firstly invert the naming from 'lowmem end' to 'BIOS reserved area start'
and propagate this concept through all the variable names and constants.
BIOS_RAM_SIZE_KB_PTR // was: BIOS_LOWMEM_KILOBYTES
BIOS_START_MIN // was: INSANE_CUTOFF
ebda_start // was: ebda_addr
bios_start // was: lowmem
BIOS_START_MAX // was: LOWMEM_CAP
- Then clean up the name of the function itself by renaming it
to reserve_bios_regions() and renaming the ::ebda_search paravirt
flag to ::reserve_bios_regions.
- Fix up all the comments (fix typos), harmonize and simplify their
formulation and remove comments that become unnecessary due to
the much better naming all around.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Monitored cached line may not wake up from mwait on certain
Goldmont based CPUs. This patch will avoid calling
current_set_polling_and_test() and thereby not set the TIF_ flag.
The result is that we'll always send IPIs for wakeups.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1468867270-18493-1-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On Intel hardware, native_play_dead() uses mwait_play_dead() by
default and only falls back to the other methods if that fails.
That also happens during resume from hibernation, when the restore
(boot) kernel runs disable_nonboot_cpus() to take all of the CPUs
except for the boot one offline.
However, that is problematic, because the address passed to
__monitor() in mwait_play_dead() is likely to be written to in the
last phase of hibernate image restoration and that causes the "dead"
CPU to start executing instructions again. Unfortunately, the page
containing the address in that CPU's instruction pointer may not be
valid any more at that point.
First, that page may have been overwritten with image kernel memory
contents already, so the instructions the CPU attempts to execute may
simply be invalid. Second, the page tables previously used by that
CPU may have been overwritten by image kernel memory contents, so the
address in its instruction pointer is impossible to resolve then.
A report from Varun Koyyalagunta and investigation carried out by
Chen Yu show that the latter sometimes happens in practice.
To prevent it from happening, temporarily change the smp_ops.play_dead
pointer during resume from hibernation so that it points to a special
"play dead" routine which uses hlt_play_dead() and avoids the
inadvertent "revivals" of "dead" CPUs this way.
A slightly unpleasant consequence of this change is that if the
system is hibernated with one or more CPUs offline, it will generally
draw more power after resume than it did before hibernation, because
the physical state entered by CPUs via hlt_play_dead() is higher-power
than the mwait_play_dead() one in the majority of cases. It is
possible to work around this, but it is unclear how much of a problem
that's going to be in practice, so the workaround will be implemented
later if it turns out to be necessary.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371
Reported-by: Varun Koyyalagunta <cpudebug@centtech.com>
Original-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
check_tsc_disabled() was introduced by commit:
c73deb6aec ("perf/x86: Add ability to calculate TSC from perf sample timestamps")
The only caller was arch_perf_update_userpage(), which had been refactored
by commit:
d8b11a0cbd ("perf/x86: Clean up cap_user_time* setting")
... so no need keep and export it any more.
Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: a.p.zijlstra@chello.nl
Cc: adrian.hunter@intel.com
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/1468570330-25810-1-git-send-email-weijg.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Don't use the same syscall numbers for 2 different syscalls:
534 x32 preadv compat_sys_preadv64
535 x32 pwritev compat_sys_pwritev64
534 x32 preadv2 compat_sys_preadv2
535 x32 pwritev2 compat_sys_pwritev2
Add compat_sys_preadv64v2() and compat_sys_pwritev64v2() so that 64-bit offset
is passed in one 64-bit register on x32, similar to compat_sys_preadv64()
and compat_sys_pwritev64().
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/CAMe9rOovCMf-RQfx_n1U_Tu_DX1BYkjtFr%3DQ4-_PFVSj9BCzUA@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It serves no purpose -- raw_smp_processor_id() works fine. This
change will be needed to move thread_info off the stack.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a2bf4f07fbc30fb32f9f7f3f8f94ad3580823847.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
struct thread_info is a legacy mess. To prepare for its partial removal,
move thread_info::addr_limit out.
As an added benefit, this way is simpler.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/15bee834d09402b47ac86f2feccdf6529f9bc5b0.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rename it to match the thread_struct::uaccess_err pattern and also
because it was too long.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
struct thread_info is a legacy mess. To prepare for its partial removal,
move the uaccess control fields out -- they're straightforward.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d0ac4d01c8e4d4d756264604e47445d5acc7900e.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>