Change the callers of walk_iomem_res() scanning for the
following resources by name to use walk_iomem_res_desc()
instead.
"ACPI Tables"
"ACPI Non-volatile Storage"
"Persistent Memory (legacy)"
"Crash kernel"
Note, the caller of walk_iomem_res() with "GART" will be removed
in a later patch.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chun-Yi <joeyli.kernel@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Lee, Chun-Yi <joeyli.kernel@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Minfei Huang <mnfhuang@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Takao Indoh <indou.takao@jp.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: kexec@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm <linux-mm@kvack.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1453841853-11383-15-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change e820_reserve_resources() to set 'flags' and 'desc' from
e820 types.
Set E820_RESERVED_KERN and E820_RAM's (System RAM) io resource
type to IORESOURCE_SYSTEM_RAM.
Do the same for "Kernel data", "Kernel code", and "Kernel bss",
which are child nodes of System RAM.
I/O resource descriptor is set to 'desc' for entries that are
(and will be) target ranges of walk_iomem_res() and
region_intersects().
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm <linux-mm@kvack.org>
Link: http://lkml.kernel.org/r/1453841853-11383-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Save 25 bytes of code and make the bootup a tiny bit faster:
text data bss dec filename
9735144 4970776 15474688 30180608 vmlinux.old
9735119 4970776 15474688 30180583 vmlinux
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454140872-16926-1-git-send-email-kuleshovmail@gmail.com
[ Fixed various small details. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This will let us specify something like 'sys_xyz/foo' instead of
'sys_xyz' in the syscall table, where the 'foo' qualifier conveys
some extra information to the C code.
The intent is to allow things like sys_execve/ptregs to indicate
that sys_execve() touches pt_regs.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2de06e33dce62556b3ec662006fcb295504e296e.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rather than duplicating the compat entry handling in all
consumers of syscalls_BITS.h, handle it directly in
syscalltbl.sh. Now we generate entries in syscalls_32.h like:
__SYSCALL_I386(5, sys_open)
__SYSCALL_I386(5, compat_sys_open)
and all of its consumers implicitly get the right entry point.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b7c2b501dc0e6e43050e916b95807c3e2e16e9bb.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The common/64/x32 distinction has no effect other than
determining which kernels actually support the syscall. Move
the logic into syscalltbl.sh.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/58d4a95f40e43b894f93288b4a3633963d0ee22e.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following commit:
f8e617f458 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs")
adds memory barriers around clflush(), but this seems wrong for UP since
barrier() has no effect on clflush(). We really want MFENCE, so switch
to mb() instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization <virtualization@lists.linux-foundation.org>
Link: http://lkml.kernel.org/r/1453921746-16178-5-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When calling intel_alt_er() with .idx != EXTRA_REG_RSP_* we will not
initialize alt_idx and then use this uninitialized value to index an
array.
When that is not fatal, it can result in an infinite loop in its
caller __intel_shared_reg_get_constraints(), with IRQs disabled.
Alternative error modes are random memory corruption due to the
cpuc->shared_regs->regs[] array overrun, which manifest in either
get_constraints or put_constraints doing weird stuff.
Only took 6 hours of painful debugging to find this. Neither GCC nor
Smatch warnings flagged this bug.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: ae3f011fc2 ("perf/x86/intel: Fix SLM MSR_OFFCORE_RSP1 valid_mask")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
L3_PAGE_OFFSET was introduced in commit a6523748bd (paravirt/x86, 64-bit: move
__PAGE_OFFSET to leave a space for hypervisor), but has no users.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Link: http://lkml.kernel.org/r/1453810881-30622-1-git-send-email-kuleshovmail@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch enables the uncore_imc PMU for Intel
SkyLake Desktop processors (Core i7-6700, model 94).
It is possible to compute memory read/write bandwidth
using:
$ perf stat -a -e uncore_imc/data_reads/,uncore_imc/data_writes/ ....
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1452151546-8853-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the stuff currently only used by the kexec file code within
CONFIG_KEXEC_FILE (and CONFIG_KEXEC_VERIFY_SIG).
Also move internal "struct kexec_sha_region" and "struct kexec_buf" into
"kexec_internal.h".
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's a short window in which very early mappings can end up
with NX clear because they are created before we've noticed that
we have NX.
It turns out that we detect NX very early, so there's no need to
defer __supported_pte_mask setup.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2b544627345f7110160545a3f47031eb45c3ad4f.1453239349.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit a5d90c923b ("x86/efi: Quirk out SGI UV") added a quirk
to efi_apply_memmap_quirks to force SGI UV systems to fall back
to the old EFI memmap mechanism. We have a BIOS fix for this
issue on all systems except for UV1. This commit fixes up the
EFI quirk/MMR mapping code so that we only apply the special
case to UV1 hardware.
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1449867585-189233-2-git-send-email-athorlton@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Intel Tangier SoC is known to have 64-bit dual core CPU. Enable
64-bit build for it.
The kernel has been tested on Intel Edison board:
Linux buildroot 4.4.0-next-20160115+ #25 SMP Fri Jan 15 22:03:19 EET 2016 x86_64 GNU/Linux
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 74
model name : Genuine Intel(R) CPU 4000 @ 500MHz
stepping : 8
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1452888668-147116-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge second patch-bomb from Andrew Morton:
- more MM stuff:
- Kirill's page-flags rework
- Kirill's now-allegedly-fixed THP rework
- MADV_FREE implementation
- DAX feature work (msync/fsync). This isn't quite complete but DAX
is new and it's good enough and the guys have a handle on what
needs to be done - I expect this to be wrapped in the next week or
two.
- some vsprintf maintenance work
- various other misc bits
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (145 commits)
printk: change recursion_bug type to bool
lib/vsprintf: factor out %pN[F] handler as netdev_bits()
lib/vsprintf: refactor duplicate code to special_hex_number()
printk-formats.txt: remove unimplemented %pT
printk: help pr_debug and pr_devel to optimize out arguments
lib/test_printf.c: test dentry printing
lib/test_printf.c: add test for large bitmaps
lib/test_printf.c: account for kvasprintf tests
lib/test_printf.c: add a few number() tests
lib/test_printf.c: test precision quirks
lib/test_printf.c: check for out-of-bound writes
lib/test_printf.c: don't BUG
lib/kasprintf.c: add sanity check to kvasprintf
lib/vsprintf.c: warn about too large precisions and field widths
lib/vsprintf.c: help gcc make number() smaller
lib/vsprintf.c: expand field_width to 24 bits
lib/vsprintf.c: eliminate potential race in string()
lib/vsprintf.c: move string() below widen_string()
lib/vsprintf.c: pull out padding code from dentry_name()
printk: do cond_resched() between lines while outputting to consoles
...
We've had quite busy weeks in this cycle. Looking at ALSA core, the
significant changes are a few fixes wrt timer and sequencer ioctls
that have been revealed by fuzzer recently. Other than that, ASoC
core got a few updates about DAI link handling, but these are rather
straightforward refactoring.
In drivers scene, ASoC received quite lots of new drivers in addition
to bunch of updates for still ongoing Intel Skylake support and
topology API. HD-audio gained a new HDMI/DP hotplug notification via
component. FireWire got a pile of code refactoring/updates with
SCS.1x driver integration.
More highlights are shown below.
[NOTE: this contains also many commits for DRM. This is due to the
pull of drm stable branch into sound tree, as the base of i915 audio
component work for HD-audio. The highlights below don't contain
these DRM changes, as these are supposed to be pulled via drm tree in
anyway sooner or later.]
Core
- Handful fixes to harden ALSA timer and sequencer ioctls against
races reported by syzkaller fuzzer
- Irq description string can be unique to each card; only for
HD-audio for now
ASoC
- Conversion of the array of DAI links to a list for supporting
dynamically adding and removing DAI links
- Topology API enhancements to make everything more component based
and being able to specify PCM links via topology
- Some more fixes for the topology code, though it is still not final
and ready for enabling in production; we really need to get to the
point where that can be done
- A pile of changes for Intel SkyLake drivers which hopefully deliver
some useful initial functionality for systems with this chipset,
though there is more work still to come
- Lots of new features and cleanups for the Renesas drivers
- ANC support for WM5110
- New drivers: Imagination Technologies IPs, Atmel class D speaker,
Cirrus CS47L24 and WM1831, Dialog DA7128, Realtek RT5659 and
RT56156, Rockchip RK3036, TI PC3168A, and AMD ACP
- Rename PCM1792a driver to be generic pcm179x
HD-Audio
- Use audio component for i915 HDMI/DP hotplug handling
- On-demand binding with i915 driver
- bdl_pos_adj parameter adjustment for Baytrail controllers
- Enable power_save_node for CX20722; this shouldn't lead to
regression, hopefully
- Kabylake HDMI/DP codec support
- Quirks for Lenovo E50-80, Dell Latitude E-series, and other Dell
machines
- A few code refactoring
FireWire
- Lots of code cleanup and refactoring
- Integrate the support of SCS.1x devices into snd-oxfw driver;
snd-scs1x driver is obsoleted
USB-audio
- Fix possible NULL dereference at disconnection
- A regression fix for Native Instruments devices
Misc
- A few code cleanups of fm801 driver
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJWmmhNAAoJEGwxgFQ9KSmk/wsP/3eO+giAT9VRPa6qxR6VdT6I
dZwTxcp4ZzUrgLxk9k5VYjqey6QL+1xWfl3Abrd+NzXDj1wo4KsDh2XCKG1btO9K
UpIZf76Nzt7o91pzHbsU6mrjDeoVNqloZoGbg1utAmmegaXH3owd18p/ZHfE3sz2
BbaHmYW/R8lnaBgBhzqJB97+zRaLJmMWpWHfpHaIPjdfw8/V4j76jtPnpmv2hDZl
BHXVHcQXjVGunFRzxdzBLuTC+FmhzUeTAbbAdOT4fEoOCv5MtZqYppNxdhj+b9l5
mrsXe5FBTNmrt9Z5TtfCuzgJPkzoDperFb0aKd7wI1jVMtLzkNCMlanHr9U6B6fr
jSrs6l25xrpF1BBfRMfHjNudA5vng/XC5dtW00JofXSrIxtwPNUoDDiqJgw7xVm5
aVWK7KkQIjRbHdCQaeTymv70oHHKei92hbCrXUobXZ7wLeJMXNVPT25ttChWrgAI
7cu5h+K5PjReI/sJFTMPL4aHZ+jAn9quQl7vK8EXiL9E6G8lLiuBiVW6hjGd9At+
Z6UyGV+nCM6O3qZcyParMuLkNtWx9uT7Pcn8oTZAdKPngNhsf8+yl9qmsFkNLDC4
LKPx0+rdCjtMKn2du3krsHhG3EN9pLDrE6g5U3d6Cz83e69Y7fCuSjl31SjD91H0
bZDcM/ejYSbid3yKN4TL
=Gvgb
-----END PGP SIGNATURE-----
Merge tag 'sound-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"We've had quite busy weeks in this cycle. Looking at ALSA core, the
significant changes are a few fixes wrt timer and sequencer ioctls
that have been revealed by fuzzer recently. Other than that, ASoC
core got a few updates about DAI link handling, but these are rather
straightforward refactoring.
In drivers scene, ASoC received quite lots of new drivers in addition
to bunch of updates for still ongoing Intel Skylake support and
topology API. HD-audio gained a new HDMI/DP hotplug notification via
component. FireWire got a pile of code refactoring/updates with
SCS.1x driver integration.
More highlights are shown below.
[ NOTE: this contains also many commits for DRM. This is due to the
pull of drm stable branch into sound tree, as the base of i915 audio
component work for HD-audio. The highlights below don't contain
these DRM changes, as these are supposed to be pulled via drm tree
in anyway sooner or later. ]
Core:
- Handful fixes to harden ALSA timer and sequencer ioctls against
races reported by syzkaller fuzzer
- Irq description string can be unique to each card; only for
HD-audio for now
ASoC:
- Conversion of the array of DAI links to a list for supporting
dynamically adding and removing DAI links
- Topology API enhancements to make everything more component based
and being able to specify PCM links via topology
- Some more fixes for the topology code, though it is still not final
and ready for enabling in production; we really need to get to the
point where that can be done
- A pile of changes for Intel SkyLake drivers which hopefully deliver
some useful initial functionality for systems with this chipset,
though there is more work still to come
- Lots of new features and cleanups for the Renesas drivers
- ANC support for WM5110
- New drivers: Imagination Technologies IPs, Atmel class D speaker,
Cirrus CS47L24 and WM1831, Dialog DA7128, Realtek RT5659 and
RT56156, Rockchip RK3036, TI PC3168A, and AMD ACP
- Rename PCM1792a driver to be generic pcm179x
HD-Audio:
- Use audio component for i915 HDMI/DP hotplug handling
- On-demand binding with i915 driver
- bdl_pos_adj parameter adjustment for Baytrail controllers
- Enable power_save_node for CX20722; this shouldn't lead to
regression, hopefully
- Kabylake HDMI/DP codec support
- Quirks for Lenovo E50-80, Dell Latitude E-series, and other Dell
machines
- A few code refactoring
FireWire:
- Lots of code cleanup and refactoring
- Integrate the support of SCS.1x devices into snd-oxfw driver;
snd-scs1x driver is obsoleted
USB-audio:
- Fix possible NULL dereference at disconnection
- A regression fix for Native Instruments devices
Misc:
- A few code cleanups of fm801 driver"
* tag 'sound-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (722 commits)
ALSA: timer: Code cleanup
ALSA: timer: Harden slave timer list handling
ALSA: hda - Add fixup for Dell Latitidue E6540
ALSA: timer: Fix race among timer ioctls
ALSA: hda - add codec support for Kabylake display audio codec
ALSA: timer: Fix double unlink of active_list
ALSA: usb-audio: Fix mixer ctl regression of Native Instrument devices
ALSA: hda - fix the headset mic detection problem for a Dell laptop
ALSA: hda - Fix white noise on Dell Latitude E5550
ALSA: hda_intel: add card number to irq description
ALSA: seq: Fix race at timer setup and close
ALSA: seq: Fix missing NULL check at remove_events ioctl
ALSA: usb-audio: Avoid calling usb_autopm_put_interface() at disconnect
ASoC: hdac_hdmi: remove unused hdac_hdmi_query_pin_connlist
ASoC: AMD: Add missing include file
ALSA: hda - Fixup inverted internal mic for Lenovo E50-80
ALSA: usb: Add native DSD support for Oppo HA-1
ASoC: Make aux_dev more like a generic component
ASoC: bcm2835: cleanup includes by ordering them alphabetically
ASoC: AMD: Manage ACP 2.x SRAM banks power
...
We still can end up with a stale vector due to the following:
CPU0 CPU1 CPU2
lock_vector()
data->move_in_progress=0
sendIPI()
unlock_vector()
set_affinity()
assign_irq_vector()
lock_vector() handle_IPI
move_in_progress = 1 lock_vector()
unlock_vector()
move_in_progress == 1
So we need to serialize the vector assignment against a pending cleanup. The
solution is rather simple now. We not only check for the move_in_progress flag
in assign_irq_vector(), we also check whether there is still a cleanup pending
in the old_domain cpumask. If so, we return -EBUSY to the caller and let him
deal with it. Though we have to be careful in the cpu unplug case. If the
cleanout has not yet completed then the following setaffinity() call would
return -EBUSY. Add code which prevents this.
Full context is here: http://lkml.kernel.org/r/5653B688.4050809@stratus.com
Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160107.207265407@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
First of all there is no point in looking up the irq descriptor again, but we
also need the descriptor for the final cleanup race fix in the next
patch. Make that change seperate. No functional difference.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160107.125211743@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We want to synchronize new vector assignments with a pending cleanup. Remove a
dying cpu from a pending cleanup mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160107.045961667@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There is no need to allocate a new cpumask for sending the cleanup vector. The
old_domain mask is now protected by the vector_lock, so we can safely remove
the offline cpus from it and send the IPI with the resulting mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.967993932@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
send_cleanup_vector() fiddles with the old_domain mask unprotected because it
relies on the protection by the move_in_progress flag. But this is fatal, as
the flag is reset after the IPI has been sent. So a cpu which receives the IPI
can still see the flag set and therefor ignores the cleanup request. If no
other cleanup request happens then the vector stays stale on that cpu and in
case of an irq removal the vector still persists. That can lead to use after
free when the next cleanup IPI happens.
Protect the code with vector_lock and clear move_in_progress before sending
the IPI.
This does not plug the race which Joe reported because:
CPU0 CPU1 CPU2
lock_vector()
data->move_in_progress=0
sendIPI()
unlock_vector()
set_affinity()
assign_irq_vector()
lock_vector() handle_IPI
move_in_progress = 1 lock_vector()
unlock_vector()
move_in_progress == 1
The full fix comes with a later patch.
Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.892412198@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
No point of keeping offline cpus in the cleanup mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.808642683@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reusing an existing vector and assigning a new vector has duplicated
code. Consolidate it.
This is also a preparatory patch for finally plugging the cleanup race.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.721599216@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In the case that the new vector mask is a subset of the existing mask there is
no point to do a AND operation of currentmask & newmask. The result is
newmask. So we can simply copy the new mask to the current mask and be done
with it. Preparatory patch for further consolidation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.640253454@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
__assign_irq_vector() uses the vector_cpumask which is assigned by
apic->vector_allocation_domain() without doing basic sanity checks. That can
result in a situation where the final assignement of a newly found vector
fails in apic->cpu_mask_to_apicid_and(). So we have to do rollbacks for no
reason.
apic->cpu_mask_to_apicid_and() only fails if
vector_cpumask & requested_cpumask & cpu_online_mask
is empty.
Check for this condition right away and if the result is empty try immediately
the next possible cpu in the requested mask. So in case of a failure the old
setting is unchanged and we can remove the rollback code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.561877324@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Split out the code which advances the target cpu for the search so we can
reuse it for the next patch which adds an early validation check for the
vectormask which we get from the apic.
Add comments while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.484562040@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use an explicit goto for the cases where we have success in the search/update
and return -ENOSPC if the search loop ends due to no space.
Preparatory patch for fixes. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.403491024@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Function __assign_irq_vector() makes use of apic_chip_data.old_domain as a
temporary buffer, which is in the way of using apic_chip_data.old_domain for
synchronizing the vector cleanup with the vector assignement code.
Use a proper temporary cpumask for this.
[ tglx: Renamed the mask to searched_cpumask for clarity ]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/1450880014-11741-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In fixup_irqs() we unconditionally dereference the irq chip of an irq
descriptor. The descriptor might still be valid, but already cleaned up,
i.e. the chip removed. Add a check for this condition.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.236423282@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There's a race condition between
x86_vector_free_irqs()
{
free_apic_chip_data(irq_data->chip_data);
xxxxx //irq_data->chip_data has been freed, but the pointer
//hasn't been reset yet
irq_domain_reset_irq_data(irq_data);
}
and
smp_irq_move_cleanup_interrupt()
{
raw_spin_lock(&vector_lock);
data = apic_chip_data(irq_desc_get_irq_data(desc));
access data->xxxx // may access freed memory
raw_spin_unlock(&desc->lock);
}
which may cause smp_irq_move_cleanup_interrupt() to access freed memory.
Call irq_domain_reset_irq_data(), which clears the pointer with vector lock
held.
[ tglx: Free memory outside of lock held region. ]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/1450880014-11741-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
setup_ioapic_dest() calls irqchip->irq_set_affinity() completely
unprotected. That's wrong in several aspects:
- it opens a race window where irq_set_affinity() can be interrupted and the
irq chip left in unconsistent state.
- it triggers a lockdep splat when we fix the vector race for 4.3+ because
vector lock is taken with interrupts enabled.
The proper calling convention is irq descriptor lock held and interrupts
disabled.
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1601140919420.3575@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull livepatching updates from Jiri Kosina:
- RO/NX attribute fixes for patch module relocations from Josh
Poimboeuf. As part of this effort, module.c has been cleaned up as
well and livepatching is piggy-backing on this cleanup. Rusty is OK
with this whole lot going through livepatching tree.
- symbol disambiguation support from Chris J Arges. That series is
also
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
but this came in only after I've alredy pushed out. Didn't want to
rebase because of that, hence I am mentioning it here.
- symbol lookup fix from Miroslav Benes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Cleanup module page permission changes
module: keep percpu symbols in module's symtab
module: clean up RO/NX handling.
module: use a structure to encapsulate layout.
gcov: use within_module() helper.
module: Use the same logic for setting and unsetting RO/NX
livepatch: function,sympos scheme in livepatch sysfs directory
livepatch: add sympos as disambiguator field to klp_reloc
livepatch: add old_sympos as disambiguator field to klp_func
Pull x86 fixes from Ingo Molnar:
"Misc changes:
- fix lguest bug
- fix /proc/meminfo output on certain configs
- fix pvclock bug
- fix reboot on certain iMacs by adding new reboot quirk
- fix bootup crash
- fix FPU boot line option parsing
- add more x86 self-tests
- small cleanups, documentation improvements, etc"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/amd: Remove an unneeded condition in srat_detect_node()
x86/vdso/pvclock: Protect STABLE check with the seqcount
x86/mm: Improve switch_mm() barrier comments
selftests/x86: Test __kernel_sigreturn and __kernel_rt_sigreturn
x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[]
lguest: Map switcher text R/O
x86/boot: Hide local labels in verify_cpu()
x86/fpu: Disable AVX when eagerfpu is off
x86/fpu: Disable MPX when eagerfpu is off
x86/fpu: Disable XGETBV1 when no XSAVE
x86/fpu: Fix early FPU command-line parsing
x86/mm: Use PAGE_ALIGNED instead of IS_ALIGNED
selftests/x86: Disable the ldt_gdt_64 test for now
x86/mm/pat: Make split_page_count() check for empty levels to fix /proc/meminfo output
x86/boot: Double BOOT_HEAP_SIZE to 64KB
x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
Originally we calculated ht_nodeid as "ht_nodeid = apicid -
boot_cpu_id;" so presumably it could be negative.
But after commit:
01aaea1afb ('x86: introduce initial apicid')
we use c->initial_apicid which is an unsigned short and thus always >= 0.
It causes a static checker warning to test for impossible
conditions so let's remove it.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Link: http://lkml.kernel.org/r/20160113123940.GE19993@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
minor fixes.
Here's what else is new:
o A new TRACE_EVENT_FN_COND macro, combining both _FN and _COND for
those that want both.
o New selftest to test the instance create and delete
o Better debug output when ftrace fails
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWlU8tAAoJEKKk/i67LK/8JckH/2XIhjwMunm35uCg1308sDqy
d44G3+p0pm8ztjBf8iD8wH2nP3m7z+nC8JBmSPIUgAHsKOYHWsBy2A/36OVWv5lK
1hVXvBwOuZXnyWXr7bC2RO9S9f9acSFaabZXWDi1BCJRJSgEcknz32V7ZAL4jOCO
SfBWBNrWJfUsURbfbElfVxPLArvyUg9Bb5dW5B+QFf6PuoJaORYzNLYXHlbsq++T
WlrlnD+mFZ/DKFZ/gl3FMSGMPaGimw09/3eqMzv/tLQobp6PbCWlJTwjUoxJ/9dO
XOY4sWUrUUZilU8qCk0i0ZSEumWmE+SWS3eq+Ef18B/5haIj/LkoM4UQD3h2Rc4=
=FDR+
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Not much new with tracing for this release. Mostly just clean ups and
minor fixes.
Here's what else is new:
- A new TRACE_EVENT_FN_COND macro, combining both _FN and _COND for
those that want both.
- New selftest to test the instance create and delete
- Better debug output when ftrace fails"
* tag 'trace-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (24 commits)
ftrace: Fix the race between ftrace and insmod
ftrace: Add infrastructure for delayed enabling of module functions
x86: ftrace: Fix the comments for ftrace_modify_code_direct()
tracing: Fix comment to use tracing_on over tracing_enable
metag: ftrace: Fix the comments for ftrace_modify_code
sh: ftrace: Fix the comments for ftrace_modify_code()
ia64: ftrace: Fix the comments for ftrace_modify_code()
ftrace: Clean up ftrace_module_init() code
ftrace: Join functions ftrace_module_init() and ftrace_init_module()
tracing: Introduce TRACE_EVENT_FN_COND macro
tracing: Use seq_buf_used() in seq_buf_to_user() instead of len
bpf: Constify bpf_verifier_ops structure
ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too
ftrace: Remove use of control list and ops
ftrace: Fix output of enabled_functions for showing tramp
ftrace: Fix a typo in comment
ftrace: Show all tramps registered to a record on ftrace_bug()
ftrace: Add variable ftrace_expected for archs to show expected code
ftrace: Add new type to distinguish what kind of ftrace_bug()
tracing: Update cond flag when enabling or disabling a trigger
...
Pull misc vfs updates from Al Viro:
"All kinds of stuff. That probably should've been 5 or 6 separate
branches, but by the time I'd realized how large and mixed that bag
had become it had been too close to -final to play with rebasing.
Some fs/namei.c cleanups there, memdup_user_nul() introduction and
switching open-coded instances, burying long-dead code, whack-a-mole
of various kinds, several new helpers for ->llseek(), assorted
cleanups and fixes from various people, etc.
One piece probably deserves special mention - Neil's
lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets
called without ->i_mutex and tries to avoid ever taking it. That, of
course, means that it's not useful for any directory modifications,
but things like getting inode attributes in nfds readdirplus are fine
with that. I really should've asked for moratorium on lookup-related
changes this cycle, but since I hadn't done that early enough... I
*am* asking for that for the coming cycle, though - I'm going to try
and get conversion of i_mutex to rwsem with ->lookup() done under lock
taken shared.
There will be a patch closer to the end of the window, along the lines
of the one Linus had posted last May - mechanical conversion of
->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
inode_is_locked()/inode_lock_nested(). To quote Linus back then:
-----
| This is an automated patch using
|
| sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
| sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
| sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
| sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
| sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
|
| with a very few manual fixups
-----
I'm going to send that once the ->i_mutex-affecting stuff in -next
gets mostly merged (or when Linus says he's about to stop taking
merges)"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
nfsd: don't hold i_mutex over userspace upcalls
fs:affs:Replace time_t with time64_t
fs/9p: use fscache mutex rather than spinlock
proc: add a reschedule point in proc_readfd_common()
logfs: constify logfs_block_ops structures
fcntl: allow to set O_DIRECT flag on pipe
fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
fs: xattr: Use kvfree()
[s390] page_to_phys() always returns a multiple of PAGE_SIZE
nbd: use ->compat_ioctl()
fs: use block_device name vsprintf helper
lib/vsprintf: add %*pg format specifier
fs: use gendisk->disk_name where possible
poll: plug an unused argument to do_poll
amdkfd: don't open-code memdup_user()
cdrom: don't open-code memdup_user()
rsxx: don't open-code memdup_user()
mtip32xx: don't open-code memdup_user()
[um] mconsole: don't open-code memdup_user_nul()
[um] hostaudio: don't open-code memdup_user()
...
Without the reboot=pci method, the iMac 10,1 simply
hangs after printing "Restarting system" at the point
when it should reboot. This fixes it.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1450466646-26663-1-git-send-email-mario.kleiner.de@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When "eagerfpu=off" is given as a command-line input, the kernel
should disable AVX support.
The Task Switched bit used for lazy context switching does not
support AVX. If AVX is enabled without eagerfpu context
switching, one task's AVX state could become corrupted or leak
to other tasks. This is a bug and has bad security implications.
This only affects systems that have AVX/AVX2/AVX512 and this
issue will be found only when one actually uses AVX/AVX2/AVX512
_AND_ does eagerfpu=off.
Reference: Intel Software Developer's Manual Vol. 3A
Sec. 2.5 Control Registers:
TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the
x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch
to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4
instruction is actually executed by the new task.
Sec. 13.4.1 Using the TS Flag to Control the Saving of the X87
FPU and SSE State
When the TS flag is set, the processor monitors the instruction
stream for x87 FPU, MMX, SSE instructions. When the processor
detects one of these instructions, it raises a
device-not-available exeception (#NM) prior to executing the
instruction.
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-5-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This issue is a fallout from the command-line parsing move.
When "eagerfpu=off" is given as a command-line input, the kernel
should disable MPX support. The decision for turning off MPX was
made in fpu__init_system_ctx_switch(), which is after the
selection of the XSAVE format. This patch fixes it by getting
that decision done earlier in fpu__init_system_xstate().
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-4-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When "noxsave" is given as a command-line input, the kernel
should disable XGETBV1. This issue currently does not cause any
actual problems. XGETBV1 is only useful if we have something
using the 'init optimization' (i.e. xsaveopt, xsaves). We
already clear both of those in fpu__xstate_clear_all_cpu_caps().
But this is good for completeness.
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-3-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The function fpu__init_system() is executed before
parse_early_param(). This causes wrong FPU configuration. This
patch fixes this issue by parsing boot_command_line in the
beginning of fpu__init_system().
With all four patches in this series, each parameter disables
features as the following:
eagerfpu=off: eagerfpu, avx, avx2, avx512, mpx
no387: fpu
nofxsr: fxsr, fxsropt, xmm
noxsave: xsave, xsaveopt, xsaves, xsavec, avx, avx2, avx512,
mpx, xgetbv1 noxsaveopt: xsaveopt
noxsaves: xsaves
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-2-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 platform updates from Ingo Molnar:
"Two changes:
- one to quirk-save/restore certain system MSRs across
suspend/resume, to make certain Intel systems work better
(Chen Yu)
- and also to constify a read only structure (Julia Lawall)"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/calgary: Constify cal_chipset_ops structures
x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume
Pull x86 mm updates from Ingo Molnar:
"The main changes in this cycle were:
- make the debugfs 'kernel_page_tables' file read-only, as it only
has read ops. (Borislav Petkov)
- micro-optimize clflush_cache_range() (Chris Wilson)
- swiotlb enhancements, which fixes certain KVM emulated devices
(Igor Mammedov)
- fix an LDT related debug message (Jan Beulich)
- modularize CONFIG_X86_PTDUMP (Kees Cook)
- tone down an overly alarming warning (Laura Abbott)
- Mark variable __initdata (Rasmus Villemoes)
- PAT additions (Toshi Kani)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Micro-optimise clflush_cache_range()
x86/mm/pat: Change free_memtype() to support shrinking case
x86/mm/pat: Add untrack_pfn_moved for mremap
x86/mm: Drop WARN from multi-BAR check
x86/LDT: Print the real LDT base address
x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN
x86/mm: Introduce max_possible_pfn
x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only
x86/mm/mtrr: Mark the 'range_new' static variable in mtrr_calc_range_state() as __initdata
x86/mm: Turn CONFIG_X86_PTDUMP into a module
Pull x86 fpu updates from Ingo Molnar:
"This cleans up the FPU fault handling methods to be more robust, and
moves eligible variables to .init.data"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Put a few variables in .init.data
x86/fpu: Get rid of xstate_fault()
x86/fpu: Add an XSTATE_OP() macro
Pull x86 cpu updates from Ingo Molnar:
"The main changes in this cycle were:
- Improved CPU ID handling code and related enhancements (Borislav
Petkov)
- RDRAND fix (Len Brown)"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Replace RDRAND forced-reseed with simple sanity check
x86/MSR: Chop off lower 32-bit value
x86/cpu: Fix MSR value truncation issue
x86/cpu/amd, kvm: Satisfy guest kernel reads of IC_CFG MSR
kvm: Add accessors for guest CPU's family, model, stepping
x86/cpu: Unify CPU family, model, stepping calculation
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- vDSO and asm entry improvements (Andy Lutomirski)
- Xen paravirt entry enhancements (Boris Ostrovsky)
- asm entry labels enhancement (Borislav Petkov)
- and other misc changes (Thomas Gleixner, me)"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n
Revert "x86/kvm: On KVM re-enable (e.g. after suspend), update clocks"
x86/entry/64_compat: Make labels local
x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog()
x86/vdso: Enable vdso pvclock access on all vdso variants
x86/vdso: Remove pvclock fixmap machinery
x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
x86/kvm: On KVM re-enable (e.g. after suspend), update clocks
x86/entry/64: Bypass enter_from_user_mode on non-context-tracking boots
x86/asm: Add asm macros for static keys/jump labels
x86/asm: Error out if asm/jump_label.h is included inappropriately
context_tracking: Switch to new static_branch API
x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op
x86/paravirt: Remove the unused irq_enable_sysexit pv op
x86/xen: Avoid fast syscall path for Xen PV guests
Pull x86 apic updates from Ingo Molnar:
"The main changes in this cycle were:
- introduce optimized single IPI sending methods on modern APICs
(Linus Torvalds, Thomas Gleixner)
- kexec/crash APIC handling fixes and enhancements (Hidehiro Kawai)
- extend lapic vector saving/restoring to the CMCI (MCE) vector as
well (Juergen Gross)
- various fixes and enhancements (Jake Oshins, Len Brown)"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/irq: Export functions to allow MSI domains in modules
Documentation: Document kernel.panic_on_io_nmi sysctl
x86/nmi: Save regs in crash dump on external NMI
x86/apic: Introduce apic_extnmi command line parameter
kexec: Fix race between panic() and crash_kexec()
panic, x86: Allow CPUs to save registers even if looping in NMI context
panic, x86: Fix re-entrance problem due to panic on NMI
x86/apic: Fix the saving and restoring of lapic vectors during suspend/resume
x86/smpboot: Re-enable init_udelay=0 by default on modern CPUs
x86/smp: Remove single IPI wrapper
x86/apic: Use default send single IPI wrapper
x86/apic: Provide default send single IPI wrapper
x86/apic: Implement single IPI for apic_noop
x86/apic: Wire up single IPI for apic_numachip
x86/apic: Wire up single IPI for x2apic_uv
x86/apic: Implement single IPI for x2apic_phys
x86/apic: Wire up single IPI for bigsmp_apic
x86/apic: Remove pointless indirections from bigsmp_apic
x86/apic: Wire up single IPI for apic_physflat
x86/apic: Remove pointless indirections from apic_physflat
...
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- tickless load average calculation enhancements (Byungchul Park)
- vtime handling enhancements (Frederic Weisbecker)
- scalability improvement via properly aligning a key structure field
(Jiri Olsa)
- various stop_machine() fixes (Oleg Nesterov)
- sched/numa enhancement (Rik van Riel)
- various fixes and improvements (Andi Kleen, Dietmar Eggemann,
Geliang Tang, Hiroshi Shimamoto, Joonwoo Park, Peter Zijlstra,
Waiman Long, Wanpeng Li, Yuyang Du)"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task()
sched/core: Move sched_entity::avg into separate cache line
x86/fpu: Properly align size in CHECK_MEMBER_AT_END_OF() macro
sched/deadline: Fix the earliest_dl.next logic
sched/fair: Disable the task group load_avg update for the root_task_group
sched/fair: Move the cache-hot 'load_avg' variable into its own cacheline
sched/fair: Avoid redundant idle_cpu() call in update_sg_lb_stats()
sched/core: Move the sched_to_prio[] arrays out of line
sched/cputime: Convert vtime_seqlock to seqcount
sched/cputime: Introduce vtime accounting check for readers
sched/cputime: Rename vtime_accounting_enabled() to vtime_accounting_cpu_enabled()
sched/cputime: Correctly handle task guest time on housekeepers
sched/cputime: Clarify vtime symbols and document them
sched/cputime: Remove extra cost in task_cputime()
sched/fair: Make it possible to account fair load avg consistently
sched/fair: Modify the comment about lock assumptions in migrate_task_rq_fair()
stop_machine: Clean up the usage of the preemption counter in cpu_stopper_thread()
stop_machine: Shift the 'done != NULL' check from cpu_stop_signal_done() to callers
stop_machine: Kill cpu_stop_done->executed
stop_machine: Change __stop_cpus() to rely on cpu_stop_queue_work()
...
Pull RAS updates from Ingo Molnar:
"Various x86 MCE fixes and small enhancements"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Make usable address checks Intel-only
x86/mce: Add the missing memory error check on AMD
x86/RAS: Remove mce.usable_addr
x86/mce: Do not enter deferred errors into the generic pool twice
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Intel Knights Landing support. (Harish Chegondi)
- Intel Broadwell-EP uncore PMU support. (Kan Liang)
- Core code improvements. (Peter Zijlstra.)
- Event filter, LBR and PEBS fixes. (Stephane Eranian)
- Enable cycles:pp on Intel Atom. (Stephane Eranian)
- Add cycles:ppp support for Skylake. (Andi Kleen)
- Various x86 NMI overhead optimizations. (Andi Kleen)
- Intel PT enhancements. (Takao Indoh)
- AMD cache events fix. (Vince Weaver)
Tons of tooling changes:
- Show random perf tool tips in the 'perf report' bottom line
(Namhyung Kim)
- perf report now defaults to --group if the perf.data file has
grouped events, try it with:
# perf record -e '{cycles,instructions}' -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ]
# perf report
# Samples: 1K of event 'anon group { cycles, instructions }'
# Event count (approx.): 1955219195
#
# Overhead Command Shared Object Symbol
2.86% 0.22% swapper [kernel.kallsyms] [k] intel_idle
1.05% 0.33% firefox libxul.so [.] js::SetObjectElement
1.05% 0.00% kworker/0:3 [kernel.kallsyms] [k] gen6_ring_get_seqno
0.88% 0.17% chrome chrome [.] 0x0000000000ee27ab
0.65% 0.86% firefox libxul.so [.] js::ValueToId<(js::AllowGC)1>
0.64% 0.23% JS Helper libxul.so [.] js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::splay
0.62% 1.27% firefox libxul.so [.] js::GetIterator
0.61% 1.74% firefox libxul.so [.] js::NativeSetProperty
0.61% 0.31% firefox libxul.so [.] js::SetPropertyByDefining
- Introduce the 'perf stat record/report' workflow:
Generate perf.data files from 'perf stat', to tap into the
scripting capabilities perf has instead of defining a 'perf stat'
specific scripting support to calculate event ratios, etc.
Simple example:
$ perf stat record -e cycles usleep 1
Performance counter stats for 'usleep 1':
1,134,996 cycles
0.000670644 seconds time elapsed
$ perf stat report
Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1':
1,134,996 cycles
0.000670644 seconds time elapsed
$
It generates PERF_RECORD_ userspace records to store the details:
$ perf report -D | grep PERF_RECORD
0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637
0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535
0x12a [0x40]: PERF_RECORD_STAT_CONFIG
0x16a [0x30]: PERF_RECORD_STAT
-1 -1 0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text
0x1da [0x18]: PERF_RECORD_STAT_ROUND
[acme@ssdandy linux]$
An effort was made to make perf.data files generated like this to
not generate cryptic messages when processed by older tools.
The 'perf script' bits need rebasing, will go up later.
- Make command line options always available, even when they depend
on some feature being enabled, warning the user about use of such
options (Wang Nan)
- Support hw breakpoint events (mem:0xAddress) in the default output
mode in 'perf script' (Wang Nan)
- Fixes and improvements for supporting annotating ARM binaries,
support ARM call and jump instructions, more work needed to have
arch specific stuff separated into tools/perf/arch/*/annotate/
(Russell King)
- Add initial 'perf config' command, for now just with a --list
command to the contents of the configuration file in use and a
basic man page describing its format, commands for doing edits and
detailed documentation are being reviewed and proof-read. (Taeung
Song)
- Allows BPF scriptlets specify arguments to be fetched using DWARF
info, using a prologue generated at compile/build time (He Kuang,
Wang Nan)
- Allow attaching BPF scriptlets to module symbols (Wang Nan)
- Allow attaching BPF scriptlets to userspace code using uprobe (Wang
Nan)
- BPF programs now can specify 'perf probe' tunables via its section
name, separating key=val values using semicolons (Wang Nan)
Testing some of these new BPF features:
Use case: get callchains when receiving SSL packets, filter then in the
kernel, at arbitrary place.
# cat ssl.bpf.c
#define SEC(NAME) __attribute__((section(NAME), used))
struct pt_regs;
SEC("func=__inet_lookup_established hnum")
int func(struct pt_regs *ctx, int err, unsigned short port)
{
return err == 0 && port == 443;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
#
# perf record -a -g -e ssl.bpf.c
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ]
# perf script | head -30
swapper 0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux)
856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux)
2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux)
2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux)
96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux)
969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux)
2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux)
95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux)
1163ffa start_kernel ([kernel.vmlinux].init.text)
11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text)
1163623 x86_64_start_kernel ([kernel.vmlinux].init.text)
qemu-system-x86 9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux)
8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux)
430a br_handle_frame_finish ([bridge])
48bc br_handle_frame ([bridge])
855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
#
- Use 'perf probe' various options to list functions, see what
variables can be collected at any given point, experiment first
collecting without a filter, then filter, use it together with
'perf trace', 'perf top', with or without callchains, if it
explodes, please tell us!
- Introduce a new callchain mode: "folded", that will list per line
representations of all callchains for a give histogram entry,
facilitating 'perf report' output processing by other tools, such
as Brendan Gregg's flamegraph tools (Namhyung Kim)
E.g:
# perf report | grep -v ^# | head
18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
|
---cpu_startup_entry
|
|--12.07%--start_secondary
|
--6.30%--rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
#
Becomes, in "folded" mode:
# perf report -g folded | grep -v ^# | head -5
18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
12.07% cpu_startup_entry;start_secondary
6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
16.90% 0.00% swapper [kernel.kallsyms] [k] call_cpuidle
11.23% call_cpuidle;cpu_startup_entry;start_secondary
5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
16.90% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter
11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
15.12% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter_state
#
The user can also select one of "count", "period" or "percent" as
the first column.
... and lots of infrastructure enhancements, plus fixes and other
changes, features I failed to list - see the shortlog and the git log
for details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (271 commits)
perf evlist: Add --trace-fields option to show trace fields
perf record: Store data mmaps for dwarf unwind
perf libdw: Check for mmaps also in MAP__VARIABLE tree
perf unwind: Check for mmaps also in MAP__VARIABLE tree
perf unwind: Use find_map function in access_dso_mem
perf evlist: Remove perf_evlist__(enable|disable)_event functions
perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does)
perf report: Show random usage tip on the help line
perf hists: Export a couple of hist functions
perf diff: Use perf_hpp__register_sort_field interface
perf tools: Add overhead/overhead_children keys defaults via string
perf tools: Remove list entry from struct sort_entry
perf tools: Include all tools/lib directory for tags/cscope/TAGS targets
perf script: Align event name properly
perf tools: Add missing headers in perf's MANIFEST
perf tools: Do not show trace command if it's not compiled in
perf report: Change default to use event group view
perf top: Decay periods in callchains
tools lib: Move bitmap.[ch] from tools/perf/ to tools/{lib,include}/
tools lib: Sync tools/lib/find_bit.c with the kernel
...
This is a long standing bug with the l1-dcache-stores generic event on
AMD machines. My perf_event testsuite has been complaining about this
for years and I'm finally getting around to trying to get it fixed.
The data_cache_refills:system event does not make sense for l1-dcache-stores.
Maybe this was a typo and it was meant to be for l1-dcache-store-misses?
In any case, the values returned are nowhere near correct for l1-dcache-stores
and in fact the umask values for the event have completely changed with
fam15h so it makes even less sense than ever. So just remove it.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1512091134350.24311@vincent-weaver-1.umelst.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Knights Landing uncore performance monitoring (perfmon) is derived from
Haswell-EP uncore perfmon with several differences. One notable difference
is in PCI device IDs. Knights Landing uses common PCI device ID for
multiple instances of an uncore PMU device type. In Haswell-EP, each
instance of a PMU device type has a unique device ID.
Knights Landing uncore components that have performance monitoring units
are UBOX, CHA, EDC, MC, M2PCIe, IRP and PCU. Perfmon registers in EDC, MC,
IRP, and M2PCIe reside in the PCIe configuration space. Perfmon registers
in UBOX, CHA and PCU are accessed via the MSR interface.
For more details, please refer to the public document:
https://software.intel.com/sites/default/files/managed/15/8d/IntelXeonPhi%E2%84%A2x200ProcessorPerformanceMonitoringReferenceManual_Volume1_Registers_v0%206.pdf
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Harish Chegondi <harish.chegondi@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/8ac513981264c3eb10343a3f523f19cc5a2d12fe.1449470704.git.harish.chegondi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Call uncore_pci_box_ctl() function to get the PMON box control MSR offset
instead of hard coding the offset. This would allow us to use this
snbep_uncore_pci_init_box() function for other PCI PMON devices whose box
control MSR offset is different from SNBEP_PCI_PMON_BOX_CTL.
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Harish Chegondi <harish.chegondi@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/872e8ef16cfc38e5ff3b45fac1094e6f1722e4ad.1449470704.git.harish.chegondi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Knights Landing core is based on Silvermont core with several differences.
Like Silvermont, Knights Landing has 8 pairs of LBR MSRs. However, the
LBR MSRs addresses match those of the Xeon cores' first 8 pairs of LBR MSRs
Unlike Silvermont, Knights Landing supports hyperthreading. Knights Landing
offcore response events config register mask is different from that of the
Silvermont.
This patch was developed based on a patch from Andi Kleen.
For more details, please refer to the public document:
https://software.intel.com/sites/default/files/managed/15/8d/IntelXeonPhi%E2%84%A2x200ProcessorPerformanceMonitoringReferenceManual_Volume1_Registers_v0%206.pdf
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Harish Chegondi <harish.chegondi@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/d14593c7311f78c93c9cf6b006be843777c5ad5c.1449517401.git.harish.chegondi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The uncore subsystem for Broadwell-EP is similar to Haswell-EP.
There are some differences in pci device IDs, box number and
constraints. This patch extends the Broadwell-DE codes to support
Broadwell-EP.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1449176411-9499-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch updates the PEBS support for Intel Atom to provide
an alias for the cycles:pp event used by perf record/top by default
nowadays.
On Atom, only INST_RETIRED:ANY supports PEBS, so we use this event
instead with a large cmask to count cycles. Given that Core2 has
the same issue, we use the intel_pebs_aliases_core2() function for Atom
as well.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1449172990-30183-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch fixes broken PEBS support on Intel Atom and Core2
due to wrong pointer arithmetic in intel_pmu_drain_pebs_core().
The get_next_pebs_record_by_bit() was called on PEBS format fmt0
which does not use the pebs_record_nhm layout.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Fixes: 21509084f9 ("perf/x86/intel: Handle multiple records in the PEBS buffer")
Link: http://lkml.kernel.org/r/1449182000-31524-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patches fixes the LBR kernel crashes on Intel Atom.
The kernel was assuming that if the CPU supports 64-bit format
LBR, then it has an LBR_SELECT MSR. Atom uses 64-bit LBR format
but does not have LBR_SELECT. That was causing NULL pointer
dereferences in a couple of places.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Fixes: 96f3eda67f ("perf/x86/intel: Fix static checker warning in lbr enable")
Link: http://lkml.kernel.org/r/1449182000-31524-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch fixes a bug in the filter_events() function.
The patch fixes the bug whereby if some mappings did not
exist, e.g., STALLED_CYCLES_FRONTEND, then any event after it
in the attrs array would disappear from the published list of
events in /sys/devices/cpu/events. This could be verified
easily on any system post SNB (which do not publish
STALLED_CYCLES_FRONTEND):
$ ./perf stat -e cycles,ref-cycles true
Performance counter stats for 'true':
1,217,348 cycles
<not supported> ref-cycles
The problem is that in filter_events() there is an assumption
that the argument (attrs) is organized in increasing continuous
event indexes related to the event_map(). But if we remove the
non-supported events by shifing the position in the array, then
the lookup x86_pmu.event_map() needs to compensate for it, otherwise
we are looking up the wrong index. This patch corrects this problem
by compensating for the deleted events and with that ref-cycles
reappears (here shown on Haswell):
$ perf stat -e ref-cycles,cycles true
Performance counter stats for 'true':
4,525,910 ref-cycles
1,064,920 cycles
0.002943888 seconds time elapsed
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: jolsa@kernel.org
Cc: kan.liang@intel.com
Fixes: 8300daa267 ("perf/x86: Filter out undefined events from sysfs events attribute")
Link: http://lkml.kernel.org/r/1449516805-6637-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add a new 'three-p' precise level, that uses INST_RETIRED.PREC_DIST as
base. The basic mechanism of abusing the inverse cmask to get all
cycles works the same as before.
PREC_DIST is available on Sandy Bridge or later. It had some problems
on Sandy Bridge, so we only use it on IvyBridge and later. I tested it
on Broadwell and Skylake.
PREC_DIST has special support for avoiding shadow effects, which can
give better results compare to UOPS_RETIRED. The drawback is that
PREC_DIST can only schedule on counter 1, but that is ok for cycle
sampling, as there is normally no need to do multiple cycle sampling
runs in parallel. It is still possible to run perf top in parallel, as
that doesn't use precise mode. Also of course the multiplexing can
still allow parallel operation.
:pp stays with the previous event.
Example:
Sample a loop with 10 sqrt with old cycles:pp
0.14 │10: sqrtps %xmm1,%xmm0 <--------------
9.13 │ sqrtps %xmm1,%xmm0
11.58 │ sqrtps %xmm1,%xmm0
11.51 │ sqrtps %xmm1,%xmm0
6.27 │ sqrtps %xmm1,%xmm0
10.38 │ sqrtps %xmm1,%xmm0
12.20 │ sqrtps %xmm1,%xmm0
12.74 │ sqrtps %xmm1,%xmm0
5.40 │ sqrtps %xmm1,%xmm0
10.14 │ sqrtps %xmm1,%xmm0
10.51 │ ↑ jmp 10
We expect all 10 sqrt to get roughly the sample number of samples.
But you can see that the instruction directly after the JMP is
systematically underestimated in the result, due to sampling shadow
effects.
With the new PREC_DIST based sampling this problem is gone and all
instructions show up roughly evenly:
9.51 │10: sqrtps %xmm1,%xmm0
11.74 │ sqrtps %xmm1,%xmm0
11.84 │ sqrtps %xmm1,%xmm0
6.05 │ sqrtps %xmm1,%xmm0
10.46 │ sqrtps %xmm1,%xmm0
12.25 │ sqrtps %xmm1,%xmm0
12.18 │ sqrtps %xmm1,%xmm0
5.26 │ sqrtps %xmm1,%xmm0
10.13 │ sqrtps %xmm1,%xmm0
10.43 │ sqrtps %xmm1,%xmm0
0.16 │ ↑ jmp 10
Even with PREC_DIST there is still sampling skid and the result is not
completely even, but systematic shadow effects are significantly
reduced.
The improvements are mainly expected to make a difference in high IPC
code. With low IPC it should be similar.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1448929689-13771-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I added UOPS_RETIRED.ALL by mistake to the Skylake PEBS event list for
cycles:pp. But the event is not documented for Skylake, and has some
issues.
The recommended replacement for cycles:pp is to use
INST_RETIRED.ANY+pebs as a base, similar to what CPUs before Sandy
Bridge did. This new event is called INST_RETIRED.TOTAL_CYCLES_PS. The
event is not really new, but has been already used by perf before
Sandy Bridge for the original cycles:p
Note the SDM doesn't document that event either, but it's being
documented in the latest version of the event list on:
https://download.01.org/perfmon/SKL
This patch does:
- Remove UOPS_RETIRED.ALL from the Skylake PEBS event list
- Add INST_RETIRED.ANY to the Skylake PEBS event list, and an table entry to
allow cmask=16,inv=1 for cycles:pp
- We don't need an extra entry for the base INST_RETIRED event,
because it is already covered by the catch-all PEBS table entry.
- Switch Skylake to use the Core2 PEBS alias (which is
INST_RETIRED.TOTAL_CYCLES_PS)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1448929689-13771-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Normally we drop PEBS events with a zero status field. But when
there is only a single PEBS event active we can assume the
PEBS record is for that event. The PEBS buffer is always flushed
when PEBS events are disabled, so there is no risk of mishandling
state PEBS records this way.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1449177740-5422-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The recent commit:
75f80859b1 ("perf/x86/intel/pebs: Robustify PEBS buffer drain")
causes lots of warnings on different CPUs before Skylake
when running PEBS intensive workloads.
They can have a zero status field in the PEBS record when
PEBS is racing with clearing of GLOBAl_STATUS.
This also can cause hangs (it seems there are still
problems with printk in NMI).
Disable the warning, but still ignore the record.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1449177740-5422-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The CHECK_MEMBER_AT_END_OF(TYPE, MEMBER) checks whether MEMBER
is last member of TYPE by evaluating:
offsetof(TYPE::MEMBER) + sizeof(TYPE::MEMBER) == sizeof(TYPE)
and ensuring TYPE::MEMBER is the last member of the TYPE.
This condition breaks on structs that are padded to be
aligned. This patch ensures the TYPE alignment is taken
into account.
This bug was revealed after adding cacheline alignment into
struct sched_entity, which broke task_struct::thread check:
CHECK_MEMBER_AT_END_OF(struct task_struct, thread);
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1450707930-3445-1-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is no need to worry about module and __init text disappearing
case, because that ftrace has a module notifier that is called when
a module is being unloaded and before the text goes away and this
code grabs the ftrace_lock mutex and removes the module functions
from the ftrace list, such that it will no longer do any
modifications to that module's text, the update to make functions
be traced or not is done under the ftrace_lock mutex as well.
And by now, __init section codes should not been modified
by ftrace, because it is black listed in recordmcount.c and
ignored by ftrace.
Link: http://lkml.kernel.org/r/1449367378-29430-6-git-send-email-huawei.libin@huawei.com
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The MMCFG PCI accessors weren't being setup for NumacConnect2
correctly due to over-early assignment; this would create the
potential for the wrong PCI domain to be accessed.
Fix this by using the correct arch-specific PCI init function.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Steffen Persvold <sp@numascale.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1451498807-15920-1-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This was meant to print base address and entry count; make it do so
again.
Fixes: 37868fe113 "x86/ldt: Make modify_ldt synchronous"
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/56797D8402000078000C24F0@prv-mh.provo.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The related warning from gcc 6.0:
arch/x86/kernel/ptrace.c:127:18: warning: ‘arg_offs_table’ defined but not used [-Wunused-const-variable]
static const int arg_offs_table[] = {
^~~~~~~~~~~~~~
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Link: http://lkml.kernel.org/r/1451137798-28701-1-git-send-email-chengang@emindsoft.com.cn
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Adding the rtc platform device in non-privileged Xen PV guests causes
an IRQ conflict because these guests do not have legacy PIC and may
allocate irqs in the legacy range.
In a single VCPU Xen PV guest we should have:
/proc/interrupts:
CPU0
0: 4934 xen-percpu-virq timer0
1: 0 xen-percpu-ipi spinlock0
2: 0 xen-percpu-ipi resched0
3: 0 xen-percpu-ipi callfunc0
4: 0 xen-percpu-virq debug0
5: 0 xen-percpu-ipi callfuncsingle0
6: 0 xen-percpu-ipi irqwork0
7: 321 xen-dyn-event xenbus
8: 90 xen-dyn-event hvc_console
...
But hvc_console cannot get its interrupt because it is already in use
by rtc0 and the console does not work.
genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
We can avoid this problem by realizing that unprivileged PV guests (both
Xen and lguests) are not supposed to have rtc_cmos device and so
adding it is not necessary.
Privileged guests (i.e. Xen's dom0) do use it but they should not have
irq conflicts since they allocate irqs above legacy range (above
gsi_top, in fact).
Instead of explicitly testing whether the guest is privileged we can
extend pv_info structure to include information about guest's RTC
support.
Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: vkuznets@redhat.com
Cc: xen-devel@lists.xenproject.org
Cc: konrad.wilk@oracle.com
Cc: stable@vger.kernel.org # 4.2+
Link: http://lkml.kernel.org/r/1449842873-2613-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Those are stupid and code should use static_cpu_has_safe() or
boot_cpu_has() instead. Kill the least used and unused ones.
The remaining ones need more careful inspection before a conversion can
happen. On the TODO.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de
Cc: David Sterba <dsterba@suse.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add an enum for the ->x86_capability array indices and cleanup
get_cpu_cap() by killing some redundant local vars.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-3-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Now, multiple CPUs can receive an external NMI simultaneously by
specifying the "apic_extnmi=all" command line parameter. When we take
a crash dump by using external NMI with this option, we fail to save
registers into the crash dump. This happens as follows:
CPU 0 CPU 1
================================ =============================
receive an external NMI
default_do_nmi() receive an external NMI
spin_lock(&nmi_reason_lock) default_do_nmi()
io_check_error() spin_lock(&nmi_reason_lock)
panic() busy loop
...
kdump_nmi_shootdown_cpus()
issue NMI IPI -----------> blocked until IRET
busy loop...
Here, since CPU 1 is in NMI context, an additional NMI from CPU 0
remains unhandled until CPU 1 IRETs. However, CPU 1 will never execute
IRET so the NMI is not handled and the callback function to save
registers is never called.
To solve this issue, we check if the IPI for crash dumping was issued
while waiting for nmi_reason_lock to be released, and if so, call its
callback function directly. If the IPI is not issued (e.g. kdump is
disabled), the actual behavior doesn't change.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20151210065245.4587.39316.stgit@softrs
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch introduces a command line parameter apic_extnmi:
apic_extnmi=( bsp|all|none )
The default value is "bsp" and this is the current behavior: only the
Boot-Strapping Processor receives an external NMI.
"all" allows external NMIs to be broadcast to all CPUs. This would
raise the success rate of panic on NMI when BSP hangs in NMI context
or the external NMI is swallowed by other NMI handlers on the BSP.
If you specify "none", no CPUs receive external NMIs. This is useful for
the dump capture kernel so that it cannot be shot down by accidentally
pressing the external NMI button (on platforms which have it) while
saving a crash dump.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bandan Das <bsd@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20151210014632.25437.43778.stgit@softrs
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
sends an NMI IPI to CPUs which haven't called panic() to stop them,
save their register information and do some cleanups for crash dumping.
However, if such a CPU is infinitely looping in NMI context, we fail to
save its register information into the crash dump.
For example, this can happen when unknown NMIs are broadcast to all
CPUs as follows:
CPU 0 CPU 1
=========================== ==========================
receive an unknown NMI
unknown_nmi_error()
panic() receive an unknown NMI
spin_trylock(&panic_lock) unknown_nmi_error()
crash_kexec() panic()
spin_trylock(&panic_lock)
panic_smp_self_stop()
infinite loop
kdump_nmi_shootdown_cpus()
issue NMI IPI -----------> blocked until IRET
infinite loop...
Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is
blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET,
so the NMI is not handled and the callback function to save registers is
never called.
In practice, this can happen on some servers which broadcast NMIs to all
CPUs when the NMI button is pushed.
To save registers in this case, we need to:
a) Return from NMI handler instead of looping infinitely
or
b) Call the callback function directly from the infinite loop
Inherently, a) is risky because NMI is also used to prevent corrupted
data from being propagated to devices. So, we chose b).
This patch does the following:
1. Move the infinite looping of CPUs which haven't called panic() in NMI
context (actually done by panic_smp_self_stop()) outside of panic() to
enable us to refer pt_regs. Please note that panic_smp_self_stop() is
still used for normal context.
2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
registers and do some cleanups after setting waiting_for_crash_ipi which
is used for counting down the number of CPUs which handled the callback
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If panic on NMI happens just after panic() on the same CPU, panic() is
recursively called. Kernel stalls, as a result, after failing to acquire
panic_lock.
To avoid this problem, don't call panic() in NMI context if we've
already entered panic().
For that, introduce nmi_panic() macro to reduce code duplication. In
the case of panic on NMI, don't return from NMI handlers if another CPU
already panicked.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Intel's MCA implementation broadcasts MCEs to all CPUs on the
node. This poses a problem for offlined CPUs which cannot
participate in the rendezvous process:
Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
Kernel Offset: disabled
Rebooting in 100 seconds..
More specifically, Linux does a soft offline of a CPU when
writing a 0 to /sys/devices/system/cpu/cpuX/online, which
doesn't prevent the #MC exception from being broadcasted to that
CPU.
Ensure that offline CPUs don't participate in the MCE rendezvous
and clear the RIP valid status bit so that a second MCE won't
cause a shutdown.
Without the patch, mce_start() will increment mce_callin and
wait for all CPUs. Offlined CPUs should avoid participating in
the rendezvous process altogether.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1449742346-21470-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull perf fixes from Ingo Molnar:
"This tree includes four core perf fixes for misc bugs, three fixes to
x86 PMU drivers, and two updates to old email addresses"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Do not send exit event twice
perf/x86/intel: Fix INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA macro
perf/x86/intel: Make L1D_PEND_MISS.FB_FULL not constrained on Haswell
perf: Fix PERF_EVENT_IOC_PERIOD deadlock
treewide: Remove old email address
perf/x86: Fix LBR call stack save/restore
perf: Update email address in MAINTAINERS
perf/core: Robustify the perf_cgroup_from_task() RCU checks
perf/core: Fix RCU problem with cgroup context switching code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWZMgaAAoJEHm+PkMAQRiGGcIH+gNS/hbN2DKW7wphl1QuaV7C
1fror8AvpwbGa/o0yuxovaVuZzAR0TF31vn+gAemF4U/hnM25xqxEHXYZEVv8OWw
mbz4/z+jbVk3SiS5AiZPIZgL4W6RZnG5QYfiTVGPlBHuBznW2ITlNlClBOmBL45o
uhb3bjTzi70AZ7Gh6i9sHgJoHg6D9u/ZxLaLcWnM79BzyTMHTf2t0wnrQmh66lEE
hp7Rn9wXv9bk/e3iH7CVUb97P4IWhhkmfqcoturqAg9+C/M26b0VmvQp9Sy8S6Pd
FVQ+SUIZllj5ZDKe9mOcs37czlxTr0keEFqzWeMh/7y4iuI3RaRp/qb+7mX5sIE=
=WGZ1
-----END PGP SIGNATURE-----
Back merge tag 'v4.4-rc4' into drm-next
We've picked up a few conflicts and it would be nice
to resolve them before we move onwards.
Pull x86 fixes from Thoma Gleixner:
"Another round of fixes for x86:
- Move the initialization of the microcode driver to late_initcall to
make sure everything that init function needs is available.
- Make sure that lockdep knows about interrupts being off in the
entry code before calling into c-code.
- Undo the cpu hotplug init delay regression.
- Use the proper conditionals in the mpx instruction decoder.
- Fixup restart_syscall for x32 tasks.
- Fix the hugepage regression on PAE kernels which was introduced
with the latest PAT changes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/signal: Fix restart_syscall number for x32 tasks
x86/mpx: Fix instruction decoder condition
x86/mm: Fix regression with huge pages on PAE
x86 smpboot: Re-enable init_udelay=0 by default on modern CPUs
x86/entry/64: Fix irqflag tracing wrt context tracking
x86/microcode: Initialize the driver late when facilities are up
Now that we have generic MSR trace points we can remove the old
hackish perf MSR read tracing code.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1449018060-1742-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix a definition in the perf rapl driver. __initconst must
be applied to a const object, but to declare a const pointer
you need to use * const ..., not const ... *
This fixes a section attribute conflict with LTO builds.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1448905722-2767-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We need to add rest of the flags to the constraint mask
instead of another INTEL_ARCH_EVENT_MASK, fixing a typo.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1447061071-28085-1-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
when memory hotplug enabled system is booted with less
than 4GB of RAM and then later more RAM is hotplugged
32-bit devices stop functioning with following error:
nommu_map_single: overflow 327b4f8c0+1522 of device mask ffffffff
the reason for this is that if x86_64 system were booted
with RAM less than 4GB, it doesn't enable SWIOTLB and
when memory is hotplugged beyond MAX_DMA32_PFN, devices
that expect 32-bit addresses can't handle 64-bit addresses.
Fix it by tracking max possible PFN when parsing
memory affinity structures from SRAT ACPI table and
enable SWIOTLB if there is hotpluggable memory
regions beyond MAX_DMA32_PFN.
It fixes KVM guests when they use emulated devices
(reproduces with ata_piix, e1000 and usb devices,
RHBZ: 1275941, 1275977, 1271527)
It also fixes the HyperV, VMWare with emulated devices
which are affected by this issue as well.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: fujita.tomonori@lab.ntt.co.jp
Cc: konrad.wilk@oracle.com
Cc: pbonzini@redhat.com
Cc: revers@redhat.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1449234426-273049-3-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
max_possible_pfn will be used for tracking max possible
PFN for memory that isn't present in E820 table and
could be hotplugged later.
By default max_possible_pfn is initialized with max_pfn,
but later it could be updated with highest PFN of
hotpluggable memory ranges declared in ACPI SRAT table
if any present.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: fujita.tomonori@lab.ntt.co.jp
Cc: konrad.wilk@oracle.com
Cc: pbonzini@redhat.com
Cc: revers@redhat.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1449234426-273049-2-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When restarting a syscall with regs->ax == -ERESTART_RESTARTBLOCK,
regs->ax is assigned to a restart_syscall number. For x32 tasks, this
syscall number must have __X32_SYSCALL_BIT set, otherwise it will be
an x86_64 syscall number instead of a valid x32 syscall number. This
issue has been there since the introduction of x32.
Reported-by: strace/tests/restart_syscall.test
Reported-and-tested-by: Elvira Khabirova <lineprinter0@gmail.com>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: Elvira Khabirova <lineprinter0@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20151130215436.GA25996@altlinux.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Calling set_memory_rw() and set_memory_ro() for every iteration of the
loop in klp_write_object_relocations() is messy, inefficient, and
error-prone.
Change all the read-only pages to read-write before the loop and convert
them back to read-only again afterwards.
Suggested-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Makes it easier to handle init vs core cleanly, though the change is
fairly invasive across random architectures.
It simplifies the rbtree code immediately, however, while keeping the
core data together in the same cachline (now iff the rbtree code is
enabled).
Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
'range_new' doesn't seem to be used after init. It is only passed
to memset(), sum_ranges(), memcmp() and x86_get_mtrr_mem_range(), the
latter of which also only passes it on to various *range*
library functions.
So mark it __initdata to free up an extra page after init.
Its contents are wiped at every call to mtrr_calc_range_state(),
so it being static is not about preserving state between calls,
but simply to avoid a 4k+ stack frame. While there, add a
comment explaining this and why it's safe.
We could also mark nr_range_new as __initdata, but since it's
just a single int and also doesn't carry state between calls (it
is unconditionally assigned to before it is read), we might as
well make it an ordinary automatic variable.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/1449002691-20783-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If there are no persistent memory ranges present then don't bother
creating the platform device. Otherwise, it loads the full libnvdimm
sub-system only to discover no resources present.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Ingo noted that if we can guarantee _end is aligned to PAGE_SIZE
we can automatically avoid bugs along the lines of,
size = _end - _text >> PAGE_SHIFT
which is missing a call to PFN_ALIGN(). The EFI mixed mode
contains this bug, for example.
_text is already aligned to PAGE_SIZE through the use of
LOAD_PHYSICAL_ADDR, and the BSS and BRK sections are explicitly
aligned in the linker script, so it makes sense to align _end to
match.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1448658575-17029-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The cal_chipset_ops structures are never modified, so declare
them as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jon D. Mason <jdmason@kudzu.us>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448726295-10959-1-git-send-email-Julia.Lawall@lip6.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>
commit f1ccd24931 allowed the cmdline "cpu_init_udelay=" to work
with all values, including the default of 10000.
But in setting the default of 10000, it over-rode the code that sets
the delay 0 on modern processors.
Also, tidy up use of INT/UINT.
Fixes: f1ccd24931 "x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior"
Reported-by: Shane <shrybman@teksavvy.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: dparsons@brightdsl.net
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/9082eb809ef40dad02db714759c7aaf618c518d4.1448232494.git.len.brown@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
x86_init_rdrand() was added with 2 goals:
1. Sanity check that the built-in-self-test circuit on the Digital
Random Number Generator (DRNG) is not complaining. As RDRAND
HW self-checks on every invocation, this goal is achieved
by simply invoking RDRAND and checking its return code.
2. Force a full re-seed of the random number generator.
This was done out of paranoia to benefit the most un-sophisticated
DRNG implementation conceivable in the architecture,
an implementation that does not exist, and unlikely ever will.
This worst-case full-re-seed is achieved by invoking
a 64-bit RDRAND 8192 times.
Unfortunately, this worst-case re-seed costs O(1,000us).
Magnifying this cost, it is done from identify_cpu(), which is the
synchronous critical path to bring a processor on-line -- repeated
for every logical processor in the system at boot and resume from S3.
As it is very expensive, and of highly dubious value, we delete the
worst-case re-seed from the kernel.
We keep the 1st goal -- sanity check the hardware, and mark it absent
if it complains.
This change reduces the cost of x86_init_rdrand() by a factor of 1,000x,
to O(1us) from O(1,000us).
Signed-off-by: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/058618cc56ec6611171427ad7205e37e377aa8d4.1439738240.git.len.brown@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When an anomaly is found while modifying function code, ftrace_bug() is
called which disables the function tracing infrastructure and reports
information about what failed. If the code that is to be replaced does not
match what is expected, then actual code is shown. Currently there is no
arch generic way to show what was expected.
Add a new variable pointer calld ftrace_expected that the arch code can set
to point to what it expected so that ftrace_bug() can report the actual text
as well as the text that was expected to be there.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Saving and restoring lapic vectors in lapic_suspend() and
lapic_resume() is not consistent: the thmr vector saving is
guarded by a different config option than the restore part. The
cmci vector isn't handled at all.
Those inconsistencies are not very critical, as the missing cmci
vector will be set via mce resume handling, the wrong config
option used for restoring the thmr vector can't be configured
differently than the one which should be used.
Nevertheless correct the thmr vector restore and add cmci vector
handling.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448276364-31334-1-git-send-email-jgross@suse.com
[ Minor code edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So sparse rightfully complains that the u64 MSR value we're
writing into the STAR MSR, i.e. 0xc0000081, is being truncated:
./arch/x86/include/asm/msr.h:193:36: warning: cast truncates
bits from constant value (23001000000000 becomes 0)
because the actual value doesn't fit into the unsigned 32-bit
quantity which are the @low and @high wrmsrl() parameters.
This is not a problem, practically, because gcc is actually
being smart enough here and does the right thing:
.loc 3 87 0
xorl %esi, %esi # we needz a 32-bit zero
movl $2293776, %edx # 0x00230010 == (__USER32_CS << 16) | __KERNEL_CS go into the high bits
movl $-1073741695, %ecx # MSR_STAR, i.e., 0xc0000081
movl %esi, %eax # low order 32 bits in the MSR which are 0
#APP
# 87 "./arch/x86/include/asm/msr.h" 1
wrmsr
More specifically, MSR_STAR[31:0] is being set to 0. That field
is reserved on Intel and on AMD it is 32-bit SYSCALL Target EIP.
I'd strongly guess because Intel doesn't have SYSCALL in
compat/legacy mode and we're using SYSENTER and INT80 there. And
for compat syscalls in long mode we use CSTAR.
So let's fix the sparse warning by writing SYSRET and SYSCALL CS
and SS into the high 32-bit half of STAR and 0 in the low half
explicitly.
[ Actually, if we had to be precise, we would have to read what's in
STAR[31:0] and write it back unchanged on Intel and write 0 on AMD. I
guess the current writing to 0 is still ok since Intel can apparently
stomach it. ]
The resulting code is identical to what we have above:
.loc 3 87 0
xorl %esi, %esi # tmp104
movl $2293776, %eax #, tmp103
movl $-1073741695, %ecx #, tmp102
movl %esi, %edx # tmp104, tmp104
...
wrmsr
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The kernel accesses IC_CFG MSR (0xc0011021) on AMD because it
checks whether the way access filter is enabled on some F15h
models, and, if so, disables it.
kvm doesn't handle that MSR access and complains about it, which
can get really noisy in dmesg when one starts kvm guests all the
time for testing. And it is useless anyway - guest kernel
shouldn't be doing such changes anyway so tell it that that
filter is disabled.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add generic functions which calc family, model and stepping from
the CPUID_1.EAX leaf and stick them into the library we have.
Rename those which do call CPUID with the prefix "x86_cpuid" as
suggested by Paolo Bonzini.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The MCi_MISC bitfield definitions mce_usable_address() checks
are Intel-only. Make them so.
While at it, move mce_usable_address() up, before all its
callers and get rid of the forward declaration.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448350880-5573-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We simply need to look at the extended error code when detecting
whether the error is of type memory.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448350880-5573-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is useless and we can use the function instead. Besides,
mcelog(8) hasn't managed to make use of it yet. So kill it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448350880-5573-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We used to have a special ring buffer for deferred errors that
was used to mark problem pages. We replaced that with a generic
pool. Then later converted mce_log() to also use the same pool.
As a result, we end up adding all deferred errors to the pool
twice.
Rearrange this code. Make sure to set the m.severity and
m.usable_addr fields for deferred errors. Then if flags and
mca_cfg.dont_log_ce mean we call mce_log() we are done, because
that will add this entry to the generic pool.
If we skipped mce_log(), then we still want to take action for
the deferred error, so add to the pool.
Change the name of the boolean "error_logged" to "error_seen",
we should set it whether of not we logged an error because the
return value from machine_check_poll() is used to decide whether
storms have subsided or not.
Reported-by: Gong Chen <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1448350880-5573-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As result of commit "x86/xen: Avoid fast syscall path for Xen PV
guests", usergs_sysret32 pv op is not called by Xen PV guests
anymore and since they were the only ones who used it we can
safely remove it.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1447970147-1733-4-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As result of commit "x86/xen: Avoid fast syscall path for Xen PV
guests", the irq_enable_sysexit pv op is not called by Xen PV guests
anymore and since they were the only ones who used it we can
safely remove it.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1447970147-1733-3-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Running microcode_init() from setup_arch() is a bad idea because
not even kmalloc() is ready at that point and the loader does
all kinds of allocations and init/registration with various
subsystems.
Make it a late initcall when required facilities are initialized
so that the microcode driver initialization can succeed too.
Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151120112400.GC4028@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The earlier constraint fix for Broadwell CYCLE_ACTIVITY.*
forced umask 8 to counter 2. For this it used UEVENT,
to match the complete umask.
The event list for Broadwell has an additional
STALLS_L1D_PENDIND event that uses umask 8, but also
sets other bits in the umask. The earlier strict umask match
didn't handle this case.
Add a new UBIT_EVENT constraint macro that only matches
the specified bits in the umask. Then use that macro
to handle CYCLE_ACTIVITY.* on Broadwell.
The documented event also uses cmask, but there's no
need to let the event scheduler know about the cmask,
as the scheduling restriction is only tied to the umask.
Reported-by: Grant Ayers <ayers@cs.stanford.edu>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1447719667-9998-1-git-send-email-andi@firstfloor.org
[ Filled in the missing email address of Grant Ayers - hopefully I got the right one. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch stops Intel PT logging and saves its registers in memory
before kdump is started. This feature is needed to prevent Intel PT from
overwriting its log buffer after panic, and saved registers are needed to
find the last position where Intel PT wrote data.
After the crash dump is captured by kdump, users can retrieve the log buffer
from the vmcore and use it to investigate bad kernel behavior.
Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin<alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H.Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1446614553-6072-3-git-send-email-indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch add a function for external components to stop Intel PT.
Basically this function is used when kernel panic occurs. When it is
called, the intel_pt driver disables Intel PT and saves its registers
using pt_event_stop(), which is also used by pmu.stop handler.
This function stops Intel PT on the CPU where it is working, therefore
users of it need to call it for each CPU to stop all logging.
Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin<alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H.Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1446614553-6072-2-git-send-email-indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With LBRv5 reading the extra LBR flags like mispredict, TSX, cycles is
not free anymore, as it has moved to a separate MSR.
For callstack mode we don't need any of this information; so we can
avoid the unnecessary MSR read. Add flags to the perf interface where
perf record can request not collecting this information.
Add branch_sample_type flags for CYCLES and FLAGS. It's a bit unusual
for branch_sample_types to be negative (disable), not positive (enable),
but since the legacy ABI reported the flags we need some form of
explicit disabling to avoid breaking the ABI.
After we have the flags the x86 perf code can keep track if any users
need the flags. If noone needs it the information is not collected.
This cuts down the cost of LBR callstack on Skylake significantly.
Profiling a kernel build with LBR call stack the average run time of
the PMI handler drops by 43%.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1445366797-30894-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change the perf user stack walking to use the new
__copy_from_user_nmi(), and split each access into word sized transfer
sizes. This allows to inline the complete access and optimize it all
into a single load.
The main advantage is that this avoids the overhead of double page
faults. When normal copy_from_user() fails it reexecutes the copy to
compute an accurate number of non copied bytes. This leads to
executing the expensive page fault twice.
While walking stacks having a fault at some point is relatively common
(typically when some part of the program isn't compiled with frame
pointers), so this is a large overhead.
With the optimized copies we avoid this problem because they only do
all accesses once. And of course they're much faster too when the
access does not fault because they're just single instructions instead
of complex function calls.
While profiling a kernel build with -g, the patch brings down the
average time of the PMI handler from 966ns to 552ns (-43%).
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1445551641-13379-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There were still a number of references to my old Red Hat email
address in the kernel source. Remove these while keeping the
Red Hat copyright notices intact.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This fixes a bug I added in the following commit:
90405aa022 ("perf/x86/intel/lbr: Limit LBR accesses to TOS in callstack mode")
The bug could lead to lost LBR call stacks. When restoring the LBR state
we need to use the TOS of the previous context, not the current context.
To do that we need to save/restore the TOS.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1445366797-30894-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch reinforces the lockdep checks performed by
perf_cgroup_from_tsk() by passing the perf_event_context
whenever possible. It is okay to not hold the RCU read lock
when we know we hold the ctx->lock. This patch makes sure this
property holds.
In some functions, such as perf_cgroup_sched_in(), we do not
pass the context because we are sure we are holding the RCU
read lock.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: edumazet@google.com
Link: http://lkml.kernel.org/r/1447322404-10920-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix a Linux-4.3 corner case performance regression, introduced by commit:
f1ccd24931 ("x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior")
which allowed the cmdline "cpu_init_udelay=" to work with all values,
including the default of 10000.
But in setting the default of 10000, it over-rode the code stat sets
the delay 0 on modern processors.
Also, tidy up use of INT/UINT.
Reported-by: Shane <shrybman@teksavvy.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dparsons@brightdsl.net
Link: http://lkml.kernel.org/r/9082eb809ef40dad02db714759c7aaf618c518d4.1448232494.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWUmHZAAoJEHm+PkMAQRiGHtcH/RVRsn8re0WdRWYaTr9+Hknm
CGlRJN4LKecttgYQ/2bS1QsDbt8usDPBiiYVopqGXQxPBmjyDAqPjsa+8VzCaVc6
WA+9LDB+PcW28lD6BO+qSZCOAm7hHSZq7dtw9x658IqO+mI2mVeCybsAyunw2iWi
Kf5q90wq6tIBXuT8YH9MXGrSCQw00NclbYeYwB9CmCt9hT/koEFBdl7uFUFitB+Q
GSPTz5fXhgc5Lms85n7flZlrVKoQKmtDQe4/DvKZm+SjsATHU9ru89OxDBdS5gSG
YcEIM4zc9tMjhs3GC9t6WXf6iFOdctum8HOhUoIN/+LVfeOMRRwAhRVqtGJ//Xw=
=DCUg
-----END PGP SIGNATURE-----
Merge tag 'v4.4-rc2' into drm-intel-next-queued
Linux 4.4-rc2
Backmerge to get at
commit 1b0e3a049e
Author: Imre Deak <imre.deak@intel.com>
Date: Thu Nov 5 23:04:11 2015 +0200
drm/i915/skl: disable display side power well support for now
so that we can proplery re-eanble skl power wells in -next.
Conflicts are just adjacent lines changed, except for intel_fbdev.c
where we need to interleave the changs. Nothing nefarious.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Pull x86 fixes from Thomas Gleixner:
"This update contains:
- MPX updates for handling 32bit processes
- A fix for a long standing bug in 32bit signal frame handling
related to FPU/XSAVE state
- Handle get_xsave_addr() correctly in KVM
- Fix SMAP check under paravirtualization
- Add a comment to the static function trace entry to avoid further
confusion about the difference to dynamic tracing"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Fix SMAP check in PVOPS environments
x86/ftrace: Add comment on static function tracing
x86/fpu: Fix get_xsave_addr() behavior under virtualization
x86/fpu: Fix 32-bit signal frame handling
x86/mpx: Fix 32-bit address space calculation
x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernels
There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely. Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at. This may
have been true when initially implemented, but no longer is.
To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour. Experimentally, this now trips for 32bit PV
guests on Broadwell hardware. The BUG_ON() is consistent for an individual
build, but not consistent for all builds. It has also been a sitting timebomb
since SMAP support was introduced.
Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <lguest@lists.ozlabs.org>
Cc: Xen-devel <xen-devel@lists.xen.org>
CC: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.cooper3@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There was a confusion between update_ftrace_function() and static
function tracing trampoline regarding 3rd parameter (ftrace_ops).
Add a comment for clarification.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1447721004-2551-1-git-send-email-namhyung@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull perf updates from Thomas Gleixner:
"Mostly updates to the perf tool plus two fixes to the kernel core code:
- Handle tracepoint filters correctly for inherited events (Peter
Zijlstra)
- Prevent a deadlock in perf_lock_task_context (Paul McKenney)
- Add missing newlines to some pr_err() calls (Arnaldo Carvalho de
Melo)
- Print full source file paths when using 'perf annotate --print-line
--full-paths' (Michael Petlan)
- Fix 'perf probe -d' when just one out of uprobes and kprobes is
enabled (Wang Nan)
- Add compiler.h to list.h to fix 'make perf-tar-src-pkg' generated
tarballs, i.e. out of tree building (Arnaldo Carvalho de Melo)
- Add the llvm-src-base.c and llvm-src-kbuild.c files, generated by
the 'perf test' LLVM entries, when running it in-tree, to
.gitignore (Yunlong Song)
- libbpf error reporting improvements, using a strerror interface to
more precisely tell the user about problems with the provided
scriptlet, be it in C or as a ready made object file (Wang Nan)
- Do not be case sensitive when searching for matching 'perf test'
entries (Arnaldo Carvalho de Melo)
- Inform the user about objdump failures in 'perf annotate' (Andi
Kleen)
- Improve the LLVM 'perf test' entry, introduce a new ones for BPF
and kbuild tests to check the environment used by clang to compile
.c scriptlets (Wang Nan)"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
perf/x86/intel/rapl: Remove the unused RAPL_EVENT_DESC() macro
tools include: Add compiler.h to list.h
perf probe: Verify parameters in two functions
perf session: Add missing newlines to some pr_err() calls
perf annotate: Support full source file paths for srcline fix
perf test: Add llvm-src-base.c and llvm-src-kbuild.c to .gitignore
perf: Fix inherited events vs. tracepoint filters
perf: Disable IRQs across RCU RS CS that acquires scheduler lock
perf test: Do not be case sensitive when searching for matching tests
perf test: Add 'perf test BPF'
perf test: Enhance the LLVM tests: add kbuild test
perf test: Enhance the LLVM test: update basic BPF test program
perf bpf: Improve BPF related error messages
perf tools: Make fetch_kernel_version() publicly available
bpf tools: Add new API bpf_object__get_kversion()
bpf tools: Improve libbpf error reporting
perf probe: Cleanup find_perf_probe_point_from_map to reduce redundancy
perf annotate: Inform the user about objdump failures in --stdio
perf stat: Make stat options global
perf sched latency: Fix thread pid reuse issue
...
Pull x86 fixes from Thomas Gleixner:
"A couple of fixes and updates related to x86:
- Fix the W+X check regression on XEN
- The real fix for the low identity map trainwreck
- Probe legacy PIC early instead of unconditionally allocating legacy
irqs
- Add cpu verification to long mode entry
- Adjust the cache topology to AMD Fam17H systems
- Let Merrifield use the TSC across S3"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Call verify_cpu() after having entered long mode too
x86/setup: Fix low identity map for >= 2GB kernel range
x86/mm: Skip the hypervisor range when walking PGD
x86/AMD: Fix last level cache topology for AMD Fam17h systems
x86/irq: Probe for PIC presence before allocating descs for legacy IRQs
x86/cpu/intel: Enable X86_FEATURE_NONSTOP_TSC_S3 for Merrifield
KVM uses the get_xsave_addr() function in a different fashion from
the native kernel, in that the 'xsave' parameter belongs to guest vcpu,
not the currently running task.
But 'xsave' is replaced with current task's (host) xsave structure, so
get_xsave_addr() will incorrectly return the bad xsave address to KVM.
Fix it so that the passed in 'xsave' address is used - as intended
originally.
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/1446800423-21622-1-git-send-email-huaitong.han@intel.com
[ Tidied up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(This should have gone to LKML originally. Sorry for the extra
noise, folks on the cc.)
Background:
Signal frames on x86 have two formats:
1. For 32-bit executables (whether on a real 32-bit kernel or
under 32-bit emulation on a 64-bit kernel) we have a
'fpregset_t' that includes the "FSAVE" registers.
2. For 64-bit executables (on 64-bit kernels obviously), the
'fpregset_t' is smaller and does not contain the "FSAVE"
state.
When creating the signal frame, we have to be aware of whether
we are running a 32 or 64-bit executable so we create the
correct format signal frame.
Problem:
save_xstate_epilog() uses 'fx_sw_reserved_ia32' whenever it is
called for a 32-bit executable. This is for real 32-bit and
ia32 emulation.
But, fpu__init_prepare_fx_sw_frame() only initializes
'fx_sw_reserved_ia32' when emulation is enabled, *NOT* for real
32-bit kernels.
This leads to really wierd situations where 32-bit programs
lose their extended state when returning from a signal handler.
The kernel copies the uninitialized (zero) 'fx_sw_reserved_ia32'
out to userspace in save_xstate_epilog(). But when returning
from the signal, the kernel errors out in check_for_xstate()
when it does not see FP_XSTATE_MAGIC1 present (because it was
zeroed). This leads to the FPU/XSAVE state being initialized.
For MPX, this leads to the most permissive state and means we
silently lose bounds violations. I think this would also mean
that we could lose *ANY* FPU/SSE/AVX state. I'm not sure why
no one has spotted this bug.
I believe this was broken by:
72a671ced6 ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")
way back in 2012.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@sr71.net
Cc: fenghua.yu@intel.com
Cc: yu-cheng.yu@intel.com
Link: http://lkml.kernel.org/r/20151111002354.A0799571@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge second patch-bomb from Andrew Morton:
- most of the rest of MM
- procfs
- lib/ updates
- printk updates
- bitops infrastructure tweaks
- checkpatch updates
- nilfs2 update
- signals
- various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
dma-debug, dma-mapping, ...
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
ipc,msg: drop dst nil validation in copy_msg
include/linux/zutil.h: fix usage example of zlib_adler32()
panic: release stale console lock to always get the logbuf printed out
dma-debug: check nents in dma_sync_sg*
dma-mapping: tidy up dma_parms default handling
pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
kexec: use file name as the output message prefix
fs, seqfile: always allow oom killer
seq_file: reuse string_escape_str()
fs/seq_file: use seq_* helpers in seq_hex_dump()
coredump: change zap_threads() and zap_process() to use for_each_thread()
coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
signals: kill block_all_signals() and unblock_all_signals()
nilfs2: fix gcc uninitialized-variable warnings in powerpc build
nilfs2: fix gcc unused-but-set-variable warnings
MAINTAINERS: nilfs2: add header file for tracing
nilfs2: add tracepoints for analyzing reading and writing metadata files
...
Pull livepatching fix from Jiri Kosina:
"A fix for a kernel oops in case CONFIG_DEBUG_SET_MODULE_RONX is unset
(as in such case it's possible for module struct to share a page with
executable text, which is currently not being handled with grace) from
Josh Poimboeuf"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX
When we get loaded by a 64-bit bootloader, kernel entry point is
startup_64 in head_64.S. We don't trust any and all bootloaders because
some will fiddle with CPU configuration so we go ahead and massage each
CPU into sanity again.
For example, some dell BIOSes have this XD disable feature which set
IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround
for other OSes but Linux sure doesn't need it.
A similar thing is present in the Surface 3 firmware - see
https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit
only on the BSP:
# rdmsr -a 0x1a0
400850089
850089
850089
850089
I know, right?!
There's not even an off switch in there.
So fix all those cases by sanitizing the 64-bit entry point too. For
that, make verify_cpu() callable in 64-bit mode also.
Requested-and-debugged-by: "H. Peter Anvin" <hpa@zytor.com>
Reported-and-tested-by: Bastien Nocera <bugzilla@hadess.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The commit f5f3497cad extended the low identity mapping. However, if
the kernel uses more than 2 GB (VMSPLIT_2G_OPT or VMSPLIT_1G memory
split), the normal memory mapping is overwritten by the low identity
mapping causing a crash. To avoid overwritting, limit the low identity
map to cover only memory before kernel range (PAGE_OFFSET).
Fixes: f5f3497cad "x86/setup: Extend low identity map to cover whole kernel range
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446815916-22105-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
On AMD Fam17h systems, the last level cache is not resident in the
northbridge. Therefore, we cannot assign cpu_llc_id to the same value as
Node ID as we have been doing until now.
We should rather look at the ApicID bits of the core to provide us the
last level cache ID info.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Link: http://lkml.kernel.org/r/1446582899-9378-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain
interfaces") brought a regression for Hyper-V Gen2 instances. These
instances don't have i8259 legacy PIC but they use legacy IRQs for serial
port, rtc, and acpi. With this commit included we end up with these IRQs
not initialized. Earlier, there was a special workaround for legacy IRQs
in mp_map_pin_to_irq() doing mp_irqdomain_map() without looking at
nr_legacy_irqs() and now we fail in __irq_domain_alloc_irqs() when
irq_domain_alloc_descs() returns -EEXIST.
The essence of the issue seems to be that early_irq_init() calls
arch_probe_nr_irqs() to figure out the number of legacy IRQs before
we probe for i8259 and gets 16. Later when init_8259A() is called we switch
to NULL legacy PIC and nr_legacy_irqs() starts to return 0 but we already
have 16 descs allocated.
Solve the issue by separating i8259 probe from init and calling it in
arch_probe_nr_irqs() before we actually use nr_legacy_irqs() information.
Fixes: d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446543614-3621-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The Intel Merrifield SoC is a successor of the Intel MID line of
SoCs. Let's set the neccessary capability for that chip. See commit
c54fdbb282 (x86: Add cpu capability flag X86_FEATURE_NONSTOP_TSC_S3)
for the details.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: http://lkml.kernel.org/r/1444319786-36125-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
stable tags to them. I searched through my INBOX just as the merge window
opened and found lots of patches to pull. I ran them through all my tests
and they were in linux-next for a few days.
Features added this release:
----------------------------
o Module globbing. You can now filter function tracing to several
modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov)
o Tracer specific options are now visible even when the tracer is not
active. It was rather annoying that you can only see and modify tracer
options after enabling the tracer. Now they are in the options/ directory
even when the tracer is not active. Although they are still only visible
when the tracer is active in the trace_options file.
o Trace options are now per instance (although some of the tracer specific
options are global)
o New tracefs file: set_event_pid. If any pid is added to this file, then
all events in the instance will filter out events that are not part of
this pid. sched_switch and sched_wakeup events handle next and the wakee
pids.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWPLQ5AAoJEKKk/i67LK/8CTYIAI1u8DE5QCzv3J0p54jVpNVR
J5FqEU3eXIzd6FS4JXD4nxCeMpUZAy21YnhlZpsnrbJJM5bc9bUsBCwiKKM+MuSZ
ztmy2sgYKkO0h/KUdhNgYJrzis3/Ojquyx9iAqK5ST/Fr+nKYx81akFKjNK53iur
RJRut45sSa8rv11LaL8sgJ6hAWQTc+YkybUdZ5xaMdJmZ6A61T7Y6VzTjbUexuvL
hntCfTjYLtVd8dbfknAnf3B7n/VOO3IFF85wr7ciYR5oEVfPrF8tHmJBlhHExPpX
kaXAiDDRY/UTg/5DQqnp4zmxJoR5BQ2l4pT5PwiLcnwhcphIDNYS8EYUmOYAWjU=
=TjOE
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracking updates from Steven Rostedt:
"Most of the changes are clean ups and small fixes. Some of them have
stable tags to them. I searched through my INBOX just as the merge
window opened and found lots of patches to pull. I ran them through
all my tests and they were in linux-next for a few days.
Features added this release:
----------------------------
- Module globbing. You can now filter function tracing to several
modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov)
- Tracer specific options are now visible even when the tracer is not
active. It was rather annoying that you can only see and modify
tracer options after enabling the tracer. Now they are in the
options/ directory even when the tracer is not active. Although
they are still only visible when the tracer is active in the
trace_options file.
- Trace options are now per instance (although some of the tracer
specific options are global)
- New tracefs file: set_event_pid. If any pid is added to this file,
then all events in the instance will filter out events that are not
part of this pid. sched_switch and sched_wakeup events handle next
and the wakee pids"
* tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits)
tracefs: Fix refcount imbalance in start_creating()
tracing: Put back comma for empty fields in boot string parsing
tracing: Apply tracer specific options from kernel command line.
tracing: Add some documentation about set_event_pid
ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark
tracing: Allow dumping traces without tracking trace started cpus
ring_buffer: Fix more races when terminating the producer in the benchmark
ring_buffer: Do no not complete benchmark reader too early
tracing: Remove redundant TP_ARGS redefining
tracing: Rename max_stack_lock to stack_trace_max_lock
tracing: Allow arch-specific stack tracer
recordmcount: arm64: Replace the ignored mcount call into nop
recordmcount: Fix endianness handling bug for nop_mcount
tracepoints: Fix documentation of RCU lockdep checks
tracing: ftrace_event_is_function() can return boolean
tracing: is_legal_op() can return boolean
ring-buffer: rb_event_is_commit() can return boolean
ring-buffer: rb_per_cpu_empty() can return boolean
ring_buffer: ring_buffer_empty{cpu}() can return boolean
ring-buffer: rb_is_reader_page() can return boolean
...
handling.
PPC: Mostly bug fixes.
ARM: No big features, but many small fixes and prerequisites including:
- a number of fixes for the arch-timer
- introducing proper level-triggered semantics for the arch-timers
- a series of patches to synchronously halt a guest (prerequisite for
IRQ forwarding)
- some tracepoint improvements
- a tweak for the EL2 panic handlers
- some more VGIC cleanups getting rid of redundant state
x86: quite a few changes:
- support for VT-d posted interrupts (i.e. PCI devices can inject
interrupts directly into vCPUs). This introduces a new component (in
virt/lib/) that connects VFIO and KVM together. The same infrastructure
will be used for ARM interrupt forwarding as well.
- more Hyper-V features, though the main one Hyper-V synthetic interrupt
controller will have to wait for 4.5. These will let KVM expose Hyper-V
devices.
- nested virtualization now supports VPID (same as PCID but for vCPUs)
which makes it quite a bit faster
- for future hardware that supports NVDIMM, there is support for clflushopt,
clwb, pcommit
- support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
userspace, which reduces the attack surface of the hypervisor
- obligatory smattering of SMM fixes
- on the guest side, stable scheduler clock support was rewritten to not
require help from the hypervisor.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a
f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw
VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u
p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ
PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG
iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc=
=iv2z
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"First batch of KVM changes for 4.4.
s390:
A bunch of fixes and optimizations for interrupt and time handling.
PPC:
Mostly bug fixes.
ARM:
No big features, but many small fixes and prerequisites including:
- a number of fixes for the arch-timer
- introducing proper level-triggered semantics for the arch-timers
- a series of patches to synchronously halt a guest (prerequisite
for IRQ forwarding)
- some tracepoint improvements
- a tweak for the EL2 panic handlers
- some more VGIC cleanups getting rid of redundant state
x86:
Quite a few changes:
- support for VT-d posted interrupts (i.e. PCI devices can inject
interrupts directly into vCPUs). This introduces a new
component (in virt/lib/) that connects VFIO and KVM together.
The same infrastructure will be used for ARM interrupt
forwarding as well.
- more Hyper-V features, though the main one Hyper-V synthetic
interrupt controller will have to wait for 4.5. These will let
KVM expose Hyper-V devices.
- nested virtualization now supports VPID (same as PCID but for
vCPUs) which makes it quite a bit faster
- for future hardware that supports NVDIMM, there is support for
clflushopt, clwb, pcommit
- support for "split irqchip", i.e. LAPIC in kernel +
IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
the hypervisor
- obligatory smattering of SMM fixes
- on the guest side, stable scheduler clock support was rewritten
to not require help from the hypervisor"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
KVM: VMX: Fix commit which broke PML
KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
KVM: x86: allow RSM from 64-bit mode
KVM: VMX: fix SMEP and SMAP without EPT
KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
KVM: device assignment: remove pointless #ifdefs
KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
KVM: x86: zero apic_arb_prio on reset
drivers/hv: share Hyper-V SynIC constants with userspace
KVM: x86: handle SMBASE as physical address in RSM
KVM: x86: add read_phys to x86_emulate_ops
KVM: x86: removing unused variable
KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
KVM: arm/arm64: Optimize away redundant LR tracking
KVM: s390: use simple switch statement as multiplexer
KVM: s390: drop useless newline in debugging data
KVM: s390: SCA must not cross page boundaries
KVM: arm: Do not indent the arguments of DECLARE_BITMAP
...
All APIC implementation have send_IPI now. Remove the conditional in
the calling code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.807817097@linutronix.de
Wire up the default_send_IPI_single() wrapper to the last holdouts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.711224890@linutronix.de
Instead of doing the wrapping in the smp code we can provide a default
wrapper for those APICs which insist on cpumasks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.631111846@linutronix.de
No value in having 32 byte extra text.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220848.975653382@linutronix.de
[ tglx: Split it out from the patch which provides the new callback ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220848.817975597@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We still fall back on the "send mask" versions if an apic definition
doesn't have the single-target version, but at least this allows the
(trivial) case for the common clustered x2apic case.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220848.737120838@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
- ACPICA update to upstream revision 20150930 (Bob Moore, Lv Zheng).
The most significant change is to allow the AML debugger to be
built into the kernel. On top of that there is an update related
to the NFIT table (the ACPI persistent memory interface)
and a few fixes and cleanups.
- ACPI CPPC2 (Collaborative Processor Performance Control v2)
support along with a cpufreq frontend (Ashwin Chaugule).
This can only be enabled on ARM64 at this point.
- New ACPI infrastructure for the early probing of IRQ chips and
clock sources (Marc Zyngier).
- Support for a new hierarchical properties extension of the ACPI
_DSD (Device Specific Data) device configuration object allowing
the kernel to handle hierarchical properties (provided by the
platform firmware this way) automatically and make them available
to device drivers via the generic device properties interface
(Rafael Wysocki).
- Generic device properties API extension to obtain an index of
certain string value in an array of strings, along the lines of
of_property_match_string(), but working for all of the supported
firmware node types, and support for the "dma-names" device
property based on it (Mika Westerberg).
- ACPI core fix to parse the MADT (Multiple APIC Description Table)
entries in the order expected by platform firmware (and mandated
by the specification) to avoid confusion on systems with more than
255 logical CPUs (Lukasz Anaczkowski).
- Consolidation of the ACPI-based handling of PCI host bridges
on x86 and ia64 (Jiang Liu).
- ACPI core fixes to ensure that the correct IRQ number is used to
represent the SCI (System Control Interrupt) in the cases when
it has been re-mapped (Chen Yu).
- New ACPI backlight quirk for Lenovo IdeaPad S405 (Hans de Goede).
- ACPI EC driver fixes (Lv Zheng).
- Assorted ACPI fixes and cleanups (Dan Carpenter, Insu Yun, Jiri
Kosina, Rami Rosen, Rasmus Villemoes).
- New mechanism in the PM core allowing drivers to check if the
platform firmware is going to be involved in the upcoming system
suspend or if it has been involved in the suspend the system is
resuming from at the moment (Rafael Wysocki).
This should allow drivers to optimize their suspend/resume
handling in some cases and the changes include a couple of users
of it (the i8042 input driver, PCI PM).
- PCI PM fix to prevent runtime-suspended devices with PME enabled
from being resumed during system suspend even if they aren't
configured to wake up the system from sleep (Rafael Wysocki).
- New mechanism to report the number of a wakeup IRQ that woke up
the system from sleep last time (Alexandra Yates).
- Removal of unused interfaces from the generic power domains
framework and fixes related to latency measurements in that
code (Ulf Hansson, Daniel Lezcano).
- cpufreq core sysfs interface rework to make it handle CPUs that
share performance scaling settings (represented by a common
cpufreq policy object) more symmetrically (Viresh Kumar).
This should help to simplify the CPU offline/online handling among
other things.
- cpufreq core fixes and cleanups (Viresh Kumar).
- intel_pstate fixes related to the Turbo Activation Ratio (TAR)
mechanism on client platforms which causes the turbo P-states
range to vary depending on platform firmware settings (Srinivas
Pandruvada).
- intel_pstate sysfs interface fix (Prarit Bhargava).
- Assorted cpufreq driver (imx, tegra20, powernv, integrator) fixes
and cleanups (Bai Ping, Bartlomiej Zolnierkiewicz, Shilpasri G
Bhat, Luis de Bethencourt).
- cpuidle mvebu driver cleanups (Russell King).
- OPP (Operating Performance Points) framework code reorganization
to make it more maintainable (Viresh Kumar).
- Intel Broxton support for the RAPL (Running Average Power Limits)
power capping driver (Amy Wiles).
- Assorted power management code fixes and cleanups (Dan Carpenter,
Geert Uytterhoeven, Geliang Tang, Luis de Bethencourt, Rasmus
Villemoes).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJWOC9oAAoJEILEb/54YlRx/c8P/joflwoFsISwJccG62YTQMuc
bMQKM4Kw0vl5La8+pkLpe5t6+mW7l81UFtYF6Dzd8LOKlD9sszD34z1lHmCeT/oR
wn0uZpHagRyLMUfoyiEtlU/VRU6WQNNtS3EgjwUi7xgFz9Q0pjcCZ9OQ6vKov1j5
+6j40ODif5sgo+2vl+rztJiV0SIMkYdkgNqgfN1FE9bdLA2Zkk+PxxJbtGQORuDu
O/K+XhQT2xWquVWi/1p+VtQxs5glBS1oKm0kogV5bElCvNTRNIVABUNcjogITQwo
QSAKgoCKIoaIl5jtDT6u5dc0y67q/dMtqOY9fOCcOz1Z7jbWQzR8D7mpFWIsJUPK
K2LClI3t85ynpN6Jref246A6+C9nwB8JMAiAR11oBw7WbBlkd6tbRgcT5B+iz8UE
FuCCif7pha/Fs+Jt1YRazscIqteQ2bAhhxikuIPMfw2M6M67MNfVNeKA1bAoSM34
dH7JsilblitvV7shrwJHwXPXCOF2jEPoK8I4/q2+TR5qUxEpRJjelQxXGSaQScMZ
iNnjeTgv8H8q+rY5Yjzsl4pxP0Fvf7IuqkptWOJbgepg4cQc9pS87wOpY3uEeQzr
H7ruaQJFCnLO4aXbPNClsiJARhrBk+qMlsh4vBEyCJ2T0ucb+nIUcN4BTi8t85yl
X97BfHHUiDoUrnIsNids
=1gaH
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.4-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"Quite a new features are included this time.
First off, the Collaborative Processor Performance Control interface
(version 2) defined by ACPI will now be supported on ARM64 along with
a cpufreq frontend for CPU performance scaling.
Second, ACPI gets a new infrastructure for the early probing of IRQ
chips and clock sources (along the lines of the existing similar
mechanism for DT).
Next, the ACPI core and the generic device properties API will now
support a recently introduced hierarchical properties extension of the
_DSD (Device Specific Data) ACPI device configuration object. If the
ACPI platform firmware uses that extension to organize device
properties in a hierarchical way, the kernel will automatically handle
it and make those properties available to device drivers via the
generic device properties API.
It also will be possible to build the ACPICA's AML interpreter
debugger into the kernel now and use that to diagnose AML-related
problems more efficiently. In the future, this should make it
possible to single-step AML execution and do similar things.
Interesting stuff, although somewhat experimental at this point.
Finally, the PM core gets a new mechanism that can be used by device
drivers to distinguish between suspend-to-RAM (based on platform
firmware support) and suspend-to-idle (or other variants of system
suspend the platform firmware is not involved in) and possibly
optimize their device suspend/resume handling accordingly.
In addition to that, some existing features are re-organized quite
substantially.
First, the ACPI-based handling of PCI host bridges on x86 and ia64 is
unified and the common code goes into the ACPI core (so as to reduce
code duplication and eliminate non-essential differences between the
two architectures in that area).
Second, the Operating Performance Points (OPP) framework is
reorganized to make the code easier to find and follow.
Next, the cpufreq core's sysfs interface is reorganized to get rid of
the "primary CPU" concept for configurations in which the same
performance scaling settings are shared between multiple CPUs.
Finally, some interfaces that aren't necessary any more are dropped
from the generic power domains framework.
On top of the above we have some minor extensions, cleanups and bug
fixes in multiple places, as usual.
Specifics:
- ACPICA update to upstream revision 20150930 (Bob Moore, Lv Zheng).
The most significant change is to allow the AML debugger to be
built into the kernel. On top of that there is an update related
to the NFIT table (the ACPI persistent memory interface) and a few
fixes and cleanups.
- ACPI CPPC2 (Collaborative Processor Performance Control v2) support
along with a cpufreq frontend (Ashwin Chaugule).
This can only be enabled on ARM64 at this point.
- New ACPI infrastructure for the early probing of IRQ chips and
clock sources (Marc Zyngier).
- Support for a new hierarchical properties extension of the ACPI
_DSD (Device Specific Data) device configuration object allowing
the kernel to handle hierarchical properties (provided by the
platform firmware this way) automatically and make them available
to device drivers via the generic device properties interface
(Rafael Wysocki).
- Generic device properties API extension to obtain an index of
certain string value in an array of strings, along the lines of
of_property_match_string(), but working for all of the supported
firmware node types, and support for the "dma-names" device
property based on it (Mika Westerberg).
- ACPI core fix to parse the MADT (Multiple APIC Description Table)
entries in the order expected by platform firmware (and mandated by
the specification) to avoid confusion on systems with more than 255
logical CPUs (Lukasz Anaczkowski).
- Consolidation of the ACPI-based handling of PCI host bridges on x86
and ia64 (Jiang Liu).
- ACPI core fixes to ensure that the correct IRQ number is used to
represent the SCI (System Control Interrupt) in the cases when it
has been re-mapped (Chen Yu).
- New ACPI backlight quirk for Lenovo IdeaPad S405 (Hans de Goede).
- ACPI EC driver fixes (Lv Zheng).
- Assorted ACPI fixes and cleanups (Dan Carpenter, Insu Yun, Jiri
Kosina, Rami Rosen, Rasmus Villemoes).
- New mechanism in the PM core allowing drivers to check if the
platform firmware is going to be involved in the upcoming system
suspend or if it has been involved in the suspend the system is
resuming from at the moment (Rafael Wysocki).
This should allow drivers to optimize their suspend/resume handling
in some cases and the changes include a couple of users of it (the
i8042 input driver, PCI PM).
- PCI PM fix to prevent runtime-suspended devices with PME enabled
from being resumed during system suspend even if they aren't
configured to wake up the system from sleep (Rafael Wysocki).
- New mechanism to report the number of a wakeup IRQ that woke up the
system from sleep last time (Alexandra Yates).
- Removal of unused interfaces from the generic power domains
framework and fixes related to latency measurements in that code
(Ulf Hansson, Daniel Lezcano).
- cpufreq core sysfs interface rework to make it handle CPUs that
share performance scaling settings (represented by a common cpufreq
policy object) more symmetrically (Viresh Kumar).
This should help to simplify the CPU offline/online handling among
other things.
- cpufreq core fixes and cleanups (Viresh Kumar).
- intel_pstate fixes related to the Turbo Activation Ratio (TAR)
mechanism on client platforms which causes the turbo P-states range
to vary depending on platform firmware settings (Srinivas
Pandruvada).
- intel_pstate sysfs interface fix (Prarit Bhargava).
- Assorted cpufreq driver (imx, tegra20, powernv, integrator) fixes
and cleanups (Bai Ping, Bartlomiej Zolnierkiewicz, Shilpasri G
Bhat, Luis de Bethencourt).
- cpuidle mvebu driver cleanups (Russell King).
- OPP (Operating Performance Points) framework code reorganization to
make it more maintainable (Viresh Kumar).
- Intel Broxton support for the RAPL (Running Average Power Limits)
power capping driver (Amy Wiles).
- Assorted power management code fixes and cleanups (Dan Carpenter,
Geert Uytterhoeven, Geliang Tang, Luis de Bethencourt, Rasmus
Villemoes)"
* tag 'pm+acpi-4.4-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (108 commits)
cpufreq: postfix policy directory with the first CPU in related_cpus
cpufreq: create cpu/cpufreq/policyX directories
cpufreq: remove cpufreq_sysfs_{create|remove}_file()
cpufreq: create cpu/cpufreq at boot time
cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask
cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
PM / Domains: Merge measurements for PM QoS device latencies
PM / Domains: Don't measure ->start|stop() latency in system PM callbacks
PM / clk: Fix broken build due to non-matching code and header #ifdefs
ACPI / Documentation: add copy_dsdt to ACPI format options
ACPI / sysfs: correctly check failing memory allocation
ACPI / video: Add a quirk to force native backlight on Lenovo IdeaPad S405
ACPI / CPPC: Fix potential memory leak
ACPI / CPPC: signedness bug in register_pcc_channel()
ACPI / PAD: power_saving_thread() is not freezable
ACPI / PM: Fix incorrect wakeup IRQ setting during suspend-to-idle
ACPI: Using correct irq when waiting for events
ACPI: Use correct IRQ when uninstalling ACPI interrupt handler
cpuidle: mvebu: disable the bind/unbind attributes and use builtin_platform_driver
cpuidle: mvebu: clean up multiple platform drivers
...
Pull x86 sigcontext header cleanups from Ingo Molnar:
"This series reorganizes and cleans up various aspects of the main
sigcontext UAPI headers, such as unifying the data structures and
updating/adding lots of comments to explain all the ABI details and
quirks. The headers can now also be built in user-space standalone"
* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/headers: Clean up too long lines
x86/headers: Remove <asm/sigcontext.h> references on the kernel side
x86/headers: Remove direct sigcontext32.h uses
x86/headers: Convert sigcontext_ia32 uses to sigcontext_32
x86/headers: Unify 'struct sigcontext_ia32' and 'struct sigcontext_32'
x86/headers: Make sigcontext pointers bit independent
x86/headers: Move the 'struct sigcontext' definitions into the UAPI header
x86/headers: Clean up the kernel's struct sigcontext types to be ABI-clean
x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32
x86/headers: Unify 'struct _fpstate_ia32' and i386 struct _fpstate
x86/headers: Unify register type definitions between 32-bit compat and i386
x86/headers: Use ABI types consistently in sigcontext*.h
x86/headers: Separate out legacy user-space structure definitions
x86/headers: Clean up and better document uapi/asm/sigcontext.h
x86/headers: Clean up uapi/asm/sigcontext32.h
x86/headers: Fix (old) header file dependency bug in uapi/asm/sigcontext32.h
Pull x86 fpu changes from Ingo Molnar:
"There are two main areas of changes:
- Rework of the extended FPU state code to robustify the kernel's
usage of cpuid provided xstate sizes - and related changes (Dave
Hansen)"
- math emulation enhancements: new modern FPU instructions support,
with testcases, plus cleanups (Denys Vlasnko)"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/fpu: Fixup uninitialized feature_name warning
x86/fpu/math-emu: Add support for FISTTP instructions
x86/fpu/math-emu, selftests: Add test for FISTTP instructions
x86/fpu/math-emu: Add support for FCMOVcc insns
x86/fpu/math-emu: Add support for F[U]COMI[P] insns
x86/fpu/math-emu: Remove define layer for undocumented opcodes
x86/fpu/math-emu, selftests: Add tests for FCMOV and FCOMI insns
x86/fpu/math-emu: Remove !NO_UNDOC_CODE
x86/fpu: Check CPU-provided sizes against struct declarations
x86/fpu: Check to ensure increasing-offset xstate offsets
x86/fpu: Correct and check XSAVE xstate size calculations
x86/fpu: Add C structures for AVX-512 state components
x86/fpu: Rework YMM definition
x86/fpu/mpx: Rework MPX 'xstate' types
x86/fpu: Add xfeature_enabled() helper instead of test_bit()
x86/fpu: Remove 'xfeature_nr'
x86/fpu: Rework XSTATE_* macros to remove magic '2'
x86/fpu: Rename XFEATURES_NR_MAX
x86/fpu: Rename XSAVE macros
x86/fpu: Remove partial LWP support definitions
...
Pull x86 kgdb fixlet from Ingo Molnar:
"A single debugging related commit: compress the memory usage of a kgdb
data structure"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kgdb: Replace bool_int_array[NR_CPUS] with bitmap
Pull x86 cpu changes from Ingo Molnar:
"Two changes in this cycle: a Kconfig help text enhancement, and an AMD
CLZERO instruction capability detection and enumeration"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add CLZERO detection
x86/Kconfig/cpus: Fix/complete CPU type help texts
Pull x86 cleanups from Ingo Molnar:
"An early_printk cleanup plus deinlining enhancements"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/early_printk: Set __iomem address space for IO
x86/signal: Deinline get_sigframe, save 240 bytes
x86: Deinline early_console_register, save 403 bytes
x86/e820: Deinline e820_type_to_string, save 126 bytes
Pull x86 boot cleanup from Ingo Molnar:
"A single commit: remove an obsolete kcrash boot flag"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kexec: Remove obsolete 'in_crash_kexec' flag
Pull x86 asm changes from Ingo Molnar:
"The main change in this cycle is another step in the big x86 system
call interface rework by Andy Lutomirski, which moves most of the low
level x86 entry code from assembly to C, for all syscall entries
except native 64-bit system calls:
arch/x86/entry/entry_32.S | 182 ++++------
arch/x86/entry/entry_64_compat.S | 547 ++++++++-----------------------
194 insertions(+), 535 deletions(-)
... our hope is that the final remaining step (converting native
64-bit system calls) will be less painful as all the previous steps,
given that most of the legacies and quirks are concentrated around
native 32-bit and compat environments"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
x86/entry/32: Fix FS and GS restore in opportunistic SYSEXIT
x86/entry/32: Fix entry_INT80_32() to expect interrupts to be on
um/x86: Fix build after x86 syscall changes
x86/asm: Remove the xyz_cfi macros from dwarf2.h
selftests/x86: Style fixes for the 'unwind_vdso' test
x86/entry/64/compat: Document sysenter_fix_flags's reason for existence
x86/entry: Split and inline syscall_return_slowpath()
x86/entry: Split and inline prepare_exit_to_usermode()
x86/entry: Use pt_regs_to_thread_info() in syscall entry tracing
x86/entry: Hide two syscall entry assertions behind CONFIG_DEBUG_ENTRY
x86/entry: Micro-optimize compat fast syscall arg fetch
x86/entry: Force inlining of 32-bit syscall code
x86/entry: Make irqs_disabled checks in exit code depend on lockdep
x86/entry: Remove unnecessary IRQ twiddling in fast 32-bit syscalls
x86/asm: Remove thread_info.sysenter_return
x86/entry/32: Re-implement SYSENTER using the new C path
x86/entry/32: Switch INT80 to the new C syscall path
x86/entry/32: Open-code return tracking from fork and kthreads
x86/entry/compat: Implement opportunistic SYSRETL for compat syscalls
x86/vdso/compat: Wire up SYSENTER and SYSCSALL for compat userspace
...
Pull x86 apic changes from Ingo Molnar:
"The main changes in this cycle were:
- Numachip updates: new hardware support, fixes and cleanups.
(Daniel J Blueman)
- misc smaller cleanups and fixlets"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/io_apic: Make eoi_ioapic_pin() static
x86/irq: Drop unlikely before IS_ERR_OR_NULL
x86/x2apic: Make stub functions available even if !CONFIG_X86_LOCAL_APIC
x86/apic: Deinline various functions
x86/numachip: Fix timer build conflict
x86/numachip: Introduce Numachip2 timer mechanisms
x86/numachip: Add Numachip IPI optimisations
x86/numachip: Add Numachip2 APIC support
x86/numachip: Cleanup Numachip support
Pull scheduler changes from Ingo Molnar:
"The main changes in this cycle were:
- sched/fair load tracking fixes and cleanups (Byungchul Park)
- Make load tracking frequency scale invariant (Dietmar Eggemann)
- sched/deadline updates (Juri Lelli)
- stop machine fixes, cleanups and enhancements for bugs triggered by
CPU hotplug stress testing (Oleg Nesterov)
- scheduler preemption code rework: remove PREEMPT_ACTIVE and related
cleanups (Peter Zijlstra)
- Rework the sched_info::run_delay code to fix races (Peter Zijlstra)
- Optimize per entity utilization tracking (Peter Zijlstra)
- ... misc other fixes, cleanups and smaller updates"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
sched: Don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS
sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop()
sched: Start stopper early
stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark()
stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark()
stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled
stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works()
stop_machine: Ensure that a queued callback will be called before cpu_stop_park()
sched/x86: Fix typo in __switch_to() comments
sched/core: Remove a parameter in the migrate_task_rq() function
sched/core: Drop unlikely behind BUG_ON()
sched/core: Fix task and run queue sched_info::run_delay inconsistencies
sched/numa: Fix task_tick_fair() from disabling numa_balancing
sched/core: Add preempt_count invariant check
sched/core: More notrace annotations
sched/core: Kill PREEMPT_ACTIVE
sched/core, sched/x86: Kill thread_info::saved_preempt_count
sched/core: Simplify preempt_count tests
sched/core: Robustify preemption leak checks
sched/core: Stop setting PREEMPT_ACTIVE
...
Pull RAS changes from Ingo Molnar:
"The main system reliability related changes were from x86, but also
some generic RAS changes:
- AMD MCE error injection subsystem enhancements. (Aravind
Gopalakrishnan)
- Fix MCE and CPU hotplug interaction bug. (Ashok Raj)
- kcrash bootup robustness fix. (Baoquan He)
- kcrash cleanups. (Borislav Petkov)
- x86 microcode driver rework: simplify it by unmodularizing it and
other cleanups. (Borislav Petkov)"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init()
x86/mce: Add a Scalable MCA vendor flags bit
MAINTAINERS: Unify the microcode driver section
x86/microcode/intel: Move #ifdef DEBUG inside the function
x86/microcode/amd: Remove maintainers from comments
x86/microcode: Remove modularization leftovers
x86/microcode: Merge the early microcode loader
x86/microcode: Unmodularize the microcode driver
x86/mce: Fix thermal throttling reporting after kexec
kexec/crash: Say which char is the unrecognized
x86/setup/crash: Check memblock_reserve() retval
x86/setup/crash: Cleanup some more
x86/setup/crash: Remove alignment variable
x86/setup: Cleanup crashkernel reservation functions
x86/amd_nb, EDAC: Rename amd_get_node_id()
x86/setup: Do not reserve crashkernel high memory if low reservation failed
x86/microcode/amd: Do not overwrite final patch levels
x86/microcode/amd: Extract current patch level read to a function
x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC
x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts
...
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Improve accuracy of perf/sched clock on x86. (Adrian Hunter)
- Intel DS and BTS updates. (Alexander Shishkin)
- Intel cstate PMU support. (Kan Liang)
- Add group read support to perf_event_read(). (Peter Zijlstra)
- Branch call hardware sampling support, implemented on x86 and
PowerPC. (Stephane Eranian)
- Event groups transactional interface enhancements. (Sukadev
Bhattiprolu)
- Enable proper x86/intel/uncore PMU support on multi-segment PCI
systems. (Taku Izumi)
- ... misc fixes and cleanups.
The perf tooling team was very busy again with 200+ commits, the full
diff doesn't fit into lkml size limits. Here's an (incomplete) list
of the tooling highlights:
New features:
- Change the default event used in all tools (record/top): use the
most precise "cycles" hw counter available, i.e. when the user
doesn't specify any event, it will try using cycles:ppp, cycles:pp,
etc and fall back transparently until it finds a working counter.
(Arnaldo Carvalho de Melo)
- Integration of perf with eBPF that, given an eBPF .c source file
(or .o file built for the 'bpf' target with clang), will get it
automatically built, validated and loaded into the kernel via the
sys_bpf syscall, which can then be used and seen using 'perf trace'
and other tools.
(Wang Nan)
Various user interface improvements:
- Automatic pager invocation on long help output. (Namhyung Kim)
- Search for more options when passing args to -h, e.g.: (Arnaldo
Carvalho de Melo)
$ perf report -h interface
Usage: perf report [<options>]
--gtk Use the GTK2 interface
--stdio Use the stdio interface
--tui Use the TUI interface
- Show ordered command line options when -h is used or when an
unknown option is specified. (Arnaldo Carvalho de Melo)
- If options are passed after -h, show just its descriptions, not all
options. (Arnaldo Carvalho de Melo)
- Implement column based horizontal scrolling in the hists browser
(top, report), making it possible to use the TUI for things like
'perf mem report' where there are many more columns than can fit in
a terminal. (Arnaldo Carvalho de Melo)
- Enhance the error reporting of tracepoint event parsing, e.g.:
$ oldperf record -e sched:sched_switc usleep 1
event syntax error: 'sched:sched_switc'
\___ unknown tracepoint
Run 'perf list' for a list of valid events
Now we get the much nicer:
$ perf record -e sched:sched_switc ls
event syntax error: 'sched:sched_switc'
\___ can't access trace events
Error: No permissions to read /sys/kernel/debug/tracing/events/sched/sched_switc
Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'
And after we have those mount point permissions fixed:
$ perf record -e sched:sched_switc ls
event syntax error: 'sched:sched_switc'
\___ unknown tracepoint
Error: File /sys/kernel/debug/tracing/events/sched/sched_switc not found.
Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
I.e. basically now the event parsing routing uses the strerror_open()
routines introduced by and used in 'perf trace' work. (Jiri Olsa)
- Fail properly when pattern matching fails to find a tracepoint,
i.e. '-e non:existent' was being correctly handled, with a proper
error message about that not being a valid event, but '-e
non:existent*' wasn't, fix it. (Jiri Olsa)
- Do event name substring search as last resort in 'perf list'.
(Arnaldo Carvalho de Melo)
E.g.:
# perf list clock
List of pre-defined events (to be used in -e):
cpu-clock [Software event]
task-clock [Software event]
uncore_cbox_0/clockticks/ [Kernel PMU event]
uncore_cbox_1/clockticks/ [Kernel PMU event]
kvm:kvm_pvclock_update [Tracepoint event]
kvm:kvm_update_master_clock [Tracepoint event]
power:clock_disable [Tracepoint event]
power:clock_enable [Tracepoint event]
power:clock_set_rate [Tracepoint event]
syscalls:sys_enter_clock_adjtime [Tracepoint event]
syscalls:sys_enter_clock_getres [Tracepoint event]
syscalls:sys_enter_clock_gettime [Tracepoint event]
syscalls:sys_enter_clock_nanosleep [Tracepoint event]
syscalls:sys_enter_clock_settime [Tracepoint event]
syscalls:sys_exit_clock_adjtime [Tracepoint event]
syscalls:sys_exit_clock_getres [Tracepoint event]
syscalls:sys_exit_clock_gettime [Tracepoint event]
syscalls:sys_exit_clock_nanosleep [Tracepoint event]
syscalls:sys_exit_clock_settime [Tracepoint event]
Intel PT hardware tracing enhancements:
- Accept a zero --itrace period, meaning "as often as possible". In
the case of Intel PT that is the same as a period of 1 and a unit
of 'instructions' (i.e. --itrace=i1i). (Adrian Hunter)
- Harmonize itrace's synthesized callchains with the existing
--max-stack tool option. (Adrian Hunter)
- Allow time to be displayed in nanoseconds in 'perf script'.
(Adrian Hunter)
- Fix potential infinite loop when handling Intel PT timestamps.
(Adrian Hunter)
- Slighly improve Intel PT debug logging. (Adrian Hunter)
- Warn when AUX data has been lost, just like when processing
PERF_RECORD_LOST. (Adrian Hunter)
- Further document export-to-postgresql.py script. (Adrian Hunter)
- Add option to synthesize branch stack from auxtrace data. (Adrian
Hunter)
Misc notable changes:
- Switch the default callchain output mode to 'graph,0.5,caller', to
make it look like the default for other tools, reducing the
learning curve for people used to 'caller' based viewing. (Arnaldo
Carvalho de Melo)
- various call chain usability enhancements. (Namhyung Kim)
- Introduce the 'P' event modifier, meaning 'max precision level,
please', i.e.:
$ perf record -e cycles:P usleep 1
Is now similar to:
$ perf record usleep 1
Useful, for instance, when specifying multiple events. (Jiri Olsa)
- Add 'socket' sort entry, to sort by the processor socket in 'perf
top' and 'perf report'. (Kan Liang)
- Introduce --socket-filter to 'perf report', for filtering by
processor socket. (Kan Liang)
- Add new "Zoom into Processor Socket" operation in the perf hists
browser, used in 'perf top' and 'perf report'. (Kan Liang)
- Allow probing on kmodules without DWARF. (Masami Hiramatsu)
- Fix 'perf probe -l' for probes added to kernel module functions.
(Masami Hiramatsu)
- Preparatory work for the 'perf stat record' feature that will allow
generating perf.data files with counting data in addition to the
sampling mode we have now (Jiri Olsa)
- Update libtraceevent KVM plugin. (Paolo Bonzini)
- ... plus lots of other enhancements that I failed to list properly,
by: Adrian Hunter, Alexander Shishkin, Andi Kleen, Andrzej Hajda,
Arnaldo Carvalho de Melo, Dima Kogan, Don Zickus, Geliang Tang, He
Kuang, Huaitong Han, Ingo Molnar, Jan Stancek, Jiri Olsa, Kan
Liang, Kirill Tkhai, Masami Hiramatsu, Matt Fleming, Namhyung Kim,
Paolo Bonzini, Peter Zijlstra, Rabin Vincent, Scott Wood, Stephane
Eranian, Sukadev Bhattiprolu, Taku Izumi, Vaishali Thakkar, Wang
Nan, Yang Shi and Yunlong Song"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (260 commits)
perf unwind: Pass symbol source to libunwind
tools build: Fix libiberty feature detection
perf tools: Compile scriptlets to BPF objects when passing '.c' to --event
perf record: Add clang options for compiling BPF scripts
perf bpf: Attach eBPF filter to perf event
perf tools: Make sure fixdep is built before libbpf
perf script: Enable printing of branch stack
perf trace: Add cmd string table to decode sys_bpf first arg
perf bpf: Collect perf_evsel in BPF object files
perf tools: Load eBPF object into kernel
perf tools: Create probe points for BPF programs
perf tools: Enable passing bpf object file to --event
perf ebpf: Add the libbpf glue
perf tools: Make perf depend on libbpf
perf symbols: Fix endless loop in dso__split_kallsyms_for_kcore
perf tools: Enable pre-event inherit setting by config terms
perf symbols: we can now read separate debug-info files based on a build ID
perf symbols: Fix type error when reading a build-id
perf tools: Search for more options when passing args to -h
perf stat: Cache aggregated map entries in extra cpumap
...
Pull EFI changes from Ingo Molnar:
"The main changes in this cycle were:
- further EFI code generalization to make it more workable for ARM64
- various extensions, such as 64-bit framebuffer address support,
UEFI v2.5 EFI_PROPERTIES_TABLE support
- code modularization simplifications and cleanups
- new debugging parameters
- various fixes and smaller additions"
* 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
efi: Fix warning of int-to-pointer-cast on x86 32-bit builds
efi: Use correct type for struct efi_memory_map::phys_map
x86/efi: Fix kernel panic when CONFIG_DEBUG_VIRTUAL is enabled
efi: Add "efi_fake_mem" boot option
x86/efi: Rename print_efi_memmap() to efi_print_memmap()
efi: Auto-load the efi-pstore module
efi: Introduce EFI_NX_PE_DATA bit and set it from properties table
efi: Add support for UEFIv2.5 Properties table
efi: Add EFI_MEMORY_MORE_RELIABLE support to efi_md_typeattr_format()
efifb: Add support for 64-bit frame buffer addresses
efi/arm64: Clean up efi_get_fdt_params() interface
arm64: Use core efi=debug instead of uefi_debug command line parameter
efi/x86: Move efi=debug option parsing to core
drivers/firmware: Make efi/esrt.c driver explicitly non-modular
efi: Use the generic efi.memmap instead of 'memmap'
acpi/apei: Use appropriate pgprot_t to map GHES memory
arm64, acpi/apei: Implement arch_apei_get_mem_attributes()
arm64/mm: Add PROT_DEVICE_nGnRnE and PROT_NORMAL_WT
acpi, x86: Implement arch_apei_get_mem_attributes()
efi, x86: Rearrange efi_mem_attributes()
...
Pull timer updates from Thomas Gleixner:
"The timer departement provides:
- More y2038 work in the area of ntp and pps.
- Optimization of posix cpu timers
- New time related selftests
- Some new clocksource drivers
- The usual pile of fixes, cleanups and improvements"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
timeconst: Update path in comment
timers/x86/hpet: Type adjustments
clocksource/drivers/armada-370-xp: Implement ARM delay timer
clocksource/drivers/tango_xtal: Add new timer for Tango SoCs
clocksource/drivers/imx: Allow timer irq affinity change
clocksource/drivers/exynos_mct: Use container_of() instead of this_cpu_ptr()
clocksource/drivers/h8300_*: Remove unneeded memset()s
clocksource/drivers/sh_cmt: Remove unneeded memset() in sh_cmt_setup()
clocksource/drivers/em_sti: Remove unneeded memset()s
clocksource/drivers/mediatek: Use GPT as sched clock source
clockevents/drivers/mtk: Fix spurious interrupt leading to crash
posix_cpu_timer: Reduce unnecessary sighand lock contention
posix_cpu_timer: Convert cputimer->running to bool
posix_cpu_timer: Check thread timers only when there are active thread timers
posix_cpu_timer: Optimize fastpath_timer_check()
timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments
timers: Use __fls in apply_slack()
clocksource: Remove return statement from void functions
net: sfc: avoid using timespec
ntp/pps: use y2038 safe types in pps_event_time
...
AMD Fam17h processors introduce support for the CLZERO
instruction. It zeroes out the 64 byte cache line specified in
RAX.
Add the bit here to allow /proc/cpuinfo to list the feature.
Boris: we're adding this as a separate ->x86_capability leaf
because CPUID_80000008_EBX is going to contain more feature bits
and it will fill out with time.
Signed-off-by: Wan Zongshun <Vincent.Wan@amd.com>
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
[ Wrap code in patch form, fix comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1446207099-24948-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Scalable MCA (SMCA) is a new feature in AMD Fam17h processors
which indicates presence of MCA extensions.
MCA extensions expands existing register space for the MCE banks
and also introduces a new MSR range to accommodate new banks.
Add the detection bit.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Reformat mce_vendor_flags definitions and save indentation levels. Improve comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1446207099-24948-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
thread.vm86 points to per-task information -- the pointer should not
be copied on clone.
Fixes: d4ce0f26c7 ("x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86'")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Stas Sergeev <stsp@list.ru>
Link: http://lkml.kernel.org/r/71c5d6985d70ec8197c8d72f003823c81b7dcf99.1446270067.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit 4857c91f0d changed the way how irq affinity is setup in
setup_ioapic_dest() from using the core helper function to
unconditionally calling the irq_set_affinity() callback of the
underlying irq chip.
That results in a NULL pointer dereference for the rare case where the
underlying irq chip is lapic_chip which has no irq_set_affinity()
callback. lapic_chip is occasionally used for the timer interrupt (irq
0).
The fix is simple: Check the availability of the callback instead of
calling it unconditionally.
Fixes: 4857c91f0d "x86/ioapic: Force affinity setting in setup_ioapic_dest()"
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Commit 6894258eda broke drivers that pass NULL as the device pointer
to dma_alloc. The reason is that arch_dma_alloc_attrs() now calls
dma_alloc_coherent_gfp_flags() which in turn calls
dma_alloc_coherent_mask(), where the device pointer is dereferenced
unconditionally.
Fix things by moving the ISA DMA fallback device assignment before the
call to dma_alloc_coherent_gfp_flags().
Fixes: 6894258eda ("dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}")
Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/1445807503-8920-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Now, ftrace only calculate the dyn_ftrace number in the adding
breakpoint loop, not in adding update and finish update loop.
Calculate the correct dyn_ftrace, once ftrace reports the failure message
to the userspace.
Link: http://lkml.kernel.org/r/1442420382-13130-1-git-send-email-mnfhuang@gmail.com
Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
... and save us the stub.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We have the MAINTAINERS file for that. Also, Andreas doesn't
have the time for this work anymore.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Remove the remaining module functionality leftovers. Make
"dis_ucode_ldr" an early_param and make it static again. Drop
module aliases, autoloading table, description, etc.
Bump version number, while at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge the early loader functionality into the driver proper. The
diff is huge but logically, it is simply moving code from the
_early.c files into the main driver.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make CONFIG_MICROCODE a bool. It was practically a bool already anyway,
since early loader was forcing it to =y.
Regardless, there's no real reason to have something be a module which
gets built-in on the majority of installations out there. And its not
like there's noticeable change in functionality - we still can load late
microcode - just the module glue disappears.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Standardize on bool instead of an inconsistent mixture of u8 and plain 'int'.
Also use u32 or 'unsigned int' instead of 'unsigned long' when a 32-bit type
suffices, generating slightly better code on x86-64.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5624E3A002000078000AC49A@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The per CPU thermal vector init code checks if the thermal
vector is already installed and complains and bails out if it
is.
This happens after kexec, as kernel shut down does not clear the
thermal vector APIC register.
This causes two problems:
1. So we always do not fully initialize thermal reports after
kexec. The CPU is still likely initialized, as the previous
kernel should have done it. But we don't set up the software
pointer to the thermal vector, so reporting may end up with a
unknown thermal interrupt message.
2. Also it complains for every logical CPU, even though the
value is actually derived from BP only.
The problem is that we end up with one message per CPU, so on
larger systems it becomes very noisy and messes up the otherwise
nicely formatted CPU bootup numbers in the kernel log.
Just remove the check. I checked the code and there's no valid
code paths where the thermal init code for a CPU could be called
multiple times.
Why the kernel does not clean up this value on shutdown:
The thermal monitoring is controlled per logical CPU thread.
Normal shutdown code is just running on one CPU. To disable it
we would need a broadcast NMI to all CPUs on shut down. That's
overkill for this. So we just ignore it after kexec.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1445246268-26285-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
memblock_reserve() can fail but the crashkernel reservation code
doesn't check that and this can lead the user into believing
that the crashkernel region was actually reserved. Make sure we
check that return value and we exit early with a failure message
in the error case.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Chao <chaowang@redhat.com>
Cc: jerry_hoemann@hp.com
Link: http://lkml.kernel.org/r/1445246268-26285-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
People reported that when allocating crashkernel memory using
the ",high" and ",low" syntax, there were cases where the
reservation of the high portion succeeds but the reservation of
the low portion fails.
Then kexec can load the kdump kernel successfully, but booting
the kdump kernel fails as there's no low memory.
The low memory allocation for the kdump kernel can fail on large
systems for a couple of reasons. For example, the manually
specified crashkernel low memory can be too large and thus no
adequate memblock region would be found.
Therefore, we try to reserve low memory for the crash kernel
*after* the high memory portion has been allocated. If that
fails, we free crashkernel high memory too and return. The user
can then take measures accordingly.
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Massage text. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Young <dyoung@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Chao <chaowang@redhat.com>
Cc: jerry_hoemann@hp.com
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/1445246268-26285-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
get_wchan() is racy by design, it may access volatile stack
of running task, thus it may access redzone in a stack frame
and cause KASAN to warn about this.
Use READ_ONCE_NOCHECK() to silence these warnings.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Link: http://lkml.kernel.org/r/1445243838-17763-3-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch enables the suport for the PERF_SAMPLE_BRANCH_CALL
for Intel x86 processors. When the processor support LBR filtering
this the selection is done in hardware. Otherwise, the filter is
applied by software. Note that we chose to include zero length calls
because they also represent calls.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: khandual@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1444720151-10275-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
b20112edea ("perf/x86: Improve accuracy of perf/sched clock")
allowed the time_shift value in perf_event_mmap_page to be as much
as 32. Unfortunately the documented algorithms for using time_shift
have it shifting an integer, whereas to work correctly with the value
32, the type must be u64.
In the case of perf tools, Intel PT decodes correctly but the timestamps
that are output (for example by perf script) have lost 32-bits of
granularity so they look like they are not changing at all.
Fix by limiting the shift to 31 and adjusting the multiplier accordingly.
Also update the documentation of perf_event_mmap_page so that new code
based on it will be more future-proof.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: b20112edea ("perf/x86: Improve accuracy of perf/sched clock")
Link: http://lkml.kernel.org/r/1445001845-13688-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following commit:
a9bcaa02a5 ("x86/smpboot: Remove SIPI delays from cpu_up()")
Caused some Intel Core2 processors to time-out when bringing up CPU #1,
resulting in the missing of that CPU after bootup.
That patch reduced the SIPI delays from udelay() 300, 200 to udelay() 0,
0 on modern processors.
Several Intel(R) Core(TM)2 systems failed to bring up CPU #1 10/10 times
after that change.
Increasing either of the SIPI delays to udelay(1) results in
success. So here we increase both to udelay(10). While this may
be 20x slower than the absolute minimum, it is still 20x to 30x
faster than the original code.
Tested-by: Donald Parsons <dparsons@brightdsl.net>
Tested-by: Shane <shrybman@teksavvy.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dparsons@brightdsl.net
Cc: shrybman@teksavvy.com
Link: http://lkml.kernel.org/r/6dd554ee8945984d85aafb2ad35793174d068af0.1444968087.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For legacy machines cpu_init_udelay defaults to 10,000.
For modern machines it is set to 0.
The user should be able to set cpu_init_udelay to
any value on the cmdline, including 10,000.
Before this patch, that was seen as "unchanged from default"
and thus on a modern machine, the user request was ignored
and the delay was set to 0.
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dparsons@brightdsl.net
Cc: shrybman@teksavvy.com
Link: http://lkml.kernel.org/r/de363cdbbcfcca1d22569683f7eb9873e0177251.1444968087.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A sporadic hang with consequent crash is observed when booting Hyper-V Gen1
guests:
Call Trace:
<IRQ>
[<ffffffff810ab68d>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff8107b616>] queue_work_on+0x46/0x90
[<ffffffff81365696>] ? add_interrupt_randomness+0x176/0x1d0
...
<EOI>
[<ffffffff81471ddb>] ? _raw_spin_unlock_irqrestore+0x3b/0x60
[<ffffffff810c295e>] __irq_put_desc_unlock+0x1e/0x40
[<ffffffff810c5c35>] irq_modify_status+0xb5/0xd0
[<ffffffff8104adbb>] mp_register_handler+0x4b/0x70
[<ffffffff8104c55a>] mp_irqdomain_alloc+0x1ea/0x2a0
[<ffffffff810c7f10>] irq_domain_alloc_irqs_recursive+0x40/0xa0
[<ffffffff810c860c>] __irq_domain_alloc_irqs+0x13c/0x2b0
[<ffffffff8104b070>] alloc_isa_irq_from_domain.isra.1+0xc0/0xe0
[<ffffffff8104bfa5>] mp_map_pin_to_irq+0x165/0x2d0
[<ffffffff8104c157>] pin_2_irq+0x47/0x80
[<ffffffff81744253>] setup_IO_APIC+0xfe/0x802
...
[<ffffffff814631c0>] ? rest_init+0x140/0x140
The issue is easily reproducible with a simple instrumentation: if
mdelay(10) is put between mp_setup_entry() and mp_register_handler() calls
in mp_irqdomain_alloc() Hyper-V guest always fails to boot when re-routing
IRQ0. The issue seems to be caused by the fact that we don't disable
interrupts while doing IOPIC programming for legacy IRQs and IRQ0 actually
happens.
Protect the setup sequence against concurrent interrupts.
[ tglx: Make the protection unconditional and not only for legacy
interrupts ]
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1444930943-19336-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
On 32-bit systems, the initial_page_table is reused by
efi_call_phys_prolog as an identity map to call
SetVirtualAddressMap. efi_call_phys_prolog takes care of
converting the current CPU's GDT to a physical address too.
For PAE kernels the identity mapping is achieved by aliasing the
first PDPE for the kernel memory mapping into the first PDPE
of initial_page_table. This makes the EFI stub's trick "just work".
However, for non-PAE kernels there is no guarantee that the identity
mapping in the initial_page_table extends as far as the GDT; in this
case, accesses to the GDT will cause a page fault (which quickly becomes
a triple fault). Fix this by copying the kernel mappings from
swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at
identity mapping.
For some reason, this is only reproducible with QEMU's dynamic translation
mode, and not for example with KVM. However, even under KVM one can clearly
see that the page table is bogus:
$ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize
$ gdb
(gdb) target remote localhost:1234
(gdb) hb *0x02858f6f
Hardware assisted breakpoint 1 at 0x2858f6f
(gdb) c
Continuing.
Breakpoint 1, 0x02858f6f in ?? ()
(gdb) monitor info registers
...
GDT= 0724e000 000000ff
IDT= fffbb000 000007ff
CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690
...
The page directory is sane:
(gdb) x/4wx 0x32b7000
0x32b7000: 0x03398063 0x03399063 0x0339a063 0x0339b063
(gdb) x/4wx 0x3398000
0x3398000: 0x00000163 0x00001163 0x00002163 0x00003163
(gdb) x/4wx 0x3399000
0x3399000: 0x00400003 0x00401003 0x00402003 0x00403003
but our particular page directory entry is empty:
(gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4
0x32b7070: 0x00000000
[ It appears that you can skate past this issue if you don't receive
any interrupts while the bogus GDT pointer is loaded, or if you avoid
reloading the segment registers in general.
Andy Lutomirski provides some additional insight:
"AFAICT it's entirely permissible for the GDTR and/or LDT
descriptor to point to unmapped memory. Any attempt to use them
(segment loads, interrupts, IRET, etc) will try to access that memory
as if the access came from CPL 0 and, if the access fails, will
generate a valid page fault with CR2 pointing into the GDT or
LDT."
Up until commit 23a0d4e8fa ("efi: Disable interrupts around EFI
calls, not in the epilog/prolog calls") interrupts were disabled
around the prolog and epilog calls, and the functional GDT was
re-installed before interrupts were re-enabled.
Which explains why no one has hit this issue until now. ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
[ Updated changelog. ]
ACPI specifies the following rules when listing APIC IDs:
(1) Boot processor is listed first
(2) For multi-threaded processors, BIOS should list the first logical
processor of each of the individual multi-threaded processors in MADT
before listing any of the second logical processors.
(3) APIC IDs < 0xFF should be listed in APIC subtable, APIC IDs >= 0xFF
should be listed in X2APIC subtable
Because of above, when there's more than 0xFF logical CPUs, BIOS
interleaves APIC/X2APIC subtables.
Assuming, there's 72 cores, 72 hyper-threads each, 288 CPUs total,
listing is like this:
APIC (0,4,8, .., 252)
X2APIC (258,260,264, .. 284)
APIC (1,5,9,...,253)
X2APIC (259,261,265,...,285)
APIC (2,6,10,...,254)
X2APIC (260,262,266,..,286)
APIC (3,7,11,...,251)
X2APIC (255,261,262,266,..,287)
Now, before this patch, due to how ACPI MADT subtables were parsed (BSP
then X2APIC then APIC), kernel enumerated CPUs in reverted order (i.e.
high APIC IDs were getting low logical IDs, and low APIC IDs were
getting high logical IDs).
This is wrong for the following reasons:
() it's hard to predict how cores and threads are enumerated
() when it's hard to predict, s/w threads cannot be properly affinitized
causing significant performance impact due to e.g. inproper cache
sharing
() enumeration is inconsistent with how threads are enumerated on
other Intel Xeon processors
So, order in which MADT APIC/X2APIC handlers are passed is
reverse and both handlers are passed to be called during same MADT
table to walk to achieve correct CPU enumeration.
In scenario when someone boots kernel with options 'maxcpus=72 nox2apic',
in result less cores may be booted, since some of the CPUs the kernel
will try to use will have APIC ID >= 0xFF. In such case, one
should not pass 'nox2apic'.
Disclimer: code parsing MADT APIC/X2APIC has not been touched since 2009,
when X2APIC support was initially added. I do not know why MADT parsing
code was added in the reversed order in the first place.
I guess it didn't matter at that time since nobody cared about cores
with APIC IDs >= 0xFF, right?
This patch is based on work of "Yinghai Lu <yinghai@kernel.org>"
previously published at https://lkml.org/lkml/2013/1/21/563
Here's the explanation why parsing interface needs to be changed
and why simpler approach will not work https://lkml.org/lkml/2015/9/7/285
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de> (commit message)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
non-modular by ripping out the module_* code since Kconfig doesn't
allow it to be built as a module anyway - Paul Gortmaker
* Make the x86 efi=debug kernel parameter, which enables EFI debug
code and output, generic and usable by arm64 - Leif Lindholm
* Add support to the x86 EFI boot stub for 64-bit Graphics Output
Protocol frame buffer addresses - Matt Fleming
* Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
in the firmware and set an efi.flags bit so the kernel knows when
it can apply more strict runtime mapping attributes - Ard Biesheuvel
* Auto-load the efi-pstore module on EFI systems, just like we
currently do for the efivars module - Ben Hutchings
* Add "efi_fake_mem" kernel parameter which allows the system's EFI
memory map to be updated with additional attributes for specific
memory ranges. This is useful for testing the kernel code that handles
the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
doesn't include support - Taku Izumi
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWG7OwAAoJEC84WcCNIz1VEEEP/0SsdrwJ66B4MfP5YNjqHYWm
+OTHR6Ovv2i10kc+NjOV/GN8sWPndnkLfIfJ4EqJ9BoQ9PDEYZilV2aleSQ4DrPm
H7uGwBXQkfd76tZKX9pMToK76mkhg6M7M2LR3Suv3OGfOEzuozAOt3Ez37lpksTN
2ByhHr/oGbhu99jC2ki5+k0ySH8PMqDBRxqrPbBzTD+FfB7bM11vAJbSNbSMQ21R
ZwX0acZBLqb9J2Vf7tDsW+fCfz0TFo8JHW8jdLRFm/y2dpquzxswkkBpODgA8+VM
0F5UbiUdkaIRug75I6N/OJ8+yLwdzuxm7ul+tbS3JrXGLAlK3850+dP2Pr5zQ2Ce
zaYGRUy+tD5xMXqOKgzpu+Ia8XnDRLhOlHabiRd5fG6ZC9nR8E9uK52g79voSN07
pADAJnVB03CGV/HdduDOI4C4UykUKubuArbQVkqWJcecV1Jic/tYI0gjeACmU1VF
v8FzXpBUe3U3A0jauOz8PBz8M+k5qky/GbIrnEvXreBtKdt999LN9fykTN7rBOpo
dk/6vTR1Jyv3aYc9EXHmRluktI6KmfWCqmRBOIgQveX1VhdRM+1w2LKC0+8co3dF
v/DBh19KDyfPI8eOvxKykhn164UeAt03EXqDa46wFGr2nVOm/JiShL/d+QuyYU4G
8xb/rET4JrhCG4gFMUZ7
=1Oee
-----END PGP SIGNATURE-----
Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi
Pull v4.4 EFI updates from Matt Fleming:
- Make the EFI System Resource Table (ESRT) driver explicitly
non-modular by ripping out the module_* code since Kconfig doesn't
allow it to be built as a module anyway. (Paul Gortmaker)
- Make the x86 efi=debug kernel parameter, which enables EFI debug
code and output, generic and usable by arm64. (Leif Lindholm)
- Add support to the x86 EFI boot stub for 64-bit Graphics Output
Protocol frame buffer addresses. (Matt Fleming)
- Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
in the firmware and set an efi.flags bit so the kernel knows when
it can apply more strict runtime mapping attributes - Ard Biesheuvel
- Auto-load the efi-pstore module on EFI systems, just like we
currently do for the efivars module. (Ben Hutchings)
- Add "efi_fake_mem" kernel parameter which allows the system's EFI
memory map to be updated with additional attributes for specific
memory ranges. This is useful for testing the kernel code that handles
the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
doesn't include support. (Taku Izumi)
Note: there is a semantic conflict between the following two commits:
8a53554e12 ("x86/efi: Fix multiple GOP device support")
ae2ee627dc ("efifb: Add support for 64-bit frame buffer addresses")
I fixed up the interaction in the merge commit, changing the type of
current_fb_base from u32 to u64.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are following warnings on unpatched code:
arch/x86/kernel/early_printk.c:198:32: warning: incorrect type in initializer (different address spaces)
arch/x86/kernel/early_printk.c:198:32: expected void [noderef] <asn:2>*vaddr
arch/x86/kernel/early_printk.c:198:32: got unsigned int [usertype] *<noident>
arch/x86/kernel/early_printk.c:205:32: warning: incorrect type in initializer (different address spaces)
arch/x86/kernel/early_printk.c:205:32: expected void [noderef] <asn:2>*vaddr
arch/x86/kernel/early_printk.c:205:32: got unsigned int [usertype] *<noident>
Annotate it proper.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: http://lkml.kernel.org/r/1444646837-42615-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
A certain number of patch levels of applied microcode should not
be overwritten by the microcode loader, otherwise bad things
will happen.
Check those and abort update if the current core has one of
those final patch levels applied by the BIOS. 32-bit needs
special handling, of course.
See https://bugzilla.suse.com/show_bug.cgi?id=913996 for more
info.
Tested-by: Peter Kirchgeßner <pkirchgessner@t-online.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1444641762-9437-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pave the way for checking the current patch level of the
microcode in a core. We want to be able to do stuff depending on
the patch level - in this case decide whether to update or not.
But that will be added in a later patch.
Drop unused local var uci assignment, while at it.
Integrate a fix for 32-bit and CONFIG_PARAVIRT from Takashi Iwai:
Use native_rdmsr() in check_current_patch_level() because with
CONFIG_PARAVIRT enabled and on 32-bit, where we run before
paging has been enabled, we cannot deref pv_info yet. Or we
could, but we'd need to access its physical address. This way of
fixing it is simpler. See:
https://bugzilla.suse.com/show_bug.cgi?id=943179 for the background.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Takashi Iwai <tiwai@suse.com>:
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1444641762-9437-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch introduces new boot option named "efi_fake_mem".
By specifying this parameter, you can add arbitrary attribute
to specific memory range.
This is useful for debugging of Address Range Mirroring feature.
For example, if "efi_fake_mem=2G@4G:0x10000,2G@0x10a0000000:0x10000"
is specified, the original (firmware provided) EFI memmap will be
updated so that the specified memory regions have
EFI_MEMORY_MORE_RELIABLE attribute (0x10000):
<original>
efi: mem36: [Conventional Memory| | | | | | |WB|WT|WC|UC] range=[0x0000000100000000-0x00000020a0000000) (129536MB)
<updated>
efi: mem36: [Conventional Memory| |MR| | | | |WB|WT|WC|UC] range=[0x0000000100000000-0x0000000180000000) (2048MB)
efi: mem37: [Conventional Memory| | | | | | |WB|WT|WC|UC] range=[0x0000000180000000-0x00000010a0000000) (61952MB)
efi: mem38: [Conventional Memory| |MR| | | | |WB|WT|WC|UC] range=[0x00000010a0000000-0x0000001120000000) (2048MB)
efi: mem39: [Conventional Memory| | | | | | |WB|WT|WC|UC] range=[0x0000001120000000-0x00000020a0000000) (63488MB)
And you will find that the following message is output:
efi: Memory: 4096M/131455M mirrored memory
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Previously, UV NMI used the 'in_crash_kexec' flag to determine whether
we are in a kdump kernel or not:
5edd19af18 ("x86, UV: Make kdump avoid stack dumps")
But this flags was removed in the following commit:
9c48f1c629 ("x86, nmi: Wire up NMI handlers to new routines")
Since it isn't used any more, remove it.
Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: cpw@sgi.com
Cc: kexec@lists.infradead.org
Cc: mhuang@redhat.com
Link: http://lkml.kernel.org/r/1444070155-17934-1-git-send-email-mhuang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We have to define internally used function as static, otherwise the following
warning will be generated:
arch/x86/kernel/apic/io_apic.c:532:6: warning: no previous prototype for 'eoi_ioapic_pin' [-Wmissing-prototypes]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/1444400685-98611-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It's no longer needed.
We could reinstate something like it as an optimization, which
would remove two cachelines from the fast syscall entry working
set. I benchmarked it, and it makes no difference whatsoever to
the performance of cache-hot compat syscalls on Sandy Bridge.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/f08cc0cff30201afe9bb565c47134c0a6c1a96a2.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
32-bit userspace will now always see the same vDSO, which is
exactly what used to be the int80 vDSO. Subsequent patches will
clean it up and make it support SYSENTER and SYSCALL using
alternatives.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/e7e6b3526fa442502e6125fe69486aab50813c32.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In multi-segment system, uncore devices may belong to buses whose segment
number is other than 0:
....
0000:ff:10.5 System peripheral: Intel Corporation Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 03)
...
0001:7f:10.5 System peripheral: Intel Corporation Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 03)
...
0001:bf:10.5 System peripheral: Intel Corporation Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 03)
...
0001:ff:10.5 System peripheral: Intel Corporation Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 03
...
In that case, relation of bus number and physical id may be broken
because "uncore_pcibus_to_physid" doesn't take account of PCI segment.
For example, bus 0000:ff and 0001:ff uses the same entry of
"uncore_pcibus_to_physid" array.
This patch fixes this problem by introducing the segment-aware pci2phy_map instead.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1443096621-4119-1-git-send-email-izumi.taku@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds new PMUs to support cstate related free running
(read-only) counters. These counters may be used simultaneously by other
tools, such as turbostat. However, it still make sense to implement them
in perf. Because we can conveniently collect them together with other
events, and allow to use them from tools without special MSR access
code.
These counters include CORE_C*_RESIDENCY and PKG_C*_RESIDENCY.
According to counters' scope and category, two PMUs are registered with
the perf_event core subsystem.
- 'cstate_core': The counter is available for each physical core. The
counters include CORE_C*_RESIDENCY.
- 'cstate_pkg': The counter is available for each physical package. The
counters include PKG_C*_RESIDENCY.
The events are exposed in sysfs for use by perf stat and other tools.
The files are:
/sys/devices/cstate_core/events/c*-residency
/sys/devices/cstate_pkg/events/c*-residency
These events only support system-wide mode counting.
The /sys/devices/cstate_*/cpumask file can be used by tools to figure
out which CPUs to monitor by default.
The PMU type (attr->type) is dynamically allocated and is available from
/sys/devices/core_misc/type and /sys/device/cstate_*/type.
Sampling is not supported.
Here is an example.
- To caculate the fraction of time when the core is running in C6 state
CORE_C6_time% = CORE_C6_RESIDENCY / TSC
# perf stat -x, -e"cstate_core/c6-residency/,msr/tsc/" -C0 -- taskset -c 0 sleep 5
11838820015,,cstate_core/c6-residency/,5175919658,100.00
11877130740,,msr/tsc/,5175922010,100.00
For sleep, 99.7% of time we ran in C6 state.
# perf stat -x, -e"cstate_core/c6-residency/,msr/tsc/" -C0 -- taskset -c 0 busyloop
1253316,,cstate_core/c6-residency/,4360969154,100.00
10012635248,,msr/tsc/,4360972366,100.00
For busyloop, 0.01% of time we ran in C6 state.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1443443404-8581-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the introduction of the context switch preempt_count invariant,
and the demise of PREEMPT_ACTIVE, its pointless to save/restore the
per-cpu preemption count, it must always be 2.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Fixes all around the map: W+X kernel mapping fix, WCHAN fixes, two
build failure fixes for corner case configs, x32 header fix and a
speling fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds
x86/mm: Set NX on gap between __ex_table and rodata
x86/kexec: Fix kexec crash in syscall kexec_file_load()
x86/process: Unify 32bit and 64bit implementations of get_wchan()
x86/process: Add proper bound checks in 64bit get_wchan()
x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernels
x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE case
x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag
The original bug is a page fault crash that sometimes happens
on big machines when preparing ELF headers:
BUG: unable to handle kernel paging request at ffffc90613fc9000
IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
The bug is caused by us under-counting the number of memory ranges
and subsequently not allocating enough ELF header space for them.
The bug is typically masked on smaller systems, because the ELF header
allocation is rounded up to the next page.
This patch modifies the code in fill_up_crash_elf_data() by using
walk_system_ram_res() instead of walk_system_ram_range() to correctly
count the max number of crash memory ranges. That's because the
walk_system_ram_range() filters out small memory regions that
reside in the same page, but walk_system_ram_res() does not.
Here's how I found the bug:
After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
the code uses walk_system_ram_res() to fill-in crash memory regions information
to the program header, so it counts those small memory regions that
reside in a page area.
But, when the kernel was using walk_system_ram_range() in
fill_up_crash_elf_data() to count the number of crash memory regions,
it filters out small regions.
I printed those small memory regions, for example:
kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0
Based on the code in walk_system_ram_range(), this memory region
will be filtered out:
pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
end_pfn - pfn = 0x77593 - 0x77593 = 0 <=== if (end_pfn > pfn) is FALSE
So, the max_nr_ranges that's counted by the kernel doesn't include
small memory regions - causing us to under-allocate the required space.
That causes the page fault crash that happens in a later code path
when preparing ELF headers.
This bug is not easy to reproduce on small machines that have few
CPUs, because the allocated page aligned ELF buffer has more free
space to cover those small memory regions' PT_LOAD headers.
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kexec@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Newer KVM won't be exposing PVCLOCK_COUNTS_FROM_ZERO anymore.
The purpose of that flags was to start counting system time from 0 when
the KVM clock has been initialized.
We can achieve the same by selecting one read as the initial point.
A simple subtraction will work unless the KVM clock count overflows
earlier (has smaller width) than scheduler's cycle count. We should be
safe till x86_128.
Because PVCLOCK_COUNTS_FROM_ZERO was enabled only on new hypervisors,
setting sched clock as stable based on PVCLOCK_TSC_STABLE_BIT might
regress on older ones.
I presume we don't need to change kvm_clock_read instead of introducing
kvm_sched_clock_read. A problem could arise in case sched_clock is
expected to return the same value as get_cycles, but we should have
merged those clocks in that case.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The DEFINE_PCI_DEVICE_TABLE() macro is deprecated. Use
'struct pci_device_id' instead of DEFINE_PCI_DEVICE_TABLE(),
with the goal of getting rid of this macro completely.
This Coccinelle semantic patch performs this transformation:
@@
identifier a;
declarer name DEFINE_PCI_DEVICE_TABLE;
initializer i;
@@
- DEFINE_PCI_DEVICE_TABLE(a)
+ const struct pci_device_id a[] = i;
Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20151001085201.GA16939@localhost
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This function compiles to 277 bytes of machine code and has 4 callsites.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Link: http://lkml.kernel.org/r/1443443037-22077-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This function compiles to 60 bytes of machine code.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Link: http://lkml.kernel.org/r/1443443037-22077-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This function compiles to 102 bytes of machine code. It has two
callsites.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Link: http://lkml.kernel.org/r/1443443037-22077-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Dmitry Vyukov reported the following using trinity and the memory
error detector AddressSanitizer
(https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
[ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
address ffff88002e280000
[ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
[ 124.578633] Accessed by thread T10915:
[ 124.579295] inlined in describe_heap_address
./arch/x86/mm/asan/report.c:164
[ 124.579295] #0 ffffffff810dd277 in asan_report_error
./arch/x86/mm/asan/report.c:278
[ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
./arch/x86/mm/asan/asan.c:37
[ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
[ 124.581893] #3 ffffffff8107c093 in get_wchan
./arch/x86/kernel/process_64.c:444
The address checks in the 64bit implementation of get_wchan() are
wrong in several ways:
- The lower bound of the stack is not the start of the stack
page. It's the start of the stack page plus sizeof (struct
thread_info)
- The upper bound must be:
top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).
The 2 * sizeof(unsigned long) is required because the stack pointer
points at the frame pointer. The layout on the stack is: ... IP FP
... IP FP. So we need to make sure that both IP and FP are in the
bounds.
Fix the bound checks and get rid of the mix of numeric constants, u64
and unsigned long. Making all unsigned long allows us to use the same
function for 32bit as well.
Use READ_ONCE() when accessing the stack. This does not prevent a
concurrent wakeup of the task and the stack changing, but at least it
avoids TOCTOU.
Also check task state at the end of the loop. Again that does not
prevent concurrent changes, but it avoids walking for nothing.
Add proper comments while at it.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Recent changes in the Hyper-V driver:
b4370df2b1 ("Drivers: hv: vmbus: add special crash handler")
broke the build when CONFIG_KEXEC_CORE is not set:
arch/x86/built-in.o: In function `hv_machine_crash_shutdown':
arch/x86/kernel/cpu/mshyperv.c:112: undefined reference to `native_machine_crash_shutdown'
Decorate all kexec related code with #ifdef CONFIG_KEXEC_CORE.
Reported-by: Jim Davis <jim.epost@gmail.com>
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1443002577-25370-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is not safe to clear global MCi_CTL banks during CPU offline
or suspend/resume operations. These MSRs are either
thread-scoped (meaning private to a thread), or core-scoped
(private to threads in that core only), or with a socket scope:
visible and controllable from all threads in the socket.
When we offline a single CPU, clearing those MCi_CTL bits will
stop signaling for all the shared, i.e., socket-wide resources,
such as LLC, iMC, etc.
In addition, it might be possible to compromise the integrity of
an Intel Secure Guard eXtentions (SGX) system if the attacker
has control of the host system and is able to inject errors
which would be otherwise ignored when MCi_CTL bits are cleared.
Hence on SGX enabled systems, if MCi_CTL is cleared, SGX gets
disabled.
Tested-by: Serge Ayoun <serge.ayoun@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
[ Cleanup text. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1441391390-16985-1-git-send-email-ashok.raj@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Straigntforward conversion from:
int was_in_debug_nmi[NR_CPUS]
to:
DECLARE_BITMAP(was_in_debug_nmi, NR_CPUS)
Saves about 2 kbytes in BSS for NR_CPUS=512.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443271638-2568-1-git-send-email-dvlasenk@redhat.com
[ Tidied up the code a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf fixes from Thomas Gleixner:
"Another pile of fixes for perf:
- Plug overflows and races in the core code
- Sanitize the flow of the perf syscall so we error out before
handling the more complex and hard to undo setups
- Improve and fix Broadwell and Skylake hardware support
- Revert a fix which broke what it tried to fix in perf tools
- A couple of smaller fixes in various places of perf tools"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Fix copying of /proc/kcore
perf intel-pt: Remove no_force_psb from documentation
perf probe: Use existing routine to look for a kernel module by dso->short_name
perf/x86: Change test_aperfmperf() and test_intel() to static
tools lib traceevent: Fix string handling in heterogeneous arch environments
perf record: Avoid infinite loop at buildid processing with no samples
perf: Fix races in computing the header sizes
perf: Fix u16 overflows
perf: Restructure perf syscall point of no return
perf/x86/intel: Fix Skylake FRONTEND MSR extrareg mask
perf/x86/intel/pebs: Add PEBS frontend profiling for Skylake
perf/x86/intel: Make the CYCLE_ACTIVITY.* constraint on Broadwell more specific
perf tools: Bool functions shouldn't return -1
tools build: Add test for presence of __get_cpuid() gcc builtin
tools build: Add test for presence of numa_num_possible_cpus() in libnuma
Revert "perf symbols: Fix mismatched declarations for elf_getphdrnum"
perf stat: Fix per-pkg event reporting bug