Commit Graph

25060 Commits

Author SHA1 Message Date
Matt Fleming
080fe0b790 perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
While the Intel PMU monitors the LLC when perf enables the
HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
L1 instruction cache fetches (0x0080) and instruction cache misses
(0x0081) on the AMD PMU.

This is extremely confusing when monitoring the same workload across
Intel and AMD machines, since parameters like,

  $ perf stat -e cache-references,cache-misses

measure completely different things.

Instead, make the AMD PMU measure instruction/data cache and TLB fill
requests to the L2 and instruction/data cache and TLB misses in the L2
when HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled,
respectively. That way the events measure unified caches on both
platforms.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1472044328-21302-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 16:19:49 +02:00
Alexander Shishkin
1155bafcb7 perf/x86/intel/pt: Do validate the size of a kernel address filter
Right now, the kernel address filters in PT are prone to integer overflow
that may happen in adding filter's size to its offset to obtain the end
of the range. Such an overflow would also throw a #GP in the PT event
configuration path.

Fix this by explicitly validating the result of this calculation.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-4-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 11:14:16 +02:00
Alexander Shishkin
ddfdad991e perf/x86/intel/pt: Fix kernel address filter's offset validation
The kernel_ip() filter is used mostly by the DS/LBR code to look at the
branch addresses, but Intel PT also uses it to validate the address
filter offsets for kernel addresses, for which it is not sufficient:
supplying something in bits 64:48 that's not a sign extension of the lower
address bits (like 0xf00d000000000000) throws a #GP.

This patch adds address validation for the user supplied kernel filters.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 11:14:16 +02:00
Alexander Shishkin
95f60084ac perf/x86/intel/pt: Fix an off-by-one in address filter configuration
PT address filter configuration requires that a range is specified by
its first and last address, but at the moment we're obtaining the end
of the range by adding user specified size to its start, which is off
by one from what it actually needs to be.

Fix this and make sure that zero-sized filters don't pass the filter
validation.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 11:14:16 +02:00
Alexander Shishkin
cecf62352a perf/x86/intel: Don't disable "intel_bts" around "intel" event batching
At the moment, intel_bts events get disabled from intel PMU's disable
callback, which includes event scheduling transactions of said PMU,
which have nothing to do with intel_bts events.

We do want to keep intel_bts events off inside the PMI handler to
avoid filling up their buffer too soon.

This patch moves intel_bts enabling/disabling directly to the PMI
handler.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915082233.11065-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-15 11:25:26 +02:00
Linus Torvalds
4cea877657 PCI updates for v4.8:
Enumeration
     Mark Haswell Power Control Unit as having non-compliant BARs (Bjorn Helgaas)
 
   Power management
     Fix bridge_d3 update on device removal (Lukas Wunner)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX2az0AAoJEFmIoMA60/r8/EoP/28mGRiKi8mlqNAR3MYN3F0n
 VSIm7WyxWNawH1gRJXKQBzNqgMJnj4qRGXSIvP3AIYyBDcJs/X7j91/eKOARNfQr
 55A+gfSz4jUKlw+0WgPY8/U2/xQ4yoom1zhbsAYcIVeljZo/3JUg+wHpPjhIMkH0
 2slTerHRDExrS43jxQi225toiEaO6lcY8EVmHCDo+jYlQz3sCEwIXg9hn1rwTbvG
 sJI0zyUwHF+oWowgJqlwYxsbPPnelPAN5YAx7KrHuVmBdL0Bgo3oIRtbb3JZZ9Up
 L9bQ6NpRjSARvijaZ2TAhueqIIDv2HGgvwNB01l4Yggw7Sm1dFCuUS6vj/e5tpZA
 xntE3F6s2Z+I4I1D7pAX3jMYCdYx/QltiTCeGRp8pJv+f4ewW3jcel3FAksY3BEg
 0NCjDrGFqGYai4hGRROpt/aXlW/Pn53eQLlu4Xg2qgkj0NMh0ODMrTjMnABB39ae
 eGqIXab7WeVBxt10eU19J1u1RTqpUO2LJW+cMnvYdCfKAYby/gj8SD8vIsn3oDjZ
 hQS/4fSHurc7LZwDmwOfaiHlGnvcQWV9EKwgScS0v8AxPQnC8pNUgYpzZcXY8Q6I
 YXtyK7suFriSZPS0Qs4FrdfJrmTBaBQ55aZu9aftb5v4YqacN5qZo+HaXiONBy3v
 RzsFK6xIbnIgb35g8vKy
 =PQml
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "Here are two changes for v4.8.  The first fixes a "[Firmware Bug]: reg
  0x10: invalid BAR (can't size)" warning on Haswell, and the second
  fixes a problem in some new runtime suspend functionality we merged
  for v4.8.  Summary:

  Enumeration:
    Mark Haswell Power Control Unit as having non-compliant BARs (Bjorn Helgaas)

  Power management:
    Fix bridge_d3 update on device removal (Lukas Wunner)"

* tag 'pci-v4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Fix bridge_d3 update on device removal
  PCI: Mark Haswell Power Control Unit as having non-compliant BARs
2016-09-14 14:06:30 -07:00
Linus Torvalds
5924bbecd0 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Three fixes:

   - AMD microcode loading fix with randomization

   - an lguest tooling fix

   - and an APIC enumeration boundary condition fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Fix num_processors value in case of failure
  tools/lguest: Don't bork the terminal in case of wrong args
  x86/microcode/AMD: Fix load of builtin microcode with randomized memory
2016-09-13 12:52:45 -07:00
Linus Torvalds
ee319d5834 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This contains:

   - a set of fixes found by directed-random perf fuzzing efforts by
     Vince Weaver, Alexander Shishkin and Peter Zijlstra

   - a cqm driver crash fix

   - an AMD uncore driver use after free fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix PEBSv3 record drain
  perf/x86/intel/bts: Kill a silly warning
  perf/x86/intel/bts: Fix BTS PMI detection
  perf/x86/intel/bts: Fix confused ordering of PMU callbacks
  perf/core: Fix aux_mmap_count vs aux_refcount order
  perf/core: Fix a race between mmap_close() and set_output() of AUX events
  perf/x86/amd/uncore: Prevent use after free
  perf/x86/intel/cqm: Check cqm/mbm enabled state in event init
  perf/core: Remove WARN from perf_event_read()
2016-09-13 12:47:29 -07:00
Linus Torvalds
7c2c114416 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
 "This contains a Xen fix, an arm64 fix and a race condition /
  robustization set of fixes related to ExitBootServices() usage and
  boundary conditions"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Use efi_exit_boot_services()
  efi/libstub: Use efi_exit_boot_services() in FDT
  efi/libstub: Introduce ExitBootServices helper
  efi/libstub: Allocate headspace in efi_get_memory_map()
  efi: Fix handling error value in fdt_find_uefi_params
  efi: Make for_each_efi_memory_desc_in_map() cope with running on Xen
2016-09-13 12:02:00 -07:00
Linus Torvalds
ac059c4fa7 * s390: nested virt fixes (new 4.8 feature)
* x86: fixes for 4.8 regressions
 * ARM: two small bugfixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJX1xaBAAoJEL/70l94x66Db/8H/AuKWs6QbvUNnA+tWITdmFbi
 ah7R2BoRVak4h6UGYC4OsY9BTuWVeaCx2+8yLIqSdfWoYUJzwiGFPwkuUK/hVSkK
 /9gZO8SsREAm/1gVck2X2vzATYbfCxKcRsSq0/ocmIQEpRYWKGoJWS6zeiwLZ9kq
 H/KsAvsGc83VdlPXrhLVQa7js/Tl/M5JlBEbm6i8nIsvhoqZkES4rgwHKBtkxTyU
 8oepfNQnYTb2hZhJE0aW+9z3V0O+NKbGW75bKOzrbJBoEUGeHuPpLnP0mZIyEGPU
 grQkfuWMmfEaWrGahNA9ARpsNcy4Zp0BF5mgJ03OAmgfY+KmJroVk95IvcP9jF4=
 =+uuA
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 - s390: nested virt fixes (new 4.8 feature)
 - x86: fixes for 4.8 regressions
 - ARM: two small bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm-arm: Unmap shadow pagetables properly
  x86, clock: Fix kvm guest tsc initialization
  arm: KVM: Fix idmap overlap detection when the kernel is idmap'ed
  KVM: lapic: adjust preemption timer correctly when goes TSC backward
  KVM: s390: vsie: fix riccbd
  KVM: s390: don't use current->thread.fpu.* when accessing registers
2016-09-12 14:30:14 -07:00
Linus Torvalds
98ac9a608d Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
 "nvdimm fixes for v4.8, two of them are tagged for -stable:

   - Fix devm_memremap_pages() to use track_pfn_insert().  Otherwise,
     DAX pmd mappings end up with an uncached pgprot, and unusable
     performance for the device-dax interface.  The device-dax interface
     appeared in 4.7 so this is tagged for -stable.

   - Fix a couple VM_BUG_ON() checks in the show_smaps() path to
     understand DAX pmd entries.  This fix is tagged for -stable.

   - Fix a mis-merge of the nfit machine-check handler to flip the
     polarity of an if() to match the final version of the patch that
     Vishal sent for 4.8-rc1.  Without this the nfit machine check
     handler never detects / inserts new 'badblocks' entries which
     applications use to identify lost portions of files.

   - For test purposes, fix the nvdimm_clear_poison() path to operate on
     legacy / simulated nvdimm memory ranges.  Without this fix a test
     can set badblocks, but never clear them on these ranges.

   - Fix the range checking done by dax_dev_pmd_fault().  This is not
     tagged for -stable since this problem is mitigated by specifying
     aligned resources at device-dax setup time.

  These patches have appeared in a next release over the past week.  The
  recent rebase you can see in the timestamps was to drop an invalid fix
  as identified by the updated device-dax unit tests [1].  The -mm
  touches have an ack from Andrew"

[1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
   https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm: allow legacy (e820) pmem region to clear bad blocks
  nfit, mce: Fix SPA matching logic in MCE handler
  mm: fix cache mode of dax pmd mappings
  mm: fix show_smap() for zone_device-pmd ranges
  dax: fix mapping size check
2016-09-10 09:58:52 -07:00
Peter Zijlstra
8ef9b8455a perf/x86/intel: Fix PEBSv3 record drain
Alexander hit the WARN_ON_ONCE(!event) on his Skylake while running
the perf fuzzer.

This means the PEBSv3 record included a status bit for an inactive
event, something that _should_ not happen.

Move the code that filters the status bits against our known PEBS
events up a spot to guarantee we only deal with events we know about.

Further add "continue" statements to the WARN_ON_ONCE()s such that
we'll not die nor generate silly events in case we ever do hit them
again.

Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: stable@vger.kernel.org
Fixes: a3d86542de ("perf/x86/intel/pebs: Add PEBSv3 decoding")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:39 +02:00
Alexander Shishkin
ef9ef3befa perf/x86/intel/bts: Kill a silly warning
At the moment, intel_bts will WARN() out if there is more than one
event writing to the same ring buffer, via SET_OUTPUT, and will only
send data from one event to a buffer.

There is no reason to have this warning in, so kill it.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-6-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:38 +02:00
Alexander Shishkin
4d4c474124 perf/x86/intel/bts: Fix BTS PMI detection
Since BTS doesn't have a dedicated PMI status bit, the driver needs to
take extra care to check for the condition that triggers it to avoid
spurious NMI warnings.

Regardless of the local BTS context state, the only way of knowing that
the NMI is ours is to compare the write pointer against the interrupt
threshold.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-5-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:38 +02:00
Alexander Shishkin
a9a94401c2 perf/x86/intel/bts: Fix confused ordering of PMU callbacks
The intel_bts driver is using a CPU-local 'started' variable to order
callbacks and PMIs and make sure that AUX transactions don't get messed
up. However, the ordering rules in regard to this variable is a complete
mess, which recently resulted in perf_fuzzer-triggered warnings and
panics.

The general ordering rule that is patch is enforcing is that this
cpu-local variable be set only when the cpu-local AUX transaction is
active; consequently, this variable is to be checked before the AUX
related bits can be touched.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160906132353.19887-4-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10 11:15:37 +02:00
Dan Williams
9049771f7d mm: fix cache mode of dax pmd mappings
track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as
uncacheable rendering them impractical for application usage.  DAX-pte
mappings are cached and the goal of establishing DAX-pmd mappings is to
attain more performance, not dramatically less (3 orders of magnitude).

track_pfn_insert() relies on a previous call to reserve_memtype() to
establish the expected page_cache_mode for the range.  While memremap()
arranges for reserve_memtype() to be called, devm_memremap_pages() does
not.  So, teach track_pfn_insert() and untrack_pfn() how to handle
tracking without a vma, and arrange for devm_memremap_pages() to
establish the write-back-cache reservation in the memtype tree.

Cc: <stable@vger.kernel.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Reported-by: Kai Zhang <kai.ka.zhang@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-09 17:34:46 -07:00
Sebastian Andrzej Siewior
7d762e49c2 perf/x86/amd/uncore: Prevent use after free
The resent conversion of the cpu hotplug support in the uncore driver
introduced a regression due to the way the callbacks are invoked at
initialization time.

The old code called the prepare/starting/online function on each online cpu
as a block. The new code registers the hotplug callbacks in the core for
each state. The core invokes the callbacks at each registration on all
online cpus.

The code implicitely relied on the prepare/starting/online callbacks being
called as combo on a particular cpu, which was not obvious and completely
undocumented.

The resulting subtle wreckage happens due to the way how the uncore code
manages shared data structures for cpus which share an uncore resource in
hardware. The sharing is determined in the cpu starting callback, but the
prepare callback allocates per cpu data for the upcoming cpu because
potential sharing is unknown at this point. If the starting callback finds
a online cpu which shares the hardware resource it takes a refcount on the
percpu data of that cpu and puts the own data structure into a
'free_at_online' pointer of that shared data structure. The online callback
frees that.

With the old model this worked because in a starting callback only one non
unused structure (the one of the starting cpu) was available. The new code
allocates the data structures for all cpus when the prepare callback is
registered.

Now the starting function iterates through all online cpus and looks for a
data structure (skipping its own) which has a matching hardware id. The id
member of the data structure is initialized to 0, but the hardware id can
be 0 as well. The resulting wreckage is:

  CPU0 finds a matching id on CPU1, takes a refcount on CPU1 data and puts
  its own data structure into CPU1s data structure to be freed.

  CPU1 skips CPU0 because the data structure is its allegedly unsued own.
  It finds a matching id on CPU2, takes a refcount on CPU1 data and puts
  its own data structure into CPU2s data structure to be freed.

  ....

Now the online callbacks are invoked.

  CPU0 has a pointer to CPU1s data and frees the original CPU0 data. So
  far so good.

  CPU1 has a pointer to CPU2s data and frees the original CPU1 data, which
  is still referenced by CPU0 ---> Booom

So there are two issues to be solved here:

1) The id field must be initialized at allocation time to a value which
   cannot be a valid hardware id, i.e. -1

   This prevents the above scenario, but now CPU1 and CPU2 both stick their
   own data structure into the free_at_online pointer of CPU0. So we leak
   CPU1s data structure.

2) Fix the memory leak described in #1

   Instead of having a single pointer, use a hlist to enqueue the
   superflous data structures which are then freed by the first cpu
   invoking the online callback.

Ideally we should know the sharing _before_ invoking the prepare callback,
but that's way beyond the scope of this bug fix.

[ tglx: Rewrote changelog ]

Fixes: 96b2bd3866 ("perf/x86/amd/uncore: Convert to hotplug state machine")
Reported-and-tested-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20160909160822.lowgmkdwms2dheyv@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-10 00:00:06 +02:00
Prarit Bhargava
a4497a86fb x86, clock: Fix kvm guest tsc initialization
When booting a kvm guest on AMD with the latest kernel the following
messages are displayed in the boot log:

 tsc: Unable to calibrate against PIT
 tsc: HPET/PMTIMER calibration failed

aa297292d7 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID")
introduced a change to account for a difference in cpu and tsc frequencies for
Intel SKL processors. Before this change the native tsc set
x86_platform.calibrate_tsc to native_calibrate_tsc() which is a hardware
calibration of the tsc, and in tsc_init() executed

	tsc_khz = x86_platform.calibrate_tsc();
	cpu_khz = tsc_khz;

The kvm code changed x86_platform.calibrate_tsc to kvm_get_tsc_khz() and
executed the same tsc_init() function.  This meant that KVM guests did not
execute the native hardware calibration function.

After aa297292d7, there are separate native calibrations for cpu_khz and
tsc_khz.  The code sets x86_platform.calibrate_tsc to native_calibrate_tsc()
which is now an Intel specific calibration function, and
x86_platform.calibrate_cpu to native_calibrate_cpu() which is the "old"
native_calibrate_tsc() function (ie, the native hardware calibration
function).

tsc_init() now does

	cpu_khz = x86_platform.calibrate_cpu();
	tsc_khz = x86_platform.calibrate_tsc();
	if (tsc_khz == 0)
		tsc_khz = cpu_khz;
	else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
		cpu_khz = tsc_khz;

The kvm code should not call the hardware initialization in
native_calibrate_cpu(), as it isn't applicable for kvm and it didn't do that
prior to aa297292d7.

This patch resolves this issue by setting x86_platform.calibrate_cpu to
kvm_get_tsc_khz().

v2: I had originally set x86_platform.calibrate_cpu to
cpu_khz_from_cpuid(), however, pbonzini pointed out that the CPUID leaf
in that function is not available in KVM.  I have changed the function
pointer to kvm_get_tsc_khz().

Fixes: aa297292d7 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Len Brown <len.brown@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-08 16:41:55 +02:00
Dou Liyang
c291b01515 x86/apic: Fix num_processors value in case of failure
If the topology package map check of the APIC ID and the CPU is a failure,
we don't generate the processor info for that APIC ID yet we increase
disabled_cpus by one - which is buggy.

Only increase num_processors once we are sure we don't fail.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1473214893-16481-1-git-send-email-douly.fnst@cn.fujitsu.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 08:11:03 +02:00
Linus Torvalds
ab29b33a84 - Fix UM seccomp vs ptrace, after reordering landed.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJX0D/IAAoJEIly9N/cbcAmORYQAIzhZ+tFuXkOrEJx0efQx8ik
 8ix0wBrh0KTDQkMWQxUnDyIr+7vNcH0oOkH3kwgYaShAZif4oq6TGkdvgJJQr+ca
 ZsBcEdasYvglcr/7hEF4cH28h3YqN7HwW+Mx8IqrBQqqjly7V2jamBBd/YXOcDK9
 gqXooH1He9QVtVb0uC/AVFVY+st37iuqrreLFKWrX52FaUQ+7f0svN4LWC9b1UEq
 UQyi+HMMRbovum9+2WDKPpl8oEZvXE7CiWexwl2G9qBCpm/AhragArl9BhrA31eE
 0PPuVBTtypPNvKVieboXUhTg5eet+nFXtlea+/CLWfbqxpecsNiqMgSIg+aWOnz4
 Y6Te5P7oSwhto0WzXvilxjvShvjHgTo98PY9Qj0bBb7ECCbI09GfoDf3SPYayTwC
 9zPC04mJihr+6h8QdmYgfuHzTVjMxb6YOmAorz805cJ0vHtdPIP4Gkei4R1tNUkG
 TNQNXKef/UJbImL17pFu70Ru6/J58OgDuMuAAuswsFt7KELw4DZhmJL+KnYdPG1G
 gHirQd1HtQlsBoVvQAwmQjBp+TqFoY3oiv4Y2oq7IEk/ZCs3XJaEYKm90CoZlm4a
 sMtA0j+9rWqNirzuyWUeJfkd48yjcbOzSimzSCTsNVVsbyeA1yyapAdHSTWWm554
 zEubN4LCJoGiH/FWnX6U
 =UgsA
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fixes from Kees Cook:
 "Fix UM seccomp vs ptrace, after reordering landed"

* tag 'seccomp-v4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: Remove 2-phase API documentation
  um/ptrace: Fix the syscall number update after a ptrace
  um/ptrace: Fix the syscall_trace_leave call
2016-09-07 10:46:06 -07:00
Mickaël Salaün
ce29856a5e um/ptrace: Fix the syscall number update after a ptrace
Update the syscall number after each PTRACE_SETREGS on ORIG_*AX.

This is needed to get the potentially altered syscall number in the
seccomp filters after RET_TRACE.

This fix four seccomp_bpf tests:
> [ RUN      ] TRACE_syscall.skip_after_RET_TRACE
> seccomp_bpf.c:1560:TRACE_syscall.skip_after_RET_TRACE:Expected -1 (18446744073709551615) == syscall(39) (26)
> seccomp_bpf.c:1561:TRACE_syscall.skip_after_RET_TRACE:Expected 1 (1) == (*__errno_location ()) (22)
> [     FAIL ] TRACE_syscall.skip_after_RET_TRACE
> [ RUN      ] TRACE_syscall.kill_after_RET_TRACE
> TRACE_syscall.kill_after_RET_TRACE: Test exited normally instead of by signal (code: 1)
> [     FAIL ] TRACE_syscall.kill_after_RET_TRACE
> [ RUN      ] TRACE_syscall.skip_after_ptrace
> seccomp_bpf.c:1622:TRACE_syscall.skip_after_ptrace:Expected -1 (18446744073709551615) == syscall(39) (26)
> seccomp_bpf.c:1623:TRACE_syscall.skip_after_ptrace:Expected 1 (1) == (*__errno_location ()) (22)
> [     FAIL ] TRACE_syscall.skip_after_ptrace
> [ RUN      ] TRACE_syscall.kill_after_ptrace
> TRACE_syscall.kill_after_ptrace: Test exited normally instead of by signal (code: 1)
> [     FAIL ] TRACE_syscall.kill_after_ptrace

Fixes: 26703c636c ("um/ptrace: run seccomp after ptrace")

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: James Morris <jmorris@namei.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-09-07 09:25:04 -07:00
Kees Cook
e6971009a9 x86/uaccess: force copy_*_user() to be inlined
As already done with __copy_*_user(), mark copy_*_user() as __always_inline.
Without this, the checks for things like __builtin_const_p() won't work
consistently in either hardened usercopy nor the recent adjustments for
detecting usercopy overflows at compile time.

The change in kernel text size is detectable, but very small:

 text      data     bss     dec      hex     filename
12118735  5768608 14229504 32116847 1ea106f vmlinux.before
12120207  5768608 14229504 32118319 1ea162f vmlinux.after

Signed-off-by: Kees Cook <keescook@chromium.org>
2016-09-06 12:16:42 -07:00
Jiri Olsa
79d102cbfd perf/x86/intel/cqm: Check cqm/mbm enabled state in event init
Yanqiu Zhang reported kernel panic when using mbm event
on system where CQM is detected but without mbm event
support, like with perf:

  # perf stat -e 'intel_cqm/event=3/' -a

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: [<ffffffff8100d64c>] update_sample+0xbc/0xe0
  ...
   <IRQ>
   [<ffffffff8100d688>] __intel_mbm_event_init+0x18/0x20
   [<ffffffff81113d6b>] flush_smp_call_function_queue+0x7b/0x160
   [<ffffffff81114853>] generic_smp_call_function_single_interrupt+0x13/0x60
   [<ffffffff81052017>] smp_call_function_interrupt+0x27/0x40
   [<ffffffff816fb06c>] call_function_interrupt+0x8c/0xa0
  ...

The reason is that we currently allow to init mbm event
even if mbm support is not detected.  Adding checks for
both cqm and mbm events and support into cqm's event_init.

Fixes: 33c3cc7acf ("perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init")
Reported-by: Yanqiu Zhang <yanqzhan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1473089407-21857-1-git-send-email-jolsa@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06 10:42:12 +02:00
Wanpeng Li
e12c8f36f3 KVM: lapic: adjust preemption timer correctly when goes TSC backward
TSC_OFFSET will be adjusted if discovers TSC backward during vCPU load.
The preemption timer, which relies on the guest tsc to reprogram its
preemption timer value, is also reprogrammed if vCPU is scheded in to
a different pCPU. However, the current implementation reprogram preemption
timer before TSC_OFFSET is adjusted to the right value, resulting in the
preemption timer firing prematurely.

This patch fix it by adjusting TSC_OFFSET before reprogramming preemption
timer if TSC backward.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krċmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-05 16:14:39 +02:00
Jeffrey Hugo
d64934019f x86/efi: Use efi_exit_boot_services()
The eboot code directly calls ExitBootServices.  This is inadvisable as the
UEFI spec details a complex set of errors, race conditions, and API
interactions that the caller of ExitBootServices must get correct.  The
eboot code attempts allocations after calling ExitBootSerives which is
not permitted per the spec.  Call the efi_exit_boot_services() helper
intead, which handles the allocation scenario properly.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-05 12:40:16 +01:00
Jeffrey Hugo
dadb57abc3 efi/libstub: Allocate headspace in efi_get_memory_map()
efi_get_memory_map() allocates a buffer to store the memory map that it
retrieves.  This buffer may need to be reused by the client after
ExitBootServices() is called, at which point allocations are not longer
permitted.  To support this usecase, provide the allocated buffer size back
to the client, and allocate some additional headroom to account for any
reasonable growth in the map that is likely to happen between the call to
efi_get_memory_map() and the client reusing the buffer.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-05 12:18:17 +01:00
Borislav Petkov
cc2187a6e0 x86/microcode/AMD: Fix load of builtin microcode with randomized memory
We do not need to add the randomization offset when the microcode is
built in.

Reported-and-tested-by: Emanuel Czirai <icanrealizeum@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20160904093736.GA11939@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-05 10:38:56 +02:00
Linus Torvalds
9ca581b50d Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
 "A single fix for an AMD erratum so machines without a BIOS fix work"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/AMD: Apply erratum 665 on machines without a BIOS fix
2016-09-04 08:45:41 -07:00
Emanuel Czirai
d199299675 x86/AMD: Apply erratum 665 on machines without a BIOS fix
AMD F12h machines have an erratum which can cause DIV/IDIV to behave
unpredictably. The workaround is to set MSRC001_1029[31] but sometimes
there is no BIOS update containing that workaround so let's do it
ourselves unconditionally. It is simple enough.

[ Borislav: Wrote commit message. ]

Signed-off-by: Emanuel Czirai <icanrealizeum@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yaowu Xu <yaowu@google.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160902053550.18097-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-02 20:42:28 +02:00
Steven Rostedt
15301a5707 x86/paravirt: Do not trace _paravirt_ident_*() functions
Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up
after enabling function tracer. I asked him to bisect the functions within
available_filter_functions, which he did and it came down to three:

  _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64()

It was found that this is only an issue when noreplace-paravirt is added
to the kernel command line.

This means that those functions are most likely called within critical
sections of the funtion tracer, and must not be traced.

In newer kenels _paravirt_nop() is defined within gcc asm(), and is no
longer an issue.  But both _paravirt_ident_{32,64}() causes the
following splat when they are traced:

 mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054)
 mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070)
 mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054)
 mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054)
 NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469]
 Modules linked in: e1000e
 CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000
 RIP: 0010:[<ffffffff81134148>]  [<ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0
 RSP: 0018:ffff8800d4aefb90  EFLAGS: 00000246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40
 RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030
 RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000
 R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8
 R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0
 FS:  00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0
 Call Trace:
   _raw_spin_lock+0x27/0x30
   handle_pte_fault+0x13db/0x16b0
   handle_mm_fault+0x312/0x670
   __do_page_fault+0x1b1/0x4e0
   do_page_fault+0x22/0x30
   page_fault+0x28/0x30
   __vfs_read+0x28/0xe0
   vfs_read+0x86/0x130
   SyS_read+0x46/0xa0
   entry_SYSCALL_64_fastpath+0x1e/0xa8
 Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b

Reported-by: Łukasz Daniluk <lukasz.daniluk@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-02 09:40:47 -07:00
Arnd Bergmann
236dec0510 kconfig: tinyconfig: provide whole choice blocks to avoid warnings
Using "make tinyconfig" produces a couple of annoying warnings that show
up for build test machines all the time:

    .config:966:warning: override: NOHIGHMEM changes choice state
    .config:965:warning: override: SLOB changes choice state
    .config:963:warning: override: KERNEL_XZ changes choice state
    .config:962:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
    .config:933:warning: override: SLOB changes choice state
    .config:930:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
    .config:870:warning: override: SLOB changes choice state
    .config:868:warning: override: KERNEL_XZ changes choice state
    .config:867:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state

I've made a previous attempt at fixing them and we discussed a number of
alternatives.

I tried changing the Makefile to use "merge_config.sh -n
$(fragment-list)" but couldn't get that to work properly.

This is yet another approach, based on the observation that we do want
to see a warning for conflicting 'choice' options, and that we can
simply make them non-conflicting by listing all other options as
disabled.  This is a trivial patch that we can apply independent of
plans for other changes.

Link: http://lkml.kernel.org/r/20160829214952.1334674-2-arnd@arndb.de
Link: https://storage.kernelci.org/mainline/v4.7-rc6/x86-tinyconfig/build.log
https://patchwork.kernel.org/patch/9212749/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-01 17:52:01 -07:00
Bjorn Helgaas
6af7e4f772 PCI: Mark Haswell Power Control Unit as having non-compliant BARs
The Haswell Power Control Unit has a non-PCI register (CONFIG_TDP_NOMINAL)
where BAR 0 is supposed to be.  This is erratum HSE43 in the spec update
referenced below:

  The PCIe* Base Specification indicates that Configuration Space Headers
  have a base address register at offset 0x10.  Due to this erratum, the
  Power Control Unit's CONFIG_TDP_NOMINAL CSR (Bus 1; Device 30; Function
  3; Offset 0x10) is located where a base register is expected.

Mark the PCU as having non-compliant BARs so we don't try to probe any of
them.  There are no other BARs on this device.

Rename the quirk so it's not Broadwell-specific.

Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v3-spec-update.html
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v3-datasheet-vol-2.html (section 5.4, Device 30 Function 3)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=153881
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Myron Stowe <myron.stowe@redhat.com>
2016-09-01 08:52:29 -05:00
Josh Poimboeuf
0d025d271e mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
There are three usercopy warnings which are currently being silenced for
gcc 4.6 and newer:

1) "copy_from_user() buffer size is too small" compile warning/error

   This is a static warning which happens when object size and copy size
   are both const, and copy size > object size.  I didn't see any false
   positives for this one.  So the function warning attribute seems to
   be working fine here.

   Note this scenario is always a bug and so I think it should be
   changed to *always* be an error, regardless of
   CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.

2) "copy_from_user() buffer size is not provably correct" compile warning

   This is another static warning which happens when I enable
   __compiletime_object_size() for new compilers (and
   CONFIG_DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size
   is const, but copy size is *not*.  In this case there's no way to
   compare the two at build time, so it gives the warning.  (Note the
   warning is a byproduct of the fact that gcc has no way of knowing
   whether the overflow function will be called, so the call isn't dead
   code and the warning attribute is activated.)

   So this warning seems to only indicate "this is an unusual pattern,
   maybe you should check it out" rather than "this is a bug".

   I get 102(!) of these warnings with allyesconfig and the
   __compiletime_object_size() gcc check removed.  I don't know if there
   are any real bugs hiding in there, but from looking at a small
   sample, I didn't see any.  According to Kees, it does sometimes find
   real bugs.  But the false positive rate seems high.

3) "Buffer overflow detected" runtime warning

   This is a runtime warning where object size is const, and copy size >
   object size.

All three warnings (both static and runtime) were completely disabled
for gcc 4.6 with the following commit:

  2fb0815c9e ("gcc4: disable __compiletime_object_size for GCC 4.6+")

That commit mistakenly assumed that the false positives were caused by a
gcc bug in __compiletime_object_size().  But in fact,
__compiletime_object_size() seems to be working fine.  The false
positives were instead triggered by #2 above.  (Though I don't have an
explanation for why the warnings supposedly only started showing up in
gcc 4.6.)

So remove warning #2 to get rid of all the false positives, and re-enable
warnings #1 and #3 by reverting the above commit.

Furthermore, since #1 is a real bug which is detected at compile time,
upgrade it to always be an error.

Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
needed.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-30 10:10:21 -07:00
Linus Torvalds
5d84ee7964 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
 "A single bugfix to prevent irq remapping when the ioapic is disabled"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Do not init irq remapping if ioapic is disabled
2016-08-28 10:00:21 -07:00
Linus Torvalds
af56ff27eb * ARM fixes:
** fixes for ITS init issues, error handling, IRQ leakage, race conditions
 ** An erratum workaround for timers
 ** Some removal of misleading use of errors and comments
 ** A fix for GICv3 on 32-bit guests
 * MIPS fix where the guest could wrongly map the first page of physical memory
 * x86 nested virtualization fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJXtyfVAAoJEL/70l94x66Dhe4IAIOGI/OYVWU5IfUQ01oeRgD3
 7wN222OmyC/K0/hSZc7ndRdcQfr5ombgM9XsS/EbkcRacWxAUHDX2FaYMpKgjT2M
 Dnh2tJHuPz/4VtByGQ2fZ4hziK7amn18/MtPFCee+mIj0ya2fcWZ4qHVU8pKC6Ps
 mVVZ0kxXsdV4pw9y6XgBLz/4bTLeASKvhFZrWOnjJoa+GeH2MFwocS0xaEI0HwxP
 HVwcgoRdGXJuKUB9jE9FDWmWOgdoLnCG1bNUOvXKPcE0ZaFQDT4I4dImkBys3rqz
 jbqnhLrpGEY2ZC3Rj+VyD2MOXbYOOSi59GRwYmCkqD96ZarHxSu3PdyCxmIFWzM=
 =+4WK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "ARM:
   - fixes for ITS init issues, error handling, IRQ leakage, race
     conditions
   - an erratum workaround for timers
   - some removal of misleading use of errors and comments
   - a fix for GICv3 on 32-bit guests

  MIPS:
   - fix for where the guest could wrongly map the first page of
     physical memory

  x86:
   - nested virtualization fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  MIPS: KVM: Check for pfn noslot case
  kvm: nVMX: fix nested tsc scaling
  KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write
  KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC
  arm64: KVM: report configured SRE value to 32-bit world
  arm64: KVM: remove misleading comment on pmu status
  KVM: arm/arm64: timer: Workaround misconfigured timer interrupt
  arm64: Document workaround for Cortex-A72 erratum #853709
  KVM: arm/arm64: Change misleading use of is_error_pfn
  KVM: arm64: ITS: avoid re-mapping LPIs
  KVM: arm64: check for ITS device on MSI injection
  KVM: arm64: ITS: move ITS registration into first VCPU run
  KVM: arm64: vgic-its: Make updates to propbaser/pendbaser atomic
  KVM: arm64: vgic-its: Plug race in vgic_put_irq
  KVM: arm64: vgic-its: Handle errors from vgic_add_lpi
  KVM: arm64: ITS: return 1 on successful MSI injection
2016-08-27 15:51:50 -07:00
Linus Torvalds
5e608a0270 Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "11 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: silently skip readahead for DAX inodes
  dax: fix device-dax region base
  fs/seq_file: fix out-of-bounds read
  mm: memcontrol: avoid unused function warning
  mm: clarify COMPACTION Kconfig text
  treewide: replace config_enabled() with IS_ENABLED() (2nd round)
  printk: fix parsing of "brl=" option
  soft_dirty: fix soft_dirty during THP split
  sysctl: handle error writing UINT_MAX to u32 fields
  get_maintainer: quiet noisy implicit -f vcs_file_exists checking
  byteswap: don't use __builtin_bswap*() with sparse
2016-08-26 23:12:12 -07:00
Linus Torvalds
219c04cea3 PCI updates for v4.8:
Resource management
     Update "pci=resource_alignment" documentation (Mathias Koehrer)
 
   MSI
     Use positive flags in pci_alloc_irq_vectors() (Christoph Hellwig)
     Call pci_intx() when using legacy interrupts in pci_alloc_irq_vectors() (Christoph Hellwig)
 
   Intel VMD host bridge driver
     Fix infinite loop executing irq's (Keith Busch)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXvvHgAAoJEFmIoMA60/r8y+MP/3cXG5FG3fdRB9Pn8q5BvdD4
 hmIgOpuEH859iYFOUJNxVB6haxG73Z+z3aFvNUDKTOh7GXQ7UHzNw1qj1OEmjXS0
 ex45ynjxP2DTBIO3eoyLpTKaRuOCAz03sDIucK9NzSBkrazBGoaxRRYY5gi+V8K9
 CNsbT8pPSGOrfRX6nk2QjiGMHVw+ExurvskbwuQmRYmHGU1Y1nnkmlnT5AMUJsn+
 BgSLOLFkv9UOIfLVY2+/KCnPFfi0nIxspCJHinosLl5BsgULgFPdRFKr6SiY+yhx
 W3d8MKnhDX5wsajtE7V/HurYmETiyimLL1ZBPJOCzGXwNGrwvUrt1rji+83S+1Zx
 LbrPGDRNBLW51prcrxOx1Pr/RDaHT0+/2UrSvaeUnTqniV5xW1daBPvB6h8jbKih
 C+GGqfiHURvJi1sd9QUMP95yVaxVgSBM9BrKfNYAmVYarnCpx3xbdeMv8gA0+QHx
 Ts0QKJ7ph80mcztdk2H0Ov7OK+OGjDhZPDijXiCTSPfKgK0HPsaytDCULt9LcqAv
 M8daeD42nwe36xOlNgcosj++gL+VxVYk/4BAiOWCgwF+k2EDh1ol8/eK6a+hXAqC
 qslLMNNucZP5BSsSIYc5KVGbkEXuHTw8viUNvBhLE2CWD3SY378IlzrF7QxjgKNl
 AawNqjfXY7obX3m5xr9g
 =qJG4
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "Resource management:
   - Update "pci=resource_alignment" documentation (Mathias Koehrer)

  MSI:
   - Use positive flags in pci_alloc_irq_vectors() (Christoph Hellwig)
   - Call pci_intx() when using legacy interrupts in pci_alloc_irq_vectors() (Christoph Hellwig)

  Intel VMD host bridge driver:
   - Fix infinite loop executing irq's (Keith Busch)"

* tag 'pci-v4.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/PCI: VMD: Fix infinite loop executing irq's
  PCI: Call pci_intx() when using legacy interrupts in pci_alloc_irq_vectors()
  PCI: Use positive flags in pci_alloc_irq_vectors()
  PCI: Update "pci=resource_alignment" documentation
2016-08-26 18:26:07 -07:00
Masahiro Yamada
a5ff1b34e1 treewide: replace config_enabled() with IS_ENABLED() (2nd round)
Commit 97f2645f35 ("tree-wide: replace config_enabled() with
IS_ENABLED()") mostly killed config_enabled(), but some new users have
appeared for v4.8-rc1.  They are all used for a boolean option, so can
be replaced with IS_ENABLED() safely.

Link: http://lkml.kernel.org/r/1471970749-24867-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-26 17:39:35 -07:00
Linus Torvalds
fe2dd21282 xen: regression fix for 4.8-rc3
- Fix a regression in the xenbus device preventing userspace tools
   from working.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXvdugAAoJEFxbo/MsZsTRAwEH/AiKLV4T0OiARv/df827WVnL
 obUmEAh/wVSWZh2xdUNurDOH64lEfeBDSBIpGPQMLGmXLzNEQO9u8ZJYWJ7R1Ryp
 JU37lu3DP7HqQqTXsy8ltgcBkwVaQZAo0GRtDeua80ZPdjulnZirwHWS48TuNIFF
 pVtW4Eoy1BNAVri55o5hOIub4HUKMRoNB/J+o+SKLyJEvOon+qD4pOfIhR3sqeja
 oYVX7QpY/4Miymd5uI9v8LUefS4PW/U58a7tjr414Ng4mzQbZOHDmNyWF0CH27lj
 INAmgMXDG7RtiSQMWPKtDQUvuefApKoeRmFr6mQ/xHyCX3cAzOw07+p0rKacCig=
 =PTX1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.8b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen regression fix from David Vrabel:
 "Fix a regression in the xenbus device preventing userspace tools from
  working"

* tag 'for-linus-4.8b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: change the type of xen_vcpu_id to uint32_t
  xenbus: don't look up transaction IDs for ordinary writes
2016-08-24 14:04:30 -04:00
Vitaly Kuznetsov
55467dea29 xen: change the type of xen_vcpu_id to uint32_t
We pass xen_vcpu_id mapping information to hypercalls which require
uint32_t type so it would be cleaner to have it as uint32_t. The
initializer to -1 can be dropped as we always do the mapping before using
it and we never check the 'not set' value anyway.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-08-24 18:17:27 +01:00
Wanpeng Li
2e63ad4bd5 x86/apic: Do not init irq remapping if ioapic is disabled
native_smp_prepare_cpus
  -> default_setup_apic_routing
    -> enable_IR_x2apic
      -> irq_remapping_prepare
        -> intel_prepare_irq_remapping
          -> intel_setup_irq_remapping		  

So IR table is setup even if "noapic" boot parameter is added. As a result we
crash later when the interrupt affinity is set due to a half initialized
remapping infrastructure.

Prevent remap initialization when IOAPIC is disabled.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1471954039-3942-1-git-send-email-wanpeng.li@hotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-08-24 09:45:40 +02:00
Keith Busch
21c80c9fef x86/PCI: VMD: Fix infinite loop executing irq's
We can't initialize the list head on deletion as this causes the node to
point to itself, which causes an infinite loop if vmd_irq() happens to be
servicing that node.

The list initialization was trying to fix a bug from multiple calls to
disable the same IRQ.  Fix this instead by having the VMD driver track if
the interrupt is enabled.

[bhelgaas: changelog, add "Fixes"]
Fixes: 97e9230635 ("x86/PCI: VMD: Initialize list item in IRQ disable")
Reported-by: Grzegorz Koczot <grzegorz.koczot@intel.com>
Tested-by: Miroslaw Drost <miroslaw.drost@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
2016-08-23 16:36:42 -05:00
Linus Torvalds
d1fdafa10f Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes a number of memory corruption bugs in the newly added
  sha256-mb/sha256-mb code"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: sha512-mb - fix ctx pointer
  crypto: sha256-mb - fix ctx pointer and digest copy
2016-08-23 14:29:00 -04:00
Linus Torvalds
3408fef744 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "An initrd microcode loading fix, and an SMP bootup topology setup fix
  to resolve crashes on SGI/UV systems if the BIOS is configured in a
  certain way"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/smp: Fix __max_logical_packages value setup
  x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
2016-08-18 15:09:41 -07:00
Peter Feiner
c95ba92afb kvm: nVMX: fix nested tsc scaling
When the host supported TSC scaling, L2 would use a TSC multiplier of
0, which causes a VM entry failure. Now L2's TSC uses the same
multiplier as L1.

Signed-off-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-18 12:19:09 +02:00
Radim Krčmář
dccbfcf52c KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write
If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
write with vmcs02 as the current VMCS.
This will incorrectly apply modifications intended for vmcs01 to vmcs02
and L2 can use it to gain access to L0's x2APIC registers by disabling
virtualized x2APIC while using msr bitmap that assumes enabled.

Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
current VMCS.  An alternative solution would temporarily make vmcs01 the
current VMCS, but it requires more care.

Fixes: 8d14695f95 ("x86, apicv: add virtual x2apic support")
Reported-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-18 12:19:08 +02:00
Radim Krčmář
d048c09821 KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC
msr bitmap can be used to avoid a VM exit (interception) on guest MSR
accesses.  In some configurations of VMX controls, the guest can even
directly access host's x2APIC MSRs.  See SDM 29.5 VIRTUALIZING MSR-BASED
APIC ACCESSES.

L2 could read all L0's x2APIC MSRs and write TPR, EOI, and SELF_IPI.
To do so, L1 would first trick KVM to disable all possible interceptions
by enabling APICv features and then would turn those features off;
nested_vmx_merge_msr_bitmap() only disabled interceptions, so VMX would
not intercept previously enabled MSRs even though they were not safe
with the new configuration.

Correctly re-enabling interceptions is not enough as a second bug would
still allow L1+L2 to access host's MSRs: msr bitmap was shared for all
VMCSs, so L1 could trigger a race to get the desired combination of msr
bitmap and VMX controls.

This fix allocates a msr bitmap for every L1 VCPU, allows only safe
x2APIC MSRs from L1's msr bitmap, and disables msr bitmaps if they would
have to intercept everything anyway.

Fixes: 3af18d9c5f ("KVM: nVMX: Prepare for using hardware MSR bitmap")
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Wincy Van <fanwenyi0529@gmail.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-18 12:19:07 +02:00
Jiri Olsa
7b0501b1e7 x86/smp: Fix __max_logical_packages value setup
Frank reported kernel panic when he disabled several cores in BIOS
via following option:

  Core Disable Bitmap(Hex)   [0]

with number 0xFFE, which leaves 16 CPUs in system (out of 48).

The kernel panic below goes along with following messages:

 smpboot: Max logical packages: 2^M
 smpboot: APIC(0) Converting physical 0 to logical package 0^M
 smpboot: APIC(20) Converting physical 1 to logical package 1^M
 smpboot: APIC(40) Package 2 exceeds logical package map^M
 smpboot: CPU 8 APICId 40 disabled^M
 smpboot: APIC(60) Package 3 exceeds logical package map^M
 smpboot: CPU 12 APICId 60 disabled^M
 ...
 general protection fault: 0000 [#1] SMP^M
 Modules linked in:^M
 CPU: 15 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc5+ #1^M
 Hardware name: SGI UV300/UV300, BIOS SGI UV 300 series BIOS 05/25/2016^M
 task: ffff8801673e0000 ti: ffff8801673ac000 task.ti: ffff8801673ac000^M
 RIP: 0010:[<ffffffff81014d54>]  [<ffffffff81014d54>] uncore_change_context+0xd4/0x180^M
 ...
  [<ffffffff810158ac>] uncore_event_init_cpu+0x6c/0x70^M
  [<ffffffff81d8c91c>] intel_uncore_init+0x1c2/0x2dd^M
  [<ffffffff81d8c75a>] ? uncore_cpu_setup+0x17/0x17^M
  [<ffffffff81002190>] do_one_initcall+0x50/0x190^M
  [<ffffffff810ab193>] ? parse_args+0x293/0x480^M
  [<ffffffff81d87365>] kernel_init_freeable+0x1a5/0x249^M
  [<ffffffff81d86a35>] ? set_debug_rodata+0x12/0x12^M
  [<ffffffff816dc19e>] kernel_init+0xe/0x110^M
  [<ffffffff816e93bf>] ret_from_fork+0x1f/0x40^M
  [<ffffffff816dc190>] ? rest_init+0x80/0x80^M

The reason for the panic is wrong value of __max_logical_packages,
which lets logical_package_map uninitialized and the uncore code
relying on this map being properly initialized (maybe we should
add some safety checks there as well).

The __max_logical_packages is computed as:

  DIV_ROUND_UP(total_cpus, ncpus);
  - ncpus being number of cores

With above BIOS setup we get total_cpus == 16 which set
__max_logical_packages to 2 (ncpus is 12).

Once topology_update_package_map processes CPU with logical
pkg over 2 we display above messages and fail to initialize
the physical_to_logical_pkg map, which makes the uncore code
crash.

The fix is to remove logical_package_map bitmap completely
and keep and update the logical_packages number instead.

After we enumerate all the present CPUs, we check if the
enumerated logical packages count is within its computed
maximum from BIOS data.

If it's not the case, we set this maximum to the new enumerated
value and freeze any new addition of logical packages.

The freeze is because lot of init code like uncore/rapl/cqm
depends on having maximum logical package value set to allocate
their data, so we can't change it later on.

Prarit Bhargava tested the patch and confirms that it solves
the problem:

  From dmidecode:
          Core Count: 24
          Core Enabled: 24
          Thread Count: 48

Orig kernel boot log:

 [    0.464981] smpboot: Max logical packages: 19
 [    0.469861] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.477261] smpboot: APIC(40) Converting physical 1 to logical package 1
 [    0.484760] smpboot: APIC(80) Converting physical 2 to logical package 2
 [    0.492258] smpboot: APIC(c0) Converting physical 3 to logical package 3

1.  nr_cpus=8, should stop enumerating in package 0:

 [    0.533664] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.539596] smpboot: Max logical packages: 19

2.  max_cpus=8, should still enumerate all packages:

 [    0.526494] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.532428] smpboot: APIC(40) Converting physical 1 to logical package 1
 [    0.538456] smpboot: APIC(80) Converting physical 2 to logical package 2
 [    0.544486] smpboot: APIC(c0) Converting physical 3 to logical package 3
 [    0.550524] smpboot: Max logical packages: 19

3.  nr_cpus=49 ( 2 socket + 1 core on 3rd socket), should stop enumerating in
    package 2:

 [    0.521378] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.527314] smpboot: APIC(40) Converting physical 1 to logical package 1
 [    0.533345] smpboot: APIC(80) Converting physical 2 to logical package 2
 [    0.539368] smpboot: Max logical packages: 19

4.  maxcpus=49, should still enumerate all packages:

 [    0.525591] smpboot: APIC(0) Converting physical 0 to logical package 0
 [    0.531525] smpboot: APIC(40) Converting physical 1 to logical package 1
 [    0.537547] smpboot: APIC(80) Converting physical 2 to logical package 2
 [    0.543579] smpboot: APIC(c0) Converting physical 3 to logical package 3
 [    0.549624] smpboot: Max logical packages: 19

5.  kdump (nr_cpus=1) works as well.

Reported-by: Frank Ramsay <framsay@redhat.com>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160815101700.GA30090@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18 10:14:48 +02:00
Borislav Petkov
88b2f63402 x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
Similar to:

  efaad554b4 ("x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y")

... fix microcode loading from the initrd on AMD by adding the
randomization offset to the microcode patch container within the initrd.

Reported-and-tested-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-tip-commits@vger.kernel.org
Link: http://lkml.kernel.org/r/20160817113314.GA19221@nazgul.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18 10:06:49 +02:00
Rafael J. Wysocki
6c16f42a4e Merge branch 'pm-sleep'
* pm-sleep:
  PM / hibernate: Fix rtree_next_node() to avoid walking off list ends
  x86/power/64: Use __pa() for physical address computation
  PM / sleep: Update some system sleep documentation
2016-08-18 03:27:08 +02:00