Commit Graph

17886 Commits

Author SHA1 Message Date
Pavel Emelyanov
0f8975ec4d mm: soft-dirty bits for user memory changes tracking
The soft-dirty is a bit on a PTE which helps to track which pages a task
writes to.  In order to do this tracking one should

  1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs)
  2. Wait some time.
  3. Read soft-dirty bits (55'th in /proc/PID/pagemap2 entries)

To do this tracking, the writable bit is cleared from PTEs when the
soft-dirty bit is.  Thus, after this, when the task tries to modify a
page at some virtual address the #PF occurs and the kernel sets the
soft-dirty bit on the respective PTE.

Note, that although all the task's address space is marked as r/o after
the soft-dirty bits clear, the #PF-s that occur after that are processed
fast.  This is so, since the pages are still mapped to physical memory,
and thus all the kernel does is finds this fact out and puts back
writable, dirty and soft-dirty bits on the PTE.

Another thing to note, is that when mremap moves PTEs they are marked
with soft-dirty as well, since from the user perspective mremap modifies
the virtual memory at mremap's new address.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:26 -07:00
David S. Miller
0c1072ae02 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/freescale/fec_main.c
	drivers/net/ethernet/renesas/sh_eth.c
	net/ipv4/gre.c

The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
and the splitting of the gre.c code into seperate files.

The FEC conflict was two sets of changes adding ethtool support code
in an "!CONFIG_M5272" CPP protected block.

Finally the sh_eth.c conflict was between one commit add bits set
in the .eesr_err_check mask whilst another commit removed the
.tx_error_check member and assignments.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 14:55:13 -07:00
Linus Torvalds
f991fae5c6 Power management and ACPI updates for 3.11-rc1
- Hotplug changes allowing device hot-removal operations to fail
   gracefully (instead of crashing the kernel) if they cannot be
   carried out completely.  From Rafael J Wysocki and Toshi Kani.
 
 - Freezer update from Colin Cross and Mandeep Singh Baines targeted
   at making the freezing of tasks a bit less heavy weight operation.
 
 - cpufreq resume fix from Srivatsa S Bhat for a regression introduced
   during the 3.10 cycle causing some cpufreq sysfs attributes to
   return wrong values to user space after resume.
 
 - New freqdomain_cpus sysfs attribute for the acpi-cpufreq driver to
   provide information previously available via related_cpus from
   Lan Tianyu.
 
 - cpufreq fixes and cleanups from Viresh Kumar, Jacob Shin,
   Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia, Arnd Bergmann, and
   Tang Yuantian.
 
 - Fix for an ACPICA regression causing suspend/resume issues to
   appear on some systems introduced during the 3.4 development cycle
   from Lv Zheng.
 
 - ACPICA fixes and cleanups from Bob Moore, Tomasz Nowicki, Lv Zheng,
   Chao Guan, and Zhang Rui.
 
 - New cupidle driver for Xilinx Zynq processors from Michal Simek.
 
 - cpuidle fixes and cleanups from Daniel Lezcano.
 
 - Changes to make suspend/resume work correctly in Xen guests from
   Konrad Rzeszutek Wilk.
 
 - ACPI device power management fixes and cleanups from Fengguang Wu
   and Rafael J Wysocki.
 
 - ACPI documentation updates from Lv Zheng, Aaron Lu and Hanjun Guo.
 
 - Fix for the IA-64 issue that was the reason for reverting commit
   9f29ab1 and updates of the ACPI scan code from Rafael J Wysocki.
 
 - Mechanism for adding CMOS RTC address space handlers from Lan Tianyu
   (to allow some EC-related breakage to be fixed on some systems).
 
 - Spec-compliant implementation of acpi_os_get_timer() from
   Mika Westerberg.
 
 - Modification of do_acpi_find_child() to execute _STA in order to
   to avoid situations in which a pointer to a disabled device object
   is returned instead of an enabled one with the same _ADR value.
   From Jeff Wu.
 
 - Intel BayTrail PCH (Platform Controller Hub) support for the ACPI
   Intel Low-Power Subsystems (LPSS) driver and modificaions of that
   driver to work around a couple of known BIOS issues from
   Mika Westerberg and Heikki Krogerus.
 
 - EC driver fix from Vasiliy Kulikov to make it use get_user() and
   put_user() instead of dereferencing user space pointers blindly.
 
 - Assorted ACPI code cleanups from Bjorn Helgaas, Nicholas Mazzuca and
   Toshi Kani.
 
 - Modification of the "runtime idle" helper routine to take the return
   values of the callbacks executed by it into account and to call
   rpm_suspend() if they return 0, which allows some code bloat
   reduction to be done, from Rafael J Wysocki and Alan Stern.
 
 - New trace points for PM QoS from Sahara <keun-o.park@windriver.com>.
 
 - PM QoS documentation update from Lan Tianyu.
 
 - Assorted core PM code cleanups and changes from Bernie Thompson,
   Bjorn Helgaas, Julius Werner, and Shuah Khan.
 
 - New devfreq driver for the Exynos5-bus device from Abhilash Kesavan.
 
 - Minor devfreq cleanups, fixes and MAINTAINERS update from
   MyungJoo Ham, Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and
   Wei Yongjun.
 
 - OMAP Adaptive Voltage Scaling (AVS) SmartReflex voltage control
   driver updates from Andrii Tseglytskyi and Nishanth Menon.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0ZNOAAoJEKhOf7ml8uNsDLYP/0EU4rmvw0TWTITfp6RS1KDE
 9GwBn96ZR4Q5bJd9gBCTPSqhHOYMqxWEUp99sn/M2wehG1pk/jw5LO56+2IhM3UZ
 g1HDcJ7te2nVT/iXsKiAGTVhU9Rk0aYwoVSknwk27qpIBGxW9w/s5tLX8pY3Q3Zq
 wL/7aTPjyL+PFFFEaxgH7qLqsl3DhbtYW5AriUBTkXout/tJ4eO1b7MNBncLDh8X
 VQ/0DNCKE95VEJfkO4rk9RKUyVp9GDn0i+HXCD/FS4IA5oYzePdVdNDmXf7g+swe
 CGlTZq8pB+oBpDiHl4lxzbNrKQjRNbGnDUkoRcWqn0nAw56xK+vmYnWJhW99gQ/I
 fKnvxeLca5po1aiqmC4VSJxZIatFZqLrZAI4dzoCLWY+bGeTnCKmj0/F8ytFnZA2
 8IuLLs7/dFOaHXV/pKmpg6FAlFa9CPxoqRFoyqb4M0GjEarADyalXUWsPtG+6xCp
 R/p0CISpwk+guKZR/qPhL7M654S7SHrPwd2DPF0KgGsvk+G2GhoB8EzvD8BVp98Z
 9siCGCdgKQfJQVI6R0k9aFmn/4gRQIAgyPhkhv9tqULUUkiaXki+/t8kPfnb8O/d
 zep+CA57E2G8MYLkDJfpFeKS7GpPD6TIdgFdGmOUC0Y6sl9iTdiw4yTx8O2JM37z
 rHBZfYGkJBrbGRu+Q1gs
 =VBBq
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "This time the total number of ACPI commits is slightly greater than
  the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
  remains the most active patch submitter.

  To me, the most significant change is the addition of offline/online
  device operations to the driver core (with the Greg's blessing) and
  the related modifications of the ACPI core hotplug code.  Next are the
  freezer updates from Colin Cross that should make the freezing of
  tasks a bit less heavy weight.

  We also have a couple of regression fixes, a number of fixes for
  issues that have not been identified as regressions, two new drivers
  and a bunch of cleanups all over.

  Highlights:

   - Hotplug changes to support graceful hot-removal failures.

     It sometimes is necessary to fail device hot-removal operations
     gracefully if they cannot be carried out completely.  For example,
     if memory from a memory module being hot-removed has been allocated
     for the kernel's own use and cannot be moved elsewhere, it's
     desirable to fail the hot-removal operation in a graceful way
     rather than to crash the kernel, but currenty a success or a kernel
     crash are the only possible outcomes of an attempted memory
     hot-removal.  Needless to say, that is not a very attractive
     alternative and it had to be addressed.

     However, in order to make it work for memory, I first had to make
     it work for CPUs and for this purpose I needed to modify the ACPI
     processor driver.  It's been split into two parts, a resident one
     handling the low-level initialization/cleanup and a modular one
     playing the actual driver's role (but it binds to the CPU system
     device objects rather than to the ACPI device objects representing
     processors).  That's been sort of like a live brain surgery on a
     patient who's riding a bike.

     So this is a little scary, but since we found and fixed a couple of
     regressions it caused to happen during the early linux-next testing
     (a month ago), nobody has complained.

     As a bonus we remove some duplicated ACPI hotplug code, because the
     ACPI-based CPU hotplug is now going to use the common ACPI hotplug
     code.

   - Lighter weight freezing of tasks.

     These changes from Colin Cross and Mandeep Singh Baines are
     targeted at making the freezing of tasks a bit less heavy weight
     operation.  They reduce the number of tasks woken up every time
     during the freezing, by using the observation that the freezer
     simply doesn't need to wake up some of them and wait for them all
     to call refrigerator().  The time needed for the freezer to decide
     to report a failure is reduced too.

     Also reintroduced is the check causing a lockdep warining to
     trigger when try_to_freeze() is called with locks held (which is
     generally unsafe and shouldn't happen).

   - cpufreq updates

     First off, a commit from Srivatsa S Bhat fixes a resume regression
     introduced during the 3.10 cycle causing some cpufreq sysfs
     attributes to return wrong values to user space after resume.  The
     fix is kind of fresh, but also it's pretty obvious once Srivatsa
     has identified the root cause.

     Second, we have a new freqdomain_cpus sysfs attribute for the
     acpi-cpufreq driver to provide information previously available via
     related_cpus.  From Lan Tianyu.

     Finally, we fix a number of issues, mostly related to the
     CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
     up some code.  The majority of changes from Viresh Kumar with bits
     from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
     Arnd Bergmann, and Tang Yuantian.

   - ACPICA update

     A usual bunch of updates from the ACPICA upstream.

     During the 3.4 cycle we introduced support for ACPI 5 extended
     sleep registers, but they are only supposed to be used if the
     HW-reduced mode bit is set in the FADT flags and the code attempted
     to use them without checking that bit.  That caused suspend/resume
     regressions to happen on some systems.  Fix from Lv Zheng causes
     those registers to be used only if the HW-reduced mode bit is set.

     Apart from this some other ACPICA bugs are fixed and code cleanups
     are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
     Zhang Rui.

   - cpuidle updates

     New driver for Xilinx Zynq processors is added by Michal Simek.

     Multidriver support simplification, addition of some missing
     kerneldoc comments and Kconfig-related fixes come from Daniel
     Lezcano.

   - ACPI power management updates

     Changes to make suspend/resume work correctly in Xen guests from
     Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
     cleanups and fixes of the ACPI device power state selection
     routine.

   - ACPI documentation updates

     Some previously missing pieces of ACPI documentation are added by
     Lv Zheng and Aaron Lu (hopefully, that will help people to
     uderstand how the ACPI subsystem works) and one outdated doc is
     updated by Hanjun Guo.

   - Assorted ACPI updates

     We finally nailed down the IA-64 issue that was the reason for
     reverting commit 9f29ab11dd ("ACPI / scan: do not match drivers
     against objects having scan handlers"), so we can fix it and move
     the ACPI scan handler check added to the ACPI video driver back to
     the core.

     A mechanism for adding CMOS RTC address space handlers is
     introduced by Lan Tianyu to allow some EC-related breakage to be
     fixed on some systems.

     A spec-compliant implementation of acpi_os_get_timer() is added by
     Mika Westerberg.

     The evaluation of _STA is added to do_acpi_find_child() to avoid
     situations in which a pointer to a disabled device object is
     returned instead of an enabled one with the same _ADR value.  From
     Jeff Wu.

     Intel BayTrail PCH (Platform Controller Hub) support is added to
     the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
     driver is modified to work around a couple of known BIOS issues.
     Changes from Mika Westerberg and Heikki Krogerus.

     The EC driver is fixed by Vasiliy Kulikov to use get_user() and
     put_user() instead of dereferencing user space pointers blindly.

     Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
     Kani.

   - Assorted power management updates

     The "runtime idle" helper routine is changed to take the return
     values of the callbacks executed by it into account and to call
     rpm_suspend() if they return 0, which allows us to reduce the
     overall code bloat a bit (by dropping some code that's not
     necessary any more after that modification).

     The runtime PM documentation is updated by Alan Stern (to reflect
     the "runtime idle" behavior change).

     New trace points for PM QoS are added by Sahara
     (<keun-o.park@windriver.com>).

     PM QoS documentation is updated by Lan Tianyu.

     Code cleanups are made and minor issues are addressed by Bernie
     Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.

   - devfreq updates

     New driver for the Exynos5-bus device from Abhilash Kesavan.

     Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
     Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.

   - OMAP power management updates

     Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
     updates from Andrii Tseglytskyi and Nishanth Menon."

* tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
  cpufreq: Fix cpufreq regression after suspend/resume
  ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
  PM / Sleep: Warn about system time after resume with pm_trace
  cpufreq: don't leave stale policy pointer in cdbs->cur_policy
  acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
  cpufreq: make sure frequency transitions are serialized
  ACPI: implement acpi_os_get_timer() according the spec
  ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
  ACPI: Add CMOS RTC Operation Region handler support
  ACPI / processor: Drop unused variable from processor_perflib.c
  cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
  ...
2013-07-03 14:35:40 -07:00
Linus Torvalds
fe489bf450 KVM fixes for 3.11
On the x86 side, there are some optimizations and documentation updates.
 The big ARM/KVM change for 3.11, support for AArch64, will come through
 Catalin Marinas's tree.  s390 and PPC have misc cleanups and bugfixes.
 
 There is a conflict due to "s390/pgtable: fix ipte notify bit" having
 entered 3.10 through Martin Schwidefsky's s390 tree.  This pull request
 has additional changes on top, so this tree's version is the correct one.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0oU6AAoJEBvWZb6bTYbynnsP/RSUrrHrA8Wu1tqVfAKu+1y5
 6OIihqZ9x11/YMaNofAfv86jqxFu0/j7CzMGphNdjzujqKI+Q1tGe7oiVCmKzoG+
 UvSctWsz0lpllgBtnnrm5tcfmG6rrddhLtpA7m320+xCVx8KV5P4VfyHZEU+Ho8h
 ziPmb2mAQ65gBNX6nLHEJ3ITTgad6gt4NNbrKIYpyXuWZQJypzaRqT/vpc4md+Ed
 dCebMXsL1xgyb98EcnOdrWH1wV30MfucR7IpObOhXnnMKeeltqAQPvaOlKzZh4dK
 +QfxJfdRZVS0cepcxzx1Q2X3dgjoKQsHq1nlIyz3qu1vhtfaqBlixLZk0SguZ/R9
 1S1YqucZiLRO57RD4q0Ak5oxwobu18ZoqJZ6nledNdWwDe8bz/W2wGAeVty19ky0
 qstBdM9jnwXrc0qrVgZp3+s5dsx3NAm/KKZBoq4sXiDLd/yBzdEdWIVkIrU3X9wU
 3X26wOmBxtsB7so/JR7ciTsQHelmLicnVeXohAEP9CjIJffB81xVXnXs0P0SYuiQ
 RzbSCwjPzET4JBOaHWT0Dhv0DTS/EaI97KzlN32US3Bn3WiLlS1oDCoPFoaLqd2K
 LxQMsXS8anAWxFvexfSuUpbJGPnKSidSQoQmJeMGBa9QhmZCht3IL16/Fb641ToN
 xBohzi49L9FDbpOnTYfz
 =1zpG
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "On the x86 side, there are some optimizations and documentation
  updates.  The big ARM/KVM change for 3.11, support for AArch64, will
  come through Catalin Marinas's tree.  s390 and PPC have misc cleanups
  and bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits)
  KVM: PPC: Ignore PIR writes
  KVM: PPC: Book3S PR: Invalidate SLB entries properly
  KVM: PPC: Book3S PR: Allow guest to use 1TB segments
  KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match
  KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry
  KVM: PPC: Book3S PR: Fix proto-VSID calculations
  KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL
  KVM: Fix RTC interrupt coalescing tracking
  kvm: Add a tracepoint write_tsc_offset
  KVM: MMU: Inform users of mmio generation wraparound
  KVM: MMU: document fast invalidate all mmio sptes
  KVM: MMU: document fast invalidate all pages
  KVM: MMU: document fast page fault
  KVM: MMU: document mmio page fault
  KVM: MMU: document write_flooding_count
  KVM: MMU: document clear_spte_count
  KVM: MMU: drop kvm_mmu_zap_mmio_sptes
  KVM: MMU: init kvm generation close to mmio wrap-around value
  KVM: MMU: add tracepoint for check_mmio_spte
  KVM: MMU: fast invalidate all mmio sptes
  ...
2013-07-03 13:21:40 -07:00
Linus Torvalds
3e34131a65 Bug-fixes:
* Fix memory leak when CPU hotplugging.
 * Compile bugs with various #ifdefs
 * Fix state changes in Xen PCI front not dealing well with new toolstack.
 * Cleanups in code (use pr_*, fix 80 characters splits, etc)
 * Long standing bug in double-reporting the steal time
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJR0Xc1AAoJEFjIrFwIi8fJo68H/jZaJJmytDI7exHTyq8fSGXQ
 5OERw5YZeM5jZQG55YC5hmGS5oIpKEdBt+aAEpuofYUhrR/ZFqDr0j+QEiqC36bl
 cl0/IAnMBGnyyO6FYY4Sut2H+S5BGYQNbwo9YAtgKtZANr2eLABxYUfMU44I/jCW
 M7DAojME9OZLBW3ORYZTGf1A0T8hJINhxZIWhtLMrkckCb9AZMieKdMOHJEWq2jl
 aPxx78U+2CLTdLquOLIiBEiTO3vAzx2Wt4prD+uCizWna45H9gBj7GFwBYH7p9Ry
 TFyzmDc5PufThTilfDyQW1y4yiNLpFDC/67a1wpwObEBn87hstHgwHQQ5INzqwY=
 =Ej7M
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.11-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen bugfixes from Konrad Rzeszutek Wilk:
 - Fix memory leak when CPU hotplugging.
 - Compile bugs with various #ifdefs
 - Fix state changes in Xen PCI front not dealing well with new
   toolstack.
 - Cleanups in code (use pr_*, fix 80 characters splits, etc)
 - Long standing bug in double-reporting the steal time

* tag 'stable/for-linus-3.11-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/time: remove blocked time accounting from xen "clockchip"
  xen: Convert printks to pr_<level>
  xen: ifdef CONFIG_HIBERNATE_CALLBACKS xen_*_suspend
  xen/pcifront: Deal with toolstack missing 'XenbusStateClosing' state.
  xen/time: Free onlined per-cpu data structure if we want to online it again.
  xen/time: Check that the per_cpu data structure has data before freeing.
  xen/time: Don't leak interrupt name when offlining.
  xen/time: Encapsulate the struct clock_event_device in another structure.
  xen/spinlock: Don't leak interrupt name when offlining.
  xen/smp: Don't leak interrupt name when offlining.
  xen/smp: Set the per-cpu IRQ number to a valid default.
  xen/smp: Introduce a common structure to contain the IRQ name and interrupt line.
  xen/smp: Coalesce the free_irq calls in one function.
  xen-pciback: fix error return code in pcistub_irq_handler_switch()
2013-07-03 13:12:42 -07:00
Linus Torvalds
baa6f82093 Simple warning fix for module sections. If too late to pull, no big deal.
Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0NO+AAoJENkgDmzRrbjxS0YP/RoI5WguUDVx8tIdS29pwe9L
 Z5U0uIDUBZkNftlvvkdqdGjU1MeejxoMIs6cMz6ScY6YEI7li1AzuNApRadflnMx
 Do9u999NylI6q2oCLrt9y/eliJ/p5hkeCbhDrGdCgB7rPCjEAp/+Eu9Kq7FO8AYr
 Qu0rpLjvJ/5c5lsLAmUn95lZZ/uc/EPHasfZDiR8eGHRoAvPeyJ5qR06MpHFcl33
 2EKwR8RCYP9yUdqWnSKlC4LTuIpNK/BjEpHcQK2UPSn49vR8B3gVz5VXNP/7nIYQ
 hCr3VGoEOBoAlsD38IuKycXQ7vmBTU2hq7bQPq7Z6wyH96Sb6xFmVqT1Dm7uAI4+
 3OU69ugQb7WM3hQjeRRFvn5WjPqu8THwJKU7vhEf+xA09+v09QFjbHoEFabrFDlo
 MqVVhelkT6Mo8bBrxl8sZc/CG+aSl/kLQfyXWIcR2MQEAlfYKOYuDz+5TXtMZHaR
 SwlxgE2wbPP54v27yQWfBWY4d7dtJ+OX25GW1yzJdsNUFg7fQ9zzMnrv1prsUlog
 bxs4P0JcBOMLZx2IWm0R2kmbRFYjPgj6Kr3VnrIhZGesG1cwQB2QNvEPEkCtl6kq
 8HdX8Tua3dpfKAhDxnlztpVcJUmTuLn95jMWjW1KhCcjOqqJbaElq9ubplP2eMtr
 7zlfzFq9RsBKG0+J98Gt
 =vZYj
 -----END PGP SIGNATURE-----
mergetag object b3087e48ce
 type commit
 tag virtio-next-for-linus
 tagger Rusty Russell <rusty@rustcorp.com.au> 1372639977 +0930
 
 Was away, but it's all trivial and been sitting in linux-next.  So if you don't
 pull, no electrons will be harmed.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0NMjAAoJENkgDmzRrbjxf3EQALoIAuQAjiEcHTGkLd/hGMJk
 mDiN14CXNH0siKASlCfcAxudJxKfIRM1tYzMFV1e7DtC7ZAH43m/SYBmXC4hRpIz
 nHsv57xtUg8bGZgb2OZKX/L0rqiE5wxtK7ZWGIPh3aE4uZKif+YvSBNJniO/HRTl
 UVDbrp+sbZzbT8NIHQ8d2vdtfG7rCNRkVaaETxk1wguu2TGJkyt0xuU3zT54We0L
 M8rJuHyTplgEMuDUydmNi9uNPZJZZY7QqoZ7s+ubXELo2FcFeyqWP2oHZnywESok
 fWOfhgdgIgmTED/eXS9wGwB7+K6azKg2chA23QZ91ncX/Ts141sEAd9ezwuQZc+5
 VS/Lo2mRvWbEcG78uYGUz+ukltZv0Og6uOvfSfCrALcVA4pYNFJTjbRu+Io+qAml
 082byeBUT/eurZr7xE30b4aTvZV2bYfqFkVynjF9ugNvTTjMiruw8uAzep6DD1gl
 38XRwtrk6MHLAFuxsyNV1FgunIWB6zvXNjMTUUVDLAQCbZwxZArr9gDd4zUG9VTS
 EbA9TIulSMMLUve5zZQyByqRLlzmzDVghMOZVbYUOQN4uNxbAPoXZ87tpiRhk8Oc
 XW+LRABjwZOjiQ9SDWRxxb8p2WTVyjuwIOTE0pLCuEBw6RFmjGga+PzH9rDygJnF
 zEIbVDRT9Jj7rOtxSabE
 =p6z7
 -----END PGP SIGNATURE-----

Merge tags 'modules-next-for-linus' and 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull trivial module and virtio fixes from Rusty Russell.

Apparently these were meant for 3.10, but came in after the release.

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  modpost.c: Add .text.unlikely to TEXT_SECTIONS

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  virtio: remove virtqueue_add_buf().
  lguest: rename i386_head.S
  virtio_blk: Add missing 'static' qualifiers
  virtio: console: Add emergency writeonly register to config space
  virtio_pci: better macro exported in uapi
2013-07-03 13:09:06 -07:00
Linus Torvalds
a9f4a7005f Thermal limit warnings are too scary and cause unnecessary concern
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRzhAdAAoJEKurIx+X31iByQEP/1eAEHNiSCrrKUm/ovGCiVBZ
 w+ZRJkcKYBj+/3cQUeMds7OevmM8fSU5t+4zGpikobQ75xCjz7vJ6nQNLrSylqQG
 Z4weVme6a529szRwv6H9an+psj5wP0yWAbCgBSw4/O3cogDS0eMD8T6ZhEi8mWJ9
 PRSdqOLdxl7AZpJpJpP3GF7VQLJLDbVv56FcKgAZ3mtFM6qr7HFiTtnTc+Jl33V5
 bM7v23HXyoyRk+JOUMtO+4BlrkKzbtL4Zh8aGw4s71QX3zZbP0BIw9sKffNMD+2k
 oYXOcJjTwbtxQqXKoSKOtvLUQzf/SDqzlFf7yU38Spw+nRECPhVVRO0aAXKOmFKD
 74qyNuW6ZXMw7xH4M7iG6MuM/cZIvueDtp0RSZVc6Gmb5CGb1aXz4GeotvKF5Bev
 nVL65UVo6XwfB4m77ayPYqfD6KX0BsWHdCzWABNl8WbgyCY5dXRJJs/c9iLCl05I
 EXRgWLJK/H/wOY9mj3NGcvRicYJigZP2luJEdwn8mVlpZl+3wstDcgq9Ijxr0LGi
 bj2H9ro3vSaXpAnJw3FR9QbXgH/8Nbb8sV7yBddy9zntrnfEfvrYtrhOKR+fPrQL
 C5zXeaXGEnqCEcYNESSjyhDCa1az49BlJC5ebEHHqdviHzRpVVi/EG920GViWw+D
 z+z25N5hi4yoUF/+1vc/
 =Dm06
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull thermal power-limit update from Tony Luck:
 "Thermal limit warnings are too scary and cause unnecessary concern"

* tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  x86 thermal: Disable power limit notification interrupt by default
  x86 thermal: Delete power-limit-notification console messages
2013-07-03 11:16:09 -07:00
Linus Torvalds
1873e50028 Main features:
- KVM and Xen ports to AArch64
 - Hugetlbfs and transparent huge pages support for arm64
 - Applied Micro X-Gene Kconfig entry and dts file
 - Cache flushing improvements
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.9 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0bZAAAoJEGvWsS0AyF7xTEEP/R/aRoqWwbVAMlwAhujq616O
 t4RzIyBXZXqxS9I+raokCX4mgYxdeisJlzN2hoq73VEX2BQlXZoYh8vmfY9WeNSM
 2pdfif2HF7oo9ymCRyqfuhbumPrTyJhpbguzOYrxPqpp2f1hv2D8hbUJEFj429yL
 UjqTFoONngfouZmAlwrPGZQKhBI95vvN53yvDMH0PWfvpm07DKGIQMYp20y0pj8j
 slhLH3bh2kfpS1cf23JtH6IICwWD2pXW0POo569CfZry6bI74xve+Trcsm7iPnsO
 PSI1P046ME1mu3SBbKwiPIdN/FQqWwTHW07fvMmH/xuXu3Zs/mxgzi7vDzDrVvTg
 PJSbKWD6N/IPPwKS/gCUmWWDASO0bXx3KlDuRZqAjbRojs0UPUOTUhzJM/BHUms1
 vY2QS9lAm02LmZZrk1LeKKP85gB+qKQvHuOVhIOldWeLGKtsNufz1kynz6YTqsLq
 uUB55KwbhQ7q8+aoY6lWujqiTXMoLkBgGdjHs2I407PAv7ZjlhRWk2fIry7xJifp
 rKu2cIlWsRe4CGvGI410NvIJFrGvJAV4wA43sgBDjPumyILgT/5jw9r3RpJEBZZs
 akw/Bl1CbL+gMjyoPUWgcWZdRkUCE0eLrgyMOmaYfst8cOTaWw4dWLvUG/bBZg+Y
 mGnuEQUQtAPadk8P/Sv3
 =PZ/e
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64

Pull ARM64 updates from Catalin Marinas:
 "Main features:
   - KVM and Xen ports to AArch64
   - Hugetlbfs and transparent huge pages support for arm64
   - Applied Micro X-Gene Kconfig entry and dts file
   - Cache flushing improvements

  For arm64 huge pages support, there are x86 changes moving part of
  arch/x86/mm/hugetlbpage.c into mm/hugetlb.c to be re-used by arm64"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: (66 commits)
  arm64: Add initial DTS for APM X-Gene Storm SOC and APM Mustang board
  arm64: Add defines for APM ARMv8 implementation
  arm64: Enable APM X-Gene SOC family in the defconfig
  arm64: Add Kconfig option for APM X-Gene SOC family
  arm64/Makefile: provide vdso_install target
  ARM64: mm: THP support.
  ARM64: mm: Raise MAX_ORDER for 64KB pages and THP.
  ARM64: mm: HugeTLB support.
  ARM64: mm: Move PTE_PROT_NONE bit.
  ARM64: mm: Make PAGE_NONE pages read only and no-execute.
  ARM64: mm: Restore memblock limit when map_mem finished.
  mm: thp: Correct the HPAGE_PMD_ORDER check.
  x86: mm: Remove general hugetlb code from x86.
  mm: hugetlb: Copy general hugetlb code from x86 to mm.
  x86: mm: Remove x86 version of huge_pmd_share.
  mm: hugetlb: Copy huge_pmd_share from x86 to mm.
  arm64: KVM: document kernel object mappings in HYP
  arm64: KVM: MAINTAINERS update
  arm64: KVM: userspace API documentation
  arm64: KVM: enable initialization of a 32bit vcpu
  ...
2013-07-03 10:31:38 -07:00
Linus Torvalds
7c6809ff2b Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 UV update from Ingo Molnar:
 "There's a single commit in this tree, which adds support for a new SGI
  UV GRU (Global Reference Unit - fast NUMA messaging ASIC) hardware
  feature to scale up and beyond: an optional distributed mode that will
  allow per-node address mapping of local GRU space, as opposed to
  mapping all GRU hardware to the same contiguous high space"

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/UV: Add GRU distributed mode mappings
2013-07-02 16:33:05 -07:00
Linus Torvalds
96a3d998fb Merge branch 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 tracing updates from Ingo Molnar:
 "This tree adds IRQ vector tracepoints that are named after the handler
  and which output the vector #, based on a zero-overhead approach that
  relies on changing the IDT entries, by Seiji Aguchi.

  The new tracepoints look like this:

   # perf list | grep -i irq_vector
    irq_vectors:local_timer_entry                      [Tracepoint event]
    irq_vectors:local_timer_exit                       [Tracepoint event]
    irq_vectors:reschedule_entry                       [Tracepoint event]
    irq_vectors:reschedule_exit                        [Tracepoint event]
    irq_vectors:spurious_apic_entry                    [Tracepoint event]
    irq_vectors:spurious_apic_exit                     [Tracepoint event]
    irq_vectors:error_apic_entry                       [Tracepoint event]
    irq_vectors:error_apic_exit                        [Tracepoint event]
   [...]"

* 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tracing: Add config option checking to the definitions of mce handlers
  trace,x86: Do not call local_irq_save() in load_current_idt()
  trace,x86: Move creation of irq tracepoints from apic.c to irq.c
  x86, trace: Add irq vector tracepoints
  x86: Rename variables for debugging
  x86, trace: Introduce entering/exiting_irq()
  tracing: Add DEFINE_EVENT_FN() macro
2013-07-02 16:31:49 -07:00
Linus Torvalds
3045f94a20 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "The changes in this tree are:

   - ACPI APEI (ACPI Platform Error Interface) improvements, by Chen
     Gong
   - misc MCE fixes/cleanups"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Update MCE severity condition check
  mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem
  ACPI/APEI: Update einj documentation for param1/param2
  ACPI/APEI: Add parameter check before error injection
  ACPI, APEI, EINJ: Fix error return code in einj_init()
  x86, mce: Fix "braodcast" typo
2013-07-02 16:30:46 -07:00
Linus Torvalds
52e8ad9066 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "Two changes:

   - A Kconfig dependency fix/cleanup

   - Introduce the 'make kvmconfig' KVM configuration helper utility
     that turns the current .config into a KVM-bootable config.  Useful
     for debugging specific native kernel configs that have no KVM
     config options enabled on VM setups."

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform: Make X86_GOLDFISH depend on X86_EXTENDED_PLATFORM
  x86/platform: Add kvmconfig to the phony targets
  x86, platform, kvm, kconfig: Turn existing .config's into KVM-capable configs
2013-07-02 16:29:48 -07:00
Linus Torvalds
1982269a5c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "Misc improvements:

   - Fix /proc/mtrr reporting
   - Fix ioremap printout
   - Remove the unused pvclock fixmap entry on 32-bit
   - misc cleanups"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioremap: Correct function name output
  x86: Fix /proc/mtrr with base/size more than 44bits
  ix86: Don't waste fixmap entries
  x86/mm: Drop unneeded include <asm/*pgtable, page*_types.h>
  x86_64: Correct phys_addr in cleanup_highmap comment
2013-07-02 16:29:05 -07:00
Linus Torvalds
fdd78889aa Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading update from Ingo Molnar:
 "Two main changes that improve microcode loading on AMD CPUs:

   - Add support for all-in-one binary microcode files that concatenate
     the microcode images of multiple processor families, by Jacob Shin

   - Add early microcode loading (embedded in the initrd) support, also
     by Jacob Shin"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, amd: Another early loading fixup
  x86, microcode, amd: Allow multiple families' bin files appended together
  x86, microcode, amd: Make find_ucode_in_initrd() __init
  x86, microcode, amd: Fix warnings and errors on with CONFIG_MICROCODE=m
  x86, microcode, amd: Early microcode patch loading support for AMD
  x86, microcode, amd: Refactor functions to prepare for early loading
  x86, microcode: Vendor abstract out save_microcode_in_initrd()
  x86, microcode, intel: Correct typo in printk
2013-07-02 16:28:10 -07:00
Linus Torvalds
d652df0b2f Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU changes from Ingo Molnar:
 "There are two bigger changes in this tree:

   - Add an [early-use-]safe static_cpu_has() variant and other
     robustness improvements, including the new X86_DEBUG_STATIC_CPU_HAS
     configurable debugging facility, motivated by recent obscure FPU
     code bugs, by Borislav Petkov

   - Reimplement FPU detection code in C and drop the old asm code, by
     Peter Anvin."

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, fpu: Use static_cpu_has_safe before alternatives
  x86: Add a static_cpu_has_safe variant
  x86: Sanity-check static_cpu_has usage
  x86, cpu: Add a synthetic, always true, cpu feature
  x86: Get rid of ->hard_math and all the FPU asm fu
2013-07-02 16:26:44 -07:00
Linus Torvalds
4d6f843a38 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI changes from Ingo Molnar:
 "Two fixes that should in principle increase robustness of our
  interaction with the EFI firmware, and a cleanup"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: retry ExitBootServices() on failure
  efi: Convert runtime services function ptrs
  UEFI: Don't pass boot services regions to SetVirtualAddressMap()
2013-07-02 16:25:50 -07:00
Linus Torvalds
55a0d3ff60 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 debug update from Ingo Molnar:
 "Misc debuggability improvements:

   - Optimize the x86 CPU register printout a bit
   - Expose the tboot TXT log via debugfs
   - Small do_debug() cleanup"

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tboot: Provide debugfs interfaces to access TXT log
  x86: Remove weird PTR_ERR() in do_debug
  x86/debug: Only print out DR registers if they are not power-on defaults
2013-07-02 16:25:06 -07:00
Linus Torvalds
35c23d5d79 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
 "Two changes:

   - Extend 32-bit double fault debugging aid to 64-bit
   - Fix a build warning"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel/cacheinfo: Shut up last long-standing warning
  x86: Extend #DF debugging aid to 64-bit
2013-07-02 16:24:24 -07:00
Linus Torvalds
57935b262c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc x86 cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S
  x86, cleanups: Remove extra tab in __flush_tlb_one()
  x86/mce: Remove check for CONFIG_X86_MCE_P4THERMAL
2013-07-02 16:23:50 -07:00
Linus Torvalds
5f16a8cf2d Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot build fix from Ingo Molnar:
 "Small fixlet for the build process"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Close opened file descriptor
2013-07-02 16:22:55 -07:00
Linus Torvalds
002e44bfb5 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull asm/x86 changes from Ingo Molnar:
 "Misc changes, with a bigger processor-flags cleanup/reorganization by
  Peter Anvin"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, asm, cleanup: Replace open-coded control register values with symbolic
  x86, processor-flags: Fix the datatypes and add bit number defines
  x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE
  x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
  linux/const.h: Add _BITUL() and _BITULL()
  x86/vdso: Convert use of typedef ctl_table to struct ctl_table
  x86: __force_order doesn't need to be an actual variable
2013-07-02 16:21:45 -07:00
Linus Torvalds
e13053f506 Merge branch 'sched-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull voluntary preemption fixes from Ingo Molnar:
 "This tree contains a speedup which is achieved through better
  might_sleep()/might_fault() preemption point annotations for uaccess
  functions, by Michael S Tsirkin:

  1. The only reason uaccess routines might sleep is if they fault.
     Make this explicit for all architectures.

  2. A voluntary preemption point in uaccess functions means compiler
     can't inline them efficiently, this breaks assumptions that they
     are very fast and small that e.g.  net code seems to make.  Remove
     this preemption point so behaviour matches with what callers
     assume.

  3. Accesses (e.g through socket ops) to kernel memory with KERNEL_DS
     like net/sunrpc does will never sleep.  Remove an unconditinal
     might_sleep() in the might_fault() inline in kernel.h (used when
     PROVE_LOCKING is not set).

  4. Accesses with pagefault_disable() return EFAULT but won't cause
     caller to sleep.  Check for that and thus avoid might_sleep() when
     PROVE_LOCKING is set.

  These changes offer a nice speedup for CONFIG_PREEMPT_VOLUNTARY=y
  kernels, here's a network bandwidth measurement between a virtual
  machine and the host:

   before:
        incoming: 7122.77   Mb/s
        outgoing: 8480.37   Mb/s

   after:
        incoming: 8619.24   Mb/s   [ +21.0% ]
        outgoing: 9455.42   Mb/s   [ +11.5% ]

  I kept these changes in a separate tree, separate from scheduler
  changes, because it's a mixed MM and scheduler topic"

* 'sched-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mm, sched: Allow uaccess in atomic with pagefault_disable()
  mm, sched: Drop voluntary schedule from might_fault()
  x86: uaccess s/might_sleep/might_fault/
  tile: uaccess s/might_sleep/might_fault/
  powerpc: uaccess s/might_sleep/might_fault/
  mn10300: uaccess s/might_sleep/might_fault/
  microblaze: uaccess s/might_sleep/might_fault/
  m32r: uaccess s/might_sleep/might_fault/
  frv: uaccess s/might_sleep/might_fault/
  arm64: uaccess s/might_sleep/might_fault/
  asm-generic: uaccess s/might_sleep/might_fault/
2013-07-02 16:19:24 -07:00
Linus Torvalds
f0bb4c0ab0 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel improvements:

   - watchdog driver improvements by Li Zefan
   - Power7 CPI stack events related improvements by Sukadev Bhattiprolu
   - event multiplexing via hrtimers and other improvements by Stephane
     Eranian
   - kernel stack use optimization by Andrew Hunter
   - AMD IOMMU uncore PMU support by Suravee Suthikulpanit
   - NMI handling rate-limits by Dave Hansen
   - various hw_breakpoint fixes by Oleg Nesterov
   - hw_breakpoint overflow period sampling and related signal handling
     fixes by Jiri Olsa
   - Intel Haswell PMU support by Andi Kleen

  Tooling improvements:

   - Reset SIGTERM handler in workload child process, fix from David
     Ahern.
   - Makefile reorganization, prep work for Kconfig patches, from Jiri
     Olsa.
   - Add automated make test suite, from Jiri Olsa.
   - Add --percent-limit option to 'top' and 'report', from Namhyung
     Kim.
   - Sorting improvements, from Namhyung Kim.
   - Expand definition of sysfs format attribute, from Michael Ellerman.

  Tooling fixes:

   - 'perf tests' fixes from Jiri Olsa.
   - Make Power7 CPI stack events available in sysfs, from Sukadev
     Bhattiprolu.
   - Handle death by SIGTERM in 'perf record', fix from David Ahern.
   - Fix printing of perf_event_paranoid message, from David Ahern.
   - Handle realloc failures in 'perf kvm', from David Ahern.
   - Fix divide by 0 in variance, from David Ahern.
   - Save parent pid in thread struct, from David Ahern.
   - Handle JITed code in shared memory, from Andi Kleen.
   - Fixes for 'perf diff', from Jiri Olsa.
   - Remove some unused struct members, from Jiri Olsa.
   - Add missing liblk.a dependency for python/perf.so, fix from Jiri
     Olsa.
   - Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
   - No need to do locking when adding hists in perf report, only 'top'
     needs that, from Namhyung Kim.
   - Fix alignment of symbol column in in the hists browser (top,
     report) when -v is given, from NAmhyung Kim.
   - Fix 'perf top' -E option behavior, from Namhyung Kim.
   - Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
   - Fix compile errors in bp_signal 'perf test', from Sukadev
     Bhattiprolu.

  ... and more things"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
  perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
  perf/x86: Fix shared register mutual exclusion enforcement
  perf/x86/intel: Support full width counting
  x86: Add NMI duration tracepoints
  perf: Drop sample rate when sampling is too slow
  x86: Warn when NMI handlers take large amounts of time
  hw_breakpoint: Introduce "struct bp_cpuinfo"
  hw_breakpoint: Simplify *register_wide_hw_breakpoint()
  hw_breakpoint: Introduce cpumask_of_bp()
  hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
  hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
  perf/x86/intel: Add mem-loads/stores support for Haswell
  perf/x86/intel: Support Haswell/v4 LBR format
  perf/x86/intel: Move NMI clearing to end of PMI handler
  perf/x86/intel: Add Haswell PEBS support
  perf/x86/intel: Add simple Haswell PMU support
  perf/x86/intel: Add Haswell PEBS record support
  perf/x86/intel: Fix sparse warning
  perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
  perf/x86/amd: Add IOMMU Performance Counter resource management
  ...
2013-07-02 16:15:23 -07:00
Linus Torvalds
0c46d68d19 Merge branch 'core-mutexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull WW mutex support from Ingo Molnar:
 "This tree adds support for wound/wait style locks, which the graphics
  guys would like to make use of in the TTM graphics subsystem.

  Wound/wait mutexes are used when other multiple lock acquisitions of a
  similar type can be done in an arbitrary order.  The deadlock handling
  used here is called wait/wound in the RDBMS literature: The older
  tasks waits until it can acquire the contended lock.  The younger
  tasks needs to back off and drop all the locks it is currently
  holding, ie the younger task is wounded.

  See this LWN.net description of W/W mutexes:

     https://lwn.net/Articles/548909/

  The comments there outline specific usecases for this facility (which
  have already been implemented for the DRM tree).

  Also see Documentation/ww-mutex-design.txt for more details"

* 'core-mutexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking-selftests: Handle unexpected failures more strictly
  mutex: Add more w/w tests to test EDEADLK path handling
  mutex: Add more tests to lib/locking-selftest.c
  mutex: Add w/w tests to lib/locking-selftest.c
  mutex: Add w/w mutex slowpath debugging
  mutex: Add support for wound/wait style locks
  arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not
2013-07-02 16:09:13 -07:00
Linus Torvalds
fc76a258d4 Driver core patches for 3.11-rc1
Here's the big driver core merge for 3.11-rc1
 
 Lots of little things, and larger firmware subsystem updates, all
 described in the shortlog.  Nice thing here is that we finally get rid
 of CONFIG_HOTPLUG, after 10+ years, thanks to Stephen Rohtwell (it had
 been always on for a number of kernel releases, now it's just removed.)
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlHRsGMACgkQMUfUDdst+ylIIACfW8lLxOPVK+iYG699TWEBAkp0
 LFEAnjlpAMJ1JnoZCuWDZObNCev93zGB
 =020+
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here's the big driver core merge for 3.11-rc1

  Lots of little things, and larger firmware subsystem updates, all
  described in the shortlog.  Nice thing here is that we finally get rid
  of CONFIG_HOTPLUG, after 10+ years, thanks to Stephen Rohtwell (it had
  been always on for a number of kernel releases, now it's just
  removed)"

* tag 'driver-core-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (27 commits)
  driver core: device.h: fix doc compilation warnings
  firmware loader: fix another compile warning with PM_SLEEP unset
  build some drivers only when compile-testing
  firmware loader: fix compile warning with PM_SLEEP set
  kobject: sanitize argument for format string
  sysfs_notify is only possible on file attributes
  firmware loader: simplify holding module for request_firmware
  firmware loader: don't export cache_firmware and uncache_firmware
  drivers/base: Use attribute groups to create sysfs memory files
  firmware loader: fix compile warning
  firmware loader: fix build failure with !CONFIG_FW_LOADER_USER_HELPER
  Documentation: Updated broken link in HOWTO
  Finally eradicate CONFIG_HOTPLUG
  driver core: firmware loader: kill FW_ACTION_NOHOTPLUG requests before suspend
  driver core: firmware loader: don't cache FW_ACTION_NOHOTPLUG firmware
  Documentation: Tidy up some drivers/base/core.c kerneldoc content.
  platform_device: use a macro instead of platform_driver_register
  firmware: move EXPORT_SYMBOL annotations
  firmware: Avoid deadlock of usermodehelper lock at shutdown
  dell_rbu: Select CONFIG_FW_LOADER_USER_HELPER explicitly
  ...
2013-07-02 11:44:19 -07:00
Linus Torvalds
63580e51bb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS patches (part 1) from Al Viro:
 "The major change in this pile is ->readdir() replacement with
  ->iterate(), dealing with ->f_pos races in ->readdir() instances for
  good.

  There's a lot more, but I'd prefer to split the pull request into
  several stages and this is the first obvious cutoff point."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (67 commits)
  [readdir] constify ->actor
  [readdir] ->readdir() is gone
  [readdir] convert ecryptfs
  [readdir] convert coda
  [readdir] convert ocfs2
  [readdir] convert fatfs
  [readdir] convert xfs
  [readdir] convert btrfs
  [readdir] convert hostfs
  [readdir] convert afs
  [readdir] convert ncpfs
  [readdir] convert hfsplus
  [readdir] convert hfs
  [readdir] convert befs
  [readdir] convert cifs
  [readdir] convert freevxfs
  [readdir] convert fuse
  [readdir] convert hpfs
  reiserfs: switch reiserfs_readdir_dentry to inode
  reiserfs: is_privroot_deh() needs only directory inode, actually
  ...
2013-07-02 09:28:37 -07:00
Seiji Aguchi
4787c368a9 x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()
Reschedule vector tracepoints may be called in cpu idle state.
This causes lockdep check warning below.

The tracepoint requires rcu but for accuracy it also
requires irq_enter() (tracepoints record the irq context), thus,
the tracepoint interrupt handler should be calling irq_enter()
and not rcu_irq_enter() (irq_enter() calls rcu_irq_enter()).

So, add irq_enter/exit() to smp_trace_reschedule_interrupt()
with common pre/post processing functions, smp_entering_irq()
and exiting_irq() (exiting_irq() calls just irq_exit()
 in arch/x86/include/asm/apic.h),
because these can be shared among reschedule, call_function,
and call_function_single vectors.

[   50.720557] Testing event reschedule_exit:
[   50.721349]
[   50.721502] ===============================
[   50.721835] [ INFO: suspicious RCU usage. ]
[   50.722169] 3.10.0-rc6-00004-gcf910e8 #190 Not tainted
[   50.722582] -------------------------------
[   50.722915] /c/kernel-tests/src/linux/arch/x86/include/asm/trace/irq_vectors.h:50 suspicious rcu_dereference_check() usage!
[   50.723770]
[   50.723770] other info that might help us debug this:
[   50.723770]
[   50.724385]
[   50.724385] RCU used illegally from idle CPU!
[   50.724385] rcu_scheduler_active = 1, debug_locks = 0
[   50.725232] RCU used illegally from extended quiescent state!
[   50.725690] no locks held by swapper/0/0.
[   50.726010]
[   50.726010] stack backtrace:
[...]

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/51CDCFA3.9080101@hds.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-02 09:52:31 +02:00
Al Viro
40d158e618 consolidate io_remap_pfn_range definitions
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:35 +04:00
Wang YanQing
237d154854 x86: Fix override new_cpu_data.x86 with 486
We should set X86 to 486 before use cpuid to detect the cpu type, if
we set X86 to 486 after cpuid, then we will get 486 until cpu_detect
runs.

Signed-off-by: Wang YanQing <udknight@gmail.com>
Link: http://lkml.kernel.org/r/20130628144516.GA2177@udknight
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-28 15:27:29 -07:00
Borislav Petkov
62122fd7da x86, cpufeature: Use new CC_HAVE_ASM_GOTO
... for checking for "asm goto" compiler support. It is more explicit
this way and we cover the cases where distros have backported that
support even to gcc versions < 4.5.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1372437701-13351-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-28 15:26:48 -07:00
H. Peter Anvin
9f84b6267c Merge remote-tracking branch 'origin/x86/fpu' into queue/x86/cpu
Use the union of 3.10 x86/cpu and x86/fpu as baseline.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-28 15:26:17 -07:00
Wedson Almeida Filho
00e55a7907 x86: Use asm-goto to implement mutex fast path on x86-64
The new implementation allows the compiler to better optimize the code; the
original implementation is still used when the kernel is compiled with older
versions of gcc that don't support asm-goto.

Compiling with gcc 4.7.3, the original mutex_lock() is 60 bytes with the fast
path taking 16 instructions; the new mutex_lock() is 42 bytes, with the fast
path taking 12 instructions.

The original mutex_unlock() is 24 bytes with the fast path taking 7
instructions; the new mutex_unlock() is 25 bytes (because the compiler used
a 2-byte ret) with the fast path taking 4 instructions.

The two versions of the functions are included below for reference.

Old:
ffffffff817742a0 <mutex_lock>:
ffffffff817742a0:       55                      push   %rbp
ffffffff817742a1:       48 89 e5                mov    %rsp,%rbp
ffffffff817742a4:       48 83 ec 10             sub    $0x10,%rsp
ffffffff817742a8:       48 89 5d f0             mov    %rbx,-0x10(%rbp)
ffffffff817742ac:       48 89 fb                mov    %rdi,%rbx
ffffffff817742af:       4c 89 65 f8             mov    %r12,-0x8(%rbp)
ffffffff817742b3:       e8 28 15 00 00          callq  ffffffff817757e0 <_cond_resched>
ffffffff817742b8:       48 89 df                mov    %rbx,%rdi
ffffffff817742bb:       f0 ff 0f                lock decl (%rdi)
ffffffff817742be:       79 05                   jns    ffffffff817742c5 <mutex_lock+0x25>
ffffffff817742c0:       e8 cb 04 00 00          callq  ffffffff81774790 <__mutex_lock_slowpath>
ffffffff817742c5:       65 48 8b 04 25 c0 b7    mov    %gs:0xb7c0,%rax
ffffffff817742cc:       00 00
ffffffff817742ce:       4c 8b 65 f8             mov    -0x8(%rbp),%r12
ffffffff817742d2:       48 89 43 18             mov    %rax,0x18(%rbx)
ffffffff817742d6:       48 8b 5d f0             mov    -0x10(%rbp),%rbx
ffffffff817742da:       c9                      leaveq
ffffffff817742db:       c3                      retq

ffffffff81774250 <mutex_unlock>:
ffffffff81774250:       55                      push   %rbp
ffffffff81774251:       48 c7 47 18 00 00 00    movq   $0x0,0x18(%rdi)
ffffffff81774258:       00
ffffffff81774259:       48 89 e5                mov    %rsp,%rbp
ffffffff8177425c:       f0 ff 07                lock incl (%rdi)
ffffffff8177425f:       7f 05                   jg     ffffffff81774266 <mutex_unlock+0x16>
ffffffff81774261:       e8 ea 04 00 00          callq  ffffffff81774750 <__mutex_unlock_slowpath>
ffffffff81774266:       5d                      pop    %rbp
ffffffff81774267:       c3                      retq

New:
ffffffff81774920 <mutex_lock>:
ffffffff81774920:       55                      push   %rbp
ffffffff81774921:       48 89 e5                mov    %rsp,%rbp
ffffffff81774924:       53                      push   %rbx
ffffffff81774925:       48 89 fb                mov    %rdi,%rbx
ffffffff81774928:       e8 a3 0e 00 00          callq  ffffffff817757d0 <_cond_resched>
ffffffff8177492d:       f0 ff 0b                lock decl (%rbx)
ffffffff81774930:       79 08                   jns    ffffffff8177493a <mutex_lock+0x1a>
ffffffff81774932:       48 89 df                mov    %rbx,%rdi
ffffffff81774935:       e8 16 fe ff ff          callq  ffffffff81774750 <__mutex_lock_slowpath>
ffffffff8177493a:       65 48 8b 04 25 c0 b7    mov    %gs:0xb7c0,%rax
ffffffff81774941:       00 00
ffffffff81774943:       48 89 43 18             mov    %rax,0x18(%rbx)
ffffffff81774947:       5b                      pop    %rbx
ffffffff81774948:       5d                      pop    %rbp
ffffffff81774949:       c3                      retq

ffffffff81774730 <mutex_unlock>:
ffffffff81774730:       48 c7 47 18 00 00 00    movq   $0x0,0x18(%rdi)
ffffffff81774737:       00
ffffffff81774738:       f0 ff 07                lock incl (%rdi)
ffffffff8177473b:       7f 0a                   jg     ffffffff81774747 <mutex_unlock+0x17>
ffffffff8177473d:       55                      push   %rbp
ffffffff8177473e:       48 89 e5                mov    %rsp,%rbp
ffffffff81774741:       e8 aa ff ff ff          callq  ffffffff817746f0 <__mutex_unlock_slowpath>
ffffffff81774746:       5d                      pop    %rbp
ffffffff81774747:       f3 c3                   repz retq

Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Link: http://lkml.kernel.org/r/1372420245-60021-1-git-send-email-wedsonaf@gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-28 15:22:18 -07:00
David Vrabel
47433b8c9d x86: xen: Sync the CMOS RTC as well as the Xen wallclock
Adjustments to Xen's persistent clock via update_persistent_clock()
don't actually persist, as the Xen wallclock is a software only clock
and modifications to it do not modify the underlying CMOS RTC.

The x86_platform.set_wallclock hook is there to keep the hardware RTC
synchronized. On a guest this is pointless.

On Dom0 we can use the native implementaion which actually updates the
hardware RTC, but we still need to keep the software emulation of RTC
for the guests up to date. The subscription to the pvclock_notifier
allows us to emulate this easily. The notifier is called at every tick
and when the clock was set.

Right now we only use that notifier when the clock was set, but due to
the fact that it is called periodically from the timekeeping update
code, we can utilize it to emulate the NTP driven drift compensation
of update_persistant_clock() for the Xen wall (software) clock.

Add a 11 minutes periodic update to the pvclock_gtod notifier callback
to achieve that. The static variable 'next' which maintains that 11
minutes update cycle is protected by the core code serialization so
there is no need to add a Xen specific serialization mechanism.

[ tglx: Massaged changelog and added a few comments ]

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/1372329348-20841-6-git-send-email-david.vrabel@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-28 23:15:07 +02:00
David Vrabel
5584880e44 x86: xen: Sync the wallclock when the system time is set
Currently the Xen wallclock is only updated every 11 minutes if NTP is
synchronized to its clock source (using the sync_cmos_clock() work).
If a guest is started before NTP is synchronized it may see an
incorrect wallclock time.

Use the pvclock_gtod notifier chain to receive a notification when the
system time has changed and update the wallclock to match.

This chain is called on every timer tick and we want to avoid an extra
(expensive) hypercall on every tick.  Because dom0 has historically
never provided a very accurate wallclock and guests do not expect one,
we can do this simply: the wallclock is only updated if the clock was
set.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/1372329348-20841-5-git-send-email-david.vrabel@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-28 23:15:06 +02:00
Laszlo Ersek
0b0c002c34 xen/time: remove blocked time accounting from xen "clockchip"
... because the "clock_event_device framework" already accounts for idle
time through the "event_handler" function pointer in
xen_timer_interrupt().

The patch is intended as the completion of [1]. It should fix the double
idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to
stolen time accounting (the removed code seems to be isolated).

The approach may be completely misguided.

[1] https://lkml.org/lkml/2011/10/6/10
[2] http://lists.xensource.com/archives/html/xen-devel/2010-08/msg01068.html

John took the time to retest this patch on top of v3.10 and reported:
"idle time is correctly incremented for pv and hvm for the normal
case, nohz=off and nohz=idle." so lets put this patch in.

CC: stable@vger.kernel.org
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-28 12:11:39 -04:00
Rafael J. Wysocki
3b4550e0e0 Merge branch 'acpi-pm'
* acpi-pm:
  ACPI / PM: Rework and clean up acpi_dev_pm_get_state()
  ACPI / PM: Replace ACPI_STATE_D3 with ACPI_STATE_D3_COLD in device_pm.c
  ACPI / PM: Rename function acpi_device_power_state() and make it static
  ACPI / PM: acpi_processor_suspend() can be static
  xen / ACPI / sleep: Register an acpi_suspend_lowlevel callback.
  x86 / ACPI / sleep: Provide registration for acpi_suspend_lowlevel.
2013-06-28 12:58:30 +02:00
Borislav Petkov
4f4319a02a x86/ioremap: Correct function name output
Infact, let the compiler enter the function name so that there
are no discrepancies.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1372369996-20556-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28 11:11:58 +02:00
Ingo Molnar
fb476cffd5 Changes to simplify the SDM mean that we can also simplify
the code for SRAR (software recoverable action required) errors.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRzKq7AAoJEKurIx+X31iBviQP+gJ6b9BemXj8/TLjWN3P3mAB
 bQHU9imlSXmggfauC1C4/b60k4G/D8d1hN1Lboak7TbFvY69ONcSodY9w6O9Lknb
 r83xT0oUIe5IYWqC42+godt8P59ed10lUa8F5eaA9DhhzcsCWO2GG/ITqCmyjEom
 zJGKlLytq6ViJf+6ig3gTmWgHU4tvlJvOPh/O0WNWXH9kbOtnsLT8ImY9GMO/R1B
 LdItg2nxbDZmhjLOsKamaj2YpPbZ9Vtvam9CnQwirbUAkdGZfmRsq+K9y8O5MjHJ
 GRqOKwQ3BdKBUIEu36BxgIGHSg8hWYSefGD8nz8IP+QHwy5fx7SREdgx5M+AMSFt
 8DewSUij2wMwH/vuP4KrdYWva6LNMBtR+NItCkzpdY5ykcdoSzo06nCotGcUrxVN
 iEmYywZzO3CTYJj0lWqdrhhiFln/9ZrhXaWUbRoX5m3HAQK7XeTPT+3O7Vs5fWDN
 3Fb/0EzyJjmv+Mysl8/PvR9umJs3ELK5MiVLIAIkuozh8+L33byk/WX+VOfqTsY4
 KXYDm6fck3tRfn+PJGrAMSwIEyhgVJArRJC39PydEUIwjlel6DRU0Rs26wQXtFh8
 DHxTIiEubmvctvi3DnHOztOI3d3EkAayU6Dk4cNSy46FwujJC0v+07EQav6BLA3e
 x1HK+3PVVSQYuqFIeyWa
 =PJXS
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull MCE cleanup from Tony Luck:

 "Changes to simplify the SDM means that we can also simplify
  the code for SRAR (software recoverable action required) errors."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28 11:09:26 +02:00
Qiaowei Ren
13bfd47a0e x86/tboot: Provide debugfs interfaces to access TXT log
These logs come from tboot (Trusted Boot, an open source,
pre-kernel/VMM module that uses Intel TXT to perform a
measured and verified launch of an OS kernel/VMM.).

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Gang Wei <gang.wei@intel.com>
Link: http://lkml.kernel.org/r/1372053333-21788-1-git-send-email-qiaowei.ren@intel.com
[ Beautified the code a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28 11:05:16 +02:00
Chen Gong
33d7885b59 x86/mce: Update MCE severity condition check
Update some SRAR severity conditions check to make it clearer,
according to latest Intel SDM Vol 3(June 2013), table 15-20.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-27 13:45:52 -07:00
Gleb Natapov
24f7bb52e9 KVM: Fix RTC interrupt coalescing tracking
This reverts most of the f1ed0450a5. After
the commit kvm_apic_set_irq() no longer returns accurate information
about interrupt injection status if injection is done into disabled
APIC. RTC interrupt coalescing tracking relies on the information to be
accurate and cannot recover if it is not.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-27 14:20:53 +03:00
Yoshihiro YUNOMAE
489223edf2 kvm: Add a tracepoint write_tsc_offset
Add a tracepoint write_tsc_offset for tracing TSC offset change.
We want to merge ftrace's trace data of guest OSs and the host OS using
TSC for timestamp in chronological order. We need "TSC offset" values for
each guest when merge those because the TSC value on a guest is always the
host TSC plus guest's TSC offset. If we get the TSC offset values, we can
calculate the host TSC value for each guest events from the TSC offset and
the event TSC value. The host TSC values of the guest events are used when we
want to merge trace data of guests and the host in chronological order.
(Note: the trace_clock of both the host and the guest must be set x86-tsc in
this case)

This tracepoint also records vcpu_id which can be used to merge trace data for
SMP guests. A merge tool will read TSC offset for each vcpu, then the tool
converts guest TSC values to host TSC values for each vcpu.

TSC offset is stored in the VMCS by vmx_write_tsc_offset() or
vmx_adjust_tsc_offset(). KVM executes the former function when a guest boots.
The latter function is executed when kvm clock is updated. Only host can read
TSC offset value from VMCS, so a host needs to output TSC offset value
when TSC offset is changed.

Since the TSC offset is not often changed, it could be overwritten by other
frequent events while tracing. To avoid that, I recommend to use a special
instance for getting this event:

1. set a instance before booting a guest
 # cd /sys/kernel/debug/tracing/instances
 # mkdir tsc_offset
 # cd tsc_offset
 # echo x86-tsc > trace_clock
 # echo 1 > events/kvm/kvm_write_tsc_offset/enable

2. boot a guest

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-27 14:20:51 +03:00
Takuya Yoshikawa
7a2e8aaf0f KVM: MMU: Inform users of mmio generation wraparound
Without this information, users will just see unexpected performance
problems and there is little chance we will get good reports from them:
note that mmio generation is increased even when we just start, or stop,
dirty logging for some memory slot, in which case users cannot expect
all shadow pages to be zapped.

printk_ratelimited() is used for this taking into account the problems
that we can see the information many times when we start multiple VMs
and guests can trigger this by reading ROM in a loop for example.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27 14:20:49 +03:00
Xiao Guangrong
f6f8adeef5 KVM: MMU: document fast invalidate all pages
Document it to Documentation/virtual/kvm/mmu.txt

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27 14:20:47 +03:00
Xiao Guangrong
0cbf8e437b KVM: MMU: document write_flooding_count
Document write_flooding_count to Documentation/virtual/kvm/mmu.txt

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27 14:20:43 +03:00
Xiao Guangrong
accaefe07d KVM: MMU: document clear_spte_count
Document it to Documentation/virtual/kvm/mmu.txt

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27 14:20:42 +03:00
Xiao Guangrong
a8eca9dcc6 KVM: MMU: drop kvm_mmu_zap_mmio_sptes
Drop kvm_mmu_zap_mmio_sptes and use kvm_mmu_invalidate_zap_all_pages
instead to handle mmio generation number overflow

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27 14:20:40 +03:00
Xiao Guangrong
69c9ea93ea KVM: MMU: init kvm generation close to mmio wrap-around value
Then it has the chance to trigger mmio generation number wrap-around

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
[Change from MMIO_MAX_GEN - 13 to MMIO_MAX_GEN - 150, because 13 is
 very close to the number of calls to KVM_SET_USER_MEMORY_REGION
 before the guest is started and there is any chance to create any
 spte. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27 14:20:39 +03:00
Xiao Guangrong
089504c0d4 KVM: MMU: add tracepoint for check_mmio_spte
It is useful for debug mmio spte invalidation

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27 14:20:37 +03:00
Xiao Guangrong
f8f559422b KVM: MMU: fast invalidate all mmio sptes
This patch tries to introduce a very simple and scale way to invalidate
all mmio sptes - it need not walk any shadow pages and hold mmu-lock

KVM maintains a global mmio valid generation-number which is stored in
kvm->memslots.generation and every mmio spte stores the current global
generation-number into his available bits when it is created

When KVM need zap all mmio sptes, it just simply increase the global
generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
then it walks the shadow page table and get the mmio spte. If the
generation-number on the spte does not equal the global generation-number,
it will go to the normal #PF handler to update the mmio spte

Since 19 bits are used to store generation-number on mmio spte, we zap all
mmio sptes when the number is round

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27 14:20:36 +03:00
Xiao Guangrong
b37fbea6ce KVM: MMU: make return value of mmio page fault handler more readable
Define some meaningful names instead of raw code

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27 14:20:17 +03:00
Xiao Guangrong
f2fd125d32 KVM: MMU: store generation-number into mmio spte
Store the generation-number into bit3 ~ bit11 and bit52 ~ bit61, totally
19 bits can be used, it should be enough for nearly all most common cases

In this patch, the generation-number is always 0, it will be changed in
the later patch

[Gleb: masking generation bits from spte in get_mmio_spte_gfn() and
       get_mmio_spte_access()]

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-27 14:18:15 +03:00
Dave Airlie
dc0216445c Merge branch 'core/mutexes' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into drm-next
Merge in the tip core/mutexes branch for future GPU driver use.

Ingo will send this branch to Linus prior to drm-next.

* 'core/mutexes' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  locking-selftests: Handle unexpected failures more strictly
  mutex: Add more w/w tests to test EDEADLK path handling
  mutex: Add more tests to lib/locking-selftest.c
  mutex: Add w/w tests to lib/locking-selftest.c
  mutex: Add w/w mutex slowpath debugging
  mutex: Add support for wound/wait style locks
  arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not
  powerpc/pci: Fix boot panic on mpc83xx (regression)
  s390/ipl: Fix FCP WWPN and LUN format strings for read
  fs: fix new splice.c kernel-doc warning
  spi/pxa2xx: fix memory corruption due to wrong size used in devm_kzalloc()
  s390/mem_detect: fix memory hole handling
  s390/dma: support debug_dma_mapping_error
  s390/dma: fix mapping_error detection
  s390/irq: Only define synchronize_irq() on SMP
  Input: xpad - fix for "Mad Catz Street Fighter IV FightPad" controllers
  Input: wacom - add a new stylus (0x100802) for Intuos5 and Cintiqs
  spi/pxa2xx: use GFP_ATOMIC in sg table allocation
  fuse: hold i_mutex in fuse_file_fallocate()
  Input: add missing dependencies on CONFIG_HAS_IOMEM
  ...
2013-06-27 20:42:09 +10:00
Dave Airlie
4300a0f8bd Linux 3.10-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEbBAABAgAGBQJRxf9cAAoJEHm+PkMAQRiGMWkH911xM4gRmFgE7SqVW4F4AWBm
 ngcqMqNy9IdqKfibORUUDvVfEa5gjD5ai2quIKpfQiaukbpQJ696H90ijuAkajLn
 DQBrN243s0pzhhc/quWINnWxsFQ613JjdUMUMaD7e9A1aKjYzWrPGt/tSjrFXGCP
 tArTupVzc/iOmnEQDKiROI/Nokq44QJ36aTGPM7n08xMtpKmkCXM+9/UosBteB0O
 HVI33dmjwz7i55fI53XAWyuZCE+gSEnA4z8spJ9LfXso2W14V+roc+GuL6OyeeTI
 pCn/+4niVPb4B0ROZlpyVmdZjbPPcMMEK5o+BSJI68SH6LHZTQh2iVuqYfpSyA==
 =uUH5
 -----END PGP SIGNATURE-----

Merge tag 'v3.10-rc7' into drm-next

Linux 3.10-rc7

The sdvo lvds fix in this -fixes pull

commit c3456fb3e4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Jun 10 09:47:58 2013 +0200

    drm/i915: prefer VBT modes for SVDO-LVDS over EDID

has a silent functional conflict with

commit 990256aec2
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri May 31 12:17:07 2013 +0000

    drm: Add probed modes in probe order

in drm-next. W simply need to add the vbt modes before edid modes, i.e. the
other way round than now.

Conflicts:
	drivers/gpu/drm/drm_prime.c
	drivers/gpu/drm/i915/intel_sdvo.c
2013-06-27 20:40:44 +10:00
Jacob Shin
9608d33b82 x86, microcode, amd: Another early loading fixup
commit cd1c32ca96 is an early premature
rendition of the patch. Augment it with this delta patch to:
  * correctly mark offset and size of the matching bin file
  * use __pa instead of __pa_nodebug during AP load
  * check for !initrd_start before using it

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20130620152414.GA6676@jshin-Toonie
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-26 14:55:37 -07:00
Stephane Eranian
983433b581 perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
Make sure intel_pmu_pebs_disable() and intel_pmu_pebs_enable()
are symmetrical w.r.t. PEBS-LL and precise store.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371824448-7306-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 21:58:51 +02:00
Stephane Eranian
2f7f73a520 perf/x86: Fix shared register mutual exclusion enforcement
This patch fixes a problem with the shared registers mutual
exclusion code and incremental event scheduling by the
generic perf_event code.

There was a bug whereby the mutual exclusion on the shared
registers was not enforced because of incremental scheduling
abort due to event constraints. As an example on Intel
Nehalem, consider the following events:

group1= L1D_CACHE_LD:E_STATE,OFFCORE_RESPONSE_0:PF_RFO,L1D_CACHE_LD:I_STATE
group2= L1D_CACHE_LD:I_STATE

The L1D_CACHE_LD event can only be measured by 2 counters. Yet, there
are 3 instances here. The first group can be scheduled and is committed.
Then, the generic code tries to schedule group2 and this fails (because
there is no more counter to support the 3rd instance of L1D_CACHE_LD).
But in x86_schedule_events() error path, put_event_contraints() is invoked
on ALL the events and not just the ones that just failed. That causes the
"lock" on the shared offcore_response MSR to be released. Yet the first group
is actually scheduled and is exposed to reprogramming of that shared msr by
the sibling HT thread. In other words, there is no guarantee on what is
measured.

This patch fixes the problem by tagging committed events with the
PERF_X86_EVENT_COMMITTED tag. In the error path of x86_schedule_events(),
only the events NOT tagged have their constraint released. The tag
is eventually removed when the event in descheduled.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130620164254.GA3556@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 21:58:49 +02:00
Linus Torvalds
54faf77d06 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Three small fixlets"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()
  hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)
  kprobes: Fix arch_prepare_kprobe to handle copy insn failures
2013-06-26 08:51:44 -10:00
Ben Hutchings
cb7b80237a x86/platform: Make X86_GOLDFISH depend on X86_EXTENDED_PLATFORM
All non-PC platforms are supposed to be dependent on this
option.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Jun Nakajima <jnakajim@gmail.com>
Link: http://lkml.kernel.org/n/tip-Bcihhqhstm67fchjnkxoiJbu@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 13:28:54 +02:00
Maarten Lankhorst
a41b56efa7 arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not
This will allow me to call functions that have multiple
arguments if fastpath fails. This is required to support ticket
mutexes, because they need to be able to pass an extra argument
to the fail function.

Originally I duplicated the functions, by adding
__mutex_fastpath_lock_retval_arg. This ended up being just a
duplication of the existing function, so a way to test if
fastpath was called ended up being better.

This also cleaned up the reservation mutex patch some by being
able to call an atomic_set instead of atomic_xchg, and making it
easier to detect if the wrong unlock function was previously
used.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: robclark@gmail.com
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113105.4001.83929.stgit@patser
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 12:10:55 +02:00
Andi Kleen
069e0c3c40 perf/x86/intel: Support full width counting
Recent Intel CPUs like Haswell and IvyBridge have a new
alternative MSR range for perfctrs that allows writing the full
counter width. Enable this range if the hardware reports it
using a new capability bit.

Currently the perf code queries CPUID to get the counter width,
and sign extends the counter values as needed. The traditional
PERFCTR MSRs always limit to 32bit, even though the counter
internally is larger (usually 48 bits on recent CPUs)

When the new capability is set use the alternative range which
do not have these restrictions.

This lowers the overhead of perf stat slightly because it has to
do less interrupts to accumulate the counter value. On Haswell
it also avoids some problems with TSX aborting when the end of
the counter range is reached.

( See the patch "perf/x86/intel: Avoid checkpointed counters
  causing excessive TSX aborts" for more details. )

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1372173153-20215-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 11:59:25 +02:00
Ingo Molnar
ca02c21674 Better comments so we understand our existing machine check
bank bitmaps - prelude to adding another bitmap soon.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRygQkAAoJEKurIx+X31iB5RsP/R6w/X4wbKlXD8K8e2qhSXU2
 oYVtaMN46M+xLRdNDdE1Y+clVKDQTuapfrlOtBLZVHIkNfqF5jAtu2O1wTERpx+F
 7lDHuWuFbddbqV2vqE6RJ6ZqsUTsrlzrqLSLWGSoHn36DlnkSm+L5vDDG75QzV+w
 fPpGpJTM3kRxPUHnmwvfniJxTiBjsli6FDZ1vGq4Ingu+xxOJDU6+FdWts08hn4Y
 +KLjs1JMJSdo3di05T6t4ARKBgW3B1lJnrIS6fGYMmax+kHQqO/zvzHs3A86LZyN
 7L0toEoZHa8VRMN6C1mw0XsFwPmyMOsrHrRpBlFgHpC7QLAzx6LkxchyDBqPL0mo
 W4bnoyu1QZUvblq1mrsnJa9yeyMjSHAeq2XTBj8Pbv20AKzmCbNl3aJ5tCDhPj4Y
 A+e8vk/tq8zWCjdf3SV/D+wjgEtkeWALZZj70maufu9AKtKXy5repa6DZCKEhyIC
 yh72c3gZ25IkxjwcHmJh+CYaKll9MgdW5toZkftBin2iOdWSaHS9QX2r4I5Awvbi
 m0Iwq5Fg8DSs3AhBLxiJi8L7N+RNa+7GoTH0z6UDtfAY3ExvepfwjPzLxAfTbPw6
 oK2wLfuPiNizl2DX1yNlizGD2YSRiWKoap3Iog84A3IMTIe1nJ+97GHvsVcGna8I
 zXjnZDNQThAgBswtNY1s
 =U/Kg
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce-bitmap-comment' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull MCE updates from Tony Luck:

 "Better comments so we understand our existing machine check
  bank bitmaps - prelude to adding another bitmap soon."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 10:53:45 +02:00
H. Peter Anvin
a3d7b7dddc x86, asm, cleanup: Replace open-coded control register values with symbolic
Clean up an unnecessary open-coded control register values.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-um7za1nzf6brb17o0h4om6e3@git.kernel.org
2013-06-25 16:26:06 -07:00
H. Peter Anvin
d1fbefcb3a x86, processor-flags: Fix the datatypes and add bit number defines
The control registers are unsigned long (32 bits on i386, 64 bits on
x86-64), and so make that manifest in the data type for the various
constants.  Add defines with a _BIT suffix which defines the bit
number, as opposed to the bit mask.

This should resolve some issues with ~bitmask that Linus discovered.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-cwckhbrib2aux1qbteaebij0@git.kernel.org
2013-06-25 16:26:06 -07:00
H. Peter Anvin
afcbf13fa6 x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE
Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE to match the SDM.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Link: http://lkml.kernel.org/n/tip-buq1evi5dpykxx7ak6amaam0@git.kernel.org
2013-06-25 16:26:06 -07:00
H. Peter Anvin
1adfa76a95 x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
Bit 1 in the x86 EFLAGS is always set.  Name the macro something that
actually tries to explain what it is all about, rather than being a
tautology.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/n/tip-f10rx5vjjm6tfnt8o1wseb3v@git.kernel.org
2013-06-25 16:25:32 -07:00
Naveen N. Rao
0644414e62 mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem
There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned'
per-cpu bitmaps.  Provide comments so that we all know exactly what these
are used for, and why.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-25 13:53:27 -07:00
Yinghai Lu
d5c78673b1 x86: Fix /proc/mtrr with base/size more than 44bits
On one sytem that mtrr range is more then 44bits, in dmesg we have
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-DFFFF write-through
[    0.000000]   E0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 [000080000000-0000FFFFFFFF] mask 3FFF80000000 uncachable
[    0.000000]   1 [380000000000-38FFFFFFFFFF] mask 3F0000000000 uncachable
[    0.000000]   2 [000099000000-000099FFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   3 [00009A000000-00009AFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   4 [381FFA000000-381FFBFFFFFF] mask 3FFFFE000000 write-through
[    0.000000]   5 [381FFC000000-381FFC0FFFFF] mask 3FFFFFF00000 write-through
[    0.000000]   6 [0000AD000000-0000ADFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   7 [0000BD000000-0000BDFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   8 disabled
[    0.000000]   9 disabled

but /proc/mtrr report wrong:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x80000000000 (8388608MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x81ffa000000 (8519584MB), size=   32MB, count=1: write-through
reg05: base=0x81ffc000000 (8519616MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

so bit 44 and bit 45 get cut off.

We have problems in arch/x86/kernel/cpu/mtrr/generic.c::generic_get_mtrr().
1. for base, we miss cast base_lo to 64bit before shifting.
Fix that by adding u64 casting.

2. for size, it only can handle 44 bits aka 32bits + page_shift
Fix that with 64bit mask instead of 32bit mask_lo, then range could be
more than 44bits.
At the same time, we need to update size_or_mask for old cpus that does
support cpuid 0x80000008 to get phys_addr. Need to set high 32bits
to all 1s, otherwise will not get correct size for them.

Also fix mtrr_add_page: it should check base and (base + size - 1)
instead of base and size, as base and size could be small but
base + size could bigger enough to be out of boundary. We can
use boot_cpu_data.x86_phys_bits directly to avoid size_or_mask.

So When are we going to have size more than 44bits? that is 16TiB.

after patch we have right ouput:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x381ffa000000 (58851232MB), size=   32MB, count=1: write-through
reg05: base=0x381ffc000000 (58851264MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

-v2: simply checking in mtrr_add_page according to hpa.

[ hpa: This probably wants to go into -stable only after having sat in
  mainline for a bit.  It is not a regression. ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371162815-29931-1-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-25 13:08:10 -07:00
Greg Kroah-Hartman
b5aef682e0 Merge 3.10-rc7 into driver-core-next
We want the firmware merge fixes, and other bits, in here now.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-24 15:14:43 -07:00
H. Peter Anvin
5236eb968e Merge remote-tracking branch 'trace/tip/x86/trace' into x86/trace
Fix from Steven Rostedt.
2013-06-24 11:01:09 -07:00
Grant Likely
ddaf144c61 irqdomain: Refactor irq_domain_associate_many()
Originally, irq_domain_associate_many() was designed to unwind the
mapped irqs on a failure of any individual association. However, that
proved to be a problem with certain IRQ controllers. Some of them only
support a subset of irqs, and will fail when attempting to map a
reserved IRQ. In those cases we want to map as many IRQs as possible, so
instead it is better for irq_domain_associate_many() to make a
best-effort attempt to map irqs, but not fail if any or all of them
don't succeed. If a caller really cares about how many irqs got
associated, then it should instead go back and check that all of the
irqs is cares about were mapped.

The original design open-coded the individual association code into the
body of irq_domain_associate_many(), but with no longer needing to
unwind associations, the code becomes simpler to split out
irq_domain_associate() to contain the bulk of the logic, and
irq_domain_associate_many() to be a simple loop wrapper.

This patch also adds a new error check to the associate path to make
sure it isn't called for an irq larger than the controller can handle,
and adds locking so that the irq_domain_mutex is held while setting up a
new association.

v3: Fixup missing change to irq_domain_add_tree()
v2: Fixup x86 warning. irq_domain_associate_many() no longer returns an
    error code, but reports errors to the printk log directly. In the
    majority of cases we don't actually want to fail if there is a
    problem, but rather log it and still try to boot the system.

Signed-off-by: Grant Likely <grant.likely@linaro.org>

irqdomain: Fix flubbed irq_domain_associate_many refactoring

commit d39046ec72, "irqdomain: Refactor irq_domain_associate_many()" was
missing the following hunk which causes a boot failure on anything using
irq_domain_add_tree() to allocate an irq domain.

Signed-off-by: Grant Likely <grant.likely@linaro.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Cc: Thomas Gleixner <tglx@linutronix.de>,
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2013-06-24 14:01:42 +01:00
Borislav Petkov
fc58be7596 x86/platform: Add kvmconfig to the phony targets
... so as not to disable it with a file of the same name in the
toplevel build directory.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1371801891-23618-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 12:17:35 +02:00
Dave Hansen
0c4df02d73 x86: Add NMI duration tracepoints
This patch has been invaluable in my adventures finding
issues in the perf NMI handler.  I'm as big a fan of
printk() as anybody is, but using printk() in NMIs is
deadly when they're happening frequently.

Even hacking in trace_printk() ended up eating enough
CPU to throw off some of the measurements I was making.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:58 +02:00
Dave Hansen
14c63f17b1 perf: Drop sample rate when sampling is too slow
This patch keeps track of how long perf's NMI handler is taking,
and also calculates how many samples perf can take a second.  If
the sample length times the expected max number of samples
exceeds a configurable threshold, it drops the sample rate.

This way, we don't have a runaway sampling process eating up the
CPU.

This patch can tend to drop the sample rate down to level where
perf doesn't work very well.  *BUT* the alternative is that my
system hangs because it spends all of its time handling NMIs.

I'll take a busted performance tool over an entire system that's
busted and undebuggable any day.

BTW, my suspicion is that there's still an underlying bug here.
Using the HPET instead of the TSC is definitely a contributing
factor, but I suspect there are some other things going on.
But, I can't go dig down on a bug like that with my machine
hanging all the time.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
[ Prettified it a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:57 +02:00
Dave Hansen
2ab00456ea x86: Warn when NMI handlers take large amounts of time
I have a system which is causing all kinds of problems.  It has
8 NUMA nodes, and lots of cores that can fight over cachelines.
If things are not working _perfectly_, then NMIs can take longer
than expected.

If we get too many of them backed up to each other, we can
easily end up in a situation where we are doing nothing *but*
running NMIs.  The biggest problem, though, is that this happens
_silently_.  You might be lucky to get an hrtimer warning, but
most of the time system simply hangs.

This patch should at least give us some warning before we fall
off the cliff.  the warnings look like this:

	nmi_handle: perf_event_nmi_handler() took: 26095071 ns

The message is triggered whenever we notice the longest NMI
we've seen to date.  You can always view and reset this value
via the debugfs interface if you like.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:56 +02:00
Seiji Aguchi
33e5ff634f x86/tracing: Add config option checking to the definitions of mce handlers
In case CONFIG_X86_MCE_THRESHOLD and CONFIG_X86_THERMAL_VECTOR
are disabled, kernel build fails as follows.

   arch/x86/built-in.o: In function `trace_threshold_interrupt':
   (.entry.text+0x122b): undefined reference to `smp_trace_threshold_interrupt'
   arch/x86/built-in.o: In function `trace_thermal_interrupt':
   (.entry.text+0x132b): undefined reference to `smp_trace_thermal_interrupt'

In this case, trace_threshold_interrupt/trace_thermal_interrupt
are not needed to define.

So, add config option checking to their definitions in entry_64.S.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/51C58B8A.2080808@hds.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:41:36 +02:00
Linus Torvalds
b8ff768b5a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Several fixes for bugs caught while looking through f_pos (ab)users"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  aout32 coredump compat fix
  splice: don't pass the address of ->f_pos to methods
  mconsole: we'd better initialize pos before passing it to vfs_read()...
2013-06-22 08:42:20 -10:00
Steven Rostedt (Red Hat)
2b4bc78956 trace,x86: Do not call local_irq_save() in load_current_idt()
As load_current_idt() is now what is used to update the IDT for the
switches needed for NMI, lockdep debug, and for tracing, it must not
call local_irq_save(). This is because one of the users of this is
lockdep, which does tracing of local_irq_save() and when the debug
trap is hit, we need to update the IDT before tracing interrupts
being disabled. As load_current_idt() is used to do this, calling
local_irq_save() which lockdep traces, defeats the point of calling
load_current_idt().

As interrupts are already disabled when used by lockdep and NMI, the
only other user is tracing that can disable interrupts itself. Simply
have the tracing update disable interrupts before calling load_current_idt()
instead of breaking the other users.

Here's the dump that happened:

------------[ cut here ]------------
WARNING: at /work/autotest/nobackup/linux-test.git/kernel/fork.c:1196 copy_process+0x2c3/0x1398()
DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled)
Modules linked in:
CPU: 1 PID: 4570 Comm: gdm-simple-gree Not tainted 3.10.0-rc3-test+ #5
Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 ffffffff81d2a7a5 ffff88006ed13d50 ffffffff8192822b ffff88006ed13d90
 ffffffff81035f25 ffff8800721c6000 ffff88006ed13da0 0000000001200011
 0000000000000000 ffff88006ed5e000 ffff8800721c6000 ffff88006ed13df0
Call Trace:
 [<ffffffff8192822b>] dump_stack+0x19/0x1b
 [<ffffffff81035f25>] warn_slowpath_common+0x67/0x80
 [<ffffffff81035fe1>] warn_slowpath_fmt+0x46/0x48
 [<ffffffff812bfc5d>] ? __raw_spin_lock_init+0x31/0x52
 [<ffffffff810341f7>] copy_process+0x2c3/0x1398
 [<ffffffff8103539d>] do_fork+0xa8/0x260
 [<ffffffff810ca7b1>] ? trace_preempt_on+0x2a/0x2f
 [<ffffffff812afb3e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56
 [<ffffffff81937fe7>] ? sysret_check+0x1b/0x56
 [<ffffffff810355cf>] SyS_clone+0x16/0x18
 [<ffffffff81938369>] stub_clone+0x69/0x90
 [<ffffffff81937fc2>] ? system_call_fastpath+0x16/0x1b
---[ end trace 8b157a9d20ca1aa2 ]---

in fork.c:

 #ifdef CONFIG_PROVE_LOCKING
	DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled); <-- bug here
	DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
 #endif

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-22 13:16:19 -04:00
Al Viro
945fb136df aout32 coredump compat fix
dump_seek() does SEEK_CUR, not SEEK_SET; native binfmt_aout
handles it correctly (seeks by PAGE_SIZE - sizeof(struct user),
getting the current position to PAGE_SIZE), compat one seeks
by PAGE_SIZE and ends up at PAGE_SIZE + already written...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-22 11:01:38 +04:00
Linus Torvalds
f71194a7d4 Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This series fixes a couple of build failures, and fixes MTRR cleanup
  and memory setup on very specific memory maps.

  Finally, it fixes triggering backtraces on all CPUs, which was
  inadvertently disabled on x86."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Fix dummy variable buffer allocation
  x86: Fix trigger_all_cpu_backtrace() implementation
  x86: Fix section mismatch on load_ucode_ap
  x86: fix build error and kconfig for ia32_emulation and binfmt
  range: Do not add new blank slot with add_range_with_merge
  x86, mtrr: Fix original mtrr range get for mtrr_cleanup
2013-06-21 06:33:48 -10:00
Linus Torvalds
9d0be540d7 KVM fixes for 3.10-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRwc4AAAoJEBvWZb6bTYbyT/wQAIAvvjM+jlqOsZCUjnutdWXf
 VGvJB1jXPEa6cuUkaE3RYO8rLLJz/PC4Geg8I/awtjeSkPvldeu0FihSkdbrNkpJ
 3dSAJ0cR8+ScUZB74Lt1pfwwEAdZk61MU1opD1qy3wWTy5Rk2xp4qzE2aCVJoLkh
 K8gscLrCHR47DELcm+AWCtHt/0VOZSAVe9z/7Qf9eAMCNmH/efHgrZZTxF/1aBkI
 7XaZdqT+d84D/Bc2isdRNvFYt9rkuII2GCeRciw2t3QsgVoBXn7FVK2OXbufIaYW
 /XX0YiNpSnUpcpOtGw2MwzpKgDSRE9QbnOUVsWThTDeG7Q5d72ioCFuRDba8p3Lc
 u146nQM495bAdJQz6IW3JCeiZlQ86/QjFquk9hJuCwQpuHmlRpVEqFbLVnZEnYvk
 lc9dnK7BcaftbMgo/j6DMv6Bpq/EDp1JrPXzNT4BwG7ptArEXzmdoktwpONjrZzI
 1oZV5xCcNhTjyqlndiZKxneKXoqz/3NOdR+EV0yh3+0l4PK6C52jd6PaKRv7qN3l
 Lt35uFrmKqzOR12KVStYREyh1Sy43ADE4tIIPnOBbcXz23Df4/vVSb1Bqv0nPq4i
 JhsRsGoR8ZdDBd7c40XlvpxOiMmubFLSklX0WaKn2yIZH9FMj63Ak0wet7qhaqb8
 lbJRUH8gnK0RCVEoUFpB
 =yGPS
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Three one-line fixes for my first pull request; one for x86 host, one
  for x86 guest, one for PPC"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86: kvmclock: zero initialize pvclock shared memory area
  kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
  KVM: x86: remove vcpu's CPL check in host-invoked XCR set
2013-06-21 06:29:22 -10:00
Linus Torvalds
92616ee654 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "This fixes an unaligned crash in XTS mode when using aseni_intel"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: aesni_intel - fix accessing of unaligned memory
2013-06-21 06:28:39 -10:00
Steven Rostedt (Red Hat)
83ab85140b trace,x86: Move creation of irq tracepoints from apic.c to irq.c
Compiling without CONFIG_X86_LOCAL_APIC set, apic.c will not be
compiled, and the irq tracepoints will not be created via the
CREATE_TRACE_POINTS macro. When CONFIG_X86_LOCAL_APIC is not set,
we get the following build error:

  LD      init/built-in.o
arch/x86/built-in.o: In function `trace_x86_platform_ipi_entry':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:66: undefined reference to `__tracepoint_x86_platform_ipi_entry'
arch/x86/built-in.o: In function `trace_x86_platform_ipi_exit':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:66: undefined reference to `__tracepoint_x86_platform_ipi_exit'
arch/x86/built-in.o: In function `trace_irq_work_entry':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:72: undefined reference to `__tracepoint_irq_work_entry'
arch/x86/built-in.o: In function `trace_irq_work_exit':
linux-test.git/arch/x86/include/asm/trace/irq_vectors.h:72: undefined reference to `__tracepoint_irq_work_exit'
arch/x86/built-in.o:(__jump_table+0x8): undefined reference to `__tracepoint_x86_platform_ipi_entry'
arch/x86/built-in.o:(__jump_table+0x14): undefined reference to `__tracepoint_x86_platform_ipi_exit'
arch/x86/built-in.o:(__jump_table+0x20): undefined reference to `__tracepoint_irq_work_entry'
arch/x86/built-in.o:(__jump_table+0x2c): undefined reference to `__tracepoint_irq_work_exit'
make[1]: *** [vmlinux] Error 1
make: *** [sub-make] Error 2

As irq.c is always compiled for x86, it is a more appropriate location
to create the irq tracepoints.

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-21 10:33:28 -04:00
H. Peter Anvin
df91c3513f * Don't leak random kernel memory to EFI variable NVRAM when attempting
to initiate garbage collection. Also, free the kernel memory when
    we're done with it instead of leaking - Ben Hutchings
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRxCKWAAoJEC84WcCNIz1VIDcP/icYJSdXw/fKUgCG9+zAk3FE
 xivUa6aJKG+GBL4Cg2VYTJ/sBDA1v0xdFcf73amCdTcWuor6ha5LxrYFwFhhKtYk
 3O05spiCZ+F6nveZTrrxqKce3cN5XOYwbF8bZOAuJYmjaTvZQ0Cb89zf/asC4+Fx
 ui6HCRmCy6nUy5Fn4A7fBHlTyuOE4xZXvZjOUXWGiUwmZK/SbHYtN1+4nXBTnAWF
 yDcB8fkl4v7cYLIl9U8dc7Dg8ita4bj6F8RRNJf2WXipsOpMoMRtiua8VLCZ0ECc
 9cpONx7wMQS5nDEfNefWm4vdqS8PEG1CkZ8tN37nIU/AlTi6mTyTqZv0ICQfgVna
 6nChmkz9uxvzTzlMlOR5eGBu68xQLiyHhauTJBpMkwOp4q4gg/Ui+bhV7Vo9ySF3
 DlrSekT0Uo1PLgdAUYg7+hZhN98HGHoGSEx7FsymOjqlY3nF+4k7ZeTerEI4vI2D
 zqyj5QVWqS/M9zj1ojUu3nFDNhfQHO4PvVt9O8ca4/IJS1pbb/J8qtwIOFTNdJHf
 2Iyk6GnT9UaKvOs4trWXmPjiZkBcS8JdQH80fDSIgIxLhwOQksT+q90TY5h3mAED
 Hwm9JywXlxpFncPixAkpoorqDWSZmBTmUw9JR0FQNTFGuENPA0SGn51P2gGbTA+e
 82ECeWRXMMoYA/qNxMI2
 =kfTf
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * Don't leak random kernel memory to EFI variable NVRAM when attempting
   to initiate garbage collection. Also, free the kernel memory when
   we're done with it instead of leaking - Ben Hutchings

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-21 03:01:21 -07:00
Ben Hutchings
b8cb62f821 x86/efi: Fix dummy variable buffer allocation
1. Check for allocation failure
2. Clear the buffer contents, as they may actually be written to flash
3. Don't leak the buffer

Compile-tested only.

[ Tested successfully on my buggy ASUS machine - Matt ]

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-21 10:52:49 +01:00
Herbert Xu
02c0241b60 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto
Merge crypto to resolve conflict in crypto/Kconfig.
2013-06-21 15:13:27 +08:00
Jussi Kivilinna
99f42f937a Revert "crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher"
This reverts commit cf1521a1a5.

Instruction (vpgatherdd) that this implementation relied on turned out to be
slow performer on real hardware (i5-4570). The previous 8-way twofish/AVX
implementation is therefore faster and this implementation should be removed.

Converting this implementation to use the same method as in twofish/AVX for
table look-ups would give additional ~3% speed up vs twofish/AVX, but would
hardly be worth of the added code and binary size.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-06-21 14:44:29 +08:00
Jussi Kivilinna
3d387ef08c Revert "crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher"
This reverts commit 6048801070.

Instruction (vpgatherdd) that this implementation relied on turned out to be
slow performer on real hardware (i5-4570). The previous 4-way blowfish
implementation is therefore faster and this implementation should be removed.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-06-21 14:44:28 +08:00
Jussi Kivilinna
acfffdb803 crypto: camellia-aesni-avx2 - tune assembly code for more performance
Add implementation tuned for more performance on real hardware. Changes are
mostly around the part mixing 128-bit extract and insert instructions and
AES-NI instructions. Also 'vpbroadcastb' instructions have been change to
'vpshufb with zero mask'.

Tests on Intel Core i5-4570:

tcrypt ECB results, old-AVX2 vs new-AVX2:

size    128bit key      256bit key
        enc     dec     enc     dec
256     1.00x   1.00x   1.00x   1.00x
1k      1.08x   1.09x   1.05x   1.06x
8k      1.06x   1.06x   1.06x   1.06x

tcrypt ECB results, AVX vs new-AVX2:

size    128bit key      256bit key
        enc     dec     enc     dec
256     1.00x   1.00x   1.00x   1.00x
1k      1.51x   1.50x   1.52x   1.50x
8k      1.47x   1.48x   1.48x   1.48x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-06-21 14:44:23 +08:00
Seiji Aguchi
cf910e83ae x86, trace: Add irq vector tracepoints
[Purpose of this patch]

As Vaibhav explained in the thread below, tracepoints for irq vectors
are useful.

http://www.spinics.net/lists/mm-commits/msg85707.html

<snip>
The current interrupt traces from irq_handler_entry and irq_handler_exit
provide when an interrupt is handled.  They provide good data about when
the system has switched to kernel space and how it affects the currently
running processes.

There are some IRQ vectors which trigger the system into kernel space,
which are not handled in generic IRQ handlers.  Tracing such events gives
us the information about IRQ interaction with other system events.

The trace also tells where the system is spending its time.  We want to
know which cores are handling interrupts and how they are affecting other
processes in the system.  Also, the trace provides information about when
the cores are idle and which interrupts are changing that state.
<snip>

On the other hand, my usecase is tracing just local timer event and
getting a value of instruction pointer.

I suggested to add an argument local timer event to get instruction pointer before.
But there is another way to get it with external module like systemtap.
So, I don't need to add any argument to irq vector tracepoints now.

[Patch Description]

Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
But there is an above use case to trace specific irq_vector rather than tracing all events.
In this case, we are concerned about overhead due to unwanted events.

So, add following tracepoints instead of introducing irq_vector_entry/exit.
so that we can enable them independently.
   - local_timer_vector
   - reschedule_vector
   - call_function_vector
   - call_function_single_vector
   - irq_work_entry_vector
   - error_apic_vector
   - thermal_apic_vector
   - threshold_apic_vector
   - spurious_apic_vector
   - x86_platform_ipi_vector

Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty
makes a zero when tracepoints are disabled. Detailed explanations are as follows.
 - Create trace irq handlers with entering_irq()/exiting_irq().
 - Create a new IDT, trace_idt_table, at boot time by adding a logic to
   _set_gate(). It is just a copy of original idt table.
 - Register the new handlers for tracpoints to the new IDT by introducing
   macros to alloc_intr_gate() called at registering time of irq_vector handlers.
 - Add checking, whether irq vector tracing is on/off, into load_current_idt().
   This has to be done below debug checking for these reasons.
   - Switching to debug IDT may be kicked while tracing is enabled.
   - On the other hands, switching to trace IDT is kicked only when debugging
     is disabled.

In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being
used for other purposes.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:34 -07:00
Seiji Aguchi
629f4f9d59 x86: Rename variables for debugging
Rename variables for debugging to describe meaning of them precisely.

Also, introduce a generic way to switch IDT by checking a current state,
debug on/off.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C323A8.7050905@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:13 -07:00
Seiji Aguchi
eddc0e922a x86, trace: Introduce entering/exiting_irq()
When implementing tracepoints in interrupt handers, if the tracepoints are
simply added in the performance sensitive path of interrupt handers,
it may cause potential performance problem due to the time penalty.

To solve the problem, an idea is to prepare non-trace/trace irq handers and
switch their IDTs at the enabling/disabling time.

So, let's introduce entering_irq()/exiting_irq() for pre/post-
processing of each irq handler.

A way to use them is as follows.

Non-trace irq handler:
smp_irq_handler()
{
	entering_irq();		/* pre-processing of this handler */
	__smp_irq_handler();	/*
				 * common logic between non-trace and trace handlers
				 * in a vector.
				 */
	exiting_irq();		/* post-processing of this handler */

}

Trace irq_handler:
smp_trace_irq_handler()
{
	entering_irq();		/* pre-processing of this handler */
	trace_irq_entry();	/* tracepoint for irq entry */
	__smp_irq_handler();	/*
				 * common logic between non-trace and trace handlers
				 * in a vector.
				 */
	trace_irq_exit();	/* tracepoint for irq exit */
	exiting_irq();		/* post-processing of this handler */

}

If tracepoints can place outside entering_irq()/exiting_irq() as follows,
it looks cleaner.

smp_trace_irq_handler()
{
	trace_irq_entry();
	smp_irq_handler();
	trace_irq_exit();
}

But it doesn't work.
The problem is with irq_enter/exit() being called. They must be called before
trace_irq_enter/exit(),  because of the rcu_irq_enter() must be called before
any tracepoints are used, as tracepoints use  rcu to synchronize.

As a possible alternative, we may be able to call irq_enter() first as follows
if irq_enter() can nest.

smp_trace_irq_hander()
{
	irq_entry();
	trace_irq_entry();
	smp_irq_handler();
	trace_irq_exit();
	irq_exit();
}

But it doesn't work, either.
If irq_enter() is nested, it may have a time penalty because it has to check if it
was already called or not. The time penalty is not desired in performance sensitive
paths even if it is tiny.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:01 -07:00
H. Peter Anvin
f037e416af x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S
There is no point in using "xorq" to clear a register... use "xorl" to
clear the bottom 32 bits, and the upper 32 bits get cleared by virtue
of zero extension.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/n/tip-b76zi1gep39c0zs8fbvkhie9@git.kernel.org
2013-06-20 21:30:04 -07:00
H. Peter Anvin
e6bca5a6a8 Linux 3.10-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRvOHmAAoJEHm+PkMAQRiGpOYIAI/OMDjCMro8S1FjDPGr+wsz
 r/u4h4DiHQTdAUSBYhksgXVBSm02hGhDF3tz7u9oAXuv+4zlGUMZNjmEmLOFfdyd
 Ve9vqMrUkdwA0jt0d7AYVjtCy/j/yVe5szXZCksHXOZiyb78SEZPQgHc13CJ4lKI
 K2jPk0sMko7d5ptALPIX+ome48M+3w3k1HuQ62Znm4pFSADcFGpGdf40HP/5LL+I
 Llp3XHJ5OZrcHRal0t39oHJOCWW61bnYXTWel9J7XoUztebF2cRgFydAhtrto42z
 x7ELOKVY197WH8l56cnVHrISneQd4kgWrooxLYy6QsJl73/qvQBX+7nmkoes7so=
 =qeV0
 -----END PGP SIGNATURE-----

Merge tag 'v3.10-rc6' into x86/cleanups

Linux 3.10-rc6

We need a change that is the mainline tree for further work.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 21:13:55 -07:00
Borislav Petkov
5f8c421814 x86, fpu: Use static_cpu_has_safe before alternatives
The call stack below shows how this happens: basically eager_fpu_init()
calls __thread_fpu_begin(current) which then does if (!use_eager_fpu()),
which, in turn, uses static_cpu_has.

And we're executing before alternatives so static_cpu_has doesn't work
there yet.

Use the safe variant in this path which becomes optimal after
alternatives have run.

WARNING: at arch/x86/kernel/cpu/common.c:1368 warn_pre_alternatives+0x1e/0x20()
You're using static_cpu_has before alternatives have run!
Modules linked in:
Pid: 0, comm: swapper Not tainted 3.9.0-rc8+ #1
Call Trace:
 warn_slowpath_common
 warn_slowpath_fmt
 ? fpu_finit
 warn_pre_alternatives
 eager_fpu_init
 fpu_init
 cpu_init
 trap_init
 start_kernel
 ? repair_env_string
 x86_64_start_reservations
 x86_64_start_kernel

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-6-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:38:22 -07:00
Borislav Petkov
4a90a99c4f x86: Add a static_cpu_has_safe variant
We want to use this in early code where alternatives might not have run
yet and for that case we fall back to the dynamic boot_cpu_has.

For that, force a 5-byte jump since the compiler could be generating
differently sized jumps for each label.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-5-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:38:14 -07:00
Borislav Petkov
5700f743b5 x86: Sanity-check static_cpu_has usage
static_cpu_has may be used only after alternatives have run. Before that
it always returns false if constant folding with __builtin_constant_p()
doesn't happen. And you don't want that.

This patch is the result of me debugging an issue where I overzealously
put static_cpu_has in code which executed before alternatives have run
and had to spend some time with scratching head and cursing at the
monitor.

So add a jump to a warning which screams loudly when we use this
function too early. The alternatives patch that check away in
conjunction with patching the rest of the kernel image.

[ hpa: factored this into its own configuration option.  If we want to
  have an overarching option, it should be an option which selects
  other options, not as a group option in the source code. ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:37:19 -07:00
Borislav Petkov
c3b83598c1 x86, cpu: Add a synthetic, always true, cpu feature
This will be used in alternatives later as an always-replace flag.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:06:07 -07:00
Xiao Guangrong
885032b910 KVM: MMU: retain more available bits on mmio spte
Let mmio spte only use bit62 and bit63 on upper 32 bits, then bit 52 ~ bit 61
can be used for other purposes

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-20 23:33:20 +02:00
Linus Torvalds
a3d5c3460a Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Two smaller fixes - plus a context tracking tracing fix that is a bit
  bigger"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/context-tracking: Add preempt_schedule_context() for tracing
  sched: Fix clear NOHZ_BALANCE_KICK
  sched/x86: Construct all sibling maps if smt
2013-06-20 08:18:35 -10:00
Linus Torvalds
86c76676cf Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Four fixes.  The mmap ones are unfortunately larger than desired -
  fuzzing uncovered bugs that needed perf context life time management
  changes to fix properly"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
  perf: Fix mmap() accounting hole
  perf: Fix perf mmap bugs
  kprobes: Fix to free gone and unused optprobes
2013-06-20 08:17:36 -10:00
Linus Torvalds
805e318548 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu idle fixes from Thomas Gleixner:
 - Add a missing irq enable. Fallout of the idle conversion
 - Fix stackprotector wreckage caused by the idle conversion

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  idle: Enable interrupts in the weak arch_cpu_idle() implementation
  idle: Add the stack canary init to cpu_startup_entry()
2013-06-20 08:16:07 -10:00
Ingo Molnar
f070a4dba9 Merge branch 'perf/urgent' into perf/core
Merge in two hw_breakpoint fixes, before applying another 5.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 17:57:40 +02:00
Masami Hiramatsu
003002e04e kprobes: Fix arch_prepare_kprobe to handle copy insn failures
Fix arch_prepare_kprobe() to handle failures in copy instruction
correctly. This fix is related to the previous fix: 8101376
which made __copy_instruction return an error result if failed,
but caller site was not updated to handle it. Thus, this is the
other half of the bugfix.

This fix is also related to the following bug-report:

   https://bugzilla.redhat.com/show_bug.cgi?id=910649

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Jonathan Lebon <jlebon@redhat.com>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: systemtap@sourceware.org
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20130605031216.15285.2001.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 14:25:48 +02:00
Michel Lespinasse
b52e0a7c4e x86: Fix trigger_all_cpu_backtrace() implementation
The following change fixes the x86 implementation of
trigger_all_cpu_backtrace(), which was previously (accidentally,
as far as I can tell) disabled to always return false as on
architectures that do not implement this function.

trigger_all_cpu_backtrace(), as defined in include/linux/nmi.h,
should call arch_trigger_all_cpu_backtrace() if available, or
return false if the underlying arch doesn't implement this
function.

x86 did provide a suitable arch_trigger_all_cpu_backtrace()
implementation, but it wasn't actually being used because it was
declared in asm/nmi.h, which linux/nmi.h doesn't include. Also,
linux/nmi.h couldn't easily be fixed by including asm/nmi.h,
because that file is not available on all architectures.

I am proposing to fix this by moving the x86 definition of
arch_trigger_all_cpu_backtrace() to asm/irq.h.

Tested via: echo l > /proc/sysrq-trigger

Before the change, this uses a fallback implementation which
shows backtraces on active CPUs (using
smp_call_function_interrupt() )

After the change, this shows NMI backtraces on all CPUs

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1370518875-1346-1-git-send-email-walken@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 14:00:21 +02:00
Borislav Petkov
719038de98 x86/intel/cacheinfo: Shut up last long-standing warning
arch/x86/kernel/cpu/intel_cacheinfo.c: In function ‘init_intel_cacheinfo’:
arch/x86/kernel/cpu/intel_cacheinfo.c:642:28: warning: ‘this_leaf.size’ may be used uninitialized in this function [-Wmaybe-uninitialized] arch/x86/kernel/cpu/intel_cacheinfo.c:643:29: warning: ‘this_leaf.eax.split.num_threads_sharing’ may be used uninitialized in this function [-Wmaybe-uninitialized]

This keeps on happening during randbuilds and the compiler is
wrong here:

In the case where cpuid4_cache_lookup_regs() returns 0, both
this_leaf.size and this_leaf.eax get initialized. In the case
where the CPUID leaf doesn't contain valid cache info, we error
out which init_intel_cacheinfo() handles correctly without
touching the abovementioned fields.

So shut up the warning by clearing out the struct which we hand
down.

While at it, reverse error handling and gain one indentation
level.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370710095-20547-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 12:27:41 +02:00
David S. Miller
d98cae64e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/wireless/ath/ath9k/Kconfig
	drivers/net/xen-netback/netback.c
	net/batman-adv/bat_iv_ogm.c
	net/wireless/nl80211.c

The ath9k Kconfig conflict was a change of a Kconfig option name right
next to the deletion of another option.

The xen-netback conflict was overlapping changes involving the
handling of the notify list in xen_netbk_rx_action().

Batman conflict resolution provided by Antonio Quartulli, basically
keep everything in both conflict hunks.

The nl80211 conflict is a little more involved.  In 'net' we added a
dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
Linus reported.  Meanwhile in 'net-next' the handlers were converted
to use pre and post doit handlers which use a flag to determine
whether to hold the RTNL mutex around the operation.

However, the dump handlers to not use this logic.  Instead they have
to explicitly do the locking.  There were apparent bugs in the
conversion of nl80211_dump_wiphy() in that we were not dropping the
RTNL mutex in all the return paths, and it seems we very much should
be doing so.  So I fixed that whilst handling the overlapping changes.

To simplify the initial returns, I take the RTNL mutex after we try
to allocate 'tb'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 16:49:39 -07:00
Paul Gortmaker
949785996e x86: Fix section mismatch on load_ucode_ap
We are in the process of removing all the __cpuinit annotations.
While working on making that change, an existing problem was
made evident:

  WARNING: arch/x86/kernel/built-in.o(.text+0x198f2): Section mismatch
  in reference from the function cpu_init() to the function
  .init.text:load_ucode_ap()   The function cpu_init() references
  the function __init load_ucode_ap().  This is often because cpu_init
  lacks a __init annotation or the annotation of load_ucode_ap is wrong.

This now appears because in my working tree, cpu_init() is no longer
tagged as __cpuinit, and so the audit picks up the mismatch.  The 2nd
hypothesis from the audit is the correct one, as there was an incorrect
__init tag on the prototype in the header (but __cpuinit was used on
the function itself.)

The audit is telling us that the prototype's __init annotation took
effect and the function did land in the .init.text section.  Checking
with objdump on a mainline tree that still has __cpuinit shows that
the __cpuinit on the function takes precedence over the __init on the
prototype, but that won't be true once we make __cpuinit a no-op.

Even though we are removing __cpuinit, we temporarily align both
the function and the prototype on __cpuinit so that the changeset
can be applied to stable trees  if desired.

[ hpa: build fix only, no object code change ]

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable <stable@vger.kernel.org> # 3.9+
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1371654926-11729-1-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-19 14:43:59 -07:00
Konrad Rzeszutek Wilk
d6a77ead21 x86 / ACPI / sleep: Provide registration for acpi_suspend_lowlevel.
Which by default will be x86_acpi_suspend_lowlevel.
This registration allows us to register another callback
if there is a need to use another platform specific callback.

Signed-off-by: Liang Tang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Ben Guthro <benjamin.guthro@citrix.com>
Acked-by: "H. Peter Anvin" <hpa@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-19 23:36:30 +02:00
Joe Perches
f07d91ede6 x86/vdso: Convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jiri Kosina <trivial@kernel.org>
Link: http://lkml.kernel.org/r/a756fa0060e8eea25e8c1863c2764e86c2823617.1371177118.git.joe@perches.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 15:06:09 +02:00
Rusty Russell
5a802e1530 x86: Remove weird PTR_ERR() in do_debug
62edab905 changed the argument to notify_die() from dr6 to &dr6,
but weirdly, used PTR_ERR() to cast it to a long.  Since dr6 is
on the stack, this is an abuse of PTR_ERR().  Cast to long, as
per kernel standard.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1371357768-4968-8-git-send-email-rusty@rustcorp.com.au
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 15:01:36 +02:00
Andi Kleen
f9134f36ae perf/x86/intel: Add mem-loads/stores support for Haswell
mem-loads is basically the same as Sandy Bridge,
but we use a separate string for changes later.

Haswell doesn't support the full precise store mode,
so we emulate it using the "DataLA" facility.
This allows to do everything, but for data sources we
can only detect L1 hit or not.

There is no explicit enable bit anymore, so we have
to tie it to a perf internal only flag.

The address is supported for all memory related PEBS
events with DataLA. Instead of only logging for the
load and store events we allow logging it for all
(it will be simply 0 if the current event does not
support it)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-7-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:35 +02:00
Andi Kleen
135c5612c4 perf/x86/intel: Support Haswell/v4 LBR format
Haswell has two additional LBR from flags for TSX: in_tx and
abort_tx, implemented as a new "v4" version of the LBR format.

Handle those in and adjust the sign extension code to still
correctly extend. The flags are exported similarly in the LBR
record to the existing misprediction flag

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-6-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:35 +02:00
Andi Kleen
72db559646 perf/x86/intel: Move NMI clearing to end of PMI handler
This avoids some problems with spurious PMIs on Haswell.
Haswell seems to behave more like P4 in this regard. Do
the same thing as the P4 perf handler by unmasking
the NMI only at the end. Shouldn't make any difference
for earlier family 6 cores.

(Tested on Haswell, IvyBridge, Westmere, Saltwell (Atom).)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:34 +02:00
Andi Kleen
3044318f1f perf/x86/intel: Add Haswell PEBS support
Add simple PEBS support for Haswell.

The constraints are similar to SandyBridge with a few new
events.

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:33 +02:00
Andi Kleen
3a632cb229 perf/x86/intel: Add simple Haswell PMU support
Similar to SandyBridge, but has a few new events and two
new counter bits.

There are some new counter flags that need to be prevented
from being set on fixed counters, and allowed to be set
for generic counters.

Also we add support for the counter 2 constraint to handle
all raw events.

(Contains fixes from Stephane Eranian.)

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:33 +02:00
Andi Kleen
130768b8c9 perf/x86/intel: Add Haswell PEBS record support
Add support for the Haswell extended (fmt2) PEBS format.

It has a superset of the nhm (fmt1) PEBS fields, but has a
longer record so we need to adjust the code paths.

The main advantage is the new "EventingRip" support which
directly gives the instruction, not off-by-one instruction. So
with precise == 2 we use that directly and don't try to use LBRs
and walking basic blocks. This lowers the overhead of using
precise significantly.

Some other features are added in later patches.

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:32 +02:00
Dave Jones
4338774cd4 x86/debug: Only print out DR registers if they are not power-on defaults
The DR registers are rarely useful when decoding oopses.
With screen real estate during oopses at a premium, we can save
two lines by only printing out these registers when they are set
to something other than they power-on state.

Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130618160911.GA24487@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:33:59 +02:00
Ingo Molnar
2e7e98b85d Fix typo in define
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRsuoUAAoJEBLB8Bhh3lVKpFUP/j6763r5WPtrzUx0sAZBt7SN
 T9VAdqWd3SvxP+lDnRoU1luGQc0j+05njUnc74e2k2rGu2+WDonkxJIv4xMHsuG/
 cItdK+ThmxOQa4ijm3uJ0N8I16k15cZkW0H5hyHSVqbawPaAjCt25DEXexTSgBQM
 wnQSsDNXRjtU/LSBPAZX4t9ePdawy/EwLGO+YYJ1ZKck4oesCwp0j/PVqQG+IGTM
 QUbB3mh0GvdHW5pPgVH6n+yqQ7puN92WHzJicYsYCtJDHa1Yf5D6/C+ZBq8GJ1V0
 rCdxQ8uI5tcn9WOJ1gZrUGPmjEMotnLrZX26b/cTEPzBP0mjf+Kd/Ew+gNmdmvX8
 5jv1Agzb1GQVXesPpv+N3hw5HNqLEQCBdlLQ5kiX63wYwF4vck+JaAGDWUILO6qp
 TtdEiotBHDKpk9YuWeepmCUDB2u/uyMrHSA0HGLBh6ACQXsl5/RnQj3KnGFlbvo6
 FbMvoOEXbNIOB92OEnC9kZteXGlIBtsK0sHmRXfOsckbH/gyYyBcFwqmeHFpzuKV
 N5f3NV8GTrUyrNdn4S28S/wlnx+KOqpqFhllVkgp/OcW2/Ry2eEZYM7ypU8bXMpo
 +HOFwQZimmQonVqRzGrQQI9w5+D5xGJ3gI/0lcjkX2r2Njowx8q13wfKKmqE8EDB
 7a+BsNdTBUt01dskFlSe
 =8FzZ
 -----END PGP SIGNATURE-----

Merge tag 'ras_fixlet_for_3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull "Fix typo in define" change from Borislav Petkov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:51:54 +02:00
Jiri Slaby
062f487190 x86/boot: Close opened file descriptor
During build we open a file, read that but do not close it. Fix
that by sticking fclose() at the right place.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: http://lkml.kernel.org/r/1371628383-11216-1-git-send-email-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
2013-06-19 13:32:19 +02:00
Yan, Zheng
b2fa344d0c perf/x86/intel: Fix sparse warning
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370421025-10986-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:04:55 +02:00
Suravee Suthikulpanit
7be6296fdd perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
Implement a perf PMU to handle IOMMU performance counters and events.
The PMU only supports counting mode (e.g. perf stat). Since the counters
are shared across all cores, the PMU is implemented as "system-wide" mode.

To invoke the AMD IOMMU PMU, issue a perf tool command such as:

  ./perf stat -a -e amd_iommu/<events>/ <command>

or:

  ./perf stat -a -e amd_iommu/config=<config-data>,config1=<config1-data>/ <command>

For example:

  ./perf stat -a -e amd_iommu/mem_trans_total/ <command>

The resulting count will be how many IOMMU total peripheral memory
operations were performed during the command execution window.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370466709-3212-3-git-send-email-suravee.suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:04:53 +02:00
Dave Hansen
ae0def05ed perf/x86: Only print PMU state when also WARN()'ing
intel_pmu_handle_irq() has a warning in it if it does too many
loops.  It is a WARN_ONCE(), but the perf_event_print_debug()
call beneath it is unconditional. For the first warning, you get
a nice backtrace and message, but subsequent ones just dump the
PMU state with no leading messages.  I doubt this is what was
intended.

This patch will only print the PMU state when paired with the
WARN_ON() text.  It effectively open-codes WARN_ONCE()'s
one-time-only logic.

My suspicion is that the code really just wants to make sure we
do not sit in the loop and spit out a warning for every loop
iteration after the 100th.  From what I've seen, this is very
unlikely to happen since we also clear the PMU state.

After this patch, instead of seeing the PMU state dumped each
time, you will just see:

	[57494.894540] perf_event_intel: clearing PMU state on CPU#129
	[57579.539668] perf_event_intel: clearing PMU state on CPU#10
	[57587.137762] perf_event_intel: clearing PMU state on CPU#134
	[57623.039912] perf_event_intel: clearing PMU state on CPU#114
	[57644.559943] perf_event_intel: clearing PMU state on CPU#118
	...

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130530174559.0DB049F4@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:50:47 +02:00
Andrew Hunter
43b4578071 perf/x86: Reduce stack usage of x86_schedule_events()
x86_schedule_events() caches event constraints on the stack during
scheduling.  Given the number of possible events, this is 512 bytes of
stack; since it can be invoked under schedule() under god-knows-what,
this is causing stack blowouts.

Trade some space usage for stack safety: add a place to cache the
constraint pointer to struct perf_event.  For 8 bytes per event (1% of
its size) we can save the giant stack frame.

This shouldn't change any aspect of scheduling whatsoever and while in
theory the locality's a tiny bit worse, I doubt we'll see any
performance impact either.

Tested: `perf stat whatever` does not blow up and produces
results that aren't hugely obviously wrong.  I'm not sure how to run
particularly good tests of perf code, but this should not produce any
functional change whatsoever.

Signed-off-by: Andrew Hunter <ahh@google.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1369332423-4400-1-git-send-email-ahh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:50:44 +02:00
Ingo Molnar
eff2108f02 Merge branch 'perf/urgent' into perf/core
Merge in the latest fixes, to avoid conflicts with ongoing work.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:44:41 +02:00
Stephane Eranian
f1a527899e perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
This patch fixes broken support of PEBS-LL on SNB-EP/IVB-EP.
For some reason, the LDLAT extra reg definition for snb_ep
showed up as duplicate in the snb table.

This patch moves the definition of LDLAT back into the
snb_ep table.

Thanks to Don Zickus for tracking this one down.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130607212210.GA11849@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:44:16 +02:00
Igor Mammedov
07868fc6aa x86: kvmclock: zero initialize pvclock shared memory area
kernel might hung in pvclock_clocksource_read() due to
uninitialized memory might contain odd version value in
following cycle:

        do {
                version = __pvclock_read_cycles(src, &ret, &flags);
        } while ((src->version & 1) || version != src->version);

if secondary kvmclock is accessed before it's registered with kvm.

Clear garbage in pvclock shared memory area right after it's
allocated to avoid this issue.

Ref: https://bugzilla.kernel.org/show_bug.cgi?id=59521
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[See BZ for analysis.  We may want a different fix for 3.11, but
 this is the safest for now - Paolo]
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-19 12:25:28 +02:00
Randy Dunlap
d1603990ea x86: fix build error and kconfig for ia32_emulation and binfmt
Fix kconfig warning and build errors on x86_64 by selecting BINFMT_ELF
when COMPAT_BINFMT_ELF is being selected.

warning: (IA32_EMULATION) selects COMPAT_BINFMT_ELF which has unmet direct dependencies (COMPAT && BINFMT_ELF)

fs/built-in.o: In function `elf_core_dump':
compat_binfmt_elf.c:(.text+0x3e093): undefined reference to `elf_core_extra_phdrs'
compat_binfmt_elf.c:(.text+0x3ebcd): undefined reference to `elf_core_extra_data_size'
compat_binfmt_elf.c:(.text+0x3eddd): undefined reference to `elf_core_write_extra_phdrs'
compat_binfmt_elf.c:(.text+0x3f004): undefined reference to `elf_core_write_extra_data'

[ hpa: This was sent to me for -next but it is a low risk build fix ]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: http://lkml.kernel.org/r/51C0B614.5000708@infradead.org
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-18 16:20:32 -05:00
Yinghai Lu
d8d386c106 x86, mtrr: Fix original mtrr range get for mtrr_cleanup
Joshua reported: Commit cd7b304dfaf1 (x86, range: fix missing merge
during add range) broke mtrr cleanup on his setup in 3.9.5.
corresponding commit in upstream is fbe06b7bae.

  *BAD*gran_size: 64K chunk_size: 16M num_reg: 6 lose cover RAM: -0G

https://bugzilla.kernel.org/show_bug.cgi?id=59491

So it rejects new var mtrr layout.

It turns out we have some problem with initial mtrr range retrieval.
The current sequence is:
	x86_get_mtrr_mem_range
		==> bunchs of add_range_with_merge
		==> bunchs of subract_range
		==> clean_sort_range
	add_range_with_merge for [0,1M)
	sort_range()

add_range_with_merge could have blank slots, so we can not just
sort only, that will have final result have extra blank slot in head.

So move that calling add_range_with_merge for [0,1M), with that we
could avoid extra clean_sort_range calling.

Reported-by: Joshua Covington <joshuacov@googlemail.com>
Tested-by: Joshua Covington <joshuacov@googlemail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371154622-8929-2-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-18 11:32:02 -05:00
Zhanghaoyu (A)
764bcbc5a6 KVM: x86: remove vcpu's CPL check in host-invoked XCR set
__kvm_set_xcr function does the CPL check when set xcr. __kvm_set_xcr is
called in two flows, one is invoked by guest, call stack shown as below,

  handle_xsetbv(or xsetbv_interception)
    kvm_set_xcr
      __kvm_set_xcr

the other one is invoked by host, for example during system reset:

  kvm_arch_vcpu_ioctl
    kvm_vcpu_ioctl_x86_set_xcrs
      __kvm_set_xcr

The former does need the CPL check, but the latter does not.

Cc: stable@vger.kernel.org
Signed-off-by: Zhang Haoyu <haoyu.zhang@huawei.com>
[Tweaks to commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-18 09:55:35 +02:00
Greg Kroah-Hartman
bb07b00be7 Merge 3.10-rc6 into driver-core-next
We want these fixes here too.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-17 16:57:20 -07:00
Fenghua Yu
6bb2ff846f x86 thermal: Disable power limit notification interrupt by default
The package power limit notification interrupt is primarily for
system diagnosis, and should not be blindly enabled on every
system by default -- particuarly since Linux does nothing in the
handler except count how many times it has been called...

Add a new kernel cmdline parameter "int_pln_enable" for situations where
users want to oberve these events via existing system counters:

$ grep TRM /proc/interrupts

$ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/*

https://bugzilla.kernel.org/show_bug.cgi?id=36182

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-14 14:49:00 -07:00
Fenghua Yu
c81147483e x86 thermal: Delete power-limit-notification console messages
Package power limits are common on some systems under some conditions --
so printing console messages when limits are reached
causes unnecessary customer concern and support calls.

Note that even with these console messages gone,
the events can still be observed via system counters:

$ grep TRM /proc/interrupts

Shows total thermal interrupts, which includes both power
limit notifications and thermal throttling interrupts.

$ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/*

Will show what caused those interrupts, core and package
throttling and power limit notifications.

https://bugzilla.kernel.org/show_bug.cgi?id=36182

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-14 14:48:37 -07:00
Steve Capper
53313b2c91 x86: mm: Remove general hugetlb code from x86.
huge_pte_alloc, huge_pte_offset and follow_huge_p[mu]d have
already been copied over to mm.

This patch removes the x86 copies of these functions and activates
the general ones by enabling:
CONFIG_ARCH_WANT_GENERAL_HUGETLB

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-14 09:40:15 +01:00
Steve Capper
cfe28c5d63 x86: mm: Remove x86 version of huge_pmd_share.
The huge_pmd_share code has been copied over to mm/hugetlb.c to
make it accessible to other architectures.

Remove the x86 copy of the huge_pmd_share code and enable the
ARCH_WANT_HUGE_PMD_SHARE config flag. That way we reference the
general one.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-14 09:39:46 +01:00
Linus Torvalds
cb03dc094a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "Another set of fixes, the biggest bit of this is yet another tweak to
  the UEFI anti-bricking code; apparently we finally got some feedback
  from Samsung as to what makes at least their systems fail.  This set
  should actually fix the boot regressions that some other systems (e.g.
  SGI) have exhibited.

  Other than that, there is a patch to avoid a panic with particularly
  unhappy memory layouts and two minor protocol fixes which may or may
  not be manifest bugs"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix typo in kexec register clearing
  x86, relocs: Move __vvar_page from S_ABS to S_REL
  Modify UEFI anti-bricking code
  x86: Fix adjust_range_size_mask calling position
2013-06-13 13:08:51 -07:00
H. Peter Anvin
45df901cc8 * More tweaking to the EFI variable anti-bricking algorithm. Quite a
few users were reporting boot regressions in v3.9. This has now been
    fixed with a more accurate "minimum storage requirement to avoid
    bricking" value from Samsung (5K instead of 50%) and code to trigger
    garbage collection when we near our limit - Matthew Garrett.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRtkY2AAoJEC84WcCNIz1VJOsP/00xwiY4VKh2RfqNkYKSl/w5
 gEshIHFEAXHX5X8C4ReocZVywvdjTgbJoKBbBy3FePYRzLddrmavvjen17hk7BzS
 /cO8/eXForkNWCGR1kLagA6HLpgKP5DPayKizoMb4Mg6muzfT1SCcN6Pzh8cDMWe
 btcq/l9JZejXdJ4Wfoq1My+WdXs19OT/BNeD3y65K4x29vNUjop6oaIdDJWLlH/S
 aeLHh8d4xbSHNWzK1fBP7CnFTYU27xxs1BFNAReU6McxeQCYZAIaRovYnjTZEvfJ
 twd2tLrOn9HBVTbWa8T4XGNSr+QcT4XGMadLvdwuqltmKDfH6Onm8aWQM3IqA7gy
 Qimbcv2B7HrITgXWTzp3DPkXF1LA8/8QHSBXVMUU9Rl6QOLy18vIdKiQy3M1Ng9Z
 0q+Ow93JtnL11zf9wLDMdKaKcA9HOxbG/wRTK6XO4vGaWj9brFv3n5Ib7OreHH6D
 GP58zDEnThFuj97K/NKREBZZFcFOMZpKk5MAipVkzltihUQmNeTF/dAtBJ3Ncu/A
 PqQE6uuKVXjASJR8Gy0bI3WHtSTZK4L/sg9c2MF3bdJa9BswN+m8IEbls+S+iFOx
 +sYPQx7Zw6SFENxDw8cDYNzC14yfr60qyOxTWfkHH7l/FnvhOgwHzqPsLcXx0ouR
 C6k1yPYSTgiqFdWC2sjn
 =TZuM
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * More tweaking to the EFI variable anti-bricking algorithm. Quite a
   few users were reporting boot regressions in v3.9. This has now been
   fixed with a more accurate "minimum storage requirement to avoid
   bricking" value from Samsung (5K instead of 50%) and code to trigger
   garbage collection when we near our limit - Matthew Garrett.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-13 08:59:23 -07:00
Jussi Kivilinna
fe6510b5d6 crypto: aesni_intel - fix accessing of unaligned memory
The new XTS code for aesni_intel uses input buffers directly as memory operands
for pxor instructions, which causes crash if those buffers are not aligned to
16 bytes.

Patch changes XTS code to handle unaligned memory correctly, by loading memory
with movdqu instead.

Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-06-13 14:57:42 +08:00
Srinivas Pandruvada
25cdce170d x86, mcheck, therm_throt: Process package thresholds
Added callback registration for package threshold reports. Also added
a callback to check the rate control implemented in callback or not.
If there is no rate control implemented, then there is a default rate
control similar to core threshold notification by delaying for
CHECK_INTERVAL (5 minutes) between reports.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-06-13 09:59:14 +08:00
Kees Cook
c8a22d19dd x86: Fix typo in kexec register clearing
Fixes a typo in register clearing code. Thanks to PaX Team for fixing
this originally, and James Troup for pointing it out.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130605184718.GA8396@www.outflux.net
Cc: <stable@vger.kernel.org> v2.6.30+
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-12 15:16:18 -07:00
Kees Cook
b1983b0a75 x86, relocs: Move __vvar_page from S_ABS to S_REL
The __vvar_page relocation should actually be listed in S_REL instead
of S_ABS. Oddly, this didn't always cause things to break, presumably
because there are no users for relocation information on 64 bits yet.

[ hpa: Not for stable - new code in 3.10 ]

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130611185652.GA23674@www.outflux.net
Reported-by: Michael Davidson <md@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-12 15:14:57 -07:00
Marcelo Tosatti
8915aa27d5 KVM: x86: handle idiv overflow at kvm_write_tsc
Its possible that idivl overflows (due to large delta stored in usdiff,
valid scenario).

Create an exception handler to catch the overflow exception (division by zero
is protected by vcpu->arch.virtual_tsc_khz check), and interpret it accordingly
(delta is larger than USEC_PER_SEC).

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=969644

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-12 14:24:11 +03:00
Thomas Gleixner
d7880812b3 idle: Add the stack canary init to cpu_startup_entry()
Moving x86 to the generic idle implementation (commit 7d1a9417 "x86:
Use generic idle loop") wreckaged the stack protector.

I stupidly missed that boot_init_stack_canary() must be inlined from a
function which never returns, but I put that call into
arch_cpu_idle_prepare() which of course returns.

I pondered to play tricks with arch_cpu_idle_prepare() first, but then
I noticed, that the other archs which have implemented the
stackprotector (ARM and SH) do not initialize the canary for the
non-boot cpus.

So I decided to move the boot_init_stack_canary() call into
cpu_startup_entry() ifdeffed with an CONFIG_X86 for now. This #ifdef
is just a temporary measure as I don't want to inflict the
boot_init_stack_canary() call on ARM and SH that late in the cycle.

I'll queue a patch for 3.11 which removes the #ifdef if the ARM/SH
maintainers have no objection.

Reported-by: Wouter van Kesteren <woutershep@gmail.com>
Cc: x86@kernel.org
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-11 22:04:47 +02:00
Zach Bobroff
d3768d885c x86, efi: retry ExitBootServices() on failure
ExitBootServices is absolutely supposed to return a failure if any
ExitBootServices event handler changes the memory map.  Basically the
get_map loop should run again if ExitBootServices returns an error the
first time.  I would say it would be fair that if ExitBootServices gives
an error the second time then Linux would be fine in returning control
back to BIOS.

The second change is the following line:

again:
        size += sizeof(*mem_map) * 2;

Originally you were incrementing it by the size of one memory map entry.
The issue here is all related to the low_alloc routine you are using.
In this routine you are making allocations to get the memory map itself.
Doing this allocation or allocations can affect the memory map by more
than one record.

[ mfleming - changelog, code style ]
Signed-off-by: Zach Bobroff <zacharyb@ami.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-11 07:51:54 +01:00
Borislav Petkov
43ab0476a6 efi: Convert runtime services function ptrs
... to void * like the boot services and lose all the void * casts. No
functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-11 07:39:26 +01:00
Matthew Garrett
f8b8404337 Modify UEFI anti-bricking code
This patch reworks the UEFI anti-bricking code, including an effective
reversion of cc5a080c and 31ff2f20. It turns out that calling
QueryVariableInfo() from boot services results in some firmware
implementations jumping to physical addresses even after entering virtual
mode, so until we have 1:1 mappings for UEFI runtime space this isn't
going to work so well.

Reverting these gets us back to the situation where we'd refuse to create
variables on some systems because they classify deleted variables as "used"
until the firmware triggers a garbage collection run, which they won't do
until they reach a lower threshold. This results in it being impossible to
install a bootloader, which is unhelpful.

Feedback from Samsung indicates that the firmware doesn't need more than
5KB of storage space for its own purposes, so that seems like a reasonable
threshold. However, there's still no guarantee that a platform will attempt
garbage collection merely because it drops below this threshold. It seems
that this is often only triggered if an attempt to write generates a
genuine EFI_OUT_OF_RESOURCES error. We can force that by attempting to
create a variable larger than the remaining space. This should fail, but if
it somehow succeeds we can then immediately delete it.

I've tested this on the UEFI machines I have available, but I don't have
a Samsung and so can't verify that it avoids the bricking problem.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Lee, Chun-Y <jlee@suse.com> [ dummy variable cleanup ]
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-10 21:59:37 +01:00
Linus Torvalds
50e6f8511a Bug-fixes for regressions:
- xen/tmem stopped working after a certain combination of modprobe/swapon was used
  - cpu online/offlining would trigger WARN_ON.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRtgQxAAoJEFjIrFwIi8fJ9QYH/ibEGBlKLfGzN4Apx6evBA68
 l9CuLIpGCkPOiJLgVs10zY77Sg3f95bHFkJrwrEIDeTQ42joNKybsIp+qZCIS5CW
 sAnzSd6Mb8jqQYJpgLl03+z3GdFzILTJ9e0zYTETbW2vfLpAETm87XWH+gjxDK0e
 9I0kZX+Q3+2VWi5xv+UwWkYIOLbggLyKYajTHDwWNuC7vQgkJulCAbmjgN/NBv7A
 xacvXdbEClIfZ5tvJJN0RggdEWo6WyTxyExfLPmpXFbHEvWDUX5LPEf8MhFY1ORK
 U1C0BSV6YuLo350G5lY4I0R75ZEWLeUyVkZFnJIeGnUF1rP6OXEuVWTyeivPDBA=
 =Fy36
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull xen fixes from Konrad Rzeszutek Wilk:
 "Two bug-fixes for regressions:
   - xen/tmem stopped working after a certain combination of
     modprobe/swapon was used
   - cpu online/offlining would trigger WARN_ON."

* tag 'stable/for-linus-3.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/tmem: Don't over-write tmem_frontswap_poolid after tmem_frontswap_init set it.
  xen/smp: Fixup NOHZ per cpu data when onlining an offline CPU.
2013-06-10 13:27:46 -07:00
Konrad Rzeszutek Wilk
09e99da766 xen/time: Free onlined per-cpu data structure if we want to online it again.
If the per-cpu time data structure has been onlined already and
we are trying to online it again, then free the previous copy
before blindly over-writting it.

A developer naturally should not call this function multiple times
but just in case.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10 08:43:37 -04:00
Konrad Rzeszutek Wilk
a05e2c371f xen/time: Check that the per_cpu data structure has data before freeing.
We don't check whether the per_cpu data structure has actually
been freed in the past. This checks it and if it has been freed
in the past then just continues on without double-freeing.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10 08:43:36 -04:00
Konrad Rzeszutek Wilk
c9d76a24a2 xen/time: Don't leak interrupt name when offlining.
When the user does:
    echo 0 > /sys/devices/system/cpu/cpu1/online
    echo 1 > /sys/devices/system/cpu/cpu1/online

kmemleak reports:
kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

One of the leaks is from xen/time:

unreferenced object 0xffff88003fa51280 (size 32):
  comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
  hex dump (first 32 bytes):
    74 69 6d 65 72 31 00 00 00 00 00 00 00 00 00 00  timer1..........
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
    [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
    [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
    [<ffffffff812fe228>] kasprintf+0x38/0x40
    [<ffffffff81041ec1>] xen_setup_timer+0x51/0xf0
    [<ffffffff8166339f>] xen_cpu_up+0x5f/0x3e8
    [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
    [<ffffffff8166bd48>] cpu_up+0xd9/0xec
    [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
    [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
    [<ffffffff8165ce39>] kernel_init+0x9/0xf0
    [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
    [<ffffffffffffffff>] 0xffffffffffffffff

This patch fixes it by stashing away the 'name' in the per-cpu
data structure and freeing it when offlining the CPU.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10 08:43:35 -04:00
Konrad Rzeszutek Wilk
31620a198c xen/time: Encapsulate the struct clock_event_device in another structure.
We don't do any code movement. We just encapsulate the struct clock_event_device
in a new structure which contains said structure and a pointer to
a char *name. The 'name' will be used in 'xen/time: Don't leak interrupt
name when offlining'.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10 08:43:34 -04:00
Konrad Rzeszutek Wilk
354e7b7619 xen/spinlock: Don't leak interrupt name when offlining.
When the user does:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online

kmemleak reports:
kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

unreferenced object 0xffff88003fa51260 (size 32):
  comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
  hex dump (first 32 bytes):
    73 70 69 6e 6c 6f 63 6b 31 00 00 00 00 00 00 00  spinlock1.......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
    [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
    [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
    [<ffffffff812fe228>] kasprintf+0x38/0x40
    [<ffffffff81663789>] xen_init_lock_cpu+0x61/0xbe
    [<ffffffff816633a6>] xen_cpu_up+0x66/0x3e8
    [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
    [<ffffffff8166bd48>] cpu_up+0xd9/0xec
    [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
    [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
    [<ffffffff8165ce39>] kernel_init+0x9/0xf0
    [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
    [<ffffffffffffffff>] 0xffffffffffffffff

Instead of doing it like the "xen/smp: Don't leak interrupt name when offlining"
patch did (which has a per-cpu structure which contains both the
IRQ number and char*) we use a per-cpu pointers to a *char.

The reason is that the "__this_cpu_read(lock_kicker_irq);" macro
blows up with "__bad_size_call_parameter()" as the size of the
returned structure is not within the parameters of what it expects
and optimizes for.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10 08:43:33 -04:00
Konrad Rzeszutek Wilk
b85fffec7f xen/smp: Don't leak interrupt name when offlining.
When the user does:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online

kmemleak reports:
kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

unreferenced object 0xffff88003fa51240 (size 32):
  comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
  hex dump (first 32 bytes):
    72 65 73 63 68 65 64 31 00 00 00 00 00 00 00 00  resched1........
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
    [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
    [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
    [<ffffffff812fe228>] kasprintf+0x38/0x40
    [<ffffffff81047ed1>] xen_smp_intr_init+0x41/0x2c0
    [<ffffffff816636d3>] xen_cpu_up+0x393/0x3e8
    [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
    [<ffffffff8166bd48>] cpu_up+0xd9/0xec
    [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
    [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
    [<ffffffff8165ce39>] kernel_init+0x9/0xf0
    [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
    [<ffffffffffffffff>] 0xffffffffffffffff

This patch fixes some of it by using the 'struct xen_common_irq->name'
field to stash away the char so that it can be freed when
the interrupt line is destroyed.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10 08:43:32 -04:00
Konrad Rzeszutek Wilk
ee336e10d5 xen/smp: Set the per-cpu IRQ number to a valid default.
When we free it we want to make sure to set it to a default
value of -1 so that we don't double-free it (in case somebody
calls us twice).

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10 08:43:31 -04:00
Konrad Rzeszutek Wilk
9547689fcd xen/smp: Introduce a common structure to contain the IRQ name and interrupt line.
This patch adds a new structure to contain the common two things
that each of the per-cpu interrupts need:
 - an interrupt number,
 - and the name of the interrupt (to be added in 'xen/smp: Don't leak
   interrupt name when offlining').

This allows us to carry the tuple of the per-cpu interrupt data structure
and expand it as we need in the future.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10 08:43:30 -04:00
Konrad Rzeszutek Wilk
53b94fdc8f xen/smp: Coalesce the free_irq calls in one function.
There are two functions that do a bunch of 'free_irq' on
the per_cpu IRQ. Instead of having duplicate code just move
it to one function.

This is just code movement.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10 08:43:29 -04:00
David S. Miller
93a306aef5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge 'net' into 'net-next' to get the MSG_CMSG_COMPAT
regression fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-06 23:39:26 -07:00
Linus Torvalds
c51aa6db2a PCI update for v3.10:
PCI ROM from EFI
       x86/PCI: Map PCI setup data with ioremap() so it can be in highmem
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRsMcBAAoJEFmIoMA60/r8EGMP/RiVo0oHoM1DlNfFonw8BjTl
 iClotD3rfQRgjDx8/563/KfO464mLJLbWdjanbpmWB215Xu4LJwv3wfWWEB1f3C+
 hbsz4aFLmoQFnCzg1BIsuNNu88cXymf4A+osfKl+pCxPY+zChkqNZUPBk0m7aW4/
 Y8JZz0NkxDUUb6ewtDhRDci9d5dHAzL/bmfcdUrP5X56znV72eVH555vY33hzNQ+
 opoB9bFOdjxq8EDMlSaGxHmClUH/1VJd8Gr9Q4oj8OUOMq7At/WbHbFYJPbn/09+
 xdh+78oGKZ6vI1rfIwlGSbJaizZJoE8l3EVEeY1rSgxF2zKNRAyN9IZOPaTigsjd
 5zHTwUp+itxstw3fxjeG0R+Gl8guf13XTWzGcR6sC1TTn/NXSxqI3inORs5PWmpP
 ldfPkOW5Y3pcP6UfmJcAn1z9W8toyCouovFPnSoit6ZzS/+aFY2vYfHBQZ8LdJHv
 Gq4sYmcxWTzfgBxybc6OczztXgTQ+grhs3nnP29/QrGDyJ6RA6E6ENKiqARkALCF
 1qFeOdWcoG4HaAlQhMRwuJTsUw7ofJWmYi6AdbHgI5pBQM4lLR1Kp+hGJnmeTBHm
 Svx98dv525HZ2tVUZwsd+ExsEfQ4IlPECVmOpxFkUf4sRILhvCTP5C6KWHBev7sG
 +F4rwomkmlS1hu82xMwl
 =dMNk
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.10-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "This fixes a crash when booting a 32-bit kernel via the EFI boot stub.

  PCI ROM from EFI
      x86/PCI: Map PCI setup data with ioremap() so it can be in highmem"

* tag 'pci-v3.10-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/PCI: Map PCI setup data with ioremap() so it can be in highmem
2013-06-06 16:28:15 -07:00
H. Peter Anvin
60e019eb37 x86: Get rid of ->hard_math and all the FPU asm fu
Reimplement FPU detection code in C and drop old, not-so-recommended
detection method in asm. Move all the relevant stuff into i387.c where
it conceptually belongs. Finally drop cpuinfo_x86.hard_math.

[ hpa: huge thanks to Borislav for taking my original concept patch
  and productizing it ]

[ Boris, note to self: do not use static_cpu_has before alternatives! ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1367244262-29511-2-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/1365436666-9837-2-git-send-email-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-06 14:32:04 -07:00
Matthew Garrett
1acba98f81 UEFI: Don't pass boot services regions to SetVirtualAddressMap()
We need to map boot services regions during startup in order to avoid
firmware bugs, but we shouldn't be passing those regions to
SetVirtualAddressMap(). Ensure that we're only passing regions that are
marked as being mapped at runtime.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-06 14:28:11 +01:00
David S. Miller
6bc19fb82d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge 'net' bug fixes into 'net-next' as we have patches
that will build on top of them.

This merge commit includes a change from Emil Goode
(emilgoode@gmail.com) that fixes a warning that would
have been introduced by this merge.  Specifically it
fixes the pingv6_ops method ipv6_chk_addr() to add a
"const" to the "struct net_device *dev" argument and
likewise update the dummy_ipv6_chk_addr() declaration.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-05 16:37:30 -07:00
Jacob Shin
cd1c32ca96 x86, microcode, amd: Allow multiple families' bin files appended together
Add support for parsing through multiple families' microcode patch
container binary files appended together when early loading. This is
already supported on Intel.

Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1370463236-2115-3-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-05 13:56:55 -07:00
Jacob Shin
275bbe2e29 x86, microcode, amd: Make find_ucode_in_initrd() __init
Change find_ucode_in_initrd() to __init and only let BSP call it
during cold boot. This is the right thing to do because only BSP will
see initrd loaded by the boot loader. APs will offset into
initrd_start to find the microcode patch binary.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1370463236-2115-2-git-send-email-jacob.shin@amd.com
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-05 13:56:47 -07:00
Matt Fleming
65694c5aad x86/PCI: Map PCI setup data with ioremap() so it can be in highmem
f9a37be0f0 ("x86: Use PCI setup data") added support for using PCI ROM
images from setup_data.  This used phys_to_virt(), which is not valid for
highmem addresses, and can cause a crash when booting a 32-bit kernel via
the EFI boot stub.

pcibios_add_device() assumes that the physical addresses stored in
setup_data are accessible via the direct kernel mapping, and that calling
phys_to_virt() is valid.  This isn't guaranteed to be true on x86 where the
direct mapping range is much smaller than on x86-64.

Calling phys_to_virt() on a highmem address results in the following:

 BUG: unable to handle kernel paging request at 39a3c198
 IP: [<c262be0f>] pcibios_add_device+0x2f/0x90
 ...
 Call Trace:
  [<c2370c73>] pci_device_add+0xe3/0x130
  [<c274640b>] pci_scan_single_device+0x8b/0xb0
  [<c2370d08>] pci_scan_slot+0x48/0x100
  [<c2371904>] pci_scan_child_bus+0x24/0xc0
  [<c262a7b0>] pci_acpi_scan_root+0x2c0/0x490
  [<c23b7203>] acpi_pci_root_add+0x312/0x42f
  ...

The solution is to use ioremap() instead of phys_to_virt() to map the
setup data into the kernel address space.

[bhelgaas: changelog]
Tested-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: stable@vger.kernel.org	# v3.8+
2013-06-05 10:50:04 -06:00
Mathias Krause
a90936845d x86, mce: Fix "braodcast" typo
Fix the typo in MCJ_IRQ_BRAODCAST.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-06-05 11:59:17 +02:00
Gleb Natapov
05988d728d KVM: MMU: reduce KVM_REQ_MMU_RELOAD when root page is zapped
Quote Gleb's mail:
| why don't we check for sp->role.invalid in
| kvm_mmu_prepare_zap_page before calling kvm_reload_remote_mmus()?

and

| Actually we can add check for is_obsolete_sp() there too since
| kvm_mmu_invalidate_all_pages() already calls kvm_reload_remote_mmus()
| after incrementing mmu_valid_gen.

[ Xiao: add some comments and the check of is_obsolete_sp() ]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05 12:34:02 +03:00
Xiao Guangrong
365c886860 KVM: MMU: reclaim the zapped-obsolete page first
As Marcelo pointed out that
| "(retention of large number of pages while zapping)
| can be fatal, it can lead to OOM and host crash"

We introduce a list, kvm->arch.zapped_obsolete_pages, to link all
the pages which are deleted from the mmu cache but not actually
freed. When page reclaiming is needed, we always zap this kind of
pages first.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05 12:33:33 +03:00
Xiao Guangrong
f34d251d66 KVM: MMU: collapse TLB flushes when zap all pages
kvm_zap_obsolete_pages uses lock-break technique to zap pages,
it will flush tlb every time when it does lock-break

We can reload mmu on all vcpus after updating the generation
number so that the obsolete pages are not used on any vcpus,
after that we do not need to flush tlb when obsolete pages
are zapped

It will do kvm_mmu_prepare_zap_page many times and use one
kvm_mmu_commit_zap_page to collapse tlb flush, the side-effects
is that causes obsolete pages unlinked from active_list but leave
on hash-list, so we add the comment around the hash list walker

Note: kvm_mmu_commit_zap_page is still needed before free
the pages since other vcpus may be doing locklessly shadow
page walking

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05 12:33:18 +03:00
Xiao Guangrong
e7d11c7a89 KVM: MMU: zap pages in batch
Zap at lease 10 pages before releasing mmu-lock to reduce the overload
caused by requiring lock

After the patch, kvm_zap_obsolete_pages can forward progress anyway,
so update the comments

[ It improves the case 0.6% ~ 1% that do kernel building meanwhile read
  PCI ROM. ]

Note: i am not sure that "10" is the best speculative value, i just
guessed that '10' can make vcpu do not spend long time on
kvm_zap_obsolete_pages and do not cause mmu-lock too hungry.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05 12:33:10 +03:00
Xiao Guangrong
7f52af7412 KVM: MMU: do not reuse the obsolete page
The obsolete page will be zapped soon, do not reuse it to
reduce future page fault

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05 12:33:04 +03:00
Xiao Guangrong
35006126f0 KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages
It is good for debug and development

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05 12:32:57 +03:00
Xiao Guangrong
2248b02321 KVM: MMU: show mmu_valid_gen in shadow page related tracepoints
Show sp->mmu_valid_gen

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05 12:32:49 +03:00
Xiao Guangrong
6ca18b6950 KVM: x86: use the fast way to invalidate all pages
Replace kvm_mmu_zap_all by kvm_mmu_invalidate_zap_all_pages

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05 12:32:42 +03:00
Xiao Guangrong
5304b8d37c KVM: MMU: fast invalidate all pages
The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to
walk and zap all shadow pages one by one, also it need to zap all guest
page's rmap and all shadow page's parent spte list. Particularly, things
become worse if guest uses more memory or vcpus. It is not good for
scalability

In this patch, we introduce a faster way to invalidate all shadow pages.
KVM maintains a global mmu invalid generation-number which is stored in
kvm->arch.mmu_valid_gen and every shadow page stores the current global
generation-number into sp->mmu_valid_gen when it is created

When KVM need zap all shadow pages sptes, it just simply increase the
global generation-number then reload root shadow pages on all vcpus.
Vcpu will create a new shadow page table according to current kvm's
generation-number. It ensures the old pages are not used any more.
Then the obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen)
are zapped by using lock-break technique

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05 12:32:33 +03:00
Xiao Guangrong
a2ae162265 KVM: MMU: drop unnecessary kvm_reload_remote_mmus
It is the responsibility of kvm_mmu_zap_all that keeps the
consistent of mmu and tlbs. And it is also unnecessary after
zap all mmio sptes since no mmio spte exists on root shadow
page and it can not be cached into tlb

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05 12:32:24 +03:00
Xiao Guangrong
758ccc89b8 KVM: x86: drop calling kvm_mmu_zap_all in emulator_fix_hypercall
Quote Gleb's mail:

| Back then kvm->lock protected memslot access so code like:
|
| mutex_lock(&vcpu->kvm->lock);
| kvm_mmu_zap_all(vcpu->kvm);
| mutex_unlock(&vcpu->kvm->lock);
|
| which is what 7aa81cc0 does was enough to guaranty that no vcpu will
| run while code is patched. This is no longer the case and
| mutex_lock(&vcpu->kvm->lock); is gone from that code path long time ago,
| so now kvm_mmu_zap_all() there is useless and the code is incorrect.

So we drop it and it will be fixed later

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05 12:32:00 +03:00
Linus Torvalds
8b35c35955 Merge branch 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm bugfixes from Gleb Natapov:
 "The bulk of the fixes is in MIPS KVM kernel<->userspace ABI.  MIPS KVM
  is new for 3.10 and some problems were found with current ABI.  It is
  better to fix them now and do not have a kernel with broken one"

* 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Fix race in apic->pending_events processing
  KVM: fix sil/dil/bpl/spl in the mod/rm fields
  KVM: Emulate multibyte NOP
  ARM: KVM: be more thorough when invalidating TLBs
  ARM: KVM: prevent NULL pointer dereferences with KVM VCPU ioctl
  mips/kvm: Use ENOIOCTLCMD to indicate unimplemented ioctls.
  mips/kvm: Fix ABI by moving manipulation of CP0 registers to KVM_{G,S}ET_ONE_REG
  mips/kvm: Use ARRAY_SIZE() instead of hardcoded constants in kvm_arch_vcpu_ioctl_{s,g}et_regs
  mips/kvm: Fix name of gpr field in struct kvm_regs.
  mips/kvm: Fix ABI for use of 64-bit registers.
  mips/kvm: Fix ABI for use of FPU.
2013-06-05 09:09:35 +09:00
Michael Wang
e8747f10ba x86, cleanups: Remove extra tab in __flush_tlb_one()
Remove the extra tab in __flush_tlb_one().

CC: Alex Shi <alex.shi@intel.com>
CC: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/51AD8902.60603@linux.vnet.ibm.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-04 11:51:26 -07:00
Konrad Rzeszutek Wilk
466318a87f xen/smp: Fixup NOHZ per cpu data when onlining an offline CPU.
The xen_play_dead is an undead function. When the vCPU is told to
offline it ends up calling xen_play_dead wherin it calls the
VCPUOP_down hypercall which offlines the vCPU. However, when the
vCPU is onlined back, it resumes execution right after
VCPUOP_down hypercall.

That was OK (albeit the API for play_dead assumes that the CPU
stays dead and never returns) but with commit 4b0c0f294
(tick: Cleanup NOHZ per cpu data on cpu down) that is no longer safe
as said commit resets the ts->inidle which at the start of the
cpu_idle loop was set.

The net effect is that we get this warn:

Broke affinity for irq 16
installing Xen timer for CPU 1
cpu 1 spinlock event irq 48
------------[ cut here ]------------
WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0()
Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc3upstream-00068-gdcdbe33 #1
Hardware name: BIOSTAR Group N61PB-M2S/N61PB-M2S, BIOS 6.00 PG 09/03/2009
 ffffffff8193b448 ffff880039da5e60 ffffffff816707c8 ffff880039da5ea0
 ffffffff8108ce8b ffff880039da4010 ffff88003fa8e500 ffff880039da4010
 0000000000000001 ffff880039da4000 ffff880039da4010 ffff880039da5eb0
Call Trace:
 [<ffffffff816707c8>] dump_stack+0x19/0x1b
 [<ffffffff8108ce8b>] warn_slowpath_common+0x6b/0xa0
 [<ffffffff8108ced5>] warn_slowpath_null+0x15/0x20
 [<ffffffff810e4745>] tick_nohz_idle_exit+0x195/0x1b0
 [<ffffffff810da755>] cpu_startup_entry+0x205/0x250
 [<ffffffff81661070>] cpu_bringup_and_idle+0x13/0x15
---[ end trace 915c8c486004dda1 ]---

b/c ts_inidle is set to zero. Thomas suggested that we just add a workaround
to call tick_nohz_idle_enter before returning from xen_play_dead() - and
that is what this patch does and fixes the issue.

We also add the stable part b/c git commit 4b0c0f294 is on the stable
tree.

CC: stable@vger.kernel.org
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-04 09:03:18 -04:00
Stephen Rothwell
40b313608a Finally eradicate CONFIG_HOTPLUG
Ever since commit 45f035ab9b ("CONFIG_HOTPLUG should be always on"),
it has been basically impossible to build a kernel with CONFIG_HOTPLUG
turned off.  Remove all the remaining references to it.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-03 14:20:18 -07:00
Gleb Natapov
299018f44a KVM: Fix race in apic->pending_events processing
apic->pending_events processing has a race that may cause INIT and
SIPI
processing to be reordered:

vpu0:                            vcpu1:
set INIT
                               test_and_clear_bit(KVM_APIC_INIT)
                                  process INIT
set INIT
set SIPI
                               test_and_clear_bit(KVM_APIC_SIPI)
                                  process SIPI

At the end INIT is left pending in pending_events. The following patch
fixes this by latching pending event before processing them.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-03 11:32:39 +03:00
Paolo Bonzini
8acb42070e KVM: fix sil/dil/bpl/spl in the mod/rm fields
The x86-64 extended low-byte registers were fetched correctly from reg,
but not from mod/rm.

This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
not enough.

Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-03 11:27:12 +03:00
Paolo Bonzini
103f98ea64 KVM: Emulate multibyte NOP
This is encountered when booting RHEL5.9 64-bit.  There is another bug
after this one that is not a simple emulation failure, but this one lets
the boot proceed a bit.

Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-03 11:20:53 +03:00
Jacob Shin
6b3389ac21 x86, microcode, amd: Fix warnings and errors on with CONFIG_MICROCODE=m
Fix section mismatch warnings on microcode_amd_early.
Compile error occurs when CONFIG_MICROCODE=m, change so that early
loading depends on microcode_core.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20130531150241.GA12006@jshin-Toonie
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-31 13:56:58 -07:00
Yinghai Lu
7de3d66b13 x86: Fix adjust_range_size_mask calling position
Commit

    8d57470d x86, mm: setup page table in top-down

causes a kernel panic while setting mem=2G.

     [mem 0x00000000-0x000fffff] page 4k
     [mem 0x7fe00000-0x7fffffff] page 1G
     [mem 0x7c000000-0x7fdfffff] page 1G
     [mem 0x00100000-0x001fffff] page 4k
     [mem 0x00200000-0x7bffffff] page 2M

for last entry is not what we want, we should have
     [mem 0x00200000-0x3fffffff] page 2M
     [mem 0x40000000-0x7bffffff] page 1G

Actually we merge the continuous ranges with same page size too early.
in this case, before merging we have
     [mem 0x00200000-0x3fffffff] page 2M
     [mem 0x40000000-0x7bffffff] page 2M
after merging them, will get
     [mem 0x00200000-0x7bffffff] page 2M
even we can use 1G page to map
     [mem 0x40000000-0x7bffffff]

that will cause problem, because we already map
     [mem 0x7fe00000-0x7fffffff] page 1G
     [mem 0x7c000000-0x7fdfffff] page 1G
with 1G page, aka [0x40000000-0x7fffffff] is mapped with 1G page already.
During phys_pud_init() for [0x40000000-0x7bffffff], it will not
reuse existing that pud page, and allocate new one then try to use
2M page to map it instead, as page_size_mask does not include
PG_LEVEL_1G. At end will have [7c000000-0x7fffffff] not mapped, loop
in phys_pmd_init stop mapping at 0x7bffffff.

That is right behavoir, it maps exact range with exact page size that
we ask, and we should explicitly call it to map [7c000000-0x7fffffff]
before or after mapping 0x40000000-0x7bffffff.
Anyway we need to make sure ranges' page_size_mask correct and consistent
after split_mem_range for each range.

Fix that by calling adjust_range_size_mask before merging range
with same page size.

-v2: update change log.
-v3: add more explanation why [7c000000-0x7fffffff] is not mapped, and
    it causes panic.

Bisected-by: "Xie, ChanglongX" <changlongx.xie@intel.com>
Bisected-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reported-and-tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1370015587-20835-1-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-31 13:21:32 -07:00
Paul Bolle
71c69f7f4b x86/mce: Remove check for CONFIG_X86_MCE_P4THERMAL
The Kconfig symbol X86_MCE_P4THERMAL was removed in v2.6.32.
Remove a useless check for its macro, as it will now always
evaluate to false.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Link: http://lkml.kernel.org/r/1369853850.23034.28.camel@x61.thuisdomein
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 13:12:35 +02:00
Andrew Jones
b0bc225d0e sched/x86: Construct all sibling maps if smt
Commit 316ad24830 ("sched/x86: Rewrite
set_cpu_sibling_map()") broke the construction of sibling maps,
which also broke the booted_cores accounting.

Before the rewrite, if smt was present, then each map was
updated for each smt sibling. After the rewrite only
cpu_sibling_mask gets updated, as the llc and core maps depend
on 'has_mc = x86_max_cores > 1' instead. This leads to problems
with topologies like the following

(qemu -smp sockets=2,cores=1,threads=2)

  processor       : 0
  physical id     : 0
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 1

  processor       : 1
  physical id     : 0
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 0    <= should be 1

  processor       : 2
  physical id     : 1
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 1

  processor       : 3
  physical id     : 1
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 0    <= should be 1

This patch restores the former construction by defining has_mc
as (has_smt || x86_max_cores > 1). This should be fine as there
were no (has_smt && !has_mc) conditions in the context.

Aso rename has_mc to has_mp now that it's not just for cores.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: a.p.zijlstra@chello.nl
Cc: fenghua.yu@intel.com
Link: http://lkml.kernel.org/r/1369831695-11970-1-git-send-email-drjones@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 13:10:38 +02:00
Jan Beulich
1d10f6ee60 x86: __force_order doesn't need to be an actual variable
It being static causes over a dozen instances to be scattered
across the kernel image, with non of them ever being referenced
in any way. Making the variable extern without ever defining it
works as well - all we need is to have the compiler think the
variable is being accessed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/51A610B802000078000D99A0@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 13:09:17 +02:00
Jan Beulich
cbdce7b251 ix86: Don't waste fixmap entries
The vsyscall related pvclock entries can only ever be used on
x86-64, and hence they shouldn't even get allocated for 32-bit
kernels (the more that it is there where address space is
relatively precious).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/51A60F1F02000078000D997C@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 13:08:18 +02:00
Jacob Shin
757885e94a x86, microcode, amd: Early microcode patch loading support for AMD
Add early microcode patch loading support for AMD.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-5-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Jacob Shin
a76096a657 x86, microcode, amd: Refactor functions to prepare for early loading
In preparation work for early loading, refactor some common functions
that will be shared, and move some struct defines to a common header file.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-4-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Jacob Shin
f2b3ee820a x86, microcode: Vendor abstract out save_microcode_in_initrd()
Currently save_microcode_in_initrd() is declared in vendor neutural
microcode.h file, but defined in vendor specific
microcode_intel_early.c file. Vendor abstract it out to
microcode_core_early.c with a wrapper function.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-3-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Borislav Petkov
83b325f1b4 x86, microcode, intel: Correct typo in printk
User-visible so correct it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1369940959-2077-2-git-send-email-jacob.shin@amd.com
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Andy Lutomirski
d0d98eedee Add arch_phys_wc_{add, del} to manipulate WC MTRRs if needed
Several drivers currently use mtrr_add through various #ifdef guards
and/or drm wrappers.  The vast majority of them want to add WC MTRRs
on x86 systems and don't actually need the MTRR if PAT (i.e.
ioremap_wc, etc) are working.

arch_phys_wc_add and arch_phys_wc_del are new functions, available
on all architectures and configurations, that add WC MTRRs on x86 if
needed (and handle errors) and do nothing at all otherwise.  They're
also easier to use than mtrr_add and mtrr_del, so the call sites can
be simplified.

As an added benefit, this will avoid wasting MTRRs and possibly
warning pointlessly on PAT-supporting systems.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-05-31 13:02:52 +10:00
Linus Torvalds
484b002e28 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:

 - Three EFI-related fixes

 - Two early memory initialization fixes

 - build fix for older binutils

 - fix for an eager FPU performance regression -- currently we don't
   allow the use of the FPU at interrupt time *at all* in eager mode,
   which is clearly wrong.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Allow FPU to be used at interrupt time even with eagerfpu
  x86, crc32-pclmul: Fix build with older binutils
  x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
  x86, range: fix missing merge during add range
  x86, efi: initial the local variable of DataSize to zero
  efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse
  efivarfs: Never return ENOENT from firmware again
2013-05-31 09:44:10 +09:00
Pekka Riikonen
5187b28ff0 x86: Allow FPU to be used at interrupt time even with eagerfpu
With the addition of eagerfpu the irq_fpu_usable() now returns false
negatives especially in the case of ksoftirqd and interrupted idle task,
two common cases for FPU use for example in networking/crypto.  With
eagerfpu=off FPU use is possible in those contexts.  This is because of
the eagerfpu check in interrupted_kernel_fpu_idle():

...
  * For now, with eagerfpu we will return interrupted kernel FPU
  * state as not-idle. TBD: Ideally we can change the return value
  * to something like __thread_has_fpu(current). But we need to
  * be careful of doing __thread_clear_has_fpu() before saving
  * the FPU etc for supporting nested uses etc. For now, take
  * the simple route!
...
 	if (use_eager_fpu())
 		return 0;

As eagerfpu is automatically "on" on those CPUs that also have the
features like AES-NI this patch changes the eagerfpu check to return 1 in
case the kernel_fpu_begin() has not been said yet.  Once it has been the
__thread_has_fpu() will start returning 0.

Notice that with eagerfpu the __thread_has_fpu is always true initially.
FPU use is thus always possible no matter what task is under us, unless
the state has already been saved with kernel_fpu_begin().

[ hpa: this is a performance regression, not a correctness regression,
  but since it can be quite serious on CPUs which need encryption at
  interrupt time I am marking this for urgent/stable. ]

Signed-off-by: Pekka Riikonen <priikone@iki.fi>
Link: http://lkml.kernel.org/r/alpine.GSO.2.00.1305131356320.18@git.silcnet.org
Cc: <stable@vger.kernel.org> v3.7+
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-30 16:36:42 -07:00
Jan Beulich
2baad6121e x86, crc32-pclmul: Fix build with older binutils
binutils prior to 2.18 (e.g. the ones found on SLE10) don't support
assembling PEXTRD, so a macro based approach like the one for PCLMULQDQ
in the same file should be used.

This requires making the helper macros capable of recognizing 32-bit
general purpose register operands.

[ hpa: tagging for stable as it is a low risk build fix ]

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/51A6142A02000078000D99D8@nat28.tlf.novell.com
Cc: Alexander Boyko <alexander_boyko@xyratex.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-30 16:36:23 -07:00
Linus Torvalds
3655b22de0 Fixes:
- Use proper error paths
  - Clean up APIC IPI usage (incorrect arguments)
  - Delay XenBus frontend resume is backend (xenstored) is not running
  - Fix build error with various combinations of CONFIG_
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRp59pAAoJEFjIrFwIi8fJRIYIAKvRh2Dp/AB44ZN97MW/QhEN
 NUvrSTYr2HlqcUW7bv0ScrMLb0LlFeo+9s/bo0KI2+2F+zK822WPC+2KEZmzQIVs
 q261dNsA3/HoyBDOLwWjatjsSus+njBOEgDIwARPwhkoon4fRXBnRJVMy+0bZC3I
 fpd1nlUy0J7jW0QLO5ueKqd5ZN0Mkwn2H4+D8TOPVYHCnk3mT2W+qLCEJmkMxOuZ
 iFYy95K1ky5r0leUUwCTUIGLmgftoh0Qo/RweXSmzuLiZrY+5ilike3gxQSiAjsM
 lIjq+gKXNJJGz4M6wbOTfDzb/WQnKD+2PqlsbulrTD7E6RD6wIsqG/zvc1RqHqw=
 =9gi8
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen fixes from Konrad Rzeszutek Wilk:
 - Use proper error paths
 - Clean up APIC IPI usage (incorrect arguments)
 - Delay XenBus frontend resume is backend (xenstored) is not running
 - Fix build error with various combinations of CONFIG_

* tag 'stable/for-linus-3.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xenbus_client.c: correct exit path for xenbus_map_ring_valloc_hvm
  xen-pciback: more uses of cached MSI-X capability offset
  xen: Clean up apic ipi interface
  xenbus: save xenstore local status for later use
  xenbus: delay xenbus frontend resume if xenstored is not running
  xmem/tmem: fix 'undefined variable' build error.
2013-05-31 06:01:18 +09:00
Dimitri Sivanich
879d5ad0dc x86/UV: Add GRU distributed mode mappings
GRU hardware will support an optional distributed mode that will
allow  per-node address mapping of local GRU space, as opposed
to mapping all GRU hardware to the same contiguous high space.

If GRU distributed mode is selected, setup per-node page table
mappings.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130529155609.GB22917@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-30 09:46:09 +02:00
John Stultz
ce0b098981 x86: Fix vrtc_get_time/set_mmss to use new timespec interface
The patch "x86: Increase precision of x86_platform.get/set_wallclock"
changed the x86 platform set_wallclock/get_wallclock interfaces to
use nsec granular timespecs instead of a second granular interface.

However, that patch missed converting the vrtc code, so this patch
converts those functions to use timespecs.

Many thanks to the kbuild test robot for finding this!

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-05-29 12:57:35 -07:00
Stefan Bader
1db01b4903 xen: Clean up apic ipi interface
Commit f447d56d36 introduced the
implementation of the PV apic ipi interface. But there were some
odd things (it seems none of which cause really any issue but
maybe they should be cleaned up anyway):
 - xen_send_IPI_mask_allbutself (and by that xen_send_IPI_allbutself)
   ignore the passed in vector and only use the CALL_FUNCTION_SINGLE
   vector. While xen_send_IPI_all and xen_send_IPI_mask use the vector.
 - physflat_send_IPI_allbutself is declared unnecessarily. It is never
   used.

This patch tries to clean up those things.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-05-29 09:04:21 -04:00
Zhang Yanfei
e9d0626ed4 x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
In head_64.S, a switchover has been used to handle kernel crossing
1G, 512G boundaries.

And commit 8170e6bed4
    x86, 64bit: Use a #PF handler to materialize early mappings on demand
said:
    During the switchover in head_64.S, before #PF handler is available,
    we use three pages to handle kernel crossing 1G, 512G boundaries with
    sharing page by playing games with page aliasing: the same page is
    mapped twice in the higher-level tables with appropriate wraparound.

But from the switchover code, when we set up the PUD table:
114         addq    $4096, %rdx
115         movq    %rdi, %rax
116         shrq    $PUD_SHIFT, %rax
117         andl    $(PTRS_PER_PUD-1), %eax
118         movq    %rdx, (4096+0)(%rbx,%rax,8)
119         movq    %rdx, (4096+8)(%rbx,%rax,8)

It seems line 119 has a potential bug there. For example,
if the kernel is loaded at physical address 511G+1008M, that is
    000000000 111111111 111111000 000000000000000000000
and the kernel _end is 512G+2M, that is
    000000001 000000000 000000001 000000000000000000000
So in this example, when using the 2nd page to setup PUD (line 114~119),
rax is 511.
In line 118, we put rdx which is the address of the PMD page (the 3rd page)
into entry 511 of the PUD table. But in line 119, the entry we calculate from
(4096+8)(%rbx,%rax,8) has exceeded the PUD page. IMO, the entry in line
119 should be wraparound into entry 0 of the PUD table.

The patch fixes the bug.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/5191DE5A.3020302@cn.fujitsu.com
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-28 15:41:59 -07:00
David Vrabel
3565184ed0 x86: Increase precision of x86_platform.get/set_wallclock()
All the virtualized platforms (KVM, lguest and Xen) have persistent
wallclocks that have more than one second of precision.

read_persistent_wallclock() and update_persistent_wallclock() allow
for nanosecond precision but their implementation on x86 with
x86_platform.get/set_wallclock() only allows for one second precision.
This means guests may see a wallclock time that is off by up to 1
second.

Make set_wallclock() and get_wallclock() take a struct timespec
parameter (which allows for nanosecond precision) so KVM and Xen
guests may start with a more accurate wallclock time and a Xen dom0
can maintain a more accurate wallclock for guests.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-05-28 14:00:59 -07:00
Linus Torvalds
30a9e50143 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This push fixes a crash in the new sha256_ssse3 driver as well as a
  DMA setup/teardown bug in caam"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: sha256_ssse3 - fix stack corruption with SSSE3 and AVX implementations
  crypto: caam - fix inconsistent assoc dma mapping direction
2013-05-28 10:09:38 -07:00
Yijing Wang
ea221e6414 x86/PCI: Increase info->res_num before checking pci_use_crs
We should increase info->res_num before we checking pci_use_crs return
when pci=nocrs set.

No functional change, since we don't use res_num and res_offset[]
in the "!pci_use_crs" case anyway, but this makes the code read better.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Feng Tang <feng.tang@intel.com>
2013-05-28 11:09:16 -06:00
Borislav Petkov
46ff53874b x86, platform, kvm, kconfig: Turn existing .config's into KVM-capable configs
Add an config file snippet which enables additional options
useful for running the kernel in a kvm guest. When you execute
'make kvmconfig' it merges those options with an already
existing user config before you build the kernel.

Based on an patch from the external lkvm tree.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: penberg@kernel.org
Cc: levinsasha928@gmail.com
Cc: mtosatti@redhat.com
Cc: fengguang.wu@intel.com
Link: http://lkml.kernel.org/r/20130522144638.GB15085@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 12:11:32 +02:00
Thorsten Glaser
8ef726cb23 update AMD powerflags comments
from http://www.flounder.com/cpuid80000007amd.gif
and http://support.amd.com/us/Embedded_TechDocs/25481.pdf

Signed-off-by: Thorsten Glaser <t.glaser@tarent.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-28 12:02:10 +02:00
Zhang Yanfei
592a9b8cc8 x86/mm: Drop unneeded include <asm/*pgtable, page*_types.h>
arch/x86/boot/compressed/head_64.S includes <asm/pgtable_types.h> and
 <asm/page_types.h> but it doesn't look like it needs them. So remove them.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/5191FAE2.4020403@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 11:47:23 +02:00
Zhang Yanfei
1e3b308181 x86_64: Correct phys_addr in cleanup_highmap comment
For x86_64, we have phys_base, which means the delta between the
the address kernel is actually running at and the address kernel
is compiled to run at. Not phys_addr so correct it.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/5192F9BF.2000802@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 11:41:24 +02:00
Jussi Kivilinna
a710f761fc crypto: sha256_ssse3 - add sha224 support
Add sha224 implementation to sha256_ssse3 module.

This also fixes sha256_ssse3 module autoloading issue when 'sha224' is used
before 'sha256'. Previously in such case, just sha256_generic was loaded and
not sha256_ssse3 (since it did not provide sha224). Now if 'sha256' was used
after 'sha224' usage, sha256_ssse3 would remain unloaded.

Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-05-28 15:43:05 +08:00
Jussi Kivilinna
340991e30c crypto: sha512_ssse3 - add sha384 support
Add sha384 implementation to sha512_ssse3 module.

This also fixes sha512_ssse3 module autoloading issue when 'sha384' is used
before 'sha512'. Previously in such case, just sha512_generic was loaded and
not sha512_ssse3 (since it did not provide sha384). Now if 'sha512' was used
after 'sha384' usage, sha512_ssse3 would remain unloaded. For example, this
happens with tcrypt testing module since it tests 'sha384' before 'sha512'.

Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-05-28 15:43:05 +08:00
Michael S. Tsirkin
016be2e55d x86: uaccess s/might_sleep/might_fault/
The only reason uaccess routines might sleep
is if they fault. Make this explicit.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1369577426-26721-9-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:41:10 +02:00
Peter Zijlstra
1b45adcd9a perf/x86/amd: Rework AMD PMU init code
Josh reported that his QEMU is a bad hardware emulator and trips a
WARN in the AMD PMU init code. He requested the WARN be turned into a
pr_err() or similar.

While there, rework the code a little.

Reported-by: Josh Boyer <jwboyer@redhat.com>
Acked-by: Robert Richter <rric@kernel.org>
Acked-by: Jacob Shin <jacob.shin@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130521110537.GG26912@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:13:55 +02:00
Stephane Eranian
2b923c8f5d perf/x86: Check branch sampling priv level in generic code
This patch moves commit 7cc23cd to the generic code:

 perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL

The check is now implemented in generic code instead of x86 specific
code. That way we do not have to repeat the test in each arch
supporting branch sampling.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20130521105337.GA2879@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:13:54 +02:00
Dan Carpenter
13acac3075 perf/x86/intel: Prevent some shift wrapping bugs in the Intel uncore driver
We're trying to use 64 bit masks but the shifts wrap so we can't use the
high 32 bits. I've fixed this by changing several types to unsigned
long long.

This is a static checker fix.  The one change which is clearly needed is
"mask = 0xff << (idx * 8);" where the author obviously intended to use
all 64 bits.  The other changes are mostly to silence my static checker.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20130518183452.GA14587@elgon.mountain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:13:52 +02:00
Jiri Olsa
ddd40da4cc x86/signals: Merge EFLAGS bit clearing into a single statement
Merging EFLAGS bit clearing into a single statement, to
ensure EFLAGS bits are being cleared in a single instruction.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1367421944-19082-4-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 08:46:53 +02:00
Jiri Olsa
24cda10996 x86/signals: Clear RF EFLAGS bit for signal handler
Clearing RF EFLAGS bit for signal handler. The reason is
that this flag is set by debug exception code to prevent
the recursive exception entry.

Leaving it set for signal handler might prevent debug
exception of the signal handler itself.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1367421944-19082-3-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 08:46:52 +02:00
Jiri Olsa
5e219b3c67 x86/signals: Propagate RF EFLAGS bit through the signal restore call
While porting Vince's perf overflow tests I found perf event
breakpoint overflow does not work properly.

I found the x86 RF EFLAG bit not being set when returning
from debug exception after triggering signal handler. Which
is exactly what you get when you set perf breakpoint overflow
SIGIO handler.

This patch and the next two patches fix the underlying bugs.

This patch adds the RF EFLAGS bit to be restored on return from
signal from the original register context before the signal was
entered.

This will prevent the RF flag to disappear when returning
from exception due to the signal handler being executed.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1367421944-19082-2-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 08:46:50 +02:00
Jussi Kivilinna
de614e561b crypto: sha256_ssse3 - fix stack corruption with SSSE3 and AVX implementations
The _XFER stack element size was set too small, 8 bytes, when it needs to be
16 bytes. As _XFER is the last stack element used by these implementations,
the 16 byte stores with 'movdqa' corrupt the stack where the value of register
%r12 is temporarily stored. As these implementations align the stack pointer
to 16 bytes, this corruption did not happen every time.

Patch corrects this issue.

Reported-by: Julian Wollrath <jwollrath@web.de>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Tested-by: Julian Wollrath <jwollrath@web.de>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-05-28 13:46:47 +08:00
David S. Miller
e6ff4c75f9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge net into net-next because some upcoming net-next changes
build on top of bug fixes that went into net.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-24 16:48:28 -07:00
Tim Chen
0b95a7f857 crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform
Glue code that plugs the PCLMULQDQ accelerated CRC T10 DIF hash into the
crypto framework.  The config CRYPTO_CRCT10DIF_PCLMUL should be turned
on to enable the feature.  The crc_t10dif crypto library function will
use this faster algorithm when crct10dif_pclmul module is loaded.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-05-24 17:55:27 +08:00
Linus Torvalds
b91fd4d5aa PCI updates for v3.10:
Moorestown
       Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0"
   Hotplug
       PCI: acpiphp: Re-enumerate devices when host bridge receives Bus Check
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRnnUuAAoJEFmIoMA60/r8+JwQALrHdQvA8rGl/TNF0xkJjsU+
 EfdkBj23+qYYsYNla7Z/CfY7k5gXTtymqauhzpa1VRoY/bkpSWMDBxK551CxgBae
 RW0JbZxfoswLlA9eVRUz6mlUl7hL0Ky0/E6d9/UBXSAC0wsN6BB11lqKgi/uxAr2
 ACuZBZW1EZP9zoD615Id5gtRAZe8+9w47p8d6kQcyJntmCoZwo8Na/HRE/tR+wVY
 o3AHMxMOr9Ac+G3/rFl5ffzoHkLwTws4UEVNhvKBek3ogsAau3tgeIgjjvFRToFh
 HxsdpCmgOAfjCF/RC0AxRugtykHw4n3WlaND2WM6JWolLeIQTZr41GlwYy4F1CRG
 gSRk0dMNGBYs9OgDC81KJGYqstHwL0MTig2ul4BvlrLanLnySgVu1kbhaerR3WzA
 11k0GwlJbkFMCbQLZDaq51KHAdDYnCsRVG/tOGmsM8puUPOTXu3pLbPk3SbgWVGc
 7vbUkjJlHwlNoUbeyOOOElUMCRnIiCl/IetwjgZb+Qibq2dO4v6WF5GR+2DPBwua
 8GNbzQXyNxOPzvYBHE4YgK5lcb7yVeLvlOtPgbM0UMqC1SnFISzJRiNpfbj21rAk
 ADT4lqL29vqkX37uIQ55KPeTTThA65EKUoI3+dNtGHLSPoKdt3QlJc9wwQNKxEzx
 EEHxGRkebYNcPB1GqwEl
 =M8/H
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Here are some more fixes for v3.10.  The Moorestown update broke Intel
  Medfield devices, so I reverted it.  The acpiphp change fixes a
  regression: we broke hotplug notifications to host bridges when we
  split acpiphp into the host-bridge related part and the
  endpoint-related part.

  Moorestown
      Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0"
  Hotplug
      PCI: acpiphp: Re-enumerate devices when host bridge receives Bus Check"

* tag 'pci-v3.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0"
  PCI: acpiphp: Re-enumerate devices when host bridge receives Bus Check
2013-05-23 13:50:53 -07:00
H. Peter Anvin
3979fdce77 * Avoid confusing the user by returning -EIO instead of -ENOENT in
efivarfs if an EFI variable gets deleted from under us and return EOF
    when reading from a zero-length file - Lingzhu Xiang
 
  * Fix an oops in efivar_update_sysfs_entries() caused by reusing (and
    therefore corrupting) a kzalloc() allocation - Seiji Aguchi
 
  * Initialise the DataSize argument to GetVariable() otherwise it will
    not be updated with the actual size of the variable on return.
    Discovered on a Acer Aspire V3 BIOS - Lee, Chun-Yi
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRlgJ5AAoJEC84WcCNIz1VcRgQAJWnQpuI+Y1RjPYuLb8ISZ92
 jKnVSAQPaWDccf/KKx7qtk2iK6G1mExomOggp6PpV4/uHWqkQ64qoGL0HmRbmdH7
 //gNXm+ITnSVojoTBBis2LYEYDUQ1aDp3vTevFT2OOMKzKRsgwFD5KbqLnPJCxnl
 kFHlxQSPa7ZYk6GXOVdBJ5ya7spU+Lk4ODiL4xM8EBkMBedUStedB5zNg5rEMLJq
 5FZfs2Ryg7riOkhBnKWCvFjAj3PO/yHLVdrVrGZx+KpATHr/rbsbzDEp3+Pk8zEu
 3aE1p/XLaB4Qk9WvwhWonWF7PSZP41SjrcEt77mTmYivk0cJHBORo/wJsom683BU
 4elWR588Pi92nKEx4MSLREhkzP5RdVFIb4Yrg+1xKLEfI5Kfj1nEQrFVk53VM/rL
 L1m1bwhKLaNUVFJmxJYHaZ6VTjcB6lupz9/AMQr1N1UAavlmr4xEveNTfto+Ys7u
 /J3Z04wouhOGQonVDrZxxOzWFewq1kR/hRfwBAf/V0qemAeS61ZHGpfmmq00y5L2
 jmXncQ5EspFIJfk7+KV2sPQfqC8U8b2jyfO36Ed9u3d/qQ6eiQ1rTZMLKGg1iSdW
 Z+94w7+huEdsNZShSUE8+HXNvzPCS6704PBa9YT292ZibWM9onk6AoBQy/snrEaO
 K+PKxkZyPBML77Vs4TSI
 =8+a3
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * Avoid confusing the user by returning -EIO instead of -ENOENT in
   efivarfs if an EFI variable gets deleted from under us and return EOF
   when reading from a zero-length file - Lingzhu Xiang

 * Fix an oops in efivar_update_sysfs_entries() caused by reusing (and
   therefore corrupting) a kzalloc() allocation - Seiji Aguchi

 * Initialise the DataSize argument to GetVariable() otherwise it will
   not be updated with the actual size of the variable on return.
   Discovered on a Acer Aspire V3 BIOS - Lee, Chun-Yi

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-22 10:58:57 -07:00
Avi Kivity
e47a5f5fb7 KVM: x86 emulator: convert XADD to fastop
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-21 15:43:24 +03:00
Avi Kivity
203831e8e4 KVM: x86 emulator: drop unused old-style inline emulation
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-21 15:43:23 +03:00
Avi Kivity
b8c0b6ae49 KVM: x86 emulator: convert DIV/IDIV to fastop
Since DIV and IDIV can generate exceptions, we need an additional output
parameter indicating whether an execption has occured.  To avoid increasing
register pressure on i386, we use %rsi, which is already allocated for
the fastop code pointer.

Gleb: added comment about fop usage as exception indication.

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-21 15:43:21 +03:00
Avi Kivity
b9fa409b00 KVM: x86 emulator: convert single-operand MUL/IMUL to fastop
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-21 15:43:20 +03:00
Avi Kivity
017da7b604 KVM: x86 emulator: Switch fastop src operand to RDX
This makes OpAccHi useful.

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-21 15:43:19 +03:00
Avi Kivity
ab2c5ce666 KVM: x86 emulator: switch MUL/DIV to DstXacc
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-21 15:43:17 +03:00
Avi Kivity
820207c8fc KVM: x86 emulator: decode extended accumulator explicity
Single-operand MUL and DIV access an extended accumulator: AX for byte
instructions, and DX:AX, EDX:EAX, or RDX:RAX for larger-sized instructions.
Add support for fetching the extended accumulator.

In order not to change things too much, RDX is loaded into Src2, which is
already loaded by fastop().  This avoids increasing register pressure on
i386.

Gleb: disable src writeback for ByteOp div/mul.

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-21 15:43:16 +03:00
Avi Kivity
fb32b1eda2 KVM: x86 emulator: add support for writing back the source operand
Some instructions write back the source operand, not just the destination.
Add support for doing this via the decode flags.

Gleb: add BUG_ON() to prevent source to be memory operand.

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-21 15:43:14 +03:00
Linus Torvalds
5e427ec2d0 x86: Fix bit corruption at CPU resume time
In commit 78d77df715 ("x86-64, init: Do not set NX bits on non-NX
capable hardware") we added the early_pmd_flags that gets the NX bit set
when a CPU supports NX. However, the new variable was marked __initdata,
because the main _use_ of this is in an __init routine.

However, the bit setting happens from secondary_startup_64(), which is
called not only at bootup, but on every secondary CPU start.  Including
resuming from STR and at CPU hotplug time.  So the value cannot be
__initdata.

Reported-bisected-and-tested-by: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org # v3.9
Acked-by: Peter Anvin <hpa@linux.intel.com>
Cc: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-20 11:36:03 -07:00
Bjorn Helgaas
f3f011750a Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0"
This reverts commit dd72be99d1.

Andy Shevchenko <andy.shevchenko@gmail.com> reported that this commit
broke Intel Medfield devices.

Reference: https://lkml.kernel.org/r/CAHp75Vdf6gFZChS47=grUygHBDWcoOWDYPzw+Zj5bdVCWj85Jw@mail.gmail.com
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-05-20 10:20:21 -06:00
Tim Chen
31d939625a crypto: crct10dif - Accelerated CRC T10 DIF computation with PCLMULQDQ instruction
This is the x86_64 CRC T10 DIF transform accelerated with the PCLMULQDQ
instructions.  Details discussing the implementation can be found in the
paper:

"Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-05-20 20:11:06 +08:00
Eric Dumazet
314beb9bca x86: bpf_jit_comp: secure bpf jit against spraying attacks
hpa bringed into my attention some security related issues
with BPF JIT on x86.

This patch makes sure the bpf generated code is marked read only,
as other kernel text sections.

It also splits the unused space (we vmalloc() and only use a fraction of
the page) in two parts, so that the generated bpf code not starts at a
known offset in the page, but a pseudo random one.

Refs:
http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html

Reported-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-19 23:55:41 -07:00
Daniel Baluta
6865b32a86 lguest: rename i386_head.S
Since commit 9a163ed8e0 (i386: move kernel) kernel/i386_head.S
was renamed to kernel/head_32.S. We do the same for lguest/i386_head.S.

Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-05-20 12:09:40 +09:30
Marc Zyngier
535cf7b3b1 KVM: get rid of $(addprefix ../../../virt/kvm/, ...) in Makefiles
As requested by the KVM maintainers, remove the addprefix used to
refer to the main KVM code from the arch code, and replace it with
a KVM variable that does the same thing.

Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Christoffer Dall <cdall@cs.columbia.edu>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-19 15:14:00 +03:00
Eric Dumazet
650c849670 x86: bpf_jit_comp: can call module_free() from any context
It looks like we can call module_free()/vfree() from softirq context,
so no longer need a wrapper and a work_struct.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-17 17:37:19 -07:00
Gleb Natapov
35af577aac KVM: MMU: clenaup locking in mmu_free_roots()
Do locking around each case separately instead of having one lock and two
unlocks. Move root_hpa assignment out of the lock.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-16 11:55:51 +03:00
Linus Torvalds
ae3b29e67c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:

 - Fix for a CPU hot-add deadlock in microcode update code

 - Fix for idle consolidation fallout

 - Documentation update for initial kernel direct mapping

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Add missing comments for initial kernel direct mapping
  x86/microcode: Add local mutex to fix physical CPU hot-add deadlock
  x86: Fix idle consolidation fallout
2013-05-15 14:07:53 -07:00
Linus Torvalds
cc51bf6e6d Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:

 - Cure for not using zalloc in the first place, which leads to random
   crashes with CPUMASK_OFF_STACK.

 - Revert a user space visible change which broke udev

 - Add a missing cpu_online early return introduced by the new full
   dyntick conversions

 - Plug a long standing race in the timer wheel cpu hotplug code.
   Sigh...

 - Cleanup NOHZ per cpu data on cpu down to prevent stale data on cpu
   up.

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons
  timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE
  tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline
  tick: Cleanup NOHZ per cpu data on cpu down
  tick: Use zalloc_cpumask_var for allocating offstack cpumasks
2013-05-15 14:05:17 -07:00
Marcelo Tosatti
0061d53daf KVM: x86: limit difference between kvmclock updates
kvmclock updates which are isolated to a given vcpu, such as vcpu->cpu
migration, should not allow system_timestamp from the rest of the vcpus
to remain static. Otherwise ntp frequency correction applies to one
vcpu's system_timestamp but not the others.

So in those cases, request a kvmclock update for all vcpus. The worst
case for a remote vcpu to update its kvmclock is then bounded by maximum
nohz sleep latency.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-15 20:36:09 +03:00
John Stultz
b4f711ee03 time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons
Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
which enables some minor compile time optimization to avoid
uncessary code in mostly the suspend/resume path could cause
problems for userland.

In particular, the dependency for RTC_HCTOSYS on
!ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
twice and simplifies suspend/resume, has the side effect
of causing the /sys/class/rtc/rtcN/hctosys flag to always be
zero, and this flag is commonly used by udev to setup the
/dev/rtc symlink to /dev/rtcN, which can cause pain for
older applications.

While the udev rules could use some work to be less fragile,
breaking userland should strongly be avoided. Additionally
the compile time optimizations are fairly minor, and the code
being optimized is likely to be reworked in the future, so
lets revert this change.

Reported-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: stable <stable@vger.kernel.org> #3.9
Cc: Feng Tang <feng.tang@intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-14 20:54:06 +02:00
Jan Kiszka
f1ed0450a5 KVM: x86: Remove support for reporting coalesced APIC IRQs
Since the arrival of posted interrupt support we can no longer guarantee
that coalesced IRQs are always reported to the IRQ source. Moreover,
accumulated APIC timer events could cause a busy loop when a VCPU should
rather be halted. The consensus is to remove coalesced tracking from the
LAPIC.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-14 12:09:02 +03:00
Lee, Chun-Yi
eccaf52fee x86, efi: initial the local variable of DataSize to zero
That will be better initial the value of DataSize to zero for the input of
GetVariable(), otherwise we will feed a random value. The debug log of input
DataSize like this:

...
[  195.915612] EFI Variables Facility v0.08 2004-May-17
[  195.915819] efi: size: 18446744071581821342
[  195.915969] efi:  size': 18446744071581821342
[  195.916324] efi: size: 18446612150714306560
[  195.916632] efi:  size': 18446612150714306560
[  195.917159] efi: size: 18446612150714306560
[  195.917453] efi:  size': 18446612150714306560
...

The size' is value that was returned by BIOS.

After applied this patch:
[   82.442042] EFI Variables Facility v0.08 2004-May-17
[   82.442202] efi: size: 0
[   82.442360] efi:  size': 1039
[   82.443828] efi: size: 0
[   82.444127] efi:  size': 2616
[   82.447057] efi: size: 0
[   82.447356] efi:  size': 5832
...

Found on Acer Aspire V3 BIOS, it will not return the size of data if we input a
non-zero DataSize.

Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-05-14 08:13:05 +01:00
Borislav Petkov
4d067d8e05 x86: Extend #DF debugging aid to 64-bit
It is sometimes very helpful to be able to pinpoint the location which
causes a double fault before it turns into a triple fault and the
machine reboots. We have this for 32-bit already so extend it to 64-bit.
On 64-bit we get the register snapshot at #DF time and not from the
first exception which actually causes the #DF. It should be close
enough, though.

[ hpa: and definitely better than nothing, which is what we have now. ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1368093749-31296-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-13 13:42:44 -07:00
Takuya Yoshikawa
e2858b4ab1 KVM: MMU: Use kvm_mmu_sync_roots() in kvm_mmu_load()
No need to open-code this function.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-12 14:52:55 +03:00
Linus Torvalds
607eeb0b83 Bug-fixes:
- More fixes in the vCPU PVHVM hotplug path.
  - Add more documentation.
  - Fix various ARM related issues in the Xen generic drivers.
  - Updates in the xen-pciback driver per Bjorn's updates.
  - Mask the x2APIC feature for PV guests.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRjPL5AAoJEFjIrFwIi8fJdlIIANXawH+B+aFbqsFSKOOh76XN
 smgICU78SVzKpW9WAPYK7YFqSdNN4AleGC2Mn2lSkiaqgciRyDb9Yt+OSMMts2Xn
 ZVbFkGhEKR+DtZfTKo9YgsGatul/McTiVEkuuli+aN5dql3WXDLAaA+/b9bO3ohh
 TCWtWNuSCGmlfDoJET2je+J6CgKvCErH3fvzKNxgYxytcGhAvxoVK/lC4d3pnq/m
 wQUAIcF8XYENqC2m1WDR0OGveAB0Me0j9g+UkQS+TzqA8GPmxC4aptjkroFYhOz6
 8nZp+LanimmTI6olVioWEXkCr5+dxb058jQKwncQfonFpl58RS0qUrz5zoe3etU=
 =h9SX
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
 - More fixes in the vCPU PVHVM hotplug path.
 - Add more documentation.
 - Fix various ARM related issues in the Xen generic drivers.
 - Updates in the xen-pciback driver per Bjorn's updates.
 - Mask the x2APIC feature for PV guests.

* tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pci: Used cached MSI-X capability offset
  xen/pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
  xen: clear IRQ_NOAUTOEN and IRQ_NOREQUEST
  xen: mask x2APIC feature in PV
  xen: SWIOTLB is only used on x86
  xen/spinlock: Fix check from greater than to be also be greater or equal to.
  xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info
  xen/vcpu: Document the xen_vcpu_info and xen_vcpu
  xen/vcpu/pvhvm: Fix vcpu hotplugging hanging.
2013-05-11 16:19:30 -07:00
Linus Torvalds
ac4e01093f Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull idle update from Len Brown:
 "Add support for new Haswell-ULT CPU idle power states"

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  intel_idle: initial C8, C9, C10 support
  tools/power turbostat: display C8, C9, C10 residency
2013-05-11 15:23:17 -07:00
Linus Torvalds
3644bc2ec7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull stray syscall bits from Al Viro:
 "Several syscall-related commits that were missing from the original"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE
  unicore32: just use mmap_pgoff()...
  unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
  x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)
2013-05-10 09:21:05 -07:00