Pull MIPS fixes from Ralf Baechle:
"A fair number of fixes across the field. Nothing terribly
complicated; the one liners in below changelog should be fairly
descriptive.
Noteworthy is the SB1 change which the result of changes to binutils
resulting in one big gas warning for most files being assembled as
well as the asid_cache and branch emulation fixes which fix corruption
or possible uninteded behaviour of kernel or application code. The
remainder of fixes are more platforms or subsystem specific"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: R46000: Fix Micro-assembler field overflow for R4600 V2
MIPS: ptrace: Avoid smp_processor_id() in preemptible code
MIPS: Lemote 2F: cs5536: mfgpt: use raw locks
MIPS: SB1: Fix excessive kernel warnings.
MIPS: RC32434: fix broken PCI resource initialization
MIPS: malta: memory.c: Initialize the 'memsize' variable
MIPS: Fix typo when reporting cache and ftlb errors for ImgTec cores
MIPS: Fix inconsistancy of __NR_Linux_syscalls value
MIPS: Fix branch emulation of branch likely instructions.
MIPS: Fix a typo error in AUDIT_ARCH definition
MIPS: Change type of asid_cache to unsigned long
memblock is now fully integrated into the kernel and is the prefered
method for tracking memory. Rather than reinvent the wheel with
meminfo, migrate to using memblock directly instead of meminfo as
an intermediate.
Acked-by: Jason Cooper <jason@lakedaemon.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Due to a design incompatibility between the PCIe Marvell controller
and the Cortex-A9, stressing PCIe devices with a lot of traffic
quickly causes a deadlock.
One part of the workaround for this is to have all PCIe regions mapped
as strongly-ordered (MT_UNCACHED) instead of the default
MT_DEVICE. While the arch_ioremap_caller() mechanism allows
sub-architecture code to override ioremap(), used to map PCIe memory
regions, there isn't such a mechanism to override the behavior of
pci_ioremap_io().
This commit adds the arch_pci_ioremap_mem_type variable, initialized
to MT_DEVICE by default, and that sub-architecture code can
override. We have chosen to expose a single variable rather than
offering the possibility of overriding the entire pci_ioremap_io(),
because implementing pci_ioremap_io() requires calling functions
(get_mem_type()) that are private to the arch/arm/mm/ code.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This was caught by a panic on Broadcom mobile platforms.
Note that this code is all going away with the pending l2x0 cleanup
series from Russell, but we need this here until that's landed so we
can enable exynos multiplatform.
Signed-off-by: Olof Johansson <olof@lixom.net>
Make it a little clearer what the littleendian access macros in
vdso2c.[ch] actually do. This way they can probably also be moved to
a central location (e.g. tools/include) for the benefit of other host
tools.
We should avoid implementation namespace symbols when writing code
that is compiling for the compiler host, so avoid names starting with
double underscore or underscore-capital.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2cf258df123cb24bad63c274c8563c050547d99d.1401464755.git.luto@amacapital.net
This fixes the following build errors:
/tmp/ccRbZlaA.s: Assembler messages:
/tmp/ccRbZlaA.s:69: Error: selected processor does not support ARM mode `isb '
/tmp/ccRbZlaA.s:75: Error: selected processor does not support ARM mode `isb '
/tmp/ccRbZlaA.s:76: Error: selected processor does not support ARM mode `dsb '
make[1]: *** [arch/arm/mach-exynos/hotplug.o] Error 1
/tmp/ccJEg4jw.s: Assembler messages:
/tmp/ccJEg4jw.s:454: Error: selected processor does not support ARM mode `isb'
/tmp/ccJEg4jw.s:455: Error: selected processor does not support ARM mode `dsb'
/tmp/ccJEg4jw.s:465: Error: selected processor does not support ARM mode `isb'
/tmp/ccJEg4jw.s:474: Error: selected processor does not support ARM mode `isb'
/tmp/ccJEg4jw.s:475: Error: selected processor does not support ARM mode `dsb'
/tmp/ccJEg4jw.s:516: Error: selected processor does not support ARM mode `isb'
/tmp/ccJEg4jw.s:525: Error: selected processor does not support ARM mode `isb'
/tmp/ccJEg4jw.s:526: Error: selected processor does not support ARM mode `dsb'
make[1]: *** [arch/arm/mach-exynos/mcpm-exynos.o] Error 1
Reported-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Tested-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Enable Exynos platform and its related IPs for multi_v7_defconfig.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
This is including fix exynos cpufreq driver compilation with
ARCH_MULTIPLATFORM. Even though this is a work around, this
is required for support exynos multiplatform for a while and
will be updated in near future.
This is based on tags/samsung-exynos.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJTiN4aAAoJEA0Cl+kVi2xqh14P/2Alq0Q4KgPmlsIIOsofADLw
BEig7xQAh9iREkfKLm6Wy1K/OP99jNWdbtZnMF1UK3pTxNVPKluz3ZADdJXJWlmb
UsZcu60l9ZuZ+B4vTUG7jQx2oq6sxo7aLVMyTvT+kWmzZsNWEdqMDzDATgqjPOW8
yGfWg6KwdTkpGbesID6w1h2D611RDIhJU+LpXE+AmyKWX3x/jkOxOU49CY8bKEhw
ZwYw41vjTk8e2U/cBYCjqPhEuUzpFq+mmVQcVDGch/p/TFUA1/SbzuqbVDHVe+yZ
qQHNZREsAWJPFJ22IjUleq/oNLiSVkJVVSkVrfXCwowzzPsw+cLFDuSRaJMROm1a
baR77C0TJREqoXu4FexBSoWLPYbn7aUOlZ3wGmr/7MnPK/bK3e9BGiOsJ8GPIQuF
e2ytESFl15ABduY0lV0H/TtOxjx8JZJf+xxh0Ip0qoE8XNkNxt5cIsMW7LD/NcaE
zUuPoO+w97pUxyVDrTPXakLRdn643PmYRX+rMG8gXP82m/CiCKe+x8znc7BkXIq2
h8rvSLqnWQFYUdEj/0/T8scRT5DyZPjSTDguUj+lhDU2rE0gWNCHEO+2a7Qwm8Js
YfeQ4dd71ZNqzw8Ui62do2HzNGIA/FWPwQBzHeu2tPxVrWOwSQz/zSA4TLwjeB3J
n1Tzm9JOJNwH3FTAcH9k
=SQbr
-----END PGP SIGNATURE-----
Merge tag 'samsung-drivers-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/drivers
Merge "Samsung 2nd drivers for 3.16" from Kukjin Kim:
This is including fix exynos cpufreq driver compilation with
ARCH_MULTIPLATFORM. Even though this is a work around, this
is required for support exynos multiplatform for a while and
will be updated in near future.
This is based on tags/samsung-exynos.
* tag 'samsung-drivers-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung: (24 commits)
cpufreq: exynos: Fix driver compilation with ARCH_MULTIPLATFORM
ARM: EXYNOS: Enable multi-platform build support
ARM: EXYNOS: Consolidate Kconfig entries
ARM: EXYNOS: Add support for EXYNOS5410 SoC
ARM: EXYNOS: Support secondary CPU boot of Exynos3250
ARM: EXYNOS: Add Exynos3250 SoC ID
ARM: EXYNOS: Add 5800 SoC support
ARM: EXYNOS: initial board support for exynos5260 SoC
clk: exynos5250: Add missing sysmmu clocks for DISP and ISP blocks
cpufreq: exynos: Fix the compile error
ARM: S3C24XX: move debug-macro.S into the common space
ARM: S3C24XX: use generic DEBUG_UART_PHY/_VIRT in debug macro
ARM: S3C24XX: trim down debug uart handling
ARM: compressed/head.S: remove s3c24xx special case
ARM: EXYNOS: Remove unnecessary inclusion of cpu.h
ARM: EXYNOS: Migrate Exynos specific macros from plat to mach
ARM: EXYNOS: Remove exynos_subsys registration
ARM: EXYNOS: Remove duplicate lines in Makefile
ARM: EXYNOS: use v7_exit_coherency_flush macro for cache disabling
ARM: dts: Remove g2d_pd node for exynos5420
...
Signed-off-by: Olof Johansson <olof@lixom.net>
- enable mcpm for dual-cluster exynos5800
- since commit 166aaf39 ("ARM: 8029/1: mcpm: Rename the
power_down_finish() functions to be less confusing"),
use new member name wait_for_cpu_powerdown.
This is based on tags/exynos-mcpm.
Note that since the commit 166aaf39 is in rmk tree so this
should be sent to upstream after that in 3.16 merge window.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJTiNzxAAoJEA0Cl+kVi2xqEoQQALNi4AGqI9Gu15tRff3a9gmA
cu7VsnkYWnazUg6bYpYjy1miNbGmXsEErg0lZORkW+RY76UI3kDtvwFCxV1/EnOl
Ue3igxJGg7kWfCHekp2jpWPG4a9eLgCxN1EiF/z3lFJoTT9pksAftR/4DHmJWKwf
HFKq1ogOSz0d0EwCe9keysAIXjK69gfSnoSw6Y9uIGXrpyDiMEUbhhllu+qwYGZy
KQcC+jN4fdDUAKGE/k50EbFEd9DCFd1XxcVYW763prTJXcIP8sFWUfqJdi2kGi8b
hRG8pYlXh2MGI6rzUzWBsxk9nIwzkd5zUkgCAgAoSzmLSolV49Fk6AMdBQPuNKWf
yNhpXwawRaTduVGzdEUl6BAX3E7dnb9MQfwdcvcsos8GuZtoxwzXsTFxu4AtC8kg
izv99RJJ952kj8kjEZjuFx37LgaYWv8+dc0O3Lo5WiW4ibgtkZ2Ek9FvIXYSdnho
5+2l6qsXIV7J/1gJE0WJpBaGN5t4mPh6zokuXREq8RfTICFJqY8JmWrYEVIV/rpB
X3e4sqMtuMB02u+VdJOzoc4krwLF8xokxgo5PJ/3im9i5plzSi9NTJ2r/lOCLmos
aYQn4HIB5h8vYs/ozVncLs7gVNuir4BR8mNREUZTVXbNAPFnd6hXLAA4W6OYpzSG
uXthZ8lup1pH9pGqkvBr
=95LQ
-----END PGP SIGNATURE-----
Merge tag 'exynos-mcpm-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/soc
Merge "Exynos 2nd MCPM updates for v3.16" from Kukjin Kim:
- enable mcpm for dual-cluster exynos5800
- since commit 166aaf39 ("ARM: 8029/1: mcpm: Rename the
power_down_finish() functions to be less confusing"),
use new member name wait_for_cpu_powerdown.
This is based on tags/exynos-mcpm.
Note that since the commit 166aaf39 is in rmk tree so this
should be sent to upstream after that in 3.16 merge window.
* tag 'exynos-mcpm-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: EXYNOS: mcpm rename the power_down_finish
ARM: EXYNOS: Enable mcpm for dual-cluster exynos5800 SoC
Signed-off-by: Olof Johansson <olof@lixom.net>
- add new SoCs support
: exynos3250, 5260, 5410 and 5800
- enable multi-platform on exynos
: consolidate exynos related Kconfig entries
Note that this requires tags/samsung-cleanup and tags/samsung-clk-2
because of mostly migration exynos specific macros into mach-exynos
and exynos related Kconfig entries.
One more merge conflict happens in arch/arm/Kconfig for ARCH_EXYNOS
due to SRAM stuff, even though tried to sort them out. Since just
resolving it would be better I think, please remove ARCH_EXYNOS in
arch/arm/Kconfig when merge conflict happens.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJTiNnJAAoJEA0Cl+kVi2xqw4AP/1h2Rtg5/VXTwspQNicVqmjy
xh1R5HNkCK9J+Vs+hrajFog4lneTuKDimgJqwM4JcMR9Y6KFCVivMZ+rPpaVpa2o
Ly50EgEUAx7iir/PswttUVsSAoP/LXwwP44tWgV6qwV4N2qZ4BPKSAkc+l4icrqv
Zzx4h1sxMKF2oTarDsrBIj7UszZtIB6Dt00ogyEJ2BNgWcWzU51YtWOz6MdC+Eha
Wz2cfnrl2GiSIUC+Y0jRxeNfJyIgVMvrWpC23d5QSO0qx5xDNhSq/+eNXpd4gTGc
aMoBnTgzvymOgLYFkcDCDYxsehXqX1b3hHzu1BfRTWjcA9wmxFhC8eOiMAgDso79
V8L/k5/XVAP34TFzZR1ZD5rCqgBIuDgmeOqyzMAxMLVKTQZrJ/6smKq94FCCD89G
lOtiLmAPems8g7DE2wpWsYiml7TyUHNqgHP4FWJThnERjBa+rDKsyozzgFLJLqD7
/qL+qn6C0COtfDN0SBiIiLyV/0j+DI+tS5uGWgKWxGkRLQev1GyWGsGJZBx1tlJB
v0/aAEAFkTWg9b+dO9OdUXzbGvc3VAQbaiwfCb73vsEibuO3IGpsXsl0d+ChXUkK
XbQW5nNHOCKJj1auAg+4AqWm+evSzjZa/gViUlxE0vpK1Bc8uXZy0sYhQwC0lBaB
KBqHE3ImvgIgPhXa3t5R
=/Vpg
-----END PGP SIGNATURE-----
Merge tag 'samsung-exynos' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/soc
Samsung Exynos updates for 3.16
- add new SoCs support
: exynos3250, 5260, 5410 and 5800
- enable multi-platform on exynos
: consolidate exynos related Kconfig entries
* tag 'samsung-exynos' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung: (22 commits)
ARM: EXYNOS: Enable multi-platform build support
ARM: EXYNOS: Consolidate Kconfig entries
ARM: EXYNOS: Add support for EXYNOS5410 SoC
ARM: EXYNOS: Support secondary CPU boot of Exynos3250
ARM: EXYNOS: Add Exynos3250 SoC ID
ARM: EXYNOS: Add 5800 SoC support
ARM: EXYNOS: initial board support for exynos5260 SoC
cpufreq: exynos: Fix the compile error
ARM: S3C24XX: move debug-macro.S into the common space
ARM: S3C24XX: use generic DEBUG_UART_PHY/_VIRT in debug macro
ARM: S3C24XX: trim down debug uart handling
ARM: compressed/head.S: remove s3c24xx special case
ARM: EXYNOS: Remove unnecessary inclusion of cpu.h
ARM: EXYNOS: Migrate Exynos specific macros from plat to mach
ARM: EXYNOS: Remove exynos_subsys registration
ARM: EXYNOS: Remove duplicate lines in Makefile
ARM: EXYNOS: use v7_exit_coherency_flush macro for cache disabling
ARM: dts: Remove g2d_pd node for exynos5420
ARM: dts: Remove mau_pd node for exynos5420
ARM: exynos_defconfig: enable HS-I2C to fix for mmc partition mount
...
Signed-off-by: Olof Johansson <olof@lixom.net>
* Updated Kconfig DEBUG_QCOM_UARTDM help to include APQ8084 info
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: GPGTools - https://gpgtools.org
iQIcBAABCgAGBQJTiLLlAAoJEF9hYXeAcXzB+WMP/R2hkfj5fMtMmerpAaGt9KXW
qa8qPM+PC0vEhiZSWHTjUsB+FBx6H4AWVu8FfOYaZQAthYPO/gd5fh7Zd8pzqFs4
sJVFyXjFDaEHxbj1aDw0b5R08CLN2nG15K5x8MS8QQDRW3CzE49Nm3rCXQliEtla
V6ID31EOAqXLz9JX0QauSMC2YMkrO0COfR2ERUROg/3cuwFnZPCGmbEnn8UAqtUc
ks7JC4toCbxyoA2KOE420pkQOZYHlNvnZgrCI/ZJsjiOlDq3X8vFCNqTgSzjNdwE
doH5CGsnC+plZ/BDKm9M71htnbCBZe/LzbPrfK15LB6fwpjHZyIfPFfMRUbEA+j9
UIQFlWp0q+1m4YJ3dfmTOpUzq7j+PQLFjDEiXpxVdNKtVKdYAqrh71gzSsZLz0ay
dMYbvuIjGiq9cjQwm4sqT6q4hI3xfNnsUiNFMabhyv1cu5icf+5ysJHWaOHVu1rg
j6+HvZCxDet2/zM4ipty94zxF+8/0NFd2AVEfhYqcM3ZObkHdM48hcyIkGQNOqpR
xnE2YKQAT7wQQq+cwvRqwEk/aK00fVqqU5AWv14gH8ZKLX3+VBN3Q/8YGktyt92L
ML4HziqD10ck+VqWmnKFVNkvh6RBTuZI2Lnz+p56AW9sp3ZceY2a+nbw+5dXsvly
q3ft6A0hQ46Hyi5L7D0q
=xIWF
-----END PGP SIGNATURE-----
Merge tag 'qcom-soc-for-3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/galak/linux-qcom into next/soc
Merge "Qualcomm ARM Based SoC Updates for v3.16-2" from Kumar Gala:
* Updated Kconfig DEBUG_QCOM_UARTDM help to include APQ8084 info
* tag 'qcom-soc-for-3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/galak/linux-qcom:
ARM: debug: qcom: add UART addresses to Kconfig help for APQ8084
Signed-off-by: Olof Johansson <olof@lixom.net>
* Updated MSM8660/MSM8960/MSM8974 dts for various updates or conformitity
to binding specs
* Added APQ8064 SoC and IFC6410 board device tree support
* Added APQ8084 SoC and APQ8084-MTP board device tree support
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: GPGTools - https://gpgtools.org
iQIcBAABCgAGBQJTiLIDAAoJEF9hYXeAcXzBTZkQALpph9EDQz583BLIHabv5lCY
mhRSCBS5j5CscM8o0euaqFcKIyNmFOuPCis2jArMmX9Z8RjRJod6zNzdQW92Kddq
TXMtRo+BX1Jel9Ri5ysXGvDOkUeViUf56PuYy4pqFOEfbeKpH/yfWTAJDt2usIEi
9bfDodwAbbsB9t8IBgTNhyhAmMUqkfvBvtBnG8qAyXBxUkvC5iAjPnWT8LOisrTC
hTIvh8H3acImN4TLlvElvhRDPC7a4rcmF7XdoCjm3F9t9b/7U2mwGCGqZGGal/p7
H7SvAwTfQcNToZeo9RMz+xxq0fT00qLQqnHtx7s0NXXA2XPVQ2RxNH2YUqBUGAEq
EEfXjTh+pBUUfzLmxbBzuD9pNx5JD0rhIKV5sA/pCXabmEFQzXcIeGmKpSapbuOi
7zoqSqaqDlR21qdBuh545E6zIlSE7VPeeG0oqI2bzaG4OHksoRuMJhUzkKikHxQ8
uN+P5lPQudBT6FbncVvpGNuAkTVB6T7Qtl7jzCplLCgZ2T3X1gb0sg5+1R9vqOEz
1n9er/cAiaklv+ca4jsfCuQAkgrxAfqtPGPxyVo8fuaIE+Q97y38yWfCquxl1oGp
vkE9s2TaREWbRCIvOf2Oxm0dFrTWWVjQI1kAq4iSJCDIBS7f1/qXtS0NvrE1E2+e
B+9veuw7/7xIcSZZvb52
=u8HT
-----END PGP SIGNATURE-----
Merge tag 'qcom-dt-for-3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/galak/linux-qcom into next/dt
Merge "Qualcomm ARM Based Device Tree Updates for v3.16-2" from Kumar Gala:
* Updated MSM8660/MSM8960/MSM8974 dts for various updates or conformitity
to binding specs
* Added APQ8064 SoC and IFC6410 board device tree support
* Added APQ8084 SoC and APQ8084-MTP board device tree support
* tag 'qcom-dt-for-3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/galak/linux-qcom:
ARM: dts: qcom: Add APQ8084-MTP board support
ARM: dts: qcom: Add APQ8084 SoC support
ARM: dts: qcom: Add initial APQ8064 SoC and IFC6410 board device trees
ARM: dts: qcom: Update msm8660 device trees
ARM: dts: qcom: Update msm8960 device trees
ARM: dts: qcom: Update msm8974/apq8074 device trees
Signed-off-by: Olof Johansson <olof@lixom.net>
There is very little and maybe practically nothing we can do to recover
from a system where at least one core has reached a timeout during the
whole monarch cores gathering. So panic when that happens.
Link: http://lkml.kernel.org/r/20140523091041.GA21332@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
Arndale-Octa board is always configured to work with trustzone
firmware binary. Added DTS node entry to enable this support.
Signed-off-by: Tushar Behera <tushar.behera@linaro.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
While I play inhouse patches with much memory pressure on qemu-kvm,
3.14 kernel was randomly crashed. The reason was kernel stack overflow.
When I investigated the problem, the callstack was a little bit deeper
by involve with reclaim functions but not direct reclaim path.
I tried to diet stack size of some functions related with alloc/reclaim
so did a hundred of byte but overflow was't disappeard so that I encounter
overflow by another deeper callstack on reclaim/allocator path.
Of course, we might sweep every sites we have found for reducing
stack usage but I'm not sure how long it saves the world(surely,
lots of developer start to add nice features which will use stack
agains) and if we consider another more complex feature in I/O layer
and/or reclaim path, it might be better to increase stack size(
meanwhile, stack usage on 64bit machine was doubled compared to 32bit
while it have sticked to 8K. Hmm, it's not a fair to me and arm64
already expaned to 16K. )
So, my stupid idea is just let's expand stack size and keep an eye
toward stack consumption on each kernel functions via stacktrace of ftrace.
For example, we can have a bar like that each funcion shouldn't exceed 200K
and emit the warning when some function consumes more in runtime.
Of course, it could make false positive but at least, it could make a
chance to think over it.
I guess this topic was discussed several time so there might be
strong reason not to increase kernel stack size on x86_64, for me not
knowing so Ccing x86_64 maintainers, other MM guys and virtio
maintainers.
Here's an example call trace using up the kernel stack:
Depth Size Location (51 entries)
----- ---- --------
0) 7696 16 lookup_address
1) 7680 16 _lookup_address_cpa.isra.3
2) 7664 24 __change_page_attr_set_clr
3) 7640 392 kernel_map_pages
4) 7248 256 get_page_from_freelist
5) 6992 352 __alloc_pages_nodemask
6) 6640 8 alloc_pages_current
7) 6632 168 new_slab
8) 6464 8 __slab_alloc
9) 6456 80 __kmalloc
10) 6376 376 vring_add_indirect
11) 6000 144 virtqueue_add_sgs
12) 5856 288 __virtblk_add_req
13) 5568 96 virtio_queue_rq
14) 5472 128 __blk_mq_run_hw_queue
15) 5344 16 blk_mq_run_hw_queue
16) 5328 96 blk_mq_insert_requests
17) 5232 112 blk_mq_flush_plug_list
18) 5120 112 blk_flush_plug_list
19) 5008 64 io_schedule_timeout
20) 4944 128 mempool_alloc
21) 4816 96 bio_alloc_bioset
22) 4720 48 get_swap_bio
23) 4672 160 __swap_writepage
24) 4512 32 swap_writepage
25) 4480 320 shrink_page_list
26) 4160 208 shrink_inactive_list
27) 3952 304 shrink_lruvec
28) 3648 80 shrink_zone
29) 3568 128 do_try_to_free_pages
30) 3440 208 try_to_free_pages
31) 3232 352 __alloc_pages_nodemask
32) 2880 8 alloc_pages_current
33) 2872 200 __page_cache_alloc
34) 2672 80 find_or_create_page
35) 2592 80 ext4_mb_load_buddy
36) 2512 176 ext4_mb_regular_allocator
37) 2336 128 ext4_mb_new_blocks
38) 2208 256 ext4_ext_map_blocks
39) 1952 160 ext4_map_blocks
40) 1792 384 ext4_writepages
41) 1408 16 do_writepages
42) 1392 96 __writeback_single_inode
43) 1296 176 writeback_sb_inodes
44) 1120 80 __writeback_inodes_wb
45) 1040 160 wb_writeback
46) 880 208 bdi_writeback_workfn
47) 672 144 process_one_work
48) 528 112 worker_thread
49) 416 240 kthread
50) 176 176 ret_from_fork
[ Note: the problem is exacerbated by certain gcc versions that seem to
generate much bigger stack frames due to apparently bad coalescing of
temporaries and generating too many spills. Rusty saw gcc-4.6.4 using
35% more stack on the virtio path than 4.8.2 does, for example.
Minchan not only uses such a bad gcc version (4.6.3 in his case), but
some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
what causes that kernel_map_pages() frame, for example). But we're
clearly getting too close.
The VM code also seems to have excessive stack frames partly for the
same compiler reason, triggered by excessive inlining and lots of
function arguments.
We need to improve on our stack use, but in the meantime let's do this
simple stack increase too. Unlike most earlier reports, there is
nothing simple that stands out as being really horribly wrong here,
apart from the fact that the stack frames are just bigger than they
should need to be. - Linus ]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S Tsirkin <mst@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit 166aaf39 ("ARM: 8029/1: mcpm: Rename the power_down_finish()
functions to be less confusing") changed the name of power_down_finish to
wait_for_cpu_powerdown, so use new member name wait_for_cpu_powerdown.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
The exynos5800 is very similar to exynos5420. We can re-use
the existing MCPM support for exynos5800 for secondary boot
-up and switching.
Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This makes it possible to enable the Exynos platform as part of a
multiplatform kernel.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Instead of repeating the Kconfig entries for every SoC,
move them under ARCH_EXYNOS3, 4 and 5 and move the entries
common to 3, 4 and 5 under ARCH_EXYNOS.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
EXYNOS5410 is SoC in Samsung's Exynos5 SoC series.
Add initial support for this SoC.
Signed-off-by: Tarek Dakhran <t.dakhran@samsung.com>
Signed-off-by: Vyacheslav Tyrtov <v.tyrtov@samsung.com>
Reviewed-by: Tomasz Figa <t.figa@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch fix the offset of CPU boot address and don't
need to send smc call of SMC_CMD_CPU1BOOT command for
secondary CPU boot because Exynos3250 removes WFE in
secure mode.
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Tomasz Figa <t.figa@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch add Exynos3250's SoC ID. Exynos 3250 is SoC that
is based on the 32-bit RISC processor for Smartphone.
Exynos3250 uses Cortex-A7 dual cores and has a target speed
of 1.0GHz.
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Tomasz Figa <t.figa@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Exynos5800 is an octa core SoC which is based on the 5420
platform. This patch adds the basic support for it in the
mach-exynos.
Signed-off-by: Arun Kumar K <arun.kk@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch add basic arch side support for exynos5260 SoC.
Note that this is required to enable build for clock driver.
Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com>
Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch add pmusysreg node for Exynos3250 to access PMU
(Power Management Unit) register in a centralized way using
syscon driver.
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Tomasz Figa <t.figa@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
The vbus-supply property is wrongly updated in the
usbdrd node instead of the usbdrd_phy node. This patch
fixes the same.
Signed-off-by: Arun Kumar K <arun.kk@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
The vbus-supply property is wrongly updated in the
usbdrd node instead of the usbdrd_phy node. This patch
fixes the same.
Signed-off-by: Arun Kumar K <arun.kk@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch adds new exynos3250.dtsi to support Exynos3250 SoC
based on Cortex-A7 dual core and includes following dt nodes:
- GIC interrupt controller
- Pinctrl to control GPIOs
- Clock controller
- CPU information (Cortex-A7 dual core)
- UART to support serial port
- MCT (Multi Core Timer)
- ADC (Analog Digital Converter)
- I2C/SPI bus
- Power domain
- PMU (Performance Monitoring Unit)
- MSHC (Mobile Storage Host Controller)
- PWM (Pluse Width Modulation)
- AMBA bus
- sysram node for SYSRAM memory mapping
Signed-off-by: Tomasz Figa <t.figa@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Hyunhee Kim <hyunhee.kim@samsung.com>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ian Campbell <ijc+devicetree@hellion.org.uk>
Cc: Kumar Gala <galak@codeaurora.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: devicetree@vger.kernel.org
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Add required fixed-regulator for VBUS supply for USB 3.0
controller phy.
Signed-off-by: Vivek Gautam <gautam.vivek@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Enable display controller with timing information for 1080p
panel in Exynos5800 peach-pi board.
Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com>
Reviewed-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Adds support for google peach-pi board having the
Exynos5800 SoC.
Signed-off-by: Arun Kumar K <arun.kk@samsung.com>
Signed-off-by: Doug Anderson <dianders@chromium.org>
Reviewed-by: Tomasz Figa <t.figa@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Most of the nodes of exynos5420 remains same for exynos5800.
So the exynos5420.dtsi is included in exynos5800 and the changed
node properties will be overriden.
Signed-off-by: Arun Kumar K <arun.kk@samsung.com>
Reviewed-by: Tomasz Figa <t.figa@samsung.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
The patch adds the dts file for xyref5260 board which
is based on exynos5260 SoC.
Signed-off-by: Rahul Sharma <rahul.sharma@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Made it as per DT node naming convention <name@reg_addr>.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Reviewed-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Key code macros improve readability on exnos4210-origen,
exynos4412-origen and exynos5250-arndale boards.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
[kgene.kim@samsung.com: squashed similar two patches]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Enabled RTC and WDT nodes on exynos4210-origen and
exynos4412-origen boards.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
[kgene.kim@samsung.com: squashed similar two patches]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Call pcie_bus_configure_settings() on ARM, like for other platforms.
pcie_bus_configure_settings() makes sure the MPS across the bus is uniform
and provides the ability to tune the MRSS and MPS to higher performance
values. This is particularly important for embedded where there is no
firmware to program these PCIe settings for the OS.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
CC: Santosh Shilimkar <santosh.shilimkar@ti.com>
On platforms implementing CPU power management, the CPUidle subsystem
can allow CPUs to enter idle states where local timers logic is lost on power
down. To keep the software timers functional the kernel relies on an
always-on broadcast timer to be present in the platform to relay the
interrupt signalling the timer expiries.
For platforms implementing CPU core gating that do not implement an always-on
HW timer or implement it in a broken way, this patch adds code to initialize
the kernel hrtimer based clock event device upon boot (which can be chosen as
tick broadcast device by the kernel).
It relies on a dynamically chosen CPU to be always powered-up. This CPU then
relays the timer interrupt to CPUs in deep-idle states through its HW local
timer device.
Having a CPU always-on has implications on power management platform
capabilities and makes CPUidle suboptimal, since at least a CPU is kept
always in a shallow idle state by the kernel to relay timer interrupts,
but at least leaves the kernel with a functional system with some working
power management capabilities.
The hrtimer based clock event device is unconditionally registered, but
has the lowest possible rating such that any broadcast-capable HW clock
event device present will be chosen in preference as the tick broadcast
device.
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This corrects a bug that will be introduced in v3.15.
The bug causes audio playback to fail on the Armadillo800 EVA board.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJTh8LKAAoJENfPZGlqN0++qVEP/3dzAzOCnEzjw1zHgnl8OdZI
VWaAPd9mfyhNPk364wt09hy4G4mk99Oz3XwfQJ7mifR9BgT8msIm/oPjVFvZSrdO
ShZvn3o7Rdd43ZTz0QHyKoFbB9/kxQxGasvNDe5Bm5ckIAyqCqA8leWLzRNJ1Rdh
sb7jUgYrgJjQlTy/y88L/vq9GaJzVy1fsVo9VUWwSxoxqiyLKPRvlMxTqucaW1gM
NIUaa1bPzXiC00pQNSMytgSVSCUTbret8yifqTa4VGZH2E7iK5eGjqwe75JqFXzs
Ovrm7lfs8C1/FvMWNYP4qliZku/J2zacAUTKnuEv0g0s0NsJGk4WYTKh2vzr6N+N
XjLblrZ3yz8R3jorPVXBNYCZd7KoZi/9/byhduPyXtZkg4+QSws8vrtRdyX/8GXJ
JM9LsQLTBIXS8nBuxe175BttCZUS8S199CbPaFyZl1KcV+GKIYn9KgcTZEJnZkqq
ZMXeuSlHxYk17zmClO2V/PDqTs9bF7jiK14NpjYczGSvcAaPoGXMHLzem1ob0YUK
sFS4IqCM/Xrw3XDs3mD5pF7SgwdopJEOnQNW1Aqw4P1ixiZWJVb+y48OKHw4SVzn
35cCl/A22VydwwbggU0iEfBoME740jiWXv7G8t4GrxYZ08O/v8pAokLo63jAYhbi
k0pp/Ubl3up/TKw3Vp4E
=aVCn
-----END PGP SIGNATURE-----
Merge tag 'renesas-fixes-for-v3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/fixes-non-critical
Merge "Renesas ARM Based SoC Fixes for v3.16" from Simon Horman:
This corrects a bug that will be introduced in v3.15.
The bug causes audio playback to fail on the Armadillo800 EVA board.
* tag 'renesas-fixes-for-v3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
ARM: shmobile: armadillo800eva: fixup HDMI sound flags setting
Signed-off-by: Olof Johansson <olof@lixom.net>
In this round we have a few nice gems. PR KVM gains initial POWER8 support
as well as LE host awareness, ihe e500 targets can now properly run u-boot,
LE guests now work with PR KVM including KVM hypercalls and HV KVM guests
can now use huge pages.
On top of this there are some bug fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJTiHxMAAoJECszeR4D/txgGggQAL5rYpFkuU8+KiLd7uX4DasT
DYrKxgOX/G8WRI0OJZpZHGdcw4fefBxHjY2ydnCSDk+EH2IPrlF6+aYnCwoarj3I
L2Ji/OEfFZVgQtqNkqbuG5Z8onmrgZ86y5TFZKiMcDZe3zLnuGIP4fCLWEeWalor
JQ9fE0z4WFkCtUD18/S/Cf6bOniGLdphUrmgECo+4SbSxUbixrckBEwuA7TjWure
VrA0UX35l+VoVtYntKrQqYzsf+QSNWPzcDQnhlso+rz5ap6sB6rlC0PO8SHTPjk4
BXvHVvripVss7mGv7vZPizp52WXAFsNnVPj1tNSCu3Tn+gRBB8ChCV8xHIWYex8a
nz8AvdUq1TdWmSy7NwJODjJfclauTVtrhPvrbLNSN3ksSzx4eLh8i+Lf8qe0SW7i
4oTfxG+PSDlF/ub8KX8BVfM0FpyvHdcR1h1ex3szPAQfiaAy+/ce9c3bB0xzXmLx
sN3GX0BocdWejMo1U8+/+CVUngtol81BAb+k4GB5mPjW08u5jFVZu+XwxhK2ACx3
y3hAuaGd1d0YuTztYav7+wuu5gPwhYSA3/tmTR1+0jx9uCQbPpOyPJmprt/9A2DP
TY0T0I3TUJqWZAtHOsY0P0hCGl+gZQJb8KhjVKHC6t6/ab3grMkpHgp9XDVvKpHJ
u9lPDCuQRyJBSy9WI2Qa
=t4YN
-----END PGP SIGNATURE-----
Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-next
Patch queue for ppc - 2014-05-30
In this round we have a few nice gems. PR KVM gains initial POWER8 support
as well as LE host awareness, ihe e500 targets can now properly run u-boot,
LE guests now work with PR KVM including KVM hypercalls and HV KVM guests
can now use huge pages.
On top of this there are some bug fixes.
Conflicts:
include/uapi/linux/kvm.h
On LPAR guest systems Linux enables the shadow SLB to indicate to the
hypervisor a number of SLB entries that always have to be available.
Today we go through this shadow SLB and disable all ESID's valid bits.
However, pHyp doesn't like this approach very much and honors us with
fancy machine checks.
Fortunately the shadow SLB descriptor also has an entry that indicates
the number of valid entries following. During the lifetime of a guest
we can just swap that value to 0 and don't have to worry about the
SLB restoration magic.
While we're touching the code, let's also make it more readable (get
rid of rldicl), allow it to deal with a dynamic number of bolted
SLB entries and only do shadow SLB swizzling on LPAR systems.
Signed-off-by: Alexander Graf <agraf@suse.de>
We didn't make use of SLB entry 0 because ... of no good reason. SLB entry 0
will always be used by the Linux linear SLB entry, so the fact that slbia
does not invalidate it doesn't matter as we overwrite SLB 0 on exit anyway.
Just enable use of SLB entry 0 for our shadow SLB code.
Signed-off-by: Alexander Graf <agraf@suse.de>
The code that delivered a machine check to the guest after handling
it in real mode failed to load up r11 before calling kvmppc_msr_interrupt,
which needs the old MSR value in r11 so it can see the transactional
state there. This adds the missing load.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds workarounds for two hardware bugs in the POWER8 performance
monitor unit (PMU), both related to interrupt generation. The effect
of these bugs is that PMU interrupts can get lost, leading to tools
such as perf reporting fewer counts and samples than they should.
The first bug relates to the PMAO (perf. mon. alert occurred) bit in
MMCR0; setting it should cause an interrupt, but doesn't. The other
bug relates to the PMAE (perf. mon. alert enable) bit in MMCR0.
Setting PMAE when a counter is negative and counter negative
conditions are enabled to cause alerts should cause an alert, but
doesn't.
The workaround for the first bug is to create conditions where a
counter will overflow, whenever we are about to restore a MMCR0
value that has PMAO set (and PMAO_SYNC clear). The workaround for
the second bug is to freeze all counters using MMCR2 before reading
MMCR0.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Current, when testing whether a page is dirty (when constructing the
bitmap for the KVM_GET_DIRTY_LOG ioctl), we test the C (changed) bit
in the HPT entries mapping the page, and if it is 0, we consider the
page to be clean. However, the Power ISA doesn't require processors
to set the C bit to 1 immediately when writing to a page, and in fact
allows them to delay the writeback of the C bit until they receive a
TLB invalidation for the page. Thus it is possible that the page
could be dirty and we miss it.
Now, if there are vcpus running, this is not serious since the
collection of the dirty log is racy already - some vcpu could dirty
the page just after we check it. But if there are no vcpus running we
should return definitive results, in case we are in the final phase of
migrating the guest.
Also, if the permission bits in the HPTE don't allow writing, then we
know that no CPU can set C. If the HPTE was previously writable and
the page was modified, any C bit writeback would have been flushed out
by the tlbie that we did when changing the HPTE to read-only.
Otherwise we need to do a TLB invalidation even if the C bit is 0, and
then check the C bit.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
The dirty map that we construct for the KVM_GET_DIRTY_LOG ioctl has
one bit per system page (4K/64K). Currently, we only set one bit in
the map for each HPT entry with the Change bit set, even if the HPT is
for a large page (e.g., 16MB). Userspace then considers only the
first system page dirty, though in fact the guest may have modified
anywhere in the large page.
To fix this, we make kvm_test_clear_dirty() return the actual number
of pages that are dirty (and rename it to kvm_test_clear_dirty_npages()
to emphasize that that's what it returns). In kvmppc_hv_get_dirty_log()
we then set that many bits in the dirty map.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently, when a huge page is faulted in for a guest, we select the
rmap chain to insert the HPTE into based on the guest physical address
that the guest tried to access. Since there is an rmap chain for each
system page, there are many rmap chains for the area covered by a huge
page (e.g. 256 for 16MB pages when PAGE_SIZE = 64kB), and the huge-page
HPTE could end up in any one of them.
For consistency, and to make the huge-page HPTEs easier to find, we now
put huge-page HPTEs in the rmap chain corresponding to the base address
of the huge page.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
The global_invalidates() function contains a check that is intended
to tell whether we are currently executing in the context of a hypercall
issued by the guest. The reason is that the optimization of using a
local TLB invalidate instruction is only valid in that context. The
check was testing local_paca->kvm_hstate.kvm_vcore, which gets set
when entering the guest but no longer gets cleared when exiting the
guest. To fix this, we use the kvm_vcpu field instead, which does
get cleared when exiting the guest, by the kvmppc_release_hwthread()
calls inside kvmppc_run_core().
The effect of having the check wrong was that when kvmppc_do_h_remove()
got called from htab_write() on the destination machine during a
migration, it cleared the current cpu's bit in kvm->arch.need_tlb_flush.
This meant that when the guest started running in the destination VM,
it may miss out on doing a complete TLB flush, and therefore may end
up using stale TLB entries from a previous guest that used the same
LPID value.
This should make migration more reliable.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Commit b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8
SPRs") added a definition of KVM_REG_PPC_WORT with the same register
number as the existing KVM_REG_PPC_VRSAVE (though in fact the
definitions are not identical because of the different register sizes.)
For clarity, this moves KVM_REG_PPC_WORT to the next unused number,
and also adds it to api.txt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
We worked around some nasty KVM magic page hcall breakages:
1) NX bit not honored, so ignore NX when we detect it
2) LE guests swizzle hypercall instruction
Without these fixes in place, there's no way it would make sense to expose kvm
hypercalls to a guest. Chances are immensely high it would trip over and break.
So add a new CAP that gives user space a hint that we have workarounds for the
bugs above in place. It can use those as hint to disable PV hypercalls when
the guest CPU is anything POWER7 or higher and the host does not have fixes
in place.
Signed-off-by: Alexander Graf <agraf@suse.de>
When we reset the in-kernel MPIC controller, we forget to reset some hidden
state such as destmask and output. This state is usually set when the guest
writes to the IDR register for a specific IRQ line.
To make sure we stay in sync and don't forget hidden state, treat reset of
the IDR register as a simple write of the IDR register. That automatically
updates all the hidden state as well.
Reported-by: Paul Janzen <pcj@pauljanzen.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
There are LE Linux guests out there that don't handle hypercalls correctly.
Instead of interpreting the instruction stream from device tree as big endian
they assume it's a little endian instruction stream and fail.
When we see an illegal instruction from such a byte reversed instruction stream,
bail out graciously and just declare every hcall as error.
Signed-off-by: Alexander Graf <agraf@suse.de>
We get an array of instructions from the hypervisor via device tree that
we write into a buffer that gets executed whenever we want to make an
ePAPR compliant hypercall.
However, the hypervisor passes us these instructions in BE order which
we have to manually convert to LE when we want to run them in LE mode.
With this fixup in place, I can successfully run LE kernels with KVM
PV enabled on PR KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Use make_dsisr instead of open coding it. This also have
the added benefit of handling alignment interrupt on additional
instructions.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Although it's optional, IBM POWER cpus always had DAR value set on
alignment interrupt. So don't try to compute these values.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Because old kernels enable the magic page and then choke on NXed trampoline
code we have to disable NX by default in KVM when we use the magic page.
However, since commit b18db0b8 we have successfully fixed that and can now
leave NX enabled, so tell the hypervisor about this.
Signed-off-by: Alexander Graf <agraf@suse.de>
Old guests try to use the magic page, but map their trampoline code inside
of an NX region.
Since we can't fix those old kernels, try to detect whether the guest is sane
or not. If not, just disable NX functionality in KVM so that old guests at
least work at all. For newer guests, add a bit that we can set to keep NX
functionality available.
Signed-off-by: Alexander Graf <agraf@suse.de>
On recent IBM Power CPUs, while the hashed page table is looked up using
the page size from the segmentation hardware (i.e. the SLB), it is
possible to have the HPT entry indicate a larger page size. Thus for
example it is possible to put a 16MB page in a 64kB segment, but since
the hash lookup is done using a 64kB page size, it may be necessary to
put multiple entries in the HPT for a single 16MB page. This
capability is called mixed page-size segment (MPSS). With MPSS,
there are two relevant page sizes: the base page size, which is the
size used in searching the HPT, and the actual page size, which is the
size indicated in the HPT entry. [ Note that the actual page size is
always >= base page size ].
We use "ibm,segment-page-sizes" device tree node to advertise
the MPSS support to PAPR guest. The penc encoding indicates whether
we support a specific combination of base page size and actual
page size in the same segment. We also use the penc value in the
LP encoding of HPTE entry.
This patch exposes MPSS support to KVM guest by advertising the
feature via "ibm,segment-page-sizes". It also adds the necessary changes
to decode the base page size and the actual page size correctly from the
HPTE entry.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Today when KVM tries to reserve memory for the hash page table it
allocates from the normal page allocator first. If that fails it
falls back to CMA's reserved region. One of the side effects of
this is that we could end up exhausting the page allocator and
get linux into OOM conditions while we still have plenty of space
available in CMA.
This patch addresses this issue by first trying hash page table
allocation from CMA's reserved region before falling back to the normal
page allocator. So if we run out of memory, we really are out of memory.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
POWER8 introduces transactional memory which brings along a number of new
registers and MSR bits.
Implementing all of those is a pretty big headache, so for now let's at least
emulate enough to make Linux's context switching code happy.
Signed-off-by: Alexander Graf <agraf@suse.de>
POWER8 introduces a new facility called the "Event Based Branch" facility.
It contains of a few registers that indicate where a guest should branch to
when a defined event occurs and it's in PR mode.
We don't want to really enable EBB as it will create a big mess with !PR guest
mode while hardware is in PR and we don't really emulate the PMU anyway.
So instead, let's just leave it at emulation of all its registers.
Signed-off-by: Alexander Graf <agraf@suse.de>
POWER8 implements a new register called TAR. This register has to be
enabled in FSCR and then from KVM's point of view is mere storage.
This patch enables the guest to use TAR.
Signed-off-by: Alexander Graf <agraf@suse.de>
POWER8 introduced a new interrupt type called "Facility unavailable interrupt"
which contains its status message in a new register called FSCR.
Handle these exits and try to emulate instructions for unhandled facilities.
Follow-on patches enable KVM to expose specific facilities into the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
In parallel to the Processor ID Register (PIR) threaded POWER8 also adds a
Thread ID Register (TIR). Since PR KVM doesn't emulate more than one thread
per core, we can just always expose 0 here.
Signed-off-by: Alexander Graf <agraf@suse.de>
When we expose a POWER8 CPU into the guest, it will start accessing PMU SPRs
that we don't emulate. Just ignore accesses to them.
Signed-off-by: Alexander Graf <agraf@suse.de>
With the previous patches applied, we can now successfully use PR KVM on
little endian hosts which means we can now allow users to select it.
However, HV KVM still needs some work, so let's keep the kconfig conflict
on that one.
Signed-off-by: Alexander Graf <agraf@suse.de>
When the host CPU we're running on doesn't support dcbz32 itself, but the
guest wants to have dcbz only clear 32 bytes of data, we loop through every
executable mapped page to search for dcbz instructions and patch them with
a special privileged instruction that we emulate as dcbz32.
The only guests that want to see dcbz act as 32byte are book3s_32 guests, so
we don't have to worry about little endian instruction ordering. So let's
just always search for big endian dcbz instructions, also when we're on a
little endian host.
Signed-off-by: Alexander Graf <agraf@suse.de>
The shared (magic) page is a data structure that contains often used
supervisor privileged SPRs accessible via memory to the user to reduce
the number of exits we have to take to read/write them.
When we actually share this structure with the guest we have to maintain
it in guest endianness, because some of the patch tricks only work with
native endian load/store operations.
Since we only share the structure with either host or guest in little
endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv.
For booke, the shared struct stays big endian. For book3s_64 hv we maintain
the struct in host native endian, since it never gets shared with the guest.
For book3s_64 pr we introduce a variable that tells us which endianness the
shared struct is in and route every access to it through helper inline
functions that evaluate this variable.
Signed-off-by: Alexander Graf <agraf@suse.de>
We expose a blob of hypercall instructions to user space that it gives to
the guest via device tree again. That blob should contain a stream of
instructions necessary to do a hypercall in big endian, as it just gets
passed into the guest and old guests use them straight away.
Signed-off-by: Alexander Graf <agraf@suse.de>
When the guest does an RTAS hypercall it keeps all RTAS variables inside a
big endian data structure.
To make sure we don't have to bother about endianness inside the actual RTAS
handlers, let's just convert the whole structure to host endian before we
call our RTAS handlers and back to big endian when we return to the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
The HTAB on PPC is always in big endian. When we access it via hypercalls
on behalf of the guest and we're running on a little endian host, we need
to make sure we swap the bits accordingly.
Signed-off-by: Alexander Graf <agraf@suse.de>
The default MSR when user space does not define anything should be identical
on little and big endian hosts, so remove MSR_LE from it.
Signed-off-by: Alexander Graf <agraf@suse.de>
The "shadow SLB" in the PACA is shared with the hypervisor, so it has to
be big endian. We access the shadow SLB during world switch, so let's make
sure we access it in big endian even when we're on a little endian host.
Signed-off-by: Alexander Graf <agraf@suse.de>
The HTAB is always big endian. We access the guest's HTAB using
copy_from/to_user, but don't yet take care of the fact that we might
be running on an LE host.
Wrap all accesses to the guest HTAB with big endian accessors.
Signed-off-by: Alexander Graf <agraf@suse.de>
The HTAB is always big endian. We access the guest's HTAB using
copy_from/to_user, but don't yet take care of the fact that we might
be running on an LE host.
Wrap all accesses to the guest HTAB with big endian accessors.
Signed-off-by: Alexander Graf <agraf@suse.de>
Commit 9308ab8e2d made C/R HTAB updates go byte-wise into the target HTAB.
However, it didn't update the guest's copy of the HTAB, but instead the
host local copy of it.
Write to the guest's HTAB instead.
Signed-off-by: Alexander Graf <agraf@suse.de>
CC: Paul Mackerras <paulus@samba.org>
Acked-by: Paul Mackerras <paulus@samba.org>
This patch make sure we inherit the LE bit correctly in different case
so that we can run Little Endian distro in PR mode
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The dcbtls instruction is able to lock data inside the L1 cache.
We don't want to give the guest actual access to hardware cache locks,
as that could influence other VMs on the same system. But we can tell
the guest that its locking attempt failed.
By implementing the instruction we at least don't give the guest a
program exception which it definitely does not expect.
Signed-off-by: Alexander Graf <agraf@suse.de>
The L1 instruction cache control register contains bits that indicate
that we're still handling a request. Mask those out when we set the SPR
so that a read doesn't assume we're still doing something.
Signed-off-by: Alexander Graf <agraf@suse.de>
The kfree() function already NULL checks the parameter so remove the
redundant NULL checks before kfree() calls in arch/mips/kvm/.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The logging from MIPS KVM is fairly noisy with kvm_info() in places
where it shouldn't be, such as on VM creation and migration to a
different CPU. Replace these kvm_info() calls with kvm_debug().
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_debug() uses pr_debug() which is already compiled out in the absence
of a DEBUG define, so remove the unnecessary ifdef DEBUG lines around
kvm_debug() calls which are littered around arch/mips/kvm/.
As well as generally cleaning up, this prevents future bit-rot due to
DEBUG not being commonly used.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix build errors when DEBUG is defined in arch/mips/kvm/.
- The DEBUG code in kvm_mips_handle_tlbmod() was missing some variables.
- The DEBUG code in kvm_mips_host_tlb_write() was conditional on an
undefined "debug" variable.
- The DEBUG code in kvm_mips_host_tlb_inv() accessed asid_map directly
rather than using kvm_mips_get_user_asid(). Also fixed brace
placement.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The kvm_mips_comparecount_func() and kvm_mips_comparecount_wakeup()
functions are only used within arch/mips/kvm/kvm_mips.c, so make them
static.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Expose the KVM guest CP0_Count frequency to userland via a new
KVM_REG_MIPS_COUNT_HZ register accessible with the KVM_{GET,SET}_ONE_REG
ioctls.
When the frequency is altered the bias is adjusted such that the guest
CP0_Count doesn't jump discontinuously or lose any timer interrupts.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Expose two new virtual registers to userland via the
KVM_{GET,SET}_ONE_REG ioctls.
KVM_REG_MIPS_COUNT_CTL is for timer configuration fields and just
contains a master disable count bit. This can be used by userland to
freeze the timer in order to read a consistent state from the timer
count value and timer interrupt pending bit. This cannot be done with
the CP0_Cause.DC bit because the timer interrupt pending bit (TI) is
also in CP0_Cause so it would be impossible to stop the timer without
also risking a race with an hrtimer interrupt and having to explicitly
check whether an interrupt should have occurred.
When the timer is re-enabled it resumes without losing time, i.e. the
CP0_Count value jumps to what it would have been had the timer not been
disabled, which would also be impossible to do from userland with
CP0_Cause.DC. The timer interrupt also cannot be lost, i.e. if a timer
interrupt would have occurred had the timer not been disabled it is
queued when the timer is re-enabled.
This works by storing the nanosecond monotonic time when the master
disable is set, and using it for various operations instead of the
current monotonic time (e.g. when recalculating the bias when the
CP0_Count is set), until the master disable is cleared again, i.e. the
timer state is read/written as it would have been at that time. This
state is exposed to userland via the read-only KVM_REG_MIPS_COUNT_RESUME
virtual register so that userland can determine the exact time the
master disable took effect.
This should allow userland to atomically save the state of the timer,
and later restore it.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The KVM_HOST_FREQ Kconfig symbol was used by KVM guest kernels to
override the timer frequency calculation to a value based on the host
frequency. Now that the KVM timer emulation is implemented independent
of the host timer frequency and defaults to 100MHz, adjust the working
of CONFIG_KVM_HOST_FREQ to match.
The Kconfig symbol now specifies the guest timer frequency directly, and
has been renamed accordingly to KVM_GUEST_TIMER_FREQ. It now defaults to
100MHz too and the help text is updated to make it clear that a zero
value will allow the normal timer frequency calculation to take place
(based on the emulated RTC).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Previously the emulation of the CPU timer was just enough to get a Linux
guest running but some shortcuts were taken:
- The guest timer interrupt was hard coded to always happen every 10 ms
rather than being timed to when CP0_Count would match CP0_Compare.
- The guest's CP0_Count register was based on the host's CP0_Count
register. This isn't very portable and fails on cores without a
CP_Count register implemented such as Ingenic XBurst. It also meant
that the guest's CP0_Cause.DC bit to disable the CP0_Count register
took no effect.
- The guest's CP0_Count register was emulated by just dividing the
host's CP0_Count register by 4. This resulted in continuity problems
when used as a clock source, since when the host CP0_Count overflows
from 0x7fffffff to 0x80000000, the guest CP0_Count transitions
discontinuously from 0x1fffffff to 0xe0000000.
Therefore rewrite & fix emulation of the guest timer based on the
monotonic kernel time (i.e. ktime_get()). Internally a 32-bit count_bias
value is added to the frequency scaled nanosecond monotonic time to get
the guest's CP0_Count. The frequency of the timer is initialised to
100MHz and cannot yet be changed, but a later patch will allow the
frequency to be configured via the KVM_{GET,SET}_ONE_REG ioctl
interface.
The timer can now be stopped via the CP0_Cause.DC bit (by the guest or
via the KVM_SET_ONE_REG ioctl interface), at which point the current
CP0_Count is stored and can be read directly. When it is restarted the
bias is recalculated such that the CP0_Count value is continuous.
Due to the nature of hrtimer interrupts any read of the guest's
CP0_Count register while it is running triggers a check for whether the
hrtimer has expired, so that the guest/userland cannot observe the
CP0_Count passing CP0_Compare without queuing a timer interrupt. This is
also taken advantage of when stopping the timer to ensure that a pending
timer interrupt is queued.
This replaces the implementation of:
- Guest read of CP0_Count
- Guest write of CP0_Count
- Guest write of CP0_Compare
- Guest write of CP0_Cause
- Guest read of HWR 2 (CC) with RDHWR
- Host read of CP0_Count via KVM_GET_ONE_REG ioctl interface
- Host write of CP0_Count via KVM_SET_ONE_REG ioctl interface
- Host write of CP0_Compare via KVM_SET_ONE_REG ioctl interface
- Host write of CP0_Cause via KVM_SET_ONE_REG ioctl interface
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a VCPU is scheduled in on a different CPU, refresh the hrtimer used
for emulating count/compare so that it gets migrated to the same CPU.
This should prevent a timer interrupt occurring on a different CPU to
where the guest it relates to is running, which would cause the guest
timer interrupt not to be delivered until after the next guest exit.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The hrtimer callback for guest timer timeouts sets the guest's
CP0_Cause.TI bit to indicate to the guest that a timer interrupt is
pending, however there is no mutual exclusion implemented to prevent
this occurring while the guest's CP0_Cause register is being
read-modify-written elsewhere.
When this occurs the setting of the CP0_Cause.TI bit is undone and the
guest misses the timer interrupt and doesn't reprogram the CP0_Compare
register for the next timeout. Currently another timer interrupt will be
triggered again in another 10ms anyway due to the way timers are
emulated, but after the MIPS timer emulation is fixed this would result
in Linux guest time standing still and the guest scheduler not being
invoked until the guest CP0_Count has looped around again, which at
100MHz takes just under 43 seconds.
Currently this is the only asynchronous modification of guest registers,
therefore it is fixed by adjusting the implementations of the
kvm_set_c0_guest_cause(), kvm_clear_c0_guest_cause(), and
kvm_change_c0_guest_cause() macros which are used for modifying the
guest CP0_Cause register to use ll/sc to ensure atomic modification.
This should work in both UP and SMP cases without requiring interrupts
to be disabled.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When about to run the guest, deliver guest interrupts after disabling
host interrupts. This should prevent an hrtimer interrupt from being
handled after delivering guest interrupts, and therefore not delivering
the guest timer interrupt until after the next guest exit.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
HWREna register. This is so that userland can save and restore its
value so that RDHWR instructions don't have to be emulated by the guest.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
UserLocal register. This is so that userland can save and restore its
value.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
Count and Compare registers. These registers are special in that writing
to them has side effects (adjusting the time until the next timer
interrupt) and reading of Count depends on the time. Therefore add a
couple of callbacks so that different implementations (trap & emulate or
VZ) can implement them differently depending on what the hardware
provides.
The trap & emulate versions mostly duplicate what happens when a T&E
guest reads or writes these registers, so it inherits the same
limitations which can be fixed in later patches.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the KVM_{GET,SET}_ONE_REG MIPS register id definitions out of
kvm_mips.c to kvm_host.h so that they can be shared between multiple
source files. This allows register access to be indirected depending on
the underlying implementation (trap & emulate or VZ).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Contrary to the comment, the guest CP0_EPC register cannot be set via
kvm_regs, since it is distinct from the guest PC. Add the EPC register
to the KVM_{GET,SET}_ONE_REG ioctl interface.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When MIPS KVM needs to write a TLB entry for the guest it reads the
CP0_Random register, uses it to generate the CP_Index, and writes the
TLB entry using the TLBWI instruction (tlb_write_indexed()).
However there's an instruction for that, TLBWR (tlb_write_random()) so
use that instead.
This happens to also fix an issue with Ingenic XBurst cores where the
same TLB entry is replaced each time preventing forward progress on
stores due to alternating between TLB load misses for the instruction
fetch and TLB store misses.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
MIPS KVM uses mips32_SyncICache to synchronise the icache with the
dcache after dynamically modifying guest instructions or writing guest
exception vector. However this uses rdhwr to get the SYNCI step, which
causes a reserved instruction exception on Ingenic XBurst cores.
It would seem to make more sense to use local_flush_icache_range()
instead which does the same thing but is more portable.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Export the local_flush_icache_range function pointer for GPL modules so
that it can be used by KVM for syncing the icache after binary
translation of trapping instructions.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Each MIPS KVM guest has its own copy of the KVM exception vector. This
contains the TLB refill exception handler at offset 0x000, the general
exception handler at offset 0x180, and interrupt exception handlers at
offset 0x200 in case Cause_IV=1. A common handler is copied to offset
0x2000 and offset 0x3000 is used for temporarily storing k1 during entry
from guest.
However the amount of memory allocated for this purpose is calculated as
0x200 rounded up to the next page boundary, which is insufficient if 4KB
pages are in use. This can lead to the common handler at offset 0x2000
being overwritten and infinitely recursive exceptions on the next exit
from the guest.
Increase the minimum size from 0x200 to 0x4000 to cover the full use of
the page.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2. Fix flag check for gdb support
3. Remove unnecessary vcpu start
4. Remove code duplication for sigp interrupts
5. Better DAT handling for the TPROT instruction
6. Correct addressing exception for standby memory
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJTiD4+AAoJEBF7vIC1phx88poQALJkg1EZLePQBwuqe3XKeUnV
AxbtP1T5++Pv+JpXj/mTmI4ToE3U+t66iObCZ8GAWZtXNr47yJQRetTqSU1G8BFe
eLpuWhyaQD31awp5M9sWIAzNjqnO4vR0f6kbVeoXbRmPzdZY19QaQb5QIRFMA/GT
hSYlNoftRfM/e2ZRFUvqiMIKFHs2r/f9T+7SMRhu3YeZo42N3KOnIwFQrLCoaP+x
8GBmxfl0OaEU9Rer6/MvyJVdMHD5ap4UyreaB5PLVtUNGIrcAJ2vjQGbF78bRonA
9SwJ0mMEfzJoH6gZXDEby1e/p7MJ80wHf9jMoy2u2qtyvNuQIyBWS1eco7LH5dpd
pxXcViLC8proiXfFK/6N4ra5cr3auMAmA5iX1LjD+ZxhAjlA+WTcC31jSfueDn33
mBpDASLcoJXbPoEhWiAjP8wkgfUAQnr5CPu5lrmloOrj5llYx0FL2gDhX5dXe7op
5orTAcEtfSxqosK/Av2oQnmUU61BkI4UuBwCbpqM/1w8oKcYX4jUE5Yx/Kmej80Z
MtpYUnBDmJaYPogddr8q5QcnRyLEoRUKfKu5lDnlLaP1oi7vr+WBR+BkNLl0/o7o
+XVdruc7OtC/Ahtep8unkLV5C8j1siXergjvHXhHjpGNsIQ2tfD8LO55ST4vYkUd
GAosCgM/6wt8YmK67qc+
=84hG
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-20140530' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
1. Several minor fixes and cleanups for KVM:
2. Fix flag check for gdb support
3. Remove unnecessary vcpu start
4. Remove code duplication for sigp interrupts
5. Better DAT handling for the TPROT instruction
6. Correct addressing exception for standby memory
Based on original patch from Jeng-fang (Nick) Wang
When standby memory is specified for a guest Linux, but no virtual memory has
been allocated on the Qemu host backing that guest, the guest memory detection
process encounters a memory access exception which is not thrown from the KVM
handle_tprot() instruction-handler function. The access exception comes from
sie64a returning EFAULT, which then passes an addressing exception to the guest.
Unfortunately this does not the proper PSW fixup (nullifying vs.
suppressing) so the guest will get a fault for the wrong address.
Let's just intercept the tprot instruction all the time to do the right thing
and not go the page fault handler path for standby memory. tprot is only used
by Linux during startup so some exits should be ok.
Without this patch, standby memory cannot be used with KVM.
Signed-off-by: Nick Wang <jfwang@us.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch removes the start of a VCPU when delivering a RESTART interrupt.
Interrupt delivery is called from kvm_arch_vcpu_ioctl_run. So the VCPU is
already considered started - no need to call kvm_s390_vcpu_start. This function
will early exit anyway.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch fixes a minor bug when updating the guest debug settings.
We should check the given debug flags, not the already set ones.
Doesn't do any harm but too many (for now unused) flags could be set internally
without error.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
We have all the logic to inject interrupts available in
kvm_s390_inject_vcpu(), so let's use it instead of
injecting irqs manually to the list in sigp code.
SIGP stop is special because we have to check the
action_flags before injecting the interrupt. As
the action_flags are not available in kvm_s390_inject_vcpu()
we leave the code for the stop order code untouched for now.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The TPROT instruction can be used to check the accessability of storage
for any kind of logical addresses. So far, our handler only supported
real addresses. This patch now also enables support for addresses that
have to be translated via DAT first. And while we're at it, change the
code to use the common KVM function gfn_to_hva_prot() to check for the
validity and writability of the memory page.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
This patch adds a function for translating logical guest addresses into
physical guest addresses without touching the memory at the given location.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Pull ARM fixes from Russell King:
"The usual random collection of relatively small ARM fixes"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8063/1: bL_switcher: fix individual online status reporting of removed CPUs
ARM: 8064/1: fix v7-M signal return
ARM: 8057/1: amba: Add Qualcomm vendor ID.
ARM: 8052/1: unwind: Fix handling of "Pop r4-r[4+nnn],r14" opcode
ARM: 8051/1: put_user: fix possible data corruption in put_user
ARM: 8048/1: fix v7-M setup stack location
When configure kprobe events of ftrace with "stacktrace" option enabled
in arm, there is no stacktrace was recorded after the kprobe event was
triggered. The root cause is no save_stack_trace_regs() function implemented.
Implement the save_stack_trace_regs() function in arm, then ftrace will
call this architecture-related function to record the stacktrace into
ring buffer.
After this fix, stacktrace can be recorded, for example:
# mount -t debugfs nodev /sys/kernel/debug
# echo "p:netrx net_rx_action" >> /sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/netrx/enable
# echo 1 > /sys/kernel/debug/tracing/options/stacktrace
# echo 1 > /sys/kernel/debug/tracing/tracing_on
# ping 127.0.0.1 -c 1
# echo 0 > /sys/kernel/debug/tracing/tracing_on
# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 12/12 #P:1
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<------ missing some entries ---------------->
ping-1200 [000] dNs1 667.603250: netrx: (net_rx_action+0x0/0x1f8)
ping-1200 [000] dNs1 667.604738: <stack trace>
=> net_rx_action
=> do_softirq
=> local_bh_enable
=> ip_finish_output
=> ip_output
=> ip_local_out
=> ip_send_skb
=> ip_push_pending_frames
=> raw_sendmsg
=> inet_sendmsg
=> sock_sendmsg
=> SyS_sendto
=> ret_fast_syscall
Signed-off-by: Lin Yongting <linyongting@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Support for ARM710 CPUs was removed in v3.5. Now remove the last code
depending on its Kconfig macro.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
We will reach fixup handler when one thread(say cpu0) caused an undefined exception, while another thread(say cpu1) is unmmaping the page.
Fixup handler returns to the next userspace instruction which has caused the undef execption, rather than going to the same instruction.
ARM ARM says that after undefined exception, the PC will be pointing
to the next instruction. ie +4 offset in case of ARM and +2 in case of Thumb
And there is no correction offset passed to vector_stub in case of
undef exception.
File: arch/arm/kernel/entry-armv.S +1085
vector_stub und, UND_MODE
During an undefined exception, in normal scenario(ie when ldrt
instruction does not cause an abort) after resorting the context in
VFP hardware, the PC is modified as show below before jumping to
ret_from_exception which is in r9.
File: arch/arm/vfp/vfphw.S +169
@ The context stored in the VFP hardware is up to date with this thread
vfp_hw_state_valid:
tst r1, #FPEXC_EX
bne process_exception @ might as well handle the pending
@ exception before retrying branch
@ out before setting an FPEXC that
@ stops us reading stuff
VFPFMXR FPEXC, r1 @ Restore FPEXC last
sub r2, r2, #4 @ Retry current instruction - if Thumb
str r2, [sp, #S_PC] @ mode it's two 16-bit instructions,
@ else it's one 32-bit instruction, so
@ always subtract 4 from the following
@ instruction address.
But if ldrt results in an abort, we reach the fixup handler and return
to ret_from_execption without correcting the pc.
This patch modifes the fixup handler to re-execute the same instruction which caused undefined execption.
Signed-off-by: Vinayak Menon <vinayakm.list@gmail.com>
Signed-off-by: Arun KS <getarunks@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
asm-generic offers an atomic-add based rwsem implementation, which
can avoid the need for heavier, spinlock-based synchronisation on the
fast path.
This patch makes use of the optimised implementation for ARM CPUs.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
As we have now removed all instances of the L2C-310 having its cache
size "modified" via platform/SoC code, discourage new cases showing
up by printing a warning.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
We no longer need or require the .set_debug method; we handle everything
it used to do via the .write_sec method instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
L2X0_AUX_CTRL_MASK is not useful for PL310s. It would be better if
people thought about their value for this rather than cargo-cult
programming.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove the explicit call to l2x0_of_init(), converting to the generic
infrastructure instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The cache size should already be present in the L2 cache auxiliary
control register: it is part of the integration process to configure
the hardware IP. Most platforms get this right, yet still many
cargo-cult program, and assume that they always need specifying to
the L2 cache code. Remove them so we can find out which really need
this.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove the explicit call to l2x0_of_init(), converting to the generic
infrastructure instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
It is beneficial to have the L2 cache up and running earlier in the
system boot. Not only will this allow for simpler code when we come to
enable some features, but it also means that we get a more accurate
bogomips value for the udelay() loop. Calibrating the loop with the
L2 cache off, and then running with the L2 cache on is not the best
idea.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
ux500 can't change the auxiliary control register, so there's no point
passing values to try and modify it to the l2x0 init functions.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The cache size should already be present in the L2 cache auxiliary
control register: it is part of the integration process to configure
the hardware IP. Most platforms get this right, yet still many
cargo-cult program, and assume that they always need specifying to
the L2 cache code. Remove them so we can find out which really need
this.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
ux500 can't write to any of the secure registers on the L2C controllers,
so provide a dummy handler which ignores all writes.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove the explicit call to l2x0_of_init(), converting to the generic
infrastructure instead.
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Reviewed-by: Joseph Lo <josephl@nvidia.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The cache size should already be present in the L2 cache auxiliary
control register: it is part of the integration process to configure
the hardware IP. Most platforms get this right, yet still many
cargo-cult program, and assume that they always need specifying to
the L2 cache code. Remove them so we can find out which really need
this.
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove the explicit call to l2x0_of_init(), converting to the generic
infrastructure instead. We can remove the .init_machine as it becomes
the same as the generic version.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The cache size should already be present in the L2 cache auxiliary
control register: it is part of the integration process to configure
the hardware IP. Most platforms get this right, yet still many
cargo-cult program, and assume that they always need specifying to
the L2 cache code. Remove them so we can find out which really need
this.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove the explicit call to l2x0_of_init(), converting to the generic
infrastructure instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The cache size should already be present in the L2 cache auxiliary
control register: it is part of the integration process to configure
the hardware IP. Most platforms get this right, yet still many
cargo-cult program, and assume that they always need specifying to
the L2 cache code. Remove them so we can find out which really need
this.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove the explicit call to l2x0_of_init(), converting to the generic
infrastructure instead. This also allows us to eliminate the
.init_machine function as this becomes the same as the generic version.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add better commentry about the L2 cache requirements on these platforms.
Unfortunately, the auxiliary control register is not pre-set to indicate
the correct cache parameters, so we have to manually program these.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove the explicit call to l2x0_of_init(), converting to the generic
infrastructure instead. Along with this change, we can delete l2x0.c
from prima2.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>