Under CONFIG_STRICT_DEVMEM, reading System RAM through /dev/mem is
disallowed. However, on x86, the first 1MB was always allowed for BIOS
and similar things, regardless of it actually being System RAM. It was
possible for heap to end up getting allocated in low 1MB RAM, and then
read by things like x86info or dd, which would trip hardened usercopy:
usercopy: kernel memory exposure attempt detected from ffff880000090000 (dma-kmalloc-256) (4096 bytes)
This changes the x86 exception for the low 1MB by reading back zeros for
System RAM areas instead of blindly allowing them. More work is needed to
extend this to mmap, but currently mmap doesn't go through usercopy, so
hardened usercopy won't Oops the kernel.
Reported-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Tested-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
virt_xxx memory barriers are implemented trivially using the low-level
__smp_xxx macros, __smp_xxx is equal to a compiler barrier for strong
TSO memory model, however, mandatory barriers will unconditional add
memory barriers, this patch replaces the rmb() in kvm_steal_clock() by
virt_rmb().
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
VCPU TSC synchronization is perfromed in kvm_write_tsc() when the TSC
value being set is within 1 second from the expected, as obtained by
extrapolating of the TSC in already synchronized VCPUs.
This is naturally achieved on all VCPUs at VM start and resume;
however on VCPU hotplug it is not: the newly added VCPU is created
with TSC == 0 while others are well ahead.
To compensate for that, consider host-initiated kvm_write_tsc() with
TSC == 0 a special case requiring synchronization regardless of the
current TSC on other VCPUs.
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reuse existing code instead of using inline asm.
Make the code more concise and clear in the TSC
synchronization part.
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Although the current check is not wrong, this check explicitly includes
the pic.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
We already have the exact same checks a couple of lines below.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Not used outside of i8259.c, so let's make it static.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
We can easily compact this code and get rid of one local variable.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Let's rename it into a proper arch specific callback.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
We know there is an ioapic, so let's call it directly.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
kvm_ioapic_init() is guaranteed to be called without any created VCPUs,
so doing an all-vcpu request results in a NOP.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Currently, one could set pin 8-15, implicitly referring to
KVM_IRQCHIP_PIC_SLAVE.
Get rid of the two local variables max_pin and delta on the way.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Let's just move it to the place where it is actually needed.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
I don't see any reason any more for this lock, seemed to be used to protect
removal of kvm->arch.vpic / kvm->arch.vioapic when already partially
inititalized, now access is properly protected using kvm->arch.irqchip_mode
and this shouldn't be necessary anymore.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
When handling KVM_GET_IRQCHIP, we already check irqchip_kernel(), which
implies a fully inititalized ioapic.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
It seemed like a nice idea to encapsulate access to kvm->arch.vpic. But
as the usage is already mixed, internal locks are taken outside of i8259.c
and grepping for "vpic" only is much easier, let's just get rid of
pic_irqchip().
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
KVM_IRQCHIP_KERNEL implies a fully inititalized ioapic, while
kvm->arch.vioapic might temporarily be set but invalidated again if e.g.
setting of default routing fails when setting KVM_CREATE_IRQCHIP.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Let's avoid checking against kvm->arch.vpic. We have kvm->arch.irqchip_mode
for that now.
KVM_IRQCHIP_KERNEL implies a fully inititalized pic, while kvm->arch.vpic
might temporarily be set but invalidated again if e.g. kvm_ioapic_init()
fails when setting KVM_CREATE_IRQCHIP. Although current users seem to be
fine, this avoids future bugs.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Let's replace the checks for pic_in_kernel() and ioapic_in_kernel() by
checks against irqchip_mode.
Also make sure that creation of any route is only possible if we have
an lapic in kernel (irqchip_in_kernel()) or if we are currently
inititalizing the irqchip.
This is necessary to switch pic_in_kernel() and ioapic_in_kernel() to
irqchip_mode, too.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Let's add a new mode and set it while we create the irqchip via
KVM_CREATE_IRQCHIP and KVM_CAP_SPLIT_IRQCHIP.
This mode will be used later to test if adding routes
(in kvm_set_routing_entry()) is already allowed.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Pull rockchip clk driver updates from Heiko Stuebner:
General rockchip clock changes for 4.12. Contains some new clock-ids
as well as fixups of the clock-ids on rk3368 timers, which were unused
and completely wrong (more and differently named timers).
Also there is one new clock on rk3328 using the muxgrf type, a fix for
pll enablement which should wait for the pll to lock before continuing,
some more critical clocks and the rename of the rk1108 to rv1108, as the
soc seems to have been using a preliminary name before its actual release.
The plan is to have the driver changes (pinctrl, clk) go through the
respective maintainer trees and once everything landed in mainline do
the rename of the devicetree files. With the dts-include change in the
clock rename, we also keep everything compiling and thus bisectability.
* tag 'v4.12-rockchip-clk1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
clk: rockchip: add pll_wait_lock for pll_enable
clk: rockchip: rename RK1108 to RV1108
dt-bindings: rk1108-cru: rename RK1108 to RV1108
clk: rockchip: mark some rk3368 core-clks as critical
clk: rockchip: export SCLK_TIMERXX id for timers on rk3368
clk: rockchip: describe clk_gmac using the new muxgrf type on rk3328
clk: rockchip: add clock ids for timer10-15 of RK3368 SoCs
clk: rockchip: fix up rk3368 timer-ids
clk: rockchip: add rk3328 clk_mac2io_ext ID
clk: rockchip: Set "ignore unused" for PMU M0 clocks on rk3399
We found out that HW checksum generation only works from AST2500
onward. This disables it on AST2400 and removes the "no-hw-checksum"
properties in the device-trees. The problem we had wasn't related
to NC-SI.
Also rework the logic testing for that property so it can be used
to disable HW checksum generation and checking regardless of whether
NC-SI is used or not in case other variants out there need this.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We test for aspeed chips to handle a couple of special cases,
but we do that by checking the machine type which isn't right.
Instead check the actual device compatible property. This also
updates the dtsi files for the aspeed SoC to match.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Safexcel EIP197 cryptographic engine is used on some Marvell SoCs,
such as Armada 7040 and Armada 8040. Enable this driver as a module in
the ARM64 defconfig.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
This patch enables the driver for the SDHCI controller found on the
Marvell Armada 3700 and 7K/8K ARM64 SoCs.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
arch_check_elf contains a usage of current_cpu_data that will call
smp_processor_id() with preemption enabled and therefore triggers a
"BUG: using smp_processor_id() in preemptible" warning when an fpxx
executable is loaded.
As a follow-up to commit b244614a60 ("MIPS: Avoid a BUG warning during
prctl(PR_SET_FP_MODE, ...)"), apply the same fix to arch_check_elf by
using raw_current_cpu_data instead. The rationale quoted from the previous
commit:
"It is assumed throughout the kernel that if any CPU has an FPU, then
all CPUs would have an FPU as well, so it is safe to perform the check
with preemption enabled - change the code to use raw_ variant of the
check to avoid the warning."
Fixes: 46490b5725 ("MIPS: kernel: elf: Improve the overall ABI and FPU mode checks")
Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
CC: <stable@vger.kernel.org> # 4.0+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15951/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
In commit 827456e710 ("MIPS: Export _mcount alongside its definition")
the EXPORT_SYMBOL macro exporting _mcount was moved from C code into
assembly. Unlike C, exported assembly symbols need to have a function
prototype in asm/asm-prototypes.h for modversions to work properly.
Without this, modpost prints out this warning:
WARNING: EXPORT symbol "_mcount" [vmlinux] version generation failed,
symbol will not be versioned.
Fix by including asm/ftrace.h (where _mcount is declared) in
asm/asm-prototypes.h.
Fixes: 827456e710 ("MIPS: Export _mcount alongside its definition")
Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15952/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The current behaviour of the hash table dump assumes that memory is contiguous
and iterates from the start of memory to (start + size of memory). When memory
isn't physically contiguous, this doesn't work.
If memory exists at 0-5 GB and 6-10 GB then the current approach will check if
entries exist in the hash table from 0GB to 9GB. This patch changes the
behaviour to iterate over any holes up to the end of memory.
Fixes: 1515ab9321 ("powerpc/mm: Dump hash table")
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The current page table dumper scans the Linux page tables and coalesces mappings
with adjacent virtual addresses and similar PTE flags. This behaviour is
somewhat broken when you consider the IOREMAP space where entirely unrelated
mappings will appear to be virtually contiguous. This patch modifies the range
coalescing so that only ranges that are both physically and virtually contiguous
are combined. This patch also adds to the dump output the physical address at
the start of each range.
Fixes: 8eb07b1870 ("powerpc/mm: Dump linux pagetables")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Print the physicall address with 0x like the other addresses]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On Book3s we have two PTE flags used to mark cache-inhibited mappings:
_PAGE_TOLERANT and _PAGE_NON_IDEMPOTENT. Currently the kernel page table dumper
only looks at the generic _PAGE_NO_CACHE which is defined to be _PAGE_TOLERANT.
This patch modifies the dumper so both flags are shown in the dump.
Fixes: 8eb07b1870 ("powerpc/mm: Dump linux pagetables")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently sys_mmap() and sys_mmap2() (32-bit only), are not visible to the
syscall tracing machinery. This means users are not able to see the execution of
mmap() syscalls using the syscall tracer.
Fix that by using SYSCALL_DEFINE6 for sys_mmap() and sys_mmap2() so that the
meta-data associated with these syscalls is visible to the syscall tracer.
A side-effect of this change is that the return type has changed from unsigned
long to long. However this should have no effect, the only code in the kernel
which uses the result of these syscalls is in the syscall return path, which is
written in asm and treats the result as unsigned regardless.
Example output:
cat-3399 [001] .... 196.542410: sys_mmap(addr: 7fff922a0000, len: 20000, prot: 3, flags: 812, fd: 3, offset: 1b0000)
cat-3399 [001] .... 196.542443: sys_mmap -> 0x7fff922a0000
cat-3399 [001] .... 196.542668: sys_munmap(addr: 7fff922c0000, len: 6d2c)
cat-3399 [001] .... 196.542677: sys_munmap -> 0x0
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Massage change log, add detail on return type change]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Recently in commit f6eedbba7a ("powerpc/mm/hash: Increase VA range to 128TB"),
we increased H_PGD_INDEX_SIZE to 15 when we're building with 64K pages. This
makes it larger than RADIX_PGD_INDEX_SIZE (13), which means the logic to
calculate MAX_PGD_INDEX_SIZE in book3s/64/pgtable.h is wrong.
The end result is that the PGD (Page Global Directory, ie top level page table)
of the kernel (aka. swapper_pg_dir), is too small.
This generally doesn't lead to a crash, as we don't use the full range in normal
operation. However if we try to dump the kernel pagetables we can trigger a
crash because we walk off the end of the pgd into other memory and eventually
try to dereference something bogus:
$ cat /sys/kernel/debug/kernel_pagetables
Unable to handle kernel paging request for data at address 0xe8fece0000000000
Faulting instruction address: 0xc000000000072314
cpu 0xc: Vector: 380 (Data SLB Access) at [c0000000daa13890]
pc: c000000000072314: ptdump_show+0x164/0x430
lr: c000000000072550: ptdump_show+0x3a0/0x430
dar: e802cf0000000000
seq_read+0xf8/0x560
full_proxy_read+0x84/0xc0
__vfs_read+0x6c/0x1d0
vfs_read+0xbc/0x1b0
SyS_read+0x6c/0x110
system_call+0x38/0xfc
The root cause is that MAX_PGD_INDEX_SIZE isn't actually computed to be
the max of H_PGD_INDEX_SIZE or RADIX_PGD_INDEX_SIZE. To fix that move
the calculation into asm-offsets.c where we can do it easily using
max().
Fixes: f6eedbba7a ("powerpc/mm/hash: Increase VA range to 128TB")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This merges the arch part of the XIVE support, leaving the final commit
with the KVM specific pieces dangling on the branch for Paul to merge
via the kvm-ppc tree.
This the mips version of commit c1bd55f922 ("x86: opt into
HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit").
Simply use the tls system call argument instead of extracting the tls
argument by magic from the pt_regs structure.
See commit 3033f14ab7 ("clone: support passing tls argument via C
rather than pt_regs magic") for more background.
Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15855/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
We always either target MIPS32/MIPS64 or microMIPS, and always include
one & only one of uasm-mips.c or uasm-micromips.c. Therefore the
abstraction of the ISA in asm/uasm.h declaring functions for either ISA
is redundant & needless. Remove it to simplify the code.
This is largely the result of the following:
:%s/ISAOPC(\(.\{-}\))/uasm_i##\1/
:%s/ISAFUNC(\(.\{-}\))/\1/
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Paul Burton <paul.burton@imgtec.com>
Patchwork: https://patchwork.linux-mips.org/patch/15844/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
When delivering a machine check the CPU state is "loaded", which
means that some registers are already in the host registers.
Before writing the register content into the machine check
save area, we must make sure that we save the content of the
registers into the data structures that are used for delivering
the machine check.
We already do the right thing for access, vector/floating point
registers, let's do the same for guarded storage.
Fixes: 4e0b1ab72b ("KVM: s390: gs support for kvm guests")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
If the guest does not use the host register management, but it uses
the sdnx area, we must fill in a proper sdnxo value (address of sdnx
and the sdnxc).
Reported-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Recently most nodes got labels to make them referenceable. The USB 3.0
nodes as well as the nodes for the SATA controllers were left out,
rectify the omission.
The labels "sataX" are already used by some boards for the SATA ports,
therefore use "ahciX" to label the SATA controller nodes.
To avoid potential confusion by labeling an USB3.0 controller "usb2" use
usb3_X as labels. This also coincides with the node names themselves
(usb@xxxxx vs usb3@xxxxx).
Signed-off-by: Ralph Sennhauser <ralph.sennhauser@gmail.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Since bbb56c2722 ("arm64: Add detection code for broken .inst support
in binutils"), running any make target that doesn't involve the cross
compiler results in a spurious warning:
$ make ARCH=arm64 menuconfig
arch/arm64/Makefile:43: Detected assembler with broken .inst; disassembly will be unreliable
while
$ make ARCH=arm64 CROSS_COMPILE=aarch64-arm-linux- menuconfig
is silent (assuming your compiler is not affected). That's because
the code that tests for the workaround is always run, irrespective
of the current configuration being available or not.
An easy fix is to make the detection conditional on CONFIG_ARM64
being defined, which is only the case when actually building
something.
Fixes: bbb56c2722 ("arm64: Add detection code for broken .inst support in binutils")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>