Commit Graph

7309 Commits

Author SHA1 Message Date
Vasily Gorbik
bcf1650c9b s390/boot: avoid unnecessary zeroing of .bss section
The .bss section is now part of the decompressor's image, and the linker
already fills it with zeros. There is no need to zero it again with memset.
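As a user-space illustration (not the decompressor code itself), objects with static storage duration and no initializer are placed in .bss and are guaranteed by C to start out zeroed. An extra memset() over them is therefore redundant. In a hosted program the loader provides the zeroed .bss; per the commit, the decompressor's linker output now contains it already zero-filled. The names below are hypothetical.

```c
#include <stddef.h>

/* No initializer: these objects go to .bss, which is zero-filled before
 * any code runs, so no memset() is needed. */
static unsigned char boot_scratch[4096];
static long boot_counter;

/* Returns 1 if every byte of the .bss objects above is still zero. */
int bss_is_zeroed(void)
{
	for (size_t i = 0; i < sizeof(boot_scratch); i++)
		if (boot_scratch[i] != 0)
			return 0;
	return boot_counter == 0;
}
```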

Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-21 08:07:44 +02:00
Liu Shixin
61f2e77489 s390/diag: convert to use DEFINE_SEQ_ATTRIBUTE macro
Use the DEFINE_SEQ_ATTRIBUTE macro to simplify the code.
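Roughly, DEFINE_SEQ_ATTRIBUTE(name) expects a name_sops seq_operations table and generates a name_open() function plus a name_fops file_operations struct around it. The user-space mock below uses simplified, hypothetical stand-in types (mock_*, not the kernel's struct file or seq_file) to show the token-pasting pattern that removes the boilerplate.

```c
/* Simplified stand-ins for struct seq_operations, struct file and
 * struct file_operations; hypothetical, for illustration only. */
struct mock_seq_ops {
	const char *(*show)(void);
};

struct mock_file {
	const struct mock_seq_ops *ops;
};

struct mock_fops {
	int (*open)(struct mock_file *f);
};

/* Same shape as DEFINE_SEQ_ATTRIBUTE: NAME##_sops must already exist;
 * the macro generates NAME##_open() and NAME##_fops from it. */
#define MOCK_DEFINE_SEQ_ATTRIBUTE(NAME)				\
	static int NAME##_open(struct mock_file *f)		\
	{							\
		f->ops = &NAME##_sops;				\
		return 0;					\
	}							\
	static const struct mock_fops NAME##_fops = {		\
		.open = NAME##_open,				\
	}

static const char *demo_stat_show(void)
{
	return "demo statistics";
}

/* The only per-attribute code left to write by hand. */
static const struct mock_seq_ops demo_stat_sops = {
	.show = demo_stat_show,
};

MOCK_DEFINE_SEQ_ATTRIBUTE(demo_stat);
```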

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-17 14:11:03 +02:00
Heiko Carstens
110a6dbb2e s390/uaccess: add HAVE_GET_KERNEL_NOFAULT support
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-17 14:11:03 +02:00
Heiko Carstens
fc3f61e1bc s390/dis: get rid of set_fs() usage
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-17 14:11:03 +02:00
Vasily Gorbik
c360c9a238 s390/kasan: support protvirt with 4-level paging
Currently the kernel crashes in Kasan instrumentation code if
CONFIG_KASAN_S390_4_LEVEL_PAGING is used on a protected virtualization
capable machine where the ultravisor imposes addressing limitations on
the host and those limitations are lower than KASAN_SHADOW_OFFSET.

The problem is that Kasan has to know in advance where the vmalloc/modules
areas will be. With protected virtualization enabled, the vmalloc/modules
areas are moved down to the ultravisor secure storage limit, while kasan
still expects them at the very end of the 4-level paging address space.

To fix that, make Kasan recognize when protected virtualization is enabled
and predefine vmalloc/modules area positions that comply with the
ultravisor secure storage limit.

The Kasan shadow itself stays in place and might reside above the
ultravisor secure storage limit.
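Generic KASAN maps every 8 bytes of memory to one shadow byte at (addr >> 3) + KASAN_SHADOW_OFFSET. Assuming an offset of 0x0018000000000000 (the shadow start in the layouts listed below, which is also consistent with their shadow end), a small sketch shows that the shadow for the whole 4-level (2^53-byte) address space lies far above the 0x0000800000000000 ultravisor limit.

```c
#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT 3                  /* 8 bytes per shadow byte */
#define KASAN_SHADOW_OFFSET 0x0018000000000000ULL   /* assumed, from the layout */

/* Generic KASAN address-to-shadow translation. */
static uint64_t kasan_mem_to_shadow(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}
```

For example, kasan_mem_to_shadow(1ULL << 53) yields 0x001c000000000000, the "Kasan Shadow End" value listed below.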

One slight difference compared to a kernel without Kasan enabled is that
the vmalloc/modules area position is not reverted to the default if
ultravisor initialization fails. It would still be below the ultravisor
secure storage limit.

Kernel layout with kasan, 4-level paging and protected virtualization
enabled (ultravisor secure storage limit is at 0x0000800000000000):
---[ vmemmap Area Start ]---
0x0000400000000000-0x0000400080000000
---[ vmemmap Area End ]---
---[ vmalloc Area Start ]---
0x00007fe000000000-0x00007fff80000000
---[ vmalloc Area End ]---
---[ Modules Area Start ]---
0x00007fff80000000-0x0000800000000000
---[ Modules Area End ]---
---[ Kasan Shadow Start ]---
0x0018000000000000-0x001c000000000000
---[ Kasan Shadow End ]---
0x001c000000000000-0x0020000000000000         1P PGD I

Kernel layout with kasan, 4-level paging and protected virtualization
disabled/unsupported:
---[ vmemmap Area Start ]---
0x0000400000000000-0x0000400060000000
---[ vmemmap Area End ]---
---[ Kasan Shadow Start ]---
0x0018000000000000-0x001c000000000000
---[ Kasan Shadow End ]---
---[ vmalloc Area Start ]---
0x001fffe000000000-0x001fffff80000000
---[ vmalloc Area End ]---
---[ Modules Area Start ]---
0x001fffff80000000-0x0020000000000000
---[ Modules Area End ]---
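The two layouts above differ only in the top address the vmalloc and modules areas are packed against. The sketch below reproduces both by placing a 2 GiB modules area directly below a given top address and the vmalloc area directly below that. The 126 GiB vmalloc size is read off the listed addresses (0x7fff80000000 - 0x7fe000000000), not taken from the kernel sources, and place_areas() is a hypothetical helper.

```c
#include <stdint.h>

#define GiB (1ULL << 30)
#define MODULES_LEN (2 * GiB)     /* 0x80000000, as in both layouts */
#define VMALLOC_LEN (126 * GiB)   /* derived from the listed addresses */

struct area_layout {
	uint64_t vmalloc_start, vmalloc_end;
	uint64_t modules_start, modules_end;
};

/* Pack the modules area against @top and the vmalloc area right below it. */
static struct area_layout place_areas(uint64_t top)
{
	struct area_layout l;

	l.modules_end = top;
	l.modules_start = top - MODULES_LEN;
	l.vmalloc_end = l.modules_start;
	l.vmalloc_start = l.vmalloc_end - VMALLOC_LEN;
	return l;
}
```

Passing the secure storage limit (0x0000800000000000) gives the first layout; passing the end of the 4-level address space (1ULL << 53) gives the second.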

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16 14:08:48 +02:00
Vasily Gorbik
c2314cb2dd s390/protvirt: support ultravisor without secure storage limit
Avoid a potential crash when the ultravisor provides no secure storage
limit. Check that max_sec_stor_addr is not 0 before adjusting the vmalloc
position.
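A minimal sketch of the guard, with a hypothetical helper name. A max_sec_stor_addr of 0 is treated as "no limit reported", so the vmalloc end is only lowered when the ultravisor actually reports a limit, instead of being clamped to address 0.

```c
#include <stdint.h>

/* Hypothetical helper: clamp vmalloc_end to the secure storage limit,
 * treating a limit of 0 as "not reported" rather than as address 0. */
static uint64_t adjust_vmalloc_end(uint64_t vmalloc_end, uint64_t max_sec_stor_addr)
{
	if (max_sec_stor_addr == 0)
		return vmalloc_end;	/* no limit: keep the default position */
	return vmalloc_end < max_sec_stor_addr ? vmalloc_end : max_sec_stor_addr;
}
```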

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16 14:08:47 +02:00
Vasily Gorbik
1d6671ae46 s390/protvirt: parse prot_virt option in the decompressor
To make an early definition of the kernel address space layout possible,
parse the prot_virt option in the decompressor and pass it to the
uncompressed kernel. This enables kasan to take the ultravisor secure
storage limit into consideration and pre-define the vmalloc position
correctly.
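The decompressor only needs to extract one boolean from the command line. A simplified sketch, assuming a space-separated command line and accepting common kernel boolean spellings (1/0, y/n, on/off). The function names and the exact accepted set are illustrative, not the decompressor's actual parser.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: interpret a kernel-style boolean value of @len chars. */
static bool parse_bool(const char *v, size_t len, bool *out)
{
	if ((len == 1 && (v[0] == '1' || v[0] == 'y' || v[0] == 'Y')) ||
	    (len == 2 && strncmp(v, "on", 2) == 0)) {
		*out = true;
		return true;
	}
	if ((len == 1 && (v[0] == '0' || v[0] == 'n' || v[0] == 'N')) ||
	    (len == 3 && strncmp(v, "off", 3) == 0)) {
		*out = false;
		return true;
	}
	return false;	/* unrecognized value: leave *out untouched */
}

/* Scan @cmdline for "prot_virt=<bool>"; the last valid occurrence wins. */
static bool cmdline_prot_virt(const char *cmdline)
{
	static const char key[] = "prot_virt=";
	const size_t klen = sizeof(key) - 1;
	bool result = false;
	const char *p = cmdline;

	while (*p) {
		while (*p == ' ')
			p++;
		const char *end = strchr(p, ' ');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		if (len > klen && strncmp(p, key, klen) == 0)
			parse_bool(p + klen, len - klen, &result);
		p += len;
	}
	return result;
}
```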

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16 14:08:47 +02:00
Vasily Gorbik
8f78657c29 s390/kasan: avoid unnecessary moving of vmemmap
Currently the vmemmap area is unconditionally moved beyond the Kasan
shadow memory. When Kasan is not enabled, the vmemmap area position is
calculated in setup_memory_end() and depends on limiting factors like the
ultravisor secure storage limit. Try to follow the same logic with Kasan
enabled as well, and avoid moving the vmemmap area unless it really
intersects with the Kasan shadow.
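The decision reduces to a half-open interval overlap test. A sketch with hypothetical helper names; as sample input it uses the vmemmap and Kasan shadow addresses from the disabled-protvirt layout listed earlier in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open ranges [start, end): do they overlap? */
static bool ranges_intersect(uint64_t a_start, uint64_t a_end,
			     uint64_t b_start, uint64_t b_end)
{
	return a_start < b_end && b_start < a_end;
}

/* Hypothetical: keep vmemmap where it is unless it collides with the
 * shadow, in which case move it to start right after the shadow. */
static uint64_t place_vmemmap(uint64_t vmemmap_start, uint64_t vmemmap_size,
			      uint64_t shadow_start, uint64_t shadow_end)
{
	if (ranges_intersect(vmemmap_start, vmemmap_start + vmemmap_size,
			     shadow_start, shadow_end))
		return shadow_end;
	return vmemmap_start;
}
```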

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16 14:08:47 +02:00
Vasily Gorbik
ee4b2ce6d1 s390/mm,ptdump: sort markers
Kasan configuration options and the size of physical memory present can
affect the kernel memory layout. In particular vmemmap, vmalloc and modules
might come before the kasan shadow or after it. For ptdump to output the
markers in the right order, they have to be sorted.

To preserve the original order of markers with the same start address,
avoid using sort() from lib/sort.c (which is not a stable sorting algorithm)
and sort markers in place.
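One simple stable in-place option is insertion sort. The sketch below is not necessarily the code used in this commit; struct addr_marker is modelled here with just a start address and a name.

```c
#include <stddef.h>

struct addr_marker {
	unsigned long start_address;
	const char *name;
};

/* Insertion sort by start_address. It is stable: an element only moves
 * past neighbours with a strictly larger start address, so markers with
 * equal addresses keep their original relative order. */
static void sort_markers(struct addr_marker *m, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		struct addr_marker cur = m[i];
		size_t j = i;

		while (j > 0 && m[j - 1].start_address > cur.start_address) {
			m[j] = m[j - 1];
			j--;
		}
		m[j] = cur;
	}
}
```

With only a handful of markers, the quadratic worst case is irrelevant, and no extra memory is needed.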

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16 14:08:47 +02:00
Niklas Schnelle
4904e1941e s390/pci: add missing pci_iov.h include
This fixes a missing prototype compiler warning spotted by the kernel
test robot.

Fixes: abb95b7550 ("s390/pci: consolidate SR-IOV specific code")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16 14:08:47 +02:00
Heiko Carstens
48111b4838 s390/mm,ptdump: add proper ifdefs
Use ifdefs instead of IS_ENABLED() to avoid a compile error
for !PTDUMP_DEBUGFS:

arch/s390/mm/dump_pagetables.c: In function ‘pt_dump_init’:
arch/s390/mm/dump_pagetables.c:248:64: error: ‘ptdump_fops’ undeclared (first use in this function); did you mean ‘pidfd_fops’?
   debugfs_create_file("kernel_page_tables", 0400, NULL, NULL, &ptdump_fops);
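The error occurs because code under `if (IS_ENABLED(...))` is still parsed and type-checked when the option is off; only the optimizer drops it later, so every identifier it references must be declared. A preprocessor guard removes the reference before compilation. A self-contained sketch with hypothetical stand-ins: CONFIG_PTDUMP_DEBUGFS is deliberately left undefined here, and register_file() is made up.

```c
/* CONFIG_PTDUMP_DEBUGFS is intentionally not defined in this sketch. */

#ifdef CONFIG_PTDUMP_DEBUGFS
static const int ptdump_fops = 42;	/* only exists when the option is set */

static int register_file(const int *fops)
{
	return *fops;
}
#endif

static int pt_dump_init(void)
{
#ifdef CONFIG_PTDUMP_DEBUGFS
	/* Written as `if (IS_ENABLED(CONFIG_PTDUMP_DEBUGFS))` instead, this
	 * would still reference ptdump_fops and fail with "undeclared". */
	return register_file(&ptdump_fops);
#else
	return 0;
#endif
}
```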

Reported-by: Julian Wiedmann <jwi@linux.ibm.com>
Fixes: 08c8e685c7 ("s390: add ARCH_HAS_DEBUG_WX support")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16 14:08:47 +02:00
Alexander Egorenkov
980d5f9ab3 s390/boot: enable .bss section for compressed kernel
- Support static uninitialized variables in compressed kernel.
- Remove chkbss script
- Get rid of workarounds for not having .bss section

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16 14:08:47 +02:00
Janosch Frank
1a80b54d1c s390/uv: add destroy page call
We don't need to export pages if we destroy the VM configuration
afterwards anyway. Instead we can destroy the page which will zero it
and then make it accessible to the host.

Destroying is about twice as fast as the export.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/kvm/20200907124700.10374-2-frankja@linux.ibm.com/
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:35 +02:00
Vasily Gorbik
e670e64af1 s390/mm,ptdump: add couple of additional markers
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
[hca@linux.ibm.com: add more markers, rename some markers]
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:35 +02:00
Vasily Gorbik
d411e3c674 s390/kasan: make shadow memory noexec
ARCH_HAS_DEBUG_WX feature support brought attention to the fact that
the initial kasan shadow memory is currently mapped without the noexec
flag. Fix that.

The temporary initial identity mapping is still created without noexec,
but it is replaced by properly set up paging later.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:35 +02:00
Heiko Carstens
08c8e685c7 s390: add ARCH_HAS_DEBUG_WX support
Checks the whole kernel address space for W+X mappings. Note that
currently the first lowcore page unfortunately has to be mapped
W+X. Therefore this is not reported as an insecure mapping.
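Conceptually, the check walks the kernel mappings and counts pages that are both writable and executable, skipping the one expected exception. A user-space sketch over a flat array of hypothetical page descriptors (struct page_desc and its fields are illustrative, and the first lowcore page is assumed to sit at address 0):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct page_desc {
	uint64_t addr;
	bool writable;
	bool noexec;
};

/* Count W+X pages, ignoring the first lowcore page (assumed at address 0),
 * which is expected to be mapped W+X. */
static size_t count_unexpected_wx(const struct page_desc *pages, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].addr == 0)
			continue;	/* expected exception: first lowcore page */
		if (pages[i].writable && !pages[i].noexec)
			count++;
	}
	return count;
}
```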

For the very same reason the wording also differs from other
architectures if the test passes:

On s390 it is "no unexpected W+X pages found" instead of
"no W+X pages found".

Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:35 +02:00
Heiko Carstens
6bf9a639e7 s390/mm,ptdump: make page table dumping seq_file optional
s390 version of ae5d1cf358 ("arm64: dump: Make the page table
dumping seq_file optional").

Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:35 +02:00
Heiko Carstens
6c6687a444 s390/kprobes: make insn pages read-only
Make sure that kprobe insn pages are not writable anymore.

Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:35 +02:00
Niklas Schnelle
c3b2c9064e s390/pci: remove clp_rescan_pci_devices_simple()
clp_rescan_pci_devices_simple() is neither simpler than
clp_scan_pci_devices() nor does it really scan PCI devices; in particular
it will neither add newly discovered devices nor remove those which
disappeared.
Instead it only refreshes PCI function handles, and it also
has just a single call site left in the same translation unit, which
in fact only refreshes one specific function handle identified by
a FID.

Clarify this by renaming the function and its helper to
clp_refresh_fh() and __clp_refresh_fh() respectively, and make it take
a FID directly, which saves us from dealing with the NULL case that
updated all function handles but is not used anymore.
Furthermore, since the only call site is in the same translation unit,
make it static.

Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:35 +02:00
Niklas Schnelle
809fcfaf92 s390/pci: remove clp_rescan_pci_devices()
There is only one call site of clp_rescan_pci_devices(), and
all the function does is call zpci_remove_reserved_devices()
followed by clp_scan_pci_devices().
So replace the single call with direct calls to
zpci_remove_reserved_devices() and clp_scan_pci_devices(), and remove
the function.

Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:34 +02:00
Niklas Schnelle
2bce60b503 s390/pci: remove unused function zpci_rescan()
The only caller of this was removed as part of the suspend/resume
removal, so there is no need to keep this function around.

Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:34 +02:00
Niklas Schnelle
abb95b7550 s390/pci: consolidate SR-IOV specific code
Currently we have multiple #ifdef CONFIG_PCI_IOV blocks spread over
different compilation units and headers, all dealing with SR-IOV
specific behavior.
This violates the style guide, which discourages conditionally compiled
code blocks, and hinders maintainability by spreading SR-IOV functionality
over many files.

Let's move all of this into a conditionally compiled pci_iov.c file and
local header, and prefix SR-IOV specific functions with zpci_iov_*.

Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:34 +02:00
Heiko Carstens
da1694ad9e s390/mm,ptdump: hold cpa mutex while walking for kernel page table dump
Currently this only prevents outdated information from being
provided to user space. A concurrent split of huge/large pages does
modify the kernel page tables; however, either the huge/large mapping
is reported or the split area is walked.

This also only "fixes" a potential future bug, since split pages could
also be merged again if page permissions are the same for larger
memory areas.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:34 +02:00
Heiko Carstens
36c2733c43 s390/mm,ptdump: hold memory hotplug lock while walking for kernel page table dump
This is the s390 variant of commit bf2b59f60e ("arm64/mm: Hold
memory hotplug lock while walking for kernel page table dump").

Right now this doesn't fix any real bug; however, as soon as the kvm
patches that make use of memory removal get merged, we might end up
dereferencing/accessing freed page tables.

Therefore fix this potential bug now.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:34 +02:00
Heiko Carstens
9d719d39aa s390/mm,ptdump: convert to generic page table dumper
Make use of generic ptdump infrastructure.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:34 +02:00
Julian Wiedmann
180a4c42e5 s390/qdio: always use dev_name() for device name in QIB
Passing a custom name from the device driver is nice, but in practice
only zfcp has been using this. So we might as well hard-code a naming
scheme in the qdio layer, so that qeth also benefits from it.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 10:30:07 +02:00
Niklas Schnelle
b02002cc4c s390/pci: Implement ioremap_wc/prot() with MIO
With our current support for the new MIO PCI instructions, write
combining/write back MMIO memory can be obtained via the pci_iomap_wc()
and pci_iomap_wc_range() functions.
This is achieved by using the write back address for a specific BAR
as provided in clp_store_query_pci_fn().

These functions are, however, not widely used; instead drivers often
rely on ioremap_wc() and ioremap_prot(), which on other platforms enable
write combining using a PTE flag set through the pgprot value.

While we do not have a write combining flag in the low order flag bits
of the PTE like x86_64 does, with MIO support there is a write back bit
in the physical address (bit 1 on z15) and thus also in the PTE.
Which bit is used to toggle write back, and whether it is available at
all, is however not fixed in the architecture. Instead we get this
information from the CLP Store Logical Processor Characteristics for PCI
command. When the write back bit is not provided, we fall back to the
existing behavior.
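A sketch of the address manipulation under explicit assumptions: the CLP response is modelled as a hypothetical (has_wb, wb_bit) pair, and the bit position uses the architecture's MSB-0 numbering, where bit 0 is the most significant bit of the 64-bit address. When no write back bit is reported, the address is returned unchanged, matching the fallback described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of what CLP reports about write back support. */
struct mio_wb_info {
	bool has_wb;		/* is a write back bit provided at all? */
	unsigned int wb_bit;	/* MSB-0 bit number, e.g. 1 on z15 */
};

/* Return the MIO address to map for write back/write combining access. */
static uint64_t mio_wb_addr(uint64_t mio_addr, struct mio_wb_info info)
{
	if (!info.has_wb)
		return mio_addr;	/* fall back to the existing behavior */
	return mio_addr | (1ULL << (63 - info.wb_bit));
}
```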

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 10:30:07 +02:00
Julian Wiedmann
4d4a3caaf3 s390/qdio: clean up QDR setup
__qdio_allocate_fill_qdr() is meant to set up one specific queue
descriptor in the QDR. But for this simple task, it gets passed a bunch
of global structs and offsets - and then navigates through the structs
to find its actual operands.

Clean up all the complicated pointer chasing & index calculation, and
just pass a descriptor and its associated queue struct.

While at it also add some virt_to_phys() translations, to clarify that
addresses in the QDR are meant to be absolute.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 10:30:07 +02:00
Sven Schnelle
4bf3ec384e s390: disable branch profiling for vdso
When branch profiling is enabled, if () gets annotated with code to
instrument the hit/miss ratio. This doesn't work for VDSO as we can't
access kernel code. Add -DDISABLE_BRANCH_PROFILING to fix this.

Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 10:30:07 +02:00
Sven Schnelle
4bff8cb545 s390: convert to GENERIC_VDSO
Convert s390 to generic vDSO. There are a few special things on s390:

- vDSO can be called without a stack frame (glibc did this in the past),
  so we need to allocate a stack frame on our own.

- The former assembly code used stcke to get the TOD clock and applied
  time steering to it. We need to do the same in the new code. This is done
  in the architecture specific __arch_get_hw_counter function. The steering
  information is stored in an architecture specific area in the vDSO data.

- CPUCLOCK_VIRT is now handled with a syscall fallback, which might
  be slower/less accurate than the old implementation.

The getcpu() function stays as an assembly function because there is no
generic implementation and the code is just a few lines.

Performance numbers from my system doing 100 million gettimeofday() calls:

Plain syscall: 8.6s
Generic VDSO:  1.3s
old ASM VDSO:  1s

So it's a bit slower but still much faster than syscalls.
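A rough way to run this kind of comparison on your own system: time a fixed number of gettimeofday() calls through the C library (which typically goes through the vDSO when available) and, where the platform defines SYS_gettimeofday, through a raw system call. Absolute numbers will differ from those quoted above; the helper names are illustrative.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

static double elapsed_s(struct timespec a, struct timespec b)
{
	return (double)(b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Time @n gettimeofday() calls; @raw selects the direct syscall path
 * (ignored if the platform has no SYS_gettimeofday). */
static double bench_gettimeofday(long n, int raw)
{
	struct timespec t0, t1;
	struct timeval tv;

	(void)raw;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < n; i++) {
#ifdef SYS_gettimeofday
		if (raw) {
			syscall(SYS_gettimeofday, &tv, NULL);
			continue;
		}
#endif
		gettimeofday(&tv, NULL);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return elapsed_s(t0, t1);
}
```

Calling bench_gettimeofday(100000000L, 0) and bench_gettimeofday(100000000L, 1) mirrors the 100 million calls measured above.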

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-08-26 18:47:21 +02:00
Heiko Carstens
98ad45fb58 s390/checksum: coding style changes
Add some coding style changes which hopefully make the code
look a bit less odd.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-08-26 18:47:20 +02:00
Heiko Carstens
612ad0785d s390/checksum: have consistent calculations
Use "|" instead of "+" within csum_fold() for consistency reasons,
like in the rest of the file.
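For reference, a user-space sketch of a fold in this style, with plain stdint types standing in for the kernel's __wsum/__sum16. Adding the 32-bit sum to itself rotated by 16 bits leaves the end-around-carry sum of the two halves in the upper 16 bits. In this sketch the rotation is assembled with "|"; since the two shifted halves share no bits, "+" would give the same value there.

```c
#include <stdint.h>

/* Fold a 32-bit one's complement sum to 16 bits and complement it. */
static uint16_t csum_fold_sketch(uint32_t csum)
{
	/* csum + rotate(csum, 16): upper half = hi + lo + carry out of lo + hi */
	csum += (csum >> 16) | (csum << 16);
	csum >>= 16;
	return (uint16_t)~csum;
}
```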

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-08-26 18:47:20 +02:00
Heiko Carstens
614b4f5d0f s390/checksum: make ip_fast_csum() faster
Convert ip_fast_csum() so it doesn't call csum_partial(), but instead
open code the checksum calculation. The problem with csum_partial() is
that it makes use of the cksm instruction, which has high startup
costs and therefore is only very fast if used on larger memory
regions.

IPv4 headers however are small in size (5-16 32-bit words). The open
coded variant calculates the checksum in ~30% of the time compared to
the old variant (z14, march=z196).
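A portable sketch of an open-coded IPv4 header checksum (not the s390 implementation): sum the ihl 32-bit words into a 64-bit accumulator, fold the carries back down to 16 bits, and complement. Because one's complement addition is byte-order independent, the result can be stored directly into the header on either endianness. The function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit sum of 32-bit words down to 16 bits, adding each carry
 * back in (one's complement end-around carry). */
static uint16_t fold_to_16(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffu) + (sum >> 16);
	sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* Checksum an IPv4 header of @ihl 32-bit words without calling a
 * generic csum_partial()-style helper. */
static uint16_t ip_fast_csum_sketch(const void *iph, unsigned int ihl)
{
	const unsigned char *p = iph;
	uint64_t sum = 0;

	for (unsigned int i = 0; i < ihl; i++) {
		uint32_t word;

		memcpy(&word, p + 4 * i, sizeof(word));	/* alignment-safe load */
		sum += word;
	}
	return (uint16_t)~fold_to_16(sum);
}
```

A header that already carries a correct checksum yields 0; with the checksum field zeroed, the returned value is the checksum to store.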

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-08-26 18:47:20 +02:00
Heiko Carstens
bb4644b14a s390/checksum: rewrite csum_tcpudp_nofold()
Rewrite csum_tcpudp_nofold() so that the generated code will not
contain branches. The old implementation was also optimized for
machines which came with "add logical with carry" instructions;
however, the compiler doesn't generate them anymore. This is most
likely because those instructions are slower.

However, with the old code the compiler generates a lot of branches,
which usually isn't too helpful. Therefore rewrite the code.

In a tight loop this doesn't make any difference since the branch
prediction unit does its job.
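A branch-free sketch of the idea, with hypothetical names and host-order integer arguments (the values as a big-endian machine like s390 would load them): accumulate everything in a 64-bit register, then fold the carries back with a fixed number of add/shift steps instead of conditional carry handling. Two fold steps always suffice, so no loop or branch is needed.

```c
#include <stdint.h>

/* Fold a 64-bit accumulator to 32 bits with end-around carry. After the
 * first step the value is at most 2^33 - 2, so a second step is enough. */
static uint32_t fold64_to_32(uint64_t x)
{
	x = (x & 0xffffffffu) + (x >> 32);
	x = (x & 0xffffffffu) + (x >> 32);
	return (uint32_t)x;
}

/* Hypothetical branch-free pseudo-header sum (not yet folded to 16 bits). */
static uint32_t tcpudp_nofold_sketch(uint32_t saddr, uint32_t daddr,
				     uint32_t len, uint8_t proto, uint32_t sum)
{
	uint64_t acc = (uint64_t)sum + saddr + daddr + len + proto;

	return fold64_to_32(acc);
}
```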

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-08-26 18:47:20 +02:00
Heiko Carstens
b064904c50 s390/checksum: provide csum_ipv6_magic()
This implementation needs only ~30% of the time to calculate the
checksum compared to the generic variant. In addition the compiler
also generates only ~30% of the instructions compared to the generic
variant (on z14, compiled with march=z196).

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-08-26 18:47:20 +02:00
Niklas Schnelle
b97bf44f99 s390/pci: fix PF/VF linking on hot plug
Currently there are four places in which a PCI function is scanned
and made available to drivers:
 1. In pci_scan_root_bus() as part of the initial zbus
    creation.
 2. In zpci_bus_add_devices() when registering
    a device in configured state on a zbus that has already been
    scanned.
 3. When a function is already known to zPCI (in reserved/standby state)
    and configuration is triggered through firmware by PEC 0x301.
 4. When a device is already known to zPCI (in standby/reserved state)
    and configuration is triggered from within Linux using
    enable_slot().

The PF/VF linking step and setting of pdev->is_virtfn introduced with
commit e5794cf1a2 ("s390/pci: create links between PFs and VFs") was
only triggered for the second case, which is where VFs created through
sriov_numvfs usually land. However, unlike on some other platforms but
like on POWER, VFs can be individually enabled/disabled through
/sys/bus/pci/slots.

Fix this by doing VF setup as part of pcibios_bus_add_device() which is
called in all of the above cases.

Finally, to remove the PF/VF links, call the common code function
pci_iov_remove_virtfn() to remove linked VFs.
This takes care of the necessary sysfs cleanup.

Fixes: e5794cf1a2 ("s390/pci: create links between PFs and VFs")
Cc: <stable@vger.kernel.org> # 5.8: 2f0230b2f2: s390/pci: re-introduce zpci_remove_device()
Cc: <stable@vger.kernel.org> # 5.8
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-08-17 13:17:34 +02:00
Niklas Schnelle
2f0230b2f2 s390/pci: re-introduce zpci_remove_device()
To fix the PF to VF link removal we need to perform some action on
every removal of a zdev from the common PCI subsystem.
So in preparation, re-introduce zpci_remove_device() and use it instead
of directly calling the common code functions. This function was actually
still declared from earlier code but no longer implemented.

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-08-17 13:17:25 +02:00
Niklas Schnelle
3cddb79afc s390/pci: fix zpci_bus_link_virtfn()
We were missing the pci_dev_put() for candidate PFs. Furthermore, in
discussion with upstream it turned out that, somewhat counterintuitively,
some common code, in particular the vfio-pci driver, assumes that
pdev->is_virtfn always implies that pdev->physfn is set, i.e. that VFs
are always linked.
While POWER does seem to set pdev->is_virtfn even for unlinked functions
(see comments in arch/powerpc/kernel/eeh.c:eeh_debugfs_break_device()),
for now just be safe and only set pdev->is_virtfn on linking.
Also make sure that we only search for parent PFs if the zbus is
multifunction and we thus know the devfn values supplied by firmware
come from the RID.

Fixes: e5794cf1a2 ("s390/pci: create links between PFs and VFs")
Cc: <stable@vger.kernel.org> # 5.8
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-08-17 13:17:18 +02:00
Heiko Carstens
fd78c59446 s390/ptrace: fix storage key handling
The key member of the runtime instrumentation control block contains
only the access key, not the complete storage key. Therefore the value
must be shifted by four bits. Since existing user space does not
necessarily query and set the access key correctly, just ignore the
user space provided key and use the correct one.
Note: this is only relevant for debugging purposes in case somebody
compiles a kernel with a default storage access key set to a value not
equal to zero.

Fixes: 262832bc5a ("s390/ptrace: add runtime instrumention register get/set")
Reported-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-08-17 13:17:14 +02:00
Heiko Carstens
9eaba29c79 s390/runtime_instrumentation: fix storage key handling
The key member of the runtime instrumentation control block contains
only the access key, not the complete storage key. Therefore the value
must be shifted by four bits.
Note: this is only relevant for debugging purposes in case somebody
compiles a kernel with a default storage access key set to a value not
equal to zero.
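A sketch of the conversion, assuming the usual layout of a storage key byte: the 4-bit access-control field in the high nibble, followed by the fetch-protection, reference and change bits. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define SKEY_ACC_SHIFT 4	/* access-control bits occupy the high nibble */

/* Extract the 4-bit access key from a storage key byte (ACC|F|R|C|0). */
static uint8_t storage_key_to_access_key(uint8_t skey)
{
	return (uint8_t)(skey >> SKEY_ACC_SHIFT);
}
```

For example, a storage key byte of 0x3c (access key 3 with the F and R bits set) yields an access key of 3.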

Fixes: e4b8b3f33f ("s390: add support for runtime instrumentation")
Reported-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-08-17 13:17:10 +02:00
Niklas Schnelle
b76fee1bc5 s390/pci: ignore stale configuration request event
A configuration request event may be stale, that is, the event
may reference a zdev which was already configured.
This can happen when a hotplug occurs during boot: the device is
discovered and configured in the initial clp_list_pci(), then after
initialization we enable events and process the original configuration
request, which still contains the old, disabled function handle. This
leads to a failure during device enablement and subsequent I/O lockout.

Fix this by restoring the check that the device to be configured is in
standby, which was removed in commit f606b3ef47 ("s390/pci: adapt events
for zbus").

This check does not need serialization, as we only enable the events after
zPCI has fully initialized (which includes the initial clp_list_pci()),
rescan only does updates, and events are serialized with respect to each
other.
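A sketch of the restored check, using hypothetical stand-in types (the real zPCI state names and structures differ). A device that is no longer in standby was already configured, for example by the initial scan, so the event is treated as stale and its old function handle is not applied.

```c
#include <stdbool.h>

/* Hypothetical subset of zPCI function states. */
enum zdev_state { ZDEV_STANDBY, ZDEV_CONFIGURED, ZDEV_ONLINE };

struct zdev_stub {
	enum zdev_state state;
	unsigned int fh;	/* function handle */
};

/* Handle a "configuration request" event; returns true if acted upon. */
static bool handle_config_request(struct zdev_stub *zdev, unsigned int event_fh)
{
	if (zdev->state != ZDEV_STANDBY)
		return false;	/* stale: keep the current, valid handle */
	zdev->fh = event_fh;
	zdev->state = ZDEV_CONFIGURED;
	return true;
}
```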

Fixes: f606b3ef47 ("s390/pci: adapt events for zbus")
Cc: <stable@vger.kernel.org> # 5.8
Reported-by: Shalini Chellathurai Saroja <shalini@linux.ibm.com>
Tested-by: Shalini Chellathurai Saroja <shalini@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-08-17 13:17:05 +02:00
Xiaoming Ni
88db0aa242 all arch: remove system call sys_sysctl
Since commit 61a47c1ad3 ("sysctl: Remove the sysctl system call"),
sys_sysctl is actually unavailable: any input can only return an error.

We have been warning about people using the sysctl system call for years
and believe there are no more users.  Even if there are users of this
interface if they have not complained or fixed their code by now they
probably are not going to, so there is no point in warning them any
longer.

So completely remove sys_sysctl on all architectures.

[nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu]
 Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.com

Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Will Deacon <will@kernel.org>		[arm/arm64]
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: chenzefeng <chenzefeng2@huawei.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Diego Elio Pettenò <flameeyes@flameeyes.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kars de Jong <jongk@linux-m68k.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Paul Burton <paulburton@kernel.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Sven Schnelle <svens@stackframe.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Zhou Yanjie <zhouyanjie@wanyeetech.com>
Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-14 19:56:56 -07:00
Linus Torvalds
990f227371 Merge tag 's390-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:

 - Allow the s390 debug feature to finally handle more than 256 CPU
   numbers, instead of truncating the most significant bits.

 - Improve THP splitting required by qemu processes by making use of
   walk_page_vma() instead of calling follow_page() for every single
   page within each vma.

 - Add missing ZCRYPT dependency to VFIO_AP to fix potential compile
   problems.

 - Remove the unneeded select of CLOCKSOURCE_VALIDATE_LAST_CYCLE again.

 - Set node distance to LOCAL_DISTANCE instead of 0, since e.g. libnuma
   translates a node distance of 0 to "no NUMA support available".

 - A couple of other minor fixes and improvements.

* tag 's390-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/numa: move code to arch/s390/kernel
  s390/time: remove select CLOCKSOURCE_VALIDATE_LAST_CYCLE again
  s390/debug: debug feature version 3
  s390/Kconfig: add missing ZCRYPT dependency to VFIO_AP
  s390/numa: set node distance to LOCAL_DISTANCE
  s390/pkey: remove redundant variable initialization
  s390/test_unwind: fix possible memleak in test_unwind()
  s390/gmap: improve THP splitting
  s390/atomic: circumvent gcc 10 build regression
2020-08-13 12:38:32 -07:00
Peter Xu
64019a2e46 mm/gup: remove task_struct pointer for all gup code
After the cleanup of page fault accounting, gup no longer needs to pass
task_struct around.  Remove that parameter throughout the whole gup
stack.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:58:04 -07:00
Peter Xu
35e45f3e5a mm/s390: use general page fault accounting
Use the general page fault accounting by passing regs into
handle_mm_fault().  This naturally solves the issue of a page fault
being accounted multiple times when it is retried.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Link: http://lkml.kernel.org/r/20200707225021.200906-19-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:58:03 -07:00
Peter Xu
bce617edec mm: do page fault accounting in handle_mm_fault
Patch series "mm: Page fault accounting cleanups", v5.

This is v5 of the pf accounting cleanup series.  It originates from Gerald
Schaefer's report a week ago of incorrect page fault accounting for
retried page faults after commit 4064b98270 ("mm: allow
VM_FAULT_RETRY for multiple times"):

  https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

What this series does:

  - Correct page fault accounting: a page fault (no matter whether it
    comes from #PF handling, gup, or anything else) is accounted only
    once, by the attempt that completed the fault.  For example, page
    fault retries should not be counted in the page fault counters.  The
    same applies to the perf events.

  - Unify the definition of PERF_COUNT_SW_PAGE_FAULTS: currently this
    perf event is used in an ad hoc way across different archs.

    Case (1): for many archs it's done at the entry of a page fault
    handler, so that it also covers e.g. erroneous faults.

    Case (2): for some other archs, it is only accounted when the page
    fault is resolved successfully.

    Case (3): there are still quite a few archs that have not enabled
    this perf event.

    Since this series touches nearly all the archs, we unify this perf
    event to always follow case (1), which is the one that makes the
    most sense.  And since we moved the accounting into
    handle_mm_fault(), the other two MAJ/MIN perf events are naturally
    taken care of as well.

  - Unify the definition of "major faults": the definition of "major
    fault" is slightly changed when used in accounting (it is no longer
    just VM_FAULT_MAJOR).  More information in patch 1.

  - Always account the page fault to the task that triggered it.  This
    does not matter much for #PF handling, but it does for gup.  More
    information on this in patch 25.
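The accounting rules in the list above can be modeled in user space. This is a minimal sketch, not the kernel code: the `FAULT_*` bits are hypothetical stand-ins for VM_FAULT_MAJOR, VM_FAULT_RETRY and VM_FAULT_ERROR, and treating a retried attempt as major is an assumption that mirrors how mainline's mm_account_fault() uses FAULT_FLAG_TRIED. `simulate_fault()` and `account_attempt()` are illustrative names only.

```c
#include <stdbool.h>

/* Hypothetical result bits modelled on VM_FAULT_MAJOR/RETRY/ERROR. */
enum {
	FAULT_DONE  = 0,
	FAULT_MAJOR = 1 << 0,
	FAULT_RETRY = 1 << 1,
	FAULT_ERROR = 1 << 2,
};

struct fault_counters {
	unsigned long pf_events;	/* models PERF_COUNT_SW_PAGE_FAULTS */
	unsigned long maj_flt;
	unsigned long min_flt;
};

/* Account one attempt; 'tried' is true when this attempt is a retry. */
static void account_attempt(struct fault_counters *c, unsigned int ret,
			    bool tried)
{
	if (ret & (FAULT_ERROR | FAULT_RETRY))
		return;		/* failed or incomplete: not counted */
	if ((ret & FAULT_MAJOR) || tried)
		c->maj_flt++;	/* assumption: a retried fault counts as major */
	else
		c->min_flt++;
}

/*
 * Simulated arch #PF handler: attempts[i] is the result of the i-th try.
 * The generic event is counted once at entry (case (1)), so erroneous
 * faults are included; MAJ/MIN are counted only for the completing try.
 */
static void simulate_fault(struct fault_counters *c,
			   const unsigned int *attempts, int n)
{
	c->pf_events++;
	for (int i = 0; i < n; i++) {
		account_attempt(c, attempts[i], i > 0);
		if (!(attempts[i] & FAULT_RETRY))
			break;
	}
}
```

Feeding it one fault that completes immediately, one that is retried twice and then completes, and one that fails yields three entry events, but only one minor and one major fault. The two retries and the erroneous fault never touch the MAJ/MIN counters.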

Patchset layout:

Patch 1:     Introduce the accounting in handle_mm_fault(), not enabled.
Patch 2-23:  Enable the new accounting for arch #PF handlers one by one.
Patch 24:    Enable the new accounting for the remaining outliers (gup, iommu, etc.)
Patch 25:    Clean up the GUP task_struct pointer since it's no longer needed

This patch (of 25):

This is a preparation patch to move page fault accounting into the
general code in handle_mm_fault().  This includes both the per-task
maj_flt/min_flt counters and the major/minor page fault perf events.  To
do this, the pt_regs pointer is passed into handle_mm_fault().

PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
handlers.

So far, every pt_regs pointer passed into handle_mm_fault() is NULL,
which means this patch should have no intended functional change.
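The "no functional change" argument depends on a NULL regs pointer switching the new generic accounting off. The sketch below illustrates that gating in user space. `struct regs_sketch`, `struct task_counters_sketch` and `generic_account()` are hypothetical names, not kernel APIs, and the real handle_mm_fault() does far more than this.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs; only NULL vs. non-NULL matters. */
struct regs_sketch {
	unsigned long ip;
};

struct task_counters_sketch {
	unsigned long maj_flt, min_flt;		/* per-task counters */
	unsigned long perf_maj, perf_min;	/* MAJ/MIN perf events */
};

/*
 * Accounting step at the end of a simulated handle_mm_fault().  Callers
 * that still pass NULL (every caller after this preparation patch) get no
 * accounting here, so the arch handlers' existing counting is unchanged.
 */
static void generic_account(struct task_counters_sketch *t,
			    const struct regs_sketch *regs, int major)
{
	if (regs == NULL)
		return;
	if (major) {
		t->maj_flt++;
		t->perf_maj++;
	} else {
		t->min_flt++;
		t->perf_min++;
	}
}
```

Once an arch handler is converted to pass its real regs, which is what the later patches in the series do, the generic path starts counting. The arch handler must then drop its own MAJ/MIN accounting to avoid double counting.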

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:58:02 -07:00
Christoph Hellwig
428e2976a5 uaccess: remove segment_eq
segment_eq is only used to implement uaccess_kernel.  Just open code
uaccess_kernel in the arch uaccess headers and remove one layer of
indirection.
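Here is a minimal user-space model of the change. It assumes an architecture whose mm_segment_t wraps a single `seg` value. The KERNEL_DS/USER_DS values and the global `current_fs` are hypothetical, and the real definitions differ between architectures. Both forms perform the same comparison; the new one simply drops the helper macro.

```c
/*
 * Hypothetical user-space model of one architecture's segment helpers;
 * the real definitions differ between architectures.
 */
typedef struct { unsigned long seg; } mm_segment_t;

#define KERNEL_DS ((mm_segment_t) { ~0UL })		/* no address limit */
#define USER_DS   ((mm_segment_t) { 0x7fffffffUL })	/* hypothetical user limit */

static mm_segment_t current_fs = { 0x7fffffffUL };	/* starts out as USER_DS */

static mm_segment_t get_fs(void) { return current_fs; }
static void set_fs(mm_segment_t fs) { current_fs = fs; }

/* Before: uaccess_kernel() went through the segment_eq() helper. */
#define segment_eq(a, b)	((a).seg == (b).seg)
#define uaccess_kernel_old()	segment_eq(get_fs(), KERNEL_DS)

/* After: the comparison is open-coded, removing one layer of indirection. */
#define uaccess_kernel()	(get_fs().seg == KERNEL_DS.seg)
```

After `set_fs(KERNEL_DS)`, both `uaccess_kernel()` and `uaccess_kernel_old()` evaluate to true. After `set_fs(USER_DS)`, both evaluate to false.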

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Greentime Hu <green.hu@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Link: http://lkml.kernel.org/r/20200710135706.537715-5-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:57:58 -07:00
Alexander Gordeev
b450eeb0c9 s390/numa: move code to arch/s390/kernel
Move all code from arch/s390/numa/ to arch/s390/kernel/
since numa.c is the only source file and all others were
deleted with the fake NUMA support removal.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-08-11 18:16:55 +02:00
Heiko Carstens
12bbf0962a s390/time: remove select CLOCKSOURCE_VALIDATE_LAST_CYCLE again
Sven Schnelle reported that setting CLOCKSOURCE_VALIDATE_LAST_CYCLE
doesn't make sense: even if our TOD clock overflows, the delta calculation
(now - last) with unsigned 64-bit values will still be correct.
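The following sketch shows why the overflow is harmless, assuming a plain free-running 64-bit tick counter. `clock_delta()` is a hypothetical helper, not the s390 clocksource code. C defines unsigned arithmetic modulo 2^64, so the subtraction yields the true elapsed ticks as long as less than one full wrap occurred between the two readings.

```c
#include <stdint.h>

/*
 * Elapsed ticks between two readings of a free-running 64-bit counter.
 * Unsigned arithmetic is modulo 2^64, so the result stays correct even
 * if the counter wrapped once between 'last' and 'now'.
 */
static uint64_t clock_delta(uint64_t now, uint64_t last)
{
	return now - last;
}
```

For example, `clock_delta(5, UINT64_MAX - 4)` returns 10. The counter wrapped, yet the result is the 10 ticks that actually elapsed.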

Therefore revert commit 555701a714 ("s390/time: select
CLOCKSOURCE_VALIDATE_LAST_CYCLE").

Fixes: 555701a714 ("s390/time: select CLOCKSOURCE_VALIDATE_LAST_CYCLE")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-08-11 18:16:48 +02:00
Mikhail Zaslonko
0990d836ce s390/debug: debug feature version 3
Change the __debug_entry structure as follows:
 - Remove the redundant union.
 - The field containing the cpuid is expanded to 16 bits. An 8-bit width
   was not enough since we already support up to 512 CPUs.
 - The field containing the timestamp is expanded to 60 bits. The
   timestamp itself is now stored as absolute Unix time in microseconds,
   taking the Epoch Index into account.
Adjust the default header for debug entries by setting the minimum width
for cpuid to 4 digits.
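A hypothetical layout, modeled only on the two field widths described above, shows what the wider fields buy. `struct debug_entry_sketch`, `make_entry()` and `cpu_in_8_bits()` are illustrative names. The sketch is not the kernel's actual struct __debug_entry, which has further fields. It also relies on uint64_t bit-fields, which GCC and Clang accept.

```c
#include <stdint.h>

/*
 * Hypothetical entry layout modelled on the description above; it is
 * not the kernel's struct __debug_entry.
 */
struct debug_entry_sketch {
	uint64_t clock : 60;	/* Unix time in microseconds */
	uint16_t cpu;		/* 16 bits: room for CPU numbers up to 65535 */
};

/* Fill an entry; clock values wider than 60 bits would be truncated. */
static struct debug_entry_sketch make_entry(uint64_t usecs, unsigned int cpu)
{
	struct debug_entry_sketch e = { .clock = usecs, .cpu = (uint16_t)cpu };
	return e;
}

/* What an 8-bit cpu field would keep: only the low 8 bits survive. */
static unsigned int cpu_in_8_bits(unsigned int cpu)
{
	return (uint8_t)cpu;
}
```

`make_entry(1597162603000000ULL, 511)` keeps CPU 511 intact, whereas `cpu_in_8_bits(511)` yields 255. That is the kind of high-bit truncation the s390-5.9-2 merge message refers to. At microsecond resolution, 60 bits cover roughly 36,000 years.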

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-08-11 18:16:43 +02:00