Commit Graph

766574 Commits

Author SHA1 Message Date
Nicholas Piggin
f1cb8f9beb powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags
The ISA suggests ptesync after setting a pte, to prevent a table walk
initiated by a subsequent access from missing that store and causing a
spurious fault. This is an architectual allowance that allows an
implementation's page table walker to be incoherent with the store
queue.

However there is no correctness problem in taking a spurious fault in
userspace -- the kernel copes with these at any time, so the updated
pte will be found eventually. Spurious kernel faults on vmap memory
must be avoided, so a ptesync is put into flush_cache_vmap.

On POWER9 so far I have not found a measurable window where this can
result in more minor faults, so as an optimisation, remove the costly
ptesync from pte updates. If an implementation benefits from ptesync,
it would be better to add it back in update_mmu_cache, so it's not
done for things like fork(2).

fork --fork --exec benchmark improved 5.2% (12400->13100).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:36 +10:00
Nicholas Piggin
68662f85f3 powerpc/64s/radix: prefetch user address in update_mmu_cache
Prefetch the faulting address in update_mmu_cache to give the page
table walker perhaps 100 cycles head start as locks are dropped and
the interrupt completed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:35 +10:00
Nicholas Piggin
f569bd94ef powerpc/64s/radix: make ptep_get_and_clear_full non-atomic for the full case
This matches other architectures, when we know there will be no
further accesses to the address (e.g., for teardown), page table
entries can be cleared non-atomically.

The comments about NMMU are bogus: all MMU notifiers (including NMMU)
are released at this point, with their TLBs flushed. An NMMU access at
this point would be a bug.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:35 +10:00
Nicholas Piggin
6d8278c414 powerpc/64s/radix: do not flush TLB on spurious fault
In the case of a spurious fault (which can happen due to a race with
another thread that changes the page table), the default Linux mm code
calls flush_tlb_page for that address. This is not required because
the pte will be re-fetched. Hash does not wire this up to a hardware
TLB flush for this reason. This patch avoids the flush for radix.

>From Power ISA v3.0B, p.1090:

    Setting a Reference or Change Bit or Upgrading Access Authority
    (PTE Subject to Atomic Hardware Updates)

    If the only change being made to a valid PTE that is subject to
    atomic hardware updates is to set the Refer- ence or Change bit to
    1 or to add access authorities, a simpler sequence suffices
    because the translation hardware will refetch the PTE if an access
    is attempted for which the only problems were reference and/or
    change bits needing to be set or insufficient access authority.

The nest MMU on POWER9 does not re-fetch the PTE after such an access
attempt before faulting, so address spaces with a coprocessor
attached will continue to flush in these cases.

This reduces tlbies for a kernel compile workload from 0.95M to 0.90M.

fork --fork --exec benchmark improved 0.5% (12300->12400).

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:35 +10:00
Nicholas Piggin
e5f7cb58c2 powerpc/64s/radix: do not flush TLB when relaxing access
Radix flushes the TLB when updating ptes to increase permissiveness
of protection (increase access authority). Book3S does not require
TLB flushing in this case, and it is not done on hash. This patch
avoids the flush for radix.

>From Power ISA v3.0B, p.1090:

    Setting a Reference or Change Bit or Upgrading Access Authority
    (PTE Subject to Atomic Hardware Updates)

    If the only change being made to a valid PTE that is subject to
    atomic hardware updates is to set the Reference or Change bit to 1
    or to add access authorities, a simpler sequence suffices because
    the translation hardware will refetch the PTE if an access is
    attempted for which the only problems were reference and/or change
    bits needing to be set or insufficient access authority.

The nest MMU on POWER9 does not re-fetch the PTE after such an access
attempt before faulting, so address spaces with a coprocessor
attached will continue to flush in these cases.

This reduces tlbies for a kernel compile workload from 1.28M to 0.95M,
tlbiels from 20.17M 19.68M.

fork --fork --exec benchmark improved 2.77% (12000->12300).

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:34 +10:00
Aneesh Kumar K.V
bd5050e38a powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang
When relaxing access (read -> read_write update), pte needs to be marked invalid
to handle a nest MMU bug. We also need to do a tlb flush after the pte is
marked invalid before updating the pte with new access bits.

We also move tlb flush to platform specific __ptep_set_access_flags. This will
help us to gerid of unnecessary tlb flush on BOOK3S 64 later. We don't do that
in this patch. This also helps in avoiding multiple tlbies with coprocessor
attached.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:34 +10:00
Aneesh Kumar K.V
e4c1112c3f powerpc/mm: Change function prototype
In later patch, we use the vma and psize to do tlb flush. Do the prototype
update in separate patch to make the review easy.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:34 +10:00
Aneesh Kumar K.V
044003b52a powerpc/mm/radix: Move function from radix.h to pgtable-radix.c
In later patch we will update them which require them to be moved
to pgtable-radix.c. Keeping the function in radix.h results in
compile warning as below.

./arch/powerpc/include/asm/book3s/64/radix.h: In function ‘radix__ptep_set_access_flags’:
./arch/powerpc/include/asm/book3s/64/radix.h:196:28: error: dereferencing pointer to incomplete type ‘struct vm_area_struct’
  struct mm_struct *mm = vma->vm_mm;
                            ^~
./arch/powerpc/include/asm/book3s/64/radix.h:204:6: error: implicit declaration of function ‘atomic_read’; did you mean ‘__atomic_load’? [-Werror=implicit-function-declaration]
      atomic_read(&mm->context.copros) > 0) {
      ^~~~~~~~~~~
      __atomic_load
./arch/powerpc/include/asm/book3s/64/radix.h:204:21: error: dereferencing pointer to incomplete type ‘struct mm_struct’
      atomic_read(&mm->context.copros) > 0) {

Instead of fixing header dependencies, we move the function to pgtable-radix.c
Also the function is now large to be a static inline . Doing the
move in separate patch helps in review.

No functional change in this patch. Only code movement.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:34 +10:00
Aneesh Kumar K.V
f069ff396d powerpc/mm/hugetlb: Update huge_ptep_set_access_flags to call __ptep_set_access_flags directly
In a later patch, we want to update __ptep_set_access_flags take page size
arg. This makes ptep_set_access_flags only work with mmu_virtual_psize.
To simplify the code make huge_ptep_set_access_flags directly call
__ptep_set_access_flags so that we can compute the hugetlb page size in
hugetlb function.

Now that ptep_set_access_flags won't be called for hugetlb remove
the is_vm_hugetlb_page() check and add the assert of pte lock
unconditionally.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:33 +10:00
Alastair D'Silva
721c551d31 ocxl: Document new OCXL IOCTLs
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:33 +10:00
Alastair D'Silva
02a8e5bc1c ocxl: Add an IOCTL so userspace knows what OCXL features are available
In order for a userspace AFU driver to call the POWER9 specific
OCXL_IOCTL_ENABLE_P9_WAIT, it needs to verify that it can actually
make that call.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:32 +10:00
Alastair D'Silva
e948e06fc6 ocxl: Expose the thread_id needed for wait on POWER9
In order to successfully issue as_notify, an AFU needs to know the TID
to notify, which in turn means that this information should be
available in userspace so it can be communicated to the AFU.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:32 +10:00
Alastair D'Silva
19df39581c ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's action
The function removes the process element from NPU cache.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:32 +10:00
Alastair D'Silva
71cc64a85d powerpc: use task_pid_nr() for TID allocation
The current implementation of TID allocation, using a global IDR, may
result in an errant process starving the system of available TIDs.
Instead, use task_pid_nr(), as mentioned by the original author. The
scenario described which prevented it's use is not applicable, as
set_thread_tidr can only be called after the task struct has been
populated.

In the unlikely event that 2 threads share the TID and are waiting,
all potential outcomes have been determined safe.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:31 +10:00
Alastair D'Silva
3449f191ca powerpc: Use TIDR CPU feature to control TIDR allocation
Switch the use of TIDR on it's CPU feature, rather than assuming it
is available based on architecture.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:31 +10:00
Alastair D'Silva
819844285e powerpc: Add TIDR CPU feature for POWER9
This patch adds a CPU feature bit to show whether the CPU has
the TIDR register available, enabling as_notify/wait in userspace.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:31 +10:00
Nicholas Piggin
56c0b48b1e powerpc/powernv: process all OPAL event interrupts with kopald
Using irq_work for processing OPAL event interrupts is not necessary.
irq_work is typically used to schedule work from NMI context, a
softirq may be more appropriate. However OPAL events are not
particularly performance or latency critical, so they can all be
invoked by kopald.

This patch removes the irq_work queueing, and instead wakes up
kopald when there is an event to be processed. kopald processes
interrupts individually, enabling irqs and calling cond_resched
between each one to minimise latencies.

Event handlers themselves should still use threaded handlers,
workqueues, etc. as necessary to avoid high interrupts-off latencies
within any single interrupt.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:30 +10:00
Nicholas Piggin
ee03b9b447 powerpc/powernv: call OPAL_QUIESCE before OPAL_SIGNAL_SYSTEM_RESET
Although it is often possible to recover a CPU that was interrupted
from OPAL with a system reset NMI, it's undesirable to interrupt them
for a few reasons. Firstly because dump/debug code itself needs to
call firmware, so it could hang on a lock or possibly corrupt a
per-cpu data structure if it or another CPU was interrupted from
OPAL. Secondly, the kexec crash dump code will not return from
interrupt to unwind the OPAL call.

Call OPAL_QUIESCE with QUIESCE_HOLD before sending an NMI IPI to
another CPU, which wait for it to leave firmware (or time out) to
avoid this problem in normal conditions. Firmware bugs may still
result in a timeout and interrupting OPAL, but that is the best
option (stops the CPU, and possibly allows firmware to be debugged).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:30 +10:00
Nicholas Piggin
3130a7bb6e powerpc/64: change softe to irqmask in show_regs and xmon
When the soft enabled flag was changed to a soft disable mask, xmon
and register dump code was not updated to reflect that, which is
confusing ('SOFTE: 1' previously meant interrupts were soft enabled,
currently it means the opposite, the general interrupt type has been
disabled).

Fix this by using the name irqmask, and printing it in hex.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:30 +10:00
Nicholas Piggin
81ea11d3af powerpc/pmu/fsl: fix is_nmi test for irq mask change
When soft enabled was changed to irq disabled mask, this test missed
being converted (although the equivalent book3s test was converted).

The PMU drivers consider it an NMI when they take a PMI while general
interrupts are disabled. This change restores that behaviour.

Fixes: 01417c6cc7 ("powerpc/64: Change soft_enabled from flag to bitmask")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:29 +10:00
Nicholas Piggin
e360cd37f0 powerpc/time: account broadcast timer event interrupts separately
These are not local timer interrupts but IPIs. It's good to be able
to see how timer offloading is behaving, so split these out into
their own category.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:29 +10:00
Nicholas Piggin
21bfd6a8e9 powerpc: move a stray NMI IPI case under NMI_IPI ifdef
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:29 +10:00
Nicholas Piggin
bc90711331 powerpc: move timer broadcast code under GENERIC_CLOCKEVENTS_BROADCAST ifdef
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:28 +10:00
Nicholas Piggin
a7cba02dec powerpc: allow soft-NMI watchdog to cover timer interrupts with large decrementers
Large decrementers (e.g., POWER9) can take a very long time to wrap,
so when the timer iterrupt handler sets the decrementer to max so as
to avoid taking another decrementer interrupt when hard enabling
interrupts before running timers, it effectively disables the soft
NMI coverage for timer interrupts.

Fix this by using the traditional 31-bit value instead, which wraps
after a few seconds. masked interrupt code does the same thing, and
in normal operation neither of these paths would ever wrap even the
31 bit value.

Note: the SMP watchdog should catch timer interrupt lockups, but it
is preferable for the local soft-NMI to catch them, mainly to avoid
the IPI.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:28 +10:00
Nicholas Piggin
3f984620f9 powerpc: generic clockevents broadcast receiver call tick_receive_broadcast
The broadcast tick recipient can call tick_receive_broadcast rather
than re-running the full timer interrupt.

It does not have to check for the next event time, because the sender
already determined the timer has expired. It does not have to test
irq_work_pending, because that's a direct decrementer interrupt and
does not go through the clock events subsystem. And it does not have
to read PURR because that was removed with the previous patch.

This results in no code size change, but both the decrementer and
broadcast path lengths are reduced.

Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:27 +10:00
Nicholas Piggin
3d3a6021dd powerpc/pseries: lparcfg calculate PURR on demand
For SPLPAR, lparcfg provides a sum of PURR registers for all CPUs.
Currently this is done by reading PURR in context switch and timer
interrupt, and storing that into a per-CPU variable. These are summed
to provide the value.

This does not work with all timer schemes (e.g., NO_HZ_FULL), and it
is sub-optimal for performance because it reads the PURR register on
every context switch, although that's been difficult to distinguish
from noise in the contxt_switch microbenchmark.

This patch implements the sum by calling a function on each CPU, to
read and add PURR values of each CPU.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:27 +10:00
Nicholas Piggin
36d632ea83 powerpc/64: remove start_tb and accum_tb from thread_struct
These fields are only written to.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:26 +10:00
Nicholas Piggin
54071e4176 powerpc/64s: micro-optimise __hard_irq_enable() for mtmsrd L=1 support
Book3S minimum supported ISA version now requires mtmsrd L=1. This
instruction does not require bits other than RI and EE to be supplied,
so __hard_irq_enable() and __hard_irq_disable() does not have to read
the kernel_msr from paca.

Interrupt entry code already relies on L=1 support.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:26 +10:00
Nicholas Piggin
9f4b61b277 powerpc/pseries: put cede MSR[EE] check under IRQ_SOFT_MASK_DEBUG
This check does not catch IRQ soft mask bugs, but this option is
slightly more suitable than TRACE_IRQFLAGS.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:25 +10:00
Nicholas Piggin
ebb37cf3ff powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled
irq_work_raise should not cause a decrementer exception unless it is
called from NMI context. Doing so often just results in an immediate
masked decrementer interrupt:

   <...>-550    90d...    4us : update_curr_rt <-dequeue_task_rt
   <...>-550    90d...    5us : dbs_update_util_handler <-update_curr_rt
   <...>-550    90d...    6us : arch_irq_work_raise <-irq_work_queue
   <...>-550    90d...    7us : soft_nmi_interrupt <-soft_nmi_common
   <...>-550    90d...    7us : printk_nmi_enter <-soft_nmi_interrupt
   <...>-550    90d.Z.    8us : rcu_nmi_enter <-soft_nmi_interrupt
   <...>-550    90d.Z.    9us : rcu_nmi_exit <-soft_nmi_interrupt
   <...>-550    90d...    9us : printk_nmi_exit <-soft_nmi_interrupt
   <...>-550    90d...   10us : cpuacct_charge <-update_curr_rt

The soft_nmi_interrupt here is the call into the watchdog, due to the
decrementer interrupt firing with irqs soft-disabled. This is
harmless, but sub-optimal.

When it's not called from NMI context or with interrupts enabled, mark
the decrementer pending in the irq_happened mask directly, rather than
having the masked decrementer interupt handler do it. This will be
replayed at the next local_irq_enable. See the comment for details.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:25 +10:00
Alexey Kardashevskiy
98fd72fe82 powerpc/powernv/ioda2: Remove redundant free of TCE pages
When IODA2 creates a PE, it creates an IOMMU table with it_ops::free
set to pnv_ioda2_table_free() which calls pnv_pci_ioda2_table_free_pages().

Since iommu_tce_table_put() calls it_ops::free when the last reference
to the table is released, explicit call to pnv_pci_ioda2_table_free_pages()
is not needed so let's remove it.

This should fix double free in the case of PCI hotuplug as
pnv_pci_ioda2_table_free_pages() does not reset neither
iommu_table::it_base nor ::it_size.

This was not exposed by SRIOV as it uses different code path via
pnv_pcibios_sriov_disable().

IODA1 does not inialize it_ops::free so it does not have this issue.

Fixes: c5f7700bbd ("powerpc/powernv: Dynamically release PE")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:25 +10:00
Yisheng Xie
0abbf2bfdc powerpc/xmon: use match_string() helper
match_string() returns the index of an array for a matching string,
which can be used instead of open coded variant.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:24 +10:00
Christophe Leroy
2479bfc9bc powerpc: Fix build by disabling attribute-alias warning for SYSCALL_DEFINEx
GCC 8.1 emits warnings such as the following. As arch/powerpc code is
built with -Werror, this breaks the build with GCC 8.1.

  In file included from arch/powerpc/kernel/pci_64.c:23:
  ./include/linux/syscalls.h:233:18: error: 'sys_pciconfig_iobase' alias
  between functions of incompatible types 'long int(long int, long
  unsigned int, long unsigned int)' and 'long int(long int, long int,
  long int)' [-Werror=attribute-alias]
    asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
                    ^~~
  ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
    __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)

This patch inhibits those warnings.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Trim change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:24 +10:00
Christophe Leroy
c959988118 powerpc/64: Fix strncpy() related build failures with GCC 8.1
GCC 8.1 warns about possible string truncation:

  arch/powerpc/kernel/nvram_64.c:1042:2: error: 'strncpy' specified
  bound 12 equals destination size [-Werror=stringop-truncation]
    strncpy(new_part->header.name, name, 12);

  arch/powerpc/platforms/ps3/repository.c:106:2: error: 'strncpy'
  output truncated before terminating nul copying 8 bytes from a
  string of the same length [-Werror=stringop-truncation]
    strncpy((char *)&n, text, 8);

Fix it by using memcpy(). To make that safe we need to ensure the
destination is pre-zeroed. Use kzalloc() in the nvram code and
initialise the u64 to zero in the ps3 code.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Use kzalloc() in the nvram code, flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:24 +10:00
Michael Ellerman
2135a6ec3e Merge branch 'topic/pkey' into next
This is a branch with a mixture of mm, x86 and powerpc commits all
relating to some minor cross-arch pkeys consolidation. The x86/mm
changes have been reviewed by Ingo & Dave Hansen and the tree has been
in linux-next for some weeks without issue.
2018-06-03 20:35:27 +10:00
Michael Ellerman
f1079d3a3d Merge branch 'fixes' into next
We ended up with an ugly conflict between fixes and next in ftrace.h
involving multiple nested ifdefs, and the automatic resolution is
wrong. So merge fixes into next so we can fix it up.
2018-06-03 20:32:02 +10:00
Michael Ellerman
b5240b1439 Merge branch 'topic/kbuild' into next
Merge in some commits we're sharing with the kbuild tree.
2018-06-03 20:24:15 +10:00
Michael Ellerman
481c63acba Merge branch 'topic/ppc-kvm' into next
Merge in some commits we're sharing with the kvm-ppc tree.
2018-06-03 20:23:54 +10:00
Rafael J. Wysocki
9b34ffa09d Merge back earlier PM tools material for v4.18. 2018-06-03 10:12:30 +02:00
Rafael J. Wysocki
ba8042a85e Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat utility updates for v4.18 from Len Brown.

* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (65 commits)
  tools/power turbostat: update version number
  tools/power turbostat: Add Node in output
  tools/power turbostat: add node information into turbostat calculations
  tools/power turbostat: remove num_ from cpu_topology struct
  tools/power turbostat: rename num_cores_per_pkg to num_cores_per_node
  tools/power turbostat: track thread ID in cpu_topology
  tools/power turbostat: Calculate additional node information for a package
  tools/power turbostat: Fix node and siblings lookup data
  tools/power turbostat: set max_num_cpus equal to the cpumask length
  tools/power turbostat: if --num_iterations, print for specific number of iterations
  tools/power turbostat: Add Cannon Lake support
  tools/power turbostat: delete duplicate #defines
  x86: msr-index.h: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
  tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
  tools/power turbostat: add POLL and POLL% column
  tools/power turbostat: Fix --hide Pk%pc10
  tools/power turbostat: Build-in "Low Power Idle" counters support
  tools/power turbostat: Don't make man pages executable
  tools/power turbostat: remove blank lines
  tools/power turbostat: a small C-states dump readability immprovement
  ...
2018-06-03 10:11:50 +02:00
Matthew Wilcox
cc4a90ac81 dax: dax_insert_mapping_entry always succeeds
It does not return an error, so we don't need to check the return value
for IS_ERR().  Indeed, it is a bug to do so; with a sufficiently large
PFN, a legitimate DAX entry may be mistaken for an error return.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-06-02 19:39:37 -07:00
Ming Lei
32a50fabb3 blk-mq: update nr_requests when switching to 'none' scheduler
Now we setup q->nr_requests when switching to one new scheduler,
but not do it for 'none', then q->nr_requests may not be correct
for 'none'.

This patch fixes this issue by always updating 'nr_requests' when
switching to 'none'.

Cc: Marco Patalano <mpatalan@redhat.com>
Cc: "Ewan D. Milne" <emilne@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-02 20:35:00 -06:00
Jens Axboe
cd4a4ae468 block: don't use blocking queue entered for recursive bio submits
If we end up splitting a bio and the queue goes away between
the initial submission and the later split submission, then we
can block forever in blk_queue_enter() waiting for the reference
to drop to zero. This will never happen, since we already hold
a reference.

Mark a split bio as already having entered the queue, so we can
just use the live non-blocking queue enter variant.

Thanks to Tetsuo Handa for the analysis.

Reported-by: syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-02 20:35:00 -06:00
Kent Overstreet
d00a11df69 dm-crypt: fix warning in shutdown path
The counter for the number of allocated pages includes pages in the
mempool's reserve, so checking that the number of allocated pages is 0
needs to happen after we exit the mempool.

Fixes: 6f1c819c21 ("dm: convert to bioset_init()/mempool_init()")
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Reported-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Mike Snitzer <snitzer@redhat.com>

Fixed to always just use percpu_counter_sum()

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-02 20:35:00 -06:00
Linus Torvalds
918fe1b315 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Infinite loop in _decode_session6(), from Eric Dumazet.

 2) Pass correct argument to nla_strlcpy() in netfilter, also from Eric
    Dumazet.

 3) Out of bounds memory access in ipv6 srh code, from Mathieu Xhonneux.

 4) NULL deref in XDP_REDIRECT handling of tun driver, from Toshiaki
    Makita.

 5) Incorrect idr release in cls_flower, from Paul Blakey.

 6) Probe error handling fix in davinci_emac, from Dan Carpenter.

 7) Memory leak in XPS configuration, from Alexander Duyck.

 8) Use after free with cloned sockets in kcm, from Kirill Tkhai.

 9) MTU handling fixes fo ip_tunnel and ip6_tunnel, from Nicolas
    Dichtel.

10) Fix UAPI hole in bpf data structure for 32-bit compat applications,
    from Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
  bpf: fix uapi hole for 32 bit compat applications
  net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
  ip6_tunnel: remove magic mtu value 0xFFF8
  ip_tunnel: restore binding to ifaces with a large mtu
  net: dsa: b53: Add BCM5389 support
  kcm: Fix use-after-free caused by clonned sockets
  net-sysfs: Fix memory leak in XPS configuration
  ixgbe: fix parsing of TC actions for HW offload
  net: ethernet: davinci_emac: fix error handling in probe()
  net/ncsi: Fix array size in dumpit handler
  cls_flower: Fix incorrect idr release when failing to modify rule
  net/sonic: Use dma_mapping_error()
  xfrm Fix potential error pointer dereference in xfrm_bundle_create.
  vhost_net: flush batched heads before trying to busy polling
  tun: Fix NULL pointer dereference in XDP redirect
  be2net: Fix error detection logic for BE3
  net: qmi_wwan: Add Netgear Aircard 779S
  mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG
  atm: zatm: fix memcmp casting
  iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs
  ...
2018-06-02 17:35:53 -07:00
Bjorn Helgaas
010caed4cc PCI/AER: Decode Error Source Requester ID
Decode the Requester ID from the AER Error Source Register into domain/
bus/device/function format to match other logging.  In cases where the ID
matches the device used for pci_err(), drop the extra ID completely so we
don't print it twice.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-06-02 19:29:29 -05:00
Borislav Petkov
ad4050dcda PCI/AER: Remove aer_recover_work_func() forward declaration
Just move the actual function up so that it is visible to its user
aer_recover_queue().

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-06-02 19:29:27 -05:00
Oza Pawandeep
b09803b5e5 PCI/DPC: Use the generic pcie_do_fatal_recovery() path
Our goal is to handle ERR_FATAL errors similarly, whether they are reported
via AER or via DPC.  A previous commit changed AER so it handles ERR_FATAL
by calling driver .remove() methods and resetting the Link.  DPC already
does that (although the Link reset is done automatically by hardware and
happens before we call the driver .remove() methods).

Restructure the DPC code so it calls the same pcie_do_fatal_recovery()
interface used by AER.  This makes it clearer that we want to use the same
path.

Implement the .reset_link() method used by pcie_do_fatal_recovery().  For
DPC, the actual reset is done automatically by hardware, so we really only
have to wait for the Link to be inactive, then release the Port from DPC.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog, DPC_FATAL is not a bitfield, can be sequential]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-06-02 19:29:27 -05:00
Oza Pawandeep
0b91439d35 PCI/AER: Pass service type to pcie_do_fatal_recovery()
Pass the service type to pcie_do_fatal_recovery() instead of assuming AER.
We will make DPC also use pcie_do_fatal_recovery(), and it needs to do
things a little differently for AER and DPC.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-06-02 19:29:26 -05:00
Oza Pawandeep
6927868e7a PCI/DPC: Disable ERR_NONFATAL handling by DPC
PCIe ERR_NONFATAL errors mean a particular transaction is unreliable but
the Link is otherwise fully functional (PCIe r4.0, sec 6.2.2).

The AER driver handles these by logging the error details and calling
driver-supplied pci_error_handlers callbacks.  It does not reset downstream
devices, does not remove them from the PCI subsystem, does not re-enumerate
them, and does not call their driver .remove() or .probe() methods.

But DPC driver previously enabled DPC on ERR_NONFATAL, so if the hardware
supports DPC, these errors caused a Link reset (performed automatically by
the hardware), followed by the DPC driver removing affected devices (which
calls their .remove() methods), bringing the Link back up, and
re-enumerating (which calls driver .probe() methods).

Disable ERR_NONFATAL DPC triggering so these errors will only be handled by
AER.  This means drivers won't have to deal with different usage of their
pci_error_handlers callbacks and .probe() and .remove() methods based on
whether the platform has DPC support.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-06-02 19:29:25 -05:00