Commit Graph

976572 Commits

Author SHA1 Message Date
Bjorn Helgaas
4cc0a34ae2 Merge branch 'remotes/lorenzo/pci/iproc'
- Declare iproc register set sizes to help avoid out-of-bound accesses
  (Bharat Gooty)

- Invalidate iproc PAXB IARR1/IMAP1 inbound windows to erase bootloader
  footprint (Roman Bacik)

- Log Root Port link speed & width at startup (Srinath Mannam)

* remotes/lorenzo/pci/iproc:
  PCI: iproc: Enhance PCIe Link information display
  PCI: iproc: Invalidate correct PAXB inbound windows
  PCI: iproc: Fix out-of-bound array accesses
2020-12-15 15:11:12 -06:00
Bjorn Helgaas
ff9f1683b6 Merge branch 'remotes/lorenzo/pci/dwc'
- Support multiple ATU memory regions (Rob Herring)

- Warn if non-prefetchable memory aperture is > 32-bit (Vidya Sagar)

- Allow programming ATU for >4GB memory (Vidya Sagar)

- Move ATU offset out of driver match data (Rob Herring)

- Move "dbi", "dbi2", and "addr_space" resource setup to common code (Rob
  Herring)

- Remove unneeded function wrappers (Rob Herring)

- Ensure all outbound ATU windows are reset to reduce dependencies on
  bootloader (Rob Herring)

- Use the default MSI irq_chip for dra7xx (Rob Herring)

- Drop the .set_num_vectors() host op (Rob Herring)

- Move MSI interrupt setup into DWC common code (Rob Herring)

- Rework and simplify DWC MSI initialization (Rob Herring)

- Move link handling to DWC common code (Rob Herring)

- Move dw_pcie_msi_init() calls to DWC common code (Rob Herring)

- Move dw_pcie_setup_rc() calls to DWC common code (Rob Herring)

- Remove unnecessary wrappers around dw_pcie_host_init() (Rob Herring)

- Revert "keystone: Drop duplicated 'num-viewport'" to prepare for
  detecting number of iATU regions without help from DT (Rob Herring)

- Move inbound and outbound windows to common struct (Rob Herring)

- Detect number of DWC iATU windows from device registers (Rob Herring)

- Drop samsung,exynos5440-pcie binding (Marek Szyprowski)

- Add samsung,exynos-pcie and samsung,exynos-pcie-phy bindings for
  Exynos5433 variant (Marek Szyprowski)

- Rework phy-exynos-pcie driver to support Exynos5433 PCIe PHY (Jaehoon
  Chung)

- Rework pci-exynos.c to support Exynos5433 PCIe host (Jaehoon Chung)

- Move tegra "dbi" accesses to post common DWC initialization (Vidya Sagar)

- Read tegra dbi" base address in application logic (Vidya Sagar)

- Fix tegra ASPM-L1SS advertisement disable code (Vidya Sagar)

- Set Tegra194 DesignWare IP version to 0x490A (Vidya Sagar)

- Continue tegra unconfig sequence even if parts fail (Vidya Sagar)

- Check return value of tegra_pcie_init_controller() (Vidya Sagar)

- Disable tegra LTSSM during L2 entry (Vidya Sagar)

- Add SM8250 SoC PCIe DT bindings and support (Manivannan Sadhasivam)

- Add SM8250 BDF to SID mapping (Manivannan Sadhasivam)

- Set 32-bit DMA mask for DWC MSI target address allocation (Vidya Sagar)

* remotes/lorenzo/pci/dwc:
  PCI: dwc: Set 32-bit DMA mask for MSI target address allocation
  PCI: qcom: Add support for configuring BDF to SID mapping for SM8250
  PCI: qcom: Add SM8250 SoC support
  dt-bindings: pci: qcom: Document PCIe bindings for SM8250 SoC
  PCI: tegra: Disable LTSSM during L2 entry
  PCI: tegra: Check return value of tegra_pcie_init_controller()
  PCI: tegra: Continue unconfig sequence even if parts fail
  PCI: tegra: Set DesignWare IP version
  PCI: tegra: Fix ASPM-L1SS advertisement disable code
  PCI: tegra: Read "dbi" base address to program in application logic
  PCI: tegra: Move "dbi" accesses to post common DWC initialization
  PCI: dwc: exynos: Rework the driver to support Exynos5433 variant
  phy: samsung: phy-exynos-pcie: rework driver to support Exynos5433 PCIe PHY
  dt-bindings: phy: exynos: add the samsung,exynos-pcie-phy binding
  dt-bindings: PCI: exynos: add the samsung,exynos-pcie binding
  dt-bindings: PCI: exynos: drop samsung,exynos5440-pcie binding
  PCI: dwc: Detect number of iATU windows
  PCI: dwc: Move inbound and outbound windows to common struct
  Revert "PCI: dwc/keystone: Drop duplicated 'num-viewport'"
  PCI: dwc: Remove unnecessary wrappers around dw_pcie_host_init()
  PCI: dwc: Move dw_pcie_setup_rc() to DWC common code
  PCI: dwc: Move dw_pcie_msi_init() into core
  PCI: dwc: Move link handling into common code
  PCI: dwc: Rework MSI initialization
  PCI: dwc: Move MSI interrupt setup into DWC common code
  PCI: dwc: Drop the .set_num_vectors() host op
  PCI: dwc/dra7xx: Use the common MSI irq_chip
  PCI: dwc: Ensure all outbound ATU windows are reset
  PCI: dwc/intel-gw: Remove some unneeded function wrappers
  PCI: dwc: Move "dbi", "dbi2", and "addr_space" resource setup into common code
  PCI: dwc/intel-gw: Move ATU offset out of driver match data
  PCI: dwc: Add support to program ATU for >4GB memory
  PCI: of: Warn if non-prefetchable memory aperture size is > 32-bit
  PCI: dwc: Support multiple ATU memory regions
2020-12-15 15:11:11 -06:00
Bjorn Helgaas
ee4871d010 Merge branch 'remotes/lorenzo/pci/cadence'
- Make "cdns,max-outbound-regions" optional (Kishon Vijay Abraham I)

- Fix "ti,syscon-pcie-ctrl" DT property to take argument (Kishon Vijay
  Abraham I)

- Add TI J7200 host and endpoint mode DT bindings (Kishon Vijay Abraham I)

* remotes/lorenzo/pci/cadence:
  PCI: j721e: Get offset within "syscon" from "ti,syscon-pcie-ctrl" phandle arg
  dt-bindings: PCI: Add EP mode dt-bindings for TI's J7200 SoC
  dt-bindings: PCI: Add host mode dt-bindings for TI's J7200 SoC
  dt-bindings: pci: ti,j721e: Fix "ti,syscon-pcie-ctrl" to take argument
  PCI: cadence: Do not error if "cdns,max-outbound-regions" is not found
  dt-bindings: PCI: Make "cdns,max-outbound-regions" optional property
2020-12-15 15:11:11 -06:00
Bjorn Helgaas
0032242459 Merge branch 'remotes/lorenzo/pci/brcmstb'
- Initialize "tmp" before use (Jim Quinlan)

* remotes/lorenzo/pci/brcmstb:
  PCI: brcmstb: Initialize "tmp" before use
2020-12-15 15:11:11 -06:00
Bjorn Helgaas
7546ad5e3c Merge branch 'remotes/lorenzo/pci/aardvark'
- Update comment about delay before link training (Pali Rohár)

* remotes/lorenzo/pci/aardvark:
  PCI: aardvark: Update comment about disabling link training
2020-12-15 15:11:10 -06:00
Bjorn Helgaas
7c250f8293 Merge branch 'pci/ecam'
- Unify ECAM constants in native PCI Express drivers (Krzysztof Wilczyński)

- Add thunder-pem constant for custom ".bus_shift" initialiser (Krzysztof
  Wilczyński)

- Convert iproc to use new ECAM constants (Krzysztof Wilczyński)

- Change vmd __iomem pointers from "char *" to "void *" (Krzysztof
  Wilczyński)

- Remove unused xgene .bus_shift initialisers (Krzysztof Wilczyński)

* pci/ecam:
  PCI: xgene: Removed unused ".bus_shift" initialisers from pci-xgene.c
  PCI: vmd: Update type of the __iomem pointers
  PCI: iproc: Convert to use the new ECAM constants
  PCI: thunder-pem: Add constant for custom ".bus_shift" initialiser
  PCI: Unify ECAM constants in native PCI Express drivers
2020-12-15 15:11:10 -06:00
Bjorn Helgaas
c086b55e37 Merge branch 'pci/virtualization'
- Mark AMD Raven iGPU ATS as broken in some Emerson platforms to avoid
  issues (Alex Deucher)

- Add function 1 DMA alias quirk for Marvell 9215 SATA controller (Bjorn
  Helgaas)

* pci/virtualization:
  PCI: Add function 1 DMA alias quirk for Marvell 9215 SATA controller
  PCI: Mark AMD Raven iGPU ATS as broken in some platforms
2020-12-15 15:11:09 -06:00
Bjorn Helgaas
72b3a644bb Merge branch 'pci/ptm'
- Save/restore Precision Time Measurement Capability for suspend/resume
  (David E. Box)

- Disable PTM during suspend to save power (David E. Box)

* pci/ptm:
  PCI: Disable PTM during suspend to save power
  PCI/PTM: Save/restore Precision Time Measurement Capability for suspend/resume
2020-12-15 15:11:09 -06:00
Bjorn Helgaas
ff163da95b Merge branch 'pci/pm'
- Add sysfs attribute for device power state (Maximilian Luz)

- Rename pci_wakeup_bus() to pci_resume_bus() (Mika Westerberg)

- Do not generate wakeup event when runtime resuming bus (Mika Westerberg)

* pci/pm:
  PCI/PM: Do not generate wakeup event when runtime resuming device
  PCI/PM: Rename pci_wakeup_bus() to pci_resume_bus()
  PCI: Add sysfs attribute for device power state
2020-12-15 15:11:08 -06:00
Bjorn Helgaas
a48e486b37 Merge branch 'pci/msi'
- Disable MSI for broken Pericom PCIe-USB adapter (Andy Shevchenko)

- Move MSI/MSI-X init to msi.c (Bjorn Helgaas)

- Move MSI/MSI-X flags updaters to msi.c (Bjorn Helgaas)

- Warn if we assign 64-bit MSI address to device that only supports 32-bit
  MSI (Vidya Sagar)

* pci/msi:
  PCI/MSI: Set device flag indicating only 32-bit MSI support
  PCI/MSI: Move MSI/MSI-X flags updaters to msi.c
  PCI/MSI: Move MSI/MSI-X init to msi.c
  PCI: Use predefined Pericom Vendor ID
  PCI: Disable MSI for Pericom PCIe-USB adapter
2020-12-15 15:11:08 -06:00
Bjorn Helgaas
6db645f99c Merge branch 'pci/misc'
- Update kernel-doc to match function prototypes (Mauro Carvalho Chehab)

- Bounds-check "pci=resource_alignment=" requests (Bjorn Helgaas)

- Fix integer overflow in "pci=resource_alignment=" requests (Colin Ian
  King)

- Remove unused HAVE_PCI_SET_MWI definition (Heiner Kallweit)

- Reduce pci_set_cacheline_size() message to debug level (Heiner Kallweit)

* pci/misc:
  PCI: Reduce pci_set_cacheline_size() message to debug level
  PCI: Remove unused HAVE_PCI_SET_MWI
  PCI: Fix overflow in command-line resource alignment requests
  PCI: Bounds-check command-line resource alignment requests
  PCI: Fix kernel-doc markup

# Conflicts:
#	drivers/pci/pci-driver.c
2020-12-15 15:11:08 -06:00
Bjorn Helgaas
1a76dceaf4 Merge branch 'pci/hotplug'
- Remove unneeded break in ibmphp (Bjorn Helgaas)

- Fix pci_slot_release() NULL pointer dereference (Jubin Zhong)

* pci/hotplug:
  PCI: Fix pci_slot_release() NULL pointer dereference
  PCI: ibmphp: Remove unneeded break
2020-12-15 15:11:07 -06:00
Bjorn Helgaas
6a94785fb9 Merge branch 'pci/err'
- Stop writing AER Capability when we don't own it (Sean V Kelley)

- Bind RCEC devices to the Port driver (Qiuxu Zhuo)

- Cache the RCEC RA Capability offset (Sean V Kelley)

- Add pci_walk_bridge() (Sean V Kelley)

- Clear AER status only when we control AER (Sean V Kelley)

- Recover from RCEC AER errors (Sean V Kelley)

- Add pcie_link_rcec() to associate RCiEPs with RCECs (Sean V Kelley)

- Recover from RCiEP AER errors (Sean V Kelley)

- Add pcie_walk_rcec() for RCEC AER handling (Sean V Kelley)

- Add pcie_walk_rcec() for RCEC PME handling (Sean V Kelley)

- Add RCEC AER error injection support (Qiuxu Zhuo)

* pci/err:
  PCI/AER: Add RCEC AER error injection support
  PCI/PME: Add pcie_walk_rcec() to RCEC PME handling
  PCI/AER: Add pcie_walk_rcec() to RCEC AER handling
  PCI/ERR: Recover from RCiEP AER errors
  PCI/ERR: Add pcie_link_rcec() to associate RCiEPs
  PCI/ERR: Recover from RCEC AER errors
  PCI/ERR: Clear AER status only when we control AER
  PCI/ERR: Add pci_walk_bridge() to pcie_do_recovery()
  PCI/ERR: Avoid negated conditional for clarity
  PCI/ERR: Use "bridge" for clarity in pcie_do_recovery()
  PCI/ERR: Simplify by computing pci_pcie_type() once
  PCI/ERR: Simplify by using pci_upstream_bridge()
  PCI/ERR: Rename reset_link() to reset_subordinates()
  PCI/ERR: Cache RCEC EA Capability offset in pci_init_capabilities()
  PCI/ERR: Bind RCEC devices to the Root Port driver
  PCI/AER: Write AER Capability only when we control it
2020-12-15 15:11:06 -06:00
Bjorn Helgaas
e8722508dd Merge branch 'pci/enumeration'
- Decode PCIe 64 GT/s link speed (Gustavo Pimentel)

- De-duplicate Device IDs in the driver dynamic IDs list (Zhenzhong Duan)

- Return u8 from pci_find_capability() and similar (Puranjay Mohan)

- Return u16 from pci_find_ext_capability() and similar (Bjorn Helgaas)

- Include both device and resource name in config space resources
  (Alexander Lobakin)

- Fix ACPI companion lookup for device 0 on the root bus (Rafael J.
  Wysocki)

* pci/enumeration:
  PCI/ACPI: Fix companion lookup for device 0 on the root bus
  PCI: Keep both device and resource name for config space remaps
  PCI: Return u16 from pci_find_ext_capability() and similar
  PCI: Return u8 from pci_find_capability() and similar
  PCI: Avoid duplicate IDs in driver dynamic IDs list
  PCI: Move pci_match_device() ahead of new_id_store()
  PCI: Decode PCIe 64 GT/s link speed
2020-12-15 15:11:06 -06:00
Bjorn Helgaas
1559c4b588 Merge branch 'pci/aspm'
- Save/restore ASPM L1SS Capability for suspend/resume (Vidya Sagar)

* pci/aspm:
  PCI/ASPM: Save/restore L1SS Capability for suspend/resume
2020-12-15 15:11:06 -06:00
Bjorn Helgaas
059983790a PCI: Add function 1 DMA alias quirk for Marvell 9215 SATA controller
Add function 1 DMA alias quirk for Marvell 88SS9215 PCIe SSD Controller.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=42679#c135
Link: https://lore.kernel.org/r/20201110220516.697934-1-helgaas@kernel.org
Reported-by: John Smith <LK7S2ED64JHGLKj75shg9klejHWG49h5hk@protonmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-12-15 15:09:28 -06:00
Linus Torvalds
ac73e3dc8a Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few random little subsystems

 - almost all of the MM patches which are staged ahead of linux-next
   material. I'll trickle to post-linux-next work in as the dependents
   get merged up.

Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
uaccess, zram, and cleanups).

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits)
  mm: cleanup kstrto*() usage
  mm: fix fall-through warnings for Clang
  mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
  mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
  mm:backing-dev: use sysfs_emit in macro defining functions
  mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
  mm: use sysfs_emit for struct kobject * uses
  mm: fix kernel-doc markups
  zram: break the strict dependency from lzo
  zram: add stat to gather incompressible pages since zram set up
  zram: support page writeback
  mm/process_vm_access: remove redundant initialization of iov_r
  mm/zsmalloc.c: rework the list_add code in insert_zspage()
  mm/zswap: move to use crypto_acomp API for hardware acceleration
  mm/zswap: fix passing zero to 'PTR_ERR' warning
  mm/zswap: make struct kernel_param_ops definitions const
  userfaultfd/selftests: hint the test runner on required privilege
  userfaultfd/selftests: fix retval check for userfaultfd_open()
  userfaultfd/selftests: always dump something in modes
  userfaultfd: selftests: make __{s,u}64 format specifiers portable
  ...
2020-12-15 12:53:37 -08:00
Alexey Dobriyan
dfefd226b0 mm: cleanup kstrto*() usage
Range checks can folded into proper conversion function.  kstrto*() exist
for all arithmetic types.

Link: https://lkml.kernel.org/r/20201122123759.GC92364@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:47 -08:00
Gustavo A. R. Silva
01359eb201 mm: fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple of
warnings by explicitly adding a break statement instead of just letting
the code fall through to the next, and by adding a fallthrough
pseudo-keyword in places where the code is intended to fall through.

Link: https://github.com/KSPP/linux/issues/115
Link: https://lkml.kernel.org/r/f5756988b8842a3f10008fbc5b0a654f828920a9.1605896059.git.gustavoars@kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:47 -08:00
Joe Perches
bf16d19aab mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
Convert the unbounded uses of sprintf to sysfs_emit.

A few conversions may now not end in a newline if the output buffer is
overflowed.

Link: https://lkml.kernel.org/r/0c90a90f466167f8c37de4b737553cf49c4a277f.1605376435.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:47 -08:00
Joe Perches
79d4d38a03 mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
Update the function to use sysfs_emit_at while neatening the uses of
sprintf and overwriting the last space char with a newline to avoid
possible output buffer overflow.

Miscellanea:

 - in shmem_enabled_show, the removal of the indirected use of fmt
   allows __printf verification

Link: https://lkml.kernel.org/r/b612a93825e5ea330cb68d2e8b516e9687a06cc6.1605376435.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:47 -08:00
Joe Perches
5e4c0d86cf mm:backing-dev: use sysfs_emit in macro defining functions
The cocci script used in commit bdacbb8d04f ("mm: Use sysfs_emit for
struct kobject * uses") does not convert the name##_show macro because the
macro uses concatenation via ##.

Convert it by hand.

Link: https://lkml.kernel.org/r/45ec6cfc177d743f9c0ebaf35e43969dce43af42.1605376435.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:47 -08:00
Joe Perches
bfb0ffeb2a mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
Convert the only use of sprintf with struct kobject * that the cocci
script could not convert.

Miscellanea:

 - Neaten the uses of a constant string with sysfs_emit to use a const
   char * to reduce overall object size

Link: https://lkml.kernel.org/r/7df6be66bbd68e1a0bca9d35aca1341dbf94d2a7.1605376435.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:47 -08:00
Joe Perches
ae7a927d27 mm: use sysfs_emit for struct kobject * uses
Patch series "mm: Convert sysfs sprintf family to sysfs_emit", v2.

Use the new sysfs_emit family and not the sprintf family.

This patch (of 5):

Use the sysfs_emit function instead of the sprintf family.

Done with cocci script as in commit 3c6bff3cf9 ("RDMA: Convert sysfs
kobject * show functions to use sysfs_emit()")

Link: https://lkml.kernel.org/r/cover.1605376435.git.joe@perches.com
Link: https://lkml.kernel.org/r/9c249215bad6df616ba0410ad980042694970c1b.1605376435.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:47 -08:00
Mauro Carvalho Chehab
a00cda3f0a mm: fix kernel-doc markups
Kernel-doc markups should use this format:
        identifier - description

Fix some issues on mm files:

1) The definition for get_user_pages_locked() doesn't follow it.  Also,
   it expects a short descrpition at the header, followed by a long one,
   after the parameters.  Fix it.

2) Kernel-doc requires that a kernel-doc markup to be immediately below
   the function prototype, as otherwise it will rename it.  So, move
   get_pfnblock_flags_mask() description to the right place.

3) Make invalidate_mapping_pagevec() to also follow the expected
   kernel-doc format.

While here, fix a few minor English syntax issues, as suggested
by Matthew:
	will used -> will be used
	similar with -> similar to

Link: https://lkml.kernel.org/r/80e85dddc92d333bc2159ee8a2294921612e8745.1605521731.git.mchehab+huawei@kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Suggested-by: Mattew Wilcox <willy@infradead.org>	[English fixes]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:47 -08:00
Rui Salvaterra
3d711a3827 zram: break the strict dependency from lzo
From the beginning, the zram block device always enabled CRYPTO_LZO,
since lzo-rle is hardcoded as the fallback compression algorithm.  As a
consequence, on systems where another compression algorithm is chosen
(e.g.  CRYPTO_ZSTD), the lzo kernel module becomes unused, while still
having to be built/loaded.

This patch removes the hardcoded lzo-rle dependency and allows the user
to select the default compression algorithm for zram at build time.  The
previous behaviour is kept, as the default algorithm is still lzo-rle.

Link: https://lkml.kernel.org/r/20201207121245.50529-1-rsalvaterra@gmail.com
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Suggested-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:47 -08:00
Minchan Kim
194e28da1a zram: add stat to gather incompressible pages since zram set up
Currently, zram supports the stat via /sys/block/zram/mm_stat to represent
how many of incompressible pages are stored at the moment but it couldn't
show how many times incompressible pages were wrote down since zram set
up.  It's also good indication to see how zram is effective in the system.

Link: https://lkml.kernel.org/r/20201130201907.1284910-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:47 -08:00
Minchan Kim
0d8359620d zram: support page writeback
There is demand to writeback specific process pages to backing store
instead of all idles pages in the system due to storage wear out concerns
and to launching latency of apps which are most of the time idle but are
critical for resume latency.

This patch extends the writeback knob to support a specific page
writeback.

Link: https://lkml.kernel.org/r/20201020190506.3758660-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:47 -08:00
Colin Ian King
95c9ae14a9 mm/process_vm_access: remove redundant initialization of iov_r
The pointer iov_r is being initialized with a value that is never read and
it is being updated later with a new value.  The initialization is
redundant and can be removed.

Link: https://lkml.kernel.org/r/20201102120614.694917-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Miaohe Lin
110ceb8287 mm/zsmalloc.c: rework the list_add code in insert_zspage()
Rework the list_add code to make it more readable and simple.

Link: https://lkml.kernel.org/r/20201015130107.65195-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Barry Song
1ec3b5fe6e mm/zswap: move to use crypto_acomp API for hardware acceleration
Right now, all new ZIP drivers are adapted to crypto_acomp APIs rather
than legacy crypto_comp APIs.  Tradiontal ZIP drivers like lz4,lzo etc
have been also wrapped into acomp via scomp backend.  But zswap.c is still
using the old APIs.  That means zswap won't be able to work on any new ZIP
drivers in kernel.

This patch moves to use cryto_acomp APIs to fix the disconnected bridge
between new ZIP drivers and zswap.  It is probably the first real user to
use acomp but perhaps not a good example to demonstrate how multiple acomp
requests can be executed in parallel in one acomp instance.  frontswap is
doing page load and store page by page synchronously.  swap_writepage()
depends on the completion of frontswap_store() to decide if it should call
__swap_writepage() to swap to disk.

However this patch creates multiple acomp instances, so multiple threads
running on multiple different cpus can actually do (de)compression
parallelly, leveraging the power of multiple ZIP hardware queues.  This is
also consistent with frontswap's page management model.

The old zswap code uses atomic context and avoids the race conditions
while shared resources like zswap_dstmem are accessed.  Here since acomp
can sleep, per-cpu mutex is used to replace preemption-disable.

While it is possible to make mm/page_io.c and mm/frontswap.c support async
(de)compression in some way, the entire design requires careful thinking
and performance evaluation.  For the first step, the base with fixed
connection between ZIP drivers and zswap should be built.

Link: https://lkml.kernel.org/r/20201107065332.26992-1-song.bao.hua@hisilicon.com
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Acked-by: Vitaly Wool <vitalywool@gmail.com>
Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mahipal Challa <mahipalreddy2006@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Zhou Wang <wangzhou1@hisilicon.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
YueHaibing
42a4470436 mm/zswap: fix passing zero to 'PTR_ERR' warning
Fix smatch warning:

  mm/zswap.c:425 zswap_cpu_comp_prepare() warn: passing zero to 'PTR_ERR'

crypto_alloc_comp() never return NULL, use IS_ERR instead of
IS_ERR_OR_NULL to fix this.

Link: https://lkml.kernel.org/r/20201031055615.28080-1-yuehaibing@huawei.com
Fixes: f1c54846ee ("zswap: dynamic pool creation")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Joe Perches
83aed6cde8 mm/zswap: make struct kernel_param_ops definitions const
These should be const, so make it so.

Link: https://lkml.kernel.org/r/1791535ee0b00f4a5c68cc4a8adada06593ad8f1.1601770305.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Peter Xu
d9f411bacf userfaultfd/selftests: hint the test runner on required privilege
Now userfaultfd test program requires either root or ptrace privilege due
to the signal/event tests.  When UFFDIO_API failed, hint the test runner
about this fact verbosely.

Link: https://lkml.kernel.org/r/20201208024709.7701-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Peter Xu
1e17a24edf userfaultfd/selftests: fix retval check for userfaultfd_open()
userfaultfd_open() returns 1 for errors rather than negatives.  Fix it on
all the callers so when UFFDIO_API failed the test will bail out.

Link: https://lkml.kernel.org/r/20201208024709.7701-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Peter Xu
164c50be28 userfaultfd/selftests: always dump something in modes
Patch series "userfaultfd: selftests: Small fixes".

Some very trivial fixes that I kept locally to userfaultfd selftest
program.

This patch (of 3):

BOUNCE_POLL is a special bit that if cleared it means "READ" instead.
Dump that too otherwise we'll see tests with empty modes.

Link: https://lkml.kernel.org/r/20201208024709.7701-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20201208024709.7701-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Axel Rasmussen
77f962e7ae userfaultfd: selftests: make __{s,u}64 format specifiers portable
On certain platforms (powerpcle is the one on which I ran into this),
"%Ld" and "%Lu" are unsuitable for printing __s64 and __u64, respectively,
resulting in build warnings.  Cast to {u,}int64_t, and use the PRI{d,u}64
macros defined in inttypes.h to print them.  This ought to be portable to
all platforms.

Splitting this off into a separate macro lets us remove some lines, and
get rid of some (I would argue) stylistically odd cases where we joined
printf() and exit() into a single statement with a ,.

Finally, this also fixes a "missing braces around initializer" warning
when we initialize prms in wp_range().

[axelrasmussen@google.com: v2]
  Link: https://lkml.kernel.org/r/20201203180244.1811601-1-axelrasmussen@google.com

Link: https://lkml.kernel.org/r/20201202211542.1121189-1-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Alan Gilbert <dgilbert@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Lokesh Gidra
d0d4730ac2 userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob
With this change, when the knob is set to 0, it allows unprivileged users
to call userfaultfd, like when it is set to 1, but with the restriction
that page faults from only user-mode can be handled.  In this mode, an
unprivileged user (without SYS_CAP_PTRACE capability) must pass
UFFD_USER_MODE_ONLY to userfaultd or the API will fail with EPERM.

This enables administrators to reduce the likelihood that an attacker with
access to userfaultfd can delay faulting kernel code to widen timing
windows for other exploits.

The default value of this knob is changed to 0.  This is required for
correct functioning of pipe mutex.  However, this will fail postcopy live
migration, which will be unnoticeable to the VM guests.  To avoid this,
set 'vm.userfault = 1' in /sys/sysctl.conf.

The main reason this change is desirable as in the short term is that the
Android userland will behave as with the sysctl set to zero.  So without
this commit, any Linux binary using userfaultfd to manage its memory would
behave differently if run within the Android userland.  For more details,
refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/

Link: https://lkml.kernel.org/r/20201120030411.2690816-3-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: <calin@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Lokesh Gidra
37cd0575b8 userfaultfd: add UFFD_USER_MODE_ONLY
Patch series "Control over userfaultfd kernel-fault handling", v6.

This patch series is split from [1].  The other series enables SELinux
support for userfaultfd file descriptors so that its creation and movement
can be controlled.

It has been demonstrated on various occasions that suspending kernel code
execution for an arbitrary amount of time at any access to userspace
memory (copy_from_user()/copy_to_user()/...) can be exploited to change
the intended behavior of the kernel.  For instance, handling page faults
in kernel-mode using userfaultfd has been exploited in [2, 3].  Likewise,
FUSE, which is similar to userfaultfd in this respect, has been exploited
in [4, 5] for similar outcome.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object.  It then adds a 'user-mode only' option to the
unprivileged_userfaultfd sysctl knob to require unprivileged callers to
use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to enhance
security vulnerabilities by lengthening the race window in kernel code.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

This patch (of 2):

userfaultfd handles page faults from both user and kernel code.  Add a new
UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting
userfaultfd object refuse to handle faults from kernel mode, treating
these faults as if SIGBUS were always raised, causing the kernel code to
fail with EFAULT.

A future patch adds a knob allowing administrators to give some processes
the ability to create userfaultfd file objects only if they pass
UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will
exploit userfaultfd's ability to delay kernel page faults to open timing
windows for future exploits.

Link: https://lkml.kernel.org/r/20201120030411.2690816-1-lokeshgidra@google.com
Link: https://lkml.kernel.org/r/20201120030411.2690816-2-lokeshgidra@google.com
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <calin@google.com>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Vlastimil Babka
f289041ed4 mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO
CONFIG_PAGE_POISONING_ZERO uses the zero pattern instead of 0xAA.  It was
introduced by commit 1414c7f4f7 ("mm/page_poisoning.c: allow for zero
poisoning"), noting that using zeroes retains the benefit of sanitizing
content of freed pages, with the benefit of not having to zero them again
on alloc, and the downside of making some forms of corruption (stray
writes of NULLs) harder to detect than with the 0xAA pattern.  Together
with CONFIG_PAGE_POISONING_NO_SANITY it made possible to sanitize the
contents on free without checking it back on alloc.

These days we have the init_on_free() option to achieve sanitization with
zeroes and to save clearing on alloc (and without checking on alloc).
Arguably if someone does choose to check the poison for corruption on
alloc, the savings of not clearing the page are secondary, and it makes
sense to always use the 0xAA poison pattern.  Thus, remove the
CONFIG_PAGE_POISONING_ZERO option for being redundant.

Link: https://lkml.kernel.org/r/20201113104033.22907-6-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Vlastimil Babka
8f424750ba mm, page_poison: remove CONFIG_PAGE_POISONING_NO_SANITY
CONFIG_PAGE_POISONING_NO_SANITY skips the check on page alloc whether the
poison pattern was corrupted, suggesting a use-after-free.  The motivation
to introduce it in commit 8823b1dbc0 ("mm/page_poison.c: enable
PAGE_POISONING as a separate option") was to simply sanitize freed pages,
optimally together with CONFIG_PAGE_POISONING_ZERO.

These days we have an init_on_free=1 boot option, which makes this use
case of page poisoning redundant.  For sanitizing, writing zeroes is
sufficient, there is pretty much no benefit from writing the 0xAA poison
pattern to freed pages, without checking it back on alloc.  Thus, remove
this option and suggest init_on_free instead in the main config's help.

Link: https://lkml.kernel.org/r/20201113104033.22907-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Vlastimil Babka
03b6c9a3e8 kernel/power: allow hibernation with page_poison sanity checking
Page poisoning used to be incompatible with hibernation, as the state of
poisoned pages was lost after resume, thus enabling CONFIG_HIBERNATION
forces CONFIG_PAGE_POISONING_NO_SANITY.  For the same reason, the
poisoning with zeroes variant CONFIG_PAGE_POISONING_ZERO used to disable
hibernation.  The latter restriction was removed by commit 1ad1410f63
("PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO") and
similarly for init_on_free by commit 18451f9f9e ("PM: hibernate: fix
crashes with init_on_free=1") by making sure free pages are cleared after
resume.

We can use the same mechanism to instead poison free pages with
PAGE_POISON after resume.  This covers both zero and 0xAA patterns.  Thus
we can remove the Kconfig restriction that disables page poison sanity
checking when hibernation is enabled.

Link: https://lkml.kernel.org/r/20201113104033.22907-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[hibernation]
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Vlastimil Babka
8db26a3d47 mm, page_poison: use static key more efficiently
Commit 11c9c7edae ("mm/page_poison.c: replace bool variable with static
key") changed page_poisoning_enabled() to a static key check.  However,
the function is not inlined, so each check still involves a function call
with overhead not eliminated when page poisoning is disabled.

Analogically to how debug_pagealloc is handled, this patch converts
page_poisoning_enabled() back to boolean check, and introduces
page_poisoning_enabled_static() for fast paths.  Both functions are
inlined.

The function kernel_poison_pages() is also called unconditionally and does
the static key check inside.  Remove it from there and put it to callers.
Also split it to two functions kernel_poison_pages() and
kernel_unpoison_pages() instead of the confusing bool parameter.

Also optimize the check that enables page poisoning instead of
debug_pagealloc for architectures without proper debug_pagealloc support.
Move the check to init_mem_debugging_and_hardening() to enable a single
static key instead of having two static branches in
page_poisoning_enabled_static().

Link: https://lkml.kernel.org/r/20201113104033.22907-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Vlastimil Babka
04013513cc mm, page_alloc: do not rely on the order of page_poison and init_on_alloc/free parameters
Patch series "cleanup page poisoning", v3.

I have identified a number of issues and opportunities for cleanup with
CONFIG_PAGE_POISON and friends:

 - interaction with init_on_alloc and init_on_free parameters depends on
   the order of parameters (Patch 1)

 - the boot time enabling uses static key, but inefficienty (Patch 2)

 - sanity checking is incompatible with hibernation (Patch 3)

 - CONFIG_PAGE_POISONING_NO_SANITY can be removed now that we have
   init_on_free (Patch 4)

 - CONFIG_PAGE_POISONING_ZERO can be most likely removed now that we
   have init_on_free (Patch 5)

This patch (of 5):

Enabling page_poison=1 together with init_on_alloc=1 or init_on_free=1
produces a warning in dmesg that page_poison takes precedence.  However,
as these warnings are printed in early_param handlers for
init_on_alloc/free, they are not printed if page_poison is enabled later
on the command line (handlers are called in the order of their
parameters), or when init_on_alloc/free is always enabled by the
respective config option - before the page_poison early param handler is
called, it is not considered to be enabled.  This is inconsistent.

We can remove the dependency on order by making the init_on_* parameters
only set a boolean variable, and postponing the evaluation after all early
params have been processed.  Introduce a new
init_mem_debugging_and_hardening() function for that, and move the related
debug_pagealloc processing there as well.

As a result init_mem_debugging_and_hardening() knows always accurately if
init_on_* and/or page_poison options were enabled.  Thus we can also
optimize want_init_on_alloc() and want_init_on_free().  We don't need to
check page_poisoning_enabled() there, we can instead not enable the
init_on_* static keys at all, if page poisoning is enabled.  This results
in a simpler and more effective code.

Link: https://lkml.kernel.org/r/20201113104033.22907-1-vbabka@suse.cz
Link: https://lkml.kernel.org/r/20201113104033.22907-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mateusz Nosek <mateusznosek0@gmail.com>
Cc: Laura Abbott <labbott@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Charan Teja Reddy
b8ca396f98 mm: cma: improve pr_debug log in cma_release()
It is required to print 'count' of pages, along with the pages, passed to
cma_release to debug the cases of mismatched count value passed between
cma_alloc() and cma_release() from a code path.

As an example, consider the below scenario:

1) CMA pool size is 4MB and

2) User doing the erroneous step of allocating 2 pages but freeing 1
   page in a loop from this CMA pool.  The step 2 causes cma_alloc() to
   return NULL at one point of time because of -ENOMEM condition.

And the current pr_debug logs is not giving the info about these types of
allocation patterns because of count value not being printed in
cma_release().

We are printing the count value in the trace logs, just extend the same to
pr_debug logs too.

[akpm@linux-foundation.org: fix printk warning]

Link: https://lkml.kernel.org/r/1606318341-29521-1-git-send-email-charante@codeaurora.org
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Reviewed-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Lecopzer Chen
a4efc174b3 mm/cma.c: remove redundant cma_mutex lock
The cma_mutex which protects alloc_contig_range() was first appeared in
commit 7ee793a62f ("cma: Remove potential deadlock situation"), at that
time, there is no guarantee the behavior of concurrency inside
alloc_contig_range().

After commit 2c7452a075 ("mm/page_isolation.c: make
start_isolate_page_range() fail if already isolated")

  > However, two subsystems (CMA and gigantic
  > huge pages for example) could attempt operations on the same range.  If
  > this happens, one thread may 'undo' the work another thread is doing.
  > This can result in pageblocks being incorrectly left marked as
  > MIGRATE_ISOLATE and therefore not available for page allocation.

The concurrency inside alloc_contig_range() was clarified.

Now we can find that hugepage and virtio call alloc_contig_range() without
any lock, thus cma_mutex is "redundant" in cma_alloc() now.

Link: https://lkml.kernel.org/r/20201020102241.3729-1-lecopzer.chen@mediatek.com
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: YJ Chiang <yj.chiang@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Stephen Zhang
d85c6db4cc mm: migrate: remove unused parameter in migrate_vma_insert_page()
"dst" parameter to migrate_vma_insert_page() is not used anymore.

Link: https://lkml.kernel.org/r/CANubcdUwCAMuUyamG2dkWP=cqSR9MAS=tHLDc95kQkqU-rEnAg@mail.gmail.com
Signed-off-by: Stephen Zhang <starzhangzsd@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:46 -08:00
Yang Shi
d532e2e57e mm: migrate: return -ENOSYS if THP migration is unsupported
In the current implementation unmap_and_move() would return -ENOMEM if THP
migration is unsupported, then the THP will be split.  If split is failed
just exit without trying to migrate other pages.  It doesn't make too much
sense since there may be enough free memory to migrate other pages and
there may be a lot base pages on the list.

Return -ENOSYS to make consistent with hugetlb.  And if THP split is
failed just skip and try other pages on the list.

Just skip the whole list and exit when free memory is really low.

Link: https://lkml.kernel.org/r/20201113205359.556831-6-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:45 -08:00
Yang Shi
236c32eb10 mm: migrate: clean up migrate_prep{_local}
The migrate_prep{_local} never fails, so it is pointless to have return
value and check the return value.

Link: https://lkml.kernel.org/r/20201113205359.556831-5-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:45 -08:00
Yang Shi
c77c5cbafe mm: migrate: skip shared exec THP for NUMA balancing
The NUMA balancing skip shared exec base page.  Since
CONFIG_READ_ONLY_THP_FOR_FS was introduced, there are probably shared exec
THP, so skip such THPs for NUMA balancing as well.

And Willy's regular filesystem THP support patches could create shared
exec THP wven without that config.

In addition, the page_is_file_lru() is used to tell if the page is file
cache or not, but it filters out shmem page.  It sounds like a typical
usecase by putting executables in shmem to achieve performance gain via
using shmem-THP, so it sounds worth skipping migration for such case too.

Link: https://lkml.kernel.org/r/20201113205359.556831-4-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:45 -08:00