Commit Graph

1107533 Commits

Author SHA1 Message Date
Mel Gorman
751d4cbc43 sched/core: Do not requeue task on CPU excluded from cpus_mask
The following warning was triggered on a large machine early in boot on
a distribution kernel but the same problem should also affect mainline.

   WARNING: CPU: 439 PID: 10 at ../kernel/workqueue.c:2231 process_one_work+0x4d/0x440
   Call Trace:
    <TASK>
    rescuer_thread+0x1f6/0x360
    kthread+0x156/0x180
    ret_from_fork+0x22/0x30
    </TASK>

Commit c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")
optimises ttwu by queueing a task that is descheduling on the wakelist,
but does not check if the task descheduling is still allowed to run on that CPU.

In this warning, the problematic task is a workqueue rescue thread which
checks if the rescue is for a per-cpu workqueue and running on the wrong CPU.
While this is early in boot and it should be possible to create workers,
the rescue thread may still used if the MAYDAY_INITIAL_TIMEOUT is reached
or MAYDAY_INTERVAL and on a sufficiently large machine, the rescue
thread is being used frequently.

Tracing confirmed that the task should have migrated properly using the
stopper thread to handle the migration. However, a parallel wakeup from udev
running on another CPU that does not share CPU cache observes p->on_cpu and
uses task_cpu(p), queues the task on the old CPU and triggers the warning.

Check that the wakee task that is descheduling is still allowed to run
on its current CPU and if not, wait for the descheduling to complete
and select an allowed CPU.

Fixes: c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20220804092119.20137-1-mgorman@techsingularity.net
2022-08-04 11:26:13 +02:00
Ben Dooks
87514b2c24 sched/rt: Fix Sparse warnings due to undefined rt.c declarations
There are several symbols defined in kernel/sched/sched.h but get wrapped
in CONFIG_CGROUP_SCHED, even though dummy versions get built in rt.c and
therefore trigger Sparse warnings:

  kernel/sched/rt.c:309:6: warning: symbol 'unregister_rt_sched_group' was not declared. Should it be static?
  kernel/sched/rt.c:311:6: warning: symbol 'free_rt_sched_group' was not declared. Should it be static?
  kernel/sched/rt.c:313:5: warning: symbol 'alloc_rt_sched_group' was not declared. Should it be static?

Fix this by moving them outside the CONFIG_CGROUP_SCHED block.

[ mingo: Refreshed to the latest scheduler tree, tweaked changelog. ]

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20220721145155.358366-1-ben-linux@fluff.org
2022-08-03 11:22:37 +02:00
Ingo Molnar
dcca34754a exit: Fix typo in comment: s/sub-theads/sub-threads
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-08-03 10:44:54 +02:00
Waiman Long
b6e8d40d43 sched, cpuset: Fix dl_cpu_busy() panic due to empty cs->cpus_allowed
With cgroup v2, the cpuset's cpus_allowed mask can be empty indicating
that the cpuset will just use the effective CPUs of its parent. So
cpuset_can_attach() can call task_can_attach() with an empty mask.
This can lead to cpumask_any_and() returns nr_cpu_ids causing the call
to dl_bw_of() to crash due to percpu value access of an out of bound
CPU value. For example:

	[80468.182258] BUG: unable to handle page fault for address: ffffffff8b6648b0
	  :
	[80468.191019] RIP: 0010:dl_cpu_busy+0x30/0x2b0
	  :
	[80468.207946] Call Trace:
	[80468.208947]  cpuset_can_attach+0xa0/0x140
	[80468.209953]  cgroup_migrate_execute+0x8c/0x490
	[80468.210931]  cgroup_update_dfl_csses+0x254/0x270
	[80468.211898]  cgroup_subtree_control_write+0x322/0x400
	[80468.212854]  kernfs_fop_write_iter+0x11c/0x1b0
	[80468.213777]  new_sync_write+0x11f/0x1b0
	[80468.214689]  vfs_write+0x1eb/0x280
	[80468.215592]  ksys_write+0x5f/0xe0
	[80468.216463]  do_syscall_64+0x5c/0x80
	[80468.224287]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix that by using effective_cpus instead. For cgroup v1, effective_cpus
is the same as cpus_allowed. For v2, effective_cpus is the real cpumask
to be used by tasks within the cpuset anyway.

Also update task_can_attach()'s 2nd argument name to cs_effective_cpus to
reflect the change. In addition, a check is added to task_can_attach()
to guard against the possibility that cpumask_any_and() may return a
value >= nr_cpu_ids.

Fixes: 7f51412a41 ("sched/deadline: Fix bandwidth check/update when migrating tasks between exclusive cpusets")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20220803015451.2219567-1-longman@redhat.com
2022-08-03 10:34:26 +02:00
Linus Torvalds
9de1f9c8ca Merge tag 'irq-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "Updates for interrupt core and drivers:

  Core:

   - Fix a few inconsistencies between UP and SMP vs interrupt
     affinities

   - Small updates and cleanups all over the place

  New drivers:

   - LoongArch interrupt controller

   - Renesas RZ/G2L interrupt controller

  Updates:

   - Hotpath optimization for SiFive PLIC

   - Workaround for broken PLIC edge triggered interrupts

   - Simall cleanups and improvements as usual"

* tag 'irq-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  irqchip/mmp: Declare init functions in common header file
  irqchip/mips-gic: Check the return value of ioremap() in gic_of_init()
  genirq: Use for_each_action_of_desc in actions_show()
  irqchip / ACPI: Introduce ACPI_IRQ_MODEL_LPIC for LoongArch
  irqchip: Add LoongArch CPU interrupt controller support
  irqchip: Add Loongson Extended I/O interrupt controller support
  irqchip/loongson-liointc: Add ACPI init support
  irqchip/loongson-pch-msi: Add ACPI init support
  irqchip/loongson-pch-pic: Add ACPI init support
  irqchip: Add Loongson PCH LPC controller support
  LoongArch: Prepare to support multiple pch-pic and pch-msi irqdomain
  LoongArch: Use ACPI_GENERIC_GSI for gsi handling
  genirq/generic_chip: Export irq_unmap_generic_chip
  ACPI: irq: Allow acpi_gsi_to_irq() to have an arch-specific fallback
  APCI: irq: Add support for multiple GSI domains
  LoongArch: Provisionally add ACPICA data structures
  irqdomain: Use hwirq_max instead of revmap_size for NOMAP domains
  irqdomain: Report irq number for NOMAP domains
  irqchip/gic-v3: Fix comment typo
  dt-bindings: interrupt-controller: renesas,rzg2l-irqc: Document RZ/V2L SoC
  ...
2022-08-01 12:48:15 -07:00
Linus Torvalds
dfea84827f Merge tag 'timers-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Timers, timekeeping and related drivers update:

  Core:

   - Make wait_event_hrtimeout() aware of RT/DL tasks

  New drivers:

   - R-Car Gen4 timer

   - Tegra186 timer

   - Mediatek MT6795 CPUXGPT timer

  Updates:

   - Rework suspend/resume handling in timer drivers so it
     takes inactive clocks into account.

   - The usual device tree compatible add ons

   - Small fixed and cleanups all over the place"

* tag 'timers-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  wait: Fix __wait_event_hrtimeout for RT/DL tasks
  clocksource/drivers/sun5i: Remove unnecessary (void*) conversions
  dt-bindings: timer: allwinner,sun4i-a10-timer: Add D1 compatible
  dt-bindings: timer: ingenic,tcu: use absolute path to other schema
  clocksource/drivers/sun4i: Remove unnecessary (void*) conversions
  dt-bindings: timer: renesas,cmt: Fix R-Car Gen4 fall-out
  clocksource/drivers/tegra186: Put Kconfig option 'tristate' to 'bool'
  clocksource/drivers/timer-ti-dm: Make driver selection bool for TI K3
  clocksource/drivers/timer-ti-dm: Add compatible for am6 SoCs
  clocksource/drivers/timer-ti-dm: Make timer selectable for ARCH_K3
  clocksource/drivers/timer-ti-dm: Move inline functions to driver for am6
  clocksource/drivers/sh_cmt: Add R-Car Gen4 support
  dt-bindings: timer: renesas,cmt: R-Car V3U is R-Car Gen4
  dt-bindings: timer: renesas,cmt: Add r8a779f0 and generic Gen4 CMT support
  clocksource/drivers/timer-microchip-pit64b: Fix compilation warnings
  clocksource/drivers/timer-microchip-pit64b: Use mchp_pit64b_{suspend, resume}
  clocksource/drivers/timer-microchip-pit64b: Remove suspend/resume ops for ce
  thermal/drivers/rcar_gen3_thermal: Add r8a779f0 support
  clocksource/drivers/timer-mediatek: Implement CPUXGPT timers
  dt-bindings: timer: mediatek: Add CPUX System Timer and MT6795 compatible
  ...
2022-08-01 12:37:54 -07:00
Linus Torvalds
63e6053add Merge tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:

 - Fix Intel Alder Lake PEBS memory access latency & data source
   profiling info bugs.

 - Use Intel large-PEBS hardware feature in more circumstances, to
   reduce PMI overhead & reduce sampling data.

 - Extend the lost-sample profiling output with the PERF_FORMAT_LOST ABI
   variant, which tells tooling the exact number of samples lost.

 - Add new IBS register bits definitions.

 - AMD uncore events: Add PerfMonV2 DF (Data Fabric) enhancements.

* tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/ibs: Add new IBS register bits into header
  perf/x86/intel: Fix PEBS data source encoding for ADL
  perf/x86/intel: Fix PEBS memory access info encoding for ADL
  perf/core: Add a new read format to get a number of lost samples
  perf/x86/amd/uncore: Add PerfMonV2 RDPMC assignments
  perf/x86/amd/uncore: Add PerfMonV2 DF event format
  perf/x86/amd/uncore: Detect available DF counters
  perf/x86/amd/uncore: Use attr_update for format attributes
  perf/x86/amd/uncore: Use dynamic events array
  x86/events/intel/ds: Enable large PEBS for PERF_SAMPLE_WEIGHT_TYPE
2022-08-01 12:24:30 -07:00
Linus Torvalds
22a39c3d86 Merge tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "This was a fairly quiet cycle for the locking subsystem:

   - lockdep: Fix a handful of the more complex lockdep_init_map_*()
     primitives that can lose the lock_type & cause false reports. No
     such mishap was observed in the wild.

   - jump_label improvements: simplify the cross-arch support of initial
     NOP patching by making it arch-specific code (used on MIPS only),
     and remove the s390 initial NOP patching that was superfluous"

* tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Fix lockdep_init_map_*() confusion
  jump_label: make initial NOP patching the special case
  jump_label: mips: move module NOP patching into arch code
  jump_label: s390: avoid pointless initial NOP patching
2022-08-01 12:15:27 -07:00
Linus Torvalds
b167fdffe9 Merge tag 'sched-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Load-balancing improvements:

   - Improve NUMA balancing on AMD Zen systems for affine workloads.

   - Improve the handling of reduced-capacity CPUs in load-balancing.

   - Energy Model improvements: fix & refine all the energy fairness
     metrics (PELT), and remove the conservative threshold requiring 6%
     energy savings to migrate a task. Doing this improves power
     efficiency for most workloads, and also increases the reliability
     of energy-efficiency scheduling.

   - Optimize/tweak select_idle_cpu() to spend (much) less time
     searching for an idle CPU on overloaded systems. There's reports of
     several milliseconds spent there on large systems with large
     workloads ...

     [ Since the search logic changed, there might be behavioral side
       effects. ]

   - Improve NUMA imbalance behavior. On certain systems with spare
     capacity, initial placement of tasks is non-deterministic, and such
     an artificial placement imbalance can persist for a long time,
     hurting (and sometimes helping) performance.

     The fix is to make fork-time task placement consistent with runtime
     NUMA balancing placement.

     Note that some performance regressions were reported against this,
     caused by workloads that are not memory bandwith limited, which
     benefit from the artificial locality of the placement bug(s). Mel
     Gorman's conclusion, with which we concur, was that consistency is
     better than random workload benefits from non-deterministic bugs:

        "Given there is no crystal ball and it's a tradeoff, I think
         it's better to be consistent and use similar logic at both fork
         time and runtime even if it doesn't have universal benefit."

   - Improve core scheduling by fixing a bug in
     sched_core_update_cookie() that caused unnecessary forced idling.

   - Improve wakeup-balancing by allowing same-LLC wakeup of idle CPUs
     for newly woken tasks.

   - Fix a newidle balancing bug that introduced unnecessary wakeup
     latencies.

  ABI improvements/fixes:

   - Do not check capabilities and do not issue capability check denial
     messages when a scheduler syscall doesn't require privileges. (Such
     as increasing niceness.)

   - Add forced-idle accounting to cgroups too.

   - Fix/improve the RSEQ ABI to not just silently accept unknown flags.
     (No existing tooling is known to have learned to rely on the
     previous behavior.)

   - Depreciate the (unused) RSEQ_CS_FLAG_NO_RESTART_ON_* flags.

  Optimizations:

   - Optimize & simplify leaf_cfs_rq_list()

   - Micro-optimize set_nr_{and_not,if}_polling() via try_cmpxchg().

  Misc fixes & cleanups:

   - Fix the RSEQ self-tests on RISC-V and Glibc 2.35 systems.

   - Fix a full-NOHZ bug that can in some cases result in the tick not
     being re-enabled when the last SCHED_RT task is gone from a
     runqueue but there's still SCHED_OTHER tasks around.

   - Various PREEMPT_RT related fixes.

   - Misc cleanups & smaller fixes"

* tag 'sched-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  rseq: Kill process when unknown flags are encountered in ABI structures
  rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flags
  sched/core: Fix the bug that task won't enqueue into core tree when update cookie
  nohz/full, sched/rt: Fix missed tick-reenabling bug in dequeue_task_rt()
  sched/core: Always flush pending blk_plug
  sched/fair: fix case with reduced capacity CPU
  sched/core: Use try_cmpxchg in set_nr_{and_not,if}_polling
  sched/core: add forced idle accounting for cgroups
  sched/fair: Remove the energy margin in feec()
  sched/fair: Remove task_util from effective utilization in feec()
  sched/fair: Use the same cpumask per-PD throughout find_energy_efficient_cpu()
  sched/fair: Rename select_idle_mask to select_rq_mask
  sched, drivers: Remove max param from effective_cpu_util()/sched_cpu_util()
  sched/fair: Decay task PELT values during wakeup migration
  sched/fair: Provide u64 read for 32-bits arch helper
  sched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util_avg
  sched: only perform capability check on privileged operation
  sched: Remove unused function group_first_cpu()
  sched/fair: Remove redundant word " *"
  selftests/rseq: check if libc rseq support is registered
  ...
2022-08-01 11:49:06 -07:00
Linus Torvalds
0dd1cabe8a Merge tag 'slab-for-5.20_or_6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:

 - An addition of 'accounted' flag to slab allocation tracepoints to
   indicate memcg_kmem accounting, by Vasily

 - An optimization of memcg handling in freeing paths, by Muchun

 - Various smaller fixes and cleanups

* tag 'slab-for-5.20_or_6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slab_common: move generic bulk alloc/free functions to SLOB
  mm/sl[au]b: use own bulk free function when bulk alloc failed
  mm: slab: optimize memcg_slab_free_hook()
  mm/tracing: add 'accounted' entry into output of allocation tracepoints
  tools/vm/slabinfo: Handle files in debugfs
  mm/slub: Simplify __kmem_cache_alias()
  mm, slab: fix bad alignments
2022-08-01 11:46:58 -07:00
Linus Torvalds
0cec3f24a7 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
 "Highlights include a major rework of our kPTI page-table rewriting
  code (which makes it both more maintainable and considerably faster in
  the cases where it is required) as well as significant changes to our
  early boot code to reduce the need for data cache maintenance and
  greatly simplify the KASLR relocation dance.

  Summary:

   - Remove unused generic cpuidle support (replaced by PSCI version)

   - Fix documentation describing the kernel virtual address space

   - Handling of some new CPU errata in Arm implementations

   - Rework of our exception table code in preparation for handling
     machine checks (i.e. RAS errors) more gracefully

   - Switch over to the generic implementation of ioremap()

   - Fix lockdep tracking in NMI context

   - Instrument our memory barrier macros for KCSAN

   - Rework of the kPTI G->nG page-table repainting so that the MMU
     remains enabled and the boot time is no longer slowed to a crawl
     for systems which require the late remapping

   - Enable support for direct swapping of 2MiB transparent huge-pages
     on systems without MTE

   - Fix handling of MTE tags with allocating new pages with HW KASAN

   - Expose the SMIDR register to userspace via sysfs

   - Continued rework of the stack unwinder, particularly improving the
     behaviour under KASAN

   - More repainting of our system register definitions to match the
     architectural terminology

   - Improvements to the layout of the vDSO objects

   - Support for allocating additional bits of HWCAP2 and exposing
     FEAT_EBF16 to userspace on CPUs that support it

   - Considerable rework and optimisation of our early boot code to
     reduce the need for cache maintenance and avoid jumping in and out
     of the kernel when handling relocation under KASLR

   - Support for disabling SVE and SME support on the kernel
     command-line

   - Support for the Hisilicon HNS3 PMU

   - Miscellanous cleanups, trivial updates and minor fixes"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (136 commits)
  arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}
  arm64: fix KASAN_INLINE
  arm64/hwcap: Support FEAT_EBF16
  arm64/cpufeature: Store elf_hwcaps as a bitmap rather than unsigned long
  arm64/hwcap: Document allocation of upper bits of AT_HWCAP
  arm64: enable THP_SWAP for arm64
  arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52
  arm64: errata: Remove AES hwcap for COMPAT tasks
  arm64: numa: Don't check node against MAX_NUMNODES
  drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX
  perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node()
  docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING
  arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"
  mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON
  mm: kasan: Skip unpoisoning of user pages
  mm: kasan: Ensure the tags are visible before the tag in page->flags
  drivers/perf: hisi: add driver for HNS3 PMU
  drivers/perf: hisi: Add description for HNS3 PMU driver
  drivers/perf: riscv_pmu_sbi: perf format
  perf/arm-cci: Use the bitmap API to allocate bitmaps
  ...
2022-08-01 10:37:00 -07:00
Linus Torvalds
a82c58cf1a Merge tag 'm68k-for-v5.20-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:

 - Use RNG seed from bootinfo block on virt platform

 - defconfig updates

 - Minor fixes and improvements

* tag 'm68k-for-v5.20-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: defconfig: Update defconfigs for v5.19-rc1
  m68k: Add common forward declaration for show_registers()
  m68k: mac: Remove forward declaration for mac_nmi_handler()
  m68k: virt: Fix missing platform_device_unregister() on error in virt_platform_init()
  m68k: virt: Use RNG seed from bootinfo block
  m68k: bitops: Change __fls to return and accept unsigned long
  m68k: Kconfig.machine: Add endif comment
  m68k: Kconfig.debug: Replace single quotes
  m68k: Kconfig.cpu: Fix indentation and add endif comments
  m68k: q40: Align '*' in comments
  m68k: sun3: Use __func__ to get function's name in an output message
  m68k: mac: Fix typos in comments
  m68k: virt: Kconfig minor fixes
2022-08-01 10:32:06 -07:00
Linus Torvalds
60ee49fac8 Merge tag 'x86_kdump_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kdump updates from Borislav Petkov:

 - Add the ability to pass early an RNG seed to the kernel from the boot
   loader

 - Add the ability to pass the IMA measurement of kernel and bootloader
   to the kexec-ed kernel

* tag 'x86_kdump_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/setup: Use rng seeds from setup_data
  x86/kexec: Carry forward IMA measurement log on kexec
2022-08-01 10:17:19 -07:00
Linus Torvalds
8b7054528c Merge tag 'x86_build_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Borislav Petkov:

 - Fix stack protector builds when cross compiling with Clang

 - Other Kbuild improvements and fixes

* tag 'x86_build_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/purgatory: Omit use of bin2c
  x86/purgatory: Hard-code obj-y in Makefile
  x86/build: Remove unused OBJECT_FILES_NON_STANDARD_test_nx.o
  x86/Kconfig: Fix CONFIG_CC_HAS_SANE_STACKPROTECTOR when cross compiling with clang
2022-08-01 10:14:19 -07:00
Linus Torvalds
ecf9b7bfea Merge tag 'x86_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Borislav Petkov:

 - Have invalid MSR accesses warnings appear only once after a
   pr_warn_once() change broke that

 - Simplify {JMP,CALL}_NOSPEC and let the objtool retpoline patching
   infra take care of them instead of having unreadable alternative
   macros there

* tag 'x86_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/extable: Fix ex_handler_msr() print condition
  x86,nospec: Simplify {JMP,CALL}_NOSPEC
2022-08-01 10:04:00 -07:00
Linus Torvalds
98b1783de2 Merge tag 'x86_misc_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:

 - Add a bunch of PCI IDs for new AMD CPUs and use them in k10temp

 - Free the pmem platform device on the registration error path

* tag 'x86_misc_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hwmon: (k10temp): Add support for new family 17h and 19h models
  x86/amd_nb: Add AMD PCI IDs for SMN communication
  x86/pmem: Fix platform-device leak in error path
2022-08-01 10:00:43 -07:00
Linus Torvalds
42efa5e3a8 Merge tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:

 - Remove the vendor check when selecting MWAIT as the default idle
   state

 - Respect idle=nomwait when supplied on the kernel cmdline

 - Two small cleanups

* tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Use MSR_IA32_MISC_ENABLE constants
  x86: Fix comment for X86_FEATURE_ZEN
  x86: Remove vendor checks from prefer_mwait_c1_over_halt
  x86: Handle idle=nomwait cmdline properly for x86_idle
2022-08-01 09:49:29 -07:00
Linus Torvalds
650ea1f626 Merge tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu update from Borislav Petkov:

 - Add machinery to initialize AMX register state in order for
   AMX-capable CPUs to be able to enter deeper low-power state

* tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  intel_idle: Add a new flag to initialize the AMX state
  x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
2022-08-01 09:36:18 -07:00
Linus Torvalds
92598ae22f Merge tag 'x86_mm_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Borislav Petkov:

 - Rename a PKRU macro to make more sense when reading the code

 - Update pkeys documentation

 - Avoid reading contended mm's TLB generation var if not absolutely
   necessary along with fixing a case where arch_tlbbatch_flush()
   doesn't adhere to the generation scheme and thus violates the
   conditions for the above avoidance.

* tag 'x86_mm_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/tlb: Ignore f->new_tlb_gen when zero
  x86/pkeys: Clarify PKRU_AD_KEY macro
  Documentation/protection-keys: Clean up documentation for User Space pkeys
  x86/mm/tlb: Avoid reading mm_tlb_gen when possible
2022-08-01 09:34:39 -07:00
Linus Torvalds
94e37e8489 Merge tag 'x86_cleanups_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanup from Borislav Petkov:

 - A single CONFIG_ symbol correction in a comment

* tag 'x86_cleanups_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Refer to the intended config STRICT_DEVMEM in a comment
2022-08-01 09:33:17 -07:00
Linus Torvalds
dbc1f5a9f4 Merge tag 'x86_vmware_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vmware cleanup from Borislav Petkov:

 - A single statement simplification by using the BIT() macro

* tag 'x86_vmware_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vmware: Use BIT() macro for shifting
2022-08-01 09:31:49 -07:00
Linus Torvalds
296d3b3e05 Merge tag 'ras_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS update from Borislav Petkov:
 "A single RAS change:

   - Probe whether hardware error injection (direct MSR writes) is
     possible when injecting errors on AMD platforms. In some cases, the
     platform could prohibit those"

* tag 'ras_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Check whether writes to MCA_STATUS are getting ignored
2022-08-01 09:29:41 -07:00
Linus Torvalds
0fac198def Merge tag 'fs.idmapped.overlay.acl.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull acl updates from Christian Brauner:
 "Last cycle we introduced support for mounting overlayfs on top of
  idmapped mounts. While looking into additional testing we realized
  that posix acls don't really work correctly with stacking filesystems
  on top of idmapped layers.

  We already knew what the fix were but it would require work that is
  more suitable for the merge window so we turned off posix acls for
  v5.19 for overlayfs on top of idmapped layers with Miklos routing my
  patch upstream in 72a8e05d4f ("Merge tag 'ovl-fixes-5.19-rc7' [..]").

  This contains the work to support posix acls for overlayfs on top of
  idmapped layers. Since the posix acl fixes should use the new
  vfs{g,u}id_t work the associated branch has been merged in. (We sent a
  pull request for this earlier.)

  We've also pulled in Miklos pull request containing my patch to turn
  of posix acls on top of idmapped layers. This allowed us to avoid
  rebasing the branch which we didn't like because we were already at
  rc7 by then. Merging it in allows this branch to first fix posix acls
  and then to cleanly revert the temporary fix it brought in by commit
  4a47c6385b ("ovl: turn of SB_POSIXACL with idmapped layers
  temporarily").

  The last patch in this series adds Seth Forshee as a co-maintainer for
  idmapped mounts. Seth has been integral to all of this work and is
  also the main architect behind the filesystem idmapping work which
  ultimately made filesystems such as FUSE and overlayfs available in
  containers. He continues to be active in both development and review.
  I'm very happy he decided to help and he has my full trust. This
  increases the bus factor which is always great for work like this. I'm
  honestly very excited about this because I think in general we don't
  do great in the bringing on new maintainers department"

For more explanations of the ACL issues, see

  https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org/

* tag 'fs.idmapped.overlay.acl.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  Add Seth Forshee as co-maintainer for idmapped mounts
  Revert "ovl: turn of SB_POSIXACL with idmapped layers temporarily"
  ovl: handle idmappings in ovl_get_acl()
  acl: make posix_acl_clone() available to overlayfs
  acl: port to vfs{g,u}id_t
  acl: move idmapped mount fixup into vfs_{g,s}etxattr()
  mnt_idmapping: add vfs[g,u]id_into_k[g,u]id()
2022-08-01 09:10:07 -07:00
Linus Torvalds
bdfae5ce38 Merge tag 'fs.idmapped.vfsuid.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull fs idmapping updates from Christian Brauner:
 "This introduces the new vfs{g,u}id_t types we agreed on. Similar to
  k{g,u}id_t the new types are just simple wrapper structs around
  regular {g,u}id_t types.

  They allow to establish a type safety boundary in the VFS for idmapped
  mounts preventing confusion betwen {g,u}ids mapped into an idmapped
  mount and {g,u}ids mapped into the caller's or the filesystem's
  idmapping.

  An initial set of helpers is introduced that allows to operate on
  vfs{g,u}id_t types. We will remove all references to non-type safe
  idmapped mounts helpers in the very near future. The patches do
  already exist.

  This converts the core attribute changing codepaths which become
  significantly easier to reason about because of this change.

  Just a few highlights here as the patches give detailed overviews of
  what is happening in the commit messages:

   - The kernel internal struct iattr contains type safe vfs{g,u}id_t
     values clearly communicating that these values have to take a given
     mount's idmapping into account.

   - The ownership values placed in struct iattr to change ownership are
     identical for idmapped and non-idmapped mounts going forward. This
     also allows to simplify stacking filesystems such as overlayfs that
     change attributes In other words, they always represent the values.

   - Instead of open coding checks for whether ownership changes have
     been requested and an actual update of the inode is required we now
     have small static inline wrappers that abstract this logic away
     removing a lot of code duplication from individual filesystems that
     all open-coded the same checks"

* tag 'fs.idmapped.vfsuid.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  mnt_idmapping: align kernel doc and parameter order
  mnt_idmapping: use new helpers in mapped_fs{g,u}id()
  fs: port HAS_UNMAPPED_ID() to vfs{g,u}id_t
  mnt_idmapping: return false when comparing two invalid ids
  attr: fix kernel doc
  attr: port attribute changes to new types
  security: pass down mount idmapping to setattr hook
  quota: port quota helpers mount ids
  fs: port to iattr ownership update helpers
  fs: introduce tiny iattr ownership update helpers
  fs: use mount types in iattr
  fs: add two type safe mapping helpers
  mnt_idmapping: add vfs{g,u}id_t
2022-08-01 08:56:55 -07:00
Linus Torvalds
e6a7cf70a3 Merge tag 'filelock-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking updates from Jeff Layton:
 "Just a couple of flock() patches from Kuniyuki Iwashima.

  The main change is that this moves a file_lock allocation from the
  slab to the stack"

* tag 'filelock-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  fs/lock: Rearrange ops in flock syscall.
  fs/lock: Don't allocate file_lock in flock_make_lock().
2022-08-01 08:54:59 -07:00
Linus Torvalds
e88745dcfd Merge tag 'erofs-for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
 "First of all, we'd like to add Yue Hu and Jeffle Xu as two new
  reviewers. Thank them for spending time working on EROFS!

  There is no major feature outstanding in this cycle, mainly a patchset
  I worked on to prepare for rolling hash deduplication and folios for
  compressed data as the next big features. It kills the unneeded
  PG_error flag dependency as well.

  Apart from that, there are bugfixes and cleanups as always. Details
  are listed below:

   - Add Yue Hu and Jeffle Xu as reviewers

   - Add the missing wake_up when updating lzma streams

   - Avoid consecutive detection for Highmem memory

   - Prepare for multi-reference pclusters and get rid of PG_error

   - Fix ctx->pos update for NFS export

   - minor cleanups"

* tag 'erofs-for-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (23 commits)
  erofs: update ctx->pos for every emitted dirent
  erofs: get rid of the leftover PAGE_SIZE in dir.c
  erofs: get rid of erofs_prepare_dio() helper
  erofs: introduce multi-reference pclusters (fully-referenced)
  erofs: record the longest decompressed size in this round
  erofs: introduce z_erofs_do_decompressed_bvec()
  erofs: try to leave (de)compressed_pages on stack if possible
  erofs: introduce struct z_erofs_decompress_backend
  erofs: get rid of `z_pagemap_global'
  erofs: clean up `enum z_erofs_collectmode'
  erofs: get rid of `enum z_erofs_page_type'
  erofs: rework online page handling
  erofs: switch compressed_pages[] to bufvec
  erofs: introduce `z_erofs_parse_in_bvecs'
  erofs: drop the old pagevec approach
  erofs: introduce bufvec to store decompressed buffers
  erofs: introduce `z_erofs_parse_out_bvecs()'
  erofs: clean up z_erofs_collector_begin()
  erofs: get rid of unneeded `inode', `map' and `sb'
  erofs: avoid consecutive detection for Highmem memory
  ...
2022-08-01 08:52:37 -07:00
Linus Torvalds
bec14d79f7 Merge tag 'fsnotify_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:

 - support for FAN_MARK_IGNORE which untangles some of the not well
   defined corner cases with fanotify ignore masks

 - small cleanups

* tag 'fsnotify_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: Fix comment typo
  fanotify: introduce FAN_MARK_IGNORE
  fanotify: cleanups for fanotify_mark() input validations
  fanotify: prepare for setting event flags in ignore mask
  fs: inotify: Fix typo in inotify comment
2022-08-01 08:50:39 -07:00
Linus Torvalds
af07685b9c Merge tag 'fs_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2 and reiserfs updates from Jan Kara:
 "A fix for ext2 handling of a corrupted fs image and cleanups in ext2
  and reiserfs"

* tag 'fs_for_v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: Add more validity checks for inode counts
  fs/reiserfs/inode: remove dead code in _get_block_create_0()
  fs/ext2: replace ternary operator with min_t()
2022-08-01 08:48:37 -07:00
Linus Torvalds
eb43bbac4c Merge tag 'dlm-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:

 - Delay the cleanup of interrupted posix lock requests until the user
   space result arrives. Previously, the immediate cleanup would lead to
   extraneous warnings when the result arrived.

 - Tracepoint improvements, e.g. adding the lock resource name.

 - Delay the completion of lockspace creation until one full recovery
   cycle has completed. This allows more error cases to be returned to
   the caller.

 - Remove warnings from the locking layer about delayed network replies.
   The recently added midcomms warnings are much more useful.

 - Begin the process of deprecating two unused lock-timeout-related
   features. These features now require enabling via a Kconfig option,
   and enabling them triggers deprecation warnings. We expect to remove
   the code in v6.2.

* tag 'dlm-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  fs: dlm: move kref_put assert for lkb structs
  fs: dlm: don't use deprecated timeout features by default
  fs: dlm: add deprecation Kconfig and warnings for timeouts
  fs: dlm: remove timeout from dlm_user_adopt_orphan
  fs: dlm: remove waiter warnings
  fs: dlm: fix grammar in lowcomms output
  fs: dlm: add comment about lkb IFL flags
  fs: dlm: handle recovery result outside of ls_recover
  fs: dlm: make new_lockspace() wait until recovery completes
  fs: dlm: call dlm_lsop_recover_prep once
  fs: dlm: update comments about recovery and membership handling
  fs: dlm: add resource name to tracepoints
  fs: dlm: remove additional dereference of lksb
  fs: dlm: change ast and bast trace order
  fs: dlm: change posix lock sigint handling
  fs: dlm: use dlm_plock_info for do_unlock_close
  fs: dlm: change plock interrupted message to debug again
  fs: dlm: add pid to debug log
  fs: dlm: plock use list_first_entry
2022-08-01 08:46:53 -07:00
Alexander Aring
9585898922 fs: dlm: move kref_put assert for lkb structs
The unhold_lkb() function decrements the lock's kref, and
asserts that the ref count was not the final one.  Use the
kref_put release function (which should not be called) to
call the assert, rather than doing the assert based on the
kref_put return value.  Using kill_lkb() as the release
function doesn't make sense if we only want to assert.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-01 09:31:46 -05:00
Alexander Aring
6b0afc0cc3 fs: dlm: don't use deprecated timeout features by default
This patch will disable use of deprecated timeout features if
CONFIG_DLM_DEPRECATED_API is not set.  The deprecated features
will be removed in upcoming kernel release v6.2.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-01 09:31:38 -05:00
Alexander Aring
81eeb82fc2 fs: dlm: add deprecation Kconfig and warnings for timeouts
This patch adds a CONFIG_DLM_DEPRECATED_API Kconfig option
that must be enabled to use two timeout-related features
that we intend to remove in kernel v6.2.  Warnings are
printed if either is enabled and used.  Neither has ever
been used as far as we know.

. The DLM_LSFL_TIMEWARN lockspace creation flag will be
  removed, along with the associated configfs entry for
  setting the timeout.  Setting the flag and configfs file
  would cause dlm to track how long locks were waiting
  for reply messages.  After a timeout, a kernel message
  would be logged, and a netlink message would be sent
  to userspace.  Recently, midcomms messages have been
  added that produce much better logging about actual
  problems with messages.  No use has ever been found
  for the netlink messages.

. The userspace libdlm API has allowed the DLM_LKF_TIMEOUT
  flag with a timeout value to be set in lock requests.
  The lock request would be cancelled after the timeout.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-01 09:31:32 -05:00
Mathieu Desnoyers
c17a6ff932 rseq: Kill process when unknown flags are encountered in ABI structures
rseq_abi()->flags and rseq_abi()->rseq_cs->flags 29 upper bits are
currently unused.

The current behavior when those bits are set is to ignore them. This is
not an ideal behavior, because when future features will start using
those flags, if user-space fails to correctly validate that the kernel
indeed supports those flags (e.g. with a new sys_rseq flags bit) before
using them, it may incorrectly assume that the kernel will handle those
flags way when in fact those will be silently ignored on older kernels.

Validating that unused flags bits are cleared will allow a smoother
transition when those flags will start to be used by allowing
applications to fail early, and obviously, when they attempt to use the
new flags on an older kernel that does not support them.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20220622194617.1155957-2-mathieu.desnoyers@efficios.com
2022-08-01 15:21:42 +02:00
Mathieu Desnoyers
0190e4198e rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flags
The pretty much unused RSEQ_CS_FLAG_NO_RESTART_ON_* flags introduce
complexity in rseq, and are subtly buggy [1]. Solving those issues
requires introducing additional complexity in the rseq implementation
for each supported architecture.

Considering that it complexifies the rseq ABI, I am proposing that we
deprecate those flags. [2]

So far there appears to be consensus from maintainers of user-space
projects impacted by this feature that its removal would be a welcome
simplification. [3]

The deprecation approach proposed here is to issue WARN_ON_ONCE() when
encountering those flags and kill the offending process with sigsegv.
This should allow us to quickly identify whether anyone yells at us for
removing this.

Link: https://lore.kernel.org/lkml/20220618182515.95831-1-minhquangbui99@gmail.com/ [1]
Link: https://lore.kernel.org/lkml/258546133.12151.1655739550814.JavaMail.zimbra@efficios.com/ [2]
Link: https://lore.kernel.org/lkml/87pmj1enjh.fsf@email.froward.int.ebiederm.org/ [3]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/lkml/20220622194617.1155957-1-mathieu.desnoyers@efficios.com
2022-08-01 15:21:29 +02:00
Linus Torvalds
3d7cb6b04c Linux 5.19 2022-07-31 14:03:01 -07:00
Linus Torvalds
334c0ef642 Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
 "One-liner fix of a NULL pointer deref in the Allwinner clk driver"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi-ng: Fix H6 RTC clock definition
2022-07-31 09:52:20 -07:00
Linus Torvalds
89caf57540 Merge tag 'x86_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Update the 'mitigations=' kernel param documentation

 - Check the IBPB feature flag before enabling IBPB in firmware calls
   because cloud vendors' fantasy when it comes to creating guest
   configurations is unlimited

 - Unexport sev_es_ghcb_hv_call() before 5.19 releases now that HyperV
   doesn't need it anymore

 - Remove dead CONFIG_* items

* tag 'x86_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
  x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
  Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV"
  x86/configs: Update configs in x86_debug.config
2022-07-31 09:26:53 -07:00
Linus Torvalds
5e4823e6da Merge tag 'locking_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Borislav Petkov:

 - Avoid rwsem lockups in certain situations when handling the handoff
   bit

* tag 'locking_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
2022-07-31 09:21:13 -07:00
Linus Torvalds
cd2715b792 Merge tag 'edac_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:

 - Relax the condition under which the DIMM label in ghes_edac is set in
   order to accomodate an HPE BIOS which sets only the device but not
   the bank

 - Two forgotten fixes to synopsys_edac when handling error interrupts

* tag 'edac_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/ghes: Set the DIMM label unconditionally
  EDAC/synopsys: Re-enable the error interrupts on v3 hw
  EDAC/synopsys: Use the correct register to disable the error interrupt on v3 hw
2022-07-31 09:12:58 -07:00
Hongnan Li
ecce9212d0 erofs: update ctx->pos for every emitted dirent
erofs_readdir update ctx->pos after filling a batch of dentries
and it may cause dir/files duplication for NFS readdirplus which
depends on ctx->pos to fill dir correctly. So update ctx->pos for
every emitted dirent in erofs_fill_dentries to fix it.

Also fix the update of ctx->pos when the initial file position has
exceeded nameoff.

Fixes: 3e917cc305 ("erofs: make filesystem exportable")
Signed-off-by: Hongnan Li <hongnan.li@linux.alibaba.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220722082732.30935-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-07-31 22:26:29 +08:00
Linus Torvalds
6a01025844 Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 "Last set of ARM fixes for 5.19:

   - fix for MAX_DMA_ADDRESS overflow

   - fix for find_*_bit performing an out of bounds memory access"

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: findbit: fix overflowing offset
  ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow
2022-07-30 17:24:16 -07:00
Waiman Long
6eebd5fb20 locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
With commit d257cc8cb8 ("locking/rwsem: Make handoff bit handling more
consistent"), the writer that sets the handoff bit can be interrupted
out without clearing the bit if the wait queue isn't empty. This disables
reader and writer optimistic lock spinning and stealing.

Now if a non-first writer in the queue is somehow woken up or a new
waiter enters the slowpath, it can't acquire the lock.  This is not the
case before commit d257cc8cb8 as the writer that set the handoff bit
will clear it when exiting out via the out_nolock path. This is less
efficient as the busy rwsem stays in an unlock state for a longer time.

In some cases, this new behavior may cause lockups as shown in [1] and
[2].

This patch allows a non-first writer to ignore the handoff bit if it
is not originally set or initiated by the first waiter. This patch is
shown to be effective in fixing the lockup problem reported in [1].

[1] https://lore.kernel.org/lkml/20220617134325.GC30825@techsingularity.net/
[2] https://lore.kernel.org/lkml/3f02975c-1a9d-be20-32cf-f1d8e3dfafcc@oracle.com/

Fixes: d257cc8cb8 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20220622200419.778799-1-longman@redhat.com
2022-07-30 10:58:28 +02:00
Linus Torvalds
620725263f Merge tag 'mm-hotfixes-stable-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "Two hotfixes, both cc:stable"

* tag 'mm-hotfixes-stable-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/hmm: fault non-owner device private entries
  page_alloc: fix invalid watermark check on a negative value
2022-07-29 21:02:35 -07:00
Linus Torvalds
8a91f86f3e Merge tag 'block-5.19-2022-07-29' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
 "Just a single fix for NVMe, yet another quirk addition"

* tag 'block-5.19-2022-07-29' of git://git.kernel.dk/linux-block:
  nvme-pci: Crucial P2 has bogus namespace ids
2022-07-29 16:07:35 -07:00
Linus Torvalds
e65c6a46df Merge tag 'drm-fixes-2022-07-30' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
 "Maxime had the dog^Wmailing list server eat his homework^Wmisc pull
  request.

  Two more small fixes, one in nouveau svm code and the other in
  simpledrm.

  nouveau:
   - page migration fix

  simpledrm:
   - fix mode_valid return value"

* tag 'drm-fixes-2022-07-30' of git://anongit.freedesktop.org/drm/drm:
  nouveau/svm: Fix to migrate all requested pages
  drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()
2022-07-29 13:25:31 -07:00
Dave Airlie
ce156c8a18 Merge tag 'drm-misc-fixes-2022-07-29' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
One fix to fix simpledrm mode_valid return value, and one for page
migration in nouveau

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220729094514.sfzhc3gqjgwgal62@penduick
2022-07-30 06:09:57 +10:00
Linus Torvalds
1c8ac1c4af Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Four fixes, three in drivers.

  The two biggest fixes are ufs and the remaining driver and core fix
  are small and obvious (and the core fix is low risk)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Fix a race condition related to device management
  scsi: core: Fix warning in scsi_alloc_sgtables()
  scsi: ufs: host: Hold reference returned by of_parse_phandle()
  scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown
2022-07-29 13:07:03 -07:00
Eiichi Tsukata
ea304a8b89 docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
Updates descriptions for "mitigations=off" and "mitigations=auto,nosmt"
with the respective retbleed= settings.

Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: corbet@lwn.net
Link: https://lore.kernel.org/r/20220728043907.165688-1-eiichi.tsukata@nutanix.com
2022-07-29 20:47:07 +02:00
Ralph Campbell
8a295dbbaf mm/hmm: fault non-owner device private entries
If hmm_range_fault() is called with the HMM_PFN_REQ_FAULT flag and a
device private PTE is found, the hmm_range::dev_private_owner page is used
to determine if the device private page should not be faulted in. 
However, if the device private page is not owned by the caller,
hmm_range_fault() returns an error instead of calling migrate_to_ram() to
fault in the page.

For example, if a page is migrated to GPU private memory and a RDMA fault
capable NIC tries to read the migrated page, without this patch it will
get an error.  With this patch, the page will be migrated back to system
memory and the NIC will be able to read the data.

Link: https://lkml.kernel.org/r/20220727000837.4128709-2-rcampbell@nvidia.com
Link: https://lkml.kernel.org/r/20220725183615.4118795-2-rcampbell@nvidia.com
Fixes: 08ddddda66 ("mm/hmm: check the device private page owner in hmm_range_fault()")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reported-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-29 11:33:37 -07:00
Jaewon Kim
9282012fc0 page_alloc: fix invalid watermark check on a negative value
There was a report that a task is waiting at the
throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was
increasing.

This is a bug where zone_watermark_fast returns true even when the free
is very low. The commit f27ce0e140 ("page_alloc: consider highatomic
reserve in watermark fast") changed the watermark fast to consider
highatomic reserve. But it did not handle a negative value case which
can be happened when reserved_highatomic pageblock is bigger than the
actual free.

If watermark is considered as ok for the negative value, allocating
contexts for order-0 will consume all free pages without direct reclaim,
and finally free page may become depleted except highatomic free.

Then allocating contexts may fall into throttle_direct_reclaim. This
symptom may easily happen in a system where wmark min is low and other
reclaimers like kswapd does not make free pages quickly.

Handle the negative case by using MIN.

Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com
Fixes: f27ce0e140 ("page_alloc: consider highatomic reserve in watermark fast")
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Reported-by: GyeongHwan Hong <gh21.hong@samsung.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yong-Taek Lee <ytk.lee@samsung.com>
Cc: <stable@vger.kerenl.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-29 11:33:37 -07:00