Commit Graph

19483 Commits

Author SHA1 Message Date
Thomas Gleixner
4176b541c2 x86/cpu/topology: Provide separate APIC registration functions
generic_processor_info() aside of being a complete misnomer is used for
both early boot registration and ACPI CPU hotplug.

While it's arguable that this can share some code, it results in code which
is hard to understand and kept around post init for no real reason.

Also the call sites do lots of manual fiddling in topology related
variables instead of having proper interfaces for the purpose which handle
the topology internals correctly.

Provide topology_register_apic(), topology_hotplug_apic() and
topology_hotunplug_apic() which have the extra magic of the call sites
incorporated and for now are wrappers around generic_processor_info().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210251.605007456@linutronix.de
2024-02-15 22:07:42 +01:00
Thomas Gleixner
c0a66c2847 x86/cpu/topology: Move registration out of APIC code
The APIC/CPU registration sits in the middle of the APIC code. In fact this
is a topology evaluation function and has nothing to do with the inner
workings of the local APIC.

Move it out into a file which reflects what this is about.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210251.543948812@linutronix.de
2024-02-15 22:07:41 +01:00
Thomas Gleixner
1a5d0f62d1 x86/apic: Use a proper define for invalid ACPI CPU ID
The ACPI ID for CPUs is preset with U32_MAX which is completely non
obvious. Use a proper define for it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154640.177504138@linutronix.de
2024-02-15 22:07:41 +01:00
Thomas Gleixner
4a5f72a4a3 x86/apic: Remove yet another dubious callback
Paranoia is not wrong, but having an APIC callback which is in most
implementations a complete NOOP and in one actually looking whether the
APICID of an upcoming CPU has been registered. The same APICID which was
used to bring the CPU out of wait for startup.

That's paranoia for the paranoia sake. Remove the voodoo.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154640.116510935@linutronix.de
2024-02-15 22:07:41 +01:00
Thomas Gleixner
58d1692835 x86/apic: Remove the pointless writeback of boot_cpu_physical_apicid
There is absolutely no point to write the APIC ID which was read from the
local APIC earlier, back into the local APIC for the 64-bit UP case.

Remove that along with the apic callback which is solely there for this
pointless exercise.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154640.055288922@linutronix.de
2024-02-15 22:07:41 +01:00
Thomas Gleixner
350b5e2730 x86/mpparse: Remove the physid_t bitmap wrapper
physid_t is a wrapper around bitmap. Just remove the onion layer and use
bitmap functionality directly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154639.994904510@linutronix.de
2024-02-15 22:07:41 +01:00
Thomas Gleixner
de6aec2417 x86/mm/numa: Move early mptable evaluation into common code
There is no reason to have the early mptable evaluation conditionally
invoked only from the AMD numa topology code.

Make it explicit and invoke it from setup_arch() right after the
corresponding ACPI init call. Remove the pointless wrapper and invoke
x86_init::mpparse::early_parse_smp_config() directly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154639.931761608@linutronix.de
2024-02-15 22:07:41 +01:00
Thomas Gleixner
dcb7600849 x86/mpparse: Switch to new init callbacks
Now that all platforms have the new split SMP configuration callbacks set
up, flip the switch and remove the old callback pointer and mop up the
platform code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154639.870883080@linutronix.de
2024-02-15 22:07:41 +01:00
Thomas Gleixner
30c928691c x86/jailhouse: Prepare for separate mpparse callbacks
Provide a wrapper around the existing function and fill the new callbacks
in.

No functional change as the new callbacks are not yet operational.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154639.683073662@linutronix.de
2024-02-15 22:07:40 +01:00
Thomas Gleixner
5faf8ec771 x86/dtb: Rename x86_dtb_init()
x86_dtb_init() is a misnomer and it really should be used as a SMP
configuration parser which is selected by the platform via
x86_init::mpparse:parse_smp_config().

Rename it to x86_dtb_parse_smp_config() in preparation for that.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154639.495992801@linutronix.de
2024-02-15 22:07:40 +01:00
Thomas Gleixner
d0a85126b1 x86/mpparse: Prepare for callback separation
In preparation of splitting the get_smp_config() callback, rename
default_get_smp_config() to mpparse_get_smp_config() and provide an early
and late wrapper.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154639.433811243@linutronix.de
2024-02-15 22:07:40 +01:00
Thomas Gleixner
e061c7ae08 x86/mpparse: Rename default_find_smp_config()
MPTABLE is no longer the default SMP configuration mechanism.  Rename it to
mpparse_find_mptable() because that's what it does.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154639.306287711@linutronix.de
2024-02-15 22:07:40 +01:00
Thomas Gleixner
3e48d804c8 x86/apic: Remove check_apicid_used() and ioapic_phys_id_map()
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154639.243307499@linutronix.de
2024-02-15 22:07:39 +01:00
Thomas Gleixner
4b99e735a5 x86/ioapic: Simplify setup_ioapic_ids_from_mpc_nocheck()
No need to go through APIC callbacks. It's already established that this is
an ancient APIC. So just copy the present mask and use the direct physid*
functions all over the place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154639.181901887@linutronix.de
2024-02-15 22:07:39 +01:00
Thomas Gleixner
533535afc0 x86/ioapic: Make io_apic_get_unique_id() simpler
No need to go through APIC callbacks. It's already established that this is
an ancient APIC. So just copy the present mask and use the direct physid*
functions all over the place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154639.119261725@linutronix.de
2024-02-15 22:07:39 +01:00
Thomas Gleixner
517234446c x86/apic: Get rid of get_physical_broadcast()
There is no point for this function. The only case where this is used is
when there is no XAPIC available, which means the broadcast address is 0xF.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154639.057209154@linutronix.de
2024-02-15 22:07:39 +01:00
Thomas Gleixner
2ac9e529d7 x86/ioapic: Replace some more set bit nonsense
Yet another set_bit() operation wrapped in oring a mask.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154638.995080989@linutronix.de
2024-02-15 22:07:39 +01:00
Thomas Gleixner
490cc3c5e7 x86/platform/ce4100: Dont override x86_init.mpparse.setup_ioapic_ids
There is no point to do that. The ATOMs have an XAPIC for which this
function is a pointless exercise.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154638.931617775@linutronix.de
2024-02-15 22:07:39 +01:00
Thomas Gleixner
52128a7a21 x86/cpu/topology: Make the APIC mismatch warnings complete
Detect all possible combinations of mismatch right in the CPUID evaluation
code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154638.867699078@linutronix.de
2024-02-15 22:07:39 +01:00
Thomas Gleixner
bcccdf8b30 x86/apic/uv: Remove the private leaf 0xb parser
The package shift has been already evaluated by the early CPU init.

Put the mindless copy right next to the original leaf 0xb parser.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153625.637385562@linutronix.de
2024-02-15 22:07:39 +01:00
Thomas Gleixner
035fc90a9d x86/apic: Remove unused phys_pkg_id() callback
Now that the core code does not use this monstrosity anymore, it's time to
put it to rest.

The only real purpose was to read the APIC ID on UV and VSMP systems for
the actual evaluation. That's what the core code does now.

For doing the actual shift operation there is truly no APIC callback
required.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153625.516536121@linutronix.de
2024-02-15 22:07:38 +01:00
Thomas Gleixner
fab75e790f x86/cpu: Remove x86_coreid_bits
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153625.455839743@linutronix.de
2024-02-15 22:07:38 +01:00
Thomas Gleixner
6cf70394e7 x86/cpu: Remove topology.c
No more users. Stick it into the ugly code museum.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153625.395230346@linutronix.de
2024-02-15 22:07:38 +01:00
Thomas Gleixner
03fa6bea5a x86/cpu: Make topology_amd_node_id() use the actual node info
Now that everything is converted switch it over and remove the intermediate
operation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153625.334185785@linutronix.de
2024-02-15 22:07:38 +01:00
Thomas Gleixner
3279081dd0 x86/cpu: Use common topology code for HYGON
Switch it over to use the consolidated topology evaluation and remove the
temporary safe guards which are not longer needed.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153625.207750409@linutronix.de
2024-02-15 22:07:38 +01:00
Thomas Gleixner
c749ce393b x86/cpu: Use common topology code for AMD
Switch it over to the new topology evaluation mechanism and remove the
random bits and pieces which are sprinkled all over the place.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153625.145745053@linutronix.de
2024-02-15 22:07:38 +01:00
Thomas Gleixner
ace278e7ec x86/smpboot: Teach it about topo.amd_node_id
When switching AMD over to the new topology parser then the match functions
need to look for AMD systems with the extended topology feature at the new
topo.amd_node_id member which is then holding the node id information.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153625.082979150@linutronix.de
2024-02-15 22:07:37 +01:00
Thomas Gleixner
f7fb3b2dd9 x86/cpu: Provide an AMD/HYGON specific topology parser
AMD/HYGON uses various methods for topology evaluation:

  - Leaf 0x80000008 and 0x8000001e based with an optional leaf 0xb,
    which is the preferred variant for modern CPUs.

    Leaf 0xb will be superseded by leaf 0x80000026 soon, which is just
    another variant of the Intel 0x1f leaf for whatever reasons.
    
  - Subleaf 0x80000008 and NODEID_MSR base

  - Legacy fallback

That code is following the principle of random bits and pieces all over the
place which results in multiple evaluations and impenetrable code flows in
the same way as the Intel parsing did.

Provide a sane implementation by clearly separating the three variants and
bringing them in the proper preference order in one place.

This provides the parsing for both AMD and HYGON because there is no point
in having a separate HYGON parser which only differs by 3 lines of
code. Any further divergence between AMD and HYGON can be handled in
different functions, while still sharing the existing parsers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153625.020038641@linutronix.de
2024-02-15 22:07:37 +01:00
Thomas Gleixner
7e3ec62867 x86/cpu/amd: Provide a separate accessor for Node ID
AMD (ab)uses topology_die_id() to store the Node ID information and
topology_max_dies_per_pkg to store the number of nodes per package.

This collides with the proper processor die level enumeration which is
coming on AMD with CPUID 8000_0026, unless there is a correlation between
the two. There is zero documentation about that.

So provide new storage and new accessors which for now still access die_id
and topology_max_die_per_pkg(). Will be mopped up after AMD and HYGON are
converted over.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.956116738@linutronix.de
2024-02-15 22:07:37 +01:00
Thomas Gleixner
22d63660c3 x86/cpu: Use common topology code for Intel
Intel CPUs use either topology leaf 0xb/0x1f evaluation or the legacy
SMP/HT evaluation based on CPUID leaf 0x1/0x4.

Move it over to the consolidated topology code and remove the random
topology hacks which are sprinkled into the Intel and the common code.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.893644349@linutronix.de
2024-02-15 22:07:37 +01:00
Thomas Gleixner
3d41009425 x86/cpu: Provide a sane leaf 0xb/0x1f parser
detect_extended_topology() along with it's early() variant is a classic
example for duct tape engineering:

  - It evaluates an array of subleafs with a boatload of local variables
    for the relevant topology levels instead of using an array to save the
    enumerated information and propagate it to the right level

  - It has no boundary checks for subleafs

  - It prevents updating the die_id with a crude workaround instead of
    checking for leaf 0xb which does not provide die information.

  - It's broken vs. the number of dies evaluation as it uses:

      num_processors[DIE_LEVEL] / num_processors[CORE_LEVEL]

    which "works" only correctly if there is none of the intermediate
    topology levels (MODULE/TILE) enumerated.

There is zero value in trying to "fix" that code as the only proper fix is
to rewrite it from scratch.

Implement a sane parser with proper code documentation, which will be used
for the consolidated topology evaluation in the next step.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.830571770@linutronix.de
2024-02-15 22:07:37 +01:00
Thomas Gleixner
92853a7774 x86/cpu: Move __max_die_per_package to common.c
In preparation of a complete replacement for the topology leaf 0xb/0x1f
evaluation, move __max_die_per_package into the common code.

Will be removed once everything is converted over.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.768188958@linutronix.de
2024-02-15 22:07:37 +01:00
Thomas Gleixner
598e719c40 x86/cpu: Use common topology code for Centaur and Zhaoxin
Centaur and Zhaoxin CPUs use only the legacy SMP detection. Remove the
invocations from their 32bit path and exclude them from the 64-bit call
path.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.706794189@linutronix.de
2024-02-15 22:07:37 +01:00
Thomas Gleixner
bda74aae20 x86/cpu: Add legacy topology parser
The legacy topology detection via CPUID leaf 4, which provides the number
of cores in the package and CPUID leaf 1 which provides the number of
logical CPUs in case that FEATURE_HT is enabled and the CMP_LEGACY feature
is not set, is shared for Intel, Centaur and Zhaoxin CPUs.

Lift the code from common.c without the early detection hack and provide it
as common fallback mechanism.

Will be utilized in later changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.644448852@linutronix.de
2024-02-15 22:07:37 +01:00
Thomas Gleixner
ebdb203610 x86/cpu: Provide cpu_init/parse_topology()
Topology evaluation is a complete disaster and impenetrable mess. It's
scattered all over the place with some vendor implementations doing early
evaluation and some not. The most horrific part is the permanent
overwriting of smt_max_siblings and __max_die_per_package, instead of
establishing them once on the boot CPU and validating the result on the
APs.

The goals are:

  - One topology evaluation entry point

  - Proper sharing of pointlessly duplicated code

  - Proper structuring of the evaluation logic and preferences.

  - Evaluating important system wide information only once on the boot CPU

  - Making the 0xb/0x1f leaf parsing less convoluted and actually fixing
    the short comings of leaf 0x1f evaluation.

Start to consolidate the topology evaluation code by providing the entry
points for the early boot CPU evaluation and for the final parsing on the
boot CPU and the APs.

Move the trivial pieces into that new code:

   - The initialization of cpuinfo_x86::topo

   - The evaluation of CPUID leaf 1, which presets topo::initial_apicid

   - topo_apicid is set to topo::initial_apicid when invoked from early
     boot. When invoked for the final evaluation on the boot CPU it reads
     the actual APIC ID, which makes apic_get_initial_apicid() obsolete
     once everything is converted over.

Provide a temporary helper function topo_converted() which shields off the
not yet converted CPU vendors from invoking code which would break them.
This shielding covers all vendor CPUs which support SMP, but not the
historical pure UP ones as they only need the topology info init and
eventually the initial APIC initialization.

Provide two new members in cpuinfo_x86::topo to store the maximum number of
SMT siblings and the number of dies per package and add them to the debugfs
readout. These two members will be used to populate this information on the
boot CPU and to validate the APs against it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240212153624.581436579@linutronix.de
2024-02-15 22:07:36 +01:00
Ingo Molnar
4589f199eb Merge branch 'x86/bugs' into x86/core, to pick up pending changes before dependent patches
Merge in pending alternatives patching infrastructure changes, before
applying more patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-02-14 10:49:37 +01:00
Ingo Molnar
03c11eb3b1 Linux 6.8-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmXJK4UeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGHsYH/jKmzKXDRsBCcw/Q
 HGUvFtpohWBOpN6efdf0nxilQisuyQrqKB9fnwvfcdE60VpqMJXFMdlFh/fonxPl
 JMbpk9y5uw48IJZA43NwTxUrjZ4wyWzv4ZF6YWa+5WdTAJpPLEPhhnLxcHOKklMr
 5Cm/7B/M7eB2BXBfc45b1pkKN22q9OXvjaKxZ+5wYmiMxS+GC8l8jiJ/WlHX78PR
 eLgsa1v732f2D7YF75wVhaoYepR+QzA9wTKqhjMNCEaVc2PQhA2JRsBXEt84qEIa
 FZigmf7LLc4ed9YA2XjRBZhAehe3cZVJZ1lasW37IATS921La2WfKuiysICJOtyT
 bGjK8tk=
 =Pt7W
 -----END PGP SIGNATURE-----

Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branch

Conflicts:
	arch/x86/include/asm/percpu.h
	arch/x86/include/asm/text-patching.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-02-14 10:45:07 +01:00
Josh Poimboeuf
4461438a84 x86/retpoline: Ensure default return thunk isn't used at runtime
Make sure the default return thunk is not used after all return
instructions have been patched by the alternatives because the default
return thunk is insufficient when it comes to mitigating Retbleed or
SRSO.

Fix based on an earlier version by David Kaplan <david.kaplan@amd.com>.

  [ bp: Fix the compilation error of warn_thunk_thunk being an invisible
        symbol, hoist thunk macro into calling.h ]

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231010171020.462211-4-david.kaplan@amd.com
Link: https://lore.kernel.org/r/20240104132446.GEZZaxnrIgIyat0pqf@fat_crate.local
2024-02-12 11:42:15 +01:00
Linus Torvalds
c021e191cf - Correct the minimum CPU family for Transmeta Crusoe in Kconfig so that
such hw can boot again
 
 - Do not take into accout XSTATE buffer size info supplied by userspace
   when constructing a sigreturn frame
 
 - Switch get_/put_user* to EX_TYPE_UACCESS exception handling when an
   MCE is encountered so that it can be properly recovered from instead
   of simply panicking
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmXIo3cACgkQEsHwGGHe
 VUpnvg//THpQodOkgc8SLMut0fx/qcmWTZAxXKBPQklZkBq3sbA6wEDQqvBNkXfl
 ovSss8TeL0KRrq3OsurJK+QXP94+nFt11q9SEhqPmhGb9d4H7aBimCrNjP0yEE1f
 YuvkhGhylIPnrwYoJUrK024tuxkFFgIVqr+adv1PrvtohnpVhICJY2oTpxtpQDZi
 r+k7P7VBG1oNvYETAbljbTQr5KV84YTmZa899/tncZaZbE+18bK/VJhL728ztSzD
 Xdwoztrf37fqYk03l40MJwJwpiAC5t2g/qwa5yvHjr9Eavb5YeLX34nxeG2AdOpx
 GTwrWkIW1dY4ck3lC4HR/igd2bDB4ZEfxJMMLkQAIvurGpQjU/jVXC28V4r6N5MW
 UF1gf4i9m2/BrpX+wpDOi11tl5RQQcV7Y8qsMN1lqRM5sDjjh4PV9oT2TXKmuYn6
 2T4Xv0A94FROFkQ9F52MFqTcwh0Yu9vtGsmtbCRP/em5OwqyyVFHWdEFR4PSZUpU
 89V7zVFlLWTEuPjrUAU9sQmTL56gNlVmejWAzearhHgeFKUs0EK1hcn310454aVm
 CzDN+4u8uCHFDKsF915nQnRI6jpRnf3mC4xWYheHcoCg02iSImWwVGGVHbJrWSNV
 fFYxwWtpFw0N9jzCfUHnElp3jN1Ll1LkkWQC4NvCtZxeUioqKJI=
 =b7B7
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Correct the minimum CPU family for Transmeta Crusoe in Kconfig so
   that such hw can boot again

 - Do not take into accout XSTATE buffer size info supplied by userspace
   when constructing a sigreturn frame

 - Switch get_/put_user* to EX_TYPE_UACCESS exception handling when an
   MCE is encountered so that it can be properly recovered from instead
   of simply panicking

* tag 'x86_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6
  x86/fpu: Stop relying on userspace for info to fault in xsave buffer
  x86/lib: Revert to _ASM_EXTABLE_UA() for {get,put}_user() fixups
2024-02-11 11:41:51 -08:00
Paolo Bonzini
dcf0926e9b x86: replace CONFIG_HAVE_KVM with IS_ENABLED(CONFIG_KVM)
It is more accurate to check if KVM is enabled, instead of having the
architecture say so.  Architectures always "have" KVM, so for example
checking CONFIG_HAVE_KVM in x86 code is pointless, but if KVM is disabled
in a specific build, there is no need for support code.

Alternatively, many of the #ifdefs could simply be deleted.  However,
this would add completely dead code.  For example, when KVM is disabled,
there should not be any posted interrupts, i.e. NOT wiring up the "dummy"
handlers and treating IRQs on those vectors as spurious is the right
thing to do.

Cc: x86@kernel.org
Cc: kbingham@kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08 08:45:35 -05:00
Linus Torvalds
5c24ba2055 x86 guest:
* Avoid false positive for check that only matters on AMD processors
 
 x86:
 
 * Give a hint when Win2016 might fail to boot due to XSAVES && !XSAVEC configuration
 
 * Do not allow creating an in-kernel PIT unless an IOAPIC already exists
 
 RISC-V:
 
 * Allow ISA extensions that were enabled for bare metal in 6.8
   (Zbc, scalar and vector crypto, Zfh[min], Zihintntl, Zvfh[min], Zfa)
 
 S390:
 
 * fix CC for successful PQAP instruction
 
 * fix a race when creating a shadow page
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmXB9EIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNF6Qf/VbNzzntY2BBNL6ZReqH+7GqMCMo7
 Q8OYsP+B7TWc0C84JNBTmvC5lwY0FmXEV+i9XFUnyMt/eEHEfr/rko1McRf+byAM
 vcfbTAz8t24bFSfojg7QJGM+pfUTrqjGmWqHwke/DuARsGB8Zntgtb50m966+xso
 kDtcsrfGOlpHbnnWZQLLQKJ6tVv7Z2/clFlf4gCT/Quex4Jo76Uq08MA9BFS9iw1
 e1oftwuXe6pCUcyt1M/AwOe8FnkP+Xm8oVmW0eJgO0TVDwob0Msx2LpVS2N/+/Oj
 1mtBSz4rUQyDdI1j6D0+HkdAlNnwEWSV6eQb+qtjXbhIWBOHUpFXNpQWkg==
 =LVAr
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "x86 guest:

   - Avoid false positive for check that only matters on AMD processors

  x86:

   - Give a hint when Win2016 might fail to boot due to XSAVES &&
     !XSAVEC configuration

   - Do not allow creating an in-kernel PIT unless an IOAPIC already
     exists

  RISC-V:

   - Allow ISA extensions that were enabled for bare metal in 6.8 (Zbc,
     scalar and vector crypto, Zfh[min], Zihintntl, Zvfh[min], Zfa)

  S390:

   - fix CC for successful PQAP instruction

   - fix a race when creating a shadow page"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86/coco: Define cc_vendor without CONFIG_ARCH_HAS_CC_PLATFORM
  x86/kvm: Fix SEV check in sev_map_percpu_data()
  KVM: x86: Give a hint when Win2016 might fail to boot due to XSAVES erratum
  KVM: x86: Check irqchip mode before create PIT
  KVM: riscv: selftests: Add Zfa extension to get-reg-list test
  RISC-V: KVM: Allow Zfa extension for Guest/VM
  KVM: riscv: selftests: Add Zvfh[min] extensions to get-reg-list test
  RISC-V: KVM: Allow Zvfh[min] extensions for Guest/VM
  KVM: riscv: selftests: Add Zihintntl extension to get-reg-list test
  RISC-V: KVM: Allow Zihintntl extension for Guest/VM
  KVM: riscv: selftests: Add Zfh[min] extensions to get-reg-list test
  RISC-V: KVM: Allow Zfh[min] extensions for Guest/VM
  KVM: riscv: selftests: Add vector crypto extensions to get-reg-list test
  RISC-V: KVM: Allow vector crypto extensions for Guest/VM
  KVM: riscv: selftests: Add scaler crypto extensions to get-reg-list test
  RISC-V: KVM: Allow scalar crypto extensions for Guest/VM
  KVM: riscv: selftests: Add Zbc extension to get-reg-list test
  RISC-V: KVM: Allow Zbc extension for Guest/VM
  KVM: s390: fix cc for successful PQAP
  KVM: s390: vsie: fix race during shadow creation
2024-02-07 17:52:16 +00:00
Peter Hilber
27f6a9c87a kvmclock: Unexport kvmclock clocksource
The KVM PTP driver now refers to the clocksource ID CSID_X86_KVM_CLK, not
to the clocksource itself any more. There are no remaining users of the
clocksource export.

Therefore, make the clocksource static again.

Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240201010453.2212371-9-peter.hilber@opensynergy.com
2024-02-07 17:05:21 +01:00
Peter Hilber
b152688c91 treewide: Remove system_counterval_t.cs, which is never read
The clocksource pointer in struct system_counterval_t is not evaluated any
more. Remove the code setting the member, and the member itself.

Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240201010453.2212371-8-peter.hilber@opensynergy.com
2024-02-07 17:05:21 +01:00
Peter Hilber
576bd4962f x86/kvm, ptp/kvm: Add clocksource ID, set system_counterval_t.cs_id
Add a clocksource ID for the x86 kvmclock.

Also, for ptp_kvm, set the recently added struct system_counterval_t member
cs_id to the clocksource ID (x86 kvmclock or ARM Generic Timer). In the
future, get_device_system_crosststamp() will compare the clocksource ID in
struct system_counterval_t, rather than the clocksource.

For now, to avoid touching too many subsystems at once, extract the
clocksource ID from the clocksource. The clocksource dereference will be
removed once everything is converted over..

Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240201010453.2212371-5-peter.hilber@opensynergy.com
2024-02-07 17:05:21 +01:00
Peter Hilber
a2c1fe7206 x86/tsc: Add clocksource ID, set system_counterval_t.cs_id
Add a clocksource ID for TSC and a distinct one for the early TSC.

Use distinct IDs for TSC and early TSC, since those also have distinct
clocksource structs. This should help to keep existing semantics when
comparing clocksources.

Also, set the recently added struct system_counterval_t member cs_id to the
TSC ID in the cases where the clocksource member is being set to the TSC
clocksource. In the future, get_device_system_crosststamp() will compare
the clocksource ID in struct system_counterval_t, rather than the
clocksource.

For the x86 ART related code, system_counterval_t.cs == NULL corresponds to
system_counterval_t.cs_id == CSID_GENERIC (0).

Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240201010453.2212371-4-peter.hilber@opensynergy.com
2024-02-07 17:05:21 +01:00
Randy Dunlap
c55cbfcea6 x86/tsc: Correct kernel-doc notation
Add or modify function descriptions to remove kernel-doc warnings:

tsc.c:655: warning: missing initial short description on line:
 * native_calibrate_tsc
tsc.c:1339: warning: Excess function parameter 'cycles' description in 'convert_art_ns_to_tsc'
tsc.c:1339: warning: Excess function parameter 'cs' description in 'convert_art_ns_to_tsc'
tsc.c:1373: warning: Function parameter or member 'work' not described in 'tsc_refine_calibration_work'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20231221033620.32379-1-rdunlap@infradead.org
2024-02-07 17:05:21 +01:00
Xiaoyao Li
ccb2280ec2 x86/kvm: Use separate percpu variable to track the enabling of asyncpf
Refer to commit fd10cde929 ("KVM paravirt: Add async PF initialization
to PV guest") and commit 344d9588a9 ("KVM: Add PV MSR to enable
asynchronous page faults delivery"). It turns out that at the time when
asyncpf was introduced, the purpose was defining the shared PV data 'struct
kvm_vcpu_pv_apf_data' with the size of 64 bytes. However, it made a mistake
and defined the size to 68 bytes, which failed to make fit in a cache line
and made the code inconsistent with the documentation.

Below justification quoted from Sean[*]

  KVM (the host side) has *never* read kvm_vcpu_pv_apf_data.enabled, and
  the documentation clearly states that enabling is based solely on the
  bit in the synthetic MSR.

  So rather than update the documentation, fix the goof by removing the
  enabled filed and use the separate percpu variable instread.
  KVM-as-a-host obviously doesn't enforce anything or consume the size,
  and changing the header will only affect guests that are rebuilt against
  the new header, so there's no chance of ABI breakage between KVM and its
  guests. The only possible breakage is if some other hypervisor is
  emulating KVM's async #PF (LOL) and relies on the guest to set
  kvm_vcpu_pv_apf_data.enabled. But (a) I highly doubt such a hypervisor
  exists, (b) that would arguably be a violation of KVM's "spec", and
  (c) the worst case scenario is that the guest would simply lose async
  #PF functionality.

[*] https://lore.kernel.org/all/ZS7ERnnRqs8Fl0ZF@google.com/T/#u

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20231025055914.1201792-2-xiaoyao.li@intel.com
[sean: use true/false instead of 1/0 for booleans]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-06 10:58:56 -08:00
Ard Biesheuvel
1c811d403a x86/sev: Fix position dependent variable references in startup code
The early startup code executes from a 1:1 mapping of memory, which
differs from the mapping that the code was linked and/or relocated to
run at. The latter mapping is not active yet at this point, and so
symbol references that rely on it will fault.

Given that the core kernel is built without -fPIC, symbol references are
typically emitted as absolute, and so any such references occuring in
the early startup code will therefore crash the kernel.

While an attempt was made to work around this for the early SEV/SME
startup code, by forcing RIP-relative addressing for certain global
SEV/SME variables via inline assembly (see snp_cpuid_get_table() for
example), RIP-relative addressing must be pervasively enforced for
SEV/SME global variables when accessed prior to page table fixups.

__startup_64() already handles this issue for select non-SEV/SME global
variables using fixup_pointer(), which adjusts the pointer relative to a
`physaddr` argument. To avoid having to pass around this `physaddr`
argument across all functions needing to apply pointer fixups, introduce
a macro RIP_RELATIVE_REF() which generates a RIP-relative reference to
a given global variable. It is used where necessary to force
RIP-relative accesses to global variables.

For backporting purposes, this patch makes no attempt at cleaning up
other occurrences of this pattern, involving either inline asm or
fixup_pointer(). Those will be addressed later.

  [ bp: Call it "rip_rel_ref" everywhere like other code shortens
    "rIP-relative reference" and make the asm wrapper __always_inline. ]

Co-developed-by: Kevin Loughlin <kevinloughlin@google.com>
Signed-off-by: Kevin Loughlin <kevinloughlin@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/all/20240130220845.1978329-1-kevinloughlin@google.com
2024-02-06 16:38:42 +01:00
Ricardo B. Marliere
a6a789165b x86/mce: Make mce_subsys const
Now that the driver core can properly handle constant struct bus_type,
make mce_subsys a constant structure.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20240204-bus_cleanup-x86-v1-1-4e7171be88e8@marliere.net
2024-02-05 10:26:51 +01:00
Kirill A. Shutemov
a9ef277488 x86/kvm: Fix SEV check in sev_map_percpu_data()
The function sev_map_percpu_data() checks if it is running on an SEV
platform by checking the CC_ATTR_GUEST_MEM_ENCRYPT attribute. However,
this attribute is also defined for TDX.

To avoid false positives, add a cc_vendor check.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: 4d96f91091 ("x86/sev: Replace occurrences of sev_active() with cc_platform_has()")
Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: David Rientjes <rientjes@google.com>
Message-Id: <20240124130317.495519-1-kirill.shutemov@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-31 16:21:01 -05:00
H. Peter Anvin (Intel)
208d8c79fd x86/fred: Invoke FRED initialization code to enable FRED
Let cpu_init_exception_handling() call cpu_init_fred_exceptions() to
initialize FRED. However if FRED is unavailable or disabled, it falls
back to set up TSS IST and initialize IDT.

Co-developed-by: Xin Li <xin3.li@intel.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-36-xin3.li@intel.com
2024-01-31 22:03:36 +01:00
H. Peter Anvin (Intel)
cdd99dd873 x86/fred: Add FRED initialization functions
Add cpu_init_fred_exceptions() to:
  - Set FRED entrypoints for events happening in ring 0 and 3.
  - Specify the stack level for IRQs occurred ring 0.
  - Specify dedicated event stacks for #DB/NMI/#MCE/#DF.
  - Enable FRED and invalidtes IDT.
  - Force 32-bit system calls to use "int $0x80" only.

Add fred_complete_exception_setup() to:
  - Initialize system_vectors as done for IDT systems.
  - Set unused sysvec_table entries to fred_handle_spurious_interrupt().

Co-developed-by: Xin Li <xin3.li@intel.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-35-xin3.li@intel.com
2024-01-31 22:03:32 +01:00
Xin Li
530dce278a x86/syscall: Split IDT syscall setup code into idt_syscall_init()
Because FRED uses the ring 3 FRED entrypoint for SYSCALL and SYSENTER and
ERETU is the only legit instruction to return to ring 3, there is NO need
to setup SYSCALL and SYSENTER MSRs for FRED, except the IA32_STAR MSR.

Split IDT syscall setup code into idt_syscall_init() to make it easy to
skip syscall setup code when FRED is enabled.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-34-xin3.li@intel.com
2024-01-31 22:03:27 +01:00
Xin Li
8f4a29b0e8 x86/traps: Add sysvec_install() to install a system interrupt handler
Add sysvec_install() to install a system interrupt handler into the IDT
or the FRED system interrupt handler table.

Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-28-xin3.li@intel.com
2024-01-31 22:02:36 +01:00
Xin Li
ffa4901f0e x86/fred: Add a machine check entry stub for FRED
Like #DB, when occurred on different ring level, i.e., from user or kernel
context, #MCE needs to be handled on different stack: User #MCE on current
task stack, while kernel #MCE on a dedicated stack.

This is exactly how FRED event delivery invokes an exception handler: ring
3 event on level 0 stack, i.e., current task stack; ring 0 event on the
the FRED machine check entry stub doesn't do stack switch.

Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-26-xin3.li@intel.com
2024-01-31 22:02:25 +01:00
H. Peter Anvin (Intel)
f8b8ee45f8 x86/fred: Add a NMI entry stub for FRED
On a FRED system, NMIs nest both with themselves and faults, transient
information is saved into the stack frame, and NMI unblocking only
happens when the stack frame indicates that so should happen.

Thus, the NMI entry stub for FRED is really quite small...

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231216063139.25567-1-xin3.li@intel.com
2024-01-31 22:02:20 +01:00
H. Peter Anvin (Intel)
99fcc968e7 x86/fred: Add a debug fault entry stub for FRED
When occurred on different ring level, i.e., from user or kernel context,
stack, while kernel #DB on a dedicated stack. This is exactly how FRED
event delivery invokes an exception handler: ring 3 event on level 0
stack, i.e., current task stack; ring 0 event on the #DB dedicated stack
specified in the IA32_FRED_STKLVLS MSR. So unlike IDT, the FRED debug
exception entry stub doesn't do stack switch.

On a FRED system, the debug trap status information (DR6) is passed on
the stack, to avoid the problem of transient state. Furthermore, FRED
transitions avoid a lot of ugly corner cases the handling of which can,
and should be, skipped.

The FRED debug trap status information saved on the stack differs from
DR6 in both stickiness and polarity; it is exactly in the format which
debug_read_clear_dr6() returns for the IDT entry points.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-24-xin3.li@intel.com
2024-01-31 22:02:15 +01:00
H. Peter Anvin (Intel)
ad41a14cc2 x86/fred: Allow single-step trap and NMI when starting a new task
Entering a new task is logically speaking a return from a system call
(exec, fork, clone, etc.). As such, if ptrace enables single stepping
a single step exception should be allowed to trigger immediately upon
entering user space. This is not optional.

NMI should *never* be disabled in user space. As such, this is an
optional, opportunistic way to catch errors.

Allow single-step trap and NMI when starting a new task, thus once
the new task enters user space, single-step trap and NMI are both
enabled immediately.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-21-xin3.li@intel.com
2024-01-31 22:02:00 +01:00
H. Peter Anvin (Intel)
df8838737b x86/fred: No ESPFIX needed when FRED is enabled
Because FRED always restores the full value of %rsp, ESPFIX is
no longer needed when it's enabled.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-20-xin3.li@intel.com
2024-01-31 22:01:51 +01:00
H. Peter Anvin (Intel)
09794f6893 x86/fred: Disallow the swapgs instruction when FRED is enabled
SWAPGS is no longer needed thus NOT allowed with FRED because FRED
transitions ensure that an operating system can _always_ operate
with its own GS base address:

  - For events that occur in ring 3, FRED event delivery swaps the GS
    base address with the IA32_KERNEL_GS_BASE MSR.

  - ERETU (the FRED transition that returns to ring 3) also swaps the
    GS base address with the IA32_KERNEL_GS_BASE MSR.

And the operating system can still setup the GS segment for a user
thread without the need of loading a user thread GS with:

  - Using LKGS, available with FRED, to modify other attributes of the
    GS segment without compromising its ability always to operate with
    its own GS base address.

  - Accessing the GS segment base address for a user thread as before
    using RDMSR or WRMSR on the IA32_KERNEL_GS_BASE MSR.

Note, LKGS loads the GS base address into the IA32_KERNEL_GS_BASE MSR
instead of the GS segment's descriptor cache. As such, the operating
system never changes its runtime GS base address.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-19-xin3.li@intel.com
2024-01-31 22:01:41 +01:00
Xin Li
ee63291aa8 x86/ptrace: Cleanup the definition of the pt_regs structure
struct pt_regs is hard to read because the member or section related
comments are not aligned with the members.

The 'cs' and 'ss' members of pt_regs are type of 'unsigned long' while
in reality they are only 16-bit wide. This works so far as the
remaining space is unused, but FRED will use the remaining bits for
other purposes.

To prepare for FRED:

  - Cleanup the formatting
  - Convert 'cs' and 'ss' to u16 and embed them into an union
    with a u64
  - Fixup the related printk() format strings

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Originally-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-14-xin3.li@intel.com
2024-01-31 22:01:13 +01:00
H. Peter Anvin (Intel)
ff45746fbf x86/cpu: Add X86_CR4_FRED macro
Add X86_CR4_FRED macro for the FRED bit in %cr4. This bit must not be
changed after initialization, so add it to the pinned CR4 bits.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-12-xin3.li@intel.com
2024-01-31 22:00:38 +01:00
Ard Biesheuvel
1567570624 x86/startup_64: Drop long return to initial_code pointer
Since

  866b556efa ("x86/head/64: Install startup GDT")

the primary startup sequence sets the code segment register (CS) to
__KERNEL_CS before calling into the startup code shared between primary
and secondary boot.

This means a simple indirect call is sufficient here.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240129180502.4069817-24-ardb+git@google.com
2024-01-31 18:31:21 +01:00
Xin Li
3810da1271 x86/fred: Add a fred= cmdline param
Let command line option "fred" accept multiple options to make it
easier to tweak its behavior.

Currently, two options 'on' and 'off' are allowed, and the default
behavior is to disable FRED. To enable FRED, append "fred=on" to the
kernel command line.

  [ bp: Use cpu_feature_enabled(), touch ups. ]

Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-9-xin3.li@intel.com
2024-01-30 18:19:20 +01:00
Andrei Vagin
d877550eaf x86/fpu: Stop relying on userspace for info to fault in xsave buffer
Before this change, the expected size of the user space buffer was
taken from fx_sw->xstate_size. fx_sw->xstate_size can be changed
from user-space, so it is possible construct a sigreturn frame where:

 * fx_sw->xstate_size is smaller than the size required by valid bits in
   fx_sw->xfeatures.
 * user-space unmaps parts of the sigrame fpu buffer so that not all of
   the buffer required by xrstor is accessible.

In this case, xrstor tries to restore and accesses the unmapped area
which results in a fault. But fault_in_readable succeeds because buf +
fx_sw->xstate_size is within the still mapped area, so it goes back and
tries xrstor again. It will spin in this loop forever.

Instead, fault in the maximum size which can be touched by XRSTOR (taken
from fpstate->user_size).

[ dhansen: tweak subject / changelog ]

Fixes: fcb3635f50 ("x86/fpu/signal: Handle #PF in the direct restore path")
Reported-by: Konstantin Bogomolov <bogomolov@google.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240130063603.3392627-1-avagin%40google.com
2024-01-30 07:25:48 -08:00
Thomas Zimmermann
103bf75fc9 x86: Do not include <asm/bootparam.h> in several files
Remove the include statement for <asm/bootparam.h> from several files
that don't require it and limit the exposure of those definitions within
the Linux kernel code.

  [ bp: Massage commit message. ]

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240112095000.8952-5-tzimmermann@suse.de
2024-01-30 15:17:24 +01:00
Ashish Kalra
8ef979584e crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
Add a kdump safe version of sev_firmware_shutdown() and register it as a
crash_kexec_post_notifier so it will be invoked during panic/crash to do
SEV/SNP shutdown. This is required for transitioning all IOMMU pages to
reclaim/hypervisor state, otherwise re-init of IOMMU pages during
crashdump kernel boot fails and panics the crashdump kernel.

This panic notifier runs in atomic context, hence it ensures not to
acquire any locks/mutexes and polls for PSP command completion instead
of depending on PSP command completion interrupt.

  [ mdr: Remove use of "we" in comments. ]

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240126041126.1927228-21-michael.roth@amd.com
2024-01-29 20:34:19 +01:00
Ashish Kalra
e3fd08afb7 x86/mtrr: Don't print errors if MtrrFixDramModEn is set when SNP enabled
SNP enabled platforms require the MtrrFixDramModeEn bit to be set across
all CPUs when SNP is enabled. Therefore, don't print error messages when
MtrrFixDramModeEn is set when bringing CPUs online.

Closes: https://lore.kernel.org/kvm/68b2d6bf-bce7-47f9-bebb-2652cc923ff9@linux.microsoft.com/
Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240126041126.1927228-6-michael.roth@amd.com
2024-01-29 17:24:27 +01:00
Brijesh Singh
216d106c7f x86/sev: Add SEV-SNP host initialization support
The memory integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). The RMP is a single data
structure shared across the system that contains one entry for every 4K
page of DRAM that may be used by SEV-SNP VMs. The APM Volume 2 section
on Secure Nested Paging (SEV-SNP) details a number of steps needed to
detect/enable SEV-SNP and RMP table support on the host:

 - Detect SEV-SNP support based on CPUID bit
 - Initialize the RMP table memory reported by the RMP base/end MSR
   registers and configure IOMMU to be compatible with RMP access
   restrictions
 - Set the MtrrFixDramModEn bit in SYSCFG MSR
 - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
 - Configure IOMMU

RMP table entry format is non-architectural and it can vary by
processor. It is defined by the PPR document for each respective CPU
family. Restrict SNP support to CPU models/families which are compatible
with the current RMP table entry format to guard against any undefined
behavior when running on other system types. Future models/support will
handle this through an architectural mechanism to allow for broader
compatibility.

SNP host code depends on CONFIG_KVM_AMD_SEV config flag which may be
enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
instead of CONFIG_AMD_MEM_ENCRYPT.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Link: https://lore.kernel.org/r/20240126041126.1927228-5-michael.roth@amd.com
2024-01-29 17:20:23 +01:00
Kim Phillips
acaa4b5c4c x86/speculation: Do not enable Automatic IBRS if SEV-SNP is enabled
Without SEV-SNP, Automatic IBRS protects only the kernel. But when
SEV-SNP is enabled, the Automatic IBRS protection umbrella widens to all
host-side code, including userspace. This protection comes at a cost:
reduced userspace indirect branch performance.

To avoid this performance loss, don't use Automatic IBRS on SEV-SNP
hosts and all back to retpolines instead.

  [ mdr: squash in changes from review discussion. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lore.kernel.org/r/20240126041126.1927228-3-michael.roth@amd.com
2024-01-29 17:19:01 +01:00
Brijesh Singh
b6e0f6666f x86/cpufeatures: Add SEV-SNP CPU feature
Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging. This feature adds a strong memory integrity
protection to help prevent malicious hypervisor-based attacks like
data replay, memory re-mapping, and more.

Since enabling the SNP CPU feature imposes a number of additional
requirements on host initialization and handling legacy firmware APIs
for SEV/SEV-ES guests, only introduce the CPU feature bit so that the
relevant handling can be added, but leave it disabled via a
disabled-features mask.

Once all the necessary changes needed to maintain legacy SEV/SEV-ES
support are introduced in subsequent patches, the SNP feature bit will
be unmasked/enabled.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240126041126.1927228-2-michael.roth@amd.com
2024-01-29 17:13:16 +01:00
Borislav Petkov (AMD)
e3ef461af3 x86/sev: Harden #VC instruction emulation somewhat
Compare the opcode bytes at rIP for each #VC exit reason to verify the
instruction which raised the #VC exception is actually the right one.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240105101407.11694-1-bp@alien8.de
2024-01-29 17:08:22 +01:00
H. Peter Anvin (Intel)
51c158f7aa x86/cpufeatures: Add the CPU feature bit for FRED
Any FRED enabled CPU will always have the following features as its
baseline:

  1) LKGS, load attributes of the GS segment but the base address into
     the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s descriptor
     cache.

  2) WRMSRNS, non-serializing WRMSR for faster MSR writes.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-7-xin3.li@intel.com
2024-01-25 19:10:30 +01:00
Namhyung Kim
e2fbc857d3 x86/nmi: Rate limit unknown NMI messages
On some AMD machines, unknown NMI messages were printed on the console
continuously when using perf command with IBS.  It was reported that it
can slow down the kernel. Ratelimit the unknown NMI messages.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Guilherme Amadio <amadio@gentoo.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20231209015211.357983-1-namhyung@kernel.org
2024-01-25 17:29:06 +01:00
Mario Limonciello
b9328fd636 x86/CPU/AMD: Add more models to X86_FEATURE_ZEN5
Add model ranges starting at 0x20, 0x40 and 0x70 to the synthetic
feature flag X86_FEATURE_ZEN5.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240124220749.2983-1-mario.limonciello@amd.com
2024-01-25 12:26:21 +01:00
Babu Moger
fc747eebef x86/resctrl: Remove redundant variable in mbm_config_write_domain()
The kernel test robot reported the following warning after commit

  54e35eb861 ("x86/resctrl: Read supported bandwidth sources from CPUID").

even though the issue is present even in the original commit

  92bd5a1390 ("x86/resctrl: Add interface to write mbm_total_bytes_config")

which added this function. The reported warning is:

  $ make C=1 CHECK=scripts/coccicheck arch/x86/kernel/cpu/resctrl/rdtgroup.o
  ...
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1621:5-8: Unneeded variable: "ret". Return "0" on line 1655

Remove the local variable 'ret'.

  [ bp: Massage commit message, make mbm_config_write_domain() void. ]

Fixes: 92bd5a1390 ("x86/resctrl: Add interface to write mbm_total_bytes_config")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401241810.jbd8Ipa1-lkp@intel.com/
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/202401241810.jbd8Ipa1-lkp@intel.com
2024-01-25 00:41:59 +01:00
Tony Luck
c2427e70c1 x86/resctrl: Implement new mba_MBps throttling heuristic
The mba_MBps feedback loop increases throttling when a group is using
more bandwidth than the target set by the user in the schemata file, and
decreases throttling when below target.

To avoid possibly stepping throttling up and down on every poll a flag
"delta_comp" is set whenever throttling is changed to indicate that the
actual change in bandwidth should be recorded on the next poll in
"delta_bw". Throttling is only reduced if the current bandwidth plus
delta_bw is below the user target.

This algorithm works well if the workload has steady bandwidth needs.
But it can go badly wrong if the workload moves to a different phase
just as the throttling level changed. E.g. if the workload becomes
essentially idle right as throttling level is increased, the value
calculated for delta_bw will be more or less the old bandwidth level.
If the workload then resumes, Linux may never reduce throttling because
current bandwidth plus delta_bw is above the target set by the user.

Implement a simpler heuristic by assuming that in the worst case the
currently measured bandwidth is being controlled by the current level of
throttling. Compute how much it may increase if throttling is relaxed to
the next higher level. If that is still below the user target, then it
is ok to reduce the amount of throttling.

Fixes: ba0f26d852 ("x86/intel_rdt/mba_sc: Prepare for feedback loop")
Reported-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xiaochen Shen <xiaochen.shen@intel.com>
Link: https://lore.kernel.org/r/20240122180807.70518-1-tony.luck@intel.com
2024-01-24 11:32:01 +01:00
Babu Moger
54e35eb861 x86/resctrl: Read supported bandwidth sources from CPUID
If the BMEC (Bandwidth Monitoring Event Configuration) feature is
supported, the bandwidth events can be configured. The maximum supported
bandwidth bitmask can be read from CPUID:

  CPUID_Fn80000020_ECX_x03 [Platform QoS Monitoring Bandwidth Event Configuration]
  Bits    Description
  31:7    Reserved
   6:0    Identifies the bandwidth sources that can be tracked.

While at it, move the mask checking to mon_config_write() before
iterating over all the domains. Also, print the valid bitmask when the
user tries to configure invalid event configuration value.

The CPUID details are documented in the Processor Programming Reference
(PPR) Vol 1.1 for AMD Family 19h Model 11h B1 - 55901 Rev 0.25 in the
Link tag.

Fixes: dc2a3e8579 ("x86/resctrl: Add interface to read mbm_total_bytes_config")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Link: https://lore.kernel.org/r/669896fa512c7451319fa5ca2fdb6f7e015b5635.1705359148.git.babu.moger@amd.com
2024-01-23 16:26:42 +01:00
Babu Moger
0976783bb1 x86/resctrl: Remove hard-coded memory bandwidth limit
The QOS Memory Bandwidth Enforcement Limit is reported by
CPUID_Fn80000020_EAX_x01 and CPUID_Fn80000020_EAX_x02:

  Bits	 Description
  31:0	 BW_LEN: Size of the QOS Memory Bandwidth Enforcement Limit.

Newer processors can support higher bandwidth limit than the current
hard-coded value. Remove latter and detect using CPUID instead. Also,
update the register variables eax and edx to match the AMD CPUID
definition.

The CPUID details are documented in the Processor Programming Reference
(PPR) Vol 1.1 for AMD Family 19h Model 11h B1 - 55901 Rev 0.25 in the
Link tag below.

Fixes: 4d05bf71f1 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Link: https://lore.kernel.org/r/c26a8ca79d399ed076cf8bf2e9fbc58048808289.1705359148.git.babu.moger@amd.com
2024-01-23 16:22:51 +01:00
Borislav Petkov (AMD)
3e4147f33f x86/CPU/AMD: Add X86_FEATURE_ZEN5
Add a synthetic feature flag for Zen5.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240104201138.5072-1-bp@alien8.de
2024-01-23 11:06:18 +01:00
Tony Luck
1b908debf5 x86/resctrl: Fix unused variable warning in cache_alloc_hsw_probe()
In a "W=1" build gcc throws a warning:

  arch/x86/kernel/cpu/resctrl/core.c: In function ‘cache_alloc_hsw_probe’:
  arch/x86/kernel/cpu/resctrl/core.c:139:16: warning: variable ‘h’ set but not used

Switch from wrmsr_safe() to wrmsrl_safe(), and from rdmsr() to rdmsrl()
using a single u64 argument for the MSR value instead of the pair of u32
for the high and low halves.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/ZULCd/TGJL9Dmncf@agluck-desk3
2024-01-22 19:54:54 +01:00
Juergen Gross
f7cfe7017b x86/paravirt: Make BUG_func() usable by non-GPL modules
Several inlined functions subject to paravirt patching are referencing
BUG_func() after the recent switch to the alternative patching
mechanism.

As those functions can legally be used by non-GPL modules, BUG_func()
must be usable by those modules, too. So use EXPORT_SYMBOL() when
exporting BUG_func().

Fixes: 9824b00c2b ("x86/paravirt: Move some functions and defines to alternative.c")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240109082232.22657-1-jgross@suse.com
2024-01-22 15:51:07 +01:00
Linus Torvalds
378de6df19 RTC for 6.8
Subsytem:
 
 New driver:
  - Analog Devices MAX31335
  - Nuvoton ma35d1
  - Texas Instrument TPS6594 PMIC RTC
 
 Drivers:
  - cmos: use ACPI alarm instead of HPET on recent AMD platforms
  - nuvoton: add NCT3015Y-R and NCT3018Y-R support
  - rv8803: proper suspend/resume and wakeup-source support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEBqsFVZXh8s/0O5JiY6TcMGxwOjIFAmWpkqQACgkQY6TcMGxw
 OjJ2eQ//cDaUPgvOUQ2+qQiZOyOrjnck+QEyXZr2mcnNfRnMPiafJX09v3W4BMPh
 0oOVNF5BbGofTgJRnTC9BCRQnq8XZUwXaMr1B+mLQF4V9Xq3896pILkdRmd0q7EW
 3qwKCNP1vkYx2hGyWB9wVAMESAdUIFHCLxJWeQZ3ESGUacMoON0cdFa96TUx4fKa
 m29ybMTHRHKpnZsIYpegxG42lWp84IPvTbtySbT52dr2ucLToVos/dX23juQ40D0
 nyUa8Q+g6aLoTxjZPcFwK6dHJJIwWz56s40IbMRGr6dVfRis5QZfIQ/cB8ULj48L
 AkCtN6kptVsov/W2R9ZriTf5p53K7Fmwz+dhccW7SxA82cGyswWOD3BNzzOYnzPY
 pKSVeTnR7mD2IC28pGrekSpg3ExuNu/4+WHnz0EwpjgyXmtlxnfK6oV9nf8bIUsn
 JsY3tNOvenv8NQSQ1at3GugeQ4bGMbxay6pL9zm5EZjYGMDX0z7IcPFr9KeYC5tJ
 60dYWCGuB2JF0WocuAXSNrj2l9VFFqhn7OdDVuiB00rBROQcKKV/H30EnmAwRUI1
 8SZhVJ0TIeoG6XsajhbapvOH8EWUnKPn1mgJxSiyVR6Hz93oZldIPK3cUtfjcltx
 xl7LdVeseE1CXGh4g222CI0MCX6x1QPQ5L8jhgc67dyTUx/wQ4Q=
 =7u4i
 -----END PGP SIGNATURE-----

Merge tag 'rtc-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "There are three new drivers this cycle. Also the cmos driver is
  getting fixes for longstanding wakeup issues on AMD.

  New drivers:
   - Analog Devices MAX31335
   - Nuvoton ma35d1
   - Texas Instrument TPS6594 PMIC RTC

  Drivers:
   - cmos: use ACPI alarm instead of HPET on recent AMD platforms
   - nuvoton: add NCT3015Y-R and NCT3018Y-R support
   - rv8803: proper suspend/resume and wakeup-source support"

* tag 'rtc-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (26 commits)
  rtc: nuvoton: Compatible with NCT3015Y-R and NCT3018Y-R
  rtc: da9063: Use dev_err_probe()
  rtc: da9063: Use device_get_match_data()
  rtc: da9063: Make IRQ as optional
  rtc: max31335: Fix comparison in max31335_volatile_reg()
  rtc: max31335: use regmap_update_bits_check
  rtc: max31335: remove unecessary locking
  rtc: max31335: add driver support
  dt-bindings: rtc: max31335: add max31335 bindings
  rtc: rv8803: add wakeup-source support
  rtc: ac100: remove misuses of kernel-doc
  rtc: class: Remove usage of the deprecated ida_simple_xx() API
  rtc: MAINTAINERS: drop Alessandro Zummo
  rtc: ma35d1: remove hardcoded UIE support
  dt-bindings: rtc: qcom-pm8xxx: fix inconsistent example
  rtc: rv8803: Add power management support
  rtc: ds3232: avoid unused-const-variable warning
  rtc: lpc24xx: add missing dependency
  rtc: tps6594: Add driver for TPS6594 RTC
  rtc: Add driver for Nuvoton ma35d1 rtc controller
  ...
2024-01-18 17:25:39 -08:00
Linus Torvalds
0dde2bf67b IOMMU Updates for Linux v6.8
Including:
 
 	- Core changes:
 	  - Fix race conditions in device probe path
 	  - Retire IOMMU bus_ops
 	  - Support for passing custom allocators to page table drivers
 	  - Clean up Kconfig around IOMMU_SVA
 	  - Support for sharing SVA domains with all devices bound to
 	    a mm
 	  - Firmware data parsing cleanup
 	  - Tracing improvements for iommu-dma code
 	  - Some smaller fixes and cleanups
 
 	- ARM-SMMU drivers:
 	  - Device-tree binding updates:
 	     - Add additional compatible strings for Qualcomm SoCs
 	     - Document Adreno clocks for Qualcomm's SM8350 SoC
 	  - SMMUv2:
 	    - Implement support for the ->domain_alloc_paging() callback
 	    - Ensure Secure context is restored following suspend of Qualcomm SMMU
 	      implementation
 	  - SMMUv3:
 	    - Disable stalling mode for the "quiet" context descriptor
 	    - Minor refactoring and driver cleanups
 
 	 - Intel VT-d driver:
 	   - Cleanup and refactoring
 
 	 - AMD IOMMU driver:
 	   - Improve IO TLB invalidation logic
 	   - Small cleanups and improvements
 
 	 - Rockchip IOMMU driver:
 	   - DT binding update to add Rockchip RK3588
 
 	 - Apple DART driver:
 	   - Apple M1 USB4/Thunderbolt DART support
 	   - Cleanups
 
 	 - Virtio IOMMU driver:
 	   - Add support for iotlb_sync_map
 	   - Enable deferred IO TLB flushes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmWecQoACgkQK/BELZcB
 GuN5ZxAAzC5QUKAzANx0puk7QhPpKKlbSvj6Q7iRgCLk00KJO1+VQh9v4ouCmXqF
 kn3Ko8gddjhtrgwN0OQ54F39cLUrp1SBemy71K5YOR+vu8VKtwtmawZGeeRZ+k+B
 Eohw58oaXTiR1maYvoLixLYczLrjklqyJOQ1vZ0GxFGxDqrFByAryHDgG/3OCpJx
 C9e6PsLbbfhfqA8Kv97iKcBqniGbXxAMuodqSUG0buQ3oZgfpIP6Bt3EgUzFGPGk
 3BTlYxowS/gkjUWd3fgjQFIFLTA01u9FhpA2Jb0a4v67pUCR64YxHN7rBQ6ZChtG
 kB9laQfU9re79RsHhqQzr0JT9x/eyq7pzGzjp5TV5TPW6IW+sqjMIPhzd9P08Ef7
 BclkCVobx0jSAHOhnnG4QJiKANr2Y2oM3HfsAJccMMY45RRhUKmVqM7jxMPfGn3A
 i+inlee73xTjZXJse1EWG1fmKKMLvX9LDEp4DyOfn9CqVT+7hpZvzPjfbGr937Rm
 JlwXhF3rQXEpOCagEsbt1vOf+V0e9QiCLf1Y2KpkIkDbE5wwSD/2qLm3tFhJG3oF
 fkW+J14Cid0pj+hY0afGe0kOUOIYlimu0nFmSf0pzMH+UktZdKogSfyb1gSDsy+S
 rsZRGPFhMJ832ExqhlDfxqBebqh+jsfKynlskui6Td5C9ZULaHA=
 =q751
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "Core changes:
   - Fix race conditions in device probe path
   - Retire IOMMU bus_ops
   - Support for passing custom allocators to page table drivers
   - Clean up Kconfig around IOMMU_SVA
   - Support for sharing SVA domains with all devices bound to a mm
   - Firmware data parsing cleanup
   - Tracing improvements for iommu-dma code
   - Some smaller fixes and cleanups

  ARM-SMMU drivers:
   - Device-tree binding updates:
      - Add additional compatible strings for Qualcomm SoCs
      - Document Adreno clocks for Qualcomm's SM8350 SoC
   - SMMUv2:
      - Implement support for the ->domain_alloc_paging() callback
      - Ensure Secure context is restored following suspend of Qualcomm
        SMMU implementation
   - SMMUv3:
      - Disable stalling mode for the "quiet" context descriptor
      - Minor refactoring and driver cleanups

  Intel VT-d driver:
   - Cleanup and refactoring

  AMD IOMMU driver:
   - Improve IO TLB invalidation logic
   - Small cleanups and improvements

  Rockchip IOMMU driver:
   - DT binding update to add Rockchip RK3588

  Apple DART driver:
   - Apple M1 USB4/Thunderbolt DART support
   - Cleanups

  Virtio IOMMU driver:
   - Add support for iotlb_sync_map
   - Enable deferred IO TLB flushes"

* tag 'iommu-updates-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (66 commits)
  iommu: Don't reserve 0-length IOVA region
  iommu/vt-d: Move inline helpers to header files
  iommu/vt-d: Remove unused vcmd interfaces
  iommu/vt-d: Remove unused parameter of intel_pasid_setup_pass_through()
  iommu/vt-d: Refactor device_to_iommu() to retrieve iommu directly
  iommu/sva: Fix memory leak in iommu_sva_bind_device()
  dt-bindings: iommu: rockchip: Add Rockchip RK3588
  iommu/dma: Trace bounce buffer usage when mapping buffers
  iommu/arm-smmu: Convert to domain_alloc_paging()
  iommu/arm-smmu: Pass arm_smmu_domain to internal functions
  iommu/arm-smmu: Implement IOMMU_DOMAIN_BLOCKED
  iommu/arm-smmu: Convert to a global static identity domain
  iommu/arm-smmu: Reorganize arm_smmu_domain_add_master()
  iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED
  iommu/arm-smmu-v3: Master cannot be NULL in arm_smmu_write_strtab_ent()
  iommu/arm-smmu-v3: Add a type for the STE
  iommu/arm-smmu-v3: disable stall for quiet_cd
  iommu/qcom: restore IOMMU state if needed
  iommu/arm-smmu-qcom: Add QCM2290 MDSS compatible
  iommu/arm-smmu-qcom: Add missing GMU entry to match table
  ...
2024-01-18 15:16:57 -08:00
Linus Torvalds
b4442cadca - Add support managing TDX host hardware
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmWfCRQACgkQaDWVMHDJ
 krDUqQ//VCvkpf0mAbYDJa1oTXFW8O5cVTusBtPi8k7cFbtjQpjno/9AqKol+sK8
 AKg+y5iHHl7QJmDmEcpS+O9OBbmFOpvDzm3QZhk8RkWS5pe0B108dnINYtS0eP9R
 MkzZwfrI2yC6NX4hvHGdD8WGHjrt+oxY0bojehX87JZsyRU+xqc/g1OO7a5bUPQe
 3Ip0kKiCeqFv0y+Q1pFMEd9RdZ8XxqzUHCJT3hfgZ6FajJ2eVy6jNrPOm6LozycB
 eOtYYNapSgw3k/WhJCOYWHX7kePXibLxBRONLpi6P3U6pMVk4n8wrgl7qPtdW1Qx
 nR2UHX5P6eFkxNCuU1BzvmPBROe37C51MFVw29eRnigvuX3j/vfCH1+17xQOVKVv
 5JyxYA0rJWqoOz6mX7YaNJHlmrxHzeKXudICyOFuu1j5c8CuGjh8NQsOSCq16XfZ
 hPzfYDUS8I7/kHYQPJlnB+kF9pmbyjTM70h74I8D6ZWvXESHJZt+TYPyWfkBXP/P
 L9Pwx1onAyoBApGxCWuvgGTLonzNredgYG4ABbqhUqxqncJS9M7Y/yJa+f+3SOkR
 T6LxoByuDVld5cIfbOzRwIaRezZDe/NL7rkHm/DWo98OaV3zILsr20Hx1lPZ1Vce
 ryZ9lCdZGGxm2jmpzr/VymPQz/E+ezahRHE1+F3su8jpCU41txg=
 =1EJI
 -----END PGP SIGNATURE-----

Merge tag 'x86_tdx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 TDX updates from Dave Hansen:
 "This contains the initial support for host-side TDX support so that
  KVM can run TDX-protected guests. This does not include the actual
  KVM-side support which will come from the KVM folks. The TDX host
  interactions with kexec also needs to be ironed out before this is
  ready for prime time, so this code is currently Kconfig'd off when
  kexec is on.

  The majority of the code here is the kernel telling the TDX module
  which memory to protect and handing some additional memory over to it
  to use to store TDX module metadata. That sounds pretty simple, but
  the TDX architecture is rather flexible and it takes quite a bit of
  back-and-forth to say, "just protect all memory, please."

  There is also some code tacked on near the end of the series to handle
  a hardware erratum. The erratum can make software bugs such as a
  kernel write to TDX-protected memory cause a machine check and
  masquerade as a real hardware failure. The erratum handling watches
  out for these and tries to provide nicer user errors"

* tag 'x86_tdx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/virt/tdx: Make TDX host depend on X86_MCE
  x86/virt/tdx: Disable TDX host support when kexec is enabled
  Documentation/x86: Add documentation for TDX host support
  x86/mce: Differentiate real hardware #MCs from TDX erratum ones
  x86/cpu: Detect TDX partial write machine check erratum
  x86/virt/tdx: Handle TDX interaction with sleep and hibernation
  x86/virt/tdx: Initialize all TDMRs
  x86/virt/tdx: Configure global KeyID on all packages
  x86/virt/tdx: Configure TDX module with the TDMRs and global KeyID
  x86/virt/tdx: Designate reserved areas for all TDMRs
  x86/virt/tdx: Allocate and set up PAMTs for TDMRs
  x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions
  x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions
  x86/virt/tdx: Get module global metadata for module initialization
  x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory
  x86/virt/tdx: Add skeleton to enable TDX on demand
  x86/virt/tdx: Add SEAMCALL error printing for module initialization
  x86/virt/tdx: Handle SEAMCALL no entropy error in common code
  x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC
  x86/virt/tdx: Define TDX supported page sizes as macros
  ...
2024-01-18 13:41:48 -08:00
Linus Torvalds
80955ae955 Driver core changes for 6.8-rc1
Here are the set of driver core and kernfs changes for 6.8-rc1.  Nothing
 major in here this release cycle, just lots of small cleanups and some
 tweaks on kernfs that in the very end, got reverted and will come back
 in a safer way next release cycle.
 
 Included in here are:
   - more driver core 'const' cleanups and fixes
   - fw_devlink=rpm is now the default behavior
   - kernfs tiny changes to remove some string functions
   - cpu handling in the driver core is updated to work better on many
     systems that add topologies and cpus after booting
   - other minor changes and cleanups
 
 All of the cpu handling patches have been acked by the respective
 maintainers and are coming in here in one series.  Everything has been
 in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZaeOrg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymtcwCffzvKKkSY9qAp6+0v2WQNkZm1JWoAoJCPYUwF
 If6wEoPLWvRfKx4gIoq9
 =D96r
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here are the set of driver core and kernfs changes for 6.8-rc1.
  Nothing major in here this release cycle, just lots of small cleanups
  and some tweaks on kernfs that in the very end, got reverted and will
  come back in a safer way next release cycle.

  Included in here are:

   - more driver core 'const' cleanups and fixes

   - fw_devlink=rpm is now the default behavior

   - kernfs tiny changes to remove some string functions

   - cpu handling in the driver core is updated to work better on many
     systems that add topologies and cpus after booting

   - other minor changes and cleanups

  All of the cpu handling patches have been acked by the respective
  maintainers and are coming in here in one series. Everything has been
  in linux-next for a while with no reported issues"

* tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (51 commits)
  Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"
  kernfs: convert kernfs_idr_lock to an irq safe raw spinlock
  class: fix use-after-free in class_register()
  PM: clk: make pm_clk_add_notifier() take a const pointer
  EDAC: constantify the struct bus_type usage
  kernfs: fix reference to renamed function
  driver core: device.h: fix Excess kernel-doc description warning
  driver core: class: fix Excess kernel-doc description warning
  driver core: mark remaining local bus_type variables as const
  driver core: container: make container_subsys const
  driver core: bus: constantify subsys_register() calls
  driver core: bus: make bus_sort_breadthfirst() take a const pointer
  kernfs: d_obtain_alias(NULL) will do the right thing...
  driver core: Better advertise dev_err_probe()
  kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_name_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_walk_ns() from strlcpy() to strscpy()
  initramfs: Expose retained initrd as sysfs file
  fs/kernfs/dir: obey S_ISGID
  kernel/cgroup: use kernfs_create_dir_ns()
  ...
2024-01-18 09:48:40 -08:00
Linus Torvalds
e1aa9df440 pci-v6.8-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmWldYsUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vyxUhAAs2ctoK/sMAfTOO2b1UAD/ig7CGGz
 DlDt38RezFU4uqeY0Ix4heFs3RIt8YGuns76Fejfyevh1I7SOA9lbhFuMLBfO9j0
 LU+KuZeGoXtIe5Kd6hCQIUgVvwISs407yp7JUUzqxFQ2rv7bin64xiDb407ZQGaK
 5v4oRsnQn1KBhgZ2wfQ/S+adAma9IroK9F3C/Bm+IJ+mpNxJcbWPqnf9+5ExoxzU
 MFyu0azan1crqWA/geJBetL4zVoRJx4qNEve0gqwk06vwLeIKyzB2jPO5dmn9pAb
 kfAFCQgtTUGZHvZWyBZMWQcMKEQLSupOLYXU4b2Vf+oR9U0jvevqs3LArBsUceM9
 vQw8Vg9RZiWs9lVeVYSQErYQecMhdiHYCXFuteaNH9tvATN4PumXiT2ZM9OsX6uy
 jrXW7YLawJbGLIDNsAyrn8JESzY/CsRPpCIUq3JzL2VQdInC3mEl18rTEuKTBeZF
 zE/RgwudhWDT58/vceS2LHa5KNd/vAzMTmUHEUwHg1N7TV3qkSgpPaVcvx4KklXv
 1nKT2KcfD5K1Yy/InjxUYdGhRPYa7azl+l7W4hJ+NCGxwL+tUCg3knp80+empTJ0
 mZm6/VSbc245nKjx3ydLlTbQ/xNMQXgHHDKPW6eO4ezZaydJZG2xkK3x6eF1+i0k
 PWHSLjUxrK1AGrg=
 =ri0M
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Reserve ECAM so we don't assign it to PCI BARs; this works around
     bugs where BIOS included ECAM in a PNP0A03 host bridge window,
     didn't reserve it via a PNP0C02 motherboard device, and didn't
     allocate space for SR-IOV VF BARs (Bjorn Helgaas)

   - Add MMCONFIG/ECAM debug logging (Bjorn Helgaas)

   - Rename 'MMCONFIG' to 'ECAM' to match spec usage (Bjorn Helgaas)

   - Log device type (Root Port, Switch Port, etc) during enumeration
     (Bjorn Helgaas)

   - Log bridges before downstream devices so the dmesg order is more
     logical (Bjorn Helgaas)

   - Log resource names (BAR 0, VF BAR 0, bridge window, etc)
     consistently instead of a mix of names and "reg 0x10" (Puranjay
     Mohan, Bjorn Helgaas)

   - Fix 64GT/s effective data rate calculation to use 1b/1b encoding
     rather than the 8b/10b or 128b/130b used by lower rates (Ilpo
     Järvinen)

   - Use PCI_HEADER_TYPE_* instead of literals in x86, powerpc, SCSI
     lpfc (Ilpo Järvinen)

   - Clean up open-coded PCIBIOS return code mangling (Ilpo Järvinen)

  Resource management:

   - Restructure pci_dev_for_each_resource() to avoid computing the
     address of an out-of-bounds array element (the bounds check was
     performed later so the element was never actually *read*, but it's
     nicer to avoid even computing an out-of-bounds address) (Andy
     Shevchenko)

  Driver binding:

   - Convert pci-host-common.c platform .remove() callback to
     .remove_new() returning 'void' since it's not useful to return
     error codes here (Uwe Kleine-König)

   - Convert exynos, keystone, kirin from .remove() to .remove_new(),
     which returns void instead of int (Uwe Kleine-König)

   - Drop unused struct pci_driver.node member (Mathias Krause)

  Virtualization:

   - Add ACS quirk for more Zhaoxin Root Ports (LeoLiuoc)

  Error handling:

   - Log AER errors as "Correctable" (not "Corrected") or
     "Uncorrectable" to match spec terminology (Bjorn Helgaas)

   - Decode Requester ID when no error info found instead of printing
     the raw hex value (Bjorn Helgaas)

  Endpoint framework:

   - Use a unique test pattern for each BAR in the pci_endpoint_test to
     make it easier to debug address translation issues (Niklas Cassel)

  Broadcom STB PCIe controller driver:

   - Add DT property "brcm,clkreq-mode" and driver support for different
     CLKREQ# modes to make ASPM L1.x states possible (Jim Quinlan)

  Freescale Layerscape PCIe controller driver:

   - Add suspend/resume support for Layerscape LS1043a and LS1021a,
     including software-managed PME_Turn_Off and transitions between L0,
     L2/L3_Ready Link states (Frank Li)

  MediaTek PCIe controller driver:

   - Clear MSI interrupt status before handler to avoid missing MSIs
     that occur after the handler (qizhong cheng)

  MediaTek PCIe Gen3 controller driver:

   - Update mediatek-gen3 translation window setup to handle MMIO space
     that is not a power of two in size (Jianjun Wang)

  Qualcomm PCIe controller driver:

   - Increase qcom iommu-map maxItems to accommodate SDX55 (five
     entries) and SDM845 (sixteen entries) (Krzysztof Kozlowski)

   - Describe qcom,pcie-sc8180x clocks and resets accurately (Krzysztof
     Kozlowski)

   - Describe qcom,pcie-sm8150 clocks and resets accurately (Krzysztof
     Kozlowski)

   - Correct the qcom "reset-name" property, previously incorrectly
     called "reset-names" (Krzysztof Kozlowski)

   - Document qcom,pcie-sm8650, based on qcom,pcie-sm8550 (Neil
     Armstrong)

  Renesas R-Car PCIe controller driver:

   - Replace of_device.h with explicit of.h include to untangle header
     usage (Rob Herring)

   - Add DT and driver support for optional miniPCIe 1.5v and 3.3v
     regulators on KingFisher (Wolfram Sang)

  SiFive FU740 PCIe controller driver:

   - Convert fu740 CONFIG_PCIE_FU740 dependency from SOC_SIFIVE to
     ARCH_SIFIVE (Conor Dooley)

  Synopsys DesignWare PCIe controller driver:

   - Align iATU mapping for endpoint MSI-X (Niklas Cassel)

   - Drop "host_" prefix from struct dw_pcie_host_ops members (Yoshihiro
     Shimoda)

   - Drop "ep_" prefix from struct dw_pcie_ep_ops members (Yoshihiro
     Shimoda)

   - Rename struct dw_pcie_ep_ops.func_conf_select() to
     .get_dbi_offset() to be more descriptive (Yoshihiro Shimoda)

   - Add Endpoint DBI accessors to encapsulate offset lookups (Yoshihiro
     Shimoda)

  TI J721E PCIe driver:

   - Add j721e DT and driver support for 'num-lanes' for devices that
     support x1, x2, or x4 Links (Matt Ranostay)

   - Add j721e DT compatible strings and driver support for j784s4 (Matt
     Ranostay)

   - Make TI J721E Kconfig depend on ARCH_K3 since the hardware is
     specific to those TI SoC parts (Peter Robinson)

  TI Keystone PCIe controller driver:

   - Hold power management references to all PHYs while enabling them to
     avoid a race when one provides clocks to others (Siddharth
     Vadapalli)

  Xilinx XDMA PCIe controller driver:

   - Remove redundant dev_err(), since platform_get_irq() and
     platform_get_irq_byname() already log errors (Yang Li)

   - Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq()
     (Krzysztof Wilczyński)

   - Fix xilinx_pl_dma_pcie_init_irq_domain() error return when
     irq_domain_add_linear() fails (Harshit Mogalapalli)

  MicroSemi Switchtec management driver:

   - Do dma_mrpc cleanup during switchtec_pci_remove() to match its devm
     ioremapping in switchtec_pci_probe(). Previously the cleanup was
     done in stdev_release(), which used stale pointers if stdev->cdev
     happened to be open when the PCI device was removed (Daniel
     Stodden)

  Miscellaneous:

   - Convert interrupt terminology from "legacy" to "INTx" to be more
     specific and match spec terminology (Damien Le Moal)

   - In dw-xdata-pcie, pci_endpoint_test, and vmd, replace usage of
     deprecated ida_simple_*() API with ida_alloc() and ida_free()
     (Christophe JAILLET)"

* tag 'pci-v6.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits)
  PCI: Fix kernel-doc issues
  PCI: brcmstb: Configure HW CLKREQ# mode appropriate for downstream device
  dt-bindings: PCI: brcmstb: Add property "brcm,clkreq-mode"
  PCI: mediatek-gen3: Fix translation window size calculation
  PCI: mediatek: Clear interrupt status before dispatching handler
  PCI: keystone: Fix race condition when initializing PHYs
  PCI: xilinx-xdma: Fix error code in xilinx_pl_dma_pcie_init_irq_domain()
  PCI: xilinx-xdma: Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq()
  PCI: rcar-gen4: Fix -Wvoid-pointer-to-enum-cast error
  PCI: iproc: Fix -Wvoid-pointer-to-enum-cast warning
  PCI: dwc: Add dw_pcie_ep_{read,write}_dbi[2] helpers
  PCI: dwc: Rename .func_conf_select to .get_dbi_offset in struct dw_pcie_ep_ops
  PCI: dwc: Rename .ep_init to .init in struct dw_pcie_ep_ops
  PCI: dwc: Drop host prefix from struct dw_pcie_host_ops members
  misc: pci_endpoint_test: Use a unique test pattern for each BAR
  PCI: j721e: Make TI J721E depend on ARCH_K3
  PCI: j721e: Add TI J784S4 PCIe configuration
  PCI/AER: Use explicit register sizes for struct members
  PCI/AER: Decode Requester ID when no error info found
  PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
  ...
2024-01-17 16:23:17 -08:00
Linus Torvalds
09d1c6a80f Generic:
- Use memdup_array_user() to harden against overflow.
 
 - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
 
 - Clean up Kconfigs that all KVM architectures were selecting
 
 - New functionality around "guest_memfd", a new userspace API that
   creates an anonymous file and returns a file descriptor that refers
   to it.  guest_memfd files are bound to their owning virtual machine,
   cannot be mapped, read, or written by userspace, and cannot be resized.
   guest_memfd files do however support PUNCH_HOLE, which can be used to
   switch a memory area between guest_memfd and regular anonymous memory.
 
 - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
   per-page attributes for a given page of guest memory; right now the
   only attribute is whether the guest expects to access memory via
   guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
   TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees
   confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM).
 
 x86:
 
 - Support for "software-protected VMs" that can use the new guest_memfd
   and page attributes infrastructure.  This is mostly useful for testing,
   since there is no pKVM-like infrastructure to provide a meaningfully
   reduced TCB.
 
 - Fix a relatively benign off-by-one error when splitting huge pages during
   CLEAR_DIRTY_LOG.
 
 - Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
   TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
 
 - Use more generic lockdep assertions in paths that don't actually care
   about whether the caller is a reader or a writer.
 
 - let Xen guests opt out of having PV clock reported as "based on a stable TSC",
   because some of them don't expect the "TSC stable" bit (added to the pvclock
   ABI by KVM, but never set by Xen) to be set.
 
 - Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL.
 
 - Advertise flush-by-ASID support for nSVM unconditionally, as KVM always
   flushes on nested transitions, i.e. always satisfies flush requests.  This
   allows running bleeding edge versions of VMware Workstation on top of KVM.
 
 - Sanity check that the CPU supports flush-by-ASID when enabling SEV support.
 
 - On AMD machines with vNMI, always rely on hardware instead of intercepting
   IRET in some cases to detect unmasking of NMIs
 
 - Support for virtualizing Linear Address Masking (LAM)
 
 - Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state
   prior to refreshing the vPMU model.
 
 - Fix a double-overflow PMU bug by tracking emulated counter events using a
   dedicated field instead of snapshotting the "previous" counter.  If the
   hardware PMC count triggers overflow that is recognized in the same VM-Exit
   that KVM manually bumps an event count, KVM would pend PMIs for both the
   hardware-triggered overflow and for KVM-triggered overflow.
 
 - Turn off KVM_WERROR by default for all configs so that it's not
   inadvertantly enabled by non-KVM developers, which can be problematic for
   subsystems that require no regressions for W=1 builds.
 
 - Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL
   "features".
 
 - Don't force a masterclock update when a vCPU synchronizes to the current TSC
   generation, as updating the masterclock can cause kvmclock's time to "jump"
   unexpectedly, e.g. when userspace hotplugs a pre-created vCPU.
 
 - Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths,
   partly as a super minor optimization, but mostly to make KVM play nice with
   position independent executable builds.
 
 - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
   CONFIG_HYPERV as a minor optimization, and to self-document the code.
 
 - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation"
   at build time.
 
 ARM64:
 
 - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
   base granule sizes. Branch shared with the arm64 tree.
 
 - Large Fine-Grained Trap rework, bringing some sanity to the
   feature, although there is more to come. This comes with
   a prefix branch shared with the arm64 tree.
 
 - Some additional Nested Virtualization groundwork, mostly
   introducing the NV2 VNCR support and retargetting the NV
   support to that version of the architecture.
 
 - A small set of vgic fixes and associated cleanups.
 
 Loongarch:
 
 - Optimization for memslot hugepage checking
 
 - Cleanup and fix some HW/SW timer issues
 
 - Add LSX/LASX (128bit/256bit SIMD) support
 
 RISC-V:
 
 - KVM_GET_REG_LIST improvement for vector registers
 
 - Generate ISA extension reg_list using macros in get-reg-list selftest
 
 - Support for reporting steal time along with selftest
 
 s390:
 
 - Bugfixes
 
 Selftests:
 
 - Fix an annoying goof where the NX hugepage test prints out garbage
   instead of the magic token needed to run the test.
 
 - Fix build errors when a header is delete/moved due to a missing flag
   in the Makefile.
 
 - Detect if KVM bugged/killed a selftest's VM and print out a helpful
   message instead of complaining that a random ioctl() failed.
 
 - Annotate the guest printf/assert helpers with __printf(), and fix the
   various bugs that were lurking due to lack of said annotation.
 
 There are two non-KVM patches buried in the middle of guest_memfd support:
 
   fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
   mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
 
 The first is small and mostly suggested-by Christian Brauner; the second
 a bit less so but it was written by an mm person (Vlastimil Babka).
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmWcMWkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroO15gf/WLmmg3SET6Uzw9iEq2xo28831ZA+
 6kpILfIDGKozV5safDmMvcInlc/PTnqOFrsKyyN4kDZ+rIJiafJdg/loE0kPXBML
 wdR+2ix5kYI1FucCDaGTahskBDz8Lb/xTpwGg9BFLYFNmuUeHc74o6GoNvr1uliE
 4kLZL2K6w0cSMPybUD+HqGaET80ZqPwecv+s1JL+Ia0kYZJONJifoHnvOUJ7DpEi
 rgudVdgzt3EPjG0y1z6MjvDBXTCOLDjXajErlYuZD3Ej8N8s59Dh2TxOiDNTLdP4
 a4zjRvDmgyr6H6sz+upvwc7f4M4p+DBvf+TkWF54mbeObHUYliStqURIoA==
 =66Ws
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "Generic:

   - Use memdup_array_user() to harden against overflow.

   - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all
     architectures.

   - Clean up Kconfigs that all KVM architectures were selecting

   - New functionality around "guest_memfd", a new userspace API that
     creates an anonymous file and returns a file descriptor that refers
     to it. guest_memfd files are bound to their owning virtual machine,
     cannot be mapped, read, or written by userspace, and cannot be
     resized. guest_memfd files do however support PUNCH_HOLE, which can
     be used to switch a memory area between guest_memfd and regular
     anonymous memory.

   - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
     per-page attributes for a given page of guest memory; right now the
     only attribute is whether the guest expects to access memory via
     guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
     TDX or ARM64 pKVM is checked by firmware or hypervisor that
     guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in
     the case of pKVM).

  x86:

   - Support for "software-protected VMs" that can use the new
     guest_memfd and page attributes infrastructure. This is mostly
     useful for testing, since there is no pKVM-like infrastructure to
     provide a meaningfully reduced TCB.

   - Fix a relatively benign off-by-one error when splitting huge pages
     during CLEAR_DIRTY_LOG.

   - Fix a bug where KVM could incorrectly test-and-clear dirty bits in
     non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with
     a non-huge SPTE.

   - Use more generic lockdep assertions in paths that don't actually
     care about whether the caller is a reader or a writer.

   - let Xen guests opt out of having PV clock reported as "based on a
     stable TSC", because some of them don't expect the "TSC stable" bit
     (added to the pvclock ABI by KVM, but never set by Xen) to be set.

   - Revert a bogus, made-up nested SVM consistency check for
     TLB_CONTROL.

   - Advertise flush-by-ASID support for nSVM unconditionally, as KVM
     always flushes on nested transitions, i.e. always satisfies flush
     requests. This allows running bleeding edge versions of VMware
     Workstation on top of KVM.

   - Sanity check that the CPU supports flush-by-ASID when enabling SEV
     support.

   - On AMD machines with vNMI, always rely on hardware instead of
     intercepting IRET in some cases to detect unmasking of NMIs

   - Support for virtualizing Linear Address Masking (LAM)

   - Fix a variety of vPMU bugs where KVM fail to stop/reset counters
     and other state prior to refreshing the vPMU model.

   - Fix a double-overflow PMU bug by tracking emulated counter events
     using a dedicated field instead of snapshotting the "previous"
     counter. If the hardware PMC count triggers overflow that is
     recognized in the same VM-Exit that KVM manually bumps an event
     count, KVM would pend PMIs for both the hardware-triggered overflow
     and for KVM-triggered overflow.

   - Turn off KVM_WERROR by default for all configs so that it's not
     inadvertantly enabled by non-KVM developers, which can be
     problematic for subsystems that require no regressions for W=1
     builds.

   - Advertise all of the host-supported CPUID bits that enumerate
     IA32_SPEC_CTRL "features".

   - Don't force a masterclock update when a vCPU synchronizes to the
     current TSC generation, as updating the masterclock can cause
     kvmclock's time to "jump" unexpectedly, e.g. when userspace
     hotplugs a pre-created vCPU.

   - Use RIP-relative address to read kvm_rebooting in the VM-Enter
     fault paths, partly as a super minor optimization, but mostly to
     make KVM play nice with position independent executable builds.

   - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
     CONFIG_HYPERV as a minor optimization, and to self-document the
     code.

   - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV
     "emulation" at build time.

  ARM64:

   - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base
     granule sizes. Branch shared with the arm64 tree.

   - Large Fine-Grained Trap rework, bringing some sanity to the
     feature, although there is more to come. This comes with a prefix
     branch shared with the arm64 tree.

   - Some additional Nested Virtualization groundwork, mostly
     introducing the NV2 VNCR support and retargetting the NV support to
     that version of the architecture.

   - A small set of vgic fixes and associated cleanups.

  Loongarch:

   - Optimization for memslot hugepage checking

   - Cleanup and fix some HW/SW timer issues

   - Add LSX/LASX (128bit/256bit SIMD) support

  RISC-V:

   - KVM_GET_REG_LIST improvement for vector registers

   - Generate ISA extension reg_list using macros in get-reg-list
     selftest

   - Support for reporting steal time along with selftest

  s390:

   - Bugfixes

  Selftests:

   - Fix an annoying goof where the NX hugepage test prints out garbage
     instead of the magic token needed to run the test.

   - Fix build errors when a header is delete/moved due to a missing
     flag in the Makefile.

   - Detect if KVM bugged/killed a selftest's VM and print out a helpful
     message instead of complaining that a random ioctl() failed.

   - Annotate the guest printf/assert helpers with __printf(), and fix
     the various bugs that were lurking due to lack of said annotation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits)
  x86/kvm: Do not try to disable kvmclock if it was not enabled
  KVM: x86: add missing "depends on KVM"
  KVM: fix direction of dependency on MMU notifiers
  KVM: introduce CONFIG_KVM_COMMON
  KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
  KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
  RISC-V: KVM: selftests: Add get-reg-list test for STA registers
  RISC-V: KVM: selftests: Add steal_time test support
  RISC-V: KVM: selftests: Add guest_sbi_probe_extension
  RISC-V: KVM: selftests: Move sbi_ecall to processor.c
  RISC-V: KVM: Implement SBI STA extension
  RISC-V: KVM: Add support for SBI STA registers
  RISC-V: KVM: Add support for SBI extension registers
  RISC-V: KVM: Add SBI STA info to vcpu_arch
  RISC-V: KVM: Add steal-update vcpu request
  RISC-V: KVM: Add SBI STA extension skeleton
  RISC-V: paravirt: Implement steal-time support
  RISC-V: Add SBI STA extension definitions
  RISC-V: paravirt: Add skeleton for pv-time support
  RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
  ...
2024-01-17 13:03:37 -08:00
Linus Torvalds
3e7aeb78ab Networking changes for 6.8.
Core & protocols
 ----------------
 
  - Analyze and reorganize core networking structs (socks, netdev,
    netns, mibs) to optimize cacheline consumption and set up
    build time warnings to safeguard against future header changes.
    This improves TCP performances with many concurrent connections
    up to 40%.
 
  - Add page-pool netlink-based introspection, exposing the
    memory usage and recycling stats. This helps indentify
    bad PP users and possible leaks.
 
  - Refine TCP/DCCP source port selection to no longer favor even
    source port at connect() time when IP_LOCAL_PORT_RANGE is set.
    This lowers the time taken by connect() for hosts having
    many active connections to the same destination.
 
  - Refactor the TCP bind conflict code, shrinking related socket
    structs.
 
  - Refactor TCP SYN-Cookie handling, as a preparation step to
    allow arbitrary SYN-Cookie processing via eBPF.
 
  - Tune optmem_max for 0-copy usage, increasing the default value
    to 128KB and namespecifying it.
 
  - Allow coalescing for cloned skbs coming from page pools, improving
    RX performances with some common configurations.
 
  - Reduce extension header parsing overhead at GRO time.
 
  - Add bridge MDB bulk deletion support, allowing user-space to
    request the deletion of matching entries.
 
  - Reorder nftables struct members, to keep data accessed by the
    datapath first.
 
  - Introduce TC block ports tracking and use. This allows supporting
    multicast-like behavior at the TC layer.
 
  - Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and
    classifiers (RSVP and tcindex).
 
  - More data-race annotations.
 
  - Extend the diag interface to dump TCP bound-only sockets.
 
  - Conditional notification of events for TC qdisc class and actions.
 
  - Support for WPAN dynamic associations with nearby devices, to form
    a sub-network using a specific PAN ID.
 
  - Implement SMCv2.1 virtual ISM device support.
 
  - Add support for Batman-avd mulicast packet type.
 
 BPF
 ---
 
  - Tons of verifier improvements:
    - BPF register bounds logic and range support along with a large
      test suite
    - log improvements
    - complete precision tracking support for register spills
    - track aligned STACK_ZERO cases as imprecise spilled registers. It
      improves the verifier "instructions processed" metric from single
      digit to 50-60% for some programs
    - support for user's global BPF subprogram arguments with few
      commonly requested annotations for a better developer experience
    - support tracking of BPF_JNE which helps cases when the compiler
      transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the
      like
    - several fixes
 
  - Add initial TX metadata implementation for AF_XDP with support in
    mlx5 and stmmac drivers. Two types of offloads are supported right
    now, that is, TX timestamp and TX checksum offload.
 
  - Fix kCFI bugs in BPF all forms of indirect calls from BPF into
    kernel and from kernel into BPF work with CFI enabled. This allows
    BPF to work with CONFIG_FINEIBT=y.
 
  - Change BPF verifier logic to validate global subprograms lazily
    instead of unconditionally before the main program, so they can be
    guarded using BPF CO-RE techniques.
 
  - Support uid/gid options when mounting bpffs.
 
  - Add a new kfunc which acquires the associated cgroup of a task
    within a specific cgroup v1 hierarchy where the latter is identified
    by its id.
 
  - Extend verifier to allow bpf_refcount_acquire() of a map value field
    obtained via direct load which is a use-case needed in sched_ext.
 
  - Add BPF link_info support for uprobe multi link along with bpftool
    integration for the latter.
 
  - Support for VLAN tag in XDP hints.
 
  - Remove deprecated bpfilter kernel leftovers given the project
    is developed in user-space (https://github.com/facebook/bpfilter).
 
 Misc
 ----
 
  - Support for parellel TC self-tests execution.
 
  - Increase MPTCP self-tests coverage.
 
  - Updated the bridge documentation, including several so-far
    undocumented features.
 
  - Convert all the net self-tests to run in unique netns, to
    avoid random failures due to conflict and allow concurrent
    runs.
 
  - Add TCP-AO self-tests.
 
  - Add kunit tests for both cfg80211 and mac80211.
 
  - Autogenerate Netlink families documentation from YAML spec.
 
  - Add yml-gen support for fixed headers and recursive nests, the
    tool can now generate user-space code for all genetlink families
    for which we have specs.
 
  - A bunch of additional module descriptions fixes.
 
  - Catch incorrect freeing of pages belonging to a page pool.
 
 Driver API
 ----------
 
  - Rust abstractions for network PHY drivers; do not cover yet the
    full C API, but already allow implementing functional PHY drivers
    in rust.
 
  - Introduce queue and NAPI support in the netdev Netlink interface,
    allowing complete access to the device <> NAPIs <> queues
    relationship.
 
  - Introduce notifications filtering for devlink to allow control
    application scale to thousands of instances.
 
  - Improve PHY validation, requesting rate matching information for
    each ethtool link mode supported by both the PHY and host.
 
  - Add support for ethtool symmetric-xor RSS hash.
 
  - ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD
    platform.
 
  - Expose pin fractional frequency offset value over new DPLL generic
    netlink attribute.
 
  - Convert older drivers to platform remove callback returning void.
 
  - Add support for PHY package MMD read/write.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Octeon CN10K devices
    - Broadcom 5760X P7
    - Qualcomm SM8550 SoC
    - Texas Instrument DP83TG720S PHY
 
  - Bluetooth:
    - IMC Networks Bluetooth radio
 
 Removed
 -------
 
  - WiFi:
    - libertas 16-bit PCMCIA support
    - Atmel at76c50x drivers
    - HostAP ISA/PCMCIA style 802.11b driver
    - zd1201 802.11b USB dongles
    - Orinoco ISA/PCMCIA 802.11b driver
    - Aviator/Raytheon driver
    - Planet WL3501 driver
    - RNDIS USB 802.11b driver
 
 Drivers
 -------
 
  - Ethernet high-speed NICs:
    - Intel (100G, ice, idpf):
      - allow one by one port representors creation and removal
      - add temperature and clock information reporting
      - add get/set for ethtool's header split ringparam
      - add again FW logging
      - adds support switchdev hardware packet mirroring
      - iavf: implement symmetric-xor RSS hash
      - igc: add support for concurrent physical and free-running timers
      - i40e: increase the allowable descriptors
    - nVidia/Mellanox:
      - Preparation for Socket-Direct multi-dev netdev. That will allow
        in future releases combining multiple PFs devices attached to
        different NUMA nodes under the same netdev
    - Broadcom (bnxt):
      - TX completion handling improvements
      - add basic ntuple filter support
      - reduce MSIX vectors usage for MQPRIO offload
      - add VXLAN support, USO offload and TX coalesce completion for P7
    - Marvell Octeon EP:
      - xmit-more support
      - add PF-VF mailbox support and use it for FW notifications for VFs
    - Wangxun (ngbe/txgbe):
      - implement ethtool functions to operate pause param, ring param,
        coalesce channel number and msglevel
    - Netronome/Corigine (nfp):
      - add flow-steering support
      - support UDP segmentation offload
 
  - Ethernet NICs embedded, slower, virtual:
    - Xilinx AXI: remove duplicate DMA code adopting the dma engine driver
    - stmmac: add support for HW-accelerated VLAN stripping
    - TI AM654x sw: add mqprio, frame preemption & coalescing
    - gve: add support for non-4k page sizes.
    - virtio-net: support dynamic coalescing moderation
 
  - nVidia/Mellanox Ethernet datacenter switches:
    - allow firmware upgrade without a reboot
    - more flexible support for bridge flooding via the compressed
      FID flooding mode
 
  - Ethernet embedded switches:
    - Microchip:
      - fine-tune flow control and speed configurations in KSZ8xxx
      - KSZ88X3: enable setting rmii reference
    - Renesas:
      - add jumbo frames support
    - Marvell:
      - 88E6xxx: add "eth-mac" and "rmon" stats support
 
  - Ethernet PHYs:
    - aquantia: add firmware load support
    - at803x: refactor the driver to simplify adding support for more
      chip variants
    - NXP C45 TJA11xx: Add MACsec offload support
 
  - Wifi:
    - MediaTek (mt76):
      - NVMEM EEPROM improvements
      - mt7996 Extremely High Throughput (EHT) improvements
      - mt7996 Wireless Ethernet Dispatcher (WED) support
      - mt7996 36-bit DMA support
    - Qualcomm (ath12k):
      - support for a single MSI vector
      - WCN7850: support AP mode
    - Intel (iwlwifi):
      - new debugfs file fw_dbg_clear
      - allow concurrent P2P operation on DFS channels
 
  - Bluetooth:
    - QCA2066: support HFP offload
    - ISO: more broadcast-related improvements
    - NXP: better recovery in case receiver/transmitter get out of sync
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmWdamsSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkGC4P/2xjLzdw22ckSssuE9ORbGko9SNjnqHk
 PQh1E+26BHiCg5KB8VvzMsL78E79MRNXEattSW+1g7dhCvln3oi+Vd0WkdRkgt35
 98Iv18zLbbwFAJeyKvmLAPAkQkMLtVj19QILBBRrugF+egEZgVSE3JBcTAiKv2ZQ
 HzkabA171Ri6LpCcEEtY5XuaKvimGnGzF8YMFf8rX0wtqd2p5kbY9aMe47WAGxvU
 Vf9548XvH+A5yVH2/4/gujtUOpA/RHuhuCMb+oo0cZ+VCC1x9MGzoXzj6r87OTkf
 k2W1whNzcGoin92f+9Lk1JYMuiGKBH4QVaDdNXJnYFSJWPTE7RvRsPzYTSD4/GzK
 yEZbzSJXpy/2vDQm16NoAxl7evRs8Sorzkw4LQRviZHI/5SAkK2ZQiCK5CO8QSYy
 C1LELcV5kn6Foe24xWnrWLjAGug9oJnYoGPMU5gvPmFJMvUMXqm5rmbBgUWL5Rxw
 q1M6gVzabCyWUy6z2G2vaqW2ZntNVvCkdsLtIX0XZkcTzNoP0MA+TuhyGz4wbiuo
 PeyQp/mbGnDgCYggqKIA0YWrTVxkhFrKN520cbO8qXBQytV9oFbM/0/+C0/r/5WX
 pL1JVzLrh6l5ME7EIQfha8UOF9j8q4ueSwb40P3AR2NaZiDABM0zfUZ6+sx+91WF
 ucqPEcZB5cRE
 =1bW6
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "The most interesting thing is probably the networking structs
  reorganization and a significant amount of changes is around
  self-tests.

  Core & protocols:

   - Analyze and reorganize core networking structs (socks, netdev,
     netns, mibs) to optimize cacheline consumption and set up build
     time warnings to safeguard against future header changes

     This improves TCP performances with many concurrent connections up
     to 40%

   - Add page-pool netlink-based introspection, exposing the memory
     usage and recycling stats. This helps indentify bad PP users and
     possible leaks

   - Refine TCP/DCCP source port selection to no longer favor even
     source port at connect() time when IP_LOCAL_PORT_RANGE is set. This
     lowers the time taken by connect() for hosts having many active
     connections to the same destination

   - Refactor the TCP bind conflict code, shrinking related socket
     structs

   - Refactor TCP SYN-Cookie handling, as a preparation step to allow
     arbitrary SYN-Cookie processing via eBPF

   - Tune optmem_max for 0-copy usage, increasing the default value to
     128KB and namespecifying it

   - Allow coalescing for cloned skbs coming from page pools, improving
     RX performances with some common configurations

   - Reduce extension header parsing overhead at GRO time

   - Add bridge MDB bulk deletion support, allowing user-space to
     request the deletion of matching entries

   - Reorder nftables struct members, to keep data accessed by the
     datapath first

   - Introduce TC block ports tracking and use. This allows supporting
     multicast-like behavior at the TC layer

   - Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and
     classifiers (RSVP and tcindex)

   - More data-race annotations

   - Extend the diag interface to dump TCP bound-only sockets

   - Conditional notification of events for TC qdisc class and actions

   - Support for WPAN dynamic associations with nearby devices, to form
     a sub-network using a specific PAN ID

   - Implement SMCv2.1 virtual ISM device support

   - Add support for Batman-avd mulicast packet type

  BPF:

   - Tons of verifier improvements:
       - BPF register bounds logic and range support along with a large
         test suite
       - log improvements
       - complete precision tracking support for register spills
       - track aligned STACK_ZERO cases as imprecise spilled registers.
         This improves the verifier "instructions processed" metric from
         single digit to 50-60% for some programs
       - support for user's global BPF subprogram arguments with few
         commonly requested annotations for a better developer
         experience
       - support tracking of BPF_JNE which helps cases when the compiler
         transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the
         like
       - several fixes

   - Add initial TX metadata implementation for AF_XDP with support in
     mlx5 and stmmac drivers. Two types of offloads are supported right
     now, that is, TX timestamp and TX checksum offload

   - Fix kCFI bugs in BPF all forms of indirect calls from BPF into
     kernel and from kernel into BPF work with CFI enabled. This allows
     BPF to work with CONFIG_FINEIBT=y

   - Change BPF verifier logic to validate global subprograms lazily
     instead of unconditionally before the main program, so they can be
     guarded using BPF CO-RE techniques

   - Support uid/gid options when mounting bpffs

   - Add a new kfunc which acquires the associated cgroup of a task
     within a specific cgroup v1 hierarchy where the latter is
     identified by its id

   - Extend verifier to allow bpf_refcount_acquire() of a map value
     field obtained via direct load which is a use-case needed in
     sched_ext

   - Add BPF link_info support for uprobe multi link along with bpftool
     integration for the latter

   - Support for VLAN tag in XDP hints

   - Remove deprecated bpfilter kernel leftovers given the project is
     developed in user-space (https://github.com/facebook/bpfilter)

  Misc:

   - Support for parellel TC self-tests execution

   - Increase MPTCP self-tests coverage

   - Updated the bridge documentation, including several so-far
     undocumented features

   - Convert all the net self-tests to run in unique netns, to avoid
     random failures due to conflict and allow concurrent runs

   - Add TCP-AO self-tests

   - Add kunit tests for both cfg80211 and mac80211

   - Autogenerate Netlink families documentation from YAML spec

   - Add yml-gen support for fixed headers and recursive nests, the tool
     can now generate user-space code for all genetlink families for
     which we have specs

   - A bunch of additional module descriptions fixes

   - Catch incorrect freeing of pages belonging to a page pool

  Driver API:

   - Rust abstractions for network PHY drivers; do not cover yet the
     full C API, but already allow implementing functional PHY drivers
     in rust

   - Introduce queue and NAPI support in the netdev Netlink interface,
     allowing complete access to the device <> NAPIs <> queues
     relationship

   - Introduce notifications filtering for devlink to allow control
     application scale to thousands of instances

   - Improve PHY validation, requesting rate matching information for
     each ethtool link mode supported by both the PHY and host

   - Add support for ethtool symmetric-xor RSS hash

   - ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD
     platform

   - Expose pin fractional frequency offset value over new DPLL generic
     netlink attribute

   - Convert older drivers to platform remove callback returning void

   - Add support for PHY package MMD read/write

  New hardware / drivers:

   - Ethernet:
       - Octeon CN10K devices
       - Broadcom 5760X P7
       - Qualcomm SM8550 SoC
       - Texas Instrument DP83TG720S PHY

   - Bluetooth:
       - IMC Networks Bluetooth radio

  Removed:

   - WiFi:
       - libertas 16-bit PCMCIA support
       - Atmel at76c50x drivers
       - HostAP ISA/PCMCIA style 802.11b driver
       - zd1201 802.11b USB dongles
       - Orinoco ISA/PCMCIA 802.11b driver
       - Aviator/Raytheon driver
       - Planet WL3501 driver
       - RNDIS USB 802.11b driver

  Driver updates:

   - Ethernet high-speed NICs:
       - Intel (100G, ice, idpf):
          - allow one by one port representors creation and removal
          - add temperature and clock information reporting
          - add get/set for ethtool's header split ringparam
          - add again FW logging
          - adds support switchdev hardware packet mirroring
          - iavf: implement symmetric-xor RSS hash
          - igc: add support for concurrent physical and free-running
            timers
          - i40e: increase the allowable descriptors
       - nVidia/Mellanox:
          - Preparation for Socket-Direct multi-dev netdev. That will
            allow in future releases combining multiple PFs devices
            attached to different NUMA nodes under the same netdev
       - Broadcom (bnxt):
          - TX completion handling improvements
          - add basic ntuple filter support
          - reduce MSIX vectors usage for MQPRIO offload
          - add VXLAN support, USO offload and TX coalesce completion
            for P7
       - Marvell Octeon EP:
          - xmit-more support
          - add PF-VF mailbox support and use it for FW notifications
            for VFs
       - Wangxun (ngbe/txgbe):
          - implement ethtool functions to operate pause param, ring
            param, coalesce channel number and msglevel
       - Netronome/Corigine (nfp):
          - add flow-steering support
          - support UDP segmentation offload

   - Ethernet NICs embedded, slower, virtual:
       - Xilinx AXI: remove duplicate DMA code adopting the dma engine
         driver
       - stmmac: add support for HW-accelerated VLAN stripping
       - TI AM654x sw: add mqprio, frame preemption & coalescing
       - gve: add support for non-4k page sizes.
       - virtio-net: support dynamic coalescing moderation

   - nVidia/Mellanox Ethernet datacenter switches:
       - allow firmware upgrade without a reboot
       - more flexible support for bridge flooding via the compressed
         FID flooding mode

   - Ethernet embedded switches:
       - Microchip:
          - fine-tune flow control and speed configurations in KSZ8xxx
          - KSZ88X3: enable setting rmii reference
       - Renesas:
          - add jumbo frames support
       - Marvell:
          - 88E6xxx: add "eth-mac" and "rmon" stats support

   - Ethernet PHYs:
       - aquantia: add firmware load support
       - at803x: refactor the driver to simplify adding support for more
         chip variants
       - NXP C45 TJA11xx: Add MACsec offload support

   - Wifi:
       - MediaTek (mt76):
          - NVMEM EEPROM improvements
          - mt7996 Extremely High Throughput (EHT) improvements
          - mt7996 Wireless Ethernet Dispatcher (WED) support
          - mt7996 36-bit DMA support
       - Qualcomm (ath12k):
          - support for a single MSI vector
          - WCN7850: support AP mode
       - Intel (iwlwifi):
          - new debugfs file fw_dbg_clear
          - allow concurrent P2P operation on DFS channels

   - Bluetooth:
       - QCA2066: support HFP offload
       - ISO: more broadcast-related improvements
       - NXP: better recovery in case receiver/transmitter get out of sync"

* tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1714 commits)
  lan78xx: remove redundant statement in lan78xx_get_eee
  lan743x: remove redundant statement in lan743x_ethtool_get_eee
  bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer()
  bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel()
  bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter()
  tcp: Revert no longer abort SYN_SENT when receiving some ICMP
  Revert "mlx5 updates 2023-12-20"
  Revert "net: stmmac: Enable Per DMA Channel interrupt"
  ipvlan: Remove usage of the deprecated ida_simple_xx() API
  ipvlan: Fix a typo in a comment
  net/sched: Remove ipt action tests
  net: stmmac: Use interrupt mode INTM=1 for per channel irq
  net: stmmac: Add support for TX/RX channel interrupt
  net: stmmac: Make MSI interrupt routine generic
  dt-bindings: net: snps,dwmac: per channel irq
  net: phy: at803x: make read_status more generic
  net: phy: at803x: add support for cdt cross short test for qca808x
  net: phy: at803x: refactor qca808x cable test get status function
  net: phy: at803x: generalize cdt fault length function
  net: ethernet: cortina: Drop TSO support
  ...
2024-01-11 10:07:29 -08:00
Linus Torvalds
c299010061 asm-generic cleanups for 6.8
A series from Baoquan He cleans up the asm-generic/io.h to remove the
 ioremap_uc() definition from everything except x86, which still needs it
 for pre-PAT systems. This series notably contains a patch from Jiaxun Yang
 that converts MIPS to use asm-generic/io.h like every other architecture
 does, enabling future cleanups.
 
 Some of my own patches fix -Wmissing-prototype warnings in architecture
 specific code across several architectures. This is now needed as the
 warning is enabled by default. There are still some remaining warnings
 in minor platforms, but the series should catch most of the widely used
 ones make them more consistent with one another.
 
 David McKay fixes a bug in __generic_cmpxchg_local() when this is used
 on 64-bit architectures. This could currently only affect parisc64
 and sparc64.
 
 Additional cleanups address from Linus Walleij, Uwe Kleine-König,
 Thomas Huth, and Kefeng Wang help reduce unnecessary inconsistencies
 between architectures.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmWeak8ACgkQYKtH/8kJ
 UidSiQ/+LL1WTO9d3Zx5HI0GGGjaIYpYs6jUNSf9Y5GPQiOrvjfEWj7CU11/4vxl
 GlQRpRyncYm8Eiz0Qu+aNxZFiiMah8Uful75yfbX8P1L4EPTbAYNDjkyNJrTjIAK
 jPK4sl8awIrapOeFUz++PsEj22R/4Is4f0mo+CqoCkL5RKlHe5oFdXzcwjmds4yK
 CvU6Ldn+M7FZ3EItMdjXaB3D3HS9uictFiO5JByZY8p+IcqgNRI/iHNnZIMsltJ+
 XjDi0DG+x4jCj6teElSchw7AofE4OcNSP3xbR1PLKv6+xBLGYaAGZhNuPTz88eV/
 Gj0loDQrrR5McGUfDBRHK9zN2Jd0O/FKnfh9kLOt1FLFyGPvC78Q/2HkpVCjbBr2
 Pr1aqhLDHA+tGNSsThsV8RUa8/tiEnxAki43tfBFS3SEKhtQsTm2g1z4miwbE3p0
 BJIrSgTqrP/SBq7a9z/thPrkzdZcNuA9FUETTbaMeUlJS51n1V9E5A1t7sOG7jaI
 vV/gbuR6FjvD49mTyQiOSCt3V4ygRqgN1Q+C4QM8WLqq2keUq0AhGodquv8F78in
 J3x2j2r27lHY7jKf8B0dua/JXAsF20u8qD6yDQ9ymkjt/MWhGXBgK0jpT7RTIuMS
 e2jmTywUVD4UohAcx3inkOojUhIJ5KDB0I4Pzv4zWcHNbyFNKcY=
 =4VQl
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanups from Arnd Bergmann:
 "A series from Baoquan He cleans up the asm-generic/io.h to remove the
  ioremap_uc() definition from everything except x86, which still needs
  it for pre-PAT systems. This series notably contains a patch from
  Jiaxun Yang that converts MIPS to use asm-generic/io.h like every
  other architecture does, enabling future cleanups.

  Some of my own patches fix -Wmissing-prototype warnings in
  architecture specific code across several architectures. This is now
  needed as the warning is enabled by default. There are still some
  remaining warnings in minor platforms, but the series should catch
  most of the widely used ones make them more consistent with one
  another.

  David McKay fixes a bug in __generic_cmpxchg_local() when this is used
  on 64-bit architectures. This could currently only affect parisc64 and
  sparc64.

  Additional cleanups address from Linus Walleij, Uwe Kleine-König,
  Thomas Huth, and Kefeng Wang help reduce unnecessary inconsistencies
  between architectures"

* tag 'asm-generic-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: Fix 32 bit __generic_cmpxchg_local
  Hexagon: Make pfn accessors statics inlines
  ARC: mm: Make virt_to_pfn() a static inline
  mips: remove extraneous asm-generic/iomap.h include
  sparc: Use $(kecho) to announce kernel images being ready
  arm64: vdso32: Define BUILD_VDSO32_64 to correct prototypes
  csky: fix arch_jump_label_transform_static override
  arch: add do_page_fault prototypes
  arch: add missing prepare_ftrace_return() prototypes
  arch: vdso: consolidate gettime prototypes
  arch: include linux/cpu.h for trap_init() prototype
  arch: fix asm-offsets.c building with -Wmissing-prototypes
  arch: consolidate arch_irq_work_raise prototypes
  hexagon: Remove CONFIG_HEXAGON_ARCH_VERSION from uapi header
  asm/io: remove unnecessary xlate_dev_mem_ptr() and unxlate_dev_mem_ptr()
  mips: io: remove duplicated codes
  arch/*/io.h: remove ioremap_uc in some architectures
  mips: add <asm-generic/io.h> including
2024-01-10 18:13:44 -08:00
Linus Torvalds
78273df7f6 header cleanups for 6.8
The goal is to get sched.h down to a type only header, so the main thing
 happening in this patchset is splitting out various _types.h headers and
 dependency fixups, as well as moving some things out of sched.h to
 better locations.
 
 This is prep work for the memory allocation profiling patchset which
 adds new sched.h interdepencencies.
 
 Testing - it's been in -next, and fixes from pretty much all
 architectures have percolated in - nothing major.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmWfBwwACgkQE6szbY3K
 bnZPwBAAmuRojXaeWxi01IPIOehSGDe68vw44PR9glEMZvxdnZuPOdvE4/+245/L
 bRKU2WBCjBUokUbV9msIShwRkFTZAmEMPNfPAAsFMA+VXeDYHKB+ZRdwTggNAQ+I
 SG6fZgh5m0HsewCDxU8oqVHkjVq4fXn0cy+aL6xLEd9gu67GoBzX2pDieS2Kvy6j
 jnyoKTxFwb+LTQgph0P4EIpq5I2umAsdLwdSR8EJ+8e9NiNvMo1pI00Lx/ntAnFZ
 JftWUJcMy3TQ5u1GkyfQN9y/yThX1bZK5GvmHS9SJ2Dkacaus5d+xaKCHtRuFS1I
 7C6b8PsNgRczUMumBXus44HdlNfNs1yU3lvVxFvBIPE1qC9pYRHrkWIXXIocXLLC
 oxTEJ6B2G3BQZVQgLIA4fOaxMVhmvKffi/aEZLi9vN9VVosd1a6XNKI6KbyRnXFp
 GSs9qDqszhn5I3GYNlDNQTc/8UsRlhPFgS6nS0By6QnvxtGi9QkU2tBRBsXvqwCy
 cLoCYIhc2tvugHvld70dz26umiJ4rnmxGlobStNoigDvIKAIUt1UmIdr1so8P8eH
 xehnL9ZcOX6xnANDL0AqMFFHV6I58CJynhFdUoXfVQf/DWLGX48mpi9LVNsYBzsI
 CAwVOAQ0UjGrpdWmJ9ueY/ABYqg9vRjzaDEXQ+MhAYO55CLaVsg=
 =3tyT
 -----END PGP SIGNATURE-----

Merge tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs

Pull header cleanups from Kent Overstreet:
 "The goal is to get sched.h down to a type only header, so the main
  thing happening in this patchset is splitting out various _types.h
  headers and dependency fixups, as well as moving some things out of
  sched.h to better locations.

  This is prep work for the memory allocation profiling patchset which
  adds new sched.h interdepencencies"

* tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (51 commits)
  Kill sched.h dependency on rcupdate.h
  kill unnecessary thread_info.h include
  Kill unnecessary kernel.h include
  preempt.h: Kill dependency on list.h
  rseq: Split out rseq.h from sched.h
  LoongArch: signal.c: add header file to fix build error
  restart_block: Trim includes
  lockdep: move held_lock to lockdep_types.h
  sem: Split out sem_types.h
  uidgid: Split out uidgid_types.h
  seccomp: Split out seccomp_types.h
  refcount: Split out refcount_types.h
  uapi/linux/resource.h: fix include
  x86/signal: kill dependency on time.h
  syscall_user_dispatch.h: split out *_types.h
  mm_types_task.h: Trim dependencies
  Split out irqflags_types.h
  ipc: Kill bogus dependency on spinlock.h
  shm: Slim down dependencies
  workqueue: Split out workqueue_types.h
  ...
2024-01-10 16:43:55 -08:00
Breno Leitao
0911b8c52c x86/bugs: Rename CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK
Step 10/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Added one more case. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-11-leitao@debian.org
2024-01-10 10:52:29 +01:00
Breno Leitao
a033eec9a0 x86/bugs: Rename CONFIG_CPU_SRSO => CONFIG_MITIGATION_SRSO
Step 9/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-10-leitao@debian.org
2024-01-10 10:52:29 +01:00
Breno Leitao
1da8d2172c x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY => CONFIG_MITIGATION_IBRS_ENTRY
Step 8/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-9-leitao@debian.org
2024-01-10 10:52:29 +01:00
Breno Leitao
ac61d43983 x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY
Step 7/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-8-leitao@debian.org
2024-01-10 10:52:28 +01:00
Breno Leitao
7b75782ffd x86/bugs: Rename CONFIG_SLS => CONFIG_MITIGATION_SLS
Step 6/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-7-leitao@debian.org
2024-01-10 10:52:28 +01:00
Breno Leitao
aefb2f2e61 x86/bugs: Rename CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE
Step 5/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Converted a few more uses in comments/messages as well. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ariel Miculas <amiculas@cisco.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-6-leitao@debian.org
2024-01-10 10:52:28 +01:00
Breno Leitao
ea4654e088 x86/bugs: Rename CONFIG_PAGE_TABLE_ISOLATION => CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
Step 4/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Converted new uses that got added since the series was posted. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-5-leitao@debian.org
2024-01-10 10:52:28 +01:00
Breno Leitao
5fa31af31e x86/bugs: Rename CONFIG_CALL_DEPTH_TRACKING => CONFIG_MITIGATION_CALL_DEPTH_TRACKING
Step 3/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-4-leitao@debian.org
2024-01-10 10:52:28 +01:00
Breno Leitao
e0b8fcfa3c x86/bugs: Rename CONFIG_CPU_IBPB_ENTRY => CONFIG_MITIGATION_IBPB_ENTRY
Step 2/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-3-leitao@debian.org
2024-01-10 10:52:28 +01:00
Breno Leitao
be83e809ca x86/bugs: Rename CONFIG_GDS_FORCE_MITIGATION => CONFIG_MITIGATION_GDS_FORCE
So the CPU mitigations Kconfig entries - there's 10 meanwhile - are named
in a historically idiosyncratic and hence rather inconsistent fashion
and have become hard to relate with each other over the years:

   https://lore.kernel.org/lkml/20231011044252.42bplzjsam3qsasz@treble/

When they were introduced we never expected that we'd eventually have
about a dozen of them, and that more organization would be useful,
especially for Linux distributions that want to enable them in an
informed fashion, and want to make sure all mitigations are configured
as expected.

For example, the current CONFIG_SPECULATION_MITIGATIONS namespace is only
halfway populated, where some mitigations have entries in Kconfig, and
they could be modified, while others mitigations do not have Kconfig entries,
and can not be controlled at build time.

Fine-grained control over these Kconfig entries can help in a number of ways:

  1) Users can choose and pick only mitigations that are important for
     their workloads.

  2) Users and developers can choose to disable mitigations that mangle
     the assembly code generation, making it hard to read.

  3) Separate Kconfigs for just source code readability,
     so that we see *which* butt-ugly piece of crap code is for what
     reason...

In most cases, if a mitigation is disabled at compilation time, it
can still be enabled at runtime using kernel command line arguments.

This is the first patch of an initial series that renames various
mitigation related Kconfig options, unifying them under a single
CONFIG_MITIGATION_* namespace:

    CONFIG_GDS_FORCE_MITIGATION => CONFIG_MITIGATION_GDS_FORCE
    CONFIG_CPU_IBPB_ENTRY       => CONFIG_MITIGATION_IBPB_ENTRY
    CONFIG_CALL_DEPTH_TRACKING  => CONFIG_MITIGATION_CALL_DEPTH_TRACKING
    CONFIG_PAGE_TABLE_ISOLATION => CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
    CONFIG_RETPOLINE            => CONFIG_MITIGATION_RETPOLINE
    CONFIG_SLS                  => CONFIG_MITIGATION_SLS
    CONFIG_CPU_UNRET_ENTRY      => CONFIG_MITIGATION_UNRET_ENTRY
    CONFIG_CPU_IBRS_ENTRY       => CONFIG_MITIGATION_IBRS_ENTRY
    CONFIG_CPU_SRSO             => CONFIG_MITIGATION_SRSO
    CONFIG_RETHUNK              => CONFIG_MITIGATION_RETHUNK

Implement step 1/10 of the namespace unification of CPU mitigations related
Kconfig options and rename CONFIG_GDS_FORCE_MITIGATION to
CONFIG_MITIGATION_GDS_FORCE.

[ mingo: Rewrote changelog for clarity. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-2-leitao@debian.org
2024-01-10 10:43:14 +01:00
Linus Torvalds
9f2a635235 Quite a lot of kexec work this time around. Many singleton patches in
many places.  The notable patch series are:
 
 - nilfs2 folio conversion from Matthew Wilcox in "nilfs2: Folio
   conversions for file paths".
 
 - Additional nilfs2 folio conversion from Ryusuke Konishi in "nilfs2:
   Folio conversions for directory paths".
 
 - IA64 remnant removal in Heiko Carstens's "Remove unused code after
   IA-64 removal".
 
 - Arnd Bergmann has enabled the -Wmissing-prototypes warning everywhere
   in "Treewide: enable -Wmissing-prototypes".  This had some followup
   fixes:
 
   - Nathan Chancellor has cleaned up the hexagon build in the series
     "hexagon: Fix up instances of -Wmissing-prototypes".
 
   - Nathan also addressed some s390 warnings in "s390: A couple of
     fixes for -Wmissing-prototypes".
 
   - Arnd Bergmann addresses the same warnings for MIPS in his series
     "mips: address -Wmissing-prototypes warnings".
 
 - Baoquan He has made kexec_file operate in a top-down-fitting manner
   similar to kexec_load in the series "kexec_file: Load kernel at top of
   system RAM if required"
 
 - Baoquan He has also added the self-explanatory "kexec_file: print out
   debugging message if required".
 
 - Some checkstack maintenance work from Tiezhu Yang in the series
   "Modify some code about checkstack".
 
 - Douglas Anderson has disentangled the watchdog code's logging when
   multiple reports are occurring simultaneously.  The series is "watchdog:
   Better handling of concurrent lockups".
 
 - Yuntao Wang has contributed some maintenance work on the crash code in
   "crash: Some cleanups and fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZ2R6AAKCRDdBJ7gKXxA
 juCVAP4t76qUISDOSKugB/Dn5E4Nt9wvPY9PcufnmD+xoPsgkQD+JVl4+jd9+gAV
 vl6wkJDiJO5JZ3FVtBtC3DFA/xHtVgk=
 =kQw+
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "Quite a lot of kexec work this time around. Many singleton patches in
  many places. The notable patch series are:

   - nilfs2 folio conversion from Matthew Wilcox in 'nilfs2: Folio
     conversions for file paths'.

   - Additional nilfs2 folio conversion from Ryusuke Konishi in 'nilfs2:
     Folio conversions for directory paths'.

   - IA64 remnant removal in Heiko Carstens's 'Remove unused code after
     IA-64 removal'.

   - Arnd Bergmann has enabled the -Wmissing-prototypes warning
     everywhere in 'Treewide: enable -Wmissing-prototypes'. This had
     some followup fixes:

      - Nathan Chancellor has cleaned up the hexagon build in the series
        'hexagon: Fix up instances of -Wmissing-prototypes'.

      - Nathan also addressed some s390 warnings in 's390: A couple of
        fixes for -Wmissing-prototypes'.

      - Arnd Bergmann addresses the same warnings for MIPS in his series
        'mips: address -Wmissing-prototypes warnings'.

   - Baoquan He has made kexec_file operate in a top-down-fitting manner
     similar to kexec_load in the series 'kexec_file: Load kernel at top
     of system RAM if required'

   - Baoquan He has also added the self-explanatory 'kexec_file: print
     out debugging message if required'.

   - Some checkstack maintenance work from Tiezhu Yang in the series
     'Modify some code about checkstack'.

   - Douglas Anderson has disentangled the watchdog code's logging when
     multiple reports are occurring simultaneously. The series is
     'watchdog: Better handling of concurrent lockups'.

   - Yuntao Wang has contributed some maintenance work on the crash code
     in 'crash: Some cleanups and fixes'"

* tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (157 commits)
  crash_core: fix and simplify the logic of crash_exclude_mem_range()
  x86/crash: use SZ_1M macro instead of hardcoded value
  x86/crash: remove the unused image parameter from prepare_elf_headers()
  kdump: remove redundant DEFAULT_CRASH_KERNEL_LOW_SIZE
  scripts/decode_stacktrace.sh: strip unexpected CR from lines
  watchdog: if panicking and we dumped everything, don't re-enable dumping
  watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
  watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
  watchdog/hardlockup: adopt softlockup logic avoiding double-dumps
  kexec_core: fix the assignment to kimage->control_page
  x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init()
  lib/trace_readwrite.c:: replace asm-generic/io with linux/io
  nilfs2: cpfile: fix some kernel-doc warnings
  stacktrace: fix kernel-doc typo
  scripts/checkstack.pl: fix no space expression between sp and offset
  x86/kexec: fix incorrect argument passed to kexec_dprintk()
  x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs
  nilfs2: add missing set_freezable() for freezable kthread
  kernel: relay: remove relay_file_splice_read dead code, doesn't work
  docs: submit-checklist: remove all of "make namespacecheck"
  ...
2024-01-09 11:46:20 -08:00
Linus Torvalds
aac4de465a Performance events changes for v6.8 are:
- Add branch stack counters ABI extension to better capture
    the growing amount of information the PMU exposes via
    branch stack sampling. There's matching tooling support.
 
  - Fix race when creating the nr_addr_filters sysfs file
 
  - Add Intel Sierra Forest and Grand Ridge intel/cstate
    PMU support.
 
  - Add Intel Granite Rapids, Sierra Forest and Grand Ridge
    uncore PMU support.
 
  - Misc cleanups & fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmWb4lURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jlnQ/+NSzrPQ9hEiS5a1iMMxdwC6IoXCmeFVsv
 s5NsGaVC7FEgjm3oCfvQlP63HolMO9R7TNLZsgINzOda5IHtE7WUcgBK7gbZr+NT
 WabdTyFrdmUr+Br0rLrEe0bxDSQU7r41ptqKE5HZRM9/3SbLhWgaXSJbfFAG2JV0
 xboZ/2qzb7Puch6VTWv1YhuIpr1Pi817As4SOo7JR4V8jBB2bh2eZ7XBN1z23aw2
 xuglbYml5gs4dOaFTqkRLWyn2PmrZ9wYKcdp63FVUscZ4LxvSw749BxEcNpTbxLp
 PT6uXIKw9PnStNfscfrsk6fDocVJzqrOK71blgiOKbmhWTE0UimEpFf1Hd3ooewg
 hFp3hmkE5Bc2MTUnwivkBxj96fz5rXH+3+Cue/5NsvDNlhlkswIIxzDw8M1G4rOI
 KQMDUYFOhQPa3Hi1lSp2SgHI5AcYHudepr/Z3QMxD3iLs+Wo2cmDcp8d2VrMLfb7
 GHSITG592iYcZPYsJosxby8CSFaUPxIl9l3AODQwWuEjd4PcOYa6iB2HbEa/mC3R
 wXcs8mFIMAaH/HRYUlqUDA5pOqN5chb13iDtS4JqJqBKyWgdrDLCVxoZSQvB64+I
 bldyy1e5oQSVVwJ42WLkUK3Eld2x75ki1JLZFwMgYuOgQv3jfu2VNenUWJ5ig0La
 dPpHP8PwOoc=
 =2O/5
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:

 - Add branch stack counters ABI extension to better capture the growing
   amount of information the PMU exposes via branch stack sampling.
   There's matching tooling support.

 - Fix race when creating the nr_addr_filters sysfs file

 - Add Intel Sierra Forest and Grand Ridge intel/cstate PMU support

 - Add Intel Granite Rapids, Sierra Forest and Grand Ridge uncore PMU
   support

 - Misc cleanups & fixes

* tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Factor out topology_gidnid_map()
  perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology()
  perf/x86/amd: Reject branch stack for IBS events
  perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge
  perf/x86/intel/uncore: Support IIO free-running counters on GNR
  perf/x86/intel/uncore: Support Granite Rapids
  perf/x86/uncore: Use u64 to replace unsigned for the uncore offsets array
  perf/x86/intel/uncore: Generic uncore_get_uncores and MMIO format of SPR
  perf: Fix the nr_addr_filters fix
  perf/x86/intel/cstate: Add Grand Ridge support
  perf/x86/intel/cstate: Add Sierra Forest support
  x86/smp: Export symbol cpu_clustergroup_mask()
  perf/x86/intel/cstate: Cleanup duplicate attr_groups
  perf/core: Fix narrow startup race when creating the perf nr_addr_filters sysfs file
  perf/x86/intel: Support branch counters logging
  perf/x86/intel: Reorganize attrs and is_visible
  perf: Add branch_sample_call_stack
  perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag
  perf: Add branch stack counters
2024-01-08 19:37:20 -08:00
Linus Torvalds
b51cc5d028 x86/cleanups changes for v6.8:
- A micro-optimization got misplaced as a cleanup:
     - Micro-optimize the asm code in secondary_startup_64_no_verify()
 
  - Change global variables to local
  - Add missing kernel-doc function parameter descriptions
  - Remove unused parameter from a macro
  - Remove obsolete Kconfig entry
  - Fix comments
  - Fix typos, mostly scripted, manually reviewed
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmWb2i8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iFIQ//RjqKWmEBfv0UVCNgtRgkUKOvYVkfhC1R
 FykHWbSE+/oDODS7B+gbWqzl9Fq2Oxx9re4KZuMfnojE96KZ6H1flQn7z3UVRUrf
 pfMx13E+uyf7qbVZktqH38lUS4s/AHdX2PKCiXlU/0hIkiBdjbAl3ylyqMv7ytIL
 Fi2N9iYJN+eLlMkc3A5IK83xNiU8rb0gO6Uywn3nUbqadY/YX2gDpND5kfzRIneR
 lTKy4rX3+E65qYB2Ly1wDr7e0Q0rgaTzPctx6twFrxQXK+MsHiartJhM5juND/tU
 DEjSW9ISOHlitKEJI/zbdrvJlr5AKDNy2zHYmQQuqY6+YHRamCKqwIjLIPkKj52g
 lAbosNwvp/o8W3zUHgUfVZR5hVxN863zV2qa/ehoQ3b/9kNjQC8actILjYEgIVu9
 av1sd+nETbjCUABIF9H9uAoRbgc+wQs2nupJZrjvginFz8+WVhgaBdJDMYCNAmjc
 fNMjGtRS7YXiIMj09ZAXFThVW302FdbTgggDh/qlQlDOXFu5HRbyuWR+USr4/jkP
 qs2G6m/BHDs9HxDRo/no+ccSrUBV5phfhZbO7qwjTf2NJJvPHW+cxGpT00zU2v8A
 lgfVI7SDkxwbyi1gacJ054GqEhsWuEdi40ikqxjhL8Oq4xwwsey/PiaIxjkDQx92
 Gj3XUSDnGEs=
 =kUav
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Ingo Molnar:

 - Change global variables to local

 - Add missing kernel-doc function parameter descriptions

 - Remove unused parameter from a macro

 - Remove obsolete Kconfig entry

 - Fix comments

 - Fix typos, mostly scripted, manually reviewed

and a micro-optimization got misplaced as a cleanup:

 - Micro-optimize the asm code in secondary_startup_64_no_verify()

* tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arch/x86: Fix typos
  x86/head_64: Use TESTB instead of TESTL in secondary_startup_64_no_verify()
  x86/docs: Remove reference to syscall trampoline in PTI
  x86/Kconfig: Remove obsolete config X86_32_SMP
  x86/io: Remove the unused 'bw' parameter from the BUILDIO() macro
  x86/mtrr: Document missing function parameters in kernel-doc
  x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()
2024-01-08 17:23:32 -08:00
Linus Torvalds
106b88d7a9 x86/asm changes for v6.8:
- Replace magic numbers in GDT descriptor definitions & handling:
 
    - Introduce symbolic names via macros for descriptor types/fields/flags,
      and then use these symbolic names.
 
    - Clean up definitions a bit, such as GDT_ENTRY_INIT()
 
    - Fix/clean up details that became visibly inconsistent after the
      symbol-based code was introduced:
 
       - Unify accessed flag handling
 
       - Set the D/B size flag consistently & according to the HW specification
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmWb1hQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j7/xAAk2L36n4ZpWqm4oWxMyRDTd713qT+Jzg5
 txVr7i8el5RZ8D9WQejzOWemJ/XgirkWuRSFfY+UzZZ+7lmioXWDFLz6PFpyMOwM
 OG2yyAAZIQzFhe1LxhurN5G7F+q9Nl/CcDnRWv+FUKs829GOGlaqirKfKwgIFzgw
 GUT0eT5gPfIfEFnzMUrix9+KNIZU6s6Lgg5tVpavNw6bKEimg8Mtn4J45voFbg98
 l/PU0RaC28vePTMdoBLv/ZZUHS9K/UdGUq600guzk/Zh2plqxGtqjXtaM5/8giPO
 JbgQCTP0LaCC0oZf0SAbUW7+TiKD03CP6QzleYyLdhTjLz1CCOswgz0upPjfa/5j
 nhgsMCk/CEhr6nMwhJiRNhOrFFFJWX00bpcQ2tyyVO156Q+CfhJxuk1XafeQ5MSs
 K4l97IVc/vPQSEZy5Nh6GdGmWljYxP3Ku4eHRkyVGVYf8YyaD2l6CimaRcdPNqqL
 w9GLwczLfEzkomfdxsyppwu364O91Lpv6gi6AY4Tj7Hgo1ZMm9ucUO3AsTUz5EJJ
 aiqX6o7puIJJkU2O3hKviTzNwTojjjicD/Az66re3lHZBND+luP7LjVY2wIomH5i
 bFPVdSCDmCdTrsTkO3f87FHbCVrbNllKLGChvJ66vyC2nJeIM8UHJ/MxYJsGRhqk
 87vU1DWYPrQ=
 =Y0pg
 -----END PGP SIGNATURE-----

Merge tag 'x86-asm-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 asm updates from Ingo Molnar:
 "Replace magic numbers in GDT descriptor definitions & handling:

   - Introduce symbolic names via macros for descriptor
     types/fields/flags, and then use these symbolic names.

   - Clean up definitions a bit, such as GDT_ENTRY_INIT()

   - Fix/clean up details that became visibly inconsistent after the
     symbol-based code was introduced:

      - Unify accessed flag handling

      - Set the D/B size flag consistently & according to the HW
        specification"

* tag 'x86-asm-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Add DB flag to 32-bit percpu GDT entry
  x86/asm: Always set A (accessed) flag in GDT descriptors
  x86/asm: Replace magic numbers in GDT descriptors, script-generated change
  x86/asm: Replace magic numbers in GDT descriptors, preparations
  x86/asm: Provide new infrastructure for GDT descriptors
2024-01-08 17:02:57 -08:00
Linus Torvalds
33034c4f94 x86/apic changes for v6.8:
- Clean up 'struct apic':
 
     - Drop ::delivery_mode
     - Drop 'enum apic_delivery_modes'
     - Drop 'struct local_apic'
 
  - Fix comments
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmWb0wYRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j0TA/7By0/nG1cSpGOJEZ19R3fY6H7hy+xbmaF
 7DnuELRzheESbRgPbnc45jj8SYfjWMB/tPlkEQx7JTG/dgo8yLIjK8bfKfp5O7wF
 RQz7bT+iTpTy2Y/Ww0cTN8e8ihaNP8Po0b/b1Ux60B4k/6iYOavDbJmkQj3MUHfx
 6cfV5gYNagMNVyzeZQbWzOLajz2DN3TlTkdoQy2H/lJsqO1IktRaCZ2xGr4HGE40
 FRRu643pMD581D5/+Ug80DcSEOP4fDBZM1DkAOXpxqxjDM4L+Km5syASqrdSqSDo
 8hc+a8yPiJ63A1yfnp67SMp+ZPW3qev7E6ssXyRr3wVMGNGUrV8qlgHAFhuu0dEX
 /B9Fo4vLaHp+ti5XffMdE+huKj43ztcl6ThCtCNG8hsy21G0G2Z1fvpoy+VlQs+b
 P4HwM/+Ktnr7jgVufXWwVcSHWiZJ4FX0mEFTCt6ZZMrXctuiWJXJlU3lKZhFkrT9
 nzsDhriHtkz/1QAWe5rvlX79b+YJo8OYWzNFlGJxq9XgVbtfxqzsGskWm4nX1Adh
 x8cMaOmyk40sST9Vwuvq8SH0eK1Kngkin38pC/SPI/+evh0Z4e7h1/kvoeJu2DSY
 vU2pSTT+CV+GAohsNYlwY1l69OYE1Jy/cAEAurWgsOsMlSPP5knu3m0ksSqv5S1a
 E+lfpV+5RF8=
 =vG8v
 -----END PGP SIGNATURE-----

Merge tag 'x86-apic-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 apic updates from Ingo Molnar:

 - Clean up 'struct apic':
    - Drop ::delivery_mode
    - Drop 'enum apic_delivery_modes'
    - Drop 'struct local_apic'

 - Fix comments

* tag 'x86-apic-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioapic: Remove unfinished sentence from comment
  x86/apic: Drop struct local_apic
  x86/apic: Drop enum apic_delivery_modes
  x86/apic: Drop apic::delivery_mode
2024-01-08 16:46:41 -08:00
Linus Torvalds
3edbe8afb6 - Convert the hw error storm handling into a finer-grained, per-bank
solution which allows for more timely detection and reporting of
   errors
 
 - Start a documentation section which will hold down relevant
   RAS features description and how they should be used
 
 - Add new AMD error bank types
 
 - Slim down and remove error type descriptions from the kernel side of
   error decoding to rasdaemon which can be used from now on to decode
   hw errors on AMD
 
 - Mark pages containing uncorrectable errors as poison so that kdump can
   avoid them and thus not cause another panic
 
 - The usual cleanups and fixlets
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWamqkACgkQEsHwGGHe
 VUrkMw/+Jv//z5pKFMXF2GlI/Xefh8pXDB57D22B1T6zNJm+Qq1b58U44WpxoU24
 b9gPtkgxjzA3JwoQG+cBDkCSs1xLV3McUS0UoEqQ28QvFN+mzxYKu4ura3F+rcZG
 PiCT4gPEYgZirYl0PKYeypnBPq3Krx/RYeTUE0vQ9HtmeBCmH71X1egyB4TqFbFU
 7ui9aITLAnLBwgO6On1qviGPvJoEDGAsQ656XoY0Js+8dUeYwAI4qpaaUwtXUo1D
 ARGzss55/qTRo2G+IkDDhbJ8e8G+eX22oa0n1tdhNFBqfYAN6gM+t4NiFlNnn+18
 nlbaSZfDygciE8DVjEnkVIrRJtkq7uj0dk7LvnqEI2y7J0LybHojC3hDKrqYLK3o
 PRgfPwmykOCwZQldGFYbShvmY8KoEQc/V9OWi/+A/M/uTJsForQmHn78Z2YkO9kG
 K6VaLuYszSCqz47wf56pHBwtMrivEPmkcxaz9ErkK90vM/NmV7kfLl+R8IK8apNJ
 cJkuLBjfgGpBP+AlpXGl9OE0lRJK5MCbEBlbGBBl58REBaB4DNkM4QHItrUSRR82
 ADLcfLAIRWT8UwwXieDbWF1jb+4L+IJnXCKGCCQ7+eYxcFI9V9TABD2B8io+Dzvz
 evZwLCPKmjuPc2CMcDu/eUdBKLNTn3QAoN/NLcVmzJ23lguQW/M=
 =o3UM
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 RAS updates from Borislav Petkov:

 - Convert the hw error storm handling into a finer-grained, per-bank
   solution which allows for more timely detection and reporting of
   errors

 - Start a documentation section which will hold down relevant RAS
   features description and how they should be used

 - Add new AMD error bank types

 - Slim down and remove error type descriptions from the kernel side of
   error decoding to rasdaemon which can be used from now on to decode
   hw errors on AMD

 - Mark pages containing uncorrectable errors as poison so that kdump
   can avoid them and thus not cause another panic

 - The usual cleanups and fixlets

* tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Handle Intel threshold interrupt storms
  x86/mce: Add per-bank CMCI storm mitigation
  x86/mce: Remove old CMCI storm mitigation code
  Documentation: Begin a RAS section
  x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank types
  EDAC/mce_amd: Remove SMCA Extended Error code descriptions
  x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
  x86/mce/inject: Clear test status value
  x86/mce: Remove redundant check from mce_device_create()
  x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
2024-01-08 16:03:00 -08:00
Linus Torvalds
bef91c28f2 - Add synthetic X86_FEATURE flags for the different AMD Zen generations
and use them everywhere instead of ad-hoc family/model checks. Drop an
   ancient AMD errata checking facility as a result
 
 - Fix a fragile initcall ordering in intel_epb
 
 - Do not issue the MFENCE+LFENCE barrier for the TSC deadline and X2APIC
   MSRs on AMD as it is not needed there
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWalaEACgkQEsHwGGHe
 VUrDYg/+LjqjJv3OcVZZkx9WVds0kmBCajrf9JxRYgiSTIpiL/usH0QOms8FjHQ6
 tYcukj+RJDk2nP5ho3Vs1WNA0mvU0nxC+99u0Ph4zugSMSl0XGOA+YxxTBPXmDGB
 1IxH9IloMFPhwDoQ4/ear0IvjIrfE4ESV2Dafe45WzVSdG7/0ijurisaH1kPYraP
 wzuNn142Tk0eicaam30sdThXZraO9Paz5MOYbpYEAU4lxNtdH85sQa+Xk0tqJcjD
 IwEcQJLE6n3r8t/lNMIlhAsmOVGrD5WltDH9HvEmKT4mzTumSc9DLu3YHHRWyx2K
 TMpRYHlVuvGJkJV3CYXi8fhTsV6uMsHEe1+xZ/Rf0iQzOG25v+zen8WK4REWOr/o
 VmprG3j7LkEFeeH3CqSOtVSbYmxFILQb6pAbzSlI907b5C6PaEYuudjVXuX01urN
 IG3krWHGMJ3AWKDV2Z3hW1TYtbLJyqKPNhqcBJiOuWyCe8cQXfKQBTpP5HuAEZEd
 UXc4QpStMvuPqxyQhlPSTAtY7L/UVhBH8oHoXPYiBmcCo7VtJYW6HH9z1ISUc1av
 FgKdkpx6vaJiXlD/wI/B5T1oViWQ8udhHpit99rhKl623e7WC2rdguAOVDLn/YIe
 cZB+R05yknBWOavH0kcuz9R9xYKMSBcEsRnBKmeOg9R+tTK/7BM=
 =afTN
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu feature updates from Borislav Petkov:

 - Add synthetic X86_FEATURE flags for the different AMD Zen generations
   and use them everywhere instead of ad-hoc family/model checks. Drop
   an ancient AMD errata checking facility as a result

 - Fix a fragile initcall ordering in intel_epb

 - Do not issue the MFENCE+LFENCE barrier for the TSC deadline and
   X2APIC MSRs on AMD as it is not needed there

* tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Add X86_FEATURE_ZEN1
  x86/CPU/AMD: Drop now unused CPU erratum checking function
  x86/CPU/AMD: Get rid of amd_erratum_1485[]
  x86/CPU/AMD: Get rid of amd_erratum_400[]
  x86/CPU/AMD: Get rid of amd_erratum_383[]
  x86/CPU/AMD: Get rid of amd_erratum_1054[]
  x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init function
  x86/CPU/AMD: Move Zenbleed check to the Zen2 init function
  x86/CPU/AMD: Rename init_amd_zn() to init_amd_zen_common()
  x86/CPU/AMD: Call the spectral chicken in the Zen2 init function
  x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init function
  x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init function
  x86/CPU/AMD: Carve out the erratum 1386 fix
  x86/CPU/AMD: Add ZenX generations flags
  x86/cpu/intel_epb: Don't rely on link order
  x86/barrier: Do not serialize MSR accesses on AMD
2024-01-08 15:45:31 -08:00
Linus Torvalds
e900042f04 - Convert the sev-guest plaform ->remove callback to return void
- Move the SEV C-bit verification to the BSP as it needs to happen only
   once and not on every AP
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWbIXUACgkQEsHwGGHe
 VUocDQ//cxfdJuF+Srww/WgpXSc2EmT6OAlSX8R1S89PD5PPDWJq3U8dE7gScZnH
 bjcFj7CMkyFHe4yPkDljdS4+7Zrt2ilbzuojdnCjByd4aL5fCsZJCjWlVeaR/0ay
 3lkr+fX3GtAl2njOBv9Fc0E7qh/E8zHRNcIRIS6Lz+LY5ziNqU9IGZruESQKkR2B
 GokNKqWfqFhJaifEe4F+AdWRbZaEZlli6Gttn7T+QPPRk9LPCaxqkHT0BbIbUqnr
 eMdvPZucWZiqNe1C9zxKeRyzqc2nPXbvi1QkCundXrCYwRuF4+sQiyM5yEMmStdl
 wQ/kfbLJFFS96RZ408UNrvSJHYulFRb5uNj0M6lIc2KlQt2PCPhz2+MsipjWMYEP
 7qz5KGkiSyosTZ+ZT0jw1G++KtRcoG/brXFKXGuu9I2yi0OMOtNdCXNH9SyTJDT3
 d6nk5t2xloPH8qLfAb9BK7uVINSSlu/f/3nuoKqlHPSYcZoK7kt0BW4DJNb8+iYO
 XhdmCpvYIfMvyaLjFYypimE7AnqgafUS/zVQCfnIF6BJveGj6AI/Xv0yubwErbuK
 ijcrz2x9LlwgoAGMClePFWwhmRfg1M3cRRD4VekH4kIC6pcrP2Wk3Dl2VgaojEdg
 F4qE+RB1x4AO5wkiYLtHACPZQ7i7yzLLGOhBTxFjgXz0EBMwTYU=
 =zKez
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

 - Convert the sev-guest plaform ->remove callback to return void

 - Move the SEV C-bit verification to the BSP as it needs to happen only
   once and not on every AP

* tag 'x86_sev_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  virt: sev-guest: Convert to platform remove callback returning void
  x86/sev: Do the C-bit verification only on the BSP
2024-01-08 15:42:52 -08:00
Linus Torvalds
fc5e5c5923 - Replace the paravirt patching functionality using the alternatives
infrastructure and remove the former
 
 - Misc other improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWbIJ0ACgkQEsHwGGHe
 VUrOSg//duslGRzbLkrjijYmlb9C9yjdf+7611biRNExxlOUbQvgMX8O5/qnvfhk
 ddqcyRJlg6GeooMI3ch25KRqdUbcgI5Zq1ftG7hFiX94gIiWhiHibYxtmnNGsDuy
 sq4R3nxqZxpTh2Seajh9AfXS4uINu1wsedLxuSyNiCBk2/BB7zqC3aPCyAtjJDdT
 tpPN+dx5s/rmPkdCcdFxwAgSxsohLjZgktdZIUCbVM+blbghDgPV1wjhCUimxUYx
 2woqS0zgOZb6eRcZgdXkuaNvOqODOwIZo3iebr6XSIb49YCC/KhuUlDDr/ZZGV3a
 CeMR42sa2CEdl0YHsnHEm8LswpUVqL6FBfFnC+TP0bAGRC+L4pNEaS6eQk+ktWf/
 +k6bvuBhuPgZG5KdxgQiJB8wogDxCXqQtI7MNLI0IuBS7GC2hjHb35WB6Qb8i+EB
 ffyIJYhK0shpVzo9H1efosmLfO+i2RPn5YZiC+S27MVqGbYEkW6713hr56vp58mK
 UaxXWoTnfRR9O5Hyf0xEf/qjZFO2zpmQRgXIM2ECzLxn7GNvsPtcHsYJmLJyeUXf
 749glVHz8Ue9HsDTCQB35Z9yRT4cVuz2FclwEt51Qlta8QtX8MRaL/l+F3FOrHoo
 CPgo/fzSav3Z/XpQpjbC7vI886azQivuX5i5J9LhO5IDc8O3OMs=
 =Q5U5
 -----END PGP SIGNATURE-----

Merge tag 'x86_paravirt_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 paravirt updates from Borislav Petkov:

 - Replace the paravirt patching functionality using the alternatives
   infrastructure and remove the former

 - Misc other improvements

* tag 'x86_paravirt_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternative: Correct feature bit debug output
  x86/paravirt: Remove no longer needed paravirt patching code
  x86/paravirt: Switch mixed paravirt/alternative calls to alternatives
  x86/alternative: Add indirect call patching
  x86/paravirt: Move some functions and defines to alternative.c
  x86/paravirt: Introduce ALT_NOT_XEN
  x86/paravirt: Make the struct paravirt_patch_site packed
  x86/paravirt: Use relative reference for the original instruction offset
2024-01-08 13:41:42 -08:00
Linus Torvalds
6e0b939180 - Correct minor issues after the microcode revision reporting sanitization
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWZaMsACgkQEsHwGGHe
 VUpiig//cDmMbajaeInAVVEilmJnw+YYxKCpvgOGdUnL+dAxBJ2SehVeEGZLiMRs
 79o5X2In+tdTjXRH1oTxWJWSMx9MW4aGbDbgZDAJoPigdZYkFoWE2p81t5n11LNt
 xjMNcnGMbns1aNCdBqGlg0baQ/xJavdFPndziq7kUMS/f3QLRm8Jp7MA+zUkDgCf
 DPuJcyFVYq7ffmc9VuVEsrh6yNREOj88Ek1tatDobbOd6LA/a3N/aOjosEwCS9W1
 qBVcMSaNt34ySzHV4sd/Uw8Dj3LXmxe2a51O+nsy+AjjkmHPYU9F0jSK4UqZgI3X
 +NPl9o8eNz8ACnK9oF/OLogoK3R9ED8vsok0r8OdvziSpi0XXaAJFGJCIYYcwJWb
 5aJTVtwKCoOhdV80NIcwjY/QJLTji4LKK46w+HaHy8wNQhAWY3Kt3P4XAn7ERTh6
 vBjeeFQtJLmE+KKYcNC90VGf/efztN9AdtYxwdb8JkkFZwOyc7M2MPV74YvQI0iz
 O0FCf2PWbQ0hKXwsVvfwn5bZoNXyIcvRc+rGpYi0NaGaiUcq53JY00NateOyhG1Z
 ebN30AWeivqW4Mwz3izKddmeOwniWpCbbJHR8E3KGZH/Uy6S9F6UQY14y0NhucKq
 RWTwFs5+YWB6rsmJRtUPgkX18hRVl5AjURcBXK7WbR0cONntXHs=
 =0Ewr
 -----END PGP SIGNATURE-----

Merge tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode updates from Borislav Petkov:

 - Correct minor issues after the microcode revision reporting
   sanitization

* tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/intel: Set new revision only after a successful update
  x86/microcode/intel: Remove redundant microcode late updated message
2024-01-08 13:24:30 -08:00
Kirill A. Shutemov
1c6d984f52 x86/kvm: Do not try to disable kvmclock if it was not enabled
kvm_guest_cpu_offline() tries to disable kvmclock regardless if it is
present in the VM. It leads to write to a MSR that doesn't exist on some
configurations, namely in TDX guest:

	unchecked MSR access error: WRMSR to 0x12 (tried to write 0x0000000000000000)
	at rIP: 0xffffffff8110687c (kvmclock_disable+0x1c/0x30)

kvmclock enabling is gated by CLOCKSOURCE and CLOCKSOURCE2 KVM paravirt
features.

Do not disable kvmclock if it was not enabled.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: c02027b574 ("x86/kvm: Disable kvmclock on all CPUs on shutdown")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Message-Id: <20231205004510.27164-6-kirill.shutemov@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-08 12:14:41 -05:00
Yuntao Wang
61bb219f9d x86/crash: use SZ_1M macro instead of hardcoded value
Use SZ_1M macro instead of hardcoded 1<<20 to make code more readable.

Link: https://lkml.kernel.org/r/20240102144905.110047-3-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:45:25 -08:00
Yuntao Wang
83d4a42a91 x86/crash: remove the unused image parameter from prepare_elf_headers()
Patch series "crash: Some cleanups and fixes", v2.

This patchset includes two cleanups and one fix.


This patch (of 3):

The image parameter is no longer in use, remove it.  Also, tidy up the
code formatting.

Link: https://lkml.kernel.org/r/20240102144905.110047-1-ytcoode@gmail.com
Link: https://lkml.kernel.org/r/20240102144905.110047-2-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:45:25 -08:00
Jakub Kicinski
e63c1822ac Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  e009b2efb7 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()")
  0f2b214779 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL")
https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 18:06:46 -08:00
Jinghao Jia
f5d03da48d x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect
kprobe_emulate_call_indirect currently uses int3_emulate_call to emulate
indirect calls. However, int3_emulate_call always assumes the size of
the call to be 5 bytes when calculating the return address. This is
incorrect for register-based indirect calls in x86, which can be either
2 or 3 bytes depending on whether REX prefix is used. At kprobe runtime,
the incorrect return address causes control flow to land onto the wrong
place after return -- possibly not a valid instruction boundary. This
can lead to a panic like the following:

[    7.308204][    C1] BUG: unable to handle page fault for address: 000000000002b4d8
[    7.308883][    C1] #PF: supervisor read access in kernel mode
[    7.309168][    C1] #PF: error_code(0x0000) - not-present page
[    7.309461][    C1] PGD 0 P4D 0
[    7.309652][    C1] Oops: 0000 [#1] SMP
[    7.309929][    C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.7.0-rc5-trace-for-next #6
[    7.310397][    C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[    7.311068][    C1] RIP: 0010:__common_interrupt+0x52/0xc0
[    7.311349][    C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3
[    7.312512][    C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046
[    7.312899][    C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001
[    7.313334][    C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4
[    7.313702][    C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482
[    7.314146][    C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023
[    7.314509][    C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000
[    7.314951][    C1] FS:  0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[    7.315396][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.315691][    C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0
[    7.316153][    C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    7.316508][    C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    7.316948][    C1] Call Trace:
[    7.317123][    C1]  <IRQ>
[    7.317279][    C1]  ? __die_body+0x64/0xb0
[    7.317482][    C1]  ? page_fault_oops+0x248/0x370
[    7.317712][    C1]  ? __wake_up+0x96/0xb0
[    7.317964][    C1]  ? exc_page_fault+0x62/0x130
[    7.318211][    C1]  ? asm_exc_page_fault+0x22/0x30
[    7.318444][    C1]  ? __cfi_native_send_call_func_single_ipi+0x10/0x10
[    7.318860][    C1]  ? default_idle+0xb/0x10
[    7.319063][    C1]  ? __common_interrupt+0x52/0xc0
[    7.319330][    C1]  common_interrupt+0x78/0x90
[    7.319546][    C1]  </IRQ>
[    7.319679][    C1]  <TASK>
[    7.319854][    C1]  asm_common_interrupt+0x22/0x40
[    7.320082][    C1] RIP: 0010:default_idle+0xb/0x10
[    7.320309][    C1] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 66 90 0f 00 2d 09 b9 3b 00 fb f4 <fa> c3 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 e9
[    7.321449][    C1] RSP: 0018:ffffc9000009bee8 EFLAGS: 00000256
[    7.321808][    C1] RAX: ffff88813bca8b68 RBX: 0000000000000001 RCX: 000000000001ef0c
[    7.322227][    C1] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000001ef0c
[    7.322656][    C1] RBP: ffffc9000009bef8 R08: 8000000000000000 R09: 00000000000008c2
[    7.323083][    C1] R10: 0000000000000000 R11: ffffffff81058e70 R12: 0000000000000000
[    7.323530][    C1] R13: ffff8881002b30c0 R14: 0000000000000000 R15: 0000000000000000
[    7.323948][    C1]  ? __cfi_lapic_next_deadline+0x10/0x10
[    7.324239][    C1]  default_idle_call+0x31/0x50
[    7.324464][    C1]  do_idle+0xd3/0x240
[    7.324690][    C1]  cpu_startup_entry+0x25/0x30
[    7.324983][    C1]  start_secondary+0xb4/0xc0
[    7.325217][    C1]  secondary_startup_64_no_verify+0x179/0x17b
[    7.325498][    C1]  </TASK>
[    7.325641][    C1] Modules linked in:
[    7.325906][    C1] CR2: 000000000002b4d8
[    7.326104][    C1] ---[ end trace 0000000000000000 ]---
[    7.326354][    C1] RIP: 0010:__common_interrupt+0x52/0xc0
[    7.326614][    C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3
[    7.327570][    C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046
[    7.327910][    C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001
[    7.328273][    C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4
[    7.328632][    C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482
[    7.329223][    C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023
[    7.329780][    C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000
[    7.330193][    C1] FS:  0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[    7.330632][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.331050][    C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0
[    7.331454][    C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    7.331854][    C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    7.332236][    C1] Kernel panic - not syncing: Fatal exception in interrupt
[    7.332730][    C1] Kernel Offset: disabled
[    7.333044][    C1] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

The relevant assembly code is (from objdump, faulting address
highlighted):

ffffffff8102ed9d:       41 ff d3                  call   *%r11
ffffffff8102eda0:       65 48 <8b> 05 30 c7 ff    mov    %gs:0x7effc730(%rip),%rax

The emulation incorrectly sets the return address to be ffffffff8102ed9d
+ 0x5 = ffffffff8102eda2, which is the 8b byte in the middle of the next
mov. This in turn causes incorrect subsequent instruction decoding and
eventually triggers the page fault above.

Instead of invoking int3_emulate_call, perform push and jmp emulation
directly in kprobe_emulate_call_indirect. At this point we can obtain
the instruction size from p->ainsn.size so that we can calculate the
correct return address.

Link: https://lore.kernel.org/all/20240102233345.385475-1-jinghao7@illinois.edu/

Fixes: 6256e668b7 ("x86/kprobes: Use int3 instead of debug trap for single-step")
Cc: stable@vger.kernel.org
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-01-04 14:10:59 +09:00
Bjorn Helgaas
54aa699e80 arch/x86: Fix typos
Fix typos, most reported by "codespell arch/x86".  Only touches comments,
no code changes.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org
2024-01-03 11:46:22 +01:00
Joerg Roedel
75f74f85a4 Merge branches 'apple/dart', 'arm/rockchip', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next 2024-01-03 09:59:32 +01:00
Borislav Petkov (AMD)
7991ed4358 x86/alternative: Correct feature bit debug output
In

  https://lore.kernel.org/r/20231206110636.GBZXBVvCWj2IDjVk4c@fat_crate.local

I wanted to adjust the alternative patching debug output to the new
changes introduced by

  da0fe6e68e ("x86/alternative: Add indirect call patching")

but removed the '*' which denotes the ->x86_capability word. The correct
output should be, for example:

  [    0.230071] SMP alternatives: feat: 11*32+15, old: (entry_SYSCALL_64_after_hwframe+0x5a/0x77 (ffffffff81c000c2) len: 16), repl: (ffffffff89ae896a, len: 5) flags: 0x0

while the incorrect one says "... 1132+15" currently.

Add back the '*'.

Fixes: da0fe6e68e ("x86/alternative: Add indirect call patching")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231206110636.GBZXBVvCWj2IDjVk4c@fat_crate.local
2023-12-30 12:25:55 +01:00
Yuntao Wang
3177e6315b x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init()
kernel_ident_mapping_init() takes an exclusive memory range [pstart, pend)
where pend is not included in the range, while res represents an inclusive
memory range [start, end] where end is considered part of the range.

Passing [start, end] rather than [start, end+1) to
kernel_ident_mapping_init() may result in the identity mapping for the
end address not being set up.

For example, when res->start is equal to res->end,
kernel_ident_mapping_init() will not establish any identity mapping. 
Similarly, when the value of res->end is a multiple of 2M and the page
table maps 2M pages, kernel_ident_mapping_init() will also not set up
identity mapping for res->end.

Therefore, passing res->end directly to kernel_ident_mapping_init() is
incorrect, the correct end address should be `res->end + 1`.

Link: https://lkml.kernel.org/r/20231221101702.20956-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 12:22:29 -08:00
Yuntao Wang
22bb6bcd4c x86/kexec: fix incorrect argument passed to kexec_dprintk()
kexec_dprintk() expects the last argument to be kbuf.memsz, but the actual
argument being passed is kbuf.bufsz.

Although these two values are currently equal, it is better to pass the
correct one, in case these two values become different in the future.

Link: https://lkml.kernel.org/r/20231220154105.215610-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 12:22:28 -08:00
Yuntao Wang
43132282d8 x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs
When detecting an error, the current code uses kexec_dprintk() to output
log message. This is not quite appropriate as kexec_dprintk() is mainly
used for outputting debugging messages, rather than error messages.

Replace kexec_dprintk() with pr_err(). This also makes the output method
for this error log align with the output method for other error logs in
this function.

Additionally, the last return statement in set_page_address() is
unnecessary, remove it.

Link: https://lkml.kernel.org/r/20231220030124.149160-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 12:22:28 -08:00
Kent Overstreet
932562a604 rseq: Split out rseq.h from sched.h
We're trying to get sched.h down to more or less just types only, not
code - rseq can live in its own header.

This helps us kill the dependency on preempt.h in sched.h.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-27 11:49:56 -05:00
Yuntao Wang
8474f82ade x86/kexec: simplify the logic of mem_region_callback()
The expression `mstart + resource_size(res) - 1` is actually equivalent to
`res->end`, simplify the logic of this function to improve readability.

Link: https://lkml.kernel.org/r/20231212150506.31711-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-20 15:02:58 -08:00
Baoquan He
e687b2fabd kexec_file, x86: print out debugging message if required
Then when specifying '-d' for kexec_file_load interface, loaded locations
of kernel/initrd/cmdline etc can be printed out to help debug.

Here replace pr_debug() with the newly added kexec_dprintk() in kexec_file
loading related codes.

And also print out e820 memmap passed to 2nd kernel just as kexec_load
interface has been doing.

Link: https://lkml.kernel.org/r/20231213055747.61826-4-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-20 15:02:57 -08:00
Kent Overstreet
a5a0abfdb3 x86: fix missing includes/forward declarations
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-20 16:40:30 -05:00
Vegard Nossum
bc90aefa99 x86/asm: Add DB flag to 32-bit percpu GDT entry
The D/B size flag for the 32-bit percpu GDT entry was not set.

The Intel manual (vol 3, section 3.4.5) only specifies the meaning of
this flag for three cases:

 1) code segments used for %cs -- doesn't apply here

 2) stack segments used for %ss -- doesn't apply

 3) expand-down data segments -- but we don't have the expand-down flag
    set, so it also doesn't apply here

The flag likely doesn't do anything here, although the manual does also
say: "This flag should always be set to 1 for 32-bit code and data
segments [...]" so we should probably do it anyway.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231219151200.2878271-6-vegard.nossum@oracle.com
2023-12-20 10:57:51 +01:00
Vegard Nossum
3b184b71df x86/asm: Always set A (accessed) flag in GDT descriptors
We have no known use for having the CPU track whether GDT descriptors
have been accessed or not.

Simplify the code by adding the flag to the common flags and removing
it everywhere else.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231219151200.2878271-5-vegard.nossum@oracle.com
2023-12-20 10:57:51 +01:00
Vegard Nossum
1445f6e15f x86/asm: Replace magic numbers in GDT descriptors, script-generated change
Actually replace the numeric values by the new symbolic values.

I used this to find all the existing users of the GDT_ENTRY*() macros:

  $ git grep -P 'GDT_ENTRY(_INIT)?\('

Some of the lines will exceed 80 characters, but some of them will be
shorter again in the next couple of patches.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231219151200.2878271-4-vegard.nossum@oracle.com
2023-12-20 10:57:38 +01:00
Vegard Nossum
41ef75c848 x86/asm: Replace magic numbers in GDT descriptors, preparations
We'd like to replace all the magic numbers in various GDT descriptors
with new, semantically meaningful, symbolic values.

In order to be able to verify that the change doesn't cause any actual
changes to the compiled binary code, I've split the change into two
patches:

 - Part 1 (this commit): everything _but_ actually replacing the numbers
 - Part 2 (the following commit): _only_ replacing the numbers

The reason we need this split for verification is that including new
headers causes some spurious changes to the object files, mostly line
number changes in the debug info but occasionally other subtle codegen
changes.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231219151200.2878271-3-vegard.nossum@oracle.com
2023-12-20 10:57:20 +01:00
Arnd Bergmann
86ed430cf5 x86/alternatives: Move apply_relocation() out of init section
This function is now called from a few places that are no __init_or_module,
resulting a link time warning:

  WARNING: modpost: vmlinux: section mismatch in reference: patch_dest+0x8a (section: .text) -> apply_relocation (section: .init.text)

Remove the annotation here.

[ mingo: Also sync up add_nop() with these changes. ]

Fixes: 17bce3b2ae ("x86/callthunks: Handle %rip-relative relocations in call thunk template")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20231204072856.1033621-1-arnd@kernel.org
2023-12-19 14:21:49 +01:00
Jakub Kicinski
c49b292d03 netdev
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmWAz2EACgkQ6rmadz2v
 bToqrw/9EwroZCc8GEHOKAlb/fzrMvn92rLo0ZW/cGN84QJPnx4zM6Zo0+fgLaaN
 oqqztwMUwdzGC3uX3FfVXaaLKbJ/MeHeL9BXFZNW8zkRHciw4R7kIBhOdPnHyET7
 uT+rQ4xPe1Mt7e9PjepKlSL5mEsxWfBkdUgsdn19Z2Vjdfr9mZMhYWYMJGcfTCD1
 TwxHKBPhq5fN3IsshmMBB8IrRp1HStUKb65MgZ4dI22LJXxTsFkx5XMFXcmuqvkH
 NhKj8jDcPEEh31bYcb6aG2Z4onw5F2lquygjk1Qyy5cyw45m/ipJKAXKdAyvJG+R
 VZCWOET/9wbRwFSK5wxwihCuKghFiofK52i2PcGtXZh0PCouyZZneSJOKM0yVWKO
 BvuJBxK4ETRnQyN6ZxhuJiEXG3/YMBBhyR2TX1LntVK9ct/k7qFVzATG49J39/sR
 SYMbptBRj4a5oMJ1qn0nFVEDFkg0jTnTDNnsEpcz60Ayt6EsJ1XosO5yz2huf861
 xgRMTKMseyG1/uV45tQ8ZPzbSPpBxjUi9Dl3coYsIm1a+y6clWUXcarONY5KVrpS
 CR98DuFgl+E7dXuisd/Kz2p2KxxSPq8nytsmLlgOvrUqhwiXqB+TKN8EHgIapVOt
 l1A5LrzXFTcGlT9MlaWBqEIy83Bu1nqQqbxrAFOE0k8A5jomXaw=
 =stU2
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-12-18

This PR is larger than usual and contains changes in various parts
of the kernel.

The main changes are:

1) Fix kCFI bugs in BPF, from Peter Zijlstra.

End result: all forms of indirect calls from BPF into kernel
and from kernel into BPF work with CFI enabled. This allows BPF
to work with CONFIG_FINEIBT=y.

2) Introduce BPF token object, from Andrii Nakryiko.

It adds an ability to delegate a subset of BPF features from privileged
daemon (e.g., systemd) through special mount options for userns-bound
BPF FS to a trusted unprivileged application. The design accommodates
suggestions from Christian Brauner and Paul Moore.

Example:
$ sudo mkdir -p /sys/fs/bpf/token
$ sudo mount -t bpf bpffs /sys/fs/bpf/token \
             -o delegate_cmds=prog_load:MAP_CREATE \
             -o delegate_progs=kprobe \
             -o delegate_attachs=xdp

3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei.

 - Complete precision tracking support for register spills
 - Fix verification of possibly-zero-sized stack accesses
 - Fix access to uninit stack slots
 - Track aligned STACK_ZERO cases as imprecise spilled registers.
   It improves the verifier "instructions processed" metric from single
   digit to 50-60% for some programs.
 - Fix verifier retval logic

4) Support for VLAN tag in XDP hints, from Larysa Zaremba.

5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu.

End result: better memory utilization and lower I$ miss for calls to BPF
via BPF trampoline.

6) Fix race between BPF prog accessing inner map and parallel delete,
from Hou Tao.

7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu.

It allows BPF interact with IPSEC infra. The intent is to support
software RSS (via XDP) for the upcoming ipsec pcpu work.
Experiments on AWS demonstrate single tunnel pcpu ipsec reaching
line rate on 100G ENA nics.

8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao.

9) BPF file verification via fsverity, from Song Liu.

It allows BPF progs get fsverity digest.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits)
  bpf: Ensure precise is reset to false in __mark_reg_const_zero()
  selftests/bpf: Add more uprobe multi fail tests
  bpf: Fail uprobe multi link with negative offset
  selftests/bpf: Test the release of map btf
  s390/bpf: Fix indirect trampoline generation
  selftests/bpf: Temporarily disable dummy_struct_ops test on s390
  x86/cfi,bpf: Fix bpf_exception_cb() signature
  bpf: Fix dtor CFI
  cfi: Add CFI_NOSEAL()
  x86/cfi,bpf: Fix bpf_struct_ops CFI
  x86/cfi,bpf: Fix bpf_callback_t CFI
  x86/cfi,bpf: Fix BPF JIT call
  cfi: Flip headers
  selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment
  selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test
  selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment
  bpf: Limit the number of kprobes when attaching program to multiple kprobes
  bpf: Limit the number of uprobes when attaching program to multiple uprobes
  bpf: xdp: Register generic_kfunc_set with XDP programs
  selftests/bpf: utilize string values for delegate_xxx mount options
  ...
====================

Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 16:46:08 -08:00
Thomas Gleixner
d5a10b976e x86/acpi: Handle bogus MADT APIC tables gracefully
The recent fix to ignore invalid x2APIC entries inadvertently broke
systems with creative MADT APIC tables. The affected systems have APIC
MADT tables where all entries have invalid APIC IDs (0xFF), which means
they register exactly zero CPUs.

But the condition to ignore the entries of APIC IDs < 255 in the X2APIC
MADT table is solely based on the count of MADT APIC table entries.

As a consequence, the affected machines enumerate no secondary CPUs at
all because the APIC table has entries and therefore the X2APIC table
entries with APIC IDs < 255 are ignored.

Change the condition so that the APIC table preference for APIC IDs <
255 only becomes effective when the APIC table has valid APIC ID
entries.

IOW, an APIC table full of invalid APIC IDs is considered to be empty
which in consequence enables the X2APIC table entries with a APIC ID
< 255 and restores the expected behaviour.

Fixes: ec9aedb2aa ("x86/acpi: Ignore invalid x2APIC entries")
Reported-by: John Sperbeck <jsperbeck@google.com>
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/169953729188.3135.6804572126118798018.tip-bot2@tip-bot2
2023-12-18 14:21:44 +01:00
Mario Limonciello
cef9ecc8e9 rtc: Extend timeout for waiting for UIP to clear to 1s
Specs don't say anything about UIP being cleared within 10ms. They
only say that UIP won't occur for another 244uS. If a long NMI occurs
while UIP is still updating it might not be possible to get valid
data in 10ms.

This has been observed in the wild that around s2idle some calls can
take up to 480ms before UIP is clear.

Adjust callers from outside an interrupt context to wait for up to a
1s instead of 10ms.

Cc:  <stable@vger.kernel.org> # 6.1.y
Fixes: ec5895c0f2 ("rtc: mc146818-lib: extract mc146818_avoid_UIP")
Reported-by: Carsten Hatger <xmb8dsv4@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217626
Tested-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Reviewed-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Acked-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20231128053653.101798-5-mario.limonciello@amd.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2023-12-17 22:33:55 +01:00
Mario Limonciello
120931db07 rtc: Add support for configuring the UIP timeout for RTC reads
The UIP timeout is hardcoded to 10ms for all RTC reads, but in some
contexts this might not be enough time. Add a timeout parameter to
mc146818_get_time() and mc146818_get_time_callback().

If UIP timeout is configured by caller to be >=100 ms and a call
takes this long, log a warning.

Make all callers use 10ms to ensure no functional changes.

Cc:  <stable@vger.kernel.org> # 6.1.y
Fixes: ec5895c0f2 ("rtc: mc146818-lib: extract mc146818_avoid_UIP")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Reviewed-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Acked-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Link: https://lore.kernel.org/r/20231128053653.101798-4-mario.limonciello@amd.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2023-12-17 22:33:55 +01:00
Peter Zijlstra
2cd3e3772e x86/cfi,bpf: Fix bpf_struct_ops CFI
BPF struct_ops uses __arch_prepare_bpf_trampoline() to write
trampolines for indirect function calls. These tramplines much have
matching CFI.

In order to obtain the correct CFI hash for the various methods, add a
matching structure that contains stub functions, the compiler will
generate correct CFI which we can pilfer for the trampolines.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231215092707.566977112@infradead.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-15 16:25:55 -08:00
Peter Zijlstra
e72d88d18d x86/cfi,bpf: Fix bpf_callback_t CFI
Where the main BPF program is expected to match bpf_func_t,
sub-programs are expected to match bpf_callback_t.

This fixes things like:

tools/testing/selftests/bpf/progs/bloom_filter_bench.c:

           bpf_for_each_map_elem(&array_map, bloom_callback, &data, 0);

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231215092707.451956710@infradead.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-15 16:25:55 -08:00
Peter Zijlstra
4f9087f166 x86/cfi,bpf: Fix BPF JIT call
The current BPF call convention is __nocfi, except when it calls !JIT things,
then it calls regular C functions.

It so happens that with FineIBT the __nocfi and C calling conventions are
incompatible. Specifically __nocfi will call at func+0, while FineIBT will have
endbr-poison there, which is not a valid indirect target. Causing #CP.

Notably this only triggers on IBT enabled hardware, which is probably why this
hasn't been reported (also, most people will have JIT on anyway).

Implement proper CFI prologues for the BPF JIT codegen and drop __nocfi for
x86.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231215092707.345270396@infradead.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-15 16:25:55 -08:00
Peter Zijlstra
4382159696 cfi: Flip headers
Normal include order is that linux/foo.h should include asm/foo.h, CFI has it
the wrong way around.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20231215092707.231038174@infradead.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-15 16:25:55 -08:00
Thomas Gleixner
2dc4196138 x86/alternatives: Disable interrupts and sync when optimizing NOPs in place
apply_alternatives() treats alternatives with the ALT_FLAG_NOT flag set
special as it optimizes the existing NOPs in place.

Unfortunately, this happens with interrupts enabled and does not provide any
form of core synchronization.

So an interrupt hitting in the middle of the update and using the affected code
path will observe a half updated NOP and crash and burn. The following
3 NOP sequence was observed to expose this crash halfway reliably under QEMU
  32bit:

   0x90 0x90 0x90

which is replaced by the optimized 3 byte NOP:

   0x8d 0x76 0x00

So an interrupt can observe:

   1) 0x90 0x90 0x90		nop nop nop
   2) 0x8d 0x90 0x90		undefined
   3) 0x8d 0x76 0x90		lea    -0x70(%esi),%esi
   4) 0x8d 0x76 0x00		lea     0x0(%esi),%esi

Where only #1 and #4 are true NOPs. The same problem exists for 64bit obviously.

Disable interrupts around this NOP optimization and invoke sync_core()
before re-enabling them.

Fixes: 270a69c448 ("x86/alternative: Support relocations in alternatives")
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/ZT6narvE%2BLxX%2B7Be@windriver.com
2023-12-15 19:34:42 +01:00
Thomas Gleixner
3ea1704a92 x86/alternatives: Sync core before enabling interrupts
text_poke_early() does:

   local_irq_save(flags);
   memcpy(addr, opcode, len);
   local_irq_restore(flags);
   sync_core();

That's not really correct because the synchronization should happen before
interrupts are re-enabled to ensure that a pending interrupt observes the
complete update of the opcodes.

It's not entirely clear whether the interrupt entry provides enough
serialization already, but moving the sync_core() invocation into interrupt
disabled region does no harm and is obviously correct.

Fixes: 6fffacb303 ("x86/alternatives, jumplabel: Use text_poke_early() before mm_init()")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/ZT6narvE%2BLxX%2B7Be@windriver.com
2023-12-15 19:34:42 +01:00
Thomas Gleixner
69a7386c1e x86/smpboot/64: Handle X2APIC BIOS inconsistency gracefully
Chris reported that a Dell PowerEdge T340 system stopped to boot when upgrading
to a kernel which contains the parallel hotplug changes.  Disabling parallel
hotplug on the kernel command line makes it boot again.

It turns out that the Dell BIOS has x2APIC enabled and the boot CPU comes up in
X2APIC mode, but the APs come up inconsistently in xAPIC mode.

Parallel hotplug requires that the upcoming CPU reads out its APIC ID from the
local APIC in order to map it to the Linux CPU number.

In this particular case the readout on the APs uses the MMIO mapped registers
because the BIOS failed to enable x2APIC mode. That readout results in a page
fault because the kernel does not have the APIC MMIO space mapped when X2APIC
mode was enabled by the BIOS on the boot CPU and the kernel switched to X2APIC
mode early. That page fault can't be handled on the upcoming CPU that early and
results in a silent boot failure.

If parallel hotplug is disabled the system boots because in that case the APIC
ID read is not required as the Linux CPU number is provided to the AP in the
smpboot control word. When the kernel uses x2APIC mode then the APs are
switched to x2APIC mode too slightly later in the bringup process, but there is
no reason to do it that late.

Cure the BIOS bogosity by checking in the parallel bootup path whether the
kernel uses x2APIC mode and if so switching over the APs to x2APIC mode before
the APIC ID readout.

Fixes: 0c7ffa32db ("x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it")
Reported-by: Chris Lindee <chris.lindee@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Tested-by: Chris Lindee <chris.lindee@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/CA%2B2tU59853R49EaU_tyvOZuOTDdcU0RshGyydccp9R1NX9bEeQ@mail.gmail.com
2023-12-15 19:33:54 +01:00
Tony Luck
1f68ce2a02 x86/mce: Handle Intel threshold interrupt storms
Add an Intel specific hook into machine_check_poll() to keep track of
per-CPU, per-bank corrected error logs (with a stub for the
CONFIG_MCE_INTEL=n case).

When a storm is observed the rate of interrupts is reduced by setting
a large threshold value for this bank in IA32_MCi_CTL2. This bank is
added to the bitmap of banks for this CPU to poll. The polling rate is
increased to once per second.

When a storm ends reset the threshold in IA32_MCi_CTL2 back to 1, remove
the bank from the bitmap for polling, and change the polling rate back
to the default.

If a CPU with banks in storm mode is taken offline, the new CPU that
inherits ownership of those banks takes over management of storm(s) in
the inherited bank(s).

The cmci_discover() function was already very large. These changes
pushed it well over the top. Refactor with three helper functions to
bring it back under control.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231115195450.12963-4-tony.luck@intel.com
2023-12-15 14:53:42 +01:00
Tony Luck
7eae17c4ad x86/mce: Add per-bank CMCI storm mitigation
This is the core functionality to track CMCI storms at the machine check
bank granularity. Subsequent patches will add the vendor specific hooks
to supply input to the storm detection and take actions on the start/end
of a storm.

machine_check_poll() is called both by the CMCI interrupt code, and for
periodic polls from a timer. Add a hook in this routine to maintain
a bitmap history for each bank showing whether the bank logged an
corrected error or not each time it is polled.

In normal operation the interval between polls of these banks determines
how far to shift the history. The 64 bit width corresponds to about one
second.

When a storm is observed a CPU vendor specific action is taken to reduce
or stop CMCI from the bank that is the source of the storm.  The bank is
added to the bitmap of banks for this CPU to poll. The polling rate is
increased to once per second.  During a storm each bit in the history
indicates the status of the bank each time it is polled. Thus the
history covers just over a minute.

Declare a storm for that bank if the number of corrected interrupts seen
in that history is above some threshold (defined as 5 in this series,
could be tuned later if there is data to suggest a better value).

A storm on a bank ends if enough consecutive polls of the bank show no
corrected errors (defined as 30, may also change). That calls the CPU
vendor specific function to revert to normal operational mode, and
changes the polling rate back to the default.

  [ bp: Massage. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231115195450.12963-3-tony.luck@intel.com
2023-12-15 14:52:01 +01:00
Tony Luck
3ed57b41a4 x86/mce: Remove old CMCI storm mitigation code
When a "storm" of corrected machine check interrupts (CMCI) is detected
this code mitigates by disabling CMCI interrupt signalling from all of
the banks owned by the CPU that saw the storm.

There are problems with this approach:

1) It is very coarse grained. In all likelihood only one of the banks
   was generating the interrupts, but CMCI is disabled for all.  This
   means Linux may delay seeing and processing errors logged from other
   banks.

2) Although CMCI stands for Corrected Machine Check Interrupt, it is
   also used to signal when an uncorrected error is logged. This is
   a problem because these errors should be handled in a timely manner.

Delete all this code in preparation for a finer grained solution.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20231115195450.12963-2-tony.luck@intel.com
2023-12-15 13:44:12 +01:00
Borislav Petkov (AMD)
30579c8baa x86/sev: Do the C-bit verification only on the BSP
There's no need to do it on every AP.

The C-bit value read on the BSP and also verified there, is used
everywhere from now on.

No functional changes - just a bit faster booting APs.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20231130132601.10317-1-bp@alien8.de
2023-12-13 21:07:56 +01:00
Uros Bizjak
7d28365a06 x86/head_64: Use TESTB instead of TESTL in secondary_startup_64_no_verify()
There is no need to use TESTL when checking the least-significant bit
with a TEST instruction. Use TESTB, which is three bytes shorter:

   f6 05 00 00 00 00 01    testb  $0x1,0x0(%rip)

vs:

   f7 05 00 00 00 00 01    testl  $0x1,0x0(%rip)
   00 00 00

for the same effect.

No functional changes intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231109201032.4439-1-ubizjak@gmail.com
2023-12-13 13:35:38 +01:00
Kai Huang
70060463cb x86/mce: Differentiate real hardware #MCs from TDX erratum ones
The first few generations of TDX hardware have an erratum.  Triggering
it in Linux requires some kind of kernel bug involving relatively exotic
memory writes to TDX private memory and will manifest via
spurious-looking machine checks when reading the affected memory.

Make an effort to detect these TDX-induced machine checks and spit out
a new blurb to dmesg so folks do not think their hardware is failing.

== Background ==

Virtually all kernel memory accesses operations happen in full
cachelines.  In practice, writing a "byte" of memory usually reads a 64
byte cacheline of memory, modifies it, then writes the whole line back.
Those operations do not trigger this problem.

This problem is triggered by "partial" writes where a write transaction
of less than cacheline lands at the memory controller.  The CPU does
these via non-temporal write instructions (like MOVNTI), or through
UC/WC memory mappings.  The issue can also be triggered away from the
CPU by devices doing partial writes via DMA.

== Problem ==

A partial write to a TDX private memory cacheline will silently "poison"
the line.  Subsequent reads will consume the poison and generate a
machine check.  According to the TDX hardware spec, neither of these
things should have happened.

To add insult to injury, the Linux machine code will present these as a
literal "Hardware error" when they were, in fact, a software-triggered
issue.

== Solution ==

In the end, this issue is hard to trigger.  Rather than do something
rash (and incomplete) like unmap TDX private memory from the direct map,
improve the machine check handler.

Currently, the #MC handler doesn't distinguish whether the memory is
TDX private memory or not but just dump, for instance, below message:

 [...] mce: [Hardware Error]: CPU 147: Machine Check Exception: f Bank 1: bd80000000100134
 [...] mce: [Hardware Error]: RIP 10:<ffffffffadb69870> {__tlb_remove_page_size+0x10/0xa0}
 	...
 [...] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
 [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
 [...] Kernel panic - not syncing: Fatal local machine check

Which says "Hardware Error" and "Data load in unrecoverable area of
kernel".

Ideally, it's better for the log to say "software bug around TDX private
memory" instead of "Hardware Error".  But in reality the real hardware
memory error can happen, and sadly such software-triggered #MC cannot be
distinguished from the real hardware error.  Also, the error message is
used by userspace tool 'mcelog' to parse, so changing the output may
break userspace.

So keep the "Hardware Error".  The "Data load in unrecoverable area of
kernel" is also helpful, so keep it too.

Instead of modifying above error log, improve the error log by printing
additional TDX related message to make the log like:

  ...
 [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
 [...] mce: [Hardware Error]: Machine Check: TDX private memory error. Possible kernel bug.

Adding this additional message requires determination of whether the
memory page is TDX private memory.  There is no existing infrastructure
to do that.  Add an interface to query the TDX module to fill this gap.

== Impact ==

This issue requires some kind of kernel bug to trigger.

TDX private memory should never be mapped UC/WC.  A partial write
originating from these mappings would require *two* bugs, first mapping
the wrong page, then writing the wrong memory.  It would also be
detectable using traditional memory corruption techniques like
DEBUG_PAGEALLOC.

MOVNTI (and friends) could cause this issue with something like a simple
buffer overrun or use-after-free on the direct map.  It should also be
detectable with normal debug techniques.

The one place where this might get nasty would be if the CPU read data
then wrote back the same data.  That would trigger this problem but
would not, for instance, set off mechanisms like slab redzoning because
it doesn't actually corrupt data.

With an IOMMU at least, the DMA exposure is similar to the UC/WC issue.
TDX private memory would first need to be incorrectly mapped into the
I/O space and then a later DMA to that mapping would actually cause the
poisoning event.

[ dhansen: changelog tweaks ]

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20231208170740.53979-18-dave.hansen%40intel.com
2023-12-12 08:46:46 -08:00
Borislav Petkov (AMD)
232afb5578 x86/CPU/AMD: Add X86_FEATURE_ZEN1
Add a synthetic feature flag specifically for first generation Zen
machines. There's need to have a generic flag for all Zen generations so
make X86_FEATURE_ZEN be that flag.

Fixes: 30fa92832f ("x86/CPU/AMD: Add ZenX generations flags")
Suggested-by: Brian Gerst <brgerst@gmail.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/dc3835e3-0731-4230-bbb9-336bbe3d042b@amd.com
2023-12-12 11:17:37 +01:00
Tina Zhang
2396046d75 iommu: Add mm_get_enqcmd_pasid() helper function
mm_get_enqcmd_pasid() should be used by architecture code and closely
related to learn the PASID value that the x86 ENQCMD operation should
use for the mm.

For the moment SMMUv3 uses this without any connection to ENQCMD, it
will be cleaned up similar to how the prior patch made VT-d use the
PASID argument of set_dev_pasid().

The motivation is to replace mm->pasid with an iommu private data
structure that is introduced in a later patch.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20231027000525.1278806-4-tina.zhang@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-12 10:11:29 +01:00
Jason Gunthorpe
8f23f5dba6 iommu: Change kconfig around IOMMU_SVA
Linus suggested that the kconfig here is confusing:

https://lore.kernel.org/all/CAHk-=wgUiAtiszwseM1p2fCJ+sC4XWQ+YN4TanFhUgvUqjr9Xw@mail.gmail.com/

Let's break it into three kconfigs controlling distinct things:

 - CONFIG_IOMMU_MM_DATA controls if the mm_struct has the additional
   fields for the IOMMU. Currently only PASID, but later patches store
   a struct iommu_mm_data *

 - CONFIG_ARCH_HAS_CPU_PASID controls if the arch needs the scheduling bit
   for keeping track of the ENQCMD instruction. x86 will select this if
   IOMMU_SVA is enabled

 - IOMMU_SVA controls if the IOMMU core compiles in the SVA support code
   for iommu driver use and the IOMMU exported API

This way ARM will not enable CONFIG_ARCH_HAS_CPU_PASID

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20231027000525.1278806-2-tina.zhang@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-12 10:11:27 +01:00
Uros Bizjak
13408c6ae6 x86/traps: Use current_top_of_stack() helper in traps.c
Use current_top_of_stack() helper in sync_regs() and vc_switch_off_ist()
instead of open-coding the reading of  the top_of_stack percpu variable
explicitly.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231204210320.114429-2-ubizjak@gmail.com
2023-12-11 11:47:15 +01:00
Juergen Gross
f7af697762 x86/paravirt: Remove no longer needed paravirt patching code
Now that paravirt is using the alternatives patching infrastructure,
remove the paravirt patching code.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231210062138.2417-6-jgross@suse.com
2023-12-10 23:34:37 +01:00
Juergen Gross
60bc276b12 x86/paravirt: Switch mixed paravirt/alternative calls to alternatives
Instead of stacking alternative and paravirt patching, use the new
ALT_FLAG_CALL flag to switch those mixed calls to pure alternative
handling.

Eliminate the need to be careful regarding the sequence of alternative
and paravirt patching.

  [ bp: Touch up commit message. ]

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231210062138.2417-5-jgross@suse.com
2023-12-10 23:33:09 +01:00
Juergen Gross
da0fe6e68e x86/alternative: Add indirect call patching
In order to prepare replacing of paravirt patching with alternative
patching, add the capability to replace an indirect call with a direct
one.

This is done via a new flag ALT_FLAG_CALL as the target of the CALL
instruction needs to be evaluated using the value of the location
addressed by the indirect call.

For convenience, add a macro for a default CALL instruction. In case it
is being used without the new flag being set, it will result in a BUG()
when being executed. As in most cases, the feature used will be
X86_FEATURE_ALWAYS so add another macro ALT_CALL_ALWAYS usable for the
flags parameter of the ALTERNATIVE macros.

For a complete replacement, handle the special cases of calling a nop
function and an indirect call of NULL the same way as paravirt does.

  [ bp: Massage commit message, fixup the debug output and clarify flow
    more. ]

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231210062138.2417-4-jgross@suse.com
2023-12-10 20:49:20 +01:00
Juergen Gross
9824b00c2b x86/paravirt: Move some functions and defines to alternative.c
As a preparation for replacing paravirt patching completely by
alternative patching, move some backend functions and #defines to
the alternatives code and header.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231129133332.31043-3-jgross@suse.com
2023-12-10 20:30:31 +01:00
Linus Torvalds
5412fed784 - Add a forgotten CPU vendor check in the AMD microcode post-loading
callback so that the callback runs only on AMD
 
 - Make sure SEV-ES protocol negotiation happens only once and on the BSP
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmV1l4MACgkQEsHwGGHe
 VUoESQ//WxIdMb/ATNtBRXEM3GRzd/X2he7kXRGVSUeJZMxEOnpVEmo4NtAzXwI7
 E2yOk4EJ6QlN+p6/eea5X2FJ5IkpiDKj1jVvU6NxTJiF2I5t1XUjdcXiv05Is+bP
 5Cx1ysaznMQYvZUler15NkbLCZvVX5/1V9FthnLpS1d1cvem0NKoH+hVUQHkdHbo
 /QNzEy+dyii8mp+t6D4QEfjq2Ab7bRbx1FufJcvHy9tFt3u8JsuWI/kHLm5Jk3VN
 n7Efynluw1wJC3l/18QzrxUCL5LVwrs2LxC0GH42dq9vBp0EJclgsBOh+43cm+Ey
 ffowP3uJxayngdigRA4ANpfbJL7GHjEtoYlxk/ooI4l2ccNtSdCXn86UVI9q3BiT
 60j/qV9r/o0Wl46/BVrT6SXhHlcKm4YchUa53NbOHcU3hBpUuM78WAcqs+63voMq
 kFqXQVihlJWYYytyFKw6Ng7O6SHR/SMC3uSygG0q2KebiUlVMZSheJin2Od+shhC
 skD0z2EUzUGuNICFHuxwA2uu6jr/KvkEwDfjczwdE2haUfZLCnOfTjxZmqBbuzJ6
 l7lfGjBCpGGtGettjrM8WAuHnL6MzNDI2jVDV3K6bzqQVzk9wjFQjIG9X7c8XjN/
 E2IyQxNuiIKVj3GfFOVM9lgJ4C8qwNDa2A46Kad2dKHioiFfMuM=
 =uIo2
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.7_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add a forgotten CPU vendor check in the AMD microcode post-loading
   callback so that the callback runs only on AMD

 - Make sure SEV-ES protocol negotiation happens only once and on the
   BSP

* tag 'x86_urgent_for_v6.7_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Check vendor in the AMD microcode callback
  x86/sev: Fix kernel crash due to late update to read-only ghcb_version
2023-12-10 10:53:55 -08:00
Kai Huang
abe8dbab8f x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory
Start to transit out the "multi-steps" to initialize the TDX module.

TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums.  Not all memory
satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region"
(CMR).  During boot, the firmware builds a list of all of the memory
ranges which can provide the TDX security guarantees.  The list of these
ranges is available to the kernel by querying the TDX module.

CMRs tell the kernel which memory is TDX compatible.  The kernel needs
to build a list of memory regions (out of CMRs) as "TDX-usable" memory
and pass them to the TDX module.  Once this is done, those "TDX-usable"
memory regions are fixed during module's lifetime.

To keep things simple, assume that all TDX-protected memory will come
from the page allocator.  Make sure all pages in the page allocator
*are* TDX-usable memory.

As TDX-usable memory is a fixed configuration, take a snapshot of the
memory configuration from memblocks at the time of module initialization
(memblocks are modified on memory hotplug).  This snapshot is used to
enable TDX support for *this* memory configuration only.  Use a memory
hotplug notifier to ensure that no other RAM can be added outside of
this configuration.

This approach requires all memblock memory regions at the time of module
initialization to be TDX convertible memory to work, otherwise module
initialization will fail in a later SEAMCALL when passing those regions
to the module.  This approach works when all boot-time "system RAM" is
TDX convertible memory and no non-TDX-convertible memory is hot-added
to the core-mm before module initialization.

For instance, on the first generation of TDX machines, both CXL memory
and NVDIMM are not TDX convertible memory.  Using kmem driver to hot-add
any CXL memory or NVDIMM to the core-mm before module initialization
will result in failure to initialize the module.  The SEAMCALL error
code will be available in the dmesg to help user to understand the
failure.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/all/20231208170740.53979-7-dave.hansen%40intel.com
2023-12-08 09:12:16 -08:00
Kai Huang
765a0542fd x86/virt/tdx: Detect TDX during kernel boot
Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
host and certain physical attacks.  A CPU-attested software module
called 'the TDX module' runs inside a new isolated memory range as a
trusted hypervisor to manage and run protected VMs.

Pre-TDX Intel hardware has support for a memory encryption architecture
called MKTME.  The memory encryption hardware underpinning MKTME is also
used for Intel TDX.  TDX ends up "stealing" some of the physical address
space from the MKTME architecture for crypto-protection to VMs.  The
BIOS is responsible for partitioning the "KeyID" space between legacy
MKTME and TDX.  The KeyIDs reserved for TDX are called 'TDX private
KeyIDs' or 'TDX KeyIDs' for short.

During machine boot, TDX microcode verifies that the BIOS programmed TDX
private KeyIDs consistently and correctly programmed across all CPU
packages.  The MSRs are locked in this state after verification.  This
is why MSR_IA32_MKTME_KEYID_PARTITIONING gets used for TDX enumeration:
it indicates not just that the hardware supports TDX, but that all the
boot-time security checks passed.

The TDX module is expected to be loaded by the BIOS when it enables TDX,
but the kernel needs to properly initialize it before it can be used to
create and run any TDX guests.  The TDX module will be initialized by
the KVM subsystem when KVM wants to use TDX.

Detect platform TDX support by detecting TDX private KeyIDs.

The TDX module itself requires one TDX KeyID as the 'TDX global KeyID'
to protect its metadata.  Each TDX guest also needs a TDX KeyID for its
own protection.  Just use the first TDX KeyID as the global KeyID and
leave the rest for TDX guests.  If no TDX KeyID is left for TDX guests,
disable TDX as initializing the TDX module alone is useless.

[ dhansen: add X86_FEATURE, replace helper function ]

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/all/20231208170740.53979-1-dave.hansen%40intel.com
2023-12-08 09:11:58 -08:00
Thomas Gleixner
be5341eb0d x86/entry: Convert INT 0x80 emulation to IDTENTRY
There is no real reason to have a separate ASM entry point implementation
for the legacy INT 0x80 syscall emulation on 64-bit.

IDTENTRY provides all the functionality needed with the only difference
that it does not:

  - save the syscall number (AX) into pt_regs::orig_ax
  - set pt_regs::ax to -ENOSYS

Both can be done safely in the C code of an IDTENTRY before invoking any of
the syscall related functions which depend on this convention.

Aside of ASM code reduction this prepares for detecting and handling a
local APIC injected vector 0x80.

[ kirill.shutemov: More verbose comments ]
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@vger.kernel.org> # v6.0+
2023-12-07 09:51:29 -08:00
Russell King (Oracle)
e850a5c406 x86/topology: convert to use arch_cpu_is_hotpluggable()
Convert x86 to use the arch_cpu_is_hotpluggable() helper rather than
arch_register_cpu().

Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R3w-00Cszy-6k@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06 12:41:49 +09:00
Russell King (Oracle)
b0b26bc580 x86/topology: use weak version of arch_unregister_cpu()
Since the x86 version of arch_unregister_cpu() is the same as the weak
version, drop the x86 specific version.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R3r-00Cszs-2R@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06 12:41:49 +09:00
James Morse
5b95f94c3b x86/topology: Switch over to GENERIC_CPU_DEVICES
Now that GENERIC_CPU_DEVICES calls arch_register_cpu(), which can be
overridden by the arch code, switch over to this to allow common code
to choose when the register_cpu() call is made.

x86's struct cpus come from struct x86_cpu, which has no other members
or users. Remove this and use the version defined by common code.

This is an intermediate step to the logic being moved to drivers/acpi,
where GENERIC_CPU_DEVICES will do the work when booting with acpi=off.

This patch also has the effect of moving the registration of CPUs from
subsys to driver core initialisation, prior to any initcalls running.

----
Changes since RFC:
 * Fixed the second copy of arch_register_cpu() used for non-hotplug
Changes since RFC v2:
 * Remove duplicate of the weak generic arch_register_cpu(), spotted
   by Jonathan Cameron. Add note about initialisation order change.
Changes since RFC v3:
 * Adapt to removal of EXPORT_SYMBOL()s

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R3l-00Cszm-UA@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06 12:41:49 +09:00
Russell King (Oracle)
9aa9b4fcc3 x86/topology: remove arch_*register_cpu() exports
arch_register_cpu() and arch_unregister_cpu() are not used by anything
that can be a module - they are used by drivers/base/cpu.c and
drivers/acpi/acpi_processor.c, neither of which can be a module.

Remove the exports.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R2r-00Csyh-7B@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06 12:41:48 +09:00
James Morse
d87c49377d x86: intel_epb: Don't rely on link order
intel_epb_init() is called as a subsys_initcall() to register cpuhp
callbacks. The callbacks make use of get_cpu_device() which will return
NULL unless register_cpu() has been called. register_cpu() is called
from topology_init(), which is also a subsys_initcall().

This is fragile. Moving the register_cpu() to a different
subsys_initcall()  leads to a NULL dereference during boot.

Make intel_epb_init() a late_initcall(), user-space can't provide a
policy before this point anyway.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R2m-00Csyb-2S@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06 12:41:48 +09:00
Borislav Petkov (AMD)
9c21ea53e6 x86/microcode/intel: Set new revision only after a successful update
This was meant to be done only when early microcode got updated
successfully. Move it into the if-branch.

Also, make sure the current revision is read unconditionally and only
once.

Fixes: 080990aa33 ("x86/microcode: Rework early revisions reporting")
Reported-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Link: https://lore.kernel.org/r/ZWjVt5dNRjbcvlzR@a4bf019067fa.jf.intel.com
2023-12-03 11:49:53 +01:00
Borislav Petkov (AMD)
9b8493dc43 x86/CPU/AMD: Check vendor in the AMD microcode callback
Commit in Fixes added an AMD-specific microcode callback. However, it
didn't check the CPU vendor the kernel runs on explicitly.

The only reason the Zenbleed check in it didn't run on other x86 vendors
hardware was pure coincidental luck:

  if (!cpu_has_amd_erratum(c, amd_zenbleed))
	  return;

gives true on other vendors because they don't have those families and
models.

However, with the removal of the cpu_has_amd_erratum() in

  05f5f73936 ("x86/CPU/AMD: Drop now unused CPU erratum checking function")

that coincidental condition is gone, leading to the zenbleed check
getting executed on other vendors too.

Add the explicit vendor check for the whole callback as it should've
been done in the first place.

Fixes: 522b1d6921 ("x86/cpu/amd: Add a Zenbleed fix")
Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231201184226.16749-1-bp@alien8.de
2023-12-02 11:40:24 +01:00
Uros Bizjak
fc50065325 x86/callthunks: Correct calculation of dest address in is_callthunk()
GCC didn't warn on the invalid use of relocation destination
pointer, so the calculated destination value was applied to
the uninitialized pointer location in error.

Fixes: 17bce3b2ae ("x86/callthunks: Handle %rip-relative relocations in call thunk template")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Closes: https://lore.kernel.org/lkml/20231201035457.GA321497@dev-arch.thelio-3990X/
Link: https://lore.kernel.org/r/20231201085727.3647051-1-ubizjak@gmail.com
2023-12-02 10:51:28 +01:00
Ilpo Järvinen
197e0da1f1 x86/pci: Use PCI_HEADER_TYPE_* instead of literals
Replace 0x7f and 0x80 literals with PCI_HEADER_TYPE_* defines.

Link: https://lore.kernel.org/r/20231124090919.23687-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-12-01 15:00:43 -06:00
Ashok Raj
1f693ef550 x86/microcode/intel: Remove redundant microcode late updated message
After successful update, the late loading routine prints an update
summary similar to:

  microcode: load: updated on 128 primary CPUs with 128 siblings
  microcode: revision: 0x21000170 -> 0x21000190

Remove the redundant message in the Intel side of the driver.

  [ bp: Massage commit message. ]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/ZWjYhedNfhAUmt0k@a4bf019067fa.jf.intel.com
2023-12-01 18:52:01 +01:00
Uros Bizjak
9e9d673b2c x86/smp: Use atomic_try_cmpxchg in native_stop_other_cpus()
Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old.

X86 CMPXCHG instruction returns success in ZF flag, so this change saves a
compare after the CMPXCHG.

Tested by building a native Fedora-38 kernel and rebooting
a 12-way SMP system using "shutdown -r" command some 100 times.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20231123203605.3474745-2-ubizjak@gmail.com
2023-11-30 20:25:09 +01:00
Uros Bizjak
9d1c8f2153 x86/smp: Move the call to smp_processor_id() after the early exit in native_stop_other_cpus()
Improve code generation in native_stop_other_cpus() a tiny bit:
smp_processor_id() accesses a per-CPU variable, so the compiler
is not able to move the call after the early exit on its own.

Also rename the "cpu" variable to a more descriptive "this_cpu", and
use 'cpu' as a separate iterator variable later in the function.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20231123203605.3474745-1-ubizjak@gmail.com
2023-11-30 20:25:09 +01:00
Ingo Molnar
6724ba89e0 x86/callthunks: Mark apply_relocation() as __init_or_module
Do it like the rest of the methods using it.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20231105213731.1878100-3-ubizjak@gmail.com
2023-11-30 20:15:51 +01:00
Uros Bizjak
0978d64f94 x86/acpi: Use %rip-relative addressing in wakeup_64.S
This is a "nice-to-have" change with minor code generation benefits:

 - Instruction with %rip-relative address operand is one byte shorter than
   its absolute address counterpart,

 - it is also compatible with position independent executable (-fpie) builds,

 - it is also consistent with what the compiler emits by default when
   a symbol is accessed.

No functional changes intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20231103104900.409470-1-ubizjak@gmail.com
2023-11-30 20:09:49 +01:00
Uros Bizjak
17bce3b2ae x86/callthunks: Handle %rip-relative relocations in call thunk template
Contrary to alternatives, relocations are currently not supported in
call thunk templates.  Re-use the existing infrastructure from
alternative.c to allow %rip-relative relocations when copying call
thunk template from its storage location.

The patch allows unification of ASM_INCREMENT_CALL_DEPTH, which already
uses PER_CPU_VAR macro, with INCREMENT_CALL_DEPTH, used in call thunk
template, which is currently limited to use absolute address.

Reuse existing relocation infrastructure from alternative.c.,
as suggested by Peter Zijlstra.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20231105213731.1878100-3-ubizjak@gmail.com
2023-11-30 20:06:17 +01:00
Ashwin Dayanand Kamat
27d25348d4 x86/sev: Fix kernel crash due to late update to read-only ghcb_version
A write-access violation page fault kernel crash was observed while running
cpuhotplug LTP testcases on SEV-ES enabled systems. The crash was
observed during hotplug, after the CPU was offlined and the process
was migrated to different CPU. setup_ghcb() is called again which
tries to update ghcb_version in sev_es_negotiate_protocol(). Ideally this
is a read_only variable which is initialised during booting.

Trying to write it results in a pagefault:

  BUG: unable to handle page fault for address: ffffffffba556e70
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0003) - permissions violation
  [ ...]
  Call Trace:
   <TASK>
   ? __die_body.cold+0x1a/0x1f
   ? __die+0x2a/0x35
   ? page_fault_oops+0x10c/0x270
   ? setup_ghcb+0x71/0x100
   ? __x86_return_thunk+0x5/0x6
   ? search_exception_tables+0x60/0x70
   ? __x86_return_thunk+0x5/0x6
   ? fixup_exception+0x27/0x320
   ? kernelmode_fixup_or_oops+0xa2/0x120
   ? __bad_area_nosemaphore+0x16a/0x1b0
   ? kernel_exc_vmm_communication+0x60/0xb0
   ? bad_area_nosemaphore+0x16/0x20
   ? do_kern_addr_fault+0x7a/0x90
   ? exc_page_fault+0xbd/0x160
   ? asm_exc_page_fault+0x27/0x30
   ? setup_ghcb+0x71/0x100
   ? setup_ghcb+0xe/0x100
   cpu_init_exception_handling+0x1b9/0x1f0

The fix is to call sev_es_negotiate_protocol() only in the BSP boot phase,
and it only needs to be done once in any case.

[ mingo: Refined the changelog. ]

Fixes: 95d33bfaa3 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Ashwin Dayanand Kamat <ashwin.kamat@broadcom.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/1701254429-18250-1-git-send-email-kashwindayan@vmware.com
2023-11-30 10:23:12 +01:00
Borislav Petkov (AMD)
05f5f73936 x86/CPU/AMD: Drop now unused CPU erratum checking function
Bye bye.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-14-bp@alien8.de
2023-11-29 12:13:53 +01:00
Borislav Petkov (AMD)
794c68b204 x86/CPU/AMD: Get rid of amd_erratum_1485[]
No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-13-bp@alien8.de
2023-11-29 12:13:31 +01:00
Borislav Petkov (AMD)
b3ffbbd282 x86/CPU/AMD: Get rid of amd_erratum_400[]
Setting X86_BUG_AMD_E400 in init_amd() is early enough.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-12-bp@alien8.de
2023-11-29 12:13:23 +01:00
Borislav Petkov (AMD)
1709528f73 x86/CPU/AMD: Get rid of amd_erratum_383[]
Set it in init_amd_gh() unconditionally as that is the F10h init
function.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-11-bp@alien8.de
2023-11-29 12:13:08 +01:00
Borislav Petkov (AMD)
54c33e23f7 x86/CPU/AMD: Get rid of amd_erratum_1054[]
No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-10-bp@alien8.de
2023-11-29 12:12:55 +01:00
Borislav Petkov (AMD)
bfff3c6692 x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init function
No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-9-bp@alien8.de
2023-11-29 12:12:42 +01:00
Borislav Petkov (AMD)
f69759be25 x86/CPU/AMD: Move Zenbleed check to the Zen2 init function
Prefix it properly so that it is clear which generation it is dealing
with.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: http://lore.kernel.org/r/20231120104152.13740-8-bp@alien8.de
2023-11-29 12:12:34 +01:00
Borislav Petkov (AMD)
7c81ad8e8b x86/CPU/AMD: Rename init_amd_zn() to init_amd_zen_common()
Call it from all Zen init functions.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-7-bp@alien8.de
2023-11-29 12:12:27 +01:00
Borislav Petkov (AMD)
cfbf4f992b x86/CPU/AMD: Call the spectral chicken in the Zen2 init function
No functional change.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-6-bp@alien8.de
2023-11-29 12:12:20 +01:00
Borislav Petkov (AMD)
0da91912fc x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init function
No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-5-bp@alien8.de
2023-11-29 12:11:59 +01:00
Borislav Petkov (AMD)
affc66cb96 x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init function
No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-4-bp@alien8.de
2023-11-29 12:11:44 +01:00
Borislav Petkov (AMD)
a7c32a1ae9 x86/CPU/AMD: Carve out the erratum 1386 fix
Call it on the affected CPU generations.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-3-bp@alien8.de
2023-11-29 12:11:21 +01:00
Borislav Petkov (AMD)
30fa92832f x86/CPU/AMD: Add ZenX generations flags
Add X86_FEATURE flags for each Zen generation. They should be used from
now on instead of checking f/m/s.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/r/20231120104152.13740-2-bp@alien8.de
2023-11-29 12:11:01 +01:00
Muralidhara M K
47b744ea5e x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank types
Add HWID and McaType values for new SMCA bank types.

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231102114225.2006878-3-muralimk@amd.com
2023-11-28 16:26:55 +01:00
Yazen Ghannam
ff03ff328f x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
The long names of the SMCA banks are only used by the MCE decoder
module.

Move them out of the arch code and into the decoder module.

  [ bp: Name the long names array "smca_long_names", drop local ptr in
    decode_smca_error(), constify arrays. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231118193248.1296798-5-yazen.ghannam@amd.com
2023-11-27 12:16:51 +01:00
Linus Torvalds
4892711ace Fix/enhance x86 microcode version reporting: fix the bootup log spam,
and remove the driver version announcement to avoid version
 confusion when distros backport fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmVjFKgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1h/7hAAs5hL1KvrfdW0VpAW91MbX6mtIe7Emc8T
 LCiBJtl9UngRdASUC9CGrcIZ5JIps0702gAq0qPVzk5zKxC22ySWsqMZybask+eF
 d6E7amMtF+KX0wiCZSuC66StCKA08JfrUXgxvYHnxDjNqERYFmVr1QabGL1IN5lZ
 KUrVUyvN8VOnzypOiQ98lXGWDJYwaV7t+IzMMh7mT5OUkoo09e6tFm7IF+NWg5xe
 NYCcvZqyo0Ipld7HOjlGHYG+blFkDxJpfTby5UevZybXsPd00cxBzDSR9zs1sYeG
 Kt6cgwDhfewfcM1QFVvNV/SDVsPp1BlVvMUa6Xa3vtsnWCit8zQqMbkYYWUaTcIh
 yUJvtzh/xZQtxaQ8Z8SbI7EhUBOFJXoHWV9JoEe3gWsWA5thu+4iOCh/P8C2ON3n
 6kLSgNQ4GAylH34MWoS84t2Jxv7XmNZljR/78ucRQrJ1JJIEA+r4sJ9hK9btxqf2
 0n86StHuwtNXSQwEhDcacqUpFPLZ65Za1Y9AXc69CDuiwj3DvTVBMwjEOBbnGTrZ
 dL9QOYG5gkklOx4o5ePj7RoLrzz/j6dj6idmu8FxZZ4q+QB9vvL2lRHusJnUEloE
 yxR7WWOB/kyUZT7FriLHRuEP7yQNRSvLs6U7b8uXCHiGcAb2mN0fFi2m/BcJGS4A
 hHA0t9WyNBM=
 =eYQ4
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode fixes from Ingo Molnar:
 "Fix/enhance x86 microcode version reporting: fix the bootup log spam,
  and remove the driver version announcement to avoid version confusion
  when distros backport fixes"

* tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Rework early revisions reporting
  x86/microcode: Remove the driver announcement and version
2023-11-26 08:42:42 -08:00
James Morse
5bfa0e45e9 x86/cpu/intel_epb: Don't rely on link order
intel_epb_init() is called as a subsys_initcall() to register cpuhp
callbacks. The callbacks make use of get_cpu_device() which will return
NULL unless register_cpu() has been called. register_cpu() is called
from topology_init(), which is also a subsys_initcall().

This is fragile. Moving the register_cpu() to a different
subsys_initcall() leads to a NULL dereference during boot.

Make intel_epb_init() a late_initcall(), user-space can't provide a
policy before this point anyway.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-24 13:54:31 +01:00
Arnd Bergmann
f717a8d164 arch: include linux/cpu.h for trap_init() prototype
some architectures run into a -Wmissing-prototypes warning
for trap_init()

arch/microblaze/kernel/traps.c:21:6: warning: no previous prototype for 'trap_init' [-Wmissing-prototypes]

Include the right header to avoid this consistently, removing
the extra declarations on m68k and x86 that were added as local
workarounds already.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-11-23 11:32:31 +01:00
Adrian Huang
5e1c8a47fc x86/ioapic: Remove unfinished sentence from comment
[ mingo: Refine changelog. ]

Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
2023-11-23 11:00:24 +01:00
Yazen Ghannam
6175b40775 x86/mce/inject: Clear test status value
AMD systems generally allow MCA "simulation" where MCA registers can be
written with valid data and the full MCA handling flow can be tested by
software.

However, the platform on Scalable MCA systems, can prevent software from
writing data to the MCA registers. There is no architectural way to
determine this configuration. Therefore, the MCE injection module will
check for this behavior by writing and reading back a test status value.
This is done during module init, and the check can run on any CPU with
any valid MCA bank.

If MCA_STATUS writes are ignored by the platform, then there are no side
effects on the hardware state.

If the writes are not ignored, then the test status value will remain in
the hardware MCA_STATUS register. It is likely that the value will not
be overwritten by hardware or software, since the tested CPU and bank
are arbitrary. Therefore, the user may see a spurious, synthetic MCA
error reported whenever MCA is polled for this CPU.

Clear the test value immediately after writing it. It is very unlikely
that a valid MCA error is logged by hardware during the test. Errors
that cause an #MC won't be affected.

Fixes: 891e465a1b ("x86/mce: Check whether writes to MCA_STATUS are getting ignored")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231118193248.1296798-2-yazen.ghannam@amd.com
2023-11-22 19:13:38 +01:00
Linus Torvalds
05c8c94ed4 hyperv-fixes for 6.7-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmVdgqYTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXhBsCACzUGLF3vOQdrmgTMymzaaOzfLJtvNW
 oQ34FwMJMOAyJ6FxM12IJPHA2j+azl9CPjQc5O6F2CBcF8hVj2mDIINQIi+4wpV5
 FQv445g2KFml/+AJr/1waz1GmhHtr1rfu7B7NX6tPUtOpxKN7AHAQXWYmHnwK8BJ
 5Mh2a/7Lphjin4M1FWCeBTj0JtqF1oVAW2L9jsjqogq1JV0a51DIFutROtaPSC/4
 ssTLM5Rqpnw8Z1GWVYD2PObIW4A+h1LV1tNGOIoGW6mX56mPU+KmVA7tTKr8Je/i
 z3Jk8bZXFyLvPW2+KNJacbldKNcfwAFpReffNz/FO3R16Stq9Ta1mcE2
 =wXju
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - One fix for the KVP daemon (Ani Sinha)

 - Fix for the detection of E820_TYPE_PRAM in a Gen2 VM (Saurabh Sengar)

 - Micro-optimization for hv_nmi_unknown() (Uros Bizjak)

* tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()
  x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM
  hv/hv_kvp_daemon: Some small fixes for handling NM keyfiles
2023-11-22 09:56:26 -08:00
Uros Bizjak
18286883e7 x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()
Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old
in hv_nmi_unknown(). On x86 the CMPXCHG instruction returns success in
the ZF flag, so this change saves a compare after CMPXCHG. The generated
asm code improves from:

  3e:	65 8b 15 00 00 00 00 	mov    %gs:0x0(%rip),%edx
  45:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
  4a:	f0 0f b1 15 00 00 00 	lock cmpxchg %edx,0x0(%rip)
  51:	00
  52:	83 f8 ff             	cmp    $0xffffffff,%eax
  55:	0f 95 c0             	setne  %al

to:

  3e:	65 8b 15 00 00 00 00 	mov    %gs:0x0(%rip),%edx
  45:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
  4a:	f0 0f b1 15 00 00 00 	lock cmpxchg %edx,0x0(%rip)
  51:	00
  52:	0f 95 c0             	setne  %al

No functional change intended.

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20231114170038.381634-1-ubizjak@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20231114170038.381634-1-ubizjak@gmail.com>
2023-11-22 03:47:44 +00:00
Andrew Cooper
07e8f88568 x86/apic: Drop apic::delivery_mode
This field is set to APIC_DELIVERY_MODE_FIXED in all cases, and is read
exactly once.  Fold the constant in uv_program_mmr() and drop the field.

Searching for the origin of the stale HyperV comment reveals commit
a31e58e129 ("x86/apic: Switch all APICs to Fixed delivery mode") which
notes:

  As a consequence of this change, the apic::irq_delivery_mode field is
  now pointless, but this needs to be cleaned up in a separate patch.

6 years is long enough for this technical debt to have survived.

  [ bp: Fold in
    https://lore.kernel.org/r/20231121123034.1442059-1-andrew.cooper3@citrix.com
  ]

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20231102-x86-apic-v1-1-bf049a2a0ed6@citrix.com
2023-11-21 16:58:54 +01:00
Borislav Petkov (AMD)
080990aa33 x86/microcode: Rework early revisions reporting
The AMD side of the loader issues the microcode revision for each
logical thread on the system, which can become really noisy on huge
machines. And doing that doesn't make a whole lot of sense - the
microcode revision is already in /proc/cpuinfo.

So in case one is interested in the theoretical support of mixed silicon
steppings on AMD, one can check there.

What is also missing on the AMD side - something which people have
requested before - is showing the microcode revision the CPU had
*before* the early update.

So abstract that up in the main code and have the BSP on each vendor
provide those revision numbers.

Then, dump them only once on driver init.

On Intel, do not dump the patch date - it is not needed.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/CAHk-=wg=%2B8rceshMkB4VnKxmRccVLtBLPBawnewZuuqyx5U=3A@mail.gmail.com
2023-11-21 16:35:48 +01:00