The number of commits for documentation is not huge this time around, but
there are some significant changes nonetheless:
 
 - Some more Spanish-language and Chinese translations.
 
 - The much-discussed documentation of the confidential-computing threat
   model.
 
 - Powerpc and RISCV documentation move under Documentation/arch - these
   complete this particular bit of documentation churn.
 
 - A large traditional-Chinese documentation update.
 
 - A new document on backporting and conflict resolution.
 
 - Some kernel-doc and Sphinx fixes.
 
 Plus the usual smattering of smaller updates and typo fixes.

Merge tag 'docs-6.7' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "The number of commits for documentation is not huge this time around,
  but there are some significant changes nonetheless:

   - Some more Spanish-language and Chinese translations

   - The much-discussed documentation of the confidential-computing
     threat model

   - Powerpc and RISCV documentation move under Documentation/arch -
     these complete this particular bit of documentation churn

   - A large traditional-Chinese documentation update

   - A new document on backporting and conflict resolution

   - Some kernel-doc and Sphinx fixes

  Plus the usual smattering of smaller updates and typo fixes"

* tag 'docs-6.7' of git://git.lwn.net/linux: (40 commits)
  scripts/kernel-doc: Fix the regex for matching -Werror flag
  docs: backporting: address feedback
  Documentation: driver-api: pps: Update PPS generator documentation
  speakup: Document USB support
  doc: blk-ioprio: Bring the doc in line with the implementation
  docs: usb: fix reference to nonexistent file in UVC Gadget
  docs: doc-guide: mention 'make refcheckdocs'
  Documentation: fix typo in dynamic-debug howto
  scripts/kernel-doc: match -Werror flag strictly
  Documentation/sphinx: Remove the repeated word "the" in comments.
  docs: sparse: add SPDX-License-Identifier
  docs/zh_CN: Add subsystem-apis Chinese translation
  docs/zh_TW: update contents for zh_TW
  docs: submitting-patches: encourage direct notifications to commenters
  docs: add backporting and conflict resolution document
  docs: move riscv under arch
  docs: update link to powerpc/vmemmap_dedup.rst
  mm/memory-hotplug: fix typo in documentation
  docs: move powerpc under arch
  PCI: Update the devres documentation regarding to pcim_*()
  ...
Linus Torvalds 2023-11-01 17:11:41 -10:00
commit babe393974
190 changed files with 9304 additions and 2179 deletions

View File

@ -8,7 +8,7 @@ Description:
more bits set in the dimm-health-bitmap retrieved in
response to H_SCM_HEALTH hcall. The details of the bit
flags returned in response to this hcall is available
at 'Documentation/powerpc/papr_hcalls.rst' . Below are
at 'Documentation/arch/powerpc/papr_hcalls.rst' . Below are
the flags reported in this sysfs file:
* "not_armed"

View File

@ -364,7 +364,7 @@ Note, however, not all failures are truly "permanent". Some are
caused by over-heating, some by a poorly seated card. Many
PCI error events are caused by software bugs, e.g. DMAs to
wild addresses or bogus split transactions due to programming
errors. See the discussion in Documentation/powerpc/eeh-pci-error-recovery.rst
errors. See the discussion in Documentation/arch/powerpc/eeh-pci-error-recovery.rst
for additional detail on real-life experience of the causes of
software errors.
@ -404,7 +404,7 @@ That is, the recovery API only requires that:
.. note::
Implementation details for the powerpc platform are discussed in
the file Documentation/powerpc/eeh-pci-error-recovery.rst
the file Documentation/arch/powerpc/eeh-pci-error-recovery.rst
As of this writing, there is a growing list of device drivers with
patches implementing error recovery. Not all of these patches are in

View File

@ -2030,7 +2030,7 @@ IO Priority
~~~~~~~~~~~
A single attribute controls the behavior of the I/O priority cgroup policy,
namely the blkio.prio.class attribute. The following values are accepted for
namely the io.prio.class attribute. The following values are accepted for
that attribute:
no-change
@ -2059,9 +2059,11 @@ The following numerical values are associated with the I/O priority policies:
+----------------+---+
| no-change | 0 |
+----------------+---+
| rt-to-be | 2 |
| promote-to-rt | 1 |
+----------------+---+
| all-to-idle | 3 |
| restrict-to-be | 2 |
+----------------+---+
| idle | 3 |
+----------------+---+
The numerical value that corresponds to each I/O priority class is as follows:
@ -2081,7 +2083,7 @@ The algorithm to set the I/O priority class for a request is as follows:
- If I/O priority class policy is promote-to-rt, change the request I/O
priority class to IOPRIO_CLASS_RT and change the request I/O priority
level to 4.
- If I/O priorityt class is not promote-to-rt, translate the I/O priority
- If I/O priority class policy is not promote-to-rt, translate the I/O priority
class policy into a number, then change the request I/O priority class
into the maximum of the I/O priority class policy number and the numerical
I/O priority class.
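The rule described in this hunk can be sketched in a few lines of Python. This is only an illustration of the documented algorithm, not the kernel implementation. The policy numbers are taken from the table in this hunk; the I/O priority class numbers (NONE=0, RT=1, BE=2, IDLE=3) and the names `apply_policy` and `POLICY_NUMBER` are assumptions made for the example.

```python
from enum import IntEnum


class IoprioClass(IntEnum):
    # Assumed numerical values of the I/O priority classes.
    NONE = 0
    RT = 1
    BE = 2
    IDLE = 3


# Policy numbers as listed in the documentation table.
POLICY_NUMBER = {
    "no-change": 0,
    "promote-to-rt": 1,
    "restrict-to-be": 2,
    "idle": 3,
}


def apply_policy(policy, req_class, req_level):
    """Return the (class, level) a request ends up with under `policy`."""
    if policy not in POLICY_NUMBER:
        raise ValueError(f"unknown policy: {policy}")
    if policy == "promote-to-rt":
        # Documented special case: force RT class with priority level 4.
        return IoprioClass.RT, 4
    # Otherwise take the maximum of the policy number and the class number.
    new_class = max(POLICY_NUMBER[policy], int(req_class))
    return IoprioClass(new_class), req_level
```

Because a larger class number means a lower priority, taking the maximum can only lower a request's priority class, which is why `promote-to-rt` needs its own branch.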

View File

@ -259,7 +259,7 @@ Debug Messages at Module Initialization Time
When ``modprobe foo`` is called, modprobe scans ``/proc/cmdline`` for
``foo.params``, strips ``foo.``, and passes them to the kernel along with
params given in modprobe args or ``/etc/modprob.d/*.conf`` files,
params given in modprobe args or ``/etc/modprobe.d/*.conf`` files,
in the following order:
1. parameters given via ``/etc/modprobe.d/*.conf``::

View File

@ -15,7 +15,7 @@ between architectures is in drivers/firmware/efi/libstub.
For arm64, there is no compressed kernel support, so the Image itself
masquerades as a PE/COFF image and the EFI stub is linked into the
kernel. The arm64 EFI stub lives in arch/arm64/kernel/efi-entry.S
kernel. The arm64 EFI stub lives in drivers/firmware/efi/libstub/arm64.c
and drivers/firmware/efi/libstub/arm64-stub.c.
By using the EFI boot stub it's possible to boot a Linux kernel

View File

@ -102,9 +102,19 @@ The possible values in this file are:
* - 'Vulnerable'
- The processor is vulnerable, but no mitigation enabled
* - 'Vulnerable: Clear CPU buffers attempted, no microcode'
- The processor is vulnerable but microcode is not updated.
- The processor is vulnerable but microcode is not updated. The
mitigation is enabled on a best effort basis.
The mitigation is enabled on a best effort basis. See :ref:`vmwerv`
If the processor is vulnerable but the availability of the microcode
based mitigation mechanism is not advertised via CPUID, the kernel
selects a best effort mitigation mode. This mode invokes the mitigation
instructions without a guarantee that they clear the CPU buffers.
This is done to address virtualization scenarios where the host has the
microcode update applied, but the hypervisor is not yet updated to
expose the CPUID to the guest. If the host has updated microcode the
protection takes effect; otherwise a few CPU cycles are wasted
pointlessly.
* - 'Mitigation: Clear CPU buffers'
- The processor is vulnerable and the CPU buffer clearing mitigation is
enabled.
@ -119,24 +129,6 @@ to the above information:
'SMT Host state unknown' Kernel runs in a VM, Host SMT state unknown
======================== ============================================
.. _vmwerv:
Best effort mitigation mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the processor is vulnerable, but the availability of the microcode based
mitigation mechanism is not advertised via CPUID the kernel selects a best
effort mitigation mode. This mode invokes the mitigation instructions
without a guarantee that they clear the CPU buffers.
This is done to address virtualization scenarios where the host has the
microcode update applied, but the hypervisor is not yet updated to expose
the CPUID to the guest. If the host has updated microcode the protection
takes effect otherwise a few cpu cycles are wasted pointlessly.
The state in the mds sysfs file reflects this situation accordingly.
Mitigation mechanism
-------------------------

View File

@ -225,8 +225,19 @@ The possible values in this file are:
* - 'Vulnerable'
- The processor is vulnerable, but no mitigation enabled
* - 'Vulnerable: Clear CPU buffers attempted, no microcode'
- The processor is vulnerable, but microcode is not updated. The
- The processor is vulnerable but microcode is not updated. The
mitigation is enabled on a best effort basis.
If the processor is vulnerable but the availability of the microcode
based mitigation mechanism is not advertised via CPUID, the kernel
selects a best effort mitigation mode. This mode invokes the mitigation
instructions without a guarantee that they clear the CPU buffers.
This is done to address virtualization scenarios where the host has the
microcode update applied, but the hypervisor is not yet updated to
expose the CPUID to the guest. If the host has updated microcode the
protection takes effect; otherwise a few CPU cycles are wasted
pointlessly.
* - 'Mitigation: Clear CPU buffers'
- The processor is vulnerable and the CPU buffer clearing mitigation is
enabled.

View File

@ -98,7 +98,19 @@ The possible values in this file are:
* - 'Vulnerable'
- The CPU is affected by this vulnerability and the microcode and kernel mitigation are not applied.
* - 'Vulnerable: Clear CPU buffers attempted, no microcode'
- The system tries to clear the buffers but the microcode might not support the operation.
- The processor is vulnerable but microcode is not updated. The
mitigation is enabled on a best effort basis.
If the processor is vulnerable but the availability of the microcode
based mitigation mechanism is not advertised via CPUID, the kernel
selects a best effort mitigation mode. This mode invokes the mitigation
instructions without a guarantee that they clear the CPU buffers.
This is done to address virtualization scenarios where the host has the
microcode update applied, but the hypervisor is not yet updated to
expose the CPUID to the guest. If the host has updated microcode the
protection takes effect; otherwise a few CPU cycles are wasted
pointlessly.
* - 'Mitigation: Clear CPU buffers'
- The microcode has been updated to clear the buffers. TSX is still enabled.
* - 'Mitigation: TSX disabled'
@ -106,25 +118,6 @@ The possible values in this file are:
* - 'Not affected'
- The CPU is not affected by this issue.
.. _ucode_needed:
Best effort mitigation mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the processor is vulnerable, but the availability of the microcode-based
mitigation mechanism is not advertised via CPUID the kernel selects a best
effort mitigation mode. This mode invokes the mitigation instructions
without a guarantee that they clear the CPU buffers.
This is done to address virtualization scenarios where the host has the
microcode update applied, but the hypervisor is not yet updated to expose the
CPUID to the guest. If the host has updated microcode the protection takes
effect; otherwise a few CPU cycles are wasted pointlessly.
The state in the tsx_async_abort sysfs file reflects this situation
accordingly.
Mitigation mechanism
--------------------

View File

@ -75,7 +75,7 @@ Memory hotunplug consists of two phases:
(1) Offlining memory blocks
(2) Removing the memory from Linux
In the fist phase, memory is "hidden" from the page allocator again, for
In the first phase, memory is "hidden" from the page allocator again, for
example, by migrating busy memory to other memory locations and removing all
relevant free pages from the page allocator. After this phase, the memory is no
longer visible in memory statistics of the system.
@ -250,15 +250,15 @@ Observing the State of Memory Blocks
The state (online/offline/going-offline) of a memory block can be observed
either via::
% cat /sys/device/system/memory/memoryXXX/state
% cat /sys/devices/system/memory/memoryXXX/state
Or alternatively (1/0) via::
% cat /sys/device/system/memory/memoryXXX/online
% cat /sys/devices/system/memory/memoryXXX/online
For an online memory block, the managing zone can be observed via::
% cat /sys/device/system/memory/memoryXXX/valid_zones
% cat /sys/devices/system/memory/memoryXXX/valid_zones
Configuring Memory Hot(Un)Plug
==============================
@ -326,7 +326,7 @@ however, a memory block might span memory holes. A memory block spanning memory
holes cannot be offlined.
For example, assume 1 GiB memory block size. A device for a memory starting at
0x100000000 is ``/sys/device/system/memory/memory4``::
0x100000000 is ``/sys/devices/system/memory/memory4``::
(0x100000000 / 1 GiB = 4)
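The arithmetic in this example can be reproduced with a small Python helper. It is a sketch only: the function name `memory_block_path` is made up for illustration, and the block size has to be supplied by the caller.

```python
SYSFS_MEMORY_DIR = "/sys/devices/system/memory"


def memory_block_path(phys_addr, block_size):
    """Return the sysfs directory of the memory block containing phys_addr."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # The block number is the physical address divided by the block size.
    index = phys_addr // block_size
    return f"{SYSFS_MEMORY_DIR}/memory{index}"


# 1 GiB blocks, memory starting at 0x100000000 (4 GiB).
print(memory_block_path(0x100000000, 1 << 30))
```

With a 1 GiB block size the call resolves to the `memory4` directory, matching the division shown in the documentation.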

View File

@ -7,7 +7,7 @@ Last modified on Mon Sep 27 14:26:31 2010
Document version 1.3
Copyright (c) 2005 Gene Collins
Copyright (c) 2008 Samuel Thibault
Copyright (c) 2008, 2023 Samuel Thibault
Copyright (c) 2009, 2010 the Speakup Team
Permission is granted to copy, distribute and/or modify this document
@ -83,8 +83,7 @@ spkout -- Speak Out
txprt -- Transport
dummy -- Plain text terminal
Note: Speakup does * NOT * support usb connections! Speakup also does *
NOT * support the internal Tripletalk!
Note: Speakup does * NOT * support the internal Tripletalk!
Speakup does support two other synthesizers, but because they work in
conjunction with other software, they must be loaded as modules after
@ -94,6 +93,12 @@ These are as follows:
decpc -- DecTalk PC (not available at boot up)
soft -- One of several software synthesizers (not available at boot up)
By default speakup looks for the synthesizer on the ttyS0 serial port. This can
be changed with the device parameter of the modules, for instance for
DoubleTalk LT:
speakup_ltlk.dev=ttyUSB0
See the sections on loading modules and software synthesizers later in
this manual for further details. It should be noted here that the
speakup.synth boot parameter will have no effect if Speakup has been

View File

@ -42,16 +42,16 @@ pre-allocation or re-sizing of any kernel data structures.
dentry-state
------------
This file shows the values in ``struct dentry_stat``, as defined in
``linux/include/linux/dcache.h``::
This file shows the values in ``struct dentry_stat_t``, as defined in
``fs/dcache.c``::
struct dentry_stat_t dentry_stat {
int nr_dentry;
int nr_unused;
int age_limit; /* age in seconds */
int want_pages; /* pages requested by system */
int nr_negative; /* # of unused negative dentries */
int dummy; /* Reserved for future use */
long nr_dentry;
long nr_unused;
long age_limit; /* age in seconds */
long want_pages; /* pages requested by system */
long nr_negative; /* # of unused negative dentries */
long dummy; /* Reserved for future use */
};
Dentries are dynamically allocated and deallocated.
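The six fields of the structure in this hunk appear in the file as whitespace-separated integers, so they can be read from user space with a short Python sketch. The helper names `parse_dentry_state` and `read_dentry_state` are hypothetical, and the path assumes the file is exposed under `/proc/sys/fs`.

```python
from typing import NamedTuple


class DentryStat(NamedTuple):
    # Field order mirrors the struct shown in the documentation.
    nr_dentry: int
    nr_unused: int
    age_limit: int    # age in seconds
    want_pages: int   # pages requested by system
    nr_negative: int  # number of unused negative dentries
    dummy: int        # reserved for future use


def parse_dentry_state(text):
    """Parse the whitespace-separated contents of the dentry-state file."""
    values = [int(v) for v in text.split()]
    if len(values) != len(DentryStat._fields):
        raise ValueError(f"expected 6 fields, got {len(values)}")
    return DentryStat(*values)


def read_dentry_state(path="/proc/sys/fs/dentry-state"):
    """Read and parse the file; the default path is an assumption."""
    with open(path) as f:
        return parse_dentry_state(f.read())
```

Python integers are unbounded, so the change from ``int`` to ``long`` in the struct does not affect a parser like this one.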

View File

@ -742,8 +742,8 @@ overcommit_memory
This value contains a flag that enables memory overcommitment.
When this flag is 0, the kernel attempts to estimate the amount
of free memory left when userspace requests more memory.
When this flag is 0, the kernel compares the userspace memory request
size against total memory plus swap and rejects obvious overcommits.
When this flag is 1, the kernel pretends there is always enough
memory until it actually runs out.
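The difference between the two flag values described here can be sketched as a simple predicate. This is a deliberately simplified Python illustration of the documented behaviour, assuming all sizes are expressed in pages; the function name `allows_allocation` is hypothetical and the real kernel check may take further conditions into account.

```python
def allows_allocation(mode, request_pages, total_ram_pages, total_swap_pages):
    """Return True if a request would pass the documented overcommit check."""
    if mode == 1:
        # Flag 1: pretend there is always enough memory.
        return True
    if mode == 0:
        # Flag 0: refuse only requests larger than total memory plus swap.
        return request_pages <= total_ram_pages + total_swap_pages
    raise ValueError("only flag values 0 and 1 are sketched here")
```

Under flag 0 a request can still succeed even if far less memory is currently free, since only the totals are compared.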

View File

@ -18,8 +18,8 @@ implementation.
nios2/index
openrisc/index
parisc/index
../powerpc/index
../riscv/index
powerpc/index
riscv/index
s390/index
sh/index
sparc/index

View File

@ -32,7 +32,7 @@ Introduction
responsible for the initialization of the adapter, setting up the
special path for user space access, and performing error recovery. It
communicates directly with the Flash Accelerator Functional Unit (AFU)
as described in Documentation/powerpc/cxl.rst.
as described in Documentation/arch/powerpc/cxl.rst.
The cxlflash driver supports two, mutually exclusive, modes of
operation at the device (LUN) level:

View File

@ -202,7 +202,7 @@ PPC_FEATURE2_VEC_CRYPTO
PPC_FEATURE2_HTM_NOSC
System calls fail if called in a transactional state, see
Documentation/powerpc/syscall64-abi.rst
Documentation/arch/powerpc/syscall64-abi.rst
PPC_FEATURE2_ARCH_3_00
The processor supports the v3.0B / v3.0C userlevel architecture. Processors
@ -217,11 +217,11 @@ PPC_FEATURE2_DARN
PPC_FEATURE2_SCV
The scv 0 instruction may be used for system calls, see
Documentation/powerpc/syscall64-abi.rst.
Documentation/arch/powerpc/syscall64-abi.rst.
PPC_FEATURE2_HTM_NO_SUSPEND
A limited Transactional Memory facility that does not support suspend is
available, see Documentation/powerpc/transactional_memory.rst.
available, see Documentation/arch/powerpc/transactional_memory.rst.
PPC_FEATURE2_ARCH_3_1
The processor supports the v3.1 userlevel architecture. Processors

View File

@ -56,7 +56,7 @@ sent to the software queue.
Then, after the requests are processed by software queues, they will be placed
at the hardware queue, a second stage queue where the hardware has direct access
to process those requests. However, if the hardware does not have enough
resources to accept more requests, blk-mq will places requests on a temporary
resources to accept more requests, blk-mq will place requests on a temporary
queue, to be sent in the future, when the hardware is able.
Software staging queues

View File

@ -138,6 +138,10 @@ times, but it's highly important. If we can actually eliminate warnings
from the documentation build, then we can start expecting developers to
avoid adding new ones.
In addition to warnings from the regular documentation build, you can also
run ``make refcheckdocs`` to find references to nonexistent documentation
files.
Languishing kerneldoc comments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@ -322,10 +322,8 @@ IOMAP
devm_platform_ioremap_resource_byname()
devm_platform_get_and_ioremap_resource()
devm_iounmap()
pcim_iomap()
pcim_iomap_regions() : do request_region() and iomap() on multiple BARs
pcim_iomap_table() : array of mapped addresses indexed by BAR
pcim_iounmap()
Note: For the PCI devices the specific pcim_*() functions may be used, see below.
IRQ
devm_free_irq()
@ -392,8 +390,16 @@ PCI
devm_pci_alloc_host_bridge() : managed PCI host bridge allocation
devm_pci_remap_cfgspace() : ioremap PCI configuration space
devm_pci_remap_cfg_resource() : ioremap PCI configuration space resource
pcim_enable_device() : after success, all PCI ops become managed
pcim_iomap() : do iomap() on a single BAR
pcim_iomap_regions() : do request_region() and iomap() on multiple BARs
pcim_iomap_regions_request_all() : do request_region() on all and iomap() on multiple BARs
pcim_iomap_table() : array of mapped addresses indexed by BAR
pcim_iounmap() : do iounmap() on a single BAR
pcim_iounmap_regions() : do iounmap() and release_region() on multiple BARs
pcim_pin_device() : keep PCI device enabled after release
pcim_set_mwi() : enable Memory-Write-Invalidate PCI transaction
PHY
devm_usb_get_phy()

View File

@ -200,11 +200,17 @@ Generators
Sometimes one needs to be able not only to catch PPS signals but to produce
them also. For example, running a distributed simulation, which requires
computers' clock to be synchronized very tightly. One way to do this is to
invent some complicated hardware solutions but it may be neither necessary
nor affordable. The cheap way is to load a PPS generator on one of the
computers (master) and PPS clients on others (slaves), and use very simple
cables to deliver signals using parallel ports, for example.
computers' clock to be synchronized very tightly.
Parallel port generator
------------------------
One way to do this is to invent some complicated hardware solutions but it
may be neither necessary nor affordable. The cheap way is to load a PPS
generator on one of the computers (master) and PPS clients on others
(slaves), and use very simple cables to deliver signals using parallel
ports, for example.
Parallel port cable pinout::

View File

@ -111,13 +111,13 @@ channel that was exported. The following properties will then be available:
duty_cycle
The active time of the PWM signal (read/write).
Value is in nanoseconds and must be less than the period.
Value is in nanoseconds and must be less than or equal to the period.
polarity
Changes the polarity of the PWM signal (read/write).
Writes to this property only work if the PWM chip supports changing
the polarity. The polarity can only be changed if the PWM is not
enabled. Value is the string "normal" or "inversed".
the polarity.
Value is the string "normal" or "inversed".
enable
Enable/disable the PWM signal (read/write).

View File

@ -1585,7 +1585,7 @@ The transaction sequence looks like this:
2. The second transaction contains a physical update to the free space btrees
of AG 3 to release the former BMBT block and a second physical update to the
free space btrees of AG 7 to release the unmapped file space.
Observe that the the physical updates are resequenced in the correct order
Observe that the physical updates are resequenced in the correct order
when possible.
Attached to the transaction is an extent free done (EFD) log item.
The EFD contains a pointer to the EFI logged in transaction #1 so that log

View File

@ -101,7 +101,7 @@ to do something different in the near future.
../doc-guide/maintainer-profile
../nvdimm/maintainer-entry-profile
../riscv/patch-acceptance
../arch/riscv/patch-acceptance
../driver-api/media/maintainer-entry-profile
../driver-api/vfio-pci-device-specific-driver-acceptance
../nvme/feature-and-quirk-policy

View File

@ -8,8 +8,7 @@ The Linux kernel supports the following overcommit handling modes
Heuristic overcommit handling. Obvious overcommits of address
space are refused. Used for a typical system. It ensures a
seriously wild allocation fails while allowing overcommit to
reduce swap usage. root is allowed to allocate slightly more
memory in this mode. This is the default.
reduce swap usage. This is the default.
1
Always overcommit. Appropriate for some scientific

View File

@ -152,3 +152,130 @@ Page table handling code that wishes to be architecture-neutral, such as the
virtual memory manager, will need to be written so that it traverses all of the
currently five levels. This style should also be preferred for
architecture-specific code, so as to be robust to future changes.
MMU, TLB, and Page Faults
=========================
The `Memory Management Unit (MMU)` is a hardware component that handles virtual
to physical address translations. It may use relatively small caches in hardware
called `Translation Lookaside Buffers (TLBs)` and `Page Walk Caches` to speed up
these translations.
When the CPU accesses a memory location, it provides a virtual address to the MMU,
which checks if there is an existing translation in the TLB or in the Page
Walk Caches (on architectures that support them). If no translation is found,
the MMU uses page walks to determine the physical address and create the map.
Each page of memory has associated permission and dirty bits. The dirty bit
for a page is set (i.e., turned on) when the page is written to, indicating
that the page has been modified since it was loaded into memory.
If nothing prevents it, eventually the physical memory can be accessed and the
requested operation on the physical frame is performed.
There are several reasons why the MMU can't find certain translations. It could
happen because the CPU is trying to access memory that the current task is not
permitted to access, or because the data is not present in physical memory.
When these conditions happen, the MMU triggers page faults, which are types of
exceptions that signal the CPU to pause the current execution and run a special
function to handle the mentioned exceptions.
There are common and expected causes of page faults. These are triggered by
process management optimization techniques called "Lazy Allocation" and
"Copy-on-Write". Page faults may also happen when frames have been swapped out
to persistent storage (swap partition or file) and evicted from their physical
locations.
These techniques improve memory efficiency, reduce latency, and minimize space
occupation. This document won't go deeper into the details of "Lazy Allocation"
and "Copy-on-Write" because these subjects are out of scope as they belong to
Process Address Management.
Swapping differs from the other mentioned techniques in that it is
undesirable: it is performed as a means to reduce memory usage under heavy
pressure.
Swapping can't work for memory mapped by kernel logical addresses. These are a
subset of the kernel virtual space that directly maps a contiguous range of
physical memory. Given any logical address, its physical address is determined
with simple arithmetic on an offset. Accesses to logical addresses are fast
because they avoid the need for complex page table lookups, at the expense of
frames not being evictable or pageable out.
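The "simple arithmetic on an offset" can be illustrated with a short Python sketch. The `PAGE_OFFSET` value below is a hypothetical base of the direct mapping chosen only for the example (the real value depends on architecture and configuration), and the function names are likewise made up.

```python
# Hypothetical start of the kernel's direct (logical) mapping.
PAGE_OFFSET = 0xFFFF_8880_0000_0000


def logical_to_phys(vaddr):
    """Translate a kernel logical address by subtracting a fixed offset."""
    if vaddr < PAGE_OFFSET:
        raise ValueError("not a kernel logical address in this sketch")
    return vaddr - PAGE_OFFSET


def phys_to_logical(paddr):
    """Inverse translation: add the same fixed offset."""
    return paddr + PAGE_OFFSET
```

No table lookup is involved in either direction, which is the property the paragraph above relies on.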
If the kernel fails to make room for the data that must be present in the
physical frames, the kernel invokes the out-of-memory (OOM) killer to make room
by terminating lower priority processes until pressure reduces under a safe
threshold.
Additionally, page faults may also be caused by code bugs or by maliciously
crafted addresses that the CPU is instructed to access. A thread of a process
could use instructions to address (non-shared) memory which does not belong to
its own address space, or could try to execute an instruction that wants to write
to a read-only location.
If the above-mentioned conditions happen in user-space, the kernel sends a
`Segmentation Fault` (SIGSEGV) signal to the current thread. That signal usually
causes the termination of the thread and of the process it belongs to.
This document is going to simplify and show a high-altitude view of how the
Linux kernel handles these page faults, creates tables and tables' entries,
checks if memory is present and, if not, requests to load data from persistent
storage or from other devices, and updates the MMU and its caches.
The first steps are architecture dependent. Most architectures jump to
`do_page_fault()`, whereas the x86 interrupt handler is defined by the
`DEFINE_IDTENTRY_RAW_ERRORCODE()` macro which calls `handle_page_fault()`.
Whatever the route, all architectures end up invoking
`handle_mm_fault()` which, in turn, (likely) ends up calling
`__handle_mm_fault()` to carry out the actual work of allocating the page
tables.
The unfortunate case of not being able to call `__handle_mm_fault()` means
that the virtual address is pointing to areas of physical memory which are not
permitted to be accessed (at least from the current context). This
condition resolves to the kernel sending the above-mentioned SIGSEGV signal
to the process and leads to the consequences already explained.
`__handle_mm_fault()` carries out its work by calling several functions to
find the entry's offsets of the upper layers of the page tables and allocate
the tables that it may need.
The functions that look for the offset have names like `*_offset()`, where the
"*" stands for pgd, p4d, pud, pmd, or pte. The functions that allocate the
corresponding tables, layer by layer, are called `*_alloc`, using the
above-mentioned convention to name them after the corresponding types of tables
in the hierarchy.
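To make the per-level offsets concrete, the following Python sketch splits a virtual address into one index per table level. It assumes an x86_64-style five-level layout with 4 KiB pages and 9 index bits per level; other architectures and configurations differ. The kernel's `*_offset()` helpers return pointers to table entries, whereas this hypothetical `table_indices` function only shows the index arithmetic.

```python
PAGE_SHIFT = 12      # assumed 4 KiB pages
BITS_PER_LEVEL = 9   # assumed 512 entries per table
LEVELS = ("pte", "pmd", "pud", "p4d", "pgd")  # lowest level first


def table_indices(vaddr):
    """Split a virtual address into per-level table indices and page offset."""
    mask = (1 << BITS_PER_LEVEL) - 1
    indices = {}
    for depth, name in enumerate(LEVELS):
        # Each level consumes the next 9 bits above the page offset.
        shift = PAGE_SHIFT + depth * BITS_PER_LEVEL
        indices[name] = (vaddr >> shift) & mask
    indices["offset"] = vaddr & ((1 << PAGE_SHIFT) - 1)
    return indices
```

Under these assumptions a PMD entry covers 2 MiB (shift 21) and a PUD entry covers 1 GiB (shift 30) of virtual address space.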
The page table walk may end at one of the middle or upper layers (PMD, PUD).
Linux supports larger page sizes than the usual 4KB (i.e., the so-called
`huge pages`). When using these kinds of larger pages, higher level pages can
directly map them, with no need to use lower level page entries (PTE). Huge
pages contain large contiguous physical regions that usually span from 2MB to
1GB. They are respectively mapped by the PMD and PUD page entries.
The huge pages bring with them several benefits like reduced TLB pressure,
reduced page table overhead, memory allocation efficiency, and performance
improvement for certain workloads. However, these benefits come with
trade-offs, like wasted memory and allocation challenges.
At the very end of the walk with allocations, if it didn't return errors,
`__handle_mm_fault()` finally calls `handle_pte_fault()`, which via `do_fault()`
performs one of `do_read_fault()`, `do_cow_fault()`, `do_shared_fault()`.
"read", "cow", "shared" give hints about the reasons and the kind of fault it's
handling.
The actual implementation of the workflow is very complex. Its design allows
Linux to handle page faults in a way that is tailored to the specific
characteristics of each architecture, while still sharing a common overall
structure.
To conclude this high-altitude view of how Linux handles page faults, let's
add that the page fault handler can be disabled and enabled respectively with
`pagefault_disable()` and `pagefault_enable()`.
Several code paths make use of the latter two functions because they need to
disable traps into the page fault handler, mostly to prevent deadlocks.

View File

@ -211,7 +211,7 @@ the device (altmap).
The following page sizes are supported in DAX: PAGE_SIZE (4K on x86_64),
PMD_SIZE (2M on x86_64) and PUD_SIZE (1G on x86_64).
For powerpc equivalent details see Documentation/powerpc/vmemmap_dedup.rst
For powerpc equivalent details see Documentation/arch/powerpc/vmemmap_dedup.rst
The differences with HugeTLB are relatively minor.

View File

@ -0,0 +1,604 @@
.. SPDX-License-Identifier: GPL-2.0
===================================
Backporting and conflict resolution
===================================
:Author: Vegard Nossum <vegard.nossum@oracle.com>
.. contents::
:local:
:depth: 3
:backlinks: none
Introduction
============
Some developers may never really have to deal with backporting patches,
merging branches, or resolving conflicts in their day-to-day work, so
when a merge conflict does pop up, it can be daunting. Luckily,
resolving conflicts is a skill like any other, and there are many useful
techniques you can use to make the process smoother and increase your
confidence in the result.
This document aims to be a comprehensive, step-by-step guide to
backporting and conflict resolution.
Applying the patch to a tree
============================
Sometimes the patch you are backporting already exists as a git commit,
in which case you just cherry-pick it directly using
``git cherry-pick``. However, if the patch comes from an email, as it
often does for the Linux kernel, you will need to apply it to a tree
using ``git am``.
If you've ever used ``git am``, you probably already know that it is
quite picky about the patch applying perfectly to your source tree. In
fact, you've probably had nightmares about ``.rej`` files and trying to
edit the patch to make it apply.
It is strongly recommended to instead find an appropriate base version
where the patch applies cleanly and *then* cherry-pick it over to your
destination tree, as this will make git output conflict markers and let
you resolve conflicts with the help of git and any other conflict
resolution tools you might prefer to use. For example, if you want to
apply a patch that just arrived on LKML to an older stable kernel, you
can apply it to the most recent mainline kernel and then cherry-pick it
to your older stable branch.
It's generally better to use the exact same base as the one the patch
was generated from, but it doesn't really matter that much as long as it
applies cleanly and isn't too far from the original base. The only
problem with applying the patch to the "wrong" base is that it may pull
in more unrelated changes in the context of the diff when cherry-picking
it to the older branch.
A good reason to prefer ``git cherry-pick`` over ``git am`` is that git
knows the precise history of an existing commit, so it will know when
code has moved around and changed the line numbers; this in turn makes
it less likely to apply the patch to the wrong place (which can result
in silent mistakes or messy conflicts).
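The workflow described above can be tried out in a throwaway repository. The following is only a minimal sketch: the file ``demo.c``, the branch names (``mainline``, ``stable``, ``author-tree``, ``backport-base``) and the commit subjects are invented for illustration, and the "email" is simulated with ``git format-patch``.

```shell
set -eu
repo=$(mktemp -d); patch=$(mktemp)
cd "$repo"
git init -q .
git checkout -q -b mainline
git config user.name "Example Dev"        # throwaway identity for the demo
git config user.email "dev@example.com"

# Common history: a nine-line file shared by both branches.
for i in 1 2 3 4 5 6 7 8 9; do echo "line $i"; done > demo.c
git add demo.c
git commit -q -m "demo: initial version"
git branch stable                         # the older branch to backport to

# Mainline moves on with an unrelated change at the top of the file.
sed 's/^line 1$/line 1 (reworked)/' demo.c > demo.tmp && mv demo.tmp demo.c
git commit -q -am "demo: rework line 1"

# Simulate a fix that was written against mainline and arrives by email.
git checkout -q -b author-tree
sed 's/^line 9$/line 9 (fixed)/' demo.c > demo.tmp && mv demo.tmp demo.c
git commit -q -am "demo: fix line 9"
git format-patch -1 --stdout > "$patch"
git checkout -q mainline
git branch -q -D author-tree

# Step 1: apply the emailed patch on a base where it applies cleanly.
git checkout -q -b backport-base mainline
git am -q "$patch"

# Step 2: cherry-pick the resulting commit onto the older branch;
# -x records where the commit came from.
git checkout -q stable
git cherry-pick -x backport-base > /dev/null
git log -1 --format=%B
```

In this sketch the two changes are far apart in the file, so the cherry-pick completes without conflicts; in a real backport this is the point where git would leave conflict markers instead of ``.rej`` files.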
If you are using `b4`_ and you are applying the patch directly from an
email, you can use ``b4 am`` with the options ``-g``/``--guess-base``
and ``-3``/``--prep-3way`` to do some of this automatically (see the
`b4 presentation`_ for more information). However, the rest of this
article will assume that you are doing a plain ``git cherry-pick``.
.. _b4: https://people.kernel.org/monsieuricon/introducing-b4-and-patch-attestation
.. _b4 presentation: https://youtu.be/mF10hgVIx9o?t=2996
Once you have the patch in git, you can go ahead and cherry-pick it into
your source tree. Don't forget to cherry-pick with ``-x`` if you want a
written record of where the patch came from!
Note that if you are submitting a patch for stable, the format is
slightly different; the first line after the subject line needs to be
either::
commit <upstream commit> upstream
or::
[ Upstream commit <upstream commit> ]
Resolving conflicts
===================
Uh-oh; the cherry-pick failed with a vaguely threatening message::
CONFLICT (content): Merge conflict
What to do now?
In general, conflicts appear when the context of the patch (i.e., the
lines being changed and/or the lines surrounding the changes) doesn't
match what's in the tree you are trying to apply the patch *to*.
For backports, what likely happened was that the branch you are
backporting from contains patches not in the branch you are backporting
to. However, the reverse is also possible. In any case, the result is a
conflict that needs to be resolved.
If your attempted cherry-pick fails with a conflict, git automatically
edits the files to include so-called conflict markers showing you where
the conflict is and how the two branches have diverged. Resolving the
conflict typically means editing the end result in such a way that it
takes into account these other commits.
Resolving the conflict can be done either by hand in a regular text
editor or using a dedicated conflict resolution tool.
Many people prefer to use their regular text editor and edit the
conflict directly, as it may be easier to understand what you're doing
and to control the final result. There are definitely pros and cons to
each method, and sometimes there's value in using both.
We will not cover using dedicated merge tools here beyond providing some
pointers to various tools that you could use:
- `Emacs Ediff mode <https://www.emacswiki.org/emacs/EdiffMode>`__
- `vimdiff/gvimdiff <https://linux.die.net/man/1/vimdiff>`__
- `KDiff3 <http://kdiff3.sourceforge.net/>`__
- `TortoiseMerge <https://tortoisesvn.net/TortoiseMerge.html>`__
- `Meld <https://meldmerge.org/help/>`__
- `P4Merge <https://www.perforce.com/products/helix-core-apps/merge-diff-tool-p4merge>`__
- `Beyond Compare <https://www.scootersoftware.com/>`__
- `IntelliJ <https://www.jetbrains.com/help/idea/resolve-conflicts.html>`__
- `VSCode <https://code.visualstudio.com/docs/editor/versioncontrol>`__
To configure git to work with these, see ``git mergetool --help`` or
the official `git-mergetool documentation`_.
.. _git-mergetool documentation: https://git-scm.com/docs/git-mergetool
Prerequisite patches
--------------------
Most conflicts happen because the branch you are backporting to is
missing some patches compared to the branch you are backporting *from*.
In the more general case (such as merging two independent branches),
development could have happened on either branch, or the branches have
simply diverged -- perhaps your older branch had some other backports
applied to it that themselves needed conflict resolutions, causing a
divergence.
It's important to always identify the commit or commits that caused the
conflict, as otherwise you cannot be confident in the correctness of
your resolution. As an added bonus, especially if the patch is in an
area you're not that familiar with, the changelogs of these commits will
often give you the context to understand the code and potential problems
or pitfalls with your conflict resolution.
git log
~~~~~~~
A good first step is to look at ``git log`` for the file that has the
conflict -- this is usually sufficient when there aren't a lot of
patches to the file, but may get confusing if the file is big and
frequently patched. You should run ``git log`` on the range of commits
between your currently checked-out branch (``HEAD``) and the parent of
the patch you are picking (``<commit>``), i.e.::
git log HEAD..<commit>^ -- <path>
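As a rough illustration of what this range selects, the sketch below builds a small scratch repository (all file names, branch names and commit subjects are made up) in which mainline has three commits on top of the stable branch, and then asks which of them touched the conflicting file before the patch being backported.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git checkout -q -b mainline
git config user.name "Example Dev"        # throwaway identity for the demo
git config user.email "dev@example.com"

printf 'old_helper();\n' > demo.c
printf 'unrelated\n' > other.c
git add demo.c other.c
git commit -q -m "demo: initial version"
git branch stable                         # the branch we backport to

# Three mainline commits; only the first and the last touch demo.c.
printf 'new_helper();\n' > demo.c
git commit -q -am "demo: rename helper"
printf 'unrelated, v2\n' > other.c
git commit -q -am "demo: touch another file"
printf 'new_helper();\nextra_check();\n' > demo.c
git commit -q -am "demo: add extra check"
fix=$(git rev-parse HEAD)                 # the patch we want to backport

# From the destination branch, list what mainline did to demo.c
# before the patch itself.
git checkout -q stable
prereqs=$(git log --format=%s "HEAD..$fix^" -- demo.c)
echo "$prereqs"
```

Only the commit that renamed the helper is listed: the change to ``other.c`` is filtered out by the path, and the patch itself is excluded by the ``^`` suffix.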
Even better, if you want to restrict this output to a single function
(because that's where the conflict appears), you can use the following
syntax::
git log -L:'\<function\>':<path> HEAD..<commit>^
.. note::
The ``\<`` and ``\>`` around the function name ensure that the
matches are anchored on a word boundary. This is important, as this
part is actually a regex and git only follows the first match, so
if you use ``-L:thread_stack:kernel/fork.c`` it may only give you
results for the function ``try_release_thread_stack_to_cache`` even
though there are many other functions in that file containing the
string ``thread_stack`` in their names.
Another useful option for ``git log`` is ``-G``, which allows you to
filter on certain strings appearing in the diffs of the commits you are
listing::
git log -G'regex' HEAD..<commit>^ -- <path>
This can also be a handy way to quickly find when something (e.g. a
function call or a variable) was changed, added, or removed. The search
string is a regular expression, which means you can potentially search
for more specific things like assignments to a specific struct member::
git log -G'\->index\>.*='
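To see how ``-G`` filters commits, here is a small sketch in a scratch repository. The struct member names and commit subjects are invented; the regular expression is the one shown above.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"        # throwaway identity for the demo
git config user.email "dev@example.com"

printf 'p->count = 0;\n' > demo.c
git add demo.c
git commit -q -m "demo: initial version"
printf 'p->count = 0;\np->index = 0;\n' > demo.c
git commit -q -am "demo: initialize index"
printf 'p->count = 1;\np->index = 0;\n' > demo.c
git commit -q -am "demo: bump count"
printf 'p->count = 1;\np->index = 5;\n' > demo.c
git commit -q -am "demo: change index default"

# Keep only commits whose diff adds or removes a line assigning to ->index.
hits=$(git log --format=%s -G'\->index\>.*=' -- demo.c)
echo "$hits"
```

The commit that only touches ``p->count`` is skipped, because none of its added or removed lines match the expression.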
git blame
~~~~~~~~~
Another way to find prerequisite commits (albeit only the most recent
one for a given conflict) is to run ``git blame``. In this case, you
need to run it against the parent commit of the patch you are
cherry-picking and the file where the conflict appeared, i.e.::
git blame <commit>^ -- <path>
This command also accepts the ``-L`` argument (for restricting the
output to a single function), but in this case you specify the filename
at the end of the command as usual::
git blame -L:'\<function\>' <commit>^ -- <path>
Navigate to the place where the conflict occurred. The first column of
the blame output is the commit ID of the patch that added a given line
of code.
It might be a good idea to ``git show`` these commits and see if they
look like they might be the source of the conflict. Sometimes there will
be more than one of these commits, either because multiple commits
changed different lines of the same conflict area *or* because multiple
subsequent patches changed the same line (or lines) multiple times. In
the latter case, you may have to run ``git blame`` again and specify the
older version of the file to look at in order to dig further back in
the history of the file.
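The following sketch shows the idea on a tiny scratch repository; the file contents, the line number of the "conflict" and the commit subjects are assumptions made for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"        # throwaway identity for the demo
git config user.email "dev@example.com"

printf 'a\nb\nlimit = 0;\nd\n' > demo.c
git add demo.c
git commit -q -m "demo: initial version"
printf 'a\nb\nlimit = 1;\nd\n' > demo.c
git commit -q -am "demo: raise limit"
printf 'a\nb\nlimit = 2;\nd\n' > demo.c
git commit -q -am "demo: raise limit again"
patch=$(git rev-parse HEAD)               # the patch being backported

# Blame the parent of the patch; pretend the conflict is on line 3.
# The first column of the blame output is the commit ID.
culprit=$(git blame -L 3,3 "$patch^" -- demo.c | cut -d' ' -f1)
git show -s --format='%h %s' "$culprit"
```

The commit found this way is the most recent one to touch that line before the patch, which makes it the first candidate to inspect with ``git show``.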
Prerequisite vs. incidental patches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Having found the patch that caused the conflict, you need to determine
whether it is a prerequisite for the patch you are backporting or
whether it is just incidental and can be skipped. An incidental patch
would be one that touches the same code as the patch you are
backporting, but does not change the semantics of the code in any
material way. For example, a whitespace cleanup patch is completely
incidental -- likewise, a patch that simply renames a function or a
variable would be incidental as well. On the other hand, if the function
being changed does not even exist in your current branch then this would
not be incidental at all and you need to carefully consider whether the
patch adding the function should be cherry-picked first.
If you find that there is a necessary prerequisite patch, then you need
to stop and cherry-pick that instead. If you've already resolved some
conflicts in a different file and don't want to do it again, you can
create a temporary copy of that file.
To abort the current cherry-pick, go ahead and run
``git cherry-pick --abort``, then restart the cherry-picking process
with the commit ID of the prerequisite patch instead.
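A minimal sketch of this abort-and-restart sequence, using a scratch repository with an invented prerequisite (a commit adding a ``helper(0);`` call) and an invented fix (changing it to ``helper(1);``):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git checkout -q -b mainline
git config user.name "Example Dev"        # throwaway identity for the demo
git config user.email "dev@example.com"

printf 'line 1\nline 2\nline 3\n' > demo.c
git add demo.c
git commit -q -m "demo: initial version"
git branch stable

# Mainline: a prerequisite adds a call, and a later fix changes it.
printf 'line 1\nline 2\nhelper(0);\nline 3\n' > demo.c
git commit -q -am "demo: add helper call"
prereq=$(git rev-parse HEAD)
printf 'line 1\nline 2\nhelper(1);\nline 3\n' > demo.c
git commit -q -am "demo: use helper(1)"
fix=$(git rev-parse HEAD)

git checkout -q stable
if git cherry-pick -x "$fix" > /dev/null 2>&1; then
    echo "unexpected: the fix applied without its prerequisite"
else
    git cherry-pick --abort                     # give up on this attempt
    git cherry-pick -x "$prereq" > /dev/null    # pick the prerequisite first
    git cherry-pick -x "$fix" > /dev/null       # then retry the fix
fi
git log --format=%s
```

If other files had already been resolved before the abort, copying them aside first (for example with ``cp``) would avoid redoing that work.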
Understanding conflict markers
------------------------------
Combined diffs
~~~~~~~~~~~~~~
Let's say you've decided against picking (or reverting) additional
patches and you just want to resolve the conflict. Git will have
inserted conflict markers into your file. Out of the box, this will look
something like::
<<<<<<< HEAD
this is what's in your current tree before cherry-picking
=======
this is what the patch wants it to be after cherry-picking
>>>>>>> <commit>... title
This is what you would see if you opened the file in your editor.
However, if you were to run ``git diff`` without any arguments, the
output would look something like this::
$ git diff
[...]
++<<<<<<< HEAD
 +this is what's in your current tree before cherry-picking
++=======
+ this is what the patch wants it to be after cherry-picking
++>>>>>>> <commit>... title
When you are resolving a conflict, the behavior of ``git diff`` differs
from its normal behavior. Notice the two columns of diff markers
instead of the usual one; this is a so-called "`combined diff`_", here
showing the 3-way diff (or diff-of-diffs) between
#. the current branch (before cherry-picking) and the current working
directory, and
#. the current branch (before cherry-picking) and the file as it looks
after the original patch has been applied.
.. _combined diff: https://git-scm.com/docs/diff-format#_combined_diff_format
Better diffs
~~~~~~~~~~~~
3-way combined diffs include all the other changes that happened to the
file between your current branch and the branch you are cherry-picking
from. While this is useful for spotting other changes that you need to
take into account, this also makes the output of ``git diff`` somewhat
intimidating and difficult to read. You may instead prefer to run
``git diff HEAD`` (or ``git diff --ours``) which shows only the diff
between the current branch before cherry-picking and the current working
directory. It looks like this::
$ git diff HEAD
[...]
+<<<<<<< HEAD
 this is what's in your current tree before cherry-picking
+=======
+this is what the patch wants it to be after cherry-picking
+>>>>>>> <commit>... title
As you can see, this reads just like any other diff and makes it clear
which lines are in the current branch and which lines are being added
because they are part of the merge conflict or the patch being
cherry-picked.
Merge styles and diff3
~~~~~~~~~~~~~~~~~~~~~~
The default conflict marker style shown above is known as the ``merge``
style. There is also another style available, known as the ``diff3``
style, which looks like this::
<<<<<<< HEAD
this is what is in your current tree before cherry-picking
||||||| parent of <commit> (title)
this is what the patch expected to find there
=======
this is what the patch wants it to be after being applied
>>>>>>> <commit> (title)
As you can see, this has 3 parts instead of 2, and includes what git
expected to find there but didn't. It is *highly recommended* to use
this conflict style as it makes it much clearer what the patch actually
changed; i.e., it allows you to compare the before-and-after versions
of the file for the commit you are cherry-picking. This allows you to
make better decisions about how to resolve the conflict.
To change conflict marker styles, you can use the following command::
git config merge.conflictStyle diff3
There is a third option, ``zdiff3``, introduced in `Git 2.35`_,
which has the same 3 sections as ``diff3``, but where common lines have
been trimmed off, making the conflict area smaller in some cases.
.. _Git 2.35: https://github.blog/2022-01-24-highlights-from-git-2-35/
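To see the ``diff3`` style on a real conflict, the sketch below provokes one in a scratch repository. The file contents and commit subjects are made up, and the setting is applied to that repository only.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git checkout -q -b mainline
git config user.name "Example Dev"        # throwaway identity for the demo
git config user.email "dev@example.com"

printf 'line 1\nlimit = 0;\nline 3\n' > demo.c
git add demo.c
git commit -q -m "demo: initial version"
git branch stable

printf 'line 1\nlimit = 1;\nline 3\n' > demo.c
git commit -q -am "demo: raise limit"
printf 'line 1\nlimit = 2;\nline 3\n' > demo.c
git commit -q -am "demo: raise limit again"
fix=$(git rev-parse HEAD)

git checkout -q stable
git config merge.conflictStyle diff3
# The cherry-pick is expected to stop with a conflict here.
git cherry-pick "$fix" > /dev/null 2>&1 || true
cat demo.c
```

The middle section of the conflict (after the ``|||||||`` marker) contains ``limit = 1;``, which is what the patch expected to find; comparing it with the last section shows that the patch itself only changes ``1`` to ``2``.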
Iterating on conflict resolutions
---------------------------------
The first step in any conflict resolution process is to understand the
patch you are backporting. For the Linux kernel this is especially
important, since an incorrect change can lead to the whole system
crashing -- or worse, an undetected security vulnerability.
Understanding the patch can be easy or difficult depending on the patch
itself, the changelog, and your familiarity with the code being changed.
However, a good question for every change (or every hunk of the patch)
might be: "Why is this hunk in the patch?" The answers to these
questions will inform your conflict resolution.
Resolution process
~~~~~~~~~~~~~~~~~~
Sometimes the easiest thing to do is to just remove all but the first
part of the conflict, leaving the file essentially unchanged, and apply
the changes by hand. Perhaps the patch is changing a function call
argument from ``0`` to ``1`` while a conflicting change added an
entirely new (and insignificant) parameter to the end of the parameter
list; in that case, it's easy enough to change the argument from ``0``
to ``1`` by hand and leave the rest of the arguments alone. This
technique of manually applying changes is mostly useful if the conflict
pulled in a lot of unrelated context that you don't really need to care
about.
For particularly nasty conflicts with many conflict markers, you can use
``git add`` or ``git add -i`` to selectively stage your resolutions to
get them out of the way; this also lets you use ``git diff HEAD`` to
always see what remains to be resolved or ``git diff --cached`` to see
what your patch looks like so far.
Dealing with file renames
~~~~~~~~~~~~~~~~~~~~~~~~~
One of the most annoying things that can happen while backporting a
patch is discovering that one of the files being patched has been
renamed, as that typically means git won't even put in conflict markers,
but will just throw up its hands and say (paraphrased): "Unmerged path!
You do the work..."
There are generally a few ways to deal with this. If the patch to the
renamed file is small, like a one-line change, the easiest thing is to
just go ahead and apply the change by hand and be done with it. On the
other hand, if the change is big or complicated, you definitely don't
want to do it by hand.
As a first pass, you can try something like this, which will lower the
rename detection threshold to 30% (by default, git uses 50%, meaning
that two files need to have at least 50% in common for it to consider
an add-delete pair to be a potential rename)::
git cherry-pick --strategy=recursive -Xrename-threshold=30
Sometimes the right thing to do will be to also backport the patch that
did the rename, but that's definitely not the most common case. Instead,
what you can do is to temporarily rename the file in the branch you're
backporting to (using ``git mv`` and committing the result), restart the
attempt to cherry-pick the patch, rename the file back (``git mv`` and
committing again), and finally squash the result using ``git rebase -i``
(see the `rebase tutorial`_) so it appears as a single commit when you
are done.
.. _rebase tutorial: https://medium.com/@slamflipstrom/a-beginners-guide-to-squashing-commits-with-git-rebase-8185cf6e62ec
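The temporary-rename technique can be sketched as follows. This is only an illustration in a scratch repository with invented file names (``old.c``, ``new.c``). To keep the script non-interactive, it squashes with ``git reset --soft`` followed by ``git commit -C`` instead of ``git rebase -i``; this is a substitution made for the example, not the method described above. In a case this small git would most likely detect the rename by itself, so the sketch only demonstrates the mechanics.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git checkout -q -b mainline
git config user.name "Example Dev"        # throwaway identity for the demo
git config user.email "dev@example.com"

for i in 1 2 3 4 5; do echo "line $i"; done > old.c
git add old.c
git commit -q -m "demo: initial version"
git branch stable

# Mainline renames the file and later fixes something in it.
git mv old.c new.c
git commit -q -m "demo: rename old.c to new.c"
sed 's/^line 3$/line 3 (fixed)/' new.c > new.tmp && mv new.tmp new.c
git commit -q -am "demo: fix line 3"
fix=$(git rev-parse HEAD)

# On the destination branch: rename, cherry-pick, rename back.
git checkout -q stable
git mv old.c new.c
git commit -q -m "TEMP: rename old.c to new.c"
git cherry-pick -x "$fix" > /dev/null
picked=$(git rev-parse HEAD)
git mv new.c old.c
git commit -q -m "TEMP: rename new.c back to old.c"

# Squash the three commits into one, reusing the message and
# authorship of the cherry-picked commit.
git reset -q --soft HEAD~3
git commit -q -C "$picked"
git log --format=%s
```

After the squash, the stable branch contains a single new commit that applies the fix to ``old.c``, and the two temporary rename commits no longer appear in the history.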
Gotchas
-------
Function arguments
~~~~~~~~~~~~~~~~~~
Pay attention to changing function arguments! It's easy to gloss over
details and think that two lines are the same but actually they differ
in some small detail like which variable was passed as an argument
(especially if the two variables are both a single character that look
the same, like i and j).
Error handling
~~~~~~~~~~~~~~
If you cherry-pick a patch that includes a ``goto`` statement (typically
for error handling), it is absolutely imperative to double check that
the target label is still correct in the branch you are backporting to.
The same goes for added ``return``, ``break``, and ``continue``
statements.
Error handling is typically located at the bottom of the function, so it
may not be part of the conflict even though it could have been changed by
other patches.
A good way to ensure that you review the error paths is to always use
``git diff -W`` and ``git show -W`` (AKA ``--function-context``) when
inspecting your changes. For C code, this will show you the whole
function that's being changed in a patch. One of the things that often
go wrong during backports is that something else in the function changed
on either of the branches that you're backporting from or to. By
including the whole function in the diff you get more context and can
more easily spot problems that might otherwise go unnoticed.
Refactored code
~~~~~~~~~~~~~~~
Something that happens quite often is that code gets refactored by
"factoring out" a common code sequence or pattern into a helper
function. When backporting patches to an area where such a refactoring
has taken place, you effectively need to do the reverse when
backporting: a patch to a single location may need to be applied to
multiple locations in the backported version. (One giveaway for this
scenario is that a function was renamed -- but that's not always the
case.)
To avoid incomplete backports, it's worth trying to figure out if the
patch fixes a bug that appears in more than one place. One way to do
this would be to use ``git grep``. (This is actually a good idea to do
in general, not just for backports.) If you do find that the same kind
of fix would apply to other places, it's also worth seeing if those
places exist upstream -- if they don't, it's likely the patch may need
to be adjusted. ``git log`` is your friend to figure out what happened
to these areas as ``git blame`` won't show you code that has been
removed.
If you do find other instances of the same pattern in the upstream tree
and you're not sure whether it's also a bug, it may be worth asking the
patch author. It's not uncommon to find new bugs during backporting!
Verifying the result
====================
colordiff
---------
Having committed a conflict-free new patch, you can now compare your
patch to the original patch. It is highly recommended that you use a
tool such as `colordiff`_ that can show two files side by side and color
them according to the changes between them::
colordiff -yw -W 200 <(git diff -W <upstream commit>^-) <(git diff -W HEAD^-) | less -SR
.. _colordiff: https://www.colordiff.org/
Here, ``-y`` means to do a side-by-side comparison; ``-w`` ignores
whitespace, and ``-W 200`` sets the width of the output (as otherwise it
will use 130 by default, which is often a bit too little).
The ``rev^-`` syntax is a handy shorthand for ``rev^..rev``, essentially
giving you just the diff for that single commit; also see
the official `git rev-parse documentation`_.
.. _git rev-parse documentation: https://git-scm.com/docs/git-rev-parse#_other_rev_parent_shorthand_notations
Again, note the inclusion of ``-W`` for ``git diff``; this ensures that
you will see the full function for any function that has changed.
One incredibly important thing that colordiff does is to highlight lines
that are different. For example, if an error-handling ``goto`` has
changed labels between the original and backported patch, colordiff will
show these side-by-side but highlighted in a different color. Thus, it
is easy to see that the two ``goto`` statements are jumping to different
labels. Likewise, lines that were not modified by either patch but
differ in the context will also be highlighted and thus stand out during
a manual inspection.
Of course, this is just a visual inspection; the real test is building
and running the patched kernel (or program).
Build testing
-------------
We won't cover runtime testing here, but it can be a good idea to build
just the files touched by the patch as a quick sanity check. For the
Linux kernel you can build single files like this, assuming you have the
``.config`` and build environment set up correctly::
make path/to/file.o
Note that this won't discover linker errors, so you should still do a
full build after verifying that the single file compiles. By compiling
the single file first you can avoid having to wait for a full build *in
case* there are compiler errors in any of the files you've changed.
Runtime testing
---------------
Even a successful build or boot test is not necessarily enough to rule
out a missing dependency somewhere. Even though the chances are small,
there could be code changes where two independent changes to the same
file result in no conflicts, no compile-time errors, and runtime errors
only in exceptional cases.
One concrete example of this was a pair of patches to the system call
entry code where the first patch saved/restored a register and a later
patch made use of the same register somewhere in the middle of this
sequence. Since there was no overlap between the changes, one could
cherry-pick the second patch, have no conflicts, and believe that
everything was fine, when in fact the code was now scribbling over an
unsaved register.
Although the vast majority of errors will be caught during compilation
or by superficially exercising the code, the only way to *really* verify
a backport is to review the final patch with the same level of scrutiny
as you would (or should) give to any other patch. Having unit tests and
regression tests or other types of automatic testing can help increase
the confidence in the correctness of a backport.
Submitting backports to stable
==============================
As the stable maintainers try to cherry-pick mainline fixes onto their
stable kernels, they may send out emails asking for backports when
encountering conflicts, see e.g.
<https://lore.kernel.org/stable/2023101528-jawed-shelving-071a@gregkh/>.
These emails typically include the exact steps you need to cherry-pick
the patch to the correct tree and submit the patch.
One thing to make sure is that your changelog conforms to the expected
format::
<original patch title>
[ Upstream commit <mainline rev> ]
<rest of the original changelog>
[ <summary of the conflicts and their resolutions> ]
Signed-off-by: <your name and email>
The "Upstream commit" line is sometimes slightly different depending on
the stable version. Older versions used this format::
commit <mainline rev> upstream.
It is most common to indicate the kernel version the patch applies to
in the email subject line (using e.g.
``git send-email --subject-prefix='PATCH 6.1.y'``), but you can also put
it in the Signed-off-by:-area or below the ``---`` line.
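The same ``--subject-prefix`` option is also accepted by ``git format-patch``, which makes it easy to check the resulting subject line before sending anything. A small sketch, using a scratch repository and an invented commit subject:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"        # throwaway identity for the demo
git config user.email "dev@example.com"

printf 'line 1\n' > demo.c
git add demo.c
git commit -q -m "demo: initial version"
printf 'line 1 (fixed)\n' > demo.c
git commit -q -am "demo: fix line 1"

# Generate the patch for the last commit and show only its subject line.
subject=$(git format-patch -1 --stdout --subject-prefix='PATCH 6.1.y' | grep '^Subject:')
echo "$subject"
```

The prefix ends up inside the square brackets of the subject, which is where the stable version is usually indicated.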
The stable maintainers expect separate submissions for each active
stable version, and each submission should also be tested separately.
A few final words of advice
===========================
1) Approach the backporting process with humility.
2) Understand the patch you are backporting; this means reading both
the changelog and the code.
3) Be honest about your confidence in the result when submitting the
patch.
4) Ask relevant maintainers for explicit acks.
Examples
========
The above shows roughly the idealized process of backporting a patch.
For a more concrete example, see this video tutorial where two patches
are backported from mainline to stable:
`Backporting Linux Kernel Patches`_.
.. _Backporting Linux Kernel Patches: https://youtu.be/sBR7R1V2FeA

@ -66,12 +66,13 @@ lack of a better place.
:maxdepth: 1
applying-patches
backporting
adding-syscalls
magic-number
volatile-considered-harmful
botching-up-ioctls
clang-format
-../riscv/patch-acceptance
+../arch/riscv/patch-acceptance
../core-api/unaligned-memory-access
.. only:: subproject and html

@ -327,6 +327,8 @@ politely and address the problems they have pointed out. When sending a next
version, add a ``patch changelog`` to the cover letter or to individual patches
explaining difference against previous submission (see
:ref:`the_canonical_patch_format`).
Notify people that commented on your patch about new versions by adding them to
the patches CC list.
See Documentation/process/email-clients.rst for recommendations on email
clients and mailing list etiquette.
@ -366,10 +368,10 @@ busy people and may not get to your patch right away.
Once upon a time, patches used to disappear into the void without comment,
but the development process works more smoothly than that now. You should
-receive comments within a week or so; if that does not happen, make sure
-that you have sent your patches to the right place. Wait for a minimum of
-one week before resubmitting or pinging reviewers - possibly longer during
-busy times like merge windows.
+receive comments within a few weeks (typically 2-3); if that does not
+happen, make sure that you have sent your patches to the right place.
+Wait for a minimum of one week before resubmitting or pinging reviewers
+- possibly longer during busy times like merge windows.
It's also ok to resend the patch or the patch series after a couple of
weeks with the word "RESEND" added to the subject line::

@ -6,6 +6,7 @@ Security Documentation
:maxdepth: 1
credentials
snp-tdx-threat-model
IMA-templates
keys/index
lsm

@ -0,0 +1,253 @@
======================================================
Confidential Computing in Linux for x86 virtualization
======================================================
.. contents:: :local:
By: Elena Reshetova <elena.reshetova@intel.com> and Carlos Bilbao <carlos.bilbao@amd.com>
Motivation
==========
Kernel developers working on confidential computing for virtualized
environments in x86 operate under a set of assumptions regarding the Linux
kernel threat model that differ from the traditional view. Historically,
the Linux threat model acknowledges attackers residing in userspace, as
well as a limited set of external attackers that are able to interact with
the kernel through various networking or limited HW-specific exposed
interfaces (USB, Thunderbolt). The goal of this document is to explain
additional attack vectors that arise in the confidential computing space
and discuss the proposed protection mechanisms for the Linux kernel.
Overview and terminology
========================
Confidential Computing (CoCo) is a broad term covering a wide range of
security technologies that aim to protect the confidentiality and integrity
of data in use (vs. data at rest or data in transit). At its core, CoCo
solutions provide a Trusted Execution Environment (TEE), where secure data
processing can be performed and, as a result, they are typically further
classified into different subtypes depending on the SW that is intended
to be run in TEE. This document focuses on a subclass of CoCo technologies
that are targeting virtualized environments and allow running Virtual
Machines (VM) inside TEE. From now on, this document will refer
to this subclass of CoCo as 'Confidential Computing (CoCo) for the
virtualized environments (VE)'.
CoCo, in the virtualization context, refers to a set of HW and/or SW
technologies that allow for stronger security guarantees for the SW running
inside a CoCo VM. Namely, confidential computing allows its users to
confirm the trustworthiness of all SW pieces included in its reduced
Trusted Computing Base (TCB), given its ability to attest the state of
these trusted components.
While the concrete implementation details differ between technologies, all
available mechanisms aim to provide increased confidentiality and
integrity for the VM's guest memory and execution state (vCPU registers),
more tightly controlled guest interrupt injection, as well as some
additional mechanisms to control guest-host page mapping. More details on
the x86-specific solutions can be found in
:doc:`Intel Trust Domain Extensions (TDX) </arch/x86/tdx>` and
`AMD Memory Encryption <https://www.amd.com/system/files/techdocs/sev-snp-strengthening-vm-isolation-with-integrity-protection-and-more.pdf>`_.
The basic CoCo guest layout includes the host, guest, the interfaces through
which guest and host communicate, a platform capable of supporting CoCo VMs, and
a trusted intermediary between the guest VM and the underlying platform
that acts as a security manager. The host-side virtual machine monitor
(VMM) typically consists of a subset of traditional VMM features and
is still in charge of the guest lifecycle, i.e. create or destroy a CoCo
VM, manage its access to system resources, etc. However, since it
typically stays out of CoCo VM TCB, its access is limited to preserve the
security objectives.
In the following diagram, the "<--->" lines represent bi-directional
communication channels or interfaces between the CoCo security manager and
the rest of the components (data flow for guest, host, hardware) ::
    +-------------------+      +-----------------------+
    | CoCo guest VM     |<---->|                       |
    +-------------------+      |                       |
      | Interfaces |           | CoCo security manager |
    +-------------------+      |                       |
    | Host VMM          |<---->|                       |
    +-------------------+      |                       |
                               |                       |
    +--------------------+     |                       |
    | CoCo platform      |<--->|                       |
    +--------------------+     +-----------------------+
The specific details of the CoCo security manager vastly diverge between
technologies. For example, in some cases, it will be implemented in HW
while in others it may be pure SW.
Existing Linux kernel threat model
==================================
The overall components of the current Linux kernel threat model are::
    +-----------------------+      +-------------------+
    |                       |<---->| Userspace         |
    |                       |      +-------------------+
    |   External attack     |         | Interfaces |
    |       vectors         |      +-------------------+
    |                       |<---->| Linux Kernel      |
    |                       |      +-------------------+
    +-----------------------+      +-------------------+
                                   | Bootloader/BIOS   |
                                   +-------------------+
                                   +-------------------+
                                   | HW platform       |
                                   +-------------------+
There is also communication between the bootloader and the kernel during
the boot process, but this diagram does not represent it explicitly. The
"Interfaces" box represents the various interfaces that allow
communication between kernel and userspace. This includes system calls,
kernel APIs, device drivers, etc.
The existing Linux kernel threat model typically assumes execution on a
trusted HW platform with all of the firmware and bootloaders included on
its TCB. The primary attacker resides in the userspace, and all of the data
coming from there is generally considered untrusted, unless userspace is
privileged enough to perform trusted actions. In addition, external
attackers are typically considered, including those with access to enabled
external networks (e.g. Ethernet, Wireless, Bluetooth), exposed hardware
interfaces (e.g. USB, Thunderbolt), and the ability to modify the contents
of disks offline.
Regarding external attack vectors, it is interesting to note that in most
cases external attackers will try to exploit vulnerabilities in userspace
first, but that it is possible for an attacker to directly target the
kernel; particularly if the host has physical access. Examples of direct
kernel attacks include the vulnerabilities CVE-2019-19524, CVE-2022-0435
and CVE-2020-24490.
Confidential Computing threat model and its security objectives
===============================================================
Confidential Computing adds a new type of attacker to the above list: a
potentially misbehaving host (which can also include some part of a
traditional VMM or all of it), which is typically placed outside of the
CoCo VM TCB due to its large SW attack surface. It is important to note
that this doesn't imply that the host or VMM are intentionally
malicious, but that there exists a security value in having a small CoCo
VM TCB. This new type of adversary may be viewed as a more powerful type
of external attacker, as it resides locally on the same physical machine
(in contrast to a remote network attacker) and has control over the guest
kernel communication with most of the HW::
                                  +------------------------+
                                  |    CoCo guest VM       |
    +-----------------------+     |  +-------------------+ |
    |                       |<--->|  | Userspace         | |
    |                       |     |  +-------------------+ |
    |   External attack     |     |     | Interfaces |     |
    |       vectors         |     |  +-------------------+ |
    |                       |<--->|  | Linux Kernel      | |
    |                       |     |  +-------------------+ |
    +-----------------------+     |  +-------------------+ |
                                  |  | Bootloader/BIOS   | |
    +-----------------------+     |  +-------------------+ |
    |                       |<--->+------------------------+
    |                       |          | Interfaces |
    |                       |     +------------------------+
    |     CoCo security     |<--->| Host/Host-side VMM     |
    |      manager          |     +------------------------+
    |                       |     +------------------------+
    |                       |<--->| CoCo platform          |
    +-----------------------+     +------------------------+
While traditionally the host has unlimited access to guest data and can
leverage this access to attack the guest, the CoCo systems mitigate such
attacks by adding security features like guest data confidentiality and
integrity protection. This threat model assumes that those features are
available and intact.

The **Linux kernel CoCo VM security objectives** can be summarized as follows:

1. Preserve the confidentiality and integrity of the CoCo guest's private
   memory and registers.
2. Prevent privilege escalation from a host into a CoCo guest Linux kernel.
   While it is true that the host (and host-side VMM) requires some level of
   privilege to create, destroy, or pause the guest, part of the goal of
   preventing privilege escalation is to ensure that these operations do not
   provide a pathway for attackers to gain access to the guest's kernel.

The above security objectives result in two primary **Linux kernel CoCo
VM assets**:

1. Guest kernel execution context.
2. Guest kernel private memory.

The host retains full control over the CoCo guest resources, and can deny
access to them at any time. Examples of resources include CPU time, memory
that the guest can consume, network bandwidth, etc. Because of this,
host Denial of Service (DoS) attacks against CoCo guests are beyond the
scope of this threat model.
The **Linux CoCo VM attack surface** is any interface exposed from a CoCo
guest Linux kernel towards an untrusted host that is not covered by the
CoCo technology SW/HW protection. This includes any possible
side-channels, as well as transient execution side channels. Examples of
explicit (not side-channel) interfaces include accesses to port I/O, MMIO
and DMA interfaces, access to PCI configuration space, VMM-specific
hypercalls (towards Host-side VMM), access to shared memory pages,
interrupts allowed to be injected into the guest kernel by the host, as
well as CoCo technology-specific hypercalls, if present. Additionally, the
host in a CoCo system typically controls the process of creating a CoCo
guest: it has a method to load the firmware and bootloader images, as well
as the kernel image together with the kernel command line, into the guest.
All of this data should also be considered untrusted until its integrity
and authenticity are established via attestation.
The table below shows a threat matrix for the CoCo guest Linux kernel but
does not discuss potential mitigation strategies. The matrix refers to
CoCo-specific versions of the guest, host and platform.

.. list-table:: CoCo Linux guest kernel threat matrix
   :widths: auto
   :align: center
   :header-rows: 1

   * - Threat name
     - Threat description

   * - Guest malicious configuration
     - A misbehaving host modifies one of the following parts of the
       guest's configuration:

       1. Guest firmware or bootloader

       2. Guest kernel or module binaries

       3. Guest command line parameters

       This allows the host to break the integrity of the code running
       inside a CoCo guest, and violates the CoCo security objectives.

   * - CoCo guest data attacks
     - A misbehaving host retains full control of the CoCo guest's data
       in-transit between the guest and the host-managed physical or
       virtual devices. This allows any attack against confidentiality,
       integrity or freshness of such data.

   * - Malformed runtime input
     - A misbehaving host injects malformed input via any communication
       interface used by the guest's kernel code. If the code is not
       prepared to handle this input correctly, this can result in a host
       --> guest kernel privilege escalation. This includes traditional
       side-channel and/or transient execution attack vectors.

   * - Malicious runtime input
     - A misbehaving host injects a specific input value via any
       communication interface used by the guest's kernel code. The
       difference with the previous attack vector (malformed runtime input)
       is that this input is not malformed, but its value is crafted to
       impact the guest's kernel security. Examples of such inputs include
       providing a malicious time to the guest or the entropy to the guest
       random number generator. Additionally, the timing of such events can
       be an attack vector on its own, if it results in a particular guest
       kernel action (e.g. processing of a host-injected interrupt).
       resistant to supplied host input.
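
The "Malformed runtime input" row can be illustrated with a small sketch. It is written in Python purely for illustration (guest kernel code is not Python), and the message layout, the ``MAX_PAYLOAD`` bound and the function name are all assumptions made for this example, not anything defined by the threat model. It shows the general idea of treating a host-controlled length field as untrusted: copy the data once, then validate before use.

```python
import struct

MAX_PAYLOAD = 4096  # hypothetical upper bound chosen for this sketch


def parse_host_message(shared_buf) -> bytes:
    """Parse a length-prefixed message from a host-writable buffer.

    Assumed layout: a 4-byte little-endian length followed by the payload.
    The length is host-controlled, so it is checked before it is used.
    """
    # Take a private snapshot first, so that later checks and uses see the
    # same bytes even if the host rewrites the shared buffer afterwards.
    snapshot = bytes(shared_buf)
    if len(snapshot) < 4:
        raise ValueError("truncated header")
    (length,) = struct.unpack_from("<I", snapshot, 0)
    if length > MAX_PAYLOAD or length > len(snapshot) - 4:
        raise ValueError("host-supplied length out of range")
    return snapshot[4:4 + length]
```

A well-formed buffer yields just the declared payload, while a length that exceeds the data actually present is rejected instead of being trusted.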

@ -93,7 +93,7 @@ def markup_ctype_refs(match):
 #
 RE_expr = re.compile(r':c:(expr|texpr):`([^\`]+)`')
 def markup_c_expr(match):
-    return '\ ``' + match.group(2) + '``\ '
+    return '\\ ``' + match.group(2) + '``\\ '

 #
 # Parse Sphinx 3.x C markups, replacing them by backward-compatible ones
@ -151,7 +151,7 @@ class CObject(Base_CObject):
     def handle_func_like_macro(self, sig, signode):
         u"""Handles signatures of function-like macros.

-        If the objtype is 'function' and the the signature ``sig`` is a
+        If the objtype is 'function' and the signature ``sig`` is a
         function-like macro, the name of the macro is returned. Otherwise
         ``False`` is returned. """

@ -138,7 +138,7 @@ class KernelCmd(Directive):
                 code_block += "\n " + l
             lines = code_block + "\n\n"

-        line_regex = re.compile("^\.\. LINENO (\S+)\#([0-9]+)$")
+        line_regex = re.compile(r"^\.\. LINENO (\S+)\#([0-9]+)$")
         ln = 0
         n = 0
         f = fname

@ -104,7 +104,7 @@ class KernelFeat(Directive):
         lines = self.runCmd(cmd, shell=True, cwd=cwd, env=shell_env)

-        line_regex = re.compile("^\.\. FILE (\S+)$")
+        line_regex = re.compile(r"^\.\. FILE (\S+)$")

         out_lines = ""
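
The regex hunks above all make the same change: an ordinary string literal containing ``\.`` or ``\S`` becomes a raw string. The sketch below is an illustration added here, not part of the commit. It compiles both spellings from source text to show that the resulting pattern is identical while only the non-raw form makes the compiler warn about an invalid escape sequence (a ``DeprecationWarning`` on older Python 3 releases, a ``SyntaxWarning`` from 3.12). The helper names and the sample input line are made up for the example.

```python
import re
import warnings

# Source text of the old and new assignments, held as data so that each can
# be compiled separately and the compiler's reaction observed.
OLD_SRC = 'pat = "^\\.\\. FILE (\\S+)$"'
NEW_SRC = 'pat = r"^\\.\\. FILE (\\S+)$"'


def compile_warnings(src):
    """Compile src and return the categories of all warnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(src, "<sketch>", "exec")
    return [w.category for w in caught]


def pattern_value(src):
    """Run src with warnings silenced and return the string bound to pat."""
    namespace = {}
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        exec(compile(src, "<sketch>", "exec"), namespace)
    return namespace["pat"]


old_warnings = compile_warnings(OLD_SRC)   # non-empty: invalid escapes
new_warnings = compile_warnings(NEW_SRC)   # empty: raw string is clean
same_pattern = pattern_value(OLD_SRC) == pattern_value(NEW_SRC)
match = re.search(pattern_value(NEW_SRC), ".. FILE Documentation/foo.rst")
```

Because the two spellings produce the same pattern string, the change does not alter what the regular expressions match; it only removes the warnings.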

@ -130,7 +130,7 @@ class KernelDocDirective(Directive):
             result = ViewList()

             lineoffset = 0;
-            line_regex = re.compile("^\.\. LINENO ([0-9]+)$")
+            line_regex = re.compile(r"^\.\. LINENO ([0-9]+)$")
             for line in lines:
                 match = line_regex.search(line)
                 if match:
@ -138,7 +138,7 @@ class KernelDocDirective(Directive):
                     lineoffset = int(match.group(1)) - 1
                     # we must eat our comments since the upset the markup
                 else:
-                    doc = env.srcdir + "/" + env.docname + ":" + str(self.lineno)
+                    doc = str(env.srcdir) + "/" + env.docname + ":" + str(self.lineno)
                     result.append(line, doc + ": " + filename, lineoffset)
                     lineoffset += 1

@ -309,7 +309,7 @@ def convert_image(img_node, translator, src_fname=None):
     if dst_fname:
         # the builder needs not to copy one more time, so pop it if exists.
         translator.builder.images.pop(img_node['uri'], None)
-        _name = dst_fname[len(translator.builder.outdir) + 1:]
+        _name = dst_fname[len(str(translator.builder.outdir)) + 1:]

         if isNewer(dst_fname, src_fname):
             kernellog.verbose(app,
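
The two ``str(...)`` wrappers above (around ``env.srcdir`` and ``translator.builder.outdir``) are not explained in this excerpt. One plausible reading, stated here as an assumption, is that these attributes may be path objects rather than plain strings in some Sphinx versions. The sketch below uses ``pathlib.PurePosixPath`` as a stand-in to show why the unwrapped expressions would then fail and the wrapped ones would not. The directory names are invented for the example.

```python
from pathlib import PurePosixPath

# Hypothetical stand-ins for env.srcdir and translator.builder.outdir.
srcdir = PurePosixPath("/work/linux/Documentation")
outdir = PurePosixPath("/work/out")


def old_doc_label(base, docname, lineno):
    # Mirrors the pre-patch expression: works only if base is a str.
    return base + "/" + docname + ":" + str(lineno)


def new_doc_label(base, docname, lineno):
    # Mirrors the patched expression: works for str and path objects alike.
    return str(base) + "/" + docname + ":" + str(lineno)


try:
    old_doc_label(srcdir, "index", 7)
    old_works = True
except TypeError:
    # Path objects do not support "+" with a str.
    old_works = False

label = new_doc_label(srcdir, "index", 7)

# Same idea for the slice in the image hunk: len() needs a str as well.
dst_fname = "/work/out/_images/fig.svg"
relative_name = dst_fname[len(str(outdir)) + 1:]
```

With plain strings both spellings behave identically, so the wrapped form remains compatible with inputs that were already strings.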

@ -77,7 +77,7 @@ class MaintainersInclude(Include):
             line = line.rstrip()

             # Linkify all non-wildcard refs to ReST files in Documentation/.
-            pat = '(Documentation/([^\s\?\*]*)\.rst)'
+            pat = r'(Documentation/([^\s\?\*]*)\.rst)'
             m = re.search(pat, line)
             if m:
                 # maintainers.rst is in a subdirectory, so include "../".
@ -90,11 +90,11 @@ class MaintainersInclude(Include):
                     output = "| %s" % (line.replace("\\", "\\\\"))
                     # Look for and record field letter to field name mappings:
                     # R: Designated *reviewer*: FullName <address@domain>
-                    m = re.search("\s(\S):\s", line)
+                    m = re.search(r"\s(\S):\s", line)
                     if m:
                         field_letter = m.group(1)
                     if field_letter and not field_letter in fields:
-                        m = re.search("\*([^\*]+)\*", line)
+                        m = re.search(r"\*([^\*]+)\*", line)
                         if m:
                             fields[field_letter] = m.group(1)
                 elif subsystems:
@ -112,7 +112,7 @@ class MaintainersInclude(Include):
                         field_content = ""

                         # Collapse whitespace in subsystem name.
-                        heading = re.sub("\s+", " ", line)
+                        heading = re.sub(r"\s+", " ", line)
                         output = output + "%s\n%s" % (heading, "~" * len(heading))
                         field_prev = ""
                     else:
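
For readers unfamiliar with the four patterns touched in this file, the following sketch exercises them on made-up MAINTAINERS-style lines. The patterns are copied from the hunks above; the wrapper functions and sample lines are illustrative assumptions and do not reproduce the surrounding directive code.

```python
import re

DOC_REF = r'(Documentation/([^\s\?\*]*)\.rst)'   # non-wildcard .rst paths
FIELD_LETTER = r"\s(\S):\s"                      # e.g. " R: "
FIELD_NAME = r"\*([^\*]+)\*"                     # text between asterisks
WHITESPACE = r"\s+"


def find_doc_ref(line):
    """Return (full path, path without prefix and suffix), or None."""
    m = re.search(DOC_REF, line)
    return (m.group(1), m.group(2)) if m else None


def field_mapping(line):
    """Return (letter, name) for a field description line, or None."""
    letter = re.search(FIELD_LETTER, line)
    name = re.search(FIELD_NAME, line)
    if letter and name:
        return letter.group(1), name.group(1)
    return None


def collapse(line):
    """Collapse every run of whitespace into a single space."""
    return re.sub(WHITESPACE, " ", line)


ref = find_doc_ref("F:\tDocumentation/process/howto.rst")
wildcard = find_doc_ref("F:\tDocumentation/arch/*.rst")
mapping = field_mapping("\tR: Designated *reviewer*: FullName <address@domain>")
heading = collapse("3C59X  NETWORK\tDRIVER")
```

The character class in ``DOC_REF`` excludes ``?`` and ``*``, which is why the wildcard entry is left alone.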

@ -35,6 +35,7 @@ Human interfaces
    sound/index
    gpu/index
    fb/index
+   leds/index

 Networking interfaces
 ---------------------
@ -70,7 +71,6 @@ Storage interfaces
    fpga/index
    i2c/index
    iio/index
-   leds/index
    pcmcia/index
    spi/index
    w1/index

@ -1,6 +1,6 @@
 .. include:: ../disclaimer-ita.rst

-:Original: :doc:`../../../riscv/patch-acceptance`
+:Original: :doc:`../../../arch/riscv/patch-acceptance`
 :Translator: Federico Vaga <federico.vaga@vaga.pv.it>

 arch/riscv linee guida alla manutenzione per gli sviluppatori

@ -0,0 +1,341 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-sp.rst
:Original: Documentation/process/embargoed-hardware-issues.rst
:Translator: Avadhut Naik <avadhut.naik@amd.com>
Problemas de hardware embargados
================================
Alcance
-------
Los problemas de hardware que resultan en problemas de seguridad son una
categoría diferente de errores de seguridad que los errores de software
puro que solo afectan al kernel de Linux.
Los problemas de hardware como Meltdown, Spectre, L1TF, etc. deben
tratarse de manera diferente porque usualmente afectan a todos los
sistemas operativos (“OS”) y, por lo tanto, necesitan coordinación entre
vendedores diferentes de OS, distribuciones, vendedores de hardware y
otras partes. Para algunos de los problemas, las mitigaciones de software
pueden depender de actualizaciones de microcódigo o firmware, los cuales
necesitan una coordinación adicional.
.. _Contacto:
Contacto
--------
El equipo de seguridad de hardware del kernel de Linux es separado del
equipo regular de seguridad del kernel de Linux.
El equipo solo maneja la coordinación de los problemas de seguridad de
hardware embargados. Los informes de errores de seguridad de software puro
en el kernel de Linux no son manejados por este equipo y el "reportero"
(quien informa del error) será guiado a contactar el equipo de seguridad
del kernel de Linux (:doc:`errores de seguridad <security-bugs>`) en su
lugar.
El equipo puede ser contactado por correo electrónico en
<hardware-security@kernel.org>. Esta es una lista privada de oficiales de
seguridad que lo ayudarán a coordinar un problema de acuerdo con nuestro
proceso documentado.
La lista está encriptada y el correo electrónico a la lista puede ser
enviado por PGP o S/MIME encriptado y debe estar firmado con la llave de
PGP del reportero o el certificado de S/MIME. La llave de PGP y el
certificado de S/MIME de la lista están disponibles en las siguientes
URLs:
- PGP: https://www.kernel.org/static/files/hardware-security.asc
- S/MIME: https://www.kernel.org/static/files/hardware-security.crt
Si bien los problemas de seguridad del hardware a menudo son manejados por
el vendedor de hardware afectado, damos la bienvenida al contacto de
investigadores o individuos que hayan identificado una posible falla de
hardware.
Oficiales de seguridad de hardware
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
El equipo actual de oficiales de seguridad de hardware:
- Linus Torvalds (Linux Foundation Fellow)
- Greg Kroah-Hartman (Linux Foundation Fellow)
- Thomas Gleixner (Linux Foundation Fellow)
Operación de listas de correo
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Las listas de correo encriptadas que se utilizan en nuestro proceso están
alojadas en la infraestructura de IT de la Fundación Linux. Al proporcionar
este servicio, los miembros del personal de operaciones de IT de la
Fundación Linux técnicamente tienen la capacidad de acceder a la
información embargada, pero están obligados a la confidencialidad por su
contrato de trabajo. El personal de IT de la Fundación Linux también es
responsable de operar y administrar el resto de la infraestructura de
kernel.org.
El actual director de infraestructura de proyecto de IT de la Fundación
Linux es Konstantin Ryabitsev.
Acuerdos de no divulgación
--------------------------
El equipo de seguridad de hardware del kernel de Linux no es un organismo
formal y, por lo tanto, no puede firmar ningún acuerdo de no
divulgación. La comunidad del kernel es consciente de la naturaleza
delicada de tales problemas y ofrece un Memorando de Entendimiento en su
lugar.
Memorando de Entendimiento
--------------------------
La comunidad del kernel de Linux tiene una comprensión profunda del
requisito de mantener los problemas de seguridad de hardware bajo embargo
para la coordinación entre diferentes vendedores de OS, distribuidores,
vendedores de hardware y otras partes.
La comunidad del kernel de Linux ha manejado con éxito los problemas de
seguridad del hardware en el pasado y tiene los mecanismos necesarios para
permitir el desarrollo compatible con la comunidad bajo restricciones de
embargo.
La comunidad del kernel de Linux tiene un equipo de seguridad de hardware
dedicado para el contacto inicial, el cual supervisa el proceso de manejo
de tales problemas bajo las reglas de embargo.
El equipo de seguridad de hardware identifica a los desarrolladores
(expertos en dominio) que formarán el equipo de respuesta inicial para un
problema en particular. El equipo de respuesta inicial puede involucrar
más desarrolladores (expertos en dominio) para abordar el problema de la
mejor manera técnica.
Todos los desarrolladores involucrados se comprometen a adherirse a las
reglas del embargo y a mantener confidencial la información recibida. La
violación de la promesa conducirá a la exclusión inmediata del problema
actual y la eliminación de todas las listas de correo relacionadas.
Además, el equipo de seguridad de hardware también excluirá al
infractor de problemas futuros. El impacto de esta consecuencia es un
elemento de disuasión altamente efectivo en nuestra comunidad. En caso de
que ocurra una violación, el equipo de seguridad de hardware informará a
las partes involucradas inmediatamente. Si usted o alguien tiene
conocimiento de una posible violación, por favor, infórmelo inmediatamente
a los oficiales de seguridad de hardware.
Proceso
^^^^^^^
Debido a la naturaleza distribuida globalmente del desarrollo del kernel
de Linux, las reuniones cara a cara son casi imposibles para abordar los
problemas de seguridad del hardware. Las conferencias telefónicas son
difíciles de coordinar debido a las zonas horarias y otros factores y
solo deben usarse cuando sea absolutamente necesario. El correo
electrónico encriptado ha demostrado ser el método de comunicación más
efectivo y seguro para estos tipos de problemas.
Inicio de la divulgación
""""""""""""""""""""""""
La divulgación comienza contactando al equipo de seguridad de hardware del
kernel de Linux por correo electrónico. Este contacto inicial debe
contener una descripción del problema y una lista de cualquier hardware
afectado conocido. Si su organización fabrica o distribuye el hardware
afectado, le animamos a considerar también que otro hardware podría estar
afectado.
El equipo de seguridad de hardware proporcionará una lista de correo
encriptada específica para el incidente que se utilizará para la discusión
inicial con el reportero, la divulgación adicional y la coordinación.
El equipo de seguridad de hardware proporcionará a la parte reveladora una
lista de desarrolladores (expertos de dominios) a quienes se debe informar
inicialmente sobre el problema después de confirmar con los
desarrolladores que se adherirán a este Memorando de Entendimiento y al
proceso documentado. Estos desarrolladores forman el equipo de respuesta
inicial y serán responsables de manejar el problema después del contacto
inicial. El equipo de seguridad de hardware apoyará al equipo de
respuesta, pero no necesariamente involucrándose en el proceso de desarrollo
de mitigación.
Si bien los desarrolladores individuales pueden estar cubiertos por un
acuerdo de no divulgación a través de su empleador, no pueden firmar
acuerdos individuales de no divulgación en su papel de desarrolladores
del kernel de Linux. Sin embargo, aceptarán adherirse a este proceso
documentado y al Memorando de Entendimiento.
La parte reveladora debe proporcionar una lista de contactos para todas
las demás entidades que ya han sido, o deberían ser, informadas sobre el
problema. Esto sirve para varios propósitos:
- La lista de entidades divulgadas permite la comunicación en toda la
industria, por ejemplo, otros vendedores de OS, vendedores de HW, etc.
- Las entidades divulgadas pueden ser contactadas para nombrar a expertos
que deben participar en el desarrollo de la mitigación.
- Si un experto que se requiere para manejar un problema es empleado por
una entidad listada o es miembro de una entidad listada, los equipos
de respuesta pueden solicitar la divulgación de ese experto a esa
entidad. Esto asegura que el experto también forme parte del equipo de
respuesta de la entidad.
Divulgación
"""""""""""
La parte reveladora proporcionará información detallada al equipo de
respuesta inicial a través de la lista de correo encriptada específica.
Según nuestra experiencia, la documentación técnica de estos problemas
suele ser un punto de partida suficiente y es mejor hacer aclaraciones
técnicas adicionales a través del correo electrónico.
Desarrollo de la mitigación
"""""""""""""""""""""""""""
El equipo de respuesta inicial configura una lista de correo encriptada o
reutiliza una existente si es apropiada.
El uso de una lista de correo está cerca del proceso normal de desarrollo
de Linux y se ha utilizado con éxito en el desarrollo de mitigación para
varios problemas de seguridad de hardware en el pasado.
La lista de correo funciona en la misma manera que el desarrollo normal de
Linux. Los parches se publican, discuten y revisan y, si se acuerda, se
aplican a un repositorio git no público al que solo pueden acceder los
desarrolladores participantes a través de una conexión segura. El
repositorio contiene la rama principal de desarrollo en comparación con
el kernel principal y las ramas backport para versiones estables del
kernel según sea necesario.
El equipo de respuesta inicial identificará a más expertos de la
comunidad de desarrolladores del kernel de Linux según sea necesario. La
incorporación de expertos puede ocurrir en cualquier momento del proceso
de desarrollo y debe manejarse de manera oportuna.
Si un experto es empleado por o es miembro de una entidad en la lista de
divulgación proporcionada por la parte reveladora, entonces se solicitará
la participación de la entidad pertinente.
Si no es así, entonces se informará a la parte reveladora sobre la
participación de los expertos. Los expertos están cubiertos por el
Memorando de Entendimiento y se solicita a la parte reveladora que
reconozca la participación. En caso de que la parte reveladora tenga una
razón convincente para objetar, entonces esta objeción debe plantearse
dentro de los cinco días laborables y resolverse con el equipo de
incidente inmediatamente. Si la parte reveladora no reacciona dentro de
los cinco días laborables, esto se toma como un reconocimiento silencioso.
Después del reconocimiento o la resolución de una objeción, el experto es
revelado por el equipo de incidente y se incorpora al proceso de
desarrollo.
Lanzamiento coordinado
""""""""""""""""""""""
Las partes involucradas negociarán la fecha y la hora en la que termina el
embargo. En ese momento, las mitigaciones preparadas se integran en los
árboles de kernel relevantes y se publican.
Si bien entendemos que los problemas de seguridad del hardware requieren
un tiempo de embargo coordinado, el tiempo de embargo debe limitarse al
tiempo mínimo que se requiere para que todas las partes involucradas
desarrollen, prueben y preparen las mitigaciones. Extender el tiempo de
embargo artificialmente para cumplir con las fechas de discusión de la
conferencia u otras razones no técnicas está creando más trabajo y carga
para los desarrolladores y los equipos de respuesta involucrados, ya que
los parches necesitan mantenerse actualizados para seguir el desarrollo en
curso del kernel upstream, lo cual podría crear cambios conflictivos.
Asignación de CVE
"""""""""""""""""
Ni el equipo de seguridad de hardware ni el equipo de respuesta inicial
asignan CVEs, ni se requieren para el proceso de desarrollo. Si los CVEs
son proporcionados por la parte reveladora, pueden usarse con fines de
documentación.
Embajadores del proceso
-----------------------
Para obtener asistencia con este proceso, hemos establecido embajadores
en varias organizaciones, que pueden responder preguntas o proporcionar
orientación sobre el proceso de reporte y el manejo posterior. Los
embajadores no están involucrados en la divulgación de un problema en
particular, a menos que lo solicite un equipo de respuesta o una parte
revelada involucrada. La lista de embajadores actuales:
============= ========================================================
AMD Tom Lendacky <thomas.lendacky@amd.com>
Ampere Darren Hart <darren@os.amperecomputing.com>
ARM Catalin Marinas <catalin.marinas@arm.com>
IBM Power Anton Blanchard <anton@linux.ibm.com>
IBM Z Christian Borntraeger <borntraeger@de.ibm.com>
Intel Tony Luck <tony.luck@intel.com>
Qualcomm Trilok Soni <tsoni@codeaurora.org>
Samsung Javier González <javier.gonz@samsung.com>
Microsoft James Morris <jamorris@linux.microsoft.com>
Xen Andrew Cooper <andrew.cooper3@citrix.com>
Canonical John Johansen <john.johansen@canonical.com>
Debian Ben Hutchings <ben@decadent.org.uk>
Oracle Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Red Hat Josh Poimboeuf <jpoimboe@redhat.com>
SUSE Jiri Kosina <jkosina@suse.cz>
Google Kees Cook <keescook@chromium.org>
LLVM Nick Desaulniers <ndesaulniers@google.com>
============= ========================================================
Si quiere que su organización se añada a la lista de embajadores, por
favor póngase en contacto con el equipo de seguridad de hardware. El
embajador nominado tiene que entender y apoyar nuestro proceso
completamente y está idealmente bien conectado en la comunidad del kernel
de Linux.
Listas de correo encriptadas
----------------------------
Usamos listas de correo encriptadas para la comunicación. El principio de
funcionamiento de estas listas es que el correo electrónico enviado a la
lista se encripta con la llave PGP de la lista o con el certificado S/MIME
de la lista. El software de lista de correo descifra el correo electrónico
y lo vuelve a encriptar individualmente para cada suscriptor con la llave
PGP del suscriptor o el certificado S/MIME. Los detalles sobre el software
de la lista de correo y la configuración que se usa para asegurar la
seguridad de las listas y la protección de los datos se pueden encontrar
aquí: https://korg.wiki.kernel.org/userdoc/remail.
Llaves de lista
^^^^^^^^^^^^^^^
Para el contacto inicial, consulte :ref:`Contacto`. Para las listas de
correo específicas de incidentes, la llave y el certificado S/MIME se
envían a los suscriptores por correo electrónico desde la lista
específica.
Suscripción a listas específicas de incidentes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
La suscripción es manejada por los equipos de respuesta. Las partes
reveladas que quieren participar en la comunicación envían una lista de
suscriptores potenciales al equipo de respuesta para que el equipo de
respuesta pueda validar las solicitudes de suscripción.
Cada suscriptor necesita enviar una solicitud de suscripción al equipo de
respuesta por correo electrónico. El correo electrónico debe estar firmado
con la llave PGP del suscriptor o el certificado S/MIME. Si se usa una
llave PGP, debe estar disponible desde un servidor de llaves públicas y está
idealmente conectada a la red de confianza PGP del kernel de Linux. Véase
también: https://www.kernel.org/signature.html.
El equipo de respuesta verifica que la solicitud del suscriptor sea válida
y añade al suscriptor a la lista. Después de la suscripción, el suscriptor
recibirá un correo electrónico de la lista que está firmado con la llave
PGP de la lista o el certificado S/MIME de la lista. El cliente de correo
electrónico del suscriptor puede extraer la llave PGP o el certificado
S/MIME de la firma, de modo que el suscriptor pueda enviar correo
electrónico encriptado a la lista.

@ -22,3 +22,5 @@
    adding-syscalls
    researcher-guidelines
    contribution-maturity-model
+   security-bugs
+   embargoed-hardware-issues

@ -0,0 +1,103 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-sp.rst
:Original: Documentation/process/security-bugs.rst
:Translator: Avadhut Naik <avadhut.naik@amd.com>
Errores de seguridad
====================
Los desarrolladores del kernel de Linux se toman la seguridad muy en
serio. Como tal, nos gustaría saber cuándo se encuentra un error de
seguridad para que pueda ser corregido y divulgado lo más rápido posible.
Por favor, informe sobre los errores de seguridad al equipo de seguridad
del kernel de Linux.
Contacto
--------
El equipo de seguridad del kernel de Linux puede ser contactado por correo
electrónico en <security@kernel.org>. Esta es una lista privada de
oficiales de seguridad que ayudarán a verificar el informe del error y
desarrollarán y publicarán una corrección. Si ya tiene una corrección, por
favor, inclúyala con su informe, ya que eso puede acelerar considerablemente
el proceso. Es posible que el equipo de seguridad traiga ayuda adicional
de mantenedores del área para comprender y corregir la vulnerabilidad de
seguridad.
Como ocurre con cualquier error, cuanta más información se proporcione,
más fácil será diagnosticarlo y corregirlo. Por favor, revise el
procedimiento descrito en 'Documentation/admin-guide/reporting-issues.rst'
si no tiene claro qué información es útil. Cualquier código de explotación
es muy útil y no será divulgado sin el consentimiento del "reportero" (el
que envía el error) a menos que ya se haya hecho público.
Por favor, envíe correos electrónicos en texto plano sin archivos
adjuntos cuando sea posible. Es mucho más difícil tener una discusión
citada en contexto sobre un tema complejo si todos los detalles están
ocultos en archivos adjuntos. Piense en ello como un
:doc:`envío de parche regular <submitting-patches>` (incluso si no tiene
un parche todavía): describa el problema y el impacto, enumere los pasos
de reproducción, y sígalo con una solución propuesta, todo en texto plano.
Divulgación e información embargada
-----------------------------------
La lista de seguridad no es un canal de divulgación. Para eso, ver
Coordinación debajo. Una vez que se ha desarrollado una solución robusta,
comienza el proceso de lanzamiento. Las soluciones para errores conocidos
públicamente se lanzan inmediatamente.
Aunque nuestra preferencia es lanzar soluciones para errores no divulgados
públicamente tan pronto como estén disponibles, esto puede posponerse a
petición del reportero o una parte afectada por hasta 7 días calendario
desde el inicio del proceso de lanzamiento, con una extensión excepcional
a 14 días de calendario si se acuerda que la criticidad del error requiere
más tiempo. La única razón válida para aplazar la publicación de una
solución es para acomodar la logística de QA y los despliegues a gran
escala que requieren coordinación de lanzamiento.
Si bien la información embargada puede compartirse con personas de
confianza para desarrollar una solución, dicha información no se publicará
junto con la solución o en cualquier otro canal de divulgación sin el
permiso del reportero. Esto incluye, pero no se limita al informe original
del error y las discusiones de seguimiento (si las hay), exploits,
información sobre CVE o la identidad del reportero.
En otras palabras, nuestro único interés es solucionar los errores. Toda
otra información presentada a la lista de seguridad y cualquier discusión
de seguimiento del informe se tratan confidencialmente incluso después de
que se haya levantado el embargo, en perpetuidad.
Coordinación con otros grupos
-----------------------------
El equipo de seguridad del kernel recomienda encarecidamente que los
reporteros de posibles problemas de seguridad NUNCA contacten la lista
de correo “linux-distros” hasta DESPUÉS de discutirlo con el equipo de
seguridad del kernel. No Cc: ambas listas a la vez. Puede ponerse en
contacto con la lista de correo linux-distros después de que se haya
acordado una solución y comprenda completamente los requisitos que al
hacerlo le impondrá a usted y la comunidad del kernel.
Las diferentes listas tienen diferentes objetivos y las reglas de
linux-distros no contribuyen en realidad a solucionar ningún problema de
seguridad potencial.
Asignación de CVE
-----------------
El equipo de seguridad no asigna CVEs, ni los requerimos para informes o
correcciones, ya que esto puede complicar innecesariamente el proceso y
puede retrasar el manejo de errores. Si un reportero desea que se le
asigne un identificador CVE, debe buscar uno por sí mismo, por ejemplo,
poniéndose en contacto directamente con MITRE. Sin embargo, en ningún
caso se retrasará la inclusión de un parche para esperar a que llegue un
identificador CVE.
Acuerdos de no divulgación
--------------------------
El equipo de seguridad del kernel de Linux no es un organismo formal y,
por lo tanto, no puede firmar ningún acuerdo de no divulgación.

@ -10,7 +10,7 @@
    mips/index
    arm64/index
-   ../riscv/index
+   ../arch/riscv/index
    openrisc/index
    parisc/index
    loongarch/index

@ -1,6 +1,6 @@
-.. include:: ../disclaimer-zh_CN.rst
+.. include:: ../../disclaimer-zh_CN.rst

-:Original: Documentation/riscv/boot-image-header.rst
+:Original: Documentation/arch/riscv/boot-image-header.rst

 :翻译:

@ -1,8 +1,8 @@
 .. SPDX-License-Identifier: GPL-2.0
-.. include:: ../disclaimer-zh_CN.rst
+.. include:: ../../disclaimer-zh_CN.rst

-:Original: Documentation/riscv/index.rst
+:Original: Documentation/arch/riscv/index.rst

 :翻译:

@ -1,8 +1,8 @@
 .. SPDX-License-Identifier: GPL-2.0
-.. include:: ../disclaimer-zh_CN.rst
+.. include:: ../../disclaimer-zh_CN.rst

-:Original: Documentation/riscv/patch-acceptance.rst
+:Original: Documentation/arch/riscv/patch-acceptance.rst

 :翻译:

@ -1,7 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0
-.. include:: ../disclaimer-zh_CN.rst
+.. include:: ../../disclaimer-zh_CN.rst

-:Original: Documentation/riscv/vm-layout.rst
+:Original: Documentation/arch/riscv/vm-layout.rst

 :翻译:

@ -52,12 +52,9 @@
    core-api/index
    driver-api/index
+   subsystem-apis
    内核中的锁 <locking/index>

-TODOList:
-
-* subsystem-apis
-
 开发工具和流程
 --------------
@ -89,4 +89,4 @@
    ../doc-guide/maintainer-profile
    ../../../nvdimm/maintainer-entry-profile
-   ../../../riscv/patch-acceptance
+   ../../../arch/riscv/patch-acceptance

@ -0,0 +1,110 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ./disclaimer-zh_CN.rst
:Original: Documentation/subsystem-apis.rst
:翻译:
唐艺舟 Tang Yizhou <tangyeechou@gmail.com>
==============
内核子系统文档
==============
这些书籍从内核开发者的角度,详细介绍了特定内核子系统
是如何工作的。这里的大部分信息直接取自内核源代码,并
根据需要添加了补充材料(或者至少是我们设法添加的 - 可
*不是* 所有的材料都有需要)。
核心子系统
----------
.. toctree::
:maxdepth: 1
core-api/index
driver-api/index
mm/index
power/index
scheduler/index
locking/index
TODOList:
* timers/index
人机接口
--------
.. toctree::
:maxdepth: 1
sound/index
TODOList:
* input/index
* hid/index
* gpu/index
* fb/index
网络接口
--------
.. toctree::
:maxdepth: 1
infiniband/index
TODOList:
* networking/index
* netlabel/index
* isdn/index
* mhi/index
存储接口
--------
.. toctree::
:maxdepth: 1
filesystems/index
TODOList:
* block/index
* cdrom/index
* scsi/index
* target/index
**Fixme**: 这里还需要更多的分类组织工作。
.. toctree::
:maxdepth: 1
accounting/index
cpu-freq/index
iio/index
virt/index
PCI/index
peci/index
TODOList:
* fpga/index
* i2c/index
* leds/index
* pcmcia/index
* spi/index
* w1/index
* watchdog/index
* hwmon/index
* accel/index
* security/index
* crypto/index
* bpf/index
* usb/index
* misc-devices/index
* wmi/index

@ -9,16 +9,16 @@
吳想成 Wu XiangCheng <bobwxc@email.cn>
胡皓文 Hu Haowen <src.res.211@gmail.com>
Linux內核5.x版本 <http://kernel.org/>
Linux內核6.x版本 <http://kernel.org/>
=========================================
以下是Linux版本5的發行註記。仔細閱讀它們,
以下是Linux版本6的發行註記。仔細閱讀它們,
它們會告訴你這些都是什麼,解釋如何安裝內核,以及遇到問題時該如何做。
什麼是Linux
---------------
Linux是Unix作系統的克隆版本由Linus Torvalds在一個鬆散的網絡黑客
Linux是Unix作系統的克隆版本由Linus Torvalds在一個鬆散的網絡黑客
Hacker無貶義團隊的幫助下從頭開始編寫。它旨在實現兼容POSIX和
單一UNIX規範。
@ -28,7 +28,7 @@ Linux內核5.x版本 <http://kernel.org/>
Linux在GNU通用公共許可證版本2GNU GPLv2下分發詳見隨附的COPYING文件。
它能在什麼樣的硬上運行?
它能在什麼樣的硬上運行?
-----------------------------
雖然Linux最初是爲32位的x86 PC機386或更高版本開發的但今天它也能運行在
@ -40,16 +40,16 @@ Linux內核5.x版本 <http://kernel.org/>
單元PMMU和一個移植的GNU C編譯器gccGNU Compiler CollectionGCC的一
部分。Linux也被移植到許多沒有PMMU的體系架構中儘管功能顯然受到了一定的
限制。
Linux也被移植到了其自己上。現在可以將內核作爲用戶空間應用程式運行——這被
Linux也被移植到了其自己上。現在可以將內核作爲用戶空間應用程序運行——這被
稱爲用戶模式Linux(UML)。
文檔
-----
際網路上和書籍上都有大量的電子文檔既有Linux專屬文檔也有與一般UNIX問題相關
因特網上和書籍上都有大量的電子文檔既有Linux專屬文檔也有與一般UNIX問題相關
的文檔。我建議在任何Linux FTP站點上查找LDPLinux文檔項目書籍的文檔子目錄。
本自述文件並不是關於系統的文檔:有更好的可用資源。
- 網際網路上和書籍上都有大量的電子文檔既有Linux專屬文檔也有與普通
- 因特網上和書籍上都有大量的電子文檔既有Linux專屬文檔也有與普通
UNIX問題相關的文檔。我建議在任何有LDPLinux文檔項目書籍的Linux FTP
站點上查找文檔子目錄。本自述文件並不是關於系統的文檔:有更好的可用資源。
@ -58,33 +58,33 @@ Linux內核5.x版本 <http://kernel.org/>
:ref:`Documentation/process/changes.rst <changes>` 文件,它包含了升級內核
可能會導致的問題的相關信息。
安裝內核原始
安裝內核源代
---------------
- 如果您要安裝完整的原始請把內核tar檔案包放在您有權限的目錄中例如您
- 如果您要安裝完整的源代請把內核tar檔案包放在您有權限的目錄中例如您
的主目錄)並將其解包::
xz -cd linux-5.x.tar.xz | tar xvf -
xz -cd linux-6.x.tar.xz | tar xvf -
「X」替換成最新內核的版本號。
“X”替換成最新內核的版本號。
【不要】使用 /usr/src/linux 目錄!這有一組庫頭文件使用的內核頭文件
【不要】使用 /usr/src/linux 目錄!這有一組庫頭文件使用的內核頭文件
(通常是不完整的)。它們應該與庫匹配,而不是被內核的變化搞得一團糟。
- 您還可以通過打補丁在5.x版本之間升級。補丁以xz格式分發。要通過打補丁進行
安裝,請獲取所有較新的補丁文件,進入內核原始碼linux-5.x的目錄並
- 您還可以通過打補丁在6.x版本之間升級。補丁以xz格式分發。要通過打補丁進行
安裝,請獲取所有較新的補丁文件,進入內核源代碼linux-6.x的目錄並
執行::
xz -cd ../patch-5.x.xz | patch -p1
xz -cd ../patch-6.x.xz | patch -p1
請【按順序】替換所有大於當前原始碼樹版本的「x」,這樣就可以了。您可能想要
請【按順序】替換所有大於當前源代碼樹版本的“x”,這樣就可以了。您可能想要
刪除備份文件文件名類似xxx~ 或 xxx.orig),並確保沒有失敗的補丁(文件名
類似xxx# 或 xxx.rej。如果有不是你就是我犯了錯誤。
與5.x內核的補丁不同,5.x.y內核(也稱爲穩定版內核)的補丁不是增量的,而是
直接應用於基本的5.x內核。例如,如果您的基本內核是5.0,並且希望應用5.0.3
補丁,則不應先應用5.0.1和5.0.2的補丁。類似地,如果您運行的是5.0.2內核,
並且希望跳轉到5.0.3,那麼在應用5.0.3補丁之前,必須首先撤銷5.0.2補丁
與6.x內核的補丁不同,6.x.y內核(也稱爲穩定版內核)的補丁不是增量的,而是
直接應用於基本的6.x內核。例如,如果您的基本內核是6.0,並且希望應用6.0.3
補丁,則不應先應用6.0.1和6.0.2的補丁。類似地,如果您運行的是6.0.2內核,
並且希望跳轉到6.0.3,那麼在應用6.0.3補丁之前,必須首先撤銷6.0.2補丁
(即patch -R)。更多關於這方面的內容,請閱讀
:ref:`Documentation/process/applying-patches.rst <applying_patches>`
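The non-incremental rule for stable (6.x.y) patches described above can be sketched as a small helper. This is only an illustration: the function name `stable_patch_steps` is hypothetical, and the `patch-6.x.y.xz` file naming is assumed by analogy with the `patch-6.x.xz` example in the text.

```python
def stable_patch_steps(current, target):
    """Return the shell commands needed to move between stable releases.

    Stable patches apply to the base 6.x tree, so the current stable patch
    (if any) is reverted first, then the target patch is applied.
    """
    def split(version):
        parts = version.split(".")
        base = ".".join(parts[:2])
        stable = int(parts[2]) if len(parts) > 2 else 0
        return base, stable

    cur_base, cur_stable = split(current)
    tgt_base, tgt_stable = split(target)
    if cur_base != tgt_base:
        raise ValueError("both versions must share the same base kernel")

    steps = []
    if cur_stable:
        # Undo the stable patch that is currently applied (patch -R).
        steps.append(f"xz -cd ../patch-{current}.xz | patch -R -p1")
    if tgt_stable:
        # Apply the target stable patch directly on the base tree.
        steps.append(f"xz -cd ../patch-{target}.xz | patch -p1")
    return steps


for command in stable_patch_steps("6.0.2", "6.0.3"):
    print(command)
```

Going from a plain 6.0 tree to 6.0.3 yields a single apply step, since there is no earlier stable patch to revert.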
@ -93,7 +93,7 @@ Linux內核5.x版本 <http://kernel.org/>
linux/scripts/patch-kernel linux
上面命令中的第一個參數是內核原始碼的位置。補丁是在當前目錄應用的,但是
上面命令中的第一個參數是內核源代碼的位置。補丁是在當前目錄應用的,但是
可以將另一個目錄指定爲第二個參數。
- 確保沒有過時的 .o 文件和依賴項::
@ -101,30 +101,30 @@ Linux內核5.x版本 <http://kernel.org/>
cd linux
make mrproper
現在您應該已經正確安裝了原始碼。
現在您應該已經正確安裝了源代碼。
要求
要求
---------
編譯和運行5.x內核需要各種軟體包的最新版本。請參考
編譯和運行6.x內核需要各種軟件包的最新版本。請參考
:ref:`Documentation/process/changes.rst <changes>`
來了解最低版本要求以及如何升級軟包。請注意,使用過舊版本的這些包可能會
來了解最低版本要求以及如何升級軟包。請注意,使用過舊版本的這些包可能會
導致很難追蹤的間接錯誤,因此不要以爲在生成或操作過程中出現明顯問題時可以
只更新包。
爲內核建立目錄
---------------
編譯內核時,默認情況下所有輸出文件都將與內核原始碼放在一起。使用
編譯內核時,默認情況下所有輸出文件都將與內核源代碼放在一起。使用
``make O=output/dir`` 選項可以爲輸出文件(包括 .config指定備用位置。
例如::
kernel source code: /usr/src/linux-5.x
kernel source code: /usr/src/linux-6.x
build directory: /home/name/build/kernel
要配置和構建內核,請使用::
cd /usr/src/linux-5.x
cd /usr/src/linux-6.x
make O=/home/name/build/kernel menuconfig
make O=/home/name/build/kernel
sudo make O=/home/name/build/kernel modules_install install
@ -136,7 +136,7 @@ Linux內核5.x版本 <http://kernel.org/>
即使只升級一個小版本,也不要跳過此步驟。每個版本中都會添加新的配置選項,
如果配置文件沒有按預定設置,就會出現奇怪的問題。如果您想以最少的工作量
將現有配置升級到新版本,請使用 ``makeoldconfig`` ,它只會詢問您新配置
將現有配置升級到新版本,請使用 ``make oldconfig`` ,它只會詢問您新配置
選項的答案。
- 其他配置命令包括::
@ -164,17 +164,17 @@ Linux內核5.x版本 <http://kernel.org/>
"make ${PLATFORM}_defconfig"
使用arch/$arch/configs/${PLATFORM}_defconfig中
的默認選項值創建一個./.config文件。
「makehelp」來獲取您體系架構中所有可用平台的列表。
“make help”來獲取您體系架構中所有可用平臺的列表。
"make allyesconfig"
通過儘可能將選項值設置爲「y」,創建一個
通過儘可能將選項值設置爲“y”,創建一個
./.config文件。
"make allmodconfig"
通過儘可能將選項值設置爲「m」,創建一個
通過儘可能將選項值設置爲“m”,創建一個
./.config文件。
"make allnoconfig" 通過儘可能將選項值設置爲「n」,創建一個
"make allnoconfig" 通過儘可能將選項值設置爲“n”,創建一個
./.config文件。
"make randconfig" 通過隨機設置選項值來創建./.config文件。
@ -182,7 +182,7 @@ Linux內核5.x版本 <http://kernel.org/>
"make localmodconfig" 基於當前配置和加載的模塊lsmod創建配置。禁用
已加載的模塊不需要的任何模塊選項。
要爲另一計算機創建localmodconfig請將該計算機
要爲另一計算機創建localmodconfig請將該計算機
的lsmod存儲到一個文件中並將其作爲lsmod參數傳入。
此外通過在參數LMC_KEEP中指定模塊的路徑可以將
@ -200,9 +200,10 @@ Linux內核5.x版本 <http://kernel.org/>
"make localyesconfig" 與localmodconfig類似只是它會將所有模塊選項轉換
爲內置(=y。你可以同時通過LMC_KEEP保留模塊。
"make kvmconfig" 爲kvm客體內核支持啓用其他選項。
"make kvm_guest.config"
爲kvm客戶機內核支持啓用其他選項。
"make xenconfig" 爲xen dom0客體內核支持啓用其他選項。
"make xen.config" 爲xen dom0客戶機內核支持啓用其他選項。
"make tinyconfig" 配置儘可能小的內核。
@ -218,10 +219,10 @@ Linux內核5.x版本 <http://kernel.org/>
這種情況下,數學仿真永遠不會被使用。內核會稍微大一點,但不管
是否有數學協處理器,都可以在不同的機器上工作。
- 「kernel hacking」配置細節通常會導致更大或更慢的內核(或兩者
- “kernel hacking”配置細節通常會導致更大或更慢的內核(或兩者
兼而有之),甚至可以通過配置一些例程來主動嘗試破壞壞代碼以發現
內核問題從而降低內核的穩定性kmalloc())。因此,您可能應該
用於研究「開發」、「實驗」或「調試」特性相關問題。
用於研究“開發”、“實驗”或“調試”特性相關問題。
編譯內核
---------
@ -229,10 +230,8 @@ Linux內核5.x版本 <http://kernel.org/>
- 確保您至少有gcc 5.1可用。
有關更多信息,請參閱 :ref:`Documentation/process/changes.rst <changes>`
請注意您仍然可以使用此內核運行a.out用戶程序。
- 執行 ``make`` 來創建壓縮內核映像。如果您安裝了lilo以適配內核makefile
那麼也可以進行 ``makeinstall`` 但是您可能需要先檢查特定的lilo設置。
那麼也可以進行 ``make install`` 但是您可能需要先檢查特定的lilo設置。
實際安裝必須以root身份執行但任何正常構建都不需要。
無須徒然使用root身份。
@ -242,8 +241,8 @@ Linux內核5.x版本 <http://kernel.org/>
- 詳細的內核編譯/生成輸出:
通常,內核構建系統在相當安靜的模式下運行(但不是完全安靜)。但是有時您或
其他內核開發人員需要看到編譯、連結或其他命令的執行過程。爲此,可使用
「verbose詳細構建模式。
其他內核開發人員需要看到編譯、鏈接或其他命令的執行過程。爲此,可使用
“verbose詳細構建模式。
``make`` 命令傳遞 ``V=1`` 來實現,例如::
make V=1 all
@ -255,15 +254,15 @@ Linux內核5.x版本 <http://kernel.org/>
與工作內核版本號相同的新內核,請在進行 ``make modules_install`` 安裝
之前備份modules目錄。
或者,在編譯之前,使用內核配置選項「LOCALVERSION」向常規內核版本附加
一個唯一的後綴。LOCALVERSION可以在「General Setup」菜單中設置。
或者,在編譯之前,使用內核配置選項“LOCALVERSION”向常規內核版本附加
一個唯一的後綴。LOCALVERSION可以在“General Setup”菜單中設置。
- 爲了引導新內核,您需要將內核映像(例如編譯後的
.../linux/arch/x86/boot/bzImage複製到常規可引導內核的位置。
- 不再支持在沒有LILO等啓動裝載程序幫助的情況下直接從軟盤引導內核。
如果從硬引導Linux很可能使用LILO它使用/etc/lilo.conf文件中
如果從硬引導Linux很可能使用LILO它使用/etc/lilo.conf文件中
指定的內核映像文件。內核映像文件通常是/vmlinuz、/boot/vmlinuz、
/bzImage或/boot/bzImage。使用新內核前請保存舊映像的副本並複製
新映像覆蓋舊映像。然後您【必須重新運行LILO】來更新加載映射否則
@ -284,68 +283,13 @@ Linux內核5.x版本 <http://kernel.org/>
若遇到問題
-----------
- 如果您發現了一些可能由於內核缺陷所導致的問題請檢查MAINTAINERS維護者
文件看看是否有人與令您遇到麻煩的內核部分相關。如果無人在此列出,那麼第二
個最好的方案就是把它們發給我torvalds@linux-foundation.org也可能發送
到任何其他相關的郵件列表或新聞組。
如果您發現了一些可能由於內核缺陷所導致的問題,請參閱:
Documentation/translations/zh_CN/admin-guide/reporting-issues.rst 。
- 在所有的缺陷報告中,【請】告訴我們您在說什麼內核,如何復現問題,以及您的
設置是什麼的(使用您的常識)。如果問題是新的,請告訴我;如果問題是舊的,
請嘗試告訴我您什麼時候首次注意到它。
想要理解內核錯誤報告,請參閱:
Documentation/translations/zh_CN/admin-guide/bug-hunting.rst 。
- 如果缺陷導致如下消息::
unable to handle kernel paging request at address C0000010
Oops: 0002
EIP: 0010:XXXXXXXX
eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx
esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx
ds: xxxx es: xxxx fs: xxxx gs: xxxx
Pid: xx, process nr: xx
xx xx xx xx xx xx xx xx xx xx
或者類似的內核調試信息顯示在屏幕上或在系統日誌里,請【如實】複製它。
可能對你來說轉儲dump看起來不可理解但它確實包含可能有助於調試問題的
信息。轉儲上方的文本也很重要:它說明了內核轉儲代碼的原因(在上面的示例中,
是由於內核指針錯誤)。更多關於如何理解轉儲的信息,請參見
Documentation/admin-guide/bug-hunting.rst。
- 如果使用 CONFIG_KALLSYMS 編譯內核,則可以按原樣發送轉儲,否則必須使用
``ksymoops`` 程序來理解轉儲但通常首選使用CONFIG_KALLSYMS編譯
此實用程序可從
https://www.kernel.org/pub/linux/utils/kernel/ksymoops/ 下載。
或者,您可以手動執行轉儲查找:
- 在調試像上面這樣的轉儲時如果您可以查找EIP值的含義這將非常有幫助。
十六進位值本身對我或其他任何人都沒有太大幫助:它會取決於特定的內核設置。
您應該做的是從EIP行獲取十六進位值忽略 ``0010:`` ),然後在內核名字列表
中查找它,以查看哪個內核函數包含有問題的地址。
要找到內核函數名,您需要找到與顯示症狀的內核相關聯的系統二進位文件。就是
文件「linux/vmlinux」。要提取名字列表並將其與內核崩潰中的EIP進行匹配
請執行::
nm vmlinux | sort | less
這將爲您提供一個按升序排序的內核地址列表,從中很容易找到包含有問題的地址
的函數。請注意,內核調試消息提供的地址不一定與函數地址完全匹配(事實上,
這是不可能的因此您不能只「grep」列表不過列表將爲您提供每個內核函數
的起點,因此通過查找起始地址低於你正在搜索的地址,但後一個函數的高於的
函數,你會找到您想要的。實際上,在您的問題報告中加入一些「上下文」可能是
一個好主意,給出相關的上下幾行。
如果您由於某些原因無法完成上述操作(如您使用預編譯的內核映像或類似的映像),
請儘可能多地告訴我您的相關設置信息,這會有所幫助。有關詳細信息請閱讀
『Documentation/admin-guide/reporting-issues.rst』。
- 或者您可以在正在運行的內核上使用gdb只讀的即不能更改值或設置斷點
爲此,請首先使用-g編譯內核適當地編輯arch/x86/Makefile然後執行 ``make
clean`` 。您還需要啓用CONFIG_PROC_FS通過 ``make config`` )。
使用新內核重新啓動後,執行 ``gdb vmlinux /proc/kcore`` 。現在可以使用所有
普通的gdb命令。查找系統崩潰點的命令是 ``l *0xXXXXXXXX`` 將xxx替換爲EIP
值)。
用gdb無法調試一個當前未運行的內核是由於gdb錯誤地忽略了編譯內核的起始
偏移量。
更多用GDB調試內核的信息請參閱
Documentation/translations/zh_CN/dev-tools/gdb-kernel-debugging.rst
和 Documentation/dev-tools/kgdb.rst 。


@ -0,0 +1,294 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_TW.rst
:Original: Documentation/admin-guide/bootconfig.rst
:譯者: 吳想成 Wu XiangCheng <bobwxc@email.cn>
========
引導配置
========
:作者: Masami Hiramatsu <mhiramat@kernel.org>
概述
====
引導配置擴展了現有的內核命令行,以一種更有效率的方式在引導內核時進一步支持
鍵值數據。這允許管理員傳遞一份結構化關鍵字的配置文件。
配置文件語法
============
引導配置文件的語法採用非常簡單的鍵值結構。每個關鍵字由點連接的單詞組成,鍵
和值由 ``=`` 連接。值以分號( ``;`` )或換行符( ``\n`` )結尾。數組值中每
個元素由逗號( ``,`` )分隔。::
KEY[.WORD[...]] = VALUE[, VALUE2[...]][;]
與內核命令行語法不同,逗號和 ``=`` 周圍允許有空格。
關鍵字只允許包含字母、數字、連字符( ``-`` )和下劃線( ``_`` )。值可包含
可打印字符和空格,但分號( ``;`` )、換行符( ``\n`` )、逗號( ``,`` )、
井號( ``#`` )和右大括號( ``}`` )等分隔符除外。
如果你需要在值中使用這些分隔符,可以用雙引號( ``"VALUE"`` )或單引號
( ``'VALUE'`` )括起來。注意,引號無法轉義。
鍵的值可以爲空或不存在。這些鍵用於檢查該鍵是否存在(類似布爾值)。
鍵值語法
--------
引導配置文件語法允許用戶通過大括號合併鍵名部分相同的關鍵字。例如::
foo.bar.baz = value1
foo.bar.qux.quux = value2
也可以寫成::
foo.bar {
baz = value1
qux.quux = value2
}
或者更緊湊一些,寫成::
foo.bar { baz = value1; qux.quux = value2 }
在這兩種樣式中,引導解析時相同的關鍵字都會自動合併。因此可以追加類似的樹或
鍵值。
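To illustrate how the brace form and the dotted form describe the same keys, here is a minimal Python sketch that flattens brace-grouped text into dotted keys. It is not the kernel parser: the name `flatten_bootconfig` is hypothetical, and the sketch assumes no comments, no quoting, and that every bare key opens a brace block.

```python
import re


def flatten_bootconfig(text):
    """Expand brace-grouped keys into a dict of dotted key -> value."""
    result = {}
    prefix = []  # key fragments of the currently open brace blocks
    # Tokens are '{', '}', or a statement ended by ';' or a newline.
    for token in re.findall(r"[{}]|[^{};\n]+", text):
        token = token.strip()
        if not token or token == "{":
            continue  # the key that opens a block was pushed already
        if token == "}":
            prefix.pop()
        elif "=" in token:
            key, _, value = token.partition("=")
            result[".".join(prefix + [key.strip()])] = value.strip()
        else:
            prefix.append(token)  # bare key, assumed to be followed by '{'
    return result


nested = "foo.bar {\n  baz = value1\n  qux.quux = value2\n}\n"
compact = "foo.bar { baz = value1; qux.quux = value2 }"
print(flatten_bootconfig(nested) == flatten_bootconfig(compact))
```

Both spellings yield the keys `foo.bar.baz` and `foo.bar.qux.quux`, which is why trees with a shared prefix can be merged.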
相同關鍵字的值
--------------
禁止兩個或多個值或數組共享同一個關鍵字。例如::
foo = bar, baz
foo = qux # !錯誤! 我們不可以重定義相同的關鍵字
如果你想要更新值,必須顯式使用覆蓋操作符 ``:=`` 。例如::
foo = bar, baz
foo := qux
這樣 ``foo`` 關鍵字的值就變成了 ``qux`` 。這對於通過添加(部分)自定義引導
配置來覆蓋默認值非常有用,免於解析默認引導配置。
如果你想對現有關鍵字追加值作爲數組成員,可以使用 ``+=`` 操作符。例如::
foo = bar, baz
foo += qux
這樣, ``foo`` 關鍵字就同時擁有了 ``bar`` , ``baz`` 和 ``qux`` 。
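The three operators can be modelled in a few lines of Python. This is a hedged sketch of the semantics described above, not the kernel implementation; `apply_assignments` is a hypothetical name, and quoting, comments and braces are ignored.

```python
def apply_assignments(lines):
    """Apply '=', ':=' and '+=' statements to a dict of key -> list of values."""
    config = {}
    for line in lines:
        # Look for the two-character operators before the plain '='.
        for op in (":=", "+=", "="):
            if op in line:
                key, _, raw = line.partition(op)
                break
        else:
            raise ValueError(f"no operator in {line!r}")
        key = key.strip()
        values = [v.strip() for v in raw.split(",") if v.strip()]
        if op == "=":
            if key in config:
                # Redefining an existing key with '=' is an error.
                raise ValueError(f"{key!r} is already defined; use ':=' or '+='")
            config[key] = values
        elif op == ":=":
            config[key] = values  # override the previous value
        else:
            config.setdefault(key, []).extend(values)  # append array members
    return config


print(apply_assignments(["foo = bar, baz", "foo += qux"]))
```

Using `:=` in the second statement instead would leave `foo` with the single value `qux`.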
此外,父關鍵字下可同時存在值和子關鍵字。
例如,下列配置是可行的。::
foo = value1
foo.bar = value2
foo := value3 # 這會更新foo的值。
注意,裸值不能直接放進結構化關鍵字中,必須在大括號外定義它。例如::
foo {
bar = value1
bar {
baz = value2
qux = value3
}
}
同時,關鍵字下值節點的順序是固定的。如果值和子關鍵字同時存在,值永遠是該關
鍵字的第一個子節點。因此如果用戶先指定子關鍵字,如::
foo.bar = value1
foo = value2
則在程序(和/proc/bootconfig)中,它會按如下顯示::
foo = value2
foo.bar = value1
註釋
----
配置語法接受shell腳本風格的註釋。註釋以井號( ``#`` )開始,到換行符
( ``\n`` )結束。
::
# comment line
foo = value # value is set to foo.
bar = 1, # 1st element
2, # 2nd element
3 # 3rd element
會被解析爲::
foo = value
bar = 1, 2, 3
注意你不能把註釋放在值和分隔符( ``,`` 或 ``;`` )之間。如下配置語法是錯誤的::
key = 1 # comment
,2
/proc/bootconfig
================
/proc/bootconfig是引導配置的用戶空間接口。與/proc/cmdline不同,此文件內容以
鍵值列表樣式顯示。
每個鍵值對一行,樣式如下::
KEY[.WORDS...] = "[VALUE]"[,"VALUE2"...]
用引導配置引導內核
==================
用引導配置引導內核有兩種方法:將引導配置附加到initrd鏡像,或直接嵌入內核中。
*initrd: initial RAM disk,初始內存磁盤*
將引導配置附加到initrd
----------------------
由於默認情況下引導配置文件是用initrd加載的,因此它將被添加到initrd(initramfs)
鏡像文件的末尾,其中包含填充、大小、校驗值和12字節幻數,如下所示::
[initrd][bootconfig][padding][size(le32)][checksum(le32)][#BOOTCONFIG\n]
大小和校驗值爲小端序存放的32位無符號值。
當引導配置被加到initrd鏡像時,整個文件大小會對齊到4字節。空字符( ``\0`` )
會填補對齊空隙。因此 ``size`` 就是引導配置文件的長度+填充的字節。
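The footer layout above can be sketched in Python as follows. The helper names are hypothetical, and the checksum is ASSUMED here to be a plain 32-bit sum of the payload bytes; the text only says it is a little-endian 32-bit value, so treat that detail as an assumption rather than the tool's documented behaviour.

```python
import struct

BOOTCONFIG_MAGIC = b"#BOOTCONFIG\n"  # the 12-byte magic from the layout above


def append_bootconfig(initrd: bytes, bootconfig: bytes) -> bytes:
    """Append [bootconfig][padding][size][checksum][magic] to an initrd image."""
    total = len(initrd) + len(bootconfig) + 8 + len(BOOTCONFIG_MAGIC)
    pad = (-total) % 4  # NUL bytes needed to align the whole file to 4 bytes
    payload = bootconfig + b"\0" * pad
    checksum = sum(payload) & 0xFFFFFFFF  # ASSUMPTION: byte-sum checksum
    footer = struct.pack("<II", len(payload), checksum) + BOOTCONFIG_MAGIC
    return initrd + payload + footer


def extract_bootconfig(image: bytes) -> bytes:
    """Locate the bootconfig from the end of the image, as the kernel would."""
    if not image.endswith(BOOTCONFIG_MAGIC):
        raise ValueError("no bootconfig magic at the end of the image")
    footer_start = len(image) - len(BOOTCONFIG_MAGIC) - 8
    size, checksum = struct.unpack_from("<II", image, footer_start)
    if size > footer_start:
        raise ValueError("bootconfig size is larger than the image")
    payload = image[footer_start - size:footer_start]
    if sum(payload) & 0xFFFFFFFF != checksum:
        raise ValueError("bootconfig checksum mismatch")
    return payload.rstrip(b"\0")  # drop the alignment padding


image = append_bootconfig(b"x" * 5, b"foo = bar\n")
print(len(image) % 4, extract_bootconfig(image))
```

Because the footer is read backwards from the end of the file, a boot loader that reports a longer initrd size than the real one hides the magic and the lookup fails.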
Linux內核在內存中解碼initrd鏡像的最後部分以獲取引導配置數據。由於這種“揹負式”
的方法,只要引導加載器傳遞了正確的initrd文件大小,就無需更改或更新引導加載器
和內核鏡像本身。如果引導加載器意外傳遞了更長的大小,內核將無法找到引導配置數
據。
Linux內核在tools/bootconfig下提供了 ``bootconfig`` 命令來完成此操作,管理員
可以用它從initrd鏡像中刪除或追加配置文件。你可以用以下命令來構建它::
# make -C tools/bootconfig
要向initrd鏡像添加你的引導配置文件,請按如下命令操作(舊數據會自動移除)::
# tools/bootconfig/bootconfig -a your-config /boot/initrd.img-X.Y.Z
要從鏡像中移除配置,可以使用-d選項::
# tools/bootconfig/bootconfig -d /boot/initrd.img-X.Y.Z
然後在內核命令行上添加 ``bootconfig`` 告訴內核去initrd文件末尾尋找內核配置。
將引導配置嵌入內核
------------------
如果你不能使用initrd,也可以通過Kconfig選項將引導配置文件嵌入內核中。在此情
況下,你需要用以下選項重新編譯內核::
CONFIG_BOOT_CONFIG_EMBED=y
CONFIG_BOOT_CONFIG_EMBED_FILE="/引導配置/文件/的/路徑"
``CONFIG_BOOT_CONFIG_EMBED_FILE`` 需要從源碼樹或對象樹開始的引導配置文件的
絕對/相對路徑。內核會將其嵌入作爲默認引導配置。
與將引導配置附加到initrd一樣,你也需要在內核命令行上添加 ``bootconfig`` 告訴
內核去啓用內嵌的引導配置。
注意,即使你已經設置了此選項,仍可用附加到initrd的其他引導配置覆蓋內嵌的引導
配置。
通過引導配置傳遞內核參數
========================
除了內核命令行,引導配置也可以用於傳遞內核參數。所有 ``kernel`` 關鍵字下的鍵
值對都將直接傳遞給內核命令行。此外, ``init`` 下的鍵值對將通過命令行傳遞給
init進程。參數按以下順序與用戶給定的內核命令行字符串相連,因此命令行參數可以
覆蓋引導配置參數(這取決於子系統如何處理參數,但通常前面的參數將被後面的參數
覆蓋)::
[bootconfig params][cmdline params] -- [bootconfig init params][cmdline init params]
如果引導配置文件給出的kernel/init參數是::
kernel {
root = 01234567-89ab-cdef-0123-456789abcd
}
init {
splash
}
這將被複制到內核命令行字符串中,如下所示::
root="01234567-89ab-cdef-0123-456789abcd" -- splash
如果用戶給出的其他命令行是::
ro bootconfig -- quiet
則最後的內核命令行如下::
root="01234567-89ab-cdef-0123-456789abcd" ro bootconfig -- splash quiet
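The ordering rule can be reproduced with a short Python sketch. `build_cmdline` is a hypothetical helper; it only models the string concatenation shown above, assuming the user command line separates init parameters with `` -- ``.

```python
def build_cmdline(boot_kernel, boot_init, user_cmdline):
    """Join bootconfig parameters with the user-supplied command line.

    Order: [bootconfig params][cmdline params] -- [bootconfig init][cmdline init]
    """
    user_kernel, _, user_init = user_cmdline.partition(" -- ")
    kernel_part = " ".join(p for p in (boot_kernel, user_kernel.strip()) if p)
    init_part = " ".join(p for p in (boot_init, user_init.strip()) if p)
    return f"{kernel_part} -- {init_part}" if init_part else kernel_part


print(build_cmdline('root="01234567-89ab-cdef-0123-456789abcd"',
                    "splash",
                    "ro bootconfig -- quiet"))
```

Because the user parameters come last, a subsystem that lets later parameters win will prefer the command line over the boot configuration.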
配置文件的限制
==============
當前最大的配置大小是32KB,關鍵字總數(不是鍵值條目)必須少於1024個節點。
注意:這不是條目數而是節點數,一個條目至少消耗2個節點(一個關鍵字和一個值),
所以從理論上講最多512個鍵值對。如果關鍵字平均包含3個單詞,則可有256個鍵值對。
在大多數情況下,配置項的數量將少於100個條目,小於8KB,因此這應該足夠了。如果
節點數超過1024,解析器將返回錯誤,即使文件大小小於32KB。(請注意,此最大尺寸
不包括填充的空字符。)
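The node arithmetic behind these figures can be checked with a tiny Python sketch. `max_pairs` is a hypothetical helper that assumes each word of a key and each value consumes one node and that no key prefixes are shared, so it is only a rough estimate.

```python
MAX_NODES = 1024  # node limit quoted in the text


def max_pairs(words_per_key, values_per_key=1):
    """Estimate how many key-value pairs fit into the node budget."""
    return MAX_NODES // (words_per_key + values_per_key)


print(max_pairs(1), max_pairs(3))
```

With single-word keys the estimate is 512 pairs, and with three-word keys it drops to 256, matching the numbers given above.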
無論如何,因爲 ``bootconfig`` 命令在附加啓動配置到initrd映像時會驗證它,用戶
可以在引導之前注意到它。
引導配置API
===========
用戶可以查詢或遍歷鍵值對,也可以查找(前綴)根關鍵字節點,並查找該節點下的
鍵值。
如果您有一個關鍵字字符串,則可以直接使用 xbc_find_value() 查詢該鍵的值。如果
你想知道引導配置裏有哪些關鍵字,可以使用 xbc_for_each_key_value() 迭代鍵值對。
請注意,您需要使用 xbc_array_for_each_value() 訪問數組的值,例如::
vnode = NULL;
xbc_find_value("key.word", &vnode);
if (vnode && xbc_node_is_array(vnode))
xbc_array_for_each_value(vnode, value) {
printk("%s ", value);
}
如果您想查找具有前綴字符串的鍵,可以使用 xbc_find_node() 通過前綴字符串查找
節點,然後用 xbc_node_for_each_key_value() 迭代前綴節點下的鍵。
但最典型的用法是獲取前綴下的命名值或前綴下的命名數組,例如::
root = xbc_find_node("key.prefix");
value = xbc_node_find_value(root, "option", &vnode);
...
xbc_node_for_each_array_value(root, "array-option", value, anode) {
...
}
這將訪問值“key.prefix.option”的值和“key.prefix.array-option”的數組。
鎖是不需要的,因爲在初始化之後配置只讀。如果需要修改,必須複製所有數據和關鍵字。
函數與結構體
============
相關定義的kernel-doc參見
- include/linux/bootconfig.h
- lib/bootconfig.c


@ -17,14 +17,14 @@
引言
=====
始終嘗試由來自kernel.org的原始碼構建的最新內核。如果您沒有信心這樣做,請將
始終嘗試由來自kernel.org的源代碼構建的最新內核。如果您沒有信心這樣做,請將
錯誤報告給您的發行版供應商,而不是內核開發人員。
找到缺陷bug並不總是那麼容易不過仍然得去找。如果你找不到它不要放棄。
儘可能多的向相關維護人員報告您發現的信息。請參閱MAINTAINERS文件以解您所
儘可能多的向相關維護人員報告您發現的信息。請參閱MAINTAINERS文件以解您所
關注的子系統的維護人員。
在提交錯誤報告之前,請閱讀「Documentation/admin-guide/reporting-issues.rst」
在提交錯誤報告之前,請閱讀“Documentation/admin-guide/reporting-issues.rst”
設備未出現Devices not appearing
====================================
@ -38,7 +38,7 @@
操作步驟:
- 從git原始碼構建內核
- 從git源代碼構建內核
- 以此開始二分 [#f1]_::
$ git bisect start
@ -76,7 +76,7 @@
如需進一步參考,請閱讀:
- ``git-bisect`` 的手冊頁
- `Fighting regressions with git bisect用git bisect解決歸)
- `Fighting regressions with git bisect用git bisect解決歸)
<https://www.kernel.org/pub/software/scm/git/docs/git-bisect-lk2009.html>`_
- `Fully automated bisecting with "git bisect run"使用git bisect run
來全自動二分) <https://lwn.net/Articles/317154>`_


@ -48,8 +48,8 @@
[<c1549f43>] ? sysenter_past_esp+0x40/0x6a
---[ end trace 6ebc60ef3981792f ]---
這樣的堆棧跟蹤提供了足夠的信息來識別內核原始碼中發生錯誤的那一行。根據問題的
嚴重性,它還可能包含 **「Oops」** 一詞,比如::
這樣的堆棧跟蹤提供了足夠的信息來識別內核源代碼中發生錯誤的那一行。根據問題的
嚴重性,它還可能包含 **“Oops”** 一詞,比如::
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c06969d4>] iret_exc+0x7d0/0xa59
@ -58,17 +58,17 @@
...
儘管有 **Oops** 或其他類型的堆棧跟蹤,但通常需要找到出問題的行來識別和處理缺
陷。在本章中,我們將參考「Oops」來了解需要分析的各種堆棧跟蹤。
陷。在本章中,我們將參考“Oops”來了解需要分析的各種堆棧跟蹤。
如果內核是用 ``CONFIG_DEBUG_INFO`` 編譯的,那麼可以使用文件:
`scripts/decode_stacktrace.sh`
連結的模塊
鏈接的模塊
-----------
受到汙染或正在加載/卸載的模塊用「(…)」標記,汙染標誌在
`Documentation/admin-guide/tainted-kernels.rst` 文件中進行了描述,正在被加
」用「+」標註,「正在被卸載」用「-」標註。
受到污染或正在加載/卸載的模塊用“(…)”標記,污染標誌在
`Documentation/admin-guide/tainted-kernels.rst` 文件中進行了描述,正在被加
”用“+”標註,“正在被卸載”用“-”標註。
Oops消息在哪
@ -81,19 +81,19 @@ syslog文件通常是 ``/var/log/messages`` (取決於 ``/etc/syslog.conf``
有時 ``klogd`` 會掛掉,這種情況下您可以運行 ``dmesg > file`` 從內核緩衝區
讀取數據並保存它。或者您可以 ``cat /proc/kmsg > file`` ,但是您必須適時
中斷以停止傳輸,因爲 ``kmsg`` 是一個「永無止境的文件」
中斷以停止傳輸,因爲 ``kmsg`` 是一個“永無止境的文件”
如果機器嚴重崩潰,無法輸入命令或磁不可用,那還有三個選項:
如果機器嚴重崩潰,無法輸入命令或磁不可用,那還有三個選項:
(1) 手動複製屏幕上的文本,並在機器重新啓動後輸入。很難受,但這是突然崩潰下
唯一的選擇。或者你可以用數相機拍下屏幕——雖然不那麼好,但總比什麼都沒
有好。如果消息滾動超出控制台頂部,使用更高解析度(例如 ``vga=791``
引導啓動將允許您閱讀更多文本。(警告:這需要 ``vesafb`` ,因此對「早期」
唯一的選擇。或者你可以用數相機拍下屏幕——雖然不那麼好,但總比什麼都沒
有好。如果消息滾動超出控制檯頂部,使用更高分辨率(例如 ``vga=791``
引導啓動將允許您閱讀更多文本。(警告:這需要 ``vesafb`` ,因此對“早期”
的Oopses沒有幫助)
(2) 從串口終端啓動(參見
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`
在另一台機器上運行數據機然後用你喜歡的通信程序捕獲輸出。
在另一臺機器上運行調制解調器然後用你喜歡的通信程序捕獲輸出。
Minicom運行良好。
(3) 使用Kdump參閱 Documentation/admin-guide/kdump/kdump.rst ),使用
@ -103,7 +103,7 @@ syslog文件通常是 ``/var/log/messages`` (取決於 ``/etc/syslog.conf``
找到缺陷位置
-------------
如果你能指出缺陷在內核原始碼中的位置,則報告缺陷的效果會非常好。這有兩種方法。
如果你能指出缺陷在內核源代碼中的位置,則報告缺陷的效果會非常好。這有兩種方法。
通常來說使用 ``gdb`` 會比較容易,不過內核需要用調試信息來預編譯。
gdb
@ -187,7 +187,7 @@ GNU 調試器GNU debugger ``gdb`` )是從 ``vmlinux`` 文件中找出OOP
objdump
^^^^^^^^
要調試內核請使用objdump並從崩潰輸出中查找十六進偏移,以找到有效的代碼/匯
要調試內核請使用objdump並從崩潰輸出中查找十六進偏移,以找到有效的代碼/匯
編行。如果沒有調試符號,您將看到所示例程的彙編程序代碼,但是如果內核有調試
符號C代碼也將可見調試符號可以在內核配置菜單的hacking項中啓用。例如::
@ -197,7 +197,7 @@ objdump
您需要處於內核樹的頂層以便此獲得您的C文件。
如果您無法訪問原始仍然可以使用以下方法調試一些崩潰轉儲如Dave Miller的
如果您無法訪問源代仍然可以使用以下方法調試一些崩潰轉儲如Dave Miller的
示例崩潰轉儲輸出所示)::
EIP is at +0x14/0x4c0
@ -234,9 +234,9 @@ objdump
報告缺陷
---------
一旦你通過定位缺陷找到了其發生的地方,你可以嘗試自己修復它或者向上報告它。
一旦你通過定位缺陷找到了其發生的地方,你可以嘗試自己修復它或者向上報告它。
爲了向上報告,您應該找出用於開發受影響代碼的郵件列表。這可以使用 ``get_maintainer.pl``
爲了向上報告,您應該找出用於開發受影響代碼的郵件列表。這可以使用 ``get_maintainer.pl``
例如您在gspca的sonixj.c文件中發現一個缺陷則可以通過以下方法找到它的維護者::
@ -251,7 +251,7 @@ objdump
請注意它將指出:
- 最後接觸原始碼的開發人員如果這是在git樹中完成的。在上面的例子中是Tejun
- 最後接觸源代碼的開發人員如果這是在git樹中完成的。在上面的例子中是Tejun
和Bhaktipriya在這個特定的案例中沒有人真正參與這個文件的開發
- 驅動維護人員Hans Verkuil
- 子系統維護人員Mauro Carvalho Chehab


@ -7,10 +7,10 @@
清除 WARN_ONCE
--------------
WARN_ONCE / WARN_ON_ONCE / printk_once 僅僅印一次消息.
WARN_ONCE / WARN_ON_ONCE / printk_once 僅僅印一次消息.
echo 1 > /sys/kernel/debug/clear_warn_once
可以清除這種狀態並且再次允許印一次告警信息,這對於運行測試集後重現問題
可以清除這種狀態並且再次允許印一次告警信息,這對於運行測試集後重現問題
很有用。


@ -20,13 +20,13 @@ Linux通過``/proc/stat``和``/proc/uptime``導出各種信息,用戶空間工
...
裡系統認爲在默認採樣周期內有10.01%的時間工作在用戶空間2.92%的時
裏系統認爲在默認採樣週期內有10.01%的時間工作在用戶空間2.92%的時
間用在系統空間總體上有81.63%的時間是空閒的。
大多數情況下``/proc/stat``的信息幾乎真實反映了系統信息,然而,由於內
核採集這些數據的方式/時間的特點,有時這些信息根本不可靠。
那麼這些信息是如何被集的呢?每當時間中斷觸發時,內核查看此刻運行的
那麼這些信息是如何被集的呢?每當時間中斷觸發時,內核查看此刻運行的
進程類型,並增加與此類型/狀態進程對應的計數器的值。這種方法的問題是
在兩次時間中斷之間系統(進程)能夠在多種狀態之間切換多次,而計數器只
增加最後一種狀態下的計數。
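The sampling problem can be demonstrated with a small Python simulation. This is an illustrative model only: `sampled_vs_actual` is a hypothetical function, time is measured in arbitrary integer units, and the "kernel" here simply checks whether the process is running at each tick.

```python
def sampled_vs_actual(busy_intervals, tick, duration):
    """Compare tick-sampled CPU load with the real busy fraction.

    busy_intervals is a list of (start, end) times during which the
    process runs; a sample is taken at every multiple of `tick`.
    """
    def busy_at(t):
        return any(start <= t < end for start, end in busy_intervals)

    ticks = range(tick, duration + 1, tick)
    sampled = sum(busy_at(t) for t in ticks) / len(ticks)
    actual = sum(end - start for start, end in busy_intervals) / duration
    return sampled, actual


# The process runs in the middle of every tick period and always sleeps
# just before the timer interrupt fires.
busy = [(n * 10 + 2, n * 10 + 8) for n in range(10)]
print(sampled_vs_actual(busy, tick=10, duration=100))
```

In this model the process uses the CPU 60% of the time, yet every sample sees it asleep, so the sampled load is zero.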
@ -34,7 +34,7 @@ Linux通過``/proc/stat``和``/proc/uptime``導出各種信息,用戶空間工
舉例
---
假設系統有一個進程以如下方式周期性地占用cpu::
假設系統有一個進程以如下方式週期性地佔用cpu::
兩個時鐘中斷之間的時間線
|-----------------------|
@ -46,7 +46,7 @@ Linux通過``/proc/stat``和``/proc/uptime``導出各種信息,用戶空間工
在上面的情況下,根據``/proc/stat``的信息(由於當系統處於空閒狀態時,
時間中斷經常會發生)系統的負載將會是0
大家能夠想內核的這種行爲會發生在許多情況下,這將導致``/proc/stat``
大家能夠想內核的這種行爲會發生在許多情況下,這將導致``/proc/stat``
中存在相當古怪的信息::
/* gcc -o hog smallhog.c */


@ -0,0 +1,97 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_TW.rst
:Original: Documentation/admin-guide/cputopology.rst
:翻譯:
唐藝舟 Tang Yizhou <tangyeechou@gmail.com>
==========================
如何通過sysfs將CPU拓撲導出
==========================
CPU拓撲信息通過sysfs導出。顯示的項(屬性)和某些架構的/proc/cpuinfo輸出相似。它們位於
/sys/devices/system/cpu/cpuX/topology/。請閱讀ABI文件:
Documentation/ABI/stable/sysfs-devices-system-cpu。
drivers/base/topology.c是體系結構中性的,它導出了這些屬性。然而,die、cluster、book、
drawer這些層次結構相關的文件僅在體系結構提供了下文描述的宏的條件下被創建。
對於支持這個特性的體系結構,它必須在include/asm-XXX/topology.h中定義這些宏中的一部分::
#define topology_physical_package_id(cpu)
#define topology_die_id(cpu)
#define topology_cluster_id(cpu)
#define topology_core_id(cpu)
#define topology_book_id(cpu)
#define topology_drawer_id(cpu)
#define topology_sibling_cpumask(cpu)
#define topology_core_cpumask(cpu)
#define topology_cluster_cpumask(cpu)
#define topology_die_cpumask(cpu)
#define topology_book_cpumask(cpu)
#define topology_drawer_cpumask(cpu)
``**_id macros`` 的類型是int。
``**_cpumask macros`` 的類型是 ``(const) struct cpumask *`` 。後者和恰當的
``**_siblings`` sysfs屬性對應(除了topology_sibling_cpumask(),它和thread_siblings
對應)。
爲了在所有體系結構上保持一致,include/linux/topology.h提供了上述所有宏的默認定義,以防
它們未在include/asm-XXX/topology.h中定義:
1) topology_physical_package_id: -1
2) topology_die_id: -1
3) topology_cluster_id: -1
4) topology_core_id: 0
5) topology_book_id: -1
6) topology_drawer_id: -1
7) topology_sibling_cpumask: 僅入參CPU
8) topology_core_cpumask: 僅入參CPU
9) topology_cluster_cpumask: 僅入參CPU
10) topology_die_cpumask: 僅入參CPU
11) topology_book_cpumask: 僅入參CPU
12) topology_drawer_cpumask: 僅入參CPU
此外,CPU拓撲信息由/sys/devices/system/cpu提供,包含下述文件。輸出對應的內部數據源放在
方括號("[]")中。
=========== ==================================================================
kernel_max: 內核配置允許的最大CPU下標值。[NR_CPUS-1]
offline:    由於熱插拔移除或者超過內核允許的CPU上限(上文描述的kernel_max)
            導致未上線的CPU。[~cpu_online_mask + cpus >= NR_CPUS]
online:     在線的CPU,可供調度使用。[cpu_online_mask]
possible:   已被分配資源的CPU,如果它們CPU實際存在,可以上線。
            [cpu_possible_mask]
present:    被系統識別實際存在的CPU。[cpu_present_mask]
=========== ==================================================================
上述輸出的格式和cpulist_parse()兼容[參見 <linux/cpumask.h>]。下面給些例子。
在本例中,系統中有64個CPU,但是CPU 32-63超過了kernel_max值,因爲NR_CPUS配置項是32,
取值範圍被限制爲0..31。此外,注意CPU2和4-31未上線,但是可以上線,因爲它們同時存在於
present和possible::
kernel_max: 31
offline: 2,4-31,32-63
online: 0-1,3
possible: 0-31
present: 0-31
在本例中,NR_CPUS配置項是128,但內核啓動時設置possible_cpus=144。系統中有4個CPU,
CPU2被手動設置下線(也是唯一一個可以上線的CPU)::
kernel_max: 127
offline: 2,4-127,128-143
online: 0-1,3
possible: 0-127
present: 0-3
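The list format used in these files is easy to decode in user space. The following Python sketch handles only the simple comma-and-range form shown in the examples above; `parse_cpulist` is a hypothetical name, and any richer syntax the kernel parser may accept is deliberately not covered.

```python
def parse_cpulist(text):
    """Turn a list such as '2,4-31,32-63' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means an empty mask
        low, dash, high = part.partition("-")
        cpus.update(range(int(low), int(high if dash else low) + 1))
    return cpus


online = parse_cpulist("0-1,3")
possible = parse_cpulist("0-31")
# CPUs that exist in 'possible' but are not online could be brought up.
print(sorted(possible - online)[:3])
```

Applied to the first example, the difference between `possible` and `online` is CPU 2 plus CPUs 4-31, which are exactly the offline CPUs that fall within kernel_max.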
閱讀Documentation/core-api/cpu_hotplug.rst可瞭解開機參數possible_cpus=NUM,同時還
可以瞭解各種cpumask的信息。


@ -3,13 +3,14 @@
.. include:: ../disclaimer-zh_TW.rst
:Original: :doc:`../../../admin-guide/index`
:Translator: 胡皓文 Hu Haowen <src.res.211@gmail.com>
:Translator: Alex Shi <alex.shi@linux.alibaba.com>
胡皓文 Hu Haowen <src.res.211@gmail.com>
Linux 內核用戶和管理員指南
==========================
下面是一組隨時間添加到內核中的面向用戶的文檔的集合。到目前爲止,還沒有一個
整體的順序或組織 - 這些材料不是一個單一的,連貫的文件!幸運的話,情況會隨
整體的順序或組織 - 這些材料不是一個單一的,連貫的文件!幸運的話,情況會隨
時間的推移而迅速改善。
這個初始部分包含總體信息包括描述內核的README 關於內核參數的文檔等。
@ -21,15 +22,15 @@ Linux 內核用戶和管理員指南
Todolist:
kernel-parameters
devices
sysctl/index
* kernel-parameters
* devices
* sysctl/index
本節介紹CPU漏洞及其緩解措施。
Todolist:
hw-vuln/index
* hw-vuln/index
下面的一組文檔針對的是試圖跟蹤問題和bug的用戶。
@ -37,6 +38,7 @@ Todolist:
:maxdepth: 1
reporting-issues
reporting-regressions
security-bugs
bug-hunting
bug-bisect
@ -45,18 +47,17 @@ Todolist:
Todolist:
reporting-bugs
ramoops
dynamic-debug-howto
kdump/index
perf/index
* ramoops
* dynamic-debug-howto
* kdump/index
* perf/index
這是應用程式開發人員感興趣的章節的開始。可以在這裡找到涵蓋內核ABI各個
這是應用程序開發人員感興趣的章節的開始。可以在這裏找到涵蓋內核ABI各個
方面的文檔。
Todolist:
sysfs-rules
* sysfs-rules
本手冊的其餘部分包括各種指南,介紹如何根據您的喜好配置內核的特定行爲。
@ -64,67 +65,67 @@ Todolist:
.. toctree::
:maxdepth: 1
bootconfig
clearing-warn-once
cpu-load
cputopology
lockup-watchdogs
unicode
sysrq
mm/index
Todolist:
acpi/index
aoe/index
auxdisplay/index
bcache
binderfs
binfmt-misc
blockdev/index
bootconfig
braille-console
btmrvl
cgroup-v1/index
cgroup-v2
cifs/index
cputopology
dell_rbu
device-mapper/index
edid
efi-stub
ext4
nfs/index
gpio/index
highuid
hw_random
initrd
iostats
java
jfs
kernel-per-CPU-kthreads
laptops/index
lcd-panel-cgram
ldm
lockup-watchdogs
LSM/index
md
media/index
mm/index
module-signing
mono
namespaces/index
numastat
parport
perf-security
pm/index
pnp
rapidio
ras
rtc
serial-console
svga
sysrq
thunderbolt
ufs
vga-softcursor
video-output
xfs
* acpi/index
* aoe/index
* auxdisplay/index
* bcache
* binderfs
* binfmt-misc
* blockdev/index
* braille-console
* btmrvl
* cgroup-v1/index
* cgroup-v2
* cifs/index
* dell_rbu
* device-mapper/index
* edid
* efi-stub
* ext4
* nfs/index
* gpio/index
* highuid
* hw_random
* initrd
* iostats
* java
* jfs
* kernel-per-CPU-kthreads
* laptops/index
* lcd-panel-cgram
* ldm
* LSM/index
* md
* media/index
* module-signing
* mono
* namespaces/index
* numastat
* parport
* perf-security
* pm/index
* pnp
* rapidio
* ras
* rtc
* serial-console
* svga
* thunderbolt
* ufs
* vga-softcursor
* video-output
* xfs
.. only:: subproject and html


@ -9,8 +9,8 @@
吳想成 Wu XiangCheng <bobwxc@email.cn>
胡皓文 Hu Haowen <src.res.211@gmail.com>
解釋「No working init found.」啓動掛起消息
==========================================
解釋“No working init found.”啓動掛起消息
=========================================
:作者:
@ -18,41 +18,41 @@
Cristian Souza <cristianmsbr at gmail period com>
本文檔提供了加載初始化二進init binary失敗的一些高層級原因大致按執行
本文檔提供了加載初始化二進init binary失敗的一些高層級原因大致按執行
順序列出)。
1) **無法掛載根文件系統Unable to mount root FS** :請設置「debug」內核參數(在
1) **無法掛載根文件系統Unable to mount root FS** :請設置“debug”內核參數(在
引導加載程序bootloader配置文件或CONFIG_CMDLINE以獲取更詳細的內核消息。
2) **初始化二進不存在於根文件系統上init binary doesn't exist on rootfs**
2) **初始化二進不存在於根文件系統上init binary doesn't exist on rootfs**
確保您的根文件系統類型正確(並且 ``root=`` 內核參數指向正確的分區);擁有
所需的驅動程序例如SCSI或USB等存儲硬文件系統ext3、jffs2等是內建的
所需的驅動程序例如SCSI或USB等存儲硬文件系統ext3、jffs2等是內建的
或者作爲模塊由initrd預加載
3) **控制設備損壞Broken console device** ``console= setup`` 中可能存在
衝突 --> 初始控制不可用initial console unavailable。例如由於串行
IRQ問題如缺少基於中斷的配置導致的某些串行控制不可靠。嘗試使用不同的
3) **控制設備損壞Broken console device** ``console= setup`` 中可能存在
衝突 --> 初始控制不可用initial console unavailable。例如由於串行
IRQ問題如缺少基於中斷的配置導致的某些串行控制不可靠。嘗試使用不同的
``console= device`` 或像 ``netconsole=``
4) **二進存在但依賴項不可用Binary exists but dependencies not available**
例如初始化二進的必需庫依賴項,像 ``/lib/ld-linux.so.2`` 丟失或損壞。使用
4) **二進存在但依賴項不可用Binary exists but dependencies not available**
例如初始化二進的必需庫依賴項,像 ``/lib/ld-linux.so.2`` 丟失或損壞。使用
``readelf -d <INIT>|grep NEEDED`` 找出需要哪些庫。
5) **無法加載二進位Binary cannot be loaded** :請確保二進位的體系結構與您的
體匹配。例如i386不匹配x86_64或者嘗試在ARM硬體上加載x86。如果您嘗試在
此處加載非二進文件shell腳本您應該確保腳本在其工作頭shebang
5) **無法加載二進制Binary cannot be loaded** :請確保二進制的體系結構與您的
件匹配。例如i386不匹配x86_64或者嘗試在ARM硬件上加載x86。如果您嘗試在
此處加載非二進文件shell腳本您應該確保腳本在其工作頭shebang
header``#!/...`` 中指定能正常工作的解釋器(包括其庫依賴項)。在處理
腳本之前,最好先測試一個簡單的非腳本二進文件,比如 ``/bin/sh`` ,並確認
腳本之前,最好先測試一個簡單的非腳本二進文件,比如 ``/bin/sh`` ,並確認
它能成功執行。要了解更多信息,請將代碼添加到 ``init/main.c`` 以顯示
kernel_execve()的返回值。
當您發現新的失敗原因時,請擴展本解釋(畢竟加載初始化二進是一個 **關鍵**
當您發現新的失敗原因時,請擴展本解釋(畢竟加載初始化二進是一個 **關鍵**
艱難的過渡步驟需要儘可能無痛地進行然後向LKML提交一個補丁。
待辦事項:
- 通過一個可以存儲 ``kernel_execve()`` 結果值的結構體數組實現各種
``run_init_process()`` 調用,並在失敗時通過**所有** 結果來記錄一切
``run_init_process()`` 調用,並在失敗時通過**所有** 結果來記錄一切
(非常重要的可用性修復)。
- 試使實現本身在一般情況下更有幫助,例如在受影響的地方提供額外的錯誤消息。
- 試使實現本身在一般情況下更有幫助,例如在受影響的地方提供額外的錯誤消息。


@ -0,0 +1,67 @@
.. include:: ../disclaimer-zh_TW.rst
:Original: Documentation/admin-guide/lockup-watchdogs.rst
:Translator: Hailong Liu <liu.hailong6@zte.com.cn>
.. _tw_lockup-watchdogs:
=================================================
Softlockup與hardlockup檢測機制(又名:nmi_watchdog)
=================================================
Linux中內核實現了一種用以檢測系統發生softlockup和hardlockup的看門狗機制。
Softlockup是一種會引發系統在內核態中一直循環超過20秒(詳見下面“實現”小節)導致
其他任務沒有機會得到運行的BUG。一旦檢測到'softlockup'發生,默認情況下系統會打
印當前堆棧跟蹤信息並進入鎖定狀態。也可配置使其在檢測到'softlockup'後進入panic
狀態。通過sysctl命令設置“kernel.softlockup_panic”、使用內核啓動參數
“softlockup_panic”(詳見Documentation/admin-guide/kernel-parameters.rst)以及使
能內核編譯選項“BOOTPARAM_SOFTLOCKUP_PANIC”都可實現這種配置。
而'hardlockup'是一種會引發系統在內核態一直循環超過10秒鐘(詳見“實現”小節)導致其
他中斷沒有機會運行的缺陷。與'softlockup'情況類似,一旦檢測到'hardlockup',默認
情況下系統打印當前堆棧跟蹤信息,然後進入鎖定狀態,除非使用sysctl命令設置
'hardlockup_panic'、使能內核選項“BOOTPARAM_HARDLOCKUP_PANIC”或使用內核參數
“nmi_watchdog”(詳見:“Documentation/admin-guide/kernel-parameters.rst”)將其配置爲進入panic。
這個panic選項也可以與panic_timeout結合使用(這個panic_timeout是通過稍具迷惑性的
sysctl命令"kernel.panic"來設置),使系統在panic指定時間後自動重啓。
實現
====
Softlockup和hardlockup分別建立在hrtimer(高精度定時器)和perf兩個子系統上而實現。
這也就意味着理論上任何架構只要實現了這兩個子系統就支持這兩種檢測機制。
Hrtimer用於週期性產生中斷並喚醒watchdog線程,NMI perf事件則以“watchdog_thresh”
(編譯時默認初始化爲10秒,也可通過“watchdog_thresh”這個sysctl接口來進行配置修改)
爲間隔週期產生,以檢測 hardlockups。如果一個CPU在這個時間段內沒有檢測到hrtimer中
斷發生,'hardlockup 檢測器'(即NMI perf事件處理函數)將會視系統配置而選擇產生內核
警告或者直接panic。
而watchdog線程本質上是一個高優先級內核線程每調度一次就對時間戳進行一次更新。
如果時間戳在2*watchdog_thresh(這個是softlockup的觸發門限)這段時間都未更新,那麼
"softlockup 檢測器"(內部hrtimer定時器回調函數)會將相關的調試信息打印到系統日誌中,
然後如果系統配置了進入panic流程則進入panic,否則內核繼續執行。
Hrtimer定時器的週期是2*watchdog_thresh/5,也就是說在hardlockup被觸發前hrtimer有
2~3次機會產生時鐘中斷。
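The relationship between these intervals can be worked out with a short Python sketch. `watchdog_timing` is a hypothetical helper that just evaluates the formulas stated above for a given `watchdog_thresh` in seconds.

```python
def watchdog_timing(watchdog_thresh=10):
    """Derive the detector intervals (in seconds) from watchdog_thresh."""
    hrtimer_period = 2 * watchdog_thresh / 5
    return {
        "hardlockup_threshold": watchdog_thresh,
        "softlockup_threshold": 2 * watchdog_thresh,
        "hrtimer_period": hrtimer_period,
        # How many hrtimer periods fit into one hardlockup check window.
        "hrtimer_periods_per_window": watchdog_thresh / hrtimer_period,
    }


print(watchdog_timing(10))
```

With the default of 10 seconds the hrtimer fires every 4 seconds, so 2.5 periods fit into the hardlockup window, which is where the "2~3 chances" figure comes from.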
如上所述,內核相當於爲系統管理員提供了一個可調節hrtimer定時器和perf事件週期長度
的調節旋鈕。如何通過這個旋鈕爲特定使用場景配置一個合理的週期值,要對lockups檢測的
響應速度和lockups檢測開銷這二者之間進行權衡。
默認情況下所有在線cpu上都會運行一個watchdog線程。不過在內核配置了“NO_HZ_FULL”的
情況下watchdog線程默認只會運行在管家(housekeeping)cpu上,而“nohz_full”啓動參數指
定的cpu上則不會有watchdog線程運行。試想,如果我們允許watchdog線程在“nohz_full”指
定的cpu上運行,這些cpu上必須得運行時鐘定時器來激發watchdog線程調度,這樣一來就會
使“nohz_full”保護用戶程序免受內核干擾的功能失效。當然,副作用就是“nohz_full”指定
的cpu即使在內核產生了lockup問題我們也無法檢測到。不過,至少我們可以允許watchdog
線程在管家(non-tickless)核上繼續運行以便我們能繼續正常的監測這些cpus上的lockups
事件。
不論哪種情況,都可以通過sysctl命令kernel.watchdog_cpumask來對沒有運行watchdog線程
的cpu集合進行調節。對於nohz_full而言,如果nohz_full cpu上有異常掛住的情況,通過
這種方式打開這些cpu上的watchdog進行調試可能會有所作用。

Some files were not shown because too many files have changed in this diff.