Merge remote-tracking branches 'spi/topic/rockchip', 'spi/topic/rspi', 'spi/topic/s3c64xx', 'spi/topic/sh-msiof' and 'spi/topic/slave' into spi-next
This commit is contained in:
6
CREDITS
6
CREDITS
@@ -2775,6 +2775,10 @@ S: C/ Mieses 20, 9-B
|
||||
S: Valladolid 47009
|
||||
S: Spain
|
||||
|
||||
N: Peter Oruba
|
||||
D: AMD Microcode loader driver
|
||||
S: Germany
|
||||
|
||||
N: Jens Osterkamp
|
||||
E: jens@de.ibm.com
|
||||
D: Maintainer of Spidernet network driver for Cell
|
||||
@@ -3945,8 +3949,6 @@ E: gwingerde@gmail.com
|
||||
D: Ralink rt2x00 WLAN driver
|
||||
D: Minix V2 file-system
|
||||
D: Misc fixes
|
||||
S: Geessinkweg 177
|
||||
S: 7544 TX Enschede
|
||||
S: The Netherlands
|
||||
|
||||
N: Lars Wirzenius
|
||||
|
||||
@@ -14,13 +14,8 @@ Following translations are available on the WWW:
|
||||
- this file.
|
||||
ABI/
|
||||
- info on kernel <-> userspace ABI and relative interface stability.
|
||||
|
||||
BUG-HUNTING
|
||||
- brute force method of doing binary search of patches to find bug.
|
||||
Changes
|
||||
- list of changes that break older software packages.
|
||||
CodingStyle
|
||||
- how the maintainers expect the C code in the kernel to look.
|
||||
- nothing here, just a pointer to process/coding-style.rst.
|
||||
DMA-API.txt
|
||||
- DMA API, pci_ API & extensions for non-consistent memory machines.
|
||||
DMA-API-HOWTO.txt
|
||||
@@ -33,8 +28,6 @@ DocBook/
|
||||
- directory with DocBook templates etc. for kernel documentation.
|
||||
EDID/
|
||||
- directory with info on customizing EDID for broken gfx/displays.
|
||||
HOWTO
|
||||
- the process and procedures of how to do Linux kernel development.
|
||||
IPMI.txt
|
||||
- info on Linux Intelligent Platform Management Interface (IPMI) Driver.
|
||||
IRQ-affinity.txt
|
||||
@@ -46,62 +39,43 @@ IRQ.txt
|
||||
Intel-IOMMU.txt
|
||||
- basic info on the Intel IOMMU virtualization support.
|
||||
Makefile
|
||||
- This file does nothing. Removing it breaks make htmldocs and
|
||||
make distclean.
|
||||
ManagementStyle
|
||||
- how to (attempt to) manage kernel hackers.
|
||||
- It's not of interest for those who aren't touching the build system.
|
||||
Makefile.sphinx
|
||||
- It's not of interest for those who aren't touching the build system.
|
||||
PCI/
|
||||
- info related to PCI drivers.
|
||||
RCU/
|
||||
- directory with info on RCU (read-copy update).
|
||||
SAK.txt
|
||||
- info on Secure Attention Keys.
|
||||
SM501.txt
|
||||
- Silicon Motion SM501 multimedia companion chip
|
||||
SecurityBugs
|
||||
- procedure for reporting security bugs found in the kernel.
|
||||
SubmitChecklist
|
||||
- Linux kernel patch submission checklist.
|
||||
SubmittingDrivers
|
||||
- procedure to get a new driver source included into the kernel tree.
|
||||
SubmittingPatches
|
||||
- procedure to get a source patch included into the kernel tree.
|
||||
VGA-softcursor.txt
|
||||
- how to change your VGA cursor from a blinking underscore.
|
||||
- nothing here, just a pointer to process/coding-style.rst.
|
||||
accounting/
|
||||
- documentation on accounting and taskstats.
|
||||
acpi/
|
||||
- info on ACPI-specific hooks in the kernel.
|
||||
admin-guide/
|
||||
- info related to Linux users and system admins.
|
||||
aoe/
|
||||
- description of AoE (ATA over Ethernet) along with config examples.
|
||||
applying-patches.txt
|
||||
- description of various trees and how to apply their patches.
|
||||
arm/
|
||||
- directory with info about Linux on the ARM architecture.
|
||||
arm64/
|
||||
- directory with info about Linux on the 64 bit ARM architecture.
|
||||
assoc_array.txt
|
||||
- generic associative array intro.
|
||||
atomic_ops.txt
|
||||
- semantics and behavior of atomic and bitmask operations.
|
||||
auxdisplay/
|
||||
- misc. LCD driver documentation (cfag12864b, ks0108).
|
||||
backlight/
|
||||
- directory with info on controlling backlights in flat panel displays
|
||||
bad_memory.txt
|
||||
- how to use kernel parameters to exclude bad RAM regions.
|
||||
basic_profiling.txt
|
||||
- basic instructions for those who wants to profile Linux kernel.
|
||||
bcache.txt
|
||||
- Block-layer cache on fast SSDs to improve slow (raid) I/O performance.
|
||||
binfmt_misc.txt
|
||||
- info on the kernel support for extra binary formats.
|
||||
blackfin/
|
||||
- directory with documentation for the Blackfin arch.
|
||||
block/
|
||||
- info on the Block I/O (BIO) layer.
|
||||
blockdev/
|
||||
- info on block devices & drivers
|
||||
braille-console.txt
|
||||
- info on how to use serial devices for Braille support.
|
||||
bt8xxgpio.txt
|
||||
- info on how to modify a bt8xx video card for GPIO usage.
|
||||
btmrvl.txt
|
||||
@@ -114,18 +88,24 @@ cachetlb.txt
|
||||
- describes the cache/TLB flushing interfaces Linux uses.
|
||||
cdrom/
|
||||
- directory with information on the CD-ROM drivers that Linux has.
|
||||
cgroups/
|
||||
- cgroups features, including cpusets and memory controller.
|
||||
cgroup-v1/
|
||||
- cgroups v1 features, including cpusets and memory controller.
|
||||
cgroup-v2.txt
|
||||
- cgroups v2 features, including cpusets and memory controller.
|
||||
circular-buffers.txt
|
||||
- how to make use of the existing circular buffer infrastructure
|
||||
clk.txt
|
||||
- info on the common clock framework
|
||||
coccinelle.txt
|
||||
- info on how to get and use the Coccinelle code checking tool.
|
||||
cma/
|
||||
- Continuous Memory Area (CMA) debugfs interface.
|
||||
conf.py
|
||||
- It's not of interest for those who aren't touching the build system.
|
||||
connector/
|
||||
- docs on the netlink based userspace<->kernel space communication mod.
|
||||
console/
|
||||
- documentation on Linux console drivers.
|
||||
core-api/
|
||||
- documentation on kernel core components.
|
||||
cpu-freq/
|
||||
- info on CPU frequency and voltage scaling.
|
||||
cpu-hotplug.txt
|
||||
@@ -150,42 +130,42 @@ debugging-via-ohci1394.txt
|
||||
- how to use firewire like a hardware debugger memory reader.
|
||||
dell_rbu.txt
|
||||
- document demonstrating the use of the Dell Remote BIOS Update driver.
|
||||
development-process/
|
||||
- how to work with the mainline kernel development process.
|
||||
dev-tools/
|
||||
- directory with info on development tools for the kernel.
|
||||
device-mapper/
|
||||
- directory with info on Device Mapper.
|
||||
devices.txt
|
||||
- plain ASCII listing of all the nodes in /dev/ with major minor #'s.
|
||||
dmaengine/
|
||||
- the DMA engine and controller API guides.
|
||||
devicetree/
|
||||
- directory with info on device tree files used by OF/PowerPC/ARM
|
||||
digsig.txt
|
||||
-info on the Digital Signature Verification API
|
||||
dma-buf-sharing.txt
|
||||
- the DMA Buffer Sharing API Guide
|
||||
docutils.conf
|
||||
- nothing here. Just a configuration file for docutils.
|
||||
dontdiff
|
||||
- file containing a list of files that should never be diff'ed.
|
||||
driver-api/
|
||||
- the Linux driver implementer's API guide.
|
||||
driver-model/
|
||||
- directory with info about Linux driver model.
|
||||
dvb/
|
||||
- info on Linux Digital Video Broadcast (DVB) subsystem.
|
||||
dynamic-debug-howto.txt
|
||||
- how to use the dynamic debug (dyndbg) feature.
|
||||
early-userspace/
|
||||
- info about initramfs, klibc, and userspace early during boot.
|
||||
edac.txt
|
||||
- information on EDAC - Error Detection And Correction
|
||||
efi-stub.txt
|
||||
- How to use the EFI boot stub to bypass GRUB or elilo on EFI systems.
|
||||
eisa.txt
|
||||
- info on EISA bus support.
|
||||
email-clients.txt
|
||||
- info on how to use e-mail to send un-mangled (git) patches.
|
||||
extcon/
|
||||
- directory with porting guide for Android kernel switch driver.
|
||||
isa.txt
|
||||
- info on EISA bus support.
|
||||
fault-injection/
|
||||
- dir with docs about the fault injection capabilities infrastructure.
|
||||
fb/
|
||||
- directory with info on the frame buffer graphics abstraction layer.
|
||||
features/
|
||||
- status of feature implementation on different architectures.
|
||||
filesystems/
|
||||
- info on the vfs and the various filesystems that Linux supports.
|
||||
firmware_class/
|
||||
@@ -194,20 +174,22 @@ flexible-arrays.txt
|
||||
- how to make use of flexible sized arrays in linux
|
||||
fmc/
|
||||
- information about the FMC bus abstraction
|
||||
fpga/
|
||||
- FPGA Manager Core.
|
||||
frv/
|
||||
- Fujitsu FR-V Linux documentation.
|
||||
futex-requeue-pi.txt
|
||||
- info on requeueing of tasks from a non-PI futex to a PI futex
|
||||
gcov.txt
|
||||
- use of GCC's coverage testing tool "gcov" with the Linux kernel
|
||||
gcc-plugins.txt
|
||||
- GCC plugin infrastructure.
|
||||
gpio/
|
||||
- gpio related documentation
|
||||
gpu/
|
||||
- directory with information on GPU driver developer's guide.
|
||||
hid/
|
||||
- directory with information on human interface devices
|
||||
highuid.txt
|
||||
- notes on the change from 16 bit to 32 bit user/group IDs.
|
||||
hsi.txt
|
||||
- HSI subsystem overview.
|
||||
hwspinlock.txt
|
||||
- hardware spinlock provides hardware assistance for synchronization
|
||||
timers/
|
||||
@@ -218,18 +200,18 @@ hwmon/
|
||||
- directory with docs on various hardware monitoring drivers.
|
||||
i2c/
|
||||
- directory with info about the I2C bus/protocol (2 wire, kHz speed).
|
||||
i2o/
|
||||
- directory with info about the Linux I2O subsystem.
|
||||
x86/i386/
|
||||
- directory with info about Linux on Intel 32 bit architecture.
|
||||
ia64/
|
||||
- directory with info about Linux on Intel 64 bit architecture.
|
||||
ide/
|
||||
- Information regarding the Enhanced IDE drive.
|
||||
iio/
|
||||
- info on industrial IIO configfs support.
|
||||
index.rst
|
||||
- main index for the documentation at ReST format.
|
||||
infiniband/
|
||||
- directory with documents concerning Linux InfiniBand support.
|
||||
init.txt
|
||||
- what to do when the kernel can't find the 1st process to run.
|
||||
initrd.txt
|
||||
- how to use the RAM disk as an initial/temporary root filesystem.
|
||||
input/
|
||||
- info on Linux input device support.
|
||||
intel_txt.txt
|
||||
@@ -248,28 +230,16 @@ isapnp.txt
|
||||
- info on Linux ISA Plug & Play support.
|
||||
isdn/
|
||||
- directory with info on the Linux ISDN support, and supported cards.
|
||||
java.txt
|
||||
- info on the in-kernel binary support for Java(tm).
|
||||
ja_JP/
|
||||
- directory with Japanese translations of various documents
|
||||
kbuild/
|
||||
- directory with info about the kernel build process.
|
||||
kernel-doc-nano-HOWTO.txt
|
||||
- outdated info about kernel-doc documentation.
|
||||
kdump/
|
||||
- directory with mini HowTo on getting the crash dump code to work.
|
||||
kernel-docs.txt
|
||||
- listing of various WWW + books that document kernel internals.
|
||||
kernel-documentation.rst
|
||||
doc-guide/
|
||||
- how to write and format reStructuredText kernel documentation
|
||||
kernel-parameters.txt
|
||||
- summary listing of command line / boot prompt args for the kernel.
|
||||
kernel-per-CPU-kthreads.txt
|
||||
- List of all per-CPU kthreads and how they introduce jitter.
|
||||
kmemcheck.txt
|
||||
- info on dynamic checker that detects uses of uninitialized memory.
|
||||
kmemleak.txt
|
||||
- info on how to make use of the kernel memory leak detection system
|
||||
ko_KR/
|
||||
- directory with Korean translations of various documents
|
||||
kobject.txt
|
||||
- info of the kobject infrastructure of the Linux kernel.
|
||||
kprobes.txt
|
||||
@@ -284,8 +254,8 @@ ldm.txt
|
||||
- a brief description of LDM (Windows Dynamic Disks).
|
||||
leds/
|
||||
- directory with info about LED handling under Linux.
|
||||
local_ops.txt
|
||||
- semantics and behavior of local atomic operations.
|
||||
livepatch/
|
||||
- info on kernel live patching.
|
||||
locking/
|
||||
- directory with info about kernel locking primitives
|
||||
lockup-watchdogs.txt
|
||||
@@ -298,22 +268,24 @@ lzo.txt
|
||||
- kernel LZO decompressor input formats
|
||||
m68k/
|
||||
- directory with info about Linux on Motorola 68k architecture.
|
||||
magic-number.txt
|
||||
- list of magic numbers used to mark/protect kernel data structures.
|
||||
mailbox.txt
|
||||
- How to write drivers for the common mailbox framework (IPC).
|
||||
md.txt
|
||||
- info on boot arguments for the multiple devices driver.
|
||||
media-framework.txt
|
||||
- info on media framework, its data structures, functions and usage.
|
||||
md-cluster.txt
|
||||
- info on shared-device RAID MD cluster.
|
||||
media/
|
||||
- info on media drivers: uAPI, kAPI and driver documentation.
|
||||
memory-barriers.txt
|
||||
- info on Linux kernel memory barriers.
|
||||
memory-devices/
|
||||
- directory with info on parts like the Texas Instruments EMIF driver
|
||||
memory-hotplug.txt
|
||||
- Hotpluggable memory support, how to use and current status.
|
||||
men-chameleon-bus.txt
|
||||
- info on MEN chameleon bus.
|
||||
metag/
|
||||
- directory with info about Linux on Meta architecture.
|
||||
mic/
|
||||
- Intel Many Integrated Core (MIC) architecture device driver.
|
||||
mips/
|
||||
- directory with info about Linux on MIPS architecture.
|
||||
misc-devices/
|
||||
@@ -322,12 +294,8 @@ mmc/
|
||||
- directory with info about the MMC subsystem
|
||||
mn10300/
|
||||
- directory with info about the mn10300 architecture port
|
||||
module-signing.txt
|
||||
- Kernel module signing for increased security when loading modules.
|
||||
mtd/
|
||||
- directory with info about memory technology devices (flash)
|
||||
mono.txt
|
||||
- how to execute Mono-based .NET binaries with the help of BINFMT_MISC.
|
||||
namespaces/
|
||||
- directory with various information about namespaces
|
||||
netlabel/
|
||||
@@ -336,30 +304,42 @@ networking/
|
||||
- directory with info on various aspects of networking with Linux.
|
||||
nfc/
|
||||
- directory relating info about Near Field Communications support.
|
||||
nios2/
|
||||
- Linux on the Nios II architecture.
|
||||
nommu-mmap.txt
|
||||
- documentation about no-mmu memory mapping support.
|
||||
numastat.txt
|
||||
- info on how to read Numa policy hit/miss statistics in sysfs.
|
||||
oops-tracing.txt
|
||||
- how to decode those nasty internal kernel error dump messages.
|
||||
ntb.txt
|
||||
- info on Non-Transparent Bridge (NTB) drivers.
|
||||
nvdimm/
|
||||
- info on non-volatile devices.
|
||||
nvmem/
|
||||
- info on non volatile memory framework.
|
||||
output/
|
||||
- default directory where html/LaTeX/pdf files will be written.
|
||||
padata.txt
|
||||
- An introduction to the "padata" parallel execution API
|
||||
parisc/
|
||||
- directory with info on using Linux on PA-RISC architecture.
|
||||
parport.txt
|
||||
- how to use the parallel-port driver.
|
||||
parport-lowlevel.txt
|
||||
- description and usage of the low level parallel port functions.
|
||||
pcmcia/
|
||||
- info on the Linux PCMCIA driver.
|
||||
percpu-rw-semaphore.txt
|
||||
- RCU based read-write semaphore optimized for locking for reading
|
||||
perf/
|
||||
- info about the APM X-Gene SoC Performance Monitoring Unit (PMU).
|
||||
phy/
|
||||
- ino on Samsung USB 2.0 PHY adaptation layer.
|
||||
phy.txt
|
||||
- Description of the generic PHY framework.
|
||||
pi-futex.txt
|
||||
- documentation on lightweight priority inheritance futexes.
|
||||
pinctrl.txt
|
||||
- info on pinctrl subsystem and the PINMUX/PINCONF and drivers
|
||||
platform/
|
||||
- List of supported hardware by compal and Dell laptop.
|
||||
pnp.txt
|
||||
- Linux Plug and Play documentation.
|
||||
power/
|
||||
@@ -372,14 +352,16 @@ preempt-locking.txt
|
||||
- info on locking under a preemptive kernel.
|
||||
printk-formats.txt
|
||||
- how to get printk format specifiers right
|
||||
process/
|
||||
- how to work with the mainline kernel development process.
|
||||
pps/
|
||||
- directory with information on the pulse-per-second support
|
||||
pti/
|
||||
- directory with info on Intel MID PTI.
|
||||
ptp/
|
||||
- directory with info on support for IEEE 1588 PTP clocks in Linux.
|
||||
pwm.txt
|
||||
- info on the pulse width modulation driver subsystem
|
||||
ramoops.txt
|
||||
- documentation of the ramoops oops/panic logging module.
|
||||
rapidio/
|
||||
- directory with info on RapidIO packet-based fabric interconnect
|
||||
rbtree.txt
|
||||
@@ -406,8 +388,6 @@ security/
|
||||
- directory that contains security-related info
|
||||
serial/
|
||||
- directory with info on the low level serial API.
|
||||
serial-console.txt
|
||||
- how to set up Linux with a serial line console as the default.
|
||||
sgi-ioc4.txt
|
||||
- description of the SGI IOC4 PCI (multi function) device.
|
||||
sh/
|
||||
@@ -416,24 +396,20 @@ smsc_ece1099.txt
|
||||
-info on the smsc Keyboard Scan Expansion/GPIO Expansion device.
|
||||
sound/
|
||||
- directory with info on sound card support.
|
||||
sparse.txt
|
||||
- info on how to obtain and use the sparse tool for typechecking.
|
||||
spi/
|
||||
- overview of Linux kernel Serial Peripheral Interface (SPI) support.
|
||||
stable_api_nonsense.txt
|
||||
- info on why the kernel does not have a stable in-kernel api or abi.
|
||||
stable_kernel_rules.txt
|
||||
- rules and procedures for the -stable kernel releases.
|
||||
sphinx/
|
||||
- no documentation here, just files required by Sphinx toolchain.
|
||||
sphinx-static/
|
||||
- no documentation here, just files required by Sphinx toolchain.
|
||||
static-keys.txt
|
||||
- info on how static keys allow debug code in hotpaths via patching
|
||||
svga.txt
|
||||
- short guide on selecting video modes at boot via VGA BIOS.
|
||||
sysfs-rules.txt
|
||||
- How not to use sysfs.
|
||||
sync_file.txt
|
||||
- Sync file API guide.
|
||||
sysctl/
|
||||
- directory with info on the /proc/sys/* files.
|
||||
sysrq.txt
|
||||
- info on the magic SysRq key.
|
||||
target/
|
||||
- directory with info on generating TCM v4 fabric .ko modules
|
||||
this_cpu_ops.txt
|
||||
@@ -442,39 +418,29 @@ thermal/
|
||||
- directory with information on managing thermal issues (CPU/temp)
|
||||
trace/
|
||||
- directory with info on tracing technologies within linux
|
||||
translations/
|
||||
- translations of this document from English to another language
|
||||
unaligned-memory-access.txt
|
||||
- info on how to avoid arch breaking unaligned memory access in code.
|
||||
unicode.txt
|
||||
- info on the Unicode character/font mapping used in Linux.
|
||||
unshare.txt
|
||||
- description of the Linux unshare system call.
|
||||
usb/
|
||||
- directory with info regarding the Universal Serial Bus.
|
||||
vDSO/
|
||||
- directory with info regarding virtual dynamic shared objects
|
||||
vfio.txt
|
||||
- info on Virtual Function I/O used in guest/hypervisor instances.
|
||||
vgaarbiter.txt
|
||||
- info on enable/disable the legacy decoding on different VGA devices
|
||||
video-output.txt
|
||||
- sysfs class driver interface to enable/disable a video output device.
|
||||
video4linux/
|
||||
- directory with info regarding video/TV/radio cards and linux.
|
||||
virtual/
|
||||
- directory with information on the various linux virtualizations.
|
||||
vm/
|
||||
- directory with info on the Linux vm code.
|
||||
vme_api.txt
|
||||
- file relating info on the VME bus API in linux
|
||||
volatile-considered-harmful.txt
|
||||
- Why the "volatile" type class should not be used
|
||||
w1/
|
||||
- directory with documents regarding the 1-wire (w1) subsystem.
|
||||
watchdog/
|
||||
- how to auto-reboot Linux if it has "fallen and can't get up". ;-)
|
||||
wimax/
|
||||
- directory with info about Intel Wireless Wimax Connections
|
||||
workqueue.txt
|
||||
core-api/workqueue.rst
|
||||
- information on the Concurrency Managed Workqueue implementation
|
||||
x86/x86_64/
|
||||
- directory with info on Linux support for AMD x86-64 (Hammer) machines.
|
||||
@@ -484,7 +450,5 @@ xtensa/
|
||||
- directory with documents relating to arch/xtensa port/implementation
|
||||
xz.txt
|
||||
- how to make use of the XZ data compression within linux kernel
|
||||
zh_CN/
|
||||
- directory with Chinese translations of various documents
|
||||
zorro.txt
|
||||
- info on writing drivers for Zorro bus devices found on Amigas.
|
||||
|
||||
@@ -1,5 +0,0 @@
|
||||
# -*- coding: utf-8; mode: python -*-
|
||||
|
||||
project = "Linux 802.11 Driver Developer's Guide"
|
||||
|
||||
tags.add("subproject")
|
||||
@@ -1,17 +0,0 @@
|
||||
=====================================
|
||||
Linux 802.11 Driver Developer's Guide
|
||||
=====================================
|
||||
|
||||
.. toctree::
|
||||
|
||||
introduction
|
||||
cfg80211
|
||||
mac80211
|
||||
mac80211-advanced
|
||||
|
||||
.. only:: subproject
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
||||
@@ -84,4 +84,4 @@ stable:
|
||||
|
||||
- Kernel-internal symbols. Do not rely on the presence, absence, location, or
|
||||
type of any kernel symbol, either in System.map files or the kernel binary
|
||||
itself. See Documentation/stable_api_nonsense.txt.
|
||||
itself. See Documentation/process/stable-api-nonsense.rst.
|
||||
|
||||
@@ -8,3 +8,17 @@ Description:
|
||||
Any device associated with a device-tree node will have
|
||||
an of_path symlink pointing to the corresponding device
|
||||
node in /sys/firmware/devicetree/
|
||||
|
||||
What: /sys/devices/*/devspec
|
||||
Date: October 2016
|
||||
Contact: Device Tree mailing list <devicetree@vger.kernel.org>
|
||||
Description:
|
||||
If CONFIG_OF is enabled, then this file is present. When
|
||||
read, it returns full name of the device node.
|
||||
|
||||
What: /sys/devices/*/obppath
|
||||
Date: October 2016
|
||||
Contact: Device Tree mailing list <devicetree@vger.kernel.org>
|
||||
Description:
|
||||
If CONFIG_OF is enabled, then this file is present. When
|
||||
read, it returns full name of the device node.
|
||||
|
||||
@@ -235,3 +235,45 @@ Description:
|
||||
write_same_max_bytes is 0, write same is not supported
|
||||
by the device.
|
||||
|
||||
What: /sys/block/<disk>/queue/write_zeroes_max_bytes
|
||||
Date: November 2016
|
||||
Contact: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
|
||||
Description:
|
||||
Devices that support write zeroes operation in which a
|
||||
single request can be issued to zero out the range of
|
||||
contiguous blocks on storage without having any payload
|
||||
in the request. This can be used to optimize writing zeroes
|
||||
to the devices. write_zeroes_max_bytes indicates how many
|
||||
bytes can be written in a single write zeroes command. If
|
||||
write_zeroes_max_bytes is 0, write zeroes is not supported
|
||||
by the device.
|
||||
|
||||
What: /sys/block/<disk>/queue/zoned
|
||||
Date: September 2016
|
||||
Contact: Damien Le Moal <damien.lemoal@hgst.com>
|
||||
Description:
|
||||
zoned indicates if the device is a zoned block device
|
||||
and the zone model of the device if it is indeed zoned.
|
||||
The possible values indicated by zoned are "none" for
|
||||
regular block devices and "host-aware" or "host-managed"
|
||||
for zoned block devices. The characteristics of
|
||||
host-aware and host-managed zoned block devices are
|
||||
described in the ZBC (Zoned Block Commands) and ZAC
|
||||
(Zoned Device ATA Command Set) standards. These standards
|
||||
also define the "drive-managed" zone model. However,
|
||||
since drive-managed zoned block devices do not support
|
||||
zone commands, they will be treated as regular block
|
||||
devices and zoned will report "none".
|
||||
|
||||
What: /sys/block/<disk>/queue/chunk_sectors
|
||||
Date: September 2016
|
||||
Contact: Hannes Reinecke <hare@suse.com>
|
||||
Description:
|
||||
chunk_sectors has different meaning depending on the type
|
||||
of the disk. For a RAID device (dm-raid), chunk_sectors
|
||||
indicates the size in 512B sectors of the RAID volume
|
||||
stripe segment. For a zoned block device, either
|
||||
host-aware or host-managed, chunk_sectors indicates the
|
||||
size of 512B sectors of the zones of the device, with
|
||||
the eventual exception of the last zone of the device
|
||||
which may be smaller.
|
||||
|
||||
21
Documentation/ABI/testing/sysfs-bus-fsl-mc
Normal file
21
Documentation/ABI/testing/sysfs-bus-fsl-mc
Normal file
@@ -0,0 +1,21 @@
|
||||
What: /sys/bus/fsl-mc/drivers/.../bind
|
||||
Date: December 2016
|
||||
Contact: stuart.yoder@nxp.com
|
||||
Description:
|
||||
Writing a device location to this file will cause
|
||||
the driver to attempt to bind to the device found at
|
||||
this location. The format for the location is Object.Id
|
||||
and is the same as found in /sys/bus/fsl-mc/devices/.
|
||||
For example:
|
||||
# echo dpni.2 > /sys/bus/fsl-mc/drivers/fsl_dpaa2_eth/bind
|
||||
|
||||
What: /sys/bus/fsl-mc/drivers/.../unbind
|
||||
Date: December 2016
|
||||
Contact: stuart.yoder@nxp.com
|
||||
Description:
|
||||
Writing a device location to this file will cause the
|
||||
driver to attempt to unbind from the device found at
|
||||
this location. The format for the location is Object.Id
|
||||
and is the same as found in /sys/bus/fsl-mc/devices/.
|
||||
For example:
|
||||
# echo dpni.2 > /sys/bus/fsl-mc/drivers/fsl_dpaa2_eth/unbind
|
||||
@@ -329,6 +329,7 @@ What: /sys/bus/iio/devices/iio:deviceX/in_pressure_scale
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_humidityrelative_scale
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_velocity_sqrt(x^2+y^2+z^2)_scale
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_illuminance_scale
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_countY_scale
|
||||
KernelVersion: 2.6.35
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
@@ -1579,3 +1580,20 @@ Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Raw (unscaled no offset etc.) electric conductivity reading that
|
||||
can be processed to siemens per meter.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_countY_raw
|
||||
KernelVersion: 4.9
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Raw counter device counts from channel Y. For quadrature
|
||||
counters, multiplication by an available [Y]_scale results in
|
||||
the counts of a single quadrature signal phase from channel Y.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_indexY_raw
|
||||
KernelVersion: 4.9
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Raw counter device index value from channel Y. This attribute
|
||||
provides an absolute positional reference (e.g. a pulse once per
|
||||
revolution) which may be used to home positional systems as
|
||||
required.
|
||||
|
||||
@@ -0,0 +1,36 @@
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_altvoltageY_invert
|
||||
Date: October 2016
|
||||
KernelVersion: 4.9
|
||||
Contact: Peter Rosin <peda@axentia.se>
|
||||
Description:
|
||||
The DAC is used to find the peak level of an alternating
|
||||
voltage input signal by a binary search using the output
|
||||
of a comparator wired to an interrupt pin. Like so:
|
||||
_
|
||||
| \
|
||||
input +------>-------|+ \
|
||||
| \
|
||||
.-------. | }---.
|
||||
| | | / |
|
||||
| dac|-->--|- / |
|
||||
| | |_/ |
|
||||
| | |
|
||||
| | |
|
||||
| irq|------<-------'
|
||||
| |
|
||||
'-------'
|
||||
The boolean invert attribute (0/1) should be set when the
|
||||
input signal is centered around the maximum value of the
|
||||
dac instead of zero. The envelope detector will search
|
||||
from below in this case and will also invert the result.
|
||||
The edge/level of the interrupt is also switched to its
|
||||
opposite value.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_altvoltageY_compare_interval
|
||||
Date: October 2016
|
||||
KernelVersion: 4.9
|
||||
Contact: Peter Rosin <peda@axentia.se>
|
||||
Description:
|
||||
Number of milliseconds to wait for the comparator in each
|
||||
step of the binary search for the input peak level. Needs
|
||||
to relate to the frequency of the input signal.
|
||||
125
Documentation/ABI/testing/sysfs-bus-iio-counter-104-quad-8
Normal file
125
Documentation/ABI/testing/sysfs-bus-iio-counter-104-quad-8
Normal file
@@ -0,0 +1,125 @@
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_count_count_direction_available
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_count_count_mode_available
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_count_noise_error_available
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_count_quadrature_mode_available
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_index_index_polarity_available
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_index_synchronous_mode_available
|
||||
KernelVersion: 4.9
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Discrete set of available values for the respective counter
|
||||
configuration are listed in this file.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_countY_count_direction
|
||||
KernelVersion: 4.9
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Read-only attribute that indicates whether the counter for
|
||||
channel Y is counting up or down.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_countY_count_mode
|
||||
KernelVersion: 4.9
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Count mode for channel Y. Four count modes are available:
|
||||
normal, range limit, non-recycle, and modulo-n. The preset value
|
||||
for channel Y is used by the count mode where required.
|
||||
|
||||
Normal:
|
||||
Counting is continuous in either direction.
|
||||
|
||||
Range Limit:
|
||||
An upper or lower limit is set, mimicking limit switches
|
||||
in the mechanical counterpart. The upper limit is set to
|
||||
the preset value, while the lower limit is set to 0. The
|
||||
counter freezes at count = preset when counting up, and
|
||||
at count = 0 when counting down. At either of these
|
||||
limits, the counting is resumed only when the count
|
||||
direction is reversed.
|
||||
|
||||
Non-recycle:
|
||||
Counter is disabled whenever a 24-bit count overflow or
|
||||
underflow takes place. The counter is re-enabled when a
|
||||
new count value is loaded to the counter via a preset
|
||||
operation or write to raw.
|
||||
|
||||
Modulo-N:
|
||||
A count boundary is set between 0 and the preset value.
|
||||
The counter is reset to 0 at count = preset when
|
||||
counting up, while the counter is set to the preset
|
||||
value at count = 0 when counting down; the counter does
|
||||
not freeze at the bundary points, but counts
|
||||
continuously throughout.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_countY_noise_error
|
||||
KernelVersion: 4.9
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Read-only attribute that indicates whether excessive noise is
|
||||
present at the channel Y count inputs in quadrature clock mode;
|
||||
irrelevant in non-quadrature clock mode.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_countY_preset
|
||||
KernelVersion: 4.9
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
If the counter device supports preset registers, the preset
|
||||
count for channel Y is provided by this attribute.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_countY_quadrature_mode
|
||||
KernelVersion: 4.9
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Configure channel Y counter for non-quadrature or quadrature
|
||||
clock mode. Selecting non-quadrature clock mode will disable
|
||||
synchronous load mode. In quadrature clock mode, the channel Y
|
||||
scale attribute selects the encoder phase division (scale of 1
|
||||
selects full-cycle, scale of 0.5 selects half-cycle, scale of
|
||||
0.25 selects quarter-cycle) processed by the channel Y counter.
|
||||
|
||||
Non-quadrature:
|
||||
The filter and decoder circuit are bypassed. Encoder A
|
||||
input serves as the count input and B as the UP/DOWN
|
||||
direction control input, with B = 1 selecting UP Count
|
||||
mode and B = 0 selecting Down Count mode.
|
||||
|
||||
Quadrature:
|
||||
Encoder A and B inputs are digitally filtered and
|
||||
decoded for UP/DN clock.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_countY_set_to_preset_on_index
|
||||
KernelVersion: 4.9
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Whether to set channel Y counter with channel Y preset value
|
||||
when channel Y index input is active, or continuously count.
|
||||
Valid attribute values are boolean.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_indexY_index_polarity
|
||||
KernelVersion: 4.9
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Active level of channel Y index input; irrelevant in
|
||||
non-synchronous load mode.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_indexY_synchronous_mode
|
||||
KernelVersion: 4.9
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Configure channel Y counter for non-synchronous or synchronous
|
||||
load mode. Synchronous load mode cannot be selected in
|
||||
non-quadrature clock mode.
|
||||
|
||||
Non-synchronous:
|
||||
A logic low level is the active level at this index
|
||||
input. The index function (as enabled via
|
||||
set_to_preset_on_index) is performed directly on the
|
||||
active level of the index input.
|
||||
|
||||
Synchronous:
|
||||
Intended for interfacing with encoder Index output in
|
||||
quadrature clock mode. The active level is configured
|
||||
via index_polarity. The index function (as enabled via
|
||||
set_to_preset_on_index) is performed synchronously with
|
||||
the quadrature clock on the active level of the index
|
||||
input.
|
||||
18
Documentation/ABI/testing/sysfs-bus-iio-cros-ec
Normal file
18
Documentation/ABI/testing/sysfs-bus-iio-cros-ec
Normal file
@@ -0,0 +1,18 @@
|
||||
What: /sys/bus/iio/devices/iio:deviceX/calibrate
|
||||
Date: July 2015
|
||||
KernelVersion: 4.7
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Writing '1' will perform a FOC (Fast Online Calibration). The
|
||||
corresponding calibration offsets can be read from *_calibbias
|
||||
entries.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/location
|
||||
Date: July 2015
|
||||
KernelVersion: 4.7
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
This attribute returns a string with the physical location where
|
||||
the motion sensor is placed. For example, in a laptop a motion
|
||||
sensor can be located on the base or on the lid. Current valid
|
||||
values are 'base' and 'lid'.
|
||||
8
Documentation/ABI/testing/sysfs-bus-iio-dac-dpot-dac
Normal file
8
Documentation/ABI/testing/sysfs-bus-iio-dac-dpot-dac
Normal file
@@ -0,0 +1,8 @@
|
||||
What: /sys/bus/iio/devices/iio:deviceX/out_voltageY_raw_available
|
||||
Date: October 2016
|
||||
KernelVersion: 4.9
|
||||
Contact: Peter Rosin <peda@axentia.se>
|
||||
Description:
|
||||
The range of available values represented as the minimum value,
|
||||
the step and the maximum value, all enclosed in square brackets.
|
||||
Example: [0 1 256]
|
||||
19
Documentation/ABI/testing/sysfs-bus-iio-light-isl29018
Normal file
19
Documentation/ABI/testing/sysfs-bus-iio-light-isl29018
Normal file
@@ -0,0 +1,19 @@
|
||||
What: /sys/bus/iio/devices/iio:deviceX/proximity_on_chip_ambient_infrared_suppression
|
||||
Date: January 2011
|
||||
KernelVersion: 2.6.37
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
From ISL29018 Data Sheet (FN6619.4, Oct 8, 2012) regarding the
|
||||
infrared suppression:
|
||||
|
||||
Scheme 0, makes full n (4, 8, 12, 16) bits (unsigned) proximity
|
||||
detection. The range of Scheme 0 proximity count is from 0 to
|
||||
2^n. Logic 1 of this bit, Scheme 1, makes n-1 (3, 7, 11, 15)
|
||||
bits (2's complementary) proximity_less_ambient detection. The
|
||||
range of Scheme 1 proximity count is from -2^(n-1) to 2^(n-1).
|
||||
The sign bit is extended for resolutions less than 16. While
|
||||
Scheme 0 has wider dynamic range, Scheme 1 proximity detection
|
||||
is less affected by the ambient IR noise variation.
|
||||
|
||||
0 Sensing IR from LED and ambient
|
||||
1 Sensing IR from LED with ambient IR rejection
|
||||
20
Documentation/ABI/testing/sysfs-bus-iio-light-tsl2583
Normal file
20
Documentation/ABI/testing/sysfs-bus-iio-light-tsl2583
Normal file
@@ -0,0 +1,20 @@
|
||||
What: /sys/bus/iio/devices/device[n]/in_illuminance_calibrate
|
||||
KernelVersion: 2.6.37
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
This property causes an internal calibration of the als gain trim
|
||||
value which is later used in calculating illuminance in lux.
|
||||
|
||||
What: /sys/bus/iio/devices/device[n]/in_illuminance_lux_table
|
||||
KernelVersion: 2.6.37
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
This property gets/sets the table of coefficients
|
||||
used in calculating illuminance in lux.
|
||||
|
||||
What: /sys/bus/iio/devices/device[n]/in_illuminance_input_target
|
||||
KernelVersion: 2.6.37
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
This property is the known externally illuminance (in lux).
|
||||
It is used in the process of calibrating the device accuracy.
|
||||
@@ -0,0 +1,8 @@
|
||||
What: /sys/bus/iio/devices/iio:deviceX/out_resistance_raw_available
|
||||
Date: October 2016
|
||||
KernelVersion: 4.9
|
||||
Contact: Peter Rosin <peda@axentia.se>
|
||||
Description:
|
||||
The range of available values represented as the minimum value,
|
||||
the step and the maximum value, all enclosed in square brackets.
|
||||
Example: [0 1 256]
|
||||
@@ -294,3 +294,10 @@ Description:
|
||||
a firmware bug to the system vendor. Writing to this file
|
||||
taints the kernel with TAINT_FIRMWARE_WORKAROUND, which
|
||||
reduces the supportability of your system.
|
||||
|
||||
What: /sys/bus/pci/devices/.../revision
|
||||
Date: November 2016
|
||||
Contact: Emil Velikov <emil.l.velikov@gmail.com>
|
||||
Description:
|
||||
This file contains the revision field of the the PCI device.
|
||||
The value comes from device config space. The file is read only.
|
||||
|
||||
111
Documentation/ABI/testing/sysfs-bus-vfio-mdev
Normal file
111
Documentation/ABI/testing/sysfs-bus-vfio-mdev
Normal file
@@ -0,0 +1,111 @@
|
||||
What: /sys/.../<device>/mdev_supported_types/
|
||||
Date: October 2016
|
||||
Contact: Kirti Wankhede <kwankhede@nvidia.com>
|
||||
Description:
|
||||
This directory contains list of directories of currently
|
||||
supported mediated device types and their details for
|
||||
<device>. Supported type attributes are defined by the
|
||||
vendor driver who registers with Mediated device framework.
|
||||
Each supported type is a directory whose name is created
|
||||
by adding the device driver string as a prefix to the
|
||||
string provided by the vendor driver.
|
||||
|
||||
What: /sys/.../<device>/mdev_supported_types/<type-id>/
|
||||
Date: October 2016
|
||||
Contact: Kirti Wankhede <kwankhede@nvidia.com>
|
||||
Description:
|
||||
This directory gives details of supported type, like name,
|
||||
description, available_instances, device_api etc.
|
||||
'device_api' and 'available_instances' are mandatory
|
||||
attributes to be provided by vendor driver. 'name',
|
||||
'description' and other vendor driver specific attributes
|
||||
are optional.
|
||||
|
||||
What: /sys/.../mdev_supported_types/<type-id>/create
|
||||
Date: October 2016
|
||||
Contact: Kirti Wankhede <kwankhede@nvidia.com>
|
||||
Description:
|
||||
Writing UUID to this file will create mediated device of
|
||||
type <type-id> for parent device <device>. This is a
|
||||
write-only file.
|
||||
For example:
|
||||
# echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \
|
||||
/sys/devices/foo/mdev_supported_types/foo-1/create
|
||||
|
||||
What: /sys/.../mdev_supported_types/<type-id>/devices/
|
||||
Date: October 2016
|
||||
Contact: Kirti Wankhede <kwankhede@nvidia.com>
|
||||
Description:
|
||||
This directory contains symbolic links pointing to mdev
|
||||
devices sysfs entries which are created of this <type-id>.
|
||||
|
||||
What: /sys/.../mdev_supported_types/<type-id>/available_instances
|
||||
Date: October 2016
|
||||
Contact: Kirti Wankhede <kwankhede@nvidia.com>
|
||||
Description:
|
||||
Reading this attribute will show the number of mediated
|
||||
devices of type <type-id> that can be created. This is a
|
||||
readonly file.
|
||||
Users:
|
||||
Userspace applications interested in creating mediated
|
||||
device of that type. Userspace application should check
|
||||
the number of available instances could be created before
|
||||
creating mediated device of this type.
|
||||
|
||||
What: /sys/.../mdev_supported_types/<type-id>/device_api
|
||||
Date: October 2016
|
||||
Contact: Kirti Wankhede <kwankhede@nvidia.com>
|
||||
Description:
|
||||
Reading this attribute will show VFIO device API supported
|
||||
by this type. For example, "vfio-pci" for a PCI device,
|
||||
"vfio-platform" for platform device.
|
||||
|
||||
What: /sys/.../mdev_supported_types/<type-id>/name
|
||||
Date: October 2016
|
||||
Contact: Kirti Wankhede <kwankhede@nvidia.com>
|
||||
Description:
|
||||
Reading this attribute will show human readable name of the
|
||||
mediated device that will get created of type <type-id>.
|
||||
This is optional attribute. For example: "Grid M60-0Q"
|
||||
Users:
|
||||
Userspace applications interested in knowing the name of
|
||||
a particular <type-id> that can help in understanding the
|
||||
type of mediated device.
|
||||
|
||||
What: /sys/.../mdev_supported_types/<type-id>/description
|
||||
Date: October 2016
|
||||
Contact: Kirti Wankhede <kwankhede@nvidia.com>
|
||||
Description:
|
||||
Reading this attribute will show description of the type of
|
||||
mediated device that will get created of type <type-id>.
|
||||
This is optional attribute. For example:
|
||||
"2 heads, 512M FB, 2560x1600 maximum resolution"
|
||||
Users:
|
||||
Userspace applications interested in knowing the details of
|
||||
a particular <type-id> that can help in understanding the
|
||||
features provided by that type of mediated device.
|
||||
|
||||
What: /sys/.../<device>/<UUID>/
|
||||
Date: October 2016
|
||||
Contact: Kirti Wankhede <kwankhede@nvidia.com>
|
||||
Description:
|
||||
This directory represents device directory of mediated
|
||||
device. It contains all the attributes related to mediated
|
||||
device.
|
||||
|
||||
What: /sys/.../<device>/<UUID>/mdev_type
|
||||
Date: October 2016
|
||||
Contact: Kirti Wankhede <kwankhede@nvidia.com>
|
||||
Description:
|
||||
This is symbolic link pointing to supported type, <type-id>
|
||||
directory of which this mediated device is created.
|
||||
|
||||
What: /sys/.../<device>/<UUID>/remove
|
||||
Date: October 2016
|
||||
Contact: Kirti Wankhede <kwankhede@nvidia.com>
|
||||
Description:
|
||||
Writing '1' to this file destroys the mediated device. The
|
||||
vendor driver can fail the remove() callback if that device
|
||||
is active and the vendor driver doesn't support hot unplug.
|
||||
Example:
|
||||
# echo 1 > /sys/bus/mdev/devices/<UUID>/remove
|
||||
11
Documentation/ABI/testing/sysfs-class-fpga-bridge
Normal file
11
Documentation/ABI/testing/sysfs-class-fpga-bridge
Normal file
@@ -0,0 +1,11 @@
|
||||
What: /sys/class/fpga_bridge/<bridge>/name
|
||||
Date: January 2016
|
||||
KernelVersion: 4.5
|
||||
Contact: Alan Tull <atull@opensource.altera.com>
|
||||
Description: Name of low level FPGA bridge driver.
|
||||
|
||||
What: /sys/class/fpga_bridge/<bridge>/state
|
||||
Date: January 2016
|
||||
KernelVersion: 4.5
|
||||
Contact: Alan Tull <atull@opensource.altera.com>
|
||||
Description: Show bridge state as "enabled" or "disabled"
|
||||
@@ -4,16 +4,24 @@ KernelVersion: 2.6.17
|
||||
Contact: Richard Purdie <rpurdie@rpsys.net>
|
||||
Description:
|
||||
Set the brightness of the LED. Most LEDs don't
|
||||
have hardware brightness support so will just be turned on for
|
||||
have hardware brightness support, so will just be turned on for
|
||||
non-zero brightness settings. The value is between 0 and
|
||||
/sys/class/leds/<led>/max_brightness.
|
||||
|
||||
Writing 0 to this file clears active trigger.
|
||||
|
||||
Writing non-zero to this file while trigger is active changes the
|
||||
top brightness trigger is going to use.
|
||||
|
||||
What: /sys/class/leds/<led>/max_brightness
|
||||
Date: March 2006
|
||||
KernelVersion: 2.6.17
|
||||
Contact: Richard Purdie <rpurdie@rpsys.net>
|
||||
Description:
|
||||
Maximum brightness level for this led, default is 255 (LED_FULL).
|
||||
Maximum brightness level for this LED, default is 255 (LED_FULL).
|
||||
|
||||
If the LED does not support different brightness levels, this
|
||||
should be 1.
|
||||
|
||||
What: /sys/class/leds/<led>/trigger
|
||||
Date: March 2006
|
||||
@@ -21,7 +29,7 @@ KernelVersion: 2.6.17
|
||||
Contact: Richard Purdie <rpurdie@rpsys.net>
|
||||
Description:
|
||||
Set the trigger for this LED. A trigger is a kernel based source
|
||||
of led events.
|
||||
of LED events.
|
||||
You can change triggers in a similar manner to the way an IO
|
||||
scheduler is chosen. Trigger specific parameters can appear in
|
||||
/sys/class/leds/<led> once a given trigger is selected. For
|
||||
|
||||
@@ -29,3 +29,19 @@ Description: Display fw status registers content
|
||||
Also number of registers varies between 1 and 6
|
||||
depending on generation.
|
||||
|
||||
What: /sys/class/mei/meiN/hbm_ver
|
||||
Date: Aug 2016
|
||||
KernelVersion: 4.9
|
||||
Contact: Tomas Winkler <tomas.winkler@intel.com>
|
||||
Description: Display the negotiated HBM protocol version.
|
||||
|
||||
The HBM protocol version negotiated
|
||||
between the driver and the device.
|
||||
|
||||
What: /sys/class/mei/meiN/hbm_ver_drv
|
||||
Date: Aug 2016
|
||||
KernelVersion: 4.9
|
||||
Contact: Tomas Winkler <tomas.winkler@intel.com>
|
||||
Description: Display the driver HBM protocol version.
|
||||
|
||||
The HBM protocol version supported by the driver.
|
||||
|
||||
50
Documentation/ABI/testing/sysfs-class-remoteproc
Normal file
50
Documentation/ABI/testing/sysfs-class-remoteproc
Normal file
@@ -0,0 +1,50 @@
|
||||
What: /sys/class/remoteproc/.../firmware
|
||||
Date: October 2016
|
||||
Contact: Matt Redfearn <matt.redfearn@imgtec.com>
|
||||
Description: Remote processor firmware
|
||||
|
||||
Reports the name of the firmware currently loaded to the
|
||||
remote processor.
|
||||
|
||||
To change the running firmware, ensure the remote processor is
|
||||
stopped (using /sys/class/remoteproc/.../state) and write a new filename.
|
||||
|
||||
What: /sys/class/remoteproc/.../state
|
||||
Date: October 2016
|
||||
Contact: Matt Redfearn <matt.redfearn@imgtec.com>
|
||||
Description: Remote processor state
|
||||
|
||||
Reports the state of the remote processor, which will be one of:
|
||||
|
||||
"offline"
|
||||
"suspended"
|
||||
"running"
|
||||
"crashed"
|
||||
"invalid"
|
||||
|
||||
"offline" means the remote processor is powered off.
|
||||
|
||||
"suspended" means that the remote processor is suspended and
|
||||
must be woken to receive messages.
|
||||
|
||||
"running" is the normal state of an available remote processor
|
||||
|
||||
"crashed" indicates that a problem/crash has been detected on
|
||||
the remote processor.
|
||||
|
||||
"invalid" is returned if the remote processor is in an
|
||||
unknown state.
|
||||
|
||||
Writing this file controls the state of the remote processor.
|
||||
The following states can be written:
|
||||
|
||||
"start"
|
||||
"stop"
|
||||
|
||||
Writing "start" will attempt to start the processor running the
|
||||
firmware indicated by, or written to,
|
||||
/sys/class/remoteproc/.../firmware. The remote processor should
|
||||
transition to "running" state.
|
||||
|
||||
Writing "stop" will attempt to halt the remote processor and
|
||||
return it to the "offline" state.
|
||||
@@ -1,8 +1,9 @@
|
||||
What: Attribute for calibrating ST-Ericsson AB8500 Real Time Clock
|
||||
What: /sys/class/rtc/rtc0/device/rtc_calibration
|
||||
Date: Oct 2011
|
||||
KernelVersion: 3.0
|
||||
Contact: Mark Godfrey <mark.godfrey@stericsson.com>
|
||||
Description: The rtc_calibration attribute allows the userspace to
|
||||
Description: Attribute for calibrating ST-Ericsson AB8500 Real Time Clock
|
||||
The rtc_calibration attribute allows the userspace to
|
||||
calibrate the AB8500.s 32KHz Real Time Clock.
|
||||
Every 60 seconds the AB8500 will correct the RTC's value
|
||||
by adding to it the value of this attribute.
|
||||
|
||||
12
Documentation/ABI/testing/sysfs-devices-deferred_probe
Normal file
12
Documentation/ABI/testing/sysfs-devices-deferred_probe
Normal file
@@ -0,0 +1,12 @@
|
||||
What: /sys/devices/.../deferred_probe
|
||||
Date: August 2016
|
||||
Contact: Ben Hutchings <ben.hutchings@codethink.co.uk>
|
||||
Description:
|
||||
The /sys/devices/.../deferred_probe attribute is
|
||||
present for all devices. If a driver detects during
|
||||
probing a device that a related device is not yet
|
||||
ready, it may defer probing of the first device. The
|
||||
kernel will retry probing the first device after any
|
||||
other device is successfully probed. This attribute
|
||||
reads as 1 if probing of this device is currently
|
||||
deferred, or 0 otherwise.
|
||||
@@ -272,6 +272,22 @@ Description: Parameters for the CPU cache attributes
|
||||
the modified cache line is written to main
|
||||
memory only when it is replaced
|
||||
|
||||
|
||||
What: /sys/devices/system/cpu/cpu*/cache/index*/id
|
||||
Date: September 2016
|
||||
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
|
||||
Description: Cache id
|
||||
|
||||
The id provides a unique number for a specific instance of
|
||||
a cache of a particular type. E.g. there may be a level
|
||||
3 unified cache on each socket in a server and we may
|
||||
assign them ids 0, 1, 2, ...
|
||||
|
||||
Note that id value can be non-contiguous. E.g. level 1
|
||||
caches typically exist per core, but there may not be a
|
||||
power of two cores on a socket, so these caches may be
|
||||
numbered 0, 1, 2, 3, 4, 5, 8, 9, 10, ...
|
||||
|
||||
What: /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats
|
||||
/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat
|
||||
/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub_turbo_stat
|
||||
|
||||
@@ -347,7 +347,7 @@ Description:
|
||||
because of fragmentation, SLUB will retry with the minimum order
|
||||
possible depending on its characteristics.
|
||||
When debug_guardpage_minorder=N (N > 0) parameter is specified
|
||||
(see Documentation/kernel-parameters.txt), the minimum possible
|
||||
(see Documentation/admin-guide/kernel-parameters.rst), the minimum possible
|
||||
order is used and this sysfs entry can not be used to change
|
||||
the order at run time.
|
||||
|
||||
|
||||
15
Documentation/ABI/testing/sysfs-platform-phy-rcar-gen3-usb2
Normal file
15
Documentation/ABI/testing/sysfs-platform-phy-rcar-gen3-usb2
Normal file
@@ -0,0 +1,15 @@
|
||||
What: /sys/devices/platform/<phy-name>/role
|
||||
Date: October 2016
|
||||
KernelVersion: 4.10
|
||||
Contact: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
|
||||
Description:
|
||||
This file can be read and write.
|
||||
The file can show/change the phy mode for role swap of usb.
|
||||
|
||||
Write the following strings to change the mode:
|
||||
"host" - switching mode from peripheral to host.
|
||||
"peripheral" - switching mode from host to peripheral.
|
||||
|
||||
Read the file, then it shows the following strings:
|
||||
"host" - The mode is host now.
|
||||
"peripheral" - The mode is peripheral now.
|
||||
17
Documentation/ABI/testing/sysfs-platform-sst-atom
Normal file
17
Documentation/ABI/testing/sysfs-platform-sst-atom
Normal file
@@ -0,0 +1,17 @@
|
||||
What: /sys/devices/platform/8086%x:00/firmware_version
|
||||
Date: November 2016
|
||||
KernelVersion: 4.10
|
||||
Contact: "Sebastien Guiriec" <sebastien.guiriec@intel.com>
|
||||
Description:
|
||||
LPE Firmware version for SST driver on all atom
|
||||
plaforms (BYT/CHT/Merrifield/BSW).
|
||||
If the FW has never been loaded it will display:
|
||||
"FW not yet loaded"
|
||||
If FW has been loaded it will display:
|
||||
"v01.aa.bb.cc"
|
||||
aa: Major version is reflecting SoC version:
|
||||
0d: BYT FW
|
||||
0b: BSW FW
|
||||
07: Merrifield FW
|
||||
bb: Minor version
|
||||
cc: Build version
|
||||
@@ -7,30 +7,35 @@ Description:
|
||||
subsystem.
|
||||
|
||||
What: /sys/power/state
|
||||
Date: May 2014
|
||||
Date: November 2016
|
||||
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
|
||||
Description:
|
||||
The /sys/power/state file controls system sleep states.
|
||||
Reading from this file returns the available sleep state
|
||||
labels, which may be "mem", "standby", "freeze" and "disk"
|
||||
(hibernation). The meanings of the first three labels depend on
|
||||
the relative_sleep_states command line argument as follows:
|
||||
1) relative_sleep_states = 1
|
||||
"mem", "standby", "freeze" represent non-hibernation sleep
|
||||
states from the deepest ("mem", always present) to the
|
||||
shallowest ("freeze"). "standby" and "freeze" may or may
|
||||
not be present depending on the capabilities of the
|
||||
platform. "freeze" can only be present if "standby" is
|
||||
present.
|
||||
2) relative_sleep_states = 0 (default)
|
||||
"mem" - "suspend-to-RAM", present if supported.
|
||||
"standby" - "power-on suspend", present if supported.
|
||||
"freeze" - "suspend-to-idle", always present.
|
||||
labels, which may be "mem" (suspend), "standby" (power-on
|
||||
suspend), "freeze" (suspend-to-idle) and "disk" (hibernation).
|
||||
|
||||
Writing to this file one of these strings causes the system to
|
||||
transition into the corresponding state, if available. See
|
||||
Documentation/power/states.txt for a description of what
|
||||
"suspend-to-RAM", "power-on suspend" and "suspend-to-idle" mean.
|
||||
Writing one of the above strings to this file causes the system
|
||||
to transition into the corresponding state, if available.
|
||||
|
||||
See Documentation/power/states.txt for more information.
|
||||
|
||||
What: /sys/power/mem_sleep
|
||||
Date: November 2016
|
||||
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
|
||||
Description:
|
||||
The /sys/power/mem_sleep file controls the operating mode of
|
||||
system suspend. Reading from it returns the available modes
|
||||
as "s2idle" (always present), "shallow" and "deep" (present if
|
||||
supported). The mode that will be used on subsequent attempts
|
||||
to suspend the system (by writing "mem" to the /sys/power/state
|
||||
file described above) is enclosed in square brackets.
|
||||
|
||||
Writing one of the above strings to this file causes the mode
|
||||
represented by it to be used on subsequent attempts to suspend
|
||||
the system.
|
||||
|
||||
See Documentation/power/states.txt for more information.
|
||||
|
||||
What: /sys/power/disk
|
||||
Date: September 2006
|
||||
|
||||
@@ -1,246 +0,0 @@
|
||||
Table of contents
|
||||
=================
|
||||
|
||||
Last updated: 20 December 2005
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- Introduction
|
||||
- Devices not appearing
|
||||
- Finding patch that caused a bug
|
||||
-- Finding using git-bisect
|
||||
-- Finding it the old way
|
||||
- Fixing the bug
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
Always try the latest kernel from kernel.org and build from source. If you are
|
||||
not confident in doing that please report the bug to your distribution vendor
|
||||
instead of to a kernel developer.
|
||||
|
||||
Finding bugs is not always easy. Have a go though. If you can't find it don't
|
||||
give up. Report as much as you have found to the relevant maintainer. See
|
||||
MAINTAINERS for who that is for the subsystem you have worked on.
|
||||
|
||||
Before you submit a bug report read REPORTING-BUGS.
|
||||
|
||||
Devices not appearing
|
||||
=====================
|
||||
|
||||
Often this is caused by udev. Check that first before blaming it on the
|
||||
kernel.
|
||||
|
||||
Finding patch that caused a bug
|
||||
===============================
|
||||
|
||||
|
||||
|
||||
Finding using git-bisect
|
||||
------------------------
|
||||
|
||||
Using the provided tools with git makes finding bugs easy provided the bug is
|
||||
reproducible.
|
||||
|
||||
Steps to do it:
|
||||
- start using git for the kernel source
|
||||
- read the man page for git-bisect
|
||||
- have fun
|
||||
|
||||
Finding it the old way
|
||||
----------------------
|
||||
|
||||
[Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)]
|
||||
|
||||
This is how to track down a bug if you know nothing about kernel hacking.
|
||||
It's a brute force approach but it works pretty well.
|
||||
|
||||
You need:
|
||||
|
||||
. A reproducible bug - it has to happen predictably (sorry)
|
||||
. All the kernel tar files from a revision that worked to the
|
||||
revision that doesn't
|
||||
|
||||
You will then do:
|
||||
|
||||
. Rebuild a revision that you believe works, install, and verify that.
|
||||
. Do a binary search over the kernels to figure out which one
|
||||
introduced the bug. I.e., suppose 1.3.28 didn't have the bug, but
|
||||
you know that 1.3.69 does. Pick a kernel in the middle and build
|
||||
that, like 1.3.50. Build & test; if it works, pick the mid point
|
||||
between .50 and .69, else the mid point between .28 and .50.
|
||||
. You'll narrow it down to the kernel that introduced the bug. You
|
||||
can probably do better than this but it gets tricky.
|
||||
|
||||
. Narrow it down to a subdirectory
|
||||
|
||||
- Copy kernel that works into "test". Let's say that 3.62 works,
|
||||
but 3.63 doesn't. So you diff -r those two kernels and come
|
||||
up with a list of directories that changed. For each of those
|
||||
directories:
|
||||
|
||||
Copy the non-working directory next to the working directory
|
||||
as "dir.63".
|
||||
One directory at time, try moving the working directory to
|
||||
"dir.62" and mv dir.63 dir"time, try
|
||||
|
||||
mv dir dir.62
|
||||
mv dir.63 dir
|
||||
find dir -name '*.[oa]' -print | xargs rm -f
|
||||
|
||||
And then rebuild and retest. Assuming that all related
|
||||
changes were contained in the sub directory, this should
|
||||
isolate the change to a directory.
|
||||
|
||||
Problems: changes in header files may have occurred; I've
|
||||
found in my case that they were self explanatory - you may
|
||||
or may not want to give up when that happens.
|
||||
|
||||
. Narrow it down to a file
|
||||
|
||||
- You can apply the same technique to each file in the directory,
|
||||
hoping that the changes in that file are self contained.
|
||||
|
||||
. Narrow it down to a routine
|
||||
|
||||
- You can take the old file and the new file and manually create
|
||||
a merged file that has
|
||||
|
||||
#ifdef VER62
|
||||
routine()
|
||||
{
|
||||
...
|
||||
}
|
||||
#else
|
||||
routine()
|
||||
{
|
||||
...
|
||||
}
|
||||
#endif
|
||||
|
||||
And then walk through that file, one routine at a time and
|
||||
prefix it with
|
||||
|
||||
#define VER62
|
||||
/* both routines here */
|
||||
#undef VER62
|
||||
|
||||
Then recompile, retest, move the ifdefs until you find the one
|
||||
that makes the difference.
|
||||
|
||||
Finally, you take all the info that you have, kernel revisions, bug
|
||||
description, the extent to which you have narrowed it down, and pass
|
||||
that off to whomever you believe is the maintainer of that section.
|
||||
A post to linux.dev.kernel isn't such a bad idea if you've done some
|
||||
work to narrow it down.
|
||||
|
||||
If you get it down to a routine, you'll probably get a fix in 24 hours.
|
||||
|
||||
My apologies to Linus and the other kernel hackers for describing this
|
||||
brute force approach, it's hardly what a kernel hacker would do. However,
|
||||
it does work and it lets non-hackers help fix bugs. And it is cool
|
||||
because Linux snapshots will let you do this - something that you can't
|
||||
do with vendor supplied releases.
|
||||
|
||||
Fixing the bug
|
||||
==============
|
||||
|
||||
Nobody is going to tell you how to fix bugs. Seriously. You need to work it
|
||||
out. But below are some hints on how to use the tools.
|
||||
|
||||
To debug a kernel, use objdump and look for the hex offset from the crash
|
||||
output to find the valid line of code/assembler. Without debug symbols, you
|
||||
will see the assembler code for the routine shown, but if your kernel has
|
||||
debug symbols the C code will also be available. (Debug symbols can be enabled
|
||||
in the kernel hacking menu of the menu configuration.) For example:
|
||||
|
||||
objdump -r -S -l --disassemble net/dccp/ipv4.o
|
||||
|
||||
NB.: you need to be at the top level of the kernel tree for this to pick up
|
||||
your C files.
|
||||
|
||||
If you don't have access to the code you can also debug on some crash dumps
|
||||
e.g. crash dump output as shown by Dave Miller.
|
||||
|
||||
> EIP is at ip_queue_xmit+0x14/0x4c0
|
||||
> ...
|
||||
> Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
|
||||
> 00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
|
||||
> <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85
|
||||
>
|
||||
> Put the bytes into a "foo.s" file like this:
|
||||
>
|
||||
> .text
|
||||
> .globl foo
|
||||
> foo:
|
||||
> .byte .... /* bytes from Code: part of OOPS dump */
|
||||
>
|
||||
> Compile it with "gcc -c -o foo.o foo.s" then look at the output of
|
||||
> "objdump --disassemble foo.o".
|
||||
>
|
||||
> Output:
|
||||
>
|
||||
> ip_queue_xmit:
|
||||
> push %ebp
|
||||
> push %edi
|
||||
> push %esi
|
||||
> push %ebx
|
||||
> sub $0xbc, %esp
|
||||
> mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb)
|
||||
> mov 0x8(%ebp), %ebx ! %ebx = skb->sk
|
||||
> mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
|
||||
|
||||
In addition, you can use GDB to figure out the exact file and line
|
||||
number of the OOPS from the vmlinux file. If you have
|
||||
CONFIG_DEBUG_INFO enabled, you can simply copy the EIP value from the
|
||||
OOPS:
|
||||
|
||||
EIP: 0060:[<c021e50e>] Not tainted VLI
|
||||
|
||||
And use GDB to translate that to human-readable form:
|
||||
|
||||
gdb vmlinux
|
||||
(gdb) l *0xc021e50e
|
||||
|
||||
If you don't have CONFIG_DEBUG_INFO enabled, you use the function
|
||||
offset from the OOPS:
|
||||
|
||||
EIP is at vt_ioctl+0xda8/0x1482
|
||||
|
||||
And recompile the kernel with CONFIG_DEBUG_INFO enabled:
|
||||
|
||||
make vmlinux
|
||||
gdb vmlinux
|
||||
(gdb) p vt_ioctl
|
||||
(gdb) l *(0x<address of vt_ioctl> + 0xda8)
|
||||
or, as one command
|
||||
(gdb) l *(vt_ioctl + 0xda8)
|
||||
|
||||
If you have a call trace, such as :-
|
||||
>Call Trace:
|
||||
> [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
|
||||
> [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
|
||||
> [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
|
||||
> ...
|
||||
this shows the problem in the :jbd: module. You can load that module in gdb
|
||||
and list the relevant code.
|
||||
gdb fs/jbd/jbd.ko
|
||||
(gdb) p log_wait_commit
|
||||
(gdb) l *(0x<address> + 0xa3)
|
||||
or
|
||||
(gdb) l *(log_wait_commit + 0xa3)
|
||||
|
||||
|
||||
Another very useful option of the Kernel Hacking section in menuconfig is
|
||||
Debug memory allocations. This will help you see whether data has been
|
||||
initialised and not set before use etc. To see the values that get assigned
|
||||
with this look at mm/slab.c and search for POISON_INUSE. When using this an
|
||||
Oops will often show the poisoned data instead of zero which is the default.
|
||||
|
||||
Once you have worked out a fix please submit it upstream. After all open
|
||||
source is about sharing what you do and don't you want to be recognised for
|
||||
your genius?
|
||||
|
||||
Please do read Documentation/SubmittingPatches though to help your code get
|
||||
accepted.
|
||||
@@ -1,485 +0,0 @@
|
||||
.. _changes:
|
||||
|
||||
Minimal requerements to compile the Kernel
|
||||
++++++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
Intro
|
||||
=====
|
||||
|
||||
This document is designed to provide a list of the minimum levels of
|
||||
software necessary to run the 4.x kernels.
|
||||
|
||||
This document is originally based on my "Changes" file for 2.0.x kernels
|
||||
and therefore owes credit to the same people as that file (Jared Mauch,
|
||||
Axel Boldt, Alessandro Sigala, and countless other users all over the
|
||||
'net).
|
||||
|
||||
Current Minimal Requirements
|
||||
****************************
|
||||
|
||||
Upgrade to at **least** these software revisions before thinking you've
|
||||
encountered a bug! If you're unsure what version you're currently
|
||||
running, the suggested command should tell you.
|
||||
|
||||
Again, keep in mind that this list assumes you are already functionally
|
||||
running a Linux kernel. Also, not all tools are necessary on all
|
||||
systems; obviously, if you don't have any ISDN hardware, for example,
|
||||
you probably needn't concern yourself with isdn4k-utils.
|
||||
|
||||
====================== =============== ========================================
|
||||
Program Minimal version Command to check the version
|
||||
====================== =============== ========================================
|
||||
GNU C 3.2 gcc --version
|
||||
GNU make 3.80 make --version
|
||||
binutils 2.12 ld -v
|
||||
util-linux 2.10o fdformat --version
|
||||
module-init-tools 0.9.10 depmod -V
|
||||
e2fsprogs 1.41.4 e2fsck -V
|
||||
jfsutils 1.1.3 fsck.jfs -V
|
||||
reiserfsprogs 3.6.3 reiserfsck -V
|
||||
xfsprogs 2.6.0 xfs_db -V
|
||||
squashfs-tools 4.0 mksquashfs -version
|
||||
btrfs-progs 0.18 btrfsck
|
||||
pcmciautils 004 pccardctl -V
|
||||
quota-tools 3.09 quota -V
|
||||
PPP 2.4.0 pppd --version
|
||||
isdn4k-utils 3.1pre1 isdnctrl 2>&1|grep version
|
||||
nfs-utils 1.0.5 showmount --version
|
||||
procps 3.2.0 ps --version
|
||||
oprofile 0.9 oprofiled --version
|
||||
udev 081 udevd --version
|
||||
grub 0.93 grub --version || grub-install --version
|
||||
mcelog 0.6 mcelog --version
|
||||
iptables 1.4.2 iptables -V
|
||||
openssl & libcrypto 1.0.0 openssl version
|
||||
bc 1.06.95 bc --version
|
||||
Sphinx\ [#f1]_ 1.2 sphinx-build --version
|
||||
====================== =============== ========================================
|
||||
|
||||
.. [#f1] Sphinx is needed only to build the Kernel documentation
|
||||
|
||||
Kernel compilation
|
||||
******************
|
||||
|
||||
GCC
|
||||
---
|
||||
|
||||
The gcc version requirements may vary depending on the type of CPU in your
|
||||
computer.
|
||||
|
||||
Make
|
||||
----
|
||||
|
||||
You will need GNU make 3.80 or later to build the kernel.
|
||||
|
||||
Binutils
|
||||
--------
|
||||
|
||||
Linux on IA-32 has recently switched from using ``as86`` to using ``gas`` for
|
||||
assembling the 16-bit boot code, removing the need for ``as86`` to compile
|
||||
your kernel. This change does, however, mean that you need a recent
|
||||
release of binutils.
|
||||
|
||||
Perl
|
||||
----
|
||||
|
||||
You will need perl 5 and the following modules: ``Getopt::Long``,
|
||||
``Getopt::Std``, ``File::Basename``, and ``File::Find`` to build the kernel.
|
||||
|
||||
BC
|
||||
--
|
||||
|
||||
You will need bc to build kernels 3.10 and higher
|
||||
|
||||
|
||||
OpenSSL
|
||||
-------
|
||||
|
||||
Module signing and external certificate handling use the OpenSSL program and
|
||||
crypto library to do key creation and signature generation.
|
||||
|
||||
You will need openssl to build kernels 3.7 and higher if module signing is
|
||||
enabled. You will also need openssl development packages to build kernels 4.3
|
||||
and higher.
|
||||
|
||||
|
||||
System utilities
|
||||
****************
|
||||
|
||||
Architectural changes
|
||||
---------------------
|
||||
|
||||
DevFS has been obsoleted in favour of udev
|
||||
(http://www.kernel.org/pub/linux/utils/kernel/hotplug/)
|
||||
|
||||
32-bit UID support is now in place. Have fun!
|
||||
|
||||
Linux documentation for functions is transitioning to inline
|
||||
documentation via specially-formatted comments near their
|
||||
definitions in the source. These comments can be combined with the
|
||||
SGML templates in the Documentation/DocBook directory to make DocBook
|
||||
files, which can then be converted by DocBook stylesheets to PostScript,
|
||||
HTML, PDF files, and several other formats. In order to convert from
|
||||
DocBook format to a format of your choice, you'll need to install Jade as
|
||||
well as the desired DocBook stylesheets.
|
||||
|
||||
Util-linux
|
||||
----------
|
||||
|
||||
New versions of util-linux provide ``fdisk`` support for larger disks,
|
||||
support new options to mount, recognize more supported partition
|
||||
types, have a fdformat which works with 2.4 kernels, and similar goodies.
|
||||
You'll probably want to upgrade.
|
||||
|
||||
Ksymoops
|
||||
--------
|
||||
|
||||
If the unthinkable happens and your kernel oopses, you may need the
|
||||
ksymoops tool to decode it, but in most cases you don't.
|
||||
It is generally preferred to build the kernel with ``CONFIG_KALLSYMS`` so
|
||||
that it produces readable dumps that can be used as-is (this also
|
||||
produces better output than ksymoops). If for some reason your kernel
|
||||
is not build with ``CONFIG_KALLSYMS`` and you have no way to rebuild and
|
||||
reproduce the Oops with that option, then you can still decode that Oops
|
||||
with ksymoops.
|
||||
|
||||
Module-Init-Tools
|
||||
-----------------
|
||||
|
||||
A new module loader is now in the kernel that requires ``module-init-tools``
|
||||
to use. It is backward compatible with the 2.4.x series kernels.
|
||||
|
||||
Mkinitrd
|
||||
--------
|
||||
|
||||
These changes to the ``/lib/modules`` file tree layout also require that
|
||||
mkinitrd be upgraded.
|
||||
|
||||
E2fsprogs
|
||||
---------
|
||||
|
||||
The latest version of ``e2fsprogs`` fixes several bugs in fsck and
|
||||
debugfs. Obviously, it's a good idea to upgrade.
|
||||
|
||||
JFSutils
|
||||
--------
|
||||
|
||||
The ``jfsutils`` package contains the utilities for the file system.
|
||||
The following utilities are available:
|
||||
|
||||
- ``fsck.jfs`` - initiate replay of the transaction log, and check
|
||||
and repair a JFS formatted partition.
|
||||
|
||||
- ``mkfs.jfs`` - create a JFS formatted partition.
|
||||
|
||||
- other file system utilities are also available in this package.
|
||||
|
||||
Reiserfsprogs
|
||||
-------------
|
||||
|
||||
The reiserfsprogs package should be used for reiserfs-3.6.x
|
||||
(Linux kernels 2.4.x). It is a combined package and contains working
|
||||
versions of ``mkreiserfs``, ``resize_reiserfs``, ``debugreiserfs`` and
|
||||
``reiserfsck``. These utils work on both i386 and alpha platforms.
|
||||
|
||||
Xfsprogs
|
||||
--------
|
||||
|
||||
The latest version of ``xfsprogs`` contains ``mkfs.xfs``, ``xfs_db``, and the
|
||||
``xfs_repair`` utilities, among others, for the XFS filesystem. It is
|
||||
architecture independent and any version from 2.0.0 onward should
|
||||
work correctly with this version of the XFS kernel code (2.6.0 or
|
||||
later is recommended, due to some significant improvements).
|
||||
|
||||
PCMCIAutils
|
||||
-----------
|
||||
|
||||
PCMCIAutils replaces ``pcmcia-cs``. It properly sets up
|
||||
PCMCIA sockets at system startup and loads the appropriate modules
|
||||
for 16-bit PCMCIA devices if the kernel is modularized and the hotplug
|
||||
subsystem is used.
|
||||
|
||||
Quota-tools
|
||||
-----------
|
||||
|
||||
Support for 32 bit uid's and gid's is required if you want to use
|
||||
the newer version 2 quota format. Quota-tools version 3.07 and
|
||||
newer has this support. Use the recommended version or newer
|
||||
from the table above.
|
||||
|
||||
Intel IA32 microcode
|
||||
--------------------
|
||||
|
||||
A driver has been added to allow updating of Intel IA32 microcode,
|
||||
accessible as a normal (misc) character device. If you are not using
|
||||
udev you may need to::
|
||||
|
||||
mkdir /dev/cpu
|
||||
mknod /dev/cpu/microcode c 10 184
|
||||
chmod 0644 /dev/cpu/microcode
|
||||
|
||||
as root before you can use this. You'll probably also want to
|
||||
get the user-space microcode_ctl utility to use with this.
|
||||
|
||||
udev
|
||||
----
|
||||
|
||||
``udev`` is a userspace application for populating ``/dev`` dynamically with
|
||||
only entries for devices actually present. ``udev`` replaces the basic
|
||||
functionality of devfs, while allowing persistent device naming for
|
||||
devices.
|
||||
|
||||
FUSE
|
||||
----
|
||||
|
||||
Needs libfuse 2.4.0 or later. Absolute minimum is 2.3.0 but mount
|
||||
options ``direct_io`` and ``kernel_cache`` won't work.
|
||||
|
||||
Networking
|
||||
**********
|
||||
|
||||
General changes
|
||||
---------------
|
||||
|
||||
If you have advanced network configuration needs, you should probably
|
||||
consider using the network tools from ip-route2.
|
||||
|
||||
Packet Filter / NAT
|
||||
-------------------
|
||||
The packet filtering and NAT code uses the same tools like the previous 2.4.x
|
||||
kernel series (iptables). It still includes backwards-compatibility modules
|
||||
for 2.2.x-style ipchains and 2.0.x-style ipfwadm.
|
||||
|
||||
PPP
|
||||
---
|
||||
|
||||
The PPP driver has been restructured to support multilink and to
|
||||
enable it to operate over diverse media layers. If you use PPP,
|
||||
upgrade pppd to at least 2.4.0.
|
||||
|
||||
If you are not using udev, you must have the device file /dev/ppp
|
||||
which can be made by::
|
||||
|
||||
mknod /dev/ppp c 108 0
|
||||
|
||||
as root.
|
||||
|
||||
Isdn4k-utils
|
||||
------------
|
||||
|
||||
Due to changes in the length of the phone number field, isdn4k-utils
|
||||
needs to be recompiled or (preferably) upgraded.
|
||||
|
||||
NFS-utils
|
||||
---------
|
||||
|
||||
In ancient (2.4 and earlier) kernels, the nfs server needed to know
|
||||
about any client that expected to be able to access files via NFS. This
|
||||
information would be given to the kernel by ``mountd`` when the client
|
||||
mounted the filesystem, or by ``exportfs`` at system startup. exportfs
|
||||
would take information about active clients from ``/var/lib/nfs/rmtab``.
|
||||
|
||||
This approach is quite fragile as it depends on rmtab being correct
|
||||
which is not always easy, particularly when trying to implement
|
||||
fail-over. Even when the system is working well, ``rmtab`` suffers from
|
||||
getting lots of old entries that never get removed.
|
||||
|
||||
With modern kernels we have the option of having the kernel tell mountd
|
||||
when it gets a request from an unknown host, and mountd can give
|
||||
appropriate export information to the kernel. This removes the
|
||||
dependency on ``rmtab`` and means that the kernel only needs to know about
|
||||
currently active clients.
|
||||
|
||||
To enable this new functionality, you need to::
|
||||
|
||||
mount -t nfsd nfsd /proc/fs/nfsd
|
||||
|
||||
before running exportfs or mountd. It is recommended that all NFS
|
||||
services be protected from the internet-at-large by a firewall where
|
||||
that is possible.
|
||||
|
||||
mcelog
|
||||
------
|
||||
|
||||
On x86 kernels the mcelog utility is needed to process and log machine check
|
||||
events when ``CONFIG_X86_MCE`` is enabled. Machine check events are errors
|
||||
reported by the CPU. Processing them is strongly encouraged.
|
||||
|
||||
Kernel documentation
|
||||
********************
|
||||
|
||||
Sphinx
|
||||
------
|
||||
|
||||
The ReST markups currently used by the Documentation/ files are meant to be
|
||||
built with ``Sphinx`` version 1.2 or upper. If you're desiring to build
|
||||
PDF outputs, it is recommended to use version 1.4.6.
|
||||
|
||||
.. note::
|
||||
|
||||
Please notice that, for PDF and LaTeX output, you'll also need ``XeLaTeX``
|
||||
version 3.14159265. Depending on the distribution, you may also need
|
||||
to install a series of ``texlive`` packages that provide the minimal
|
||||
set of functionalities required for ``XeLaTex`` to work.
|
||||
|
||||
Other tools
|
||||
-----------
|
||||
|
||||
In order to produce documentation from DocBook, you'll also need ``xmlto``.
|
||||
Please notice, however, that we're currently migrating all documents to use
|
||||
``Sphinx``.
|
||||
|
||||
Getting updated software
|
||||
========================
|
||||
|
||||
Kernel compilation
|
||||
******************
|
||||
|
||||
gcc
|
||||
---
|
||||
|
||||
- <ftp://ftp.gnu.org/gnu/gcc/>
|
||||
|
||||
Make
|
||||
----
|
||||
|
||||
- <ftp://ftp.gnu.org/gnu/make/>
|
||||
|
||||
Binutils
|
||||
--------
|
||||
|
||||
- <ftp://ftp.kernel.org/pub/linux/devel/binutils/>
|
||||
|
||||
OpenSSL
|
||||
-------
|
||||
|
||||
- <https://www.openssl.org/>
|
||||
|
||||
System utilities
|
||||
****************
|
||||
|
||||
Util-linux
|
||||
----------
|
||||
|
||||
- <ftp://ftp.kernel.org/pub/linux/utils/util-linux/>
|
||||
|
||||
Ksymoops
|
||||
--------
|
||||
|
||||
- <ftp://ftp.kernel.org/pub/linux/utils/kernel/ksymoops/v2.4/>
|
||||
|
||||
Module-Init-Tools
|
||||
-----------------
|
||||
|
||||
- <ftp://ftp.kernel.org/pub/linux/kernel/people/rusty/modules/>
|
||||
|
||||
Mkinitrd
|
||||
--------
|
||||
|
||||
- <https://code.launchpad.net/initrd-tools/main>
|
||||
|
||||
E2fsprogs
|
||||
---------
|
||||
|
||||
- <http://prdownloads.sourceforge.net/e2fsprogs/e2fsprogs-1.29.tar.gz>
|
||||
|
||||
JFSutils
|
||||
--------
|
||||
|
||||
- <http://jfs.sourceforge.net/>
|
||||
|
||||
Reiserfsprogs
|
||||
-------------
|
||||
|
||||
- <http://www.kernel.org/pub/linux/utils/fs/reiserfs/>
|
||||
|
||||
Xfsprogs
|
||||
--------
|
||||
|
||||
- <ftp://oss.sgi.com/projects/xfs/>
|
||||
|
||||
Pcmciautils
|
||||
-----------
|
||||
|
||||
- <ftp://ftp.kernel.org/pub/linux/utils/kernel/pcmcia/>
|
||||
|
||||
Quota-tools
|
||||
-----------
|
||||
|
||||
- <http://sourceforge.net/projects/linuxquota/>
|
||||
|
||||
DocBook Stylesheets
|
||||
-------------------
|
||||
|
||||
- <http://sourceforge.net/projects/docbook/files/docbook-dsssl/>
|
||||
|
||||
XMLTO XSLT Frontend
|
||||
-------------------
|
||||
|
||||
- <http://cyberelk.net/tim/xmlto/>
|
||||
|
||||
Intel P6 microcode
|
||||
------------------
|
||||
|
||||
- <https://downloadcenter.intel.com/>
|
||||
|
||||
udev
|
||||
----
|
||||
|
||||
- <http://www.freedesktop.org/software/systemd/man/udev.html>
|
||||
|
||||
FUSE
|
||||
----
|
||||
|
||||
- <http://sourceforge.net/projects/fuse>
|
||||
|
||||
mcelog
|
||||
------
|
||||
|
||||
- <http://www.mcelog.org/>
|
||||
|
||||
Networking
|
||||
**********
|
||||
|
||||
PPP
|
||||
---
|
||||
|
||||
- <ftp://ftp.samba.org/pub/ppp/>
|
||||
|
||||
Isdn4k-utils
|
||||
------------
|
||||
|
||||
- <ftp://ftp.isdn4linux.de/pub/isdn4linux/utils/>
|
||||
|
||||
NFS-utils
|
||||
---------
|
||||
|
||||
- <http://sourceforge.net/project/showfiles.php?group_id=14>
|
||||
|
||||
Iptables
|
||||
--------
|
||||
|
||||
- <http://www.iptables.org/downloads.html>
|
||||
|
||||
Ip-route2
|
||||
---------
|
||||
|
||||
- <https://www.kernel.org/pub/linux/utils/net/iproute2/>
|
||||
|
||||
OProfile
|
||||
--------
|
||||
|
||||
- <http://oprofile.sf.net/download/>
|
||||
|
||||
NFS-Utils
|
||||
---------
|
||||
|
||||
- <http://nfs.sourceforge.net/>
|
||||
|
||||
Kernel documentation
|
||||
********************
|
||||
|
||||
Sphinx
|
||||
------
|
||||
|
||||
- <http://www.sphinx-doc.org/>
|
||||
1
Documentation/Changes
Symbolic link
1
Documentation/Changes
Symbolic link
@@ -0,0 +1 @@
|
||||
process/changes.rst
|
||||
@@ -1,27 +0,0 @@
|
||||
Code of Conflict
|
||||
----------------
|
||||
|
||||
The Linux kernel development effort is a very personal process compared
|
||||
to "traditional" ways of developing software. Your code and ideas
|
||||
behind it will be carefully reviewed, often resulting in critique and
|
||||
criticism. The review will almost always require improvements to the
|
||||
code before it can be included in the kernel. Know that this happens
|
||||
because everyone involved wants to see the best possible solution for
|
||||
the overall success of Linux. This development process has been proven
|
||||
to create the most robust operating system kernel ever, and we do not
|
||||
want to do anything to cause the quality of submission and eventual
|
||||
result to ever decrease.
|
||||
|
||||
If however, anyone feels personally abused, threatened, or otherwise
|
||||
uncomfortable due to this process, that is not acceptable. If so,
|
||||
please contact the Linux Foundation's Technical Advisory Board at
|
||||
<tab@lists.linux-foundation.org>, or the individual members, and they
|
||||
will work to resolve the issue to the best of their ability. For more
|
||||
information on who is on the Technical Advisory Board and what their
|
||||
role is, please see:
|
||||
http://www.linuxfoundation.org/projects/linux/tab
|
||||
|
||||
As a reviewer of code, please strive to keep things civil and focused on
|
||||
the technical issues involved. We are all humans, and frustrations can
|
||||
be high on both sides of the process. Try to keep in mind the immortal
|
||||
words of Bill and Ted, "Be excellent to each other."
|
||||
File diff suppressed because it is too large
Load Diff
@@ -9,13 +9,11 @@
|
||||
DOCBOOKS := z8530book.xml \
|
||||
kernel-hacking.xml kernel-locking.xml deviceiobook.xml \
|
||||
writing_usb_driver.xml networking.xml \
|
||||
kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \
|
||||
kernel-api.xml filesystems.xml lsm.xml kgdb.xml \
|
||||
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
|
||||
genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
|
||||
debugobjects.xml sh.xml regulator.xml \
|
||||
alsa-driver-api.xml writing-an-alsa-driver.xml \
|
||||
tracepoint.xml w1.xml \
|
||||
writing_musb_glue_layer.xml crypto-API.xml iio.xml
|
||||
80211.xml sh.xml regulator.xml w1.xml \
|
||||
writing_musb_glue_layer.xml iio.xml
|
||||
|
||||
ifeq ($(DOCBOOKS),)
|
||||
|
||||
@@ -264,6 +262,7 @@ clean-files := $(DOCBOOKS) \
|
||||
$(patsubst %.xml, %.aux.xml, $(DOCBOOKS)) \
|
||||
$(patsubst %.xml, %.xml.db, $(DOCBOOKS)) \
|
||||
$(patsubst %.xml, %.xml, $(DOCBOOKS)) \
|
||||
$(patsubst %.xml, .%.xml.cmd, $(DOCBOOKS)) \
|
||||
$(index)
|
||||
|
||||
clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS)) man
|
||||
|
||||
@@ -1,142 +0,0 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||
|
||||
<!-- ****************************************************** -->
|
||||
<!-- Header -->
|
||||
<!-- ****************************************************** -->
|
||||
<book id="ALSA-Driver-API">
|
||||
<bookinfo>
|
||||
<title>The ALSA Driver API</title>
|
||||
|
||||
<legalnotice>
|
||||
<para>
|
||||
This document is free; you can redistribute it and/or modify it
|
||||
under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 2 of the License, or
|
||||
(at your option) any later version.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This document is distributed in the hope that it will be useful,
|
||||
but <emphasis>WITHOUT ANY WARRANTY</emphasis>; without even the
|
||||
implied warranty of <emphasis>MERCHANTABILITY or FITNESS FOR A
|
||||
PARTICULAR PURPOSE</emphasis>. See the GNU General Public License
|
||||
for more details.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
You should have received a copy of the GNU General Public
|
||||
License along with this program; if not, write to the Free
|
||||
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
MA 02111-1307 USA
|
||||
</para>
|
||||
</legalnotice>
|
||||
|
||||
</bookinfo>
|
||||
|
||||
<toc></toc>
|
||||
|
||||
<chapter><title>Management of Cards and Devices</title>
|
||||
<sect1><title>Card Management</title>
|
||||
!Esound/core/init.c
|
||||
</sect1>
|
||||
<sect1><title>Device Components</title>
|
||||
!Esound/core/device.c
|
||||
</sect1>
|
||||
<sect1><title>Module requests and Device File Entries</title>
|
||||
!Esound/core/sound.c
|
||||
</sect1>
|
||||
<sect1><title>Memory Management Helpers</title>
|
||||
!Esound/core/memory.c
|
||||
!Esound/core/memalloc.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>PCM API</title>
|
||||
<sect1><title>PCM Core</title>
|
||||
!Esound/core/pcm.c
|
||||
!Esound/core/pcm_lib.c
|
||||
!Esound/core/pcm_native.c
|
||||
!Iinclude/sound/pcm.h
|
||||
</sect1>
|
||||
<sect1><title>PCM Format Helpers</title>
|
||||
!Esound/core/pcm_misc.c
|
||||
</sect1>
|
||||
<sect1><title>PCM Memory Management</title>
|
||||
!Esound/core/pcm_memory.c
|
||||
</sect1>
|
||||
<sect1><title>PCM DMA Engine API</title>
|
||||
!Esound/core/pcm_dmaengine.c
|
||||
!Iinclude/sound/dmaengine_pcm.h
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>Control/Mixer API</title>
|
||||
<sect1><title>General Control Interface</title>
|
||||
!Esound/core/control.c
|
||||
</sect1>
|
||||
<sect1><title>AC97 Codec API</title>
|
||||
!Esound/pci/ac97/ac97_codec.c
|
||||
!Esound/pci/ac97/ac97_pcm.c
|
||||
</sect1>
|
||||
<sect1><title>Virtual Master Control API</title>
|
||||
!Esound/core/vmaster.c
|
||||
!Iinclude/sound/control.h
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>MIDI API</title>
|
||||
<sect1><title>Raw MIDI API</title>
|
||||
!Esound/core/rawmidi.c
|
||||
</sect1>
|
||||
<sect1><title>MPU401-UART API</title>
|
||||
!Esound/drivers/mpu401/mpu401_uart.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>Proc Info API</title>
|
||||
<sect1><title>Proc Info Interface</title>
|
||||
!Esound/core/info.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>Compress Offload</title>
|
||||
<sect1><title>Compress Offload API</title>
|
||||
!Esound/core/compress_offload.c
|
||||
!Iinclude/uapi/sound/compress_offload.h
|
||||
!Iinclude/uapi/sound/compress_params.h
|
||||
!Iinclude/sound/compress_driver.h
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>ASoC</title>
|
||||
<sect1><title>ASoC Core API</title>
|
||||
!Iinclude/sound/soc.h
|
||||
!Esound/soc/soc-core.c
|
||||
<!-- !Esound/soc/soc-cache.c no docbook comments here -->
|
||||
!Esound/soc/soc-devres.c
|
||||
!Esound/soc/soc-io.c
|
||||
!Esound/soc/soc-pcm.c
|
||||
!Esound/soc/soc-ops.c
|
||||
!Esound/soc/soc-compress.c
|
||||
</sect1>
|
||||
<sect1><title>ASoC DAPM API</title>
|
||||
!Esound/soc/soc-dapm.c
|
||||
</sect1>
|
||||
<sect1><title>ASoC DMA Engine API</title>
|
||||
!Esound/soc/soc-generic-dmaengine-pcm.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter><title>Miscellaneous Functions</title>
|
||||
<sect1><title>Hardware-Dependent Devices API</title>
|
||||
!Esound/core/hwdep.c
|
||||
</sect1>
|
||||
<sect1><title>Jack Abstraction Layer API</title>
|
||||
!Iinclude/sound/jack.h
|
||||
!Esound/core/jack.c
|
||||
!Esound/soc/soc-jack.c
|
||||
</sect1>
|
||||
<sect1><title>ISA DMA Helpers</title>
|
||||
!Esound/core/isadma.c
|
||||
</sect1>
|
||||
<sect1><title>Other Helper Macros</title>
|
||||
!Iinclude/sound/core.h
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
</book>
|
||||
File diff suppressed because it is too large
Load Diff
@@ -1,443 +0,0 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||
|
||||
<book id="debug-objects-guide">
|
||||
<bookinfo>
|
||||
<title>Debug objects life time</title>
|
||||
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Thomas</firstname>
|
||||
<surname>Gleixner</surname>
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>tglx@linutronix.de</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
</authorgroup>
|
||||
|
||||
<copyright>
|
||||
<year>2008</year>
|
||||
<holder>Thomas Gleixner</holder>
|
||||
</copyright>
|
||||
|
||||
<legalnotice>
|
||||
<para>
|
||||
This documentation is free software; you can redistribute
|
||||
it and/or modify it under the terms of the GNU General Public
|
||||
License version 2 as published by the Free Software Foundation.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This program is distributed in the hope that it will be
|
||||
useful, but WITHOUT ANY WARRANTY; without even the implied
|
||||
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
See the GNU General Public License for more details.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
You should have received a copy of the GNU General Public
|
||||
License along with this program; if not, write to the Free
|
||||
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
MA 02111-1307 USA
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For more details see the file COPYING in the source
|
||||
distribution of Linux.
|
||||
</para>
|
||||
</legalnotice>
|
||||
</bookinfo>
|
||||
|
||||
<toc></toc>
|
||||
|
||||
<chapter id="intro">
|
||||
<title>Introduction</title>
|
||||
<para>
|
||||
debugobjects is a generic infrastructure to track the life time
|
||||
of kernel objects and validate the operations on those.
|
||||
</para>
|
||||
<para>
|
||||
debugobjects is useful to check for the following error patterns:
|
||||
<itemizedlist>
|
||||
<listitem><para>Activation of uninitialized objects</para></listitem>
|
||||
<listitem><para>Initialization of active objects</para></listitem>
|
||||
<listitem><para>Usage of freed/destroyed objects</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
debugobjects is not changing the data structure of the real
|
||||
object so it can be compiled in with a minimal runtime impact
|
||||
and enabled on demand with a kernel command line option.
|
||||
</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="howto">
|
||||
<title>Howto use debugobjects</title>
|
||||
<para>
|
||||
A kernel subsystem needs to provide a data structure which
|
||||
describes the object type and add calls into the debug code at
|
||||
appropriate places. The data structure to describe the object
|
||||
type needs at minimum the name of the object type. Optional
|
||||
functions can and should be provided to fixup detected problems
|
||||
so the kernel can continue to work and the debug information can
|
||||
be retrieved from a live system instead of hard core debugging
|
||||
with serial consoles and stack trace transcripts from the
|
||||
monitor.
|
||||
</para>
|
||||
<para>
|
||||
The debug calls provided by debugobjects are:
|
||||
<itemizedlist>
|
||||
<listitem><para>debug_object_init</para></listitem>
|
||||
<listitem><para>debug_object_init_on_stack</para></listitem>
|
||||
<listitem><para>debug_object_activate</para></listitem>
|
||||
<listitem><para>debug_object_deactivate</para></listitem>
|
||||
<listitem><para>debug_object_destroy</para></listitem>
|
||||
<listitem><para>debug_object_free</para></listitem>
|
||||
<listitem><para>debug_object_assert_init</para></listitem>
|
||||
</itemizedlist>
|
||||
Each of these functions takes the address of the real object and
|
||||
a pointer to the object type specific debug description
|
||||
structure.
|
||||
</para>
|
||||
<para>
|
||||
Each detected error is reported in the statistics and a limited
|
||||
number of errors are printk'ed including a full stack trace.
|
||||
</para>
|
||||
<para>
|
||||
The statistics are available via /sys/kernel/debug/debug_objects/stats.
|
||||
They provide information about the number of warnings and the
|
||||
number of successful fixups along with information about the
|
||||
usage of the internal tracking objects and the state of the
|
||||
internal tracking objects pool.
|
||||
</para>
|
||||
</chapter>
|
||||
<chapter id="debugfunctions">
|
||||
<title>Debug functions</title>
|
||||
<sect1 id="prototypes">
|
||||
<title>Debug object function reference</title>
|
||||
!Elib/debugobjects.c
|
||||
</sect1>
|
||||
<sect1 id="debug_object_init">
|
||||
<title>debug_object_init</title>
|
||||
<para>
|
||||
This function is called whenever the initialization function
|
||||
of a real object is called.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is already tracked by debugobjects it is
|
||||
checked, whether the object can be initialized. Initializing
|
||||
is not allowed for active and destroyed objects. When
|
||||
debugobjects detects an error, then it calls the fixup_init
|
||||
function of the object type description structure if provided
|
||||
by the caller. The fixup function can correct the problem
|
||||
before the real initialization of the object happens. E.g. it
|
||||
can deactivate an active object in order to prevent damage to
|
||||
the subsystem.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is not yet tracked by debugobjects,
|
||||
debugobjects allocates a tracker object for the real object
|
||||
and sets the tracker object state to ODEBUG_STATE_INIT. It
|
||||
verifies that the object is not on the callers stack. If it is
|
||||
on the callers stack then a limited number of warnings
|
||||
including a full stack trace is printk'ed. The calling code
|
||||
must use debug_object_init_on_stack() and remove the object
|
||||
before leaving the function which allocated it. See next
|
||||
section.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="debug_object_init_on_stack">
|
||||
<title>debug_object_init_on_stack</title>
|
||||
<para>
|
||||
This function is called whenever the initialization function
|
||||
of a real object which resides on the stack is called.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is already tracked by debugobjects it is
|
||||
checked, whether the object can be initialized. Initializing
|
||||
is not allowed for active and destroyed objects. When
|
||||
debugobjects detects an error, then it calls the fixup_init
|
||||
function of the object type description structure if provided
|
||||
by the caller. The fixup function can correct the problem
|
||||
before the real initialization of the object happens. E.g. it
|
||||
can deactivate an active object in order to prevent damage to
|
||||
the subsystem.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is not yet tracked by debugobjects
|
||||
debugobjects allocates a tracker object for the real object
|
||||
and sets the tracker object state to ODEBUG_STATE_INIT. It
|
||||
verifies that the object is on the callers stack.
|
||||
</para>
|
||||
<para>
|
||||
An object which is on the stack must be removed from the
|
||||
tracker by calling debug_object_free() before the function
|
||||
which allocates the object returns. Otherwise we keep track of
|
||||
stale objects.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="debug_object_activate">
|
||||
<title>debug_object_activate</title>
|
||||
<para>
|
||||
This function is called whenever the activation function of a
|
||||
real object is called.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is already tracked by debugobjects it is
|
||||
checked, whether the object can be activated. Activating is
|
||||
not allowed for active and destroyed objects. When
|
||||
debugobjects detects an error, then it calls the
|
||||
fixup_activate function of the object type description
|
||||
structure if provided by the caller. The fixup function can
|
||||
correct the problem before the real activation of the object
|
||||
happens. E.g. it can deactivate an active object in order to
|
||||
prevent damage to the subsystem.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is not yet tracked by debugobjects then
|
||||
the fixup_activate function is called if available. This is
|
||||
necessary to allow the legitimate activation of statically
|
||||
allocated and initialized objects. The fixup function checks
|
||||
whether the object is valid and calls the debug_objects_init()
|
||||
function to initialize the tracking of this object.
|
||||
</para>
|
||||
<para>
|
||||
When the activation is legitimate, then the state of the
|
||||
associated tracker object is set to ODEBUG_STATE_ACTIVE.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="debug_object_deactivate">
|
||||
<title>debug_object_deactivate</title>
|
||||
<para>
|
||||
This function is called whenever the deactivation function of
|
||||
a real object is called.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is tracked by debugobjects it is checked,
|
||||
whether the object can be deactivated. Deactivating is not
|
||||
allowed for untracked or destroyed objects.
|
||||
</para>
|
||||
<para>
|
||||
When the deactivation is legitimate, then the state of the
|
||||
associated tracker object is set to ODEBUG_STATE_INACTIVE.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="debug_object_destroy">
|
||||
<title>debug_object_destroy</title>
|
||||
<para>
|
||||
This function is called to mark an object destroyed. This is
|
||||
useful to prevent the usage of invalid objects, which are
|
||||
still available in memory: either statically allocated objects
|
||||
or objects which are freed later.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is tracked by debugobjects it is checked,
|
||||
whether the object can be destroyed. Destruction is not
|
||||
allowed for active and destroyed objects. When debugobjects
|
||||
detects an error, then it calls the fixup_destroy function of
|
||||
the object type description structure if provided by the
|
||||
caller. The fixup function can correct the problem before the
|
||||
real destruction of the object happens. E.g. it can deactivate
|
||||
an active object in order to prevent damage to the subsystem.
|
||||
</para>
|
||||
<para>
|
||||
When the destruction is legitimate, then the state of the
|
||||
associated tracker object is set to ODEBUG_STATE_DESTROYED.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="debug_object_free">
|
||||
<title>debug_object_free</title>
|
||||
<para>
|
||||
This function is called before an object is freed.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is tracked by debugobjects it is checked,
|
||||
whether the object can be freed. Free is not allowed for
|
||||
active objects. When debugobjects detects an error, then it
|
||||
calls the fixup_free function of the object type description
|
||||
structure if provided by the caller. The fixup function can
|
||||
correct the problem before the real free of the object
|
||||
happens. E.g. it can deactivate an active object in order to
|
||||
prevent damage to the subsystem.
|
||||
</para>
|
||||
<para>
|
||||
Note that debug_object_free removes the object from the
|
||||
tracker. Later usage of the object is detected by the other
|
||||
debug checks.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="debug_object_assert_init">
|
||||
<title>debug_object_assert_init</title>
|
||||
<para>
|
||||
This function is called to assert that an object has been
|
||||
initialized.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is not tracked by debugobjects, it calls
|
||||
fixup_assert_init of the object type description structure
|
||||
provided by the caller, with the hardcoded object state
|
||||
ODEBUG_NOT_AVAILABLE. The fixup function can correct the problem
|
||||
by calling debug_object_init and other specific initializing
|
||||
functions.
|
||||
</para>
|
||||
<para>
|
||||
When the real object is already tracked by debugobjects it is
|
||||
ignored.
|
||||
</para>
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter id="fixupfunctions">
|
||||
<title>Fixup functions</title>
|
||||
<sect1 id="debug_obj_descr">
|
||||
<title>Debug object type description structure</title>
|
||||
!Iinclude/linux/debugobjects.h
|
||||
</sect1>
|
||||
<sect1 id="fixup_init">
|
||||
<title>fixup_init</title>
|
||||
<para>
|
||||
This function is called from the debug code whenever a problem
|
||||
in debug_object_init is detected. The function takes the
|
||||
address of the object and the state which is currently
|
||||
recorded in the tracker.
|
||||
</para>
|
||||
<para>
|
||||
Called from debug_object_init when the object state is:
|
||||
<itemizedlist>
|
||||
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
The function returns true when the fixup was successful,
|
||||
otherwise false. The return value is used to update the
|
||||
statistics.
|
||||
</para>
|
||||
<para>
|
||||
Note, that the function needs to call the debug_object_init()
|
||||
function again, after the damage has been repaired in order to
|
||||
keep the state consistent.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="fixup_activate">
|
||||
<title>fixup_activate</title>
|
||||
<para>
|
||||
This function is called from the debug code whenever a problem
|
||||
in debug_object_activate is detected.
|
||||
</para>
|
||||
<para>
|
||||
Called from debug_object_activate when the object state is:
|
||||
<itemizedlist>
|
||||
<listitem><para>ODEBUG_STATE_NOTAVAILABLE</para></listitem>
|
||||
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
The function returns true when the fixup was successful,
|
||||
otherwise false. The return value is used to update the
|
||||
statistics.
|
||||
</para>
|
||||
<para>
|
||||
Note that the function needs to call the debug_object_activate()
|
||||
function again after the damage has been repaired in order to
|
||||
keep the state consistent.
|
||||
</para>
|
||||
<para>
|
||||
The activation of statically initialized objects is a special
|
||||
case. When debug_object_activate() has no tracked object for
|
||||
this object address then fixup_activate() is called with
|
||||
object state ODEBUG_STATE_NOTAVAILABLE. The fixup function
|
||||
needs to check whether this is a legitimate case of a
|
||||
statically initialized object or not. In case it is it calls
|
||||
debug_object_init() and debug_object_activate() to make the
|
||||
object known to the tracker and marked active. In this case
|
||||
the function should return false because this is not a real
|
||||
fixup.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="fixup_destroy">
|
||||
<title>fixup_destroy</title>
|
||||
<para>
|
||||
This function is called from the debug code whenever a problem
|
||||
in debug_object_destroy is detected.
|
||||
</para>
|
||||
<para>
|
||||
Called from debug_object_destroy when the object state is:
|
||||
<itemizedlist>
|
||||
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
The function returns true when the fixup was successful,
|
||||
otherwise false. The return value is used to update the
|
||||
statistics.
|
||||
</para>
|
||||
</sect1>
|
||||
<sect1 id="fixup_free">
|
||||
<title>fixup_free</title>
|
||||
<para>
|
||||
This function is called from the debug code whenever a problem
|
||||
in debug_object_free is detected. Further it can be called
|
||||
from the debug checks in kfree/vfree, when an active object is
|
||||
detected from the debug_check_no_obj_freed() sanity checks.
|
||||
</para>
|
||||
<para>
|
||||
Called from debug_object_free() or debug_check_no_obj_freed()
|
||||
when the object state is:
|
||||
<itemizedlist>
|
||||
<listitem><para>ODEBUG_STATE_ACTIVE</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
The function returns true when the fixup was successful,
|
||||
otherwise false. The return value is used to update the
|
||||
statistics.
|
||||
</para>
|
||||
</sect1>
|
||||
<sect1 id="fixup_assert_init">
|
||||
<title>fixup_assert_init</title>
|
||||
<para>
|
||||
This function is called from the debug code whenever a problem
|
||||
in debug_object_assert_init is detected.
|
||||
</para>
|
||||
<para>
|
||||
Called from debug_object_assert_init() with a hardcoded state
|
||||
ODEBUG_STATE_NOTAVAILABLE when the object is not found in the
|
||||
debug bucket.
|
||||
</para>
|
||||
<para>
|
||||
The function returns true when the fixup was successful,
|
||||
otherwise false. The return value is used to update the
|
||||
statistics.
|
||||
</para>
|
||||
<para>
|
||||
Note, this function should make sure debug_object_init() is
|
||||
called before returning.
|
||||
</para>
|
||||
<para>
|
||||
The handling of statically initialized objects is a special
|
||||
case. The fixup function should check if this is a legitimate
|
||||
case of a statically initialized object or not. In this case only
|
||||
debug_object_init() should be called to make the object known to
|
||||
the tracker. Then the function should return false because this
|
||||
is not
|
||||
a real fixup.
|
||||
</para>
|
||||
</sect1>
|
||||
</chapter>
|
||||
<chapter id="bugs">
|
||||
<title>Known Bugs And Assumptions</title>
|
||||
<para>
|
||||
None (knock on wood).
|
||||
</para>
|
||||
</chapter>
|
||||
</book>
|
||||
@@ -1208,8 +1208,8 @@ static struct block_device_operations opt_fops = {
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
Finally, don't forget to read <filename>Documentation/SubmittingPatches</filename>
|
||||
and possibly <filename>Documentation/SubmittingDrivers</filename>.
|
||||
Finally, don't forget to read <filename>Documentation/process/submitting-patches.rst</filename>
|
||||
and possibly <filename>Documentation/process/submitting-drivers.rst</filename>.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
@@ -1,112 +0,0 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||
|
||||
<book id="Tracepoints">
|
||||
<bookinfo>
|
||||
<title>The Linux Kernel Tracepoint API</title>
|
||||
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Jason</firstname>
|
||||
<surname>Baron</surname>
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>jbaron@redhat.com</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
<author>
|
||||
<firstname>William</firstname>
|
||||
<surname>Cohen</surname>
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>wcohen@redhat.com</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
</authorgroup>
|
||||
|
||||
<legalnotice>
|
||||
<para>
|
||||
This documentation is free software; you can redistribute
|
||||
it and/or modify it under the terms of the GNU General Public
|
||||
License as published by the Free Software Foundation; either
|
||||
version 2 of the License, or (at your option) any later
|
||||
version.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This program is distributed in the hope that it will be
|
||||
useful, but WITHOUT ANY WARRANTY; without even the implied
|
||||
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
See the GNU General Public License for more details.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
You should have received a copy of the GNU General Public
|
||||
License along with this program; if not, write to the Free
|
||||
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
MA 02111-1307 USA
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For more details see the file COPYING in the source
|
||||
distribution of Linux.
|
||||
</para>
|
||||
</legalnotice>
|
||||
</bookinfo>
|
||||
|
||||
<toc></toc>
|
||||
<chapter id="intro">
|
||||
<title>Introduction</title>
|
||||
<para>
|
||||
Tracepoints are static probe points that are located in strategic points
|
||||
throughout the kernel. 'Probes' register/unregister with tracepoints
|
||||
via a callback mechanism. The 'probes' are strictly typed functions that
|
||||
are passed a unique set of parameters defined by each tracepoint.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
From this simple callback mechanism, 'probes' can be used to profile, debug,
|
||||
and understand kernel behavior. There are a number of tools that provide a
|
||||
framework for using 'probes'. These tools include Systemtap, ftrace, and
|
||||
LTTng.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Tracepoints are defined in a number of header files via various macros. Thus,
|
||||
the purpose of this document is to provide a clear accounting of the available
|
||||
tracepoints. The intention is to understand not only what tracepoints are
|
||||
available but also to understand where future tracepoints might be added.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The API presented has functions of the form:
|
||||
<function>trace_tracepointname(function parameters)</function>. These are the
|
||||
tracepoints callbacks that are found throughout the code. Registering and
|
||||
unregistering probes with these callback sites is covered in the
|
||||
<filename>Documentation/trace/*</filename> directory.
|
||||
</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="irq">
|
||||
<title>IRQ</title>
|
||||
!Iinclude/trace/events/irq.h
|
||||
</chapter>
|
||||
|
||||
<chapter id="signal">
|
||||
<title>SIGNAL</title>
|
||||
!Iinclude/trace/events/signal.h
|
||||
</chapter>
|
||||
|
||||
<chapter id="block">
|
||||
<title>Block IO</title>
|
||||
!Iinclude/trace/events/block.h
|
||||
</chapter>
|
||||
|
||||
<chapter id="workqueue">
|
||||
<title>Workqueue</title>
|
||||
!Iinclude/trace/events/workqueue.h
|
||||
</chapter>
|
||||
</book>
|
||||
@@ -45,6 +45,13 @@ GPL version 2.
|
||||
</abstract>
|
||||
|
||||
<revhistory>
|
||||
<revision>
|
||||
<revnumber>0.10</revnumber>
|
||||
<date>2016-10-17</date>
|
||||
<authorinitials>sch</authorinitials>
|
||||
<revremark>Added generic hyperv driver
|
||||
</revremark>
|
||||
</revision>
|
||||
<revision>
|
||||
<revnumber>0.9</revnumber>
|
||||
<date>2009-07-16</date>
|
||||
@@ -1033,6 +1040,61 @@ int main()
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="uio_hv_generic" xreflabel="Using Generic driver for Hyper-V VMBUS">
|
||||
<?dbhtml filename="uio_hv_generic.html"?>
|
||||
<title>Generic Hyper-V UIO driver</title>
|
||||
<para>
|
||||
The generic driver is a kernel module named uio_hv_generic.
|
||||
It supports devices on the Hyper-V VMBus similar to uio_pci_generic
|
||||
on PCI bus.
|
||||
</para>
|
||||
|
||||
<sect1 id="uio_hv_generic_binding">
|
||||
<title>Making the driver recognize the device</title>
|
||||
<para>
|
||||
Since the driver does not declare any device GUID's, it will not get loaded
|
||||
automatically and will not automatically bind to any devices, you must load it
|
||||
and allocate id to the driver yourself. For example, to use the network device
|
||||
GUID:
|
||||
<programlisting>
|
||||
modprobe uio_hv_generic
|
||||
echo "f8615163-df3e-46c5-913f-f2d2f965ed0e" > /sys/bus/vmbus/drivers/uio_hv_generic/new_id
|
||||
</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
If there already is a hardware specific kernel driver for the device, the
|
||||
generic driver still won't bind to it, in this case if you want to use the
|
||||
generic driver (why would you?) you'll have to manually unbind the hardware
|
||||
specific driver and bind the generic driver, like this:
|
||||
<programlisting>
|
||||
echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/hv_netvsc/unbind
|
||||
echo -n vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/uio_hv_generic/bind
|
||||
</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
You can verify that the device has been bound to the driver
|
||||
by looking for it in sysfs, for example like the following:
|
||||
<programlisting>
|
||||
ls -l /sys/bus/vmbus/devices/vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver
|
||||
</programlisting>
|
||||
Which if successful should print
|
||||
<programlisting>
|
||||
.../vmbus-ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver -> ../../../bus/vmbus/drivers/uio_hv_generic
|
||||
</programlisting>
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="uio_hv_generic_internals">
|
||||
<title>Things to know about uio_hv_generic</title>
|
||||
<para>
|
||||
On each interrupt, uio_hv_generic sets the Interrupt Disable bit.
|
||||
This prevents the device from generating further interrupts
|
||||
until the bit is cleared. The userspace driver should clear this
|
||||
bit before blocking and waiting for more interrupts.
|
||||
</para>
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<appendix id="app1">
|
||||
<title>Further information</title>
|
||||
<itemizedlist>
|
||||
|
||||
@@ -1,992 +0,0 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||
|
||||
<book id="Linux-USB-API">
|
||||
<bookinfo>
|
||||
<title>The Linux-USB Host Side API</title>
|
||||
|
||||
<legalnotice>
|
||||
<para>
|
||||
This documentation is free software; you can redistribute
|
||||
it and/or modify it under the terms of the GNU General Public
|
||||
License as published by the Free Software Foundation; either
|
||||
version 2 of the License, or (at your option) any later
|
||||
version.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This program is distributed in the hope that it will be
|
||||
useful, but WITHOUT ANY WARRANTY; without even the implied
|
||||
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
See the GNU General Public License for more details.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
You should have received a copy of the GNU General Public
|
||||
License along with this program; if not, write to the Free
|
||||
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
MA 02111-1307 USA
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For more details see the file COPYING in the source
|
||||
distribution of Linux.
|
||||
</para>
|
||||
</legalnotice>
|
||||
</bookinfo>
|
||||
|
||||
<toc></toc>
|
||||
|
||||
<chapter id="intro">
|
||||
<title>Introduction to USB on Linux</title>
|
||||
|
||||
<para>A Universal Serial Bus (USB) is used to connect a host,
|
||||
such as a PC or workstation, to a number of peripheral
|
||||
devices. USB uses a tree structure, with the host as the
|
||||
root (the system's master), hubs as interior nodes, and
|
||||
peripherals as leaves (and slaves).
|
||||
Modern PCs support several such trees of USB devices, usually
|
||||
one USB 2.0 tree (480 Mbit/sec each) with
|
||||
a few USB 1.1 trees (12 Mbit/sec each) that are used when you
|
||||
connect a USB 1.1 device directly to the machine's "root hub".
|
||||
</para>
|
||||
|
||||
<para>That master/slave asymmetry was designed-in for a number of
|
||||
reasons, one being ease of use. It is not physically possible to
|
||||
assemble (legal) USB cables incorrectly: all upstream "to the host"
|
||||
connectors are the rectangular type (matching the sockets on
|
||||
root hubs), and all downstream connectors are the squarish type
|
||||
(or they are built into the peripheral).
|
||||
Also, the host software doesn't need to deal with distributed
|
||||
auto-configuration since the pre-designated master node manages all that.
|
||||
And finally, at the electrical level, bus protocol overhead is reduced by
|
||||
eliminating arbitration and moving scheduling into the host software.
|
||||
</para>
|
||||
|
||||
<para>USB 1.0 was announced in January 1996 and was revised
|
||||
as USB 1.1 (with improvements in hub specification and
|
||||
support for interrupt-out transfers) in September 1998.
|
||||
USB 2.0 was released in April 2000, adding high-speed
|
||||
transfers and transaction-translating hubs (used for USB 1.1
|
||||
and 1.0 backward compatibility).
|
||||
</para>
|
||||
|
||||
<para>Kernel developers added USB support to Linux early in the 2.2 kernel
|
||||
series, shortly before 2.3 development forked. Updates from 2.3 were
|
||||
regularly folded back into 2.2 releases, which improved reliability and
|
||||
brought <filename>/sbin/hotplug</filename> support as well more drivers.
|
||||
Such improvements were continued in the 2.5 kernel series, where they added
|
||||
USB 2.0 support, improved performance, and made the host controller drivers
|
||||
(HCDs) more consistent. They also simplified the API (to make bugs less
|
||||
likely) and added internal "kerneldoc" documentation.
|
||||
</para>
|
||||
|
||||
<para>Linux can run inside USB devices as well as on
|
||||
the hosts that control the devices.
|
||||
But USB device drivers running inside those peripherals
|
||||
don't do the same things as the ones running inside hosts,
|
||||
so they've been given a different name:
|
||||
<emphasis>gadget drivers</emphasis>.
|
||||
This document does not cover gadget drivers.
|
||||
</para>
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="host">
|
||||
<title>USB Host-Side API Model</title>
|
||||
|
||||
<para>Host-side drivers for USB devices talk to the "usbcore" APIs.
|
||||
There are two. One is intended for
|
||||
<emphasis>general-purpose</emphasis> drivers (exposed through
|
||||
driver frameworks), and the other is for drivers that are
|
||||
<emphasis>part of the core</emphasis>.
|
||||
Such core drivers include the <emphasis>hub</emphasis> driver
|
||||
(which manages trees of USB devices) and several different kinds
|
||||
of <emphasis>host controller drivers</emphasis>,
|
||||
which control individual busses.
|
||||
</para>
|
||||
|
||||
<para>The device model seen by USB drivers is relatively complex.
|
||||
</para>
|
||||
|
||||
<itemizedlist>
|
||||
|
||||
<listitem><para>USB supports four kinds of data transfers
|
||||
(control, bulk, interrupt, and isochronous). Two of them (control
|
||||
and bulk) use bandwidth as it's available,
|
||||
while the other two (interrupt and isochronous)
|
||||
are scheduled to provide guaranteed bandwidth.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>The device description model includes one or more
|
||||
"configurations" per device, only one of which is active at a time.
|
||||
Devices that are capable of high-speed operation must also support
|
||||
full-speed configurations, along with a way to ask about the
|
||||
"other speed" configurations which might be used.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>Configurations have one or more "interfaces", each
|
||||
of which may have "alternate settings". Interfaces may be
|
||||
standardized by USB "Class" specifications, or may be specific to
|
||||
a vendor or device.</para>
|
||||
|
||||
<para>USB device drivers actually bind to interfaces, not devices.
|
||||
Think of them as "interface drivers", though you
|
||||
may not see many devices where the distinction is important.
|
||||
<emphasis>Most USB devices are simple, with only one configuration,
|
||||
one interface, and one alternate setting.</emphasis>
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>Interfaces have one or more "endpoints", each of
|
||||
which supports one type and direction of data transfer such as
|
||||
"bulk out" or "interrupt in". The entire configuration may have
|
||||
up to sixteen endpoints in each direction, allocated as needed
|
||||
among all the interfaces.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>Data transfer on USB is packetized; each endpoint
|
||||
has a maximum packet size.
|
||||
Drivers must often be aware of conventions such as flagging the end
|
||||
of bulk transfers using "short" (including zero length) packets.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>The Linux USB API supports synchronous calls for
|
||||
control and bulk messages.
|
||||
It also supports asynchronous calls for all kinds of data transfer,
|
||||
using request structures called "URBs" (USB Request Blocks).
|
||||
</para></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
|
||||
<para>Accordingly, the USB Core API exposed to device drivers
|
||||
covers quite a lot of territory. You'll probably need to consult
|
||||
the USB 2.0 specification, available online from www.usb.org at
|
||||
no cost, as well as class or device specifications.
|
||||
</para>
|
||||
|
||||
<para>The only host-side drivers that actually touch hardware
|
||||
(reading/writing registers, handling IRQs, and so on) are the HCDs.
|
||||
In theory, all HCDs provide the same functionality through the same
|
||||
API. In practice, that's becoming more true on the 2.5 kernels,
|
||||
but there are still differences that crop up especially with
|
||||
fault handling. Different controllers don't necessarily report
|
||||
the same aspects of failures, and recovery from faults (including
|
||||
software-induced ones like unlinking an URB) isn't yet fully
|
||||
consistent.
|
||||
Device driver authors should make a point of doing disconnect
|
||||
testing (while the device is active) with each different host
|
||||
controller driver, to make sure drivers don't have bugs of
|
||||
their own as well as to make sure they aren't relying on some
|
||||
HCD-specific behavior.
|
||||
(You will need external USB 1.1 and/or
|
||||
USB 2.0 hubs to perform all those tests.)
|
||||
</para>
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="types"><title>USB-Standard Types</title>
|
||||
|
||||
<para>In <filename><linux/usb/ch9.h></filename> you will find
|
||||
the USB data types defined in chapter 9 of the USB specification.
|
||||
These data types are used throughout USB, and in APIs including
|
||||
this host side API, gadget APIs, and usbfs.
|
||||
</para>
|
||||
|
||||
!Iinclude/linux/usb/ch9.h
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="hostside"><title>Host-Side Data Types and Macros</title>
|
||||
|
||||
<para>The host side API exposes several layers to drivers, some of
|
||||
which are more necessary than others.
|
||||
These support lifecycle models for host side drivers
|
||||
and devices, and support passing buffers through usbcore to
|
||||
some HCD that performs the I/O for the device driver.
|
||||
</para>
|
||||
|
||||
|
||||
!Iinclude/linux/usb.h
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="usbcore"><title>USB Core APIs</title>
|
||||
|
||||
<para>There are two basic I/O models in the USB API.
|
||||
The most elemental one is asynchronous: drivers submit requests
|
||||
in the form of an URB, and the URB's completion callback
|
||||
handle the next step.
|
||||
All USB transfer types support that model, although there
|
||||
are special cases for control URBs (which always have setup
|
||||
and status stages, but may not have a data stage) and
|
||||
isochronous URBs (which allow large packets and include
|
||||
per-packet fault reports).
|
||||
Built on top of that is synchronous API support, where a
|
||||
driver calls a routine that allocates one or more URBs,
|
||||
submits them, and waits until they complete.
|
||||
There are synchronous wrappers for single-buffer control
|
||||
and bulk transfers (which are awkward to use in some
|
||||
driver disconnect scenarios), and for scatterlist based
|
||||
streaming i/o (bulk or interrupt).
|
||||
</para>
|
||||
|
||||
<para>USB drivers need to provide buffers that can be
|
||||
used for DMA, although they don't necessarily need to
|
||||
provide the DMA mapping themselves.
|
||||
There are APIs to use used when allocating DMA buffers,
|
||||
which can prevent use of bounce buffers on some systems.
|
||||
In some cases, drivers may be able to rely on 64bit DMA
|
||||
to eliminate another kind of bounce buffer.
|
||||
</para>
|
||||
|
||||
!Edrivers/usb/core/urb.c
|
||||
!Edrivers/usb/core/message.c
|
||||
!Edrivers/usb/core/file.c
|
||||
!Edrivers/usb/core/driver.c
|
||||
!Edrivers/usb/core/usb.c
|
||||
!Edrivers/usb/core/hub.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="hcd"><title>Host Controller APIs</title>
|
||||
|
||||
<para>These APIs are only for use by host controller drivers,
|
||||
most of which implement standard register interfaces such as
|
||||
EHCI, OHCI, or UHCI.
|
||||
UHCI was one of the first interfaces, designed by Intel and
|
||||
also used by VIA; it doesn't do much in hardware.
|
||||
OHCI was designed later, to have the hardware do more work
|
||||
(bigger transfers, tracking protocol state, and so on).
|
||||
EHCI was designed with USB 2.0; its design has features that
|
||||
resemble OHCI (hardware does much more work) as well as
|
||||
UHCI (some parts of ISO support, TD list processing).
|
||||
</para>
|
||||
|
||||
<para>There are host controllers other than the "big three",
|
||||
although most PCI based controllers (and a few non-PCI based
|
||||
ones) use one of those interfaces.
|
||||
Not all host controllers use DMA; some use PIO, and there
|
||||
is also a simulator.
|
||||
</para>
|
||||
|
||||
<para>The same basic APIs are available to drivers for all
|
||||
those controllers.
|
||||
For historical reasons they are in two layers:
|
||||
<structname>struct usb_bus</structname> is a rather thin
|
||||
layer that became available in the 2.2 kernels, while
|
||||
<structname>struct usb_hcd</structname> is a more featureful
|
||||
layer (available in later 2.4 kernels and in 2.5) that
|
||||
lets HCDs share common code, to shrink driver size
|
||||
and significantly reduce hcd-specific behaviors.
|
||||
</para>
|
||||
|
||||
!Edrivers/usb/core/hcd.c
|
||||
!Edrivers/usb/core/hcd-pci.c
|
||||
!Idrivers/usb/core/buffer.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="usbfs">
|
||||
<title>The USB Filesystem (usbfs)</title>
|
||||
|
||||
<para>This chapter presents the Linux <emphasis>usbfs</emphasis>.
|
||||
You may prefer to avoid writing new kernel code for your
|
||||
USB driver; that's the problem that usbfs set out to solve.
|
||||
User mode device drivers are usually packaged as applications
|
||||
or libraries, and may use usbfs through some programming library
|
||||
that wraps it. Such libraries include
|
||||
<ulink url="http://libusb.sourceforge.net">libusb</ulink>
|
||||
for C/C++, and
|
||||
<ulink url="http://jUSB.sourceforge.net">jUSB</ulink> for Java.
|
||||
</para>
|
||||
|
||||
<note><title>Unfinished</title>
|
||||
<para>This particular documentation is incomplete,
|
||||
especially with respect to the asynchronous mode.
|
||||
As of kernel 2.5.66 the code and this (new) documentation
|
||||
need to be cross-reviewed.
|
||||
</para>
|
||||
</note>
|
||||
|
||||
<para>Configure usbfs into Linux kernels by enabling the
|
||||
<emphasis>USB filesystem</emphasis> option (CONFIG_USB_DEVICEFS),
|
||||
and you get basic support for user mode USB device drivers.
|
||||
Until relatively recently it was often (confusingly) called
|
||||
<emphasis>usbdevfs</emphasis> although it wasn't solving what
|
||||
<emphasis>devfs</emphasis> was.
|
||||
Every USB device will appear in usbfs, regardless of whether or
|
||||
not it has a kernel driver.
|
||||
</para>
|
||||
|
||||
<sect1 id="usbfs-files">
|
||||
<title>What files are in "usbfs"?</title>
|
||||
|
||||
<para>Conventionally mounted at
|
||||
<filename>/proc/bus/usb</filename>, usbfs
|
||||
features include:
|
||||
<itemizedlist>
|
||||
<listitem><para><filename>/proc/bus/usb/devices</filename>
|
||||
... a text file
|
||||
showing each of the USB devices on known to the kernel,
|
||||
and their configuration descriptors.
|
||||
You can also poll() this to learn about new devices.
|
||||
</para></listitem>
|
||||
<listitem><para><filename>/proc/bus/usb/BBB/DDD</filename>
|
||||
... magic files
|
||||
exposing the each device's configuration descriptors, and
|
||||
supporting a series of ioctls for making device requests,
|
||||
including I/O to devices. (Purely for access by programs.)
|
||||
</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
|
||||
<para> Each bus is given a number (BBB) based on when it was
|
||||
enumerated; within each bus, each device is given a similar
|
||||
number (DDD).
|
||||
Those BBB/DDD paths are not "stable" identifiers;
|
||||
expect them to change even if you always leave the devices
|
||||
plugged in to the same hub port.
|
||||
<emphasis>Don't even think of saving these in application
|
||||
configuration files.</emphasis>
|
||||
Stable identifiers are available, for user mode applications
|
||||
that want to use them. HID and networking devices expose
|
||||
these stable IDs, so that for example you can be sure that
|
||||
you told the right UPS to power down its second server.
|
||||
"usbfs" doesn't (yet) expose those IDs.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="usbfs-fstab">
|
||||
<title>Mounting and Access Control</title>
|
||||
|
||||
<para>There are a number of mount options for usbfs, which will
|
||||
be of most interest to you if you need to override the default
|
||||
access control policy.
|
||||
That policy is that only root may read or write device files
|
||||
(<filename>/proc/bus/BBB/DDD</filename>) although anyone may read
|
||||
the <filename>devices</filename>
|
||||
or <filename>drivers</filename> files.
|
||||
I/O requests to the device also need the CAP_SYS_RAWIO capability,
|
||||
</para>
|
||||
|
||||
<para>The significance of that is that by default, all user mode
|
||||
device drivers need super-user privileges.
|
||||
You can change modes or ownership in a driver setup
|
||||
when the device hotplugs, or maye just start the
|
||||
driver right then, as a privileged server (or some activity
|
||||
within one).
|
||||
That's the most secure approach for multi-user systems,
|
||||
but for single user systems ("trusted" by that user)
|
||||
it's more convenient just to grant everyone all access
|
||||
(using the <emphasis>devmode=0666</emphasis> option)
|
||||
so the driver can start whenever it's needed.
|
||||
</para>
|
||||
|
||||
<para>The mount options for usbfs, usable in /etc/fstab or
|
||||
in command line invocations of <emphasis>mount</emphasis>, are:
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term><emphasis>busgid</emphasis>=NNNNN</term>
|
||||
<listitem><para>Controls the GID used for the
|
||||
/proc/bus/usb/BBB
|
||||
directories. (Default: 0)</para></listitem></varlistentry>
|
||||
<varlistentry><term><emphasis>busmode</emphasis>=MMM</term>
|
||||
<listitem><para>Controls the file mode used for the
|
||||
/proc/bus/usb/BBB
|
||||
directories. (Default: 0555)
|
||||
</para></listitem></varlistentry>
|
||||
<varlistentry><term><emphasis>busuid</emphasis>=NNNNN</term>
|
||||
<listitem><para>Controls the UID used for the
|
||||
/proc/bus/usb/BBB
|
||||
directories. (Default: 0)</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term><emphasis>devgid</emphasis>=NNNNN</term>
|
||||
<listitem><para>Controls the GID used for the
|
||||
/proc/bus/usb/BBB/DDD
|
||||
files. (Default: 0)</para></listitem></varlistentry>
|
||||
<varlistentry><term><emphasis>devmode</emphasis>=MMM</term>
|
||||
<listitem><para>Controls the file mode used for the
|
||||
/proc/bus/usb/BBB/DDD
|
||||
files. (Default: 0644)</para></listitem></varlistentry>
|
||||
<varlistentry><term><emphasis>devuid</emphasis>=NNNNN</term>
|
||||
<listitem><para>Controls the UID used for the
|
||||
/proc/bus/usb/BBB/DDD
|
||||
files. (Default: 0)</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term><emphasis>listgid</emphasis>=NNNNN</term>
|
||||
<listitem><para>Controls the GID used for the
|
||||
/proc/bus/usb/devices and drivers files.
|
||||
(Default: 0)</para></listitem></varlistentry>
|
||||
<varlistentry><term><emphasis>listmode</emphasis>=MMM</term>
|
||||
<listitem><para>Controls the file mode used for the
|
||||
/proc/bus/usb/devices and drivers files.
|
||||
(Default: 0444)</para></listitem></varlistentry>
|
||||
<varlistentry><term><emphasis>listuid</emphasis>=NNNNN</term>
|
||||
<listitem><para>Controls the UID used for the
|
||||
/proc/bus/usb/devices and drivers files.
|
||||
(Default: 0)</para></listitem></varlistentry>
|
||||
</variablelist>
|
||||
|
||||
</para>
|
||||
|
||||
<para>Note that many Linux distributions hard-wire the mount options
|
||||
for usbfs in their init scripts, such as
|
||||
<filename>/etc/rc.d/rc.sysinit</filename>,
|
||||
rather than making it easy to set this per-system
|
||||
policy in <filename>/etc/fstab</filename>.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="usbfs-devices">
|
||||
<title>/proc/bus/usb/devices</title>
|
||||
|
||||
<para>This file is handy for status viewing tools in user
|
||||
mode, which can scan the text format and ignore most of it.
|
||||
More detailed device status (including class and vendor
|
||||
status) is available from device-specific files.
|
||||
For information about the current format of this file,
|
||||
see the
|
||||
<filename>Documentation/usb/proc_usb_info.txt</filename>
|
||||
file in your Linux kernel sources.
|
||||
</para>
|
||||
|
||||
<para>This file, in combination with the poll() system call, can
|
||||
also be used to detect when devices are added or removed:
|
||||
<programlisting>int fd;
|
||||
struct pollfd pfd;
|
||||
|
||||
fd = open("/proc/bus/usb/devices", O_RDONLY);
|
||||
pfd = { fd, POLLIN, 0 };
|
||||
for (;;) {
|
||||
/* The first time through, this call will return immediately. */
|
||||
poll(&pfd, 1, -1);
|
||||
|
||||
/* To see what's changed, compare the file's previous and current
|
||||
contents or scan the filesystem. (Scanning is more precise.) */
|
||||
}</programlisting>
|
||||
Note that this behavior is intended to be used for informational
|
||||
and debug purposes. It would be more appropriate to use programs
|
||||
such as udev or HAL to initialize a device or start a user-mode
|
||||
helper program, for instance.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="usbfs-bbbddd">
|
||||
<title>/proc/bus/usb/BBB/DDD</title>
|
||||
|
||||
<para>Use these files in one of these basic ways:
|
||||
</para>
|
||||
|
||||
<para><emphasis>They can be read,</emphasis>
|
||||
producing first the device descriptor
|
||||
(18 bytes) and then the descriptors for the current configuration.
|
||||
See the USB 2.0 spec for details about those binary data formats.
|
||||
You'll need to convert most multibyte values from little endian
|
||||
format to your native host byte order, although a few of the
|
||||
fields in the device descriptor (both of the BCD-encoded fields,
|
||||
and the vendor and product IDs) will be byteswapped for you.
|
||||
Note that configuration descriptors include descriptors for
|
||||
interfaces, altsettings, endpoints, and maybe additional
|
||||
class descriptors.
|
||||
</para>
|
||||
|
||||
<para><emphasis>Perform USB operations</emphasis> using
|
||||
<emphasis>ioctl()</emphasis> requests to make endpoint I/O
|
||||
requests (synchronously or asynchronously) or manage
|
||||
the device.
|
||||
These requests need the CAP_SYS_RAWIO capability,
|
||||
as well as filesystem access permissions.
|
||||
Only one ioctl request can be made on one of these
|
||||
device files at a time.
|
||||
This means that if you are synchronously reading an endpoint
|
||||
from one thread, you won't be able to write to a different
|
||||
endpoint from another thread until the read completes.
|
||||
This works for <emphasis>half duplex</emphasis> protocols,
|
||||
but otherwise you'd use asynchronous i/o requests.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
|
||||
<sect1 id="usbfs-lifecycle">
|
||||
<title>Life Cycle of User Mode Drivers</title>
|
||||
|
||||
<para>Such a driver first needs to find a device file
|
||||
for a device it knows how to handle.
|
||||
Maybe it was told about it because a
|
||||
<filename>/sbin/hotplug</filename> event handling agent
|
||||
chose that driver to handle the new device.
|
||||
Or maybe it's an application that scans all the
|
||||
/proc/bus/usb device files, and ignores most devices.
|
||||
In either case, it should <function>read()</function> all
|
||||
the descriptors from the device file,
|
||||
and check them against what it knows how to handle.
|
||||
It might just reject everything except a particular
|
||||
vendor and product ID, or need a more complex policy.
|
||||
</para>
|
||||
|
||||
<para>Never assume there will only be one such device
|
||||
on the system at a time!
|
||||
If your code can't handle more than one device at
|
||||
a time, at least detect when there's more than one, and
|
||||
have your users choose which device to use.
|
||||
</para>
|
||||
|
||||
<para>Once your user mode driver knows what device to use,
|
||||
it interacts with it in either of two styles.
|
||||
The simple style is to make only control requests; some
|
||||
devices don't need more complex interactions than those.
|
||||
(An example might be software using vendor-specific control
|
||||
requests for some initialization or configuration tasks,
|
||||
with a kernel driver for the rest.)
|
||||
</para>
|
||||
|
||||
<para>More likely, you need a more complex style driver:
|
||||
one using non-control endpoints, reading or writing data
|
||||
and claiming exclusive use of an interface.
|
||||
<emphasis>Bulk</emphasis> transfers are easiest to use,
|
||||
but only their sibling <emphasis>interrupt</emphasis> transfers
|
||||
work with low speed devices.
|
||||
Both interrupt and <emphasis>isochronous</emphasis> transfers
|
||||
offer service guarantees because their bandwidth is reserved.
|
||||
Such "periodic" transfers are awkward to use through usbfs,
|
||||
unless you're using the asynchronous calls. However, interrupt
|
||||
transfers can also be used in a synchronous "one shot" style.
|
||||
</para>
|
||||
|
||||
<para>Your user-mode driver should never need to worry
|
||||
about cleaning up request state when the device is
|
||||
disconnected, although it should close its open file
|
||||
descriptors as soon as it starts seeing the ENODEV
|
||||
errors.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="usbfs-ioctl"><title>The ioctl() Requests</title>
|
||||
|
||||
<para>To use these ioctls, you need to include the following
|
||||
headers in your userspace program:
|
||||
<programlisting>#include <linux/usb.h>
|
||||
#include <linux/usbdevice_fs.h>
|
||||
#include <asm/byteorder.h></programlisting>
|
||||
The standard USB device model requests, from "Chapter 9" of
|
||||
the USB 2.0 specification, are automatically included from
|
||||
the <filename><linux/usb/ch9.h></filename> header.
|
||||
</para>
|
||||
|
||||
<para>Unless noted otherwise, the ioctl requests
|
||||
described here will
|
||||
update the modification time on the usbfs file to which
|
||||
they are applied (unless they fail).
|
||||
A return of zero indicates success; otherwise, a
|
||||
standard USB error code is returned. (These are
|
||||
documented in
|
||||
<filename>Documentation/usb/error-codes.txt</filename>
|
||||
in your kernel sources.)
|
||||
</para>
|
||||
|
||||
<para>Each of these files multiplexes access to several
|
||||
I/O streams, one per endpoint.
|
||||
Each device has one control endpoint (endpoint zero)
|
||||
which supports a limited RPC style RPC access.
|
||||
Devices are configured
|
||||
by hub_wq (in the kernel) setting a device-wide
|
||||
<emphasis>configuration</emphasis> that affects things
|
||||
like power consumption and basic functionality.
|
||||
The endpoints are part of USB <emphasis>interfaces</emphasis>,
|
||||
which may have <emphasis>altsettings</emphasis>
|
||||
affecting things like which endpoints are available.
|
||||
Many devices only have a single configuration and interface,
|
||||
so drivers for them will ignore configurations and altsettings.
|
||||
</para>
|
||||
|
||||
|
||||
<sect2 id="usbfs-mgmt">
|
||||
<title>Management/Status Requests</title>
|
||||
|
||||
<para>A number of usbfs requests don't deal very directly
|
||||
with device I/O.
|
||||
They mostly relate to device management and status.
|
||||
These are all synchronous requests.
|
||||
</para>
|
||||
|
||||
<variablelist>
|
||||
|
||||
<varlistentry><term>USBDEVFS_CLAIMINTERFACE</term>
|
||||
<listitem><para>This is used to force usbfs to
|
||||
claim a specific interface,
|
||||
which has not previously been claimed by usbfs or any other
|
||||
kernel driver.
|
||||
The ioctl parameter is an integer holding the number of
|
||||
the interface (bInterfaceNumber from descriptor).
|
||||
</para><para>
|
||||
Note that if your driver doesn't claim an interface
|
||||
before trying to use one of its endpoints, and no
|
||||
other driver has bound to it, then the interface is
|
||||
automatically claimed by usbfs.
|
||||
</para><para>
|
||||
This claim will be released by a RELEASEINTERFACE ioctl,
|
||||
or by closing the file descriptor.
|
||||
File modification time is not updated by this request.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_CONNECTINFO</term>
|
||||
<listitem><para>Says whether the device is lowspeed.
|
||||
The ioctl parameter points to a structure like this:
|
||||
<programlisting>struct usbdevfs_connectinfo {
|
||||
unsigned int devnum;
|
||||
unsigned char slow;
|
||||
}; </programlisting>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
<emphasis>You can't tell whether a "not slow"
|
||||
device is connected at high speed (480 MBit/sec)
|
||||
or just full speed (12 MBit/sec).</emphasis>
|
||||
You should know the devnum value already,
|
||||
it's the DDD value of the device file name.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_GETDRIVER</term>
|
||||
<listitem><para>Returns the name of the kernel driver
|
||||
bound to a given interface (a string). Parameter
|
||||
is a pointer to this structure, which is modified:
|
||||
<programlisting>struct usbdevfs_getdriver {
|
||||
unsigned int interface;
|
||||
char driver[USBDEVFS_MAXDRIVERNAME + 1];
|
||||
};</programlisting>
|
||||
File modification time is not updated by this request.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_IOCTL</term>
|
||||
<listitem><para>Passes a request from userspace through
|
||||
to a kernel driver that has an ioctl entry in the
|
||||
<emphasis>struct usb_driver</emphasis> it registered.
|
||||
<programlisting>struct usbdevfs_ioctl {
|
||||
int ifno;
|
||||
int ioctl_code;
|
||||
void *data;
|
||||
};
|
||||
|
||||
/* user mode call looks like this.
|
||||
* 'request' becomes the driver->ioctl() 'code' parameter.
|
||||
* the size of 'param' is encoded in 'request', and that data
|
||||
* is copied to or from the driver->ioctl() 'buf' parameter.
|
||||
*/
|
||||
static int
|
||||
usbdev_ioctl (int fd, int ifno, unsigned request, void *param)
|
||||
{
|
||||
struct usbdevfs_ioctl wrapper;
|
||||
|
||||
wrapper.ifno = ifno;
|
||||
wrapper.ioctl_code = request;
|
||||
wrapper.data = param;
|
||||
|
||||
return ioctl (fd, USBDEVFS_IOCTL, &wrapper);
|
||||
} </programlisting>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
This request lets kernel drivers talk to user mode code
|
||||
through filesystem operations even when they don't create
|
||||
a character or block special device.
|
||||
It's also been used to do things like ask devices what
|
||||
device special file should be used.
|
||||
Two pre-defined ioctls are used
|
||||
to disconnect and reconnect kernel drivers, so
|
||||
that user mode code can completely manage binding
|
||||
and configuration of devices.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_RELEASEINTERFACE</term>
|
||||
<listitem><para>This is used to release the claim usbfs
|
||||
made on interface, either implicitly or because of a
|
||||
USBDEVFS_CLAIMINTERFACE call, before the file
|
||||
descriptor is closed.
|
||||
The ioctl parameter is an integer holding the number of
|
||||
the interface (bInterfaceNumber from descriptor);
|
||||
File modification time is not updated by this request.
|
||||
</para><warning><para>
|
||||
<emphasis>No security check is made to ensure
|
||||
that the task which made the claim is the one
|
||||
which is releasing it.
|
||||
This means that user mode driver may interfere
|
||||
other ones. </emphasis>
|
||||
</para></warning></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_RESETEP</term>
|
||||
<listitem><para>Resets the data toggle value for an endpoint
|
||||
(bulk or interrupt) to DATA0.
|
||||
The ioctl parameter is an integer endpoint number
|
||||
(1 to 15, as identified in the endpoint descriptor),
|
||||
with USB_DIR_IN added if the device's endpoint sends
|
||||
data to the host.
|
||||
</para><warning><para>
|
||||
<emphasis>Avoid using this request.
|
||||
It should probably be removed.</emphasis>
|
||||
Using it typically means the device and driver will lose
|
||||
toggle synchronization. If you really lost synchronization,
|
||||
you likely need to completely handshake with the device,
|
||||
using a request like CLEAR_HALT
|
||||
or SET_INTERFACE.
|
||||
</para></warning></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_DROP_PRIVILEGES</term>
|
||||
<listitem><para>This is used to relinquish the ability
|
||||
to do certain operations which are considered to be
|
||||
privileged on a usbfs file descriptor.
|
||||
This includes claiming arbitrary interfaces, resetting
|
||||
a device on which there are currently claimed interfaces
|
||||
from other users, and issuing USBDEVFS_IOCTL calls.
|
||||
The ioctl parameter is a 32 bit mask of interfaces
|
||||
the user is allowed to claim on this file descriptor.
|
||||
You may issue this ioctl more than one time to narrow
|
||||
said mask.
|
||||
</para></listitem></varlistentry>
|
||||
</variablelist>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2 id="usbfs-sync">
|
||||
<title>Synchronous I/O Support</title>
|
||||
|
||||
<para>Synchronous requests involve the kernel blocking
|
||||
until the user mode request completes, either by
|
||||
finishing successfully or by reporting an error.
|
||||
In most cases this is the simplest way to use usbfs,
|
||||
although as noted above it does prevent performing I/O
|
||||
to more than one endpoint at a time.
|
||||
</para>
|
||||
|
||||
<variablelist>
|
||||
|
||||
<varlistentry><term>USBDEVFS_BULK</term>
|
||||
<listitem><para>Issues a bulk read or write request to the
|
||||
device.
|
||||
The ioctl parameter is a pointer to this structure:
|
||||
<programlisting>struct usbdevfs_bulktransfer {
|
||||
unsigned int ep;
|
||||
unsigned int len;
|
||||
unsigned int timeout; /* in milliseconds */
|
||||
void *data;
|
||||
};</programlisting>
|
||||
</para><para>The "ep" value identifies a
|
||||
bulk endpoint number (1 to 15, as identified in an endpoint
|
||||
descriptor),
|
||||
masked with USB_DIR_IN when referring to an endpoint which
|
||||
sends data to the host from the device.
|
||||
The length of the data buffer is identified by "len";
|
||||
Recent kernels support requests up to about 128KBytes.
|
||||
<emphasis>FIXME say how read length is returned,
|
||||
and how short reads are handled.</emphasis>.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_CLEAR_HALT</term>
|
||||
<listitem><para>Clears endpoint halt (stall) and
|
||||
resets the endpoint toggle. This is only
|
||||
meaningful for bulk or interrupt endpoints.
|
||||
The ioctl parameter is an integer endpoint number
|
||||
(1 to 15, as identified in an endpoint descriptor),
|
||||
masked with USB_DIR_IN when referring to an endpoint which
|
||||
sends data to the host from the device.
|
||||
</para><para>
|
||||
Use this on bulk or interrupt endpoints which have
|
||||
stalled, returning <emphasis>-EPIPE</emphasis> status
|
||||
to a data transfer request.
|
||||
Do not issue the control request directly, since
|
||||
that could invalidate the host's record of the
|
||||
data toggle.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_CONTROL</term>
|
||||
<listitem><para>Issues a control request to the device.
|
||||
The ioctl parameter points to a structure like this:
|
||||
<programlisting>struct usbdevfs_ctrltransfer {
|
||||
__u8 bRequestType;
|
||||
__u8 bRequest;
|
||||
__u16 wValue;
|
||||
__u16 wIndex;
|
||||
__u16 wLength;
|
||||
__u32 timeout; /* in milliseconds */
|
||||
void *data;
|
||||
};</programlisting>
|
||||
</para><para>
|
||||
The first eight bytes of this structure are the contents
|
||||
of the SETUP packet to be sent to the device; see the
|
||||
USB 2.0 specification for details.
|
||||
The bRequestType value is composed by combining a
|
||||
USB_TYPE_* value, a USB_DIR_* value, and a
|
||||
USB_RECIP_* value (from
|
||||
<emphasis><linux/usb.h></emphasis>).
|
||||
If wLength is nonzero, it describes the length of the data
|
||||
buffer, which is either written to the device
|
||||
(USB_DIR_OUT) or read from the device (USB_DIR_IN).
|
||||
</para><para>
|
||||
At this writing, you can't transfer more than 4 KBytes
|
||||
of data to or from a device; usbfs has a limit, and
|
||||
some host controller drivers have a limit.
|
||||
(That's not usually a problem.)
|
||||
<emphasis>Also</emphasis> there's no way to say it's
|
||||
not OK to get a short read back from the device.
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_RESET</term>
|
||||
<listitem><para>Does a USB level device reset.
|
||||
The ioctl parameter is ignored.
|
||||
After the reset, this rebinds all device interfaces.
|
||||
File modification time is not updated by this request.
|
||||
</para><warning><para>
|
||||
<emphasis>Avoid using this call</emphasis>
|
||||
until some usbcore bugs get fixed,
|
||||
since it does not fully synchronize device, interface,
|
||||
and driver (not just usbfs) state.
|
||||
</para></warning></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_SETINTERFACE</term>
|
||||
<listitem><para>Sets the alternate setting for an
|
||||
interface. The ioctl parameter is a pointer to a
|
||||
structure like this:
|
||||
<programlisting>struct usbdevfs_setinterface {
|
||||
unsigned int interface;
|
||||
unsigned int altsetting;
|
||||
}; </programlisting>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
Those struct members are from some interface descriptor
|
||||
applying to the current configuration.
|
||||
The interface number is the bInterfaceNumber value, and
|
||||
the altsetting number is the bAlternateSetting value.
|
||||
(This resets each endpoint in the interface.)
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_SETCONFIGURATION</term>
|
||||
<listitem><para>Issues the
|
||||
<function>usb_set_configuration</function> call
|
||||
for the device.
|
||||
The parameter is an integer holding the number of
|
||||
a configuration (bConfigurationValue from descriptor).
|
||||
File modification time is not updated by this request.
|
||||
</para><warning><para>
|
||||
<emphasis>Avoid using this call</emphasis>
|
||||
until some usbcore bugs get fixed,
|
||||
since it does not fully synchronize device, interface,
|
||||
and driver (not just usbfs) state.
|
||||
</para></warning></listitem></varlistentry>
|
||||
|
||||
</variablelist>
|
||||
</sect2>
|
||||
|
||||
<sect2 id="usbfs-async">
|
||||
<title>Asynchronous I/O Support</title>
|
||||
|
||||
<para>As mentioned above, there are situations where it may be
|
||||
important to initiate concurrent operations from user mode code.
|
||||
This is particularly important for periodic transfers
|
||||
(interrupt and isochronous), but it can be used for other
|
||||
kinds of USB requests too.
|
||||
In such cases, the asynchronous requests described here
|
||||
are essential. Rather than submitting one request and having
|
||||
the kernel block until it completes, the blocking is separate.
|
||||
</para>
|
||||
|
||||
<para>These requests are packaged into a structure that
|
||||
resembles the URB used by kernel device drivers.
|
||||
(No POSIX Async I/O support here, sorry.)
|
||||
It identifies the endpoint type (USBDEVFS_URB_TYPE_*),
|
||||
endpoint (number, masked with USB_DIR_IN as appropriate),
|
||||
buffer and length, and a user "context" value serving to
|
||||
uniquely identify each request.
|
||||
(It's usually a pointer to per-request data.)
|
||||
Flags can modify requests (not as many as supported for
|
||||
kernel drivers).
|
||||
</para>
|
||||
|
||||
<para>Each request can specify a realtime signal number
|
||||
(between SIGRTMIN and SIGRTMAX, inclusive) to request a
|
||||
signal be sent when the request completes.
|
||||
</para>
|
||||
|
||||
<para>When usbfs returns these urbs, the status value
|
||||
is updated, and the buffer may have been modified.
|
||||
Except for isochronous transfers, the actual_length is
|
||||
updated to say how many bytes were transferred; if the
|
||||
USBDEVFS_URB_DISABLE_SPD flag is set
|
||||
("short packets are not OK"), if fewer bytes were read
|
||||
than were requested then you get an error report.
|
||||
</para>
|
||||
|
||||
<programlisting>struct usbdevfs_iso_packet_desc {
|
||||
unsigned int length;
|
||||
unsigned int actual_length;
|
||||
unsigned int status;
|
||||
};
|
||||
|
||||
struct usbdevfs_urb {
|
||||
unsigned char type;
|
||||
unsigned char endpoint;
|
||||
int status;
|
||||
unsigned int flags;
|
||||
void *buffer;
|
||||
int buffer_length;
|
||||
int actual_length;
|
||||
int start_frame;
|
||||
int number_of_packets;
|
||||
int error_count;
|
||||
unsigned int signr;
|
||||
void *usercontext;
|
||||
struct usbdevfs_iso_packet_desc iso_frame_desc[];
|
||||
};</programlisting>
|
||||
|
||||
<para> For these asynchronous requests, the file modification
|
||||
time reflects when the request was initiated.
|
||||
This contrasts with their use with the synchronous requests,
|
||||
where it reflects when requests complete.
|
||||
</para>
|
||||
|
||||
<variablelist>
|
||||
|
||||
<varlistentry><term>USBDEVFS_DISCARDURB</term>
|
||||
<listitem><para>
|
||||
<emphasis>TBS</emphasis>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_DISCSIGNAL</term>
|
||||
<listitem><para>
|
||||
<emphasis>TBS</emphasis>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_REAPURB</term>
|
||||
<listitem><para>
|
||||
<emphasis>TBS</emphasis>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_REAPURBNDELAY</term>
|
||||
<listitem><para>
|
||||
<emphasis>TBS</emphasis>
|
||||
File modification time is not updated by this request.
|
||||
</para><para>
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
<varlistentry><term>USBDEVFS_SUBMITURB</term>
|
||||
<listitem><para>
|
||||
<emphasis>TBS</emphasis>
|
||||
</para><para>
|
||||
</para></listitem></varlistentry>
|
||||
|
||||
</variablelist>
|
||||
</sect2>
|
||||
|
||||
</sect1>
|
||||
|
||||
</chapter>
|
||||
|
||||
</book>
|
||||
<!-- vim:syntax=sgml:sw=4
|
||||
-->
|
||||
File diff suppressed because it is too large
Load Diff
@@ -1,652 +0,0 @@
|
||||
HOWTO do Linux kernel development
|
||||
=================================
|
||||
|
||||
This is the be-all, end-all document on this topic. It contains
|
||||
instructions on how to become a Linux kernel developer and how to learn
|
||||
to work with the Linux kernel development community. It tries to not
|
||||
contain anything related to the technical aspects of kernel programming,
|
||||
but will help point you in the right direction for that.
|
||||
|
||||
If anything in this document becomes out of date, please send in patches
|
||||
to the maintainer of this file, who is listed at the bottom of the
|
||||
document.
|
||||
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
So, you want to learn how to become a Linux kernel developer? Or you
|
||||
have been told by your manager, "Go write a Linux driver for this
|
||||
device." This document's goal is to teach you everything you need to
|
||||
know to achieve this by describing the process you need to go through,
|
||||
and hints on how to work with the community. It will also try to
|
||||
explain some of the reasons why the community works like it does.
|
||||
|
||||
The kernel is written mostly in C, with some architecture-dependent
|
||||
parts written in assembly. A good understanding of C is required for
|
||||
kernel development. Assembly (any architecture) is not required unless
|
||||
you plan to do low-level development for that architecture. Though they
|
||||
are not a good substitute for a solid C education and/or years of
|
||||
experience, the following books are good for, if anything, reference:
|
||||
|
||||
- "The C Programming Language" by Kernighan and Ritchie [Prentice Hall]
|
||||
- "Practical C Programming" by Steve Oualline [O'Reilly]
|
||||
- "C: A Reference Manual" by Harbison and Steele [Prentice Hall]
|
||||
|
||||
The kernel is written using GNU C and the GNU toolchain. While it
|
||||
adheres to the ISO C89 standard, it uses a number of extensions that are
|
||||
not featured in the standard. The kernel is a freestanding C
|
||||
environment, with no reliance on the standard C library, so some
|
||||
portions of the C standard are not supported. Arbitrary long long
|
||||
divisions and floating point are not allowed. It can sometimes be
|
||||
difficult to understand the assumptions the kernel has on the toolchain
|
||||
and the extensions that it uses, and unfortunately there is no
|
||||
definitive reference for them. Please check the gcc info pages (`info
|
||||
gcc`) for some information on them.
|
||||
|
||||
Please remember that you are trying to learn how to work with the
|
||||
existing development community. It is a diverse group of people, with
|
||||
high standards for coding, style and procedure. These standards have
|
||||
been created over time based on what they have found to work best for
|
||||
such a large and geographically dispersed team. Try to learn as much as
|
||||
possible about these standards ahead of time, as they are well
|
||||
documented; do not expect people to adapt to you or your company's way
|
||||
of doing things.
|
||||
|
||||
|
||||
Legal Issues
|
||||
------------
|
||||
|
||||
The Linux kernel source code is released under the GPL. Please see the
|
||||
file, COPYING, in the main directory of the source tree, for details on
|
||||
the license. If you have further questions about the license, please
|
||||
contact a lawyer, and do not ask on the Linux kernel mailing list. The
|
||||
people on the mailing lists are not lawyers, and you should not rely on
|
||||
their statements on legal matters.
|
||||
|
||||
For common questions and answers about the GPL, please see:
|
||||
|
||||
https://www.gnu.org/licenses/gpl-faq.html
|
||||
|
||||
|
||||
Documentation
|
||||
-------------
|
||||
|
||||
The Linux kernel source tree has a large range of documents that are
|
||||
invaluable for learning how to interact with the kernel community. When
|
||||
new features are added to the kernel, it is recommended that new
|
||||
documentation files are also added which explain how to use the feature.
|
||||
When a kernel change causes the interface that the kernel exposes to
|
||||
userspace to change, it is recommended that you send the information or
|
||||
a patch to the manual pages explaining the change to the manual pages
|
||||
maintainer at mtk.manpages@gmail.com, and CC the list
|
||||
linux-api@vger.kernel.org.
|
||||
|
||||
Here is a list of files that are in the kernel source tree that are
|
||||
required reading:
|
||||
|
||||
README
|
||||
This file gives a short background on the Linux kernel and describes
|
||||
what is necessary to do to configure and build the kernel. People
|
||||
who are new to the kernel should start here.
|
||||
|
||||
:ref:`Documentation/Changes <changes>`
|
||||
This file gives a list of the minimum levels of various software
|
||||
packages that are necessary to build and run the kernel
|
||||
successfully.
|
||||
|
||||
:ref:`Documentation/CodingStyle <codingstyle>`
|
||||
This describes the Linux kernel coding style, and some of the
|
||||
rationale behind it. All new code is expected to follow the
|
||||
guidelines in this document. Most maintainers will only accept
|
||||
patches if these rules are followed, and many people will only
|
||||
review code if it is in the proper style.
|
||||
|
||||
:ref:`Documentation/SubmittingPatches <submittingpatches>` and :ref:`Documentation/SubmittingDrivers <submittingdrivers>`
|
||||
These files describe in explicit detail how to successfully create
|
||||
and send a patch, including (but not limited to):
|
||||
|
||||
- Email contents
|
||||
- Email format
|
||||
- Who to send it to
|
||||
|
||||
Following these rules will not guarantee success (as all patches are
|
||||
subject to scrutiny for content and style), but not following them
|
||||
will almost always prevent it.
|
||||
|
||||
Other excellent descriptions of how to create patches properly are:
|
||||
|
||||
"The Perfect Patch"
|
||||
https://www.ozlabs.org/~akpm/stuff/tpp.txt
|
||||
|
||||
"Linux kernel patch submission format"
|
||||
http://linux.yyz.us/patch-format.html
|
||||
|
||||
:ref:`Documentation/stable_api_nonsense.txt <stable_api_nonsense>`
|
||||
This file describes the rationale behind the conscious decision to
|
||||
not have a stable API within the kernel, including things like:
|
||||
|
||||
- Subsystem shim-layers (for compatibility?)
|
||||
- Driver portability between Operating Systems.
|
||||
- Mitigating rapid change within the kernel source tree (or
|
||||
preventing rapid change)
|
||||
|
||||
This document is crucial for understanding the Linux development
|
||||
philosophy and is very important for people moving to Linux from
|
||||
development on other Operating Systems.
|
||||
|
||||
:ref:`Documentation/SecurityBugs <securitybugs>`
|
||||
If you feel you have found a security problem in the Linux kernel,
|
||||
please follow the steps in this document to help notify the kernel
|
||||
developers, and help solve the issue.
|
||||
|
||||
:ref:`Documentation/ManagementStyle <managementstyle>`
|
||||
This document describes how Linux kernel maintainers operate and the
|
||||
shared ethos behind their methodologies. This is important reading
|
||||
for anyone new to kernel development (or anyone simply curious about
|
||||
it), as it resolves a lot of common misconceptions and confusion
|
||||
about the unique behavior of kernel maintainers.
|
||||
|
||||
:ref:`Documentation/stable_kernel_rules.txt <stable_kernel_rules>`
|
||||
This file describes the rules on how the stable kernel releases
|
||||
happen, and what to do if you want to get a change into one of these
|
||||
releases.
|
||||
|
||||
:ref:`Documentation/kernel-docs.txt <kernel_docs>`
|
||||
A list of external documentation that pertains to kernel
|
||||
development. Please consult this list if you do not find what you
|
||||
are looking for within the in-kernel documentation.
|
||||
|
||||
:ref:`Documentation/applying-patches.txt <applying_patches>`
|
||||
A good introduction describing exactly what a patch is and how to
|
||||
apply it to the different development branches of the kernel.
|
||||
|
||||
The kernel also has a large number of documents that can be
|
||||
automatically generated from the source code itself or from
|
||||
ReStructuredText markups (ReST), like this one. This includes a
|
||||
full description of the in-kernel API, and rules on how to handle
|
||||
locking properly.
|
||||
|
||||
All such documents can be generated as PDF or HTML by running::
|
||||
|
||||
make pdfdocs
|
||||
make htmldocs
|
||||
|
||||
respectively from the main kernel source directory.
|
||||
|
||||
The documents that uses ReST markup will be generated at Documentation/output.
|
||||
They can also be generated on LaTeX and ePub formats with::
|
||||
|
||||
make latexdocs
|
||||
make epubdocs
|
||||
|
||||
Currently, there are some documents written on DocBook that are in
|
||||
the process of conversion to ReST. Such documents will be created in the
|
||||
Documentation/DocBook/ directory and can be generated also as
|
||||
Postscript or man pages by running::
|
||||
|
||||
make psdocs
|
||||
make mandocs
|
||||
|
||||
Becoming A Kernel Developer
|
||||
---------------------------
|
||||
|
||||
If you do not know anything about Linux kernel development, you should
|
||||
look at the Linux KernelNewbies project:
|
||||
|
||||
https://kernelnewbies.org
|
||||
|
||||
It consists of a helpful mailing list where you can ask almost any type
|
||||
of basic kernel development question (make sure to search the archives
|
||||
first, before asking something that has already been answered in the
|
||||
past.) It also has an IRC channel that you can use to ask questions in
|
||||
real-time, and a lot of helpful documentation that is useful for
|
||||
learning about Linux kernel development.
|
||||
|
||||
The website has basic information about code organization, subsystems,
|
||||
and current projects (both in-tree and out-of-tree). It also describes
|
||||
some basic logistical information, like how to compile a kernel and
|
||||
apply a patch.
|
||||
|
||||
If you do not know where you want to start, but you want to look for
|
||||
some task to start doing to join into the kernel development community,
|
||||
go to the Linux Kernel Janitor's project:
|
||||
|
||||
https://kernelnewbies.org/KernelJanitors
|
||||
|
||||
It is a great place to start. It describes a list of relatively simple
|
||||
problems that need to be cleaned up and fixed within the Linux kernel
|
||||
source tree. Working with the developers in charge of this project, you
|
||||
will learn the basics of getting your patch into the Linux kernel tree,
|
||||
and possibly be pointed in the direction of what to go work on next, if
|
||||
you do not already have an idea.
|
||||
|
||||
If you already have a chunk of code that you want to put into the kernel
|
||||
tree, but need some help getting it in the proper form, the
|
||||
kernel-mentors project was created to help you out with this. It is a
|
||||
mailing list, and can be found at:
|
||||
|
||||
https://selenic.com/mailman/listinfo/kernel-mentors
|
||||
|
||||
Before making any actual modifications to the Linux kernel code, it is
|
||||
imperative to understand how the code in question works. For this
|
||||
purpose, nothing is better than reading through it directly (most tricky
|
||||
bits are commented well), perhaps even with the help of specialized
|
||||
tools. One such tool that is particularly recommended is the Linux
|
||||
Cross-Reference project, which is able to present source code in a
|
||||
self-referential, indexed webpage format. An excellent up-to-date
|
||||
repository of the kernel code may be found at:
|
||||
|
||||
http://lxr.free-electrons.com/
|
||||
|
||||
|
||||
The development process
|
||||
-----------------------
|
||||
|
||||
Linux kernel development process currently consists of a few different
|
||||
main kernel "branches" and lots of different subsystem-specific kernel
|
||||
branches. These different branches are:
|
||||
|
||||
- main 4.x kernel tree
|
||||
- 4.x.y -stable kernel tree
|
||||
- 4.x -git kernel patches
|
||||
- subsystem specific kernel trees and patches
|
||||
- the 4.x -next kernel tree for integration tests
|
||||
|
||||
4.x kernel tree
|
||||
-----------------
|
||||
4.x kernels are maintained by Linus Torvalds, and can be found on
|
||||
https://kernel.org in the pub/linux/kernel/v4.x/ directory. Its development
|
||||
process is as follows:
|
||||
|
||||
- As soon as a new kernel is released a two weeks window is open,
|
||||
during this period of time maintainers can submit big diffs to
|
||||
Linus, usually the patches that have already been included in the
|
||||
-next kernel for a few weeks. The preferred way to submit big changes
|
||||
is using git (the kernel's source management tool, more information
|
||||
can be found at https://git-scm.com/) but plain patches are also just
|
||||
fine.
|
||||
- After two weeks a -rc1 kernel is released it is now possible to push
|
||||
only patches that do not include new features that could affect the
|
||||
stability of the whole kernel. Please note that a whole new driver
|
||||
(or filesystem) might be accepted after -rc1 because there is no
|
||||
risk of causing regressions with such a change as long as the change
|
||||
is self-contained and does not affect areas outside of the code that
|
||||
is being added. git can be used to send patches to Linus after -rc1
|
||||
is released, but the patches need to also be sent to a public
|
||||
mailing list for review.
|
||||
- A new -rc is released whenever Linus deems the current git tree to
|
||||
be in a reasonably sane state adequate for testing. The goal is to
|
||||
release a new -rc kernel every week.
|
||||
- Process continues until the kernel is considered "ready", the
|
||||
process should last around 6 weeks.
|
||||
|
||||
It is worth mentioning what Andrew Morton wrote on the linux-kernel
|
||||
mailing list about kernel releases:
|
||||
|
||||
*"Nobody knows when a kernel will be released, because it's
|
||||
released according to perceived bug status, not according to a
|
||||
preconceived timeline."*
|
||||
|
||||
4.x.y -stable kernel tree
|
||||
-------------------------
|
||||
Kernels with 3-part versions are -stable kernels. They contain
|
||||
relatively small and critical fixes for security problems or significant
|
||||
regressions discovered in a given 4.x kernel.
|
||||
|
||||
This is the recommended branch for users who want the most recent stable
|
||||
kernel and are not interested in helping test development/experimental
|
||||
versions.
|
||||
|
||||
If no 4.x.y kernel is available, then the highest numbered 4.x
|
||||
kernel is the current stable kernel.
|
||||
|
||||
4.x.y are maintained by the "stable" team <stable@vger.kernel.org>, and
|
||||
are released as needs dictate. The normal release period is approximately
|
||||
two weeks, but it can be longer if there are no pressing problems. A
|
||||
security-related problem, instead, can cause a release to happen almost
|
||||
instantly.
|
||||
|
||||
The file Documentation/stable_kernel_rules.txt in the kernel tree
|
||||
documents what kinds of changes are acceptable for the -stable tree, and
|
||||
how the release process works.
|
||||
|
||||
4.x -git patches
|
||||
----------------
|
||||
These are daily snapshots of Linus' kernel tree which are managed in a
|
||||
git repository (hence the name.) These patches are usually released
|
||||
daily and represent the current state of Linus' tree. They are more
|
||||
experimental than -rc kernels since they are generated automatically
|
||||
without even a cursory glance to see if they are sane.
|
||||
|
||||
Subsystem Specific kernel trees and patches
|
||||
-------------------------------------------
|
||||
The maintainers of the various kernel subsystems --- and also many
|
||||
kernel subsystem developers --- expose their current state of
|
||||
development in source repositories. That way, others can see what is
|
||||
happening in the different areas of the kernel. In areas where
|
||||
development is rapid, a developer may be asked to base his submissions
|
||||
onto such a subsystem kernel tree so that conflicts between the
|
||||
submission and other already ongoing work are avoided.
|
||||
|
||||
Most of these repositories are git trees, but there are also other SCMs
|
||||
in use, or patch queues being published as quilt series. Addresses of
|
||||
these subsystem repositories are listed in the MAINTAINERS file. Many
|
||||
of them can be browsed at https://git.kernel.org/.
|
||||
|
||||
Before a proposed patch is committed to such a subsystem tree, it is
|
||||
subject to review which primarily happens on mailing lists (see the
|
||||
respective section below). For several kernel subsystems, this review
|
||||
process is tracked with the tool patchwork. Patchwork offers a web
|
||||
interface which shows patch postings, any comments on a patch or
|
||||
revisions to it, and maintainers can mark patches as under review,
|
||||
accepted, or rejected. Most of these patchwork sites are listed at
|
||||
https://patchwork.kernel.org/.
|
||||
|
||||
4.x -next kernel tree for integration tests
|
||||
-------------------------------------------
|
||||
Before updates from subsystem trees are merged into the mainline 4.x
|
||||
tree, they need to be integration-tested. For this purpose, a special
|
||||
testing repository exists into which virtually all subsystem trees are
|
||||
pulled on an almost daily basis:
|
||||
|
||||
https://git.kernel.org/?p=linux/kernel/git/next/linux-next.git
|
||||
|
||||
This way, the -next kernel gives a summary outlook onto what will be
|
||||
expected to go into the mainline kernel at the next merge period.
|
||||
Adventurous testers are very welcome to runtime-test the -next kernel.
|
||||
|
||||
|
||||
Bug Reporting
|
||||
-------------
|
||||
|
||||
https://bugzilla.kernel.org is where the Linux kernel developers track kernel
|
||||
bugs. Users are encouraged to report all bugs that they find in this
|
||||
tool. For details on how to use the kernel bugzilla, please see:
|
||||
|
||||
https://bugzilla.kernel.org/page.cgi?id=faq.html
|
||||
|
||||
The file REPORTING-BUGS in the main kernel source directory has a good
|
||||
template for how to report a possible kernel bug, and details what kind
|
||||
of information is needed by the kernel developers to help track down the
|
||||
problem.
|
||||
|
||||
|
||||
Managing bug reports
|
||||
--------------------
|
||||
|
||||
One of the best ways to put into practice your hacking skills is by fixing
|
||||
bugs reported by other people. Not only you will help to make the kernel
|
||||
more stable, you'll learn to fix real world problems and you will improve
|
||||
your skills, and other developers will be aware of your presence. Fixing
|
||||
bugs is one of the best ways to get merits among other developers, because
|
||||
not many people like wasting time fixing other people's bugs.
|
||||
|
||||
To work in the already reported bug reports, go to https://bugzilla.kernel.org.
|
||||
If you want to be advised of the future bug reports, you can subscribe to the
|
||||
bugme-new mailing list (only new bug reports are mailed here) or to the
|
||||
bugme-janitor mailing list (every change in the bugzilla is mailed here)
|
||||
|
||||
https://lists.linux-foundation.org/mailman/listinfo/bugme-new
|
||||
|
||||
https://lists.linux-foundation.org/mailman/listinfo/bugme-janitors
|
||||
|
||||
|
||||
|
||||
Mailing lists
|
||||
-------------
|
||||
|
||||
As some of the above documents describe, the majority of the core kernel
|
||||
developers participate on the Linux Kernel Mailing list. Details on how
|
||||
to subscribe and unsubscribe from the list can be found at:
|
||||
|
||||
http://vger.kernel.org/vger-lists.html#linux-kernel
|
||||
|
||||
There are archives of the mailing list on the web in many different
|
||||
places. Use a search engine to find these archives. For example:
|
||||
|
||||
http://dir.gmane.org/gmane.linux.kernel
|
||||
|
||||
It is highly recommended that you search the archives about the topic
|
||||
you want to bring up, before you post it to the list. A lot of things
|
||||
already discussed in detail are only recorded at the mailing list
|
||||
archives.
|
||||
|
||||
Most of the individual kernel subsystems also have their own separate
|
||||
mailing list where they do their development efforts. See the
|
||||
MAINTAINERS file for a list of what these lists are for the different
|
||||
groups.
|
||||
|
||||
Many of the lists are hosted on kernel.org. Information on them can be
|
||||
found at:
|
||||
|
||||
http://vger.kernel.org/vger-lists.html
|
||||
|
||||
Please remember to follow good behavioral habits when using the lists.
|
||||
Though a bit cheesy, the following URL has some simple guidelines for
|
||||
interacting with the list (or any list):
|
||||
|
||||
http://www.albion.com/netiquette/
|
||||
|
||||
If multiple people respond to your mail, the CC: list of recipients may
|
||||
get pretty large. Don't remove anybody from the CC: list without a good
|
||||
reason, or don't reply only to the list address. Get used to receiving the
|
||||
mail twice, one from the sender and the one from the list, and don't try
|
||||
to tune that by adding fancy mail-headers, people will not like it.
|
||||
|
||||
Remember to keep the context and the attribution of your replies intact,
|
||||
keep the "John Kernelhacker wrote ...:" lines at the top of your reply, and
|
||||
add your statements between the individual quoted sections instead of
|
||||
writing at the top of the mail.
|
||||
|
||||
If you add patches to your mail, make sure they are plain readable text
|
||||
as stated in Documentation/SubmittingPatches.
|
||||
Kernel developers don't want to deal with
|
||||
attachments or compressed patches; they may want to comment on
|
||||
individual lines of your patch, which works only that way. Make sure you
|
||||
use a mail program that does not mangle spaces and tab characters. A
|
||||
good first test is to send the mail to yourself and try to apply your
|
||||
own patch by yourself. If that doesn't work, get your mail program fixed
|
||||
or change it until it works.
|
||||
|
||||
Above all, please remember to show respect to other subscribers.
|
||||
|
||||
|
||||
Working with the community
|
||||
--------------------------
|
||||
|
||||
The goal of the kernel community is to provide the best possible kernel
|
||||
there is. When you submit a patch for acceptance, it will be reviewed
|
||||
on its technical merits and those alone. So, what should you be
|
||||
expecting?
|
||||
|
||||
- criticism
|
||||
- comments
|
||||
- requests for change
|
||||
- requests for justification
|
||||
- silence
|
||||
|
||||
Remember, this is part of getting your patch into the kernel. You have
|
||||
to be able to take criticism and comments about your patches, evaluate
|
||||
them at a technical level and either rework your patches or provide
|
||||
clear and concise reasoning as to why those changes should not be made.
|
||||
If there are no responses to your posting, wait a few days and try
|
||||
again, sometimes things get lost in the huge volume.
|
||||
|
||||
What should you not do?
|
||||
|
||||
- expect your patch to be accepted without question
|
||||
- become defensive
|
||||
- ignore comments
|
||||
- resubmit the patch without making any of the requested changes
|
||||
|
||||
In a community that is looking for the best technical solution possible,
|
||||
there will always be differing opinions on how beneficial a patch is.
|
||||
You have to be cooperative, and willing to adapt your idea to fit within
|
||||
the kernel. Or at least be willing to prove your idea is worth it.
|
||||
Remember, being wrong is acceptable as long as you are willing to work
|
||||
toward a solution that is right.
|
||||
|
||||
It is normal that the answers to your first patch might simply be a list
|
||||
of a dozen things you should correct. This does **not** imply that your
|
||||
patch will not be accepted, and it is **not** meant against you
|
||||
personally. Simply correct all issues raised against your patch and
|
||||
resend it.
|
||||
|
||||
|
||||
Differences between the kernel community and corporate structures
|
||||
-----------------------------------------------------------------
|
||||
|
||||
The kernel community works differently than most traditional corporate
|
||||
development environments. Here are a list of things that you can try to
|
||||
do to avoid problems:
|
||||
|
||||
Good things to say regarding your proposed changes:
|
||||
|
||||
- "This solves multiple problems."
|
||||
- "This deletes 2000 lines of code."
|
||||
- "Here is a patch that explains what I am trying to describe."
|
||||
- "I tested it on 5 different architectures..."
|
||||
- "Here is a series of small patches that..."
|
||||
- "This increases performance on typical machines..."
|
||||
|
||||
Bad things you should avoid saying:
|
||||
|
||||
- "We did it this way in AIX/ptx/Solaris, so therefore it must be
|
||||
good..."
|
||||
- "I've being doing this for 20 years, so..."
|
||||
- "This is required for my company to make money"
|
||||
- "This is for our Enterprise product line."
|
||||
- "Here is my 1000 page design document that describes my idea"
|
||||
- "I've been working on this for 6 months..."
|
||||
- "Here's a 5000 line patch that..."
|
||||
- "I rewrote all of the current mess, and here it is..."
|
||||
- "I have a deadline, and this patch needs to be applied now."
|
||||
|
||||
Another way the kernel community is different than most traditional
|
||||
software engineering work environments is the faceless nature of
|
||||
interaction. One benefit of using email and irc as the primary forms of
|
||||
communication is the lack of discrimination based on gender or race.
|
||||
The Linux kernel work environment is accepting of women and minorities
|
||||
because all you are is an email address. The international aspect also
|
||||
helps to level the playing field because you can't guess gender based on
|
||||
a person's name. A man may be named Andrea and a woman may be named Pat.
|
||||
Most women who have worked in the Linux kernel and have expressed an
|
||||
opinion have had positive experiences.
|
||||
|
||||
The language barrier can cause problems for some people who are not
|
||||
comfortable with English. A good grasp of the language can be needed in
|
||||
order to get ideas across properly on mailing lists, so it is
|
||||
recommended that you check your emails to make sure they make sense in
|
||||
English before sending them.
|
||||
|
||||
|
||||
Break up your changes
|
||||
---------------------
|
||||
|
||||
The Linux kernel community does not gladly accept large chunks of code
|
||||
dropped on it all at once. The changes need to be properly introduced,
|
||||
discussed, and broken up into tiny, individual portions. This is almost
|
||||
the exact opposite of what companies are used to doing. Your proposal
|
||||
should also be introduced very early in the development process, so that
|
||||
you can receive feedback on what you are doing. It also lets the
|
||||
community feel that you are working with them, and not simply using them
|
||||
as a dumping ground for your feature. However, don't send 50 emails at
|
||||
one time to a mailing list, your patch series should be smaller than
|
||||
that almost all of the time.
|
||||
|
||||
The reasons for breaking things up are the following:
|
||||
|
||||
1) Small patches increase the likelihood that your patches will be
|
||||
applied, since they don't take much time or effort to verify for
|
||||
correctness. A 5 line patch can be applied by a maintainer with
|
||||
barely a second glance. However, a 500 line patch may take hours to
|
||||
review for correctness (the time it takes is exponentially
|
||||
proportional to the size of the patch, or something).
|
||||
|
||||
Small patches also make it very easy to debug when something goes
|
||||
wrong. It's much easier to back out patches one by one than it is
|
||||
to dissect a very large patch after it's been applied (and broken
|
||||
something).
|
||||
|
||||
2) It's important not only to send small patches, but also to rewrite
|
||||
and simplify (or simply re-order) patches before submitting them.
|
||||
|
||||
Here is an analogy from kernel developer Al Viro:
|
||||
|
||||
*"Think of a teacher grading homework from a math student. The
|
||||
teacher does not want to see the student's trials and errors
|
||||
before they came up with the solution. They want to see the
|
||||
cleanest, most elegant answer. A good student knows this, and
|
||||
would never submit her intermediate work before the final
|
||||
solution.*
|
||||
|
||||
*The same is true of kernel development. The maintainers and
|
||||
reviewers do not want to see the thought process behind the
|
||||
solution to the problem one is solving. They want to see a
|
||||
simple and elegant solution."*
|
||||
|
||||
It may be challenging to keep the balance between presenting an elegant
|
||||
solution and working together with the community and discussing your
|
||||
unfinished work. Therefore it is good to get early in the process to
|
||||
get feedback to improve your work, but also keep your changes in small
|
||||
chunks that they may get already accepted, even when your whole task is
|
||||
not ready for inclusion now.
|
||||
|
||||
Also realize that it is not acceptable to send patches for inclusion
|
||||
that are unfinished and will be "fixed up later."
|
||||
|
||||
|
||||
Justify your change
|
||||
-------------------
|
||||
|
||||
Along with breaking up your patches, it is very important for you to let
|
||||
the Linux community know why they should add this change. New features
|
||||
must be justified as being needed and useful.
|
||||
|
||||
|
||||
Document your change
|
||||
--------------------
|
||||
|
||||
When sending in your patches, pay special attention to what you say in
|
||||
the text in your email. This information will become the ChangeLog
|
||||
information for the patch, and will be preserved for everyone to see for
|
||||
all time. It should describe the patch completely, containing:
|
||||
|
||||
- why the change is necessary
|
||||
- the overall design approach in the patch
|
||||
- implementation details
|
||||
- testing results
|
||||
|
||||
For more details on what this should all look like, please see the
|
||||
ChangeLog section of the document:
|
||||
|
||||
"The Perfect Patch"
|
||||
http://www.ozlabs.org/~akpm/stuff/tpp.txt
|
||||
|
||||
|
||||
All of these things are sometimes very hard to do. It can take years to
|
||||
perfect these practices (if at all). It's a continuous process of
|
||||
improvement that requires a lot of patience and determination. But
|
||||
don't give up, it's possible. Many have done it before, and each had to
|
||||
start exactly where you are now.
|
||||
|
||||
|
||||
|
||||
|
||||
----------
|
||||
|
||||
Thanks to Paolo Ciarrocchi who allowed the "Development Process"
|
||||
(https://lwn.net/Articles/94386/) section
|
||||
to be based on text he had written, and to Randy Dunlap and Gerrit
|
||||
Huizenga for some of the list of things you should and should not say.
|
||||
Also thanks to Pat Mochel, Hanna Linder, Randy Dunlap, Kay Sievers,
|
||||
Vojtech Pavlik, Jan Kara, Josh Boyer, Kees Cook, Andrew Morton, Andi
|
||||
Kleen, Vadim Lobanov, Jesper Juhl, Adrian Bunk, Keri Harris, Frans Pop,
|
||||
David A. Wheeler, Junio Hamano, Michael Kerrisk, and Alex Shepard for
|
||||
their review, comments, and contributions. Without their help, this
|
||||
document would not have been possible.
|
||||
|
||||
|
||||
|
||||
Maintainer: Greg Kroah-Hartman <greg@kroah.com>
|
||||
@@ -111,6 +111,8 @@ ipmi_ssif - A driver for accessing BMCs on the SMBus. It uses the
|
||||
I2C kernel driver's SMBus interfaces to send and receive IPMI messages
|
||||
over the SMBus.
|
||||
|
||||
ipmi_powernv - A driver for access BMCs on POWERNV systems.
|
||||
|
||||
ipmi_watchdog - IPMI requires systems to have a very capable watchdog
|
||||
timer. This driver implements the standard Linux watchdog timer
|
||||
interface on top of the IPMI message handler.
|
||||
@@ -118,17 +120,15 @@ interface on top of the IPMI message handler.
|
||||
ipmi_poweroff - Some systems support the ability to be turned off via
|
||||
IPMI commands.
|
||||
|
||||
These are all individually selectable via configuration options.
|
||||
bt-bmc - This is not part of the main driver, but instead a driver for
|
||||
accessing a BMC-side interface of a BT interface. It is used on BMCs
|
||||
running Linux to provide an interface to the host.
|
||||
|
||||
Note that the KCS-only interface has been removed. The af_ipmi driver
|
||||
is no longer supported and has been removed because it was impossible
|
||||
to do 32 bit emulation on 64-bit kernels with it.
|
||||
These are all individually selectable via configuration options.
|
||||
|
||||
Much documentation for the interface is in the include files. The
|
||||
IPMI include files are:
|
||||
|
||||
net/af_ipmi.h - Contains the socket interface.
|
||||
|
||||
linux/ipmi.h - Contains the user interface and IOCTL interface for IPMI.
|
||||
|
||||
linux/ipmi_smi.h - Contains the interface for system management interfaces
|
||||
@@ -245,6 +245,16 @@ addressed (because some boards actually have multiple BMCs on them)
|
||||
and the user should not have to care what type of SMI is below them.
|
||||
|
||||
|
||||
Watching For Interfaces
|
||||
|
||||
When your code comes up, the IPMI driver may or may not have detected
|
||||
if IPMI devices exist. So you might have to defer your setup until
|
||||
the device is detected, or you might be able to do it immediately.
|
||||
To handle this, and to allow for discovery, you register an SMI
|
||||
watcher with ipmi_smi_watcher_register() to iterate over interfaces
|
||||
and tell you when they come and go.
|
||||
|
||||
|
||||
Creating the User
|
||||
|
||||
To user the message handler, you must first create a user using
|
||||
@@ -263,7 +273,7 @@ closing the device automatically destroys the user.
|
||||
|
||||
Messaging
|
||||
|
||||
To send a message from kernel-land, the ipmi_request() call does
|
||||
To send a message from kernel-land, the ipmi_request_settime() call does
|
||||
pretty much all message handling. Most of the parameter are
|
||||
self-explanatory. However, it takes a "msgid" parameter. This is NOT
|
||||
the sequence number of messages. It is simply a long value that is
|
||||
@@ -352,11 +362,12 @@ that for more details.
|
||||
The SI Driver
|
||||
-------------
|
||||
|
||||
The SI driver allows up to 4 KCS or SMIC interfaces to be configured
|
||||
in the system. By default, scan the ACPI tables for interfaces, and
|
||||
if it doesn't find any the driver will attempt to register one KCS
|
||||
interface at the spec-specified I/O port 0xca2 without interrupts.
|
||||
You can change this at module load time (for a module) with:
|
||||
The SI driver allows KCS, BT, and SMIC interfaces to be configured
|
||||
in the system. It discovers interfaces through a host of different
|
||||
methods, depending on the system.
|
||||
|
||||
You can specify up to four interfaces on the module load line and
|
||||
control some module parameters:
|
||||
|
||||
modprobe ipmi_si.o type=<type1>,<type2>....
|
||||
ports=<port1>,<port2>... addrs=<addr1>,<addr2>...
|
||||
@@ -367,7 +378,7 @@ You can change this at module load time (for a module) with:
|
||||
force_kipmid=<enable1>,<enable2>,...
|
||||
kipmid_max_busy_us=<ustime1>,<ustime2>,...
|
||||
unload_when_empty=[0|1]
|
||||
trydefaults=[0|1] trydmi=[0|1] tryacpi=[0|1]
|
||||
trydmi=[0|1] tryacpi=[0|1]
|
||||
tryplatform=[0|1] trypci=[0|1]
|
||||
|
||||
Each of these except try... items is a list, the first item for the
|
||||
@@ -386,10 +397,6 @@ use the I/O port given as the device address.
|
||||
If you specify irqs as non-zero for an interface, the driver will
|
||||
attempt to use the given interrupt for the device.
|
||||
|
||||
trydefaults sets whether the standard IPMI interface at 0xca2 and
|
||||
any interfaces specified by ACPE are tried. By default, the driver
|
||||
tries it, set this value to zero to turn this off.
|
||||
|
||||
The other try... items disable discovery by their corresponding
|
||||
names. These are all enabled by default, set them to zero to disable
|
||||
them. The tryplatform disables openfirmware.
|
||||
@@ -434,7 +441,7 @@ kernel command line as:
|
||||
|
||||
ipmi_si.type=<type1>,<type2>...
|
||||
ipmi_si.ports=<port1>,<port2>... ipmi_si.addrs=<addr1>,<addr2>...
|
||||
ipmi_si.irqs=<irq1>,<irq2>... ipmi_si.trydefaults=[0|1]
|
||||
ipmi_si.irqs=<irq1>,<irq2>...
|
||||
ipmi_si.regspacings=<sp1>,<sp2>,...
|
||||
ipmi_si.regsizes=<size1>,<size2>,...
|
||||
ipmi_si.regshifts=<shift1>,<shift2>,...
|
||||
@@ -444,11 +451,6 @@ kernel command line as:
|
||||
|
||||
It works the same as the module parameters of the same names.
|
||||
|
||||
By default, the driver will attempt to detect any device specified by
|
||||
ACPI, and if none of those then a KCS device at the spec-specified
|
||||
0xca2. If you want to turn this off, set the "trydefaults" option to
|
||||
false.
|
||||
|
||||
If your IPMI interface does not support interrupts and is a KCS or
|
||||
SMIC interface, the IPMI driver will start a kernel thread for the
|
||||
interface to help speed things up. This is a low-priority kernel
|
||||
@@ -500,7 +502,8 @@ at module load time (for a module) with:
|
||||
addr=<i2caddr1>[,<i2caddr2>[,...]]
|
||||
adapter=<adapter1>[,<adapter2>[...]]
|
||||
dbg=<flags1>,<flags2>...
|
||||
slave_addrs=<addr1>,<addr2>,...
|
||||
slave_addrs=<addr1>,<addr2>,...
|
||||
tryacpi=[0|1] trydmi=[0|1]
|
||||
[dbg_probe=1]
|
||||
|
||||
The addresses are normal I2C addresses. The adapter is the string
|
||||
@@ -513,6 +516,9 @@ spaces in kernel parameters.
|
||||
The debug flags are bit flags for each BMC found, they are:
|
||||
IPMI messages: 1, driver state: 2, timing: 4, I2C probe: 8
|
||||
|
||||
The tryxxx parameters can be used to disable detecting interfaces
|
||||
from various sources.
|
||||
|
||||
Setting dbg_probe to 1 will enable debugging of the probing and
|
||||
detection process for BMCs on the SMBusses.
|
||||
|
||||
@@ -535,7 +541,8 @@ kernel command line as:
|
||||
ipmi_ssif.adapter=<adapter1>[,<adapter2>[...]]
|
||||
ipmi_ssif.dbg=<flags1>[,<flags2>[...]]
|
||||
ipmi_ssif.dbg_probe=1
|
||||
ipmi_ssif.slave_addrs=<addr1>[,<addr2>[...]]
|
||||
ipmi_ssif.slave_addrs=<addr1>[,<addr2>[...]]
|
||||
ipmi_ssif.tryacpi=[0|1] ipmi_ssif.trydmi=[0|1]
|
||||
|
||||
These are the same options as on the module command line.
|
||||
|
||||
|
||||
@@ -10,6 +10,8 @@ _SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/conf.py,%,$(wildcard $(src
|
||||
SPHINX_CONF = conf.py
|
||||
PAPER =
|
||||
BUILDDIR = $(obj)/output
|
||||
PDFLATEX = xelatex
|
||||
LATEXOPTS = -interaction=batchmode
|
||||
|
||||
# User-friendly check for sphinx-build
|
||||
HAVE_SPHINX := $(shell if which $(SPHINXBUILD) >/dev/null 2>&1; then echo 1; else echo 0; fi)
|
||||
@@ -29,7 +31,7 @@ else ifneq ($(DOCBOOKS),)
|
||||
else # HAVE_SPHINX
|
||||
|
||||
# User-friendly check for pdflatex
|
||||
HAVE_PDFLATEX := $(shell if which xelatex >/dev/null 2>&1; then echo 1; else echo 0; fi)
|
||||
HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
|
||||
|
||||
# Internal variables.
|
||||
PAPEROPT_a4 = -D latex_paper_size=a4
|
||||
@@ -51,8 +53,8 @@ loop_cmd = $(echo-cmd) $(cmd_$(1))
|
||||
# $5 reST source folder relative to $(srctree)/$(src),
|
||||
# e.g. "media" for the linux-tv book-set at ./Documentation/media
|
||||
|
||||
quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4);
|
||||
cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media all;\
|
||||
quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
|
||||
cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media $2;\
|
||||
BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \
|
||||
$(SPHINXBUILD) \
|
||||
-b $2 \
|
||||
@@ -67,16 +69,19 @@ htmldocs:
|
||||
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
|
||||
|
||||
latexdocs:
|
||||
ifeq ($(HAVE_PDFLATEX),0)
|
||||
$(warning The 'xelatex' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
|
||||
@echo " SKIP Sphinx $@ target."
|
||||
else # HAVE_PDFLATEX
|
||||
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
|
||||
endif # HAVE_PDFLATEX
|
||||
|
||||
ifeq ($(HAVE_PDFLATEX),0)
|
||||
|
||||
pdfdocs:
|
||||
$(warning The '$(PDFLATEX)' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
|
||||
@echo " SKIP Sphinx $@ target."
|
||||
|
||||
else # HAVE_PDFLATEX
|
||||
|
||||
pdfdocs: latexdocs
|
||||
ifneq ($(HAVE_PDFLATEX),0)
|
||||
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=xelatex LATEXOPTS="-interaction=nonstopmode" -C $(BUILDDIR)/$(var)/latex)
|
||||
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=$(PDFLATEX) LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex;)
|
||||
|
||||
endif # HAVE_PDFLATEX
|
||||
|
||||
epubdocs:
|
||||
@@ -93,6 +98,7 @@ installmandocs:
|
||||
|
||||
cleandocs:
|
||||
$(Q)rm -rf $(BUILDDIR)
|
||||
$(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) -C Documentation/media clean
|
||||
|
||||
endif # HAVE_SPHINX
|
||||
|
||||
|
||||
@@ -1,288 +0,0 @@
|
||||
.. _managementstyle:
|
||||
|
||||
Linux kernel management style
|
||||
=============================
|
||||
|
||||
This is a short document describing the preferred (or made up, depending
|
||||
on who you ask) management style for the linux kernel. It's meant to
|
||||
mirror the CodingStyle document to some degree, and mainly written to
|
||||
avoid answering [#f1]_ the same (or similar) questions over and over again.
|
||||
|
||||
Management style is very personal and much harder to quantify than
|
||||
simple coding style rules, so this document may or may not have anything
|
||||
to do with reality. It started as a lark, but that doesn't mean that it
|
||||
might not actually be true. You'll have to decide for yourself.
|
||||
|
||||
Btw, when talking about "kernel manager", it's all about the technical
|
||||
lead persons, not the people who do traditional management inside
|
||||
companies. If you sign purchase orders or you have any clue about the
|
||||
budget of your group, you're almost certainly not a kernel manager.
|
||||
These suggestions may or may not apply to you.
|
||||
|
||||
First off, I'd suggest buying "Seven Habits of Highly Effective
|
||||
People", and NOT read it. Burn it, it's a great symbolic gesture.
|
||||
|
||||
.. [#f1] This document does so not so much by answering the question, but by
|
||||
making it painfully obvious to the questioner that we don't have a clue
|
||||
to what the answer is.
|
||||
|
||||
Anyway, here goes:
|
||||
|
||||
.. _decisions:
|
||||
|
||||
1) Decisions
|
||||
------------
|
||||
|
||||
Everybody thinks managers make decisions, and that decision-making is
|
||||
important. The bigger and more painful the decision, the bigger the
|
||||
manager must be to make it. That's very deep and obvious, but it's not
|
||||
actually true.
|
||||
|
||||
The name of the game is to **avoid** having to make a decision. In
|
||||
particular, if somebody tells you "choose (a) or (b), we really need you
|
||||
to decide on this", you're in trouble as a manager. The people you
|
||||
manage had better know the details better than you, so if they come to
|
||||
you for a technical decision, you're screwed. You're clearly not
|
||||
competent to make that decision for them.
|
||||
|
||||
(Corollary:if the people you manage don't know the details better than
|
||||
you, you're also screwed, although for a totally different reason.
|
||||
Namely that you are in the wrong job, and that **they** should be managing
|
||||
your brilliance instead).
|
||||
|
||||
So the name of the game is to **avoid** decisions, at least the big and
|
||||
painful ones. Making small and non-consequential decisions is fine, and
|
||||
makes you look like you know what you're doing, so what a kernel manager
|
||||
needs to do is to turn the big and painful ones into small things where
|
||||
nobody really cares.
|
||||
|
||||
It helps to realize that the key difference between a big decision and a
|
||||
small one is whether you can fix your decision afterwards. Any decision
|
||||
can be made small by just always making sure that if you were wrong (and
|
||||
you **will** be wrong), you can always undo the damage later by
|
||||
backtracking. Suddenly, you get to be doubly managerial for making
|
||||
**two** inconsequential decisions - the wrong one **and** the right one.
|
||||
|
||||
And people will even see that as true leadership (*cough* bullshit
|
||||
*cough*).
|
||||
|
||||
Thus the key to avoiding big decisions becomes to just avoiding to do
|
||||
things that can't be undone. Don't get ushered into a corner from which
|
||||
you cannot escape. A cornered rat may be dangerous - a cornered manager
|
||||
is just pitiful.
|
||||
|
||||
It turns out that since nobody would be stupid enough to ever really let
|
||||
a kernel manager have huge fiscal responsibility **anyway**, it's usually
|
||||
fairly easy to backtrack. Since you're not going to be able to waste
|
||||
huge amounts of money that you might not be able to repay, the only
|
||||
thing you can backtrack on is a technical decision, and there
|
||||
back-tracking is very easy: just tell everybody that you were an
|
||||
incompetent nincompoop, say you're sorry, and undo all the worthless
|
||||
work you had people work on for the last year. Suddenly the decision
|
||||
you made a year ago wasn't a big decision after all, since it could be
|
||||
easily undone.
|
||||
|
||||
It turns out that some people have trouble with this approach, for two
|
||||
reasons:
|
||||
|
||||
- admitting you were an idiot is harder than it looks. We all like to
|
||||
maintain appearances, and coming out in public to say that you were
|
||||
wrong is sometimes very hard indeed.
|
||||
- having somebody tell you that what you worked on for the last year
|
||||
wasn't worthwhile after all can be hard on the poor lowly engineers
|
||||
too, and while the actual **work** was easy enough to undo by just
|
||||
deleting it, you may have irrevocably lost the trust of that
|
||||
engineer. And remember: "irrevocable" was what we tried to avoid in
|
||||
the first place, and your decision ended up being a big one after
|
||||
all.
|
||||
|
||||
Happily, both of these reasons can be mitigated effectively by just
|
||||
admitting up-front that you don't have a friggin' clue, and telling
|
||||
people ahead of the fact that your decision is purely preliminary, and
|
||||
might be the wrong thing. You should always reserve the right to change
|
||||
your mind, and make people very **aware** of that. And it's much easier
|
||||
to admit that you are stupid when you haven't **yet** done the really
|
||||
stupid thing.
|
||||
|
||||
Then, when it really does turn out to be stupid, people just roll their
|
||||
eyes and say "Oops, he did it again".
|
||||
|
||||
This preemptive admission of incompetence might also make the people who
|
||||
actually do the work also think twice about whether it's worth doing or
|
||||
not. After all, if **they** aren't certain whether it's a good idea, you
|
||||
sure as hell shouldn't encourage them by promising them that what they
|
||||
work on will be included. Make them at least think twice before they
|
||||
embark on a big endeavor.
|
||||
|
||||
Remember: they'd better know more about the details than you do, and
|
||||
they usually already think they have the answer to everything. The best
|
||||
thing you can do as a manager is not to instill confidence, but rather a
|
||||
healthy dose of critical thinking on what they do.
|
||||
|
||||
Btw, another way to avoid a decision is to plaintively just whine "can't
|
||||
we just do both?" and look pitiful. Trust me, it works. If it's not
|
||||
clear which approach is better, they'll eventually figure it out. The
|
||||
answer may end up being that both teams get so frustrated by the
|
||||
situation that they just give up.
|
||||
|
||||
That may sound like a failure, but it's usually a sign that there was
|
||||
something wrong with both projects, and the reason the people involved
|
||||
couldn't decide was that they were both wrong. You end up coming up
|
||||
smelling like roses, and you avoided yet another decision that you could
|
||||
have screwed up on.
|
||||
|
||||
|
||||
2) People
|
||||
---------
|
||||
|
||||
Most people are idiots, and being a manager means you'll have to deal
|
||||
with it, and perhaps more importantly, that **they** have to deal with
|
||||
**you**.
|
||||
|
||||
It turns out that while it's easy to undo technical mistakes, it's not
|
||||
as easy to undo personality disorders. You just have to live with
|
||||
theirs - and yours.
|
||||
|
||||
However, in order to prepare yourself as a kernel manager, it's best to
|
||||
remember not to burn any bridges, bomb any innocent villagers, or
|
||||
alienate too many kernel developers. It turns out that alienating people
|
||||
is fairly easy, and un-alienating them is hard. Thus "alienating"
|
||||
immediately falls under the heading of "not reversible", and becomes a
|
||||
no-no according to :ref:`decisions`.
|
||||
|
||||
There's just a few simple rules here:
|
||||
|
||||
(1) don't call people d*ckheads (at least not in public)
|
||||
(2) learn how to apologize when you forgot rule (1)
|
||||
|
||||
The problem with #1 is that it's very easy to do, since you can say
|
||||
"you're a d*ckhead" in millions of different ways [#f2]_, sometimes without
|
||||
even realizing it, and almost always with a white-hot conviction that
|
||||
you are right.
|
||||
|
||||
And the more convinced you are that you are right (and let's face it,
|
||||
you can call just about **anybody** a d*ckhead, and you often **will** be
|
||||
right), the harder it ends up being to apologize afterwards.
|
||||
|
||||
To solve this problem, you really only have two options:
|
||||
|
||||
- get really good at apologies
|
||||
- spread the "love" out so evenly that nobody really ends up feeling
|
||||
like they get unfairly targeted. Make it inventive enough, and they
|
||||
might even be amused.
|
||||
|
||||
The option of being unfailingly polite really doesn't exist. Nobody will
|
||||
trust somebody who is so clearly hiding his true character.
|
||||
|
||||
.. [#f2] Paul Simon sang "Fifty Ways to Leave Your Lover", because quite
|
||||
frankly, "A Million Ways to Tell a Developer He Is a D*ckhead" doesn't
|
||||
scan nearly as well. But I'm sure he thought about it.
|
||||
|
||||
|
||||
3) People II - the Good Kind
|
||||
----------------------------
|
||||
|
||||
While it turns out that most people are idiots, the corollary to that is
|
||||
sadly that you are one too, and that while we can all bask in the secure
|
||||
knowledge that we're better than the average person (let's face it,
|
||||
nobody ever believes that they're average or below-average), we should
|
||||
also admit that we're not the sharpest knife around, and there will be
|
||||
other people that are less of an idiot than you are.
|
||||
|
||||
Some people react badly to smart people. Others take advantage of them.
|
||||
|
||||
Make sure that you, as a kernel maintainer, are in the second group.
|
||||
Suck up to them, because they are the people who will make your job
|
||||
easier. In particular, they'll be able to make your decisions for you,
|
||||
which is what the game is all about.
|
||||
|
||||
So when you find somebody smarter than you are, just coast along. Your
|
||||
management responsibilities largely become ones of saying "Sounds like a
|
||||
good idea - go wild", or "That sounds good, but what about xxx?". The
|
||||
second version in particular is a great way to either learn something
|
||||
new about "xxx" or seem **extra** managerial by pointing out something the
|
||||
smarter person hadn't thought about. In either case, you win.
|
||||
|
||||
One thing to look out for is to realize that greatness in one area does
|
||||
not necessarily translate to other areas. So you might prod people in
|
||||
specific directions, but let's face it, they might be good at what they
|
||||
do, and suck at everything else. The good news is that people tend to
|
||||
naturally gravitate back to what they are good at, so it's not like you
|
||||
are doing something irreversible when you **do** prod them in some
|
||||
direction, just don't push too hard.
|
||||
|
||||
|
||||
4) Placing blame
|
||||
----------------
|
||||
|
||||
Things will go wrong, and people want somebody to blame. Tag, you're it.
|
||||
|
||||
It's not actually that hard to accept the blame, especially if people
|
||||
kind of realize that it wasn't **all** your fault. Which brings us to the
|
||||
best way of taking the blame: do it for another guy. You'll feel good
|
||||
for taking the fall, he'll feel good about not getting blamed, and the
|
||||
guy who lost his whole 36GB porn-collection because of your incompetence
|
||||
will grudgingly admit that you at least didn't try to weasel out of it.
|
||||
|
||||
Then make the developer who really screwed up (if you can find him) know
|
||||
**in_private** that he screwed up. Not just so he can avoid it in the
|
||||
future, but so that he knows he owes you one. And, perhaps even more
|
||||
importantly, he's also likely the person who can fix it. Because, let's
|
||||
face it, it sure ain't you.
|
||||
|
||||
Taking the blame is also why you get to be manager in the first place.
|
||||
It's part of what makes people trust you, and allow you the potential
|
||||
glory, because you're the one who gets to say "I screwed up". And if
|
||||
you've followed the previous rules, you'll be pretty good at saying that
|
||||
by now.
|
||||
|
||||
|
||||
5) Things to avoid
|
||||
------------------
|
||||
|
||||
There's one thing people hate even more than being called "d*ckhead",
|
||||
and that is being called a "d*ckhead" in a sanctimonious voice. The
|
||||
first you can apologize for, the second one you won't really get the
|
||||
chance. They likely will no longer be listening even if you otherwise
|
||||
do a good job.
|
||||
|
||||
We all think we're better than anybody else, which means that when
|
||||
somebody else puts on airs, it **really** rubs us the wrong way. You may
|
||||
be morally and intellectually superior to everybody around you, but
|
||||
don't try to make it too obvious unless you really **intend** to irritate
|
||||
somebody [#f3]_.
|
||||
|
||||
Similarly, don't be too polite or subtle about things. Politeness easily
|
||||
ends up going overboard and hiding the problem, and as they say, "On the
|
||||
internet, nobody can hear you being subtle". Use a big blunt object to
|
||||
hammer the point in, because you can't really depend on people getting
|
||||
your point otherwise.
|
||||
|
||||
Some humor can help pad both the bluntness and the moralizing. Going
|
||||
overboard to the point of being ridiculous can drive a point home
|
||||
without making it painful to the recipient, who just thinks you're being
|
||||
silly. It can thus help get through the personal mental block we all
|
||||
have about criticism.
|
||||
|
||||
.. [#f3] Hint: internet newsgroups that are not directly related to your work
|
||||
are great ways to take out your frustrations at other people. Write
|
||||
insulting posts with a sneer just to get into a good flame every once in
|
||||
a while, and you'll feel cleansed. Just don't crap too close to home.
|
||||
|
||||
|
||||
6) Why me?
|
||||
----------
|
||||
|
||||
Since your main responsibility seems to be to take the blame for other
|
||||
peoples mistakes, and make it painfully obvious to everybody else that
|
||||
you're incompetent, the obvious question becomes one of why do it in the
|
||||
first place?
|
||||
|
||||
First off, while you may or may not get screaming teenage girls (or
|
||||
boys, let's not be judgmental or sexist here) knocking on your dressing
|
||||
room door, you **will** get an immense feeling of personal accomplishment
|
||||
for being "in charge". Never mind the fact that you're really leading
|
||||
by trying to keep up with everybody else and running after them as fast
|
||||
as you can. Everybody will still think you're the person in charge.
|
||||
|
||||
It's a great job if you can hack it.
|
||||
@@ -547,7 +547,7 @@ The <tt>rcu_access_pointer()</tt> on line 6 is similar to
|
||||
It could reuse a value formerly fetched from this same pointer.
|
||||
It could also fetch the pointer from <tt>gp</tt> in a byte-at-a-time
|
||||
manner, resulting in <i>load tearing</i>, in turn resulting a bytewise
|
||||
mash-up of two distince pointer values.
|
||||
mash-up of two distinct pointer values.
|
||||
It might even use value-speculation optimizations, where it makes
|
||||
a wrong guess, but by the time it gets around to checking the
|
||||
value, an update has changed the pointer to match the wrong guess.
|
||||
@@ -659,6 +659,29 @@ systems with more than one CPU:
|
||||
In other words, a given instance of <tt>synchronize_rcu()</tt>
|
||||
can avoid waiting on a given RCU read-side critical section only
|
||||
if it can prove that <tt>synchronize_rcu()</tt> started first.
|
||||
|
||||
<p>
|
||||
A related question is “When <tt>rcu_read_lock()</tt>
|
||||
doesn't generate any code, why does it matter how it relates
|
||||
to a grace period?”
|
||||
The answer is that it is not the relationship of
|
||||
<tt>rcu_read_lock()</tt> itself that is important, but rather
|
||||
the relationship of the code within the enclosed RCU read-side
|
||||
critical section to the code preceding and following the
|
||||
grace period.
|
||||
If we take this viewpoint, then a given RCU read-side critical
|
||||
section begins before a given grace period when some access
|
||||
preceding the grace period observes the effect of some access
|
||||
within the critical section, in which case none of the accesses
|
||||
within the critical section may observe the effects of any
|
||||
access following the grace period.
|
||||
|
||||
<p>
|
||||
As of late 2016, mathematical models of RCU take this
|
||||
viewpoint, for example, see slides 62 and 63
|
||||
of the
|
||||
<a href="http://www2.rdrop.com/users/paulmck/scalability/paper/LinuxMM.2016.10.04c.LCE.pdf">2016 LinuxCon EU</a>
|
||||
presentation.
|
||||
</font></td></tr>
|
||||
<tr><td> </td></tr>
|
||||
</table>
|
||||
|
||||
@@ -237,7 +237,7 @@ rcu_dereference()
|
||||
|
||||
The reader uses rcu_dereference() to fetch an RCU-protected
|
||||
pointer, which returns a value that may then be safely
|
||||
dereferenced. Note that rcu_deference() does not actually
|
||||
dereferenced. Note that rcu_dereference() does not actually
|
||||
dereference the pointer, instead, it protects the pointer for
|
||||
later dereferencing. It also executes any needed memory-barrier
|
||||
instructions for a given CPU architecture. Currently, only Alpha
|
||||
|
||||
@@ -1,46 +0,0 @@
|
||||
.. _securitybugs:
|
||||
|
||||
Security bugs
|
||||
=============
|
||||
|
||||
Linux kernel developers take security very seriously. As such, we'd
|
||||
like to know when a security bug is found so that it can be fixed and
|
||||
disclosed as quickly as possible. Please report security bugs to the
|
||||
Linux kernel security team.
|
||||
|
||||
1) Contact
|
||||
----------
|
||||
|
||||
The Linux kernel security team can be contacted by email at
|
||||
<security@kernel.org>. This is a private list of security officers
|
||||
who will help verify the bug report and develop and release a fix.
|
||||
It is possible that the security team will bring in extra help from
|
||||
area maintainers to understand and fix the security vulnerability.
|
||||
|
||||
As it is with any bug, the more information provided the easier it
|
||||
will be to diagnose and fix. Please review the procedure outlined in
|
||||
REPORTING-BUGS if you are unclear about what information is helpful.
|
||||
Any exploit code is very helpful and will not be released without
|
||||
consent from the reporter unless it has already been made public.
|
||||
|
||||
2) Disclosure
|
||||
-------------
|
||||
|
||||
The goal of the Linux kernel security team is to work with the
|
||||
bug submitter to bug resolution as well as disclosure. We prefer
|
||||
to fully disclose the bug as soon as possible. It is reasonable to
|
||||
delay disclosure when the bug or the fix is not yet fully understood,
|
||||
the solution is not well-tested or for vendor coordination. However, we
|
||||
expect these delays to be short, measurable in days, not weeks or months.
|
||||
A disclosure date is negotiated by the security team working with the
|
||||
bug submitter as well as vendors. However, the kernel security team
|
||||
holds the final say when setting a disclosure date. The timeframe for
|
||||
disclosure is from immediate (esp. if it's already publicly known)
|
||||
to a few weeks. As a basic default policy, we expect report date to
|
||||
disclosure date to be on the order of 7 days.
|
||||
|
||||
3) Non-disclosure agreements
|
||||
----------------------------
|
||||
|
||||
The Linux kernel security team is not a formal body and therefore unable
|
||||
to enter any non-disclosure agreements.
|
||||
@@ -1,120 +0,0 @@
|
||||
.. _submitchecklist:
|
||||
|
||||
Linux Kernel patch submission checklist
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Here are some basic things that developers should do if they want to see their
|
||||
kernel patch submissions accepted more quickly.
|
||||
|
||||
These are all above and beyond the documentation that is provided in
|
||||
:ref:`Documentation/SubmittingPatches <submittingpatches>`
|
||||
and elsewhere regarding submitting Linux kernel patches.
|
||||
|
||||
|
||||
1) If you use a facility then #include the file that defines/declares
|
||||
that facility. Don't depend on other header files pulling in ones
|
||||
that you use.
|
||||
|
||||
2) Builds cleanly:
|
||||
|
||||
a) with applicable or modified ``CONFIG`` options ``=y``, ``=m``, and
|
||||
``=n``. No ``gcc`` warnings/errors, no linker warnings/errors.
|
||||
|
||||
b) Passes ``allnoconfig``, ``allmodconfig``
|
||||
|
||||
c) Builds successfully when using ``O=builddir``
|
||||
|
||||
3) Builds on multiple CPU architectures by using local cross-compile tools
|
||||
or some other build farm.
|
||||
|
||||
4) ppc64 is a good architecture for cross-compilation checking because it
|
||||
tends to use ``unsigned long`` for 64-bit quantities.
|
||||
|
||||
5) Check your patch for general style as detailed in
|
||||
:ref:`Documentation/CodingStyle <codingstyle>`.
|
||||
Check for trivial violations with the patch style checker prior to
|
||||
submission (``scripts/checkpatch.pl``).
|
||||
You should be able to justify all violations that remain in
|
||||
your patch.
|
||||
|
||||
6) Any new or modified ``CONFIG`` options don't muck up the config menu.
|
||||
|
||||
7) All new ``Kconfig`` options have help text.
|
||||
|
||||
8) Has been carefully reviewed with respect to relevant ``Kconfig``
|
||||
combinations. This is very hard to get right with testing -- brainpower
|
||||
pays off here.
|
||||
|
||||
9) Check cleanly with sparse.
|
||||
|
||||
10) Use ``make checkstack`` and ``make namespacecheck`` and fix any problems
|
||||
that they find.
|
||||
|
||||
.. note::
|
||||
|
||||
``checkstack`` does not point out problems explicitly,
|
||||
but any one function that uses more than 512 bytes on the stack is a
|
||||
candidate for change.
|
||||
|
||||
11) Include :ref:`kernel-doc <kernel_doc>` to document global kernel APIs.
|
||||
(Not required for static functions, but OK there also.) Use
|
||||
``make htmldocs`` or ``make pdfdocs`` to check the
|
||||
:ref:`kernel-doc <kernel_doc>` and fix any issues.
|
||||
|
||||
12) Has been tested with ``CONFIG_PREEMPT``, ``CONFIG_DEBUG_PREEMPT``,
|
||||
``CONFIG_DEBUG_SLAB``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``,
|
||||
``CONFIG_DEBUG_SPINLOCK``, ``CONFIG_DEBUG_ATOMIC_SLEEP``,
|
||||
``CONFIG_PROVE_RCU`` and ``CONFIG_DEBUG_OBJECTS_RCU_HEAD`` all
|
||||
simultaneously enabled.
|
||||
|
||||
13) Has been build- and runtime tested with and without ``CONFIG_SMP`` and
|
||||
``CONFIG_PREEMPT.``
|
||||
|
||||
14) If the patch affects IO/Disk, etc: has been tested with and without
|
||||
``CONFIG_LBDAF.``
|
||||
|
||||
15) All codepaths have been exercised with all lockdep features enabled.
|
||||
|
||||
16) All new ``/proc`` entries are documented under ``Documentation/``
|
||||
|
||||
17) All new kernel boot parameters are documented in
|
||||
``Documentation/kernel-parameters.txt``.
|
||||
|
||||
18) All new module parameters are documented with ``MODULE_PARM_DESC()``
|
||||
|
||||
19) All new userspace interfaces are documented in ``Documentation/ABI/``.
|
||||
See ``Documentation/ABI/README`` for more information.
|
||||
Patches that change userspace interfaces should be CCed to
|
||||
linux-api@vger.kernel.org.
|
||||
|
||||
20) Check that it all passes ``make headers_check``.
|
||||
|
||||
21) Has been checked with injection of at least slab and page-allocation
|
||||
failures. See ``Documentation/fault-injection/``.
|
||||
|
||||
If the new code is substantial, addition of subsystem-specific fault
|
||||
injection might be appropriate.
|
||||
|
||||
22) Newly-added code has been compiled with ``gcc -W`` (use
|
||||
``make EXTRA_CFLAGS=-W``). This will generate lots of noise, but is good
|
||||
for finding bugs like "warning: comparison between signed and unsigned".
|
||||
|
||||
23) Tested after it has been merged into the -mm patchset to make sure
|
||||
that it still works with all of the other queued patches and various
|
||||
changes in the VM, VFS, and other subsystems.
|
||||
|
||||
24) All memory barriers {e.g., ``barrier()``, ``rmb()``, ``wmb()``} need a
|
||||
comment in the source code that explains the logic of what they are doing
|
||||
and why.
|
||||
|
||||
25) If any ioctl's are added by the patch, then also update
|
||||
``Documentation/ioctl/ioctl-number.txt``.
|
||||
|
||||
26) If your modified source code depends on or uses any of the kernel
|
||||
APIs or features that are related to the following ``Kconfig`` symbols,
|
||||
then test multiple builds with the related ``Kconfig`` symbols disabled
|
||||
and/or ``=m`` (if that option is available) [not all of these at the
|
||||
same time, just various/random combinations of them]:
|
||||
|
||||
``CONFIG_SMP``, ``CONFIG_SYSFS``, ``CONFIG_PROC_FS``, ``CONFIG_INPUT``, ``CONFIG_PCI``, ``CONFIG_BLOCK``, ``CONFIG_PM``, ``CONFIG_MAGIC_SYSRQ``,
|
||||
``CONFIG_NET``, ``CONFIG_INET=n`` (but latter with ``CONFIG_NET=y``).
|
||||
@@ -1,183 +0,0 @@
|
||||
.. _submittingdrivers:
|
||||
|
||||
Submitting Drivers For The Linux Kernel
|
||||
=======================================
|
||||
|
||||
This document is intended to explain how to submit device drivers to the
|
||||
various kernel trees. Note that if you are interested in video card drivers
|
||||
you should probably talk to XFree86 (http://www.xfree86.org/) and/or X.Org
|
||||
(http://x.org/) instead.
|
||||
|
||||
Also read the Documentation/SubmittingPatches document.
|
||||
|
||||
|
||||
Allocating Device Numbers
|
||||
-------------------------
|
||||
|
||||
Major and minor numbers for block and character devices are allocated
|
||||
by the Linux assigned name and number authority (currently this is
|
||||
Torben Mathiasen). The site is http://www.lanana.org/. This
|
||||
also deals with allocating numbers for devices that are not going to
|
||||
be submitted to the mainstream kernel.
|
||||
See Documentation/devices.txt for more information on this.
|
||||
|
||||
If you don't use assigned numbers then when your device is submitted it will
|
||||
be given an assigned number even if that is different from values you may
|
||||
have shipped to customers before.
|
||||
|
||||
Who To Submit Drivers To
|
||||
------------------------
|
||||
|
||||
Linux 2.0:
|
||||
No new drivers are accepted for this kernel tree.
|
||||
|
||||
Linux 2.2:
|
||||
No new drivers are accepted for this kernel tree.
|
||||
|
||||
Linux 2.4:
|
||||
If the code area has a general maintainer then please submit it to
|
||||
the maintainer listed in MAINTAINERS in the kernel file. If the
|
||||
maintainer does not respond or you cannot find the appropriate
|
||||
maintainer then please contact Willy Tarreau <w@1wt.eu>.
|
||||
|
||||
Linux 2.6 and upper:
|
||||
The same rules apply as 2.4 except that you should follow linux-kernel
|
||||
to track changes in API's. The final contact point for Linux 2.6+
|
||||
submissions is Andrew Morton.
|
||||
|
||||
What Criteria Determine Acceptance
|
||||
----------------------------------
|
||||
|
||||
Licensing:
|
||||
The code must be released to us under the
|
||||
GNU General Public License. We don't insist on any kind
|
||||
of exclusive GPL licensing, and if you wish the driver
|
||||
to be useful to other communities such as BSD you may well
|
||||
wish to release under multiple licenses.
|
||||
See accepted licenses at include/linux/module.h
|
||||
|
||||
Copyright:
|
||||
The copyright owner must agree to use of GPL.
|
||||
It's best if the submitter and copyright owner
|
||||
are the same person/entity. If not, the name of
|
||||
the person/entity authorizing use of GPL should be
|
||||
listed in case it's necessary to verify the will of
|
||||
the copyright owner.
|
||||
|
||||
Interfaces:
|
||||
If your driver uses existing interfaces and behaves like
|
||||
other drivers in the same class it will be much more likely
|
||||
to be accepted than if it invents gratuitous new ones.
|
||||
If you need to implement a common API over Linux and NT
|
||||
drivers do it in userspace.
|
||||
|
||||
Code:
|
||||
Please use the Linux style of code formatting as documented
|
||||
in :ref:`Documentation/CodingStyle <codingStyle>`.
|
||||
If you have sections of code
|
||||
that need to be in other formats, for example because they
|
||||
are shared with a windows driver kit and you want to
|
||||
maintain them just once separate them out nicely and note
|
||||
this fact.
|
||||
|
||||
Portability:
|
||||
Pointers are not always 32bits, not all computers are little
|
||||
endian, people do not all have floating point and you
|
||||
shouldn't use inline x86 assembler in your driver without
|
||||
careful thought. Pure x86 drivers generally are not popular.
|
||||
If you only have x86 hardware it is hard to test portability
|
||||
but it is easy to make sure the code can easily be made
|
||||
portable.
|
||||
|
||||
Clarity:
|
||||
It helps if anyone can see how to fix the driver. It helps
|
||||
you because you get patches not bug reports. If you submit a
|
||||
driver that intentionally obfuscates how the hardware works
|
||||
it will go in the bitbucket.
|
||||
|
||||
PM support:
|
||||
Since Linux is used on many portable and desktop systems, your
|
||||
driver is likely to be used on such a system and therefore it
|
||||
should support basic power management by implementing, if
|
||||
necessary, the .suspend and .resume methods used during the
|
||||
system-wide suspend and resume transitions. You should verify
|
||||
that your driver correctly handles the suspend and resume, but
|
||||
if you are unable to ensure that, please at least define the
|
||||
.suspend method returning the -ENOSYS ("Function not
|
||||
implemented") error. You should also try to make sure that your
|
||||
driver uses as little power as possible when it's not doing
|
||||
anything. For the driver testing instructions see
|
||||
Documentation/power/drivers-testing.txt and for a relatively
|
||||
complete overview of the power management issues related to
|
||||
drivers see Documentation/power/devices.txt .
|
||||
|
||||
Control:
|
||||
In general if there is active maintenance of a driver by
|
||||
the author then patches will be redirected to them unless
|
||||
they are totally obvious and without need of checking.
|
||||
If you want to be the contact and update point for the
|
||||
driver it is a good idea to state this in the comments,
|
||||
and include an entry in MAINTAINERS for your driver.
|
||||
|
||||
What Criteria Do Not Determine Acceptance
|
||||
-----------------------------------------
|
||||
|
||||
Vendor:
|
||||
Being the hardware vendor and maintaining the driver is
|
||||
often a good thing. If there is a stable working driver from
|
||||
other people already in the tree don't expect 'we are the
|
||||
vendor' to get your driver chosen. Ideally work with the
|
||||
existing driver author to build a single perfect driver.
|
||||
|
||||
Author:
|
||||
It doesn't matter if a large Linux company wrote the driver,
|
||||
or you did. Nobody has any special access to the kernel
|
||||
tree. Anyone who tells you otherwise isn't telling the
|
||||
whole story.
|
||||
|
||||
|
||||
Resources
|
||||
---------
|
||||
|
||||
Linux kernel master tree:
|
||||
ftp.\ *country_code*\ .kernel.org:/pub/linux/kernel/...
|
||||
|
||||
where *country_code* == your country code, such as
|
||||
**us**, **uk**, **fr**, etc.
|
||||
|
||||
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git
|
||||
|
||||
Linux kernel mailing list:
|
||||
linux-kernel@vger.kernel.org
|
||||
[mail majordomo@vger.kernel.org to subscribe]
|
||||
|
||||
Linux Device Drivers, Third Edition (covers 2.6.10):
|
||||
http://lwn.net/Kernel/LDD3/ (free version)
|
||||
|
||||
LWN.net:
|
||||
Weekly summary of kernel development activity - http://lwn.net/
|
||||
|
||||
2.6 API changes:
|
||||
|
||||
http://lwn.net/Articles/2.6-kernel-api/
|
||||
|
||||
Porting drivers from prior kernels to 2.6:
|
||||
|
||||
http://lwn.net/Articles/driver-porting/
|
||||
|
||||
KernelNewbies:
|
||||
Documentation and assistance for new kernel programmers
|
||||
|
||||
http://kernelnewbies.org/
|
||||
|
||||
Linux USB project:
|
||||
http://www.linux-usb.org/
|
||||
|
||||
How to NOT write kernel driver by Arjan van de Ven:
|
||||
http://www.fenrus.org/how-to-not-write-a-device-driver-paper.pdf
|
||||
|
||||
Kernel Janitor:
|
||||
http://kernelnewbies.org/KernelJanitors
|
||||
|
||||
GIT, Fast Version Control System:
|
||||
http://git-scm.com/
|
||||
@@ -1,841 +1 @@
|
||||
.. _submittingpatches:
|
||||
|
||||
How to Get Your Change Into the Linux Kernel or Care And Operation Of Your Linus Torvalds
|
||||
=========================================================================================
|
||||
|
||||
For a person or company who wishes to submit a change to the Linux
|
||||
kernel, the process can sometimes be daunting if you're not familiar
|
||||
with "the system." This text is a collection of suggestions which
|
||||
can greatly increase the chances of your change being accepted.
|
||||
|
||||
This document contains a large number of suggestions in a relatively terse
|
||||
format. For detailed information on how the kernel development process
|
||||
works, see :ref:`Documentation/development-process <development_process_main>`.
|
||||
Also, read :ref:`Documentation/SubmitChecklist <submitchecklist>`
|
||||
for a list of items to check before
|
||||
submitting code. If you are submitting a driver, also read
|
||||
:ref:`Documentation/SubmittingDrivers <submittingdrivers>`;
|
||||
for device tree binding patches, read
|
||||
Documentation/devicetree/bindings/submitting-patches.txt.
|
||||
|
||||
Many of these steps describe the default behavior of the ``git`` version
|
||||
control system; if you use ``git`` to prepare your patches, you'll find much
|
||||
of the mechanical work done for you, though you'll still need to prepare
|
||||
and document a sensible set of patches. In general, use of ``git`` will make
|
||||
your life as a kernel developer easier.
|
||||
|
||||
Creating and Sending your Change
|
||||
********************************
|
||||
|
||||
|
||||
0) Obtain a current source tree
|
||||
-------------------------------
|
||||
|
||||
If you do not have a repository with the current kernel source handy, use
|
||||
``git`` to obtain one. You'll want to start with the mainline repository,
|
||||
which can be grabbed with::
|
||||
|
||||
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
|
||||
|
||||
Note, however, that you may not want to develop against the mainline tree
|
||||
directly. Most subsystem maintainers run their own trees and want to see
|
||||
patches prepared against those trees. See the **T:** entry for the subsystem
|
||||
in the MAINTAINERS file to find that tree, or simply ask the maintainer if
|
||||
the tree is not listed there.
|
||||
|
||||
It is still possible to download kernel releases via tarballs (as described
|
||||
in the next section), but that is the hard way to do kernel development.
|
||||
|
||||
1) ``diff -up``
|
||||
---------------
|
||||
|
||||
If you must generate your patches by hand, use ``diff -up`` or ``diff -uprN``
|
||||
to create patches. Git generates patches in this form by default; if
|
||||
you're using ``git``, you can skip this section entirely.
|
||||
|
||||
All changes to the Linux kernel occur in the form of patches, as
|
||||
generated by :manpage:`diff(1)`. When creating your patch, make sure to
|
||||
create it in "unified diff" format, as supplied by the ``-u`` argument
|
||||
to :manpage:`diff(1)`.
|
||||
Also, please use the ``-p`` argument which shows which C function each
|
||||
change is in - that makes the resultant ``diff`` a lot easier to read.
|
||||
Patches should be based in the root kernel source directory,
|
||||
not in any lower subdirectory.
|
||||
|
||||
To create a patch for a single file, it is often sufficient to do::
|
||||
|
||||
SRCTREE= linux
|
||||
MYFILE= drivers/net/mydriver.c
|
||||
|
||||
cd $SRCTREE
|
||||
cp $MYFILE $MYFILE.orig
|
||||
vi $MYFILE # make your change
|
||||
cd ..
|
||||
diff -up $SRCTREE/$MYFILE{.orig,} > /tmp/patch
|
||||
|
||||
To create a patch for multiple files, you should unpack a "vanilla",
|
||||
or unmodified kernel source tree, and generate a ``diff`` against your
|
||||
own source tree. For example::
|
||||
|
||||
MYSRC= /devel/linux
|
||||
|
||||
tar xvfz linux-3.19.tar.gz
|
||||
mv linux-3.19 linux-3.19-vanilla
|
||||
diff -uprN -X linux-3.19-vanilla/Documentation/dontdiff \
|
||||
linux-3.19-vanilla $MYSRC > /tmp/patch
|
||||
|
||||
``dontdiff`` is a list of files which are generated by the kernel during
|
||||
the build process, and should be ignored in any :manpage:`diff(1)`-generated
|
||||
patch.
|
||||
|
||||
Make sure your patch does not include any extra files which do not
|
||||
belong in a patch submission. Make sure to review your patch -after-
|
||||
generating it with :manpage:`diff(1)`, to ensure accuracy.
|
||||
|
||||
If your changes produce a lot of deltas, you need to split them into
|
||||
individual patches which modify things in logical stages; see
|
||||
:ref:`split_changes`. This will facilitate review by other kernel developers,
|
||||
very important if you want your patch accepted.
|
||||
|
||||
If you're using ``git``, ``git rebase -i`` can help you with this process. If
|
||||
you're not using ``git``, ``quilt`` <http://savannah.nongnu.org/projects/quilt>
|
||||
is another popular alternative.
|
||||
|
||||
.. _describe_changes:
|
||||
|
||||
2) Describe your changes
|
||||
------------------------
|
||||
|
||||
Describe your problem. Whether your patch is a one-line bug fix or
|
||||
5000 lines of a new feature, there must be an underlying problem that
|
||||
motivated you to do this work. Convince the reviewer that there is a
|
||||
problem worth fixing and that it makes sense for them to read past the
|
||||
first paragraph.
|
||||
|
||||
Describe user-visible impact. Straight up crashes and lockups are
|
||||
pretty convincing, but not all bugs are that blatant. Even if the
|
||||
problem was spotted during code review, describe the impact you think
|
||||
it can have on users. Keep in mind that the majority of Linux
|
||||
installations run kernels from secondary stable trees or
|
||||
vendor/product-specific trees that cherry-pick only specific patches
|
||||
from upstream, so include anything that could help route your change
|
||||
downstream: provoking circumstances, excerpts from dmesg, crash
|
||||
descriptions, performance regressions, latency spikes, lockups, etc.
|
||||
|
||||
Quantify optimizations and trade-offs. If you claim improvements in
|
||||
performance, memory consumption, stack footprint, or binary size,
|
||||
include numbers that back them up. But also describe non-obvious
|
||||
costs. Optimizations usually aren't free but trade-offs between CPU,
|
||||
memory, and readability; or, when it comes to heuristics, between
|
||||
different workloads. Describe the expected downsides of your
|
||||
optimization so that the reviewer can weigh costs against benefits.
|
||||
|
||||
Once the problem is established, describe what you are actually doing
|
||||
about it in technical detail. It's important to describe the change
|
||||
in plain English for the reviewer to verify that the code is behaving
|
||||
as you intend it to.
|
||||
|
||||
The maintainer will thank you if you write your patch description in a
|
||||
form which can be easily pulled into Linux's source code management
|
||||
system, ``git``, as a "commit log". See :ref:`explicit_in_reply_to`.
|
||||
|
||||
Solve only one problem per patch. If your description starts to get
|
||||
long, that's a sign that you probably need to split up your patch.
|
||||
See :ref:`split_changes`.
|
||||
|
||||
When you submit or resubmit a patch or patch series, include the
|
||||
complete patch description and justification for it. Don't just
|
||||
say that this is version N of the patch (series). Don't expect the
|
||||
subsystem maintainer to refer back to earlier patch versions or referenced
|
||||
URLs to find the patch description and put that into the patch.
|
||||
I.e., the patch (series) and its description should be self-contained.
|
||||
This benefits both the maintainers and reviewers. Some reviewers
|
||||
probably didn't even receive earlier versions of the patch.
|
||||
|
||||
Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
|
||||
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
|
||||
to do frotz", as if you are giving orders to the codebase to change
|
||||
its behaviour.
|
||||
|
||||
If the patch fixes a logged bug entry, refer to that bug entry by
|
||||
number and URL. If the patch follows from a mailing list discussion,
|
||||
give a URL to the mailing list archive; use the https://lkml.kernel.org/
|
||||
redirector with a ``Message-Id``, to ensure that the links cannot become
|
||||
stale.
|
||||
|
||||
However, try to make your explanation understandable without external
|
||||
resources. In addition to giving a URL to a mailing list archive or
|
||||
bug, summarize the relevant points of the discussion that led to the
|
||||
patch as submitted.
|
||||
|
||||
If you want to refer to a specific commit, don't just refer to the
|
||||
SHA-1 ID of the commit. Please also include the oneline summary of
|
||||
the commit, to make it easier for reviewers to know what it is about.
|
||||
Example::
|
||||
|
||||
Commit e21d2170f36602ae2708 ("video: remove unnecessary
|
||||
platform_set_drvdata()") removed the unnecessary
|
||||
platform_set_drvdata(), but left the variable "dev" unused,
|
||||
delete it.
|
||||
|
||||
You should also be sure to use at least the first twelve characters of the
|
||||
SHA-1 ID. The kernel repository holds a *lot* of objects, making
|
||||
collisions with shorter IDs a real possibility. Bear in mind that, even if
|
||||
there is no collision with your six-character ID now, that condition may
|
||||
change five years from now.
|
||||
|
||||
If your patch fixes a bug in a specific commit, e.g. you found an issue using
|
||||
``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
|
||||
the SHA-1 ID, and the one line summary. For example::
|
||||
|
||||
Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()")
|
||||
|
||||
The following ``git config`` settings can be used to add a pretty format for
|
||||
outputting the above style in the ``git log`` or ``git show`` commands::
|
||||
|
||||
[core]
|
||||
abbrev = 12
|
||||
[pretty]
|
||||
fixes = Fixes: %h (\"%s\")
|
||||
|
||||
.. _split_changes:
|
||||
|
||||
3) Separate your changes
|
||||
------------------------
|
||||
|
||||
Separate each **logical change** into a separate patch.
|
||||
|
||||
For example, if your changes include both bug fixes and performance
|
||||
enhancements for a single driver, separate those changes into two
|
||||
or more patches. If your changes include an API update, and a new
|
||||
driver which uses that new API, separate those into two patches.
|
||||
|
||||
On the other hand, if you make a single change to numerous files,
|
||||
group those changes into a single patch. Thus a single logical change
|
||||
is contained within a single patch.
|
||||
|
||||
The point to remember is that each patch should make an easily understood
|
||||
change that can be verified by reviewers. Each patch should be justifiable
|
||||
on its own merits.
|
||||
|
||||
If one patch depends on another patch in order for a change to be
|
||||
complete, that is OK. Simply note **"this patch depends on patch X"**
|
||||
in your patch description.
|
||||
|
||||
When dividing your change into a series of patches, take special care to
|
||||
ensure that the kernel builds and runs properly after each patch in the
|
||||
series. Developers using ``git bisect`` to track down a problem can end up
|
||||
splitting your patch series at any point; they will not thank you if you
|
||||
introduce bugs in the middle.
|
||||
|
||||
If you cannot condense your patch set into a smaller set of patches,
|
||||
then only post say 15 or so at a time and wait for review and integration.
|
||||
|
||||
|
||||
|
||||
4) Style-check your changes
|
||||
---------------------------
|
||||
|
||||
Check your patch for basic style violations, details of which can be
|
||||
found in
|
||||
:ref:`Documentation/CodingStyle <codingstyle>`.
|
||||
Failure to do so simply wastes
|
||||
the reviewers time and will get your patch rejected, probably
|
||||
without even being read.
|
||||
|
||||
One significant exception is when moving code from one file to
|
||||
another -- in this case you should not modify the moved code at all in
|
||||
the same patch which moves it. This clearly delineates the act of
|
||||
moving the code and your changes. This greatly aids review of the
|
||||
actual differences and allows tools to better track the history of
|
||||
the code itself.
|
||||
|
||||
Check your patches with the patch style checker prior to submission
|
||||
(scripts/checkpatch.pl). Note, though, that the style checker should be
|
||||
viewed as a guide, not as a replacement for human judgment. If your code
|
||||
looks better with a violation then its probably best left alone.
|
||||
|
||||
The checker reports at three levels:
|
||||
- ERROR: things that are very likely to be wrong
|
||||
- WARNING: things requiring careful review
|
||||
- CHECK: things requiring thought
|
||||
|
||||
You should be able to justify all violations that remain in your
|
||||
patch.
|
||||
|
||||
|
||||
5) Select the recipients for your patch
|
||||
---------------------------------------
|
||||
|
||||
You should always copy the appropriate subsystem maintainer(s) on any patch
|
||||
to code that they maintain; look through the MAINTAINERS file and the
|
||||
source code revision history to see who those maintainers are. The
|
||||
script scripts/get_maintainer.pl can be very useful at this step. If you
|
||||
cannot find a maintainer for the subsystem you are working on, Andrew
|
||||
Morton (akpm@linux-foundation.org) serves as a maintainer of last resort.
|
||||
|
||||
You should also normally choose at least one mailing list to receive a copy
|
||||
of your patch set. linux-kernel@vger.kernel.org functions as a list of
|
||||
last resort, but the volume on that list has caused a number of developers
|
||||
to tune it out. Look in the MAINTAINERS file for a subsystem-specific
|
||||
list; your patch will probably get more attention there. Please do not
|
||||
spam unrelated lists, though.
|
||||
|
||||
Many kernel-related lists are hosted on vger.kernel.org; you can find a
|
||||
list of them at http://vger.kernel.org/vger-lists.html. There are
|
||||
kernel-related lists hosted elsewhere as well, though.
|
||||
|
||||
Do not send more than 15 patches at once to the vger mailing lists!!!
|
||||
|
||||
Linus Torvalds is the final arbiter of all changes accepted into the
|
||||
Linux kernel. His e-mail address is <torvalds@linux-foundation.org>.
|
||||
He gets a lot of e-mail, and, at this point, very few patches go through
|
||||
Linus directly, so typically you should do your best to -avoid-
|
||||
sending him e-mail.
|
||||
|
||||
If you have a patch that fixes an exploitable security bug, send that patch
|
||||
to security@kernel.org. For severe bugs, a short embargo may be considered
|
||||
to allow distributors to get the patch out to users; in such cases,
|
||||
obviously, the patch should not be sent to any public lists.
|
||||
|
||||
Patches that fix a severe bug in a released kernel should be directed
|
||||
toward the stable maintainers by putting a line like this::
|
||||
|
||||
Cc: stable@vger.kernel.org
|
||||
|
||||
into the sign-off area of your patch (note, NOT an email recipient). You
|
||||
should also read
|
||||
:ref:`Documentation/stable_kernel_rules.txt <stable_kernel_rules>`
|
||||
in addition to this file.
|
||||
|
||||
Note, however, that some subsystem maintainers want to come to their own
|
||||
conclusions on which patches should go to the stable trees. The networking
|
||||
maintainer, in particular, would rather not see individual developers
|
||||
adding lines like the above to their patches.
|
||||
|
||||
If changes affect userland-kernel interfaces, please send the MAN-PAGES
|
||||
maintainer (as listed in the MAINTAINERS file) a man-pages patch, or at
|
||||
least a notification of the change, so that some information makes its way
|
||||
into the manual pages. User-space API changes should also be copied to
|
||||
linux-api@vger.kernel.org.
|
||||
|
||||
For small patches you may want to CC the Trivial Patch Monkey
|
||||
trivial@kernel.org which collects "trivial" patches. Have a look
|
||||
into the MAINTAINERS file for its current manager.
|
||||
|
||||
Trivial patches must qualify for one of the following rules:
|
||||
|
||||
- Spelling fixes in documentation
|
||||
- Spelling fixes for errors which could break :manpage:`grep(1)`
|
||||
- Warning fixes (cluttering with useless warnings is bad)
|
||||
- Compilation fixes (only if they are actually correct)
|
||||
- Runtime fixes (only if they actually fix things)
|
||||
- Removing use of deprecated functions/macros
|
||||
- Contact detail and documentation fixes
|
||||
- Non-portable code replaced by portable code (even in arch-specific,
|
||||
since people copy, as long as it's trivial)
|
||||
- Any fix by the author/maintainer of the file (ie. patch monkey
|
||||
in re-transmission mode)
|
||||
|
||||
|
||||
|
||||
6) No MIME, no links, no compression, no attachments. Just plain text
|
||||
----------------------------------------------------------------------
|
||||
|
||||
Linus and other kernel developers need to be able to read and comment
|
||||
on the changes you are submitting. It is important for a kernel
|
||||
developer to be able to "quote" your changes, using standard e-mail
|
||||
tools, so that they may comment on specific portions of your code.
|
||||
|
||||
For this reason, all patches should be submitted by e-mail "inline".
|
||||
|
||||
.. warning::
|
||||
|
||||
Be wary of your editor's word-wrap corrupting your patch,
|
||||
if you choose to cut-n-paste your patch.
|
||||
|
||||
Do not attach the patch as a MIME attachment, compressed or not.
|
||||
Many popular e-mail applications will not always transmit a MIME
|
||||
attachment as plain text, making it impossible to comment on your
|
||||
code. A MIME attachment also takes Linus a bit more time to process,
|
||||
decreasing the likelihood of your MIME-attached change being accepted.
|
||||
|
||||
Exception: If your mailer is mangling patches then someone may ask
|
||||
you to re-send them using MIME.
|
||||
|
||||
See :ref:`Documentation/email-clients.txt <email_clients>`
|
||||
for hints about configuring your e-mail client so that it sends your patches
|
||||
untouched.
|
||||
|
||||
7) E-mail size
|
||||
--------------
|
||||
|
||||
Large changes are not appropriate for mailing lists, and some
|
||||
maintainers. If your patch, uncompressed, exceeds 300 kB in size,
|
||||
it is preferred that you store your patch on an Internet-accessible
|
||||
server, and provide instead a URL (link) pointing to your patch. But note
|
||||
that if your patch exceeds 300 kB, it almost certainly needs to be broken up
|
||||
anyway.
|
||||
|
||||
8) Respond to review comments
|
||||
-----------------------------
|
||||
|
||||
Your patch will almost certainly get comments from reviewers on ways in
|
||||
which the patch can be improved. You must respond to those comments;
|
||||
ignoring reviewers is a good way to get ignored in return. Review comments
|
||||
or questions that do not lead to a code change should almost certainly
|
||||
bring about a comment or changelog entry so that the next reviewer better
|
||||
understands what is going on.
|
||||
|
||||
Be sure to tell the reviewers what changes you are making and to thank them
|
||||
for their time. Code review is a tiring and time-consuming process, and
|
||||
reviewers sometimes get grumpy. Even in that case, though, respond
|
||||
politely and address the problems they have pointed out.
|
||||
|
||||
|
||||
9) Don't get discouraged - or impatient
|
||||
---------------------------------------
|
||||
|
||||
After you have submitted your change, be patient and wait. Reviewers are
|
||||
busy people and may not get to your patch right away.
|
||||
|
||||
Once upon a time, patches used to disappear into the void without comment,
|
||||
but the development process works more smoothly than that now. You should
|
||||
receive comments within a week or so; if that does not happen, make sure
|
||||
that you have sent your patches to the right place. Wait for a minimum of
|
||||
one week before resubmitting or pinging reviewers - possibly longer during
|
||||
busy times like merge windows.
|
||||
|
||||
|
||||
10) Include PATCH in the subject
|
||||
--------------------------------
|
||||
|
||||
Due to high e-mail traffic to Linus, and to linux-kernel, it is common
|
||||
convention to prefix your subject line with [PATCH]. This lets Linus
|
||||
and other kernel developers more easily distinguish patches from other
|
||||
e-mail discussions.
|
||||
|
||||
|
||||
|
||||
11) Sign your work
|
||||
------------------
|
||||
|
||||
To improve tracking of who did what, especially with patches that can
|
||||
percolate to their final resting place in the kernel through several
|
||||
layers of maintainers, we've introduced a "sign-off" procedure on
|
||||
patches that are being emailed around.
|
||||
|
||||
The sign-off is a simple line at the end of the explanation for the
|
||||
patch, which certifies that you wrote it or otherwise have the right to
|
||||
pass it on as an open-source patch. The rules are pretty simple: if you
|
||||
can certify the below:
|
||||
|
||||
Developer's Certificate of Origin 1.1
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
By making a contribution to this project, I certify that:
|
||||
|
||||
(a) The contribution was created in whole or in part by me and I
|
||||
have the right to submit it under the open source license
|
||||
indicated in the file; or
|
||||
|
||||
(b) The contribution is based upon previous work that, to the best
|
||||
of my knowledge, is covered under an appropriate open source
|
||||
license and I have the right under that license to submit that
|
||||
work with modifications, whether created in whole or in part
|
||||
by me, under the same open source license (unless I am
|
||||
permitted to submit under a different license), as indicated
|
||||
in the file; or
|
||||
|
||||
(c) The contribution was provided directly to me by some other
|
||||
person who certified (a), (b) or (c) and I have not modified
|
||||
it.
|
||||
|
||||
(d) I understand and agree that this project and the contribution
|
||||
are public and that a record of the contribution (including all
|
||||
personal information I submit with it, including my sign-off) is
|
||||
maintained indefinitely and may be redistributed consistent with
|
||||
this project or the open source license(s) involved.
|
||||
|
||||
then you just add a line saying::
|
||||
|
||||
Signed-off-by: Random J Developer <random@developer.example.org>
|
||||
|
||||
using your real name (sorry, no pseudonyms or anonymous contributions.)
|
||||
|
||||
Some people also put extra tags at the end. They'll just be ignored for
|
||||
now, but you can do this to mark internal company procedures or just
|
||||
point out some special detail about the sign-off.
|
||||
|
||||
If you are a subsystem or branch maintainer, sometimes you need to slightly
|
||||
modify patches you receive in order to merge them, because the code is not
|
||||
exactly the same in your tree and the submitters'. If you stick strictly to
|
||||
rule (c), you should ask the submitter to rediff, but this is a totally
|
||||
counter-productive waste of time and energy. Rule (b) allows you to adjust
|
||||
the code, but then it is very impolite to change one submitter's code and
|
||||
make him endorse your bugs. To solve this problem, it is recommended that
|
||||
you add a line between the last Signed-off-by header and yours, indicating
|
||||
the nature of your changes. While there is nothing mandatory about this, it
|
||||
seems like prepending the description with your mail and/or name, all
|
||||
enclosed in square brackets, is noticeable enough to make it obvious that
|
||||
you are responsible for last-minute changes. Example::
|
||||
|
||||
Signed-off-by: Random J Developer <random@developer.example.org>
|
||||
[lucky@maintainer.example.org: struct foo moved from foo.c to foo.h]
|
||||
Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>
|
||||
|
||||
This practice is particularly helpful if you maintain a stable branch and
|
||||
want at the same time to credit the author, track changes, merge the fix,
|
||||
and protect the submitter from complaints. Note that under no circumstances
|
||||
can you change the author's identity (the From header), as it is the one
|
||||
which appears in the changelog.
|
||||
|
||||
Special note to back-porters: It seems to be a common and useful practice
|
||||
to insert an indication of the origin of a patch at the top of the commit
|
||||
message (just after the subject line) to facilitate tracking. For instance,
|
||||
here's what we see in a 3.x-stable release::
|
||||
|
||||
Date: Tue Oct 7 07:26:38 2014 -0400
|
||||
|
||||
libata: Un-break ATA blacklist
|
||||
|
||||
commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream.
|
||||
|
||||
And here's what might appear in an older kernel once a patch is backported::
|
||||
|
||||
Date: Tue May 13 22:12:27 2008 +0200
|
||||
|
||||
wireless, airo: waitbusy() won't delay
|
||||
|
||||
[backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a]
|
||||
|
||||
Whatever the format, this information provides a valuable help to people
|
||||
tracking your trees, and to people trying to troubleshoot bugs in your
|
||||
tree.
|
||||
|
||||
|
||||
12) When to use Acked-by: and Cc:
|
||||
---------------------------------
|
||||
|
||||
The Signed-off-by: tag indicates that the signer was involved in the
|
||||
development of the patch, or that he/she was in the patch's delivery path.
|
||||
|
||||
If a person was not directly involved in the preparation or handling of a
|
||||
patch but wishes to signify and record their approval of it then they can
|
||||
ask to have an Acked-by: line added to the patch's changelog.
|
||||
|
||||
Acked-by: is often used by the maintainer of the affected code when that
|
||||
maintainer neither contributed to nor forwarded the patch.
|
||||
|
||||
Acked-by: is not as formal as Signed-off-by:. It is a record that the acker
|
||||
has at least reviewed the patch and has indicated acceptance. Hence patch
|
||||
mergers will sometimes manually convert an acker's "yep, looks good to me"
|
||||
into an Acked-by: (but note that it is usually better to ask for an
|
||||
explicit ack).
|
||||
|
||||
Acked-by: does not necessarily indicate acknowledgement of the entire patch.
|
||||
For example, if a patch affects multiple subsystems and has an Acked-by: from
|
||||
one subsystem maintainer then this usually indicates acknowledgement of just
|
||||
the part which affects that maintainer's code. Judgement should be used here.
|
||||
When in doubt people should refer to the original discussion in the mailing
|
||||
list archives.
|
||||
|
||||
If a person has had the opportunity to comment on a patch, but has not
|
||||
provided such comments, you may optionally add a ``Cc:`` tag to the patch.
|
||||
This is the only tag which might be added without an explicit action by the
|
||||
person it names - but it should indicate that this person was copied on the
|
||||
patch. This tag documents that potentially interested parties
|
||||
have been included in the discussion.
|
||||
|
||||
|
||||
13) Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes:
|
||||
--------------------------------------------------------------------------
|
||||
|
||||
The Reported-by tag gives credit to people who find bugs and report them and it
|
||||
hopefully inspires them to help us again in the future. Please note that if
|
||||
the bug was reported in private, then ask for permission first before using the
|
||||
Reported-by tag.
|
||||
|
||||
A Tested-by: tag indicates that the patch has been successfully tested (in
|
||||
some environment) by the person named. This tag informs maintainers that
|
||||
some testing has been performed, provides a means to locate testers for
|
||||
future patches, and ensures credit for the testers.
|
||||
|
||||
Reviewed-by:, instead, indicates that the patch has been reviewed and found
|
||||
acceptable according to the Reviewer's Statement:
|
||||
|
||||
Reviewer's statement of oversight
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
By offering my Reviewed-by: tag, I state that:
|
||||
|
||||
(a) I have carried out a technical review of this patch to
|
||||
evaluate its appropriateness and readiness for inclusion into
|
||||
the mainline kernel.
|
||||
|
||||
(b) Any problems, concerns, or questions relating to the patch
|
||||
have been communicated back to the submitter. I am satisfied
|
||||
with the submitter's response to my comments.
|
||||
|
||||
(c) While there may be things that could be improved with this
|
||||
submission, I believe that it is, at this time, (1) a
|
||||
worthwhile modification to the kernel, and (2) free of known
|
||||
issues which would argue against its inclusion.
|
||||
|
||||
(d) While I have reviewed the patch and believe it to be sound, I
|
||||
do not (unless explicitly stated elsewhere) make any
|
||||
warranties or guarantees that it will achieve its stated
|
||||
purpose or function properly in any given situation.
|
||||
|
||||
A Reviewed-by tag is a statement of opinion that the patch is an
|
||||
appropriate modification of the kernel without any remaining serious
|
||||
technical issues. Any interested reviewer (who has done the work) can
|
||||
offer a Reviewed-by tag for a patch. This tag serves to give credit to
|
||||
reviewers and to inform maintainers of the degree of review which has been
|
||||
done on the patch. Reviewed-by: tags, when supplied by reviewers known to
|
||||
understand the subject area and to perform thorough reviews, will normally
|
||||
increase the likelihood of your patch getting into the kernel.
|
||||
|
||||
A Suggested-by: tag indicates that the patch idea is suggested by the person
|
||||
named and ensures credit to the person for the idea. Please note that this
|
||||
tag should not be added without the reporter's permission, especially if the
|
||||
idea was not posted in a public forum. That said, if we diligently credit our
|
||||
idea reporters, they will, hopefully, be inspired to help us again in the
|
||||
future.
|
||||
|
||||
A Fixes: tag indicates that the patch fixes an issue in a previous commit. It
|
||||
is used to make it easy to determine where a bug originated, which can help
|
||||
review a bug fix. This tag also assists the stable kernel team in determining
|
||||
which stable kernel versions should receive your fix. This is the preferred
|
||||
method for indicating a bug fixed by the patch. See :ref:`describe_changes`
|
||||
for more details.
|
||||
|
||||
|
||||
14) The canonical patch format
|
||||
------------------------------
|
||||
|
||||
This section describes how the patch itself should be formatted. Note
|
||||
that, if you have your patches stored in a ``git`` repository, proper patch
|
||||
formatting can be had with ``git format-patch``. The tools cannot create
|
||||
the necessary text, though, so read the instructions below anyway.
|
||||
|
||||
The canonical patch subject line is::
|
||||
|
||||
Subject: [PATCH 001/123] subsystem: summary phrase
|
||||
|
||||
The canonical patch message body contains the following:
|
||||
|
||||
- A ``from`` line specifying the patch author (only needed if the person
|
||||
sending the patch is not the author).
|
||||
|
||||
- An empty line.
|
||||
|
||||
- The body of the explanation, line wrapped at 75 columns, which will
|
||||
be copied to the permanent changelog to describe this patch.
|
||||
|
||||
- The ``Signed-off-by:`` lines, described above, which will
|
||||
also go in the changelog.
|
||||
|
||||
- A marker line containing simply ``---``.
|
||||
|
||||
- Any additional comments not suitable for the changelog.
|
||||
|
||||
- The actual patch (``diff`` output).
|
||||
|
||||
The Subject line format makes it very easy to sort the emails
|
||||
alphabetically by subject line - pretty much any email reader will
|
||||
support that - since because the sequence number is zero-padded,
|
||||
the numerical and alphabetic sort is the same.
|
||||
|
||||
The ``subsystem`` in the email's Subject should identify which
|
||||
area or subsystem of the kernel is being patched.
|
||||
|
||||
The ``summary phrase`` in the email's Subject should concisely
|
||||
describe the patch which that email contains. The ``summary
|
||||
phrase`` should not be a filename. Do not use the same ``summary
|
||||
phrase`` for every patch in a whole patch series (where a ``patch
|
||||
series`` is an ordered sequence of multiple, related patches).
|
||||
|
||||
Bear in mind that the ``summary phrase`` of your email becomes a
|
||||
globally-unique identifier for that patch. It propagates all the way
|
||||
into the ``git`` changelog. The ``summary phrase`` may later be used in
|
||||
developer discussions which refer to the patch. People will want to
|
||||
google for the ``summary phrase`` to read discussion regarding that
|
||||
patch. It will also be the only thing that people may quickly see
|
||||
when, two or three months later, they are going through perhaps
|
||||
thousands of patches using tools such as ``gitk`` or ``git log
|
||||
--oneline``.
|
||||
|
||||
For these reasons, the ``summary`` must be no more than 70-75
|
||||
characters, and it must describe both what the patch changes, as well
|
||||
as why the patch might be necessary. It is challenging to be both
|
||||
succinct and descriptive, but that is what a well-written summary
|
||||
should do.
|
||||
|
||||
The ``summary phrase`` may be prefixed by tags enclosed in square
|
||||
brackets: "Subject: [PATCH <tag>...] <summary phrase>". The tags are
|
||||
not considered part of the summary phrase, but describe how the patch
|
||||
should be treated. Common tags might include a version descriptor if
|
||||
the multiple versions of the patch have been sent out in response to
|
||||
comments (i.e., "v1, v2, v3"), or "RFC" to indicate a request for
|
||||
comments. If there are four patches in a patch series the individual
|
||||
patches may be numbered like this: 1/4, 2/4, 3/4, 4/4. This assures
|
||||
that developers understand the order in which the patches should be
|
||||
applied and that they have reviewed or applied all of the patches in
|
||||
the patch series.
|
||||
|
||||
A couple of example Subjects::
|
||||
|
||||
Subject: [PATCH 2/5] ext2: improve scalability of bitmap searching
|
||||
Subject: [PATCH v2 01/27] x86: fix eflags tracking
|
||||
|
||||
The ``from`` line must be the very first line in the message body,
|
||||
and has the form:
|
||||
|
||||
From: Original Author <author@example.com>
|
||||
|
||||
The ``from`` line specifies who will be credited as the author of the
|
||||
patch in the permanent changelog. If the ``from`` line is missing,
|
||||
then the ``From:`` line from the email header will be used to determine
|
||||
the patch author in the changelog.
|
||||
|
||||
The explanation body will be committed to the permanent source
|
||||
changelog, so should make sense to a competent reader who has long
|
||||
since forgotten the immediate details of the discussion that might
|
||||
have led to this patch. Including symptoms of the failure which the
|
||||
patch addresses (kernel log messages, oops messages, etc.) is
|
||||
especially useful for people who might be searching the commit logs
|
||||
looking for the applicable patch. If a patch fixes a compile failure,
|
||||
it may not be necessary to include _all_ of the compile failures; just
|
||||
enough that it is likely that someone searching for the patch can find
|
||||
it. As in the ``summary phrase``, it is important to be both succinct as
|
||||
well as descriptive.
|
||||
|
||||
The ``---`` marker line serves the essential purpose of marking for patch
|
||||
handling tools where the changelog message ends.
|
||||
|
||||
One good use for the additional comments after the ``---`` marker is for
|
||||
a ``diffstat``, to show what files have changed, and the number of
|
||||
inserted and deleted lines per file. A ``diffstat`` is especially useful
|
||||
on bigger patches. Other comments relevant only to the moment or the
|
||||
maintainer, not suitable for the permanent changelog, should also go
|
||||
here. A good example of such comments might be ``patch changelogs``
|
||||
which describe what has changed between the v1 and v2 version of the
|
||||
patch.
|
||||
|
||||
If you are going to include a ``diffstat`` after the ``---`` marker, please
|
||||
use ``diffstat`` options ``-p 1 -w 70`` so that filenames are listed from
|
||||
the top of the kernel source tree and don't use too much horizontal
|
||||
space (easily fit in 80 columns, maybe with some indentation). (``git``
|
||||
generates appropriate diffstats by default.)
|
||||
|
||||
See more details on the proper patch format in the following
|
||||
references.
|
||||
|
||||
.. _explicit_in_reply_to:
|
||||
|
||||
15) Explicit In-Reply-To headers
|
||||
--------------------------------
|
||||
|
||||
It can be helpful to manually add In-Reply-To: headers to a patch
|
||||
(e.g., when using ``git send-email``) to associate the patch with
|
||||
previous relevant discussion, e.g. to link a bug fix to the email with
|
||||
the bug report. However, for a multi-patch series, it is generally
|
||||
best to avoid using In-Reply-To: to link to older versions of the
|
||||
series. This way multiple versions of the patch don't become an
|
||||
unmanageable forest of references in email clients. If a link is
|
||||
helpful, you can use the https://lkml.kernel.org/ redirector (e.g., in
|
||||
the cover email text) to link to an earlier version of the patch series.
|
||||
|
||||
|
||||
16) Sending ``git pull`` requests
|
||||
---------------------------------
|
||||
|
||||
If you have a series of patches, it may be most convenient to have the
|
||||
maintainer pull them directly into the subsystem repository with a
|
||||
``git pull`` operation. Note, however, that pulling patches from a developer
|
||||
requires a higher degree of trust than taking patches from a mailing list.
|
||||
As a result, many subsystem maintainers are reluctant to take pull
|
||||
requests, especially from new, unknown developers. If in doubt you can use
|
||||
the pull request as the cover letter for a normal posting of the patch
|
||||
series, giving the maintainer the option of using either.
|
||||
|
||||
A pull request should have [GIT] or [PULL] in the subject line. The
|
||||
request itself should include the repository name and the branch of
|
||||
interest on a single line; it should look something like::
|
||||
|
||||
Please pull from
|
||||
|
||||
git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus
|
||||
|
||||
to get these changes:
|
||||
|
||||
A pull request should also include an overall message saying what will be
|
||||
included in the request, a ``git shortlog`` listing of the patches
|
||||
themselves, and a ``diffstat`` showing the overall effect of the patch series.
|
||||
The easiest way to get all this information together is, of course, to let
|
||||
``git`` do it for you with the ``git request-pull`` command.
|
||||
|
||||
Some maintainers (including Linus) want to see pull requests from signed
|
||||
commits; that increases their confidence that the request actually came
|
||||
from you. Linus, in particular, will not pull from public hosting sites
|
||||
like GitHub in the absence of a signed tag.
|
||||
|
||||
The first step toward creating such tags is to make a GNUPG key and get it
|
||||
signed by one or more core kernel developers. This step can be hard for
|
||||
new developers, but there is no way around it. Attending conferences can
|
||||
be a good way to find developers who can sign your key.
|
||||
|
||||
Once you have prepared a patch series in ``git`` that you wish to have somebody
|
||||
pull, create a signed tag with ``git tag -s``. This will create a new tag
|
||||
identifying the last commit in the series and containing a signature
|
||||
created with your private key. You will also have the opportunity to add a
|
||||
changelog-style message to the tag; this is an ideal place to describe the
|
||||
effects of the pull request as a whole.
|
||||
|
||||
If the tree the maintainer will be pulling from is not the repository you
|
||||
are working from, don't forget to push the signed tag explicitly to the
|
||||
public tree.
|
||||
|
||||
When generating your pull request, use the signed tag as the target. A
|
||||
command like this will do the trick::
|
||||
|
||||
git request-pull master git://my.public.tree/linux.git my-signed-tag
|
||||
|
||||
|
||||
REFERENCES
|
||||
**********
|
||||
|
||||
Andrew Morton, "The perfect patch" (tpp).
|
||||
<http://www.ozlabs.org/~akpm/stuff/tpp.txt>
|
||||
|
||||
Jeff Garzik, "Linux kernel patch submission format".
|
||||
<http://linux.yyz.us/patch-format.html>
|
||||
|
||||
Greg Kroah-Hartman, "How to piss off a kernel subsystem maintainer".
|
||||
<http://www.kroah.com/log/linux/maintainer.html>
|
||||
|
||||
<http://www.kroah.com/log/linux/maintainer-02.html>
|
||||
|
||||
<http://www.kroah.com/log/linux/maintainer-03.html>
|
||||
|
||||
<http://www.kroah.com/log/linux/maintainer-04.html>
|
||||
|
||||
<http://www.kroah.com/log/linux/maintainer-05.html>
|
||||
|
||||
<http://www.kroah.com/log/linux/maintainer-06.html>
|
||||
|
||||
NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!
|
||||
<https://lkml.org/lkml/2005/7/11/336>
|
||||
|
||||
Kernel Documentation/CodingStyle:
|
||||
:ref:`Documentation/CodingStyle <codingstyle>`
|
||||
|
||||
Linus Torvalds's mail on the canonical patch format:
|
||||
<http://lkml.org/lkml/2005/4/7/183>
|
||||
|
||||
Andi Kleen, "On submitting kernel patches"
|
||||
Some strategies to get difficult or controversial changes in.
|
||||
|
||||
http://halobates.de/on-submitting-patches.pdf
|
||||
|
||||
This file has moved to process/submitting-patches.rst
|
||||
|
||||
@@ -1,39 +0,0 @@
|
||||
Software cursor for VGA by Pavel Machek <pavel@atrey.karlin.mff.cuni.cz>
|
||||
======================= and Martin Mares <mj@atrey.karlin.mff.cuni.cz>
|
||||
|
||||
Linux now has some ability to manipulate cursor appearance. Normally, you
|
||||
can set the size of hardware cursor (and also work around some ugly bugs in
|
||||
those miserable Trident cards--see #define TRIDENT_GLITCH in drivers/video/
|
||||
vgacon.c). You can now play a few new tricks: you can make your cursor look
|
||||
like a non-blinking red block, make it inverse background of the character it's
|
||||
over or to highlight that character and still choose whether the original
|
||||
hardware cursor should remain visible or not. There may be other things I have
|
||||
never thought of.
|
||||
|
||||
The cursor appearance is controlled by a "<ESC>[?1;2;3c" escape sequence
|
||||
where 1, 2 and 3 are parameters described below. If you omit any of them,
|
||||
they will default to zeroes.
|
||||
|
||||
Parameter 1 specifies cursor size (0=default, 1=invisible, 2=underline, ...,
|
||||
8=full block) + 16 if you want the software cursor to be applied + 32 if you
|
||||
want to always change the background color + 64 if you dislike having the
|
||||
background the same as the foreground. Highlights are ignored for the last two
|
||||
flags.
|
||||
|
||||
The second parameter selects character attribute bits you want to change
|
||||
(by simply XORing them with the value of this parameter). On standard VGA,
|
||||
the high four bits specify background and the low four the foreground. In both
|
||||
groups, low three bits set color (as in normal color codes used by the console)
|
||||
and the most significant one turns on highlight (or sometimes blinking--it
|
||||
depends on the configuration of your VGA).
|
||||
|
||||
The third parameter consists of character attribute bits you want to set.
|
||||
Bit setting takes place before bit toggling, so you can simply clear a bit by
|
||||
including it in both the set mask and the toggle mask.
|
||||
|
||||
Examples:
|
||||
=========
|
||||
|
||||
To get normal blinking underline, use: echo -e '\033[?2c'
|
||||
To get blinking block, use: echo -e '\033[?6c'
|
||||
To get red non-blinking block, use: echo -e '\033[?17;0;64c'
|
||||
97
Documentation/acpi/DSD-properties-rules.txt
Normal file
97
Documentation/acpi/DSD-properties-rules.txt
Normal file
@@ -0,0 +1,97 @@
|
||||
_DSD Device Properties Usage Rules
|
||||
----------------------------------
|
||||
|
||||
Properties, Property Sets and Property Subsets
|
||||
----------------------------------------------
|
||||
|
||||
The _DSD (Device Specific Data) configuration object, introduced in ACPI 5.1,
|
||||
allows any type of device configuration data to be provided via the ACPI
|
||||
namespace. In principle, the format of the data may be arbitrary, but it has to
|
||||
be identified by a UUID which must be recognized by the driver processing the
|
||||
_DSD output. However, there are generic UUIDs defined for _DSD recognized by
|
||||
the ACPI subsystem in the Linux kernel which automatically processes the data
|
||||
packages associated with them and makes those data available to device drivers
|
||||
as "device properties".
|
||||
|
||||
A device property is a data item consisting of a string key and a value (of a
|
||||
specific type) associated with it.
|
||||
|
||||
In the ACPI _DSD context it is an element of the sub-package following the
|
||||
generic Device Properties UUID in the _DSD return package as specified in the
|
||||
Device Properties UUID definition document [1].
|
||||
|
||||
It also may be regarded as the definition of a key and the associated data type
|
||||
that can be returned by _DSD in the Device Properties UUID sub-package for a
|
||||
given device.
|
||||
|
||||
A property set is a collection of properties applicable to a hardware entity
|
||||
like a device. In the ACPI _DSD context it is the set of all properties that
|
||||
can be returned in the Device Properties UUID sub-package for the device in
|
||||
question.
|
||||
|
||||
Property subsets are nested collections of properties. Each of them is
|
||||
associated with an additional key (name) allowing the subset to be referred
|
||||
to as a whole (and to be treated as a separate entity). The canonical
|
||||
representation of property subsets is via the mechanism specified in the
|
||||
Hierarchical Properties Extension UUID definition document [2].
|
||||
|
||||
Property sets may be hierarchical. That is, a property set may contain
|
||||
multiple property subsets that each may contain property subsets of its
|
||||
own and so on.
|
||||
|
||||
General Validity Rule for Property Sets
|
||||
---------------------------------------
|
||||
|
||||
Valid property sets must follow the guidance given by the Device Properties UUID
|
||||
definition document [1].
|
||||
|
||||
_DSD properties are intended to be used in addition to, and not instead of, the
|
||||
existing mechanisms defined by the ACPI specification. Therefore, as a rule,
|
||||
they should only be used if the ACPI specification does not make direct
|
||||
provisions for handling the underlying use case. It generally is invalid to
|
||||
return property sets which do not follow that rule from _DSD in data packages
|
||||
associated with the Device Properties UUID.
|
||||
|
||||
Additional Considerations
|
||||
-------------------------
|
||||
|
||||
There are cases in which, even if the general rule given above is followed in
|
||||
principle, the property set may still not be regarded as a valid one.
|
||||
|
||||
For example, that applies to device properties which may cause kernel code
|
||||
(either a device driver or a library/subsystem) to access hardware in a way
|
||||
possibly leading to a conflict with AML methods in the ACPI namespace. In
|
||||
particular, that may happen if the kernel code uses device properties to
|
||||
manipulate hardware normally controlled by ACPI methods related to power
|
||||
management, like _PSx and _DSW (for device objects) or _ON and _OFF (for power
|
||||
resource objects), or by ACPI device disabling/enabling methods, like _DIS and
|
||||
_SRS.
|
||||
|
||||
In all cases in which kernel code may do something that will confuse AML as a
|
||||
result of using device properties, the device properties in question are not
|
||||
suitable for the ACPI environment and consequently they cannot belong to a valid
|
||||
property set.
|
||||
|
||||
Property Sets and Device Tree Bindings
|
||||
--------------------------------------
|
||||
|
||||
It often is useful to make _DSD return property sets that follow Device Tree
|
||||
bindings.
|
||||
|
||||
In those cases, however, the above validity considerations must be taken into
|
||||
account in the first place and returning invalid property sets from _DSD must be
|
||||
avoided. For this reason, it may not be possible to make _DSD return a property
|
||||
set following the given DT binding literally and completely. Still, for the
|
||||
sake of code re-use, it may make sense to provide as much of the configuration
|
||||
data as possible in the form of device properties and complement that with an
|
||||
ACPI-specific mechanism suitable for the use case at hand.
|
||||
|
||||
In any case, property sets following DT bindings literally should not be
|
||||
expected to automatically work in the ACPI environment regardless of their
|
||||
contents.
|
||||
|
||||
References
|
||||
----------
|
||||
|
||||
[1] http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
|
||||
[2] http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf
|
||||
@@ -415,3 +415,12 @@ the "compatible" property in the _DSD or a _CID as long as one of their
|
||||
ancestors provides a _DSD with a valid "compatible" property. Such device
|
||||
objects are then simply regarded as additional "blocks" providing hierarchical
|
||||
configuration information to the driver of the composite ancestor device.
|
||||
|
||||
However, PRP0001 can only be returned from either _HID or _CID of a device
|
||||
object if all of the properties returned by the _DSD associated with it (either
|
||||
the _DSD of the device object itself or the _DSD of its ancestor in the
|
||||
"composite device" case described above) can be used in the ACPI environment.
|
||||
Otherwise, the _DSD itself is regarded as invalid and therefore the "compatible"
|
||||
property returned by it is meaningless.
|
||||
|
||||
Refer to DSD-properties-rules.txt for more information.
|
||||
|
||||
@@ -51,6 +51,68 @@ it to 1 marks the GPIO as active low.
|
||||
In our Bluetooth example the "reset-gpios" refers to the second GpioIo()
|
||||
resource, second pin in that resource with the GPIO number of 31.
|
||||
|
||||
It is possible to leave holes in the array of GPIOs. This is useful in
|
||||
cases like with SPI host controllers where some chip selects may be
|
||||
implemented as GPIOs and some as native signals. For example a SPI host
|
||||
controller can have chip selects 0 and 2 implemented as GPIOs and 1 as
|
||||
native:
|
||||
|
||||
Package () {
|
||||
"cs-gpios",
|
||||
Package () {
|
||||
^GPIO, 19, 0, 0, // chip select 0: GPIO
|
||||
0, // chip select 1: native signal
|
||||
^GPIO, 20, 0, 0, // chip select 2: GPIO
|
||||
}
|
||||
}
|
||||
|
||||
Other supported properties
|
||||
--------------------------
|
||||
|
||||
Following Device Tree compatible device properties are also supported by
|
||||
_DSD device properties for GPIO controllers:
|
||||
|
||||
- gpio-hog
|
||||
- output-high
|
||||
- output-low
|
||||
- input
|
||||
- line-name
|
||||
|
||||
Example:
|
||||
|
||||
Name (_DSD, Package () {
|
||||
// _DSD Hierarchical Properties Extension UUID
|
||||
ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
|
||||
Package () {
|
||||
Package () {"hog-gpio8", "G8PU"}
|
||||
}
|
||||
})
|
||||
|
||||
Name (G8PU, Package () {
|
||||
ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
|
||||
Package () {
|
||||
Package () {"gpio-hog", 1},
|
||||
Package () {"gpios", Package () {8, 0}},
|
||||
Package () {"output-high", 1},
|
||||
Package () {"line-name", "gpio8-pullup"},
|
||||
}
|
||||
})
|
||||
|
||||
- gpio-line-names
|
||||
|
||||
Example:
|
||||
|
||||
Package () {
|
||||
"gpio-line-names",
|
||||
Package () {
|
||||
"SPI0_CS_N", "EXP2_INT", "MUX6_IO", "UART0_RXD", "MUX7_IO",
|
||||
"LVL_C_A1", "MUX0_IO", "SPI1_MISO"
|
||||
}
|
||||
}
|
||||
|
||||
See Documentation/devicetree/bindings/gpio/gpio.txt for more information
|
||||
about these properties.
|
||||
|
||||
ACPI GPIO Mappings Provided by Drivers
|
||||
--------------------------------------
|
||||
|
||||
|
||||
187
Documentation/acpi/osi.txt
Normal file
187
Documentation/acpi/osi.txt
Normal file
@@ -0,0 +1,187 @@
|
||||
ACPI _OSI and _REV methods
|
||||
--------------------------
|
||||
|
||||
An ACPI BIOS can use the "Operating System Interfaces" method (_OSI)
|
||||
to find out what the operating system supports. Eg. If BIOS
|
||||
AML code includes _OSI("XYZ"), the kernel's AML interpreter
|
||||
can evaluate that method, look to see if it supports 'XYZ'
|
||||
and answer YES or NO to the BIOS.
|
||||
|
||||
The ACPI _REV method returns the "Revision of the ACPI specification
|
||||
that OSPM supports"
|
||||
|
||||
This document explains how and why the BIOS and Linux should use these methods.
|
||||
It also explains how and why they are widely misused.
|
||||
|
||||
How to use _OSI
|
||||
---------------
|
||||
|
||||
Linux runs on two groups of machines -- those that are tested by the OEM
|
||||
to be compatible with Linux, and those that were never tested with Linux,
|
||||
but where Linux was installed to replace the original OS (Windows or OSX).
|
||||
|
||||
The larger group is the systems tested to run only Windows. Not only that,
|
||||
but many were tested to run with just one specific version of Windows.
|
||||
So even though the BIOS may use _OSI to query what version of Windows is running,
|
||||
only a single path through the BIOS has actually been tested.
|
||||
Experience shows that taking untested paths through the BIOS
|
||||
exposes Linux to an entire category of BIOS bugs.
|
||||
For this reason, Linux _OSI defaults must continue to claim compatibility
|
||||
with all versions of Windows.
|
||||
|
||||
But Linux isn't actually compatible with Windows, and the Linux community
|
||||
has also been hurt with regressions when Linux adds the latest version of
|
||||
Windows to its list of _OSI strings. So it is possible that additional strings
|
||||
will be more thoroughly vetted before shipping upstream in the future.
|
||||
But it is likely that they will all eventually be added.
|
||||
|
||||
What should an OEM do if they want to support Linux and Windows
|
||||
using the same BIOS image? Often they need to do something different
|
||||
for Linux to deal with how Linux is different from Windows.
|
||||
Here the BIOS should ask exactly what it wants to know:
|
||||
|
||||
_OSI("Linux-OEM-my_interface_name")
|
||||
where 'OEM' is needed if this is an OEM-specific hook,
|
||||
and 'my_interface_name' describes the hook, which could be a
|
||||
quirk, a bug, or a bug-fix.
|
||||
|
||||
In addition, the OEM should send a patch to upstream Linux
|
||||
via the linux-acpi@vger.kernel.org mailing list. When that patch
|
||||
is checked into Linux, the OS will answer "YES" when the BIOS
|
||||
on the OEM's system uses _OSI to ask if the interface is supported
|
||||
by the OS. Linux distributors can back-port that patch for Linux
|
||||
pre-installs, and it will be included by all distributions that
|
||||
re-base to upstream. If the distribution can not update the kernel binary,
|
||||
they can also add an acpi_osi=Linux-OEM-my_interface_name
|
||||
cmdline parameter to the boot loader, as needed.
|
||||
|
||||
If the string refers to a feature where the upstream kernel
|
||||
eventually grows support, a patch should be sent to remove
|
||||
the string when that support is added to the kernel.
|
||||
|
||||
That was easy. Read on, to find out how to do it wrong.
|
||||
|
||||
Before _OSI, there was _OS
|
||||
--------------------------
|
||||
|
||||
ACPI 1.0 specified "_OS" as an
|
||||
"object that evaluates to a string that identifies the operating system."
|
||||
|
||||
The ACPI BIOS flow would include an evaluation of _OS, and the AML
|
||||
interpreter in the kernel would return to it a string identifying the OS:
|
||||
|
||||
Windows 98, SE: "Microsoft Windows"
|
||||
Windows ME: "Microsoft WindowsME:Millenium Edition"
|
||||
Windows NT: "Microsoft Windows NT"
|
||||
|
||||
The idea was on a platform tasked with running multiple OS's,
|
||||
the BIOS could use _OS to enable devices that an OS
|
||||
might support, or enable quirks or bug workarounds
|
||||
necessary to make the platform compatible with that pre-existing OS.
|
||||
|
||||
But _OS had fundamental problems. First, the BIOS needed to know the name
|
||||
of every possible version of the OS that would run on it, and needed to know
|
||||
all the quirks of those OS's. Certainly it would make more sense
|
||||
for the BIOS to ask *specific* things of the OS, such
|
||||
"do you support a specific interface", and thus in ACPI 3.0,
|
||||
_OSI was born to replace _OS.
|
||||
|
||||
_OS was abandoned, though even today, many BIOS look for
|
||||
_OS "Microsoft Windows NT", though it seems somewhat far-fetched
|
||||
that anybody would install those old operating systems
|
||||
over what came with the machine.
|
||||
|
||||
Linux answers "Microsoft Windows NT" to please that BIOS idiom.
|
||||
That is the *only* viable strategy, as that is what modern Windows does,
|
||||
and so doing otherwise could steer the BIOS down an untested path.
|
||||
|
||||
_OSI is born, and immediately misused
|
||||
--------------------------------------
|
||||
|
||||
With _OSI, the *BIOS* provides the string describing an interface,
|
||||
and asks the OS: "YES/NO, are you compatible with this interface?"
|
||||
|
||||
eg. _OSI("3.0 Thermal Model") would return TRUE if the OS knows how
|
||||
to deal with the thermal extensions made to the ACPI 3.0 specification.
|
||||
An old OS that doesn't know about those extensions would answer FALSE,
|
||||
and a new OS may be able to return TRUE.
|
||||
|
||||
For an OS-specific interface, the ACPI spec said that the BIOS and the OS
|
||||
were to agree on a string of the form such as "Windows-interface_name".
|
||||
|
||||
But two bad things happened. First, the Windows ecosystem used _OSI
|
||||
not as designed, but as a direct replacement for _OS -- identifying
|
||||
the OS version, rather than an OS supported interface. Indeed, right
|
||||
from the start, the ACPI 3.0 spec itself codified this misuse
|
||||
in example code using _OSI("Windows 2001").
|
||||
|
||||
This misuse was adopted and continues today.
|
||||
|
||||
Linux had no choice but to also return TRUE to _OSI("Windows 2001")
|
||||
and its successors. To do otherwise would virtually guarantee breaking
|
||||
a BIOS that has been tested only with that _OSI returning TRUE.
|
||||
|
||||
This strategy is problematic, as Linux is never completely compatible with
|
||||
the latest version of Windows, and sometimes it takes more than a year
|
||||
to iron out incompatibilities.
|
||||
|
||||
Not to be out-done, the Linux community made things worse by returning TRUE
|
||||
to _OSI("Linux"). Doing so is even worse than the Windows misuse
|
||||
of _OSI, as "Linux" does not even contain any version information.
|
||||
_OSI("Linux") led to some BIOS' malfunctioning due to BIOS writer's
|
||||
using it in untested BIOS flows. But some OEM's used _OSI("Linux")
|
||||
in tested flows to support real Linux features. In 2009, Linux
|
||||
removed _OSI("Linux"), and added a cmdline parameter to restore it
|
||||
for legacy systems still needed it. Further a BIOS_BUG warning prints
|
||||
for all BIOS's that invoke it.
|
||||
|
||||
No BIOS should use _OSI("Linux").
|
||||
|
||||
The result is a strategy for Linux to maximize compatibility with
|
||||
ACPI BIOS that are tested on Windows machines. There is a real risk
|
||||
of over-stating that compatibility; but the alternative has often been
|
||||
catastrophic failure resulting from the BIOS taking paths that
|
||||
were never validated under *any* OS.
|
||||
|
||||
Do not use _REV
|
||||
---------------
|
||||
|
||||
Since _OSI("Linux") went away, some BIOS writers used _REV
|
||||
to support Linux and Windows differences in the same BIOS.
|
||||
|
||||
_REV was defined in ACPI 1.0 to return the version of ACPI
|
||||
supported by the OS and the OS AML interpreter.
|
||||
|
||||
Modern Windows returns _REV = 2. Linux used ACPI_CA_SUPPORT_LEVEL,
|
||||
which would increment, based on the version of the spec supported.
|
||||
|
||||
Unfortunately, _REV was also misused. eg. some BIOS would check
|
||||
for _REV = 3, and do something for Linux, but when Linux returned
|
||||
_REV = 4, that support broke.
|
||||
|
||||
In response to this problem, Linux returns _REV = 2 always,
|
||||
from mid-2015 onward. The ACPI specification will also be updated
|
||||
to reflect that _REV is deprecated, and always returns 2.
|
||||
|
||||
Apple Mac and _OSI("Darwin")
|
||||
----------------------------
|
||||
|
||||
On Apple's Mac platforms, the ACPI BIOS invokes _OSI("Darwin")
|
||||
to determine if the machine is running Apple OSX.
|
||||
|
||||
Like Linux's _OSI("*Windows*") strategy, Linux defaults to
|
||||
answering YES to _OSI("Darwin") to enable full access
|
||||
to the hardware and validated BIOS paths seen by OSX.
|
||||
Just like on Windows-tested platforms, this strategy has risks.
|
||||
|
||||
Starting in Linux-3.18, the kernel answered YES to _OSI("Darwin")
|
||||
for the purpose of enabling Mac Thunderbolt support. Further,
|
||||
if the kernel noticed _OSI("Darwin") being invoked, it additionally
|
||||
disabled all _OSI("*Windows*") to keep poorly written Mac BIOS
|
||||
from going down untested combinations of paths.
|
||||
|
||||
The Linux-3.18 change in default caused power regressions on Mac
|
||||
laptops, and the 3.18 implementation did not allow changing
|
||||
the default via cmdline "acpi_osi=!Darwin". Linux-4.7 fixed
|
||||
the ability to use acpi_osi=!Darwin as a workaround, and
|
||||
we hope to see Mac Thunderbolt power management support in Linux-4.11.
|
||||
@@ -101,6 +101,6 @@ received a notification, it will set the backlight level accordingly. This does
|
||||
not affect the sending of event to user space, they are always sent to user
|
||||
space regardless of whether or not the video module controls the backlight level
|
||||
directly. This behaviour can be controlled through the brightness_switch_enabled
|
||||
module parameter as documented in kernel-parameters.txt. It is recommended to
|
||||
module parameter as documented in admin-guide/kernel-parameters.rst. It is recommended to
|
||||
disable this behaviour once a GUI environment starts up and wants to have full
|
||||
control of the backlight level.
|
||||
|
||||
@@ -1,527 +0,0 @@
|
||||
Adding a New System Call
|
||||
========================
|
||||
|
||||
This document describes what's involved in adding a new system call to the
|
||||
Linux kernel, over and above the normal submission advice in
|
||||
Documentation/SubmittingPatches.
|
||||
|
||||
|
||||
System Call Alternatives
|
||||
------------------------
|
||||
|
||||
The first thing to consider when adding a new system call is whether one of
|
||||
the alternatives might be suitable instead. Although system calls are the
|
||||
most traditional and most obvious interaction points between userspace and the
|
||||
kernel, there are other possibilities -- choose what fits best for your
|
||||
interface.
|
||||
|
||||
- If the operations involved can be made to look like a filesystem-like
|
||||
object, it may make more sense to create a new filesystem or device. This
|
||||
also makes it easier to encapsulate the new functionality in a kernel module
|
||||
rather than requiring it to be built into the main kernel.
|
||||
- If the new functionality involves operations where the kernel notifies
|
||||
userspace that something has happened, then returning a new file
|
||||
descriptor for the relevant object allows userspace to use
|
||||
poll/select/epoll to receive that notification.
|
||||
- However, operations that don't map to read(2)/write(2)-like operations
|
||||
have to be implemented as ioctl(2) requests, which can lead to a
|
||||
somewhat opaque API.
|
||||
- If you're just exposing runtime system information, a new node in sysfs
|
||||
(see Documentation/filesystems/sysfs.txt) or the /proc filesystem may be
|
||||
more appropriate. However, access to these mechanisms requires that the
|
||||
relevant filesystem is mounted, which might not always be the case (e.g.
|
||||
in a namespaced/sandboxed/chrooted environment). Avoid adding any API to
|
||||
debugfs, as this is not considered a 'production' interface to userspace.
|
||||
- If the operation is specific to a particular file or file descriptor, then
|
||||
an additional fcntl(2) command option may be more appropriate. However,
|
||||
fcntl(2) is a multiplexing system call that hides a lot of complexity, so
|
||||
this option is best for when the new function is closely analogous to
|
||||
existing fcntl(2) functionality, or the new functionality is very simple
|
||||
(for example, getting/setting a simple flag related to a file descriptor).
|
||||
- If the operation is specific to a particular task or process, then an
|
||||
additional prctl(2) command option may be more appropriate. As with
|
||||
fcntl(2), this system call is a complicated multiplexor so is best reserved
|
||||
for near-analogs of existing prctl() commands or getting/setting a simple
|
||||
flag related to a process.
|
||||
|
||||
|
||||
Designing the API: Planning for Extension
|
||||
-----------------------------------------
|
||||
|
||||
A new system call forms part of the API of the kernel, and has to be supported
|
||||
indefinitely. As such, it's a very good idea to explicitly discuss the
|
||||
interface on the kernel mailing list, and it's important to plan for future
|
||||
extensions of the interface.
|
||||
|
||||
(The syscall table is littered with historical examples where this wasn't done,
|
||||
together with the corresponding follow-up system calls -- eventfd/eventfd2,
|
||||
dup2/dup3, inotify_init/inotify_init1, pipe/pipe2, renameat/renameat2 -- so
|
||||
learn from the history of the kernel and plan for extensions from the start.)
|
||||
|
||||
For simpler system calls that only take a couple of arguments, the preferred
|
||||
way to allow for future extensibility is to include a flags argument to the
|
||||
system call. To make sure that userspace programs can safely use flags
|
||||
between kernel versions, check whether the flags value holds any unknown
|
||||
flags, and reject the system call (with EINVAL) if it does:
|
||||
|
||||
if (flags & ~(THING_FLAG1 | THING_FLAG2 | THING_FLAG3))
|
||||
return -EINVAL;
|
||||
|
||||
(If no flags values are used yet, check that the flags argument is zero.)
|
||||
|
||||
For more sophisticated system calls that involve a larger number of arguments,
|
||||
it's preferred to encapsulate the majority of the arguments into a structure
|
||||
that is passed in by pointer. Such a structure can cope with future extension
|
||||
by including a size argument in the structure:
|
||||
|
||||
struct xyzzy_params {
|
||||
u32 size; /* userspace sets p->size = sizeof(struct xyzzy_params) */
|
||||
u32 param_1;
|
||||
u64 param_2;
|
||||
u64 param_3;
|
||||
};
|
||||
|
||||
As long as any subsequently added field, say param_4, is designed so that a
|
||||
zero value gives the previous behaviour, then this allows both directions of
|
||||
version mismatch:
|
||||
|
||||
- To cope with a later userspace program calling an older kernel, the kernel
|
||||
code should check that any memory beyond the size of the structure that it
|
||||
expects is zero (effectively checking that param_4 == 0).
|
||||
- To cope with an older userspace program calling a newer kernel, the kernel
|
||||
code can zero-extend a smaller instance of the structure (effectively
|
||||
setting param_4 = 0).
|
||||
|
||||
See perf_event_open(2) and the perf_copy_attr() function (in
|
||||
kernel/events/core.c) for an example of this approach.
|
||||
|
||||
|
||||
Designing the API: Other Considerations
|
||||
---------------------------------------
|
||||
|
||||
If your new system call allows userspace to refer to a kernel object, it
|
||||
should use a file descriptor as the handle for that object -- don't invent a
|
||||
new type of userspace object handle when the kernel already has mechanisms and
|
||||
well-defined semantics for using file descriptors.
|
||||
|
||||
If your new xyzzy(2) system call does return a new file descriptor, then the
|
||||
flags argument should include a value that is equivalent to setting O_CLOEXEC
|
||||
on the new FD. This makes it possible for userspace to close the timing
|
||||
window between xyzzy() and calling fcntl(fd, F_SETFD, FD_CLOEXEC), where an
|
||||
unexpected fork() and execve() in another thread could leak a descriptor to
|
||||
the exec'ed program. (However, resist the temptation to re-use the actual value
|
||||
of the O_CLOEXEC constant, as it is architecture-specific and is part of a
|
||||
numbering space of O_* flags that is fairly full.)
|
||||
|
||||
If your system call returns a new file descriptor, you should also consider
|
||||
what it means to use the poll(2) family of system calls on that file
|
||||
descriptor. Making a file descriptor ready for reading or writing is the
|
||||
normal way for the kernel to indicate to userspace that an event has
|
||||
occurred on the corresponding kernel object.
|
||||
|
||||
If your new xyzzy(2) system call involves a filename argument:
|
||||
|
||||
int sys_xyzzy(const char __user *path, ..., unsigned int flags);
|
||||
|
||||
you should also consider whether an xyzzyat(2) version is more appropriate:
|
||||
|
||||
int sys_xyzzyat(int dfd, const char __user *path, ..., unsigned int flags);
|
||||
|
||||
This allows more flexibility for how userspace specifies the file in question;
|
||||
in particular it allows userspace to request the functionality for an
|
||||
already-opened file descriptor using the AT_EMPTY_PATH flag, effectively giving
|
||||
an fxyzzy(3) operation for free:
|
||||
|
||||
- xyzzyat(AT_FDCWD, path, ..., 0) is equivalent to xyzzy(path,...)
|
||||
- xyzzyat(fd, "", ..., AT_EMPTY_PATH) is equivalent to fxyzzy(fd, ...)
|
||||
|
||||
(For more details on the rationale of the *at() calls, see the openat(2) man
|
||||
page; for an example of AT_EMPTY_PATH, see the fstatat(2) man page.)
|
||||
|
||||
If your new xyzzy(2) system call involves a parameter describing an offset
|
||||
within a file, make its type loff_t so that 64-bit offsets can be supported
|
||||
even on 32-bit architectures.
|
||||
|
||||
If your new xyzzy(2) system call involves privileged functionality, it needs
|
||||
to be governed by the appropriate Linux capability bit (checked with a call to
|
||||
capable()), as described in the capabilities(7) man page. Choose an existing
|
||||
capability bit that governs related functionality, but try to avoid combining
|
||||
lots of only vaguely related functions together under the same bit, as this
|
||||
goes against capabilities' purpose of splitting the power of root. In
|
||||
particular, avoid adding new uses of the already overly-general CAP_SYS_ADMIN
|
||||
capability.
|
||||
|
||||
If your new xyzzy(2) system call manipulates a process other than the calling
|
||||
process, it should be restricted (using a call to ptrace_may_access()) so that
|
||||
only a calling process with the same permissions as the target process, or
|
||||
with the necessary capabilities, can manipulate the target process.
|
||||
|
||||
Finally, be aware that some non-x86 architectures have an easier time if
|
||||
system call parameters that are explicitly 64-bit fall on odd-numbered
|
||||
arguments (i.e. parameter 1, 3, 5), to allow use of contiguous pairs of 32-bit
|
||||
registers. (This concern does not apply if the arguments are part of a
|
||||
structure that's passed in by pointer.)
|
||||
|
||||
|
||||
Proposing the API
|
||||
-----------------
|
||||
|
||||
To make new system calls easy to review, it's best to divide up the patchset
|
||||
into separate chunks. These should include at least the following items as
|
||||
distinct commits (each of which is described further below):
|
||||
|
||||
- The core implementation of the system call, together with prototypes,
|
||||
generic numbering, Kconfig changes and fallback stub implementation.
|
||||
- Wiring up of the new system call for one particular architecture, usually
|
||||
x86 (including all of x86_64, x86_32 and x32).
|
||||
- A demonstration of the use of the new system call in userspace via a
|
||||
selftest in tools/testing/selftests/.
|
||||
- A draft man-page for the new system call, either as plain text in the
|
||||
cover letter, or as a patch to the (separate) man-pages repository.
|
||||
|
||||
New system call proposals, like any change to the kernel's API, should always
|
||||
be cc'ed to linux-api@vger.kernel.org.
|
||||
|
||||
|
||||
Generic System Call Implementation
|
||||
----------------------------------
|
||||
|
||||
The main entry point for your new xyzzy(2) system call will be called
|
||||
sys_xyzzy(), but you add this entry point with the appropriate
|
||||
SYSCALL_DEFINEn() macro rather than explicitly. The 'n' indicates the number
|
||||
of arguments to the system call, and the macro takes the system call name
|
||||
followed by the (type, name) pairs for the parameters as arguments. Using
|
||||
this macro allows metadata about the new system call to be made available for
|
||||
other tools.
|
||||
|
||||
The new entry point also needs a corresponding function prototype, in
|
||||
include/linux/syscalls.h, marked as asmlinkage to match the way that system
|
||||
calls are invoked:
|
||||
|
||||
asmlinkage long sys_xyzzy(...);
|
||||
|
||||
Some architectures (e.g. x86) have their own architecture-specific syscall
|
||||
tables, but several other architectures share a generic syscall table. Add your
|
||||
new system call to the generic list by adding an entry to the list in
|
||||
include/uapi/asm-generic/unistd.h:
|
||||
|
||||
#define __NR_xyzzy 292
|
||||
__SYSCALL(__NR_xyzzy, sys_xyzzy)
|
||||
|
||||
Also update the __NR_syscalls count to reflect the additional system call, and
|
||||
note that if multiple new system calls are added in the same merge window,
|
||||
your new syscall number may get adjusted to resolve conflicts.
|
||||
|
||||
The file kernel/sys_ni.c provides a fallback stub implementation of each system
|
||||
call, returning -ENOSYS. Add your new system call here too:
|
||||
|
||||
cond_syscall(sys_xyzzy);
|
||||
|
||||
Your new kernel functionality, and the system call that controls it, should
|
||||
normally be optional, so add a CONFIG option (typically to init/Kconfig) for
|
||||
it. As usual for new CONFIG options:
|
||||
|
||||
- Include a description of the new functionality and system call controlled
|
||||
by the option.
|
||||
- Make the option depend on EXPERT if it should be hidden from normal users.
|
||||
- Make any new source files implementing the function dependent on the CONFIG
|
||||
option in the Makefile (e.g. "obj-$(CONFIG_XYZZY_SYSCALL) += xyzzy.c").
|
||||
- Double check that the kernel still builds with the new CONFIG option turned
|
||||
off.
|
||||
|
||||
To summarize, you need a commit that includes:
|
||||
|
||||
- CONFIG option for the new function, normally in init/Kconfig
|
||||
- SYSCALL_DEFINEn(xyzzy, ...) for the entry point
|
||||
- corresponding prototype in include/linux/syscalls.h
|
||||
- generic table entry in include/uapi/asm-generic/unistd.h
|
||||
- fallback stub in kernel/sys_ni.c
|
||||
|
||||
|
||||
x86 System Call Implementation
|
||||
------------------------------
|
||||
|
||||
To wire up your new system call for x86 platforms, you need to update the
|
||||
master syscall tables. Assuming your new system call isn't special in some
|
||||
way (see below), this involves a "common" entry (for x86_64 and x32) in
|
||||
arch/x86/entry/syscalls/syscall_64.tbl:
|
||||
|
||||
333 common xyzzy sys_xyzzy
|
||||
|
||||
and an "i386" entry in arch/x86/entry/syscalls/syscall_32.tbl:
|
||||
|
||||
380 i386 xyzzy sys_xyzzy
|
||||
|
||||
Again, these numbers are liable to be changed if there are conflicts in the
|
||||
relevant merge window.
|
||||
|
||||
|
||||
Compatibility System Calls (Generic)
|
||||
------------------------------------
|
||||
|
||||
For most system calls the same 64-bit implementation can be invoked even when
|
||||
the userspace program is itself 32-bit; even if the system call's parameters
|
||||
include an explicit pointer, this is handled transparently.
|
||||
|
||||
However, there are a couple of situations where a compatibility layer is
|
||||
needed to cope with size differences between 32-bit and 64-bit.
|
||||
|
||||
The first is if the 64-bit kernel also supports 32-bit userspace programs, and
|
||||
so needs to parse areas of (__user) memory that could hold either 32-bit or
|
||||
64-bit values. In particular, this is needed whenever a system call argument
|
||||
is:
|
||||
|
||||
- a pointer to a pointer
|
||||
- a pointer to a struct containing a pointer (e.g. struct iovec __user *)
|
||||
- a pointer to a varying sized integral type (time_t, off_t, long, ...)
|
||||
- a pointer to a struct containing a varying sized integral type.
|
||||
|
||||
The second situation that requires a compatibility layer is if one of the
|
||||
system call's arguments has a type that is explicitly 64-bit even on a 32-bit
|
||||
architecture, for example loff_t or __u64. In this case, a value that arrives
|
||||
at a 64-bit kernel from a 32-bit application will be split into two 32-bit
|
||||
values, which then need to be re-assembled in the compatibility layer.
|
||||
|
||||
(Note that a system call argument that's a pointer to an explicit 64-bit type
|
||||
does *not* need a compatibility layer; for example, splice(2)'s arguments of
|
||||
type loff_t __user * do not trigger the need for a compat_ system call.)
|
||||
|
||||
The compatibility version of the system call is called compat_sys_xyzzy(), and
|
||||
is added with the COMPAT_SYSCALL_DEFINEn() macro, analogously to
|
||||
SYSCALL_DEFINEn. This version of the implementation runs as part of a 64-bit
|
||||
kernel, but expects to receive 32-bit parameter values and does whatever is
|
||||
needed to deal with them. (Typically, the compat_sys_ version converts the
|
||||
values to 64-bit versions and either calls on to the sys_ version, or both of
|
||||
them call a common inner implementation function.)
|
||||
|
||||
The compat entry point also needs a corresponding function prototype, in
|
||||
include/linux/compat.h, marked as asmlinkage to match the way that system
|
||||
calls are invoked:
|
||||
|
||||
asmlinkage long compat_sys_xyzzy(...);
|
||||
|
||||
If the system call involves a structure that is laid out differently on 32-bit
|
||||
and 64-bit systems, say struct xyzzy_args, then the include/linux/compat.h
|
||||
header file should also include a compat version of the structure (struct
|
||||
compat_xyzzy_args) where each variable-size field has the appropriate compat_
|
||||
type that corresponds to the type in struct xyzzy_args. The
|
||||
compat_sys_xyzzy() routine can then use this compat_ structure to parse the
|
||||
arguments from a 32-bit invocation.
|
||||
|
||||
For example, if there are fields:
|
||||
|
||||
struct xyzzy_args {
|
||||
const char __user *ptr;
|
||||
__kernel_long_t varying_val;
|
||||
u64 fixed_val;
|
||||
/* ... */
|
||||
};
|
||||
|
||||
in struct xyzzy_args, then struct compat_xyzzy_args would have:
|
||||
|
||||
struct compat_xyzzy_args {
|
||||
compat_uptr_t ptr;
|
||||
compat_long_t varying_val;
|
||||
u64 fixed_val;
|
||||
/* ... */
|
||||
};
|
||||
|
||||
The generic system call list also needs adjusting to allow for the compat
|
||||
version; the entry in include/uapi/asm-generic/unistd.h should use
|
||||
__SC_COMP rather than __SYSCALL:
|
||||
|
||||
#define __NR_xyzzy 292
|
||||
__SC_COMP(__NR_xyzzy, sys_xyzzy, compat_sys_xyzzy)
|
||||
|
||||
To summarize, you need:
|
||||
|
||||
- a COMPAT_SYSCALL_DEFINEn(xyzzy, ...) for the compat entry point
|
||||
- corresponding prototype in include/linux/compat.h
|
||||
- (if needed) 32-bit mapping struct in include/linux/compat.h
|
||||
- instance of __SC_COMP not __SYSCALL in include/uapi/asm-generic/unistd.h
|
||||
|
||||
|
||||
Compatibility System Calls (x86)
|
||||
--------------------------------
|
||||
|
||||
To wire up the x86 architecture of a system call with a compatibility version,
|
||||
the entries in the syscall tables need to be adjusted.
|
||||
|
||||
First, the entry in arch/x86/entry/syscalls/syscall_32.tbl gets an extra
|
||||
column to indicate that a 32-bit userspace program running on a 64-bit kernel
|
||||
should hit the compat entry point:
|
||||
|
||||
380 i386 xyzzy sys_xyzzy compat_sys_xyzzy
|
||||
|
||||
Second, you need to figure out what should happen for the x32 ABI version of
|
||||
the new system call. There's a choice here: the layout of the arguments
|
||||
should either match the 64-bit version or the 32-bit version.
|
||||
|
||||
If there's a pointer-to-a-pointer involved, the decision is easy: x32 is
|
||||
ILP32, so the layout should match the 32-bit version, and the entry in
|
||||
arch/x86/entry/syscalls/syscall_64.tbl is split so that x32 programs hit the
|
||||
compatibility wrapper:
|
||||
|
||||
333 64 xyzzy sys_xyzzy
|
||||
...
|
||||
555 x32 xyzzy compat_sys_xyzzy
|
||||
|
||||
If no pointers are involved, then it is preferable to re-use the 64-bit system
|
||||
call for the x32 ABI (and consequently the entry in
|
||||
arch/x86/entry/syscalls/syscall_64.tbl is unchanged).
|
||||
|
||||
In either case, you should check that the types involved in your argument
|
||||
layout do indeed map exactly from x32 (-mx32) to either the 32-bit (-m32) or
|
||||
64-bit (-m64) equivalents.
|
||||
|
||||
|
||||
System Calls Returning Elsewhere
|
||||
--------------------------------
|
||||
|
||||
For most system calls, once the system call is complete the user program
|
||||
continues exactly where it left off -- at the next instruction, with the
|
||||
stack the same and most of the registers the same as before the system call,
|
||||
and with the same virtual memory space.
|
||||
|
||||
However, a few system calls do things differently. They might return to a
|
||||
different location (rt_sigreturn) or change the memory space (fork/vfork/clone)
|
||||
or even architecture (execve/execveat) of the program.
|
||||
|
||||
To allow for this, the kernel implementation of the system call may need to
|
||||
save and restore additional registers to the kernel stack, allowing complete
|
||||
control of where and how execution continues after the system call.
|
||||
|
||||
This is arch-specific, but typically involves defining assembly entry points
|
||||
that save/restore additional registers and invoke the real system call entry
|
||||
point.
|
||||
|
||||
For x86_64, this is implemented as a stub_xyzzy entry point in
|
||||
arch/x86/entry/entry_64.S, and the entry in the syscall table
|
||||
(arch/x86/entry/syscalls/syscall_64.tbl) is adjusted to match:
|
||||
|
||||
333 common xyzzy stub_xyzzy
|
||||
|
||||
The equivalent for 32-bit programs running on a 64-bit kernel is normally
|
||||
called stub32_xyzzy and implemented in arch/x86/entry/entry_64_compat.S,
|
||||
with the corresponding syscall table adjustment in
|
||||
arch/x86/entry/syscalls/syscall_32.tbl:
|
||||
|
||||
380 i386 xyzzy sys_xyzzy stub32_xyzzy
|
||||
|
||||
If the system call needs a compatibility layer (as in the previous section)
|
||||
then the stub32_ version needs to call on to the compat_sys_ version of the
|
||||
system call rather than the native 64-bit version. Also, if the x32 ABI
|
||||
implementation is not common with the x86_64 version, then its syscall
|
||||
table will also need to invoke a stub that calls on to the compat_sys_
|
||||
version.
|
||||
|
||||
For completeness, it's also nice to set up a mapping so that user-mode Linux
|
||||
still works -- its syscall table will reference stub_xyzzy, but the UML build
|
||||
doesn't include arch/x86/entry/entry_64.S implementation (because UML
|
||||
simulates registers etc). Fixing this is as simple as adding a #define to
|
||||
arch/x86/um/sys_call_table_64.c:
|
||||
|
||||
#define stub_xyzzy sys_xyzzy
|
||||
|
||||
|
||||
Other Details
|
||||
-------------
|
||||
|
||||
Most of the kernel treats system calls in a generic way, but there is the
|
||||
occasional exception that may need updating for your particular system call.
|
||||
|
||||
The audit subsystem is one such special case; it includes (arch-specific)
|
||||
functions that classify some special types of system call -- specifically
|
||||
file open (open/openat), program execution (execve/exeveat) or socket
|
||||
multiplexor (socketcall) operations. If your new system call is analogous to
|
||||
one of these, then the audit system should be updated.
|
||||
|
||||
More generally, if there is an existing system call that is analogous to your
|
||||
new system call, it's worth doing a kernel-wide grep for the existing system
|
||||
call to check there are no other special cases.
|
||||
|
||||
|
||||
Testing
|
||||
-------
|
||||
|
||||
A new system call should obviously be tested; it is also useful to provide
|
||||
reviewers with a demonstration of how user space programs will use the system
|
||||
call. A good way to combine these aims is to include a simple self-test
|
||||
program in a new directory under tools/testing/selftests/.
|
||||
|
||||
For a new system call, there will obviously be no libc wrapper function and so
|
||||
the test will need to invoke it using syscall(); also, if the system call
|
||||
involves a new userspace-visible structure, the corresponding header will need
|
||||
to be installed to compile the test.
|
||||
|
||||
Make sure the selftest runs successfully on all supported architectures. For
|
||||
example, check that it works when compiled as an x86_64 (-m64), x86_32 (-m32)
|
||||
and x32 (-mx32) ABI program.
|
||||
|
||||
For more extensive and thorough testing of new functionality, you should also
|
||||
consider adding tests to the Linux Test Project, or to the xfstests project
|
||||
for filesystem-related changes.
|
||||
- https://linux-test-project.github.io/
|
||||
- git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
|
||||
|
||||
|
||||
Man Page
|
||||
--------
|
||||
|
||||
All new system calls should come with a complete man page, ideally using groff
|
||||
markup, but plain text will do. If groff is used, it's helpful to include a
|
||||
pre-rendered ASCII version of the man page in the cover email for the
|
||||
patchset, for the convenience of reviewers.
|
||||
|
||||
The man page should be cc'ed to linux-man@vger.kernel.org
|
||||
For more details, see https://www.kernel.org/doc/man-pages/patches.html
|
||||
|
||||
References and Sources
|
||||
----------------------
|
||||
|
||||
- LWN article from Michael Kerrisk on use of flags argument in system calls:
|
||||
https://lwn.net/Articles/585415/
|
||||
- LWN article from Michael Kerrisk on how to handle unknown flags in a system
|
||||
call: https://lwn.net/Articles/588444/
|
||||
- LWN article from Jake Edge describing constraints on 64-bit system call
|
||||
arguments: https://lwn.net/Articles/311630/
|
||||
- Pair of LWN articles from David Drysdale that describe the system call
|
||||
implementation paths in detail for v3.14:
|
||||
- https://lwn.net/Articles/604287/
|
||||
- https://lwn.net/Articles/604515/
|
||||
- Architecture-specific requirements for system calls are discussed in the
|
||||
syscall(2) man-page:
|
||||
http://man7.org/linux/man-pages/man2/syscall.2.html#NOTES
|
||||
- Collated emails from Linus Torvalds discussing the problems with ioctl():
|
||||
http://yarchive.net/comp/linux/ioctl.html
|
||||
- "How to not invent kernel interfaces", Arnd Bergmann,
|
||||
http://www.ukuug.org/events/linux2007/2007/papers/Bergmann.pdf
|
||||
- LWN article from Michael Kerrisk on avoiding new uses of CAP_SYS_ADMIN:
|
||||
https://lwn.net/Articles/486306/
|
||||
- Recommendation from Andrew Morton that all related information for a new
|
||||
system call should come in the same email thread:
|
||||
https://lkml.org/lkml/2014/7/24/641
|
||||
- Recommendation from Michael Kerrisk that a new system call should come with
|
||||
a man page: https://lkml.org/lkml/2014/6/13/309
|
||||
- Suggestion from Thomas Gleixner that x86 wire-up should be in a separate
|
||||
commit: https://lkml.org/lkml/2014/11/19/254
|
||||
- Suggestion from Greg Kroah-Hartman that it's good for new system calls to
|
||||
come with a man-page & selftest: https://lkml.org/lkml/2014/3/19/710
|
||||
- Discussion from Michael Kerrisk of new system call vs. prctl(2) extension:
|
||||
https://lkml.org/lkml/2014/6/3/411
|
||||
- Suggestion from Ingo Molnar that system calls that involve multiple
|
||||
arguments should encapsulate those arguments in a struct, which includes a
|
||||
size field for future extensibility: https://lkml.org/lkml/2015/7/30/117
|
||||
- Numbering oddities arising from (re-)use of O_* numbering space flags:
|
||||
- commit 75069f2b5bfb ("vfs: renumber FMODE_NONOTIFY and add to uniqueness
|
||||
check")
|
||||
- commit 12ed2e36c98a ("fanotify: FMODE_NONOTIFY and __O_SYNC in sparc
|
||||
conflict")
|
||||
- commit bb458c644a59 ("Safer ABI for O_TMPFILE")
|
||||
- Discussion from Matthew Wilcox about restrictions on 64-bit arguments:
|
||||
https://lkml.org/lkml/2008/12/12/187
|
||||
- Recommendation from Greg Kroah-Hartman that unknown flags should be
|
||||
policed: https://lkml.org/lkml/2014/7/17/577
|
||||
- Recommendation from Linus Torvalds that x32 system calls should prefer
|
||||
compatibility with 64-bit versions rather than 32-bit versions:
|
||||
https://lkml.org/lkml/2011/8/31/244
|
||||
411
Documentation/admin-guide/README.rst
Normal file
411
Documentation/admin-guide/README.rst
Normal file
@@ -0,0 +1,411 @@
|
||||
Linux kernel release 4.x <http://kernel.org/>
|
||||
=============================================
|
||||
|
||||
These are the release notes for Linux version 4. Read them carefully,
|
||||
as they tell you what this is all about, explain how to install the
|
||||
kernel, and what to do if something goes wrong.
|
||||
|
||||
What is Linux?
|
||||
--------------
|
||||
|
||||
Linux is a clone of the operating system Unix, written from scratch by
|
||||
Linus Torvalds with assistance from a loosely-knit team of hackers across
|
||||
the Net. It aims towards POSIX and Single UNIX Specification compliance.
|
||||
|
||||
It has all the features you would expect in a modern fully-fledged Unix,
|
||||
including true multitasking, virtual memory, shared libraries, demand
|
||||
loading, shared copy-on-write executables, proper memory management,
|
||||
and multistack networking including IPv4 and IPv6.
|
||||
|
||||
It is distributed under the GNU General Public License - see the
|
||||
accompanying COPYING file for more details.
|
||||
|
||||
On what hardware does it run?
|
||||
-----------------------------
|
||||
|
||||
Although originally developed first for 32-bit x86-based PCs (386 or higher),
|
||||
today Linux also runs on (at least) the Compaq Alpha AXP, Sun SPARC and
|
||||
UltraSPARC, Motorola 68000, PowerPC, PowerPC64, ARM, Hitachi SuperH, Cell,
|
||||
IBM S/390, MIPS, HP PA-RISC, Intel IA-64, DEC VAX, AMD x86-64, AXIS CRIS,
|
||||
Xtensa, Tilera TILE, AVR32, ARC and Renesas M32R architectures.
|
||||
|
||||
Linux is easily portable to most general-purpose 32- or 64-bit architectures
|
||||
as long as they have a paged memory management unit (PMMU) and a port of the
|
||||
GNU C compiler (gcc) (part of The GNU Compiler Collection, GCC). Linux has
|
||||
also been ported to a number of architectures without a PMMU, although
|
||||
functionality is then obviously somewhat limited.
|
||||
Linux has also been ported to itself. You can now run the kernel as a
|
||||
userspace application - this is called UserMode Linux (UML).
|
||||
|
||||
Documentation
|
||||
-------------
|
||||
|
||||
- There is a lot of documentation available both in electronic form on
|
||||
the Internet and in books, both Linux-specific and pertaining to
|
||||
general UNIX questions. I'd recommend looking into the documentation
|
||||
subdirectories on any Linux FTP site for the LDP (Linux Documentation
|
||||
Project) books. This README is not meant to be documentation on the
|
||||
system: there are much better sources available.
|
||||
|
||||
- There are various README files in the Documentation/ subdirectory:
|
||||
these typically contain kernel-specific installation notes for some
|
||||
drivers for example. See Documentation/00-INDEX for a list of what
|
||||
is contained in each file. Please read the
|
||||
:ref:`Documentation/process/changes.rst <changes>` file, as it
|
||||
contains information about the problems, which may result by upgrading
|
||||
your kernel.
|
||||
|
||||
- The Documentation/DocBook/ subdirectory contains several guides for
|
||||
kernel developers and users. These guides can be rendered in a
|
||||
number of formats: PostScript (.ps), PDF, HTML, & man-pages, among others.
|
||||
After installation, ``make psdocs``, ``make pdfdocs``, ``make htmldocs``,
|
||||
or ``make mandocs`` will render the documentation in the requested format.
|
||||
|
||||
Installing the kernel source
|
||||
----------------------------
|
||||
|
||||
- If you install the full sources, put the kernel tarball in a
|
||||
directory where you have permissions (e.g. your home directory) and
|
||||
unpack it::
|
||||
|
||||
xz -cd linux-4.X.tar.xz | tar xvf -
|
||||
|
||||
Replace "X" with the version number of the latest kernel.
|
||||
|
||||
Do NOT use the /usr/src/linux area! This area has a (usually
|
||||
incomplete) set of kernel headers that are used by the library header
|
||||
files. They should match the library, and not get messed up by
|
||||
whatever the kernel-du-jour happens to be.
|
||||
|
||||
- You can also upgrade between 4.x releases by patching. Patches are
|
||||
distributed in the xz format. To install by patching, get all the
|
||||
newer patch files, enter the top level directory of the kernel source
|
||||
(linux-4.X) and execute::
|
||||
|
||||
xz -cd ../patch-4.x.xz | patch -p1
|
||||
|
||||
Replace "x" for all versions bigger than the version "X" of your current
|
||||
source tree, **in_order**, and you should be ok. You may want to remove
|
||||
the backup files (some-file-name~ or some-file-name.orig), and make sure
|
||||
that there are no failed patches (some-file-name# or some-file-name.rej).
|
||||
If there are, either you or I have made a mistake.
|
||||
|
||||
Unlike patches for the 4.x kernels, patches for the 4.x.y kernels
|
||||
(also known as the -stable kernels) are not incremental but instead apply
|
||||
directly to the base 4.x kernel. For example, if your base kernel is 4.0
|
||||
and you want to apply the 4.0.3 patch, you must not first apply the 4.0.1
|
||||
and 4.0.2 patches. Similarly, if you are running kernel version 4.0.2 and
|
||||
want to jump to 4.0.3, you must first reverse the 4.0.2 patch (that is,
|
||||
patch -R) **before** applying the 4.0.3 patch. You can read more on this in
|
||||
:ref:`Documentation/process/applying-patches.rst <applying_patches>`.
|
||||
|
||||
Alternatively, the script patch-kernel can be used to automate this
|
||||
process. It determines the current kernel version and applies any
|
||||
patches found::
|
||||
|
||||
linux/scripts/patch-kernel linux
|
||||
|
||||
The first argument in the command above is the location of the
|
||||
kernel source. Patches are applied from the current directory, but
|
||||
an alternative directory can be specified as the second argument.
|
||||
|
||||
- Make sure you have no stale .o files and dependencies lying around::
|
||||
|
||||
cd linux
|
||||
make mrproper
|
||||
|
||||
You should now have the sources correctly installed.
|
||||
|
||||
Software requirements
|
||||
---------------------
|
||||
|
||||
Compiling and running the 4.x kernels requires up-to-date
|
||||
versions of various software packages. Consult
|
||||
:ref:`Documentation/process/changes.rst <changes>` for the minimum version numbers
|
||||
required and how to get updates for these packages. Beware that using
|
||||
excessively old versions of these packages can cause indirect
|
||||
errors that are very difficult to track down, so don't assume that
|
||||
you can just update packages when obvious problems arise during
|
||||
build or operation.
|
||||
|
||||
Build directory for the kernel
|
||||
------------------------------
|
||||
|
||||
When compiling the kernel, all output files will per default be
|
||||
stored together with the kernel source code.
|
||||
Using the option ``make O=output/dir`` allows you to specify an alternate
|
||||
place for the output files (including .config).
|
||||
Example::
|
||||
|
||||
kernel source code: /usr/src/linux-4.X
|
||||
build directory: /home/name/build/kernel
|
||||
|
||||
To configure and build the kernel, use::
|
||||
|
||||
cd /usr/src/linux-4.X
|
||||
make O=/home/name/build/kernel menuconfig
|
||||
make O=/home/name/build/kernel
|
||||
sudo make O=/home/name/build/kernel modules_install install
|
||||
|
||||
Please note: If the ``O=output/dir`` option is used, then it must be
|
||||
used for all invocations of make.
|
||||
|
||||
Configuring the kernel
|
||||
----------------------
|
||||
|
||||
Do not skip this step even if you are only upgrading one minor
|
||||
version. New configuration options are added in each release, and
|
||||
odd problems will turn up if the configuration files are not set up
|
||||
as expected. If you want to carry your existing configuration to a
|
||||
new version with minimal work, use ``make oldconfig``, which will
|
||||
only ask you for the answers to new questions.
|
||||
|
||||
- Alternative configuration commands are::
|
||||
|
||||
"make config" Plain text interface.
|
||||
|
||||
"make menuconfig" Text based color menus, radiolists & dialogs.
|
||||
|
||||
"make nconfig" Enhanced text based color menus.
|
||||
|
||||
"make xconfig" Qt based configuration tool.
|
||||
|
||||
"make gconfig" GTK+ based configuration tool.
|
||||
|
||||
"make oldconfig" Default all questions based on the contents of
|
||||
your existing ./.config file and asking about
|
||||
new config symbols.
|
||||
|
||||
"make silentoldconfig"
|
||||
Like above, but avoids cluttering the screen
|
||||
with questions already answered.
|
||||
Additionally updates the dependencies.
|
||||
|
||||
"make olddefconfig"
|
||||
Like above, but sets new symbols to their default
|
||||
values without prompting.
|
||||
|
||||
"make defconfig" Create a ./.config file by using the default
|
||||
symbol values from either arch/$ARCH/defconfig
|
||||
or arch/$ARCH/configs/${PLATFORM}_defconfig,
|
||||
depending on the architecture.
|
||||
|
||||
"make ${PLATFORM}_defconfig"
|
||||
Create a ./.config file by using the default
|
||||
symbol values from
|
||||
arch/$ARCH/configs/${PLATFORM}_defconfig.
|
||||
Use "make help" to get a list of all available
|
||||
platforms of your architecture.
|
||||
|
||||
"make allyesconfig"
|
||||
Create a ./.config file by setting symbol
|
||||
values to 'y' as much as possible.
|
||||
|
||||
"make allmodconfig"
|
||||
Create a ./.config file by setting symbol
|
||||
values to 'm' as much as possible.
|
||||
|
||||
"make allnoconfig" Create a ./.config file by setting symbol
|
||||
values to 'n' as much as possible.
|
||||
|
||||
"make randconfig" Create a ./.config file by setting symbol
|
||||
values to random values.
|
||||
|
||||
"make localmodconfig" Create a config based on current config and
|
||||
loaded modules (lsmod). Disables any module
|
||||
option that is not needed for the loaded modules.
|
||||
|
||||
To create a localmodconfig for another machine,
|
||||
store the lsmod of that machine into a file
|
||||
and pass it in as a LSMOD parameter.
|
||||
|
||||
target$ lsmod > /tmp/mylsmod
|
||||
target$ scp /tmp/mylsmod host:/tmp
|
||||
|
||||
host$ make LSMOD=/tmp/mylsmod localmodconfig
|
||||
|
||||
The above also works when cross compiling.
|
||||
|
||||
"make localyesconfig" Similar to localmodconfig, except it will convert
|
||||
all module options to built in (=y) options.
|
||||
|
||||
You can find more information on using the Linux kernel config tools
|
||||
in Documentation/kbuild/kconfig.txt.
|
||||
|
||||
- NOTES on ``make config``:
|
||||
|
||||
- Having unnecessary drivers will make the kernel bigger, and can
|
||||
under some circumstances lead to problems: probing for a
|
||||
nonexistent controller card may confuse your other controllers
|
||||
|
||||
- A kernel with math-emulation compiled in will still use the
|
||||
coprocessor if one is present: the math emulation will just
|
||||
never get used in that case. The kernel will be slightly larger,
|
||||
but will work on different machines regardless of whether they
|
||||
have a math coprocessor or not.
|
||||
|
||||
- The "kernel hacking" configuration details usually result in a
|
||||
bigger or slower kernel (or both), and can even make the kernel
|
||||
less stable by configuring some routines to actively try to
|
||||
break bad code to find kernel problems (kmalloc()). Thus you
|
||||
should probably answer 'n' to the questions for "development",
|
||||
"experimental", or "debugging" features.
|
||||
|
||||
Compiling the kernel
|
||||
--------------------
|
||||
|
||||
- Make sure you have at least gcc 3.2 available.
|
||||
For more information, refer to :ref:`Documentation/process/changes.rst <changes>`.
|
||||
|
||||
Please note that you can still run a.out user programs with this kernel.
|
||||
|
||||
- Do a ``make`` to create a compressed kernel image. It is also
|
||||
possible to do ``make install`` if you have lilo installed to suit the
|
||||
kernel makefiles, but you may want to check your particular lilo setup first.
|
||||
|
||||
To do the actual install, you have to be root, but none of the normal
|
||||
build should require that. Don't take the name of root in vain.
|
||||
|
||||
- If you configured any of the parts of the kernel as ``modules``, you
|
||||
will also have to do ``make modules_install``.
|
||||
|
||||
- Verbose kernel compile/build output:
|
||||
|
||||
Normally, the kernel build system runs in a fairly quiet mode (but not
|
||||
totally silent). However, sometimes you or other kernel developers need
|
||||
to see compile, link, or other commands exactly as they are executed.
|
||||
For this, use "verbose" build mode. This is done by passing
|
||||
``V=1`` to the ``make`` command, e.g.::
|
||||
|
||||
make V=1 all
|
||||
|
||||
To have the build system also tell the reason for the rebuild of each
|
||||
target, use ``V=2``. The default is ``V=0``.
|
||||
|
||||
- Keep a backup kernel handy in case something goes wrong. This is
|
||||
especially true for the development releases, since each new release
|
||||
contains new code which has not been debugged. Make sure you keep a
|
||||
backup of the modules corresponding to that kernel, as well. If you
|
||||
are installing a new kernel with the same version number as your
|
||||
working kernel, make a backup of your modules directory before you
|
||||
do a ``make modules_install``.
|
||||
|
||||
Alternatively, before compiling, use the kernel config option
|
||||
"LOCALVERSION" to append a unique suffix to the regular kernel version.
|
||||
LOCALVERSION can be set in the "General Setup" menu.
|
||||
|
||||
- In order to boot your new kernel, you'll need to copy the kernel
|
||||
image (e.g. .../linux/arch/x86/boot/bzImage after compilation)
|
||||
to the place where your regular bootable kernel is found.
|
||||
|
||||
- Booting a kernel directly from a floppy without the assistance of a
|
||||
bootloader such as LILO, is no longer supported.
|
||||
|
||||
If you boot Linux from the hard drive, chances are you use LILO, which
|
||||
uses the kernel image as specified in the file /etc/lilo.conf. The
|
||||
kernel image file is usually /vmlinuz, /boot/vmlinuz, /bzImage or
|
||||
/boot/bzImage. To use the new kernel, save a copy of the old image
|
||||
and copy the new image over the old one. Then, you MUST RERUN LILO
|
||||
to update the loading map! If you don't, you won't be able to boot
|
||||
the new kernel image.
|
||||
|
||||
Reinstalling LILO is usually a matter of running /sbin/lilo.
|
||||
You may wish to edit /etc/lilo.conf to specify an entry for your
|
||||
old kernel image (say, /vmlinux.old) in case the new one does not
|
||||
work. See the LILO docs for more information.
|
||||
|
||||
After reinstalling LILO, you should be all set. Shutdown the system,
|
||||
reboot, and enjoy!
|
||||
|
||||
If you ever need to change the default root device, video mode,
|
||||
ramdisk size, etc. in the kernel image, use the ``rdev`` program (or
|
||||
alternatively the LILO boot options when appropriate). No need to
|
||||
recompile the kernel to change these parameters.
|
||||
|
||||
- Reboot with the new kernel and enjoy.
|
||||
|
||||
If something goes wrong
|
||||
-----------------------
|
||||
|
||||
- If you have problems that seem to be due to kernel bugs, please check
|
||||
the file MAINTAINERS to see if there is a particular person associated
|
||||
with the part of the kernel that you are having trouble with. If there
|
||||
isn't anyone listed there, then the second best thing is to mail
|
||||
them to me (torvalds@linux-foundation.org), and possibly to any other
|
||||
relevant mailing-list or to the newsgroup.
|
||||
|
||||
- In all bug-reports, *please* tell what kernel you are talking about,
|
||||
how to duplicate the problem, and what your setup is (use your common
|
||||
sense). If the problem is new, tell me so, and if the problem is
|
||||
old, please try to tell me when you first noticed it.
|
||||
|
||||
- If the bug results in a message like::
|
||||
|
||||
unable to handle kernel paging request at address C0000010
|
||||
Oops: 0002
|
||||
EIP: 0010:XXXXXXXX
|
||||
eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx
|
||||
esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx
|
||||
ds: xxxx es: xxxx fs: xxxx gs: xxxx
|
||||
Pid: xx, process nr: xx
|
||||
xx xx xx xx xx xx xx xx xx xx
|
||||
|
||||
or similar kernel debugging information on your screen or in your
|
||||
system log, please duplicate it *exactly*. The dump may look
|
||||
incomprehensible to you, but it does contain information that may
|
||||
help debugging the problem. The text above the dump is also
|
||||
important: it tells something about why the kernel dumped code (in
|
||||
the above example, it's due to a bad kernel pointer). More information
|
||||
on making sense of the dump is in Documentation/admin-guide/oops-tracing.rst
|
||||
|
||||
- If you compiled the kernel with CONFIG_KALLSYMS you can send the dump
|
||||
as is, otherwise you will have to use the ``ksymoops`` program to make
|
||||
sense of the dump (but compiling with CONFIG_KALLSYMS is usually preferred).
|
||||
This utility can be downloaded from
|
||||
ftp://ftp.<country>.kernel.org/pub/linux/utils/kernel/ksymoops/ .
|
||||
Alternatively, you can do the dump lookup by hand:
|
||||
|
||||
- In debugging dumps like the above, it helps enormously if you can
|
||||
look up what the EIP value means. The hex value as such doesn't help
|
||||
me or anybody else very much: it will depend on your particular
|
||||
kernel setup. What you should do is take the hex value from the EIP
|
||||
line (ignore the ``0010:``), and look it up in the kernel namelist to
|
||||
see which kernel function contains the offending address.
|
||||
|
||||
To find out the kernel function name, you'll need to find the system
|
||||
binary associated with the kernel that exhibited the symptom. This is
|
||||
the file 'linux/vmlinux'. To extract the namelist and match it against
|
||||
the EIP from the kernel crash, do::
|
||||
|
||||
nm vmlinux | sort | less
|
||||
|
||||
This will give you a list of kernel addresses sorted in ascending
|
||||
order, from which it is simple to find the function that contains the
|
||||
offending address. Note that the address given by the kernel
|
||||
debugging messages will not necessarily match exactly with the
|
||||
function addresses (in fact, that is very unlikely), so you can't
|
||||
just 'grep' the list: the list will, however, give you the starting
|
||||
point of each kernel function, so by looking for the function that
|
||||
has a starting address lower than the one you are searching for but
|
||||
is followed by a function with a higher address you will find the one
|
||||
you want. In fact, it may be a good idea to include a bit of
|
||||
"context" in your problem report, giving a few lines around the
|
||||
interesting one.
|
||||
|
||||
If you for some reason cannot do the above (you have a pre-compiled
|
||||
kernel image or similar), telling me as much about your setup as
|
||||
possible will help. Please read the :ref:`admin-guide/reporting-bugs.rst <reportingbugs>`
|
||||
document for details.
|
||||
|
||||
- Alternatively, you can use gdb on a running kernel. (read-only; i.e. you
|
||||
cannot change values or set break points.) To do this, first compile the
|
||||
kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make
|
||||
clean``. You'll also need to enable CONFIG_PROC_FS (via ``make config``).
|
||||
|
||||
After you've rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``.
|
||||
You can now use all the usual gdb commands. The command to look up the
|
||||
point where your system crashed is ``l *0xXXXXXXXX``. (Replace the XXXes
|
||||
with the EIP value.)
|
||||
|
||||
gdb'ing a non-running kernel currently fails because ``gdb`` (wrongly)
|
||||
disregards the starting offset for which the kernel is compiled.
|
||||
151
Documentation/admin-guide/binfmt-misc.rst
Normal file
151
Documentation/admin-guide/binfmt-misc.rst
Normal file
@@ -0,0 +1,151 @@
|
||||
Kernel Support for miscellaneous (your favourite) Binary Formats v1.1
|
||||
=====================================================================
|
||||
|
||||
This Kernel feature allows you to invoke almost (for restrictions see below)
|
||||
every program by simply typing its name in the shell.
|
||||
This includes for example compiled Java(TM), Python or Emacs programs.
|
||||
|
||||
To achieve this you must tell binfmt_misc which interpreter has to be invoked
|
||||
with which binary. Binfmt_misc recognises the binary-type by matching some bytes
|
||||
at the beginning of the file with a magic byte sequence (masking out specified
|
||||
bits) you have supplied. Binfmt_misc can also recognise a filename extension
|
||||
aka ``.com`` or ``.exe``.
|
||||
|
||||
First you must mount binfmt_misc::
|
||||
|
||||
mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
|
||||
|
||||
To actually register a new binary type, you have to set up a string looking like
|
||||
``:name:type:offset:magic:mask:interpreter:flags`` (where you can choose the
|
||||
``:`` upon your needs) and echo it to ``/proc/sys/fs/binfmt_misc/register``.
|
||||
|
||||
Here is what the fields mean:
|
||||
|
||||
- ``name``
|
||||
is an identifier string. A new /proc file will be created with this
|
||||
``name below /proc/sys/fs/binfmt_misc``; cannot contain slashes ``/`` for
|
||||
obvious reasons.
|
||||
- ``type``
|
||||
is the type of recognition. Give ``M`` for magic and ``E`` for extension.
|
||||
- ``offset``
|
||||
is the offset of the magic/mask in the file, counted in bytes. This
|
||||
defaults to 0 if you omit it (i.e. you write ``:name:type::magic...``).
|
||||
Ignored when using filename extension matching.
|
||||
- ``magic``
|
||||
is the byte sequence binfmt_misc is matching for. The magic string
|
||||
may contain hex-encoded characters like ``\x0a`` or ``\xA4``. Note that you
|
||||
must escape any NUL bytes; parsing halts at the first one. In a shell
|
||||
environment you might have to write ``\\x0a`` to prevent the shell from
|
||||
eating your ``\``.
|
||||
If you chose filename extension matching, this is the extension to be
|
||||
recognised (without the ``.``, the ``\x0a`` specials are not allowed).
|
||||
Extension matching is case sensitive, and slashes ``/`` are not allowed!
|
||||
- ``mask``
|
||||
is an (optional, defaults to all 0xff) mask. You can mask out some
|
||||
bits from matching by supplying a string like magic and as long as magic.
|
||||
The mask is anded with the byte sequence of the file. Note that you must
|
||||
escape any NUL bytes; parsing halts at the first one. Ignored when using
|
||||
filename extension matching.
|
||||
- ``interpreter``
|
||||
is the program that should be invoked with the binary as first
|
||||
argument (specify the full path)
|
||||
- ``flags``
|
||||
is an optional field that controls several aspects of the invocation
|
||||
of the interpreter. It is a string of capital letters, each controls a
|
||||
certain aspect. The following flags are supported:
|
||||
|
||||
``P`` - preserve-argv[0]
|
||||
Legacy behavior of binfmt_misc is to overwrite
|
||||
the original argv[0] with the full path to the binary. When this
|
||||
flag is included, binfmt_misc will add an argument to the argument
|
||||
vector for this purpose, thus preserving the original ``argv[0]``.
|
||||
e.g. If your interp is set to ``/bin/foo`` and you run ``blah``
|
||||
(which is in ``/usr/local/bin``), then the kernel will execute
|
||||
``/bin/foo`` with ``argv[]`` set to ``["/bin/foo", "/usr/local/bin/blah", "blah"]``. The interp has to be aware of this so it can
|
||||
execute ``/usr/local/bin/blah``
|
||||
with ``argv[]`` set to ``["blah"]``.
|
||||
``O`` - open-binary
|
||||
Legacy behavior of binfmt_misc is to pass the full path
|
||||
of the binary to the interpreter as an argument. When this flag is
|
||||
included, binfmt_misc will open the file for reading and pass its
|
||||
descriptor as an argument, instead of the full path, thus allowing
|
||||
the interpreter to execute non-readable binaries. This feature
|
||||
should be used with care - the interpreter has to be trusted not to
|
||||
emit the contents of the non-readable binary.
|
||||
``C`` - credentials
|
||||
Currently, the behavior of binfmt_misc is to calculate
|
||||
the credentials and security token of the new process according to
|
||||
the interpreter. When this flag is included, these attributes are
|
||||
calculated according to the binary. It also implies the ``O`` flag.
|
||||
This feature should be used with care as the interpreter
|
||||
will run with root permissions when a setuid binary owned by root
|
||||
is run with binfmt_misc.
|
||||
``F`` - fix binary
|
||||
The usual behaviour of binfmt_misc is to spawn the
|
||||
binary lazily when the misc format file is invoked. However,
|
||||
this doesn``t work very well in the face of mount namespaces and
|
||||
changeroots, so the ``F`` mode opens the binary as soon as the
|
||||
emulation is installed and uses the opened image to spawn the
|
||||
emulator, meaning it is always available once installed,
|
||||
regardless of how the environment changes.
|
||||
|
||||
|
||||
There are some restrictions:
|
||||
|
||||
- the whole register string may not exceed 1920 characters
|
||||
- the magic must reside in the first 128 bytes of the file, i.e.
|
||||
offset+size(magic) has to be less than 128
|
||||
- the interpreter string may not exceed 127 characters
|
||||
|
||||
To use binfmt_misc you have to mount it first. You can mount it with
|
||||
``mount -t binfmt_misc none /proc/sys/fs/binfmt_misc`` command, or you can add
|
||||
a line ``none /proc/sys/fs/binfmt_misc binfmt_misc defaults 0 0`` to your
|
||||
``/etc/fstab`` so it auto mounts on boot.
|
||||
|
||||
You may want to add the binary formats in one of your ``/etc/rc`` scripts during
|
||||
boot-up. Read the manual of your init program to figure out how to do this
|
||||
right.
|
||||
|
||||
Think about the order of adding entries! Later added entries are matched first!
|
||||
|
||||
|
||||
A few examples (assumed you are in ``/proc/sys/fs/binfmt_misc``):
|
||||
|
||||
- enable support for em86 (like binfmt_em86, for Alpha AXP only)::
|
||||
|
||||
echo ':i386:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
|
||||
echo ':i486:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x06:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
|
||||
|
||||
- enable support for packed DOS applications (pre-configured dosemu hdimages)::
|
||||
|
||||
echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register
|
||||
|
||||
- enable support for Windows executables using wine::
|
||||
|
||||
echo ':DOSWin:M::MZ::/usr/local/bin/wine:' > register
|
||||
|
||||
For java support see Documentation/admin-guide/java.rst
|
||||
|
||||
|
||||
You can enable/disable binfmt_misc or one binary type by echoing 0 (to disable)
|
||||
or 1 (to enable) to ``/proc/sys/fs/binfmt_misc/status`` or
|
||||
``/proc/.../the_name``.
|
||||
Catting the file tells you the current status of ``binfmt_misc/the_entry``.
|
||||
|
||||
You can remove one entry or all entries by echoing -1 to ``/proc/.../the_name``
|
||||
or ``/proc/sys/fs/binfmt_misc/status``.
|
||||
|
||||
|
||||
Hints
|
||||
-----
|
||||
|
||||
If you want to pass special arguments to your interpreter, you can
|
||||
write a wrapper script for it. See Documentation/admin-guide/java.rst for an
|
||||
example.
|
||||
|
||||
Your interpreter should NOT look in the PATH for the filename; the kernel
|
||||
passes it the full filename (or the file descriptor) to use. Using ``$PATH`` can
|
||||
cause unexpected behaviour and can be a security hazard.
|
||||
|
||||
|
||||
Richard Günther <rguenth@tat.physik.uni-tuebingen.de>
|
||||
38
Documentation/admin-guide/braille-console.rst
Normal file
38
Documentation/admin-guide/braille-console.rst
Normal file
@@ -0,0 +1,38 @@
|
||||
Linux Braille Console
|
||||
=====================
|
||||
|
||||
To get early boot messages on a braille device (before userspace screen
|
||||
readers can start), you first need to compile the support for the usual serial
|
||||
console (see :ref:`Documentation/admin-guide/serial-console.rst <serial_console>`), and
|
||||
for braille device
|
||||
(in :menuselection:`Device Drivers --> Accessibility support --> Console on braille device`).
|
||||
|
||||
Then you need to specify a ``console=brl``, option on the kernel command line, the
|
||||
format is::
|
||||
|
||||
console=brl,serial_options...
|
||||
|
||||
where ``serial_options...`` are the same as described in
|
||||
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`.
|
||||
|
||||
So for instance you can use ``console=brl,ttyS0`` if the braille device is connected to the first serial port, and ``console=brl,ttyS0,115200`` to
|
||||
override the baud rate to 115200, etc.
|
||||
|
||||
By default, the braille device will just show the last kernel message (console
|
||||
mode). To review previous messages, press the Insert key to switch to the VT
|
||||
review mode. In review mode, the arrow keys permit to browse in the VT content,
|
||||
:kbd:`PAGE-UP`/:kbd:`PAGE-DOWN` keys go at the top/bottom of the screen, and
|
||||
the :kbd:`HOME` key goes back
|
||||
to the cursor, hence providing very basic screen reviewing facility.
|
||||
|
||||
Sound feedback can be obtained by adding the ``braille_console.sound=1`` kernel
|
||||
parameter.
|
||||
|
||||
For simplicity, only one braille console can be enabled, other uses of
|
||||
``console=brl,...`` will be discarded. Also note that it does not interfere with
|
||||
the console selection mechanism described in
|
||||
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`.
|
||||
|
||||
For now, only the VisioBraille device is supported.
|
||||
|
||||
Samuel Thibault <samuel.thibault@ens-lyon.org>
|
||||
76
Documentation/admin-guide/bug-bisect.rst
Normal file
76
Documentation/admin-guide/bug-bisect.rst
Normal file
@@ -0,0 +1,76 @@
|
||||
Bisecting a bug
|
||||
+++++++++++++++
|
||||
|
||||
Last updated: 28 October 2016
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
Always try the latest kernel from kernel.org and build from source. If you are
|
||||
not confident in doing that please report the bug to your distribution vendor
|
||||
instead of to a kernel developer.
|
||||
|
||||
Finding bugs is not always easy. Have a go though. If you can't find it don't
|
||||
give up. Report as much as you have found to the relevant maintainer. See
|
||||
MAINTAINERS for who that is for the subsystem you have worked on.
|
||||
|
||||
Before you submit a bug report read
|
||||
:ref:`Documentation/admin-guide/reporting-bugs.rst <reportingbugs>`.
|
||||
|
||||
Devices not appearing
|
||||
=====================
|
||||
|
||||
Often this is caused by udev/systemd. Check that first before blaming it
|
||||
on the kernel.
|
||||
|
||||
Finding patch that caused a bug
|
||||
===============================
|
||||
|
||||
Using the provided tools with ``git`` makes finding bugs easy provided the bug
|
||||
is reproducible.
|
||||
|
||||
Steps to do it:
|
||||
|
||||
- build the Kernel from its git source
|
||||
- start bisect with [#f1]_::
|
||||
|
||||
$ git bisect start
|
||||
|
||||
- mark the broken changeset with::
|
||||
|
||||
$ git bisect bad [commit]
|
||||
|
||||
- mark a changeset where the code is known to work with::
|
||||
|
||||
$ git bisect good [commit]
|
||||
|
||||
- rebuild the Kernel and test
|
||||
- interact with git bisect by using either::
|
||||
|
||||
$ git bisect good
|
||||
|
||||
or::
|
||||
|
||||
$ git bisect bad
|
||||
|
||||
depending if the bug happened on the changeset you're testing
|
||||
- After some interactions, git bisect will give you the changeset that
|
||||
likely caused the bug.
|
||||
|
||||
- For example, if you know that the current version is bad, and version
|
||||
4.8 is good, you could do::
|
||||
|
||||
$ git bisect start
|
||||
$ git bisect bad # Current version is bad
|
||||
$ git bisect good v4.8
|
||||
|
||||
|
||||
.. [#f1] You can, optionally, provide both good and bad arguments at git
|
||||
start with ``git bisect start [BAD] [GOOD]``
|
||||
|
||||
For further references, please read:
|
||||
|
||||
- The man page for ``git-bisect``
|
||||
- `Fighting regressions with git bisect <https://www.kernel.org/pub/software/scm/git/docs/git-bisect-lk2009.html>`_
|
||||
- `Fully automated bisecting with "git bisect run" <https://lwn.net/Articles/317154>`_
|
||||
- `Using Git bisect to figure out when brokenness was introduced <http://webchick.net/node/99>`_
|
||||
369
Documentation/admin-guide/bug-hunting.rst
Normal file
369
Documentation/admin-guide/bug-hunting.rst
Normal file
@@ -0,0 +1,369 @@
|
||||
Bug hunting
|
||||
===========
|
||||
|
||||
Kernel bug reports often come with a stack dump like the one below::
|
||||
|
||||
------------[ cut here ]------------
|
||||
WARNING: CPU: 1 PID: 28102 at kernel/module.c:1108 module_put+0x57/0x70
|
||||
Modules linked in: dvb_usb_gp8psk(-) dvb_usb dvb_core nvidia_drm(PO) nvidia_modeset(PO) snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd soundcore nvidia(PO) [last unloaded: rc_core]
|
||||
CPU: 1 PID: 28102 Comm: rmmod Tainted: P WC O 4.8.4-build.1 #1
|
||||
Hardware name: MSI MS-7309/MS-7309, BIOS V1.12 02/23/2009
|
||||
00000000 c12ba080 00000000 00000000 c103ed6a c1616014 00000001 00006dc6
|
||||
c1615862 00000454 c109e8a7 c109e8a7 00000009 ffffffff 00000000 f13f6a10
|
||||
f5f5a600 c103ee33 00000009 00000000 00000000 c109e8a7 f80ca4d0 c109f617
|
||||
Call Trace:
|
||||
[<c12ba080>] ? dump_stack+0x44/0x64
|
||||
[<c103ed6a>] ? __warn+0xfa/0x120
|
||||
[<c109e8a7>] ? module_put+0x57/0x70
|
||||
[<c109e8a7>] ? module_put+0x57/0x70
|
||||
[<c103ee33>] ? warn_slowpath_null+0x23/0x30
|
||||
[<c109e8a7>] ? module_put+0x57/0x70
|
||||
[<f80ca4d0>] ? gp8psk_fe_set_frontend+0x460/0x460 [dvb_usb_gp8psk]
|
||||
[<c109f617>] ? symbol_put_addr+0x27/0x50
|
||||
[<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]
|
||||
[<f80bb3bf>] ? dvb_usb_exit+0x2f/0xd0 [dvb_usb]
|
||||
[<c13d03bc>] ? usb_disable_endpoint+0x7c/0xb0
|
||||
[<f80bb48a>] ? dvb_usb_device_exit+0x2a/0x50 [dvb_usb]
|
||||
[<c13d2882>] ? usb_unbind_interface+0x62/0x250
|
||||
[<c136b514>] ? __pm_runtime_idle+0x44/0x70
|
||||
[<c13620d8>] ? __device_release_driver+0x78/0x120
|
||||
[<c1362907>] ? driver_detach+0x87/0x90
|
||||
[<c1361c48>] ? bus_remove_driver+0x38/0x90
|
||||
[<c13d1c18>] ? usb_deregister+0x58/0xb0
|
||||
[<c109fbb0>] ? SyS_delete_module+0x130/0x1f0
|
||||
[<c1055654>] ? task_work_run+0x64/0x80
|
||||
[<c1000fa5>] ? exit_to_usermode_loop+0x85/0x90
|
||||
[<c10013f0>] ? do_fast_syscall_32+0x80/0x130
|
||||
[<c1549f43>] ? sysenter_past_esp+0x40/0x6a
|
||||
---[ end trace 6ebc60ef3981792f ]---
|
||||
|
||||
Such stack traces provide enough information to identify the line inside the
|
||||
Kernel's source code where the bug happened. Depending on the severity of
|
||||
the issue, it may also contain the word **Oops**, as on this one::
|
||||
|
||||
BUG: unable to handle kernel NULL pointer dereference at (null)
|
||||
IP: [<c06969d4>] iret_exc+0x7d0/0xa59
|
||||
*pdpt = 000000002258a001 *pde = 0000000000000000
|
||||
Oops: 0002 [#1] PREEMPT SMP
|
||||
...
|
||||
|
||||
Despite being an **Oops** or some other sort of stack trace, the offended
|
||||
line is usually required to identify and handle the bug. Along this chapter,
|
||||
we'll refer to "Oops" for all kinds of stack traces that need to be analized.
|
||||
|
||||
.. note::
|
||||
|
||||
``ksymoops`` is useless on 2.6 or upper. Please use the Oops in its original
|
||||
format (from ``dmesg``, etc). Ignore any references in this or other docs to
|
||||
"decoding the Oops" or "running it through ksymoops".
|
||||
If you post an Oops from 2.6+ that has been run through ``ksymoops``,
|
||||
people will just tell you to repost it.
|
||||
|
||||
Where is the Oops message is located?
|
||||
-------------------------------------
|
||||
|
||||
Normally the Oops text is read from the kernel buffers by klogd and
|
||||
handed to ``syslogd`` which writes it to a syslog file, typically
|
||||
``/var/log/messages`` (depends on ``/etc/syslog.conf``). On systems with
|
||||
systemd, it may also be stored by the ``journald`` daemon, and accessed
|
||||
by running ``journalctl`` command.
|
||||
|
||||
Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to
|
||||
read the data from the kernel buffers and save it. Or you can
|
||||
``cat /proc/kmsg > file``, however you have to break in to stop the transfer,
|
||||
``kmsg`` is a "never ending file".
|
||||
|
||||
If the machine has crashed so badly that you cannot enter commands or
|
||||
the disk is not available then you have three options:
|
||||
|
||||
(1) Hand copy the text from the screen and type it in after the machine
|
||||
has restarted. Messy but it is the only option if you have not
|
||||
planned for a crash. Alternatively, you can take a picture of
|
||||
the screen with a digital camera - not nice, but better than
|
||||
nothing. If the messages scroll off the top of the console, you
|
||||
may find that booting with a higher resolution (eg, ``vga=791``)
|
||||
will allow you to read more of the text. (Caveat: This needs ``vesafb``,
|
||||
so won't help for 'early' oopses)
|
||||
|
||||
(2) Boot with a serial console (see
|
||||
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`),
|
||||
run a null modem to a second machine and capture the output there
|
||||
using your favourite communication program. Minicom works well.
|
||||
|
||||
(3) Use Kdump (see Documentation/kdump/kdump.txt),
|
||||
extract the kernel ring buffer from old memory with using dmesg
|
||||
gdbmacro in Documentation/kdump/gdbmacros.txt.
|
||||
|
||||
Finding the bug's location
|
||||
--------------------------
|
||||
|
||||
Reporting a bug works best if you point the location of the bug at the
|
||||
Kernel source file. There are two methods for doing that. Usually, using
|
||||
``gdb`` is easier, but the Kernel should be pre-compiled with debug info.
|
||||
|
||||
gdb
|
||||
^^^
|
||||
|
||||
The GNU debug (``gdb``) is the best way to figure out the exact file and line
|
||||
number of the OOPS from the ``vmlinux`` file.
|
||||
|
||||
The usage of gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``.
|
||||
This can be set by running::
|
||||
|
||||
$ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO
|
||||
|
||||
On a kernel compiled with ``CONFIG_DEBUG_INFO``, you can simply copy the
|
||||
EIP value from the OOPS::
|
||||
|
||||
EIP: 0060:[<c021e50e>] Not tainted VLI
|
||||
|
||||
And use GDB to translate that to human-readable form::
|
||||
|
||||
$ gdb vmlinux
|
||||
(gdb) l *0xc021e50e
|
||||
|
||||
If you don't have ``CONFIG_DEBUG_INFO`` enabled, you use the function
|
||||
offset from the OOPS::
|
||||
|
||||
EIP is at vt_ioctl+0xda8/0x1482
|
||||
|
||||
And recompile the kernel with ``CONFIG_DEBUG_INFO`` enabled::
|
||||
|
||||
$ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO
|
||||
$ make vmlinux
|
||||
$ gdb vmlinux
|
||||
(gdb) l *vt_ioctl+0xda8
|
||||
0x1888 is in vt_ioctl (drivers/tty/vt/vt_ioctl.c:293).
|
||||
288 {
|
||||
289 struct vc_data *vc = NULL;
|
||||
290 int ret = 0;
|
||||
291
|
||||
292 console_lock();
|
||||
293 if (VT_BUSY(vc_num))
|
||||
294 ret = -EBUSY;
|
||||
295 else if (vc_num)
|
||||
296 vc = vc_deallocate(vc_num);
|
||||
297 console_unlock();
|
||||
|
||||
or, if you want to be more verbose::
|
||||
|
||||
(gdb) p vt_ioctl
|
||||
$1 = {int (struct tty_struct *, unsigned int, unsigned long)} 0xae0 <vt_ioctl>
|
||||
(gdb) l *0xae0+0xda8
|
||||
|
||||
You could, instead, use the object file::
|
||||
|
||||
$ make drivers/tty/
|
||||
$ gdb drivers/tty/vt/vt_ioctl.o
|
||||
(gdb) l *vt_ioctl+0xda8
|
||||
|
||||
If you have a call trace, such as::
|
||||
|
||||
Call Trace:
|
||||
[<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
|
||||
[<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
|
||||
[<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
|
||||
...
|
||||
|
||||
this shows the problem likely in the :jbd: module. You can load that module
|
||||
in gdb and list the relevant code::
|
||||
|
||||
$ gdb fs/jbd/jbd.ko
|
||||
(gdb) l *log_wait_commit+0xa3
|
||||
|
||||
.. note::
|
||||
|
||||
You can also do the same for any function call at the stack trace,
|
||||
like this one::
|
||||
|
||||
[<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]
|
||||
|
||||
The position where the above call happened can be seen with::
|
||||
|
||||
$ gdb drivers/media/usb/dvb-usb/dvb-usb.o
|
||||
(gdb) l *dvb_usb_adapter_frontend_exit+0x3a
|
||||
|
||||
objdump
|
||||
^^^^^^^
|
||||
|
||||
To debug a kernel, use objdump and look for the hex offset from the crash
|
||||
output to find the valid line of code/assembler. Without debug symbols, you
|
||||
will see the assembler code for the routine shown, but if your kernel has
|
||||
debug symbols the C code will also be available. (Debug symbols can be enabled
|
||||
in the kernel hacking menu of the menu configuration.) For example::
|
||||
|
||||
$ objdump -r -S -l --disassemble net/dccp/ipv4.o
|
||||
|
||||
.. note::
|
||||
|
||||
You need to be at the top level of the kernel tree for this to pick up
|
||||
your C files.
|
||||
|
||||
If you don't have access to the code you can also debug on some crash dumps
|
||||
e.g. crash dump output as shown by Dave Miller::
|
||||
|
||||
EIP is at +0x14/0x4c0
|
||||
...
|
||||
Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
|
||||
00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
|
||||
<8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85
|
||||
|
||||
Put the bytes into a "foo.s" file like this:
|
||||
|
||||
.text
|
||||
.globl foo
|
||||
foo:
|
||||
.byte .... /* bytes from Code: part of OOPS dump */
|
||||
|
||||
Compile it with "gcc -c -o foo.o foo.s" then look at the output of
|
||||
"objdump --disassemble foo.o".
|
||||
|
||||
Output:
|
||||
|
||||
ip_queue_xmit:
|
||||
push %ebp
|
||||
push %edi
|
||||
push %esi
|
||||
push %ebx
|
||||
sub $0xbc, %esp
|
||||
mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb)
|
||||
mov 0x8(%ebp), %ebx ! %ebx = skb->sk
|
||||
mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
|
||||
|
||||
Reporting the bug
|
||||
-----------------
|
||||
|
||||
Once you find where the bug happened, by inspecting its location,
|
||||
you could either try to fix it yourself or report it upstream.
|
||||
|
||||
In order to report it upstream, you should identify the mailing list
|
||||
used for the development of the affected code. This can be done by using
|
||||
the ``get_maintainer.pl`` script.
|
||||
|
||||
For example, if you find a bug at the gspca's conex.c file, you can get
|
||||
their maintainers with::
|
||||
|
||||
$ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c
|
||||
Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
|
||||
Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB),commit_signer:1/1=100%)
|
||||
Tejun Heo <tj@kernel.org> (commit_signer:1/1=100%)
|
||||
Bhaktipriya Shridhar <bhaktipriya96@gmail.com> (commit_signer:1/1=100%,authored:1/1=100%,added_lines:4/4=100%,removed_lines:9/9=100%)
|
||||
linux-media@vger.kernel.org (open list:GSPCA USB WEBCAM DRIVER)
|
||||
linux-kernel@vger.kernel.org (open list)
|
||||
|
||||
Please notice that it will point to:
|
||||
|
||||
- The last developers that touched on the source code. On the above example,
|
||||
Tejun and Bhaktipriya (in this specific case, none really envolved on the
|
||||
development of this file);
|
||||
- The driver maintainer (Hans Verkuil);
|
||||
- The subsystem maintainer (Mauro Carvalho Chehab)
|
||||
- The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
|
||||
- the Linux Kernel mailing list (linux-kernel@vger.kernel.org).
|
||||
|
||||
Usually, the fastest way to have your bug fixed is to report it to mailing
|
||||
list used for the development of the code (linux-media ML) copying the driver maintainer (Hans).
|
||||
|
||||
If you are totally stumped as to whom to send the report, and
|
||||
``get_maintainer.pl`` didn't provide you anything useful, send it to
|
||||
linux-kernel@vger.kernel.org.
|
||||
|
||||
Thanks for your help in making Linux as stable as humanly possible.
|
||||
|
||||
Fixing the bug
|
||||
--------------
|
||||
|
||||
If you know programming, you could help us by not only reporting the bug,
|
||||
but also providing us with a solution. After all open source is about
|
||||
sharing what you do and don't you want to be recognised for your genius?
|
||||
|
||||
If you decide to take this way, once you have worked out a fix please submit
|
||||
it upstream.
|
||||
|
||||
Please do read
|
||||
ref:`Documentation/process/submitting-patches.rst <submittingpatches>` though
|
||||
to help your code get accepted.
|
||||
|
||||
|
||||
---------------------------------------------------------------------------
|
||||
|
||||
Notes on Oops tracing with ``klogd``
|
||||
------------------------------------
|
||||
|
||||
In order to help Linus and the other kernel developers there has been
|
||||
substantial support incorporated into ``klogd`` for processing protection
|
||||
faults. In order to have full support for address resolution at least
|
||||
version 1.3-pl3 of the ``sysklogd`` package should be used.
|
||||
|
||||
When a protection fault occurs the ``klogd`` daemon automatically
|
||||
translates important addresses in the kernel log messages to their
|
||||
symbolic equivalents. This translated kernel message is then
|
||||
forwarded through whatever reporting mechanism ``klogd`` is using. The
|
||||
protection fault message can be simply cut out of the message files
|
||||
and forwarded to the kernel developers.
|
||||
|
||||
Two types of address resolution are performed by ``klogd``. The first is
|
||||
static translation and the second is dynamic translation. Static
|
||||
translation uses the System.map file in much the same manner that
|
||||
ksymoops does. In order to do static translation the ``klogd`` daemon
|
||||
must be able to find a system map file at daemon initialization time.
|
||||
See the klogd man page for information on how ``klogd`` searches for map
|
||||
files.
|
||||
|
||||
Dynamic address translation is important when kernel loadable modules
|
||||
are being used. Since memory for kernel modules is allocated from the
|
||||
kernel's dynamic memory pools there are no fixed locations for either
|
||||
the start of the module or for functions and symbols in the module.
|
||||
|
||||
The kernel supports system calls which allow a program to determine
|
||||
which modules are loaded and their location in memory. Using these
|
||||
system calls the klogd daemon builds a symbol table which can be used
|
||||
to debug a protection fault which occurs in a loadable kernel module.
|
||||
|
||||
At the very minimum klogd will provide the name of the module which
|
||||
generated the protection fault. There may be additional symbolic
|
||||
information available if the developer of the loadable module chose to
|
||||
export symbol information from the module.
|
||||
|
||||
Since the kernel module environment can be dynamic there must be a
|
||||
mechanism for notifying the ``klogd`` daemon when a change in module
|
||||
environment occurs. There are command line options available which
|
||||
allow klogd to signal the currently executing daemon that symbol
|
||||
information should be refreshed. See the ``klogd`` manual page for more
|
||||
information.
|
||||
|
||||
A patch is included with the sysklogd distribution which modifies the
|
||||
``modules-2.0.0`` package to automatically signal klogd whenever a module
|
||||
is loaded or unloaded. Applying this patch provides essentially
|
||||
seamless support for debugging protection faults which occur with
|
||||
kernel loadable modules.
|
||||
|
||||
The following is an example of a protection fault in a loadable module
|
||||
processed by ``klogd``::
|
||||
|
||||
Aug 29 09:51:01 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc
|
||||
Aug 29 09:51:01 blizard kernel: current->tss.cr3 = 0062d000, %cr3 = 0062d000
|
||||
Aug 29 09:51:01 blizard kernel: *pde = 00000000
|
||||
Aug 29 09:51:01 blizard kernel: Oops: 0002
|
||||
Aug 29 09:51:01 blizard kernel: CPU: 0
|
||||
Aug 29 09:51:01 blizard kernel: EIP: 0010:[oops:_oops+16/3868]
|
||||
Aug 29 09:51:01 blizard kernel: EFLAGS: 00010212
|
||||
Aug 29 09:51:01 blizard kernel: eax: 315e97cc ebx: 003a6f80 ecx: 001be77b edx: 00237c0c
|
||||
Aug 29 09:51:01 blizard kernel: esi: 00000000 edi: bffffdb3 ebp: 00589f90 esp: 00589f8c
|
||||
Aug 29 09:51:01 blizard kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
|
||||
Aug 29 09:51:01 blizard kernel: Process oops_test (pid: 3374, process nr: 21, stackpage=00589000)
|
||||
Aug 29 09:51:01 blizard kernel: Stack: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001
|
||||
Aug 29 09:51:01 blizard kernel: 00000000 00237810 bfffff00 0010a7fa 00000003 00000001 00000000 bfffff00
|
||||
Aug 29 09:51:01 blizard kernel: bffffdb3 bffffed4 ffffffda 0000002b 0007002b 0000002b 0000002b 00000036
|
||||
Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128]
|
||||
Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3
|
||||
|
||||
---------------------------------------------------------------------------
|
||||
|
||||
::
|
||||
|
||||
Dr. G.W. Wettstein Oncology Research Div. Computing Facility
|
||||
Roger Maris Cancer Center INTERNET: greg@wind.rmcc.com
|
||||
820 4th St. N.
|
||||
Fargo, ND 58122
|
||||
Phone: 701-234-7556
|
||||
10
Documentation/admin-guide/conf.py
Normal file
10
Documentation/admin-guide/conf.py
Normal file
@@ -0,0 +1,10 @@
|
||||
# -*- coding: utf-8; mode: python -*-
|
||||
|
||||
project = 'Linux Kernel User Documentation'
|
||||
|
||||
tags.add("subproject")
|
||||
|
||||
latex_documents = [
|
||||
('index', 'linux-user.tex', 'Linux Kernel User Documentation',
|
||||
'The kernel development community', 'manual'),
|
||||
]
|
||||
268
Documentation/admin-guide/devices.rst
Normal file
268
Documentation/admin-guide/devices.rst
Normal file
@@ -0,0 +1,268 @@
|
||||
|
||||
Linux allocated devices (4.x+ version)
|
||||
======================================
|
||||
|
||||
This list is the Linux Device List, the official registry of allocated
|
||||
device numbers and ``/dev`` directory nodes for the Linux operating
|
||||
system.
|
||||
|
||||
The LaTeX version of this document is no longer maintained, nor is
|
||||
the document that used to reside at lanana.org. This version in the
|
||||
mainline Linux kernel is the master document. Updates shall be sent
|
||||
as patches to the kernel maintainers (see the
|
||||
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` document).
|
||||
Specifically explore the sections titled "CHAR and MISC DRIVERS", and
|
||||
"BLOCK LAYER" in the MAINTAINERS file to find the right maintainers
|
||||
to involve for character and block devices.
|
||||
|
||||
This document is included by reference into the Filesystem Hierarchy
|
||||
Standard (FHS). The FHS is available from http://www.pathname.com/fhs/.
|
||||
|
||||
Allocations marked (68k/Amiga) apply to Linux/68k on the Amiga
|
||||
platform only. Allocations marked (68k/Atari) apply to Linux/68k on
|
||||
the Atari platform only.
|
||||
|
||||
This document is in the public domain. The authors requests, however,
|
||||
that semantically altered versions are not distributed without
|
||||
permission of the authors, assuming the authors can be contacted without
|
||||
an unreasonable effort.
|
||||
|
||||
|
||||
.. attention::
|
||||
|
||||
DEVICE DRIVERS AUTHORS PLEASE READ THIS
|
||||
|
||||
Linux now has extensive support for dynamic allocation of device numbering
|
||||
and can use ``sysfs`` and ``udev`` (``systemd``) to handle the naming needs.
|
||||
There are still some exceptions in the serial and boot device area. Before
|
||||
asking for a device number make sure you actually need one.
|
||||
|
||||
To have a major number allocated, or a minor number in situations
|
||||
where that applies (e.g. busmice), please submit a patch and send to
|
||||
the authors as indicated above.
|
||||
|
||||
Keep the description of the device *in the same format
|
||||
as this list*. The reason for this is that it is the only way we have
|
||||
found to ensure we have all the requisite information to publish your
|
||||
device and avoid conflicts.
|
||||
|
||||
Finally, sometimes we have to play "namespace police." Please don't be
|
||||
offended. We often get submissions for ``/dev`` names that would be bound
|
||||
to cause conflicts down the road. We are trying to avoid getting in a
|
||||
situation where we would have to suffer an incompatible forward
|
||||
change. Therefore, please consult with us **before** you make your
|
||||
device names and numbers in any way public, at least to the point
|
||||
where it would be at all difficult to get them changed.
|
||||
|
||||
Your cooperation is appreciated.
|
||||
|
||||
.. include:: devices.txt
|
||||
:literal:
|
||||
|
||||
Additional ``/dev/`` directory entries
|
||||
--------------------------------------
|
||||
|
||||
This section details additional entries that should or may exist in
|
||||
the /dev directory. It is preferred that symbolic links use the same
|
||||
form (absolute or relative) as is indicated here. Links are
|
||||
classified as "hard" or "symbolic" depending on the preferred type of
|
||||
link; if possible, the indicated type of link should be used.
|
||||
|
||||
Compulsory links
|
||||
++++++++++++++++
|
||||
|
||||
These links should exist on all systems:
|
||||
|
||||
=============== =============== =============== ===============================
|
||||
/dev/fd /proc/self/fd symbolic File descriptors
|
||||
/dev/stdin fd/0 symbolic stdin file descriptor
|
||||
/dev/stdout fd/1 symbolic stdout file descriptor
|
||||
/dev/stderr fd/2 symbolic stderr file descriptor
|
||||
/dev/nfsd socksys symbolic Required by iBCS-2
|
||||
/dev/X0R null symbolic Required by iBCS-2
|
||||
=============== =============== =============== ===============================
|
||||
|
||||
Note: ``/dev/X0R`` is <letter X>-<digit 0>-<letter R>.
|
||||
|
||||
Recommended links
|
||||
+++++++++++++++++
|
||||
|
||||
It is recommended that these links exist on all systems:
|
||||
|
||||
|
||||
=============== =============== =============== ===============================
|
||||
/dev/core /proc/kcore symbolic Backward compatibility
|
||||
/dev/ramdisk ram0 symbolic Backward compatibility
|
||||
/dev/ftape qft0 symbolic Backward compatibility
|
||||
/dev/bttv0 video0 symbolic Backward compatibility
|
||||
/dev/radio radio0 symbolic Backward compatibility
|
||||
/dev/i2o* /dev/i2o/* symbolic Backward compatibility
|
||||
/dev/scd? sr? hard Alternate SCSI CD-ROM name
|
||||
=============== =============== =============== ===============================
|
||||
|
||||
Locally defined links
|
||||
+++++++++++++++++++++
|
||||
|
||||
The following links may be established locally to conform to the
|
||||
configuration of the system. This is merely a tabulation of existing
|
||||
practice, and does not constitute a recommendation. However, if they
|
||||
exist, they should have the following uses.
|
||||
|
||||
=============== =============== =============== ===============================
|
||||
/dev/mouse mouse port symbolic Current mouse device
|
||||
/dev/tape tape device symbolic Current tape device
|
||||
/dev/cdrom CD-ROM device symbolic Current CD-ROM device
|
||||
/dev/cdwriter CD-writer symbolic Current CD-writer device
|
||||
/dev/scanner scanner symbolic Current scanner device
|
||||
/dev/modem modem port symbolic Current dialout device
|
||||
/dev/root root device symbolic Current root filesystem
|
||||
/dev/swap swap device symbolic Current swap device
|
||||
=============== =============== =============== ===============================
|
||||
|
||||
``/dev/modem`` should not be used for a modem which supports dialin as
|
||||
well as dialout, as it tends to cause lock file problems. If it
|
||||
exists, ``/dev/modem`` should point to the appropriate primary TTY device
|
||||
(the use of the alternate callout devices is deprecated).
|
||||
|
||||
For SCSI devices, ``/dev/tape`` and ``/dev/cdrom`` should point to the
|
||||
*cooked* devices (``/dev/st*`` and ``/dev/sr*``, respectively), whereas
|
||||
``/dev/cdwriter`` and /dev/scanner should point to the appropriate generic
|
||||
SCSI devices (/dev/sg*).
|
||||
|
||||
``/dev/mouse`` may point to a primary serial TTY device, a hardware mouse
|
||||
device, or a socket for a mouse driver program (e.g. ``/dev/gpmdata``).
|
||||
|
||||
Sockets and pipes
|
||||
+++++++++++++++++
|
||||
|
||||
Non-transient sockets and named pipes may exist in /dev. Common entries are:
|
||||
|
||||
=============== =============== ===============================================
|
||||
/dev/printer socket lpd local socket
|
||||
/dev/log socket syslog local socket
|
||||
/dev/gpmdata socket gpm mouse multiplexer
|
||||
=============== =============== ===============================================
|
||||
|
||||
Mount points
|
||||
++++++++++++
|
||||
|
||||
The following names are reserved for mounting special filesystems
|
||||
under /dev. These special filesystems provide kernel interfaces that
|
||||
cannot be provided with standard device nodes.
|
||||
|
||||
=============== =============== ===============================================
|
||||
/dev/pts devpts PTY slave filesystem
|
||||
/dev/shm tmpfs POSIX shared memory maintenance access
|
||||
=============== =============== ===============================================
|
||||
|
||||
Terminal devices
|
||||
----------------
|
||||
|
||||
Terminal, or TTY devices are a special class of character devices. A
|
||||
terminal device is any device that could act as a controlling terminal
|
||||
for a session; this includes virtual consoles, serial ports, and
|
||||
pseudoterminals (PTYs).
|
||||
|
||||
All terminal devices share a common set of capabilities known as line
|
||||
disciplines; these include the common terminal line discipline as well
|
||||
as SLIP and PPP modes.
|
||||
|
||||
All terminal devices are named similarly; this section explains the
|
||||
naming and use of the various types of TTYs. Note that the naming
|
||||
conventions include several historical warts; some of these are
|
||||
Linux-specific, some were inherited from other systems, and some
|
||||
reflect Linux outgrowing a borrowed convention.
|
||||
|
||||
A hash mark (``#``) in a device name is used here to indicate a decimal
|
||||
number without leading zeroes.
|
||||
|
||||
Virtual consoles and the console device
|
||||
+++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
Virtual consoles are full-screen terminal displays on the system video
|
||||
monitor. Virtual consoles are named ``/dev/tty#``, with numbering
|
||||
starting at ``/dev/tty1``; ``/dev/tty0`` is the current virtual console.
|
||||
``/dev/tty0`` is the device that should be used to access the system video
|
||||
card on those architectures for which the frame buffer devices
|
||||
(``/dev/fb*``) are not applicable. Do not use ``/dev/console``
|
||||
for this purpose.
|
||||
|
||||
The console device, ``/dev/console``, is the device to which system
|
||||
messages should be sent, and on which logins should be permitted in
|
||||
single-user mode. Starting with Linux 2.1.71, ``/dev/console`` is managed
|
||||
by the kernel; for previous versions it should be a symbolic link to
|
||||
either ``/dev/tty0``, a specific virtual console such as ``/dev/tty1``, or to
|
||||
a serial port primary (``tty*``, not ``cu*``) device, depending on the
|
||||
configuration of the system.
|
||||
|
||||
Serial ports
|
||||
++++++++++++
|
||||
|
||||
Serial ports are RS-232 serial ports and any device which simulates
|
||||
one, either in hardware (such as internal modems) or in software (such
|
||||
as the ISDN driver.) Under Linux, each serial ports has two device
|
||||
names, the primary or callin device and the alternate or callout one.
|
||||
Each kind of device is indicated by a different letter. For any
|
||||
letter X, the names of the devices are ``/dev/ttyX#`` and ``/dev/cux#``,
|
||||
respectively; for historical reasons, ``/dev/ttyS#`` and ``/dev/ttyC#``
|
||||
correspond to ``/dev/cua#`` and ``/dev/cub#``. In the future, it should be
|
||||
expected that multiple letters will be used; all letters will be upper
|
||||
case for the "tty" device (e.g. ``/dev/ttyDP#``) and lower case for the
|
||||
"cu" device (e.g. ``/dev/cudp#``).
|
||||
|
||||
The names ``/dev/ttyQ#`` and ``/dev/cuq#`` are reserved for local use.
|
||||
|
||||
The alternate devices provide for kernel-based exclusion and somewhat
|
||||
different defaults than the primary devices. Their main purpose is to
|
||||
allow the use of serial ports with programs with no inherent or broken
|
||||
support for serial ports. Their use is deprecated, and they may be
|
||||
removed from a future version of Linux.
|
||||
|
||||
Arbitration of serial ports is provided by the use of lock files with
|
||||
the names ``/var/lock/LCK..ttyX#``. The contents of the lock file should
|
||||
be the PID of the locking process as an ASCII number.
|
||||
|
||||
It is common practice to install links such as /dev/modem
|
||||
which point to serial ports. In order to ensure proper locking in the
|
||||
presence of these links, it is recommended that software chase
|
||||
symlinks and lock all possible names; additionally, it is recommended
|
||||
that a lock file be installed with the corresponding alternate
|
||||
device. In order to avoid deadlocks, it is recommended that the locks
|
||||
are acquired in the following order, and released in the reverse:
|
||||
|
||||
1. The symbolic link name, if any (``/var/lock/LCK..modem``)
|
||||
2. The "tty" name (``/var/lock/LCK..ttyS2``)
|
||||
3. The alternate device name (``/var/lock/LCK..cua2``)
|
||||
|
||||
In the case of nested symbolic links, the lock files should be
|
||||
installed in the order the symlinks are resolved.
|
||||
|
||||
Under no circumstances should an application hold a lock while waiting
|
||||
for another to be released. In addition, applications which attempt
|
||||
to create lock files for the corresponding alternate device names
|
||||
should take into account the possibility of being used on a non-serial
|
||||
port TTY, for which no alternate device would exist.
|
||||
|
||||
Pseudoterminals (PTYs)
|
||||
++++++++++++++++++++++
|
||||
|
||||
Pseudoterminals, or PTYs, are used to create login sessions or provide
|
||||
other capabilities requiring a TTY line discipline (including SLIP or
|
||||
PPP capability) to arbitrary data-generation processes. Each PTY has
|
||||
a master side, named ``/dev/pty[p-za-e][0-9a-f]``, and a slave side, named
|
||||
``/dev/tty[p-za-e][0-9a-f]``. The kernel arbitrates the use of PTYs by
|
||||
allowing each master side to be opened only once.
|
||||
|
||||
Once the master side has been opened, the corresponding slave device
|
||||
can be used in the same manner as any TTY device. The master and
|
||||
slave devices are connected by the kernel, generating the equivalent
|
||||
of a bidirectional pipe with TTY capabilities.
|
||||
|
||||
Recent versions of the Linux kernels and GNU libc contain support for
|
||||
the System V/Unix98 naming scheme for PTYs, which assigns a common
|
||||
device, ``/dev/ptmx``, to all the masters (opening it will automatically
|
||||
give you a previously unassigned PTY) and a subdirectory, ``/dev/pts``,
|
||||
for the slaves; the slaves are named with decimal integers (``/dev/pts/#``
|
||||
in our notation). This removes the problem of exhausting the
|
||||
namespace and enables the kernel to automatically create the device
|
||||
nodes for the slaves on demand using the "devpts" filesystem.
|
||||
3081
Documentation/admin-guide/devices.txt
Normal file
3081
Documentation/admin-guide/devices.txt
Normal file
File diff suppressed because it is too large
Load Diff
353
Documentation/admin-guide/dynamic-debug-howto.rst
Normal file
353
Documentation/admin-guide/dynamic-debug-howto.rst
Normal file
@@ -0,0 +1,353 @@
|
||||
Dynamic debug
|
||||
+++++++++++++
|
||||
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
This document describes how to use the dynamic debug (dyndbg) feature.
|
||||
|
||||
Dynamic debug is designed to allow you to dynamically enable/disable
|
||||
kernel code to obtain additional kernel information. Currently, if
|
||||
``CONFIG_DYNAMIC_DEBUG`` is set, then all ``pr_debug()``/``dev_dbg()`` and
|
||||
``print_hex_dump_debug()``/``print_hex_dump_bytes()`` calls can be dynamically
|
||||
enabled per-callsite.
|
||||
|
||||
If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is just
|
||||
shortcut for ``print_hex_dump(KERN_DEBUG)``.
|
||||
|
||||
For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, format string is
|
||||
its ``prefix_str`` argument, if it is constant string; or ``hexdump``
|
||||
in case ``prefix_str`` is build dynamically.
|
||||
|
||||
Dynamic debug has even more useful features:
|
||||
|
||||
* Simple query language allows turning on and off debugging
|
||||
statements by matching any combination of 0 or 1 of:
|
||||
|
||||
- source filename
|
||||
- function name
|
||||
- line number (including ranges of line numbers)
|
||||
- module name
|
||||
- format string
|
||||
|
||||
* Provides a debugfs control file: ``<debugfs>/dynamic_debug/control``
|
||||
which can be read to display the complete list of known debug
|
||||
statements, to help guide you
|
||||
|
||||
Controlling dynamic debug Behaviour
|
||||
===================================
|
||||
|
||||
The behaviour of ``pr_debug()``/``dev_dbg()`` are controlled via writing to a
|
||||
control file in the 'debugfs' filesystem. Thus, you must first mount
|
||||
the debugfs filesystem, in order to make use of this feature.
|
||||
Subsequently, we refer to the control file as:
|
||||
``<debugfs>/dynamic_debug/control``. For example, if you want to enable
|
||||
printing from source file ``svcsock.c``, line 1603 you simply do::
|
||||
|
||||
nullarbor:~ # echo 'file svcsock.c line 1603 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
If you make a mistake with the syntax, the write will fail thus::
|
||||
|
||||
nullarbor:~ # echo 'file svcsock.c wtf 1 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
-bash: echo: write error: Invalid argument
|
||||
|
||||
Viewing Dynamic Debug Behaviour
|
||||
===============================
|
||||
|
||||
You can view the currently configured behaviour of all the debug
|
||||
statements via::
|
||||
|
||||
nullarbor:~ # cat <debugfs>/dynamic_debug/control
|
||||
# filename:lineno [module]function flags format
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup =_ "SVCRDMA Module Removed, deregister RPC RDMA transport\012"
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012"
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init =_ "\011sq_depth : %d\012"
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init =_ "\011max_requests : %d\012"
|
||||
...
|
||||
|
||||
|
||||
You can also apply standard Unix text manipulation filters to this
|
||||
data, e.g.::
|
||||
|
||||
nullarbor:~ # grep -i rdma <debugfs>/dynamic_debug/control | wc -l
|
||||
62
|
||||
|
||||
nullarbor:~ # grep -i tcp <debugfs>/dynamic_debug/control | wc -l
|
||||
42
|
||||
|
||||
The third column shows the currently enabled flags for each debug
|
||||
statement callsite (see below for definitions of the flags). The
|
||||
default value, with no flags enabled, is ``=_``. So you can view all
|
||||
the debug statement callsites with any non-default flags::
|
||||
|
||||
nullarbor:~ # awk '$3 != "=_"' <debugfs>/dynamic_debug/control
|
||||
# filename:lineno [module]function flags format
|
||||
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012"
|
||||
|
||||
Command Language Reference
|
||||
==========================
|
||||
|
||||
At the lexical level, a command comprises a sequence of words separated
|
||||
by spaces or tabs. So these are all equivalent::
|
||||
|
||||
nullarbor:~ # echo -c 'file svcsock.c line 1603 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
nullarbor:~ # echo -c ' file svcsock.c line 1603 +p ' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
Command submissions are bounded by a write() system call.
|
||||
Multiple commands can be written together, separated by ``;`` or ``\n``::
|
||||
|
||||
~# echo "func pnpacpi_get_resources +p; func pnp_assign_mem +p" \
|
||||
> <debugfs>/dynamic_debug/control
|
||||
|
||||
If your query set is big, you can batch them too::
|
||||
|
||||
~# cat query-batch-file > <debugfs>/dynamic_debug/control
|
||||
|
||||
A another way is to use wildcard. The match rule support ``*`` (matches
|
||||
zero or more characters) and ``?`` (matches exactly one character).For
|
||||
example, you can match all usb drivers::
|
||||
|
||||
~# echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control
|
||||
|
||||
At the syntactical level, a command comprises a sequence of match
|
||||
specifications, followed by a flags change specification::
|
||||
|
||||
command ::= match-spec* flags-spec
|
||||
|
||||
The match-spec's are used to choose a subset of the known pr_debug()
|
||||
callsites to which to apply the flags-spec. Think of them as a query
|
||||
with implicit ANDs between each pair. Note that an empty list of
|
||||
match-specs will select all debug statement callsites.
|
||||
|
||||
A match specification comprises a keyword, which controls the
|
||||
attribute of the callsite to be compared, and a value to compare
|
||||
against. Possible keywords are:::
|
||||
|
||||
match-spec ::= 'func' string |
|
||||
'file' string |
|
||||
'module' string |
|
||||
'format' string |
|
||||
'line' line-range
|
||||
|
||||
line-range ::= lineno |
|
||||
'-'lineno |
|
||||
lineno'-' |
|
||||
lineno'-'lineno
|
||||
|
||||
lineno ::= unsigned-int
|
||||
|
||||
.. note::
|
||||
|
||||
``line-range`` cannot contain space, e.g.
|
||||
"1-30" is valid range but "1 - 30" is not.
|
||||
|
||||
|
||||
The meanings of each keyword are:
|
||||
|
||||
func
|
||||
The given string is compared against the function name
|
||||
of each callsite. Example::
|
||||
|
||||
func svc_tcp_accept
|
||||
|
||||
file
|
||||
The given string is compared against either the full pathname, the
|
||||
src-root relative pathname, or the basename of the source file of
|
||||
each callsite. Examples::
|
||||
|
||||
file svcsock.c
|
||||
file kernel/freezer.c
|
||||
file /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c
|
||||
|
||||
module
|
||||
The given string is compared against the module name
|
||||
of each callsite. The module name is the string as
|
||||
seen in ``lsmod``, i.e. without the directory or the ``.ko``
|
||||
suffix and with ``-`` changed to ``_``. Examples::
|
||||
|
||||
module sunrpc
|
||||
module nfsd
|
||||
|
||||
format
|
||||
The given string is searched for in the dynamic debug format
|
||||
string. Note that the string does not need to match the
|
||||
entire format, only some part. Whitespace and other
|
||||
special characters can be escaped using C octal character
|
||||
escape ``\ooo`` notation, e.g. the space character is ``\040``.
|
||||
Alternatively, the string can be enclosed in double quote
|
||||
characters (``"``) or single quote characters (``'``).
|
||||
Examples::
|
||||
|
||||
format svcrdma: // many of the NFS/RDMA server pr_debugs
|
||||
format readahead // some pr_debugs in the readahead cache
|
||||
format nfsd:\040SETATTR // one way to match a format with whitespace
|
||||
format "nfsd: SETATTR" // a neater way to match a format with whitespace
|
||||
format 'nfsd: SETATTR' // yet another way to match a format with whitespace
|
||||
|
||||
line
|
||||
The given line number or range of line numbers is compared
|
||||
against the line number of each ``pr_debug()`` callsite. A single
|
||||
line number matches the callsite line number exactly. A
|
||||
range of line numbers matches any callsite between the first
|
||||
and last line number inclusive. An empty first number means
|
||||
the first line in the file, an empty line number means the
|
||||
last number in the file. Examples::
|
||||
|
||||
line 1603 // exactly line 1603
|
||||
line 1600-1605 // the six lines from line 1600 to line 1605
|
||||
line -1605 // the 1605 lines from line 1 to line 1605
|
||||
line 1600- // all lines from line 1600 to the end of the file
|
||||
|
||||
The flags specification comprises a change operation followed
|
||||
by one or more flag characters. The change operation is one
|
||||
of the characters::
|
||||
|
||||
- remove the given flags
|
||||
+ add the given flags
|
||||
= set the flags to the given flags
|
||||
|
||||
The flags are::
|
||||
|
||||
p enables the pr_debug() callsite.
|
||||
f Include the function name in the printed message
|
||||
l Include line number in the printed message
|
||||
m Include module name in the printed message
|
||||
t Include thread ID in messages not generated from interrupt context
|
||||
_ No flags are set. (Or'd with others on input)
|
||||
|
||||
For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only ``p`` flag
|
||||
have meaning, other flags ignored.
|
||||
|
||||
For display, the flags are preceded by ``=``
|
||||
(mnemonic: what the flags are currently equal to).
|
||||
|
||||
Note the regexp ``^[-+=][flmpt_]+$`` matches a flags specification.
|
||||
To clear all flags at once, use ``=_`` or ``-flmpt``.
|
||||
|
||||
|
||||
Debug messages during Boot Process
|
||||
==================================
|
||||
|
||||
To activate debug messages for core code and built-in modules during
|
||||
the boot process, even before userspace and debugfs exists, use
|
||||
``dyndbg="QUERY"``, ``module.dyndbg="QUERY"``, or ``ddebug_query="QUERY"``
|
||||
(``ddebug_query`` is obsoleted by ``dyndbg``, and deprecated). QUERY follows
|
||||
the syntax described above, but must not exceed 1023 characters. Your
|
||||
bootloader may impose lower limits.
|
||||
|
||||
These ``dyndbg`` params are processed just after the ddebug tables are
|
||||
processed, as part of the arch_initcall. Thus you can enable debug
|
||||
messages in all code run after this arch_initcall via this boot
|
||||
parameter.
|
||||
|
||||
On an x86 system for example ACPI enablement is a subsys_initcall and::
|
||||
|
||||
dyndbg="file ec.c +p"
|
||||
|
||||
will show early Embedded Controller transactions during ACPI setup if
|
||||
your machine (typically a laptop) has an Embedded Controller.
|
||||
PCI (or other devices) initialization also is a hot candidate for using
|
||||
this boot parameter for debugging purposes.
|
||||
|
||||
If ``foo`` module is not built-in, ``foo.dyndbg`` will still be processed at
|
||||
boot time, without effect, but will be reprocessed when module is
|
||||
loaded later. ``dyndbg_query=`` and bare ``dyndbg=`` are only processed at
|
||||
boot.
|
||||
|
||||
|
||||
Debug Messages at Module Initialization Time
|
||||
============================================
|
||||
|
||||
When ``modprobe foo`` is called, modprobe scans ``/proc/cmdline`` for
|
||||
``foo.params``, strips ``foo.``, and passes them to the kernel along with
|
||||
params given in modprobe args or ``/etc/modprob.d/*.conf`` files,
|
||||
in the following order:
|
||||
|
||||
1. parameters given via ``/etc/modprobe.d/*.conf``::
|
||||
|
||||
options foo dyndbg=+pt
|
||||
options foo dyndbg # defaults to +p
|
||||
|
||||
2. ``foo.dyndbg`` as given in boot args, ``foo.`` is stripped and passed::
|
||||
|
||||
foo.dyndbg=" func bar +p; func buz +mp"
|
||||
|
||||
3. args to modprobe::
|
||||
|
||||
modprobe foo dyndbg==pmf # override previous settings
|
||||
|
||||
These ``dyndbg`` queries are applied in order, with last having final say.
|
||||
This allows boot args to override or modify those from ``/etc/modprobe.d``
|
||||
(sensible, since 1 is system wide, 2 is kernel or boot specific), and
|
||||
modprobe args to override both.
|
||||
|
||||
In the ``foo.dyndbg="QUERY"`` form, the query must exclude ``module foo``.
|
||||
``foo`` is extracted from the param-name, and applied to each query in
|
||||
``QUERY``, and only 1 match-spec of each type is allowed.
|
||||
|
||||
The ``dyndbg`` option is a "fake" module parameter, which means:
|
||||
|
||||
- modules do not need to define it explicitly
|
||||
- every module gets it tacitly, whether they use pr_debug or not
|
||||
- it doesn't appear in ``/sys/module/$module/parameters/``
|
||||
To see it, grep the control file, or inspect ``/proc/cmdline.``
|
||||
|
||||
For ``CONFIG_DYNAMIC_DEBUG`` kernels, any settings given at boot-time (or
|
||||
enabled by ``-DDEBUG`` flag during compilation) can be disabled later via
|
||||
the sysfs interface if the debug messages are no longer needed::
|
||||
|
||||
echo "module module_name -p" > <debugfs>/dynamic_debug/control
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
::
|
||||
|
||||
// enable the message at line 1603 of file svcsock.c
|
||||
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable all the messages in file svcsock.c
|
||||
nullarbor:~ # echo -n 'file svcsock.c +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable all the messages in the NFS server module
|
||||
nullarbor:~ # echo -n 'module nfsd +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable all 12 messages in the function svc_process()
|
||||
nullarbor:~ # echo -n 'func svc_process +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// disable all 12 messages in the function svc_process()
|
||||
nullarbor:~ # echo -n 'func svc_process -p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable messages for NFS calls READ, READLINK, READDIR and READDIR+.
|
||||
nullarbor:~ # echo -n 'format "nfsd: READ" +p' >
|
||||
<debugfs>/dynamic_debug/control
|
||||
|
||||
// enable messages in files of which the paths include string "usb"
|
||||
nullarbor:~ # echo -n '*usb* +p' > <debugfs>/dynamic_debug/control
|
||||
|
||||
// enable all messages
|
||||
nullarbor:~ # echo -n '+p' > <debugfs>/dynamic_debug/control
|
||||
|
||||
// add module, function to all enabled messages
|
||||
nullarbor:~ # echo -n '+mf' > <debugfs>/dynamic_debug/control
|
||||
|
||||
// boot-args example, with newlines and comments for readability
|
||||
Kernel command line: ...
|
||||
// see whats going on in dyndbg=value processing
|
||||
dynamic_debug.verbose=1
|
||||
// enable pr_debugs in 2 builtins, #cmt is stripped
|
||||
dyndbg="module params +p #cmt ; module sys +p"
|
||||
// enable pr_debugs in 2 functions in a module loaded later
|
||||
pc87360.dyndbg="func pc87360_init_device +p; func pc87360_find +p"
|
||||
69
Documentation/admin-guide/index.rst
Normal file
69
Documentation/admin-guide/index.rst
Normal file
@@ -0,0 +1,69 @@
|
||||
The Linux kernel user's and administrator's guide
|
||||
=================================================
|
||||
|
||||
The following is a collection of user-oriented documents that have been
|
||||
added to the kernel over time. There is, as yet, little overall order or
|
||||
organization here — this material was not written to be a single, coherent
|
||||
document! With luck things will improve quickly over time.
|
||||
|
||||
This initial section contains overall information, including the README
|
||||
file describing the kernel as a whole, documentation on kernel parameters,
|
||||
etc.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
README
|
||||
kernel-parameters
|
||||
devices
|
||||
|
||||
Here is a set of documents aimed at users who are trying to track down
|
||||
problems and bugs in particular.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
reporting-bugs
|
||||
security-bugs
|
||||
bug-hunting
|
||||
bug-bisect
|
||||
tainted-kernels
|
||||
ramoops
|
||||
dynamic-debug-howto
|
||||
init
|
||||
|
||||
This is the beginning of a section with information of interest to
|
||||
application developers. Documents covering various aspects of the kernel
|
||||
ABI will be found here.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
sysfs-rules
|
||||
|
||||
The rest of this manual consists of various unordered guides on how to
|
||||
configure specific aspects of kernel behavior to your liking.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
initrd
|
||||
serial-console
|
||||
braille-console
|
||||
parport
|
||||
md
|
||||
module-signing
|
||||
sysrq
|
||||
unicode
|
||||
vga-softcursor
|
||||
binfmt-misc
|
||||
mono
|
||||
java
|
||||
ras
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
||||
52
Documentation/admin-guide/init.rst
Normal file
52
Documentation/admin-guide/init.rst
Normal file
@@ -0,0 +1,52 @@
|
||||
Explaining the dreaded "No init found." boot hang message
|
||||
=========================================================
|
||||
|
||||
OK, so you've got this pretty unintuitive message (currently located
|
||||
in init/main.c) and are wondering what the H*** went wrong.
|
||||
Some high-level reasons for failure (listed roughly in order of execution)
|
||||
to load the init binary are:
|
||||
|
||||
A) Unable to mount root FS
|
||||
B) init binary doesn't exist on rootfs
|
||||
C) broken console device
|
||||
D) binary exists but dependencies not available
|
||||
E) binary cannot be loaded
|
||||
|
||||
Detailed explanations:
|
||||
|
||||
A) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
|
||||
to get more detailed kernel messages.
|
||||
B) make sure you have the correct root FS type
|
||||
(and ``root=`` kernel parameter points to the correct partition),
|
||||
required drivers such as storage hardware (such as SCSI or USB!)
|
||||
and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
|
||||
to be pre-loaded by an initrd)
|
||||
C) Possibly a conflict in ``console= setup`` --> initial console unavailable.
|
||||
E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
|
||||
missing interrupt-based configuration).
|
||||
Try using a different ``console= device`` or e.g. ``netconsole=``.
|
||||
D) e.g. required library dependencies of the init binary such as
|
||||
``/lib/ld-linux.so.2`` missing or broken. Use
|
||||
``readelf -d <INIT>|grep NEEDED`` to find out which libraries are required.
|
||||
E) make sure the binary's architecture matches your hardware.
|
||||
E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
|
||||
In case you tried loading a non-binary file here (shell script?),
|
||||
you should make sure that the script specifies an interpreter in its shebang
|
||||
header line (``#!/...``) that is fully working (including its library
|
||||
dependencies). And before tackling scripts, better first test a simple
|
||||
non-script binary such as ``/bin/sh`` and confirm its successful execution.
|
||||
To find out more, add code ``to init/main.c`` to display kernel_execve()s
|
||||
return values.
|
||||
|
||||
Please extend this explanation whenever you find new failure causes
|
||||
(after all loading the init binary is a CRITICAL and hard transition step
|
||||
which needs to be made as painless as possible), then submit patch to LKML.
|
||||
Further TODOs:
|
||||
|
||||
- Implement the various ``run_init_process()`` invocations via a struct array
|
||||
which can then store the ``kernel_execve()`` result value and on failure
|
||||
log it all by iterating over **all** results (very important usability fix).
|
||||
- try to make the implementation itself more helpful in general,
|
||||
e.g. by providing additional error messages at affected places.
|
||||
|
||||
Andreas Mohr <andi at lisas period de>
|
||||
383
Documentation/admin-guide/initrd.rst
Normal file
383
Documentation/admin-guide/initrd.rst
Normal file
@@ -0,0 +1,383 @@
|
||||
Using the initial RAM disk (initrd)
|
||||
===================================
|
||||
|
||||
Written 1996,2000 by Werner Almesberger <werner.almesberger@epfl.ch> and
|
||||
Hans Lermen <lermen@fgan.de>
|
||||
|
||||
|
||||
initrd provides the capability to load a RAM disk by the boot loader.
|
||||
This RAM disk can then be mounted as the root file system and programs
|
||||
can be run from it. Afterwards, a new root file system can be mounted
|
||||
from a different device. The previous root (from initrd) is then moved
|
||||
to a directory and can be subsequently unmounted.
|
||||
|
||||
initrd is mainly designed to allow system startup to occur in two phases,
|
||||
where the kernel comes up with a minimum set of compiled-in drivers, and
|
||||
where additional modules are loaded from initrd.
|
||||
|
||||
This document gives a brief overview of the use of initrd. A more detailed
|
||||
discussion of the boot process can be found in [#f1]_.
|
||||
|
||||
|
||||
Operation
|
||||
---------
|
||||
|
||||
When using initrd, the system typically boots as follows:
|
||||
|
||||
1) the boot loader loads the kernel and the initial RAM disk
|
||||
2) the kernel converts initrd into a "normal" RAM disk and
|
||||
frees the memory used by initrd
|
||||
3) if the root device is not ``/dev/ram0``, the old (deprecated)
|
||||
change_root procedure is followed. see the "Obsolete root change
|
||||
mechanism" section below.
|
||||
4) root device is mounted. if it is ``/dev/ram0``, the initrd image is
|
||||
then mounted as root
|
||||
5) /sbin/init is executed (this can be any valid executable, including
|
||||
shell scripts; it is run with uid 0 and can do basically everything
|
||||
init can do).
|
||||
6) init mounts the "real" root file system
|
||||
7) init places the root file system at the root directory using the
|
||||
pivot_root system call
|
||||
8) init execs the ``/sbin/init`` on the new root filesystem, performing
|
||||
the usual boot sequence
|
||||
9) the initrd file system is removed
|
||||
|
||||
Note that changing the root directory does not involve unmounting it.
|
||||
It is therefore possible to leave processes running on initrd during that
|
||||
procedure. Also note that file systems mounted under initrd continue to
|
||||
be accessible.
|
||||
|
||||
|
||||
Boot command-line options
|
||||
-------------------------
|
||||
|
||||
initrd adds the following new options::
|
||||
|
||||
initrd=<path> (e.g. LOADLIN)
|
||||
|
||||
Loads the specified file as the initial RAM disk. When using LILO, you
|
||||
have to specify the RAM disk image file in /etc/lilo.conf, using the
|
||||
INITRD configuration variable.
|
||||
|
||||
noinitrd
|
||||
|
||||
initrd data is preserved but it is not converted to a RAM disk and
|
||||
the "normal" root file system is mounted. initrd data can be read
|
||||
from /dev/initrd. Note that the data in initrd can have any structure
|
||||
in this case and doesn't necessarily have to be a file system image.
|
||||
This option is used mainly for debugging.
|
||||
|
||||
Note: /dev/initrd is read-only and it can only be used once. As soon
|
||||
as the last process has closed it, all data is freed and /dev/initrd
|
||||
can't be opened anymore.
|
||||
|
||||
root=/dev/ram0
|
||||
|
||||
initrd is mounted as root, and the normal boot procedure is followed,
|
||||
with the RAM disk mounted as root.
|
||||
|
||||
Compressed cpio images
|
||||
----------------------
|
||||
|
||||
Recent kernels have support for populating a ramdisk from a compressed cpio
|
||||
archive. On such systems, the creation of a ramdisk image doesn't need to
|
||||
involve special block devices or loopbacks; you merely create a directory on
|
||||
disk with the desired initrd content, cd to that directory, and run (as an
|
||||
example)::
|
||||
|
||||
find . | cpio --quiet -H newc -o | gzip -9 -n > /boot/imagefile.img
|
||||
|
||||
Examining the contents of an existing image file is just as simple::
|
||||
|
||||
mkdir /tmp/imagefile
|
||||
cd /tmp/imagefile
|
||||
gzip -cd /boot/imagefile.img | cpio -imd --quiet
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
First, a directory for the initrd file system has to be created on the
|
||||
"normal" root file system, e.g.::
|
||||
|
||||
# mkdir /initrd
|
||||
|
||||
The name is not relevant. More details can be found on the
|
||||
:manpage:`pivot_root(2)` man page.
|
||||
|
||||
If the root file system is created during the boot procedure (i.e. if
|
||||
you're building an install floppy), the root file system creation
|
||||
procedure should create the ``/initrd`` directory.
|
||||
|
||||
If initrd will not be mounted in some cases, its content is still
|
||||
accessible if the following device has been created::
|
||||
|
||||
# mknod /dev/initrd b 1 250
|
||||
# chmod 400 /dev/initrd
|
||||
|
||||
Second, the kernel has to be compiled with RAM disk support and with
|
||||
support for the initial RAM disk enabled. Also, at least all components
|
||||
needed to execute programs from initrd (e.g. executable format and file
|
||||
system) must be compiled into the kernel.
|
||||
|
||||
Third, you have to create the RAM disk image. This is done by creating a
|
||||
file system on a block device, copying files to it as needed, and then
|
||||
copying the content of the block device to the initrd file. With recent
|
||||
kernels, at least three types of devices are suitable for that:
|
||||
|
||||
- a floppy disk (works everywhere but it's painfully slow)
|
||||
- a RAM disk (fast, but allocates physical memory)
|
||||
- a loopback device (the most elegant solution)
|
||||
|
||||
We'll describe the loopback device method:
|
||||
|
||||
1) make sure loopback block devices are configured into the kernel
|
||||
2) create an empty file system of the appropriate size, e.g.::
|
||||
|
||||
# dd if=/dev/zero of=initrd bs=300k count=1
|
||||
# mke2fs -F -m0 initrd
|
||||
|
||||
(if space is critical, you may want to use the Minix FS instead of Ext2)
|
||||
3) mount the file system, e.g.::
|
||||
|
||||
# mount -t ext2 -o loop initrd /mnt
|
||||
|
||||
4) create the console device::
|
||||
|
||||
# mkdir /mnt/dev
|
||||
# mknod /mnt/dev/console c 5 1
|
||||
|
||||
5) copy all the files that are needed to properly use the initrd
|
||||
environment. Don't forget the most important file, ``/sbin/init``
|
||||
|
||||
.. note:: ``/sbin/init`` permissions must include "x" (execute).
|
||||
|
||||
6) correct operation the initrd environment can frequently be tested
|
||||
even without rebooting with the command::
|
||||
|
||||
# chroot /mnt /sbin/init
|
||||
|
||||
This is of course limited to initrds that do not interfere with the
|
||||
general system state (e.g. by reconfiguring network interfaces,
|
||||
overwriting mounted devices, trying to start already running demons,
|
||||
etc. Note however that it is usually possible to use pivot_root in
|
||||
such a chroot'ed initrd environment.)
|
||||
7) unmount the file system::
|
||||
|
||||
# umount /mnt
|
||||
|
||||
8) the initrd is now in the file "initrd". Optionally, it can now be
|
||||
compressed::
|
||||
|
||||
# gzip -9 initrd
|
||||
|
||||
For experimenting with initrd, you may want to take a rescue floppy and
|
||||
only add a symbolic link from ``/sbin/init`` to ``/bin/sh``. Alternatively, you
|
||||
can try the experimental newlib environment [#f2]_ to create a small
|
||||
initrd.
|
||||
|
||||
Finally, you have to boot the kernel and load initrd. Almost all Linux
|
||||
boot loaders support initrd. Since the boot process is still compatible
|
||||
with an older mechanism, the following boot command line parameters
|
||||
have to be given::
|
||||
|
||||
root=/dev/ram0 rw
|
||||
|
||||
(rw is only necessary if writing to the initrd file system.)
|
||||
|
||||
With LOADLIN, you simply execute::
|
||||
|
||||
LOADLIN <kernel> initrd=<disk_image>
|
||||
|
||||
e.g.::
|
||||
|
||||
LOADLIN C:\LINUX\BZIMAGE initrd=C:\LINUX\INITRD.GZ root=/dev/ram0 rw
|
||||
|
||||
With LILO, you add the option ``INITRD=<path>`` to either the global section
|
||||
or to the section of the respective kernel in ``/etc/lilo.conf``, and pass
|
||||
the options using APPEND, e.g.::
|
||||
|
||||
image = /bzImage
|
||||
initrd = /boot/initrd.gz
|
||||
append = "root=/dev/ram0 rw"
|
||||
|
||||
and run ``/sbin/lilo``
|
||||
|
||||
For other boot loaders, please refer to the respective documentation.
|
||||
|
||||
Now you can boot and enjoy using initrd.
|
||||
|
||||
|
||||
Changing the root device
|
||||
------------------------
|
||||
|
||||
When finished with its duties, init typically changes the root device
|
||||
and proceeds with starting the Linux system on the "real" root device.
|
||||
|
||||
The procedure involves the following steps:
|
||||
- mounting the new root file system
|
||||
- turning it into the root file system
|
||||
- removing all accesses to the old (initrd) root file system
|
||||
- unmounting the initrd file system and de-allocating the RAM disk
|
||||
|
||||
Mounting the new root file system is easy: it just needs to be mounted on
|
||||
a directory under the current root. Example::
|
||||
|
||||
# mkdir /new-root
|
||||
# mount -o ro /dev/hda1 /new-root
|
||||
|
||||
The root change is accomplished with the pivot_root system call, which
|
||||
is also available via the ``pivot_root`` utility (see :manpage:`pivot_root(8)`
|
||||
man page; ``pivot_root`` is distributed with util-linux version 2.10h or higher
|
||||
[#f3]_). ``pivot_root`` moves the current root to a directory under the new
|
||||
root, and puts the new root at its place. The directory for the old root
|
||||
must exist before calling ``pivot_root``. Example::
|
||||
|
||||
# cd /new-root
|
||||
# mkdir initrd
|
||||
# pivot_root . initrd
|
||||
|
||||
Now, the init process may still access the old root via its
|
||||
executable, shared libraries, standard input/output/error, and its
|
||||
current root directory. All these references are dropped by the
|
||||
following command::
|
||||
|
||||
# exec chroot . what-follows <dev/console >dev/console 2>&1
|
||||
|
||||
Where what-follows is a program under the new root, e.g. ``/sbin/init``
|
||||
If the new root file system will be used with udev and has no valid
|
||||
``/dev`` directory, udev must be initialized before invoking chroot in order
|
||||
to provide ``/dev/console``.
|
||||
|
||||
Note: implementation details of pivot_root may change with time. In order
|
||||
to ensure compatibility, the following points should be observed:
|
||||
|
||||
- before calling pivot_root, the current directory of the invoking
|
||||
process should point to the new root directory
|
||||
- use . as the first argument, and the _relative_ path of the directory
|
||||
for the old root as the second argument
|
||||
- a chroot program must be available under the old and the new root
|
||||
- chroot to the new root afterwards
|
||||
- use relative paths for dev/console in the exec command
|
||||
|
||||
Now, the initrd can be unmounted and the memory allocated by the RAM
|
||||
disk can be freed::
|
||||
|
||||
# umount /initrd
|
||||
# blockdev --flushbufs /dev/ram0
|
||||
|
||||
It is also possible to use initrd with an NFS-mounted root, see the
|
||||
:manpage:`pivot_root(8)` man page for details.
|
||||
|
||||
|
||||
Usage scenarios
|
||||
---------------
|
||||
|
||||
The main motivation for implementing initrd was to allow for modular
|
||||
kernel configuration at system installation. The procedure would work
|
||||
as follows:
|
||||
|
||||
1) system boots from floppy or other media with a minimal kernel
|
||||
(e.g. support for RAM disks, initrd, a.out, and the Ext2 FS) and
|
||||
loads initrd
|
||||
2) ``/sbin/init`` determines what is needed to (1) mount the "real" root FS
|
||||
(i.e. device type, device drivers, file system) and (2) the
|
||||
distribution media (e.g. CD-ROM, network, tape, ...). This can be
|
||||
done by asking the user, by auto-probing, or by using a hybrid
|
||||
approach.
|
||||
3) ``/sbin/init`` loads the necessary kernel modules
|
||||
4) ``/sbin/init`` creates and populates the root file system (this doesn't
|
||||
have to be a very usable system yet)
|
||||
5) ``/sbin/init`` invokes ``pivot_root`` to change the root file system and
|
||||
execs - via chroot - a program that continues the installation
|
||||
6) the boot loader is installed
|
||||
7) the boot loader is configured to load an initrd with the set of
|
||||
modules that was used to bring up the system (e.g. ``/initrd`` can be
|
||||
modified, then unmounted, and finally, the image is written from
|
||||
``/dev/ram0`` or ``/dev/rd/0`` to a file)
|
||||
8) now the system is bootable and additional installation tasks can be
|
||||
performed
|
||||
|
||||
The key role of initrd here is to re-use the configuration data during
|
||||
normal system operation without requiring the use of a bloated "generic"
|
||||
kernel or re-compiling or re-linking the kernel.
|
||||
|
||||
A second scenario is for installations where Linux runs on systems with
|
||||
different hardware configurations in a single administrative domain. In
|
||||
such cases, it is desirable to generate only a small set of kernels
|
||||
(ideally only one) and to keep the system-specific part of configuration
|
||||
information as small as possible. In this case, a common initrd could be
|
||||
generated with all the necessary modules. Then, only ``/sbin/init`` or a file
|
||||
read by it would have to be different.
|
||||
|
||||
A third scenario is more convenient recovery disks, because information
|
||||
like the location of the root FS partition doesn't have to be provided at
|
||||
boot time, but the system loaded from initrd can invoke a user-friendly
|
||||
dialog and it can also perform some sanity checks (or even some form of
|
||||
auto-detection).
|
||||
|
||||
Last not least, CD-ROM distributors may use it for better installation
|
||||
from CD, e.g. by using a boot floppy and bootstrapping a bigger RAM disk
|
||||
via initrd from CD; or by booting via a loader like ``LOADLIN`` or directly
|
||||
from the CD-ROM, and loading the RAM disk from CD without need of
|
||||
floppies.
|
||||
|
||||
|
||||
Obsolete root change mechanism
|
||||
------------------------------
|
||||
|
||||
The following mechanism was used before the introduction of pivot_root.
|
||||
Current kernels still support it, but you should _not_ rely on its
|
||||
continued availability.
|
||||
|
||||
It works by mounting the "real" root device (i.e. the one set with rdev
|
||||
in the kernel image or with root=... at the boot command line) as the
|
||||
root file system when linuxrc exits. The initrd file system is then
|
||||
unmounted, or, if it is still busy, moved to a directory ``/initrd``, if
|
||||
such a directory exists on the new root file system.
|
||||
|
||||
In order to use this mechanism, you do not have to specify the boot
|
||||
command options root, init, or rw. (If specified, they will affect
|
||||
the real root file system, not the initrd environment.)
|
||||
|
||||
If /proc is mounted, the "real" root device can be changed from within
|
||||
linuxrc by writing the number of the new root FS device to the special
|
||||
file /proc/sys/kernel/real-root-dev, e.g.::
|
||||
|
||||
# echo 0x301 >/proc/sys/kernel/real-root-dev
|
||||
|
||||
Note that the mechanism is incompatible with NFS and similar file
|
||||
systems.
|
||||
|
||||
This old, deprecated mechanism is commonly called ``change_root``, while
|
||||
the new, supported mechanism is called ``pivot_root``.
|
||||
|
||||
|
||||
Mixed change_root and pivot_root mechanism
|
||||
------------------------------------------
|
||||
|
||||
In case you did not want to use ``root=/dev/ram0`` to trigger the pivot_root
|
||||
mechanism, you may create both ``/linuxrc`` and ``/sbin/init`` in your initrd
|
||||
image.
|
||||
|
||||
``/linuxrc`` would contain only the following::
|
||||
|
||||
#! /bin/sh
|
||||
mount -n -t proc proc /proc
|
||||
echo 0x0100 >/proc/sys/kernel/real-root-dev
|
||||
umount -n /proc
|
||||
|
||||
Once linuxrc exited, the kernel would mount again your initrd as root,
|
||||
this time executing ``/sbin/init``. Again, it would be the duty of this init
|
||||
to build the right environment (maybe using the ``root= device`` passed on
|
||||
the cmdline) before the final execution of the real ``/sbin/init``.
|
||||
|
||||
|
||||
Resources
|
||||
---------
|
||||
|
||||
.. [#f1] Almesberger, Werner; "Booting Linux: The History and the Future"
|
||||
http://www.almesberger.net/cv/papers/ols2k-9.ps.gz
|
||||
.. [#f2] newlib package (experimental), with initrd example
|
||||
https://www.sourceware.org/newlib/
|
||||
.. [#f3] util-linux: Miscellaneous utilities for Linux
|
||||
https://www.kernel.org/pub/linux/utils/util-linux/
|
||||
423
Documentation/admin-guide/java.rst
Normal file
423
Documentation/admin-guide/java.rst
Normal file
@@ -0,0 +1,423 @@
|
||||
Java(tm) Binary Kernel Support for Linux v1.03
|
||||
----------------------------------------------
|
||||
|
||||
Linux beats them ALL! While all other OS's are TALKING about direct
|
||||
support of Java Binaries in the OS, Linux is doing it!
|
||||
|
||||
You can execute Java applications and Java Applets just like any
|
||||
other program after you have done the following:
|
||||
|
||||
1) You MUST FIRST install the Java Developers Kit for Linux.
|
||||
The Java on Linux HOWTO gives the details on getting and
|
||||
installing this. This HOWTO can be found at:
|
||||
|
||||
ftp://sunsite.unc.edu/pub/Linux/docs/HOWTO/Java-HOWTO
|
||||
|
||||
You should also set up a reasonable CLASSPATH environment
|
||||
variable to use Java applications that make use of any
|
||||
nonstandard classes (not included in the same directory
|
||||
as the application itself).
|
||||
|
||||
2) You have to compile BINFMT_MISC either as a module or into
|
||||
the kernel (``CONFIG_BINFMT_MISC``) and set it up properly.
|
||||
If you choose to compile it as a module, you will have
|
||||
to insert it manually with modprobe/insmod, as kmod
|
||||
cannot easily be supported with binfmt_misc.
|
||||
Read the file 'binfmt_misc.txt' in this directory to know
|
||||
more about the configuration process.
|
||||
|
||||
3) Add the following configuration items to binfmt_misc
|
||||
(you should really have read ``binfmt_misc.txt`` now):
|
||||
support for Java applications::
|
||||
|
||||
':Java:M::\xca\xfe\xba\xbe::/usr/local/bin/javawrapper:'
|
||||
|
||||
support for executable Jar files::
|
||||
|
||||
':ExecutableJAR:E::jar::/usr/local/bin/jarwrapper:'
|
||||
|
||||
support for Java Applets::
|
||||
|
||||
':Applet:E::html::/usr/bin/appletviewer:'
|
||||
|
||||
or the following, if you want to be more selective::
|
||||
|
||||
':Applet:M::<!--applet::/usr/bin/appletviewer:'
|
||||
|
||||
Of course you have to fix the path names. The path/file names given in this
|
||||
document match the Debian 2.1 system. (i.e. jdk installed in ``/usr``,
|
||||
custom wrappers from this document in ``/usr/local``)
|
||||
|
||||
Note, that for the more selective applet support you have to modify
|
||||
existing html-files to contain ``<!--applet-->`` in the first line
|
||||
(``<`` has to be the first character!) to let this work!
|
||||
|
||||
For the compiled Java programs you need a wrapper script like the
|
||||
following (this is because Java is broken in case of the filename
|
||||
handling), again fix the path names, both in the script and in the
|
||||
above given configuration string.
|
||||
|
||||
You, too, need the little program after the script. Compile like::
|
||||
|
||||
gcc -O2 -o javaclassname javaclassname.c
|
||||
|
||||
and stick it to ``/usr/local/bin``.
|
||||
|
||||
Both the javawrapper shellscript and the javaclassname program
|
||||
were supplied by Colin J. Watson <cjw44@cam.ac.uk>.
|
||||
|
||||
Javawrapper shell script:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
#!/bin/bash
|
||||
# /usr/local/bin/javawrapper - the wrapper for binfmt_misc/java
|
||||
|
||||
if [ -z "$1" ]; then
|
||||
exec 1>&2
|
||||
echo Usage: $0 class-file
|
||||
exit 1
|
||||
fi
|
||||
|
||||
CLASS=$1
|
||||
FQCLASS=`/usr/local/bin/javaclassname $1`
|
||||
FQCLASSN=`echo $FQCLASS | sed -e 's/^.*\.\([^.]*\)$/\1/'`
|
||||
FQCLASSP=`echo $FQCLASS | sed -e 's-\.-/-g' -e 's-^[^/]*$--' -e 's-/[^/]*$--'`
|
||||
|
||||
# for example:
|
||||
# CLASS=Test.class
|
||||
# FQCLASS=foo.bar.Test
|
||||
# FQCLASSN=Test
|
||||
# FQCLASSP=foo/bar
|
||||
|
||||
unset CLASSBASE
|
||||
|
||||
declare -i LINKLEVEL=0
|
||||
|
||||
while :; do
|
||||
if [ "`basename $CLASS .class`" == "$FQCLASSN" ]; then
|
||||
# See if this directory works straight off
|
||||
cd -L `dirname $CLASS`
|
||||
CLASSDIR=$PWD
|
||||
cd $OLDPWD
|
||||
if echo $CLASSDIR | grep -q "$FQCLASSP$"; then
|
||||
CLASSBASE=`echo $CLASSDIR | sed -e "s.$FQCLASSP$.."`
|
||||
break;
|
||||
fi
|
||||
# Try dereferencing the directory name
|
||||
cd -P `dirname $CLASS`
|
||||
CLASSDIR=$PWD
|
||||
cd $OLDPWD
|
||||
if echo $CLASSDIR | grep -q "$FQCLASSP$"; then
|
||||
CLASSBASE=`echo $CLASSDIR | sed -e "s.$FQCLASSP$.."`
|
||||
break;
|
||||
fi
|
||||
# If no other possible filename exists
|
||||
if [ ! -L $CLASS ]; then
|
||||
exec 1>&2
|
||||
echo $0:
|
||||
echo " $CLASS should be in a" \
|
||||
"directory tree called $FQCLASSP"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
if [ ! -L $CLASS ]; then break; fi
|
||||
# Go down one more level of symbolic links
|
||||
let LINKLEVEL+=1
|
||||
if [ $LINKLEVEL -gt 5 ]; then
|
||||
exec 1>&2
|
||||
echo $0:
|
||||
echo " Too many symbolic links encountered"
|
||||
exit 1
|
||||
fi
|
||||
CLASS=`ls --color=no -l $CLASS | sed -e 's/^.* \([^ ]*\)$/\1/'`
|
||||
done
|
||||
|
||||
if [ -z "$CLASSBASE" ]; then
|
||||
if [ -z "$FQCLASSP" ]; then
|
||||
GOODNAME=$FQCLASSN.class
|
||||
else
|
||||
GOODNAME=$FQCLASSP/$FQCLASSN.class
|
||||
fi
|
||||
exec 1>&2
|
||||
echo $0:
|
||||
echo " $FQCLASS should be in a file called $GOODNAME"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if ! echo $CLASSPATH | grep -q "^\(.*:\)*$CLASSBASE\(:.*\)*"; then
|
||||
# class is not in CLASSPATH, so prepend dir of class to CLASSPATH
|
||||
if [ -z "${CLASSPATH}" ] ; then
|
||||
export CLASSPATH=$CLASSBASE
|
||||
else
|
||||
export CLASSPATH=$CLASSBASE:$CLASSPATH
|
||||
fi
|
||||
fi
|
||||
|
||||
shift
|
||||
/usr/bin/java $FQCLASS "$@"
|
||||
|
||||
javaclassname.c:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
/* javaclassname.c
|
||||
*
|
||||
* Extracts the class name from a Java class file; intended for use in a Java
|
||||
* wrapper of the type supported by the binfmt_misc option in the Linux kernel.
|
||||
*
|
||||
* Copyright (C) 1999 Colin J. Watson <cjw44@cam.ac.uk>.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify
|
||||
* it under the terms of the GNU General Public License as published by
|
||||
* the Free Software Foundation; either version 2 of the License, or
|
||||
* (at your option) any later version.
|
||||
*
|
||||
* This program is distributed in the hope that it will be useful,
|
||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
* GNU General Public License for more details.
|
||||
*
|
||||
* You should have received a copy of the GNU General Public License
|
||||
* along with this program; if not, write to the Free Software
|
||||
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
||||
*/
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <stdio.h>
|
||||
#include <stdarg.h>
|
||||
#include <sys/types.h>
|
||||
|
||||
/* From Sun's Java VM Specification, as tag entries in the constant pool. */
|
||||
|
||||
#define CP_UTF8 1
|
||||
#define CP_INTEGER 3
|
||||
#define CP_FLOAT 4
|
||||
#define CP_LONG 5
|
||||
#define CP_DOUBLE 6
|
||||
#define CP_CLASS 7
|
||||
#define CP_STRING 8
|
||||
#define CP_FIELDREF 9
|
||||
#define CP_METHODREF 10
|
||||
#define CP_INTERFACEMETHODREF 11
|
||||
#define CP_NAMEANDTYPE 12
|
||||
#define CP_METHODHANDLE 15
|
||||
#define CP_METHODTYPE 16
|
||||
#define CP_INVOKEDYNAMIC 18
|
||||
|
||||
/* Define some commonly used error messages */
|
||||
|
||||
#define seek_error() error("%s: Cannot seek\n", program)
|
||||
#define corrupt_error() error("%s: Class file corrupt\n", program)
|
||||
#define eof_error() error("%s: Unexpected end of file\n", program)
|
||||
#define utf8_error() error("%s: Only ASCII 1-255 supported\n", program);
|
||||
|
||||
char *program;
|
||||
|
||||
long *pool;
|
||||
|
||||
u_int8_t read_8(FILE *classfile);
|
||||
u_int16_t read_16(FILE *classfile);
|
||||
void skip_constant(FILE *classfile, u_int16_t *cur);
|
||||
void error(const char *format, ...);
|
||||
int main(int argc, char **argv);
|
||||
|
||||
/* Reads in an unsigned 8-bit integer. */
|
||||
u_int8_t read_8(FILE *classfile)
|
||||
{
|
||||
int b = fgetc(classfile);
|
||||
if(b == EOF)
|
||||
eof_error();
|
||||
return (u_int8_t)b;
|
||||
}
|
||||
|
||||
/* Reads in an unsigned 16-bit integer. */
|
||||
u_int16_t read_16(FILE *classfile)
|
||||
{
|
||||
int b1, b2;
|
||||
b1 = fgetc(classfile);
|
||||
if(b1 == EOF)
|
||||
eof_error();
|
||||
b2 = fgetc(classfile);
|
||||
if(b2 == EOF)
|
||||
eof_error();
|
||||
return (u_int16_t)((b1 << 8) | b2);
|
||||
}
|
||||
|
||||
/* Reads in a value from the constant pool. */
|
||||
void skip_constant(FILE *classfile, u_int16_t *cur)
|
||||
{
|
||||
u_int16_t len;
|
||||
int seekerr = 1;
|
||||
pool[*cur] = ftell(classfile);
|
||||
switch(read_8(classfile))
|
||||
{
|
||||
case CP_UTF8:
|
||||
len = read_16(classfile);
|
||||
seekerr = fseek(classfile, len, SEEK_CUR);
|
||||
break;
|
||||
case CP_CLASS:
|
||||
case CP_STRING:
|
||||
case CP_METHODTYPE:
|
||||
seekerr = fseek(classfile, 2, SEEK_CUR);
|
||||
break;
|
||||
case CP_METHODHANDLE:
|
||||
seekerr = fseek(classfile, 3, SEEK_CUR);
|
||||
break;
|
||||
case CP_INTEGER:
|
||||
case CP_FLOAT:
|
||||
case CP_FIELDREF:
|
||||
case CP_METHODREF:
|
||||
case CP_INTERFACEMETHODREF:
|
||||
case CP_NAMEANDTYPE:
|
||||
case CP_INVOKEDYNAMIC:
|
||||
seekerr = fseek(classfile, 4, SEEK_CUR);
|
||||
break;
|
||||
case CP_LONG:
|
||||
case CP_DOUBLE:
|
||||
seekerr = fseek(classfile, 8, SEEK_CUR);
|
||||
++(*cur);
|
||||
break;
|
||||
default:
|
||||
corrupt_error();
|
||||
}
|
||||
if(seekerr)
|
||||
seek_error();
|
||||
}
|
||||
|
||||
void error(const char *format, ...)
|
||||
{
|
||||
va_list ap;
|
||||
va_start(ap, format);
|
||||
vfprintf(stderr, format, ap);
|
||||
va_end(ap);
|
||||
exit(1);
|
||||
}
|
||||
|
||||
int main(int argc, char **argv)
|
||||
{
|
||||
FILE *classfile;
|
||||
u_int16_t cp_count, i, this_class, classinfo_ptr;
|
||||
u_int8_t length;
|
||||
|
||||
program = argv[0];
|
||||
|
||||
if(!argv[1])
|
||||
error("%s: Missing input file\n", program);
|
||||
classfile = fopen(argv[1], "rb");
|
||||
if(!classfile)
|
||||
error("%s: Error opening %s\n", program, argv[1]);
|
||||
|
||||
if(fseek(classfile, 8, SEEK_SET)) /* skip magic and version numbers */
|
||||
seek_error();
|
||||
cp_count = read_16(classfile);
|
||||
pool = calloc(cp_count, sizeof(long));
|
||||
if(!pool)
|
||||
error("%s: Out of memory for constant pool\n", program);
|
||||
|
||||
for(i = 1; i < cp_count; ++i)
|
||||
skip_constant(classfile, &i);
|
||||
if(fseek(classfile, 2, SEEK_CUR)) /* skip access flags */
|
||||
seek_error();
|
||||
|
||||
this_class = read_16(classfile);
|
||||
if(this_class < 1 || this_class >= cp_count)
|
||||
corrupt_error();
|
||||
if(!pool[this_class] || pool[this_class] == -1)
|
||||
corrupt_error();
|
||||
if(fseek(classfile, pool[this_class] + 1, SEEK_SET))
|
||||
seek_error();
|
||||
|
||||
classinfo_ptr = read_16(classfile);
|
||||
if(classinfo_ptr < 1 || classinfo_ptr >= cp_count)
|
||||
corrupt_error();
|
||||
if(!pool[classinfo_ptr] || pool[classinfo_ptr] == -1)
|
||||
corrupt_error();
|
||||
if(fseek(classfile, pool[classinfo_ptr] + 1, SEEK_SET))
|
||||
seek_error();
|
||||
|
||||
length = read_16(classfile);
|
||||
for(i = 0; i < length; ++i)
|
||||
{
|
||||
u_int8_t x = read_8(classfile);
|
||||
if((x & 0x80) || !x)
|
||||
{
|
||||
if((x & 0xE0) == 0xC0)
|
||||
{
|
||||
u_int8_t y = read_8(classfile);
|
||||
if((y & 0xC0) == 0x80)
|
||||
{
|
||||
int c = ((x & 0x1f) << 6) + (y & 0x3f);
|
||||
if(c) putchar(c);
|
||||
else utf8_error();
|
||||
}
|
||||
else utf8_error();
|
||||
}
|
||||
else utf8_error();
|
||||
}
|
||||
else if(x == '/') putchar('.');
|
||||
else putchar(x);
|
||||
}
|
||||
putchar('\n');
|
||||
free(pool);
|
||||
fclose(classfile);
|
||||
return 0;
|
||||
}
|
||||
|
||||
jarwrapper::
|
||||
|
||||
#!/bin/bash
|
||||
# /usr/local/java/bin/jarwrapper - the wrapper for binfmt_misc/jar
|
||||
|
||||
java -jar $1
|
||||
|
||||
|
||||
Now simply ``chmod +x`` the ``.class``, ``.jar`` and/or ``.html`` files you
|
||||
want to execute.
|
||||
|
||||
To add a Java program to your path best put a symbolic link to the main
|
||||
.class file into /usr/bin (or another place you like) omitting the .class
|
||||
extension. The directory containing the original .class file will be
|
||||
added to your CLASSPATH during execution.
|
||||
|
||||
|
||||
To test your new setup, enter in the following simple Java app, and name
|
||||
it "HelloWorld.java":
|
||||
|
||||
.. code-block:: java
|
||||
|
||||
class HelloWorld {
|
||||
public static void main(String args[]) {
|
||||
System.out.println("Hello World!");
|
||||
}
|
||||
}
|
||||
|
||||
Now compile the application with::
|
||||
|
||||
javac HelloWorld.java
|
||||
|
||||
Set the executable permissions of the binary file, with::
|
||||
|
||||
chmod 755 HelloWorld.class
|
||||
|
||||
And then execute it::
|
||||
|
||||
./HelloWorld.class
|
||||
|
||||
|
||||
To execute Java Jar files, simple chmod the ``*.jar`` files to include
|
||||
the execution bit, then just do::
|
||||
|
||||
./Application.jar
|
||||
|
||||
|
||||
To execute Java Applets, simple chmod the ``*.html`` files to include
|
||||
the execution bit, then just do::
|
||||
|
||||
./Applet.html
|
||||
|
||||
|
||||
originally by Brian A. Lantz, brian@lantz.com
|
||||
heavily edited for binfmt_misc by Richard Günther
|
||||
new scripts by Colin J. Watson <cjw44@cam.ac.uk>
|
||||
added executable Jar file support by Kurt Huwig <kurt@iku-netz.de>
|
||||
209
Documentation/admin-guide/kernel-parameters.rst
Normal file
209
Documentation/admin-guide/kernel-parameters.rst
Normal file
@@ -0,0 +1,209 @@
|
||||
The kernel's command-line parameters
|
||||
====================================
|
||||
|
||||
The following is a consolidated list of the kernel parameters as
|
||||
implemented by the __setup(), core_param() and module_param() macros
|
||||
and sorted into English Dictionary order (defined as ignoring all
|
||||
punctuation and sorting digits before letters in a case insensitive
|
||||
manner), and with descriptions where known.
|
||||
|
||||
The kernel parses parameters from the kernel command line up to "--";
|
||||
if it doesn't recognize a parameter and it doesn't contain a '.', the
|
||||
parameter gets passed to init: parameters with '=' go into init's
|
||||
environment, others are passed as command line arguments to init.
|
||||
Everything after "--" is passed as an argument to init.
|
||||
|
||||
Module parameters can be specified in two ways: via the kernel command
|
||||
line with a module name prefix, or via modprobe, e.g.::
|
||||
|
||||
(kernel command line) usbcore.blinkenlights=1
|
||||
(modprobe command line) modprobe usbcore blinkenlights=1
|
||||
|
||||
Parameters for modules which are built into the kernel need to be
|
||||
specified on the kernel command line. modprobe looks through the
|
||||
kernel command line (/proc/cmdline) and collects module parameters
|
||||
when it loads a module, so the kernel command line can be used for
|
||||
loadable modules too.
|
||||
|
||||
Hyphens (dashes) and underscores are equivalent in parameter names, so::
|
||||
|
||||
log_buf_len=1M print-fatal-signals=1
|
||||
|
||||
can also be entered as::
|
||||
|
||||
log-buf-len=1M print_fatal_signals=1
|
||||
|
||||
Double-quotes can be used to protect spaces in values, e.g.::
|
||||
|
||||
param="spaces in here"
|
||||
|
||||
cpu lists:
|
||||
----------
|
||||
|
||||
Some kernel parameters take a list of CPUs as a value, e.g. isolcpus,
|
||||
nohz_full, irqaffinity, rcu_nocbs. The format of this list is:
|
||||
|
||||
<cpu number>,...,<cpu number>
|
||||
|
||||
or
|
||||
|
||||
<cpu number>-<cpu number>
|
||||
(must be a positive range in ascending order)
|
||||
|
||||
or a mixture
|
||||
|
||||
<cpu number>,...,<cpu number>-<cpu number>
|
||||
|
||||
Note that for the special case of a range one can split the range into equal
|
||||
sized groups and for each group use some amount from the beginning of that
|
||||
group:
|
||||
|
||||
<cpu number>-cpu number>:<used size>/<group size>
|
||||
|
||||
For example one can add to the command line following parameter:
|
||||
|
||||
isolcpus=1,2,10-20,100-2000:2/25
|
||||
|
||||
where the final item represents CPUs 100,101,125,126,150,151,...
|
||||
|
||||
|
||||
|
||||
This document may not be entirely up to date and comprehensive. The command
|
||||
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
|
||||
module. Loadable modules, after being loaded into the running kernel, also
|
||||
reveal their parameters in /sys/module/${modulename}/parameters/. Some of these
|
||||
parameters may be changed at runtime by the command
|
||||
``echo -n ${value} > /sys/module/${modulename}/parameters/${parm}``.
|
||||
|
||||
The parameters listed below are only valid if certain kernel build options were
|
||||
enabled and if respective hardware is present. The text in square brackets at
|
||||
the beginning of each description states the restrictions within which a
|
||||
parameter is applicable::
|
||||
|
||||
ACPI ACPI support is enabled.
|
||||
AGP AGP (Accelerated Graphics Port) is enabled.
|
||||
ALSA ALSA sound support is enabled.
|
||||
APIC APIC support is enabled.
|
||||
APM Advanced Power Management support is enabled.
|
||||
ARM ARM architecture is enabled.
|
||||
AVR32 AVR32 architecture is enabled.
|
||||
AX25 Appropriate AX.25 support is enabled.
|
||||
BLACKFIN Blackfin architecture is enabled.
|
||||
CLK Common clock infrastructure is enabled.
|
||||
CMA Contiguous Memory Area support is enabled.
|
||||
DRM Direct Rendering Management support is enabled.
|
||||
DYNAMIC_DEBUG Build in debug messages and enable them at runtime
|
||||
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
|
||||
EFI EFI Partitioning (GPT) is enabled
|
||||
EIDE EIDE/ATAPI support is enabled.
|
||||
EVM Extended Verification Module
|
||||
FB The frame buffer device is enabled.
|
||||
FTRACE Function tracing enabled.
|
||||
GCOV GCOV profiling is enabled.
|
||||
HW Appropriate hardware is enabled.
|
||||
IA-64 IA-64 architecture is enabled.
|
||||
IMA Integrity measurement architecture is enabled.
|
||||
IOSCHED More than one I/O scheduler is enabled.
|
||||
IP_PNP IP DHCP, BOOTP, or RARP is enabled.
|
||||
IPV6 IPv6 support is enabled.
|
||||
ISAPNP ISA PnP code is enabled.
|
||||
ISDN Appropriate ISDN support is enabled.
|
||||
JOY Appropriate joystick support is enabled.
|
||||
KGDB Kernel debugger support is enabled.
|
||||
KVM Kernel Virtual Machine support is enabled.
|
||||
LIBATA Libata driver is enabled
|
||||
LP Printer support is enabled.
|
||||
LOOP Loopback device support is enabled.
|
||||
M68k M68k architecture is enabled.
|
||||
These options have more detailed description inside of
|
||||
Documentation/m68k/kernel-options.txt.
|
||||
MDA MDA console support is enabled.
|
||||
MIPS MIPS architecture is enabled.
|
||||
MOUSE Appropriate mouse support is enabled.
|
||||
MSI Message Signaled Interrupts (PCI).
|
||||
MTD MTD (Memory Technology Device) support is enabled.
|
||||
NET Appropriate network support is enabled.
|
||||
NUMA NUMA support is enabled.
|
||||
NFS Appropriate NFS support is enabled.
|
||||
OSS OSS sound support is enabled.
|
||||
PV_OPS A paravirtualized kernel is enabled.
|
||||
PARIDE The ParIDE (parallel port IDE) subsystem is enabled.
|
||||
PARISC The PA-RISC architecture is enabled.
|
||||
PCI PCI bus support is enabled.
|
||||
PCIE PCI Express support is enabled.
|
||||
PCMCIA The PCMCIA subsystem is enabled.
|
||||
PNP Plug & Play support is enabled.
|
||||
PPC PowerPC architecture is enabled.
|
||||
PPT Parallel port support is enabled.
|
||||
PS2 Appropriate PS/2 support is enabled.
|
||||
RAM RAM disk support is enabled.
|
||||
S390 S390 architecture is enabled.
|
||||
SCSI Appropriate SCSI support is enabled.
|
||||
A lot of drivers have their options described inside
|
||||
the Documentation/scsi/ sub-directory.
|
||||
SECURITY Different security models are enabled.
|
||||
SELINUX SELinux support is enabled.
|
||||
APPARMOR AppArmor support is enabled.
|
||||
SERIAL Serial support is enabled.
|
||||
SH SuperH architecture is enabled.
|
||||
SMP The kernel is an SMP kernel.
|
||||
SPARC Sparc architecture is enabled.
|
||||
SWSUSP Software suspend (hibernation) is enabled.
|
||||
SUSPEND System suspend states are enabled.
|
||||
TPM TPM drivers are enabled.
|
||||
TS Appropriate touchscreen support is enabled.
|
||||
UMS USB Mass Storage support is enabled.
|
||||
USB USB support is enabled.
|
||||
USBHID USB Human Interface Device support is enabled.
|
||||
V4L Video For Linux support is enabled.
|
||||
VMMIO Driver for memory mapped virtio devices is enabled.
|
||||
VGA The VGA console has been enabled.
|
||||
VT Virtual terminal support is enabled.
|
||||
WDT Watchdog support is enabled.
|
||||
XT IBM PC/XT MFM hard disk support is enabled.
|
||||
X86-32 X86-32, aka i386 architecture is enabled.
|
||||
X86-64 X86-64 architecture is enabled.
|
||||
More X86-64 boot options can be found in
|
||||
Documentation/x86/x86_64/boot-options.txt .
|
||||
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
|
||||
X86_UV SGI UV support is enabled.
|
||||
XEN Xen support is enabled
|
||||
|
||||
In addition, the following text indicates that the option::
|
||||
|
||||
BUGS= Relates to possible processor bugs on the said processor.
|
||||
KNL Is a kernel start-up parameter.
|
||||
BOOT Is a boot loader parameter.
|
||||
|
||||
Parameters denoted with BOOT are actually interpreted by the boot
|
||||
loader, and have no meaning to the kernel directly.
|
||||
Do not modify the syntax of boot loader parameters without extreme
|
||||
need or coordination with <Documentation/x86/boot.txt>.
|
||||
|
||||
There are also arch-specific kernel-parameters not documented here.
|
||||
See for example <Documentation/x86/x86_64/boot-options.txt>.
|
||||
|
||||
Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
|
||||
a trailing = on the name of any parameter states that that parameter will
|
||||
be entered as an environment variable, whereas its absence indicates that
|
||||
it will appear as a kernel argument readable via /proc/cmdline by programs
|
||||
running once the system is up.
|
||||
|
||||
The number of kernel parameters is not limited, but the length of the
|
||||
complete command line (parameters including spaces etc.) is limited to
|
||||
a fixed number of characters. This limit depends on the architecture
|
||||
and is between 256 and 4096 characters. It is defined in the file
|
||||
./include/asm/setup.h as COMMAND_LINE_SIZE.
|
||||
|
||||
Finally, the [KMG] suffix is commonly described after a number of kernel
|
||||
parameter values. These 'K', 'M', and 'G' letters represent the _binary_
|
||||
multipliers 'Kilo', 'Mega', and 'Giga', equalling 2^10, 2^20, and 2^30
|
||||
bytes respectively. Such letter suffixes can also be entirely omitted:
|
||||
|
||||
.. include:: kernel-parameters.txt
|
||||
:literal:
|
||||
|
||||
Todo
|
||||
----
|
||||
|
||||
Add more DRM drivers.
|
||||
4379
Documentation/admin-guide/kernel-parameters.txt
Normal file
4379
Documentation/admin-guide/kernel-parameters.txt
Normal file
File diff suppressed because it is too large
Load Diff
727
Documentation/admin-guide/md.rst
Normal file
727
Documentation/admin-guide/md.rst
Normal file
@@ -0,0 +1,727 @@
|
||||
RAID arrays
|
||||
===========
|
||||
|
||||
Boot time assembly of RAID arrays
|
||||
---------------------------------
|
||||
|
||||
Tools that manage md devices can be found at
|
||||
http://www.kernel.org/pub/linux/utils/raid/
|
||||
|
||||
|
||||
You can boot with your md device with the following kernel command
|
||||
lines:
|
||||
|
||||
for old raid arrays without persistent superblocks::
|
||||
|
||||
md=<md device no.>,<raid level>,<chunk size factor>,<fault level>,dev0,dev1,...,devn
|
||||
|
||||
for raid arrays with persistent superblocks::
|
||||
|
||||
md=<md device no.>,dev0,dev1,...,devn
|
||||
|
||||
or, to assemble a partitionable array::
|
||||
|
||||
md=d<md device no.>,dev0,dev1,...,devn
|
||||
|
||||
``md device no.``
|
||||
+++++++++++++++++
|
||||
|
||||
The number of the md device
|
||||
|
||||
================= =========
|
||||
``md device no.`` device
|
||||
================= =========
|
||||
0 md0
|
||||
1 md1
|
||||
2 md2
|
||||
3 md3
|
||||
4 md4
|
||||
================= =========
|
||||
|
||||
``raid level``
|
||||
++++++++++++++
|
||||
|
||||
level of the RAID array
|
||||
|
||||
=============== =============
|
||||
``raid level`` level
|
||||
=============== =============
|
||||
-1 linear mode
|
||||
0 striped mode
|
||||
=============== =============
|
||||
|
||||
other modes are only supported with persistent super blocks
|
||||
|
||||
``chunk size factor``
|
||||
+++++++++++++++++++++
|
||||
|
||||
(raid-0 and raid-1 only)
|
||||
|
||||
Set the chunk size as 4k << n.
|
||||
|
||||
``fault level``
|
||||
+++++++++++++++
|
||||
|
||||
Totally ignored
|
||||
|
||||
``dev0`` to ``devn``
|
||||
++++++++++++++++++++
|
||||
|
||||
e.g. ``/dev/hda1``, ``/dev/hdc1``, ``/dev/sda1``, ``/dev/sdb1``
|
||||
|
||||
A possible loadlin line (Harald Hoyer <HarryH@Royal.Net>) looks like this::
|
||||
|
||||
e:\loadlin\loadlin e:\zimage root=/dev/md0 md=0,0,4,0,/dev/hdb2,/dev/hdc3 ro
|
||||
|
||||
|
||||
Boot time autodetection of RAID arrays
|
||||
--------------------------------------
|
||||
|
||||
When md is compiled into the kernel (not as module), partitions of
|
||||
type 0xfd are scanned and automatically assembled into RAID arrays.
|
||||
This autodetection may be suppressed with the kernel parameter
|
||||
``raid=noautodetect``. As of kernel 2.6.9, only drives with a type 0
|
||||
superblock can be autodetected and run at boot time.
|
||||
|
||||
The kernel parameter ``raid=partitionable`` (or ``raid=part``) means
|
||||
that all auto-detected arrays are assembled as partitionable.
|
||||
|
||||
Boot time assembly of degraded/dirty arrays
|
||||
-------------------------------------------
|
||||
|
||||
If a raid5 or raid6 array is both dirty and degraded, it could have
|
||||
undetectable data corruption. This is because the fact that it is
|
||||
``dirty`` means that the parity cannot be trusted, and the fact that it
|
||||
is degraded means that some datablocks are missing and cannot reliably
|
||||
be reconstructed (due to no parity).
|
||||
|
||||
For this reason, md will normally refuse to start such an array. This
|
||||
requires the sysadmin to take action to explicitly start the array
|
||||
despite possible corruption. This is normally done with::
|
||||
|
||||
mdadm --assemble --force ....
|
||||
|
||||
This option is not really available if the array has the root
|
||||
filesystem on it. In order to support this booting from such an
|
||||
array, md supports a module parameter ``start_dirty_degraded`` which,
|
||||
when set to 1, bypassed the checks and will allows dirty degraded
|
||||
arrays to be started.
|
||||
|
||||
So, to boot with a root filesystem of a dirty degraded raid 5 or 6, use::
|
||||
|
||||
md-mod.start_dirty_degraded=1
|
||||
|
||||
|
||||
Superblock formats
|
||||
------------------
|
||||
|
||||
The md driver can support a variety of different superblock formats.
|
||||
Currently, it supports superblock formats ``0.90.0`` and the ``md-1`` format
|
||||
introduced in the 2.5 development series.
|
||||
|
||||
The kernel will autodetect which format superblock is being used.
|
||||
|
||||
Superblock format ``0`` is treated differently to others for legacy
|
||||
reasons - it is the original superblock format.
|
||||
|
||||
|
||||
General Rules - apply for all superblock formats
|
||||
------------------------------------------------
|
||||
|
||||
An array is ``created`` by writing appropriate superblocks to all
|
||||
devices.
|
||||
|
||||
It is ``assembled`` by associating each of these devices with an
|
||||
particular md virtual device. Once it is completely assembled, it can
|
||||
be accessed.
|
||||
|
||||
An array should be created by a user-space tool. This will write
|
||||
superblocks to all devices. It will usually mark the array as
|
||||
``unclean``, or with some devices missing so that the kernel md driver
|
||||
can create appropriate redundancy (copying in raid 1, parity
|
||||
calculation in raid 4/5).
|
||||
|
||||
When an array is assembled, it is first initialized with the
|
||||
SET_ARRAY_INFO ioctl. This contains, in particular, a major and minor
|
||||
version number. The major version number selects which superblock
|
||||
format is to be used. The minor number might be used to tune handling
|
||||
of the format, such as suggesting where on each device to look for the
|
||||
superblock.
|
||||
|
||||
Then each device is added using the ADD_NEW_DISK ioctl. This
|
||||
provides, in particular, a major and minor number identifying the
|
||||
device to add.
|
||||
|
||||
The array is started with the RUN_ARRAY ioctl.
|
||||
|
||||
Once started, new devices can be added. They should have an
|
||||
appropriate superblock written to them, and then be passed in with
|
||||
ADD_NEW_DISK.
|
||||
|
||||
Devices that have failed or are not yet active can be detached from an
|
||||
array using HOT_REMOVE_DISK.
|
||||
|
||||
|
||||
Specific Rules that apply to format-0 super block arrays, and arrays with no superblock (non-persistent)
|
||||
--------------------------------------------------------------------------------------------------------
|
||||
|
||||
An array can be ``created`` by describing the array (level, chunksize
|
||||
etc) in a SET_ARRAY_INFO ioctl. This must have ``major_version==0`` and
|
||||
``raid_disks != 0``.
|
||||
|
||||
Then uninitialized devices can be added with ADD_NEW_DISK. The
|
||||
structure passed to ADD_NEW_DISK must specify the state of the device
|
||||
and its role in the array.
|
||||
|
||||
Once started with RUN_ARRAY, uninitialized spares can be added with
|
||||
HOT_ADD_DISK.
|
||||
|
||||
|
||||
MD devices in sysfs
|
||||
-------------------
|
||||
|
||||
md devices appear in sysfs (``/sys``) as regular block devices,
|
||||
e.g.::
|
||||
|
||||
/sys/block/md0
|
||||
|
||||
Each ``md`` device will contain a subdirectory called ``md`` which
|
||||
contains further md-specific information about the device.
|
||||
|
||||
All md devices contain:
|
||||
|
||||
level
|
||||
a text file indicating the ``raid level``. e.g. raid0, raid1,
|
||||
raid5, linear, multipath, faulty.
|
||||
If no raid level has been set yet (array is still being
|
||||
assembled), the value will reflect whatever has been written
|
||||
to it, which may be a name like the above, or may be a number
|
||||
such as ``0``, ``5``, etc.
|
||||
|
||||
raid_disks
|
||||
a text file with a simple number indicating the number of devices
|
||||
in a fully functional array. If this is not yet known, the file
|
||||
will be empty. If an array is being resized this will contain
|
||||
the new number of devices.
|
||||
Some raid levels allow this value to be set while the array is
|
||||
active. This will reconfigure the array. Otherwise it can only
|
||||
be set while assembling an array.
|
||||
A change to this attribute will not be permitted if it would
|
||||
reduce the size of the array. To reduce the number of drives
|
||||
in an e.g. raid5, the array size must first be reduced by
|
||||
setting the ``array_size`` attribute.
|
||||
|
||||
chunk_size
|
||||
This is the size in bytes for ``chunks`` and is only relevant to
|
||||
raid levels that involve striping (0,4,5,6,10). The address space
|
||||
of the array is conceptually divided into chunks and consecutive
|
||||
chunks are striped onto neighbouring devices.
|
||||
The size should be at least PAGE_SIZE (4k) and should be a power
|
||||
of 2. This can only be set while assembling an array
|
||||
|
||||
layout
|
||||
The ``layout`` for the array for the particular level. This is
|
||||
simply a number that is interpretted differently by different
|
||||
levels. It can be written while assembling an array.
|
||||
|
||||
array_size
|
||||
This can be used to artificially constrain the available space in
|
||||
the array to be less than is actually available on the combined
|
||||
devices. Writing a number (in Kilobytes) which is less than
|
||||
the available size will set the size. Any reconfiguration of the
|
||||
array (e.g. adding devices) will not cause the size to change.
|
||||
Writing the word ``default`` will cause the effective size of the
|
||||
array to be whatever size is actually available based on
|
||||
``level``, ``chunk_size`` and ``component_size``.
|
||||
|
||||
This can be used to reduce the size of the array before reducing
|
||||
the number of devices in a raid4/5/6, or to support external
|
||||
metadata formats which mandate such clipping.
|
||||
|
||||
reshape_position
|
||||
This is either ``none`` or a sector number within the devices of
|
||||
the array where ``reshape`` is up to. If this is set, the three
|
||||
attributes mentioned above (raid_disks, chunk_size, layout) can
|
||||
potentially have 2 values, an old and a new value. If these
|
||||
values differ, reading the attribute returns::
|
||||
|
||||
new (old)
|
||||
|
||||
and writing will effect the ``new`` value, leaving the ``old``
|
||||
unchanged.
|
||||
|
||||
component_size
|
||||
For arrays with data redundancy (i.e. not raid0, linear, faulty,
|
||||
multipath), all components must be the same size - or at least
|
||||
there must a size that they all provide space for. This is a key
|
||||
part or the geometry of the array. It is measured in sectors
|
||||
and can be read from here. Writing to this value may resize
|
||||
the array if the personality supports it (raid1, raid5, raid6),
|
||||
and if the component drives are large enough.
|
||||
|
||||
metadata_version
|
||||
This indicates the format that is being used to record metadata
|
||||
about the array. It can be 0.90 (traditional format), 1.0, 1.1,
|
||||
1.2 (newer format in varying locations) or ``none`` indicating that
|
||||
the kernel isn't managing metadata at all.
|
||||
Alternately it can be ``external:`` followed by a string which
|
||||
is set by user-space. This indicates that metadata is managed
|
||||
by a user-space program. Any device failure or other event that
|
||||
requires a metadata update will cause array activity to be
|
||||
suspended until the event is acknowledged.
|
||||
|
||||
resync_start
|
||||
The point at which resync should start. If no resync is needed,
|
||||
this will be a very large number (or ``none`` since 2.6.30-rc1). At
|
||||
array creation it will default to 0, though starting the array as
|
||||
``clean`` will set it much larger.
|
||||
|
||||
new_dev
|
||||
This file can be written but not read. The value written should
|
||||
be a block device number as major:minor. e.g. 8:0
|
||||
This will cause that device to be attached to the array, if it is
|
||||
available. It will then appear at md/dev-XXX (depending on the
|
||||
name of the device) and further configuration is then possible.
|
||||
|
||||
safe_mode_delay
|
||||
When an md array has seen no write requests for a certain period
|
||||
of time, it will be marked as ``clean``. When another write
|
||||
request arrives, the array is marked as ``dirty`` before the write
|
||||
commences. This is known as ``safe_mode``.
|
||||
The ``certain period`` is controlled by this file which stores the
|
||||
period as a number of seconds. The default is 200msec (0.200).
|
||||
Writing a value of 0 disables safemode.
|
||||
|
||||
array_state
|
||||
This file contains a single word which describes the current
|
||||
state of the array. In many cases, the state can be set by
|
||||
writing the word for the desired state, however some states
|
||||
cannot be explicitly set, and some transitions are not allowed.
|
||||
|
||||
Select/poll works on this file. All changes except between
|
||||
Active_idle and active (which can be frequent and are not
|
||||
very interesting) are notified. active->active_idle is
|
||||
reported if the metadata is externally managed.
|
||||
|
||||
clear
|
||||
No devices, no size, no level
|
||||
|
||||
Writing is equivalent to STOP_ARRAY ioctl
|
||||
|
||||
inactive
|
||||
May have some settings, but array is not active
|
||||
all IO results in error
|
||||
|
||||
When written, doesn't tear down array, but just stops it
|
||||
|
||||
suspended (not supported yet)
|
||||
All IO requests will block. The array can be reconfigured.
|
||||
|
||||
Writing this, if accepted, will block until array is quiessent
|
||||
|
||||
readonly
|
||||
no resync can happen. no superblocks get written.
|
||||
|
||||
Write requests fail
|
||||
|
||||
read-auto
|
||||
like readonly, but behaves like ``clean`` on a write request.
|
||||
|
||||
clean
|
||||
no pending writes, but otherwise active.
|
||||
|
||||
When written to inactive array, starts without resync
|
||||
|
||||
If a write request arrives then
|
||||
if metadata is known, mark ``dirty`` and switch to ``active``.
|
||||
if not known, block and switch to write-pending
|
||||
|
||||
If written to an active array that has pending writes, then fails.
|
||||
active
|
||||
fully active: IO and resync can be happening.
|
||||
When written to inactive array, starts with resync
|
||||
|
||||
write-pending
|
||||
clean, but writes are blocked waiting for ``active`` to be written.
|
||||
|
||||
active-idle
|
||||
like active, but no writes have been seen for a while (safe_mode_delay).
|
||||
|
||||
bitmap/location
|
||||
This indicates where the write-intent bitmap for the array is
|
||||
stored.
|
||||
|
||||
It can be one of ``none``, ``file`` or ``[+-]N``.
|
||||
``file`` may later be extended to ``file:/file/name``
|
||||
``[+-]N`` means that many sectors from the start of the metadata.
|
||||
|
||||
This is replicated on all devices. For arrays with externally
|
||||
managed metadata, the offset is from the beginning of the
|
||||
device.
|
||||
|
||||
bitmap/chunksize
|
||||
The size, in bytes, of the chunk which will be represented by a
|
||||
single bit. For RAID456, it is a portion of an individual
|
||||
device. For RAID10, it is a portion of the array. For RAID1, it
|
||||
is both (they come to the same thing).
|
||||
|
||||
bitmap/time_base
|
||||
The time, in seconds, between looking for bits in the bitmap to
|
||||
be cleared. In the current implementation, a bit will be cleared
|
||||
between 2 and 3 times ``time_base`` after all the covered blocks
|
||||
are known to be in-sync.
|
||||
|
||||
bitmap/backlog
|
||||
When write-mostly devices are active in a RAID1, write requests
|
||||
to those devices proceed in the background - the filesystem (or
|
||||
other user of the device) does not have to wait for them.
|
||||
``backlog`` sets a limit on the number of concurrent background
|
||||
writes. If there are more than this, new writes will by
|
||||
synchronous.
|
||||
|
||||
bitmap/metadata
|
||||
This can be either ``internal`` or ``external``.
|
||||
|
||||
``internal``
|
||||
is the default and means the metadata for the bitmap
|
||||
is stored in the first 256 bytes of the allocated space and is
|
||||
managed by the md module.
|
||||
|
||||
``external``
|
||||
means that bitmap metadata is managed externally to
|
||||
the kernel (i.e. by some userspace program)
|
||||
|
||||
bitmap/can_clear
|
||||
This is either ``true`` or ``false``. If ``true``, then bits in the
|
||||
bitmap will be cleared when the corresponding blocks are thought
|
||||
to be in-sync. If ``false``, bits will never be cleared.
|
||||
This is automatically set to ``false`` if a write happens on a
|
||||
degraded array, or if the array becomes degraded during a write.
|
||||
When metadata is managed externally, it should be set to true
|
||||
once the array becomes non-degraded, and this fact has been
|
||||
recorded in the metadata.
|
||||
|
||||
|
||||
|
||||
|
||||
As component devices are added to an md array, they appear in the ``md``
|
||||
directory as new directories named::
|
||||
|
||||
dev-XXX
|
||||
|
||||
where ``XXX`` is a name that the kernel knows for the device, e.g. hdb1.
|
||||
Each directory contains:
|
||||
|
||||
block
|
||||
a symlink to the block device in /sys/block, e.g.::
|
||||
|
||||
/sys/block/md0/md/dev-hdb1/block -> ../../../../block/hdb/hdb1
|
||||
|
||||
super
|
||||
A file containing an image of the superblock read from, or
|
||||
written to, that device.
|
||||
|
||||
state
|
||||
A file recording the current state of the device in the array
|
||||
which can be a comma separated list of:
|
||||
|
||||
faulty
|
||||
device has been kicked from active use due to
|
||||
a detected fault, or it has unacknowledged bad
|
||||
blocks
|
||||
|
||||
in_sync
|
||||
device is a fully in-sync member of the array
|
||||
|
||||
writemostly
|
||||
device will only be subject to read
|
||||
requests if there are no other options.
|
||||
|
||||
This applies only to raid1 arrays.
|
||||
|
||||
blocked
|
||||
device has failed, and the failure hasn't been
|
||||
acknowledged yet by the metadata handler.
|
||||
|
||||
Writes that would write to this device if
|
||||
it were not faulty are blocked.
|
||||
|
||||
spare
|
||||
device is working, but not a full member.
|
||||
|
||||
This includes spares that are in the process
|
||||
of being recovered to
|
||||
|
||||
write_error
|
||||
device has ever seen a write error.
|
||||
|
||||
want_replacement
|
||||
device is (mostly) working but probably
|
||||
should be replaced, either due to errors or
|
||||
due to user request.
|
||||
|
||||
replacement
|
||||
device is a replacement for another active
|
||||
device with same raid_disk.
|
||||
|
||||
|
||||
This list may grow in future.
|
||||
|
||||
This can be written to.
|
||||
|
||||
Writing ``faulty`` simulates a failure on the device.
|
||||
|
||||
Writing ``remove`` removes the device from the array.
|
||||
|
||||
Writing ``writemostly`` sets the writemostly flag.
|
||||
|
||||
Writing ``-writemostly`` clears the writemostly flag.
|
||||
|
||||
Writing ``blocked`` sets the ``blocked`` flag.
|
||||
|
||||
Writing ``-blocked`` clears the ``blocked`` flags and allows writes
|
||||
to complete and possibly simulates an error.
|
||||
|
||||
Writing ``in_sync`` sets the in_sync flag.
|
||||
|
||||
Writing ``write_error`` sets writeerrorseen flag.
|
||||
|
||||
Writing ``-write_error`` clears writeerrorseen flag.
|
||||
|
||||
Writing ``want_replacement`` is allowed at any time except to a
|
||||
replacement device or a spare. It sets the flag.
|
||||
|
||||
Writing ``-want_replacement`` is allowed at any time. It clears
|
||||
the flag.
|
||||
|
||||
Writing ``replacement`` or ``-replacement`` is only allowed before
|
||||
starting the array. It sets or clears the flag.
|
||||
|
||||
|
||||
This file responds to select/poll. Any change to ``faulty``
|
||||
or ``blocked`` causes an event.
|
||||
|
||||
errors
|
||||
An approximate count of read errors that have been detected on
|
||||
this device but have not caused the device to be evicted from
|
||||
the array (either because they were corrected or because they
|
||||
happened while the array was read-only). When using version-1
|
||||
metadata, this value persists across restarts of the array.
|
||||
|
||||
This value can be written while assembling an array thus
|
||||
providing an ongoing count for arrays with metadata managed by
|
||||
userspace.
|
||||
|
||||
slot
|
||||
This gives the role that the device has in the array. It will
|
||||
either be ``none`` if the device is not active in the array
|
||||
(i.e. is a spare or has failed) or an integer less than the
|
||||
``raid_disks`` number for the array indicating which position
|
||||
it currently fills. This can only be set while assembling an
|
||||
array. A device for which this is set is assumed to be working.
|
||||
|
||||
offset
|
||||
This gives the location in the device (in sectors from the
|
||||
start) where data from the array will be stored. Any part of
|
||||
the device before this offset is not touched, unless it is
|
||||
used for storing metadata (Formats 1.1 and 1.2).
|
||||
|
||||
size
|
||||
The amount of the device, after the offset, that can be used
|
||||
for storage of data. This will normally be the same as the
|
||||
component_size. This can be written while assembling an
|
||||
array. If a value less than the current component_size is
|
||||
written, it will be rejected.
|
||||
|
||||
recovery_start
|
||||
When the device is not ``in_sync``, this records the number of
|
||||
sectors from the start of the device which are known to be
|
||||
correct. This is normally zero, but during a recovery
|
||||
operation it will steadily increase, and if the recovery is
|
||||
interrupted, restoring this value can cause recovery to
|
||||
avoid repeating the earlier blocks. With v1.x metadata, this
|
||||
value is saved and restored automatically.
|
||||
|
||||
This can be set whenever the device is not an active member of
|
||||
the array, either before the array is activated, or before
|
||||
the ``slot`` is set.
|
||||
|
||||
Setting this to ``none`` is equivalent to setting ``in_sync``.
|
||||
Setting to any other value also clears the ``in_sync`` flag.
|
||||
|
||||
bad_blocks
|
||||
This gives the list of all known bad blocks in the form of
|
||||
start address and length (in sectors respectively). If output
|
||||
is too big to fit in a page, it will be truncated. Writing
|
||||
``sector length`` to this file adds new acknowledged (i.e.
|
||||
recorded to disk safely) bad blocks.
|
||||
|
||||
unacknowledged_bad_blocks
|
||||
This gives the list of known-but-not-yet-saved-to-disk bad
|
||||
blocks in the same form of ``bad_blocks``. If output is too big
|
||||
to fit in a page, it will be truncated. Writing to this file
|
||||
adds bad blocks without acknowledging them. This is largely
|
||||
for testing.
|
||||
|
||||
|
||||
|
||||
An active md device will also contain an entry for each active device
|
||||
in the array. These are named::
|
||||
|
||||
rdNN
|
||||
|
||||
where ``NN`` is the position in the array, starting from 0.
|
||||
So for a 3 drive array there will be rd0, rd1, rd2.
|
||||
These are symbolic links to the appropriate ``dev-XXX`` entry.
|
||||
Thus, for example::
|
||||
|
||||
cat /sys/block/md*/md/rd*/state
|
||||
|
||||
will show ``in_sync`` on every line.
|
||||
|
||||
|
||||
|
||||
Active md devices for levels that support data redundancy (1,4,5,6,10)
|
||||
also have
|
||||
|
||||
sync_action
|
||||
a text file that can be used to monitor and control the rebuild
|
||||
process. It contains one word which can be one of:
|
||||
|
||||
resync
|
||||
redundancy is being recalculated after unclean
|
||||
shutdown or creation
|
||||
|
||||
recover
|
||||
a hot spare is being built to replace a
|
||||
failed/missing device
|
||||
|
||||
idle
|
||||
nothing is happening
|
||||
check
|
||||
A full check of redundancy was requested and is
|
||||
happening. This reads all blocks and checks
|
||||
them. A repair may also happen for some raid
|
||||
levels.
|
||||
|
||||
repair
|
||||
A full check and repair is happening. This is
|
||||
similar to ``resync``, but was requested by the
|
||||
user, and the write-intent bitmap is NOT used to
|
||||
optimise the process.
|
||||
|
||||
This file is writable, and each of the strings that could be
|
||||
read are meaningful for writing.
|
||||
|
||||
``idle`` will stop an active resync/recovery etc. There is no
|
||||
guarantee that another resync/recovery may not be automatically
|
||||
started again, though some event will be needed to trigger
|
||||
this.
|
||||
|
||||
``resync`` or ``recovery`` can be used to restart the
|
||||
corresponding operation if it was stopped with ``idle``.
|
||||
|
||||
``check`` and ``repair`` will start the appropriate process
|
||||
providing the current state is ``idle``.
|
||||
|
||||
This file responds to select/poll. Any important change in the value
|
||||
triggers a poll event. Sometimes the value will briefly be
|
||||
``recover`` if a recovery seems to be needed, but cannot be
|
||||
achieved. In that case, the transition to ``recover`` isn't
|
||||
notified, but the transition away is.
|
||||
|
||||
degraded
|
||||
This contains a count of the number of devices by which the
|
||||
arrays is degraded. So an optimal array will show ``0``. A
|
||||
single failed/missing drive will show ``1``, etc.
|
||||
|
||||
This file responds to select/poll, any increase or decrease
|
||||
in the count of missing devices will trigger an event.
|
||||
|
||||
mismatch_count
|
||||
When performing ``check`` and ``repair``, and possibly when
|
||||
performing ``resync``, md will count the number of errors that are
|
||||
found. The count in ``mismatch_cnt`` is the number of sectors
|
||||
that were re-written, or (for ``check``) would have been
|
||||
re-written. As most raid levels work in units of pages rather
|
||||
than sectors, this may be larger than the number of actual errors
|
||||
by a factor of the number of sectors in a page.
|
||||
|
||||
bitmap_set_bits
|
||||
If the array has a write-intent bitmap, then writing to this
|
||||
attribute can set bits in the bitmap, indicating that a resync
|
||||
would need to check the corresponding blocks. Either individual
|
||||
numbers or start-end pairs can be written. Multiple numbers
|
||||
can be separated by a space.
|
||||
|
||||
Note that the numbers are ``bit`` numbers, not ``block`` numbers.
|
||||
They should be scaled by the bitmap_chunksize.
|
||||
|
||||
sync_speed_min, sync_speed_max
|
||||
This are similar to ``/proc/sys/dev/raid/speed_limit_{min,max}``
|
||||
however they only apply to the particular array.
|
||||
|
||||
If no value has been written to these, or if the word ``system``
|
||||
is written, then the system-wide value is used. If a value,
|
||||
in kibibytes-per-second is written, then it is used.
|
||||
|
||||
When the files are read, they show the currently active value
|
||||
followed by ``(local)`` or ``(system)`` depending on whether it is
|
||||
a locally set or system-wide value.
|
||||
|
||||
sync_completed
|
||||
This shows the number of sectors that have been completed of
|
||||
whatever the current sync_action is, followed by the number of
|
||||
sectors in total that could need to be processed. The two
|
||||
numbers are separated by a ``/`` thus effectively showing one
|
||||
value, a fraction of the process that is complete.
|
||||
|
||||
A ``select`` on this attribute will return when resync completes,
|
||||
when it reaches the current sync_max (below) and possibly at
|
||||
other times.
|
||||
|
||||
sync_speed
|
||||
This shows the current actual speed, in K/sec, of the current
|
||||
sync_action. It is averaged over the last 30 seconds.
|
||||
|
||||
suspend_lo, suspend_hi
|
||||
The two values, given as numbers of sectors, indicate a range
|
||||
within the array where IO will be blocked. This is currently
|
||||
only supported for raid4/5/6.
|
||||
|
||||
sync_min, sync_max
|
||||
The two values, given as numbers of sectors, indicate a range
|
||||
within the array where ``check``/``repair`` will operate. Must be
|
||||
a multiple of chunk_size. When it reaches ``sync_max`` it will
|
||||
pause, rather than complete.
|
||||
You can use ``select`` or ``poll`` on ``sync_completed`` to wait for
|
||||
that number to reach sync_max. Then you can either increase
|
||||
``sync_max``, or can write ``idle`` to ``sync_action``.
|
||||
|
||||
The value of ``max`` for ``sync_max`` effectively disables the limit.
|
||||
When a resync is active, the value can only ever be increased,
|
||||
never decreased.
|
||||
The value of ``0`` is the minimum for ``sync_min``.
|
||||
|
||||
|
||||
|
||||
Each active md device may also have attributes specific to the
|
||||
personality module that manages it.
|
||||
These are specific to the implementation of the module and could
|
||||
change substantially if the implementation changes.
|
||||
|
||||
These currently include:
|
||||
|
||||
stripe_cache_size (currently raid5 only)
|
||||
number of entries in the stripe cache. This is writable, but
|
||||
there are upper and lower limits (32768, 17). Default is 256.
|
||||
|
||||
strip_cache_active (currently raid5 only)
|
||||
number of active entries in the stripe cache
|
||||
|
||||
preread_bypass_threshold (currently raid5 only)
|
||||
number of times a stripe requiring preread will be bypassed by
|
||||
a stripe that does not require preread. For fairness defaults
|
||||
to 1. Setting this to 0 disables bypass accounting and
|
||||
requires preread stripes to wait until all full-width stripe-
|
||||
writes are complete. Valid values are 0 to stripe_cache_size.
|
||||
285
Documentation/admin-guide/module-signing.rst
Normal file
285
Documentation/admin-guide/module-signing.rst
Normal file
@@ -0,0 +1,285 @@
|
||||
Kernel module signing facility
|
||||
------------------------------
|
||||
|
||||
.. CONTENTS
|
||||
..
|
||||
.. - Overview.
|
||||
.. - Configuring module signing.
|
||||
.. - Generating signing keys.
|
||||
.. - Public keys in the kernel.
|
||||
.. - Manually signing modules.
|
||||
.. - Signed modules and stripping.
|
||||
.. - Loading signed modules.
|
||||
.. - Non-valid signatures and unsigned modules.
|
||||
.. - Administering/protecting the private key.
|
||||
|
||||
|
||||
========
|
||||
Overview
|
||||
========
|
||||
|
||||
The kernel module signing facility cryptographically signs modules during
|
||||
installation and then checks the signature upon loading the module. This
|
||||
allows increased kernel security by disallowing the loading of unsigned modules
|
||||
or modules signed with an invalid key. Module signing increases security by
|
||||
making it harder to load a malicious module into the kernel. The module
|
||||
signature checking is done by the kernel so that it is not necessary to have
|
||||
trusted userspace bits.
|
||||
|
||||
This facility uses X.509 ITU-T standard certificates to encode the public keys
|
||||
involved. The signatures are not themselves encoded in any industrial standard
|
||||
type. The facility currently only supports the RSA public key encryption
|
||||
standard (though it is pluggable and permits others to be used). The possible
|
||||
hash algorithms that can be used are SHA-1, SHA-224, SHA-256, SHA-384, and
|
||||
SHA-512 (the algorithm is selected by data in the signature).
|
||||
|
||||
|
||||
==========================
|
||||
Configuring module signing
|
||||
==========================
|
||||
|
||||
The module signing facility is enabled by going to the
|
||||
:menuselection:`Enable Loadable Module Support` section of
|
||||
the kernel configuration and turning on::
|
||||
|
||||
CONFIG_MODULE_SIG "Module signature verification"
|
||||
|
||||
This has a number of options available:
|
||||
|
||||
(1) :menuselection:`Require modules to be validly signed`
|
||||
(``CONFIG_MODULE_SIG_FORCE``)
|
||||
|
||||
This specifies how the kernel should deal with a module that has a
|
||||
signature for which the key is not known or a module that is unsigned.
|
||||
|
||||
If this is off (ie. "permissive"), then modules for which the key is not
|
||||
available and modules that are unsigned are permitted, but the kernel will
|
||||
be marked as being tainted, and the concerned modules will be marked as
|
||||
tainted, shown with the character 'E'.
|
||||
|
||||
If this is on (ie. "restrictive"), only modules that have a valid
|
||||
signature that can be verified by a public key in the kernel's possession
|
||||
will be loaded. All other modules will generate an error.
|
||||
|
||||
Irrespective of the setting here, if the module has a signature block that
|
||||
cannot be parsed, it will be rejected out of hand.
|
||||
|
||||
|
||||
(2) :menuselection:`Automatically sign all modules`
|
||||
(``CONFIG_MODULE_SIG_ALL``)
|
||||
|
||||
If this is on then modules will be automatically signed during the
|
||||
modules_install phase of a build. If this is off, then the modules must
|
||||
be signed manually using::
|
||||
|
||||
scripts/sign-file
|
||||
|
||||
|
||||
(3) :menuselection:`Which hash algorithm should modules be signed with?`
|
||||
|
||||
This presents a choice of which hash algorithm the installation phase will
|
||||
sign the modules with:
|
||||
|
||||
=============================== ==========================================
|
||||
``CONFIG_MODULE_SIG_SHA1`` :menuselection:`Sign modules with SHA-1`
|
||||
``CONFIG_MODULE_SIG_SHA224`` :menuselection:`Sign modules with SHA-224`
|
||||
``CONFIG_MODULE_SIG_SHA256`` :menuselection:`Sign modules with SHA-256`
|
||||
``CONFIG_MODULE_SIG_SHA384`` :menuselection:`Sign modules with SHA-384`
|
||||
``CONFIG_MODULE_SIG_SHA512`` :menuselection:`Sign modules with SHA-512`
|
||||
=============================== ==========================================
|
||||
|
||||
The algorithm selected here will also be built into the kernel (rather
|
||||
than being a module) so that modules signed with that algorithm can have
|
||||
their signatures checked without causing a dependency loop.
|
||||
|
||||
|
||||
(4) :menuselection:`File name or PKCS#11 URI of module signing key`
|
||||
(``CONFIG_MODULE_SIG_KEY``)
|
||||
|
||||
Setting this option to something other than its default of
|
||||
``certs/signing_key.pem`` will disable the autogeneration of signing keys
|
||||
and allow the kernel modules to be signed with a key of your choosing.
|
||||
The string provided should identify a file containing both a private key
|
||||
and its corresponding X.509 certificate in PEM form, or — on systems where
|
||||
the OpenSSL ENGINE_pkcs11 is functional — a PKCS#11 URI as defined by
|
||||
RFC7512. In the latter case, the PKCS#11 URI should reference both a
|
||||
certificate and a private key.
|
||||
|
||||
If the PEM file containing the private key is encrypted, or if the
|
||||
PKCS#11 token requries a PIN, this can be provided at build time by
|
||||
means of the ``KBUILD_SIGN_PIN`` variable.
|
||||
|
||||
|
||||
(5) :menuselection:`Additional X.509 keys for default system keyring`
|
||||
(``CONFIG_SYSTEM_TRUSTED_KEYS``)
|
||||
|
||||
This option can be set to the filename of a PEM-encoded file containing
|
||||
additional certificates which will be included in the system keyring by
|
||||
default.
|
||||
|
||||
Note that enabling module signing adds a dependency on the OpenSSL devel
|
||||
packages to the kernel build processes for the tool that does the signing.
|
||||
|
||||
|
||||
=======================
|
||||
Generating signing keys
|
||||
=======================
|
||||
|
||||
Cryptographic keypairs are required to generate and check signatures. A
|
||||
private key is used to generate a signature and the corresponding public key is
|
||||
used to check it. The private key is only needed during the build, after which
|
||||
it can be deleted or stored securely. The public key gets built into the
|
||||
kernel so that it can be used to check the signatures as the modules are
|
||||
loaded.
|
||||
|
||||
Under normal conditions, when ``CONFIG_MODULE_SIG_KEY`` is unchanged from its
|
||||
default, the kernel build will automatically generate a new keypair using
|
||||
openssl if one does not exist in the file::
|
||||
|
||||
certs/signing_key.pem
|
||||
|
||||
during the building of vmlinux (the public part of the key needs to be built
|
||||
into vmlinux) using parameters in the::
|
||||
|
||||
certs/x509.genkey
|
||||
|
||||
file (which is also generated if it does not already exist).
|
||||
|
||||
It is strongly recommended that you provide your own x509.genkey file.
|
||||
|
||||
Most notably, in the x509.genkey file, the req_distinguished_name section
|
||||
should be altered from the default::
|
||||
|
||||
[ req_distinguished_name ]
|
||||
#O = Unspecified company
|
||||
CN = Build time autogenerated kernel key
|
||||
#emailAddress = unspecified.user@unspecified.company
|
||||
|
||||
The generated RSA key size can also be set with::
|
||||
|
||||
[ req ]
|
||||
default_bits = 4096
|
||||
|
||||
|
||||
It is also possible to manually generate the key private/public files using the
|
||||
x509.genkey key generation configuration file in the root node of the Linux
|
||||
kernel sources tree and the openssl command. The following is an example to
|
||||
generate the public/private key files::
|
||||
|
||||
openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \
|
||||
-config x509.genkey -outform PEM -out kernel_key.pem \
|
||||
-keyout kernel_key.pem
|
||||
|
||||
The full pathname for the resulting kernel_key.pem file can then be specified
|
||||
in the ``CONFIG_MODULE_SIG_KEY`` option, and the certificate and key therein will
|
||||
be used instead of an autogenerated keypair.
|
||||
|
||||
|
||||
=========================
|
||||
Public keys in the kernel
|
||||
=========================
|
||||
|
||||
The kernel contains a ring of public keys that can be viewed by root. They're
|
||||
in a keyring called ".system_keyring" that can be seen by::
|
||||
|
||||
[root@deneb ~]# cat /proc/keys
|
||||
...
|
||||
223c7853 I------ 1 perm 1f030000 0 0 keyring .system_keyring: 1
|
||||
302d2d52 I------ 1 perm 1f010000 0 0 asymmetri Fedora kernel signing key: d69a84e6bce3d216b979e9505b3e3ef9a7118079: X509.RSA a7118079 []
|
||||
...
|
||||
|
||||
Beyond the public key generated specifically for module signing, additional
|
||||
trusted certificates can be provided in a PEM-encoded file referenced by the
|
||||
``CONFIG_SYSTEM_TRUSTED_KEYS`` configuration option.
|
||||
|
||||
Further, the architecture code may take public keys from a hardware store and
|
||||
add those in also (e.g. from the UEFI key database).
|
||||
|
||||
Finally, it is possible to add additional public keys by doing::
|
||||
|
||||
keyctl padd asymmetric "" [.system_keyring-ID] <[key-file]
|
||||
|
||||
e.g.::
|
||||
|
||||
keyctl padd asymmetric "" 0x223c7853 <my_public_key.x509
|
||||
|
||||
Note, however, that the kernel will only permit keys to be added to
|
||||
``.system_keyring _if_`` the new key's X.509 wrapper is validly signed by a key
|
||||
that is already resident in the .system_keyring at the time the key was added.
|
||||
|
||||
|
||||
========================
|
||||
Manually signing modules
|
||||
========================
|
||||
|
||||
To manually sign a module, use the scripts/sign-file tool available in
|
||||
the Linux kernel source tree. The script requires 4 arguments:
|
||||
|
||||
1. The hash algorithm (e.g., sha256)
|
||||
2. The private key filename or PKCS#11 URI
|
||||
3. The public key filename
|
||||
4. The kernel module to be signed
|
||||
|
||||
The following is an example to sign a kernel module::
|
||||
|
||||
scripts/sign-file sha512 kernel-signkey.priv \
|
||||
kernel-signkey.x509 module.ko
|
||||
|
||||
The hash algorithm used does not have to match the one configured, but if it
|
||||
doesn't, you should make sure that hash algorithm is either built into the
|
||||
kernel or can be loaded without requiring itself.
|
||||
|
||||
If the private key requires a passphrase or PIN, it can be provided in the
|
||||
$KBUILD_SIGN_PIN environment variable.
|
||||
|
||||
|
||||
============================
|
||||
Signed modules and stripping
|
||||
============================
|
||||
|
||||
A signed module has a digital signature simply appended at the end. The string
|
||||
``~Module signature appended~.`` at the end of the module's file confirms that a
|
||||
signature is present but it does not confirm that the signature is valid!
|
||||
|
||||
Signed modules are BRITTLE as the signature is outside of the defined ELF
|
||||
container. Thus they MAY NOT be stripped once the signature is computed and
|
||||
attached. Note the entire module is the signed payload, including any and all
|
||||
debug information present at the time of signing.
|
||||
|
||||
|
||||
======================
|
||||
Loading signed modules
|
||||
======================
|
||||
|
||||
Modules are loaded with insmod, modprobe, ``init_module()`` or
|
||||
``finit_module()``, exactly as for unsigned modules as no processing is
|
||||
done in userspace. The signature checking is all done within the kernel.
|
||||
|
||||
|
||||
=========================================
|
||||
Non-valid signatures and unsigned modules
|
||||
=========================================
|
||||
|
||||
If ``CONFIG_MODULE_SIG_FORCE`` is enabled or module.sig_enforce=1 is supplied on
|
||||
the kernel command line, the kernel will only load validly signed modules
|
||||
for which it has a public key. Otherwise, it will also load modules that are
|
||||
unsigned. Any module for which the kernel has a key, but which proves to have
|
||||
a signature mismatch will not be permitted to load.
|
||||
|
||||
Any module that has an unparseable signature will be rejected.
|
||||
|
||||
|
||||
=========================================
|
||||
Administering/protecting the private key
|
||||
=========================================
|
||||
|
||||
Since the private key is used to sign modules, viruses and malware could use
|
||||
the private key to sign modules and compromise the operating system. The
|
||||
private key must be either destroyed or moved to a secure location and not kept
|
||||
in the root node of the kernel source tree.
|
||||
|
||||
If you use the same private key to sign modules for multiple kernel
|
||||
configurations, you must ensure that the module version information is
|
||||
sufficient to prevent loading a module into a different kernel. Either
|
||||
set ``CONFIG_MODVERSIONS=y`` or ensure that each configuration has a different
|
||||
kernel release string by changing ``EXTRAVERSION`` or ``CONFIG_LOCALVERSION``.
|
||||
70
Documentation/admin-guide/mono.rst
Normal file
70
Documentation/admin-guide/mono.rst
Normal file
@@ -0,0 +1,70 @@
|
||||
Mono(tm) Binary Kernel Support for Linux
|
||||
-----------------------------------------
|
||||
|
||||
To configure Linux to automatically execute Mono-based .NET binaries
|
||||
(in the form of .exe files) without the need to use the mono CLR
|
||||
wrapper, you can use the BINFMT_MISC kernel support.
|
||||
|
||||
This will allow you to execute Mono-based .NET binaries just like any
|
||||
other program after you have done the following:
|
||||
|
||||
1) You MUST FIRST install the Mono CLR support, either by downloading
|
||||
a binary package, a source tarball or by installing from CVS. Binary
|
||||
packages for several distributions can be found at:
|
||||
|
||||
http://go-mono.com/download.html
|
||||
|
||||
Instructions for compiling Mono can be found at:
|
||||
|
||||
http://www.go-mono.com/compiling.html
|
||||
|
||||
Once the Mono CLR support has been installed, just check that
|
||||
``/usr/bin/mono`` (which could be located elsewhere, for example
|
||||
``/usr/local/bin/mono``) is working.
|
||||
|
||||
2) You have to compile BINFMT_MISC either as a module or into
|
||||
the kernel (``CONFIG_BINFMT_MISC``) and set it up properly.
|
||||
If you choose to compile it as a module, you will have
|
||||
to insert it manually with modprobe/insmod, as kmod
|
||||
cannot be easily supported with binfmt_misc.
|
||||
Read the file ``binfmt_misc.txt`` in this directory to know
|
||||
more about the configuration process.
|
||||
|
||||
3) Add the following entries to ``/etc/rc.local`` or similar script
|
||||
to be run at system startup:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Insert BINFMT_MISC module into the kernel
|
||||
if [ ! -e /proc/sys/fs/binfmt_misc/register ]; then
|
||||
/sbin/modprobe binfmt_misc
|
||||
# Some distributions, like Fedora Core, perform
|
||||
# the following command automatically when the
|
||||
# binfmt_misc module is loaded into the kernel
|
||||
# or during normal boot up (systemd-based systems).
|
||||
# Thus, it is possible that the following line
|
||||
# is not needed at all.
|
||||
mount -t binfmt_misc none /proc/sys/fs/binfmt_misc
|
||||
fi
|
||||
|
||||
# Register support for .NET CLR binaries
|
||||
if [ -e /proc/sys/fs/binfmt_misc/register ]; then
|
||||
# Replace /usr/bin/mono with the correct pathname to
|
||||
# the Mono CLR runtime (usually /usr/local/bin/mono
|
||||
# when compiling from sources or CVS).
|
||||
echo ':CLR:M::MZ::/usr/bin/mono:' > /proc/sys/fs/binfmt_misc/register
|
||||
else
|
||||
echo "No binfmt_misc support"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
4) Check that ``.exe`` binaries can be ran without the need of a
|
||||
wrapper script, simply by launching the ``.exe`` file directly
|
||||
from a command prompt, for example::
|
||||
|
||||
/usr/bin/xsd.exe
|
||||
|
||||
.. note::
|
||||
|
||||
If this fails with a permission denied error, check
|
||||
that the ``.exe`` file has execute permissions.
|
||||
286
Documentation/admin-guide/parport.rst
Normal file
286
Documentation/admin-guide/parport.rst
Normal file
@@ -0,0 +1,286 @@
|
||||
Parport
|
||||
+++++++
|
||||
|
||||
The ``parport`` code provides parallel-port support under Linux. This
|
||||
includes the ability to share one port between multiple device
|
||||
drivers.
|
||||
|
||||
You can pass parameters to the ``parport`` code to override its automatic
|
||||
detection of your hardware. This is particularly useful if you want
|
||||
to use IRQs, since in general these can't be autoprobed successfully.
|
||||
By default IRQs are not used even if they **can** be probed. This is
|
||||
because there are a lot of people using the same IRQ for their
|
||||
parallel port and a sound card or network card.
|
||||
|
||||
The ``parport`` code is split into two parts: generic (which deals with
|
||||
port-sharing) and architecture-dependent (which deals with actually
|
||||
using the port).
|
||||
|
||||
|
||||
Parport as modules
|
||||
==================
|
||||
|
||||
If you load the `parport`` code as a module, say::
|
||||
|
||||
# insmod parport
|
||||
|
||||
to load the generic ``parport`` code. You then must load the
|
||||
architecture-dependent code with (for example)::
|
||||
|
||||
# insmod parport_pc io=0x3bc,0x378,0x278 irq=none,7,auto
|
||||
|
||||
to tell the ``parport`` code that you want three PC-style ports, one at
|
||||
0x3bc with no IRQ, one at 0x378 using IRQ 7, and one at 0x278 with an
|
||||
auto-detected IRQ. Currently, PC-style (``parport_pc``), Sun ``bpp``,
|
||||
Amiga, Atari, and MFC3 hardware is supported.
|
||||
|
||||
PCI parallel I/O card support comes from ``parport_pc``. Base I/O
|
||||
addresses should not be specified for supported PCI cards since they
|
||||
are automatically detected.
|
||||
|
||||
|
||||
modprobe
|
||||
--------
|
||||
|
||||
If you use modprobe , you will find it useful to add lines as below to a
|
||||
configuration file in /etc/modprobe.d/ directory::
|
||||
|
||||
alias parport_lowlevel parport_pc
|
||||
options parport_pc io=0x378,0x278 irq=7,auto
|
||||
|
||||
modprobe will load ``parport_pc`` (with the options ``io=0x378,0x278 irq=7,auto``)
|
||||
whenever a parallel port device driver (such as ``lp``) is loaded.
|
||||
|
||||
Note that these are example lines only! You shouldn't in general need
|
||||
to specify any options to ``parport_pc`` in order to be able to use a
|
||||
parallel port.
|
||||
|
||||
|
||||
Parport probe [optional]
|
||||
------------------------
|
||||
|
||||
In 2.2 kernels there was a module called ``parport_probe``, which was used
|
||||
for collecting IEEE 1284 device ID information. This has now been
|
||||
enhanced and now lives with the IEEE 1284 support. When a parallel
|
||||
port is detected, the devices that are connected to it are analysed,
|
||||
and information is logged like this::
|
||||
|
||||
parport0: Printer, BJC-210 (Canon)
|
||||
|
||||
The probe information is available from files in ``/proc/sys/dev/parport/``.
|
||||
|
||||
|
||||
Parport linked into the kernel statically
|
||||
=========================================
|
||||
|
||||
If you compile the ``parport`` code into the kernel, then you can use
|
||||
kernel boot parameters to get the same effect. Add something like the
|
||||
following to your LILO command line::
|
||||
|
||||
parport=0x3bc parport=0x378,7 parport=0x278,auto,nofifo
|
||||
|
||||
You can have many ``parport=...`` statements, one for each port you want
|
||||
to add. Adding ``parport=0`` to the kernel command-line will disable
|
||||
parport support entirely. Adding ``parport=auto`` to the kernel
|
||||
command-line will make ``parport`` use any IRQ lines or DMA channels that
|
||||
it auto-detects.
|
||||
|
||||
|
||||
Files in /proc
|
||||
==============
|
||||
|
||||
If you have configured the ``/proc`` filesystem into your kernel, you will
|
||||
see a new directory entry: ``/proc/sys/dev/parport``. In there will be a
|
||||
directory entry for each parallel port for which parport is
|
||||
configured. In each of those directories are a collection of files
|
||||
describing that parallel port.
|
||||
|
||||
The ``/proc/sys/dev/parport`` directory tree looks like::
|
||||
|
||||
parport
|
||||
|-- default
|
||||
| |-- spintime
|
||||
| `-- timeslice
|
||||
|-- parport0
|
||||
| |-- autoprobe
|
||||
| |-- autoprobe0
|
||||
| |-- autoprobe1
|
||||
| |-- autoprobe2
|
||||
| |-- autoprobe3
|
||||
| |-- devices
|
||||
| | |-- active
|
||||
| | `-- lp
|
||||
| | `-- timeslice
|
||||
| |-- base-addr
|
||||
| |-- irq
|
||||
| |-- dma
|
||||
| |-- modes
|
||||
| `-- spintime
|
||||
`-- parport1
|
||||
|-- autoprobe
|
||||
|-- autoprobe0
|
||||
|-- autoprobe1
|
||||
|-- autoprobe2
|
||||
|-- autoprobe3
|
||||
|-- devices
|
||||
| |-- active
|
||||
| `-- ppa
|
||||
| `-- timeslice
|
||||
|-- base-addr
|
||||
|-- irq
|
||||
|-- dma
|
||||
|-- modes
|
||||
`-- spintime
|
||||
|
||||
.. tabularcolumns:: |p{4.0cm}|p{13.5cm}|
|
||||
|
||||
======================= =======================================================
|
||||
File Contents
|
||||
======================= =======================================================
|
||||
``devices/active`` A list of the device drivers using that port. A "+"
|
||||
will appear by the name of the device currently using
|
||||
the port (it might not appear against any). The
|
||||
string "none" means that there are no device drivers
|
||||
using that port.
|
||||
|
||||
``base-addr`` Parallel port's base address, or addresses if the port
|
||||
has more than one in which case they are separated
|
||||
with tabs. These values might not have any sensible
|
||||
meaning for some ports.
|
||||
|
||||
``irq`` Parallel port's IRQ, or -1 if none is being used.
|
||||
|
||||
``dma`` Parallel port's DMA channel, or -1 if none is being
|
||||
used.
|
||||
|
||||
``modes`` Parallel port's hardware modes, comma-separated,
|
||||
meaning:
|
||||
|
||||
- PCSPP
|
||||
PC-style SPP registers are available.
|
||||
|
||||
- TRISTATE
|
||||
Port is bidirectional.
|
||||
|
||||
- COMPAT
|
||||
Hardware acceleration for printers is
|
||||
available and will be used.
|
||||
|
||||
- EPP
|
||||
Hardware acceleration for EPP protocol
|
||||
is available and will be used.
|
||||
|
||||
- ECP
|
||||
Hardware acceleration for ECP protocol
|
||||
is available and will be used.
|
||||
|
||||
- DMA
|
||||
DMA is available and will be used.
|
||||
|
||||
Note that the current implementation will only take
|
||||
advantage of COMPAT and ECP modes if it has an IRQ
|
||||
line to use.
|
||||
|
||||
``autoprobe`` Any IEEE-1284 device ID information that has been
|
||||
acquired from the (non-IEEE 1284.3) device.
|
||||
|
||||
``autoprobe[0-3]`` IEEE 1284 device ID information retrieved from
|
||||
daisy-chain devices that conform to IEEE 1284.3.
|
||||
|
||||
``spintime`` The number of microseconds to busy-loop while waiting
|
||||
for the peripheral to respond. You might find that
|
||||
adjusting this improves performance, depending on your
|
||||
peripherals. This is a port-wide setting, i.e. it
|
||||
applies to all devices on a particular port.
|
||||
|
||||
``timeslice`` The number of milliseconds that a device driver is
|
||||
allowed to keep a port claimed for. This is advisory,
|
||||
and driver can ignore it if it must.
|
||||
|
||||
``default/*`` The defaults for spintime and timeslice. When a new
|
||||
port is registered, it picks up the default spintime.
|
||||
When a new device is registered, it picks up the
|
||||
default timeslice.
|
||||
======================= =======================================================
|
||||
|
||||
Device drivers
|
||||
==============
|
||||
|
||||
Once the parport code is initialised, you can attach device drivers to
|
||||
specific ports. Normally this happens automatically; if the lp driver
|
||||
is loaded it will create one lp device for each port found. You can
|
||||
override this, though, by using parameters either when you load the lp
|
||||
driver::
|
||||
|
||||
# insmod lp parport=0,2
|
||||
|
||||
or on the LILO command line::
|
||||
|
||||
lp=parport0 lp=parport2
|
||||
|
||||
Both the above examples would inform lp that you want ``/dev/lp0`` to be
|
||||
the first parallel port, and /dev/lp1 to be the **third** parallel port,
|
||||
with no lp device associated with the second port (parport1). Note
|
||||
that this is different to the way older kernels worked; there used to
|
||||
be a static association between the I/O port address and the device
|
||||
name, so ``/dev/lp0`` was always the port at 0x3bc. This is no longer the
|
||||
case - if you only have one port, it will default to being ``/dev/lp0``,
|
||||
regardless of base address.
|
||||
|
||||
Also:
|
||||
|
||||
* If you selected the IEEE 1284 support at compile time, you can say
|
||||
``lp=auto`` on the kernel command line, and lp will create devices
|
||||
only for those ports that seem to have printers attached.
|
||||
|
||||
* If you give PLIP the ``timid`` parameter, either with ``plip=timid`` on
|
||||
the command line, or with ``insmod plip timid=1`` when using modules,
|
||||
it will avoid any ports that seem to be in use by other devices.
|
||||
|
||||
* IRQ autoprobing works only for a few port types at the moment.
|
||||
|
||||
Reporting printer problems with parport
|
||||
=======================================
|
||||
|
||||
If you are having problems printing, please go through these steps to
|
||||
try to narrow down where the problem area is.
|
||||
|
||||
When reporting problems with parport, really you need to give all of
|
||||
the messages that ``parport_pc`` spits out when it initialises. There are
|
||||
several code paths:
|
||||
|
||||
- polling
|
||||
- interrupt-driven, protocol in software
|
||||
- interrupt-driven, protocol in hardware using PIO
|
||||
- interrupt-driven, protocol in hardware using DMA
|
||||
|
||||
The kernel messages that ``parport_pc`` logs give an indication of which
|
||||
code path is being used. (They could be a lot better actually..)
|
||||
|
||||
For normal printer protocol, having IEEE 1284 modes enabled or not
|
||||
should not make a difference.
|
||||
|
||||
To turn off the 'protocol in hardware' code paths, disable
|
||||
``CONFIG_PARPORT_PC_FIFO``. Note that when they are enabled they are not
|
||||
necessarily **used**; it depends on whether the hardware is available,
|
||||
enabled by the BIOS, and detected by the driver.
|
||||
|
||||
So, to start with, disable ``CONFIG_PARPORT_PC_FIFO``, and load ``parport_pc``
|
||||
with ``irq=none``. See if printing works then. It really should,
|
||||
because this is the simplest code path.
|
||||
|
||||
If that works fine, try with ``io=0x378 irq=7`` (adjust for your
|
||||
hardware), to make it use interrupt-driven in-software protocol.
|
||||
|
||||
If **that** works fine, then one of the hardware modes isn't working
|
||||
right. Enable ``CONFIG_FIFO`` (no, it isn't a module option,
|
||||
and yes, it should be), set the port to ECP mode in the BIOS and note
|
||||
the DMA channel, and try with::
|
||||
|
||||
io=0x378 irq=7 dma=none (for PIO)
|
||||
io=0x378 irq=7 dma=3 (for DMA)
|
||||
|
||||
----------
|
||||
|
||||
philb@gnu.org
|
||||
tim@cyberelk.net
|
||||
156
Documentation/admin-guide/ramoops.rst
Normal file
156
Documentation/admin-guide/ramoops.rst
Normal file
@@ -0,0 +1,156 @@
|
||||
Ramoops oops/panic logger
|
||||
=========================
|
||||
|
||||
Sergiu Iordache <sergiu@chromium.org>
|
||||
|
||||
Updated: 17 November 2011
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
Ramoops is an oops/panic logger that writes its logs to RAM before the system
|
||||
crashes. It works by logging oopses and panics in a circular buffer. Ramoops
|
||||
needs a system with persistent RAM so that the content of that area can
|
||||
survive after a restart.
|
||||
|
||||
Ramoops concepts
|
||||
----------------
|
||||
|
||||
Ramoops uses a predefined memory area to store the dump. The start and size
|
||||
and type of the memory area are set using three variables:
|
||||
|
||||
* ``mem_address`` for the start
|
||||
* ``mem_size`` for the size. The memory size will be rounded down to a
|
||||
power of two.
|
||||
* ``mem_type`` to specifiy if the memory type (default is pgprot_writecombine).
|
||||
|
||||
Typically the default value of ``mem_type=0`` should be used as that sets the pstore
|
||||
mapping to pgprot_writecombine. Setting ``mem_type=1`` attempts to use
|
||||
``pgprot_noncached``, which only works on some platforms. This is because pstore
|
||||
depends on atomic operations. At least on ARM, pgprot_noncached causes the
|
||||
memory to be mapped strongly ordered, and atomic operations on strongly ordered
|
||||
memory are implementation defined, and won't work on many ARMs such as omaps.
|
||||
|
||||
The memory area is divided into ``record_size`` chunks (also rounded down to
|
||||
power of two) and each oops/panic writes a ``record_size`` chunk of
|
||||
information.
|
||||
|
||||
Dumping both oopses and panics can be done by setting 1 in the ``dump_oops``
|
||||
variable while setting 0 in that variable dumps only the panics.
|
||||
|
||||
The module uses a counter to record multiple dumps but the counter gets reset
|
||||
on restart (i.e. new dumps after the restart will overwrite old ones).
|
||||
|
||||
Ramoops also supports software ECC protection of persistent memory regions.
|
||||
This might be useful when a hardware reset was used to bring the machine back
|
||||
to life (i.e. a watchdog triggered). In such cases, RAM may be somewhat
|
||||
corrupt, but usually it is restorable.
|
||||
|
||||
Setting the parameters
|
||||
----------------------
|
||||
|
||||
Setting the ramoops parameters can be done in several different manners:
|
||||
|
||||
A. Use the module parameters (which have the names of the variables described
|
||||
as before). For quick debugging, you can also reserve parts of memory during
|
||||
boot and then use the reserved memory for ramoops. For example, assuming a
|
||||
machine with > 128 MB of memory, the following kernel command line will tell
|
||||
the kernel to use only the first 128 MB of memory, and place ECC-protected
|
||||
ramoops region at 128 MB boundary::
|
||||
|
||||
mem=128M ramoops.mem_address=0x8000000 ramoops.ecc=1
|
||||
|
||||
B. Use Device Tree bindings, as described in
|
||||
``Documentation/device-tree/bindings/reserved-memory/admin-guide/ramoops.rst``.
|
||||
For example::
|
||||
|
||||
reserved-memory {
|
||||
#address-cells = <2>;
|
||||
#size-cells = <2>;
|
||||
ranges;
|
||||
|
||||
ramoops@8f000000 {
|
||||
compatible = "ramoops";
|
||||
reg = <0 0x8f000000 0 0x100000>;
|
||||
record-size = <0x4000>;
|
||||
console-size = <0x4000>;
|
||||
};
|
||||
};
|
||||
|
||||
C. Use a platform device and set the platform data. The parameters can then
|
||||
be set through that platform data. An example of doing that is:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
#include <linux/pstore_ram.h>
|
||||
[...]
|
||||
|
||||
static struct ramoops_platform_data ramoops_data = {
|
||||
.mem_size = <...>,
|
||||
.mem_address = <...>,
|
||||
.mem_type = <...>,
|
||||
.record_size = <...>,
|
||||
.dump_oops = <...>,
|
||||
.ecc = <...>,
|
||||
};
|
||||
|
||||
static struct platform_device ramoops_dev = {
|
||||
.name = "ramoops",
|
||||
.dev = {
|
||||
.platform_data = &ramoops_data,
|
||||
},
|
||||
};
|
||||
|
||||
[... inside a function ...]
|
||||
int ret;
|
||||
|
||||
ret = platform_device_register(&ramoops_dev);
|
||||
if (ret) {
|
||||
printk(KERN_ERR "unable to register platform device\n");
|
||||
return ret;
|
||||
}
|
||||
|
||||
You can specify either RAM memory or peripheral devices' memory. However, when
|
||||
specifying RAM, be sure to reserve the memory by issuing memblock_reserve()
|
||||
very early in the architecture code, e.g.::
|
||||
|
||||
#include <linux/memblock.h>
|
||||
|
||||
memblock_reserve(ramoops_data.mem_address, ramoops_data.mem_size);
|
||||
|
||||
Dump format
|
||||
-----------
|
||||
|
||||
The data dump begins with a header, currently defined as ``====`` followed by a
|
||||
timestamp and a new line. The dump then continues with the actual data.
|
||||
|
||||
Reading the data
|
||||
----------------
|
||||
|
||||
The dump data can be read from the pstore filesystem. The format for these
|
||||
files is ``dmesg-ramoops-N``, where N is the record number in memory. To delete
|
||||
a stored record from RAM, simply unlink the respective pstore file.
|
||||
|
||||
Persistent function tracing
|
||||
---------------------------
|
||||
|
||||
Persistent function tracing might be useful for debugging software or hardware
|
||||
related hangs. The functions call chain log is stored in a ``ftrace-ramoops``
|
||||
file. Here is an example of usage::
|
||||
|
||||
# mount -t debugfs debugfs /sys/kernel/debug/
|
||||
# echo 1 > /sys/kernel/debug/pstore/record_ftrace
|
||||
# reboot -f
|
||||
[...]
|
||||
# mount -t pstore pstore /mnt/
|
||||
# tail /mnt/ftrace-ramoops
|
||||
0 ffffffff8101ea64 ffffffff8101bcda native_apic_mem_read <- disconnect_bsp_APIC+0x6a/0xc0
|
||||
0 ffffffff8101ea44 ffffffff8101bcf6 native_apic_mem_write <- disconnect_bsp_APIC+0x86/0xc0
|
||||
0 ffffffff81020084 ffffffff8101a4b5 hpet_disable <- native_machine_shutdown+0x75/0x90
|
||||
0 ffffffff81005f94 ffffffff8101a4bb iommu_shutdown_noop <- native_machine_shutdown+0x7b/0x90
|
||||
0 ffffffff8101a6a1 ffffffff8101a437 native_machine_emergency_restart <- native_machine_restart+0x37/0x40
|
||||
0 ffffffff811f9876 ffffffff8101a73a acpi_reboot <- native_machine_emergency_restart+0xaa/0x1e0
|
||||
0 ffffffff8101a514 ffffffff8101a772 mach_reboot_fixups <- native_machine_emergency_restart+0xe2/0x1e0
|
||||
0 ffffffff811d9c54 ffffffff8101a7a0 __const_udelay <- native_machine_emergency_restart+0x110/0x1e0
|
||||
0 ffffffff811d9c34 ffffffff811d9c80 __delay <- __const_udelay+0x30/0x40
|
||||
0 ffffffff811d9d14 ffffffff811d9c3f delay_tsc <- __delay+0xf/0x20
|
||||
1190
Documentation/admin-guide/ras.rst
Normal file
1190
Documentation/admin-guide/ras.rst
Normal file
File diff suppressed because it is too large
Load Diff
182
Documentation/admin-guide/reporting-bugs.rst
Normal file
182
Documentation/admin-guide/reporting-bugs.rst
Normal file
@@ -0,0 +1,182 @@
|
||||
.. _reportingbugs:
|
||||
|
||||
Reporting bugs
|
||||
++++++++++++++
|
||||
|
||||
Background
|
||||
==========
|
||||
|
||||
The upstream Linux kernel maintainers only fix bugs for specific kernel
|
||||
versions. Those versions include the current "release candidate" (or -rc)
|
||||
kernel, any "stable" kernel versions, and any "long term" kernels.
|
||||
|
||||
Please see https://www.kernel.org/ for a list of supported kernels. Any
|
||||
kernel marked with [EOL] is "end of life" and will not have any fixes
|
||||
backported to it.
|
||||
|
||||
If you've found a bug on a kernel version that isn't listed on kernel.org,
|
||||
contact your Linux distribution or embedded vendor for support.
|
||||
Alternatively, you can attempt to run one of the supported stable or -rc
|
||||
kernels, and see if you can reproduce the bug on that. It's preferable
|
||||
to reproduce the bug on the latest -rc kernel.
|
||||
|
||||
|
||||
How to report Linux kernel bugs
|
||||
===============================
|
||||
|
||||
|
||||
Identify the problematic subsystem
|
||||
----------------------------------
|
||||
|
||||
Identifying which part of the Linux kernel might be causing your issue
|
||||
increases your chances of getting your bug fixed. Simply posting to the
|
||||
generic linux-kernel mailing list (LKML) may cause your bug report to be
|
||||
lost in the noise of a mailing list that gets 1000+ emails a day.
|
||||
|
||||
Instead, try to figure out which kernel subsystem is causing the issue,
|
||||
and email that subsystem's maintainer and mailing list. If the subsystem
|
||||
maintainer doesn't answer, then expand your scope to mailing lists like
|
||||
LKML.
|
||||
|
||||
|
||||
Identify who to notify
|
||||
----------------------
|
||||
|
||||
Once you know the subsystem that is causing the issue, you should send a
|
||||
bug report. Some maintainers prefer bugs to be reported via bugzilla
|
||||
(https://bugzilla.kernel.org), while others prefer that bugs be reported
|
||||
via the subsystem mailing list.
|
||||
|
||||
To find out where to send an emailed bug report, find your subsystem or
|
||||
device driver in the MAINTAINERS file. Search in the file for relevant
|
||||
entries, and send your bug report to the person(s) listed in the "M:"
|
||||
lines, making sure to Cc the mailing list(s) in the "L:" lines. When the
|
||||
maintainer replies to you, make sure to 'Reply-all' in order to keep the
|
||||
public mailing list(s) in the email thread.
|
||||
|
||||
If you know which driver is causing issues, you can pass one of the driver
|
||||
files to the get_maintainer.pl script::
|
||||
|
||||
perl scripts/get_maintainer.pl -f <filename>
|
||||
|
||||
If it is a security bug, please copy the Security Contact listed in the
|
||||
MAINTAINERS file. They can help coordinate bugfix and disclosure. See
|
||||
:ref:`Documentation/admin-guide/security-bugs.rst <securitybugs>` for more information.
|
||||
|
||||
If you can't figure out which subsystem caused the issue, you should file
|
||||
a bug in kernel.org bugzilla and send email to
|
||||
linux-kernel@vger.kernel.org, referencing the bugzilla URL. (For more
|
||||
information on the linux-kernel mailing list see
|
||||
http://www.tux.org/lkml/).
|
||||
|
||||
|
||||
Tips for reporting bugs
|
||||
-----------------------
|
||||
|
||||
If you haven't reported a bug before, please read:
|
||||
|
||||
http://www.chiark.greenend.org.uk/~sgtatham/bugs.html
|
||||
|
||||
http://www.catb.org/esr/faqs/smart-questions.html
|
||||
|
||||
It's REALLY important to report bugs that seem unrelated as separate email
|
||||
threads or separate bugzilla entries. If you report several unrelated
|
||||
bugs at once, it's difficult for maintainers to tease apart the relevant
|
||||
data.
|
||||
|
||||
|
||||
Gather information
|
||||
------------------
|
||||
|
||||
The most important information in a bug report is how to reproduce the
|
||||
bug. This includes system information, and (most importantly)
|
||||
step-by-step instructions for how a user can trigger the bug.
|
||||
|
||||
If the failure includes an "OOPS:", take a picture of the screen, capture
|
||||
a netconsole trace, or type the message from your screen into the bug
|
||||
report. Please read "Documentation/admin-guide/oops-tracing.rst" before posting your
|
||||
bug report. This explains what you should do with the "Oops" information
|
||||
to make it useful to the recipient.
|
||||
|
||||
This is a suggested format for a bug report sent via email or bugzilla.
|
||||
Having a standardized bug report form makes it easier for you not to
|
||||
overlook things, and easier for the developers to find the pieces of
|
||||
information they're really interested in. If some information is not
|
||||
relevant to your bug, feel free to exclude it.
|
||||
|
||||
First run the ver_linux script included as scripts/ver_linux, which
|
||||
reports the version of some important subsystems. Run this script with
|
||||
the command ``awk -f scripts/ver_linux``.
|
||||
|
||||
Use that information to fill in all fields of the bug report form, and
|
||||
post it to the mailing list with a subject of "PROBLEM: <one line
|
||||
summary from [1.]>" for easy identification by the developers::
|
||||
|
||||
[1.] One line summary of the problem:
|
||||
[2.] Full description of the problem/report:
|
||||
[3.] Keywords (i.e., modules, networking, kernel):
|
||||
[4.] Kernel information
|
||||
[4.1.] Kernel version (from /proc/version):
|
||||
[4.2.] Kernel .config file:
|
||||
[5.] Most recent kernel version which did not have the bug:
|
||||
[6.] Output of Oops.. message (if applicable) with symbolic information
|
||||
resolved (see Documentation/admin-guide/oops-tracing.rst)
|
||||
[7.] A small shell script or example program which triggers the
|
||||
problem (if possible)
|
||||
[8.] Environment
|
||||
[8.1.] Software (add the output of the ver_linux script here)
|
||||
[8.2.] Processor information (from /proc/cpuinfo):
|
||||
[8.3.] Module information (from /proc/modules):
|
||||
[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
|
||||
[8.5.] PCI information ('lspci -vvv' as root)
|
||||
[8.6.] SCSI information (from /proc/scsi/scsi)
|
||||
[8.7.] Other information that might be relevant to the problem
|
||||
(please look in /proc and include all information that you
|
||||
think to be relevant):
|
||||
[X.] Other notes, patches, fixes, workarounds:
|
||||
|
||||
|
||||
Follow up
|
||||
=========
|
||||
|
||||
Expectations for bug reporters
|
||||
------------------------------
|
||||
|
||||
Linux kernel maintainers expect bug reporters to be able to follow up on
|
||||
bug reports. That may include running new tests, applying patches,
|
||||
recompiling your kernel, and/or re-triggering your bug. The most
|
||||
frustrating thing for maintainers is for someone to report a bug, and then
|
||||
never follow up on a request to try out a fix.
|
||||
|
||||
That said, it's still useful for a kernel maintainer to know a bug exists
|
||||
on a supported kernel, even if you can't follow up with retests. Follow
|
||||
up reports, such as replying to the email thread with "I tried the latest
|
||||
kernel and I can't reproduce my bug anymore" are also helpful, because
|
||||
maintainers have to assume silence means things are still broken.
|
||||
|
||||
Expectations for kernel maintainers
|
||||
-----------------------------------
|
||||
|
||||
Linux kernel maintainers are busy, overworked human beings. Some times
|
||||
they may not be able to address your bug in a day, a week, or two weeks.
|
||||
If they don't answer your email, they may be on vacation, or at a Linux
|
||||
conference. Check the conference schedule at https://LWN.net for more info:
|
||||
|
||||
https://lwn.net/Calendar/
|
||||
|
||||
In general, kernel maintainers take 1 to 5 business days to respond to
|
||||
bugs. The majority of kernel maintainers are employed to work on the
|
||||
kernel, and they may not work on the weekends. Maintainers are scattered
|
||||
around the world, and they may not work in your time zone. Unless you
|
||||
have a high priority bug, please wait at least a week after the first bug
|
||||
report before sending the maintainer a reminder email.
|
||||
|
||||
The exceptions to this rule are regressions, kernel crashes, security holes,
|
||||
or userspace breakage caused by new kernel behavior. Those bugs should be
|
||||
addressed by the maintainers ASAP. If you suspect a maintainer is not
|
||||
responding to these types of bugs in a timely manner (especially during a
|
||||
merge window), escalate the bug to LKML and Linus Torvalds.
|
||||
|
||||
Thank you!
|
||||
|
||||
[Some of this is taken from Frohwalt Egerer's original linux-kernel FAQ]
|
||||
46
Documentation/admin-guide/security-bugs.rst
Normal file
46
Documentation/admin-guide/security-bugs.rst
Normal file
@@ -0,0 +1,46 @@
|
||||
.. _securitybugs:
|
||||
|
||||
Security bugs
|
||||
=============
|
||||
|
||||
Linux kernel developers take security very seriously. As such, we'd
|
||||
like to know when a security bug is found so that it can be fixed and
|
||||
disclosed as quickly as possible. Please report security bugs to the
|
||||
Linux kernel security team.
|
||||
|
||||
Contact
|
||||
-------
|
||||
|
||||
The Linux kernel security team can be contacted by email at
|
||||
<security@kernel.org>. This is a private list of security officers
|
||||
who will help verify the bug report and develop and release a fix.
|
||||
It is possible that the security team will bring in extra help from
|
||||
area maintainers to understand and fix the security vulnerability.
|
||||
|
||||
As it is with any bug, the more information provided the easier it
|
||||
will be to diagnose and fix. Please review the procedure outlined in
|
||||
admin-guide/reporting-bugs.rst if you are unclear about what information is helpful.
|
||||
Any exploit code is very helpful and will not be released without
|
||||
consent from the reporter unless it has already been made public.
|
||||
|
||||
Disclosure
|
||||
----------
|
||||
|
||||
The goal of the Linux kernel security team is to work with the
|
||||
bug submitter to bug resolution as well as disclosure. We prefer
|
||||
to fully disclose the bug as soon as possible. It is reasonable to
|
||||
delay disclosure when the bug or the fix is not yet fully understood,
|
||||
the solution is not well-tested or for vendor coordination. However, we
|
||||
expect these delays to be short, measurable in days, not weeks or months.
|
||||
A disclosure date is negotiated by the security team working with the
|
||||
bug submitter as well as vendors. However, the kernel security team
|
||||
holds the final say when setting a disclosure date. The timeframe for
|
||||
disclosure is from immediate (esp. if it's already publicly known)
|
||||
to a few weeks. As a basic default policy, we expect report date to
|
||||
disclosure date to be on the order of 7 days.
|
||||
|
||||
Non-disclosure agreements
|
||||
-------------------------
|
||||
|
||||
The Linux kernel security team is not a formal body and therefore unable
|
||||
to enter any non-disclosure agreements.
|
||||
115
Documentation/admin-guide/serial-console.rst
Normal file
115
Documentation/admin-guide/serial-console.rst
Normal file
@@ -0,0 +1,115 @@
|
||||
.. _serial_console:
|
||||
|
||||
Linux Serial Console
|
||||
====================
|
||||
|
||||
To use a serial port as console you need to compile the support into your
|
||||
kernel - by default it is not compiled in. For PC style serial ports
|
||||
it's the config option next to menu option:
|
||||
|
||||
:menuselection:`Character devices --> Serial drivers --> 8250/16550 and compatible serial support --> Console on 8250/16550 and compatible serial port`
|
||||
|
||||
You must compile serial support into the kernel and not as a module.
|
||||
|
||||
It is possible to specify multiple devices for console output. You can
|
||||
define a new kernel command line option to select which device(s) to
|
||||
use for console output.
|
||||
|
||||
The format of this option is::
|
||||
|
||||
console=device,options
|
||||
|
||||
device: tty0 for the foreground virtual console
|
||||
ttyX for any other virtual console
|
||||
ttySx for a serial port
|
||||
lp0 for the first parallel port
|
||||
ttyUSB0 for the first USB serial device
|
||||
|
||||
options: depend on the driver. For the serial port this
|
||||
defines the baudrate/parity/bits/flow control of
|
||||
the port, in the format BBBBPNF, where BBBB is the
|
||||
speed, P is parity (n/o/e), N is number of bits,
|
||||
and F is flow control ('r' for RTS). Default is
|
||||
9600n8. The maximum baudrate is 115200.
|
||||
|
||||
You can specify multiple console= options on the kernel command line.
|
||||
Output will appear on all of them. The last device will be used when
|
||||
you open ``/dev/console``. So, for example::
|
||||
|
||||
console=ttyS1,9600 console=tty0
|
||||
|
||||
defines that opening ``/dev/console`` will get you the current foreground
|
||||
virtual console, and kernel messages will appear on both the VGA
|
||||
console and the 2nd serial port (ttyS1 or COM2) at 9600 baud.
|
||||
|
||||
Note that you can only define one console per device type (serial, video).
|
||||
|
||||
If no console device is specified, the first device found capable of
|
||||
acting as a system console will be used. At this time, the system
|
||||
first looks for a VGA card and then for a serial port. So if you don't
|
||||
have a VGA card in your system the first serial port will automatically
|
||||
become the console.
|
||||
|
||||
You will need to create a new device to use ``/dev/console``. The official
|
||||
``/dev/console`` is now character device 5,1.
|
||||
|
||||
(You can also use a network device as a console. See
|
||||
``Documentation/networking/netconsole.txt`` for information on that.)
|
||||
|
||||
Here's an example that will use ``/dev/ttyS1`` (COM2) as the console.
|
||||
Replace the sample values as needed.
|
||||
|
||||
1. Create ``/dev/console`` (real console) and ``/dev/tty0`` (master virtual
|
||||
console)::
|
||||
|
||||
cd /dev
|
||||
rm -f console tty0
|
||||
mknod -m 622 console c 5 1
|
||||
mknod -m 622 tty0 c 4 0
|
||||
|
||||
2. LILO can also take input from a serial device. This is a very
|
||||
useful option. To tell LILO to use the serial port:
|
||||
In lilo.conf (global section)::
|
||||
|
||||
serial = 1,9600n8 (ttyS1, 9600 bd, no parity, 8 bits)
|
||||
|
||||
3. Adjust to kernel flags for the new kernel,
|
||||
again in lilo.conf (kernel section)::
|
||||
|
||||
append = "console=ttyS1,9600"
|
||||
|
||||
4. Make sure a getty runs on the serial port so that you can login to
|
||||
it once the system is done booting. This is done by adding a line
|
||||
like this to ``/etc/inittab`` (exact syntax depends on your getty)::
|
||||
|
||||
S1:23:respawn:/sbin/getty -L ttyS1 9600 vt100
|
||||
|
||||
5. Init and ``/etc/ioctl.save``
|
||||
|
||||
Sysvinit remembers its stty settings in a file in ``/etc``, called
|
||||
``/etc/ioctl.save``. REMOVE THIS FILE before using the serial
|
||||
console for the first time, because otherwise init will probably
|
||||
set the baudrate to 38400 (baudrate of the virtual console).
|
||||
|
||||
6. ``/dev/console`` and X
|
||||
Programs that want to do something with the virtual console usually
|
||||
open ``/dev/console``. If you have created the new ``/dev/console`` device,
|
||||
and your console is NOT the virtual console some programs will fail.
|
||||
Those are programs that want to access the VT interface, and use
|
||||
``/dev/console instead of /dev/tty0``. Some of those programs are::
|
||||
|
||||
Xfree86, svgalib, gpm, SVGATextMode
|
||||
|
||||
It should be fixed in modern versions of these programs though.
|
||||
|
||||
Note that if you boot without a ``console=`` option (or with
|
||||
``console=/dev/tty0``), ``/dev/console`` is the same as ``/dev/tty0``.
|
||||
In that case everything will still work.
|
||||
|
||||
7. Thanks
|
||||
|
||||
Thanks to Geert Uytterhoeven <geert@linux-m68k.org>
|
||||
for porting the patches from 2.1.4x to 2.1.6x for taking care of
|
||||
the integration of these patches into m68k, ppc and alpha.
|
||||
|
||||
Miquel van Smoorenburg <miquels@cistron.nl>, 11-Jun-2000
|
||||
192
Documentation/admin-guide/sysfs-rules.rst
Normal file
192
Documentation/admin-guide/sysfs-rules.rst
Normal file
@@ -0,0 +1,192 @@
|
||||
Rules on how to access information in sysfs
|
||||
===========================================
|
||||
|
||||
The kernel-exported sysfs exports internal kernel implementation details
|
||||
and depends on internal kernel structures and layout. It is agreed upon
|
||||
by the kernel developers that the Linux kernel does not provide a stable
|
||||
internal API. Therefore, there are aspects of the sysfs interface that
|
||||
may not be stable across kernel releases.
|
||||
|
||||
To minimize the risk of breaking users of sysfs, which are in most cases
|
||||
low-level userspace applications, with a new kernel release, the users
|
||||
of sysfs must follow some rules to use an as-abstract-as-possible way to
|
||||
access this filesystem. The current udev and HAL programs already
|
||||
implement this and users are encouraged to plug, if possible, into the
|
||||
abstractions these programs provide instead of accessing sysfs directly.
|
||||
|
||||
But if you really do want or need to access sysfs directly, please follow
|
||||
the following rules and then your programs should work with future
|
||||
versions of the sysfs interface.
|
||||
|
||||
- Do not use libsysfs
|
||||
It makes assumptions about sysfs which are not true. Its API does not
|
||||
offer any abstraction, it exposes all the kernel driver-core
|
||||
implementation details in its own API. Therefore it is not better than
|
||||
reading directories and opening the files yourself.
|
||||
Also, it is not actively maintained, in the sense of reflecting the
|
||||
current kernel development. The goal of providing a stable interface
|
||||
to sysfs has failed; it causes more problems than it solves. It
|
||||
violates many of the rules in this document.
|
||||
|
||||
- sysfs is always at ``/sys``
|
||||
Parsing ``/proc/mounts`` is a waste of time. Other mount points are a
|
||||
system configuration bug you should not try to solve. For test cases,
|
||||
possibly support a ``SYSFS_PATH`` environment variable to overwrite the
|
||||
application's behavior, but never try to search for sysfs. Never try
|
||||
to mount it, if you are not an early boot script.
|
||||
|
||||
- devices are only "devices"
|
||||
There is no such thing like class-, bus-, physical devices,
|
||||
interfaces, and such that you can rely on in userspace. Everything is
|
||||
just simply a "device". Class-, bus-, physical, ... types are just
|
||||
kernel implementation details which should not be expected by
|
||||
applications that look for devices in sysfs.
|
||||
|
||||
The properties of a device are:
|
||||
|
||||
- devpath (``/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0``)
|
||||
|
||||
- identical to the DEVPATH value in the event sent from the kernel
|
||||
at device creation and removal
|
||||
- the unique key to the device at that point in time
|
||||
- the kernel's path to the device directory without the leading
|
||||
``/sys``, and always starting with a slash
|
||||
- all elements of a devpath must be real directories. Symlinks
|
||||
pointing to /sys/devices must always be resolved to their real
|
||||
target and the target path must be used to access the device.
|
||||
That way the devpath to the device matches the devpath of the
|
||||
kernel used at event time.
|
||||
- using or exposing symlink values as elements in a devpath string
|
||||
is a bug in the application
|
||||
|
||||
- kernel name (``sda``, ``tty``, ``0000:00:1f.2``, ...)
|
||||
|
||||
- a directory name, identical to the last element of the devpath
|
||||
- applications need to handle spaces and characters like ``!`` in
|
||||
the name
|
||||
|
||||
- subsystem (``block``, ``tty``, ``pci``, ...)
|
||||
|
||||
- simple string, never a path or a link
|
||||
- retrieved by reading the "subsystem"-link and using only the
|
||||
last element of the target path
|
||||
|
||||
- driver (``tg3``, ``ata_piix``, ``uhci_hcd``)
|
||||
|
||||
- a simple string, which may contain spaces, never a path or a
|
||||
link
|
||||
- it is retrieved by reading the "driver"-link and using only the
|
||||
last element of the target path
|
||||
- devices which do not have "driver"-link just do not have a
|
||||
driver; copying the driver value in a child device context is a
|
||||
bug in the application
|
||||
|
||||
- attributes
|
||||
|
||||
- the files in the device directory or files below subdirectories
|
||||
of the same device directory
|
||||
- accessing attributes reached by a symlink pointing to another device,
|
||||
like the "device"-link, is a bug in the application
|
||||
|
||||
Everything else is just a kernel driver-core implementation detail
|
||||
that should not be assumed to be stable across kernel releases.
|
||||
|
||||
- Properties of parent devices never belong into a child device.
|
||||
Always look at the parent devices themselves for determining device
|
||||
context properties. If the device ``eth0`` or ``sda`` does not have a
|
||||
"driver"-link, then this device does not have a driver. Its value is empty.
|
||||
Never copy any property of the parent-device into a child-device. Parent
|
||||
device properties may change dynamically without any notice to the
|
||||
child device.
|
||||
|
||||
- Hierarchy in a single device tree
|
||||
There is only one valid place in sysfs where hierarchy can be examined
|
||||
and this is below: ``/sys/devices.``
|
||||
It is planned that all device directories will end up in the tree
|
||||
below this directory.
|
||||
|
||||
- Classification by subsystem
|
||||
There are currently three places for classification of devices:
|
||||
``/sys/block,`` ``/sys/class`` and ``/sys/bus.`` It is planned that these will
|
||||
not contain any device directories themselves, but only flat lists of
|
||||
symlinks pointing to the unified ``/sys/devices`` tree.
|
||||
All three places have completely different rules on how to access
|
||||
device information. It is planned to merge all three
|
||||
classification directories into one place at ``/sys/subsystem``,
|
||||
following the layout of the bus directories. All buses and
|
||||
classes, including the converted block subsystem, will show up
|
||||
there.
|
||||
The devices belonging to a subsystem will create a symlink in the
|
||||
"devices" directory at ``/sys/subsystem/<name>/devices``,
|
||||
|
||||
If ``/sys/subsystem`` exists, ``/sys/bus``, ``/sys/class`` and ``/sys/block``
|
||||
can be ignored. If it does not exist, you always have to scan all three
|
||||
places, as the kernel is free to move a subsystem from one place to
|
||||
the other, as long as the devices are still reachable by the same
|
||||
subsystem name.
|
||||
|
||||
Assuming ``/sys/class/<subsystem>`` and ``/sys/bus/<subsystem>``, or
|
||||
``/sys/block`` and ``/sys/class/block`` are not interchangeable is a bug in
|
||||
the application.
|
||||
|
||||
- Block
|
||||
The converted block subsystem at ``/sys/class/block`` or
|
||||
``/sys/subsystem/block`` will contain the links for disks and partitions
|
||||
at the same level, never in a hierarchy. Assuming the block subsystem to
|
||||
contain only disks and not partition devices in the same flat list is
|
||||
a bug in the application.
|
||||
|
||||
- "device"-link and <subsystem>:<kernel name>-links
|
||||
Never depend on the "device"-link. The "device"-link is a workaround
|
||||
for the old layout, where class devices are not created in
|
||||
``/sys/devices/`` like the bus devices. If the link-resolving of a
|
||||
device directory does not end in ``/sys/devices/``, you can use the
|
||||
"device"-link to find the parent devices in ``/sys/devices/``, That is the
|
||||
single valid use of the "device"-link; it must never appear in any
|
||||
path as an element. Assuming the existence of the "device"-link for
|
||||
a device in ``/sys/devices/`` is a bug in the application.
|
||||
Accessing ``/sys/class/net/eth0/device`` is a bug in the application.
|
||||
|
||||
Never depend on the class-specific links back to the ``/sys/class``
|
||||
directory. These links are also a workaround for the design mistake
|
||||
that class devices are not created in ``/sys/devices.`` If a device
|
||||
directory does not contain directories for child devices, these links
|
||||
may be used to find the child devices in ``/sys/class.`` That is the single
|
||||
valid use of these links; they must never appear in any path as an
|
||||
element. Assuming the existence of these links for devices which are
|
||||
real child device directories in the ``/sys/devices`` tree is a bug in
|
||||
the application.
|
||||
|
||||
It is planned to remove all these links when all class device
|
||||
directories live in ``/sys/devices.``
|
||||
|
||||
- Position of devices along device chain can change.
|
||||
Never depend on a specific parent device position in the devpath,
|
||||
or the chain of parent devices. The kernel is free to insert devices into
|
||||
the chain. You must always request the parent device you are looking for
|
||||
by its subsystem value. You need to walk up the chain until you find
|
||||
the device that matches the expected subsystem. Depending on a specific
|
||||
position of a parent device or exposing relative paths using ``../`` to
|
||||
access the chain of parents is a bug in the application.
|
||||
|
||||
- When reading and writing sysfs device attribute files, avoid dependency
|
||||
on specific error codes wherever possible. This minimizes coupling to
|
||||
the error handling implementation within the kernel.
|
||||
|
||||
In general, failures to read or write sysfs device attributes shall
|
||||
propagate errors wherever possible. Common errors include, but are not
|
||||
limited to:
|
||||
|
||||
``-EIO``: The read or store operation is not supported, typically
|
||||
returned by the sysfs system itself if the read or store pointer
|
||||
is ``NULL``.
|
||||
|
||||
``-ENXIO``: The read or store operation failed
|
||||
|
||||
Error codes will not be changed without good reason, and should a change
|
||||
to error codes result in user-space breakage, it will be fixed, or the
|
||||
the offending change will be reverted.
|
||||
|
||||
Userspace applications can, however, expect the format and contents of
|
||||
the attribute files to remain consistent in the absence of a version
|
||||
attribute change in the context of a given attribute.
|
||||
289
Documentation/admin-guide/sysrq.rst
Normal file
289
Documentation/admin-guide/sysrq.rst
Normal file
@@ -0,0 +1,289 @@
|
||||
Linux Magic System Request Key Hacks
|
||||
====================================
|
||||
|
||||
Documentation for sysrq.c
|
||||
|
||||
What is the magic SysRq key?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
It is a 'magical' key combo you can hit which the kernel will respond to
|
||||
regardless of whatever else it is doing, unless it is completely locked up.
|
||||
|
||||
How do I enable the magic SysRq key?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
You need to say "yes" to 'Magic SysRq key (CONFIG_MAGIC_SYSRQ)' when
|
||||
configuring the kernel. When running a kernel with SysRq compiled in,
|
||||
/proc/sys/kernel/sysrq controls the functions allowed to be invoked via
|
||||
the SysRq key. The default value in this file is set by the
|
||||
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE config symbol, which itself defaults
|
||||
to 1. Here is the list of possible values in /proc/sys/kernel/sysrq:
|
||||
|
||||
- 0 - disable sysrq completely
|
||||
- 1 - enable all functions of sysrq
|
||||
- >1 - bitmask of allowed sysrq functions (see below for detailed function
|
||||
description)::
|
||||
|
||||
2 = 0x2 - enable control of console logging level
|
||||
4 = 0x4 - enable control of keyboard (SAK, unraw)
|
||||
8 = 0x8 - enable debugging dumps of processes etc.
|
||||
16 = 0x10 - enable sync command
|
||||
32 = 0x20 - enable remount read-only
|
||||
64 = 0x40 - enable signalling of processes (term, kill, oom-kill)
|
||||
128 = 0x80 - allow reboot/poweroff
|
||||
256 = 0x100 - allow nicing of all RT tasks
|
||||
|
||||
You can set the value in the file by the following command::
|
||||
|
||||
echo "number" >/proc/sys/kernel/sysrq
|
||||
|
||||
The number may be written here either as decimal or as hexadecimal
|
||||
with the 0x prefix. CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE must always be
|
||||
written in hexadecimal.
|
||||
|
||||
Note that the value of ``/proc/sys/kernel/sysrq`` influences only the invocation
|
||||
via a keyboard. Invocation of any operation via ``/proc/sysrq-trigger`` is
|
||||
always allowed (by a user with admin privileges).
|
||||
|
||||
How do I use the magic SysRq key?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
On x86 - You press the key combo :kbd:`ALT-SysRq-<command key>`.
|
||||
|
||||
.. note::
|
||||
Some
|
||||
keyboards may not have a key labeled 'SysRq'. The 'SysRq' key is
|
||||
also known as the 'Print Screen' key. Also some keyboards cannot
|
||||
handle so many keys being pressed at the same time, so you might
|
||||
have better luck with press :kbd:`Alt`, press :kbd:`SysRq`,
|
||||
release :kbd:`SysRq`, press :kbd:`<command key>`, release everything.
|
||||
|
||||
On SPARC - You press :kbd:`ALT-STOP-<command key>`, I believe.
|
||||
|
||||
On the serial console (PC style standard serial ports only)
|
||||
You send a ``BREAK``, then within 5 seconds a command key. Sending
|
||||
``BREAK`` twice is interpreted as a normal BREAK.
|
||||
|
||||
On PowerPC
|
||||
Press :kbd:`ALT - Print Screen` (or :kbd:`F13`) - :kbd:`<command key>`,
|
||||
:kbd:`Print Screen` (or :kbd:`F13`) - :kbd:`<command key>` may suffice.
|
||||
|
||||
On other
|
||||
If you know of the key combos for other architectures, please
|
||||
let me know so I can add them to this section.
|
||||
|
||||
On all
|
||||
write a character to /proc/sysrq-trigger. e.g.::
|
||||
|
||||
echo t > /proc/sysrq-trigger
|
||||
|
||||
What are the 'command' keys?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
=========== ===================================================================
|
||||
Command Function
|
||||
=========== ===================================================================
|
||||
``b`` Will immediately reboot the system without syncing or unmounting
|
||||
your disks.
|
||||
|
||||
``c`` Will perform a system crash by a NULL pointer dereference.
|
||||
A crashdump will be taken if configured.
|
||||
|
||||
``d`` Shows all locks that are held.
|
||||
|
||||
``e`` Send a SIGTERM to all processes, except for init.
|
||||
|
||||
``f`` Will call the oom killer to kill a memory hog process, but do not
|
||||
panic if nothing can be killed.
|
||||
|
||||
``g`` Used by kgdb (kernel debugger)
|
||||
|
||||
``h`` Will display help (actually any other key than those listed
|
||||
here will display help. but ``h`` is easy to remember :-)
|
||||
|
||||
``i`` Send a SIGKILL to all processes, except for init.
|
||||
|
||||
``j`` Forcibly "Just thaw it" - filesystems frozen by the FIFREEZE ioctl.
|
||||
|
||||
``k`` Secure Access Key (SAK) Kills all programs on the current virtual
|
||||
console. NOTE: See important comments below in SAK section.
|
||||
|
||||
``l`` Shows a stack backtrace for all active CPUs.
|
||||
|
||||
``m`` Will dump current memory info to your console.
|
||||
|
||||
``n`` Used to make RT tasks nice-able
|
||||
|
||||
``o`` Will shut your system off (if configured and supported).
|
||||
|
||||
``p`` Will dump the current registers and flags to your console.
|
||||
|
||||
``q`` Will dump per CPU lists of all armed hrtimers (but NOT regular
|
||||
timer_list timers) and detailed information about all
|
||||
clockevent devices.
|
||||
|
||||
``r`` Turns off keyboard raw mode and sets it to XLATE.
|
||||
|
||||
``s`` Will attempt to sync all mounted filesystems.
|
||||
|
||||
``t`` Will dump a list of current tasks and their information to your
|
||||
console.
|
||||
|
||||
``u`` Will attempt to remount all mounted filesystems read-only.
|
||||
|
||||
``v`` Forcefully restores framebuffer console
|
||||
``v`` Causes ETM buffer dump [ARM-specific]
|
||||
|
||||
``w`` Dumps tasks that are in uninterruptable (blocked) state.
|
||||
|
||||
``x`` Used by xmon interface on ppc/powerpc platforms.
|
||||
Show global PMU Registers on sparc64.
|
||||
Dump all TLB entries on MIPS.
|
||||
|
||||
``y`` Show global CPU Registers [SPARC-64 specific]
|
||||
|
||||
``z`` Dump the ftrace buffer
|
||||
|
||||
``0``-``9`` Sets the console log level, controlling which kernel messages
|
||||
will be printed to your console. (``0``, for example would make
|
||||
it so that only emergency messages like PANICs or OOPSes would
|
||||
make it to your console.)
|
||||
=========== ===================================================================
|
||||
|
||||
Okay, so what can I use them for?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Well, unraw(r) is very handy when your X server or a svgalib program crashes.
|
||||
|
||||
sak(k) (Secure Access Key) is useful when you want to be sure there is no
|
||||
trojan program running at console which could grab your password
|
||||
when you would try to login. It will kill all programs on given console,
|
||||
thus letting you make sure that the login prompt you see is actually
|
||||
the one from init, not some trojan program.
|
||||
|
||||
.. important::
|
||||
|
||||
In its true form it is not a true SAK like the one in a
|
||||
c2 compliant system, and it should not be mistaken as
|
||||
such.
|
||||
|
||||
It seems others find it useful as (System Attention Key) which is
|
||||
useful when you want to exit a program that will not let you switch consoles.
|
||||
(For example, X or a svgalib program.)
|
||||
|
||||
``reboot(b)`` is good when you're unable to shut down. But you should also
|
||||
``sync(s)`` and ``umount(u)`` first.
|
||||
|
||||
``crash(c)`` can be used to manually trigger a crashdump when the system is hung.
|
||||
Note that this just triggers a crash if there is no dump mechanism available.
|
||||
|
||||
``sync(s)`` is great when your system is locked up, it allows you to sync your
|
||||
disks and will certainly lessen the chance of data loss and fscking. Note
|
||||
that the sync hasn't taken place until you see the "OK" and "Done" appear
|
||||
on the screen. (If the kernel is really in strife, you may not ever get the
|
||||
OK or Done message...)
|
||||
|
||||
``umount(u)`` is basically useful in the same ways as ``sync(s)``. I generally
|
||||
``sync(s)``, ``umount(u)``, then ``reboot(b)`` when my system locks. It's saved
|
||||
me many a fsck. Again, the unmount (remount read-only) hasn't taken place until
|
||||
you see the "OK" and "Done" message appear on the screen.
|
||||
|
||||
The loglevels ``0``-``9`` are useful when your console is being flooded with
|
||||
kernel messages you do not want to see. Selecting ``0`` will prevent all but
|
||||
the most urgent kernel messages from reaching your console. (They will
|
||||
still be logged if syslogd/klogd are alive, though.)
|
||||
|
||||
``term(e)`` and ``kill(i)`` are useful if you have some sort of runaway process
|
||||
you are unable to kill any other way, especially if it's spawning other
|
||||
processes.
|
||||
|
||||
"just thaw ``it(j)``" is useful if your system becomes unresponsive due to a
|
||||
frozen (probably root) filesystem via the FIFREEZE ioctl.
|
||||
|
||||
Sometimes SysRq seems to get 'stuck' after using it, what can I do?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
That happens to me, also. I've found that tapping shift, alt, and control
|
||||
on both sides of the keyboard, and hitting an invalid sysrq sequence again
|
||||
will fix the problem. (i.e., something like :kbd:`alt-sysrq-z`). Switching to
|
||||
another virtual console (:kbd:`ALT+Fn`) and then back again should also help.
|
||||
|
||||
I hit SysRq, but nothing seems to happen, what's wrong?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
There are some keyboards that produce a different keycode for SysRq than the
|
||||
pre-defined value of 99 (see ``KEY_SYSRQ`` in ``include/linux/input.h``), or
|
||||
which don't have a SysRq key at all. In these cases, run ``showkey -s`` to find
|
||||
an appropriate scancode sequence, and use ``setkeycodes <sequence> 99`` to map
|
||||
this sequence to the usual SysRq code (e.g., ``setkeycodes e05b 99``). It's
|
||||
probably best to put this command in a boot script. Oh, and by the way, you
|
||||
exit ``showkey`` by not typing anything for ten seconds.
|
||||
|
||||
I want to add SysRQ key events to a module, how does it work?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
In order to register a basic function with the table, you must first include
|
||||
the header ``include/linux/sysrq.h``, this will define everything else you need.
|
||||
Next, you must create a ``sysrq_key_op`` struct, and populate it with A) the key
|
||||
handler function you will use, B) a help_msg string, that will print when SysRQ
|
||||
prints help, and C) an action_msg string, that will print right before your
|
||||
handler is called. Your handler must conform to the prototype in 'sysrq.h'.
|
||||
|
||||
After the ``sysrq_key_op`` is created, you can call the kernel function
|
||||
``register_sysrq_key(int key, struct sysrq_key_op *op_p);`` this will
|
||||
register the operation pointed to by ``op_p`` at table key 'key',
|
||||
if that slot in the table is blank. At module unload time, you must call
|
||||
the function ``unregister_sysrq_key(int key, struct sysrq_key_op *op_p)``, which
|
||||
will remove the key op pointed to by 'op_p' from the key 'key', if and only if
|
||||
it is currently registered in that slot. This is in case the slot has been
|
||||
overwritten since you registered it.
|
||||
|
||||
The Magic SysRQ system works by registering key operations against a key op
|
||||
lookup table, which is defined in 'drivers/tty/sysrq.c'. This key table has
|
||||
a number of operations registered into it at compile time, but is mutable,
|
||||
and 2 functions are exported for interface to it::
|
||||
|
||||
register_sysrq_key and unregister_sysrq_key.
|
||||
|
||||
Of course, never ever leave an invalid pointer in the table. I.e., when
|
||||
your module that called register_sysrq_key() exits, it must call
|
||||
unregister_sysrq_key() to clean up the sysrq key table entry that it used.
|
||||
Null pointers in the table are always safe. :)
|
||||
|
||||
If for some reason you feel the need to call the handle_sysrq function from
|
||||
within a function called by handle_sysrq, you must be aware that you are in
|
||||
a lock (you are also in an interrupt handler, which means don't sleep!), so
|
||||
you must call ``__handle_sysrq_nolock`` instead.
|
||||
|
||||
When I hit a SysRq key combination only the header appears on the console?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Sysrq output is subject to the same console loglevel control as all
|
||||
other console output. This means that if the kernel was booted 'quiet'
|
||||
as is common on distro kernels the output may not appear on the actual
|
||||
console, even though it will appear in the dmesg buffer, and be accessible
|
||||
via the dmesg command and to the consumers of ``/proc/kmsg``. As a specific
|
||||
exception the header line from the sysrq command is passed to all console
|
||||
consumers as if the current loglevel was maximum. If only the header
|
||||
is emitted it is almost certain that the kernel loglevel is too low.
|
||||
Should you require the output on the console channel then you will need
|
||||
to temporarily up the console loglevel using :kbd:`alt-sysrq-8` or::
|
||||
|
||||
echo 8 > /proc/sysrq-trigger
|
||||
|
||||
Remember to return the loglevel to normal after triggering the sysrq
|
||||
command you are interested in.
|
||||
|
||||
I have more questions, who can I ask?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Just ask them on the linux-kernel mailing list:
|
||||
linux-kernel@vger.kernel.org
|
||||
|
||||
Credits
|
||||
~~~~~~~
|
||||
|
||||
Written by Mydraal <vulpyne@vulpyne.net>
|
||||
Updated by Adam Sulmicki <adam@cfar.umd.edu>
|
||||
Updated by Jeremy M. Dolan <jmd@turbogeek.org> 2001/01/28 10:15:59
|
||||
Added to by Crutcher Dunnavant <crutcher+kernel@datastacks.com>
|
||||
59
Documentation/admin-guide/tainted-kernels.rst
Normal file
59
Documentation/admin-guide/tainted-kernels.rst
Normal file
@@ -0,0 +1,59 @@
|
||||
Tainted kernels
|
||||
---------------
|
||||
|
||||
Some oops reports contain the string **'Tainted: '** after the program
|
||||
counter. This indicates that the kernel has been tainted by some
|
||||
mechanism. The string is followed by a series of position-sensitive
|
||||
characters, each representing a particular tainted value.
|
||||
|
||||
1) 'G' if all modules loaded have a GPL or compatible license, 'P' if
|
||||
any proprietary module has been loaded. Modules without a
|
||||
MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by
|
||||
insmod as GPL compatible are assumed to be proprietary.
|
||||
|
||||
2) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all
|
||||
modules were loaded normally.
|
||||
|
||||
3) ``S`` if the oops occurred on an SMP kernel running on hardware that
|
||||
hasn't been certified as safe to run multiprocessor.
|
||||
Currently this occurs only on various Athlons that are not
|
||||
SMP capable.
|
||||
|
||||
4) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all
|
||||
modules were unloaded normally.
|
||||
|
||||
5) ``M`` if any processor has reported a Machine Check Exception,
|
||||
``' '`` if no Machine Check Exceptions have occurred.
|
||||
|
||||
6) ``B`` if a page-release function has found a bad page reference or
|
||||
some unexpected page flags.
|
||||
|
||||
7) ``U`` if a user or user application specifically requested that the
|
||||
Tainted flag be set, ``' '`` otherwise.
|
||||
|
||||
8) ``D`` if the kernel has died recently, i.e. there was an OOPS or BUG.
|
||||
|
||||
9) ``A`` if the ACPI table has been overridden.
|
||||
|
||||
10) ``W`` if a warning has previously been issued by the kernel.
|
||||
(Though some warnings may set more specific taint flags.)
|
||||
|
||||
11) ``C`` if a staging driver has been loaded.
|
||||
|
||||
12) ``I`` if the kernel is working around a severe bug in the platform
|
||||
firmware (BIOS or similar).
|
||||
|
||||
13) ``O`` if an externally-built ("out-of-tree") module has been loaded.
|
||||
|
||||
14) ``E`` if an unsigned module has been loaded in a kernel supporting
|
||||
module signature.
|
||||
|
||||
15) ``L`` if a soft lockup has previously occurred on the system.
|
||||
|
||||
16) ``K`` if the kernel has been live patched.
|
||||
|
||||
The primary reason for the **'Tainted: '** string is to tell kernel
|
||||
debuggers if this is a clean kernel or if anything unusual has
|
||||
occurred. Tainting is permanent: even if an offending module is
|
||||
unloaded, the tainted value remains to indicate that the kernel is not
|
||||
trustworthy.
|
||||
189
Documentation/admin-guide/unicode.rst
Normal file
189
Documentation/admin-guide/unicode.rst
Normal file
@@ -0,0 +1,189 @@
|
||||
Unicode support
|
||||
===============
|
||||
|
||||
Last update: 2005-01-17, version 1.4
|
||||
|
||||
This file is maintained by H. Peter Anvin <unicode@lanana.org> as part
|
||||
of the Linux Assigned Names And Numbers Authority (LANANA) project.
|
||||
The current version can be found at:
|
||||
|
||||
http://www.lanana.org/docs/unicode/admin-guide/unicode.rst
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
The Linux kernel code has been rewritten to use Unicode to map
|
||||
characters to fonts. By downloading a single Unicode-to-font table,
|
||||
both the eight-bit character sets and UTF-8 mode are changed to use
|
||||
the font as indicated.
|
||||
|
||||
This changes the semantics of the eight-bit character tables subtly.
|
||||
The four character tables are now:
|
||||
|
||||
=============== =============================== ================
|
||||
Map symbol Map name Escape code (G0)
|
||||
=============== =============================== ================
|
||||
LAT1_MAP Latin-1 (ISO 8859-1) ESC ( B
|
||||
GRAF_MAP DEC VT100 pseudographics ESC ( 0
|
||||
IBMPC_MAP IBM code page 437 ESC ( U
|
||||
USER_MAP User defined ESC ( K
|
||||
=============== =============================== ================
|
||||
|
||||
In particular, ESC ( U is no longer "straight to font", since the font
|
||||
might be completely different than the IBM character set. This
|
||||
permits for example the use of block graphics even with a Latin-1 font
|
||||
loaded.
|
||||
|
||||
Note that although these codes are similar to ISO 2022, neither the
|
||||
codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
|
||||
G1), whereas ISO 2022 has four 7-bit codes (G0-G3).
|
||||
|
||||
In accordance with the Unicode standard/ISO 10646 the range U+F000 to
|
||||
U+F8FF has been reserved for OS-wide allocation (the Unicode Standard
|
||||
refers to this as a "Corporate Zone", since this is inaccurate for
|
||||
Linux we call it the "Linux Zone"). U+F000 was picked as the starting
|
||||
point since it lets the direct-mapping area start on a large power of
|
||||
two (in case 1024- or 2048-character fonts ever become necessary).
|
||||
This leaves U+E000 to U+EFFF as End User Zone.
|
||||
|
||||
[v1.2]: The Unicodes range from U+F000 and up to U+F7FF have been
|
||||
hard-coded to map directly to the loaded font, bypassing the
|
||||
translation table. The user-defined map now defaults to U+F000 to
|
||||
U+F0FF, emulating the previous behaviour. In practice, this range
|
||||
might be shorter; for example, vgacon can only handle 256-character
|
||||
(U+F000..U+F0FF) or 512-character (U+F000..U+F1FF) fonts.
|
||||
|
||||
|
||||
Actual characters assigned in the Linux Zone
|
||||
--------------------------------------------
|
||||
|
||||
In addition, the following characters not present in Unicode 1.1.4
|
||||
have been defined; these are used by the DEC VT graphics map. [v1.2]
|
||||
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.
|
||||
|
||||
====== ======================================
|
||||
U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
|
||||
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
|
||||
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
|
||||
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
|
||||
====== ======================================
|
||||
|
||||
The DEC VT220 uses a 6x10 character matrix, and these characters form
|
||||
a smooth progression in the DEC VT graphics character set. I have
|
||||
omitted the scan 5 line, since it is also used as a block-graphics
|
||||
character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.
|
||||
|
||||
[v1.3]: These characters have been officially added to Unicode 3.2.0;
|
||||
they are added at U+23BA, U+23BB, U+23BC, U+23BD. Linux now uses the
|
||||
new values.
|
||||
|
||||
[v1.2]: The following characters have been added to represent common
|
||||
keyboard symbols that are unlikely to ever be added to Unicode proper
|
||||
since they are horribly vendor-specific. This, of course, is an
|
||||
excellent example of horrible design.
|
||||
|
||||
====== ======================================
|
||||
U+F810 KEYBOARD SYMBOL FLYING FLAG
|
||||
U+F811 KEYBOARD SYMBOL PULLDOWN MENU
|
||||
U+F812 KEYBOARD SYMBOL OPEN APPLE
|
||||
U+F813 KEYBOARD SYMBOL SOLID APPLE
|
||||
====== ======================================
|
||||
|
||||
Klingon language support
|
||||
------------------------
|
||||
|
||||
In 1996, Linux was the first operating system in the world to add
|
||||
support for the artificial language Klingon, created by Marc Okrand
|
||||
for the "Star Trek" television series. This encoding was later
|
||||
adopted by the ConScript Unicode Registry and proposed (but ultimately
|
||||
rejected) for inclusion in Unicode Plane 1. Thus, it remains as a
|
||||
Linux/CSUR private assignment in the Linux Zone.
|
||||
|
||||
This encoding has been endorsed by the Klingon Language Institute.
|
||||
For more information, contact them at:
|
||||
|
||||
http://www.kli.org/
|
||||
|
||||
Since the characters in the beginning of the Linux CZ have been more
|
||||
of the dingbats/symbols/forms type and this is a language, I have
|
||||
located it at the end, on a 16-cell boundary in keeping with standard
|
||||
Unicode practice.
|
||||
|
||||
.. note::
|
||||
|
||||
This range is now officially managed by the ConScript Unicode
|
||||
Registry. The normative reference is at:
|
||||
|
||||
http://www.evertype.com/standards/csur/klingon.html
|
||||
|
||||
Klingon has an alphabet of 26 characters, a positional numeric writing
|
||||
system with 10 digits, and is written left-to-right, top-to-bottom.
|
||||
|
||||
Several glyph forms for the Klingon alphabet have been proposed.
|
||||
However, since the set of symbols appear to be consistent throughout,
|
||||
with only the actual shapes being different, in keeping with standard
|
||||
Unicode practice these differences are considered font variants.
|
||||
|
||||
====== =======================================================
|
||||
U+F8D0 KLINGON LETTER A
|
||||
U+F8D1 KLINGON LETTER B
|
||||
U+F8D2 KLINGON LETTER CH
|
||||
U+F8D3 KLINGON LETTER D
|
||||
U+F8D4 KLINGON LETTER E
|
||||
U+F8D5 KLINGON LETTER GH
|
||||
U+F8D6 KLINGON LETTER H
|
||||
U+F8D7 KLINGON LETTER I
|
||||
U+F8D8 KLINGON LETTER J
|
||||
U+F8D9 KLINGON LETTER L
|
||||
U+F8DA KLINGON LETTER M
|
||||
U+F8DB KLINGON LETTER N
|
||||
U+F8DC KLINGON LETTER NG
|
||||
U+F8DD KLINGON LETTER O
|
||||
U+F8DE KLINGON LETTER P
|
||||
U+F8DF KLINGON LETTER Q
|
||||
- Written <q> in standard Okrand Latin transliteration
|
||||
U+F8E0 KLINGON LETTER QH
|
||||
- Written <Q> in standard Okrand Latin transliteration
|
||||
U+F8E1 KLINGON LETTER R
|
||||
U+F8E2 KLINGON LETTER S
|
||||
U+F8E3 KLINGON LETTER T
|
||||
U+F8E4 KLINGON LETTER TLH
|
||||
U+F8E5 KLINGON LETTER U
|
||||
U+F8E6 KLINGON LETTER V
|
||||
U+F8E7 KLINGON LETTER W
|
||||
U+F8E8 KLINGON LETTER Y
|
||||
U+F8E9 KLINGON LETTER GLOTTAL STOP
|
||||
|
||||
U+F8F0 KLINGON DIGIT ZERO
|
||||
U+F8F1 KLINGON DIGIT ONE
|
||||
U+F8F2 KLINGON DIGIT TWO
|
||||
U+F8F3 KLINGON DIGIT THREE
|
||||
U+F8F4 KLINGON DIGIT FOUR
|
||||
U+F8F5 KLINGON DIGIT FIVE
|
||||
U+F8F6 KLINGON DIGIT SIX
|
||||
U+F8F7 KLINGON DIGIT SEVEN
|
||||
U+F8F8 KLINGON DIGIT EIGHT
|
||||
U+F8F9 KLINGON DIGIT NINE
|
||||
|
||||
U+F8FD KLINGON COMMA
|
||||
U+F8FE KLINGON FULL STOP
|
||||
U+F8FF KLINGON SYMBOL FOR EMPIRE
|
||||
====== =======================================================
|
||||
|
||||
Other Fictional and Artificial Scripts
|
||||
--------------------------------------
|
||||
|
||||
Since the assignment of the Klingon Linux Unicode block, a registry of
|
||||
fictional and artificial scripts has been established by John Cowan
|
||||
<jcowan@reutershealth.com> and Michael Everson <everson@evertype.com>.
|
||||
The ConScript Unicode Registry is accessible at:
|
||||
|
||||
http://www.evertype.com/standards/csur/
|
||||
|
||||
The ranges used fall at the low end of the End User Zone and can hence
|
||||
not be normatively assigned, but it is recommended that people who
|
||||
wish to encode fictional scripts use these codes, in the interest of
|
||||
interoperability. For Klingon, CSUR has adopted the Linux encoding.
|
||||
The CSUR people are driving adding Tengwar and Cirth into Unicode
|
||||
Plane 1; the addition of Klingon to Unicode Plane 1 has been rejected
|
||||
and so the above encoding remains official.
|
||||
62
Documentation/admin-guide/vga-softcursor.rst
Normal file
62
Documentation/admin-guide/vga-softcursor.rst
Normal file
@@ -0,0 +1,62 @@
|
||||
Software cursor for VGA
|
||||
=======================
|
||||
|
||||
by Pavel Machek <pavel@atrey.karlin.mff.cuni.cz>
|
||||
and Martin Mares <mj@atrey.karlin.mff.cuni.cz>
|
||||
|
||||
Linux now has some ability to manipulate cursor appearance. Normally,
|
||||
you can set the size of hardware cursor. You can now play a few new
|
||||
tricks: you can make your cursor look like a non-blinking red block,
|
||||
make it inverse background of the character it's over or to highlight
|
||||
that character and still choose whether the original hardware cursor
|
||||
should remain visible or not. There may be other things I have never
|
||||
thought of.
|
||||
|
||||
The cursor appearance is controlled by a ``<ESC>[?1;2;3c`` escape sequence
|
||||
where 1, 2 and 3 are parameters described below. If you omit any of them,
|
||||
they will default to zeroes.
|
||||
|
||||
first Parameter
|
||||
specifies cursor size::
|
||||
|
||||
0=default
|
||||
1=invisible
|
||||
2=underline,
|
||||
...
|
||||
8=full block
|
||||
+ 16 if you want the software cursor to be applied
|
||||
+ 32 if you want to always change the background color
|
||||
+ 64 if you dislike having the background the same as the
|
||||
foreground.
|
||||
|
||||
Highlights are ignored for the last two flags.
|
||||
|
||||
second parameter
|
||||
selects character attribute bits you want to change
|
||||
(by simply XORing them with the value of this parameter). On standard
|
||||
VGA, the high four bits specify background and the low four the
|
||||
foreground. In both groups, low three bits set color (as in normal
|
||||
color codes used by the console) and the most significant one turns
|
||||
on highlight (or sometimes blinking -- it depends on the configuration
|
||||
of your VGA).
|
||||
|
||||
third parameter
|
||||
consists of character attribute bits you want to set.
|
||||
|
||||
Bit setting takes place before bit toggling, so you can simply clear a
|
||||
bit by including it in both the set mask and the toggle mask.
|
||||
|
||||
Examples
|
||||
--------
|
||||
|
||||
To get normal blinking underline, use::
|
||||
|
||||
echo -e '\033[?2c'
|
||||
|
||||
To get blinking block, use::
|
||||
|
||||
echo -e '\033[?6c'
|
||||
|
||||
To get red non-blinking block, use::
|
||||
|
||||
echo -e '\033[?17;0;64c'
|
||||
@@ -1,465 +0,0 @@
|
||||
.. _applying_patches:
|
||||
|
||||
Applying Patches To The Linux Kernel
|
||||
++++++++++++++++++++++++++++++++++++
|
||||
|
||||
Original by:
|
||||
Jesper Juhl, August 2005
|
||||
|
||||
Last update:
|
||||
2016-09-14
|
||||
|
||||
|
||||
A frequently asked question on the Linux Kernel Mailing List is how to apply
|
||||
a patch to the kernel or, more specifically, what base kernel a patch for
|
||||
one of the many trees/branches should be applied to. Hopefully this document
|
||||
will explain this to you.
|
||||
|
||||
In addition to explaining how to apply and revert patches, a brief
|
||||
description of the different kernel trees (and examples of how to apply
|
||||
their specific patches) is also provided.
|
||||
|
||||
|
||||
What is a patch?
|
||||
================
|
||||
|
||||
A patch is a small text document containing a delta of changes between two
|
||||
different versions of a source tree. Patches are created with the ``diff``
|
||||
program.
|
||||
|
||||
To correctly apply a patch you need to know what base it was generated from
|
||||
and what new version the patch will change the source tree into. These
|
||||
should both be present in the patch file metadata or be possible to deduce
|
||||
from the filename.
|
||||
|
||||
|
||||
How do I apply or revert a patch?
|
||||
=================================
|
||||
|
||||
You apply a patch with the ``patch`` program. The patch program reads a diff
|
||||
(or patch) file and makes the changes to the source tree described in it.
|
||||
|
||||
Patches for the Linux kernel are generated relative to the parent directory
|
||||
holding the kernel source dir.
|
||||
|
||||
This means that paths to files inside the patch file contain the name of the
|
||||
kernel source directories it was generated against (or some other directory
|
||||
names like "a/" and "b/").
|
||||
|
||||
Since this is unlikely to match the name of the kernel source dir on your
|
||||
local machine (but is often useful info to see what version an otherwise
|
||||
unlabeled patch was generated against) you should change into your kernel
|
||||
source directory and then strip the first element of the path from filenames
|
||||
in the patch file when applying it (the ``-p1`` argument to ``patch`` does
|
||||
this).
|
||||
|
||||
To revert a previously applied patch, use the -R argument to patch.
|
||||
So, if you applied a patch like this::
|
||||
|
||||
patch -p1 < ../patch-x.y.z
|
||||
|
||||
You can revert (undo) it like this::
|
||||
|
||||
patch -R -p1 < ../patch-x.y.z
|
||||
|
||||
|
||||
How do I feed a patch/diff file to ``patch``?
|
||||
=============================================
|
||||
|
||||
This (as usual with Linux and other UNIX like operating systems) can be
|
||||
done in several different ways.
|
||||
|
||||
In all the examples below I feed the file (in uncompressed form) to patch
|
||||
via stdin using the following syntax::
|
||||
|
||||
patch -p1 < path/to/patch-x.y.z
|
||||
|
||||
If you just want to be able to follow the examples below and don't want to
|
||||
know of more than one way to use patch, then you can stop reading this
|
||||
section here.
|
||||
|
||||
Patch can also get the name of the file to use via the -i argument, like
|
||||
this::
|
||||
|
||||
patch -p1 -i path/to/patch-x.y.z
|
||||
|
||||
If your patch file is compressed with gzip or xz and you don't want to
|
||||
uncompress it before applying it, then you can feed it to patch like this
|
||||
instead::
|
||||
|
||||
xzcat path/to/patch-x.y.z.xz | patch -p1
|
||||
bzcat path/to/patch-x.y.z.gz | patch -p1
|
||||
|
||||
If you wish to uncompress the patch file by hand first before applying it
|
||||
(what I assume you've done in the examples below), then you simply run
|
||||
gunzip or xz on the file -- like this::
|
||||
|
||||
gunzip patch-x.y.z.gz
|
||||
xz -d patch-x.y.z.xz
|
||||
|
||||
Which will leave you with a plain text patch-x.y.z file that you can feed to
|
||||
patch via stdin or the ``-i`` argument, as you prefer.
|
||||
|
||||
A few other nice arguments for patch are ``-s`` which causes patch to be silent
|
||||
except for errors which is nice to prevent errors from scrolling out of the
|
||||
screen too fast, and ``--dry-run`` which causes patch to just print a listing of
|
||||
what would happen, but doesn't actually make any changes. Finally ``--verbose``
|
||||
tells patch to print more information about the work being done.
|
||||
|
||||
|
||||
Common errors when patching
|
||||
===========================
|
||||
|
||||
When patch applies a patch file it attempts to verify the sanity of the
|
||||
file in different ways.
|
||||
|
||||
Checking that the file looks like a valid patch file and checking the code
|
||||
around the bits being modified matches the context provided in the patch are
|
||||
just two of the basic sanity checks patch does.
|
||||
|
||||
If patch encounters something that doesn't look quite right it has two
|
||||
options. It can either refuse to apply the changes and abort or it can try
|
||||
to find a way to make the patch apply with a few minor changes.
|
||||
|
||||
One example of something that's not 'quite right' that patch will attempt to
|
||||
fix up is if all the context matches, the lines being changed match, but the
|
||||
line numbers are different. This can happen, for example, if the patch makes
|
||||
a change in the middle of the file but for some reasons a few lines have
|
||||
been added or removed near the beginning of the file. In that case
|
||||
everything looks good it has just moved up or down a bit, and patch will
|
||||
usually adjust the line numbers and apply the patch.
|
||||
|
||||
Whenever patch applies a patch that it had to modify a bit to make it fit
|
||||
it'll tell you about it by saying the patch applied with **fuzz**.
|
||||
You should be wary of such changes since even though patch probably got it
|
||||
right it doesn't /always/ get it right, and the result will sometimes be
|
||||
wrong.
|
||||
|
||||
When patch encounters a change that it can't fix up with fuzz it rejects it
|
||||
outright and leaves a file with a ``.rej`` extension (a reject file). You can
|
||||
read this file to see exactly what change couldn't be applied, so you can
|
||||
go fix it up by hand if you wish.
|
||||
|
||||
If you don't have any third-party patches applied to your kernel source, but
|
||||
only patches from kernel.org and you apply the patches in the correct order,
|
||||
and have made no modifications yourself to the source files, then you should
|
||||
never see a fuzz or reject message from patch. If you do see such messages
|
||||
anyway, then there's a high risk that either your local source tree or the
|
||||
patch file is corrupted in some way. In that case you should probably try
|
||||
re-downloading the patch and if things are still not OK then you'd be advised
|
||||
to start with a fresh tree downloaded in full from kernel.org.
|
||||
|
||||
Let's look a bit more at some of the messages patch can produce.
|
||||
|
||||
If patch stops and presents a ``File to patch:`` prompt, then patch could not
|
||||
find a file to be patched. Most likely you forgot to specify -p1 or you are
|
||||
in the wrong directory. Less often, you'll find patches that need to be
|
||||
applied with ``-p0`` instead of ``-p1`` (reading the patch file should reveal if
|
||||
this is the case -- if so, then this is an error by the person who created
|
||||
the patch but is not fatal).
|
||||
|
||||
If you get ``Hunk #2 succeeded at 1887 with fuzz 2 (offset 7 lines).`` or a
|
||||
message similar to that, then it means that patch had to adjust the location
|
||||
of the change (in this example it needed to move 7 lines from where it
|
||||
expected to make the change to make it fit).
|
||||
|
||||
The resulting file may or may not be OK, depending on the reason the file
|
||||
was different than expected.
|
||||
|
||||
This often happens if you try to apply a patch that was generated against a
|
||||
different kernel version than the one you are trying to patch.
|
||||
|
||||
If you get a message like ``Hunk #3 FAILED at 2387.``, then it means that the
|
||||
patch could not be applied correctly and the patch program was unable to
|
||||
fuzz its way through. This will generate a ``.rej`` file with the change that
|
||||
caused the patch to fail and also a ``.orig`` file showing you the original
|
||||
content that couldn't be changed.
|
||||
|
||||
If you get ``Reversed (or previously applied) patch detected! Assume -R? [n]``
|
||||
then patch detected that the change contained in the patch seems to have
|
||||
already been made.
|
||||
|
||||
If you actually did apply this patch previously and you just re-applied it
|
||||
in error, then just say [n]o and abort this patch. If you applied this patch
|
||||
previously and actually intended to revert it, but forgot to specify -R,
|
||||
then you can say [**y**]es here to make patch revert it for you.
|
||||
|
||||
This can also happen if the creator of the patch reversed the source and
|
||||
destination directories when creating the patch, and in that case reverting
|
||||
the patch will in fact apply it.
|
||||
|
||||
A message similar to ``patch: **** unexpected end of file in patch`` or
|
||||
``patch unexpectedly ends in middle of line`` means that patch could make no
|
||||
sense of the file you fed to it. Either your download is broken, you tried to
|
||||
feed patch a compressed patch file without uncompressing it first, or the patch
|
||||
file that you are using has been mangled by a mail client or mail transfer
|
||||
agent along the way somewhere, e.g., by splitting a long line into two lines.
|
||||
Often these warnings can easily be fixed by joining (concatenating) the
|
||||
two lines that had been split.
|
||||
|
||||
As I already mentioned above, these errors should never happen if you apply
|
||||
a patch from kernel.org to the correct version of an unmodified source tree.
|
||||
So if you get these errors with kernel.org patches then you should probably
|
||||
assume that either your patch file or your tree is broken and I'd advise you
|
||||
to start over with a fresh download of a full kernel tree and the patch you
|
||||
wish to apply.
|
||||
|
||||
|
||||
Are there any alternatives to ``patch``?
|
||||
========================================
|
||||
|
||||
|
||||
Yes there are alternatives.
|
||||
|
||||
You can use the ``interdiff`` program (http://cyberelk.net/tim/patchutils/) to
|
||||
generate a patch representing the differences between two patches and then
|
||||
apply the result.
|
||||
|
||||
This will let you move from something like 4.7.2 to 4.7.3 in a single
|
||||
step. The -z flag to interdiff will even let you feed it patches in gzip or
|
||||
bzip2 compressed form directly without the use of zcat or bzcat or manual
|
||||
decompression.
|
||||
|
||||
Here's how you'd go from 4.7.2 to 4.7.3 in a single step::
|
||||
|
||||
interdiff -z ../patch-4.7.2.gz ../patch-4.7.3.gz | patch -p1
|
||||
|
||||
Although interdiff may save you a step or two you are generally advised to
|
||||
do the additional steps since interdiff can get things wrong in some cases.
|
||||
|
||||
Another alternative is ``ketchup``, which is a python script for automatic
|
||||
downloading and applying of patches (http://www.selenic.com/ketchup/).
|
||||
|
||||
Other nice tools are diffstat, which shows a summary of changes made by a
|
||||
patch; lsdiff, which displays a short listing of affected files in a patch
|
||||
file, along with (optionally) the line numbers of the start of each patch;
|
||||
and grepdiff, which displays a list of the files modified by a patch where
|
||||
the patch contains a given regular expression.
|
||||
|
||||
|
||||
Where can I download the patches?
|
||||
=================================
|
||||
|
||||
The patches are available at http://kernel.org/
|
||||
Most recent patches are linked from the front page, but they also have
|
||||
specific homes.
|
||||
|
||||
The 4.x.y (-stable) and 4.x patches live at
|
||||
|
||||
ftp://ftp.kernel.org/pub/linux/kernel/v4.x/
|
||||
|
||||
The -rc patches live at
|
||||
|
||||
ftp://ftp.kernel.org/pub/linux/kernel/v4.x/testing/
|
||||
|
||||
In place of ``ftp.kernel.org`` you can use ``ftp.cc.kernel.org``, where cc is a
|
||||
country code. This way you'll be downloading from a mirror site that's most
|
||||
likely geographically closer to you, resulting in faster downloads for you,
|
||||
less bandwidth used globally and less load on the main kernel.org servers --
|
||||
these are good things, so do use mirrors when possible.
|
||||
|
||||
|
||||
The 4.x kernels
|
||||
===============
|
||||
|
||||
These are the base stable releases released by Linus. The highest numbered
|
||||
release is the most recent.
|
||||
|
||||
If regressions or other serious flaws are found, then a -stable fix patch
|
||||
will be released (see below) on top of this base. Once a new 4.x base
|
||||
kernel is released, a patch is made available that is a delta between the
|
||||
previous 4.x kernel and the new one.
|
||||
|
||||
To apply a patch moving from 4.6 to 4.7, you'd do the following (note
|
||||
that such patches do **NOT** apply on top of 4.x.y kernels but on top of the
|
||||
base 4.x kernel -- if you need to move from 4.x.y to 4.x+1 you need to
|
||||
first revert the 4.x.y patch).
|
||||
|
||||
Here are some examples::
|
||||
|
||||
# moving from 4.6 to 4.7
|
||||
|
||||
$ cd ~/linux-4.6 # change to kernel source dir
|
||||
$ patch -p1 < ../patch-4.7 # apply the 4.7 patch
|
||||
$ cd ..
|
||||
$ mv linux-4.6 linux-4.7 # rename source dir
|
||||
|
||||
# moving from 4.6.1 to 4.7
|
||||
|
||||
$ cd ~/linux-4.6.1 # change to kernel source dir
|
||||
$ patch -p1 -R < ../patch-4.6.1 # revert the 4.6.1 patch
|
||||
# source dir is now 4.6
|
||||
$ patch -p1 < ../patch-4.7 # apply new 4.7 patch
|
||||
$ cd ..
|
||||
$ mv linux-4.6.1 linux-4.7 # rename source dir
|
||||
|
||||
|
||||
The 4.x.y kernels
|
||||
=================
|
||||
|
||||
Kernels with 3-digit versions are -stable kernels. They contain small(ish)
|
||||
critical fixes for security problems or significant regressions discovered
|
||||
in a given 4.x kernel.
|
||||
|
||||
This is the recommended branch for users who want the most recent stable
|
||||
kernel and are not interested in helping test development/experimental
|
||||
versions.
|
||||
|
||||
If no 4.x.y kernel is available, then the highest numbered 4.x kernel is
|
||||
the current stable kernel.
|
||||
|
||||
.. note::
|
||||
|
||||
The -stable team usually do make incremental patches available as well
|
||||
as patches against the latest mainline release, but I only cover the
|
||||
non-incremental ones below. The incremental ones can be found at
|
||||
ftp://ftp.kernel.org/pub/linux/kernel/v4.x/incr/
|
||||
|
||||
These patches are not incremental, meaning that for example the 4.7.3
|
||||
patch does not apply on top of the 4.7.2 kernel source, but rather on top
|
||||
of the base 4.7 kernel source.
|
||||
|
||||
So, in order to apply the 4.7.3 patch to your existing 4.7.2 kernel
|
||||
source you have to first back out the 4.7.2 patch (so you are left with a
|
||||
base 4.7 kernel source) and then apply the new 4.7.3 patch.
|
||||
|
||||
Here's a small example::
|
||||
|
||||
$ cd ~/linux-4.7.2 # change to the kernel source dir
|
||||
$ patch -p1 -R < ../patch-4.7.2 # revert the 4.7.2 patch
|
||||
$ patch -p1 < ../patch-4.7.3 # apply the new 4.7.3 patch
|
||||
$ cd ..
|
||||
$ mv linux-4.7.2 linux-4.7.3 # rename the kernel source dir
|
||||
|
||||
The -rc kernels
|
||||
===============
|
||||
|
||||
These are release-candidate kernels. These are development kernels released
|
||||
by Linus whenever he deems the current git (the kernel's source management
|
||||
tool) tree to be in a reasonably sane state adequate for testing.
|
||||
|
||||
These kernels are not stable and you should expect occasional breakage if
|
||||
you intend to run them. This is however the most stable of the main
|
||||
development branches and is also what will eventually turn into the next
|
||||
stable kernel, so it is important that it be tested by as many people as
|
||||
possible.
|
||||
|
||||
This is a good branch to run for people who want to help out testing
|
||||
development kernels but do not want to run some of the really experimental
|
||||
stuff (such people should see the sections about -git and -mm kernels below).
|
||||
|
||||
The -rc patches are not incremental, they apply to a base 4.x kernel, just
|
||||
like the 4.x.y patches described above. The kernel version before the -rcN
|
||||
suffix denotes the version of the kernel that this -rc kernel will eventually
|
||||
turn into.
|
||||
|
||||
So, 4.8-rc5 means that this is the fifth release candidate for the 4.8
|
||||
kernel and the patch should be applied on top of the 4.7 kernel source.
|
||||
|
||||
Here are 3 examples of how to apply these patches::
|
||||
|
||||
# first an example of moving from 4.7 to 4.8-rc3
|
||||
|
||||
$ cd ~/linux-4.7 # change to the 4.7 source dir
|
||||
$ patch -p1 < ../patch-4.8-rc3 # apply the 4.8-rc3 patch
|
||||
$ cd ..
|
||||
$ mv linux-4.7 linux-4.8-rc3 # rename the source dir
|
||||
|
||||
# now let's move from 4.8-rc3 to 4.8-rc5
|
||||
|
||||
$ cd ~/linux-4.8-rc3 # change to the 4.8-rc3 dir
|
||||
$ patch -p1 -R < ../patch-4.8-rc3 # revert the 4.8-rc3 patch
|
||||
$ patch -p1 < ../patch-4.8-rc5 # apply the new 4.8-rc5 patch
|
||||
$ cd ..
|
||||
$ mv linux-4.8-rc3 linux-4.8-rc5 # rename the source dir
|
||||
|
||||
# finally let's try and move from 4.7.3 to 4.8-rc5
|
||||
|
||||
$ cd ~/linux-4.7.3 # change to the kernel source dir
|
||||
$ patch -p1 -R < ../patch-4.7.3 # revert the 4.7.3 patch
|
||||
$ patch -p1 < ../patch-4.8-rc5 # apply new 4.8-rc5 patch
|
||||
$ cd ..
|
||||
$ mv linux-4.7.3 linux-4.8-rc5 # rename the kernel source dir
|
||||
|
||||
|
||||
The -git kernels
|
||||
================
|
||||
|
||||
These are daily snapshots of Linus' kernel tree (managed in a git
|
||||
repository, hence the name).
|
||||
|
||||
These patches are usually released daily and represent the current state of
|
||||
Linus's tree. They are more experimental than -rc kernels since they are
|
||||
generated automatically without even a cursory glance to see if they are
|
||||
sane.
|
||||
|
||||
-git patches are not incremental and apply either to a base 4.x kernel or
|
||||
a base 4.x-rc kernel -- you can see which from their name.
|
||||
A patch named 4.7-git1 applies to the 4.7 kernel source and a patch
|
||||
named 4.8-rc3-git2 applies to the source of the 4.8-rc3 kernel.
|
||||
|
||||
Here are some examples of how to apply these patches::
|
||||
|
||||
# moving from 4.7 to 4.7-git1
|
||||
|
||||
$ cd ~/linux-4.7 # change to the kernel source dir
|
||||
$ patch -p1 < ../patch-4.7-git1 # apply the 4.7-git1 patch
|
||||
$ cd ..
|
||||
$ mv linux-4.7 linux-4.7-git1 # rename the kernel source dir
|
||||
|
||||
# moving from 4.7-git1 to 4.8-rc2-git3
|
||||
|
||||
$ cd ~/linux-4.7-git1 # change to the kernel source dir
|
||||
$ patch -p1 -R < ../patch-4.7-git1 # revert the 4.7-git1 patch
|
||||
# we now have a 4.7 kernel
|
||||
$ patch -p1 < ../patch-4.8-rc2 # apply the 4.8-rc2 patch
|
||||
# the kernel is now 4.8-rc2
|
||||
$ patch -p1 < ../patch-4.8-rc2-git3 # apply the 4.8-rc2-git3 patch
|
||||
# the kernel is now 4.8-rc2-git3
|
||||
$ cd ..
|
||||
$ mv linux-4.7-git1 linux-4.8-rc2-git3 # rename source dir
|
||||
|
||||
|
||||
The -mm patches and the linux-next tree
|
||||
=======================================
|
||||
|
||||
The -mm patches are experimental patches released by Andrew Morton.
|
||||
|
||||
In the past, -mm tree were used to also test subsystem patches, but this
|
||||
function is now done via the
|
||||
:ref:`linux-next <https://www.kernel.org/doc/man-pages/linux-next.html>`
|
||||
tree. The Subsystem maintainers push their patches first to linux-next,
|
||||
and, during the merge window, sends them directly to Linus.
|
||||
|
||||
The -mm patches serve as a sort of proving ground for new features and other
|
||||
experimental patches that aren't merged via a subsystem tree.
|
||||
Once such patches has proved its worth in -mm for a while Andrew pushes
|
||||
it on to Linus for inclusion in mainline.
|
||||
|
||||
The linux-next tree is daily updated, and includes the -mm patches.
|
||||
Both are in constant flux and contains many experimental features, a
|
||||
lot of debugging patches not appropriate for mainline etc., and is the most
|
||||
experimental of the branches described in this document.
|
||||
|
||||
These patches are not appropriate for use on systems that are supposed to be
|
||||
stable and they are more risky to run than any of the other branches (make
|
||||
sure you have up-to-date backups -- that goes for any experimental kernel but
|
||||
even more so for -mm patches or using a Kernel from the linux-next tree).
|
||||
|
||||
Testing of -mm patches and linux-next is greatly appreciated since the whole
|
||||
point of those are to weed out regressions, crashes, data corruption bugs,
|
||||
build breakage (and any other bug in general) before changes are merged into
|
||||
the more stable mainline Linus tree.
|
||||
|
||||
But testers of -mm and linux-next should be aware that breakages are
|
||||
more common than in any other tree.
|
||||
|
||||
|
||||
This concludes this list of explanations of the various kernel trees.
|
||||
I hope you are now clear on how to apply the various patches and help testing
|
||||
the kernel.
|
||||
|
||||
Thank you's to Randy Dunlap, Rolf Eike Beer, Linus Torvalds, Bodo Eggert,
|
||||
Johannes Stezenbach, Grant Coady, Pavel Machek and others that I may have
|
||||
forgotten for their reviews and contributions to this document.
|
||||
|
||||
@@ -51,7 +51,7 @@ As an alternative, the boot loader can pass the relevant 'console='
|
||||
option to the kernel via the tagged lists specifying the port, and
|
||||
serial format options as described in
|
||||
|
||||
Documentation/kernel-parameters.txt.
|
||||
Documentation/admin-guide/kernel-parameters.rst.
|
||||
|
||||
|
||||
3. Detect the machine type
|
||||
|
||||
@@ -5,7 +5,8 @@ Introduction
|
||||
------------
|
||||
|
||||
The STMicroelectronics family of Cortex-M based MCUs are supported by the
|
||||
'STM32' platform of ARM Linux. Currently only the STM32F429 is supported.
|
||||
'STM32' platform of ARM Linux. Currently only the STM32F429 (Cortex-M4)
|
||||
and STM32F746 (Cortex-M7) are supported.
|
||||
|
||||
|
||||
Configuration
|
||||
|
||||
34
Documentation/arm/stm32/stm32f746-overview.txt
Normal file
34
Documentation/arm/stm32/stm32f746-overview.txt
Normal file
@@ -0,0 +1,34 @@
|
||||
STM32F746 Overview
|
||||
==================
|
||||
|
||||
Introduction
|
||||
------------
|
||||
The STM32F746 is a Cortex-M7 MCU aimed at various applications.
|
||||
It features:
|
||||
- Cortex-M7 core running up to @216MHz
|
||||
- 1MB internal flash, 320KBytes internal RAM (+4KB of backup SRAM)
|
||||
- FMC controller to connect SDRAM, NOR and NAND memories
|
||||
- Dual mode QSPI
|
||||
- SD/MMC/SDIO support
|
||||
- Ethernet controller
|
||||
- USB OTFG FS & HS controllers
|
||||
- I2C, SPI, CAN busses support
|
||||
- Several 16 & 32 bits general purpose timers
|
||||
- Serial Audio interface
|
||||
- LCD controller
|
||||
- HDMI-CEC
|
||||
- SPDIFRX
|
||||
|
||||
Resources
|
||||
---------
|
||||
Datasheet and reference manual are publicly available on ST website:
|
||||
- http://www.st.com/content/st_com/en/products/microcontrollers/stm32-32-bit-arm-cortex-mcus/stm32f7-series/stm32f7x6/stm32f746ng.html
|
||||
|
||||
Document Author
|
||||
---------------
|
||||
Alexandre Torgue <alexandre.torgue@st.com>
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -1,574 +0,0 @@
|
||||
========================================
|
||||
GENERIC ASSOCIATIVE ARRAY IMPLEMENTATION
|
||||
========================================
|
||||
|
||||
Contents:
|
||||
|
||||
- Overview.
|
||||
|
||||
- The public API.
|
||||
- Edit script.
|
||||
- Operations table.
|
||||
- Manipulation functions.
|
||||
- Access functions.
|
||||
- Index key form.
|
||||
|
||||
- Internal workings.
|
||||
- Basic internal tree layout.
|
||||
- Shortcuts.
|
||||
- Splitting and collapsing nodes.
|
||||
- Non-recursive iteration.
|
||||
- Simultaneous alteration and iteration.
|
||||
|
||||
|
||||
========
|
||||
OVERVIEW
|
||||
========
|
||||
|
||||
This associative array implementation is an object container with the following
|
||||
properties:
|
||||
|
||||
(1) Objects are opaque pointers. The implementation does not care where they
|
||||
point (if anywhere) or what they point to (if anything).
|
||||
|
||||
[!] NOTE: Pointers to objects _must_ be zero in the least significant bit.
|
||||
|
||||
(2) Objects do not need to contain linkage blocks for use by the array. This
|
||||
permits an object to be located in multiple arrays simultaneously.
|
||||
Rather, the array is made up of metadata blocks that point to objects.
|
||||
|
||||
(3) Objects require index keys to locate them within the array.
|
||||
|
||||
(4) Index keys must be unique. Inserting an object with the same key as one
|
||||
already in the array will replace the old object.
|
||||
|
||||
(5) Index keys can be of any length and can be of different lengths.
|
||||
|
||||
(6) Index keys should encode the length early on, before any variation due to
|
||||
length is seen.
|
||||
|
||||
(7) Index keys can include a hash to scatter objects throughout the array.
|
||||
|
||||
(8) The array can iterated over. The objects will not necessarily come out in
|
||||
key order.
|
||||
|
||||
(9) The array can be iterated over whilst it is being modified, provided the
|
||||
RCU readlock is being held by the iterator. Note, however, under these
|
||||
circumstances, some objects may be seen more than once. If this is a
|
||||
problem, the iterator should lock against modification. Objects will not
|
||||
be missed, however, unless deleted.
|
||||
|
||||
(10) Objects in the array can be looked up by means of their index key.
|
||||
|
||||
(11) Objects can be looked up whilst the array is being modified, provided the
|
||||
RCU readlock is being held by the thread doing the look up.
|
||||
|
||||
The implementation uses a tree of 16-pointer nodes internally that are indexed
|
||||
on each level by nibbles from the index key in the same manner as in a radix
|
||||
tree. To improve memory efficiency, shortcuts can be emplaced to skip over
|
||||
what would otherwise be a series of single-occupancy nodes. Further, nodes
|
||||
pack leaf object pointers into spare space in the node rather than making an
|
||||
extra branch until as such time an object needs to be added to a full node.
|
||||
|
||||
|
||||
==============
|
||||
THE PUBLIC API
|
||||
==============
|
||||
|
||||
The public API can be found in <linux/assoc_array.h>. The associative array is
|
||||
rooted on the following structure:
|
||||
|
||||
struct assoc_array {
|
||||
...
|
||||
};
|
||||
|
||||
The code is selected by enabling CONFIG_ASSOCIATIVE_ARRAY.
|
||||
|
||||
|
||||
EDIT SCRIPT
|
||||
-----------
|
||||
|
||||
The insertion and deletion functions produce an 'edit script' that can later be
|
||||
applied to effect the changes without risking ENOMEM. This retains the
|
||||
preallocated metadata blocks that will be installed in the internal tree and
|
||||
keeps track of the metadata blocks that will be removed from the tree when the
|
||||
script is applied.
|
||||
|
||||
This is also used to keep track of dead blocks and dead objects after the
|
||||
script has been applied so that they can be freed later. The freeing is done
|
||||
after an RCU grace period has passed - thus allowing access functions to
|
||||
proceed under the RCU read lock.
|
||||
|
||||
The script appears as outside of the API as a pointer of the type:
|
||||
|
||||
struct assoc_array_edit;
|
||||
|
||||
There are two functions for dealing with the script:
|
||||
|
||||
(1) Apply an edit script.
|
||||
|
||||
void assoc_array_apply_edit(struct assoc_array_edit *edit);
|
||||
|
||||
This will perform the edit functions, interpolating various write barriers
|
||||
to permit accesses under the RCU read lock to continue. The edit script
|
||||
will then be passed to call_rcu() to free it and any dead stuff it points
|
||||
to.
|
||||
|
||||
(2) Cancel an edit script.
|
||||
|
||||
void assoc_array_cancel_edit(struct assoc_array_edit *edit);
|
||||
|
||||
This frees the edit script and all preallocated memory immediately. If
|
||||
this was for insertion, the new object is _not_ released by this function,
|
||||
but must rather be released by the caller.
|
||||
|
||||
These functions are guaranteed not to fail.
|
||||
|
||||
|
||||
OPERATIONS TABLE
|
||||
----------------
|
||||
|
||||
Various functions take a table of operations:
|
||||
|
||||
struct assoc_array_ops {
|
||||
...
|
||||
};
|
||||
|
||||
This points to a number of methods, all of which need to be provided:
|
||||
|
||||
(1) Get a chunk of index key from caller data:
|
||||
|
||||
unsigned long (*get_key_chunk)(const void *index_key, int level);
|
||||
|
||||
This should return a chunk of caller-supplied index key starting at the
|
||||
*bit* position given by the level argument. The level argument will be a
|
||||
multiple of ASSOC_ARRAY_KEY_CHUNK_SIZE and the function should return
|
||||
ASSOC_ARRAY_KEY_CHUNK_SIZE bits. No error is possible.
|
||||
|
||||
|
||||
(2) Get a chunk of an object's index key.
|
||||
|
||||
unsigned long (*get_object_key_chunk)(const void *object, int level);
|
||||
|
||||
As the previous function, but gets its data from an object in the array
|
||||
rather than from a caller-supplied index key.
|
||||
|
||||
|
||||
(3) See if this is the object we're looking for.
|
||||
|
||||
bool (*compare_object)(const void *object, const void *index_key);
|
||||
|
||||
Compare the object against an index key and return true if it matches and
|
||||
false if it doesn't.
|
||||
|
||||
|
||||
(4) Diff the index keys of two objects.
|
||||
|
||||
int (*diff_objects)(const void *object, const void *index_key);
|
||||
|
||||
Return the bit position at which the index key of the specified object
|
||||
differs from the given index key or -1 if they are the same.
|
||||
|
||||
|
||||
(5) Free an object.
|
||||
|
||||
void (*free_object)(void *object);
|
||||
|
||||
Free the specified object. Note that this may be called an RCU grace
|
||||
period after assoc_array_apply_edit() was called, so synchronize_rcu() may
|
||||
be necessary on module unloading.
|
||||
|
||||
|
||||
MANIPULATION FUNCTIONS
|
||||
----------------------
|
||||
|
||||
There are a number of functions for manipulating an associative array:
|
||||
|
||||
(1) Initialise an associative array.
|
||||
|
||||
void assoc_array_init(struct assoc_array *array);
|
||||
|
||||
This initialises the base structure for an associative array. It can't
|
||||
fail.
|
||||
|
||||
|
||||
(2) Insert/replace an object in an associative array.
|
||||
|
||||
struct assoc_array_edit *
|
||||
assoc_array_insert(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops,
|
||||
const void *index_key,
|
||||
void *object);
|
||||
|
||||
This inserts the given object into the array. Note that the least
|
||||
significant bit of the pointer must be zero as it's used to type-mark
|
||||
pointers internally.
|
||||
|
||||
If an object already exists for that key then it will be replaced with the
|
||||
new object and the old one will be freed automatically.
|
||||
|
||||
The index_key argument should hold index key information and is
|
||||
passed to the methods in the ops table when they are called.
|
||||
|
||||
This function makes no alteration to the array itself, but rather returns
|
||||
an edit script that must be applied. -ENOMEM is returned in the case of
|
||||
an out-of-memory error.
|
||||
|
||||
The caller should lock exclusively against other modifiers of the array.
|
||||
|
||||
|
||||
(3) Delete an object from an associative array.
|
||||
|
||||
struct assoc_array_edit *
|
||||
assoc_array_delete(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops,
|
||||
const void *index_key);
|
||||
|
||||
This deletes an object that matches the specified data from the array.
|
||||
|
||||
The index_key argument should hold index key information and is
|
||||
passed to the methods in the ops table when they are called.
|
||||
|
||||
This function makes no alteration to the array itself, but rather returns
|
||||
an edit script that must be applied. -ENOMEM is returned in the case of
|
||||
an out-of-memory error. NULL will be returned if the specified object is
|
||||
not found within the array.
|
||||
|
||||
The caller should lock exclusively against other modifiers of the array.
|
||||
|
||||
|
||||
(4) Delete all objects from an associative array.
|
||||
|
||||
struct assoc_array_edit *
|
||||
assoc_array_clear(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops);
|
||||
|
||||
This deletes all the objects from an associative array and leaves it
|
||||
completely empty.
|
||||
|
||||
This function makes no alteration to the array itself, but rather returns
|
||||
an edit script that must be applied. -ENOMEM is returned in the case of
|
||||
an out-of-memory error.
|
||||
|
||||
The caller should lock exclusively against other modifiers of the array.
|
||||
|
||||
|
||||
(5) Destroy an associative array, deleting all objects.
|
||||
|
||||
void assoc_array_destroy(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops);
|
||||
|
||||
This destroys the contents of the associative array and leaves it
|
||||
completely empty. It is not permitted for another thread to be traversing
|
||||
the array under the RCU read lock at the same time as this function is
|
||||
destroying it as no RCU deferral is performed on memory release -
|
||||
something that would require memory to be allocated.
|
||||
|
||||
The caller should lock exclusively against other modifiers and accessors
|
||||
of the array.
|
||||
|
||||
|
||||
(6) Garbage collect an associative array.
|
||||
|
||||
int assoc_array_gc(struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops,
|
||||
bool (*iterator)(void *object, void *iterator_data),
|
||||
void *iterator_data);
|
||||
|
||||
This iterates over the objects in an associative array and passes each one
|
||||
to iterator(). If iterator() returns true, the object is kept. If it
|
||||
returns false, the object will be freed. If the iterator() function
|
||||
returns true, it must perform any appropriate refcount incrementing on the
|
||||
object before returning.
|
||||
|
||||
The internal tree will be packed down if possible as part of the iteration
|
||||
to reduce the number of nodes in it.
|
||||
|
||||
The iterator_data is passed directly to iterator() and is otherwise
|
||||
ignored by the function.
|
||||
|
||||
The function will return 0 if successful and -ENOMEM if there wasn't
|
||||
enough memory.
|
||||
|
||||
It is possible for other threads to iterate over or search the array under
|
||||
the RCU read lock whilst this function is in progress. The caller should
|
||||
lock exclusively against other modifiers of the array.
|
||||
|
||||
|
||||
ACCESS FUNCTIONS
|
||||
----------------
|
||||
|
||||
There are two functions for accessing an associative array:
|
||||
|
||||
(1) Iterate over all the objects in an associative array.
|
||||
|
||||
int assoc_array_iterate(const struct assoc_array *array,
|
||||
int (*iterator)(const void *object,
|
||||
void *iterator_data),
|
||||
void *iterator_data);
|
||||
|
||||
This passes each object in the array to the iterator callback function.
|
||||
iterator_data is private data for that function.
|
||||
|
||||
This may be used on an array at the same time as the array is being
|
||||
modified, provided the RCU read lock is held. Under such circumstances,
|
||||
it is possible for the iteration function to see some objects twice. If
|
||||
this is a problem, then modification should be locked against. The
|
||||
iteration algorithm should not, however, miss any objects.
|
||||
|
||||
The function will return 0 if no objects were in the array or else it will
|
||||
return the result of the last iterator function called. Iteration stops
|
||||
immediately if any call to the iteration function results in a non-zero
|
||||
return.
|
||||
|
||||
|
||||
(2) Find an object in an associative array.
|
||||
|
||||
void *assoc_array_find(const struct assoc_array *array,
|
||||
const struct assoc_array_ops *ops,
|
||||
const void *index_key);
|
||||
|
||||
This walks through the array's internal tree directly to the object
|
||||
specified by the index key..
|
||||
|
||||
This may be used on an array at the same time as the array is being
|
||||
modified, provided the RCU read lock is held.
|
||||
|
||||
The function will return the object if found (and set *_type to the object
|
||||
type) or will return NULL if the object was not found.
|
||||
|
||||
|
||||
INDEX KEY FORM
|
||||
--------------
|
||||
|
||||
The index key can be of any form, but since the algorithms aren't told how long
|
||||
the key is, it is strongly recommended that the index key includes its length
|
||||
very early on before any variation due to the length would have an effect on
|
||||
comparisons.
|
||||
|
||||
This will cause leaves with different length keys to scatter away from each
|
||||
other - and those with the same length keys to cluster together.
|
||||
|
||||
It is also recommended that the index key begin with a hash of the rest of the
|
||||
key to maximise scattering throughout keyspace.
|
||||
|
||||
The better the scattering, the wider and lower the internal tree will be.
|
||||
|
||||
Poor scattering isn't too much of a problem as there are shortcuts and nodes
|
||||
can contain mixtures of leaves and metadata pointers.
|
||||
|
||||
The index key is read in chunks of machine word. Each chunk is subdivided into
|
||||
one nibble (4 bits) per level, so on a 32-bit CPU this is good for 8 levels and
|
||||
on a 64-bit CPU, 16 levels. Unless the scattering is really poor, it is
|
||||
unlikely that more than one word of any particular index key will have to be
|
||||
used.
|
||||
|
||||
|
||||
=================
|
||||
INTERNAL WORKINGS
|
||||
=================
|
||||
|
||||
The associative array data structure has an internal tree. This tree is
|
||||
constructed of two types of metadata blocks: nodes and shortcuts.
|
||||
|
||||
A node is an array of slots. Each slot can contain one of four things:
|
||||
|
||||
(*) A NULL pointer, indicating that the slot is empty.
|
||||
|
||||
(*) A pointer to an object (a leaf).
|
||||
|
||||
(*) A pointer to a node at the next level.
|
||||
|
||||
(*) A pointer to a shortcut.
|
||||
|
||||
|
||||
BASIC INTERNAL TREE LAYOUT
|
||||
--------------------------
|
||||
|
||||
Ignoring shortcuts for the moment, the nodes form a multilevel tree. The index
|
||||
key space is strictly subdivided by the nodes in the tree and nodes occur on
|
||||
fixed levels. For example:
|
||||
|
||||
Level: 0 1 2 3
|
||||
=============== =============== =============== ===============
|
||||
NODE D
|
||||
NODE B NODE C +------>+---+
|
||||
+------>+---+ +------>+---+ | | 0 |
|
||||
NODE A | | 0 | | | 0 | | +---+
|
||||
+---+ | +---+ | +---+ | : :
|
||||
| 0 | | : : | : : | +---+
|
||||
+---+ | +---+ | +---+ | | f |
|
||||
| 1 |---+ | 3 |---+ | 7 |---+ +---+
|
||||
+---+ +---+ +---+
|
||||
: : : : | 8 |---+
|
||||
+---+ +---+ +---+ | NODE E
|
||||
| e |---+ | f | : : +------>+---+
|
||||
+---+ | +---+ +---+ | 0 |
|
||||
| f | | | f | +---+
|
||||
+---+ | +---+ : :
|
||||
| NODE F +---+
|
||||
+------>+---+ | f |
|
||||
| 0 | NODE G +---+
|
||||
+---+ +------>+---+
|
||||
: : | | 0 |
|
||||
+---+ | +---+
|
||||
| 6 |---+ : :
|
||||
+---+ +---+
|
||||
: : | f |
|
||||
+---+ +---+
|
||||
| f |
|
||||
+---+
|
||||
|
||||
In the above example, there are 7 nodes (A-G), each with 16 slots (0-f).
|
||||
Assuming no other meta data nodes in the tree, the key space is divided thusly:
|
||||
|
||||
KEY PREFIX NODE
|
||||
========== ====
|
||||
137* D
|
||||
138* E
|
||||
13[0-69-f]* C
|
||||
1[0-24-f]* B
|
||||
e6* G
|
||||
e[0-57-f]* F
|
||||
[02-df]* A
|
||||
|
||||
So, for instance, keys with the following example index keys will be found in
|
||||
the appropriate nodes:
|
||||
|
||||
INDEX KEY PREFIX NODE
|
||||
=============== ======= ====
|
||||
13694892892489 13 C
|
||||
13795289025897 137 D
|
||||
13889dde88793 138 E
|
||||
138bbb89003093 138 E
|
||||
1394879524789 12 C
|
||||
1458952489 1 B
|
||||
9431809de993ba - A
|
||||
b4542910809cd - A
|
||||
e5284310def98 e F
|
||||
e68428974237 e6 G
|
||||
e7fffcbd443 e F
|
||||
f3842239082 - A
|
||||
|
||||
To save memory, if a node can hold all the leaves in its portion of keyspace,
|
||||
then the node will have all those leaves in it and will not have any metadata
|
||||
pointers - even if some of those leaves would like to be in the same slot.
|
||||
|
||||
A node can contain a heterogeneous mix of leaves and metadata pointers.
|
||||
Metadata pointers must be in the slots that match their subdivisions of key
|
||||
space. The leaves can be in any slot not occupied by a metadata pointer. It
|
||||
is guaranteed that none of the leaves in a node will match a slot occupied by a
|
||||
metadata pointer. If the metadata pointer is there, any leaf whose key matches
|
||||
the metadata key prefix must be in the subtree that the metadata pointer points
|
||||
to.
|
||||
|
||||
In the above example list of index keys, node A will contain:
|
||||
|
||||
SLOT CONTENT INDEX KEY (PREFIX)
|
||||
==== =============== ==================
|
||||
1 PTR TO NODE B 1*
|
||||
any LEAF 9431809de993ba
|
||||
any LEAF b4542910809cd
|
||||
e PTR TO NODE F e*
|
||||
any LEAF f3842239082
|
||||
|
||||
and node B:
|
||||
|
||||
3 PTR TO NODE C 13*
|
||||
any LEAF 1458952489
|
||||
|
||||
|
||||
SHORTCUTS
|
||||
---------
|
||||
|
||||
Shortcuts are metadata records that jump over a piece of keyspace. A shortcut
|
||||
is a replacement for a series of single-occupancy nodes ascending through the
|
||||
levels. Shortcuts exist to save memory and to speed up traversal.
|
||||
|
||||
It is possible for the root of the tree to be a shortcut - say, for example,
|
||||
the tree contains at least 17 nodes all with key prefix '1111'. The insertion
|
||||
algorithm will insert a shortcut to skip over the '1111' keyspace in a single
|
||||
bound and get to the fourth level where these actually become different.
|
||||
|
||||
|
||||
SPLITTING AND COLLAPSING NODES
|
||||
------------------------------
|
||||
|
||||
Each node has a maximum capacity of 16 leaves and metadata pointers. If the
|
||||
insertion algorithm finds that it is trying to insert a 17th object into a
|
||||
node, that node will be split such that at least two leaves that have a common
|
||||
key segment at that level end up in a separate node rooted on that slot for
|
||||
that common key segment.
|
||||
|
||||
If the leaves in a full node and the leaf that is being inserted are
|
||||
sufficiently similar, then a shortcut will be inserted into the tree.
|
||||
|
||||
When the number of objects in the subtree rooted at a node falls to 16 or
|
||||
fewer, then the subtree will be collapsed down to a single node - and this will
|
||||
ripple towards the root if possible.
|
||||
|
||||
|
||||
NON-RECURSIVE ITERATION
|
||||
-----------------------
|
||||
|
||||
Each node and shortcut contains a back pointer to its parent and the number of
|
||||
slot in that parent that points to it. None-recursive iteration uses these to
|
||||
proceed rootwards through the tree, going to the parent node, slot N + 1 to
|
||||
make sure progress is made without the need for a stack.
|
||||
|
||||
The backpointers, however, make simultaneous alteration and iteration tricky.
|
||||
|
||||
|
||||
SIMULTANEOUS ALTERATION AND ITERATION
|
||||
-------------------------------------
|
||||
|
||||
There are a number of cases to consider:
|
||||
|
||||
(1) Simple insert/replace. This involves simply replacing a NULL or old
|
||||
matching leaf pointer with the pointer to the new leaf after a barrier.
|
||||
The metadata blocks don't change otherwise. An old leaf won't be freed
|
||||
until after the RCU grace period.
|
||||
|
||||
(2) Simple delete. This involves just clearing an old matching leaf. The
|
||||
metadata blocks don't change otherwise. The old leaf won't be freed until
|
||||
after the RCU grace period.
|
||||
|
||||
(3) Insertion replacing part of a subtree that we haven't yet entered. This
|
||||
may involve replacement of part of that subtree - but that won't affect
|
||||
the iteration as we won't have reached the pointer to it yet and the
|
||||
ancestry blocks are not replaced (the layout of those does not change).
|
||||
|
||||
(4) Insertion replacing nodes that we're actively processing. This isn't a
|
||||
problem as we've passed the anchoring pointer and won't switch onto the
|
||||
new layout until we follow the back pointers - at which point we've
|
||||
already examined the leaves in the replaced node (we iterate over all the
|
||||
leaves in a node before following any of its metadata pointers).
|
||||
|
||||
We might, however, re-see some leaves that have been split out into a new
|
||||
branch that's in a slot further along than we were at.
|
||||
|
||||
(5) Insertion replacing nodes that we're processing a dependent branch of.
|
||||
This won't affect us until we follow the back pointers. Similar to (4).
|
||||
|
||||
(6) Deletion collapsing a branch under us. This doesn't affect us because the
|
||||
back pointers will get us back to the parent of the new node before we
|
||||
could see the new node. The entire collapsed subtree is thrown away
|
||||
unchanged - and will still be rooted on the same slot, so we shouldn't
|
||||
process it a second time as we'll go back to slot + 1.
|
||||
|
||||
Note:
|
||||
|
||||
(*) Under some circumstances, we need to simultaneously change the parent
|
||||
pointer and the parent slot pointer on a node (say, for example, we
|
||||
inserted another node before it and moved it up a level). We cannot do
|
||||
this without locking against a read - so we have to replace that node too.
|
||||
|
||||
However, when we're changing a shortcut into a node this isn't a problem
|
||||
as shortcuts only have one slot and so the parent slot number isn't used
|
||||
when traversing backwards over one. This means that it's okay to change
|
||||
the slot number first - provided suitable barriers are used to make sure
|
||||
the parent slot number is read after the back pointer.
|
||||
|
||||
Obsolete blocks and leaves are freed up after an RCU grace period has passed,
|
||||
so as long as anyone doing walking or iteration holds the RCU read lock, the
|
||||
old superstructure should not go away on them.
|
||||
@@ -1,640 +0,0 @@
|
||||
Semantics and Behavior of Atomic and
|
||||
Bitmask Operations
|
||||
|
||||
David S. Miller
|
||||
|
||||
This document is intended to serve as a guide to Linux port
|
||||
maintainers on how to implement atomic counter, bitops, and spinlock
|
||||
interfaces properly.
|
||||
|
||||
The atomic_t type should be defined as a signed integer and
|
||||
the atomic_long_t type as a signed long integer. Also, they should
|
||||
be made opaque such that any kind of cast to a normal C integer type
|
||||
will fail. Something like the following should suffice:
|
||||
|
||||
typedef struct { int counter; } atomic_t;
|
||||
typedef struct { long counter; } atomic_long_t;
|
||||
|
||||
Historically, counter has been declared volatile. This is now discouraged.
|
||||
See Documentation/volatile-considered-harmful.txt for the complete rationale.
|
||||
|
||||
local_t is very similar to atomic_t. If the counter is per CPU and only
|
||||
updated by one CPU, local_t is probably more appropriate. Please see
|
||||
Documentation/local_ops.txt for the semantics of local_t.
|
||||
|
||||
The first operations to implement for atomic_t's are the initializers and
|
||||
plain reads.
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
#define atomic_set(v, i) ((v)->counter = (i))
|
||||
|
||||
The first macro is used in definitions, such as:
|
||||
|
||||
static atomic_t my_counter = ATOMIC_INIT(1);
|
||||
|
||||
The initializer is atomic in that the return values of the atomic operations
|
||||
are guaranteed to be correct reflecting the initialized value if the
|
||||
initializer is used before runtime. If the initializer is used at runtime, a
|
||||
proper implicit or explicit read memory barrier is needed before reading the
|
||||
value with atomic_read from another thread.
|
||||
|
||||
As with all of the atomic_ interfaces, replace the leading "atomic_"
|
||||
with "atomic_long_" to operate on atomic_long_t.
|
||||
|
||||
The second interface can be used at runtime, as in:
|
||||
|
||||
struct foo { atomic_t counter; };
|
||||
...
|
||||
|
||||
struct foo *k;
|
||||
|
||||
k = kmalloc(sizeof(*k), GFP_KERNEL);
|
||||
if (!k)
|
||||
return -ENOMEM;
|
||||
atomic_set(&k->counter, 0);
|
||||
|
||||
The setting is atomic in that the return values of the atomic operations by
|
||||
all threads are guaranteed to be correct reflecting either the value that has
|
||||
been set with this operation or set with another operation. A proper implicit
|
||||
or explicit memory barrier is needed before the value set with the operation
|
||||
is guaranteed to be readable with atomic_read from another thread.
|
||||
|
||||
Next, we have:
|
||||
|
||||
#define atomic_read(v) ((v)->counter)
|
||||
|
||||
which simply reads the counter value currently visible to the calling thread.
|
||||
The read is atomic in that the return value is guaranteed to be one of the
|
||||
values initialized or modified with the interface operations if a proper
|
||||
implicit or explicit memory barrier is used after possible runtime
|
||||
initialization by any other thread and the value is modified only with the
|
||||
interface operations. atomic_read does not guarantee that the runtime
|
||||
initialization by any other thread is visible yet, so the user of the
|
||||
interface must take care of that with a proper implicit or explicit memory
|
||||
barrier.
|
||||
|
||||
*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***
|
||||
|
||||
Some architectures may choose to use the volatile keyword, barriers, or inline
|
||||
assembly to guarantee some degree of immediacy for atomic_read() and
|
||||
atomic_set(). This is not uniformly guaranteed, and may change in the future,
|
||||
so all users of atomic_t should treat atomic_read() and atomic_set() as simple
|
||||
C statements that may be reordered or optimized away entirely by the compiler
|
||||
or processor, and explicitly invoke the appropriate compiler and/or memory
|
||||
barrier for each use case. Failure to do so will result in code that may
|
||||
suddenly break when used with different architectures or compiler
|
||||
optimizations, or even changes in unrelated code which changes how the
|
||||
compiler optimizes the section accessing atomic_t variables.
|
||||
|
||||
*** YOU HAVE BEEN WARNED! ***
|
||||
|
||||
Properly aligned pointers, longs, ints, and chars (and unsigned
|
||||
equivalents) may be atomically loaded from and stored to in the same
|
||||
sense as described for atomic_read() and atomic_set(). The ACCESS_ONCE()
|
||||
macro should be used to prevent the compiler from using optimizations
|
||||
that might otherwise optimize accesses out of existence on the one hand,
|
||||
or that might create unsolicited accesses on the other.
|
||||
|
||||
For example consider the following code:
|
||||
|
||||
while (a > 0)
|
||||
do_something();
|
||||
|
||||
If the compiler can prove that do_something() does not store to the
|
||||
variable a, then the compiler is within its rights transforming this to
|
||||
the following:
|
||||
|
||||
tmp = a;
|
||||
if (a > 0)
|
||||
for (;;)
|
||||
do_something();
|
||||
|
||||
If you don't want the compiler to do this (and you probably don't), then
|
||||
you should use something like the following:
|
||||
|
||||
while (ACCESS_ONCE(a) < 0)
|
||||
do_something();
|
||||
|
||||
Alternatively, you could place a barrier() call in the loop.
|
||||
|
||||
For another example, consider the following code:
|
||||
|
||||
tmp_a = a;
|
||||
do_something_with(tmp_a);
|
||||
do_something_else_with(tmp_a);
|
||||
|
||||
If the compiler can prove that do_something_with() does not store to the
|
||||
variable a, then the compiler is within its rights to manufacture an
|
||||
additional load as follows:
|
||||
|
||||
tmp_a = a;
|
||||
do_something_with(tmp_a);
|
||||
tmp_a = a;
|
||||
do_something_else_with(tmp_a);
|
||||
|
||||
This could fatally confuse your code if it expected the same value
|
||||
to be passed to do_something_with() and do_something_else_with().
|
||||
|
||||
The compiler would be likely to manufacture this additional load if
|
||||
do_something_with() was an inline function that made very heavy use
|
||||
of registers: reloading from variable a could save a flush to the
|
||||
stack and later reload. To prevent the compiler from attacking your
|
||||
code in this manner, write the following:
|
||||
|
||||
tmp_a = ACCESS_ONCE(a);
|
||||
do_something_with(tmp_a);
|
||||
do_something_else_with(tmp_a);
|
||||
|
||||
For a final example, consider the following code, assuming that the
|
||||
variable a is set at boot time before the second CPU is brought online
|
||||
and never changed later, so that memory barriers are not needed:
|
||||
|
||||
if (a)
|
||||
b = 9;
|
||||
else
|
||||
b = 42;
|
||||
|
||||
The compiler is within its rights to manufacture an additional store
|
||||
by transforming the above code into the following:
|
||||
|
||||
b = 42;
|
||||
if (a)
|
||||
b = 9;
|
||||
|
||||
This could come as a fatal surprise to other code running concurrently
|
||||
that expected b to never have the value 42 if a was zero. To prevent
|
||||
the compiler from doing this, write something like:
|
||||
|
||||
if (a)
|
||||
ACCESS_ONCE(b) = 9;
|
||||
else
|
||||
ACCESS_ONCE(b) = 42;
|
||||
|
||||
Don't even -think- about doing this without proper use of memory barriers,
|
||||
locks, or atomic operations if variable a can change at runtime!
|
||||
|
||||
*** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! ***
|
||||
|
||||
Now, we move onto the atomic operation interfaces typically implemented with
|
||||
the help of assembly code.
|
||||
|
||||
void atomic_add(int i, atomic_t *v);
|
||||
void atomic_sub(int i, atomic_t *v);
|
||||
void atomic_inc(atomic_t *v);
|
||||
void atomic_dec(atomic_t *v);
|
||||
|
||||
These four routines add and subtract integral values to/from the given
|
||||
atomic_t value. The first two routines pass explicit integers by
|
||||
which to make the adjustment, whereas the latter two use an implicit
|
||||
adjustment value of "1".
|
||||
|
||||
One very important aspect of these two routines is that they DO NOT
|
||||
require any explicit memory barriers. They need only perform the
|
||||
atomic_t counter update in an SMP safe manner.
|
||||
|
||||
Next, we have:
|
||||
|
||||
int atomic_inc_return(atomic_t *v);
|
||||
int atomic_dec_return(atomic_t *v);
|
||||
|
||||
These routines add 1 and subtract 1, respectively, from the given
|
||||
atomic_t and return the new counter value after the operation is
|
||||
performed.
|
||||
|
||||
Unlike the above routines, it is required that these primitives
|
||||
include explicit memory barriers that are performed before and after
|
||||
the operation. It must be done such that all memory operations before
|
||||
and after the atomic operation calls are strongly ordered with respect
|
||||
to the atomic operation itself.
|
||||
|
||||
For example, it should behave as if a smp_mb() call existed both
|
||||
before and after the atomic operation.
|
||||
|
||||
If the atomic instructions used in an implementation provide explicit
|
||||
memory barrier semantics which satisfy the above requirements, that is
|
||||
fine as well.
|
||||
|
||||
Let's move on:
|
||||
|
||||
int atomic_add_return(int i, atomic_t *v);
|
||||
int atomic_sub_return(int i, atomic_t *v);
|
||||
|
||||
These behave just like atomic_{inc,dec}_return() except that an
|
||||
explicit counter adjustment is given instead of the implicit "1".
|
||||
This means that like atomic_{inc,dec}_return(), the memory barrier
|
||||
semantics are required.
|
||||
|
||||
Next:
|
||||
|
||||
int atomic_inc_and_test(atomic_t *v);
|
||||
int atomic_dec_and_test(atomic_t *v);
|
||||
|
||||
These two routines increment and decrement by 1, respectively, the
|
||||
given atomic counter. They return a boolean indicating whether the
|
||||
resulting counter value was zero or not.
|
||||
|
||||
Again, these primitives provide explicit memory barrier semantics around
|
||||
the atomic operation.
|
||||
|
||||
int atomic_sub_and_test(int i, atomic_t *v);
|
||||
|
||||
This is identical to atomic_dec_and_test() except that an explicit
|
||||
decrement is given instead of the implicit "1". This primitive must
|
||||
provide explicit memory barrier semantics around the operation.
|
||||
|
||||
int atomic_add_negative(int i, atomic_t *v);
|
||||
|
||||
The given increment is added to the given atomic counter value. A boolean
|
||||
is return which indicates whether the resulting counter value is negative.
|
||||
This primitive must provide explicit memory barrier semantics around
|
||||
the operation.
|
||||
|
||||
Then:
|
||||
|
||||
int atomic_xchg(atomic_t *v, int new);
|
||||
|
||||
This performs an atomic exchange operation on the atomic variable v, setting
|
||||
the given new value. It returns the old value that the atomic variable v had
|
||||
just before the operation.
|
||||
|
||||
atomic_xchg must provide explicit memory barriers around the operation.
|
||||
|
||||
int atomic_cmpxchg(atomic_t *v, int old, int new);
|
||||
|
||||
This performs an atomic compare exchange operation on the atomic value v,
|
||||
with the given old and new values. Like all atomic_xxx operations,
|
||||
atomic_cmpxchg will only satisfy its atomicity semantics as long as all
|
||||
other accesses of *v are performed through atomic_xxx operations.
|
||||
|
||||
atomic_cmpxchg must provide explicit memory barriers around the operation,
|
||||
although if the comparison fails then no memory ordering guarantees are
|
||||
required.
|
||||
|
||||
The semantics for atomic_cmpxchg are the same as those defined for 'cas'
|
||||
below.
|
||||
|
||||
Finally:
|
||||
|
||||
int atomic_add_unless(atomic_t *v, int a, int u);
|
||||
|
||||
If the atomic value v is not equal to u, this function adds a to v, and
|
||||
returns non zero. If v is equal to u then it returns zero. This is done as
|
||||
an atomic operation.
|
||||
|
||||
atomic_add_unless must provide explicit memory barriers around the
|
||||
operation unless it fails (returns 0).
|
||||
|
||||
atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)
|
||||
|
||||
|
||||
If a caller requires memory barrier semantics around an atomic_t
|
||||
operation which does not return a value, a set of interfaces are
|
||||
defined which accomplish this:
|
||||
|
||||
void smp_mb__before_atomic(void);
|
||||
void smp_mb__after_atomic(void);
|
||||
|
||||
For example, smp_mb__before_atomic() can be used like so:
|
||||
|
||||
obj->dead = 1;
|
||||
smp_mb__before_atomic();
|
||||
atomic_dec(&obj->ref_count);
|
||||
|
||||
It makes sure that all memory operations preceding the atomic_dec()
|
||||
call are strongly ordered with respect to the atomic counter
|
||||
operation. In the above example, it guarantees that the assignment of
|
||||
"1" to obj->dead will be globally visible to other cpus before the
|
||||
atomic counter decrement.
|
||||
|
||||
Without the explicit smp_mb__before_atomic() call, the
|
||||
implementation could legally allow the atomic counter update visible
|
||||
to other cpus before the "obj->dead = 1;" assignment.
|
||||
|
||||
A missing memory barrier in the cases where they are required by the
|
||||
atomic_t implementation above can have disastrous results. Here is
|
||||
an example, which follows a pattern occurring frequently in the Linux
|
||||
kernel. It is the use of atomic counters to implement reference
|
||||
counting, and it works such that once the counter falls to zero it can
|
||||
be guaranteed that no other entity can be accessing the object:
|
||||
|
||||
static void obj_list_add(struct obj *obj, struct list_head *head)
|
||||
{
|
||||
obj->active = 1;
|
||||
list_add(&obj->list, head);
|
||||
}
|
||||
|
||||
static void obj_list_del(struct obj *obj)
|
||||
{
|
||||
list_del(&obj->list);
|
||||
obj->active = 0;
|
||||
}
|
||||
|
||||
static void obj_destroy(struct obj *obj)
|
||||
{
|
||||
BUG_ON(obj->active);
|
||||
kfree(obj);
|
||||
}
|
||||
|
||||
struct obj *obj_list_peek(struct list_head *head)
|
||||
{
|
||||
if (!list_empty(head)) {
|
||||
struct obj *obj;
|
||||
|
||||
obj = list_entry(head->next, struct obj, list);
|
||||
atomic_inc(&obj->refcnt);
|
||||
return obj;
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
void obj_poke(void)
|
||||
{
|
||||
struct obj *obj;
|
||||
|
||||
spin_lock(&global_list_lock);
|
||||
obj = obj_list_peek(&global_list);
|
||||
spin_unlock(&global_list_lock);
|
||||
|
||||
if (obj) {
|
||||
obj->ops->poke(obj);
|
||||
if (atomic_dec_and_test(&obj->refcnt))
|
||||
obj_destroy(obj);
|
||||
}
|
||||
}
|
||||
|
||||
void obj_timeout(struct obj *obj)
|
||||
{
|
||||
spin_lock(&global_list_lock);
|
||||
obj_list_del(obj);
|
||||
spin_unlock(&global_list_lock);
|
||||
|
||||
if (atomic_dec_and_test(&obj->refcnt))
|
||||
obj_destroy(obj);
|
||||
}
|
||||
|
||||
(This is a simplification of the ARP queue management in the
|
||||
generic neighbour discover code of the networking. Olaf Kirch
|
||||
found a bug wrt. memory barriers in kfree_skb() that exposed
|
||||
the atomic_t memory barrier requirements quite clearly.)
|
||||
|
||||
Given the above scheme, it must be the case that the obj->active
|
||||
update done by the obj list deletion be visible to other processors
|
||||
before the atomic counter decrement is performed.
|
||||
|
||||
Otherwise, the counter could fall to zero, yet obj->active would still
|
||||
be set, thus triggering the assertion in obj_destroy(). The error
|
||||
sequence looks like this:
|
||||
|
||||
cpu 0 cpu 1
|
||||
obj_poke() obj_timeout()
|
||||
obj = obj_list_peek();
|
||||
... gains ref to obj, refcnt=2
|
||||
obj_list_del(obj);
|
||||
obj->active = 0 ...
|
||||
... visibility delayed ...
|
||||
atomic_dec_and_test()
|
||||
... refcnt drops to 1 ...
|
||||
atomic_dec_and_test()
|
||||
... refcount drops to 0 ...
|
||||
obj_destroy()
|
||||
BUG() triggers since obj->active
|
||||
still seen as one
|
||||
obj->active update visibility occurs
|
||||
|
||||
With the memory barrier semantics required of the atomic_t operations
|
||||
which return values, the above sequence of memory visibility can never
|
||||
happen. Specifically, in the above case the atomic_dec_and_test()
|
||||
counter decrement would not become globally visible until the
|
||||
obj->active update does.
|
||||
|
||||
As a historical note, 32-bit Sparc used to only allow usage of
|
||||
24-bits of its atomic_t type. This was because it used 8 bits
|
||||
as a spinlock for SMP safety. Sparc32 lacked a "compare and swap"
|
||||
type instruction. However, 32-bit Sparc has since been moved over
|
||||
to a "hash table of spinlocks" scheme, that allows the full 32-bit
|
||||
counter to be realized. Essentially, an array of spinlocks are
|
||||
indexed into based upon the address of the atomic_t being operated
|
||||
on, and that lock protects the atomic operation. Parisc uses the
|
||||
same scheme.
|
||||
|
||||
Another note is that the atomic_t operations returning values are
|
||||
extremely slow on an old 386.
|
||||
|
||||
We will now cover the atomic bitmask operations. You will find that
|
||||
their SMP and memory barrier semantics are similar in shape and scope
|
||||
to the atomic_t ops above.
|
||||
|
||||
Native atomic bit operations are defined to operate on objects aligned
|
||||
to the size of an "unsigned long" C data type, and are least of that
|
||||
size. The endianness of the bits within each "unsigned long" are the
|
||||
native endianness of the cpu.
|
||||
|
||||
void set_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
void clear_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
void change_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
|
||||
These routines set, clear, and change, respectively, the bit number
|
||||
indicated by "nr" on the bit mask pointed to by "ADDR".
|
||||
|
||||
They must execute atomically, yet there are no implicit memory barrier
|
||||
semantics required of these interfaces.
|
||||
|
||||
int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
|
||||
Like the above, except that these routines return a boolean which
|
||||
indicates whether the changed bit was set _BEFORE_ the atomic bit
|
||||
operation.
|
||||
|
||||
WARNING! It is incredibly important that the value be a boolean,
|
||||
ie. "0" or "1". Do not try to be fancy and save a few instructions by
|
||||
declaring the above to return "long" and just returning something like
|
||||
"old_val & mask" because that will not work.
|
||||
|
||||
For one thing, this return value gets truncated to int in many code
|
||||
paths using these interfaces, so on 64-bit if the bit is set in the
|
||||
upper 32-bits then testers will never see that.
|
||||
|
||||
One great example of where this problem crops up are the thread_info
|
||||
flag operations. Routines such as test_and_set_ti_thread_flag() chop
|
||||
the return value into an int. There are other places where things
|
||||
like this occur as well.
|
||||
|
||||
These routines, like the atomic_t counter operations returning values,
|
||||
must provide explicit memory barrier semantics around their execution.
|
||||
All memory operations before the atomic bit operation call must be
|
||||
made visible globally before the atomic bit operation is made visible.
|
||||
Likewise, the atomic bit operation must be visible globally before any
|
||||
subsequent memory operation is made visible. For example:
|
||||
|
||||
obj->dead = 1;
|
||||
if (test_and_set_bit(0, &obj->flags))
|
||||
/* ... */;
|
||||
obj->killed = 1;
|
||||
|
||||
The implementation of test_and_set_bit() must guarantee that
|
||||
"obj->dead = 1;" is visible to cpus before the atomic memory operation
|
||||
done by test_and_set_bit() becomes visible. Likewise, the atomic
|
||||
memory operation done by test_and_set_bit() must become visible before
|
||||
"obj->killed = 1;" is visible.
|
||||
|
||||
Finally there is the basic operation:
|
||||
|
||||
int test_bit(unsigned long nr, __const__ volatile unsigned long *addr);
|
||||
|
||||
Which returns a boolean indicating if bit "nr" is set in the bitmask
|
||||
pointed to by "addr".
|
||||
|
||||
If explicit memory barriers are required around {set,clear}_bit() (which do
|
||||
not return a value, and thus does not need to provide memory barrier
|
||||
semantics), two interfaces are provided:
|
||||
|
||||
void smp_mb__before_atomic(void);
|
||||
void smp_mb__after_atomic(void);
|
||||
|
||||
They are used as follows, and are akin to their atomic_t operation
|
||||
brothers:
|
||||
|
||||
/* All memory operations before this call will
|
||||
* be globally visible before the clear_bit().
|
||||
*/
|
||||
smp_mb__before_atomic();
|
||||
clear_bit( ... );
|
||||
|
||||
/* The clear_bit() will be visible before all
|
||||
* subsequent memory operations.
|
||||
*/
|
||||
smp_mb__after_atomic();
|
||||
|
||||
There are two special bitops with lock barrier semantics (acquire/release,
|
||||
same as spinlocks). These operate in the same way as their non-_lock/unlock
|
||||
postfixed variants, except that they are to provide acquire/release semantics,
|
||||
respectively. This means they can be used for bit_spin_trylock and
|
||||
bit_spin_unlock type operations without specifying any more barriers.
|
||||
|
||||
int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
|
||||
void clear_bit_unlock(unsigned long nr, unsigned long *addr);
|
||||
void __clear_bit_unlock(unsigned long nr, unsigned long *addr);
|
||||
|
||||
The __clear_bit_unlock version is non-atomic, however it still implements
|
||||
unlock barrier semantics. This can be useful if the lock itself is protecting
|
||||
the other bits in the word.
|
||||
|
||||
Finally, there are non-atomic versions of the bitmask operations
|
||||
provided. They are used in contexts where some other higher-level SMP
|
||||
locking scheme is being used to protect the bitmask, and thus less
|
||||
expensive non-atomic operations may be used in the implementation.
|
||||
They have names similar to the above bitmask operation interfaces,
|
||||
except that two underscores are prefixed to the interface name.
|
||||
|
||||
void __set_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
void __clear_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
void __change_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
|
||||
These non-atomic variants also do not require any special memory
|
||||
barrier semantics.
|
||||
|
||||
The routines xchg() and cmpxchg() must provide the same exact
|
||||
memory-barrier semantics as the atomic and bit operations returning
|
||||
values.
|
||||
|
||||
Note: If someone wants to use xchg(), cmpxchg() and their variants,
|
||||
linux/atomic.h should be included rather than asm/cmpxchg.h, unless
|
||||
the code is in arch/* and can take care of itself.
|
||||
|
||||
Spinlocks and rwlocks have memory barrier expectations as well.
|
||||
The rule to follow is simple:
|
||||
|
||||
1) When acquiring a lock, the implementation must make it globally
|
||||
visible before any subsequent memory operation.
|
||||
|
||||
2) When releasing a lock, the implementation must make it such that
|
||||
all previous memory operations are globally visible before the
|
||||
lock release.
|
||||
|
||||
Which finally brings us to _atomic_dec_and_lock(). There is an
|
||||
architecture-neutral version implemented in lib/dec_and_lock.c,
|
||||
but most platforms will wish to optimize this in assembler.
|
||||
|
||||
int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);
|
||||
|
||||
Atomically decrement the given counter, and if will drop to zero
|
||||
atomically acquire the given spinlock and perform the decrement
|
||||
of the counter to zero. If it does not drop to zero, do nothing
|
||||
with the spinlock.
|
||||
|
||||
It is actually pretty simple to get the memory barrier correct.
|
||||
Simply satisfy the spinlock grab requirements, which is make
|
||||
sure the spinlock operation is globally visible before any
|
||||
subsequent memory operation.
|
||||
|
||||
We can demonstrate this operation more clearly if we define
|
||||
an abstract atomic operation:
|
||||
|
||||
long cas(long *mem, long old, long new);
|
||||
|
||||
"cas" stands for "compare and swap". It atomically:
|
||||
|
||||
1) Compares "old" with the value currently at "mem".
|
||||
2) If they are equal, "new" is written to "mem".
|
||||
3) Regardless, the current value at "mem" is returned.
|
||||
|
||||
As an example usage, here is what an atomic counter update
|
||||
might look like:
|
||||
|
||||
void example_atomic_inc(long *counter)
|
||||
{
|
||||
long old, new, ret;
|
||||
|
||||
while (1) {
|
||||
old = *counter;
|
||||
new = old + 1;
|
||||
|
||||
ret = cas(counter, old, new);
|
||||
if (ret == old)
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
Let's use cas() in order to build a pseudo-C atomic_dec_and_lock():
|
||||
|
||||
int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
|
||||
{
|
||||
long old, new, ret;
|
||||
int went_to_zero;
|
||||
|
||||
went_to_zero = 0;
|
||||
while (1) {
|
||||
old = atomic_read(atomic);
|
||||
new = old - 1;
|
||||
if (new == 0) {
|
||||
went_to_zero = 1;
|
||||
spin_lock(lock);
|
||||
}
|
||||
ret = cas(atomic, old, new);
|
||||
if (ret == old)
|
||||
break;
|
||||
if (went_to_zero) {
|
||||
spin_unlock(lock);
|
||||
went_to_zero = 0;
|
||||
}
|
||||
}
|
||||
|
||||
return went_to_zero;
|
||||
}
|
||||
|
||||
Now, as far as memory barriers go, as long as spin_lock()
|
||||
strictly orders all subsequent memory operations (including
|
||||
the cas()) with respect to itself, things will be fine.
|
||||
|
||||
Said another way, _atomic_dec_and_lock() must guarantee that
|
||||
a counter dropping to zero is never made visible before the
|
||||
spinlock being acquired.
|
||||
|
||||
Note that this also means that for the case where the counter
|
||||
is not dropping to zero, there are no memory ordering
|
||||
requirements.
|
||||
@@ -1,45 +0,0 @@
|
||||
March 2008
|
||||
Jan-Simon Moeller, dl9pf@gmx.de
|
||||
|
||||
|
||||
How to deal with bad memory e.g. reported by memtest86+ ?
|
||||
#########################################################
|
||||
|
||||
There are three possibilities I know of:
|
||||
|
||||
1) Reinsert/swap the memory modules
|
||||
|
||||
2) Buy new modules (best!) or try to exchange the memory
|
||||
if you have spare-parts
|
||||
|
||||
3) Use BadRAM or memmap
|
||||
|
||||
This Howto is about number 3) .
|
||||
|
||||
|
||||
BadRAM
|
||||
######
|
||||
BadRAM is the actively developed and available as kernel-patch
|
||||
here: http://rick.vanrein.org/linux/badram/
|
||||
|
||||
For more details see the BadRAM documentation.
|
||||
|
||||
memmap
|
||||
######
|
||||
|
||||
memmap is already in the kernel and usable as kernel-parameter at
|
||||
boot-time. Its syntax is slightly strange and you may need to
|
||||
calculate the values by yourself!
|
||||
|
||||
Syntax to exclude a memory area (see kernel-parameters.txt for details):
|
||||
memmap=<size>$<address>
|
||||
|
||||
Example: memtest86+ reported here errors at address 0x18691458, 0x18698424 and
|
||||
some others. All had 0x1869xxxx in common, so I chose a pattern of
|
||||
0x18690000,0xffff0000.
|
||||
|
||||
With the numbers of the example above:
|
||||
memmap=64K$0x18690000
|
||||
or
|
||||
memmap=0x10000$0x18690000
|
||||
|
||||
@@ -1,56 +0,0 @@
|
||||
These instructions are deliberately very basic. If you want something clever,
|
||||
go read the real docs ;-) Please don't add more stuff, but feel free to
|
||||
correct my mistakes ;-) (mbligh@aracnet.com)
|
||||
Thanks to John Levon, Dave Hansen, et al. for help writing this.
|
||||
|
||||
<test> is the thing you're trying to measure.
|
||||
Make sure you have the correct System.map / vmlinux referenced!
|
||||
|
||||
It is probably easiest to use "make install" for linux and hack
|
||||
/sbin/installkernel to copy vmlinux to /boot, in addition to vmlinuz,
|
||||
config, System.map, which are usually installed by default.
|
||||
|
||||
Readprofile
|
||||
-----------
|
||||
A recent readprofile command is needed for 2.6, such as found in util-linux
|
||||
2.12a, which can be downloaded from:
|
||||
|
||||
http://www.kernel.org/pub/linux/utils/util-linux/
|
||||
|
||||
Most distributions will ship it already.
|
||||
|
||||
Add "profile=2" to the kernel command line.
|
||||
|
||||
clear readprofile -r
|
||||
<test>
|
||||
dump output readprofile -m /boot/System.map > captured_profile
|
||||
|
||||
Oprofile
|
||||
--------
|
||||
|
||||
Get the source (see Changes for required version) from
|
||||
http://oprofile.sourceforge.net/ and add "idle=poll" to the kernel command
|
||||
line.
|
||||
|
||||
Configure with CONFIG_PROFILING=y and CONFIG_OPROFILE=y & reboot on new kernel
|
||||
|
||||
./configure --with-kernel-support
|
||||
make install
|
||||
|
||||
For superior results, be sure to enable the local APIC. If opreport sees
|
||||
a 0Hz CPU, APIC was not on. Be aware that idle=poll may mean a performance
|
||||
penalty.
|
||||
|
||||
One time setup:
|
||||
opcontrol --setup --vmlinux=/boot/vmlinux
|
||||
|
||||
clear opcontrol --reset
|
||||
start opcontrol --start
|
||||
<test>
|
||||
stop opcontrol --stop
|
||||
dump output opreport > output_file
|
||||
|
||||
To only report on the kernel, run opreport -l /boot/vmlinux > output_file
|
||||
|
||||
A reset is needed to clear old statistics, which survive a reboot.
|
||||
|
||||
@@ -1,131 +0,0 @@
|
||||
Kernel Support for miscellaneous (your favourite) Binary Formats v1.1
|
||||
=====================================================================
|
||||
|
||||
This Kernel feature allows you to invoke almost (for restrictions see below)
|
||||
every program by simply typing its name in the shell.
|
||||
This includes for example compiled Java(TM), Python or Emacs programs.
|
||||
|
||||
To achieve this you must tell binfmt_misc which interpreter has to be invoked
|
||||
with which binary. Binfmt_misc recognises the binary-type by matching some bytes
|
||||
at the beginning of the file with a magic byte sequence (masking out specified
|
||||
bits) you have supplied. Binfmt_misc can also recognise a filename extension
|
||||
aka '.com' or '.exe'.
|
||||
|
||||
First you must mount binfmt_misc:
|
||||
mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
|
||||
|
||||
To actually register a new binary type, you have to set up a string looking like
|
||||
:name:type:offset:magic:mask:interpreter:flags (where you can choose the ':'
|
||||
upon your needs) and echo it to /proc/sys/fs/binfmt_misc/register.
|
||||
|
||||
Here is what the fields mean:
|
||||
- 'name' is an identifier string. A new /proc file will be created with this
|
||||
name below /proc/sys/fs/binfmt_misc; cannot contain slashes '/' for obvious
|
||||
reasons.
|
||||
- 'type' is the type of recognition. Give 'M' for magic and 'E' for extension.
|
||||
- 'offset' is the offset of the magic/mask in the file, counted in bytes. This
|
||||
defaults to 0 if you omit it (i.e. you write ':name:type::magic...'). Ignored
|
||||
when using filename extension matching.
|
||||
- 'magic' is the byte sequence binfmt_misc is matching for. The magic string
|
||||
may contain hex-encoded characters like \x0a or \xA4. Note that you must
|
||||
escape any NUL bytes; parsing halts at the first one. In a shell environment
|
||||
you might have to write \\x0a to prevent the shell from eating your \.
|
||||
If you chose filename extension matching, this is the extension to be
|
||||
recognised (without the '.', the \x0a specials are not allowed). Extension
|
||||
matching is case sensitive, and slashes '/' are not allowed!
|
||||
- 'mask' is an (optional, defaults to all 0xff) mask. You can mask out some
|
||||
bits from matching by supplying a string like magic and as long as magic.
|
||||
The mask is anded with the byte sequence of the file. Note that you must
|
||||
escape any NUL bytes; parsing halts at the first one. Ignored when using
|
||||
filename extension matching.
|
||||
- 'interpreter' is the program that should be invoked with the binary as first
|
||||
argument (specify the full path)
|
||||
- 'flags' is an optional field that controls several aspects of the invocation
|
||||
of the interpreter. It is a string of capital letters, each controls a
|
||||
certain aspect. The following flags are supported -
|
||||
'P' - preserve-argv[0]. Legacy behavior of binfmt_misc is to overwrite
|
||||
the original argv[0] with the full path to the binary. When this
|
||||
flag is included, binfmt_misc will add an argument to the argument
|
||||
vector for this purpose, thus preserving the original argv[0].
|
||||
e.g. If your interp is set to /bin/foo and you run `blah` (which is
|
||||
in /usr/local/bin), then the kernel will execute /bin/foo with
|
||||
argv[] set to ["/bin/foo", "/usr/local/bin/blah", "blah"]. The
|
||||
interp has to be aware of this so it can execute /usr/local/bin/blah
|
||||
with argv[] set to ["blah"].
|
||||
'O' - open-binary. Legacy behavior of binfmt_misc is to pass the full path
|
||||
of the binary to the interpreter as an argument. When this flag is
|
||||
included, binfmt_misc will open the file for reading and pass its
|
||||
descriptor as an argument, instead of the full path, thus allowing
|
||||
the interpreter to execute non-readable binaries. This feature
|
||||
should be used with care - the interpreter has to be trusted not to
|
||||
emit the contents of the non-readable binary.
|
||||
'C' - credentials. Currently, the behavior of binfmt_misc is to calculate
|
||||
the credentials and security token of the new process according to
|
||||
the interpreter. When this flag is included, these attributes are
|
||||
calculated according to the binary. It also implies the 'O' flag.
|
||||
This feature should be used with care as the interpreter
|
||||
will run with root permissions when a setuid binary owned by root
|
||||
is run with binfmt_misc.
|
||||
'F' - fix binary. The usual behaviour of binfmt_misc is to spawn the
|
||||
binary lazily when the misc format file is invoked. However,
|
||||
this doesn't work very well in the face of mount namespaces and
|
||||
changeroots, so the F mode opens the binary as soon as the
|
||||
emulation is installed and uses the opened image to spawn the
|
||||
emulator, meaning it is always available once installed,
|
||||
regardless of how the environment changes.
|
||||
|
||||
|
||||
There are some restrictions:
|
||||
- the whole register string may not exceed 1920 characters
|
||||
- the magic must reside in the first 128 bytes of the file, i.e.
|
||||
offset+size(magic) has to be less than 128
|
||||
- the interpreter string may not exceed 127 characters
|
||||
|
||||
To use binfmt_misc you have to mount it first. You can mount it with
|
||||
"mount -t binfmt_misc none /proc/sys/fs/binfmt_misc" command, or you can add
|
||||
a line "none /proc/sys/fs/binfmt_misc binfmt_misc defaults 0 0" to your
|
||||
/etc/fstab so it auto mounts on boot.
|
||||
|
||||
You may want to add the binary formats in one of your /etc/rc scripts during
|
||||
boot-up. Read the manual of your init program to figure out how to do this
|
||||
right.
|
||||
|
||||
Think about the order of adding entries! Later added entries are matched first!
|
||||
|
||||
|
||||
A few examples (assumed you are in /proc/sys/fs/binfmt_misc):
|
||||
|
||||
- enable support for em86 (like binfmt_em86, for Alpha AXP only):
|
||||
echo ':i386:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
|
||||
echo ':i486:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x06:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
|
||||
|
||||
- enable support for packed DOS applications (pre-configured dosemu hdimages):
|
||||
echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register
|
||||
|
||||
- enable support for Windows executables using wine:
|
||||
echo ':DOSWin:M::MZ::/usr/local/bin/wine:' > register
|
||||
|
||||
For java support see Documentation/java.txt
|
||||
|
||||
|
||||
You can enable/disable binfmt_misc or one binary type by echoing 0 (to disable)
|
||||
or 1 (to enable) to /proc/sys/fs/binfmt_misc/status or /proc/.../the_name.
|
||||
Catting the file tells you the current status of binfmt_misc/the entry.
|
||||
|
||||
You can remove one entry or all entries by echoing -1 to /proc/.../the_name
|
||||
or /proc/sys/fs/binfmt_misc/status.
|
||||
|
||||
|
||||
HINTS:
|
||||
======
|
||||
|
||||
If you want to pass special arguments to your interpreter, you can
|
||||
write a wrapper script for it. See Documentation/java.txt for an
|
||||
example.
|
||||
|
||||
Your interpreter should NOT look in the PATH for the filename; the kernel
|
||||
passes it the full filename (or the file descriptor) to use. Using $PATH can
|
||||
cause unexpected behaviour and can be a security hazard.
|
||||
|
||||
|
||||
Richard Günther <rguenth@tat.physik.uni-tuebingen.de>
|
||||
@@ -348,7 +348,7 @@ Drivers can now specify a request prepare function (q->prep_rq_fn) that the
|
||||
block layer would invoke to pre-build device commands for a given request,
|
||||
or perform other preparatory processing for the request. This is routine is
|
||||
called by elv_next_request(), i.e. typically just before servicing a request.
|
||||
(The prepare function would not be called for requests that have REQ_DONTPREP
|
||||
(The prepare function would not be called for requests that have RQF_DONTPREP
|
||||
enabled)
|
||||
|
||||
Aside:
|
||||
@@ -553,8 +553,8 @@ struct request {
|
||||
struct request_list *rl;
|
||||
}
|
||||
|
||||
See the rq_flag_bits definitions for an explanation of the various flags
|
||||
available. Some bits are used by the block layer or i/o scheduler.
|
||||
See the req_ops and req_flag_bits definitions for an explanation of the various
|
||||
flags available. Some bits are used by the block layer or i/o scheduler.
|
||||
|
||||
The behaviour of the various sector counts are almost the same as before,
|
||||
except that since we have multi-segment bios, current_nr_sectors refers
|
||||
|
||||
@@ -240,11 +240,11 @@ All cfq queues doing synchronous sequential IO go on to sync-idle tree.
|
||||
On this tree we idle on each queue individually.
|
||||
|
||||
All synchronous non-sequential queues go on sync-noidle tree. Also any
|
||||
request which are marked with REQ_NOIDLE go on this service tree. On this
|
||||
tree we do not idle on individual queues instead idle on the whole group
|
||||
of queues or the tree. So if there are 4 queues waiting for IO to dispatch
|
||||
we will idle only once last queue has dispatched the IO and there is
|
||||
no more IO on this service tree.
|
||||
synchronous write request which is not marked with REQ_IDLE goes on this
|
||||
service tree. On this tree we do not idle on individual queues instead idle
|
||||
on the whole group of queues or the tree. So if there are 4 queues waiting
|
||||
for IO to dispatch we will idle only once last queue has dispatched the IO
|
||||
and there is no more IO on this service tree.
|
||||
|
||||
All async writes go on async service tree. There is no idling on async
|
||||
queues.
|
||||
@@ -257,17 +257,17 @@ tree idling provides isolation with buffered write queues on async tree.
|
||||
|
||||
FAQ
|
||||
===
|
||||
Q1. Why to idle at all on queues marked with REQ_NOIDLE.
|
||||
Q1. Why to idle at all on queues not marked with REQ_IDLE.
|
||||
|
||||
A1. We only do tree idle (all queues on sync-noidle tree) on queues marked
|
||||
with REQ_NOIDLE. This helps in providing isolation with all the sync-idle
|
||||
A1. We only do tree idle (all queues on sync-noidle tree) on queues not marked
|
||||
with REQ_IDLE. This helps in providing isolation with all the sync-idle
|
||||
queues. Otherwise in presence of many sequential readers, other
|
||||
synchronous IO might not get fair share of disk.
|
||||
|
||||
For example, if there are 10 sequential readers doing IO and they get
|
||||
100ms each. If a REQ_NOIDLE request comes in, it will be scheduled
|
||||
roughly after 1 second. If after completion of REQ_NOIDLE request we
|
||||
do not idle, and after a couple of milli seconds a another REQ_NOIDLE
|
||||
100ms each. If a !REQ_IDLE request comes in, it will be scheduled
|
||||
roughly after 1 second. If after completion of !REQ_IDLE request we
|
||||
do not idle, and after a couple of milli seconds a another !REQ_IDLE
|
||||
request comes in, again it will be scheduled after 1second. Repeat it
|
||||
and notice how a workload can lose its disk share and suffer due to
|
||||
multiple sequential readers.
|
||||
@@ -276,16 +276,16 @@ A1. We only do tree idle (all queues on sync-noidle tree) on queues marked
|
||||
context of fsync, and later some journaling data is written. Journaling
|
||||
data comes in only after fsync has finished its IO (atleast for ext4
|
||||
that seemed to be the case). Now if one decides not to idle on fsync
|
||||
thread due to REQ_NOIDLE, then next journaling write will not get
|
||||
thread due to !REQ_IDLE, then next journaling write will not get
|
||||
scheduled for another second. A process doing small fsync, will suffer
|
||||
badly in presence of multiple sequential readers.
|
||||
|
||||
Hence doing tree idling on threads using REQ_NOIDLE flag on requests
|
||||
Hence doing tree idling on threads using !REQ_IDLE flag on requests
|
||||
provides isolation from multiple sequential readers and at the same
|
||||
time we do not idle on individual threads.
|
||||
|
||||
Q2. When to specify REQ_NOIDLE
|
||||
A2. I would think whenever one is doing synchronous write and not expecting
|
||||
Q2. When to specify REQ_IDLE
|
||||
A2. I would think whenever one is doing synchronous write and expecting
|
||||
more writes to be dispatched from same context soon, should be able
|
||||
to specify REQ_NOIDLE on writes and that probably should work well for
|
||||
to specify REQ_IDLE on writes and that probably should work well for
|
||||
most of the cases.
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user