linux/Documentation
Florian Westphal d1b4c689d4 netlink: remove mmapped netlink support
mmapped netlink has a number of unresolved issues:

- TX zerocopy support had to be disabled more than a year ago via
  commit 4682a03586 ("netlink: Always copy on mmap TX.")
  because the content of the mmapped area can change after netlink
  attribute validation but before message processing.

- RX support was implemented mainly to speed up nfqueue dumping packet
  payload to userspace.  However, since commit ae08ce0021
  ("netfilter: nfnetlink_queue: zero copy support") we avoid one copy
  with the socket-based interface too (via the skb_zerocopy helper).

The other problem is that skbs attached to an mmapped netlink socket
behave differently from normal skbs:

- they don't have a shinfo area, so all functions that use skb_shinfo()
(e.g. skb_clone) cannot be used.

- reserving headroom prevents userspace from seeing the content, as
it expects the message to start at skb->head.
See for instance
commit aa3a022094 ("netlink: not trim skb for mmaped socket when dump").

- skbs handed e.g. to netlink_ack must have non-NULL skb->sk, else we
crash because it needs the sk to check if a tx ring is attached.

This is also not obvious and leads to non-intuitive bug fixes such as
commit 7c7bdf359
("netfilter: nfnetlink: use original skbuff when acking batches").

mmapped netlink also didn't play nicely with the skb_zerocopy helper
used by nfqueue and openvswitch.  Daniel Borkmann fixed this via
commit 6bb0fef489 ("netlink, mmap: fix edge-case leakages in nf queue
zero-copy"), but at the cost of also needing to provide the remaining
length to the allocation function.

nfqueue also has problems when used with mmapped rx netlink:
- mmapped netlink doesn't allow use of nfqueue batch verdict messages.
  The problem is that in the mmap case, the allocation time also determines
  the order in which frames will be seen by userspace: A
  allocating before B means that A is located in an earlier ring slot,
  but B might still get a lower sequence number than A
  since the seqno is decided later.  To fix this we would need to extend the
  spinlocked region to also cover the allocation and message setup, which
  isn't desirable.
- nfqueue can now be configured to queue large (GSO) skbs to userspace.
  Queueing GSO packets is faster than having to force a software segmentation
  in the kernel, so this is a desirable option.  However, with an mmap-based
  ring one has to use 64KB per ring slot element, else mmap has to fall back
  to the socket path (NL_MMAP_STATUS_COPY) for all large packets.

To use the mmap interface, userspace not only has to probe for mmap netlink
support, it also has to implement a recv/socket receive path in order to
handle messages that exceed the size of an rx ring element.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 11:42:18 -05:00
ABI Initial roundup of 4.5 merge window patches 2016-01-23 18:45:06 -08:00
accounting Documentation-getdelays: Apply a recommendation from "checkpatch.pl" in main() 2015-12-24 07:22:32 -07:00
acpi
aoe
arm ARM: SoC multiplatform code changes for v4.5 2016-01-20 18:03:56 -08:00
arm64 arm64: Documentation: add list of software workarounds for errata 2015-12-11 17:33:21 +00:00
auxdisplay
backlight
blackfin
block A relatively boring cycle in the docs tree. There's a few kernel-doc 2016-01-17 11:55:07 -08:00
blockdev
bus-devices
cdrom
cgroup-v1 cgroup: rename cgroup documentations 2016-01-11 23:14:51 -05:00
cma
connector
console
cpu-freq Documentation: cpufreq: intel_pstate: enhance documentation 2016-01-05 13:47:37 +01:00
cpuidle
cris
crypto
development-process
device-mapper dm verity: add ignore_zero_blocks feature 2015-12-10 10:39:03 -05:00
devicetree Major changes: 2016-02-16 20:38:29 -05:00
dmaengine Merge branch 'topic/async' into for-linus 2016-01-06 15:17:47 +05:30
DocBook Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux 2016-01-17 13:40:25 -08:00
driver-model
dvb [media] use https://linuxtv.org for LinuxTV URLs 2015-12-04 10:38:59 -02:00
early-userspace
EDID
extcon
fault-injection net: Add support for CHANGEUPPER notifier error injection 2015-12-03 11:49:23 -05:00
fb
features dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
filesystems Documentation/filesystems/vfat.txt: update the limitation for fat fallocate 2016-01-20 17:09:18 -08:00
firmware_class
fmc
fpga
frv
gpio Doc: gpio: Fix typos in Documentation/gpio 2015-11-20 16:51:16 -07:00
hid
hwmon hwmon: (pmbus) Add client driver for LTC3815 2015-12-18 08:20:59 -08:00
i2c i2c: i801: add Intel Lewisburg device IDs 2015-11-20 16:22:21 +01:00
ia64
ide
iio iio: Documentation: Add IIO configfs documentation 2015-12-03 18:19:28 +00:00
infiniband IB: remove in-kernel support for memory windows 2015-12-23 14:29:04 -05:00
input
ioctl Doc: ioctl: Fix typos in Documentation/ioctl 2015-11-20 16:52:50 -07:00
isdn
ja_JP Documentation: translations: update linux cross reference link 2016-01-11 18:26:58 -07:00
kbuild
kdump
ko_KR Documentation: translations: update linux cross reference link 2016-01-11 18:26:58 -07:00
laptops
leds Documentation: leds: Add description of brightness setting API 2016-01-04 09:57:31 +01:00
locking
m68k
memory-devices
metag
mic
mips
misc-devices
mmc
mn10300
mtd Documentation: mtd: improve nand_ecc.txt for readability and correctness 2015-11-17 17:05:14 -08:00
namespaces
netlabel
networking netlink: remove mmapped netlink support 2016-02-18 11:42:18 -05:00
nfc
nios2
nvdimm libnvdimm: documentation clarifications 2015-11-12 09:55:23 -08:00
nvmem
parisc
PCI
pcmcia
phy
platform
power Merge branches 'pm-pci' and 'pm-core' 2016-01-12 01:10:52 +01:00
powerpc
pps
prctl
pti
ptp
rapidio
RCU documentation: Update RCU requirements based on expedited changes 2015-12-05 12:34:32 -08:00
s390 s390/zcore: remove /sys/kernel/debug/zcore/mem 2015-11-27 09:24:12 +01:00
scheduler
scsi
security keys, trusted: seal with a TPM2 authorization policy 2015-12-20 15:27:13 +02:00
serial
sh
sound ASoC: img: Add documentation for SPDIF in controls 2015-11-16 10:06:58 +00:00
spi spi: tools: move spidev_test metadata 2015-11-30 12:14:12 +00:00
sysctl Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-22 10:24:03 -08:00
target
thermal thermal: add description for integral_cutoff unit 2016-01-14 13:29:08 -07:00
timers
tpm
trace x86, tracing, perf: Add trace point for MSR accesses 2015-12-06 12:56:10 +01:00
usb The chipidea changes for v4.5-rc1 2015-12-26 16:59:14 -08:00
vDSO
video4linux [media] media framework: rename pads init function to media_entity_pads_init() 2016-01-11 12:19:03 -02:00
virtual KVM doc: Fix KVM_SMI chapter number 2016-01-26 16:29:59 +01:00
vm Merge branch 'akpm' (patches from Andrew) 2016-01-17 12:58:52 -08:00
w1
watchdog watchdog: Drop pointer to watchdog device from struct watchdog_device 2016-01-11 21:53:59 +01:00
wimax
x86
xtensa
zh_CN [media] media framework: rename pads init function to media_entity_pads_init() 2016-01-11 12:19:03 -02:00
00-INDEX
adding-syscalls.txt
applying-patches.txt
assoc_array.txt
atomic_ops.txt
bad_memory.txt
basic_profiling.txt
bcache.txt
binfmt_misc.txt
braille-console.txt
bt8xxgpio.txt
btmrvl.txt
BUG-HUNTING
bus-virt-phys-mapping.txt
cachetlb.txt
cgroup-v2.txt mm: memcontrol: basic memory statistics in cgroup2 memory controller 2016-01-20 17:09:18 -08:00
Changes
circular-buffers.txt
clk.txt
coccinelle.txt
CodeOfConflict
CodingStyle Documentation: fix typo in CodingStyle 2016-01-11 18:18:16 -07:00
cpu-hotplug.txt Documentation: cpu-hotplug: Fix sysfs mount instructions 2015-12-10 11:35:30 -07:00
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt
devices.txt
digsig.txt
DMA-API-HOWTO.txt dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
DMA-API.txt DMA-API: fix confusing sentence in Documentation/DMA-API.txt 2016-01-11 18:29:00 -07:00
DMA-attributes.txt
dma-buf-sharing.txt
DMA-ISA-LPC.txt
dontdiff
dynamic-debug-howto.txt
edac.txt EDAC: Remove references to bluesmoke.sourceforge.net 2015-11-26 14:46:06 +01:00
efi-stub.txt
eisa.txt
email-clients.txt A few more documentation patches that wandered in and have no reason to 2015-11-13 09:19:05 -08:00
flexible-arrays.txt
futex-requeue-pi.txt
gcov.txt
gdb-kernel-debugging.txt
highuid.txt
HOWTO Documentation: HOWTO: update code cross reference link 2015-12-10 11:19:35 -07:00
hsi.txt
hw_random.txt
hwspinlock.txt
init.txt
initrd.txt
intel_txt.txt
Intel-IOMMU.txt iommu/vt-d: Fix link to Intel IOMMU Specification 2016-01-29 12:32:12 +01:00
io_ordering.txt
io-mapping.txt
iostats.txt
IPMI.txt ipmi watchdog : add panic_wdt_timeout parameter 2015-11-16 06:28:43 -06:00
IRQ-affinity.txt
IRQ-domain.txt
IRQ.txt
irqflags-tracing.txt
isapnp.txt
java.txt
kasan.txt
kernel-doc-nano-HOWTO.txt
kernel-docs.txt Documentation: translations: update linux cross reference link 2016-01-11 18:26:58 -07:00
kernel-parameters.txt Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2016-01-24 12:50:56 -08:00
kernel-per-CPU-kthreads.txt irq_poll: make blk-iopoll available outside the block layer 2015-12-11 11:52:24 -08:00
kmemcheck.txt
kmemleak.txt
kobject.txt
kprobes.txt
kref.txt
kselftest.txt
ldm.txt
local_ops.txt
lockup-watchdogs.txt
logo.gif
logo.txt
lzo.txt
magic-number.txt
mailbox.txt
Makefile spi: Move spi code from Documentation to tools 2015-11-23 14:54:01 +00:00
ManagementStyle
md-cluster.txt md-cluster: update the documentation 2016-01-06 11:39:06 +11:00
md.txt
memory-barriers.txt virtio: barrier rework+fixes 2016-01-18 16:44:24 -08:00
memory-hotplug.txt
men-chameleon-bus.txt
module-signing.txt
mono.txt
nommu-mmap.txt
ntb.txt
numastat.txt
oops-tracing.txt
padata.txt
parport-lowlevel.txt
parport.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt
pinctrl.txt
pnp.txt
preempt-locking.txt
printk-formats.txt printk-formats.txt: remove unimplemented %pT 2016-01-16 11:17:30 -08:00
pwm.txt
ramoops.txt
rbtree.txt
remoteproc.txt
rfkill.txt
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt
SAK.txt
SecurityBugs
serial-console.txt
sgi-ioc4.txt
SM501.txt
smsc_ece1099.txt
sparse.txt
stable_api_nonsense.txt
stable_kernel_rules.txt stable_kernel_rules.txt: Remove extra space after Cc: 2015-11-20 16:54:57 -07:00
static-keys.txt
SubmitChecklist
SubmittingDrivers
SubmittingPatches A few more documentation patches that wandered in and have no reason to 2015-11-13 09:19:05 -08:00
svga.txt
sysfs-rules.txt
sysrq.txt
this_cpu_ops.txt
ubsan.txt UBSAN: run-time undefined behavior sanity checker 2016-01-20 17:09:18 -08:00
unaligned-memory-access.txt
unicode.txt
unshare.txt
vfio.txt
VGA-softcursor.txt
vgaarbiter.txt
video-output.txt
vme_api.txt
volatile-considered-harmful.txt
workqueue.txt
xillybus.txt
xz.txt
zorro.txt