linux/drivers
Daniel Borkmann 7cc9f7003a ipvlan: disallow userns cap_net_admin to change global mode/flags
When running Docker with userns isolation e.g. --userns-remap="default"
and spawning up some containers with CAP_NET_ADMIN under this realm, I
noticed that link changes on ipvlan slave device inside that container
can affect all devices from this ipvlan group which are in other net
namespaces where the container should have no permission to make changes
to, such as the init netns, for example.

This effectively allows to undo ipvlan private mode and switch globally to
bridge mode where slaves can communicate directly without going through
hostns, or it allows to switch between global operation mode (l2/l3/l3s)
for everyone bound to the given ipvlan master device. libnetwork plugin
here is creating an ipvlan master and ipvlan slave in hostns and a slave
each that is moved into the container's netns upon creation event.

* In hostns:

  # ip -d a
  [...]
  8: cilium_host@bond0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
     ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
     inet 10.41.0.1/32 scope link cilium_host
       valid_lft forever preferred_lft forever
  [...]

* Spawn container & change ipvlan mode setting inside of it:

  # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
  9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c

  # docker exec -ti client ip -d a
  [...]
  10: cilium0@if4: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
         valid_lft forever preferred_lft forever

  # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2

  # docker exec -ti client ip -d a
  [...]
  10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
         valid_lft forever preferred_lft forever

* In hostns (mode switched to l2):

  # ip -d a
  [...]
  8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.0.1/32 scope link cilium_host
         valid_lft forever preferred_lft forever
  [...]

Same l3 -> l2 switch would also happen by creating another slave inside
the container's network namespace when specifying the existing cilium0
link to derive the actual (bond0) master:

  # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2

  # docker exec -ti client ip -d a
  [...]
  2: cilium1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
  10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
         valid_lft forever preferred_lft forever

* In hostns:

  # ip -d a
  [...]
  8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.0.1/32 scope link cilium_host
         valid_lft forever preferred_lft forever
  [...]

One way to mitigate it is to check CAP_NET_ADMIN permissions of
the ipvlan master device's ns, and only then allow to change
mode or flags for all devices bound to it. Above two cases are
then disallowed after the patch.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22 11:27:19 -08:00
..
accessibility
acpi ACPI: Set debug output flags independent of ACPICA 2019-02-07 12:24:28 +01:00
amba
android binderfs: remove separate device_initcall() 2019-02-01 15:50:26 +01:00
ata libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD 2019-02-06 12:47:09 -07:00
atm atm: he: fix sign-extension overflow on large shift 2019-01-17 11:27:00 -08:00
auxdisplay auxdisplay: ht16k33: fix potential user-after-free on module unload 2019-02-15 19:48:39 +01:00
base Driver core fixes for 5.0-rc6 2019-02-08 10:53:44 -08:00
bcma
block floppy: check_events callback should not return a negative number 2019-02-12 09:13:18 -07:00
bluetooth Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading 2018-12-19 13:43:42 +01:00
bus Merge branch 'pwm-dmtimer-fixes' into omap-for-v5.0/fixes-v2 2019-01-29 07:53:47 -08:00
cdrom gdrom: fix a memory leak bug 2018-12-29 08:20:44 -07:00
char Char/Misc driver fixes for 5.0-rc4 2019-01-25 13:03:34 -10:00
clk clk: qcom: gcc: Use active only source for CPUSS clocks 2019-01-24 11:41:48 -08:00
clocksource Merge branch 'pwm-dmtimer-fixes' into omap-for-v5.0/fixes-v2 2019-01-29 07:53:47 -08:00
connector
cpufreq Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'pm-sleep' 2019-01-11 10:09:51 +01:00
cpuidle cpuidle: poll_state: Fix default time limit 2019-01-30 22:57:42 +01:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2019-02-15 08:11:43 -08:00
dax mm, devm_memremap_pages: fix shutdown handling 2018-12-28 12:11:47 -08:00
dca
devfreq PM / devfreq: add devfreq_suspend/resume() functions 2018-12-11 11:40:13 +09:00
dio
dma dmaengine-fix-5.0-rc6 2019-02-10 10:39:37 -08:00
dma-buf drivers/dma-buf/udmabuf.c: convert to use vm_fault_t 2019-01-04 13:13:46 -08:00
edac EDAC, altera: Fix S10 persistent register offset 2019-01-24 17:13:59 +01:00
eisa
extcon
firewire scsi: communicate max segment size to the DMA mapping code 2019-01-22 20:40:59 -05:00
firmware Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-02-17 09:22:01 -08:00
fmc
fpga fpga: stratix10-soc: fix wrong of_node_put() in init function 2019-01-31 16:19:48 +01:00
fsi
gnss Merge 4.20-rc6 into tty-next 2018-12-10 10:17:45 +01:00
gpio gpio: vf610: Mask all GPIO interrupts 2019-01-28 15:28:43 +01:00
gpu drm: Use array_size() when creating lease 2019-02-15 13:08:08 +10:00
hid HID: debug: fix the ring buffer implementation 2019-01-29 12:09:11 +01:00
hsi
hv vmbus: fix subchannel removal 2019-01-09 19:20:31 -05:00
hwmon hwmon: (nct6775) Fix fan6 detection for NCT6793D 2019-01-27 18:55:49 -08:00
hwspinlock hwspinlock: fix return value check in stm32_hwspinlock_probe() 2019-01-03 11:42:10 -08:00
hwtracing intel_th: msu: Fix an off-by-one in attribute store 2018-12-19 20:21:06 +01:00
i2c i2c: bcm2835: Clear current buffer pointers and counts after a transfer 2019-02-15 09:45:05 +01:00
i3c i3c: master: dw: fix deadlock 2019-01-26 11:14:25 +01:00
ide ide: ensure atapi sense request aren't preempted 2019-01-31 08:25:09 -07:00
idle
iio First set of IIO fixes for the 5.0 cycle. 2019-02-03 13:10:41 +01:00
infiniband IB/uverbs: Fix OOPs in uverbs_user_mmap_disassociate 2019-01-29 13:57:22 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2019-02-17 08:30:35 -08:00
iommu IOMMU Fix for Linux v5.0-rc5: 2019-02-08 15:34:10 -08:00
ipack
irqchip Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-02-10 09:54:19 -08:00
isdn mISDN: fix a race in dev_expire_timer() 2019-02-05 16:39:29 -08:00
leds leds: lp5523: fix a missing check of return value of lp55xx_read 2019-01-17 22:27:39 +01:00
lightnvm lightnvm: pblk: fix use-after-free bug 2018-12-22 14:45:35 -07:00
macintosh Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
mailbox mailbox: bcm-flexrm-mailbox: Fix FlexRM ring flush timeout issue 2019-02-18 10:40:58 -06:00
mcb
md for-linus-20190215 2019-02-15 09:12:28 -08:00
media media: vim2m: only cancel work if it is for right context 2019-01-16 11:13:25 -05:00
memory ARM: SoC: late updates 2019-01-05 11:30:37 -08:00
memstick MMC core: 2018-12-28 16:52:18 -08:00
message scsi: flip the default on use_clustering 2018-12-18 23:13:12 -05:00
mfd mfd: Fix unmet dependency warning for MFD_TPS68470 2019-01-29 10:55:34 +01:00
misc mic: vop: Fix crash on remove 2019-02-01 15:53:54 +01:00
mmc mmc: meson-gx: fix interrupt name 2019-02-13 08:41:15 +01:00
mtd mtd: rawnand: gpmi: fix MX28 bus master lockup problem 2019-02-06 09:39:22 +01:00
mux
net ipvlan: disallow userns cap_net_admin to change global mode/flags 2019-02-22 11:27:19 -08:00
nfc
ntb cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
nubus
nvdimm libnvdimm/security: Require nvdimm_security_setup_events() to succeed 2019-01-21 09:57:43 -08:00
nvme nvme-pci: add missing unlock for reset error 2019-02-12 09:29:07 +01:00
nvmem
of OF: properties: add missing of_node_put 2019-01-16 12:49:53 -06:00
opp cpufreq: scpi/scmi: Fix freeing of dynamic OPPs 2019-01-04 12:19:40 +01:00
oprofile
parisc Kconfig file consolidation for v4.21 2018-12-29 13:40:29 -08:00
parport
pci pci-v5.0-fixes-4 2019-02-08 15:32:10 -08:00
pcmcia Included in this update: 2019-01-05 11:23:17 -08:00
perf drivers/perf: hisi: Fixup one DDRC PMU register offset 2019-01-04 10:13:27 +00:00
phy USB/PHY fixes for 5.0-rc4 2019-01-25 12:57:09 -10:00
pinctrl pinctrl: sunxi: Correct number of IRQ banks on H6 main pin controller 2019-01-22 10:52:39 +01:00
platform platform/x86: Fix unmet dependency warning for SAMSUNG_Q10 2019-01-29 10:59:07 +01:00
pnp Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
power power supply and reset changes for the v4.21 series 2018-12-28 20:22:45 -08:00
powercap
pps Kconfig updates for v4.21 2018-12-29 13:03:29 -08:00
ps3
ptp ptp: check that rsv field is zero in struct ptp_sys_offset_extended 2019-01-08 16:22:56 -05:00
pwm pwm: imx: Add ipg clock operation 2018-12-24 12:06:56 +01:00
rapidio cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
ras treewide: surround Kconfig file paths with double quotes 2018-12-22 00:25:54 +09:00
regulator Merge remote-tracking branch 'regulator/topic/coupled' into regulator-next 2018-12-21 13:43:35 +00:00
remoteproc virtio: don't allocate vqs when names[i] = NULL 2019-01-14 20:15:19 -05:00
reset reset: uniphier-glue: Add AHCI reset control support in glue layer 2019-01-07 16:38:51 +01:00
rpmsg
rtc RTC for 4.21 2019-01-01 13:24:31 -08:00
s390 s390 update with bug fixes for 5.0-rc6 2019-02-11 10:28:48 -08:00
sbus Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next 2018-12-26 10:32:18 -08:00
scsi SCSI fixes on 20190215 2019-02-15 13:36:43 -08:00
sfi
sh
siox
slimbus
sn
soc soc/fsl fixes for v5.0 2019-01-30 11:14:04 +01:00
soundwire
spi cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
spmi
ssb
staging Staging/IIO driver fixes for 5.0-rc6 2019-02-08 10:51:59 -08:00
target scsi: target: make the pi_prot_format ConfigFS path readable 2019-02-04 21:40:32 -05:00
tc
tee OP-TEE dynamic shm log message 2018-12-31 13:06:30 -08:00
thermal thermal: cpu_cooling: Clarify error message 2019-02-05 15:50:13 -08:00
thunderbolt
tty TTY/Serial fixes for 5.0-rc6 2019-02-08 10:49:55 -08:00
uio Char/Misc driver patches for 4.21-rc1 2018-12-28 20:54:57 -08:00
usb usb: typec: tcpm: Correct the PPS out_volt calculation 2019-01-31 09:14:00 +01:00
uwb
vfio vfio-pci/nvlink2: Fix ancient gcc warnings 2019-01-23 08:20:43 -07:00
vhost vhost: correctly check the return value of translate_desc() in log_used() 2019-02-19 13:14:45 -08:00
video TTY/Serial driver fixes for 5.0-rc4 2019-01-25 12:58:40 -10:00
virt
virtio virtio: drop internal struct from UAPI 2019-02-05 15:29:48 -05:00
visorbus
vlynq
vme
w1 treewide: surround Kconfig file paths with double quotes 2018-12-22 00:25:54 +09:00
watchdog watchdog: tqmx86: Fix a couple IS_ERR() vs NULL bugs 2019-01-07 10:10:35 +01:00
xen arm64/xen: fix xen-swiotlb cache flushing 2019-01-23 22:14:56 +01:00
zorro
Kconfig Kconfig file consolidation for v4.21 2018-12-29 13:40:29 -08:00
Makefile