linux/net
Jesper Dangaard Brouer c2a936600f net: increase fragment memory usage limits
Increase the amount of memory usage limits for incomplete
IP fragments.

Arguing for new thresh high/low values:

 High threshold = 4 MBytes
 Low  threshold = 3 MBytes

The fragmentation memory accounting code, tries to account for the
real memory usage, by measuring both the size of frag queue struct
(inet_frag_queue (ipv4:ipq/ipv6:frag_queue)) and the SKB's truesize.

We want to be able to handle/hold-on-to enough fragments, to ensure
good performance, without causing incomplete fragments to hurt
scalability, by causing the number of inet_frag_queue to grow too much
(resulting longer searches for frag queues).

For IPv4, how much memory does the largest frag consume.

Maximum size fragment is 64K, which is approx 44 fragments with
MTU(1500) sized packets. Sizeof(struct ipq) is 200.  A 1500 byte
packet results in a truesize of 2944 (not 2048 as I first assumed)

  (44*2944)+200 = 129736 bytes

The current default high thresh of 262144 bytes, is obviously
problematic, as only two 64K fragments can fit in the queue at the
same time.

How many 64K fragment can we fit into 4 MBytes:

  4*2^20/((44*2944)+200) = 32.34 fragment in queues

An attacker could send a separate/distinct fake fragment packets per
queue, causing us to allocate one inet_frag_queue per packet, and thus
attacking the hash table and its lists.

How many frag queue do we need to store, and given a current hash size
of 64, what is the average list length.

Using one MTU sized fragment per inet_frag_queue, each consuming
(2944+200) 3144 bytes.

  4*2^20/(2944+200) = 1334 frag queues -> 21 avg list length

An attack could send small fragments, the smallest packet I could send
resulted in a truesize of 896 bytes (I'm a little surprised by this).

  4*2^20/(896+200)  = 3827 frag queues -> 59 avg list length

When increasing these number, we also need to followup with
improvements, that is going to help scalability.  Simply increasing
the hash size, is not enough as the current implementation does not
have a per hash bucket locking.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 14:29:53 -05:00
..
9p virtio: 9p: correctly pass physical address to userspace for high pages 2012-10-22 18:19:36 +10:30
802
8021q ethtool: fix drvinfo strings set in drivers 2013-01-06 21:06:31 -08:00
appletalk userns: Print out socket uids in a user namespace aware fashion. 2012-08-14 21:48:06 -07:00
atm atm: use scnprintf() instead of sprintf() 2012-12-17 20:50:51 -08:00
ax25 userns: Convert net/ax25 to use kuid_t where appropriate 2012-08-14 21:49:42 -07:00
batman-adv batman-adv: unbloat batadv_priv if debug is not enabled 2013-01-12 20:58:23 +10:00
bluetooth Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2012-12-13 12:00:48 -08:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-01-15 15:05:59 -05:00
caif caif_usb: Make the driver name check more efficient 2012-12-09 00:34:02 -05:00
can Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-12-13 12:00:02 -08:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2013-01-02 17:32:49 -08:00
core sk-filter: Add ability to lock a socket filter program 2013-01-17 03:21:25 -05:00
dcb net: Allow DCBnl to use other namespaces besides init_net 2012-12-10 14:09:01 -05:00
dccp inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock 2012-12-14 13:14:07 -05:00
decnet net: Push capable(CAP_NET_ADMIN) into the rtnl methods 2012-11-18 20:32:44 -05:00
dns_resolver Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2012-12-16 15:40:50 -08:00
dsa net: phy: remove flags argument from phy_{attach, connect, connect_direct} 2013-01-14 15:11:50 -05:00
ethernet net: remove unnecessary NET_ADDR_RANDOM "bitclean" 2013-01-03 22:37:36 -08:00
ieee802154 6lowpan: consider checksum bytes in fragmentation threshold 2012-11-30 12:19:24 -05:00
ipv4 net: increase fragment memory usage limits 2013-01-17 14:29:53 -05:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-01-15 15:05:59 -05:00
ipx userns: Print out socket uids in a user namespace aware fashion. 2012-08-14 21:48:06 -07:00
irda TTY/Serial merge for 3.8-rc1 2012-12-11 14:08:47 -08:00
iucv s390/irq: remove split irq fields from /proc/stat 2013-01-08 10:57:07 +01:00
key net: Allow userns root to control llc, netfilter, netlink, packet, and xfrm 2012-11-18 20:32:45 -05:00
l2tp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-11-10 18:32:51 -05:00
lapb
llc net: Allow userns root to control llc, netfilter, netlink, packet, and xfrm 2012-11-18 20:32:45 -05:00
mac80211 mac80211: fix maximum MTU 2013-01-03 13:00:01 +01:00
mac802154 mac802154: fix NOHZ local_softirq_pending 08 warning 2013-01-04 13:47:21 -08:00
netfilter netfilter: x_tables: print correct hook names for ARP 2013-01-13 12:54:12 +01:00
netlabel Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2012-10-02 13:38:27 -07:00
netlink netlink: Use FIELD_SIZEOF() in netlink_proto_init(). 2013-01-09 23:38:23 -08:00
netrom net: change return values from -EACCES to -EPERM 2012-09-21 13:58:08 -04:00
nfc nfc: remove noisy message from llcp_sock_sendmsg 2012-12-13 12:58:10 -05:00
openvswitch openvswitch: Use FIELD_SIZEOF() in dp_init(). 2013-01-09 23:38:24 -08:00
packet net: Allow userns root to control llc, netfilter, netlink, packet, and xfrm 2012-11-18 20:32:45 -05:00
phonet net: Push capable(CAP_NET_ADMIN) into the rtnl methods 2012-11-18 20:32:44 -05:00
rds IB/rds: suppress incompatible protocol when version is known 2012-12-26 15:17:37 -08:00
rfkill Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-12-13 12:00:02 -08:00
rose
rxrpc rxrpc: Use FIELD_SIZEOF() in af_rxrpc_init(). 2013-01-09 23:38:24 -08:00
sched pkt_sched: namespace aware act_mirred 2013-01-14 15:09:36 -05:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-01-15 15:05:59 -05:00
sunrpc NFS client bugfixe for Linux 3.8 2013-01-11 12:09:04 -08:00
tipc tipc: refactor accept() code for improved readability 2012-12-07 17:23:24 -05:00
unix unix: Use FIELD_SIZEOF() in af_unix_init(). 2013-01-09 23:38:24 -08:00
wanrouter
wimax
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-01-15 15:05:59 -05:00
x25
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2012-11-22 15:25:55 -05:00
compat.c make get_file() return its argument 2012-09-26 21:10:25 -04:00
Kconfig net: Add support for XPS without sysfs being defined 2013-01-10 22:47:04 -08:00
Makefile ipv6: Preserve ipv6 functionality needed by NET 2012-11-18 02:34:00 -05:00
nonet.c
socket.c cgroup: net_cls: Rework update socket logic 2012-10-26 03:40:51 -04:00
sysctl_net.c user_ns: get rid of duplicate code in net_ctl_permissions 2012-11-18 20:32:45 -05:00