Commit Graph

28371 Commits

Author SHA1 Message Date
Pablo Neira Ayuso
c05cdb1b86 netlink: allow large data transfers from user-space
I can hit ENOBUFS in the sendmsg() path with a large batch that is
composed of many netlink messages. Here that limit is 8 MBytes of
skbuff data area as kmalloc does not manage to get more than that.

While discussing atomic rule-set for nftables with Patrick McHardy,
we decided to put all rule-set updates that need to be applied
atomically in one single batch to simplify the existing approach.
However, as explained above, the existing netlink code limits us
to a maximum of ~20000 rules that fit in one single batch without
hitting ENOBUFS. iptables does not have such limitation as it is
using vmalloc.

This patch adds netlink_alloc_large_skb() which is only used in
the netlink_sendmsg() path. It uses alloc_skb if the memory
requested is <= one memory page, that should be the common case
for most subsystems, else vmalloc for higher memory allocations.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 16:26:34 -07:00
Daniel Borkmann
28850dc7c7 net: tcp: move GRO/GSO functions to tcp_offload
Would be good to make things explicit and move those functions to
a new file called tcp_offload.c, thus make this similar to tcpv6_offload.c.
While moving all related functions into tcp_offload.c, we can also
make some of them static, since they are only used there. Also, add
an explicit registration function.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 14:39:05 -07:00
Daniel Borkmann
5ee9859157 net: minor: tcp: use tcp_skb_mss helper in tcp_tso_segment
We have the minimal inline helper tcp_skb_mss to access
skb_shinfo(skb)->gso_size, so also use it here to get mss.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07 14:39:05 -07:00
David S. Miller
93a306aef5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge 'net' into 'net-next' to get the MSG_CMSG_COMPAT
regression fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-06 23:39:26 -07:00
Andy Lutomirski
a7526eb5d0 net: Unbreak compat_sys_{send,recv}msg
I broke them in this commit:

    commit 1be374a051
    Author: Andy Lutomirski <luto@amacapital.net>
    Date:   Wed May 22 14:07:44 2013 -0700

        net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg

This patch adds __sys_sendmsg and __sys_sendmsg as common helpers that accept
MSG_CMSG_COMPAT and blocks MSG_CMSG_COMPAT at the syscall entrypoints.  It
also reverts some unnecessary checks in sys_socketcall.

Apparently I was suffering from underscore blindness the first time around.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-06 11:52:14 -07:00
David S. Miller
143554ace8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Conflicts:
	net/netfilter/nf_log.c

The conflict in nf_log.c is that in 'net' we added CONFIG_PROC_FS
protection around foo_proc_entry() calls to fix a build failure,
whereas in Pablo's tree a guard if() test around a call is
remove_proc_entry() was removed.  Trivially resolved.

Pablo Neira Ayuso says:

====================
The following patchset contains the first batch of
Netfilter/IPVS updates for your net-next tree, they are:

* Three patches with improvements and code refactorization
  for nfnetlink_queue, from Florian Westphal.

* FTP helper now parses replies without brackets, as RFC1123
  recommends, from Jeff Mahoney.

* Rise a warning to tell everyone about ULOG deprecation,
  NFLOG has been already in the kernel tree for long time
  and supersedes the old logging over netlink stub, from
  myself.

* Don't panic if we fail to load netfilter core framework,
  just bail out instead, from myself.

* Add cond_resched_rcu, used by IPVS to allow rescheduling
  while walking over big hashtables, from Simon Horman.

* Change type of IPVS sysctl_sync_qlen_max sysctl to avoid
  possible overflow, from Zhang Yanfei.

* Use strlcpy instead of strncpy to skip zeroing of already
  initialized area to write the extension names in ebtables,
  from Chen Gang.

* Use already existing per-cpu notrack object from xt_CT,
  from Eric Dumazet.

* Save explicit socket lookup in xt_socket now that we have
  early demux, also from Eric Dumazet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-06 01:03:06 -07:00
David S. Miller
6bc19fb82d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge 'net' bug fixes into 'net-next' as we have patches
that will build on top of them.

This merge commit includes a change from Emil Goode
(emilgoode@gmail.com) that fixes a warning that would
have been introduced by this merge.  Specifically it
fixes the pingv6_ops method ipv6_chk_addr() to add a
"const" to the "struct net_device *dev" argument and
likewise update the dummy_ipv6_chk_addr() declaration.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-05 16:37:30 -07:00
Florian Westphal
7f87712c01 netfilter: nfnetlink_queue: only add CAP_LEN attr when needed
CAP_LEN contains the size of the network packet we're queueing to
userspace, i.e. normally it is the same as the NFQA_PAYLOAD attribute len.

Include it only in the unlikely case when NFQA_PAYLOAD is truncated due
to copy_range limitations.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-05 12:40:54 +02:00
Florian Westphal
9cefbbc9c8 netfilter: nfnetlink_queue: cleanup copy_range usage
For every packet queued, we check if configured copy_range
is 0, and treat that as 'copy entire packet'.

We can move this check to the queue configuration, and can
set copy_range appropriately.

Also, convert repetitive '0xffff - NLA_HDRLEN' to a macro.

[ queue initialization still used 0xffff, although its harmless
  since the initial setting is overwritten on queue config ]

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-05 12:40:28 +02:00
Linus Torvalds
4d3797d7e1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix timeouts with direct mode authentication in mac80211, from
    Stanislaw Gruszka.

 2) Aggregation sessions can deadlock in ath9k, from Felix Fietkau.

 3) Netfilter's xt_addrtype doesn't work with ipv6 due to route lookups
    creating undesirable cache entries, from Florian Westphal.

 4) Fix netfilter's ipt_ULOG from generating non-NULL terminated
    strings.

 5) Fix netdev transmit queue crashes in mac80211, from Johannes Berg.

 6) Fix copy and paste error in 802.11 stack that broke reporting of
    64-bit station tx statistics, from Felix Fietkau.

 7) When qlge_probe fails, it leaks the netdev.  Fix from Wei Yongjun.

 8) SKB control block (where we store the IP options information,
    amongst other things) must be cleared properly otherwise ICMP
    sending can crash for IP tunnels.  Fix from Eric Dumazet.

 9) Verification of Energy Efficient Ether support was coded wrongly,
    the test was inversed.  Fix from Giuseppe CAVALLARO.

10) TCP handles redirects improperly because the wrong flow key is used
    for the route lookup.  From Michal Kubecek.

11) Don't interpret MSG_CMSG_COMPAT from userspace, fix from Andy
    Lutomirski.

12) The new AF_VSOCK was missing from the lockdep string table, fix from
    Federico Vaga.

13) be2net doesn't handle checksumming of IP fragments properly, from
    Somnath Kotur.

14) Fix several bugs in the device address list code that lead to
    crashes and other misbehaviors.  From Jay Vosburgh.

15) Fix ipv6 segmentation handling of fragmented GRE tunnel traffic,
    from Pravin B Shalr.

16) Fix usage of stale policies in IPSEC layer, from Paul Moore.

17) Fix team driver dump of ports when there are a large number of them,
    from Jiri Pirko.

18) Fix softlockups in UDP ipv4 socket lookup causes by and error in the
    hlist_nulls_for_each_entry_rcu() macro.  From Eric Dumazet.

19) Fix several regressions added by the high rate accuracy changes to
    the htb packet scheduler.  From Eric Dumazet.

20) Fix DMA'ing onto the stack in esd_usb2 and peak_usb CAN drivers,
    from Olivier Sobrie and Marc Kleine-Budde.

21) Fix unremovable network devices due to missing route pointer
    installation in the per-device ipv6 address list entries.  From Gao
    feng.

22) Apply the tg3 5719 DMA workaround on 5720 chips as well, otherwise
    we get stalls.  From Nithin Sujir.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (68 commits)
  net_sched: htb: do not mix 1ns and 64ns time units
  net: fix sk_buff head without data area
  tg3: Add read dma workaround for 5720
  net: ethernet: xilinx_emaclite: set protocol selector bits when writing ANAR
  bnx2x: Fix bridged GSO for 57710/57711 chips
  net: fec: add fallback to random MAC address
  bnx2x: fix TCP offload for tunneling ipv4 over ipv6
  ipv6: assign rt6_info to inet6_ifaddr in init_loopback
  net/mlx4_core: Keep VF assigned MAC in the PF admin table
  net/mlx4_en: Handle unassigned VF MAC address correctly
  net/mlx4_core: Return -EPROBE_DEFER when a VF is probed before PF is sufficiently initialized
  net/mlx4_en: Fix adaptive moderation cq update
  net: can: peak_usb: Do not do dma on the stack
  net: can: esd_usb2: Do not do dma on the stack
  net: can: kvaser_usb: fix reception on "USBcan Pro" and "USBcan R" type hardware.
  net_sched: restore "overhead xxx" handling
  net: force a reload of first item in hlist_nulls_for_each_entry_rcu
  hyperv: Fix vlan_proto setting in netvsc_recv_callback()
  team: fix port list dump for big number of ports
  list: introduce list_first_entry_or_null
  ...
2013-06-05 19:19:04 +09:00
Eric Dumazet
5343a7f8be net_sched: htb: do not mix 1ns and 64ns time units
commit 56b765b79 ("htb: improved accuracy at high rates") added another
regression for low rates, because it mixes 1ns and 64ns time units.

So the maximum delay (mbuffer) was not 60 second, but 937 ms.

Lets convert all time fields to 1ns as 64bit arches are becoming the
norm.

Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 17:44:07 -07:00
Amerigo Wang
00f97da17a netpoll: fix position of network header
Similar to the problem in pktgen, netpoll uses skb_tail_offset()
too, as the code is copied from pktgen.

Also use return values of skb_put() directly, this will simiplify
the code.

Reported-by: Thomas Graf <tgraf@suug.ch>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Daniel Borkmann <dborkmann@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 17:37:04 -07:00
Thomas Graf
525cebedb3 pktgen: Fix position of ip and udp header
skb_set_network_header() expects an offset based on the data pointer
whereas skb_tail_offset() also includes the headroom. This resulted
in the ip header being written in a wrong location.

Use return values of skb_put() directly and rely on skb->len to
set mac, network, and transport header.

Cc: Simon Horman <horms@verge.net.au>
Cc: Daniel Borkmann <dborkmann@redhat.com>
Assisted-by: Daniel Borkmann <dborkmann@redhat.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Daniel Borkmann <dborkmann@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 17:35:45 -07:00
Pablo Neira
5e71d9d77c net: fix sk_buff head without data area
Eric Dumazet spotted that we have to check skb->head instead
of skb->data as skb->head points to the beginning of the
data area of the skbuff. Similarly, we have to initialize the
skb->head pointer, not skb->data in __alloc_skb_head.

After this fix, netlink crashes in the release path of the
sk_buff, so let's fix that as well.

This bug was introduced in (0ebd0ac net: add function to
allocate sk_buff head without data area).

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 17:26:49 -07:00
Yan Burman
600fed5e97 net/ethtool: Fix comment regarding location of dev_ethtool() call
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 17:10:03 -07:00
Cong Wang
c26d6b46da ping: always initialize ->sin6_scope_id and ->sin6_flowinfo
If we don't need scope id, we should initialize it to zero.
Same for ->sin6_flowinfo.

Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 16:58:42 -07:00
Gao feng
534c877928 ipv6: assign rt6_info to inet6_ifaddr in init_loopback
Commit 25fb6ca4ed
"net IPv6 : Fix broken IPv6 routing table after loopback down-up"
forgot to assign rt6_info to the inet6_ifaddr.
When disable the net device, the rt6_info which allocated
in init_loopback will not be destroied in __ipv6_ifa_notify.

This will trigger the waring message below
[23527.916091] unregister_netdevice: waiting for tap0 to become free. Usage count = 1

Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 16:57:41 -07:00
Baruch Siach
430f03cde2 net: mark netdev_create_hash __net_init
netdev_create_hash() is only called from netdev_init() which is marked
__net_init.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 16:47:27 -07:00
Jean Sacren
4960c2c6fa Kconfig: remove dangling references to the deleted file
Commit 202dc3fc59 (Documentation: remove
obsolete networking/multicast.txt file) deleted the obsolete file. After
the file has been removed, clean up a couple of places where references
to the deleted file were made so that users wouldn't be confused when
they consult the Help menu.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 15:17:39 -07:00
Jean Sacren
ebd4687af7 xfrm: simplify the exit path of xfrm_output_one()
Clean up unnecessary assignment and jump. While there, fix up the label
name.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 15:17:38 -07:00
Lorenzo Colitti
d862e54614 net: ipv6: Implement /proc/net/icmp6.
The format is based on /proc/net/icmp and /proc/net/{udp,raw}6.

Compiles and displays reasonable results with CONFIG_IPV6={n,m,y}
Couldn't figure out how to test without CONFIG_PROC_FS enabled.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 12:56:14 -07:00
Lorenzo Colitti
8cc785f6f4 net: ipv4: make the ping /proc code AF-independent
Introduce a ping_seq_afinfo structure (similar to its UDP
equivalent) and use it to make some of the ping /proc functions
address-family independent. Rename the remaining ping /proc
functions from ping_* to ping_v4_*.

Compiles and displays reasonable results with CONFIG_IPV6={n,m,y}

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 12:56:14 -07:00
Lorenzo Colitti
17ef66afc0 net: ipv6: Unify {raw,udp}6_sock_seq_show.
udp6_sock_seq_show and raw6_sock_seq_show are identical, except
the UDP version displays ports and the raw version displays the
protocol. Refactor most of the code in these two functions into
a new common ip6_dgram_sock_seq_show function, in preparation
for using it to display ICMPv6 sockets as well.

Also reduce the indentation in parts of include/net/transp_v6.h
to improve readability.

Compiles and displays reasonable results with CONFIG_IPV6={n,m,y}

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 12:56:14 -07:00
Cong Wang
9a99d4a50c icmp: avoid allocating large struct on stack
struct icmp_bxm is a large struct, reduce stack usage
by allocating it on heap.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-03 00:28:44 -07:00
Rami Rosen
08578d8d4e ] icmp: fix icmp_unreach() comment.
ICMP_PARAMETERPROB is handled by icmp_unreach(); This patch adds
ICMP_PARAMETERPROB to the list of ICMP message types handled by icmp_unreach().

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-03 00:27:15 -07:00
Timo Teräs
5aad1de5ea ipv4: use separate genid for next hop exceptions
commit 13d82bf5 (ipv4: Fix flushing of cached routing informations)
added the support to flush learned pmtu information.

However, using rt_genid is quite heavy as it is bumped on route
add/change and multicast events amongst other places. These can
happen quite often, especially if using dynamic routing protocols.

While this is ok with routes (as they are just recreated locally),
the pmtu information is learned from remote systems and the icmp
notification can come with long delays. It is worthy to have separate
genid to avoid excessive pmtu resets.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-03 00:07:43 -07:00
Timo Teräs
f016229e30 ipv4: rate limit updating of next hop exceptions with same pmtu
The tunnel devices call update_pmtu for each packet sent, this causes
contention on the fnhe_lock. Ignore the pmtu update if pmtu is not
actually changed, and there is still plenty of time before the entry
expires.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-03 00:07:43 -07:00
Timo Teräs
387aa65a89 ipv4: properly refresh rtable entries on pmtu/redirect events
This reverts commit 05ab86c5 (xfrm4: Invalidate all ipv4 routes on
IPsec pmtu events). Flushing all cached entries is not needed.

Instead, invalidate only the related next hop dsts to recheck for
the added next hop exception where needed. This also fixes a subtle
race due to bumping generation id's before updating the pmtu.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-03 00:07:42 -07:00
Eric Dumazet
01cb71d2d4 net_sched: restore "overhead xxx" handling
commit 56b765b79 ("htb: improved accuracy at high rates")
broke the "overhead xxx" handling, as well as the "linklayer atm"
attribute.

tc class add ... htb rate X ceil Y linklayer atm overhead 10

This patch restores the "overhead xxx" handling, for htb, tbf
and act_police

The "linklayer atm" thing needs a separate fix.

Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vimalkumar <j.vimal@gmail.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-02 22:22:35 -07:00
Paul Moore
e4c1721642 xfrm: force a garbage collection after deleting a policy
In some cases after deleting a policy from the SPD the policy would
remain in the dst/flow/route cache for an extended period of time
which caused problems for SELinux as its dynamic network access
controls key off of the number of XFRM policy and state entries.
This patch corrects this problem by forcing a XFRM garbage collection
whenever a policy is sucessfully removed.

Reported-by: Ondrej Moris <omoris@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 17:30:07 -07:00
Nicolas Dichtel
32b8a8e59c sit: add IPv4 over IPv4 support
This patch adds the support of IPv4 over Ipv4 for the module sit. The gain of
this feature is to be able to have 4in4 and 6in4 over the same interface
instead of having one interface for 6in4 and another for 4in4 even if
encapsulation addresses are the same.

To avoid conflicting with ipip module, sit IPv4 over IPv4 protocol is
registered with a smaller priority.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 17:19:05 -07:00
Nicolas Dichtel
bf3d6a8f79 iptunnel: specify protocol outside IP header
Before this patch, ip_tunnel_xmit() was using the field protocol from the IP
header passed into argument.
There is no functional change, this patch prepares the support of IPv4 over
IPv4 for module sit.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 17:19:05 -07:00
Pravin B Shelar
1e2bd517c1 udp6: Fix udp fragmentation for tunnel traffic.
udp6 over GRE tunnel does not work after to GRE tso changes. GRE
tso handler passes inner packet but keeps track of outer header
start in SKB_GSO_CB(skb)->mac_offset.  udp6 fragment need to
take care of outer header, which start at the mac_offset, while
adding fragment header.
This bug is introduced by commit 68c3316311 (GRE: Add TCP
segmentation offload for GRE).

Reported-by: Dmitry Kravkov <dkravkov@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Tested-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 17:06:07 -07:00
Cong Wang
35d0461061 net: clean up skb headers code
commit 1a37e412a0 (net: Use 16bits for *_headers
fields of struct skbuff) converts skb->*_header to u16,
some #if NET_SKBUFF_DATA_USES_OFFSET are now useless,
and to be safe, we could just use "X = (typeof(X)) ~0U;"
as suggested by David.

Cc: David S. Miller <davem@davemloft.net>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 17:02:47 -07:00
Jay Vosburgh
b190a50875 net/core: dev_mc_sync_multiple calls wrong helper
The dev_mc_sync_multiple function is currently calling
__hw_addr_sync, and not __hw_addr_sync_multiple.  This will result in
addresses only being synced to the first device from the set.

	Corrected by calling the _multiple variant.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Tested-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 16:56:56 -07:00
Jay Vosburgh
29ca2f8fcc net/core: __hw_addr_sync_one / _multiple broken
Currently, __hw_addr_sync_one is called in a loop by
__hw_addr_sync_multiple to sync each of a "from" device's hw addresses
to a "to" device.  __hw_addr_sync_one calls __hw_addr_add_ex to attempt
to add each address.  __hw_addr_add_ex is called with global=false, and
sync=true.

	__hw_addr_add_ex checks to see if the new address matches an
address already on the list.  If so, it tests global and sync.  In this
case, sync=true, and it then checks if the address is already synced,
and if so, returns 0.

	This 0 return causes __hw_addr_sync_one to increment the sync_cnt
and refcount for the "from" list's address entry, even though the address
is already synced and has a reference and sync_cnt.  This will cause
the sync_cnt and refcount to increment without bound every time an
addresses is added to the "from" device and synced to the "to" device.

	The fix here has two parts:

	First, when __hw_addr_add_ex finds the address already exists
and is synced, return -EEXIST instead of 0.

	Second, __hw_addr_sync_one checks the error return for -EEXIST,
and if so, it (a) does not add a refcount/sync_cnt, and (b) returns 0
itself so that __hw_addr_sync_multiple will not return an error.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Tested-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 16:56:56 -07:00
Jay Vosburgh
60ba834c2f net/core: __hw_addr_unsync_one "from" address not marked synced
When an address is added to a subordinate interface (the "to"
list), the address entry in the "from" list is not marked "synced" as
the entry added to the "to" list is.

	When performing the unsync operation (e.g., dev_mc_unsync),
__hw_addr_unsync_one calls __hw_addr_del_entry with the "synced"
parameter set to true for the case when the address reference is being
released from the "from" list.  This causes a test inside to fail,
with the result being that the reference count on the "from" address
is not properly decremeted and the address on the "from" list will
never be freed.

	Correct this by having __hw_addr_unsync_one call the
__hw_addr_del_entry function with the "sync" flag set to false for the
"remove from the from list" case.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Tested-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 16:56:56 -07:00
Jay Vosburgh
9747ba6636 net/core: __hw_addr_create_ex does not initialize sync_cnt
The sync_cnt field is not being initialized, which can result
in arbitrary values in the field.  Fixed by initializing it to zero.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Tested-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 16:56:56 -07:00
Simon Horman
938177e9f3 netfilter: Correct calculation using skb->tail and skb-network_header
This corrects an regression introduced by "net: Use 16bits for *_headers
fields of struct skbuff" when NET_SKBUFF_DATA_USES_OFFSET is not set. In
that case skb->tail will be a pointer whereas skb->network_header
will be an offset from head. This is corrected by using wrappers that
ensure that calculations are always made using pointers.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 16:38:25 -07:00
Nicolas Dichtel
fda3f402f4 snmp6: remove IPSTATS_MIB_CSUMERRORS
This stat is not relevant in IPv6, there is no checksum in IPv6 header.
Just leave a comment to explain the hole.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 16:26:49 -07:00
Eric Dumazet
db8caf3dbc gro: should aggregate frames without DF
GRO on IPv4 doesn't aggregate frames if they don't have DF bit set.

Some servers use IP_MTU_DISCOVER/IP_PMTUDISC_PROBE, so linux receivers
are unable to aggregate this kind of traffic.

The right thing to do is to allow aggregation as long as the DF bit has
same value on all segments.

bnx2x LRO does this correctly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 16:25:56 -07:00
David Majnemer
c3f1dbaf6e net: Update RFS target at poll for tcp/udp
The current state of affairs is that read()/write() will setup
RFS (Receive Flow Steering) for internet protocol sockets while
poll()/epoll() does not.

When poll() gets called with a TCP or UDP socket, we should update
the flow target.

This permits to RFS (if enabled) to select the appropriate CPU for
following incoming packets.

Note: Only connected UDP sockets can benefit from RFS.

Signed-off-by: David Majnemer <majnemer@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 16:24:43 -07:00
David S. Miller
fb68e2f438 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
Please pull this batch of fixes intended for the 3.10 stream...

Regarding the NFC bits, Samuel says:

"This is the first batch of NFC fixes for 3.10, and it contains:

- 3 fixes for the NFC MEI support:
        * We now depend on the correct Kconfig symbol.
        * We register an MEI event callback whenever we enable an NFC device,
          otherwise we fail to read anything after an enable/disable cycle.
        * We only disable an MEI device from its disable mey_phy_ops,
          preventing useless consecutive disable calls.

- An NFC Makefile cleanup, as I forgot to remove a commented out line when
  moving the LLCP code to the NFC top level directory."

As for the mac80211 bits, Johannes says:

"This time I have a fix from Stanislaw for a stupid mistake I made in the
auth/assoc timeout changes, a fix from Felix for 64-bit traffic counters
and one from Helmut for address mask handling in mac80211. I also have a
few fixes myself for four different crashes reported by a few people."

And Johannes says this about the iwlwifi bit:

"This fixes a brown paper-bag bug that we really should've caught in
review. More details in the changelog for the fix."

On top of that...

Arend van Spriel and Hante Meuleman cooperate to send a series of AP
and P2P mode fixes for brcmfmac.

Gabor Juhos corrects a register offset for AR9550, avoiding a bus
error.

Dan Carpenter provides a fixup to some dmesg output in the atmel
driver.

And, finally...

Felix Fietkau not only gives us a trio of small AR934x fixes, but
also refactors the ath9k aggregation session start/stop handling
(using the generic mac80211 support) in order to avoid a deadlock.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 02:38:19 -07:00
Simon Horman
aef6de511a sctp: Correct byte order of access to skb->{network, transport}_header
Corrects an byte order conflict introduced by "sctp: Correct access to
skb->{network, transport}_header". All the values in question are host
byte order.

Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 01:41:36 -07:00
Yuchung Cheng
c7d9d6a185 tcp: undo on DSACK during recovery
If the receiver supports DSACK, sender can detect false recoveries and
revert cwnd reductions triggered by either severe network reordering or
concurrent reordering and loss event.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-30 18:06:11 -07:00
Yuchung Cheng
7026b912f9 tcp: fix undo on partial ack in recovery
Upon detecting spurious fast retransmit via timestamps during recovery,
use PRR to clock out new data packet instead of retransmission. Once
all retransmission are proven spurious, the sender then reverts the
cwnd reduction and congestion state to open or disorder.

The current code does the opposite: it undoes cwnd as soon as any
retransmission is spurious and continues to retransmit until all
data are acked. This nullifies the point to undo the cwnd because
the sender is still retransmistting spuriously. This patch fixes
it. The undo_ssthresh argument of tcp_undo_cwnd_reductiuon() is no
longer needed and is removed.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-30 18:06:11 -07:00
Yuchung Cheng
6a63df46a7 tcp: refactor undo functions
Refactor and relocate various functions or variables to prepare the
undo fix.  Remove some unused function arguments. Rename tcp_undo_cwr
to tcp_undo_cwnd_reduction to be consistent with the rest of
CWR related function names.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-30 18:06:11 -07:00
Yuchung Cheng
6804973ffb tcp: consolidate PRR packet accounting
This patch series fixes an undo bug in fast recovery: the sender
mistakenly undos the cwnd too early but continues fast retransmits
until all pending data are acked. This also multiplies the SNMP
stat PARTIALUNDO events by the degree of the network reordering.

The first patch prepares the fix by consolidating the accounting
of newly_acked_sacked in tcp_cwnd_reduction(), instead of updating
newly_acked_sacked everytime sacked_out is adjusted.  Also pass
acked and prior_unsacked as const type because they are readonly
in the rest of recovery processing.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-30 18:06:11 -07:00
Linus Torvalds
4203afc3fb Merge branch 'for-3.10' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
 "A couple minor fixes for the (new to 3.10) gss-proxy code.

  And one regression from user-namespace changes.  (XBMC clients were
  doing something admittedly weird--sending -1 gid's--but something that
  we used to allow.)"

* 'for-3.10' of git://linux-nfs.org/~bfields/linux:
  svcrpc: fix failures to handle -1 uid's and gid's
  svcrpc: implement O_NONBLOCK behavior for use-gss-proxy
  svcauth_gss: fix error code in use_gss_proxy()
2013-05-31 09:48:56 +09:00
David S. Miller
73ce00d4d6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter/IPVS fixes for 3.10-rc3,
they are:

* fix xt_addrtype with IPv6, from Florian Westphal. This required
  a new hook for IPv6 functions in the netfilter core to avoid
  hard dependencies with the ipv6 subsystem when this match is
  only used for IPv4.

* fix connection reuse case in IPVS. Currently, if an reused
  connection are directed to the same server. If that server is
  down, those connection would fail. Therefore, clear the
  connection and choose a new server among the available ones.

* fix possible non-nul terminated string sent to user-space if
  ipt_ULOG is used as the default netfilter logging stub, from
  Chen Gang.

* fix mark logging of IPv6 packets in xt_LOG, from Michal Kubecek.
  This bug has been there since 2.6.26.

* Fix breakage ip_vs_sh due to incorrect structure layout for
  RCU, from Jan Beulich.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-30 16:38:38 -07:00