33586 Commits

Author SHA1 Message Date
Alexei Starovoitov
2695fb552c net: filter: rename 'struct sock_filter_int' into 'struct bpf_insn'
eBPF is used by socket filtering, seccomp and soon by tracing and
exposed to userspace, therefore 'sock_filter_int' name is not accurate.
Rename it to 'bpf_insn'

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-24 23:27:17 -07:00
Himangi Saraogi
e40f5c7234 net_sched: remove exceptional & on function name
In this file, function names are otherwise used as pointers without &.

A simplified version of the Coccinelle semantic patch that makes this
change is as follows:

// <smpl>
@r@
identifier f;
@@

f(...) { ... }

@@
identifier r.f;
@@

- &f
+ f
// </smpl>
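
As a small illustration of why the semantic patch above is safe: in C a
function name used as a value converts to a pointer to that function, so
`&f` and `f` give the same pointer. The sketch below uses a hypothetical
ops table and handler (`demo_ops`, `double_it`); neither name comes from
the patched files.

```c
#include <assert.h>

/* Hypothetical callback type and handler, for illustration only. */
typedef int (*handler_fn)(int);

static int double_it(int x)
{
	return 2 * x;
}

/* A hypothetical ops table, similar in spirit to kernel ops structs. */
struct demo_ops {
	handler_fn with_amp;
	handler_fn without_amp;
};

static const struct demo_ops ops = {
	.with_amp    = &double_it, /* explicit address-of */
	.without_amp = double_it,  /* function name converts to a pointer */
};

/* Calls the handler through both members; they must be the same pointer. */
static int call_both(int x)
{
	assert(ops.with_amp == ops.without_amp);
	return ops.with_amp(x) + ops.without_amp(x);
}
```

Since both spellings compile to the same thing, dropping the `&` is purely
a consistency cleanup.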

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-24 23:23:32 -07:00
Himangi Saraogi
56ec0fb10c neigh: remove exceptional & on function name
In this file, function names are otherwise used as pointers without &.

A simplified version of the Coccinelle semantic patch that makes this
change is as follows:

// <smpl>
@r@
identifier f;
@@

f(...) { ... }

@@
identifier r.f;
@@

- &f
+ f
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-24 23:23:31 -07:00
Himangi Saraogi
179542a548 igmp: remove exceptional & on function name
In this file, function names are otherwise used as pointers without &.

A simplified version of the Coccinelle semantic patch that makes this
change is as follows:

// <smpl>
@r@
identifier f;
@@

f(...) { ... }

@@
identifier r.f;
@@

- &f
+ f
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-24 23:23:31 -07:00
Andy Zhou
d9e0ecb814 openvswitch: Add skb_clone NULL check for the sampling action.
Fix a bug where skb_clone() NULL check is missing in sample action
implementation.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-07-24 09:37:22 -07:00
Simon Horman
651887b0c2 openvswitch: Sample action without side effects
The sample action is rather generic, allowing arbitrary actions to be
executed based on a probability. However its use, within the Open vSwitch
code-base is limited: only a single user-space action is ever nested.

A consequence of the current implementation of sample actions is that,
depending on whether the sample action executed (due to its probability),
any side-effects of nested actions may or may not be present before
executing subsequent actions.  This has the potential to complicate
verification of valid actions by the (kernel) datapath. Indeed, adding
support for push and pop MPLS actions inside sample actions is one
such case.

In order to allow all supported actions to continue to be nested
inside sample actions without the potential need for complex
verification code, this patch changes the implementation of the sample
action in the kernel datapath so that sample actions are more like
a function call and any side effects of nested actions are not
present when executing subsequent actions.

With the above in mind the motivation for this change is twofold:

* To contain the side-effects of the sample action, in the hope of making
  it easier to deal with in the future; and
* To avoid some rather complex verification code introduced in the MPLS
  datapath patch.
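
The "more like a function call" idea can be sketched as running the nested
action on a private copy of the packet. This is only an illustration under
assumptions: `struct packet`, `push_mpls()` and `sample()` are hypothetical
names, and the probability draw is replaced by a `taken` flag so the
behaviour is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet state; the real datapath works on sk_buffs. */
struct packet {
	int mpls_depth;
};

typedef void (*action_fn)(struct packet *);

/* Hypothetical nested action with a visible side effect. */
static void push_mpls(struct packet *p)
{
	p->mpls_depth++;
}

/*
 * Run the nested action on a copy, so the caller's packet is unchanged
 * whether or not the sample was taken. Returns true if it was taken.
 */
static bool sample(const struct packet *pkt, bool taken,
		   action_fn nested, struct packet *out)
{
	assert(pkt != NULL && nested != NULL && out != NULL);

	if (!taken)
		return false;

	*out = *pkt;   /* private copy for the nested action */
	nested(out);   /* side effects land on the copy only */
	return true;
}
```

With this shape, actions that follow the sample action always see the same
packet state, which is what removes the need for the extra verification.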

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-07-24 09:37:21 -07:00
Andy Zhou
f53e38317d openvswitch: Avoid memory corruption in queue_userspace_packet()
In queue_userspace_packet(), the ovs_nla_put_flow return value is
not checked. This is fine as long as key_attr_size() returns the
correct value. In case it does not, the current code may corrupt buffer
memory. Add a run-time assertion to catch this case and avoid silent
failure.

Reported-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-07-24 09:37:20 -07:00
Pravin B Shelar
f6eec614d2 openvswitch: Enable tunnel GSO for OVS bridge.
The following patch enables all available tunnel GSO features for the OVS
bridge device so that OVS can use the hardware offloads available to the
underlying device.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-07-24 01:15:04 -07:00
Alex Wang
5cd667b0a4 openvswitch: Allow each vport to have an array of 'port_id's.
In order to allow handlers to read upcalls directly from the datapath,
we need to support a per-handler netlink socket for each vport in the
datapath.  This commit makes this happen.  Also, it is guaranteed
to be backward compatible with the previous branch.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-07-24 01:15:04 -07:00
Alexei Starovoitov
f5bffecda9 net: filter: split filter.c into two files
BPF is used in several kernel components. This split creates a logical
boundary between the generic eBPF core and the rest:

kernel/bpf/core.c: eBPF interpreter

net/core/filter.c: classic->eBPF converter, classic verifiers, socket filters

This patch only moves functions.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-23 21:06:22 -07:00
Quentin Armitage
f5220d6399 ipv4: Make IP_MULTICAST_ALL and IP_MSFILTER work on raw sockets
Currently, although IP_MULTICAST_ALL and IP_MSFILTER ioctl calls succeed on
raw sockets, there is no code to implement the functionality on received
packets; it is only implemented for UDP sockets. The raw(7) man page states:
"In addition, all ip(7) IPPROTO_IP socket options valid for datagram sockets
are supported", which implies these ioctls should work on raw sockets.

To fix this, add a call to ip_mc_sf_allow on raw sockets.

This should not break any existing code, since the current position of
not calling ip_mc_sf_allow makes it behave as if neither the IP_MULTICAST_ALL
nor the IP_MSFILTER ioctl had been called. Adding the call to ip_mc_sf_allow
will therefore maintain the current behaviour so long as IP_MULTICAST_ALL and
IP_MSFILTER ioctls are not called. Any code that currently is calling
IP_MULTICAST_ALL or IP_MSFILTER ioctls on raw sockets presumably is wanting
the filter to be applied, although no filtering will currently be occurring.

Signed-off-by: Quentin Armitage <quentin@armitage.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-23 15:13:26 -07:00
Sorin Dumitru
274f482d33 sock: remove skb argument from sk_rcvqueues_full
It hasn't been used since commit 0fd7bac(net: relax rcvbuf limits).

Signed-off-by: Sorin Dumitru <sorin@returnze.ro>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-23 13:23:06 -07:00
David Laight
526cbef778 net: sctp: Rename SCTP_XMIT_NAGLE_DELAY to SCTP_XMIT_DELAY
MSG_MORE and 'corking' a socket would require that the transmit of
a data chunk be delayed.
Rename the return value to be less specific.

Signed-off-by: David Laight <david.laight@aculab.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22 13:32:11 -07:00
David Laight
723189faca net: sctp: Open out the check for Nagle
The check for Nagle contains 6 separate checks all of which must be true
before a data packet is delayed.
Separate out each into its own 'if (test) return SCTP_XMIT_OK' so that
the reasons can be individually described.

Also return directly with SCTP_XMIT_RWND_FULL.
Delete the now-unused 'retval' variable and 'finish' label from
sctp_packet_can_append_data().
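
The early-return shape described above might look roughly like the sketch
below. It is not the SCTP code: the struct, the field names and the three
conditions shown are assumptions chosen for illustration, and the real
function has six tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes and state; not the kernel's definitions. */
enum xmit_result { XMIT_OK, XMIT_DELAY };

struct send_state {
	bool   nodelay;     /* user disabled Nagle */
	size_t inflight;    /* bytes sent but not yet acknowledged */
	size_t chunk_len;   /* size of the chunk being queued */
	size_t max_payload; /* largest chunk that fits in one packet */
};

static enum xmit_result can_send_now(const struct send_state *s)
{
	assert(s != NULL);

	/* Nagle is switched off, so never delay. */
	if (s->nodelay)
		return XMIT_OK;

	/* Nothing is in flight, so there is no ACK to wait for. */
	if (s->inflight == 0)
		return XMIT_OK;

	/* The chunk already fills a packet; delaying gains nothing. */
	if (s->chunk_len >= s->max_payload)
		return XMIT_OK;

	/* No reason to send immediately was found: hold the chunk back. */
	return XMIT_DELAY;
}
```

Each test gets its own comment, which is the point of opening out the
single combined condition.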

Signed-off-by: David Laight <david.laight@aculab.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22 13:32:10 -07:00
David S. Miller
8fd90bb889 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/infiniband/hw/cxgb4/device.c

The cxgb4 conflict was simply overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22 00:44:59 -07:00
Ursula Braun
1042cab862 af_iucv: avoid path quiesce of severed path in shutdown()
An af_iucv stress test showed -EPIPE results for sendmsg()
calls. They are caused by quiescing a path even though it has
been already severed by peer. For IUCV transport shutdown()
consists of 2 steps:
(1) sending the shutdown message to peer
(2) quiescing the iucv path
If the iucv path between these 2 steps is severed due to peer
closing the path, the quiesce step is no longer needed.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reported-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21 20:21:40 -07:00
David S. Miller
850717ef00 Included fixes:
- recognise and drop Bridge Loop Avoidance packets even if
  they are encapsulated in the 802.1q header multiple times.
  Forwarding them into the mesh creates issues on other
  nodes.
- properly handle VLAN private objects in order to avoid race
  conditions upon fast VLAN deletion-addition. Such conditions
  create an unrecoverable inconsistency in the TT database of
  the nodes.

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
pull request [net]: batman-adv 20140721

here you have two fixes that we have been testing for quite some time
(this is why they arrived a bit late in the rc cycle).

Patch 1) ensures that BLA packets get dropped and not forwarded to the
mesh even if they reach batman-adv within QinQ frames. Forwarding them
into the mesh means messing up the TT database of other nodes, which
can generate all kinds of unexpected behaviours during route computation.

Patch 2) avoids a couple of race conditions triggered upon fast VLAN
deletion-addition. Such race conditions are pretty dangerous because
they not only create inconsistencies in the TT database of the nodes
in the network, but such a scenario is also unrecoverable (unless
nodes are rebooted).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21 20:19:09 -07:00
Eric Dumazet
10ec9472f0 ipv4: fix buffer overflow in ip_options_compile()
There is a benign buffer overflow in ip_options_compile spotted by
AddressSanitizer[1] :

It's benign because we always can access one extra byte in skb->head
(because header is followed by struct skb_shared_info), and in this case
this byte is not even used.

[28504.910798] ==================================================================
[28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile
[28504.913170] Read of size 1 by thread T15843:
[28504.914026]  [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0
[28504.915394]  [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120
[28504.916843]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
[28504.918175]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
[28504.919490]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
[28504.920835]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
[28504.922208]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
[28504.923459]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
[28504.924722]
[28504.925106] Allocated by thread T15843:
[28504.925815]  [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120
[28504.926884]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
[28504.927975]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
[28504.929175]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
[28504.930400]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
[28504.931677]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
[28504.932851]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
[28504.934018]
[28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right
[28504.934377]  of 40-byte region [ffff880026382800, ffff880026382828)
[28504.937144]
[28504.937474] Memory state around the buggy address:
[28504.938430]  ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr
[28504.939884]  ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.941294]  ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
[28504.942504]  ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.943483]  ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
[28504.945573]                         ^
[28504.946277]  ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.094949]  ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.096114]  ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.097116]  ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.098472]  ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.099804] Legend:
[28505.100269]  f - 8 freed bytes
[28505.100884]  r - 8 redzone bytes
[28505.101649]  . - 8 allocated bytes
[28505.102406]  x=1..7 - x allocated bytes + (8-x) redzone bytes
[28505.103637] ==================================================================

[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21 20:16:26 -07:00
Antonio Quartulli
35df3b298f batman-adv: fix TT VLAN inconsistency on VLAN re-add
When a VLAN interface (on top of batX) is removed and
re-added within a short timeframe, TT does not have enough
time to properly clean up. This creates an internal TT state
mismatch as the newly created softif_vlan will be
initialized from scratch with a TT client count of zero
(even if TT entries for this VLAN still exist). The
resulting TT messages are bogus due to the counter / tt
client listing mismatch, thus creating inconsistencies on
every node in the network.

To fix this issue destroy_vlan() has to not free the VLAN
object immediately but it has to be kept alive until all the
TT entries for this VLAN have been removed. destroy_vlan()
still removes the sysfs folder so that the user has the
feeling that everything went fine.

If the same VLAN is re-added before the old object is freed,
then the latter is resurrected and re-used.

Implement such behaviour by increasing the reference counter
of a softif_vlan object every time a new local TT entry for
such VLAN is created and remove the object from the list
only when all the TT entries have been destroyed.
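
The lifetime rule above can be sketched with a plain reference counter.
This is a simplified model, not the batman-adv code: `struct vlan` and all
function names below are hypothetical, and locking and RCU are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a softif_vlan object. */
struct vlan {
	int  refcount;      /* one for the interface, one per local TT entry */
	bool listed;        /* still on the VLAN list */
	bool sysfs_visible; /* sysfs folder present */
};

static void vlan_create(struct vlan *v)
{
	v->refcount = 1;
	v->listed = true;
	v->sysfs_visible = true;
}

/* Drop one reference; unlist the object when the last one goes away. */
static void vlan_put(struct vlan *v)
{
	assert(v->refcount > 0);
	if (--v->refcount == 0)
		v->listed = false;
}

/* Each local TT entry pins the VLAN object it belongs to. */
static void tt_entry_add(struct vlan *v) { v->refcount++; }
static void tt_entry_del(struct vlan *v) { vlan_put(v); }

/* Removing the interface hides it at once but keeps the object alive. */
static void vlan_destroy(struct vlan *v)
{
	v->sysfs_visible = false;
	vlan_put(v);
}

/* Re-adding reuses the old object if TT entries still pin it. */
static bool vlan_readd(struct vlan *v)
{
	if (!v->listed)
		return false; /* caller must create a fresh object */
	v->refcount++;
	v->sysfs_visible = true;
	return true;
}
```

In this model the client count and the TT entries can never disagree,
because the object that holds the count outlives every entry that refers
to it.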

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-07-21 09:49:30 +02:00
Simon Wunderlich
d46b6bfa76 batman-adv: drop QinQ claim frames in bridge loop avoidance
Since bridge loop avoidance only supports untagged or simple 802.1q
tagged VLAN claim frames, claim frames with stacked VLAN headers (QinQ)
should be detected and dropped. Transporting them over the mesh may cause
problems on the receivers, or create bogus entries in the local tt
tables.

Reported-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-07-21 09:05:31 +02:00
Ben Hutchings
640d7efe4c dns_resolver: Null-terminate the right string
*_result[len] is parsed as *(_result[len]) which is not at all what we
want to touch here.
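
To make the precedence issue concrete: postfix `[]` binds tighter than
unary `*`, so the buffer has to be dereferenced first. The helper below is
a hypothetical user-space stand-in (`copy_result()` is not a kernel
function) that returns a buffer through an out-parameter in the same way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a function that hands back a freshly
 * allocated, null-terminated copy through an out-parameter.
 */
static int copy_result(const char *src, size_t len, char **_result)
{
	assert(src != NULL && _result != NULL);

	*_result = malloc(len + 1);
	if (!*_result)
		return -1;
	memcpy(*_result, src, len);

	/*
	 * Wrong:  *_result[len] = '\0';
	 * That parses as *(_result[len]): it indexes past the caller's
	 * single pointer and then dereferences whatever it finds there.
	 */
	(*_result)[len] = '\0'; /* right: index into the buffer itself */
	return 0;
}
```

The caller owns the returned buffer and is expected to free() it.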

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 84a7c0b1db ("dns_resolver: assure that dns_query() result is null-terminated")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20 22:33:32 -07:00
Wei Yongjun
52f50ce556 tipc: fix sparse non static symbol warnings
Fixes the following sparse warnings:

net/tipc/socket.c:545:5: warning:
 symbol 'tipc_sk_proto_rcv' was not declared. Should it be static?
net/tipc/socket.c:2015:5: warning:
 symbol 'tipc_ioctl' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20 22:19:04 -07:00
Andrey Utkin
fa4eff44a6 net/rxrpc/ar-key.c: drop negativity check on unsigned value
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=80611
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20 21:25:56 -07:00
David S. Miller
a8138f42d4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains updates for your net-next tree,
they are:

1) Use kvfree() helper function from x_tables, from Eric Dumazet.

2) Remove extra timer from the conntrack ecache extension, use a
   workqueue instead to redeliver lost events to userspace,
   from Florian Westphal.

3) Removal of the ulog targets for ebtables and iptables. The nflog
   infrastructure superseded this almost 9 years ago, time to get rid
   of this code.

4) Replace the list of loggers by an array now that we can only have
   two possible non-overlapping logger flavours, ie. kernel ring buffer
   and netlink logging.

5) Move Eric Dumazet's log buffer code to nf_log to reuse it from
   all of the supported per-family loggers.

6) Consolidate nf_log_packet() as a unified interface for packet logging.
   After this patch, if the struct nf_loginfo is available, it explicitly
   selects the logger that is used.

7) Move ip and ip6 logging code from xt_LOG to the corresponding
   per-family loggers. Thus, x_tables and nf_tables share the same code
   for packet logging.

8) Add generic ARP packet logger, which is used by nf_tables. The
   format aims to be consistent with the output of xt_LOG.

9) Add generic bridge packet logger. Again, this is used by nf_tables
   and it routes the packets to the real family loggers. As a result,
   we get consistent logging format for the bridge family. The ebt_log
   logging code has been intentionally left in place not to break
   backward compatibility since the logging output differs from xt_LOG.

10) Update nft_log to explicitly request the required family logger when
    needed.

11) Finish nft_log so it supports arp, ip, ip6, bridge and inet families.
    Allowing selection between netlink and kernel buffer ring logging.

12) Several fixes coming after the netfilter core logging changes spotted
    by robots.

13) Use IS_ENABLED() macros whenever possible in the netfilter tree,
    from Duan Jiong.

14) Removal of a couple of unnecessary branches before kfree, from Fabian
    Frederick.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20 21:01:43 -07:00
Cong Wang
7801db8aec net_sched: avoid generating same handle for u32 filters
When the kernel generates a handle for a u32 filter, it tries to start
from the max in the bucket. So when we have a filter with the max (fff)
handle, the kernel always generates the same handle for new
filters. This can be shown by the following commands:

	tc qdisc add dev eth0 ingress
	tc filter add dev eth0 parent ffff: protocol ip pref 770 handle 800::fff u32 match ip protocol 1 0xff
	tc filter add dev eth0 parent ffff: protocol ip pref 770 u32 match ip protocol 1 0xff
	...

We will get some u32 filters with the same handle:

 # tc filter show dev eth0 parent ffff:
filter protocol ip pref 770 u32
filter protocol ip pref 770 u32 fh 800: ht divisor 1
filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
  match 00010000/00ff0000 at 8
filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
  match 00010000/00ff0000 at 8
filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
  match 00010000/00ff0000 at 8
filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
  match 00010000/00ff0000 at 8

Handles should be unique. This patch fixes it by looking up a bitmap of
the handles already in use, so that a new handle is unique whenever a
free one exists. For compatibility, we still start from 0x800.
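
A bitmap-based allocator of this kind could be sketched as follows. This
is a user-space illustration under assumptions, not the cls_u32 code:
`struct id_bitmap`, `id_set()` and `id_alloc()` are hypothetical names, and
the 12-bit id space is taken from the fff maximum mentioned above.

```c
#include <assert.h>
#include <limits.h>

#define NODE_COUNT (1u << 12) /* ids 0x000..0xfff */
#define WORD_BITS  (sizeof(unsigned long) * CHAR_BIT)
#define WORDS      (NODE_COUNT / WORD_BITS)

/* Hypothetical bitmap of ids that are already in use. */
struct id_bitmap {
	unsigned long bits[WORDS];
};

static int id_test(const struct id_bitmap *m, unsigned int id)
{
	return (m->bits[id / WORD_BITS] >> (id % WORD_BITS)) & 1ul;
}

static void id_set(struct id_bitmap *m, unsigned int id)
{
	assert(id < NODE_COUNT);
	m->bits[id / WORD_BITS] |= 1ul << (id % WORD_BITS);
}

/* First free id in [from, to), or 0 if the range is full. */
static unsigned int id_find_free(const struct id_bitmap *m,
				 unsigned int from, unsigned int to)
{
	unsigned int id;

	for (id = from; id < to; id++)
		if (!id_test(m, id))
			return id;
	return 0;
}

/*
 * Allocate an unused id. Search from 0x800 upwards first, then fall
 * back to 1..0x7ff. Returns 0 when every id is taken.
 */
static unsigned int id_alloc(struct id_bitmap *m)
{
	unsigned int id = id_find_free(m, 0x800, NODE_COUNT);

	if (!id)
		id = id_find_free(m, 1, 0x800);
	if (id)
		id_set(m, id);
	return id;
}
```

With a filter already holding fff, successive allocations in this sketch
return 800, 801 and so on instead of repeating fff.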

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20 20:49:17 -07:00
Veaceslav Falico
6fe82a39e5 net: print a notification on device rename
Currently it's done silently (from the kernel part), and thus it might be
hard to track the renames from logs.

Add a simple netdev_info() to notify the rename, but only in case the
previous name was valid.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Vlad Yasevich <vyasevic@redhat.com>
CC: stephen hemminger <stephen@networkplumber.org>
CC: Jerry Chu <hkchu@google.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20 20:44:25 -07:00
Veaceslav Falico
ccc7f4968a net: print net_device reg_state in netdev_* unless it's registered
This way we'll always know in what status the device is, unless it's
running normally (i.e. NETDEV_REGISTERED).

Also, emit a warning once in case of a bad reg_state.

CC: "David S. Miller" <davem@davemloft.net>
CC: Jason Baron <jbaron@akamai.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Vlad Yasevich <vyasevic@redhat.com>
CC: stephen hemminger <stephen@networkplumber.org>
CC: Jerry Chu <hkchu@google.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
CC: Joe Perches <joe@perches.com>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20 20:38:43 -07:00
Cong Wang
224e923cd9 net_sched: hold tcf_lock in netdevice notifier
We modify the mirred action (m->tcfm_dev) in the netdev event handler,
so we need to prevent on-going mirred actions from reading a freed
m->tcfm_dev. To do that, we need to acquire this spin lock.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20 20:31:42 -07:00
Anish Bhatt
c2659479f7 Update setapp/getapp prototypes in dcbnl_rtnl_ops to return int instead of u8
v2: fixed issue with checking return of dcbnl_rtnl_ops->getapp()

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-17 16:02:29 -07:00
Cong Wang
9cc63db5e1 net_sched: cancel nest attribute on failure in tcf_exts_dump()
Like other places, we need to cancel the nest attribute after
we start. Fortunately the netlink message will not be sent on
failure, so it's not a big problem at all.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-17 14:58:52 -07:00
stephen hemminger
48e48a70c0 openvswitch: make generic netlink group const
Generic netlink tables can be const.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 23:41:13 -07:00
David Held
2dc41cff75 udp: Use hash2 for long hash1 chains in __udp*_lib_mcast_deliver.
Many multicast sources can have the same port which can result in a very
large list when hashing by port only. Hash by address and port instead
if this is the case. This makes multicast more similar to unicast.

On a 24-core machine receiving from 500 multicast sockets on the same
port, before this patch 80% of system CPU was used up by spin locking
and only ~25% of packets were successfully delivered.

With this patch, all packets are delivered and kernel overhead is ~8%
system CPU on spinlocks.

Signed-off-by: David Held <drheld@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 23:29:52 -07:00
David Held
5cf3d46192 udp: Simplify __udp*_lib_mcast_deliver.
Switch to using sk_nulls_for_each which shortens the code and makes it
easier to update.

Signed-off-by: David Held <drheld@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 23:29:52 -07:00
Varka Bhadram
498044bb2b netlink: remove bool varible
This patch removes the bool variable 'pass'.
If a switch case matches, return true directly; otherwise return false.
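
The pattern is roughly the one sketched below. The enum and function name
are hypothetical and only show the shape of the change: the switch returns
directly instead of setting a flag that is returned at the end.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical message kinds, for illustration only. */
enum msg_kind { MSG_A, MSG_B, MSG_C };

/* Before: a local 'pass' flag was set in the switch and returned later. */
static bool kind_allowed(enum msg_kind kind)
{
	switch (kind) {
	case MSG_A:
	case MSG_B:
		return true;  /* a matching case returns immediately */
	default:
		return false; /* everything else is rejected */
	}
}
```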

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 23:15:00 -07:00
Alexander Duyck
c8a89c4a1d rtnetlink: Drop unnecessary return value from ndo_dflt_fdb_del
This change cleans up ndo_dflt_fdb_del to drop the ENOTSUPP return value since
that isn't actually returned anywhere in the code.  As a result we are able to
drop a few lines by just defaulting this to -EINVAL.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 23:13:26 -07:00
françois romieu
a40e0a664b net: remove open-coded skb_cow_head.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 22:42:32 -07:00
Jon Paul Maloy
6f92ee54b3 tipc: ensure sequential message delivery across dual bearers
When we run broadcast packets over dual bearers/interfaces, the
current transmission code is flipping bearers between each sent
packet, with the purpose of leveraging the double bandwidth
available. The receiving bclink is resequencing the packets if
needed, so all messages are delivered upwards from the broadcast
link in the correct order, even if they may arrive in concurrent
interrupts.

However, at the moment of delivery upwards to the socket, we release
all spinlocks (bclink_lock, node_lock), so it is still possible
that arriving messages bypass each other before they reach the socket
queue.

We fix this by applying the same technique we are using for unicast
traffic. We use a link selector (i.e., the last bit of sending port
number) to ensure that messages from the same sender socket always are
sent over the same bearer. This guarantees sequential delivery between
socket pairs, which is sufficient to satisfy the protocol spec, as well
as all known user requirements.
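
The selector idea can be sketched in a few lines. The function name and
the `BEARER_COUNT` constant are assumptions for illustration; only the use
of the last bit of the sending port number comes from the description
above.

```c
#include <assert.h>
#include <stdint.h>

#define BEARER_COUNT 2u /* dual bearers, as described above */

/*
 * Pick a bearer from the low bit of the sender's port number, so that
 * every message from one socket travels over the same bearer.
 */
static unsigned int select_bearer(uint32_t sender_port)
{
	/* The mask below only works for a power-of-two bearer count. */
	assert((BEARER_COUNT & (BEARER_COUNT - 1u)) == 0);

	return sender_port & (BEARER_COUNT - 1u);
}
```

Different sockets still spread over both bearers, so the extra bandwidth
is kept, while packets from a single sender can no longer overtake each
other.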

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 21:38:19 -07:00
Jon Paul Maloy
9fbfb8b120 tipc: rename temporarily named functions
After the previous commit, we can now give the functions with temporary
names, such as tipc_link_xmit2(), tipc_msg_build2() etc., their proper
names.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 21:38:19 -07:00
Jon Paul Maloy
c4116e1057 tipc: remove unreferenced functions
We can now remove a number of functions which have become obsolete
and unreferenced through this commit series. There are no functional
changes in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 21:38:19 -07:00
Jon Paul Maloy
0abd8ff21f tipc: start using the new multicast functions
In this commit, we convert the socket multicast send function to
directly call the new multicast/broadcast function (tipc_bclink_xmit2())
introduced in the previous commit. We do this instead of letting the
call go via the now obsolete tipc_port_mcast_xmit(), hence saving
a call level and some code complexity.

We also remove the initial destination lookup at the message sending
side, and replace that with an unconditional lookup at the receiving
side, including on the sending node itself. This makes the destination
lookup and message transfer more uniform than before.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 21:38:18 -07:00
Jon Paul Maloy
078bec826f tipc: add new functions for multicast and broadcast distribution
We add a new broadcast link transmit function in bclink.c and a new
receive function in socket.c. The purpose is to move the branching
between external and internal destination down to the link layer,
just as we have done with unicast in earlier commits. We also make
use of the new link-independent fragmentation support that was
introduced in an earlier commit series.

This gives a shorter and simpler code path, and makes it possible
to obtain copy-free buffer delivery to all node local destination
sockets.

The new transmission code is added in parallel with the existing one,
and will be used by the socket multicast send function in the next
commit in this series.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 21:38:18 -07:00
Jon Paul Maloy
25b660c7e2 tipc: let internal link users call the new link send function
We convert the link internal users (changeover protocol, broadcast
synchronization) to use the new packet send function.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 21:38:18 -07:00
Jon Paul Maloy
dbdf6d24ad tipc: make name table distributor use new send function
In a previous commit series ("tipc: new unicast transmission code")
we introduced a new message sending function, tipc_link_xmit2(),
and moved the unicast data users over to use that function. We now
let the internal name table distributor do the same.

The interaction between the name distributor and the node/link
layer also becomes significantly simpler, so we can eliminate
the function tipc_link_names_xmit().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 21:38:18 -07:00
David S. Miller
38a4dfcf80 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/nf_tables fixes

The following patchset contains nf_tables fixes:

1) Fix wrong transaction handling when the table flags are not
   modified.

2) Fix missing rcu read_lock section in the netlink dump path, which
   is not protected by the nfnl_lock.

3) Set NLM_F_DUMP_INTR in the netlink dump path to indicate
   interferences with updates.

4) Fix 64-bit chain counters when they are retrieved on a 32-bit
   arch, from Eric Dumazet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 15:27:16 -07:00
Jerry Chu
c3caf1192f net-gre-gro: Fix a bug that breaks the forwarding path
Fixed a bug that was introduced by my GRE-GRO patch
(bf5a755f5e net-gre-gro: Add GRE support to the GRO stack) that
breaks the forwarding path because various GSO related fields were
not set. On the egress path the bug causes either the GSO code to
fail or GRE-TSO capable (NETIF_F_GSO_GRE) NICs to choke. The
following fix has been tested for both cases.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 14:45:26 -07:00
Daniel Borkmann
bbbea41d5e net: sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support
With support of SCTP_SNDINFO/SCTP_RCVINFO as described in RFC6458,
5.3.4/5.3.5, we can now deprecate SCTP_SNDRCV. The RFC already
declares it as deprecated:

  This structure mixes the send and receive path. SCTP_SNDINFO
  (described in Section 5.3.4) and SCTP_RCVINFO (described in
  Section 5.3.5) split this information. These structures should
  be used, when possible, since SCTP_SNDRCV is deprecated.

So whenever a user tries to subscribe to sctp_data_io_event via
setsockopt(2) which triggers inclusion of SCTP_SNDRCV cmsg_type,
issue a warning in the log.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 14:40:04 -07:00
Geir Ola Vaagland
6b3fd5f3a2 net: sctp: implement rfc6458, 8.1.31. SCTP_DEFAULT_SNDINFO support
This patch implements section 8.1.31. of RFC6458, which adds support
for setting/retrieving SCTP_DEFAULT_SNDINFO:

  Applications that wish to use the sendto() system call may wish
  to specify a default set of parameters that would normally be
  supplied through the inclusion of ancillary data. This socket
  option allows such an application to set the default sctp_sndinfo
  structure. The application that wishes to use this socket option
  simply passes the sctp_sndinfo structure (defined in Section 5.3.4)
  to this call. The input parameters accepted by this call include
  snd_sid, snd_flags, snd_ppid, and snd_context. The snd_flags
  parameter is composed of a bitwise OR of SCTP_UNORDERED, SCTP_EOF,
  and SCTP_SENDALL. The snd_assoc_id field specifies the association
  to which to apply the parameters. For a one-to-many style socket,
  any of the predefined constants are also allowed in this field.
  The field is ignored for one-to-one style sockets.
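
A minimal user-space sketch of setting the default send parameters
described above. The struct and the option number are local mirrors
(assumptions made so the sketch builds without SCTP headers; real code
should take them from <netinet/sctp.h>), and demo_set_default_sndinfo()
is a hypothetical helper name, not part of this patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* IPPROTO_SCTP */

/* Assumption: numeric value of SCTP_DEFAULT_SNDINFO in the Linux uapi header. */
#define DEMO_SCTP_DEFAULT_SNDINFO 34

/* Local mirror of struct sctp_sndinfo (sctp_assoc_t assumed to be a 32-bit int). */
struct demo_sndinfo {
    uint16_t snd_sid;
    uint16_t snd_flags;
    uint32_t snd_ppid;
    uint32_t snd_context;
    int32_t  snd_assoc_id;
};

/* Hypothetical helper: install default send parameters on an SCTP socket.
 * Returns 0 on success or a negative errno value on failure. */
static int demo_set_default_sndinfo(int fd, uint16_t sid,
                                    uint32_t ppid, uint32_t context)
{
    struct demo_sndinfo info;

    /* The mirror has no padding, so it should be exactly 16 bytes. */
    assert(sizeof(info) == 16);

    memset(&info, 0, sizeof(info));
    info.snd_sid = sid;          /* default stream number */
    info.snd_ppid = ppid;        /* payload protocol id, passed through as-is */
    info.snd_context = context;  /* opaque value chosen by the application */
    /* snd_flags and snd_assoc_id stay 0; snd_assoc_id is ignored for
     * one-to-one style sockets, as the RFC text above says. */

    if (setsockopt(fd, IPPROTO_SCTP, DEMO_SCTP_DEFAULT_SNDINFO,
                   &info, sizeof(info)) < 0)
        return -errno;
    return 0;
}
```

After a successful call, a plain sendto() on the same socket would be
expected to use these defaults without any ancillary data.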

Joint work with Daniel Borkmann.

Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 14:40:04 -07:00
Geir Ola Vaagland
2347c80ff1 net: sctp: implement rfc6458, 5.3.6. SCTP_NXTINFO cmsg support
This patch implements section 5.3.6. of RFC6458, that is, support
for 'SCTP Next Receive Information Structure' (SCTP_NXTINFO) which
is placed into ancillary data cmsghdr structure for each recvmsg()
call, if this information is already available when delivering the
current message.

This option can be enabled/disabled via setsockopt(2) on SOL_SCTP
level by setting an int value with 1/0 for SCTP_RECVNXTINFO in
user space applications as per RFC6458, section 8.1.30.

The sctp_nxtinfo structure is defined as per RFC as below ...

  struct sctp_nxtinfo {
    uint16_t nxt_sid;
    uint16_t nxt_flags;
    uint32_t nxt_ppid;
    uint32_t nxt_length;
    sctp_assoc_t nxt_assoc_id;
  };

... and provided under cmsg_level IPPROTO_SCTP, cmsg_type
SCTP_NXTINFO, while cmsg_data[] contains struct sctp_nxtinfo.
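
As a rough illustration of the setsockopt(2) toggle described above,
the following sketch enables or disables delivery of SCTP_NXTINFO.
The option number is a local assumption (real code should use
SCTP_RECVNXTINFO from <netinet/sctp.h>), and demo_set_recvnxtinfo()
is a hypothetical helper name:

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* IPPROTO_SCTP, same number as SOL_SCTP on Linux */

/* Assumption: numeric value of SCTP_RECVNXTINFO in the Linux uapi header. */
#define DEMO_SCTP_RECVNXTINFO 33

/* Hypothetical helper: switch SCTP_NXTINFO ancillary data on (1) or off (0).
 * Returns 0 on success or a negative errno value on failure. */
static int demo_set_recvnxtinfo(int fd, int on)
{
    /* The option takes an int that is either 1 or 0. */
    assert(on == 0 || on == 1);

    if (setsockopt(fd, IPPROTO_SCTP, DEMO_SCTP_RECVNXTINFO,
                   &on, sizeof(on)) < 0)
        return -errno;
    return 0;
}
```

Once enabled, a receiver could use nxt_length from the delivered
structure to size the buffer for its next recvmsg() call.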

Joint work with Daniel Borkmann.

Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 14:40:03 -07:00
Geir Ola Vaagland
0d3a421d28 net: sctp: implement rfc6458, 5.3.5. SCTP_RCVINFO cmsg support
This patch implements section 5.3.5. of RFC6458, that is, support
for 'SCTP Receive Information Structure' (SCTP_RCVINFO) which is
placed into ancillary data cmsghdr structure for each recvmsg()
call.

This option can be enabled/disabled via setsockopt(2) on SOL_SCTP
level by setting an int value with 1/0 for SCTP_RECVRCVINFO in user
space applications as per RFC6458, section 8.1.29.

The sctp_rcvinfo structure is defined as per RFC as below ...

  struct sctp_rcvinfo {
    uint16_t rcv_sid;
    uint16_t rcv_ssn;
    uint16_t rcv_flags;
    <-- 2 bytes hole  -->
    uint32_t rcv_ppid;
    uint32_t rcv_tsn;
    uint32_t rcv_cumtsn;
    uint32_t rcv_context;
    sctp_assoc_t rcv_assoc_id;
  };

... and provided under cmsg_level IPPROTO_SCTP, cmsg_type
SCTP_RCVINFO, while cmsg_data[] contains struct sctp_rcvinfo.
An sctp_rcvinfo item always corresponds to the data in msg_iov.
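
A sketch of how a receiver might pick the SCTP_RCVINFO item out of
the ancillary data returned by recvmsg(). The struct and the cmsg
type number are local mirrors (assumptions; real code should take
them from <netinet/sctp.h>), and demo_find_rcvinfo() is a
hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* IPPROTO_SCTP */

/* Assumption: numeric value of SCTP_RCVINFO in the Linux uapi enum. */
#define DEMO_SCTP_RCVINFO 3

/* Local mirror of struct sctp_rcvinfo (sctp_assoc_t assumed to be a 32-bit int). */
struct demo_rcvinfo {
    uint16_t rcv_sid;
    uint16_t rcv_ssn;
    uint16_t rcv_flags;
    /* the compiler is expected to insert the 2-byte hole here */
    uint32_t rcv_ppid;
    uint32_t rcv_tsn;
    uint32_t rcv_cumtsn;
    uint32_t rcv_context;
    int32_t  rcv_assoc_id;
};

/* Hypothetical helper: copy the first SCTP_RCVINFO control message found
 * in msg into *out. Returns 1 if one was found, 0 otherwise. */
static int demo_find_rcvinfo(struct msghdr *msg, struct demo_rcvinfo *out)
{
    struct cmsghdr *cmsg;

    /* Check that the mirror really has the hole shown in the layout above. */
    assert(offsetof(struct demo_rcvinfo, rcv_ppid) == 8);

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == IPPROTO_SCTP &&
            cmsg->cmsg_type == DEMO_SCTP_RCVINFO &&
            cmsg->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            /* memcpy avoids alignment assumptions about CMSG_DATA(). */
            memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```

The helper is meant to be called on the msghdr right after recvmsg()
returns, with msg_control pointing at a suitably aligned buffer.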

Joint work with Daniel Borkmann.

Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 14:40:03 -07:00
Geir Ola Vaagland
63b949382c net: sctp: implement rfc6458, 5.3.4. SCTP_SNDINFO cmsg support
This patch implements section 5.3.4. of RFC6458, that is, support
for 'SCTP Send Information Structure' (SCTP_SNDINFO) which can be
placed into ancillary data cmsghdr structure for sendmsg() calls.

The sctp_sndinfo structure is defined as per RFC as below ...

  struct sctp_sndinfo {
    uint16_t snd_sid;
    uint16_t snd_flags;
    uint32_t snd_ppid;
    uint32_t snd_context;
    sctp_assoc_t snd_assoc_id;
  };

... and supplied under cmsg_level IPPROTO_SCTP, cmsg_type
SCTP_SNDINFO, while cmsg_data[] contains struct sctp_sndinfo.
An sctp_sndinfo item always corresponds to the data in msg_iov.
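
A sketch of how an application might attach SCTP_SNDINFO to a
sendmsg() call. The struct and the cmsg type number are local mirrors
(assumptions; real code should take them from <netinet/sctp.h>), and
demo_prepare_sndinfo() is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>      /* struct iovec */
#include <netinet/in.h>   /* IPPROTO_SCTP */

/* Assumption: numeric value of SCTP_SNDINFO in the Linux uapi enum. */
#define DEMO_SCTP_SNDINFO 2

/* Local mirror of struct sctp_sndinfo (sctp_assoc_t assumed to be a 32-bit int). */
struct demo_sndinfo {
    uint16_t snd_sid;
    uint16_t snd_flags;
    uint32_t snd_ppid;
    uint32_t snd_context;
    int32_t  snd_assoc_id;
};

/* Hypothetical helper: fill msg so that it carries one payload buffer plus
 * one SCTP_SNDINFO control message. cbuf must be aligned for struct cmsghdr
 * and hold at least CMSG_SPACE(sizeof(struct demo_sndinfo)) bytes. */
static void demo_prepare_sndinfo(struct msghdr *msg, struct iovec *iov,
                                 void *payload, size_t len,
                                 void *cbuf, size_t cbuf_len,
                                 uint16_t sid, uint32_t ppid)
{
    struct demo_sndinfo info;
    struct cmsghdr *cmsg;

    assert(cbuf_len >= CMSG_SPACE(sizeof(info)));

    memset(msg, 0, sizeof(*msg));
    memset(cbuf, 0, cbuf_len);

    /* The data that the sndinfo item corresponds to. */
    iov->iov_base = payload;
    iov->iov_len = len;
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;

    msg->msg_control = cbuf;
    msg->msg_controllen = CMSG_SPACE(sizeof(info));

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_SCTP;
    cmsg->cmsg_type = DEMO_SCTP_SNDINFO;
    cmsg->cmsg_len = CMSG_LEN(sizeof(info));

    memset(&info, 0, sizeof(info));
    info.snd_sid = sid;    /* stream to send on */
    info.snd_ppid = ppid;  /* payload protocol id, passed through as-is */
    memcpy(CMSG_DATA(cmsg), &info, sizeof(info));
}
```

The prepared msghdr would then be handed to sendmsg() on an SCTP
socket; the socket call itself is left out of this sketch.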

Joint work with Daniel Borkmann.

Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16 14:40:03 -07:00